Port of NNEDI under new v2.6 AVS API - Page 17

real.finder · 7th December 2016, 23:34

Quote:

Originally Posted by Groucho2004

By the way, the DLL from Release_Intel_XP_Core2_SSE4.2 does not work on XP. I installed the latest Intel redist package, but Dependency Walker reveals that LIBIOMP5MD.DLL is looking for a function in kernel32.dll that does not exist on XP.

It works here if you use this trick: http://www.mediafire.com/file/5rp8jt...u/icl+-+xp.rar

Tested on WinXP SP3 32-bit in VirtualBox.

The latest Intel redist package is the problem; use this one: https://software.intel.com/sites/def...2016.1.146.zip

FranceBB · 8th December 2016, 01:03

Thanks for the "trick".
With that, it works just fine.
Tested a few minutes ago with:

nnedi3_resize16(target_width=1280, target_height=720, mixed=true, thr=1.0, elast=1.5, nns=4, qual=2, etype=0, pscrn=4, threads=0, kernel_d="Spline", kernel_u="Spline", taps=12, f_d=1.0, f_u=2.0, sharp=0)

to downscale from 1080p to 720p.
Thanks for the update!

jpsdr · 19th December 2016, 14:34

I found something odd in the nnedi3 code, and I think there is an error:

Code:

	for (int y=0; y<ydia; ++y)
	{
		const uint8_t *srcpT = srcp+y_stride;

		for (int x=0; x<xdia; ++x, ++input)
		{
			sum += srcpT[x];
			sumsq += srcpT[x]*srcpT[x];
			input[0] = srcpT[x];
		}
		y_stride+=stride2;
	}
	const float scale = 1.0f/(float)(xdia*ydia);
	mstd[0] = sum*scale;
	mstd[1] = sumsq*scale-mstd[0]*mstd[0];

I think we should have this instead:

Code:

mstd[1] = sumsq*scale*scale-mstd[0]*mstd[0];

Anyone is welcome to comment.
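For experimenting, here is the loop above turned into a small function that runs on its own. window_stats and WindowStats are hypothetical names, the copy into input[] is left out, and stride2 = 2*stride (reading every other row) is an assumption, since the excerpt doesn't show how stride2 is set. It returns the mean and the value the current line stores in mstd[1] before any square root.

```cpp
#include <cassert>
#include <cstdint>

struct WindowStats {
    float mean;  // what the excerpt stores in mstd[0]
    float var;   // what it stores in mstd[1]: sumsq*scale - mean*mean
};

// Stand-alone version of the loop from the post.
// Assumption: stride2 = 2*stride, so only every other row is read.
WindowStats window_stats(const uint8_t *srcp, int stride, int xdia, int ydia)
{
    assert(xdia > 0 && ydia > 0);
    const int stride2 = 2 * stride;
    int sum = 0, sumsq = 0, y_stride = 0;
    for (int y = 0; y < ydia; ++y) {
        const uint8_t *srcpT = srcp + y_stride;
        for (int x = 0; x < xdia; ++x) {
            sum += srcpT[x];
            sumsq += srcpT[x] * srcpT[x];  // uint8_t is promoted to int here
        }
        y_stride += stride2;
    }
    const float scale = 1.0f / (float)(xdia * ydia);
    const float mean = sum * scale;
    return { mean, sumsq * scale - mean * mean };
}
```

For a 2×2 window over rows 0 and 2 of the 2-wide image {10, 20 | 99, 99 | 30, 40 | 99, 99}, the mean is 25 and var is 125, which is exactly what you get by averaging the squared deviations (-15, -5, 5, 15) directly.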

feisty2 · 19th December 2016, 15:12

Well, you could give it a try and see if it still works...
Only tritical will ever know the exact answer.

jpsdr · 19th December 2016, 19:43

It's too bad he's not on doom9 anymore...

ajp_anton · 19th December 2016, 22:30

I don't know exactly what that part is for, but yeah, it sure looks odd.

Breaking it down, if
- sum is just the sum of srcpT's, whatever those are.
- sumsq is the sum of the squared srcpT's.
- "sqsum" is the square of the sum (introducing my own variable).
then
- mstd[1] = (sumsq - sqsum*scale)*scale
which looks weirdly unbalanced. Either
- mstd[1] = (sumsq - sqsum)*scale
or
- mstd[1] = (sumsq - sqsum)*scale*scale
would look better. I guess it's the latter (same as your suggested edit) because mstd[0] already has one scale, so squaring that has two.

Edit:
Then again, squaring scale is also weird, because it's basically (the number of elements in the sum)^-1, so it's a normalization factor. Maybe it's supposed to be (sumsq - sqsum)*scale?
Like feisty said, try and see the results.
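One way to "try and see" without touching the filter is to evaluate the three candidate forms side by side on a few numbers. In this sketch, candidates, Candidates and variance_two_pass are made-up names; each form is computed from the same sum and sumsq (with sqsum = sum*sum and scale = 1/n) and compared with the variance computed directly as the mean of squared deviations.

```cpp
#include <cassert>
#include <cstddef>

// The three forms from the post.
struct Candidates {
    double current;  // (sumsq - sqsum*scale)*scale, i.e. what the code computes
    double alt1;     // (sumsq - sqsum)*scale
    double alt2;     // (sumsq - sqsum)*scale*scale, i.e. the proposed edit
};

Candidates candidates(const double *x, std::size_t n)
{
    assert(n > 0);
    double sum = 0.0, sumsq = 0.0;
    for (std::size_t i = 0; i < n; ++i) { sum += x[i]; sumsq += x[i] * x[i]; }
    const double scale = 1.0 / (double)n;
    const double sqsum = sum * sum;
    return { (sumsq - sqsum * scale) * scale,
             (sumsq - sqsum) * scale,
             (sumsq - sqsum) * scale * scale };
}

// Reference: mean of squared deviations from the mean, in two passes.
double variance_two_pass(const double *x, std::size_t n)
{
    assert(n > 0);
    double mean = 0.0;
    for (std::size_t i = 0; i < n; ++i) mean += x[i];
    mean /= (double)n;
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) acc += (x[i] - mean) * (x[i] - mean);
    return acc / (double)n;
}
```

For {10, 20, 30, 40} the current form gives 125, the same as the two-pass variance, while the other two give -1750 and -437.5. The single scale inside the bracket is what balances it: sqsum*scale = sum²/n = n·mean², so sumsq - sqsum*scale is the sum of squared deviations, and the outer scale turns that into their mean.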

Wilbert · 20th December 2016, 00:33

There's definitely something wrong, but you should look at entire source to figure out how to correct it.

Sadly, there is no documentation in the source. Here is my take. Disclaimer: I understand nothing of the source.

Most of the fun seems to be happening in the function evalFunc_2 in nnedi3.cpp. The code:

Code:

void evalFunc_2(void *ps)
{
	...
	const int qual = pss->qual;
	const float scale = 1.0f/(float)qual;
	void (*extract)(const uint8_t*,const int,const int,const int,float*,float*);
	void (*wae5)(const float*,const int,float*);

	if (opt==1) wae5=weightedAvgElliottMul5_m16_C;
	else wae5=weightedAvgElliottMul5_m16_SSE2;
	...
	if (fapprox&2) // use int16 dot products
	{
		if (opt==1) extract=extract_m8_i16_C;
		else extract=extract_m8_i16_SSE2;
		...
	}
	else // use float dot products
	{
		if (opt==1) extract=extract_m8_C;
		else extract=extract_m8_SSE2;
		...
	}
	...
	extract(srcpp+x,src_pitch,xdia,ydia,mstd,input);
	...
	wae5(temp,nns,mstd);
	...
	if (opt>1) castScale_SSE(mstd,&scale,dstp+x);
	else dstp[x]=min(max((int)(mstd[3]*scale+0.5f),0),255);
	...
}

Looking at the last line, it implies that mstd[3] and the destination pixels differ by a factor of scale (since dstp[x]=mstd[3]*scale, ignoring the rounding).
castScale_SSE is defined in nnedi3_asm.asm, but I don't know how to read asm.

The function weightedAvgElliottMul5_m16_C which is called in evalFunc_2 (and is set to wae5) gives another clue:

Code:

void weightedAvgElliottMul5_m16_C(const float *w,const int n,float *mstd)
{
	...
	if (wsum>min_weight_sum[0]) mstd[3]+=((5.0f*vsum)/wsum)*mstd[1]+mstd[0];
	else mstd[3]+=mstd[0];
}

This implies that mstd[3], mstd[1] and mstd[0] should be on the same scale.
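To see why those three have to match, here is a sketch of the whole function. Only the final if/else is copied from the excerpt; the loop is my assumed reconstruction of the elided `...` (w[0..n-1] taken as non-negative weights, w[n..2n-1] as raw values passed through the Elliott function x/(1+|x|)), and kMinWeightSum is a hypothetical stand-in for min_weight_sum[0] whose value is a guess. The network's answer, 5*vsum/wsum, is a plain number that gets multiplied by mstd[1] and shifted by mstd[0], so both are expected to be in the same units as the output pixels.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for min_weight_sum[0]; the value is an assumption.
constexpr float kMinWeightSum = 1e-10f;

// Sketch of weightedAvgElliottMul5_m16_C (loop body assumed, tail copied).
// Assumed layout: w[0..n-1] = non-negative weights, w[n..2n-1] = raw values.
void weighted_avg_elliott_mul5(const float *w, int n, float *mstd)
{
    assert(n > 0);
    float vsum = 0.0f, wsum = 0.0f;
    for (int i = 0; i < n; ++i) {
        const float v = w[n + i];
        vsum += w[i] * (v / (1.0f + std::fabs(v)));  // Elliott function maps v into (-1, 1)
        wsum += w[i];
    }
    // Normalized answer, rescaled by the window's spread (mstd[1]) and
    // shifted by its mean (mstd[0]), then accumulated into mstd[3].
    if (wsum > kMinWeightSum) mstd[3] += ((5.0f * vsum) / wsum) * mstd[1] + mstd[0];
    else mstd[3] += mstd[0];
}
```

With mstd = {100, 10, 0, 0} and a single active weight whose value squashes to 0.5, this adds 5 * 0.5 * 10 + 100 = 125 to mstd[3]; if all weights are zero it falls back to the mean, 100.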

Earlier in evalFunc_2, one of extract_m8_i16_C/extract_m8_i16_SSE2/extract_m8_C/extract_m8_SSE2 is assigned to extract, depending on opt and fapprox. The function extract is called as

Code:

extract(srcpp+x,src_pitch,xdia,ydia,mstd,input);

This is where mstd gets filled in. jpsdr pasted some code from the function extract_m8_C, but the issue is present in all four of these functions. In extract_m8_C we see

Code:

void extract_m8_C(const uint8_t *srcp,const int stride,const int xdia,const int ydia,float *mstd,float *input)
{
	...
	const float scale = 1.0f/(float)(xdia*ydia);

	mstd[0] = sum*scale;
	mstd[1] = sumsq*scale-mstd[0]*mstd[0];
	mstd[3] = 0.0f;
	if (mstd[1]<=FLT_EPSILON) mstd[1]=mstd[2]=0.0f;
	else
	{
		mstd[1]=sqrtf(mstd[1]);
		mstd[2]=1.0f/mstd[1];
	}
	...
}

mstd[0] and sum (the source pixels) differ by a factor of scale, which is consistent with the above. That is, if the value of scale in extract_m8_C is the same as scale in evalFunc_2. I have no idea if that's the case.
If we change 'mstd[1] = sumsq*scale-mstd[0]*mstd[0];' to 'mstd[1] = sumsq*scale*scale-mstd[0]*mstd[0];', it implies that mstd[1] and mstd[0] differ by a factor of scale, but mstd[1] is later overwritten by its square root ('mstd[1]=sqrtf(mstd[1]);'). So then mstd[1] and mstd[0] have the same scale, which is consistent with the above.
So you need to change that in all four functions.

What I don't understand is what mstd[2] is supposed to do. It has scale^(-1) compared to mstd[1]. I don't see where mstd[2] is used, and thus whether its scaling is correct.
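Where mstd[2] is used isn't visible in these excerpts. My assumption, from memory of the original nnedi3 sources and worth checking, is that mstd+2 is passed to the dot-product routine as a scale factor, so the raw dot product of the window with the network weights is multiplied by 1/stddev. The sketch restates the statistics tail of extract_m8_C (window_mstd is a hypothetical name) and adds a hypothetical normalized_response helper to show what that would buy: with zero-mean weights, the same edge at twice the contrast and a different brightness gives the same response.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Statistics part of extract_m8_C, restated for a float window of xdia*ydia pixels.
void window_mstd(const float *window, int xdia, int ydia, float *mstd)
{
    assert(xdia > 0 && ydia > 0);
    float sum = 0.0f, sumsq = 0.0f;
    for (int i = 0; i < xdia * ydia; ++i) {
        sum += window[i];
        sumsq += window[i] * window[i];
    }
    const float scale = 1.0f / (float)(xdia * ydia);
    mstd[0] = sum * scale;                        // mean
    mstd[1] = sumsq * scale - mstd[0] * mstd[0];  // variance, E(x^2) - E(x)^2
    mstd[3] = 0.0f;                               // prediction accumulator
    if (mstd[1] <= FLT_EPSILON) mstd[1] = mstd[2] = 0.0f;  // flat window
    else {
        mstd[1] = std::sqrt(mstd[1]);             // standard deviation
        mstd[2] = 1.0f / mstd[1];                 // its reciprocal
    }
}

// Hypothetical use of mstd[2]: raw dot product scaled by 1/stddev.
// For zero-mean weights the result ignores brightness and contrast.
float normalized_response(const float *window, const float *weights, int n, const float *mstd)
{
    float dot = 0.0f;
    for (int i = 0; i < n; ++i) dot += window[i] * weights[i];
    return dot * mstd[2];
}
```

If this reading is right, mstd[2] being the reciprocal of mstd[1] is intended: it undoes the contrast of the window, while mstd[1] later restores it in the output.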

Wilbert · 20th December 2016, 00:45

Hmm, scale in evalFunc_2 is set to '1.0f/(float)qual;', with qual being an input parameter (1 or 2), while scale in extract_m8_C is equal to '1.0f/(float)(xdia*ydia);'.

qual doesn't seem equal to xdia*ydia to me?? xdia and ydia are set by

Code:

pssInfo[i].xdia = xdiaTable[nsize];
pssInfo[i].ydia = ydiaTable[nsize];

and these tables are defined as follows (see the header file):

Code:

const int xdiaTable[NUM_NSIZE] = {8,16,32,48,8,16,32};
const int ydiaTable[NUM_NSIZE] = {6,6,6,6,4,4,4};
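For reference, here are the window sizes those tables give. extract_m8_C's scale is 1/(xdia*ydia), so it ranges from 1/32 to 1/288 and is never 1 or 1/2, while evalFunc_2's scale is 1/qual. The two look like unrelated quantities that happen to share a name. The tables are copied from the post; window_pixels is a hypothetical helper, and NUM_NSIZE = 7 is inferred from the table length.

```cpp
#include <cassert>

// Tables copied from the header excerpt; NUM_NSIZE inferred from their length.
constexpr int NUM_NSIZE = 7;
const int xdiaTable[NUM_NSIZE] = {8, 16, 32, 48, 8, 16, 32};
const int ydiaTable[NUM_NSIZE] = {6, 6, 6, 6, 4, 4, 4};

// Number of pixels averaged by extract_m8_C for a given nsize,
// i.e. the reciprocal of its `scale`.
int window_pixels(int nsize)
{
    assert(nsize >= 0 && nsize < NUM_NSIZE);
    return xdiaTable[nsize] * ydiaTable[nsize];
}
```

Across nsize = 0..6 this gives 48, 96, 192, 288, 32, 64 and 128 pixels.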

jpsdr · 20th December 2016, 10:09

Finally, after looking at it from a statistical point of view, it's fine. mstd probably stands for Mean STandard Deviation.
mstd[0] is the mean, and mstd[1] is the mean standard deviation, which is the square root of the mean of the sum of the squares minus the square of the mean.

Sorry, my mistake.

Wilbert · 20th December 2016, 12:43

Quote:

Originally Posted by jpsdr

Finally, after looking at it from a statistical point of view, it's fine. mstd probably stands for Mean STandard Deviation.
mstd[0] is the mean, and mstd[1] is the mean standard deviation, which is the square root of the mean of the sum of the squares minus the square of the mean.

Yes indeed.

Your post is a bit cryptic. I think you are right that it should be

Code:

mstd[1] = sumsq*scale*scale-mstd[0]*mstd[0];

But I also think that the scale variables in evalFunc_2 and in the extract functions should be the same. I don't understand why they are different.

jpsdr · 20th December 2016, 14:42

Another error on my side: after checking, the mean standard deviation is not what I said (my memory wasn't exactly right). It's close, but it's not exactly what is calculated here.
But what is done here is the mean of the squares minus the square of the mean, and seen that way it can make sense. So maybe the formula is correct.

Wilbert · 20th December 2016, 17:27

I give up. Leave the bugs in.

Quote:

But, what is done here is the mean of the squares less the square of the mean

This is called the variance, and if you take the square root of it you will get the standard deviation. Thus

VAR[X] = E[(X-E[X])^2] = E[x^2]-E[X^2], SD[X] = sqrt(VAR[X])

feisty2 · 20th December 2016, 17:51

Quote:

Originally Posted by Wilbert

I give up. Leave the bugs in.

This is called the variance, and if you take the square root of it you will get the standard deviation. Thus

VAR[X] = E[(X-E[X])^2] = E[x^2]-E[X^2], SD[X] = sqrt(VAR[X])

should be E(x^2) - E(x)^2

EDIT: Var(x) = E((x-E(x))^2) = E(x^2 - 2xE(x) + E(x)^2) = E(x^2) - 2E(x)E(x) + E(x)^2 = E(x^2) - E(x)^2
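In terms of the variables in extract_m8_C, with N = xdia·ydia the number of pixels in the window and scale = 1/N, that identity reads:

$$
\begin{aligned}
E(x) &= \frac{\mathrm{sum}}{N} = \mathrm{sum}\cdot\mathrm{scale} = \mathrm{mstd}[0],\\
E(x^2) &= \frac{\mathrm{sumsq}}{N} = \mathrm{sumsq}\cdot\mathrm{scale},\\
\mathrm{Var}(x) &= E(x^2) - E(x)^2 = \mathrm{sumsq}\cdot\mathrm{scale} - \mathrm{mstd}[0]\cdot\mathrm{mstd}[0].
\end{aligned}
$$

So the expression on the existing line matches Var(x), and the following sqrtf gives SD(x); sumsq·scale·scale would be E(x²)/N instead.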

jpsdr · 20th December 2016, 19:37

So, finally, there is probably no bug: sumsq*scale-mstd[0]*mstd[0] produces the variance.
E(x^2)=sumsq*scale
E(x)^2=mstd[0]*mstd[0]
No...?

I still haven't been able to get 16-bit working, and I can't figure out where it's going wrong...

Wilbert · 20th December 2016, 22:45

Quote:

Originally Posted by jpsdr

So, finally, there is probably no bug: sumsq*scale-mstd[0]*mstd[0] produces the variance.
E(x^2)=sumsq*scale
E(x)^2=mstd[0]*mstd[0]
No...?

I still haven't been able to get 16-bit working, and I can't figure out where it's going wrong...

E(x^2)=sumsq*scale^2 as I see it, but I guess I can't convince anyone.

Anyway, this scale factor is 1 by default (scale = 1/qual, and qual is an input parameter). Could you make some screenshots voor qual=1 and qual=2 and compare them?
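The excerpts themselves may explain the 1/qual scale: extract_m8_C sets mstd[3] = 0.0f, weightedAvgElliottMul5_m16_C only ever adds to mstd[3] (+=), and evalFunc_2 multiplies mstd[3] by 1/qual at the end. That pattern reads like an average over qual prediction passes rather than a units conversion, so qual=1 vs qual=2 screenshots would show the effect of averaging. The sketch below assumes that reading; average_passes is hypothetical, and the per-pass predictions are passed in directly instead of coming from separate weight sets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed reading of how scale = 1/qual is used in evalFunc_2.
uint8_t average_passes(const float *pass_predictions, int qual)
{
    assert(qual >= 1);
    float mstd3 = 0.0f;                      // extract_m8_C: mstd[3] = 0.0f
    for (int i = 0; i < qual; ++i)
        mstd3 += pass_predictions[i];        // wae5: mstd[3] += ...
    const float scale = 1.0f / (float)qual;  // evalFunc_2: scale = 1/qual
    // Same rounding and clamping as the C output line in evalFunc_2.
    return (uint8_t)std::min(std::max((int)(mstd3 * scale + 0.5f), 0), 255);
}
```

With predictions {120, 130} and qual = 2, the sum 250 is scaled to 125; with qual = 1 and {120}, the result is 120.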

StainlessS · 20th December 2016, 23:59

Quote:

Originally Posted by Wilbert

Could you make some screenshots voor qual=1 and qual=2 and compare them?

This is an English-only forum, please don't post in a foreign language here; I don't want to have to draw an administrator's attention to this. Thank you for your compliance.

Merry Xmas Wilbert et al.

[Latin doesn't count as a foreign language, as only dead Romans speak it, plus a few Swiss Romansch near-Roman speakers [about 10,000, I believe]]

jpsdr · 21st December 2016, 18:22

After a bloody and painful struggle, I've been able to get 16-bit working.
Can someone explain to me why this works:

Code:

const uint8_t *srcp = pss->srcp[b];
const uint8_t *srcpp = srcp-(ydia-1)*src_pitch-xdiad2m1;

and why this doesn't (at least with VS2015 Community):

Code:

const uint8_t *srcp = pss->srcp[b];
const uint8_t *srcpp = srcp-((ydia-1)*src_pitch-xdiad2m1);

???????????

Thanks again to feisty2 for the code; it was very useful, especially for the init part and the weight calculation adjustment.
And thanks in advance for the part I'll start working on next: the ASM! The code will be helpful.

Groucho2004 · 21st December 2016, 18:44

Quote:

Originally Posted by jpsdr

Can someone explain to me why this works:

Code:

const uint8_t *srcp = pss->srcp[b];
const uint8_t *srcpp = srcp-(ydia-1)*src_pitch-xdiad2m1;

and why this doesn't (at least with VS2015 Community):

Code:

const uint8_t *srcp = pss->srcp[b];
const uint8_t *srcpp = srcp-((ydia-1)*src_pitch-xdiad2m1);

???????????

Because the additional parentheses in the second statement change the order of evaluation: xdiad2m1 ends up being added instead of subtracted.
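Evaluating both expressions on plain integer offsets makes the difference concrete. Here base is a hypothetical stand-in for srcp, the function names are made up, and the values are arbitrary; the parenthesized version moves the start point 2*xdiad2m1 elements to the right.

```cpp
#include <cassert>
#include <cstddef>

// The two expressions from the post, on integer offsets instead of pointers.
std::ptrdiff_t start_working(std::ptrdiff_t base, int ydia, int src_pitch, int xdiad2m1)
{
    assert(ydia >= 1);
    return base - (ydia - 1) * src_pitch - xdiad2m1;    // (base - rows) - xdiad2m1
}

std::ptrdiff_t start_broken(std::ptrdiff_t base, int ydia, int src_pitch, int xdiad2m1)
{
    assert(ydia >= 1);
    return base - ((ydia - 1) * src_pitch - xdiad2m1);  // base - rows + xdiad2m1
}
```

With base = 1000, ydia = 6, src_pitch = 100 and xdiad2m1 = 3, the first form gives 497 and the second 503.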

jpsdr · 21st December 2016, 19:34

Argh... I got back home too late to delete my stupid question after I'd realised it...

pinterf · 21st December 2016, 19:42

Great news. I suppose the hard part was having uint16_t instead of a byte. Does it automatically work for e.g. 10-bit videos? (Ideally, all filters that work for 16 bits should also support 10-, 12- and 14-bit videos.)
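In the C path excerpt, the most visible 8-bit assumption besides the uint8_t pointers is the final clamp to 255. A hypothetical high-bit-depth version would clamp to (1 << bits) - 1 instead, as sketched below; cast_scale_hbd and bits_per_sample are made-up names. This does not settle whether other parts, such as the FLT_EPSILON flat-window test or the trained weights, assume a 0-255 range; that would still need checking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical generalization of the 8-bit output line
//   dstp[x]=min(max((int)(mstd[3]*scale+0.5f),0),255);
// to uint16_t samples with bits_per_sample significant bits.
uint16_t cast_scale_hbd(float mstd3, float scale, int bits_per_sample)
{
    assert(bits_per_sample >= 8 && bits_per_sample <= 16);
    const int max_val = (1 << bits_per_sample) - 1;  // 1023, 4095, 16383 or 65535
    const int v = (int)(mstd3 * scale + 0.5f);       // same rounding as the 8-bit line
    return (uint16_t)std::min(std::max(v, 0), max_val);
}
```

For 10-bit content, a prediction of 1100 is clamped to 1023, while 500.4 rounds down to 500.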
