NNEDI - intra-field deinterlacing filter - Page 2

scharfis_brain · 16th September 2007, 00:27

@revgen: if you can't use anything other than Paint, then just make sure to set the image size to 1x1 pixels via
Image -> Attributes
before pasting an image!

This will avoid the white borders!

Revgen · 16th September 2007, 00:30

Quote:

Originally Posted by scharfis_brain

@revgen: if you can't use anything other than Paint, then just make sure to set the image size to 1x1 pixels via
Image -> Attributes
before pasting an image!

This will avoid the white borders!

I have no idea how to use paint.

I'll do it next time.

tritical · 16th September 2007, 00:48

scharfis, I put up a new nnedi.dll at the same location as before. Can you download it and see if it fixes the problems with SSE? The last one I put up always used SSE (and then compared the results of each routine to the C version), so opt didn't do anything.

scharfis_brain · 16th September 2007, 00:58

this version behaves like the original one:
- no debugview output
- opt=1 produces a nice output
- opt=2 produces garbage

btw.: I am working with AVS 2.58

tritical · 16th September 2007, 01:13

One more time, same link as before. If it still doesn't work I'm out of ideas.

scharfis_brain · 16th September 2007, 04:30

Still the same:
(to quote myself)

Quote:

this version behaves like the original one:
- no debugview output
- opt=1 produces a nice output
- opt=2/0 produces garbage

EDIT: I just tested it in Microsoft VirtualPC on a fresh, clean install of Windows XP.
The result was the same:
- opt=1 OK
- opt=2/0 Garbage

Is it possible that my CPU is faulty and executes SSE instructions incorrectly?
Are there programs to check for correct execution of instructions (or instruction sets like SSE)?

tritical · 16th September 2007, 07:04

I don't know of any programs to check correct execution of SSE, but I also haven't looked for one. The only thing that makes the findCluster SSE routine (which is the only one that doesn't work correctly on your computer) different from the other SSE routines is that it uses the 'comiss' instruction. The rest of it is almost exactly the same as one of the other routines, which works correctly.

Maybe someone else with an Athlon XP can test?

Fizick · 16th September 2007, 08:05

same bug with my AthlonXP 1800+

tritical · 16th September 2007, 10:11

Here are the C/sse routines, maybe someone can see something I can't:

Code:

int findCluster_C(const float *input, const float *clusters, const int n)
{
   int idx;
   float mdiff = FLT_MAX;
   for (int i=0; i<n; ++i)
   {
      float diff = 0.0f;
      for (int j=0; j<100; ++j)
         diff += (input[j]-clusters[j])*(input[j]-clusters[j]);
      if (diff < mdiff)
      {
         mdiff = diff;
         idx = i;
      }
      clusters += 100;
   }
   return idx;
}

__declspec(align(16)) const float sse_floatmax[4] = 
      { FLT_MAX, FLT_MAX, FLT_MAX, FLT_MAX };

int findCluster_SSE(const float *input, const float *clusters, const int n)
{
   int idx;
   __asm
   {
      xor eax,eax
      mov edx,n
      mov esi,clusters
      movaps xmm7,sse_floatmax
i_loop:
      mov edi,input
      mov ecx,5
      xorps xmm0,xmm0
      xorps xmm1,xmm1
twenty_loop:
      movaps xmm2,[esi]
      movaps xmm3,[esi+16]
      movaps xmm4,[esi+32]
      movaps xmm5,[esi+48]
      movaps xmm6,[esi+64]
      subps xmm2,[edi]
      subps xmm3,[edi+16]
      subps xmm4,[edi+32]
      subps xmm5,[edi+48]
      subps xmm6,[edi+64]
      mulps xmm2,xmm2
      mulps xmm3,xmm3
      mulps xmm4,xmm4
      mulps xmm5,xmm5
      mulps xmm6,xmm6
      addps xmm1,xmm2
      addps xmm3,xmm4
      addps xmm5,xmm6
      addps xmm0,xmm3
      addps xmm1,xmm5
      add esi,80
      add edi,80
      sub ecx,1
      jnz twenty_loop
      addps xmm0,xmm1
      movhlps xmm1,xmm0
      addps xmm0,xmm1
      movaps xmm1,xmm0
      psrlq xmm1,32
      addss xmm0,xmm1
      comiss xmm0,xmm7
      jae check_loop
      movss xmm7,xmm0
      mov idx,eax
check_loop:
      add eax,1
      cmp eax,edx
      jl i_loop
   }
   return idx;
}

ARDA · 16th September 2007, 13:35

@tritical

First of all, thanks for this contribution. From a quick look (I didn't analyze the code), if I remember correctly,
psrlq xmm1,32 is an SSE2 instruction that is not supported on older SSE-only CPUs. In SSE, all instructions
that operate on xmm registers are floating-point ones.
I don't have my papers here, but I'm ALMOST sure about that.
Best regards for this project

ARDA

IanB · 16th September 2007, 15:51

Yep, psrlq xmm1,32 is an SSE2 instruction.

A convenient reference is distrib/include/SoftWire/InstructionSet.cpp

One of SHUFPS, UNPCKLPS or UNPCKHPS is probably what you want.

Terranigma · 16th September 2007, 16:14

I'm loving this filter. It's really fast and does a terrific job when used with yadifmod.

I couldn't ask for more.

Revgen · 16th September 2007, 19:17

Quote:

Originally Posted by tritical

I bobbed your sample using yadif with nnedi for spatial prediction. Result: test.avi

Oops! Looks like I missed this post.

That's not too bad at all for Yadif. I'll try it out myself later.

tritical · 16th September 2007, 19:35

Thank you ARDA and IanB. I replaced psrlq with shufps. The funny thing is I originally added movaps/psrlq to replace pshufd so that it wouldn't require SSE2.

I put up a new version at the same link as before. scharfis or Fizick, could you test it when you have time?
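
For readers following the fix: the instructions actually used in the updated nnedi.dll are not shown in the thread, so the following is only a minimal sketch of how the final horizontal add can be done with SSE1-only operations. It uses compiler intrinsics rather than inline asm, and the names hsum_sse1 and hsum4 are made up for illustration. The shuffle (shufps) takes the place of the movaps + psrlq pair from the posted routine.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE1 intrinsics only; no SSE2 header needed */

/* Hypothetical helper: sum the four floats in v using SSE1 instructions only. */
static float hsum_sse1(__m128 v)
{
    /* movhlps: copy the upper two floats of v into the lower two lanes. */
    __m128 hi = _mm_movehl_ps(v, v);
    /* addps: lane 0 = v0 + v2, lane 1 = v1 + v3. */
    __m128 sum2 = _mm_add_ps(v, hi);
    /* shufps: broadcast lane 1 so it can be added to lane 0.
       This replaces the SSE2-only "psrlq xmm1,32" step. */
    __m128 lane1 = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    /* addss: lane 0 = (v0 + v2) + (v1 + v3). */
    return _mm_cvtss_f32(_mm_add_ss(sum2, lane1));
}

/* Hypothetical convenience wrapper for an unaligned array of four floats. */
static float hsum4(const float *p)
{
    assert(p != NULL);
    return hsum_sse1(_mm_loadu_ps(p));
}
```

Calling hsum4 on an array holding 1, 2, 3 and 4 adds the lanes as (1 + 3) + (2 + 4). Because everything comes from xmmintrin.h, the compiler has no reason to emit integer xmm instructions for this routine when it targets SSE1.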

scharfis_brain · 16th September 2007, 19:46

@tritical: it works this way now and it is much faster!
Many thanks!

Chainmax · 16th September 2007, 19:54

Revgen, could you try to include TDeint+NNEDI+TMM on your comparison?

Revgen · 16th September 2007, 20:33

Quote:

Originally Posted by Chainmax

Revgen, could you try to include TDeint+NNEDI+TMM on your comparison?

Hmm... I didn't know about TMM until you mentioned it. I'll try it out as soon as my other encode is finished.

Revgen · 17th September 2007, 06:06

Okay, I checked out TDeint+TMM+NNEDI. The good news is that it rivals MVBob (with either EEDI or NNEDI in the script) in terms of quality and stability. The bad news is that it's about as slow as MVBob too, and this is with Threads=2 enabled for NNEDI. It doesn't come close to MCBob though, regardless of whether MCBob is using NNEDI or not.

[Hint]I wonder if Tritical would be interested in adding an Emask parameter to Yadifmod.[/Hint]

It would be nice to see what result we get with Yadif combined with NNEDI and TMM.

tritical · 17th September 2007, 07:35

If you were going to use tmm/nnedi, you would get the same output as using tdeint+tmm+nnedi... there wouldn't be anything for yadif to do. It doesn't matter anyway, because yadif doesn't use a motion mask like the one tmm outputs. Yadif doesn't make a straight weave or don't-weave decision. It starts with the spatial prediction, and then limits that value to be within 'diff' of the weaved prediction (the average of the pixels from the prev and next fields). 'diff' is calculated from temporal differences and spatial differences.
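
The limiting step described above can be sketched per pixel as follows. This is only an illustration under stated assumptions: the function names are hypothetical, 'diff' is taken as an input because its calculation from temporal and spatial differences is not given here, and the rounding of the average is a guess rather than yadif's actual behavior.

```c
#include <assert.h>

/* Clamp x into the inclusive range [lo, hi]. */
static int clamp_int(int x, int lo, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Hypothetical per-pixel limiter.
   spatial : spatially interpolated value (for example from nnedi)
   prev    : co-located pixel from the previous field
   next    : co-located pixel from the next field
   diff    : allowed deviation (assumed to be computed elsewhere) */
static int limit_to_weave(int spatial, int prev, int next, int diff)
{
    assert(diff >= 0);
    /* Weaved (temporal) prediction: average of prev and next. */
    int weave = (prev + next) / 2;
    /* Keep the spatial prediction, but no further than diff from weave. */
    return clamp_int(spatial, weave - diff, weave + diff);
}
```

With a small diff (static area) the output is pulled toward the weaved value; with a large diff (motion) the spatial prediction passes through unchanged. That is why there is no hard weave or don't-weave switch.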

There is one obvious improvement that can be made to yadif, and that is to slide the temporal window. Right now it is basically a five field check that checks only the middle case... so, for example, it will never output the weaved prediction if the center field (the one being turned into a frame) is within 2 fields (ahead or back) of a scenechange. The only downside is the added computational complexity. Making it check all five cases is on my list of things to do.

2Bdecided · 17th September 2007, 10:39

Thanks for more toys to play with!

What's the difference, algorithmically, between NNEDI and EEDI2? (Apart from EEDI2 wanting the fields, and NNEDI throwing one field away from a frame?)

Should I stop using EEDI2 and start using pointresize.NNEDI?

Cheers,
David.
