NNEDI - intra-field deinterlacing filter [Archive]

tritical

15th September 2007, 02:41

Well, here is nnedi v1.3 (http://bengal.missouri.edu/~kes25c/nnedi_v1.3.zip). It isn't perfect yet, but I think it definitely proves that this method can work well. A v2.0 is already in the works. The filter operation is pretty simple... it throws away one field of each input frame and then interpolates the missing pixels. There is a parameter called 'field' to control which field is kept and double vs same rate output (same as the field parameter in eedi2). Then there are boolean Y, U, and V parameters to control which planes are processed.
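The "throw away one field, then interpolate" flow can be sketched in C as follows. This is only an illustration of the data flow, not nnedi's code: the neural-network predictor is swapped for a plain average of the lines above and below, and the function name and the keep_top flag are made up here (the real 'field' parameter has more modes than a single flag).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of nnedi: keep one field of an 8-bit plane
 * and rebuild the other field's lines in place.
 * keep_top != 0 keeps the even lines (0, 2, 4, ...), otherwise the odd lines.
 * Each discarded line is replaced by the rounded average of the kept lines
 * above and below it; at the frame border the single neighbour is copied. */
static void keep_field_and_interpolate(unsigned char *plane, int width,
                                       int height, ptrdiff_t pitch,
                                       int keep_top)
{
    int first_missing = keep_top ? 1 : 0;
    for (int y = first_missing; y < height; y += 2) {
        unsigned char *dst = plane + y * pitch;
        const unsigned char *above = (y > 0) ? dst - pitch : NULL;
        const unsigned char *below = (y + 1 < height) ? dst + pitch : NULL;
        for (int x = 0; x < width; ++x) {
            if (above && below)
                dst[x] = (unsigned char)((above[x] + below[x] + 1) >> 1);
            else if (above)
                dst[x] = above[x];
            else if (below)
                dst[x] = below[x];
        }
    }
}
```

Whatever was stored in the discarded lines never influences the result, which is why feeding a frame with only one valid field gives the same output.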

This filter turned out to be pretty good for resizing as well (limited to powers of 2 enlargement). Using it for resizing is pretty easy... pointresize the height to 2x, use nnedi, rotate left or right, pointresize again, use nnedi a second time, etc... It is slightly more difficult for YUY2 because turnleft()/turnright() will mess up (blur/interpolate) the chroma. So you will need to use utoy() and vtoy() to pull the chroma planes out and then process each of the 3 clips separately. Example functions for 2x resizing:

function nnediresize2x(clip c, bool pY, bool pU, bool pV)
{
v = c.nnedi(dh=true,Y=pY,U=pU,V=pV).turnleft()
v = v.nnedi(dh=true,Y=pY,U=pU,V=pV).turnright()
return v
}

function nnediresize_YUY2(clip c)
{
cy = c
cu = c.utoy()
cv = c.vtoy()
cy = nnediresize2x(cy,true,false,false)
cu = nnediresize2x(cu,true,false,false)
cv = nnediresize2x(cv,true,false,false)
return ytouv(cu,cv,cy)
}

function nnediresize_YV12(clip c)
{
return nnediresize2x(c,true,true,true)
}

This will result in a shifted image, the direction being dependent on the rotations used.

As an example, here is a 4x enlargement (http://bengal.missouri.edu/~kes25c/t0.png) of the clown image from http://www.general-cathexis.com/interpolation.html. No pre or post processing.

Any feedback is welcome, and thanks again to everyone who contributed cpu time :thanks:.

Dark Shikari

15th September 2007, 03:01

Wow, that is a nice filter :eek:

Great work.

Revgen

15th September 2007, 05:44

I decided to use it as a bob filter and compare it to other bob filters. Since I tend to encode sports, it's what I'm most interested in.

Here's my interpretation:

NNEDI: Very Good Quality, Terrible Stability
MVBOB: Very Good Quality, Very Good Stability
MCBOB: Best Quality, Best Stability
NNEDI+TDeint: Good Quality, Good Stability

NNEDI didn't allow too many stray interlaced lines to come in, but the video was flickering and jerking too much and wasn't stable. Pairing it with TDeint improved stability but allowed more stray interlacing artifacts in. If there are any suggestions on improving quality for the NNEDI scripts, let me know.

Here's my Lagarith sample. http://www.mediafire.com/?42y1jis3mbg

NNEDI Settings:

NNEDI = nnedi(field=3,y=true,u=true,v=true,threads=2,opt=0)

NNEDI+TDeint =

interp = nnedi(field=3,y=true,u=true,v=true,threads=2,opt=0)
tdeint(mode=1,order=1,edeint=interp)

MVBOB = Default

MCBOB = Default

foxyshadis

15th September 2007, 06:56

nnedi is a replacement for eedi2, not a smart bob on its own. Swap it for eedi2 in mvbob and then compare it to eedi2's performance, using either securebob, mvbob, or mcbob as you prefer.

Function NNEDIbob(clip Input)
{
Input.nnedi(Field = -2)
AssumeFrameBased()
GetParity(Input) ? AssumeTFF() : AssumeBFF()
}

Add this as type == 4 to SecureBob. Make it default if you want, currently eedi2 is. Same thing in mcbob, but the relevant line to change there is edibobbed = clp.EEDIbob().

tritical

15th September 2007, 09:11

As foxyshadis said, nnedi is just an interpolator like eedi2. It would never be able to beat a motion compensated or motion adaptive bobber on content with static logos and writing.

Anyways, I noticed a bug which was causing the incorrect lines of the chroma planes to be kept in yv12 (yuy2 was fine). I modified the link above to point to version 1.1.

Revgen

15th September 2007, 09:24

tritical

15th September 2007, 09:41

I bobbed your sample using yadif with nnedi for spatial prediction. Result: test.avi (http://bengal.missouri.edu/~kes25c/test.avi)

scharfis_brain

15th September 2007, 10:34

@tritical: wow!
I am really impatient right now to see a yadif+nnedi.dll so I can implement it in mvbob. This should yield a massive improvement in stability.

scharfis_brain

15th September 2007, 13:17

I tested nnedi now and found that it creates garbage with SSE.
It works fine if I force it to use C-Code.

tritical

15th September 2007, 16:28

What is the script you're using, and what cpu does your computer have? c code and sse produce exactly the same results on my laptop and desktop.

scharfis_brain

15th September 2007, 16:48

loadplugin("c:\x\nnedi.dll")

avisource("60i-YUY2-Huffy.avi").assumetff()
nnedi(opt=0, field=-2)

the resulting image looks like this:
(opt=2 also produces this result)
http://home.arcor.de/scharfis_brain/samples/nnedi-opt0.jpg

when I set opt=1 I receive a pretty nice interpolated result:
http://home.arcor.de/scharfis_brain/samples/nnedi-opt1.jpg

the source video is 640x480@29.97fps YUY2
converting it to YV12 results in the same weird image.

I use an Athlon XP 2600+ (Barton Core) with an ASUS A7N8X-XE Mainboard and 2 Gigs of RAM.

The source image looks like this:
http://home.arcor.de/scharfis_brain/samples/nnedi-source.jpg

tritical

15th September 2007, 18:08

scharfis, could you run [link removed] with debugview open to capture the output. It should show which sse routines aren't working correctly on your computer.

The output log might get really big really fast.

Chainmax

15th September 2007, 18:24

The 4x enlargement looks amazing, it's better than most results on that page and at least comparable to Zhao Xin-LI and LAD Deconvolution :eek:. Great work, tritical! http://smilies.vidahost.com/otn/wink/thumb.gif

MfA

15th September 2007, 20:17

BTW, what kind of downsampling (or rather PSF) are you optimizing for? Straight bilinear (box) like Aruzinsky?

tritical

15th September 2007, 20:47

yadifmod v1.0 (http://bengal.missouri.edu/~kes25c/yadifmod_v1.zip). I've had this for a while, but never got it together for release. It is the same as Fizick's port, except that spatial predictions are taken from a user supplied clip. Also, it is not an Avisynth_C plugin. It works with YV12 and YUY2 input.

@MfA
None really. The primary purpose of the filter is interpolation for deinterlacing not image enlargement. The training set for v1.0 consisted of 220 frames taken from about 30-35 dvd sources (many of them being anime, probably 5-10 were real life sources) and some random images. The filter simply learns to predict a pixel value given only the pixels in the opposite field surrounding its location. For v2.0 I am increasing it to ~250-270 frames. Most of the new ones are from real life images and test clips I found on the internet. There are some other internal changes being made for the next version as well.

Revgen

15th September 2007, 21:14

Just previewed both MVBob and MCBob with NNEDI and it looks pretty good so far looking at still frames. I experimented with this section in MCBOB:

# If requested, do additional PP via EEDI2
# ----------------------------------------
oweave.mt_merge(last,notstatic,luma=false,U=3,V=3)
AssumeTFF()
edisingle = eedi2().LanczosResize(ox,oy,0,-0.5,ox,2*oy+0.001,taps=3)
edidouble = merge(SeparateFields().SelectEven().eedi2(field=1),SeparateFields().SelectOdd().EEDI2(field=0),0.5)
(EdiPost==1) ? edisingle : \
(EdiPost==2) ? edidouble : last

and changed it to

# If requested, do additional PP via NNEDI
# ----------------------------------------
oweave.mt_merge(last,notstatic,luma=false,U=3,V=3)
AssumeTFF()
edisingle = nnedi()
edidouble = merge(nnedi(field=1),nnedi(field=0),0.5)
(EdiPost==1) ? edisingle : \
(EdiPost==2) ? edidouble : last

It looked okay, but it didn't smooth jagged lines as well as the EEDI2 one, so I kept the former.

I'll let you know more once they are fully encoded.

tritical

15th September 2007, 21:56

Changing

edisingle = eedi2().LanczosResize(ox,oy,0,-0.5,ox,2*oy+0.001,taps=3)

to

edisingle = nnedi()

can't be right. eedi2 is taking in a frame and doubling the height. Whereas, nnedi is taking in the same frame, throwing out half the lines and then interpolating them. To get the height doubling behavior with nnedi you need to pointresize to 2x vertically prior to calling nnedi. It should be:

edisingle = pointresize(width,2*height).nnedi().LanczosResize(ox,oy,0,-0.5,ox,2*oy+0.001,taps=3)

Revgen

15th September 2007, 22:01

scharfis_brain

15th September 2007, 23:59

@tritical:
the special version of nnedi.dll you gave me for testing with debugview neither shows a correct result with opt=2 nor with opt=1.

debugview's only (over and over repeated) message is this:
[2876] findCluster doesn't match!

however, your officially posted nnedi.dll works fine with opt=1.

Revgen

16th September 2007, 00:24

Okay I've now looked at MVBob and MCBob, and it appears that NNEDI makes a definite difference on edges. With EEDI2 the edges display something I call "blur bubbles" on straight lines. Replacing EEDI2 with NNEDI seems to greatly reduce if not eliminate these artifacts.

Here's an example of MVBob in its regular state.

http://img118.imageshack.us/img118/6697/blurbubblecw9.png (http://imageshack.us)

Here's MVBob paired with NNEDI instead of EEDI2

http://img297.imageshack.us/img297/713/noblurbubblehr0.png (http://imageshack.us)

The differences are hard to notice while in motion though.

scharfis_brain

16th September 2007, 00:27

@revgen: if you cannot use anything other than Paint, then just make sure to set the image size to 1x1 pixels via
Image -> Attributes
before pasting an image!

this will avoid the white borders!

Revgen

16th September 2007, 00:30

I have no idea how to use paint. :p

I'll do it next time.

tritical

16th September 2007, 00:48

scharfis, I put up a new nnedi.dll at the same location as before. Can you download it and see if it fixes the problems with sse? The last one I put up always used sse (then compared the results for each routine to the C version) so opt didn't do anything.

scharfis_brain

16th September 2007, 00:58

this version behaves like the original one:
- no debugview output
- opt=1 produces a nice output
- opt=2 produces garbage

btw.: I am working with AVS 2.58

tritical

16th September 2007, 01:13

One more time, same link as before. If it still doesn't work I'm out of ideas.

scharfis_brain

16th September 2007, 04:30

Still the same:
(to quote myself)
this version behaves like the original one:
- no debugview output
- opt=1 produces a nice output
- opt=2/0 produces garbage

EDIT: I just tested it in Microsoft VirtualPC on a fresh, virgin-like install of WindowsXP.
The result was the same:
- opt=1 OK
- opt=2/0 Garbage

Is it possible that my CPU is faulty and processes SSE commands in a wrong way?
Are there programs to check for correct execution of commands (or command sets like SSE)?

tritical

16th September 2007, 07:04

I don't know of any programs to check correct execution of sse, but I also haven't looked for one. The only thing that makes the findCluster sse routine (which is the only one that doesn't work correctly on your computer) different from the other sse routines is that it uses the 'comiss' instruction. The rest of it is almost exactly the same as one of the other routines which works correctly.

Maybe someone else with an athlon xp can test?

Fizick

16th September 2007, 08:05

same bug with my AthlonXP 1800+

tritical

16th September 2007, 10:11

Here are the C/sse routines, maybe someone can see something I can't:
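#include <float.h>  // FLT_MAX

int findCluster_C(const float *input, const float *clusters, const int n)
{
    int idx = 0;            // index of the nearest cluster so far (0 if n <= 0)
    float mdiff = FLT_MAX;  // smallest squared distance seen so far
    for (int i=0; i<n; ++i)
    {
        // squared Euclidean distance between input and cluster i (100 floats each)
        float diff = 0.0f;
        for (int j=0; j<100; ++j)
            diff += (input[j]-clusters[j])*(input[j]-clusters[j]);
        if (diff < mdiff)
        {
            mdiff = diff;
            idx = i;
        }
        clusters += 100;    // advance to the next cluster
    }
    return idx;
}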

__declspec(align(16)) const float sse_floatmax[4] =
{ FLT_MAX, FLT_MAX, FLT_MAX, FLT_MAX };

int findCluster_SSE(const float *input, const float *clusters, const int n)
{
int idx;
__asm
{
xor eax,eax
mov edx,n
mov esi,clusters
movaps xmm7,sse_floatmax
i_loop:
mov edi,input
mov ecx,5
xorps xmm0,xmm0
xorps xmm1,xmm1
twenty_loop:
movaps xmm2,[esi]
movaps xmm3,[esi+16]
movaps xmm4,[esi+32]
movaps xmm5,[esi+48]
movaps xmm6,[esi+64]
subps xmm2,[edi]
subps xmm3,[edi+16]
subps xmm4,[edi+32]
subps xmm5,[edi+48]
subps xmm6,[edi+64]
mulps xmm2,xmm2
mulps xmm3,xmm3
mulps xmm4,xmm4
mulps xmm5,xmm5
mulps xmm6,xmm6
addps xmm1,xmm2
addps xmm3,xmm4
addps xmm5,xmm6
addps xmm0,xmm3
addps xmm1,xmm5
add esi,80
add edi,80
sub ecx,1
jnz twenty_loop
addps xmm0,xmm1
movhlps xmm1,xmm0
addps xmm0,xmm1
movaps xmm1,xmm0
psrlq xmm1,32
addss xmm0,xmm1
comiss xmm0,xmm7
jae check_loop
movss xmm7,xmm0
mov idx,eax
check_loop:
add eax,1
cmp eax,edx
jl i_loop
}
return idx;
}

ARDA

16th September 2007, 13:35

@tritical

First of all, thanks for this contribution. From a quick look (I didn't analyze the code), if I remember correctly,
psrlq xmm1,32 is an SSE2 instruction that is not supported on older SSE-only CPUs. In SSE, all xmm instructions
are floating-point only.
I don't have my papers here, but I am ALMOST sure about that.
Best regards for this project

ARDA

IanB

16th September 2007, 15:51

Yep, psrlq xmm1,32 is an SSE2 instruction.

A convenient reference is distrib/include/SoftWire/InstructionSet.cpp

One of SHUFPS, UNPCKLPS or UNPCKHPS is probably what you want.

Terranigma

16th September 2007, 16:14

I'm loving this filter. It's really fast and does a terrific job when used with yadifmod. :D
I couldn't ask for more. :)

Revgen

16th September 2007, 19:17

I bobbed your sample using yadif with nnedi for spatial prediction. Result: test.avi (http://bengal.missouri.edu/~kes25c/test.avi)

Oops! Looks like I missed this post.

That's not too bad at all for Yadif. I'll try it out myself later.

tritical

16th September 2007, 19:35

Thank you ARDA and IanB. I replaced psrlq with shufps. The funny thing is I originally added movaps/psrlq to replace pshufd so that it wouldn't require SSE2.
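For readers following along, the horizontal add at the end of findCluster can be written with SSE1 operations only. The sketch below uses C intrinsics rather than inline asm and is not the code from the fixed nnedi.dll; the function name hsum4 is made up for illustration, and a compiler is free to pick different instructions than the ones named in the comments. A scalar fallback is used when SSE is not available at compile time.

```c
#include <assert.h>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* SSE1 intrinsics only, no SSE2 header needed */
#define HAVE_SSE1 1
#endif

/* Sum the four floats at v (no alignment required). */
static float hsum4(const float *v)
{
#ifdef HAVE_SSE1
    __m128 x   = _mm_loadu_ps(v);            /* [v0 v1 v2 v3]                */
    __m128 hi  = _mm_movehl_ps(x, x);        /* like movhlps: [v2 v3 v2 v3]  */
    __m128 s   = _mm_add_ps(x, hi);          /* lanes 0,1 = v0+v2, v1+v3     */
    /* like shufps: broadcast lane 1, replacing the SSE2-only psrlq shift */
    __m128 odd = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1));
    s = _mm_add_ss(s, odd);                  /* lane 0 = (v0+v2) + (v1+v3)   */
    return _mm_cvtss_f32(s);
#else
    return (v[0] + v[2]) + (v[1] + v[3]);    /* same summation order         */
#endif
}
```

The shuffle does the job that the 32-bit right shift did: it moves the second partial sum into lane 0 so that a single scalar add finishes the reduction.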

I put up a new version at the same link as before. scharfis or Fizick, could you test it when you have time?

scharfis_brain

16th September 2007, 19:46

@tritical: it works this way now and it is much faster!
Many thanks!

Chainmax

16th September 2007, 19:54

Revgen, could you try to include TDeint+NNEDI+TMM on your comparison?

Revgen

16th September 2007, 20:33

Hmm... I didn't know about TMM until you mentioned it. I'll try it out as soon as my other encode is finished.

Revgen

17th September 2007, 06:06

Okay I checked out TDeint+TMM+NNEDI. The good news is that it rivals MVBob (with either EEDI2 or NNEDI in the script) in terms of quality and stability. The bad news is that it's about as slow as MVBob too. And this is with Threads=2 enabled for NNEDI. It doesn't come close to MCBob though, regardless of whether MCBob is using NNEDI or not.

I wonder if Tritical would be interested in adding an Emask parameter to Yadifmod.

It would be nice to see what result we get with Yadif combined with NNEDI and TMM.

tritical

17th September 2007, 07:35

If you were going to use tmm/nnedi you would get the same output as using tdeint+tmm+nnedi... there wouldn't be anything for yadif to do. It doesn't matter anyways, because yadif doesn't use a motion mask like tmm outputs. Yadif doesn't make a straight weave or don't weave decision. It starts with the spatial prediction, and then limits that value to be within 'diff' of the weaved prediction (average of pixels from the prev and next fields). 'diff' is calculated from temporal differences and spatial differences.
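The limiting step described above can be sketched per pixel like this. It is a simplified illustration, not yadif's source: the function and parameter names are made up, the rounding of the average is an assumption, and the calculation of 'diff' from temporal and spatial differences is left out and simply passed in.

```c
#include <assert.h>

/* Illustrative only:
 *   spatial    - spatially interpolated value (for yadifmod, taken from the
 *                user supplied clip, e.g. nnedi output)
 *   prev, next - co-located pixels from the previous and next fields
 *   diff       - allowed deviation from the weaved prediction */
static int yadif_limit(int spatial, int prev, int next, int diff)
{
    int weave = (prev + next) >> 1;      /* weaved (temporal) prediction */
    if (spatial > weave + diff)
        return weave + diff;
    if (spatial < weave - diff)
        return weave - diff;
    return spatial;                      /* already within range */
}
```

With diff at 0 the output collapses to the weaved prediction, and with a large diff the spatial prediction passes through untouched, so there is never a hard weave or don't weave switch.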

There is one obvious improvement that can be made to yadif, and that is to slide the temporal window. Right now it is basically a five field check that checks only the middle case... so, for example, it will never output the weaved prediction if the center field (the one being turned into a frame) is within 2 fields (ahead or back) of a scenechange. The only downside is the added computational complexity. Making it check all five cases is on my list of things to do.

2Bdecided

17th September 2007, 10:39

Thanks for more toys to play with!

What's the difference, algorithmically, between NNEDI and EEDI2? (Apart from EEDI2 wanting the fields, and NNEDI throwing one field away from a frame?)

Should I stop using EEDI2 and start using pointresize.NNEDI?

Cheers,
David.

tritical

18th September 2007, 09:42

In terms of the basic operation, EEDI2 and NNEDI do the same thing. They just get there in different ways... EEDI2 copies every line of the input frame to every other line of the output frame and then interpolates the missing pixels. NNEDI just starts by throwing out every other line of the input frame and interpolates the missing pixels.

Algorithmically, NNEDI is a computational intelligence approach using artificial neural networks and clustering. Whereas EEDI2 uses a vector matching method to create a direction map, does some processing of the direction map, and then does linear interpolation along the determined directions. The main advantage of NNEDI is that it isn't limited to outputting the average of two pixels (one from the line above and one from the line below) like EEDI2 is. This allows it to handle conditions that EEDI2's interpolation can't, and is also the reason it can eliminate what Revgen called "Blur Bubbles," which EEDI2 produces. Atm, there are still some things EEDI2 handles better, but I'm confident NNEDI can best it on those things as well. There is still a lot of experimenting to be done as far as NNEDI is concerned.
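To illustrate the "average of two pixels along a direction" limitation, here is a minimal C sketch of directional linear interpolation. It is not EEDI2's code: the function names, the sign convention for dir, and the border clamping are assumptions made for the example, and how the direction map is built is not shown.

```c
#include <assert.h>

/* Clamp a column index to the valid range [0, width-1]. */
static int clamp_index(int i, int width)
{
    return i < 0 ? 0 : (i >= width ? width - 1 : i);
}

/* Interpolate the missing pixel at column x from the line above and the
 * line below, along a direction given as a horizontal offset 'dir'
 * (0 = straight vertical). The result is always the rounded average of
 * exactly two source pixels, one from each neighbouring line. */
static unsigned char directional_average(const unsigned char *above,
                                         const unsigned char *below,
                                         int width, int x, int dir)
{
    int xa = clamp_index(x + dir, width);
    int xb = clamp_index(x - dir, width);
    return (unsigned char)((above[xa] + below[xb] + 1) >> 1);
}
```

Whatever the direction, the output can never fall outside the range spanned by the two chosen pixels, which is the restriction a learned predictor is not bound by.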

Should I stop using EEDI2 and start using pointresize.NNEDI?
You should use whichever one looks best to you :p.

Chainmax

18th September 2007, 22:59

yup

19th September 2007, 09:58

Hi tritical!
:thanks:
Can I use this plugin to calculate the pelclip for MVAnalyse (MVTools plugin)? Where do I need to use src_left=0.25 and src_top=0.25, in the first pointresize or the second?
Please advise on the right way.

With kind regards yup.

tritical

19th September 2007, 23:28

tritical, I used EEDI2 mostly for antialiasing and picture improvement on blocky sources (reconnecting edges). How do you expect NNEDI to behave on such cases? Also, does pointresize have a final image quality advantage over other resizing methods when pairing it with NNEDI or is it just a processing speed choice?
I would expect nnedi to work pretty much the same as EEDI2, but there is only one way to find out. Pointresize is the only resizing method that will work because the original pixels need to be kept intact. Basically, you just need a method that will copy the existing rows of pixels to every other line (even lines if field=1 or odd lines if field=0) of the height doubled input into nnedi. The point resize method copies to both, so it works for both field=0/1.

Can I use this plugin to calculate the pelclip for MVAnalyse (MVTools plugin)? Where do I need to use src_left=0.25 and src_top=0.25, in the first pointresize or the second?

If I understand the documentation correctly, mvtools actually wants a shifted clip (left/up). So you can use the code from the first post, but with field set so that the image always ends up shifted left and up:

function nnediresize2x(clip c, bool pY, bool pU, bool pV)
{
v = c.nnedi(dh=true,Y=pY,U=pU,V=pV,field=1).turnleft()
v = v.nnedi(dh=true,Y=pY,U=pU,V=pV,field=0).turnright()
return v
}

function nnediresize_YUY2(clip c)
{
cy = c
cu = c.utoy()
cv = c.vtoy()
cy = nnediresize2x(cy,true,false,false)
cu = nnediresize2x(cu,true,false,false)
cv = nnediresize2x(cv,true,false,false)
return ytouv(cu,cv,cy)
}

function nnediresize_YV12(clip c)
{
return nnediresize2x(c,true,true,true)
}

Call either nnediresize_YUY2 or nnediresize_YV12 depending on the colorspace, or you could make a wrapper function which checks the colorspace and chooses the right one automatically.

IanB

20th September 2007, 06:21

Hint: To double the height fast: Interleave(last,last).AssumeFieldBased().Weave()

foxyshadis

20th September 2007, 06:51

That's actually faster than pointresize? o.O?

Fastest of all would seem to be the way eedi2 does it internally, which is just copying every line of source into every other line of output. (With suitable simd, which eedi2 doesn't have.) I'm not actually concerned so much about speed as about the overhead of making and keeping a copy of something in cache that's just going to be thrown away right after. (I use it for biiiiiiiiig stuff.) I guess MakeWriteable would prevent that.

IanB

20th September 2007, 07:35

Yes internally doing a BitBlt(..., dest_pitch*2, ....) would be twice as fast as the weave I suggested, which does the above blit twice.
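The pitch trick can be sketched in plain C. The helpers below are illustrative stand-ins, not AviSynth's BitBlt or the Weave implementation, and their names are made up. One pass with the destination pitch doubled puts every source row on every other output line; a second pass starting one line lower fills the remaining lines, which corresponds to doing the blit twice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic row copy with independent source and destination pitches. */
static void copy_rows(unsigned char *dst, ptrdiff_t dst_pitch,
                      const unsigned char *src, ptrdiff_t src_pitch,
                      int row_bytes, int rows)
{
    for (int y = 0; y < rows; ++y)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, (size_t)row_bytes);
}

/* Double the height: source row y lands on output lines 2y and 2y+1.
 * The first call alone fills only the even output lines. */
static void double_height(unsigned char *dst, ptrdiff_t dst_pitch,
                          const unsigned char *src, ptrdiff_t src_pitch,
                          int row_bytes, int rows)
{
    copy_rows(dst,             dst_pitch * 2, src, src_pitch, row_bytes, rows);
    copy_rows(dst + dst_pitch, dst_pitch * 2, src, src_pitch, row_bytes, rows);
}
```

An interpolator that overwrites one set of lines anyway only needs the first pass, which is where the factor of two comes from.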

The resizer core does struggle to do a point-resize efficiently, it stupidly goes through the full motion, multiplying every pixel by 1 in a loop of 1 cycle.

tritical

20th September 2007, 07:45

I could add the option to make nnedi do it internally, which would be the fastest. However, it really won't make a noticeable difference since nnedi runs more than 100 times slower than pointresize. On my laptop pointresize 720x480 -> 720x960 runs ~260-280 fps. interleave()/weave() 720x480 -> 720x960 runs ~500-600 fps. nnedi on 720x960 input runs ~1.25 fps. Even on my quadcore the ratio is still > 100 times slower (6 fps vs 750 for point and 1100 for interleave/weave).

tritical

21st September 2007, 00:30

nnedi v1.3 (http://bengal.missouri.edu/~kes25c/nnedi_v1.3.zip). foxyshadis's argument about the cache and ram usage in general convinced me to add an option to internally do the needed copying for doubling the height... so no need to call pointresize anymore. I was also using a separate filter to pad the frames prior to nnedi, and then invoking crop afterwards. That has been done away with as well. I also discovered a bug in the yuy2 padding code, which resulted in occasionally incorrect (+-3) interpolated chroma values at the left and right hand sides of the image.

I updated the code in the first post to use the new 'dh' option instead of pointresize.