NNEDI - intra-field deinterlacing filter [Archive] - Page 5

2Bdecided

11th September 2008, 16:29

Don't fear, it is still being worked on :). I've just been busy working on some other projects the last week or two. A bit of good news is that I got permission to run the training program on my university's 512 cpu cluster, and I'm doing some test runs as I write this. A new version should be ready in a week or two (no promises though :p).
I wonder if it will make it in time for this post's first birthday? ;)

I ask because, in the UK at least, full time Masters degrees usually run for one year. I wonder if Tritical has started, or finished, or if the next we'll see of his idea is if/when it's deployed commercially.

Any chance of an update?

Cheers,
David.

tritical

17th September 2008, 19:10

If I was going full time it would take ~3 semesters (degree requires 30 hours, 9 hours is full time). However, I've only been part time, and then working some (plus I have no desire to give up the college lifestyle right now :)). I should only have one more semester after this one.

Anyways, I'm still working on NNEDI... trying new ideas, etc... It has turned out to be a rather difficult problem (in terms of achieving the type of results I think are possible). If I ever get something significantly better than the current released version working then I will definitely post it. However, so far it's just been small improvements. I'm hoping that the current idea I'm running with will show big improvements.

Adub

17th September 2008, 20:27

Good to hear from you!! I am glad that you are enjoying the college life (as I myself am) and I look forward to your future work with eagerness!

Terka

18th September 2008, 09:57

Will the new nnedi also use temporal information?

2Bdecided

18th September 2008, 19:02

Thanks for the update. I hope your thesis makes it on-line one day. Sadly, many universities still don't encourage this.

Cheers,
David.

tritical

21st September 2008, 02:23

@Terka
It's still spatial only for now. Once that is working well, incorporating temporal information (which will most likely have to involve separate motion compensation) is definitely the next step.

Terka

23rd September 2008, 10:04

Keeping my fingers crossed!

g_aleph_r

21st October 2008, 14:09

Any news about a CUDA version?
I am currently getting 12 fps on a 50 fps video, it is slooow!! :(

Adub

21st October 2008, 17:49

I don't think anyone is actually working on a CUDA version.

g_aleph_r

26th October 2008, 07:55

...Last fall I actually spent some time writing a CUDA implementation to offload part of the calculations used for training (which are pretty much the same ones used during normal operation).
...
Hopefully, the source code for NNEDI will be available by the summer. I actually have a new version ready, I just need a free day to update the code.

Sorry to insist, but I would like to see it even if he says it is not very fast. It's still useful if it leaves the CPU free for other filters.

Terka

26th January 2009, 10:40

Hi tritical, any news regarding new version?

tritical

27th January 2009, 09:49

Not really. I still work on it when I get new ideas, but as it turns out the original formulation of nnedi was pretty good and not that easy to beat.

Dark Shikari

27th January 2009, 09:50

Not really. I still work on it when I get new ideas, but as it turns out the original formulation of nnedi was quite good and not that easy to beat.
Would it work better for resizing if it was trained specifically for resizing instead of for edge-interpolation? And how about, for example, making a version explicitly for cartoons by training on such?

Also, any news on publicizing the algorithm behind this? ;) I have a few ideas, but I'd like to know for sure.

Terka

27th January 2009, 12:55

IMHO users would be grateful if a temporal component were added.

Sagekilla

27th January 2009, 20:30

@Dark Shikari: You mean like giving it the full resolution input then having the algorithm try to optimize towards getting the result sharp like the source, instead of trying to get the edges sharp like the source?

Just a guess, I don't know exactly how NNEDI works.

tritical

29th January 2009, 14:19

@Dark Shikari
Yes, training specifically for resizing would make it better for that, and training specifically for anime would make it better at anime. The idea is actually pretty simple. Use cubic interpolation (or some other fast method) where it won't introduce much error, split remaining pixels into similar groups based on local neighborhood, have one or more neural networks for each group that are trained to output the interpolated pixel value given the local neighborhood as input. Of course there are lots of open questions there... How much data to use, and from how many sources? How to separate local neighborhoods into groups (clustering... what method? operate on raw pixel values? do dimensionality reduction? extract specific features?). How many groups to have? What to feed to the neural networks (raw pixel values? extracted features?). What structure should the neural networks have? How should they be trained? What should the objective function be? How should overfitting be avoided?

The version of nnedi out now used pretty much the simplest methods, and took no steps to avoid overfitting aside from using lots of training data:

k-means clustering with 64 clusters, cluster on raw pixel values of local neighborhood (mean removed), local neighborhood was 4x25 (100 pixels), cluster on ~20-25 million local neighborhoods from progressive frames from ~35-40 sources

One neural network per cluster. Input was raw pixel values (scaled to [-1,1] and mean of local neighborhood removed), and the networks were trained with CMA-ES to minimize squared error. Each neural network had 2 hidden layers w/ 8 neurons apiece, and each neuron used Elliott activation, except that one of the neurons in the first hidden layer used linear activation. The nn had one output neuron with linear activation function, which was connected to both hidden layers. The starting point for training the neural networks was set by solving for the linear least squares weights for the cluster and sticking those into the first layer linear activation neuron (basically the networks started out predicting the linear best fit solution).

I think that is about it, or what I remember at least.
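
For readers who want to see that structure concretely, here is a rough Python/NumPy sketch of one per-cluster network as described above. It is not the actual NNEDI code. The function names, the small random initial weights, the omission of bias terms, and the assumption that the second hidden layer is fed only by the first are all guesses made for illustration; only the layer sizes, the Elliott activation, the single linear neuron and the least squares starting point come from the post.

```python
import numpy as np

def elliott(x):
    """Elliott activation: x / (1 + |x|), a cheap sigmoid-like squashing."""
    return x / (1.0 + np.abs(x))

def init_network(X, y, n_hidden=8, seed=0):
    """Start one cluster's network at its linear least squares fit.

    X: (n_samples, n_inputs) mean-removed, scaled neighborhoods.
    y: (n_samples,) target pixel values (same scaling assumed).
    """
    rng = np.random.default_rng(seed)
    w_ls = np.linalg.lstsq(X, y, rcond=None)[0]
    # Small random weights everywhere else (an assumption, not from the post).
    W1 = rng.normal(0.0, 0.01, size=(n_hidden, X.shape[1]))
    W1[0] = w_ls                     # neuron 0 of layer 1 is the linear one
    W2 = rng.normal(0.0, 0.01, size=(n_hidden, n_hidden))
    w_out = np.zeros(2 * n_hidden)   # output neuron sees both hidden layers
    w_out[0] = 1.0                   # pass the linear neuron straight through
    return W1, W2, w_out

def forward(x, W1, W2, w_out):
    """Predict one pixel from one flattened neighborhood x."""
    a1 = W1 @ x
    h1 = elliott(a1)
    h1[0] = a1[0]                    # linear activation for neuron 0
    h2 = elliott(W2 @ h1)
    return w_out @ np.concatenate([h1, h2])
```

With this initialization the output equals the least squares prediction exactly, so any optimizer (CMA-ES in the post) starts from the linear best fit and can only improve on it in terms of training error.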

@Terka
I have thought about how to include temporal information, but it isn't all that easy. It would require accurate motion compensation, and I think training would be much more complex than spatial only.

Dark Shikari

29th January 2009, 14:29

Would it be possible to make a release of NNEDI that could be trained specifically for whatever purposes I wanted? I have enough CPU power to go for it... :p

Also, how do you recommend training--downscaling sample input, NNEDI, and comparing it to original input? Won't that to some extent lead to an NNEDI that's optimized to a specific downscaling resampler?

Also, a hunch: if you're basing the neural network on neighboring pixels, are you using the differences between the neighboring pixels as well (e.g. (T-L), (LT-T), (TT-T), (LL-L), etc, where T=Top, L=Left, TL=TopLeft, TT=TopTop [two above], etc)? I suspect this might give even better results (testing with FFV1 shows that it gives the best correlation).

(By the way, here's a recent upscale I did with NNEDI and a few other filters: Left is Lanczos, Right is NNEDI (http://www.upimage.us/image-F548_49803665.jpg))

*.mp4 guy

29th January 2009, 15:21

Could you also post the source image?

Dark Shikari

29th January 2009, 15:27

Could you also post the source image?
Linkage (http://i43.tinypic.com/dlgw9l.png)

Script:

image=ImageSource("test.png")
r=image.ShowRed("YV12").nnediresize_YV12().dfttest(sigma=1).fastlinedarken().limitedsharpenfaster()
g=image.ShowGreen("YV12").nnediresize_YV12().dfttest(sigma=1).fastlinedarken().limitedsharpenfaster()
b=image.ShowBlue("YV12").nnediresize_YV12().dfttest(sigma=1).fastlinedarken().limitedsharpenfaster()
MergeRGB(r,g,b)
ConvertToYV12()
AddGrain(1,0.1,0.1)
AddGrain(2,0.2,0.2)
AddGrain(3,0.4,0.4)

AddGrain is for dither/weak noise basically. DFTtest is to deal with the jpeg artifacts from the original (the PNG is converted from an original source JPEG). Separate upscaling for each color channel is because, IMO, it seems to work better.

tritical

29th January 2009, 23:24

Would it be possible to make a release of NNEDI that could be trained specifically for whatever purposes I wanted? I have enough CPU power to go for it...
It's possible, and I have thought about it before (allowing users to give training data). It would take a little work as the training code is scattered among multiple programs.

Also, how do you recommend training--downscaling sample input, NNEDI, and comparing it to original input? Won't that to some extent lead to an NNEDI that's optimized to a specific downscaling resampler?
It will be biased towards that resampler, but is there a better way to do it? In most of the papers I've read they test upsampling by downscaling large images (usually with basic averaging + some sharpening, trying to approximate how various imaging devices work).

Also, a hunch: if you're basing the neural network on neighboring pixels, are you using the differences between the neighboring pixels as well (e.g. (T-L), (LT-T), (TT-T), (LL-L), etc, where T=Top, L=Left, TL=TopLeft, TT=TopTop [two above], etc)? I suspect this might give even better results (testing with FFV1 shows that it gives the best correlation).
I don't do that. Theoretically, it is unnecessary/redundant, as those differences are simply linear combinations of the input variables... so the input layer neurons could learn the same mappings given the original pixel values as input vs if they were given those differences as input. It might make the learning faster though, would have to try.
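
The redundancy argument can be checked numerically. The sketch below is only an illustration: the pixel ordering `[T, L, TL, TT, LL]`, the matrix `D` and the function names are made up for the example. Any first-layer weights applied to the differences can be folded into equivalent weights on the raw pixels.

```python
import numpy as np

# Rows of D turn raw pixels [T, L, TL, TT, LL] into the differences
# suggested in the question (ordering is an assumption for this example).
D = np.array([
    [ 1.0, -1.0, 0.0, 0.0, 0.0],  # T  - L
    [-1.0,  0.0, 1.0, 0.0, 0.0],  # TL - T
    [-1.0,  0.0, 0.0, 1.0, 0.0],  # TT - T
    [ 0.0, -1.0, 0.0, 0.0, 1.0],  # LL - L
])

def layer_on_differences(W_diff, pixels):
    """Pre-activations of a first layer that is fed pixel differences."""
    return W_diff @ (D @ pixels)

def equivalent_raw_weights(W_diff):
    """Weights on raw pixels that give the same pre-activations."""
    return W_diff @ D

rng = np.random.default_rng(0)
W_diff = rng.normal(size=(3, 4))
pixels = rng.uniform(-1, 1, size=5)
same = np.allclose(layer_on_differences(W_diff, pixels),
                   equivalent_raw_weights(W_diff) @ pixels)
print(same)
```

This only shows that the same mapping is representable; whether training finds it faster with one input form or the other is the open question raised in the post.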

Dark Shikari

29th January 2009, 23:28

Also, what about using a metric other than mean squared error? SSIM might be a good one to try for, or perhaps something like x264's psy-RD metric.

MfA

30th January 2009, 15:18

There is no need to simultaneously optimize interpolation and texture synthesis; unlike encoding, there is no gain to be had from reusing artefacts as texture. You can always add texture in a separate pass.

Dark Shikari

30th January 2009, 19:22

There is no need to simultaneously optimize interpolation and texture synthesis
Why not?

MfA

31st January 2009, 20:19

Because "You can always add texture in a separate pass.". With H264 if the noise doesn't get encoded then all you have left is a smoothed result, if slightly misaligning edges etc. allows you to maintain some texture and get a better looking picture that's what you do ... because there is no better alternative. With interpolation you have the luxury to add texture afterwards, so you can concentrate on making the optimal interpolator for features which can be well predicted first (mostly edges).

Weighted predictors are not great texture synthesizers anyway.

tritical

4th February 2009, 07:03

@Dark Shikari
I have tried some other metrics, but with the current algorithm and number of data points it really has to be something completely independent of other pixels (or at least independent of pixels in other clusters) so that the weights for each cluster can be learned separately. SSIM can be computed for a single pixel change, but it doesn't work very well in my experience.

The latest idea I've been working with is to switch from a bunch of separate non-linear regression problems to one classification problem. In other words, switch from learning interpolation functions for a given set of groupings (the clusters learned through k-means combined with euclidean distance metric) to learning the grouping function for 'n' sets of linear interpolation weights.

I initialize k-means as usual (to get the initial groupings), but instead of regrouping the pixels based on euclidean distance to the mean of each cluster, I calculate the linear least squares interpolation coefficients for each cluster. I then reassign pixels to the cluster whose interpolation coefficients give the minimum squared error for that pixel, and keep repeating that until convergence (overall mse stops dropping significantly). I've found that it only needs ~16-32 sets of coefficients to get very nice results (I used about 5 million points from my 740 frame video to cluster, then tested it on the whole video by choosing the best set of weights for each pixel). Now, the problem becomes creating a classifier to choose which set of weights to use.
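
The loop described above can be sketched in a few lines of Python/NumPy. This is not tritical's code: the function name, the stopping tolerance, and the choice to keep a cluster's previous coefficients when it has too few members are assumptions. The initial `labels` are expected to come from an ordinary k-means run, as in the post.

```python
import numpy as np

def cluster_by_regression(X, y, labels, n_clusters, n_iter=50, tol=1e-8):
    """Alternate between per-cluster least squares and reassignment.

    X: (n_samples, n_features) local neighborhoods.
    y: (n_samples,) true pixel values.
    labels: (n_samples,) initial cluster ids, e.g. from k-means.
    """
    labels = np.asarray(labels)
    coeffs = np.zeros((n_clusters, X.shape[1]))
    prev_mse = np.inf
    for _ in range(n_iter):
        # Step 1: solve the linear least squares weights for each cluster.
        for k in range(n_clusters):
            members = labels == k
            if members.sum() >= X.shape[1]:  # otherwise keep old weights
                coeffs[k] = np.linalg.lstsq(X[members], y[members],
                                            rcond=None)[0]
        # Step 2: move every point to the weight set that predicts it best.
        sq_err = (X @ coeffs.T - y[:, None]) ** 2
        labels = sq_err.argmin(axis=1)
        mse = sq_err.min(axis=1).mean()
        # Stop once the overall mse no longer drops significantly.
        if prev_mse - mse < tol:
            break
        prev_mse = mse
    return coeffs, labels, mse
```

Note that the final `labels` use the true pixel value `y`, so they are an oracle assignment. At filter time `y` is unknown, which is exactly why a classifier is needed to pick the weight set.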

MfA

4th February 2009, 19:15

A problem with MSE (and SSIM) is that it heavily punishes outliers, which is fine for a quality metric ... but not good in a classifier.

PS. I find it curious you chose to optimize interpolation without simultaneously optimizing the classifier (ie. optimizing the interpolation weights with an oracle classifier). I would have expected you to optimize both at the same time. What classifier did you end up using?

tritical

4th February 2009, 21:49

As you say, it's possible to iteratively optimize both... train classifier(s) a little, train interpolation coefficients a little (or solve if direct solution exists), keep repeating. Or did you have something different in mind?

I haven't gotten that far yet. I am still testing different classifiers. Main restriction on what can be used is computation time, since it is typically going to be run on ~25-35% of pixels in a frame (~80k-120k for a 720x480 frame). What has worked best is a basic nn trained to select classes by minimizing squared error of the resulting interpolation (if there are 16 classes, then the nn has 16 output neurons and the one with the largest value is the chosen class). Actually, it worked better to not have the nn choose a single class, but to use its outputs as linear combination weights (either after applying softmax activation or normalizing so they sum to 1). However, having it output combination weights makes iterative optimization with the interpolation coefficients more complicated.
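
To make the two options concrete, here is a small Python/NumPy sketch contrasting a hard class choice with softmax-weighted blending. The names `interpolate_hard` and `interpolate_soft` are invented for the example, and `scores` stands in for the classifier network's raw outputs, which are not modeled here.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def interpolate_hard(scores, coeffs, x):
    """Use only the weight set whose output neuron is largest."""
    return coeffs[np.argmax(scores)] @ x

def interpolate_soft(scores, coeffs, x):
    """Blend the predictions of all weight sets with softmax weights."""
    weights = softmax(scores)        # non-negative, sums to 1
    return weights @ (coeffs @ x)    # weighted sum of per-class predictions

# coeffs: (n_classes, n_inputs) linear interpolation weights per class
coeffs = np.array([[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]])
x = np.array([2.0, 4.0])
print(interpolate_hard(np.array([0.1, 2.0, 0.3]), coeffs, x))
print(interpolate_soft(np.array([0.1, 2.0, 0.3]), coeffs, x))
```

The soft version has to evaluate every weight set for every pixel, which is the cost concern for a filter that runs on a large share of each frame.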

MfA

4th February 2009, 23:15

Actually, it worked better to not have the nn choose a single class, but to use its outputs as linear combination weights (either after applying softmax activation or normalizing so they sum to 1). However, having it output combination weights makes iterative optimization with the interpolation coefficients more complicated.
Wouldn't that only make sense if during application of the filter you also use the weighted combination of all predictors? Doesn't seem a realistic option.

By the way, why did you decide to second guess the CMA-ES algorithm? (ie. why not just let it optimize the entire system of both classifiers and predictors in one go.)

madshi

9th November 2009, 13:43

@tritical,

have you checked out iNEDI yet? It seems to be a noticeable improvement over the original NEDI algorithm. Maybe you could implement some of iNEDI's ideas into your NNEDI?

http://www.tecnick.com/pagefiles/appunti/iNEDI_tesi_Nicola_Asuni.pdf

tetsuo55

9th November 2009, 14:42

Looks like even iNEDI has been surpassed:

http://www.comp.leeds.ac.uk/bmvc2008/proceedings/papers/43.pdf

It is also up to 10 times faster than NEDI.

EDIT:
And even ICBI has been surpassed:

http://www.eurasip.org/Proceedings/Eusipco/Eusipco2009/contents/papers/1569192778.pdf

tritical

9th November 2009, 18:56

Learning the interpolation weights based on the low res image works alright for image enlargement, but isn't any good for deinterlacing because too much information is missing. For now I'm more interested in deinterlacing interpolation than enlargement. The iterative energy minimization post-processing described in the ICBI paper is interesting though, and could be useful for deinterlacing. I will try running it on the result of nnedi2/eedi2 and see how it looks.

It also doesn't appear that MEDI > ICBI > iNEDI is always the case based on the results in that last paper. Looks like it depends on the image content, which isn't surprising. It would be interesting to see how nnedi2 compares psnr/ssim wise.

In the future I plan to revisit nedi... mainly because at the time I wrote ediupsizer I didn't have a full understanding of the mathematics/concepts involved. I will definitely keep these papers in mind as well :thanks:.

tetsuo55

9th November 2009, 19:05

It also doesn't appear that MEDI > ICBI > iNEDI is always the case based on the results in that last paper. Looks like it depends on the image content, which isn't surprising. It would be interesting to see how nnedi2 compares psnr/ssim wise.
At first glance I was confused about this too.

But as I read more, it became more and more obvious that the newer ones, and especially MEDI, go for the psychovisually better result, at the cost of some PSNR and SSIM.

Also, I find it very interesting that these algos can be used for deinterlacing.

It would be great to have a universal, MEDI based scaler-deinterlacer that always works, regardless of I or P content

MfA

9th November 2009, 22:19

The extreme staircasing of the image in the MEDI paper for bilinear makes me think they used decimation for downsampling in their tests (which makes the results pretty much irrelevant for normal upsampling).

PS. the gaps from the spokes to the rim are pretty damning, I'm 99% sure they used decimation ... poor show.

tetsuo55

9th November 2009, 22:22

Yeah, I think the top 3 should be tested on real-world moving video.

madshi

13th November 2009, 10:12

For those interested, here's the Clown resampled by ICBI:

http://madshi.net/clownICBI.png

ICBI seems to be a bit soft to me, but on the positive side it looks quite smooth and natural and doesn't seem to add any artifacts (other than those already in the source, like the pole halo in the Clown image).

tetsuo55

13th November 2009, 10:30

I agree that it's very soft.

It appears like most high quality interpolators create a soft image.

But we could always add a sharpener at the end.

EDIT: actually i would describe it as slightly out of focus

Mystery Keeper

14th November 2009, 01:48

I tried to implement ICBI for deinterlacing. It doesn't work well. Well, it does work, but NNEDI2 works better and faster than my Pixel Shader 3 implementation.
@tetsuo55
ICBI is an adjustable method. You can get a sharper image with it if you play with the parameters.

madshi

14th November 2009, 09:45

ICBI is an adjustable method. You can get a sharper image with it if you play with the parameters.
Which parameters did you change in which direction to get a sharper image? Probably choosing sharper parameters comes at the cost of curve smoothness, I guess?

MfA

14th November 2009, 15:36

Softness in an interpolator isn't a bad thing.

There is a difference between interpolation and super-resolution. An interpolator generally conserves the original pixels ... this is fundamentally incorrect if you are trying to reconstruct a non-smoothed higher resolution image. For instance just because a pixel covers an edge in the low resolution image doesn't mean it covers it in the higher resolution one, so mixing colors from both sides of the edge could be fundamentally incorrect for the non smoothed higher resolution image.

To benchmark interpolators you should compare against the smoothed version of the higher resolution image, not the original higher resolution image. Sharpening and texture synthesis are not interpolation.

Which is not to say you couldn't do a single step super resolution algorithm ... it just wouldn't be a pure interpolator and shouldn't retain the original pixels from the low resolution image.

Mystery Keeper

16th November 2009, 04:43

Which parameters did you change in which direction to get a sharper image? Probably choosing sharper parameters comes at the cost of curve smoothness, I guess?
Why, of course it does. The second parameter, beta, is there to limit the smoothing.

absence

8th February 2011, 11:13

I'm toying with the idea of implementing some kind of EDI on a GPU. Not as advanced as NNEDI, more like NEDI or MEDI. The MEDI paper (http://www.eurasip.org/Proceedings/Eusipco/Eusipco2009/contents/papers/1569192778.pdf) claims NEDI only uses the training window shown in figure 4 a (or b), but according to a more practical description of NEDI (http://chiranjivi.tripod.com/EDITut.html), all 4x4 pixels surrounding the purple unknown high resolution pixel dot on the blue grid are used, in a way that leaves the purple dot centred. The MEDI paper and the figures in the NEDI and MEDI papers make it sound like the unknown pixel is off-centre and only 3x3 surrounding pixels are used, while the formulas in both are quite clear about using all the pixels surrounding the unknown one. What am I misunderstanding here? :)

tritical

8th February 2011, 16:04

I had the same discussion with madshi about a year ago about MEDI, and the conclusion was that the MEDI paper is just weird. They talk about NEDI using 1 training window, and it could be implemented that way, but never is. All the implementations I've seen use 16 (4x4) or more training windows around the current point - and in both steps the combination of all of them is always centered about the point to be interpolated. Here is what I wrote to madshi:

I don't understand that paper either. Saying that nedi uses a single training window is weird, but I guess it could be implemented that way. Their point about covariance mismatch is true either way though. If you use all training windows available inside an NxN window (8x8, 16x16 etc...) around the current point, some of those windows may have the same structure (same linear relationship between the predictors and the center pixel) as the real edge and some may not. The ones that don't could cause bad coefficient estimates. My take is they are trying to select the training window that most closely resembles the window being used for interpolation, but their criteria of highest covariance signal energy doesn't seem like the best solution. Anyway, based on the results they report it doesn't seem like their method is much of an improvement over nedi.

absence

8th February 2011, 16:36

I had the same discussion with madshi about a year ago about MEDI, and the conclusion was that the MEDI paper is just weird.

Glad it's not just me. :)

Anyway, based on the results they report it doesn't seem like their method is much of an improvement over nedi.

MEDI does a better job than NEDI at connecting pixels in the spoke images in the paper, but I haven't seen it in action anywhere else and don't know if the difference matters much for "normal" images.

Thanks for the info!

tritical

9th February 2011, 00:40

Like I said, I would not trust the MEDI paper. If you actually implemented what they describe, which is NEDI with a 2x2 window (so you have 4 training cases per pixel, which is still too few) and then select only one training case based on the energy, I would expect very large artifacts in any type of detailed area. The reason being you are using only 1 training case to estimate 4 parameters.
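
The "1 training case for 4 parameters" problem is easy to see with a least squares solver. The sketch below is a generic illustration and not taken from any NEDI implementation: `nedi_weights` is a made-up name, and the random arrays stand in for real training data (each row holding the 4 diagonal neighbors of a known pixel, with that pixel as the target).

```python
import numpy as np

def nedi_weights(neighbors, centers):
    """Least squares estimate of the 4 interpolation weights.

    neighbors: (n_cases, 4) diagonal neighbors of each training pixel.
    centers:   (n_cases,)  the training pixels themselves.
    Returns the weights and the rank of the system.
    """
    w, _, rank, _ = np.linalg.lstsq(neighbors, centers, rcond=None)
    return w, rank

rng = np.random.default_rng(0)
neighbors = rng.uniform(0, 1, size=(16, 4))   # stand-in for a 4x4 window
centers = rng.uniform(0, 1, size=16)

w16, rank16 = nedi_weights(neighbors, centers)         # 16 training cases
w1, rank1 = nedi_weights(neighbors[:1], centers[:1])   # a single case
print(rank16, rank1)
```

With a single training case the system has rank 1, so infinitely many weight vectors fit it exactly and the solver just returns the minimum-norm one. Nothing ties that choice to the real edge direction, which is consistent with the artifacts predicted above.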

henryho_hk

21st February 2011, 13:37

Just curious, is the idea of gamma drift in resizing (http://www.4p8.com/eric.brasseur/gamma.html) relevant to the error calculation in NNEDI training?

tritical

21st February 2011, 20:02

The short answer is yes. I think it would relate to all image quality metrics. It would be interesting to know whether mean squared error, mean absolute error, SSIM, etc. correlate better with human perception when computed on luminance (Y - relative luminance - computed from linear rgb) versus luma (Y' - computed from gamma corrected rgb). Lightness (human perceived brightness) is not linear with respect to relative luminance though. It is roughly linear in Y^(1/3), after normalization by the reference white (cie76).
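
As a concrete reference for the cube-root relation, here is the CIE 1976 lightness formula in Python. The function name and the default reference white of 1.0 are choices made for this example; the formula itself is the standard L* definition, including its linear segment near black.

```python
def cie_lightness(Y, Y_white=1.0):
    """CIE 1976 lightness L* (0..100) from relative luminance Y."""
    t = Y / Y_white                  # normalize by the reference white
    delta = 6.0 / 29.0
    if t > delta ** 3:
        f = t ** (1.0 / 3.0)         # cube-root region
    else:
        f = t / (3.0 * delta ** 2) + 4.0 / 29.0  # linear toe near black
    return 116.0 * f - 16.0

# An 18% grey card lands close to the middle of the lightness scale.
print(cie_lightness(0.18))
```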

As far as interpolation goes, it is not quite as cut and dry as that webpage makes it out to be. Even if you undo gamma correction, there is no guarantee that the function being used for interpolation will more accurately fit (approximate) the underlying data. I could easily generate an image that is a linear ramp after gamma correction... so linear interpolation would give worse results if performed on the linear values. One could argue that such images are unlikely, or that natural images are generally better modeled by standard interpolation functions after removing gamma correction, which I guess is the argument of that webpage and is probably the case.

I should clarify, if you want to compute the average (or weighted average) luminance around a point (i.e. within some area) then you would most definitely want to work with the linear values. Interpolation is a different matter.

pbristow

24th February 2011, 17:14

@tritical: Regarding interpolation, the ramp example isn't that relevant as the differences between the two methods would be negligible when interpolating between roughly similar brightness levels.

Where this effect shows up strongest is when interpolating between a bright pixel (or several of them) and a much darker one (or several). The effect is that small bright details (e.g. stars in the night sky) become much dimmer and less visible than they should, while small dark details (e.g. speckles of dirt on a white sheet) become more obvious than they should.

tritical

24th February 2011, 22:32

Yes, you are correct that when averaging pixel values the difference between 1.) averaging the gamma corrected values directly and 2.) undoing gamma correction, averaging the linear values, and then redoing the gamma correction will be greatest when the values to be averaged are very far apart... and 1 will always result in a smaller final value than 2 (assuming the applied gamma correction factor is < 1.0). However, that is not interpolation. For example if I am given y=1.0 at x=0.0 and y=0.0 at x=1.0, and am asked to give a value for y at x=0.5, without any other knowledge about the function there is no reason to believe that averaging y at x=0.0 and x=1.0 will be anywhere close to the correct value at x=0.5. Now if I know that y is piecewise linear that is another matter. However, I would argue that most images - especially edge areas - are much closer to piecewise constant than piecewise linear... i.e. when you have two neighboring pixels that are very different, rarely in the original continuous (infinite resolution) image would the value directly between them be exactly half the luminance.

Also in my ramp example the difference between neighboring function values (that are linear after gamma correction) could be made infinitely large, resulting in as large a difference as you want. That point was only to say that the accuracy of interpolation will be limited by how well the model you use for interpolation fits the underlying data.

Actually, I did a quick test using the training data for nnedi (primarily directional edges and complex textures) and linear interpolation (so the goal is to predict the pixel value at (x,y) given (x,y+1) and (x,y-1)). Linear interpolation on the linear values (undo gamma correction, interpolate, redo gamma) vs linear interpolation on the gamma corrected values did not result in any significant reduction in absolute error or squared error - both in gamma corrected and linear value space. The same for cubic interpolation.
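
The averaging comparison from earlier in this post (option 1 vs option 2) can be reproduced with a few lines of Python. This is only a sketch: it assumes a pure power-law transfer function with exponent 2.2 rather than a real standard such as sRGB, and the function names are made up.

```python
def decode(v, gamma=2.2):
    """Gamma corrected value (0..1) -> linear light. Pure power law assumed."""
    return v ** gamma

def encode(y, gamma=2.2):
    """Linear light (0..1) -> gamma corrected value."""
    return y ** (1.0 / gamma)

def average_gamma_space(a, b):
    """Option 1: average the gamma corrected values directly."""
    return 0.5 * (a + b)

def average_linear_light(a, b, gamma=2.2):
    """Option 2: undo gamma, average the linear values, redo gamma."""
    return encode(0.5 * (decode(a, gamma) + decode(b, gamma)), gamma)

# Far-apart values: the two results differ a lot.
print(average_gamma_space(1.0, 0.0), average_linear_light(1.0, 0.0))
# Similar values: the two results nearly coincide.
print(average_gamma_space(0.50, 0.52), average_linear_light(0.50, 0.52))
```

For a white and a black pixel, option 1 gives 0.5 while option 2 gives about 0.73, which matches the dimmed-highlights effect pbristow describes. For neighboring values that are already close, the gap is negligible.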

pbristow

25th February 2011, 01:12

Hmmm... perhaps I was assuming too simple a model of interpolation (thinking of the case of bilinear interp as the simplest case, and over-generalising)?

Today I've been looking at various images of faces, many where light is reflected in small highlights off of dark hair, and seeing what happens when I resize the image. Subjectively it does seem that the small highlights are getting dimmed (not just shrinking) during the resize, but then it's probably just a crude bilinear resize (whatever IE8 uses to resize images on webpages). I might do some proper testing tomorrow with various resizers, if I get time.