nnedi2/nnedi3/eedi3 - Page 14

aegisofrime · 24th August 2010, 08:27

Thanks for the update, tritical! I use NNEDI2 on a daily basis (mainly with TGMC), so having an NNEDI3 that is competitive speed-wise would be great!

Terka · 24th August 2010, 09:05

Tritical,
thank you for all your work!
Will these speed improvements be added to nnedi2 as well?

LoRd_MuldeR · 24th August 2010, 10:17

Quote:

Originally Posted by tritical

I had also been experimenting with minimizing abs error instead of squared error, the last image I posted of the castle was using weights optimized for abs error. Since all that takes is me waiting for models to train I might add abs error optimized weights at some point.

I wonder: Did you ever try other metrics than squared/absolute error, like SSIM for example? Or is the "sliding window" approach not suitable for your algorithm?

(BTW: Looking forward to any speed optimization in NNEDI3)

tritical · 24th August 2010, 17:45

I did try SSIM, but it's hard to effectively use it within the online gradient descent training scheme because the derivative for a single pixel depends on all of the other pixels within the gaussian weighted window around it. That means I can't calculate a single pixel result and then do gradient descent on the model weights unless I assume a value for all of the missing pixels (those which need to be interpolated by the network) and pre-compute the necessary statistics beforehand (so that during training I only have to modify a few values based on the current result to compute the gradient). What I tried was assuming all of the missing values were perfect... that gave results very close to mse training. The one idea I didn't try was starting by assuming all of the missing values were interpolated using cubic interpolation, and after each iteration (after presenting all of the training cases to the network) recalculating all of the ssim statistics that depend on the missing pixels based on the current state of the network. Like a lot of things I plan to try that in the future.

Now if you're not using online gradient descent (say CMA-ES or a fitness function based optimization) you could just run through the training set, interpolate all the necessary pixels, compute ssim for everything at the end. I actually tried that as well, but it takes A LOT more cpu time. Using online gradient descent I can train the 6x48x256 network on 60 million pixels in only a few days on my quadcore desktop. For nnedi2 I was using separable CMA-ES for training (using the scheme I just described) and to train on 1/5 the pixels was taking about 3x longer while using about 4x the cpu power. And nnedi2 is basically equivalent to 4x12x32 in nnedi3 terms. Those are rough numbers of course.
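For readers unfamiliar with the term, the "online" scheme described above updates the weights after every single training pixel instead of after a full pass over the data. The following is a minimal sketch on a toy linear predictor, not tritical's network or training code; the function name `online_gradient_descent`, the learning rate, and the data layout are all assumptions made for illustration. It also shows the only place where squared error and absolute error differ, namely the derivative of the loss with respect to the output.

```python
import random

def online_gradient_descent(samples, n_inputs, loss="squared",
                            lr=0.01, epochs=100, seed=0):
    """Train a toy linear predictor, updating weights after every sample.

    samples: list of (inputs, target) pairs. This stands in for
    (neighborhood pixels, true missing pixel); it is NOT nnedi3's network.
    loss: "squared" for (y - t)^2, anything else for |y - t|.
    """
    rng = random.Random(seed)
    w = [0.0] * n_inputs
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # present the training cases in random order
        for i in order:
            x, t = samples[i]
            y = sum(wi * xi for wi, xi in zip(w, x))  # model output
            err = y - t
            if loss == "squared":
                g = 2.0 * err                         # d/dy (y - t)^2
            else:
                g = float((err > 0) - (err < 0))      # subgradient of |y - t|
            for k in range(n_inputs):
                w[k] -= lr * g * x[k]                 # chain rule: dy/dw_k = x_k
    return w
```

With noise-free targets the squared-error variant converges to the generating weights, while the absolute-error variant takes fixed-size steps and ends up hovering within roughly one step size of them. A fitness-function optimizer such as CMA-ES would instead need a full pass over the training set for every candidate set of weights, which is consistent with the much higher CPU cost reported above.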

LoRd_MuldeR · 24th August 2010, 19:15

Time to start a nnedi@home project

akupenguin · 25th August 2010, 11:04

Quote:

Originally Posted by tritical

The one idea I didn't try was starting by assuming all of the missing values were interpolated using cubic interpolation, and after each iteration (after presenting all of the training cases to the network) recalculating all of the ssim statistics that depend on the missing pixels based on the current state of the network. Like a lot of things I plan to try that in the future.

Surely the correct algorithm involves computing the partial derivatives of ssim wrt all of its inputs, not treating the neighboring pixels as constants?

Quote:

For nnedi2 I was using separable CMA-ES for training (using the scheme I just described) and to train on 1/5 the pixels was taking about 3x longer while using about 4x the cpu power.

Is that measuring just speed of convergence, or weighting by quality of solutions found?

tritical · 25th August 2010, 16:14

Quote:

Surely the correct algorithm involves computing the partial derivatives of ssim wrt all of its inputs, not treating the neighboring pixels as constants?

I compute the partial of ssim with respect to the network output, which does depend on neighboring pixel values... some of which are fixed, but some of which depend on the network. I assumed fixed values (from the very beginning of the optimization) for those pixels as well, as a rough, fast approximation. Recalculating the statistics depending on those pixels after each (or every few) complete pass through the training patterns using the current state of the network is more desirable, but I haven't done it yet.

EDIT: The partial of ssim with respect to the network output is only for ssim of the 11x11 gaussian weighted window centered on the current pixel. I'm not explicitly taking into account other ssim windows that depend on the network output at this point.
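To make the approximation concrete, here is a hedged sketch of the partial derivative of SSIM for a single Gaussian-weighted window with respect to one pixel of the predicted window, with every other pixel held constant. It is not tritical's code. The Gaussian sigma of 1.5 and the constants `c1` and `c2` are the conventional SSIM defaults for 8-bit data and are assumptions here (the thread only states the 11x11 window size); the function names are made up for illustration.

```python
import math

def gaussian_window(size=11, sigma=1.5):
    """Normalized 2-D Gaussian weights, flattened row by row."""
    half = size // 2
    w = [math.exp(-((r - half) ** 2 + (c - half) ** 2) / (2.0 * sigma ** 2))
         for r in range(size) for c in range(size)]
    total = sum(w)
    return [v / total for v in w]

def ssim_and_center_grad(x, y, w, center,
                         c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM of one window and its partial derivative w.r.t. x[center].

    x: predicted window (flattened), y: reference window, w: weights.
    All entries of x other than x[center] are treated as constants.
    """
    mu_x = sum(wi * xi for wi, xi in zip(w, x))
    mu_y = sum(wi * yi for wi, yi in zip(w, y))
    var_x = sum(wi * xi * xi for wi, xi in zip(w, x)) - mu_x ** 2
    var_y = sum(wi * yi * yi for wi, yi in zip(w, y)) - mu_y ** 2
    cov = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) - mu_x * mu_y

    # SSIM = (a1 * a2) / (b1 * b2)
    a1, a2 = 2 * mu_x * mu_y + c1, 2 * cov + c2
    b1, b2 = mu_x ** 2 + mu_y ** 2 + c1, var_x + var_y + c2
    ssim = (a1 * a2) / (b1 * b2)

    # Derivatives of each factor with respect to x[center].
    wc, xc, yc = w[center], x[center], y[center]
    da1 = 2 * mu_y * wc
    da2 = 2 * wc * (yc - mu_y)
    db1 = 2 * mu_x * wc
    db2 = 2 * wc * (xc - mu_x)
    grad = (da1 * a2 + a1 * da2) / (b1 * b2) - ssim * (db1 / b1 + db2 / b2)
    return ssim, grad
```

In this sketch the window statistics `mu_x`, `var_x`, and `cov` are where the other interpolated pixels enter. Assuming fixed values for those pixels means the sums can be precomputed once, and only the terms involving `x[center]` change during training. As the EDIT notes, this covers only the window centered on the current pixel; the same pixel also contributes to every overlapping window, which this sketch ignores as well.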

Quote:

Is that measuring just speed of convergence, or weighting by quality of solutions found?

Time to equal quality of solution, approximately. In all of my tests gradient descent achieved equal or lower training error in significantly less time. Of course given more time cmaes might do better, but for my time/cpu point gradient descent was much better.

akupenguin · 26th August 2010, 12:17

BTW, modes with more weights than fit in L1d cache are bottlenecked by cache misses, not arithmetic throughput. I didn't rectify this (except insofar as int16 reduces cache footprint too), but you probably want to if you're tuning for nsize>8x6.
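A back-of-the-envelope check of the cache argument, under loudly stated assumptions: one weight per input pixel plus one bias per neuron, a 32 KiB L1 data cache, and "6x48x256" read as a 48x6 window with 256 neurons. None of these figures come from the nnedi3 source, and the real weight layout may differ (for example, by storing more than one weight set per neuron), so treat the numbers as order-of-magnitude only.

```python
L1D_BYTES = 32 * 1024  # assumed L1 data cache size; varies by CPU

def weight_footprint_bytes(xdia, ydia, nns, bytes_per_weight):
    """Rough size of one weight set: (inputs + 1 bias) per neuron."""
    return (xdia * ydia + 1) * nns * bytes_per_weight

def fits_in_l1d(xdia, ydia, nns, bytes_per_weight, l1d_bytes=L1D_BYTES):
    """True if the rough footprint is no larger than the assumed L1d size."""
    return weight_footprint_bytes(xdia, ydia, nns, bytes_per_weight) <= l1d_bytes

if __name__ == "__main__":
    for xdia, ydia in [(8, 6), (48, 6)]:
        for name, size in [("float32", 4), ("int16", 2)]:
            n = weight_footprint_bytes(xdia, ydia, 256, size)
            print(f"{xdia}x{ydia}, 256 neurons, {name}: {n} bytes, "
                  f"fits in L1d: {fits_in_l1d(xdia, ydia, 256, size)}")
```

Under these assumptions an 8x6 window with 256 neurons needs about 49 KiB as float32 but about 24.5 KiB as int16, so halving the weight size is what brings it under 32 KiB. A 48x6 window with 256 neurons is far larger than L1d in either format.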

Forteen88 · 28th August 2010, 13:20

@tritical: Could you please make a comparison of NNEDI3, with the same source as these:
http://forum.doom9.org/showthread.ph...68#post1343668
Thanks

Archimedes · 29th August 2010, 11:42

Quote:

Originally Posted by tritical

@Archimedes

What do you think of this enlargement?

nnedi3_rpow2 (rfactor=4, nsize=0, nns=4, qual=2, cshift="Spline36Resize"), 1600x1200

Looks much better than with the current version of NNEDI3 using the same parameters.

yup · 30th August 2010, 10:43

Hi all!
Could you please explain how the sclip parameter works? Is it used as the first iteration when finding the solution?
yup.

Skauneboy · 5th September 2010, 15:03

I can't get EEDI3 to work with this AA-function:

Code:

o=last
AssumeBFF().SeparateFields()
dbl   = mt_Average(SelectEven().EEDI3(field=0),SelectOdd().EEDI3(field=1),U=3,V=3)
dblD  = mt_MakeDiff(o,dbl,U=3,V=3)
shrpD = mt_MakeDiff(dbl,dbl.RemoveGrain(11),U=3,V=3)
DD    = shrpD.Repair(dblD,13)
dbl.mt_AddDiff(DD,U=3,V=3)

Repair gives the error "clips must be of equal type". It works fine with your other interpolation filters though.

Didée · 5th September 2010, 16:33

Oops, I don't have eedi3.dll at hand ... could you try whether it works with the following change?

dbl = mt_Average(SelectEven().EEDI3(field=0),SelectOdd().EEDI3(field=1),U=3,V=3).AssumeFrameBased()

Skauneboy · 5th September 2010, 18:56

Thanks for the suggestion Didée but it didn't work. Something awry with the field-parameter of EEDI3? Or perhaps Repair.dll is at fault.

Didée · 5th September 2010, 19:43

Ah, reading documentation helps. EEDI3 works framebased like NNEDIx, not fieldbased like EEDI2. Try like so:

Code:

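o=last
AssumeBFF() # should not be needed ...
EEDI3(field=-2)                                       # frame-based, double-rate: one interpolation per field parity
dbl   = merge( selecteven(), selectodd() )            # average the two interpolations
dblD  = mt_MakeDiff(o,dbl,U=3,V=3)                    # difference: original vs. interpolated
shrpD = mt_MakeDiff(dbl,dbl.RemoveGrain(11),U=3,V=3)  # difference: interpolated vs. its 3x3 blur
DD    = shrpD.Repair(dblD,13)                         # limit the sharpening difference by dblD
dbl.mt_AddDiff(DD,U=3,V=3)                            # add the limited difference back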

Skauneboy · 6th September 2010, 21:34

Ah, that did the trick.

Chainmax · 12th September 2010, 20:34

I did some more comparisons using Gabriel Knight's "Making Of" video:

eedi3_rpow2(rfactor=2,cshift="spline36resize",hp=false)

nnedi3_rpow2(rfactor=2,nsize=3,nns=4,qual=2,cshift="spline36resize")

eedi3_rpow2(alpha=0.3,beta=0,rfactor=2,cshift="spline36resize",hp=true,vcheck=3)

Original screen batch here. In my opinion, for this particular filter chain and this particular source, the first version of EEDI3 gives the most pleasing results.

markanini · 13th September 2010, 01:39

Thanks for the comparison, Chainmax. I'd go with nnedi3 myself in this case, but I wouldn't argue with your preference for eedi3, as it's a matter of taste.

I'm eagerly awaiting tritical's next nnedi3 release.

Usedocne · 13th September 2010, 13:57

Thanks also, Chainmax. I couldn't choose between them though. Each is good in different areas (parts of the mic, side of the face) but struggles in others. They are pretty much equal imho, as far as weighing up the positive and negative effects goes.

tritical · 21st September 2010, 03:10

So I finally got back to working on this, and almost have the next nnedi3 release ready... but one question first. Does anyone using this not have an SSE2 capable processor? I'm considering dropping support for SSE/MMX since it's a pain to keep around and test.
