nnedi3 plugin for VapourSynth
jackoneill
10th November 2012, 18:23
Source: https://github.com/dubhater/vapoursynth-nnedi3
DLLs can be found in the "releases" section (https://github.com/dubhater/vapoursynth-nnedi3/releases).
Please test and report back. Patches are welcome too.
nnedi3_rpow2 can be found here: http://forum.doom9.org/showthread.php?t=172652
kolak
10th November 2012, 22:10
8bit precision only or also 10bit (16) ?
jackoneill
10th November 2012, 22:48
8bit precision only or also 10bit (16) ?
Precision?... If you're asking what bit depths it accepts for input, only 8. (This might change in the future.) Internally it does the same as the avisynth filter. It's mostly the same code.
Revgen
11th November 2012, 00:15
I notice that there is no "Threads" option for this filter like there is for the avisynth filter. So it can't do multi-threading internally?
mandarinka
11th November 2012, 00:59
Nice job!
Reel.Deel
11th November 2012, 01:04
@ jackoneill
Thank you very much for taking time to port NNEDI3, Histogram, and TemporalSoften. Awesome work :).
I'll test them out and report back if I find any discrepancies.
=============================================
@ Revgen
This post (http://forum.doom9.org/showthread.php?p=1591455#post1591455) might answer your question on multi-threading.
jackoneill
11th November 2012, 07:41
I notice that there is no "Threads" option for this filter like there is for the avisynth filter. So it can't do multi-threading internally?
Indeed, there is no internal threading anymore. There is no need for it, because Vapoursynth makes it process several frames in parallel.
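A rough sketch of the idea in plain Python, in case it helps: the filter itself stays single-threaded, and the host runs it on several frames at once. This is not VapourSynth's scheduler; `interpolate_frame` and `process_clip` are made-up names, and frames are just lists of integers.

```python
from concurrent.futures import ThreadPoolExecutor


def interpolate_frame(frame):
    # Hypothetical stand-in for a single-threaded per-frame filter:
    # average each sample with its right-hand neighbour.
    shifted = frame[1:] + frame[-1:]
    return [(a + b) // 2 for a, b in zip(frame, shifted)]


def process_clip(frames, threads=4):
    # The "host" hands out whole frames to worker threads.
    # pool.map returns results in the original frame order.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(interpolate_frame, frames))


if __name__ == "__main__":
    clip = [[0, 10, 20], [2, 4, 6], [8, 8, 8]]
    print(process_clip(clip, threads=2))
```

Python threads will not make this toy faster because of the GIL; the point is only the structure, where parallelism lives in the host rather than in the filter.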
lansing
11th November 2012, 09:46
just ran a speed test against the avisynth nnedi3 on a 720x480 clip, encode with x264 ultrafast preset, the avisynth one's got 30+ fps, and the vs' got only 3.x fps, something must be wrong
jackoneill
11th November 2012, 10:22
just ran a speed test against the avisynth nnedi3 on a 720x480 clip, encode with x264 ultrafast preset, the avisynth one's got 30+ fps, and the vs' got only 3.x fps, something must be wrong
Nothing is wrong. The avisynth version has optimised SSE2 code. Mine doesn't.
You'll get about the same low speed from the avisynth version with opt=1.
jackoneill
16th November 2012, 10:53
Hi!
A new build is available for testing, now with more speed. Use opt=2, but keep the other parameters at their default values, for now (or it will eat your cat). I have only converted a few of the asm functions — those that are used when all parameters are at their default values (except for opt, of course).
opt=2 requires SSE2, obviously.
This new build also requires VapourSynth R15.
Also, it now accepts any colorspace with 8 bits per sample, not just YUV.
https://github.com/dubhater/vapoursynth-nnedi3/downloads
Revgen
16th November 2012, 21:46
Quick test. Played back through vdub using a 1920x1080i mpeg2 source and compared vapoursynth SSE2 version (using version R15) and the latest avisynth version (using an avisynth script) with my Q6850 quadcore cpu & WinXP pro. Vapoursynth version runs most optimally with vs.Core(threads=5). Changing it to threads=6 didn't seem to be much better. Avisynth version worked optimally at threads=4. Vapoursynth version runs about 2-3fps faster than avisynth version.
Vapoursynth script
import vapoursynth as vs
import sys
core = vs.Core(threads=5)
core.avs.LoadPlugin(path=r'C:\dgindex\DGDecode.dll')
core.std.LoadPlugin(path=r'C:\Program Files\VapourSynth\nnedi3-testing.dll')
ret = core.avs.MPEG2Source(r'E:\mympeg2.d2v')
ret = core.nnedi3.nnedi3(clip=ret,opt=2,field=3)
last = ret
17fps-18fps
Avisynth script
mpeg2source("E:\mympeg2.d2v")
nnedi3(opt=2,field=3,threads=4)
15fps-16fps
kolak
16th November 2012, 23:06
Interesting :)
Keiyakusha
16th November 2012, 23:11
Can you also check CPU usage? When we do actual encoding, something like 18fps at 50% CPU load won't make things faster than 10fps at 25% load.
Also, I posted this before but deleted the suggestion because I'm not sure anymore: to utilize vapoursynth's MT better, maybe it is a good idea to add more filters. Like 2 consecutive nnedi or something (but make sure that it doesn't bottleneck elsewhere, like overly high resolution...)
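The arithmetic behind the CPU load remark can be sketched like this. It assumes throughput scales linearly with CPU load, which is a simplification; `fps_at_full_load` is a made-up helper name.

```python
def fps_at_full_load(fps, cpu_load):
    # Naive linear projection: what the filter chain would deliver
    # if it had the whole CPU (cpu_load is a fraction, 0 < cpu_load <= 1).
    return fps / cpu_load


if __name__ == "__main__":
    print(fps_at_full_load(18, 0.50))  # 36.0
    print(fps_at_full_load(10, 0.25))  # 40.0
```

Under that assumption the "slower" 10fps result leaves more room for the encoder than the 18fps one.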
Revgen
16th November 2012, 23:29
Vapoursynth is running at 98-99% load and avisynth is running at 87-88% load.
Revgen
17th November 2012, 00:06
Tried double NNEDI3. Had to bump it up to vs.Core(threads=7) to optimally use vapoursynth this time. Threads=4 was still the most optimal for avisynth. Avisynth CPU load was 90% to 91%. Vapoursynth was 97% to 98%. Avisynth was about .5 to 1fps slower than vapoursynth. Vapoursynth ran 12.5fps-13fps while avisynth ran 12fps to 12.5fps. However, there was a huge discrepancy in memory usage. Avisynth was using about 1gb of RAM before I stopped playback. Vapoursynth was always consistently at 200mb of usage.
Myrsloik
17th November 2012, 00:10
Tried double NNEDI3. Had to bump it up to vs.Core(threads=7) to optimally use vapoursynth this time. Threads=4 was still the most optimal for avisynth. Avisynth CPU load was 90% to 91%. Vapoursynth was 97% to 98%. Avisynth was about .5 to 1fps slower than vapoursynth. Vapoursynth ran 12.5fps-13fps while avisynth ran 12fps to 12.5fps. However, there was a huge discrepancy in memory usage. Avisynth was using about 1gb of RAM before I stopped playback. Vapoursynth was always consistently at 200mb of usage.
The ram difference is just because of different cache strategies. Avisynth caches everything until it bumps into the set memory limit. VS keeps track of the cache request history as well and is decent at detecting linear scans and minimizing cache sizes then.
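A toy illustration of the second strategy, for the curious. This is not VapourSynth's actual cache code; `AdaptiveFrameCache`, its window size and its halving/doubling rule are all invented here to show how "no cache hits for a while" can be read as a linear scan and used to shrink the cache.

```python
from collections import OrderedDict


class AdaptiveFrameCache:
    """Toy LRU cache that shrinks when the recent request history has no hits."""

    def __init__(self, max_size=8, window=16):
        self.max_size = max_size      # upper bound on cached frames
        self.size = max_size          # current allowed number of frames
        self.window = window          # requests per observation window
        self.frames = OrderedDict()   # frame number -> frame, oldest first
        self.hits = 0
        self.requests = 0

    def get(self, n, produce):
        self.requests += 1
        if n in self.frames:
            self.hits += 1
            self.frames.move_to_end(n)
            frame = self.frames[n]
        else:
            frame = produce(n)
            self.frames[n] = frame
        if self.requests == self.window:
            if self.hits == 0:
                # Nothing was reused: looks like a linear scan, so shrink.
                self.size = max(1, self.size // 2)
            else:
                # Frames are being reused: allow the cache to grow again.
                self.size = min(self.max_size, self.size * 2)
            self.hits = self.requests = 0
        while len(self.frames) > self.size:
            self.frames.popitem(last=False)  # evict least recently used
        return frame
```

Requesting frames 0, 1, 2, ... in order never hits the cache, so the allowed size keeps halving, which is the kind of behaviour that would explain a small, steady memory footprint during plain playback.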
Revgen
17th November 2012, 00:22
Okay, I used SetMemoryMax(250) this time. Memory usage is under control and avisynth performance is still the same.
jackoneill
17th November 2012, 13:08
Thank you all for testing. I'm glad to see it's working.
I'm done with the asm. The new build should be fully functional now. opt=2 is default. Please try your favourite combination of exotic parameters.
Reel.Deel
17th November 2012, 13:30
Internally it does the same as the avisynth filter. It's mostly the same code.
If I understand this correctly, this means NNEDI3 works in 16-bit internally, then dithers/rounds/truncates(?) to 8-bit. Correct?
High bit depth input/output would be lovely. :)
*edit*
My previous question (NNEDI3 internal processing) is irrelevant. I (somehow) misinterpreted the quoted statement and thought that NNEDI3 works like avisynth's internal resizing filters. Sorry for being dumb. :o
jackoneill
17th November 2012, 20:08
If I understand this correctly, this means NNEDI3 works in 16-bit internally, then dithers/rounds/truncates(?) to 8-bit. Correct?
High bit depth input/output would be lovely. :)
No idea. Investigating the possibility of high bit depth input and output is on my todo list.
mandarinka
18th November 2012, 01:18
At least some internal operations are done at a higher bit depth, judging by this and the following posts - http://forum.doom9.org/showthread.php?p=1427793#post1427793.
The behaviour (what gets used internally) depends on the fapprox option though.
jackoneill
18th November 2012, 14:18
Could someone please test if these two produce exactly the same output (md5sum or similar)?
import vapoursynth as vs
import sys
core = vs.Core()
core.std.LoadPlugin(path="ffms2.dll")
core.std.LoadPlugin(path="nnedi3.dll")
ret = core.ffms2.Source(source="8-bit-h264.mkv")
ret = core.nnedi3.nnedi3(clip=ret, field=0, opt=1)
ret = core.std.Trim(clip=ret, first=0, last=99)
ret.output(sys.stdout, y4m=False)
ffvideosource("8-bit-h264.mkv")
nnedi3(field=0, opt=1)
trim(0, 99)
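For anyone without md5sum at hand, a small Python sketch that compares two dumped outputs. It assumes both scripts' raw output has already been written to files; the file names in the usage note are placeholders.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    # Hash the file in chunks so large raw video dumps don't need to fit in RAM.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(path_a, path_b):
    # True only if both files are byte-for-byte identical (barring MD5 collisions).
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Usage would be something like `same_output("vs_output.raw", "avs_output.raw")`, with both files containing headerless frames so that container differences don't affect the checksum.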
Chikuzen
18th November 2012, 14:48
Could someone please test if these two produce exactly the same output (md5sum or similar)?
import vapoursynth as vs
core = vs.Core()
core.std.LoadPlugin('G:/vsplugins/vsrawsource.dll')
core.std.LoadPlugin('G:/vsplugins/nnedi3.dll')
clip = core.raws.Source('D:/test_media/derf/soccer.y4m')
clip = core.nnedi3.nnedi3(clip, field=0, opt=1)
clip = core.std.Trim(clip, first=0, last=99)
out = open('nnedi3_test.y4m', 'wb')
clip.output(out, y4m=True)
out.close()
https://dl.dropbox.com/u/19797864/subtract_nnedi3.png
not same output.
Reel.Deel
18th November 2012, 14:54
Just for reassurance, the checksums between the two are indeed different.
jackoneill
18th November 2012, 16:10
All right. Thank you.
Reel.Deel
15th December 2012, 15:45
Hi jackoneill, I recently noticed that GitHub no longer has the downloads section. If it's not too much trouble maybe you can re-upload the dll's some place else?
A couple of weeks ago I was searching along and I came across this NNEDI3 wrapper for VapourSynth (http://www.binrand.com/post/4226552-base-vapoursynth-i-had-write-a-vapoursynth-wrapper-of-nnedi-3-base-on-loren.html) based on akupenguin's implementation (http://forum.doom9.org/showthread.php?p=1427793&highlight=nnedi#post1427793). I was wondering if maybe it would be a good idea to adopt the upscaling part into your ported NNEDI3 plugin? If that's too much work then sorry for bothering. :)
(Sorry for the link, but I couldn't find the original one posted at pastebin.)
jackoneill
15th December 2012, 17:52
Hi jackoneill, I recently noticed that GitHub no longer has the downloads section. If it's not too much trouble maybe you can re-upload the dll's some place else?
Yeah, they axed that feature (https://github.com/blog/1302-goodbye-uploads). I'll probably upload them to uloz.to.
A couple of weeks ago I was searching along and I came across this NNEDI3 wrapper for VapourSynth (http://www.binrand.com/post/4226552-base-vapoursynth-i-had-write-a-vapoursynth-wrapper-of-nnedi-3-base-on-loren.html) based on akupenguin's implementation (http://forum.doom9.org/showthread.php?p=1427793&highlight=nnedi#post1427793). I was wondering if maybe it would be a good idea to adopt the upscaling part into your ported NNEDI3 plugin? If that's too much work then sorry for bothering. :)
As far as I know, tritical's version can do everything akupenguin's can, and more.
Now that fmtconv exists and works in Linux, I can probably recreate nnedi3_rpow2.
Tima
19th December 2012, 22:46
Yeah, they axed that feature (https://github.com/blog/1302-goodbye-uploads). I'll probably upload them to uloz.to.
You can still download binaries from github:
https://github.com/dubhater/vapoursynth-nnedi3/downloads
This method will be working for 90 days after 2012-12-11, according to the statement in their blog post.
jackoneill
28th December 2012, 15:08
import vapoursynth as vs
import sys
core = vs.Core()
core.std.LoadPlugin(r'C:\Program Files (x86)\AviSynth 2.5\plugins\ffms2.dll')
core.std.LoadPlugin(r'C:\Program Files (x86)\VapourSynth\filters\nnedi3.dll')
clip = core.ffms2.Source(r'test.mp4', threads=1)
clip = core.nnedi3.nnedi3(clip, field=0, nsize=0, opt=2)
clip = core.nnedi3.nnedi3(clip, field=0, nsize=4, opt=2)
clip.output(sys.stdout, y4m=True)
The above script will directly crash python without any error message. test.mp4 (http://www.mediafire.com/?x9m9d19pvamy36c) is just a video encoded from ColorBars(pixel_type="YV12") of avs. Setting opt=1 in second nnedi3 call eliminates the crash problem. The strange thing is if I use core.std.BlankClip(format=vs.YUV420P8) as the source clip and keep opt=2 in both nnedi3 calls, it doesn't crash. However it has nothing to do with ffms2, because I have tried d2vsource to load a d2v as source clip and it crashes too.
Sorry about that. I fail at C.
New dll here: http://uloz.to/xSuqyUw/nnedi3-dll. sha256sum: 45e712f038d23718b912b69fa69a8d603329a02bb19701f06873c2e84ef52dad
Please test and let me know if it works.
jackoneill
28th September 2013, 22:04
I added nnedi3_rpow2 (requested by aegisofrime). The readme.rst is updated, and DLLs can be found in the "releases" section (https://github.com/dubhater/vapoursynth-nnedi3/releases) on github.
For 4:2:0 and 4:4:4 (including RGB) the output should be the same as with the Avisynth version. For other formats, maybe it's correct, maybe not.
aegisofrime
29th September 2013, 03:50
I added nnedi3_rpow2 (requested by aegisofrime). The readme.rst is updated, and DLLs can be found in the "releases" section (https://github.com/dubhater/vapoursynth-nnedi3/releases) on github.
For 4:2:0 and 4:4:4 (including RGB) the output should be the same as with the Avisynth version. For other formats, maybe it's correct, maybe not.
Thanks very much! :)
Edit: Whelp, sorry I'm going to need your help again after your hard work adding rpow2. When I load libnnedi3.dll I get an error saying that "Python exception: 'No entry point found in C:\\Program Files (x86)\\VapourSynth\\filters\\libnnedi3.dll'" Is there something that I'm missing here?
jackoneill
29th September 2013, 09:09
Thanks very much! :)
Edit: Whelp, sorry I'm going to need your help again after your hard work adding rpow2. When I load libnnedi3.dll I get an error saying that "Python exception: 'No entry point found in C:\\Program Files (x86)\\VapourSynth\\filters\\libnnedi3.dll'" Is there something that I'm missing here?
Ah, that. It should be fixed now. I replaced the archive.
jackoneill
19th August 2014, 09:49
v2.0 is out, with support for up to 16 bits per sample. There is no asm for this, so it's slower than with 8 bit input. As usual, DLLs can be found at Github (https://github.com/dubhater/vapoursynth-nnedi3/releases).
Basically I hacked the float code path to take 16 bit input. It required working around an overflow and adjusting the prescreener weights for larger pixel values.
Does anyone want to volunteer to write asm?
foxyshadis
19th August 2014, 23:55
Now you're talking my language. Is it compiled to target any particular instruction set?
jackoneill
20th August 2014, 10:09
Now you're talking my language. Is it compiled to target any particular instruction set?
I don't know about the C code. I didn't pass any special options to gcc. The asm requires SSE2.
jackoneill
24th August 2014, 18:25
Here is v2.1 (https://github.com/dubhater/vapoursynth-nnedi3/releases/tag/v2.1), which is ~6 times faster than v2.0 (measured with pscrn=0). No new asm was needed to obtain this speed-up, because the most important asm functions work with floats. It was just a matter of using them.
Mystery Keeper
25th August 2014, 01:41
Great work! ^_^
jackoneill
7th February 2015, 23:59
Since nnedi3 consists of lots of multiply + add operations, why not try FMA3/4? But of course my CPU is a Core 2, so someone else will have to test. Here is a set of DLLs with a few functions modified to use the vfmaddXXXps/vfmaddps instructions:
http://ulozto.net/xs6NnLV5/vapoursynth-nnedi3-v2-1-fma-win32-7z
http://ulozto.net/xC17v6dj/vapoursynth-nnedi3-v2-1-fma-win64-7z
To compile your own, check out the "fma" branch at Github.
Test with a 16 bit clip using the default parameters, or with an 8 bit clip using the default parameters and the following: "pscrn=1, fapprox=12". I'm very curious to see if it gets any faster. (Also if the output looks the same.)
Select FMA3 code with "opt=3", and FMA4 code with "opt=4". The default is still "opt=2".
Myrsloik
8th February 2015, 00:44
A small test on my computer with 12 threads gives:
8bit: 96fps (opt=3) and 80fps (opt=2)
16bit: 71fps (opt=3) and 62fps (opt=2)
Are_
8th February 2015, 02:35
I guess I did nothing wrong:
import vapoursynth as vs
core = vs.get_core()
v = core.d2v.Source(r'VTS_01_1.d2v', nocrop=True, rff=False)
v = core.fmtc.bitdepth(v, bits=16)
v = core.nnedi3.nnedi3(clip=v, field=0, pscrn=1, fapprox=12, opt=2)
v.set_output()
for ((i=1; i<=$1; i++)); do
vspipe test.py -e 2000 - 2>> speed.txt | md5sum >> integrity.txt
sleep 1
done
8bit,opt2
Output 2001 frames in 31.08 seconds (64.38 fps)
Output 2001 frames in 31.19 seconds (64.15 fps)
8bit,opt4
Output 2001 frames in 102.01 seconds (19.62 fps)
Output 2001 frames in 102.06 seconds (19.61 fps)
16bit,opt2
Output 2001 frames in 37.82 seconds (52.91 fps)
Output 2001 frames in 37.34 seconds (53.59 fps)
16bit,opt4
Output 2001 frames in 123.19 seconds (16.24 fps)
Output 2001 frames in 123.99 seconds (16.14 fps)
8bit,opt2
ebf18035a42531bbb518f56225539e90 -
ebf18035a42531bbb518f56225539e90 -
8bit,opt4
9e37a7ff03966a7915fd5c4896a4b53f -
9e37a7ff03966a7915fd5c4896a4b53f -
16bit,opt2
f6e42b870661c88de977fb868a075904 -
f6e42b870661c88de977fb868a075904 -
16bit,opt4
b5b73f2abe2008fbee8cf03f640e0e75 -
b5b73f2abe2008fbee8cf03f640e0e75 -
And indeed the output for opt=4 is wrong, chroma misplacement to the right and interlacing artifacts.
jackoneill
8th February 2015, 09:48
I guess I did nothing wrong:
[...]
And indeed the output for opt=4 is wrong, chroma misplacement to the right and interlacing artifacts.
Oops. There was a mistake in the FMA4 functions. I replaced the links.
Are_
8th February 2015, 13:37
No problems now. The output is not bit-identical, but neither is opt=1 compared with opt=2, and I can't spot a single difference.
16bit2opt
Output 2001 frames in 37.40 seconds (53.50 fps)
Output 2001 frames in 37.48 seconds (53.39 fps)
16bit4opt
Output 2001 frames in 32.51 seconds (61.55 fps)
Output 2001 frames in 32.62 seconds (61.35 fps)
8bit2opt
Output 2001 frames in 31.70 seconds (63.12 fps)
Output 2001 frames in 30.97 seconds (64.61 fps)
8bit4opt
Output 2001 frames in 25.44 seconds (78.67 fps)
Output 2001 frames in 25.32 seconds (79.03 fps)
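To turn timing lines like these into a speed-up figure, something along these lines should work. It only assumes the "Output N frames in X seconds (Y fps)" line format shown above; `mean_fps` and `speedup` are made-up helper names.

```python
import re
from statistics import mean

# Matches lines such as "Output 2001 frames in 37.40 seconds (53.50 fps)".
LINE = re.compile(r"Output (\d+) frames in ([\d.]+) seconds")


def mean_fps(log_text):
    # Recompute fps from frames / seconds for each run, then average the runs.
    rates = [int(m.group(1)) / float(m.group(2)) for m in LINE.finditer(log_text)]
    return mean(rates)


def speedup(baseline_log, candidate_log):
    # Ratio > 1 means the candidate is faster than the baseline.
    return mean_fps(candidate_log) / mean_fps(baseline_log)


if __name__ == "__main__":
    opt2_16bit = """Output 2001 frames in 37.40 seconds (53.50 fps)
Output 2001 frames in 37.48 seconds (53.39 fps)"""
    opt4_16bit = """Output 2001 frames in 32.51 seconds (61.55 fps)
Output 2001 frames in 32.62 seconds (61.35 fps)"""
    print(round(speedup(opt2_16bit, opt4_16bit), 2))
```

For the 16 bit numbers above this comes out at roughly 1.15, so opt=4 is about 15% faster than opt=2 on that machine.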
kolak
18th February 2015, 00:24
Only 25% speed difference between 8 and 16bit- looks like a good result.
So can we have yadifmod now working at 16bit?
Reel.Deel
18th February 2015, 01:04
So can we have yadifmod now working at 16bit?
Yadifmod supports 9-16 bits since r2 (https://github.com/HomeOfVapourSynthEvolution/VapourSynth-Yadifmod/releases).
Edit: also, here's the appropriate thread: http://forum.doom9.org/showthread.php?t=171028
kolak
18th February 2015, 23:38
Heh- I had a long break from avisynth/vapoursynth. I can remember now (I was chasing for it). Great!
MonoS
19th February 2015, 14:03
No problems now. The output is not bit-identical, but neither is opt=1 compared with opt=2, and I can't spot a single difference.
It's expected for the output not to be bit-identical, because an FMA instruction has higher precision than a mul followed by an add [that's because the FMA keeps the intermediate product at full precision and rounds only once, while a separate mul rounds its result to register precision before the add, IIRC]
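A small demonstration of why a fused multiply-add can differ from a separate multiply and add. It uses Python doubles rather than the single-precision floats nnedi3 works with, and emulates the fused operation with exact rational arithmetic followed by one rounding; `fused_mul_add` is an illustrative helper, not a hardware FMA call.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    # Two roundings: a * b is rounded to a double, then the sum is rounded again.
    return a * b + c


def fused_mul_add(a, b, c):
    # One rounding: compute a * b + c exactly, then round the final result once.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# (1 + 2^-30) * (1 - 2^-30) = 1 - 2^-60, which does not fit in a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

if __name__ == "__main__":
    print(mul_then_add(a, b, c))   # 0.0, the 2^-60 term was lost in the multiply
    print(fused_mul_add(a, b, c))  # -2^-60, about -8.67e-19
```

Both answers are "correct" for their respective operations, which is why the FMA builds can produce slightly different pixels without being broken.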
jackoneill
22nd February 2015, 21:12
v3 is here (https://github.com/dubhater/vapoursynth-nnedi3/releases/tag/v3).
* Use FMA (fused multiply-add) instructions in some functions. Speeds up the float paths a bit.
* Add another SIMD function for a little more speed with 16 bit input.
* Automatically select the best functions if opt=True, use only C functions if opt=False.
* Don't embed the weights into the DLLs. "nnedi3 weights.bin" needs to be in the same folder as the DLL.
"nnedi3 weights.bin" can be downloaded from Github: https://github.com/dubhater/vapoursynth-nnedi3/blob/master/src/nnedi3%20weights.bin It's the same file that's embedded in the Avisynth plugin.
There is theoretical support for architectures other than x86 now. Little endian ones. If anyone wants to run nnedi3 on something big endian, that can be arranged too, I think. You'll just have to volunteer to test it a bit.
mawen1250
6th March 2015, 19:33
With 9-16bit input and FMA3 opt, nnedi3 produces corrupt results. From what I observed, the pixels interpolated by the nn predictor are either white or black.
http://i683.photobucket.com/albums/vv197/mawen1250/EP01%20-%20.vpy%20-%2020921_zps9dby6ecb.jpg
jackoneill
9th March 2015, 20:00
Here is v4 (https://github.com/dubhater/vapoursynth-nnedi3/releases/tag/v4).
* Fix copy-paste error in the FMA functions
* Rename "nnedi3 weights.bin" to "nnedi3_weights.bin"
Due to an unforeseen limitation of Automake, the weights file has been renamed. v4 expects the new name.
The broken output reported in the post above is fixed.
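If the plugin fails to load after upgrading, a quick check like the following may help spot a missing or misnamed weights file. This is a user-side sketch, not the plugin's own loading code; `find_weights` is a made-up helper, and the only facts it relies on are the two file names and the "same folder as the DLL" rule mentioned in this thread.

```python
from pathlib import Path

# File name expected by v4 and later, followed by the name used by v3.
WEIGHTS_NAMES = ("nnedi3_weights.bin", "nnedi3 weights.bin")


def find_weights(dll_path):
    # Look for a weights file sitting next to the plugin DLL.
    folder = Path(dll_path).parent
    for name in WEIGHTS_NAMES:
        candidate = folder / name
        if candidate.is_file():
            return candidate
    return None
```

If this returns the old name with a space in it while v4 is installed, renaming the file to nnedi3_weights.bin should be all that is needed.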
jackoneill
23rd April 2015, 20:08
Here is v5 (https://github.com/dubhater/vapoursynth-nnedi3/releases/tag/v5).
* Adjust the frame durations when doubling the frame rate.
* Fix buffer overflow with images wider than 8192 pixels or so (inherited from the Avisynth plugin).
* Hopefully prevent crashes with images that require more than 2 GiB per plane.
* Refuse to create clips longer than INT_MAX (often 2**31-1).
* Use the _FieldBased frame property to determine each frame's field order, for sources where it changes.
If the _FieldBased property is present in the input clip and you need to force a different field order, use std.ModifyFrame (http://www.vapoursynth.com/doc/functions/modifyframe.html) to overwrite the _FieldBased property.
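Two of the items in the list above (frame durations and the clip length limit) boil down to simple arithmetic, sketched here in plain Python. This is not the plugin's code; `doubled_clip_info` is a made-up name, INT_MAX is assumed to be 2**31-1, and the duration is treated as a numerator/denominator pair in the spirit of VapourSynth's _DurationNum and _DurationDen frame properties.

```python
from fractions import Fraction

INT_MAX = 2**31 - 1  # assumption: 32-bit int, as on common platforms


def doubled_clip_info(num_frames, duration_num, duration_den):
    # Doubling the frame rate doubles the frame count...
    out_frames = num_frames * 2
    if out_frames > INT_MAX:
        raise ValueError("resulting clip would be longer than INT_MAX frames")
    # ...and halves the duration of each frame.
    duration = Fraction(duration_num, duration_den) / 2
    return out_frames, duration.numerator, duration.denominator


if __name__ == "__main__":
    # 100 frames at 30000/1001 fps become 200 frames at 60000/1001 fps.
    print(doubled_clip_info(100, 1001, 30000))  # (200, 1001, 60000)
```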