Old 22nd September 2017, 11:39   #1  |  Link
HolyWu
Registered User
Join Date: Aug 2006
Location: Taiwan
Posts: 500
nnedi3cl

https://github.com/HomeOfVapourSynth...Synth-NNEDI3CL

Here is another OpenCL variant of a popular filter. As usual, some benchmarks follow, measured with "vspipe -p test.vpy .". My CPU is an E3-1231 v3 and my GPU is a GTX 660. Two sample videos were used for benchmarking: test1 is quite unfriendly to the prescreener, while test2 is very friendly to it.

vpy
Code:
import vapoursynth as vs
core = vs.get_core()
core.max_cache_size = 3072

clip = core.lsmas.LibavSMASHSource('test1.mp4')
#clip = core.lsmas.LibavSMASHSource('test2.mp4')
#clip = core.resize.Bicubic(clip, format=vs.YUV420P16)

# deinterlace
#clip = core.nnedi3.nnedi3(clip, field=1)
#clip = core.nnedi3cl.NNEDI3CL(clip, field=1)

# enlarge
#clip = core.std.Transpose(clip).nnedi3.nnedi3(field=1, dh=True, nsize=4, nns=3).std.Transpose().nnedi3.nnedi3(field=1, dh=True, nsize=4, nns=3)
#clip = core.nnedi3cl.NNEDI3CL(clip, field=1, dh=True, dw=True, nsize=4, nns=3)

clip.set_output()
deinterlace-test1
Code:
YUV420P8:
nnedi3:   19.66 fps
nnedi3cl: 35.82 fps

YUV420P16:
nnedi3:   13.12 fps
nnedi3cl: 32.68 fps
deinterlace-test2
Code:
YUV420P8:
nnedi3:   98.34 fps
nnedi3cl: 89.53 fps

YUV420P16:
nnedi3:   59.34 fps
nnedi3cl: 71.07 fps
enlarge-test1
Code:
YUV420P8:
nnedi3:   6.60 fps
nnedi3cl: 7.72 fps

YUV420P16:
nnedi3:   4.99 fps
nnedi3cl: 7.44 fps
enlarge-test2
Code:
YUV420P8:
nnedi3:   28.82 fps
nnedi3cl: 34.85 fps

YUV420P16:
nnedi3:   16.48 fps
nnedi3cl: 29.17 fps
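
To compare the figures above at a glance, the ratio of nnedi3cl to nnedi3 throughput can be computed per test. This is only a sketch: the fps values are copied from the results above, while the RESULTS layout and the speedup helper are illustrative names and not part of either plugin.

```python
# fps values copied from the benchmark results in this post.
# Key: (test name, pixel format); value: (nnedi3 fps, nnedi3cl fps).
RESULTS = {
    ("deinterlace-test1", "YUV420P8"): (19.66, 35.82),
    ("deinterlace-test1", "YUV420P16"): (13.12, 32.68),
    ("deinterlace-test2", "YUV420P8"): (98.34, 89.53),
    ("deinterlace-test2", "YUV420P16"): (59.34, 71.07),
    ("enlarge-test1", "YUV420P8"): (6.60, 7.72),
    ("enlarge-test1", "YUV420P16"): (4.99, 7.44),
    ("enlarge-test2", "YUV420P8"): (28.82, 34.85),
    ("enlarge-test2", "YUV420P16"): (16.48, 29.17),
}


def speedup(cpu_fps, opencl_fps):
    """Return the OpenCL/CPU throughput ratio; above 1 means OpenCL was faster."""
    return opencl_fps / cpu_fps


ratios = {key: speedup(*fps) for key, fps in RESULTS.items()}

for (test, fmt), ratio in ratios.items():
    print(f"{test:<18} {fmt:<10} {ratio:.2f}x")
```

With these numbers, the only case where the ratio drops below 1 is the 8-bit deinterlace of test2.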

Last edited by HolyWu; 19th October 2017 at 05:01.
Old 22nd September 2017, 15:38   #2  |  Link
Myrsloik
Professional Code Monkey
Join Date: Jun 2003
Location: Ikea Chair
Posts: 1,876
Interesting... Maybe I'll try similar benchmarks on my computer, which has very different hardware.

Btw, why didn't you use the built in resize for the bitdepth conversion?
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
Old 22nd September 2017, 16:46   #3  |  Link
poisondeathray
Registered User
 
Join Date: Sep 2007
Posts: 3,582
Thanks.

Is there an "easy" way to toggle between the NNEDI3 and NNEDI3CL versions when testing functions built on them? Maybe some Python magic I'm not aware of, since I'm a Python newbie?

For example, haf.QTGMC presumably calls the nnedi3.nnedi3 version; would I need to find and replace all instances, or something like that?
Old 22nd September 2017, 17:13   #4  |  Link
TheFluff
Excessively jovial fellow
 
Join Date: Jun 2004
Location: rude
Posts: 1,049
Code:
plain_nnedi = core.nnedi3.nnedi3
core.nnedi3.nnedi3 = core.nnedi3cl.NNEDI3CL
If these were regular Python functions this would work just fine, but I don't know about VS's Cython modules; try it and see. It should work, though: you can assign to (and overwrite) built-in standard library functions in Python if you want.
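
A minimal sketch of the rebinding idea, using plain Python stand-ins only. The core object here is a types.SimpleNamespace and the two filter functions are hypothetical placeholders, so this says nothing about whether VapourSynth's own plugin objects accept attribute assignment. The pick_nnedi3 helper is an assumed alternative that avoids patching altogether.

```python
from types import SimpleNamespace


# Hypothetical stand-ins for the two filters; NOT real VapourSynth plugins.
def cpu_nnedi3(clip, field, **kwargs):
    return ("nnedi3", clip, field)


def opencl_nnedi3(clip, field, **kwargs):
    return ("NNEDI3CL", clip, field)


core = SimpleNamespace(
    nnedi3=SimpleNamespace(nnedi3=cpu_nnedi3),
    nnedi3cl=SimpleNamespace(NNEDI3CL=opencl_nnedi3),
)

# Option 1: rebind the attribute, keeping a reference to the original.
plain_nnedi = core.nnedi3.nnedi3
core.nnedi3.nnedi3 = core.nnedi3cl.NNEDI3CL
patched_result = core.nnedi3.nnedi3("clip", field=1)
core.nnedi3.nnedi3 = plain_nnedi  # restore the original binding


# Option 2: choose the callable explicitly instead of patching.
def pick_nnedi3(core, use_opencl):
    """Return the OpenCL or CPU callable depending on the flag."""
    return core.nnedi3cl.NNEDI3CL if use_opencl else core.nnedi3.nnedi3


selected_result = pick_nnedi3(core, use_opencl=False)("clip", field=1)
```

Option 2 needs the calling code to accept a callable, but it does not depend on the plugin namespace being writable.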
Old 22nd September 2017, 17:22   #5  |  Link
poisondeathray
I'm getting some issues with build program failure

Win8.1 x64, VapourSynth R39test4.

Code:
Failed to evaluate the script:
Python exception: NNEDI3CL: Build Program Failure
:1:3129: error: incompatible pointer types passing '__local float *' to parameter of type 'const __local float (*)[95]'
:1:161: note: passing argument to parameter 'input' here
:1:4825: error: incompatible pointer types passing '__local float *' to parameter of type 'const __local float (*)[95]'
:1:161: note: passing argument to parameter 'input' here

error: front end compiler failed build.
This happens with a simple script. The source is UT Video 4:2:0, 720x480. With the NNEDI3CL line commented out, it works OK.
Code:
import vapoursynth as vs
core = vs.get_core()

clip = core.avisource.AVISource(r'PATH\test.avi', pixel_type="yv12")
#clip = core.nnedi3cl.NNEDI3CL(clip, field=1, dh=True, dw=True)
clip.set_output()

Last edited by poisondeathray; 22nd September 2017 at 17:34.
Old 23rd September 2017, 04:19   #6  |  Link
HolyWu
Quote:
Originally Posted by Myrsloik View Post
Btw, why didn't you use the built in resize for the bitdepth conversion?
It's simply more convenient for me to test whether the filter works correctly at different bit depths without having to type a full format constant like YUVxxxPxx. Furthermore, only YUV444 has constants defined for 32 bits, so it complains that module 'vapoursynth' has no attribute 'YUV420PS32' and the like. I bet most users won't even bother to use register_format.

Quote:
Originally Posted by poisondeathray View Post
I'm getting some issues with build program failure
Please try https://www.nmm-hd.org/upload/get~gS...NNEDI3CL-r3.7z and see whether it works.
Old 23rd September 2017, 07:06   #7  |  Link
poisondeathray
Quote:
Originally Posted by HolyWu View Post
Please try https://www.nmm-hd.org/upload/get~gS...NNEDI3CL-r3.7z and see whether it works.
This one works

What was the issue ? / What is different with this build ?
Old 23rd September 2017, 10:11   #8  |  Link
HolyWu
Quote:
Originally Posted by poisondeathray View Post
This one works

What was the issue ? / What is different with this build ?
The issue was a type mismatch between the function parameter and the argument.

Updated r3 on GitHub. The binaries are the same as the file linked in my previous post.
Old 24th September 2017, 05:39   #9  |  Link
aegisofrime
Registered User
 
Join Date: Apr 2009
Posts: 453
Interesting. Did you manage to measure power draw at the wall? I'm curious whether the increase in power consumption (if any) is worth the gain in speed.
Old 30th September 2017, 19:04   #10  |  Link
DJATOM
Registered User
 
Join Date: Sep 2010
Location: Ukraine, Bohuslav
Posts: 126
I can't get it working on NVIDIA Quadro 600 and Windows Server 2012 R2 - https://pastebin.com/Sya8BVKQ
Is that a driver issue? I'm not familiar with OpenCL.
Upd.: On another server (same OS) on NVIDIA GeForce GTX 550 Ti I get the same errors.

Last edited by DJATOM; 30th September 2017 at 19:29.
Old 1st October 2017, 13:07   #11  |  Link
HolyWu
Quote:
Originally Posted by DJATOM View Post
I can't get it working on NVIDIA Quadro 600 and Windows Server 2012 R2 - https://pastebin.com/Sya8BVKQ
Is that a driver issue? I'm not familiar with OpenCL.
Upd.: On another server (same OS) on NVIDIA GeForce GTX 550 Ti I get the same errors.
The filter uses features not present in OpenCL 1.1 and below, hence the minimum requirement is OpenCL 1.2.
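
As a rough illustration of that requirement, a device's reported version string can be checked against 1.2. The OpenCL specification gives the device version string the form "OpenCL <major>.<minor> <vendor-specific information>". Everything else here is an assumption: the helper names are made up, the sample strings are illustrative rather than copied from any card in this thread, and querying the string from a real device (which needs an OpenCL binding) is not shown.

```python
import re

MIN_OPENCL = (1, 2)  # minimum version stated above


def parse_opencl_version(version_string):
    """Return (major, minor) from a string such as 'OpenCL 1.2 <vendor info>'."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_string, minimum=MIN_OPENCL):
    """True when the reported version is at least the required one."""
    return parse_opencl_version(version_string) >= minimum


# Illustrative strings only; a real script would read these from the device.
for sample in ("OpenCL 1.1 CUDA", "OpenCL 1.2 CUDA", "OpenCL 2.0 AMD-APP"):
    print(sample, "->", "ok" if meets_minimum(sample) else "too old")
```

Comparing the versions as integer tuples avoids the pitfalls of comparing them as strings or floats.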
Old 1st October 2017, 18:02   #12  |  Link
DJATOM
Yes, I replaced the 550 Ti with a 760 and it works.
Old 2nd October 2017, 18:43   #13  |  Link
Myrsloik
All tests performed with:
Code:
vspipe script.vpy . -p
Bit depth conversion was done with the internal resize.

Threadripper 1950X
Sapphire Fury Tri-X
3200 CL14 RAM

All tests were run with 32 threads (the default), except for enlarge on the CPU, where 16 threads for some reason performed substantially better (a 7-10 fps difference). The source was 3000 frames from a typical 1080p TV series episode.

deinterlace1:
Code:
YUV420P8:
nnedi3:   352.14 fps
nnedi3cl: 45.68 fps

YUV420P16:
nnedi3:   164.40 fps
nnedi3cl: 40.42 fps
enlarge:
Code:
YUV420P8:
nnedi3:   65.22 fps
nnedi3cl: 10.47 fps

YUV420P16:
nnedi3:   32.03 fps
nnedi3cl: 9.90 fps

Last edited by Myrsloik; 2nd October 2017 at 18:46.
Old 14th October 2017, 11:41   #14  |  Link
Mystery Keeper
Beyond Kawaii
Join Date: Feb 2008
Location: Russia
Posts: 687
Tried to replace NNEDI3 with NNEDI3CL in QTGMC.
Got this error:
Code:
2017-10-14 13:41:42.021
Failed to evaluate the script:
Python exception: NNEDI3CL: Build Program Failure
:1:528: error: expected identifier or '('
:42:23: note: expanded from here
#define SCALE_ASIZE 0,003472
^
:1:528: error: expected ';' at end of declaration
:42:23: note: expanded from here
#define SCALE_ASIZE 0,003472
^
:1:564: error: expected identifier or '('
:42:23: note: expanded from here
#define SCALE_ASIZE 0,003472
^
:1:564: error: expected ';' at end of declaration
:42:23: note: expanded from here
#define SCALE_ASIZE 0,003472
^
:1:1854: warning: expression result unused


Traceback (most recent call last):
File "src\cython\vapoursynth.pyx", line 1830, in vapoursynth.vpy_evaluateScript (src\cython\vapoursynth.c:36860)
File "D:/video-to-process/Takaradzuka - Phantom 2004/phantom-temp.vpy", line 92, in <module>
deint = haf.QTGMC(weaved, Preset="Placebo", EdiMode="eedi3+nnedi3", ChromaEdi="", EdiQual=2, NNeurons=4, NNSize=3, SubPel=4, SubPelInterp=2, BlockSize=8, Overlap=4, TFF=True, **qtgmcArguments)
File "D:\vapoursynth-plugins\py\havsfunc.py", line 1104, in QTGMC
edi1 = QTGMC_Interpolate(ediInput, InputType, EdiMode, NNSize, NNeurons, EdiQual, EdiMaxD, bobbed, ChromaEdi, TFF)
File "D:\vapoursynth-plugins\py\havsfunc.py", line 1390, in QTGMC_Interpolate
sclip=core.nnedi3cl.NNEDI3CL(Input, field=field, planes=planes, nsize=NNSize, nns=NNeurons, qual=EdiQual))
File "src\cython\vapoursynth.pyx", line 1722, in vapoursynth.Function.__call__ (src\cython\vapoursynth.c:35000)
vapoursynth.Error: NNEDI3CL: Build Program Failure
:1:528: error: expected identifier or '('
:42:23: note: expanded from here
#define SCALE_ASIZE 0,003472
^
:1:528: error: expected ';' at end of declaration
:42:23: note: expanded from here
#define SCALE_ASIZE 0,003472
^
:1:564: error: expected identifier or '('
:42:23: note: expanded from here
#define SCALE_ASIZE 0,003472
^
:1:564: error: expected ';' at end of declaration
:42:23: note: expanded from here
#define SCALE_ASIZE 0,003472
^
:1:1854: warning: expression result unused
__________________
...desu!
Old 19th October 2017, 05:25   #15  |  Link
HolyWu
Update r4 & r5.
  • Fix decimal-point character issue in different locales.
  • Add old & new prescreener.
  • Use snprintf to convert floating point to string for a more precise value, because to_string only writes six digits after the decimal point.
  • Change filter mode to completely parallel execution.
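
The precision point in the list above can be illustrated in Python, whose "%f" formatting also defaults to six digits after the decimal point. This is a sketch under stated assumptions: the value 1/288 is hypothetical (chosen only because it prints as 0.003472 at six digits), format_macro is a made-up helper, and none of this is the plugin's actual C++ code. Python's own float formatting does not follow the locale, so the comma seen in the failing "#define SCALE_ASIZE 0,003472" is described here rather than reproduced.

```python
def format_macro(name, value):
    """Build a '#define' line with a '.' decimal point and full precision."""
    # repr() of a float always uses '.' and gives the shortest string
    # that converts back to exactly the same double.
    return "#define {} {}".format(name, repr(float(value)))


value = 1.0 / 288.0  # hypothetical constant, for illustration only

six_digits = "%f" % value  # six digits after the decimal point
full_precision = repr(value)

# The six-digit string loses information; the repr() string does not.
lossy = float(six_digits) != value
exact = float(full_precision) == value

print(six_digits)
print(format_macro("SCALE_ASIZE", value))
```

Text that is going to be compiled as source code needs both properties: a fixed "." separator regardless of locale, and enough digits to round-trip the value.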

The benchmark in the first post has been revised.

Quote:
Originally Posted by Mystery Keeper View Post
Tried to replace NNEDI3 with NNEDI3CL in QTGMC.
Got this error:
Code:
2017-10-14 13:41:42.021
Failed to evaluate the script:
Python exception: NNEDI3CL: Build Program Failure
:1:528: error: expected identifier or '('
:42:23: note: expanded from here
#define SCALE_ASIZE 0,003472
Thanks for the report. It's caused by the different decimal-point character used in some locales. Please try the latest release again.
Old 19th October 2017, 05:34   #16  |  Link
HolyWu
Quote:
Originally Posted by Myrsloik View Post
Threadripper 1950X
The result with Threadripper is really superb. I wonder whether it mostly benefits from the advertised neural net prediction and smart prefetch, besides having so many threads available.
Old 22nd October 2017, 18:15   #17  |  Link
HolyWu
Code:
clip = core.ffms2.Source('lena512.bmp').std.Loop(1000)
#clip = core.nnedi3.nnedi3(clip, field=1, pscrn=2)
#clip = core.znedi3.nnedi3(clip, field=1, pscrn=2)
Code:
nnedi3: Output 1000 frames in 6.70 seconds (149.24 fps)
znedi3: Output 1000 frames in 71.07 seconds (14.07 fps)
I have a feeling that the code path selection is wrong and it falls back to the pure C functions.

Anyway, I personally have no particular preference for CPU or GPGPU. This is simply provided as an alternative. Users can choose whichever is faster on their own hardware.
Old 23rd October 2017, 14:08   #18  |  Link
HolyWu
Quote:
Originally Posted by Stephen R. Savage View Post
I updated with more support for legacy CPUs. Link.
It's leaking memory. Use a higher-resolution clip to see the memory usage grow quickly.
Old 23rd October 2017, 18:21   #19  |  Link
HolyWu
Code:
#clip = core.ffms2.Source('lena512color.tiff').std.Loop(2000)
#clip = core.ffms2.Source('test1.mp4')
#clip = core.ffms2.Source('test2.mp4')

#clip = core.nnedi3.nnedi3(clip, field=1, pscrn=2)
#clip = core.znedi3.nnedi3(clip, field=1, pscrn=2)
#clip = core.nnedi3cl.NNEDI3CL(clip, field=1, pscrn=2)
lena512color.tiff
Code:
nnedi3:   Output 2000 frames in 14.52 seconds (137.71 fps)
znedi3:   Output 2000 frames in 9.08 seconds (220.17 fps)
nnedi3cl: Output 2000 frames in 17.65 seconds (113.28 fps)
test1.mp4
Code:
nnedi3:   Output 1250 frames in 63.31 seconds (19.75 fps)
znedi3:   Output 1250 frames in 40.80 seconds (30.64 fps)
nnedi3cl: Output 1250 frames in 34.97 seconds (35.75 fps)
test2.mp4
Code:
nnedi3:   Output 2184 frames in 21.82 seconds (100.10 fps)
znedi3:   Output 2184 frames in 21.07 seconds (103.66 fps)
nnedi3cl: Output 2184 frames in 23.73 seconds (92.02 fps)
Benchmarking with a static picture is probably not that realistic, and both my CPU and GPU are a bit old. Someone with a CPU and GPU released last year, or even better this year, may get more representative results.
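
For anyone collecting their own numbers, summary lines in the format shown above can be parsed and compared with a few lines of Python. This is a sketch only: the regular expression is inferred from the lines quoted in this post, the function and variable names are illustrative, and the sample data is the test1.mp4 result copied from above.

```python
import re

# Pattern inferred from the summary lines quoted in this post.
SUMMARY_RE = re.compile(r"Output (\d+) frames in ([\d.]+) seconds \(([\d.]+) fps\)")


def parse_summary(line):
    """Return (frames, seconds, fps) parsed from one summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    return int(match.group(1)), float(match.group(2)), float(match.group(3))


# test1.mp4 results copied from this post.
TEST1 = {
    "nnedi3": "Output 1250 frames in 63.31 seconds (19.75 fps)",
    "znedi3": "Output 1250 frames in 40.80 seconds (30.64 fps)",
    "nnedi3cl": "Output 1250 frames in 34.97 seconds (35.75 fps)",
}

parsed = {name: parse_summary(line) for name, line in TEST1.items()}
fastest = max(parsed, key=lambda name: parsed[name][2])

for name, (frames, seconds, fps) in parsed.items():
    # Recomputing from the rounded seconds can differ slightly from the
    # reported fps, which was presumably computed before rounding.
    print(f"{name:<9} reported {fps:.2f} fps, recomputed {frames / seconds:.2f} fps")
print("fastest:", fastest)
```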
Old 23rd October 2017, 18:49   #20  |  Link
poisondeathray
There seem to be some qualitative differences between them too. Why would that be?
All times are GMT +1. The time now is 10:16.