5th January 2011, 23:34 | #1 | Link |
Registered User
Join Date: Sep 2002
Location: Germany
Posts: 352
|
NLMeansCL: GPU based Non Local Means Denoising
Hi,
I would like to introduce a new filter for AviSynth: NLMeansCL. The filter is my take on the NLMeans algorithm. Tritical already wrote TNLMeans in 2006, which is also an implementation of the NLMeans algorithm. (Thanks for your work, tritical!) In contrast to tritical's implementation, which is written in C++ and runs on the CPU, my implementation is written in OpenCL and (typically) runs on the GPU. The wrapper around the OpenCL algorithm is written in C#.

Note: I will update this post to reflect all changes to the filter. The most recent modifications will be marked in blue.

I was only able to test the filter on my NVIDIA GeForce 9600 GT. Therefore I cannot guarantee that it runs on your GPU, or that it won't crash or kill your PC! (Not that I think it would...)

Syntax:
NLMeansCL(int A, int Ay, int Az, int S, int Sy, int B, int By, float aa, float h, float hC, int plane, bool debug, string debugpath, string smf, bool cpu, bool buffer, bool sse)

If you want some background information about the NLMeans algorithm, I'd like to point you to the very well written readme.txt that is part of tritical's TNLMeans filter package. He also explains all the parameters of the filter in detail.
Running the filter on CPUs:
To run the filter on the CPU, you have to install the ATI Stream SDK. Furthermore, you have to set two parameters: cpu=true and buffer=true. The combination cpu=true and buffer=false will fail to execute, since AMD has not yet implemented image support in the CPU version of its OpenCL drivers!

The support for buffers is preliminary. That means I'm undecided whether I will keep or remove it in future versions of the filter. If AMD adds image support to its drivers (either CPU or GPU), there is not much reason to leave it in the filter.

NLMeansCL will take full advantage of multiple CPU cores. There is no need to use MT() or SetMTMode()! (It would rather degrade performance.)

When forced to use buffers instead of images, the fps on my GeForce drops from 23.05 to 3.90.

Here's what the content of the debug log file looks like if the filter initializes correctly:
Code:
NLMeansCL Version 0.3.2
ScriptEnvironment present.
Number of OpenCL Compute Platforms = 2.
Trying OpenCL Compute Platform NVIDIA Corporation.
OpenCL 1.0 CUDA 3.2.1.
Number of OpenCL Devices in Platform = 1.
Trying OpenCL Device GeForce 9600 GT.
Device available.
Wrong Device Type (Gpu) requesting Cpu.
Trying OpenCL Compute Platform Advanced Micro Devices, Inc..
OpenCL 1.1 ATI-Stream-v2.3 (451).
Number of OpenCL Devices in Platform = 1.
Trying OpenCL Device Intel(R) Core(TM)2 CPU 4400 @ 2.00GHz.
Device available.
Device Type Cpu.
Device does not support images.
Using Device Intel(R) Core(TM)2 CPU 4400 @ 2.00GHz.
OpenCL Compute Context successfully created.
OpenCL Command Queue successfully created.
OpenCL Program successfully built.
Prog Y Build log:
Prog UV Build log:
OpenCL kernels successfully created.

1. The NLMeansCL filter DLL itself: link and attachment at the end of my post. Put it in your AviSynth plugin folder. And don't rename it!
2. CLOO: a .NET library for OpenCL. Needed to run NLMeansCL. You can download it here: http://sourceforge.net/projects/cloo/ Take the Cloo.dll file from \bin\release inside the zip file and put it in your AviSynth plugin folder.
3. AvsFilterNet: a .NET library for writing AviSynth filters. Needed to run NLMeansCL. You can download it here: http://avsfilternet.codeplex.com/ Take the AvsFilterNet.dll and put it in your AviSynth plugin folder.
That's it.

Performance:
I measured some figures on my system:
CPU: Core2Duo, running at 3.2GHz
GPU: NVIDIA GeForce 9600GT, 512MB GDDR3, 650MHz core / 900MHz memory / 1600MHz shader, not overclocked

I typically get a speed improvement of a factor of 18 to 25 compared to TNLMeans. For example:
Video: 720x576, YV12
Parameters: A=4, S=2, B=1
TNLMeans: 0.98 fps
NLMeansCL: 23.93 fps (cpu=false, buffer=false)
NLMeansCL: 3.90 fps (cpu=false, buffer=true)
NLMeansCL: 1.40 fps (cpu=true, buffer=true)

The speed factor between NLMeansCL and TNLMeans is similar for different video sizes (e.g. 1920x1080 or 360x288), as well as for different filter parameters (B=0, By=0). On my GPU, the implementation does NOT benefit if you use values of 2 or above for B/By! I have some explanations for this behaviour, but it would lead too far to go into them here...

I'm highly interested in seeing performance figures for different GPUs, as well as feedback on whether it runs on different graphics cards. A typical script to test the performance would be the following. Load the script in VirtualDub and check the 'video rendering rate' in the status window.
Code:
mpeg2source(...)
trim(0, 1)        # keep only frames 0 and 1
assumefps(500)
# each line below quadruples the clip: 2 -> 8 -> 32 -> 128 -> 512 frames
last = last + last + last + last
last = last + last + last + last
last = last + last + last + last
last = last + last + last + last
NLMeansCL(A=4, S=2, B=1, aa=1.0, h=1.8, plane=4)

My findings so far are that the default values for A, S and B work very well! Typically there is no improvement from setting A or S higher. It only gets a lot slower! IMHO there is no need to change aa to something other than 1.0. Playing around with h and hC is sufficient.

Problems:
If you have any problems with the filter, especially if it doesn't work at all, please use GPU Caps Viewer (http://www.ozone3d.net/gpu_caps_viewer/) and first check whether the included OpenCL demos run! Then go to the tab named 'Tools' and send me the 'Full XML Export'! There's an extra button for it on the tab. Also, please send me the log file that NLMeansCL creates! I cannot guarantee to help you out quickly, since I'm rather busy!

Version 0.4.0 alpha:
This version is only a preliminary version (created in January) that implements a temporal version of the algorithm. Currently, it only supports a temporal window of 1 or 2 frames (in both directions). The temporal mode is only implemented for the image-based algorithm, not the buffer-based one. On my PC, the algorithm produces some non-deterministic artefacts in the video that are visible as small blocks of completely black pixels. I assume some timing problem / lack of synchronization between the shaders and the write-back of the memory to the host PC. I haven't worked on the algorithm for months now, and this will probably be the status for the rest of the summer. I also have a version for arbitrary values of Az, but I'm not satisfied with the results (it's too slow and the computed values are incorrect).

TODOs:
- (Better) temporal mode
- Make NLMeansCL work on AMD graphics cards
- x64 version
- Other color spaces (YUY2, RGB)

Changelog:
Changes from v0.1 to v0.1.1
Changes from v0.1.1 to v0.1.2
Changes from v0.1.2 to v0.2
Changes from v0.2 to v0.2.1
Changes from v0.2.1 to v0.2.2
Changes from v0.2.1 to v0.3
Changes from v0.3 to v0.3.1
Changes from v0.3.1 to v0.3.2
v0.4.0 alpha
Download latest version:
v0.3.2: http://www.mediafire.com/?q4butkseucz9tin
v0.3.2 sources: http://www.mediafire.com/?l3swlzu2pm3375l
v0.4.0 alpha: http://www.mediafire.com/?9osy86a14u0qxr6

Malcolm
Last edited by Malcolm; 2nd September 2011 at 22:23. Reason: Added Version 0.3.2 |
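The debug log in the first post, together with the remark that cpu=true with buffer=false fails, suggests a simple selection rule: walk the platforms, skip devices of the wrong type, and skip devices without image support unless buffers are requested. The following Python sketch only illustrates that rule. The `Device` class and `select_device` function are hypothetical names made up for this example and are not taken from the NLMeansCL sources; the device data is copied from the log.

```python
# Sketch of the device-selection rule implied by the debug log above.
# All names are hypothetical; this is not NLMeansCL's actual code.
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    kind: str              # "Gpu" or "Cpu", as printed in the log
    available: bool
    supports_images: bool


def select_device(platforms, cpu=False, buffer=False):
    """Return the first device matching the cpu/buffer flags, else None."""
    wanted = "Cpu" if cpu else "Gpu"
    for devices in platforms.values():
        for dev in devices:
            if not dev.available or dev.kind != wanted:
                continue   # cf. "Wrong Device Type (Gpu) requesting Cpu."
            if not buffer and not dev.supports_images:
                continue   # the image-based path needs image support
            return dev
    return None


# The two platforms that appear in the log of the first post.
platforms = {
    "NVIDIA Corporation": [Device("GeForce 9600 GT", "Gpu", True, True)],
    "Advanced Micro Devices, Inc.": [
        Device("Intel(R) Core(TM)2 CPU 4400 @ 2.00GHz", "Cpu", True, False)
    ],
}

print(select_device(platforms).name)                         # default: GPU, images
print(select_device(platforms, cpu=True, buffer=True).name)  # CPU with buffers
print(select_device(platforms, cpu=True, buffer=False))      # no usable device
```

With the data above, the last call finds no device, which matches the statement that cpu=true with buffer=false cannot run on a CPU driver without image support.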
6th January 2011, 00:05 | #2 | Link |
warpsharpened
Join Date: Feb 2007
Posts: 787
|
Avisynth is telling me that NLMeansCL_netautoload.dll is not an avisynth 2.5 plugin.
Edit: Am I supposed to load this differently from other plugins, i.e. with something other than a simple LoadPlugin("X:\path\to\filter.dll")? Last edited by TheRyuu; 6th January 2011 at 00:09. |
6th January 2011, 00:13 | #3 | Link |
Registered User
Join Date: Sep 2002
Location: Germany
Posts: 352
|
Huh?
You don't have to load it explicitly. AvsFilterNet does that for you if the filename ends with _netautoload.dll and the file resides in the same folder. If you remove the suffix, you can load it manually with LoadPlugin(). You still need AvsFilterNet.dll, though (I guess). Malcolm |
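The naming convention described in the post above can be written down as a small check. This is only a sketch of the rule as stated in the thread: the helper names are hypothetical, the case-insensitive comparison is an assumption, and AvsFilterNet's real loader may apply further checks.

```python
# Hypothetical helpers illustrating the "_netautoload.dll" naming rule.
AUTOLOAD_SUFFIX = "_netautoload.dll"


def is_autoloaded(filename: str) -> bool:
    """True if the file name carries the autoload suffix."""
    # Case-insensitive match is an assumption (Windows file names).
    return filename.lower().endswith(AUTOLOAD_SUFFIX)


def manual_name(filename: str) -> str:
    """File name after removing the autoload marker, for use with LoadPlugin()."""
    if not is_autoloaded(filename):
        return filename
    return filename[: -len(AUTOLOAD_SUFFIX)] + ".dll"


print(is_autoloaded("NLMeansCL_netautoload.dll"))
print(manual_name("NLMeansCL_netautoload.dll"))
print(is_autoloaded("AvsFilterNet.dll"))
```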
6th January 2011, 01:28 | #5 | Link |
Registered User
Join Date: Sep 2002
Location: Germany
Posts: 352
|
@masterboy
When called with the same parameters, both filters produce exactly(*) the same result! (*) The difference that you see on the right is amplified 16 times. It contains only a few individual 'dots'. They arise from minor differences in the mathematical calculation: for performance reasons, the calculation in OpenCL is performed with 'relaxed-math' optimization and in single precision (float instead of double). |
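To give a feeling for how small such float-versus-double differences are, here is a rough Python sketch. It computes an NLMeans-style weighted mean, with weights of the usual form exp(-d / h^2), once in double precision and once with every intermediate result rounded to single precision. The sample data is made up, the function names are hypothetical, and only the rounding is modelled; the 'relaxed-math' optimization of an OpenCL compiler is not simulated.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def weighted_mean(pixels, distances, h, single=False):
    """NLMeans-style weighted mean with weights exp(-d / h^2)."""
    rnd = f32 if single else (lambda v: v)
    num = den = 0.0
    for p, d in zip(pixels, distances):
        w = rnd(math.exp(-d / (h * h)))
        num = rnd(num + rnd(w * p))   # accumulate weighted pixel values
        den = rnd(den + w)            # accumulate weights
    return rnd(num / den)


# Made-up sample data: 81 candidates, as in a 9x9 search window.
pixels = [100 + (i * 37) % 50 for i in range(81)]
distances = [((i * 13) % 29) / 4.0 for i in range(81)]

exact = weighted_mean(pixels, distances, h=1.8)
approx = weighted_mean(pixels, distances, h=1.8, single=True)
print(exact, approx, abs(exact - approx))
```

The two results differ only far behind the decimal point, so after rounding to 8-bit pixel values a visible difference can only appear when the exact value happens to lie very close to a rounding boundary. That fits the description of a few isolated 'dots'.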
6th January 2011, 01:48 | #7 | Link |
warpsharpened
Join Date: Feb 2007
Posts: 787
|
Well unless I'm doing it wrong (I just threw all 3 things in the autoload folder for testing it) it caused my graphics drivers to 'crash' and have to recover (when loading in vdub).
Vdub says nothing on crash, avsp is saying some sort of null pointer exception when I try and run the script (and doesn't cause a driver recovery), dunno if that helps. Running a GTX 570 here with the latest beta drivers (266.35). |
6th January 2011, 02:40 | #8 | Link |
Usered Register
Join Date: Dec 2006
Posts: 9
|
I gave it a try. All three linked dll's in avisynth plugins folder, installed Catalyst 10.12 (APP version that has OpenCL support), and the StreamSDK which has the OpenCL libraries and whatnot. I get the below exception on my Radeon HD 4870. Any ideas how I can determine what actually failed (I know OpenCL on AMD is ?? at best, especially older cards like this one)?
Picture too wide for forum so linked |
6th January 2011, 08:52 | #10 | Link |
Registered User
Join Date: Nov 2009
Posts: 2,361
|
The long awaited denoiser!! Thank you!
I get an error previewing in AvsPmod: error message (linked). Also, will you implement temporal Az? I think it was something tritical did by himself, but it'd be very welcome. I have a GeForce 9600M GT card, driver version: 197.16
__________________
i7-4790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread Last edited by Dogway; 6th January 2011 at 08:56. |
6th January 2011, 10:34 | #12 | Link | |
Registered User
Join Date: Nov 2009
Posts: 2,361
|
Thanks, it worked. The CL implementation is only in the later drivers.
Some questions:
- Is the default behaviour sse=true (as in TNLMeans)? I like using sse=false for animation sources; it works nicely for large flat colors.
- If you feel like it, could you add some kind of dark protection? If a source has very dark scenes, it completely turns into a muddy sequence (just like with TNLMeans or dfttest).
- Do I need to make it MT or something?
Benchmark: Quote:
Last edited by Dogway; 6th January 2011 at 10:41. |
|
6th January 2011, 10:38 | #13 | Link | |
Registered User
Join Date: Sep 2002
Location: Germany
Posts: 352
|
Quote:
That means: if you have a very complex OpenCL kernel that computes for more than 2 seconds on one video frame, then Windows will kill and restart your driver! Since you have a GTX 570, I would assume that it's fast enough. So this shouldn't happen unless you use parameters like A=8, S=6 or so. But since I haven't tested the filter on that GPU, I can only guess! |
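As a rough plausibility check for the 2-second limit mentioned above, the measured frame rates from the first post can be converted into time per frame. The sketch below assumes, pessimistically, that the whole frame time is spent inside a single kernel launch, so it can only rule the timeout out, not prove that it will occur. The function names are hypothetical.

```python
WATCHDOG_LIMIT_S = 2.0  # driver timeout as quoted in the post above


def frame_time_s(fps: float) -> float:
    """Average wall-clock time per frame, in seconds."""
    return 1.0 / fps


def may_trip_watchdog(fps: float, limit_s: float = WATCHDOG_LIMIT_S) -> bool:
    """Rough check: could one kernel launch run longer than the limit?

    Treats the whole frame time as an upper bound for a single kernel
    launch, so False is a safe answer and True is only a hint.
    """
    return frame_time_s(fps) > limit_s


# Frame rates reported for the GeForce 9600 GT in the first post.
for label, fps in [("images on GPU", 23.93), ("buffers on GPU", 3.90)]:
    print(f"{label}: {frame_time_s(fps) * 1000:.1f} ms per frame, "
          f"possible timeout: {may_trip_watchdog(fps)}")
```

Both reported GPU configurations stay well below the limit, which is consistent with the guess that only much larger values of A and S would get close to it.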
|
6th January 2011, 10:45 | #14 | Link | |
Registered User
Join Date: Sep 2002
Location: Germany
Posts: 352
|
Quote:
I will provide a version of the filter that spits out the real message. What you see in the picture is the general exception mentioned above, saying that the filter called env.ThrowError(...) |
|
6th January 2011, 10:57 | #15 | Link | |
Registered User
Join Date: Sep 2002
Location: Germany
Posts: 352
|
Quote:
- Dark scene protection: actually, I would recommend doing that with a little bit of scripting in AviSynth.
- Using MT will not help. NLMeansCL itself is already multithreaded on the GPU by nature. That's where the real work is done. So multithreading the wrapper part, which runs on the CPU, doesn't improve performance.
0.19 fps?!? Wow! I cannot imagine how this number comes about. At 720x576? What's your script? Last edited by Malcolm; 6th January 2011 at 10:59. |
|
6th January 2011, 12:07 | #16 | Link | |
Registered User
Join Date: Nov 2009
Posts: 2,361
|
Thanks for the answers. Something must have been wrong: as there's no temporal mode, I used a still image, and that was the speed. Now I tried with a video source and the results were more encouraging:
Quote:
The dark protection would be needed not only for whole scenes but also for parts of scenes; I will look into that.
|
|
6th January 2011, 13:48 | #17 | Link | |
Registered User
Join Date: Sep 2002
Location: Germany
Posts: 352
|
Quote:
|
|
6th January 2011, 14:40 | #18 | Link | |
Registered User
Join Date: Nov 2009
Posts: 2,361
|
It's really strange: if I process my image with the example script from your first post, it runs fine (5.85 fps), but with the following script I only get 0.19 fps:
Quote:
I always use az=3, sometimes 6, depending on the source. I think still areas would benefit from taking advantage of temporal information (noise, codec blocks...), but that's only me. I'm aware this is still in an experimental phase; I just wanted to help a bit. Nice Three Wise Men present! Keep up the good work :P
Last edited by Dogway; 6th January 2011 at 14:46. |
|
6th January 2011, 16:22 | #19 | Link |
Registered User
Join Date: Oct 2009
Posts: 151
|
I got this error when trying to test the filter on avspmod
http://i.imgur.com/OITBQ.png My system is using Windows 7 x64 and GPU AMD Radeon HD4850, checked using GPU-Z and opencl is ticked. |
6th January 2011, 16:26 | #20 | Link |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 5,391
|
@ Dogway: Just kill that SetMTmode(2) out of your script.
Simple logic: [SetMTmode(2)] AND [GPU filter] == FAIL
__________________
- We´re at the beginning of the end of mankind´s childhood - My little flickr gallery. (Yes indeed, I do have hobbies other than digital video!) |