5th January 2011, 23:34 | #1
NLMeansCL: GPU based Non Local Means Denoising
Hi,
I would like to introduce a new filter for AviSynth: NLMeansCL. The filter is my take on the NLMeans algorithm. Tritical already wrote TNLMeans in 2006, which is also an implementation of the NLMeans algorithm. (Thanks for your work, tritical!) In contrast to tritical's implementation, which is written in C++ and runs on the CPU, my implementation is written in OpenCL and (typically) runs on the GPU.

Note: I will update this post to reflect all changes to the filter. The most recent modifications will be marked in blue.

I was only able to test the filter on my NVIDIA GeForce 9600 GT. Therefore I cannot guarantee that it runs on your GPU, or that it won't crash or kill your PC! (Not that I think it would...) The wrapper around the OpenCL algorithm is written in C#.

Syntax:
NLMeansCL(int A, int Ay, int Az, int S, int Sy, int B, int By, float aa, float h, float hC, int plane, bool debug, string debugpath, string smf, bool cpu, bool buffer, bool sse)

If you want some background information about the NLMeans algorithm, I'd like to point you to the very well written readme.txt that is part of tritical's TNLMeans filter package. He also explains all the parameters of the filter in detail.
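For readers who have not seen the algorithm before, here is a minimal pure-Python sketch of spatial non-local means on a tiny grayscale image. It is not the OpenCL code from NLMeansCL, and it leaves out the B and aa parameters. It only assumes, as in the TNLMeans readme, that A is the search window radius, S the similarity neighbourhood radius and h the filter strength. The function name nlmeans, the border clamping and the exact weight formula are illustrative choices, not taken from the filter.

```python
import math


def nlmeans(img, A=2, S=1, h=10.0):
    """Denoise a 2D list of floats with a basic non-local means filter.

    A: search window radius, S: patch (similarity) radius, h: strength.
    These meanings are assumed from the TNLMeans readme.
    """
    height, width = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so windows near the border stay valid.
        return img[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    def patch_dist(y0, x0, y1, x1):
        # Mean squared difference between the patches around two pixels.
        total = 0.0
        for dy in range(-S, S + 1):
            for dx in range(-S, S + 1):
                d = px(y0 + dy, x0 + dx) - px(y1 + dy, x1 + dx)
                total += d * d
        return total / ((2 * S + 1) ** 2)

    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weight_sum = 0.0
            acc = 0.0
            # Compare the patch at (y, x) with every patch in the search window.
            for sy in range(y - A, y + A + 1):
                for sx in range(x - A, x + A + 1):
                    # Similar patches get a weight close to 1, dissimilar ones close to 0.
                    w = math.exp(-patch_dist(y, x, sy, sx) / (h * h))
                    weight_sum += w
                    acc += w * px(sy, sx)
            # weight_sum is at least 1 because the pixel always matches itself.
            out[y][x] = acc / weight_sum
    return out


# Usage: a flat 5x5 image with one outlier in the centre.
noisy = [[10.0] * 5 for _ in range(5)]
noisy[2][2] = 20.0
denoised = nlmeans(noisy, A=2, S=1, h=10.0)
```

Every output pixel is computed independently of all others, and the cost per pixel grows with the size of both windows, which is why larger A or S values slow things down and why the algorithm maps well to a GPU.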
Running the filter on CPUs:
To run the filter on the CPU, you have to install the ATI Stream SDK. Furthermore, you have to set two parameters: cpu=true and buffer=true. The combination cpu=true and buffer=false will fail to execute, since AMD has not yet implemented image support in the CPU version of its OpenCL drivers!

The support for buffers is preliminary. That means I'm undecided whether I will keep or remove it in future versions of the filter. If AMD adds image support to its drivers (either CPU or GPU), there is not much reason to leave it in the filter.

NLMeansCL will take full advantage of multiple CPU cores. There is no need to use MT() or setMTmode()! (Doing so will rather degrade performance.) When I force the filter to use buffers instead of images, the speed on my GeForce drops from 23.05 fps to 3.90 fps.

Here's how the content of the debug log file will look if the filter initializes correctly:
Code:
NLMeansCL Version 0.3.2
ScriptEnvironment present.
Number of OpenCL Compute Platforms = 2.
Trying OpenCL Compute Platform NVIDIA Corporation.
OpenCL 1.0 CUDA 3.2.1.
Number of OpenCL Devices in Platform = 1.
Trying OpenCL Device GeForce 9600 GT.
Device available.
Wrong Device Type (Gpu) requesting Cpu.
Trying OpenCL Compute Platform Advanced Micro Devices, Inc..
OpenCL 1.1 ATI-Stream-v2.3 (451).
Number of OpenCL Devices in Platform = 1.
Trying OpenCL Device Intel(R) Core(TM)2 CPU 4400 @ 2.00GHz.
Device available.
Device Type Cpu.
Device does not support images.
Using Device Intel(R) Core(TM)2 CPU 4400 @ 2.00GHz.
OpenCL Compute Context successfully created.
OpenCL Command Queue successfully created.
OpenCL Program successfully built.
Prog Y Build log:
Prog UV Build log:
OpenCL kernels successfully created.

Installation:
1. The NLMeansCL filter DLL itself: link and attachment at the end of my post. Put it in your AviSynth plugin folder. And don't rename it!
2. Cloo: a .NET library for OpenCL. Needed to run NLMeansCL. You can download it here: http://sourceforge.net/projects/cloo/ Take the Cloo.dll file from \bin\release inside the zip file and put it in your AviSynth plugin folder.
3. AvsFilterNet: a .NET library for writing AviSynth filters. Needed to run NLMeansCL. You can download it here: http://avsfilternet.codeplex.com/ Take the AvsFilterNet.dll and put it in your AviSynth plugin folder.
That's it.

Performance:
I measured some figures on my system:
CPU: Core2Duo, running at 3.2GHz
GPU: NVIDIA GeForce 9600 GT, 512MB GDDR3, 650MHz Core / 900MHz Memory / 1600MHz Shader, not overclocked

I typically get a speed improvement of a factor of 18 to 25 compared to TNLMeans. For example:
Video: 720x576, YV12
Parameters: A=4, S=2, B=1
TNLMeans: 0.98 fps
NLMeansCL: 23.93 fps (cpu=false, buffer=false)
NLMeansCL: 3.90 fps (cpu=false, buffer=true)
NLMeansCL: 1.40 fps (cpu=true, buffer=true)

The speed factor between NLMeansCL and TNLMeans is similar for different video sizes (e.g. 1920x1080 or 360x288), as well as for different filter parameters (Bx = 0, By = 0). On my GPU, the implementation does NOT benefit if you use values of 2 or above for B/By! I have some explanations for this behaviour, but it would lead too far to go into them here...

I'm highly interested in seeing performance figures for different GPUs, as well as feedback on whether the filter runs on different graphics cards. A typical script to test the performance would be the following. Load the script in VirtualDub and check the 'video rendering rate' in the status window.
Code:
mpeg2source(...)
trim(0, 1)
assumefps(500)
last = last + last + last + last
last = last + last + last + last
last = last + last + last + last
last = last + last + last + last
NLMeansCL(A=4, S=2, B=1, aa=1.0, h=1.8, plane=4)

My findings so far are that the default values for A, S and B work very well! Typically there is no improvement from setting A or S higher. It only gets a lot slower! IMHO there is no need to change aa to something other than 1.0. Playing around with h and hC is sufficient.

Problems:
If you have any problems with the filter, especially if it doesn't work at all, please use GPU Caps Viewer (http://www.ozone3d.net/gpu_caps_viewer/) and first check whether the included OpenCL demos run! Then go to the tab named 'Tools' and send me the 'Full XML Export'! There's an extra button for it on the tab. Also, please send me the log file that NLMeansCL creates! I cannot guarantee to help you out quickly, since I'm rather busy!

Version 0.4.0 alpha:
This is only a preliminary version (created in January) that implements a temporal version of the algorithm. Currently, it only supports a temporal window of 1 or 2 frames (in both directions). The temporal mode is only implemented for the image-based algorithm, not the buffer-based one. On my PC, the algorithm produces some non-deterministic artefacts in the video that are visible as small blocks of completely black pixels. I assume there is a timing problem / lack of synchronisation between the shaders and writing the memory out to the host PC. I haven't worked on the algorithm for months now, and this will probably remain the status for the rest of the summer. I also have a version for arbitrary values of Az, but I'm not satisfied with the results (it's too slow and the computed values are incorrect).

TODOs:
- (Better) temporal mode
- Make NLMeansCL work on AMD graphics cards
- x64 version
- Other color spaces (YUY2, RGB)

Changelog:
Changes from v0.1 to v0.1.1
Changes from v0.1.1 to v0.1.2
Changes from v0.1.2 to v0.2
Changes from v0.2 to v0.2.1
Changes from v0.2.1 to v0.2.2
Changes from v0.2.1 to v0.3
Changes from v0.3 to v0.3.1
Changes from v0.3.1 to v0.3.2
v0.4.0 alpha
Download latest version:
v0.3.2: http://www.mediafire.com/?q4butkseucz9tin
v0.3.2 sources: http://www.mediafire.com/?l3swlzu2pm3375l
v0.4.0 alpha: http://www.mediafire.com/?9osy86a14u0qxr6

Malcolm

Last edited by Malcolm; 2nd September 2011 at 22:23. Reason: Added Version 0.3.2