Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
#1 | Link |
Registered User
Join Date: Jan 2008
Location: London
Posts: 156
|
Deathray - OpenCL GPU accelerated spatial/temporal non-local means de-noising
I've created Deathray, an Avisynth plug-in filter for spatial/temporal non-local means de-noising. It uses OpenCL for GPU acceleration.
The project lives on GitHub now: Deathray on GitHub

To download the DLL only: On the Deathray project page click on "Deathray.dll" in the list of files. On the new page, press the "Raw" button and you will be offered the option of saving the file.

To download the entire source code, including the compiled DLL: On the Deathray project page click on the Download ZIP button at the bottom of the right-hand column.

Version History
Code:
1.04 - Marks use of OpenCL 1.1 features, that have been deprecated in 1.2, as preferred. This keeps AMD SDK from complaining.
1.03 - Fixed black pixels bug with low values of hY and z=true (also affected chroma with low values of hUV)
       Fixed bug with allocation of a spurious buffer
1.02 - Added new l, c, z and b options
       Temporal filtering now re-weights target pixel based upon all sampled frames
1.01 - Removed logging to stderr (was only present in NVidia debug version);
       Includes post-filtering correction first introduced in NVidia debug version;
       Updated to use AMD APP SDK;
       Updated to VS2012 Express Edition;
       Updated DLL so that it is not dependent upon visual C runtime - see post 70;
1.00 - Initial version

I've quoted here the Deathray readme.txt:
Code:
Deathray
========

An Avisynth plug-in filter for spatial/temporal non-local means de-noising.

Created by Jawed Ashraf - Deathray@cupidity.f9.co.uk

Installation
============

Copy the Deathray.dll to the "plugins" sub-folder of your installation of Avisynth.

De-installation
===============

Delete the Deathray.dll from the "plugins" sub-folder of your installation of Avisynth.

Compatibility
=============

The following software configurations are known to work:
- Avisynth 2.5.8 and 2.6 MT (SEt's)
- AMD Stream SDK 2.3 and AMD APP SDK 2.8.1
- AMD Catalyst 10.12 and 13.8 beta 2
- Windows Vista 64-bit and Windows 8 64-bit
- NVidia software is known to work but drivers unknown

The following hardware configurations are known to work:
- ATI HD 5870
- AMD HD 7770
- AMD HD 7970
- Various NVidia, models unknown

Known non-working hardware:
- ATI cards in the 4000 series or earlier
- ATI cards in the 5400 series

Video:
- Deathray is compatible solely with 8-bit planar formatted video. It has been tested with YV12 format.

Usage
=====

Deathray separates the video into its 3 component planes and processes each of them independently. This means some parameters come in two flavours: luma and chroma.

Filtering can be adjusted with the following parameters, with the default value for each in brackets:

hY (1.0) - strength of de-noising in the luma plane. Cannot be negative. If set to 0 Deathray will not process the luma plane.

hUV (1.0) - strength of de-noising in the chroma planes. Cannot be negative. If set to 0 Deathray will not process the chroma planes.

tY (0) - temporal radius for the luma plane. Limited to the range 0 to 64. When set to 0 only spatial filtering is performed on the luma plane. When set to 1 filtering uses the prior, current and next frames for the non-local sampling and weighting process. Higher values will increase the range of prior and next frames that are included.

tUV (0) - temporal radius for the chroma planes. Limited to the range 0 to 64. When set to 0 only spatial filtering is performed on the chroma planes. When set to 1 filtering uses the prior, current and next frames for the non-local sampling and weighting process. Higher values will increase the range of prior and next frames that are included.

s (1.0) - sigma used to generate the gaussian weights. Limited to values of at least 0.1. The kernel implemented by Deathray uses 7x7-pixel windows centred upon the pixel being filtered.

For a 2-dimensional gaussian kernel sigma should be approximately 1/3 of the radius of the kernel, or less, to retain its gaussian nature. Since a 7x7 window has a radius of 3, values of sigma greater than 1.0 will tend to bias the kernel towards a box-weighting, i.e. all pixels in the window will tend towards being equally weighted. This will tend to reduce the selectivity of the weighting process and result in relatively stronger spatial blurring.

x (1) - factor to expand sampling. Limited to values in the range 1 to 14. By default Deathray spatially samples 49 windows centred upon the pixel being filtered, in a 7x7 arrangement. x increases the sampling range in multiples of the kernel radius. Since the kernel radius is 3, setting x to 2 produces a sampling area of 13x13, i.e. 169 windows centred upon the target pixel. Yet higher values of x such as 3 or 4 will result in 19x19 or 25x25 sample windows.

Deathray uses 32x32 tiles to accelerate its processing. Each tile is equipped with a border of 8 pixels around all four edges, with pixels copied from neighbouring tiles, or mirrored from within the tile if the tile edge corresponds with a frame edge. This apron of 8 extra pixels ensures that the default sampling of 49 windows is correct, allowing pixels near the edge of the tile to employ 49 sample windows that all have valid pixels.

When x is set to 2 or more, sampling will "bump" into the edges defined by the 48x48 region. With strong values of the de-noising parameters this will create artefacts in the filtered image. These artefacts are visible as a grid of vertical and horizontal lines corresponding with the 32x32 arrangement of the tiles.

l (false) - linear processing of luma plane. true or false. This option allows processing in linear space instead of the default gamma space.

c (true) - correction after filtering. true or false. This option applies a correction after filtering to limit the amount of filtering per pixel. When set to false the naked NLM algorithm is used.

z (false) - target pixel tends towards zero-weighted. true or false. Reduces the weight of the pixel being filtered to a minimum. This results in more even filtering across the tonal range from shadows to highlights.

The standard NLM algorithm gives the pixel being filtered the maximum weight of all. A refinement of the algorithm is to give the pixel being filtered the maximum weight derived from all the other pixels that were inspected. This maximum of other pixels' weights is used when z is set to false. When set to true, the minimum of other pixels' weights is used instead.

b (false) - balanced weighting. true or false. Attempts to balance weighting of pixels based upon their luma value. This parameter is not applied to chroma planes.

Avisynth MT
===========

Deathray is not thread safe. This means that only a single instance of Deathray can be used per Avisynth script. By extension this means that it is not compatible with any of the multi-threading modes of the Multi Threaded variant of Avisynth.

Use:

SetMTMode(5)

before a call to Deathray in the Avisynth script, if multi-threading is active in other parts of the script.

Multiple Scripts Using Deathray
===============================

The graphics driver is thread safe. This means it is possible to have an arbitrary number of Avisynth scripts calling Deathray running on a system, e.g. 2 scripts could be encoding, another could be running in a media player and another could be previewing individual frames in AvsP or VirtualDub.

Eventually video memory will probably run out, even though it's virtualised.

System Responsiveness
=====================

Currently graphics drivers are unable to confer user-responsiveness guarantees on OpenCL applications that utilise GPUs. This means if you are using Deathray on a frame size of 16 million pixels, there will be some juddering in Windows every ~0.7 seconds (1.5 frames per second on HD 5870) accompanied by difficulty in typing, etc.

Last edited by Jawed; 10th January 2016 at 10:53.
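The readme's arithmetic for the x parameter and the tile apron can be checked with a few lines of C++. This is only a sketch of the geometry as the readme describes it, not code from Deathray: the function names are hypothetical, and the "reach" calculation (centre of the outermost sampled window plus that window's own radius) is my reading of why sampling "bumps" into the 48x48 region once x is 2 or more. The frame count assumes that temporal radius t simply means t prior and t next frames.

```cpp
// Values quoted in the readme.
constexpr int kKernelRadius = 3;   // 7x7 comparison window
constexpr int kTileSize     = 32;  // tile edge in pixels
constexpr int kApron        = 8;   // border copied or mirrored around each tile

// Side length of the square grid of sampled windows for a given x.
constexpr int SampleSide(int x) { return 2 * kKernelRadius * x + 1; }

// Number of windows sampled per frame.
constexpr int WindowsPerFrame(int x) { return SampleSide(x) * SampleSide(x); }

// Frames touched for temporal radius t (assumed: t prior, current, t next).
constexpr int FramesSampled(int t) { return 2 * t + 1; }

// Edge length of a tile including its apron on both sides.
constexpr int PaddedTileSide() { return kTileSize + 2 * kApron; }

// Furthest pixel offset read from the target pixel (assumed model): the
// centre of the outermost sampled window plus that window's own radius.
constexpr int MaxReach(int x) { return kKernelRadius * x + kKernelRadius; }

// True when every read stays inside the tile plus its apron.
constexpr bool FitsInApron(int x) { return MaxReach(x) <= kApron; }
```

Under these assumptions the default x=1 reads at most 6 pixels away from the target, which fits inside the 8-pixel apron, while x=2 reads up to 9 pixels away and does not. That matches the readme's warning about grid artefacts at x of 2 or more.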
#4 | Link |
Registered User
Join Date: Aug 2009
Posts: 26
|
Nice one, I'll give it a try.
Is the ATI HD5750 compatible? I have ATI Stream installed/enabled.

Edit: I guess it's compatible. I loaded the default settings and it works (just previewing the filtered video: CPU load is only 20%, GPU load 50%). Now I have to try tweaking the settings to see the effect.

Thanks Jawed

Last edited by pirej; 19th January 2011 at 03:19.
#5 | Link |
Registered User
Join Date: Jan 2008
Location: London
Posts: 156
|
Not explicitly.
The general algorithm searches the entire image for blocks (what are usually called "windows") that look like the block around the target pixel. It uses the similarity of each sampled block as the weighting of the centre pixel of each sampled block. The filtered pixel is then the weighted sum of all the centre pixels of every block in the original image.

In Deathray the search is restricted to 49 windows around the target pixel. This means if motion is "low", i.e. less than 3 pixels in any direction, motion compensation "arises". Deathray has an option, x, to increase the sampling area.

Intrinsically NLM is a spatial filtering technique based on self-similarity in real world images plus it is geared towards noise rather than artefacts such as JPEG/DCT blocks or interlacing artefacts. See this paper for a summary:

http://hal.archives-ouvertes.fr/docs...jcvrevised.pdf

One of the problems with NLM (generally as well as in Deathray) is that it isn't doing time-series pixel averaging (what you might do with a series of photographs of a static scene) - the spatial aspect tends to dominate, even with a temporal radius of 5 or even 7.

Ironically, after the grand claims made in the paper linked above, hybrid time-series techniques have been experimented with by some of the same people:

ftp://ftp.math.ucla.edu/pub/camreport/cam09-62.pdf

In this paper you will see reference to something called BM3D, which as far as I can tell is the academics' name for MVTools' MVDegrain (or MVTools2's MDegrain). The principle of the hybrid approach is to use BM3D where "registration" is achieved (i.e. motion compensation meets a threshold of suitability) and to use NLM where registration fails.

I normally use FizzKiller, which is a variation of MDegrain using a calmed clip for analysis:

http://forum.doom9.org/showthread.php?t=133977

but I'm looking for something faster, so I decided to implement temporal NLM.

I should update the FizzKiller script I posted in that thread (post 23) as I tweaked it a bit. Overall, FizzKiller is awesome.
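The per-pixel procedure described at the top of this post (compare windows, turn similarity into a weight, average the centre pixels) can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not Deathray's OpenCL kernel: the `Image` struct and function names are hypothetical, window differences are averaged uniformly rather than with gaussian weights, edges are handled by clamping rather than mirroring, and the weight formula exp(-d²/h²) is the textbook form.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// One plane of a frame, row-major (hypothetical container for this sketch).
struct Image {
    int width;
    int height;
    std::vector<float> pixels;

    // Clamp coordinates to the frame so windows near an edge stay valid.
    float At(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Mean squared difference between the windows centred on (ax, ay) and (bx, by).
float WindowDistance(const Image& img, int ax, int ay, int bx, int by, int radius) {
    float sum = 0.0f;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            const float diff = img.At(ax + dx, ay + dy) - img.At(bx + dx, by + dy);
            sum += diff * diff;
        }
    }
    const int side = 2 * radius + 1;
    return sum / static_cast<float>(side * side);
}

// Filter one pixel: weighted average of the centre pixels of all sampled
// windows. h must be positive. With the defaults, 49 windows are sampled.
float NlmPixel(const Image& img, int x, int y, float h,
               int window_radius = 3, int search_radius = 3) {
    float weighted_sum = 0.0f;
    float weight_total = 0.0f;
    for (int sy = -search_radius; sy <= search_radius; ++sy) {
        for (int sx = -search_radius; sx <= search_radius; ++sx) {
            const float d2 = WindowDistance(img, x, y, x + sx, y + sy, window_radius);
            const float w = std::exp(-d2 / (h * h));  // similar windows weigh more
            weighted_sum += w * img.At(x + sx, y + sy);
            weight_total += w;
        }
    }
    // The target window always matches itself with weight 1, so the
    // divisor is never zero.
    return weighted_sum / weight_total;
}
```

Because the target window compares perfectly with itself, this sketch gives the target pixel the largest weight of all, which is the "standard NLM" behaviour the readme mentions when describing the z option. A uniform image passes through unchanged, and an isolated bright pixel is pulled strongly towards its surroundings.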
#6 | Link | |
Registered User
Join Date: Jan 2008
Location: London
Posts: 156
|
Quote:
I'm working on linear correction, a post-filtering step, to improve detail retention. This improves the result while allowing stronger de-noising, so I will post an updated version of Deathray soon.

I recommend temporal rather than spatial - use 2 or 3 for the radius. I prefer low sigma, i.e. <=1. h varies with material.

With sigma set to ~0.7 you are effectively using a 5x5 kernel - the outer ring of 24 pixels in each 7x7-pixel window is effectively weighted "0" (all 24 of these pixels have a total weighting of 4%). The search area is still 7x7 (i.e. 49 windows), but the smaller kernel is "sharper".

I may implement a native 5x5 variant of Deathray to make it go faster, since I think temporal is more useful than spatial.

I also need to understand how to make arguments for a plug-in optional (this is my first plug-in). Argument handling is very clumsy. Any help on that would be appreciated.

Last edited by Jawed; 19th January 2011 at 12:25.
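The effect of sigma on the 7x7 kernel can be explored with a small sketch like the one below. It assumes the textbook isotropic form exp(-(x²+y²)/(2·sigma²)), normalised to sum to 1. The thread does not show how Deathray actually builds its kernel, so the exact share held by the outer ring may well differ from the figure quoted in the post. Only the trend is illustrated: a small sigma concentrates weight in the centre, a large sigma tends towards box-weighting. The function names are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstdlib>

constexpr int kRadius = 3;                // 7x7 window
constexpr int kSide = 2 * kRadius + 1;

// Normalised 7x7 gaussian weights for a given sigma (assumed textbook form).
std::array<double, kSide * kSide> GaussianKernel(double sigma) {
    std::array<double, kSide * kSide> k{};
    double total = 0.0;
    for (int y = -kRadius; y <= kRadius; ++y) {
        for (int x = -kRadius; x <= kRadius; ++x) {
            const double w = std::exp(-(x * x + y * y) / (2.0 * sigma * sigma));
            k[(y + kRadius) * kSide + (x + kRadius)] = w;
            total += w;
        }
    }
    for (double& w : k) w /= total;  // make the weights sum to 1
    return k;
}

// Fraction of the total weight held by the outer ring of 24 pixels.
double OuterRingShare(double sigma) {
    const auto k = GaussianKernel(sigma);
    double ring = 0.0;
    for (int y = -kRadius; y <= kRadius; ++y) {
        for (int x = -kRadius; x <= kRadius; ++x) {
            if (std::abs(x) == kRadius || std::abs(y) == kRadius) {
                ring += k[(y + kRadius) * kSide + (x + kRadius)];
            }
        }
    }
    return ring;
}
```

Calling `OuterRingShare` for a range of sigma values shows the share growing monotonically towards 24/49 as sigma increases, which is the box-weighting limit described in the readme.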
#7 | Link | |
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,430
|
Quote:
For an example, see http://avisynth.org/mediawiki/Filter...le_sample_1.3a. |
#8 | Link |
Registered User
Join Date: Jan 2008
Location: London
Posts: 156
|
Thanks, this is how I set up Deathray:
Code:
env->AddFunction("deathray", "c[hY]f[hUV]f[tY]i[tUV]i[s]f[x]i", CreateDeathray, 0);
Code:
AVSValue __cdecl CreateDeathray(AVSValue args, void *user_data, IScriptEnvironment *env) {
    // Luma strength: default 1.0, negative values clamped to 0
    double h_Y = args[1].AsFloat(1.);
    if (h_Y < 0.) h_Y = 0.;

    // Chroma strength: default 1.0, negative values clamped to 0
    double h_UV = args[2].AsFloat(1.);
    if (h_UV < 0.) h_UV = 0.;

    // Luma temporal radius: default 0, clamped to 0..64
    int temporal_radius_Y = args[3].AsInt(0);
    if (temporal_radius_Y < 0) temporal_radius_Y = 0;
    if (temporal_radius_Y > 64) temporal_radius_Y = 64;

    // Chroma temporal radius: default 0, clamped to 0..64
    int temporal_radius_UV = args[4].AsInt(0);
    if (temporal_radius_UV < 0) temporal_radius_UV = 0;
    if (temporal_radius_UV > 64) temporal_radius_UV = 64;

    // Sigma for the gaussian weights: default 1.0, at least 0.1
    double sigma = args[5].AsFloat(1.);
    if (sigma < 0.1) sigma = 0.1;

    // Sampling expansion factor: default 1, clamped to 1..14
    int sample_expand = args[6].AsInt(1);
    if (sample_expand <= 0) sample_expand = 1;
    if (sample_expand > 14) sample_expand = 14;

    return new deathray(args[0].AsClip(), h_Y, h_UV, temporal_radius_Y, temporal_radius_UV, sigma, sample_expand, env);
}
With this call:
Code:
deathray(hY=2)
it works fine. If, instead, I try this:
Code:
deathray(hy=2,1)
it fails. But now I think about it, I think that's not valid Avisynth function call syntax - you can't mix named and un-named parameters. Sigh, addled by C++...
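To make the parameter string passed to AddFunction easier to read, here is a small stand-alone parser for it. It is only a sketch and not part of Avisynth or Deathray: `Param` and `ParseParams` are hypothetical names, and the parser handles just the two forms that appear in the string above, a bare type letter and a bracketed name followed by a type letter. It relies on the Avisynth convention, as I understand it, that a bracketed name marks a named, optional argument and that 'c', 'i' and 'f' stand for clip, int and float.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Param {
    std::string name;  // empty for an un-named parameter
    char type;         // e.g. 'c' clip, 'i' int, 'f' float
    bool optional;     // true when the name was given in [brackets]
};

// Parse the subset of the syntax used above: either "t" or "[name]t".
// Returns an empty vector on malformed input.
std::vector<Param> ParseParams(const std::string& spec) {
    std::vector<Param> out;
    std::size_t i = 0;
    while (i < spec.size()) {
        Param p{"", '\0', false};
        if (spec[i] == '[') {
            const std::size_t close = spec.find(']', i);
            if (close == std::string::npos) return {};
            p.name = spec.substr(i + 1, close - i - 1);
            p.optional = true;
            i = close + 1;
            if (i >= spec.size()) return {};  // a name must be followed by a type
        }
        p.type = spec[i++];
        out.push_back(p);
    }
    return out;
}
```

Parsing "c[hY]f[hUV]f[tY]i[tUV]i[s]f[x]i" this way yields seven entries whose order matches the indices used in CreateDeathray: args[0] is the clip and args[1] to args[6] are hY, hUV, tY, tUV, s and x.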
#9 | Link | |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 5,390
|
Quote:
There's a bunch of scripts that do exactly this kind of "hybrid" filtering ... several by *mp4guy, one or two by me ... Here is one with a short explanation of the principle, 3 years old. And I definitely know my first mention of that problem/solution was years before that already.
__________________
- We´re at the beginning of the end of mankind´s childhood - My little flickr gallery. (Yes indeed, I do have hobbies other than digital video!) |
#10 | Link | ||
Registered User
Join Date: Jan 2008
Location: London
Posts: 156
|
Quote:
Quote:
Has anyone built a hybrid of NLM and MDegrain? I've been thinking about trying Deathray to generate the calm clip for FizzKiller... |
#11 | Link |
Registered User
Join Date: Dec 2005
Location: Germany
Posts: 1,765
|
I get an error: "Error in OpenCl status=1 frame 0"
My graphics card is a Nvidia 260GTX. Win7 x64
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth VapourSynth Portable FATPACK || VapourSynth Database || https://github.com/avisynth-repository |
#12 | Link |
Registered User
Join Date: Jan 2008
Location: London
Posts: 156
|
That seems to be a compilation error. Or it could be that the code is too complex (which behaves like a compilation error).
I presume you have OpenCL working on your system?

As you can probably tell I don't have any NVidia cards to test with so it's something I can only do remotely with others' help. If you or others are prepared to "mess about", I can try to diagnose the issue with tailored versions of the DLL (which might not do any filtering, but would verify basic capability).

I'm also planning on a change in architecture (which should improve performance), which has the side-effect of reducing complexity, making the code more likely to work on NVidia. But that's a few days away at least.

I'd be interested in results with NVidia 400 or 500 series as the code's complexity is theoretically less of a problem there. (The complexity issue is to do with registers. The code uses an extremely high register allocation on ATI, and likely similar on NVidia. NVidia prefers lower register allocations, but 400/500 series should be fine. My planned changes include a reduction in register allocation.)

Did you try Deathray with all default values? i.e. use:

Deathray()

It may also be worth trying

Deathray(hUV=0)

but I'm doubtful that will work if the default version doesn't work.

If you'd like to try some diagnosis, try this version of Deathray:

www.cupidity.f9.co.uk/DeathrayNV110119001.zip

Delete the Deathray DLL that is installed in your plugins folder and put this version of Deathray in there.

This version of Deathray merely passes through the frame, with default settings. (It will do something else, not sure what, if you turn on temporal filtering.) Make sure to test with no parameters, please.

I've also increased the detail on the error status message. That might provide some insight.
#13 | Link |
Registered User
Join Date: Dec 2005
Location: Germany
Posts: 1,765
|
Yep, OpenCL is working on my system. The NLMeansCL filter works on my system for example ^^
hUV=0 makes no difference.

Your debug version of Deathray gives me:

"Error in OpenCL status=11 frame=0 and OpenCl status= -30"

I hope these values can help you.
#14 | Link |
Registered User
Join Date: Jan 2008
Location: London
Posts: 156
|
Thanks. That means it is having trouble finding devices. Which is definitely not what I was expecting.
Do you have the AMD OpenCL installed on your computer, in addition to NVidia OpenCL? I'm wondering if it finds the AMD OpenCL first, but there are no GPUs, so it then fails. But the status you're getting doesn't seem to correspond with that; there's a different error for that situation.

The OpenCL error is more mysterious, "invalid value"...

If you'd like to help some more, this will tell me which OpenCL call is failing:

www.cupidity.f9.co.uk/DeathrayNV110119002.zip

EDIT: corrected, should be fine now

Last edited by Jawed; 19th January 2011 at 21:16.
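For readers following the error reports, the second number in these messages is an OpenCL status code, and -30 is CL_INVALID_VALUE, the "invalid value" mentioned above. The helper below is a hypothetical stand-alone lookup written for this thread, not part of Deathray or of any OpenCL SDK. The numeric values are those defined in the Khronos CL/cl.h header, and only a handful of codes are covered. The first number (status=11, status=1) is Deathray's own internal status and is not decoded here.

```cpp
#include <string>

// Map a few OpenCL status codes to their symbolic names.
// Values as defined in the Khronos OpenCL header (CL/cl.h); the list is
// deliberately incomplete.
std::string ClErrorName(int status) {
    switch (status) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -2:  return "CL_DEVICE_NOT_AVAILABLE";
        case -3:  return "CL_COMPILER_NOT_AVAILABLE";
        case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -30: return "CL_INVALID_VALUE";
        case -31: return "CL_INVALID_DEVICE_TYPE";
        case -32: return "CL_INVALID_PLATFORM";
        case -33: return "CL_INVALID_DEVICE";
        case -34: return "CL_INVALID_CONTEXT";
        default:  return "unknown (" + std::to_string(status) + ")";
    }
}
```

A plug-in could pass the value returned by a failing OpenCL call through a lookup like this before building its error message, which would save testers from having to consult the header by hand.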
#15 | Link |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 5,390
|
With this one, it is
"Error reading source frame 0: Avisynth read error: Deathray: Error in OpenCL status=11 frame 0 and OpenCL status=-30"
#16 | Link |
Registered User
Join Date: Jan 2008
Location: London
Posts: 156
|
Thanks, that's really peculiar. It's asking for the number of devices and seemingly responding that asking for the number of devices is invalid.
This is going to be tedious. OK, this test version doesn't ask for the device count (eventually Deathray will support multiple cards):

www.cupidity.f9.co.uk/DeathrayNV110119003.zip

Fingers crossed.
#17 | Link |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 5,390
|
Different message now:
"Error reading source frame 0: Avisynth read error: Single-frame initialisation failed, status=1" However, note I'm not running the latest NV driver for my GT240, it's one or two revisions behind. Reports from people running the most recent drivers could be more interesting. Or, perhaps the GT240 is simply "too small" ?
#18 | Link |
Registered User
Join Date: Dec 2005
Location: Germany
Posts: 1,765
|
I get exactly the same messages as Didée, both new versions tested.
And no, there is no trace of an AMD driver on my system ^^" Maybe this information can help? Code:
=================================================== GPU Caps Viewer v1.9.5 http://www.ozone3d.net/gpu_caps_viewer/ =================================================== ===================================[ System / CPU ] - CPU Name: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz - CPU Core Speed: 2833 MHz - CPU Num Cores: 4 - Family: 6 - Model: 7 - Stepping: 10 - Physical Memory Size: 4095 MB - Operating System: Windows 7 64-bit build 7600 [No Service Pack] - DirectX Version: 10.0 - PhysX Version: 9100514 ===================================[ Graphics Adapter / GPU ] - SLI: disabled - GPUs: 1 - Logical GPUs: 1 - OpenGL Renderer: GeForce GTX 260/PCI/SSE2 - Drivers Renderer: NVIDIA GeForce GTX 260 - DB Renderer: NVIDIA GeForce GTX 260 - Device Description: NVIDIA GeForce GTX 260 - Adapter String: GeForce GTX 260 - Vendor: NVIDIA Corporation - Vendor ID: 0x10DE - Device ID: 0x05E2 - Sub device ID: 0x1109 - Sub vendor ID: 0x19DA - Drivers Version: 8.17.12.6099 (10-16-2010) - nvoglv64 - GPU Codename: GT200 - GPU Unified Shader Processors: 192 - GPU Vertex Shader Processors: 0 - GPU Pixel Shader Processors: 0 - SM / SIMD: 24 - TPC: 8 - TPD (Watts): 182 - Video Memory Size: 896 MB - Video Memory Type: GDDR3 - Clocks level #0: Core: 300MHz - Memory: 100MHz - Shader: 600MHz - Clocks level #1: Core: 400MHz - Memory: 300MHz - Shader: 800MHz - Clocks level #2: Core: 576MHz - Memory: 999MHz - Shader: 1242MHz - BIOS String: 62.0.61.0.0 - Current Display Mode: 1280x1024 @ 60 Hz - 32 bpp ===================================[ OpenGL GPU Capabilities ] - OpenGL Version: 3.3.0 - GLSL (OpenGL Shading Language) Version: 3.30 NVIDIA via Cg compiler - ARB Texture Units: 4 - Vertex Shader Texture Units: 32 - Pixel Shader Texture Units: 32 - Geometry Shader Texture Units: 32 - Max Texture Size: 8192x8192 - Max Anisotropic Filtering Value: X16.0 - Max Point Sprite Size: 63.4 - Max Dynamic Lights: 8 - Max Viewport Size: 8192x8192 - Max Vertex Uniform Components: 4096 - Max Fragment Uniform 
Components: 2048 - Max Geometry Uniform Components: 2048 - Max Varying Float: 60 - Max Vertex Bindable Uniforms: 12 - Max Fragment Bindable Uniforms: 12 - Max Geometry Bindable Uniforms: 12 - Frame Buffer Objects (FBO) Support:[yes] - Multiple Render Targets / Max draw buffers: 8 - Pixel Buffer Objects (PBO) Support:[yes] - S3TC Texture Compression Support:[yes] - ATI 3Dc Texture Compression Support:[no] - Texture Rectangle Support:[yes] - Floating Point Textures Support:[yes] - MSAA: 2X - MSAA: 4X - MSAA: 8X - MSAA: 16X - OpenGL Extensions: 221 extensions (GL=199 and WGL=22) <li>GL_ARB_blend_func_extended</li> <li>GL_ARB_color_buffer_float</li> <li>GL_ARB_compatibility</li> <li>GL_ARB_copy_buffer</li> <li>GL_ARB_debug_output</li> <li>GL_ARB_depth_buffer_float</li> <li>GL_ARB_depth_clamp</li> <li>GL_ARB_depth_texture</li> <li>GL_ARB_draw_buffers</li> <li>GL_ARB_draw_elements_base_vertex</li> <li>GL_ARB_draw_instanced</li> <li>GL_ARB_ES2_compatibility</li> <li>GL_ARB_explicit_attrib_location</li> <li>GL_ARB_fragment_coord_conventions</li> <li>GL_ARB_fragment_program</li> <li>GL_ARB_fragment_program_shadow</li> <li>GL_ARB_fragment_shader</li> <li>GL_ARB_framebuffer_object</li> <li>GL_ARB_framebuffer_sRGB</li> <li>GL_ARB_geometry_shader4</li> <li>GL_ARB_get_program_binary</li> <li>GL_ARB_half_float_pixel</li> <li>GL_ARB_half_float_vertex</li> <li>GL_ARB_imaging</li> <li>GL_ARB_instanced_arrays</li> <li>GL_ARB_map_buffer_range</li> <li>GL_ARB_multisample</li> <li>GL_ARB_multitexture</li> <li>GL_ARB_occlusion_query</li> <li>GL_ARB_occlusion_query2</li> <li>GL_ARB_pixel_buffer_object</li> <li>GL_ARB_point_parameters</li> <li>GL_ARB_point_sprite</li> <li>GL_ARB_provoking_vertex</li> <li>GL_ARB_robustness</li> <li>GL_ARB_sampler_objects</li> <li>GL_ARB_seamless_cube_map</li> <li>GL_ARB_separate_shader_objects</li> <li>GL_ARB_shader_bit_encoding</li> <li>GL_ARB_shader_objects</li> <li>GL_ARB_shading_language_100</li> <li>GL_ARB_shadow</li> <li>GL_ARB_sync</li> 
<li>GL_ARB_texture_border_clamp</li> <li>GL_ARB_texture_buffer_object</li> <li>GL_ARB_texture_compression</li> <li>GL_ARB_texture_compression_rgtc</li> <li>GL_ARB_texture_cube_map</li> <li>GL_ARB_texture_env_add</li> <li>GL_ARB_texture_env_combine</li> <li>GL_ARB_texture_env_crossbar</li> <li>GL_ARB_texture_env_dot3</li> <li>GL_ARB_texture_float</li> <li>GL_ARB_texture_mirrored_repeat</li> <li>GL_ARB_texture_multisample</li> <li>GL_ARB_texture_non_power_of_two</li> <li>GL_ARB_texture_rectangle</li> <li>GL_ARB_texture_rg</li> <li>GL_ARB_texture_rgb10_a2ui</li> <li>GL_ARB_texture_swizzle</li> <li>GL_ARB_timer_query</li> <li>GL_ARB_transform_feedback2</li> <li>GL_ARB_transpose_matrix</li> <li>GL_ARB_uniform_buffer_object</li> <li>GL_ARB_vertex_array_bgra</li> <li>GL_ARB_vertex_array_object</li> <li>GL_ARB_vertex_buffer_object</li> <li>GL_ARB_vertex_program</li> <li>GL_ARB_vertex_shader</li> <li>GL_ARB_vertex_type_2_10_10_10_rev</li> <li>GL_ARB_viewport_array</li> <li>GL_ARB_window_pos</li> <li>GL_ATI_draw_buffers</li> <li>GL_ATI_texture_float</li> <li>GL_ATI_texture_mirror_once</li> <li>GL_S3_s3tc</li> <li>GL_EXT_texture_env_add</li> <li>GL_EXT_abgr</li> <li>GL_EXT_bgra</li> <li>GL_EXT_bindable_uniform</li> <li>GL_EXT_blend_color</li> <li>GL_EXT_blend_equation_separate</li> <li>GL_EXT_blend_func_separate</li> <li>GL_EXT_blend_minmax</li> <li>GL_EXT_blend_subtract</li> <li>GL_EXT_compiled_vertex_array</li> <li>GL_EXT_Cg_shader</li> <li>GL_EXT_depth_bounds_test</li> <li>GL_EXT_direct_state_access</li> <li>GL_EXT_draw_buffers2</li> <li>GL_EXT_draw_instanced</li> <li>GL_EXT_draw_range_elements</li> <li>GL_EXT_fog_coord</li> <li>GL_EXT_framebuffer_blit</li> <li>GL_EXT_framebuffer_multisample</li> <li>GL_EXTX_framebuffer_mixed_formats</li> <li>GL_EXT_framebuffer_object</li> <li>GL_EXT_framebuffer_sRGB</li> <li>GL_EXT_geometry_shader4</li> <li>GL_EXT_gpu_program_parameters</li> <li>GL_EXT_gpu_shader4</li> <li>GL_EXT_multi_draw_arrays</li> <li>GL_EXT_packed_depth_stencil</li> 
<li>GL_EXT_packed_float</li> <li>GL_EXT_packed_pixels</li> <li>GL_EXT_pixel_buffer_object</li> <li>GL_EXT_point_parameters</li> <li>GL_EXT_provoking_vertex</li> <li>GL_EXT_rescale_normal</li> <li>GL_EXT_secondary_color</li> <li>GL_EXT_separate_shader_objects</li> <li>GL_EXT_separate_specular_color</li> <li>GL_EXT_shadow_funcs</li> <li>GL_EXT_stencil_two_side</li> <li>GL_EXT_stencil_wrap</li> <li>GL_EXT_texture3D</li> <li>GL_EXT_texture_array</li> <li>GL_EXT_texture_buffer_object</li> <li>GL_EXT_texture_compression_latc</li> <li>GL_EXT_texture_compression_rgtc</li> <li>GL_EXT_texture_compression_s3tc</li> <li>GL_EXT_texture_cube_map</li> <li>GL_EXT_texture_edge_clamp</li> <li>GL_EXT_texture_env_combine</li> <li>GL_EXT_texture_env_dot3</li> <li>GL_EXT_texture_filter_anisotropic</li> <li>GL_EXT_texture_integer</li> <li>GL_EXT_texture_lod</li> <li>GL_EXT_texture_lod_bias</li> <li>GL_EXT_texture_mirror_clamp</li> <li>GL_EXT_texture_object</li> <li>GL_EXT_texture_shared_exponent</li> <li>GL_EXT_texture_sRGB</li> <li>GL_EXT_texture_swizzle</li> <li>GL_EXT_timer_query</li> <li>GL_EXT_transform_feedback2</li> <li>GL_EXT_vertex_array</li> <li>GL_EXT_vertex_array_bgra</li> <li>GL_IBM_rasterpos_clip</li> <li>GL_IBM_texture_mirrored_repeat</li> <li>GL_KTX_buffer_region</li> <li>GL_NV_blend_square</li> <li>GL_NV_conditional_render</li> <li>GL_NV_copy_depth_to_color</li> <li>GL_NV_copy_image</li> <li>GL_NV_depth_buffer_float</li> <li>GL_NV_depth_clamp</li> <li>GL_NV_explicit_multisample</li> <li>GL_NV_fence</li> <li>GL_NV_float_buffer</li> <li>GL_NV_fog_distance</li> <li>GL_NV_fragment_program</li> <li>GL_NV_fragment_program_option</li> <li>GL_NV_fragment_program2</li> <li>GL_NV_framebuffer_multisample_coverage</li> <li>GL_NV_geometry_shader4</li> <li>GL_NV_gpu_program4</li> <li>GL_NV_half_float</li> <li>GL_NV_light_max_exponent</li> <li>GL_NV_multisample_coverage</li> <li>GL_NV_multisample_filter_hint</li> <li>GL_NV_occlusion_query</li> <li>GL_NV_packed_depth_stencil</li> 
<li>GL_NV_parameter_buffer_object</li> <li>GL_NV_parameter_buffer_object2</li> <li>GL_NV_pixel_data_range</li> <li>GL_NV_point_sprite</li> <li>GL_NV_primitive_restart</li> <li>GL_NV_register_combiners</li> <li>GL_NV_register_combiners2</li> <li>GL_NV_shader_buffer_load</li> <li>GL_NV_texgen_reflection</li> <li>GL_NV_texture_barrier</li> <li>GL_NV_texture_compression_vtc</li> <li>GL_NV_texture_env_combine4</li> <li>GL_NV_texture_expand_normal</li> <li>GL_NV_texture_multisample</li> <li>GL_NV_texture_rectangle</li> <li>GL_NV_texture_shader</li> <li>GL_NV_texture_shader2</li> <li>GL_NV_texture_shader3</li> <li>GL_NV_transform_feedback</li> <li>GL_NV_transform_feedback2</li> <li>GL_NV_vertex_array_range</li> <li>GL_NV_vertex_array_range2</li> <li>GL_NV_vertex_buffer_unified_memory</li> <li>GL_NV_vertex_program</li> <li>GL_NV_vertex_program1_1</li> <li>GL_NV_vertex_program2</li> <li>GL_NV_vertex_program2_option</li> <li>GL_NV_vertex_program3</li> <li>GL_NVX_conditional_render</li> <li>GL_NVX_gpu_memory_info</li> <li>GL_SGIS_generate_mipmap</li> <li>GL_SGIS_texture_lod</li> <li>GL_SGIX_depth_texture</li> <li>GL_SGIX_shadow</li> <li>GL_SUN_slice_accum</li> <li>GL_WIN_swap_hint</li> <li>WGL_EXT_swap_control</li> <li>WGL_ARB_buffer_region</li> <li>WGL_ARB_create_context</li> <li>WGL_ARB_create_context_profile</li> <li>WGL_ARB_create_context_robustness</li> <li>WGL_ARB_extensions_string</li> <li>WGL_ARB_make_current_read</li> <li>WGL_ARB_multisample</li> <li>WGL_ARB_pbuffer</li> <li>WGL_ARB_pixel_format</li> <li>WGL_ARB_pixel_format_float</li> <li>WGL_ARB_render_texture</li> <li>WGL_ATI_pixel_format_float</li> <li>WGL_EXT_create_context_es2_profile</li> <li>WGL_EXT_extensions_string</li> <li>WGL_EXT_framebuffer_sRGB</li> <li>WGL_EXT_pixel_format_packed_float</li> <li>WGL_NVX_DX_interop</li> <li>WGL_NV_float_buffer</li> <li>WGL_NV_multisample_coverage</li> <li>WGL_NV_render_depth_texture</li> <li>WGL_NV_render_texture_rectangle</li> ===================================[ NVIDIA 
CUDA Capabilities ] - CUDA Device 0 - Device name: GeForce GTX 260 - Compute Capability: 1.3 - Total Memory: 877 MB - Shader Clock Rate: 1242 MHz - Multiprocessors: 24 - Warp Size: 32 - Max Threads Per Block: 512 - Threads Per Block: 512 x 512 x 64 - Grid Size: 65535 x 65535 x 1 - Registers Per Block: 16384 - Texture Alignment: 256 byte - Total Constant Memory: 64 Kb ===================================[ OpenCL Capabilities ] - Num OpenCL platforms: 1 - Name: NVIDIA CUDA - Version: OpenCL 1.0 CUDA 3.2.1 - Profile: FULL_PROFILE - Vendor: NVIDIA Corporation - Num devices: 1 - CL_DEVICE_NAME: GeForce GTX 260 - CL_DEVICE_VENDOR: NVIDIA Corporation - CL_DRIVER_VERSION: 260.99 - CL_DEVICE_PROFILE: FULL_PROFILE - CL_DEVICE_VERSION: OpenCL 1.0 CUDA - CL_DEVICE_TYPE: GPU - CL_DEVICE_VENDOR_ID: 0x10DE - CL_DEVICE_MAX_COMPUTE_UNITS: 24 - CL_DEVICE_MAX_CLOCK_FREQUENCY: 1242MHz - CL_NV_DEVICE_COMPUTE_CAPABILITY_MAJOR: 1 - CL_NV_DEVICE_COMPUTE_CAPABILITY_MINOR: 3 - CL_NV_DEVICE_REGISTERS_PER_BLOCK: 16384 - CL_NV_DEVICE_WARP_SIZE: 32 - CL_NV_DEVICE_GPU_OVERLAP: 1 - CL_NV_DEVICE_KERNEL_EXEC_TIMEOUT: 1 - CL_NV_DEVICE_INTEGRATED_MEMORY: 0 - CL_DEVICE_ADDRESS_BITS: 32 - CL_DEVICE_MAX_MEM_ALLOC_SIZE: 224608KB - CL_DEVICE_GLOBAL_MEM_SIZE: 877MB - CL_DEVICE_MAX_PARAMETER_SIZE: 4352 - CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 0 Bytes - CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 0KB - CL_DEVICE_ERROR_CORRECTION_SUPPORT: NO - CL_DEVICE_LOCAL_MEM_TYPE: Local (scratchpad) - CL_DEVICE_LOCAL_MEM_SIZE: 16KB - CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64KB - CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3 - CL_DEVICE_MAX_WORK_ITEM_SIZES: [512 ; 512 ; 64] - CL_DEVICE_MAX_WORK_GROUP_SIZE: 512 - CL_EXEC_NATIVE_KERNEL: 4751356 - CL_DEVICE_IMAGE_SUPPORT: YES - CL_DEVICE_MAX_READ_IMAGE_ARGS: 128 - CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8 - CL_DEVICE_IMAGE2D_MAX_WIDTH: 4096 - CL_DEVICE_IMAGE2D_MAX_HEIGHT: 32768 - CL_DEVICE_IMAGE3D_MAX_WIDTH: 2048 - CL_DEVICE_IMAGE3D_MAX_HEIGHT: 2048 - CL_DEVICE_IMAGE3D_MAX_DEPTH: 16 - 
CL_DEVICE_MAX_SAMPLERS: 16 - CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 1 - CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1 - CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1 - CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1 - CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1 - CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 1 - CL_DEVICE_EXTENSIONS: 16 - Extensions: - cl_khr_byte_addressable_store - cl_khr_icd - cl_khr_gl_sharing - cl_nv_d3d9_sharing - cl_nv_d3d10_sharing - cl_khr_d3d10_sharing - cl_nv_d3d11_sharing - cl_nv_compiler_options - cl_nv_device_attribute_query - cl_nv_pragma_unroll - - cl_khr_global_int32_base_atomics - cl_khr_global_int32_extended_atomics - cl_khr_local_int32_base_atomics - cl_khr_local_int32_extended_atomics - cl_khr_fp64 ===================================[ Misc. ] ===================================[ Related Graphics Drivers ] - http://www.geeks3d.com/?page_id=752 - http://downloads.guru3d.com/download.php?id=10 - http://www.tweakguides.com/NVFORCE_1.html - http://www.nvidia.com/object/winxp-2k_archive.html - http://www.geeks3d.com/?p=65 ===================================[ Related Graphics Cards Reviews ] - http://www.geeks3d.com/?tag=geforce-gtx-260 - http://www.google.us/search?q=NVIDIA+GeForce+GTX+260+review
#19 | Link | |||
Leader of Dual-Duality
Join Date: Aug 2010
Location: America
Posts: 134
|
Will give this a look see in a bit.
Edit: I tried the normal version and the 3 fixed versions in AvsPmod. I got this for the normal version: Quote:
Quote:
Quote:
Also, I am able to get NLMeansCL to run fine on my computer. I have a GTX 260 and an AMD Phenom quad core.
__________________
I'm Mr.Fixit and I feel good, fixin all the sources in the neighborhood My New filter is in the works, and will be out soon Last edited by TheProfileth; 20th January 2011 at 00:15. |
#20 | Link |
Registered User
Join Date: Jan 2008
Location: London
Posts: 156
|
Thanks Didée and ChaosKing. That means it's found the GPU and is now trying to create memory allocations and/or preparing the GPU code (kind of compilation).
From another forum (someone else's application) I've learnt there are (or at least, were) problems with NVidia's support for my use of a feature, which complicates things. This is my last update for tonight: www.cupidity.f9.co.uk/DeathrayNV110119004.zip Yet more detail in error tracking. Though I'm pessimistic overall due to the problem I mentioned above. |