Doom9's Forum > Capturing and Editing Video > Avisynth Development

13th February 2005, 21:51 #1
tsp
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
fft3dGPU 0.8.2

Test this new GPU version of fft3dfilter. Get the newest version:
* version 0.8.2 (also as a manual installation: dll and hlsl only)
* version 0.8.1 (also as a manual installation: dll and hlsl only)
* version 0.8 (also as a manual installation: dll and hlsl only)
* version 0.7 (also as a manual installation: dll and hlsl only)
* version 0.6.4 (also as a manual installation: dll and hlsl only)
* version 0.6.3
* version 0.6.2
* version 0.6.1
* version 0.6
* version 0.51 (manual installation also available)
* version 0.5a
* version 0.47
* version 0.46.1

From the readme:

Introduction

FFT3dGPU is a GPU version of Fizick's FFT3DFilter. The algorithm (Fast Fourier Transform, denoising) is for the most part the same. Noise pattern support is currently not implemented.

In this version the next frame is processed while waiting for the GPU to finish its work, meaning that the filters before fft3dGPU run concurrently with it.
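A minimal C++ sketch of this kind of overlap, assuming a simple two-stage pipeline in which the next frame is requested on another thread while the current one is being processed. `fetchFrame` and `gpuProcess` are hypothetical stand-ins for the upstream filter chain and the Direct3D work; this is not fft3dGPU's actual code.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in for the upstream AviSynth filter chain (CPU work).
int fetchFrame(int n) { return n * 10; }

// Hypothetical stand-in for the GPU pass (in the real filter: rendering plus readback).
int gpuProcess(int frame) { return frame + 1; }

// Processes frames [0, count), always fetching frame n+1 on another thread
// while frame n is being processed, so upstream work overlaps the "GPU" work.
std::vector<int> processPipelined(int count) {
    std::vector<int> out;
    if (count <= 0) return out;
    std::future<int> next = std::async(std::launch::async, fetchFrame, 0);
    for (int n = 0; n < count; ++n) {
        int current = next.get();                 // wait for frame n
        if (n + 1 < count)                        // start fetching frame n+1 now
            next = std::async(std::launch::async, fetchFrame, n + 1);
        out.push_back(gpuProcess(current));       // runs while the fetch proceeds
    }
    return out;
}
```

With `std::launch::async` each fetch runs on its own thread; on some toolchains linking with `-pthread` is needed.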
Install:

To use this filter you need DirectX 9.0c or later and a graphics card that supports DirectX 9 in hardware, i.e. at least an ATI Radeon 95xx or an Nvidia GeForce FX 5xxx; a GeForce 6xxx or better is recommended. If you downloaded the installer, just run it and you're done. Otherwise, extract fft3dgpu.hlsl and FFT3dGPU.dll from the 7-zip archive into the same directory, and also install the latest version of DirectX (April 2006 or later). You can get it here, or extract the file d3dx9_30.dll (not included in the archive) to the c:\windows\system32 directory. The installer copies d3dx9_30.dll to the right location, so it shouldn't be necessary to run the DirectX installer if you already have DirectX 9.0c installed.
Syntax

FFT3DGPU(clip, float "sigma", float "beta", int "bw", int "bh", int "bt", float "sharpen", int "plane", int "mode", int "bordersize", int "precision", bool "NVPerf", float "degrid", float "scutoff", float "svr", float "smin", float "smax", float "kratio", int "ow", int "oh", int "wintype" , int "interlaced", float "sigma2", float "sigma3", float "sigma4", bool "oldfft" )
Function parameters:

clip: the clip to filter. The clip must be YV12 or YUY2.

sigma and beta have the same meaning as in fft3dfilter. Default=2.
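For reference, a sketch of the Wiener-style attenuation that fft3dfilter-type denoisers apply to each FFT coefficient, assuming the rule factor = max((psd - noise)/psd, (beta - 1)/beta). `wienerFactor` and `applyWiener` are illustrative names, and the exact expression in fft3dGPU's shaders may differ.

```cpp
#include <algorithm>
#include <cassert>

// Wiener-style attenuation factor for one FFT coefficient (assumed rule):
//   factor = max((psd - noise) / psd, (beta - 1) / beta)
// psd   : power spectral density |c|^2 of the coefficient
// noise : normalized noise power derived from sigma
// beta  : >= 1; limits how strongly a coefficient may be attenuated
float wienerFactor(float psd, float noise, float beta) {
    const float lowLimit = (beta - 1.0f) / beta;   // beta = 1 -> no lower limit
    if (psd <= 0.0f) return lowLimit;              // avoid division by zero
    return std::max((psd - noise) / psd, lowLimit);
}

// Scales a complex coefficient (re, im) by its Wiener factor.
void applyWiener(float& re, float& im, float noise, float beta) {
    const float f = wienerFactor(re * re + im * im, noise, beta);
    re *= f;
    im *= f;
}
```

A larger beta raises the floor (beta - 1)/beta, so more of the original noise is left in.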

sigma2, sigma3, sigma4: if specified, these control the sigma value from the highest frequency (sigma) down to the lowest frequency (sigma4). Default=sigma.

bw, bh: block width and block height. They must be powers of 2, i.e. valid values are 4, 8, 16, 32, 64, 128, 256, 512 (note that bw should be greater than 4 for best results). Default=32.
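A small check for the block-size rule above, as a sketch; `isPowerOfTwo` and `isValidBlockSize` are hypothetical helpers, not part of the plugin.

```cpp
#include <cassert>

// True if v is a power of two (1, 2, 4, 8, ...): exactly one bit is set.
bool isPowerOfTwo(int v) { return v > 0 && (v & (v - 1)) == 0; }

// Checks a bw/bh value against the rule in the parameter list:
// a power of two between 4 and 512.
bool isValidBlockSize(int v) { return isPowerOfTwo(v) && v >= 4 && v <= 512; }
```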

bt: filtering mode. bt=-1 sharpens only, bt=0 is Kalman filtering, bt=1 is 2D filtering, bt=2 uses the previous and current frame, bt=3 uses the previous, current and next frame, bt=4 uses the two previous frames, the current and the next frame. Default 3.
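The bt values can be summarized as the frame offsets each mode reads, relative to the current frame. This is a sketch derived from the descriptions above; `temporalWindow` is a hypothetical name, and the Kalman mode's dependence on its own previous output is only noted in a comment.

```cpp
#include <cassert>
#include <vector>

// Frame offsets (relative to the current frame) read by each bt value.
std::vector<int> temporalWindow(int bt) {
    switch (bt) {
        case -1: return {0};            // sharpen only
        case 0:  return {0};            // Kalman: also uses its own previous state (not modeled)
        case 1:  return {0};            // 2D filtering
        case 2:  return {-1, 0};        // previous + current
        case 3:  return {-1, 0, 1};     // previous + current + next
        case 4:  return {-2, -1, 0, 1}; // two previous + current + next
        default: return {};             // unsupported value
    }
}
```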

sharpen: positive values sharpen the image, negative values blur it. 0 disables sharpening. Default 0.

plane: 0 filters luma; 1, 2 and 3 filter chroma (both U and V); 4 filters both luma and chroma. Default 0.

mode=0: only 1:1 overlap. This is faster but produces artifacts with high sigma values.
mode=1: blocks overlap 2:1. This is slower but produces fewer artifacts.
mode=2: again 1:1 overlap, but with an additional border. This reduces the border artifacts seen with mode=0. Its speed is between mode 0 and mode 1.
Kalman (bt=0) works well with mode=0. Default 1.

bordersize: only used with mode 2. Defines the size of the border. Default is 1.

precision: 0: use 16-bit floats (half precision).
1: use 32-bit floats (single precision) for the FFT and 16-bit floats for the Wiener/Kalman filtering and sharpening.
2: always use 32-bit floats.
Using 16-bit floats increases performance but reduces precision. With a GeForce 7800 GT, precision=0 is ~1.5 times faster than precision=2. Default=0.
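To give a feel for what half precision gives up, the sketch below keeps only the 10 mantissa bits that an IEEE 16-bit float stores (versus 23 in a 32-bit float), so values are resolved to roughly 1 part in 1024. It truncates rather than rounds and ignores float16's smaller exponent range, so it is an illustration, not a real float16 conversion; `truncateToHalfMantissa` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Keeps only the top 10 of the 23 mantissa bits of a 32-bit float
// (round toward zero). Exponent range limits of float16 are ignored.
float truncateToHalfMantissa(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);        // reinterpret the float's bits
    bits &= ~((std::uint32_t{1} << 13) - 1);    // clear the 13 low mantissa bits
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}
```

For example, 1 + 1/4096 collapses to 1.0, while 1 + 1/1024 survives unchanged.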

NVPerf: Enables support for NVPerfHUD (http://developer.nvidia.com/object/nvperfhud_home.html). Default false.

degrid: enables degridding. Only works well with mode=1. It does not degrid the Kalman filter output, but it does degrid the sharpening (if enabled) applied after the Kalman filter. Default 1.0 for mode=1, 0.0 for mode=0 or 2.

scutoff, svr, smin, smax: same meaning as in fft3dfilter. They control the sharpening. Defaults: scutoff=0.3, svr=1.0, smin=4.0, smax=20.0.

kratio: same as in fft3dfilter. Controls the threshold for resetting the Kalman filter. Default 2.0.
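A simplified scalar sketch of a Kalman update with a reset threshold, assuming a reset test of the form (current - previous)^2 > kratio^2 * noise, modeled loosely on fft3dfilter's approach. The struct, the function and the exact update are assumptions, not fft3dGPU's shader code.

```cpp
#include <cassert>

// Per-coefficient filter state (hypothetical layout).
struct KalmanState {
    float estimate;      // filtered value carried over from the previous frame
    float covar;         // estimate covariance
    float covarProcess;  // process covariance
};

// noise: normalized noise variance; kratio: reset threshold factor.
// Returns the filtered value for the current frame.
float kalmanStep(KalmanState& s, float current, float noise, float kratio) {
    const float diff = current - s.estimate;
    if (diff * diff > noise * kratio * kratio) {
        // Change too large to be noise (motion, scene cut): reset the filter.
        s.estimate = current;
        s.covar = noise;
        s.covarProcess = noise;
        return current;
    }
    const float sum = s.covar + s.covarProcess;
    const float gain = sum / (sum + noise);        // Kalman gain
    s.covarProcess = gain * gain * noise;
    s.covar = (1.0f - gain) * sum;
    s.estimate = gain * current + (1.0f - gain) * s.estimate;
    return s.estimate;
}
```

A larger kratio tolerates bigger frame-to-frame differences before resetting, which smooths more but risks ghosting on motion.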

ow, oh: only used with mode=1. They specify the overlap between blocks. The overlap must be less than or equal to half the block size, and ow must be even. Default: ow=bw/2, oh=bh/2.
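The effect of ow/oh on the amount of work can be estimated by counting blocks per axis. This sketch assumes blocks start every (b - o) pixels and the last block may run past the edge; `blocksAlongAxis` and `isValidOverlap` are hypothetical helpers, and the plugin's exact tiling may differ.

```cpp
#include <cassert>

// Blocks of size b with overlap o needed to cover `length` pixels,
// assuming a block starts every (b - o) pixels. Returns -1 if o >= b.
int blocksAlongAxis(int length, int b, int o) {
    const int step = b - o;
    if (step <= 0) return -1;
    if (length <= b) return 1;
    return 1 + (length - b + step - 1) / step;   // 1 + ceil((length - b) / step)
}

// Overlap rule from the parameter list: at most half the block size, ow even.
bool isValidOverlap(int bw, int bh, int ow, int oh) {
    return ow >= 0 && oh >= 0 && ow <= bw / 2 && oh <= bh / 2 && ow % 2 == 0;
}
```

For a 720x576 frame with bw=bh=32 this gives 44 x 35 = 1540 blocks with ow=oh=16 (the mode=1 default) versus 23 x 18 = 414 blocks without overlap, about 3.7 times as many transforms.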

wintype: changes the analysis and synthesis window function. Same as fft3dfilter.

interlaced: set to true for separate filtering of each field. Default=false.

oldfft: set to true to use the old FFT code (used in version 0.6.2 and lower), false to use the new FFT code. If not specified, fft3dgpu uses whichever code is fastest.
FAQ:
Q: What does it mean when I get a popup box saying "Unexpected error encountered with Error Code: D3DERR_OUTOFVIDEOMEMORY"?

A: It means that fft3dgpu needs more memory than is available on the graphics card. Either upgrade, or try lowering the resolution, precision, bt, bh, bw, ow or oh, or use mode 0 or 2 (in versions before 0.6, usefloat16=true also helps).
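A rough way to see why resolution and precision matter for video memory: the size of one four-channel floating-point texture covering the frame. The formats assumed here (D3DFMT_A16B16G16R16F at 8 bytes per texel for 16-bit floats, D3DFMT_A32B32G32R32F at 16 bytes per texel for 32-bit floats) and the texture count are illustrative assumptions; the number of textures fft3dgpu actually allocates is not stated in this readme.

```cpp
#include <cassert>
#include <cstddef>

// Bytes for one 4-channel float texture of width x height texels
// (assumed 8 bytes/texel for 16-bit floats, 16 bytes/texel for 32-bit floats).
std::size_t textureBytes(std::size_t width, std::size_t height, bool float16) {
    const std::size_t bytesPerTexel = float16 ? 8 : 16;
    return width * height * bytesPerTexel;
}

// Total for a hypothetical number of same-sized working textures.
std::size_t estimateBytes(std::size_t width, std::size_t height,
                          bool float16, std::size_t textureCount) {
    return textureBytes(width, height, float16) * textureCount;
}
```

At 720x576 one 32-bit-float texture takes 6,635,520 bytes (about 6.3 MiB); the 16-bit version takes half that.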
Q: I can't get this filter to work.

A: Try upgrading to the latest drivers (ATI Radeon or Nvidia GeForce). Check whether your card is supported (see below). If that doesn't solve the problem, write me a bug report (see Support) that includes the script used, the program used, and your GPU, driver version, Windows version and DirectX version.
Q: What setting gives the same result as fft3dfilter?

A: fft3dGPU(mode=1,precision=2) is similar to fft3dfilter(), but please note the different default values for bw, bh, ow and oh.
Q: Are there any differences between fft3dfilter and fft3dgpu?

A: Some of the features from fft3dfilter are still missing (noise pattern support, for example).
Q: Why is fft3dGPU so slow compared to fft3dfilter?

A: Either you have a slow graphics card such as a GeForce FX 5200, or you are not using it alongside CPU-heavy encoding (such as XviD/DivX), which is where offloading the filtering to the GPU helps most.
Q: How do I use NVPerfHUD?

A: Set NVPerf=true and use this command line (or make a shortcut to run it): "PATH TO NVPerfHUD\NVPerfHUD.exe" "PATH TO VIRTUALDUB\virtualdub.exe" "PATH TO AVS\test.avs", and enable "force NON PURE device".
Q: I get this error message: "Only pixelshader 2.0 or greater supported"

A: This happens because you need a graphics card that has hardware support for DirectX 9.
The following cards will not work:

Nvidia:
TNT
TNT2
Geforce 256
GeForce2 Ultra, Ti, Pro,MX,Go and GTS
Geforce3 Ti 200, Ti 500
GeForce4 Ti, MX, Go

Ati:
Radeon 7xxx
Radeon 8xxx
Radeon 90xx
Radeon 92xx

Matrox:
G2xx
G4xx
G5xx
maybe Parhelia

The following should work:

Nvidia:
Geforce FX 5xxx
Geforce 6xxx
Geforce 7xxx

Ati:
Radeon 9500
Radeon 9550
Radeon 9600
Radeon 9700
Radeon 9800
Radeon Xxxx
Radeon X1xxx

where x means any digit.

Support:

This thread on the doom9 forum or my email address (tsp (at) person.dk).
TODO:

(maybe) noise pattern support. Fix all the stupid bugs. Add the directx 9.0b version back.
Changelog:

* 0.1 first release. Buggy and used Brook
* 0.2 sigma should now work like fft3dfilter
* 0.3 Rewrote the code to use DirectX 9.0 directly and added support for 16-bit floats, increasing performance and stability.
* 0.31 Fixed bug causing aliased edges.
* 0.4 Added sharpen, mode 1,2, reduceCPU and multithreading
* 0.41 Fixed bug when calculating PSD.
* 0.42 Fixed memory leak when reloading
* 0.43 Fixed bug that caused corruption on the GeForce FX cards, and some more memory leaks. Added more comments to the source code and a small performance improvement in the shaders. Also added support for DirectX 9.0b.
* 0.44 fft3dgpu can now reset a lost device and continue working. The DirectX 9.0b version should work now.
* 0.45 Fixed bug where filtering the chroma plane with mode=0 or 2 crashed the filter.
* 0.46 Fixed lockups on hyperthreading-enabled machines (hopefully). Also fixed an infinite loop when closing WMP 6.4.
* 0.46.1 fixed issue with nvperf=true causing fft3dgpu to lock up. Added a FAQ section to this file.
* 0.47 Fixed bug with corrupted frames after resetting a lost device. Renamed readme.txt to fft3dgpu.txt. Uses a newer version of DirectX 9.0c, so please _read the install instructions_!!!
* 0.5 Added Kalman, sharpening, bt=4, degrid from fft3dfilter. Renamed ps.hlsl to fft3dgpu.hlsl. Rewrote some of the code. Added new bugs.
* 0.5a fixed bug with bt=2. Only file changed is fft3dgpu.hlsl
* 0.51 Fixed bug where the parameters after NVPerf were shifted, i.e. degrid=scutoff, scutoff=svr. Improved download speed from the GPU. GeForce FX 5xxx now works with the Kalman filter.
* 0.6 Added wintypes, plane=4 and variable overlap size (ow, oh). Changed useFloat16 to precision. Changed default value for mode to 1.
* 0.6.1 variable overlap now works on the geforce fx 5xxx. Default value for mode is 1 now.
* 0.6.2 bugfix: Degrid works better and vertical banding is gone when using mode 1. Right edge artifacts gone when using non mod 8 width and plane>0.
* 0.6.3 New FFT code. Should improve performance by up to 70% when using larger block sizes and precision=2. Fixed bug with HC 0.17 crashing. New HTML doc (thanks to Fizick for creating it).
* 0.6.4 new fft code should now work with ati cards.
* 0.7 Added sigma2,sigma3 and sigma4 and support for interlaced filtering. Uses the fastest fft code now.
* 0.8 Added support for YUY2 colorspace. If not enough GPU memory is available the least used texture will be swapped to system memory.
* 0.8.1 Fixed crash when recovering a lost device with plane=4 (thanks Fizick). Changed default for bt to 3, as in fft3dfilter.
* 0.8.2 Fixed crash when recovering lost device with interlaced=true (thanks Fizick) and recovering lost device with bt=0 and sigma2,3,4 =sigma.



Source code released under the GPL; see copying.txt.

Last edited by tsp; 20th February 2008 at 18:30.

13th February 2005, 22:26 #2
708145
Professional Lemming
Join Date: Dec 2003
Location: Stuttgart, Germany
Posts: 359
Very nice indeed

Could somebody with a recent GPU please provide info about results, problems, speedup, ...?

It'll definitely help to convince me to get out and buy a new GPU ASAP

Till better,
Tobias
__________________
projects page: ELDER, SmoothD, etc.

13th February 2005, 22:41 #3
Soulhunter
Bored...
Join Date: Apr 2003
Location: Unknown
Posts: 2,812
Hrm, bt mode 3 gives me this... :\

But bt=1 and bt=2 work nicely (720x576 @ ~10fps)!!!

My box: Athlon XP2800+ / 1024MB RAM / GeForce 6600GT


Bye
__________________

Visit my IRC channel

Last edited by Soulhunter; 13th February 2005 at 22:49.

13th February 2005, 23:25 #4
Fizick
AviSynth plugger
Join Date: Nov 2003
Location: Russia
Posts: 2,183
Tsp,
talented work!
But not for my GF2MX400
So, I will stay with fft3dfilter
BTW, what is "hole frame"? Whole?

One more question:
Do you plan to implement all my other plugins on the GPU?
__________________
My Avisynth plugins are now at http://avisynth.org.ru and mirror at http://avisynth.nl/users/fizick
I usually do not provide a technical support in private messages.

14th February 2005, 08:52 #5
tsp
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
Quote:
BTW, what is "hole frame" ? Whole?
umm yes, typo. It should be "whole frame", so the borders are also filtered.

Quote:
Have you plan to implement in GPU all my other plugins?
Only the FPU-heavy filters.

Also, how does the Kalman filter work, if I should implement it?

Soulhunter: I get a similar result with bt=3. If you use usecache=false the chroma shift disappears (and so does the speed).
I'm trying to find out where the error is.

708145: On my computer, an Athlon XP 2400 MHz with an ASUS GeForce 6800 GT (V9999GT), I get about 10-11 fps @ 720x576.
I'm a little curious how the Radeons would perform.

14th February 2005, 09:15 #6
bill_baroud
Registered User
Join Date: Feb 2002
Posts: 407
gah, i forgot my usb dongle, i don't have my screenshots...

well i tested, and got some weird results, quickly :

- it doesn't filter anything (??) but inserts some weird black squares on the image, of size bh/bw.

- it adds some black borders horizontally too.

- speed is about 5-6 fps on my FX5900 (looks like it likes those fps)

14th February 2005, 11:50 #7
tsp
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
I fixed the chroma bug with bt=3. I also added a new option, reducecpu. If enabled, the CPU load is reduced (but so is the frame rate; hopefully that will be fixed someday).
Same link as before.

bill_baroud: What driver are you using? What size is the image? What does the script look like? This filter only processes YV12.

14th February 2005, 13:25 #8
bill_baroud
Registered User
Join Date: Feb 2002
Posts: 407
uh yeah, I forgot ... The source is MJPEG (avi) or MPEG4v2, 768x576 (PAL cap) or 832x480. My script just converts to YV12 and uses fft3dgpu() with default settings (well, I tried changing the other settings, but with no luck; it only changes the size of the black squares).

Drivers??? huh ... I don't think they are the latest, something like 66.77.

I also tried other colorspaces as input, but the results were really funky, as expected, and not like my bug.

14th February 2005, 15:35 #9
tsp
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
bill_baroud: I have tested the filter with driver versions 66.93 and 71.80, and neither showed any artifacts. You could try updating the driver.

14th February 2005, 23:27 #10
Fizick
AviSynth plugger
Join Date: Nov 2003
Location: Russia
Posts: 2,183
tsp,
Quote:
But sigma=1 in fft3dfilter ~ sigma=25 in fft3dGPU.
How about compatibility? I use:
Code:
norm = 1.0f/(bw*bh); // do not forget set FFT normalization factor
sigma2NoiseNormed = bt*sigma*sigma/norm; // normalize noise value
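Wrapped as a self-contained function, Fizick's two lines compute the noise level in the units of an unnormalized FFT: the expected power of a white-noise coefficient grows with the number of samples, bw*bh per frame, and bt frames enter the 3D transform. `normalizedNoise` is an illustrative name.

```cpp
#include <cassert>

// Noise power in unnormalized-FFT units for a bw x bh x bt transform.
float normalizedNoise(float sigma, int bw, int bh, int bt) {
    const float norm = 1.0f / (bw * bh);   // FFT normalization factor
    return bt * sigma * sigma / norm;      // normalized noise value
}
```

For sigma=2, bw=bh=32 and bt=3 this gives 3 * 4 * 1024 = 12288.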
__________________
My Avisynth plugins are now at http://avisynth.org.ru and mirror at http://avisynth.nl/users/fizick
I usually do not provide a technical support in private messages.

15th February 2005, 09:32 #11
Backwoods
ReMember
Join Date: Nov 2003
Posts: 416
GeForce 6800 OC

720x272

FFT3dFilter = 10-12fps

FFT3dGPU = 12-16fps

(sigma=3.0, bt=3, bh=32 ,bw=32) for both filters.

15th February 2005, 09:44 #12
tsp
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
Fizick: I have added the normalization code to sigma. The only thing I can't seem to figure out is how to apply the 2D window function. At the moment I'm using a 1D window, but this produces artifacts with sigma values above 10.
When I just multiply the cosx and cosy values I get a checkerboard pattern (when the picture is shifted by bw/2 and bh/2 and summed, the factors don't add up to 1).

edit

Never mind, I cheated and used this as the window function:
Code:
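void ImgStream::CreateFactorMap(float* Map,int x,unsigned int xnum,int y,unsigned int ynum,bool shift)
{
	// Fills Map row by row with the window factor for a grid of xnum*ynum blocks
	// of x*y pixels. With shift=true one extra block is added per axis, and the
	// first/last block in each direction only uses positions 0.5 .. size/2.
	// Needs <cmath>; pi is assumed to be defined elsewhere in the original source.
	double cosy,cosx;
	unsigned int offset=0;
	double x1=x;
	double y1=y;
	//xnum=xnum/2;
	//ynum=ynum/2;
	for(unsigned int repy=0;repy<(ynum+shift);repy++){
		// vertical sample position n1, measured from the block centre
		for(double n1=(shift&&(repy==0||repy==ynum)?0:-y1/2.0)+0.5;n1<y1/2.0;n1++){
			cosy=cos(n1*pi/(y1));	// 1D cosine window, vertical
			for(unsigned int repx=0;repx<(xnum+shift);repx++){
				// horizontal sample position n2, measured from the block centre
				for(double n2=(shift&&(repx==0||repx==xnum)?0:-x1/2.0)+0.5;n2<x1/2.0;n2++){
					cosx=cos(n2*pi/(x1));	// 1D cosine window, horizontal
					// combined factor: square root of the mean of the squared 1D windows
					Map[offset++]=sqrt(0.5*(cosx*cosx+cosy*cosy));
				}
			}
		}
	}
}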
I have uploaded version 0.2, where the sigma values should work like Fizick's fft3dfilter.
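A small numerical check of the overlap property discussed in this post, as a sketch: with the 1D cosine window cos(n*pi/N) computed in CreateFactorMap and blocks shifted by half their size, the squared windows of overlapping blocks sum to 1, because the shifted block sees sin(n*pi/N) and cos^2 + sin^2 = 1. The separable product cosx*cosy has the same property over the four overlapping 2D blocks. Whether squared (analysis times synthesis) or plain windows must sum to 1 depends on how the filter applies them, which is not specified here; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// 1D cosine window at sample position n (measured from the block centre).
double window(double n, int N) {
    const double pi = std::acos(-1.0);
    return std::cos(n * pi / N);
}

// Largest deviation from 1 of w(n)^2 + w(n - N/2)^2 over the right half block.
double maxOverlapError1D(int N) {
    double worst = 0.0;
    for (double n = 0.5; n < N / 2.0; n += 1.0) {
        const double a = window(n, N);
        const double b = window(n - N / 2.0, N);   // same pixel in the shifted block
        worst = std::fmax(worst, std::fabs(a * a + b * b - 1.0));
    }
    return worst;
}

// Same check for the product window cosx*cosy over four half-shifted blocks.
double maxOverlapError2D(int N) {
    double worst = 0.0;
    for (double y = 0.5; y < N / 2.0; y += 1.0)
        for (double x = 0.5; x < N / 2.0; x += 1.0) {
            double sum = 0.0;
            for (int sy = 0; sy < 2; ++sy)
                for (int sx = 0; sx < 2; ++sx) {
                    const double w = window(x - sx * N / 2.0, N) *
                                     window(y - sy * N / 2.0, N);
                    sum += w * w;
                }
            worst = std::fmax(worst, std::fabs(sum - 1.0));
        }
    return worst;
}
```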

Last edited by tsp; 15th February 2005 at 18:05.

15th February 2005, 18:33 #13
Antitorgo
Registered User
Join Date: Dec 2004
Posts: 32
Hmm... seems really slow.

The previous version I tested got something like 6-7 fps; now I get 2-3 fps... This is with reduceCPU set to false (I tried true and got the same frame rate with lower CPU utilization)... This could be because of something on my laptop, though.

From the stuff on the AviShader thread...

On the sleep() calls, I do them before copying the texture back. In D3D, as soon as the DrawPrimitive() call happens, the GPU begins its thing, so the time between that call and copying the texture back is where to do other work. For example, you can start copying the next frame up to the GPU or do any sort of preprocessing on the CPU.

As far as my channel idea... if you are uploading a 1-channel 8-bit image (typically luma) to the GPU, I'm guessing that Brook packs/unpacks it into a 32-bit texture (which is native on the GPU) at 1/4 the width. This leads to an inefficiency, because the packing/unpacking has to happen on the GPU and just adds overhead. My idea was to upload 4 frames, one into each channel of a 32-bit texture; then you can run the shader across 4 frames at a time. In your case it is a little complicated because you have your shifted/multiplied frame thing going on... so I'm not sure if it is applicable (in AviShader, I expect it to give me a huge performance boost when I get around to implementing it)...
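A sketch of the packing described here: four single-channel 8-bit planes (for example luma from four frames) interleaved into one RGBA8 buffer, so each texel carries the same pixel position from four frames. The channel order is an arbitrary assumption, and `packFourPlanes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleaves four equally sized 8-bit planes into RGBA8 texels:
// texel i = (f0[i], f1[i], f2[i], f3[i]).
std::vector<std::uint8_t> packFourPlanes(const std::vector<std::uint8_t>& f0,
                                         const std::vector<std::uint8_t>& f1,
                                         const std::vector<std::uint8_t>& f2,
                                         const std::vector<std::uint8_t>& f3) {
    assert(f0.size() == f1.size() && f1.size() == f2.size() && f2.size() == f3.size());
    std::vector<std::uint8_t> rgba(f0.size() * 4);
    for (std::size_t i = 0; i < f0.size(); ++i) {
        rgba[4 * i + 0] = f0[i];   // frame 0 in the first channel
        rgba[4 * i + 1] = f1[i];
        rgba[4 * i + 2] = f2[i];
        rgba[4 * i + 3] = f3[i];   // frame 3 in the last channel
    }
    return rgba;
}
```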

15th February 2005, 19:55 #14
tsp
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
Antitorgo: I don't get lower frame rates with the new version. Strange.
Also, in this version the sleep is placed just before EndScene. The texture is uploaded as D3DFMT_L8, then converted to D3DFMT_A32B32G32R32F, where all the calculation is done (the FFT requires floats), before being converted back to D3DFMT_A8R8G8B8 for download. But I'm convinced that I will have to rewrite the filter without Brook if I want to optimize it further. So now I just have to learn how to set up DirectX to do the rendering.

17th February 2005, 17:30 #15
Antitorgo
Registered User
Join Date: Dec 2004
Posts: 32
Yeah, copying back using the A8R8G8B8 is what has always killed me too, because it is the slowest operation and has to transfer 4x as much data as necessary. That is why I was thinking that the 4 frame at a time deal would work well...

If you want the source for AviShader which has all the D3D stuff, PM me and I'll see what I can do. D3D is pretty straightforward once you grok it.

22nd February 2005, 20:25 #16
tsp
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
When using A8R8G8B8 (or fixed4 in Brook) I pack 4 pixels into each texel to avoid wasting bandwidth (see FFT3dshader.br for the Brook shaders).
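A sketch of this packing scheme: four horizontally adjacent 8-bit pixels of one plane per 32-bit texel, so the texture is a quarter as wide and no channel is wasted. The byte order (first pixel in the lowest byte) is an assumption; how it maps onto the A, R, G and B channels of D3DFMT_A8R8G8B8 depends on the shader, which is not shown here. `packRow` and `unpackRow` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs a row of 8-bit pixels (length a multiple of 4) into 32-bit texels,
// pixel 4t in the lowest byte of texel t.
std::vector<std::uint32_t> packRow(const std::vector<std::uint8_t>& row) {
    assert(row.size() % 4 == 0);
    std::vector<std::uint32_t> texels(row.size() / 4);
    for (std::size_t t = 0; t < texels.size(); ++t) {
        texels[t] = std::uint32_t(row[4 * t]) |
                    (std::uint32_t(row[4 * t + 1]) << 8) |
                    (std::uint32_t(row[4 * t + 2]) << 16) |
                    (std::uint32_t(row[4 * t + 3]) << 24);
    }
    return texels;
}

// Inverse of packRow: splits each texel back into four 8-bit pixels.
std::vector<std::uint8_t> unpackRow(const std::vector<std::uint32_t>& texels) {
    std::vector<std::uint8_t> row(texels.size() * 4);
    for (std::size_t t = 0; t < texels.size(); ++t)
        for (int k = 0; k < 4; ++k)
            row[4 * t + k] = std::uint8_t(texels[t] >> (8 * k));
    return row;
}
```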

14th March 2005, 22:57 #17
tsp
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
Released version 0.3. It's a major rewrite that now uses DirectX directly instead of Brook. The shaders are also optimized, and the filter can now use float16 (2-byte float) instead of float32 (4-byte float, single precision) to store intermediate results, giving up to a 150% speed increase compared to version 0.2.
These are the frame rates for version 0.2, version 0.3 and fft3dfilter using this syntax:
fft3dGPU(bt=1,sigma=2)
fft3dfilter(bt=1,sigma=2)
on a 720x576 clip:

fft3dfilter 7.0 FPS
fft3dGPU 0.2 11.0 FPS
fft3dGPU 0.3 24.3 FPS

this is on an athlon XP 2400 MHZ (nforce-2 chipset 200 MHz DDR ram)
and a Geforce 6800 GT 128 MB RAM (411 MHz core/742 MHz mem).

So roughly 3.5 times the speed of fft3dfilter. Nice...
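The speed-up can be checked directly from the frame rates listed above; `speedup` and `percentIncrease` are trivial illustrative helpers.

```cpp
#include <cassert>
#include <cmath>

// Ratio of two measured frame rates.
double speedup(double newFps, double oldFps) { return newFps / oldFps; }

// Percentage increase corresponding to that ratio.
double percentIncrease(double newFps, double oldFps) {
    return (speedup(newFps, oldFps) - 1.0) * 100.0;
}
```

24.3 / 7.0 is about 3.47, i.e. roughly 3.5 times the frame rate of fft3dfilter (an increase of about 247%); against version 0.2, 24.3 / 11.0 is about 2.21.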

Please report whether the filter works with your graphics card, which card it is and how fast it runs.

I have only tested this version with my geforce 6800 GT so I don't know how well it works with geforce FX 5xxx and Radeon Xxxx and 9xxx so please try and report back.


Last edited by tsp; 15th March 2005 at 10:26.

15th March 2005, 00:52 #18
Backwoods
ReMember
Join Date: Nov 2003
Posts: 416
AVISource("otto.avi")
ConvertToYV12()
FFT3DFilter(sigma=3,bt=3) / FFT3dGPU(sigma=3,bt=3)

FFT3DFilter 4-6 fps
FFT3dGPU 8-12 fps

720x480

GeForce 6800OC
2.8HT
1gig RAM

And I noticed the AA problem too.

15th March 2005, 10:26 #19
tsp
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
Fixed the bug causing the aliased edges. Get the fixed version from the first post.

Also it would be nice if you could post the driver version you are using.

Backwoods: I'm a little curious why you only get 8-12 fps when I get about 18-21 fps on a 720x576 clip using bt=3. Maybe it's the 4 extra pipelines in the geforce 6800 GT.

15th March 2005, 13:20 #20
Blue_MiSfit
Derek Prestegard IRL
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,988
AWESOME!

On my 9800 pro I get about 7-15 fps (encoding into cq2 qpel vhq4(&bvop) xvid) depending on the scene with a crop, lanczosresize(), removegrain(mode=2) and unfilter(-5,-5) before it.

Powerful denoising without smudging the image too much. I really like it so far for the new Star Wars DVDs, which have an absurd amount of noise (gives 6of9 nightmares when barely filtered!!).

More later

~misfit
__________________
These are all my personal statements, not those of my employer :)

All times are GMT +1.