fft3dGPU 0.8.2

tsp
13th February 2005, 21:51
Test this new GPU version of fft3dfilter.
Get the newest
version 0.8.2 (http://www.avisynth.org/tsp/fft3dgpu0.8.2a.exe). manual installation (dll and hlsl only) (http://www.avisynth.org/tsp/fft3dgpu0.8.2.7z)
version 0.8.1 (http://www.avisynth.org/tsp/fft3dgpu0.8.1.exe). manual installation (dll and hlsl only) (http://www.avisynth.org/tsp/fft3dgpu0.8.1.7z)
version 0.8 (http://www.avisynth.org/tsp/fft3dgpu0.8.exe). manual installation (dll and hlsl only) (http://www.avisynth.org/tsp/fft3dgpu0.8.7z)
version 0.7 (http://www.avisynth.org/tsp/fft3dgpu0.7.exe). manual installation (dll and hlsl only) (http://www.avisynth.org/tsp/FFT3dGPU0.7.7z)
version 0.6.4 (http://www.avisynth.org/tsp/fft3dgpu0.6.4.exe). manual installation (dll and hlsl only) (http://www.avisynth.org/tsp/FFT3dGPU0.6.4.7z)
version 0.6.3 (http://www.avisynth.org/tsp/fft3dgpu0.6.3.exe).
version 0.6.2 (http://www.avisynth.org/tsp/fft3dgpu0.6.2.exe).
version 0.6.1 (http://www.avisynth.org/tsp/fft3dgpu0.6.1.exe).
version 0.6 (http://www.avisynth.org/tsp/fft3dgpu0.6.exe).
version 0.51 (http://www.avisynth.org/tsp/fft3dgpu0.51.exe). (Manual installation available here (http://www.avisynth.org/tsp/fft3dgpu_051.zip))
version 0.5a (http://www.avisynth.org/tsp/fft3dgpu_05a.zip).
version 0.47 (http://www.avisynth.org/tsp/fft3dgpu_47.zip).
version 0.46.1 (http://www.avisynth.org/tsp/fft3dgpu_0461.zip).

From the readme:

Introduction

FFT3dGPU is a GPU version of Fizick's FFT3DFilter. The algorithm (Fast Fourier Transform, denoising) is the same for the most part. Currently the following is not implemented: support for noise pattern.

In this version the next frame is processed while waiting for the GPU to finish its work. This means the filters before fft3dGPU run concurrently with it.

Install:

To use this filter you need DirectX 9.0c or better and a graphics card supporting DirectX 9 in hardware, that is, at least an ATI Radeon 95xx or Nvidia Geforce FX 5xxx. A Geforce 6xxx or better is recommended. If you have downloaded the installer, just run it and you're done. Otherwise, copy fft3dgpu.hlsl and FFT3dGPU.dll from the 7-zip archive into the same directory, and also install the latest version of DirectX (April 2006 or later). You can get it here or extract the file d3dx9_30.dll (not included in the archive) to the c:\windows\system32 directory. The installer copies d3dx9_30.dll to the right location, so it shouldn't be necessary to run the DirectX installer if you already have DirectX 9.0c installed.

Syntax

FFT3DGPU(clip, float "sigma", float "beta", int "bw", int "bh", int "bt", float "sharpen", int "plane", int "mode", int "bordersize", int "precision", bool "NVPerf", float "degrid", float "scutoff", float "svr", float "smin", float "smax", float "kratio", int "ow", int "oh", int "wintype" , int "interlaced", float "sigma2", float "sigma3", float "sigma4", bool "oldfft" )
Function parameters:

clip: the clip to filter. The clip must be YV12 or YUY2.

sigma and beta have the same meaning as in fft3dfilter. Default=2.

sigma2, sigma3, sigma4: if specified, these control the sigma value from the highest frequencies (sigma) down to the lowest (sigma4). Default=sigma

bw,bh: block width and block height. Each should be a power of 2, i.e. the valid values are 4,8,16,32,64,128,256,512 (note that bw should be greater than 4 for best results). Default=32

bt: mode. bt=-1 sharpens only, bt=0 is Kalman filtering, bt=1 is 2d filtering, bt=2 uses the current and previous frame, bt=3 uses the previous, current and next frame, bt=4 uses the two previous frames, the current frame and the next frame. Default 3

sharpen: positive values sharpen the image, negative values blur the image. 0 disables sharpening. Default 0.

plane: 0 filters luma, 1, 2 and 3 filter chroma (both U and V), 4 filters both luma and chroma. Default 0.

mode: mode=0 overlaps blocks 1:1. This is faster but produces artifacts with high sigma values.
mode=1 overlaps blocks 2:1. This is slower but produces fewer artifacts.
mode=2 again uses a 1:1 overlap but with an additional border. This reduces the border artifacts seen with mode=0. The speed is between mode 0 and 1.
Kalman (bt=0) works well with mode=0. Default 1

bordersize: only used with mode 2. Defines the size of the border. Default is 1.

precision: 0: use 16 bit floats (half precision).
1: use 32 bit floats (single precision) for the fft and 16 bit floats for the Wiener/Kalman filtering and sharpening.
2: always use 32 bit floats.
Using 16 bit floats increases the performance but reduces precision. With a Geforce 7800GT precision=0 is ~1.5 times faster than precision=2. Default=0.

NVPerf: Enables support for NVPerfHUD (http://developer.nvidia.com/object/nvperfhud_home.html). Default false.

degrid: Enables degridding. Only works well with mode=1. Doesn't degrid the Kalman filter (but it does degrid the sharpening (if enabled) after the Kalman filter). Default 1.0 for mode=1, 0.0 for mode=0 or 2

scutoff, svr, smin, smax: Same meaning as in fft3dfilter. They control the sharpening. Default scutoff=0.3, svr=1.0, smin=4.0, smax=20.0

kratio: same as in fft3dfilter. Controls the threshold for resetting the Kalman filter. Default 2.0

ow,oh: these only work with mode=1. They specify how big the overlap between the blocks is. The overlap size must be less than or equal to half the block size. ow must be even. Default: ow=bw/2, oh=bh/2

wintype: Changes the analysis and synthesis window function. Same as in fft3dfilter

interlaced: Set to true for separate filtering for each field. Default=false.

oldfft: Set to true to use the old fft code (used in version 0.6.2 and lower), false to use the new fft code. If not defined, fft3dgpu will use the fastest code.
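
The block size and overlap rules from the parameter list can be summarised as a small check. This is only a sketch in C++: the function names are hypothetical and are not taken from the fft3dGPU source, and the plugin itself may validate (or silently adjust) its arguments differently.

```cpp
#include <cassert>

// True if n is one of the block sizes listed for bw/bh (4, 8, ..., 512).
bool IsValidBlockSize(int n)
{
    return n >= 4 && n <= 512 && (n & (n - 1)) == 0;
}

// Default overlap as stated in the parameter list: ow=bw/2, oh=bh/2.
int DefaultOverlap(int blockSize)
{
    assert(IsValidBlockSize(blockSize));
    return blockSize / 2;
}

// Check bw/bh/ow/oh against the documented rules: block sizes are powers
// of 2, each overlap is at most half the block size, and ow is even.
// Rejecting negative overlaps is an assumption of this sketch.
bool BlockParamsLookValid(int bw, int bh, int ow, int oh)
{
    if (!IsValidBlockSize(bw) || !IsValidBlockSize(bh)) return false;
    if (ow < 0 || oh < 0) return false;
    if (ow > bw / 2 || oh > bh / 2) return false;
    return ow % 2 == 0;
}
```

For example, bw=bh=32 with the default overlaps passes, while bw=48 or ow=17 does not.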
FAQ:
Q: What does it mean when I get a popup box Unexpected error encountered with Error Code: D3DERR_OUTOFVIDEOMEMORY.

A: It means that fft3dgpu needs more memory than is available on the graphics card. So either you will have to upgrade, or try lowering the resolution, precision, bt, bh, bw, ow, oh, or use usefloat16=true or mode 0 or 2
Q: I can't get this filter to work

A: Try upgrading to the latest drivers (ATI Radeon or Nvidia Geforce). Check if your card is supported (see below). If that doesn't solve the problem, write me a bug report (see support) that includes the script used, the program used, and which GPU, driver version, Windows version and DirectX version you use.
Q: What setting gives the same result as fft3dfilter?

A: fft3dGPU(mode=1,precision=2) is similar to fft3dfilter(), but please note the different default values for bw, bh, ow and oh
Q: Are there any differences between fft3dfilter and fft3dgpu?

A: Some of the features from fft3dfilter are still missing.
Q: Why is fft3dGPU so slow compared to fft3dfilter?

A: Either you have a slow graphics card like a Geforce FX 5200, or you are not using it while doing CPU-heavy encoding (like XviD/DivX)
Q: How do I use NVPerfHUD?

A: Set NVperf=true and use this command line, or make a shortcut to run it: "PATH TO NVPerfHUD\NVPerfHUD.exe" "PATH TO VIRTUALDUB\virtualdub.exe" "PATH TO AVS\test.avs", and enable "force NON PURE device"
Q: I get this error message: "Only pixelshader 2.0 or greater supported"

A: It is because you need a graphics card that has hardware support for Directx 9.
The following cards will not work:

Nvidia:
TNT
TNT2
Geforce 256
GeForce2 Ultra, Ti, Pro,MX,Go and GTS
Geforce3 Ti 200, Ti 500
GeForce4 Ti, MX, Go

Ati:
Radeon 7xxx
Radeon 8xxx
Radeon 90xx
Radeon 92xx

Matrox:
G2xx
G4xx
G5xx
maybe Parhelia

The following should work:

Nvidia:
Geforce FX 5xxx
Geforce 6xxx
Geforce 7xxx

Ati:
Radeon 9500
Radeon 9550
Radeon 9600
Radeon 9700
Radeon 9800
Radeon Xxxx
Radeon X1xxx

where x means any digit.

Support:

This thread on the doom9 forum or my email address (tsp (at) person.dk).
TODO:

(maybe) noise pattern support. Fix all the stupid bugs. Add the directx 9.0b version back.
Changelog:

* 0.1 first release. Buggy and used Brook
* 0.2 sigma should now work like fft3dfilter
* 0.3 Rewrote the code to use Directx 9.0 directly and support for 16 bit float increasing performance and stability.
* 0.31 Fixed bug causing aliased edges.
* 0.4 Added sharpen, mode 1,2, reduceCPU and multithreading
* 0.41 Fixed bug when calculating PSD.
* 0.42 Fixed memory leak when reloading
* 0.43 Fixed bug that caused corruption on the Geforce FX cards and some more memory leaks. Added more comments to the sourcecode and small performance improvement in the shaders. Also added support for directx 9.0b
* 0.44 fft3dgpu can now reset a lost device and continue work. The directx 9.0b version should work now.
* 0.45 fixed bug when filtering the chromaplane and mode=0 or 2 crashed the filter.
* 0.46 fixed lockups on hyperthread enabled machines (hopefully). Also fixed infinite loop when closing WMP 6.4.
* 0.46.1 fixed issue with nvperf=true causing fft3dgpu to lock up. Added a FAQ section to this file.
* 0.47 fixed bug with corrupted frames after resetting a lost device. Renamed the readme.txt to fft3dgpu.txt. Uses a newer version of DirectX 9.0c so please _read the install instructions_!!!
* 0.5 Added Kalman, sharpening, bt=4, degrid from fft3dfilter. Renamed ps.hlsl to fft3dgpu.hlsl. Rewrote some of the code. Added new bugs.
* 0.5a fixed bug with bt=2. Only file changed is fft3dgpu.hlsl
* 0.51 Fixed bug with parameters after NVPerf was shifted, i.e. degrid=scutoff, scutoff=svr. Improved download speed from GPU. Geforce fx 5xxx now works with Kalman filter.
* 0.6 Added wintypes, plane=4 and variable overlap size (ow,oh). Change useFloat16 to precision. Changed default value for mode to 1
* 0.6.1 variable overlap now works on the geforce fx 5xxx. Default value for mode is 1 now.
* 0.6.2 bugfix: Degrid works better and vertical banding is gone when using mode 1. Right edge artifacts gone when using non mod 8 width and plane>0.
* 0.6.3 New fft code. Should improve performance when using larger blocksize and precision=2 (by up to 70%). Fixed bug with HC 0.17 crashing. New html doc (thanks to Fizick for creating this).
* 0.6.4 new fft code should now work with ati cards.
* 0.7 Added sigma2,sigma3 and sigma4 and support for interlaced filtering. Uses the fastest fft code now.
* 0.8 Added support for YUY2 colorspace. If not enough GPU memory is available the least used texture will be swapped to system memory.
* 0.8.1 Fixed crash when recovering lost device with plane=4 (thanks Fizick). Changed default for bt to 3 as in fft3dfilter
* 0.8.2 Fixed crash when recovering lost device with interlaced=true (thanks Fizick) and recovering lost device with bt=0 and sigma2,3,4 =sigma.



Sourcecode released under GPL see copying.txt

708145
13th February 2005, 22:26
Very nice indeed :D

Could somebody with a recent GPU please provide info about results, problems, speedup, ...?

It'll definitely help to convince me to get out and buy a new GPU ASAP ;)

bis besser,
Tobias

Soulhunter
13th February 2005, 22:41
Hrm, bt mode 3 gives me this... (http://img107.exs.cx/img107/1258/118ve.png) :\

But mode 1/2 works nice (720x576 @ ~10fps) !!!

My box: Athlon XP2800+ / 1024MB RAM / GeForce 6600GT


Bye

Fizick
13th February 2005, 23:25
Tsp,
talented work!
But not for my GF2MX400 :(
So, I will stay with fft3dfilter :)
BTW, what is "hole frame" ? Whole?

One more question:
Do you plan to implement all my other plugins on the GPU?
:D

tsp
14th February 2005, 08:52
BTW, what is "hole frame" ? Whole?


umm yes typo. It should be whole frame. So the borders are also filtered.


Do you plan to implement all my other plugins on the GPU?

Only the FPU heavy filters ;)

Also, how does the Kalman filter work, in case I should implement it?

Soulhunter: I get a similar result with bt=3. If you use usecache=false the chroma shift disappears (and so does the speed)
I'm trying to find out where the error is.

708145: On my computer an athlon xp 2400 MHz with an ASUS Geforce 6800 GT (V9999GT) I get about 10-11 fps @ 720x576
I'm a little curious how the radeons would perform.

bill_baroud
14th February 2005, 09:15
gah, i forgot my usb dongle, i don't have my screenshots...

well i tested, and got some weird results, quickly :

- it does not filter anything (??) but inserts some weird black squares on the image, of size bh/bw.

- it adds some black borders horizontally too.

- speed is about 5-6fps on my FX5900 (looks like it likes those fps :)

tsp
14th February 2005, 11:50
I fixed the chroma bug with bt=3. Also added a new option, reducecpu. If enabled, the CPU load is reduced (but so is the framerate, though hopefully that will be fixed someday).
Same link as before.

bill_baroud: What driver are you using? What size is the image? What does the script look like? This filter only processes YV12.

bill_baroud
14th February 2005, 13:25
uh yeah, i forgot ... Source is MJPEG (avi) or MPEG4v2, 768x576 (PAL cap) or 832x480. My script just converts to YV12 and uses fft3dgpu() with default settings (well, i tried changing the other settings, but with no luck, it only changes the size of the black squares).

Drivers ??? huh ... i don't think they are the latest, something like 66.77.

I also tried other colorspace as input, but the results were really funky as expected, and not like my bug.

tsp
14th February 2005, 15:35
bill_baroud: I have tested the filter with version 66.93 and 71.80 both didn't show any artifacts. You could try to update the driver.

Fizick
14th February 2005, 23:27
tsp,
But sigma=1 in fft3dfilter ~ sigma=25 in fft3dGPU.

How about compatibility? I use:

norm = 1.0f/(bw*bh); // do not forget set FFT normalization factor
sigma2NoiseNormed = bt*sigma*sigma/norm; // normalize noise value

Backwoods
15th February 2005, 09:32
GeForce 6800 OC

720x272

FFT3dFilter = 10-12fps

FFT3dGPU = 12-16fps

(sigma=3.0, bt=3, bh=32 ,bw=32) for both filters.

tsp
15th February 2005, 09:44
Fizick, I have added the normalization code to sigma. The only thing I can't seem to figure out is how to apply the 2d window function. At the moment I'm using a 1d window, but this produces artifacts with sigma values above 10.
When just multiplying the cosx and cosy values I get a checkerboard pattern (when the picture is shifted by bw/2 and bh/2 and summed, the factors don't add up to 1).

edit

nevermind I cheated and used this as the window function:

void ImgStream::CreateFactorMap(float* Map,int x,unsigned int xnum,int y,unsigned int ynum,bool shift)
{
    // x and y are used as the window period, xnum and ynum as the number of repetitions.
    // pi is not defined in this snippet; it is assumed to be defined elsewhere in the source.
    double cosy,cosx;
    unsigned int offset=0;
    double x1=x;
    double y1=y;
    //xnum=xnum/2;
    //ynum=ynum/2;
    for(unsigned int repy=0;repy<(ynum+shift);repy++){
        // with shift set, the first and last repetition start at 0 and cover only half a period
        for(double n1=(shift&&(repy==0||repy==ynum)?0:-y1/2.0)+0.5;n1<y1/2.0;n1++){
            cosy=cos(n1*pi/(y1));
            for(unsigned int repx=0;repx<(xnum+shift);repx++){
                for(double n2=(shift&&(repx==0||repx==xnum)?0:-x1/2.0)+0.5;n2<x1/2.0;n2++){
                    cosx=cos(n2*pi/(x1));
                    Map[offset++]=sqrt(0.5*(cosx*cosx+cosy*cosy));
                }
            }
        }
    }
}


I have uploaded version 0.2 where the sigma values should work like Fizick's fft3dfilter.
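
Why the "cheated" window works where the plain product does not can be checked numerically. The sketch below is not from the fft3dGPU source. It assumes that the window is applied twice (once before the FFT and once after the inverse FFT), so the effective weight is the square of the map value, and that each pixel is covered by exactly two blocks, the second one shifted by (bw/2, bh/2). All function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

// cos^2 of the window argument used in CreateFactorMap for sample i of a
// block of size n (i = 0 .. n-1).
double Cos2(int i, int n)
{
    double c = std::cos((i + 0.5 - n / 2.0) * kPi / n);
    return c * c;
}

// Effective weight (map value squared) of sqrt(0.5*(cosx^2+cosy^2)).
double SumWindowWeight(int ix, int iy, int bw, int bh)
{
    return 0.5 * (Cos2(ix, bw) + Cos2(iy, bh));
}

// Effective weight (map value squared) of the plain product cosx*cosy.
double ProductWindowWeight(int ix, int iy, int bw, int bh)
{
    return Cos2(ix, bw) * Cos2(iy, bh);
}

// Total weight a pixel receives from the two blocks covering it when the
// second block is shifted by half a block in both directions.
double OverlapTotal(double (*weight)(int, int, int, int),
                    int ix, int iy, int bw, int bh)
{
    assert(bw % 2 == 0 && bh % 2 == 0);
    int sx = (ix + bw / 2) % bw;  // same pixel, seen from the shifted block
    int sy = (iy + bh / 2) % bh;
    return weight(ix, iy, bw, bh) + weight(sx, sy, bw, bh);
}
```

Under these assumptions the shifted block sees sin^2 where the first block sees cos^2, so OverlapTotal(SumWindowWeight, ...) is 0.5*(1+1) = 1 at every pixel, while OverlapTotal(ProductWindowWeight, ...) varies from pixel to pixel, which would show up as a pattern in the output.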

Antitorgo
15th February 2005, 18:33
Hmm... seems really slow.

The previous version I tested got something like 6-7fps now I get 2-3fps... This is with reduce CPU set to false (I tried true and got the same framerate with lower CPU utilization)... This could be because of something on my laptop or something tho.

From the stuff on the AviShader thread...

On the sleep() calls, I do them before copying the texture back. In D3D, as soon as the DrawPrimitive() call happens, the GPU begins its thing, so any work you do between there and copying the texture back is the place to do things. For example, you can start copying the next frame up to the GPU or any sort of preprocessing on the CPU.

As far as my channel idea... if you are uploading a 1 channel 8-bit image (typically Luma) to the GPU, I'm guessing that brook is doing packing/unpacking into a 32-bit texture (which is native on the GPU) at 1/4 the width. This leads to an inefficiency because of the packing/unpacking that has to happen on the GPU and just adds overhead. My idea was to upload 4 frames into each channel on a 32-bit texture, then you can run the shader across 4 frames at a time. In your case, it is a little complicated because you have your shifted/multiplied frame thing going on... so I'm not sure if it is applicable in your case (In avishader, I expect it to give me a huge performance boost when I get around to implementing it)...

tsp
15th February 2005, 19:55
Antitorgo I don't get lower framerates with the new version. Strange.
Also in this version the sleep is placed just before EndScene. The texture is uploaded as D3DFMT_L8 and then converted to D3DFMT_A32B32G32R32F, where all the calculation is done (FFT requires float), before being converted back to D3DFMT_A8R8G8B8 for download. But I'm convinced that I will have to rewrite the filter without brook if I am to optimize it further. So now I just have to learn how to set up DirectX to do the rendering :p

Antitorgo
17th February 2005, 17:30
Yeah, copying back using the A8R8G8B8 is what has always killed me too, because it is the slowest operation and has to transfer 4x as much data as necessary. That is why I was thinking that the 4 frame at a time deal would work well...

If you want the source for AviShader which has all the D3D stuff, PM me and I'll see what I can do. D3D is pretty straightforward once you grok it.

tsp
22nd February 2005, 20:25
When using A8R8G8B8 (or fixed4 in brook) I pack 4 pixels to avoid wasting bandwidth (see FFT3dshader.br for the brook shaders).
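
The idea of packing four 8-bit pixels into one 32-bit texel can be illustrated on the CPU side. This is a minimal sketch, not the code from FFT3dshader.br: the function names are hypothetical, and the byte order chosen here (first pixel in the lowest byte) is an arbitrary assumption that may not match the channel layout the filter actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack every four neighbouring 8-bit samples into one 32-bit word, so a
// four-channel 8-bit texel carries four pixels instead of one.
std::vector<std::uint32_t> PackLuma(const std::vector<std::uint8_t>& luma)
{
    assert(luma.size() % 4 == 0);  // sketch only: no handling of leftovers
    std::vector<std::uint32_t> packed(luma.size() / 4);
    for (std::size_t i = 0; i < packed.size(); ++i) {
        packed[i] = std::uint32_t(luma[4 * i])
                  | (std::uint32_t(luma[4 * i + 1]) << 8)
                  | (std::uint32_t(luma[4 * i + 2]) << 16)
                  | (std::uint32_t(luma[4 * i + 3]) << 24);
    }
    return packed;
}

// Inverse of PackLuma: split each 32-bit word back into four samples.
std::vector<std::uint8_t> UnpackLuma(const std::vector<std::uint32_t>& packed)
{
    std::vector<std::uint8_t> luma;
    luma.reserve(packed.size() * 4);
    for (std::uint32_t word : packed) {
        for (int shift = 0; shift < 32; shift += 8) {
            luma.push_back(std::uint8_t((word >> shift) & 0xFFu));
        }
    }
    return luma;
}
```

Packed this way, a 720 pixel wide luma row needs only 180 texels, so no channel of the downloaded texture is wasted.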

tsp
14th March 2005, 22:57
Released version 0.3. It's a major rewrite that now uses DirectX directly instead of brook. The shaders are also optimized, and the filter can now use float16 (2 byte float) instead of float32 (4 byte float or single precision) for storing the calculations, causing an up to 150% speed increase compared to version 0.2.
These are the framerates for version 0.2, 0.3 and fft3dfilter using this syntax:
fft3dGPU(bt=1,sigma=2)
fft3dfilter(bt=1,sigma=2)
on a 720x576 clip:

fft3dfilter 7.0 FPS
fft3dGPU 0.2 11.0 FPS
fft3dGPU 0.3 24.3 FPS

this is on an athlon XP 2400 MHZ (nforce-2 chipset 200 MHz DDR ram)
and a Geforce 6800 GT 128 MB RAM (411 MHz core/742 MHz mem).

So about 3.5 times as fast as fft3dfilter (24.3 vs 7.0 FPS). Nice...

Please report whether the filter works with your graphics card, which card it is, and how fast it runs.

I have only tested this version with my geforce 6800 GT so I don't know how well it works with geforce FX 5xxx and Radeon Xxxx and 9xxx so please try and report back.

Backwoods
15th March 2005, 00:52
AVISource("otto.avi")
ConvertToYV12()
(sigma=3,bt=3)

FFT3DFilter 4-6 fps
FFT3dGPU 8-12 fps

720x480

GeForce 6800OC
2.8HT
1gig RAM

And I noticed the AA problem too.

tsp
15th March 2005, 10:26
Fixed the bug causing the aliased edges. Get the fixed version from the first post.

Also it would be nice if you could post the driver version you are using.

Backwoods: I'm a little curious why you only get 8-12 fps when I get about 18-21 fps on a 720x576 clip using bt=3. Maybe it's the 4 extra pipelines in the geforce 6800 GT.

Blue_MiSfit
15th March 2005, 13:20
AWESOME!

On my 9800 pro I get about 7-15 fps (encoding into cq2 qpel vhq4(&bvop) xvid) depending on the scene with a crop, lanczosresize(), removegrain(mode=2) and unfilter(-5,-5) before it.

Powerful denoising without smudging the image too much. I really like it so far for the new Star Wars DVDs, which have an absurd amount of noise (gives 6of9 nightmares when barely filtered!!).

More later

~misfit

Didée
15th March 2005, 13:57
Originally posted by Blue_MiSfit
(gives 6of9 nightmares when barely filtered!!)
No, it doesn't give nightmares to 6of9.

It gives you nightmares because of the achieved high bitrates ... ;)

For pressing very noisy sources into tiny files, 6of9 is not suited, by intention.

Backwoods
15th March 2005, 22:59
Originally posted by tsp
Backwoods: I'm a little curious why you only get 8-12 fps when I get about 18-21 fps on a 720x576 clip using bt=3. Maybe it's the 4 extra pipelines in the geforce 6800 GT.

Just installed 71.84 and fft3dgpu 0.31 and now:

12~22 fps maintained 16-18

720x480

Xvid Q2

vinetu
16th March 2005, 23:11
I did some tests and here are the results.
The CPU is a P4 1.8GHz overclocked to 2.9GHz. The VGA card is a Radeon 9600 Non Pro.
The source avi is an uncompressed, progressive 720x576 YV12.avi, 273 frames, almost static natural video.
The avs script is :
-------------
LoadPlugin("fft3dGPU.dll")
Avisource("X:\YV12.avi")
fft3dGPU(bla,bla)
-------------
The "X:" drive is a 500 Mb RAMDiSK drive.
Processed in VirtualDubMod in direct stream mode and the filtered avi is saved on the same RAM drive -no HDDs involved...

It's really impossible for me to see the differences between the original and filtered images,
so I decided to compress the filtered avi files afterwards to XviD (single pass, quant 2, no B-frames) to "visualize" the filtering.

The chain is "uncompressed->fft_filter->uncompressed".

No filtering: XviD avi size 9,861,120 bytes

| settings                        | fft3dfilter (fps / XviD avi size) | fft3dGPU (fps / XviD avi size) |
| (sigma=2.0, bt=1)               | 10.11 fps / 8,026,112 bytes       | 18.20 fps / 9,439,232 bytes    |
| (sigma=3.0, bt=3)               | 7.58 fps / 6,307,840 bytes        | 16.05 fps / 7,495,680 bytes    |
| (sigma=3.0, bt=3, bh=16, bw=16) | 5.46 fps / 6,408,192 bytes        | 19.50 fps / 8,501,248 bytes    |
| (sigma=3.0, bt=3, bh=48, bw=48) | 7.00 fps / 6,305,792 bytes        | 13.00 fps / 6,516,736 bytes    |
| (sigma=3.0, bt=3, bh=48, bw=48) | -                                 | 14.37 fps (R9600 overclocked from 325/202 to 425/225 core/memory) |



This test is my first try with fft based filtering. I'm VERY impressed by the compressibility results without distorting the original.

Bow to the ground to both of you Fizick and Tsp!!!

Edit: text formatting :(

tsp
16th March 2005, 23:57
thanks for the feedback. Wonder if someone with a Geforce 6800 Ultra SLI could test this filter ;)

vinetu: Please note that if bw or bh is not a power of two (4,8,16,32,64,128,256,512) it's rounded up to the next power of two, e.g. 48 -> 64. This is not the case when using fft3dfilter because it uses fftw. Also, it should be faster to compress directly to XviD when using fft3dGPU, because the CPU cycles otherwise wasted waiting on the graphics card are used to encode the last processed frame (if the program is multithreaded, which virtualdubmod is).
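
The rounding described above (48 -> 64) can be sketched as follows. The helper name is hypothetical and the loop is just one straightforward way to do it, not necessarily how fft3dGPU implements it.

```cpp
#include <cassert>

// Round a block size up to the next power of two; powers of two are
// returned unchanged.
unsigned int RoundUpToPowerOfTwo(unsigned int n)
{
    assert(n > 0);
    unsigned int p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

This is why the bw=48, bh=48 rows in the test above effectively measured fft3dGPU with 64x64 blocks.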

vinetu
17th March 2005, 00:22
Thank You tsp!!!

In addition to speed tests - there is no difference in speed between AGPx8 and AGPx4 modes here,
so I guess PCI-E cards should have equal performance to AGP ones (if core chip/memory is the same)

Best Regards

Soulhunter
18th March 2005, 08:35
Uhm, the latest version throws an error... :\

http://img100.exs.cx/img100/866/9869sy.png (http://www.imageshack.us)

tsp
18th March 2005, 11:47
Soulhunter: You should copy the file ps.hlsl from the zip file into the same directory as fft3dgpu.dll (in this case c:\programme\Avisynth 2.5\plugins\)

Soulhunter
18th March 2005, 13:13
Ouch, I feel very stupid now... :D

Guess it was one of these "drag&drop" errors !?!


Bye

Leo 69
18th March 2005, 21:10
With this filter I get resized picture (i.e reduced one) with bunch of big black artifacts all over the place. I use GeForce FX 5900 NU @ 71.84 official drivers. :(

tsp
19th March 2005, 00:10
Leo 69: Damn I hoped the new version would work on a Geforce Fx. Would you try this (http://www.tsp.person.dk/test.zip) version and see if it works with bt=1 or bt=2 or both. In this test version the filtering is disabled so it is just to see where the error is.

Blue_MiSfit
19th March 2005, 07:04
@ Didee

No, it doesn't give nightmares to 6of9.

It gives you nightmares because of the achieved high bitrates ...

For pressing very noisy sources into tiny files, 6of9 is not suited, by intention.


I was actually doing a cq2 compressibility test for 6of9

Leo 69
19th March 2005, 14:11
Originally posted by tsp
Leo 69: Damn I hoped the new version would work on a Geforce Fx. Would you try this (http://www.tsp.person.dk/test.zip) version and see if it works with bt=1 or bt=2 or both. In this test version the filtering is disabled so it is just to see where the error is.

Yes, the test version works fine :)

tsp
19th March 2005, 14:55
Leo 69: Also with fft3dGPU(bt=1)?? If that is the case it shouldn't be too hard to fix the error.

Leo 69
19th March 2005, 19:39
Originally posted by tsp
Leo 69: Also with fft3dGPU(bt=1)?? If that is the case it shouldn't be too hard to fix the error.

Yes, everything's OK with bt=1 too, tsp

tsp
19th March 2005, 20:44
Leo 69: Good then try version 0.31 again but change the following passage in the end of ps.hlsl(should be in the same directory as fft3dgpu.dll. It's an ordinary text file so use notepad to open it)


//*******************************************************************
#ifdef BETA
float4 WFilter( PS_INPUT In) : COLOR
{
    float4 src=tex2D(Src,In.texCoord);
    float2 PSD=float2(length(src.xz),length(src.yw));
    float4 MulFac=float4(BETA.x,BETA.x,BETA.x,BETA.x);
    if(SIGMA.x<PSD.x)
        MulFac.xz=((PSD.x-SIGMA.y)/PSD.x);
    if(SIGMA.x<PSD.y)
        MulFac.yw=((PSD.y-SIGMA.y)/PSD.y);
    return MulFac*src;
}
#endif
//*******************************************************************


to this:

//*******************************************************************
#ifdef BETA
float4 WFilter( PS_INPUT In) : COLOR
{
    float4 src=tex2D(Src,In.texCoord);
    float2 PSD=float2(length(src.xz),length(src.yw));
    float4 MulFac;
    float4 dst;
    MulFac.xz=((PSD.x-SIGMA.y)/PSD.x)*(SIGMA.x<PSD.x)+(SIGMA.x>=PSD.x)*float2(BETA.x,BETA.x);
    MulFac.yw=((PSD.y-SIGMA.y)/PSD.y)*(SIGMA.x<PSD.y)+(SIGMA.x>=PSD.y)*float2(BETA.x,BETA.x);
    dst=MulFac*src;
    return dst;
}
#endif
//*******************************************************************

if that doesn't work try this version

//*******************************************************************
#ifdef BETA
float4 WFilter( PS_INPUT In) : COLOR
{
    float4 src=tex2D(Src,In.texCoord);
    float2 PSD=float2(length(src.xz),length(src.yw));
    float4 dst;
    if(SIGMA.x<PSD.x)
        dst.xz=src.xz*((PSD.x-SIGMA.y)/PSD.x);
    else
        dst.xz=src.xz*float2(BETA.x,BETA.x);
    if(SIGMA.x<PSD.y)
        dst.yw=src.yw*((PSD.y-SIGMA.y)/PSD.y);
    else
        dst.yw=src.yw*float2(BETA.x,BETA.x);
    return dst;
}
#endif
//*******************************************************************
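
For readers who want to experiment with the WFilter logic outside a shader, the two formulations above (explicit if versus multiplying by the comparison result) can be sketched for a single coefficient in C++. This is an illustration only: the function and parameter names are hypothetical, threshold and noise simply mirror SIGMA.x and SIGMA.y (whose exact meaning is not given in the thread), and psd is assumed to be strictly positive.

```cpp
#include <cassert>

// Gain for one coefficient, written with an explicit branch as in the
// first and third shader variants.
double WienerGainBranching(double psd, double threshold, double noise, double beta)
{
    assert(psd > 0.0);
    if (threshold < psd) {
        return (psd - noise) / psd;
    }
    return beta;
}

// The same gain written without a branch, as in the second variant: each
// comparison contributes 1.0 or 0.0 and selects one of the two terms.
double WienerGainBranchFree(double psd, double threshold, double noise, double beta)
{
    assert(psd > 0.0);  // with psd == 0 the division would yield inf or NaN
    double above = (threshold < psd) ? 1.0 : 0.0;
    double below = (threshold >= psd) ? 1.0 : 0.0;
    return ((psd - noise) / psd) * above + below * beta;
}
```

Both functions return the same value for any positive psd, so a difference in behaviour on the GPU would point at how the hardware or compiler handles the branch rather than at the formula.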

Leo 69
19th March 2005, 21:41
None of the script versions work (properly), tsp. Overall Bt=2 mode gives largest amount of artifacts and by the way my mouse constantly stops responding for very short periods of time during playback (~0.2 sec or so).With test version too.

tsp
19th March 2005, 23:13
hmm could you post a screenshot of the artifacts? Also, does this version produce artifacts? (It disables the filtering, but it's mainly to confirm that the bug lies in the if statements.)

//*******************************************************************
#ifdef BETA
float4 WFilter( PS_INPUT In) : COLOR
{
    float4 src=tex2D(Src,In.texCoord);
    float4 dst;
    dst=src;
    return dst;
}
#endif
//*******************************************************************

Also I haven't heard about the mouse problem before. But I will see what I can do about it later when I get this filter working on a geforce FX (even if I have to buy a geforce FX 5200 to test on. Anyone has a spare one?)

LordIntruder
21st March 2005, 05:09
Hi,


I tested your work, TSP, and here is what I get:

I encoded a 10000-frame video (720 x 528) with this parameter for FFT:
'FFT3DFilter(sigma=3, bt=3, measure=true)'

For FFT3dGPU I used:
'fft3dGPU(sigma=3, bt=3)'

On a 2400+, 1Gb Ram, AGP Radeon 9600 Pro with latest drivers, Windows XP SP2, DirectX 9.0c. Neither my CPU nor my GPU is overclocked.

Without FFT:

1st Pass = 13 min
2nd Pass = 37 min

FFT Measure True:

1st Pass = 53 min
2nd Pass = 80 min

FFT Measure OFF:

1st Pass = 59 min
2nd Pass = 82 min

FFT3dGPU

1st Pass = 17 min
2nd Pass = 40 min

In my example I selected the heaviest solution with the latest XviD beta: Qpel, GMC, VHQ4, VHQ for bframes, Chroma motion, etc... So this is why the encode is so slow. If you untick Qpel, use VHQ1 and disable chroma motion, the speed goes through the roof :D

The GPU version is amazingly fast !!!! I can't believe it!!! :eek: :eek:

As far as my eyes can see, the quality seems the same between your filter and the original by Fizick. Can you confirm the only difference is the 16 bit float (useFloat16)? Except for that option (which we can still set to 32 bits), we are supposed to get the same quality as the original filter, right?

A last thing I don't understand, I quote you

--
"usecache: if enabled the frames are saved in the GPU after the 2d FFT to avoid calculating them again the next frame if bt=2 or 3.
It can be necessary to disable this internal cache if using motion compensation. Default = true"
--

What do you mean by motion compensation? The GMC option in DivX or XviD? What drawbacks are we supposed to get? Artifacts, I suppose? And does turning this option off slow down the encode a lot?

A great thanks for your work, Fizick's filter is very good but so slow. A good idea you've got here. :) I wouldn't imagine that my video card helps me to encode faster. Another reason to buy a fast new card :D

tsp
22nd March 2005, 01:24
Originally posted by LordIntruder

As far as my eyes can see, the quality seems the same between your filter and the original by Fizick. Can you confirm the only difference is the 16 bit float (useFloat16)? Except for that option (which we can still set to 32 bits), we are supposed to get the same quality as the original filter, right?

Ahemm I just discovered that my filter cheats a little more than just using 16 bit float. I'm only using a 1:1 overlap instead of a 2:1 overlap, which means that the filter only does half as many calculations as fft3dfilter (explaining in part why it's 4 times as fast). This results in border artifacts when using high sigma values (about 2.5-10 depending on bw/bh; the higher bw is, the lower the sigma at which artifacts appear). The artifact is a 1-2 pixel wide dark border, like in this image:
http://www.tsp.person.dk/bug.png

I will do two things about that:
1) Implement the 2:1 overlap(This will cut the speed in half :mad: ).
2) and as an option for the speed hungry people. Just use a slightly larger blocksize and then crop the borders.

Until that is implemented, be a little extra careful with high sigma values.


A last thing I don't understand, I quote you

--
"usecache: if enabled the frames are saved in the GPU after the 2d FFT to avoid calculating them again the next frame if bt=2 or 3.
It can be necessary to disable this internal cache if using motion compensation. Default = true"
--

What do you mean by motion compensation? The GMC option in DivX or XviD? What drawbacks are we supposed to get? Artifacts, I suppose? And does turning this option off slow down the encode a lot?

No it was mainly aimed at MVTools but I don't think it will cause artifact anyway so I will disable this option in the next version.

tsp
1st April 2005, 20:16
released version 0.40. Now includes sharpening, 2:1 overlap and 1:1 overlap with border. Also better optimized for multitasking (the filters before fft3dgpu are processed at the same time as fft3dgpu)

vinetu
3rd April 2005, 00:10
Hi!
Some numbers again: I did the "compressibility test" on the same source (273 frames, PAL)

intel P4, Radeon 9600
non filtered Xvid.avi size 9,861,120 bytes

fft3dGPU v.0.31 (sigma=2.0,bt=1) ->9,439,232 bytes
fft3dGPU v.0.40 (sigma=2.0,bt=1) ->9,441,280 bytes

fft3dGPU v.0.31 (sigma=3.0,bt=3) ->7,495,680 bytes
fft3dGPU v.0.40 (sigma=3.0,bt=3) ->7,489,536 bytes

fft3dGPU v.0.31 (sigma=3.0,bt=3,bh=16,bw=16) ->8,501,248 bytes
fft3dGPU v.0.40 (sigma=3.0,bt=3,bh=16,bw=16) ->8,497,152 bytes

fft3dGPU v.0.31 (sigma=3.0,bt=3,bh=64,bw=64) ->6,516,736 bytes
fft3dGPU v.0.40 (sigma=3.0,bt=3,bh=64,bw=64) ->6,500,352 bytes
fft3dGPU v.0.40 (sigma=3.0,bt=3,bh=64,bw=64,mode=1) ->6,019,072 bytes

Still no visible artifacts here :)

Thank You!

P.S. Just curious why "FFT3DFilter(sigma=3.0,bt=3,bh=16,bw=16)"
produces a much smaller file: 6,408,192 bytes vs 8,497,152 bytes (by fft3dGPU)

Fizick
4th April 2005, 05:53
tsp,
The speed results of your filter are great!

But what do your "1:1 overlap" and "2:1 overlap" mean?
Kokaram (and I) use, say, 16-pixel-wide blocks, with every next block shifted by 8 pixels (right, bottom). So the one-side overlap is 8 pixels for every block and the whole block width is overlapped: the total overlap (left plus right) for a block is equal to its width = 16 pixels.
I think that is full (maximum possible) overlapping (for a simple algo).

Is that your "1:1" or "2:1"?

I have now created (not released yet) a new version of FFT3DFilter with partial overlapping (with arbitrary overlap size), and I am confused by the terms. I want to use a new parameter "overlap width" as the one-side overlap size, with a maximum value equal to half of the block width.
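
To make the terms concrete, here is a minimal Python sketch of where blocks start along one axis for a given block width and one-side overlap. The name `block_origins` and its arguments are illustrative only (they are not FFT3DFilter code), and padding at the image edges is ignored.

```python
def block_origins(length, bw, ow):
    """Left edges of blocks of width bw along an axis of `length` pixels,
    where neighbouring blocks share `ow` pixels (the one-side overlap).

    Hypothetical helper for illustration; edge padding is not modelled.
    """
    if not 0 <= ow <= bw // 2:
        raise ValueError("one-side overlap must be between 0 and bw/2")
    step = bw - ow  # each block is shifted by this many pixels
    return list(range(0, max(length - ow, 1), step))


# Full overlap as described above: 16-pixel blocks shifted by 8 pixels.
print(block_origins(64, 16, 8))
# Partial overlap: only 4 pixels shared on each side.
print(block_origins(64, 16, 4))
```

With bw=16 and ow=8 every pixel away from the edges is covered by two blocks along each axis; with a smaller ow the middle of each block is covered by that block alone.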

tsp
4th April 2005, 07:45
Fizick: What you describe is my 2:1 overlap = mode 1. In the 1:1 overlap (mode=0) the blocks are only shifted half of bh down and half of bw to the right, so each 1/4 of a block is overlapped by only 1 block (compared to 3 blocks when using mode=1). When using mode 2, only bw minus the border is used for overlapping (mainly because the artifacts are most severe at the borders), so this is nearly the same as partial overlap. Mode 0 and mode 2 use a different window function than the one used in mode 1.
This image shows the different modes:
http://www.tsp.person.dk/overlap.png

So if you want to compare fft3dgpu with fft3dfilter, use:
fft3dgpu(mode=1,usefloat16=false)

Fizick
4th April 2005, 21:06
tsp,
thanks for the response and the nice pic. But I do not quite understand it.
I drew my overlap pic in the fft3dfilter thread.
:)

tsp
4th April 2005, 21:38
Fizick: From your drawing it looks like the center of a block isn't overlapped at all. Is that true?
Also, mode 1 in fft3dgpu and your default mode are the same. So look carefully at the (ugly) drawing of mode 1 and you can see four differently colored blocks (dotted blue, dotted dark green, solid red and solid light green). To filter a 720x576 image we need to FFT ~720/bw*576/bh*4 blocks. When using mode 0 there are only two overlapped blocks (red and green), meaning only ~720/bw*576/bh*2 blocks. And finally mode 2 needs ~720/(bw-borderwidth*2)*576/(bh-borderheight*2)*2 blocks.
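
As a rough illustration of those estimates, the three formulas can be written out as a small Python function. This is only a sketch of the arithmetic in the post: `approx_block_count` is a made-up name, and `borderwidth`/`borderheight` are taken from the formula above rather than from the filter's actual parameter list.

```python
def approx_block_count(width, height, bw, bh, mode,
                       borderwidth=0, borderheight=0):
    """Approximate number of blocks to FFT per frame, per the post's formulas."""
    if mode == 1:   # 2:1 overlap: four shifted block grids
        return width / bw * height / bh * 4
    if mode == 0:   # 1:1 overlap: two shifted block grids
        return width / bw * height / bh * 2
    if mode == 2:   # 1:1 overlap, borders excluded from the usable area
        return (width / (bw - borderwidth * 2)
                * height / (bh - borderheight * 2) * 2)
    raise ValueError("mode must be 0, 1 or 2")


for mode in (0, 1, 2):
    # The border size of 4 is an arbitrary example value (only used by mode 2).
    print(mode, approx_block_count(720, 576, 32, 32, mode,
                                   borderwidth=4, borderheight=4))
```

For a 720x576 frame with 32x32 blocks this gives about 810 blocks for mode 0 and 1620 for mode 1, which is the factor of two in work discussed earlier in the thread.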

tsp
5th April 2005, 22:11
Found a bug. I forgot to square the modulus of the transformed image when calculating the power spectral density. I will release a new version shortly; until then, the quick fix is to change line 518 in ps.hlsl from

float2 PSD=float2(length(src.xz),length(src.yw));

to

float2 PSD=float2(src.x*src.x+src.z*src.z,src.y*src.y+src.w*src.w);
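
The difference between the two lines can be checked with a few lines of Python. This is a sketch, assuming (as the swizzles suggest) that the four components hold two complex values, (x, z) and (y, w); which component is the real part is not stated in the thread, and the function names are made up.

```python
import math


def psd_before_fix(src):
    """Modulus of each complex value: what length(src.xz) and length(src.yw) give."""
    x, y, z, w = src
    return (math.hypot(x, z), math.hypot(y, w))


def psd_after_fix(src):
    """Squared modulus, i.e. the power, with no square root needed."""
    x, y, z, w = src
    return (x * x + z * z, y * y + w * w)


sample = (3.0, 5.0, 4.0, 12.0)   # complex values (3, 4) and (5, 12)
print(psd_before_fix(sample))    # moduli
print(psd_after_fix(sample))     # squared moduli
```

The fixed version returns the square of what the old version returned, so the comparison against sigma was previously made against the wrong quantity.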

Fizick
6th April 2005, 21:46
tsp,
Yes, the center is not overlapped in my partial overlap mode.

So I conclude that the full overlap mode of my FFT3dfilter(bw=32,bh=32, ow=16, oh=16) is the same as your FFT3DGPU(mode=1, bw=32,bh=32).
But my partial overlap FFT3dfilter(bw=32,bh=32, ow=8, oh=8) is NOT the same as your partial overlap
FFT3DGPU(mode=0, bw=32,bh=32).
So users may compare the results (quality) of the different approaches
(after you fix the recent bug, and I fix my quite possible bugs: I rewrote many lines of code in v.0.9).

tsp
6th April 2005, 22:50
Released version 0.41. The only new things are the above bugfix plus a fix for a minor bug when calculating sigma (when mapping from 0-255 to 0-1, divide by 255, not 256, doh).

LordIntruder
9th April 2005, 02:55
Hi,


Some speed measurements again. I put old results back here for better readability:

Athlon 2400+, Radeon 9600 Pro, 1 Gb Ram.

I encoded a 10000-frame video (720 x 528) with these parameters for FFT:

'FFT3DFilter(sigma=3, bt=3)'

For FFT3dGPU I used:
'fft3dGPU(sigma=3, bt=3)'

Without FFT
1st Pass = 13 min
2nd Pass = 37 min

FFT Measure True (v0.8.3)
1st Pass = 53 min
2nd Pass = 80 min

FFT Measure OFF (v0.8.3)
1st Pass = 59 min
2nd Pass = 82 min

FFT3dGPU (v0.3)
1st Pass = 17 min
2nd Pass = 40 min
--------

Today I encoded the exact same clip with the updated versions:

'FFT3DFilter(sigma=3, bt=3)'
'fft3dGPU(sigma=3, bt=3, mode=1, reduceCPU=false)'

FFT Measure OFF (v0.9.1)
1st Pass = 28 min
2nd Pass = 55 min

FFT3dGPU (v0.4)
1st Pass = 25 min
2nd Pass = 49 min

FFT3dGPU (v0.41 reduceCPU=false)
1st Pass = 29 min
2nd Pass = 57 min

FFT3dGPU (v0.41 reduceCPU=true (default))
1st Pass = 21 min
2nd Pass = 38 min

I thought 'reduceCPU=false' increased encoding speed when I first read your explanation of this option. In fact it decreases the speed, and the default (true) is already the fastest.

In short, 0.41 is faster than 0.4 (I was afraid that the more complex math to fix the bug would increase encoding time, and it is the opposite, good). However, as you can see, I used mode=1 (so 2:1 overlap) for the GPU version. Tsp, you told me the encoding speed should be cut in half, but that is not the case despite twice as many calculations. Normal or a bug?

We also can see that the 3DNow optimizations help a lot for the normal version of FFT. That is really great. :D

Is the GPU version 3Dnow or SSE optimized? If not I hope you intend to do it, we would get some speed. :D

Finally, does the GPU version work as something multi-threaded, I mean as if there were 2 CPU cores like in the forthcoming Intel and AMD processors? Maybe that is the way it works (differently of course, but that's the idea), just curiosity. I was thinking about some general code that could be used by other filters.

I mean you use a special DLL or something like that, some parameters in the AVS, and thanks to this the calculation would be done half by the CPU, half by the GPU. Maybe it's impossible, just an idea, but that way not only the FFT filter would benefit from the GPU but also other filters and/or general calculations. Instead of optimizing each filter you write general parameters and any filter can benefit. A crazy idea ;)

Oh, before I forget: with both FFT and FFTGPU (and only them), when I start the job using the latest version of VirtualDubMod, most of the time it closes itself. I launch VDM again, start the job and the 1st pass starts. Then at the end of the 1st pass VDM sometimes closes again and I need to relaunch it manually and start the 2nd pass myself; other times the 2nd pass is launched after the 1st finishes. It happens randomly. I'm using the Avisynth 2.56 build from 31 Jan and have just seen that another beta from February 21 is out. Will give it a try.

Anyway cheers to both of you on the work done on these filters :)

tsp
9th April 2005, 23:22
Originally posted by LordIntruder


I thought 'reduceCPU=false' increased encoding speed when I first read your explanation of this option. In fact it decreases the speed, and the default (true) is already the fastest.

There is a good explanation for this. avisynth (fft3dgpu) uses less CPU time when reduceCPU=true. This means that XviD gets more time to do the encoding, so the encode time decreases even if avisynth takes a little longer to process a frame. If reduceCPU=false, the extra CPU time is wasted instead of being used to encode. You could try repeating the test with Huffyuv or another fast codec (MJPEG) instead of XviD, and you would get some very different results (at least that's what I think would happen).


In short, 0.41 is faster than 0.4 (I was afraid that the more complex math to fix the bug would increase encoding time, and it is the opposite, good). However, as you can see, I used mode=1 (so 2:1 overlap) for the GPU version. Tsp, you told me the encoding speed should be cut in half, but that is not the case despite twice as many calculations. Normal or a bug?

First, the bugfix made the math simpler. Instead of calculating the modulus/length of the complex number/vector (squareroot(a^2+b^2)), the modulus/length squared is used (just a^2+b^2), so the square root isn't needed (and that's an expensive operation).
I must admit that I'm a little surprised that the speed decrease wasn't bigger, but again I think it's because the GPU takes more time, meaning that XviD gets more time to encode, which somewhat offsets the extra time used (multiprocessing is very nice). Again, if you use Huffyuv or MJPEG you would see a greater speed decrease.



We also can see that the 3DNow optimizations help a lot for the normal version of FFT. That is really great. :D

Is the GPU version 3Dnow or SSE optimized? If not I hope you intend to do it, we would get some speed. :D

fft3dgpu doesn't need 3DNow or SSE because all the math-heavy calculations are done on the GPU, which uses a very different instruction set (basically all the commands used are like SSE on steroids). Maybe some speed could be gained by using assembly instead of HLSL (the C-like language used by DirectX).



Finally, does the GPU version work as something multi-threaded, I mean as if there were 2 CPU cores like in the forthcoming Intel and AMD processors? Maybe that is the way it works (differently of course, but that's the idea), just curiosity. I was thinking about some general code that could be used by other filters.

I mean you use a special DLL or something like that, some parameters in the AVS, and thanks to this the calculation would be done half by the CPU, half by the GPU. Maybe it's impossible, just an idea, but that way not only the FFT filter would benefit from the GPU but also other filters and/or general calculations. Instead of optimizing each filter you write general parameters and any filter can benefit. A crazy idea ;)

The filter is multithreaded. Basically, just before the GPU begins the calculations, a thread is created that fetches the next frame. Meanwhile the first thread asks the GPU (driver) whether it's done with the calculations; if not, it sleeps 5 msec before asking again. When the GPU is done, the data is downloaded to main memory and the first thread then waits for the second thread to exit. When the next frame is requested, the results from all the filters before fft3dgpu are already cached (because they were run while the GPU was working). So with a dual-core processor you could have the filter do its calculations and calculate the next frame at the same time (although the cache usage could be quite high).
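
The scheme just described can be mimicked in a few lines of Python. This is only a sketch of the control flow, not the plugin's code: `FakeGpu`, `get_frame` and `fetch_upstream` are invented names, and the GPU is simulated by a timer.

```python
import threading
import time


class FakeGpu:
    """Stand-in for the GPU/driver: 'busy' for a fixed time after start()."""

    def __init__(self, busy_seconds):
        self.busy_seconds = busy_seconds
        self._ready_at = 0.0
        self._frame = None

    def start(self, frame_number):
        self._frame = frame_number
        self._ready_at = time.monotonic() + self.busy_seconds

    def is_done(self):
        return time.monotonic() >= self._ready_at

    def read_back(self):
        return f"filtered-{self._frame}"


def get_frame(n, fetch_upstream, gpu, poll_seconds=0.005):
    """Filter frame n while a helper thread fetches frame n+1 from upstream."""
    prefetched = {}

    def prefetch():
        # Runs the filters before this one for the next frame.
        prefetched["next"] = fetch_upstream(n + 1)

    worker = threading.Thread(target=prefetch)
    worker.start()                 # created just before the GPU starts work
    gpu.start(n)
    while not gpu.is_done():       # ask the driver; sleep 5 ms between polls
        time.sleep(poll_seconds)
    result = gpu.read_back()       # download the result to main memory
    worker.join()                  # wait for the prefetch thread to exit
    return result, prefetched["next"]


result, cached_next = get_frame(0, lambda k: f"frame-{k}", FakeGpu(0.02))
print(result, cached_next)
```

Sleeping between polls is what frees CPU time for the encoder, at the cost of up to one polling interval of extra latency per frame.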

Try to guess which of these two scripts would run fastest, or whether they would be equally fast:

#SCRIPT A
fft3dfilter(plane=1)
fft3dfilter(plane=2)
fft3dGPU()

#SCRIPT B
fft3dGPU()
fft3dfilter(plane=1)
fft3dfilter(plane=2)

Script A would be fastest if used with a fast encoder, because the two filters before fft3dGPU would run at the same time as fft3dGPU, while in script B the extra CPU time would just be wasted because there are no filters before fft3dgpu. If used with XviD or another slow encoder, the speed difference would be smaller because the otherwise wasted CPU time would be used by XviD.
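
A toy timing model makes the argument easier to see. It is a deliberate simplification with invented function names: it assumes CPU filters placed before fft3dGPU overlap perfectly with the GPU work, filters placed after it do not overlap at all, and the encoder costs nothing.

```python
def script_a_ms_per_frame(cpu_filters_ms, gpu_ms):
    """CPU filters come first, so they run while the GPU is busy."""
    return max(cpu_filters_ms, gpu_ms)


def script_b_ms_per_frame(cpu_filters_ms, gpu_ms):
    """CPU filters come last, so the CPU idles during the GPU work."""
    return gpu_ms + cpu_filters_ms


# Example numbers are made up: 30 ms of CPU filtering, 40 ms of GPU work.
print(script_a_ms_per_frame(30, 40))
print(script_b_ms_per_frame(30, 40))
```

With a slow encoder in the chain, the idle CPU time in script B is picked up by the encoder, which is why the gap narrows in practice.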


Oh, before I forget: with both FFT and FFTGPU (and only them), when I start the job using the latest version of VirtualDubMod, most of the time it closes itself. I launch VDM again, start the job and the 1st pass starts. Then at the end of the 1st pass VDM sometimes closes again and I need to relaunch it manually and start the 2nd pass myself; other times the 2nd pass is launched after the 1st finishes. It happens randomly. I'm using the Avisynth 2.56 build from 31 Jan and have just seen that another beta from February 21 is out. Will give it a try.

Anyway cheers to both of you on the work done on these filters :)
I must say it sounds odd that it only happens with these two filters, because they don't share any code (unless Fizick used some of my code, but I somewhat doubt it ;) ). The only bug I know of in fft3dgpu is that if you use F5 too many times, all the video memory gets used up. This is caused by a memory leak somewhere (it must be Microsoft's fault. DirectX or something :) )
Oh, and thanks for the test. It's really amazing to see the interactions with XviD.

Fizick
10th April 2005, 14:11
LordIntruder,
your comparison is not quite correct.
I changed the default overlap width in v.0.9
(for speed).
3DNow was previously used by FFTW internally for the FFT calculation (I think);
now in v.0.91 I have added 3DNow for my Wiener calculation. The gain is about 25%.

tsp
11th April 2005, 23:38
I finally got rid of the memory leaks, so now you can F5 as crazily as you want in VirtualDub (or use it in conditionalfilter/scriptclip/frameevaluate, although I wouldn't recommend that because of the slow initialization).
Maybe this filter is ready for the avisynth usage category.

MacAddict
12th April 2005, 02:36
Was the block artifact bug mentioned earlier in this thread fixed? I've got an FX5200 using the 71.90 Nvidia driver with XP SP2. I still see 0.42 displaying the blocks. Many thanks for the effort!

vinetu
12th April 2005, 06:41
Just curious: on what type of picture are the blocks most visible, a gradient or a flat colored one?

tsp
12th April 2005, 07:53
vinetu: With mode=0 a sharp (gradient) image would produce the most border artifacts. A single-colored area with uniform intensity wouldn't produce as many artifacts, but it's easier to spot the artifacts in a flat area. This should produce them:

fft3dGPU(mode=0,bw=128,bh=128,sigma=50)

MacAddict
12th April 2005, 11:06
Guess my picture attachment for the above post was never approved. The black blocks I mention above can be seen in
this screenshot. (http://home.insightbb.com/~macaddict01/gpu.jpg) They appear on every frame.

tsp
12th April 2005, 12:49
Originally posted by MacAddict
Guess my picture attachment for the above post was never approved. The black blocks I mention above can be seen in
this screenshot. (http://home.insightbb.com/~macaddict01/gpu.jpg) They appear on every frame.
Yes, the mysterious GeForce FX bug. I don't know what's causing it. If you read page 2 in this thread, Leo 69 had the same problem. I have compiled a new test version. It disables some of the filtering, so could you try it and tell me if it works? It is here (http://www.tsp.person.dk/test.zip). If it works, try replacing the ps.hlsl file with the one from version 0.42 and see if it produces black boxes.

Also I have uploaded version 0.42 again because I forgot to update the dll :o

bill_baroud
12th April 2005, 17:02
hey, I was the first to report this problem ;) but I didn't have the time to post my screenshot (was moving...) or to try your test version. Did you notice that the size of the blocks depends on the bw/bh parameters?

Nevertheless, I was here to ask you another question: how scalable is your algo?
Because I found a website here (http://openvidia.sourceforge.net/) on parallel vision computation algorithms, and they use a computer with 6 PCI FX5200 (http://openvidia.sourceforge.net/hexagraphic_thumb.jpg).
So I was wondering if something like that could be useful for our purpose :D

tsp
12th April 2005, 17:41
bill_baroud: If you want, you can try the new test version, and you're welcome to try fixing the bug ;) (src included). It's very strange that some of the blocks aren't processed.

I'm wondering whether the best method to split the work between multiple GPUs would be to give each one a frame or to split the frame up. Also, wouldn't a single GeForce 6800 GT be faster than 6 PCI FX5200s (they must be awfully bandwidth limited)? If someone wants a multiple-GPU version, they could give me 2 GeForce 6800 ULTRAs and an nForce 4 SLI motherboard to work with :)

MacAddict
12th April 2005, 18:02
Originally posted by tsp
yes the mysterious geforce fx bug. I don't know what's causing it. If you read page 2 in this thread Leo 69 had the same problem. I have compiled a new test version. It disables some of the filtering so if you could try it and tell me if it works. It is here (http://www.tsp.person.dk/test.zip). If it works try to replace the ps.hlsl file with the one from version 0.42 and see if it produces black boxes.

Also I have uploaded version 0.42 again because I forgot to update the dll :o
The new test build seems to work perfectly. Blocks appeared again only when I replaced the ps.hlsl file with the one from the 0.42 build. Seems like you're narrowing it down :) Thx again!

tsp
12th April 2005, 18:35
MacAddict: OK, try changing lines 481 to 495 in ps.hlsl from this:

//***************************************************************************
#ifdef BETA
float4 WFilter( PS_INPUT In) : COLOR
{
    float4 src=tex2D(Src,In.texCoord);
    //float2 PSD=float2(length(src.xz),length(src.yw));
    float2 PSD=float2(src.x*src.x+src.z*src.z,src.y*src.y+src.w*src.w);
    float4 MulFac=float4(BETA.x,BETA.x,BETA.x,BETA.x);
    if(SIGMA.x<PSD.x)
        MulFac.xz=((PSD.x-SIGMA.y)/PSD.x);
    if(SIGMA.x<PSD.y)
        MulFac.yw=((PSD.y-SIGMA.y)/PSD.y);
    return MulFac*src*float4(1,1,1,1);
}
#endif

to this:

//****************************************************************************
#ifdef BETA
float4 WFilter( PS_INPUT In) : COLOR
{
    float4 src=tex2D(Src,In.texCoord);

    float2 PSD=float2(src.x*src.x+src.z*src.z,src.y*src.y+src.w*src.w);
    float2 PSDInv=1/PSD;
    float4 MulFac=float4(BETA.x,BETA.x,BETA.x,BETA.x);
    if(SIGMA.x<PSD.x)
        MulFac.xz=(PSD.x-SIGMA.y)*PSDInv.x;
    if(SIGMA.x<PSD.y)
        MulFac.yw=(PSD.y-SIGMA.y)*PSDInv.y;
    return MulFac*src;
}
#endif

if that doesn't work try this variant:

//****************************************************************************
#ifdef BETA
float4 WFilter( PS_INPUT In) : COLOR
{
    float4 src=tex2D(Src,In.texCoord);

    float2 PSD=float2(src.x*src.x+src.z*src.z,src.y*src.y+src.w*src.w);
    float2 PSDInv=1/PSD;
    float4 MulFac=float4(BETA.x,BETA.x,BETA.x,BETA.x);
    if(SIGMA.x<PSD.x)
        MulFac.xz=1-SIGMA.y*PSDInv.x;
    if(SIGMA.x<PSD.y)
        MulFac.yw=1-SIGMA.y*PSDInv.y;
    return MulFac*src;
}
#endif
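
The three shader variants compute the same multiplication factor in algebraically equivalent ways, which a short Python sketch can confirm. The meaning of the parameters is an assumption: `sigma_x` is treated as the threshold and `sigma_y` as the amount subtracted, mirroring SIGMA.x and SIGMA.y, and the function names are made up.

```python
import math


def factor_divide(psd, beta, sigma_x, sigma_y):
    """Original form: (PSD - SIGMA.y) / PSD."""
    return (psd - sigma_y) / psd if sigma_x < psd else beta


def factor_reciprocal(psd, beta, sigma_x, sigma_y):
    """First variant: multiply by a reciprocal instead of dividing."""
    if sigma_x < psd:
        psd_inv = 1.0 / psd   # the shader computes this unconditionally
        return (psd - sigma_y) * psd_inv
    return beta


def factor_one_minus(psd, beta, sigma_x, sigma_y):
    """Second variant: 1 - SIGMA.y / PSD."""
    if sigma_x < psd:
        return 1.0 - sigma_y * (1.0 / psd)
    return beta


# Example values are arbitrary.
args = (1000.0, 0.5, 250.0, 250.0)
print(factor_divide(*args), factor_reciprocal(*args), factor_one_minus(*args))
```

Since all three agree up to rounding, trying them one after another is a way to find out which arithmetic operation the card mishandles, not a change to the filter itself.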

vinetu
12th April 2005, 19:01
MacAddict,
If you have trouble counting lines in ps.hlsl, you can use the script editor inside VirtualDubMod.

tsp,
Now (thanks to your script example) I see the blocks...

here are before/after zipped images (the web host is doing some weird things with my images,so I've zipped them)

before (http://www.hostinganime.com/nikotin05/tmp/before.zip)

fft3dGPU(mode=0,bw=128,bh=128,sigma=50) (http://www.hostinganime.com/nikotin05/tmp/after.zip)

the right Aspect Ratio for viewing is 16:9 (original resolution is 720x576),
the VGA is Radeon9600 non pro (Latest Drivers) AGPx8,
WinXP SP1,DirectX 9c 4.09.0000.904,FFT3dGPU.dll version 0.42 (updated one :) )

Ah, and if you have trouble downloading the images above with IE ("Save Target as..." did not work here), try a download manager...

vinetu
12th April 2005, 19:44
tsp,
I just remembered that nVidia VGA chips run at a lower clock speed in 2D mode!
Typical example - 300MHz in 2D (avisynth) and 500MHz in 3D mode (Doom III :) ).
This is true for FX 5xxx and 6xxx series.

Did you know that?

tsp
12th April 2005, 19:49
Originally posted by vinetu
tsp,
I just remembered that nVidia VGA chips run at a lower clock speed in 2D mode!
Typical example - 300MHz in 2D (avisynth) and 500MHz in 3D mode (Doom III :) ).
This is true for FX 5xxx and 6xxx series.

Did you know that?

Yes, I have mine clocked at 80 MHz in 2D. I have a really noisy fan on my graphics card, so I run it at a very low speed in 2D mode.
Also, don't worry: when using fft3dgpu the 3D mode clock is used because I'm using Direct3D; you just don't see the rendered scene directly.

Also, I can't download after.zip even with a download accelerator. Couldn't you just upload the uncompressed picture?
(The png file is already compressed, so you don't gain extra compression by zipping it.)

vinetu
12th April 2005, 20:51
The problem with these .png images is that the web host there is "too smart": on the first try the files were .png, and when I decided
to check them and downloaded the hosted images, I got .png files that looked like heavily down-sized and compressed .jpg...
That's why I zipped them.

EDIT - Ok here they are:
before (http://www.freewebs.com/vinetu/before.zip)
after (http://www.freewebs.com/vinetu/after.zip)

script:
--------------------------------------
fft3dGPU(mode=0,bw=128,bh=128,sigma=50)
--------------------------------------

Radeon9600 non pro,WinXP SP1,DirectX 9c 4.09.0000.904

koszopal
13th April 2005, 09:16
@vinetu
maybe you can try posting the 2 PNGs on
http://www.imageshack.ws/ ?
You can post them there as PNG files
and link to them here :D
koszopal

tsp
13th April 2005, 21:17
vinetu: That is exactly the kind of border artifact I meant. Also note how the plain areas interact with the high-detail areas. So don't use such a high sigma value, or use mode 1 or 2.

MacAddict: You can try this (http://www.tsp.person.dk/ps.hlsl) ps.hlsl file. It might work.

vinetu
13th April 2005, 22:06
Originally posted by tsp
That is exactly the kind of border artifacts.

Are these artifacts somehow different from the ones you get on nVidia?

I ask because a long time ago I read a review (something like "Ati vs nVidia") with a lot of screenshots showing differences in the rendered images in 3D games. For example, the dithering on gradients was better on the Radeon 8500 (IIRC the opposing card was a GeForce 3).
I'm not an Ati fan :) this is a technical question... I'm curious what my next VGA could be; for now the winner is FX6600... :)

bill_baroud
14th April 2005, 08:31
@vinetu: imho, you should go for a 6800 non-GT/non-ultra; the price difference from a 6600GT is not that big, and you get a much, much better chip (10ps/5vs pipelines instead of 8/3...)

@tsp: v0.42 + ps.hlsl from the test version fixed the problem; I didn't get any blocks with many different parameters.
I didn't get nvperfhud working though, but it's not like it's important ;).

btw, I took a (really) quick look at your code (nice one indeed) and saw some DirectInput stuff??? Why do you need to manage something like that in an avisynth filter (just wondering)?

tsp
14th April 2005, 08:59
bill_baroud: the ps.hlsl file from the test version disables the Wiener filtering. So the problem lies in the WFilter function. Could you try the ps.hlsl from my last post? It's a fully working version.

About NVPerf:
Did you use NVPerf=true and run it with this command line:

"PATH TO NVPerfHUD\NVPerfHUD.exe" "PATH TO VIRTUALDUBMOD\virtualdubmod.exe" "PATH TO AVS\test.avs"

and enabled "force NON PURE device"

The DirectInput code is included to intercept keyboard commands to nvperfhud. So if nvperfhud isn't enabled, the DirectInput code isn't executed. Also, I will have to comment the code better some day and organize the GetFrame code. Too many if..else.


Vinetu: The artifacts are the same. This filter doesn't use anything fancy like anisotropic filtering or even bi/trilinear filtering, but it uses many pixel shaders. So the only difference between ATI and nVidia is that nVidia uses 32-bit precision and ATI only 24-bit. It doesn't matter that much, because when usefloat16=true (the default) is used, the result from each pixel shader is saved at 16-bit precision.
An artifact I haven't seen on an ATI card is this (http://www.tsp.person.dk/arti.png), which appears when using high bw,bh like this:

fft3dGPU(mode=0,bt=1,bw=512,bh=256)
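
To give a feel for what saving intermediate results at 16-bit precision means, here is a small Python sketch that rounds a value to half and single precision using the standard `struct` module. It only illustrates the number formats; it does not model the 24-bit ATI format or the filter's actual data flow, and the helper names are made up.

```python
import struct


def to_half(value):
    """Round a Python float to IEEE 754 half precision (16-bit) and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_single(value):
    """Round a Python float to IEEE 754 single precision (32-bit) and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


for v in (0.1, 2048.5):
    print(v, to_single(v), to_half(v))
```

Half precision keeps only about three significant decimal digits, so whatever extra precision the shader arithmetic has is lost each time a pass writes its output to a 16-bit texture.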

MacAddict
14th April 2005, 14:06
Originally posted by tsp
MacAddict: You can try this (http://www.tsp.person.dk/ps.hlsl) ps.hlsl file. It might work.
tsp, using this file with the 0.42 build I'm still getting artifacts using fft3dGPU(bt=1) :( Haven't tried other parameters yet.

tsp
14th April 2005, 14:42
Originally posted by MacAddict
tsp, using this file with the 0.42 build I'm still getting artifacts using fft3dGPU(bt=1):( Haven't tried other parameters yet.

:angry: :angry:
Oh well I ordered a Geforce FX 5200 today so I will see if I can fix it when it arrives.

bill_baroud
14th April 2005, 15:47
Originally posted by tsp
About NVPerf:
Did you use NVPerf=true and run it with this command line:

"PATH TO NVPerfHUD\NVPerfHUD.exe" "PATH TO VIRTUALDUBMOD\virtualdubmod.exe" "PATH TO AVS\test.avs"

and enabled "force NON PURE device"
uh no, I didn't think of that... I tried the key combo but it didn't work, and it can't launch the dll ;)

Oh well I ordered a Geforce FX 5200 today so I will see if I can fix it when it arrives.
i'm very curious of the performances it can achieve in your filter :)

bill_baroud
15th April 2005, 16:06
ok, i did some tests yesterday, here the results :

fft3dGPU(bw=32, bh=32,NVPerf=false, bt=1,sigma=2, plane=1, mode=0) : mixed normal/greenish image / no bug with test-ps.hlsl
fft3dGPU(bw=32, bh=32,NVPerf=false, bt=1,sigma=2, plane=1, mode=1) : greenish image / no bug with test-ps.hlsl
fft3dGPU(bw=32, bh=32,NVPerf=false, bt=1,sigma=2, plane=1, mode=2) : mixed normal/greenish image / a pink/green triangle, no filtering with test-ps.hlsl

fft3dGPU(bw=32, bh=32,NVPerf=false, bt=2,sigma=2, plane=1, mode=0) : close vdub / same with test-ps.hlsl
fft3dGPU(bw=32, bh=32,NVPerf=false, bt=2,sigma=2, plane=1, mode=1) : greenish image / no bug with test-ps.hlsl
fft3dGPU(bw=32, bh=32,NVPerf=false, bt=2,sigma=2, plane=1, mode=2) : close vdub / same with test-ps.hlsl

fft3dGPU(bw=32, bh=32,NVPerf=false, bt=3,sigma=2, plane=1, mode=0) : close vdub / same with test-ps.hlsl
fft3dGPU(bw=32, bh=32,NVPerf=false, bt=3,sigma=2, plane=1, mode=1) : greenish image / no bug with test-ps.hlsl
fft3dGPU(bw=32, bh=32,NVPerf=false, bt=3,sigma=2, plane=1, mode=2) : close vdub / same with test-ps.hlsl

fft3dGPU(bw=32, bh=32,NVPerf=false, bt=2,sigma=2, plane=0, mode=0)
fft3dGPU(bw=32, bh=32,NVPerf=false, bt=2,sigma=2, plane=0, mode=1) : black with image blocks ;) / same with test-ps.hlsl
fft3dGPU(bw=32, bh=32,NVPerf=false, bt=2,sigma=2, plane=0, mode=2) : no filtering, only a black triangle with white dots in the top-left corner. / same with test-ps.hlsl
same with bt=3

fft3dGPU(bw=32, bh=32,NVPerf=false, bt=1,sigma=2, plane=0, mode=0/1) : mixed image/black blocks / no bug with test-ps.hlsl
fft3dGPU(bw=32, bh=32,NVPerf=false, bt=1,sigma=2, plane=0, mode=2) : mixed image/black blocks / black triangle bug with test-ps.hlsl

opening many avs file in vdub without closing it > no more video memory and crash


some screenshots can be found here (http://moodub.free.fr/fftgpu.zip)

edit: it was v0.42 + ps.hlsl from your last post and from the test.zip

tsp
15th April 2005, 17:10
bill_baroud: The pink/green block is the same error as the black blocks, just in the U and V planes. In mode 2 the blocks are triangular because each block is created using two triangles (in mode 0 and 1 the entire image is made up of two triangles). I'm really curious what warnings the debug DirectX dll will produce when I get my new superfast GeForce FX 5200 with 64 MB RAM :)

Also, since you are running out of video memory: did you download version 0.42 before I updated it with the right dll? You can see the version number in Explorer.

bill_baroud
16th April 2005, 12:32
yes i think i did get the silent update version (can't verify now) because i downloaded it after you said so (iirc).

Also, i have the dx SDK installed with debug version and co, you could have asked me a report or something, by giving me the procedure to follow. Well now you can enjoy the blazing speed of the geforce FX series :D

tsp
19th April 2005, 19:52
ph33r |v|¥ n33w G'ph0Я5e 5200!!!! 1+ W1||| PWN J00!!!!!!!!!!!!!

Oh well, I got my new GeForce FX and finally fixed the stupid bug. A small riddle:
What is 1000/1000?
Is it 1? Not if you ask a GeForce FX: no, it's 13231 ???

Here's the fix: change lines 481 to 495 in ps.hlsl from this:


//***************************************************************************
#ifdef BETA
float4 WFilter( PS_INPUT In) : COLOR
{
    float4 src=tex2D(Src,In.texCoord);
    //float2 PSD=float2(length(src.xz),length(src.yw));
    float2 PSD=float2(src.x*src.x+src.z*src.z,src.y*src.y+src.w*src.w);
    float4 MulFac=float4(BETA.x,BETA.x,BETA.x,BETA.x);
    if(SIGMA.x<PSD.x)
        MulFac.xz=((PSD.x-SIGMA.y)/PSD.x);
    if(SIGMA.x<PSD.y)
        MulFac.yw=((PSD.y-SIGMA.y)/PSD.y);
    return MulFac*src*float4(1,1,1,1);
}
#endif

to this:


//****************************************************************************
#ifdef BETA
float4 WFilter( PS_INPUT In) : COLOR
{
    float4 src=tex2D(Src,In.texCoord);
    //float2 PSD=float2(length(src.xz),length(src.yw));
    float2 PSD=float2(src.x*src.x+src.z*src.z,src.y*src.y+src.w*src.w);
    float4 MulFac=float4(BETA.x,BETA.x,BETA.x,BETA.x);
    if(SIGMA.x<PSD.x)
        MulFac.xz=(PSD.x-SIGMA.y)/(PSD.x+0.0000000000000000000000000000000000001);
    if(SIGMA.x<PSD.y)
        MulFac.yw=(PSD.y-SIGMA.y)/(PSD.y+0.0000000000000000000000000000000000001);
    return MulFac*src;
}
#endif

or download the ps.hlsl from here (http://www.tsp.person.dk/ps.hlsl)
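
A quick Python check shows why adding such a tiny constant to the denominator leaves the filter's output unchanged for realistic values. This is a sketch under assumptions: `EPS = 1e-37` is only meant to be of the same order as the constant in the shader, the function name is made up, and Python floats are doubles, not the GPU's format.

```python
EPS = 1e-37  # assumed order of magnitude of the constant added in the fix


def wiener_factor_guarded(psd, beta, sigma_x, sigma_y, eps=EPS):
    """Same factor as before, with the tiny constant added to the denominator."""
    if sigma_x < psd:
        return (psd - sigma_y) / (psd + eps)
    return beta


# Example values are arbitrary.
print(1000.0 + EPS == 1000.0)                              # constant is absorbed
print(wiener_factor_guarded(1000.0, 0.5, 250.0, 250.0))    # unchanged factor
```

So the change is numerically a no-op; whatever it alters on the GeForce FX must come from how the division is compiled or executed, which the thread does not explain.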
I will release a new version when I fix the mouse cursor stuttering.

edit
Hmm, it seems that the stuttering is only present when the framerate drops below 4 fps and that it is the nVidia driver that is causing it. If you look at the CPU utilization in the task list and enable "show kernel times", you will notice a very high kernel-time CPU utilization (in this case the nVidia driver) when the stuttering is present. The only solution I have found is to decrease the number of commands the GPU processes at once. This however really kills the framerate (something like 1 fps for a 64x64 image :( ), so the only real solution is to upgrade the graphics card to something faster.


Also the Geforce FX 5200 is about 15-30 times slower than my geforce 6800 GT. Can't wait to get it back in the computer.

bill_baroud
20th April 2005, 08:07
1000/1000 = 13231 ?? that's a nifty bug... you should perhaps forward it to nvidia :rolleyes: ?

I'll test the new ps.hlsl this evening, thanks :)

MacAddict
21st April 2005, 12:25
tsp,

It seems you definitely found the block bug a few of us were getting with the FX cards. The ps.hlsl modification seems to have fixed the issue.

I've seen this stuttering problem while encoding, and you're right about the kernel times. In fact, while encoding my CPU only reaches around 80% utilization. I'm guessing my FX GPU is the bottleneck and that's why the CPU isn't being utilized 100% :)

tsp
22nd April 2005, 00:03
Good to hear the modification worked. If you are not getting 100% CPU, just use some more demanding encoding settings or some more filters before fft3dgpu. Just curious, what framerate do you get with a GeForce FX 5900? Also, is it the total CPU utilization (green color) or the kernel time (red color) that is at 80%?

FrEEwilL
26th April 2005, 11:18
I have a problem getting the filter working. :(
Whenever I use FFT3DGPU() in an avs, avisynth gives me an unrecognized exception error and crashes vdub (or vdubmod) with a memory reference error ("....xxxxxxx can not be read").

i'm using
P4 2.8e
Win2k3 Ent.
ATI 9500Pro(firmhacked to 9700)
Catalyst 0.48 (Aug, 2004)

Could the firmware patch cause the problem?
I tried installing the driver with both the original (recognized as 9700) and the edited (forced to 9500Pro) ini settings but got the same error.

btw i have no problem with fft3dfilter or catalyst for other uses.

tsp
26th April 2005, 12:28
FrEEwilL: I don't think it's the firm patch that's causing the problem. It might be hyperthreading causing problems. Try disabling it and if it then works I will see if I can fix it.

MacAddict
26th April 2005, 12:30
@tsp

I'm actually using a FX5200 card unfortunately which gave me about 6-8fps. 60% of the CPU usage was kernel and the other 20% green. I suspect the high kernel usage could explain the mouse cursor sluggishness?

I'll be trying my other box with a Radeon 9700 Pro in the next 48hrs to see how it compares. Thanks again!

tsp
26th April 2005, 14:28
MacAddict: I also get stuttering when the framerate drops below 8 fps on my GeForce 6800GT (although I have to use this setting to get it that low: fft3dGPU(mode=1, bt=3, sigma=10,usefloat16=false) )
The stuttering is reduced when decreasing the number of commands the GPU processes at once, but the speed is reduced by ~50%. I don't know if I should add it as an option (it would be cool if it were only enabled when the mouse is used; I just have to find a way to detect that).

bill_baroud
26th April 2005, 16:46
Originally posted by MacAddict
I'm actually using a FX5200 card unfortunately which gave me about 6-8fps.

!!
Are you sure it's a FX5200 ? i get 5-6fps with a FX5900 :eek:

which settings do you use (and codec) ?

FrEEwilL
30th April 2005, 11:27
Originally posted by tsp
FrEEwilL: I don't think it's the firm patch that's causing the problem. It might be hyperthreading causing problems. Try disabling it and if it then works I will see if I can fix it.

I tried to disable HT but it didn't help. :(

tsp
1st May 2005, 16:44
FrEEwilL: Could you try this (http://www.tsp.person.dk/test.zip) version? It will generate a text file called FFT3dGPU_log.txt in c:\. If you could post the content of this file I could see where it crashes.

MacAddict: Any luck with the Radeon 9700 Pro?

FrEEwilL
3rd May 2005, 12:02
@tsp, i've tried test version and got the following,

when loading from the default avisynth plugin dir, w/ or w/o LoadPlugin()

AvisynthPluginInit2

when loading from the external path by LoadPlugin()

AvisynthPluginInit2

Create_fft3dGPU

FFT3dGPU Constructor

tsp
3rd May 2005, 12:58
FrEEwilL: thanks. I have created a new test version that will generate a more detailed report. You can get it here (http://www.tsp.person.dk/test2.zip).
Could you post the new log both when loaded from the default avisynth plugin dir and when loaded from an external dir?

FrEEwilL
6th May 2005, 12:44
@tsp, here is a log ('ext_log') when loading from the external dir.
when loading from the default dir, two logs are created.
one is in plugin dir and the other is in the root of c:
the former contains the first 2 lines of the following and
the latter is same as 'ext_log' except for the numbers in memory address.


AvisynthPluginInit2:

AvisynthPluginInit2 Addfunction done
CREATE_fft3dGPU

FFT3dGPU constructor address: 1c84ee8
imgp: 0
hr: 1
GetDevice
pDevice :0
RegisterClassEx
create window
Creating D3D
d3dpp: 12c2ec ZeroMemory
Setup d3dpp
Creating D3Ddevice...

tsp
6th May 2005, 18:56
FrEEwilL: ok, I have a new test version ready here (http://www.tsp.person.dk/test.zip). It will again produce a logfile in c:\, so could you repeat the procedure and post the logfile, including the address that couldn't be read? You could also try to set NVPerf=true.

Another thing that could be wrong if you are using Windows Server 2003 is that DirectX acceleration is disabled by default. (http://groups.google.dk/groups?hl=da&lr=&threadm=eHoFwFfdDHA.3260%40TK2MSFTNGP09.phx.gbl&rnum=1&prev=/groups%3Fhl%3Dda%26lr%3D%26q%3D%2522Windows%2BServer%2B2003%2522%2Bdirectx) I don't know if you have enabled it or are having problems with other non-fullscreen DirectX accelerated applications/games. Also, you need DirectX 9.0 (I think DirectX 8.1 is the default installed). I don't know if you also have Windows XP/2000/98 installed and could try it in one of those OSes instead.

MacAddict
6th May 2005, 19:47
Originally posted by bill_baroud
!!
Are you sure it's a FX5200 ? i get 5-6fps with a FX5900 :eek:

which settings do you use (and codec) ?

Yep, it's an Asus 5200/128MB card. I'm just using a simple script with XviD of course-
mpeg2source("D:\DIvX RIPs\Test Clips\xxxx\xxxx.d2v",idct=6)
crop(4,64,712,352)
fft3dGPU(bt=1)
LanczosResize(672,272)
Undot()
Limiter()

Not sure if it plays a part or not but my AGP bus is running around 69Mhz due to this MSI AMD64 board not supporting locked bus speeds when overclocking.

MacAddict
6th May 2005, 20:21
Originally posted by tsp

MacAddict: Any luck with the Radeon 9700 Pro?

Yes indeed :D I'm averaging around 17fps using XviD and the above script. Now I just need to find time to do compression tests and play with the settings. Thx so much for your effort tsp.

Anyone with advice yet on clean DVD movie sources?

dragonfly
7th May 2005, 19:41
I am using fft3dGPU 0.42 and can't help getting the following error;
"Only pixelshader 2.0 or greater is supported."

Well I have an Nvidia Geforce2 Ti 64MB card. Could it be that this is an old card and that fft3dGPU doesn't work with old cards? At least fft3d.dll works with my card, but that is sooooo slow :p

tsp
7th May 2005, 23:25
dragonfly: Yes, a GeForce 2 is too old. You will need at least a GeForce FX 5200 or a Radeon 9500, that is, a card with full hardware support for DirectX 9, not one that is just compatible with DirectX 9. But thanks for testing the supported pixelshader code :)

The following cards will not work:

Nvidia:
TNT
TNT2
Geforce 256
GeForce2 Ultra, Ti, Pro,MX,Go and GTS
Geforce3 Ti 200, Ti 500
GeForce4 Ti, MX, Go

Ati:
Radeon 7xxx
Radeon 8xxx
Radeon 92xx

Matrox:
G2xx
G4xx
G5xx
maybe Parhelia

The following should work:
Nvidia:
Geforce FX 5xxx
Geforce 6xxx

Ati:
Radeon 9500
Radeon 9550
Radeon 9600
Radeon 9700
Radeon 9800
Radeon Xxxx

where x means any digit.



MacAddict: When testing the speed of the different settings, remember to encode to XviD while doing it. There is a bigger difference in speed between the various settings when using a less CPU demanding codec (like Huffyuv) or no codec compared to XviD and other demanding codecs.
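
As a rough illustration of the "pixelshader 2.0 or greater" requirement discussed above, here is a minimal C++ sketch of such a version check. It is not taken from the fft3dgpu source: ps_version is a local stand-in that mirrors how the Direct3D 9 D3DPS_VERSION macro packs a version number, and supports_ps20 is a hypothetical helper. In a real program the reported value would come from the PixelShaderVersion field of D3DCAPS9.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for the Direct3D 9 D3DPS_VERSION macro, which packs a
// pixel shader version as 0xFFFF0000 | (major << 8) | minor.
constexpr std::uint32_t ps_version(std::uint32_t major, std::uint32_t minor) {
    return 0xFFFF0000u | (major << 8) | minor;
}

// Hypothetical helper: true when the reported pixel shader version is 2.0
// or newer. Packed versions compare correctly as plain integers because the
// major number sits in the higher byte.
constexpr bool supports_ps20(std::uint32_t reported) {
    return reported >= ps_version(2, 0);
}
```

With this packing, a card reporting pixel shader 1.1 or 1.4 fails the check, while one reporting 2.0 or 3.0 passes.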

LordIntruder
7th May 2005, 23:26
dragonfly:

Read the first page:

"To use this filter you need directx 9 and a graphics card supporting directx 9 in hardware"

Geforce 2 does not support DirectX9 in hardware, only DX8 or DX7, I don't remember exactly.

dragonfly
7th May 2005, 23:45
@tsp
Thanks for the quick and detailed reply. I guess I have to purchase a new card :D

@LordIntruder
I almost never play games on my pc, so I don't know much about DirectX and the support for it on my card. So when the error came I did some research and found out that pixel shaders are used in DirectX. But wasn't sure if my card supported the right DirectX in hardware.
Thanks for your reply. Slowly I get to know the world of Doom9!

Fizick
8th May 2005, 00:19
Tsp,
Some time ago I read some Dr. Kokaram articles where he describes FFT processing with the good old Geforce2 GPU (with NVidia SDK - may be too complex for programming?).

tsp
8th May 2005, 23:38
Fizick: I don't think it will be easy to make a version working on DirectX 7.0 hardware (GeForce2/Radeon 7xxx) because it doesn't have programmable shaders, so it's only possible to add or subtract textures. It also doesn't support floating point math (this was first included in DirectX 9), only 8 (maybe 16) bit precision integers. It is possible to do an integer FFT but I don't know how fast it would be on DirectX 8 hardware.
Another thing: if you would like to know how to do convolution in the frequency domain, take a look at the variable blur source code.

FrEEwilL
9th May 2005, 13:27
Originally posted by tsp
FrEEwilL: ok, I have a new test version ready here (http://www.tsp.person.dk/test.zip). It will again produce a logfile in c:\, so could you repeat the procedure and post the logfile, including the address that couldn't be read? You could also try to set NVPerf=true.

i've tried to toggle the params of bool type but none of them helped.

AvisynthPluginInit2:

AvisynthPluginInit2 Addfunction done
CREATE_fft3dGPU

FFT3dGPU constructor address: 1c94ee8
imgp: 0
hr: 1
GetDevice
pDevice :0
RegisterClassEx
create window
Creating D3D

constructor address changes occasionally while repeating the procedures and
depending on what program is used to load the script (vdub,vdubmod, mpc, zp, wmp...)
i don't get memory reference error box any more.
vdub just crashes(disappear) silently _when i switch application focus_.
only mpc gave one _when i'm closing_ (it works if i just keep it open)

The instruction at "0x01fb1177" referenced memory at "0x0229a6c8". The memory could not be "read".

the address changes slightly too.


Another thing that could be wrong if you are using Windows Server 2003 is that DirectX acceleration is disabled by default. (http://groups.google.dk/groups?hl=da&lr=&threadm=eHoFwFfdDHA.3260%40TK2MSFTNGP09.phx.gbl&rnum=1&prev=/groups%3Fhl%3Dda%26lr%3D%26q%3D%2522Windows%2BServer%2B2003%2522%2Bdirectx) I don't know if you have enabled it or are having problems with other non-fullscreen DirectX accelerated applications/games. Also, you need DirectX 9.0 (I think DirectX 8.1 is the default installed).
i've been using win2003 more than a year and already know those issues.
everything has been set properly and dx 9.0b is installed.
to re-verify, i ran dxdiag and all of d3d/ddraw tests passed successfully.

I don't know if you also have windows xp/2000/98 installed and could try it in these OS's instead.
sorry but i don't have any other OS installed and won't be back to XP
because currently i'm using a soft-raid mirror set (along with hardware stripe sets), which is supported only on the server family.

tsp
9th May 2005, 13:56
FrEEwilL: Hmm, try installing DirectX version 9.0c; it's the version I use. I think that might be the cause of the error, because fft3dgpu stops executing when trying to set up DirectX.

hartford
10th May 2005, 00:55
For the record: DirectX 9.0c causes problems for many people. I was fortunate in being able to remove it and revert to 9.0b.

Perhaps you could be explicit in the requirements of your filter that
DX 9.0c is required.

(I have Win2000 SP4, DX 9.0b, dual amd processor, ATI 9700. Your filter doesn't work for me :( )

MacAddict
10th May 2005, 01:02
9.0c is running flawlessly on all 4 of my XP and Win2K machines here. I'm not a gamer in the least so maybe thats why I havent seen issues.

@tsp,

I only use XviD with the same settings, source, AVS and CQM's while testing. I overclocked the 9700 to obtain the previous mentioned speeds, still shocked me though compared to this FX5200!

tsp
10th May 2005, 01:36
hartford: I didn't know until now that 9.0c was required. I could try to compile a version for DirectX 9.0b, although it wouldn't support pixelshader 3.0.

hartford
10th May 2005, 01:54
Well, I don't know what the implications would be wrt shader 3.0.

I do not expect you to code for those that cannot use DX 9.0c. I'm just
asking that you inform those that want to use your filter that DX 9.0c is required, that's all.

tsp
10th May 2005, 02:05
hartford: I will do that. Also, I will include an error message in the next version if DirectX 9.0c isn't installed.

hartford
10th May 2005, 02:09
Originally posted by MacAddict
9.0c is running flawlessly on all 4 of my XP and Win2K machines here. I'm not a gamer in the least so maybe thats why I havent seen issues.



Well, good for you. I don't understand how "gamming" introduces a problem.

I do see where 2 cpu's can introduce a problem.

What changed from DX 9.0b to 9.0c?

I don't know. Many others don't know. 2 programs to remove DX 9.0c would not be in existence if there were no problems, whether 1 or 2 cpu's.

Bah. Until MS fixes this there will be many unhappy people.

hartford
10th May 2005, 02:11
Originally posted by tsp
hartford: I will do that. Also, I will include an error message in the next version if DirectX 9.0c isn't installed.

Thanks. I'm sure that this will prevent many curses ;)

tsp
10th May 2005, 17:01
I have just released version 0.43
It mainly contains memory leaks fixes and the geforce fx fix.
Also new is a version compiled for DirectX 9.0b so I expect hartford to come with a full report about how this version works on a SMP machine ;D
I also added more comments to the sourcecode if anyone wants to take a look.

FrEEwilL
11th May 2005, 13:04
@tsp
oh jesus.. dx9b was the cause. yesterday i've installed 9c and finally the filter worked! :P thanks for your help.

hartford
12th May 2005, 02:50
Originally posted by tsp
I have just released version 0.43
It mainly contains memory leaks fixes and the geforce fx fix.
Also new is a version compiled for DirectX 9.0b so I expect hartford to come with a full report about how this version works on a SMP machine ;D
I also added more comments to the sourcecode if anyone wants to take a look.

I'm sorry to say that I get this error (using FFT3dGPU9b):

Avisynth read error: Avisynth: caught an access violation at 0X0291cc61, attempting to read from 0x00000008

tsp
12th May 2005, 19:04
Originally posted by hartford
I'm sorry to say that I get this error (using FFT3dGPU9b):

Avisynth read error: Avisynth: caught an access violation at 0X0291cc61, attempting to read from 0x00000008

:( ok I'm installing win2k + SP4 + dx9.0b + visual c++ .NET + dx 9.0b SDK on a VMware virtual machine, so I think it should be possible for me to make a working version soon. (Another thing: about 15 sec after I installed win2k and connected to the internet there were 3 Trojan horses/worms installed from the internet. Totally insane!!!)

[edit]
Could you try this (http://www.tsp.person.dk/FFT3dGPU9b.zip) version? It contains two files: fft3dgpu9b.dll and fft3dgpu9b_log.dll. Try fft3dgpu9b.dll first and if it doesn't work try fft3dgpu9b_log.dll. It will generate a file called c:\FFT3dGPU_log.txt, so if you could post the content of this file it would be easier to see what's wrong.
This build works in win2k with SP4 + DirectX 9.0b and the reference (software) renderer. VMware doesn't support hardware accelerated DirectX 9 so I couldn't test that.

MacAddict
12th May 2005, 22:17
Anyone successfully encoded 2passes with 0.43 version yet on an entire movie? My 2nd pass stopped almost immediately indicating the DirectX device was no longer available:confused:

tsp
12th May 2005, 22:45
MacAddict: Is the exact error message: "Direct3D device lost. Please restart the application"?
if that is the case do you have any full-screen program/games/screensavers running. That can cause it.
How many frames can you encode before it happens?

MacAddict
12th May 2005, 23:05
@tsp

Thats the exact message I got. I'll have more details in a few hours when I arrive home. Guessing right now but I dont think anymore than a thousand frames on the 2nd pass were encoded.

I can tell you now that no games are installed on this particular machine, all screensavers are disabled. Only thing else running at the time was 8rda-vcore monitoring voltages and temps in the system tray. Now that I think about it I cant be sure I didn't also launch ATItool to check my GPU speed momentarily:rolleyes:
http://www.techpowerup.com/atitool/

I'll restart the 2pass again to see if in fact ATItool was the culprit for that mishap:) Report back later, thanks for the tip!

tsp
12th May 2005, 23:35
MacAddict: It could be ATItool. Is this the first time it happens? I must admit that it is possible to recover the Direct3D device, but I'm a little lazy so it is not implemented yet (but I will see what I can do about it). As for what can cause it, here is a passage from the DirectX docs:

By design, the full set of scenarios that can cause a device to become lost is not specified. Some typical examples include loss of focus, such as when the user presses ALT+TAB or when a system dialog is initialized. Devices can also be lost due to a power management event, or when another application assumes full-screen operation. In addition, any failure from IDirect3DDevice9::Reset puts the device into a lost state.

losing the focus shouldn't be a problem because this only applies to full-screen applications.
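
For reference, the usual Direct3D 9 recovery pattern can be sketched as below. This is a simplified, self-contained model and not the fft3dgpu code: DeviceState stands in for the results of IDirect3DDevice9::TestCooperativeLevel (D3D_OK, D3DERR_DEVICELOST, D3DERR_DEVICENOTRESET), and render_with_recovery together with its probe, reset and render callbacks are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <functional>

// Stand-ins for the three states reported by TestCooperativeLevel.
enum class DeviceState { Ok, Lost, NotReset };

// Tries to render one frame, recovering the device when it is lost.
// probe  : reports the current device state
// reset  : attempts to reset the device, returns true on success
// render : renders the frame, returns false if the device was lost meanwhile
// Returns true once a frame has been rendered, false after max_attempts.
bool render_with_recovery(const std::function<DeviceState()>& probe,
                          const std::function<bool()>& reset,
                          const std::function<bool()>& render,
                          int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        switch (probe()) {
        case DeviceState::Lost:
            continue;               // cannot be reset yet; real code would wait a little
        case DeviceState::NotReset:
            if (!reset()) continue; // reset failed; probe again on the next attempt
            break;
        case DeviceState::Ok:
            break;
        }
        if (render()) return true;  // frame completed
        // Otherwise the device was lost mid-frame: loop, recover, redo the frame.
    }
    return false;
}
```

In a real filter, a successful reset would also have to be followed by recreating the resources that do not survive a reset (for example render targets in the default pool) before rendering again.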

MacAddict
13th May 2005, 01:38
I'm almost positive now that at one point during the encode I did launch ATItool. I suspect the tool polled for D3D devices and somehow interrupted fft3dgpu. This is the first time I've attempted a full 2-pass encode on a DVD source. I'll start the encode up again tonight and report back soon.

FYI- No fullscreen apps, windows or screensavers were used.

hartford
13th May 2005, 02:57
Could you try this (http://www.tsp.person.dk/FFT3dGPU9b.zip) version.


This version loads without error. I'm not certain at the moment if it works, ie, I'm using default and it doesn't seem to clean.

My script:

loadplugin("d:\plugins\FFT3dGPU9b.dll")

avisource("d:\test.avi").ConvertToYV12()

fft3dGPU()



Now that it loads I can adjust parameters.

Will report again.

hartford
13th May 2005, 03:00
Ok, I increased Sigma from default and it does work ;) nice!

Thankyou!


I'll post FPS when I capture a clip longer than 10 seconds.

hartford
13th May 2005, 03:30
Results from analog capture (analog capture to Huffyuv v2.1.1 CCESP Patch v0.2.5):

FPS: about 17


Did a 3min 45sec capture.

Used this script:

loadplugin("d:\plugins\FFT3dGPU9b.dll")
loadplugin("d:\plugins\Decomb521.dll")

avisource("d:\test2.avi")

Telecide(order=1,Post=0,Guide=1,nt=25)
Decimate(Cycle=5,Mode=0,Quality=3)

ConvertToYV12()

fft3dGPU(sigma=8)

(Recompressed to Huffyuv via VirtualDub)

--

I'm not commenting on the results of Sigma=8 (did cause banding), just
the FPS at that setting.

Zetto
13th May 2005, 09:40
Originally posted by hartford
What changed from DX 9.0b to 9.0c?

I don't know. Many others don't know. 2 programs to remove DX 9.0c would not be in existence if there were no problems, whether 1 or 2 cpu's.

Bah. Until MS fixes this there will be many unhappy people.

Major change with DX9c was the introduction of shaders v3.0 for the GF6x00s. As for the DX uninstallers, there are problems with ALL versions of DX, as there were with DX9b (like that major bug with capture cards ;) )

TSP, keep up the good work. Thanks for the help over emails :D Any plans to add a GUI for ur filter? I wanna see my 6800 work ;) Right now I have to settle for monitoring it's status by noise (fan speeds up) and by temp in the control panel

tsp
14th May 2005, 20:26
I have released version 0.44 of fft3dgpu. It now includes support for recovering a lost device and hopefully the DirectX 9.0b version works this time.

Zetto: There is already support for monitoring the work done by using nvperfHUD (link in readme). To run nvPerfHUD do it like this:

set NVperf=true and use this command line to run it:

code:
"PATH TO NVPerfHUD\NVPerfHUD.exe" "PATH TO VIRTUALDUBMOD\virtualdubmod.exe" "PATH TO AVS\test.avs"

and enable "force NON PURE device"


hartford: Good to hear that it works now. Try setting mode=1 to reduce banding, or lower the sigma value. Can you get more than 50% CPU utilization with fft3dgpu? I ask because the code is multithreaded.

MacAddict: the new version should solve your problem

MacAddict
14th May 2005, 20:47
@tsp

I was able to reproduce the lost device issue:) It turned out to be the tightVNC service that was running in the background which I use for remote viewing purposes. My 2nd test with a full 2-pass encode worked flawlessly once I disable tightVNC.

Thanks for the new version. I'll give it a shot with tightVNC;)

Zetto
15th May 2005, 23:13
Thx, the nvperf_hud looks pretty neat. I have to complain tho about compatibility with HT cpus - still crashing with HT enabled (single cpu is rock solid). Both .43 and .44 are crash-prone (stops encoding, try to cancel and exit - crashes with memory access violation or somethin of that sort). Surprisingly, the .43 u have sent to me in email (is not the same as u have released here) is most stable. It works if I start first with single cpu enabled and later on add the affinity for the second one. Dunno what to do about it. Without HT, encoding is very slow, but when enabled, it crashes every now and then.

tsp
16th May 2005, 10:38
Zetto: The problem with hyperthreading is locating where the error lies. Is it the nvidia driver that doesn't support multiple threads accessing the card at the same time (all games I know of only have 1 thread doing all the DirectX stuff), or is there a lockup somewhere in fft3dgpu, or maybe avisynth doesn't like multiple threads accessing the same frame at the same time, or there could be something wrong with virtualdub. You can try downloading this (http://www.tsp.person.dk/test.zip) version of fft3dgpu 0.44. The only difference is that it will produce a log file (C:\FFT3dGPU_log.txt), so could you post the last 100 lines from this file when virtualdubmod locks up (with HT enabled)? The file size can be 10-45 MB if the lockups first come after a couple of hours.

Zetto
17th May 2005, 07:07
Hehe, first pass and it crashes :( I've attached the last 100 lines or so of the log file. Let me know what seems to be the problem :D From my layman perspective it has something to do with GPU.. driver perhaps? I'm using 71.84.

.....

Getframe 5481
time: 16782156
SetSamplerState...
done
download from GPU...

tsp
17th May 2005, 11:27
Zetto: From reading in the directx newsgroups it seems as the problem is that directx doesn't like two threads sending commands to the same GPU at the same time very much. So I made a new version where only one thread at a time can use the GPU. So please try if this (http://www.tsp.person.dk/fft3dgpu_0443.zip) version solves the problem.
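
One common way to let only one thread at a time use the GPU is to route every device call through a single lock. The following is a minimal sketch of that idea using the C++ standard library. SerializedGpu and submit are hypothetical names, and the actual fft3dgpu code may well use a Win32 critical section or some other mechanism instead.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical wrapper: every piece of work that touches the GPU is passed
// to submit(), which holds one mutex for the duration of the call.
class SerializedGpu {
public:
    // Runs `work` while holding the lock and returns whatever it returns.
    // A second thread calling submit() blocks until the first one is done.
    template <typename F>
    auto submit(F&& work) -> decltype(work()) {
        std::lock_guard<std::mutex> guard(mutex_);
        return work();
    }

private:
    std::mutex mutex_;  // guards all access to the (imaginary) device
};
```

The trade-off is the one visible in the thread: the device is never driven from two threads at once, but threads can no longer overlap their GPU work, so the locked regions should be kept as short as possible.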

Zetto
17th May 2005, 20:11
Yay, tsp, u rule :D I've tried new version but now it locked up with kernel use (red bar in task manager) thru the roof (didn't happen before). With other versions, vdub crashes but doesn't use CPU at all but now it takes up 50% on HT CPU. Hmmm...

Thanks for the effort anyways. Hopefully we'll be able to track this thing down :D

tsp
18th May 2005, 21:26
:angry: Why does it have to be so hard to make a SMP stable version. :(
Could you try VirtualDub 1.6 and see if that changes anything? I have a new log producing version ready here (http://www.tsp.person.dk/fft3dgpu_0443.zip), so if you could repeat the test and PM the last 100 lines of the log to me I will see what I can get out of it.

Revgen
18th May 2005, 21:32
I use a Geforce 6800 Ultra using WinXP pro, DirectX9c, and 67.02 drivers.

I always get this error message "Avisynth: caught an access violation at 0x01e5cfe1, attempting to read from 0x00000008" instead of a picture whenever I play this script in media player.

Here is the script:

LoadPlugin("E:\TomsMoComp.dll")
LoadPlugin("E:\fft3dgpu\FFT3dGPU.dll")
directshowsource("E:\ds000.avi")
TomsMoComp(1,10,1)
FFT3dGPU(sigma=5,beta=1,bt=3,plane=0)
FFT3dGPU(sigma=5,beta=1,bt=3,plane=1)
FFT3dGPU(sigma=5,beta=1,bt=3,plane=2)

But it's okay when I use this script:

LoadPlugin("E:\TomsMoComp.dll")
LoadPlugin("E:\fft3dgpu\FFT3dGPU.dll")
directshowsource("E:\ds000.avi")
TomsMoComp(1,10,1)
FFT3dGPU(sigma=5,beta=1,bt=3,plane=0)
#FFT3dGPU(sigma=5,beta=1,bt=3,plane=1)
#FFT3dGPU(sigma=5,beta=1,bt=3,plane=2)

I even disabled the luma plane and left the chroma planes active and it didn't work, but it didn't give an error message that time.

Apparently this plugin doesn't want to process chroma.

The ds000.avi file is a video capture from an old vhs tape. I captured it at 720x480 resolution with a special YV12 version of Huffyuv.

Is this a problem on my end? I'm not as advanced a user as some of you are, so I'm not sure.

Any help would be appreciated. Thanks.

tsp
18th May 2005, 22:13
Revgen: I must admit I changed the way the plane parameter works compared to fft3dfilter. You only need plane=1 to process both chroma planes (U and V). (I made it like this because how often do you filter only the U or only the V plane?)
But I also found a bug in the code: I forgot to change a U to a V. I will release a new version shortly; until then use mode=1.

Another thing: if anyone is able to run this filter on a Radeon 9700 Pro and Windows XP SP2 with these options: fft3dGPU(sigma=3.0,bt=3,bh=64,bw=64,mode=1)
without getting artifacts like this (http://www.bausoft.de/bir0/radeon9700pro.jpg), could you post what driver version you use?

MacAddict
18th May 2005, 22:42
tsp,

With a Radeon 9700 clocked to 400/315Mhz using XP SP2 I'm not getting that artifact with the parameters above. Using the latest Cat 5.4 driver suite as well.

I'm also happy to report the D3D device isnt lost anymore when using other applications like tightVNC, ATItool, etc:) Nice work!

tsp
18th May 2005, 22:58
New version ready that fixes the chromaplane bug.

MacAddict: Thanks, and good to hear that the lost device issue is gone.

Zetto
18th May 2005, 23:13
Hey TSP, I think u should add test (loggin) version with each new release ;) I wanna test .45, I totally didn't use chroma, wanna give a shot :D

tsp
18th May 2005, 23:18
Zetto: the only difference between 0.45 and the test version is the chroma bugfix (and of course the log). So I hope you don't mind being my beta tester :p

Zetto
19th May 2005, 02:27
Check ur pm TSP... now it crashed during 2nd pass, with kernel use again up to 50% on one CPU... Gah! Single affinity works without a hitch ;) but fairly slow (about 10hours instead of 8 :( )

BTW, TSP, point out a single person here who is NOT a beta-tester :D ;)

PS
Is it my imagination or does FFT3d use up a lot of kernel time from ver .443 forward (speed the same give or take)? What's changed?

tsp
19th May 2005, 11:54
Zetto: new test version ready 0.45.1 (http://www.tsp.person.dk/fft3dgpu_0451.zip).
just pm the log like last time. Also try reduceCPU=false if reduceCPU=true crashes. It seems as if the filter is stuck in an infinite loop, so I added a timeout after 10 secs.
The kernel time did increase but the speed is the same when encoding XviD (or maybe a little faster). The filter is not multithreaded anymore (in versions 0.40 to 0.44 the next frame was cached in a separate thread) and I make sure that only one thread at a time accesses the DirectX device, so the DirectX driver doesn't have to handle it.



BTW, TSP, point out a single person here who is NOT a beta-tester :D ;)

well okay you are my alpha tester :) but the time is close where I might create a thread in Avisynth Usage. I think most of the bugs are fixed.

Revgen
19th May 2005, 17:07
I wouldn't post this filter into the Avisynth Usage thread until the WMP memory leaks are fixed. After I play an Avisynth file using fft3dgpu I always have to open up the task bar and shut down WMP manually.

Although memory leaks don't occur when encoding a file when using Gordian Knot or Virtual Dub, I still believe that WMP should be able to work with it without any issues.

tsp
19th May 2005, 19:25
Revgen: What version of Windows Media Player do you use, and are you using the same avs file as in your first post? Also, that is not a memory leak unless you have steadily increasing memory utilization during playback. I guess this error hasn't turned up before because this filter is too slow for real time playback.

Zetto
19th May 2005, 22:27
Giving .451 a spin right now. Loads up CPU pretty heavy, about 5-10% higher on HT CPU, speed remains the same tho (with default reduceCPU setting, will try to disable it if it crashes). What happens if there is a timeout? :confused:

Revgen
19th May 2005, 22:35
I use WMP 6.4.

It plays back fine for me. The only problem is that when I exit WMP the WMP program still stays in memory. It only happens when I use your filter. I just like playing the file to preview what the image quality is like before I encode it. I have an Athlon 64 3500+ Socket 939 CPU so it's not too bad as far as playback speed goes.

Here is the script I use. I now use mode=1 like you suggested. So it plays back fine.

LoadPlugin("E:\TomsMoComp.dll")
LoadPlugin("E:\fft3dgpu\FFT3dGPU.dll")

directshowsource("E:\ds000.avi")

TomsMoComp(1,10,1)
FFT3dGPU(sigma=2,beta=1,bw=32,bh=32,bt=3,plane=0,mode=1)
FFT3dGPU(sigma=2,beta=1,bw=32,bh=32,bt=3,plane=1,mode=1)

tsp
19th May 2005, 23:21
Zetto: The loop where the lockups happen asks the GPU if it is done rendering. If that is not the case, FFT3dGPU sleeps so other threads/XviD can use the CPU. If the rendering is done or a timeout occurs, the next thing that happens is that the result is downloaded from the GPU back to main memory. I don't know if the filter stops here or can proceed with the downloading, but we will see what happens.

Revgen: I reproduced the lockup with WMP 6.4. It happens because the thread that initializes the invisible window FFT3dGPU uses for rendering is different from the thread destroying it when WMP closes down, so fft3dgpu is stuck in an infinite loop waiting to close the window down. I will fix this tomorrow. Also, you can safely use mode 0 from version 0.45 onwards. It seems you find a lot of bugs in my filter; please continue with that and it will soon be bug-free :D

Zetto
20th May 2005, 02:31
It chewed through 4 files (40 min each), actually, 2 files x 2 passes. No freezes. I am worried though that if the lockup has occurred, fft3dgpu stopped filtering (is that possible?) I'll review the files and see if they have problems with filtering.

EDIT: Seems like the v .451 have solved the problem with HT... I'll let it run overnight, see if anything breaks :D My comp been running 24x7 for a while now, hope nothing breaks...:eek:

Still I am curious: how does FFT3dGPU handle those exceptions when GPU locks up in the loop... does it try to redo the frame it got stuck at? or just moves on to the next one? Or did I misread the whole idea?

tsp
20th May 2005, 11:20
version 0.46 is done. Includes two bugfixes so WMP 6.4 should work now and also Pentium 4 with HT enabled and Geforce 6800 Ultra.

Zetto: Take a look at this code snip where the lockups happens:

//While the GPU is working, sleep
DWORD endtick=GetTickCount()+10000;//End time: this ensures that we don't sleep for more than 10 sec.

LOG("sleep while query not flushed...")
//If the GPU command queue is not empty and the timeout has not occurred, sleep to allow other threads to work
while((S_FALSE == pQuery->GetData( NULL, 0, D3DGETDATA_FLUSH ))&&(GetTickCount()<endtick))
    Sleep(0);
LOG("done")
}

//download texture to system memory texture
LOG("GetRenderTargetData...")
result=_pDevice->GetRenderTargetData(_pVideoSurface,_pShadowSurface);
LOG("done")

As you can see, if the loop times out (because a lock-up occurred, maybe because the nvidia driver that handles the Query isn't threadsafe), fft3dgpu just continues to the next step, which is to download the texture (in this case the frame) to main memory, just as it would if the GPU had signaled it was done rendering. So it just continues working on the frame as if nothing had happened. The only time FFT3dGPU redoes the frame is if the Direct3D device (GPU) is lost while calculating the frame (this can happen if a screensaver or another fullscreen 3D program/game is active). That happened for MacAddict.
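
The wait-with-timeout loop quoted above can also be written in a portable form. The sketch below is an illustration and not the fft3dgpu source: wait_until_done is a hypothetical helper, the `done` callback stands in for the pQuery->GetData check, and std::this_thread::yield() plays the role of Sleep(0). Unlike the original, it reports whether the work finished or the timeout fired, and it uses a monotonic clock so there is no GetTickCount wraparound to think about.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls `done` until it returns true or `timeout` has elapsed.
// Returns true if the work finished, false if the wait timed out, so the
// caller can tell the two cases apart.
bool wait_until_done(const std::function<bool()>& done,
                     std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!done()) {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;          // timed out; the work is still pending
        std::this_thread::yield(); // give other threads a chance to run
    }
    return true;
}
```

A caller could, for example, log a warning or redo the frame when the function returns false instead of silently downloading whatever is in the render target.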

Zetto
20th May 2005, 18:32
So, if I understand correctly, when GPU locks up, the FFT3dGPU just takes whatever GPU has (or hasn't) done to the frame and loads it into main memory for further processing? Can it potentially lead to corrupted frames? That is, if the lock-up occurs while GPU is processing a frame, and does not complete the procedure, would FFT3dGPU load up whatever incompletely filtered frame from GPU into main memory for further processing? Would such crashes/lockups also speed up FFT3dGPU because it won't be processing ALL frames, since GPU would lock up on some of those?

I'm not saying I saw any corrupted frames, but I have yet to check the material I have produced so far. The good news is that the .451 worked for 10 hours straight with HT enabled and didn't crash. I just hope I didn't get garbage as a result ;)

On totally unrelated note, how do u check for artifacts? :D

tsp
20th May 2005, 20:44
Zetto: It is not so much the GPU that locks up as one specific command used to check if the GPU is done rendering. I don't think there are many games, if any, that utilize this command, so that might be why it locks up once in a while.
So I don't think it would cause any corrupted frames, but even if it did, at only 1 out of every 30000 frames I don't think you would notice it. The only way to see if there is any corruption would be to watch the movie. Also, every time a lockup occurs FFT3dGPU has to wait 10 seconds before the loop times out, so if there are many lockups the speed would only decrease.

Could you try version 0.46 and see if that one also works.

The usual way I check for artifacts is to look at the filtered frame :) or use something like this to compare it with the output fft3dfilter produces:

src=avisource("c:\test.avi").ConvertToYV12()
a=src.fft3dfilter(bw=32,bh=32,ow=32,oh=32,sigma=3)
b=src.fft3dgpu(bw=32,bh=32,ow=32,oh=32,sigma=3,mode=1,usefloat16=false)
subtract(a,b)

Revgen
20th May 2005, 20:45
Surprise! Surprise! Another bug!:eek:

I was trying to encode my clip with Gordian Knot 0.35.0 using the new 0.46 version with FP32 instead of the default FP16. Unfortunately GK gave me this error.

http://img219.echo.cx/img219/6347/fft3dgpufp32gkerror10fq.th.jpg (http://img219.echo.cx/my.php?image=fft3dgpufp32gkerror10fq.jpg)

Here is the script I use.

LoadPlugin("E:\TomsMoComp.dll")
LoadPlugin("E:\fft3dgpu\FFT3dGPU.dll")
directshowsource("E:\ds000.avi")
TomsMoComp(-1,10,0)
FFT3dGPU(sigma=8,beta=1,bw=32,bh=32,bt=3,plane=0,mode=1,useFloat16=false)
FFT3dGPU(sigma=8,beta=1,bw=32,bh=32,bt=3,plane=1,mode=1,useFloat16=false)



This was the first time I ever tried to use FP32, so I decided to try it out on the VirtualDubMod 1.5.10 program without using Gordian Knot as a frontend and it worked fine.

So I decided to go back to using the "useFloat16=true" parameter and Gordian Knot didn't give an error.

This error comes up regardless of whether I'm doing a compressibility check or a straight encode.

tsp
20th May 2005, 20:51
Revgen: It is not a bug, just me forgetting to include the explanation of this error in the readme.txt (maybe I should include a FAQ in the readme). It just means that fft3dgpu needs more memory on the graphics card (yes, I know you have 256 MB, but that is not enough. In fact I think FFT3dGPU is the only reason why someone would have to buy a Geforce 6800 Ultra with 512 MB RAM :) ). The only workaround is to use mode 0 or 2 (or FP16). They only use half the memory.
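
To see why FP16 halves the footprint compared with FP32, a back-of-the-envelope calculation helps. The sketch below is an assumption-laden illustration, not a description of how fft3dGPU allocates memory: `texture_bytes` is a hypothetical helper, four components per texel is assumed, and the real number of surfaces the filter keeps is not stated in the thread.

```python
# Size of one component of a texel, in bytes.
BYTES_PER_COMPONENT = {"fp16": 2, "fp32": 4}


def texture_bytes(width, height, precision, components=4, surfaces=1):
    """Bytes needed for `surfaces` floating-point textures of the given size.

    components=4 and surfaces=1 are illustrative assumptions only.
    """
    per_texel = components * BYTES_PER_COMPONENT[precision]
    return width * height * per_texel * surfaces


if __name__ == "__main__":
    for precision in ("fp16", "fp32"):
        size = texture_bytes(720, 576, precision)
        print(f"720x576 {precision}: {size / 2**20:.2f} MiB per surface")
```

Whatever the real surface count is, it multiplies both figures equally, so the 2:1 ratio between FP32 and FP16 holds. A smaller input frame reduces both in proportion to its pixel count.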

Revgen
20th May 2005, 22:58
Are you sure?

Why would it work well in VirtualDub and not in Gordian Knot?

tsp
20th May 2005, 23:05
Hmm, good question. It might be because Gordian Knot allocates some memory on the GPU for its own use, or even worse, it initializes two instances of the avs file (and then uses twice the memory). I will try downloading Gordian Knot and investigate what happens.

Revgen
20th May 2005, 23:56
This could be true, because I have to open my avi in Gordian Knot using the avs script that I created. Gordian Knot can't read my avi file because I captured it using a special directshow version of Huffyuv that uses YV12. The only way Gordian Knot and the Virtual Dub program included can read it is by using the "directshowsource=myclip.avi" parameter in AVISynth.

Gordian Knot uses its own self-created .avs script and feeds it to Virtual Dub when it encodes video, so there may be a problem here.

tsp
21st May 2005, 00:54
Doh nvperfHud was messed up in 0.46 so I released version 0.46.1 that fixes it. Also added a FAQ section to the doc.

Revgen: From my initial testing with GKnot, it seems a little random when it happens.

Revgen
21st May 2005, 01:26
Random?

It happens every time for me.

Does your filter use AGP memory?

I currently set aside 256mb of RAM for my AGP card to use. Would increasing it to 512mb help at all?

Backwoods
21st May 2005, 02:01
Just a silly little request. Can you rename your future Readme.txt to fft3dgpu_readme.txt?

MacAddict
21st May 2005, 02:20
Originally posted by Backwoods
Just a silly little request. Can you rename your future Readme.txt to fft3dgpu_readme.txt?

Thank you, it was my next request as well ;)

Leak
21st May 2005, 10:18
Originally posted by Revgen
I captured it using a special directshow version of Huffyuv that uses YV12. The only way Gordian Knot and the Virtual Dub program included can read it is by using the "directshowsource=myclip.avi" parameter in AVISynth.

Well, I encode quite a bit to YV12 HuffYUV files, but I'm using ffdshow's VfW interface for this - and with ffdshow set to decode HuffYUV in its VfW settings I can directly open the file in Virtual Dub with no problems.

Just my .02 EUR...

np: Quinoline Yellow - Sealed (Dol-Goy Assist)

Revgen
21st May 2005, 17:28
Well, your suggestion works somewhat. I can now open my avi file without using avisynth, but unfortunately I still can't encode using FP32.

I guess I'll just have to do a manual 2-pass encode the old-fashioned way with Virtual Dub. Unless there is another frontend that can do a 2-pass encode other than GK.

vinetu
21st May 2005, 21:16
Revgen,
If you feed fft3dGPU a low-res image, let's say by adding a "BilinearResize(320,240)" line before fft3dGPU, will that work with GK and FP32?
It's just for tracking - not a solution

MacAddict
21st May 2005, 21:48
Slightly OT but seems like you could use MeGUI or avs2avi to check the problem?

tsp
21st May 2005, 21:55
Revgen: I don't know much about GKnot, could you please describe what exactly you do? Shrinking the source would work because it reduces the amount of memory needed, but increasing the amount of AGP memory wouldn't, because FFT3dGPU only uses very little AGP memory (~1 MB).

Backwoods: Now that you mention it, it really annoys me when other filter writers just call the docs readme.txt, but it is really easy to forget that because I don't copy my own docs inside the plugin directory :o But in the next version it will definitely be called fft3dgpu_readme.txt or fft3dgpu.txt (I might even create an html doc).

Zetto
22nd May 2005, 05:30
So far, I didn't see any artifacts and encoding works with HT :D Finally! However, I am experimenting with sharpen now and it gives me problems every now and then. It works but does not accept values higher than +1 and it sometimes forces vdub to quit without any error messages or vdub just doesn't want to load the avs. I have to quit and reload the file to make it work (without making any changes to the script). Didn't try negative values since it seems a little unproductive - after all, fft3dgpu already softening up the image :D

tsp
22nd May 2005, 11:24
Zetto: What script are you using? This one works on my computer:
fft3dgpu(sharpen=200,plane=0)
The result is ugly but it works. But anyway I think I will rewrite the sharpening code so it does an ordinary unsharp mask sharpen to avoid ringing (but I suspect the ringing could just be caused by the MJPEG compression and is just more visible when applying the sharpening).

Revgen
22nd May 2005, 15:21
Originally posted by Zetto
So far, I didn't see any artifacts and encoding works with HT :D Finally! However, I am experimenting with sharpen now and it gives me problems every now and then. It works but does not accept values higher than +1 and it sometimes forces vdub to quit without any error messages or vdub just doesn't want to load the avs. I have to quit and reload the file to make it work (without making any changes to the script). Didn't try negative values since it seems a little unproductive - after all, fft3dgpu already softening up the image :D

I've experimented with the sharpen filter a little, and I've never had this problem.

What version of VirtualDub do you use?

I use VirtualDubMod 1.5.10.1 app that comes with Gordian Knot.

Zetto
22nd May 2005, 19:43
I use fft3dgpu(sigma=5,bt=3,mode=1,sharpen=1) with latest regular vdub v1.6.5 build 23350. I guess it's the vdub's fault :D I'll try vdubmod a little later on. TSP, please do improve on sharpen.. Check out other sharpeners with a goal of integrating them ;)

BTW I just noticed the lockup in task manager - the cpu1 usage dropped to 0% while cpu0 kernel rose to 45% for about 10 sec - symptoms of when vdub locked up with .44 version of fft3dgpu... I wonder what happened to that frame that was being processed :confused: It was first pass though, so hopefully second pass will produce a nice frame ;)

So the good news is that the workaround works, but on the bad side of things it doesn't address the issue: why do lockups occur in the first place? After all, without HT, it's smooth sailing all the way through.

Revgen
22nd May 2005, 21:29
Originally posted by Zetto
I use fft3dgpu(sigma=5,bt=3,mode=1,sharpen=1) with latest regular vdub v1.6.5 build 23350. I guess it's the vdub's fault :D I'll try vdubmod a little later on.

According to the VirtualDub sourceforge website ( http://virtualdub.sourceforge.net/ ) the v1.6.5 build is an "experimental" build. The older 1.5.10 build is considered stable. I'm betting that the problem is VirtualDub's fault.

tsp
22nd May 2005, 21:31
Originally posted by Zetto
I use fft3dgpu(sigma=5,bt=3,mode=1,sharpen=1) with latest regular vdub v1.6.5 build 23350. I guess it's the vdub's fault :D I'll try vdubmod a little later on. TSP, please do improve on sharpen.. Check out other sharpeners with a goal of integrating them ;)

Just tried fft3dgpu(sigma=5,bt=3,mode=1,sharpen=20) and it doesn't crash. I wonder if it is HT again. Could you try running with only 1 CPU and see if that works?


BTW I just noticed the lockup in task manager - the cpu1 usage dropped to 0% while cpu0 kernel rose to 45% for about 10 sec - symptoms of when vdub locked up with .44 version of fft3dgpu... I wonder what happened to that frame that was being processed :confused: It was first pass though, so hopefully second pass will produce a nice frame ;)

So the good news is that the workaround works, but on the bad side of things it doesn't address the issue: why do lockups occur in the first place? After all, without HT, it's smooth sailing all the way through.

That sounds like a lockup.

I think it happens because I use a rarely used DirectX command, which might not have been tested much with HT. So the command doesn't work right, but because it doesn't affect the rendering the result should be OK.

JnZ
22nd May 2005, 23:54
Hi everybody,

I've just made some speed tests:

Settings:
FFT3DFilter(sigma=3,bt=3,bh=32,bw=32)
FFT3DGPU(sigma=3,bt=3,bh=32,bw=32)

Configuration:
Athlon 64 3000+@2400MHz, MSI NX6200@550/680.

Clip resolution:720x384

Speed:
FFT3DFilter: ~7.5fps (CPU:800MHz) :D (extreme situation for my Athlon)
FFT3DFilter: ~15fps (CPU:1800MHz)
FFT3DFilter: ~20fps (CPU:2400MHz)
FFT3DGPU : ~15fps (GPU/MEM:300/450MHz,CPU:1800,2400MHz)
FFT3DGPU : ~21fps (GPU/MEM:550/680MHz,CPU:1800,2400MHz)
FFT3DGPU : ~18fps (GPU/MEM:550/680MHz,CPU:800MHz)


It seems that Athlon64 is very strong on high frequencies and my graphics card is too slow to boost the encoding process rapidly. Maybe if I unlock next 4 pipelines...
But, as you see, FFT3DGPU is very good for people with slow CPU and fast graphics card.

Bye

EDIT: I made some real tests with the XviD codec:
XviD settings: Single Pass Q2, BPHQ matrix, Qpel, 2 B-frames, Trellis, VHQ4
I tested 1000 frames from the "Der Untergang" DVD source, LanczosResize(720,384). The whole film contains 223251 frames of video.

FFT3DFilter: 193,1s (CPU:1800MHz)
FFT3DFilter: 142,9s (CPU:2400MHz)
FFT3DGPU : 147,9s (GPU/MEM:550/680MHz,CPU:1800MHz)
FFT3DGPU : 112,4s (GPU/MEM:550/680MHz,CPU:2400MHz)
-------------------------------------------------------------
Filesize:
FFT3DFilter: 7 946 240 b
FFT3DGPU : 9 310 208 b
-------------------------------------------------------------
Approx. estimate to encode the whole video:
FFT3DFilter: 8,862h (CPU:2400MHz)
FFT3DGPU : 6,970h (GPU/MEM:550/680MHz,CPU:2400MHz)

So with the GPU, I can shorten the encoding process by about 1 hour.
I never wanted a strong graphics card, because I don't play much, but now I want 2x6800GT with SLI. With this, the encoding process can be shortened rapidly. :D
Bye
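
The whole-film estimates in this post follow from scaling the 1000-frame timings linearly to 223251 frames. A minimal sketch of that arithmetic is shown here; `estimate_hours` is a hypothetical helper name, and the only inputs are the figures quoted in the post (at CPU 2400 MHz).

```python
def estimate_hours(sample_seconds, sample_frames, total_frames):
    """Scale a timed sample linearly to the full clip and convert to hours."""
    return sample_seconds * total_frames / sample_frames / 3600.0


TOTAL_FRAMES = 223251

cpu_hours = estimate_hours(142.9, 1000, TOTAL_FRAMES)  # FFT3DFilter
gpu_hours = estimate_hours(112.4, 1000, TOTAL_FRAMES)  # FFT3DGPU

if __name__ == "__main__":
    print(f"FFT3DFilter: {cpu_hours:.3f} h")
    print(f"FFT3DGPU:    {gpu_hours:.3f} h")
    print(f"Difference:  {cpu_hours - gpu_hours:.2f} h")
```

Taken at face value, the two estimates differ by closer to 1.9 hours than 1 hour. The extrapolation assumes the sampled 1000 frames are representative of the whole film.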

Revgen
23rd May 2005, 17:19
@tsp

Have you gotten my email with the PDF and JPGs? I'm just asking because it was about 1.5mb in size and may have been too big for your email server to handle.

tsp
23rd May 2005, 18:35
Revgen: yes I got it. I will try testing it later.

JnZ: thanks for the speed test. I think you could increase the compression by using mode=1 instead of the default mode=0 without increasing the encode time much (not compared to when you just preview the video). I don't know if you could include
FFT3DGPU(sigma=3,bt=3,bh=32,bw=32,mode=1)
in your test (and the resulting filesize).
Anyway, the Geforce 6200 performs quite well considering that it is a low budget card.

Revgen
23rd May 2005, 23:52
Originally posted by JnZ


It seems that Athlon64 is very strong on high frequencies and my graphics card is too slow to boost the encoding process rapidly. Maybe if I unlock next 4 pipelines...


You can unlock the extra pipelines on your GPU depending on what revision of the 6200 chipset you have.

Go to this thread http://forums.guru3d.com/showthread.php?s=&threadid=136293 for more info.

tsp
24th May 2005, 12:25
Revgen: I reproduced your problem with GKnot and found a solution. GK doesn't close the video stream after you click on save and encode in the first window (with the title "FrameNo xxx/yyy"), so the memory on the GPU (or main memory) is never deallocated, and when you start the encoding fft3dGPU runs out of memory. The solution is simple: after you have clicked on "Add job to Encoding Queue", close GK instead of clicking on "start encoding". When you open GK again the job is still in the job list, so just click on start encoding this time and it should work. Or you could first include fft3dgpu in the avs script generated by GKnot.

Revgen
24th May 2005, 16:11
Your solution does work. Thanks:)

The only problem now is that the compressibility test still will not work. I can't exit Gknot and redo that. I guess I'm griping too much:D

Maybe, when you have some spare time, you can talk to Lenox about this and maybe he can find a way to solve the problem permanently in the next version of Gknot.


Thanks

tsp
24th May 2005, 18:53
revgen: what if you first add fft3dgpu in the script GKnot creates (save .avs script tab) (page 3 in the pdf file you sent me)?

Revgen
24th May 2005, 21:23
@tsp

I just discovered that both methods work effectively when I want to do a straight encode. But changing the avisynth script still doesn't work when I want to do a correct compressibility check.

Try this.

1)Go to the "script" tab as shown on page 3, and change the script. However you do it doesn't matter. You can delete it all if you want.

2)Then click on the "compressibility check" tab.

3)Now click the "script" tab again and it will appear as it originally was before you changed it.

The compressibility check works, but I don't believe that it takes into account any of the FFT3dGPU settings. I believe this because Gordian Knot has suggested that I use a bitrate between 3500 and 5500 Kbps after I do the test. This may be true when it's not filtered, but I know that with FFT3dGPU I can encode this video at 1500-2000 kbps and it will still look good.

I'm not quite sure, but this seems to be a problem with Gknot rather than FFT3dGPU.

LordIntruder
24th May 2005, 22:53
Hi,


I don't know if this is what you are talking about but I use GordianKnot too to make the first calculations, bitrate, etc... I don't use it to encode, I use VDM manually, modify my script by hand, etc... My problem with GK and FFT3dGPU is whatever the resolution I select, I get the same result. Example:

- I do a comp test in 640 x 480, say I get 45%. So I move the slider to increase the resolution until GK displays ~38%. Say the resolution is now 672 x 496. If I do a comp test again, I get 45% too. And to verify, I selected 560 x 416 and same here, again 45%, which doesn't make any sense. At 720 x ... I was getting 45% too. Great! :)

The work around is quite simple: I use the original FFT3DFilter by Fizick to make my comp test and encode with the GPU version the whole movie.

The first version of FFT3dGPU didn't have this problem. I use 0.43 and there were 2 versions out since. I'll see if the problem still remains but I do not care about that.

Revgen
13th June 2005, 17:42
TSP,

I'm thinking about buying a Dual-Core CPU sometime in the near future.

I know that you have said that DirectX doesn't work well with more than one thread.

Would it be possible to create a version of FFT3dGPU that worked with OpenGL?

AFAIK OpenGL works well with multiple CPUs.

tsp
13th June 2005, 23:15
Revgen: It should be possible, but the annoying thing about OpenGL is that I would have to write separate code for ATI and NVidia cards because they don't support the same OpenGL extensions. Also, fft3dgpu will work fine with dual-core/multi-processor/hyperthreading systems. It will only use 1 processor/core/thread like most of the other filters for avisynth (although some of them can be used with MT, my new filter for the multiprocessor people).

Revgen
14th June 2005, 01:02
I'm sure it will work fine with dual cores. My issue is that using two cores(possibly working with your MT filter) can increase speed.

And I may be wrong but I believe that new Nvidia beta drivers are now 100% compatible with OpenGL 2.0. I don't own an ATI card so I don't know for sure if they have OpenGL 2.0 in their drivers. I'm going to check up on it and report back.

If ATI and Nvidia both comply fully with OpenGL 2.0 I figure that it shouldn't be too hard to program for both of them. But then again I'm not a programmer, so what do I know :D.

EDIT

From the new information that I've gathered, Nvidia beta drivers 75.90 and up support OpenGL 2.0 completely.

Catalyst drivers 5.3 and up support OpenGL 2.0 for ATI.

Unfortunately ATI hardware currently CAN execute all OpenGL 2.0 parameters, but they CANNOT perform certain parameters up to OpenGL 2.0 performance specs.

An ATI software engineer explains it here (http://www.rage3d.com/board/showpost.php?p=1333565842&postcount=34).

I'm not sure if these limitations will affect what FFT3dGPU does or not. You would probably know better. :D

I hope this helps.

tsp
14th June 2005, 15:25
Revgen: I don't think it would be faster with OpenGL compared to DirectX on a dual-core machine if the driver works in another thread (and I think it does), because the limiting factor would still be the graphics card.

Also I haven't tried using OpenGL

acrespo
14th June 2005, 18:36
I have some problems with FFT3DGPU. Sometimes the frame sequence is interrupted to show an old frame, and sometimes the frame is green instead of the correct image.
Source is PIC MJPEG 3, 640x480 PAL-M (29.97 fps).
My script:


AviSource("d:\capture.avi")
Trim(0,18455)++Trim(19452,53420)

Crop(8,8,-8,-8, align=true)
LanczosResize(width,height*2)
TurnLeft()
SangNom()
TurnRight()
SangNom()
BilinearResize(640,480)

FFT3Dgpu(sigma=2,bt=3,plane=0)
FFT3Dgpu(sigma=2,bt=3,plane=1)
FFT3Dgpu(sigma=2,bt=3,plane=2)

RemoveDirt(repmode=16)
awarpsharp(depth=20)
LimitedSharpen(ss_x=1.0, ss_y=1.0)

Revgen
14th June 2005, 21:12
It might be a problem with the MJPEG codec. In my personal experience, MJPEG formats usually run into problems when they are edited or converted to other formats.

You also are using plane=2. FFT3dGPU doesn't use plane=2. Both Chroma planes are processed when you use plane=1.

Also try getting rid of the other filters and see if the problem still persists.

I hope this helps.

Zetto
19th June 2005, 21:59
It'll work just fine with dual-core, I have the next best thing - Intel HT CPU and it works... although I had some troubles initially ;) BTW, it is much faster with HT enabled rather than disabled - by about 30% in my case. However, my cpu is not 100% loaded, it seems that my videocard is the bottleneck, it's a fast one too - 6800 ultra.

AI
29th July 2005, 09:11
I think downloading a frame from VRAM is slower than uploading it to VRAM.

Maybe include downsizing the frame on the GPU? (Lanczos for luma and bilinear for chroma)
Because many people use a resize after this filter.

possible script:
--------------------
MergeChroma(fft3dgpu(plane=0,x=512,y=384),fft3dgpu(plane=1,x=512,y=384))
--------------------

original chroma (luma) only crop (not resize, because it is slow)

or if I use a fast CPU and a fast GPU (or a slow CPU and a slow GPU)
I can use this script:
-------------
MergeChroma(fft3dfilter(plane=0).LanczosResize(512,384),fft3dgpu(plane=1,x=512,y=384))
------------

PS Excuse me my english (my language is russian)

tsp
29th July 2005, 18:53
It should be possible to implement that as a post processing filter before downloading the result to main memory. But currently my spare time to code this filter and the current multithreaded version of avisynth is very limited until late August.

AI
5th August 2005, 07:28
while you busy,
may be I say several my ideas?
(I am a generator ideas :D)

I think your first versions used only one buffer.
The new version is double buffered (the first buffer is processed on the GPU, the second is copied to memory and then processed on the CPU).

I suggest triple buffering (the first buffer is processed in VRAM, the second is copied from VRAM to RAM, and simultaneously the third buffer is processed on the CPU by the next filters or the encoder, for instance XviD).

i.e. we have 3 parallel proceses
1) fft3dGPU in GPU
2) Download from VRAM to main RAM
3) other proceses in CPU

What do you think about this?

tsp
6th August 2005, 00:04
Well, currently fft3dgpu does process the filters before fft3dgpu for the next frame while the GPU is working on the current frame. I don't know how well the GPU handles downloading from the GPU and processing on the GPU at the same time.

AI
8th August 2005, 04:46
I think you understand me.
(In a complicated way write on unacquainted language)

I want to elaborate that I bore in mind:

1) current your version: (algorithm steps)
- download from VRAM already ready frame (N)
- upload to VRAM next frame (N+1)
- run GPU (N+1 frame)
- end (send management AVISynth)

2) I offer: (N = integer, number of the current frame)
- upload to VRAM N+2 frame
- run in GPU N+2 frame
- run download from VRAM already ready frame N+1
- send AVISynth already downloaded frame N

PS say you so have understood previous my post?
PPS if I use DePanInterleave, I want every third frame... What you think about this optimization?

tsp
9th August 2005, 15:59
Slight correction: currently fft3dgpu works like this:

- check if the needed frame is in the GPU cache (bt=1 uses 1 frame at a time, bt=2 uses 2 and bt=3 uses 3 frames at a time); if not, shift the pixels (because the order in which the pixels are stored is different in an uploaded texture compared to an avisynth frame), upload the frame to the GPU and do a 2D FFT
- start processing on the GPU
- while this is running, get the next frame needed. Currently the frame is NOT uploaded to the GPU, but this is a thing I'm thinking of implementing.
- when this is done and if the GPU is not done, suspend avisynth until the GPU is done, so if the result is encoded the encoder gets CPU time to work
- download the result to main memory and return it
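
The first step in the list above (only upload a frame if it is not already in the GPU cache) is what keeps a temporal window from re-uploading its neighbours on every request. The following Python sketch models just that bookkeeping. It is not fft3dGPU's code: `frames_needed`, `GpuFrameCache`, and `count_uploads` are hypothetical names, and the assumption that bt=2 reads the previous and current frame while bt=3 reads previous, current and next is mine, not something stated in the post.

```python
def frames_needed(n, bt, last_frame):
    """Frame numbers that a temporal window of size bt touches for frame n."""
    if bt == 1:
        window = [n]
    elif bt == 2:
        window = [n - 1, n]          # assumed: previous + current
    elif bt == 3:
        window = [n - 1, n, n + 1]   # assumed: previous + current + next
    else:
        raise ValueError("this sketch only models bt = 1, 2 or 3")
    # Clamp at the clip boundaries.
    return [min(max(f, 0), last_frame) for f in window]


class GpuFrameCache:
    """Remembers which frames are already on the GPU, up to `capacity`."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = []   # oldest first
        self.uploads = 0

    def ensure(self, frame_number):
        if frame_number in self.frames:
            return                      # cache hit: nothing to upload
        self.uploads += 1               # miss: shift pixels, upload, 2D FFT
        self.frames.append(frame_number)
        if len(self.frames) > self.capacity:
            self.frames.pop(0)          # evict the oldest frame


def count_uploads(num_frames, bt):
    """Uploads needed to serve frames 0..num_frames-1 in order."""
    cache = GpuFrameCache(capacity=bt)
    for n in range(num_frames):
        for f in frames_needed(n, bt, num_frames - 1):
            cache.ensure(f)
    return cache.uploads
```

For sequential access, each source frame is uploaded once regardless of bt. Without the cache, bt=3 would upload roughly three frames per output frame.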

AI
31st August 2005, 08:40
Why is fft3dfilter(plane=0).fft3dgpu(plane=1) faster
than fft3dgpu(plane=1).fft3dfilter(plane=0)?

i.e. using the CPU before fft3dgpu is faster
than using the CPU after fft3dgpu...

tsp
31st August 2005, 12:59
Because when fft3dGPU is using the GPU the CPU is fetching the next frame, that is, all the filters before fft3dGPU are running concurrently with the GPU. Because fft3dGPU doesn't know about the filters after fft3dGPU, it is not possible to run these concurrently with fft3dGPU. To understand this you need to understand how avisynth works. If your script looks like this:

Avisource("c:\test.avi")
fft3dfilter(plane=0)
fft3dGPU(plane=1)

When VirtualDub asks avisynth to deliver a frame, it first asks fft3dGPU to return a frame. fft3dGPU then asks fft3dfilter to deliver a frame, which in turn asks Avisource to return a frame. Avisource loads the frame and returns it to fft3dfilter, which processes it and returns the result to fft3dGPU. fft3dGPU then works on the frame, but because it uses the GPU the CPU is free, so fft3dGPU asks fft3dfilter to deliver the next frame while the GPU works. When avisynth then asks fft3dGPU to deliver the next frame, fft3dGPU already has the result from fft3dfilter. You can see that if fft3dfilter and fft3dGPU were used in the reverse order, fft3dGPU would only call Avisource to deliver the next frame.
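
The effect of filter order can be made concrete with a very simplified timing model. The Python sketch below is an illustration under stated assumptions, not a measurement: the function names and the per-frame costs passed in are hypothetical, source decoding is treated as free, and upload/download time is ignored.

```python
def time_cpu_before_gpu(num_frames, cpu_s, gpu_s):
    """CPU filter feeds the GPU filter.

    While the GPU works on frame n, the GPU filter prefetches frame n+1
    from the CPU filter, so the two costs overlap.
    """
    total = cpu_s  # the first frame has no GPU work to hide behind
    for n in range(num_frames):
        is_last = n == num_frames - 1
        prefetch = 0.0 if is_last else cpu_s
        total += max(gpu_s, prefetch)
    return total


def time_gpu_before_cpu(num_frames, cpu_s, gpu_s):
    """GPU filter feeds the CPU filter.

    The GPU filter can only prefetch the (cheap) source frame, so the CPU
    filter runs after the GPU result is back: the costs add up.
    """
    return num_frames * (gpu_s + cpu_s)


if __name__ == "__main__":
    # Hypothetical per-frame costs in seconds.
    frames, cpu_s, gpu_s = 100, 0.05, 0.05
    print("CPU filter first:", time_cpu_before_gpu(frames, cpu_s, gpu_s))
    print("GPU filter first:", time_gpu_before_cpu(frames, cpu_s, gpu_s))
```

In this model the per-frame cost approaches max(cpu, gpu) when the CPU filter comes first and cpu + gpu when it comes last, which matches the direction of the speed difference asked about.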

Fizick
2nd September 2005, 19:52
tsp,
please edit this line in your FAQ (first post):
A: fft3dGPU(mode=1,usefloat16=false) is similair to fft3dfilter(oh=bh,ow=bw)

to correct line:
A: fft3dGPU(mode=1,usefloat16=false) is similar to fft3dfilter(oh=bh/2,ow=bw/2) for same bt and power 2 bw,bh

LordIntruder
8th October 2005, 08:25
Hi,

Tsp is there any new update planned? Thanks a lot. :)

tsp
9th October 2005, 16:11
LordIntruder: I'm nearly done with the MT version of avisynth, so I will turn my attention to fft3dGPU again, and now that Fizick has released the source code for fft3dfilter (Fizick, thank you very much) I will try to convert Fizick's code to a GPU version (by using my own fft code instead of fftw and converting fft3dfilter_c.cpp to DirectX 9 HLSL). The cache code will also need a minor rewrite. I have my last 3 large exams in the next 3 months before I'm done at the university, so my time might be rather limited :sly:

aberforthsgoat
10th October 2005, 07:59
Hiya guys!

What's the lowdown on fft3dgpu and interlaced source material? With fft3dfilter you can set it to interlaced=true - but we don't seem to have an option like that here. I've run some searches but haven't found much so far. What would you guys recommend?

Mike

Backwoods
10th October 2005, 16:24
Have you tried,

SeparateFields()
FFT3DGPU()
Weave()
?

tsp
10th October 2005, 18:04
sf=SeparateFields()
Interleave(sf.selecteven().FFT3DGPU(),sf.selectodd().FFT3DGPU())
Weave()

will be better, otherwise the even and odd fields will be mixed

Mug Funky
11th October 2005, 05:49
wouldn't that only happen if bt=3?

tsp
11th October 2005, 06:58
or bt=2 yes that's right
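
Why the fields get mixed only for bt=2 or bt=3 can be shown with a small model of the field sequence. This is a Python sketch, not AviSynth code: `separate_fields`, `temporal_window`, and `parities` are hypothetical helpers, top-field-first order is assumed, and the window offsets for bt=2 and bt=3 are an assumption about which neighbours the filter reads.

```python
def separate_fields(num_frames):
    """Model SeparateFields: each frame becomes a top and a bottom field."""
    return [(frame, parity)
            for frame in range(num_frames)
            for parity in ("top", "bottom")]   # assumed field order


def temporal_window(clip, i, bt):
    """Entries a temporal filter of size bt would read around index i."""
    offsets = {1: [0], 2: [-1, 0], 3: [-1, 0, 1]}[bt]
    return [clip[min(max(i + o, 0), len(clip) - 1)] for o in offsets]


def parities(window):
    """Set of field parities present in a window."""
    return {parity for _, parity in window}


fields = separate_fields(4)

# Filtering the separated fields directly: neighbours alternate parity.
mixed = parities(temporal_window(fields, 3, bt=3))

# Filtering only every second field (what SelectEven keeps): one parity.
even_only = fields[0::2]
same = parities(temporal_window(even_only, 1, bt=3))

# With bt=1 there are no temporal neighbours, so nothing can be mixed.
single = parities(temporal_window(fields, 3, bt=1))
```

Here `mixed` contains both parities while `same` and `single` contain only one, which is the reason for filtering the even and odd fields separately before interleaving and weaving them.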

aberforthsgoat
11th October 2005, 14:12
sf=SeparateFields()
Interleave(sf.selecteven().FFT3DGPU(),sf.selectodd().FFT3DGPU())
Weave()

will be better, otherwise the even and odd fields will be mixed

Thanks - that seems to be giving me a great start. I'm now running into some trouble, however. I've put together the following script to work on a DVD of a football game:

LoadPlugin("C:\PROGRA~1\GORDIA~1\DGMPGDec\DGDecode.dll")
LoadPlugin("C:\PROGRA~1\GORDIA~1\AviSynthPlugins\UnDot.dll")
LoadPlugin("C:\Program Files\GordianKnot\AviSynthPlugins\fft3dgpu.dll")
LoadPlugin("C:\PROGRA~1\GORDIA~1\AviSynthPlugins\TomsMoComp.dll")

mpeg2source("D:\Football4\Chargers-Pats.d2v")

sf=SeparateFields()
Interleave(sf.selecteven().FFT3DGPU(bt=3, sigma=4, sharpen=0.4, mode=1, plane=1),sf.selectodd().FFT3DGPU(bt=3, sigma=4, sharpen=0.4, mode=1, plane=1))
Interleave(sf.selecteven().FFT3DGPU(bt=3, sigma=4, sharpen=0.4, mode=1, plane=2),sf.selectodd().FFT3DGPU(bt=3, sigma=4, sharpen=0.4, mode=1, plane=2))
Interleave(sf.selecteven().FFT3DGPU(bt=3, sigma=4, sharpen=0.4, mode=1, plane=0),sf.selectodd().FFT3DGPU(bt=3, sigma=4, sharpen=0.4, mode=1, plane=0))
Weave()

TomsMoComp(1,5,1)

crop(2,6,348,568)

Lanczos4Resize(1024,768)

Undot()



I thought I was doing things exactly right - first processing, then deinterlacing. But it's not working right. Without the TomsMoComp line, the output is rather amazing - except for bad combing artifacts. (Which my on-the-fly deinterlacers in ZP only make worse.)

*With* TomsMoComp I get an error message from ps.hlsl about an unexpected #else following #else (X1514) and redefinition of "o" (X3003).

If I'm asking a stupid question (i.e., one that a million people have already asked), please feel free to direct me to the nearest FAQ and/or residential facility for the criminally dense.

Best,

Mike

acrespo
11th October 2005, 14:33
If you need to deinterlace your video, put FFT3DGPU after the deinterlacer. Also, try TDeint instead of TomsMoComp.
FFT3DGPU doesn't have plane=2. plane=1 denoises all chroma planes.

aberforthsgoat
11th October 2005, 14:49
If you need to deinterlace your video, put FFT3DGPU after the deinterlacer. Also, try TDeint instead of TomsMoComp.
FFT3DGPU doesn't have plane=2. plane=1 denoises all chroma planes.

Oh - OK! I carried that over from the fft3dfilter, thinking it would work here. Back to R-ing TFM ...

And I also just discovered that simply not re-weaving at the end takes care of my combing problem. GSpot says I've upped the ante to 50 FPS, which doesn't seem like a bad thing anyway, particularly since this is fast moving sports stuff. (Or am I making a fool of myself again? Sigh.)

Mike

tsp
11th October 2005, 20:24
aberforthsgoat: You will only have half the vertical resolution(and double framerate) if you don't weave the result.

Ferux
21st October 2005, 15:42
Hi,

Thank you for this great filter, the speed is x3 here (AMD Athlon 64 3500+ and Radeon 9800). The only problem for me is that I can't set the system in standby when using your filter. After resuming I receive the following message:

Unexpected error encountered

File:
Line: 387
Error Code: D3DERR_DEVICELOST (0x88760868)
Calling: ResetDevice
Do you want to debug the application?

When I hit 'No':

fft3dGPU

Direct3D device lost. Please restart the application



I use Avisynth 2.5.6 and VirtualDub 1.6.11.

tsp
22nd October 2005, 00:46
Do you have any other program running that is using Direct3D? The filter should handle resuming from standby without that error. It only happens if fft3dGPU can't reinitialize the graphics card. I did a resume from suspend with fft3dGPU and vdub 1.6.11 and that worked. You are using the latest version of fft3dgpu, right?

Ferux
22nd October 2005, 01:54
I think that my Windows logon screen (which appears when resuming from standby) uses directx, because I have the same problem when switching users. :(

Leak
23rd October 2005, 12:19
I think that my Windows logon screen (which appears when resuming from standby) uses directx, because I have the same problem when switching users. :(
Ummm... making your system go into standby mode will completely power off the graphics card, which means you need to completely re-initialize both the driver and the part of the application that was using the card, so without tsp handling this gracefully in fft3dGPU it simply can't work - which is a pity, of course, if you happen to try to sleep next to a machine chugging away on a long encoding job; nothing like sending it into standby mode until breakfast... ;)

@tsp: where does the need to restart the application come from? Shouldn't it be enough to just re-init DirectX and upload all the needed data again, which might include re-processing a few earlier frames, if their data is needed for the current frame?

np: Stockfinster - Verge (All Becomes Music)

tsp
23rd October 2005, 14:42
leak: I do reinitialize DirectX. That error only occurs if the reinitializing fails for some reason. On my machine fft3dGPU handles a suspend/standby/hibernate without problem. In this case the error occurs because the DirectX device can't be reset. See this link: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directx9_c/IDirect3DDevice9__TestCooperativeLevel.asp And fft3dGPU is not a fullscreen application, so it doesn't need focus to work. Currently in fft3dGPU there is a 120 sec timeout before the error is reported. So maybe I set this value too low in this case?

Ferux could you post the script you use.

Leak
23rd October 2005, 21:26
Currently in fft3dGPU there is a 120 sec timeout before the error is reported. So maybe I set this value too low in this case?
I'm just going out on a limb here, but could it be that in his case the timeout triggers because it starts right before/during going into standby (as the applications aren't stopped, but the drivers have to be), so that if you wake the machine after more than 2 minutes it will have already timed out?

If he gets the message right after waking his machine instead of about 2 minutes later that's probably the case...

(All speculation, of course... ;))

np: Stockfinster - Last Report (All Becomes Music)

Ferux
24th October 2005, 23:30
Ferux could you post the script you use.


AVISource("E:\Video's in productie\Prullebak\4de video ffvfw quant1.avi",pixel_type="yuy2")

Trim(0,6303) ++ Trim(6422,12260) ++ Trim(12295,19201) ++ Trim(19218,21088) ++ Trim(21104,21508) ++ Trim(21573,23771) ++ Trim(23799,25292) ++ Trim(25327,29465) ++ Trim(29475,31139) ++ Trim(31143,31348) ++ Trim(31509,32302) ++ Trim(32657,33452) ++ Trim(33496,45127) ++ Trim(45149,51298) ++ Trim(51489,52859) ++ Trim(52911,58559) ++ Trim(58659,67096) ++ Trim(67113,73240) ++ Trim(73314,75169) ++ Trim(75192,81295) ++ Trim(81348,97794) ++ Trim(97905,108199) ++ Trim(108242,115934) ++ Trim(115980,119239) ++ Trim(119260,120939) ++ Trim(120986,122388) ++ Trim(122526,126301) ++ Trim(126415,141354) ++ Trim(141364,153625) ++ Trim(153690,155834) ++ Trim(155907,156938) ++ Trim(156961,174748) ++ Trim(174873,180006) ++ Trim(180029,200628) ++ Trim(200675,206831) ++ Trim(206854,207221) ++ Trim(207239,207567) ++ Trim(207810,211420) ++ Trim(211528,211639) ++ Trim(211670,219630)

AssumeTFF()
SeparateFields()
AssumeFieldBased()
SmoothDeinterlace(tff=true, doublerate=true)

ConvertToYV12()
FFT3DGPU(plane=0,bt=3,sigma=2.0,bw=48,bh=48,mode=1)
FFT3DGPU(plane=1,bt=3,sigma=2.0,bw=48,bh=48,mode=1)

Crop(12,0,-12,-12)

FadeIO2(150)

Subtitle("Video 4",first_frame=0,last_frame=350,align=5,size=60,text_color=$FFFFFF,y=225)
Subtitle("31-01-1991",first_frame=0,last_frame=350,align=5,size=60,text_color=$FFFFFF,y=315)
Subtitle("tot",first_frame=0,last_frame=350,align=5,size=60,text_color=$FFFFFF,y=370)
Subtitle("17-02-1992",first_frame=0,last_frame=350,align=5,size=60,text_color=$FFFFFF,y=425)
Subtitle("SPEELDUUR 2:24", first_frame=0,last_frame=275,align=5,size=14,text_color=$FFFFFF,y=495)
Subtitle("ZET DEBLOCKING OP IN DE DECODER", first_frame=0,last_frame=275,align=5,size=14,text_color=$FFFFFF,y=510)
Subtitle("CAPTURE DOOR VIRTUALDUB & DIVX QUANTIZER 93%", first_frame=0,last_frame=275,align=5,size=14,text_color=$FFFFFF,y=525)
Subtitle("POSTPROCESSING DOOR AVISYNTH: TRIM, SMOOTH DEINTERLACE, FFT3DGPU, CROP, FADE EN TITELS", first_frame=0,last_frame=275,align=5,size=14,text_color=$FFFFFF,y=540)
Subtitle("VIDEO CODERING: XVID (MPEG4-ASP) 2PASS 3300KBPS, GECODEERD IN OKTOBER 2005", first_frame=0,last_frame=275,align=5,size=14,text_color=$FFFFFF,y=555)



After the 'smooth deinterlacer', the video has 50 fps. This took 24h for a 2:24 home video. If I process this with FFT3DFILTER, it takes 3 days. So I open this script in VirtualDub 1.6.11, check 'fast recompress' and save it with FFVFW quantizer 1 (latest version). Later on, I encode it to XviD in 2 passes.

My system is:
Athlon 64 3500+
MSI Neo2 Platinum
1024MB Ram
Radeon 9800 Pro

The OS is Windows XP SP2 (not an x64 edition)

I noticed that the problem also occurs when I touch the PC when the screensaver is on and when I switch to another WinXP-user.

Currently in fft3dGPU there is a 120 sec timeout before the error is reported. So maybe I set this value too low in this case?

When I switch to another WinXP user, the error comes just after logging on. And that's much faster than 120 sec.


Thanks


nothing like sending it into standby mode until breakfast...

²!!!

Boulder
25th October 2005, 08:00
A little sidenote: you don't need SeparateFields() and AssumeFieldBased() in your script.

Ferux
25th October 2005, 12:16
A little sidenote: you don't need SeparateFields() and AssumeFieldBased() in your script.

I know, but without those 3 lines SmoothDeinterlace doesn't work! I've had that problem since I started using SmoothDeinterlace in AviSynth 2.56 instead of 2.08. But whatever, this makes no difference for FFT3DGPU.

Boulder
25th October 2005, 13:23
How about trying some other smart bobber such as LeakKernelBob (in LeakKernelDeint.dll) or TDeint(mode=1)? Seems weird that you need to separate the fields because in my logic the result won't be the same as you already have a 50fps stream before bobbing. Maybe it simply resizes the fields to full height?

LeakKernelBob should be quite a bit faster as well ;)

Ferux
25th October 2005, 23:21
How about trying some other smart bobber such as LeakKernelBob (in LeakKernelDeint.dll) or TDeint(mode=1)? Seems weird that you need to separate the fields because in my logic the result won't be the same as you already have a 50fps stream before bobbing. Maybe it simply resizes the fields to full height?

LeakKernelBob should be quite a bit faster as well ;)


I have tested a lot of deinterlacers for myself, and I found that SmoothDeinterlace was the best. So I already used that deinterlacer in VirtualDub and AviSynth 2.08. Last week I changed to AVS 2.56 and my previous script didn't work anymore (of course, I downloaded the right plugin for this new version). I searched a bit and found that doing this:

AssumeTFF()
SeparateFields()
AssumeFieldBased()
SmoothDeinterlace(tff=true, doublerate=true)
gives exactly the same result as this:

SmoothDeinterlace(tff=true, doublerate=true)

Why? I don't know (and I don't care), but it works.

Wilbert
26th October 2005, 00:05
Try the following version of SmoothDeinterlacer. It should work without any problems.

Attachment needs to be approved :)

Ok, here: http://www.geocities.com/wilbertdijkhof/SmoothDeinterlacer_25.zip

Revgen
26th October 2005, 03:06
To Nvidia users with dual-core systems:


The new 81.85 dual-core optimized drivers slow down FFT3DGPU.

I encoded a Huffyuv file and got about 10 fps with the older non-dual-core 81.26 driver. I got about 8 fps with the 81.85s. That's about a 20% decrease in performance, which is about the same decrease you get from running FFT3DGPU with the MT filter.

It does, however, improve performance in games that are CPU bound. So make sure to change your driver if you plan to encode.
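For reference, the arithmetic behind the 20% figure, as a small sketch (the function names are made up for this example). The same drop means an encode takes 25% longer.

```python
def percent_change(old_fps, new_fps):
    """Relative change in throughput, in percent (negative means slower)."""
    return (new_fps - old_fps) * 100.0 / old_fps


def encode_time_factor(old_fps, new_fps):
    """How many times longer the same encode takes at the new frame rate."""
    return old_fps / new_fps


print(percent_change(10, 8))      # 81.26 driver vs 81.85 driver
print(encode_time_factor(10, 8))
```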

tsp
26th October 2005, 19:44
Ferux: Could you try this (http://www.avisynth.org/tsp/FFT3dGPU.zip) version. I disabled the timeout so it might hang instead of reporting an error.

Ferux
28th October 2005, 18:42
Ferux: Could you try this (http://www.avisynth.org/tsp/FFT3dGPU.zip) version. I disabled the timeout so it might hang instead of reporting an error.

It doesn't work, but maybe something went wrong when uploading the file? This file is 390KB, the original FFT3DGPU is 1,02MB.




Try the following version of SmoothDeinterlacer. It should work without any problems.

It works, thanks!


Sorry for not answering more quickly, but I had already been encoding for 3 days. It just finished today.

tsp
28th October 2005, 20:23
It doesn't contain the source or the DirectX 9.0b version. Did it refuse to start or just show the same error?

acrespo
30th October 2005, 03:25
Could bt=-1 be implemented, as described in the fft3dfilter documentation:

Sharpening

At the sharpening stage (after denoising) the plugin amplifies high spectrum (spatial, 2D) frequencies.
There is also a sharpen-only mode without denoising (bt=-1).
Since version 1.1, a special limited sharpening method is used:

* the weakest frequencies (with small amplitudes) are not amplified, to prevent an increase in noise;
* the strongest frequencies (with large amplitudes) are not amplified, to prevent oversharpening and haloing.

The sharpening strength is maximal for frequencies with middle-range amplitudes. Of course, you can control both these margins and general sharpening strength.

Since v.1.7, Gaussian High Pass Filter with variable cutoff frequency is used for sharpening.
----------

I need a very fast sharpening plugin, but all filters run at very low speed on my computer.

Ferux
30th October 2005, 10:18
It doesn't contain the source or the directx 9.0b version. Did it refuse to start or just showed the same error?

It refused to start:
"This application can't be started because d3dx9_2 can't be found."

The AVS error message:
"Script error: there is no function named "FFT3DGPU""

tsp
30th October 2005, 13:01
Ferux: I hoped that wouldn't happen. It's the same problem as with the new VirtualDub. You can solve it by installing the latest version of DirectX: http://www.microsoft.com/downloads/details.aspx?FamilyId=9930EFA6-9F7B-4C8A-AEA2-97DD6AB307A2&displaylang=en
If you have a slow connection (the file is about 34 MB!), I have compressed the necessary files in this (http://www.avisynth.org/tsp/directx.zip) zip file (2 MB). Just extract them to windows\system32.

acrespo: I'm working on it. But it will take a while

Ferux
30th October 2005, 22:03
I installed that DirectX.

When I open an AVS script in AVISynth, I get these messages (about 20 of them):

C2R (The title is always different)

C:\Program Files\AviSynth2\plugins\ps.hlsl(524): warning X3083: Truncating 4-vector to size 1


After clicking these messages away, I can use VirtualDub like always. So I save, and AVS starts processing. No problems, until I try to switch to another Windows user. When I log back on to the Windows user where AVS is running, it gives a lot of errors like these:


File:
Line: 1249
Error Code: S_OK (0x00000000)
Calling: FFT3p
Do you want to debug the application?

File:
Line: 1047
Error Code: S_OK (0x00000000)
Calling: WFilter
Do you want to debug the application?

File:
Line: 1317
Error Code: D3DERR_INVALIDCALL (0x8876086c)
Calling: iFFT3p
Do you want to debug the application?

File:
Line: 684
Error Code: S_OK (0x00000000)
Calling: BitReverseButterFlyV
Do you want to debug the application?

File:
Line: 947
Error Code: S_OK (0x00000000)
Calling: ButterflyCollectV
Do you want to debug the application?


...and many more.
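The hex values in these dialogs are standard Windows HRESULTs, so they can be split into their fields. This is a small sketch with a made-up helper name, `decode_hresult`. The bit layout follows the HRESULT_FACILITY and HRESULT_CODE macros from winerror.h, and 0x876 is the facility value the Direct3D 9 headers use for their error codes.

```python
def decode_hresult(hr):
    """Split a 32-bit HRESULT into its failure bit, facility and code fields."""
    hr &= 0xFFFFFFFF
    return {
        "failed": bool(hr >> 31),         # top bit set means failure
        "facility": (hr >> 16) & 0x1FFF,  # which subsystem reported it
        "code": hr & 0xFFFF,              # subsystem-specific error number
    }


invalid_call = decode_hresult(0x8876086C)  # D3DERR_INVALIDCALL from the dialogs
s_ok = decode_hresult(0x00000000)          # S_OK from the dialogs

print(invalid_call["failed"], hex(invalid_call["facility"]), invalid_call["code"])
print(s_ok["failed"])
```

Only the iFFT3p dialog carries a code with the failure bit set. The S_OK dialogs report a code that by itself signals success, so for those the value shown does not say what actually went wrong.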

After clicking those messages away, the processing continues! But AVS changed the color of 4 frames (it must have been at the time those errors occurred).

When I'm "in" the other Windows user, I see that the CPU usage is about 0%, so FFT3DGPU doesn't do anything at that moment.

tsp
2nd November 2005, 21:53
Ferux: Just to prove that I'm not dead here is a new version that might fix the corrupted frames after recovering (should also fix the popups).

You can get it here (http://www.avisynth.org/tsp/fft3dgpu_47.zip)

Ferux
4th November 2005, 17:48
And... it works! But I'm still having those colored frames. That's not really a problem because it's possible to replace those 4 frames in Virtualdub.


Many thanks for the support, tsp!

tsp
6th November 2005, 23:05
Ferux: Could you try this (http://www.avisynth.org/tsp/fft3dgpu_47.zip) version? It shouldn't produce those colored frames.

Ferux
7th November 2005, 20:06
Congrats, it works!

The messages at the beginning, 'Truncating 4-vector to size 1' (+/- 20 of them), still appear, but the rest of the plugin seems to work normally.

Again, many thanks.

AI
8th November 2005, 06:52
tsp

I cannot load your new plugin versions with LoadPlugin: fft3dgpu_47.zip from 6 November and FFT3dGPU_047.zip from 2 November.

In both versions the dll is small, about 400 KB, while in version 0.46.1 it is 900 KB.

PS Excuse me, my very bad English

tsp
8th November 2005, 12:19
Ferux: Just replace the ps.hlsl file with the ps.hlsl file from the fft3dgpu_47.zip archive. If you did that and you still get the error, could you post the line number it reports?

AI: Did you put the file d3dx9_25.dll in your c:\windows\system32 directory? If not, try installing the latest DirectX version:
http://www.microsoft.com/downloads/details.aspx?FamilyId=9930EFA6-9F7B-4C8A-AEA2-97DD6AB307A2&displaylang=en

Ferux
8th November 2005, 16:52
Oh, sorry I forgot that ps.hlsl. Everything is OK now.

Great job tsp!

tsp
9th November 2005, 21:44
Good. I posted the new version 0.47 in the first post. The only changes are the above fix and that readme.txt has been renamed to fft3dgpu.txt.

aberforthsgoat
21st November 2005, 09:41
Hmm.

I just tried installing the latest version of fft3dgpu, and AVS says it can't load. I copied *both* the new version of ps.hlsl and the new version of fft3dgpu into my plugins directory. As soon as I revert back to the version 0.42 stuff, everything is fine again.

Any tips?

Mike

tsp
21st November 2005, 11:31
aberforthsgoat: from the first post:

if you don't have the latest version of directx installed (october 2005) you can get it here: http://www.microsoft.com/downloads/...&displaylang=en (34 MB)
or extract the file d3dx9_25.dll to the c:\windows\system32 directory or use the directx 9.0b version.

Did you try that?

AI
21st November 2005, 12:04
2 tsp

Please add a line about "d3dx9_2x.dll" to the change list, and make this text bold in the first post

(because this is not obvious)

PS Excuse my very bad English :(

tsp
21st November 2005, 14:32
I added a comment to the changelog about reading the install instructions. Also, I got the degrid working for bt=1, but it seems to only work well with mode=1. I will release the next version when degrid has been added to bt=2,3 (I might also change the sharpening code too).

ariga
25th November 2005, 14:36
Version 0.46 with a GeForce FX 5600 Ultra, I get

Line: 605
Error Code: D3DERR_INVALIDCALL (0x8876086c)
Calling: Create TextureM:Texture

Line: 380
Error Code: D3DERR_INVALIDCALL (0x8876086c)
Calling: FFT2dRR::Create R2CLUT

Using DirectX 9c, 2.6GHz P4 HT.

tsp
26th November 2005, 03:01
ariga: I will post the next version shortly, but please post your script, otherwise it is very difficult to figure out what is wrong.

tsp
26th November 2005, 15:57
ok version 0.5 is ready to download. Includes Kalman, sharpening, bt=4, degrid from fft3dfilter. Degrid only works well with mode=1. Also currently Kalman filtering is not supported on the geforce fx 5xxx. Rewrote some of the code so it might be faster than the last version.

Chainmax
26th November 2005, 16:39
Thanks for this much expected update, keep up the good work :).

Fizick
26th November 2005, 17:33
degrid: ... (but it does degrid sharpening with kalman
What is degrid sharpening?

tsp
26th November 2005, 18:16
sharpening with degrid.

tsp
26th November 2005, 23:06
Uploaded version 0.5a. It fixes a bug with bt=2. The only file changed is fft3dgpu.hlsl.

Revgen
26th November 2005, 23:59
What is degrid sharpening?

sharpening with degrid.

LOL :D

I'll eventually try it out and see how it works, once I find some time.

Kopernikus
27th November 2005, 19:13
@tsp: Is there somewhere more information about shader programming available? Perhaps a sort of SDK?

AssassiNBG
27th November 2005, 19:57
Umm ... am I blind or is there no new version on http://www.avisynth.org/tsp/ ?

tsp
27th November 2005, 20:33
Sorry, I forgot to update the index page. It should be fixed now. Also, I uploaded a new version 0.51 that fixes a bug where the parameters after NVPerf were shifted one place, so degrid=scutoff, scutoff=svr, etc. It also improves download speed from the GPU, and Kalman should now work with GeForce FX 5xxx.

I created a special version of fft3dGPU that reports the time it takes to download the final image from the GPU. You can get it here (http://www.avisynth.org/tsp/bwtest.zip). Just run the included download speed.avs after the included fft3dgpu.dll has been extracted to the plugin directory. On my computer with an AGP GeForce 6800GT it takes ~4.3 mikrosec to download it. That's about 92 MBytes/sec (or about 100 million bytes/sec). AGP 8x upload speed is about 2100 MBytes/sec. So it would be nice if anyone with a PCI Express GPU would run the test so we can compare the results.
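A rough sketch of how a download time turns into the two rates quoted above. The post states neither the frame size nor how the unit was measured, so both inputs here are assumptions: a single 8-bit 720x576 plane, and the 4.3 figure read as milliseconds, which is the reading under which the two quoted rates come out.

```python
def download_rate(frame_bytes, seconds):
    """Return (million bytes/sec, MBytes/sec with 1 MByte = 2**20 bytes)."""
    bytes_per_sec = frame_bytes / seconds
    return bytes_per_sec / 1e6, bytes_per_sec / 2**20


frame_bytes = 720 * 576   # assumed frame: one 8-bit plane at PAL resolution
download_time = 4.3e-3    # assumed unit: 4.3 milliseconds

million_bytes_per_sec, mbytes_per_sec = download_rate(frame_bytes, download_time)
print(round(million_bytes_per_sec, 1), round(mbytes_per_sec, 1))
```

With other frame sizes or timings, only the two assumed inputs need to change.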

Kopernikus: There is the DirectX SDK (http://msdn.microsoft.com/directx/sdk/), which contains some samples. Also, both NVidia (http://developer.nvidia.com/object/sdk_home.html) and ATI (http://www.ati.com/developer/radeonSDK.html) have SDKs available. Most of the samples are game oriented, but there are also some image/video and general purpose GPU (GPGPU) shader examples. The sample chapters (http://www.ati.com/developer/shaderx/index.html) from the ShaderX book series contain some nice samples for image manipulation on the GPU.
www.gpgpu.org is also a good site although mostly OpenGL.

Kopernikus
27th November 2005, 20:58
Thank you

ariga
28th November 2005, 11:02
ariga: I will post the next version shortly but please post your script else it is very difficult to figure out that is wrong.

It's a simple
AviSource("dv.avi")
LeakKernelDeint(order=0)
FFT3DGPU() # no params

It doesn't matter what params I pass, the error is the same.

BTW, I tried 0.47 and it complained about missing d3dx9_27.dll

Just d/l 0.51. Will see if it works.

acrespo
28th November 2005, 14:57
I am trying to compare FFT3DFilter and FFT3DGPU.

FFT3Dfilter is more efficient than FFT3DGPU in my anime captures. The parameters I used:

FFT3DFilter(sigma=5, sharpen=1.0) << version 1.8.3
FFT3DGPU(sigma=5, bh=48, bw=48, mode=1, sharpen=1.0) << version 0.5a

I still get grid lines with fft3dgpu in some frames, while fft3dfilter doesn't show any grid anywhere in the video. I guess the only thing that differs in the parameters above is the overlap. The default overlap in fft3dfilter is bw/3, bh/3, and I don't know the default overlap of fft3dgpu.

tsp
28th November 2005, 15:16
all: I made an installer. It might be more user-friendly ;)

acrespo: The default overlap is ow=bw/2, oh=bh/2, and currently that can't be changed. Also, there was a bug in 0.5a that assigned the value from svr to degrid, so the default value was 0.3. That might explain it. That is fixed in 0.51. Another difference is the precision used. By default fft3dgpu uses 16 bit floating point precision while fft3dfilter uses 32 bit precision. You can change that by setting useFloat16=false. This slows down the filter and uses more memory on the GPU.
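To give a feel for the precision gap, here is a small sketch that rounds a value to 16 bit and 32 bit floats on the CPU using Python's struct module. It only illustrates the number formats; it says nothing about how the GPU rounds intermediate results, and the helper names are made up for this example.

```python
import struct


def roundtrip(value, fmt):
    """Store value in the given IEEE format and read it back as a Python float."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def relative_error(value, fmt):
    """Relative rounding error introduced by storing value in the given format."""
    return abs(roundtrip(value, fmt) - value) / abs(value)


x = 0.1
err16 = relative_error(x, "<e")  # 16 bit float: 11 significant bits
err32 = relative_error(x, "<f")  # 32 bit float: 24 significant bits

print(err16, err32)
```

A single rounding to 16 bit is several thousand times coarser here than one to 32 bit, and an FFT chains many such roundings.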

ariga: I don't get any errors with 0.51, fft3dgpu() and a Geforce fx 5200.

AI
29th November 2005, 05:16
I created a special version of fft3dGPU that reports the time it takes to download the final image from the gpu. You can get it here (http://www.avisynth.org/tsp/bwtest.zip). Just run the included download speed.avs after the included fft3dgpu.dll has been extracted to the plugin directory. On my computer with a AGP Geforce 6800GT it takes ~4.3 mikrosec to download it. That's about 100 MBytes/sec. AGPx8 speed upload speed is about 2100 MBytes/sec. So if anyone with a PCI-express GPU would run the test to compare the result.

ATI x700 DDR(I) 128bit, A64 3000+ S939

if core ratio = 4 (i.e. 800Mhz - min) = 7,5e-4 (sec?)
if core ratio = 9 (i.e. 1800Mhz) 1,2e-3 (sec?)

I can test in PIII800 fx5200, but later.