Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.
3rd April 2005, 00:10 | #41
Registered User
Join Date: Oct 2001
Posts: 195
Hi!
Some numbers again. I did the "compressibility test" on the same source (273 frames, PAL), Intel P4, Radeon 9600:

Unfiltered Xvid .avi -> 9,861,120 bytes
fft3dGPU v.0.31 (sigma=2.0,bt=1) -> 9,439,232 bytes
fft3dGPU v.0.40 (sigma=2.0,bt=1) -> 9,441,280 bytes
fft3dGPU v.0.31 (sigma=3.0,bt=3) -> 7,495,680 bytes
fft3dGPU v.0.40 (sigma=3.0,bt=3) -> 7,489,536 bytes
fft3dGPU v.0.31 (sigma=3.0,bt=3,bh=16,bw=16) -> 8,501,248 bytes
fft3dGPU v.0.40 (sigma=3.0,bt=3,bh=16,bw=16) -> 8,497,152 bytes
fft3dGPU v.0.31 (sigma=3.0,bt=3,bh=64,bw=64) -> 6,516,736 bytes
fft3dGPU v.0.40 (sigma=3.0,bt=3,bh=64,bw=64) -> 6,500,352 bytes
fft3dGPU v.0.40 (sigma=3.0,bt=3,bh=64,bw=64,mode=1) -> 6,019,072 bytes

Still, no artifacts are visible here. Thank you!
P.S. Just curious why "FFT3DFilter(sigma=3.0,bt=3,bh=16,bw=16)" produces a much smaller file: 6,408,192 bytes vs 8,497,152 bytes (with fft3dGPU).
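To put these sizes on a common scale, the small Python sketch below expresses each result as a percentage reduction relative to the unfiltered Xvid file. The byte counts are copied from the post. The helper name `reduction_pct` is only for illustration.

```python
# File sizes in bytes, copied from the post (273 PAL frames, Xvid).
UNFILTERED = 9_861_120

results = {
    "fft3dGPU v0.31 sigma=2.0 bt=1": 9_439_232,
    "fft3dGPU v0.40 sigma=2.0 bt=1": 9_441_280,
    "fft3dGPU v0.31 sigma=3.0 bt=3": 7_495_680,
    "fft3dGPU v0.40 sigma=3.0 bt=3": 7_489_536,
    "fft3dGPU v0.31 sigma=3.0 bt=3 bw=bh=16": 8_501_248,
    "fft3dGPU v0.40 sigma=3.0 bt=3 bw=bh=16": 8_497_152,
    "fft3dGPU v0.31 sigma=3.0 bt=3 bw=bh=64": 6_516_736,
    "fft3dGPU v0.40 sigma=3.0 bt=3 bw=bh=64": 6_500_352,
    "fft3dGPU v0.40 sigma=3.0 bt=3 bw=bh=64 mode=1": 6_019_072,
    "FFT3DFilter sigma=3.0 bt=3 bw=bh=16": 6_408_192,
}

def reduction_pct(size, base=UNFILTERED):
    """Percentage by which `size` is smaller than the unfiltered file."""
    return 100.0 * (base - size) / base

for name, size in results.items():
    print(f"{name:48s} {size:>10,d} bytes  -{reduction_pct(size):5.1f}%")
```

For example, the mode=1 run with 64x64 blocks comes out at roughly a 39% reduction, compared with about 14% for the 16x16 fft3dGPU run.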
4th April 2005, 05:53 | #42
AviSynth plugger
Join Date: Nov 2003
Location: Russia
Posts: 2,183
tsp,
The speed results of your filter are great! But what do your "1:1 overlap" and "2:1 overlap" mean? Kokaram (and I) used, say, blocks 16 pixels wide, with every next block shifted by 8 pixels (right, down). So the one-side overlap is 8 pixels for every block, and the whole block width is overlapped: the total overlap (left and right) for a block equals its width, 16 pixels. I think this is full (maximum possible) overlapping for a simple algorithm. Is that your "1:1" or your "2:1"? I am now writing a new version of FFT3DFilter (not released yet) with partial overlapping (with an arbitrary overlap size), and I am confused by the terms. I want to use a new parameter, "overlap width", as the one-side overlap size, with a maximum value equal to half of the block width.
__________________
My Avisynth plugins are now at http://avisynth.org.ru and mirror at http://avisynth.nl/users/fizick I usually do not provide a technical support in private messages.
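Fizick's scheme can be written down as a step size. With block width bw and one-side overlap ow, consecutive blocks start bw - ow pixels apart, so ow = bw/2 gives the 16/8 layout he describes. Below is a minimal Python sketch for one axis. The function name is hypothetical. A real filter also pads the frame edges, which this sketch ignores.

```python
def block_starts(length, bw, ow):
    """Start offsets of blocks of width bw along one axis, where each block
    shares ow pixels with its neighbour (step = bw - ow).
    Edge padding is ignored: only blocks that fit entirely are returned."""
    if not 0 <= ow <= bw // 2:
        raise ValueError("ow must be between 0 and bw/2")
    step = bw - ow
    return list(range(0, length - bw + 1, step))

print(block_starts(64, 16, 8))   # full overlap: [0, 8, 16, 24, 32, 40, 48]
print(block_starts(64, 16, 0))   # no overlap:   [0, 16, 32, 48]
```

Capping ow at bw/2 matches the proposed maximum. Beyond that, a pixel would fall into more than two blocks along the axis.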
4th April 2005, 07:45 | #43
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
Fizick: What you describe is my 2:1 overlap (mode=1). In the 1:1 overlap (mode=0), the second set of blocks is only shifted by half of bh down and half of bw to the right, so each quarter of a block is overlapped by just 1 other block (compared to 3 blocks when using mode=1). When using mode 2, only bw minus the border is used for overlapping (mainly because the artifacts are most severe at the borders), so this is nearly the same as partial overlap. Mode 0 and mode 2 use a different window function from the one used in mode 1.
This image shows the different modes. So if you want to compare fft3dGPU with FFT3DFilter, use:
Code:
fft3dgpu(mode=1,usefloat16=false)
Last edited by tsp; 4th April 2005 at 07:47.
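One way to read the mode 0 / mode 1 description is as a set of block grids, each offset from the first. The sketch below assumes the following offsets, which are an interpretation of the post and not taken from the fft3dGPU source:

- mode 1 uses four grids, offset by (0,0), (bw/2,0), (0,bh/2) and (bw/2,bh/2);
- mode 0 uses two grids, offset by (0,0) and (bw/2,bh/2).

Each grid tiles the frame, so a pixel lies in exactly one block per grid. The function lists those blocks.

```python
# Hypothetical grid offsets (dx, dy), in units of half a block.
GRIDS = {
    0: [(0, 0), (1, 1)],                  # "1:1 overlap": two grids, diagonal shift
    1: [(0, 0), (1, 0), (0, 1), (1, 1)],  # "2:1 overlap": four grids
}

def blocks_containing(x, y, bw, bh, mode):
    """Return (grid, column, row) for every block that contains pixel (x, y).
    Blocks are assumed to extend past the frame edge (negative indices
    stand for padded edge blocks)."""
    hits = []
    for g, (gx, gy) in enumerate(GRIDS[mode]):
        ox, oy = gx * (bw // 2), gy * (bh // 2)
        hits.append((g, (x - ox) // bw, (y - oy) // bh))
    return hits

# A pixel in the top-left quarter of block (0, 0) of the first grid:
for mode in (0, 1):
    others = len(blocks_containing(5, 5, 32, 32, mode)) - 1
    print(f"mode={mode}: quarter overlapped by {others} other block(s)")
```

Under these assumptions, the count reproduces the "1 block vs 3 blocks" comparison above.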
4th April 2005, 21:06 | #44
AviSynth plugger
Join Date: Nov 2003
Location: Russia
Posts: 2,183
tsp,
thanks for the response and the nice picture. But I do not quite understand it. I drew my overlap picture in the FFT3DFilter thread.
4th April 2005, 21:38 | #45
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
Fizick: From your drawing it looks like the center of a block isn't overlapped at all. Is that true?
Also, mode 1 in fft3dGPU and your default mode are the same. Look carefully at the (ugly) drawing of mode 1 and you can see four differently colored blocks (dotted blue, dotted dark green, solid red and solid light green). To filter a 720x576 image we need to FFT about 720/bw*576/bh*4 blocks. When using mode 0 there are only two overlapped blocks (red and green), meaning only about 720/bw*576/bh*2 blocks. Finally, mode 2 needs about 720/(bw-borderwidth*2)*576/(bh-borderheight*2)*2 blocks. Last edited by tsp; 4th April 2005 at 21:45.
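tsp's estimates can be turned into a tiny calculator. The formulas are the approximate ones from the post, with edge padding ignored. The mode 2 border parameters are hypothetical names, since the post does not say how fft3dGPU exposes them.

```python
def fft_blocks(width, height, bw, bh, mode, border_w=0, border_h=0):
    """Approximate number of blocks to FFT per frame, following the
    estimates in the post. border_w/border_h are hypothetical names for
    the mode 2 border sizes."""
    if mode == 0:
        return width / bw * height / bh * 2
    if mode == 1:
        return width / bw * height / bh * 4
    if mode == 2:
        return width / (bw - 2 * border_w) * height / (bh - 2 * border_h) * 2
    raise ValueError("mode must be 0, 1 or 2")

print(fft_blocks(720, 576, 32, 32, 0))                          # 810.0
print(fft_blocks(720, 576, 32, 32, 1))                          # 1620.0
print(fft_blocks(720, 576, 32, 32, 2, border_w=4, border_h=4))  # 1440.0
```

Mode 1 needs twice as many transforms as mode 0 for the same block size. That is the factor of two discussed later in the thread.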
5th April 2005, 22:11 | #46
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
Found a bug. I forgot to square the modulus of the transformed image when calculating the PowerSpectralDensity. I will release a new version shortly; until then, the quick fix is to change line 518 in ps.hlsl from
Code:
float2 PSD=float2(length(src.xz),length(src.yw));
to
Code:
float2 PSD=float2(src.x*src.x+src.z*src.z,src.y*src.y+src.w*src.w);
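The two shader lines differ in the same way as |X| and |X|^2. Judging from the swizzles, each float4 appears to hold two complex coefficients: (x, z) are the real and imaginary parts of the first, and (y, w) those of the second. That packing is inferred from the shader line; the post does not state it. Here is a Python mirror of both versions:

```python
import math

def psd_buggy(src):
    """Original line: length(src.xz), length(src.yw) -> |X| (the modulus)."""
    x, y, z, w = src
    return (math.hypot(x, z), math.hypot(y, w))

def psd_fixed(src):
    """Fixed line: x*x + z*z, y*y + w*w -> |X|^2 (the power)."""
    x, y, z, w = src
    return (x * x + z * z, y * y + w * w)

src = (3.0, 6.0, 4.0, 8.0)   # two coefficients: 3+4j and 6+8j
print(psd_buggy(src))        # (5.0, 10.0)
print(psd_fixed(src))        # (25.0, 100.0)
```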
6th April 2005, 21:46 | #47
AviSynth plugger
Join Date: Nov 2003
Location: Russia
Posts: 2,183
tsp,
Yes, the center is not overlapped in my partial overlap mode. So I conclude that the full overlap mode of my FFT3DFilter(bw=32,bh=32, ow=16, oh=16) is the same as your FFT3DGPU(mode=1, bw=32,bh=32). But my partial overlap FFT3DFilter(bw=32,bh=32, ow=8, oh=8) is NOT the same as your partial overlap FFT3DGPU(mode=0, bw=32,bh=32). So users may compare the results (quality) of the different approaches, after you fix the recent bug and I fix my quite possible bugs (I rewrote many lines of code in v.0.9).
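A one-dimensional count makes the difference visible. With bw=32 and ow=16, every interior pixel lies in two blocks. With ow=8, the middle of each block is covered by only one block, which is the "center is not overlapped" case. Below is a minimal Python sketch. The function is hypothetical, not FFT3DFilter code, and it ignores edge padding.

```python
def coverage(length, bw, ow):
    """How many blocks cover each pixel along one axis, for blocks of width
    bw whose neighbours overlap by ow pixels (step = bw - ow).
    Only whole blocks are placed; edge padding is ignored."""
    counts = [0] * length
    for start in range(0, length - bw + 1, bw - ow):
        for i in range(start, start + bw):
            counts[i] += 1
    return counts

full = coverage(96, 32, 16)     # like FFT3DFilter(bw=32, ow=16)
partial = coverage(96, 32, 8)   # like FFT3DFilter(bw=32, ow=8)
print(full[40], partial[28], partial[40])   # 2 2 1
```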
9th April 2005, 02:55 | #49
Registered User
Join Date: Jan 2003
Posts: 109
Hi,
Some speed measurements again. I put the old results back here for better readability.
System: Athlon 2400+, Radeon 9600 Pro, 1 GB RAM. I encoded 10,000 video frames (720x528) with these parameters:
FFT3DFilter: 'FFT3DFilter(sigma=3, bt=3)'
FFT3dGPU: 'fft3dGPU(sigma=3, bt=3)'

Without FFT: 1st pass = 13 min, 2nd pass = 37 min
FFT3DFilter, measure=true (v0.8.3): 1st pass = 53 min, 2nd pass = 80 min
FFT3DFilter, measure off (v0.8.3): 1st pass = 59 min, 2nd pass = 82 min
FFT3dGPU (v0.3): 1st pass = 17 min, 2nd pass = 40 min
--------
Today I encoded the exact same clip with the updated versions:
'FFT3DFilter(sigma=3, bt=3)'
'fft3dGPU(sigma=3, bt=3, mode=1, reduceCPU=false)'

FFT3DFilter, measure off (v0.9.1): 1st pass = 28 min, 2nd pass = 55 min
FFT3dGPU (v0.4): 1st pass = 25 min, 2nd pass = 49 min
FFT3dGPU (v0.41, reduceCPU=false): 1st pass = 29 min, 2nd pass = 57 min
FFT3dGPU (v0.41, reduceCPU=true, the default): 1st pass = 21 min, 2nd pass = 38 min

When I first read your explanation of 'reduceCPU=false', I thought it would increase encoding speed. In fact it decreases the speed, and the default (true) is already the fastest. In short, 0.41 is faster than 0.4 (I was afraid that the more complex math needed to fix the bug would increase encoding time, but it is the opposite; good). However, as you can see, I used mode=1 (so 2:1 overlap) for the GPU version. tsp, you told me the encoding speed should be cut in half, but that is not the case despite twice as many calculations. Is that normal or a bug? We can also see that the 3DNow optimizations help a lot for the regular FFT3DFilter. That is really great. Is the GPU version 3DNow or SSE optimized? If not, I hope you intend to do it; we would gain some speed. Finally, does the GPU version work like something multi-threaded, I mean as if there were 2 CPU cores like in the forthcoming Intel and AMD processors? Maybe that is how it works (differently of course, but the same idea); just curious. I was thinking about some general code that could be used by other filters.
I mean you use a special DLL or something like that, set some parameters in the AVS script, and thanks to this the calculation would be done half by the CPU and half by the GPU. Maybe it's impossible, just an idea, but that way not only the FFT filter would benefit from the GPU, but also other filters and/or general calculations. Instead of optimizing each filter, you write general parameters and any filter can benefit. A crazy idea.

Oh, before I forget: with both FFT3DFilter and fft3dGPU (and only them), when I start the job in the latest VirtualDubMod, most of the time it closes itself. I launch VDM again, start the job, and the 1st pass starts. Then at the end of the 1st pass VDM sometimes closes again. When that happens I have to relaunch it and start the 2nd pass myself; other times the 2nd pass launches normally after the 1st finishes. It happens randomly. I'm using AviSynth 2.56 build 31 Jan and have just seen that another beta, from February 21, is out. I will give it a try. Anyway, cheers to both of you for the work done on these filters.
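Summing both passes and subtracting the unfiltered run gives the filtering overhead for each configuration. The times are the ones reported in the post, and nothing else is assumed. Fizick's caveat later in the thread still applies, though: the default overlap width changed in v0.9, so the FFT3DFilter rows are not directly comparable.

```python
# Encoding times in minutes (1st pass, 2nd pass), copied from the post.
times = {
    "no filter": (13, 37),
    "FFT3DFilter v0.8.3 measure=true": (53, 80),
    "FFT3DFilter v0.8.3 measure off": (59, 82),
    "fft3dGPU v0.3": (17, 40),
    "FFT3DFilter v0.9.1 measure off": (28, 55),
    "fft3dGPU v0.4 mode=1": (25, 49),
    "fft3dGPU v0.41 mode=1 reduceCPU=false": (29, 57),
    "fft3dGPU v0.41 mode=1 reduceCPU=true": (21, 38),
}

BASELINE = sum(times["no filter"])

def overhead(name):
    """Extra minutes over the unfiltered encode, both passes combined."""
    return sum(times[name]) - BASELINE

for name in times:
    print(f"{name:40s} total {sum(times[name]):3d} min  +{overhead(name):2d} min")
```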
9th April 2005, 23:22 | #50
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
Quote:
Quote:
I must admit that I'm a little surprised that the speed decrease wasn't bigger. Again, I think it's because the GPU takes more time, which means XviD gets more time to encode, and that somewhat offsets the extra time used (multiprocessing is very nice). If you use HuffYUV or MJPEG instead, you would see a greater speed decrease. Quote:
Quote:
Try to guess which of these two scripts would run fastest, or would they be equally fast?
Code:
#SCRIPT A
fft3dfilter(plane=1)
fft3dfilter(plane=2)
fft3dGPU()
#SCRIPT B
fft3dGPU()
fft3dfilter(plane=1)
fft3dfilter(plane=2)
Quote:
Oh, and thanks for the test. It's really amazing to see the interactions with XviD.
10th April 2005, 14:11 | #51
AviSynth plugger
Join Date: Nov 2003
Location: Russia
Posts: 2,183
LordIntruder,
your comparison is not quite correct: I changed the default overlap width in v.0.9 (for speed). 3DNow was previously used internally by FFTW for the FFT calculation (I think); now, in v.0.91, I have added 3DNow to my Wiener calculation. The gain is about 25%.
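The "Wiener calculation" works per frequency coefficient. A limited Wiener filter scales each coefficient by max((P - σ²)/P, floor), where P is the coefficient's power spectral density and σ² is the noise power. The sketch below uses floor = (beta - 1)/beta, with FFT3DFilter's beta acting as a noise margin. This is one common form, not Fizick's code. The real filter also normalizes sigma for block size and window, which is left out here.

```python
def wiener_gain(psd, sigma2, beta=1.0):
    """Attenuation for one coefficient: max((psd - sigma2) / psd, floor).
    floor = (beta - 1) / beta, so beta=1 allows full suppression."""
    floor = (beta - 1.0) / beta
    if psd <= 0.0:
        return floor
    return max((psd - sigma2) / psd, floor)

def denoise(coeffs, sigma2, beta=1.0):
    """Apply the gain to a list of complex coefficients, with PSD = |X|^2."""
    return [c * wiener_gain(abs(c) ** 2, sigma2, beta) for c in coeffs]

print(wiener_gain(4.0, 1.0))   # strong coefficient, mostly kept: 0.75
print(wiener_gain(0.5, 1.0))   # below the noise level, removed: 0.0
```

Each gain is one subtract, one divide and one max per coefficient, so the work maps naturally onto packed SIMD instructions such as 3DNow.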
11th April 2005, 23:38 | #52
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
I finally got rid of the memory leaks, so now you can press F5 as often as you want in VirtualDub (or use the filter in ConditionalFilter/ScriptClip/FrameEvaluate, although I wouldn't recommend that because of the slow initialization).
Maybe this filter is ready for the AviSynth Usage category.
12th April 2005, 02:36 | #53
XviD User
Join Date: Oct 2004
Location: Ky
Posts: 190
Was the block artifact bug mentioned earlier in this thread fixed? I've got an FX5200 with the 71.90 NVIDIA driver on XP SP2, and 0.42 still displays the blocks. Many thanks for the effort!
__________________
DFI NF4 SLI Expert | Opteron 165 CCBBE 0616 XPMW (9x325HTT=2.9Ghz) | 2x1GB G.Skill HZ (3-4-4-8-12-16-2T) | LG 62L DVD/CD | Geforce 7300GT | All SATA | Antec 650 Trio PSU | XP SP2
12th April 2005, 07:53 | #55
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
vinetu: With mode=0, a sharp (gradient) image would produce the most border artifacts. A single-colored area of uniform intensity wouldn't produce as many artifacts, but the artifacts are easier to spot in a flat area. This should produce them:
Code:
fft3dGPU(mode=0,bw=128,bh=128,sigma=50)
Last edited by tsp; 12th April 2005 at 08:49.
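A one-dimensional example shows why overlap hides block borders. Suppose each block is weighted by a window w before the FFT and again after the inverse FFT. Then the squared windows of overlapping blocks should add up to a constant, so that with no denoising applied the frame is reconstructed unchanged. The sine window w(n) = sin(π(n + 0.5)/N), shifted by N/2, has this property: neighboring w² terms sum exactly to 1. A single non-overlapped block, by contrast, falls toward 0 at its edges, which is where seams would show. This is a generic illustration. The post only says that modes 0 and 2 use a different window from mode 1, and the sine window here is not claimed to be either of them.

```python
import math

def sine_window(n_size):
    """w(n) = sin(pi * (n + 0.5) / N) for n = 0..N-1."""
    return [math.sin(math.pi * (n + 0.5) / n_size) for n in range(n_size)]

def overlap_add_weight(length, n_size):
    """Sum of w^2 (analysis window times synthesis window) over blocks
    placed every N/2 samples."""
    w = sine_window(n_size)
    total = [0.0] * length
    for start in range(0, length - n_size + 1, n_size // 2):
        for i, wi in enumerate(w):
            total[start + i] += wi * wi
    return total

N = 16
weights = overlap_add_weight(64, N)
# Interior samples (away from the first and last half block) sum to ~1.
print(min(weights[N // 2:-N // 2]), max(weights[N // 2:-N // 2]))
# A lone block is much weaker at its edge than in its middle.
print(sine_window(N)[0] ** 2, sine_window(N)[N // 2] ** 2)
```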
12th April 2005, 11:06 | #56
XviD User
Join Date: Oct 2004
Location: Ky
Posts: 190
Guess my picture attachment for the above post was never approved. The black blocks I mention above can be seen in
this screenshot. They appear on every frame.
12th April 2005, 12:49 | #57
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
Quote:
Also, I have re-uploaded version 0.42 because I forgot to update the DLL.
12th April 2005, 17:02 | #58
Registered User
Join Date: Feb 2002
Posts: 407
Hey, I was the first to report this problem, but I didn't have time to post my screenshot (I was moving...) or to test your test version. Did you notice that the size of the blocks depends on the bw/bh parameters?
Anyway, I came here to ask you another question: how scalable is your algorithm? I found a website about parallel computer vision algorithms, and they use a computer with 6 PCI FX5200s. So I was wondering if something like that could be useful for our purpose.
12th April 2005, 17:41 | #59
Registered User
Join Date: Aug 2004
Location: Denmark
Posts: 807
bill_baroud: If you want, you can try the new test version, and you're welcome to try fixing the bug (source included). It's very strange that some of the blocks aren't processed.
I'm wondering whether the best way to split the work between multiple GPUs would be to give each GPU whole frames or to split each frame up. Also, wouldn't a single GeForce 6800 GT be faster than 6 PCI FX5200s (they must be awfully bandwidth limited)? If someone wants a multi-GPU version, they could give me two GeForce 6800 Ultras and an nForce4 SLI motherboard to work with.
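The two ways of dividing the work can be sketched as scheduling functions. Both are hypothetical and not part of fft3dGPU:

- `frames_per_gpu` hands out whole frames round-robin;
- `bands_per_gpu` cuts one frame into horizontal bands whose heights are multiples of bh, so no block straddles two GPUs.

Two real-world requirements are left out. With overlapping blocks, a band split would also need to share a strip of rows with its neighbor. With bt > 1, a frame split would need each GPU to see the neighboring frames as well.

```python
def frames_per_gpu(n_frames, n_gpus):
    """Frame-level split: frame i goes to GPU i % n_gpus."""
    plan = {g: [] for g in range(n_gpus)}
    for i in range(n_frames):
        plan[i % n_gpus].append(i)
    return plan

def bands_per_gpu(height, n_gpus, bh):
    """Intra-frame split: (start_row, end_row) per GPU, aligned to bh.
    Block rows are spread as evenly as possible."""
    if height % bh:
        raise ValueError("height must be a multiple of bh")
    n_rows = height // bh
    base, extra = divmod(n_rows, n_gpus)
    bands, row = [], 0
    for g in range(n_gpus):
        rows = base + (1 if g < extra else 0)
        bands.append((row * bh, (row + rows) * bh))
        row += rows
    return bands

print(frames_per_gpu(5, 2))        # {0: [0, 2, 4], 1: [1, 3]}
print(bands_per_gpu(576, 4, 32))   # [(0, 160), (160, 320), (320, 448), (448, 576)]
```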
12th April 2005, 18:02 | #60
XviD User
Join Date: Oct 2004
Location: Ky
Posts: 190
Quote: