Old 3rd April 2005, 00:10   #41  |  Link
vinetu
Registered User
 
Join Date: Oct 2001
Posts: 195
Hi!
Some numbers again: I ran the "compressibility test" on the same source (273 frames, PAL).

Intel P4, Radeon 9600
Unfiltered Xvid.avi size: 9,861,120 bytes

fft3dGPU v.0.31 (sigma=2.0,bt=1) -> 9,439,232 bytes
fft3dGPU v.0.40 (sigma=2.0,bt=1) -> 9,441,280 bytes

fft3dGPU v.0.31 (sigma=3.0,bt=3) -> 7,495,680 bytes
fft3dGPU v.0.40 (sigma=3.0,bt=3) -> 7,489,536 bytes

fft3dGPU v.0.31 (sigma=3.0,bt=3,bh=16,bw=16) -> 8,501,248 bytes
fft3dGPU v.0.40 (sigma=3.0,bt=3,bh=16,bw=16) -> 8,497,152 bytes

fft3dGPU v.0.31 (sigma=3.0,bt=3,bh=64,bw=64) -> 6,516,736 bytes
fft3dGPU v.0.40 (sigma=3.0,bt=3,bh=64,bw=64) -> 6,500,352 bytes
fft3dGPU v.0.40 (sigma=3.0,bt=3,bh=64,bw=64,mode=1) -> 6,019,072 bytes

Still no visible artifacts here.

Thank You!

P.S. Just curious why "FFT3DFilter(sigma=3.0,bt=3,bh=16,bw=16)"
produces a much smaller file: 6,408,192 bytes vs. 8,497,152 bytes (from fft3dGPU).
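
For anyone who prefers relative savings to raw byte counts, here is a small Python sketch (not part of the original test) that turns the v.0.40 sizes listed in this post into percent reductions against the unfiltered encode. The function name and dictionary labels are only illustrative.

```python
# File sizes in bytes, copied from the compressibility test in this post.
UNFILTERED = 9_861_120

RESULTS = {
    "fft3dGPU 0.40 sigma=2.0 bt=1": 9_441_280,
    "fft3dGPU 0.40 sigma=3.0 bt=3": 7_489_536,
    "fft3dGPU 0.40 sigma=3.0 bt=3 bh=16 bw=16": 8_497_152,
    "fft3dGPU 0.40 sigma=3.0 bt=3 bh=64 bw=64": 6_500_352,
    "fft3dGPU 0.40 sigma=3.0 bt=3 bh=64 bw=64 mode=1": 6_019_072,
    "FFT3DFilter sigma=3.0 bt=3 bh=16 bw=16": 6_408_192,
}


def reduction_percent(filtered, baseline=UNFILTERED):
    """Return how much smaller the filtered encode is, in percent."""
    return 100.0 * (baseline - filtered) / baseline


for label, size in RESULTS.items():
    print(f"{label:<50} {reduction_percent(size):5.1f}% smaller")
```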
Old 4th April 2005, 05:53   #42  |  Link
Fizick
AviSynth plugger
 
 
Join Date: Nov 2003
Location: Russia
Posts: 2,183
tsp,
The speed results of your filter are great!

But what do your "1:1 overlap" and "2:1 overlap" mean?
Kokaram (and I) used, say, blocks 16 pixels wide, with every next block shifted by 8 pixels (right, bottom). So the one-side overlap size is 8 pixels for every block, the whole block width is overlapped, and the total overlap size (left plus right) for a block is equal to its width = 16 pixels.
I think this is full (maximum possible) overlapping (for a simple algorithm).

Is that your "1:1" or "2:1"?

I have now created (but not yet released) a new version of FFT3DFilter with partial overlapping (with an arbitrary overlap size), and I am confused by the terms. I want to use a new parameter "overlap width" for the one-side overlap size, with a maximum value equal to half of the block width.
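
To make the "one-side overlap" wording concrete, here is a rough Python sketch of where blocks would start along one axis for a given block width and one-side overlap. It is only an illustration under simple assumptions: the helper name `block_starts` is hypothetical, the first block starts at pixel 0, and the edge padding a real filter would need is ignored.

```python
def block_starts(width, bw, ow):
    """Return the left edges of blocks covering `width` pixels.

    bw: block width in pixels.
    ow: one-side overlap in pixels, at most half the block width.
    Each block is shifted by (bw - ow) relative to the previous one.
    """
    if bw <= 0 or ow < 0 or 2 * ow > bw:
        raise ValueError("need bw > 0 and 0 <= ow <= bw / 2")
    step = bw - ow
    starts = []
    x = 0
    while True:
        starts.append(x)
        if x + bw >= width:  # this block already reaches the right edge
            break
        x += step
    return starts


# Full overlap as described above: 16-pixel blocks shifted by 8 pixels.
print(block_starts(32, bw=16, ow=8))
# Partial overlap: 16-pixel blocks with a 4-pixel one-side overlap.
print(block_starts(32, bw=16, ow=4))
```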
__________________
My Avisynth plugins are now at http://avisynth.org.ru and mirror at http://avisynth.nl/users/fizick
I usually do not provide a technical support in private messages.
Old 4th April 2005, 07:45   #43  |  Link
tsp
Registered User
 
 
Join Date: Aug 2004
Location: Denmark
Posts: 807
Fizick: What you describe is my 2:1 overlap = mode 1. In the 1:1 overlap (mode=0) the blocks are only shifted half bh down and bw to the right, so 1/4 of a block is overlapped by only 1 block (compared to 3 blocks when using mode=1). When using mode 2, only bw minus the border is used for overlapping (mainly because the artifacts are most severe at the borders), so this is nearly the same as partial overlap. Mode 0 and mode 2 use a different window function than the one used in mode 1.
This image shows the different modes:


So if you want to compare fft3dgpu with fft3dfilter, use:
fft3dgpu(mode=1,usefloat16=false)

Last edited by tsp; 4th April 2005 at 07:47.
Old 4th April 2005, 21:06   #44  |  Link
Fizick
AviSynth plugger
 
 
Join Date: Nov 2003
Location: Russia
Posts: 2,183
tsp,
thanks for the response and the nice pic. But I do not quite understand it.
I drew my overlap pic in the fft3dfilter thread.
Old 4th April 2005, 21:38   #45  |  Link
tsp
Registered User
 
 
Join Date: Aug 2004
Location: Denmark
Posts: 807
Fizick: From your drawing it looks like the center of a block isn't overlapped at all. Is that true?
Also, mode 1 in fft3dgpu and your default mode are the same. So look carefully at the (ugly) drawing of mode 1 and you can see four differently colored blocks (dotted blue, dotted dark green, solid red and solid light green). To filter a 720x576 image we need to FFT ~720/bw*576/bh*4 blocks. When using mode 0 there are only two overlapped blocks (red and green), meaning only ~720/bw*576/bh*2 blocks. And finally, mode 2 needs ~720/(bw-borderwidth*2)*576/(bh-borderheight*2)*2 blocks.
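
The three approximate block counts can be written out as a short Python sketch. The formulas are the ones from this post; the function name, the example block size of 32 and the example border of 4 pixels are assumptions picked only for illustration, not fft3dGPU defaults.

```python
def approx_block_count(mode, width=720, height=576, bw=32, bh=32,
                       border_w=0, border_h=0):
    """Approximate number of blocks to FFT per frame, per the formulas above.

    border_w / border_h only matter for mode 2 and are hypothetical values.
    """
    if mode == 1:   # full (2:1) overlap: four shifted sets of blocks
        return width / bw * height / bh * 4
    if mode == 0:   # 1:1 overlap: two shifted sets of blocks
        return width / bw * height / bh * 2
    if mode == 2:   # overlap only in the part of the block inside the border
        return (width / (bw - 2 * border_w)
                * height / (bh - 2 * border_h) * 2)
    raise ValueError("mode must be 0, 1 or 2")


for mode in (0, 1, 2):
    count = approx_block_count(mode, border_w=4, border_h=4)
    print(f"mode={mode}: ~{count:.0f} blocks")
```

With these example numbers, mode 1 needs twice as many blocks as mode 0, which is why a larger speed penalty would be expected for it.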

Last edited by tsp; 4th April 2005 at 21:45.
Old 5th April 2005, 22:11   #46  |  Link
tsp
Registered User
 
 
Join Date: Aug 2004
Location: Denmark
Posts: 807
Found a bug. I forgot to square the modulus of the transformed image when calculating the power spectral density. I will release a new version shortly; until then, the quick fix is to change line 518 in ps.hlsl from

float2 PSD=float2(length(src.xz),length(src.yw));

to

float2 PSD=float2(src.x*src.x+src.z*src.z,src.y*src.y+src.w*src.w);
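
For readers who do not read HLSL, the difference between the two lines can be sketched in Python. Judging from the fixed line, each float4 seems to pack two complex values, (x, z) and (y, w); the function names below are hypothetical and this is an illustration of the arithmetic, not the shader itself.

```python
import math


def psd_before_fix(re, im):
    """Modulus of the complex value, as length() computed it: sqrt(re^2 + im^2)."""
    return math.hypot(re, im)


def psd_after_fix(re, im):
    """Squared modulus, re^2 + im^2, which is what the power spectrum needs."""
    return re * re + im * im


re, im = 3.0, 4.0
print(psd_before_fix(re, im))  # modulus
print(psd_after_fix(re, im))   # squared modulus, no square root involved
```

Note that the corrected form also drops the square root, so it is less work per value, not more.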
Old 6th April 2005, 21:46   #47  |  Link
Fizick
AviSynth plugger
 
 
Join Date: Nov 2003
Location: Russia
Posts: 2,183
tsp,
Yes, the center is not overlapped in my partial overlap mode.

So I conclude that the full overlap mode of my FFT3dfilter(bw=32,bh=32, ow=16, oh=16) is the same as your FFT3DGPU(mode=1, bw=32,bh=32).
But my partial overlap FFT3dfilter(bw=32,bh=32, ow=8, oh=8) is NOT the same as your partial overlap
FFT3DGPU(mode=0, bw=32,bh=32).
So users may compare the results (quality) of the different approaches
(after you fix the recent bug, and I fix my quite possible bugs: I rewrote many lines of code in v.0.9).
Old 6th April 2005, 22:50   #48  |  Link
tsp
Registered User
 
 
Join Date: Aug 2004
Location: Denmark
Posts: 807
Released version 0.41. The only new things are the above bugfix plus a fix for a minor bug in calculating sigma (when mapping from 0-255 to 0-1, divide by 255, not 256, doh).
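
A tiny Python illustration of the sigma mapping issue, assuming only that a value on the 8-bit 0-255 scale has to be mapped onto 0-1; what fft3dGPU does with sigma after that is not shown here, and the function names are made up.

```python
def to_unit_range_old(value):
    """Old mapping: 255 does not reach 1.0 when dividing by 256."""
    return value / 256.0


def to_unit_range_fixed(value):
    """Fixed mapping: 0 -> 0.0 and 255 -> 1.0."""
    return value / 255.0


for v in (0, 3, 255):
    print(v, to_unit_range_old(v), to_unit_range_fixed(v))
```

The two results differ by a factor of 255/256, so the old version used a very slightly smaller sigma than requested.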
Old 9th April 2005, 02:55   #49  |  Link
LordIntruder
Registered User
 
Join Date: Jan 2003
Posts: 109
Hi,


Some speed measurements again. I put old results back here for better readability:

Athlon 2400+, Radeon 9600 Pro, 1 GB RAM.

I encoded a 10000-frame video (720 x 528) with these parameters for FFT:

'FFT3DFilter(sigma=3, bt=3)'

For FFT3dGPU I used:
'fft3dGPU(sigma=3, bt=3)'

Without FFT
1st Pass = 13 min
2nd Pass = 37 min

FFT Measure True (v0.8.3)
1st Pass = 53 min
2nd Pass = 80 min

FFT Measure OFF (v0.8.3)
1st Pass = 59 min
2nd Pass = 82 min

FFT3dGPU (v0.3)
1st Pass = 17 min
2nd Pass = 40 min
--------

Today I encoded the exact same clip with the updated versions:

'FFT3DFilter(sigma=3, bt=3)'
'fft3dGPU(sigma=3, bt=3, mode=1, reduceCPU=false)'

FFT Measure OFF (v0.9.1)
1st Pass = 28 min
2nd Pass = 55 min

FFT3dGPU (v0.4)
1st Pass = 25 min
2nd Pass = 49 min

FFT3dGPU (v0.41 reduceCPU=false)
1st Pass = 29 min
2nd Pass = 57 min

FFT3dGPU (v0.41 reduceCPU=true (default))
1st Pass = 21 min
2nd Pass = 38 min
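
As a reading aid, the pass times from the second run can be converted to frames per second with a few lines of Python. The numbers are the ones listed above for the 10000-frame clip; the timings include the XviD encode, so they are not pure filter speeds. The function name and labels are just illustrative.

```python
FRAMES = 10_000

# (1st pass, 2nd pass) in minutes, copied from the lists above.
PASSES = {
    "no FFT filter": (13, 37),
    "FFT3DFilter v0.9.1": (28, 55),
    "fft3dGPU v0.4": (25, 49),
    "fft3dGPU v0.41 reduceCPU=false": (29, 57),
    "fft3dGPU v0.41 reduceCPU=true": (21, 38),
}


def fps(minutes, frames=FRAMES):
    """Average frames per second for a pass that took `minutes`."""
    return frames / (minutes * 60.0)


for label, (first, second) in PASSES.items():
    print(f"{label:<32} 1st: {fps(first):5.2f} fps   2nd: {fps(second):5.2f} fps")
```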

I thought 'reduceCPU=false' increased encoding speed when I first read your explanations about this option. In fact it decreases the speed, and the default (true) is already the fastest.

In short, 0.41 is faster than 0.4 (I was afraid that the more complex math to fix the bug would increase encoding time, and it is the opposite, good). However, as you can see, I used mode=1 (so 2:1 overlap) for the GPU version. Tsp, you told me the encoding speed should be cut in half, but that is not the case despite twice as many calculations. Normal or a bug?

We can also see that the 3DNow optimizations help a lot for the normal version of FFT. That is really great.

Is the GPU version 3DNow or SSE optimized? If not, I hope you intend to do it; we would gain some speed.

Finally, does the GPU version work as something multi-threaded, I mean as if there were 2 CPU cores like the forthcoming Intel and AMD processors? Maybe that is the way it works (differently of course, but that's the idea); just curiosity. I was thinking about some general code that could be used by other filters.

I mean you would use a special DLL or something like that, plus some parameters in the AVS, and thanks to this the calculation would be done half by the CPU, half by the GPU. Maybe it's impossible, just an idea, but that way not only the FFT filter would benefit from the GPU but also other filters and/or general calculations. Instead of optimizing each filter, you write general parameters and any filter can benefit. A crazy idea.

Oh, before I forget: with both FFT and FFTGPU (and only them), when I start the job using the latest version of VirtualDubMod, most of the time it closes itself. I launch VDM again, start the job, and the 1st pass starts. Then at the end of the 1st pass VDM sometimes closes again and I need to launch it manually and start the 2nd pass myself; other times the 2nd pass is launched after the 1st finishes. It happens randomly. I'm using the AviSynth 2.56 build from 31 Jan and have just seen that another beta from February 21 is out. Will give it a try.

Anyway cheers to both of you on the work done on these filters
Old 9th April 2005, 23:22   #50  |  Link
tsp
Registered User
 
 
Join Date: Aug 2004
Location: Denmark
Posts: 807
Quote:
Originally posted by LordIntruder


I thought 'reduceCPU=false' increased encoding speed when I first read your explanations about this option. In fact it decreases the speed, and the default (true) is already the fastest.
There is a good explanation for this. It's because avisynth (fft3dgpu) uses less CPU time when reduceCPU=true. This means that XviD gets more time to do the encoding, and the encode time decreases even if avisynth uses a little more time to process a frame. If reduceCPU=false, the extra CPU time is wasted instead of being used to encode. You could try repeating the test with Huffyuv or another fast codec (MJPEG) instead of XviD, and you would get some very different results (at least that's what I think would happen).

Quote:

In short, 0.41 is faster than 0.4 (I was afraid that the more complex math to fix the bug would increase encoding time, and it is the opposite, good). However, as you can see, I used mode=1 (so 2:1 overlap) for the GPU version. Tsp, you told me the encoding speed should be cut in half, but that is not the case despite twice as many calculations. Normal or a bug?
First, the bugfix made the math simpler. Instead of calculating the modulus/length of the complex number/vector (squareroot(a^2+b^2)), the modulus/length squared is used (just a^2+b^2), so the square root isn't needed (and that's an expensive operation).
I must admit that I'm a little surprised that the speed decrease wasn't bigger, but again I think it's because the GPU uses more time, meaning that XviD gets more time to encode, which somewhat offsets the extra time used (multiprocessing is very nice). Again, if you use Huffyuv or MJPEG you would get a greater speed decrease.


Quote:

We can also see that the 3DNow optimizations help a lot for the normal version of FFT. That is really great.

Is the GPU version 3DNow or SSE optimized? If not, I hope you intend to do it; we would gain some speed.
fft3dgpu doesn't need 3DNow or SSE because all the math-heavy calculations are done on the GPU, which uses a very different instruction set (basically all the commands used are like SSE on steroids). Maybe some speed could be gained by using assembly instead of HLSL (the C-like language used by DirectX).


Quote:

Finally, does the GPU version work as something multi-threaded, I mean as if there were 2 CPU cores like the forthcoming Intel and AMD processors? Maybe that is the way it works (differently of course, but that's the idea); just curiosity. I was thinking about some general code that could be used by other filters.

I mean you would use a special DLL or something like that, plus some parameters in the AVS, and thanks to this the calculation would be done half by the CPU, half by the GPU. Maybe it's impossible, just an idea, but that way not only the FFT filter would benefit from the GPU but also other filters and/or general calculations. Instead of optimizing each filter, you write general parameters and any filter can benefit. A crazy idea.
The filter is multithreaded. Basically, just before the GPU begins the calculations, a thread is created that fetches the next frame. Meanwhile the first thread asks the GPU (driver) if it's done with the calculations; if that is not the case, it sleeps 5 msec before asking again. When the GPU is done, the data is downloaded to main memory and then the first thread waits for the second thread to exit. When the next frame is requested, the results from all the filters before fft3dgpu are already cached (because they were run at the same time the GPU was working). So with a dual-core processor you could have the filter do its calculations and calculate the next frame at the same time (although the cache usage could be quite high).
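
The scheme described above can be sketched roughly in Python. This is only a model of the control flow: `FakeGpuJob`, `process_frame` and `fetch_frame` are hypothetical names, the GPU is simulated with a timer, and nothing here is the actual DirectX or AviSynth code.

```python
import threading
import time


class FakeGpuJob:
    """Stand-in for asynchronous GPU work (hypothetical, timer-based)."""

    def __init__(self, duration_s):
        self._done_at = time.monotonic() + duration_s

    def is_done(self):
        return time.monotonic() >= self._done_at


def process_frame(n, fetch_frame, cache, gpu_duration_s=0.02, poll_s=0.005):
    """Filter frame n on the 'GPU' while a helper thread prefetches frame n+1."""
    job = FakeGpuJob(gpu_duration_s)  # the GPU starts working on frame n

    def prefetch():
        # Runs the upstream filters for the next frame and caches the result.
        cache[n + 1] = fetch_frame(n + 1)

    worker = threading.Thread(target=prefetch)
    worker.start()

    # First thread: ask whether the GPU is done, sleep 5 ms between polls.
    while not job.is_done():
        time.sleep(poll_s)

    # (The real filter would download the result to main memory here.)
    worker.join()  # wait for the prefetch thread to exit
    return f"filtered frame {n}"


cache = {}
print(process_frame(0, lambda i: f"source frame {i}", cache))
print(cache)
```

Sleeping between polls is what leaves CPU time free for the upstream filters and the encoder.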

Try to guess which of these two scripts would run fastest, or would they be equally fast?
Code:
#SCRIPT A
fft3dfilter(plane=1)
fft3dfilter(plane=2)
fft3dGPU()

#SCRIPT B
fft3dGPU()
fft3dfilter(plane=1)
fft3dfilter(plane=2)
Script A would be fastest if used with a fast encoder, because the two filters before fft3dGPU would run at the same time as fft3dGPU, while in script B the extra CPU time would just be wasted because there are no filters before fft3dgpu. If used with XviD or another slow encoder the speed difference would be smaller, because the wasted CPU time would be used by XviD.
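
A toy Python model of the answer, under strong assumptions: it ignores the encoder completely, treats the two fft3dfilter calls as one block of CPU time per frame, and assumes perfect overlap in script A. The millisecond figures are invented for illustration only.

```python
def frame_time_script_a(cpu_ms, gpu_ms):
    """CPU filters sit before fft3dGPU, so they can run while the GPU works."""
    return max(cpu_ms, gpu_ms)


def frame_time_script_b(cpu_ms, gpu_ms):
    """CPU filters come after fft3dGPU, so the CPU idles while the GPU works."""
    return cpu_ms + gpu_ms


cpu_ms, gpu_ms = 30.0, 40.0  # hypothetical per-frame costs
print("script A:", frame_time_script_a(cpu_ms, gpu_ms), "ms per frame")
print("script B:", frame_time_script_b(cpu_ms, gpu_ms), "ms per frame")
```

With a slow encoder in the chain, the idle CPU time in script B would be picked up by the encoder, so the real gap would be smaller than this model suggests.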

Quote:

Oh, before I forget: with both FFT and FFTGPU (and only them), when I start the job using the latest version of VirtualDubMod, most of the time it closes itself. I launch VDM again, start the job, and the 1st pass starts. Then at the end of the 1st pass VDM sometimes closes again and I need to launch it manually and start the 2nd pass myself; other times the 2nd pass is launched after the 1st finishes. It happens randomly. I'm using the AviSynth 2.56 build from 31 Jan and have just seen that another beta from February 21 is out. Will give it a try.

Anyway cheers to both of you on the work done on these filters
I must say it sounds odd that it only happens with these two filters, because they don't share any code (unless Fizick used some of my code, but I somewhat doubt it). The only bug I know of in fft3dgpu is that if you use F5 too many times, all the video memory gets used. This is caused by a memory leak somewhere (it must be Microsoft's fault, DirectX or something).
Oh, and thanks for the test. It's really amazing to see the interactions with XviD.
Old 10th April 2005, 14:11   #51  |  Link
Fizick
AviSynth plugger
 
 
Join Date: Nov 2003
Location: Russia
Posts: 2,183
LordIntruder,
your comparison is not quite correct.
I changed the default overlap width in v.0.9
(for speed).
3DNow was previously used internally by FFTW for the FFT calculation (I think);
now in v.0.91 I added 3DNow for my Wiener calculation. The gain is about 25%.
Old 11th April 2005, 23:38   #52  |  Link
tsp
Registered User
 
 
Join Date: Aug 2004
Location: Denmark
Posts: 807
I finally got rid of the memory leaks, so now you can F5 as crazily as you want in VirtualDub (or use it in conditionalfilter/scriptclip/frameevaluate, although I wouldn't recommend that because of the slow initialization).
Maybe this filter is ready for the AviSynth usage category.
Old 12th April 2005, 02:36   #53  |  Link
MacAddict
XviD User
 
Join Date: Oct 2004
Location: Ky
Posts: 190
Was the block artifact bug mentioned earlier in this thread fixed? I've got an FX5200 using the 71.90 Nvidia driver with XP SP2. I still see 0.42 displaying the blocks. Many thanks for the effort!
__________________
DFI NF4 SLI Expert | Opteron 165 CCBBE 0616 XPMW (9x325HTT=2.9Ghz) | 2x1GB G.Skill HZ (3-4-4-8-12-16-2T) | LG 62L DVD/CD | Geforce 7300GT | All SATA | Antec 650 Trio PSU | XP SP2
Old 12th April 2005, 06:41   #54  |  Link
vinetu
Registered User
 
Join Date: Oct 2001
Posts: 195
Just curious: in what type of picture are the blocks most
visible, a gradient or a flat colored one?
Old 12th April 2005, 07:53   #55  |  Link
tsp
Registered User
 
 
Join Date: Aug 2004
Location: Denmark
Posts: 807
vinetu: With mode=0 a sharp (gradient) image would produce the most border artifacts. A single-colored area with uniform intensity wouldn't produce as many artifacts. But it's easier to spot the artifacts in a flat area. This should produce them:
Code:
fft3dGPU(mode=0,bw=128,bh=128,sigma=50)

Last edited by tsp; 12th April 2005 at 08:49.
Old 12th April 2005, 11:06   #56  |  Link
MacAddict
XviD User
 
Join Date: Oct 2004
Location: Ky
Posts: 190
Guess my picture attachment for the above post was never approved. The black blocks I mention above can be seen in
this screenshot. They appear on every frame.
Old 12th April 2005, 12:49   #57  |  Link
tsp
Registered User
 
 
Join Date: Aug 2004
Location: Denmark
Posts: 807
Quote:
Originally posted by MacAddict
Guess my picture attachment for the above post was never approved. The black blocks I mention above can be seen in
this screenshot. They appear on every frame.
Yes, the mysterious GeForce FX bug. I don't know what's causing it. If you read page 2 of this thread, Leo 69 had the same problem. I have compiled a new test version. It disables some of the filtering, so please try it and tell me if it works. It is here. If it works, try replacing the ps.hlsl file with the one from version 0.42 and see if it produces black boxes.

Also, I have uploaded version 0.42 again because I forgot to update the dll.
Old 12th April 2005, 17:02   #58  |  Link
bill_baroud
Registered User
 
Join Date: Feb 2002
Posts: 407
Hey, I was the first to report this problem, but I didn't have time to post my screenshot (I was moving...) or to try your test version. Did you notice that the size of the blocks depends on the bw/bh parameters?

Nevertheless, I came here to ask you another question: how scalable is your algorithm?
I found a website here on parallel vision computation algorithms, and they use a computer with 6 PCI FX5200 cards.
So I was wondering if something like that could be useful for our purpose.
Old 12th April 2005, 17:41   #59  |  Link
tsp
Registered User
 
 
Join Date: Aug 2004
Location: Denmark
Posts: 807
bill_baroud: If you want, you can try the new test version, and you're welcome to try fixing the bug (source included). It's very strange that some of the blocks aren't processed.

I'm wondering whether the best way to split the work between multiple GPUs would be to give each one 1 frame or to split the frame up. Also, wouldn't a single GeForce 6800 GT be faster than 6 PCI FX5200s (they must be awfully bandwidth-limited)? If someone wants a multiple-GPU version, they could give me 2 GeForce 6800 Ultras and an nForce 4 SLI motherboard to work with.
Old 12th April 2005, 18:02   #60  |  Link
MacAddict
XviD User
 
Join Date: Oct 2004
Location: Ky
Posts: 190
Quote:
Originally posted by tsp
Yes, the mysterious GeForce FX bug. I don't know what's causing it. If you read page 2 of this thread, Leo 69 had the same problem. I have compiled a new test version. It disables some of the filtering, so please try it and tell me if it works. It is here. If it works, try replacing the ps.hlsl file with the one from version 0.42 and see if it produces black boxes.

Also, I have uploaded version 0.42 again because I forgot to update the dll.
The new test build seems to work perfectly. Blocks appeared again only when I replaced the ps.hlsl file with the one from the 0.42 build. Seems like you're narrowing it down. Thx again!