Old 24th June 2008, 18:27   #761  |  Link
TSchniede
Registered User
 
Join Date: Aug 2006
Posts: 77
Quote:
Originally Posted by yup View Post
Hi all!
Simple question: what is the default value of searchparam for search=3?
yup.
The default for searchparam is 2 for all search types.
__________________
GA-P35-DS3R, Core2Quad Q9300@3GHz, 4.0GB/800 MHz DDR2, 2x250GB SATA HD, Geforce 6800
Old 24th June 2008, 18:41   #762  |  Link
TSchniede
Registered User
 
Join Date: Aug 2006
Posts: 77
This is the new version with buffered source block.
It is still based on 1.9.5.0

The only changes are a new constant in MVInterface and changes in PlaneOfBlocks (all of the new code is controlled by that constant).

My tests show only a slight variation in speed (<1%) when the source blocks are aligned anyway (no overlap), as the copying overhead and the better locality almost cancel each other out. With overlapped blocks I measured up to a 10% performance increase.
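
To illustrate the idea of a buffered source block, here is a minimal Python sketch; it is not the actual PlaneOfBlocks code. It assumes the frame is a flat, row-major byte buffer with a row pitch, and the name copy_block and its parameters are made up for this example. The block is copied once into a contiguous buffer whose stride equals the block width, so every later comparison can read it with a constant stride, wherever the block sat in the frame.

```python
def copy_block(frame, pitch, x, y, width, height):
    """Copy a width x height block at (x, y) into a contiguous buffer.

    frame is a flat, row-major byte sequence and pitch is the distance in
    bytes between the starts of two consecutive rows. The returned buffer
    has a stride equal to width.
    """
    block = bytearray(width * height)
    for row in range(height):
        src = (y + row) * pitch + x
        block[row * width:(row + 1) * width] = frame[src:src + width]
    return block


# An 8x8 test frame whose pixel values equal their offsets in the buffer.
frame = bytes(range(64))
block = copy_block(frame, pitch=8, x=2, y=1, width=4, height=2)
print(list(block))  # [10, 11, 12, 13, 18, 19, 20, 21]
```

The copy costs time on every block, which would fit the observation that it only pays off when the source blocks are not aligned to begin with.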

Last edited by TSchniede; 2nd July 2008 at 00:59.
Old 24th June 2008, 20:13   #763  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
If you aren't already, have you tried using x264's mc.copy for creating the aligned source blocks from the source data? It's blazingly fast.

Also, note that when you're using an aligned source block you can probably take great advantage of the constant stride in the various assembly functions.

Another idea: DCT is slow as hell, and x264's SATD is quite fast. How about replacing the "dct" option with a SATD option instead, borrowing x264's SATD code? And while we're taking assembly from x264, you could try using x264's 6-tap upscaling filter for hpel; it's extremely fast.

Last edited by Dark Shikari; 24th June 2008 at 20:17.
Old 24th June 2008, 20:29   #764  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Finland
Posts: 5,731
It seems that the new version is somewhat slower on my E6750.

With blocksize 8 and overlap 4:

The first version
x264_sad=3 : 5.7 fps

New version
x264_sad=3 : 5.3 fps


With blocksize 8 and no overlapping:

The first version
x264_sad=3 : 18.4 fps

The new version
x264_sad=3 : 16.8 fps
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
Old 24th June 2008, 21:12   #765  |  Link
TSchniede
Registered User
 
Join Date: Aug 2006
Posts: 77
Dark Shikari, I was looking into those things, but they are a bit more complicated than importing the SAD functions, so that will take some time. I wasn't really looking for a dct replacement yet, though.

Boulder, interesting - on my Q9300 and on my Pentium M the performance is quite good. Quite a chunk of the additional overhead comes from copying, so speeding that up should improve things. I tried switching between direct source block references and buffered ones depending on the alignment, but that was even slower, as the overhead of the check is definitely bigger than that of always copying. If you are comparing Fizick's merge with my version, different compiler / Win-API versions can make a difference too (and I have no idea yet what additional tweaks were introduced).
Old 24th June 2008, 21:16   #766  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Finland
Posts: 5,731
Yes, it's Fizick's merge that I tested. I could run the same tests on your first build tomorrow to verify if the difference still exists.
Old 25th June 2008, 04:29   #767  |  Link
TSchniede
Registered User
 
Join Date: Aug 2006
Posts: 77
Quote:
Originally Posted by Dark Shikari View Post
If you aren't already, have you tried using x264's mc.copy for creating the aligned source blocks from the source data? It's blazingly fast.

Also, note that when you're using an aligned source block you can probably take great advantage of the constant stride in the various assembly functions.

Another idea: DCT is slow as hell, and x264's SATD is quite fast. How about replacing the "dct" option with a SATD option instead, borrowing x264's SATD code? And while we're taking assembly from x264, you could try using x264's 6-tap upscaling filter for hpel; it's extremely fast.
I tried calling SSD and SATD. As was to be expected, they are slow compared to plain SAD (150% for SSD, 250% for SATD) when used as replacements for SAD on both luma and chroma, but blazingly fast compared to the current dct (which needs 6x the time of SATD for luma alone). The results are not really comparable this way, though: the value has to be at least scaled to match the default assumptions and probably weighted with the spatial SAD. Nevertheless it promises to be a faster alternative. I haven't been able to verify the correctness of my implementation, as it obviously isn't equivalent to any current option. It "works" in the sense that it doesn't crash and does "something".
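
For readers unfamiliar with the two simpler metrics, here is a small Python sketch of SAD and SSD on blocks stored as flat lists. It is only an illustration, not the x264 or MVTools code, and the function names and test blocks are made up. It shows one reason the two values cannot be swapped without rescaling: SSD squares each difference, so the same total error counts for much more when it is concentrated in a few pixels.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for p, q in zip(a, b))


def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


src = [100] * 16                      # flat 4x4 block, stored row by row
spread = [101] * 8 + [100] * 8        # eight pixels off by 1
spike = [108] + [100] * 15            # one pixel off by 8

print(sad(src, spread), ssd(src, spread))  # 8 8
print(sad(src, spike), ssd(src, spike))    # 8 64
```

Any threshold tuned for SAD values would therefore see very different numbers from SSD on the same blocks.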

I looked into mc.copy too, but it is far simpler than the other functions (most optimizations bring no advantage in such a simple algorithm) and uses a different interface, so I think the best option is a reimplementation.

Soon most assembler functions will come from x264.

I am working on an optimized 4xY SAD function which also takes advantage of the special source block properties.
Right now it is faster to work on an upscaled clip with 8x8 blocks than on 4x4 blocks with pel=2.

The hpel filter is something I have to try first as a stand-alone AviSynth filter to make sure it will work.
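
As a rough picture of what a 6-tap half-pel filter does, here is a one-dimensional Python sketch. It assumes the filter meant is the standard H.264 half-sample luma filter with taps (1, -5, 20, 20, -5, 1) / 32; the names TAPS and hpel_row are made up, the edge handling (clamping the index) is my own choice, and none of this is x264's assembly.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # H.264 half-sample luma taps, sum = 32


def hpel_row(row):
    """Return the half-pel sample between row[i] and row[i + 1] for each i."""
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(TAPS):
            j = min(max(i - 2 + k, 0), n - 1)   # clamp the index at the edges
            acc += tap * row[j]
        value = (acc + 16) >> 5                 # round and divide by 32
        out.append(min(max(value, 0), 255))     # clip to the 8-bit range
    return out


print(hpel_row([0, 0, 0, 255, 255, 255]))  # [8, 0, 128, 255, 247]
```

The negative taps sharpen edges compared with a plain two-pixel average, at the cost of some over- and undershoot next to a hard step, which is why the result has to be clipped.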
Old 25th June 2008, 06:40   #768  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Have you tried using sad_x3/sad_x4? They're quite a bit faster than doing one SAD at a time.

SATD is a drop-in replacement for SAD (it's used as such in x264 too, for --me tesa). It doesn't need scaling, and there's a satd_x3/satd_x4 in pixel.c just to "fake" the multiple-SAD call so that it can be a drop-in replacement.
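
The idea behind a multi-candidate SAD can be sketched in Python as follows. This is only a conceptual illustration: the real sad_x3/sad_x4 are assembly routines, and the name sad_multi, its signature, and the flat row-major frame layout are assumptions made for this example. The point is that each row of the source block is fetched once and then compared against the matching row of every candidate, instead of being re-read for each single SAD call.

```python
def sad_multi(src, size, ref, pitch, candidates):
    """SAD of one size x size source block against several candidates.

    src is the source block as a flat list with stride size; ref is a
    flat, row-major reference frame with row pitch pitch; candidates is
    a list of (x, y) top-left positions inside ref.
    """
    totals = [0] * len(candidates)
    for row in range(size):
        src_row = src[row * size:(row + 1) * size]   # fetched once per row
        for i, (x, y) in enumerate(candidates):
            base = (y + row) * pitch + x
            ref_row = ref[base:base + size]
            totals[i] += sum(abs(s - r) for s, r in zip(src_row, ref_row))
    return totals


ref = list(range(64))                                # 8x8 frame, pitch 8
src = [ref[(1 + r) * 8 + 2 + c] for r in range(4) for c in range(4)]
print(sad_multi(src, 4, ref, 8, [(2, 1), (3, 1), (2, 2)]))  # [0, 16, 128]
```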
Old 25th June 2008, 09:07   #769  |  Link
Terka
Registered User
 
Join Date: Jan 2005
Location: cz
Posts: 704
Hi guys,
it's good to hear you are improving the MVTools speed. Would it be possible to port it to run on the GPU, like fft3dgpu?
Maybe the speedup would be greater this way?
Old 25th June 2008, 16:12   #770  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Finland
Posts: 5,731
Quote:
Originally Posted by Boulder View Post
It seems that the new version is somewhat slower on my E6750.

With blocksize 8 and overlap 4:

The first version
x264_sad=3 : 5.7 fps

New version
x264_sad=3 : 5.3 fps


With blocksize 8 and no overlapping:

The first version
x264_sad=3 : 18.4 fps

The new version
x264_sad=3 : 16.8 fps
I tested the first build and here are the results:

blksize 8, overlap 4 : 5.2 fps
blksize 8, overlap 0 : 17.1 fps

Apparently Fizick's official 1.9.5.1 build is a tad bit faster.
Old 25th June 2008, 18:59   #771  |  Link
Fizick
AviSynth plugger
 
Join Date: Nov 2003
Location: Russia
Posts: 2,183
IMO, SSD or SATD is not useful.
But the Hadamard (if I spelled it correctly) transform is an interesting, faster alternative to DCT.
I do not remember where I saw it, MPlayer or x264.
__________________
My Avisynth plugins are now at http://avisynth.org.ru and mirror at http://avisynth.nl/users/fizick
I usually do not provide a technical support in private messages.
Old 25th June 2008, 19:47   #772  |  Link
TSchniede
Registered User
 
Join Date: Aug 2006
Posts: 77
Quote:
Originally Posted by Terka View Post
Hi guys,
it's good to hear you are improving the MVTools speed. Would it be possible to port it to run on the GPU, like fft3dgpu?
Maybe the speedup would be greater this way?
Right now we are talking about making MVAnalyse faster. Unfortunately the algorithm is highly sequential and you can't split the frame into smaller chunks without sacrificing quality. Even single-threaded, the memory footprint is huge. So I really doubt that moving to a (relatively) memory-constrained, low-clock-rate platform with many cores will help. A basic 8x8, overlap=0 MVDegrain3 on a PAL clip already runs in nearly real time with SetMTMode(2,4) on my Q9300, even without the latest tweaks. And we already use SSE to work on several pixels at once, so there is little left that could be done better in parallel. I have no real knowledge of how well GPUs cope with huge amounts of conditional code and synchronization, but it doesn't seem really plausible.
Old 25th June 2008, 19:56   #773  |  Link
TSchniede
Registered User
 
Join Date: Aug 2006
Posts: 77
Quote:
Originally Posted by Fizick View Post
IMO, SSD or SATD is not useful.
But the Hadamard (if I spelled it correctly) transform is an interesting, faster alternative to DCT.
I do not remember where I saw it, MPlayer or x264.
You are unfortunately right. In the current code SATD seems to work inversely to SAD, meaning the best "SAD" turns up in the worst-case scenario, the scene change. I am currently investigating how Hadamard is supposed to work. I can't say whether something is working as expected or is useful if I don't know what it should do in the first place.
Old 25th June 2008, 21:08   #774  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by Fizick View Post
IMO, SSD or SATD is not useful.
But the Hadamard (if I spelled it correctly) transform is an interesting, faster alternative to DCT.
SATD is the Hadamard transform and is better than SAD because it doesn't fail miserably in the case of fades. It's a far faster alternative to the DCT.
Old 26th June 2008, 00:05   #775  |  Link
TSchniede
Registered User
 
Join Date: Aug 2006
Posts: 77
Quote:
Originally Posted by Dark Shikari View Post
SATD is the Hadamard transform and is better than SAD because it doesn't fail miserably in the case of fades. It's a far faster alternative to the DCT.
I think I have found my previous error. I had forgotten a debug instruction AND SATD triggered a lot of scene changes. It is definitely far more sensitive to noise than SAD, so it is somewhat "sharper". SSD is even more extreme in that respect.

On a blurred clip the picture is completely reversed: SATD is definitely superior to SAD.
On a grainy source (addgrain(20)) the blurred SATD is virtually identical to SAD without blur(1). On fades only SATD produces decent quality. I suppose better prefiltering would work even better. Besides, it is a lot faster than the default dct.
Old 26th June 2008, 00:18   #776  |  Link
Terranigma
*Space Reserved*
 
Join Date: May 2006
Posts: 953
Have a compile with "SATD" that we can test?
Old 26th June 2008, 01:52   #777  |  Link
TSchniede
Registered User
 
Join Date: Aug 2006
Posts: 77
Quote:
Originally Posted by Terranigma View Post
Have a compile with "SATD" that we can test?
You can get my current version here.
It is still based on 1.9.3. I think it's time to get Fizick's version and update my Win API.

There are two ways to access the new functions:
sadx264: 8-12
dct: 5-10

For a description see the documentation.

So far I have only tested the basic functionality, and I haven't thoroughly looked for potential performance problems with the dct mode. There are definitely some minor parts which will need some work if this is going to stay.

The changes are in SADFunctions.h and PlaneOfBlocks (and of course MVInterface and MVAnalyse); the pixel*.asm files were added.
Old 26th June 2008, 03:26   #778  |  Link
Terranigma
*Space Reserved*
 
Join Date: May 2006
Posts: 953
Thanks TSchniede.
Since you guys are borrowing code from x264, how difficult would it be to port over the hexagon and multi hexagon search algorithms?
Old 26th June 2008, 04:05   #779  |  Link
TSchniede
Registered User
 
Join Date: Aug 2006
Posts: 77
Quote:
Originally Posted by Terranigma View Post
Since you guys are borrowing code from x264, how difficult would it be to port over the hexagon and multi hexagon search algorithms?
I don't know. A quick glance over the source wasn't that informative.

I don't know the algorithm yet, so I can't say whether it would be useful in the first place. It might be possible to add a hexagonal search function to MVTools alongside the others. There seems to be some similarity to the logarithmic search. But it would be more of a reimplementation anyway. The other functions were small assembler functions where the possible interfaces were very limited. I have only adapted the MVTools interface for calling them (where necessary), as they are more complex than the default functions.
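
For orientation, the textbook hexagon-based search can be sketched in Python like this. It is a generic version written for this example, not a port of x264's code, which differs in details not reproduced here. The cost callable, the names hex_search, HEX and SQUARE, and the max_steps limit are all assumptions, and there is no check that candidates stay inside the frame.

```python
# Six points of the large hexagon and the eight neighbours used for the
# final refinement, as (dx, dy) offsets from the current centre.
HEX = [(-2, 0), (-1, -2), (1, -2), (2, 0), (1, 2), (-1, 2)]
SQUARE = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]


def hex_search(cost, start=(0, 0), max_steps=16):
    """Return (best_vector, best_cost) found by a hexagon-based search.

    cost(x, y) gives the matching cost (for example a SAD) of the motion
    vector (x, y); lower is better.
    """
    best, best_cost = start, cost(*start)
    for _ in range(max_steps):
        cx, cy = best
        moved = False
        for dx, dy in HEX:                  # test the six hexagon points
            cand = (cx + dx, cy + dy)
            c = cost(*cand)
            if c < best_cost:
                best, best_cost, moved = cand, c, True
        if not moved:                       # the centre is the best: stop
            break
    cx, cy = best
    for dx, dy in SQUARE:                   # one small refinement step
        cand = (cx + dx, cy + dy)
        c = cost(*cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost


# Toy cost surface with its minimum at (5, -3).
print(hex_search(lambda x, y: abs(x - 5) + abs(y + 3)))  # ((5, -3), 0)
```

Like the logarithmic search, it repeatedly re-centres a coarse pattern on the best point found so far and finishes with a fine step, so it could probably sit behind the same kind of cost function as the existing search modes.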
Old 26th June 2008, 05:58   #780  |  Link
Manao
Registered User
 
Join Date: Jan 2002
Location: France
Posts: 2,856
Quote:
SATD is the Hadamard transform and is better than SAD because it doesn't fail miserably in the case of fades
I disagree, it still fails miserably on fades. Fades make motion vectors go crazy because of DC change, and SATD is as sensitive to that as SAD.
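
The DC argument can be checked with a small Python sketch of a 4x4 SATD (an unnormalised Hadamard transform of the difference block followed by a sum of absolute values). This is an illustration written for this example, not x264's implementation; the function names are made up and no scaling factor is applied. A fade is modelled in the simplest possible way, as a uniform brightness offset.

```python
def hadamard4(v):
    """Unnormalised 4-point Hadamard transform of a length-4 sequence."""
    a, b, c, d = v
    s0, s1 = a + b, c + d
    d0, d1 = a - b, c - d
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]


def satd4x4(src, ref):
    """Sum of absolute Hadamard-transformed differences of two 4x4 blocks."""
    diff = [[s - r for s, r in zip(sr, rr)] for sr, rr in zip(src, ref)]
    rows = [hadamard4(r) for r in diff]            # transform each row
    cols = [hadamard4(c) for c in zip(*rows)]      # then each column
    return sum(abs(x) for col in cols for x in col)


def sad4x4(src, ref):
    """Sum of absolute differences of two 4x4 blocks."""
    return sum(abs(s - r) for sr, rr in zip(src, ref) for s, r in zip(sr, rr))


block = [[16 * r + 4 * c for c in range(4)] for r in range(4)]
faded = [[p + 10 for p in row] for row in block]   # uniform offset of 10

print(sad4x4(block, faded), satd4x4(block, faded))  # 160 160
```

A constant difference ends up entirely in the DC coefficient of the transform, so in this sketch the offset contributes exactly as much to SATD as it does to SAD. Real fades are closer to a gain change than a pure offset, so this only covers the simplest case.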