Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
26th June 2008, 06:07 | #781 | Link |
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
Adding hex is easy, though not necessarily very useful.
Adding umh is probably overkill. umh shines because it tries arbitrarily long vectors, but MVTools' inherently hierarchical motion estimation already handles long motion vectors quite well. So it might help a bit, but I think it'll mostly slow things down.
|
26th June 2008, 15:26 | #783 | Link | |
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
Quote:
__________________
|
|
26th June 2008, 17:04 | #784 | Link |
Registered User
Join Date: Mar 2002
Posts: 1,075
|
Obviously, to make good use of a GPU you have to adapt your algorithm a bit. The GPU is better suited to a situation where each pixel needs only local information; iterative refinement from block MVs to pixel MVs is something a GPU could do quite well, for instance.
|
26th June 2008, 17:27 | #785 | Link |
AviSynth plugger
Join Date: Nov 2003
Location: Russia
Posts: 2,183
|
I agree that UMH is not needed for hierarchical search.
Exhaustive search is not really exhaustive: it is limited by a radius (and I changed it to a spiral order). TSchniede, please use the merged version 1.9.5.1 as a base to prevent a mess! What is the problem with the API?
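For readers following along, here is a rough Python sketch of what a radius-limited search visited from the centre outwards could look like. It is not MVTools code: the function name `spiral_offsets` is hypothetical, and walking square rings of growing size is only an assumed approximation of the spiral order, since the post does not spell out the exact ordering.

```python
def spiral_offsets(radius):
    """Yield candidate (dx, dy) offsets from the centre outwards.

    Assumption: the "spiral" is approximated by square rings of growing
    size; the real visiting order inside MVTools may differ.
    """
    yield (0, 0)
    for r in range(1, radius + 1):
        # Top and bottom edges of the ring, corners included.
        for dx in range(-r, r + 1):
            yield (dx, -r)
            yield (dx, r)
        # Left and right edges, corners excluded (already visited).
        for dy in range(-r + 1, r):
            yield (-r, dy)
            yield (r, dy)


# Every offset inside the radius is visited exactly once, nearest first,
# so a search can stop early once the cost is good enough.
offsets = list(spiral_offsets(2))
```

Visiting near offsets first means the likeliest vectors are tested early, which is what makes an early exit worthwhile.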
__________________
My Avisynth plugins are now at http://avisynth.org.ru and mirror at http://avisynth.nl/users/fizick I usually do not provide a technical support in private messages. |
26th June 2008, 19:02 | #787 | Link |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Diamond isn't bad at all if you're doing hierarchical search like MVTools does; I'm not sure exactly what its method is, but look at Snow's iterative ME for an example of how diamond search can be surprisingly effective.
Hex would be better though and hardly much slower. |
26th June 2008, 19:21 | #790 | Link |
Registered User
Join Date: Jan 2002
Location: France
Posts: 2,856
|
Diamond search in MVTools isn't the same as "dia" in x264. Whenever the diamond would stop, the diagonals are checked, and if one of them improves the SAD cost, the diamond starts again (it's an idea taken from XviD's ME). The documentation also calls it "Logarithmic" search, and there's a reason for that: it starts with a diamond of size "searchparam", then each time the diamond stops, the size is divided by two.
So, with default searchparam (2), it covers as much ground as "hex", though in a slightly different manner.
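The description above can be sketched in Python as follows. This is an illustration of the idea only, not the MVTools source: the names `sad`, `try_pattern` and `log_diamond_search` are hypothetical, frames are plain lists of rows, and two details are assumptions (the diagonals are tested at the current diamond size, and the best of the tested points is taken at each round).

```python
DIAMOND = [(1, 0), (-1, 0), (0, 1), (0, -1)]
DIAGONALS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def sad(cur, ref, bx, by, dx, dy, bs):
    """SAD between the block at (bx, by) in cur and the block displaced
    by (dx, dy) in ref; None if the displaced block leaves the frame."""
    x, y = bx + dx, by + dy
    if x < 0 or y < 0 or y + bs > len(ref) or x + bs > len(ref[0]):
        return None
    return sum(abs(cur[by + j][bx + i] - ref[y + j][x + i])
               for j in range(bs) for i in range(bs))


def try_pattern(cur, ref, bx, by, bs, best, pattern, step):
    """Test the pattern points around the current best vector.
    Returns (vector, cost) for the best strict improvement, else None."""
    (mx, my), cost = best
    found = None
    for px, py in pattern:
        cand = (mx + px * step, my + py * step)
        c = sad(cur, ref, bx, by, cand[0], cand[1], bs)
        if c is not None and c < cost:
            found, cost = (cand, c), c
    return found


def log_diamond_search(cur, ref, bx, by, bs=8, searchparam=2):
    """Diamond search whose size starts at searchparam and is halved
    each time neither the diamond nor the diagonals improve the cost."""
    best = ((0, 0), sad(cur, ref, bx, by, 0, 0, bs))
    step = searchparam
    while step >= 1:
        while True:
            nxt = try_pattern(cur, ref, bx, by, bs, best, DIAMOND, step)
            if nxt is None:
                # The diamond would stop here: check the diagonals first.
                nxt = try_pattern(cur, ref, bx, by, bs, best, DIAGONALS, step)
            if nxt is None:
                break          # nothing improved at this size
            best = nxt         # improvement found: the diamond starts again
        step //= 2             # "Logarithmic": halve the diamond size
    return best
```

Calling `log_diamond_search(cur, ref, 8, 8)` returns a `((dx, dy), cost)` pair for the 8x8 block whose top-left corner is at (8, 8). The loop always terminates because every accepted move strictly lowers a non-negative integer cost.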
|
26th June 2008, 19:23 | #791 | Link |
*Space Reserved*
Join Date: May 2006
Posts: 953
|
smh. Oh well, I threw an idea out there in the hope that we wouldn't have to resort to such drastic methods as what Didée mentioned, with the likes of Photoshop as an option of last resort.....
Last edited by Terranigma; 26th June 2008 at 19:26. |
26th June 2008, 19:44 | #792 | Link |
Guest
Posts: n/a
|
Manao,
I have a feature request. Currently in MVCompensate: If block SAD is above the thSAD, the block is bad, and we use source block instead of the compensated block. Could you add a switch to make this possible: If block SAD is above the thSAD, the block is bad, and we use Compensated block instead of the source block. I know this may seem counter-intuitive, but I have an application where this would be very handy. thanks, -G |
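To make the request concrete, here is a small Python sketch of the per-block decision. It only illustrates the logic described above: `select_block` and its `invert` flag are hypothetical names, not an existing MVCompensate parameter.

```python
def select_block(source, compensated, block_sad, thsad, invert=False):
    """Pick the output for one block.

    invert=False: current behaviour as described in the post
                  (a bad block falls back to the source block).
    invert=True:  requested behaviour (hypothetical switch): a bad block
                  uses the compensated block, a good one the source block.
    """
    bad = block_sad > thsad
    if invert:
        return compensated if bad else source
    return source if bad else compensated


# Current behaviour: the block is bad (SAD 500 > thSAD 400), so the source is kept.
current = select_block("source", "compensated", block_sad=500, thsad=400)
# Requested behaviour: the same bad block now yields the compensated block.
requested = select_block("source", "compensated", 500, 400, invert=True)
```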
27th June 2008, 02:35 | #793 | Link | |
Registered User
Join Date: Aug 2006
Posts: 77
|
Quote:
It seems your selection code for 32x16 blocks got left out of PlaneOfBlocks in your merge. I added the autodetection Dark Shikari requested. I tried adding the mc-copy, but in my tests it was as fast as the default at best, so it is deactivated right now. Boulder mentioned performance problems with my build, so I updated the VS2005 default WinAPI to 2008 and activated all performance options; it seems to be equal to your builds now. Exhaustive search might be improved for 4xY blocks with the SSE4.1 MPSADBW instruction. I was considering a few modifications to the hierarchy algorithm. As 4xY blocks are clearly slower than the larger ones, it seems advantageous to do a divide, but with a last search using the smaller blocks; that should be very close to the result of a native smaller-block search. Right now a search with 8x8 blocks, pel=1 on a 2x upsized clip is faster than a search with 4x4 blocks, pel=2.
__________________
GA-P35-DS3R, Core2Quad Q9300@3GHz, 4.0GB/800 MHz DDR2, 2x250GB SATA HD, Geforce 6800 |
|
27th June 2008, 08:54 | #796 | Link |
x264 developer
Join Date: Sep 2004
Posts: 2,392
|
That's weird. x264's width4 SAD is the same speed as the width8 SAD. So doubling the width of both the frame and the block shouldn't affect speed (aside from cache pressure), but doubling the height should make it slower.
Last edited by akupenguin; 27th June 2008 at 08:56. |
27th June 2008, 22:39 | #798 | Link |
Registered User
Join Date: Nov 2006
Posts: 146
|
I use MVTools for denoising and for doubling the frame rate of smartphone videos. For doubling the frame rate I use a script like:
source = last
backward_vec = source.MVAnalyse(isb = true, overlap=4, pel=2, idx=1, search=3, dct=4)
forward_vec = source.MVAnalyse(isb = false, overlap=4, pel=2, idx=1, search=3, dct=4)
source.MVFlowFps(backward_vec, forward_vec, num=2*FramerateNumerator(source), \
    den=FramerateDenominator(source), idx=1)
but I see artifacts in most of my videos. How can I use MVTools for doubling the frame rate at the "top of the possibilities" of this great tool? I've tried pel=4, but I cannot see any difference from pel=2. Would it be better to use MVFlowFps2?
28th June 2008, 08:03 | #799 | Link |
Registered User
Join Date: Aug 2006
Posts: 77
|
So this time I improved the internal 2xY SAD function (avoided one push & pop and replaced half of the segment register reads with regular ones - 8% faster).
After a good idea I added an optimized version which avoids most reg->mmx moves and uses a new fact (1.9.3.2): the aligned source block buffer is contiguous (pitch = blockwidth), so only one read is needed for it. This resulted in another 8% gain for MVDegrain3 with block=8 on YUY2; the gain with YV12 is smaller. Since the second version is clearly faster, it is now the only one used. The source contains both, though. You can get it here. So I suppose that should solve the 4xY block issue.
__________________
GA-P35-DS3R, Core2Quad Q9300@3GHz, 4.0GB/800 MHz DDR2, 2x250GB SATA HD, Geforce 6800 |
|
|