MVTools - Page 40

Manao · 26th June 2008, 06:07

Adding hex is easy, though not necessarily very useful.

Adding umh is probably overkill. umh shines because it tries arbitrarily long vectors, but MVTools' inherently hierarchical motion estimation already handles long motion vectors quite well. So it might help a bit, but I think it would mostly slow things down.
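
To illustrate why a hierarchical search copes with long vectors, here is a minimal Python sketch. It is not MVTools code: `sad`, `downsample` and `hierarchical_search` are hypothetical names, and the 2x2 averaging pyramid and the edge clamping are assumptions made for the example. Each level only searches a small window around the vector inherited from the coarser level, yet the final vector can be several times longer than that window.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    h, w = len(ref), len(ref[0])
    total = 0
    for y in range(size):
        for x in range(size):
            ry = min(max(by + y + dy, 0), h - 1)  # clamp to the frame edges
            rx = min(max(bx + x + dx, 0), w - 1)
            total += abs(cur[by + y][bx + x] - ref[ry][rx])
    return total


def downsample(frame):
    """Halve both dimensions by averaging each 2x2 neighbourhood."""
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
              + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) // 4
             for x in range(len(frame[0]) // 2)]
            for y in range(len(frame) // 2)]


def hierarchical_search(cur, ref, bx, by, size, levels, radius=1):
    """Coarse-to-fine block search; returns (dx, dy) at full resolution."""
    pyramid = [(cur, ref)]
    for _ in range(levels - 1):
        c, r = pyramid[-1]
        pyramid.append((downsample(c), downsample(r)))

    dx = dy = 0
    for level in range(levels - 1, -1, -1):
        c, r = pyramid[level]
        scale = 1 << level
        lbx, lby = bx // scale, by // scale
        lsize = max(size // scale, 1)
        best = None
        # Small window around the vector predicted by the coarser level.
        for cy in range(dy - radius, dy + radius + 1):
            for cx in range(dx - radius, dx + radius + 1):
                cost = sad(c, r, lbx, lby, cx, cy, lsize)
                if best is None or cost < best[0]:
                    best = (cost, cx, cy)
        _, dx, dy = best
        if level > 0:
            dx, dy = dx * 2, dy * 2  # rescale the vector for the next finer level
    return dx, dy
```

With three levels and `radius=2`, a vector such as (8, 4) can be reached even though no single level looks further than 2 pixels from its starting point, which is the ground umh would otherwise have to cover with long-range patterns.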

Terranigma · 26th June 2008, 14:02

Quote:

Originally Posted by Manao

Adding umh is probably overkill. umh shines because it tries arbitrarily long vectors, but MVTools' inherently hierarchical motion estimation already handles long motion vectors quite well. So it might help a bit, but I think it would mostly slow things down.

More so than even the already implemented Exhaustive Search?

Manao · 26th June 2008, 15:26

Quote:

More so than even the already implemented Exhaustive Search?

Exhaustive is simpler to code, and it has a theoretical use for the developer, since it is guaranteed to find the optimal vector. As a user, it's of course useless (and even harmful, since we want the true motion vector, not the best vector according to a dumb algorithm).

MfA · 26th June 2008, 17:04

Quote:

Originally Posted by TSchniede

Right now we are talking about making MVAnalyse faster. Unfortunately the algorithm is highly linear and you can't split the frame into smaller chunks without sacrificing quality.

Obviously, to make good use of a GPU you have to adapt your algorithm a bit ... the GPU is better suited to a situation where each pixel needs only local information. Iterative refinement from block MVs to pixel MVs is something a GPU could do quite well, for instance.

Fizick · 26th June 2008, 17:27

I agree that UMH is not needed for hierarchical search.
Exhaustive search is not really exhaustive: it is limited by radius (and I modified it to spiral).

TSchniede,
please use merged version 1.9.5.1 as a base to prevent a mess!
What is the problem with the API?
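
A radius-limited "exhaustive" search visited from the centre outwards can be sketched in Python as follows. This is only an illustration under assumptions: `spiral_offsets` and `exhaustive_search` are hypothetical names, the ring-by-ring order is one simple approximation of a spiral, and the actual visiting order in MVTools may differ.

```python
def spiral_offsets(radius):
    """Yield every (dx, dy) with |dx|, |dy| <= radius, centre first, ring by ring."""
    yield (0, 0)
    for r in range(1, radius + 1):
        # Top and bottom edges of the square ring, corners included.
        for dx in range(-r, r + 1):
            yield (dx, -r)
            yield (dx, r)
        # Left and right edges, without the corners already emitted.
        for dy in range(-r + 1, r):
            yield (-r, dy)
            yield (r, dy)


def exhaustive_search(cost, radius):
    """Evaluate cost(vector) for every offset within radius; return (best_vector, best_cost)."""
    best_vec, best_cost = None, None
    for vec in spiral_offsets(radius):
        c = cost(vec)
        # Strict comparison: on ties, the vector visited first (closer to the centre) wins.
        if best_cost is None or c < best_cost:
            best_vec, best_cost = vec, c
    return best_vec, best_cost
```

Because candidates outside the radius are never evaluated, the result is only optimal within that window; visiting the window from the centre outwards means that equal costs resolve to the shorter vector.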

Terranigma · 26th June 2008, 18:59

OK guys, but obviously there can be some improvements upon Diamond Search. Maybe something not as extreme as UMH, but something more useful than hex. Have you guys seen this post?

Dark Shikari · 26th June 2008, 19:02

Diamond isn't bad at all if you're doing hierarchical like MVTools does; I'm not sure exactly its method but look at Snow's iterative ME for an example of how diamond search can be surprisingly effective.

Hex would be better though and hardly much slower.

Terranigma · 26th June 2008, 19:10

Quote:

Originally Posted by Dark Shikari

Hex would be better though and hardly much slower.

Well, I wasn't sure since Manao said it wasn't very useful.
You're the M.E. expert. What do you suggest?

Dark Shikari · 26th June 2008, 19:14

Quote:

Originally Posted by Terranigma

Well, I wasn't sure since Manao said it wasn't very useful.
You're the M.E. expert. What do you suggest?

I'm actually not much of an expert with iterative ME/hierarchical algorithms so Manao might be more familiar with the topic than I.

Manao · 26th June 2008, 19:21

Diamond search in MVTools isn't the same as "dia" in x264. Whenever the diamond might stop, the diagonals are checked, and if one improves the SAD cost, the diamond starts again (it's an idea taken from XviD's ME). The documentation also names it "Logarithmic" search, and there's a reason for it. It starts by doing a diamond of size "searchparam", then each time the diamond stops, the size is divided by two.

So, with default searchparam (2), it covers as much ground as "hex", though in a slightly different manner.
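
The behaviour described above can be sketched in Python roughly as follows. This is an interpretation of the description, not the MVTools source: `try_pattern` and `log_diamond_search` are hypothetical names, and `cost` stands in for the SAD-based cost of a candidate vector.

```python
AXIS = ((1, 0), (-1, 0), (0, 1), (0, -1))
DIAGONAL = ((1, 1), (1, -1), (-1, 1), (-1, -1))


def try_pattern(cost, centre, centre_cost, pattern, step):
    """Return the best (vector, cost) among the centre and the pattern points at distance step."""
    best, best_cost = centre, centre_cost
    for ox, oy in pattern:
        cand = (centre[0] + ox * step, centre[1] + oy * step)
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost


def log_diamond_search(cost, start=(0, 0), searchparam=2):
    """Diamond search with a diagonal restart check and a step that halves on each stop."""
    best, best_cost = start, cost(start)
    step = searchparam
    while step >= 1:
        while True:
            # Plain diamond: move while one of the four axis neighbours improves the cost.
            cand, cand_cost = try_pattern(cost, best, best_cost, AXIS, step)
            if cand == best:
                # The diamond would stop here: check the diagonals first.
                cand, cand_cost = try_pattern(cost, best, best_cost, DIAGONAL, step)
                if cand == best:
                    break  # no diagonal helps either, so this step size is finished
            best, best_cost = cand, cand_cost
        step //= 2  # halve the diamond size, hence "logarithmic"
    return best, best_cost
```

For example, `log_diamond_search(lambda v: abs(v[0] - 5) + abs(v[1] + 3))` walks with step 2 and then step 1 until it settles on the minimum of that toy cost. The diagonal check matters for costs that only decrease along a diagonal, where a plain four-point diamond would stall immediately.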

Terranigma · 26th June 2008, 19:23

Quote:

Originally Posted by Dark Shikari

I'm actually not much of an expert with iterative ME/hierarchical algorithms so Manao might be more familiar with the topic than I.

smh. Oh well, I threw an idea out there in the hopes that we wouldn't happen to resort to such drastic methods, such as what Didée mentioned, and use the likes of photoshop as an option of last resort.....

g-force · 26th June 2008, 19:44

Manao,

I have a feature request. Currently in MVCompensate:
If block SAD is above the thSAD, the block is bad, and we use source block instead of the compensated block.
Could you add a switch to make this possible:
If block SAD is above the thSAD, the block is bad, and we use Compensated block instead of the source block.

I know this may seem counter-intuitive, but I have an application where this would be very handy.

thanks,

-G
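
One possible reading of this request, sketched in Python. The `invert` flag and the `select_block` function are hypothetical and are not existing MVCompensate parameters; the sketch assumes that the requested switch simply swaps which block is used on each side of the threshold.

```python
def select_block(source_block, compensated_block, block_sad, th_sad, invert=False):
    """Pick the output block for one position.

    Default (current behaviour as described): a bad block (SAD above th_sad)
    falls back to the source block, a good block uses the compensated one.
    With invert=True (hypothetical switch) the choice is swapped.
    """
    bad = block_sad > th_sad
    use_source = bad != invert  # XOR: invert flips the decision
    return source_block if use_source else compensated_block
```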

TSchniede · 27th June 2008, 02:35

Quote:

Originally Posted by Fizick

I agree that UMH is not needed for hierarchical search.
Exhaustive search is not really exhaustive: it is limited by radius (and I modified it to spiral).

TSchniede,
please use merged version 1.9.5.1 as a base to prevent a mess!
What is the problem with the API?

I merged my code with version 1.9.5.1. It is available here.
It seems your selection code for 32x16 blocks got left out in PlaneOfBlocks by your merge.
I added the autodetection Dark Shikari requested. I tried adding the mc-copy, but in my tests it was as fast as default at best, so it is deactivated right now.

Boulder mentioned performance problems with my build, so I updated the VS2005 default WinAPI to 2008 and activated all performance options; it seems to be equal to your builds now.

Exhaustive search might be improved for 4xY blocks with the SSE4.1 instruction MPSADBW.

I was considering a few modifications to the hierarchy algorithm. As 4xY blocks are clearly slower than the larger ones, it seems advantageous to run the hierarchy with larger blocks, then divide them and do only the last search with the smaller blocks. That should be very close to the result of a native smaller-block search.
Right now a search with 8x8 blocks and pel=1 on a 2x upsized clip is faster than a search with 4x4 blocks and pel=2.

Dark Shikari · 27th June 2008, 02:38

Quote:

Originally Posted by TSchniede

Exhaustive search might be improved for 4xY blocks with the SSE4.1 instruction MPSADBW.

SEA is much faster. MPSADBW is only about 20% faster than 8 cacheline SADs alone for unaligned 16x16 blocks, for example. SEA is about 7 times faster.
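
For readers unfamiliar with SEA (successive elimination algorithm): it relies on the fact that the absolute difference of two block sums is a lower bound on their SAD, so many candidates can be rejected without computing a SAD at all. A minimal Python sketch of the idea follows; the function names are hypothetical, this is not x264 or MVTools code, and it says nothing about the speed figures quoted above.

```python
def integral_image(frame):
    """ii[y][x] = sum of frame[0..y-1][0..x-1], so block sums cost four lookups."""
    h, w = len(frame), len(frame[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += frame[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def block_sum(ii, x, y, size):
    """Sum of the size x size block whose top-left corner is (x, y)."""
    return ii[y + size][x + size] - ii[y][x + size] - ii[y + size][x] + ii[y][x]


def block_sad(cur, ref, bx, by, rx, ry, size):
    """Plain SAD between the block of cur at (bx, by) and the block of ref at (rx, ry)."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def sea_search(cur, ref, bx, by, size, radius):
    """Full search within radius, skipping candidates ruled out by the sum bound.

    Returns (best_vector, best_sad, number_of_sads_actually_computed).
    """
    h, w = len(ref), len(ref[0])
    ref_ii = integral_image(ref)
    cur_sum = sum(cur[by + j][bx + i] for j in range(size) for i in range(size))
    best_vec, best_sad, sads = None, None, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + size > w or ry + size > h:
                continue  # candidate block would leave the frame
            # |sum(cur block) - sum(ref block)| <= SAD(cur block, ref block)
            bound = abs(cur_sum - block_sum(ref_ii, rx, ry, size))
            if best_sad is not None and bound >= best_sad:
                continue  # cannot beat the best SAD found so far
            s = block_sad(cur, ref, bx, by, rx, ry, size)
            sads += 1
            if best_sad is None or s < best_sad:
                best_vec, best_sad = (dx, dy), s
    return best_vec, best_sad, sads
```

The result is the same as that of a plain full search over the same window; only the number of SADs computed changes, and the saving depends on how tight the bound is for the content at hand.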

g-force · 27th June 2008, 05:23

Another feature request: thSADC for MVCompensate.

thanks,

-G

akupenguin · 27th June 2008, 08:54

Quote:

Originally Posted by TSchniede

Right now a search with 8x8 blocks and pel=1 on a 2x upsized clip is faster than a search with 4x4 blocks and pel=2.

That's weird. x264's width4 SAD is the same speed as the width8 SAD. So doubling the width of both the frame and the block shouldn't affect speed (aside from cache pressure), but doubling the height should make it slower.

TSchniede · 27th June 2008, 14:17

Quote:

Originally Posted by akupenguin

That's weird. x264's width4 SAD is the same speed as the width8 SAD. So doubling the width of both the frame and the block shouldn't affect speed (aside from cache pressure), but doubling the height should make it slower.

I think part of that is the implementation of the 2xY SAD functions.

ficofico · 27th June 2008, 22:39

I use MVTools for denoising and for doubling the framerate of smartphone videos. For doubling the framerate I use a script like:

source=last
backward_vec = source.MVAnalyse(isb = true,overlap=4, pel=2, idx=1,search=3,dct=4)
forward_vec = source.MVAnalyse(isb = false,overlap=4, pel=2, idx=1,search=3,dct=4)
source.MVFlowFps(backward_vec, forward_vec, num=2*FramerateNumerator(source), \
den=FramerateDenominator(source), idx=1)

but I see artifacts in most of my videos. How can I use MVTools to double the framerate at the "top of the possibilities" of this great tool? I've tried pel=4, but I cannot see any difference from pel=2. Would it be better to use MVFlowFps2?

TSchniede · 28th June 2008, 08:03

So this time I improved the internal 2xY SAD function (avoided one push & pop and replaced half of the segment register reads with regular ones: 8% faster).
After a good idea I added an optimized version which avoids most reg->mmx moves and relies on a fact that is new in 1.9.3.2: the aligned source block buffer is contiguous (pitch = blockwidth), so only one read is needed for it. This resulted in another 8% gain for MVDegrain3 with block=8 on YUY2; the gain with YV12 is smaller.
Since the second version is clearly faster, it is now the only one used. The source contains both, though.

You can get it here.

So I suppose that should solve the 4xY block issue.

Undead Sega · 28th June 2008, 11:06

To Manao,

MVTools is an excellent piece of work! I use it for motion-compensated deinterlacing and the results look great!

But may I ask, what are the possibilities of having MVTools ported to the GPU?
