PDA

View Full Version : A thought: "evolving" good ME algorithms.


Dark Shikari
30th October 2007, 15:25
I had this idea last night, so I went on a 3-hour Java coding spree and wrote myself a simple motion estimation language, an interpreter, and a program to evolve hundreds of sample programs to create an optimal motion estimation algorithm for a particular dataset of motion estimation data.

Obviously this has weaknesses; in one sense you're optimizing specifically for the data--but hopefully if I give it enough data that won't happen anymore.

I started the program off with a simple diamond search. Each ME algorithm has 40 steps, no more, no less. There are two instructions that the motion estimation algorithms being bred have:

1. X,Y: Do a motion search at location X,Y relative to the current location.
2. RESET: set the current location to the best location found so far.

Simplistic as all hell, but at least something to start with.

Here's a result from my program run overnight:

X: 0 Y: 1
X: 0 Y: -3
X: -11 Y: 5
X: 0 Y: -6
X: 1 Y: 0
X: -4 Y: 0
X: 1 Y: 5
RESET
X: 3 Y: -5
X: 4 Y: 0
X: 0 Y: -1
X: 0 Y: 3
X: -6 Y: -13
X: -3 Y: -9
X: -2 Y: 7
RESET
X: 0 Y: 1
X: 2 Y: -1
RESET
X: -3 Y: -2
X: 12 Y: -4
X: 4 Y: 5
X: -10 Y: 0
RESET
X: -1 Y: -5
X: -1 Y: 1
X: 1 Y: 1
X: 10 Y: -7
X: -1 Y: 3
RESET
X: 1 Y: -1
X: -1 Y: 0
X: 0 Y: 2
X: -1 Y: -10
RESET
X: 0 Y: -1
X: -4 Y: -10
X: -1 Y: 0
RESET
X: 0 Y: 0

In C, this is

COST_MV(omx+0,omy+1)
COST_MV(omx+0,omy-3)
COST_MV(omx-11,omy+5)
COST_MV(omx+0,omy-6)
COST_MV(omx+1,omy+0)
COST_MV(omx-4,omy+0)
COST_MV(omx+1,omy+5)
omx = bmx; omy = bmy;
COST_MV(omx+3,omy-5)
COST_MV(omx+4,omy+0)
COST_MV(omx+0,omy-1)
COST_MV(omx+0,omy+3)
COST_MV(omx-6,omy-13)
COST_MV(omx-3,omy-9)
COST_MV(omx-2,omy+7)
omx = bmx; omy = bmy;
COST_MV(omx+0,omy+1)
COST_MV(omx+2,omy-1)
omx = bmx; omy = bmy;
COST_MV(omx-3,omy-2)
COST_MV(omx+12,omy-4)
COST_MV(omx+4,omy+5)
COST_MV(omx-10,omy+0)
omx = bmx; omy = bmy;
COST_MV(omx-1,omy-5)
COST_MV(omx-1,omy+1)
COST_MV(omx+1,omy+1)
COST_MV(omx+10,omy-7)
COST_MV(omx-1,omy+3)
omx = bmx; omy = bmy;
COST_MV(omx+1,omy-1)
COST_MV(omx-1,omy+0)
COST_MV(omx+0,omy+2)
COST_MV(omx-1,omy-10)
omx = bmx; omy = bmy;
COST_MV(omx+0,omy-1)
COST_MV(omx-4,omy-10)
COST_MV(omx-1,omy+0)
omx = bmx; omy = bmy;
COST_MV(omx+0,omy+0)

It gets on average 1% higher SAD values than an exhaustive search, for the data set given. This is about 5 times better than diamond.

It'll be interesting to see whether I can make algorithms good for all data sets using this, or whether no matter what it'll just fit it to the data set at the cost of real efficiency.

Mutant_Fruit
30th October 2007, 18:52
What happens if you take a few more datasets and run your 'evolved' ME algorithm against those? Does it have better or worse performance? I'm hazarding a guess it'll have worse ;)

Dark Shikari
30th October 2007, 19:07
What happens if you take a few more datasets and run your 'evolved' ME algorithm against those? Does it have better or worse performance? I'm hazarding a guess it'll have worse ;)Of course, my goal is to avoid that :)

I have modified my evolution to take into account the speed of each algorithm, and allowed the algorithms to change length. I also added a new instruction, QRESET, which if the best value hasn't changed since the previous reset, will terminate the algorithm.

I'm increasing the size of my dataset to 800 megabytes of input motion data.

The idea is that if one adapts to "common" motion data, perhaps for other "common" motion data, the algorithm will also do a good job.

TheBashar
31st October 2007, 04:02
Just a quick suggestion from my days tinkering with python code and spam filter algorithm tweaking: Randomly divide your source material in to two piles. Use one for only for training and use the other one only for evaluation. So in this case, I'd recommend evolving against one set and using the other for more objective evaluation of the end results.

akupenguin
31st October 2007, 08:24
X: 0 Y: 0
Interesting that the evolved solution contains a NOP.

Dark Shikari
31st October 2007, 08:33
Interesting that the evolved solution contains a NOP.That's because the evolved solution didn't count speed. I have much improved my algorithm; it now has a QRESET command which terminates the search if there has been no change in the best MV found since the last RESET/QRESET.

The new best:

X: 14 Y: -4
X: 14 Y: 5
RESET
X: 2 Y: 0
X: 2 Y: -11
RESET
X: 0 Y: 5
X: 0 Y: -5
RESET
X: 14 Y: 3
RESET
X: 0 Y: 1
X: 0 Y: -2
X: 0 Y: 11
QRESET
X: 0 Y: 5
RESET
X: 0 Y: -2
X: 0 Y: 1
QRESET
X: 0 Y: 1
QRESET
QRESET
X: -4 Y: -2
X: 3 Y: 1
X: -1 Y: 2
X: 1 Y: 8
X: 8 Y: -5
X: -4 Y: 1
RESET
X: 1 Y: 3
X: 2 Y: 0
X: 4 Y: -4
X: -4 Y: -4
RESET
X: 6 Y: -1
X: -7 Y: -3
X: 0 Y: -3
X: 4 Y: -5
X: -2 Y: -1
X: 3 Y: 0
X: 7 Y: 1
X: 6 Y: 1
X: -4 Y: -1
X: -2 Y: -8
X: 0 Y: -3
X: 3 Y: 1
X: -5 Y: -2
X: -1 Y: -2
X: -1 Y: -2
QRESET
X: -1 Y: -4
X: -1 Y: -1
X: 1 Y: -4
X: -3 Y: -2
X: 4 Y: -1
X: -8 Y: -5

Note the wide search with RESETs first, then the shorter searches with QRESETs later. The dual QRESETs though ensure that it'll terminate early no matter what, which suggests that speed is being weighted too highly as a factor.

Dark Shikari
31st October 2007, 18:25
Here is a new "best" with speed being weighted less strongly in the heuristic:

X: 13 Y: -8
RESET
X: 1 Y: 0
RESET
X: 15 Y: -3
RESET
X: 13 Y: -10
RESET
X: 1 Y: 3
X: 2 Y: 8
RESET
X: 0 Y: 1
X: 0 Y: -1
X: 0 Y: 14
X: 15 Y: 3
X: 13 Y: -2
X: 14 Y: 7
RESET
X: 13 Y: 2
RESET
X: 1 Y: -1
X: 2 Y: 15
RESET
X: 1 Y: -5
X: 2 Y: 1
RESET
X: 1 Y: 0
RESET
X: 0 Y: 2
X: 0 Y: 9
RESET
X: 0 Y: 0
X: 0 Y: -3
RESET
X: 0 Y: -8
RESET
X: 0 Y: -1
X: -2 Y: 0
X: 0 Y: 1
X: 0 Y: -5
X: 0 Y: 5
X: -1 Y: -14
X: 0 Y: -11
QRESET
X: 0 Y: 1
X: 1 Y: 0
X: -1 Y: 0
X: 0 Y: -1
QRESET
X: -1 Y: 0
X: 0 Y: 1
X: 0 Y: -1
X: 1 Y: 0
X: -5 Y: 0
QRESET
X: -2 Y: 5
RESET
X: 0 Y: 1
X: 0 Y: -1
X: 1 Y: 0
X: -1 Y: 0
QRESET
X: -5 Y: 0
RESET
X: 0 Y: -3
X: -5 Y: 8
X: 11 Y: 1
RESET
X: -1 Y: 0
X: 0 Y: 1
X: 1 Y: 0
X: 0 Y: -1
QRESET
X: 1 Y: 0
X: 0 Y: 1
X: 5 Y: 1
X: -1 Y: 0
X: -14 Y: 0
X: 5 Y: -5
QRESET
X: 0 Y: 1
X: 1 Y: 0
X: 2 Y: -9
X: 0 Y: -2
X: 7 Y: -1
X: -2 Y: -1
X: -7 Y: -2
QRESET
X: 0 Y: 4
X: -2 Y: -3
X: 3 Y: 0
X: 2 Y: 15
RESET
X: 1 Y: 3
X: 1 Y: 0
X: -6 Y: 9
RESET
X: 0 Y: 1
X: -1 Y: 2
X: 0 Y: -1
X: 0 Y: 0
X: 2 Y: 1
X: -14 Y: -4
QRESET
It does almost as well as ME HEX in terms of speed and effectiveness in multiple sources, not just the one it was trained off of--so its not source-dependent (the training source was 360p Elephant's Dream, and the secondary source was a clip from Ocean's Eleven). However, I'm guessing the primary thing stopping it from begin as good or better than ME HEX is the inability to branch and make more intelligent decisions; in other words, it doesn't have enough instructions.

If I have time today, I will considering adding a new BRANCH instruction along with a converter application in the code that prints out C versions of these in addition to the versions I've posted here.

BRANCH syntax will be BRANCH [[X1,Y1][X2,Y2][X3,Y3][X4,Y4]]. In other words, it will hold an ArrayList of other Instructions that all point to various locations for motion searching. What it does is look at the X previous motion searches (between 2 and 4, perhaps). Let's say the second of those four had the best result; it'll search at X2,Y2, in that case.

Another idea would be a JMP instruction; if the best value has or has not changed since the last reset, JMP to a specific location in the algorithm.

If there are any other instructions you think will be useful, post your ideas.

TheBashar
1st November 2007, 08:43
If there are any other instructions you think will be useful, post your ideas.

Perhaps an "easier said than done" idea, but would it be possible to include whatever rudimentary instructions would be necessary to reproduce the hex algorithm? If so, it might be interesting to throw a hand-coded hex algorithm in using the in as a progenitor and see where it evolves from there.

Dark Shikari
1st November 2007, 09:53
Perhaps an "easier said than done" idea, but would it be possible to include whatever rudimentary instructions would be necessary to reproduce the hex algorithm? If so, it might be interesting to throw a hand-coded hex algorithm in using the in as a progenitor and see where it evolves from there.
It can already be done as it is, though not as efficiently, since it searches all 6 directions while the actual HEX searches only the 3 in the direction its moving to avoid overlap.

Branch could be used to avoid this, I guess.

I'm running the latest version of my program overnight on a 90MB data set. It now has a feature that automatically prints out the best algorithm every X iterations as C code; one can literally copy-paste it into x264. It includes the appropriate SAD_X3 and SAD_X4s also. Here's the result of a couple of hours of iteration:

COST_MV(omx + 3, omy + -1);
if(CHECK_MVRANGE(omx + 13, omy + 0)) COST_MV(omx + 13, omy + 0);
if(CHECK_MVRANGE(omx + 7, omy + 1)) COST_MV(omx + 7, omy + 1);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X4(4,2,1,-3,5,-4,5,2);
if(CHECK_MVRANGE(omx + 9, omy + 3)) COST_MV(omx + 9, omy + 3);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X4(5,-5,0,-1,5,-4,5,-5);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV(omx + 2, omy + 4);
if(CHECK_MVRANGE(omx + 8, omy + 11)) COST_MV(omx + 8, omy + 11);
COST_MV(omx + 2, omy + 0);
COST_MV(omx + 4, omy + 4);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X4(5,-5,0,4,2,-1,4,0);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV(omx + 0, omy + 2);
if(CHECK_MVRANGE(omx + 12, omy + 8)) COST_MV(omx + 12, omy + 8);
COST_MV(omx + 3, omy + 1);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X4(3,-1,5,-3,5,4,3,5);
COST_MV(omx + 3, omy + -4);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X4(1,-1,0,-2,5,3,1,5);
if(CHECK_MVRANGE(omx + 15, omy + -8)) COST_MV(omx + 15, omy + -8);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X3(1,1,5,3,4,-3);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X3(0,-2,0,3,1,-5);
if(CHECK_MVRANGE(omx + 0, omy + 12)) COST_MV(omx + 0, omy + 12);
if(CHECK_MVRANGE(omx + 0, omy + -14)) COST_MV(omx + 0, omy + -14);
COST_MV(omx + 5, omy + 0);
if(CHECK_MVRANGE(omx + 11, omy + -9)) COST_MV(omx + 11, omy + -9);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X4(1,1,0,0,0,-1,4,5);
COST_MV(omx + 3, omy + 0);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X3(-2,-1,1,-5,5,1);
COST_MV_X3(0,-4,5,-1,0,1);
COST_MV_X3(1,-1,-5,-1,2,-5);
COST_MV_X3(-5,5,0,-2,0,3);
COST_MV_X3(0,-5,0,5,5,-5);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X3(-1,3,-3,2,1,0);
COST_MV_X3(-1,4,-5,-5,0,-1);
COST_MV_X3(-2,-5,-1,2,0,1);
COST_MV_X3(-1,0,-4,0,-2,4);
COST_MV_X3(0,0,1,1,2,0);
COST_MV_X3(0,-3,1,5,5,2);
COST_MV_X3(0,-5,1,-3,-4,5);
COST_MV(omx + -2, omy + -3);
if((omx==bmx && omy == bmy) || !CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X3(0,5,-4,-1,-5,3);
COST_MV_X3(2,-4,-2,-3,-5,-5);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X3(0,1,1,1,0,-1);
COST_MV_X3(-1,0,1,-1,-5,0);
COST_MV_X3(-2,5,1,0,0,-5);
COST_MV(omx + 4, omy + -1);
if((omx==bmx && omy == bmy) || !CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X3(-3,0,2,1,5,-5);
COST_MV_X3(0,3,1,-2,5,2);
COST_MV(omx + -4, omy + -2);
if(!CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;
COST_MV_X4(0,-1,1,0,-1,0,0,1);
if((omx==bmx && omy == bmy) || !CHECK_MVRANGE(bmx,bmy)) break;
omx = bmx;
omy = bmy;

Sagekilla
1st November 2007, 16:34
This may be a bit off topic, but I was always wondering if it were possible to convert certain portions of the code to run off a GPU, like maybe the deblock filter or motion estimation. If it were possible, I could imagine the GPU doing a esa motion search would be a magnitude or two faster than even a multicore setup on a fairly high end GPU (Radeon X1k, HD 2k, 7xxxx or 8xxxx).

Would it be possible? Or is it just simply impossible because of limitations of today's GPUs or the simple fact that it would be difficult to port over multiple GPUs?

akupenguin
1st November 2007, 16:46
This may be a bit off topic, but I was always wondering if it were possible to convert certain portions of the code to run off a GPU, like maybe the deblock filter or motion estimation.
Possible, but only for simple algorithms. ESA, not even Successive Elimination. Also has to search all partitions of all reference frames because you can't run early termination without syncing to the mode decision, and uses worse mv predictors because multiple blocks have to run in parallel.
So it had better be two orders of magnitude faster, because you'd spend one of those orders on a worse algorithm.

Sagekilla
1st November 2007, 16:56
Possible, but only for simple algorithms. ESA, not even Successive Elimination. Also has to search all partitions of all reference frames because you can't run early termination without syncing to the mode decision, and uses worse mv predictors because multiple blocks have to run in parallel.
So it had better be two orders of magnitude faster, because you'd spend one of those orders on a worse algorithm.

Yeah, I know that it's pretty bad considering that it's literally searching every single pixel for any motion but I know that's what GPUs are good at: processing a huge amount of data in parallel very quickly. I just thought it would be best to go with esa since it would provide the best quality possible. Correct me if I'm wrong, but I'm pretty sure there's nothing better quality wise than esa

burfadel
1st November 2007, 18:26
Running some filters off the GPU can already be done of course. I came across this:
http://briefcase.yahoo.co.jp/gpu25clone

Unfortunately my Japanese isn't that good... but I do understand the filter file has:
GPU_BilinearResize
GPU_LanczosResize
GPU_Convolution3d
GPU_TemporalSmoother
GPU_ColorYUY2

All you need now is the rest of the filters to be GPU, including the decoder filter!

Ranguvar
1st November 2007, 19:39
Running some filters off the GPU can already be done of course. I came across this:
http://briefcase.yahoo.co.jp/gpu25clone

Unfortunately my Japanese isn't that good... but I do understand the filter file has:
GPU_BilinearResize
GPU_LanczosResize
GPU_Convolution3d
GPU_TemporalSmoother
GPU_ColorYUY2

All you need now is the rest of the filters to be GPU, including the decoder filter!Okay, now we're really off topic, but most of those are fast enough that the bottleneck at any decent encoding speed will be the encoder, and even if it's just filterin gthey should do 30fps fine :)

Some complex denoise and sharpeners, as well as HD filtering, could benefit though. I shall try!

burfadel
2nd November 2007, 02:21
Okay, now we're really off topic, but most of those are fast enough that the bottleneck at any decent encoding speed will be the encoder, and even if it's just filterin gthey should do 30fps fine :)

Some complex denoise and sharpeners, as well as HD filtering, could benefit though. I shall try!

It was only meant to be partly off topic! I was pointing more to since the GPU, due to the nature of the encoding, can't effienciently help x264, a better option would be to offload some of the CPU cost of the decoding and filtering to the GPU which leaves the whole CPU to do the encoding :)

Ranguvar
2nd November 2007, 04:23
It was only meant to be partly off topic! I was pointing more to since the GPU, due to the nature of the encoding, can't effienciently help x264, a better option would be to offload some of the CPU cost of the decoding and filtering to the GPU which leaves the whole CPU to do the encoding :)I see now :)

But I meant I was making it even more off topic :p

lucassp
8th November 2007, 07:56
How about RapidMind, would it help x264 do some GPU processing? What about RV670's Double Precision FP?? Thank you!

Sharktooth
9th November 2007, 03:51
uhm... AMD FireStream 9170: http://gizmodo.com/gadgets/blazing/amd-launches-firestream-9170-first-stream-processor-with-double+precision-floating-point-technology-320290.php

Inventive Software
9th November 2007, 04:08
Something similar from NVIDIA: http://www.nvidia.com/object/tesla_gpu_processor.html

AK_RAGE
29th November 2007, 23:02
There seems to be some work done on "adaptive ME" which would evolve as it is encoding.

See http://www.springerlink.com/content/6034822863562v41/

Dark Shikari
29th November 2007, 23:05
There seems to be some work done on "adaptive ME" which would evolve as it is encoding.

See http://www.springerlink.com/content/6034822863562v41/If that article is what I think it is (IIRC I already read it), its not really related to the topic of this thread. Its an algorithm that probably is not very useful in practice... most like almost everything that comes out of the theoretical world of H.264 :(

akupenguin
29th November 2007, 23:14
(1) Any motion search algorithm that's proposed in a paper that compares its speed against full search is too slow. No exceptions.

AK_RAGE
29th November 2007, 23:28
I personally dont know enough to truely judge the paper.
Although they do compare their algorithm to the JM10 encoder they use as reference.

If nothing else, they site a few other interesting-looking adaptive search algs.

Here is a link to the full text:
http://localhostr.com/files/15a21ebfe8885908c49f.pdf

akupenguin
29th November 2007, 23:33
(2) Any algorithm that's proposed in a paper that compares its speed against JM is too slow.

drmpeg
30th November 2007, 04:05
When are you guys going to attempt telescoping ME? If you're looking for speed, it's the way to go. It's been used on real-time hardware MPEG-2 encoders since the dawn of time.

Ron

Dark Shikari
30th November 2007, 04:18
When are you guys going to attempt telescoping ME? If you're looking for speed, it's the way to go. It's been used on real-time hardware MPEG-2 encoders since the dawn of time.

Ron

Another class of fast ME methods, known as Fast ME based on assumption of spatial or temporal correlation of MV is based on the assumption of a correlation between values of optimum motion vectors for spatially neighboring macroblocks on the same frame, or temporally neighboring macroblocks for the same location on the sequential frames. One widely known example of this method is a telescopic search, which assumes that the motion vector for a particular macroblock can be a good initial approximation for the motion vector search on sequential frames.Um, duh? They're called predictors, and x264 has a whole heapload of them.

drmpeg
30th November 2007, 05:21
Um, duh? They're called predictors, and x264 has a whole heapload of them.
Sorry, I was too vague. I meant staged telescoping search on sub-sampled images. That is, you do a 1st stage search on a quarter resolution image, 2nd stage on a half resolution image based on the target from the 1st stage, and so on.

As I said, this technique is popular with real-time encoders (since ME is always the bottleneck, at least with MPEG-2). You get a large search range from the 1st stage, but with only 1/16 of the operations.

As an old C-Cube guy, I've never worked on an encoder that didn't use this method.

Ron

EDIT: a 1/4 res image would take only 1/16 of the operations.

Dark Shikari
30th November 2007, 05:28
Sorry, I was too vague. I meant staged telescoping search on sub-sampled images. That is, you do a 1st stage search on a quarter resolution image, 2nd stage on a half resolution image based on the target from the 1st stage, and so on.

As I said, this technique is popular with real-time encoders (since ME is always the bottleneck, at least with MPEG-2). You get a large search range from the 1st stage, but with only 1/4 of the operations.

As an old C-Cube guy, I've never worked on an encoder that didn't use this method.

RonThis is likely less useful than you'd think it is, since 16x16 SADs are only 48 operations on a Core 2; thus a lot of the operations involved are simply overhead, meaning that the smaller a block is, the more overhead there is relative to actual time spent comparing the two blocks.

This could theoretically be useful with something like UMH--run the wide search on a lowres version of the video and the final HEX search on the fullres.

drmpeg
30th November 2007, 05:59
This is likely less useful than you'd think it is, since 16x16 SADs are only 48 operations on a Core 2; thus a lot of the operations involved are simply overhead, meaning that the smaller a block is, the more overhead there is relative to actual time spent comparing the two blocks.

This could theoretically be useful with something like UMH--run the wide search on a lowres version of the video and the final HEX search on the fullres.
You don't reduce the size of the SAD. You still do 16x16 on the quarter res image. An exhaustive search takes 1/16 the operations of the full size search.

Anyways, I just thought I'd toss this out there since it's used on millions (over the lifetime of MPEG-2, probably 100's of millions) of real-time MPEG-2 encoder sockets.

Ron

Dark Shikari
30th November 2007, 06:04
You don't reduce the size of the SAD. You still do 16x16 on the quarter res image.Totally useless; such an operation is really a 64x64 SAD, and motion is almost always smaller-scale than that.

That would basically involve a single motion search to get extra predictors for 16 blocks at a time... that seems somewhat silly.

drmpeg
30th November 2007, 06:56
Totally useless; such an operation is really a 64x64 SAD, and motion is almost always smaller-scale than that.

That would basically involve a single motion search to get extra predictors for 16 blocks at a time... that seems somewhat silly.
You're not trying to get the final vector from the 1st stage. It's just to find the best (and smaller) area of the image to search in later stages. You can read the C-Cube patent here:

http://www.freepatentsonline.com/6108039.html

It's all just FYI. You can take the blue pill or the red pill.;)

Ron

akupenguin
30th November 2007, 14:33
You don't reduce the size of the SAD. You still do 16x16 on the quarter res image. An exhaustive search takes 1/16 the operations of the full size search.

Why would you even bother with a 1/16 size search if you're just going to waste that cpu time again making it exhaustive? A 1/16 size exhaustive search is about the same speed as a full size hex search, so it's definitely not worthwhile just for a very coarse predictor.