Questions about motion control [Archive]

Ozymandis

24th April 2002, 12:41

When a background is shifting, yet certain parts of the video are remaining stationary, or moving in an opposing direction, are more bits allocated to the stationary or opposing segments?

It seems that distinctions are made between the background scenery and the foreground (actors, things in focus, etc.) somehow, but I'm just not sure what criteria the codec uses to decide. I suppose one method would be to scan blocks of the frame, identify a pattern for that block, and then determine whether that block is translating in the same direction (is this where Fourier comes in?) as the majority of the frame, or whether it abruptly disappears (scene change). (I'm pulling this out of my ass, so please correct me :) )

If that is correct, then... is the entire frame representation block-based, or is it possible for abstract geometrical vectors to be used? (i.e., can rectangles or curved portions be addressed for, say, a large black portion or something?) I assume that the video is represented as a differential between blocks of different frames...? How is that done? Also, what are the criteria for deciding what gets the most bits? Lightness/darkness, slow/fast action, umm... whether a block is an "edge"? Any others?

If anyone could answer any of my questions, or point me to a good, detailed explanation of how a divx-like codec operates internally, I'd be interested. (I'm probably not going to buy an expensive book on it though...I'm just curious :))

Thanks,
Graham

-h

24th April 2002, 13:08

When a background is shifting, yet certain parts of the video are remaining stationary, or moving in an opposing direction, are more bits allocated to the stationary or opposing segments?

All parts of the image are treated equally, by which I mean every block in the frame is quantized with the same quantizer. Typically, a moving block will still require more bits in the bitstream (despite looking as good as the stationary object, since the quantizer is the same for both blocks), because the stationary block has less residual error relative to the reference frame.
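To make that point concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not XviD's code: it skips the DCT a real MPEG-4 encoder applies to the residual, uses a one-dimensional "block" of eight samples, and counts non-zero quantized values as a crude stand-in for bit cost. The names `quantize` and `coded_cost` are made up for illustration.

```python
def quantize(residual, quantizer):
    """Uniform quantization: divide each sample and truncate toward zero."""
    return [int(r / quantizer) for r in residual]


def coded_cost(block, reference, quantizer):
    """Count non-zero quantized residual samples (a rough proxy for bits)."""
    residual = [c - r for c, r in zip(block, reference)]
    return sum(1 for q in quantize(residual, quantizer) if q != 0)


# Hypothetical sample data: one reference block and two current blocks.
reference = [10, 20, 30, 40, 50, 60, 70, 80]
stationary = list(reference)                 # predicted perfectly
moving = [12, 35, 28, 61, 44, 79, 66, 95]    # predicted imperfectly

quantizer = 4  # the same quantizer is applied to both blocks
print("stationary:", coded_cost(stationary, reference, quantizer))
print("moving:    ", coded_cost(moving, reference, quantizer))
```

Both blocks go through exactly the same quantizer; the moving block simply has more residual left over to code.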

But with regard to what I think you're getting at, XviD does no fancy bit allocation based on motion.

It seems that distinctions are made between the background scenery and the foreground (actors, things in focus, etc.) somehow, but I'm just not sure what criteria the codec uses to decide. I suppose one method would be to scan blocks of the frame, identify a pattern for that block, and then determine whether that block is translating in the same direction (is this where Fourier comes in?) as the majority of the frame, or whether it abruptly disappears (scene change). (I'm pulling this out of my ass, so please correct me :) )

The task of deciding whether a block is stationary, moving, or "new" (as in just appeared), is performed by the codec's motion estimation system. In XviD's case, this is either PMVFAST or EPZS(SQ).

They are fairly simple when you get down to it: they just search around the reference frame (for P-frames, this is the previous frame) to find a block which "looks pretty close" to the block we're currently trying to compress. If we can't find a good match, we assume the block has just appeared, and store it as such (these are intra blocks, which don't reference any other frame). If enough blocks are coded as "intra", we assume that an entire scene change has occurred, and we re-encode the frame as an I-frame.
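The search described above can be sketched roughly as follows. This is an illustration only: it uses a brute-force full search with sum of absolute differences (SAD) as the "looks pretty close" measure, whereas PMVFAST and EPZS are faster predictive searches. The function names, the block size of 4, and the values of `intra_threshold` and `scene_change_ratio` are all assumptions picked for the toy example, not values taken from XviD.

```python
def sad(cur, ref, bx, by, rx, ry, size):
    """Sum of absolute differences between a block in cur and one in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[ry + y][rx + x])
    return total


def best_match(cur, ref, bx, by, size, search_range):
    """Try every offset within search_range; return (dx, dy, sad) of the best."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate falls outside the reference frame
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def choose_frame_type(cur, ref, size=4, search_range=2,
                      intra_threshold=200, scene_change_ratio=0.5):
    """Classify each block as inter or intra, then pick 'P' or 'I'."""
    decisions = []
    for by in range(0, len(cur) - size + 1, size):
        for bx in range(0, len(cur[0]) - size + 1, size):
            dx, dy, cost = best_match(cur, ref, bx, by, size, search_range)
            if cost > intra_threshold:
                decisions.append(("intra", None))      # no good match found
            else:
                decisions.append(("inter", (dx, dy)))  # store motion vector
    intra = sum(1 for kind, _ in decisions if kind == "intra")
    frame_type = "I" if intra > scene_change_ratio * len(decisions) else "P"
    return frame_type, decisions


# Toy 8x8 frames: `shifted` is `ref` moved one pixel to the right.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]
shifted = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
new_scene = [[255] * 8 for _ in range(8)]

print(best_match(shifted, ref, 4, 4, 4, 2))
print(choose_frame_type(new_scene, ref)[0])
```

The motion vector points from the current block back to where its content sits in the reference frame, so content that moved one pixel to the right is found at an offset of one pixel to the left.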

If that is correct, then... is the entire frame representation block-based, or is it possible for abstract geometrical vectors to be used? (i.e., can rectangles or curved portions be addressed for, say, a large black portion or something?) I assume that the video is represented as a differential between blocks of different frames...? How is that done? Also, what are the criteria for deciding what gets the most bits? Lightness/darkness, slow/fast action, umm... whether a block is an "edge"? Any others?

XviD currently segments the entire frame into 16x16-pixel blocks (called macroblocks) and tries to match them against the previous frame, sometimes splitting the 16x16 block into four 8x8 blocks and matching all four separately (INTER4V mode). It is possible to extract abstract objects (dubbed sprites); however, it's incredibly complex and may never make it into XviD.
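A rough Python sketch of the one-vector versus four-vector choice follows. The decision rule here (split when the four sub-block errors plus a `split_penalty` beat the single-vector error) is an assumption made for illustration, as are the function names; XviD's actual mode decision may differ. The demo uses a 4x4 "macroblock" with 2x2 sub-blocks to keep the data small, but `size=16` gives the 16x16 / 8x8 case.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences for the block at (bx, by) offset by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def search(cur, ref, bx, by, size, search_range):
    """Full search; return (cost, (dx, dy)) of the best in-bounds candidate."""
    h, w = len(ref), len(ref[0])
    candidates = [
        (sad(cur, ref, bx, by, dx, dy, size), (dx, dy))
        for dy in range(-search_range, search_range + 1)
        for dx in range(-search_range, search_range + 1)
        if 0 <= bx + dx and bx + dx + size <= w
        and 0 <= by + dy and by + dy + size <= h
    ]
    return min(candidates)


def choose_mode(cur, ref, bx, by, size=16, search_range=2, split_penalty=0):
    """Pick one vector for the whole block or four vectors for its quarters."""
    one_cost, one_mv = search(cur, ref, bx, by, size, search_range)
    half = size // 2
    # Order: top-left, top-right, bottom-left, bottom-right.
    four = [search(cur, ref, bx + ox, by + oy, half, search_range)
            for oy in (0, half) for ox in (0, half)]
    four_cost = sum(cost for cost, _ in four) + split_penalty
    if four_cost < one_cost:
        return "INTER4V", [mv for _, mv in four]
    return "INTER", [one_mv]


# Toy 8x8 frames: in `torn`, the top half moved right and the bottom half left.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]
uniform = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
torn = [[ref[y][max(x - 1, 0)] if y < 4 else ref[y][min(x + 1, 7)]
         for x in range(8)] for y in range(8)]

print(choose_mode(uniform, ref, 2, 2, size=4))
print(choose_mode(torn, ref, 2, 2, size=4))
```

When the whole block moves together, a single vector is enough; when its halves move in different directions, no single vector fits and the split wins.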

There are no criteria for assigning bits to one area over another, unless you use lumi masking, which quantizes very dark/light areas more heavily than mid-range areas.
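The idea behind lumi masking can be sketched like this. It is a minimal illustration only: the `dark`, `bright`, and `boost` values and the function names are made up, and XviD's real masking logic is not reproduced here. The only hard fact used is that MPEG-4 quantizers top out at 31.

```python
def block_mean(block):
    """Average luma of a block given as a list of rows."""
    samples = [p for row in block for p in row]
    return sum(samples) / len(samples)


def masked_quantizer(block, base_quantizer,
                     dark=40, bright=215, boost=2, max_quantizer=31):
    """Raise the quantizer for very dark or very bright blocks."""
    mean = block_mean(block)
    if mean < dark or mean > bright:
        # Detail is harder to see here, so quantize more heavily.
        return min(base_quantizer + boost, max_quantizer)
    return base_quantizer


# Hypothetical flat 4x4 blocks at three brightness levels.
dark_block = [[10] * 4 for _ in range(4)]
mid_block = [[128] * 4 for _ in range(4)]
bright_block = [[240] * 4 for _ in range(4)]

for name, block in (("dark", dark_block), ("mid", mid_block),
                    ("bright", bright_block)):
    print(name, masked_quantizer(block, base_quantizer=4))
```

Mid-range blocks keep the frame's base quantizer, so the bits saved in the extremes go where errors are easier to see.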

If anyone could answer any of my questions, or point me to a good, detailed explanation of how a divx-like codec operates internally, I'd be interested. (I'm probably not going to buy an expensive book on it though...I'm just curious :))

Hrm, I don't know of any guides, but a Google search returned this:

http://www.cmlab.csie.ntu.edu.tw/cml/dsp/training/coding/mpeg1/

It's very well written, and although it refers to MPEG-1, 95% of the concepts hold true for MPEG-4.

-h

chemmajik

24th April 2002, 13:33

http://ptolemy.eecs.berkeley.edu/~vissers/hot_chips_rotated.pdf
Here's another general-terms PDF I found from Visser. It covers a lot of ground but is not MPEG-4 specific, though it gives you illustrated definitions. That MPEG-1 link above is a pretty good one. Might help me some too.

duartix

24th April 2002, 18:03

Yes, I also found the http://www.cmlab.csie.ntu.edu.tw/cml/dsp/training/coding/mpeg1/ link very straightforward and informative, but I would like to dive into Visser's link; pity it's not working :(

FAMOUS

24th April 2002, 18:18

hello,
is the http://www.cmlab.csie.ntu.edu.tw/cml/dsp/training/coding/mpeg1/
guide available in German?

Ozymandis

25th April 2002, 03:35

Thanks -h! Your explanation helped a lot, and the mentioned URLs look informative too.