Quote:
Originally Posted by Ataril
Thanks again for all the valuable information that you gave!
And if we go further, there are more obscure fields concerning macroblock sizes. As far as I understand, the coder chooses an appropriate size (such as 16x16, 8x8 or smaller) depending on the level of detail in that area of the frame (in order to provide better quality and compression of the video). For a detailed, high-frequency area it is reasonable to use a smaller macroblock size, and vice versa.
But how does the coder evaluate where the smooth areas in the frame are and where they are not? How large should the differences between these areas be for the coder to decide to use one macroblock size or another? Should it compare values in the luma matrix, the chroma matrix, or both?
The input frame is in the spatial domain, so each value in an N×N block represents the "brightness" (luminance) or "color" (chrominance) of a pixel/sample. Those "pixel" values are transformed into the frequency domain, because there the same information can usually be represented with only a few non-zero frequency coefficients. In other words: you still have N×N values (frequency coefficients) after the transform, but most of those values are very close to zero. And most values (coefficients) actually become zero after the quantization stage. Finally, thanks to the entropy coding stage (e.g. via Huffman coding or arithmetic coding), those long sequences of zeros become extremely "cheap" to store, in terms of bit cost.
Example of DCT transform:
http://img.tomshardware.com/us/1999/...part_3/dct.gif
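To make this concrete, here is a minimal NumPy sketch (not from any real codec; the block content and the quantization step of 16 are made up for illustration) of an 8×8 DCT followed by uniform quantization, showing how a smooth block collapses to just a couple of non-zero coefficients:

```python
import numpy as np

N = 8
# Orthonormal DCT-II basis matrix: row k holds the k-th cosine basis vector
n = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

# A "flat" 8x8 luma block: a gentle horizontal brightness ramp (made-up values)
block = np.tile(np.linspace(100.0, 120.0, N), (N, 1))

coeffs = C @ block @ C.T             # forward 2-D DCT (spatial -> frequency domain)
quantized = np.round(coeffs / 16.0)  # uniform quantization, step size 16 (illustrative)

print(np.count_nonzero(quantized), "non-zero coefficients out of", N * N)
```

All 64 spatial values are non-zero, yet after transform and quantization only a handful of coefficients remain, and the runs of zeros are what entropy coding compresses so cheaply.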
Now, as a "rule of thumb", using larger transform blocks is advantageous in "flat" image regions. Simply put, that's because a very large image area can be covered with a single block that, after the transform to the frequency domain, has only a few non-zero coefficients. But that won't work well in "detailed" image regions! There, a large block would need too many non-zero coefficients to provide a reasonable approximation of the detailed area, so smaller transform blocks are advantageous.
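The same kind of sketch illustrates that rule of thumb (again, the block sizes, pixel values and quantization step are made up): transform a 16×16 "flat" block and a 16×16 "detailed" block and compare how many coefficients survive quantization:

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II basis matrix of size N x N."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] = np.sqrt(1.0 / N)
    return C

N = 16
C = dct_matrix(N)

# "Flat" region: smooth diagonal gradient; "detailed" region: pseudo-random texture
flat = np.add.outer(np.linspace(60.0, 80.0, N), np.linspace(0.0, 10.0, N))
detailed = np.random.default_rng(0).uniform(0.0, 255.0, (N, N))

for name, block in (("flat", flat), ("detailed", detailed)):
    quantized = np.round(C @ block @ C.T / 16.0)  # 2-D DCT + uniform quantization
    print(name, "->", np.count_nonzero(quantized), "non-zero of", N * N)
```

The single large transform is a great fit for the flat block (a few coefficients describe it), while the detailed block leaves most of its coefficients non-zero, which is exactly when an encoder would rather split into smaller blocks.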
How does the encoder know which transform size to use in a specific image location? Again: the standard does not dictate that! It's up to the encoder developers to figure out such things, using whatever methods/ideas they deem appropriate. (A typical approach is called "rate-distortion optimization", aka RDO, which will actually try out many possible decisions and, in the end, keep the decision that resulted in the best "error vs. bit-cost" trade-off.)