8th November 2016, 21:16   #6
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by Ataril
Thanks again for all the valuable information that you gave!

And if we go further, there are more obscure points concerning macroblock sizes. As far as I understand, the encoder chooses the appropriate size (such as 16x16, 8x8 or smaller) depending on the level of detail in that area of the frame (in order to provide better quality and compression of the video). For a detailed, high-frequency area it is reasonable to use a smaller macroblock size, and vice versa.
But how does it evaluate where the smooth areas in the frame are and where they are not? How big should the differences between these areas be for the encoder to decide on one macroblock size or another? Should it compare the values in the matrix, or brightness, or chroma, or both?
The input frame is in the spatial domain, so each value in an N×N block represents the "brightness" (luminance) or "color" (chrominance) of a pixel/sample. Those "pixel" values are transformed into the frequency domain because, there, the same information can usually be represented with only a few significant frequency coefficients. In other words: You still have N×N values (frequency coefficients) after the transform, but most of them are very close to zero, and most of those actually become exactly zero after the quantization stage. Finally, thanks to the entropy coding stage (e.g. Huffman coding or arithmetic coding), those long runs of zeros become extremely "cheap" to store in terms of bit cost.

Example of DCT transform:
http://img.tomshardware.com/us/1999/...part_3/dct.gif
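
To make the transform and quantization steps concrete, here is a minimal Python sketch. It is not how any particular encoder does it: it assumes a plain floating-point orthonormal DCT-II and one uniform quantizer step for all frequencies, whereas real codecs typically use integer approximations and more elaborate scaling. The function names, the sample block and the step size are made up for illustration.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis, one basis vector per row."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def dct2(block):
    """2D DCT of a square block: transform vertically, then horizontally."""
    n = len(block)
    c = dct_matrix(n)
    tmp = [[sum(c[u][y] * block[y][x] for y in range(n)) for x in range(n)]
           for u in range(n)]
    return [[sum(tmp[u][x] * c[v][x] for x in range(n)) for v in range(n)]
            for u in range(n)]

def quantize(coeffs, step):
    """Uniform quantization (illustrative assumption): divide and round."""
    return [[round(v / step) for v in row] for row in coeffs]

def count_nonzero(matrix):
    return sum(1 for row in matrix for v in row if v != 0)

# Hypothetical 8x8 luma block: a smooth horizontal brightness ramp.
block = [[100 + 2 * x for x in range(8)] for _ in range(8)]

coeffs = dct2(block)
levels = quantize(coeffs, step=16)

print("non-zero quantized coefficients:", count_nonzero(levels), "of 64")
for row in levels:
    print(row)
```

All 64 input samples are non-zero, but after transform and quantization only a handful of values in the top-left (low-frequency) corner survive. The remaining zeros are what the entropy coder can store cheaply.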

Now, as a "rule of thumb", using larger transform blocks is advantageous in "flat" image regions. Simply put, that's because a very large image area can be covered with a single block that, after the transform to frequency domain, has only a few non-zero coefficients. But that won't work well in "detailed" image regions! A large block would need too many non-zero coefficients to provide a reasonable approximation of the "detailed" area. Smaller transform blocks are advantageous there.
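
The "flat region" half of that rule of thumb can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (floating-point orthonormal DCT-II, one uniform quantizer step, hypothetical helper names); it covers a perfectly flat 16×16 area either with one 16×16 block or with four 8×8 blocks and counts the non-zero quantized coefficients each way.

```python
import math

def dct2(block):
    """2D orthonormal DCT-II of a square block (floating point)."""
    n = len(block)
    c = [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
          * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
          for i in range(n)] for k in range(n)]
    tmp = [[sum(c[u][y] * block[y][x] for y in range(n)) for x in range(n)]
           for u in range(n)]
    return [[sum(tmp[u][x] * c[v][x] for x in range(n)) for v in range(n)]
            for u in range(n)]

def nonzero_after_quant(block, step):
    """Count coefficients that are still non-zero after uniform quantization."""
    return sum(1 for row in dct2(block) for v in row if round(v / step) != 0)

def split(region, size):
    """Cut a square region into size x size sub-blocks."""
    n = len(region)
    return [[row[x:x + size] for row in region[y:y + size]]
            for y in range(0, n, size) for x in range(0, n, size)]

# Hypothetical flat 16x16 area: every sample has the same value.
flat = [[120] * 16 for _ in range(16)]
step = 16

as_one_block = nonzero_after_quant(flat, step)
as_four_blocks = sum(nonzero_after_quant(b, step) for b in split(flat, 8))

print("one 16x16 block :", as_one_block, "non-zero coefficient(s)")
print("four 8x8 blocks :", as_four_blocks, "non-zero coefficient(s)")
```

Each block of a flat area needs only its DC coefficient, so the single large block gets away with fewer coefficients than the four small ones. For detailed areas the outcome depends on the actual content and the quantizer, which is why the sketch does not try to show that case with made-up data.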

How does the encoder know what transform size to use in a specific image location? Again: The standard does not dictate that! It's up to the encoder developers to figure out such things, using whatever methods/ideas they deem appropriate.

(A typical approach is called "rate-distortion optimization", aka RDO: the encoder actually tries out many possible decisions and, in the end, keeps the one that gives the best "error vs. bit-cost" trade-off.)
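
As a rough sketch of that idea in Python: RDO is commonly formulated as minimizing a cost J = D + λ·R, where D is the distortion (error), R the bit cost and λ a weight that sets the trade-off. The candidate names and all distortion/bit numbers below are invented for illustration; a real encoder would measure them by actually coding the block each way.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    distortion: float  # e.g. sum of squared errors vs. the source block
    bits: int          # bits needed to code the block with this choice

def rd_cost(candidate, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return candidate.distortion + lam * candidate.bits

def choose(candidates, lam):
    """Keep the candidate with the lowest rate-distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))

# Hypothetical measurements for one 16x16 area (numbers are made up).
candidates = [
    Candidate("one 16x16 transform", distortion=900.0, bits=40),
    Candidate("four 8x8 transforms", distortion=400.0, bits=95),
    Candidate("sixteen 4x4 transforms", distortion=250.0, bits=180),
]

for lam in (1.0, 5.0, 10.0):
    best = choose(candidates, lam)
    print(f"lambda={lam}: {best.name} (J={rd_cost(best, lam)})")
```

With a small λ (bits are "cheap") the most accurate but most expensive option wins; as λ grows, the decision shifts towards the larger, cheaper transform. The same comparison works for any other decision the encoder has to make.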

Last edited by LoRd_MuldeR; 8th November 2016 at 21:31.