Old 13th November 2016, 17:24   #8  |  Link
Ataril
Registered User
 
Join Date: Oct 2016
Posts: 8
Quote:
Originally Posted by LoRd_MuldeR View Post
The input frame is in spatial domain, so each value in a N×N block represents the "brightness" (luminance) or "color" (chrominance) of a pixel/sample. Those "pixel" values are transformed into frequency domain, because, in frequency domain, the same information can usually be represented with only a few non-zero frequency coefficients. In other words: You still have N×N values (frequency coefficients) after the transform, but most of those values are very close to zero. And most values (coefficients) actually become zero after the quantization stage. Finally, thanks to the entropy coding stage (e.g. via Huffman coding or arithmetic coding), those long sequences of zero's become extremely "cheap" to store, in terms of bit cost.

Example of DCT transform:
http://img.tomshardware.com/us/1999/...part_3/dct.gif
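The quoted explanation can be checked numerically. Below is a minimal numpy sketch of an 8x8 DCT followed by quantization; the `dct2` helper and the flat quantizer step of 16 are my own illustrative choices (real codecs use per-frequency quantization tables), but it shows how a smooth block collapses to just a handful of non-zero coefficients:

```python
import numpy as np

def dct2(block):
    """Naive orthonormal 2-D DCT-II (the transform family used in JPEG/MPEG)."""
    n = block.shape[0]
    k = np.arange(n)
    # basis matrix: C[u, x] = a(u) * cos((2x+1) * u * pi / (2n))
    c = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    a = np.full(n, np.sqrt(2.0 / n))
    a[0] = np.sqrt(1.0 / n)
    C = a[:, None] * c
    return C @ block @ C.T

# an 8x8 block containing a smooth horizontal gradient (typical of natural images)
block = np.tile(np.arange(8, dtype=float) * 16 + 64, (8, 1))
coeffs = dct2(block - 128)        # level-shift to be zero-centred, then transform
quant = np.round(coeffs / 16)     # crude uniform quantizer
print(np.count_nonzero(quant), "non-zero of 64 quantized coefficients")
```

Most of the 64 quantized values come out as zero, which is exactly what makes the subsequent run-length/entropy coding so cheap.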
Yes, I've read about DCT and quantization. What was bothering me is that I didn't really understand how an RGB hex color code gets transformed into separate brightness and color values.
Let's say we have a 16x16 block of pixels.
Brightness is calculated first using the formula Y′ = 0.299 R′ + 0.587 G′ + 0.114 B′ (a weighted sum of the red, green and blue components, with weights 0.299 for red, 0.587 for green and 0.114 for blue, according to the CCIR 601 standard). Its range runs from 16 (black) to 235 (white), and we get a matrix of 256 values.
For the chroma blocks we take the color values in the range 0-255 for blue and for red, average them over neighbouring pixels (if we use 4:2:0 or 4:2:2 subsampling), and construct two matrices using the following formulas for each chroma value:
Cb = 0.564 (B - Y)
Cr = 0.713 (R - Y)
Before being displayed on the screen, the picture has to be converted from the YCbCr color space back into the familiar RGB color space.
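To make that concrete, here's a minimal numpy sketch using exactly the weights above. The +128 offset on Cb/Cr (so they fit an unsigned 8-bit range) and the simple 2x2 averaging for 4:2:0 are my own illustrative choices, not taken from any particular codec:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert 8-bit R'G'B' pixels to Y'CbCr with the Rec. 601 weights."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y) + 128.0   # offset so chroma is centred on 128
    cr = 0.713 * (r - y) + 128.0
    return np.stack([y, cb, cr], axis=-1)

def subsample_420(chroma):
    """4:2:0 subsampling: average each 2x2 block of a chroma plane."""
    return (chroma[0::2, 0::2] + chroma[0::2, 1::2]
          + chroma[1::2, 0::2] + chroma[1::2, 1::2]) / 4.0

# a 16x16 block of a single RGB color, e.g. pure red
rgb = np.zeros((16, 16, 3), dtype=np.uint8)
rgb[..., 0] = 255
ycc = rgb_to_ycbcr(rgb)
print(ycc[0, 0])                         # Y', Cb, Cr of one pixel
print(subsample_420(ycc[..., 1]).shape)  # chroma plane shrinks to 8x8
```

Note how the 16x16 luma plane stays at full resolution while each chroma plane drops to 8x8, which is where the bit savings of 4:2:0 come from.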

Found the answer here:
https://en.wikipedia.org/wiki/Luma_(video)
And I once again re-read the second chapter of Richardson's book; it has quite detailed information.
Quote:
Originally Posted by LoRd_MuldeR View Post
How does the encoder know what transform size to use in a specific image location? Again: The standard does not dictate that! It's up to the encoder developers to figure out such things, using whatever methods/ideas they deem appropriate

(A typical approach is called "rate-distortion-optimization", aka RDO, which will actually try out many possible decisions and, in the end, keep the decision that resulted in the best "error vs. bit-cost" trade-off)
Thanks for this new knowledge; I'm not sure I'd heard about RDO before.


And what about the codecs built into mobile devices? How do they operate? As far as I understand, they don't get a chance to evaluate the video before coding, because everything has to be done on the fly (unlike desktop encoders, which can pass over the video one or more times to decide how best to distribute bitrate among frames). How do they manage unpredictable video? Do they just use some default parameters?

Last edited by Ataril; 13th November 2016 at 17:27.