Quote:
Originally Posted by Ataril
I get it, H.264 knows nothing about the RGB color space. But a complete codec (such as ffmpeg, for example) can start working with any video format, not only YCbCr, am I wrong? It should first convert the input into a recognizable color space, so it needs some converter functionality in addition to the codec's functionality.
|
Most encoder libraries expect that the application already passes the input frames in the "proper" color format. For lossy video encoders that's typically a YCbCr format (most often with 4:2:0 or 4:2:2 chroma subsampling).
But that's really implementation-specific! An encoder library could also accept various color formats and take care of the required color-space conversion internally.
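As a sketch of what that conversion step looks like: the following uses the full-range BT.601 matrix (the one JPEG uses); the function name and the per-pixel interface are just for illustration, real converters work on whole planes.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr conversion (all values 0-255),
    as used by JPEG. Video codecs often use limited-range variants."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return round(y), round(cb), round(cr)
```

For example, pure white (255, 255, 255) maps to Y = 255 with both chroma channels at the neutral value 128.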
FFmpeg is an application that does a whole lot of different things, including decoding/encoding, color-space conversion, resizing and so on. It uses various in-house and third-party libraries to implement all those features.
x264, for example, does support color-space conversion, but that is implemented in the x264 command-line front-end (via the libswscale library), not in the actual "libx264" encoder library.
Quote:
Originally Posted by Ataril
Now it's clear to me, thank you for this brief but thorough explanation. So to figure out how a coder works, maybe I should explore some existing coder (unfortunately, reading someone else's code is quite difficult)
|
Yes, if you really want to understand how video encoding is actually done in practice, you should probably start looking at "real-world" encoder code, e.g. x264 for H.264/AVC encoding or x265 for H.265/HEVC encoding.
There also exist so-called "reference" encoders for most video formats (e.g. JM for H.264/AVC). But be aware that those are more "proof of concept" encoders, way too slow for real-world usage.
Quote:
Originally Posted by Ataril
Where can I read more about these methods? Do they determine only the search pattern, or the size of the search window too? Or maybe it depends on the specific algorithm?
|
Either in encoder implementations that actually make use of those methods, or in scientific papers that describe those methods theoretically.
Just a quick Google search:
* http://www.ntu.edu.sg/home/ekkma/1_P...pt.%201997.pdf
* http://www.ee.oulu.fi/mvg/files/pdf/pdf_725.pdf
* http://www.mirlab.org/conference_pap...ers/P90619.pdf
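To give a flavour of what those fast search methods do, here is a rough sketch of the classic "three-step search" over a SAD (sum of absolute differences) cost. The function names and the frame layout (2-D lists of luma samples) are made up for the example; real encoders work on sub-pixel positions and use heavily optimized SIMD code.

```python
def sad(cur, ref, cx, cy, rx, ry, n=8):
    """Sum of absolute differences between the n x n block of `cur`
    at (cx, cy) and the n x n block of `ref` at (rx, ry)."""
    return sum(abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
               for y in range(n) for x in range(n))

def three_step_search(cur, ref, bx, by, n=8):
    """Three-step search: instead of testing every candidate in the
    search window, probe the 8 neighbours of the current best match
    at a step size that shrinks from 4 to 2 to 1."""
    best_dx, best_dy = 0, 0
    best = sad(cur, ref, bx, by, bx, by, n)
    step = 4
    while step >= 1:
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                rx = bx + best_dx + dx
                ry = by + best_dy + dy
                # skip candidates that fall outside the reference frame
                if 0 <= rx <= len(ref[0]) - n and 0 <= ry <= len(ref) - n:
                    cost = sad(cur, ref, bx, by, rx, ry, n)
                    if cost < best:
                        best, best_dx, best_dy = cost, rx - bx, ry - by
        step //= 2
    return best_dx, best_dy, best
```

This tests at most 25 candidates instead of the 81 a full search over a ±4 window would need; the papers above discuss this trade-off and smarter search patterns (diamond, hexagon, etc.) in detail.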
Quote:
Originally Posted by Ataril
And one more dumb question: as soon as we use the YCbCr color space, when we say a 16x16 macroblock, what does 16x16 mean exactly? Is it pixels of brightness? And are all distances on the frame measured in those terms?
|
In the YCbCr color-space, there are three separate channels: one channel ("Y") represents brightness/luminance information, and the other two channels ("Cb" and "Cr") represent color/chrominance information. So you have "brightness" and "color" information, although "brightness" is usually stored at a higher resolution than "color" (aka chroma subsampling). Block sizes like 16x16 are given in luma (Y) samples; with 4:2:0 subsampling, the corresponding chroma blocks are only 8x8.
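A minimal sketch of what 4:2:0 subsampling does to a chroma plane, assuming simple 2x2 averaging (real scalers use better filters and have to respect the defined chroma siting):

```python
def subsample_420(plane):
    """Naive 4:2:0 chroma subsampling: average each 2x2 block of
    chroma samples into one value, halving both dimensions."""
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]
```

So a 16x16 macroblock carries 16x16 luma samples but only 8x8 Cb and 8x8 Cr samples, i.e. half the data of a full-resolution 16x16 RGB block.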
The "transform blocks", e.g. 8x8 pixels in size, are used to transform the image data from the spatial domain into the frequency domain. For example, a block of 8x8 pixels is transformed into a matrix of 64 frequency coefficients.
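For illustration, here is a naive 2-D DCT-II on a block, the transform JPEG uses. This is a textbook sketch only: real codecs use fast integer approximations, and H.264 actually specifies an integer transform rather than a true DCT.

```python
import math

def dct2d(block):
    """Naive 2-D DCT-II of an n x n block of pixels.
    Returns n x n frequency coefficients; [0][0] is the DC term."""
    n = len(block)

    def c(k):  # orthonormal scaling factor
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[y][x]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for y in range(n) for x in range(n))
            out[u][v] = c(u) * c(v) * s
    return out
```

A completely flat 8x8 block produces a single non-zero DC coefficient and 63 (near-)zero AC coefficients, which is exactly why this representation compresses so well after quantization.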
I suggest that you start with the simpler JPEG image format before you move on to video, because many fundamental concepts are the same (but easier to follow):
https://en.wikipedia.org/wiki/JPEG#Encoding