Help me understand H.264


Zanthra
20th July 2007, 09:10
So, I have been thinking about how all of this works. After the I frame that starts a group of pictures, any frame after that is motion compensated from any available references to minimize the difference for the Y, U, and V components, and then the DCT of the difference between the Y, U, and V of the motion compensated frame and the source frame is written into the datastream?

If so, I wonder why fade-ins and fade-outs are so expensive. Does H.264 allow a negative difference to reduce a pixel past zero, so that the DCTs are more homogeneous?

Hellworm
20th July 2007, 14:07
Fades are expensive because there is change all over the image. Most compression comes from motion compensation, which leaves nearly zero change to be encoded with the DCT. In a fade there is quite a lot of change that has to be encoded with the DCT. Additionally, the motion search probably gets messed up.
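To put rough numbers on that, here is a toy sketch. It assumes frames are flat lists of luma samples and that the "prediction" is simply the previous frame (zero motion, no weighting). The sample values and the helper names `residual_energy` and `fade` are made up for illustration; no real encoder works this simply.

```python
# Toy illustration: residual energy with plain (unweighted) prediction.
# Frames are flat lists of luma samples; the prediction is just the
# previous frame, i.e. zero motion and no weighting.

def residual_energy(pred, cur):
    """Sum of squared differences between prediction and current frame."""
    return sum((c - p) ** 2 for p, c in zip(pred, cur))

def fade(frame, num, den):
    """Scale every sample by num/den with integer rounding (fade to black)."""
    return [(s * num + den // 2) // den for s in frame]

base = [16, 64, 128, 200, 235, 90, 40, 180]  # made-up luma samples

# Static scene: the current frame is identical to the previous one.
static_cost = residual_energy(base, list(base))

# One step of a fade: previous frame at 8/16 brightness, current at 7/16.
fade_prev = fade(base, 8, 16)
fade_cur = fade(base, 7, 16)
fade_cost = residual_energy(fade_prev, fade_cur)
changed_samples = sum(1 for p, c in zip(fade_prev, fade_cur) if p != c)
```

In the static case the residual is zero, while in the fade step every sample differs from its prediction, so every block carries some residual to transform and code.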

akupenguin
20th July 2007, 19:17
Fades are expensive because x264 doesn't fully use all the available tools provided by the standard. Note that these tools are new to H.264, and are not in any previous compression standard.

There are two ways to efficiently code a fade-in or fade-out. If there's no motion and only the fade, you can put an I-frame at the end and make every frame during the fade a B-frame with implicit weighted prediction. Then ideally they take no bits other than the frame header. If there is motion, you can do that with hierarchical B-frames (a generalization of x264's pyramid), or you can use explicit weighted prediction to predict each frame by multiplying the samples of the previous frame by some number.
In cross-fades, only the B-frame method works. And in cross-fades with motion, B-frames are better than nothing, but motion compensation is impaired.

The problem with those methods is that there's no general way the codec can know to use them. It would need to explicitly detect fades and apply special frame types and motion estimation modes.
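The two methods described above can be sketched on a motionless, precisely linear fade from black. This is a simplified model, not the standard's arithmetic: it uses exact fractions where H.264 uses fixed-point weights with rounding, and the function names (`linear_fade`, `implicit_bipred`, `explicit_pred`) are hypothetical.

```python
from fractions import Fraction

def linear_fade(start, end, n):
    """Frames 0..n of a precisely linear fade from `start` to `end`."""
    return [[Fraction(s) + Fraction(e - s) * t / n for s, e in zip(start, end)]
            for t in range(n + 1)]

def implicit_bipred(ref0, ref1, t, n):
    """B-frame style blend: weights follow temporal distance only."""
    w1 = Fraction(t, n)
    return [(1 - w1) * a + w1 * b for a, b in zip(ref0, ref1)]

def explicit_pred(ref, scale):
    """P-frame style prediction: multiply the reference samples by a number."""
    return [scale * s for s in ref]

def max_abs_residual(pred, cur):
    return max(abs(c - p) for p, c in zip(pred, cur))

black = [0, 0, 0, 0]
picture = [16, 64, 128, 235]  # made-up samples
n = 14
frames = linear_fade(black, picture, n)

# Method 1: every in-between frame is predicted from the two end frames.
bipred_worst = max(
    max_abs_residual(implicit_bipred(frames[0], frames[n], t, n), frames[t])
    for t in range(1, n))

# Method 2: each frame is predicted from the previous one, scaled by t/(t-1).
# Frame 1 cannot be predicted this way because frame 0 is pure black.
explicit_worst = max(
    max_abs_residual(explicit_pred(frames[t - 1], Fraction(t, t - 1)), frames[t])
    for t in range(2, n + 1))
```

Both residuals come out as exactly zero in this idealized setting. With integer samples and fixed-point weights some rounding residual would remain, and any deviation from a linear fade adds more.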

PuzZLeR
20th July 2007, 19:59
Is there any combination of B-frames, MRFs, etc. one can add to the x264 settings to at least improve the situation in the interim?

I personally use a lot of cross-fades, roll-fades, and dimming transitions in my editing before encoding the final clip. (It masks out "imperfections" from television content.)

akupenguin
20th July 2007, 20:22
Force B-frames by munging the 2pass statsfile, and enable --weightb. Make sure the fade is precisely linear, and make sure your start and end frame numbers are exactly right; any deviation requires lots of residual. If your fade is longer than 16 frames, you'll need to lift x264's restriction on 16 consecutive B-frames, or sacrifice some compression by putting extra I-frames in there.

e.g. 1st pass decides:
IbPbPbPbPbPbPbPbbbPbbbP... (display order)
^black        ^full brightness

change it to:
IbbbbbbbbbbbbbibbbPbbbP...
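The rewrite above can be sketched as a string operation on the display-order frame types. This only handles the compact notation used in this post; the real 2pass statsfile has more fields per frame and is not modelled here. The function name `force_b_over_fade` and the constant are hypothetical.

```python
MAX_CONSECUTIVE_B = 16  # the x264 restriction mentioned in the post above

def force_b_over_fade(types, start, end):
    """Return `types` (display order) with every frame strictly between
    `start` and `end` turned into 'b' and frame `end` turned into 'i'.
    Frame `start` keeps whatever type it already had."""
    if not 0 <= start < end < len(types):
        raise ValueError("fade range outside the frame list")
    if end - start - 1 > MAX_CONSECUTIVE_B:
        raise ValueError("fade too long for a single run of B-frames")
    return types[:start + 1] + "b" * (end - start - 1) + "i" + types[end + 1:]

first_pass = "IbPbPbPbPbPbPbPbbbPbbbP"
# Fade starts at frame 0 (black) and reaches full brightness at frame 14.
forced = force_b_over_fade(first_pass, 0, 14)
```

A fade longer than the limit raises an error here; in practice that is where you would either lift the restriction or split the fade with extra I-frames, as described above.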

PuzZLeR
21st July 2007, 06:09
Clever. Makes sense.

Now you got me adjusting my editing in advance. Probably an ounce of prevention may create a pound of compression later...

Something to experiment with... thanks!

Zanthra
23rd July 2007, 07:05
So all frame decisions are done on the macroblock level? There is no way to "normalize" the difference of the whole frame, and detect the fade?

Also, out of sheer curiosity, would allowing the encoder to give values for a y=mx+b luminosity change over all pixels in a macroblock help? Some cases where luminosity is changing in a video can get pretty high bitrate, even though the change in luminosity follows a simple mathematical function.

akupenguin
23rd July 2007, 07:26
If you have decided frame types, then detecting P-frames with a uniform change in brightness is possible, and you can then enable explicit weighted prediction. (See a very old patch (http://akuvian.org/src/x264/x264_wpredp.0.diff) I never finished.) But I don't know how to efficiently decide the optimal sequence of frame types.
I have a perhaps irrational aversion to heuristics where the property controlled has no causal relationship with the metric controlling it. (This is also the reason Haali's AQ isn't in mainline.) So while I said that using lots of B-frames in fades is (sometimes) good, I don't want to pack the codec with cases like "detect a fade and force lots of B-frames".

y=mx+b can only be applied to a whole reference frame; you can't set the coefficients independently per macroblock. If only part of a frame is changing illumination, the best you can do is use multiple reference frames, and weight some of them according to the illuminated section and others according to the background. (Actually, if I read the standard correctly, you can also put one reference frame in multiple slots of the reference list, each with different weights. But you still have to segment the frame into a few regions with constant weights.)
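A rough sketch of the whole-frame y=mx+b idea: fit one m and b per reference frame by least squares and compare the residual with and without the weights. This is an illustrative assumption about how such detection could work, not the method in the unfinished patch linked above; `fit_weight` and `sse` are hypothetical names and the samples are made up.

```python
def fit_weight(ref, cur):
    """Least-squares m, b such that cur ~= m * ref + b over the whole frame."""
    n = len(ref)
    mean_r = sum(ref) / n
    mean_c = sum(cur) / n
    var_r = sum((r - mean_r) ** 2 for r in ref)
    if var_r == 0:
        # Flat reference: only an offset is meaningful.
        return 1.0, mean_c - mean_r
    cov = sum((r - mean_r) * (c - mean_c) for r, c in zip(ref, cur))
    m = cov / var_r
    return m, mean_c - m * mean_r

def sse(ref, cur, m=1.0, b=0.0):
    """Sum of squared errors when predicting cur as m * ref + b."""
    return sum((c - (m * r + b)) ** 2 for r, c in zip(ref, cur))

ref = [16, 64, 128, 200, 235, 90, 40, 180]

# Uniform change over the whole frame: one (m, b) pair explains everything.
uniform = [0.75 * r + 4 for r in ref]
m, b = fit_weight(ref, uniform)
plain_sse = sse(ref, uniform)
weighted_sse = sse(ref, uniform, m, b)

# Only the first half changes illumination: a single (m, b) no longer fits.
partial = [0.75 * r + 4 for r in ref[:4]] + ref[4:]
m2, b2 = fit_weight(ref, partial)
partial_sse = sse(ref, partial, m2, b2)
```

When the change is uniform, the weighted residual drops to essentially nothing. When only part of the frame changes, a single pair of coefficients leaves residual behind, which is why the frame would have to be segmented into regions with different weighted references.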

burfadel
23rd July 2007, 09:23
You could have a setting that, say, increases the b-frame bias on possible fade detection. That is, if a possible fade is detected, it increases the b-frame bias to say 40 (or more correctly, by 40 extra bias points). That way, if it is not a fade, it shouldn't noticeably affect the picture. You could include this as an option only, until a better method (if possible) is found. Just an idea; it's not perfect, but it should reduce the bitrate penalty quite a lot. This could be an option categorised under b-frame bias options, so its use can be considered cautionary, like adjusting the b-frame bias.

Another suggestion is to call it, say, 'Increase b-frame bias on detected fades' or something, and have low, medium and high settings (say 20, 40, and 60).

If the output filesize is not correctly calculable even with a 2 pass encode, it could be an option that's only available in constant quality/quantiser modes. The same goes for the AQ patch in the standard builds.

Sagittaire
23rd July 2007, 15:10
I think that for these particular sequences, RDO for frame type decision could be a very good way. Why not wpred for P-frames in x264? Perhaps a good future implementation, but perhaps more complex than wpred for B-frames.