How to: B frames

koliva · 19th October 2009, 17:18

Hello all,

I know that B-frames are bi-directional frames, meaning that the blocks in a B-frame are calculated from I- and P-frames. Are they just interpolated? If so, is there any feedback loop for wrongly calculated blocks? How does it work? Thanks.

LoRd_MuldeR · 19th October 2009, 21:42

A frame that was coded as a B-Frame is predicted from its reference frames. That basically works the same way as a frame coded as a P-Frame is predicted from its reference, except that a B-Frame also references a "future" frame in display order. Either way, no matter whether a frame was coded as a P-Frame or as a B-Frame, there will be a difference between the predicted frame and the original frame. That difference is called the "residual" and it is stored in the file. Because a B-Frame can usually be predicted more accurately, its residual tends to contain less information and thus takes less space...

Example:
1) http://img21.imageshack.us/img21/855...predictedp.png
2) http://img42.imageshack.us/img42/639...meresidual.png
3) http://img21.imageshack.us/img21/3425/bframefinal.png

koliva · 20th October 2009, 07:44

Thank you for your very good explanation and examples.

Could you please correct me if I misunderstand you?

Going step by step for a B frame, and let's say our configuration is IBP:

1- The B frame is predicted from the I frame and a residual image is created,
2- It is predicted from the P frame and a residual image is created,

Then what happens to these residual images? Are they kept separately? Or is there only one residual after prediction?

Thanks.

LoRd_MuldeR · 20th October 2009, 15:16

Quote:

When we go step by step for any B frame, let's say our configuration is IBP

In that sequence the display order is IBP, but the decoding order will be IPB. That's because the I-Frame is encoded intra-only (no reference), the P-Frame is predicted from the I-Frame, and the B-Frame is predicted from both surrounding frames, the I-Frame and the P-Frame. Obviously we have to decode the P-Frame before we can decode the B-Frame, even though the B-Frame will be displayed to the user earlier.

Furthermore: for both the P-Frame and the B-Frame, there will be a "predicted" frame (which is predicted from the references) and a "residual" frame. The latter is the difference between predicted and original. We get the predicted frame "for free" because it's predicted by the decoder from data it already has. Unfortunately the predicted frame will almost never be perfect (that is: identical to the original frame), so we must store the residual in the file. Still, storing "only" the residual takes far fewer bits than storing the entire frame. That's because the residual is only the difference between predicted and original, so it contains less information and thus can be compressed more efficiently.

The residual of the B-Frame (hopefully) contains even less information than the P-Frame's residual, because the B-Frame is predicted from two references, which yields a "better" prediction (closer to the original). Needless to say, things are more complex in reality, e.g. multiple references, weighted prediction, hierarchical B-Frames and so on...
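The reordering described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every B-Frame references just the nearest I- or P-Frame on each side (no hierarchical B-Frames, no multiple references), and the function name `decode_order` is made up for this example.

```python
def decode_order(display_types):
    """Return frame indices in a valid decoding order.

    Assumption: each B-frame depends only on the nearest I/P frame before
    and after it in display order.
    """
    order = []
    pending_b = []  # B-frames waiting for their "future" reference
    for idx, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(idx)
        else:
            # An I- or P-frame only needs frames that are already decoded.
            order.append(idx)
            # The B-frames displayed before it can now be decoded.
            order.extend(pending_b)
            pending_b.clear()
    # Trailing B-frames without a future anchor (kept for completeness).
    order.extend(pending_b)
    return order


display = ["I", "B", "P"]
order = decode_order(display)
print([display[i] for i in order])  # ['I', 'P', 'B']
```

For a longer display sequence such as IBBPBBP, the same rule gives the decoding order I P B B P B B.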

koliva · 20th October 2009, 16:14

Quote:

Originally Posted by LoRd_MuldeR

[...] We get the predicted frame "for free" because it's predicted by the decoder from data it already has. [...]

I understood everything except the sentence quoted above (shown in bold in my original post). Did you mean encoder instead of decoder? Or did you mean the encoder can create it?

LoRd_MuldeR · 20th October 2009, 16:30

It means that the encoder won't store the "predicted" frame in the file, because the decoder can get the "predicted" frame without additional data ("for free") by prediction from the reference frame(s). This is possible because at the time a P- or B-Frame is decoded, the decoder has already decoded all of its reference frames. Both the encoder and the decoder calculate the "predicted" frame. The encoder calculates the "residual" as the difference between the predicted frame and the original frame, and the residual is stored in the file. The decoder, which doesn't know what the original frame looked like, first calculates the "predicted" frame (from the reference frames it already knows) and then applies the "residual" to the "predicted" frame in order to get the "final" frame...
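As a rough illustration of that round trip, here is a minimal Python sketch. It makes strong simplifying assumptions: a "frame" is a short list of pixel values, the hypothetical `predict` function just copies the reference (real codecs use motion-compensated prediction), and the residual is stored exactly, whereas a real encoder transforms and quantizes it, so the decoded frame is normally only close to the original.

```python
def predict(reference):
    """Hypothetical stand-in for motion-compensated prediction: copy the reference."""
    return list(reference)


def encode(original, reference):
    """Encoder side: only the residual (original minus predicted) gets stored."""
    predicted = predict(reference)
    return [o - p for o, p in zip(original, predicted)]


def decode(residual, reference):
    """Decoder side: rebuild the same prediction, then apply the residual."""
    predicted = predict(reference)
    return [p + r for p, r in zip(predicted, residual)]


reference = [100, 102, 104, 106]   # already decoded by both sides
original = [101, 102, 105, 106]    # known to the encoder only

residual = encode(original, reference)
reconstructed = decode(residual, reference)
print(residual)        # [1, 0, 1, 0]
print(reconstructed)   # [101, 102, 105, 106]
```

The point of the sketch is that `predict` is run on both sides with the same input, so the prediction itself never has to be written to the file.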

koliva · 20th October 2009, 16:35

Thanks a lot for your excellent explanation.

Asmodian · 20th October 2009, 20:48

Quote:

Originally Posted by koliva

1- B frame is predicted from I frame and the residual image is created,
2- It is predicted from P frame and the residual image is created,

Then what happens to these residual images? Are they kept separately? Or is there only one residual after prediction?

The B frame is predicted from the I and P frames in 8x8 blocks (I think this size can change in h.264 at least) and only the smallest residual from the I or P frame is saved for each block.

So the B frame can use a mixture of blocks from the I and P frames but each block only uses the I or P frame for predicted and residual data.

LoRd_MuldeR · 20th October 2009, 21:19

What about "Weighted Prediction" ???

Doesn't it mix data from several references to efficiently code fades and similar stuff?

imcold · 20th October 2009, 22:10

Quote:

Originally Posted by Asmodian

The B frame is predicted from the I and P frames in 8x8 blocks (I think this size can change in h.264 at least) and only the smallest residual from the I or P frame is saved for each block.

So the B frame can use a mixture of blocks from the I and P frames but each block only uses the I or P frame for predicted and residual data.

No and no. Macroblocks are always 16x16 pixels, and there's a macroblock type that uses both references, too.
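To make the "uses both references" case concrete, here is a small Python sketch of a per-block mode decision. It is an illustration under assumptions, not how any particular encoder does it: blocks are flattened to four samples instead of 16x16, motion compensation is assumed to have been done already, the bi-predicted block is the plain rounded average of the two references (no weighted prediction), and the mode is picked by SAD only, whereas real encoders also account for the bits each mode costs. The function names are made up.

```python
def sad(a, b):
    """Sum of absolute differences between two equally long sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_prediction(block, past_block, future_block):
    """Pick the prediction with the smallest residual energy (measured as SAD)."""
    # Bi-prediction: rounded average of the past and the future reference block.
    bi_block = [(p + f + 1) // 2 for p, f in zip(past_block, future_block)]
    candidates = {"past": past_block, "future": future_block, "bi": bi_block}
    mode = min(candidates, key=lambda m: sad(block, candidates[m]))
    return mode, candidates[mode]


# A fade: the current block's brightness lies between past and future.
past = [100, 100, 100, 100]
future = [120, 120, 120, 120]
current = [110, 111, 109, 110]

mode, prediction = best_prediction(current, past, future)
print(mode, prediction)  # bi [110, 110, 110, 110]
```

Only one residual is kept per block: the difference between the block and whichever prediction was chosen.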

koliva · 21st October 2009, 07:39

Another question. Let's consider only an I and a P frame. The whole P image is divided into macroblocks, and the macroblocks are processed one by one to find the best match in the I frame; the displacement of that match is the motion vector. Let's say the complexity of this process is O(n) = x.
For the B frame, all the macroblocks are also processed one by one. However, each of these macroblocks is matched against macroblocks from the I frame or the P frame. Therefore, the complexity is approximately 2x, isn't it?

Dark Shikari · 21st October 2009, 07:42

Technically, it would be N^2, not 2x, since the biprediction search space has twice the number of dimensions. In practice, it's actually less than that of P-frames because of the effective skip early termination, and because one doesn't actually need to search the whole biprediction search space.
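The difference between 2x and N^2 is easy to see by counting candidates. The sketch below assumes an exhaustive integer-pixel search with a hypothetical search range of 16 pixels in each direction; these numbers are for illustration only and say nothing about what a real encoder actually evaluates.

```python
def candidates_single(search_range):
    """Motion vectors to test against one reference: a (2r+1) x (2r+1) window."""
    side = 2 * search_range + 1
    return side * side


def candidates_joint_bipred(search_range):
    """Exhaustive joint search: every past vector paired with every future vector."""
    return candidates_single(search_range) ** 2


r = 16
n = candidates_single(r)
print(n)                           # 1089 candidates for one reference
print(2 * n)                       # 2178 if both references are searched independently
print(candidates_joint_bipred(r))  # 1185921 for the full joint search
```

Searching each reference on its own costs about 2x, but finding the best pair of vectors for an averaged prediction is a joint problem, which is where the squared term comes from.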

koliva · 21st October 2009, 07:59

Could you please give me more information about effective skip early termination? What are the conditions for this skip process?

LoRd_MuldeR · 21st October 2009, 11:24

I haven't looked into the details yet, but I think it means that you don't test every possible combination ("brute force method"). Instead you only test some candidates and then limit the further search to regions around the most promising matches you have found so far. Regions where no good matches are expected will be skipped. Also, the "search range" has a pre-defined limit...

If you want the details, have a look at:
http://git.videolan.org/?p=x264.git;...f18fbc;hb=HEAD
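The general idea can be sketched as follows. To be clear, this is not x264's algorithm, just a toy 1-D illustration of "accept a predicted candidate early if it is good enough, otherwise refine locally within a limited range". The function name, the threshold and the sample data are all assumptions made for this example.

```python
def sad(a, b):
    """Sum of absolute differences between two equally long sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def motion_search(block, ref, start, predicted_offset, search_range, skip_threshold):
    """Return (best_offset, best_cost, evaluations) for a 1-D block match."""
    n = len(block)
    costs = {}  # offset -> SAD, so no candidate is evaluated twice

    def valid(offset):
        pos = start + offset
        return abs(offset) <= search_range and 0 <= pos and pos + n <= len(ref)

    def cost(offset):
        if offset not in costs:
            pos = start + offset
            costs[offset] = sad(block, ref[pos:pos + n])
        return costs[offset]

    # 1) Early termination: accept the predicted offset if it is good enough.
    best_offset, best_cost = predicted_offset, cost(predicted_offset)
    if best_cost <= skip_threshold:
        return best_offset, best_cost, len(costs)

    # 2) Otherwise only look at the neighbours of the best match so far.
    while True:
        center = best_offset
        for cand in (center - 1, center + 1):
            if valid(cand) and cost(cand) < best_cost:
                best_offset, best_cost = cand, cost(cand)
        if best_offset == center:
            break
    return best_offset, best_cost, len(costs)


ref = [10, 10, 10, 50, 60, 70, 10, 10, 10, 10]
block = [50, 60, 70]

static = motion_search(block, ref, start=3, predicted_offset=0, search_range=4, skip_threshold=5)
moved = motion_search(block, ref, start=1, predicted_offset=0, search_range=4, skip_threshold=5)
print(static)  # (0, 0, 1)
print(moved)   # (2, 0, 5)
```

The unchanged block is settled after a single comparison, which is the kind of saving a cheap skip check can give. A local search like this can of course get stuck in a local minimum, which is one reason real encoders use more elaborate candidate sets and search patterns.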

koliva · 21st October 2009, 13:24

To understand the process and its complexity better: is there any way to see how many comparisons are done for a P frame and a B frame in a movie using a recent encoder? Is there any tool for that purpose?

ConsciousEffect · 26th October 2009, 01:03

Hey LoRd_MuldeR, I would be interested in understanding H.264 at a similar level of comprehension to what you have. The problem with complex topics like this is finding the proper resources and reading them in the correct order; it's important to understand the basic framework of the system before you try to comprehend the formulas, for example. Anyway, I would be interested in any resources you recommend.
