Old 19th October 2009, 17:18   #1  |  Link
koliva
Beginner
 
Join Date: Jan 2009
Location: Europe
Posts: 125
How to: B frames

Hello all,

I know that B frames are bi-directional frames, meaning that the blocks in a B frame are calculated from I and P frames. Are they just interpolated? If so, is there any feedback loop that corrects wrongly calculated blocks, and how does it work? Thanks.
Old 19th October 2009, 21:42   #2  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
A frame that was coded as a B-Frame is predicted from its reference frames. That works basically the same way as a frame coded as a P-Frame is predicted from its reference, except that a B-Frame also references a "future" frame in display order (see here). Either way, whether a frame was coded as a P-Frame or as a B-Frame, there will be a difference between the predicted frame and the original frame. That difference is called the "residual" and is stored in the file. Because B-Frames can predict the frame better, the residual contains less information and thus takes less space...

Example:
1) http://img21.imageshack.us/img21/855...predictedp.png
2) http://img42.imageshack.us/img42/639...meresidual.png
3) http://img21.imageshack.us/img21/3425/bframefinal.png
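
The idea can be sketched in a few lines of Python. This is only a toy under loud assumptions: frames are tiny lists of pixel values, nothing moves (so there is no motion compensation), and the B-style prediction is a plain average of the past and future reference. All function names are made up for illustration and do not come from any codec.

```python
# Toy frames: a linear fade from dark to bright (display order I, B, P).
frame_i = [10, 10, 10, 10]   # past reference
frame_b = [20, 20, 20, 20]   # the frame we want to code
frame_p = [30, 30, 30, 30]   # "future" reference in display order


def predict_forward(past):
    """P-style prediction: copy the past reference (no motion in this toy)."""
    return list(past)


def predict_bidirectional(past, future):
    """B-style prediction: average the past and future references."""
    return [(a + b) // 2 for a, b in zip(past, future)]


def residual(original, predicted):
    """The residual is what must be stored: original minus prediction."""
    return [o - p for o, p in zip(original, predicted)]


def energy(values):
    """Sum of absolute values, a rough measure of how much is left to code."""
    return sum(abs(v) for v in values)


res_p_style = residual(frame_b, predict_forward(frame_i))
res_b_style = residual(frame_b, predict_bidirectional(frame_i, frame_p))

print(res_p_style, energy(res_p_style))
print(res_b_style, energy(res_b_style))
```

In this fade the forward-only prediction leaves a residual in every pixel, while the averaged prediction leaves nothing to store. Real content is rarely this convenient, but it shows why a second reference can shrink the residual.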
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 19th October 2009 at 23:01.
Old 20th October 2009, 07:44   #3  |  Link
koliva
Thank you for your very good explanation and examples.

Could you please correct me if I misunderstand you?

When we go step by step for any B frame, let's say our configuration is IBP,

1- The B frame is predicted from the I frame and a residual image is created,
2- It is predicted from the P frame and a residual image is created,

Then what happens to these residual images? Are they kept separately? Or is there only one set of residual data after prediction?

Thanks.
Old 20th October 2009, 15:16   #4  |  Link
LoRd_MuldeR
Quote:
When we go step by step for any B frame, let's say our configuration is IBP
In that sequence the display order is IBP, but the decoding order will be IPB. That's because the I-Frame is encoded intra-only (no reference), the P-Frame is predicted from the I-Frame, and the B-Frame is predicted from both surrounding frames, the I-Frame and the P-Frame. Obviously we have to decode the P-Frame before we can decode the B-Frame, even though the B-Frame will be displayed to the user earlier.

Furthermore: for both the P-Frame and the B-Frame, there will be a "predicted" frame (which is predicted from the references) and a "residual" frame. The latter is the difference between predicted and original. We get the predicted frame "for free" because it's predicted by the decoder from data it already has. Unfortunately the predicted frame will never be perfect (that is, it is not identical to the original frame), so we must store the residual in the file. Still, storing "only" the residual takes far fewer bits than storing the entire frame. That's because the residual is only the difference between predicted and original, so it contains less information and thus can be compressed more efficiently.

The residual of the B-Frame (hopefully) contains even less information than the P-Frame's residual, because the B-Frame is predicted from two references, which yields a "better" prediction (closer to the original). Needless to say, things are more complex in reality, e.g. multiple references, weighted prediction, hierarchical B-Frames and so on...
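
The difference between display order and decoding order can be illustrated with a short Python sketch. It assumes the simple classic case only: every B-Frame references the nearest I/P frame before it and the nearest I/P frame after it. Hierarchical B-Frames and multiple references are deliberately ignored, and `decode_order` is a hypothetical helper, not part of any real tool.

```python
def decode_order(display_types):
    """Reorder frame types from display order to decoding order.

    Assumes classic B-Frames: each B references the nearest I/P before it
    and the nearest I/P after it, so the later reference is decoded first.
    Returns a list of (display_index, frame_type) tuples.
    """
    out = []
    pending_b = []  # B-Frames waiting for their future reference
    for index, ftype in enumerate(display_types):
        if ftype == "B":
            pending_b.append((index, ftype))
        else:
            out.append((index, ftype))   # the reference frame goes first
            out.extend(pending_b)        # then the B-Frames that needed it
            pending_b = []
    out.extend(pending_b)  # trailing B-Frames with no later reference in the list
    return out


order = decode_order(["I", "B", "P"])
print("".join(ftype for _, ftype in order))
```

For the IBP example this prints the frame types in the order IPB, matching the explanation above.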

Last edited by LoRd_MuldeR; 20th October 2009 at 15:27.
Old 20th October 2009, 16:14   #5  |  Link
koliva
Quote:
Originally Posted by LoRd_MuldeR View Post
We get the predicted frame "for free" because it's predicted by the decoder from data it already has.

I understood everything except the bold sentence quoted above. Did you mean encoder instead of decoder? Or did you mean that the encoder can create it?
Old 20th October 2009, 16:30   #6  |  Link
LoRd_MuldeR
Quote:
Originally Posted by koliva View Post
I understood everything except the Bold sentence. Did you mean encoder instead of decoder? Or did you mean encoder can create it?
It means that the encoder won't store the "predicted" frame in the file, because the decoder can get the "predicted" frame without additional data ("for free") by prediction from the reference frame(s). This is possible because by the time a P- or B-Frame is decoded, the decoder has already decoded all of its reference frames. Both the encoder and the decoder calculate the "predicted" frame. The encoder calculates the "residual" as the difference between the predicted frame and the original frame, and the residual is stored in the file. The decoder, which doesn't know what the original frame looked like, first calculates the "predicted" frame (from the reference frames it already knows) and then applies the "residual" to the "predicted" frame in order to get the "final" frame...
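
A small round trip in Python may make the division of labour clearer. It is a sketch under stated assumptions: the shared prediction rule is just an average of two references, there is no motion compensation, and the residual is stored losslessly. Real codecs transform and quantize the residual, so their reconstruction is normally not bit-exact. The names `predict`, `encode` and `decode` are invented for this example.

```python
def predict(ref_past, ref_future):
    """Shared prediction rule: here simply the average of two references."""
    return [(a + b) // 2 for a, b in zip(ref_past, ref_future)]


def encode(original, ref_past, ref_future):
    """Encoder side: compute the prediction, return only the residual."""
    predicted = predict(ref_past, ref_future)
    return [o - p for o, p in zip(original, predicted)]


def decode(stored_residual, ref_past, ref_future):
    """Decoder side: rebuild the same prediction, then add the residual."""
    predicted = predict(ref_past, ref_future)
    return [p + r for p, r in zip(predicted, stored_residual)]


ref_i = [10, 12, 14, 16]       # already decoded I-Frame
ref_p = [30, 28, 26, 24]       # already decoded P-Frame
original_b = [21, 20, 19, 22]  # only the encoder ever sees this

stored = encode(original_b, ref_i, ref_p)   # this is all that goes in the file
rebuilt = decode(stored, ref_i, ref_p)
print(stored)
print(rebuilt == original_b)
```

Note that `decode` never receives `original_b`. It only needs the references it has already decoded plus the stored residual.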

Last edited by LoRd_MuldeR; 20th October 2009 at 22:33.
Old 20th October 2009, 16:35   #7  |  Link
koliva
Thanks a lot for your excellent explanation.
Old 20th October 2009, 20:48   #8  |  Link
Asmodian
Registered User
 
Join Date: Feb 2002
Location: San Jose, California
Posts: 4,407
Quote:
Originally Posted by koliva View Post
1- The B frame is predicted from the I frame and a residual image is created,
2- It is predicted from the P frame and a residual image is created,

Then what happens to these residual images? Are they kept separately? Or is there only one set of residual data after prediction?
The B frame is predicted from the I and P frames in 8x8 blocks (I think this size can change in h.264 at least) and only the smallest residual from the I or P frame is saved for each block.

So the B frame can use a mixture of blocks from the I and P frames but each block only uses the I or P frame for predicted and residual data.

Last edited by Asmodian; 20th October 2009 at 20:51.
Old 20th October 2009, 21:19   #9  |  Link
LoRd_MuldeR
Quote:
Originally Posted by Asmodian View Post
The B frame is predicted from the I and P frames in 8x8 blocks (I think this size can change in h.264 at least) and only the smallest residual from the I or P frame is saved for each block.

So the B frame can use a mixture of blocks from the I and P frames but each block only uses the I or P frame for predicted and residual data.
What about "Weighted Prediction" ???

Doesn't it mix data from several references to efficiently code fades and similar stuff?

Last edited by LoRd_MuldeR; 20th October 2009 at 21:30.
Old 20th October 2009, 22:10   #10  |  Link
imcold
pencil artist
 
Join Date: Jan 2006
Location: Slovakia
Posts: 201
Quote:
Originally Posted by Asmodian View Post
The B frame is predicted from the I and P frames in 8x8 blocks (I think this size can change in h.264 at least) and only the smallest residual from the I or P frame is saved for each block.

So the B frame can use a mixture of blocks from the I and P frames but each block only uses the I or P frame for predicted and residual data.
No and no. Macroblocks are always 16x16 pixels, and there's a macroblock type that uses both references, too.
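
To illustrate the point about a macroblock type that uses both references, here is a hedged Python sketch of a per-block mode decision. It is a simplification: blocks are four-value lists instead of 16x16 macroblocks, motion search is left out, the bidirectional prediction is a plain average, and the winner is picked by smallest SAD alone (real encoders also weigh the bit cost). The mode names are illustrative, not H.264 syntax.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def choose_mode(original, past_block, future_block):
    """Pick the prediction with the smallest SAD for one block."""
    candidates = {
        "forward": list(past_block),
        "backward": list(future_block),
        "bidirectional": [(a + b) // 2 for a, b in zip(past_block, future_block)],
    }
    best = min(candidates, key=lambda mode: sad(original, candidates[mode]))
    return best, sad(original, candidates[best])


# Three toy "macroblocks" of one B-Frame, each paired with its co-located
# block in the past and in the future reference.
blocks = [
    ([10, 10, 10, 10], [10, 10, 10, 11], [50, 50, 50, 50]),  # looks like the past
    ([50, 50, 50, 50], [10, 10, 10, 10], [50, 50, 49, 50]),  # looks like the future
    ([30, 30, 30, 30], [10, 10, 10, 10], [50, 50, 50, 50]),  # in between
]

for original, past_block, future_block in blocks:
    print(choose_mode(original, past_block, future_block))
```

Each block gets exactly one prediction and therefore one residual, but that prediction may come from the past reference, the future reference, or a blend of both.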
__________________
fevh264 - open-source baseline h.264 encoder
Old 21st October 2009, 07:39   #11  |  Link
koliva
Another question. Let's consider only an I frame and a P frame. The whole P image is divided into macroblocks, and the macroblocks are processed one by one to find the best-matching area in the I frame; the offset to that area is the motion vector. Let's say the complexity of this process is O(n)=x.
For the B frame, all the macroblocks are likewise processed one by one. However, each of them is matched against areas from the I frame or the P frame. Therefore, the complexity is approximately 2x, isn't it?
Old 21st October 2009, 07:42   #12  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by koliva View Post
Another question. Let's consider only an I frame and a P frame. The whole P image is divided into macroblocks, and the macroblocks are processed one by one to find the best-matching area in the I frame; the offset to that area is the motion vector. Let's say the complexity of this process is O(n)=x.
For the B frame, all the macroblocks are likewise processed one by one. However, each of them is matched against areas from the I frame or the P frame. Therefore, the complexity is approximately 2x, isn't it?
Technically, it would be N^2, not 2x, since the biprediction search space has twice the number of dimensions. In practice, it's actually less than that of P-frames because of the effective skip early termination, and because one doesn't actually need to search the whole biprediction search space.
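
For a feel of the numbers, the brute-force candidate counts can be worked out in Python. This is arithmetic only, assuming integer-pixel positions in a square window and a search range of 16 pixels picked purely as an example. It does not model what any real encoder does, which (as said above) is far less than an exhaustive search.

```python
def positions_per_reference(search_range):
    """Integer-pixel candidates in a square window of +/- search_range."""
    side = 2 * search_range + 1
    return side * side


def exhaustive_costs(search_range):
    """Candidate counts per macroblock for three brute-force strategies."""
    n = positions_per_reference(search_range)
    return {
        "one reference (P-style)": n,
        "two references searched independently": 2 * n,
        "two references searched jointly": n * n,
    }


for label, count in exhaustive_costs(16).items():
    print(f"{label}: {count}")
```

With N candidate positions per reference, searching each reference on its own costs 2N, while trying every pair of positions for a joint biprediction costs N^2.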
Old 21st October 2009, 07:59   #13  |  Link
koliva
Quote:
Originally Posted by Dark Shikari View Post
Technically, it would be N^2, not 2x, since the biprediction search space has twice the number of dimensions. In practice, it's actually less than that of P-frames because of the effective skip early termination, and because one doesn't actually need to search the whole biprediction search space.
Could you please give me more information about effective skip early termination? What are the conditions for this skip process?
Old 21st October 2009, 11:24   #14  |  Link
LoRd_MuldeR
Quote:
Originally Posted by koliva View Post
Could you please give me more information about effective skip early termination? What are the conditions for this skip process?
I didn't look into the details yet, but I think it means that you don't test every possible combination (the "brute force" method). Instead you only test some candidates and then limit the further search to regions around the most promising matches found so far. Regions where no good matches are expected will be skipped. Also, the "search range" has a pre-defined limit...

If you want the details, have a look at:
http://git.videolan.org/?p=x264.git;...f18fbc;hb=HEAD
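
As a rough illustration of "test a few candidates and keep searching around the best one", here is a generic greedy neighbour search in Python next to a full search, each counting how many block comparisons it makes. This is not x264's algorithm. It is a toy on a 1-D "frame" with invented function names, and it only works this neatly because the sample data has a smooth cost landscape.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def block_at(frame, x, size):
    """Cut a block out of a 1-D 'frame' (kept 1-D for brevity)."""
    return frame[x:x + size]


def full_search(reference, target, start, size, search_range):
    """Brute force: compare the target against every position in range."""
    best, best_cost, comparisons = start, None, 0
    for candidate in range(start - search_range, start + search_range + 1):
        if candidate < 0 or candidate + size > len(reference):
            continue
        cost = sad(block_at(reference, candidate, size), target)
        comparisons += 1
        if best_cost is None or cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost, comparisons


def local_search(reference, target, start, size, search_range):
    """Greedy: step to a better neighbour until neither neighbour improves.

    Neighbours are re-evaluated without caching, so the count is a little
    higher than strictly necessary.
    """
    best = start
    best_cost = sad(block_at(reference, best, size), target)
    comparisons = 1
    improved = True
    while improved:
        improved = False
        for candidate in (best - 1, best + 1):
            # Respect the frame borders and the pre-defined search range.
            if candidate < 0 or candidate + size > len(reference):
                continue
            if abs(candidate - start) > search_range:
                continue
            cost = sad(block_at(reference, candidate, size), target)
            comparisons += 1
            if cost < best_cost:
                best, best_cost, improved = candidate, cost, True
                break
    return best, best_cost, comparisons


reference = list(range(0, 200, 10))   # a smooth ramp, 20 samples
target = reference[9:13]              # the block really sits at position 9

print(full_search(reference, target, start=6, size=4, search_range=8))
print(local_search(reference, target, start=6, size=4, search_range=8))
```

Both searches land on the same position here, but the greedy one needs fewer comparisons. On real video the cost landscape is bumpy, so a greedy search can get stuck in a local minimum, which is part of why encoders offer several search patterns.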

Last edited by LoRd_MuldeR; 21st October 2009 at 11:31.
Old 21st October 2009, 13:24   #15  |  Link
koliva
To understand the process and its complexity better: is there any way to see how many comparisons are done for a P frame and a B frame in a movie encoded with a recent encoder? Is there any tool for that purpose?
Old 26th October 2009, 01:03   #16  |  Link
ConsciousEffect
Registered User
 
Join Date: Feb 2009
Posts: 20
Hey LoRd_MuldeR, I would be interested in understanding H.264 at a level of comprehension similar to yours. The problem with complex topics like this is finding the proper resources and reading them in the right order. For example, it's important to understand the basic framework of the system in question before you start trying to comprehend the formulas. Anyway, I would be interested in any resources you recommend.