Intra Frame bit stream format in H.264

Rouhi · 24th May 2011, 00:08

As we know, the compressed data stored in I-frames in the H.264 standard can be categorised into two main types:
1- Discrete Cosine Transform (or, more precisely, integer transform) coefficients. This data covers the part of each block that could not be encoded by intra frame prediction.
2- Intra frame prediction codes. These codes range from 0 to 8 for 4x4 blocks and from 0 to 3 for 16x16 blocks.
Do you have any clue how we can access this low-level data and how it is stored inside the frames? I know it is in Golomb code format. What's your idea?

imcold · 24th May 2011, 07:17

You should read the specification of the standard, section "Syntax and semantics". That's about the best explanation you can find. You'll have to parse the bitstream as any h.264 decoder does to get info about each macroblock.
I16x16 coding for example, assuming CAVLC coding (with CABAC the macroblock data isn't exp-golomb coded at all):
-mb_type: unsigned exp-golomb <1..24> based on CBP for luma & chroma and prediction type
-chroma prediction: unsigned eg
-mb_qp_delta: signed eg
-luma DC block
-luma AC blocks
-chroma DC blocks
-chroma AC blocks
Some parts are optional, based on CBP.
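To make the "unsigned eg" / "signed eg" entries above concrete, here is a minimal sketch of an Exp-Golomb reader. The `BitReader` class and its method names are made up for illustration and are not taken from any particular decoder; it also assumes emulation prevention bytes have already been removed from the payload.

```python
class BitReader:
    """Hypothetical helper: reads bits MSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read_bit(self) -> int:
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """Unsigned Exp-Golomb: count leading zeros, skip the 1, read that many bits."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)

    def read_se(self) -> int:
        """Signed Exp-Golomb: odd code numbers map to positive values, even to negative."""
        k = self.read_ue()
        return (k + 1) // 2 if k & 1 else -(k // 2)


# Bits 1 | 010 | 011 | 00100 packed MSB-first into two bytes.
reader = BitReader(bytes([0xA6, 0x40]))
values = [reader.read_ue() for _ in range(4)]
```

Because every code has a different length, the reader has to walk the bits in order, which is why nothing here starts on a byte boundary.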

Rouhi · 27th May 2011, 04:26

Do you think the adjacency of predicted block codes can be tracked in coded video without decompressing it?
I hope you understand what I mean. Suppose the intra prediction code of a 4x4 block is 3, and my file pointer is on this coded block. Can I find the adjacent coded blocks in the compressed file? For example, what are the codes of the top, left, right or lower blocks, if they exist?

imcold · 27th May 2011, 07:48

You can't know the length of (macro)blocks without parsing them first, so seeking to get prediction info only is impossible. Also, blocks don't start on byte boundaries.

Rouhi · 31st May 2011, 03:27

Absolutely, without parsing it is impossible. But in my question I said without "decompressing"....
Suppose I have parsed the video data and my file pointer is on the prediction code of a 4x4 block, and suppose it has a value, for example 3 (one of the 9 intra prediction codes), ok?
I just want to know how I can find the prediction codes of the other 4x4 blocks that are near this block. The point is that I am still in the compressed domain and don't know where the top or left or... is. So in this case, is it possible to find the codes of the blocks that are spatially around our file pointer in the compressed domain?
Put another way, my question is whether it is possible to find spatial direction (top, bottom, left and right) in the compressed domain of a video file in MPEG-4 AVC format.

imcold · 31st May 2011, 05:30

If you can parse the slice data correctly, then yes: it is possible "to find the other blocks coefficients around", by having a macroblock array/table - built up while parsing and keeping at least the previous and current row of MBs. You'll need to keep at least the prediction info and nonzero counts in the table. You know the image width & height in MB units, so of course you know where the top/left/etc. macroblock is.
I'm talking about parsing because you don't need to fully decompress the data, I hoped that much was clear.
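A rough sketch of such a table, at macroblock level only. It assumes macroblocks arrive in plain raster order (no MBAFF, no FMO) and ignores slice boundaries, which also affect whether a neighbour is available. `MacroblockTable` and its methods are hypothetical names, and for simplicity it keeps every macroblock rather than just two rows.

```python
from typing import Any, Dict, Optional, Tuple


def neighbour_addresses(mb_addr: int, width_in_mbs: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (left, top) macroblock addresses, or None where the neighbour is outside the picture."""
    left = mb_addr - 1 if mb_addr % width_in_mbs != 0 else None
    top = mb_addr - width_in_mbs if mb_addr >= width_in_mbs else None
    return left, top


class MacroblockTable:
    """Hypothetical table filled while parsing: macroblock address -> parsed info."""

    def __init__(self, width_in_mbs: int):
        self.width_in_mbs = width_in_mbs
        self.info: Dict[int, Any] = {}

    def store(self, mb_addr: int, parsed_info: Any) -> None:
        # parsed_info would hold e.g. prediction modes and nonzero counts.
        self.info[mb_addr] = parsed_info

    def neighbours(self, mb_addr: int) -> Tuple[Any, Any]:
        """Return the stored info of the (left, top) neighbours, None if unavailable."""
        left, top = neighbour_addresses(mb_addr, self.width_in_mbs)
        return (self.info.get(left) if left is not None else None,
                self.info.get(top) if top is not None else None)


# A picture 4 macroblocks wide: address 5 sits in row 1, column 1.
table = MacroblockTable(width_in_mbs=4)
for addr, modes in [(0, "a"), (1, "b"), (4, "c"), (5, "d")]:
    table.store(addr, modes)
left_info, top_info = table.neighbours(5)
```

The spatial position comes from the macroblock address and the picture width, not from the byte offset in the file.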

Rouhi · 2nd June 2011, 00:45

Thanks for your reply. Regarding parsing the slice correctly, we had a discussion (which may still be going on) with Selur in this topic:

Bit stream structure of Access Unit and VCL NAL unit in MPEG-4

We could finally agree on the header bit stream of I, P and B frames. As an example, for I slices the header is 0x 00 00 01 X Y

0x 00 00 01 is the three-byte start code prefix.

X=25, 45 or 65 (hex) and consists of: forbidden_zero_bit + nal_ref_idc + nal_unit_type. nal_unit_type=5 (00101) is used for IDR pictures and I-frames.
X=01 means B-frame and X=41 means P-frame (nal_unit_type=1).

Y should be 00. If Y is not zero it means that the I-frame is sliced.
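The split of the X byte into its three fields can be sketched like this. `parse_nal_header` is just an illustrative name; the bit layout (1 + 2 + 5 bits) is the one described above.

```python
from typing import Tuple


def parse_nal_header(x: int) -> Tuple[int, int, int]:
    """Split the one-byte NAL unit header into its three fields."""
    forbidden_zero_bit = (x >> 7) & 0x01  # top bit
    nal_ref_idc = (x >> 5) & 0x03         # next 2 bits
    nal_unit_type = x & 0x1F              # low 5 bits
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type


# The X values mentioned in this post.
for x in (0x25, 0x45, 0x65, 0x41, 0x01):
    print(hex(x), parse_nal_header(x))
```

0x25, 0x45 and 0x65 differ only in nal_ref_idc (1, 2, 3); all three carry nal_unit_type 5.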

With this approach we got the same results from the videos with two different pieces of code.
If you have a look at that topic, maybe you can give me some more good advice like this.

imcold · 2nd June 2011, 10:23

Y is part of the slice header, you may start your bit-level parsing there to get more info about the slice (but afaik you won't get far if you don't have info from SPS/PPS).
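As a sketch of what "start your bit-level parsing there" could look like: the slice header begins with three unsigned exp-golomb fields, first_mb_in_slice, slice_type and pic_parameter_set_id. The function names below are made up for illustration, and the input is assumed to be the bytes right after X with emulation prevention bytes already removed. Everything after these three fields depends on SPS/PPS values.

```python
from typing import Tuple

SLICE_TYPE_NAMES = {0: "P", 1: "B", 2: "I", 3: "SP", 4: "SI"}


def read_ue(data: bytes, pos: int) -> Tuple[int, int]:
    """Read one unsigned exp-golomb value at bit position pos; return (value, new_pos)."""
    def bit(p: int) -> int:
        return (data[p >> 3] >> (7 - (p & 7))) & 1

    zeros = 0
    while bit(pos) == 0:
        zeros += 1
        pos += 1
    pos += 1  # skip the terminating 1 bit
    suffix = 0
    for _ in range(zeros):
        suffix = (suffix << 1) | bit(pos)
        pos += 1
    return (1 << zeros) - 1 + suffix, pos


def parse_slice_header_start(payload: bytes) -> Tuple[int, str, int]:
    """Parse only the first three slice header fields from the bytes after X."""
    pos = 0
    first_mb_in_slice, pos = read_ue(payload, pos)
    slice_type, pos = read_ue(payload, pos)
    pic_parameter_set_id, pos = read_ue(payload, pos)
    # slice_type values 5..9 mean the same as 0..4 (all slices of the picture share the type).
    return first_mb_in_slice, SLICE_TYPE_NAMES[slice_type % 5], pic_parameter_set_id


# Example bytes chosen by hand: bits 1 | 0001000 | 1 ...
fields = parse_slice_header_start(bytes([0x88, 0x80]))
```

A nonzero first_mb_in_slice means the slice does not start at the top-left macroblock, so that field, rather than the raw value of the Y byte, tells you whether the picture is split into several slices.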
What is your ultimate goal, by the way?

Rouhi · 6th June 2011, 04:16

According to the H.264 standard, the header contains only the 3-byte start code prefix and one byte X, the header that gives the VCL NAL unit type, as presented below:

0x00 00 01 X

But Y, the first byte after X, is part of the VCL or non-VCL data. If you think otherwise, please let me know your reference.
BTW for SPS the nal_unit_type (the 5 least significant bits of X) should be 7, and for PPS the nal_unit_type would be 8.
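For illustration, a minimal sketch of walking a byte stream and listing the nal_unit_type after each 00 00 01 start code prefix. `iter_nal_units` is a hypothetical helper, and the example bytes are hand-made rather than taken from a real video.

```python
from typing import Iterator, Tuple

START_CODE = b"\x00\x00\x01"


def iter_nal_units(stream: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (offset of the header byte X, nal_unit_type) for each start code found."""
    pos = stream.find(START_CODE)
    while pos != -1 and pos + 3 < len(stream):
        header = stream[pos + 3]
        yield pos + 3, header & 0x1F  # low 5 bits of X
        pos = stream.find(START_CODE, pos + 3)


# Hand-made stream: SPS (0x67), PPS (0x68), IDR slice (0x65), with dummy payload bytes.
example = bytes([0x00, 0x00, 0x00, 0x01, 0x67, 0xAA,
                 0x00, 0x00, 0x01, 0x68, 0xBB,
                 0x00, 0x00, 0x01, 0x65, 0x88])
found = list(iter_nal_units(example))
```

Types 7 and 8 would be collected first, since the slice data that follows refers back to them.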

You asked

Quote:

What is your ultimate goal, by the way?

To answer: I am looking for the 4x4 and 16x16 intra prediction modes (0 to 8 for 4x4 and 0 to 3 for 16x16 blocks) in the VCL NAL units.
For my goal, the information stored in the SPS does not seem very important, because it contains resolution and colour coding information. But the PPS, which contains information such as picture coding, picture partitioning into slices and entropy coding, would be more important for my research, although I am just looking for intra prediction modes and their location in the slices. Do you have any suggestions?

imcold · 6th June 2011, 10:04

I think you should take a look at the h264bitstream library. It's probably your best choice to get the data you want, if you don't want to get involved with the h.264 spec.
