Extract IDR and I frames from x264?

akropp · 15th April 2013, 18:05

I had code that worked with an older version of x264 (build 84) and was able to extract sps, pps, and I frames from the encoder output by looking for start codes of the form

00 00 00 01 0x__

where I would take the last byte, at position 4 (counting from 0), and AND it with 0x1f. Types 1, 5, and 6 I would save as I or P frames, and types 7 and 8 were sps and pps.
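The byte-mask check described above can be sketched as follows, assuming the caller has already located the byte that follows the start code. The helper names are hypothetical. In the H.264 NAL header, the low 5 bits are nal_unit_type and the two bits above them are nal_ref_idc, so 0x65 decodes to type 5 (IDR slice) and 0x41 to type 1 (non-IDR slice). Type 6 is SEI (supplemental enhancement information), not picture data. A type 1 slice can be an I or a P slice; the slice header, not the NAL type, says which.

```c
#include <stdint.h>

/* NAL header byte layout:
 *   forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)
 * Masking with 0x1f keeps only nal_unit_type. */
static int nal_unit_type(uint8_t header)
{
    return header & 0x1f;
}

static int nal_ref_idc(uint8_t header)
{
    return (header >> 5) & 0x03;
}

/* Names for the NAL types that come up in this thread. */
static const char *nal_type_name(int type)
{
    switch (type) {
    case 1: return "non-IDR slice";
    case 5: return "IDR slice";
    case 6: return "SEI";
    case 7: return "SPS";
    case 8: return "PPS";
    default: return "other";
    }
}
```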

I upgraded my library to x264 build 130. The sps and pps NAL units still have the 4-byte header, but IDR and I frames now have a 3-byte header of the form:

00 00 01 65

I can see this when I inspect the bytes at the nal.p_payload pointer returned when the encoder gives back a keyframe (determined by IS_X264_TYPE_I(pic_out.i_type)).

P frames still come in for me with the header

00 00 00 01 41

Did my old code work purely by happenstance, or am I missing something? Is this 3-byte header correct? Do I need to account for both 4-byte and 3-byte headers?

Also, can someone explain why, when I get an I frame from the encoder, it comes with 4 NAL units: sps, pps, idr and i frame? Are the IDR and I frame two separate frames?

Dark Shikari · 15th April 2013, 19:37

Only the first NAL in an access unit, the SPS, and the PPS need to have a 4-byte startcode. x264 uses 3-byte startcodes everywhere else that it's allowed to.
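Because of this, a parser has to accept both start-code lengths. The sketch below uses a hypothetical find_start_code helper to scan an Annex B buffer and report where each start code begins and whether it is 3 or 4 bytes long. The NAL header byte is then at offset + length. As a simplification, it treats a zero byte immediately before 00 00 01 as part of a 4-byte start code. Such a byte could also be trailing zero padding of the previous NAL.

```c
#include <stddef.h>
#include <stdint.h>

/* Find the next Annex B start code at or after `pos`.
 * Returns the offset of the first 0x00 of the start code, or -1 if none.
 * On success, *sc_len is 4 for 00 00 00 01 and 3 for 00 00 01. */
static long find_start_code(const uint8_t *buf, size_t len, size_t pos, int *sc_len)
{
    for (size_t i = pos; i + 3 <= len; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            /* A zero byte right before 00 00 01 makes it a 4-byte start code. */
            if (i > pos && buf[i - 1] == 0) {
                *sc_len = 4;
                return (long)(i - 1);
            }
            *sc_len = 3;
            return (long)i;
        }
    }
    return -1;
}
```

With this helper, the type check from the first post becomes `buf[off + sc_len] & 0x1f` instead of a fixed `buf[off + 4] & 0x1f`.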

akropp · 15th April 2013, 20:22

Thanks! That definitely clears up some questions.

Can you explain to me why I get an IDR followed by an I frame? That is, the nal array has 4 elements, and the first two are sps and pps.

I'm trying to modify my code that saves the video, and I'm not sure whether I should treat the IDR frame data as a separate frame entry in the header, or store the sps/pps/IDR/I frame information all as one frame blob. Do I need to remove the sps/pps information?

Guest · 15th April 2013, 21:25

Your questions don't make much sense and so AVC experts will find it very hard to answer them.

To make things more concrete, post a link to a sample stream and then say what you find unusual or problematic about it.

And what do you mean by "saves the video"? What format are you trying to output?

akropp · 15th April 2013, 22:57

Neuron, sorry for the lack of clarity, I'm not totally sure what I'm asking about myself. Bear with me as I'm sort of muddling my way through this.

Basically, I am working on an application that takes raw I420 frames from a webcam and sends them to x264 to compress as video. I am saving the video in an mp4 container. In the past, with an older version of x264, when a keyframe came from the encoder, I captured the sps and pps and stored those bytes in memory. When it came time to create the video file header, I would use the sps and pps to generate the STSD atom information. I have b_repeat_headers on, so every keyframe also comes with sps and pps data. When I saved frame data into the MDAT section of the video file, however, I didn't include the sps and pps data that came with a particular keyframe; I saved everything but the sps and pps.

For example, here is some code I found in muxers.c that my old code was based on. Depending on the NAL unit type, it either parses the sps and pps, or overwrites the start code of a slice or SEI NAL unit with the NAL length and later saves that to the mp4 file. This is what threw me off originally when I upgraded, and it prompted my original post about 3- or 4-byte start codes (this code doesn't work with 3-byte start codes).

Code:

int write_nalu_mp4( hnd_t handle, uint8_t *p_nalu, int i_size )
{
    mp4_t *p_mp4 = (mp4_t *)handle;
    GF_AVCConfigSlot *p_slot;
    uint8_t type = p_nalu[4] & 0x1f;   // assumes a 4-byte start code, so p_nalu[4] is the NAL header
    int psize;

    switch(type)
    {
    // sps
    case 0x07:
        if (!p_mp4->b_sps)
        {
            p_mp4->p_config->configurationVersion = 1;
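            // with a 4-byte start code, p_nalu[5..7] are the SPS profile_idc,
            // constraint flags and level_idc
            p_mp4->p_config->AVCProfileIndication = p_nalu[5];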
            p_mp4->p_config->profile_compatibility = p_nalu[6];
            p_mp4->p_config->AVCLevelIndication = p_nalu[7];
            p_slot = (GF_AVCConfigSlot *)malloc(sizeof(GF_AVCConfigSlot));
            p_slot->size = i_size - 4;
            p_slot->data = (char *)malloc(p_slot->size);
            memcpy(p_slot->data, p_nalu + 4, i_size - 4);
            gf_list_add(p_mp4->p_config->sequenceParameterSets, p_slot);
            p_slot = NULL;
            p_mp4->b_sps = 1;
        }
        break;

    // pps
    case 0x08:
        if (!p_mp4->b_pps)
        {
            p_slot = (GF_AVCConfigSlot *)malloc(sizeof(GF_AVCConfigSlot));
            p_slot->size = i_size - 4;
            p_slot->data = (char *)malloc(p_slot->size);
            memcpy(p_slot->data, p_nalu + 4, i_size - 4);
            gf_list_add(p_mp4->p_config->pictureParameterSets, p_slot);
            p_slot = NULL;
            p_mp4->b_pps = 1;
            if (p_mp4->b_sps)
                gf_isom_avc_config_update(p_mp4->p_file, p_mp4->i_track, 1, p_mp4->p_config);
        }
        break;

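    // slice, sei: overwrite the 4-byte start code in place with the
    // big-endian NAL length. This breaks if the start code is only 3 bytes.
    case 0x1:
    case 0x5:
    case 0x6:
        psize = i_size - 4;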
        memcpy(p_mp4->p_sample->data + p_mp4->p_sample->dataLength, p_nalu, i_size);
        p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 0] = (psize >> 24) & 0xff;
        p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 1] = (psize >> 16) & 0xff;
        p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 2] = (psize >> 8) & 0xff;
        p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 3] = (psize >> 0) & 0xff;
        p_mp4->p_sample->dataLength += i_size;
        break;
    }

    return i_size;
}
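The SPS/PPS branch above hands the parameter sets to GPAC to fill in the avcC box (an AVCDecoderConfigurationRecord) inside the STSD atom. The sketch below builds that record by hand instead. It assumes a single SPS and a single PPS passed without start codes, and 4-byte NAL length fields (lengthSizeMinusOne = 3) to match the 4-byte size prefixes written into MDAT. The name build_avcc is hypothetical and is not a GPAC or x264 API. For High profiles (profile_idc 100, 110, 122 or 144), ISO/IEC 14496-15 defines extra trailing fields, which this sketch omits. Here sps[1..3] are the same bytes that the muxers.c code reads as p_nalu[5..7], because that code still has the 4-byte start code in front.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build an AVCDecoderConfigurationRecord from one SPS and one PPS.
 * `sps` and `pps` are NAL units WITHOUT start codes (byte 0 is the NAL header).
 * Returns the number of bytes written, or 0 if the input or buffer is unusable. */
static size_t build_avcc(const uint8_t *sps, size_t sps_len,
                         const uint8_t *pps, size_t pps_len,
                         uint8_t *out, size_t out_cap)
{
    size_t need = 6 + 2 + sps_len + 1 + 2 + pps_len;
    if (sps_len < 4 || sps_len > 0xFFFF || pps_len > 0xFFFF || out_cap < need)
        return 0;

    size_t n = 0;
    out[n++] = 1;        /* configurationVersion */
    out[n++] = sps[1];   /* AVCProfileIndication  (profile_idc) */
    out[n++] = sps[2];   /* profile_compatibility (constraint flags) */
    out[n++] = sps[3];   /* AVCLevelIndication    (level_idc) */
    out[n++] = 0xFF;     /* 6 reserved bits + lengthSizeMinusOne = 3 */
    out[n++] = 0xE1;     /* 3 reserved bits + numOfSequenceParameterSets = 1 */
    out[n++] = (uint8_t)(sps_len >> 8);
    out[n++] = (uint8_t)(sps_len & 0xFF);
    memcpy(out + n, sps, sps_len);
    n += sps_len;
    out[n++] = 1;        /* numOfPictureParameterSets */
    out[n++] = (uint8_t)(pps_len >> 8);
    out[n++] = (uint8_t)(pps_len & 0xFF);
    memcpy(out + n, pps, pps_len);
    n += pps_len;
    return n;
}
```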

Since I am changing my code to use the latest x264, I am comparing against the x264.c that is in the source tree.

In x264.c it looks like the entire frame is being treated as a single unit. For P-frames (I'm not using B-frames), this is fine: the NAL array has one element. When I get a keyframe, the nal variable is an array of 4 items that, as far as I can tell, are:

NAL_SPS
NAL_PPS
NAL_SLICE_IDR
NAL_SEI

Code:

i_frame_size = cli_output.write_frame( hout, nal[0].p_payload, i_frame_size, &pic_out );

Compare that with the old x264 code I was using previously, which worked on each NAL unit independently:

Code:

for( i = 0; i < i_nal; i++ )
{
    int i_size;

    if( mux_buffer_size < nal[i].i_payload * 3/2 + 4 )
    {
        mux_buffer_size = nal[i].i_payload * 2 + 4;
        x264_free( mux_buffer );
        mux_buffer = x264_malloc( mux_buffer_size );
        if( !mux_buffer )
            return -1;
    }

    i_size = mux_buffer_size;
    x264_nal_encode( mux_buffer, &i_size, 1, &nal[i] );
    i_nalu_size = p_write_nalu( hout, mux_buffer, i_size );
    if( i_nalu_size < 0 )
        return -1;
    i_file += i_nalu_size;
}

So I guess my question is, when saving the output of the encoder to my mp4 container, should I include the sps/pps/idr/sei frame all as one chunk? Or should I drop the sps/pps (like I had previously, which was working), and split up the IDR and SEI frame into two separate frame entries? I'm not even sure what SEI is.

By frame entries, I mean what goes in the header: each keyframe gets an entry in the STSS table and a corresponding STSZ (sample size) entry, etc.
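The sample tables work per sample, not per NAL unit. stsz holds one size for every sample, and stss lists the 1-based numbers of the sync (key) samples. The sketch below records both for each access unit written to MDAT. The sample_table struct, add_sample function and fixed MAX_SAMPLES capacity are hypothetical simplifications; a real muxer would grow these arrays and serialize them into the boxes.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SAMPLES 1024   /* hypothetical fixed capacity for this sketch */

/* Hypothetical in-memory sample table: one entry per access unit in mdat. */
typedef struct {
    uint32_t sizes[MAX_SAMPLES];   /* later written as the stsz entries */
    uint32_t sync[MAX_SAMPLES];    /* later written as the stss entries (1-based) */
    size_t   sample_count;
    size_t   sync_count;
} sample_table;

/* Record one sample. `size` is the total byte count of the length-prefixed
 * NAL units stored for this access unit. Returns -1 if the table is full. */
static int add_sample(sample_table *t, uint32_t size, int is_keyframe)
{
    if (t->sample_count >= MAX_SAMPLES)
        return -1;
    t->sizes[t->sample_count++] = size;
    if (is_keyframe)
        t->sync[t->sync_count++] = (uint32_t)t->sample_count;  /* 1-based number */
    return 0;
}
```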

As it is, I am saving the entire chunk as one frame entry. This plays back properly in VLC and in my application, but in QuickTime Player all I get is green. From experience I am pretty sure this means it can't decode something, so I am trying to rule out that I am saving the actual frame data incorrectly (never mind whether I have mistakes in the header).

Hopefully that helps clarify my question, though maybe I am just totally confused about what is going on. If this still makes no sense, feel free to let me know. I know this post rambles and feels disjointed, but I am trying to include as much information as possible so that anyone can help me out.

Thanks!

Guest · 15th April 2013, 23:30

First, it's highly unusual to have SEIs *after* the IDR NALU. Are you sure you are parsing things correctly?

Again, you will have to post a source stream to get better help.

Why don't you let x264.exe create the MP4 file?

My understanding of MP4 is that the SPS/PPS are in header atoms and are stripped from the actual video atoms. But I could be misinformed.

akropp · 15th April 2013, 23:59

Neuron2, you are right. I misread the types! It is definitely SEI then IDR. That said, is an SEI+IDR considered one frame or is that two frames?

I was also under the impression that you should strip off sps/pps from video atoms. That's kind of how this all started out. When you write video atoms I've always seen them written with their size coded into the first 4 bytes. Usually, in the past, I've seen this done by overwriting the frame start code with the size like in the muxers.c example I posted above:

Code:

memcpy(p_mp4->p_sample->data + p_mp4->p_sample->dataLength, p_nalu, i_size);
p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 0] = (psize >> 24) & 0xff;
p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 1] = (psize >> 16) & 0xff;
p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 2] = (psize >> 8) & 0xff;
p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 3] = (psize >> 0) & 0xff;
p_mp4->p_sample->dataLength += i_size;

But if the SEI and IDR frames that come through have only a 3 byte header, should I fake an extra byte in the beginning to be able to write in a 4 byte size?

To answer your question about allowing x264 to write the headers, I wish I could, but the application isn't set up that way. It worked for a long time with the older version of x264, so I'm hoping any tweaks I need to make are reasonably minor.

Guest · 16th April 2013, 00:12

Quote:

Originally Posted by akropp

Neuron2, you are right. I misread the types! It is definitely SEI then IDR. That said, is an SEI+IDR considered one frame or is that two frames?

First, you are using incorrect terminology. You need to talk about "pictures" because if you have field structure, then an IDR may specify only one field and not a full frame.

That said, I believe you should store all the SEIs that precede a picture NALU together with that picture NALU. I hope you are also aware that there might be multiple slices, so don't think that there will necessarily only be one NALU for a picture.

Of course these points are moot if you can ensure that your encoder creates only frame structure with one slice.
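Assuming frame coding with one slice per picture, the grouping can be sketched as follows. SPS and PPS are dropped because they already live in the avcC record. Any SEI is kept with the slice that follows it, so each slice closes one MP4 sample. The function name and its integer-array interface are hypothetical. With multiple slices per picture, you would need to inspect first_mb_in_slice in each slice header instead of treating every slice as a new picture.

```c
#include <stddef.h>

/* Assign each NAL unit (given by nal_unit_type) to an output sample, assuming
 * frame coding with exactly one slice per picture.
 * SPS (7) and PPS (8) are dropped: sample_of[i] = -1.
 * Everything else (e.g. SEI, 6) joins the sample closed by the next slice (1 or 5).
 * Returns the number of complete samples; NAL units after the last slice
 * are given the index of an incomplete sample equal to that return value. */
static int group_into_samples(const int *types, size_t count, int *sample_of)
{
    int sample = 0;
    for (size_t i = 0; i < count; i++) {
        if (types[i] == 7 || types[i] == 8) {
            sample_of[i] = -1;
            continue;
        }
        sample_of[i] = sample;
        if (types[i] == 1 || types[i] == 5)
            sample++;   /* the slice closes this picture's sample */
    }
    return sample;
}
```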

Quote:

But if the SEI and IDR frames that come through have only a 3 byte header, should I fake an extra byte in the beginning to be able to write in a 4 byte size?

You have to do that, yes.
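Writing a fresh 4-byte length instead of overwriting the start code in place handles both start-code lengths with one code path. For a 3-byte start code, this is equivalent to adding an extra zero byte in front. The following is a minimal sketch, assuming the input is one NAL unit that begins with its start code; append_length_prefixed is a hypothetical name. When the start code was 3 bytes, the output is one byte longer than the input, so the sample buffer must be sized for that.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one Annex B NAL unit to `out` as a 4-byte big-endian length followed
 * by the NAL payload (header + body). Accepts 00 00 01 and 00 00 00 01.
 * Returns the number of bytes written, or 0 on error. */
static size_t append_length_prefixed(const uint8_t *nal, size_t nal_size,
                                     uint8_t *out, size_t out_cap)
{
    size_t sc;
    if (nal_size >= 4 && nal[0] == 0 && nal[1] == 0 && nal[2] == 0 && nal[3] == 1)
        sc = 4;
    else if (nal_size >= 3 && nal[0] == 0 && nal[1] == 0 && nal[2] == 1)
        sc = 3;
    else
        return 0;                       /* no start code at the front */

    size_t payload = nal_size - sc;     /* bytes after the start code */
    if (payload > 0xFFFFFFFFu || out_cap < payload + 4)
        return 0;

    out[0] = (uint8_t)(payload >> 24);
    out[1] = (uint8_t)(payload >> 16);
    out[2] = (uint8_t)(payload >> 8);
    out[3] = (uint8_t)payload;
    memcpy(out + 4, nal + sc, payload);
    return payload + 4;
}
```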

akropp · 16th April 2013, 00:20

Ah, thanks for the terminology correction. I knew I wasn't saying the right thing. Also, thanks for the helpful information. I'm going to poke around for a bit and hopefully if I get everything working report back!

akropp · 16th April 2013, 00:27

Neuron2, once I put in the extra zero byte for the SEI+IDR everything worked swimmingly. Thanks so much for your help and in clarifying everything!

Guest · 16th April 2013, 00:45

You're welcome and I am glad to be of assistance. Thanks also go to Dark Shikari for his assistance.
