Old 15th April 2013, 18:05   #1  |  Link
akropp
Registered User
 
Join Date: Sep 2009
Posts: 13
Extract IDR and I frames from x264?

I had code that was working with an older version of x264 (build 84) that was able to extract sps, pps, and i frames from the encoder output by looking for start codes of the form

00 00 00 01 0x__

where I would take that last byte, at index 4, and AND it with 0x1f. Types 1, 5, and 6 I would save as I or P frames, and types 7 or 8 were sps or pps.
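For reference, the masked value is the 5-bit nal_unit_type field in the low bits of the first NAL header byte. A minimal C sketch of that classification follows; the helper names `nal_unit_type` and `nal_type_name` are made up for illustration and are not part of x264.

```c
#include <assert.h>
#include <stdint.h>

/* The first byte after the start code is the NAL header:
 * forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits). */
static int nal_unit_type(uint8_t nal_header)
{
    return nal_header & 0x1f;
}

/* Hypothetical helper: readable names for the types discussed in this thread. */
static const char *nal_type_name(int type)
{
    assert(type >= 0 && type <= 31); /* nal_unit_type is a 5-bit field */
    switch (type) {
    case 1:  return "non-IDR slice";
    case 5:  return "IDR slice";
    case 6:  return "SEI";
    case 7:  return "SPS";
    case 8:  return "PPS";
    default: return "other";
    }
}
```

With the bytes quoted in this post, 0x65 & 0x1f gives 5 (IDR slice) and 0x41 & 0x1f gives 1 (non-IDR slice). Note that type 6 is SEI, which carries metadata rather than picture data.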

I upgraded my library to x264 build 130 and I still see that sps and pps NAL units have the 4 byte header but now IDR and I frames have a 3 byte header of the form:

00 00 01 65

I can see this when I look at the memory at the nal.p_payload pointer that is returned when the encoder gives back a keyframe (determined by IS_X264_TYPE_I(pic_out.i_type)).

P frames still come in for me with the header

00 00 00 01 41


Did my old code work purely by happenstance or am I missing something? Is this 3 byte header correct? Do I need to account for both 4 byte headers and 3 byte headers?

Also, can someone explain to me why, when I get an I frame from the encoder, it comes with 4 NAL units: sps, pps, idr and i frame? Are the IDR and I frame two separate frames?

Last edited by akropp; 15th April 2013 at 18:27.
Old 15th April 2013, 19:37   #2  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Only the first NAL in an access unit, the SPS, and the PPS need to have a 4-byte startcode. x264 uses 3-byte startcodes everywhere else that it's allowed to.
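A parser that wants to accept both forms can check for the 4-byte start code first and fall back to the 3-byte one. The following is a small sketch of that idea; `start_code_len` and `nal_type_at` are illustrative names, not x264 functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the length of the Annex B start code at the start of buf:
 * 4 for 00 00 00 01, 3 for 00 00 01, or 0 if neither is present. */
static size_t start_code_len(const uint8_t *buf, size_t size)
{
    assert(buf != NULL);
    if (size >= 4 && buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 1)
        return 4;
    if (size >= 3 && buf[0] == 0 && buf[1] == 0 && buf[2] == 1)
        return 3;
    return 0;
}

/* Returns the nal_unit_type of the NAL at buf, or -1 if there is no
 * start code or no header byte after it. */
static int nal_type_at(const uint8_t *buf, size_t size)
{
    size_t sc = start_code_len(buf, size);
    if (sc == 0 || size <= sc)
        return -1;
    return buf[sc] & 0x1f;
}
```

The 4-byte test has to come first, because the last three bytes of a 4-byte start code are themselves a valid 3-byte start code.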
Old 15th April 2013, 20:22   #3  |  Link
akropp
Registered User
 
Join Date: Sep 2009
Posts: 13
Thanks! That definitely clears up some questions.

Can you explain to me why I get an IDR followed by an I frame? So an nal array of 4 elements. The first two are sps and pps.

I'm trying to modify my code that saves the video, and I'm not sure if I should treat the IDR frame data as a separate frame entry in the header, or have the sps/pps/IDR/I frame information all as one frame blob? Do I need to remove the sps/pps information?

Last edited by akropp; 15th April 2013 at 20:32.
Old 15th April 2013, 21:25   #4  |  Link
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,901
Your questions don't make much sense and so AVC experts will find it very hard to answer them.

To make things more concrete, post a link to a sample stream and then say what you find unusual or problematic about it.

And what do you mean by "saves the video"? What format are you trying to output?
Old 15th April 2013, 22:57   #5  |  Link
akropp
Registered User
 
Join Date: Sep 2009
Posts: 13
Neuron2, sorry for the lack of clarity, I'm not totally sure what I'm asking about myself. Bear with me as I'm sort of muddling my way through this.

Basically I am working on an application that takes raw I420 frames from a webcam and sends them to x264 to compress as video. I am saving the video in an mp4 container. In the past, with an older version of x264, when a key frame came from the encoder, I captured the sps and pps and stored those bytes in memory. When it came time to create the video file header, I would use the sps and pps information to generate STSD atom information. I have b_repeat_headers on, so when a keyframe comes it also comes with sps and pps data. When I saved frame data into the MDAT section of the video file, however, I didn't include any sps and pps data that came with a particular keyframe. I saved everything but the sps and pps.

For example, here is some code I found in muxers.c that my old code was based off of. Depending on the NAL unit type, it either parses the sps and pps, or strips off the NAL leading code of a frame and then later would save that to the mp4 file. This is what threw me off originally when I upgraded, prompting my original post asking about 3 or 4 byte start codes (since this code below doesn't work with 3 byte start codes).


Code:
int write_nalu_mp4( hnd_t handle, uint8_t *p_nalu, int i_size )
{
    mp4_t *p_mp4 = (mp4_t *)handle;
    GF_AVCConfigSlot *p_slot;
    uint8_t type = p_nalu[4] & 0x1f;
    int psize;

    switch(type)
    {
    // sps
    case 0x07:
        if (!p_mp4->b_sps)
        {
            p_mp4->p_config->configurationVersion = 1;
            p_mp4->p_config->AVCProfileIndication = p_nalu[5];
            p_mp4->p_config->profile_compatibility = p_nalu[6];
            p_mp4->p_config->AVCLevelIndication = p_nalu[7];
            p_slot = (GF_AVCConfigSlot *)malloc(sizeof(GF_AVCConfigSlot));
            p_slot->size = i_size - 4;
            p_slot->data = (char *)malloc(p_slot->size);
            memcpy(p_slot->data, p_nalu + 4, i_size - 4);
            gf_list_add(p_mp4->p_config->sequenceParameterSets, p_slot);
            p_slot = NULL;
            p_mp4->b_sps = 1;
        }
        break;

    // pps
    case 0x08:
        if (!p_mp4->b_pps)
        {
            p_slot = (GF_AVCConfigSlot *)malloc(sizeof(GF_AVCConfigSlot));
            p_slot->size = i_size - 4;
            p_slot->data = (char *)malloc(p_slot->size);
            memcpy(p_slot->data, p_nalu + 4, i_size - 4);
            gf_list_add(p_mp4->p_config->pictureParameterSets, p_slot);
            p_slot = NULL;
            p_mp4->b_pps = 1;
            if (p_mp4->b_sps)
                gf_isom_avc_config_update(p_mp4->p_file, p_mp4->i_track, 1, p_mp4->p_config);
        }
        break;

    // slice, sei
    case 0x1:
    case 0x5:
    case 0x6:
        // Payload size without the start code; assumes a 4-byte start code.
        psize = i_size - 4;
        // Append the NAL, start code included, to the current sample...
        memcpy(p_mp4->p_sample->data + p_mp4->p_sample->dataLength, p_nalu, i_size);
        // ...then overwrite the 4 start code bytes with the big-endian payload size.
        p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 0] = (psize >> 24) & 0xff;
        p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 1] = (psize >> 16) & 0xff;
        p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 2] = (psize >> 8) & 0xff;
        p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 3] = (psize >> 0) & 0xff;
        p_mp4->p_sample->dataLength += i_size;
        break;
    }

    return i_size;
}
Since I am changing my code to use the latest x264, I am comparing against the x264.c that is in the source tree.

In x264.c it looks like the entire frame is being treated as a single unit. For p-frames (I'm not using b-frames), this is fine. The NAL array has one element and that's great. When I get a keyframe, the nal variable is an array of 4 items that, as far as I can tell, are:

NAL_SPS
NAL_PPS
NAL_SLICE_IDR
NAL_SEI


Code:
i_frame_size = cli_output.write_frame( hout, nal[0].p_payload, i_frame_size, &pic_out );
This is in comparison to this old x264 code which I was using previously which did work on each NAL unit independently:

Code:
for( i = 0; i < i_nal; i++ )
{
    int i_size;

    if( mux_buffer_size < nal[i].i_payload * 3/2 + 4 )
    {
        mux_buffer_size = nal[i].i_payload * 2 + 4;
        x264_free( mux_buffer );
        mux_buffer = x264_malloc( mux_buffer_size );
        if( !mux_buffer )
            return -1;
    }

    i_size = mux_buffer_size;
    x264_nal_encode( mux_buffer, &i_size, 1, &nal[i] );
    i_nalu_size = p_write_nalu( hout, mux_buffer, i_size );
    if( i_nalu_size < 0 )
        return -1;
    i_file += i_nalu_size;
}

So I guess my question is, when saving the output of the encoder to my mp4 container, should I include the sps/pps/idr/sei frame all as one chunk? Or should I drop the sps/pps (like I had previously, which was working), and split up the IDR and SEI frame into two separate frame entries? I'm not even sure what SEI is.

By frame entries, I mean what goes in the header. So each keyframe gets an entry in the STSS table and it has a corresponding STSZ (sample size) entry, etc.
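The bookkeeping described here can be sketched as a small in-memory table: every sample gets a size entry (STSZ), and only sync samples get an entry in STSS, which stores 1-based sample numbers. The type `sample_table_t`, the function `sample_table_add`, and the fixed `MAX_SAMPLES` cap are assumptions made for this illustration, not code from the poster's application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SAMPLES 1024 /* arbitrary cap for this sketch */

typedef struct {
    uint32_t sizes[MAX_SAMPLES]; /* stsz: one size per sample */
    uint32_t sync[MAX_SAMPLES];  /* stss: 1-based numbers of sync samples */
    uint32_t sample_count;
    uint32_t sync_count;
} sample_table_t;

/* Records one sample (all NAL units stored for one picture).
 * Returns 0 on success, -1 if the table is full. */
static int sample_table_add(sample_table_t *t, uint32_t size, int is_sync)
{
    assert(t != NULL);
    if (t->sample_count >= MAX_SAMPLES)
        return -1;
    t->sizes[t->sample_count++] = size;
    if (is_sync)
        t->sync[t->sync_count++] = t->sample_count; /* count is now the 1-based number */
    return 0;
}
```

In this model a keyframe stored as one chunk contributes one STSZ entry and one STSS entry, not one entry per NAL unit.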

As it is, I am saving the entire chunk as one frame entry. This plays back properly in VLC and in my application, but in QuickTime player all I get is green. From experience I am pretty sure this means it can't decode something, so I am trying to rule out that I am saving actual frame data incorrectly (let alone if I have mistakes in the header).

Hopefully that helps clarify my question, though maybe I am just totally confused as to what is going on. If this still makes no sense, feel free to let me know. I know this post ranted on and feels disjointed, but I am trying to put as much information as possible to help anyone to help me out.

Thanks!

Last edited by akropp; 15th April 2013 at 23:03.
Old 15th April 2013, 23:30   #6  |  Link
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,901
First, it's highly unusual to have SEIs *after* the IDR NALU. Are you sure you are parsing things correctly?

Again, you will have to post a source stream to get better help.

Why don't you let x264.exe create the MP4 file?

My understanding of MP4 is that the SPS/PPS are in header atoms and are stripped from the actual video atoms. But I could be misinformed.
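For context, the header atom in question is the avcC box inside STSD, whose payload is an AVCDecoderConfigurationRecord. Below is a hedged sketch of serializing one with a single SPS and a single PPS. The function name `build_avcc` is hypothetical, and the sketch leaves out the extra trailing fields that the specification defines for High and related profiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes an AVCDecoderConfigurationRecord holding one SPS and one PPS.
 * sps and pps are raw NAL units (header byte first, no start code).
 * Returns the number of bytes written, or 0 on bad input or a short buffer. */
static size_t build_avcc(uint8_t *out, size_t out_size,
                         const uint8_t *sps, size_t sps_size,
                         const uint8_t *pps, size_t pps_size)
{
    assert(out != NULL && sps != NULL && pps != NULL);
    if (sps_size < 4 || sps_size > 0xffff || pps_size < 1 || pps_size > 0xffff)
        return 0;
    size_t need = 6 + 2 + sps_size + 1 + 2 + pps_size;
    if (out_size < need)
        return 0;

    size_t n = 0;
    out[n++] = 1;        /* configurationVersion */
    out[n++] = sps[1];   /* AVCProfileIndication */
    out[n++] = sps[2];   /* profile_compatibility */
    out[n++] = sps[3];   /* AVCLevelIndication */
    out[n++] = 0xfc | 3; /* reserved bits + lengthSizeMinusOne (4-byte lengths) */
    out[n++] = 0xe0 | 1; /* reserved bits + numOfSequenceParameterSets */
    out[n++] = (uint8_t)(sps_size >> 8);
    out[n++] = (uint8_t)(sps_size & 0xff);
    memcpy(out + n, sps, sps_size);
    n += sps_size;
    out[n++] = 1;        /* numOfPictureParameterSets */
    out[n++] = (uint8_t)(pps_size >> 8);
    out[n++] = (uint8_t)(pps_size & 0xff);
    memcpy(out + n, pps, pps_size);
    n += pps_size;
    return n;
}
```

The lengthSizeMinusOne value of 3 is what tells a demuxer that each NAL unit in the sample data is preceded by a 4-byte length rather than a start code.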
Old 15th April 2013, 23:59   #7  |  Link
akropp
Registered User
 
Join Date: Sep 2009
Posts: 13
Neuron2, you are right. I misread the types! It is definitely SEI then IDR. That said, is an SEI+IDR considered one frame or is that two frames?

I was also under the impression that you should strip off sps/pps from video atoms. That's kind of how this all started out. When you write video atoms I've always seen them written with their size coded into the first 4 bytes. Usually, in the past, I've seen this done by overwriting the frame start code with the size like in the muxers.c example I posted above:


Code:
memcpy(p_mp4->p_sample->data + p_mp4->p_sample->dataLength, p_nalu, i_size);
p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 0] = (psize >> 24) & 0xff;
p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 1] = (psize >> 16) & 0xff;
p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 2] = (psize >> 8) & 0xff;
p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 3] = (psize >> 0) & 0xff;
p_mp4->p_sample->dataLength += i_size;
But if the SEI and IDR frames that come through have only a 3 byte header, should I fake an extra byte in the beginning to be able to write in a 4 byte size?

To answer your question about allowing x264 to write the headers, I wish I could but the application isn't set up that way. It's worked for a long time in the past with the older version of x264, so I'm hoping that any tweaks I need to make are actually reasonably minor.
Old 16th April 2013, 00:12   #8  |  Link
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,901
Quote:
Originally Posted by akropp
Neuron2, you are right. I misread the types! It is definitely SEI then IDR. That said, is an SEI+IDR considered one frame or is that two frames?
First, you are using incorrect terminology. You need to talk about "pictures" because if you have field structure, then an IDR may specify only one field and not a full frame.

That said, I believe you should store all the SEIs that precede a picture NALU together with that picture NALU. I hope you are also aware that there might be multiple slices, so don't think that there will necessarily only be one NALU for a picture.

Of course these points are moot if you can ensure that your encoder creates only frame structure with one slice.
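One way to read this advice: walk all NAL units the encoder returned for a picture, skip the parameter sets, and pack everything else (SEIs and every slice) into a single sample. The sketch below assumes a made-up descriptor type `nal_unit_t` whose data already has the start code removed; it stands in for whatever the encoder API provides and is not the x264 structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an encoder's NAL descriptor. */
typedef struct {
    int            type; /* nal_unit_type: 1, 5, 6, 7, 8, ... */
    const uint8_t *data; /* NAL bytes with the start code already stripped */
    size_t         size; /* assumed to fit in 32 bits */
} nal_unit_t;

/* Packs every non-parameter-set NAL of one picture into a single sample,
 * each prefixed by its 4-byte big-endian length.
 * Returns the sample size, or 0 if out is too small. */
static size_t pack_sample(uint8_t *out, size_t out_size,
                          const nal_unit_t *nals, size_t count)
{
    assert(out != NULL && nals != NULL);
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (nals[i].type == 7 || nals[i].type == 8)
            continue; /* SPS/PPS are kept in the header atom instead */
        size_t room = out_size - n;
        if (room < 4 || nals[i].size > room - 4)
            return 0;
        uint32_t len = (uint32_t)nals[i].size;
        out[n++] = (uint8_t)(len >> 24);
        out[n++] = (uint8_t)(len >> 16);
        out[n++] = (uint8_t)(len >> 8);
        out[n++] = (uint8_t)len;
        memcpy(out + n, nals[i].data, nals[i].size);
        n += nals[i].size;
    }
    return n;
}
```

Because the loop does not care how many slices or SEIs there are, the same code covers the single-slice case and multi-slice pictures.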

Quote:
But if the SEI and IDR frames that come through have only a 3 byte header, should I fake an extra byte in the beginning to be able to write in a 4 byte size?
You have to do that, yes.
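An alternative to padding the input is to strip whichever start code is present and write a fresh 4-byte length in front of the payload. This is a sketch under that assumption; `annexb_to_length_prefixed` is an illustrative name, not an x264 or GPAC function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies one Annex B NAL (3- or 4-byte start code) into out, replacing the
 * start code with a 4-byte big-endian payload length.
 * Returns the number of bytes written, or 0 on error. */
static size_t annexb_to_length_prefixed(uint8_t *out, size_t out_size,
                                        const uint8_t *nal, size_t size)
{
    assert(out != NULL && nal != NULL);
    size_t sc;
    if (size >= 4 && nal[0] == 0 && nal[1] == 0 && nal[2] == 0 && nal[3] == 1)
        sc = 4;
    else if (size >= 3 && nal[0] == 0 && nal[1] == 0 && nal[2] == 1)
        sc = 3;
    else
        return 0; /* no start code found */

    size_t payload = size - sc;
    if (out_size < 4 || payload > out_size - 4)
        return 0; /* output buffer too small */

    out[0] = (uint8_t)(payload >> 24);
    out[1] = (uint8_t)(payload >> 16);
    out[2] = (uint8_t)(payload >> 8);
    out[3] = (uint8_t)payload;
    memcpy(out + 4, nal + sc, payload);
    return 4 + payload;
}
```

For a 3-byte start code the output is one byte longer than the input, which is why the overwrite-in-place approach from the old muxers.c code only works after an extra leading zero byte has been added.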
Old 16th April 2013, 00:20   #9  |  Link
akropp
Registered User
 
Join Date: Sep 2009
Posts: 13
Ah, thanks for the terminology correction. I knew I wasn't saying the right thing. Also, thanks for the helpful information. I'm going to poke around for a bit and, if I get everything working, I'll report back!
Old 16th April 2013, 00:27   #10  |  Link
akropp
Registered User
 
Join Date: Sep 2009
Posts: 13
Neuron2, once I put in the extra zero byte for the SEI+IDR everything worked swimmingly. Thanks so much for your help in clarifying everything!
Old 16th April 2013, 00:45   #11  |  Link
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,901
You're welcome and I am glad to be of assistance. Thanks also go to Dark Shikari for his assistance.