15th April 2013, 18:05 | #1 | Link |
Registered User
Join Date: Sep 2009
Posts: 13
|
Extract IDR and I frames from x264?
I had code working with an older version of x264 (build 84) that extracted SPS, PPS, and I frames from the encoder output by looking for start codes of the form
00 00 00 01 0x__, where I would take the last byte (position 4) and AND it with 0x1f. Types 1, 5, and 6 I would save as I or P frames, and types 7 or 8 were SPS or PPS.
I upgraded my library to x264 build 130. SPS and PPS NAL units still have the 4-byte header, but IDR and I frames now have a 3-byte header of the form: 00 00 01 65
I can see this when I look at the memory at the nal.p_payload pointer that is returned when the encoder gives back a keyframe (determined by IS_X264_TYPE_I(pic_out.i_type)). P frames still come in for me with the header 00 00 00 01 41
Did my old code work purely by happenstance, or am I missing something? Is this 3-byte header correct? Do I need to account for both 4-byte and 3-byte headers?
Also, can someone explain why an I frame from the encoder comes with 4 NAL units: SPS, PPS, IDR, and I frame? Are the IDR and the I frame two separate frames?
Last edited by akropp; 15th April 2013 at 18:27. |
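For reference, a minimal sketch of a scanner that accepts both start-code lengths asked about above. It assumes the encoder output is an Annex B byte stream, where a NAL unit may be introduced by either 00 00 01 or 00 00 00 01. The names find_start_code and nal_type_at are made up for illustration and are not part of x264.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: finds the next start code at or after `pos`.
 * Returns the offset of its first byte and stores its length (3 or 4)
 * in *sc_len. Returns `size` and sets *sc_len to 0 if none is found. */
static size_t find_start_code(const uint8_t *buf, size_t size, size_t pos,
                              int *sc_len)
{
    assert(buf != NULL && sc_len != NULL);
    for (size_t i = pos; i + 3 <= size; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            /* A zero byte directly before 00 00 01 makes it the 4-byte form. */
            if (i > pos && buf[i - 1] == 0) {
                *sc_len = 4;
                return i - 1;
            }
            *sc_len = 3;
            return i;
        }
    }
    *sc_len = 0;
    return size;
}

/* Hypothetical helper: nal_unit_type is the low 5 bits of the first byte
 * after the start code. Returns -1 if that byte lies outside the buffer. */
static int nal_type_at(const uint8_t *buf, size_t size, size_t sc_pos,
                       int sc_len)
{
    if (sc_pos + (size_t)sc_len >= size)
        return -1;
    return buf[sc_pos + (size_t)sc_len] & 0x1f;
}
```

Reading the type byte at sc_pos + sc_len instead of a fixed index 4 is the only change needed to stop depending on the start-code length.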
15th April 2013, 20:22 | #3 | Link |
Registered User
Join Date: Sep 2009
Posts: 13
|
Thanks! That definitely clears up some questions.
Can you explain to me why I get an IDR followed by an I frame? So a nal array of 4 elements. The first two are sps and pps. I'm trying to modify my code that saves the video, and I'm not sure if I should treat the IDR frame data as a separate frame entry in the header, or have the sps/pps/IDR/I frame information all as one frame blob? Do I need to remove the sps/pps information? Last edited by akropp; 15th April 2013 at 20:32. |
15th April 2013, 21:25 | #4 | Link |
Guest
Join Date: Jan 2002
Posts: 21,901
|
Your questions don't make much sense and so AVC experts will find it very hard to answer them.
To make things more concrete, post a link to a sample stream and then say what you find unusual or problematic about it. And what do you mean by "saves the video"? What format are you trying to output? |
15th April 2013, 22:57 | #5 | Link |
Registered User
Join Date: Sep 2009
Posts: 13
|
Neuron, sorry for the lack of clarity, I'm not totally sure what I'm asking about myself. Bear with me as I'm sort of muddling my way through this.
Basically I am working on an application that takes raw I420 frames from a webcam and sends them to x264 to compress as video. I am saving the video in an mp4 container. In the past, with an older version of x264, when a key frame came from the encoder, I captured the sps and pps and stored those bytes in memory. When it came time to create the video file header, I would use the sps and pps information to generate the STSD atom information. I have b_repeat_headers on, so when a keyframe comes it also comes with sps and pps data. When I saved frame data into the MDAT section of the video file, however, I didn't include any sps and pps data that came with a particular keyframe. I saved everything but the sps and pps. For example, here is some code I found in muxers.c that my old code was based on. Depending on the NAL unit type, it either parses the sps and pps, or strips off the leading NAL start code of a frame and later saves that to the mp4 file. This is what threw me off originally when I upgraded, prompting my original post asking about 3 or 4 byte start codes (since the code below doesn't work with 3 byte start codes). Code:
int write_nalu_mp4( hnd_t handle, uint8_t *p_nalu, int i_size )
{
    mp4_t *p_mp4 = (mp4_t *)handle;
    GF_AVCConfigSlot *p_slot;
    uint8_t type = p_nalu[4] & 0x1f; // assumes a 4-byte start code
    int psize;

    switch(type)
    {
        // sps
        case 0x07:
            if (!p_mp4->b_sps)
            {
                p_mp4->p_config->configurationVersion = 1;
                p_mp4->p_config->AVCProfileIndication = p_nalu[5];
                p_mp4->p_config->profile_compatibility = p_nalu[6];
                p_mp4->p_config->AVCLevelIndication = p_nalu[7];
                p_slot = (GF_AVCConfigSlot *)malloc(sizeof(GF_AVCConfigSlot));
                p_slot->size = i_size - 4;
                p_slot->data = (char *)malloc(p_slot->size);
                memcpy(p_slot->data, p_nalu + 4, i_size - 4);
                gf_list_add(p_mp4->p_config->sequenceParameterSets, p_slot);
                p_slot = NULL;
                p_mp4->b_sps = 1;
            }
            break;

        // pps
        case 0x08:
            if (!p_mp4->b_pps)
            {
                p_slot = (GF_AVCConfigSlot *)malloc(sizeof(GF_AVCConfigSlot));
                p_slot->size = i_size - 4;
                p_slot->data = (char *)malloc(p_slot->size);
                memcpy(p_slot->data, p_nalu + 4, i_size - 4);
                gf_list_add(p_mp4->p_config->pictureParameterSets, p_slot);
                p_slot = NULL;
                p_mp4->b_pps = 1;
                if (p_mp4->b_sps)
                    gf_isom_avc_config_update(p_mp4->p_file, p_mp4->i_track, 1, p_mp4->p_config);
            }
            break;

        // slice, sei
        case 0x1:
        case 0x5:
        case 0x6:
            psize = i_size - 4;
            memcpy(p_mp4->p_sample->data + p_mp4->p_sample->dataLength, p_nalu, i_size);
            // overwrite the 4 start-code bytes with the big-endian NAL size
            p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 0] = (psize >> 24) & 0xff;
            p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 1] = (psize >> 16) & 0xff;
            p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 2] = (psize >> 8) & 0xff;
            p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 3] = (psize >> 0) & 0xff;
            p_mp4->p_sample->dataLength += i_size;
            break;
    }

    return i_size;
}
In x264.c it looks like the entire frame is being treated as a single unit. For p-frames (I'm not using b-frames), this is fine. The NAL array has one element and that's great. When I get a keyframe, the nal variable is an array of 4 items that, as far as I can tell, are:
NAL_SPS
NAL_PPS
NAL_SLICE_IDR
NAL_SEI
Code:
i_frame_size = cli_output.write_frame( hout, nal[0].p_payload, i_frame_size, &pic_out ); Code:
for( i = 0; i < i_nal; i++ )
{
    int i_size;

    if( mux_buffer_size < nal[i].i_payload * 3/2 + 4 )
    {
        mux_buffer_size = nal[i].i_payload * 2 + 4;
        x264_free( mux_buffer );
        mux_buffer = x264_malloc( mux_buffer_size );
        if( !mux_buffer )
            return -1;
    }

    i_size = mux_buffer_size;
    x264_nal_encode( mux_buffer, &i_size, 1, &nal[i] );
    i_nalu_size = p_write_nalu( hout, mux_buffer, i_size );
    if( i_nalu_size < 0 )
        return -1;
    i_file += i_nalu_size;
}
So I guess my question is, when saving the output of the encoder to my mp4 container, should I include the sps/pps/idr/sei frame all as one chunk? Or should I drop the sps/pps (like I had previously, which was working), and split up the IDR and SEI frame into two separate frame entries? I'm not even sure what SEI is.
By frame entries, I mean what goes in the header. So each keyframe gets an entry in the STSS table and it has a corresponding STSZ (sample size) entry, etc. As it is, I am saving the entire chunk as one frame entry. This plays back properly in VLC and in my application, but in QuickTime player all I get is green. From experience I am pretty sure this means it can't decode something, so I am trying to rule out that I am saving the actual frame data incorrectly (let alone whether I have mistakes in the header).
Hopefully that helps clarify my question, though maybe I am just totally confused as to what is going on. If this still makes no sense, feel free to let me know. I know this post ranted on and feels disjointed, but I am trying to put in as much information as possible to help anyone help me out. Thanks! Last edited by akropp; 15th April 2013 at 23:03. |
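As a rough sketch of the "drop the sps/pps, keep the rest as one chunk" option described above: the function below walks an Annex B buffer, skips SPS and PPS, and writes every other NAL unit with a 4-byte big-endian length in place of its start code, whether that start code was 3 or 4 bytes long. It is not the muxers.c code. annexb_to_sample and next_001 are hypothetical names, and it assumes a 4-byte length field, which has to match what the avcC configuration in the STSD atom declares.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the offset of the next 00 00 01 at or after pos, or size. */
static size_t next_001(const uint8_t *b, size_t size, size_t pos)
{
    for (size_t i = pos; i + 3 <= size; i++)
        if (b[i] == 0 && b[i + 1] == 0 && b[i + 2] == 1)
            return i;
    return size;
}

/* Hypothetical helper: copies every NAL unit except SPS (7) and PPS (8)
 * from `in` to `out`, each prefixed with its 4-byte big-endian length.
 * Returns the number of bytes written, or SIZE_MAX if `out` is too small. */
static size_t annexb_to_sample(const uint8_t *in, size_t in_size,
                               uint8_t *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);
    size_t written = 0;
    size_t sc = next_001(in, in_size, 0);

    while (sc < in_size) {
        size_t start = sc + 3;                 /* first byte of the NAL unit */
        size_t next = next_001(in, in_size, start);
        size_t end = next;

        /* A NAL unit never ends in 0x00, so trailing zeros belong to the
         * next (4-byte) start code or are padding. */
        while (end > start && in[end - 1] == 0)
            end--;

        size_t len = end - start;
        if (len > 0) {
            int type = in[start] & 0x1f;
            if (type != 7 && type != 8) {
                if (out_cap - written < len + 4)
                    return SIZE_MAX;
                out[written + 0] = (uint8_t)((len >> 24) & 0xff);
                out[written + 1] = (uint8_t)((len >> 16) & 0xff);
                out[written + 2] = (uint8_t)((len >> 8) & 0xff);
                out[written + 3] = (uint8_t)(len & 0xff);
                memcpy(out + written + 4, in + start, len);
                written += len + 4;
            }
        }
        sc = next;
    }
    return written;
}
```

Because the length is written to a separate output buffer rather than over the start code in place, a 3-byte start code needs no special case.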
15th April 2013, 23:30 | #6 | Link |
Guest
Join Date: Jan 2002
Posts: 21,901
|
First it's highly unusual to have SEIs *after* the IDR NALU. Are you sure you are parsing things correctly?
Again, you will have to post a source stream to get better help. Why don't you let x264.exe create the MP4 file? My understanding of MP4 is that the SPS/PPS are in header atoms and are stripped from the actual video atoms. But I could be misinformed. |
15th April 2013, 23:59 | #7 | Link |
Registered User
Join Date: Sep 2009
Posts: 13
|
Neuron2, you are right. I misread the types! It is definitely SEI then IDR. That said, is an SEI+IDR considered one frame or is that two frames?
I was also under the impression that you should strip off sps/pps from video atoms. That's kind of how this all started out. When you write video atoms I've always seen them written with their size coded into the first 4 bytes. Usually, in the past, I've seen this done by overwriting the frame start code with the size like in the muxers.c example I posted above: Code:
memcpy(p_mp4->p_sample->data + p_mp4->p_sample->dataLength, p_nalu, i_size);
p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 0] = (psize >> 24) & 0xff;
p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 1] = (psize >> 16) & 0xff;
p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 2] = (psize >> 8) & 0xff;
p_mp4->p_sample->data[p_mp4->p_sample->dataLength + 3] = (psize >> 0) & 0xff;
p_mp4->p_sample->dataLength += i_size;
To answer your question about allowing x264 to write the headers, I wish I could but the application isn't set up that way. It's worked for a long time in the past with the older version of x264, so I'm hoping that any tweaks I need to make are actually reasonably minor. |
16th April 2013, 00:12 | #8 | Link | ||
Guest
Join Date: Jan 2002
Posts: 21,901
|
Quote:
That said, I believe you should store all the SEIs that precede a picture NALU together with that picture NALU. I hope you are also aware that there might be multiple slices, so don't think that there will necessarily only be one NALU for a picture. Of course these points are moot if you can ensure that your encoder creates only frame structure with one slice. Quote:
|
||
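To illustrate the advice about keeping SEIs with the picture they precede and allowing for several slices per picture, here is a rough sketch of assigning NAL units to samples. It rests on several assumptions: start codes are already stripped, the only slice types present are 1 and 5 (no data partitioning), no end-of-sequence or filler units appear, and arbitrary slice order is not in use, so the first slice of each picture has first_mb_in_slice == 0. The names nal_ref and group_into_samples are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one NAL unit without its start code. */
typedef struct {
    const uint8_t *data;
    size_t size;
} nal_ref;

/* Coded slice of a non-IDR (1) or IDR (5) picture. */
static int is_slice(const nal_ref *n)
{
    if (n->size == 0)
        return 0;
    int t = n->data[0] & 0x1f;
    return t == 1 || t == 5;
}

/* first_mb_in_slice is the first ue(v) field of the slice header; the
 * value 0 is coded as a single 1 bit, so testing the top bit is enough. */
static int first_mb_is_zero(const nal_ref *n)
{
    return n->size > 1 && (n->data[1] & 0x80) != 0;
}

/* Writes the sample index of each NAL unit into sample_of[] and returns
 * the number of samples. A new sample starts once the current one already
 * holds a slice and the next unit is either a non-slice unit (SEI, SPS,
 * PPS, ...) or a slice that begins at macroblock 0. */
static size_t group_into_samples(const nal_ref *nals, size_t count,
                                 size_t *sample_of)
{
    assert(count == 0 || (nals != NULL && sample_of != NULL));
    size_t sample = 0;
    int seen_slice = 0;

    for (size_t i = 0; i < count; i++) {
        int slice = is_slice(&nals[i]);
        if (seen_slice && (!slice || first_mb_is_zero(&nals[i]))) {
            sample++;
            seen_slice = 0;
        }
        if (slice)
            seen_slice = 1;
        sample_of[i] = sample;
    }
    return count ? sample + 1 : 0;
}
```

With an encoder configured for one slice per frame, this reduces to "everything up to and including the slice is one sample", which is the simple case mentioned in the reply above.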