Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
#1 | Link |
Registered User
Join Date: Feb 2014
Posts: 15
|
i-frame vs. atom samples?
SHORT VERSION: How would I chop the stsz table such that it begins at a keyframe, given a known offset of the keyframe into mdat?
LONG VERSION: Here's some data to consider with a test file.
ffprobe -show_frames gives I-frames at 10421, 221265, 434154, ...
If we subtract the atom headers and so on, so that we're only dealing with actual mdat data, that results in 0, 210844, 423733.
mp4box -diso gives the following entries in stsz:
<SampleSizeBox SampleCount="240">
<BoxInfo Size="980" Type="stsz"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<SampleSizeEntry Size="27558"/>
<SampleSizeEntry Size="1171"/>
and so on.
The problem is, when I add up the SampleSizeEntries, I cannot get the total to equal 210844 (nor 221265, just in case it's like stco and counts from the start of the file, though I doubt that's right). In other words, I would have thought I could simply add up the sizes and stop where the sum hits that number. But I must be misunderstanding something, since it never hits the number exactly. Before I go ahead and try adjusting other tables and play it out, I'd like to understand a bit better what I'm doing wrong here...
ADDITIONAL INFO: The docs say that when there is no stss atom, all samples should be treated as keyframes. That seems extremely strange... there is no stss atom here, but I know from ffprobe and the encoding parameters that not every sample here is a keyframe.
Thanks. If I didn't phrase my question right, feel free to answer in whatever way is most helpful for understanding the relationship between I-frames and stbl entries. |
#2 | Link |
Registered User
Join Date: Feb 2014
Posts: 15
|
OK, I got a few things wrong and was going at it the wrong way. I figured that instead of editing my question above, it's more helpful to leave it as is (Edison once said something like: he didn't fail 99 times in his experiments, he discovered 99 ways that definitely won't work).
Anyway, here's where I'm at now:
1) There *is* an stss atom; I was looking at the wrong trak. D'oh!
2) It seems the right approach is to seek based on time: look up the sample for that time in the stts atom and the nearest keyframe sample in the stss atom. (The QuickTime docs say to go to the keyframe sample before the desired sample; I think the lowest distance makes more sense, even if that's the next one up...)
So now I hope my refined question will be a little more on point.
Let's say we're able to jump to a specific sample in each trak which matches the nearest keyframe. Great, as far as the atoms are concerned. But how does this correspond to the offset in the actual mdat data?
On a similar note (not sure how to put this as a clear question): the times are not quite the same for the audio and video traks. Close, but not the same... how does that factor into the equation?
Thanks! |
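For reference, the time-based lookup described in point 2 could be sketched roughly as below. This is only an illustration under stated assumptions: the stts and stss tables are assumed to be already parsed into plain Python lists, the function names `sample_at_time` and `sync_sample_for` are made up for this sketch, and the example values at the bottom are hypothetical rather than taken from the test file.

```python
from bisect import bisect_right


def sample_at_time(stts, t):
    """Return the 1-based sample number whose duration covers media time t.

    stts is a list of (sample_count, sample_delta) runs, with t and the
    deltas expressed in the track's timescale units.
    """
    sample = 1      # first sample of the current run (1-based)
    elapsed = 0     # media time at the start of the current run
    for count, delta in stts:
        run = count * delta
        if t < elapsed + run:
            return sample + (t - elapsed) // delta
        elapsed += run
        sample += count
    return sample - 1  # t is past the end: clamp to the last sample


def sync_sample_for(stss, sample, nearest=False):
    """Pick a sync sample for `sample` from the sorted, 1-based stss list.

    By default the preceding sync sample is chosen; with nearest=True the
    closer of the preceding and following sync samples is chosen instead.
    """
    i = bisect_right(stss, sample)
    before = stss[i - 1] if i > 0 else stss[0]
    if not nearest or i == len(stss):
        return before
    after = stss[i]
    return after if after - sample < sample - before else before


# Hypothetical tables: 240 samples of 1001 ticks each at a timescale of 24000,
# with sync samples at 1, 73 and 145.
stts = [(240, 1001)]
stss = [1, 73, 145]
target = sample_at_time(stts, 3 * 24000)                 # seek to 3 seconds
preceding = sync_sample_for(stss, target)                # "preceding" policy
closest = sync_sample_for(stss, target, nearest=True)    # "nearest" policy
print(target, preceding, closest)
```

The two policies can disagree, as they do here. Starting from the preceding sync sample lets a decoder reach the requested frame exactly, while jumping to a later sync sample means playback begins after the requested time.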
#3 | Link |
Registered User
Join Date: Feb 2014
Posts: 15
|
Here is the test file:
http://we.tl/nAwN5e5KaQ
And here is my rough trace:
VIDEO TRAK
DESIRED SAMPLE BASED ON 3 SECOND SEEK: 71
CLOSEST KEYFRAME SAMPLE: 73
IN CHUNK: 8
CHUNK #8 OFFSET: 217886
SAMPLES IN CHUNK TILL TARGET: 2
NEED TO ADD THOSE SAMPLE OFFSETS:
SAMPLE 71 SIZE 2958
SAMPLE 72 SIZE 903
FINAL OFFSET = 221747 (CHUNK OFFSET = 217886 + COMBINED SAMPLES OFFSET = 3861)
AUDIO TRAK
DESIRED SAMPLE BASED ON 3 SECOND SEEK: 140
CLOSEST KEYFRAME SAMPLE: 140
IN CHUNK: 8
CHUNK #8 OFFSET: 259942
SAMPLES IN CHUNK TILL TARGET: 2
NEED TO ADD THOSE SAMPLE OFFSETS:
SAMPLE 138 SIZE 420
SAMPLE 139 SIZE 420
FINAL OFFSET = 260782 (CHUNK OFFSET = 259942 + COMBINED SAMPLES OFFSET = 840)
So, how do I translate this into a position to jump to in the file (or in mdat, maybe)? For what it's worth, the closest I-frame that ffprobe told me about is at 221265. I don't see any way of getting to that number with the above...
Last edited by dmk; 17th February 2014 at 20:25. |
#5 | Link |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,246
|
Offsets are given for each "chunk" in the stco table, sizes are given for each "sample" in the stsz table, and the "samples per chunk" are given in the stsc table.
Now you can easily calculate the offset of each "sample". If, for example, you want to know the offset of sample N, first work out which chunk the N-th sample is located in (using stsc) and then take the offset of that chunk (from stco). But note: since sample N may not be the very first sample within "its" chunk, you may need to add the sizes of all samples that precede sample N in that chunk (from stsz) to the chunk's offset value.
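As a rough illustration of that procedure, here is a minimal Python sketch. It assumes the three tables have already been parsed into lists, that stsc entries are reduced to (first_chunk, samples_per_chunk) pairs with the sample description index dropped, and that stsz carries one size per sample rather than a single constant size. The function name `sample_offset` and the table values are invented for the example.

```python
def sample_offset(n, stsc, stco, stsz):
    """Return the file offset of 1-based sample n.

    stsc: list of (first_chunk, samples_per_chunk) runs, first_chunk 1-based
    stco: list of chunk offsets (index 0 holds chunk 1)
    stsz: list of sample sizes (index 0 holds sample 1)
    """
    first_sample = 1  # first sample covered by the current stsc run
    for i, (first_chunk, per_chunk) in enumerate(stsc):
        # A run lasts until the next entry's first_chunk, or the last chunk.
        next_first = stsc[i + 1][0] if i + 1 < len(stsc) else len(stco) + 1
        run_samples = (next_first - first_chunk) * per_chunk
        if n < first_sample + run_samples:
            chunk = first_chunk + (n - first_sample) // per_chunk
            first_in_chunk = first_sample + (chunk - first_chunk) * per_chunk
            # Sizes of the samples that precede sample n inside its chunk.
            preceding = stsz[first_in_chunk - 1:n - 1]
            return stco[chunk - 1] + sum(preceding)
        first_sample += run_samples
    raise IndexError(f"sample {n} is beyond the last chunk")


# Made-up tables: chunks 1-2 hold 3 samples each, chunks 3-4 hold 2 each.
stsc = [(1, 3), (3, 2)]
stco = [100, 400, 700, 900]
stsz = [10, 20, 30, 11, 21, 31, 5, 6, 7, 8]
print(sample_offset(5, stsc, stco, stsz))  # second sample of chunk 2
print(sample_offset(8, stsc, stco, stsz))  # second sample of chunk 3
```

Note that the sample and chunk numbers are 1-based while Python lists are 0-based, hence the `- 1` adjustments when indexing `stco` and `stsz`.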
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊ Last edited by LoRd_MuldeR; 18th February 2014 at 00:28. |
#6 | Link |
Registered User
Join Date: Feb 2014
Posts: 15
|
Thanks LoRd_MuldeR
Though, as you can see, this is exactly what I did above, e.g. Code:
CHUNK #8 OFFSET: 217886
SAMPLES IN CHUNK TILL TARGET: 2
NEED TO ADD THOSE SAMPLE OFFSETS:
SAMPLE 71 SIZE 2958
SAMPLE 72 SIZE 903
FINAL OFFSET = 221747
1) With two different offsets, one for audio and one for video, how do I resolve this to a single offset for seeking in the file and beginning playback?
2) Neither offset equals what I got as a keyframe from ffprobe. Is there some other factor I forgot to add or subtract to make them equal? |
#7 | Link | |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,246
|
Quote:
Last edited by LoRd_MuldeR; 18th February 2014 at 12:22. |
|
#8 | Link |
Registered User
Join Date: Feb 2014
Posts: 15
|
OK- that makes sense, thanks...
So the only really strange thing is that the offset I'm getting for the keyframe video sample (i.e. after parsing stts, stss, stsc, stco and stsz in the video trak) does not equal what ffprobe tells me should be the I-frame (pict_type=I). Is it definitely supposed to match, and I'm just getting the math wrong somewhere? Or is an I-frame offset somehow different from a "keyframe from reading atoms in the video trak" offset, if you know what I mean? |
#9 | Link |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,246
|
Given that you already have the positions of all samples from "stco", "stsc" and "stsz", you can simply look at "stss" to get the indices of the sync samples.
So it usually tells you which frames are IDR-frames in an H.264 video stream or which frames are I-frames in an MPEG-2 stream. I'm not sure what you need "stts" for here. BTW: Keep in mind that the index of the first sample is 1, not 0, here.
Last edited by LoRd_MuldeR; 18th February 2014 at 14:06. |
#10 | Link |
Registered User
Join Date: Feb 2014
Posts: 15
|
stts is for finding the nearest keyframe to the desired seek point, i.e. the sample index is retrieved via stts and then the closest match to it is found in stss. But you're right, for the sake of debugging I could just read stss and work from there.
It seems that the final offset for a given sample in stss (i.e. chunk offset + preceding sample sizes in that chunk) SHOULD then match the offset ffprobe tells me for frames with pict_type=I... Assuming that's true, I must have messed up the math somewhere. I won't be able to take a closer look till later (Israel time). Thanks for the help. |
#11 | Link | |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,246
|
Quote:
But if ffprobe gives you the offset of some sync sample for a given seek position, then the result will depend on how the sync sample is selected (select the closest one vs. always select the preceding one, etc.). Also note that in H.264 streams, I-frames are not necessarily sync samples, but IDR-frames are. There can also be H.264 streams with no IDR-frames at all that still have "recovery points".
|
#12 | Link | |
Registered User
Join Date: Feb 2014
Posts: 15
|
Quote:
1725 1855 2958 903 1598 1781 27614 1151 1800
Instead of starting the count from 2958, it should start at 1598:
217886 + 1598 + 1781 = 221265 (the offset in ffprobe). Yay!
Can't get to the code till later to see where I screwed it up, but thank you for explaining how it works, and for that line about the index. Will think about the case where not every I-frame is an IDR frame... but not for now. |
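The arithmetic can be double-checked with a few lines of Python. This is just a sanity check using the numbers quoted in this thread; the variable and function names are made up, and the only assumption is that the sizes are listed in sample table order.

```python
# Sample sizes quoted in the post, in table order.
sizes = [1725, 1855, 2958, 903, 1598, 1781, 27614, 1151, 1800]
chunk_offset = 217886      # chunk #8 offset from stco, as quoted in the trace
ffprobe_offset = 221265    # keyframe position reported by ffprobe


def offset_from(start, count=2):
    """Chunk offset plus `count` sample sizes starting at list position `start`."""
    return chunk_offset + sum(sizes[start:start + count])


shifted = offset_from(sizes.index(2958))    # the original, mis-indexed start
corrected = offset_from(sizes.index(1598))  # the corrected start
keyframe_size = sizes[sizes.index(1598) + 2]

print(shifted, corrected, corrected == ffprobe_offset, keyframe_size)
```

With the corrected start, the sample that lands on the ffprobe offset is the 27614-byte one, which is far larger than its neighbours, as one would expect of a keyframe.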
|
#13 | Link |
Guest
Join Date: Jan 2002
Posts: 21,906
|
A point about IDR versus I as seek points. I have never seen any stream where an I frame is not seekable. In theory they may not be but no encoder I am aware of generates such streams. DG tools have made this assumption from day 1 and nobody ever reported any issues arising from it.
|
#15 | Link | |
Registered User
Join Date: Jul 2007
Posts: 551
|
Quote:
Code:
--keyint infinite --min-keyint 3000 --scenecut 100
If you try to decode such a stream and treat I-frames as IDR-frames, you will see a lot of artifacts. |
|
#17 | Link | |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,246
|
Quote:
But "normal" I-frames do not flush the DPB. Still, the encoder can simply decide not to let any frames after the I-frame reference any frames before it (even though they could). So an I-frame might be a "sync" point. It's even possible for an H.264 stream to have no I/IDR frames at all, as with x264's "periodic intra refresh" mode. Such streams can still have "sync" points in the form of "SEI recovery point" messages. So, after all, whether a specific sample/frame is a "sync" point or not is more or less independent of the frame type. That's also why x264 returns the "type" of an encoded frame separately from whether that frame is a "keyframe".
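To see what a given MP4 sample actually contains, one option is to walk its NAL units and look at the nal_unit_type field. The sketch below is only an illustration: it assumes the usual MP4 layout of length-prefixed NAL units with a 4-byte length field (the real length size comes from the avcC box), and the helper names `nal_types` and `contains_idr` are invented here. It tells IDR slices (type 5) from non-IDR slices (type 1), but it does not parse slice headers, so it cannot tell an I-slice from a P-slice, and it does not look inside SEI units for recovery point messages.

```python
# A few nal_unit_type values from the H.264 specification.
NAL_NAMES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}


def nal_types(sample, length_size=4):
    """Yield the nal_unit_type of each length-prefixed NAL unit in a sample.

    length_size=4 is an assumption; the actual value is stored in avcC.
    """
    pos = 0
    while pos + length_size <= len(sample):
        size = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if size == 0 or pos + size > len(sample):
            break  # truncated or malformed sample
        yield sample[pos] & 0x1F  # low 5 bits of the NAL header byte
        pos += size


def contains_idr(sample, length_size=4):
    """True if any NAL unit in the sample is an IDR slice."""
    return 5 in nal_types(sample, length_size)


# Hand-made sample for illustration: an SEI unit followed by an IDR slice.
sample = (b"\x00\x00\x00\x02\x06\x00"        # length 2, header 0x06 (SEI)
          b"\x00\x00\x00\x03\x65\x88\x80")   # length 3, header 0x65 (IDR)
print([NAL_NAMES.get(t, t) for t in nal_types(sample)], contains_idr(sample))
```

A check like this only describes the bitstream. Whether the muxer listed the sample in stss is a separate decision, which is the point made above about frame type and "keyframe" being independent.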
Last edited by LoRd_MuldeR; 19th February 2014 at 02:55. |
|
Tags |
atoms, isomedia, mpeg-4 |