Old 17th February 2014, 10:43   #1  |  Link
dmk
Registered User
 
Join Date: Feb 2014
Posts: 15
i-frame vs. atom samples?

SHORT VERSION: How would I chop the stsz table such that it begins at a keyframe, given a known offset of the keyframe into mdat?

LONG VERSION:

Here's some data to consider with a test file:

ffprobe -show_frames

Gives i-frames at 10421, 221265, 434154, ...

If we subtract the 10421 bytes that precede the first frame (atom headers and so on), so that we're only dealing with actual mdat data, the offsets become 0, 210844, 423733

mp4box -diso
Gives the following entries in stsz

<SampleSizeBox SampleCount="240">
<BoxInfo Size="980" Type="stsz"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<SampleSizeEntry Size="27558"/>
<SampleSizeEntry Size="1171"/>
and so on

The problem is, when I add up the SampleSizeEntries, I cannot get the total to equal 210844 (nor 221265, in case the sizes, like the stco offsets, count from the start of the file, though I doubt that's right)

In other words, I would have thought I could simply add up the sizes and stop where the total hits that number. But I must be misunderstanding something, since it never hits the number exactly. Before I go ahead and try adjusting other tables, I'd like to understand a bit better what I'm doing wrong here...

ADDITIONAL INFO:

Docs say that when there is no stss atom, all samples should be treated as keyframes. That seems extremely strange... there is no stss atom here, but I know from ffprobe and the encoding parameters that not every sample here is a keyframe.


Thanks. If I didn't phrase my question right, feel free to answer in whatever way is most helpful for understanding the relationship between i-frames and stbl entries
Old 17th February 2014, 18:32   #2  |  Link
dmk
Registered User
 
Join Date: Feb 2014
Posts: 15
OK, I got a few things wrong and was going at it the wrong way. I figured that instead of editing my question above, it's more helpful to leave it as is (Edison once said something like: he didn't fail 99 times in his experimenting, rather he discovered 99 ways it definitely won't work)

Anyway- here's where I'm at now

1) there *is* an stss atom, I was looking at the wrong trak. D'oh!

2) It seems the right approach is to seek based on time: look up the sample for that time in the stts atom, then the nearest keyframe sample in the stss atom (the QuickTime doc says to go to the keyframe sample before the desired sample; I think the lowest distance makes more sense, even if it's the next one up...)

So now I hope my refined question will be a little more on point

Let's say we're able to jump to a specific sample in each trak which matches the nearest keyframe. Great- as far as the atoms are concerned

But how does this correspond to the offset in actual mdat data?

On a similar note... not sure how to quite put this in a clear question, but the times are not quite the same for audio and video trak. Close but not the same... how does that factor into the equation?

Thanks!
Old 17th February 2014, 20:19   #3  |  Link
dmk
Registered User
 
Join Date: Feb 2014
Posts: 15
Here is the test file:

http://we.tl/nAwN5e5KaQ

And here is my rough trace:

VIDEO TRAK

DESIRED SAMPLE BASED ON 3 SECOND SEEK: 71
CLOSEST KEYFRAME SAMPLE: 73
IN CHUNK: 8
CHUNK #8 OFFSET: 217886
SAMPLES IN CHUNK TILL TARGET: 2

NEED TO ADD THOSE SAMPLE OFFSETS:
SAMPLE 71 SIZE 2958
SAMPLE 72 SIZE 903

FINAL OFFSET = 221747 (CHUNK OFFSET= 217886 + COMBINED SAMPLES OFFSET= 3861)

AUDIO TRAK
DESIRED SAMPLE BASED ON 3 SECOND SEEK: 140
CLOSEST KEYFRAME SAMPLE: 140
IN CHUNK: 8
CHUNK #8 OFFSET: 259942
SAMPLES IN CHUNK TILL TARGET: 2

NEED TO ADD THOSE SAMPLE OFFSETS:
SAMPLE 138 SIZE 420
SAMPLE 139 SIZE 420

FINAL OFFSET = 260782 (CHUNK OFFSET= 259942 + COMBINED SAMPLES OFFSET= 840 )

So, how do I translate this into a position to jump to in the file (or mdat maybe)?

For what it's worth, the closest i-frame that ffprobe told me is 221265. I don't see any way of getting to that number with the above....

Last edited by dmk; 17th February 2014 at 20:25.
Old 17th February 2014, 20:40   #4  |  Link
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,906
Moving to Development forum.
Old 18th February 2014, 00:24   #5  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,246
Offsets are given for each "chunk", in the stco table, sizes are given for each "sample" in the stsz table and finally the "samples per chunk" are given in the stsc table.

Now you can calculate easily the offset for each "sample". If, for example, you want to know the offset for sample N, first calculate in which chunk the N-th sample is located (using stsc) and then use the offset of that chunk (according to stco).

But note: Since the sample N may not be the very first sample within "its" chunk, you may need to add the sizes of all samples that precede sample N in the current chunk (according to stsz) to the chunk's offset value.
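
The lookup described above can be sketched in Python. This is a minimal illustration, not code from any particular parser: the function name and the list-based table layout are assumptions, the tables are taken to be already decoded into plain lists, and the sample description index that real stsc entries also carry is left out.

```python
from typing import List, Tuple

def sample_offset(n: int,
                  stsc: List[Tuple[int, int]],
                  stco: List[int],
                  stsz: List[int]) -> int:
    """Return the file offset of 1-based sample n.

    stsc: (first_chunk, samples_per_chunk) runs, first_chunk is 1-based
    stco: chunk offsets from the start of the file, chunk 1 at index 0
    stsz: sample sizes, sample 1 at index 0
    """
    if not 1 <= n <= len(stsz):
        raise ValueError("sample index out of range")
    first_sample = 1  # 1-based index of the first sample in the current chunk
    for i, (first_chunk, per_chunk) in enumerate(stsc):
        # A run lasts until the next run's first_chunk, or until the last chunk.
        last_chunk = stsc[i + 1][0] - 1 if i + 1 < len(stsc) else len(stco)
        for chunk in range(first_chunk, last_chunk + 1):
            if n < first_sample + per_chunk:
                # Sizes of the samples that precede n inside this chunk.
                preceding = stsz[first_sample - 1:n - 1]
                return stco[chunk - 1] + sum(preceding)
            first_sample += per_chunk
    raise ValueError("sample not covered by stsc/stco")


# Made-up tables: chunks 1-2 hold 2 samples each, chunks 3-4 hold 3 each.
stsc = [(1, 2), (3, 3)]
stco = [100, 500, 1000, 2000]
stsz = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(sample_offset(7, stsc, stco, stsz))  # 1110 = 1000 + 50 + 60
```

Note that every index in the tables is 1-based while Python lists are 0-based, which is why the slices subtract one.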
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 18th February 2014 at 00:28.
Old 18th February 2014, 06:33   #6  |  Link
dmk
Registered User
 
Join Date: Feb 2014
Posts: 15
Thanks LoRd_MuldeR

Though, as you can see, this is exactly what I did above

e.g.

Code:
CHUNK #8 OFFSET: 217886 
SAMPLES IN CHUNK TILL TARGET: 2

NEED TO ADD THOSE SAMPLE OFFSETS:
SAMPLE 71 SIZE 2958
SAMPLE 72 SIZE 903

FINAL OFFSET = 221747

To clarify, my problem is twofold:

1) With two different offsets, one for audio and one for video, how do I resolve this to a singular offset for seeking in the file and begin playing?

2) Neither offset equals what I got as a keyframe from ffprobe, is there some other factor I forgot to add/subtract in to make it equal?
Old 18th February 2014, 12:17   #7  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,246
Quote:
With two different offsets, one for audio and one for video, how do I resolve this to a singular offset for seeking in the file and begin playing?
You can't. Each sample of each track has its own position (offset) in the file. For example, if you have two tracks (audio + video), then each audio sample A[n] has a position/offset and each video sample V[n] has a position/offset. You will have to read and decode those separately anyway. Of course you can simply calculate the minimum of the first sample's offset across all tracks, e.g. MIN(A[0], V[0]). This will give you the first position you'll ever have to read, and it is probably equal to the MDAT atom's position. Still, you will have to read and decode A[0] as well as V[0] separately. And of course the duration of an audio sample can be quite different from the duration of a video sample. So you can't assume that two samples A[x] and V[x] with the same index x are "in sync". For example, it might be necessary to play multiple audio samples while one video sample is being displayed, or vice versa.

Last edited by LoRd_MuldeR; 18th February 2014 at 12:22.
Old 18th February 2014, 12:26   #8  |  Link
dmk
Registered User
 
Join Date: Feb 2014
Posts: 15
OK- that makes sense, thanks...

So the only really strange thing is that the offset I'm getting for the keyframe video sample (i.e. after parsing stts,stss,stsc,stco,stsz in the video trak) does not equal what ffprobe tells me should be the iframe (pict_type=i)

Is it definitely supposed to and I'm just getting the math wrong somewhere? Or is an iframe offset somehow different from a "keyframe from reading atoms in video trak" offset, if you know what I mean?
Old 18th February 2014, 13:59   #9  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,246
Given that you already have the positions of all samples from "stco", "stsc" and "stsz", you can simply look at "stss" to get the indices of the sync samples.

So it usually tells you which frames are IDR-frames in an H.264 video stream or which frames are I-frames in an MPEG-2 stream. I'm not sure what you need "stts" for here.

BTW: Keep in mind that the index of the first sample is 1, not 0, here.

Last edited by LoRd_MuldeR; 18th February 2014 at 14:06.
Old 18th February 2014, 14:07   #10  |  Link
dmk
Registered User
 
Join Date: Feb 2014
Posts: 15
stts is to find nearest keyframe to desired seek point. i.e. sample index is retrieved via stts and then the closest match to this is found in stss. But you're right, for the sake of debugging, I could just read stss and work from there.

It seems that the final offset from a given sample in stss (i.e. accounting for chunk offset + preceding sample sizes in that chunk) SHOULD then match the offset ffprobe tells me for frames with pict_type=i...

Assuming that's true, I must have messed up the math somewhere then... will not be able to take a closer look till later (Israel time). Thanks for the help.
Old 18th February 2014, 14:30   #11  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,246
Quote:
Originally Posted by dmk View Post
It seems that the final offset from a given sample in stss (i.e. accounting for chunk offset + preceding sample sizes in that chunk) SHOULD then match the offset ffprobe tells me for frames with pict_type=i...
If you calculate the offsets of all sync samples and ffprobe also gives the offsets of all sync samples, then it's supposed to match, yes.

But if ffprobe gives you the offset of some sync sample for a given seek position, then the result will depend on how the sync sample is selected (select the closest one vs. always select the preceding one, etc).
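
The two selection policies can be sketched as follows. This is only an illustration: the function names are made up, and the stss list is hypothetical except for sample 73, which is the keyframe sample from the trace earlier in the thread. All indices are 1-based, as in stss itself.

```python
from bisect import bisect_right
from typing import List

def preceding_sync(sample: int, stss: List[int]) -> int:
    """Last sync sample at or before `sample`."""
    i = bisect_right(stss, sample)
    if i == 0:
        raise ValueError("no sync sample at or before the requested sample")
    return stss[i - 1]

def nearest_sync(sample: int, stss: List[int]) -> int:
    """Sync sample closest to `sample`; a tie goes to the earlier one."""
    i = bisect_right(stss, sample)
    candidates = stss[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda s: (abs(s - sample), s))


stss = [1, 73, 145]  # sorted sync sample numbers (1 and 145 are assumed)
print(preceding_sync(71, stss))  # 1
print(nearest_sync(71, stss))    # 73
```

For the same requested sample the two policies can return different sync samples, which is one way two tools can disagree about "the" keyframe for a seek position.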

Also note that in H.264 streams, I-frames are not necessarily sync samples, but IDR-frames are. There can also be H.264 streams with no IDR-frames at all that still have "recovery points".
Old 18th February 2014, 16:14   #12  |  Link
dmk
Registered User
 
Join Date: Feb 2014
Posts: 15
Quote:
Originally Posted by LoRd_MuldeR View Post
BTW: Keep in mind that the index of the first sample is 1, not 0, here.
OK- I see the problem now... somewhere my indexing was off. Looking at a list of the samples in the general area:

1725
1855
2958
903
1598
1781
27614
1151
1800

Instead of starting the count from 2958, it should be at 1598

217886 + 1598 + 1781 = 221265 (the offset in ffprobe). Yay! Can't get to the code till later to see where I screwed it up, but thank you for explaining how it works and that line about the index.
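
As a quick check of that arithmetic in Python, using only numbers quoted in this thread (the variable names are just for illustration):

```python
chunk_offset = 217886  # offset of chunk 8, from stco

# Sizes of the two samples that precede the keyframe inside chunk 8.
keyframe_offset = chunk_offset + sum([1598, 1781])

# The earlier trace started two entries too early in the size list.
wrong_offset = chunk_offset + sum([2958, 903])

print(keyframe_offset)  # 221265, the offset ffprobe reports
print(wrong_offset)     # 221747, the value from the first trace
```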

Will think about the case where not every I-frame is an IDR frame... but not for now
Old 18th February 2014, 16:19   #13  |  Link
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,906
A point about IDR versus I as seek points. I have never seen any stream where an I frame is not seekable. In theory they may not be but no encoder I am aware of generates such streams. DG tools have made this assumption from day 1 and nobody ever reported any issues arising from it.
Old 18th February 2014, 16:52   #14  |  Link
dmk
Registered User
 
Join Date: Feb 2014
Posts: 15
Phew!
Old 18th February 2014, 18:40   #15  |  Link
MasterNobody
Registered User
 
Join Date: Jul 2007
Posts: 551
Quote:
Originally Posted by neuron2 View Post
A point about IDR versus I as seek points. I have never seen any stream where an I frame is not seekable. In theory they may not be but no encoder I am aware of generates such streams. DG tools have made this assumption from day 1 and nobody ever reported any issues arising from it.
Then you are probably not aware of x264. It can generate such streams, and will do so even with default settings (the probability is small but not zero). To raise the probability, you can use insane settings such as:
Code:
--keyint infinite --min-keyint 3000 --scenecut 100
And here is an example of such a stream: test.h264 (mirror)
If you try to decode such a stream and treat I-frames as IDR-frames, you will see a lot of artifacts.
Old 18th February 2014, 18:50   #16  |  Link
dmk
Registered User
 
Join Date: Feb 2014
Posts: 15
OK- great, that was the problem...

off by one entry due to treating the sample index as starting from 0
off by another entry due to a simple code mistake (forgot to jump past numberOfEntries in stsz)
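
For reference, a minimal Python sketch of reading the stsz body that makes the second mistake visible. The function name is made up; the layout follows the ISO base media file format: 4 bytes of version and flags, a 4-byte sample_size, a 4-byte sample_count, and then one 4-byte size per sample when sample_size is 0. The `payload` argument is assumed to be the box contents after the 8-byte size/type header.

```python
import struct
from typing import List

def parse_stsz(payload: bytes) -> List[int]:
    """Return the per-sample sizes stored in an stsz box body."""
    # version (1 byte) + flags (3 bytes), then sample_size and sample_count
    _version_flags, sample_size, sample_count = struct.unpack_from(">III", payload, 0)
    if sample_size != 0:
        # All samples share one size; no per-sample table follows.
        return [sample_size] * sample_count
    # The table starts at byte 12, i.e. after sample_count. Starting at
    # byte 8 instead would read the count as if it were the first size.
    return list(struct.unpack_from(f">{sample_count}I", payload, 12))


# Hand-built body with three entries, for illustration only.
body = struct.pack(">III", 0, 0, 3) + struct.pack(">3I", 27558, 1171, 500)
print(parse_stsz(body))  # [27558, 1171, 500]
```

This layout also agrees with the mp4box dump in the first post: 8 header bytes + 12 + 240 × 4 = 980.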
Old 19th February 2014, 02:39   #17  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,246
Quote:
Originally Posted by dmk View Post
Will think about the case where not every I-frame is an IDR frame... but not for now
I think IDR-frames always are "sync" points, because they flush the DPB (Decoded Picture Buffer).

But "normal" I-frames do not flush the DPB. Still, the encoder can simply decide not to let any frame after the I-frame reference any frame before it (even though it could). So an I-frame might be a "sync" point.

It's even possible that an H.264 stream doesn't have any I/IDR frames at all, such as with x264's "periodic intra refresh" mode. Still, there can be "sync" points in the form of "SEI recovery" messages in such streams.

So, after all, whether a specific sample/frame is a "sync" point or not, is more or less independent from the frame type. That's also why x264 returns the "type" of an encoded frame separately from whether that frame is a "keyframe".

Last edited by LoRd_MuldeR; 19th February 2014 at 02:55.
Tags
atoms, isomedia, mpeg-4