Audio encoder delay question


madhatter300871
7th March 2012, 18:08
Do NeroAACenc, dcaenc and Aften automatically insert a delay at the beginning of an encoded track ?

The reason I ask is that I have recently read somewhere (apologies for not remembering exactly where !) that this might be the case.

I've not noticed lip sync issues with my encodes so far (although I'll be looking for it now, won't I !!) but had it on my mind to pose the question here. If this does happen I can simply take steps to counteract it by adding a delay setting in the muxer (I use mp4box and mkvmerge).

Thanks.

Nevilne
7th March 2012, 18:09
Use l-smash builds with --priming
iTunes / QuickTime: 2112
faac: 1024
Nero: 2624

Midzuki
7th March 2012, 19:02
Does NeroAACenc, dcaenc, Aften automatically insert a delay at the beginning of an encoded track ?

The reason I ask is that I have recently read somewhere (apologies for not remembering exactly where !) that this might be the case.

Compression scheme artifacts:

http://en.wikipedia.org/wiki/Gapless_playback#Compression_scheme_artifacts

madhatter300871
7th March 2012, 21:21
Nevilne ... not sure what you mean, can you elaborate ? Sorry if I'm appearing dumb !

Midzuki ... Thanks for the link, yes, makes sense. As I am not encoding tracks that will play back one after the other, only one long track, gapless playback isn't quite what I'm looking for (is it ?). So when my movie soundtrack is encoded using a lossy compressor some amount of silence is added to the beginning of the track; is this then taken care of upon playback by the decoder ? I'm presuming it is as I really don't seem to have lip sync issues.

madhatter300871
7th March 2012, 22:00
OK, did some more reading. This link gives details of Lame 3.87

http://mp3decoders.mp3-tech.org/decoders_lame.html

It's an interesting read, giving values for the encoder/decoder delay introduced by different MP3 encoders. Overall delay is in the range of 1000 to 2500 samples (broadly speaking) at a sample rate of 44.1 kHz.

So the delay is, broadly speaking, in the range of 23ms to 57ms (1000 / 44100 and 2500 / 44100 seconds, if my maths is any good).

The Lame decoder automatically removes any silence that the Lame encoder inserts, but I'm using ffdshow. Now I'm trying to find out whether ffdshow, or indeed any of the multitude of audio decoders out there, ignores padded silence in a movie's audio track.

Any help or pointers appreciated.

Nevilne
7th March 2012, 22:04
l-smash is an alternative muxer to gpac (mp4box). http://code.google.com/p/l-smash/
There are custom x264 builds which use l-smash instead of gpac, like http://x264.fushizen.eu/
It's also available as standalone muxer here: http://sada5.sakura.ne.jp/files/index.php?folder=TC1TTUFTSA==

Audio encoders use different delay/priming even with the same format, so you have to find out the delay for your specific encoder.
Here's some more info on priming: https://developer.apple.com/library/mac/#technotes/tn2258/_index.html

I've listed some corresponding priming times for aac encoders above. The delay is inaudible because it's only around 30ms.

You can set it in x264 as --priming or in the l-smash muxer as the "encoder-delay=" track property. I'm not sure if mp4box allows you to state priming (I haven't used it in ages), but a negative audio delay should suffice. However, some video players ignore delay flags; not sure about priming flags.

edit: we posted at the same time, hehe. some repetition but i'll leave original post as it is.

madhatter300871
7th March 2012, 22:22
Nevilne ... Thanks for the Apple link.

That link says in the "Encoder Delay Recommendation" section :-

In all of these cases the behaviour is as described above; an implicit, not-signalled, assumption is made about the size of this delay and the playback engine is required to trim this designated number of samples from its output at the start of playback.

So it seems that this is all taken care of by the playback engine, although perhaps not as accurately as it could be but it would appear (to my eyes/ears at least) accurately enough.

AAC, MP3 and AC-3 all have priming and padding applied, and I am making my own assumption that all lossy encoding schemes do. I have read that lossless doesn't suffer from this, so FLAC and OGG lossless (for example) shouldn't have issues.

So..... is it safe for me to assume that audio gaps (priming and padding) are all taken care of upon playback, and I don't really need to worry about it ?

Dark Shikari
7th March 2012, 22:33
All MDCT-based codecs have this issue. There are lossy schemes that don't use MDCT (e.g. ADPCM).

sneaker_ger
7th March 2012, 22:33
Notice that newer versions of mkvmerge read and apply the AAC audio delay automatically when muxing AAC in MP4 if the file has the apple tag with that information. This is the case for NeroAACEnc, so you don't have to manually fiddle with the delay when using mkvmerge.

madhatter300871
7th March 2012, 22:43
Ahh, OK. It's MDCT based codecs that have this problem, thanks for that correction to my thinking.

Sneaker_ger : Thanks. So nothing to worry about at all when using NeroAACenc and mkvmerge. Excellent. Wonder if MP4box does the same ....

When tools like eac3to and dgindex report an audio delay, is this the reason for the delay ? Is it priming introduced by the encoder ? If so, why do some movies have no delay in the audio track ?

I suppose what I'm getting at here is this : if I don't know the exact amount of priming applied by the encoder, I will always have audio sync issues will I not (OK, they may be so small I can't notice, but technically speaking the delay is there).

Is there an official trusted list anywhere that lists the range of samples usually added at the beginning by various encoders, or is it a matter of exhaustive searching ? Could we place a sticky here on doom9 ?

sneaker_ger
7th March 2012, 23:04
When tools like eac3to and dgindex report an audio delay, is this the reason for the delay ? Is it priming introduced by the encoder ?

No, it is not.

I suppose what I'm getting at here is this : if I don't know the exact amount of priming applied by the encoder, I will always have audio sync issues will I not

Correct.

madhatter300871
7th March 2012, 23:17
How do you guys normally deal with this priming issue and hence audio sync issue ? Or is the audio delay incurred due to priming so small that you just live with it as you don't really notice ?

I know how to delay a track inside mkv or mp4, that's not a problem, but I don't know where to find the exact priming values (if indeed they can even be exact on a film by film basis).

Nevilne has posted faac, Nero and iTunes values. Take for example iTunes AAC applying 2112 priming samples at the beginning of a track. At a 44.1 kHz sampling rate this would equate to 47.89 (48) ms. Should I always apply a 48ms audio delay when using a track I encode using iTunes ?

Sneaker_ger ... promising not to go off topic for more than a post or two here, what are the audio delays reported by dgindex, eac3to and the like ? If my track has a delay of, say, 100ms and I re-encode to aac, does this mean I now need to set a delay of 148ms in my final mux ? Just trying to grow my knowledge.

Thanks.

sneaker_ger
7th March 2012, 23:50
Nevilne has posted faac, nero and itunes values. Take for example itunes aac applying 2112 priming samples at the beginning of a track. at 44.1KHz sampling rate this would equate to 47.89 (48) ms. Should I always apply a 48ms audio delay when using a track I encode using itunes ?

Yes, these values are constant for a given encoder and profile (LC-AAC). So the value in ms only changes if you either use a different profile (HE-AAC) or a different sample rate.
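To illustrate how the millisecond value follows from the sample count and the sample rate, here is a minimal Python sketch. The function and dictionary names are made up for illustration; the sample counts are the LC-AAC figures Nevilne quoted earlier in the thread.

```python
# Priming (encoder delay) in samples, as quoted earlier in this thread (LC-AAC).
PRIMING_SAMPLES = {
    "iTunes / QuickTime": 2112,
    "faac": 1024,
    "Nero": 2624,
}


def samples_to_ms(samples: int, sample_rate_hz: int) -> float:
    """Convert a sample count to milliseconds at the given sample rate."""
    return samples * 1000.0 / sample_rate_hz


if __name__ == "__main__":
    # Same encoder, same sample count, but a different delay in ms per sample rate.
    for rate in (44100, 48000):
        for encoder, priming in PRIMING_SAMPLES.items():
            print(f"{encoder} @ {rate} Hz: {samples_to_ms(priming, rate):.2f} ms")
```

So the 2112 samples from iTunes come to roughly 48 ms at 44.1 kHz but 44 ms at 48 kHz.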

If my track has a delay of, say, 100ms and i re-encode to aac, does this mean i now need to set a delay of 148ms in my final mux ? Just trying to grow my knowledge.

You have to be careful here.
Some programs add a delay value into the filename, like "movie xyz audio track 1 DELAY 100ms.ac3".
This means that the audio comes 100ms too early and therefore the muxer must apply a positive delay of 100ms.
An encoder delay (e.g. from an aac encoder) means that the audio is late, hence you have to apply a negative delay.
So in the end you would have to choose: +100ms -48ms= 52ms
And as said above: mkvmerge automatically applies the aac delay, so you only choose "+100ms" in mkvmerge.
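The sign handling above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the file name pattern is the "DELAY 100ms" style mentioned above, and whether the muxer already handles the priming is something you have to know for your own tool chain.

```python
import re

# Matches the "DELAY 100ms" / "DELAY -128ms" part of a demuxed file name.
_DELAY_RE = re.compile(r"DELAY\s+(-?\d+)\s*ms", re.IGNORECASE)


def delay_from_filename(name: str) -> int:
    """Return the delay in ms embedded in a file name, or 0 if none is found."""
    match = _DELAY_RE.search(name)
    return int(match.group(1)) if match else 0


def muxer_delay_ms(name: str, priming_ms: int, muxer_handles_priming: bool) -> int:
    """Delay to enter in the muxer for a re-encoded track.

    The source delay is applied as-is; the encoder priming is subtracted
    unless the muxer already compensates for it on its own.
    """
    source_delay = delay_from_filename(name)
    if muxer_handles_priming:
        return source_delay
    return source_delay - priming_ms


if __name__ == "__main__":
    name = "movie xyz audio track 1 DELAY 100ms.ac3"
    print(muxer_delay_ms(name, 48, muxer_handles_priming=False))
    print(muxer_delay_ms(name, 48, muxer_handles_priming=True))
```

With the example file name, the first call gives the manual case (+100ms -48ms) and the second the case where the muxer applies the aac delay itself.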

madhatter300871
8th March 2012, 00:06
Understood about positive/negative delay. DGindex names files that way, I'm used to that so I know what you mean.

MKVmerge automatically applies delay for AAC tracks, great, less for me to worry about.

Do you know if there is a sticky anywhere that lists codec priming values and any known muxer automation ? Do you think I could try and start one, or is this stuff pretty basic and standard knowledge and I just need to educate myself ?

Thanks for all your help, genuinely appreciated.

madhatter300871
8th March 2012, 00:13
I read this post :-

https://www.bunkus.org/bugzilla/show_bug.cgi?id=715

It's becoming clearer now. It wouldn't be fun if it was easy, now would it !

madhatter300871
8th March 2012, 00:23
Looking at this link :-

http://lame.sourceforge.net/tech-FAQ.txt

MP3 encoded with lame will have 1056 samples priming. This link also talks about delays introduced by the encoder and delays introduced by the decoder, producing a total delay of the sum of both. This value is stored in the MP3 file and the Lame decoder uses it to remove the 528 empty samples when decoding. Clever.

Are Nevilne's values the total delay or the encoder delay ?

So, I need to know the priming values of encoders, any muxer automation relating to these values and any decoder automation by e.g. ffdshow. Correct ?

Heeeelp !! :)

tebasuna51
8th March 2012, 00:26
AC3 encoders add 256 samples of silence at the beginning, 5.33 ms at 48 kHz.
With Aften you can cancel the delay using the parameter -pad 0

hello_hello
8th March 2012, 03:11
Midzuki ... Thanks for the link, yes makes sense. As I am not encoding tracks that will play back one after the other, only one long track, gapless playback isn't quite what I'm looking for (is it ?). So when my movie soundtrack is encoded using a lossy compressor some amount of silence is added to the beginning of the track, is this then taken care of upon playback by the decoder ? I'm presuming it is as I really don't seem to have lip sync issues.

VirtualDubMod has a setting, which by default, appears to get it to compensate for any MP3 audio delay when muxing, which it says is a 1393 sample lag. So I guess going by the LAME info you linked to it's slightly overcompensating if you use the LAME encoder and CBR (only by 288 samples) and undercompensating if you use VBR. I don't know about other muxers.
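The over/under-compensation arithmetic can be written out as a small Python sketch. The 1393-sample figure is the VirtualDubMod setting described above; the 1105 (CBR) and 2257 (VBR) figures are the LAME values that appear elsewhere in this thread. The names are made up for illustration.

```python
# Compensation VirtualDubMod is described as applying, in samples.
VDUBMOD_COMPENSATION = 1393

# LAME delay figures quoted elsewhere in this thread, in samples.
LAME_DELAY = {"CBR": 1105, "VBR": 2257}


def residual_delay(mode: str) -> int:
    """Samples of delay left after compensation (negative = overcompensated)."""
    return LAME_DELAY[mode] - VDUBMOD_COMPENSATION


if __name__ == "__main__":
    for mode in LAME_DELAY:
        print(mode, residual_delay(mode))
```

A negative result means the muxer removed more than the encoder added; a positive result means some delay is left over.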

As an experiment I just demuxed a DTS stream and opened it with foobar2000. I then converted it to wave, CBR MP3, VBR MP3 and M4A/AAC (mixing to stereo each time). I then opened each file with foobar2000 and it reported exactly the same duration and number of samples for each as the original DTS file, so I'd guess if there was an encoder delay introduced each time foobar2000 must be compensating for it perfectly somehow.

I thought though, I'd try an experiment with AAC. I took the M4A/AAC foobar2000 created and converted it to wave. Same length as the original DTS. I then remuxed the M4A as an MKA (MKVMergeGUI) and foobar2000 reported the mka to be 64ms longer than the M4A. I converted the MKA to wave and it was also 64ms longer. I opened both wave files with an editor to compare them and as best as I could tell, the wave file converted from the mka had a 51ms delay compared to the one converted directly from DTS.

Anyway, I then tried a sync test which basically involves running two instances of the video (original and encode) and syncing the audio until it produces a phasing effect, then watching to see if the scene changes happen at exactly the same time. I tried it twice, first comparing the original video to the encode with the M4A audio, then again with a remuxed copy where I manually specified a -51ms delay. I'm pretty sure the version with the -51ms audio delay was the one more in sync with the original video.

I happened to have a couple of other original videos still on my hard drive and I compared them to the encodes using the above method, and I'm pretty sure, much to my disappointment, introducing a -50ms delay for the encoded version got the audio more in sync with the original. Likewise foobar2000 reported the AAC audio in the MKV encodes to be longer than the original both times (63ms and 105ms). There seems to be a potential for disappointment here. I'll have to investigate further.... or maybe just pretend I never read this thread.

sneaker_ger
8th March 2012, 07:24
As an experiment I just demuxed a DTS stream and opened it with foobar2000. I then converted it to wave, CBR MP3, VBR MP3 and MP4/AAC (mixing to stereo each time). I then opened each file with foobar2000 and it reported exactly the same duration and number of samples for each as the original DTS file, so I'd guess if there was an encoder delay introduced each time foobar2000 must be compensating for it perfectly somehow.

As an audio player, I'm sure foobar2000 will read delay and padding information from the file. This is written by the encoders as e.g. an Apple-style tag (aac, mp3) or LAME header (mp3). This has to be done to ensure gapless playback of split-up concerts, for example.
This information gets lost when muxing into a different format, so that's where the discrepancy comes from.

hello_hello
8th March 2012, 08:23
As an audio player, I'm sure foobar2000 will read delay and padding information from the file. This is written by the encoders as e.g. an Apple-style tag (aac, mp3) or LAME header (mp3). This has to be done to ensure gapless playback of split-up concerts, for example.
This information gets lost when muxing into a different format, so that's where the discrepancy comes from.

Which seems to be somewhat disappointing. Mind you if my sync tests were any indication, the 50ms delay is small enough that you'd probably need to have OCD to care.... which probably means I do. :(

You said earlier "newer versions of mkvmerge read and apply the AAC audio delay automatically when muxing AAC in MP4 if the file has the apple tag with that information", so I kind of hoped it'd be clever enough to do the same.... or read the delay padding info from the file.... when muxing AAC audio from an m4a file to MKV. From my tests with foobar2000 it seems the information is there, but MKVMergeGUI isn't making use of it. Is there any way to tell it to?

Is that correct? Should I always be applying a negative 50ms delay to AAC/m4a audio when I remux it into an MKV? Is 50ms the correct delay, or at least close enough?

Edit: Well I just converted the DTS audio from another video to AAC/m4a using foobar2000 and then used the AAC audio to replace the audio stream in the MKV.... twice.... the second time applying a -50ms delay. I'm fairly convinced the -50ms delay gets the audio sync virtually exact, whereas without it the audio is a little off compared to the original, even if it's really only off by a very little bit.

sneaker_ger
8th March 2012, 08:36
It's not always 50ms. As said above: it depends on the encoder, the profile and the sample rate.
From my tests: mkvmerge will apply the delay correctly, if it has been properly written to the mp4 file. I don't think it caters for any padding - only delay.

Maybe you can upload sample wav, m4a and mka files. Also, which AAC encoder are you using?

hello_hello
8th March 2012, 09:26
When you say "doesn't cater for any padding" are you referring to the padding at the end of the audio stream?

I'm using the Nero encoder. Assuming I always use it, and I'm always converting DTS 48k audio, would the delay required always be the same? I guess though it raises the question as to what happens if I convert audio (such as AAC) to another format (such as MP3). Would the LAME encoder include the original padding then add more of its own?

Okay I read the conversation you were involved in here (I assume it's you): https://www.bunkus.org/bugzilla/show_bug.cgi?id=715
I opened the M4A created by foobar2000 with MP3Tag and found the iTunSMPB thingy. Here's the info:
00000000 00000A40 000003C0 000000000F74EE00 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(I'm not sure how to interpret it, but to confirm MP3Tag is reading the tag correctly I downloaded the aac_encoder_delay_sample.m4a, and MP3Tag reported the same values as listed on the bugzilla page)
Then I extracted the audio time codes after muxing the AAC audio into the original MKV (replacing the DTS audio).

# timecode format v2
0
42
83
125
167
209.....

As the delay applied when muxing should have a negative value (I assume), from the above I also assume any audio delay was applied by trimming the beginning of the audio stream. Is that how it's done or should MKVMergeGUI actually write a negative delay value? How did you determine the audio delay was being applied correctly? (I'm using MKVMerge 5.2.1)

If the above isn't providing enough information, I'll create some little samples and upload them. Thanks, by the way, for all the info.

sneaker_ger
8th March 2012, 09:36
When you say "doesn't cater for any padding" are you referring to the padding at the end of the audio stream?

Yes.

I'm using the Nero encoder. Assuming I always use it, and I'm always converting DTS 48k audio, would the delay required always be the same? I guess though it begs the question as to what happens if I convert audio (such as AAC) to another format (such as MP3). Would the LAME encoder include the original padding then add more of it's own?

Well, the same as with every other lossy conversion: [compressed data]->[decoder]->[uncompressed data]->[encoder]

So it depends solely on whether the decoder deletes the delay and padding. (NeroAacDec does, and foobar2000 probably does too. ffmpeg/ffms2 does not.)

Then I extracted the audio time codes after muxing the AAC audio into the original MKV (replacing the DTS audio).

# timecode format v2
0
42
83
125
167
209.....

Those look suspiciously like video timecodes (23.976 fps)

As the delay applied when muxing should have a negative value (I assume), from the above I also assume any audio delay was applied by trimming the beginning of the audio stream. Is that how it's done or should MKVMergeGUI actually write a negative delay value?

Yes, negative delays result in trimming. The first timecode might then be a small ( < frame length) positive delay.

How did you determine the audio delay was being applied correctly? (I'm using MKVMerge 5.2.1)

Decoded with ffaudiosource and compared the waveform to the original.

If the above isn't providing enough information, I'll create some little samples and upload them.

Well, we could see if anything is wrong, then.

sneaker_ger
8th March 2012, 09:44
Okay I read the conversation you were involved in here (I assume it's you): https://www.bunkus.org/bugzilla/show_bug.cgi?id=715
I opened the M4A created by foobar2000 with MP3Tag and found the iTunSMPB thingy. Here's the info:
00000000 00000A40 000003C0 000000000F74EE00 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(I'm not sure how to interpret it, but to confirm MP3Tag is reading the tag correctly I downloaded the aac_encoder_delay_sample.m4a, and MP3Tag reported the same values as listed on the bugzilla page)

Okay, delay = A40 (HEX) = 2624 (DEC)
length of source file = F74EE00 (HEX) = 259321344 (DEC)
padding = 3C0 (HEX) = 960 (DEC)

For 48kHz:
delay = 2624 / 48000 ~= 55ms
padding = 960 / 48000 = 20ms
original length = 259321344 / 48000 ~= 5402.528s

So the length after muxing to mkv would probably be original length+padding = 20ms + 5402.528s = 5402.548s
But it will only work correctly, if at least one other track starts at timecode 0.
Does that add up?
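The decoding of the tag can be reproduced with a short Python sketch. The field order (second field = delay, third = padding, fourth = original length, all hexadecimal sample counts) is taken from the breakdown above; the function name and the returned dictionary keys are made up for illustration.

```python
def parse_itunsmpb(tag: str, sample_rate_hz: int) -> dict:
    """Split an iTunSMPB string into delay, padding and original length.

    Field order assumed from this thread: second field = encoder delay,
    third field = end padding, fourth field = original length
    (all hexadecimal, counted in samples).
    """
    fields = tag.split()
    delay = int(fields[1], 16)
    padding = int(fields[2], 16)
    length = int(fields[3], 16)
    return {
        "delay_samples": delay,
        "padding_samples": padding,
        "length_samples": length,
        "delay_ms": delay * 1000.0 / sample_rate_hz,
        "padding_ms": padding * 1000.0 / sample_rate_hz,
        "length_s": length / sample_rate_hz,
    }


# The tag hello_hello posted, read at 48 kHz.
TAG = ("00000000 00000A40 000003C0 000000000F74EE00 "
       "00000000 00000000 00000000 00000000 "
       "00000000 00000000 00000000 00000000")
info = parse_itunsmpb(TAG, 48000)

if __name__ == "__main__":
    for key, value in info.items():
        print(key, value)
```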

sneaker_ger
8th March 2012, 10:01
Hmm....
Something's off on my end. I have to do some tests. Will revert to older mkvmerge.

sneaker_ger
8th March 2012, 10:06
Ah, wait:
mkvmerge will "null" all timecodes, so when you're supplying audio only, it will not work as expected. Padding is not taken into consideration. (old post edited)

hello_hello
8th March 2012, 10:17
Yes, negative delays result in trimming. The first timecode might then be a small ( < frame length) positive delay.

Yep. I tried a second test encode and MediaInfo indicated a 9ms audio delay when I muxed the encoded aac back into the MKV. It's not the sample I've uploaded though.
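One way to see where a small positive delay like that can come from, sketched in Python. It assumes the muxer can only drop whole AAC frames of 1024 samples and that the delay is Nero's 2624 samples at 48 kHz; both are assumptions for illustration, not something confirmed for mkvmerge in this thread.

```python
import math

AAC_FRAME_SAMPLES = 1024  # samples per AAC-LC frame (assumed)


def trim_whole_frames(delay_samples: int, sample_rate_hz: int,
                      frame_samples: int = AAC_FRAME_SAMPLES):
    """Return (frames_dropped, first_timecode_ms) when trimming whole frames.

    Enough frames are dropped to cover the whole encoder delay; whatever was
    trimmed beyond the delay comes back as a positive first timecode.
    """
    frames_dropped = math.ceil(delay_samples / frame_samples)
    leftover_samples = frames_dropped * frame_samples - delay_samples
    return frames_dropped, leftover_samples * 1000.0 / sample_rate_hz


if __name__ == "__main__":
    frames, first_timecode = trim_whole_frames(2624, 48000)
    print(frames, round(first_timecode, 2))
```

Under those assumptions three frames are dropped and the remaining offset is a little over 9 ms, which is less than one frame length.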


So the length after muxing to mka would probably be original length+padding = 20ms + 5402.528s = 5402.548s
Does that add up?
I checked the remuxed MKV using foobar2000 and it reports the length as being 01:30:02:552. So does that mean the correct delay is being applied (give or take four ms)?
The length of the original MKV with DTS is reported as 01:30:02:531 so that adds up too, I think. Same length, minus 20ms of padding.

Edit: ^&%$!!! I just realized the remuxed MKV I checked above was the one where I manually applied a -50ms delay. Damn! So now I think the delay wasn't applied properly and my manual -50ms delay explains the 4ms difference when compared with your calculations. I'll start again with that file so ignore the above for the moment.

I've created some samples. I figured if you have time I might get you to look at them to make sure the aac audio is being remuxed with the correct delay. I'll try converting them to wave myself later but I keep getting distracted at the moment and losing the plot on what I'm doing.

Is it normal for a single chapter to be written when creating an m4a? I included the remuxed chapter too. I just found it curious MKVMerge seems to find a chapter in the encoded M4A which starts at the 54ms mark.
Hopefully I've named the files in the zip file in a way which makes sense. http://www.mediafire.com/?p085effuuq8787h

I guess if you're sure the correct audio delay is being applied I'll have to go back to my earlier post and try to work out what was going wrong, or where my calculations were going wrong.
Cheers.

sneaker_ger
8th March 2012, 10:50
Is it normal for a single chapter to be written when creating an m4a? I included the remuxed chapter too. I just found it curious MKVMerge seems to find a chapter in the encoded M4A which starts at the 54ms mark.

That's Nero's way of signaling the encoder delay. But due to Apple's might they also include the "iTunSMPB" tag now.

Hopefully I've named the files in the zip file in a way which makes sense. http://www.mediafire.com/?p085effuuq8787h

Tag in "foobar encoded aac.m4a":
00000000 00000A40 000001C0 0000000000140400 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

Delay = A40 = 2624 = 55ms (0.054666... s)
Original length = 1311744 = 27.328s
Padding = 1C0 = 448 = 9ms

Checking the length of "original sample dts.mkv": 27.328s (=correct!)
Checking length of "sample aac remuxed.mkv": 27.392s (wtf?)

Remuxing audio of "foobar encoded aac.m4a" and video of "original sample dts.mkv" to "reremux.mkv". Length = 27.337s = 27.328s + 9ms = original length + padding (as expected)

Comparing waveform of "reremux.mkv" and "original sample dts.mkv": no extra delay (as expected)

Comparing waveform of "sample aac remuxed.mkv" and "original sample dts.mkv": delayed by 55ms

So I take it you manually entered "55ms" into the delay field of mmg instead of letting it do its work?

hello_hello
8th March 2012, 11:33
So I take you manually entered "55ms" into the delay field of mmg instead of letting it do its work?

Not when I created those samples, I'm sure of it. I did when I posted the info of the muxed file in my earlier post (negative 50ms actually) but I edited the post to correct that when I realized my mistake. The delay should be negative, shouldn't it?

I just checked those samples again with foobar2000. According to it:
"original sample dts.mkv": 27.319s (1,311,298 samples)
"demuxed DTS": 27.328s (1,311,744 samples)
"sample aac remuxed.mkv": 27.319s (1,311,298 samples)
"foobar encoded aac.m4a": 27.328s (1,311,744 samples)

Speaking of WTF? I am losing the plot on what's going on now. I might have to rest it for a while and come back fresh a bit later. Maybe foobar2000 stops counting where the video stream stops? What are you using to check the length of the audio inside the MKVs?

While you were playing with my samples I re-encoded the audio and remuxed it again. Same file, only this time it's the whole file, not just a small sample of it, and I'm 200% sure I applied no audio delay when remuxing. This time, everything seems to be as it should (I think). I guess the end result is the MKV containing AAC audio should be the same length as the original containing DTS?

According to foobar2000:
Original MKV with DTS audio: 01:30:02:531
DTS after it was demuxed: 01:30:02:528
M4A after converting DTS to AAC: 01:30:02:528
Remuxed MKV with AAC audio: 01:30:02:531

Tag info from M4A as reported by MP3Tag:
00000000 00000A40 000003C0 000000000F74EE00 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

When I return, I'll re-do/re-check the encodes I mentioned in my first post and try to discover why foobar2000 reported the remuxed AAC MKV to be longer than the original MKV containing DTS audio. And I'll make some fresh samples from a different video.

Cheers.

hello_hello
8th March 2012, 12:00
So I take you manually entered "55ms" into the delay field of mmg instead of letting it do its work?

Here's a thought......
MKVMergeGUI found a chapter file inside the m4a file. It starts at the 54ms mark if I remember correctly. When I remuxed the AAC into the MKV and removed the chapter file, that couldn't have caused MKVMerge to use the chapter info to apply a 54ms audio delay to the audio could it? It'd explain why it seems to have removed the extra silence from the beginning of the AAC while a 55ms delay was applied for no apparent reason, and it'd explain why my tests seem to indicate my encodes are about 50ms out of sync with the original.

I also included a sample where I didn't delete the chapter in the M4A file when I remuxed it. Does that sample also have the same 55ms delay applied?

madhatter300871
8th March 2012, 13:08
I'd like to make the following very basic observations and hope you guys don't feel at all patronised by my simple approach. I find starting simple allows me to slowly build up my knowledge and ultimately gain a better understanding.

I'm wondering if I could somehow turn this into a sticky once I have more info ?

I would welcome and value all criticism of, corrections to, and additions to my post.

My 'audio encoder delay for dummies' would go something like this :-

-------------------------------------------------

MDCT based audio encoders apply silence to the beginning of the newly encoded audio track (referred to as priming) and silence to the end of the newly encoded audio track (referred to as padding). Lossless audio encoders do not.

The amount of priming and padding differs, and depends on the encoder used, encoding profile used and sample rate.

A good explanation of this principle as used by Apple for AAC can be found here :-
https://developer.apple.com/library/mac/#technotes/tn2258/_index.html
And for Lame MP3 here :-
http://mp3decoders.mp3-tech.org/decoders_lame.html

Encoders known to NOT apply priming/padding are :-
Raw PCM
Flac
OGG lossless ?

Values for encoder delay (priming) so far are :-
iTunes / QuickTime: 2112
faac: 1024
Nero: 2624
Lame CBR : 1105
Lame VBR : 2257

Suggested decoder behaviour so far :-
faad: trims 1024 priming samples when decoding.
ffaac(libavcodec): trims nothing.

Audio delay can be set in the muxing application. The following muxers are known to support this :-
MKV : MKVmerge
MP4 : MP4box

Muxing applications can read priming meta data if it is present inside the audio stream and compensate for it automatically. The following muxers support this :-
MKV : MKVmerge (5.3.0 onwards ?) when muxing audio encoded with NeroAACenc.

Audio delay not due to priming can also be present in the source movie file. Some applications support reporting, and removing, this delay. Applications known to support this are :-
DVD : DGindex. Includes the delay value in the file name of demuxed audio streams.
Bluray : Eac3to. Removes audio delay when demuxing streams and reports any remaining delay that can't be trimmed.

This delay also needs to be accounted for when applying the final amount of delay in the muxer.

A good starting point for calculating delay to specify in the muxer is :-
Source audio delay (ms) - (encoder delay in samples / sample rate in Hz) * 1000

For example, a DVD with an audio sample rate of 48KHz, audio delay of -128ms (as reported by DGindex for example) encoded using iTunes/Quicktime AAC would be :-
-128 - (1/48000)*2112*1000 = -172ms

This may not be totally accurate due to different encoder profiles being used and other factors, but as a basic starting point it should give better a/v sync than not allowing for encoder audio delay at all ?

The same source encoded using NeroAACenc and muxed using MKVmerge would just have a delay of -128ms set, as the encoder delay is read from the newly encoded AAC stream and trimmed automatically by MKVmerge.
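The starting-point formula can be sketched in Python as follows. The function name is hypothetical, and the result is rounded to whole milliseconds on the assumption that the muxer takes an integer delay.

```python
def delay_for_muxer_ms(source_delay_ms: float, priming_samples: int,
                       sample_rate_hz: int) -> int:
    """Source audio delay minus the encoder priming, in whole milliseconds.

    source_delay_ms: delay reported for the source track (e.g. by DGindex).
    priming_samples: silence prepended by the encoder, in samples.
    sample_rate_hz:  sample rate of the encoded audio.
    """
    priming_ms = priming_samples * 1000.0 / sample_rate_hz
    return round(source_delay_ms - priming_ms)


if __name__ == "__main__":
    # The DVD example above: -128ms source delay, iTunes/QuickTime AAC at 48 kHz.
    print(delay_for_muxer_ms(-128, 2112, 48000))
```

If the muxer already trims the priming itself (as described for NeroAACenc and MKVmerge), only the source delay would be entered and this calculation is not needed.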

Some questions :-
1. Apple state that the decoder should make an automatic assumption and trim 2112 samples from the output upon playback. Do all AAC decoders do this ?
2. I may have totally misunderstood MP3 delay. From the link above I have assumed encoder delay to be 576 and have added decoder delay and header size together to come up with decoder delay. Have I assumed correctly ?
3. Do any decoders make automatic assumptions about encoder delay ?
4. Do any decoders make allowance for decoder delay ?
5. Do any decoders read audio stream meta data for delay values ?

sneaker_ger
8th March 2012, 14:03
Here's a thought......
MKVMergeGUI found a chapter file inside the m4a file. It starts at the 54ms mark if I remember correctly. When I remuxed the AAC into the MKV and removed the chapter file, that couldn't have caused MKVMerge to use the chapter info to apply a 54ms audio delay to the audio could it? It'd explain why it seems to have removed the extra silence from the beginning of the AAC while a 55ms delay was applied for no apparent reason, and it'd explain why my tests seem to indicate my encodes are about 50ms out of sync with the original.

I also included a sample where I didn't delete the chapter in the M4A file when I remuxed it. Does that sample also have the same 55ms delay applied?

mkvmerge will never apply any delay using the chapter. It doesn't have any influence on that.

I don't know what you did wrong, but I guess it doesn't matter. We know that it works and you have confirmed it. I don't know all the quirks of foobar2000, nor do I feel like testing it.

I always used ffaudiosource in AviSynth for opening the files and created wave files using avs2pipemod. (It also shows the length of the wav it piped)
Then I use Audacity to compare the waveforms (and/or length).

For future tests I recommend using a wave file as a source, so you can rule out any problems coming from the DTS decoder or foobar2000. And use ffaudiosource to get the results, not foobar2000.

VFR maniac
8th March 2012, 16:13
1. Apple state that the decoder should make an automatic assumtion and trim 2112 samples from the output upon playback. Do all AAC decoders do this ?


Why should all aac decoders follow Apple's suggestion?
One could equally ask why an aac decoder should follow another aac decoder's assumption.
If a stream carries no signal of encoder delay and/or trailing padding, that information should be described at the container level and the samples trimmed by the presentation layer (decoder->compositor->remove padded samples); it is not the audio decoder's job.
This is fair, isn't it?

AFAIK, faad (not libfaad) assumes 1024 priming samples and removes it.
ffaac (libavcodec's aac decoder) doesn't assume any priming. (I prefer this behaviour.)


For MP4/MOV files, there is a recommended explicit representation of encoder delay: edit list + pre-roll sample grouping.
(See the newest qtff.pdf Appendix G, 14496-14:2003 Cor1:2006 and 14496-24:2008)
M4A iTunSMPB can't indicate two or more different encoder delays in one movie file.

Note 1: MP4Box doesn't support encoder delay signaling, and the official MP4Box has a bug in the timescale of negative delays for the import option.
Note 2: HE-AAC's encoder delay is not always the same as LC-AAC's.

madhatter300871
8th March 2012, 16:48
I wasn't implying decoders should follow Apple's suggestion, I was just posing the question to help improve my understanding.

The more I am learning the more that I also prefer a decoder not to assume any priming.

MP4box does support delaying a track with :-

-delay trackID=TIME : sets track start-time offset, specified in milliseconds.

Is this not enough to compensate for encoder delay ?

I'll update my 'dummies guide' with the suggested faad and ffaac behaviour.

madhatter300871
8th March 2012, 16:55
If I encode a test batch of audio files, say 10 files in total, with different encoders, is it a fair assumption that for each file (encoded with identical settings) encoder delay should be the same (or roughly the same) ?

What I'm trying to achieve here are some basic starting values that could be applied as delay in the final container in order to aid closer a/v sync. For example, I may find that all files encoded to MP3 48 kHz 192kbps have about 50ms of silence applied by the encoder, so I can automatically add -50ms delay in the output.

Also, if the encoder applies a delay and the decoder applies a delay, should I be setting a delay equal to the sum of them both in my output container ?

I may not get it right all the time but I am coming to the conclusion that assuming no encoder delay is the wrong thing.

pandy
8th March 2012, 17:58
The reason I ask is that I have recently read somewhere (apologies for not remembering exactly where !) that this might be the case.

It depends only on the medium used as the source (storage). For files (random and nonlinear access possible) this should not be a problem. It can be a serious problem for streaming (i.e. linear broadcast), where the decoder is not able to read more data before decoding.

madhatter300871
8th March 2012, 18:02
It depends only on the medium used as the source (storage). For files (random and nonlinear access possible) this should not be a problem. It can be a serious problem for streaming (i.e. linear broadcast), where the decoder is not able to read more data before decoding.

Are you sure ?

The general consensus here is that MDCT based encoders apply silence at the beginning of a track, and padding at the end.

Decoders may incur further delays and muxers may or may not allow for encoder delay and trim audio accordingly.

VFR maniac
8th March 2012, 18:19
MP4box does support delaying a track with :-

-delay trackID=TIME : sets track start-time offset, specified in milliseconds.

Is this not enough to compensate for encoder delay ?


Supporting a delay (edit list only) is not the same as supporting encoder delay signaling (edit list + pre-roll sample grouping) in MP4/MOV, since an edit list in MP4/MOV can select the presentation range from any arbitrary portion of the media in a track.
To signal the presence of encoder delay in MP4/MOV, pre-roll sample grouping is also required; it indicates that decoding must always start from the frame immediately prior to the frame at which you want to start presentation.

Let's say an AAC stream consists of AAC[0] AAC[1] AAC[2] ....
If you want AAC[2] to sound correct on random access, you must start decoding from AAC[1] because of the MDCT.
Delaying by the number of priming samples can't signal this at the container level, can it?

MP4Box's delay option can't yet signal that decoding must start from the frame before the one at which sound starts.
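
As a rough illustration of the point above (a sketch only, not how any particular muxer or player implements it): if the container says presentation starts some number of samples into the track, the player still has to begin decoding one frame earlier, because each MDCT frame overlaps with its predecessor. The 1024-sample frame size is the usual AAC-LC value; the function name and the one-frame pre-roll default are assumptions made for illustration.

```python
AAC_LC_FRAME_SAMPLES = 1024  # samples per AAC-LC frame


def presentation_start(priming_samples,
                       frame_samples=AAC_LC_FRAME_SAMPLES,
                       pre_roll_frames=1):
    """Return (first_presented_frame, offset_in_frame, first_decoded_frame).

    pre_roll_frames is how many frames before the presented one must be
    decoded (and their output thrown away) so the overlap is correct.
    """
    first_presented = priming_samples // frame_samples
    offset = priming_samples % frame_samples
    first_decoded = max(0, first_presented - pre_roll_frames)
    return first_presented, offset, first_decoded


# Nero's 2624 priming samples, as quoted in this thread
print(presentation_start(2624))  # (2, 576, 1)
```

With 2624 priming samples the sound starts 576 samples into AAC[2], and decoding has to begin at AAC[1]. A plain delay value carries the first two numbers at best, not the third.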

madhatter300871
8th March 2012, 18:34
In all honesty pre-roll sample grouping and MDCT algorithms are concepts way outside of my comfort zone, so I must claim ignorance at this point and state that, at a complex technical level, I am in way too deep !!

What I'm trying to achieve is simply to specify an arbitrary amount of delay in my final output, be it MKV or MP4, based on the audio encoder used, encoder settings and sample rate.

I know this is a very simplistic ideal and one that is technically incorrect and I am sure makes me sound incompetent. I am sure that not allowing for delays incurred by the encoder is the wrong thing to do and in the absence of any quick-n-dirty way to ascertain the amount of priming at the beginning of an audio track I am trying to gather info on values that can be expected, by and large, on the whole, all things being equal.

It's a learning process, but so far I have learned (for example) that encoding with NeroAacEnc and muxing with MKVmerge (5.3.0 or later) will mean that priming is taken care of and the output adjusted automatically.

It is this type of info I am trying to get.

I genuinely read this thread with great interest and still welcome any and all input but please excuse my ignorance when I struggle to understand the complexities of what is going on under the bonnet.

I back up my blurays and DVDs and play them direct from hard drive on my HTPC connected to a projector. It's not life or death for me, it's a hobby I thoroughly enjoy, and I am hoping to make my backups just that little bit "better" by having a slightly more accurate a/v sync.

pandy
8th March 2012, 18:53
Are you sure ?

The general consensus here is that MDCT based encoders apply silence at the beginning of a track, and padding at the end.

Decoders may incur further delays and muxers may or may not allow for encoder delay and trim audio accordingly.

IMHO still not an issue. An encoder can be built so that its internal latency is hidden: null samples precede the signal samples, and those samples can be discarded on the decoder side. The issue is how to feed data to the decoder so the latency is hidden (for example DTS/PTS stamps in TS).

So I'm quite sure that latency (encoder/decoder) matters more when data is streamed than when data can be read asynchronously and the decoder has plenty of processing power.
For the typical PC case, where files are stored on HDD and the decoder is software running on a CPU that uses only a small fraction of its processing power to decode the data, latency can be hidden nicely.

madhatter300871
8th March 2012, 19:00
Pandy ... interesting. So are you saying that decoders discard priming samples that are added to the track by the encoder ?

Is this all decoders, or only certain ones ?

I'm particularly interested in aac, mp3, ac-3 and DTS as it is these that I use most often.

madhatter300871
8th March 2012, 20:21
Regarding MP3 encoding, I read this link :-

http://www.hydrogenaudio.org/forums/index.php?showtopic=69525&st=0&p=615515&#entry615515

...about 2/3 of the way down there is a user's explanation of why using the lame tag in an mp3 file destined as a movie audio track is a bad idea, and how it can be avoided when encoding with lame by using the -t switch.

This will remove some delay. I'm going to do some tests, but anyone have an opinion on this ?

madhatter300871
8th March 2012, 23:50
I've done some tests. I have demuxed a DTS track from an episode of Vampire Diaries (don't judge me !!). I have used LeeAudBi to get stream details.

Source DTS track, 235099 frames, 41m:47.723s. Encoded to MP3 using Behappy without -t switch, 104491 frames, 41m:47.784s. Encoded to MP3 using Behappy with -t switch, 104490 frames, 41m:47.76s.

Made 3 x MKVs using MKVmerge 5.3.0: one using the source track, one using the MP3 without -t track, and one using the MP3 with -t track. In all honesty, I'm finding it impossible to notice any lip sync issues with any of them.

I then made an MKV with a -22ms offset on the audio, MediaInfo reported the MKV as having -2ms audio delay compared to the video.

I then made an MKV with a -50ms offset on the audio, MediaInfo reported the MKV as having -22ms audio delay compared to the video.

This doesn't seem right, should it not have reported -22 and -50 respectively ?

I have only made this test with one media file, I suppose I would have to try like 10 or 20 different source files to correlate results. But on the surface of it it looks like the audio delays incurred due to encoder priming are negligible and (to me at least) unnoticeable. Would people tend to agree that this could be the case ?

On a side note, Lame stores meta data regarding priming and padding values, could MKVmerge be made to read this data and trim accordingly as it does for aac ?

hello_hello
9th March 2012, 05:07
I then made an MKV with a -22ms offset on the audio, MediaInfo reported the MKV as having -2ms audio delay compared to the video.

I then made an MKV with a -50ms offset on the audio, MediaInfo reported the MKV as having -22ms audio delay compared to the video.

This doesn't seem right, should it not have reported -22 and -50 respectively ?

The way I understand it, MKVMergeGUI applies a negative audio delay by trimming the beginning of the audio. It would, I assume, only trim whole frames, so in your case I'd guess the first frame is 20ms long, so it'd remove the first frame and then apply an additional -2ms delay. Something like that....

I have only made this test with one media file, I suppose I would have to try like 10 or 20 different source files to correlate results. But on the surface of it it looks like the audio delays incurred due to encoder priming are negligible and (to me at least) unnoticeable. Would people tend to agree that this could be the case ?

Yes.

On a side note, Lame stores meta data regarding priming and padding values, could MKVmerge be made to read this data and trim accordingly as it does for aac ?

Logically it should be able to. Maybe it already does? Rather than speculate, maybe it'd be worth asking some questions in the MKVToolNix thread to see if the author can provide any definitive answers.
LAME seems to write the total delay and padding when it encodes, as when converting a wave file with foobar2000 it always reports the resulting MP3 as having exactly the same duration and number of samples as the source. The same applies to AAC (using Nero), but not AC3 (using Aften). If foobar2000 can encode and decode MP3/AAC using the data, I don't see why a muxer couldn't use it too. And I do wonder if maybe it isn't something MKVMerge already does....

By the way, I think your earlier assumption of a 576 sample delay for MP3 is incorrect. Unless I'm misunderstanding it there's also a delay when decoding to be factored in. The decoder delay is always 529 samples. The encoder delay varies, plus there's the VBR header delay as well. So going by this chart (http://mp3decoders.mp3-tech.org/decoders_lame.html#delays) a LAME encoded CBR MP3 requires a total 1105 sample delay, while for VBR it's 2257.
I've not read of a decoder delay when it comes to AAC so I assume it's just the encoder delay which needs to be compensated for.
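
The arithmetic behind those two totals can be written out as a small sketch. The 576 and 529 figures are the ones quoted in this thread; treating the VBR header as exactly one 1152-sample MP3 frame is an assumption, although it is what the difference between the two totals (2257 - 1105) works out to.

```python
MP3_FRAME_SAMPLES = 1152   # samples per MPEG-1 Layer III frame
LAME_ENCODER_DELAY = 576   # encoder delay figure used earlier in the thread
DECODER_DELAY = 529        # decoder delay quoted above


def total_mp3_delay(vbr_header_frame=False):
    """Total samples of delay to compensate for: encoder + decoder,
    plus one extra frame if a VBR header frame precedes the audio
    (assumption: the header occupies exactly one frame)."""
    total = LAME_ENCODER_DELAY + DECODER_DELAY
    if vbr_header_frame:
        total += MP3_FRAME_SAMPLES
    return total


print("CBR:", total_mp3_delay())                       # CBR: 1105
print("VBR:", total_mp3_delay(vbr_header_frame=True))  # VBR: 2257
```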

VirtualDubMod (according to its help file) seems to assume all MP3s have a total 1393 sample delay, which it automatically applies when muxing. As best as I can tell, it compensates for this delay without trimming the beginning of the audio. If you add an MP3 stream to an AVI, save it, open the saved AVI and then demux the MP3 stream it'll be the same length as when you started. Well VirtualDubMod does trim the end of the audio stream if it's longer than the video stream, but it only trims the beginning if you manually apply a negative delay.
I can't seem to find out how VirtualDub handles any MP3 delay.

Probably tomorrow, I'm going to encode a few more DTS files and mux them into MKVs just to confirm MKVMerge is compensating for the encoder delay as it should. sneaker_ger seems to be confident it is and I have no reason to doubt him but I'd still like to prove it to myself properly. Something seemed to be going wrong when I was testing it yesterday. Probably just me.....

madhatter300871
9th March 2012, 10:43
Ahh, OK. Audio is trimmed frame at a time (or something) and the remainder is dealt with by adding a delay. OK.

I'll pose a question on the MKVtoolnix forum, good idea !

I did post encoder and decoder delays but I now know it is the sum of both values that is the total delay that should be accounted for. I'll go back and change my values. Why I did that was I wasn't (still not) sure whether any particular decoder dealt with decoder delay in any way that meant allowing for it at mux time was the wrong thing to do. For example it is suggested that faad trims 1024 samples when decoding, libavcodec trims nothing. So does this mean that if I use libavcodec for decoding I should include decoder delay in my final muxer delay value but if I use faad I should not include decoder delay in my final muxer value ?

Another example, faac adds 1024 priming samples, faad trims 1024 samples so no delay need be added at the muxing stage, but libavcodec trims nothing so if using that decoder a delay need be added at the muxer stage.

Am I even getting warm here ?

I'm finding it very difficult to put all this together. The concept is easy to understand yet finding what encoders/decoders/muxers follow what format is proving tough !

madhatter300871
9th March 2012, 13:56
So I posted the question on the MKVtoolnix forum, Mosu responded as follows :-

mkvmerge uses the same algorithm it uses for the "--sync" option. Meaning that all timecodes for that track are shifted by the encoder delay information. Packets that have a negative duration after all shifts have been applied are discarded.

At the moment only AAC in MP4 files are handled this way. MP3 files are not.

That answers that then.
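
A minimal sketch of that shift-then-discard behaviour, under stated assumptions: it is not mkvmerge's actual code, it reads "negative" as "packet timestamp below zero after the shift", and it uses 24 ms packets (1152-sample MP3 frames at 48 kHz, which is consistent with the frame counts in the MP3 test above). The function name is made up for illustration.

```python
def shift_and_drop(packet_count, packet_ms, shift_ms):
    """Shift every packet timestamp by shift_ms, then drop the packets
    whose shifted timestamp is negative. Returns the kept timestamps."""
    shifted = [i * packet_ms + shift_ms for i in range(packet_count)]
    return [t for t in shifted if t >= 0]


PACKET_MS = 24  # assumed: 1152 samples at 48 kHz

for shift in (-22, -50):
    kept = shift_and_drop(10, PACKET_MS, shift)
    dropped = 10 - len(kept)
    print(f"{shift} ms: dropped {dropped} packet(s), first kept at {kept[0]} ms")
```

Under these assumptions a -22 ms offset drops one packet and leaves a 2 ms remainder, and -50 ms drops three and leaves 22 ms. Those remainders have the same size as the figures MediaInfo reported earlier in the thread; the sketch makes no attempt to reproduce the sign MediaInfo shows.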

madhatter300871
9th March 2012, 15:54
So far, encoder delay (priming) figures are as follows :-

iTunes / QuickTime: 2112
faac: 1024
NeroAACenc: 2624
Lame CBR : 1105
Lame VBR : 2257
Aften AC3 : 256

MKVmerge can read AAC in MP4 meta data and deals with the delay automatically.

Aften can use '-pad 0' switch to NOT insert 256 blank samples.
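
To turn the sample counts above into something that can be typed into a muxer's delay field, they need converting to milliseconds at the track's sample rate. A small sketch, with the figures copied from the list above; the 48 kHz default is an assumption, so pass the real rate for other material.

```python
PRIMING_SAMPLES = {
    "iTunes / QuickTime": 2112,
    "faac": 1024,
    "NeroAACenc": 2624,
    "Lame CBR": 1105,
    "Lame VBR": 2257,
    "Aften AC3": 256,
}


def delay_ms(samples, sample_rate=48000):
    """Convert a delay in samples to milliseconds."""
    return samples * 1000.0 / sample_rate


for name, samples in PRIMING_SAMPLES.items():
    print(f"{name}: {samples} samples = "
          f"{delay_ms(samples):.1f} ms at 48 kHz, "
          f"{delay_ms(samples, 44100):.1f} ms at 44.1 kHz")
```

The value to enter in the muxer would be the negative of the result, and only when neither the muxer nor the decoder already compensates for the delay.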

pandy
9th March 2012, 18:25
Pandy ... interesting. So are you saying that decoders discard priming samples that are added to the track by the encoder ?

Is this all decoders, or only certain ones ?


I only say that a decoder can have such functionality. If you can correctly signal to the decoder that you added null samples to the signal in the encoder, then the decoder can use this signaling and discard some decoded samples. This behavior can hide (but not remove) the padding.


I'm particularly interested in aac, mp3, ac-3 and DTS as it is these that I use most often.

container is also important

madhatter300871
10th March 2012, 00:58
Pandy .... it just doesn't work like that at the moment, hence me trying to get priming figures. MKVmerge with AAC in MP4 is the only one so far (as far as i have learned so far) that does it.

pandy
12th March 2012, 14:40
Pandy .... it just doesn't work like that at the moment, hence me trying to get priming figures. MKVmerge with AAC in MP4 is the only one so far (as far as i have learned so far) that does it.

This is my point: focus on the container and a correct muxer/demuxer implementation, correct in the sense that the muxer and demuxer care about audio/video position.

The issue with the PC is that not many developers care about audio/video sync. Jitter is also very large, and you can expect fluctuations in audio/video speed during playout. This is why I always say that the PC is not a good environment for multimedia. Nowadays people don't care about sync or jitter; they accept sample-rate conversion and temporal conversion and don't see any issue. Even large broadcasters can send invalid video (swapped field order); it is a common situation.

PS
Don't trust reporting tools too much - they can be easily confused.

hello_hello
25th March 2012, 19:50
Tag in "foobar encoded aac.m4a":
00000000 00000A40 000001C0 0000000000140400 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

Delay = A40 = 2624 = 55ms (0.054666... s)
Original length = 1311744 = 27.328s
Padding = 1C0 = 448 = 9ms

Checking the length of "original sample dts.mkv": 27.328s (=correct!)
Checking length of "sample aac remuxed.mkv": 27.392s (wtf?)

Remuxing audio of "foobar encoded aac.m4a" and video of "original sample dts.mkv" to "reremux.mkv". Length = 27.337s = 27.328s + 9ms = original length + padding (as expected)

Comparing waveform of "reremux.mkv" and "original sample dts.mkv": no extra delay (as expected)

Comparing waveform of "sample aac remuxed.mkv" and "original sample dts.mkv": delayed by 55ms

So I take it you manually entered "55ms" into the delay field of mmg instead of letting it do its work?
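
The hex fields in that tag can be decoded with a few lines. This is a sketch based only on the layout used in the calculation above (second field = delay, third = padding, fourth = original length, all hexadecimal sample counts); the 48 kHz sample rate is an assumption, chosen because it is the rate at which 2624 samples comes out at about 55ms.

```python
def parse_itunsmpb(tag):
    """Pull delay, padding and original length (in samples) out of an
    iTunSMPB-style string of space-separated hex fields."""
    fields = tag.split()
    return {
        "delay": int(fields[1], 16),
        "padding": int(fields[2], 16),
        "original_length": int(fields[3], 16),
    }


TAG = ("00000000 00000A40 000001C0 0000000000140400 "
       "00000000 00000000 00000000 00000000 "
       "00000000 00000000 00000000 00000000")

SAMPLE_RATE = 48000  # assumed

info = parse_itunsmpb(TAG)
for key, samples in info.items():
    print(f"{key}: {samples} samples = {samples / SAMPLE_RATE:.3f} s")
```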

If you're still around sneaker_ger I think I found the problem. &$%^%$!! Despite not having found the motivation to do any further experimenting.....

However I was reading through the MKVMerge changelog looking for something else today. At the time I made those samples (8th March) I was running version 5.2.1. (I mentioned it in post #22).
Anyway if I'm reading the change-log correctly the compensation for MP4/AAC audio delay wasn't included until version 5.3.0, which I think was the latest version at the time (release date for 5.4.0 says 10th March) and because you said "newer versions of mkvmerge read and apply the AAC audio delay" I seem to have incorrectly included the version I was using in that category. &*^&%$!!!
I assume from reading your above calculations again the 55ms encoder delay simply wasn't being compensated for by the version of MKVMerge I was using. *&%%^!!

Oh well.... at least I can use MediaInfo to work out which version of MKVMerge I was using when muxing older MKVs, so one day I might get motivated to remux all the AAC files with a -50ms delay, which is pretty much what I worked out would be required earlier in the thread (post #20). Although it shouldn't be too hard. I upgraded from 5.2.1 to 5.4.0 on 11th March, so it's pretty much any video with AAC audio I've encoded prior to that date. &^&%$%!!!

sneaker_ger
25th March 2012, 20:12
Yes, it hadn't been implemented in 5.2.1 stable yet. I also overlooked that - those files have indeed been created using 5.2.1 according to MediaInfo.