improper vbr audio encoding


Lashford42
17th December 2002, 21:21
i have two xvid avi's that i want to join together and re-encode into either divx or xvid. when i try to open the xvid files, it says "VirtualDub has detected an improper vbr audio encoding in the source avi file and will rewrite the audio header with standard cbr values during processing for better compatibility. this may introduce up to 3679 ms of skew from the video stream. if this is unacceptable, decompress the 'entire' audio stream to an uncompressed wav file and recompress with a constant bitrate encoder (bitrate: 157.7 +- 13.9 kbps)" is there no way i can just add the files without decompressing everything separately? am i missing some codec or tool? thanks for the help y'all.

~your lashdaddy
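
The skew figure in that warning can be made concrete. What follows is a minimal sketch, not VirtualDub's actual code: it assumes that once the header is rewritten with CBR values, the audio position is computed as bytes read divided by one average byte rate, while the true position is the frame count times 1152 samples. The function name and the frame sizes are made up for illustration.

```python
SAMPLES_PER_FRAME = 1152  # samples in one MPEG-1 Layer III frame


def cbr_skew_ms(frame_sizes, sample_rate):
    """Worst-case gap (ms) between true frame times and CBR-assumed times.

    Assumption: a CBR-minded reader converts a byte offset to a time by
    dividing by the stream's average byte rate.
    """
    frame_dur = SAMPLES_PER_FRAME / sample_rate      # seconds per frame
    total_time = len(frame_sizes) * frame_dur
    avg_byte_rate = sum(frame_sizes) / total_time    # bytes per second

    worst = 0.0
    bytes_so_far = 0
    for index, size in enumerate(frame_sizes):
        true_time = index * frame_dur                # where the frame really starts
        cbr_time = bytes_so_far / avg_byte_rate      # where a CBR header says it starts
        worst = max(worst, abs(cbr_time - true_time))
        bytes_so_far += size
    return worst * 1000.0


# Hypothetical streams: one with constant frame size, one quiet-then-loud VBR.
constant = [400] * 200
variable = [300] * 100 + [500] * 100
print(f"constant frames: {cbr_skew_ms(constant, 44100):.1f} ms")
print(f"variable frames: {cbr_skew_ms(variable, 44100):.1f} ms")
```

With equal-sized frames the two clocks agree. The more the frame sizes drift away from the average over a long stretch, the larger the skew, which would explain why the warning quotes both a skew and a bitrate spread.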

jggimi
17th December 2002, 22:26
Nandub -- a variant of VirtualDub -- is what most folks use to manipulate .avi files with vbr mp3 files. For example, Gknot uses Vdub for DivX 5 encoding, then uses Nandub for muxing of .ac3 or .mp3 audio, and the default .mp3 audio transcoded by Gknot is vbr.

hakko504
17th December 2002, 22:41
@jggimi

Living up to your title again I see :D

In this case I think VirtualDubMod (v1.4.12.1) is a better choice. It will do everything VirtualDub can do and a lot more, including most things nandub did (not SBC). You'll be able to open the files you want to join, set the audio to direct stream copy and re-encode the video to XviD in one go. Still, if the audio is VERY different, you may have to extract it and join the audio files first, before using VDM to join everything. Audio can be a little tricky if you don't have complete frames to join. You will get the warning about vbr audio, but if you check 'use nandub vbr mp3 mode' it will handle the audio exactly like nandub did.

jggimi
17th December 2002, 22:59
That's why I'm glad you're such a frequent contributor (and corrector). See http://forum.doom9.org/showthread.php?s=&postid=226430#post226430.

hakko504
17th December 2002, 23:02
yep, and you're great fun to read. :)

ChristianHJW
18th December 2002, 01:16
Since i am helping out at a place where mainly Newbies turn to i am questioning more and more if the gain in quality by using VBR MP3 is worth the hassle and incompatibilities with AVI standard. I would estimate that about 30% of the people turning to this place have probs with VBR audio ...

Belgabor
20th December 2002, 11:47
I can only support Chris. NEVER EVER USE VBR MP3 IN AVIS! It's very, very broken. (And don't give me that 'it's working for me' crap. If you play Russian roulette it will work out 5 out of 6 times, too.)

R3g
28th December 2002, 17:27
Are there different ways to fit VBR audio in avis ?
I ask because I have made more than 50 encodes with VBR Mp3 muxed with Nandub, and never had a problem. And a friend of mine got several really hard problems with downloaded files. So perhaps Nandub is better than other software, or perhaps I am just a lucky guy ?

Belgabor
29th December 2002, 02:14
I'd say you're lucky. To be honest, I surely have lots of files where it works, too, but on the other hand I had several files with freeze frames that were simply gone as soon as I demuxed the file. Well, my point is cbr mp3 isn't that much worse (and works well). If you really want vbr audio use it in ogm (which works well, too).

R3g
29th December 2002, 12:42
Yes, that must be the only point where I'm lucky. Anyway I switched to OGM or CBR audio some months ago.
In fact I wanted to know if, AFAYK, there is only _the_ way of achieving VBR audio in avi, or if each software achieves it its own way.

ChristianHJW
30th December 2002, 11:36
Suiryc explained it in detail on irc.freenode.net #matroska ... if you are interested i send you the channel log ( text file, few kb ) ...

R3g
30th December 2002, 12:51
Yes, I am ! Please mail me R3g@gmx.fr

Belgabor
30th December 2002, 17:53
Chris, why don't you post the log in the audio encoding forum? I think this might interest more people.

ChristianHJW
30th December 2002, 23:55
Originally posted by Belgabor
Chris, why don't you post the log in the audio encoding forum? I think this might interest more people.

Don't want DG to kick my ass for flooding his forums :D .. doing it here :

[21:33] <Belgabor|Home> cyrius, what did your experimets tell?
[21:33] <Suiryc> Belgabor|Home : I think I know know why VBR is not good, and also why Nando's hack works (somehow)
[21:33] <Suiryc> s/know/now
[21:33] <Belgabor|Home> ok, tell me
[21:33] <Suiryc> :)
[21:34] <Suiryc> first of all there are 2 'headers' in the AVI (audio) stream
[21:34] <Belgabor|Home> I have the feeling i need to hammer that down some throat soon :p
[21:34] <ChristianHJW> lol
[21:34] <Suiryc> first one is a general one (the same struture is used for each track)
[21:35] <Suiryc> AVISTREAMINFO
[21:35] <Suiryc> (IIRC ... there should use shorter names ...)
[21:35] <Belgabor|Home> lol
[21:37] <spyder482> ChristianHJW: I won't be moving for a few months still though
[21:37] <Suiryc> this one tell how many frames there are in the stream
[21:37] <Suiryc> and what is the rate of the frames
[21:37] <Suiryc> thanks to dwRate & dwScale fields
[21:37] <Belgabor|Home> got that
[21:38] <Suiryc> it also contains a field saying the size of 1 frame
[21:38] <Suiryc> if VBR, then it is set to 0, otherwise it is set to the correct value
[21:38] <Belgabor|Home> dwSampleSize
[21:39] <Suiryc> yep
[21:39] <Suiryc> then there is a header specific to the audio stream (based on WAVEFORMATEX)
[21:39] <Suiryc> this one tell the samplerate (44100, 48000, ...)
[21:39] <Suiryc> the byterate
[21:39] <Suiryc> the format (wFormatTag)
[21:40] <Suiryc> and especially contains a field names nBlockAlign
[21:40] <Suiryc> nBlockAlign tell how many bytes an audio frame contains
[21:40] <Suiryc> _BUT_
[21:40] <Belgabor|Home> And that musnt be 0
[21:40] <Suiryc> cannot be set to 0
[21:40] <spyder482> so much work for AVI...
[21:40] <Suiryc> :)
[21:40] <Belgabor|Home> ok, i think i get the picture
[21:40] <Suiryc> ok so let's continue
[21:41] <Belgabor|Home> ok
[21:41] <ChristianHJW> all with you guys ...
[21:41] <Suiryc> in Nandub here is what happens with an MP3 stream (VBR one)
[21:42] <Suiryc> Nando set dwRate to the samplerate (44100, 48000, ...)
[21:42] <spyder482> don't you two have a channel for this? :)
[21:42] <Suiryc> spyder482 : shut up :P
[21:42] <Suiryc> and set dwScale to 1152
[21:42] <spyder482> lol
[21:42] <Belgabor|Home> no, the other one is just for lurking
[21:42] <Belgabor|Home> :p
[21:42] <Suiryc> :]
[21:42] <spyder482> hehe
[21:43] <Suiryc> and set nBlockAlign to 1152 too
[21:43] <Suiryc> then, when muxing it only treat whole MP3 frames
[21:43] <Suiryc> (i.e. each MP3 frame is in its own Chunk)
[21:44] <Suiryc> you still follow ?
[21:44] <md`> who has done the mpeg2 import part of vdmod?
[21:44] <Belgabor|Home> ok, one mp3 frame is what?
[21:44] <Belgabor|Home> pulco-citron
[21:44] <md`> hmpf
[21:44] <spyder482> pulco-citron
[21:44] <spyder482> oh
[21:44] <spyder482> :)
[21:45] <md`> why does he generate d2v and dont let the user decide to pick one...
[21:45] <Belgabor|Home> dunno
[21:45] <md`> if there is one already
[21:45] <md`> hmmm
[21:45] <Suiryc> Belgabor|Home : an Mpeg1-Layer3 frame is the shorter block of data you can use
[21:45] <ChristianHJW> let Suiryc finish guys .. please
[21:45] <md`> yes ok
[21:45] <Belgabor|Home> ok
[21:45] <spyder482> ChristianHJW: check #virtualdub
[21:45] <Suiryc> it contains an header saying what is in the frame, and then the data (audio)
[21:46] <ChristianHJW> we have to know whats wrong in AVI to be able to advertise matroska ;-)
[21:46] <Belgabor|Home> this is how much data?
[21:46] <Suiryc> somehow 1 MP3 frame ~ 1 video frame
[21:46] <Belgabor|Home> ChristianHJW: lol
[21:46] <Suiryc> the size of a frame depends on the MP3 settings
[21:46] <Suiryc> (i.e. bitrate, ...)
[21:46] <Belgabor|Home> ok
[21:47] <Belgabor|Home> is it fixed for a file or varible in vbr?
[21:47] <Suiryc> however a Mpeg1-layer3 frame conatins 1152 samples
[21:47] <Suiryc> the size of a frame is variable
[21:47] <Suiryc> even in CBR
[21:48] <Suiryc> (e.g. frames will be of 417 or 418 bytes)
[21:48] <Belgabor|Home> ok, but 1152 is the upper limit?
[21:48] <Suiryc> because a fixed btrate must be achieved
[21:48] <Suiryc> 1152 is the number of samples a frame contains
[21:48] <Suiryc> each frame (whatever its size may be) contains 1152 samples
[21:49] <Belgabor|Home> oic
[21:49] <Suiryc> so let's continue ;)
[21:49] <Suiryc> each frame contains 1152 samples
[21:49] <Belgabor|Home> ok
[21:49] <Suiryc> and the rate of the stream (in AVISTREAMINFO) has been set to :
[21:49] <Suiryc> dwRate / dwScale = SampleRate/1152
[21:50] <Suiryc> since each Frame contains 1152 it is equal to the 'framerate'
[21:50] <Suiryc> (as for video)
[21:50] <Belgabor|Home> ok, i think i got that
[21:50] <Suiryc> now you must recall that each frame is in its own AVI chunk
[21:50] <Belgabor|Home> ok
[21:50] <Suiryc> so it is also the 'chunkrate'
[21:51] <Suiryc> so here is now what happens (it is most likely what happens) when playing the file in Window Media Player
[21:51] <Belgabor|Home> ic
[21:51] <Suiryc> WMP will get both headers
[21:52] <Suiryc> which will say to it that the rate of the stream is SampleRate/1152
[21:52] <Belgabor|Home> gimme a sec, brb
[21:52] <Suiryc> and that each audio frame is 1152 bytes long (nBlockAlign)
[21:52] <Suiryc> k
[21:53] <Belgabor|Home> back
[21:54] <Suiryc> ok so WMP believe each frame is 1152 bytes long
[21:54] <Belgabor|Home> yeah
[21:54] <Suiryc> which is not the case (generally frames are around 400 bytes long with 128kbps stream)
[21:55] <Suiryc> but
[21:55] <Belgabor|Home> yeah, got that much
[21:55] <Suiryc> now you are reading data in the file
[21:55] <Suiryc> and WMP needs to know when to read the audio
[21:55] <Suiryc> (i.e. to which time correspond an audio frame)
[21:56] <Suiryc> to do so it will look at all the previous audio chunks in the file
[21:56] <Suiryc> for each shunk it divide the size (in bytes) of the chunk by nBlockAlign to know how many frames there were in the chunk
[21:56] <Belgabor|Home> ok
[21:56] <Suiryc> s/shunk/chunk
[21:57] <Belgabor|Home> ok
[21:57] <Suiryc> (since every tools dealing with the stream must cut on nBlockAlign boundaries)
[21:57] <Suiryc> since each chunk is shorter than 1152 bytes (nBlockAling) it shoul get 0
[21:57] <Suiryc> but this is not possible
[21:58] <Suiryc> since tools work on blocks of nBlockAlign bytes, it must assume than there is at least 1 frame in the chunk
[21:58] <Suiryc> (even if the chunk is shorter)
[21:59] <Suiryc> so for each chunk it find there is 1 frame in it
[21:59] <Suiryc> which is really the case (each mp3 frame is in its own chunk)
[21:59] <Suiryc> so WMP got the correct number of mp3 frames played so far
[22:00] <Suiryc> and since it has the correct rate (each frame contains 1152 samples, and the rate of the stream is SampleRate/1152)
[22:00] <Suiryc> it also got the correct timecode for the frame
[22:00] <Belgabor|Home> ok
[22:00] <Suiryc> resulting in a perfectly synched MP3 stream
[22:01] <Suiryc> I was lead to this conclusion without debugging WMP while playing ;) but with some tests I made :
[22:02] <Suiryc> I changed the dwScale value (with or without the nBlockAlign value)
[22:02] <Suiryc> but this resulted in otu of synch issues (audio playing too fast/slow)
[22:02] <Suiryc> out*
[22:02] <Suiryc> I changed the nBlockAlign valuie :
[22:03] <Suiryc> setting it to 1 and then I have out of synch issues too
[22:03] <Suiryc> but setting it 2304 and I stil have a perfectly synched stream
[22:03] <Belgabor|Home> ok
[22:04] <Suiryc> so in fact the 1152 value in nBlockAlign could be anything else
[22:04] <Suiryc> _but_
[22:04] <Suiryc> must be higher than the size of an mp3 frame
[22:04] <Belgabor|Home> ok, what happens if you set it to 0?
[22:04] <Suiryc> lol
[22:05] <Suiryc> if you set it to 0 then WMP won't play the stream (the icon for audio is disabled like if there is no audio in the file)
[22:05] <Suiryc> so no VBR ;)
[22:05] <Belgabor|Home> ok
[22:06] <Belgabor|Home> so the failure is in priciple not in avi, but in the WAVEFORMATEX header
[22:06] <Suiryc> yep
[22:06] <Suiryc> but since the AVI will use WAVEFORMATEX for audio headers, it is still a failure in AVI specs
[22:07] <Belgabor|Home> do you have the resemblance of an idea why vbr mp3 fails?
[22:07] <Belgabor|Home> yep
[22:07] <Suiryc> <Belgabor|Home> do you have the resemblance of an idea why vbr mp3 fails? <-- you mean why it is not good ?
[22:08] <Belgabor|Home> yep, why it fails sometimes
[22:08] <ChristianHJW> thats what i am interested in also
[22:08] <Suiryc> well in the case of WMP, it will divide the chunk size by nBlockAlign
[22:08] <Suiryc> (that's what I think, since the synch is good)
[22:08] <Suiryc> and will set it to 1 if the chunk size is too small
[22:09] <Suiryc> but there is another way to compute timecode
[22:09] <Suiryc> (assuming that you have CBR of course)
[22:09] <Suiryc> you take the total bytes in previous chunks
[22:09] <Suiryc> and divide it by nblockAlign
[22:10] <Belgabor|Home> which fails miserably for the vbr hack
[22:10] <Suiryc> of course in this case you get a completly wrong value since mp3 frames are not 1152 bytes lnog
[22:10] <Suiryc> yep
[22:10] <Suiryc> otehr tools may also assume that the chunk is not valid (corrputed) since its size is shorter than nBlockAlign
[22:11] <Belgabor|Home> ok, thats the failure in principle, but why are some files broken?
[22:12] <Suiryc> what files ?
[22:12] <Suiryc> broken ? what do you mean by broken ?
[22:13] <Belgabor|Home> i had some vbr mp3 avis which seemed like having divx3 freeze frames but where ok when demuxed
[22:13] <Suiryc> dunno
[22:13] <Suiryc> maybe a problem with the decoder
[22:14] <Belgabor|Home> ok, well that cleared things up a bit
[22:14] <Belgabor|Home> thx :)
[22:14] <Suiryc> :)
[22:14] <Suiryc> btw there may be problems with Nandub code ;)
[22:14] <Suiryc> because :
[22:14] <Suiryc> 1. layer1 streams only have 384 samples per frame
[22:15] <Suiryc> 2. IIRC with very high bitrates an mp3 frame can be higher than 1152 bytes ;)
[22:15] <Suiryc> s/higher/bigger
[22:16] <Suiryc> (the max size is near 2000 bytes IIRC)
[22:16] <Belgabor|Home> ok, so nBlockAlign should be >2000
[22:17] <Suiryc> so depending on the way dividing is used (rounding to floor or ceil or nearest value)
[22:17] <Suiryc> and the max size of a frame, it may find there are 2 frames in a chunk where there is only 1 frame
[22:17] <Belgabor|Home> ok, i got that
[22:18] <Suiryc> but this is for really high bitrates ...
[22:18] <Suiryc> lemme check ...
[22:19] <Belgabor|Home> what would happen if we put two frames in one chunk? aka set dwRate = 2* sample rate and so on?
[22:20] <Belgabor|Home> no, not two, just double the values?
[22:21] <Suiryc> if you double the value the rate of the audio will be changed accordingly
[22:21] <Suiryc> so to keep it correct you would have to put 2 mp3 frames in each chunk
[22:22] <Suiryc> but then you would most likely go beyond the 1152 bytes per chunk
[22:22] <Suiryc> and increase the chances to generate out of synch problems
[22:23] <Belgabor|Home> let me rethink
[22:24] <Suiryc> changing dwRate and dwScale only affects the rate of the stream
[22:25] <Suiryc> multiplying dwRate by 2 => audio play 2 times faster
[22:25] <Suiryc> multiplying dwScale by 2 => audio play 2 times slower
[22:25] <Belgabor|Home> if we double dwrate, dwscale, nblockalign and dwsamplesize?
[22:25] <Suiryc> multipyling both => no change
[22:25] <Suiryc> dwSampleSize is set to 0
[22:26] <Belgabor|Home> ah ok, so skip that
[22:26] <Suiryc> (dwRate, dwScale) and nBlockAlign are not linked
[22:26] <Suiryc> you can use a higher value in nBlockAlign
[22:27] <Suiryc> (like the 2304 I tested)
[22:27] <Belgabor|Home> nvertheless, if we double all three, shouldnt it be safe for larger mp3 frames?
[22:27] <Suiryc> this won't change anything in the case of WMP because something lower than 1152 divided by 1152 or 2304 will still be rounded to 0
[22:27] <Suiryc> Belgabor|Home : this would be safer
[22:28] <Suiryc> but would cause even more troubles in apps that don't work the same way than Nandub & WMP
[22:28] <Suiryc> I think some apps sometimes check a value of 1152 to know it was made by Nandub
[22:28] <Belgabor|Home> ok, i see the point
[22:29] <Belgabor|Home> faulty concept stays faulty
[22:33] <Suiryc> k I checked
[22:33] <Suiryc> keeping 1152 shoudln't cause too much problems
[22:33] <Suiryc> for Mpeg1-Layer2/3 the mas is near 1750 bytes long
[22:34] <Suiryc> there could be problems with Mpeg2/2.5-layer2/3
[22:34] <Suiryc> where a 160kbps stream of 8kHz have frames of 2881 bytes long at most
[22:35] <Suiryc> anyway I don't think people use this kind of stream ;)
[22:36] <ChristianHJW> highly unlikely ..
[22:51] <Suiryc> nite
[22:52] * Suiryc has left #matroska

Link to complete channel log is here : http://www.wiesneronline.net/irclogs/%23matroska.freenode.20021229.log
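
For anyone who prefers code to chat, the two ways of counting that Suiryc describes can be sketched as below. This is only an illustration of the reasoning in the log, not code from WMP, Nandub or any other tool: the function names and the chunk sizes are invented, and the "at least one block per chunk" rule is Suiryc's guess at what the player does.

```python
DW_RATE = 44100   # dwRate: set to the sample rate by Nandub (per the log)
DW_SCALE = 1152   # dwScale: samples per MP3 frame (per the log)


def time_per_chunk_model(chunk_sizes, block_align=1152):
    """Count blocks chunk by chunk, assuming at least one block per chunk."""
    blocks = sum(max(1, size // block_align) for size in chunk_sizes)
    return blocks * DW_SCALE / DW_RATE   # seconds


def time_total_bytes_model(chunk_sizes, block_align=1152):
    """Count blocks from the running byte total, as a CBR-minded tool might."""
    blocks = sum(chunk_sizes) // block_align
    return blocks * DW_SCALE / DW_RATE   # seconds


# Hypothetical VBR stream: one MP3 frame per chunk, sizes in bytes.
chunks = [417, 418, 313, 522, 626, 417] * 100
true_seconds = len(chunks) * 1152 / 44100

print(f"true duration         : {true_seconds:.2f} s")
print(f"per chunk, align 1152 : {time_per_chunk_model(chunks):.2f} s")
print(f"per chunk, align 2304 : {time_per_chunk_model(chunks, 2304):.2f} s")
print(f"per chunk, align 1    : {time_per_chunk_model(chunks, 1):.2f} s")
print(f"total bytes, align 1152: {time_total_bytes_model(chunks):.2f} s")
```

The per-chunk count stays right for any nBlockAlign larger than the biggest frame, which fits the 2304 test in the log, and goes wrong with nBlockAlign set to 1. The total-bytes count is wrong from the start, which is the failure Belgabor points out for the vbr hack.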

Suiryc
31st December 2002, 00:05
At least if my reasoning is wrong (or if something else happens while reading the file) maybe someone will tell us here :)
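
One part that is easy to check is the frame sizes quoted in the log. A small sketch, assuming the usual formula for 1152-sample MPEG audio frames (144 * bitrate / sample rate, plus one byte when the padding bit is set); the helper name is made up, and the floor/ceil comparison only illustrates the "depends on how the division is rounded" remark.

```python
N_BLOCK_ALIGN = 1152  # value Nandub writes into nBlockAlign (per the log)


def frame_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Bytes in one 1152-sample MPEG audio frame; padding is 0 or 1."""
    return 144 * bitrate_bps // sample_rate_hz + padding


# Largest frame (padding bit set) for a few stream types.
cases = {
    "Layer III, 128 kbps, 44.1 kHz": frame_bytes(128_000, 44_100, padding=1),
    "Layer III, 320 kbps, 32 kHz": frame_bytes(320_000, 32_000, padding=1),
    "Layer II, 384 kbps, 32 kHz": frame_bytes(384_000, 32_000, padding=1),
    "160 kbps, 8 kHz (from the log)": frame_bytes(160_000, 8_000, padding=1),
}

for label, size in cases.items():
    floor_blocks = size // N_BLOCK_ALIGN
    ceil_blocks = (size + N_BLOCK_ALIGN - 1) // N_BLOCK_ALIGN
    print(f"{label:32s} {size:5d} bytes  floor={floor_blocks}  ceil={ceil_blocks}")
```

A frame that fits in 1152 bytes counts as one block whichever way the division is rounded. Frames between 1153 and 2303 bytes only go wrong for a reader that rounds up, and the 8 kHz case goes wrong either way.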

hakko504
31st December 2002, 00:06
A big thumbs up to that description!:D And it was quite fun to read too! :cool:

Belgabor
31st December 2002, 00:23
i think this might be a good thread to have sticky, wouldn't it?

sillKotscha
31st December 2002, 01:15
interesting and funny to read :)

Originally posted by ChristianHJW
Link to complete channel log is here : http://www.wiesner...

@ Chris: wtf... pdng4a14.zip ??!!!!!!! I think you should remove this build ;)

just my 0,02 €

Sill

ChristianHJW
31st December 2002, 05:28
Originally posted by sillKotscha
@ Chris: wtf... pdng4a14.zip ??!!!!!!! I think you should remove this build ;)
Huh, thanks for telling me, the link should have been broken but actually it wasn't !! ( it's not functional now ... )