How is Dolby Digital "true" 16 bit?
Ryu77
18th February 2009, 09:54
I was just curious how this claim stands true with a lossy encoder?
I understand how this works with LPCM, as it is straightforward. The bitrate follows logically from the bits per sample, the samples per second and the number of channels...
16 (bits) x 48,000 (samples per second) x 6 (channels) = 4,608,000 bits per second = 4,608 kbps.
How is it that an AC3 (Dolby Digital) track can claim the same bit depth and sample rate, and only be a maximum of 640 kbps? I understand that this is a lossy encoder and therefore discards information to compress the data. However, if it isn't discarding bit depth or sample rate, what exactly is being discarded? Because 640 kbps simply isn't 16 bits per sample @ 48,000 samples per second by 6 channels!
I am wondering the same thing about most of the lossy formats (DTS, MP3, AAC etc.) that state 16 bit, 24 bit, 48 kHz, 96 kHz etc.
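The arithmetic in this post can be checked with a few lines of Python. This is only a sketch: `lpcm_bitrate` is a hypothetical helper name, and the 640 kbps figure is the AC3 maximum quoted in the post.

```python
def lpcm_bitrate(bit_depth, sample_rate_hz, channels):
    """Raw bitrate of uncompressed LPCM, in bits per second."""
    return bit_depth * sample_rate_hz * channels

lpcm_bps = lpcm_bitrate(16, 48_000, 6)   # 4,608,000 bit/s = 4,608 kbps
ac3_bps = 640_000                        # maximum AC3 bitrate quoted in the post

compression_ratio = lpcm_bps / ac3_bps           # 7.2
avg_bits_per_sample = ac3_bps / (48_000 * 6)     # roughly 2.2 bits per sample

print(f"LPCM: {lpcm_bps / 1000:.0f} kbps")
print(f"AC3 is {compression_ratio:.1f}x smaller")
print(f"Average bits available per sample: {avg_bits_per_sample:.2f}")
```

So at 640 kbps the encoder has, on average, a little over 2 bits to spend per input sample, which is why the stream cannot simply be storing 16-bit samples.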
kypec
18th February 2009, 11:00
Lossy means that the sample values are reconstructed on decoding from the internally stored (i.e. encoded) binary data. The information in that internal representation is reduced compared to the original sample values.
tebasuna51
18th February 2009, 12:28
A lossy codec can never have a 'true' bit depth. If we could recover the source bit depth, it wouldn't be lossy, it would be lossless.
- Don't confuse the sample rate with the bandwidth that is actually encoded. Encoders apply bandwidth-limiting filters that depend on the bitrate.
- A sample in the time domain has a bit depth, but the samples are translated to the frequency domain when encoded, and the source bit depth is lost.
The samples in the frequency domain are stored with a precision equivalent to 20-24 bits in the time domain (AC3, DTS).
The best option when decoding AC3/DTS is to use a bit depth of at least 24 bits, no matter what the source precision was.
2Bdecided
18th February 2009, 12:42
Though you won't get the "original" bits back, AC-3 can store a far greater dynamic range than can be represented in 16-bits.
It can happily store a sound peaking at 0dB FS one moment, and then one of -120dB FS the next. The latter sound would be lost in the dither noise (or rounded / truncated out of existence without dither) with 16-bits.
Just because you don't get the original 24-bits back doesn't mean it can't make some use of 24-bits.
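A minimal sketch of the -120 dB FS point, assuming plain rounding to signed integers with no dither (`quantize` is a hypothetical helper, not part of any codec):

```python
def quantize(value, bits):
    """Round a sample in [-1.0, 1.0) to a signed integer of the given bit depth (no dither)."""
    full_scale = 2 ** (bits - 1)
    return round(value * full_scale)

peak = 10 ** (-120 / 20)   # amplitude of a -120 dB FS peak, i.e. 1e-6 of full scale

q16 = quantize(peak, 16)   # 1e-6 * 32768 is about 0.03, which rounds to 0
q24 = quantize(peak, 24)   # 1e-6 * 8388608 is about 8.4, which rounds to 8

print("16-bit value:", q16)
print("24-bit value:", q24)
```

In 16 bits the sound is rounded out of existence, while a 24-bit output still has a few steps to represent it.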
Cheers,
David.
Ryu77
18th February 2009, 14:40
Thanks guys! :)
This all sounds very interesting. I certainly would like to learn more on a deeper scale which is why I posted this question here as I was hoping to open up a conversation on a technical level.
I am certainly inspired to learn more about this.
Dark Shikari
18th February 2009, 14:45
Welcome to the world of transform-based formats, where you can gain compression without explicitly discarding bit depth or sample rate ;)
Works the same way in video.
Koorogi
25th February 2009, 06:30
Lossy audio (and video) codecs typically work in the frequency domain, because it's easier to take advantage of several features/limits of human perception.
Audio codecs typically use a modified discrete cosine transform. This tells them the amount of each frequency that's present in a series of very short, overlapping time slices. Within each time slice, they can make decisions about which frequency detail is important to keep. Human hearing has some masking effects - we are not as sensitive to sounds (especially of similar frequency) which are temporally nearby or concurrent to louder sounds, for instance. Audio codecs take advantage of this to throw out information that they don't think will be noticed.
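Here is a toy sketch of that idea in Python. It is not how AC3 or any real codec works: it assumes a plain DCT on a single block instead of an overlapped MDCT, and "keep the few largest coefficients" is a crude stand-in for a real psychoacoustic model. The names `dct`, `idct`, `tone`, `N` and `KEEP` are made up for the illustration.

```python
import math

def dct(block):
    """Orthonormal DCT-II of a list of samples."""
    n = len(block)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(block))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * c * math.cos(math.pi / n * (i + 0.5) * k)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]

N = 64     # samples per block (toy value)
KEEP = 4   # number of coefficients we can "afford" to store

def tone(amplitude, k):
    """A tone aligned with DCT basis function k, to keep the example tidy."""
    return [amplitude * math.cos(math.pi / N * (i + 0.5) * k) for i in range(N)]

# Input block: a loud tone plus a quiet one, as 16-bit integer samples.
pcm_in = [round(a + b) for a, b in zip(tone(20000, 5), tone(30, 23))]

# "Encode": transform, then store only the largest coefficients, rounded.
coeffs = dct(pcm_in)
largest = sorted(range(N), key=lambda k: abs(coeffs[k]), reverse=True)[:KEEP]
stored = {k: round(coeffs[k]) for k in largest}

# "Decode": inverse transform and round back to 16-bit integers.
decoded = idct([stored.get(k, 0) for k in range(N)])
pcm_out = [max(-32768, min(32767, round(x))) for x in decoded]

max_error = max(abs(a - b) for a, b in zip(pcm_in, pcm_out))
print(len(stored), "stored values ->", len(pcm_out), "output samples, max error", max_error)
```

The decoder still outputs 64 samples as 16-bit integers, so neither the sample rate nor the output bit depth was reduced. What was discarded is the frequency detail judged unimportant, and the output is close to the input but not identical.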
This process can be done with any bit depth or time resolution. Increasing the bit depth increases the precision possible (it's possible to record finer changes). It is of course possible that this extra precision will be thrown out by the encoder due to bitrate pressure to encode more important data. I imagine bit depths higher than 16 bit are more useful for studio work, where you shouldn't be using lossy compression anyway. A higher sampling rate, however, is more important. Increasing the sampling rate does two things. First, it increases the frequency range that can be expressed. The Nyquist limit states that the sampling rate must be at least twice the bandwidth of the signal, so if your signal has frequencies from 20 Hz to 20 kHz, you need at least roughly a 40 kHz sampling rate to reproduce it.
The other thing it does is give the encoder more accurate frequency information. The discrete cosine transform and other Fourier-based transforms have the problem that the frequency information you get is uniformly distributed throughout the frequency spectrum, but that doesn't match how we see and hear. Since every octave change sounds like the same "distance" to us, but is actually a doubling in frequency, we are more sensitive to changes between low frequencies, so frequency resolution is more important here. But in order to get better frequency resolution they would need to perform the transform on a larger block of data at a time, which leads to other problems (latency, higher memory and processing requirements on the decoder, harder to take advantage of some properties of hearing due to a corresponding decrease in temporal resolution). These are actually the problems that motivated the development and use of wavelets in codecs like JPEG2000 and Snow.
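To put rough numbers on the uniform-spacing problem, here is a small sketch. It assumes a transform that produces `num_coeffs` evenly spaced coefficients between 0 Hz and half the sample rate; the block sizes 256 and 1024 are example values only, and the helper names are hypothetical.

```python
def bin_spacing_hz(sample_rate_hz, num_coeffs):
    """Spacing between adjacent frequency coefficients, in Hz."""
    return (sample_rate_hz / 2) / num_coeffs

def bins_in_octave(low_hz, spacing_hz):
    """Number of coefficients between low_hz and 2 * low_hz."""
    return low_hz / spacing_hz

for num_coeffs in (256, 1024):
    spacing = bin_spacing_hz(48_000, num_coeffs)
    block_ms = 1000 * num_coeffs / 48_000   # new samples consumed per block
    print(
        f"{num_coeffs} coefficients: {spacing:.2f} Hz apart, "
        f"{bins_in_octave(100, spacing):.1f} bins in 100-200 Hz, "
        f"{bins_in_octave(10_000, spacing):.1f} bins in 10-20 kHz, "
        f"{block_ms:.2f} ms per block"
    )
```

With 256 coefficients the octave from 100 Hz to 200 Hz gets about one coefficient while the octave from 10 kHz to 20 kHz gets over a hundred. Quadrupling the block size helps the low octave, but each block then covers four times as much time.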
madshi
25th February 2009, 09:01
Welcome to the board, Koorogi. Always good to have knowledgeable people here.
leeperry
25th February 2009, 12:31
Last time I heard, AC3/DTS were able to output up to 18 bit if properly encoded (except for DTS 96/24, of course):
http://209.85.229.132/search?q=cache:mXn_EIXtuaIJ:www.spannerworks.net/reference/10_1a.asp+AC3+DTS+18+bit&hl=en&ct=clnk&cd=3
Both Dolby Digital and DTS are capable of 24-bit resolution, but currently nominally operate at 18-bit resolution, allowing a dynamic range of approximately 108dB. Theoretically, 24-bit resolution allows dynamic range of 144dB which, though higher, would be indistinguishable from the lower 108dB figure given the current limitations of playback hardware. For all practical purposes, both Dolby Digital and DTS Digital Surround operate at near, or above, 18-bit resolution and dynamic range (108dB). Dolby Digital at 384kbps has an audio frequency response of 20Hz-18kHz with joint frequency coding above 10kHz, while 448kbps has a frequency response of approximately 20Hz to 20kHz with joint frequency coding above 15kHz. DTS at 754kbps has a maximum frequency response of 20Hz-19kHz although DTS's standard hardware encoder, the CAE-4, begins to roll off frequencies at 15kHz. 1509kbps DTS has a maximum frequency response of 20Hz-24kHz. Neither 754kbps nor 1509kbps DTS use joint-frequency coding.
anyway, it's like HDCD I guess.....you don't really get 20 bit, but you get >16 anyway....and it's clearly audible :)
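The 108 dB and 144 dB figures in the quoted article match the usual rule of thumb of about 6.02 dB per bit. A quick sketch, assuming dynamic range is taken as the ratio between full scale and one quantization step (`dynamic_range_db` is a hypothetical helper):

```python
import math

def dynamic_range_db(bits):
    """Ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 18, 24):
    print(f"{bits} bit: {dynamic_range_db(bits):.1f} dB")
```

This gives roughly 96 dB for 16 bit, 108 dB for 18 bit and 144 dB for 24 bit.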
jruggle
26th February 2009, 04:25
A lot of that information is not true about AC3. Since the article refers to it as "Dolby Digital", it might only be talking about the official Dolby encoder and/or decoder, but it does not apply to AC3 in general. The "nominal operating" bit depth depends on the encoder and/or decoder. The frequency response can go up to about 23.7 kHz. Joint-frequency coding is optional and the frequency range it's used for is encoder-adjustable.
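One way to arrive at the 23.7 kHz figure is sketched below. It rests on two assumptions that are not stated in this thread: that AC3 at 48 kHz uses 256 transform coefficients per block, and that a channel can carry at most 253 of them.

```python
SAMPLE_RATE_HZ = 48_000
COEFFS_PER_BLOCK = 256   # assumption: transform coefficients per AC3 block
MAX_CODED_COEFFS = 253   # assumption: most coefficients a channel can carry

bin_spacing_hz = (SAMPLE_RATE_HZ / 2) / COEFFS_PER_BLOCK   # 93.75 Hz
max_bandwidth_hz = MAX_CODED_COEFFS * bin_spacing_hz       # 23718.75 Hz

print(f"Upper frequency limit: {max_bandwidth_hz / 1000:.2f} kHz")
```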
Ryu77
26th February 2009, 14:20
Now that's the answer I was looking for... Thank you. I love technical knowledge. I can't get enough. I will study and memorize what you just said. I'll be ready for an exam on it very soon Mr Teacher. :D