Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

Go Back   Doom9's Forum > General > Audio encoding

Old 18th February 2009, 09:54   #1  |  Link
Ryu77
Registered User
 
 
Join Date: Mar 2008
Location: Australia
Posts: 246
How is Dolby Digital "true" 16 bit?

I was just curious how this claim can hold true for a lossy encoder?

I understand how this works with LPCM, as it is straightforward. Bits per sample, samples per second and the number of channels give a simple bitrate calculation...

16 (bits) x 48,000 (samples per second) x 6 (channels) = 4,608 kbps.

How is it that an AC3 (Dolby Digital) track can claim the same bit depth and sample rate, yet only be a maximum of 640 kbps? I understand that this is a lossy encoder and therefore discards information to compress the data. However, if it isn't discarding bit depth or sample rate, what exactly is being discarded? Because 640 kbps simply isn't 16 bits per sample @ 48,000 samples per second x 6 channels!

I am wondering the same thing about most of the lossy formats (DTS, MP3, AAC etc.) that state 16 bit, 24 bit, 48 kHz, 96 kHz etc.
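The LPCM arithmetic above can be checked with a couple of lines of Python (a trivial sketch, nothing codec-specific):

```python
def lpcm_bitrate_kbps(bit_depth, sample_rate_hz, channels):
    """Uncompressed LPCM bitrate: bits/sample * samples/s * channels."""
    return bit_depth * sample_rate_hz * channels / 1000

print(lpcm_bitrate_kbps(16, 48_000, 6))   # 4608.0 kbps, as in the post
print(lpcm_bitrate_kbps(16, 48_000, 2))   # 1536.0 kbps (16/48 stereo)
```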
Old 18th February 2009, 11:00   #2  |  Link
kypec
User of free A/V tools
 
 
Join Date: Jul 2006
Location: SK
Posts: 826
Lossy means that the sample values you get on decoding are reconstructed from the internally stored (i.e. encoded) data. The information in those internal samples is squeezed down compared to the original values.
Old 18th February 2009, 12:28   #3  |  Link
tebasuna51
Moderator
 
 
Join Date: Feb 2005
Location: Spain
Posts: 6,890
Quote:
Originally Posted by Ryu77
How is it that an AC3 (Dolby Digital) track can claim the same bit depth and sample rate, yet only be a maximum of 640 kbps? I understand that this is a lossy encoder and therefore discards information to compress the data. However, if it isn't discarding bit depth or sample rate, what exactly is being discarded? Because 640 kbps simply isn't 16 bits per sample @ 48,000 samples per second x 6 channels!
A lossy codec can never have a 'true' bit depth. If we could recover the source bit depth it wouldn't be lossy, it would be lossless.

- Don't confuse the sample rate with the bandwidth actually coded. Encoders apply bandwidth-limiting filters tied to the bitrate.

- A sample in the time domain has a bit depth, but the samples are translated to the frequency domain when encoded, and the source bit depth is lost.
The samples in the frequency domain are stored with a precision equivalent to 20-24 bits in the time domain (AC3, DTS).
The best option when decoding an AC3/DTS is to use a bit depth of at least 24 bits, no matter what the source precision was.
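AC-3 stores its frequency coefficients as exponents plus short mantissas, a form of block floating point. Here is a toy sketch of that general idea in Python (an illustration of block floating-point quantization only, not the real AC-3 bit-allocation algorithm):

```python
import math

def block_float_quantize(coeffs, mantissa_bits):
    """Toy block floating point: one shared exponent per block,
    each coefficient kept with only `mantissa_bits` of precision."""
    peak = max(abs(c) for c in coeffs) or 1.0
    exponent = math.ceil(math.log2(peak))   # shared scale for the block
    scale = 2.0 ** exponent
    steps = 2 ** (mantissa_bits - 1)        # signed mantissa levels
    quantized = [round(c / scale * steps) for c in coeffs]
    # What the decoder can reconstruct from the coarse mantissas:
    return [q / steps * scale for q in quantized]

coeffs = [0.8, -0.05, 0.003, 0.0001]
print(block_float_quantize(coeffs, 4))    # coarse: small coefficients are lost
print(block_float_quantize(coeffs, 12))   # finer: small values mostly survive
```

The shared exponent is what lets a few mantissa bits cover a very wide dynamic range, which is why the decoded precision is best expressed in 24-bit PCM.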
__________________
BeHappy, AviSynth audio transcoder.
Old 18th February 2009, 12:42   #4  |  Link
2Bdecided
Registered User
 
Join Date: Dec 2002
Location: UK
Posts: 1,673
Though you won't get the "original" bits back, AC-3 can store a far greater dynamic range than can be represented in 16-bits.

It can happily store a sound peaking at 0dB FS one moment, and then one of -120dB FS the next. The latter sound would be lost in the dither noise (or rounded / truncated out of existence without dither) with 16-bits.


Just because you don't get the original 24-bits back doesn't mean it can't make some use of 24-bits.

Cheers,
David.
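The dynamic-range arithmetic behind this can be sketched with the standard SNR rule of thumb for an ideal N-bit quantizer (6.02·N + 1.76 dB; a textbook formula, not something from the post):

```python
def pcm_dynamic_range_db(bits):
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine."""
    return 6.02 * bits + 1.76

print(pcm_dynamic_range_db(16))   # ~98 dB: a -120 dBFS tone is below the floor
print(pcm_dynamic_range_db(24))   # ~146 dB: the same tone is representable

# Quantizing a -120 dBFS sample to 16 bits literally rounds it to zero:
amplitude = 10 ** (-120 / 20)         # linear amplitude of -120 dBFS
print(round(amplitude * 2**15))       # 0 -> the sound is gone
print(round(amplitude * 2**23))       # nonzero at 24 bits
```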
Old 18th February 2009, 14:40   #5  |  Link
Ryu77
Registered User
 
 
Join Date: Mar 2008
Location: Australia
Posts: 246
Thanks guys!

This all sounds very interesting. I certainly would like to learn more on a deeper scale which is why I posted this question here as I was hoping to open up a conversation on a technical level.

I am certainly inspired to learn more about this.
Old 18th February 2009, 14:45   #6  |  Link
Dark Shikari
x264 developer
 
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by Ryu77
How is it that an AC3 (Dolby Digital) track can claim the same bit depth and sample rate, yet only be a maximum of 640 kbps? I understand that this is a lossy encoder and therefore discards information to compress the data. However, if it isn't discarding bit depth or sample rate, what exactly is being discarded? Because 640 kbps simply isn't 16 bits per sample @ 48,000 samples per second x 6 channels!
Welcome to the world of transform-based formats, where you can gain compression without explicitly discarding bit depth or sample rate.

Works the same way in video.
Old 25th February 2009, 06:30   #7  |  Link
Koorogi
Registered User
 
 
Join Date: Feb 2009
Posts: 8
Lossy audio (and video) codecs typically work in the frequency domain, because it's easier to take advantage of several features/limits of human perception.

Audio codecs typically use a modified discrete cosine transform. This tells them the amount of each frequency that's present in a series of very short, overlapping time slices. Within each timeslice, they can make decisions about which frequency detail is important to keep. Human hearing has some masking effects - we are not as sensitive to sounds (especially of similar frequency) which are temporally nearby or concurrent to louder sounds, for instance. Audio codecs take advantage of this to throw out information that they don't think will be noticed.

This process can be done with any bit depth or time resolution. Increasing the bit depth increases the precision possible (it's possible to record finer changes). It is of course possible that this extra precision will be thrown out by the encoder due to bitrate pressure to encode more important data. I imagine bit depths higher than 16 bits are more useful for studio work, where you shouldn't be using lossy compression anyway. A higher sampling rate, however, is more important. Increasing the sampling rate does two things. First, it increases the expressible frequency range: the Nyquist limit states that the sampling rate must be at least twice the bandwidth of the signal, so if your signal has frequencies from 20 Hz to 20 kHz, you need at least roughly a 40 kHz sampling rate to reproduce it.
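The Nyquist point can be demonstrated directly: a tone above half the sampling rate produces exactly the same samples as a lower-frequency alias (a standard illustration in Python, not from the post above):

```python
import math

def sample_tone(freq_hz, rate_hz, n=8):
    """Sample a sine of `freq_hz` at `rate_hz` samples per second."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

# A 30 kHz tone sampled at 48 kHz yields the same samples as an 18 kHz
# tone with inverted phase: 30 kHz has aliased down to 48 - 30 = 18 kHz.
a = sample_tone(30_000, 48_000)
b = [-x for x in sample_tone(18_000, 48_000)]
print(all(abs(x - y) < 1e-9 for x, y in zip(a, b)))  # True
```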

The other thing it does is give the encoder more accurate frequency information. The discrete cosine transform and other Fourier-based transforms have the problem that the frequency information you get is uniformly distributed throughout the frequency spectrum, but that doesn't match how we see and hear. Since every octave change sounds like the same "distance" to us but is actually a doubling in frequency, we are more sensitive to changes between low frequencies, so frequency resolution is more important there. But in order to get better frequency resolution they would need to perform the transform on a larger block of data at a time, which leads to other problems (latency, higher memory and processing requirements on the decoder, and it becomes harder to take advantage of some properties of hearing due to a corresponding decrease in temporal resolution). These are actually the problems that motivated the development and use of wavelets in codecs like JPEG2000 and Snow.
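The uniform-resolution point can be illustrated with a naive DCT-II in Python (a toy transform for illustration only; real codecs use a windowed MDCT with fast algorithms):

```python
import math

def dct2(block):
    """Naive DCT-II: projects the block onto cosine basis functions."""
    n = len(block)
    return [sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(block)) for k in range(n)]

rate = 48_000

def bin_width_hz(block_len):
    # The block_len bins split 0..rate/2 into equal slices: uniform
    # resolution, even though hearing is logarithmic in frequency.
    return (rate / 2) / block_len

print(bin_width_hz(256), bin_width_hz(2048))  # longer block -> finer bins

# A 1500 Hz tone lands exactly in bin 4 of a 64-sample block (375 Hz bins):
tone = [math.cos(2 * math.pi * 1500 * (i + 0.5) / rate) for i in range(64)]
spectrum = dct2(tone)
peak = max(range(64), key=lambda k: abs(spectrum[k]))
print(peak, peak * bin_width_hz(64))  # 4 1500.0
```

The trade-off Koorogi describes is visible here: quadrupling the block length makes the bins four times narrower, at the cost of four times the latency and coarser time resolution.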
Old 25th February 2009, 09:01   #8  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Welcome to the board, Koorogi. Always good to have knowledgeable people here.
Old 25th February 2009, 12:31   #9  |  Link
leeperry
Kid for Today
 
Join Date: Aug 2004
Posts: 3,477
Last time I heard, AC3/DTS were able to output up to 18 bits if properly encoded (except for DTS 96/24, of course):

http://209.85.229.132/search?q=cache...n&ct=clnk&cd=3

Quote:
Both Dolby Digital and DTS are capable of 24-bit resolution, but currently nominally operate at 18-bit resolution, allowing a dynamic range of approximately 108dB. Theoretically, 24-bit resolution allows dynamic range of 144dB which, though higher, would be indistinguishable from the lower 108dB figure given the current limitations of playback hardware. For all practical purposes, both Dolby Digital and DTS Digital Surround operate at near, or above, 18-bit resolution and dynamic range (108dB). Dolby Digital at 384kbps has an audio frequency response of 20Hz-18kHz with joint frequency coding above 10kHz, while 448kbps has a frequency response of approximately 20Hz to 20kHz with joint frequency coding above 15kHz. DTS at 754kbps has a maximum frequency response of 20Hz-19kHz although DTS's standard hardware encoder, the CAE-4, begins to roll off frequencies at 15kHz. 1509kbps DTS has a maximum frequency response of 20Hz-24kHz. Neither 754kbps nor 1509kbps DTS use joint-frequency coding.
anyway, it's like HDCD I guess.....you don't really get 20 bit, but you get >16 anyway....and it's clearly audible
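The figures in that quote line up with the usual "about 6 dB per bit" rule of thumb:

```python
def bits_to_dynamic_range_db(bits):
    """Rule of thumb: each bit of resolution adds ~6.02 dB of range."""
    return 6.02 * bits

print(bits_to_dynamic_range_db(18))   # ~108 dB, the figure quoted above
print(bits_to_dynamic_range_db(24))   # ~144 dB, likewise
print(bits_to_dynamic_range_db(16))   # ~96 dB, plain 16-bit PCM
```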

Last edited by leeperry; 25th February 2009 at 23:26.
Old 26th February 2009, 04:25   #10  |  Link
jruggle
Registered User
 
Join Date: Jul 2006
Posts: 276
Quote:
Originally Posted by leeperry
Last time I heard, AC3/DTS were able to output up to 18 bits if properly encoded (except for DTS 96/24, of course):

http://209.85.229.132/search?q=cache...n&ct=clnk&cd=3

anyway, it's like HDCD I guess.....you don't really get 20 bit, but you get >16 anyway....and it's clearly audible
A lot of that information is not true of AC3. Since the article refers to it as "Dolby Digital", it might only be talking about the official Dolby encoder and/or decoder, but it does not apply to AC3 in general. The "nominal operating" bit depth depends on the encoder and/or decoder. The frequency response can go up to about 23.7 kHz. Joint-frequency coding is optional, and the frequency range it's used for is encoder-adjustable.
Old 26th February 2009, 14:20   #11  |  Link
Ryu77
Registered User
 
 
Join Date: Mar 2008
Location: Australia
Posts: 246
Quote:
Originally Posted by Koorogi
Lossy audio (and video) codecs typically work in the frequency domain, because it's easier to take advantage of several features/limits of human perception. [...]
Now that's the answer I was looking for... Thank you. I love technical knowledge. I can't get enough. I will study and memorize what you just said. I'll be ready for an exam on it very soon Mr Teacher.