HELP: need the best downmix to mono method!
PatchWorKs
27th January 2005, 11:21
I need (for my thesis) to convert a CD-quality file to a mono, 32 kHz file compressed with Vorbis (aoTuV beta 3) at about 64 kbps. I need the best possible quality, so here is what I currently do:
1. CD rip (EAC/aspi)
2. Remove phase shifts between left and right channel (Advanced Audio Corrector)
3. DC Offset adjust (Goldwave)
4. Resample -> 32000 Hz (wavefs44)
5. Downmix to mono (Goldwave)
6. Volume normalization (Foobar/replaygain)
7. Encode to Vorbis q3 (Foobar/oggenc)
Any suggestions?
Do you know a better way to get higher quality?
Do you know of any open/free software to replace the Advanced Audio Corrector and Goldwave functions?
ANY REPLY IS ***REALLY*** APPRECIATED!!!
Thanks a lot!!!
soundz
27th January 2005, 18:23
Hi!
I would suggest the following chain:
1- convert the audio to 32 bits per sample, 96 kHz sampling rate;
2- remove DC;
3- convert to mono;
4- normalize;
5- downsample to 32 kHz
6- encode.
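A minimal sketch of steps 1 to 4 of this chain, in pure Python on plain lists. The function names are hypothetical and do not come from any tool mentioned in the thread; resampling is left out because it needs a proper band-limiting filter, and the 0.9 normalization target is only an assumed default.

```python
# Illustrative only: float conversion, DC removal, mono downmix, peak normalize.

def to_float(samples_16bit):
    """Scale 16-bit integer samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples_16bit]

def remove_dc(channel):
    """Subtract the mean so the waveform is centred on zero."""
    mean = sum(channel) / len(channel)
    return [x - mean for x in channel]

def downmix(left, right):
    """Average the two channels (each one contributes at half gain)."""
    return [0.5 * (l + r) for l, r in zip(left, right)]

def normalize_peak(mono, target=0.9):
    """Scale so the largest absolute sample equals `target` (assumed default)."""
    peak = max(abs(x) for x in mono)
    if peak == 0.0:
        return list(mono)
    gain = target / peak
    return [x * gain for x in mono]

# Tiny made-up stereo signal with a DC offset on both channels.
left = remove_dc(to_float([1000, 3000, 1000, 3000]))
right = remove_dc(to_float([2000, 6000, 2000, 6000]))
mono = normalize_peak(downmix(left, right))
print(mono)
```

Everything stays in floating point until the very end, so rounding happens only once, at the final conversion or inside the encoder.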
I'm not familiar with the tools you are using, but there are encoders (at least for other formats) that accept input files at more than 16 bits, so you can leave the bit-depth reduction to the encoder, because it will (or should) know what to do with the extra bits.
I'm a bit suspicious of Advanced Audio Corrector because it only accepts 44.1/16 files, and the math needed for azimuth correction really works better at a higher SR and bit depth. It not only accepts only 44.1/16 input but also outputs only 44.1/16, after an operation that should use higher SRs and bit depths, and the author does not document how it works; that makes me a little uneasy.
Besides, there are two points to be taken in consideration on this particular subject:
1- if the source is a commercial audio CD, then it was mastered by a professional, who has (or should have) already taken the proper measures to ensure correct phasing between channels;
2- in a stereo recording, only what is 100% in the center of the stereo field is "supposed" to be also 100% in-phase; the rest can (and in most cases will) be less than 100% phase-coherent, and depending on the algo used by Advanced Audio Corrector it may be making the wrong adjustments. In the best case, if the source did not need the "correction", you are passing the data through an unnecessary processing step and thus reducing the fidelity.
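To illustrate why phase coherence matters for a mono downmix, here is a small sketch (pure Python, hypothetical helper names, synthetic test tone) that measures the correlation between two channels and the level of their mono sum. It is not how Advanced Audio Corrector works, which the thread does not document.

```python
import math

def mono_sum(left, right):
    """Average the two channels into one."""
    return [0.5 * (l + r) for l, r in zip(left, right)]

def rms(x):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def correlation(left, right):
    """Normalised correlation: near +1 when in phase, near -1 when inverted."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0

n = 64
tone = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
inverted = [-v for v in tone]

# In-phase content survives the downmix at full level.
print(correlation(tone, tone), rms(mono_sum(tone, tone)))
# Fully out-of-phase content cancels completely in the mono sum.
print(correlation(tone, inverted), rms(mono_sum(tone, inverted)))
```

Real stereo material sits somewhere between these two extremes, which is why only the out-of-phase part of a mix gets quieter in mono.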
Best,
soundz
ursamtl
27th January 2005, 19:34
I agree wholeheartedly with soundz concerning this Advanced Audio Corrector software. It sounds very suspicious. Googling it turned up only the same descriptive paragraph on several sites. Its stated purpose is to prepare files for MP3 encoding, which is not what you plan to do.
I wouldn't bother upsampling to 96 kHz because it's not going to add any high-frequency information that isn't already there. It will just put a lot of extra load on your CPU. Converting to 32-bit floating-point format is definitely a good idea, as it will maximize the dynamic range and improve the resolution before you remove the DC offset. However, if you need to convert your final file back to 16-bit resolution, be sure to use some kind of dithering. Otherwise, the extra low-order bits are simply truncated.
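A rough sketch of the difference between plain truncation and dithering when going back to 16 bits. It assumes TPDF (triangular) dither, which is one common choice and is not named in the thread; the helper names and the very quiet constant test signal are made up for illustration.

```python
import math
import random

def clamp16(v):
    """Keep a value inside the signed 16-bit range."""
    return max(-32768, min(32767, v))

def quantize_truncate(samples):
    """Float [-1, 1) to 16-bit by dropping the fractional part."""
    return [clamp16(int(x * 32768.0)) for x in samples]

def quantize_tpdf(samples, rng):
    """Add triangular dither spanning +/-1 LSB, then round to nearest."""
    out = []
    for x in samples:
        dither = rng.random() - rng.random()  # triangular distribution
        out.append(clamp16(math.floor(x * 32768.0 + dither + 0.5)))
    return out

rng = random.Random(0)
quiet = [0.3 / 32768.0] * 20000      # a level of 0.3 LSB, below one 16-bit step

truncated = quantize_truncate(quiet)  # every sample collapses to 0
dithered = quantize_tpdf(quiet, rng)  # noisy, but the average stays near 0.3
mean_dithered = sum(dithered) / len(dithered)
print(set(truncated), mean_dithered)
```

Truncation erases the signal entirely here, while the dithered version trades it for a small amount of noise that still carries the original level on average.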
Some audio pros are dead set against normalizing. The perceived loudness of a file has more to do with its average RMS level, whereas peak normalizing simply moves the loudest peak of your file to a target level. Since tiny momentary transients can occasionally fool normalizing software, it's not a good idea to normalize right to 0 dB or 100%. If you do normalize, keep the target no higher than -0.5 dB (about 94% of full scale); 90% or so is a safe choice.
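For reference, the decibel and percentage figures are related by the standard amplitude formula, and a short sketch shows how a single transient dictates the gain of a peak normalizer. The helper names and the test signal are hypothetical.

```python
import math

def db_to_linear(db):
    """Amplitude ratio for a level in dB relative to full scale."""
    return 10.0 ** (db / 20.0)

def linear_to_db(ratio):
    """Level in dB for an amplitude ratio."""
    return 20.0 * math.log10(ratio)

def peak(x):
    return max(abs(v) for v in x)

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

print(db_to_linear(-0.5))   # about 0.944 of full scale
print(linear_to_db(0.9))    # about -0.915 dB

# Quiet material with one loud click: the click alone sets the gain.
quiet_with_click = [0.05] * 999 + [0.8]
gain = db_to_linear(-0.5) / peak(quiet_with_click)
normalized = [v * gain for v in quiet_with_click]
print(gain, rms(quiet_with_click), rms(normalized))
```

The gain ends up below 1.2 because of the one click, so the average level of the quiet material barely moves even though the file is "normalized".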
When you're ready to convert to 32kHz sample rate, I would recommend an excellent and free utility from Voxengo called r8brain. It can be set to perform high-quality sample rate conversions. You can find it at http://www.voxengo.com/r8brain/.
Regards,
Steve.
soundz
27th January 2005, 20:43
Hi!
I suggested upsampling to 96 kHz because he will be downsampling to 32 kHz later on, and IME upsampling first and then processing at a higher SR before downsampling produces better results.
About the normalization, I agree that it should only be done in the right context, i.e.:
If the source is just a bit "under", leave it;
If the end result is to be listened to alongside other sound files, it is the RMS value that has to be taken into account, not the absolute peak; short-term peaks have nothing to do with actual perceived loudness.
For an individual sound file that will not be listened to alongside other sound files, if he is already removing DC then it will not hurt to normalize it too.
Dithering: if the encoder can accept higher bit-depth files, let the encoder do its own internal processing.
If the encoder cannot, then listening evaluations should be made with the same source encoded with and without dithering, for a simple reason: perceptual coders give more weight to random sounds (like noise) than to more continuous signals, and in some cases an undithered file may produce better-sounding encoded files than a dithered one, especially at very low bitrates. YMMV.
Best,
soundz
ursamtl
27th January 2005, 23:56
I agree that if one is recording or digitizing sound, the higher the sampling rate, the better the quality. However, once the source has been digitized at a certain rate (in this case, 44,100 samples per second), converting it to 96,000 will not increase the sonic information; it will only provide more samples of the same information.
Think of it this way. Samples are snapshots in time of the audio information. If I take 5 snapshots of a car driving by and then take ten snapshots of those five snapshots, I still only have five unique pictures; it's just that I'd now have two copies of each picture. If I then want to reduce my number of pictures to 4, I have to throw away six instead of one. I don't have any additional choices.
Since he plans to downsample to 32kHz and every conversion introduces some degree of mathematical errors due to rounding, etc., it makes sense to do only one sampling rate conversion instead of two.
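The arithmetic behind the two routes can be written out with exact fractions. This only shows the conversion ratios involved; it does not settle which route sounds better, since that depends on the resampler's filters.

```python
from fractions import Fraction

# One-step route: 44.1 kHz straight to 32 kHz.
direct = Fraction(32000, 44100)

# Two-step route: 44.1 kHz up to 96 kHz, then down to 32 kHz.
up = Fraction(96000, 44100)
down = Fraction(32000, 96000)

print(direct)      # 320/441
print(up, down)    # 320/147 and 1/3
print(up * down)   # 320/441, the same overall ratio as the direct route
```

Both routes end at the same overall ratio; the two-step one simply passes the audio through two resampling filters instead of one.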
Anyway, no big deal. It's just that in the reading I've been doing, some audio pros seem to be saying that the difference between 16- and 24-bit resolution files (32-bit float is essentially 24-bit precision plus an exponent) has a much greater impact on actual audible sound quality than the difference between 44.1/48 and 96 kHz sampling rates. I know in my own experiments with surround processing, etc., moving up to 24-bit sound had a noticeable impact, whereas upsampling to 96 kHz did not.
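As a side note on the 32-bit format: an IEEE 754 single-precision float carries a 24-bit significand (23 stored bits plus an implicit leading one) and an 8-bit exponent. A quick check with the standard `struct` module, using a hypothetical helper name:

```python
import struct

def to_float32(x):
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Every integer up to 2**24 survives the round trip exactly...
print(to_float32(float(2**24 - 1)) == float(2**24 - 1))
print(to_float32(float(2**24)) == float(2**24))
# ...but 2**24 + 1 needs a 25th bit and gets rounded to a neighbour.
print(to_float32(float(2**24 + 1)) == float(2**24 + 1))
```

So a 32-bit float file offers roughly 24-bit precision at any signal level, with the exponent supplying the extra headroom.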
Regards,
Steve.
PatchWorKs
1st February 2005, 14:23
1st: thanks a lot !
2nd: source IS a professional CD !
3rd: about phase shift removal - check out this page (http://www.weirdtitanradio.com/techinfo/)
4th: the resampler I used (wavefs44 (http://www.rarewares.org/others.html)) claims to be the best one.
Anyway, I also tested SSRC and PPHS... I can't tell the difference
5th: bits must be 16 (won't change during the conversion)
soundz
2nd February 2005, 05:29
Originally posted by PatchWorKs
1st: thanks a lot !
You're welcome!
2nd: source IS a professional CD !
Good, this makes the task a lot easier.
3rd: about phase shift removal - check out this page (http://www.weirdtitanradio.com/techinfo/)
Hmmmm... I've never seen a "mono'ed" stereo track distort, unless you sum both channels at unity gain, i.e. unless you do it wrong.
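A small sketch of why summing at unity gain can distort: two 16-bit channels added together can exceed the 16-bit range and clip, while halving the sum cannot. The function names and sample values are made up for illustration.

```python
INT16_MAX = 32767
INT16_MIN = -32768

def clip16(v):
    """Hard-limit a value to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, v))

def sum_unity(left, right):
    """L + R at unity gain, clipped to the 16-bit range."""
    return [clip16(l + r) for l, r in zip(left, right)]

def sum_halved(left, right):
    """(L + R) / 2: the result always fits in 16 bits."""
    return [(l + r) // 2 for l, r in zip(left, right)]

left = [20000, -20000, 10000]
right = [20000, -20000, -10000]

print(sum_unity(left, right))    # [32767, -32768, 0] -> the loud samples clip
print(sum_halved(left, right))   # [20000, -20000, 0] -> no clipping
```

The halved version keeps centred material at its original level, and out-of-phase material still cancels in both cases.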
About the "sounds less full" argument, well... converting a stereo track to mono will always sound less full because you lose the directional cues and the spaciousness. Besides, the software you are using to "correct" this may be doing more harm than good. Do you hear a positive difference when comparing the same stereo track converted to mono with and without that "phase correction"?
4th: the resampler I used (wavefs44 (http://www.rarewares.org/others.html)) claims to be the best one.
Anyway, I also tested SSRC and PPHS... I can't tell the difference
Ok.
5th: bits must be 16 (won't change during the conversion)
Try to do it at 32 bits until the final step; you will get higher quality.
Best,
soundz