Downmixing multi-channel tracks to stereo and normalising with FFmpeg's loudnorm
GeoffreyA
3rd October 2024, 12:34
Back in the DVD days, I used Azid to apply DRC, downmix, and normalise audio. Downmixes of those 5.1 tracks were always softer than the stereo tracks included on many DVDs. Coming back to encoding years later, I find the tools have changed but the problems haven't: dialogue is still too soft.
FFmpeg's loudnorm filter does a good job of harmonising the volume and seems to be better than dynaudnorm. However, one has to choose the targets. Integrated loudness and true peak are easy enough (-23 LUFS and -1 or -2 dBTP), but the loudness range, or LRA, is a matter of taste. Netflix recommends an LRA between 4 and 18, and I find that 18 gives good results, with audible dialogue.
My question is: what LRA values are others using that give good results with downmixed film material? Also, is it better to run loudnorm before downmixing or after?
Here is my batch file. The first, commented-out FFmpeg line is the measuring pass. After it runs, I set the variables to the measured values and run the second pass.
:: loudnorm targets
set out_i=-23
set out_tp=-2
set out_lra=12
:: values reported by the measuring pass (update before each second pass)
set in_i=-16.8
set in_tp=6.7
set in_lra=23.2
set in_thresh=-29.4
set tg_offset=1.6
::ffmpeg -i %1 -map 0:a:0 -af aresample=ochl=stereo,loudnorm=i=%out_i%:tp=%out_tp%:lra=%out_lra%:print_format=summary -f null -
:: second pass: build the filter chain in a variable, since a bare line break would end the ffmpeg command
set af=aresample=ochl=stereo:osr=192000:resampler=soxr:precision=33
set af=%af%,loudnorm=i=%out_i%:tp=%out_tp%:lra=%out_lra%:measured_i=%in_i%:measured_tp=%in_tp%:measured_lra=%in_lra%:measured_thresh=%in_thresh%:offset=%tg_offset%:linear=true:print_format=summary
set af=%af%,aresample=48000:resampler=soxr:precision=33
ffmpeg -i %1 -map 0:a:0 -af %af% -c:a pcm_f32le -f wav - | "%qaac%" --tvbr 91 --ignorelength --no-delay --verbose - -o "out\2.0-loudnorm.m4a"
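One way to avoid copying the measured values by hand is to run the measuring pass with print_format=json instead of summary; loudnorm then prints a JSON object with keys such as input_i, input_tp, input_lra, input_thresh and target_offset. A minimal Python sketch, assuming that the JSON object is the last {...} block in FFmpeg's captured stderr; the helper name loudnorm_second_pass is hypothetical and not part of the batch file above:

```python
import json


def loudnorm_second_pass(stderr_text, i=-23.0, tp=-2.0, lra=12.0):
    """Build the second-pass loudnorm filter from first-pass JSON output.

    Assumes the measuring pass used print_format=json, so the last {...}
    block in stderr holds loudnorm's measurements (values are strings).
    """
    start = stderr_text.rfind("{")
    end = stderr_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no loudnorm JSON block found")
    m = json.loads(stderr_text[start:end + 1])
    return (
        f"loudnorm=i={i}:tp={tp}:lra={lra}"
        f":measured_i={m['input_i']}:measured_tp={m['input_tp']}"
        f":measured_lra={m['input_lra']}:measured_thresh={m['input_thresh']}"
        f":offset={m['target_offset']}:linear=true:print_format=summary"
    )
```

The returned string would stand in for the loudnorm part of the second-pass -af chain. Capturing stderr (for example with subprocess.run(..., capture_output=True, text=True)) is left out to keep the sketch self-contained.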
tebasuna51
4th October 2024, 10:39
... Dialogue is still too soft...
...Also, is it better to run loudnorm before downmixing or after?...
As far as dialogue volume is concerned, the problem is not whether you use loudnorm (or dynaudnorm) before or after the downmix to stereo. The problem is the downmix method.
If the Center channel (which carries most of the dialogue) has a low volume compared with the other channels, it makes no difference whether you apply loudnorm before or after.
I recommend using a downmix with a higher Center contribution.
If the standard ffmpeg downmix is:
pan=stereo|FL=.4142c0+.2929c2+.2929c4|FR=.4142c1+.2929c2+.2929c5
I recommend increasing the Center coefficient and decreasing the surround contribution, which is not important at all to the stereo output:
pan=stereo|FL=.4142c0+.3929c2+.1929c4|FR=.4142c1+.3929c2+.1929c5
After that downmix, you can use loudnorm to taste.
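Both pan lines above keep each output channel's gains summing to 1: 0.4142 + 0.2929 + 0.2929 and 0.4142 + 0.3929 + 0.1929 both equal 1. A small Python sketch (the helper names pan_stereo and gain_change_db are hypothetical) that derives the surround gain from the chosen front and Center gains, builds the matching pan string, and shows how much the change lifts the Center:

```python
import math


def pan_stereo(front, center):
    """Build a 5.1 -> stereo pan string whose gains sum to 1 per output.

    The surround gain is whatever remains after front and center, and
    LFE (c3) is left out, as in the pan lines quoted in the thread.
    """
    surround = 1.0 - front - center
    if surround < 0:
        raise ValueError("front + center must not exceed 1")

    def side(front_idx, surround_idx):
        # c2 is the Center channel, shared by both outputs
        return (f"{front:.4f}c{front_idx}+{center:.4f}c2"
                f"+{surround:.4f}c{surround_idx}")

    return f"pan=stereo|FL={side(0, 4)}|FR={side(1, 5)}"


def gain_change_db(old, new):
    """Level change in dB when a gain goes from old to new."""
    return 20 * math.log10(new / old)


print(pan_stereo(0.4142, 0.3929))      # the raised-Center variant
print(gain_change_db(0.2929, 0.3929))  # Center boost, about +2.55 dB
```

pan reads each gain as a plain decimal number, so 0.4142 and .4142 are the same value.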
GeoffreyA
4th October 2024, 14:01
Thanks, tebasuna! I think you've solved my problem and cleared up where the issue actually was.
This morning, incidentally, I tried my old set of PC speakers, which had good sound, and discovered that the loudnorm versions were not that grand after all. (I hadn't tested them on the TV yet.) I took your advice, raising the centre coefficient, and the dialogue came out loud and clear. This eliminates the need for loudnorm, which I was not too happy with because of the resampling, further processing, and potential damage.
As I am working with 32-bit float output, FFmpeg's default coefficients were the 1, 0.707107 set, so I experimented with 0.75 to 0.9 for the centre, with a corresponding drop in the surround channels. On a few films (Inception, Fellowship of the Ring, and Event Horizon), it works like a charm. At the other end of the chain, qaac normalises to 0 dB before encoding.
If I may ask, is it always necessary to lower the surround channels if raising the centre? (EDIT: Through experimenting with FFmpeg, I see that the coefficients for each output channel must add up to 1. I didn't realise that before.)
FranceBB
5th October 2024, 01:41
If I may ask, is it always necessary to lower the surround channels if raising the centre? (EDIT: Through experimenting with FFmpeg, I see that the coefficients for each output channel must add up to 1. I didn't realise that before.)
Yes, because otherwise it can lead to clipping; as you saw, the idea is not to exceed 1, precisely to avoid that.
To answer your question about loudnorm: I know it's no longer required, since you solved your problem, but in case someone reading this needs it, I found my sweet spot at LRA 12 and true peak -2 dBTP (the integrated loudness target can be whatever you need; most countries use -23 LUFS, Italy uses -24).
j7n
5th October 2024, 04:23
The programme is likely normalized so as not to clip in the stereo downmix, or it has generous headroom. But you can't necessarily avoid clipping by decreasing another channel: the channels likely reach their maximum amplitudes at different moments. I'm not familiar with any quirks in ffmpeg, but usually I'd tweak one knob at a time, starting with the center channel, and then multiply all output channels by a common factor if needed.
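The "gains sum to 1" rule is a worst-case bound: an output only reaches the sum of its absolute gains if every contributing input hits full scale, with matching sign, at the same instant. A Python sketch of the "tweak one knob, then scale all outputs together" approach, assuming inputs normalised to [-1, 1]; worst_case_peak and scale_outputs are hypothetical helper names:

```python
def worst_case_peak(gains):
    """Largest possible |output| if every input is at full scale (|x| <= 1)
    with the worst-case signs: the sum of the absolute gains."""
    return sum(abs(g) for g in gains)


def scale_outputs(outputs):
    """Scale every output channel by one common factor so that none can
    exceed full scale; the balance between channels is unchanged."""
    peak = max(worst_case_peak(gains) for gains in outputs)
    factor = 1.0 / peak if peak > 1.0 else 1.0
    return [[g * factor for g in gains] for gains in outputs], factor


# Raise only the Center (c2) from 0.2929 to 0.5, leaving front and surround alone.
fl = [0.4142, 0.5, 0.2929]  # gains for c0, c2, c4
fr = [0.4142, 0.5, 0.2929]  # gains for c1, c2, c5
scaled, factor = scale_outputs([fl, fr])
print(factor)  # about 0.83, i.e. roughly -1.6 dB on both outputs
```

Because real programme material rarely reaches the worst case, the common factor can be applied only when measured peaks actually require it, which is the point j7n makes.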
tebasuna51
5th October 2024, 09:06
... FFmpeg's default coefficients were the 1, 0.707107 set...
Those are the defaults before normalization: 1, 0.7071
where 0.7071 = (2^0.5)/2
which restores the original Center volume as the phantom center created by that contribution in both FL and FR.
That works fine when downmixing 3 channels to 2.
But when there are other channels to downmix, the default is (note the < instead of the =):
pan=stereo|FL<c0+.7071c2+.7071c4|FR<c1+.7071c2+.7071c5
which is the same (after automatic normalization) as:
pan=stereo|FL=.4142c0+.2929c2+.2929c4|FR=.4142c1+.2929c2+.2929c5
Now the Center channel is not recovered at the same volume as the original, and sometimes you need to raise it.
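The arithmetic behind the two equivalent pan lines: with "<", pan rescales each output's gains so they total 1, here dividing by 1 + 0.7071 + 0.7071 = 2.4142. A short Python check (the normalize helper is hypothetical, written only to mirror that renormalisation) shows the resulting gains and where the phantom Center ends up relative to its original level:

```python
import math


def normalize(gains):
    """Divide each gain by the total so they sum to 1, as pan's '<' does."""
    total = sum(gains)
    return [g / total for g in gains]


front, center, surround = normalize([1.0, math.sqrt(0.5), math.sqrt(0.5)])
print(round(front, 4), round(center, 4), round(surround, 4))  # 0.4142 0.2929 0.2929

# Phantom Center power: the Center feeds both FL and FR, so add the two
# squared gains. Unnormalised, 2 * 0.7071**2 = 1, i.e. the original level.
phantom_db = 10 * math.log10(2 * center ** 2)
print(round(phantom_db, 2))  # -7.66
```

The front gain drops by the same amount (20 * log10(0.4142) is also about -7.66 dB), so normalisation lowers the whole mix rather than changing the balance; any extra Center emphasis has to come from raising its coefficient.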
GeoffreyA
7th October 2024, 16:44
Thanks to everyone for their advice and thoughts. Testing this over the weekend, it seems that matters are not as simple as I thought the other day. I might have made a mistake, leaving out qaac's normalising. Also, in light of tebasuna's explanation about the phantom centre, I'm now hesitant to raise the centre coefficient too much. I'll keep on experimenting and report back when things are clearer.
...I found my sweet spot at LRA 12...
Would you say that you are targeting a final LRA of 12, or putting that value in the loudnorm filter? I find that, to hit an LRA of 18, I have to put in a value of around 12, funnily enough.
FranceBB
8th October 2024, 11:55
Would you say that you are targeting a final LRA of 12, or putting that value in the loudnorm filter?
That's the value in the loudnorm filter that I generally use, so it looks like we're using the same value. :)
pandy
30th November 2024, 00:24
I recommend the sofalizer filter even if you don't intend to use headphones; it still seems to do a better (smarter) job of downmixing.
This is my personal impression. And of course do loudness normalization on the stereo, but I would also consider loudness normalization on the center channel (assuming the dialogue is routed there). Tebasuna's idea is OK if you decide not to use the sofalizer filter; then I would apply loudness normalization to the center first and, at the end, to the stereo.
GeoffreyA
1st December 2024, 12:12
I recommend the sofalizer filter even if you don't intend to use headphones; it still seems to do a better (smarter) job of downmixing.
This is my personal impression. And of course do loudness normalization on the stereo, but I would also consider loudness normalization on the center channel (assuming the dialogue is routed there). Tebasuna's idea is OK if you decide not to use the sofalizer filter; then I would apply loudness normalization to the center first and, at the end, to the stereo.
I'm not too familiar with the sofalizer filter, but it would be interesting to see what results it gives.
What I ended up settling on was default downmixing, then normalising with loudnorm. I found that using an LRA of around 12 or 13, instead of 18, brings the dialogue to the forefront; it sounds clear, loud, and as it should, reminiscent of the cinema. With higher LRAs, the dialogue is audible but not loud enough, and the volume has to be adjusted manually throughout the movie. Using loudnorm is simpler and more general, whereas raising the centre coefficient before downmixing seems to depend on the film, and that means more work.
You mention performing loudness normalisation on the centre channel before downmixing. That is an interesting idea. It would take more work but give better results, in line with the EBU's recommendation of specifying both a dialogue LRA (5, I think) and a programme LRA, with a ratio between the two.