3rd October 2024, 12:34   #1
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 145
Downmixing multi-channel tracks to stereo and normalising with FFmpeg's loudnorm

Back in the DVD days, I used Azid to apply DRC, downmix, and normalise audio. My downmixes of those 5.1 tracks were always softer than the stereo tracks included on many DVDs. Coming back to encoding years later, I find the tools have changed but the problems haven't: dialogue is still too soft.

FFmpeg's loudnorm filter does a good job of evening out the volume and seems to be better than dynaudnorm. However, one has to choose the targets: integrated loudness and true peak are easy enough (-23 LUFS and -1 or -2 dBTP), but the loudness range, or LRA, is up to the user's taste. Netflix recommends an LRA between 4 and 18 LU, and I find that 18 gives good results, with audible dialogue.

My question is: what LRA values are others using that give good results with downmixed film material? Also, is it better to run loudnorm before downmixing or after?

Here is my batch file. The first, commented-out FFmpeg line is the measuring pass. After it runs, I set the variables to the measured values and run the second pass.

Code:
rem Targets for loudnorm: integrated loudness in LUFS, true peak in dBTP, loudness range in LU
set out_i=-23
set out_tp=-2
set out_lra=12

rem Values reported by the measuring pass
set in_i=-16.8
set in_tp=6.7
set in_lra=23.2
set in_thresh=-29.4
set tg_offset=1.6

rem Measuring pass, commented out
::ffmpeg -i "%~1" -map 0:a:0 -af aresample=ochl=stereo,loudnorm=i=%out_i%:tp=%out_tp%:lra=%out_lra%:print_format=summary -f null -

rem Second pass; the trailing carets continue the command on the next line
ffmpeg -i "%~1" -map 0:a:0 -af aresample=ochl=stereo:osr=192000:resampler=soxr:precision=33,^
loudnorm=i=%out_i%:tp=%out_tp%:lra=%out_lra%:measured_i=%in_i%:measured_tp=%in_tp%:measured_lra=%in_lra%:measured_thresh=%in_thresh%:offset=%tg_offset%:linear=true:print_format=summary,^
aresample=48000:resampler=soxr:precision=33 -c:a pcm_f32le -f wav - ^
 | "%qaac%" --tvbr 91 --ignorelength --no-delay --verbose - -o "out\2.0-loudnorm.m4a"

Last edited by GeoffreyA; 3rd October 2024 at 12:42.
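The manual step in the batch file above, copying the measured values into the `set` lines, could in principle be scripted. What follows is only a minimal Python sketch and not part of the original post. It assumes the measuring pass is run with `print_format=json` instead of `summary`, and that the JSON block loudnorm prints at the end of its log carries the keys `input_i`, `input_tp`, `input_lra`, `input_thresh`, and `target_offset`. The helper name `second_pass_filter` and the hand-written `sample_log` are made up for illustration; the numbers are the ones from the batch file.

```python
import json
import re


def second_pass_filter(log_text, i=-23.0, tp=-2.0, lra=12.0):
    """Build a second-pass loudnorm filter string from first-pass log text."""
    # Assumption: the last {...} block in the log is loudnorm's JSON summary.
    match = re.search(r"\{[^{}]*\}\s*$", log_text)
    if match is None:
        raise ValueError("no JSON block found in the log text")
    m = json.loads(match.group(0))
    return (
        f"loudnorm=i={i}:tp={tp}:lra={lra}"
        f":measured_i={m['input_i']}:measured_tp={m['input_tp']}"
        f":measured_lra={m['input_lra']}:measured_thresh={m['input_thresh']}"
        f":offset={m['target_offset']}:linear=true:print_format=summary"
    )


# Hand-written sample standing in for the tail of a real first-pass log.
sample_log = """
{
    "input_i" : "-16.80",
    "input_tp" : "6.70",
    "input_lra" : "23.20",
    "input_thresh" : "-29.40",
    "target_offset" : "1.60"
}
"""

print(second_pass_filter(sample_log))
```

The resulting string would take the place of the hand-edited loudnorm part of the second FFmpeg line; the two aresample stages around it stay as they are.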
4th October 2024, 10:39   #2
tebasuna51
Moderator
 
Join Date: Feb 2005
Location: Spain
Posts: 7,127
Quote:
Originally Posted by GeoffreyA View Post
... Dialogue is still too soft...

...Also, is it better to run loudnorm before downmixing or after?...
As for the dialogue volume, the problem is not whether loudnorm (or dynaudnorm) is used before or after the downmix to stereo. The problem is the downmix method.

If the Center channel (which carries most of the dialogue) has low volume compared with the rest of the channels, applying loudnorm before or after makes no difference.

I recommend using a downmix with a high Center contribution.
If the standard ffmpeg downmix is:
pan=stereo|FL=.4142c0+.2929c2+.2929c4|FR=.4142c1+.2929c2+.2929c5

I recommend increasing the Center coefficient and decreasing the surround contribution, which is not at all important to the stereo output:
pan=stereo|FL=.4142c0+.3929c2+.1929c4|FR=.4142c1+.3929c2+.1929c5

After that downmix you can use loudnorm to taste.
__________________
BeHappy, AviSynth audio transcoder.
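The coefficient change in the post above moves 0.1 of gain from the surround to the Center while keeping each output channel's sum at 1. A minimal Python sketch of that arithmetic follows; the helper name `raised_centre_pan` and its `shift` parameter are hypothetical, chosen only to show how the two pan strings relate.

```python
def raised_centre_pan(front=0.4142, centre=0.2929, surround=0.2929, shift=0.1):
    """Move `shift` of gain from the surround to the centre and return a pan string."""
    if not 0 <= shift <= surround:
        raise ValueError("shift must lie between 0 and the surround coefficient")
    c = centre + shift    # louder centre (dialogue)
    s = surround - shift  # quieter surround, so the sum per output stays the same

    def fmt(x):
        # Four decimals without the leading zero, matching the thread's notation.
        return f"{x:.4f}".lstrip("0")

    return (
        f"pan=stereo|FL={fmt(front)}c0+{fmt(c)}c2+{fmt(s)}c4"
        f"|FR={fmt(front)}c1+{fmt(c)}c2+{fmt(s)}c5"
    )


print(raised_centre_pan(shift=0.0))  # the standard downmix quoted above
print(raised_centre_pan())           # the raised-centre variant quoted above
```

How large a shift suits a given film is a matter of listening; 0.1 is simply the value used in the post.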
4th October 2024, 14:01   #3
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 145
Thanks, tebasuna! I think you've solved my problem and cleared up where the issue actually was.

This morning, incidentally, I tried my old set of PC speakers, which had good sound, and discovered that the loudnorm versions were not that grand after all. (I hadn't tested them on the TV yet.) I took your advice, raising the centre coefficient, and the dialogue came out loud and clear. This eliminates the need for loudnorm, which I was not too happy with because of the resampling, further processing, and potential damage.

As I am working with 32-bit float output, FFmpeg's default coefficients were the 1, 0.707107 set, so I experimented with 0.75-0.9 for the centre, and the respective drop in the surround channels. On a couple of films (Inception, Fellowship of the Ring, and Event Horizon), it works like a charm. On the other end, qaac is normalising to 0 dB before encoding.

If I may ask, is it always necessary to lower the surround channels if raising the centre? (EDIT: Through experimenting with FFmpeg, I see that the coefficients for each output channel must add up to 1. I didn't realise that before.)

Last edited by GeoffreyA; 4th October 2024 at 14:27.
5th October 2024, 01:41   #4
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,118
Quote:
Originally Posted by GeoffreyA View Post
If I may ask, is it always necessary to lower the surround channels if raising the centre? (EDIT: Through experimenting with FFmpeg, I see that the coefficients for each output channel must add up to 1. I didn't realise that before.)
Yes, because otherwise it can lead to clipping; as you saw, the idea is to keep the sum from exceeding 1 to avoid exactly that.

To answer your question about loudnorm: I know it's no longer required as you've solved your problem, but in case someone reading this needs it, I found my sweet spot at LRA 12 and True Peak -2 (the integrated loudness target can be whatever you need; most countries use -23 LUFS, Italy uses -24).

Last edited by FranceBB; 5th October 2024 at 01:43.
5th October 2024, 04:23   #5
j7n
Registered User
 
Join Date: Apr 2006
Posts: 153
The programme is likely normalized so as not to clip in the stereo downmix, or has generous headroom. But you can't necessarily avoid clipping by decreasing another channel, since the channels likely reach their maximum amplitude at different moments. I'm not familiar with any quirks in ffmpeg, but usually I'd tweak one knob at a time, starting with the center channel, then scale all output channels by a common factor if needed.
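The difference between the worst-case bound (the sum of the coefficients) and the peak a given programme actually reaches can be illustrated with a toy Python sketch. Everything here is made up for illustration: the three-channel frames, the boosted centre value of 0.6, and the helper names. It shows only the arithmetic, not how ffmpeg works internally.

```python
def worst_case_peak(coeffs):
    """Upper bound on the output peak if every input hits full scale (1.0) at once."""
    return sum(abs(c) for c in coeffs)


def measured_peak(frames, coeffs):
    """Largest absolute mixed sample over the given frames."""
    return max(abs(sum(s * c for s, c in zip(frame, coeffs))) for frame in frames)


def scale_to_fit(coeffs, peak, ceiling=1.0):
    """Scale all coefficients by one common factor so that `peak` lands on `ceiling`."""
    if peak <= ceiling:
        return list(coeffs)
    gain = ceiling / peak
    return [c * gain for c in coeffs]


# Coefficients for one output channel: front, centre, surround.
# The centre is raised without lowering anything else, so the sum exceeds 1.
boosted = [0.4142, 0.6, 0.2929]

# Toy frames (front, centre, surround) whose channels peak at different moments.
staggered = [(0.9, 0.1, 0.0), (0.1, 0.9, 0.1), (0.0, 0.1, 0.9)]
# Toy frame where every channel is at full scale at the same instant.
coincident = [(1.0, 1.0, 1.0)]

print(worst_case_peak(boosted))            # above 1: clipping is possible
print(measured_peak(staggered, boosted))   # below 1: this material does not clip
fitted = scale_to_fit(boosted, measured_peak(coincident, boosted))
print(measured_peak(coincident, fitted))   # brought back to the ceiling
```

Keeping the coefficients summed to 1 is the safe bound for any input; measuring the real peak and then applying one common factor is the approach described in the post above.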
5th October 2024, 09:06   #6
tebasuna51
Moderator
 
Join Date: Feb 2005
Location: Spain
Posts: 7,127
Quote:
Originally Posted by GeoffreyA View Post
... FFmpeg's default coefficients were the 1, 0.707107 set...
Those defaults are the values before normalisation: 1, 0.7071
where 0.7071 = (2^0.5)/2
which restores the original Center volume in the phantom centre created by that contribution to both the FL and FR channels.
That works fine when downmixing 3 channels to 2.

But when there are other channels to downmix, the default is (note the < instead of the =):
pan=stereo|FL<c0+.7071c2+.7071c4|FR<c1+.7071c2+.7071c5

which, after automatic normalisation, is the same as:
pan=stereo|FL=.4142c0+.2929c2+.2929c4|FR=.4142c1+.2929c2+.2929c5

Now the Center channel is not recovered at the same volume as the original, and sometimes it needs to be raised.

Last edited by tebasuna51; 5th October 2024 at 09:11.
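The normalisation described in the post above (dividing each weight by the sum of the weights, so that the total for an output channel is 1) can be checked with a few lines of Python. This is a sketch of the arithmetic only; `normalise` and `level_db` are names made up here, and the decibel figures simply compare the Center coefficient with the front coefficient.

```python
import math


def normalise(weights):
    """Divide each weight by the total so the weights sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def level_db(coeff, reference):
    """Level of one coefficient relative to another, in decibels."""
    return 20 * math.log10(coeff / reference)


# Weights from the '<' form: front, centre, surround.
weights = [1.0, math.sqrt(2) / 2, math.sqrt(2) / 2]
front, centre, surround = normalise(weights)

print(round(front, 4), round(centre, 4), round(surround, 4))  # .4142 .2929 .2929
print(level_db(centre, front))   # centre about 3 dB below the front
print(level_db(0.3929, 0.4142))  # raised centre: under half a dB below the front
```

The relative level between Center and front is unchanged by normalisation (still about -3 dB); it is the absolute Center gain that drops from 0.7071 to 0.2929, which is why raising it can help.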
7th October 2024, 16:44   #7
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 145
Thanks to everyone for their advice and thoughts. Testing this over the weekend, it seems that matters are not as simple as I thought the other day. I might have made a mistake, leaving out qaac's normalising. Also, in light of tebasuna's explanation about the phantom centre, I'm now hesitant to raise the centre coefficient too much. I'll keep on experimenting and report back when things are clearer.

Quote:
Originally Posted by FranceBB View Post
...I found my sweet spot at LRA 12...
Would you say that you are targeting a final LRA of 12, or putting that value in the loudnorm filter? I find that, to hit an LRA of 18, I've got to put in a value of around 12, funnily enough.
8th October 2024, 11:55   #8
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,118
Quote:
Originally Posted by GeoffreyA View Post
Would you say that you are targeting a final LRA of 12, or putting that value in the loudnorm filter?
That's the value in the loudnorm filter that I generally use, so it looks like we're using the same value.
30th November 2024, 00:24   #9
pandy
Registered User
 
Join Date: Mar 2006
Posts: 1,050
I recommend the sofalizer filter even if you don't intend to use headphones; it still seems to do the downmixing better (smarter). This is my personal impression.
Loudness normalization should of course be applied to the stereo output, but I would also consider loudness normalization on the center channel (assuming the dialogue is routed there). Tebasuna's idea is OK if you decide not to use the sofalizer filter; in that case I would apply loudness normalization to the center first and, after everything else, to the stereo output.
1st December 2024, 12:12   #10
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 145
Quote:
Originally Posted by pandy View Post
I recommend the sofalizer filter even if you don't intend to use headphones; it still seems to do the downmixing better (smarter). This is my personal impression.
Loudness normalization should of course be applied to the stereo output, but I would also consider loudness normalization on the center channel (assuming the dialogue is routed there). Tebasuna's idea is OK if you decide not to use the sofalizer filter; in that case I would apply loudness normalization to the center first and, after everything else, to the stereo output.
I'm not too familiar with the sofalizer filter, but it would be interesting to see what results it gives.

What I ended up settling on was default downmixing, then normalising with loudnorm. I found that using an LRA of around 12 or 13, instead of 18, brings the dialogue to the forefront; it sounds clear, loud, and how it should be, reminiscent of the cinema. With higher LRAs, the dialogue is audible but not loud enough, and the volume has to be manually adjusted throughout the movie. Using loudnorm is simpler and more generalised, whereas raising the centre coefficient before downmixing seems to depend on the film, and that means more work.

You mention performing loudness normalisation on the centre channel before downmixing. That is an interesting idea. It would take more work but give better results, aligning with the EBU's recommendation of having a dialogue LRA (I think 5) and a programme LRA, and having a ratio between these two.