convert 5.1 to stereo [Archive]


miltk

29th May 2024, 06:19

my ears apparently cannot comprehend today's audio 5.1 et al

i have a problem listening to 5.1 with any clarity.

network tv is ok. studio audio like espn or news is fine. even commercials are fine. but movies like fallguy are just really muddy to my ears

would converting to stereo work?

i use handbrake and handbrake's default audio is stereo, not 5.1, so is handbrake using stereo or not... or do i need another solution

thx

richardpl

29th May 2024, 07:33

With ffmpeg:

ffmpeg -i input.audio.file -af "aformat=fltp,pan=stereo|FL=FL+FC+LFE+SL+BL|FR=FR+FC+LFE+SR+BR,alimiter" out.audio.file

alimiter filter can be replaced with any other limiter filter, like ones in LADSPA/LV2/CLAP...

Emulgator

29th May 2024, 07:53

Depending on editions audio often is pushed at the whim of (re-)producers.
Considering that anything above stereo is often misunderstood, (and might be poorly QCed) there can be any flaws in any source.

Since there are different opinions, solutions and misconceptions about movie audio, my best attempt was to derive from history:

Let's start with the one and only all-ends-up-there center channel.
From the 30's/40's on the ubiquitous "Voice of Theatre" is a 4x12" hornloaded speaker behind the screen,
from its availability in the late 40s driven by let's say a 2x6550 amplifier putting out 40W, or 4x6550 offering 80W.

Leaving the early 7-channel experiment of the few Cinerama movies and theaters aside, later increases of audio channels can be seen as:
For backward compatibility let's keep the center speaker and its assignment for times to come. 1.0 C Mono.
Plus for the ones who are willing to upgrade their theaters, let's add 4 quadro channels.

The quad/eight Coronado is such a 36/24||36+24/8/4/2/1 mixing desk, developed in 1979 for the needs of Californian film studios, introducing Mute/VCA automation and sporting 2 panorama pots per channel (1st row F-C-B, 2nd row L-C-R)
which would feed a quad summing strip with the single quad fader controlling 4 VCAs ganged simultaneously.
In 2016 one of the last Coronados arrived here in Berlin and I was called in to restore that flagship from the eighties.
In absence of a manual, and with a few hardly readable xerox copies of a part of schematics I had to redraw the missing ones, so here from my writeup:
"The Coronado extends from the Pacifica and adds Mute/VCA-controlling Automation for repeatable results, especially demanded in film production.
The Coronado can record max. 24-CH at the same time in Overdub Mode, final destination is Quad(ro) or Stereo or Mono
Coronado's main application is Quadro (4.0) Film sound recording following 1978-1984 U.S. standards and specifications."

There was no "5-axis joystick" to pan any source to center, so any dialogue was run on unpanned channels or on a separate desk.
Then 1.0 C + 4.0 quadro FL+FR+RL+RR = 5.0 C+FL+FR+RL+RR
Since Dialogue has to stay at Center the quadro channel set can rather be seen as FX channels.
Anything swooshing, background noises, anything dispensable ends up there.
And since smaller theaters and private living rooms have to live with smaller quadro speaker enclosures,
and since bass from all of these would lead to local bass cancellation:
Let's introduce an LFE thunder box, "only for the rumble", put somewhere where it fits. 5.1= C+FL+FR+RL+RR+LFE.

Now straight quick-converting will inherit any introduced imbalance.
It may well be that dialogue is in center only, it may be that dialogue had been mixed into FL and FR,
it may well be that a previous global upmix of each and every source will lead to overloading when downmixed by fixed assumptions.

Considering all that I can only suggest to demux all audio, drop it on any Audio editor (audacity will do), QC it and then remix to your taste.
Some movies from the 40's..60's might have distorted soundtracks and might need reversal of frequency-selective overcompression.
You are welcome to Re-eq, then revert useless fader pushes, finally remix and use what you prefer from the tracks.

Here with some Blu-ray movies I find myself reverting more than 100 fader pushes per movie per track (!),
vastly differently across languages, and sometimes even differing across 1.0/2.0/5.1 versions of the same language.
To clean such envelope mess takes time, but the reward is ...a much nicer to listen to feature.

Of course there are movies which have been beautifully restored and won't need any retouching !

P.S. Ah, and as I was writing, richardpl was quicker, and of course:
his suggestion of introducing a limiter will help, at least it avoids clipping in such quick downmix overload scenario.

hello_hello

30th May 2024, 17:25

I've used a compressor over the audio for so long (I generally use my PC as a media player) I forget how ridiculously dynamic it can be until I'm watching video a different way.

As surround sound is quite silly I only have stereo speakers, so the audio is downmixed to stereo, and for years I used ffdshow to apply compression via a WinAmp plugin. Now I'm on Linux I use SMPlayer with the Dynamic Audio Normalizer built into ffmpeg as an audio filter for MPV. Mind you neither of them are traditional compressors as such as they increase the volume of the quiet parts rather than squish the loud bits.

My suggestion to miltk in his thread at VideoHelp was to do something like this to replace the existing audio with a compressed stereo version.

ffmpeg.exe -i "InputVideo.mkv" -y -map 0 -c:v copy -c:s copy -ac 2 -af dynaudnorm=f=150:b=1 -c:a aac "OutputVideo.mkv"

Including the LFE channel when downmixing isn't a good idea if you're compressing later, as too much low frequency stuff can cause noticeable "volume pumping", although according to the Dolby spec for downmixing AC3, the LFE channel shouldn't be included anyway.

Emulgator

30th May 2024, 23:06

Lord MuldeR's DAN is very nice, yes. A good suggestion.

Katie Boundary

31st May 2024, 00:19

Many of AVIsynth's audio-source filters (WAVsource, nicac3source, ffaudiosource etc.) have a "channels" parameter that will convert to stereo if you set it to "2".

wonkey_monkey

1st June 2024, 13:47

Many of AVIsynth's audio-source filters (WAVsource, nicac3source, ffaudiosource etc.) have a "channels" parameter

WavSource doesn't have a "channels" parameter and nor does FFAudioSource. Still, 1 out of 3...

FranceBB

2nd June 2024, 20:45

Well, I would rather use a downmix function so that I know exactly what it does and how it does it rather than trusting the indexer doing some shady things when I specify the output channel. In Avisynth there are plenty of downmix functions shared by several people over the years:

function Dmix3Stereo(clip a)
{
flr = GetChannel(a, 1, 2)
fcc = GetChannel(a, 3, 3)
return MixAudio(flr, fcc, 0.5858, 0.4142)
}# 3 Channels L,R,C or L,R,S

function Dmix3Dpl(clip a)
{
flr = GetChannel(a, 1, 2)
sl = GetChannel(a, 3)
sr = Amplify(sl, -1.0)
blr = MergeChannels(sl, sr)
return MixAudio(flr, blr, 0.5858, 0.4142)
}# 3 Channels only L,R,S

function Dmix4qStereo(clip a)
{
flr = GetChannel(a, 1, 2)
blr = GetChannel(a, 3, 4)
return MixAudio(flr, blr, 0.5, 0.5)
}# 4 Channels Quadro L,R,SL,SR

function Dmix4qDpl(clip a)
{
flr = GetChannel(a, 1, 2)
bl = GetChannel(a, 3)
br = GetChannel(a, 4)
sl = MixAudio(bl, br, 0.2929, 0.2929)
sr = MixAudio(bl, br, -0.2929, -0.2929)
blr = MergeChannels(sl, sr)
return MixAudio(flr, blr, 0.4142, 1.0)
}# 4 Channels Quadro L,R,SL,SR

function Dmix4qDpl2(clip a)
{
flr = GetChannel(a, 1, 2)
bl = GetChannel(a, 3)
br = GetChannel(a, 4)
sl = MixAudio(bl, br, 0.3714, 0.2144)
sr = MixAudio(bl, br, -0.2144, -0.3714)
blr = MergeChannels(sl, sr)
return MixAudio(flr, blr, 0.4142, 1.0)
}# 4 Channels Quadro L,R,SL,SR

function Dmix4sStereo(clip a)
{
flr = GetChannel(a, 1, 2)
fcc = GetChannel(a, 3, 3)
lrc = MixAudio(flr, fcc, 0.4142, 0.2929)
blr = GetChannel(a, 4, 4)
return MixAudio(lrc, blr, 1.0, 0.2929)
}# 4 Channels L,R,C,S

function Dmix4sDpl(clip a)
{
flr = GetChannel(a, 1, 2)
fcc = GetChannel(a, 3, 3)
lrc = MixAudio(flr, fcc, 0.4142, 0.2929)
sl = GetChannel(a, 4)
sr = Amplify(sl, -1.0)
blr = MergeChannels(sl, sr)
return MixAudio(lrc, blr, 1.0, 0.2929)
}# 4 Channels L,R,C,S

function Dmix5Stereo(clip a)
{
flr = GetChannel(a, 1, 2)
fcc = GetChannel(a, 3, 3)
lrc = MixAudio(flr, fcc, 0.3694, 0.2612)
blr = GetChannel(a, 4, 5)
return MixAudio(lrc, blr, 1.0, 0.3694)
}# 5 Channels L,R,C,SL,SR

function Dmix5Dpl(clip a)
{
flr = GetChannel(a, 1, 2)
fcc = GetChannel(a, 3, 3)
lrc = MixAudio(flr, fcc, 0.3205, 0.2265)
bl = GetChannel(a, 4)
br = GetChannel(a, 5)
sl = MixAudio(bl, br, 0.2265, 0.2265)
sr = MixAudio(bl, br, -0.2265, -0.2265)
blr = MergeChannels(sl, sr)
return MixAudio(lrc, blr, 1.0, 1.0)
}# 5 Channels L,R,C,SL,SR

function Dmix5Dpl2(clip a)
{
flr = GetChannel(a, 1, 2)
fcc = GetChannel(a, 3, 3)
lrc = MixAudio(flr, fcc, 0.3254, 0.2301)
bl = GetChannel(a, 4)
br = GetChannel(a, 5)
sl = MixAudio(bl, br, 0.2818, 0.1627)
sr = MixAudio(bl, br, -0.1627, -0.2818)
blr = MergeChannels(sl, sr)
return MixAudio(lrc, blr, 1.0, 1.0)
}# 5 Channels FL,FR,C,SL,SR

function Dmix6Stereo(clip a)
{
flr = GetChannel(a, 1, 2)
fcc = GetChannel(a, 3, 3)
lrc = MixAudio(flr, fcc, 0.3694, 0.2612)
blr = GetChannel(a, 5, 6)
return MixAudio(lrc, blr, 1.0, 0.3694)
}# 6 Channels FL, FR, C, LFE, SL, SR

function Dmix6Dpl(clip a)
{
flr = GetChannel(a, 1, 2)
fcc = GetChannel(a, 3, 3)
lrc = MixAudio(flr, fcc, 0.3205, 0.2265)
bl = GetChannel(a, 5)
br = GetChannel(a, 6)
sl = MixAudio(bl, br, 0.2265, 0.2265)
sr = MixAudio(bl, br, -0.2265, -0.2265)
blr = MergeChannels(sl, sr)
return MixAudio(lrc, blr, 1.0, 1.0)
}# 6 Channels FL, FR, C, LFE, SL, SR

function Dmix6Dpl2(clip a)
{
flr = GetChannel(a, 1, 2)
fcc = GetChannel(a, 3, 3)
lrc = MixAudio(flr, fcc, 0.3254, 0.2301)
bl = GetChannel(a, 5)
br = GetChannel(a, 6)
sl = MixAudio(bl, br, 0.2818, 0.1627)
sr = MixAudio(bl, br, -0.1627, -0.2818)
blr = MergeChannels(sl, sr)
return MixAudio(lrc, blr, 1.0, 1.0)
}# 6 Channels FL, FR, C, LFE, SL, SR

function Dmix6StereoLfe(clip a)
{
flr = GetChannel(a, 1, 2)
fcc = GetChannel(a, 3, 3)
lrc = MixAudio(flr, fcc, 0.2929, 0.2071)
lfe = GetChannel(a, 4, 4)
lrc = MixAudio(lrc, lfe, 1.0, 0.2071)
blr = GetChannel(a, 5, 6)
return MixAudio(lrc, blr, 1.0, 0.2929)
}# 6 Channels FL, FR, C, LFE, SL, SR

function Dmix6DplLfe(clip a)
{
flr = GetChannel(a, 1, 2)
fcc = GetChannel(a, 3, 3)
lrc = MixAudio(flr, fcc, 0.2613, 0.1847)
lfe = GetChannel(a, 4, 4)
lrc = MixAudio(lrc, lfe, 1.0, 0.1847)
bl = GetChannel(a, 5)
br = GetChannel(a, 6)
sl = MixAudio(bl, br, 0.1847, 0.1847)
sr = MixAudio(bl, br, -0.1847, -0.1847)
blr = MergeChannels(sl, sr)
return MixAudio(lrc, blr, 1.0, 1.0)
}# 6 Channels FL, FR, C, LFE, SL, SR

function Dmix6Dpl2Lfe(clip a)
{
flr = GetChannel(a, 1, 2)
fcc = GetChannel(a, 3, 3)
lrc = MixAudio(flr, fcc, 0.2646, 0.1870)
lfe = GetChannel(a, 4, 4)
lrc = MixAudio(lrc, lfe, 1.0, 0.1870)
bl = GetChannel(a, 5)
br = GetChannel(a, 6)
sl = MixAudio(bl, br, 0.2291, 0.1323)
sr = MixAudio(bl, br, -0.1323, -0.2291)
blr = MergeChannels(sl, sr)
return MixAudio(lrc, blr, 1.0, 1.0)
}# 6 Channels FL, FR, C, LFE, SL, SR

The one that I personally use for 5.1 to 2.0 is Dmix5Dpl2() which is the good old Dolby Pro Logic II downmix function as referenced here (https://forum.doom9.org/showthread.php?t=170512&page=2) and here (https://forum.doom9.org/showthread.php?t=152034). I've been using it over the last decade and it's what I'm still using today. It sounds good when played back on a stereo system and it also sounds fine when it's hardware upmixed by my old Philips 5.1 and ok-ish when it's software upmixed via pipewire/wireplumber on my newer 5.1 (the one hooked up to my computer).
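As a quick sanity check on the 6-channel functions above, here is a minimal Python sketch (not part of the original scripts; the dictionary and the helper name are made up for illustration). It adds up the absolute gains that reach one output channel. When the total is 1.0, the downmix cannot clip even if every input channel peaks at full scale at the same moment.

```python
# Gains reaching one output channel, copied from the AviSynth functions above.
# The dictionary layout and the names are illustrative, not part of AviSynth.
DOWNMIX_GAINS = {
    "Dmix6Stereo":    [0.3694, 0.2612, 0.3694],          # front, center, surround
    "Dmix6Dpl":       [0.3205, 0.2265, 0.2265, 0.2265],  # front, center, SL, SR
    "Dmix6Dpl2":      [0.3254, 0.2301, 0.2818, 0.1627],  # front, center, SL, SR
    "Dmix6StereoLfe": [0.2929, 0.2071, 0.2071, 0.2929],  # front, center, LFE, surround
}


def worst_case_peak(gains):
    """Largest possible output peak when every input sits at full scale (1.0)."""
    return sum(abs(g) for g in gains)


for name, gains in DOWNMIX_GAINS.items():
    print(f"{name}: worst-case peak = {worst_case_peak(gains):.4f}")
```

Each of these four sets should come out at 1.0 within rounding, which suggests the functions were designed to be clip-free without needing a limiter afterwards.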

tormento

3rd June 2024, 17:48

fader pushes
What are fader pushes?

I am always so concerned about video that audio usually becomes just a conversion to a more comfortable format.

Do you mind explaining or writing a little audio guide about how to fix troublesome audio?

Emulgator

4th June 2024, 07:54

What are fader pushes?
During mixdown (editing/foley/mastering) a manual increase of a source's volume by pushing the mixing console's respective channel fader up for that event only, then back.
Depending on mixdown situation (trembling hands, drugged session, producer in backseat, executive officer expecting sound engineer to pin the audience into their seats) often overdone to highlight an exclamation, an effect, a scream, a train passing, anything.

The appended Vegas screenshots show the correction envelope of such a push:
First screenshot shows the correction envelopes for the complete movie, the lower 2.0 language track being WIP, so not finished yet.
Second Screenshot shows the enlarged cursor range: an orchestrated exclamation which was already played expressively by the orchestra, plus the actor was directed to shout at the top of his voice.
Loud enough even for a sleeping audience to "get the drift", but producer additionally wanted to see clipping VUs,
so the thing went overboard and became a nuisance to any seated audience.

BTW this thread is offtopic here in "Decryption" subforum, tebasuna51, you are welcome to move this to "Audio encoding" where it belongs.

tebasuna51

6th June 2024, 10:20

@miltk, this is the correct subforum to talk about your problem.

As you can see, there are different solutions.

Of course, editing the audio with Vegas, or the free Audacity, to correct any problem in the source can give the best result, but it needs good knowledge of the tools and a lot of work.
Maybe you want an easy way to solve "fallguy is just really muddy", i.e. low dialog volume.

About the ffmpeg methods:
ffmpeg -i input.audio.file -af "aformat=fltp,pan=stereo|FL=FL+FC+LFE+SL+BL|FR=FR+FC+LFE+SR+BR,alimiter" out.audio.file
ffmpeg -i "InputVideo.mkv" -y -map 0 -c:v copy -c:s copy -ac 2 -af dynaudnorm=f=150:b=1 -c:a aac "OutputVideo.mkv"

The first one does a problematic downmix: the front center channel (mainly dialog) ends up at very low volume, 1/5 of each output channel, so the problem gets bigger than with the default downmix.
The second one uses the default downmix (-ac 2) and then applies dynaudnorm, which for me still does not solve the low dialog volume.

We need a way to control the channel downmix, like the proposed AviSynth methods show. If you don't play back through a 5.1 system with a DPLII decoder, better use a simple one:

flr = GetChannel(a, 1, 2)
fcc = GetChannel(a, 3, 3)
slr = GetChannel(a, 5, 6)
lrc = MixAudio(flr, fcc, 0.37, 0.26)
MixAudio(lrc, slr, 1.0, 0.37)
Normalize()

With these coefficients the FC has more volume. You can even increase it (0.36) and lower the surround volume (0.27); the three coefficients must sum to 1.

With ffmpeg you can do it with:

ffmpeg -i 51.audio.file -af "pan=stereo|FL=0.37c0+0.36c2+0.27c4|FR=0.37c1+0.36c2+0.27c5" -c:a aac output.aac
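To put a number on how much the dialog gains from that change, here is a small Python sketch (the function name is made up for illustration). It compares the center channel level relative to the fronts for the standard weights (0.37 / 0.26) and for the enhanced ones (0.37 / 0.36).

```python
import math


def center_vs_front_db(front, center):
    """Level of the center channel relative to the front channels, in dB."""
    return 20 * math.log10(center / front)


standard = center_vs_front_db(0.37, 0.26)
enhanced = center_vs_front_db(0.37, 0.36)

print(f"standard mix: center at {standard:+.2f} dB relative to the fronts")
print(f"enhanced mix: center at {enhanced:+.2f} dB relative to the fronts")
print(f"dialog boost from the change: {enhanced - standard:+.2f} dB")
```

The boost works out to a little under 3 dB, so if the dialog is still buried you can push the center weight further, as long as the three weights keep summing to 1.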

tormento

7th June 2024, 08:19

The appended Vegas screenshots show the correction envelope of such a push
And you do that by hand? You are a hero.

hello_hello

7th June 2024, 12:00

About the ffmpeg methods:
ffmpeg -i input.audio.file -af "aformat=fltp,pan=stereo|FL=FL+FC+LFE+SL+BL|FR=FR+FC+LFE+SR+BR,alimiter" out.audio.file
ffmpeg -i "InputVideo.mkv" -y -map 0 -c:v copy -c:s copy -ac 2 -af dynaudnorm=f=150:b=1 -c:a aac "OutputVideo.mkv"

The first one does a problematic downmix: the front center channel (mainly dialog) ends up at very low volume, 1/5 of each output channel, so the problem gets bigger than with the default downmix.
The second one uses the default downmix (-ac 2) and then applies dynaudnorm, which for me still does not solve the low dialog volume.

The second one generally works quite well.
Obviously there's no reason why you can't adjust the channel volumes when down-mixing... increasing the center channel volume etc... which is fine if you're not also compressing, but if you are, adjusting the individual channel volumes doesn't achieve much because the compression flattens it all out anyway.

Just for fun, I created a couple of samples.

"DownMix Only.m4a" is down-mixed using:
-af "pan=stereo|FL=0.37c0+0.36c2+0.27c4|FR=0.37c1+0.36c2+0.27c5"

"DownMix & DynAudioNorm.m4a" is down-mixed and compressed with:
-af "pan=stereo|FL < 0.3694*FL + 0.2612*FC + 0.2612*BL + 0.2612*SL | FR < 0.3694*FR + 0.2612*FC + 0.2612*BR + 0.2612*SR","dynaudnorm=f=150:b=1"

I discovered -ac 2 doesn't reduce the volume while combining the channels to prevent clipping, so I switched to down-mixing with the pan filter instead.

The audio was piped to QAAC.
I ran an EBU R128 scan on both and losslessly adjusted them to the same volume (+-1dB). The "DownMix Only" sample is -22.27 LUFS and the compressed sample is -23.42 LUFS. The dialogue in the compressed sample is mostly louder, although admittedly there's a few places where it's very briefly lower due to a loud background noise causing the compressor to do its thing, but mostly it's louder.

DownMix Only.m4a (https://files.videohelp.com/u/210984/DownMix%20Only.m4a)
DownMix & DynAudioNorm.m4a (https://files.videohelp.com/u/210984/DownMix%20_%20DynAudioNorm.m4a)

tebasuna51

7th June 2024, 12:57

"DownMix & DynAudioNorm.m4a" is down-mixed and compressed with:
-af "pan=stereo|FL < 0.3694*FL + 0.2612*FC + 0.2612*BL + 0.2612*SL | FR < 0.3694*FR + 0.2612*FC + 0.2612*BR + 0.2612*SR","dynaudnorm=f=150:b=1"

Are you using a 7.1 source? Here we talk about a 5.1 source.

Applying dynamic range compression can help to avoid turning the volume up and down, if your audio equipment doesn't have a Night Mode, but it does not separate the dialog from the rest if it is applied over a standard downmix.

Of course it can be used after a proper downmix.

hello_hello

9th June 2024, 09:02

Are you using a 7.1 source? Here we talk about a 5.1 source.

The ffmpeg docs say it can be used for either a 5.1ch or 7.1ch source.
It's the same matrix I use for 5.1ch with the Matrix Mixer in foobar2000 as sometimes the surround channels are decoded as "side" and other times as "back". That problem mightn't apply to ffmpeg though as it might be smart enough to know where the surround channels are.

From memory, the Matrix Mixer is clever enough to only account for the channels in the source audio when deciding how much to reduce the volume to prevent clipping, rather than reducing it according to the channels included in the downmix.

https://imgur.com/aR0WDoNl.png

Applying dynamic range compression can help to avoid turning the volume up and down, if your audio equipment doesn't have a Night Mode, but it does not separate the dialog from the rest if it is applied over a standard downmix.

Of course it can be used after a proper downmix.

The trouble is, every "night mode" I've come across is horrible. They tend to react quite slowly so you can easily hear the volume dropping and then slowly increasing again.

I'm not sure what you mean by a standard vs proper downmix, but if you listen to the samples it's obvious the dialogue is louder for the sample with the Dynamic Audio Normalizer applied, even though an EBU R128 scan indicates both samples are pretty much the same volume.

tebasuna51

9th June 2024, 11:15

Yes, that is the standard downmix for 7.1, but for 5.1 it is:

FL < FL + 0.707*FC + SL | FR < FR + 0.707*FC + SR

which is the same as:

FL = 0.37*FL + 0.26*FC + 0.37*SL | FR = 0.37*FR + 0.26*FC + 0.37*SR
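The step from the "<" form to the "=" form is just a rescale so the gains sum to 1, which is how the ffmpeg documentation describes the "<" behaviour of the pan filter. A minimal Python sketch of that arithmetic (the function name is made up for illustration):

```python
def normalize_gains(gains):
    """Rescale gains so their absolute values sum to 1 (what pan's '<' is documented to do)."""
    total = sum(abs(g) for g in gains)
    return [g / total for g in gains]


# FL < FL + 0.707*FC + SL
print([round(g, 2) for g in normalize_gains([1.0, 0.707, 1.0])])  # [0.37, 0.26, 0.37]
```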

If the FC (dialog) has low volume compared with the rest before DRC, it has the same low relative volume after DRC, because the gain or attenuation is applied to everything.

If your audio equipment needs DRC (mine doesn't) or your Night Mode is bad, you can apply DRC to your taste, but the problem of sources with low dialog volume is solved by a mix with a high coefficient for the FC.
Of course that is my opinion.

tormento

13th June 2024, 21:45

Tiny OT: how to convert 2.1 (L+R+LFE) to proper 2.0?

richardpl

13th June 2024, 22:25

Use same command i already provided above, it works for 2.1 too.

FranceBB

14th June 2024, 08:24

Tiny OT: how to convert 2.1 (L+R+LFE) to proper 2.0?

I would probably just ignore the LFE channel.
After all LFE is not generally included in any downmix even when you're dealing with 5.1 -> 2.0, so it's not a big deal if you don't include it.

#Index
video=LWLibavVideoSource("video.mxf")
audio=LWLibavAudioSource("audio.mxf")
AudioDub(video, audio)

#Retrieve Left and Right only
GetChannels(1,2)

filler56789

14th June 2024, 09:24

tiny ot: How to convert 2.1 (l+r+lfe) to proper 2.0?

This line is necessary for fooling certain vBulletin's stupid default settings.

L’ = (2/3)*L + (1/3)*LFE
R’ = (2/3)*R + (1/3)*LFE

tormento

14th June 2024, 09:31

L’ = (2/3)*L + (1/3)*LFE
R’ = (2/3)*R + (1/3)*LFE
How does that translates in ffmpeg? [emoji28]

richardpl

14th June 2024, 14:57

That kills stereo dynamics, you should never ever use those silly mixing numbers.

tebasuna51

15th June 2024, 21:33

Play a 2.1 as if it were a 2.0; don't try to downmix it.

tormento

15th June 2024, 23:55

Play a 2.1 as if it were a 2.0; don't try to downmix it.
The problem is that just DTS supports 2.1 and my TV Box can't decode it. So, I have to encode in some other format.

filler56789

16th June 2024, 00:49

How does that translates in ffmpeg? [emoji28]

I don't know :)
I use ffmpeg.exe only for demuxing, remuxing, and as an audio decoder too. For simple audio processing, I prefer some standalone CLI tools, but for "complex" audio processing, I prefer sox, Goldwave, and last but not least, Avisynth. :)

filler56789

16th June 2024, 00:59

The problem is that just DTS supports 2.1 and my TV Box can't decode it. So, I have to encode in some other format.

AFAIR, the olde and goode AC-3 has always supported the 2.1 channel layout. :confused: So you can reencode your DTS stream to AC-3 without downmixing. :–|

tormento

16th June 2024, 08:37

AFAIR, the olde and goode AC-3 has always supported the 2.1 channel layout. :confused: So you can reencode your DTS stream to AC-3 without downmixing. :–|

I want a long term quality storage solution and my preferred format, now, is DDP and, unless I use ffmpeg that has a terrible DDP quality, official Dolby encoder doesn’t allow 2.1. Totalmedia allows a bit stranger configurations such as 6.1 but 2.1 is a no.

tebasuna51

16th June 2024, 09:17

Adobe Audition 2017 does not support EAC3 2.1 either; you need to go to AC3 with a high bitrate.

Directly to AC3 2.1 (select the bitrate to your taste):

ffmpeg -i input2.1.dts -ab 256k output2.1.ac3

Converted to 2.0 with your downmix (select the quality to your taste):

ffmpeg -i input2.1.dts -af "pan=stereo|FL<c0+.707c2|FR<c1+.707c2" -acodec aac -aq 1.0 output2.0.aac

or let ffmpeg do its default downmix (LFE ignored, recommended):

ffmpeg -i input2.1.dts -ac 2 -acodec aac -aq 1.0 output2.0.aac

EDIT: If you still want to preserve the full, untouched 2.1 source, you can convert it to 3.1 with an empty center channel, which is supported by the Adobe Audition 2017 and TotalMedia (3/0+LFE) eac3 encoders:

ffmpeg -i input2.1.dts -af "pan=3.1|FL=c0|FR=c1|FC=0.0c2|LFE=c2" -acodec pcm_s24le output3.1.wav

and afterwards encode it to DDP

richardpl

16th June 2024, 10:02

Good luck with SoX and bloody AVS userbase. They are most prideful of all.

tebasuna51

16th June 2024, 10:11

Good luck with SoX and bloody AVS userbase. They are most prideful of all.
Seems you want to be banned.

Is that what you want?

richardpl

16th June 2024, 10:18

I speak the Truth, do what you please, wannabe "tyrant" moderator.

tormento

16th June 2024, 12:20

ffmpeg -i input2.1.dts -af "pan=3.1|FL=c0|FR=c1|FC=0.0c2|LFE=c2" -acodec pcm_s24le output3.1.wav
Thank you!

FranceBB

16th June 2024, 13:32

I speak the Truth

No community is flawless. That includes your beloved FFMpeg community where I've seen multiple people being insulted badly for asking relatively innocent questions and then leaving for good. I, myself, despite still opening tickets when I find bugs, have also been insulted in the past.

Think what you want, but from what I've seen over the years (2006-2024) first as a "lurker" and then as an active user, the Avisynth community is much more respectful and I made a lot of friends on Doom9. Tormento, the poster here, another Italian, is a real life friend too and we talk a lot outside of Doom9, just to name one. :)

At the end of the day, we're all normal people trying to help each other and support the various open source projects we believe in. Those projects share a lot of code anyway, so we're helping each other out for better or worse. The sooner you'll realize that the better you're gonna feel about being here. ;)

hellgauss

18th June 2024, 16:53

I use the following method with ffmpeg, which consists of 1) volume analysis and 2) final encoding/downmix

1)

For standard downmix

ffmpeg -i "source_audio_file" -ac 2 -f s16le pipe: | ffmpeg -f s16le -i pipe: -af "volumedetect" -f null - 2>test_volume.txt

For custom downmix

ffmpeg -i "source_audio_file" -ac 2 -lfe_mix_level *** -center_mix_level *** -surround_mix_level *** -f s16le pipe: | ffmpeg -f s16le -i pipe: -af "volumedetect" -f null - 2>test_volume.txt

Default values are:
-lfe_mix_level = 0.0 (usually subwoofer is not downmixed)
-center_mix_level = 0.707 (increment this if you cannot hear dialogues)
-surround_mix_level = 0.707

Then a .txt file is produced. Look for the string
max_volume: -[x] dB
[x] says how much you have to raise the audio, in dB

Calculate
[r] = 10 ^ ( [x] / 20 ) (round down the result)
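If you'd rather not do that by hand, here is a minimal Python sketch of the same calculation. The function name, the `headroom_db` parameter and the sample log line are made up for illustration; only the `max_volume: ... dB` text is taken from the volumedetect output described above.

```python
import math
import re


def gain_from_volumedetect(log_text, headroom_db=0.0):
    """Return the linear factor [r] from the 'max_volume' line of a volumedetect log."""
    match = re.search(r"max_volume:\s*(-?\d+(?:\.\d+)?)\s*dB", log_text)
    if match is None:
        raise ValueError("no max_volume line found")
    x = -float(match.group(1)) - headroom_db   # dB the audio can be raised by
    r = 10 ** (x / 20)
    return math.floor(r * 1000) / 1000         # round down to 3 decimals


# Made-up example line in the style of the volumedetect report.
sample_log = "[Parsed_volumedetect_0 @ 0x0] max_volume: -6.0 dB"
print(gain_from_volumedetect(sample_log))                    # 1.995
print(gain_from_volumedetect(sample_log, headroom_db=0.5))   # a bit lower, for lossy codecs
```

Rounding down rather than to nearest keeps the result on the safe side of full scale.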

---------------------------

2)

ffmpeg -i "source_audio_file" -c:a flac -compression_level 12 -ac 2 -rmvol [r] -lfe_mix_level *** -center_mix_level *** -surround_mix_level *** out.flac

Of course you can use your preferred audio codec. If the codec is lossy, I suggest subtracting 0.1 to 0.5 from [x] in order to avoid saturation from transcoding. Also, for better precision, I suggest to use -rmvol parameter in the final mix together with -mix_level parameters, without any intermediate files.

tebasuna51

19th June 2024, 10:33

@hellgauss

Yes, since ffmpeg can't do what the AviSynth Normalize() does, we need two passes to obtain the same result.

1)
The standard downmix -ac 2 is the same as using the defaults: -lfe_mix_level 0.0 -center_mix_level 0.707 -surround_mix_level 0.707
And it is also the same as using:

ffmpeg -i "source" -af "pan=stereo|FL<c0+.707c2+.707c4|FR<c1+.707c2+.707c5, volumedetect" -f null - 2> test_volume.txt

2)
After that you can do the second pass to maximize peaks:

ffmpeg -i "source" -af "pan=stereo|FL<c0+.707c2+.707c4|FR<c1+.707c2+.707c5, volume=[r]" out.flac

Of course I recommend using a lower coefficient for -surround_mix_level, maybe 0.5, or other coefficients to your taste (use them in step 1 as well)

hellgauss

19th June 2024, 12:31

When I studied the issue, I did not find a precise reference for the pan audio filter's usage, so I used my method.

Are those numbers the downmix coefficients for pan? If so, it is not guaranteed that saturation is avoided in the first pass.

Perhaps < instead of = avoid saturation? Therefore an extra pass is needed.

Furthermore I do not know if the usage of < is the same for FL and FR. I prefer that they are normalized together and not independently.

For checking the true coefficient that ffmpeg use in my command line try

ffmpeg -i "source_audio" -ac 2 -t 1 -v debug -f null - 2>test.txt

If you wish, you can add rmvol and the mix_level options. In the output you will find lines like these, e.g. for a 7.1 input:

FL: FL:0.320377 FR:0.000000 FC:0.226541 LFE:0.000000 BL:0.226541 BR:0.000000 SL:0.226541 SR:0.000000
FR: FL:0.000000 FR:0.320377 FC:0.226541 LFE:0.000000 BL:0.000000 BR:0.226541 SL:0.000000 SR:0.226541

As you can see, each line sums to 1, so overflow is not possible. You can safely play with the mix_levels: the rows keep summing to 1 (rmvol=1) until you explicitly add the rmvol parameter.
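For anyone who wants to check the row sums of their own debug output, here is a minimal Python sketch. The function name is made up for illustration, and the two lines are copied from the 7.1 example above.

```python
def row_sum(matrix_line):
    """Sum the gains in one debug line such as 'FL: FL:0.32 FR:0.00 ...'."""
    _label, _, gains = matrix_line.partition(": ")   # drop the output-channel label
    return sum(float(pair.split(":")[1]) for pair in gains.split())


debug_lines = [
    "FL: FL:0.320377 FR:0.000000 FC:0.226541 LFE:0.000000 BL:0.226541 BR:0.000000 SL:0.226541 SR:0.000000",
    "FR: FL:0.000000 FR:0.320377 FC:0.226541 LFE:0.000000 BL:0.000000 BR:0.226541 SL:0.000000 SR:0.226541",
]

for line in debug_lines:
    print(line[:2], round(row_sum(line), 4))
```

A row total above 1 would mean the downmix can exceed full scale on loud passages.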

tebasuna51

19th June 2024, 13:55

Perhaps < instead of = avoid saturation?
Yes:
pan=stereo|FL<c0+.707c2+.707c4|FR<c1+.707c2+.707c5

is the same as
pan=stereo|FL=.414c0+.293c2+.293c4|FR=.414c1+.293c2+.293c5

so no clipping can occur.
I use the < option to better show the relation with the -X_mix_level parameters in your syntax.
Of course the output is the same as with your syntax; it is just another option, using the syntax already used in this thread.

Furthermore I do not know if the usage of < is the same for FL and FR. I prefer that they are normalized together and not independently.

I can't understand this: each channel must have the same mix or you lose the balance.

The volumedetect filter shows the maximum amount you can amplify both channels without clipping in either of them.

hellgauss

19th June 2024, 14:44

Thank you for your explanation, now it is clear. Probably I had some confusion when I studied the issue.

I still prefer my method since the same command line works on both 5.1 and 7.1 and you don't have to check the channel order; it is more "user friendly". Furthermore I do not know how two concatenated filters work in ffmpeg, and I prefer to use options rather than filters. Theoretically, if an intermediate output is stored in memory, you can lose 1-2 bits of information with those parameters (see final comment in my first post).

tebasuna51

20th June 2024, 00:11

Of course each user can use the preferred syntax.

...Also, for better precision, I suggest to use -rmvol parameter in the final mix together with -mix_level parameters, without any intermediate files.

But with filters there are no intermediate files, the operations work on float samples, and I doubt there is any appreciable difference in precision.

fa1rid

23rd June 2024, 03:52

hellgauss

23rd June 2024, 09:31

For a night mode use -ac 2 (standard downmix) without any further option and turn the amplifier wheel counterclockwise as much as you can. If, by doing so, you cannot hear dialogues, try adding e.g. -center_mix_level 0.9 . Do your own tests, there is no "proper way", except the "standard".

If your speakers are very good at low frequencies, try adding e.g. -lfe_mix_level 0.4 (subwoofer). Again, do your own tests and find which is better for you, for that movie, for your speakers and for your neighbours. Note that, according to standards, the downmix of the lfe channel can have some small side effects (but I did not notice that).

The "2pass" is needed to normalize the result with the -rmvol parameter, which simply "raises the overall volume", which is usually very low after downmix. By default rmvol=1.0 (no volume change). Of course you can raise the volume on your amplifier, but it is better to exploit the full sample range for better bit-depth precision. Since you cannot know a priori what the right -rmvol parameter is, you need a first pass of volume analysis.

Note: the -compression_level 12 in my example is not related to downmix but it is a flac switch for "best bitrate compression".

tebasuna51

23rd June 2024, 12:23

Can someone please simplify this discussion and summarize what is the proper way to downmix (7.1 or 5.1) to 2.0?

Maybe with two settings, one normal (keep sound levels and loudness same as the original), and one for night mode (where dialog is more clear)? Is that right?

1) Don't confuse 'Night Mode' (applying Dynamic Range Compression to reduce the difference between low and high volume) with 'clearer dialog' (boosting the Front Center channel, where the dialog is, to hear it better when necessary).

The 'Night Mode' filter (or the fixed -af dynaudnorm=f=150:b=1) can be applied, or not, over a standard downmix or over an FC-enhanced downmix.

2) Sorry, there is no proper way for all sources and all audio equipment. Normally all players can do a standard downmix if the source is 7.1 or 5.1 and you only have 2.0 audio equipment, so you only need to act when you have problems listening to the standard downmix.

BTW, here are the 'standard downmix' and an 'FC enhanced' one, but any intermediate coefficients may be better for your source or audio equipment:

ffmpeg -i "source51" -af "pan=stereo|FL<c0+.707c2+.707c4|FR<c1+.707c2+.707c5, volumedetect" -f null - 2> standard_volume.txt

ffmpeg -i "source51" -af "pan=stereo|FL<c0+.9c2+.5c4|FR<c1+.9c2+.5c5, volumedetect" -f null - 2> FCenhanced_volume.txt

ffmpeg -i "source71" -af "pan=stereo|FL<c0+.707c2+.707c4+.707c6|FR<c1+.707c2+.707c5+.707c7, volumedetect" -f null - 2> standard_volume.txt

ffmpeg -i "source71" -af "pan=stereo|FL<c0+.9c2+.3c4+.3c6|FR<c1+.9c2+.3c5+.3c7, volumedetect" -f null - 2> FCenhanced_volume.txt

After that you can change the ', volumedetect' to ', volume=[r]' to normalize, and/or add ', dynaudnorm=f=150:b=1' to apply DRC if you want it and your player doesn't have a proper 'Night Mode'.
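To see what the "<" in the pan expressions does to the numbers, here is a small sketch of the renormalization, assuming (as the ffmpeg pan filter documentation describes) that the gains of one output channel are rescaled so their total is 1. The helper name is invented, and the sketch only covers positive gains like the ones used above.

```python
def normalize_pan_gains(gains):
    """Rescale the gains of one output channel so that they sum to 1.

    Hypothetical helper; assumes every gain is positive.
    """
    total = sum(gains)
    return [g / total for g in gains]

# 5.1 -> stereo, left output: front (c0), center (c2), surround (c4).
standard = normalize_pan_gains([1.0, 0.707, 0.707])
fc_enhanced = normalize_pan_gains([1.0, 0.9, 0.5])

for name, gains in (("standard", standard), ("FC enhanced", fc_enhanced)):
    front, center, surround = gains
    print(f"{name}: front={front:.3f} center={center:.3f} surround={surround:.3f}")
```

With these numbers the front gain is nearly the same in both cases (about 0.414 and 0.417). What changes is the center, which goes from about 0.293 to 0.375, and the surround, which drops from about 0.293 to about 0.208.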

fa1rid

27th June 2024, 14:04

I encountered a puzzling issue while performing some test encodes. I encoded a 5.1 audio file to 2.0 using ffmpeg -ac 2. There were three test encodes: one to AAC, one to FLAC, and one to AC3. Interestingly, both the AAC and AC3 encodes had much louder volume compared to the FLAC, which was similar to the source. I verified this by checking the waveform in Avisynth and Audacity, and it was significantly larger in the AAC and AC3 files compared to the FLAC.

I conducted another test, encoding to AAC without downmixing, and found the audio level was similar to the source.

Why is FFMPEG adding gain when encoding and downmixing to AAC and AC3 only? The audio source was a Fraunhofer 5.1 HE-AAC mp4 file.

Commands used:

ffmpeg -i "AAC 5.1.mp4" -vn -ac 2 "5.1 to 2.0 -ac2.flac"
ffmpeg -i "AAC 5.1.mp4" -vn -b:a 256k -ac 2 "5.1 to 2.0 -ac2.m4a"
ffmpeg -i "AAC 5.1.mp4" -vn -b:a 256k -ac 2 "5.1 to 2.0 -ac2.ac3"

hellgauss

27th June 2024, 18:28

That's because the default aac and ac3 encoders in ffmpeg accept float32 input, so the overflow risk is not catastrophic (although I would not recommend it anyway). The downmix coefficients are not rescaled to sum=1.

You can somewhat "restore" a saturated aac by opening it in software that accepts f32 input, such as Audacity, and applying a negative gain: you will not see the classic "plateau" of saturated audio.

To overcome the problem, either:
- Encode to an intermediate wav (after adding the proper gain with rmvol), then encode to aac or ac3
- If you encode to aac, try an ffmpeg build with libfdk_aac support and add "-c:a libfdk_aac" to the command line. libfdk works with i16, so the downmix coefficients are properly rescaled, and it works better than the standard aac codec in ffmpeg.
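The difference between float and integer samples can be illustrated with a toy example. This is plain Python arithmetic, not what ffmpeg or any encoder does internally; the sample values, the gain and the helper name are invented for illustration.

```python
def to_int16(sample):
    """Convert a float sample (nominal range -1.0..1.0) to int16, hard-clipping."""
    scaled = round(sample * 32767)
    return max(-32768, min(32767, scaled))

# Float samples after an unscaled downmix; three of them exceed full scale.
loud = [0.5, 1.2, 1.6, -1.4]
gain = 0.5  # roughly -6 dB

# Float path: the overshoot is still stored, so a negative gain recovers the shape.
restored_float = [s * gain for s in loud]

# Integer path: the peaks are clipped first, so the gain only shrinks the plateau.
restored_int = [to_int16(s) * gain / 32767 for s in loud]

print(restored_float)
print([round(v, 3) for v in restored_int])
```

In the float path the two positive peaks stay distinct after the gain, while in the integer path they collapse to the same value, which is the "plateau" mentioned above.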

hello_hello

27th June 2024, 20:28

That's because the default aac and ac3 encoders in ffmpeg accept float32 input, so the overflow risk is not catastrophic (although I would not recommend it anyway). The downmix coefficients are not rescaled to sum=1.

So sometimes it reduces the volume to ensure there's no clipping and sometimes it doesn't?
That makes me feel better, because in post #13 I mentioned that -ac 2 can produce peaks above 0dB, yet today I read the lines from your ffmpeg log showing it doesn't.
It's probably because at the time ffmpeg's output was being piped to QAAC as 32 bit float for encoding, but I assumed it'd always downmix the same way, volume-wise.

You can somewhat "restore" a saturated aac by opening it in software that accepts f32 input, such as Audacity, and applying a negative gain: you will not see the classic "plateau" of saturated audio.

I recall testing a few encoders to see what they'd do with peaks above 0dB. I can't remember which ones I tried as it was a fair while ago, but I do remember QAAC faithfully encoding peaks quite a ways above 0dB. I know ideally you wouldn't want to do that, but it's nice to have some margin for error, because....

Are you aware the volume of AAC audio can be adjusted losslessly as it can be for MP3? The same limitation of 1.5 dB increments still applies, but it's better than having to decode and re-encode.
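A quick sketch of what the 1.5 dB increment means in practice: a wanted gain has to be rounded to a whole number of steps. The step size is taken from the post above; the function name is invented, and rounding down (so the applied gain never exceeds the requested one) is my own assumption.

```python
import math

STEP_DB = 1.5  # increment size quoted in the post above

def lossless_gain_steps(wanted_db):
    """Return (steps, applied_db) for the largest step gain not above wanted_db."""
    # floor() rounds toward minus infinity, so attenuation is never too small
    # and amplification is never too large.
    steps = math.floor(wanted_db / STEP_DB)
    return steps, steps * STEP_DB

print(lossless_gain_steps(-4.0))  # (-3, -4.5)
print(lossless_gain_steps(2.0))   # (1, 1.5)
```

Rounding down is the cautious choice when the goal is to get rid of peaks above 0 dB; rounding to the nearest step would get closer to the target but could leave a little overshoot.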

fa1rid

28th June 2024, 11:41

As @hello_hello mentioned, -ac 2 can produce peaks above 0dB. Is that fine or not? I think it would play fine, but can it cause issues in some cases?
In this case I guess we shouldn't use the default ffmpeg -ac 2?

@hellgauss you mentioned using -ac 2 but turning the amplifier wheel counterclockwise as much as I can; how do I do that?

Should I just use what @tebasuna51 suggested:

-af "pan=stereo|FL<c0+.707c2+.707c4|FR<c1+.707c2+.707c5, volumedetect"
then
-af "pan=stereo|FL<c0+.707c2+.707c4|FR<c1+.707c2+.707c5, volume=[r]" <-- [r] means replace with the value we get from volumedetect right?

in that formula we have the coefficients with a sum>1, is that fine?
Can I add a very small amount of LFE, like 0.2? Because maybe 0.4 could be too high?

So the formula used by ffmpeg seems to be different if we are using 32bit or not?

hellgauss

28th June 2024, 13:52

@fa1rid

The fastest way to achieve a "night mode" is to lower the volume on the amplifier. Some amplifiers have a "smart", i.e. dynamic, "night mode", which of course can be baked into the audio file via a proper filter, but I would not suggest that: you would end up with a file optimized only for night viewing.

It is better to achieve that by post processing, i.e. using your player/amplifier switches.

I'm not familiar with the syntax proposed by tebasuna51, but as I understand it, using "<" instead of "=" ensures that those coefficients are rescaled to sum=1, whatever the output format (integer or float). Also, I do not know whether the channel order is always the same for every input file.

I prefer my method since I only have to deal with 2-3 numbers: -center_mix_level to raise the voice if it seems too low, -lfe_mix_level if I want to play with the subwoofer channel, and -surround_mix_level if I want to emphasize the rear channels. When going away from the standard, I do it at my own risk.

Since I only use flac or libfdk_aac to encode to stereo, I was not aware (my fault) that ffmpeg sometimes does not rescale those coefficients. Technically, you cannot saturate a float32 sample, except with astronomical gain, so ffmpeg probably thought it was not necessary. However, it is better if float32 audio can be converted to standard integer formats, so peaks >0 dB are not desirable.

If your codec needs float input, either use tebasuna51's method, or use an intermediate i16/i24 wav/flac, or use pipe syntax that enforces an integer conversion in the workflow.