Oversound: Mixing Italian narrator on English Full Mix track

FranceBB · 17th January 2024, 22:03

Hi there guys,
I have a documentary with two files:

1) Video + audio in English (stereo full mix)
2) audio in Italian (mono, narrator only)

I'm trying to mix the Italian narrator over the English stereo full mix track to create an oversound track.

I've been doing this:

Quote:

#Indexing the English Full Mix
video=LWLibavVideoSource("EarthAmazingDay.mxf")
audio=LWLibavAudioSource("EarthAmazingDay.mxf", stream_index=1)
AudioDub(video, audio)

#Indexing the Italian Narrator
Narrator=WAVSource("EarthAmazingDay.wav").GetChannel(1).ResampleAudio(48000).ConvertAudioTo24bit()

#Retrieve the Stereo CH.1-2
Left=GetChannel(1)
Right=GetChannel(2)

#Create the two stereo
Stereo_Original=MergeChannels(Left, Right)
Stereo_Narrator_Only=MergeChannels(Narrator, Narrator)

#Mix Italian Narrator over the English Full Mix
MixAudio(Stereo_Original, Stereo_Narrator_Only, 0.20, 0.80)

I do get an oversound in output and, as you can see, I'm mixing 20% of the English Full Mix with 80% of the Italian Narrator to create it.
The reason is that if I raise the English Full Mix percentage, I have to lower the Italian Narrator percentage, and that's a problem because the English voices become too loud and people can't understand the Italian translation anymore.
On the other hand, if I keep 20% English Full Mix and 80% Italian Narrator, then when nobody is talking and there are only music and effects, those are low, very low, because the whole mix has been turned down.
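To put numbers on this: AviSynth's MixAudio scales each clip by a linear factor and sums them, so the 20/80 split can be read in dB. The sketch below is plain Python rather than AviSynth, with hypothetical function names, and assumes simple float sample lists. It shows that 0.20 means roughly -14 dB on the English Full Mix at all times, music-only passages included, while 0.80 is only about -1.9 dB on the narrator.

```python
import math


def factor_to_db(factor: float) -> float:
    """Convert a linear mix factor (as passed to MixAudio) to a gain in dB."""
    return 20.0 * math.log10(factor)


def static_mix(a: list[float], b: list[float], fa: float, fb: float) -> list[float]:
    """Weighted sum a*fa + b*fb, with the same factors applied at every sample."""
    if len(a) != len(b):
        raise ValueError("both inputs must have the same number of samples")
    return [x * fa + y * fb for x, y in zip(a, b)]


if __name__ == "__main__":
    print(f"English Full Mix gain: {factor_to_db(0.20):.1f} dB")   # -14.0 dB
    print(f"Italian narrator gain: {factor_to_db(0.80):.1f} dB")   # -1.9 dB
    # A music-only passage: the narrator is silent, yet the M&E is still ~14 dB down.
    print(static_mix([1.0, -0.5], [0.0, 0.0], 0.20, 0.80))
```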

The idea here would be to lower the English Full Mix when the Italian narrator is talking and leave it as-is when the Italian narrator isn't talking (i.e. when there are only music and effects).
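What's being described is a gain on the English Full Mix that depends on whether the narrator is active. A minimal Python sketch of that idea, assuming a block-wise RMS threshold as the activity detector (the function names, block size, threshold and gain values are illustrative assumptions, not values from this thread):

```python
import math


def block_rms_db(block: list[float]) -> float:
    """RMS level of a block in dBFS (-inf for digital silence)."""
    if not block:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in block) / len(block))
    return 20.0 * math.log10(rms) if rms > 0 else -math.inf


def english_gains(narrator: list[float], block: int, threshold_db: float,
                  talking_gain: float, idle_gain: float) -> list[float]:
    """One gain per block for the English Full Mix: talking_gain while the
    narrator is above threshold_db, idle_gain (e.g. 1.0 = as-is) otherwise."""
    gains = []
    for start in range(0, len(narrator), block):
        active = block_rms_db(narrator[start:start + block]) > threshold_db
        gains.append(talking_gain if active else idle_gain)
    return gains


if __name__ == "__main__":
    narrator = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5]   # silence, then "speech"
    print(english_gains(narrator, block=4, threshold_db=-40.0,
                        talking_gain=0.2, idle_gain=1.0))   # [1.0, 0.2]
```

Switching the gain hard at block boundaries like this would be audible as pumping or clicks; smoothing the gain changes over time is what the attack and release of a compressor or ducker take care of.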

I also tried a more radical approach, using SoX to filter out the English narrator entirely, but it led to some very poor and questionable results:

Quote:

#Indexing the English Full Mix
video=LWLibavVideoSource("EarthAmazingDay.mxf")
audio=LWLibavAudioSource("EarthAmazingDay.mxf", stream_index=1)
AudioDub(video, audio)

#Indexing the Italian Narrator
Narrator=WAVSource("EarthAmazingDay.wav").GetChannel(1).ResampleAudio(48000).ConvertAudioTo24bit()

#Create a M&E only track from the English Stereo in CH.1-2
filtered = soxfilter("sinc 20-20000")
music_and_effects=mixaudio(filtered.GetLeftChannel(),filtered.GetRightChannel(),0.794,-0.794)

#Create the two stereo
Stereo_No_Narrator=MergeChannels(music_and_effects, music_and_effects)
Stereo_Narrator_Only=MergeChannels(Narrator, Narrator)

#Mix Italian Narrator on the filtered M&E track
MixAudio(Stereo_No_Narrator, Stereo_Narrator_Only, 0.40, 0.60)

The idea was that by taking the difference between left and right in the original mix I would get only the music and effects, because the English narrator is mono (identical in both channels) and would therefore cancel out.
Still, this is not always the case: not only can this approach fail, but the whole mix becomes mono, which is a no-go...
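The cancellation argument can be checked with a few samples. This Python sketch mimics what SoX's `oops` effect does (SoX documents it as equivalent to `remix 1,2i 1,2i`, i.e. L minus R in both output channels); the function name and sample values are made up for illustration. Anything identical in L and R disappears, the mono narrator and a mono effect alike, and the two output channels come out identical, so the result is mono. It also prints the dB value of the 0.794 factor used with mixaudio.

```python
import math


def oops(left: list[float], right: list[float]) -> tuple[list[float], list[float]]:
    """Out-of-phase stereo: both output channels carry L - R (twin mono)."""
    diff = [l - r for l, r in zip(left, right)]
    return diff, list(diff)


if __name__ == "__main__":
    narrator = [0.3, -0.2, 0.1]     # mono narrator: identical in L and R
    music_l = [0.5, 0.4, -0.1]      # stereo music: L and R differ
    music_r = [-0.2, 0.1, 0.3]
    mono_fx = [0.25, 0.25, -0.25]   # a mono effect: also identical in L and R
    left = [n + m + f for n, m, f in zip(narrator, music_l, mono_fx)]
    right = [n + m + f for n, m, f in zip(narrator, music_r, mono_fx)]
    out_l, out_r = oops(left, right)
    print([round(x, 6) for x in out_l])   # [0.7, 0.3, -0.4]: only music_l - music_r survives
    print(out_l == out_r)                 # True: the result is mono
    print(f"0.794 = {20 * math.log10(0.794):.2f} dB")
```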

So... what do I do?
The goal is to take CH.1-2 (the English Full Mix) and put the Italian Narrator on top of it in a decent way.

Emulgator · 19th January 2024, 00:30

Ducking it is.
Compression sidechained.

A: The cheapo way: No sidechaining: Mix both and compress the mix.
The higher the narrator's level, the more the sum of both will be dominated by him while he talks.
Not good if the narrator is uncompressed and pops in and out.
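Option A in code: a hedged Python sketch, not any particular plugin's compressor. It uses a static compressor curve evaluated per block (the function names, block size, threshold and ratio are assumed values) and no attack/release smoothing, just to show the principle: everything is summed first, so the English mix only comes down as a side effect of the sum getting louder.

```python
import math


def gain_reduction_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Static compressor curve: above the threshold the output only rises
    1 dB for every `ratio` dB of input, i.e. reduction = over * (1 - 1/ratio)."""
    over = level_db - threshold_db
    return -over * (1.0 - 1.0 / ratio) if over > 0 else 0.0


def mix_then_compress(english: list[float], narrator: list[float], block: int = 480,
                      threshold_db: float = -20.0, ratio: float = 4.0) -> list[float]:
    """Sum both sources, then compress the sum block by block (no sidechain)."""
    mixed = [e + n for e, n in zip(english, narrator)]
    out: list[float] = []
    for start in range(0, len(mixed), block):
        chunk = mixed[start:start + block]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        level_db = 20.0 * math.log10(rms) if rms > 0 else -math.inf
        gain = 10.0 ** (gain_reduction_db(level_db, threshold_db, ratio) / 20.0)
        out.extend(s * gain for s in chunk)
    return out


if __name__ == "__main__":
    # 12 dB over a -20 dB threshold at 4:1 -> 9 dB of gain reduction.
    print(gain_reduction_db(-8.0, -20.0, 4.0))
```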

B: The middle-of-the-road way: Sidechaining:
The Italian narrator generates a gain-reduction control voltage, which is applied to "duck" the English source before mixing.
More dominance, but definitely not good if the narrator is uncompressed and pops in and out.
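Option B as a Python sketch (hypothetical names; the threshold, depth, attack and release values are assumptions). The narrator is only used as the key: a decaying peak detector follows it, and the resulting gain, smoothed with attack and release, is applied to the English Full Mix before the narrator is added on top.

```python
import math


def one_pole(ms: float, sample_rate: int) -> float:
    """Coefficient of a one-pole smoother with a time constant of `ms` milliseconds."""
    return math.exp(-1.0 / (sample_rate * ms / 1000.0))


def sidechain_duck(english: list[float], narrator: list[float], sample_rate: int = 48000,
                   threshold_db: float = -40.0, depth_db: float = -12.0,
                   attack_ms: float = 10.0, release_ms: float = 300.0) -> list[float]:
    """Duck `english` with `narrator` as the key, then add the narrator."""
    thr = 10.0 ** (threshold_db / 20.0)
    low = 10.0 ** (depth_db / 20.0)
    a_att = one_pole(attack_ms, sample_rate)
    a_rel = one_pole(release_ms, sample_rate)
    env, gain, out = 0.0, 1.0, []
    for e, n in zip(english, narrator):
        # Peak detector decaying at the release rate, so the zero crossings
        # of the narrator's waveform don't release the duck.
        env = max(abs(n), env * a_rel)
        target = low if env > thr else 1.0
        # Fast towards the ducked level (attack), slow back to unity (release).
        coeff = a_att if target < gain else a_rel
        gain = coeff * gain + (1.0 - coeff) * target
        out.append(e * gain + n)
    return out


if __name__ == "__main__":
    sr = 1000
    english = [1.0] * 2000
    narrator = [0.0] * 1000 + [0.5] * 1000
    y = sidechain_duck(english, narrator, sample_rate=sr)
    print(y[999], y[-1])
```

With the narrator silent, the English mix passes through at unity gain, which is the music-and-effects behaviour asked for in the opening post.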

C: The correct way: Sidechaining:
The Italian narrator is precompressed and then generates the gain-reduction control voltage, which is applied to "duck" the English source before mixing.
Nicer, balanced dominance.
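The extra step in option C is compressing the narrator itself before it drives the duck. A Python sketch of that first stage, assuming a static per-block curve with illustrative threshold, ratio and make-up gain (all hypothetical values): two phrases 20 dB apart end up 5 dB apart with a 4:1 ratio, so the key (and the voice in the mix) no longer pops in and out.

```python
import math


def precompress(narrator: list[float], block: int = 480, threshold_db: float = -30.0,
                ratio: float = 4.0, makeup_db: float = 6.0) -> list[float]:
    """Even out the narrator's level before it is used as the sidechain key.
    Static, block-wise gain for clarity; a real compressor smooths the gain."""
    out: list[float] = []
    for start in range(0, len(narrator), block):
        chunk = narrator[start:start + block]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        level_db = 20.0 * math.log10(rms) if rms > 0 else -math.inf
        over = level_db - threshold_db
        reduction_db = -over * (1.0 - 1.0 / ratio) if over > 0 else 0.0
        gain = 10.0 ** ((reduction_db + makeup_db) / 20.0)
        out.extend(s * gain for s in chunk)
    return out


def rms_db(x: list[float]) -> float:
    """RMS level of a non-silent signal in dBFS."""
    return 20.0 * math.log10(math.sqrt(sum(s * s for s in x) / len(x)))


if __name__ == "__main__":
    quiet = [10 ** (-30 / 20)] * 480     # a phrase at -30 dBFS
    loud = [10 ** (-10 / 20)] * 480      # a phrase at -10 dBFS: 20 dB apart
    y = precompress(quiet + loud)
    print(f"spread after: {rms_db(y[480:]) - rms_db(y[:480]):.1f} dB")   # 5.0 dB
```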

D: Can be refined further (offline only) by introducing a negative control delay (look-ahead).
Now the English base sound is attenuated a bit before the narrator comes in.
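A negative control delay simply means the gain computer reads the key signal ahead of time. In this Python sketch (hypothetical names and parameter values) the narrator key is shifted earlier by `lookahead_ms` before the ducking gain is computed; because it needs future samples, it only works offline, as noted.

```python
import math


def lookahead_duck_gain(key: list[float], sample_rate: int, lookahead_ms: float,
                        threshold_db: float = -40.0, depth_db: float = -12.0,
                        attack_ms: float = 10.0, release_ms: float = 300.0) -> list[float]:
    """Gain curve for the English Full Mix, driven by a key read ahead of time."""
    shift = int(sample_rate * lookahead_ms / 1000.0)
    advanced = key[shift:] + [0.0] * min(shift, len(key))   # same length as key
    thr = 10.0 ** (threshold_db / 20.0)
    low = 10.0 ** (depth_db / 20.0)
    a_att = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env, gain, gains = 0.0, 1.0, []
    for k in advanced:
        env = max(abs(k), env * a_rel)          # decaying peak detector on the key
        target = low if env > thr else 1.0
        coeff = a_att if target < gain else a_rel
        gain = coeff * gain + (1.0 - coeff) * target
        gains.append(gain)
    return gains


if __name__ == "__main__":
    sr = 1000
    narrator = [0.0] * 500 + [0.5] * 500      # narrator starts at t = 0.5 s
    plain = lookahead_duck_gain(narrator, sr, lookahead_ms=0.0)
    early = lookahead_duck_gain(narrator, sr, lookahead_ms=100.0)
    # Gain on the English mix just before the first narrator sample:
    print(round(plain[499], 3), round(early[499], 3))   # 1.0 0.251
```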

FranceBB · 19th January 2024, 21:34

Ok, so, for the first program, the documentary, using "oops" (namely the difference between the left and right channels) worked like a charm, because the English narrator was mono while the music was stereo, so it produced a nice Music and Effects-only track. Effectively, this is what I did:

Code:

#Indexing
video=LWLibavVideoSource("EarthAmazingDay.mxf.mxf")
audio=LWLibavAudioSource("EarthAmazingDay.mxf.mxf")
AudioDub(video, audio)

#Narrator
Narrator=WAVSource("EarthAmazingDay.mxf.wav").ResampleAudio(48000).ConvertAudioTo24bit()

#M&E
music_and_effects=soxfilter("oops")


#Stereo
Stereo_No_Narrator=MergeChannels(music_and_effects, music_and_effects)
Stereo_Narrator_Only=MergeChannels(Narrator, Narrator)


#Mix
MixAudio(Stereo_No_Narrator, Stereo_Narrator_Only, 0.40, 0.60)

which worked really well.

This, however, didn't work for the second program I had, a British reality show in which some of the music and effects were also mono, so they got cancelled as well.
This meant that I really had to do a proper oversound and, although I'm not proud of it, I went with Emulgator's option A, basically the "cheapo" way.
These are the steps I followed:

Step 1:
Make the two mixes uniform.

Basically, the two audio files were really different from one another, and even within the same file some voices were loud while others were quiet, and music and effects were all over the place. It was all pretty bad, so the idea was to bring everything up (and down later on).
This is why I brought everything to -18 LUFS with a very, very strict LRA of 1 (remember we're talking about cheap, low-cost productions and documentaries here):

Quote:

ffmpeg.exe -hide_banner -i "23151en.mxf" -c:a pcm_s24le -ar 48000 -af loudnorm=I=-18:LRA=1:tp=-2:linear=false -f wav -y "23151_en18.wav"

ffmpeg.exe -hide_banner -i "23151ita.wav" -c:a pcm_s24le -ar 48000 -af loudnorm=I=-18:LRA=1:tp=-2:linear=false -f wav -y "23151_ita18.wav"

pause
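With linear=false, loudnorm runs in its dynamic mode, and LRA=1 squeezes the range hard, so this is not a single static gain. The underlying arithmetic for the integrated target is still simple: the correction in dB is target minus measured LUFS. A small Python sketch with hypothetical measured values (not measurements from these programs):

```python
def static_gain_db(measured_lufs: float, target_lufs: float) -> float:
    """Gain that moves a program's integrated loudness from measured to target."""
    return target_lufs - measured_lufs


def db_to_factor(db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    for name, measured in [("English Full Mix", -23.5), ("Italian narrator", -14.0)]:
        g = static_gain_db(measured, -18.0)
        print(f"{name}: {g:+.1f} dB (x{db_to_factor(g):.3f})")
    # Step 3 later takes the -18 LUFS material down to -24 LUFS:
    print(f"-18 -> -24 LUFS: {static_gain_db(-18.0, -24.0):+.1f} dB")
```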

Step 2:
Perform the mix

Quote:

#Indexing
LWLibavVideoSource("23151en.mxf")

#Narrator
Narrator=WAVSource("23151_ita18.wav")

#English Full Mix
Stereo_English=WAVSource("23151_en18.wav")

#Mix
MixAudio(Stereo_English, Narrator, 0.20, 0.80)

Step 3:
Bring the newly created oversound mix to -24 LUFS and loudness-correct the original English full mix to -24 LUFS as well

Quote:

ffmpeg.exe -hide_banner -i "RVN23151en.mxf" -c:a pcm_s24le -ar 48000 -af loudnorm=I=-24:LRA=1:tp=-2 -f wav "23151_English_24.wav"

ffmpeg.exe -hide_banner -i "Oversound_2.avs" -c:a pcm_s24le -ar 48000 -af loudnorm=I=-24:LRA=1:tp=-2 -f wav "23151_Italian_24.wav"

pause

Step 4:
Mux everything back into the original XDCAM file to get the following audio layout:

CH.1-2 Italian Oversound Dub
CH.3-4 English Full Mix

Quote:

ffmpeg.exe -hide_banner -i "23151.mxf" -i "23151_Italian_24.wav" -i "23151_English_24.wav" -map 0:0 -map 1:0 -map 2:0 -c:v copy -c:a copy -f mxf -y "23151_oversound_no_remux.mxf"

pause

Step 5:
Remux everything with the BBC BMX muxer for better compatibility (it's a better MXF muxer after all), setting the TC to 10:00:00:00

Quote:

bmxtranswrap.exe -p -y 10:00:00:00 -t op1a --track-map stereo -o "23151_oversound.mxf" "23151_oversound_no_remux.mxf"

It worked and I can see that the levels of the Italian oversound and the English Full Mix are now pretty much the same all the time.

Next on the list is gonna be the creation of a workflow to speed the whole thing up and make it semi-automatic.

junh1024 · 20th January 2024, 08:35

The problem with OOPS is that while it removes the narration, it also removes some M&E. So I would suggest ducking, or a more sophisticated approach like using ML/AI audio separation to remove the EN narration. The latter would preserve the M&E better than OOPS and sound better overall.
