View Full Version : SoxFilter 2


pinterf
1st December 2023, 17:11
Latest:
SoxFilter 2.2 for AviSynth - 20240104
https://github.com/pinterf/SoxFilter/releases

----

SoxFilter 2.0 for AviSynth.
Based on Sox audio library 14.4.2.
Requires AviSynth+ 3.7.3

Download:
https://github.com/pinterf/SoxFilter/releases/tag/v2.0

Previous discussion:
https://forum.doom9.org/showthread.php?t=104792&page=3

Treat it as a beta, you are the real testers. Spent some weeks on the project this year, got tired of it.

Some effects have been changed, merged with others or removed from the Sox core library since the 1.1 release; do not delete your old DLLs yet.

Thundik81
1st December 2023, 19:00
Thanks!

tebasuna51
2nd December 2023, 12:51
Thanks, I'll test it.

kedautinh12
2nd December 2023, 12:59
Thanks (R.D hasn't been online for so long that I won't get complaints :D)

tebasuna51
2nd December 2023, 15:42
Tested the 7.1->5.1 downmix (which can be applied in MeGUI):
# As AudioLimiter.dll is not available, SoftClipperFromAudX() cannot be used
# 7.1 Channels L,R,C,LFE,BL,BR,SL,SR -> standard 5.1
function c71_c51(clip a)
{
front = GetChannel(a, 1, 2, 3, 4)
back = GetChannel(a, 5, 6)
side = GetChannel(a, 7, 8)
# mix = MixAudio(back, side, 0.5, 0.5)#.SoftClipperFromAudX(0.0)
mix = MixAudio(back, side, 0.5, 0.5).SoxFilter("compand 0.0,0.0 -90,-84,-8,-2,-6,-1,-0,-0.1")
return MergeChannels(front, mix)
}

With the same output as ffmpeg here (https://forum.doom9.org/showthread.php?p=1920201#post1920201)

pinterf
2nd December 2023, 22:49
Thanks for trying it out.

I'm just testing the behavior of effects which change the channel count and/or the sample rate, so far so good. 2.1 will get this feature, 2.0 is giving an error message intentionally.

This works in my test for example

ColorBars().Trim(0,30*60*1) # ~1 min Stereo 48000Hz
SoxFilter("rate 2400", "remix -") # resample to 2400 Hz and convert to mono

FranceBB
3rd December 2023, 02:03
Thank you so much for this!
I look forward to trying version 2.1 on the various upmix functions (UpSoundOnSound() etc.)!
On Monday I'll try version 2.0 at work and I'll report back, but thank you for picking this one up, it feels like yesterday but it's been a long time since the last release as we have to go all the way back to 2006!
Back then I had just started fansubbing in Italy, I was still studying at school and my main worry at the time was going to the beach in summer and hanging around in winter.
Man, those were the days... Little did I know things were gonna get harder growing up from there... :(

Anyway, enough for the depressing thing, this is a very nice present under the Christmas tree! :D

https://i.imgur.com/pRTHGKU.png

Oh, I almost forgot: if anyone wants to try this on XP, don't, it's not compiled targeting it so it lacks the function GetLocaleInfoEx and won't work when it makes a kernel call ('cause it doesn't exist in kernel32).

flossy_cake
3rd December 2023, 11:45
Thank you very much for this.

I was wondering though if this issue got fixed:

I ran some tests and I can confirm the bug detected by Jenyok here:
http://forum.doom9.org/showthread.php?p=1622520#post1622520

Using:
back = soxfilter(a, "filter 100-7000")

There are buggy samples: when they reach 0dB the sign is inverted, which can produce crackling. So I can't recommend using this plugin without lowering the volume of the source first.

https://forum.doom9.org/attachment.php?attachmentid=14933&stc=1&d=1439208518



I am interested in using it as a compressor like this:

for those cases where dialogue is quiet and explosions are very loud....

SoxFilter("compand 0.3,1.0 6:-90,-90,-70,-70,-60,-20,0,0")

# attack time: 0.3 seconds
# decay time: 1.0 seconds
# soft-knee amount for smoothing between curve points: 6db
# curve point 1: set -90db to -90db (no change)
# curve point 2: set -70db to -70db (no change)
# curve point 3: set -60db to -20db (40db gain)
# curve point 4: set 0db to 0db (no change)

Resulting curve looks something like this but with 6db of soft knee smoothing between each linear segment

https://i3.lensdump.com/i/Tpw6Zc.png
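
The curve points listed above can be checked numerically. The following is a minimal Python sketch, assuming the points are joined by straight lines in the dB domain and ignoring the 6 dB soft knee entirely; `compand_transfer` is a hypothetical helper written for illustration, not part of SoxFilter or libsox.

```python
def compand_transfer(in_db, points):
    """Map an input level (dB) to an output level (dB) through curve points.

    Assumption: straight-line interpolation between points, no soft knee.
    """
    if in_db <= points[0][0]:
        # Below the first point: keep that point's gain offset (assumption).
        return in_db + (points[0][1] - points[0][0])
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if in_db <= x1:
            t = (in_db - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    # Above the last point: hold the last output level (assumption).
    return points[-1][1]


# Points from "compand 0.3,1.0 6:-90,-90,-70,-70,-60,-20,0,0"
curve = [(-90, -90), (-70, -70), (-60, -20), (0, 0)]

for level in (-80, -65, -60, -30, 0):
    print(level, "dB in ->", compand_transfer(level, curve), "dB out")
```

Under these assumptions the segment between -70 and -60 dB is very steep (50 dB of output change for 10 dB of input), which is the part of the curve that the later, less steep variant softens.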


The result sounds crackly but then I tried changing the final curve point to 0,-5 and it sounds better but maybe still a bit crackly, not sure.

If I seek too much the audio then gets really crackly, but then I can seek again and it resolves itself. Does it still require linear access?

:thanks:

flossy_cake
3rd December 2023, 12:02
I think the -5db trick might be solving the crackling issue mentioned by tebasuna51.

And I lowered the attack/decay for less "pumping" effect, and made the curve less steep:

SoxFilter("compand 0.01,0.1 6:-90,-90,-70,-70,-50,-30,0,-5")

Seems to work fairly well, just need to avoid seeking as that can cause it to crackle. Tried Preroll(30) without effect.

So I guess my only request would be to make it more compatible with seeking, but that's probably not of interest to most users.

flossy_cake
3rd December 2023, 12:18
SoxFilter("compand 0.0,0.0 -90,-84,-8,-2,-6,-1,-0,-0.1")

May I ask how you came up with this curve? Is it some standard curve used by Dolby or something? It sounds quite good to me. But I see you didn't specify a soft knee -- does that mean it uses linear segments or does it default to something?

pinterf
3rd December 2023, 16:37
Version 2.1
https://github.com/pinterf/SoxFilter/releases/tag/v2.1

- Allow effects which can alter sampling rate
- Allow effects which can change the number of channels
- Add XP build (v141_xp toolset)

richardpl
3rd December 2023, 16:47
Is this still using same internal processing sample format that sox is using?

FranceBB
3rd December 2023, 19:49
Thank you Ferenc!
I tested version 2.1, but when I call

UpSoundOnSound()

it says:

SoxFilter: (filter) could not find any effect.

The function I'm trying to use is:

function UpSoundOnSound(clip a)
{
# Sound On Sound Profile
# SOS approach Profile with 20ms delay and some attenuation on surround
back = a.soxfilter("filter 100-7000")
fl = a.GetLeftChannel()
fr = a.GetRightChannel()
cc = mixaudio(a.GetRightChannel(),a.GetLeftChannel,0.5,0.5)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.5")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.668,-0.668)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.668,-0.668)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}

to go from stereo to 5.1.

Other upmix functions end up with the same results, for instance:

UpDialog()

results in the same error message:

SoxFilter: (filter) could not find any effect.


function UpDialog(clip a)
{
# Audio with mostly dialog (ie. Comedy, Drama)
# Profile to use with audio sources that have mostly mono content. 20ms delay and -3db attenuation on surround
# Note: the center channel is very weak for this profile
front = a.soxfilter("filter 20-20000")
back = a.soxfilter("filter 100-7000")
fl = mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.794,-0.794)
fr = mixaudio(front.GetRightChannel(),front.GetLeftChannel(),0.794,-0.794)
cc = mixaudio(mixaudio(front.GetLeftChannel(),fl,1,-1),mixaudio(front.GetRightChannel(),fr,1,-1),0.224,0.224)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.596")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.562,-0.562)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.562,-0.562)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}

pinterf
3rd December 2023, 20:03
This is the day of the diligently read f** readme files :)

@FranceBB
"Also, some effects were removed, and others added in the past ~15 years. The parameters of some effects had been changed as well.
E.g. reverb parameters were modified in 2008; "filter" was removed, use "sinc" instead (with a different syntax)."

@richardpl
"SoxFilter will convert the audio to 32 bit integer format, this is how libsox works internally. It calls "ConvertAudio" which is part of AviSynth+."

pinterf
3rd December 2023, 20:10
I admit, this error message "SoxFilter: (filter) could not find any effect." is not something I would expect. I'm going to check; as far as I remember, such an unknown effect name would return a proper error message. At least it did in April :). This UpSoundOnSound and "filter" issue was my first surprise (showing that things have changed in two decades).

FranceBB
4th December 2023, 03:17
Of course it was, it's been 15 years, silly me! xD

Ok, so, second test: first I extracted the first two channels (stereo) out of a sample mxf and trimmed it to 1 minute only:

video=BlankClip(length=74625, width=848, height=480, pixel_type="YV12", fps=25)
audio=LWLibavAudioSource("I:\temp\NAS\EART4ALL_FTR-2D-25_F_IT-IT_IT-G_51-IT_2K_20231108_OV\Audio.mxf")
AudioDub(video, audio)

Left=GetChannel(1)
Right=GetChannel(2)

MergeChannels(Left, Right)

trim(0, 1500)


https://i.imgur.com/pj2AHcc.png

then I upmixed using the original SoxFilter 1.0 and the SoundOnSound 5.1 upmixing function:

video=BlankClip(length=1500, width=848, height=480, pixel_type="YV12", fps=25)
audio=LWLibavAudioSource("I:\temp\stereo.wav")
AudioDub(video, audio)

back = soxfilter("filter 100-7000")
fl = GetLeftChannel()
fr = GetRightChannel()
cc = mixaudio(GetRightChannel(),GetLeftChannel,0.5,0.5)
lfe = ConvertToMono().SoxFilter("lowpass 120","vol -0.5")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.668,-0.668)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.668,-0.668)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)

MergeChannels(fl, fr, cc, lfe, sl, sr)

https://i.imgur.com/NHlHaqz.png

Then I upgraded to SoxFilter 2.1 and I replaced Filter with Sinc:

video=BlankClip(length=1500, width=848, height=480, pixel_type="YV12", fps=25)
audio=LWLibavAudioSource("I:\temp\stereo.wav")
AudioDub(video, audio)

back = soxfilter("sinc 100-7000")
fl = GetLeftChannel()
fr = GetRightChannel()
cc = mixaudio(GetRightChannel(),GetLeftChannel,0.5,0.5)
lfe = ConvertToMono().SoxFilter("lowpass 120","vol -0.5")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.668,-0.668)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.668,-0.668)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)

MergeChannels(fl, fr, cc, lfe, sl, sr)

https://i.imgur.com/xda4mt0.png


Then I compared the two versions:


video=BlankClip(length=1500, width=848, height=480, pixel_type="YV12", fps=25)
old_sox=LWLibavAudioSource("I:\temp\SoxFilter_10_Upmix_51.wav")
new_sox=LWLibavAudioSource("I:\temp\SoxFilter_21_Upmix_51.wav")

old=AudioDub(video, old_sox).VideoTek().Crop(692, 420, -0, -0)
new=AudioDub(video, new_sox).VideoTek().Crop(692, 420, -0, -0)

StackHorizontal(old, new)


https://i.imgur.com/QOJuvZ0.png
https://i.imgur.com/EZ9eGGw.png
https://i.imgur.com/hdeschZ.png
https://i.imgur.com/QN1Gzfr.png
https://i.imgur.com/UBiarMI.png


and of course the Surround Left and Surround Right are different between the two versions, as we can also see with Subtract:

https://i.imgur.com/NCTtJ4Y.png


I then grabbed SL and SR from both the old Sox upmix and the new Sox upmix and I compared them individually:

AudioDub(video, old_sox)

sl=GetChannel(5)
sr=GetChannel(6)
MergeChannels(sl, sr)


AudioDub(video, new_sox)

sl=GetChannel(5)
sr=GetChannel(6)
MergeChannels(sl, sr)


and although they seem different, when I listened to them, they both did their job, in fact in both the old one and the new one I could only hear the little sparrows tweet and then the music soundtrack without any words in it.

I then wrote something silly to prevent the upmix functions from failing when using an older version of Sox:

function UpSoundOnSound(clip a)
{
# Sound On Sound Profile
# SOS approach Profile with 20ms delay and some attenuation on surround

try {

back = a.soxfilter("filter 100-7000")
fl = a.GetLeftChannel()
fr = a.GetRightChannel()
cc = mixaudio(a.GetRightChannel(),a.GetLeftChannel,0.5,0.5)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.5")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.668,-0.668)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.668,-0.668)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}

catch(back) {

back = a.soxfilter("sinc 100-7000")
fl = a.GetLeftChannel()
fr = a.GetRightChannel()
cc = mixaudio(a.GetRightChannel(),a.GetLeftChannel,0.5,0.5)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.5")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.668,-0.668)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.668,-0.668)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}
}



The samples are here (link valid for 7 days): https://we.tl/t-Q83d2Uthtr

in the package there's:

- stereo.wav (original audio)
- SoxFilter_10_Upmix_51.wav (old sox upmix)
- SoxFilter_21_Upmix_51.wav (new sox upmix)
- SoxFilter_10_SL_SR_Only.wav (Surround Left Surround Right only of the old Sox Upmix)
- SoxFilter_21_SL_SR_Only.wav (Surround Left Surround Right only of the new Sox Upmix)


Before I go on with this, though, I'd like to ask Tebasuna as he's far more expert than me on this.

Needless to say, the same "workaround" can be applied to the other upmix functions of course.

UpAction:


function UpAction(clip a)
{
# Audio with a mix of sounds (ie. Action, Adventure)
# Profile to use with audio sources that have a wider range of sound content. 20ms delay and -3db attenuation on surround
# Note: General purpose profile

try {

front = a.soxfilter("filter 20-20000")
back = a.soxfilter("filter 100-7000")
fl = mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.668,-0.668)
fr = mixaudio(front.GetRightChannel(),front.GetLeftChannel(),0.668,-0.668)
cc = mixaudio(mixaudio(front.GetLeftChannel(),fl,1,-1),mixaudio(front.GetRightChannel(),fr,1,-1),0.398,0.398)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.447")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.473,-0.473)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.473,-0.473)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}

catch(front) {

front = a.soxfilter("sinc 20-20000")
back = a.soxfilter("sinc 100-7000")
fl = mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.668,-0.668)
fr = mixaudio(front.GetRightChannel(),front.GetLeftChannel(),0.668,-0.668)
cc = mixaudio(mixaudio(front.GetLeftChannel(),fl,1,-1),mixaudio(front.GetRightChannel(),fr,1,-1),0.398,0.398)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.447")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.473,-0.473)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.473,-0.473)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}
}



UpDialog:


function UpDialog(clip a)
{
# Audio with mostly dialog (ie. Comedy, Drama)
# Profile to use with audio sources that have mostly mono content. 20ms delay and -3db attenuation on surround
# Note: the center channel is very weak for this profile

try {

front = a.soxfilter("filter 20-20000")
back = a.soxfilter("filter 100-7000")
fl = mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.794,-0.794)
fr = mixaudio(front.GetRightChannel(),front.GetLeftChannel(),0.794,-0.794)
cc = mixaudio(mixaudio(front.GetLeftChannel(),fl,1,-1),mixaudio(front.GetRightChannel(),fr,1,-1),0.224,0.224)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.596")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.562,-0.562)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.562,-0.562)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}

catch(front) {

front = a.soxfilter("sinc 20-20000")
back = a.soxfilter("sinc 100-7000")
fl = mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.794,-0.794)
fr = mixaudio(front.GetRightChannel(),front.GetLeftChannel(),0.794,-0.794)
cc = mixaudio(mixaudio(front.GetLeftChannel(),fl,1,-1),mixaudio(front.GetRightChannel(),fr,1,-1),0.224,0.224)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.596")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.562,-0.562)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.562,-0.562)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}
}


UpFarina:

function UpFarina(clip a)
{
# Farina Profile
# Farina/Sursound approach Profile M=L+R, S=L-R, c=0.75, L' = (1-c/4)*M+(1+c/4)*S, C' = c*M, R' = (1-c/4)*M-(1+c/4)*S
# also added with 20ms delay and some attenuation on surround

try {

front = a.soxfilter("filter 20-20000")
back = a.soxfilter("filter 100-7000")
fl = mixaudio(mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.500,0.500),mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.500,-0.500),0.8125,1.1875)
fr = mixaudio(mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.500,0.500),mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.500,-0.500),0.8125,-1.1875)
cc = mixaudio(front.GetRightChannel(),front.GetLeftChannel,0.375,0.375)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.5")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.668,-0.668)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.668,-0.668)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}

catch(front) {

front = a.soxfilter("sinc 20-20000")
back = a.soxfilter("sinc 100-7000")
fl = mixaudio(mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.500,0.500),mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.500,-0.500),0.8125,1.1875)
fr = mixaudio(mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.500,0.500),mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.500,-0.500),0.8125,-1.1875)
cc = mixaudio(front.GetRightChannel(),front.GetLeftChannel,0.375,0.375)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.5")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.668,-0.668)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.668,-0.668)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}
}
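
The numeric weights in UpFarina follow from the formula in its comment. Below is a small Python check of that arithmetic, assuming (as the script does) that M and S are built with 0.5 weights rather than the plain L+R and L-R of the comment; the dictionaries are only a hypothetical way to track how much of L and R ends up in each output.

```python
c = 0.75
m_gain = 1 - c / 4   # weight applied to M in L' and R'
s_gain = 1 + c / 4   # weight applied to S in L' and R'

# M and S as the script builds them with MixAudio(..., 0.5, 0.5) and (0.5, -0.5)
m = {"L": 0.5, "R": 0.5}
s = {"L": 0.5, "R": -0.5}

# Effective share of the original L and R in each output channel
fl = {ch: m_gain * m[ch] + s_gain * s[ch] for ch in "LR"}
fr = {ch: m_gain * m[ch] - s_gain * s[ch] for ch in "LR"}
cc = {ch: c * m[ch] for ch in "LR"}

print("M/S weights:", m_gain, s_gain)
print("fl:", fl)
print("fr:", fr)
print("cc:", cc)
```

With these assumptions the front left output works out to the full left channel minus a small share of the right channel, and the centre weights match the 0.375 used in the script.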

UpGerzen:

function UpGerzen(clip a)
{
# Gerzen Profile
# Gerzen approach Profile modified with 20ms delay and some attenuation on surround

try {

front = a.soxfilter("filter 20-20000")
back = a.soxfilter("filter 100-7000")
fl = mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.885,-0.115)
fr = mixaudio(front.GetRightChannel(),front.GetLeftChannel(),0.885,-0.115)
cc = mixaudio(front.GetRightChannel(),front.GetLeftChannel,0.4511,0.4511)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.5")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.668,-0.668)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.668,-0.668)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}

catch(front) {

front = a.soxfilter("sinc 20-20000")
back = a.soxfilter("sinc 100-7000")
fl = mixaudio(front.GetLeftChannel(),front.GetRightChannel(),0.885,-0.115)
fr = mixaudio(front.GetRightChannel(),front.GetLeftChannel(),0.885,-0.115)
cc = mixaudio(front.GetRightChannel(),front.GetLeftChannel,0.4511,0.4511)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.5")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.668,-0.668)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.668,-0.668)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}
}

UpMultisonic:

function UpMultisonic(clip a)
{
# Multisonic Profile
# Multisonic approach Profile modified with 20ms delay and some attenuation on surround

try {

front = a.soxfilter("filter 20-20000")
back = a.soxfilter("filter 100-7000")
fl = mixaudio(front.GetLeftChannel(),front.GetRightChannel(),1,-0.5)
fr = mixaudio(front.GetRightChannel(),front.GetLeftChannel(),1,-0.5)
cc = mixaudio(front.GetRightChannel(),front.GetLeftChannel,0.5,0.5)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.5")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.668,-0.668)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.668,-0.668)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}

catch(front) {

front = a.soxfilter("sinc 20-20000")
back = a.soxfilter("sinc 100-7000")
fl = mixaudio(front.GetLeftChannel(),front.GetRightChannel(),1,-0.5)
fr = mixaudio(front.GetRightChannel(),front.GetLeftChannel(),1,-0.5)
cc = mixaudio(front.GetRightChannel(),front.GetLeftChannel,0.5,0.5)
lfe = ConvertToMono(a).SoxFilter("lowpass 120","vol -0.5")
sl = mixaudio(back.GetLeftChannel(),back.GetRightChannel(),0.668,-0.668)
sr = mixaudio(back.GetRightChannel(),back.GetLeftChannel(),0.668,-0.668)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}
}

tebasuna51
4th December 2023, 12:07
I made some comments about these old upmix filters a long time ago:

1) Filters like front = a.soxfilter("sinc 20-20000") make no sense; they can only waste time and quality.

2) Creating an LFE channel from a lowpass of the front channels misunderstands what the LFE channel is and how audio equipment works: the bass frequencies of ALL channels (not only LFE) are sent to the subwoofer. The mix of bass from the front channels and the LFE channel can end up doubled, cancelled or distorted. The best option is always an empty LFE:
lfe = fl.Amplify(0)

3) To avoid problems, at least with SoxFilter 1.0 and multichannel audio, I recommend using multiple mono instances of the sox filters instead of back = a.soxfilter("filter 100-7000"):
bl = fl.soxfilter("filter 100-7000")
br = fr.soxfilter("filter 100-7000")

4) Some mixes whose coefficients sum (in absolute value) to more than 1 can cause clipping, for instance fl = mixaudio(front.GetLeftChannel(),front.GetRightChannel(),1,-0.5)

An example of my suggested upmix:
function UpSoundOnSound(clip a)
{
# Sound On Sound Profile
# SOS approach Profile with 20ms delay and some attenuation on surround

a=ConvertAudiotoFloat(a)
fl = a.GetLeftChannel()
fr = a.GetRightChannel()
cc = mixaudio(fl, fr,0.5,0.5)
lfe = fl.Amplify(0)
bl = fl.soxfilter("sinc 100-7000").ConvertAudiotoFloat()
br = fr.soxfilter("sinc 100-7000").ConvertAudiotoFloat()
sl = mixaudio(bl,br,0.5,-0.5)
sr = mixaudio(br,bl,0.5,-0.5)
sl = DelayAudio(sl,0.02)
sr = DelayAudio(sr,0.02)
return MergeChannels( fl, fr, cc, lfe, sl, sr)
}

5) I never recommend these upmixes; VCRs have many functions to play stereo inputs in 5.1.
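
Point 4 can be made concrete with a little arithmetic. The sketch below is an illustrative Python calculation, not part of any plugin: it assumes both inputs can reach full scale at the same instant with the worst possible polarity, so the peak of a two-input mix is bounded by the sum of the absolute coefficients. Whether a given source actually clips depends on the material and on the sample format in use.

```python
import math


def worst_case_peak(*coefficients):
    """Upper bound on the mixed peak when every input is at full scale (1.0)."""
    return sum(abs(c) for c in coefficients)


def to_db(ratio):
    """Convert a linear amplitude ratio to decibels."""
    return 20 * math.log10(ratio)


mixes = {
    "Multisonic fl (1, -0.5)": (1, -0.5),
    "Sound On Sound sl (0.668, -0.668)": (0.668, -0.668),
    "suggested sl (0.5, -0.5)": (0.5, -0.5),
}

for name, coeffs in mixes.items():
    peak = worst_case_peak(*coeffs)
    print(f"{name}: worst-case peak {peak:.3f} ({to_db(peak):+.2f} dB re full scale)")
```

Only the 0.5/-0.5 pair stays at or below full scale in this worst case, which is consistent with the coefficients chosen in the suggested upmix.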

richardpl
5th December 2023, 19:11
It's 2023; 32-bit fixed-point processing is vintage. Switch to double-precision floating point.

pinterf
6th December 2023, 09:08
Wrong doorbell, complain here
https://sourceforge.net/p/sox/mailman/sox-users/thread/CAP1ZimH8XhD7M9JZgqB8ducoKFMetfMkCt4XSGjfdHDHEqUV-Q%40mail.gmail.com/

flossy_cake
1st January 2024, 16:32
Not sure if I'm knocking at the wrong doorbell but, as noted by tebasuna51:

Normalize doesn't work after SoxFilter("compand...")

Demo:


# Make sine tone
ColorBars().ConvertToYV12().Trim(0, 3000)

# Apply "Dolby film standard" compression profile which looks like this: tiny.cc/dolbydrc
SoxFilter("compand 0.05,0.10 -90,-90,-70,-64,-43,-37,-31,-31,-21,-21,0,-20")

# Problem occurs here
# Normalize(0.98) # audible distortion & visible clipping on histogram
# Normalize(0.60) # no visible clipping on histogram but still audible distortion

# Show levels
Histogram(mode="audiolevels")



AvsPMod seems to have issues with the sine tone though - it pops/crackles a lot even with just ColorBars(). Best way of testing I think is with ffmpeg command line:


"C:\program files\ffmpeg\bin\ffmpeg.exe" -i "C:\script.avs" -c:v libx264 -tune film -b:v 8000000 -pix_fmt yuv420p -c:a aac -b:a 320k "c:\audiotest.mkv"


MPC-HC seems to be ok for testing - it can also correct the issue by seeking backwards (I'm guessing this behaviour is specific to MPC-HC or specifically the LAV DirectShow filter I'm using to open .avs scripts with).

A workaround seems to be to do the Normalize first, then SoxFilter, then AmplifyDB(15.0) or so to gain back the 20dB headroom created by the SoxFilter (or put " 15" at the end of the SoxFilter string). But seeking backwards can still cause it to bug out and become audibly distorted - more so in MPC-HC and less so in AvsPMod.

I have tried mucking around with Preroll, Prefetch and SetCacheMode without success.

:thanks:

pinterf
1st January 2024, 18:08
I suppose Normalize breaks the rule that SoxFilter would like to see a totally linear audio stream. AviSynth+ now has an audio cache again, which helps a lot in such situations.
Anyway I can look into what happens exactly, perhaps understanding it would show us something interesting.

pinterf
2nd January 2024, 14:20
Normalize first runs through the whole audio starting from the 0th sample to the very last one, in order to gather the statistics. This is done in a linear way.
This is done only once, then it works normally: processes only those audio samples which were requested.
The restart from 0 is handled inside SoxFilter, I saw no problems there.

@flossy_cake:
I was not able to hear any audio distortions (or don't know what to notice?) from the ffmpeg output; however, sometimes there are clicks or pops when playing the clip with AvsPmod (but this may be normal at the moment, as you mentioned).
EDIT: check the latest AvsPmod (2.7.6.5 - Dire Straits Edition atm) with revamped audio support. https://forum.doom9.org/showthread.php?p=1995709#post1995709

What kind of distortion should I search for?

pinterf
2nd January 2024, 14:27
The introduction of Normalize seemingly conflicts with SoxFilter's preference for a linear audio stream. With Avisynth+'s reintroduction of audio cache, it aids in managing such scenarios. Delving into the specifics might unveil insights that shed light on this dynamic.
Dear first-time poster, thank you, or your AI for the English-English translation. :-/ It was really a super positive contribution to the topic.

flossy_cake
3rd January 2024, 09:49
@flossy_cake:
I was not able to hear any audio distortions (or don't know what to notice?) from the ffmpeg output

I rendered these with ffmpeg: 1 (https://drive.google.com/file/d/1Sj4HVRLIyPd_GShZIXg_Vyv9X6-bVHph/view?usp=sharing), 2 (https://drive.google.com/file/d/1uSVWWFNDZVbUp1ZngjIdLPwu-Y_v1UUs/view?usp=sharing)

But I think it's a moot point, as the sine wave tone seems to be invalid for testing SoxFilter's compander for distortion: its dB level remains constant, whereas with real world content the volume fluctuates up and down, such as actors speaking, or a judge banging their gavel where it immediately shoots up to just below 0dB. The compander has a nonzero attack, which allows the signal to clip before it realises what dB level it's supposed to be reducing to, and you can get clipping until the attack period averages to that dB level. It seems the theoretical solution is to set attack & decay to 0.0, but then the audio is completely distorted under all test scenarios for me. Maybe really small values could be used, like 2ms/4ms, which would limit clipping distortion to 2ms/4ms durations. I'm still testing this to see if there is some optimal value.

edit: and just as I say that, a value of attack=0.0, decay=1.0 seems to do the trick. No distortions on sine or real world content as long as I do it in this order and don't seek backwards:


# Make sine tone
ColorBars().ConvertToYV12().Trim(0, 300)

# Peak normalization
Normalize(0.98)

# Apply "Dolby film standard" compression profile which looks like this: tiny.cc/dolbydrc
SoxFilter("compand 0.0,1.0 -90,-90,-70,-64,-43,-37,-31,-31,-21,-21,0,-20")

# Gain back the headroom created by the compression profile
AmplifyDB(15.0) # alternatively put " 15" at end of SoxFilter string (should be 20 but leaving 5dB headroom)

# Show levels
Histogram(mode="audiolevels")


edit: it seems an unintentional benefit of using attack=0 is that when it bugs out on seeking backwards the audio gets SUPER distorted, so I can easily hear if I need to do another seek to correct it; thus I can now use it in real time in MPC-HC
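
The headroom arithmetic in the script comment above can be written out. This is a minimal Python sketch that only looks at steady-state levels on the curve "-90,-90,-70,-64,-43,-37,-31,-31,-21,-21,0,-20" and assumes the compander has fully settled; short transients during a nonzero attack period are not modelled.

```python
def db_to_linear(db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


# Top point of the curve: a full-scale input (0 dB) is mapped to -20 dB.
curve_top_out_db = -20.0
makeup_gain_db = 15.0   # the value passed to AmplifyDB in the script above

peak_after_makeup_db = curve_top_out_db + makeup_gain_db
remaining_headroom_db = 0.0 - peak_after_makeup_db

print("peak after make-up gain:", peak_after_makeup_db, "dB")
print("remaining headroom:", remaining_headroom_db, "dB")
print("linear factor for +15 dB:", round(db_to_linear(makeup_gain_db), 4))
```

So +15 dB corresponds to multiplying the samples by roughly 5.6, and a settled full-scale input ends up about 5 dB below full scale.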

pinterf
3rd January 2024, 17:02
flossy_cake: thanks for the nice reproduction steps.
Check please this release:
https://github.com/pinterf/SoxFilter/releases/tag/v2.2

tebasuna51
3rd January 2024, 21:12
Thanks pinterf, Normalize() after a SoxFilter("compand...") seems to work fine now.
Used in a downmix 7.1->5.1

a=WavSource("71_down_test.wav").ConvertAudioToFloat()
flr = Getchannel(a, 1, 2, 3, 4)
blr = Getchannel(a, 5, 6)
slr = Getchannel(a, 7, 8)
sur = MixAudio(blr, slr, 0.5, 0.5).SoxFilter("compand 0.0,0.0 -90,-84,-8,-2,-6,-1,-0,-0.1").ConvertAudioToFloat().Normalize(1)
mergechannels(flr,sur)
ConvertAudioTo24bit()

flossy_cake
4th January 2024, 14:14
flossy_cake: thanks for the nice reproduction steps.
Check please this release:
https://github.com/pinterf/SoxFilter/releases/tag/v2.2
Wow Normalize works after SoxFilter AND seeking doesn't cause distortions anymore! Really pleased with this :goodpost::thanks:

flossy_cake
5th January 2024, 17:05
But hang on, if we do Normalize after SoxFilter, doesn't that mean SoxFilter processing has to be applied to the entire audio track in advance before playback can begin, so that Normalize can see what the resulting max dB was after SoxFilter? That seems like a lot of preemptive processing to do before playback can even begin - is that really what's happening under the hood? The entire audio track is being run through SoxFilter in advance? Every single sample?

edit: for a 22 minute clip I'm getting a load time of 2 seconds with Normalize before vs 7 seconds with Normalize after, which is consistent with the above.

pinterf
5th January 2024, 17:09
But hang on, if we do Normalize after SoxFilter, doesn't that mean SoxFilter processing has to be applied to the entire audio track in advance before playback can begin, so that Normalize can see what the resulting max dB was after SoxFilter? That seems like a lot of preemptive processing to do before playback can even begin - is that really what's happening under the hood? The entire audio track is being run through SoxFilter in advance? Every single sample?
Yes. Normalize will go through the whole track in advance (once).
Then it processes the data chunks with the already gathered statistics.
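
The two-phase behaviour described above can be sketched in a few lines. This is an illustrative Python model only: `CountingSource` and `PeakNormalize` are hypothetical stand-ins, not AviSynth's actual classes, and the chunk size is arbitrary. It shows why everything upstream (for example SoxFilter) has to produce the whole track once before the first requested samples can be returned.

```python
class CountingSource:
    """Stand-in for an upstream filter; counts the samples it had to produce."""

    def __init__(self, samples):
        self.samples = samples
        self.produced = 0

    def get_audio(self, start, count):
        chunk = self.samples[start:start + count]
        self.produced += len(chunk)
        return chunk


class PeakNormalize:
    """Peak normalizer: one full linear scan on first use, then scales on demand."""

    def __init__(self, child, length, target=1.0, chunk=4):
        self.child = child
        self.length = length
        self.target = target
        self.chunk = chunk
        self.gain = None   # unknown until the whole track has been scanned

    def _scan(self):
        peak = 0.0
        for start in range(0, self.length, self.chunk):
            for sample in self.child.get_audio(start, self.chunk):
                peak = max(peak, abs(sample))
        self.gain = self.target / peak if peak > 0 else 1.0

    def get_audio(self, start, count):
        if self.gain is None:
            self._scan()   # happens only once, from sample 0 to the end
        return [s * self.gain for s in self.child.get_audio(start, count)]


src = CountingSource([0.1, -0.25, 0.5, 0.2, -0.1, 0.05, 0.0, 0.3])
norm = PeakNormalize(src, length=8)

first = norm.get_audio(0, 2)
print(first, "upstream samples produced so far:", src.produced)
second = norm.get_audio(2, 2)
print(second, "upstream samples produced so far:", src.produced)
```

The first request costs the full track plus the requested chunk; later requests cost only what they ask for, which fits the longer load time reported with Normalize placed after SoxFilter.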

flossy_cake
16th January 2024, 07:26
I'm probably ringing the wrong doorbell, but is there any possibility of getting compander to produce clean, non-distorted audio with attack & decay both set to 0.0 seconds? They are the first 2 values:

SoxFilter("compand 0.0,0.0 -90,-90,-70,-64,-43,-37,-31,-31,-21,-21,0,-20")

As far as I can tell, setting them both to 0 is the only way to do compression without volume "pumping" effect. I can set attack to 0.0 and get clean audio, but decay must be at least 0.5-1.0 to avoid audible distortion, and this creates some volume pumping.

I tried playing with the final parameter "delay" which is supposed to "allow the compander to effectively operate in a predictive rather than a reactive mode", but I found that to make volume pumping even worse.

Anyway, it's still pretty good even with a 1.0 second decay. I just used it on S04E01 of Family Guy R1 DVD and it really fixed the audio levels and made it sound more like broadcast TV levels, albeit with some subtle volume pumping.

By the way here is the best explanation of attack & decay I could find:


The attack and decay parameters (in seconds) determine the time over which the instantaneous level of the input signal is averaged to determine its volume; attacks refer to increases in volume and decays refer to decreases. For most situations, the attack time (response to the music getting louder) should be shorter than the decay time because the human ear is more sensitive to sudden loud music than sudden soft music. Where more than one pair of attack/decay parameters are specified, each input channel is companded separately and the number of pairs must agree with the number of input channels. Typical values are 0.3,0.8 seconds.

I still don't really understand it in my mind, like which one gets preference if attack monitor says volume got louder over 1 sample period, but decay monitor says it got quieter due to it being on an average downward trend over 1.0 seconds worth of samples. It seems pointless to me - why would you want to have a decay before detecting quiet audio anyway, that will just lead to not getting the volume you wanted immediately. And the attack seems totally useless at a nonzero value cause a sudden loud spike in volume would allow it to clip until the entire attack period averages to that volume level. So I don't really understand the point of having a nonzero attack & decay.

flossy_cake
16th January 2024, 23:54
Perhaps my understanding of how compression works is totally wrong, because if I look at the top end of these curves they appear to flatline near the top, which would create clipping if the attack & decay were instantaneous, wouldn't it?

https://b.l3n.co/i/3LxGoz.png

Do the Dolby profiles rely on a nonzero attack/decay period to avoid clipping?

But the problem still occurs even if I do a curve without such flatlining, such as this curve:

https://d.l3n.co/i/Tpw6Zc.png

But when I think about running a sine wave through that curve with instantaneous attack/decay, wouldn't that still distort the shape of the sine wave? Like the start of the sine wave would shoot up steeper and then it would flatten off near the top of the bell curve and produce a different sound? Maybe that's why some attack/decay is needed, to calculate the average volume level of the sine wave over a period and use that as the input value (x-axis value) ?
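A rough way to check that intuition is to apply a transfer curve to each sample on its own, as if attack and decay were both zero. The sketch below is only an illustration: it reuses the point list from the compand line earlier in the thread, assumes plain linear interpolation in dB between points (no soft knee), and the helper names `curve_db` and `shape_sample` are made up for the example.

```python
import math

# (input dB, output dB) pairs from "compand 0.0,0.0 -90,-90,-70,-64,-43,-37,-31,-31,-21,-21,0,-20"
POINTS = [(-90, -90), (-70, -64), (-43, -37), (-31, -31), (-21, -21), (0, -20)]

def curve_db(in_db):
    """Piecewise-linear output level (dB) for an input level (dB)."""
    x0, y0 = POINTS[0]
    if in_db <= x0:
        return in_db + (y0 - x0)     # assumption: constant gain below the first point
    for x1, y1 in POINTS[1:]:
        if in_db <= x1:
            return y0 + (y1 - y0) * (in_db - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0                        # assumption: flat above the last point

def shape_sample(s):
    """Apply the curve to one sample as if attack and decay were both zero."""
    if s == 0.0:
        return 0.0
    in_db = 20.0 * math.log10(abs(s))
    return math.copysign(10.0 ** (curve_db(in_db) / 20.0), s)

n = 64
sine = [math.sin(2.0 * math.pi * i / n) for i in range(n)]
shaped = [shape_sample(s) for s in sine]

# For a pure volume change this ratio would be identical for every sample.
gain_at_peak = shaped[n // 4] / sine[n // 4]
gain_near_zero = shaped[1] / sine[1]
```

Here the gain near the zero crossing is several times larger than the gain at the peak, so the output is no longer a scaled copy of the sine: the top is flattened and new harmonics appear. Averaging the level over an attack/decay window instead would give one slowly varying gain for the whole cycle, which keeps the shape intact.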

Maybe that's why Dolby left the 20 dB of headroom at the top - to allow for a nonzero attack time so that when a loud impulse occurs like a judge hitting a gavel, it will have some headroom to go above -20 for a short period?

flossy_cake
17th January 2024, 03:50
Yes, I think that is right because Dolby specifies attack and decay times here (http://web.archive.org/web/20040716131627/http://www.dolby.com/tech/L.mn.0002.DDPEG1.pdf):

https://b.l3n.co/i/44YXza.png

So I think compression by its very nature is going to be prone to some "pumping" effect, although using the Dolby values for attack & decay seems to produce less of it. The resulting overall volume isn't as loud as with the 0.0 attack time I was using previously, but the pumping is reduced; for instance, during the Family Guy intro music I can barely hear any with Dolby's values. Slower attack & decay seems to produce less pumping, which makes sense when I think about it, because the input volume is averaged over a longer period of samples. I suppose you could still have some beat frequency where the tempo of the music just happens to coincide with the attack/decay times, so you could still get pumping in between drum hits, say, if the hits are spaced <decay> milliseconds apart.

Normalisation can be done before or after SoxFilter. Doing it before loads faster, but the resulting overall volume is typically higher when doing it after, so after seems better if you want to get as much volume as possible. I think it may also guard against clipping if SoxFilter's attack code screws up and lets the volume go above 0 dB (is that how it works - can there be values above 0 dB internally inside SoxFilter and Avisynth which Normalize could potentially see?)

flossy_cake
17th January 2024, 07:41
Is the "hold off period" in the Dolby table perhaps equivalent to SoxFilter's final "delay" parameter?


The fifth (optional) parameter is a delay in seconds. The input signal is analysed immediately to control the compander, but it is delayed before being fed to the volume adjuster. Specifying a delay approximately equal to the attack/decay times allows the compander to effectively operate in a `predictive' rather than a reactive mode. A typical value is 0.2 seconds.


The way I'm reading it, it sounds like the volume adjustments should come late rather than early, but in practice I observe them coming early, which is consistent with "predictive" behaviour.
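The "early" behaviour falls out of the manual's wording if only the audio path is delayed while the gain is still computed from the live input. A minimal sketch of that idea, with a made-up threshold detector standing in for the real compander (`apply_with_delay` and the numbers are hypothetical, purely for illustration):

```python
from collections import deque

def apply_with_delay(samples, gains, delay_samples):
    """Pair each gain (derived from the live input) with audio from delay_samples earlier."""
    line = deque([0.0] * delay_samples)
    out = []
    for s, g in zip(samples, gains):
        line.append(s)
        out.append(line.popleft() * g)
    return out

quiet, loud = 0.1, 1.0
audio = [quiet] * 6 + [loud] * 4
# Hypothetical detector: gain drops to 0.5 at the exact sample the input gets loud.
gains = [0.5 if abs(s) > 0.5 else 1.0 for s in audio]
delayed = apply_with_delay(audio, gains, delay_samples=3)
```

In `delayed`, the last three quiet samples are already attenuated before the first loud sample comes out, because the detector saw the loud part three samples ahead of the audio it is controlling. Relative to what you hear, the volume change therefore arrives early, which matches the observation above.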

simple_simon
13th April 2024, 02:20
Is this a better option now than using AudioLimiter.dll for downmixing 7.1ch+ audio?

tebasuna51
13th April 2024, 09:35
Is this a better option now than using AudioLimiter.dll for downmixing 7.1ch+ audio?

In the 7.1->5.1 downmix, AudioLimiter is used to preserve the max volume without clipping when mixing 4 surround channels into only 2.

We can replace AudioLimiter with a Sox compand call like the one here (https://forum.doom9.org/showthread.php?p=1926777#post1926777), which can also work in 64 bits (not possible with AudioLimiter).

It is difficult to say which is better. I don't know the internal algorithm of AudioLimiter, but the sox compand may at least be more sophisticated.