Audio Development [Archive]

Myrsloik

26th January 2020, 14:11

Audio support is finally mostly done. See this blog post (http://www.vapoursynth.com/2020/01/audio-support-and-how-it-works/) for a longer explanation and list of what's currently implemented.

Test5 installer (x64) (https://www.dropbox.com/s/l16vtaj4twpq8vx/VapourSynth64-R51-audio5.exe?dl=1)
Test5 portable (x64) (https://www.dropbox.com/s/kpcu7yqvdtoeuqe/VapourSynth64-Portable-R51-audio5.7z?dl=1)

Test4 installer (x64) (https://www.dropbox.com/s/tcac6i210penql6/VapourSynth64-R51-audio4.exe?dl=1)
Test4 portable (x64) (https://www.dropbox.com/s/2pwk662cwn1kzi8/VapourSynth64-Portable-R51-audio4.7z?dl=1)

BestAudioSource thread (https://forum.doom9.org/showthread.php?t=177337)

DJATOM

26th January 2020, 15:14

Looks like 1 audio frame has a length of 2 seconds and I can't trim less than that. clip2 = clip2.std.AudioTrim(0, 95999) works, while clip2 = clip2.std.AudioTrim(0, 47999) fails to output any data.

Myrsloik

26th January 2020, 15:36

Looks like 1 audio frame has a length of 2 seconds and I can't trim less than that. clip2 = clip2.std.AudioTrim(0, 95999) works, while clip2 = clip2.std.AudioTrim(0, 47999) fails to output any data.

Doh, the one case I forgot to test when passing through audio frames. The files in the first post have been discreetly updated.

Audio frames are currently 96000 samples so that's well spotted.
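
For readers following the arithmetic, here is a small sketch in plain Python (no VapourSynth required) of how an inclusive sample range maps onto fixed-size audio frames. The 96000 figure is the frame size quoted in this post; the helper name frames_touched is made up for illustration and says nothing about how AudioTrim is implemented internally.

```python
SAMPLES_PER_FRAME = 96000  # frame size quoted in this post for this test build


def frames_touched(first, last, samples_per_frame=SAMPLES_PER_FRAME):
    """For an inclusive sample range, return
    (first frame index, last frame index, samples used in the last frame)."""
    first_frame = first // samples_per_frame
    last_frame = last // samples_per_frame
    used_in_last = last % samples_per_frame + 1
    return first_frame, last_frame, used_in_last


print(frames_touched(0, 95999))  # exactly one full frame
print(frames_touched(0, 47999))  # only half of frame 0: the case that failed
```

At 48000 Hz a 96000-sample frame lasts 2 seconds, which matches the length DJATOM observed.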

DJATOM

26th January 2020, 15:42

Now it trims as intended. Great!

lansing

26th January 2020, 19:00

I got the "No attribute with the name bas exists" after running the installer

Myrsloik

26th January 2020, 19:11

I got the "No attribute with the name bas exists" after running the installer

Did you actually get the plugin?

lansing

26th January 2020, 19:17

Did you actually get the plugin?

I don't think I have, I checked "core/plugins", there's no dll with the name "bas".

AzraelNewtype

26th January 2020, 23:14

I don't think I have, I checked "core/plugins", there's no dll with the name "bas".

Did you... click the Best Audio Source Thread link in the OP and manually grab the plugin, which the question was phrased specifically to indicate was not actually built in and therefore would not be in core/plugins?

lansing

27th January 2020, 01:20

Did you... click the Best Audio Source Thread link in the OP and manually grab the plugin, which the question was phrased specifically to indicate was not actually built in and therefore would not be in core/plugins?

Ok I got it, I thought the plugin was included in the installer

lansing

27th January 2020, 07:15

Playback of the vpy script in Media Player Classic is choppy; the audio constantly skips.

DJATOM

27th January 2020, 12:41

That problem only occurs with w64 header, raw output is fine.

Pat357

2nd February 2020, 19:30

How to choose to output W64 or raw ?
What can I use to play raw audio ? I suppose FFplay should do the trick ?
What extra <parameters> do I have to use in example below to play raw-audio ?

Here's what I have :

vspipe -i betteraudio_test.vpy -
[mp3float @ 000001B60C074C00] Could not update timestamps for skipped samples.
Samples: 12856320
Sample Rate: 48000
Format Name: Audio32F
Sample Type: Float
Bits: 32
Channels: 2
Layout:

ffplay -f f32le -sample_rate 48000 -channel_layout 3 -i rawaudio.raw
[pcm_f32le @ 000002ce23786700] Channel layout 'stereo' with 2 channels does not match specified number of channels 1: ignoring specified channel layout
[f32le @ 000002ce23779180] Estimating duration from bitrate, this may be inaccurate
Input #0, f32le, from 'rawaudio.raw':
Duration: 00:08:55.68, bitrate: 1536 kb/s
Stream #0:0: Audio: pcm_f32le, 48000 Hz, 1 channels, flt, 1536 kb/s
[pcm_f32le @ 000002ce237d3280] Channel layout 'stereo' with 2 channels does not match specified number of channels 1: ignoring specified channel layout
24.58 M-A: 0.000 fd= 0 aq= 184KB vq= 0KB sq= 0B f=0/0

vspipe -i says 2 channels, but ffplay can only find one!
One channel produces only noise, while the other plays the music, but much too slow (tempo and pitch are both too low).
The woman in this song sounds like a man with a deep voice.

Myrsloik

2nd February 2020, 20:05

How to choose to output W64 or raw ?
What can I use to play raw audio ? I suppose FFplay should do the trick ?
What extra <parameters> do I have to use in example below to play raw-audio ?

ffplay -f rawaudio <parameters> -i rawaudio.raw

<parameters> = ??

You use -y or --y4m to add w64 headers. A bit confusing, but I've basically changed it to mean "add headers". Not sure how to play the raw audio in ffmpeg.

Pat357

3rd February 2020, 17:43

I finally figured it out, and I'm posting it here in case other users want to use ffmpeg to process the output:

To play back or process the raw audio output from vspipe, do the following:

vspipe -i betteraudio_test.vpy -
[mp3float @ 000001B60C074C00] Could not update timestamps for skipped samples.
Samples: 12856320
Sample Rate: 48000
Format Name: Audio32F
Sample Type: Float
Bits: 32
Channels: 2
Layout:

Notice how the following parameters map to ffmpeg options:
Sample Rate: 48000 = -ar 48000
Format Name: Audio32F, Sample Type: Float, Bits: 32 (these 3 together) = -f f32le
Channels: 2 = -ac 2

To playback :
vspipe betteraudio_test.vpy - | ffplay -f f32le -ac 2 -ar 48000 -i -

To process further using ffmpeg :
vspipe betteraudio_test.vpy - | ffmpeg -f f32le -ac 2 -ar 48000 -i - <filters> -acodec xxx ... outputfile

PS. In VapourSynth only raw audio works OK in this test version; output in w64 always gives a lot of skips with noise.
With Avisynth I've found no problems so far.
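
The mapping above can also be written down as a short script. This is only a sketch: the function name raw_audio_flags is hypothetical, the field names are the ones printed by vspipe -i in this post, and little-endian sample order is an assumption (it holds on x86 machines).

```python
# Sketch only: map the fields printed by `vspipe -i` to ffplay/ffmpeg
# raw-input flags. Assumes little-endian samples (true on x86).
KNOWN_FORMATS = {"s16le", "s24le", "s32le", "f32le", "f64le"}


def raw_audio_flags(info_text):
    fields = {}
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not "Name: value"
            fields[key.strip()] = value.strip()

    prefix = "f" if fields["Sample Type"] == "Float" else "s"
    fmt = f"{prefix}{int(fields['Bits'])}le"
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"no raw format name known for {fmt}")
    return ["-f", fmt, "-ac", fields["Channels"], "-ar", fields["Sample Rate"]]


INFO = """Samples: 12856320
Sample Rate: 48000
Format Name: Audio32F
Sample Type: Float
Bits: 32
Channels: 2
Layout:"""

print(" ".join(raw_audio_flags(INFO)))  # -f f32le -ac 2 -ar 48000
```

The printed flags go between the program name and "-i -" in the ffplay and ffmpeg command lines shown above.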

Myrsloik

3rd February 2020, 21:45

...
PS. In VapourSynth only raw audio works OK in this test version; output in w64 always gives a lot of skips with noise.
With Avisynth I've found no problems so far.

I'll look into the bugs, must've gotten some value wrong in the w64 header. Expect a new version sometime next week since I'm quite busy.

Anyway, here's the general plan:

Try to fix all the reported issues.
Add the filters ShuffleChannels (audio equivalent of ShufflePlanes), MatrixMix, AssumeSampleRate and maybe AudioLoop.

Boulder

4th February 2020, 07:35

Would it be possible to also include the SoundTouch library functionalities to allow adjusting pitch, tempo etc.?

Myrsloik

4th February 2020, 09:46

Would it be possible to also include the SoundTouch library functionalities to allow adjusting pitch, tempo etc.?

Start writing a plugin!

Boulder

4th February 2020, 17:30

Start writing a plugin!

I'm just a little bit too dumb for that.. not much but enough to not push me over the edge :)
I was just thinking if it was possible to port the implementation already included in native Avisynth+.

Nevertheless, thanks for the work. I tested the plugin briefly in Avs+ to adjust audio (5.1ch FLAC file) tempo for an ugly 24.975 -> 25 fps conversion. It worked without a hitch :thanks:

Myrsloik

4th February 2020, 17:43

I'm just a little bit too dumb for that.. not much but enough to not push me over the edge :)
I was just thinking if it was possible to port the implementation already included in native Avisynth+.

Nevertheless, thanks for the work. I tested the plugin briefly in Avs+ to adjust audio (5.1ch FLAC file) tempo for an ugly 24.975 -> 25 fps conversion. It worked without a hitch :thanks:

I'm sure someone (maybe even me) will start to create/port the useful stuff seen in Avisynth once I get a little bit further with the official audio support.

Myrsloik

9th May 2020, 22:47

That problem only occurs with w64 header, raw output is fine.

You found an 11 year old bug from the original AVFS author. Congratulations!

So much for borrowing the w64 code from there. Will be fixed in the next version.

Myrsloik

2nd June 2020, 20:57

Another audio update:

Progress is in fact being made but a few problems have become apparent in the design so that's why there hasn't been a new build yet. Expect something functional later this month.

Myrsloik

9th June 2020, 19:00

I've released test2. It's now more or less feature complete and fixes the issues of test1 (incorrect wave64 headers and so on). It also adds the functions ShuffleChannels, SplitChannels and AudioMix. See the bundled documentation for usage examples.

YOU WILL NEED TO UPDATE BESTAUDIOSOURCE OR IT WILL CRASH.

Comment on things in general. Especially how to make the audio function syntax suck less, they all make shuffleplanes look trivial in comparison.

TEST IT!

DJATOM

10th June 2020, 13:55

Now it seems w64 files are valid, but some programs don't support w64. It would be nice to have an option to output with a WAV header as well.
Also, I'd like to know why the audio is upsampled to 32 bit (from the 24 bit reported in mediainfo).
C:\Temp\VapourSynth64-Portable-R50-audio>VSPipe.exe -i test_aud.vpy -
Samples: 68260320
Sample Rate: 48000
Format Name: Audio32
Sample Type: Integer
Bits: 32
Channels: 2
Layout: back center, side left, side right, top center, top front left, top front center, top front right, top back left
The layout also looks weird to me; there are only 2 channels.

Mediainfo from the input file:
General
ID : 0 (0x0)
Complete name : 00004.m2ts
Format : BDAV
Format/Info : Blu-ray Video
File size : 6.01 GiB
Duration : 23 min 42 s
Overall bit rate mode : Variable
Overall bit rate : 36.3 Mb/s
Maximum Overall bit rate : 48.0 Mb/s

Video
ID : 4113 (0x1011)
Menu ID : 1 (0x1)
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L4.1
Format settings : CABAC / 4 Ref Frames
Format settings, CABAC : Yes
Format settings, Reference frames : 4 frames
Codec ID : 27
Duration : 23 min 42 s
Bit rate mode : Variable
Bit rate : 30.2 Mb/s
Maximum bit rate : 40.0 Mb/s
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate : 23.976 (24000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.608
Stream size : 5.01 GiB (83%)

Audio #1
ID : 4352 (0x1100)
Menu ID : 1 (0x1)
Format : PCM
Format settings : Big / Signed
Muxing mode : Blu-ray
Codec ID : 128
Duration : 23 min 42 s
Bit rate mode : Constant
Bit rate : 2 304 kb/s
Channel(s) : 2 channels
Channel layout : L R
Sampling rate : 48.0 kHz
Bit depth : 24 bits
Stream size : 391 MiB (6%)

Audio #2
ID : 4353 (0x1101)
Menu ID : 1 (0x1)
Format : PCM
Format settings : Big / Signed
Muxing mode : Blu-ray
Codec ID : 128
Duration : 23 min 42 s
Bit rate mode : Constant
Bit rate : 2 304 kb/s
Channel(s) : 2 channels
Channel layout : L R
Sampling rate : 48.0 kHz
Bit depth : 24 bits
Stream size : 391 MiB (6%)

Myrsloik

10th June 2020, 16:10

The upsampling, if any, happens inside ffmpeg. Maybe send me a short sample file and I'll take a look at it.

And the channel listing. Maybe I forgot to fix it...

Myrsloik

13th June 2020, 13:07

A lot of bugs were found and fixed that prevented proper operation. Added wav header support to vspipe. The audio output bitdepth now gets rounded up to the nearest multiple of 8 bits meaning that 24 bit output works as expected.

YOU WILL NEED TO UPDATE BESTAUDIOSOURCE OR IT WON'T BEHAVE PROPERLY.

Test again and see how soon it explodes.

DJATOM

13th June 2020, 15:02

No major problems spotted so far (with 24 bit stereo PCM muxed into m2ts container). I tried to trim audio by calculating samples per video frame and it worked.
The only problem I ran into is
>>> aclip.format._as_dict()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "src\cython\vapoursynth.pyx", line 980, in vapoursynth.AudioFormat._as_dict
AttributeError: 'vapoursynth.AudioFormat' object has no attribute 'samplesPerFrame'
while
>>> vclip.format._as_dict()
{'color_family': <ColorFamily.YUV: 3000000>, 'sample_type': <SampleType.INTEGER: 0>, 'bits_per_sample': 8, 'subsampling_w': 1, 'subsampling_h': 1}
works.
Upd.: proposed a PR for it.
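
The "samples per video frame" calculation mentioned above can be sketched as follows. The 24000/1001 fps and 48000 Hz values are taken from the MediaInfo output posted earlier in the thread; the function names are made up for illustration, and the result is an inclusive range of the kind AudioTrim expects.

```python
from fractions import Fraction
import math


def frame_to_sample(frame, fps, sample_rate):
    """Index of the audio sample at the start of video frame `frame`."""
    return math.floor(Fraction(frame) * sample_rate / fps)


def trim_args(first_frame, last_frame, fps, sample_rate):
    """Inclusive (first, last) sample range covering the inclusive frame range."""
    first = frame_to_sample(first_frame, fps, sample_rate)
    last = frame_to_sample(last_frame + 1, fps, sample_rate) - 1
    return first, last


FPS = Fraction(24000, 1001)
print(trim_args(0, 23, FPS, 48000))     # (0, 48047)
print(trim_args(100, 199, FPS, 48000))  # (200200, 400399)
```

Using Fraction keeps the arithmetic exact; at these rates each video frame spans exactly 2002 samples, so no rounding drift accumulates.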

Myrsloik

13th June 2020, 17:19

I sneakily updated the test3 build with some more fixes and improvements on what happens with "too long" audio in certain cases.

Try again and see what you find.

tebasuna51

15th June 2020, 19:20

Now it seems w64 files are valid...

Doesn't work for me.

Using VapourSynth64-Portable-R50-audio3.7z in W10

from vapoursynth import core
c = core.bas.Source(r"C:\Test\5_1.ac3")
c.set_output(0)

C:\Test\Vapour>VSPipe.exe zzaud.vpy zzaud.raw
[ac3 @ 000001C20EF50100] Estimating duration from bitrate, this may be inaccurate
Output 24 frames in 0.05 seconds (491.17 fps)

Correct raw pcm data 6 chan, 32 float, 48000 samplerate

C:\Test\Vapour>VSPipe.exe -y zzaud.vpy zzaud.wav
[ac3 @ 000002B824F81900] Estimating duration from bitrate, this may be inaccurate
Output 24 frames in 0.05 seconds (491.92 fps)

That writes the same raw audio data, but with 24 FRAME/n headers, one every 576000 bytes (0.5 sec).

I need to use:

VSPipe.exe zzaud.vpy - | wavfix (https://forum.doom9.org/showthread.php?p=1520399#post1520399) - zzaud.w64 -o 3 -m 0 -f 32 -c 6

or, if we expect less than 4 GB:

VSPipe.exe zzaud.vpy - | wavfix (https://forum.doom9.org/showthread.php?p=1520399#post1520399) - zzaud.wav -m 0 -f 32 -c 6
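
The 576000-byte figure above is consistent with the stream parameters reported in this post. A quick check in Python, assuming 24000 samples per audio frame (which is what 0.5 sec at 48000 Hz implies) and byte sizes rounded up to whole bytes; the helper names are hypothetical:

```python
def bytes_per_sample(bits):
    """Round a bit depth up to a whole number of bytes."""
    return (bits + 7) // 8


def frame_bytes(samples_per_frame, channels, bits):
    """Size in bytes of one audio frame of raw PCM data."""
    return samples_per_frame * channels * bytes_per_sample(bits)


print(frame_bytes(24000, 6, 32))  # 576000
print(24000 / 48000)              # 0.5 (seconds per frame)
print(bytes_per_sample(24))       # 3
```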

tebasuna51

15th June 2020, 19:35

The audio output bitdepth now gets rounded up to the nearest multiple of 8 bits meaning that 24 bit output works as expected.

The expected output of any lossy audio decoder should be 32-bit (or 64-bit) float; only lossless audio should be output as 8, 16 or 24-bit int.

Avisynth has ConvertAudio (http://avisynth.nl/index.php/ConvertAudio) functions, but they must never be used by default. Please leave it at 32-bit float.

Myrsloik

15th June 2020, 20:32

Doesn't work for me.

Using VapourSynth64-Portable-R50-audio3.7z in W10

Correct raw pcm data 6 chan, 32 float, 48000 samplerate

That writes the same raw audio data, but with 24 FRAME/n headers, one every 576000 bytes (0.5 sec).

I need to use:

VSPipe.exe zzaud.vpy - | wavfix (https://forum.doom9.org/showthread.php?p=1520399#post1520399) - zzaud.w64 -o 3 -m 0 -f 32 -c 6

or, if we expect less than 4 GB:

VSPipe.exe zzaud.vpy - | wavfix (https://forum.doom9.org/showthread.php?p=1520399#post1520399) - zzaud.wav -m 0 -f 32 -c 6

Can you provide a short test file? Maybe it's a specific type of ac3 that triggers it since throwing a wav file at it didn't do anything.

I've fixed the header problem so you can't mismatch it now. The test3 build has been sneakily updated so download it again.

Myrsloik

15th June 2020, 20:43

The expected output of any lossy audio decoder should be 32-bit (or 64-bit) float; only lossless audio should be output as 8, 16 or 24-bit int.

Avisynth has ConvertAudio (http://avisynth.nl/index.php/ConvertAudio) functions, but they must never be used by default. Please leave it at 32-bit float.

Nope. Complete and utter BS and a waste of bits. For mp3/aac compressed from 16 bit sources you'd just get a nice wasteful format where OVER HALF THE BITS ARE COMPRESSION NOISE. Take your audio placebo and go somewhere else. Moar bits=BETTERAR is the new brain rot of the doom9 forums.

tebasuna51

15th June 2020, 21:00

I used the -y parameter because --wav doesn't work with VapourSynth64-Portable-R50-audio2.

Now --wav and --w64 work with VapourSynth64-Portable-R50-audio3.

BTW, both headers (wav and w64) must be WAVE_FORMAT_EXTENSIBLE for multichannel audio; simple headers are only allowed for mono or stereo.

tebasuna51

15th June 2020, 21:02

Nope. Complete and utter BS and a waste of bits.

But... do you want to make proper software or only a joke?

feisty2

16th June 2020, 00:24

lossy audio files are assumed to be fp32 when imported in Audition

tebasuna51

16th June 2020, 11:36

Here are 3 valid AC3 files with 3 channels each, along with their info logs and those of the wav/w64 files decoded by ffmpeg.

The relevant info is:

File ........: 3p_LR_C.ac3
Audio coding mode (acmod) ...: 3 (3/0 - L, C, R)
Low frequency effects channel: 0 (Not present)
File ........: 3p_LR_C.ac3_.wav or .w64
MaskChannels : 7 (FL FR FC)

File ........: 3p_LR_LFE.ac3
Audio coding mode (acmod) ...: 2 (2/0 - L, R)
Low frequency effects channel: 1 (Present)
File ........: 3p_LR_LFE.ac3_.w64 or .wav
MaskChannels : 11 (FL FR LF)

File ........: 3p_LR_S.ac3
Audio coding mode (acmod) ...: 4 (2/1 - L, R, S)
Low frequency effects channel: 0 (Not present)
File ........: 3p_LR_S.ac3_.wav or w64
MaskChannels : 259 (FL FR BC)

The decoder knows the audio coding mode of the AC3, so the output must write the correct MaskChannels.
AviSynth can't obtain the MaskChannels from the decoder; maybe VapourSynth can.
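
As a cross-check of the MaskChannels values listed above, here is a sketch that builds the mask from the speaker position bits defined for WAVE_FORMAT_EXTENSIBLE. The short speaker names are abbreviations chosen for this example (the list above writes "LF" for the low frequency channel, called "LFE" here); they are not identifiers from VapourSynth or ffmpeg.

```python
# Speaker position bits as defined for the WAVE_FORMAT_EXTENSIBLE channel mask.
SPEAKER_BITS = {
    "FL": 0x1, "FR": 0x2, "FC": 0x4, "LFE": 0x8,
    "BL": 0x10, "BR": 0x20, "FLC": 0x40, "FRC": 0x80,
    "BC": 0x100, "SL": 0x200, "SR": 0x400,
}


def channel_mask(speakers):
    """OR together the bits of every speaker present in the stream."""
    mask = 0
    for name in speakers:
        mask |= SPEAKER_BITS[name]
    return mask


print(channel_mask(["FL", "FR", "FC"]))   # 7   (acmod 3, no LFE)
print(channel_mask(["FL", "FR", "LFE"]))  # 11  (acmod 2 plus LFE)
print(channel_mask(["FL", "FR", "BC"]))   # 259 (acmod 4, no LFE)
```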

Myrsloik

16th June 2020, 11:53

It's already in there if you print(clip.format). I just didn't add WAVE_FORMAT_EXTENSIBLE to the output headers yet.

Audio Format Descriptor
Id: 11001000
Name: Audio32F
Sample Type: Float
Bits Per Sample: 32
Bytes Per Sample: 4
Samples Per Frame: 24000
Channels: FRONT_LEFT, FRONT_RIGHT, BACK_CENTER

tebasuna51

16th June 2020, 13:12

It's already in there if you print(clip.format). I just didn't add WAVE_FORMAT_EXTENSIBLE to the output headers yet.

Good to know, thanks.

Myrsloik

16th June 2020, 22:27

I've sneakily updated the test3 build again with waveformatextensible support. Note that it's always on (unlike avisynth).

tebasuna51

17th June 2020, 10:53

Now it works fine, thanks.

At the moment, from what I read, there are:
Audio filters

BestAudioSource – a new sample accurate but somewhat slow FFmpeg based source filter (usage: core.bas.Source("rule6.mp4"))
BlankAudio – a classic
AudioSplice and AudioTrim – with the expected Python overloads of course

I checked clip.std.AudioTrim(0, 100000)
and I obtained 100001 audio samples.

Are there any docs on how to use BlankAudio, AudioSplice and other future functions?

BestAudioSource seems to decode ac3, eac3, aac, flac and mp3, but not yet:
opus (vapoursynth.Error: Couldn't open 'C:\Test\MultiCan\Test_.opus')
dts (Filter Source has no audio samples in the output.)
thd (Filter Source has no audio samples in the output.)

Myrsloik

17th June 2020, 12:24

Now it works fine, thanks.

At the moment, from what I read, there are:

I checked clip.std.AudioTrim(0, 100000)
and I obtained 100001 audio samples.

Are there any docs on how to use BlankAudio, AudioSplice and other future functions?

BestAudioSource seems to decode ac3, eac3, aac, flac and mp3, but not yet:
opus (vapoursynth.Error: Couldn't open 'C:\Test\MultiCan\Test_.opus')
dts (Filter Source has no audio samples in the output.)
thd (Filter Source has no audio samples in the output.)

The range is inclusive just like trim in avisynth. Note that python array slicing syntax (clip[0:100000]) does not include the last value and will return 100000 samples.

Check the included local documentation. It's mostly complete.

BestAudioSource is compiled without additional decoding libraries since it's a huge pain in the ass to do. Especially when developing. More compatible builds will probably be created by someone once I declare development mostly finished.
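
The difference between the inclusive trim range and Python slicing described in this post can be illustrated without VapourSynth. In this sketch a plain range object stands in for a clip's sample indices, and trim_to_slice is a made-up helper name:

```python
def trim_to_slice(first, last):
    """An inclusive (first, last) range corresponds to a slice ending one later."""
    return slice(first, last + 1)


samples = range(200000)  # stand-in for the sample indices of an audio clip

trimmed = samples[trim_to_slice(0, 100000)]  # like AudioTrim(0, 100000)
sliced = samples[0:100000]                   # like clip[0:100000]

print(len(trimmed), len(sliced))  # 100001 100000
```

So to get exactly 100000 samples with the function form, the last argument would be 99999.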

Lypheo

18th June 2020, 16:15

Do you plan on adding documentation for the audio filter api?

Myrsloik

18th June 2020, 16:26

Do you plan on adding documentation for the audio filter api?

Yes, at some later point. It works 99% like video filters and you can look at audiofilters.cpp to see how they're implemented.

The only quirk you really need to know is that the audio frame size is currently always the same for all formats. And that I'll probably use this branch to clean up the API in general so at some point trivial changes and a quick recompile may be necessary.

l33tmeatwad

18th June 2020, 21:27

I see this is on GitHub, is there currently an easy way to compile this for testing on unix based systems?

Myrsloik

18th June 2020, 23:36

I see this is on GitHub, is there currently an easy way to compile this for testing on unix based systems?

The build system should work. Or at most only miss audiofilters.cpp in the list of files to compile.

Myrsloik

19th June 2020, 11:30

The build system is now updated so all you not-windows-people can test things out.

l33tmeatwad

19th June 2020, 17:16

In case anyone was curious, if you have a VapourSynth enabled copy of FFMPEG and want to encode video and audio at the same time it works rather nicely.

from vapoursynth import core
video = core.ffms2.Source('8bit Sample.mp4')
audio = core.bas.Source('8bit Sample.mp4')

video.set_output(0)
audio.set_output(1)

vspipe --wav -o 1 Sample.vpy - | ./ffmpeg -f vapoursynth -i Sample.vpy -i pipe: -map 0:0 -map 1:0 -f AVI -c:v utvideo -pix_fmt yuv420p -colorspace bt709 -c:a pcm_s16le -y sample.avi

poisondeathray

20th June 2020, 18:28

Something is buggy with the video for R50 vapoursynth audio3 version (even without audio) when using LWLibavSource (non indexed) for MP4. It previews ok; but it crashes when using vspipe to ffmpeg, or even the vapoursynth editor internal benchmark F7 , when loading a simple youtube video, or other random MP4 video

clip = core.lsmas.LWLibavSource(r'video.mp4')
clip.set_output()

But it works ok when indexed with LibavSMASHSource

clip = core.lsmas.LibavSMASHSource(r'video.mp4')
clip.set_output()

Switching back to R50 "no audio" vapoursynth version fixes LWLibavSource

EDIT: actually I'm not sure that's it. It seems to crash randomly, even with LibavSMASHSource +/- filters. Adding filters causes it to crash faster. But the no audio R50 version works ok

Myrsloik

20th June 2020, 21:28

Define crashes. Which os? Does it happen with other source filters?

poisondeathray

20th June 2020, 21:38

Define crashes. Which os? Does it happen with other source filters?

In vsedit benchmark (F7), "script editor has stopped working" . "A problem caused the program to stop working correctly. Please close the program"

Win8.1

Yes, ffms2 script crashes too when some filter is added e.g. clip = haf.SmoothLevels(clip, 18,1,235,0,255) , but not with source filter ffms2 alone

Or internal levels filter, in case there was an issue with haf.SmoothLevels

clip = core.ffms2.Source(r'video.mp4')
clip = core.std.Levels(clip, min_in=0, max_in=0, gamma=1, min_out=255, max_out=235, planes=[0,1,2])
clip.set_output()

But blankclip + filter works ok

Myrsloik

24th June 2020, 16:13

In vsedit benchmark (F7), "script editor has stopped working" . "A problem caused the program to stop working correctly. Please close the program"

Win8.1

Yes, ffms2 script crashes too when some filter is added e.g. clip = haf.SmoothLevels(clip, 18,1,235,0,255) , but not with source filter ffms2 alone

Or internal levels filter, in case there was an issue with haf.SmoothLevels

clip = core.ffms2.Source(r'video.mp4')
clip = core.std.Levels(clip, min_in=0, max_in=0, gamma=1, min_out=255, max_out=235, planes=[0,1,2])
clip.set_output()

But blankclip + filter works ok

Reproduced but will definitely take a while to figure out why it happens.