VapourSynth - AUDIO SUPPORT AND NEW API BETA [Archive]

Myrsloik

23rd July 2021, 22:07

It's finally happened!

Biggest changes:

Audio support
Cleaner API
More performance

I'll provide a longer post later but things have now reached a state I'm happy with in general. Basically full backwards compatibility is provided with only a few minor caveats:

YCoCg is no longer a separate format so any scripts that reference it will break (now fixed in all known scripts, simply update your things).
Filters that output multiple clips (old api FFMS2 in alpha mode) will simply have outputs beyond the first discarded.
Compat formats have been removed so applications that rely on them for output are broken.
Avisynth filters now all have a secret argument called compatpack added, set it to True to pack YV16 and RGB24 before passing to the filter.
get_read/write_array() functions have been removed in Python and replaced by a better implementation that can't accidentally cause access violations
Histogram filter no longer bundled in windows installer since it's not a part of the VS source tree

Note that none of the previously compiled plugins for api4 will work so use the provided versions in this release. If you want to do speed comparisons I'd recommend using the linked FFMS2 binary which supports both old and new VS APIs. The alpha plane is now attached to the main frame and has to be extracted with PropToClip().

https://github.com/vapoursynth/vapoursynth/releases/tag/R55-API4-RC2
https://github.com/vapoursynth/vapoursynth/releases/tag/R54-API4-test1 <- get filter binaries from this release

DJATOM

24th July 2021, 13:27

Apparently removing COMPAT stuff broke existing script previewers
Failed to convert to RGB:
The VSVideoInfo structure passed by Spline16 is invalid.
Couldn't create preview node for output number 0.

Myrsloik

24th July 2021, 13:34

Apparently removing COMPAT stuff broke existing script previewers
Failed to convert to RGB:
The VSVideoInfo structure passed by Spline16 is invalid.
Couldn't create preview node for output number 0.

Anything VFW/AVFS based still works. Also, compat formats needed to die sooner or later

ChaosKing

24th July 2021, 14:52

Can't test ffms2 v4 in vdub. I always get this error if I click on "run video analysis pass":
An out-of-bounds memory access (access violation) occurred in module 'VirtualDub64'...
...reading address 0000000000000290.

Scrolling through the video can also lead to a crash.

Old ffms2 is ok.

Tested on Win 10 x64, Ryzen 3600 CPU. Vdub2 44282
Video source mpeg2 dvd muxed to mkv.

EDIT
And it seems that old ffms2 is very slow in this build. The dvd clip went from 220fps to like 6fps after 2000 frames. 1080p avc clip starts with 8fps. Something seems not right.

Myrsloik

24th July 2021, 15:31

Can't test ffms2 v4 in vdub. I always get this error if I click on "run video analysis pass":
An out-of-bounds memory access (access violation) occurred in module 'VirtualDub64'...
...reading address 0000000000000290.

Scrolling through the video can also lead to a crash.

Old ffms2 is ok.

Tested on Win 10 x64, Ryzen 3600 CPU. Vdub2 44282
Video source mpeg2 dvd muxed to mkv.

EDIT
And it seems that old ffms2 is very slow in this build. The dvd clip went from 220fps to like 6fps after 2000 frames. 1080p avc clip starts with 8fps. Something seems not right.

Read the initial post: "Old api source filters that used nfMakeLinear (most of them since seeking is slooow) will perform worse until ported to the new API."

I'll see if I can figure out the crash

Myrsloik

24th July 2021, 15:54

Can't test ffms2 v4 in vdub. I always get this error if I click on "run video analysis pass":
An out-of-bounds memory access (access violation) occurred in module 'VirtualDub64'...
...reading address 0000000000000290.

Scrolling through the video can also lead to a crash.

Old ffms2 is ok.

Tested on Win 10 x64, Ryzen 3600 CPU. Vdub2 44282
Video source mpeg2 dvd muxed to mkv.

I can't reproduce the crash and I unfortunately don't have dvds muxed to mkv specifically.

ChaosKing

24th July 2021, 16:10

I can't reproduce the crash and I unfortunately don't have dvds muxed to mkv specifically.

It happens with other videos as well.

Myrsloik

24th July 2021, 17:04

It happens with other videos as well.

Found the bug. The binaries have been sneakily updated.

ChaosKing

24th July 2021, 18:50

No more crashes now :D
But there's still something wrong when using v3 api ffms2. It is painfully slow. 190 fps (api v4) vs 6fps (api v3)
I tried multiple ffms2 versions, same low fps. CPU usage seems similar to v4.

EDIT
will perform MUCH WORSE until ported to the new API.
But will it be that slow?

Myrsloik

24th July 2021, 19:34

No more crashes now :D
But there's still something wrong when using v3 api ffms2. It is painfully slow. 190 fps (api v4) vs 6fps (api v3)
I tried multiple ffms2 versions, same low fps. CPU usage seems similar to v4.

EDIT

But will it be that slow?

Yes, I suppose that's possible. I got ~1/15 speed with a quick qtgmc test. Basically all the logic that tried to make requests linear is gone and the new (and much better) system requires additional filter support.
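
To illustrate why out-of-order requests hurt a source filter where seeking is slow, here is a toy model in plain Python. It is only a sketch under stated assumptions and not how the VapourSynth core schedules anything: the decoder is assumed to deliver frame n cheaply only right after frame n - 1, and the names count_seeks and linearize are made up for this example.

```python
import heapq


def count_seeks(requests):
    """Count requests that are not the direct successor of the previous one.

    Toy assumption: a sequential decoder pays a seek for every such request.
    """
    seeks = 0
    position = -1  # nothing decoded yet, so frame 0 comes without a seek
    for n in requests:
        if n != position + 1:
            seeks += 1
        position = n
    return seeks


def linearize(requests, start=0):
    """Reorder requests with a small pending queue so they leave in order.

    Hypothetical stand-in for "logic that makes requests linear": a request
    is held back until every earlier frame has been released.
    """
    pending = []
    expected = start
    ordered = []
    for n in requests:
        heapq.heappush(pending, n)
        while pending and pending[0] == expected:
            ordered.append(heapq.heappop(pending))
            expected += 1
    # Anything still waiting (gaps in the request stream) is flushed sorted.
    while pending:
        ordered.append(heapq.heappop(pending))
    return ordered


# Threaded consumers tend to ask slightly out of order: 1, 0, 3, 2, ...
jittered = [n ^ 1 for n in range(1000)]

seeks_raw = count_seeks(jittered)
seeks_linear = count_seeks(linearize(jittered))
```

In this model every one of the 1000 jittered requests costs a seek, while the linearized stream costs none, which is the kind of gap that could explain a source filter dropping to a small fraction of its usual speed.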

If relevant source filters aren't updated by the time this branch is stable and tested maybe I'll add some kind of additional workaround.

ChaosKing

24th July 2021, 19:49

My results (vdub - run analysis pass)
R54_test1: ffms2 v4 + qtgmc() = 170fps
R54: ffms2 v3 + qtgmc() = 185fps

EDIT:
More reliable results with vspipe (run 3 times each):
R54 ffms2 v3 1080p source - 264.8 fps
R54 ffms2 v3 + qtgmc 480p source - 182.05 fps

R54_test1 ffms2 v4 1080p source - 264.3 fps
R54_test1 ffms2 v4 + qtgmc 480p source - 204.12 fps

Now test1 is faster with qtgmc xD

ChaosKing

24th July 2021, 20:28

Very heavy filtering test:
haf.MCTemporalDenoise + knlm.KNLMeansCL + haf.ContraSharpening + haf.FineDehalo + nnedi3_rpow2 + MfTurd + haf.FineDehalo + haf.LSFmod + f3kdb.Deband

R54: 11.3 fps
R54: 12.11 fps (max_cache_size = 15000) (I have 64gb ram)
R54_test1: 12.5 fps

I noticed that test1 does not show the "Script exceeded memory limit. Consider raising cache size." message. So caching is now smarter?

Myrsloik

24th July 2021, 20:35

Very heavy filtering test:
haf.MCTemporalDenoise + knlm.KNLMeansCL + haf.ContraSharpening + haf.FineDehalo + nnedi3_rpow2 + MfTurd + haf.FineDehalo + haf.LSFmod + f3kdb.Deband

R54: 11.3 fps
R54: 12.11 fps (max_cache_size = 15000) (I have 64gb ram)
R54_test1: 12.5 fps

I noticed that test1 does not show the "Script exceeded memory limit. Consider raising cache size." message. So caching is now smarter?

Yes, the cache insertion is completely reworked and it wouldn't surprise me if there're only 1/4 as many caches inserted (and in better places) for scripts like qtgmc.

feisty2

25th July 2021, 14:28

when will the documentation for API v4 be ready?

Myrsloik

25th July 2021, 14:37

when will the documentation for API v4 be ready?

You can find mostly complete notes here: https://github.com/vapoursynth/vapoursynth/blob/doodle1/APIV4%20changes.txt

For plain video filters it's mostly a question of renamed things and you should be able to convert just by looking at the filter samples.

_Al_

25th July 2021, 19:54

vsedit would need to be updated, it uses COMPATBGR32 for RGB conversion and then Qt's QImage.Format_RGB32 to convert it to a QPixmap. In Python (not C++) it does something along these lines:
img = QImage(compatbgr.get_frame(f).get_read_array(0), w, h, stride, QImage.Format_RGB32).mirrored()
pix = QPixmap.fromImage(img).scaled(scale_w, scale_h, **modes)

Myrsloik

25th July 2021, 21:25

vsedit would need to be updated, it uses COMPATBGR32 for RGB conversion and then Qt's QImage.Format_RGB32 to convert it to a QPixmap. In Python (not C++) it does something along these lines:
img = QImage(compatbgr.get_frame(f).get_read_array(0), w, h, stride, QImage.Format_RGB32).mirrored()
pix = QPixmap.fromImage(img).scaled(scale_w, scale_h, **modes)

Fortunately packing planar RGB24 is trivial so applications should get updated quickly and would still work with both. If someone wants to compile a list of all applications that break because of my changes that'd be useful. I already have a good idea about which source plugins need to be updated and I'll get to work on d2vsource tomorrow.
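
A minimal sketch of that packing step in plain Python, for anyone updating a previewer. The function name and parameters are made up for illustration. It assumes 8-bit planes with a row stride, and a little-endian machine where Format_RGB32 (0xffRRGGBB) is laid out in memory as B, G, R, 0xFF. The rows come out top-down; the .mirrored() call in the snippet above suggests the old compat frames were stored upside down, so a flip may no longer be needed.

```python
def pack_planar_rgb_to_bgr32(r, g, b, width, height, stride):
    """Interleave three 8-bit planes into packed B, G, R, 0xFF bytes.

    r, g, b -- bytes-like planes whose rows are `stride` bytes long
               (stride >= width, the rest of each row is padding)
    Returns height rows of width * 4 bytes with no padding between rows.
    """
    if stride < width:
        raise ValueError("stride must be at least width")
    out = bytearray(width * height * 4)
    opaque = b"\xff" * width
    for y in range(height):
        src = y * stride          # start of this row in each source plane
        dst = y * width * 4       # start of this row in the packed output
        end = dst + width * 4
        # Extended slices write every 4th byte, one colour channel at a time.
        out[dst + 0:end:4] = b[src:src + width]
        out[dst + 1:end:4] = g[src:src + width]
        out[dst + 2:end:4] = r[src:src + width]
        out[dst + 3:end:4] = opaque
    return bytes(out)


# 2x2 image, each plane padded to a stride of 4 bytes per row.
r_plane = bytes([1, 2, 0, 0, 3, 4, 0, 0])
g_plane = bytes([11, 12, 0, 0, 13, 14, 0, 0])
b_plane = bytes([21, 22, 0, 0, 23, 24, 0, 0])
packed = pack_planar_rgb_to_bgr32(r_plane, g_plane, b_plane, 2, 2, 4)
```

The resulting buffer has a stride of width * 4, which is the value to hand to QImage along with Format_RGB32. A real application would do this with numpy or in C++ for speed; the loop is only meant to show the layout.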

DJATOM

25th July 2021, 22:13

Afaik, https://github.com/Endilll/vapoursynth-preview and https://bitbucket.org/mystery_keeper/vapoursynth-editor/src/master/ are broken; both rely on Qt and Format_RGB32 (0xffRRGGBB).

Myrsloik

1st August 2021, 20:08

Test2 is released. Has fixes for crashes that could occur with certain filters but otherwise no real changes.

lansing

2nd August 2021, 15:56

Is RGB24 compatible with VirtualDub filters if I load them in the script?

l33tmeatwad

2nd August 2021, 16:19

If someone wants to compile a list of all applications that break because of my changes that'd be useful.
Yeah, having a master list would be super helpful.

Myrsloik

2nd August 2021, 18:21

Is RGB24 compatible with VirtualDub filters if I load them in the script?

No, not at the moment. Will work before I make a full release.

Myrsloik

2nd August 2021, 18:24

Yeah, having a master list would be super helpful.

Here's the list of known things so far. Note that several scripts got updated already and have been removed. I'm currently working on d2vsource and that should be ready sometime this week.

https://github.com/vapoursynth/vapoursynth/issues/718

Myrsloik

6th August 2021, 13:19

I'm adding back packed YUY2 and RGB32 support to Avisynth compatibility but I'm not sure how it should be done. Always packing YV16 to YUY2 will be counterproductive since nowadays more plugins support YV16 than YUY2 and it'll error out. Planar RGB to packed has similar problems. So I guess there are 2 options:

1. Inject a secret packing argument into every Avisynth function. Example: core.avs.GBlur(clip, usepacking=True)
2. Try to automatically detect things. Would be equivalent to first invoking a function using planar formats and if that fails pack inputs (if any suitable) to YUY2/RGB32 and retry. No idea how well this would work in practice.
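
Option 2 could look roughly like the following sketch. Everything here is hypothetical and written in plain Python just to show the control flow: formats are plain strings, FormatRejected stands in for whatever error an Avisynth function raises on an unsupported input, and invoke_with_fallback is not an existing function.

```python
class FormatRejected(Exception):
    """Raised by a (made-up) filter that cannot handle its input format."""


# Planar formats and the packed format they would be converted to on retry.
PACKED_EQUIVALENT = {"YV16": "YUY2", "PlanarRGB": "RGB32"}


def invoke_with_fallback(func, clip_format):
    """Call func with the planar format first, then retry once with packing."""
    try:
        return func(clip_format)
    except FormatRejected:
        packed = PACKED_EQUIVALENT.get(clip_format)
        if packed is None:
            raise  # nothing suitable to pack, report the original failure
        return func(packed)


def modern_filter(fmt):
    """Stand-in for a plugin that accepts planar input."""
    if fmt not in ("YV16", "PlanarRGB"):
        raise FormatRejected(fmt)
    return "processed " + fmt


def legacy_filter(fmt):
    """Stand-in for an old plugin that only understands packed input."""
    if fmt not in ("YUY2", "RGB32"):
        raise FormatRejected(fmt)
    return "processed " + fmt


result_modern = invoke_with_fallback(modern_filter, "YV16")
result_legacy = invoke_with_fallback(legacy_filter, "YV16")
```

The weak spot is visible even in the sketch: the retry only triggers if the filter actually fails on planar input. A filter that accepts the clip but misinterprets it would never reach the packed path, which is one argument for the explicit argument in option 1.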

cretindesalpes

9th August 2021, 22:57

Nice to see that audio will be supported in VS. But are you really going to support all kinds of different sample formats at the plug-in level? Because nowadays we only need float for audio processing (sometimes double for temporary calculations, but this is not needed for streaming between audio modules), and conversion happens only at storage or interface level.

DJATOM

10th August 2021, 00:01

Simple (trim, join, etc) filters can live without any processing, isn't it better to provide untouched data?

tebasuna51

10th August 2021, 08:35

Nice to see that audio will be supported in VS. But are you really going to support all kinds of different sample formats at the plug-in level? Because nowadays we only need float for audio processing (sometimes double for temporary calculations, but this is not needed for streaming between audio modules), and conversion happens only at storage or interface level.

My opinion was in the post https://forum.doom9.org/showthread.php?p=1917206#post1917206

Lossless functions can work with any format (and already work, more or less), but lossy functions only need the 32 bits float format (of course 64 bits float format is better but maybe in the future)

cretindesalpes

10th August 2021, 08:56

Any integer data up to 24 bits (even 25 bits IIRC) can be represented in 32-bit floating point data, and converted back and forth losslessly by trivial means. Just make sure not to enable dithering for the float to int conversion.
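
The round trip is easy to check with nothing but the standard library. This is a small sketch, not code from any audio plugin; the helper names are made up, and struct is used only to force values through a real 32-bit float since Python floats are doubles.

```python
import struct

SCALE = 1 << 23  # signed 24-bit samples span -2**23 .. 2**23 - 1


def to_float32(x):
    """Round a Python float (a double) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def int24_to_float(sample):
    """Map a signed 24-bit sample to a float32 value in [-1.0, 1.0)."""
    return to_float32(sample / SCALE)


def float_to_int24(value):
    """Map back to an integer sample, with plain rounding and no dithering."""
    return int(round(value * SCALE))


# Check the extremes plus a coarse sweep over the whole 24-bit range.
samples = [-(1 << 23), -1, 0, 1, (1 << 23) - 1] + list(range(-(1 << 23), 1 << 23, 4099))
lossless = all(float_to_int24(int24_to_float(s)) == s for s in samples)

# float32 has a 24-bit significand plus a sign bit, so signed 25-bit
# integers still fit exactly, while 2**24 + 1 is the first that does not.
fits_25_bit = to_float32(float(-(1 << 24))) == -(1 << 24) and to_float32(float((1 << 24) - 1)) == (1 << 24) - 1
first_failure = to_float32(float((1 << 24) + 1)) != (1 << 24) + 1
```

Scaling by a power of two only changes the exponent, so the division and multiplication are exact and the only possible source of loss is the float to int step, hence the warning about dithering.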

Myrsloik

10th August 2021, 09:19

Nice to see that audio will be supported in VS. But are you really going to support all kinds of different sample formats at the plug-in level? Because nowadays we only need float for audio processing (sometimes double for temporary calculations, but this is not needed for streaming between audio modules), and conversion happens only at storage or interface level.

I allow 16-32 bit audio and float. I suspect most plugins will settle on 16, 32 and float support or something similar but having the rest available for source filters to output is free.

Myrsloik

10th August 2021, 19:10

What would you say if I dropped 32 bit builds but in return added back windows 7 (x64) support (Python 3.8 modules in addition to whatever the latest Python is)?

Also, new test build coming soon with avisynth compatibility fully restored.

Reel.Deel

10th August 2021, 19:35

What would you say if I dropped 32 bit builds but in return added back windows 7 (x64) support (Python 3.8 modules in addition to whatever the latest Python is)?

Also, new test build coming soon with avisynth compatibility fully restored.

That sounds like a fair trade :D ... I have r52 (x64) installed because of that reason.

cretindesalpes

10th August 2021, 20:54

:thanks:

ChaosKing

10th August 2021, 23:22

Hmm I think a x64 win7 build is "more useful" than a x86 build.

Myrsloik

11th August 2021, 18:19

New version with exciting things:

Python 3.8 and 3.9 are supported now. And 32 bit compatibility remains as well for now I guess.

The portable version requires vs-detect-python.bat to be run once before using to select the correct dlls.

Avisynth compatibility has been restored and will always pass planar formats unless the avs function is called with the argument compatpack=True.
core.avs.GBlur(clip, compatpack=True)
This seems to be the most reasonable tradeoff in modern times since many filters actually support planar RGB nowadays. May require some scripts to be modified obviously but it's only mildly breaking at most.

Have fun with testing things. The only breaking changes remaining are to overhaul all the memory view stuff in Python. It's currently an access violation deathtrap.

vxzms

12th August 2021, 11:09

My friend and I did some simple tests in R55-API4-test3. Here are some of our findings between it and R54-Release:

1. Audio support is wonderful, but I don’t know if it can process lossy audio almost losslessly like ffmpeg, e.g. trim .aac / .m4a audio
2. Scripts on the new API use a lot less memory (reduced by about 30%), especially with CUDA-based plugins (reduced by about 80%)
3. Vspipe’s new option --filter-time is interesting; the --arg input format has also changed (bytes -> string), so users may need to pay attention if their vpy decodes bytes to a string
4. Plugins that have been ported to api4 get a small speed increase (about 10%; zimg resize may gain more). But even leaving aside plugins that use nfMakeLinear, api3 plugins also show varying degrees of slowdown (about 5%-15%), e.g. eedi2, znedi3, neo_f3kdb, descale and so on (tcanny and warp run at about their api3 speed). It is more obvious when multiple api3 filters are used, such as 2-pass eedi2 / znedi3, which generally become the bottleneck of the script. These are just part of the VS plugin library; most importantly, we can't be sure that these filters will be ported to api4 in the future, since some developers have reduced their activity, so I'd ask you to consider more compatibility with api3.

I don’t know if many other people are testing api4; I haven't seen much discussion on the forum.

* Correction to point 2 after a reminder: it may be bm3d.VAggregate rather than BM3DCUDA that reduced memory so much.

Myrsloik

12th August 2021, 11:53

My friend and I did some simple tests in R55-API4-test3. Here are some of our findings between it and R54-Release:

1. Audio support is wonderful, but I don’t know if it can process lossy audio almost losslessly like ffmpeg, e.g. trim .aac / .m4a audio
2. Scripts on the new API use a lot less memory (reduced by about 30%), especially with CUDA-based plugins (reduced by about 80%)
3. Vspipe’s new option --filter-time is interesting; the --arg input format has also changed (bytes -> string), so users may need to pay attention if their vpy decodes bytes to a string
4. Plugins that have been ported to api4 get a small speed increase (about 10%; zimg resize may gain more). But even leaving aside plugins that use nfMakeLinear, api3 plugins also show varying degrees of slowdown (about 5%-15%), e.g. eedi2, znedi3, neo_f3kdb, descale and so on (tcanny and warp run at about their api3 speed). It is more obvious when multiple api3 filters are used, such as 2-pass eedi2 / znedi3, which generally become the bottleneck of the script. These are just part of the VS plugin library; most importantly, we can't be sure that these filters will be ported to api4 in the future, since some developers have reduced their activity, so I'd ask you to consider more compatibility with api3.

I don’t know if many other people are testing api4; I haven't seen much discussion on the forum.

1. No, obviously not in the scope of this project.
2. The CUDA drop sounds too big to be correct.
3. The new behavior is based on the type hint in the passed VSMap. If used through vspipe it's always a utf-8 string which turns into the str type in Python. I think strings generally make more sense for what comes from a command line argument.
4. There's no speed difference for api3 and api4 video filters unless they use any flags (99% don't) in which case things may get weird. All speed differences seen are due to general threading changes in the core. At most you'd see a slight memory usage drop when using api4 filter since useless caches can be avoided.
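
For point 3, a script that has to run on both old and new builds could normalize the value before using it. A minimal sketch in plain Python; arg_as_text is a made-up helper and the example values simply stand in for whatever the script receives from the command line.

```python
def arg_as_text(value):
    """Return a command line argument as str, whichever type it arrived as.

    Older builds handed the script bytes, the new behavior described above
    hands it a str (decoded as utf-8).
    """
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode("utf-8")
    return str(value)


old_style = arg_as_text(b"clip.mkv")   # what a script used to receive
new_style = arg_as_text("clip.mkv")    # what it receives now
```

Scripts that call .decode() unconditionally will fail on the new builds because str has no decode method, which is the case the check guards against.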

Btw, what system did you test things on?

vxzms

12th August 2021, 12:32

1. No, obviously not in the scope of this project.
2. The CUDA drop sounds too big to be correct.
3. The new behavior is based on the type hint in the passed VSMap. If used through vspipe it's always a utf-8 string which turns into the str type in Python. I think strings generally make more sense for what comes from a command line argument.
4. There's no speed difference for api3 and api4 video filters unless they use any flags (99% don't) in which case things may get weird. All speed differences seen are due to general threading changes in the core. At most you'd see a slight memory usage drop when using api4 filter since useless caches can be avoided.

Btw, what system did you test things on?

Thanks for your reply. I tested on Windows 11 22000.120.

CUDA was tested by my friend, mainly (V-)BM3DCUDA. She is also on Win11.

Myrsloik

13th August 2021, 14:38

Does anyone actually use v210 output ever? As in set enable_v210 to true? I'm curious since it isn't the default.

DJATOM

13th August 2021, 16:44

Never needed to set this option before.

kedautinh12

13th August 2021, 17:05

Wow 3900x -> 5950x, waiting for 3090 to full option

DJATOM

13th August 2021, 23:06

Wow 3900x -> 5950x, waiting for 3090 to full option

I have bought it for $909 which I think is reasonable.
Maybe I will buy 4090 once it releases... I don't want to overpay for a card, so probably gonna pass on Ampere generation.

videoh

14th August 2021, 02:37

You're rich Me too.

poisondeathray

14th August 2021, 02:46

Does anyone actually use v210 output ever? As in set enable_v210 to true? I'm curious since it isn't the default.

Yes - when avfs is used for 10bit422 input into programs like retail NLE's

Myrsloik

14th August 2021, 10:52

Yes - when avfs is used for 10bit422 input into programs like retail NLE's

Ok, are there any other weird input formats NLEs support?

poisondeathray

14th August 2021, 14:10

Ok, are there any other weird input formats NLEs support?

Not really "weird" - v210 has been the de facto standard for uncompressed professional video for the last 15-20 years. "p210" is universally unsupported by those sorts of programs, including VFX and grading programs.

For many NLEs, if uncompressed 8bit 4:2:0 is supported, it's usually supported as "IYUV", and 8bit 4:2:2 as "UYVY".

v210 (10bit422), IYUV (8bit420) and UYVY (8bit422) are usually the "magical" pixel format configurations that get passthrough or proper YUV treatment on Windows-based NLEs. v210 also works on Mac, but "AVI" is not as compatible with most Mac programs.

YV12 (for 8bit 4:2:0) and YUY2 (for 8bit 4:2:2 "packed") or YV16 (for 8bit 4:2:2 "planar") as uncompressed imports usually get converted to RGB.
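
For reference, the two packed 8bit 4:2:2 layouts differ only in byte order: UYVY stores U0 Y0 V0 Y1 for each pair of pixels and YUY2 stores Y0 U0 Y1 V0. A small sketch in plain Python of packing one planar (YV16-style) row either way; pack_422_row is a made-up name and this is not taken from avfs or any plugin.

```python
def pack_422_row(y, u, v, order="UYVY"):
    """Pack one row of planar 8-bit 4:2:2 into 4 bytes per 2 pixels.

    y is a bytes object with an even number of luma samples;
    u and v each hold half as many chroma samples.
    """
    if len(y) % 2 or len(u) != len(y) // 2 or len(v) != len(y) // 2:
        raise ValueError("expected len(u) == len(v) == len(y) / 2")
    out = bytearray(len(y) * 2)
    if order == "UYVY":
        out[0::4], out[1::4], out[2::4], out[3::4] = u, y[0::2], v, y[1::2]
    elif order == "YUY2":
        out[0::4], out[1::4], out[2::4], out[3::4] = y[0::2], u, y[1::2], v
    else:
        raise ValueError("unknown packing order: " + order)
    return bytes(out)


# Four pixels: luma 1..4, two chroma samples per plane.
luma = bytes([1, 2, 3, 4])
cb = bytes([10, 11])
cr = bytes([20, 21])
uyvy_row = pack_422_row(luma, cb, cr, "UYVY")
yuy2_row = pack_422_row(luma, cb, cr, "YUY2")
```

Since the sample data is identical, offering UYVY next to YUY2 is mostly a matter of byte order and the fourcc that gets advertised.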

This has been mentioned before several times in other threads - but for the avfs side - MOV container emulation would be more compatible for some programs

Myrsloik

14th August 2021, 14:35

I could add alternate output support for IYUV and UYVY fourccs if it really matters.

Patches welcome for MOV support I guess. Personally I'd rather do other things.

poisondeathray

14th August 2021, 16:46

I could add alternate output support for IYUV and UYVY fourccs if it really matters.

I would put it lower on your "things to do" list. I'm sure there's more important stuff to do. But if you had to prioritize them: UYVY > IYUV

Myrsloik

14th August 2021, 22:31

I would put it lower on your "things to do" list. I'm sure there's more important stuff to do. But if you had to prioritize them: UYVY > IYUV

All you have to do is name some expensive enterprise NLEs that it'd improve compatibility with and I'll do it soon...

poisondeathray

15th August 2021, 15:49

All you have to do is name some expensive enterprise NLEs that it'd improve compatibility with and I'll do it soon...

UYVY will for sure - "Uncompressed video" to a NLE generally means UYVY for 8bit422 and v210 for 10bit422.

(Many do not handle uncompressed 8bit 4:2:0, but for the ones that do, the "magic" key is IYUV)

Most other variants/fourcc's are mishandled and converted to RGB (if they import at all)

(MOV support would increase compatibility too)

videoh

18th August 2021, 10:26

The gross performance degradation for API3 source filters with the beta API4 Vapoursynth effectively means you have abandoned backward compatibility for your API. That is a terrible thing. I strongly suggest you find a way to solve that.