vsFilterScript: writing C++ plugins like python scripts (WIP) [Archive]

feisty2

24th March 2020, 14:15

https://github.com/IFeelBloated/vsFilterScript

This is yet another C++ wrapper for VSAPI. However, it is much higher-level than vsxx and provides a "scripting"-like experience to help you sketch your filter as quickly as possible.

Take a look at the 3x3 Gaussian blur example (https://github.com/IFeelBloated/vsFilterScript/blob/master/Examples/GaussBlur.hxx): with fewer than 40 lines of code you've got your filter up and running. A temporal median example (https://github.com/IFeelBloated/vsFilterScript/blob/master/Examples/TemporalMedian.hxx) is also provided to show you how to write temporal or spatiotemporal filters.

There are 2 more examples, Crop (https://github.com/IFeelBloated/vsFilterScript/blob/master/Examples/Crop.hxx) and Rec601ToRGB (https://github.com/IFeelBloated/vsFilterScript/blob/master/Examples/Rec601ToRGB.hxx), showing filters that use more advanced VapourSynth features.
Crop (https://github.com/IFeelBloated/vsFilterScript/blob/master/Examples/Crop.hxx) shows you how to write filters that modify the dimensions of the input (e.g. frame size) and filters that adapt to inputs of arbitrary bit depth.
Rec601ToRGB (https://github.com/IFeelBloated/vsFilterScript/blob/master/Examples/Rec601ToRGB.hxx) converts a YUV444 clip to RGB using the Rec601 matrix; it shows you how to write filters that modify the format of the input (e.g. YUV->RGB) and how to manipulate frame properties.

C++20 support is required (you probably need GCC 10 from trunk). The scripting-style syntax is only possible with the latest C++ standard.

I haven't finished porting all the C APIs to this wrapper, but the filter skeleton generator is here: https://github.com/IFeelBloated/vsFilterScript/blob/master/Include/Interface.hxx. It requires certain properties, some constants and some member functions, as shown in the example filters. The skeleton generator works in a duck-typing manner: it generates a set of skeleton functions as long as the filter struct has all the required properties.

You should write each filter in a header file, include the headers in "EntryPoint.cxx (https://github.com/IFeelBloated/vsFilterScript/blob/master/Examples/EntryPoint.cxx)" and register each filter with "VaporInterface::RegisterFilter".

The "Clip" object can be accessed as a 4D array ([time (frame)][channel][height][width]) with "GetFrames" or as a 3D array ([channel][height][width]) with "GetFrame". The "time" dimension is relative to the current frame (t=0 for the current frame, t=-1 for the previous frame and t=1 for the next), while the other 3 dimensions are absolute. Out-of-bounds access is allowed and triggers automatic padding: its behavior is defined by concrete padding policies (repeat, reflect, zero...), and the default padding policy is "repeat" for both the spatial and temporal dimensions. More details about this part are in Plane.hxx (https://github.com/IFeelBloated/vsFilterScript/blob/master/Include/Plane.hxx), Frame.hxx (https://github.com/IFeelBloated/vsFilterScript/blob/master/Include/Frame.hxx) and Clip.hxx (https://github.com/IFeelBloated/vsFilterScript/blob/master/Include/Clip.hxx).
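To make the padding policies concrete, here is a minimal standalone sketch of how an out-of-bounds coordinate might be mapped back into range. It is not the code from Plane.hxx: `Padding`, `PaddedIndex` and `ReadPadded` are hypothetical names, and the "reflect" variant assumes the edge sample is not repeated (-1 maps to 1), which is only one of the common conventions.

```cpp
#include <cassert>

// Hypothetical padding policies, named after the ones listed above.
enum class Padding { Repeat, Reflect, Zero };

// Maps coordinate i on an axis of length n to the index that should be read.
// Returns -1 when the Zero policy means "use 0 instead of reading memory".
// Reflect mirrors around the edge sample without repeating it (-1 -> 1, n -> n - 2).
inline long PaddedIndex(long i, long n, Padding policy) {
    assert(n > 0);
    if (i >= 0 && i < n)
        return i;  // in bounds: the common case
    switch (policy) {
    case Padding::Repeat:
        return i < 0 ? 0 : n - 1;  // clamp to the nearest edge
    case Padding::Reflect: {
        if (n == 1)
            return 0;
        const long period = 2 * (n - 1);
        long m = i % period;
        if (m < 0)
            m += period;  // % can yield a negative result for negative i
        return m < n ? m : period - m;
    }
    case Padding::Zero:
    default:
        return -1;
    }
}

// Reads pixel (y, x) of a tightly packed, row-major float plane,
// applying the same policy on both spatial axes.
inline float ReadPadded(const float* plane, long width, long height,
                        long y, long x, Padding policy) {
    const long yy = PaddedIndex(y, height, policy);
    const long xx = PaddedIndex(x, width, policy);
    if (yy < 0 || xx < 0)
        return 0.0f;  // Zero policy: outside the plane reads as 0
    return plane[yy * width + xx];
}
```

The same mapping would cover the relative time axis: with repeat padding, the frame read for offset t from frame n is `PaddedIndex(n + t, numFrames, Padding::Repeat)`. The in-bounds test at the top is the kind of per-access check that the padding overhead discussed later in this thread refers to.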

latest update:
new functionality: full integration of C++ exceptions. With exceptions, you no longer have to manually handle any of the following errors:
a) failing to invoke an external plugin (plugin does not exist)
b) failing to invoke an external filter
c) failing to invoke a python function
d) failing to invoke SelfInvoker
... and possibly many more. SelfInvoker is now allowed to throw exceptions, so the earlier restriction requiring SelfInvoker to always be evaluated successfully has been removed. Any of these errors will pass transparently through your filters and propagate to a root caller like Create() (https://github.com/IFeelBloated/vsFilterScript/blob/master/include/Interface.vxx#L48), which handles the error automatically.
To you, it is as if these errors did not exist, so you NEVER have to write error-handling code for them. It's now one step closer to Python scripts.

Initialize() has been replaced by ordinary constructors because, with exceptions, a filter no longer has to return a value to report whether it was constructed successfully.
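The error-propagation model described above boils down to the usual C++ pattern of throwing deep in the call stack and catching once at the boundary. A minimal sketch, assuming hypothetical names (`InvokeExternalFilter`, `MyFilter`, `CreateFilter`) that stand in for the wrapper's real internals. Here the root caller just turns the exception into an error string; a real plugin would pass that message on to VapourSynth's error reporting.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for invoking an external filter: it throws on failure
// instead of returning an error code.
std::string InvokeExternalFilter(const std::string& plugin) {
    if (plugin != "std")
        throw std::runtime_error("plugin \"" + plugin + "\" does not exist");
    return "clip from " + plugin;
}

// Filter code contains no error handling: a failure simply propagates out of
// the constructor (which takes the place of an Initialize() that returns a status).
struct MyFilter {
    std::string output;
    explicit MyFilter(const std::string& plugin) : output(InvokeExternalFilter(plugin)) {}
};

// Hypothetical root caller, playing the role Create() plays in the wrapper:
// the only place that catches, converting any exception into an error message.
std::string CreateFilter(const std::string& plugin, std::string& errorOut) {
    try {
        MyFilter filter{plugin};
        errorOut.clear();
        return filter.output;
    } catch (const std::exception& e) {
        errorOut = std::string{"MyFilter: "} + e.what();
        return {};
    }
}
```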

Myrsloik

24th March 2020, 16:07

I'm curious, what does the actual generated code look like for this? What's the performance penalty for all the abstraction?

feisty2

24th March 2020, 18:50

I'm curious, what does the actual generated code look like for this?
oh I see, you meant generated (machine) code...

What's the performance penalty for all the abstraction?
The main runtime overhead is automatic padding (out-of-bounds access detection), which gcc -O3 seems to handle pretty well. Everything else is determined at compile time and is thus a zero-cost abstraction.

feisty2

25th March 2020, 10:09

did some speed tests,

clp = core.test.GaussBlur(clp)

runs @ 1822.59 fps at 640 x 480 GRAYS (compiled with GCC -O3)

clp = core.std.Convolution(clp, [1,2,1,2,4,2,1,2,1])

runs @ 2425.51 fps at 640 x 480 GRAYS

the comparison is not completely fair though: test.GaussBlur is plain C++ with no SIMD intrinsics (GCC doesn't seem to autovectorize any of its loops), while std.Convolution has hand-written AVX2 optimizations.

feisty2

26th March 2020, 12:31

new example: temporal median (https://github.com/IFeelBloated/vsFilterScript/blob/master/TemporalMedian.hxx)
44 lines of perfectly readable, script-like code, and probably even easier to write
vs
376 lines of cryptic C code (https://github.com/dubhater/vapoursynth-temporalmedian/blob/master/src/temporalmedian.cpp)

zorr

26th March 2020, 22:07

I have to say this looks very promising. I have a couple of python filters that are annoyingly slow but it seems too much work to port them to actual C/C++ plugins.

What's the performance delta with temporal median?

Are_

29th March 2020, 20:14

feisty2

29th March 2020, 20:24

I noticed when using the vsedit benchmark utility that the CPU cores are at 50% for this and at 20% for Convolution.
I reran the test with vspipe and got this:

14488fps all cores at ~85% load (Convolution)
4098fps all cores at 100% load (test)

emm... 14488fps is probably the effect of SIMD optimization. I'll write a gauss blur filter with the C API later and we'll see...

feisty2

29th March 2020, 21:29

@Are_
could you compile this GaussBlur filter written with low level APIs (https://github.com/IFeelBloated/test_c_filters/blob/master/GaussBlur.cxx) and run a speed test again?

Are_

29th March 2020, 21:40

11748fps all cores at 100% load

:(

feisty2

29th March 2020, 21:46

interesting... guess I'll have to profile it a little bit and find the main cause that's slowing it down

josemaria.alkala

30th March 2020, 15:55

I am amazed how fast this is. Where is the video you are testing?

My Nim version is dead slow (most likely my bad, not Nim's) when compared. I am getting something like 40fps :(

StainlessS

30th March 2020, 17:17

I am amazed how fast this is
Looks like GRAYS is a form of blankclip [so timing mostly speed of filter rather than video decoder etc].
From here:- https://forum.doom9.org/showthread.php?p=1905459#post1905459

clip = core.std.BlankClip(format=vs.GRAYS, length=100000, fpsnum=24000, fpsden=1001, keep=True)

josemaria.alkala

30th March 2020, 18:22

I just got a bit better: 80fps.

feisty2

31st March 2020, 09:40

I am amazed how fast this is. Where is the video you are testing?

My Nim version is dead slow (most likely my bad, not Nim's) when compared. I am getting something like 40fps :(

I'm actually more concerned with whether the runtime overhead for a given format is constant, so that it would be negligible compared to the actual algorithm in complex filters.

josemaria.alkala

31st March 2020, 16:36

I am sorry, I am not a professional developer (it is my hobby), so I am not sure I understand you correctly. Nim compiles to C. It is garbage collected, but you can disable the garbage collector (just pass "--gc:none"). So my understanding is that there is no runtime (so no overhead in that regard).

It is my first time dealing with memory, so I am probably doing something really bad. I have asked for advice here (https://forum.nim-lang.org/t/6134). You might want to take a look.

When I compile the following code:

import ../vapoursynth
import options

BlankClip( format=pfGrayS.int.some,
width=640.some,
height=480.some,
length=100000.some,
fpsnum=24000.some,
fpsden=1001.some, keep=1.some).Convolution(@[1.0,2.0,1.0,2.0,4.0,2.0,1.0,2.0,1.0]).Savey4m("/dev/null")

by means of:

$ nim c -f --threads:on --gc:none -d:release -d:danger modifyframe
$ time ./modifyframe
real 0m58,879s
user 0m54,969s
sys 0m7,433s

which is 1698fps.

This uses the Convolution filter plus a custom-made filter (Savey4m) that surely adds some overhead, even though it is sending the data to "/dev/null".

How do you read the memory once you have the plane's pointer? Could you send me a link to that particular piece of code? (I don't understand much C/C++, but I hope to understand enough.)

amichaelt

31st March 2020, 16:52

How do you read the memory once you have the plane's pointer?

Just looks to be a combination of pointer arithmetic and indexing using the [] operator.

josemaria.alkala

31st March 2020, 16:58

I am using the same approach (but I am surely adding a lot of overhead somewhere).

By the way, just for reference, I am using a laptop with an i7-4770HQ (4 cores and 8 GB).

amichaelt

31st March 2020, 16:59

I am using the same approach (but I am surely adding a lot of overhead somewhere).

By the way, just for reference, I am using a laptop with an i7-4770HQ (4 cores and 8 GB).

What does the generated C code look like, though? Are you sure there's no extra bounds checking or other code being inserted by the C code generator?

feisty2

31st March 2020, 17:03

How do you read the memory once you have the plane's pointer? Could you send me a link to that particular piece of code? (I don't understand much C/C++, but I hope to understand enough.)
https://github.com/IFeelBloated/vsFilterScript/blob/master/Plane.hxx#L47

also in your other post

The results are 80fps (for me) against about 2400fps for a C++ wrapper and 11748fps for a pure C version.

it's 4098fps for the C++ wrapper, not 2400fps (compared to 11748fps for the C version). You have to understand that these numbers can only be compared on the same machine, so you can't compare your numbers with the ones reported by Are_ and me, because they were measured on different machines.

Then, it seems that your Nim version operates on int8 clips, while the C and C++ plugins were written for fp32 clips. There's also a significant performance gap there, so you can't compare them like that.

josemaria.alkala

7th April 2020, 09:17

Could you try the convolution (https://github.com/mantielero/VapourSynth.nim/blob/master/src/filters/convolution) and the filter (https://github.com/mantielero/VapourSynth.nim/blob/master/src/filters/modifyframe)? Those are just Linux binaries that return:

$ ./modifyframe
Time : 164.1971027851105
Num. frames: 100000
FPS : 609.0241441767271

I would like to see how it performs on your system.

MeteorRain

8th April 2020, 13:26

I was wondering how fast temporal median can run, considering it'll be a lot slower when the radius is large.

If done in SIMD, I could probably use a sorting network, but I imagine it would still run in O(n^2).
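For reference, a minimal sketch of the sorting-network idea: an odd-even transposition network, which is simple and provably sorts N values in N rounds of compare-exchanges. It is an illustration only, not MeteorRain's implementation; the names are hypothetical and it works on scalars, whereas a SIMD version would run the same fixed sequence of compare-exchange steps on whole vectors of pixels (e.g. with AVX2 min/max instructions).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Compare-exchange: afterwards a <= b. Built from min/max only, with no
// data-dependent branches, which is what maps well onto SIMD min/max.
template <typename T>
inline void CompareExchange(T& a, T& b) {
    const T lo = std::min(a, b);
    const T hi = std::max(a, b);
    a = lo;
    b = hi;
}

// Median of N = 2 * radius + 1 samples via an odd-even transposition network.
// The comparator sequence is fixed (it never depends on the data); it uses
// N rounds of about N / 2 comparators each, i.e. O(N^2) comparators overall.
template <typename T, std::size_t N>
T NetworkMedian(std::array<T, N> v) {
    static_assert(N % 2 == 1, "a median needs an odd number of samples");
    for (std::size_t round = 0; round < N; ++round)
        for (std::size_t i = round % 2; i + 1 < N; i += 2)
            CompareExchange(v[i], v[i + 1]);
    return v[N / 2];  // the middle element of the sorted window
}
```

Because the comparator sequence is data-independent there is nothing to mispredict, which is why networks are attractive for SIMD even though the comparator count grows quadratically. Size-optimized networks for small N use fewer comparators than this simple one.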

josemaria.alkala

8th April 2020, 20:27

I managed to compare the pure C++ filter and a Nim-based version on my computer.

C++ Version
I took it from here (https://github.com/IFeelBloated/test_c_filters). I compiled it with:

g++ -Wall -O3 -shared -fPIC -I. -o libfilter.so GaussBlur.cxx

Create a VapourSynth Python script like test_filter.vpy:

import vapoursynth as vs
core = vs.get_core()
core.std.LoadPlugin(path='./libfilter.so')
core.std.SetMaxCPU('none')
clip = core.std.BlankClip(format=vs.GRAYS, length=100000, fpsnum=24000, fpsden=1001, keep=True)
clip = core.testc.GaussBlur(clip)
clip.set_output()

and finally:

$ vspipe test_filter.vpy /dev/null
Output 100000 frames in 29.53 seconds (3386.27 fps)

Nim Version
I use custom_filter.nim (https://github.com/mantielero/VapourSynth.nim/blob/master/test/custom_filter.nim) which uses DrawFrame.nim (https://github.com/mantielero/VapourSynth.nim/blob/master/test/DrawFrame.nim).

I compile it like:

$ nim c -f --threads:on -d:release -d:danger custom_filter

And test it by doing:

$ ./custom_filter
Time : 9.394126653671265
Num. frames: 100000
FPS : 10644.94909283766

Using int32.

If I use float32:

$ ./custom_filter
Time : 16.52139902114868
Num. frames: 100000
FPS : 6052.756178335272

I wasn't using multithreading before while vspipe does.

josemaria.alkala

8th April 2020, 20:33

Another test that I have done is using the Convolution filter from Nim: convolution.nim (https://github.com/mantielero/VapourSynth.nim/blob/master/test/convolution.nim) and I get:

$ ./convolution
Time : 5.210163354873657
Num. frames: 100000
FPS : 19193.25617813089

Using vspipe and python script:

$ vspipe convolution.vpy /dev/null
Output 100000 frames in 26.87 seconds (3721.76 fps)

I think that vspipe is actually writing the frames to /dev/null, while I just request the frames and then discard them without further processing.

Myrsloik

8th April 2020, 20:45

josemaria.alkala

8th April 2020, 21:25

It isn't any faster on my computer:

$ vspipe convolution.vpy .
Output 100000 frames in 26.94 seconds (3712.54 fps)
$ vspipe test_filter.vpy .
Output 100000 frames in 29.58 seconds (3380.22 fps)

MeteorRain

8th April 2020, 23:41

v = core.std.BlankClip(format=vs.GRAY8, length=10000, fpsnum=24000, fpsden=1001, keep=True)
v = core.neo_tmedian.TemporalMedian(v, radius=3)
v.set_output()

Output 10000 frames in 6.10 seconds (1638.35 fps)

radius=10

Output 10000 frames in 18.55 seconds (539.06 fps)

That uses the same algorithm as F2's: std::nth_element.

EDIT:

* On radius=2

C code:
Output 10000 frames in 5.83 seconds (1714.24 fps)

AVX2 code:
Output 10000 frames in 0.41 seconds (24321.61 fps)

* On radius=3

C code:
Output 10000 frames in 6.10 seconds (1638.35 fps)

AVX2 code:
Output 10000 frames in 0.52 seconds (19154.49 fps)

* On radius=4

C code:
Output 10000 frames in 9.45 seconds (1057.94 fps)

AVX2 code:
Output 10000 frames in 0.64 seconds (15681.07 fps)
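A scalar sketch of the std::nth_element approach for one pixel position, assuming a hypothetical layout where `frames[k]` holds that pixel's value in frame k and out-of-range frames are clamped (repeat padding). It is not the code from either plugin:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Temporal median at frame n for one pixel position: gathers the
// 2 * radius + 1 samples around n and partially sorts them with
// std::nth_element, which only guarantees the middle element is in place.
template <typename T>
T TemporalMedianAt(const std::vector<T>& frames, long n, long radius) {
    assert(!frames.empty() && radius >= 0);
    const long last = static_cast<long>(frames.size()) - 1;
    std::vector<T> window;
    window.reserve(static_cast<std::size_t>(2 * radius + 1));
    for (long t = -radius; t <= radius; ++t)
        window.push_back(frames[std::clamp(n + t, 0L, last)]);  // repeat padding in time
    auto mid = window.begin() + radius;  // middle of the 2 * radius + 1 samples
    std::nth_element(window.begin(), mid, window.end());
    return *mid;
}
```

std::nth_element runs in linear time on average, so the per-pixel cost grows roughly with the window size 2 * radius + 1; the AVX2 numbers above show how much a vectorized implementation can gain over this scalar approach.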

feisty2

11th April 2020, 14:19

update:
new functionality: Buffer (https://github.com/IFeelBloated/vsFilterScript/blob/master/Buffer.hxx), a writable matrix similar to a Plane. You may access the buffer as a plane to enable automatic padding once you're done writing it, which is useful for filters that generate intermediate representations of the input clip.

revamped: the argument extraction process now has a prettier, more pythonic syntax.

to get the argument for a parameter:

param = Arguments["param"];

to get the argument for an optional parameter:

if (Arguments["param"].Exists())
    param = Arguments["param"];

to get the argument for a parameter array:

for (auto x : Range{ param_arr.size() })
    if (Arguments["param"][x].Exists())
        param_arr[x] = Arguments["param"][x];

or

for (auto x : Range{ Arguments["param"].Size() })
    param_arr[x] = Arguments["param"][x];

feisty2

11th April 2020, 21:06

update:
revamped: even prettier syntax for parameter arrays

you may also fetch each element for a parameter array with range-for syntax

for (auto x : Arguments["param"])
    param_arr.push_back(x);

revamped: you no longer have to prefix your error messages with "Name + ...": the filter name is now added automatically via reflection.

// this is now deprecated
Console.RaiseError(Name + ": some error message"s);

//do this instead
Console.RaiseError("some error message");

feisty2

11th April 2020, 21:43

I'm thinking of adding support for frame properties, but very few filters seem to use them, and I can't find an example filter to test whether my implementation is correct.

MeteorRain

11th April 2020, 22:12

Interlaced content? There should be some IVTC/deinterlace filters that check tff/bff.

feisty2

12th April 2020, 21:44

update:
bug fixes: apparently, in rare cases stride / sizeof(PixelType) != width; fixed.
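To illustrate why the stride matters, here is a minimal sketch that walks a plane row by row using a stride measured in bytes (which is how VapourSynth's getStride reports it) instead of assuming rows are exactly `width` pixels apart. `RowPointer` and `SumPlane` are hypothetical helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns a pointer to row y of a plane. strideBytes is the distance in bytes
// between the starts of consecutive rows; it may exceed width * sizeof(PixelType)
// because of padding added for alignment.
template <typename PixelType>
const PixelType* RowPointer(const std::uint8_t* base, std::ptrdiff_t strideBytes, std::ptrdiff_t y) {
    return reinterpret_cast<const PixelType*>(base + y * strideBytes);
}

// Sums a width x height plane, advancing by stride rather than by width.
// Using width as the row pitch would read the wrong pixels whenever
// strideBytes / sizeof(PixelType) != width.
template <typename PixelType>
double SumPlane(const std::uint8_t* base, std::ptrdiff_t strideBytes,
                std::ptrdiff_t width, std::ptrdiff_t height) {
    assert(strideBytes >= width * static_cast<std::ptrdiff_t>(sizeof(PixelType)));
    double sum = 0.0;
    for (std::ptrdiff_t y = 0; y < height; ++y) {
        const PixelType* row = RowPointer<PixelType>(base, strideBytes, y);
        for (std::ptrdiff_t x = 0; x < width; ++x)
            sum += row[x];
    }
    return sum;
}
```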

new functionality: ability to manipulate frame properties. You can read the properties attached to the source frame much like you fetch arguments for parameters, except that you may have to manually cast a property back to its original type.

to get the "_Matrix" property:

auto matrix = static_cast<int>(srcFrame["_Matrix"]);

to get a property array:

auto prop_arr = std::vector<PropType>{};
for (auto x : srcFrame["_Prop"])
    prop_arr.push_back(x);

assigning properties to the output frame is also easy!

to assign a property:

dstFrame["_Prop"] = prop_value;

to assign a property array:

for (auto x : prop_arr)
    dstFrame["_Prop"] += x;

to erase a property:

dstFrame["_Prop"].Erase();

There are 3 assignment operators: operator= corresponds to paReplace, operator+= corresponds to paAppend and operator|= corresponds to paTouch.
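A toy model of how these three operators could map onto the append modes, assuming a hypothetical `PropProxy` over a plain `std::map` (double values only, no type checks). It follows my understanding of the VSAPI documentation for paReplace, paAppend and paTouch; it is not vsFilterScript's implementation:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy property map: each key holds a list of values, like VapourSynth properties.
using PropMap = std::map<std::string, std::vector<double>>;

// Hypothetical proxy of the kind frame["key"] could return.
class PropProxy {
public:
    PropProxy(PropMap& map, std::string key) : map_(map), key_(std::move(key)) {}

    // paReplace: drop any existing values and store just this one.
    PropProxy& operator=(double v) { map_[key_] = {v}; return *this; }
    // paAppend: add v after the existing values (creating the key if needed).
    PropProxy& operator+=(double v) { map_[key_].push_back(v); return *this; }
    // paTouch: if the key is missing, create it with no values; otherwise do nothing.
    // The value itself is not stored.
    PropProxy& operator|=(double) { map_.try_emplace(key_); return *this; }
    // Removes the key entirely.
    void Erase() { map_.erase(key_); }

private:
    PropMap& map_;
    std::string key_;
};
```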

The semantics of "someFrame[x]" depend on the type of x:
if x is an integer, someFrame[x] is the x-th channel of the frame;
if x is a string (of type char*, const char*, std::string or std::string_view), someFrame[x] is the property x attached to the frame.
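This kind of dispatch on the type of x can be done with ordinary overload resolution. A minimal sketch with a hypothetical `ToyFrame` (strings stand in for planes): an integer argument selects the `std::size_t` overload, while character arrays, `const char*`, `std::string` and `std::string_view` all convert to `std::string_view` and select the property overload. The real wrapper may well use a different mechanism.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <string_view>
#include <vector>

struct ToyFrame {
    std::vector<std::string> channels;    // stand-ins for the planes
    std::map<std::string, double> props;  // stand-ins for frame properties

    // Integer index: the c-th channel.
    const std::string& operator[](std::size_t c) const { return channels.at(c); }

    // String-like index: the property with that name.
    double operator[](std::string_view key) const {
        const auto it = props.find(std::string{key});
        assert(it != props.end() && "property not found");
        return it->second;
    }
};
```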

feisty2

14th April 2020, 19:46

update:
revamped: dropped support for multiple outputs. I've never seen a single filter that uses this feature, and my implementation was incorrect anyway.

revamped: eliminated all indirect addressing

// no more the following
InputClip.Info->Width
InputFrame.Format->PlaneCount
InputClip.Info->Format->BitsPerSample

// now you have direct access to all these items
InputClip.Width
InputFrame.PlaneCount
InputClip.BitsPerSample

Note that for clips with a mutable (variable) format, the format members are filled with garbage data at the clip level.

new example: Crop (https://github.com/IFeelBloated/vsFilterScript/blob/master/Examples/Crop.hxx). This example shows you how to write filters that modify the dimensions of the input (e.g. frame size) and filters that adapt to inputs of arbitrary bit depth.

new example: Rec601ToRGB (https://github.com/IFeelBloated/vsFilterScript/blob/master/Examples/Rec601ToRGB.hxx). This example converts a YUV444 clip to RGB using the Rec601 matrix; it shows you how to write filters that modify the format of the input (e.g. YUV->RGB) and how to manipulate frame properties.

I think this thing is almost feature complete now.

feisty2

17th April 2020, 12:13

update:
new functionality: ability to call external filters before the actual filtering (DrawFrame()) begins. You may define a Preprocess() function for things that should happen before DrawFrame() is invoked. You don't need to define an empty Preprocess() if no preprocessing is required: the wrapper automatically detects whether your filter has a Preprocess() member via hasattr reflection and invokes it only if it exists. Likewise, you don't need to define DrawFrame() and its related functions or attributes if the filter itself does no filtering (something like nnedi3_rpow2); in that case, pass the output clip to Console with Console.Receive(). See Transpose (https://github.com/IFeelBloated/vsFilterScript/blob/master/Examples/Transpose.hxx) for a concrete example.
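One way to implement "call Preprocess() only if it exists" is a detection trait plus `if constexpr`. The sketch below uses the C++17 `std::void_t` idiom, i.e. the SFINAE route the author mentions for a possible GCC 9.3 port; with C++20, as vsFilterScript requires, the trait could be replaced by `if constexpr (requires { filter.Preprocess(); })`. All names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Detection idiom: true if an lvalue of T has a callable member Preprocess().
template <typename T, typename = void>
struct HasPreprocess : std::false_type {};

template <typename T>
struct HasPreprocess<T, std::void_t<decltype(std::declval<T&>().Preprocess())>>
    : std::true_type {};

// Calls filter.Preprocess() only when the member exists; otherwise nothing is compiled in.
template <typename Filter>
void RunPreprocessIfPresent(Filter& filter) {
    if constexpr (HasPreprocess<Filter>::value)
        filter.Preprocess();
}

// Two hypothetical filters, one with and one without the optional hook.
struct WithHook {
    std::string log;
    void Preprocess() { log += "preprocessed"; }
};

struct WithoutHook {
    std::string log;
};
```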

new functionality: calling external filters.

the syntax to invoke external filters is as follows:

Core["namespace"]["filter name"]("param1", argument1, "param2", argument2, ...);

The call returns a video clip, just as you'd expect in vpy scripts. You may pass any iterable container as the argument to an array parameter: the wrapper examines whether the argument is a scalar or a container via reflection (through hasattr(argument, begin)) and takes care of everything.

It is no longer possible to compile vsFilterScript with GCC 9.3 just by applying a few trivial patches, because the hasattr reflection is implemented with core language features of C++20 (concepts and constraints).

feisty2

18th April 2020, 11:07

update:
bug fixes: fixed a bug where std::string and std::string_view were treated as containers when forwarded as arguments.
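A sketch of the scalar-versus-container check, including the fix described above: anything for which `std::begin` works is treated as an array argument, except string types, which are forwarded as a single value. It uses the C++17 detection idiom rather than C++20 concepts, and every name in it is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>
#include <vector>

// True if std::begin(x) is well-formed for an lvalue of T ("T has begin").
template <typename T, typename = void>
struct HasBegin : std::false_type {};

template <typename T>
struct HasBegin<T, std::void_t<decltype(std::begin(std::declval<T&>()))>> : std::true_type {};

// Strings also have begin(), but must be forwarded as one value, not an array of chars.
// Character arrays (string literals) decay to char* / const char*, so they are covered too.
template <typename T>
constexpr bool IsStringLike =
    std::is_same_v<std::decay_t<T>, std::string> ||
    std::is_same_v<std::decay_t<T>, std::string_view> ||
    std::is_same_v<std::decay_t<T>, const char*> ||
    std::is_same_v<std::decay_t<T>, char*>;

template <typename T>
constexpr bool IsArrayArgument = HasBegin<std::remove_reference_t<T>>::value && !IsStringLike<T>;

// Counts how many values an argument would contribute to a parameter.
template <typename T>
std::size_t ValueCount(const T& arg) {
    if constexpr (IsArrayArgument<T>)
        return static_cast<std::size_t>(std::distance(std::begin(arg), std::end(arg)));
    else
        return 1;
}
```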

revamped: if you allocate a temporary plane with Buffer, each row is now guaranteed to be 32-byte aligned. (https://github.com/IFeelBloated/vsFilterScript/blob/master/Include/Buffer.hxx#L14)
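A minimal sketch of one way to get 32-byte row alignment: round each row's byte size up to a multiple of 32 and allocate the block with the aligned form of `operator new`, so every row start is aligned. `AlignedStride` and `AlignedPlane` are hypothetical and not the Buffer from vsFilterScript.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Rounds a row's byte size up to a multiple of `alignment`.
constexpr std::size_t AlignedStride(std::size_t rowBytes, std::size_t alignment = 32) {
    return (rowBytes + alignment - 1) / alignment * alignment;
}

// Minimal float plane whose rows all start on 32-byte boundaries.
class AlignedPlane {
public:
    AlignedPlane(std::size_t width, std::size_t height)
        : width_(width), height_(height),
          stride_(AlignedStride(width * sizeof(float), kAlignment)),
          data_(static_cast<std::uint8_t*>(
              ::operator new(stride_ * height, std::align_val_t{kAlignment}))) {}
    ~AlignedPlane() { ::operator delete(data_, std::align_val_t{kAlignment}); }
    AlignedPlane(const AlignedPlane&) = delete;
    AlignedPlane& operator=(const AlignedPlane&) = delete;

    // Base is 32-byte aligned and the stride is a multiple of 32, so every row is too.
    float* Row(std::size_t y) {
        assert(y < height_);
        return reinterpret_cast<float*>(data_ + y * stride_);
    }
    std::size_t Width() const { return width_; }
    std::size_t StrideBytes() const { return stride_; }

private:
    static constexpr std::size_t kAlignment = 32;
    std::size_t width_, height_, stride_;
    std::uint8_t* data_;
};
```

For example, a 641-pixel float row (2564 bytes) gets a stride of 2592 bytes, so the row pitch is no longer width * sizeof(float), which is exactly the stride-versus-width situation from the earlier bug fix.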

revamped: you can now alter the padding policy of a plane at runtime

frame[c].PaddingPolicy = new_policy;

new functionality: PeekFrameFormat(), useful for dealing with clips of mutable format. You may use this function to determine the bit depth of a frame before calling GetFrame<PixelType>(), which relies on the bit depth information.

MysteryX

18th April 2020, 17:50

This looks very interesting. It should make it easier to port Avisynth filters to VapourSynth; which I still haven't looked into!

feisty2

18th April 2020, 20:22

This looks very interesting. It should make it easier to port Avisynth filters to VapourSynth; which I still haven't looked into!

this thing relies heavily on C++20 features, and currently the only known compiler that implements all the required features is the as-yet unreleased GCC 10. You have to compile the master branch of GCC first, then you get to play with vsFilterScript.

MysteryX

19th April 2020, 01:10

this thing replies heavily on c++20 features and currently, the only known compiler that has implemented all required features is the yet unreleased GCC10. you have to first compile master branch GCC then you get to play with vsFilterScript
Surely Visual Studio 2025 will support it

StainlessS

19th April 2020, 10:15

this thing replies heavily
Too many 'p' s

yet unreleased GCC10.
I don't use Linux very much as yet, but a couple of days back I think Cinnamon Mint [EDIT: 19.3 Tricia] installed GCC10 as an update.

Just saying.

EDIT: some stuff extracted from /var/log/dpkg.log

2020-04-11 22:14:35 startup archives unpack
2020-04-11 22:14:44 install gcc-10-base:amd64 <none> 10-20200405-0ubuntu1~18.04
2020-04-11 22:14:44 status half-installed gcc-10-base:amd64 10-20200405-0ubuntu1~18.04
2020-04-11 22:14:44 status unpacked gcc-10-base:amd64 10-20200405-0ubuntu1~18.04
2020-04-11 22:14:44 status unpacked gcc-10-base:amd64 10-20200405-0ubuntu1~18.04
2020-04-11 22:14:44 install gcc-10-base:i386 <none> 10-20200405-0ubuntu1~18.04
2020-04-11 22:14:44 status half-installed gcc-10-base:i386 10-20200405-0ubuntu1~18.04
2020-04-11 22:14:44 status unpacked gcc-10-base:i386 10-20200405-0ubuntu1~18.04
2020-04-11 22:14:44 status unpacked gcc-10-base:i386 10-20200405-0ubuntu1~18.04
2020-04-11 22:14:45 startup packages configure
2020-04-11 22:14:45 configure gcc-10-base:amd64 10-20200405-0ubuntu1~18.04 <none>
2020-04-11 22:14:45 status unpacked gcc-10-base:amd64 10-20200405-0ubuntu1~18.04
2020-04-11 22:14:45 status half-configured gcc-10-base:amd64 10-20200405-0ubuntu1~18.04
2020-04-11 22:14:45 status installed gcc-10-base:amd64 10-20200405-0ubuntu1~18.04
2020-04-11 22:14:45 configure gcc-10-base:i386 10-20200405-0ubuntu1~18.04 <none>
2020-04-11 22:14:45 status unpacked gcc-10-base:i386 10-20200405-0ubuntu1~18.04
2020-04-11 22:14:45 status half-configured gcc-10-base:i386 10-20200405-0ubuntu1~18.04
2020-04-11 22:14:45 status installed gcc-10-base:i386 10-20200405-0ubuntu1~18.04

EDIT: I forced a 'Refresh' for the updates to appear in Update Manager.

MeteorRain

19th April 2020, 11:17

StainlessS

19th April 2020, 11:25

As in above EDIT:, I forced a refresh for it to appear in Update Manager.
Thanks for the info.
EDIT: Maybe I have it set to install unstable packages or something [I remember I was looking for something that was unavailable as standard].

feisty2

19th April 2020, 12:32

SssS: Yes, that's the 20200405 snapshot of GCC 10, which is still in development. I don't know why a distribution would put software that is still in development on users' computers.

Clang 10, however, has just been released. But I believe it doesn't support all the features.

I might be able to make a GCC 9.3 compatible version if anyone needs it. I would have to rewrite a bunch of concepts and constraints stuff with SFINAE though.

feisty2

19th April 2020, 17:21

@Are_
can you run a speed test for the latest gauss blur (https://github.com/IFeelBloated/vsFilterScript/blob/master/Examples/GaussBlur.hxx) again?
I have eliminated some frequent memory reallocations, which could have been the performance bottleneck in earlier versions.

Are_

19th April 2020, 19:39

(vsFilterScript)
Output 100000 frames in 28.35 seconds (3527.91 fps)
(c_filter)
Output 100000 frames in 9.06 seconds (11036.62 fps)
(Convolution)
Output 100000 frames in 9.05 seconds (11045.38 fps)

This is not the same computer but it's quite similar in specs.

feisty2

19th April 2020, 20:24

so it wasn't reallocation either...
I guess it can only be the cost of small objects then (Plane::Proxy for each pixel)

qyot27

20th April 2020, 00:56

SssS: Yes, that's the 20200405 snapshot of GCC 10, which is still in development. I don't know why a distribution would put software that is still in development on users' computers.
Judging by some of the responses here:
https://forums.linuxmint.com/viewtopic.php?t=317303

I'm pretty sure it was due to the ubuntu-toolchain-r PPA, not the distro.

StainlessS

20th April 2020, 03:30

I'm pretty sure it was due to the ubuntu-toolchain-r PPA, not the distro.

Yep mine was same problem, thanx qyot.

feisty2

20th April 2020, 06:44

I have pinpointed the performance bottleneck: it is the indirection (Plane::Proxy and Plane::Offset).
It shouldn't be a problem unless the filter is ultra fast, like a 3x3 gauss blur. It is a constant cost for a given format and should be negligible for slow filters like NLMeans.

feisty2

20th April 2020, 20:42

update:
revamped: Preprocess() is now replaced by RegisterInvokingSequence() (https://github.com/IFeelBloated/vsFilterScript/blob/master/Examples/Transpose.hxx#L12). You may invoke a sequence of filters any way you like, including the very filter being defined, which you invoke as the "self" filter via SelfInvoker. Basically, you get to write something like a vpy script in RegisterInvokingSequence, and if this function is absent from your filter, the output is assumed to be SelfInvoker().

say you defined a 3x3 blurring filter in DrawFrame and you want to repeat it 10 times:

auto RegisterInvokingSequence(auto Core, auto&& SelfInvoker, auto Console) {
    for (auto _ : Range{ 10 })
        InputClip = SelfInvoker("clip", InputClip);
    Console.Receive(InputClip);
}

you want to upscale it by a factor of 2 after blurring:

auto RegisterInvokingSequence(auto Core, auto&& SelfInvoker, auto Console) {
    for (auto _ : Range{ 10 })
        InputClip = SelfInvoker("clip", InputClip);
    InputClip = Core["fmtc"]["resample"]("clip", InputClip, "width", InputClip.Width * 2, "height", InputClip.Height * 2);
    Console.Receive(InputClip);
}

and you want to transpose the clip before all this happens:

auto RegisterInvokingSequence(auto Core, auto&& SelfInvoker, auto Console) {
    InputClip = Core["std"]["Transpose"]("clip", InputClip);
    for (auto _ : Range{ 10 })
        InputClip = SelfInvoker("clip", InputClip);
    InputClip = Core["fmtc"]["resample"]("clip", InputClip, "width", InputClip.Width * 2, "height", InputClip.Height * 2);
    Console.Receive(InputClip);
}

feisty2

21st April 2020, 16:14

update:
revamped: you can now call SelfInvoker() with an argument list; the calling syntax is unified with that of external filters. However, unlike external filters, SelfInvoker has no error handling. It is the filter developer's responsibility to ensure that only valid arguments are passed to SelfInvoker, since it represents the "self" filter and is private to the developer. The behavior is undefined (you will most likely encounter a segfault) if you call SelfInvoker with invalid arguments. All the examples above have been updated to reflect the new calling syntax.