Doom9's Forum > Capturing and Editing Video > VapourSynth

24th March 2020, 14:15   #1
feisty2
I'm Siri
Join Date: Oct 2012
Location: void
Posts: 2,633
vsFilterScript: writing C++ plugins like python scripts (WIP)

https://github.com/IFeelBloated/vsFilterScript

This is yet another C++ wrapper for VSAPI. However, it is much higher-level than vsxx and provides a "scripting" kind of experience to help you sketch out your filter as quickly as possible.

Take a look at the 3x3 Gaussian blur example: fewer than 40 lines of code and you've got your filter up and running. A temporal median example is also provided to show how to write temporal or spatiotemporal filters.
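For anyone who hasn't opened the repo yet, here is roughly what a 3x3 Gaussian blur computes, written as plain C++ rather than with the vsFilterScript API. `GaussBlur3x3` and the row-major `std::vector<float>` plane are assumptions for illustration; borders use "repeat" padding by clamping coordinates, and the kernel is [1 2 1; 2 4 2; 1 2 1] / 16.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of vsFilterScript: 3x3 Gaussian blur of one
// row-major float plane. Out-of-bounds neighbors are clamped to the nearest
// edge sample ("repeat" padding).
std::vector<float> GaussBlur3x3(const std::vector<float>& src, int width, int height) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    static const float kWeights[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};  // sums to 16

    // Padded read: coordinates outside the plane are clamped to the border.
    auto at = [&](int y, int x) {
        y = std::clamp(y, 0, height - 1);
        x = std::clamp(x, 0, width - 1);
        return src[static_cast<std::size_t>(y) * width + x];
    };

    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += kWeights[dy + 1][dx + 1] * at(y + dy, x + dx);
            dst[static_cast<std::size_t>(y) * width + x] = sum / 16.0f;  // normalize
        }
    return dst;
}
```

Blurring a constant plane should return the same constant, which is a cheap check that the weights are normalized.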

There are 2 more examples, Crop and Rec601ToRGB, showing filters that use more advanced VapourSynth features.
Crop shows how to write filters that change the dimensions of the clip (e.g. frame size) and filters that adapt to inputs of arbitrary bit depth.
Rec601ToRGB converts a YUV444 clip to RGB using the Rec601 matrix; it shows how to write filters that change the format of the input (e.g. YUV->RGB) and how to manipulate frame properties.
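A rough plain-C++ sketch of what the two examples compute (hypothetical helpers, not the repo's code): a crop templated on the sample type, so the same code serves 8/16-bit integer and 32-bit float planes, and a per-pixel Rec.601 YCbCr to RGB conversion. The conversion assumes full-range float input with Y in [0, 1] and Cb/Cr centred on 0, and derives its coefficients from Kr = 0.299 and Kb = 0.114.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plane type, templated on the sample type (uint8_t, uint16_t, float...).
template <typename T>
struct PlaneData {
    int width = 0;
    int height = 0;
    std::vector<T> samples;  // row-major
};

// Crop: the output is smaller than the input, which a real filter would have
// to report as its new frame size.
template <typename T>
PlaneData<T> Crop(const PlaneData<T>& in, int left, int right, int top, int bottom) {
    assert(left >= 0 && right >= 0 && top >= 0 && bottom >= 0);
    assert(left + right < in.width && top + bottom < in.height);
    PlaneData<T> out;
    out.width = in.width - left - right;
    out.height = in.height - top - bottom;
    out.samples.reserve(static_cast<std::size_t>(out.width) * out.height);
    for (int y = top; y < in.height - bottom; ++y)
        for (int x = left; x < in.width - right; ++x)
            out.samples.push_back(in.samples[static_cast<std::size_t>(y) * in.width + x]);
    return out;
}

struct RGB { float r, g, b; };

// Rec.601 YCbCr -> RGB for one pixel (assumed full range, float, Cb/Cr centred on 0).
inline RGB Rec601ToRGB(float y, float cb, float cr) {
    constexpr float kr = 0.299f, kb = 0.114f, kg = 1.0f - kr - kb;
    return {
        y + 2.0f * (1.0f - kr) * cr,                // R = Y + 1.402 Cr
        y - 2.0f * kb * (1.0f - kb) / kg * cb
          - 2.0f * kr * (1.0f - kr) / kg * cr,      // G = Y - 0.344 Cb - 0.714 Cr
        y + 2.0f * (1.0f - kb) * cb,                // B = Y + 1.772 Cb
    };
}
```

The crop only changes geometry and the conversion only changes the format; in a real filter both changes would also have to be reflected in the output clip's VSVideoInfo.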

C++20 support is required (you probably need GCC 10 built from trunk). The scripting-style syntax is only possible with the latest C++ standard.

I haven't finished porting all of the C API to this wrapper yet, but the filter skeleton generator is here: https://github.com/IFeelBloated/vsFi.../Interface.hxx. It requires certain properties, some constants and some member functions, as shown in the example filter. The skeleton generator works by duck typing: it generates a set of skeleton functions as long as the filter struct has all the required members.

You should write each filter in a header file, include the headers in "EntryPoint.cxx", and register each filter with "VaporInterface::RegisterFilter".

The "Clip" object can be accessed as a 4D array ([time (frame)][channel][height][width]) with "GetFrames", or as a 3D array ([channel][height][width]) with "GetFrame". The "time" dimension is relative to the current frame (t=0 for the current frame, t=-1 for the previous frame and t=1 for the next); the other 3 dimensions are absolute. Out-of-bounds access is allowed and triggers automatic padding. Its behavior is defined by concrete padding policies (repeat, reflect, zero...), and the default policy is "repeat" for both the spatial and temporal dimensions. More details are in Plane.hxx, Frame.hxx and Clip.hxx.
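To make the relative-time indexing and the padding policies concrete, here is a small self-contained C++ sketch. It is not the code from Plane.hxx/Frame.hxx/Clip.hxx: `Padding`, `MapIndex`, `Frame` and `ClipView` are hypothetical names, frames have a single channel, and "reflect" is assumed to mirror around the edge sample (-1 maps to 1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical padding policies; names and semantics are illustrative only.
enum class Padding { Repeat, Reflect, Zero };

// Map index i into [0, n), or return -1 when the policy is Zero and i is outside.
inline long MapIndex(long i, long n, Padding p) {
    assert(n > 0);
    if (i >= 0 && i < n) return i;
    switch (p) {
    case Padding::Repeat:   // clamp to the edge: -1 -> 0, n -> n - 1
        return i < 0 ? 0 : n - 1;
    case Padding::Reflect:  // mirror around the edge sample: -1 -> 1, n -> n - 2
        assert(n > 1 && i > -n && i < 2 * n - 1);  // a single reflection only
        return i < 0 ? -i : 2 * (n - 1) - i;
    case Padding::Zero:
        return -1;
    }
    return -1;
}

// One single-channel frame, row-major.
struct Frame {
    long width;
    long height;
    std::vector<float> samples;
};

// Time is relative to the current frame (t = -1 previous, t = 1 next);
// y and x are absolute. All three go through the padding logic.
struct ClipView {
    const std::vector<Frame>& frames;
    long current;
    Padding spatial = Padding::Repeat;
    Padding temporal = Padding::Repeat;

    float Get(long t, long y, long x) const {
        long n = MapIndex(current + t, static_cast<long>(frames.size()), temporal);
        if (n < 0) return 0.0f;  // zero-padded in time
        const Frame& f = frames[static_cast<std::size_t>(n)];
        long yy = MapIndex(y, f.height, spatial);
        long xx = MapIndex(x, f.width, spatial);
        if (yy < 0 || xx < 0) return 0.0f;  // zero-padded in space
        return f.samples[static_cast<std::size_t>(yy * f.width + xx)];
    }
};
```

With the default "repeat" policy, asking frame 0 for its previous frame just returns frame 0 again, and coordinates past the border read the nearest edge sample.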

Latest update:
New functionality: full integration of C++ exceptions. With exceptions, you no longer have to manually handle any of the following errors:
a) failing to invoke an external plugin (plugin does not exist)
b) failing to invoke an external filter
c) failing to invoke a python function
d) failing to invoke SelfInvoker
... and possibly many more. SelfInvoker is now allowed to throw exceptions, so the earlier restriction requiring SelfInvoker to always evaluate successfully has been removed. Any of these errors will transparently pass through your filters and propagate to a root caller like Create(), which handles them automatically.
From your point of view, it's as if the errors don't exist, so you NEVER have to worry about them. It's now one step closer to Python scripts.

Initialize() has been replaced by normal constructors because, with exceptions, a filter no longer needs to return a value to signal whether it was constructed successfully.
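A minimal sketch of that error-handling model in plain C++, assuming nothing about the wrapper's real types: `InvokeExternal`, `Blur` and this `Create` are hypothetical stand-ins. The filter constructor does no error handling at all, and the single try/catch lives in the root caller.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for invoking another plugin/filter; throws if it "doesn't exist".
inline void InvokeExternal(const std::string& name) {
    if (name != "std.Convolution")
        throw std::runtime_error("no such filter: " + name);
}

// Hypothetical filter: all setup happens in the constructor, which simply
// lets any failure propagate instead of returning a success flag.
struct Blur {
    explicit Blur(const std::string& helper) {
        InvokeExternal(helper);  // no error handling here
    }
};

// Root caller in the spirit of the post's Create(): the only place that
// catches. Returns an empty string on success, otherwise the error message
// (a real plugin would report it to VapourSynth instead of returning it).
template <typename Filter, typename... Args>
std::string Create(Args&&... args) {
    try {
        [[maybe_unused]] Filter filter(std::forward<Args>(args)...);
        return {};
    } catch (const std::exception& e) {
        return e.what();
    }
}
```

Everything between the failing call and the root caller stays ordinary straight-line code.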

Last edited by feisty2; 8th October 2020 at 11:45.

24th March 2020, 16:07   #2
Myrsloik
Professional Code Monkey
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,555
I'm curious, what does the actual generated code look like for this? What's the performance penalty for all the abstraction?

24th March 2020, 18:50   #3
feisty2
I'm Siri
Quote:
Originally Posted by Myrsloik
I'm curious, what does the actual generated code look like for this?
Oh I see, you meant the generated (machine) code...

Quote:
Originally Posted by Myrsloik
What's the performance penalty for all the abstraction?
The main runtime overhead is automatic padding (out-of-bounds access detection), which gcc -O3 seems to handle pretty well. Everything else is resolved at compile time and is thus a zero-cost abstraction.

Last edited by feisty2; 25th March 2020 at 05:57.

25th March 2020, 10:09   #4
feisty2
I'm Siri
Did some speed tests:
Code:
clp = core.test.GaussBlur(clp)
runs @ 1822.59 fps at 640 x 480 GRAYS (compiled with GCC -O3)

Code:
clp = core.std.Convolution(clp, [1,2,1,2,4,2,1,2,1])
runs @ 2425.51 fps at 640 x 480 GRAYS

The comparison isn't completely fair though: test.GaussBlur is a 100% plain C++ filter (GCC doesn't seem to auto-vectorize any of its loops), while std.Convolution has manual AVX2 optimization.

Last edited by feisty2; 25th March 2020 at 10:12.

26th March 2020, 12:31   #5
feisty2
I'm Siri
New example: temporal median
44 lines of perfectly readable, script-like code, and probably even easier to write
vs
376 lines of cryptic C code
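For reference, the core of a temporal median fits in a few lines of plain C++. This sketch assumes a radius of 1 (previous, current and next frame), single-plane float frames stored as `std::vector<float>`, and "repeat" padding at the clip ends; `TemporalMedian3` is a hypothetical name, not the repo's example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of the same pixel in frames n-1, n and n+1. Neighbor indices are
// clamped to the clip ("repeat" padding), so the first and last frames reuse
// themselves as the missing neighbor.
std::vector<float> TemporalMedian3(const std::vector<std::vector<float>>& frames, std::size_t n) {
    assert(!frames.empty() && n < frames.size());
    const auto& prev = frames[n == 0 ? 0 : n - 1];
    const auto& cur = frames[n];
    const auto& next = frames[std::min(n + 1, frames.size() - 1)];
    assert(prev.size() == cur.size() && next.size() == cur.size());

    std::vector<float> out(cur.size());
    for (std::size_t i = 0; i < cur.size(); ++i) {
        float a = prev[i], b = cur[i], c = next[i];
        // median of three = max(min(a, b), min(max(a, b), c))
        out[i] = std::max(std::min(a, b), std::min(std::max(a, b), c));
    }
    return out;
}
```

The min/max form avoids sorting, which keeps the inner loop branch-light.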

26th March 2020, 22:07   #6
zorr
Registered User
Join Date: Mar 2018
Posts: 447
I have to say this looks very promising. I have a couple of Python filters that are annoyingly slow, but porting them to actual C/C++ plugins seems like too much work.

What's the performance delta with temporal median?

29th March 2020, 20:14   #7
Are_
Registered User
Join Date: Jun 2012
Location: Ibiza, Spain
Posts: 321
I noticed that, when using the vsedit benchmark utility, CPU cores are at 50% for this and at 20% for Convolution.
I reran the test with vspipe and got this:

Code:
14488fps all cores at ~85% load (Convolution)
 4098fps all cores at 100% load (test)

29th March 2020, 20:24   #8
feisty2
I'm Siri
Quote:
Originally Posted by Are_
I noticed when using vsedit benchmark utility CPU cores are at 50% for this and at 20% for convolution.
I rerun the test with vspipe and got this:

Code:
14488fps all cores at ~85% load (Convolution)
 4098fps all cores at 100% load (test)
Hmm... 14488fps is probably the effect of SIMD optimization. I'll write a Gaussian blur filter with the C API later and we'll see...

29th March 2020, 21:29   #9
feisty2
I'm Siri
@Are_
Could you compile this GaussBlur filter written with the low-level API and run the speed test again?

29th March 2020, 21:40   #10
Are_
Registered User
Code:
11748fps all cores at 100% load

29th March 2020, 21:46   #11
feisty2
I'm Siri
Interesting... I guess I'll have to profile it a bit and find the main cause of the slowdown.

30th March 2020, 15:55   #12
josemaria.alkala
Registered User
Join Date: Apr 2010
Posts: 16
I am amazed how fast this is. Where is the video you are testing?

My Nim version is dead slow by comparison (most likely my fault, not Nim's). I'm getting something like 40fps.

30th March 2020, 17:17   #13
StainlessS
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
Quote:
I am amazed how fast this is
It looks like the GRAYS clip is a BlankClip [so the timing mostly reflects the speed of the filter rather than that of a video decoder etc.].
From here:- https://forum.doom9.org/showthread.p...59#post1905459
Code:
clip = core.std.BlankClip(format=vs.GRAYS, length=100000, fpsnum=24000, fpsden=1001, keep=True)

30th March 2020, 18:22   #14
josemaria.alkala
Registered User
I just got it a bit faster: 80fps.

31st March 2020, 09:40   #15
feisty2
I'm Siri
Quote:
Originally Posted by josemaria.alkala
I am amazed how fast this is. Where is the video you are testing?

My Nim version is dead slow (most likely my bad, not Nim's) when compared. I am getting something like 40fps
I'm actually more concerned with whether the runtime overhead for a given format is constant, because then it would be negligible compared to the actual algorithm in complex filters.

31st March 2020, 16:36   #16
josemaria.alkala
Registered User
Sorry, I'm not a professional developer (it's my hobby), so I'm not sure I understand you correctly. Nim compiles to C. It is garbage collected, but you can disable the garbage collector (just pass "--gc:none"). So my understanding is that there is no runtime (and so no overhead in that regard).

This is my first time dealing with memory, so I'm probably doing something really bad. I have asked for advice here. You might want to take a look.

When I compile the following code:
Code:
import ../vapoursynth
import options

BlankClip( format=pfGrayS.int.some, 
           width=640.some,
           height=480.some,
           length=100000.some,
           fpsnum=24000.some, 
           fpsden=1001.some, keep=1.some).Convolution(@[1.0,2.0,1.0,2.0,4.0,2.0,1.0,2.0,1.0]).Savey4m("/dev/null")
by means of:
Code:
$ nim c -f --threads:on --gc:none -d:release -d:danger modifyframe
$ time ./modifyframe
real	0m58,879s
user	0m54,969s
sys	0m7,433s
which is 1698fps.

This uses the Convolution filter plus a custom-made filter (Savey4m) that is surely adding some overhead, even though it sends the data to "/dev/null".

How do you read the memory once you have the plane's pointer? Could you send me a link to that particular piece of code? (I don't understand much C/C++, but I hope to understand enough.)

31st March 2020, 16:52   #17
amichaelt
Guest
Posts: n/a
Quote:
Originally Posted by josemaria.alkala
How do you read the memory once you have the plane's pointer?
It just looks like a combination of pointer arithmetic and indexing with the [] operator.
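A minimal sketch of that in plain C++, assuming 32-bit float samples and a stride given in bytes (VapourSynth's getStride() returns the distance in bytes between consecutive lines). `ReadSample` is a hypothetical helper, not the code in Plane.hxx.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read one float sample from a plane, given its base pointer and its stride.
// The stride is in bytes and rows may be padded, so the row offset is applied
// to a byte pointer before it is reinterpreted as float.
inline float ReadSample(const std::uint8_t* base, std::ptrdiff_t strideBytes, int x, int y) {
    const std::uint8_t* row = base + y * strideBytes;  // pointer arithmetic: select the row
    return reinterpret_cast<const float*>(row)[x];       // [] indexing: select the column
}
```

The byte-based stride matters because rows can be padded for alignment, so the row offset can't be computed from the width alone.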

31st March 2020, 16:58   #18
josemaria.alkala
Registered User
I am using the same approach (but I'm surely adding a lot of overhead somewhere).

By the way, just for reference, I am using a laptop with an i7-4770HQ (4 cores and 8GB).

31st March 2020, 16:59   #19
amichaelt
Guest
Quote:
Originally Posted by josemaria.alkala
I am using the same approach (but for sure I am adding a lot of overhead somewhere).

By the way, just for reference I am using a laptop with a: i7-4770HQ (4cores and 8Gb).
What does the generated C code look like, though? Are you sure there's no extra bounds checking or other code being inserted by the C code generator?

31st March 2020, 17:03   #20
feisty2
I'm Siri
Quote:
Originally Posted by josemaria.alkala
How do you read the memory once you have the plane's pointer? Could you send me a link to that particular piece of code (I don't understand much C/C++, I hope to understand enough).
https://github.com/IFeelBloated/vsFi.../Plane.hxx#L47

Also, in your other post:
Quote:
The results are 80fps (for me) against about 2400fps for a C++ wrapper and 11748fps for a pure C version.
It's 4098fps for the C++ wrapper, not 2400fps (compared to 11748fps for the C version). Keep in mind that you can only compare these numbers on the same machine, so you can't compare your numbers with the ones reported by Are_ and me, because they were measured on different machines.

Besides, it seems that your Nim version operates on int8 clips, while the C and C++ plugins were written for fp32 clips. There's a significant performance gap there too, so you can't compare them like that.

Last edited by feisty2; 31st March 2020 at 17:29.