Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.

 

Go Back   Doom9's Forum > Capturing and Editing Video > VapourSynth

Reply
 
Thread Tools Search this Thread Display Modes
Old 1st June 2015, 19:57   #1  |  Link
jackoneill
unsigned int
 
jackoneill's Avatar
 
Join Date: Oct 2012
Location: 🇪🇺
Posts: 723
Writing VapourSynth filters in Python

Just for fun.

Here is RemoveGrain mode 19 implemented in Python:
Code:
import vapoursynth as vs
c = vs.get_core()


def removegrain_mode19(n, f):
    fout = f.copy()
    
    for p in range(fout.format.num_planes):
        plane = fout.get_write_array(p)
    
        plane_height = len(plane)
        plane_width = len(plane[0])
        
        for y in range(1, plane_height - 1):
            for x in range(1, plane_width - 1):
                plane[y][x] = (plane[y-1][x-1] + plane[y-1][x] + plane[y-1][x+1] + plane[y][x-1] + plane[y][x+1] + plane[y+1][x-1] + plane[y+1][x] + plane[y+1][x+1] + 4) >> 3
    
    return fout


src = c.ffms2.Source("asdf.mov")
src = c.std.ModifyFrame(clip=src, clips=src, selector=removegrain_mode19)


src.set_output()
It is exceptionally slow.
__________________
Buy me a "coffee" and/or hire me to write code!
jackoneill is offline   Reply With Quote
Old 1st June 2015, 21:51   #2  |  Link
Groucho2004
►◄
 
Groucho2004's Avatar
 
Join Date: Mar 2006
Location: A wretched hive of scum and villainy
Posts: 4,402
Quote:
Originally Posted by jackoneill View Post
It is exceptionally slow.
3 nested for loops with array operations in the inner loop - The Python script interpreter should manage one frame in less than a day.
__________________
Groucho's Avisynth Stuff
Groucho2004 is offline   Reply With Quote
Old 1st June 2015, 21:55   #3  |  Link
MonoS
Registered User
 
Join Date: Aug 2012
Posts: 176
Quote:
Originally Posted by Groucho2004 View Post
3 nested for loops with array operations in the inner loop - The Python script interpreter should manage one frame in less than a day.
Who need those fancy AVX512 instruction, i have an interpreter
MonoS is offline   Reply With Quote
Old 1st June 2015, 21:56   #4  |  Link
Myrsloik
Professional Code Monkey
 
Myrsloik's Avatar
 
Join Date: Jun 2003
Location: Ikea Chair
Posts: 2,002
We obviously need to switch to PyPy!
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
Myrsloik is offline   Reply With Quote
Old 2nd June 2015, 05:51   #5  |  Link
TurboPascal7
Registered User
 
TurboPascal7's Avatar
 
Join Date: Jan 2010
Posts: 270
Obviously you need to add some asm to it.
__________________
Me on GitHub | AviSynth+ - the (dead) future of AviSynth
TurboPascal7 is offline   Reply With Quote
Old 2nd June 2015, 07:55   #6  |  Link
TheFluff
Excessively jovial fellow
 
Join Date: Jun 2004
Location: rude
Posts: 1,073
Quote:
Originally Posted by TurboPascal7 View Post
Obviously you need to add some asm to it.
jesus fucking christ
TheFluff is offline   Reply With Quote
Old 2nd June 2015, 09:57   #7  |  Link
foxyshadis
ангел смерти
 
foxyshadis's Avatar
 
Join Date: Nov 2004
Location: Lost
Posts: 9,413
Quote:
Originally Posted by TurboPascal7 View Post
Obviously you need to add some asm to it.
Coming soon to a yasm near you.
__________________
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order.
foxyshadis is offline   Reply With Quote
Old 6th June 2015, 10:11   #8  |  Link
Monarc
Registered User
 
Join Date: Jun 2002
Posts: 34
Quote:
Originally Posted by Myrsloik View Post
We obviously need to switch to PyPy!
Do you plan support for pypy? Would be nice
Monarc is offline   Reply With Quote
Old 6th June 2015, 10:21   #9  |  Link
Myrsloik
Professional Code Monkey
 
Myrsloik's Avatar
 
Join Date: Jun 2003
Location: Ikea Chair
Posts: 2,002
Quote:
Originally Posted by Monarc View Post
Do you plan support for pypy? Would be nice
Cython can compile the module for pypy too (and 2.x python but I don't care about ancient versions). It's just that nobody's really tested it yet.
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
Myrsloik is offline   Reply With Quote
Old 8th June 2015, 12:12   #10  |  Link
splinter98
Registered User
 
Join Date: Oct 2010
Posts: 36
Before we delve into the magical and mysterious world of pypy, let's get this algorithm more optimised. (And maybe have a better understanding of python internals at the same time).

Firstly nested for loops. Surely we need one? we have to iterate over two axis! nope, python has a beautiful module in the standard library called itertools. The product function will generate us a nested for loop for us!

so:
Code:
for y in range(1, plane_height - 1):
    for x in range(1, plane_width - 1):
        ...
becomes:
Code:
from itertools import product
for y,x in product(range(1, plane_height-1), 
                   range(1, plane_width-1)):
    ...
Great now we don't recreate the range(1, plane_width-1) object for every value of y!

Secondly lets look at:
Code:
plane[y][x]
What's wrong with that? Nothing, except lets understand what it's doing. plane[y][x] is (almost) the same as row = plane[y]; row[x]. Lets break it down. plane[y] creates a new view object each time it's used. This would be fine, except all we do next is lookup the x value and discard the object. Wouldn't it be better if we could look up the single pixel value with a single lookup? Luckily we have a syntax for that:
Code:
plane[y,x]
This invokes a single lookup for the pixel value instead of two, so should give us a noticeable speed up especially when we're iterating over the pixel values.

>> 3 has the same comparable speed as // 8 so let's use the one that makes more sense in terms of what the algorithm is doing.

Finally I noticed a bug in the original implementation. We are reading pixel values from the copied frame and not the original frame! this means that when values change in the output that will affect the pixel values. (Which I believe is not what the original mode19 does).

So a pure Python implementation with speedups would look like this:

Code:
from itertools import product
import vapoursynth as vs
c = vs.get_core()


def removegrain_mode19(n, f):
    fout = f.copy()
    
    for p in range(fout.format.num_planes):
        plane = fout.get_write_array(p)
        inplane = f.get_read_array(p)
    
        plane_height = len(plane)
        plane_width = len(plane[0])
        
        for y, x in product(range(1, plane_height - 1),
                            range(1, plane_width - 1)):
            plane[y, x] = (inplane[y-1, x-1] + inplane[y-1, x] + inplane[y-1, x+1] + 
                           inplane[y, x-1] + inplane[y, x+1] + 
                           inplane[y+1, x-1] + inplane[y+1, x] + inplane[y+1, x+1] + 4) // 8
    
    return fout
note I haven't tested the code for actual speedups, but there should be some, if not at least a reduction in memory consumption.

Finally it's also worth pointing out here get_write_array(p) can be used as an input to other modules such as numpy and not occur a memory copy. Utilising these you may get even more of a speed up.

Last edited by splinter98; 8th June 2015 at 13:08.
splinter98 is offline   Reply With Quote
Old 8th June 2015, 14:39   #11  |  Link
TurboPascal7
Registered User
 
TurboPascal7's Avatar
 
Join Date: Jan 2010
Posts: 270
Range in python3 only creates an iterator which is extremely cheap (both cpu and memory-wise). Also considering that there are no cyclic references, the created iterator (and memoryview objects for that matter) will be collected at the end of the outer loop after every iteration so I'm not sure where that memory consumption reduction would come from.
__________________
Me on GitHub | AviSynth+ - the (dead) future of AviSynth

Last edited by TurboPascal7; 8th June 2015 at 14:44.
TurboPascal7 is offline   Reply With Quote
Old 9th June 2015, 07:09   #12  |  Link
Limit64
Registered User
 
Join Date: Jan 2005
Posts: 5
Wouldn't it make sense to use numpy. It has highly optimized and parallelized methods for calculations with matrices. It should make it more readable, too.

Gesendet von meinem LG-V500 mit Tapatalk
Limit64 is offline   Reply With Quote
Old 9th June 2015, 12:03   #13  |  Link
splinter98
Registered User
 
Join Date: Oct 2010
Posts: 36
Quote:
Originally Posted by TurboPascal7 View Post
Range in python3 only creates an iterator which is extremely cheap (both cpu and memory-wise). Also considering that there are no cyclic references, the created iterator (and memoryview objects for that matter) will be collected at the end of the outer loop after every iteration so I'm not sure where that memory consumption reduction would come from.
This is very true it's definitely a lot better in python 3 than it is in python 2. To be honest I didn't get around to profiling the optimisations properly, however there should still be some performance benefit with [y,x] over [y][x] as you only have a single lookup call vs 2 + view creation. (Although that's not really the main bottleneck in this case, which is the conversion to and from python ints when doing the calculations).

Quote:
Originally Posted by Limit64 View Post
Wouldn't it make sense to use numpy. It has highly optimized and parallelized methods for calculations with matrices. It should make it more readable, too.
Yes it would! The way get_read/write_array is written it conforms to PEP 3118 which means you can use it's output as the input to another function in python that also supports the buffer interface and it won't incur a memorycopy! So a basic numpy filter would start like:

Code:
import numpy as np
import vapoursynth as vs
from scipy import ndimage
c = vs.get_core()


def removegrain_mode19(n, f):
    fout = f.copy()

    for p in range(fout.format.num_planes):
        plane = np.asarray(fout.get_write_array(p))
        inplane = np.asarray(f.get_read_array(p))
    
        #Implement filter below using numpy methods
    
    return fout
splinter98 is offline   Reply With Quote
Old 11th June 2015, 15:09   #14  |  Link
feisty2
I'm Siri
 
feisty2's Avatar
 
Join Date: Oct 2012
Location: Los Angeles, California
Posts: 2,133
the very FIRST filter I just wrote....
a spatial median (radius=1) filter
Code:
def median (n, f):
    fout = f.copy()
    
    for p in range(fout.format.num_planes):
        plane = fout.get_write_array(p)
    
        plane_height = len(plane)
        plane_width = len(plane[0])
        
        members=[plane[0][0]]*9
        for y in range(1, plane_height - 1):
            for x in range(1, plane_width - 1):
                members[0] = plane[y-1][x-1]
                members[1] = plane[y][x-1]
                members[2] = plane[y+1][x-1]
                members[3] = plane[y-1][x]
                members[4] = plane[y][x]
                members[5] = plane[y+1][x]
                members[6] = plane[y-1][x+1]
                members[7] = plane[y][x+1]
                members[8] = plane[y+1][x+1]
                members.sort()
                plane[y][x] = members[4]
    
    return fout
I can use it to median float point clips finally, but it's slow like a b**, and I don't wanna die before it got the whole clip covered
so I'll just wait Myrsloik to update "std.Median"
__________________
If I got new ideas, will post here: https://github.com/IFeelBloated
feisty2 is offline   Reply With Quote
Old 12th June 2015, 08:19   #15  |  Link
feisty2
I'm Siri
 
feisty2's Avatar
 
Join Date: Oct 2012
Location: Los Angeles, California
Posts: 2,133
Code:
def rg11_int (n, f):
    fout = f.copy()
    
    for p in range(fout.format.num_planes):
        plane = fout.get_write_array(p)
    
        plane_height = len(plane)
        plane_width = len(plane[0])
        
        for y in range(1, plane_height - 1):
            for x in range(1, plane_width - 1):
                plane[y][x] = (plane[y][x] * 4 + (plane[y+1][x] + plane[y-1][x] + plane[y][x+1] + plane[y][x-1]) * 2 + plane[y+1][x+1] + plane[y+1][x-1] + plane[y-1][x+1] + plane[y-1][x-1] + 8) // 16

    return fout
it returns slightly different result compared to rgvs.RemoveGrain (,11), why? I got this mode11 algorithm from rgtools, they should be exactly the same
__________________
If I got new ideas, will post here: https://github.com/IFeelBloated
feisty2 is offline   Reply With Quote
Old 12th June 2015, 08:43   #16  |  Link
Myrsloik
Professional Code Monkey
 
Myrsloik's Avatar
 
Join Date: Jun 2003
Location: Ikea Chair
Posts: 2,002
You read from the same frame as you write. That's the problem.
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
Myrsloik is offline   Reply With Quote
Old 12th June 2015, 12:19   #17  |  Link
feisty2
I'm Siri
 
feisty2's Avatar
 
Join Date: Oct 2012
Location: Los Angeles, California
Posts: 2,133
so, in case I did something wrong about rg11, I just copied the code at #1 by jackoneill
Code:
import vapoursynth as vs
c = vs.get_core()

def removegrain_mode19(n, f):
    fout = f.copy()
    
    for p in range(fout.format.num_planes):
        plane = fout.get_write_array(p)
    
        plane_height = len(plane)
        plane_width = len(plane[0])
        
        for y in range(1, plane_height - 1):
            for x in range(1, plane_width - 1):
                plane[y][x] = (plane[y-1][x-1] + plane[y-1][x] + plane[y-1][x+1] + plane[y][x-1] + plane[y][x+1] + plane[y+1][x-1] + plane[y+1][x] + plane[y+1][x+1] + 4) >> 3
    
    return fout

src = rule6.vob
src = c.std.ShufflePlanes(src, planes=0, colorfamily=vs.GRAY)
src1 = c.std.ModifyFrame(clip=src, clips=src, selector=removegrain_mode19)
dif = c.std.MakeDiff (src1,c.rgvs.RemoveGrain (src,19)).std.Expr ("x 128 - 10 * 128 +")
dif.set_output()
and it shows it's not bit exact level of "same" compared to rgvs either
Quote:
You read from the same frame as you write. That's the problem.
I tried stuff like
clp=rule6
dup=clp
std.ModifyFrame(clip=clp, clips=dup, selector=xxx)
but not working
__________________
If I got new ideas, will post here: https://github.com/IFeelBloated
feisty2 is offline   Reply With Quote
Old 12th June 2015, 12:49   #18  |  Link
feisty2
I'm Siri
 
feisty2's Avatar
 
Join Date: Oct 2012
Location: Los Angeles, California
Posts: 2,133
and one more thing,
Quote:
Mode 19
Every pixel is replaced with the arithmetic mean of its 3x3 neighborhood, center pixel not included. In other words, the 8 neighbors are summed up and the sum is divided by 8.
it should be
Code:
plane[y][x] = (plane[y-1][x-1] + plane[y-1][x] + plane[y-1][x+1] + plane[y][x-1] + plane[y][x+1] + plane[y+1][x-1] + plane[y+1][x] + plane[y+1][x+1]) >> 3
according to the description

but actually implemented as
Code:
 plane[y][x] = (plane[y-1][x-1] + plane[y-1][x] + plane[y-1][x+1] + plane[y][x-1] + plane[y][x+1] + plane[y+1][x-1] + plane[y+1][x] + plane[y+1][x+1] + 4) >> 3
what's that "+4" for?
__________________
If I got new ideas, will post here: https://github.com/IFeelBloated
feisty2 is offline   Reply With Quote
Old 12th June 2015, 13:01   #19  |  Link
Myrsloik
Professional Code Monkey
 
Myrsloik's Avatar
 
Join Date: Jun 2003
Location: Ikea Chair
Posts: 2,002
How the code should be written in the first post.
Code:
import vapoursynth as vs
c = vs.get_core()


def removegrain_mode19(n, f):
    fout = f.copy()
    
    for p in range(fout.format.num_planes):
        plane = f.get_read_array(p)
        dst_plane = fout.get_write_array(p)
    
        plane_height = len(plane)
        plane_width = len(plane[0])
        
        for y in range(1, plane_height - 1):
            for x in range(1, plane_width - 1):
                dst_plane[y][x] = (plane[y-1][x-1] + plane[y-1][x] + plane[y-1][x+1] + plane[y][x-1] + plane[y][x+1] + plane[y+1][x-1] + plane[y+1][x] + plane[y+1][x+1] + 4) >> 3
    
    return fout


src = c.ffms2.Source("asdf.mov")
src = c.std.ModifyFrame(clip=src, clips=src, selector=removegrain_mode19)


src.set_output()
The +4 is added before the division to get proper rounding.
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
Myrsloik is offline   Reply With Quote
Old 12th June 2015, 13:22   #20  |  Link
feisty2
I'm Siri
 
feisty2's Avatar
 
Join Date: Oct 2012
Location: Los Angeles, California
Posts: 2,133
@Myrsloik
difs still exist, but a LOT smaller now, difs look like some random dust covered on a blank gray clip now, guess that's because of rounding errors caused by asm opt?
and I get that "+4" now, so "x >> 3"=floor(x/8), that +4 turns it into round (x/8), so it should be removed in float point version
edit: so "//" is actually floor division, I always thought it's round division
__________________
If I got new ideas, will post here: https://github.com/IFeelBloated

Last edited by feisty2; 12th June 2015 at 13:33.
feisty2 is offline   Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +1. The time now is 23:49.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2019, vBulletin Solutions Inc.