Writing VapourSynth filters in Python
jackoneill
1st June 2015, 19:57
Just for fun.
Here is RemoveGrain mode 19 implemented in Python:
import vapoursynth as vs
c = vs.get_core()

def removegrain_mode19(n, f):
    fout = f.copy()
    for p in range(fout.format.num_planes):
        plane = fout.get_write_array(p)
        plane_height = len(plane)
        plane_width = len(plane[0])
        for y in range(1, plane_height - 1):
            for x in range(1, plane_width - 1):
                plane[y][x] = (plane[y-1][x-1] + plane[y-1][x] + plane[y-1][x+1] +
                               plane[y][x-1] + plane[y][x+1] +
                               plane[y+1][x-1] + plane[y+1][x] + plane[y+1][x+1] + 4) >> 3
    return fout

src = c.ffms2.Source("asdf.mov")
src = c.std.ModifyFrame(clip=src, clips=src, selector=removegrain_mode19)
src.set_output()
It is exceptionally slow.
Groucho2004
1st June 2015, 21:51
It is exceptionally slow.
3 nested for loops with array operations in the inner loop - The Python script interpreter should manage one frame in less than a day. :D
MonoS
1st June 2015, 21:55
3 nested for loops with array operations in the inner loop - The Python script interpreter should manage one frame in less than a day. :D
Who needs those fancy AVX-512 instructions? I have an interpreter :D
Myrsloik
1st June 2015, 21:56
We obviously need to switch to PyPy (http://morepypy.blogspot.se/2011/07/realtime-image-processing-in-python.html)!
TurboPascal7
2nd June 2015, 05:51
Obviously you need to add some asm (https://gist.github.com/tp7/e12143e48503f19398f0) to it.
TheFluff
2nd June 2015, 07:55
Obviously you need to add some asm (https://gist.github.com/tp7/e12143e48503f19398f0) to it.
jesus fucking christ
foxyshadis
2nd June 2015, 09:57
Obviously you need to add some asm (https://gist.github.com/tp7/e12143e48503f19398f0) to it.
Coming soon to a yasm near you.
Monarc
6th June 2015, 10:11
We obviously need to switch to PyPy (http://morepypy.blogspot.se/2011/07/realtime-image-processing-in-python.html)!
Do you plan support for pypy? Would be nice :D
Myrsloik
6th June 2015, 10:21
Do you plan support for pypy? Would be nice :D
Cython can compile the module for pypy too (and 2.x python but I don't care about ancient versions). It's just that nobody's really tested it yet.
splinter98
8th June 2015, 12:12
Before we delve into the magical and mysterious world of PyPy, let's get this algorithm more optimised (and maybe gain a better understanding of Python internals at the same time).
Firstly, the nested for loops. Surely we need them, since we have to iterate over two axes? Nope: Python has a beautiful module in the standard library called itertools (https://docs.python.org/3/library/itertools.html). Its product function gives us the equivalent of a nested for loop in a single loop!
So:
for y in range(1, plane_height - 1):
    for x in range(1, plane_width - 1):
        ...
becomes:
from itertools import product
for y, x in product(range(1, plane_height - 1),
                    range(1, plane_width - 1)):
    ...
Great, now we don't recreate the range(1, plane_width - 1) object for every value of y!
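As a quick sanity check that the rewrite visits the same pixels in the same order, here is a small sketch. The 4x5 plane size is made up purely to keep the example short; nothing here touches VapourSynth.

```python
from itertools import product

# Hypothetical plane size, chosen only to keep the example small.
plane_height, plane_width = 4, 5

# Coordinates visited by the original nested loops.
nested = []
for y in range(1, plane_height - 1):
    for x in range(1, plane_width - 1):
        nested.append((y, x))

# Coordinates visited by the single product() loop.
flat = list(product(range(1, plane_height - 1),
                    range(1, plane_width - 1)))

# Same interior pixels, in the same row-major order.
print(nested == flat)  # True
```

The last argument to product varies fastest, so passing the y range first keeps the traversal row by row.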
Secondly, let's look at:
plane[y][x]
What's wrong with that? Nothing, except let's understand what it's doing. plane[y][x] is (almost) the same as row = plane[y]; row[x]. Let's break it down: plane[y] creates a new view object each time it's used. That would be fine, except that all we do next is look up the x value and discard the object. Wouldn't it be better to fetch the pixel value with a single lookup? Luckily we have a syntax for that:
plane[y,x]
This performs a single lookup for the pixel value instead of two, so it should give us a noticeable speedup, especially since we're iterating over every pixel.
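To make the difference visible, here is a rough sketch with a made-up RecordingPlane class (not part of VapourSynth) that records every key passed to __getitem__. It only illustrates how Python dispatches the two spellings; the real array object may behave differently internally.

```python
class RecordingPlane:
    """Hypothetical stand-in for a 2D plane that logs each __getitem__ key."""

    def __init__(self, rows):
        self.rows = rows
        self.keys = []

    def __getitem__(self, key):
        self.keys.append(key)
        if isinstance(key, tuple):
            # plane[y, x]: both indices arrive together as one tuple.
            y, x = key
            return self.rows[y][x]
        # plane[y]: hand back a whole row, which the caller then indexes again.
        return self.rows[key]


plane = RecordingPlane([[1, 2, 3],
                        [4, 5, 6]])

a = plane[1][2]  # plane.__getitem__(1), then a second lookup on the returned row
b = plane[1, 2]  # a single call: plane.__getitem__((1, 2))

print(a, b, plane.keys)  # 6 6 [1, (1, 2)]
```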
>> 3 runs at about the same speed as // 8, so let's use the one that better expresses what the algorithm is doing.
Finally, I noticed a bug in the original implementation: we are reading pixel values from the copied frame and not from the original frame! This means that pixels already written to the output affect the pixels computed after them (which I believe is not what the original mode 19 does).
So a pure Python implementation with speedups would look like this:
from itertools import product
import vapoursynth as vs
c = vs.get_core()

def removegrain_mode19(n, f):
    fout = f.copy()
    for p in range(fout.format.num_planes):
        plane = fout.get_write_array(p)
        inplane = f.get_read_array(p)
        plane_height = len(plane)
        plane_width = len(plane[0])
        for y, x in product(range(1, plane_height - 1),
                            range(1, plane_width - 1)):
            plane[y, x] = (inplane[y-1, x-1] + inplane[y-1, x] + inplane[y-1, x+1] +
                           inplane[y, x-1] + inplane[y, x+1] +
                           inplane[y+1, x-1] + inplane[y+1, x] + inplane[y+1, x+1] + 4) // 8
    return fout
Note: I haven't tested the code for actual speedups, but there should be some, or at least a reduction in memory consumption.
Finally, it's also worth pointing out that the result of get_write_array(p) can be passed to other modules such as numpy without incurring a memory copy. Using them, you may get even more of a speedup.
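The no-copy behaviour comes from Python's buffer protocol. Here is a minimal standard-library sketch of the idea, using a bytearray as a stand-in for the frame's pixel memory (an assumption for the demo; the real arrays come from get_read_array/get_write_array):

```python
# A bytearray stands in for the frame's pixel memory (assumption for this demo).
height, width = 2, 3
pixels = bytearray(height * width)

# Two buffer-protocol views over the same memory; neither one copies it.
flat = memoryview(pixels)
plane = flat.cast("B", [height, width])  # reinterpret as a 2D array of bytes

plane[1, 2] = 255          # write through the 2D view
print(pixels[5], flat[5])  # 255 255: the underlying memory changed too
```

A consumer such as numpy.asarray is expected to wrap a buffer the same way, so writes through the wrapper would land directly in the frame.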
TurboPascal7
8th June 2015, 14:39
Range in python3 only creates an iterator which is extremely cheap (both cpu and memory-wise). Also considering that there are no cyclic references, the created iterator (and memoryview objects for that matter) will be collected at the end of the outer loop after every iteration so I'm not sure where that memory consumption reduction would come from.
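A small sketch to back this up, assuming CPython on a 64-bit build. Strictly speaking, range gives back a lazy sequence object rather than an iterator, but the cost argument is the same:

```python
import sys

small = range(10)
huge = range(10**9)

# A range stores only start, stop and step, so the object itself
# is the same size no matter how many values it represents.
print(sys.getsizeof(small) == sys.getsizeof(huge))

# It supports len() and indexing without ever materialising the values;
# each for loop simply asks it for a fresh, equally cheap iterator.
print(len(huge), huge[-1])
```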
Limit64
9th June 2015, 07:09
Wouldn't it make sense to use numpy? It has highly optimized and parallelized methods for calculations with matrices. It should make it more readable, too.
splinter98
9th June 2015, 12:03
Range in python3 only creates an iterator which is extremely cheap (both cpu and memory-wise). Also considering that there are no cyclic references, the created iterator (and memoryview objects for that matter) will be collected at the end of the outer loop after every iteration so I'm not sure where that memory consumption reduction would come from.
This is very true; it's definitely a lot better in Python 3 than it is in Python 2. To be honest, I didn't get around to profiling the optimisations properly. However, there should still be some performance benefit from [y,x] over [y][x], as you only have a single lookup call instead of two plus a view creation. (That's not really the main bottleneck in this case, though; the main bottleneck is the conversion to and from Python ints when doing the calculations.)
Wouldn't it make sense to use numpy? It has highly optimized and parallelized methods for calculations with matrices. It should make it more readable, too.
Yes it would! The way get_read/write_array is written, it conforms to PEP 3118 (https://www.python.org/dev/peps/pep-3118/), which means you can use its output as the input to any other Python function that also supports the buffer interface, and it won't incur a memory copy! So a basic numpy filter would start like:
import numpy as np
import vapoursynth as vs
from scipy import ndimage
c = vs.get_core()

def removegrain_mode19(n, f):
    fout = f.copy()
    for p in range(fout.format.num_planes):
        plane = np.asarray(fout.get_write_array(p))
        inplane = np.asarray(f.get_read_array(p))
        # Implement filter below using numpy methods
    return fout
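One possible way to fill in that placeholder, sketched on a plain numpy array so it can be tried without VapourSynth. The helper name mode19_numpy is made up, and the sketch assumes an integer sample format; it is not the rgvs implementation.

```python
import numpy as np


def mode19_numpy(inplane):
    """Rounded average of the 8 neighbours; border pixels are left unchanged."""
    src = np.asarray(inplane)
    # Widen first so that summing eight samples cannot overflow the pixel type.
    wide = src.astype(np.uint32)
    total = (wide[:-2, :-2] + wide[:-2, 1:-1] + wide[:-2, 2:] +
             wide[1:-1, :-2] + wide[1:-1, 2:] +
             wide[2:, :-2] + wide[2:, 1:-1] + wide[2:, 2:])
    out = src.copy()
    # Each shifted slice lines one neighbour up with the interior pixels.
    out[1:-1, 1:-1] = ((total + 4) >> 3).astype(src.dtype)
    return out


flat_grey = np.full((4, 5), 10, dtype=np.uint8)
print(mode19_numpy(flat_grey))  # a constant plane stays constant
```

Inside the skeleton above, the same expression would presumably be assigned to plane[1:-1, 1:-1] instead of a fresh copy, so the result is written straight into the output frame.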
feisty2
11th June 2015, 15:09
the very FIRST filter I just wrote....
a spatial median (radius=1) filter
def median (n, f):
    fout = f.copy()
    for p in range(fout.format.num_planes):
        plane = fout.get_write_array(p)
        plane_height = len(plane)
        plane_width = len(plane[0])
        members=[plane[0][0]]*9
        for y in range(1, plane_height - 1):
            for x in range(1, plane_width - 1):
                members[0] = plane[y-1][x-1]
                members[1] = plane[y][x-1]
                members[2] = plane[y+1][x-1]
                members[3] = plane[y-1][x]
                members[4] = plane[y][x]
                members[5] = plane[y+1][x]
                members[6] = plane[y-1][x+1]
                members[7] = plane[y][x+1]
                members[8] = plane[y+1][x+1]
                members.sort()
                plane[y][x] = members[4]
    return fout
I can finally median floating-point clips with it, but it's slow like a b**, and I don't wanna die before it gets through the whole clip
so I'll just wait for Myrsloik to update "std.Median"
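For comparison, a sketch of the same radius-1 median on plain lists of rows, reading from one buffer and writing to another so that already-filtered pixels never feed back into later windows. The function name is made up; in a real filter the source would be f.get_read_array(p) and the destination fout.get_write_array(p).

```python
def median3x3(src):
    """Radius-1 spatial median on a list of rows; border pixels are copied."""
    height, width = len(src), len(src[0])
    dst = [row[:] for row in src]  # separate output buffer
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = sorted(src[y + dy][x + dx]
                            for dy in (-1, 0, 1)
                            for dx in (-1, 0, 1))
            dst[y][x] = window[4]  # middle of the 9 sorted samples
    return dst


noisy = [[0.1, 0.1, 0.1],
         [0.1, 9.0, 0.1],
         [0.1, 0.1, 0.1]]
print(median3x3(noisy)[1][1])  # 0.1
```

Because it only compares and copies samples, the same code works for integer and floating-point values alike.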
feisty2
12th June 2015, 08:19
def rg11_int (n, f):
    fout = f.copy()
    for p in range(fout.format.num_planes):
        plane = fout.get_write_array(p)
        plane_height = len(plane)
        plane_width = len(plane[0])
        for y in range(1, plane_height - 1):
            for x in range(1, plane_width - 1):
                plane[y][x] = (plane[y][x] * 4 + (plane[y+1][x] + plane[y-1][x] + plane[y][x+1] + plane[y][x-1]) * 2 + plane[y+1][x+1] + plane[y+1][x-1] + plane[y-1][x+1] + plane[y-1][x-1] + 8) // 16
    return fout
It returns a slightly different result compared to rgvs.RemoveGrain (,11). Why? I got this mode 11 algorithm from rgtools, so they should be exactly the same.
Myrsloik
12th June 2015, 08:43
You read from the same frame as you write. That's the problem.
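A tiny sketch of the effect on plain lists, using the same mode 11 weights as the code above. The helper name and the test frame are made up for illustration.

```python
def rg11(src, dst):
    """3x3 blur with weights 1-2-1 / 2-4-2 / 1-2-1: reads src, writes dst."""
    for y in range(1, len(src) - 1):
        for x in range(1, len(src[0]) - 1):
            dst[y][x] = (src[y][x] * 4
                         + (src[y+1][x] + src[y-1][x] + src[y][x+1] + src[y][x-1]) * 2
                         + src[y+1][x+1] + src[y+1][x-1] + src[y-1][x+1] + src[y-1][x-1]
                         + 8) // 16
    return dst


frame = [[0,   0,   0, 0],
         [0, 160, 160, 0],
         [0,   0,   0, 0]]

# Read from the untouched original, write to a copy.
separate = rg11(frame, [row[:] for row in frame])

# Read and write the same buffer, as the posted filter does.
buf = [row[:] for row in frame]
in_place = rg11(buf, buf)

print(separate[1])  # [0, 60, 60, 0]
print(in_place[1])  # [0, 60, 48, 0]
```

The second interior pixel differs because, in the in-place run, its left neighbour had already been overwritten with 60 by the time it was read.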
feisty2
12th June 2015, 12:19
So, in case I did something wrong with rg11, I just copied the code from post #1 by jackoneill:
import vapoursynth as vs
c = vs.get_core()

def removegrain_mode19(n, f):
    fout = f.copy()
    for p in range(fout.format.num_planes):
        plane = fout.get_write_array(p)
        plane_height = len(plane)
        plane_width = len(plane[0])
        for y in range(1, plane_height - 1):
            for x in range(1, plane_width - 1):
                plane[y][x] = (plane[y-1][x-1] + plane[y-1][x] + plane[y-1][x+1] + plane[y][x-1] + plane[y][x+1] + plane[y+1][x-1] + plane[y+1][x] + plane[y+1][x+1] + 4) >> 3
    return fout

src = rule6.vob
src = c.std.ShufflePlanes(src, planes=0, colorfamily=vs.GRAY)
src1 = c.std.ModifyFrame(clip=src, clips=src, selector=removegrain_mode19)
dif = c.std.MakeDiff (src1,c.rgvs.RemoveGrain (src,19)).std.Expr ("x 128 - 10 * 128 +")
dif.set_output()
and it shows that the result isn't bit-exact compared to rgvs either
You read from the same frame as you write. That's the problem.
I tried stuff like
clp=rule6
dup=clp
std.ModifyFrame(clip=clp, clips=dup, selector=xxx)
but it's not working
feisty2
12th June 2015, 12:49
and one more thing,
Mode 19
Every pixel is replaced with the arithmetic mean of its 3x3 neighborhood, center pixel not included. In other words, the 8 neighbors are summed up and the sum is divided by 8.
it should be
plane[y][x] = (plane[y-1][x-1] + plane[y-1][x] + plane[y-1][x+1] + plane[y][x-1] + plane[y][x+1] + plane[y+1][x-1] + plane[y+1][x] + plane[y+1][x+1]) >> 3
according to the description
but actually implemented as
plane[y][x] = (plane[y-1][x-1] + plane[y-1][x] + plane[y-1][x+1] + plane[y][x-1] + plane[y][x+1] + plane[y+1][x-1] + plane[y+1][x] + plane[y+1][x+1] + 4) >> 3
what's that "+4" for?
Myrsloik
12th June 2015, 13:01
This is how the code in the first post should have been written:
import vapoursynth as vs
c = vs.get_core()

def removegrain_mode19(n, f):
    fout = f.copy()
    for p in range(fout.format.num_planes):
        plane = f.get_read_array(p)
        dst_plane = fout.get_write_array(p)
        plane_height = len(plane)
        plane_width = len(plane[0])
        for y in range(1, plane_height - 1):
            for x in range(1, plane_width - 1):
                dst_plane[y][x] = (plane[y-1][x-1] + plane[y-1][x] + plane[y-1][x+1] +
                                   plane[y][x-1] + plane[y][x+1] +
                                   plane[y+1][x-1] + plane[y+1][x] + plane[y+1][x+1] + 4) >> 3
    return fout

src = c.ffms2.Source("asdf.mov")
src = c.std.ModifyFrame(clip=src, clips=src, selector=removegrain_mode19)
src.set_output()
The +4 is added before the division to get proper rounding.
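A short check of that rounding behaviour, assuming 8-bit samples so that the sum of eight neighbours lies between 0 and 2040:

```python
sums = range(0, 8 * 255 + 1)  # every possible sum of eight 8-bit samples

# A plain shift is floor division by 8: the fractional part is dropped.
assert all(s >> 3 == s // 8 for s in sums)

# Adding half the divisor first rounds to the nearest integer (ties go up).
assert all((s + 4) >> 3 == int(s / 8 + 0.5) for s in sums)

# Example: 12 / 8 = 1.5
print(12 >> 3, (12 + 4) >> 3)  # 1 2
```

Without the +4, every result would be biased downwards by half a step on average.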
feisty2
12th June 2015, 13:22
@Myrsloik
Diffs still exist, but they're a LOT smaller now; they look like random dust scattered over a blank gray clip. I guess that's because of rounding errors caused by the asm optimizations?
And I get that "+4" now: "x >> 3" = floor(x/8), and the +4 turns it into round(x/8), so it should be removed in the floating-point version.
edit: so "//" is actually floor division; I always thought it was rounding division
feisty2
14th June 2015, 13:51
import numpy as np
import vapoursynth as vs
from itertools import product

def Repair_mode1 (src, rep):
    core = vs.get_core ()
    def rep1 (n, f):
        fout = f[0].copy ()
        for p in range (fout.format.num_planes):
            plane = np.asarray (f[1].get_read_array (p))
            flt_plane = np.asarray (f[0].get_read_array (p))
            dst_plane = np.asarray (fout.get_write_array (p))
            plane_height = len (plane)
            plane_width = len (plane[0])
            members = [plane[0, 0]] * 9
            for y,x in product (range (1, plane_height-1), range (1, plane_width-1)):
                members[0] = plane[y-1, x-1]
                members[1] = plane[y, x-1]
                members[2] = plane[y+1, x-1]
                members[3] = plane[y-1, x]
                members[4] = plane[y, x]
                members[5] = plane[y+1, x]
                members[6] = plane[y-1, x+1]
                members[7] = plane[y, x+1]
                members[8] = plane[y+1, x+1]
                members.sort ()
                minnbr = members[0]
                maxnbr = members[8]
                dst_plane[y, x] = max (min (flt_plane[y, x], maxnbr), minnbr)
        return fout
    clip = core.std.ModifyFrame (clip=src, clips=[src, rep], selector=rep1)
    return clip
rgvs.Repair (,,1): an example of a filter that takes multiple input clips rather than a single clip
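The core of that filter, sketched on plain lists of rows so it can be tried without VapourSynth. The function and argument names are made up; it clamps each filtered pixel to the range spanned by the 3x3 window of the reference, using min/max directly instead of sorting nine values.

```python
def repair_mode1(filtered, reference):
    """Clamp each filtered pixel to the min/max of the reference 3x3 window."""
    height, width = len(reference), len(reference[0])
    out = [row[:] for row in filtered]  # borders keep the filtered values
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [reference[y + dy][x + dx]
                      for dy in (-1, 0, 1)
                      for dx in (-1, 0, 1)]
            out[y][x] = max(min(filtered[y][x], max(window)), min(window))
    return out


reference = [[10, 12, 14],
             [11, 15, 20],
             [13, 16, 18]]
filtered = [[0, 0, 0],
            [0, 50, 0],
            [0, 0, 0]]
print(repair_mode1(filtered, reference)[1][1])  # 20
```

A filtered value of 50 lies above the window maximum of 20, so it is pulled back to 20; a value already inside the 10 to 20 range would pass through unchanged.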