9th March 2014, 03:36
TurboPascal7
Registered User
Join Date: Jan 2010
Posts: 270
Okay, here's something a bit more fun for you to play with.

CLExpr filter - an AviSynth expr filter, implemented in OpenCL.
The interface it provides is almost identical to that of masktools2. The functions are called cl_expr, cl_exprxy and cl_exprxyz (try to guess which masktools functions they map to). In theory, you can use all expressions and functions that worked with masktools. The offx/offy/w/h parameters aren't implemented; if people are interested in those, I could add them of course.

16-bit support is provided with the common lsb parameter. Set it to true to process stacked 16-bit clips.
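
To make the stacked layout concrete, here is a minimal Python sketch (not the plugin's code). It assumes the usual stacked convention where the top half of the plane holds the most significant bytes and the bottom half holds the least significant bytes; the helper names unstack16 and stack16 are made up for illustration.

```python
def unstack16(stacked_rows):
    """Merge a stacked plane (MSB rows on top, LSB rows below) into 16-bit values."""
    if len(stacked_rows) % 2 != 0:
        raise ValueError("a stacked plane must have an even number of rows")
    half = len(stacked_rows) // 2
    msb_rows, lsb_rows = stacked_rows[:half], stacked_rows[half:]
    return [
        [(msb << 8) | lsb for msb, lsb in zip(msb_row, lsb_row)]
        for msb_row, lsb_row in zip(msb_rows, lsb_rows)
    ]


def stack16(rows16):
    """Split 16-bit values back into the stacked MSB-over-LSB layout."""
    msb_rows = [[value >> 8 for value in row] for row in rows16]
    lsb_rows = [[value & 0xFF for value in row] for row in rows16]
    return msb_rows + lsb_rows


# A 2x2 plane of 16-bit samples is stored as a 2-wide, 4-high 8-bit plane.
stacked = [
    [0x12, 0xFF],  # MSB row 0
    [0x00, 0x80],  # MSB row 1
    [0x34, 0xFF],  # LSB row 0
    [0x01, 0x00],  # LSB row 1
]
pixels = unstack16(stacked)
```

This is why a stacked clip is twice as tall as the picture it represents: each 16-bit sample is split across two 8-bit rows.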

This plugin is highly experimental and could easily kill your GPU driver like it's the most natural thing to do. Drivers can usually restart themselves, but I warned you.

There's very little point in threading this plugin.

Why expr?
For most common uses on 8-bit clips this filter will be slower than masktools (especially the b branch in this thread), unless you have a very good video card/motherboard and a very bad CPU. On-board GPUs might perform a bit better, but I never tried. I don't know how it would perform on CPU devices either; right now it searches only for GPU devices.
But you can process 16-bit clips with it, which you obviously can't with mt_lutxy and mt_lutxyz.

Performance is mostly limited by memory transfer and hardly depends on the complexity of the expression you pass to it. For example, on my system cl_expr("x log log log log log",u=3,v=3) has the same performance as cl_expr("x 1 +",u=3,v=3). No dup/swap functions are provided.
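
A rough way to see why the transfer dominates is to count the bytes that have to cross the bus for every frame, regardless of the expression. The Python sketch below is only an estimate under stated assumptions: a YV12 (4:2:0) clip, all three planes processed (u=3, v=3), one upload and one download per frame. The function names are hypothetical and nothing here is measured from the plugin.

```python
def plane_bytes(width, height, bytes_per_sample=1):
    """Size of one plane in bytes."""
    return width * height * bytes_per_sample


def yv12_frame_bytes(width, height, bytes_per_sample=1):
    """Size of a full 4:2:0 frame: one luma plane plus two half-size chroma planes."""
    luma = plane_bytes(width, height, bytes_per_sample)
    chroma = plane_bytes(width // 2, height // 2, bytes_per_sample)
    return luma + 2 * chroma


def round_trip_bytes(width, height, bytes_per_sample=1):
    """Bytes moved per frame, assuming one upload and one download of every plane."""
    return 2 * yv12_frame_bytes(width, height, bytes_per_sample)


bytes_8bit = round_trip_bytes(1920, 1080)       # 8-bit clip
bytes_16bit = round_trip_bytes(1920, 1080, 2)   # 16-bit samples, twice the data
```

That per-frame cost is the same for "x 1 +" and for five nested logs, which fits the observation that both run at the same speed.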

Why OpenCL?
Two reasons:
1. It was very fun.
2. It's a lot easier to implement with OpenCL than with regular C/C++ (vsynth). Basically it works by calling mt_infix() on the expression and feeding the result to the OpenCL compiler. Done.
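
The "convert to infix, then compile" idea can be sketched in a few lines of Python. This is not the plugin's source and not how mt_infix() is implemented: rpn_to_infix and build_kernel_source are hypothetical helpers, only a handful of operators are handled, and the kernel text is an assumed shape for an 8-bit, single-clip case.

```python
BINARY_OPS = {"+", "-", "*", "/"}
UNARY_FUNCS = {"log", "exp", "sqrt", "sin", "cos"}


def rpn_to_infix(expr):
    """Turn a reverse Polish expression such as "x 1 +" into infix text."""
    stack = []
    for token in expr.split():
        if token in BINARY_OPS:
            if len(stack) < 2:
                raise ValueError(f"operator {token!r} needs two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {token} {right})")
        elif token in UNARY_FUNCS:
            if not stack:
                raise ValueError(f"function {token!r} needs one operand")
            stack.append(f"{token}({stack.pop()})")
        else:
            # Anything else is treated as a variable name or numeric literal.
            stack.append(token)
    if len(stack) != 1:
        raise ValueError("expression does not reduce to a single value")
    return stack[0]


def build_kernel_source(expr):
    """Wrap the infix expression in an assumed one-pixel-per-work-item kernel."""
    infix = rpn_to_infix(expr)
    return (
        "__kernel void expr(__global const uchar *src, __global uchar *dst) {\n"
        "    size_t i = get_global_id(0);\n"
        "    float x = (float)src[i];\n"
        f"    dst[i] = convert_uchar_sat_rte({infix});\n"
        "}\n"
    )


source = build_kernel_source("x 1 +")
```

The resulting string would then be handed to the OpenCL runtime to compile, so the expression parser never has to evaluate anything itself.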

That said, my experience with OpenCL is limited to ~10 hours, so I'm probably doing something very dumb in there. Please feel free to tell me how to improve memory transfer rates, device detection etc.

Binary
Here. Only an x86 version for now, but it works on x64 without problems.

EDIT: reuploaded the binary because NVIDIA apparently doesn't like long/ulong types and e.g. (ulong)(round(3.0f)) is suddenly equal to zero, while (uint)(round(3.0f)) is 3.
__________________
Me on GitHub | AviSynth+ - the (dead) future of AviSynth

Last edited by TurboPascal7; 9th March 2014 at 07:31.