Old 19th March 2017, 03:31   #1  |  Link
TheFluff
Excessively jovial fellow
 
Join Date: Jun 2004
Location: rude
Posts: 1,100
Description and comparison of the Avs+ and VS concurrency models

Avisynth and Vapoursynth share a very simple video filter API model. The filter exposes a GetFrame function, which is supposed to return a single frame. When someone (either a client application or a downstream filter) calls GetFrame, the environment supplies some kind of reference to the upstream filter, from which the called filter may request any number of frames it needs to produce the single output frame (except if it is a source filter - those don't have upstream filters).
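To make the pull model concrete, here's a minimal Python sketch. None of this is real Avisynth or VS API: `Source`, `TemporalAverage` and `get_frame` are made-up names, and a "frame" is just a number.

```python
class Source:
    """Hypothetical source filter: no upstream, it fabricates frames itself."""

    def get_frame(self, n):
        return n * 10  # stand-in "frame": just a number


class TemporalAverage:
    """Hypothetical filter that needs frames n-1, n and n+1 from upstream."""

    def __init__(self, upstream, num_frames):
        self.upstream = upstream
        self.num_frames = num_frames

    def get_frame(self, n):
        # Clamp at the clip edges, then pull every needed frame from upstream.
        needed = [min(max(i, 0), self.num_frames - 1) for i in (n - 1, n, n + 1)]
        frames = [self.upstream.get_frame(i) for i in needed]
        return sum(frames) / len(frames)


clip = TemporalAverage(Source(), num_frames=100)
middle = clip.get_frame(5)  # pulls source frames 4, 5, 6
first = clip.get_frame(0)   # pulls source frames 0, 0, 1
```

Note that every `get_frame` call here blocks until upstream has answered, which is exactly the synchronous behavior the rest of this post is about.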

In an ideal world, when getframe is called, all frames that the filter needs to produce output should be available immediately, and output should be ready just in time for someone to consume it. Filters should not have to wait for input, and output should not be sitting around waiting for someone to consume it - the former increases processing time and the latter memory usage. Also, the number of software worker threads should not be greater than the number of hardware threads, assuming that you can keep them actually working - having more just increases memory usage.

Achieving all of this is obviously impossible in the real world.


Vapoursynth's concurrency model: asynchronous frame requests

VS uses a familiar and conceptually simple callback-based asynchronous I/O model. To support this, there's an asynchronous requestframe function that you call from your filter's getframe function, instead of calling the plain old getframe and waiting for it to return. When frame N is requested from your filter, you call requestframe as many times as you wish, with the frame numbers you need from upstream to produce your output. Each requestframe call returns immediately and you do not actually get any frames back yet, so you will not be able to start processing. Instead, your filter will be called again later, when your input is ready (you get called both every time one requested input frame is ready and when all input you requested is ready, and it is up to you to decide when to start processing - most filters just wait for all input frames to be ready). When a filter requests frames, each request (for a single frame) becomes a "work item" that is put into a queue managed by the VS core. The core also maintains a worker thread pool (defaulting to hardware_concurrency workers) that processes items from this queue, oldest first (FIFO). This is all just standard async callback stuff, so hopefully no surprises so far. If you've ever done something in, say, Node.js, all of this probably sounds pretty familiar.
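A rough, single-threaded Python sketch of that request/callback flow, for illustration only. `Core`, `Context`, the phase strings and the filter classes are all invented for this example; the real VS core uses a worker thread pool and a C API that looks nothing like this.

```python
from collections import deque


class Context:
    """Per-request state: which inputs were asked for and which have arrived."""

    def __init__(self, core, node, n, on_done):
        self.core, self.node, self.n, self.on_done = core, node, n, on_done
        self.inputs = {}
        self.pending = 0

    def request_frame(self, n, upstream):
        # Returns immediately; the frame shows up later through deliver().
        self.pending += 1

        def deliver(frame):
            self.inputs[n] = frame
            self.pending -= 1
            if self.pending == 0:  # all requested input is ready
                self.core.queue.append((self, "all_ready"))

        self.core.request(upstream, n, deliver)


class Core:
    def __init__(self):
        self.queue = deque()  # FIFO of work items, oldest first
        self.log = []         # order in which frames were finished

    def request(self, node, n, on_done):
        self.queue.append((Context(self, node, n, on_done), "initial"))

    def run(self):
        while self.queue:
            ctx, phase = self.queue.popleft()
            frame = ctx.node.get_frame(ctx.n, phase, ctx)
            if frame is not None:  # None means "still waiting for input"
                self.log.append((ctx.node.name, ctx.n))
                ctx.on_done(frame)


class Source:
    name = "source"

    def get_frame(self, n, phase, ctx):
        return n * 10  # no upstream, so the frame is produced right away


class Blend:
    name = "blend"

    def __init__(self, upstream):
        self.upstream = upstream

    def get_frame(self, n, phase, ctx):
        if phase == "initial":
            ctx.request_frame(n, self.upstream)
            ctx.request_frame(n + 1, self.upstream)
            return None  # nothing to process yet
        return (ctx.inputs[n] + ctx.inputs[n + 1]) / 2


core = Core()
results = {}
core.request(Blend(Source()), 3, lambda frame: results.update({3: frame}))
core.run()
```

The point of the sketch is that `Blend.get_frame` is entered twice for one output frame: once to say what it needs, and once more when everything it asked for has arrived.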

To determine how the thread pool interacts with your filter, the filter gets to choose between (in practice) three different concurrency modes when it registers itself with the core:
fmParallel is for thread-safe filters that do not need to modify any internal shared state between requests. Any number of threads may call the filter's getframe function at the same time, and for this to work the filter must only modify state associated with the request for one particular frame (reading shared state is fine). That means things like temporary work buffers need to be allocated and deallocated for each output frame produced - they cannot be allocated once per instance and kept for the lifetime of the filter. On modern hardware this is rarely a problem in practice; malloc is very fast these days.

If you do need something like a permanent work buffer per filter instance, or maybe if you're writing a GPU filter where context swaps are expensive, fmParallelRequests may be a better alternative. In this mode, your filter may still be asked to request the input frames it needs to produce a given output frame from many threads at the same time, but VS will ensure that input is only delivered to it from one thread at a time, and it will wait until the output is ready before delivering more input. In other words, your filter can asynchronously request many input frames in parallel, but it will only be asked to actually process one frame at a time. Hence, maintaining and modifying state per filter instance during processing is safe even without locks or mutexes, so you can have a filter that is effectively single threaded and thread-unsafe in its processing, but is still well-behaved in a multithreaded environment. KNLMeansCL uses this mode, for example.

If that is still too much concurrency for you, you may have to resort to fmUnordered. In this mode, your filter's getframe function will only ever be called from one thread at a time. You can still request input asynchronously, but you cannot be asked to do so while your filter is doing something else (like processing a frame, or determining what frame(s) to request to produce a different output frame). Probably the most common use case for this mode is synchronous (and possibly non-threadsafe) source filters that produce frames immediately when requested and can't handle multiple requests at the same time. FFMS2 uses this mode, for example. It can also be used for filters that examine and modify shared internal state when determining which frames to request from the upstream filter - VDecimate is one example of this.
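One way to picture the three modes is as "which parts of getframe may overlap between threads". The Python toy below is only a mental model under that assumption: `ToyFilter`, its lock and its counters are invented here, and this is not how the VS core actually enforces the modes.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ToyFilter:
    def __init__(self, mode):
        self.mode = mode
        self.gate = threading.Lock()   # serializes whatever the mode forbids
        self.meter = threading.Lock()  # protects the counters below
        self.in_any = self.max_in_any = 0
        self.in_process = self.max_in_process = 0

    def _phase(self, processing):
        # Count how many threads are inside the filter (and inside processing).
        with self.meter:
            self.in_any += 1
            self.max_in_any = max(self.max_in_any, self.in_any)
            if processing:
                self.in_process += 1
                self.max_in_process = max(self.max_in_process, self.in_process)
        time.sleep(0.002)  # pretend to do some work
        with self.meter:
            self.in_any -= 1
            if processing:
                self.in_process -= 1

    def get_frame(self, n):
        if self.mode == "fmParallel":
            self._phase(False)      # requesting input: may overlap
            self._phase(True)       # processing: may overlap too
        elif self.mode == "fmParallelRequests":
            self._phase(False)      # requesting input: may overlap
            with self.gate:
                self._phase(True)   # processing: one frame at a time
        else:  # "fmUnordered"
            with self.gate:
                self._phase(False)  # every entry into the filter
            with self.gate:
                self._phase(True)   # is one thread at a time
        return n


def run(mode, frames=16, threads=4):
    flt = ToyFilter(mode)
    with ThreadPoolExecutor(threads) as pool:
        list(pool.map(flt.get_frame, range(frames)))
    return flt


requests_mode = run("fmParallelRequests")
unordered_mode = run("fmUnordered")
```

After a run, `requests_mode.max_in_process` and `unordered_mode.max_in_any` both stay at 1, while `run("fmParallel")` is free to go higher on both counters.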

Attentive readers may have noticed already that most plain old thread-unsafe Avisynth filters could most likely be compatible with fmParallelRequests, except for the fact that Avisynth's getframe is synchronous, so they would not play nicely with anything else. To avoid this problem, VS has a whitelist of Avisynth filters together with which frames or frame ranges they are expected to request when the downstream filter requests frame N. This list also takes filter parameters into account, so filters with a temporal range parameter are handled correctly. All filters on the list are run with mode fmParallelRequests. If they're not whitelisted though, the core resorts to fmSerial, which behaves like old single threaded Avisynth does (getframe effectively becomes synchronous for the filter and all upstream filters).

In real filter chains, a slow filter that only supports fmParallelRequests (or fmUnordered) may easily become a bottleneck since its processing is single threaded even if everything else around it runs in parallel. This is only really meaningful in fairly trivial filter chains, though. However, if you're bothered by such a bottleneck, you can easily work around it by creating more than one filter instance, making them take care of a smaller part of the input clip each (for example by selectevery or by trimming the clip into two halves, or even by cropping) and then merging the resulting clips back together. That way you can get as many concurrent "single threaded" filters as you like.
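The split-and-merge workaround is easiest to see on frame numbers. Below, plain Python lists stand in for clips; `select_every`, `interleave` and `slow_filter` are illustrative helpers written for this post, not plugin functions.

```python
def select_every(clip, cycle, offsets):
    """List-based stand-in for SelectEvery: keep the given offsets of each cycle."""
    return [clip[base + off]
            for base in range(0, len(clip), cycle)
            for off in offsets
            if base + off < len(clip)]


def interleave(clips):
    """Take one frame from each clip in turn until all clips are exhausted."""
    out = []
    for i in range(max(len(c) for c in clips)):
        for c in clips:
            if i < len(c):
                out.append(c[i])
    return out


def slow_filter(clip, instance):
    """Pretend single-threaded filter; tags each frame with the instance used."""
    return [(frame, instance) for frame in clip]


clip = list(range(9))
even = slow_filter(select_every(clip, 2, [0]), "A")  # frames 0, 2, 4, ...
odd = slow_filter(select_every(clip, 2, [1]), "B")   # frames 1, 3, 5, ...
merged = interleave([even, odd])
```

`merged` comes back in the original frame order, with alternating frames handled by instance A and instance B. Keep in mind that this even/odd split changes which neighbors a temporal filter sees, so for temporal filters trimming into halves (or cropping) is the safer variant.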

Filters are encouraged (but not required) to request frames from their upstream filter in order, to make nice with source filters. However, because there's a central thread pool that keeps track of frame requests, VS filters can register themselves with the flag nfMakeLinear, which makes the thread pool attempt to reschedule frame requests to such filters so that frames are requested in order. FFMS2 does this, for example, and it can have a huge impact on performance (try selectevery(5, 2, 1, 0, 3, 4) after a ffvideosource call in Avs+ and compare it to VS and you'll see what I mean - when I tried that and put KNLMeansCL afterwards it ran at about a quarter of the speed). The thread pool also makes sure the same frame doesn't get requested from the same filter more than once at the same time. Finally, it also keeps track of frame caches and during processing it "learns" how much cache is actually beneficial, which is why the memory usage of VS processes tends to drop after a little while.
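To see why that selectevery call hurts a source filter, here's the frame mapping written out in Python. The helper name is made up, and sorting the outstanding requests is only a crude stand-in for what a linearizing scheduler tries to achieve.

```python
def select_every_source_frame(n, cycle, offsets):
    """Source frame that output frame n of SelectEvery(cycle, *offsets) maps to."""
    return (n // len(offsets)) * cycle + offsets[n % len(offsets)]


# Source frames requested for the first ten output frames.
requested = [select_every_source_frame(n, 5, [2, 1, 0, 3, 4]) for n in range(10)]

# Every step backwards is a potential seek for a linear-access decoder.
backward_steps = sum(1 for a, b in zip(requested, requested[1:]) if b < a)

# Crude model of linearizing: serve the outstanding requests in frame order.
linearized = sorted(requested)
```

The source filter sees 2, 1, 0, 3, 4, 7, 6, 5, 8, 9, which is four backward steps in ten frames, while the linearized order has none.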

Client applications interacting with the VS API can request frames either synchronously or asynchronously. vspipe defaults to requesting as many frames at a time as there are worker threads in the VS core.


The Avisynth+ concurrency model: buying compatibility with complexity

Avs+ developers valued compatibility with existing filters and existing API users more than anything else, and hence the multithreading had to fit within the existing Avisynth API, which is completely synchronous. Also, it was seen as desirable to be able to parallelize old filters that were not designed with multithreading in mind. On the surface of things, the Avs+ concurrency model seems deceptively simple: multithreading is enabled by creating a prefetcher filter at the end of the script, which spawns a number of worker threads which in turn request frames from upstream filters. Since the client application calling getframe on the prefetcher is a synchronous operation, the prefetcher attempts to predict which frames will be requested in the future, and starts requests for them before they are actually requested.

Avs+ filters only have two practically useful concurrency modes; they can either be MT_NICE_FILTER, which is the equivalent of fmParallel, or they can be MT_MULTI_INSTANCE. In the latter case, Avs+ simply spawns one filter instance per worker thread. There is no equivalent to fmParallelRequests since there are no asynchronous frame requests; instead, the only alternative is MT_SERIALIZED, which essentially forces everything upstream of it to run single threaded.

MT_MULTI_INSTANCE sounds cool until you realize that each instance comes with its own buffers, allocations and other overhead. When the same MT_MULTI_INSTANCE filter is invoked several times in one script, the number of instances is the number of invocations multiplied by the number of prefetcher threads, and suddenly your script is hanging on open because it's trying to create 12 instances of KNLMeansCL. MT_MULTI_INSTANCE simply isn't a reasonable default, but since there's nothing else that's what you get.
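As a back-of-the-envelope illustration of that multiplication (the 3 invocations and 4 threads are assumed numbers picked to land on 12, they're not taken from a real script):

```python
def multi_instance_count(invocations, prefetch_threads):
    """Instances created for one MT_MULTI_INSTANCE filter used several times."""
    return invocations * prefetch_threads


# Assumed example: the filter appears 3 times in the script, with 4 prefetch threads.
instances = multi_instance_count(invocations=3, prefetch_threads=4)
```

Each of those instances brings its own buffers along, so memory use scales with the same product.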

Then it turns out that the Avs+ threading isn't so simple at all. I was making an honest effort at trying to understand how it actually worked, but stopped reading the code when it felt like, if I dug any deeper, I'd be staring a balrog in the eye very soon. Every filter is wrapped by an instance of a pseudo-filter called MtGuard. Since there's no central tracking of frame requests, the MtGuards and the prefetcher instead seem to use a global, thread-safe (with locking) frame cache that prevents the same frame from getting processed more than once by requests from different threads (no this is actually wrong, see below). Then there's a huge number of caveats and things that basically don't work, mainly related to how much old garbage Avs+ has inherited. There's a ton of code related to handling env->invoke from various places, for example - remember that in Avisynth, you can invoke from inside getframe. Runtime filters basically don't seem to work at all if multithreading is enabled (that is, filters that call getframe from a non-getframe function, with the frame number based on the state of a script variable) and there are comments in the source code that seem to indicate that there have been attempts to get them to work but it has resulted in either heap corruption or deadlocks, which doesn't surprise me at all.

The Avisynth+ threading code is vastly more complex than the VS equivalent. Understanding how frame requests are actually handled and routed through the multi-threaded filter chain is incredibly difficult, and reasoning about performance is likewise all but impossible. The reliance on MT_MULTI_INSTANCE for everything that isn't natively thread-safe causes issues with memory consumption, and the entire thing has a number of problems caused by fundamental design issues that I believe are unlikely to ever get fixed.



tl;dr: if you try to shoehorn multithreading into an API designed for being single threaded and synchronous, you're gonna have a bad time.


Disclaimer: I have not attempted to run Avs+ in a debugger; my understanding of the concurrency model is based only on reading source code (and a few old d9 posts). I have probably misunderstood things. Please correct any errors.

Last edited by TheFluff; 19th March 2017 at 17:00.
Old 19th March 2017, 07:24   #2  |  Link
hydra3333
Registered User
 
Join Date: Oct 2009
Location: crow-land
Posts: 540
thanks.
Old 19th March 2017, 15:42   #3  |  Link
TheFluff
Excessively jovial fellow
 
Join Date: Jun 2004
Location: rude
Posts: 1,100
Quote:
Originally Posted by hydra3333 View Post
thanks.
You're welcome!

I posted this last night without really finishing it completely because I was getting sick of staring at it, so here's a few clarifications and elaborations.

By far the biggest conceptual difference between Avs+ and VS is the way client application frame requests interact with the parallelism. In Avs+, all getframe calls are synchronous and therefore blocking. When the client application requests frame N from the end of the filter graph, the call propagates upwards in the filter chain synchronously, which means that the calling thread is blocked and waiting until the entire chain of getframe calls all the way up to the source filter has finished processing. This is obviously completely contrary to how a multithreaded system is supposed to work, and the solution chosen by the Avs+ developers was brute force. That's where the MtGuards, the prefetcher and the shared VideoCache come in. The prefetcher runs several worker threads that request (predicted) output frames from the end of the graph before the client application does, so that when the client application asks for an output frame that frame has already been processed. The key thing to understand here is that when the client application requests frame N, all the processing needed to produce that frame is by necessity done in one and the same thread. However, frame requests almost never come alone, so there's more to this. To avoid multiple threads re-processing the same frame with the same filter (for example in situations where subsequent calls to a temporal filter result in requests for overlapping frame ranges from an upstream filter), Avs+ inserts cache filters between each filter in the chain. Cache lookups (that is, getframe calls) from this filter are protected by a mutex so if the requested frame has already been requested by another thread but isn't ready yet, the lookup blocks until the frame is available. Evictions from the cache are on a least-recently-used basis.
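Here's a toy Python version of the prefetcher plus guarded cache idea, just to illustrate the mechanism. `GuardedCache`, `Prefetcher` and `slow_source` are names made up for this sketch, there is no LRU eviction, and the real Avs+ code is organized very differently.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class GuardedCache:
    """First requester of a frame computes it; later requesters block on the result."""

    def __init__(self, upstream_get_frame):
        self.upstream_get_frame = upstream_get_frame
        self.lock = threading.Lock()
        self.entries = {}  # frame number -> Future holding the frame

    def get_frame(self, n):
        with self.lock:
            fut = self.entries.get(n)
            owner = fut is None
            if owner:
                fut = self.entries[n] = Future()
        if owner:
            try:
                fut.set_result(self.upstream_get_frame(n))
            except Exception as exc:
                fut.set_exception(exc)
        return fut.result()  # blocks if another thread is still computing


class Prefetcher:
    def __init__(self, child, threads, lookahead):
        self.child = child
        self.pool = ThreadPoolExecutor(threads)
        self.lookahead = lookahead

    def get_frame(self, n):
        # Guess that the client reads linearly and start the next frames early.
        for k in range(1, self.lookahead + 1):
            self.pool.submit(self.child.get_frame, n + k)
        return self.child.get_frame(n)  # still a blocking, synchronous call


calls = {}
calls_lock = threading.Lock()


def slow_source(n):
    with calls_lock:
        calls[n] = calls.get(n, 0) + 1  # count how often each frame is produced
    return n * 10


prefetch = Prefetcher(GuardedCache(slow_source), threads=4, lookahead=3)
output = [prefetch.get_frame(n) for n in range(10)]
prefetch.pool.shutdown(wait=True)
```

Even though most frames get requested several times (once by the client and up to three times by lookahead), `calls` shows each one being produced exactly once. Note how the guess "the client reads linearly" is baked into `Prefetcher.get_frame`.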

If this sounds awkward, that's because it is, and the only reason it actually kinda works in reality is that client applications tend to request frames in order, linearly. The entire model is by its nature very fragile (especially when you involve filters that reorder frames) and as soon as MT_MULTI_INSTANCE filters are involved it starts requiring immense amounts of memory because of the frame caches required to support it. If you use KNLMeansCL in Avs+ together with, say, prefetch(4), you will get almost exactly the same performance as in VS (in which there is a single instance of the filter) but almost four times the memory usage. This is because it's bottlenecked by the GPU so spawning more instances doesn't help at all, but Avs+ has to do it that way because it's not a MT_NICE_FILTER and there's no way to create a single instance that doesn't make everything upstream of it single-threaded and blocking. There are also a few other quite common filters that fit very poorly into the concurrency model - for example, TDecimate is MT_SERIALIZED, so everything upstream of it is single threaded and synchronous. You're just lucky that it happens to usually be placed soon after a source filter that is probably also MT_SERIALIZED so you don't notice it much.

In VS, on the other hand, all frame requests are in principle non-blocking and there is a central authority (the thread pool) that schedules work. A single frame request from a client application may be handled by any number of threads - if there is a slow fmParallelRequests filter somewhere in the chain but everything else is fmParallel, the thread pool will simply run everything else as fast as necessary to keep the fmParallelRequests filter processing without ever waiting for input (and without output ever waiting for that filter). The thread pool also keeps an eye on frame caches, which by the way are rather more sophisticated than in Avs+ but I can't be arsed to write even more right now and it's tangential to the topic anyway.

tl;dr: in Avs+, the threading is frame-based per output frame. The smallest work unit is one output frame's entire processing chain from source filter to output. In VS, the threading is frame-based per filter. The smallest work unit is processing one frame in one filter. The latter is obviously a whole lot more flexible.


Open questions
I don't understand how scriptclip/conditionalfilter and friends could ever work reliably with the Avs+ concurrency model. Right now it simply doesn't work at all. The trivial example
Code:
ScriptClip(last, "Subtitle(String(YDifferenceFromPrevious))")
just says "PlaneDifference: this filter can only be used within runtime filters", which implies that the current_frame variable isn't set. There is a thread-local storage for script variables in Avs+, so that isn't the problem - each single-threaded frame request has its own current_frame and any frame requests made from inside a conditional filter would therefore operate on the correct frame, in theory. The problem arises with the fact that scriptclip and friends effectively call eval() on every frame request, which means they call env->Invoke on every frame request, which means that a bunch of filter instances are constructed, and those constructors can in turn call env->Invoke and/or getframe, and the entire thing comes crashing down in flames, with either deadlocking or undefined behavior. pinterf claimed a while ago that scriptclip is supposed to work in MT, and I'm probably missing something because of all the awful complexity in the Avs+ threading, but I can't see how it ever could do so reliably. In some very simple instances it might work, but I really don't think it can be made to work with arbitrary filter chains.

For comparison, in VS you specify at compile time what you're going to feed into FrameEval, and it executes as fmParallelRequests or fmUnordered depending on what it's doing. The called evaluation function never calls getframe and never constructs filter instances; instead it just returns a node (or a clip, in Avisynth parlance) from which the downstream filter can request frames.
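A toy Python model of that idea, following the description above. `Clip` and `frame_eval` are made up for this sketch and are not the actual FrameEval API; the point is only that the evaluation function picks a node and the wrapper makes the frame request.

```python
class Clip:
    """Stand-in node: frames are just labelled strings."""

    def __init__(self, label):
        self.label = label

    def get_frame(self, n):
        return f"{self.label}:{n}"


def frame_eval(select):
    """Return a clip whose frame n comes from whatever node select(n) returns."""

    class Evaluated:
        def get_frame(self, n):
            node = select(n)          # only chooses a node, fetches nothing
            return node.get_frame(n)  # the frame request is made by the wrapper

    return Evaluated()


# Both candidate nodes are built once, up front, not per frame.
sharp = Clip("sharp")
soft = Clip("soft")

out = frame_eval(lambda n: sharp if n % 2 == 0 else soft)
frames = [out.get_frame(n) for n in range(4)]
```

Because the selection function never touches frames or constructs anything, there is nothing in it that could deadlock against the scheduler.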

Last edited by TheFluff; 19th March 2017 at 21:38.
Old 20th March 2017, 02:57   #4  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,559
Great explanation of VapourSynth's model and Avisynth's complex situation.

It could be interesting to compare something even more drastic than Eval: MP_Pipeline, running the whole script as an eval-like function in different processes, and its impact on the whole model.

Edit: if MP_Pipeline is used *instead* of internal multi-threading, it won't be an issue here. It's really just another MT mode.

Last edited by MysteryX; 20th March 2017 at 02:59.
Old 20th March 2017, 09:42   #5  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,309
TheFluff, thanks for taking your time.
Yes, that ScriptClip issue in Avs+ MT will need much more time for me to understand and to make a proper solution (and not a hacked workaround). Calling a MT_MULTI_INSTANCE filter from ScriptClip that calls a runtime function results in the "current_frame" variable disappearing. The current_frame variable does exist, but it is in the variable list of another thread's thread-local storage.
I guess this cannot be solved by inserting one or two lines here and there, it may be the fundamental concept that fails and cannot handle the situation properly.
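The effect can be reproduced in a few lines of Python with `threading.local`. This only illustrates the thread-local storage pitfall; it is not the Avs+ code, and `script_vars`, `runtime_function` and `script_clip` are invented names.

```python
import threading

script_vars = threading.local()  # each thread gets its own variable list


def runtime_function():
    # Looks up current_frame in the *calling* thread's storage only.
    return getattr(script_vars, "current_frame", None)


def script_clip(n):
    script_vars.current_frame = n      # set in this thread's storage
    same_thread = runtime_function()   # finds it

    result = {}
    worker = threading.Thread(
        target=lambda: result.update(other=runtime_function())
    )
    worker.start()
    worker.join()
    return same_thread, result["other"]  # the other thread cannot see it


seen = script_clip(7)
```

`seen` is `(7, None)`: the variable exists, but only for the thread that set it.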
Old 20th March 2017, 15:32   #6  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,559
Is it possible to fundamentally redesign using a simple asynchronous model like VS, and have a synchronous wrapper around these asynchronous functions for compatibility? The challenge then would be compatibility with existing filters.
Old 20th March 2017, 19:09   #7  |  Link
TheFluff
Excessively jovial fellow
 
Join Date: Jun 2004
Location: rude
Posts: 1,100
You could do exactly what VS does and wrap each old plugin in a fake Avisynth environment, but VS doesn't implement everything in the fake environment. Invoke is banned, for example (it behaves as if the function wasn't defined), and runtime functions that call getframe from the constructor simply won't work at all (edit: Myrsloik says I'm wrong and it actually does work, there is a synchronous getframe that gets used in this case). VS works because a lot of effort was spent to replace a bunch of Avisynth filters with new ones that sometimes work in significantly different ways, and I don't think there's much interest in re-fixing a bunch of Avisynth plugins again. The decision to move to a new API should have been made a long time ago, but people wanted something that kinda worked right now and kept adding hacks onto the old API, which then became effectively standardized with new quirks, so you can't break compatibility because that'd be too much effort.

Then again you could do what VS does but in reverse and let Avisynth load VS plugins via some wrapper to replace the stuff broken by a new API :V

Last edited by TheFluff; 20th March 2017 at 19:17.
Old 21st March 2017, 06:39   #8  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,559
MT is complex but still it's working well in most cases. MP_Pipeline has some issues but I never use MP_Pipeline and MT at the same time anyway; one or the other.

Here we're talking about ScriptClip. We're not even talking about supporting any specific plugins... it's just that function that is broken.

If you want my opinion, it would be much better to implement conditional scripts at the script level, and not try to hack around and emulate it within a string. Plus, creating new plugin instances at every GetFrame isn't a good idea at all IMO. It's a performance killer. For something like KNLMeans that requires extensive initialization, don't even think about it. If it was handled by the language, then instances could be created and managed in a smarter way, and there would be no need to call Invoke within GetFrame.
Old 21st March 2017, 11:55   #9  |  Link
TheFluff
Excessively jovial fellow
 
Join Date: Jun 2004
Location: rude
Posts: 1,100
Lots of Avisynth script functions (srestore, for example) rely on ScriptClip. But yes, it is ultimately a hack and a bad solution. Avisynth in general has a bunch of problems caused by being too tightly coupled to its scripting language, and any language where using the equivalent of eval() is somewhat common is providing it because it's trying to hide the fact that its support for closures isn't good enough.

Avisynth wasn't originally a bad design, really. It made a lot of sense for something written in 2001 for 32-bit single core CPUs - it was easy to use and to understand, but still powerful, especially for its time. The problem is that people have tried to build all kinds of different stuff on top of it, stuff that really kinda needed different design patterns to remain relatively clean, well-behaved and easy to understand, but instead of providing a backwards compatible wrapper API/script environment people kept using the old one. I believe this to have been a fundamental mistake and it's only gotten harder to rectify it as time has gone by.

Last edited by TheFluff; 21st March 2017 at 12:02.
Old 21st March 2017, 16:33   #10  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,559
I also want to point out that supporting conditions within the scripting language itself allows for a public interface change that is fully compatible with older versions. These conditions can then also be used from DLLs with Eval, although that's kind of a hack.

Then, correct me if I'm wrong, but all plugins requiring ScriptClip are text scripts, and thus, it would be very easy to remove the brackets around the conditions.

There's also an old programming adage that says: if it's not broken, don't fix it.

Last edited by MysteryX; 21st March 2017 at 17:30.
Old 21st March 2017, 17:31   #11  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,559
Another observation: if the scripting language would support the ScriptClip syntax, then ScriptClip could be redirected as Eval and it should work. It doesn't mean the exact same syntax is the best idea though.

This would essentially mean reprogramming ScriptClip as part of the core and handling it in a special way, and then not allowing Invoke to be called from GetFrame from there on. If conditional programming is supported by the core language, then there is no need to call Invoke within GetFrame.
Old 21st March 2017, 18:59   #12  |  Link
Myrsloik
Professional Code Monkey
 
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,548
Quote:
Originally Posted by MysteryX View Post
I also want to point out that supporting conditions within the scripting language itself allows for a public interface change that is fully compatible with older version. These conditions can then also be used from DLLs with Eval, although that's kind of a hack.

Then, correct me if I'm wrong, but all plugins requiring ScriptClip are text scripts, and thus, it would be very easy to remove the brackets around the conditions.

There's also an old programming adage that says: if it's not broken, don't fix it.
It is objectively proven to be broken.
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
Old 21st March 2017, 19:04   #13  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,559
Another question to throw into the mix. MP_Pipeline is one of these plugins that is designed to do something that Avisynth isn't supposed to do: running Avisynth in several processes and/or several threads.

Although the concept seems simple, in truth, it runs into all the same challenges as MT modes, but doesn't have the same level of sophistication. How does it handle these challenges, and what is it doing, really?

The other option would be to complete VapourSynth to have all the benefits and features of Avisynth; one of which is audio support.

Last edited by MysteryX; 21st March 2017 at 19:06.
Old 21st March 2017, 20:41   #14  |  Link
Myrsloik
Professional Code Monkey
 
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,548
Quote:
Originally Posted by MysteryX View Post
Another question to throw into the mix. MP_Pipeline is one of these plugins that is designed to do something that Avisynth isn't supposed to do: running Avisynth in several processes and/or several threads.

Although the concept seems simple, in truth, it runs into all the same challenges as MT modes, but doesn't have the same level of sophistication. How does it handle these challenges, and what is it doing, really?

The other option would be to complete VapourSynth to have all the benefits and features of Avisynth; one of which is audio support.
MP_Pipeline's idea is really simple. Create several Avisynth instances in separate processes. Put a pre-fetcher to make it "multi-threaded" and the needed things to pass frames between these processes. And when you start wrapping Avisynth instances in multiple processes, who's really the plugin then?

If you look at the limitations the answer is that it doesn't get around anything that's relevant anymore. Avs+ has an x64 version and multi-threading. The only thing that might work better is ScriptClip and friends if you're very careful.

Note that I'm not a MP_Pipeline user.