19th March 2017, 15:42   #3
TheFluff
Quote:
Originally Posted by hydra3333 View Post
thanks.
You're welcome!

I posted this last night without really finishing it because I was getting sick of staring at it, so here are a few clarifications and elaborations.

The by far biggest conceptual difference between Avs+ and VS is the way client application frame requests interact with the parallelism. In Avs+, all getframe calls are synchronous and therefore blocking. When the client application requests frame N from the end of the filter graph, the call propagates upwards in the filter chain synchronously, which means that the calling thread is blocked until the entire chain of getframe calls, all the way up to the source filter, has finished processing. This is obviously completely contrary to how a multithreaded system is supposed to work, and the solution chosen by the Avs+ developers was brute force.

That's where the MtGuards, the prefetcher and the shared VideoCache come in. The prefetcher runs several worker threads that request (predicted) output frames from the end of the graph before the client application does, so that when the client application asks for an output frame, that frame has already been processed. The key thing to understand here is that when the client application requests frame N, all the processing needed to produce that frame is by necessity done in one and the same thread.

However, frame requests almost never come alone, so there's more to this. To avoid multiple threads re-processing the same frame with the same filter (for example when subsequent calls to a temporal filter result in requests for overlapping frame ranges from an upstream filter), Avs+ inserts cache filters between each filter in the chain. Cache lookups (that is, getframe calls) from such a filter are protected by a mutex, so if the requested frame has already been requested by another thread but isn't ready yet, the lookup blocks until the frame is available. Evictions from the cache are on a least-recently-used basis.
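To make that request flow concrete, here's a minimal sketch of the model described above. This is toy Python, not actual Avs+ code or its API; the names (CacheFilter, Source, get_frame, in_flight) are made up for illustration. The point is that a getframe call runs the whole upstream chain synchronously on the calling thread, and the cache's mutex makes a second thread asking for an in-flight frame block until the first thread has produced it.

```python
import threading

class CacheFilter:
    """Toy model of an Avs+-style cache inserted between two filters."""
    def __init__(self, upstream):
        self.upstream = upstream
        self.cond = threading.Condition()  # the cache's mutex + wait queue
        self.cache = {}                    # frame number -> frame (LRU eviction omitted)
        self.in_flight = set()             # frames some thread is currently producing

    def get_frame(self, n):
        with self.cond:
            while n in self.in_flight:     # another thread is making it: block
                self.cond.wait()
            if n in self.cache:
                return self.cache[n]
            self.in_flight.add(n)
        # synchronous, blocking call up the chain, on this same thread
        frame = self.upstream.get_frame(n)
        with self.cond:
            self.in_flight.discard(n)
            self.cache[n] = frame
            self.cond.notify_all()         # wake anyone waiting on this frame
        return frame

class Source:
    def get_frame(self, n):
        return f"frame {n}"
```

A prefetcher in this model is just several threads calling `CacheFilter(Source()).get_frame(n)` for predicted values of n ahead of the client.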

If this sounds awkward, that's because it is, and the only reason it actually kinda works in practice is that client applications tend to request frames in order, linearly. The entire model is by its nature very fragile (especially when you involve filters that reorder frames), and as soon as MT_MULTI_INSTANCE filters are involved it starts requiring immense amounts of memory because of the frame caches needed to support it. If you use KNLMeansCL in Avs+ together with, say, Prefetch(4), you will get almost exactly the same performance as in VS (in which there is a single instance of the filter) but almost four times the memory usage. This is because the filter is bottlenecked by the GPU, so spawning more instances doesn't help at all, but Avs+ has to do it that way because it's not an MT_NICE_FILTER and there's no way to create a single instance that doesn't make everything upstream of it single-threaded and blocking.

There are also a few other quite common filters that fit very poorly into the concurrency model - for example, TDecimate is MT_SERIALIZED, so everything upstream of it is single-threaded and synchronous. You're just lucky that it usually happens to be placed soon after a source filter that is probably also MT_SERIALIZED, so you don't notice it much.
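A toy model of why extra instances of a GPU-bound filter buy you nothing: this is illustrative Python, not KNLMeansCL or Avs+ code, and the lock is a stand-in for the single GPU, which can only run one kernel at a time.

```python
import threading
import time

GPU = threading.Lock()  # stand-in for the one GPU: one kernel at a time

def instance(frames):
    # one filter instance; every frame serializes on the shared GPU
    for _ in frames:
        with GPU:
            time.sleep(0.005)  # simulated kernel time

def run(num_instances, total_frames):
    # split the frames across instances, MT_MULTI_INSTANCE-style
    chunks = [range(i, total_frames, num_instances) for i in range(num_instances)]
    threads = [threading.Thread(target=instance, args=(c,)) for c in chunks]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

Running `run(4, 40)` takes about as long as `run(1, 40)` because all the "kernels" queue on the same lock, but each extra instance would hold its own copy of the filter state - which is the memory blowup described above.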

In VS, on the other hand, all frame requests are in principle non-blocking and there is a central authority (the thread pool) that schedules work. A single frame request from a client application may be handled by any number of threads - if there is a slow fmParallelRequests filter somewhere in the chain but everything else is fmParallel, the thread pool will simply run everything else as fast as necessary to keep the fmParallelRequests filter processing without ever waiting for input (and without output ever waiting for that filter). The thread pool also keeps an eye on frame caches, which by the way are rather more sophisticated than in Avs+ but I can't be arsed to write even more right now and it's tangential to the topic anyway.
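As a rough sketch of that scheduling model (again toy Python, not the actual VS core, which uses callbacks and dependency tracking rather than nested futures; `request`, `source` and `blur` are made-up names): frame requests return immediately, and the smallest schedulable unit is one (filter, frame) pair, so one output frame's processing can be spread over several worker threads.

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)  # the central scheduling authority

def request(filt, n):
    # non-blocking frame request: returns a future immediately; the pool
    # decides which worker runs which (filter, frame) pair
    return pool.submit(filt, n)

def source(n):
    return f"src {n}"

def blur(n):
    # needs one upstream frame; that upstream work is a separate task,
    # possibly running on a different worker thread
    return f"blur({request(source, n).result()})"

# the client fires off several output requests at once; nothing blocks here
futures = [request(blur, n) for n in range(3)]
frames = [f.result() for f in futures]
```

Here producing output frame n involves two tasks (one `source`, one `blur`) that the pool is free to place on different threads, which is exactly what lets a slow filter keep working while everything around it runs ahead.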

tl;dr: in Avs+, the threading is frame-based per output frame. The smallest work unit is one output frame's entire processing chain from source filter to output. In VS, the threading is frame-based per filter. The smallest work unit is processing one frame in one filter. The latter is obviously a whole lot more flexible.


Open questions
I don't understand how scriptclip/conditionalfilter and friends could ever work reliably with the Avs+ concurrency model. Right now it simply doesn't work at all. The trivial example
Code:
ScriptClip(last, "Subtitle(String(YDifferenceFromPrevious))")
just says "PlaneDifference: this filter can only be used within runtime filters", which implies that the current_frame variable isn't set. There is thread-local storage for script variables in Avs+, so that isn't the problem - each single-threaded frame request has its own current_frame, and any frame requests made from inside a conditional filter would therefore operate on the correct frame, in theory. The problem is that scriptclip and friends effectively call eval() on every frame request, which means they call env->Invoke on every frame request, which means that a bunch of filter instances are constructed, and those constructors can in turn call env->Invoke and/or getframe, and the entire thing comes crashing down in flames with either deadlocks or undefined behavior. pinterf claimed a while ago that scriptclip is supposed to work in MT, and I'm probably missing something because of all the awful complexity in the Avs+ threading, but I can't see how it could ever do so reliably. In some very simple cases it might work, but I really don't think it can be made to work with arbitrary filter chains.

For comparison, in VS you specify at compile time what you're going to feed into FrameEval, and it executes as fmParallelRequests or fmUnordered depending on what it's doing. The called evaluation function never calls getframe and never constructs filter instances; instead it just returns a node (or a clip, in Avisynth parlance) from which the downstream filter can request frames.
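To illustrate the difference in shape (toy Python, not the real VapourSynth API; `Node`, `frame_eval` and the selector are made-up names): the evaluation function only *picks among* nodes that were already constructed when the graph was built, so no getframe calls and no filter construction ever happen inside it.

```python
class Node:
    """A filter graph node: produces frames on demand."""
    def __init__(self, fn):
        self.fn = fn

    def get_frame(self, n):
        return self.fn(n)

def frame_eval(selector):
    # VS-style: the per-frame function only *chooses* a node; it never
    # calls get_frame itself and never constructs filters via Invoke
    return Node(lambda n: selector(n).get_frame(n))

loud = Node(lambda n: f"loud {n}")
quiet = Node(lambda n: f"quiet {n}")

# the selector runs per frame, but both candidate nodes were built up
# front, at graph construction time
out = frame_eval(lambda n: loud if n % 2 else quiet)
```

Contrast this with scriptclip, where the per-frame code re-enters the script evaluator and constructs new filter instances on every single frame request.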

Last edited by TheFluff; 19th March 2017 at 21:38.