MT in AviSynth 2.6


SEt
27th September 2009, 22:29
As suggested by IanB, I'm opening a thread to discuss MT implementation in AviSynth 2.6.

Main concepts I propose:

2.6 plugins are required to be threading-aware.
The threading mode is chosen by the filter itself, not by the script. The user shouldn't have to guess in which mode a filter won't crash and will run faster.


Implementation thoughts: at least 3 modes are needed.
0) Threading is handled by the filter: 1 instance, called from all threads with no protection.
1) Threading is handled by AviSynth: as many instances of the filter as there are worker threads. The filter is aware that the frame requests each instance sees are likely non-sequential and that some of the frames are handled by other instances.
2) No threading: 1 instance of the filter, protected by AviSynth so that it is called from only 1 thread at a time. Intended only for source filters and debugging. Frame requests are likely sequential (due to the cache after it), but that is not guaranteed.

The mode is selected at roughly the time of instance creation, so the filter can choose the mode based on the arguments passed to it.
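
A minimal sketch of how the three modes could map onto filter instances. Everything named here is hypothetical (Mode, build_entry_points, the Blur filter and its threading_mode rule), and real AviSynth plugins are C++, so this only illustrates the dispatch logic, not an actual API.

```python
import threading
from enum import Enum


class Mode(Enum):
    FILTER_HANDLED = 0  # one instance, called from all threads, no protection
    PER_THREAD = 1      # one instance per worker thread
    SERIALIZED = 2      # one instance, one caller at a time


class Blur:
    """Hypothetical filter that picks its own mode from its arguments."""
    created = 0

    def __init__(self, radius):
        Blur.created += 1
        self.radius = radius

    @staticmethod
    def threading_mode(radius):
        # Made-up rule, only to show the choice depending on arguments.
        return Mode.PER_THREAD if radius > 0 else Mode.FILTER_HANDLED

    def get_frame(self, n):
        return (n, self.radius)


def build_entry_points(make_filter, mode, n_threads):
    """Return one get_frame callable per worker thread."""
    if mode is Mode.PER_THREAD:
        return [make_filter().get_frame for _ in range(n_threads)]
    instance = make_filter()
    if mode is Mode.FILTER_HANDLED:
        return [instance.get_frame] * n_threads
    lock = threading.Lock()  # Mode 2: the core serializes the calls

    def guarded(n):
        with lock:
            return instance.get_frame(n)

    return [guarded] * n_threads
```

With 4 worker threads, Mode 1 creates 4 Blur instances while Modes 0 and 2 create one; only Mode 2 pays for the lock.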

Problem: some things (for example the Cdeblend script, and likely IVTC filters) want a single instance and sequential frame requests to work correctly. Perhaps there should be some help from the AviSynth core to run them that way even if it's very slow (Mode 3?). This could also benefit sources with temporal compression, where a single backward frame request can be very expensive but even unneeded sequential requests are cheap.


Related issues:

MT should be completely separated from caching. The cache is a Mode 0 filter that is inserted after most other filters.
Setting CacheHints should be mandatory. The cache should be more intelligent about the connected requesting filters and the number of worker threads:

When set to no caching and there is only one requesting filter, no cache is inserted at all.
When there are several requesting filters (like Interleave(last.f1(), last.f2())), a cache instance is inserted even if all of them request no caching.
The caching amount is adjusted by the number of worker threads and the number of requesting filters.
There should be a clean way to detect the cache filter, not a hack like now. I propose changing the return value from void to at least bool and returning the success state. The cache returns success for an IsCache request.


This is intended as a memory usage optimization, which is required to be able to run multiple threads. The cache's auto-guessing mode is often not needed at all (many filters are purely spatial) and can exceed the tight 32-bit address space limit before it learns that no caching is actually needed.
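
The cache insertion rules above can be sketched as a small decision function. The name plan_cache and the sizing formula (largest hint plus one frame per thread) are assumptions for illustration; the proposal only says the amount should scale with threads and requesters.

```python
def plan_cache(hints, n_threads):
    """hints: frames of caching each requesting filter asked for (0 = none).

    Returns the number of frames to cache, or None for "insert no cache".
    """
    if len(hints) == 1 and hints[0] == 0:
        return None  # single consumer, no caching wanted: skip the cache
    # Several consumers always get a cache, even if every hint is 0.
    # Assumed sizing: largest hint plus one in-flight frame per thread.
    return max(hints) + n_threads
```

For example, a purely spatial filter with one consumer yields None, while the Interleave case with two consumers that both ask for no caching still gets a small cache.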

thewebchat
27th September 2009, 22:54
What about this: 2 filters, both of which must run non-parallel, in sequential order.

In that case, would your idea be able to execute both of these filters in parallel, with filter A processing frame n+1 while filter B processes frame n from filter A?

SEt
28th September 2009, 00:02
It would be more difficult to balance CPU usage that way: quite often there is a single heavy filter with many light ones around it. Also remember that we have not just 2 threads but likely more than 8 in the near future. So I'd say it's not worth the effort.

MfA
28th September 2009, 01:33
Quote (SEt): "Perhaps there should be some help from the AviSynth core to run them that way even if it's very slow (Mode 3?)."
Help how? It might be nice to generate a warning when a frame is requested out of sequence from a filter/script which can't handle it ... but there is no way to really fix it, except rerunning the entire filter chain from frame 0.

If you want to generate the warning, then maybe add an extra parameter to loadplugin, and an alternative to function for scripts, to signal that they require sequential access?

PS. Well, that is the worst case of course. You could add another parameter giving the minimum number of frames the filter needs to resync, and then rerun the filter for just those.
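
A minimal sketch of that resync idea, assuming a hypothetical helper frames_to_rerun and assuming the resync frames are the ones immediately before the requested frame. With no resync value known, it falls back to the worst case of restarting from frame 0.

```python
def frames_to_rerun(last_served, requested, resync=None):
    """Frames a sequential-only filter must process to serve `requested`.

    last_served: last frame delivered in order (-1 if none yet).
    resync: minimum frames the filter needs to resynchronize, or None if
            unknown (worst case: restart from frame 0).
    """
    if requested == last_served + 1:
        return [requested]  # in sequence, nothing extra to do
    start = 0 if resync is None else max(0, requested - resync)
    return list(range(start, requested + 1))
```

An out-of-sequence request is also the point where the warning mentioned above could be emitted.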

IanB
28th September 2009, 05:24
Check this old thread: "Multicore optimization idea: running consecutive filters in different threads". It's far from complete but it has many good ideas. Post #20 has a good description of one of my preferred threading models; it needs some work on dealing with the sluggish (>50% of all time) filter case.

MfA
28th September 2009, 16:20
As long as we are throwing personal preference out there ... :)

My preferred threading model would throw each filter into its own process (you could still have a mode 0 on request, and create only one instance so it can do its own memory management) and just use shared memory for passing around video buffers. All filters would work: old ones, and new ones programmed without a care for thread safety. Hell, you could even still have a multithreaded mode to save the few MB of memory you'd otherwise spend, for filters which are programmed correctly.

SEt
28th September 2009, 16:26
MfA, I think something like adding a special filter after the cache that follows the problem filter, one that requests frames in sequential order from the last requested frame up to the current request, will help most of the time: backward requests will always hit the cache. It will request all frames even if not all are required, but either that is a very light penalty (temporally compressed sources) or you actually have to request all frames from the filter to get consistent results.
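
A minimal sketch of such a helper filter. The class name SequentialFeeder is hypothetical, the helper and the cache are merged into one object for brevity, and the cache is assumed unbounded, which a real cache would not be.

```python
import threading


class SequentialFeeder:
    """Turns arbitrary frame requests into strictly increasing ones."""

    def __init__(self, child_get_frame):
        self._child = child_get_frame
        self._cache = {}  # assumed unbounded here; a real cache is not
        self._next = 0    # next frame the child has not produced yet
        self._lock = threading.Lock()

    def get_frame(self, n):
        with self._lock:
            # Catch up from the last requested frame to the current one.
            while self._next <= n:
                self._cache[self._next] = self._child(self._next)
                self._next += 1
            return self._cache[n]  # backward requests always hit the cache
```

Requesting frames 3, 1, 5 in that order makes the child see 0 through 5 exactly once each, in order.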

IanB, about your model:
1) It's too complex.
2) Requires a lot of thread locking - adds significant overhead.
3) Theoretically unable to utilize the CPU completely when some filter (especially the last) takes more than 1/number_of_hardware_threads of the total time (that's 50% only for 2 hardware threads; for 8 it's 12.5%), which is exactly what happens in most of my scripts.
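
The arithmetic behind point 3 can be sketched as follows, assuming a pipelined model in which each filter is one stage that runs on at most one thread at a time (an assumption about the model under discussion, which is described in the other thread). The function names are hypothetical.

```python
def pipeline_speedup(stage_fractions, n_threads):
    """Upper bound on speedup when each filter is one pipeline stage.

    stage_fractions: each filter's share of single-threaded time (sums to 1).
    """
    slowest = max(stage_fractions)
    # The slowest stage caps throughput; thread count caps it as well.
    return min(n_threads, 1.0 / slowest)


def utilization(stage_fractions, n_threads):
    """Fraction of the available threads that can be kept busy."""
    return pipeline_speedup(stage_fractions, n_threads) / n_threads
```

Under that assumption, a filter taking 50% of the time still allows full use of 2 threads, but on 8 threads the bound is a 2x speedup, so at most a quarter of the CPU is used.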

Pipelining might be a nice addition around "Mode 3" filters in my proposal, but again I think it's not worth all the trouble to implement it.

Edit:
MfA, spawning multiple processes won't help at all for "Mode 2/3" filters, and "Mode 1" filters are perfectly fine inside a single process (most filters are "Mode 1", I think).

MfA
28th September 2009, 16:55
Simply not calling GetFrame on one instance from multiple threads will be sufficient for most, but you will always find a couple of people who like using global variables.

SEt
28th September 2009, 17:02
Global variables would also affect multiple instances of the filter in different places in the script, which is clearly a filter bug even in non-threaded AviSynth.

MfA
28th September 2009, 17:46
Only if they carry state between calls.

kemuri-_9
28th September 2009, 18:54
What I've thought up is something like the following:

A. Avisynth acts as a thread manager.
B. Avisynth itself spawns x threads at init, based on the number y of CPU cores in the user's computer.
- thread count could possibly be specified by the user within the script for extremely fine tweaking of performance based on what else is running on the computer (similar in concept to SetMemoryMax).
C. every filter is required to specify a maximum number of threads it would like to be allocated to process a single frame.
- the filter could get from avisynth the number of spawned threads and return a fractional amount of that.
D. once the frames needed by filterX are complete, it is allocated some number X (where 1 <= X <= wanted) of the available threads to work on frame Y, which is wanted by filterY.

this comes down to: frames are still processed on demand, but each filter will actually be working on different frames as they become available.
if the frame(s) needed by filterX are not available, then avisynth will allocate the available threads to other filters that can process.
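
A minimal sketch of the thread manager's bookkeeping, assuming a hypothetical ThreadBudget class. It only tracks how many threads are free and grants at most what a filter asked for; scheduling which filter runs next is left out.

```python
import os
import threading


class ThreadBudget:
    """Hands out worker threads to filters, never more than are free."""

    def __init__(self, total=None):
        # Default to one thread per core; a script could override it.
        self.total = total or os.cpu_count() or 1
        self.free = self.total
        self._lock = threading.Lock()

    def acquire(self, wanted):
        """Grant X threads with X <= wanted; 0 means none are free."""
        with self._lock:
            granted = min(wanted, self.free)
            self.free -= granted
            return granted

    def release(self, count):
        """Return threads to the pool once a frame is done."""
        with self._lock:
            self.free = min(self.total, self.free + count)
```

A grant of 0 corresponds to the case where the manager gives the threads to other filters that can proceed.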

this model
* would allow filters that want to be single-threaded to be treated as such
* still allow filters that want to be multi-threaded to also be treated as such.
* requires that plugins are made thread-aware, even if they themselves aren't threaded
- that is Avisynth should be able to go GetFrame(frame=n,threads=1), GetFrame(n+1,threads=1), etc. without issues.
* filters are not allowed to spawn their own threads.

the issue here is how exactly frame cache is going to be managed...
it could either be on a filter basis: that is filters are responsible for caching results for repeat frame requests. (are we being ideal here? extremely)
or it could be handled by Avisynth, but this would involve creating a fairly powerful (read: complex) cache system. (the likely to occur case)

but in any situation the cache system is going to need to be heavily overhauled/improved to properly handle threading.

thewebchat
28th September 2009, 20:50
kemuri9, won't that new model require every filter to be rewritten?

kemuri-_9
28th September 2009, 23:43
Quote (thewebchat): "kemuri9, won't that new model require every filter to be rewritten?"

yes, pretty much so.
supporting old filters that are not MT compatible while supporting a true MT model is going to cause a nightmare.

And I don't see instantiating x instances of filterY as a feasible solution, as some filters have large instance costs; creating multiple copies of the filter internally will only whittle away memory that could be better used elsewhere.

MfA
29th September 2009, 00:28
It's less work to just rewrite those filters to share context between instances.

Gavino
29th September 2009, 01:55
Quote (MfA): "It's less work to just rewrite those filters to share context between instances."
But not all instances will want to share the same context. Only those that are 'cloned' at the same time, corresponding to a specific instance in the script.

MfA
29th September 2009, 03:36
They don't have to ... share nothing is the default and simply works. If you want to share something, making a list of parameters and doing a search for a match isn't a whole lot of work.

kemuri-_9
29th September 2009, 15:41
Quote (MfA): "They don't have to ... share nothing is the default and simply works. If you want to share something, making a list of parameters and doing a search for a match isn't a whole lot of work."

this would be tedious, and would be better done by adding a virtual function to the base filter class that can compare filters to each other to test if they're 'equivalent' (the same filter instanced with the same parameters).
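
A minimal sketch of that equivalence test combined with the search for a match. The Filter base class, is_equivalent and attach_shared_context are hypothetical names, and the shared context is treated as an opaque object.

```python
class Filter:
    """Hypothetical base class with an equivalence test."""

    def __init__(self, **params):
        self.params = params
        self.context = None

    def is_equivalent(self, other):
        # Same filter type instanced with the same parameters.
        return type(self) is type(other) and self.params == other.params


def attach_shared_context(new_filter, existing, build_context):
    """Reuse the context of an equivalent filter, else build a new one."""
    for candidate in existing:
        if candidate.is_equivalent(new_filter):
            new_filter.context = candidate.context
            break
    else:
        new_filter.context = build_context()
    existing.append(new_filter)
    return new_filter.context
```

Note that comparing parameters alone would also merge two separate uses of the same filter in a script; restricting sharing to instances cloned from one script-level instance would need some extra identifier in the comparison.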