Multi-threading: Roundup of options


pbristow
18th June 2011, 11:09
Since the topic of "yes, but why can't we have multi-threading *too*?" keeps cropping up in other development threads, I thought it might be useful to create a general roundup of the situation. If it's useful enough, it might be a candidate for a sticky...?


Current multithreading options, in Avisynth versions up to 2.5.8:
---------------------------------------------------------------

1. SetMTMode - Works by creating a shared cache, so that several instances of each filter (or filter chain) can run on separate threads without duplicating the processing needed to produce the frames they request. Essentially, this is "temporal" or "frame-wise" multi-threading.
- Requirements: an MT-enabled build of Avisynth.
- Pros: Can speed up the operation of most filters; avoids "split screen" artefacts.
- Cons: Interferes with operation of temporal filters such as TemporalSoften, MVTools, etc.; Requires several different modes to cope with different cases; Not always easy to understand what it's doing when debugging.
- Known bugs/issues: (to be added)

2. The MT() function - Works by splitting each frame of video either vertically or horizontally into strips, each strip being processed by one thread. This is "spatial" multi-threading.
- Requirements: The MT plugin plus an MT-enabled build of Avisynth.
- Pros: Easy to visualise and understand what's happening
- Cons: Interferes with operation of spatial filters (large blurs, de-blocking etc.); Interferes with block-matching across the strip boundaries (MVTools etc.).
- Known bugs/issues: (to be added)

3. The MTi() function - Works by creating two threads and passing "upper" fields to one, "lower" fields to the other. This is a specialised case of temporal multithreading.
- Requirements: The MT plugin plus an MT-enabled build of Avisynth.
- Pros: For truly interlaced material that will *remain* interlaced, neatly exploits the need to process the fields independently of each other.
- Cons: Does not allow processing of one set of fields to exploit data from the other (e.g. high quality de-interlacing);
Limited applicability.
- Known bugs/issues: (to be added)

4. Custom-implemented multithreading, inside specific functions.
- Requirements: An MT-enabled version of the relevant plug-in (which may not exist (yet)).
- Pros: The developer of the function has full control of what gets multithreaded and how, can diagnose the problems that relate to their specific case, and can create efficient, targeted solutions for those problems. (E.g. if spatial multi-threading is used, the optimum overlap can be determined more reliably, or the overlap merging scheme can be tailored to ignore pixels that will not be significant to the output, etc.);
Entirely new or case-specific methods of multithreading can be applied that may be more efficient than the obvious ones (e.g. one thread for each colour plane; multi-threading only the computation of output pixels/blocks that match a computed mask; etc.)
Does not require any change to Avisynth itself, so can be used immediately with new/alpha/beta versions which do not yet have any built-in multithreading.
- Cons: More work for plugin developers; Have to wait for the developer to create an MT version of the filter; Filter developer has to learn & think about appropriate MT techniques; Filters become more complex and harder to debug.
- Known bugs/issues: (These will be specific to individual filters.)

5. The ThreadRequest() function (See: http://forum.doom9.org/showthread.php?t=154886)
This enables a chain of filters to be split "mid-way", causing everything before ThreadRequest() to execute in a separate thread from everything after it.
- Requirements: ThreadRequest plugin; (does it also require an MT build of Avisynth?)
- Pros: Well suited (in theory) to long, but simple, linear filter chains;
Doesn't require spatial splitting of video data, so avoids joining/overlap artefacts;
Doesn't require temporal splitting of data, so avoids duplicated GetFrames/Caching issues/need for SetMTMode;
- Cons: Poorly documented; Requires careful analysis of script to determine best usage (which is hard to do without good documentation!);
- Known bugs/issues: Reported as either crashing or slowing to a crawl after a certain point in processing.
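The core idea of option 1 (SetMTMode), a frame cache shared between threads so that several filter instances don't each recompute the frames they need, can be illustrated outside AviSynth. The Python sketch below is a toy analogue, not SetMTMode's actual implementation: `SharedFrameCache`, `temporal_average` and `render_frames` are hypothetical names, a "frame" is just a short list of numbers, and because of CPython's GIL it shows the structure rather than any speed-up.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedFrameCache:
    """Thread-safe cache: each source frame is produced once, however many
    worker threads ask for it."""

    def __init__(self, produce):
        self._produce = produce      # function: frame number -> frame data
        self._frames = {}
        self._lock = threading.Lock()
        self.produce_count = 0       # number of frames actually computed

    def get(self, n):
        # Coarse lock: simple and correct, but it serialises frame production.
        with self._lock:
            if n not in self._frames:
                self._frames[n] = self._produce(n)
                self.produce_count += 1
            return self._frames[n]


def temporal_average(cache, n, last):
    """Average frame n with its neighbours (clamped at the clip ends)."""
    frames = [cache.get(i) for i in (max(0, n - 1), n, min(last, n + 1))]
    return [sum(px) / len(frames) for px in zip(*frames)]


def render_frames(num_frames, threads=4):
    # Hypothetical source: frame n is four pixels, all equal to n.
    cache = SharedFrameCache(lambda n: [float(n)] * 4)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Output frames are requested in parallel, one task per frame number.
        out = list(pool.map(lambda n: temporal_average(cache, n, num_frames - 1),
                            range(num_frames)))
    return out, cache.produce_count
```

Calling `render_frames(6)` renders six output frames while producing each source frame exactly once, even though most source frames are requested by three different tasks. Without the shared cache, every neighbour request would recompute its frame, which is the duplication the cache exists to avoid.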
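The strip-boundary problem listed under option 2 (MT()) is easy to reproduce. This Python sketch uses hypothetical helpers `vblur` and `split_process`; it is not the MT plugin's code. It splits a frame into horizontal strips, filters each strip on its own thread and stitches the results back. With no overlap, a 3-tap vertical blur sees a false edge at each seam; giving every strip one extra row of context on each side, then discarding those rows, makes the result identical to filtering the whole frame.

```python
from concurrent.futures import ThreadPoolExecutor


def vblur(rows):
    """3-tap vertical box blur over a list of rows (edges clamped within `rows`)."""
    last = len(rows) - 1
    return [[(rows[max(0, y - 1)][x] + rows[y][x] + rows[min(last, y + 1)][x]) / 3
             for x in range(len(rows[y]))]
            for y in range(len(rows))]


def split_process(frame, filt, strips=2, overlap=0):
    """Filter `frame` in horizontal strips, one thread per strip.

    Each strip is given `overlap` extra rows of context above and below;
    those rows are filtered but dropped when the strips are joined again.
    """
    h = len(frame)
    bounds = [(i * h // strips, (i + 1) * h // strips) for i in range(strips)]

    def work(bound):
        top, bottom = bound
        lo, hi = max(0, top - overlap), min(h, bottom + overlap)
        result = filt(frame[lo:hi])
        # Keep only the rows this strip is responsible for.
        return result[top - lo: top - lo + (bottom - top)]

    with ThreadPoolExecutor(max_workers=strips) as pool:
        parts = list(pool.map(work, bounds))
    return [row for part in parts for row in part]
```

The overlap needed grows with the filter's reach: a 3-tap blur needs one row, while a large blur or a block-matching search needs correspondingly more, which is why large blurs, de-blocking and MVTools suffer most from spatial splitting.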
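The field-splitting scheme of option 3 (MTi()) can be sketched the same way. Assumptions: a frame is a list of rows with an even height, the top field is the even-numbered rows (top field first), and `separate_fields`, `weave` and `mti_like` are hypothetical helpers, not the MT plugin's API. Each field goes to its own thread and the results are re-interleaved line by line.

```python
from concurrent.futures import ThreadPoolExecutor


def separate_fields(frame):
    """Split a frame (list of rows) into top (even rows) and bottom (odd rows) fields."""
    return frame[0::2], frame[1::2]


def weave(top, bottom):
    """Re-interleave two fields into one frame, top-field line first."""
    frame = []
    for top_row, bottom_row in zip(top, bottom):
        frame.extend([top_row, bottom_row])
    return frame


def mti_like(frame, filt):
    """Run `filt` on each field in its own thread, then weave the fields back together."""
    top, bottom = separate_fields(frame)
    with ThreadPoolExecutor(max_workers=2) as pool:
        top_future = pool.submit(filt, top)
        bottom_future = pool.submit(filt, bottom)
        return weave(top_future.result(), bottom_future.result())
```

Because `filt` only ever sees one field, it cannot use the other field's lines, which is exactly the limitation noted above for high-quality de-interlacing.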
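One of the custom schemes mentioned under option 4, one thread per colour plane, reduces to a few lines. The sketch below is hypothetical (`process_planes` is not part of any plugin SDK) and models a frame as a dict of planes.

```python
from concurrent.futures import ThreadPoolExecutor


def process_planes(planes, filt):
    """Apply `filt` to each colour plane (e.g. "Y", "U", "V") on its own thread.

    Expects at least one plane. In a 4:2:0 format such as YV12, each chroma
    plane holds a quarter as many samples as the luma plane, so the threads
    get unequal amounts of work.
    """
    with ThreadPoolExecutor(max_workers=len(planes)) as pool:
        futures = {name: pool.submit(filt, data) for name, data in planes.items()}
        return {name: future.result() for name, future in futures.items()}
```

The imbalance noted in the docstring is a good example of why the plugin developer, who knows the format and the per-plane cost, is better placed than a generic wrapper to decide how to split the work.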
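The idea behind option 5 (ThreadRequest), running each part of a linear chain in its own thread so an earlier stage can start on the next frame while a later stage is still busy, is a classic producer/consumer pipeline. This Python sketch shows that general pattern with bounded queues between stages; it is not ThreadRequest's implementation, and `run_pipeline` and its `buffer` parameter are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the frame stream


def run_pipeline(frames, stages, buffer=2):
    """Run each stage in its own thread, connected by bounded queues.

    Stage k can work on frame n+1 while stage k+1 is still on frame n;
    `buffer` limits how far ahead a stage may run.
    """
    queues = [queue.Queue(maxsize=buffer) for _ in range(len(stages) + 1)]

    def feed():
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(_DONE)

    def worker(stage, q_in, q_out):
        while True:
            item = q_in.get()
            if item is _DONE:
                q_out.put(_DONE)  # pass the end marker downstream
                return
            q_out.put(stage(item))

    threads = [threading.Thread(target=feed)]
    threads += [threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
                for i, stage in enumerate(stages)]
    for t in threads:
        t.start()

    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Throughput is capped by the slowest stage, so this kind of split only pays off when the chain divides into stages of similar cost; a chain dominated by one heavy filter gains little from it.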


Suggested alternative/future methods:
-------------------------------------
1. Exploiting natural parallelism within scripted tasks, e.g.:

Movie3d = AVISource("whatever.avi")
LeftEye = Movie3d.SomeComplexFunction(Left=True)
RightEye = Movie3d.SomeComplexFunction(Left=False)
StackHorizontal(RightEye,LeftEye) # Cross your eyes to view!


Becomes:

Movie3d = AVISource("whatever.avi")
ParallelProcess(threads=2, \
"LeftEye = Movie3d.SomeComplexFunction(Left=True)", \
"RightEye = Movie3d.SomeComplexFunction(Left=False)" \
)
StackHorizontal(RightEye,LeftEye) # Cross your eyes to view!


- Pros: Should be simple to implement;
Interestingly versatile: Using this method it would be possible for the *user* to effectively re-implement the MT() function, simply by cropping their source video into sections before passing them to ParallelProcess(). They would also then have more control over where the boundaries between strips/blocks fell.
Similarly, MTi() becomes something like:

# ParallelProcess() is the hypothetical function proposed above
Function MTi(clip c, string func) {
    c = SeparateFields(c)
    Upper = c.SelectEven()
    Lower = c.SelectOdd()
    ParallelProcess(2, "Up2 = Upper." + func, "Low2 = Lower." + func)
    Interleave(Up2, Low2).Weave()
}
- Cons: Limited *direct* applicability to more linear processing tasks (but see above);
Problem of how to return multiple clips as output. The syntax suggested here breaks with Avisynth norms, in that it has to return multiple results via what are effectively auto-generated global variables. (What happens if a script using this method is called by another script that also uses it? A clash of global variables?) Possible alternative: return multiple outputs as a single clip, stacked vertically (computationally efficient) or horizontally (may be necessary if the results have different widths).
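The multiple-outputs problem largely goes away if the parallel helper simply returns its results, in order, to the caller. A Python sketch of that alternative, using hypothetical names (`parallel_process`, `some_complex_function`, `stack_horizontal`) and treating a clip as a list of pixel rows:

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_process(*jobs):
    """Run independent jobs concurrently; return their results as a tuple,
    in the order given, instead of creating global variables."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(job) for job in jobs]
        return tuple(future.result() for future in futures)


def some_complex_function(clip, left):
    # Hypothetical stand-in for the expensive per-eye processing in the script.
    offset = 100 if left else 200
    return [[v + offset for v in row] for row in clip]


def stack_horizontal(a, b):
    """Join two clips of equal height side by side, row by row."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]


movie3d = [[1, 2], [3, 4]]
left_eye, right_eye = parallel_process(
    lambda: some_complex_function(movie3d, left=True),
    lambda: some_complex_function(movie3d, left=False),
)
result = stack_horizontal(right_eye, left_eye)  # right eye first, as in the script
```

Because the results are ordinary return values bound by the caller, nested uses cannot clash over variable names.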





... OK, what have I missed/got wrong? :)

ajp_anton
18th June 2011, 13:02
What about ThreadRequest? It lets multiple heavy filters each run in their own thread: while filter2 works on the output of filter1, filter1 can already start working on its next frame.

Didée
18th June 2011, 13:51
Yes, I wanted to suggest ThreadRequest, too. It seems promising, but the correct usage is a mystery to me.

Can anyone post a working ThreadRequest() script for, say, MDegrain3() with 6 threads for the motion searches, one thread for MSuper, and one thread for the final MDegrain3? Last time I tried, I didn't get anywhere; everything crashed very soon.

If you want to try, here's the starting point (http://forum.doom9.org/showthread.php?p=1405288#post1405288) that did not get me anywhere.

The existing "documentation" is more of an alibi explanation than a helpful one. The actual relationships and technical requirements between buffering/synchronizing/etc. are as clear as a black hole.

My guess is that this baby is simply not yet mature. But it's hard to tell for sure, when you're not confident about how the current baby is to be used correctly.

Gavino
18th June 2011, 15:02
Quote (Didée): Can anyone post a working ThreadRequest() script for, say, MDegrain3() with 6 threads for the motion searches, one thread for MSuper, and one thread for the final MDegrain3? Last time I tried, I didn't get anywhere; everything crashed very soon.
I provided a tentative response in post #19 of that thread.
Did you try anything along those lines?

I also posted some general thoughts about ThreadRequest in post #21.
(I don't know whether they make sense, as no one responded, either positively or negatively.)

Didée
18th June 2011, 15:20
No, I haven't tried anything after the initial one-, two- or three-hundred script crashes.
I've decided to simply give up on ThreadRequest until someone else might pop up and spread the wisdom.

Zarxrax
18th June 2011, 20:08
It seems that there are many different methods of achieving multithreading through Avisynth.
Of course it would seem that the most efficient way (computing speed-wise) is for each plugin to have its own custom threading. But for developers who don't want to write multithreading code, it would be nice if they can easily add some of these prebuilt solutions into their code. In other words, instead of the user having to select the appropriate threading method, the plugin developer would choose it.
And for older plugins that are no longer supported, there could be a database within avisynth which knows the best threading method for each plugin.
A solution like this could drastically simplify things for the end user. But maybe it's just wishful thinking.

pbristow
18th June 2011, 20:17
I've added ThreadRequest. Thanks, I'd completely forgotten about that one! :) (If I get time this weekend, I'll give it a try myself.)

I've also added a "requirements" line to each section. Could people check that I've got these right? In particular, does ThreadRequest require an MT-modified build of AviSynth, or is it stand-alone?

It would be wonderful if there were one or more multi-threading methods that were implemented simply as plug-ins, without the need for a modified AviSynth build at all. Wishful thinking, maybe, but it would give users some options that don't stop working whenever they move to the latest build of Avisynth 2.6, or whatever...

[EDIT:] Aha! This comment implies ThreadRequest does work stand-alone: http://forum.doom9.org/showthread.php?p=1507829#post1507829. I definitely need to try this one out...

cretindesalpes
6th July 2011, 23:23
Currently, if you have, say, 20 natively multithreaded plugins running on 8 logical cores, and each one spawns a thread per core, you'll end up with 20 × 8 = 160 active threads, which is a lot. And it gets worse if you enable the MT modes without limiting the thread count for these plugins...

That's why I'm thinking of writing a shared thread pool, initially for my Dither (http://forum.doom9.org/showthread.php?p=1386559#post1386559) package, and making it available to any other plugin through a separate DLL and an SDK. Of course, existing plugins would need to be adapted to benefit from the thread pool. Sharing the threads would minimize their number and make the thread control easier for the user. Moreover, this would accommodate the MT modes without multiplying the threads and consuming resources.

Usage would be simple: just push a bunch of tasks on a queue, and wait for their completion. The worker threads would dequeue the tasks and execute them.

Do you think this idea makes sense? Any suggestion?
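A minimal sketch of the shared pool described above, in Python: one fixed set of worker threads, a single task queue, and a per-batch completion counter so each caller waits only for its own tasks. `SharedTaskPool`, `run_batch` and `_Batch` are hypothetical names, not a proposed SDK.

```python
import queue
import threading


class _Batch:
    """Counts the outstanding tasks of one run_batch() call."""

    def __init__(self, count):
        self._remaining = count
        self._cond = threading.Condition()

    def task_finished(self):
        with self._cond:
            self._remaining -= 1
            if self._remaining == 0:
                self._cond.notify_all()

    def wait(self):
        with self._cond:
            self._cond.wait_for(lambda: self._remaining == 0)


class SharedTaskPool:
    """A fixed set of worker threads shared by every client (e.g. every plugin)."""

    def __init__(self, num_workers):
        self._tasks = queue.Queue()
        for _ in range(num_workers):
            # Daemon threads, so an idle pool does not keep the process alive.
            threading.Thread(target=self._work, daemon=True).start()

    def _work(self):
        while True:
            index, task, results, batch = self._tasks.get()
            try:
                results[index] = task()
            except Exception as exc:
                # Hand the error back as the result instead of killing the worker.
                results[index] = exc
            finally:
                batch.task_finished()

    def run_batch(self, tasks):
        """Queue every callable in `tasks`, block until all have run,
        and return their results in the same order."""
        results = [None] * len(tasks)
        batch = _Batch(len(tasks))
        for index, task in enumerate(tasks):
            self._tasks.put((index, task, results, batch))
        batch.wait()
        return results
```

One design caveat: if a task running on a worker itself calls `run_batch` and waits, it ties up that worker while waiting, and with enough nested calls every worker can end up waiting on the others. Since an AviSynth filter's GetFrame typically calls its upstream filters, a real shared pool would have to handle that case.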

pbristow
7th July 2011, 03:48
My own experiments with ThreadRequest were uninspiring. Yes, it "works" without an MT-adapted version of AviSynth (i.e. it doesn't fail, and doesn't slow things down); unfortunately, it doesn't seem to give any real performance advantage in any of the situations I tried it with. It seems to be suited only to a very narrow subset of AviSynth use-cases.

(More thorough analysis required.)

TheRyuu
7th July 2011, 13:02
tl;dr version: All options other than 4 don't work.

pbristow
7th July 2011, 15:29
Quote (cretindesalpes): ... Sharing the threads would minimize their number and make the thread control easier for the user. ...

Usage would be simple: just push a bunch of tasks on a queue, and wait for their completion. The worker threads would dequeue the tasks and execute them.

Do you think this idea makes sense? Any suggestion?

Certainly sounds like an avenue worth pursuing. (It's either this, or every plug-in has to have an option to limit the number of threads, and users have to learn how to use it.)

I'd like to keep this thread focussed on documenting what's *currently* available to users, if possible, but I look forward to reading of your progress in another thread. :)

pbristow
7th July 2011, 15:35
Quote (TheRyuu): tl;dr version: All options other than 4 don't work.

Correction: All options have their own problems, including 4. However, 4 is currently the one that works for users "straight out of the box".

TheRyuu
8th July 2011, 00:16
All options crash (also known as "not working") except for 4.

ftfy.