1st June 2010, 21:49 | #1 | Link | ||
typo lover
Join Date: May 2009
Posts: 595
|
ThreadRequest : yet another plugin for multithread processing
ThreadRequest is an AviSynth plugin for multithread processing written by LANTIS.
This is another implementation of PipeLine that borrows the idea of QuaddiMM, and it was quietly released to the public. I found this plugin on the web this March, but no license was stated there and I couldn't contact the author (his e-mail address is not published on his page). I called out to LANTIS in an AviSynth forum in Japan and waited for his reaction. He noticed my call around the middle of last month and decided to license it under the GPL. Since that issue has been cleared up, I decided to introduce the plugin here.
Author's page : http://lantis.homeunix.org/avisynth.shtml
Download : http://lantis.homeunix.org/archive/T...equest102a.zip
Quote:
Quote:
Last edited by Chikuzen; 3rd June 2010 at 05:30. |
||
2nd June 2010, 00:20 | #2 | Link |
Xbox Live: o 4lif o
Join Date: Jun 2009
Location: Monrovia, CA
Posts: 64
|
I'm fiddling with this right now, but not really getting a speed increase at all (I'm on an 8-core machine). Can you post some scripts you use this with on a regular basis, and perhaps provide some insight into what kind of speed returns you get out of it? I've been using SET's 2.6 MT build for quite some time with very few major issues, but I'm always looking for ways to speed things up further. Thanks for the post though, I'll post if I get some significant testing results.
|
2nd June 2010, 08:45 | #3 | Link |
typo lover
Join Date: May 2009
Posts: 595
|
...ok.
#environment
CPU : intel Core2Quad Q9450
Mem: 8GB (DDR2-800 2GBx4)
GPU : Radeon HD5870
OS : Windows7 Ultimate (64bit)
AviSynth: 2.5.8MT by SEt (32bit)

#Source
Anime OP (MPEG2 TS 1440x1080ix2700frames)

#script
SetMemoryMax(1536)
MPEG2Source("source.d2v")
#ThreadRequest(30)
TFM()
#ThreadRequest(30)
TDecimate() #2700frames -> 2160frames
#ThreadRequest(30)
FFT3DGPU(sigma=3.0)
#ThreadRequest(30)
LanczosResize(1280,720)
Return last

#Benchmark
avs2avi.exe script.avs -c null -o n

#Result
without ThreadRequest()
* Pass 1/1: Finished in 00:01:52.698 (19.17 FPS)
* Pass 1/1: Finished in 00:01:52.703 (19.17 FPS)
* Pass 1/1: Finished in 00:01:52.964 (19.12 FPS)
average : Finished in 00:01:52.788 (19.15 FPS)

with ThreadRequest()
* Pass 1/1: Finished in 00:00:57.009 (37.89 FPS)
* Pass 1/1: Finished in 00:00:56.210 (38.43 FPS)
* Pass 1/1: Finished in 00:00:55.946 (38.61 FPS)
average : Finished in 00:00:56.388 (38.31 FPS)

It seems that the speed has gone up to about 200%...
Last edited by Chikuzen; 2nd June 2010 at 19:06. |
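For anyone who wants to re-check the averages in the post above, here is a small Python sketch. It is not part of the original benchmark, and the helper names `to_seconds` and `summarize` are made up for illustration. It parses the avs2avi timings, averages them, and derives fps from the 2160 frames that remain after TDecimate.

```python
def to_seconds(stamp):
    """Convert an 'HH:MM:SS.mmm' timestamp (as printed by avs2avi) to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def summarize(stamps, frames):
    """Return (average seconds, average fps) over several benchmark passes."""
    avg = sum(to_seconds(s) for s in stamps) / len(stamps)
    return avg, frames / avg


FRAMES = 2160  # frame count after TDecimate, as stated in the script comment

without_tr = ["00:01:52.698", "00:01:52.703", "00:01:52.964"]
with_tr = ["00:00:57.009", "00:00:56.210", "00:00:55.946"]

base_avg, base_fps = summarize(without_tr, FRAMES)
tr_avg, tr_fps = summarize(with_tr, FRAMES)
speedup = base_avg / tr_avg

print(f"without ThreadRequest: {base_avg:.3f} s ({base_fps:.2f} fps)")
print(f"with ThreadRequest:    {tr_avg:.3f} s ({tr_fps:.2f} fps)")
print(f"speedup:               {speedup:.2f}x")
```

The ratio of the two averages comes out at almost exactly 2, which agrees with the "about 200%" estimate.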
2nd June 2010, 18:06 | #5 | Link |
Xbox Live: o 4lif o
Join Date: Jun 2009
Location: Monrovia, CA
Posts: 64
|
...dokey. I just wasn't using a complicated enough script apparently.
#environment
CPU : intel X5560
Mem: 3GB (DDr3-10700 1GBx3)
OS : WindowsXP
AviSynth: 2.6 SET (32bit)

#Source
Elephant's Dream (Cineform 1920x1080px2879frames)

#script
avisource("elephantsdream1080.avi")
threadrequest(10,5,2).Splin36resize(2048,1152)
threadrequest(10,5,2).assumeframebased()
threadrequest(10,5,2).seperatefields()
threadrequest(10,5,2).selectevery(8,0,1,2,3,2,5,4,7,6,7)
threadrequest(10,5,2).weave()
edeinted = threadrequest(10,5,2).tdeint()
threadrequest(10,5,2).tfm(edeint=edeinted)
threadrequest(10,5,2).tdecimate()
threadrequest(10,5,2).blur(0,1)
threadrequest(10,5,2).sharpen(0,1)
threadrequest(10,5,2).spline36resize(640,360)
trim(0,2878)

#Benchmark
Virtualdub 1.9.9 (Cineform)

#Results
- no Threading
Finished in 00:04:59
- with ThreadRequest()
Finished in 00:01:58

That certainly made a huge difference. And it seems more stable than setmtmode with overly complicated scripts like the one above. |
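The selectevery(8,0,1,2,3,2,5,4,7,6,7) line in the script above is easier to follow with the field indices written out. The Python sketch below only models the index arithmetic: from every group of 8 fields, SelectEvery keeps the fields at the listed offsets, and Weave then pairs consecutive fields into frames. The field labels (A1, A2, ...) and both helper functions are hypothetical stand-ins, not AviSynth code.

```python
def select_every(clip, step, *offsets):
    """Model of SelectEvery: from each group of `step` items keep the given offsets."""
    out = []
    for start in range(0, len(clip), step):
        for off in offsets:
            if start + off < len(clip):
                out.append(clip[start + off])
    return out


def weave(fields):
    """Model of Weave: pair consecutive fields into frames."""
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


# Four progressive source frames A..D, each split into two fields.
fields = ["A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2"]

picked = select_every(fields, 8, 0, 1, 2, 3, 2, 5, 4, 7, 6, 7)
frames = weave(picked)

# Frames whose two fields come from different source frames.
mixed = [f for f in frames if f[0][0] != f[1][0]]

for first, second in frames:
    print(first, second)
```

Under this model, 4 source frames become 5 woven frames, and two of the five combine fields from different source frames. That is the telecine-style pattern which tfm and tdecimate then have to undo later in the script.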
3rd June 2010, 08:59 | #6 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
In fact, using this can be interesting for scripts with several functions. A pipeline is interesting when you have several tasks, as a451guy451 stated.
But if you are running, for example, a script with only one call, MT/SetMTMode is more interesting, and a pipeline will do nothing. |
3rd June 2010, 22:20 | #7 | Link | ||
typo lover
Join Date: May 2009
Posts: 595
|
Quote:
Quote:
|
||
4th June 2010, 00:26 | #8 | Link |
Xbox Live: o 4lif o
Join Date: Jun 2009
Location: Monrovia, CA
Posts: 64
|
I haven't been cataloging my tests quite as well as Chikuzen, but I tend to agree with his analysis. There have only been a few instances where a transcode would go (slightly) faster for me with setmtmode (it might have something to do with the actual CPU specs though). Those cases were either just fps adjustments, or scripts with no real filtering happening. The major determining factor for me though is stability, and at that threadrequest() wins hands down. I had wanted to bench the two fairly with the script I referenced in my above post, but setmtmode couldn't run it with anything but modes 5 & 6, which wasn't really a fair comparison imho.
Also, I just realized my placement of threadrequest() in my first script tests is wrong. It seems like it should follow a filter, not precede it, correct? |
4th June 2010, 02:23 | #9 | Link | |||
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 5,391
|
Quote:
First, your quoted script above won't run at all. There's splin36resize instead of spline36resize, also seperatefields instead of separatefields, and tfm(edeint=edeinted) probably should read tfm(clip2=edeinted), I guess?

After correcting those, I got the following results on 1080p:
1 Thread:.. 1:46 (28.3 fps) (100%)
8 Thread:.. 1:28 (34.1 fps) (120%)
ThreadReq: 0:52 (57.7 fps) (204%)

By these numbers, threadrequest wins hands down. But it made me wonder, since in my tests with TGMC - which is a fairly complicated & complex script - I got much better results with setmtmode.

What was apparent is that, in all three cases, the actual CPU usage was fairly low. Singlethreaded it was around 22%, 8-threaded was about 25%, and threadrequest'ed was about 32%. That's very little.

Then I suspected it could be related to TFM and tdecimate. So I went ahead and simply deleted them both. Instead I put tdeint into place. (Yes the script then makes little sense processing-wise, but it's just for testing.)

Okay ... without TFM+tdecimate, but with tdeint, I then got the following numbers:
1 Thread:.. 3:08 (15.9 fps) (100%)
8 Thread:.. 0:57 (52.6 fps) (331%)
ThreadReq: 2:22 (21.1 fps) (132%)

Now look at that! Obviously, it is TFM+TDecimate in particular that don't play well with setmtmode. If these two are not involved, then the picture changes completely. I don't claim to understand *why* it is like this, but at least there's a sign where the problem is located.

Hence, I have to comment:
Quote:
As indicated here, and more obviously in the other thread linked above, SetMTmode seems to work very well even with computationally intensive scripts. 4-fold speed on a processing monster like TGMC is quite remarkable, I'd say. The true question is why SetMTmode works so badly together with TFM+TDecimate! (On another spontaneous idea, I tried to "buffer" these filters with RequestLinear, but that had only very little effect.) And lastly, Quote:
On the other hand, I already made a (weak) try to put ThreadRequest *into* the TGMC function. But it is not running, I get nothing but crashes, crashes, crashes. Could be I'm using it not appropriately, quite possible. TGMC is complex, and the knowledge base for ThreadRequest is little. Once more, I'm quite a newbie in Avisynth multithreading. I just try to learn, and report what I'm experiencing on the way.
__________________
- We´re at the beginning of the end of mankind´s childhood - My little flickr gallery. (Yes indeed, I do have hobbies other than digital video!) |
|||
4th June 2010, 03:00 | #10 | Link | |
typo lover
Join Date: May 2009
Posts: 595
|
Quote:
However, I think that you are correct, judging from the examples enumerated by LANTIS (the examples in my first post). |
|
4th June 2010, 03:40 | #11 | Link |
Registered User
Join Date: Dec 2003
Location: MO, US
Posts: 999
|
Didée, what mt mode did you use? 2? I don't have any actual experience with it, but from my understanding mode 2 creates as many instances of each filter as there are threads. It also replaces the normal cache with a special cache that manages the threads so that a filter instance is only accessed by one thread at a time. In this way instance 1 can produce frame 1, instance 2 can produce frame 2, etc... and it doesn't matter if the filter is thread safe or not (access to class variables etc..). Although, it begins to matter if the filter produces the same results if you seek vs linear requesting. This will work fine for tfm/tdecimate... i.e. won't produce errors and the output should be the same as linear requesting (as long as the frame requests to each instance of tdecimate are not greater than two cycles. For tfm, seeking doesn't matter except for one of the special matching modes for blending that works based on previous matches.). However, it won't save any time with tdecimate because for its decisions each instance of tdecimate needs the statistics about every frame in each cycle. So all of the instances are doing all of the calculations, and therefore it doesn't save any time vs using a single instance.
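The argument about tdecimate can be illustrated with a toy count in Python. This is only a sketch under loud assumptions, not how SetMTMode or TDecimate are actually implemented: it assumes a decimation cycle of 5 (consistent with 2700 -> 2160 frames earlier in the thread), it assumes output frames are dealt to filter instances round-robin, and it assumes each instance must analyse every input frame of any cycle it touches. The function name `metric_work` is hypothetical.

```python
CYCLE = 5         # assumed input frames per decimation cycle
KEPT = CYCLE - 1  # output frames per cycle (one frame dropped)


def metric_work(total_out, instances):
    """Count how many frame metrics each instance computes in this toy model."""
    seen = [set() for _ in range(instances)]  # cycles analysed by each instance
    for out_frame in range(total_out):
        inst = out_frame % instances           # assumed round-robin assignment
        seen[inst].add(out_frame // KEPT)      # whole cycle must be analysed
    return [len(cycles) * CYCLE for cycles in seen]


single = metric_work(2160, 1)
quad = metric_work(2160, 4)

print("1 instance :", single)
print("4 instances:", quad)
```

In this model, with 4 instances every instance ends up analysing all 2700 input frames, so the busiest instance does exactly as much metric work as a single instance would. Real thread scheduling will differ, so treat the numbers as an illustration of the reasoning only.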
Last edited by tritical; 4th June 2010 at 03:48. |
4th June 2010, 08:20 | #12 | Link | |
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,431
|
Quote:
Edit: In effect, if I understand correctly, applying threadrequest to TGMC is the same as applying it to just the last filter used inside TGMC. Last edited by Gavino; 4th June 2010 at 08:56. |
|
4th June 2010, 08:41 | #13 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
I think a complex script like TGMC, for example, needs to be rewritten internally with threadrequest inside it to really see the benefit.
Otherwise, on the global script, SetMTMode will do better; your results don't surprise me. It's each function inside the script which needs to be pipelined, not the whole script. Rewrite TGMC with threadrequest inside "on each line" (something I intended to try one day), and test afterward. |
4th June 2010, 12:50 | #14 | Link |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 5,391
|
@ tritical - thanks, I see. (Hope so, at least).
Let me ask the other way round: IF a script involves decimation via tdecimate, what would be promising approaches to make use of multi-threading? Any? Or not at all, necessarily?

Gavino, jpsdr - I might be a novice with MT, but I'm not silly. Of course I did not do TGMC().threadrequest(). Surely it was fiddled into the script, but ... problems.

Practical example and problem: Consider this quite-familiar piece of code:
Code:
o = bob(0,0.5)                                  #.threadrequest(10,5,2)
                                                #
o.temporalsoften(1,255,255,28,2).merge(o,0.25)  #.threadrequest(10,5,2)
                                                #
sup = msuper()                                  #.synchronize(80)
bv3 = sup.manalyse(isb=true,delta=3,search=4)   #.threadrequest(10,5,2)
bv2 = sup.manalyse(isb=true,delta=2,search=4)   #.threadrequest(10,5,2)
bv1 = sup.manalyse(isb=true,delta=1,search=4)   #.threadrequest(10,5,2)
fv1 = sup.manalyse(isb=false,delta=1,search=4)  #.threadrequest(10,5,2)
fv2 = sup.manalyse(isb=false,delta=2,search=4)  #.threadrequest(10,5,2)
fv3 = sup.manalyse(isb=false,delta=3,search=4)  #.threadrequest(10,5,2)
                                                #
o.mdegrain3(sup,bv1,fv1,bv2,fv2,bv3,fv3)        #.threadrequest(10,5,2)

I added TR to some lines, I added it to all lines, with default parameters, with larger-than-default parameters, also with synchronize in various places .... my clueless try is shown right of the # column. All I get is crashes after a few frames. It seems to get better when adding Synchronize(xx) to the sup=msuper() line - with that, it won't crash after just a few frames, but will crash after a few hundred frames.

Seems it's quite some hassle to get it right ... (... while SetMTMode runs out-of-the-box on this example. And for the time while the ThreadRequest'ed script IS running, it is slower than SetMTmode (66fps vs. 72fps).)
Last edited by Didée; 4th June 2010 at 12:53. |
4th June 2010, 14:36 | #15 | Link | |
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,431
|
Quote:
The most effective way to multithread any given script is not at all obvious and requires careful analysis of frame access patterns over the entire filter graph, also taking into account the action of the Avisynth cache. Add to that the unknown 'thread-safeness' of the individual filters and the whole thing is a minefield. |
|
4th June 2010, 14:56 | #16 | Link | |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 5,391
|
So much I figured already.
Quote:
Patiently waiting for enlightenment on how to successfully use ThreadRequest on a "primitive" MDegrain sequence. If it works so well with SetMTmode, it should be possible with ThreadRequest, too?
|
|
4th June 2010, 16:07 | #17 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
In fact, for me, on my i7@980, I must say that on 480p with TGMC, I went from around 5fps to around 15fps with SetMTMode(2,12).
So, as I'm lazy and already have something which speeds things up, I may not try to use threadrequest on TGMC.... |
4th June 2010, 16:39 | #18 | Link |
Registered User
Join Date: Dec 2003
Location: MO, US
Posts: 999
|
Atm, there isn't a good way to multithread tdecimate at the script level via setmtmode/MT etc... It would have to be done internally. Actually, it could probably be done using openmp with just a few lines, but I'd have to check the source code.
Personally, I think that for Avisynth 2.6 a function should be added to the plugin interface that requires a filter to report which mt modes it is compatible with, and for MT type multithreading automatically report how much overlap it needs for the current settings (still allow that to be overridden by the user though). That would eliminate a lot of problems for users trying to figure this out (in reality the only way to know which mode will work is to look at the source code). It would also get authors to think more about their implementation with respect to threading. Anyways, getting OT for this thread. Last edited by tritical; 4th June 2010 at 16:41. |
4th June 2010, 19:49 | #19 | Link | ||
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,431
|
Quote:
However, there's a difference between 'the most effective way' and 'a way that doesn't crash' - here a simpler analysis might suffice. Quote:
The result of bob() is fed into three different places (temporalsoften, merge and mdegrain3), while that of msuper() is fed into the six manalyse filters. Bob is probably(?) thread-safe, but likely the upstream source filter is not. So the first thing I would try is putting Synchronize() on bob() (or before it) and on Msuper(). Once the script works without crashing, it can then be tuned further by adjusting the threadrequest placement and parameters. This is all in theory - I haven't actually used any multithreading stuff as my processor is an ancient single-core. |
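The fan-out described above can be counted mechanically. The Python sketch below transcribes the MDegrain script quoted earlier in the thread into a plain dictionary (filter -> clips it reads) and counts how many filters consume each clip. The dictionary layout and the "source" placeholder are assumptions made for illustration; nothing here calls AviSynth.

```python
from collections import Counter

# filter (or variable) -> clips it reads, transcribed by hand from the script.
# "source" is a hypothetical name for whatever feeds bob().
reads = {
    "o": ["source"],                      # o = bob(0,0.5)
    "temporalsoften": ["o"],
    "merge": ["temporalsoften", "o"],
    "sup": ["merge"],                     # msuper() reads the implicit last
    "bv3": ["sup"],
    "bv2": ["sup"],
    "bv1": ["sup"],
    "fv1": ["sup"],
    "fv2": ["sup"],
    "fv3": ["sup"],
    "mdegrain3": ["o", "sup", "bv1", "fv1", "bv2", "fv2", "bv3", "fv3"],
}

# Count consumers per clip and keep only clips read by more than one filter.
fan_out = Counter(clip for inputs in reads.values() for clip in inputs)
shared = {clip: n for clip, n in fan_out.items() if n > 1}

print(shared)
```

The bob() result (o) has three consumers, as described above. The msuper() result shows seven in this count, because mdegrain3 also takes sup in addition to the six manalyse calls. Clips with several consumers are the ones where a synchronization point is worth trying first.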
||
9th June 2010, 05:35 | #20 | Link |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,926
|
Interesting: it seems that combining SetMTMode(2) with .ThreadRequest for only the heaviest parts (tried with filters like yadif and unblock) is faster than using .ThreadRequest alone on every line.
For example, this was more efficient:
SetMTMode(2)
directshowsource("E:\test.ts", audio=false)
Yadif(mode=0,order=1).ThreadRequest()
than this:
directshowsource("E:\test.ts", audio=false)
Yadif(mode=0,order=1).ThreadRequest()
or that:
directshowsource("E:\test.ts", audio=false).ThreadRequest()
Yadif(mode=0,order=1).ThreadRequest()
Interestingly, it immediately gave me a very constant 95% utilization on Avisynth SET 2.5.8 MT and a nice speedup.
__________________
all my compares are riddles so please try to decipher them yourselves :) It is about Time Join the Revolution NOW before it is to Late ! http://forum.doom9.org/showthread.php?t=168004 Last edited by CruNcher; 9th June 2010 at 05:39. |