18th July 2008, 23:09 | #1 | Link |
Registered User
Join Date: May 2008
Posts: 10
|
Multicore optimization idea: running consecutive filters in different threads
From what I know, MT-plugin splits a frame into parts and processes each part in a different thread ("MT"), or uses alternating threads to get consecutive frames ("SetMTMode").
What I'm missing is a way to run consecutive filters on different threads, the way a pipeline does (as seen in modern CPUs for instruction processing). This would eliminate the overlap problem of 'MT' and the issues with filters that aren't reentrant or don't work with 'SetMTMode' for other reasons. Of course, such a pipeline might introduce other incompatibilities, but I'd like to give it a try.
A filter implementing the idea would basically do the following:
- The filter runs a worker thread that fetches frames from the previous filters in the chain.
- Every time 'GetFrame' is called, a request is sent to the worker thread, and the caller waits for the frame to become available (ideally, it already is).
Now, for it to work as a real pipeline, frames have to be generated before they are requested, so that a request can be answered immediately without processing delay. The simplest way is to assume that frames are requested in linear order (which is actually the case when encoding).
I wrote a small plugin to test the idea, and it works quite well: put "PipeLine" in the script so that the load is balanced between the filters below and above it. In a quick test I got a speed improvement of ~53%. With better balancing, improvements of up to 100% should be possible (or up to 300% on quad-cores), depending of course on the number of 'PipeLine's used and on the load balancing.
Can you please comment on the idea? Maybe it's already done in MT and I didn't notice? Is it generally a good idea, and should it be compatible with AviSynth?
While testing I got some random crashes. Also the race problem reappeared (no idea why). So I wonder if there is a better way to do it?
Attached is a binary of the filter, along with its source code. Note when compiling: because 'CreateThread' is used, it must be linked against the multithreaded DLL runtime (not the static library). I'm fine with that, as I prefer the DLL runtime anyway. |
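The two steps listed in the post (a worker thread that fetches frames, and a 'GetFrame' that waits on it) can be sketched roughly as follows. This is not the attached plugin's source: it is a minimal illustration in portable C++ in which `Frame`, `PipelineStage`, `upstream` and `num_frames` are hypothetical names, the upstream filter chain is modelled as a plain callable, and only a single consumer thread is assumed. Error handling (an exception thrown by the upstream call) is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a video frame (a real filter would hold a PVideoFrame).
struct Frame {
    int number;
    int payload;
};

// One pipeline stage: a worker thread pulls frames from `upstream` and keeps
// one frame of look-ahead, assuming the consumer asks for n, n+1, n+2, ...
// Sketch only: supports a single consumer thread.
class PipelineStage {
public:
    PipelineStage(std::function<Frame(int)> upstream, int num_frames)
        : upstream_(std::move(upstream)),
          num_frames_(num_frames),
          worker_([this] { Run(); }) {
        assert(num_frames > 0);
    }

    ~PipelineStage() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    // Blocks until frame n is available, then schedules frame n + 1.
    Frame GetFrame(int n) {
        assert(n >= 0 && n < num_frames_);
        std::unique_lock<std::mutex> lock(mutex_);
        if (wanted_ != n) {  // first call or missed prediction: retarget the worker
            wanted_ = n;
            cv_.notify_all();
        }
        cv_.wait(lock, [&] { return have_ == n; });
        Frame out = frame_;
        wanted_ = std::min(n + 1, num_frames_ - 1);  // speculate on linear access
        cv_.notify_all();
        return out;
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [&] { return stop_ || wanted_ != have_; });
            if (stop_) return;
            const int n = wanted_;
            lock.unlock();
            Frame f = upstream_(n);  // the expensive work runs outside the lock
            lock.lock();
            frame_ = f;
            have_ = n;
            cv_.notify_all();
        }
    }

    std::function<Frame(int)> upstream_;
    int num_frames_;
    std::mutex mutex_;
    std::condition_variable cv_;
    int wanted_ = -1;  // frame the worker should produce next
    int have_ = -1;    // frame currently stored in frame_
    bool stop_ = false;
    Frame frame_{-1, 0};
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

Because only the worker thread ever calls `upstream_`, the upstream chain never sees two requests at once, which is the property the pipeline relies on. A request that breaks the linear pattern is still answered correctly; it just waits for a fresh fetch instead of finding the frame ready.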
19th July 2008, 10:24 | #2 | Link |
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,431
|
In principle, this seems like a good idea to me.
But I think for it to work, all filters used need to be thread-safe and I don't know if that is widely true. I can see possible problems if you have a non-thread-safe filter repeated in different parts of the filter chain. For example: Code:
x = UnsafeFilter()
y = PipeLine(x).SomeOtherFilter()
x+y
|
19th July 2008, 14:35 | #3 | Link |
Registered User
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
|
I had a similar idea and ran into the same problems. If it was added to the core, I think the ideal place would be in the cache filter since it normally gets added internally after every filter by default.
I saw a similar massive speed improvement but it was unusable due to random crashes. Possibly the vfb management isn't/wasn't thread-safe, I think it's better now than when I tried but IanB would probably know best. |
19th July 2008, 18:47 | #4 | Link |
Registered User
Join Date: Mar 2002
Posts: 1,075
|
Isn't prefetching for parallelization going to be in AviSynth 2.6? (Of course, that seems to be on the same release schedule as 3.0.)
PS. there's an easy solution to thread safety ... don't use threads. What are we talking about here? A couple 100 KB per instance of avisynth and the filter, and filter completion times multiple orders of magnitude larger than context switch times ... peanuts. Threading offers no real benefits over multiple processes with shared memory for this particular application. |
20th July 2008, 16:31 | #5 | Link |
interlace this!
Join Date: Jun 2003
Location: i'm in ur transfers, addin noise
Posts: 4,555
|
if this could be made to work with mvtools, it would be the rocking-est rock that ever rocked.
i have an under-utilised 8-core machine, and HD/2k footage itching to be NR'd.
__________________
sucking the life out of your videos since 2004 |
20th July 2008, 20:28 | #6 | Link |
masktools2 (ab)user
Join Date: Oct 2006
Location: PAL-I :(
Posts: 235
|
So with this, could you possibly run a filter, that only works on the current frame (1D filter, if that is a good term to use), split the task into two with SelectEven/Odd and put each instance of the filter under PipeLine and thus let both cores be used on what you'd originally achieve with just one call of the function, on the whole clip?
|
20th July 2008, 20:50 | #7 | Link | |
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,431
|
Quote:
In practice, you might run into trouble if any of the upstream filters is not thread-safe (see my post #2). All it would take to screw up is modifying an instance variable in the filter's GetFrame call. (BTW I'd call it a spatial filter, or perhaps 2D) |
|
20th July 2008, 21:09 | #8 | Link |
Registered User
Join Date: May 2008
Posts: 10
|
Thanks for the replies.
@Gavino: You're right. There are ways to break it with non-thread-safe filters. Even then it might be useful for simple linear filter chains.
@MfA: It would be great if something similar makes it into AviSynth 2.6. I'm looking forward to it. About using processes: processes (can) run concurrently too, so the thread-safety problems remain. Actually, on Windows a process is a mere container for one or more threads.
@martino: You mean something like
Code:
src = last
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)
That 'would' be possible, if it actually worked (and didn't crash).
If it's not too much to ask, could someone please review my code? If there are errors, I'll try to fix them. If it's a problem with AviSynth... well... wait for 2.6. |
20th July 2008, 21:37 | #9 | Link |
Registered User
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
|
I had a quick look at the code, the only thing I'd comment on is I don't think it's worth implementing a cache since avisynth will already do that for you. It does crash the same as mine with the latest 2.5 beta, I haven't tried 2.6.
I don't see how non-threadsafe filters are a problem as long as getframe calls for individual filters are serialized, which is the point here - a filter doesn't process multiple frames in parallel, instead multiple filters run in parallel processing different frames. What would be cool is to have code that analyzes which frames are being requested and adapts, rather than assuming linear access. |
20th July 2008, 21:38 | #10 | Link | |
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,431
|
Quote:
Testing can only show the presence of bugs, not their absence. And that's even more true where multi-threading is concerned. Last edited by Gavino; 20th July 2008 at 21:42. |
|
20th July 2008, 21:49 | #11 | Link | |
Registered User
Join Date: Mar 2002
Posts: 1,075
|
Quote:
With multiprocessing every filter can be run in parallel with other instances of itself, unless the developer really tried very hard to break things (for instance by using named win32 objects as a sidechannel for passing data between filters ... not that many filters using sidechannels though, maybe mvtools?). Last edited by MfA; 20th July 2008 at 21:56. |
|
20th July 2008, 22:19 | #12 | Link | |
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,431
|
Quote:
Code:
src = AviSource(...).AnyFilter()
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)
Ah, :lightbulb: - how about if PipeLine had a companion called Serialize, designed to fix cases like this? You would then write
Code:
src = AviSource(...).AnyFilter().Serialize()
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)
Last edited by Gavino; 20th July 2008 at 22:27. Reason: Idea for Serialize filter |
|
20th July 2008, 22:29 | #13 | Link | ||||
Registered User
Join Date: May 2008
Posts: 10
|
Quote:
EDIT: That's actually wrong. You're right about the cache. I could omit it. But currently all frames are put in the cache, even those explicitly requested. I did that because it was the easiest way to do it. I'm new to threading and it confuses me, so I kept it simple. Quote:
Quote:
Quote:
Last edited by QuaddiMM; 20th July 2008 at 22:34. |
||||
20th July 2008, 22:57 | #14 | Link | |
Registered User
Join Date: May 2008
Posts: 10
|
Quote:
Something like: Code:
PVideoFrame __stdcall Serialize::GetFrame (int n, IScriptEnvironment* env)
{
    // Let only one thread at a time into the upstream filter chain.
    EnterCriticalSection (&cs);
    PVideoFrame frame = child->GetFrame (n, env);
    LeaveCriticalSection (&cs);
    return frame;
}
I suspect (without actually looking at it, so I might be wrong here) that the cause of the problems is the AviSynth cache, which may not be thread-safe. If that's the case, there's no way around fixing it. |
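One detail of the Serialize idea may be worth a sketch: if the upstream `GetFrame` throws, a manually paired enter/leave never reaches the leave call and the lock stays held. The following is a portable illustration of the same serializing wrapper using a scoped lock instead. It is not AviSynth code: `Clip`, `ThrowsOnNegative` and the `int` frames are hypothetical stand-ins, and `std::mutex` takes the place of the Win32 critical section.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for an upstream filter; frames are modelled as ints.
struct Clip {
    virtual ~Clip() = default;
    virtual int GetFrame(int n) = 0;
};

// Demonstration upstream that fails for invalid frame numbers.
struct ThrowsOnNegative : Clip {
    int GetFrame(int n) override {
        if (n < 0) throw std::out_of_range("negative frame number");
        return n * 2;
    }
};

// Lets only one caller at a time into the child's GetFrame.
class Serialize : public Clip {
public:
    explicit Serialize(Clip* child) : child_(child) { assert(child != nullptr); }

    int GetFrame(int n) override {
        // The guard unlocks on normal return and also when the child throws.
        std::lock_guard<std::mutex> lock(mutex_);
        return child_->GetFrame(n);
    }

private:
    Clip* child_;
    std::mutex mutex_;
};
```

With this shape, a failed request leaves the wrapper usable for the next one; in the Win32 version the equivalent would be a small RAII helper around the critical section.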
|
21st July 2008, 00:49 | #15 | Link | ||
masktools2 (ab)user
Join Date: Oct 2006
Location: PAL-I :(
Posts: 235
|
Quote:
Quote:
*downloads filter and will try some other time |
||
21st July 2008, 04:12 | #16 | Link |
Registered User
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
|
Instead of implementing serialization in another filter, why not just put it in pipeline and use it on every filter in the script? It's cheap to implement with a critical section like the code posted (critical sections are a lot faster than the other synchronization primitives).
I'd have to check your code again but I think it's already more or less serialized anyway, since getframe blocks for the "fetch" thread to return if it's already running. So if pipeline were in place immediately after a non-threadsafe filter, wouldn't it ensure only one getframe call went through at a time? Re: my idea of analyzing the frame order, I think scripts where this would be most useful are going to be the ones doing some sort of framerate adjustment via mvtools and whatever. Typically not linear access. Even adding a selecteven or selectodd would bork that. I never really got into statistics but something like a decaying mode would probably work, even for odd patterns (e.g. frame 0, frame 2, frame 3, frame 5 etc). I'll see if I can come up with some code. |
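The "decaying mode" idea could look something like the sketch below: keep a weight per observed stride (the difference between consecutive requested frame numbers), fade all weights on every request, and predict with the heaviest stride. This is only one guess at what was meant; `StridePredictor` and the decay factor of 0.9 are assumptions, not anything taken from the plugin.

```cpp
#include <cassert>
#include <map>

// Predicts the next requested frame from the most common recent stride.
class StridePredictor {
public:
    explicit StridePredictor(double decay = 0.9) : decay_(decay) {
        assert(decay > 0.0 && decay < 1.0);
    }

    // Record that frame n was requested.
    void Observe(int n) {
        assert(n >= 0);
        if (last_ >= 0) {
            for (auto& entry : weight_) entry.second *= decay_;  // age old evidence
            weight_[n - last_] += 1.0;                           // reward this stride
        }
        last_ = n;
    }

    // Best guess for the next request, or -1 before any stride has been seen.
    int PredictNext() const {
        if (weight_.empty()) return -1;
        auto best = weight_.begin();
        for (auto it = weight_.begin(); it != weight_.end(); ++it) {
            if (it->second > best->second) best = it;
        }
        return last_ + best->first;
    }

private:
    double decay_;
    int last_ = -1;
    std::map<int, double> weight_;  // stride -> decayed count
};
```

After a SelectEven-style run of requests (0, 2, 4, 6) this predicts 8, and it drifts over to a stride of 1 within a few requests once access becomes linear. A plain mode like this still mispredicts part of the time on alternating patterns such as 0, 2, 3, 5, since it always bets on the single most frequent stride.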
21st July 2008, 05:16 | #17 | Link | |
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,431
|
Quote:
Code:
src = AviSource(...).AnyFilter().Serialize()
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)
Or are you suggesting that the two pipelines would co-operate to enforce serial access at a global level? Hmm, perhaps you're right, I'm not sure now. |
|
21st July 2008, 07:35 | #18 | Link |
Registered User
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
|
I meant just do this:
Code:
src = AviSource(...).AnyFilter().PipeLine()
a = src.SelectEven.BilinearResize (640, 480).PipeLine()
b = src.SelectOdd.BilinearResize (640, 480).PipeLine()
Interleave (a, b)
I replaced my avisynth.dll with the one from tsp's last MT AviSynth build (since 2.6 isn't quite ready), and it's not crashing anymore. Don't use any SetMTMode statements; I think they'll just slow things down or, worse, cause a deadlock. Simple mpeg2source().resize() scripts perform at double their previous speed. |
21st July 2008, 09:12 | #19 | Link |
Retired AviSynth Dev ;)
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
|
Well, in the practical world there are some issues, that make this more complex than it seems. I've written a prefetch addon to MT, which I have used for some of my own tests. It works almost completely like yours, but problems arise from complex scripts.
The example above works nicely in the example, but what if you want to use a temporal filter? If you use spatial division, as Mt("") does, you must compute an overlap, plus you get penalized for thread synchronization, since you cannot return a frame before both threads have completed, and they are very seldom finished in the same amount of time. Furthermore, most sources heavily prefer linear access, which means you must still access them in order, to avoid a huge seek penalty. You cannot run the script above without synchronization, since you have a 50% chance of 'b' requesting a frame before 'a'. It gets further complicated if we add a "MergeChroma(src)" as the last line: which will request first, if 'a' and 'b' run async and each is a frame ahead of yours?
I'm still experimenting with the pre-fetcher. It works OK, but I'm still not happy enough with it to release it. In the example above, it should be able to replace PipeLine(). I need to get a sort of dynamic cache working before it is usable. BTW, I can only get it to be stable on 2.6.
__________________
Regards, sh0dan // VoxPod Last edited by sh0dan; 21st July 2008 at 09:20. |
22nd July 2008, 02:15 | #20 | Link |
Avisynth Developer
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
|
Okay this is all pretty close to an idea I am gestating for properly doing multithreading in avisynth 2.6.
The thought currently goes like this :-
-- Client X->cache2->filter->cache1->source
X requests cache2 requests filter requests cache1 requests source for frame 0.
source returns frame 0, cache1 enqueues for frame 1 and returns frame 0. (okay I better trim this notation)
worker1 starts prefetch frame 1 from source.
cache2 enqueues for frame 1 and returns frame 0.
worker2 starts prefetch frame 1 from filter, blocks in cache1.
worker1 completes frame 1. No enqueuing!
worker2 unblocks, cache1 enqueues for frame 2 and returns frame 1.
worker3 starts prefetch frame 2 from source.
X requests frame 1, blocks in cache2.
worker2 completes frame 1. No enqueuing!
X unblocks, cache2 enqueues for frame 2 and returns frame 1.
worker3 completes frame 2. No enqueuing!
worker1 starts prefetch frame 2 from filter, cache1 enqueues for frame 3 and returns frame 2.
worker2 starts prefetch frame 3 from source.
worker2 completes frame 3. No enqueuing!
worker1 completes frame 2. No enqueuing!
Stall! cache2 has frame 2 ready, cache1 has frame 3 ready, all CPU cores available to X
X requests frame 2, cache2 enqueues for frame 3 and returns frame 2.
worker3 ...
At the stall there is little point in AviSynth proceeding: the goal of instantly providing X with the frame it is requesting is being met. If X had a bursty request pattern, say requesting 3 frames in quick succession followed by a long pause to process the 3, then simply adding sh0dan's prefetch filter would accommodate that. Of course, for bonus points the cache could measure the inter-frame request times and prefetch more aggressively based on minimum, maximum and latency times.
Threads that block on a cache lock get recycled in a fibre-like manner in an attempt to keep the defined number of workers concurrently active, hence the "Build a thread interlock infrastructure".
Most of the code to do this simply involves repackaging TSP's current code and inverting the logic so the distributor is in every cache instead of once only at the very top of the graph.
All the confusing SetMTMode calls should no longer be necessary with this model. Legacy filters will be assumed thread-unsafe; wrapper functionality could promote a legacy filter's thread safety. Of course, building infrastructures and adding access to them in the API means filter authors can enqueue fragments of their internal processing along with the prefetch logic in a compatible way. A work in progress! |
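The core move in the trace (a cache returns frame n and at the same moment enqueues frame n + 1) can be illustrated in a few lines. The sketch below is far simpler than the design described: it holds a single speculative frame, uses `std::async` instead of a recycled worker pool, models frames as `int`s, and expects one caller at a time. `PrefetchCache`, `child` and `hits` are hypothetical names, not AviSynth APIs.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <utility>

// Hypothetical cache node: returns frame n and enqueues frame n + 1.
class PrefetchCache {
public:
    PrefetchCache(std::function<int(int)> child, int num_frames)
        : child_(std::move(child)), num_frames_(num_frames) {
        assert(num_frames > 0);
    }

    int GetFrame(int n) {
        assert(n >= 0 && n < num_frames_);
        int frame;
        if (pending_.valid() && pending_n_ == n) {
            frame = pending_.get();  // blocks only if the prefetch is still running
            ++hits_;
        } else {
            // Let a stale prefetch finish first so the child is never re-entered.
            if (pending_.valid()) pending_.wait();
            frame = child_(n);
        }
        if (n + 1 < num_frames_) {
            const int next = n + 1;
            pending_n_ = next;
            pending_ = std::async(std::launch::async,
                                  [this, next] { return child_(next); });
        } else {
            pending_n_ = -1;
            pending_ = std::future<int>();
        }
        return frame;
    }

    int hits() const { return hits_; }

private:
    std::function<int(int)> child_;
    int num_frames_;
    int pending_n_ = -1;
    int hits_ = 0;
    std::future<int> pending_;  // declared last: destroyed first, so it waits for its task
};
```

Stacking two such nodes around a filter gives the cache2 -> filter -> cache1 -> source shape from the trace, with each cache speculating independently. The worker recycling and the request-timing heuristics mentioned in the post are deliberately left out.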