Doom9's Forum > Capturing and Editing Video > Avisynth Development
19th July 2008, 00:09   #1
QuaddiMM
Registered User
Join Date: May 2008
Posts: 10
Multicore optimization idea: running consecutive filters in different threads

From what I know, the MT plugin either splits a frame into parts and processes each part in a different thread ("MT"), or uses alternating threads to get consecutive frames ("SetMTMode").
What I'm missing is a way to run consecutive filters on different threads, like a pipeline (as modern CPUs use for instruction processing). This would eliminate the overlap problem of 'MT' and the issues with filters that aren't reentrant or don't work with 'SetMTMode' for other reasons.
Of course, such a pipeline might introduce other incompatibilities, but I'd like to give it a try.

A filter implementing the idea would basically do the following:

- The filter runs a worker thread that fetches frames from the previous filters in the chain.
- Every time 'GetFrame' is called, a request for that frame is sent to the worker thread, and the caller waits for it to become available (ideally, the frame is already available).

Now, for it to work as a real pipeline, frames have to be generated before they are requested, so that a request can be answered immediately without processing delay. The simplest way is to assume that frames are requested in linear order (which is actually the case when encoding).
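The request/prefetch scheme described above might be modelled in standard C++ roughly like this. It is a hypothetical sketch, not the plugin's source and not the Avisynth API: a "frame" is just an int, the upstream chain is any callable, and `PipelineStage` is an illustrative name. Only the worker thread ever calls upstream, and it stays one frame ahead on the assumption of linear access.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical model: a "frame" is an int and the upstream chain is any
// callable int(int). This is not the Avisynth API.
using Frame = int;
using Upstream = std::function<Frame(int)>;

class PipelineStage {
public:
    explicit PipelineStage(Upstream upstream)
        : upstream_(std::move(upstream)), worker_([this] { run(); }) {}

    ~PipelineStage() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    // Called by the downstream filter: ask the worker for frame n and wait.
    // With linear access the frame has usually been prefetched already.
    Frame GetFrame(int n) {
        std::unique_lock<std::mutex> lock(m_);
        if (ready_.count(n) == 0) {
            wanted_ = n;              // redirect the worker to the frame we need
            cv_.notify_all();
        }
        cv_.wait(lock, [&] { return ready_.count(n) != 0; });
        Frame f = ready_[n];
        ready_.erase(ready_.begin(), ready_.upper_bound(n));  // drop frames <= n
        wanted_ = n + 1;              // assume linear access: prefetch the next one
        cv_.notify_all();
        return f;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            // Sleep until stopped or until the wanted frame is not ready yet.
            cv_.wait(lock, [&] { return stop_ || ready_.count(wanted_) == 0; });
            if (stop_) return;
            int n = wanted_;
            lock.unlock();
            Frame f = upstream_(n);   // only this thread ever calls upstream
            lock.lock();
            ready_[n] = f;
            cv_.notify_all();
        }
    }

    Upstream upstream_;
    std::mutex m_;
    std::condition_variable cv_;
    std::map<int, Frame> ready_;      // prefetched frames not yet handed out
    int wanted_ = 0;                  // frame the worker should produce next
    bool stop_ = false;
    std::thread worker_;              // declared last: starts after the rest is built
};
```

Because only the worker calls upstream, GetFrame calls into the filters above the stage are serialized. The sketch assumes a single downstream caller; a non-linear request still works, it just costs one full upstream delay.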

I wrote a small plugin to test the idea, and it works quite well:

Put "PipeLine" in the script at a point that balances the load between the filters above and below it.

In a quick test I got a speed improvement of ~53%. With better balancing, improvements of up to 100% should be possible (or up to 300% on quad-cores), depending of course on the number of 'PipeLine's used and on the load balancing.

Can you please comment on the idea? Maybe it's already done in MT and I didn't notice? Is it generally a good idea and should it be compatible with avisynth?

While testing I got some random crashes. Also, the race problem reappeared (no idea why). So I wonder if there is a better way to do it.

Attached is a binary of the filter, along with its source code.
Note when compiling: because 'CreateThread' is used, it must be linked with the multithreaded DLL runtime (not the static library). I'm fine with that, as I prefer the DLL runtime anyway.
Attached file: avs_pipeline_080719.zip (25.0 KB, 622 views)
19th July 2008, 11:24   #2
Gavino
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,385
In principle, this seems like a good idea to me.

But I think for it to work, all filters used need to be thread-safe and I don't know if that is widely true. I can see possible problems if you have a non-thread-safe filter repeated in different parts of the filter chain. For example:
Code:
x = UnsafeFilter()
y=PipeLine(x).SomeOtherFilter()
x+y
19th July 2008, 15:35   #3
squid_80
Registered User
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
I had a similar idea and ran into the same problems. If it was added to the core, I think the ideal place would be in the cache filter since it normally gets added internally after every filter by default.
I saw a similar massive speed improvement, but it was unusable due to random crashes. Possibly the vfb (video frame buffer) management isn't/wasn't thread-safe; I think it's better now than when I tried, but IanB would probably know best.
19th July 2008, 19:47   #4
MfA
Registered User
Join Date: Mar 2002
Posts: 1,075
Isn't prefetching for parallelization going to be in Avisynth 2.6? (Of course, that seems to be on the same release schedule as 3.0.)

PS. There's an easy solution to thread safety ... don't use threads. What are we talking about here? A couple of hundred KB per instance of avisynth and the filter, and filter completion times orders of magnitude larger than context-switch times ... peanuts. Threading offers no real benefits over multiple processes with shared memory for this particular application.
20th July 2008, 17:31   #5
Mug Funky
interlace this!
Join Date: Jun 2003
Location: i'm in ur transfers, addin noise
Posts: 4,555
if this could be made to work with mvtools, it would be the rocking-est rock that ever rocked.

i have an under-utilised 8-core machine, and HD/2k footage itching to be NR'd.
__________________
sucking the life out of your videos since 2004
20th July 2008, 21:28   #6
martino
masktools2 (ab)user
Join Date: Oct 2006
Location: PAL-I :(
Posts: 235
So with this, could you take a filter that only works on the current frame (a "1D" filter, if that's the right term), split the clip in two with SelectEven/SelectOdd, and put each instance of the filter under PipeLine, so that both cores are used for what you'd originally do with a single call of the function on the whole clip?
20th July 2008, 21:50   #7
Gavino
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,385
Quote:
Originally Posted by martino
So with this, could you possibly run a filter, that only works on the current frame (1D filter, if that is a good term to use), split the task into two with SelectEven/Odd and put each instance of the filter under PipeLine and thus let both cores be used on what you'd originally achieve with just one call of the function, on the whole clip?
In principle, yes. You'd need Interleave as well of course, to put the result back together again.

In practice, you might run into trouble if any of the upstream filters is not thread-safe (see my post #2). All it would take to screw up is modifying an instance variable in the filter's GetFrame call.

(BTW I'd call it a spatial filter, or perhaps 2D)
20th July 2008, 22:09   #8
QuaddiMM
Registered User
Join Date: May 2008
Posts: 10
Thanks for the replies.

@Gavino:
You're right. There are ways to break it with non-threadsafe filters.
Even then it might be useful for simple linear filter chains.

@MfA:
It would be great if something similar were in AviSynth 2.6. I'm looking forward to it.

About using processes: processes can run concurrently too, so the thread-safety problems remain. Actually, on Windows a process is a mere container for one or more threads.

@martino:
You mean something like
Code:
src = last
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)
?
That would be possible, if it actually worked (and didn't crash).
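For reference, the frame routing in that script can be written out as index arithmetic. This is a minimal sketch using the standard semantics of SelectEven (frame k is source frame 2k), SelectOdd (frame k is source frame 2k+1) and Interleave with two clips (output frame n is frame n/2 of clip n%2); `route` is a hypothetical helper name. It shows that the original order is restored, that each PipeLine branch handles every other frame, and that both branches read from the same `src`.

```cpp
#include <cassert>
#include <utility>

// For output frame n of Interleave(a, b), with a = src.SelectEven and
// b = src.SelectOdd, return {branch, source frame}: branch 0 = 'a', 1 = 'b'.
std::pair<int, int> route(int n) {
    assert(n >= 0);
    int branch = n % 2;       // Interleave alternates between its two clips
    int childFrame = n / 2;   // ...asking the chosen clip for frame n/2
    int srcFrame = (branch == 0) ? 2 * childFrame        // SelectEven: k -> 2k
                                 : 2 * childFrame + 1;   // SelectOdd:  k -> 2k+1
    return {branch, srcFrame};
}
```

Every source frame is requested by exactly one branch, but both branches call GetFrame on the shared `src` from their own threads.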

If it's not too much to ask, could someone please review my code? If there are errors, I'll try to fix them. If it's a problem with avisynth... well... wait for 2.6.
20th July 2008, 22:37   #9
squid_80
Registered User
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
I had a quick look at the code. The only thing I'd comment on is that I don't think it's worth implementing a cache, since avisynth will already do that for you. It crashes the same way mine does with the latest 2.5 beta; I haven't tried 2.6.

I don't see how non-threadsafe filters are a problem as long as getframe calls for individual filters are serialized, which is the point here - a filter doesn't process multiple frames in parallel, instead multiple filters run in parallel processing different frames.

What would be cool is to have code that analyzes which frames are being requested and adapts, rather than assuming linear access.
20th July 2008, 22:38   #10
Gavino
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,385
Quote:
Originally Posted by QuaddiMM
@Gavino:
You're right. There are ways to break it with non-threadsafe filters.
Even then it might be useful for simple linear filter chains.
The problem as I see it is that, since filters are not obliged to be thread-safe, you cannot trust any filter unless it has been shown to be safe (preferably by inspection of its source code). Even if it works on a particular day, you might just have got lucky - Murphy's Law will inevitably strike sooner or later.

Testing can only show the presence of bugs, not their absence.
And that's even more true where multi-threading is concerned.

Last edited by Gavino; 20th July 2008 at 22:42.
20th July 2008, 22:49   #11
MfA
Registered User
Join Date: Mar 2002
Posts: 1,075
Quote:
Originally Posted by QuaddiMM
About using processes: Processes (can) run concurrently, therefore problems with thread-safety. Actually, on Windows a process is a mere container for one or more threads.
With processes, a separate instance of the filter will always have its own set of variables; re-entrance and thread safety are only an issue with shared resources and multithreading. Things can still go wrong in other ways with concurrent programs, but that is neither here nor there.

With multiprocessing, every filter can run in parallel with other instances of itself, unless the developer really tried very hard to break things (for instance by using named Win32 objects as a side channel for passing data between filters ... not that many filters use side channels, though; maybe mvtools?).

Last edited by MfA; 20th July 2008 at 22:56.
20th July 2008, 23:19   #12
Gavino
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,385
Quote:
Originally Posted by squid_80
I don't see how non-threadsafe filters are a problem as long as getframe calls for individual filters are serialized, which is the point here - a filter doesn't process multiple frames in parallel, instead multiple filters run in parallel processing different frames.
Yes, but in a setup like
Code:
src = AviSource(...).AnyFilter()
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)
then the single instance of AnyFilter (and that of AviSource) represented by src has its GetFrame called by two different threads, so not serialized.

Ah, :lightbulb: - how about if Pipeline had a companion called Serialize designed to fix cases like this. You would then write
Code:
src = AviSource(...).AnyFilter().Serialize()
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)
It's a crazy idea, Jim, but it might just work...

Last edited by Gavino; 20th July 2008 at 23:27. Reason: Idea for Serialize filter
20th July 2008, 23:29   #13
QuaddiMM
Registered User
Join Date: May 2008
Posts: 10
Quote:
Originally Posted by squid_80
I had a quick look at the code, the only thing I'd comment on is I don't think it's worth implementing a cache since avisynth will already do that for you. It does crash the same as mine with the latest 2.5 beta, I haven't tried 2.6.
Yeah - but the avisynth-cache only works for frames that were already returned by GetFrame. This doesn't cover 'prefetched' frames, so there is a cache for them.
EDIT: That's actually wrong. You're right about the cache. I could omit it.

But currently all frames are put in the cache, even those explicitly requested. I did that because it was the easiest way. I'm new to threading and it confuses me, so I kept it simple.

Quote:
I don't see how non-threadsafe filters are a problem as long as getframe calls for individual filters are serialized, which is the point here - a filter doesn't process multiple frames in parallel, instead multiple filters run in parallel processing different frames.
That's the point. I thought I avoided thread-safety issues by requesting only one frame at a time. Either I was wrong there, or there's a bug in the code.

Quote:
What would be cool is to have code that analyzes which frames are being requested and adapts, rather than assuming linear access.
That would be very hard to do properly. For encoding, linear prefetching should be enough.

Quote:
Originally Posted by MfA
With processes a separate instance of the filter will always have it's own set of variables, re-entrance and thread-safety are only an issue with shared resources and multithreading. Things can still go wrong in other ways with concurrent programs, but that is neither here nor there.

With multiprocessing every filter can be run in parallel with other instances of itself, unless the developer really tried very hard to break things (for instance by using named win32 objects as a sidechannel for passing data between filters ... not that many filters using sidechannels though, maybe mvtools?).
Sorry, I got you wrong there. So you mean not just using different processes, but also running different instances of the filters. That may solve some problems. But it's more interesting for MT-plugin as it really runs the same filter twice at the same time.

Last edited by QuaddiMM; 20th July 2008 at 23:34.
20th July 2008, 23:57   #14
QuaddiMM
Registered User
Join Date: May 2008
Posts: 10
Quote:
Originally Posted by Gavino
Ah, :lightbulb: - how about if Pipeline had a companion called Serialize designed to fix cases like this. You would then write
Code:
src = AviSource(...).AnyFilter().Serialize()
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)
It's a crazy idea, Jim, but it might just work...
Nice idea. It's quite simple to implement.
Something like:
Code:
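PVideoFrame __stdcall Serialize::GetFrame (int n, IScriptEnvironment* env)
{
	// cs: a CRITICAL_SECTION member, set up with InitializeCriticalSection in the
	// constructor and released with DeleteCriticalSection in the destructor.
	EnterCriticalSection (&cs);
	PVideoFrame frame;
	try {
		frame = child->GetFrame (n, env);	// only one thread at a time gets here
	} catch (...) {
		LeaveCriticalSection (&cs);	// env->ThrowError throws: don't keep the lock
		throw;
	}
	LeaveCriticalSection (&cs);

	return frame;
}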
That runs fine on its own, but it doesn't solve the issues with PipeLine (I tested it).
I suspect (without actually having looked, so I might be wrong here) that the cause of the problems is the avisynth cache, which may not be thread-safe. If that's the case, there's no way around fixing it.
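The same Serialize idea can be modelled outside Avisynth with `std::mutex` and `std::lock_guard`, which also releases the lock if the child throws. This is a hypothetical sketch: `UnsafeFilter` stands in for a filter that modifies an instance variable in GetFrame, and the two threads play the roles of the 'a' and 'b' PipeLine branches sharing `src`. It only demonstrates the locking pattern; per the post above, serializing the source alone did not cure the PipeLine crashes.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a non-thread-safe filter: GetFrame writes
// instance variables without any locking.
struct UnsafeFilter {
    int lastFrame = -1;
    long calls = 0;
    int GetFrame(int n) { lastFrame = n; ++calls; return n; }
};

// Model of the proposed Serialize filter: one mutex per wrapped instance,
// so at most one thread is inside child.GetFrame at any time.
class Serialize {
public:
    explicit Serialize(UnsafeFilter& child) : child_(child) {}
    int GetFrame(int n) {
        std::lock_guard<std::mutex> lock(m_);  // unlocked on return or throw
        return child_.GetFrame(n);
    }
private:
    UnsafeFilter& child_;
    std::mutex m_;
};

// Two threads, like a = src.SelectEven... and b = src.SelectOdd..., both
// pulling from the shared src through Serialize. Returns the call count.
long runTwoBranches(int framesPerBranch) {
    assert(framesPerBranch >= 0);
    UnsafeFilter src;
    Serialize safe(src);
    std::thread a([&] { for (int i = 0; i < framesPerBranch; ++i) safe.GetFrame(2 * i); });
    std::thread b([&] { for (int i = 0; i < framesPerBranch; ++i) safe.GetFrame(2 * i + 1); });
    a.join();
    b.join();
    return src.calls;  // no lost updates, because every call was serialized
}
```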
21st July 2008, 01:49   #15
martino
masktools2 (ab)user
Join Date: Oct 2006
Location: PAL-I :(
Posts: 235
Quote:
Originally Posted by Gavino
In principle, yes. You'd need Interleave as well of course, to put the result back together again.
Oh, that's right. I forgot about that.

Quote:
Originally Posted by QuaddiMM
Thanks for the replies.
@martino:
You mean something like
Code:
src = last
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)
?
That 'would' be possible, if it would actually work (and not crash).
Yup. That was exactly what I was thinking.

*downloads filter and will try some other time
21st July 2008, 05:12   #16
squid_80
Registered User
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
Instead of implementing serialization in another filter, why not just put it in pipeline and use it on every filter in the script? It's cheap to implement with a critical section like the code posted (critical sections are a lot faster than the other synchronization primitives).
I'd have to check your code again but I think it's already more or less serialized anyway, since getframe blocks for the "fetch" thread to return if it's already running. So if pipeline were in place immediately after a non-threadsafe filter, wouldn't it ensure only one getframe call went through at a time?

Re: my idea of analyzing the frame order, I think scripts where this would be most useful are going to be the ones doing some sort of framerate adjustment via mvtools and whatever. Typically not linear access. Even adding a selecteven or selectodd would bork that. I never really got into statistics but something like a decaying mode would probably work, even for odd patterns (e.g. frame 0, frame 2, frame 3, frame 5 etc). I'll see if I can come up with some code.
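A "decaying mode" along those lines might look like the sketch below. It is one possible reading, not squid_80's code: `NextFramePredictor` and the decay factor 0.9 are hypothetical choices. Each request scores the step from the previous request, all older scores decay, and the prediction is the last frame plus the highest-scoring step, falling back to linear access when there is no history.

```cpp
#include <cassert>
#include <map>

// Hypothetical decaying-mode predictor of the next requested frame.
class NextFramePredictor {
public:
    explicit NextFramePredictor(double decay = 0.9) : decay_(decay) {
        assert(decay > 0.0 && decay < 1.0);
    }

    // Record a GetFrame request for frame n.
    void observe(int n) {
        if (havePrev_) {
            for (auto& kv : score_) kv.second *= decay_;  // older steps count less
            score_[n - prev_] += 1.0;                     // credit the latest step
        }
        prev_ = n;
        havePrev_ = true;
    }

    // Frame worth prefetching next; defaults to linear access (step 1).
    int predict() const {
        int best = 1;
        double bestScore = 0.0;
        for (const auto& kv : score_)
            if (kv.second > bestScore) { best = kv.first; bestScore = kv.second; }
        return prev_ + best;
    }

private:
    double decay_;
    std::map<int, double> score_;  // step size -> decayed count
    int prev_ = 0;
    bool havePrev_ = false;
};
```

The decay lets it follow a change such as a SelectEven being added (step 1 becoming step 2) within a few requests. This minimal version only scores single steps, so an alternating pattern like 0, 2, 3, 5 would settle on one of the two steps; covering that would need something more, e.g. looking at the last few steps together.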
21st July 2008, 06:16   #17
Gavino
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,385
Quote:
Originally Posted by squid_80
Instead of implementing serialization in another filter, why not just put it in pipeline and use it on every filter in the script? ...
I think it's already more or less serialized anyway, since getframe blocks for the "fetch" thread to return if it's already running. So if pipeline were in place immediately after a non-threadsafe filter, wouldn't it ensure only one getframe call went through at a time?
The problem is when you have more than just a linear chain. In my earlier example
Code:
src = AviSource(...).AnyFilter().Serialize()
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)
it seems to me the serialisation has to be done on the common part of the graph, hence immediately after AnyFilter as I described.

Or are you suggesting that the two pipelines would co-operate to enforce serial access at a global level? Hmm, perhaps you're right, I'm not sure now.
21st July 2008, 08:35   #18
squid_80
Registered User
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
I meant just do this:
Code:
src = AviSource(...).AnyFilter().PipeLine()
a = src.SelectEven.BilinearResize (640, 480).PipeLine()
b = src.SelectOdd.BilinearResize (640, 480).PipeLine()
Interleave (a, b)
since pipeline will already take care of the serialization (I think).

I replaced my avisynth.dll with the one from tsp's last MT avisynth build (since 2.6 isn't quite ready), and it's not crashing anymore. Don't use any SetMTMode statements; I think they'll just slow things down or, worse, cause a deadlock. Simple mpeg2source().resize() scripts run at twice their previous speed.
21st July 2008, 10:12   #19
sh0dan
Retired AviSynth Dev ;)
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
Well, in the practical world there are some issues that make this more complex than it seems. I've written a prefetch add-on to MT, which I have used for some of my own tests. It works almost exactly like yours, but problems arise with complex scripts.

The script above works nicely as an example, but what if you want to use a temporal filter? If you use spatial division, as MT("") does, you must compute an overlap, plus you pay a thread-synchronization penalty, since you cannot return a frame before both threads have completed, and they very seldom finish at the same time.

Furthermore, most sources heavily prefer linear access, which means you must still access them in order to avoid a huge seek penalty. You cannot run the script above without synchronization, since there is a 50% chance of 'b' requesting a frame before 'a'. It gets even more complicated if we add "MergeChroma(src)" as the last line: which request goes first if 'a' and 'b' run asynchronously and each is a frame ahead of yours?

I'm still experimenting with the pre-fetcher. It works ok, but I'm still not happy enough with it to release it. In the example above, it should be able to replace PipeLine(). I need to get a sort of dynamic cache working before it is usable.

btw, I can only get it to be stable on 2.6.
__________________
Regards, sh0dan // VoxPod

Last edited by sh0dan; 21st July 2008 at 10:20.
22nd July 2008, 03:15   #20
IanB
Avisynth Developer
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,173
Okay, this is all pretty close to an idea I am gestating for properly doing multithreading in avisynth 2.6.

The thought currently goes like this :-
  • Goal: to instantly provide the client with the frame it is requesting.
  • Build a queueable GetFrame request infrastructure.
  • Build a worker thread infrastructure to service the queue.
  • Build a thread interlock infrastructure.
  • Use fibre-like concepts to manage and control the number of active workers.
  • On exiting each Cache::GetFrame, queue a request for the "next" frame.
  • Use a history tracking algorithm to predict "next".
  • Completing a queued request does not queue the next frame in the current cache instance; child cache instances do enqueue.
  • Prioritise queued requests by cache graph depth.
  • All caches are interlocked to protect non-thread-safe filters.
  • Caches are given knowledge of their child filter's "identity", so they can apply a single lock across all instances of a filter.
  • New filters can declare their thread safety through an enhanced cache hints interface, i.e. Unsafe, Instance only safe, ..., Fully safe!
  • New filters can declare their processing cost, i.e. Zero, Bitblt-like, Light, Medium, Heavy. (Zero-cost filters do not get queued requests.)
  • New filters can declare access-order restrictions, i.e. strictly linear, linear preferred over step N, random, ...
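The thread-safety and cost hints in the list above could be modelled roughly as follows. Every name here is hypothetical and nothing corresponds to an existing Avisynth interface. "Instance only safe" is read as "different instances may run in parallel, but a single instance may not", hence one lock per instance, while Unsafe filters share one lock per filter identity. Legacy filters that declare nothing default to Unsafe.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical hint values; not an existing Avisynth API.
enum class ThreadSafety { Unsafe, InstanceOnlySafe, FullySafe };
enum class Cost { Zero, Bitblt, Light, Medium, Heavy };

// Zero-cost filters do not get queued (prefetch) requests.
bool shouldEnqueuePrefetch(Cost c) { return c != Cost::Zero; }

// Legacy filters that declare nothing are assumed thread-unsafe.
ThreadSafety effectiveSafety(const ThreadSafety* declared) {
    return declared ? *declared : ThreadSafety::Unsafe;
}

// Which lock a cache should hold around its child's GetFrame.
class LockRegistry {
public:
    std::mutex* lockFor(ThreadSafety s, const std::string& filterIdentity,
                        const void* instance) {
        std::lock_guard<std::mutex> guard(m_);
        switch (s) {
        case ThreadSafety::Unsafe:            // shared by all instances of the filter
            return slot(byIdentity_[filterIdentity]);
        case ThreadSafety::InstanceOnlySafe:  // one lock per instance
            assert(instance != nullptr);
            return slot(byInstance_[instance]);
        case ThreadSafety::FullySafe:         // no locking needed
            return nullptr;
        }
        return nullptr;
    }

private:
    static std::mutex* slot(std::unique_ptr<std::mutex>& p) {
        if (!p) p = std::make_unique<std::mutex>();
        return p.get();
    }
    std::mutex m_;  // protects the two maps
    std::map<std::string, std::unique_ptr<std::mutex>> byIdentity_;
    std::map<const void*, std::unique_ptr<std::mutex>> byInstance_;
};
```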
So for a simple graph with 1 filter and 1 source the request pattern is like this :-

-- Client X->cache2->filter->cache1->source

X requests cache2 requests filter requests cache1 requests source for frame 0.
source returns frame 0, cache1 enqueues for frame 1 and returns frame 0.
(okay I better trim this notation)
worker1 starts prefetch frame 1 from source.
cache2 enqueues for frame 1 and returns frame 0.
worker2 starts prefetch frame 1 from filter, blocks in cache1.
worker1 completes frame 1. No enqueuing!
worker2 unblocks, cache1 enqueues for frame 2 and returns frame 1.
worker3 starts prefetch frame 2 from source.
X requests frame 1, blocks in cache2.
worker2 completes frame 1. No enqueuing!
X unblocks, cache2 enqueues for frame 2 and returns frame 1.
worker3 completes frame 2. No enqueuing!
worker1 starts prefetch frame 2 from filter, cache1 enqueues for frame 3 and returns frame 2.
worker2 starts prefetch frame 3 from source.
worker2 completes frame 3. No enqueuing!
worker1 completes frame 2. No enqueuing!
Stall! cache2 has frame 2 ready, cache 1 has frame 3 ready, all CPU cores available to X
X requests frame 2, cache2 enqueues for frame 3 and returns frame 2.
worker3 ...

At the stall there is little point in Avisynth proceeding: the goal of instantly providing X with the frame it requests is being met.

If X had a bursty request pattern, say requesting 3 frames in quick succession followed by a long pause to process them, simply adding sh0dan's prefetch filter would accommodate it. Of course, for bonus points the cache could measure the inter-frame request times and prefetch frames more aggressively based on the minimum, maximum and latency times.
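One way the "measure the inter-frame request times" idea might be turned into a prefetch depth is sketched below. The formula is an assumption, not something given in the post: keep enough frames in flight to cover the filter's per-frame latency at the fastest observed request rate, i.e. ceil(latency / minimum gap), capped at a maximum. `PrefetchDepth` is a hypothetical name, and the maximum gap is not used in this minimal version.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical heuristic: prefetch depth from observed request gaps.
class PrefetchDepth {
public:
    explicit PrefetchDepth(int maxDepth) : maxDepth_(maxDepth) { assert(maxDepth >= 1); }

    // Record the time (in microseconds) of a client GetFrame request.
    void onRequest(std::int64_t timeUs) {
        if (lastUs_ >= 0) minGapUs_ = std::min(minGapUs_, timeUs - lastUs_);
        lastUs_ = timeUs;
    }

    // Frames to keep in flight so a filter taking latencyUs per frame keeps up
    // with the fastest observed request rate.
    int depth(std::int64_t latencyUs) const {
        if (minGapUs_ == kNoGap) return 1;     // no history yet: one frame ahead
        if (minGapUs_ <= 0) return maxDepth_;  // simultaneous requests
        std::int64_t d = (latencyUs + minGapUs_ - 1) / minGapUs_;  // ceil(latency / gap)
        d = std::min<std::int64_t>(d, maxDepth_);
        return static_cast<int>(std::max<std::int64_t>(d, 1));
    }

private:
    static constexpr std::int64_t kNoGap = std::numeric_limits<std::int64_t>::max();
    int maxDepth_;
    std::int64_t lastUs_ = -1;
    std::int64_t minGapUs_ = kNoGap;
};
```

With the bursty pattern described above (three requests 1 ms apart, then a long pause), the minimum gap is 1 ms, so a filter needing 3.5 ms per frame would get a depth of 4.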

Threads that block on a cache lock get recycled in a fibre-like manner, in an attempt to keep the defined number of workers concurrently active; hence the "Build a thread interlock infrastructure".

Most of the code to do this simply involves repackaging TSP's current code and inverting the logic, so that the distributor is in every cache instead of once only at the very top of the graph. None of the confusing SetMTMode calls should be necessary with this model. Legacy filters will be assumed thread-unsafe; wrapper functionality could promote a legacy filter's thread safety.

Of course, building these infrastructures and exposing them in the API means filter authors can enqueue fragments of their internal processing along with the prefetch logic in a compatible way.

A work in progress!
All times are GMT +1.