Multicore optimization idea: running consecutive filters in different threads

QuaddiMM · 18th July 2008, 23:09

From what I know, MT-plugin splits a frame into parts and processes each part in a different thread ("MT"), or uses alternating threads to get consecutive frames ("SetMTMode").
What I'm missing is a way to run consecutive filters on different threads, like a pipeline (as seen in modern CPUs for instruction processing). This would eliminate the overlap problem of 'MT' and the issues with filters that aren't reentrant or don't work with 'SetMTMode' for other reasons.
Of course, such a pipeline might introduce other incompatibilities, but I'd like to give it a try.

A filter implementing the idea would basically do the following:

- The filter runs a worker-thread that does the frame-getting from previous filters in the chain.
- Every time 'GetFrame' is called, a request is sent to the worker-thread, requesting the frame and waiting for it to become available (ideally, the frame is already available).

Now, for it to work as a real pipeline, it is necessary to generate frames before they are requested (in order to be able to answer the request immediately, without processing delay). The simplest way is to assume that frames are requested in linear order (which is actually the case when encoding).

I wrote a small plugin to test the idea, and it works quite well:

Put "PipeLine" in the script, balancing load between filters below and above the pipeline.

In a quick test I got a speed improvement of ~53%. With better balancing, improvements of up to 100% should be possible (or up to 300% on quad-cores), depending of course on the number of 'PipeLine's used and on the load balancing.
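Those upper bounds follow from a simple throughput model. As a rough sketch, assume the chain is cut into k stages that take t_1 ... t_k per frame, each stage has its own core, and synchronization costs nothing (none of which is measured here). The pipeline then delivers one frame per slowest-stage time:

```latex
S \;=\; \frac{\sum_{i=1}^{k} t_i}{\max_{i} t_i} \;\le\; k
```

Two perfectly balanced stages give S = 2 (+100%) and four give S = 4 (+300%). Under the same idealized model, a gain of ~53% with a single 'PipeLine' (S ≈ 1.53) would correspond to the slower side taking about 1/1.53 ≈ 65% of the total work.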

Can you please comment on the idea? Maybe it's already done in MT and I didn't notice? Is it generally a good idea and should it be compatible with avisynth?

While testing I got some random crashes. Also the race-problem reappeared (no idea why). So I wonder if there is a better way to do it?

In the attachment is a binary of the filter, along with its source code.
Note when compiling: Due to 'CreateThread' being used, it must be linked with the Multithreading-DLL (not the static library). I'm fine with that, as I prefer the DLL-Runtime anyway.

Gavino · 19th July 2008, 10:24

In principle, this seems like a good idea to me.

But I think for it to work, all filters used need to be thread-safe and I don't know if that is widely true. I can see possible problems if you have a non-thread-safe filter repeated in different parts of the filter chain. For example:

Code:

x = UnsafeFilter()
y=PipeLine(x).SomeOtherFilter()
x+y

squid_80 · 19th July 2008, 14:35

I had a similar idea and ran into the same problems. If it was added to the core, I think the ideal place would be in the cache filter since it normally gets added internally after every filter by default.
I saw a similar massive speed improvement but it was unusable due to random crashes. Possibly the vfb management isn't/wasn't thread-safe, I think it's better now than when I tried but IanB would probably know best.

MfA · 19th July 2008, 18:47

Isn't prefetching for parallelization going to be in Avisynth 2.6? (Of course, that seems to be on the same release schedule as 3.0.)

PS. there's an easy solution to thread safety ... don't use threads. What are we talking about here? A couple 100 KB per instance of avisynth and the filter, and filter completion times multiple orders of magnitude larger than context switch times ... peanuts. Threading offers no real benefits over multiple processes with shared memory for this particular application.

Mug Funky · 20th July 2008, 16:31

if this could be made to work with mvtools, it would be the rocking-est rock that ever rocked.

i have an under-utilised 8-core machine, and HD/2k footage itching to be NR'd.

martino · 20th July 2008, 20:28

So with this, could you possibly run a filter, that only works on the current frame (1D filter, if that is a good term to use), split the task into two with SelectEven/Odd and put each instance of the filter under PipeLine and thus let both cores be used on what you'd originally achieve with just one call of the function, on the whole clip?

Gavino · 20th July 2008, 20:50

Quote:

Originally Posted by martino

So with this, could you possibly run a filter, that only works on the current frame (1D filter, if that is a good term to use), split the task into two with SelectEven/Odd and put each instance of the filter under PipeLine and thus let both cores be used on what you'd originally achieve with just one call of the function, on the whole clip?

In principle, yes. You'd need Interleave as well of course, to put the result back together again.

In practice, you might run into trouble if any of the upstream filters is not thread-safe (see my post #2). All it would take to screw up is modifying an instance variable in the filter's GetFrame call.

(BTW I'd call it a spatial filter, or perhaps 2D)

QuaddiMM · 20th July 2008, 21:09

Thanks for the replies.

@Gavino:
You're right. There are ways to break it with non-threadsafe filters.
Even then it might be useful for simple linear filter chains.

@MfA:
It would be great if something similar will be in AviSynth 2.6. I'm looking forward to it.

About using processes: Processes (can) run concurrently, therefore problems with thread-safety. Actually, on Windows a process is a mere container for one or more threads.

@martino:
You mean something like

Code:

src = last
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)

?
That 'would' be possible, if it would actually work (and not crash).

If it's not asking too much, could someone please review my code? If there are errors, I'll try to fix them. If it's a problem with avisynth... well.. wait for 2.6.

squid_80 · 20th July 2008, 21:37

I had a quick look at the code, the only thing I'd comment on is I don't think it's worth implementing a cache since avisynth will already do that for you. It does crash the same as mine with the latest 2.5 beta, I haven't tried 2.6.

I don't see how non-threadsafe filters are a problem as long as getframe calls for individual filters are serialized, which is the point here - a filter doesn't process multiple frames in parallel, instead multiple filters run in parallel processing different frames.

What would be cool is to have code that analyzes which frames are being requested and adapts, rather than assuming linear access.

Gavino · 20th July 2008, 21:38

Quote:

Originally Posted by QuaddiMM

@Gavino:
You're right. There are ways to break it with non-threadsafe filters.
Even then it might be useful for simple linear filter chains.

The problem as I see it is that, since filters are not obliged to be thread-safe, you cannot trust any filter unless it has been shown to be safe (preferably by inspection of its source code). Even if it works on a particular day, you might just have got lucky - Murphy's Law will inevitably strike sooner or later.

Testing can only show the presence of bugs, not their absence.

And that's even more true where multi-threading is concerned.

MfA · 20th July 2008, 21:49

Quote:

Originally Posted by QuaddiMM

About using processes: Processes (can) run concurrently, therefore problems with thread-safety. Actually, on Windows a process is a mere container for one or more threads.

With processes a separate instance of the filter will always have its own set of variables, re-entrance and thread-safety are only an issue with shared resources and multithreading. Things can still go wrong in other ways with concurrent programs, but that is neither here nor there.

With multiprocessing every filter can be run in parallel with other instances of itself, unless the developer really tried very hard to break things (for instance by using named win32 objects as a sidechannel for passing data between filters ... not that many filters using sidechannels though, maybe mvtools?).

Gavino · 20th July 2008, 22:19

Quote:

Originally Posted by squid_80

I don't see how non-threadsafe filters are a problem as long as getframe calls for individual filters are serialized, which is the point here - a filter doesn't process multiple frames in parallel, instead multiple filters run in parallel processing different frames.

Yes, but in a setup like

Code:

src = AviSource(...).AnyFilter()
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)

then the single instance of AnyFilter (and that of AviSource) represented by src has its GetFrame called by two different threads, so not serialized.

Ah, :lightbulb: - how about if Pipeline had a companion called Serialize designed to fix cases like this. You would then write

Code:

src = AviSource(...).AnyFilter().Serialize()
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)

It's a crazy idea, Jim, but it might just work...

QuaddiMM · 20th July 2008, 22:29

Quote:

Originally Posted by squid_80

I had a quick look at the code, the only thing I'd comment on is I don't think it's worth implementing a cache since avisynth will already do that for you. It does crash the same as mine with the latest 2.5 beta, I haven't tried 2.6.

Yeah - but the avisynth-cache only works for frames that were already returned by GetFrame. This doesn't cover 'prefetched' frames, so there is a cache for them.
EDIT: That's actually wrong. You're right about the cache. I could omit it.

But currently all frames are put in the cache, even those explicitly requested. I did that because it was the easiest way to do it. I'm new to the threading stuff and it confuses me, so I kept it simple.

Quote:

I don't see how non-threadsafe filters are a problem as long as getframe calls for individual filters are serialized, which is the point here - a filter doesn't process multiple frames in parallel, instead multiple filters run in parallel processing different frames.

That's the point. I thought I avoided thread-safety issues by only requesting one frame at a time. There I was wrong or it's an error in the code.

Quote:

What would be cool is to have code that analyzes which frames are being requested and adapts, rather than assuming linear access.

That would be very hard to do properly. For encoding, linear prefetching should be enough.

Quote:

Originally Posted by MfA

With processes a separate instance of the filter will always have its own set of variables, re-entrance and thread-safety are only an issue with shared resources and multithreading. Things can still go wrong in other ways with concurrent programs, but that is neither here nor there.

With multiprocessing every filter can be run in parallel with other instances of itself, unless the developer really tried very hard to break things (for instance by using named win32 objects as a sidechannel for passing data between filters ... not that many filters using sidechannels though, maybe mvtools?).

Sorry, I got you wrong there. So you mean not just using different processes, but also running different instances of the filters. That may solve some problems. But it's more interesting for MT-plugin as it really runs the same filter twice at the same time.

QuaddiMM · 20th July 2008, 22:57

Quote:

Originally Posted by Gavino

Ah, :lightbulb: - how about if Pipeline had a companion called Serialize designed to fix cases like this. You would then write

Code:

src = AviSource(...).AnyFilter().Serialize()
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)

It's a crazy idea, Jim, but it might just work...

Nice idea. It's quite simple to implement.
Something like:

Code:

PVideoFrame __stdcall Serialize::GetFrame (int n, IScriptEnvironment* env)
{
	EnterCriticalSection (&cs);
	PVideoFrame frame = child->GetFrame (n, env);
	LeaveCriticalSection (&cs);

	return frame;
}

That runs fine on its own, but doesn't solve the issues with PipeLine (I tested it).
I suspect (without actually looking at it, so I might be wrong here) that the cause of the problems is the avisynth-cache, which may not be thread safe. If that's the case, there's no way around fixing it.

martino · 21st July 2008, 00:49

Quote:

Originally Posted by Gavino

In principle, yes. You'd need Interleave as well of course, to put the result back together again.

Oh, that's right. I forgot about that.

Quote:

Originally Posted by QuaddiMM

Thanks for the replies.
@martino:
You mean something like

Code:

src = last
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)

?
That 'would' be possible, if it would actually work (and not crash).

Yup. That was exactly what I was thinking.

*downloads filter and will try some other time

squid_80 · 21st July 2008, 04:12

Instead of implementing serialization in another filter, why not just put it in pipeline and use it on every filter in the script? It's cheap to implement with a critical section like the code posted (critical sections are a lot faster than the other synchronization primitives).
I'd have to check your code again but I think it's already more or less serialized anyway, since getframe blocks for the "fetch" thread to return if it's already running. So if pipeline were in place immediately after a non-threadsafe filter, wouldn't it ensure only one getframe call went through at a time?

Re: my idea of analyzing the frame order, I think scripts where this would be most useful are going to be the ones doing some sort of framerate adjustment via mvtools and whatever. Typically not linear access. Even adding a selecteven or selectodd would bork that. I never really got into statistics but something like a decaying mode would probably work, even for odd patterns (e.g. frame 0, frame 2, frame 3, frame 5 etc). I'll see if I can come up with some code.
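To make the "decaying mode" idea concrete, here is one possible sketch. It is a guess at what such a predictor could look like, not anyone's actual code: the class name, the decay factor of 0.5 and the choice to key the statistics on the previous stride are all assumptions. Keying on the previous stride is what lets it follow alternating patterns such as 0, 2, 3, 5; a plain mode over strides would keep predicting the wrong step there.

```cpp
#include <map>

// Predicts the next requested frame from the history of strides
// (differences between consecutive requests). Older observations
// lose weight by `decay` each time their row is updated.
class StridePredictor {
public:
    explicit StridePredictor(double decay = 0.5) : decay_(decay) {}

    // Record that frame n was just requested.
    void Observe(int n) {
        if (has_last_) {
            const int stride = n - last_;
            auto& row = weight_[prev_stride_];  // strides seen after prev_stride_
            for (auto& kv : row) kv.second *= decay_;
            row[stride] += 1.0;
            prev_stride_ = stride;
        }
        last_ = n;
        has_last_ = true;
    }

    // Most heavily weighted stride following the current one;
    // falls back to linear access (+1) when there is no history.
    int PredictNext() const {
        if (!has_last_) return 0;
        int best = 1;
        double best_weight = 0.0;
        const auto it = weight_.find(prev_stride_);
        if (it != weight_.end()) {
            for (const auto& kv : it->second) {
                if (kv.second > best_weight) {
                    best_weight = kv.second;
                    best = kv.first;
                }
            }
        }
        return last_ + best;
    }

private:
    double decay_;
    bool has_last_ = false;
    int last_ = 0;
    int prev_stride_ = 0;  // 0 means "no previous stride yet"
    std::map<int, std::map<int, double>> weight_;
};
```

After observing 0, 2, 3, 5 this predicts 6, and after 6 it predicts 8. For plain linear access it degenerates to "last + 1".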

Gavino · 21st July 2008, 05:16

Quote:

Originally Posted by squid_80

Instead of implementing serialization in another filter, why not just put it in pipeline and use it on every filter in the script? ...
I think it's already more or less serialized anyway, since getframe blocks for the "fetch" thread to return if it's already running. So if pipeline were in place immediately after a non-threadsafe filter, wouldn't it ensure only one getframe call went through at a time?

The problem is when you have more than just a linear chain. In my earlier example

Code:

src = AviSource(...).AnyFilter().Serialize()
a = src.SelectEven.BilinearResize (640, 480).PipeLine
b = src.SelectOdd.BilinearResize (640, 480).PipeLine
Interleave (a, b)

it seems to me the serialisation has to be done on the common part of the graph, hence immediately after AnyFilter as I described.

Or are you suggesting that the two pipelines would co-operate to enforce serial access at a global level? Hmm, perhaps you're right, I'm not sure now.

squid_80 · 21st July 2008, 07:35

I meant just do this:

Code:

src = AviSource(...).AnyFilter().PipeLine()
a = src.SelectEven.BilinearResize (640, 480).PipeLine()
b = src.SelectOdd.BilinearResize (640, 480).PipeLine()
Interleave (a, b)

since pipeline will already take care of the serialization (I think).

I replaced my avisynth.dll with the one from tsp's last MT avisynth build (since 2.6 isn't quite ready), and it's not crashing anymore. Don't use any setmtmode statements, I think they'll just slow things down or worse cause a deadlock. Simple mpeg2source().resize() scripts perform at double their previous speed.

sh0dan · 21st July 2008, 09:12

Well, in the practical world there are some issues, that make this more complex than it seems. I've written a prefetch addon to MT, which I have used for some of my own tests. It works almost completely like yours, but problems arise from complex scripts.

The approach works nicely in the example above, but what if you want to use a temporal filter? If you use spatial division, as with Mt(""), you must compute an overlap, plus you get penalized for thread synchronization, since you cannot return a frame before both threads have completed, and they very seldom finish in the same amount of time.

Furthermore, most sources heavily prefer linear access, which means you must still access them in-order, to avoid a huge seek penalty. You cannot run the script above without synchronization, since there is a 50% chance of 'b' requesting a frame before 'a'. It gets more complicated still if we add a "MergeChroma(src)" as the last line: which will request first, if 'a' and 'b' run async and each is a frame ahead of yours?

I'm still experimenting with the pre-fetcher. It works ok, but I'm still not happy enough with it to release it. In the example above, it should be able to replace PipeLine(). I need to get a sort of dynamic cache working before it is usable.

btw, I can only get it to be stable on 2.6.

IanB · 22nd July 2008, 02:15

Okay this is all pretty close to an idea I am gestating for properly doing multithreading in avisynth 2.6.

The thought currently goes like this :-

Goal: to instantly provide the client with the frame it is requesting.

- Build a queueable GetFrame request infrastructure.
- Build a worker thread infrastructure to service the queue.
- Build a thread interlock infrastructure.
- Use fibre like concepts to manage and control the number of workers active.
- On exiting each Cache::GetFrame queue a request for the "next" frame.
- Use a history tracking algorithm to predict "next".
- Completing the queued request does not queue the next frame in the current cache instance, child cache instances do enqueue.
- Prioritise queue request by cache graph depth.
- All caches are interlocked to protect non-thread safe filters.
- Caches are given knowledge of child filters "Identity" so it can apply a single lock against all instances of a filter.
- New filters can declare their thread safeness through enhanced cache hints interface. i.e. Unsafe, Instance only safe, ..., Fully safe!
- New filters can declare their processing cost. i.e. Zero, Bitblt like, light, medium, heavy. (Zero cost filters do not get queued requests).
- New filters can declare access order restrictions i.e. strictly linear, linear preferred over step N, random, ...
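The "Identity" and thread-safeness points can be illustrated with a small sketch. Everything in it is hypothetical: no such interface exists in AviSynth 2.5, the names `ThreadSafety` and `LockRegistry` are made up, and it is only one reading of the proposal, in which "Unsafe" means one lock shared by every instance of a filter, "Instance only safe" means one lock per instance, and "Fully safe" means no lock at all.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical hint a filter could report to its cache.
enum class ThreadSafety { Unsafe, InstanceSafe, FullySafe };

// Hands out the mutex a cache would hold while calling into its child filter.
class LockRegistry {
public:
    // identity: name shared by all instances of a filter (e.g. "AnyFilter").
    // instance: address of the particular filter object.
    // Returns nullptr when the filter needs no lock.
    std::shared_ptr<std::mutex> LockFor(const std::string& identity,
                                        const void* instance,
                                        ThreadSafety safety) {
        if (safety == ThreadSafety::FullySafe) return nullptr;
        std::lock_guard<std::mutex> guard(registry_mutex_);
        if (safety == ThreadSafety::Unsafe) {
            // A single lock against all instances of the filter.
            auto& slot = by_identity_[identity];
            if (!slot) slot = std::make_shared<std::mutex>();
            return slot;
        }
        // Instance-only safe: separate instances may run in parallel.
        auto& slot = by_instance_[instance];
        if (!slot) slot = std::make_shared<std::mutex>();
        return slot;
    }

private:
    std::mutex registry_mutex_;
    std::map<std::string, std::shared_ptr<std::mutex>> by_identity_;
    std::map<const void*, std::shared_ptr<std::mutex>> by_instance_;
};
```

Legacy filters that declare nothing would be treated as Unsafe under this reading, which keeps them correct at the price of no parallelism between their instances.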

So for a simple graph with 1 filter and 1 source the request pattern is like this :-

-- Client X->cache2->filter->cache1->source

X requests cache2 requests filter requests cache1 requests source for frame 0.
source returns frame 0, cache1 enqueues for frame 1 and returns frame 0.
(okay I better trim this notation)
worker1 starts prefetch frame 1 from source.
cache2 enqueues for frame 1 and returns frame 0.
worker2 starts prefetch frame 1 from filter, blocks in cache1.
worker1 completes frame 1. No enqueuing!
worker2 unblocks, cache1 enqueues for frame 2 and returns frame 1.
worker3 starts prefetch frame 2 from source.
X requests frame 1, blocks in cache2.
worker2 completes frame 1. No enqueuing!
X unblocks, cache2 enqueues for frame 2 and returns frame 1.
worker3 completes frame 2. No enqueuing!
worker1 starts prefetch frame 2 from filter, cache1 enqueues for frame 3 and returns frame 2.
worker2 starts prefetch frame 3 from source.
worker2 completes frame 3. No enqueuing!
worker1 completes frame 2. No enqueuing!
Stall! cache2 has frame 2 ready, cache 1 has frame 3 ready, all CPU cores available to X
X requests frame 2, cache2 enqueues for frame 3 and returns frame 2.
worker3 ...

At the stall there is little point in Avisynth proceeding: the goal of instantly providing X with the frame it is requesting is being met.

If X had a bursty request pattern, say requesting 3 frames in quick succession followed by a long pause to process the 3, then simply adding sh0dan's prefetch filter would accommodate it. Of course, for bonus points the cache could measure the inter-frame request times and prefetch frames more aggressively based on minimum, maximum and latency times.

Threads that block on a cache lock get recycled in a fibre-like manner in an attempt to keep the defined number of workers concurrently active, hence the "Build a thread interlock infrastructure".

Most of the code to do this simply involves repackaging TSP's current code and inverting the logic so the distributor is in every cache instead of once only at the very top of the graph. All the confusing SetMTMode calls should no longer be necessary with this model. Legacy filters will be assumed thread-unsafe; wrapper functionality could promote a legacy filter's thread safeness.

Of course, building these infrastructures and adding access to them in the API means filter authors can enqueue fragments of their internal processing along with the prefetch logic in a compatible way.

A work in progress!
