Multithreaded development (from view of a Source Filter)

jordanh · 4th May 2013, 20:25

Although I have spent the last few months playing around with AVISynth versions, I still do not really get how to deal with the multithreaded stuff.

First, I came across the AVISynth MT builds. It seemed to me that one can multithread basically any existing AVISynth filter, so I expected to be able to enable multithreading before running QTGMC, but I didn't manage to see any change in speed or CPU usage.

Second, I learned that AVISynth 2.6 has a new API where source filters can add special support for multithreading. The problem was that I wasn't able to build a sample with 2.6 on my own (linkage errors), and I also didn't find a simple multithreaded source plugin example.

Third, I found that a filter built for 2.5 is upward compatible, but a 2.6 filter is not downward compatible...

So, what is the most modern way of creating a source filter?

Thanks,
Harry

IanB · 6th May 2013, 10:12

A quick review of the current AvisynthMT modes :-

Mode 1: Unprotected access to a single instance of the filter. All the threads may enter the GetFrame routine concurrently. Code has to be fully thread safe and re-entrant.

Mode 2: Unprotected multiplexed access to thread count instances of the filter. A thread can enter the GetFrame routine of the next free instance of the filter. Code only has to be instance safe. Each filter instance only sees a non-contiguous proportion of the GetFrame calls.

Mode 3: Protected access to a single instance of the filter. Only 1 thread may be inside the GetFrame routine at a time; the lock is released while the routine executes its child->GetFrame() calls. Code has to be instance safe, re-entrant, and the code up to the child->GetFrame() call thread safe. The filter instance sees all of the GetFrame calls.

Mode 4: Protected multiplexed access to thread count instances of the filter. Only 1 thread may enter the GetFrame routine of the next free instance of the filter; the lock is released while the routine executes its child->GetFrame() calls. Code can be mostly slack, statics and globals must be thread safe or read only. Each filter instance only sees a non-contiguous proportion of the GetFrame calls.

Mode 5 and 6: Protected access to a single instance of the filter. Only 1 thread may execute the GetFrame routine. When the routine executes a child->GetFrame() call, a mini-distributor prefetches a number of frames from the remaining graph. The thread count is based on the number, N, of threads concurrently waiting for any mode 5 filter lock. Mode 5 runs N pre-fetch threads, mode 6 runs N+1 pre-fetch threads.

By definition a source filter does no child->GetFrame() calls, the frame data comes from some external source, e.g. disk or network.

Thus modes 3, 5, and 6 are identical, i.e. access to the GetFrame routine is restricted to a single thread at any one time.

Mode 4 is practically useless for source filters, you have N instances of the filter but only 1 can ever execute at a time.

Mode 2 also has N instances of the filter but all can execute in parallel. For input formats with no inter-frame state this can work fine, e.g. raw uncompressed data direct from the disk, I frame only formats with instance safe codecs and very low random access overhead. For formats with inter-frame dependencies each thread ends up duplicating the same work as the other threads, e.g. with h264 each instance must decode the I frame and all the referenced P and B frames for each subsequent B frame.

Mode 1 of course means the filter has to be re-entrant and thread safe. All the threads can enter the GetFrame routine together. The filter author may choose to provide their own mutex protection. If they do they should avail themselves of the thread resource for all threads that they cause to be suspended. This assumes that some elements of the source frame generation process are usefully parallelisable.

The other threading technology available in Avisynth is pipe-lining. This model has worker threads between filter instances pre-fetching frames for that part of the graph. This can lead to concurrent access like mode 1 if multiple worker threads are configured.

When running heavy filters like QTGMC usually the source filter threading is not the problem.
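To make the mode 1 advice above a little more concrete, here is a minimal sketch of a source filter that lets every thread into GetFrame and protects its decoder state with its own mutex. It does not use the real Avisynth headers: MockDecoder, LockedSource and HammerFromThreads are made-up names, frames are plain ints, and the "decoder" only counts how many frames it would have to decode.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical decoder with inter-frame state: reaching frame n means decoding
// forward from the current position or from the preceding key frame.
class MockDecoder {
public:
    explicit MockDecoder(int keyInterval) : keyInterval_(keyInterval) {
        assert(keyInterval > 0);
    }
    int Decode(int n) {
        assert(n >= 0);
        const int key = n - n % keyInterval_;
        // Continue from the current position only if that is cheaper than
        // restarting at the key frame.
        const int start = (n > pos_ && key <= pos_) ? pos_ + 1 : key;
        steps_ += n - start + 1;
        pos_ = n;
        return n * 10;  // fake frame content
    }
    long Steps() const { return steps_; }
private:
    int keyInterval_;
    int pos_ = -1;    // last decoded frame
    long steps_ = 0;  // total frames decoded so far
};

// Source filter stand-in: any thread may call GetFrame (mode 1 style), the
// filter serialises access to the shared decoder itself.
class LockedSource {
public:
    explicit LockedSource(int keyInterval) : decoder_(keyInterval) {}
    int GetFrame(int n) {
        std::lock_guard<std::mutex> lock(mutex_);
        return decoder_.Decode(n);
    }
    long DecodeSteps() {
        std::lock_guard<std::mutex> lock(mutex_);
        return decoder_.Steps();
    }
private:
    std::mutex mutex_;
    MockDecoder decoder_;
};

// Calls GetFrame from several threads at once, the way a mode 1 host could.
bool HammerFromThreads(LockedSource& src, int threadCount, int frameCount) {
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&src, &ok, t, threadCount, frameCount] {
            for (int n = t; n < frameCount; n += threadCount)
                if (src.GetFrame(n) != n * 10) ok = false;
        });
    }
    for (auto& th : threads) th.join();
    return ok;
}
```

The step counter shows the cost of out-of-order requests: asking for frames 0, 1, 2 costs three decode steps, but going back to frame 1 afterwards costs two more because the mock decoder restarts at the key frame.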

jordanh · 6th May 2013, 17:37

Ah, again you make things clearer. Thank you very much, IanB. I will try to play around with Mode 1 in my CarbonSource filter.

What I still don't get is whether there is any connection between MT and 2.6. Will they be merged at some point, or are they already?
The pipe-lining stuff with the pre-fetching of frames is also new and interesting to me. Could you please point me to where I can find information about this?

Basically I feel that (as I don't use Avisynth's DirectShow interface but the framework only) I could also do the multithreading on my own, by just splitting up the pictures and having multiple instances, but I don't know if this is the best way...

All the best,
Harry

Groucho2004 · 6th May 2013, 17:54

Quote:

Originally Posted by jordanh

I could also do the multithreading on my own, by just splitting up the pictures and having multiple instances, but I don't know if this is the best way...

You could have a look at the source of tritical's filters, some of them are multithreaded (nnedi3, dfttest).

TheFluff · 6th May 2013, 18:06

Avisynth-MT is a fork of the official Avisynth project. It is unlikely to be merged with the official version since it is a terrible hack and suffers from severe stability problems because people don't get thread safety and keep using non-reentrant plugins in a threaded environment because "it only crashes occasionally". If you write your plugins in a generally thread-safe way and make sure that GetFrame is reentrant (can be surprisingly hard with source filters, depending on how you're doing the source file access) they will be compatible with Avisynth-MT's multithreading, but they will not be multithreaded in regular Avisynth. Generally I think it's still best to implement multithreading yourself inside your own plugin, since Avisynth-MT is a terrible idea in many ways. I'm not familiar with the pipelining IanB mentions though.

However, as IanB says, generally the source filter isn't the bottleneck, so if writing a threaded version is hard feel free to skip it.

Or you could try VapourSynth, which deals with the multithreading stuff in a much cleaner way.

IanB · 7th May 2013, 00:33

Some quick archaeology found these threads :-

All filter authors should aim for mode 1 compliance if possible. For source filters in particular it is almost a given as nothing inside Avisynth can really help very much with the decoding of formats with inter-frame dependencies. If you want to do multi-threaded decoding then doing it yourself with your own mutex management of incoming GetFrame() requests gives you the most control. And by only assuming the resources of threads your mutex management suspends allows you to seamlessly integrate into any upper level thread management scheme.

E.g. on an 8 core machine ideally you would have 8 Avisynth threads running and ideally all 8 would be doing something constructive. With modes 3, 4, 5 or 6 the threads just queue sequentially waiting for each single frame to be decoded. This may be acceptable with heavy processing scripts but for light processing scripts we may want something more. If we just blindly allocate another 8 independent threads in the decoder that makes 16 threads globally all trying to run on only 8 cores. We get contention. A smarter model might be to have the 8 independent threads in the decoder but have them suspended. As each Avisynth thread calls the GetFrame routine we suspend it and ready 1 more of our worker threads. So as more Avisynth threads wait, the decoding of the current frame can become faster because more worker threads are available. As a frame completes decoding we suspend 1 of our worker threads and ready the Avisynth thread that was waiting for that frame. Decoding continues with the remaining worker threads. So the more Avisynth threads wait for frames, the more worker threads the decoder has available.
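The bookkeeping behind that model could look roughly like the sketch below. It only tracks how many decoder worker threads are allowed to run, which is the number of host threads currently suspended inside GetFrame; actually parking and waking the workers is left out. WorkerBudget and its method names are invented for illustration and are not part of any Avisynth API.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical budget: the decoder may only run as many worker threads as
// there are host (Avisynth) threads suspended inside GetFrame.
class WorkerBudget {
public:
    // A host thread has entered GetFrame and is about to be suspended:
    // one more worker may run.
    void CallerWaiting() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++allowed_;
    }
    // The frame that host thread was waiting for is done and it resumes:
    // one worker has to give up its slot.
    void CallerResumed() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(allowed_ > 0);
        --allowed_;
    }
    // A worker asks for permission before taking the next piece of work.
    // A worker that gets false should park itself until asked again.
    bool TryStartWork() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (active_ < allowed_) {
            ++active_;
            return true;
        }
        return false;
    }
    // A worker has finished its current piece of work.
    void FinishWork() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(active_ > 0);
        --active_;
    }
private:
    std::mutex mutex_;
    int allowed_ = 0;  // suspended host threads
    int active_ = 0;   // workers currently running
};
```

With two host threads waiting, two workers may start and a third is refused. Once one host thread resumes, a worker that finishes its piece of work cannot start again until the active count drops below the new limit, so the total number of running threads stays at the number of host threads.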

kolak · 7th May 2013, 00:50

Just to add from practice: even mode 2 works well with I frame codecs. I use intermediate codecs and some of them are not natively threaded, but with mode 2 I get multithreaded decoding. This is actually a nice thing to have.

IanB · 7th May 2013, 03:54

Quote:

Originally Posted by IanB

...
Mode 2: ... For input formats with no inter-frame state this can work fine, e.g. raw uncompressed data direct from the disk, I frame only formats with instance safe codecs and very low random access overhead. ...

Note the instance safe bit, some builds of the Cedocida DV codec have had issues reported.

kolak · 7th May 2013, 12:04

I never had an issue, use mainly Canopus codecs: lossless, HQ, HQX and also ProRes, DNxHD.

jordanh · 7th May 2013, 22:15

Hey, I didn't expect that much response... thanks to all of you.
Until now I was under the impression that AVISynth would call some kind of special function, or float frame numbers or similar, from my source filter in order to multithread.
In my special case with the CarbonSource plugin, I first have to collect some frames from Carbon, and then pull a few frames from AVISynth. So I always fill about 10 frames in one step and then request a few frames (depending on the output framerate) from AVISynth, to send them upstream in Carbon again. When I call getFrame() on the env (end of the filter chain), all the frames that I can get are already loaded into the source plugin.
As the AVISynth CarbonSource plugin only reads the frames from Carbon, I should be able to do multithreaded frame gathering from the env (just as much as it needs, depending on the output framerate and the available input frames in the buffer).

I try to summarize and generalize my understanding of multithreading options:

-) all MT(), SetMTMode and similar functions are dedicated to AVISynth MT, but:
-) AVISynth MT is generally a bad idea
-) there are filters in the above links that provide similar functionality to AVISynth MT, but I did not find any that is considered stable and multi-purpose
-) Generally the best performance is reached by multithreading in your own program that invokes the AVISynth script and calls the getFrame() functions. The source filter has to support mode 1 multithreading. The problem with that is that you either have to create multiple environments or all filters in your chain have to be thread-safe.
-) If one needs to do multithreading using filters, it's good to have a look at the above links

That is great information for me. I think i see the way for my plugin now clearly.

Thanks,
Harry

kolak · 7th May 2013, 23:25

There is also MP_Pipeline: http://forum.doom9.org/showthread.php?t=163281

IanB · 7th May 2013, 23:43

I guess a lot depends on the Carbon API for "call some frames from Carbon".

Can you tell us a little about the environment.

Do you pull the frames from the API like inside AVISource or do you collect the frames from a callback like inside DirectShowSource?

Is the environment multithreading aware, i.e. can 8 threads each ask for 8 different frames in parallel?

Is that style an efficient use of the API?

You say "fill like 10 frames in one step". Does this mean grabbing a set of frames all at once with all cpu cores available to the Carbon decoder is a good design?

You imply that Carbon might eventually be calling processed frames back from Avisynth. How do you envision doing this? Is this simply through the VfW AVIFile interface or do you want to use the Avisynth C or C++ API?

One future idea for improving AvisynthMT is to allow filters that use more than 1 input frame to pre-request the other frames. There is no code written for this yet, if you have specific requirements now is a good time to discuss them.

I envisage a mythical SomeFilter that combines a 3 frame window of 2 input clips into an output frame would flow something like this. :-

Code:

PVideoFrame __stdcall SomeFilter::GetFrame(int n, IScriptEnvironment* env) {
  child->SetCacheHints(CACHE_PREFETCH_FRAME, n-1);
  child2->SetCacheHints(CACHE_PREFETCH_FRAME, n-1);
  child->SetCacheHints(CACHE_PREFETCH_FRAME, n+1);
  child2->SetCacheHints(CACHE_PREFETCH_FRAME, n+1);
  child->SetCacheHints(CACHE_PREFETCH_FRAME, n);
  child2->SetCacheHints(CACHE_PREFETCH_FRAME, n);

  child->SetCacheHints(CACHE_PREFETCH_GO, (int)env);
  child2->SetCacheHints(CACHE_PREFETCH_GO, (int)env);

  PVideoFrame frameOut = env->NewVideoFrame(vi);

  PVideoFrame frameprevA = child->GetFrame(n-1, env);
  PVideoFrame frameprevB = child2->GetFrame(n-1, env);
  /* munge left frames ... */

  PVideoFrame framenextA = child->GetFrame(n+1, env);
  PVideoFrame framenextB = child2->GetFrame(n+1, env);
  /* munge right frames ... */

  PVideoFrame framecurrA = child->GetFrame(n, env);
  PVideoFrame framecurrB = child2->GetFrame(n, env);
  /* munge centre frames */

  /* combine left, centre and right frames ... */

  return frameOut;
}

The idea being SomeFilter declare to the cache it will need these 6 frames. The cache requests any frames not already cached on worker threads. By the time the child->GetFrame calls happen the required frames are in cache or at least partially generated.

jordanh · 8th May 2013, 00:54

Yeah, you are right... basically it all depends on the Carbon API. Please give me one or two more weeks, then I will have documentation ready. It is already implemented and works OK, but needs some care before the first release.
Of course there is a huge number of Carbon-internal reasons why I buffer things the way I do. Carbon is extremely fast at decoding. It also has a very good and easy internal logic for when and what to transform. Just like in Avisynth, Carbon filters can implement multithreading or not. Most Carbon plugins seem to be single threaded, but mostly the big brake is the exporter or the up/down conversion. You can get a full HD MPEG stream decoded at up to 7x realtime when you "null render" and do not filter it.

So far I can give a little overview:
You may understand that I cannot give too much information about the Carbon API because that is covered by an NDA. All the Carbon-related information below is or was generally available, published by Rhozet/Harmonic.

AVISynth API:
Pull-Pull model: You need to pull a frame from env, that calls a frame from your source plugin.

Carbon API:
Push-Push model: you get the decoded frame delivered into one public callback method. At any one time you can only either receive or deliver frames, and you need to do it in ascending order.

Hybrid Requirements:
support fps change, size change, interlacing change (did i forget something?)

Hybrid Usage:
In your CarbonCoder Application, Load the "source video filter" "Avisynth Filter". If you would use it as "target video filter" instead of "source", carbon will already have done all the resizing, deinterlacing and fps conversion when the frames arrive at your avs script.
At the Avisynth Filter configuration, point the path to your avs script. Then choose any input file and output settings.
In your avs script, explicitly load the CarbonPlugin.dll in the first line, and use CarbonSource() as the Source Filter for the Video stream you want to have from Carbon.
Mixed Processing is no problem: you can let AVISynth do only deinterlacing and Carbon the Framerate conversion.

Hybrid Implementation:
For optimal performance, we must have direct memory access from the AVISynth source plugin CarbonSource() to the Carbon Avisynth Filter plugin. Both are win32 DLLs, and as we load "avisynth.dll" into our own process, the memory is basically free for us to read. In the Carbon part, include CarbonSource.h. After loading the DLL and invoking the environment, tell CarbonCoder what format will be delivered back from the clip returned by the avs script. In the CarbonSource() part, use env->SetVar to publish its own instance address. In the Carbon part, use env->GetVar to get the instance address of the source plugin. So we have access to all public functions of our source filter object. Unfortunately we must always memcpy on both sides.
From there it is mostly a matter of how things work in Carbon.

I hope this is not too intrusive, but I already lose about 1/3 of the speed by just pulling frames through Avisynth with an "empty script", so it would be a huge bottleneck to do IPC or file-based feeding of Avisynth.
I don't think that this special kind of API usage needs special support, but I really wish I could switch Avisynth to push mode... this should open Avisynth for all kinds of live sources, shouldn't it?

good night!

jordanh · 8th May 2013, 23:50

@IanB: do you work on the MT project?
Thanks for sharing the idea of a pre-caching "theoretical after-source" plugin.
I don't know too much about the child stuff and this apparently mighty SetCacheHints function. As I have a push source, I don't need to implement any of them at this time... From a quick look through all the above-mentioned links on multithreading strategies, I understand that the method you explained will have the same limitations as all the others and cannot be compatible with all existing filters.

One question: am I right that you generally want to keep the requirement that a source plugin only needs to deliver frames in ascending order?
The reason why I am asking is that, from my understanding of your prototype code, the source plugin is not asked in monotonic order. Is that right?
...if so, it means either incompatibility with slow-seeking source media like CD and/or the need for a standard for how to manage the needed buffer in the source plugin.

Of course all of the above are just assumptions :-)

IanB · 9th May 2013, 01:59

Your question first :- Yes, frames can be requested in any order. Consider script usage of Trim(), SelectEvery(), SelectOdd(), SelectEven(), Reverse(), etc, all these affect the request order. Also consider the calling host application, e.g. an mpeg2 encoder might conceivably ask for frames in encode order, i.e. IPbbPbb... (0,3,1,2,6,4,5,...). And then there are the AvisynthMT influences.
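As a small illustration of the encode-order case, the following sketch generates the frame numbers a host might request if it asked in IPbbPbb... order. CodingOrder is a made-up helper, and the fixed pattern of two B frames between reference frames is an assumption taken from the example sequence above, not the behaviour of any particular encoder.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Request order for a hypothetical encoder that asks for each reference
// frame (I or P) before the B frames that sit in front of it in display order.
std::vector<int> CodingOrder(int frameCount, int bFrames = 2) {
    assert(bFrames >= 0);
    std::vector<int> order;
    if (frameCount <= 0) return order;
    order.push_back(0);  // leading I frame
    for (int prevRef = 0; prevRef + 1 < frameCount;) {
        // Next reference frame, clamped to the last frame of the clip.
        const int nextRef = std::min(prevRef + bFrames + 1, frameCount - 1);
        order.push_back(nextRef);
        // Then the B frames between the two reference frames.
        for (int b = prevRef + 1; b < nextRef; ++b) order.push_back(b);
        prevRef = nextRef;
    }
    return order;
}
```

For a 7 frame clip this yields 0, 3, 1, 2, 6, 4, 5, so a source filter behind such a host sees a backward jump after every reference frame.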

There are filters that can mitigate this, like the 'hack' ChangeFPS(Last, Last) and the plugin RequestLinear(). DirectShowSource can suffer quite badly from non-linear access. The plugin DSS2() has some tweaks to mitigate this, I believe it maintains a 10? frame LRU to service requests and always pulls frames in order for up to 20? frames into its LRU. A seek beyond 20? frames or backwards results in a 2? frame preroll.
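The general idea of such a linearising cache can be sketched as follows. This is not DSS2's code: LinearizingCache is a made-up class, frames are plain ints, the capacity and forward-skip limit are parameters rather than the guessed values above, and the preroll after a seek is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>

// Small LRU in front of a source that is cheap to read in order and
// expensive to seek. Short forward jumps are served by reading linearly.
class LinearizingCache {
public:
    LinearizingCache(std::size_t capacity, int maxForwardSkip)
        : capacity_(capacity), maxForwardSkip_(maxForwardSkip) {
        assert(capacity > 0);
    }
    int GetFrame(int n) {
        for (auto it = lru_.begin(); it != lru_.end(); ++it) {
            if (it->first == n) {
                lru_.splice(lru_.begin(), lru_, it);  // mark as most recent
                return lru_.front().second;
            }
        }
        // Cache miss: seek only for backward or long forward jumps.
        if (n <= pos_ || n - pos_ > maxForwardSkip_) {
            ++seeks_;
            pos_ = n - 1;
        }
        while (pos_ < n) {  // read in order up to the requested frame
            ++pos_;
            lru_.emplace_front(pos_, ReadNext(pos_));
            if (lru_.size() > capacity_) lru_.pop_back();
        }
        return lru_.front().second;
    }
    int Reads() const { return reads_; }
    int Seeks() const { return seeks_; }
private:
    int ReadNext(int n) { ++reads_; return n * 10; }  // fake sequential read
    std::size_t capacity_;
    int maxForwardSkip_;
    int pos_ = -1;  // last frame read from the source
    int reads_ = 0;
    int seeks_ = 0;
    std::list<std::pair<int, int>> lru_;  // (frame number, content)
};
```

With a capacity of 10 and a skip limit of 20, requesting 0, 1, 2, 1, 5 costs six sequential reads and no seek, because frame 1 is served from the cache and the jump to 5 is filled in linearly.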

Children are the input clips from which a filter gets the frames that it will process to produce the output frame.

The class GenericVideoFilter : public IClip is the simplest filter. It calls a single input clip (child) and just passes Audio and Video straight through. Most filters use this as a template to provide API "glue", so for an Audio filter you only need to write the GetAudio code, for a Video filter you only need to write the GetFrame code.

Source filters are unique because they do not have any children (input clips). Basically they env->NewVideoFrame(vi); and fill in the image contents.

The SetCacheHints method is currently only used to influence the caches of a filter's input clips, basically you can turn it off, CACHE_NOTHING, or put it into window mode, CACHE_RANGE. As part of the Avisynth 2.6 API changes its usage is being extended. For 2.6.0 the cache still works the same way as in 2.5. The 2.6.0 cache does actually ask the child input clip performance questions to assert the interface design but the information is presently not used. A future version may tailor the behaviour based on these or future defined queries.

These are the currently defined queries, they may change, they will almost certainly be extended. :-

Code:

  CACHE_GETCHILD_CACHE_MODE=200, // Cache ask Child for desired video cache mode.
  CACHE_GETCHILD_CACHE_SIZE=201, // Cache ask Child for desired video cache size.
  CACHE_GETCHILD_AUDIO_MODE=202, // Cache ask Child for desired audio cache mode.
  CACHE_GETCHILD_AUDIO_SIZE=203, // Cache ask Child for desired audio cache size.

  CACHE_GETCHILD_COST=220, // Cache ask Child for estimated processing cost.
    CACHE_COST_ZERO=221, // Child response of zero cost (ptr arithmetic only).
    CACHE_COST_UNIT=222, // Child response of unit cost (less than or equal 1 full frame blit).
    CACHE_COST_LOW=223, // Child response of light cost. (Fast)
    CACHE_COST_MED=224, // Child response of medium cost. (Real time)
    CACHE_COST_HI=225, // Child response of heavy cost. (Slow)

  CACHE_GETCHILD_THREAD_MODE=240, // Cache ask Child for thread safetyness.
    CACHE_THREAD_UNSAFE=241, // Only 1 thread allowed for all instances. 2.5 filters default!
    CACHE_THREAD_CLASS=242, // Only 1 thread allowed for each instance. 2.6 filters default!
    CACHE_THREAD_SAFE=243, //  Allow all threads in any instance.
    CACHE_THREAD_OWN=244, // Safe but limit to 1 thread, internally threaded.

  CACHE_GETCHILD_ACCESS_COST=260, // Cache ask Child for preferred access pattern.
    CACHE_ACCESS_RAND=261, // Filter is access order agnostic.
    CACHE_ACCESS_SEQ0=262, // Filter prefers sequential access (low cost)
    CACHE_ACCESS_SEQ1=263, // Filter needs sequential access (high cost)

IanB · 9th May 2013, 02:21

Quote:

Originally Posted by jordanh

I hope this is not too intrusive, but i already lose about 1/3 speed by just pulling frames through avisynth, with an "empty script", it would be a huge bottleneck to do IPC or Filebased feeding of avisynth.
I dont think that this special kind of API usage needs to have special support, but i really wished i could switch avisynth to push mode... this should open avisynth for all kinds of live sources, shouldnt it?

Yes you always seem to get stuck with an input frame blit and another output frame blit as you cross API boundaries.

I take it this is costing you some speed. Have you identified any other bottlenecks?

Worker thread piping can help sometimes when bridging push and pull models. E.g. at the bottom a thread keeps draining the input stream into a fair number of buffers, it stops when all the buffers are full. At the top another thread keeps pulling frames from Avisynth and pushing them into the output stream, it stops when the output stream is full or all the input buffers in the source are empty.
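A minimal sketch of that bridging idea, using a bounded queue between one thread that pushes frames in and another that pulls them out. FrameQueue and PumpThroughQueue are made-up names, frames are plain ints, and nothing here touches the Avisynth or Carbon APIs.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Bounded buffer: the pushing side blocks when all buffers are full,
// the pulling side blocks when they are all empty.
class FrameQueue {
public:
    explicit FrameQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }
    void Push(int frame) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push_back(frame);
        notEmpty_.notify_one();
    }
    // Returns false once the queue is closed and drained.
    bool Pop(int& frame) {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return false;
        frame = queue_.front();
        queue_.pop_front();
        notFull_.notify_one();
        return true;
    }
    void Close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        notEmpty_.notify_all();
    }
private:
    std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable notFull_, notEmpty_;
    std::deque<int> queue_;
    bool closed_ = false;
};

// One thread plays the push source, another plays the pulling side.
std::vector<int> PumpThroughQueue(int frameCount, std::size_t capacity) {
    FrameQueue queue(capacity);
    std::vector<int> out;
    std::thread producer([&] {
        for (int n = 0; n < frameCount; ++n) queue.Push(n);
        queue.Close();
    });
    std::thread consumer([&] {
        int frame = 0;
        while (queue.Pop(frame)) out.push_back(frame);
    });
    producer.join();
    consumer.join();
    return out;
}
```

The queue capacity plays the role of the "fair number of buffers": a larger value lets the pushing side run further ahead at the cost of memory.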

jordanh · 11th May 2013, 18:49

IanB, sorry but I really struggle to follow whether you are talking about AVISynth MT or classic.
The CACHE_GETCHILD_THREAD_MODE and similar, I have never seen them in the Avisynth docs... is that a new thing you are just implementing? ...Sure, that's interesting!
Thanks for all the explanations, everyone.

Quote:

Originally Posted by IanB

I take it this is costing you some speed. Have you identified any other bottlenecks?

At the moment not. I need to dig a little deeper into this.
By now i am measuring this way (including example values):
1) take the time that carbon needs without any filter: 1 minute
2) do the same using a special "empty" filter: 1 minute
3) take the time that carbon needs with the avisynth plugin filter, but disable all avisynth processing (copy the frames only): 1 min. 3 sec
4) take the time of carbon and avisynth in action, but with a special avs script that basically only loads the source and does not filtering: 1 min 20 sec

By the way, basically the framebuffer that holds the incoming pictures can be configured to hold only one frame, that brings the whole chain back to where it comes from: deliver one sample then fetch one sample.... Thats how most filters work in carbon.

Quote:

Originally Posted by IanB

Worker thread piping can help sometimes when bridging push and pull models. E.g. at the bottom a thread keeps draining the input stream into a fair number of buffers, it stops when all the buffers are full. At the top another thread keeps pulling frames from Avisynth and pushing them into the output stream, it stops when the output stream is full or all the input buffers in the source are empty.

I think in my special case it doesn't help to thread incoming and outgoing. As I mentioned before, the routine for frame gathering and the one for sending the processed frame upstream can only run separately. It is not possible for both to run at the same time. Imagine it as if the Carbon engine were single threaded (AFAIK internally it's not, but it looks like it).
Implementing what you mention would also mean the need to synchronize the Carbon frame delivery.

I think the better and easier way will be to just have multiple envs at the cost of memory. I hope to get a more or less linear speed improvement when doing this:

1) create 4 similar avisynth environments, each having its own instance of my avisynth CarbonSource() plugin in the avs script
2) set the read buffer of the 4 instances of CarbonSource() to a single buffer (the one and only buffer)
3) Buffer about 10 input frames, stop buffering
4) create 4 threads that let the 4 environments process each 1 frame (actually at this spot, we measure in time units e.g. 40ms, not in frames, because of possible frame rate change)
5) buffer next 4 frames...

The problem is that i need to harden the functionality before i go to threading...

Harry

IanB · 12th May 2013, 02:00

Quote:

Originally Posted by jordanh

IanB, sorry but i really struggle following if you are talking of AVISynth MT or classic.
The CACHE_GETCHILD_THREAD_MODE and similar, I have never seen them in the Avisynth docs... is that a new thing you just implement?

Yes this is all new and reserved for a future version, they are just placeholders so we will not need to change the API again when they are implemented. As I said the Cache currently does make all the queries, it validates the answers, but does not act on the responses. About 95% of the "MT" code is in the "classic" code base, the missing 5% is Distributor(), the 5 mode cache code and the [GS]etMTMode() verbs.

The CACHE_GETCHILD_* model will be that user filters optionally implement the IClip::SetCacheHints method. The parent cache at start up makes the queries. If the filter has an opinion on the query it responds appropriately. The parent cache reconfigures itself to accommodate the opinion.
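A rough sketch of what such a query/response exchange could look like from the filter side. It does not include the real avisynth.h: the enum values are copied from the list posted earlier in this thread, MockSourceFilter and QueryThreadMode are made-up stand-ins, the int return and the (cachehints, frame_range) parameters are assumptions about the 2.6 interface, and returning 0 for "no opinion" is an assumption as well.

```cpp
#include <cassert>

// Values copied from the query list earlier in the thread (they may change).
enum CacheHint {
    CACHE_GETCHILD_COST        = 220,
    CACHE_COST_LOW             = 223,
    CACHE_GETCHILD_THREAD_MODE = 240,
    CACHE_THREAD_UNSAFE        = 241,
    CACHE_THREAD_CLASS         = 242,
    CACHE_THREAD_SAFE          = 243,
    CACHE_THREAD_OWN           = 244,
    CACHE_GETCHILD_ACCESS_COST = 260,
    CACHE_ACCESS_SEQ0          = 262
};

// Hypothetical source filter that has an opinion on three of the queries.
struct MockSourceFilter {
    int SetCacheHints(int cachehints, int /*frame_range*/) {
        switch (cachehints) {
        case CACHE_GETCHILD_THREAD_MODE:
            return CACHE_THREAD_SAFE;  // GetFrame may be entered by any thread
        case CACHE_GETCHILD_ACCESS_COST:
            return CACHE_ACCESS_SEQ0;  // sequential access preferred
        case CACHE_GETCHILD_COST:
            return CACHE_COST_LOW;
        default:
            return 0;                  // assumed meaning: no opinion
        }
    }
};

// Plays the parent cache: asks the question at start-up and checks that the
// answer is one of the defined thread modes before using it.
int QueryThreadMode(MockSourceFilter& filter) {
    const int answer = filter.SetCacheHints(CACHE_GETCHILD_THREAD_MODE, 0);
    const bool valid = answer == 0 ||
        (answer >= CACHE_THREAD_UNSAFE && answer <= CACHE_THREAD_OWN);
    assert(valid && "filter returned an undefined thread mode");
    return valid ? answer : 0;
}
```

A filter that ignores a query simply falls through to the default case, which matches the "optionally implement" wording above.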

Quote:

At the moment not. I need to dig a little deeper into this.
By now i am measuring this way (including example values):
1) take the time that carbon needs without any filter: 1 minute
2) do the same using a special "empty" filter: 1 minute
3) take the time that carbon needs with the avisynth plugin filter, but disable all avisynth processing (copy the frames only): 1 min. 3 sec
4) take the time of carbon and avisynth in action, but with a special avs script that basically only loads the source and does not filtering: 1 min 20 sec

I take it in 2. the filter effectively just returns the pointer to the input buffer as its output, much like a zero cost Avisynth filter would. So it costs nothing.

In 3. I assume the filter malloc's (whatever) a new output buffer and just blits the input buffer contents into it, much like an Avisynth Crop(..., Align=True) will do with an unaligned left value. So costs 3 seconds.

In 4. I assume the script is a simple "Return CarbonSource(...)" and you lose an extra 17 seconds somewhere currently unknown.

Quote:

I think the better and more easy way will be to just have multiple env's at the cost of memory. I hope to have more or less linear speed improvement when doing this:

1) create 4 similar avisynth environments, each having its own instance of my avisynth CarbonSource() plugin in the avs script
2) set the read buffer of the 4 instances of CarbonSource() to a single buffer (the one and only buffer)
3) Buffer about 10 input frames, stop buffering
4) create 4 threads that let the 4 environments process each 1 frame (actually at this spot, we measure in time units e.g. 40ms, not in frames, because of possible frame rate change)
5) buffer next 4 frames...

This looks like you want to manually implement a SetMTMode(2, 4). With mode 2, each "spread" of a filter shares a common cache. If you have temporal filters that process say 3 input frames, prev, curr & next into an output frame the mode 2 cache only calls for each frame once. If the chains are fully divorced then no commonality is possible.

jordanh · 12th May 2013, 10:54

Thanks, you are helping me a lot with putting the puzzle together :-)
Basically I think I got what you want to tell me, except for one thing: the SetMTMode() stuff is dedicated to AVISynth MT, isn't it? ...I definitely have to give that a try before implementing my own threading.
On the other hand, the outcome of this whole discussion is that I need to do a lot of benchmarking to find the right spot for threading.

Quote:

If the chains are fully divorced then no commonality is possible.

I am not sure if you understood how things work for me... Actually I can definitely share the same input buffer across multiple environments. It is hard to grasp, but everything I do in CarbonSource() depends on being able to modify the living instance of my CarbonSource() filter, the instance that AVISynth created when it loaded the avs script.
It runs like this:
1) my Carboncoder Filter creates the env using c++
2) env creates the instance of the CarbonSource() Filter when an avs script containing CarbonSource() is loaded
3) CarbonSource() Filter sets a uservariable "INSTANCEADDRESS" in env to expose its own instance address
4) From there i can set the input buffer memory of the SourceFilter to whatever i like.

But, as mentioned, it doesn't matter. First I need to benchmark and find out where the speed is really lost.
Meanwhile I had a breakthrough regarding input compatibility, I am releasing the first beta (or even RC1) next week.

jordanh · 12th May 2013, 18:35

It seems that I just cannot have multiple separate Avisynth environments at a time using the C++ API. At least not while my source plugin uses env->SetVar()...

Can it be that
1) generally, a single process can only run a single Avisynth environment (would it help to load the DLL multiple times?)
2) for a thread-safe source filter, using env->SetVar() is not allowed?

Below is the code of interest. The upper part is the calling application (the Carbon part) and the lower part is the AVISynth source plugin.
It produces the "SetVar fails." error when I do either of
-) use setmtmode(2,4)
-) call the CreateNewEnv() function from below twice in order to create 2 separate environments.

Thanks for any clue... especially on the SetVar in an MT environment stuff.

Code:

//calling application that imports avisynth.dll and creates script environment

IScriptEnvironment* Scaler::CreateNewEnv(){
	//creates new avisynth script environment and returns it

	if (!avsdll){
		avsdll = LoadLibrary("avisynth.dll");//load dll only once
	}
	IScriptEnvironment* outputenv;
	try{
		//get function pointer in external dll
		CreateEnv = (IScriptEnvironment *(__stdcall *)(int))GetProcAddress(avsdll, "CreateScriptEnvironment");
		if(!CreateEnv){
			if (debug==true){
				sprintf(debugmsg,"Avisynth Filter error, cannot initialize avisynth.dll, version or installation problem");
				debug_log_callback( debugmsg);
			}
			throw std::runtime_error("Avisynth Filter error, cannot initialize avisynth.dll, version or installation problem");
		}
		//do the avisynth work
		outputenv= CreateEnv(AVISYNTH_INTERFACE_VERSION);

		//first we set our input info as variables, they are read from the plugin when initializing //hardcoded
		outputenv->SetVar("EXTERNAL_INPUT_HEIGHT", (int)m_InputVideoInfo.height);
		outputenv->SetVar("EXTERNAL_INPUT_WIDTH", (int)m_InputVideoInfo.width);
		outputenv->SetVar("EXTERNAL_INPUT_FPSNUM", (int)m_InputVideoInfo.frameRateNumerator);
		outputenv->SetVar("EXTERNAL_INPUT_FPSDEN", (int)m_InputVideoInfo.frameRateDenominator);
		outputenv->SetVar("EXTERNAL_INPUT_FIELDORDER", (int)m_InputVideoInfo.interlaceMode);
	}
	catch (const AvisynthError& e) {
		m_errorMessage = e.msg;
		debug_log_callback( (char*)e.msg);
		m_errorMessage = "Avisynth script error.";

		outputenv = 0;//for avisynth 2.5, do not delete the environment. in newer versions a delete env function is provided
		throw PluginException(RPI_RESULT_INTERNAL_ERROR,m_errorMessage.c_str());
	}
	return outputenv;
}

Code:

//AVISynth Source Filter CarbonSource()

AVSValue __cdecl Create_CarbonSource(AVSValue args, void* user_data, IScriptEnvironment* env) {
	//called by avisynth.dll, creates an instance of our source filter class

	//returns the new instance; see AddFunction in AvisynthPluginInit2 for the parameter description
	CarbonSource *theOneAndOnly = new CarbonSource(env);

	//publish the instance address so the calling application can read it with GetVar
	//(the int cast assumes a 32-bit build)
	if (env->SetVar("INSTANCEADDRESS", AVSValue((int)theOneAndOnly))) {
		//env->ThrowError("SetVar ok.");
	} else {
		//this is where the error occurs when using setmtmode(2,4)
		env->ThrowError("SetVar fails. Possible reason is incompatible avisynth version. Must support at least interface version 2");
	}

	return theOneAndOnly;
}
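
A possible explanation for the SetMTMode(2,4) case, offered as an assumption rather than a confirmed diagnosis: as far as the Avisynth FilterSDK documentation describes it, SetVar returns true when the variable was newly created and false when it already existed and was only updated. Under mode 2 the create function would run once per filter instance, so the second call would find INSTANCEADDRESS already set and return false without anything having failed. The sketch below only models that return convention. VarTable and InstanceVarName are hypothetical names and are not part of the Avisynth API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a script environment's variable table.
// It mimics the assumed SetVar convention: true = created, false = updated.
class VarTable {
public:
    bool SetVar(const std::string& name, long long value) {
        std::map<std::string, long long>::iterator it = vars_.find(name);
        if (it == vars_.end()) {
            vars_.insert(std::make_pair(name, value));
            return true;   // variable did not exist before
        }
        it->second = value;
        return false;      // variable existed, value was overwritten
    }

    long long GetVar(const std::string& name) const {
        std::map<std::string, long long>::const_iterator it = vars_.find(name);
        assert(it != vars_.end() && "variable must exist");
        return it->second;
    }

private:
    std::map<std::string, long long> vars_;
};

// Hypothetical helper: one variable name per filter instance, so that
// several instances do not overwrite each other's address.
inline std::string InstanceVarName(int index) {
    return "INSTANCEADDRESS_" + std::to_string(index);
}
```

If this is indeed the cause, treating a false return as "updated" instead of throwing, or giving each instance its own variable name, would avoid the error. Note that with a single shared name the calling application would only ever see the address of the last instance created.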


TheFluff · 6th May 2013, 18:06

Avisynth-MT is a fork of the official Avisynth project. It is unlikely to be merged with the official version since it is a terrible hack and suffers from severe stability problems because people don't get thread safety and keep using non-reentrant plugins in a threaded environment because "it only crashes occasionally".

If you write your plugins in a generally thread-safe way and make sure that GetFrame is reentrant (can be surprisingly hard with source filters, depending on how you're doing the source file access) they will be compatible with Avisynth-MT's multithreading, but they will not be multithreaded in regular Avisynth. Generally I think it's still best to implement multithreading yourself inside your own plugin, since Avisynth-MT is a terrible idea in many ways. I'm not familiar with the pipelining IanB mentions though.

However, as IanB says, generally the source filter isn't the bottleneck, so if writing a threaded version is hard feel free to skip it. Or you could try VapourSynth, which deals with the multithreading stuff in a much cleaner way.

IanB · 7th May 2013, 00:33

Some quick archaeology found these threads :-

ThreadRequest : yet another plugin for multithread processing
Sora's avs multi-process/multi-thread plugin package (2012-02-20)
Opinions needed for a multi-threading library
Multi-threading: Roundup of options

All filter authors should aim for mode 1 compliance if possible. For source filters in particular it is almost a given, as nothing inside Avisynth can really help very much with the decoding of formats with inter-frame dependencies.

If you want to do multi-threaded decoding then doing it yourself with your own mutex management of incoming GetFrame() requests gives you the most control. And only assuming the resources of the threads your mutex management suspends allows you to integrate seamlessly into any upper level thread management scheme.

E.g. on an 8 core machine ideally you would have 8 Avisynth threads running and ideally all 8 would be doing something constructive. With modes 3, 4, 5 or 6 the threads just queue sequentially waiting for each single frame to be decoded. This may be acceptable with heavy processing scripts but for light processing scripts we may want something more. If we just blindly allocate another 8 independent threads in the decoder that makes 16 threads globally all trying to run on only 8 cores. We get contention.

A smarter model might be to have the 8 independent threads in the decoder but have them suspended. As each Avisynth thread calls the GetFrame routine we suspend it and ready 1 more of our worker threads. So as more Avisynth threads wait, the decoding of the current frame can become faster because more worker threads are available. As a frame completes decoding we suspend 1 of our worker threads and ready the Avisynth thread that was waiting for that frame. Decoding continues with the remaining worker threads.

So the more Avisynth waits for frames the more worker threads the decoder has available.
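
The accounting behind this model can be sketched in a few lines. This is only a minimal illustration of the bookkeeping (how many decoder workers may run given how many Avisynth threads are currently blocked), not an implementation of the actual thread suspension. The class name WorkerBudget and its methods are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>

// Tracks how many decoder worker threads may run at the moment.
// Rule from the post: every Avisynth thread that blocks in GetFrame
// lends its core to one worker; when its frame completes, one worker
// is taken back.
class WorkerBudget {
public:
    explicit WorkerBudget(int max_workers) : max_workers_(max_workers), waiting_(0) {
        assert(max_workers > 0);
    }

    // An Avisynth thread entered GetFrame and is about to be suspended.
    void CallerSuspended() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++waiting_;
    }

    // The frame for one waiting Avisynth thread is done; it resumes.
    void CallerResumed() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(waiting_ > 0);
        --waiting_;
    }

    // Number of workers that should be in the ready state right now.
    int ActiveWorkers() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::min(waiting_, max_workers_);
    }

private:
    mutable std::mutex mutex_;
    const int max_workers_;
    int waiting_;  // Avisynth threads currently blocked in GetFrame
};
```

With this rule the total number of runnable threads (Avisynth threads plus workers) never exceeds the number of Avisynth threads, which is what avoids the 16-threads-on-8-cores contention described above.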

kolak · 7th May 2013, 12:04

I never had an issue. I mainly use Canopus codecs (lossless, HQ, HQX) and also ProRes and DNxHD.

jordanh · 7th May 2013, 22:15

Hey, I didn't expect that much response... thanks to all of you.

Until now I was under the impression that Avisynth would call some kind of special function, or float frame numbers or similar, from my source filter in order to multithread.

In my special case with the CarbonSource plugin, I first have to call some frames from Carbon, and then pull a few frames from Avisynth. So I always fill about 10 frames in one step and then request a few frames (how many depends on the output frame rate) from Avisynth to send them upstream in Carbon again. When I call GetFrame() on the env (end of the filter chain), all the frames that I can get are already loaded into the source plugin. As the CarbonSource plugin only reads the frames from Carbon, I should be able to do multithreaded frame gathering from the env (just as many as needed, depending on the output frame rate and the available input frames in the buffer).

I'll try to summarize and generalize my understanding of the multithreading options:
-) all MT(), SetMTMode and similar functions are specific to Avisynth MT, but:
-) Avisynth MT is generally a bad idea
-) there are filters in the links above that offer functionality similar to Avisynth MT, but I did not find any that is considered stable and general-purpose
-) Generally the best performance is reached by multithreading in your own program, which invokes the Avisynth script and calls the GetFrame() functions. The source filter has to support mode 1 multithreading. The problem with that is that either you have to create multiple environments or all filters in your chain have to be thread-safe.
-) If one needs to do multithreading using filters, it's good to have a look at the links above

That is great information for me. I think I now see the way for my plugin clearly.

Thanks,
Harry

IanB · 6th May 2013, 10:12

A quick review of the current AvisynthMT modes :-

Mode 1: Unprotected access to a single instance of the filter. All the threads may enter the GetFrame routine concurrently. Code has to be fully thread safe and re-entrant.

Mode 2: Unprotected multiplexed access to thread count instances of the filter. A thread can enter the GetFrame routine of the next free instance of the filter. Code only has to be instance safe. Each filter instance only sees a non-contiguous proportion of the GetFrame calls.

Mode 3: Protected access to a single instance of the filter. Only 1 thread may enter the GetFrame routine at a time, the lock is released while the routine executes its child->GetFrame() calls. Code has to be instance safe, re-entrant, and the code up to the child->GetFrame() call thread safe. The filter instance sees all of the GetFrame calls.

Mode 4: Protected multiplexed access to thread count instances of the filter. Only 1 thread may enter the GetFrame routine of the next free instance of the filter, the lock is released while the routine executes its child->GetFrame() calls. Code can be mostly slack, statics and globals must be thread safe or read only. Each filter instance only sees a non-contiguous proportion of the GetFrame calls.

Mode 5 and 6: Protected access to a single instance of the filter. Only 1 thread may execute the GetFrame routine. When the routine executes a child->GetFrame() call, a mini-distributor prefetches a number of frames from the remaining graph. The thread count is based on the number, N, of threads concurrently waiting for any mode 5 filter lock. Mode 5 runs N pre-fetch threads, mode 6 runs N+1 pre-fetch threads.

By definition a source filter does no child->GetFrame() calls, the frame data comes from some external source, e.g. disk or network. Thus modes 3, 5, and 6 are identical, i.e. access to the GetFrame routine is restricted to a single thread at any one time.

Mode 4 is practically useless for source filters, you have N instances of the filter but only 1 can ever execute at a time.

Mode 2 also has N instances of the filter but all can execute in parallel. For input formats with no inter-frame state this can work fine, e.g. raw uncompressed data direct from the disk, I frame only formats with instance safe codecs and very low random access overhead. For formats with inter-frame dependencies each thread ends up duplicating the same work as the other threads, e.g. with h264 each instance must decode the I frame and all the referenced P and B frames for each subsequent B frame.

Mode 1 of course means the filter has to be re-entrant and thread safe. All the threads can enter the GetFrame routine together. The filter author may choose to provide their own mutex protection. If they do they should avail themselves of the thread resource for all threads that they cause to be suspended. This assumes that some elements of the source frame generation process are usefully parallelisable.

The other threading technology available in Avisynth is pipe-lining. This model has worker threads between filter instances pre-fetching frames for that part of the graph. This can lead to concurrent access like mode 1 if multiple worker threads are configured.

When running heavy filters like QTGMC usually the source filter threading is not the problem.
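
The simplest way to meet the mode 1 requirement with a filter's own mutex protection can be sketched as follows. This is a self-contained illustration and not Avisynth code: ThreadSafeSource and SequentialDecoder are hypothetical names, and the decoder is a stand-in whose only shared state is a read position. The sketch serialises the whole decode, so it gives correctness under concurrent entry but no parallel speed-up.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical decoder with shared mutable state (a read position),
// which makes it unsafe to call from several threads at once.
class SequentialDecoder {
public:
    SequentialDecoder() : position_(0) {}

    std::vector<uint8_t> Decode(int n) {
        position_ = n;  // seek
        // Fake 4-byte "frame" whose content depends on the position.
        return std::vector<uint8_t>(4, static_cast<uint8_t>(position_ & 0xFF));
    }

private:
    int position_;
};

// Source that any number of threads may enter concurrently.
// All access to the decoder's state goes through one mutex.
class ThreadSafeSource {
public:
    std::vector<uint8_t> GetFrame(int n) {
        assert(n >= 0);
        std::lock_guard<std::mutex> lock(mutex_);  // released on return
        return decoder_.Decode(n);
    }

private:
    std::mutex mutex_;
    SequentialDecoder decoder_;
};
```

Holding the lock for the entire decode behaves much like mode 3 from the caller's point of view. The benefit of doing it inside the filter is that the lock can later be narrowed to just the parts that touch shared state.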

jordanh · 6th May 2013, 17:37

Ah, again you make things clearer. Thank you very much, IanB. I will try to play around with mode 1 with my CarbonSource filter.

What I still don't get is whether there is any connection between MT and 2.6. Will they be merged at some point in time, or are they already?

The pipe-lining stuff with the pre-fetching of frames is also new and interesting to me. Could you please point me in a direction where I can get information about this?

Basically I feel that (as I don't use Avisynth's DirectShow interface but the framework only), I could also do the multithreading on my own, by just splitting the pictures and having multiple instances, but I don't know if this is the best way...

All the best,
Harry

kolak · 7th May 2013, 00:50

Just to add from practice: even mode 2 works well with I frame codecs. I use intermediate codecs and some of them are not natively threaded, but with mode 2 I get multithreaded decoding. This is actually a nice thing to have.

kolak · 7th May 2013, 23:25

There is also MP_Pipeline: http://forum.doom9.org/showthread.php?t=163281

IanB · 7th May 2013, 23:43

I guess a lot depends on the Carbon API for "call some frames from Carbon". Can you tell us a little about the environment?

Do you pull the frames from the API like inside AVISource or do you collect the frames from a callback like inside DirectShowSource?

Is the environment multithreading aware, i.e. can 8 threads each ask for 8 different frames in parallel? Is that style an efficient use of the API?

You say "fill like 10 frames in one step". Does this mean grabbing a set of frames all at once with all cpu cores available to the Carbon decoder is a good design?

You imply that Carbon might eventually be calling processed frames back from Avisynth. How do you envision doing this? Is this simply through the VfW AVIFile interface or do you want to use the Avisynth C or C++ API?

One future idea for improving AvisynthMT is to allow filters that use more than 1 input frame to pre-request the other frames. There is no code written for this yet, if you have specific requirements now is a good time to discuss them. I envisage a mythical SomeFilter that combines a 3 frame window of 2 input clips into an output frame would flow something like this. :-

Code:

PVideoFrame __stdcall SomeFilter::GetFrame(int n, IScriptEnvironment* env) {
  child->SetCacheHints(CACHE_PREFETCH_FRAME, n-1);
  child2->SetCacheHints(CACHE_PREFETCH_FRAME, n-1);

  child->SetCacheHints(CACHE_PREFETCH_FRAME, n+1);
  child2->SetCacheHints(CACHE_PREFETCH_FRAME, n+1);

  child->SetCacheHints(CACHE_PREFETCH_FRAME, n);
  child2->SetCacheHints(CACHE_PREFETCH_FRAME, n);

  child->SetCacheHints(CACHE_PREFETCH_GO, (int)env);
  child2->SetCacheHints(CACHE_PREFETCH_GO, (int)env);

  PVideoFrame frameOut = env->NewVideoFrame(vi);

  PVideoFrame frameprevA = child->GetFrame(n-1, env);
  PVideoFrame frameprevB = child2->GetFrame(n-1, env);
  /* munge left frames ... */

  PVideoFrame framenextA = child->GetFrame(n+1, env);
  PVideoFrame framenextB = child2->GetFrame(n+1, env);
  /* munge right frames ... */

  PVideoFrame framecurrA = child->GetFrame(n, env);
  PVideoFrame framecurrB = child2->GetFrame(n, env);
  /* munge centre frames */

  /* combine left, centre and right frames ... */

  return frameOut;
}

The idea being that SomeFilter declares to the cache that it will need these 6 frames. The cache requests any frames not already cached on worker threads. By the time the child->GetFrame calls happen the required frames are in cache or at least partially generated.

jordanh · 8th May 2013, 00:54

Yeah, you are right... basically it all depends on the Carbon API. Please give me one or two more weeks, then I will have documentation ready. It is already implemented and works OK, but needs some care before the first release.

Of course there is a huge number of Carbon-internal reasons why I buffer things the way I do. Carbon is extremely fast at decoding. It also has a very good and simple internal logic for when and what to transform. Just like in Avisynth, Carbon filters can implement multithreading or not. Most Carbon plugins seem to be single-threaded, but mostly it is the exporter or the up/down-conversion that is the big brake. You can get a full HD MPEG stream decoded at up to 7x realtime when you "null render" and do not filter it.

So far I can give a little overview. You may understand that I cannot give too much information about Carbon API stuff because that is covered by an NDA. All the Carbon-related information below is or was generally available, published by Rhozet/Harmonic.

AVISynth API: Pull-pull model: you need to pull a frame from the env, which calls a frame from your source plugin.

Carbon API: Push-push model: you get the decoded frame delivered into one public callback method. At any one time you can only either receive or deliver frames, and you need to do it in ascending order.

Hybrid Requirements: support fps change, size change, interlacing change (did I forget something?)

Hybrid Usage: In your CarbonCoder application, load the "source video filter" "Avisynth Filter". If you used it as a "target video filter" instead of "source", Carbon would already have done all the resizing, deinterlacing and fps conversion by the time the frames arrive at your avs script. In the Avisynth Filter configuration, point the path to your avs script. Then choose any input file and output settings. In your avs script, explicitly load CarbonPlugin.dll in the first line, and use CarbonSource() as the source filter for the video stream you want to have from Carbon. Mixed processing is no problem: you can let AVISynth do only the deinterlacing and Carbon the frame rate conversion.

Hybrid Implementation: For optimal performance, we must have direct memory access from the AVISynth source plugin CarbonSource() to the Carbon Avisynth Filter plugin. Both are win32 DLLs, and as we load "avisynth.dll" into our own process, the memory is basically free for us to read. In the Carbon part, include CarbonSource.h. After loading the DLL and invoking the environment, tell CarbonCoder what format will be delivered back from the clip returned by the avs script. In the CarbonSource() part, use env->SetVar to publish its own instance address. In the Carbon part, use env->GetVar to get the instance address of the source plugin. So we have access to all public functions of our source filter object. Unfortunately we must always memcpy on both sides. From there it is mostly a matter of how things work in Carbon.

I hope this is not too intrusive, but I already lose about 1/3 of the speed just by pulling frames through Avisynth with an "empty script". It would be a huge bottleneck to do IPC or file-based feeding of Avisynth.

I don't think that this special kind of API usage needs special support, but I really wish I could switch Avisynth to push mode... this should open Avisynth to all kinds of live sources, shouldn't it?

Good night!
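
The push-to-pull coupling described in this post can be sketched as a small bounded buffer. This is an illustration only and assumes nothing about the real Carbon or CarbonSource code: PushPullBridge, Push and Pull are hypothetical names, a frame is just a byte vector, and the frame copy in Pull stands in for the memcpy mentioned above.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

typedef std::vector<uint8_t> Frame;

// Buffers frames that are pushed in ascending order so that a
// pull-style consumer can request them by frame number.
class PushPullBridge {
public:
    explicit PushPullBridge(std::size_t capacity) : capacity_(capacity), first_(0) {
        assert(capacity > 0);
    }

    // Push side (callback from the host): frame numbers are implicit
    // and ascending. When the buffer is full the oldest frame is dropped.
    void Push(Frame frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.size() == capacity_) {
            frames_.pop_front();
            ++first_;
        }
        frames_.push_back(std::move(frame));
        ready_.notify_all();
    }

    // Pull side (GetFrame of the source filter): waits until frame n has
    // been pushed. Returns false if n is negative or was already dropped.
    bool Pull(int n, Frame& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [&] {
            return n < first_ + static_cast<int>(frames_.size());
        });
        if (n < first_) return false;
        out = frames_[static_cast<std::size_t>(n - first_)];  // copy out
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Frame> frames_;
    const std::size_t capacity_;
    int first_;  // frame number of frames_.front()
};
```

In the usage pattern from the post, the host would push a batch of about 10 frames and only then pull from the end of the filter chain, so Pull would normally find its frame already buffered and never block.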

jordanh · 8th May 2013, 23:50

@IanB: do you work on the MT project? Thanks for sharing the idea of a pre-caching "theoretical after-source" plugin. I don't know too much about the child stuff and this apparently mighty SetCacheHints function. As I have a push source, I don't need to implement any of them at this time...

From a quick look at all the links mentioned above for multithreading strategies, I understand that the method you explained will have the same limitations as all the others and cannot be compatible with all existing filters.

One question: am I right that you generally want to keep the requirement that a source plugin only needs to deliver frames in ascending order? The reason I am asking is that, from my understanding of your prototype code, the source plugin is not asked in monotonic order. Is that right? ...if so, it means either incompatibility with slow-seeking source media like CD, and/or the need for a standard for how to manage the required buffer in the source plugin.

Of course all of the above are just assumptions :-)

IanB · 9th May 2013, 01:59

Your question first :- Yes, frames can be requested in any order. Consider script usage of Trim(), SelectEvery(), SelectOdd(), SelectEven(), Reverse(), etc, all these affect the request order. Also consider the calling host application, e.g. an mpeg2 encoder might conceivably ask for frames in encode order, i.e. IPbbPbb... (0,3,1,2,6,4,5,...). And then there are the AvisynthMT influences.

There are filters that can mitigate this, like the 'hack' ChangeFPS(Last, Last) and the plugin RequestLinear().

DirectShowSource can suffer quite badly from non-linear access. The plugin DSS2() has some tweaks to mitigate this, I believe it maintains a 10? frame LRU to service requests and always pulls frames in order for up to 20? frames into its LRU. A seek beyond 20? frames or backwards results in a 2? frame preroll.

Children are the input clips from which a filter gets the frames that it will process to produce the output frame. The class GenericVideoFilter : public IClip is the simplest filter. It calls a single input clip (child) and just passes Audio and Video straight through, most filters use this as a template to provide API "glue", so for an Audio filter you only need to write the GetAudio code, for a Video filter you only need to write the GetFrame code.

Source filters are unique because they do not have any children (input clips). Basically they env->NewVideoFrame(vi); and fill in the image contents.

The SetCacheHints method is currently only used to influence the caches of a filter's input clips, basically you can turn it off, CACHE_NOTHING, or put it into windows mode, CACHE_RANGE. As part of the Avisynth 2.6 API changes its usage is being extended. For 2.6.0 the cache still works the same way as in 2.5. The 2.6.0 cache does actually ask the child input clip performance questions to assert the interface design but the information is presently not used. A future version may tailor the behaviour based on these or future defined queries.

These are the currently defined queries, they may change, they will almost certainly be extended. :-

Code:

CACHE_GETCHILD_CACHE_MODE=200, // Cache ask Child for desired video cache mode.
CACHE_GETCHILD_CACHE_SIZE=201, // Cache ask Child for desired video cache size.
CACHE_GETCHILD_AUDIO_MODE=202, // Cache ask Child for desired audio cache mode.
CACHE_GETCHILD_AUDIO_SIZE=203, // Cache ask Child for desired audio cache size.

CACHE_GETCHILD_COST=220, // Cache ask Child for estimated processing cost.
CACHE_COST_ZERO=221, // Child response of zero cost (ptr arithmetic only).
CACHE_COST_UNIT=222, // Child response of unit cost (less than or equal 1 full frame blit).
CACHE_COST_LOW=223, // Child response of light cost. (Fast)
CACHE_COST_MED=224, // Child response of medium cost. (Real time)
CACHE_COST_HI=225, // Child response of heavy cost. (Slow)

CACHE_GETCHILD_THREAD_MODE=240, // Cache ask Child for thread safetyness.
CACHE_THREAD_UNSAFE=241, // Only 1 thread allowed for all instances. 2.5 filters default!
CACHE_THREAD_CLASS=242, // Only 1 thread allowed for each instance. 2.6 filters default!
CACHE_THREAD_SAFE=243, // Allow all threads in any instance.
CACHE_THREAD_OWN=244, // Safe but limit to 1 thread, internally threaded.

CACHE_GETCHILD_ACCESS_COST=260, // Cache ask Child for preferred access pattern.
CACHE_ACCESS_RAND=261, // Filter is access order agnostic.
CACHE_ACCESS_SEQ0=262, // Filter prefers sequential access (low cost)
CACHE_ACCESS_SEQ1=263, // Filter needs sequential access (high cost)
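
How a source filter might answer these queries can be sketched as below. This is an assumption-laden illustration, not the Avisynth header: the enum values are copied from the list above, PushSourceSketch is a hypothetical class that does not derive from IClip, and it is assumed that the answer is passed back as the int return value of SetCacheHints with 0 meaning "no answer".

```cpp
#include <cassert>

// Subset of the query and response values quoted in the post.
enum CacheHintSketch {
    CACHE_GETCHILD_COST        = 220,
    CACHE_COST_LOW             = 223,
    CACHE_GETCHILD_THREAD_MODE = 240,
    CACHE_THREAD_SAFE          = 243,
    CACHE_GETCHILD_ACCESS_COST = 260,
    CACHE_ACCESS_SEQ0          = 262
};

// Hypothetical source filter that describes itself to the cache:
// cheap to call, safe for concurrent entry (mode 1 style), and
// preferring, but not requiring, sequential access.
class PushSourceSketch {
public:
    int SetCacheHints(int cachehints, int /*frame_range*/) {
        switch (cachehints) {
        case CACHE_GETCHILD_COST:        return CACHE_COST_LOW;
        case CACHE_GETCHILD_THREAD_MODE: return CACHE_THREAD_SAFE;
        case CACHE_GETCHILD_ACCESS_COST: return CACHE_ACCESS_SEQ0;
        default:                         return 0;  // query not understood
        }
    }
};
```

Since the post states that the 2.6.0 cache does not yet use the answers, such responses would be declarative only for now.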