PDA

View Full Version : Implementing cache hints


sh0dan
17th December 2002, 09:49
My thoughts on caching:

The basic idea of hinting is to avoid the "cache-all-better-safe-than-sorry" cache policy AviSynth currently uses. Therefore filters can hint to the cache filter above the range of frames it will request at most. All frames outside this range shouldn't be stored.

If no hints are given, the current policy should still apply.

The code is a bit tricky, because the top filter cannot see what the actual center frame is. Therefore, if a filter says it will at max request 2 frames ahead and behind the current one, the cache needs to store 5 frames.

consider the frames stored like this:

-2|-1|n|+1|+2

When a filter requests frame '+2' the cache should still store frame '-2', because the cache filter doesn't know what frame 'n' is.

When cache hinting is properly implemented, the required cache size should be considerably smaller, since almost no filters actually require caching. Most colorcorrecting filters and non-temporal filters have no use of the input cache, because if some filter requests an earlier frame, it will be stored in the cache longer up the filter chain.

About the best implementation, I hope some of you have feedback, since I'm quite lost. The current one is just something that I threw together for testing, and shouldn't be taken too seriously.

SansGrip
17th December 2002, 14:31
Originally posted by sh0dan
The code is a bit tricky, because the top filter cannot see what the actual center frame is. I obviously don't know the Avisynth source well enough because this confused me. What do you mean?

When cache hinting is properly implemented, the required cache size should be considerably smaller, since almost no filters actually require caching. Though, that said, does one really lose anything by caching all the frames? Apart from memory, I mean ;).

sh0dan
17th December 2002, 14:46
Originally posted by SansGrip
I obviously don't know the Avisynth source well enough because this confused me. What do you mean?

In your filter you request frames by simply sending the number you would like from the filter above yours. The frame number is all you get, so the cache cannot see which frame is at the frame requested by your filter.

If you filter requests frame -2 in the example above, the cache doesn't know that your center frame is 'n'. It's not a big deal, just something you should consider when implementing a new cache algorithm.


Though, that said, does one really lose anything by caching all the frames? Apart from memory, I mean ;).
The problem now is, that it doesn't cache the "right" frames. Try removing the cache hints from the temporal smoother, and you'll see how bad it performs, even though it shouldn't.

Caching all frames it just silly, since the cache could be used for much better output buffering - caching only needed frames will make much more sense.

Being able to disable the cache, by hints will also eliminate many redundant frame copies.

SansGrip
17th December 2002, 14:59
Originally posted by sh0dan
If you filter requests frame -2 in the example above, the cache doesn't know that your center frame is 'n'. Oh ok, I get you now. Would it be a bad idea to add:

PVideoFrame GetFrame(int n, int centre, IScriptEnvironment* env)

or even

void CentreFrameIs(int n)

for new builds that make use of cache hints?

The problem now is, that it doesn't cache the "right" frames. Try removing the cache hints from the temporal smoother, and you'll see how bad it performs, even though it shouldn't. Mmmmm... That reminds me, did I even remember to call SetCacheHints in my new build of Flux? :rolleyes:

Being able to disable the cache, by hints will also eliminate many redundant frame copies. Ah hah, now there's a good reason. TBH I don't think "wasting RAM" is usually a valid reason for changing something unless we're talking about a huge amount of it. Generally speaking anyone doing video work will have a lot of it anyway.

I'm not sure how caching works right now (I should probably go look before making any more suggestions ;)), but might it not be feasible to add a "cache manager" in-between each filter? In other words, instead of the chain looking like this:

Source -> Cache -> Crop -> Resize -> GreatFilter

It would look like this:

Source -> Cache manager for crop -> Crop -> Cache manager for resize -> Resize -> Cache manager for GreatFilter -> GreatFilter

and so on...

sh0dan
17th December 2002, 15:37
>void CentreFrameIs(int n)

It's not very important, for the cache to know the center frame. Yes, you save some frames, but it isn't that many anyway.

>Mmmmm... That reminds me, did I even remember to call SetCacheHints in my new build of Flux? :rolleyes:

Don't worry - if you don't specify anything, it still caches frames, just as it used to. :)

>Ah hah, now there's a good reason. TBH I don't think "wasting RAM" is usually a valid reason for changing something unless we're talking about a huge amount of it. Generally speaking anyone doing video work will have a lot of it anyway.

I think "caching only the frames needing to be cached" is the best argument - right now it's hard to justify the massive RAM usage.

>It would look like this:
>
>Source -> Cache manager for crop -> Crop -> Cache manager for resize -> Resize -> Cache manager for GreatFilter -> GreatFilter

This is exactly how it works now. You can also see where a lot of stores can be saved. Basicly, crop & resize NEVER needs a cache, since they never request frames out-of-order. What you do in your filter is that you give the cache _before_ your filter hints about what would be good for it to cache.
That means, that if your filter reads frames sequencially, you give the hint CACHE_NONE (or what it's called).

SansGrip
17th December 2002, 16:19
Originally posted by sh0dan
You can also see where a lot of stores can be saved. Basicly, crop & resize NEVER needs a cache, since they never request frames out-of-order. It would seem to me that providing each cache manager knows how many out-of-order frames the filter will request, as well as the current centre frame, it would be fairly trivial to implement.

I must be missing something...?

That means, that if your filter reads frames sequencially, you give the hint CACHE_NONE (or what it's called). Are there any filters that might benefit from an additional CACHE_RANDOM constant, meaning "there is no pattern to what frames I might request, so cache everything"? I guess it would be as if one didn't specify a cache hint at all, but would be better in terms of self-documentation.

sh0dan
17th December 2002, 16:26
Originally posted by SansGrip
It would seem to me that providing each cache manager knows how many out-of-order frames the filter will request, as well as the current centre frame, it would be fairly trivial to implement.

But also not necessary - I see no need to change the filter interface for this. Knowing the current frame doesn't really matter. You may call it lazyness - I call it efficiency :)
Remember - for this minor change, we have to change ALL clip->GetFrame() - believe me there are many ;)


Are there any filters that might benefit from an additional CACHE_RANDOM constant, meaning "there is no pattern to what frames I might request, so cache everything"? I guess it would be as if one didn't specify a cache hint at all, but would be better in terms of self-documentation.

This should be default operation (IMO) - it will be how the LAST cache will be set, so that output frames are always buffered, up to the memory limit. But adding CACHE_ALL sounds nice enough for me! :)

SansGrip
17th December 2002, 17:35
Originally posted by sh0dan
Remember - for this minor change, we have to change ALL clip->GetFrame() - believe me there are many ;) I believe you ;). What about my other suggestion, that of filters supporting SetCacheHints() to call CentreFrameIs() in the GetFrame() override? Wouldn't that give the cache manager enough information without requiring large modifications to existing source?

But adding CACHE_ALL sounds nice enough for me! :) Yes, CACHE_ALL is clearer than CACHE_RANDOM :).

sh0dan
17th December 2002, 19:10
> Wouldn't that give the cache manager enough information without requiring large modifications to existing source?

Could be, but then you'd have to send cache hints each frame instead of one hint on initialization. I cannot be done to a central cache manager, since the frame number will be different different places in the filter chain (when using trim, splice for instance).

(trust me, when I say it's not worth it :)

SansGrip
17th December 2002, 20:06
What about if you took a chain like:

Source -> cache manager for crop -> crop -> cache manager for resize -> resize -> cache manager for nifty_filter -> nifty_filter

Assuming that crop and resize set CACHE_NONE and nifty_filter sets a CACHE_RANGE of, say, 2, you could turn it into something like:

Source -> crop -> resize -> cache manager for nifty_filter -> nifty_filter -> cache proxy for nifty_filter

That way the cache manager for nifty_filter (which is now in fact two objects and two links in the chain) knows exactly which frame nifty_filter is processing -- because the proxy took the request and passed it through to nifty_filter -- without the filter itself having to do anything other than hint during initialization. The cache manager could then make sure it cached only relevant frames.

Bidoche
17th December 2002, 20:38
I don't understand why not simply cache the last 'hint' requested frames.

So with your filter needing two frames ahead and two behind, it whould say cache 5 last.
And so when the top filter goes forward you will always read one new frame, have all the others in cache and the discard the oldest.

Won't it work ?

sh0dan
17th December 2002, 22:12
@Bidoche: Exactly - this is how it's done :)

@Sansgrip: I think you just lost me there ;)

I think the algorithm is good enough - I was more into asking who was interested in implementing it - I'm having a hard time figuring out the existing stuff, so basicly I'm looking for someone with a good knowledge on actually coding a _solid_ system, that has the existing features:

- Framebuffer reuse (avoids memory fragmentation)
- Able to synchronize between filters, to control memory usage.
- Threadsafe, so it doesn't have to be rewritten soon.
- Keeps good track of allocated buffers.

So basicly someone who knows AviSynth very well.

Bidoche
17th December 2002, 23:55
Damn, I guess I was just recreating the wheel again :p

About how the cache is done, I just thought it may be smart to directly put it into a filter class and let legacy do the job.

something like this:


public CachedFilter : GenericVideoFilter [
struct cachedFrame {
int frameNb;
PVideoFrame frame;
};
int cacheSize, oldest;
cachedFrame cache[];

protected:
CachedFilter(int _cacheSize) : oldest(0), cacheSize(_cacheSize)
{
cache = new cachedFrame[cacheSize];
//init all frameNb to -1
}

//change in the function to implement to actually calculate frames
virtual PVideoFrame calculateFrame(int frameNb) = 0;

public:
PVideoFrame getFrame(int frameNb)
{
//look into the cache and ....
//update it if necessary
}

~CachedFrame(){ delete[] cache; }
}

Then filters who want cache subclass this one and other don't.

SansGrip
18th December 2002, 01:36
Originally posted by sh0dan
@Sansgrip: I think you just lost me there ;) Basically you'd be "intercepting" requests from a filter to its child, e.g.:

source -> crop -> cache manager for nifty_filter -> nifty_filter -> cache proxy for nifty_filter -> resize

Instead of resize requesting the frame from nifty_filter directly, it instead (transparently) requests it from nifty_filter's cache proxy. The cache proxy makes a note of 'n' as being the current frame, then passes the request through to nifty_filter. When nifty_filter then in turn requests from its child (the cache manager) it (the cache manager) can then consult the cache proxy for the value of nifty_filter's 'n'.

Anyway, that's all irrelevant because

I think the algorithm is good enough - I was more into asking who was interested in implementing it Oh ;).

So basicly someone who knows AviSynth very well. I was going to volunteer but then realized that a) I don't know Avisynth's internals that well and b) my todo list is already longer than a German opera :D.

SansGrip
18th December 2002, 01:41
Originally posted by Bidoche
About how the cache is done, I just thought it may be smart to directly put it into a filter class and let legacy do the job.

[...]

Then filters who want cache subclass this one and other don't. A good idea (and more object-oriented than the current way, I would argue) but it should really be the other way around. We want caching to work with all filters that don't specifically disable it, and also to allow filters to specify a cache range.

Unfortunately I can't think of a way to accomplish this using inheritence without recompiling every filter.

sh0dan
18th December 2002, 10:14
Originally posted by Bidoche
something like this:
[...]
Then filters who want cache subclass this one and other don't.

The implementation is actually like this cache.cpp (http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/avisynth2/avisynth/cache.cpp?rev=1.1.1.1&content-type=text/vnd.viewcvs-markup)

There is virtually no speed penalty in putting in a cache, if it is just bypassed, so hinting seems like the best way to me - that way we don't have to break the default mode of operation.

@Sansgrip: Coding what you suggest for multi-input filters would just make it unnecessary complex. Remember, many filters have two or more input sources.

Bidoche
18th December 2002, 12:29
@SansGrip

For caching to work if filters don't disable it, then you put the caching code in the IClip class,
and make the constructor without argument (ie used by all filters) enable caching normally.

This way it does not break the current way of operations.

then getFrame is no longer virtual and does the cache lookup and call the virtual calculateFrame fonction when necessary.
All filters will have to rename their getFrame def into calculateFrame ..

@Sh0dan

My point is not into speed consideration, even if sparing the use of an intermediary cache filter will probably save some virtual function calls.
Just that it is probably cleaner programming and simpler to make each filter handle its own cache rather than count in the top filter to provide it.

It needs to recompile filters, but it just minor change (rename getFrame) and with the upcoming recompile for the 25beta, it might be a good time to do it.

Bidoche
18th December 2002, 12:35
About Thread-safe consideration, does someone know if STL containers are thread-safe ?
If yes, it might help in acheiving it.

SansGrip
18th December 2002, 17:16
Originally posted by sh0dan
@Sansgrip: Coding what you suggest for multi-input filters would just make it unnecessary complex. Remember, many filters have two or more input sources. Eek, I forgot about that. You're right -- that would be very complex.

SansGrip
18th December 2002, 17:26
Originally posted by sh0dan
The implementation is actually like this cache.cpp (http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/avisynth2/avisynth/cache.cpp?rev=1.1.1.1&content-type=text/vnd.viewcvs-markup) So each filter has a cache placed "before" it, and each cache is storing every single frame requested, up to the SetMemoryMax() limit? How do the caches coordinate memory usage?

Also I might be misusing the CVS viewer somehow, but while I see hinting support in the header file, I don't see it used in cache.cpp...

Edit: Wait, I found the right version.

SansGrip
18th December 2002, 17:35
Ok, I think I see how it's meant to work. I'll let it percolate through my brain and get back to you this afternoon ;).

Richard Berg
19th December 2002, 01:03
Originally posted by Bidoche
About Thread-safe consideration, does someone know if STL containers are thread-safe ?
If yes, it might help in acheiving it.
Yes and no: Client must lock shared mutable containers (http://www.sgi.com/tech/stl/thread_safety.html). This consideration isn't in the C++ standard, but I know all the major library developers are following SGI's lead as they did with hash_map and other extensions: it's implemented analogously in GCC (http://gcc.gnu.org/onlinedocs/libstdc++/17_intro/howto.html#3), Dinkum (http://www.mvps.org/vcfaq/lang/8.htm) (what VC uses), and STLPort (http://www.stlport.org/doc/sgi_stl.html#thread_safety). I should mention that the latter has lots of other safety & debugging features, FWIW.

Bidoche
19th December 2002, 10:36
Yes and no: Client must lock shared mutable containers.
Then if we do this it will be ok ?

I didn't know there were that much version of STL. :p

@shodan

I read cache.cpp and I would like to know what is the point to use two different storage strategies, one for normal behavior and one for hints.
Why not use the same linked list and just limit it in size, you drop the node out of radius and it's the same.

pseudo-code:
PVideoFrame getFrame(int n) {
for( each elt of the list)
if (list.size() == max and list[i] out of radius)
drop list[i]
else if (list[i] is the one we want)
return it
create f(n), store in list (error if size at max)
return f(n)
}
and with a STL list or vector (with a reasonable max without hint)
it will make much simpler code

sh0dan
19th December 2002, 12:56
@Bidoche:
I decided to ditch the the old storage form, since I couldn't see what it actually did, so I'd rather implement something that I knew would function.

I really don't know anything about elt-lists and stuff like that. That's why I'm asking here. I cannot see exactly when and which framebuffers are reused - all I can see is that it doesn't work very well.

As I wrote - the current hint-code is only a "working solution", and not a suggestion on how it should be implemented.

Personally I cannot see much difference in inhereting a cache, except that we have to change all filters. To me, implementing caches in the filter chain is just as clean a solution, and since people are used to this, I'd rather let it stay that way.

Would it be possible for you to create a working solution, and mail me the sources?

Richard Berg
19th December 2002, 13:33
I feel like I could implement the proposed system without too much trouble. The tough part (for me) is doing so without tossing out cache.* and half of avisynth.cpp and starting from scratch -- I find them pretty dense to work with. I read over all this stuff again and feel like I understand all the pieces, but couldn't summarize to someone how it all fits together: there's too many places where I will never remember if a class contains a ref-counted object, or a ref-counted container for a ref-counted object, or a plain object, or a circular linked list of ref-counted objects...all implemented with custom container classes...

Nevertheless, I don't want to tear it apart more than necessary; while it's pretty amazing that it all works with as few bugs as it does, the framework is handling a lot of features off-the-shelf containers wouldn't like automatic recycling and tracking how many times a buffer is edited.

Bidoche
19th December 2002, 17:05
I don't see where there is incompatiblity between containers,
class recycling and sequence nombers...

@sh0dan

I will try doing it for the cache filter class.

Besides does someone know what for are those InterlockedIncrement calls for ?
I found out it's part of the Win32 Api, and relative to multithread but there is no such thing in avs yet....

Richard Berg
19th December 2002, 17:58
There's no incompatibility, just confusion in my small brain :)

InterlockedIncrement(x) is a macro that's basically a thread-safe way of saying x++ (nothing else can touch x while it's being incremented).

SansGrip
19th December 2002, 18:33
Originally posted by Bidoche
Why not use the same linked list and just limit it in size, you drop the node out of radius and it's the same. I don't think the code you posted would be optimal given that the cache doesn't know the value of the filter's "n".

For example, when a filter with a CACHE_RANGE of 2 requests a frame and its n is 5, it will request the following frames:

3 4 5 6 7

When the filter requests the first of these, 6 and 7 will be out of the radius so if they are already cached your code would drop them only to re-request them later when the filter starts to process "future" frames.

It might work if you defined your "radius" to be anywhere from n - radius to n + radius * 2, but of course that would result in some unnecessary caching and would only work if the filter is moving forward through the frames. The more general case of caching from n - radius * 2 to n + radius * 2 would result in even more unnecessary work.

It would be much simpler to implement if the cache could know the filter's current "n".

sh0dan
19th December 2002, 20:03
@Sansgrip: The extra caching doesn't hurt (or take up that much memory). I would definately go for the range*2+1 approach, since it would make code much simpler to implement.

Bidoche
19th December 2002, 20:43
@SansGrip

Just replace radius by the full max range and it's ok.
I wrote this mimiking the actual code and It seems to have this flaw
abs(h_frame_nums[i]-n)>h_radius
replace h_radius (the radius) by h_total_frames (the cache size) to correct it

By the way, here is my proposal :
#include <vector>
using namespace std; //vector is in this namespace grrr

class CachedVideoFrame
{
public:
int sequence_number;
int frame_number;
PVideoFrame frame;

CachedVideoFrame(const CachedVideoFrame& cvf)
: sequence_number(cvf.sequence_number), frame_number(cvf.sequence_number), frame(cvf.frame) { }
CachedVideoFrame(int _frame_number, PVideoFrame& _frame)
: sequence_number(_frame->GetFrameBuffer()->GetSequenceNumber()),
frame_number(_frame_number), frame(_frame) {}

bool isValid() const { return sequence_number == frame->GetFrameBuffer()->GetSequenceNumber(); }
};

class Cache : public GenericVideoFilter
{
enum { MAX_SIZE_DEFAULT = 10 }; // is 10 enough ?

int cache_size;
vector<CachedVideoFrame> cache;

public:
Cache(PClip _child) : GenericVideoFilter(_child), cache_size(MAX_SIZE_DEFAULT) {}
PVideoFrame __stdcall GetFrame(int n, IScriptEnvironment* env);
void SetCacheHints(int _cache_size){ if (cache_size >= 0) cache_size = _cache_size; }
static AVSValue __cdecl Create_Cache(AVSValue args, void*, IScriptEnvironment* env);

};

PVideoFrame __stdcall Cache::GetFrame(int n, IScriptEnvironment* env)
{
for (vector<CachedVideoFrame>::iterator it = cache.begin(); it != cache.end(); )
{
if ( it->frame_number == n ) //frame is cached
if ( it->isValid() ) //and unchanged (I don't understand how change could happen but...
return it->frame; //caching succeeds
else {
it = cache.erase(it); //we delete when invalid
continue; //restart loop with next
}
if ( abs(n - it->frame_number) > cache_size) //out of radius
it = cache.erase(it); //we delete
else it++; //increment has to be in the loop since erase moves to next too
}
//if we are here, the cache missed
PVideoFrame result = child->GetFrame(n, env); //create frame
//cache failure if (cache.size() > cache_size)
if (cache_size > 0) //if we cache
cache.push_back(CachedVideoFrame(n, result)); //cache it
return result;
}Sorry, but I was unable to test, even compile the unit, I got silly error messages refusing vector (when of course it works ok everywhere else). I will try again tomorrow.

Richard Berg
19th December 2002, 21:17
<vector> is already included by internal.h, maybe that was messing things up?

Bidoche
19th December 2002, 21:24
certainly not, since I was not including internal.h (maybe I should have !?... :/ tomorrow :p)

sh0dan
19th December 2002, 22:55
Seems nice - I cannot test it for a few days myself, but it seems like a good start.

Bidoche
19th December 2002, 23:14
I found the mistake, totally stupid (of course)
I forgot: using namespace std; at the start of the header

i can compile the unit now (but not the whole dll, C builder just won't, I guess I will have to get VC :p)

corrected the few minor errors left (typo)

SansGrip
19th December 2002, 23:36
Originally posted by Bidoche
Just replace radius by the full max range and it's ok. Yep, that'll fix it.

By the way, here is my proposal Looks like that code should work, provided vector is thread-safe.

I've not had a chance to look at the rest of the caching code yet, so I was wondering where CACHE_NONE would be handled? Presumably in that case no caching object would be inserted into the chain for that filter.

By the way, when using STL I like to do something along the lines of:

typedef vector<CachedVideoFrame> CacheVector;
CacheVector cache;
CacheVector::iterator it;

and so on. I think it makes it more readable :).

Bidoche
20th December 2002, 11:28
New version, this time putting the caching code in the IClip class.
And now I have a good argument in favor of this architecture :
It allows to share cache when more than one top filters
#include <vector>
#include <utility>
using namespace std;

class IClip : private RefCountedObject {

friend class PClip;
friend class AVSValue;

enum { DEFAULT_CACHE_SIZE = 10 };

struct CachedVideoFrame {
int sequence_number;
int frame_number;
PVideoFrame frame;
void* client; //identify for who it was cached

CachedVideoFrame(int _frame_number, PVideoFrame& _frame, void* _client = NULL)
: sequence_number(_frame->GetFrameBuffer()->GetSequenceNumber()),
frame_number(_frame_number), frame(_frame), client(_client) {}

bool isValid() { return sequence_number == frame->GetFrameBuffer()->GetSequenceNumber(); }
};

typedef vector<pair<void*, int> > CacheSizes;
typedef vector<CachedVideoFrame> CacheVector;

int stdCached; //nb of frames cached by standad behavior
CacheSizes sizes;
CacheVector cache;

public:
IClip() : stdCached(0) { }

virtual int __stdcall GetVersion() { return AVISYNTH_INTERFACE_VERSION; }

PVideoFrame __stdcall GetFrame(int n, IScriptEnvironment* env, IClip* client = NULL);
virtual bool __stdcall GetParity(int n) = 0; // return field parity if field_based, else parity of first field in frame
virtual void __stdcall GetAudio(void* buf, __int64 start, __int64 count, IScriptEnvironment* env) = 0; // start and count are in samples
void __stdcall SetCacheHint(int cache_size, Iclip* client); // We do not pass cache requests upwards, only to the next filter.
virtual const VideoInfo& __stdcall GetVideoInfo() = 0;

protected:
virtual PVideoFrame __stdcall MakeFrame(int n, IScriptEnvironment* env) = 0;
};

PVideoFrame __stdcall IClip::GetFrame(int n, IScriptEnvironment* env, IClip* client = NULL)
{
int range = DEFAULT_CACHE_SIZE;
int i = sizes.size();
while( i-- > 0 && sizes[i].first != (void*)client){} //downward while client not found
if ( i>=0 ) //if found (stopped by second condition)
range = sizes[i].second; //get its range
else client = NULL; //else unknown, replace by NULL

CacheVector::iterator deletable; // item who can be removed from cache
bool delNotFound = true;
for(CacheVector::iterator it = cache.begin(); it != cache.end(); it++)
{
if ( it->frame_number == n ) { //frame is cached
if (it->isValid()) //and unchanged
return it->frame; //caching suceeds
else if ( delNotFound && it->client == (void*)client ) {
deletable = it; //mark as deletable when owned by the same client
delNotFound = false;
}
} else //frame nb don't match, we search for one to remove
if ( delNotFound && it->client == (void*)client //only if clients match
&& abs(it->frame_number - n) >= range ) { //and out of range
deletable = it;
delNotFound = false;
}
}
//if we are here the cache missed
PVideoFrame result = MakeFrame(n, env); //create frame
if (range > 0) { //if cache enabled
CachedVideoFrame cvf(n, result, (void*)client)); //new CachedVideoFrame
if (delNotFound) { //haven't found one to remove
cache.push_back(cvf); //we add
if (client == NULL) //if std behavior
stdCached++; //update the count
} else
//here std bahavior and cache request differ
//std behavior always fill up the cache before deleting
//since more than one filters can count on it (and not in same range)
if (client == NULL //std behavior
&& stdCached < DEFAULT_CACHE_SIZE){ //with cache not full
cache.push_back(cvf); //we cache
stdCached++; //and update count
}
else *deletable = cvf; //else overwrite the deletable
}
return result;
}

void __stdcall IClip::SetCacheHint(int cache_size, Iclip* client)
{
sizes.push_back(make_pair((void*)client, cache_size));
}

Richard Berg
20th December 2002, 19:05
Still have something I want to finish before I leave tomorrow so I can't look at this too deeply, but the project appears in good hands...sh0dan can add you as a sourceforge developer if you get something that works...merry xmas all...

Bidoche
21st December 2002, 11:01
Well like everyone I will be rather busy these few days (at least I hope for them), I will see after that

sh0dan
22nd December 2002, 16:44
@Bioche: I have tried for 30-40 minutes to make sense in how it would work in real-life, but I cannot.

I still cannot see any reason to throw off the current design (requiring changes, recompiles, etc.).
The same cache can already have multiple filters requesting from it without any problems - but the "quick-and-dirty" cache hint didn't take that into account.

How will this be used in real life?
What changes will there have to be made to filters?
How does the cache ever get invoked? (When?)
Where does it request frames from the parent filter?

I just don't get ANY of it! :(

PS. The reason you can send more info (CACHE_RANGE, etc.) at the hinting is to be able to implement sample caching, multithread synhronization and other stuff later. If you take it out we will have to recompile again!

trbarry
22nd December 2002, 18:10
The same cache can already have multiple filters requesting from it without any problems - but the "quick-and-dirty" cache hint didn't take that into account.

Avisynth relies heavily on some very sophisticated and non-obvious caching mechanisms that I do not understand. It sounds like something that would be quite easy to break.

This scares me. :scared:

- Tom

Bidoche
24th December 2002, 16:40
It already needs a recompile now.
And for changes, it is to make the code cleaner, easier to understand or less likely of bugs. And probably a bit more efficient, vector should not be slower than actual linked list and are certainly more memory economic.

How will this be used in real life?Each filter automatically hold its own cache, accessed through the GetFrame method.
What changes will there have to be made to filters?Rename their definition of GetFrame to MakeFrame. (sole modification)
How does the cache ever get invoked? (When?)through GetFrame calls
Where does it request frames from the parent filter? When the cache does not have the needed frame, it invokes the method MakeFrame to get it created.

The reason you can send more info (CACHE_RANGE, etc.) at the hinting is to be able to implement sample caching, multithread synhronization and other stuff later.Multithread synch can be added at the start of my GetFrame method.
For the rest, I guess these could be needed, I removed since they were unused.
The same cache can already have multiple filters requesting from it without any problems - but the "quick-and-dirty" cache hint didn't take that into accountyes but it took only one cahce hint into account, that's why I introduced the cache chient parameter


@trbarry

I aimed here to integrate into IClip the caching mechanism provided by the Cache class (btw all IClip are supposedly wrapped into one of these, so...)

sh0dan
25th December 2002, 01:32
@Bioche: Thanks for the reply. I have been thinking about your suggestion since I posted the question, and see some of the dilemmas it solves (and creates).

I'll get back to you when I have a clearer head. This is not something that's easily implemented - and it HAS to be done very carefully.

In general: I see many of your point - however - it was never my intension to change the plugin interface. I wouldn't like plugin-writers to feel they are programming for a "moving target", so if possible I'd (in a perfect world) not to have them worry, or at least just have them recompile with a new avisynth.h

I'll get back to you tomorrow.

scmccarthy
25th December 2002, 05:09
I vote that a recompile is justified if the cache improvements are worth it. Of course I know that's exactly how Sh0dan thinks anyhow (that the benefits will be weighed against the inconveniences).

However, it would be nice if Getframe() could be used the way it always has at least in most cases. Having a simple way to disable caching for filters that do not need it is also important. Both of which possibly point to the need for careful consideration of what the default behavior is or should be.

The proposal certainly impresses me.

Stephen

SansGrip
25th December 2002, 08:15
Originally posted by sh0dan
In general: I see many of your point - however - it was never my intension to change the plugin interface. I wouldn't like plugin-writers to feel they are programming for a "moving target" I've been following this with interest and while I see Bidoche's point I still don't think it's necessary to change the API.

If one is implementing a caching object as just another link in the chain (which is in my opinion the simplest and most compatible way of doing it) then it shouldn't be necessary to add any new methods at all. The cache should be able to intelligently adapt to whatever the filter asks of it, whether that be to cache everything, nothing, or just one frame either side of the centre.

It should be perfectly possible to come up with an algorithm that involves no changes to the API. I personally would argue that no change at all is better than a change that involves filters having to be recompiled.

sh0dan
25th December 2002, 15:41
ok - I'll list up some points.

Bidoche - I really appreciate your suggestion - it has shown what has to be done for a proper implementation. Otherwise my eyes wouldn't have been opened to the dilemmas here.

PClip extension:

We are able to maintain several hinted ranges (multiple filters requesting different temporal places within one output-frame). This is because we now get the requesting filter.
We _are_ still talking about alpha-state software, and people using it have been made aware of that.


Cache rewrite:

Cache operations are completely separate from the filter process.
We don't have to change all filters/plugins.
We will be able to change the cache implementation later. (Is this possible if we inherit the cache?)


It feels like I've just opened Pandora's Box, by beginning to look at the cache ;)

Can anyone see how we are able to send unique identifiers for each filter, so that we able able to see which filter requests which frame, but with the current cache implementation.

Can we find (or give) each created filter a unique number that it could pass along with the cache hint. D*mn that would be a bad hack when I think of it. (forget it)

Could we make this a "recompile-only" solution, changing PClip::GetFrame to include the client (as is does in Bidoche's code), but the instead of having the code in PClip, we still have it in cache.cpp?? Wouldn't that be possible - chances for a recompile is pretty high, but it a 2 minute thing for the coder. PClip::SetCacheHint should also include the client.

Bidoche
25th December 2002, 15:41
@sh0dan
I wouldn't like plugin-writers to feel they are programming for a "moving target", so if possible I'd (in a perfect world) not to have them worry, or at least just have them recompile with a new avisynth.hI understand this concern, but changes should still be possible if reasonably justified.
Actually IClip are not fitted to become thread-safe, for that it's better to have each virtual method hidden behind a 'solid' one where we would be able to add multithread code.

sh0dan
25th December 2002, 16:13
As for multithreading, my considerations are like this:

We want each thread to be able to work at the biggest chucks possible, to avoid too much thread synchronization. Therefore splitting the filter chain up doesn't seem like the best solution.

My thoughts on multithreading is more along the lines of parallel frame processing. Simply put, when AviSynth is requested to deliver frame n, CPU 0 begins to generate this, while CPU 1 begins to generate frame n+1 and so on. In a linear world this wouldn't pose any problems.

Since AviSynth is non-linear, it would have to be synchronized. This is where the up-filter calling of AviSynth will be useful. When the frame request is sent up the filter chain, the cache will mark these frames as "being generated", meaning that other threads shouldn't try to create them. When the frame the given place in the graph is finished it will be released so the other thread requesting the frame can continue (and be given the cached frame).

The cache would then act as the synchronizer between threads, if any conflicts should occur. In principle this should scale quite well, if the processing time of each frame is fairly similar. Some internal work still have to be done for this to work.


Multithreading isn't a planned feature for 2.5 - however if the API needs a change for it, now would be the time.

dividee
25th December 2002, 19:02
Maybe a stupid idea, but wouldn't it be possible to add a Cache filter before each filter instead of after it ? Instead of creating the Cache filter just after creating a filter, you scan the arguments of the to-be-created filter and replace the clip arguments with a newly created Cache instance (maybe in ScriptEnvironment::Invoke). This way you'd never have multiple clients and so avoid the problem of distinguishing them, and the cache code becomes simpler.

For efficiency, the framebuffers should still be shared by the instances that cache the same clip... I don't know if it can be done. Maybe it isn't such a great idea.

But the same technique could be used in conjunction with the current Cache as a way to identify clients without having to modify them, as a kind of cache proxy, if you follow me. The overhead involved is one additional virtual function call per GetFrame, if I'm not mistaken. Even SetCacheHints wouldn't have to pass a client identifier.

Bidoche
25th December 2002, 20:12
@divide

I didn't check into the parser code if it creates a cache for each created filter or for each argument of one top filter, maybe it is done that way (will have to check that)..

But having one cache for each filter (or cache built-in in the filter class) allows to get cache hit between differents top filters, so it seems a good idea. Of course it needs to take that into account (mine does)


@sh0dan

why a being generated flag ? better blocking access to the data until
generation is complete, no ?

sh0dan
25th December 2002, 21:16
@dividee / bidoche:

Considering the cache as a filter specific INPUT filter (rather than output filter as it is now) is interesting. Not as a 100% solution, but to benefit from the fact that one cache = one filter requesting from it.

Some definitions: (I need this be think straight) ;)

-Output cache (caches output of a filter). Multiple filters can request from this filter.
-Input cache (caches input). Only one filter can request from this cache.
-Encapsulation (really an output cache) - this way the filter is encapsulated by the cache. This is the best way I can describe Bidoche's suggestion.

What if:

We added another filter - let's call it cachemanager. This filter will be added before all filters. One filter in the filter chain before all filters.
This filter will not cache any images. It will however recieve cache hints from the filter it is cachemanager for. It (of course) also intercepts all GetFrame requests.

We will still add the output filter, as we used to. So basicly the cachemanager is sending commands to the output filter of the previous filter.

The cachemanager generates a random number on startup, so it can identify itself uniquely to the output filter. The cachemanager then sends commands to the cache about which files it needs to have stored on each frame. Or it could simply send which frames it would like to have removed from the cache.

This makes it all fairly tricky, but it keeps cache independent of the filters, and it can all be done with the existing interface.


Does it make any sense???

dividee
25th December 2002, 23:51
@Bidoche:
The Cache is currently created after the filter is evaluated (see ExpFunctionCall::Evaluate in expression.cpp).

@Shodan:We added another filter - let's call it cachemanager. This filter will be added before all filters. One filter in the filter chain before all filters.
This filter will not cache any images. It will however recieve cache hints from the filter it is cachemanager for. It (of course) also intercepts all GetFrame requests.

We will still add the output filter, as we used to. So basicly the cachemanager is sending commands to the output filter of the previous filter.That's what I was calling "cache proxy" in my previous post, but "cachemanager" is a better term, and you explained it a lot better. :)The cachemanager generates a random number on startup, so it can identify itself uniquely to the output filter.Why generate a random number ?? Just passing the 'this' pointer should work as well.The cachemanager then sends commands to the cache about which files it needs to have stored on each frame. Or it could simply send which frames it would like to have removed from the cache.Not sure I understand here. Do you suggest having only one cachemanager per filter, even if the filter has multiple clip arguments? I'd say you need one cachemanager per upstream filter; I can't see how you would implement it otherwise.

sh0dan
26th December 2002, 00:36
@dividee:
The random number thingie is to be able to use the current cache-hinting feature to pass arguments. Can we extend the PClip class (with a new method) without forcing recompiles of the plugins?

I see your point on one cachemanager per upstream clip. Attaching that on script generation shouldn't be that difficult.

My point by making the cache manager communicate to the cache is to allow one output cache (from one filter) to have several cachemanagers talking to it. The number (or the this argument) is only to let the output cache of the previous filter know which cache-manager it is talking to.

To clarify: We maintain an output cache on all filters - just as it is today. Inbetween the output cache and the input of the filter we stick a cache manager filter. This filter identifies itself to the cache, and sends hints to the cache about which input filter it is. The cache is then able to determine which unique input filter is requesting frames now. Thereby it is able to cache only the frames needed by this input filter. There can be several cache managers talking to the cache, but since they are all uniquely identified it is now able to distinguish.

If we can extend IClip to include a PVideoFrame GetCachedFrame(int frame, IClip* cache, IScriptEnvironment* env) then that could be the way the cache-manager requested frames from the output cache of the previous filter.

That could work - couldn't it? We get all the advantages, no random numbers and no recompiles.

dividee
26th December 2002, 02:18
I attach a figure of what I think you suggest. Looks good to me, provided you can add the method to IClip without a recompile of the plugins. I think it's possible, if you add the new method at the end (textually; in declaration order) of IClip in avisynth.h.

I've been experimenting a bit with loading V1/V2.0 plugins in V2.5 an saw that when an old plugin was calling IClip::GetVideoInfo, in fact SetCacheHints was called because it 'has taken it's place' in the vtable (compare the old definition of IClip and the new one to see what I mean). So, it seems virtual methods are called by their index in the vtable and the order of the methods in the vtable follow the order of declaration...

PS: for this experiment of loading old plugins, I tried to create two "Adapter" filters (one upstream and one downstream) which would translate between the old and the new interfaces. It worked for some filters (those who used 'MakeWritable') but not others (those who used NewVideoFrame), because I didn't "wrap" the ScriptEnvironment and the plugin passed the wrong VideoInfo struct. I gave up because the code was already a real mess (with void* casts and the like) and I couldn't see how to deallocate things properly. If anyone is interested I can open a new thread about it.

scmccarthy
26th December 2002, 05:35
@dividee

Dividing all filters between those that use MakeWritable and those that use NewVideoFrame is helpful to me because I never used MakeWritable before. I was looking at PeachSmoother script and didn't understand when the frame was input or output plus how to pass/read? video info and the ScriptEnvironment.

The question that occurs to me now is is there a difference between the way exceptions are thrown between the two types of filters?

Stephen

sh0dan
26th December 2002, 13:12
@dividee: Your illustration hits the nail right on the head! That's what I was thinking.

I'll try do to an experimental implementation, if Bidoche isn't interested (it's really only minor changes to his code, and wrapping it into cache.cpp)

Thanks for investigating the issue regarding plugins. I hope that most people use NewVideoFrame, except if they only do partial image processing - in my head it would theoretically be faster (avoiding one image blit per frame).


@scmccarthy: When you use env->MakeWritable() you take an input image, and makes it possible for your filter to write on it. That way it becomes both source and destination. This is a good thing, if you only write on a part of the image (so you don't have to copy the entire image).
Otherwise you might as well create a new image to write your new image into.

neuron2
26th December 2002, 14:24
Originally posted by sh0dan
I hope that most people use NewVideoFrameI use MakeWritable() a lot. Is that going to be a problem?

Bidoche
26th December 2002, 18:59
@sh0dan

You can use my code, no problem.

I don't understand why doing everything to avoid recompile here, since 2.5beta is supposed to require one...
I think we can afford limited structure change in the api rather than adding complexity.

About MakeWritable and such, I think there is a lack in the api :
PVideoFrame works as a VideoFrame * const
if we add a PCVideoFrame working as a Const VideoFrame * const, it would solve some issues.
Then filters will return PCVideoFrame (not modifiable)
if you want to modify, you cast it to PVideFrame (same as MakeWritable) or make a new PVideoFrame.
This will allow to remove the sequence number in the vfb class and make sure than VideoFrames in cache are not modified.

I will come with relative code when ready.

dividee
26th December 2002, 21:10
@sh0dan

As I said, I mostly gave up this plugin adaptation, but since you insist, I'll have another look ;) It somehow worked for some filters, but crashed at the end and was more a proof of concept than anything. I doubt I can make clean code for it, and my goal was just to make a temporary solution to help the adoption of the new version; when enough filters are available for 2.5, it would have been removed.

@neuron2

Using MakeWritable is good, IMHO. Really, I don't understand why Sh0dan hopes that most people use NewVideoFrame.

@Bidoche

Won't this (CPVideoFrame) affect performance ? After all, when a framebuffer is modified in the cache, that's for performance reason, isn't it ? And won't casting it to a PVideoFrame negate the benefit anyway ?

@all

This leads me to a question. I don't understand perfectly how all this works. When is a frame in the cache modified ? When the filter use MakeWritable ? But then, MakeWritable makes a copy of the frame if the refcount is >1. And how can the refcount be only 1 if it's owned both by the cache and the filter ?
Also sh0dan stated above that MakeWritable is mostly useful when you only write part of the image. But in my experience it's nearly always benefical (if you can use it). For instance, Tweak is noticeably faster with MakeWritable, and it filters the whole image. That leads me to think that often enough MakeWritable doesn't have to do a memory copy. So what's this thing with the refcounter ? Doesn't the cache "owns" the frame too ?

Bidoche
26th December 2002, 21:28
@dividee

The actual caching code invalidates VideoFrame modified while cached.
So making them impossible to modify will simplify.

To my undestanding, these modifications are processing fault anyway. A filter is supposed to always give same output, that's why caching is possible, if a cached frame is modified, it will contradict that and may give strange (and unpredictable) results.

MakeWritable can be faster when a VideoFrame is not shared.
Because then it does not copy , but return the source itself.
Of course, for that to happen, it should not be cached.

And no, I don't think it can affect performance.
The cast will 'steal' the pointer when possible (ie 1 ref) or copy otherwise.

Bidoche
2nd January 2003, 18:06
In attachement, a modified avisynth.h (not complete, nor final) with some (lots) structure change.

It breaks lots of things, I don't know if it breaks something really vital. (like the exporting env stuff...)

Bidoche
10th January 2003, 17:19
@sh0dan

About this optional parameter, it will force everyone to update their GetFrame definition to include it, but not change the internal code.

Can you precise which linker error occurred ?
Are you sure it's not because you did not commented/uncommented the good sections ?

a default to cache last 5 (or 10) should be okay I think

sh0dan
10th January 2003, 17:25
The linker error:

cache.obj : error LNK2001: unresolved external symbol "public: void __thiscall FrameCache::CachedVideoFrame::set(int,class PVideoFrame const &)" (?set@CachedVideoFrame@FrameCache@@QAEXHABVPVideoFrame@@@Z)
Release/avisynth.dll : fatal error LNK1120: 1 unresolved externals
Error executing link.exe.

Seems like it is missing a set function.

About the GetFrame stuff, I assume that only those interested in changing the default caching mechanism will need to change it.

Bidoche
10th January 2003, 17:50
set is missing...
Arghhhh, stupid me !!!
I added this one to increase efficiency and I forgot to implement it...

neuron2
10th January 2003, 18:55
Umm, isn't it risky to put new caching in right before the beta release?

sh0dan
10th January 2003, 19:26
It is - we should test it as good as possible. It'll probably be the main reason for beta 1.
Otherwise I actually feel we might just as well go ahead and make a 2.5.0, without the beta label.

It shouldn't mean total reaarangement of anything. There could be some memory issues, but that's the worst I think could happend - functional bugs in the cache are rather easy to discover.