Old 8th June 2005, 20:06   #1  |  Link
tsp
Registered User
 
change to avisynth to make it more threadsafe

After I made the mt plugin I discovered that the way Avisynth handles VideoFrame is not thread safe. The problem lies in how the list of VideoFrames is handled in VideoFrame::operator new(unsigned): the first VideoFrame with a refcount of 0 is returned to the VideoFrame constructor, but the refcount is not increased until the frame reaches a PVideoFrame object, so in the meantime another thread could have grabbed it. I solved this in mt by wrapping the functions that create a new VideoFrame (by calling new VideoFrame) so that only one thread at a time can enter them. The following functions call new VideoFrame: Cache::GetFrame, ScriptEnvironment::NewVideoFrame(int row_size, int height, int align), ScriptEnvironment::NewPlanarVideoFrame(int width, int height, int align, bool U_first) (both are used by ScriptEnvironment::NewVideoFrame(const VideoInfo& vi, int align) and ScriptEnvironment::MakeWritable(PVideoFrame* pvf)), and VideoFrame::Subframe.

I suggest the following fixes to avoid it.
Code:
void* VideoFrame::operator new(unsigned) {
  // CriticalSection
  //optional EnterCriticalSection(&csVideoFrame);
  for (LinkedVideoFrame* i = g_VideoFrame_recycle_bin; i; i = i->next)
    if (i->vf.refcount == 0)
    {
      // Test the value returned by InterlockedIncrement: re-reading refcount
      // afterwards could observe a change made by another thread.
      if (InterlockedIncrement((long*)&i->vf.refcount) == 1)
        return &i->vf;
      else
        InterlockedDecrement((long*)&i->vf.refcount);
    }
  //LeaveCriticalSection(&csVideoFrame);
  LinkedVideoFrame* result = (LinkedVideoFrame*)::operator new(sizeof(LinkedVideoFrame));
  result->next = g_VideoFrame_recycle_bin;
  g_VideoFrame_recycle_bin = result;
  return &result->vf;
}
For even greater thread safety a CriticalSection could be used (csVideoFrameBuffer should of course be initialized somewhere else).
Code:
VideoFrame::VideoFrame(VideoFrameBuffer* _vfb, int _offset, int _pitch, int _row_size, int _height)
  : refcount(1), vfb(_vfb), offset(_offset), pitch(_pitch), row_size(_row_size), height(_height),offsetU(_offset),offsetV(_offset),pitchUV(0)  // PitchUV=0 so this doesn't take up additional space
{
  InterlockedIncrement(&vfb->refcount);
}

VideoFrame::VideoFrame(VideoFrameBuffer* _vfb, int _offset, int _pitch, int _row_size, int _height, int _offsetU, int _offsetV, int _pitchUV)
  : refcount(1), vfb(_vfb), offset(_offset), pitch(_pitch), row_size(_row_size), height(_height),offsetU(_offsetU),offsetV(_offsetV),pitchUV(_pitchUV)
{
  InterlockedIncrement(&vfb->refcount);
}
The refcount is simply initialized to 1 so that the VideoFrame cannot be recruited by another thread.
Code:
VideoFrame* VideoFrame::Subframe(int rel_offset, int new_pitch, int new_row_size, int new_height) const {
  VideoFrame* Retval=new VideoFrame(vfb, offset+rel_offset, new_pitch, new_row_size, new_height); 
  InterlockedDecrement((long*)&Retval->refcount);
  return Retval;
}

VideoFrame* VideoFrame::Subframe(int rel_offset, int new_pitch, int new_row_size, int new_height, int rel_offsetU, int rel_offsetV, int new_pitchUV) const {
  VideoFrame* Retval=new VideoFrame(vfb, offset+rel_offset, new_pitch, new_row_size, new_height, rel_offsetU+offsetU, rel_offsetV+offsetV, new_pitchUV);
  InterlockedDecrement((long*)&Retval->refcount);
  return Retval;
}
It is necessary to decrease the refcount because it is not possible to change avisynth.h without breaking existing filters. This introduces a small window in which another thread could grab the VideoFrame. (The ideal solution would be to have VideoFrame::Release decrement the refcount to 0 when it hits 1.)
Code:
PVideoFrame __stdcall ScriptEnvironment::NewVideoFrame(const VideoInfo& vi, int align) {
  // Check requested pixel_type:
  switch (vi.pixel_type) {
    case VideoInfo::CS_BGR24:
    case VideoInfo::CS_BGR32:
    case VideoInfo::CS_YUY2:
    case VideoInfo::CS_YV12:
    case VideoInfo::CS_I420:
      break;
    default:
      ThrowError("Filter Error: Filter attempted to create VideoFrame with invalid pixel_type.");
  }
  // If align is negative, it will be forced, if not it may be made bigger
  if (vi.IsPlanar()) { // Planar requires different math ;)
    if (align>=0) {
      align = max(align,FRAME_ALIGN);
    }
    if ((vi.height&1)||(vi.width&1))
      ThrowError("Filter Error: Attempted to request an YV12 frame that wasn't mod2 in width and height!");
    PVideoFrame retval=ScriptEnvironment::NewPlanarVideoFrame(vi.width, vi.height, align, !vi.IsVPlaneFirst());  // If planar, maybe swap U&V
    InterlockedDecrement((long*)&retval->refcount);
    return retval;
  } else {
    if ((vi.width&1)&&(vi.IsYUY2()))
      ThrowError("Filter Error: Attempted to request an YUY2 frame that wasn't mod2 in width.");
    if (align<0) {
      align *= -1;
    } else {
      align = max(align,FRAME_ALIGN);
    }
    PVideoFrame retval=ScriptEnvironment::NewVideoFrame(vi.RowSize(), vi.height, align);
	InterlockedDecrement((long*)&retval->refcount);
    return retval;
  }
}
Same story again.
Code:
bool ScriptEnvironment::MakeWritable(PVideoFrame* pvf) {
  const PVideoFrame& vf = *pvf;
  // If the frame is already writable, do nothing.
  if (vf->IsWritable()) {
    return false;
  }

  // Otherwise, allocate a new frame (using NewVideoFrame) and
  // copy the data into it.  Then modify the passed PVideoFrame
  // to point to the new buffer.
    const int row_size = vf->GetRowSize();
    const int height = vf->GetHeight();
    PVideoFrame dst;
    if (vf->GetPitch(PLANAR_U)) {  // we have no videoinfo, so we can only assume that it is Planar
      dst = NewPlanarVideoFrame(row_size, height, FRAME_ALIGN,false);  // Always V first on internal images
	  InterlockedDecrement((long*)&dst->refcount);
     } else {
      dst=NewVideoFrame(row_size, height, FRAME_ALIGN);
      InterlockedDecrement((long*)&dst->refcount);
     //LeaveCriticalSection(&csVideoFrame); 
    }
    BitBlt(dst->GetWritePtr(), dst->GetPitch(), vf->GetReadPtr(), vf->GetPitch(), row_size, height);
    // Blit More planes (pitch, rowsize and height should be 0, if none is present)
    BitBlt(dst->GetWritePtr(PLANAR_V), dst->GetPitch(PLANAR_V), vf->GetReadPtr(PLANAR_V), vf->GetPitch(PLANAR_V), vf->GetRowSize(PLANAR_V), vf->GetHeight(PLANAR_V));
    BitBlt(dst->GetWritePtr(PLANAR_U), dst->GetPitch(PLANAR_U), vf->GetReadPtr(PLANAR_U), vf->GetPitch(PLANAR_U), vf->GetRowSize(PLANAR_U), vf->GetHeight(PLANAR_U));

    *pvf = dst;
    return true;
}

Code:
PVideoFrame __stdcall Cache::GetFrame(int n, IScriptEnvironment* env) 
{ 
  n = min(vi.num_frames-1, max(0,n));  // Inserted to avoid requests beyond framerange.

  __asm {emms} // Protection from rogue filter authors

  if (h_policy == CACHE_NOTHING) { // don't want a cache. Typically filters that only ever seek forward.
    __asm mov ebx,ebx  // Hack! prevent compiler from trusting ebx contents across call
    PVideoFrame result = childGetFrame(n, env);
//  if (result->vfb) env->ManageCache(MC_ReturnVideoFrameBuffer, result->vfb); // return vfb to vfb pool for immediate reuse
    return result;
  }

  if (h_policy == CACHE_RANGE) {  // for filters that bash a span of frames. Typically temporal filters.
    PVideoFrame result;
    bool foundframe = false;

    for (int i = 0; i<h_total_frames; i++) {
      if (h_video_frames[i]->status & CACHE_ST_USED) {
		// Check if we already have the frame
	    if (h_video_frames[i]->frame_number == n) {
		  result = new VideoFrame(h_video_frames[i]->vfb, h_video_frames[i]->offset, h_video_frames[i]->pitch,
								  h_video_frames[i]->row_size, h_video_frames[i]->height,
								  h_video_frames[i]->offsetU, h_video_frames[i]->offsetV,
								  h_video_frames[i]->pitchUV);
		  foundframe = true;
		}
		// Check if it is out of scope
		else if (abs(h_video_frames[i]->frame_number-n)>=h_total_frames) {
		  h_video_frames[i]->status |= CACHE_ST_DELETEME;
		  if (!(h_video_frames[i]->status & CACHE_ST_HAS_BEEN_RELEASED)) {  // Has this framebuffer been released?
			UnlockVFB(h_video_frames[i]);  // We can now release this vfb.
			h_video_frames[i]->status |= CACHE_ST_HAS_BEEN_RELEASED;
		  }
		}
      }
    } // for (int i

    if (foundframe) {   // Frame was found - build a copy and return a (dumb) pointer to it.

      VideoFrame* copy = new VideoFrame(result->vfb, result->offset, result->pitch, result->row_size,
                                        result->height, result->offsetU, result->offsetV, result->pitchUV);
      _RPT2(0, "Cache2:%x: using cached copy of frame %d\n", this, n);

      InterlockedDecrement((long*)&result->refcount);
	  InterlockedDecrement((long*)&copy->refcount);
      return copy;
    }
    else {
	  __asm mov ebx,ebx  // Hack! prevent compiler from trusting ebx contents across call
      result = childGetFrame(n, env); // Should be checking the rest of cache

      // Find a place to store it.
      for (i=0 ;; i++) {
        if (i == h_total_frames)
#ifdef _DEBUG
          env->ThrowError("Cache2:%x: Internal cache error! Report this!", this);
#else
          return result; // Should never happen
#endif
        if (h_video_frames[i]->status & CACHE_ST_DELETEME)   // Frame can be deleted
          break;

        if (!(h_video_frames[i]->status & CACHE_ST_USED))    // Frame has not yet been used.
          break;
      }  
      _RPT2(0, "Cache2:%x: Miss! Now locking frame frame %d in memory\n", this, n);
    }

    if (h_video_frames[i]->status & CACHE_ST_USED) ReturnVideoFrameBuffer(h_video_frames[i], env); // return old vfb to vfb pool for early reuse

    // Store it
    h_video_frames[i]->vfb      = result->vfb;
    h_video_frames[i]->sequence_number = result->vfb->GetSequenceNumber();
    h_video_frames[i]->offset   = result->offset;
    h_video_frames[i]->offsetU  = result->offsetU;
    h_video_frames[i]->offsetV  = result->offsetV;
    h_video_frames[i]->pitch    = result->pitch;
    h_video_frames[i]->pitchUV  = result->pitchUV;
    h_video_frames[i]->row_size = result->row_size;
    h_video_frames[i]->height   = result->height;
    h_video_frames[i]->frame_number = n;
    h_video_frames[i]->faults = 0;
  // Keep this vfb to ourselves!
  // 1. This prevents theft by Plan B:
  // 2. This make the vfb readonly i.e. MakeWriteable() returns a copy
    LockVFB(h_video_frames[i]);
    h_video_frames[i]->status   = CACHE_ST_USED;

    return result;
  }
Code:
VideoFrame* Cache::BuildVideoFrame(CachedVideoFrame *i, int n)
{
  Relink(&video_frames, i, video_frames.next);   // move the matching cache entry to the front of the list
  VideoFrame* result = new VideoFrame(i->vfb, i->offset, i->pitch, i->row_size, i->height, i->offsetU, i->offsetV, i->pitchUV);

  // If we have asked for a same stale frame twice, leave frames locked.
  if ((fault_rate <= 160) || (fault_rate == 190)) { // Reissued frames are not subject to locking as readily
	UnlockVFB(i);
	_RPT2(0, "Cache:%x: using cached copy of frame %d\n", this, n);
  }
  else {
	_RPT3(0, "Cache:%x: lock vfb %x, cached frame %d\n", this, i->vfb, n);
  }
InterlockedDecrement((long*)&result->refcount);
  return result;
}
This is the cache from the current CVS. I'm not totally familiar with it, but I think that should do.

Also, the linked list of VideoFrameBuffers can be accessed by two threads at the same time, but it seems that only ScriptEnvironment::GetFrameBuffer2 handles the list, so wrapping it in EnterCriticalSection/LeaveCriticalSection would be an easy solution.
[edit] That might not be enough, because the VideoFrameBuffer is returned with a refcount of 0. So something like the VideoFrame hack above might be needed.[/edit]
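
The point of the edit above can be sketched in portable C++: if the buffer is claimed while the list lock is still held, it never leaves the locked region with a refcount of 0. This is only a sketch under stated assumptions. Buffer and BufferPool are hypothetical stand-ins (not Avisynth types), and std::mutex takes the place of a Win32 CRITICAL_SECTION.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for VideoFrameBuffer: only the reference count matters here.
struct Buffer {
    long refcount = 0;
};

// Hypothetical pool; the mutex plays the role of the critical section.
class BufferPool {
public:
    explicit BufferPool(std::size_t n) : buffers_(n) {}

    // Returns a free buffer that is already claimed (refcount == 1),
    // or nullptr if every buffer is in use.
    Buffer* Acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (Buffer& b : buffers_) {
            if (b.refcount == 0) {
                b.refcount = 1;  // claimed while the lock is still held
                return &b;
            }
        }
        return nullptr;
    }

    void Release(Buffer* b) {
        std::lock_guard<std::mutex> lock(mutex_);
        --b->refcount;
    }

private:
    std::mutex mutex_;
    std::vector<Buffer> buffers_;
};
```

The caller then owns one reference from the start, so no second thread can be handed the same buffer between the search and the first increment.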

All this should allow two different threads to work on different clips/VideoFrames at the same time, as mt does. Because only the ScriptEnvironment interface is exposed, user filters can easily override it to handle GetVar/SetVar. [Edit] Now the code compiles.[/edit]

Last edited by tsp; 9th June 2005 at 00:07.
Old 8th June 2005, 23:37   #2  |  Link
tsp
Registered User
 
Hmm, I can't add more to the above post.
Well, I will try to test this modification on the current CVS version and see if I can improve it. Good thing I kept my old Abit BP6 motherboard.
[Edit]
A small test with the above modification changed the number of frames before a crash from ~10 to ~150 with mt version 0.10. Now I just need to try to secure the VideoFrameBuffer.
[/edit]

Last edited by tsp; 9th June 2005 at 00:30.
Old 9th June 2005, 09:06   #3  |  Link
IanB
Avisynth Developer
 
@tsp, congrats, a maximum size post

A simpler approach might be to just rewrite the "VideoFrame::operator new" code to be threadsafe. Something along the lines you have started with, but with 2 chains: a free chain and an in-use chain. Unfortunately, having old baked-in code from avisynth.h makes getting recently freed objects back on the free chain a little challenging, but I can see possibilities.

As mentioned, VideoFrameBuffers need proper management.

Also AVSValue objects may need some attention.

IanB
Old 9th June 2005, 10:19   #4  |  Link
Bidoche
Avisynth 3.0 Developer
 
Why not try getting rid of the recycling ?
It's the easiest to do, and personally I do not think it has a big impact.

VideoFrames are small objects after all.
Old 9th June 2005, 17:52   #5  |  Link
tsp
Registered User
 
IanB, you changed IScriptEnvironment in 2.56, which is not good when I supply my own version of IScriptEnvironment, but I got that working now. I also implemented the necessary changes when grabbing a VideoFrameBuffer. It seems to be stable. [Edit] Not yet. It might be the cache playing tricks. I will try disabling it and see if that helps.[/edit]

Another problem I have now is that in some really strange way mt manages to allow two threads to enter a CriticalSection at the same time, crashing Avisynth because suddenly two threads access the same clip at the same time.
Also, because NewVideoFrame, the cache, MakeWritable and Subframe return a PVideoFrame, and the refcount is decremented after the VideoFrame has been assigned to a PVideoFrame object, the refcount never reaches 0.
[Edit] BTW, when building the latest CVS version with VS .NET 2003, Avisynth shows the error messages instead of crashing (with Win XP SP2).[/Edit]

Bidoche: the problem is that to do that efficiently we would have to change the avisynth.h file to call delete, and that would break existing applications. Otherwise a garbage collector would have to be implemented (to delete the unused VideoFrames in the linked list), but that might hurt performance.

Last edited by tsp; 10th June 2005 at 01:47.
Old 11th June 2005, 02:14   #6  |  Link
tsp
Registered User
 
Why can't the stupid compiler figure out how to emit a lock cmpxchg instruction when using InterlockedCompareExchange?

This code ought to work, but two threads can still access the same VideoFrame, at least when I set a breakpoint that breaks if refcount==2 when returning. I think I will try assembly; at least you get what you write.

Code:
void* VideoFrame::operator new(unsigned) {
  // CriticalSection
  for (LinkedVideoFrame* i = g_VideoFrame_recycle_bin; i; i = i->next)
    if (0==InterlockedCompareExchange((long*)&i->vf.refcount,1,0))
    {
      return &i->vf;//doesn't work if breakpoint is set here???
    }

  LinkedVideoFrame* result = (LinkedVideoFrame*)::operator new(sizeof(LinkedVideoFrame));
  result->vf.refcount=1;
  result->next = g_VideoFrame_recycle_bin;
  g_VideoFrame_recycle_bin = result;
  return &result->vf;
}
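
For comparison, the same claim-by-compare-and-swap idea can be written with std::atomic, which postdates this thread. This is a minimal sketch, not the Avisynth code: Slot, ClaimSlot and ClaimsAreDistinct are hypothetical names, and a fixed vector replaces the linked recycle bin. A slot is taken only by the thread whose compare-exchange moves its refcount from 0 to 1.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recycled VideoFrame.
struct Slot {
    std::atomic<long> refcount{0};
};

// Returns the index of the first slot this thread managed to claim, or -1.
int ClaimSlot(std::vector<Slot>& slots) {
    for (std::size_t i = 0; i < slots.size(); ++i) {
        long expected = 0;
        // Succeeds for exactly one thread per slot: the one that sees 0 and writes 1.
        if (slots[i].refcount.compare_exchange_strong(expected, 1))
            return static_cast<int>(i);
    }
    return -1;
}

// Lets `threads` threads claim one slot each; true if no slot was handed out twice.
bool ClaimsAreDistinct(int threads) {
    std::vector<Slot> slots(threads);
    std::vector<int> claimed(threads, -1);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&slots, &claimed, t] { claimed[t] = ClaimSlot(slots); });
    for (std::thread& th : pool) th.join();

    std::vector<bool> seen(threads, false);
    for (int idx : claimed) {
        if (idx < 0 || seen[idx]) return false;
        seen[idx] = true;
    }
    return true;
}
```

Note that this only covers the claim itself. Appending a new node to the list is a separate step and needs its own protection.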
Old 11th June 2005, 07:22   #7  |  Link
IanB
Avisynth Developer
 
Quote:
Originally Posted by tsp
IanB, you changed the IScriptEnviroment in 2.56 not good when I supplies my own version of IScriptEnviroment
Sorry about that chief, the high price of progress. For what you are doing you must compile with an avisynth.h that supports all the features of the avisynth.dll you are using.

Quote:
[Edit]BTW when building the latest CVS version with VS 2003 NET avisynth shows the errormessages instead of crashing with win xp sp2)[/Edit]
Yes, have a look at the absolutely filthy hack I did to force it to work correctly.

Also, something probably needs to be done to this:

LinkedVideoFrame* result = (LinkedVideoFrame*)::operator new(sizeof(LinkedVideoFrame));
result->vf.refcount=1;
result->next = g_VideoFrame_recycle_bin;
g_VideoFrame_recycle_bin = result;


Like this :-

LinkedVideoFrame* result = (LinkedVideoFrame*)::operator new(sizeof(LinkedVideoFrame));
result->vf.refcount=1;
result->next = InterlockedExchange(&g_VideoFrame_recycle_bin, result);


Also, this is not the latest CVS: it is missing Tritical's leak-feast fixes, with which it would be this:

LinkedVideoFrame* result = (LinkedVideoFrame*)::operator new(sizeof(LinkedVideoFrame));
result->vf.refcount=1;
result->next = InterlockedExchange(&(g_Bin.g_VideoFrame_recycle_bin), result);
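
One detail worth noting about the exchange form: the new node becomes the list head before its next pointer has been stored, so a thread walking the list at that instant could follow an unset pointer. A common alternative is a compare-and-swap loop that links the node first and publishes it second. The following is a sketch in portable C++ under stated assumptions: Node, Push and PushFromThreads are hypothetical names, and std::atomic stands in for the Win32 interlocked functions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for LinkedVideoFrame.
struct Node {
    Node* next = nullptr;
};

// Pushes n onto the front of the list without taking a lock.
void Push(std::atomic<Node*>& head, Node* n) {
    Node* old_head = head.load();
    do {
        n->next = old_head;  // link first, so the node is complete before it is visible
    } while (!head.compare_exchange_weak(old_head, n));  // on failure old_head is refreshed
}

// Pushes from several threads at once and returns how many nodes ended up in the list.
std::size_t PushFromThreads(int threads, int per_thread) {
    std::vector<Node> storage(static_cast<std::size_t>(threads) * per_thread);
    std::atomic<Node*> head{nullptr};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&, t] {
            for (int i = 0; i < per_thread; ++i)
                Push(head, &storage[static_cast<std::size_t>(t) * per_thread + i]);
        });
    for (std::thread& th : pool) th.join();

    std::size_t count = 0;
    for (Node* n = head.load(); n != nullptr; n = n->next) ++count;
    return count;
}
```

Because nodes are only ever added and never removed in this sketch, the usual ABA problem of lock-free stacks does not arise.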

IanB
Old 11th June 2005, 15:54   #8  |  Link
tsp
Registered User
 
Finally I figured out what caused the crash in the cache:

It was this part of Cache::GetFrame:
Code:
if (foundframe) {   // Frame was found - build a copy and return a (dumb) pointer to it.

      VideoFrame* copy = new VideoFrame(result->vfb, result->offset, result->pitch, result->row_size,
                                        result->height, result->offsetU, result->offsetV, result->pitchUV);
      _RPT2(0, "Cache2:%x: using cached copy of frame %d\n", this, n);
Is it okay to assign copy, or the new VideoFrame, directly to another PVideoFrame and return that instead? (Wouldn't copy be assigned to a PVideoFrame anyway, since GetFrame's return value is a PVideoFrame?) Otherwise the refcount reaches 0 and the copy can be handed out as another VideoFrame.

[edit]
Tritical's fix was missing because I used the 2.55 code for VideoFrame::operator new and copied it into the CVS version (and yes, it did complain about a missing g_VideoFrame_recycle_bin).

Changing copy from a VideoFrame* to a PVideoFrame seems to make the big difference. Not a single crash yet after 25000 frames (with both a release and a debug build) using this script:

Code:
mt("blur2()",5)
function blur2(clip c)
{
c.blur(1)
blur(0.7)
TemporalSoften(4,4,4)
}
I will try to polish the code and then post the diff or a link to the changed files, and then you can decide whether you want to integrate it into the CVS.
[/edit]
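
Why the switch from VideoFrame* to PVideoFrame matters can be illustrated with a tiny reference-counting handle. This is a sketch only: Frame and FrameHandle are hypothetical stand-ins for VideoFrame and PVideoFrame, and the counting is deliberately single-threaded to keep the idea visible. A raw pointer adds no reference, so the frame still looks free to a recycling allocator. A handle keeps the count above zero for as long as it lives.

```cpp
#include <cassert>

// Hypothetical stand-in for VideoFrame: only the reference count matters here.
struct Frame {
    long refcount = 0;
};

// Hypothetical stand-in for PVideoFrame: holds one reference while it is alive.
// Single-threaded on purpose; a real version would use interlocked operations.
class FrameHandle {
public:
    explicit FrameHandle(Frame* f) : frame_(f) { ++frame_->refcount; }
    FrameHandle(const FrameHandle& other) : frame_(other.frame_) { ++frame_->refcount; }
    FrameHandle& operator=(const FrameHandle&) = delete;
    ~FrameHandle() { --frame_->refcount; }

private:
    Frame* frame_;
};

// A frame with no references looks free to the recycling allocator.
bool LooksFree(const Frame& f) { return f.refcount == 0; }

// True if the frame was claimed exactly while a handle existed.
bool HandleKeepsFrameClaimed() {
    Frame frame;
    Frame* raw = &frame;  // a raw pointer adds no reference
    bool free_with_raw = LooksFree(*raw);
    bool claimed_with_handle = false;
    {
        FrameHandle handle(raw);
        claimed_with_handle = !LooksFree(frame);
    }
    return free_with_raw && claimed_with_handle && LooksFree(frame);
}
```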

Last edited by tsp; 11th June 2005 at 20:50.
Old 13th June 2005, 23:10   #9  |  Link
tsp
Registered User
 
OK, I'm done with the most critical changes to make Avisynth more thread safe. The files that have been changed compared to the current (that is, today's) CVS are available here. I will release the binary shortly in the mt thread. I also did some testing to see how much these changes affect the speed. I used Didee's iip function, which exercises the Avisynth core quite well. This is the time for previewing 1000 frames in VirtualDub (1.6.6, 720x576 video with iip()):

avisynth 2.55
15:05 min.

avisynth 2.56 (current CVS compiled with visual c++ 2003 .NET)
14:40 min

thread safer avisynth 2.56
14:42 min.

So there is no significant difference with the modified version, and it seems that version 2.56 is about 2-3% faster than 2.55. That is probably because of the improved cache.
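
The percentages follow directly from the three timings above. A small sketch of the arithmetic (Seconds and PercentFaster are hypothetical helper names):

```cpp
#include <cassert>
#include <cmath>

// Converts a minutes:seconds reading into seconds.
constexpr int Seconds(int minutes, int seconds) { return minutes * 60 + seconds; }

// Percentage by which `other` beats `baseline`; negative means `other` is slower.
double PercentFaster(int baseline, int other) {
    return 100.0 * (baseline - other) / baseline;
}

// 2.55 took 15:05 (905 s), 2.56 took 14:40 (880 s), the modified 2.56 took 14:42 (882 s).
double Gain256Over255() { return PercentFaster(Seconds(15, 5), Seconds(14, 40)); }
double CostOfThreadSafety() { return -PercentFaster(Seconds(14, 40), Seconds(14, 42)); }
```

That is 25 s saved out of 905 s, roughly 2.8%, for 2.56 over 2.55, while the thread-safety changes cost 2 s out of 880 s, roughly 0.2%.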

I improved Tritical's leak fix so that the memory is freed when the script unloads rather than when the dll unloads (by using a pointer instead).

Also, all internal functions using VideoFrame::Subframe have been changed to use IScriptEnvironment::Subframe (because the VideoFrame has a refcount of zero when returned, so it could potentially be assigned to another PVideoFrame), and VideoFrame::Subframe has been declared a private function (so stupid plugin writers get a warning if they try to use it).
Old 14th June 2005, 11:47   #10  |  Link
IanB
Avisynth Developer
 
@TSP,

Wow, I'll need a few days to digest it thoroughly.

Some quick thoughts.

The Planar env->SubFrame() should have been added ages ago. Thanks for the poke. Its use everywhere is a good thing.

You can't redefine the VideoFrame* Subframe() as private; unfortunately all the plugins already compiled know about it as public. We are locked into supporting it this way. However, it's not supposed to be used, so a strong caveat will have to do.

Not too sure about the _WIN32_WINNT definition.

Already tried the RecycleBin *g_Bin=0 tidy up; it fails with multiple uses of the CreateScriptEnvironment(int version) interface. The 1st release kills the subsequent issues.

I feel there must be an easier way than all the incrementing/decrementing of the use count. I'll see if anything comes to mind as I read through your changes. Maybe I'll come up with something that avoids the subframe difficulty.

IanB
Old 14th June 2005, 14:53   #11  |  Link
tsp
Registered User
 
Quote:

You can't redefine the VideoFrame* Subframe() as private, unfortunately all the plugins already compiled know about it as public already. We are locked in supporting it this way. However it's not supposed to be used so a strong caveat will have to do.
I think it is only at compile time that it matters whether the function is public or private, so existing plugins will still work. I'm not even sure it is possible to use VideoFrame::Subframe.
I tried to use it in a simple function like this:
GetFrame(int n,IScriptEnvironment *env)
{
  PVideoFrame src=child->GetFrame(n,env);
  return src->Subframe(0, 100, 100,100);
}
and got this error from the linker:
error LNK2019: unresolved external symbol "public: class VideoFrame * __thiscall VideoFrame::Subframe(int,int,int,int)const " (?Subframe@VideoFrame@@QBEPAV1@HHHH@Z) referenced in function "public: virtual class PVideoFrame __stdcall BinomialBlur::GetFrame(int,class IScriptEnvironment *)" (?GetFrame@BinomialBlur@@UAG?AVPVideoFrame@@HPAVIScriptEnvironment@@@Z)
(this is with avisynth 2.55 header and dll installed)

So hopefully this means that no external filters use VideoFrame::Subframe.
(And in case you didn't notice, I changed all the internal functions to use env->SubFrame.)

Quote:
Not to sure about the _WIN32_WINNT definition.
It is necessary for InitializeCriticalSectionAndSpinCount. It will only affect people who use WinNT 4 with SP2 or lower, or Windows 95. If that is not acceptable, InitializeCriticalSection could be used instead (or maybe a simple if(winver<WinNT4sp3||winver<win98) InitializeCriticalSection() else InitializeCriticalSectionAndSpinCount() ).
Also, there might be a better place to put the definition; I just haven't found it yet.

Quote:
Already tried the RecycleBin *g_Bin=0 tidy up, it fails with multiple use of the CreateScriptEnvironment(int version) interface. The 1st release kills the subsequent issues.
So a static int ScriptEnvironment::Refcount might be the solution then (only delete if it is the last instance). Also, when is CreateScriptEnvironment called multiple times?
Quote:
I feel there must be an easier way than all the incrementing/decrementing of the use count. I'll see if anything come to mind as I read thru your changes. Maybe I'll come up with something that avoids the subframe dificulty.
Maybe a #define Locked(a) (InterlockedCompareExchange((long*)&(a),1,0)==0)
and #define Unlock(a) InterlockedDecrement((long*)&(a))
The good thing about using the refcount as the lock is that the different threads don't have to wait until the linked lists aren't in use, which would be the case if a CRITICAL_SECTION were used instead, and it doesn't change the speed very much, if at all.

Last edited by tsp; 14th June 2005 at 15:30.
Old 15th June 2005, 16:21   #12  |  Link
tsp
Registered User
 
Updated avisynth.cpp so that ScriptEnvironment includes a refcount, so that g_Bin is only destroyed when the last instance is destructed. The updated file is here.
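
The instance-counting idea can be sketched as follows. This is not the code from the updated file: Env and RecycleBin are hypothetical stand-ins, and the sketch assumes environments are created and destroyed by one thread at a time, so the counter is a plain int.

```cpp
#include <cassert>

// Hypothetical stand-in for the shared recycle bin (g_Bin in the thread).
struct RecycleBin {};

// Hypothetical stand-in for ScriptEnvironment.
// Assumes construction and destruction never run concurrently.
class Env {
public:
    Env() {
        if (instances_++ == 0) bin_ = new RecycleBin;  // first instance creates the bin
    }
    ~Env() {
        if (--instances_ == 0) {  // last instance tears it down
            delete bin_;
            bin_ = nullptr;
        }
    }
    Env(const Env&) = delete;
    Env& operator=(const Env&) = delete;

    static bool BinAlive() { return bin_ != nullptr; }

private:
    static int instances_;
    static RecycleBin* bin_;
};

int Env::instances_ = 0;
RecycleBin* Env::bin_ = nullptr;
```

Destroying one of two live environments leaves the bin in place, which is the failure case described earlier for the simple g_Bin tidy up.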

I will try to see if it is possible to modify the cache to allow more than one clip to access it at the same time.
Old 18th June 2005, 00:19   #13  |  Link
tsp
Registered User
 
Hmm, I think I will try to implement parallel rendering of several frames, now that it has been on the TODO list for the last 2 years and 9 months (http://sourceforge.net/tracker/index...23&atid=482676)

I currently have 4 ways a filter could be executed in parallel. If we have an ordinary Avisynth script like this:
Code:
AVISource("C:\foo.avi")
trim(100,200)
blur(0.3)
the Internal cache is inserted like this:
Code:
AVISource("C:\foo.avi")
InternalCache()
trim(100,200)
InternalCache()
blur(0.3)
InternalCache()
My idea is that after the last InternalCache a filter (the distributor) is inserted that creates x threads when constructed, where x is the number of logical processors (2 for a Pentium 4 with HT or a dual-core Athlon X2, or 4 for a Pentium Extreme Edition 840). If frame n is requested, the frames from n to n+x-1 are distributed among the x threads. The current thread sleeps until frame n is done and then returns, even if the other threads aren't done yet.
When the next frame is requested the process is repeated. Any frames already in progress are skipped. Because frame n is already in the cache it should be returned immediately, so the next frame should be requested shortly after.
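
A rough sketch of that request pattern, under loud assumptions: RenderFrame is a hypothetical placeholder for running the filter chain, Distributor here is not the eventual mt implementation, and std::async starts a task per frame instead of keeping x long-lived worker threads as described. What it does show is the look-ahead window, skipping frames already in progress, and waiting only for frame n.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <map>

// Hypothetical placeholder for rendering frame n through the filter chain.
int RenderFrame(int n) { return n * n; }

class Distributor {
public:
    explicit Distributor(int threads) : threads_(threads) {}

    int GetFrame(int n) {
        // Start frames n .. n+threads-1, skipping any that are already in progress.
        for (int k = n; k < n + threads_; ++k)
            if (pending_.count(k) == 0)
                pending_[k] = std::async(std::launch::async, RenderFrame, k);

        // Wait only for frame n; the look-ahead frames keep running.
        int result = pending_[n].get();
        pending_.erase(n);
        return result;
    }

    // Number of look-ahead frames started but not yet handed out.
    std::size_t InFlight() const { return pending_.size(); }

private:
    int threads_;
    std::map<int, std::future<int>> pending_;
};
```

With two threads, a request for frame 5 starts frames 5 and 6 and returns 5. The following request for frame 6 finds it already started and only adds frame 7.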

The first way (mode 1) to process a clip (filter) in parallel would be to use the ordinary cache, so that different threads could access the same clip at the same time. A thread should only have to wait if the cache is already creating the requested frame. This would of course require the filter to be thread safe, which many of the current filters most likely are not.

The second way (mode 2) is to create a new clip for each thread, all sharing the same cache, so the script above would look like this for x=2:
Code:
AVISource("C:\foo.avi")      AVISource("C:\foo.avi")
                   InternalCacheMode2()
trim(100,200)                    trim(100,200)
                   InternalCacheMode2()
blur(0.3)                           blur(0.3)
                   InternalCacheMode2()
                     distributor()
InternalCacheMode2() picks the clip that isn't being processed, or waits if the requested frame is being generated.
Most filters that don't use global/static variables should work with this mode.

Next, in mode 3, only one thread can access a filter at a time, but if the filter requests a frame from the next InternalCache, the next thread can access the filter, and the first thread can only return the requested frame once the next thread requests a frame. The result is that only one thread is executing code inside the filter at a time. This would prevent breakage of linked lists in global/static variables etc. The problem with this solution is that the caches before and after the filter would have to share the same critical_section, so they must somehow know each other's address.

Last is mode 4, which only allows one thread to access a filter at a time; the next thread can only access the filter once the first thread has returned from it. The access could be restricted even further, so that only the thread processing the lowest-numbered frame is allowed to access the filter, which keeps the order right. This should work with all current filters.
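
The basic form of mode 4 amounts to holding a lock for the whole duration of the filter call. A minimal sketch, with hypothetical names throughout (CountingFilter, SerializedFilter, CallFromThreads) and std::mutex in place of a critical section. The stricter lowest-frame-first ordering is not shown.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical filter that is not thread safe: it updates member state on every call.
struct CountingFilter {
    long calls = 0;
    int GetFrame(int n) {
        ++calls;
        return n;
    }
};

// Mode-4-style wrapper: the lock is held until the wrapped filter returns.
class SerializedFilter {
public:
    int GetFrame(int n) {
        std::lock_guard<std::mutex> lock(mutex_);
        return inner_.GetFrame(n);
    }
    long Calls() {
        std::lock_guard<std::mutex> lock(mutex_);
        return inner_.calls;
    }

private:
    std::mutex mutex_;
    CountingFilter inner_;
};

// Hammers one wrapped filter from several threads and returns the recorded call count.
long CallFromThreads(int threads, int frames_per_thread) {
    SerializedFilter filter;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&filter, frames_per_thread] {
            for (int n = 0; n < frames_per_thread; ++n) filter.GetFrame(n);
        });
    for (std::thread& th : pool) th.join();
    return filter.Calls();
}
```

The price is that threads queue up at this filter, which is why it is the most compatible mode but not the fastest one.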

There should also be two functions to control which mode each filter uses, something like SetMultiThreadedMode(int mode,int threads=number_of_logical_processors) and GetMultiThreadedMode(). When SetMultiThreadedMode is called, the filters that follow use that mode; if it is not called at all, multithreading is disabled. A filter should also call these two functions in its constructor if it doesn't work with a particular mode.
So a sample script could look like this:
Code:
SetMultiThreadedMode(2,2)
Avisource("c:\foo.avi")
SetMultiThreadedMode(1)
trim(200,100)
SetMultiThreadedMode(3)
c=last.pixiedust()
SetMultiThreadedMode(2)
c=c.blur(0.1)
c.subtitle("test")
that is then translated into this:
Code:
Avisource("c:\foo.avi")    Avisource("c:\foo.avi")
                     InternalCacheMode2()
                      trim(200,100)
                     InternalCacheMode1_3()
                     pixiedust()
                     InternalCacheMode1()
blur(0.1)                         blur(0.1)
                     InternalCacheMode2()
subtitle("test")                 subtitle("test")
                  InternalCacheMode2()
                  Distributor()
So now I just have to code it
Old 18th June 2005, 13:53   #14  |  Link
Bidoche
Avisynth 3.0 Developer
 
What is the point of having 4 modes? Don't they all perform the same service?

I suggest you focus only on mode 3, which seems to me the best one.

And as for your critical section concerns, what you need is a per-thread Cache stack: the top Cache is locked, the others enterable.
Old 18th June 2005, 19:13   #15  |  Link
tsp
Registered User
 
Well, the 4 modes have different advantages and drawbacks. Mode 1 would be the fastest and simple to implement, but filters that aren't thread safe (using class variables and/or global/static variables in a non-thread-safe way) would crash.
Mode 2 is also easy to implement, and filters can access non-static class variables in a non-thread-safe way without crashing. The speed would be the same as mode 1, but the memory usage will be higher and there will still be a problem with global/static variables.
Mode 3 would solve some of the problems with static/global objects but, like mode 1, will have problems with class variables (for instance TemporalSoften would produce wrong results), and it would be slower than modes 1 and 2 (a simple script like Avisource("c:\blabla.avi").HDRAGC() would only allow 1 thread to execute, where modes 1 and 2 would allow all threads to execute). Hmm, I think the best way to implement mode 3 would be to create several clips like mode 2 and restrict access like the proposed mode 3.
So I think I will implement mode 2 (for speed), 3 (more compatible) and 4 (works with all filters).
Old 8th July 2005, 22:31   #16  |  Link
tsp
Registered User
 
OK, now I have implemented the different modes in the latest CVS version. The dll and source are available from the MT 0.41 thread. Modes 1, 2 and 3 are the same as the modes described in the earlier posts here. Mode 4 is a combination of modes 2 and 3, and mode 5 is equal to mode 4 in my last post.

The changes are still rather hacky, but at least it works.
Old 10th July 2005, 05:11   #17  |  Link
foxyshadis
ангел смерти
 
Would translating plugins to the multithreaded core (sounds like a 2.6 change, along with the improved cache, but that's your call) involve just changing plugin code to thread-safe variable access, or would it also involve wrapping frame-grabs, destructors, etc. in locks? (I assume you're taking care of all of that here, but I want to make sure.)

At least this looks a lot easier than converting from YUY2->YV12.
Old 10th July 2005, 12:25   #18  |  Link
tsp
Registered User
 
The construction and destruction of filters are serialized, so there is no need for a lock there. If the filter doesn't use global or static variables (or only reads from them in the GetFrame function), doesn't require linear frame order and doesn't use much memory, no changes are needed. Not even a recompile, because the default mode 2 creates a new instance for each CPU. If, however, the filter reads and writes a global/static variable, the code should be modified to be thread safe (a simple EnterCriticalSection can do).
I still haven't included support for filters that require linear frame order (or for how to handle GetVar/SetVar).
The cache that is inserted between each filter handles the synchronisation when calling GetFrame, so there is also no need to use a lock there.
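
The distinction drawn here between instance state and global/static state can be sketched like this. TallyFilter and RunOneInstancePerThread are hypothetical, and std::mutex stands in for EnterCriticalSection/LeaveCriticalSection. With one instance per thread, as in mode 2, a member variable needs no lock, while a static shared by all instances does.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical filter with one per-instance counter and one static counter.
class TallyFilter {
public:
    int GetFrame(int n) {
        ++own_calls_;  // per-instance state: only this instance's thread touches it
        {
            std::lock_guard<std::mutex> lock(total_mutex_);
            ++total_calls_;  // static state: shared by every instance, so it is locked
        }
        return n;
    }
    long OwnCalls() const { return own_calls_; }
    static long TotalCalls() {
        std::lock_guard<std::mutex> lock(total_mutex_);
        return total_calls_;
    }

private:
    long own_calls_ = 0;
    static std::mutex total_mutex_;
    static long total_calls_;
};

std::mutex TallyFilter::total_mutex_;
long TallyFilter::total_calls_ = 0;

// Gives each thread its own instance and returns the static total afterwards.
long RunOneInstancePerThread(int threads, int frames) {
    std::vector<TallyFilter> instances(threads);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&instances, t, frames] {
            for (int n = 0; n < frames; ++n) instances[t].GetFrame(n);
        });
    for (std::thread& th : pool) th.join();
    return TallyFilter::TotalCalls();
}
```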

Also, I think the improved cache will be included in the final 2.56 release; at least it is in the current CVS version.