Doom9's Forum - View Single Post - change to avisynth to make it more threadsafe

tsp · 8th June 2005, 20:06

after I made the plugin mt I discovered the way avisynth handles VideoFrame is thread unsafe. The problem lies in how the list of VideoFrames are handle in VideoFrame:: operator new(unsigned). Here the first VideoFrame with a refcount of 0 is returned to the VideoFrame constructor. The problem is that the refcount is increased when it reaches the PVideoFrame object so in the mean time another thread could have grapped it. I solved this in mt by wrapping the functions that creates a new videoframe(by calling new VideoFrame) so only one thread at a time can access these function. The following functions calls new VideoFrame: cache::GetFrame, ScriptEnvironment::NewVideoFrame(int row_size, int height, int align), ScriptEnvironment::NewPlanarVideoFrame(int width, int height, int align, bool U_first), (both is used by ScriptEnvironment::NewVideoFrame(const VideoInfo& vi, int align) and ScriptEnvironment::MakeWritable(PVideoFrame* pvf)), VideoFrame::Subframe .

I suggest the following fixes to avoid it.

Code:

void* VideoFrame::operator new(unsigned) {
  // CriticalSection
  //optional EnterCriticalSection(&csVideoFrame);
  for (LinkedVideoFrame* i = g_VideoFrame_recycle_bin; i; i = i->next)
    if (i->vf.refcount == 0)
{    
      InterlockedIncrement((long*)&i->vf.refcount);
      if(i->vf.refcount==1) 
           return &i->vf;
      else
           InterlockedDecrement((long*)&i->vf.refcount);
}
//LeaveCriticalSection(&csVideoFrame);
  LinkedVideoFrame* result = (LinkedVideoFrame*)::operator new(sizeof(LinkedVideoFrame));
  result->next = g_VideoFrame_recycle_bin;
  g_VideoFrame_recycle_bin = result;
  return &result->vf;
}

for even greater threadsafety CriticalSection could be used (csVideoFrameBuffer should of course be initialized somewhere else)

Code:

VideoFrame::VideoFrame(VideoFrameBuffer* _vfb, int _offset, int _pitch, int _row_size, int _height)
  : refcount(1), vfb(_vfb), offset(_offset), pitch(_pitch), row_size(_row_size), height(_height),offsetU(_offset),offsetV(_offset),pitchUV(0)  // PitchUV=0 so this doesn't take up additional space
{
  InterlockedIncrement(&vfb->refcount);
}

VideoFrame::VideoFrame(VideoFrameBuffer* _vfb, int _offset, int _pitch, int _row_size, int _height, int _offsetU, int _offsetV, int _pitchUV)
  : refcount(1), vfb(_vfb), offset(_offset), pitch(_pitch), row_size(_row_size), height(_height),offsetU(_offsetU),offsetV(_offsetV),pitchUV(_pitchUV)
{
  InterlockedIncrement(&vfb->refcount);
}

the refcount is just initialized to 1 to avoid the VideoFrame to be recruted by another thread.

Code:

VideoFrame* VideoFrame::Subframe(int rel_offset, int new_pitch, int new_row_size, int new_height) const {
  VideoFrame* Retval=new VideoFrame(vfb, offset+rel_offset, new_pitch, new_row_size, new_height); 
  InterlockedDecrement((long*)&Retval->refcount
  return Retval;
}

VideoFrame* VideoFrame::Subframe(int rel_offset, int new_pitch, int new_row_size, int new_height, int rel_offsetU, int rel_offsetV, int new_pitchUV) const {
  VideoFrame* Retval=new VideoFrame(vfb, offset+rel_offset, new_pitch, new_row_size, new_height, rel_offsetU+offsetU, rel_offsetV+offsetV, new_pitchUV);
  InterlockedDecrement((long*)&Retval->refcount);
  return Retval;
}

It is neccesary to decrease the refcount because it is not posible to change avisynth.h without breaking existing filters, this introduce a small window where another thread could grap the VideoFrame (The ideal solution would be to have VideoFrame::Release decrement the refcount to 0 when it hits 1.)

Code:

PVideoFrame __stdcall ScriptEnvironment::NewVideoFrame(const VideoInfo& vi, int align) {
  // Check requested pixel_type:
  switch (vi.pixel_type) {
    case VideoInfo::CS_BGR24:
    case VideoInfo::CS_BGR32:
    case VideoInfo::CS_YUY2:
    case VideoInfo::CS_YV12:
    case VideoInfo::CS_I420:
      break;
    default:
      ThrowError("Filter Error: Filter attempted to create VideoFrame with invalid pixel_type.");
  }
  // If align is negative, it will be forced, if not it may be made bigger
  if (vi.IsPlanar()) { // Planar requires different math ;)
    if (align>=0) {
      align = max(align,FRAME_ALIGN);
    }
    if ((vi.height&1)||(vi.width&1))
      ThrowError("Filter Error: Attempted to request an YV12 frame that wasn't mod2 in width and height!");
    PVideoFrame retval=ScriptEnvironment::NewPlanarVideoFrame(vi.width, vi.height, align, !vi.IsVPlaneFirst());  // If planar, maybe swap U&V
    InterlockedDecrement((long*)&retval->refcount);
    return retval;
  } else {
    if ((vi.width&1)&&(vi.IsYUY2()))
      ThrowError("Filter Error: Attempted to request an YUY2 frame that wasn't mod2 in width.");
    if (align<0) {
      align *= -1;
    } else {
      align = max(align,FRAME_ALIGN);
    }
    PVideoFrame retval=ScriptEnvironment::NewVideoFrame(vi.RowSize(), vi.height, align);
	InterlockedDecrement((long*)&retval->refcount);
    return retval;
  }
}

same story again.

Code:

bool ScriptEnvironment::MakeWritable(PVideoFrame* pvf) {
  const PVideoFrame& vf = *pvf;
  // If the frame is already writable, do nothing.
  if (vf->IsWritable()) {
    return false;
  }

  // Otherwise, allocate a new frame (using NewVideoFrame) and
  // copy the data into it.  Then modify the passed PVideoFrame
  // to point to the new buffer.
    const int row_size = vf->GetRowSize();
    const int height = vf->GetHeight();
    PVideoFrame dst;
    if (vf->GetPitch(PLANAR_U)) {  // we have no videoinfo, so we can only assume that it is Planar
      dst = NewPlanarVideoFrame(row_size, height, FRAME_ALIGN,false);  // Always V first on internal images
	  InterlockedDecrement((long*)&dst->refcount);
     } else {
      dst=NewVideoFrame(row_size, height, FRAME_ALIGN);
      InterlockedDecrement((long*)&dst->refcount);
     //LeaveCriticalSection(&csVideoFrame); 
    }
    BitBlt(dst->GetWritePtr(), dst->GetPitch(), vf->GetReadPtr(), vf->GetPitch(), row_size, height);
    // Blit More planes (pitch, rowsize and height should be 0, if none is present)
    BitBlt(dst->GetWritePtr(PLANAR_V), dst->GetPitch(PLANAR_V), vf->GetReadPtr(PLANAR_V), vf->GetPitch(PLANAR_V), vf->GetRowSize(PLANAR_V), vf->GetHeight(PLANAR_V));
    BitBlt(dst->GetWritePtr(PLANAR_U), dst->GetPitch(PLANAR_U), vf->GetReadPtr(PLANAR_U), vf->GetPitch(PLANAR_U), vf->GetRowSize(PLANAR_U), vf->GetHeight(PLANAR_U));

    *pvf = dst;
    return true;
}

Code:

PVideoFrame __stdcall Cache::GetFrame(int n, IScriptEnvironment* env) 
{ 
  n = min(vi.num_frames-1, max(0,n));  // Inserted to avoid requests beyond framerange.

  __asm {emms} // Protection from rogue filter authors

  if (h_policy == CACHE_NOTHING) { // don't want a cache. Typically filters that only ever seek forward.
    __asm mov ebx,ebx  // Hack! prevent compiler from trusting ebx contents across call
    PVideoFrame result = childGetFrame(n, env);
//  if (result->vfb) env->ManageCache(MC_ReturnVideoFrameBuffer, result->vfb); // return vfb to vfb pool for immediate reuse
    return result;
  }

  if (h_policy == CACHE_RANGE) {  // for filters that bash a span of frames. Typically temporal filters.
    PVideoFrame result;
    bool foundframe = false;

    for (int i = 0; i<h_total_frames; i++) {
      if (h_video_frames[i]->status & CACHE_ST_USED) {
		// Check if we already have the frame
	    if (h_video_frames[i]->frame_number == n) {
		  result = new VideoFrame(h_video_frames[i]->vfb, h_video_frames[i]->offset, h_video_frames[i]->pitch,
								  h_video_frames[i]->row_size, h_video_frames[i]->height,
								  h_video_frames[i]->offsetU, h_video_frames[i]->offsetV,
								  h_video_frames[i]->pitchUV);
		  foundframe = true;
		}
		// Check if it is out of scope
		else if (abs(h_video_frames[i]->frame_number-n)>=h_total_frames) {
		  h_video_frames[i]->status |= CACHE_ST_DELETEME;
		  if (!(h_video_frames[i]->status & CACHE_ST_HAS_BEEN_RELEASED)) {  // Has this framebuffer been released?
			UnlockVFB(h_video_frames[i]);  // We can now release this vfb.
			h_video_frames[i]->status |= CACHE_ST_HAS_BEEN_RELEASED;
		  }
		}
      }
    } // for (int i

    if (foundframe) {   // Frame was found - build a copy and return a (dumb) pointer to it.

      VideoFrame* copy = new VideoFrame(result->vfb, result->offset, result->pitch, result->row_size,
                                        result->height, result->offsetU, result->offsetV, result->pitchUV);
      _RPT2(0, "Cache2:%x: using cached copy of frame %d\n", this, n);

      InterlockedDecrement((long*)&result->refcount);
	  InterlockedDecrement((long*)&copy->refcount);
      return copy;
    }
    else {
	  __asm mov ebx,ebx  // Hack! prevent compiler from trusting ebx contents across call
      result = childGetFrame(n, env); // Should be checking the rest of cache

      // Find a place to store it.
      for (i=0 ;; i++) {
        if (i == h_total_frames)
#ifdef _DEBUG
          env->ThrowError("Cache2:%x: Internal cache error! Report this!", this);
#else
          return result; // Should never happen
#endif
        if (h_video_frames[i]->status & CACHE_ST_DELETEME)   // Frame can be deleted
          break;

        if (!(h_video_frames[i]->status & CACHE_ST_USED))    // Frame has not yet been used.
          break;
      }  
      _RPT2(0, "Cache2:%x: Miss! Now locking frame frame %d in memory\n", this, n);
    }

    if (h_video_frames[i]->status & CACHE_ST_USED) ReturnVideoFrameBuffer(h_video_frames[i], env); // return old vfb to vfb pool for early reuse

    // Store it
    h_video_frames[i]->vfb      = result->vfb;
    h_video_frames[i]->sequence_number = result->vfb->GetSequenceNumber();
    h_video_frames[i]->offset   = result->offset;
    h_video_frames[i]->offsetU  = result->offsetU;
    h_video_frames[i]->offsetV  = result->offsetV;
    h_video_frames[i]->pitch    = result->pitch;
    h_video_frames[i]->pitchUV  = result->pitchUV;
    h_video_frames[i]->row_size = result->row_size;
    h_video_frames[i]->height   = result->height;
    h_video_frames[i]->frame_number = n;
    h_video_frames[i]->faults = 0;
  // Keep this vfb to ourselves!
  // 1. This prevents theft by Plan B:
  // 2. This make the vfb readonly i.e. MakeWriteable() returns a copy
    LockVFB(h_video_frames[i]);
    h_video_frames[i]->status   = CACHE_ST_USED;

    return result;
  }

Code:

VideoFrame* Cache::BuildVideoFrame(CachedVideoFrame *i, int n)
{
  Relink(&video_frames, i, video_frames.next);   // move the matching cache entry to the front of the list
  VideoFrame* result = new VideoFrame(i->vfb, i->offset, i->pitch, i->row_size, i->height, i->offsetU, i->offsetV, i->pitchUV);

  // If we have asked for a same stale frame twice, leave frames locked.
  if ((fault_rate <= 160) || (fault_rate == 190)) { // Reissued frames are not subject to locking as readily
	UnlockVFB(i);
	_RPT2(0, "Cache:%x: using cached copy of frame %d\n", this, n);
  }
  else {
	_RPT3(0, "Cache:%x: lock vfb %x, cached frame %d\n", this, i->vfb, n);
  }
InterlockedDecrement((long*)&result->refcount);
  return result;
}

This is the cache from the current cvs. I'm not total familiar with out put I think that could do.

Also the linked list with VideoFrameBuffer 's can be accessed by two threads at the same time but it seems like it is only ScriptEnvironment::GetFrameBuffer2 that handles the list so putting a EnterCriticalSection/LeaveCriticalSection would be an easy solution.
[edit] that might not be enough because the VideoFrameBuffer is returned with a refcount of 0. So something like the VideoFrame hack above might be needed[/edit]

All this should allow two different threads to work on different clips/videoframes at the same time like mt does it. Because only the scripteviroment interface is exposed user filters can easy overwrite it to handle the GetVar SetVar.[Edit]now the code compiles[/edit]