Doom9's Forum - View Single Post - feature request: extractred/green/blue() ; mergergb()

IanB · 23rd April 2005, 09:10

tsp,

Layer and Overlay make use of the Alpha channel, so generally it is preferable not to clobber it. However, given that this filter currently copies a chosen channel to the other 2, it is not a great stretch to define it to copy a channel to the other 3. It certainly makes the MMX code easier. The core also currently has a ShowAlpha(clip, string pixel_type), which is just C code that copies the Alpha to the R, G and B channels, so there is a precedent.
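For reference, the "copy one channel to the other 3" behaviour can be sketched in plain C roughly as follows. This is only an illustration, not the core's ShowAlpha code: the function name broadcast_channel_rgb32 and its parameters are made up here, and it assumes RGB32 pixels stored as B, G, R, A bytes in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Avisynth core.
   Assumes RGB32 pixels laid out in memory as B, G, R, A.
   channel: 0 = B, 1 = G, 2 = R, 3 = A. */
static void broadcast_channel_rgb32(const uint8_t *src, uint8_t *dst,
                                    size_t pixels, int channel)
{
    assert(channel >= 0 && channel < 4);
    for (size_t i = 0; i < pixels; ++i) {
        uint8_t v = src[i * 4 + channel];  /* the chosen channel */
        dst[i * 4 + 0] = v;
        dst[i * 4 + 1] = v;
        dst[i * 4 + 2] = v;
        dst[i * 4 + 3] = v;                /* alpha is overwritten too */
    }
}
```

Writing all four bytes means there is no per-pixel special case for alpha, which is what keeps a SIMD version simple.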

For the MMX code I was thinking along the lines of

Code:

  movq      mm0,[src]
  movq      mm1,[src+8]
  pand      mm0,k000000ff000000ff  // dd p1, p0
  pand      mm1,k000000ff000000ff  // dd p3, p2
  packssdw  mm0,mm1  // dw p3, p2, p1, p0
  packuswb  mm0,mm0  // db p3, p2, p1, p0, p3, p2, p1, p0
  punpcklbw mm0,mm0  // db p3, p3, p2, p2, p1, p1, p0, p0
  movq      mm1,mm0
  punpcklbw mm0,mm0  // db p1, p1, p1, p1, p0, p0, p0, p0
  punpckhbw mm1,mm1  // db p3, p3, p3, p3, p2, p2, p2, p2  
  movq      [dst],mm0
  movq      [dst+8],mm1

As this code only uses 3 registers, you could add a 2nd (or 3rd) stream and process 8 (or 12) pixels at once. It should really scream.
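To check the byte shuffling in the instruction sequence above without an assembler, here is a portable C model of it. It is a sketch only: each MMX register is modelled as a uint64_t, the helpers packssdw, packuswb and unpack_bw are hand-written stand-ins for the instructions (not compiler intrinsics), and broadcast_low_byte_4px is a made-up name. Like the assembly, it picks the low byte of each pixel; another channel would presumably need a shift before the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Saturation helpers used by the pack instructions. */
static int16_t sat_s16(int32_t v) { return v > 32767 ? 32767 : v < -32768 ? -32768 : (int16_t)v; }
static uint8_t sat_u8(int16_t v)  { return v > 255 ? 255 : v < 0 ? 0 : (uint8_t)v; }

/* Model of packssdw: 2+2 signed dwords -> 4 signed words, a in the low half. */
static uint64_t packssdw(uint64_t a, uint64_t b)
{
    uint64_t in[2] = {a, b}, r = 0;
    for (int i = 0; i < 4; ++i) {
        int32_t d = (int32_t)(uint32_t)(in[i / 2] >> (32 * (i % 2)));
        r |= (uint64_t)(uint16_t)sat_s16(d) << (16 * i);
    }
    return r;
}

/* Model of packuswb: 4+4 signed words -> 8 unsigned bytes, a in the low half. */
static uint64_t packuswb(uint64_t a, uint64_t b)
{
    uint64_t in[2] = {a, b}, r = 0;
    for (int i = 0; i < 8; ++i) {
        int16_t w = (int16_t)(uint16_t)(in[i / 4] >> (16 * (i % 4)));
        r |= (uint64_t)sat_u8(w) << (8 * i);
    }
    return r;
}

/* Model of punpcklbw (high == 0) and punpckhbw (high != 0):
   interleave four bytes of a with four bytes of b. */
static uint64_t unpack_bw(uint64_t a, uint64_t b, int high)
{
    uint64_t r = 0;
    int base = high ? 32 : 0;
    for (int i = 0; i < 4; ++i) {
        r |= ((a >> (base + 8 * i)) & 0xff) << (16 * i);
        r |= ((b >> (base + 8 * i)) & 0xff) << (16 * i + 8);
    }
    return r;
}

/* Same steps as the MMX listing, for 4 pixels held as 32-bit values. */
static void broadcast_low_byte_4px(const uint32_t src[4], uint32_t dst[4])
{
    const uint64_t mask = 0x000000ff000000ffULL;
    uint64_t mm0 = src[0] | (uint64_t)src[1] << 32;  /* movq mm0,[src]    */
    uint64_t mm1 = src[2] | (uint64_t)src[3] << 32;  /* movq mm1,[src+8]  */
    mm0 &= mask;                                     /* dd p1, p0         */
    mm1 &= mask;                                     /* dd p3, p2         */
    mm0 = packssdw(mm0, mm1);                        /* dw p3..p0         */
    mm0 = packuswb(mm0, mm0);                        /* db p3..p0, p3..p0 */
    mm0 = unpack_bw(mm0, mm0, 0);                    /* each byte doubled */
    mm1 = mm0;
    mm0 = unpack_bw(mm0, mm0, 0);                    /* p1 x4, p0 x4      */
    mm1 = unpack_bw(mm1, mm1, 1);                    /* p3 x4, p2 x4      */
    dst[0] = (uint32_t)mm0; dst[1] = (uint32_t)(mm0 >> 32);
    dst[2] = (uint32_t)mm1; dst[3] = (uint32_t)(mm1 >> 32);
}
```

Feeding it four pixels whose low bytes are 0x44, 0xDD, 0x80 and 0xFF should give 0x44444444, 0xDDDDDDDD, 0x80808080 and 0xFFFFFFFF, which matches the register comments in the listing.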

IanB
