You are already using optimized code. I don't know what else could be done to improve performance. The main bottleneck is the copy from GPU memory to system memory.
@clsid: this is okay, we are not affected by the patch in DXVA mode (if your question was addressed to us).