A GPU is really bad at doing CABAC, since CABAC is inherently a serial process (each decoded bin depends on the decoder state left behind by the previous one) while a GPU gets most of its speed by having lots of parallel processors.
There's no point in offloading just dequant+iDCT to the GPU, since that takes about 1% of ffh264's CPU time, so even a zero-cost offload could speed decoding up by only about 1%.
So the only reasonable division of labor in a decoder is: CPU does CABAC, GPU does everything involving pixels and DCT.
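
To make the serial dependency concrete, here is a toy adaptive binary arithmetic coder. It is only a sketch and not real CABAC: the names `Context`, `encode` and `decode` are made up for illustration, it uses a single context with simple counts instead of H.264's state tables, and it uses exact fractions instead of the fixed-width range/offset registers a real decoder has. What it shares with CABAC is the structure of the loop: decoding bin N needs the interval and the probability model as updated by bins 0..N-1.

```python
from fractions import Fraction


class Context:
    """Toy adaptive model (hypothetical): counts of 0s and 1s seen so far."""

    def __init__(self):
        self.counts = [1, 1]

    def p_zero(self):
        # Estimated probability that the next bin is 0; always strictly in (0, 1).
        return Fraction(self.counts[0], self.counts[0] + self.counts[1])

    def update(self, bit):
        self.counts[bit] += 1


def encode(bits):
    """Narrow the interval [low, low + width) once per bin; return a value inside it."""
    ctx = Context()
    low, width = Fraction(0), Fraction(1)
    for bit in bits:
        split = width * ctx.p_zero()
        if bit == 0:
            width = split            # keep the lower sub-interval
        else:
            low += split             # keep the upper sub-interval
            width -= split
        ctx.update(bit)
    return low + width / 2           # midpoint of the final interval


def decode(value, n):
    """Recover n bins. Every iteration reads state written by the previous one."""
    ctx = Context()
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        split = width * ctx.p_zero()  # depends on every bin decoded so far
        bit = 0 if value < low + split else 1
        if bit == 0:
            width = split
        else:
            low += split
            width -= split
        ctx.update(bit)               # the model changes before the next bin
        out.append(bit)
    return out


bits = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
roundtrip = decode(encode(bits), len(bits))
```

There is no way to start on the last bin without first walking through all the earlier ones, because `low`, `width` and the context counts are not known until then. That is the kind of loop that leaves a GPU's parallel processors idle.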