Doom9's Forum - View Single Post - The rav1e development thread: working on rav1e in a different more forward manner

BlueSwordM · 13th December 2022, 19:37

Indeed.
Anyway, the main problem with specific video encoder pipeline stages is that they rely on low-complexity metrics, both to keep them feasible to run without HW acceleration and to keep HW acceleration from getting costly.

For quantization processes and motion estimation (as well as TPL-RDO, depending on how complex its implementation is), fast metrics are a necessity.

That means only SAD and, to a lesser extent, SATD, even when RDO is active for those processes, which leads to consistently suboptimal psycho-visual targeting.
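For anyone not familiar with the two cheap metrics mentioned here, this is a minimal Rust sketch of SAD and of a 4x4 SATD (the sum of absolute Walsh-Hadamard coefficients of the residual). The function names are my own hypothetical ones, not rav1e's API, and the Hadamard transform is left unnormalized; real encoders usually apply some scaling, which is omitted here.

```rust
/// SAD: sum of absolute differences over a 4x4 block (row-major, 16 samples).
fn sad_4x4(src: &[i32; 16], pred: &[i32; 16]) -> u32 {
    src.iter()
        .zip(pred)
        .map(|(&s, &p)| (s - p).unsigned_abs())
        .sum()
}

/// Unnormalized 4-point Walsh-Hadamard butterfly.
fn hadamard4(v: [i32; 4]) -> [i32; 4] {
    let a = v[0] + v[1];
    let b = v[0] - v[1];
    let c = v[2] + v[3];
    let d = v[2] - v[3];
    [a + c, b + d, a - c, b - d]
}

/// SATD: sum of absolute 2-D Hadamard coefficients of the 4x4 residual.
fn satd_4x4(src: &[i32; 16], pred: &[i32; 16]) -> u32 {
    // Residual = source minus prediction.
    let mut r = [0i32; 16];
    for ((d, &s), &p) in r.iter_mut().zip(src).zip(pred) {
        *d = s - p;
    }
    // Transform each row.
    for y in 0..4 {
        let row = hadamard4([r[4 * y], r[4 * y + 1], r[4 * y + 2], r[4 * y + 3]]);
        r[4 * y..4 * y + 4].copy_from_slice(&row);
    }
    // Transform each column.
    for x in 0..4 {
        let col = hadamard4([r[x], r[4 + x], r[8 + x], r[12 + x]]);
        for (y, &c) in col.iter().enumerate() {
            r[4 * y + x] = c;
        }
    }
    r.iter().map(|c| c.unsigned_abs()).sum()
}
```

With a single-pixel error of 4, `sad_4x4` returns 4 while `satd_4x4` returns 64, because the impulse spreads over all 16 Hadamard coefficients; for an error that is exactly one Hadamard basis pattern (a flat offset, say), both return the same value. Neither metric knows how visible a given frequency is, which is the psycho-visual gap.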

Using frequency-weighted metrics the traditional way, by running a DCT to get the frequency information, is fine compute-wise in general, but it's too slow for processes that are repeated a lot, like quantization.
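To make the "traditional way" concrete, here's a sketch of frequency-weighted distortion: take the residual, run an orthonormal 4x4 DCT-II, and weight each squared coefficient. The `weight` function is a made-up placeholder that simply attenuates higher frequencies; it is not a real contrast sensitivity function and not anything rav1e ships. The DCT is the naive (non-fast) separable form, written for clarity rather than speed.

```rust
use std::f64::consts::PI;

const N: usize = 4;

/// Orthonormal 1-D DCT-II of length N (naive form).
fn dct_1d(x: &[f64; N]) -> [f64; N] {
    let mut out = [0.0; N];
    for (k, o) in out.iter_mut().enumerate() {
        let scale = if k == 0 { (1.0_f64 / N as f64).sqrt() } else { (2.0_f64 / N as f64).sqrt() };
        *o = scale
            * x.iter()
                .enumerate()
                .map(|(n, &v)| v * (PI * (2 * n + 1) as f64 * k as f64 / (2 * N) as f64).cos())
                .sum::<f64>();
    }
    out
}

/// Separable 2-D DCT of a row-major NxN block: rows first, then columns.
fn dct_2d(block: &[f64; N * N]) -> [f64; N * N] {
    let mut tmp = [0.0; N * N];
    for y in 0..N {
        let row: [f64; N] = std::array::from_fn(|x| block[y * N + x]);
        tmp[y * N..(y + 1) * N].copy_from_slice(&dct_1d(&row));
    }
    let mut out = [0.0; N * N];
    for x in 0..N {
        let col: [f64; N] = std::array::from_fn(|y| tmp[y * N + x]);
        for (y, &c) in dct_1d(&col).iter().enumerate() {
            out[y * N + x] = c;
        }
    }
    out
}

/// HYPOTHETICAL weighting (not a real CSF): higher frequency -> lower weight.
fn weight(u: usize, v: usize) -> f64 {
    1.0 / (1.0 + (u + v) as f64)
}

/// Frequency-weighted squared error: DCT the residual, weight each coefficient.
fn fw_distortion(src: &[f64; N * N], pred: &[f64; N * N]) -> f64 {
    let residual: [f64; N * N] = std::array::from_fn(|i| src[i] - pred[i]);
    let coeffs = dct_2d(&residual);
    let mut d = 0.0;
    for v in 0..N {
        for u in 0..N {
            let c = coeffs[v * N + u];
            d += weight(u, v) * c * c;
        }
    }
    d
}
```

A flat residual of 2 scores 64 (all the energy lands in the DC coefficient, which has weight 1), while a ±2 checkerboard with the same energy lands in odd frequencies only and scores under a third of that. The cost is the issue: even with a fast DCT, doing this for every candidate in a quantization or motion search loop adds up.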

I've finally found a way around this: I'll be using a specific filter to get the frequency information out of the blocks without a transform, giving us all the benefits without the compute cost!

This will let me do things in a somewhat psycho-visually weighted manner while keeping the speed of a very simple metric like SAD, getting us nice gains even at high-speed presets, and even speedups for those presets!
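One way to read the filter idea: convolve the residual with a small FIR kernel whose frequency response plays the role of the weighting table, then sum absolute values the way SAD does. For squared error with circular convolution this is exact by Parseval (filtering multiplies each frequency component by the filter response H, so the sum of squares is weighted by |H|^2); with absolute values and edge handling it's only an approximation. This is an illustration under those assumptions: the `[1, 2, 1]` kernel, the edge replication and the `filtered_sad` name are all hypothetical, and this is not the filter from the Daala paper or what rav1e will actually use.

```rust
/// HYPOTHETICAL separable 3-tap low-pass kernel; NOT the Daala/rav1e filter.
const TAPS: [i32; 3] = [1, 2, 1];

/// Fetch pixel (x, y) from a w x h row-major plane, replicating edge pixels.
fn px(v: &[i32], w: usize, h: usize, x: isize, y: isize) -> i32 {
    let xc = x.clamp(0, w as isize - 1) as usize;
    let yc = y.clamp(0, h as isize - 1) as usize;
    v[yc * w + xc]
}

/// Filter the residual with TAPS horizontally, then vertically, then sum |values|.
/// Only integer multiply-adds per pixel: no transform involved.
fn filtered_sad(src: &[i32], pred: &[i32], w: usize, h: usize) -> u64 {
    assert!(w > 0 && h > 0 && src.len() == w * h && pred.len() == w * h);
    let res: Vec<i32> = src.iter().zip(pred).map(|(&s, &p)| s - p).collect();

    // Horizontal pass.
    let mut tmp = vec![0i32; w * h];
    for y in 0..h {
        for x in 0..w {
            tmp[y * w + x] = TAPS
                .iter()
                .enumerate()
                .map(|(k, &t)| t * px(&res, w, h, (x + k) as isize - 1, y as isize))
                .sum();
        }
    }

    // Vertical pass, accumulating absolute values just like SAD.
    let mut total = 0u64;
    for y in 0..h {
        for x in 0..w {
            let v: i32 = TAPS
                .iter()
                .enumerate()
                .map(|(k, &t)| t * px(&tmp, w, h, x as isize, (y + k) as isize - 1))
                .sum();
            total += u64::from(v.unsigned_abs());
        }
    }
    total
}
```

On a 4x4 block, a flat residual of 1 and a ±1 checkerboard have the same plain SAD (16), but `filtered_sad` returns 256 for the flat one and 16 for the checkerboard (nonzero only because of edge replication), so the kernel's low-pass response does the frequency weighting. Changing the kernel changes which frequencies get emphasized; that choice is the psycho-visual tuning knob.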

For those interested in how I discovered this, I present to you a glorious Daala paper, in which the Xiph folks managed to do something very smart:
https://people.xiph.org/~tterribe/da...a-icip2017.pdf
