Why metrics like PSNR, SSIM, VMAF suck while exploring possible alternatives [Archive]

birdie

10th November 2024, 09:58

A nice article on Streaming Media (https://www.streamingmedia.com/Articles/Post/Blog/Challenges-of-New-Encoding-Scenarios-Reflections-on-Measuring-Perceived-Quality-166721.aspx).

Z2697

10th November 2024, 15:17

I don't really like the "FGS and AI" themed topic though.

Z2697

10th November 2024, 16:17

I don't think they (terribly) suck; they are "misused". If you know a metric is bad at something, then avoid using it for that... for example, SSIM fails to "detect" blur, and the non-NEG VMAF models give much higher scores to sharpened video.
As a result, you can only trust metric scores, whether published online or from your own tests, when you can rule out those uncertainties. What a harsh world.
Current metrics have shortcomings. New metrics, new shortcomings.

benwaggoner

11th November 2024, 02:25

The least-bad metric I've used is ITU-T P.1204, which has a mode that combines both full-reference and bitstream analysis. Of course it uses machine learning, so having a good ground-truth data set that aligns with your scenarios really helps. Such a data set is also very expensive and labor-intensive to make.

It's important to distinguish between a distortion metric (like PSNR, SSIM, and VMAF) and a quality metric (like P.1204), of course. A distortion metric will rate a very accurate encode of a horrible-looking source higher than a still pretty accurate encode of a great source. A quality metric will measure how good it looks, inclusive of both source and encoding artifacts.
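
To make the distortion-metric point concrete, here is a minimal sketch using PSNR on toy data. The pixel rows, the variable names, and the `psnr` helper are all made up for illustration (this is not any encoder's or library's implementation); the only thing it shows is that the score depends on how close the encode is to its own source, not on how good that source looks.

```python
import math

def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equal-length sequences of 8-bit pixel values."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)

# Hypothetical toy rows of pixels (assumed values, not real video data).
poor_source = [128] * 16      # flat gray: stands in for a horrible-looking source
great_source = [0, 255] * 8   # full-contrast detail: stands in for a great source

# Very accurate encode of the poor source: every pixel off by 1.
poor_encode = [p + 1 for p in poor_source]
# Still fairly accurate encode of the great source: every pixel off by 4.
great_encode = [4, 251] * 8

psnr_poor = psnr(poor_source, poor_encode)     # about 48.1 dB
psnr_great = psnr(great_source, great_encode)  # about 36.1 dB

print(f"poor source, accurate encode:       {psnr_poor:.2f} dB")
print(f"great source, less accurate encode: {psnr_great:.2f} dB")
```

The flat gray encode wins by roughly 12 dB even though the detailed one would presumably look far better to a viewer, which is exactly why a distortion metric alone can't tell you how good the result looks.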

Distortion metrics are great for tuning an encoder. Quality metrics are great for estimating how good a viewer will think the content looks. They're not so different that one can't be used for the other, but it's best to use the right kind of metric for what you're trying to accomplish.