New metrics. Better than SSIM?
IgorC
1st December 2006, 18:39
There is some information about a new metric from NTIA that is potentially as good as SSIM, or even better.
http://www.via.ecp.fr/via/ml/x264-devel/2006-11/msg00107.html
http://www.its.bldrdoc.gov/vqeg/projects/frtv_phaseI/COM-80E_final_report.pdf
http://www.its.bldrdoc.gov/vqeg/projects/frtv_phaseII/downloads/VQEGII_Final_Report.pdf
Has anybody had any previous experience with it?
MfA
2nd December 2006, 14:54
Nope, but I don't see other error metrics being implemented in a hurry. SSIM is popular not simply because of how effective it is, but also because of how elegant it is.
The metrics which are better than SSIM tend to be highly complex and a pain to implement.
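To illustrate the "elegant" part: the SSIM index for one window is a single closed-form expression built from means, variances and a covariance. A minimal sketch in Python, assuming 8-bit luma and one global window over a flat list of pixels. Real implementations slide a local window over the frame and average the results, and the name `ssim_global` is made up for this illustration.

```python
def ssim_global(x, y, peak=255.0):
    """SSIM over a single window covering all samples of x and y.

    Simplification (assumption): population variance and one global window,
    instead of the local weighted windows used by real implementations.
    """
    n = len(x)
    c1 = (0.01 * peak) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * peak) ** 2  # stabilizer for the contrast/structure term
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den


# Hypothetical 8-pixel "frame", compared with itself and with a +3 luma shift.
ref = [16, 32, 64, 128, 200, 235, 90, 45]
same = ssim_global(ref, ref)
shifted = ssim_global(ref, [p + 3 for p in ref])
```

With this global window, an identical frame scores 1, and the constant +3 shift only touches the luminance term, so the score stays very close to 1.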
IgorC
3rd December 2006, 00:46
MfA wrote:
> Nope, but I don't see other error metrics being implemented in a hurry. SSIM is popular not simply because of how effective it is, but also because of how elegant it is.
> The metrics which are better than SSIM tend to be highly complex and a pain to implement.
I understood the word 'elegant' to mean 'fast and realistic enough'. Is that right?
In my opinion SSIM is a very good metric (the best I have seen so far),
but it sometimes fails on:
1. Psychovisual features like adaptive quantization.
2. Sometimes film grain and other fine details are treated as artifacts and unwanted noise, so a video that is smooth and noise-free but has lost some detail gets a higher score.
Maybe some new metrics are hard to implement as well as slow, but that is the price of a more powerful measurement tool.
DmitryPopov
3rd December 2006, 12:03
There is also "Brightness Independent PSNR"
http://www.compression.ru/video/quality_measure/metric_plugins/bi-psnr_en.htm
which claims to be better than PSNR and SSIM.
MfA
3rd December 2006, 14:19
I'd rather hear the Pearson correlation with subjective tests on the VQEG dataset than claims :)
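For anyone who wants to run that check on their own data, a minimal sketch of the Pearson correlation between a metric's scores and subjective scores. The numbers below are made up purely to exercise the function; they are not VQEG results, and the variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up illustration data, NOT measurements: one metric score and one
# mean opinion score per test clip.
metric_scores = [0.91, 0.95, 0.97, 0.99]
subjective_mos = [2.0, 3.1, 3.9, 4.6]
r = pearson(metric_scores, subjective_mos)
```

A value of `r` near 1 means the metric ranks and spaces the clips much like the viewers did; a value near 0 means the metric says little about perceived quality on that set.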
DarkZell666
3rd December 2006, 18:37
IgorC wrote:
> but it sometimes fails on:
> 1. Psychovisual features like adaptive quantization.
By nature, metrics will always fail in the psychovisual domain :p Visual information gets distorted to "look" better, and metrics are fooled every single time by this "distortion" (unless the measuring tool is aware of what distortion has been applied).
IgorC
3rd December 2006, 19:19
DarkZell666 wrote:
> By nature, metrics will always fail in the psychovisual domain
Nature? Oh no. By nature SSIM is already based on some psychovisual algorithms, compared to the purely mathematical PSNR. It's not that simple.
I have some experience with cases where SSIM agrees very well with visual perception (and where OPSNR fails grossly), using per-macroblock adaptive quantization in the Nero H.264 codec.
DmitryPopov: there are plenty of people who are allergic to PSNR-family metrics :). A quick review suggests for now that BI-PSNR is just another mathematical PSNR metric, *imo*.
But I found that MSU metrics like Blur and Blocking are useful for understanding where SSIM fails.
For example, a stronger in-loop filter in H.264 leads to higher SSIM values, more blurriness and less blocking (according to MSU blur and deblock sets).
However, for example, in x264:
deblock 0 = a balanced quality that favors low and middle frequencies.
deblock -2 = another balanced quality that favors high-frequency details over low/middle ones.
So deblock values between (maybe) 0 and -2 are just a matter of personal preference. But SSIM, since it likes blur more than blocking, will prefer stronger deblocking.
I have a sample where SSIM goes up and OPSNR goes down (or vice versa) when I change the psy model. I will upload it when I have time, and I will post some BI-PSNR values too.
Fizick
3rd December 2006, 21:13
The bad thing about most metrics (for video) is that they neglect temporal effects.
Didée
4th December 2006, 01:55
Fizick wrote:
> The bad thing about most metrics (for video) is that they neglect temporal effects.
Ah, balm on my soul. Get the metrics of a clip where luma is offset by a constant +3 from the reference. Then get the metrics of a clip where luma is offset by an alternating +2 -2 +2 -2 etc. from the reference.
No question the 2nd one gets the better metrics. No question the first one looks better. I dare say the first one *is* better.
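The thought experiment is easy to reproduce with a toy clip. A minimal sketch, assuming 8-bit luma, a hypothetical static 4-pixel "frame" repeated six times, and plain per-frame PSNR averaged over the clip (not any particular tool's implementation):

```python
import math


def mse(ref, test):
    """Mean squared error between two frames given as flat pixel lists."""
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)


def psnr(ref, test, peak=255.0):
    """PSNR in dB for one frame; infinite when the frames are identical."""
    m = mse(ref, test)
    return float("inf") if m == 0 else 10.0 * math.log10(peak * peak / m)


def mean_psnr(ref_clip, test_clip):
    """Average of the per-frame PSNR values over a clip."""
    return sum(psnr(r, t) for r, t in zip(ref_clip, test_clip)) / len(ref_clip)


# Hypothetical reference: the same 4-pixel frame repeated for 6 frames.
ref_clip = [[100, 110, 120, 130] for _ in range(6)]

# Clip A: luma offset by a constant +3 (per-frame MSE is 9).
clip_a = [[p + 3 for p in f] for f in ref_clip]
# Clip B: luma offset alternating +2, -2, +2, ... (per-frame MSE is 4).
clip_b = [[p + (2 if i % 2 == 0 else -2) for p in f]
          for i, f in enumerate(ref_clip)]

psnr_a = mean_psnr(ref_clip, clip_a)
psnr_b = mean_psnr(ref_clip, clip_b)
```

The flickering clip B comes out roughly 3.5 dB ahead of the steady clip A, because a per-frame metric only sees the size of the error in each frame and never notices that its sign flips from one frame to the next.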
IgorC
4th December 2006, 02:41
Didée wrote:
> Ah, balm on my soul. Get the metrics of a clip where luma is offset by a constant +3 from the reference. Then get the metrics of a clip where luma is offset by an alternating +2 -2 +2 -2 etc. from the reference.
> No question the 2nd one gets the better metrics. No question the first one looks better. I dare say the first one *is* better.
Is that still valid for the latest SSIM 0.24a with lumimask set to true?
Can you provide an AviSynth script for this purpose?
Didée
4th December 2006, 13:34
Can't say, didn't try yet. But that doesn't matter, since it's only an oversimplistic example.
The point is temporal consistency. Some (bigger) errors might be acceptable if they're "always the same", even though they give worse metrics than some (smaller) errors that fluctuate back and forth (or whatever) between adjacent frames.
Temporal consistency isn't evaluated by metrics that calculate their numbers only per-frame.
Think of the fabulous "crawling wall" problem ... a very small per-frame error for metrics, but a highly annoying artefact to watch.
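One crude way to put a number on temporal consistency is to compare how each clip changes from frame to frame, rather than comparing the frames themselves. A minimal sketch under loud assumptions: `temporal_mse` is a made-up helper, not an established metric, and it ignores motion and perception entirely.

```python
def temporal_mse(ref_clip, test_clip):
    """Mean squared error between the frame-to-frame changes of two clips.

    Each clip is a list of frames; each frame is a flat list of pixels.
    An error that stays the same in every frame contributes nothing.
    """
    total = 0.0
    count = 0
    for i in range(len(ref_clip) - 1):
        frames = zip(ref_clip[i], ref_clip[i + 1],
                     test_clip[i], test_clip[i + 1])
        for r0, r1, t0, t1 in frames:
            d = (t1 - t0) - (r1 - r0)  # change in test minus change in ref
            total += d * d
            count += 1
    return total / count


# Hypothetical static reference clip: 6 identical 4-pixel frames.
ref_clip = [[100, 110, 120, 130] for _ in range(6)]
steady = [[p + 3 for p in f] for f in ref_clip]           # constant +3
flicker = [[p + (2 if i % 2 == 0 else -2) for p in f]     # +2, -2, +2, ...
           for i, f in enumerate(ref_clip)]

steady_score = temporal_mse(ref_clip, steady)
flicker_score = temporal_mse(ref_clip, flicker)
```

Here the constant offset scores 0 (its error never changes), while the alternating offset scores 16 (its error jumps by 4 at every frame), which is the opposite ranking to what a per-frame MSE or PSNR gives for the same two clips.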
tritical
4th December 2006, 17:23
Sticking with ssim, there have been some proposed modifications:
multi-scale ssim (http://www.cns.nyu.edu/~zwang/files/papers/msssim.pdf)
complex-wavelet ssim (cwssim) (http://www.cns.nyu.edu/~zwang/files/papers/icassp05.pdf)
weighted subband cwssim (wcwssim) (http://www.ece.northwestern.edu/~pappas/papers/brooks_hvei06.pdf)
All of those are still based on image quality/similarity assessment. Therefore, they would be applied to video on a frame by frame basis. And as MfA stated before, they are all more complex to implement than plain ssim.
IMO, applying image quality metrics on a per-frame basis to evaluate video is a lot like using a per-pixel metric (mse) to evaluate images. There are many types of artifacts/degradations that can lead to the same mse for two frames but that have quite different effects on perceived quality. Likewise, there are many possible temporal artifacts that can result in the same overall ssim (or another image quality metric) score for the sequence as a whole but that differ significantly in quality as perceived by a person.
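The mse half of that analogy can be shown with a toy block. A minimal sketch, assuming a hypothetical flat 4x4 gray block stored as a flat list: one degraded copy spreads a small error over every pixel, the other puts a single large error on one pixel, and both end up with the same mse.

```python
def mse(ref, test):
    """Mean squared error between two frames given as flat pixel lists."""
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)


def max_abs_error(ref, test):
    """Largest per-pixel absolute difference between two frames."""
    return max(abs(r - t) for r, t in zip(ref, test))


ref = [128] * 16                 # flat 4x4 gray block, flattened

spread = [p + 2 for p in ref]    # error of 2 on every pixel
spike = list(ref)
spike[5] = ref[5] + 8            # error of 8 on one pixel only

mse_spread = mse(ref, spread)    # 16 * 2^2 / 16
mse_spike = mse(ref, spike)      # 8^2 / 16
peak_spread = max_abs_error(ref, spread)
peak_spike = max_abs_error(ref, spike)
```

Both copies have an mse of 4, yet one is a uniform brightness shift and the other is a single outlier pixel, so the mse alone cannot tell the two kinds of damage apart.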