
View Full Version : GMSD and SSIM Quality Metrics



WorBry
3rd April 2019, 06:36
Zoptilib and muvsfunc do not unfortunately, u have to do all three planes individually


How have you tested that ???

WorBry
3rd April 2019, 06:46
source/ref resolution and encoding resolution was 1920x1080p.

Model was 0.6.1 as stated on the pics - I used the exact same model in both VS|ffmpeg tests, meaning: the exact same .pkl file.

OK thanks.

Iron_Mike
3rd April 2019, 09:14
And looking at the function code, I can't see provision for other plane options. I got the impression that it's hard coded for luma only. Same goes for GMSD.

yes, it does default to plane=0 (Y), if the plane param is None, meaning: not passed in

How have you tested that ???

looked at the code in muvsfunc, the plane param needs to be of type int...

and u can also simply test this by running these params when passing config to Zopti:

'ssim': {'downsample': False, 'show_map': False, 'plane': (0, 1, 2)}


it will throw error from muvsfunc that plane param needs to be int... setting plane=2 (for example) will measure the 'v' plane...

'ssim': {'downsample': False, 'show_map': False, 'plane': 2}


so currently, u need to run all three planes individually...
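A minimal sketch of the config side of that, assuming the metric settings are plain dicts like the ones above. The helper name per_plane_configs and the Y/U/V labels are made up for illustration and are not part of Zopti or muvsfunc; it only builds one config per plane so each can be run separately:

```python
# Hypothetical helper (not part of Zopti or muvsfunc): because 'plane' must be
# a single int, build one SSIM config per plane instead of passing a tuple.
BASE_SSIM = {'downsample': False, 'show_map': False}
PLANE_NAMES = {0: 'Y', 1: 'U', 2: 'V'}


def per_plane_configs(base=BASE_SSIM, planes=(0, 1, 2)):
    """Return {plane_name: config}, each config carrying one int 'plane'."""
    return {PLANE_NAMES[p]: {'ssim': dict(base, plane=p)} for p in planes}


configs = per_plane_configs()
for name, cfg in configs.items():
    print(name, cfg)
```

Each resulting dict has the same shape as the plane=2 example above, so the three runs differ only in the plane value.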

WorBry
3rd April 2019, 15:03
Oh I see now. It uses the GetPlane (ShufflePlanes based) helper function from mvsfunc to extract and convert the selected plane to greyscale (Y Luma) for processing.

I'd be more interested in deriving an aggregate value for the UV chroma planes than all three planes.

Actually, in the MDSI function there is provision to adjust the weight given to the pooled gradient and chromaticity similarity maps:

alpha: (float, 0~1) Weight used to merge gradient similarity (GS) map and chromaticity similarity (CS) map.

Default is 0.6.

I'll look at that also.

WorBry
3rd April 2019, 16:03
I'm pretty sure all the component metrics are luma plane only.



https://forum.doom9.org/showthread.php?p=1870779#post1870779

....Note that libvmaf only uses luma plane for calculating scores.

WorBry
4th April 2019, 02:12
Oh I see now. It uses the GetPlane helper function from mvsfunc to extract and convert the chosen plane to greyscale (Y Luma) for processing.

Tested it out using the x264 CRF1 transcode of the Sony 2160/30p XAVC clip. Appears to work as expected. Plane=1 and Plane=2 produce different SSIM and GMSD scores to default plane=0 (None). Also, when the reference and test clips are converted to greyscale, the SSIM and GMSD scores are absolute (1 and 0) with Plane=1 and Plane=2, as you would expect.


In the MDSI function there is provision to adjust the weight given to the pooled gradient and chromaticity similarity maps:

alpha: (float, 0~1) Weight used to merge gradient similarity (GS) map and chromaticity similarity (CS) map.

Default is 0.6.

I'll look at that also.

I was rather hoping that setting alpha=0 would exclude the gradient (structural/luminance) similarity component and setting alpha=1 would exclude the chromaticity similarity component, but that does not appear to be the case. Based on the outcomes observed with greyscale clips, it does shift the bias to some degree, but it's not absolute.

WorBry
4th April 2019, 16:34
Incidentally, came across this study:

https://www.intechopen.com/books/proceedings-of-the-3rd-czech-china-scientific-conference-2017/influence-of-chroma-subsampling-on-objective-video-quality-assessment-for-high-resolutions

They used the MSU Quality Measurement Tool for the SSIM and PSNR metrics.

WorBry
4th April 2019, 18:48
Oh I see now. It uses the GetPlane (ShufflePlanes based) helper function from mvsfunc to extract and convert the selected plane to greyscale (Y Luma) for processing.


Tested it out using the x264 CRF1 transcode of the Sony 2160/30p XAVC clip. Appears to work as expected. Plane=1 and Plane=2 produce different SSIM and GMSD scores to default plane=0 (None). Also, when the reference and test clips are converted to greyscale, the SSIM and GMSD scores are absolute (1 and 0) with Plane=1 and Plane=2, as you would expect.


I also checked whether setting the selected plane internally gives the same score as converting the plane to Gray8 externally with ShufflePlanes, e.g.

clip = core.std.ShufflePlanes(clips=clip, planes=1, colorfamily=vs.GRAY)

And it does - the scores are identical. For the x264 CRF1 clip (256 frames):

SSIM (no downsample):

Plane=0; 255.9883826437114205942791044
Plane=1; 255.9511353443287036135700421
Plane=2; 255.9553550588348764804891293

GMSD (no downsample):

Plane=0 1.342334158110490287003219167
Plane=1 2.221467708712367798208099638
Plane=2 2.180245724244588914348796231
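
For anyone reading these figures: assuming they are per-frame scores summed over the 256 frames (an SSIM above 1 would not make sense otherwise), the per-frame means can be worked out as in this rough sketch. The dict layout and function name are just for illustration:

```python
FRAMES = 256  # frame count of the x264 CRF1 clip, as stated above

# Scores as posted above, ASSUMED to be sums of per-frame values.
ssim_sums = {
    0: 255.9883826437114205942791044,
    1: 255.9511353443287036135700421,
    2: 255.9553550588348764804891293,
}
gmsd_sums = {
    0: 1.342334158110490287003219167,
    1: 2.221467708712367798208099638,
    2: 2.180245724244588914348796231,
}


def per_frame_mean(sums, frames=FRAMES):
    """Divide each summed score by the frame count."""
    return {plane: total / frames for plane, total in sums.items()}


ssim_mean = per_frame_mean(ssim_sums)
gmsd_mean = per_frame_mean(gmsd_sums)
for plane in (0, 1, 2):
    print(f"Plane={plane}: SSIM {ssim_mean[plane]:.6f}, GMSD {gmsd_mean[plane]:.6f}")
```

On that reading, SSIM closer to 1 and GMSD closer to 0 both mean closer to the reference, so the two chroma planes score slightly worse than luma on both metrics.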

However, it needs to be borne in mind that the luma-converted U (Plane 1) and V (Plane 2) planes are being processed at the chroma resolution. In this case, the 4:2:0 chroma (converted to luma) was processed at 1920 x 1080 resolution. For 4:2:2, it would be 1920 x 2160 and for 4:4:4, 3840 x 2160.

So expect that the chroma resolution of the reference and test clips will affect the scores.
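
The plane sizes quoted above follow directly from the subsampling factors. A small sketch, with a made-up function name and a lookup table that only covers the three layouts mentioned here:

```python
# (horizontal, vertical) chroma subsampling divisors for the layouts above.
SUBSAMPLING = {'420': (2, 2), '422': (2, 1), '444': (1, 1)}


def chroma_size(width, height, css):
    """Return (width, height) of the U/V planes for a given luma size."""
    sub_w, sub_h = SUBSAMPLING[css]
    return width // sub_w, height // sub_h


for css in ('420', '422', '444'):
    print(css, chroma_size(3840, 2160, css))
```

So a metric run on Plane=1 or Plane=2 of a 4:2:0 UHD clip is effectively comparing 1920 x 1080 greyscale images.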

In the FFMPEG SSIM metric:

The total SSIM takes into account all the planes but the weighting is different, each plane is scaled by the resolution it has. So for example with YUV420 the color planes have 4 times smaller weight.

https://forum.doom9.org/showthread.php?p=1866162#post1866162
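
To make that weighting concrete, here is a sketch based purely on the quoted description (each plane weighted by its pixel count). It is not lifted from the ffmpeg source, the function names are hypothetical, and the Y/U/V scores in the example are made-up numbers:

```python
def plane_weights(width, height, css='420'):
    """Weights for (Y, U, V), proportional to each plane's pixel count."""
    sub_w, sub_h = {'420': (2, 2), '422': (2, 1), '444': (1, 1)}[css]
    luma = width * height
    chroma = (width // sub_w) * (height // sub_h)
    total = luma + 2 * chroma
    return luma / total, chroma / total, chroma / total


def weighted_ssim(y, u, v, weights):
    """Combine per-plane scores into a single resolution-weighted score."""
    w_y, w_u, w_v = weights
    return w_y * y + w_u * u + w_v * v


weights = plane_weights(1920, 1080, '420')  # Y gets 4/6, U and V 1/6 each
combined = weighted_ssim(0.98, 0.96, 0.97, weights)  # illustrative scores only
print(weights, combined)
```

With 4:4:4 the three weights come out equal, which is the subsampling dependence being discussed.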

Iron_Mike
4th April 2019, 23:41
However, it needs to be borne in mind that the luma-converted U (Plane 1) and V (Plane 2) planes are being processed at the chroma resolution. In this case, the 4:2:0 chroma (converted to luma) was processed at 1920 x 1080 resolution. For 4:2:2, it would be 1920 x 2160 and for 4:4:4, 3840 x 2160.


is the original src clip UHD format or why do u end up @ 3840x2160 in 444 for the chroma planes ?

WorBry
4th April 2019, 23:56
is the original src clip UHD format or why do u end up @ 3840x2160 in 444 for the chroma planes ?

Yes, the original source clip and x264 transcode are UHD resolution.

Tested it out using the x264 CRF1 transcode of the Sony 2160/30p XAVC clip.

WorBry
7th April 2019, 16:03
Quick question - how do you convert a full color RGB or YUV clip (whether 4:4:4, 4:2:2 or 4:2:0) to greyscale RGB in VapourSynth ?
Couldn't figure it out when I ran those parallel greyscale MDSI tests in the CrowdRun > Prores series and ended up using the 'Greyscale' filter in AviSynth+ and exporting the output to MagicYUV.

Must surely be possible with fmtconv/ShufflePlanes, but I just can't figure it out.

ChaosKing
7th April 2019, 16:44
Maybe like this: core.std.ShufflePlanes(clip, planes=[0,0,0], colorfamily=vs.RGB)

EDIT
import mvsfunc as mvf
clip = mvf.GrayScale(clip)

WorBry
7th April 2019, 16:50
Thank-you. :)

Edit:
Actually, core.std.ShufflePlanes(clip, planes=[0,0,0], colorfamily=vs.RGB) works fine, but mvf.GrayScale throws the error - Error on frame 0 request: Resize error 1026: RGB color family cannot have YUV matrix coefficients - when trying to convert a full color YUV source to greyscale RGB with:

clip = core.fmtc.resample(clip, css="444")
clip = core.fmtc.matrix(clip, mat="709", col_fam=vs.RGB)
clip = core.fmtc.bitdepth(clip, bits=8)
clip = mvf.GrayScale(clip)

or

clip = mvf.ToRGB(clip, depth=8)
clip = mvf.GrayScale(clip)

WorBry
8th April 2019, 03:19
For moderate to high bitrates, in scenarios where you have adequate bandwidth to attempt to preserve details, I would disable SAO for x265. It tends to be a detail killer, acting like a smoothing filter. I always do this for my own usage.


Thanks, I'll look at that.


I ran a parallel series of x265 encodes (CRF 1-5) of the Sony 2160/30p XAVC clip, with SAO disabled:

ffmpeg -i {Path}:/AX100.mp4 -vcodec libx265 -preset slow -crf {Value} -pix_fmt yuv420p -r 30000/1001 -x265-params no-sao=1:colorprim=1:transfer=1:colormatrix=1 {Path}:/AX100_x265_No_SAO_CRFx.mp4

It had negligible effect on the encode bitrates and SSIM and GMSD scores in this case:

http://i.imgur.com/JGqJcsxm.png (https://imgur.com/JGqJcsx)

WorBry
15th April 2019, 18:56
I also checked whether setting the selected plane internally gives the same score as converting the plane to Gray8 externally with ShufflePlanes, e.g.

clip = core.std.ShufflePlanes(clips=clip, planes=1, colorfamily=vs.GRAY)

And it does - the scores are identical. For the x264 CRF1 clip (256 frames):

SSIM (no downsample):

Plane=0; 255.9883826437114205942791044
Plane=1; 255.9511353443287036135700421
Plane=2; 255.9553550588348764804891293

GMSD (no downsample):

Plane=0 1.342334158110490287003219167
Plane=1 2.221467708712367798208099638
Plane=2 2.180245724244588914348796231

However, it needs to be borne in mind that the luma-converted U (Plane 1) and V (Plane 2) planes are being processed at the chroma resolution. In this case, the 4:2:0 chroma (converted to luma) was processed at 1920 x 1080 resolution. For 4:2:2, it would be 1920 x 2160 and for 4:4:4, 3840 x 2160.

So expect that the chroma resolution of the reference and test clips will affect the scores.

In the FFMPEG SSIM metric:

https://forum.doom9.org/showthread.php?p=1866162#post1866162

In light of the discussion that ensued in the Zopti thread...

https://forum.doom9.org/showthread.php?p=1870880#post1870880...

...I thought it would be interesting to see how the raw (muvsfunc) SSIM scores for the chroma planes compare with those generated by ffmpeg SSIM.

For that I turned to the Crowd Run 1080 50p series of x264 (CRF 0-30) encodes that I retained from the earlier metric tests; I had already collated the ffmpeg SSIM Y, U and V results.

I also wanted to look at the outcomes when the 8bit 4:2:0 chroma (of both the test and reference clips) is up-sampled to (YUV) 4:4:4 before testing, and whether converting the original chroma planes to Gray8 and then up-scaling to 1920 x 1080 produces similar results, which in theory it should.

I used Resize.Bicubic for both the chroma up-sampling and Gray8 up-scaling. Downsample=False was applied in the chroma plane tests. Here are the results:

http://i.imgur.com/SZ5NSCTm.png (https://imgur.com/SZ5NSCT)

The 'raw' muvsfunc SSIM scores obtained for the Y, U and V planes (top right chart) show a similar pattern to those produced by ffmpeg SSIM, but they are lower and proportionately more so in the lower bitrate range.

Looking at the results for the individual planes.

In the Y plane results I also included the scores obtained previously with 'Down-sample=True' i.e. the default settings. As seen in all of the test series, the initial 2x2 down-sampling always produces higher scores, and in this case higher than ffmpeg SSIM - I don't think the ffmpeg SSIM metric applies any internal down-sampling.

As for the U and V planes. First thing to note is that the upsampled (444) and upscaled (Gray8) scores are indeed very close. Secondly, the 'upsampled/upscaled' scores are higher than the 'raw' scores and also the ffmpeg SSIM scores. Which leaves me wondering....

I looked at the ffmpeg source. It's doing the fast version too - no gaussian kernels there. The total SSIM takes into account all the planes but the weighting is different, each plane is scaled by the resolution it has. So for example with YUV420 the color planes have 4 times smaller weight.

Does that mean that the reported (aggregate) scores for the individual Y, U and V planes are also weighted (i.e. scaled by the resolution each has) or is that weighting only being applied in calculating a total ('All') score ? Or is it simply that the muvsfunc SSIM is more accurate than the 'fast' ffmpeg implementation ?

Thinking about Poisondeathray's comments in the Zopti thread:

The idea of "weights" for a combined Y/U/V aggregate metric score should reflect that the Y plane should receive a proportionally higher weighting due to human perception - I think everyone will agree on the general idea, but might disagree on the actual formula for the weighting

All I'm saying is the relative weighting shouldn't change because of subsampling. Your perception of the proportion of black/white vs. color importance doesn't suddenly change if you watch a 4:4:4 video or a 4:2:0 video.

That lower quality of 4:2:0 should already be reflected in the lower U, V scores for 4:2:0. That up/down conversion is the "penalty" already incurred. The "different treatment" of chroma is exactly what you're measuring in the first place when you measure U-SSIM, V-SSIM, or U-PSNR, V-PSNR or whatever metric

Yes you can have other weights, other categories, combine/mix/match in any way you want, analyze it in whatever way you want, call it whatever you want; but either way you're measuring Y, U, V separately - so you should have the "raw" scores

I'm inclined to agree that, for the purpose of objective metric analysis at least, you want to have the raw scores for the individual Y, U and V planes. Personally, I don't see that much value in a composite 'Total' score, however derived. If there are differences, you want to know if they are occurring in the Luma (structural distortions) or the Chroma. A weighted 'Total' score might also obscure subtle differences that are occurring in the chroma. You can see that in the above ffmpeg SSIM results, at the lower bitrates especially; the weighted 'All' score curve shows a greater bias to the 'Y' plane curve, as you would expect.
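
A quick worked example of that masking effect, assuming the 4:2:0 'All' score weights Y, U and V at 4/6, 1/6 and 1/6 (my reading of the pixel-count weighting described earlier, not a confirmed formula) and using made-up plane scores:

```python
# ASSUMPTION: 4:2:0 weights proportional to plane pixel counts (Y, U, V).
WEIGHTS_420 = (4 / 6, 1 / 6, 1 / 6)


def total_score(y, u, v, weights=WEIGHTS_420):
    """Resolution-weighted composite of the three per-plane scores."""
    return weights[0] * y + weights[1] * u + weights[2] * v


# Illustrative scores only: same luma, but U is 0.03 worse in the second encode.
encode_a = total_score(0.98, 0.96, 0.96)
encode_b = total_score(0.98, 0.93, 0.96)
drop_in_total = encode_a - encode_b
print(drop_in_total)  # about one sixth of the 0.03 drop in U
```

Under those assumptions a chroma-only difference shows up in the composite at one sixth of its size, which is why the raw per-plane scores are the more useful thing to look at.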

That said, when it comes to 'perceptual quality' is there maybe a case for 'normalizing' the chroma to 444 ? After all, that is what we are looking at when viewing a video on a display. Is it more meaningful to 'normalize' the chroma before the metric is applied than to apply some compensatory weighting to the raw measures derived from the native chroma ? I'm not necessarily saying that's how it should be done - just some food for thought.

I also ran simultaneous GMSD tests but have yet to collate the data.