GMSD and SSIM Quality Metrics [Archive]


WorBry

13th February 2019, 21:31

I've been running some quality metric tests with VMAF and FFMPEG SSIM and PSNR.

https://forum.doom9.org/showthread.php?p=1864770#post1864770

While I still have the test files I'd like to see how the SSIM and GMSD metrics (in muvsfunc) compare.

The function descriptions state:

SSIM - 'The mean SSIM (MSSIM) index value of the distorted image will be stored as frame property 'PlaneSSIM' in the output clip'.

GMSD - 'The distortion degree of the distorted image will be stored as frame property 'PlaneGMSD' in the output clip. The value of GMSD reflects the range of distortion severities in an image.'

But I'm not clear how to access these results.

zorr

13th February 2019, 21:57

While I still have the test files I'd like to see how the SSIM and GMSD metrics (in muvsfunc) compare.

The function descriptions state:

SSIM - 'The mean SSIM (MSSIM) index value of the distorted image will be stored as frame property 'PlaneSSIM' in the output clip'.

GMSD - 'The distortion degree of the distorted image will be stored as frame property 'PlaneGMSD' in the output clip. The value of GMSD reflects the range of distortion severities in an image.'

But I'm not clear how to access these results.

The easiest way probably is to use zoptilib.py which is part of the Zopti optimizer (https://forum.doom9.org/showthread.php?t=176076). But you don't need to run Zopti at all, just your script which calls zoptilib.

Here's a simple example:

import vapoursynth as vs
from zoptilib import Zopti

core = vs.core

# read input video
orig = core.ffms2.Source(source=r'source.avi')

# initialize output file and chosen metrics
zopti = Zopti('results.txt', metrics=['ssim', 'gmsd'])

# ... process the video ...
# alternate = some_process(orig)

# measure similarity of original and alternate videos, save results to output file
zopti.run(orig, alternate)

The output file will contain frame number and the chosen metrics separated by ; and on the last line the sum of per frame values. The file will be written when all of the video frames have been processed (on the last frame).
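
For post-processing that file, something along these lines might do. It is only a sketch: it assumes every per-frame line is `frame;value;value...` with the values in the order the metrics were requested, and it hands back the final totals line untouched rather than parsing it, since its exact layout isn't spelled out here. The names `read_results` and `per_metric_mean` are made up for illustration and are not part of zoptilib.

```python
from statistics import mean


def read_results(lines, metrics):
    """Split Zopti-style result lines into per-frame values and the last line.

    Assumed layout (not verified against zoptilib): every line except the
    last is 'frame;metric1;metric2;...'; the last line holds the sums.
    """
    rows = [ln.strip() for ln in lines if ln.strip()]
    if not rows:
        return {}, None
    *frame_rows, totals_line = rows
    per_metric = {name: [] for name in metrics}
    for row in frame_rows:
        fields = row.split(';')
        # fields[0] is the frame number; the rest follow the metric order
        for name, value in zip(metrics, fields[1:]):
            per_metric[name].append(float(value))
    return per_metric, totals_line


def per_metric_mean(per_metric):
    """Average each metric over the frames that were read."""
    return {name: mean(vals) for name, vals in per_metric.items() if vals}
```

Typical use would be `read_results(open('results.txt'), ['ssim', 'gmsd'])`, then `per_metric_mean()` on the first returned item.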

The latest version of zoptilib is currently available here (https://pastebin.com/511BcmNp).

There's also the MDSI metric, but for that you need to upgrade to the latest muvsfunc version manually.

WorBry

13th February 2019, 23:57

Yes, that works. Thanks.

WorBry

17th February 2019, 03:11

I ran and added muvsfunc SSIM and GMSD to the first test series linked to above:

https://forum.doom9.org/showthread.php?p=1864770#post1864770

Crowd Run 1080/50p encoded to x264 with crf 0 to 30:

http://i.imgur.com/JhKH4Bv.png (https://imgur.com/JhKH4Bv)

http://i.imgur.com/IdYYRjL.png (https://imgur.com/IdYYRjL)

Interesting that the libvmaf, ffmpeg and muvsfunc SSIM implementations give rather different results:

http://i.imgur.com/BsGiz6q.png (https://imgur.com/BsGiz6q)

According to the documentation, the ffmpeg filter does apply the original SSIM algorithm, but to improve speed it uses the standard approximation of overlapped 8x8 block sums rather than the original Gaussian weights:

https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/vf_ssim.c

As described in the original SSIM paper, one problem with the moving 8 x 8 block computation is that the resulting SSIM index map often exhibits undesirable blocking artifacts. By modifying the local statistics with gaussian weights the "quality maps exhibit a locally isotropic property" - which I take to mean it smooths out blocking artifacts in the quality map.

Page 605 - http://www.compression.ru/video/quality_measure/ssim.pdf

The muvsfunc SSIM function does apply gaussian filtering, with a default standard deviation of 1.5 as per the original recommendation:

https://github.com/WolframRhodium/muvsfunc/blob/master/muvsfunc.py
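
To make the difference between the two windowing schemes concrete, here is a small pure-Python sketch of a Gaussian window with standard deviation 1.5 next to a uniform block window. The 11-tap size follows the 11x11 window of the original paper; since the muvsfunc description says its kernel size differs from the MATLAB one, treat `size=11` as an assumption. The function names are illustrative only and this is not muvsfunc's or ffmpeg's code.

```python
import math


def gaussian_kernel_1d(size=11, sigma=1.5):
    """Return a 1-D Gaussian window normalised to unit sum."""
    centre = (size - 1) / 2.0
    weights = [math.exp(-((i - centre) ** 2) / (2.0 * sigma ** 2))
               for i in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def box_kernel_1d(size=8):
    """Uniform weights, as in a plain block average."""
    return [1.0 / size] * size


def weighted_mean(samples, kernel):
    """Local mean of `samples` under the given window (same length required)."""
    if len(samples) != len(kernel):
        raise ValueError("samples and kernel must have the same length")
    return sum(s * w for s, w in zip(samples, kernel))
```

The 2-D window is the outer product of the 1-D window with itself. With the Gaussian, pixels near the window centre dominate the local statistics, whereas the box window weights every pixel in the block equally.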

Would that explain why the muvsfunc SSIM metric gives higher scores in this test series ?

Another factor might be whether preliminary downsampling is applied, as is recommended in the 'Suggested Usage':

https://ece.uwaterloo.ca/~z70wang/research/ssim/

There's no mention of downscaling in the FFMPEG documentation. The muvsfunc SSIM filter does apply downscaling by default:

downsample: (bool) Whether to average the clips over local 2x2 window and downsample by a factor of 2 before calculation. Default is True.

And apparently VMAF "includes an empirical downsampling process, as described at the Suggested Usage" in its elementary SSIM derivation:

https://github.com/Netflix/vmaf/issues/22

So why are the libvmaf SSIM scores even higher, if both are following the original code and applying the down-sampling process ?

The muvsfunc SSIM description does state, though, that it uses a different size of Gaussian kernel from the one in the original MATLAB code.

Note that the size of gaussian kernel is different from the one in MATLAB.

Could that explain the difference?

I have to say this leaves me in a quandary about which SSIM implementation to use when comparing the inherent quality efficiency of different video formats, especially at high bitrates - for example, when comparing 'visually lossless' intermediate codecs, where the interest is not in perceptual quality under certain viewing conditions but in preservation of structural fidelity. The ffmpeg SSIM metric gives a much wider spread of values which makes it easier to judge, with some confidence, that one video is of higher quality than another based on the difference of isolated scores (the last graph presented below shows that well), but is it valid as a quotable SSIM score ?

Is there a valid case for omitting the down-sampling step in the muvsfunc SSIM metric under such conditions ?

Anyhow, I also ran muvsfunc GMSD and SSIM on the parallel x265 series:

http://i.imgur.com/lxlsHIp.png (https://imgur.com/lxlsHIp)

Including the ffmpeg SSIM results made the graph too 'busy', so here they are separately:

http://i.imgur.com/gPsF3om.png (https://imgur.com/gPsF3om)

GMSD gave consistently higher scores for x265 over the same bitrate range. The SSIM metrics did also, but by a narrower margin at the higher bitrates:

http://i.imgur.com/l8kZ7Xl.png (https://imgur.com/l8kZ7Xl)

GMSD looks like it could be very useful. I've yet to test muvsfunc SSIM and GMSD on the 2160/50p Crowd Run x264 and x265 series.

If anyone's interested here's the original GMSD paper:

https://arxiv.org/pdf/1308.3052.pdf
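
For a feel of what the metric computes, here is a rough pure-Python sketch of the idea in that paper: gradient magnitudes from Prewitt filters, a per-pixel gradient magnitude similarity, and the standard deviation of that map as the score. Several things here are assumptions rather than a copy of any implementation: `c=170` is the constant as I understand it for 8-bit data, only interior pixels are filtered (no border padding), the population standard deviation is used, and the 2x downsampling step is left out.

```python
import math

PREWITT_X = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
PREWITT_Y = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]


def _filter3(img, kernel):
    """Apply a 3x3 filter (scaled by 1/3) to the interior pixels only."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            acc = 0.0
            for j in range(3):
                for i in range(3):
                    acc += kernel[j][i] * img[y + j - 1][x + i - 1]
            row.append(acc / 3.0)
        out.append(row)
    return out


def gradient_magnitude(img):
    """Per-pixel gradient magnitude from horizontal and vertical Prewitt responses."""
    gx = _filter3(img, PREWITT_X)
    gy = _filter3(img, PREWITT_Y)
    return [[math.hypot(a, b) for a, b in zip(rx, ry)]
            for rx, ry in zip(gx, gy)]


def gmsd(ref, dist, c=170.0):
    """Standard deviation of the gradient magnitude similarity map (0 = identical)."""
    m_r = gradient_magnitude(ref)
    m_d = gradient_magnitude(dist)
    gms = [(2.0 * a * b + c) / (a * a + b * b + c)
           for row_r, row_d in zip(m_r, m_d)
           for a, b in zip(row_r, row_d)]
    mean = sum(gms) / len(gms)
    return math.sqrt(sum((g - mean) ** 2 for g in gms) / len(gms))
```

Because the score is a deviation rather than a mean, it grows when the distortion is unevenly spread over the image, which fits the 'range of distortion severities' wording in the function description.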

Edit: Came across this article that quotes from an article by RealNetworks CTO Reza Rassool:

“if a video service operator were to encode video to achieve a VMAF score of about 93 then they would be confident of optimally serving the vast majority of their audience with content that is either indistinguishable from original or with noticeable but not annoying distortion.” So a 93 VMAF score is about the same as .95 for SSIM

https://streaminglearningcenter.com/learning/mapping-ssim-vmaf-scores-subjective-ratings.html

In the above tests - at a VMAF score of 93 the corresponding muvsfunc SSIM scores were around 0.96 - 0.97 and the ffmpeg SSIM scores were around 0.925 - 0.93. The libvmaf component SSIM scores however were way up at around 0.993, which surely suggests there's something more going

WorBry

17th February 2019, 17:40

I'd like to test muvsfunc SSIM with 'downsample=False' to see how the results compare. How can I change that setting so as to get the results through Zoptilib ?

import vapoursynth as vs
from zoptilib import Zopti

core = vs.core

# read input video
orig = core.ffms2.Source(source=r'source.avi')

# initialize output file and chosen metrics
zopti = Zopti('results.txt', metrics=['ssim', 'gmsd'])

# ... process the video ...
# alternate = some_process(orig)

# measure similarity of original and alternate videos, save results to output file
zopti.run(orig, alternate)

ChaosKing

17th February 2019, 19:45

I made an update https://github.com/theChaosCoder/zoptilib
You can use it like this
zopti = Zopti(output_file, metrics=['ssim', 'mdsi'])
zopti.addParams('ssim', dict(downsample=False, show_map=False))
zopti.addParams('mdsi', dict(down_scale=1))

WorBry

17th February 2019, 20:02

Brilliant. Haven't got around to looking at MDSI yet. It's RGB only though, isn't it ?

ChaosKing

17th February 2019, 21:06

Yes. Also use the latest version https://raw.githubusercontent.com/WolframRhodium/muvsfunc/master/muvsfunc.py

WorBry

18th February 2019, 18:39

I'd like to test muvsfunc SSIM with 'downsample=False' to see how the results compare.

I've done that with the x264 test series:

http://i.imgur.com/EfLsZmO.png (https://imgur.com/EfLsZmO)

http://i.imgur.com/IcqcDXH.png (https://imgur.com/IcqcDXH)

Clearly removing the downsampling step has a profound effect, producing a much wider spread of scores - even more so than ffmpeg SSIM - although up at around CRF=2 (400 - 425 Mbps), the scores start to approach those obtained with downsampling applied, and lossless is still reported as such.

http://i.imgur.com/gxaKZFN.png (https://imgur.com/gxaKZFN)

As described in the 'Suggested Usage' for the original (Matlab) SSIM code, the purpose of the downsampling is to compensate for viewing the image at a typical distance from the screen:

The above (ssim_index.m) is a single scale version of the SSIM indexing measure, which is most effective if used at the appropriate scale. The precisely “right” scale depends on both the image resolution and the viewing distance and is usually difficult to be obtained. In practice, we suggest to use the following empirical formula to determine the scale for images viewed from a typical distance (say 3~5 times of the image height or width): 1) Let F = max(1, round(N/256)), where N is the number of pixels in image height (or width); 2) Average local F by F pixels and then downsample the image by a factor of F; and 3) apply the ssim_index.m program. For example, for an 512 by 512 image, F = max(1, round(512/256)) = 2, so the image should be averaged within a 2 by 2 window and downsampled by a factor of 2 before applying ssim_index.m.

http://www.cns.nyu.edu/~lcv/ssim/
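
The empirical rule quoted above is easy to work through in code. This is a plain-Python sketch, not the MATLAB original: the function names are made up, halves are rounded up (which matches MATLAB's `round` for positive values, unlike Python's built-in `round`), and the averaging is done over non-overlapping F x F blocks, which is a simplification of 'average, then downsample'.

```python
import math


def ssim_scale_factor(n):
    """F = max(1, round(N / 256)), with halves rounded up."""
    return max(1, int(math.floor(n / 256.0 + 0.5)))


def average_downsample(img, f):
    """Average non-overlapping f x f blocks of a 2-D list of pixel values.

    Trailing rows/columns that do not fill a whole block are dropped.
    """
    h, w = len(img) // f, len(img[0]) // f
    out = []
    for by in range(h):
        row = []
        for bx in range(w):
            total = sum(img[by * f + j][bx * f + i]
                        for j in range(f) for i in range(f))
            row.append(total / (f * f))
        out.append(row)
    return out
```

Taking N as the image height, the rule gives F = 2 for 512, F = 4 for 1080 and F = 8 for 2160, so a fixed 2x2 averaging step is milder than the formula would suggest for HD and UHD frames.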

In other words it is a perceptual quality modifier. Whether it's valid to remove that step when using SSIM to compare video images for structural differences that exceed visual acuity (i.e. independent of perceived quality) I'm still not sure.

WorBry

18th February 2019, 19:58

Thought it might be interesting to see how the AVISynth SSIM filter compares also. This plugin has a rather nebulous history going back to the original implementation by LeFungus in 2003:

https://forum.doom9.org/showthread.php?t=61128

His last update was version 0.24, although the results log still reports it as 0.23.

It appears the plugin then received further fixes and modifications made by others, but in the absence of associated documentation, it is not clear exactly what changes were made.

This thread attempted to make sense of it:

https://forum.doom9.org/showthread.php?p=1089303#post1089303

I decided to test both the original (as assumed) 'LeFungus' 0.24 version and the 0.25.1 version posted by Mitsubishi in that thread. They produced identical results:

http://i.imgur.com/OnifUFL.png (https://imgur.com/OnifUFL)

Wow, very different from the other SSIM implementations, with the CRF=30 x264 encode scoring way down at 33 (0.33).
Yet, according to LeFungus, it was developed from the original code.

I suspect this stems from the 'luma masking' parameter that was given as an option (Default: True) in the original (LeFungus) 0.24 plugin. In 0.25.1 that option is not accessible, as such (returns an error), but since 0.25.1 produced identical results, it's reasonable to assume that 'Luma Masking' was being applied.

Possibly it equates with the luminance normalization filtering that is applied in the original SSIM algorithm ? In the AVISynth plugin it is applied as a weighting:

This filter is designed to compute an SSIM value by two methods, the original one, and a "enhanced" one that weight these results by lumimasking........In the csv file, when lumimasking is activated, both SSIM values and its weight is written.

https://avisynth.org.ru/docs/english/externalfilters/ssim.htm

Unfortunately, I only recorded the final aggregate SSIM score reported in the log file and didn't generate the csv file that lists the individual frame scores and weightings. I'll maybe re-run some tests to see what difference the weightings made. But really, these results and the vagaries surrounding this plugin don't exactly instill confidence in its use.

ChaosKing

18th February 2019, 20:14

So basically every ssim implementation gave different results... which one can we trust (more)?

WorBry

18th February 2019, 20:51

Quite so ! Granted the tests were conducted with just one source clip (CrowdRun) - although a good one at that - high quality/complexity/motion/hard to compress.

The results as they stand leave me more inclined to use muvsfunc SSIM as a 'definitive' implementation of the original code. Would be nice if there were an AVISynth(+) implementation of the muvsfunc SSIM filter.

Still don't understand though why the libvmaf derived SSIM figures are so much higher. Is it down to difference in Gaussian kernel size or are the reported elementary SSIM scores being further weighted by the VMAF 'model' in some way (before the final VMAF calculation, that is) ?

As for MDSI - results to follow.

zorr

18th February 2019, 21:39

I looked at sources of muvsfunc SSIM and Avisynth's SSIM (v0.25.1 by Mitsubishi).

The Avisynth version is not doing the gaussian kernel at all - it's implemented using summed area tables. That's a faster but lower quality way to calculate it, the MSU Quality measurement tool page (http://www.compression.ru/video/quality_measure/info.html#ssim) has an example of the difference.

Also muvsfunc returns SSIM calculated on one plane only, by default the luma. Avisynth SSIM has a plane argument which defaults to 0 and then it returns a weighted sum of the luma and chroma channels:
(0.8 * Y) + (0.1*(U+V))

And yes, Avisynth has the lumimask but it's disabled in the code. Muvsfunc has the variables k1 and k2, but at least they default to same values as the ones used in Avisynth version.
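
For reference, this is where k1 and k2 enter the calculation. The sketch below computes the SSIM index for a single pair of windows using uniform weights; it follows the formula from the original SSIM paper and is not taken from the muvsfunc or Avisynth sources. The function name and the flat-list input are illustrative choices.

```python
def ssim_window(x, y, k1=0.01, k2=0.03, peak=255.0):
    """SSIM index for one pair of equally sized windows given as flat lists.

    Uniform weighting is used here; a Gaussian-weighted variant would replace
    the plain means, variances and covariance with weighted ones.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("windows must be non-empty and of equal size")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilising constants derived from k1, k2 and the peak sample value
    c1 = (k1 * peak) ** 2
    c2 = (k2 * peak) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

The mean SSIM for a frame is then the average of this value over all window positions, which is where the block versus Gaussian windowing and any downsampling change the result.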

So there are quite a few ways to make the implementations differ, I guess there are similar small differences between the other implementations.

My opinion is that the default muvsfunc SSIM downsampling is not useful when comparing the quality of different script settings (like in the Zopti optimizer).

WorBry

18th February 2019, 22:08

Thanks for the insights. That explains a lot.

Also muvsfunc returns SSIM calculated on one plane only, by default the luma.

So presumably libvmaf is doing the same ?

Avisynth SSIM has a plane argument which defaults to 0 and then it returns a weighted sum of the luma and chroma channels:
(0.8 * Y) + (0.1*(U+V))

Is that how ffmpeg calculates an aggregate 'All' SSIM score also ? Even if it's not applying Gaussian weights, obtaining individual scores for the Luma and U, V channels can be useful in assessing whether losses are occurring in the chroma only - for examining chroma subsampling efficiencies etc.

My opinion is that the default muvsfunc SSIM downsampling is not useful when comparing the quality of different script settings (like in the Zopti optimizer).

Yes, I don't see there's anything to be gained in that context.

WorBry

18th February 2019, 23:38

Question:

For conducting these tests I've been using VirtualDub2 to run the VS scripts and generate the result files.

I'd like to change to KingChaos's Portable (Flatpack) version in future. The changelog for the next update (2019-02-xx) promises to include:

- Add VFW "install" script, so that VDub and co can read vpy files

https://forum.doom9.org/showthread.php?t=175529

Meanwhile, for running MDSI (and possibly Butteraugli) scripts, how do I vspipe the RGB24 output to ffmpeg as a null operation, purely to generate the results files ?

I suppose I could use VSEditor > Preview in place of VirtualDub2 but I can't see how to stop the playback looping when it comes to the end of the clip.

ChaosKing

18th February 2019, 23:48

I'd like to change to KingChaos's Portable (Flatpack) version in future. The changelog for the next update (2019-02-xx) promises to include:

I think I'm trapped in an alternate reality.
You can use the reg file for now https://forum.doom9.org/showthread.php?p=1864051#post1864051

I suppose I could use VSEditor > Preview in place of VirtualDub2 but I can't see how to stop the playback looping when it comes to the end of the clip.

Use the Benchmark (F7) instead.

WorBry

19th February 2019, 00:28

I think I'm trapped in an alternate reality

Really, what's the weather like there ?

Use the Benchmark (F7) instead.

That's the one. Thanks.

You can use the reg file for now https://forum.doom9.org/showthread.php?p=1864051#post1864051

Hadn't seen your other post about the reg edit. So is that basically what the 'VFW "Install" Script' will be doing ?

WorBry

19th February 2019, 03:53

I looked at sources of.....Avisynth's SSIM (v0.25.1 by Mitsubishi).

...And yes, Avisynth has the lumimask but it's disabled in the code.

That's odd - I went back to check and re-run some of the tests with v0.24 and v0.25.1. The results I posted above were definitely with Lumimask=True applied in v0.24, and v0.25.1 gave the same results. You can see both the 'original' and weighted ('enhanced') SSIM scores displayed on the output frames as the script is played through VDub2 and the per-frame scores are listed in separate columns in the generated csv file. The text file however only gives the 'global' weighted score.

The Lumimask parameter may be disabled as an option in v0.25.1, but it's definitely being applied.

That said, setting Lumimask=False in v0.24 didn't radically change the results. I only ran a couple of tests:

         Lumimask=True   Lumimask=False
CRF0     100             100
CRF1     98.50           98.36
CRF12    87.17           86.64
CRF30    33.80           34.09

ChaosKing

19th February 2019, 14:06

Really, what's the weather like there ?

Cloudy with a Chance of Meatballs :p

Hadn't seen your other post about the reg edit. So is that basically what the 'VFW "Install" Script' will be doing ?

Yes, it will add it to the registry with the correct path.

WorBry

19th February 2019, 17:13

As for MDSI - results to follow.

The MDSI and GMSD results for the x264 and x265 series:

http://i.imgur.com/3f9NgNc.png (https://imgur.com/3f9NgNc)

http://i.imgur.com/CI2nHXH.png (https://imgur.com/CI2nHXH)

Interesting that the MDSI scores show a fairly linear relation with bitrate plotted as base 2 log. Encoding x264 at any fractional CRF value <1 of course defaults to lossless High444Predictive. Interesting also that the difference between the (bitrate matched) x264 and x265 score plots is fairly constant down to around 24 Mbps.
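
One way to put a number on 'fairly linear' would be a least-squares fit of score against log2(bitrate). The helper below is a generic sketch with a made-up name and was not used to produce the graphs above. It returns the slope, the intercept and r squared, so a value of r squared close to 1 would back up the visual impression.

```python
import math


def fit_log2_line(bitrates, scores):
    """Least-squares fit of score = slope * log2(bitrate) + intercept.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log2(b) for b in bitrates]
    n = len(xs)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two (bitrate, score) pairs")
    mx = sum(xs) / n
    my = sum(scores) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, scores))
    syy = sum((y - my) ** 2 for y in scores)
    if sxx == 0:
        raise ValueError("need at least two distinct bitrates")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared
```

The slope then reads as the change in score per doubling of bitrate.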

Those MDSI results were with downscale applied i.e.

zopti.addParams('mdsi', dict(down_scale=2))

With downscale turned off (the default)...

zopti.addParams('mdsi', dict(down_scale=1))

....the scores are lower and lose the linear relationship at the higher bitrates.

http://i.imgur.com/cHIFRD6.png (https://imgur.com/cHIFRD6)

For those interested, here's the original paper for the MDSI (Mean Deviation Similarity Index) metric.

https://arxiv.org/pdf/1608.07433.pdf

This metric pools combined image gradient (sensitive to structural distortions) and chromaticity similarity maps.

zorr

19th February 2019, 22:08

Is that how ffmpeg calculates an aggregate 'All' SSIM score also ?

I looked at the ffmpeg source. It's doing the fast version too - no gaussian kernels there. The total SSIM takes into account all the planes but the weighting is different, each plane is scaled by the resolution it has. So for example with YUV420 the color planes have 4 times smaller weight. The constants 0.01 and 0.03 appear in the code and I assume that means the weights k1 and k2 are the same as in other implementations.
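
To put the two aggregation schemes side by side, here is a small sketch. The first function is the fixed Avisynth weighting quoted earlier, (0.8 * Y) + (0.1*(U+V)); the second weights each plane by its pixel count, which is how I read the ffmpeg behaviour described above. Neither is lifted from the actual sources, and the function names are made up.

```python
def avisynth_style_ssim(y, u, v):
    """Fixed weighting: 0.8 for luma, 0.1 for each chroma plane."""
    return 0.8 * y + 0.1 * (u + v)


def resolution_weighted_ssim(plane_scores, plane_sizes):
    """Weight each plane's score by its pixel count.

    plane_sizes is a list of (width, height) tuples, one per plane.
    """
    areas = [w * h for w, h in plane_sizes]
    total = sum(areas)
    return sum(s * a for s, a in zip(plane_scores, areas)) / total
```

For a 1920x1080 YUV420 frame the second scheme works out to weights of 4/6 for Y and 1/6 each for U and V, so chroma counts for a third of the total rather than the fifth given by the fixed weighting.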

The Lumimask parameter may be disabled as an option in v0.25.1, but it's definitely being applied.

You're right, or should I say we're both right. :) The lumimask is disabled in the Get_SSIM_Frame() function which is the one used by Zopti (it can return the SSIM value to the calling script). But the other function which saves the file and the one you used has the lumimask functionality.

The MDSI and GMSD results for the x264 and x265 series:

Interesting results. Are the x264 and x265 results swapped in the second chart, it has x264 with better scores?

Note that if the GMSD results were calculated with the version before ChaosKing's addParams()-function it had 2x downsampling enabled for GMSD as well.

WorBry

20th February 2019, 01:12

I looked at the ffmpeg source. It's doing the fast version too - no gaussian kernels there. The total SSIM takes into account all the planes but the weighting is different, each plane is scaled by the resolution it has. So for example with YUV420 the color planes have 4 times smaller weight. The constants 0.01 and 0.03 appear in the code and I assume that means the weights k1 and k2 are the same as in other implementations.

Thanks for looking at that. Shame the free version of the MSU VQM tool is limited to SD resolution and the demo of Pro version doesn't guarantee that the results will be correct; seems an odd way to promote a product - the statistical equivalent of a water mark, I guess. It would be interesting to compare the results otherwise.

A few years back someone put together a VQA tool in Java (jVQA) inspired by the MSU VQA:

https://forum.doom9.org/showthread.php?t=172876&highlight=GMSD+SSIM+Java

Looks like it didn't get beyond beta development.

Are the x264 and x265 results swapped in the second chart, it has x264 with better scores?
No, they are the right way round. It's just that the second graph plots the scores against the encode CRF and the resulting x264 bitrates were appreciably higher (around 30-35 %) than x265 at the same CRF setting, so it biases the plots that way. I only included it really to show why the x264 MDSI score goes sharply from 9.8 (at CRF=1) to 0 (at CRF=0) and there are no in-between data points.

For comparing x264 and x265 the first graph with the scores plotted against bitrate is the one to look at.

Note that if the GMSD results were calculated with the version before ChaosKing's addParams()-function it had 2x downsampling enabled for GMSD as well.

They were. I might go back and re-run the GMSD tests with down-sampling disabled, for completeness. Just finishing off testing Butteraugli.

WorBry

20th February 2019, 18:50

And ButtUgly...I mean, Butteraugli:

http://i.imgur.com/KfySsp9.png (https://imgur.com/KfySsp9)

Somewhat at odds with the results of the SSIM, GMSD and MDSI testing. Down at the low bitrates it reports x264 and x265 to be pretty much on par. Increasing the bitrate it gives the edge to x264, but then around the 170 Mbps mark (around CRF=10 for x264 and CRF=8 for x265) it switches and starts to score x265 at higher quality:

http://i.imgur.com/IA2vXEG.png (https://imgur.com/IA2vXEG)

That said, the butteraugli developers do stress that this metric was tuned for comparing images "in the domain of barely noticeable differences" ...."We don't know how well butteraugli performs with major deformations -- we have mostly tuned it within a small range of quality, roughly corresponding to jpeg qualities 90 to 95."

https://opensource.google.com/projects/butteraugli

https://github.com/google/butteraugli

I'm not sure how that equates to x264 and x265 quality in CRF mode, but I think that "switch-over" point where x265 starts to get the edge is probably significant.

And you can see it reflected in the Butteraugli quality 'heat' maps. Really starts to warm up around the CRF 10-12 (x264) mark and by CRF 30 it is a veritable furnace.

http://i.imgur.com/rffEp4Qm.jpg (https://imgur.com/rffEp4Q)

After opening image link, click on (+) cursor to enlarge

At some point I'll pull up quality maps from other tests to see what differences each is picking up, in the high bitrate 'visually lossless' range especially.

WorBry

21st February 2019, 03:17

Note that if the GMSD results were calculated with the version before ChaosKing's addParams()-function it had 2x downsampling enabled for GMSD as well.

They were. I might go back and re-run the GMSD tests with down-sampling disabled, for completeness.

GMSD on the x264 series with and without downsampling:

http://i.imgur.com/nCkM0Ug.png (https://imgur.com/nCkM0Ug)

zorr

21st February 2019, 22:18

Looks like Butteraugli is not useful for the whole quality range just like its developer suspected.

Interesting that GMSD without downsampling has the base 2 log - linear behavior a bit like MDSI when it was downsampled.

WorBry

22nd February 2019, 17:24

Looks like Butteraugli is not useful for the whole quality range just like its developer suspected.

The results, with that one source at least, do leave me with that impression. I searched for other instances where butteraugli has been used to compare x264 and x265. The only one I could find was:

https://encode.ru/threads/2811-Psychovisual-measurements-on-modern-image-codecs

I've yet to look at the uploaded results file in detail, but the comment by the second poster, Jyrki Alakuijala (who is the butteraugli developer), is of note:

Digging out some data from the test results to support the question:

x264 233464 bytes, butteraugli = 0.948613, ssimulacra = 0.00761919
x265 241125 bytes, butteraugli = 1.689265, ssimulacra = 0.01749584

x265 gives significantly worse results than x264. Perhaps someone who knows ffmpeg flags can help.

Also, here jpeg for reference (also digged from the result_corpus.zip). In this test x264 is a lot better than jpeg, and x265 worse?!:

libjpeg/q93/yuv444 235122, butteraugli = 1.467834, ssimulacra = 0.01439446

Interesting that GMSD without downsampling has the base 2 log - linear behavior a bit like MDSI when it was downsampled.

I have no idea why. Must surely be in the scaled vs non-scaled computations.

zorr

23rd February 2019, 01:38

Hopefully you remembered to run the Butteraugli tests in linear RGB. :)

WorBry

23rd February 2019, 01:56

WorBry

23rd February 2019, 06:32

Meanwhile, I ran muvsfunc SSIM and GMSD on the 2160/50p Crowd Run x264 and x265 series that I tested in the other thread:

https://forum.doom9.org/showthread.php?p=1865316#post1865316

http://i.imgur.com/BPrFZvQ.png (https://imgur.com/BPrFZvQ)

Note that the shapes of the muvsfunc and ffmpeg SSIM curves are different from those in the 1080/50p series - the scores dip more in the mid bitrate range. I brought attention to that (with ffmpeg SSIM) in the other thread:

http://i.imgur.com/cHsJnCW.png (https://imgur.com/cHsJnCW)

https://forum.doom9.org/showpost.php?p=1865424&postcount=56

More striking though are the GMSD results. Whereas the 1080/50p series gave consistently higher GMSD scores for x265 over the entire (CRF 0-30) range, here the x265 and x264 curves converge over the lower (48 - 96 Mbps) range.

At the higher bitrates however, GMSD still clearly favours x265:

http://i.imgur.com/pJ8H1ch.png (https://imgur.com/pJ8H1ch)

It could be said that the SSIM and GMSD results are somewhat complementary in this case.

I haven't tested MDSI.

ChaosKing

23rd February 2019, 23:02

Could you also test a recent rav1e build? --tune Psychovisual is now default. VMAF and GMSD would be interesting.

WorBry

24th February 2019, 02:32

I could look at it once I'm done testing with these x264 and x265 files, but I have to say that I have zero experience encoding AV1 - so you'd need to give me a command line with the parameters for ramping over a comparable test bitrate range.

zorr

24th February 2019, 15:42

Err, no I didn't. Didn't realize I needed to. I just converted to RGB24 with Rec709 matrix:

https://github.com/fdar0536/VapourSynth-butteraugli

I see you converted to linear RGB in your Zopti studies:

https://forum.doom9.org/showthread.php?p=1865218#post1865218/

But I don't understand why one would need to in this context. Doesn't butteraugli internally convert sRGB to Linear anyway ?

It's not really well documented but I found this in the source header (https://github.com/google/butteraugli/blob/master/butteraugli/butteraugli.h):

// Value of pixels of images rgb0 and rgb1 need to be represented as raw
// intensity. Most image formats store gamma corrected intensity in pixel
// values. This gamma correction has to be removed, by applying the following
// function:
// butteraugli_val = 255.0 * pow(png_val / 255.0, gamma);
// A typical value of gamma is 2.2. It is usually stored in the image header.
// Take care not to confuse that value with its inverse. The gamma value should
// be always greater than one.
// Butteraugli does not work as intended if the caller does not perform
// gamma correction.

This "raw intensity" I interpreted as linear RGB, I could be wrong though. But it states clearly that the caller must perform the correction.
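
The function in that header comment is simple enough to write out. This is just the formula as quoted, applied to a single 8-bit value; the function name is made up, and how (or whether) the VapourSynth plugin does the equivalent internally is a separate question.

```python
def to_raw_intensity(png_val, gamma=2.2):
    """butteraugli_val = 255.0 * pow(png_val / 255.0, gamma), per the header comment."""
    if not 0.0 <= png_val <= 255.0:
        raise ValueError("expected an 8-bit sample value in [0, 255]")
    return 255.0 * pow(png_val / 255.0, gamma)
```

Note that a pure 2.2 power curve is only an approximation of the sRGB transfer function, which has a short linear segment near black, so a proper sRGB-to-linear conversion will differ slightly in the shadows.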

ChaosKing

24th February 2019, 19:13

The command line looks like this:
vspipe.exe script.vpy - --y4m | rav1e.exe - --output out.ivf
https://github.com/xiph/rav1e/releases

The only interesting parameters are:
--quantizer <QP> Quantizer (0-255), smaller values are higher quality [default: 100]
-b, --bitrate <BITRATE> Bitrate (kbps)
-s, --speed <SPEED> Speed level (0 is best quality, 10 is fastest) [default: 3]
You can mux ivf to mkv. The ffms2 version in the portable fatpack can decode av1.

For me it would be interesting to see how the codecs perform on a lower bitrate.
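
If it helps with generating a series, the command above could be scripted roughly like this. It only uses the options listed in this post (`--quantizer`, `--speed`, `--output`) plus the vspipe arguments as given; the helper names and the `out_q<N>.ivf` file naming are made up, and nothing here has been checked against a particular rav1e build.

```python
import subprocess


def rav1e_pipeline(script, quantizer, output, speed=3):
    """Build the two argument lists for 'vspipe | rav1e' at one quantizer."""
    vspipe_cmd = ["vspipe.exe", script, "-", "--y4m"]
    rav1e_cmd = ["rav1e.exe", "-", "--output", output,
                 "--quantizer", str(quantizer), "--speed", str(speed)]
    return vspipe_cmd, rav1e_cmd


def quantizer_ramp(script, quantizers, speed=3):
    """One (vspipe_cmd, rav1e_cmd) pair per quantizer value (0-255)."""
    jobs = []
    for q in quantizers:
        if not 0 <= q <= 255:
            raise ValueError("quantizer must be in 0-255")
        # Hypothetical output naming, one file per quantizer
        jobs.append(rav1e_pipeline(script, q, f"out_q{q}.ivf", speed))
    return jobs


def run_pipeline(vspipe_cmd, rav1e_cmd):
    """Pipe vspipe's stdout into rav1e's stdin and wait for both to finish."""
    producer = subprocess.Popen(vspipe_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(rav1e_cmd, stdin=producer.stdout)
    producer.stdout.close()  # let vspipe see a broken pipe if rav1e exits early
    consumer.wait()
    producer.wait()
    return consumer.returncode
```

Something like `for job in quantizer_ramp('script.vpy', range(60, 201, 20)): run_pipeline(*job)` would then produce the series, with the quantizer range picked to cover the bitrates of interest.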

WorBry

24th February 2019, 20:24

It's not really well documented but I found this in the source header (https://github.com/google/butteraugli/blob/master/butteraugli/butteraugli.h):

// Value of pixels of images rgb0 and rgb1 need to be represented as raw
// intensity. Most image formats store gamma corrected intensity in pixel
// values. This gamma correction has to be removed, by applying the following
// function:
// butteraugli_val = 255.0 * pow(png_val / 255.0, gamma);
// A typical value of gamma is 2.2. It is usually stored in the image header.
// Take care not to confuse that value with its inverse. The gamma value should
// be always greater than one.
// Butteraugli does not work as intended if the caller does not perform
// gamma correction.

This "raw intensity" I interpreted as linear RGB, I could be wrong though. But it states clearly that the caller must perform the correction.

Well, I dunno. If that is the case it should be made clear in the VapourSynth butteraugli plugin usage. Even the given example has:

import mvsfunc as mvf

clipa = core.std.Trim(src1, 0, 0)
clipa = mvf.ToRGB(clipa, depth=8)
clipb = core.std.Trim(src2, 0, 0)
clipb = mvf.ToRGB(clipb, depth=8)

diff = core.Butteraugli.butteraugli(clipa, clipb)

https://github.com/fdar0536/VapourSynth-butteraugli

I suppose I could retest but this really deserves some clarification.

WorBry

24th February 2019, 20:42

ChaosKing

24th February 2019, 22:19

Seems that Bitrate was added but not released yet https://github.com/moisesmcardona/rav1e_gui/issues/1

zorr

24th February 2019, 23:10

Well, I dunno. If that is the case it should be made clear in the VapourSynth butteraugli plugin usage.
...
I suppose I could retest but this really deserves some clarification.

I created an issue (https://github.com/google/butteraugli/issues/53) and asked about the input format. We also need to find out whether Vapoursynth-Butteraugli does it automatically.

ChaosKing

24th February 2019, 23:20

Found this https://github.com/fdar0536/VapourSynth-butteraugli/blob/master/vsbutteraugli.cpp#L98

WorBry

25th February 2019, 00:49

I haven't tested MDSI.

I have now:

http://i.imgur.com/YNOJ0i6.png (https://imgur.com/YNOJ0i6)

Compared with the 1080 50p series:

http://i.imgur.com/3f9NgNc.png (https://imgur.com/3f9NgNc)

x265 does still score marginally higher at high bitrates but we're looking at a difference of 0.1 units - or rather 0.001 in actual scores (x100).

http://i.imgur.com/inNFUxj.png (https://imgur.com/inNFUxj)

And then there's that abrupt upturn in the x264 score going from 20.5 at CRF=1 (2098 Mbps) to 0 for lossless CRF=0 (2566 Mbps). An even larger jump than seen in the 1080 50p series. Makes me wonder if there is a bitrate saturation point for this metric or if it's like ffmpeg PSNR, which has no saturation point and absolute lossless is reported as infinity (Inf).
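To illustrate why a PSNR-style metric has no finite value for a lossless encode, here is a minimal sketch of the textbook PSNR definition on flat lists of 8-bit sample values. It is not ffmpeg's implementation, and the function name is hypothetical; the point is only that a mean squared error of zero puts a zero in the denominator.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Textbook PSNR in dB for two equal-length sequences of sample values."""
    if len(reference) != len(distorted):
        raise ValueError("sequences must have the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        # Identical inputs (lossless): the ratio peak^2 / mse is unbounded.
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


print(psnr([10, 20, 30], [10, 20, 30]))   # identical samples
print(psnr([10, 20, 30], [11, 20, 29]))   # small differences
```

Whether MDSI behaves the same way near lossless is not something this sketch can answer; it only shows the PSNR side of the comparison.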

Tried testing x264 (1080 50p) encoded at increasing 2-pass bitrates to see what more I could glean, but the maximum bitrate attainable was 615 Mbps:

http://i.imgur.com/QjFgFDM.png (https://imgur.com/QjFgFDM)

The MDSI score (with downsampling) decreased by just 0.02 units (actual 0.0002) going from 400 Mbps (18.30) to 615 Mbps (18.28).

WorBry

25th February 2019, 01:00

Found this https://github.com/fdar0536/VapourSynth-butteraugli/blob/master/vsbutteraugli.cpp#L98

So it is internally converting sRGB to Linear then ?

WorBry

25th February 2019, 18:21

I see there's a GUI for rav1e....
https://github.com/moisesmcardona/rav1e_gui

I've checked out the rav1e GUI (v1.8r2) and just can't get an encoded AV1 file (webm or mkv) out of it.

Tried encoding the CrowdRun 1080/50p 'master' (lossless, intra x264) clip at the default settings - Quantizer 100, Speed 3, Quality Tuning - PSNR. It proceeded to encode (slowly) and finally declared 'Finished', but there was no output file to be found.

I'll maybe try the command-line route. Otherwise, if you want to run the tests with that particular source, I used the 2160/50p (8bit 420) y4m version of Crowd Run from:

https://media.xiph.org/video/derf/

....as the source/reference for the 2160/50p tests. For the 1080/50p tests, I converted to lossless, intra x264:

ffmpeg -i {Path}:/crowd_run_2160p50.y4m -vf scale=1920:-1 -vcodec libx264 -intra -preset slow -qp 0 {Path}:/crowd_run_1080p50_x264_lossless.mp4

Edit: Well I got the command line rav1e working (with default settings - I set --tune Psychovisual just to be sure) but I can see it will take a long time to generate a series of test encodes. If you want to run the encodes and send me the files, or run the metric tests as well and send me the results (scores & bitrates), I could combine them with my x264/x265 data if you like.

zorr

25th February 2019, 21:33

So it is internally converting sRGB to Linear then ?

Yes, indeed it is. It's assuming the input is sRGB and always does the conversion. No need to do retests with Butteraugli.

I also got a response from Jyrki, the author of Butteraugli:

Is this correct and a conversion to "raw intensity" is needed for the input images?
Yes.
And if so is the result of the above mentioned gamma correction that the input is in linear RGB? So for example when applied to standard definition video with matrix '601' the frames should be converted to linear RGB before applying Butteraugli? Thanks!
You need to convert into linear RGB light where RGB are sRGB. These values may be (slightly) negative for wide gamut use. The normalization value of 255 corresponds roughly to 120-200 nits.

WorBry

25th February 2019, 22:08

That's good to know.

WorBry

26th February 2019, 18:06

I've checked out the rav1e GUI (v1.8r2) and just can't get an encoded AV1 file (webm or mkv) out of it.

Tried encoding the CrowdRun 1080/50p 'master' (lossless, intra x264) clip at the default settings - Quantizer 100, Speed 3, Quality Tuning - PSNR. It proceeded to encode (slowly) and finally declared 'Finished', but there was no output file to be found.

Appears that the issue was with the final muxing (to mkv or webm) - if I uncheck Remove Temporary Files, the concatenated ivf file is there in the Temp Folder.

ChaosKing

26th February 2019, 18:47

I will upload some encodes soon. I made encodes from -q 100 - 200. Bitrate mode looked very ugly within the first 2 frames so I dropped it.

WorBry

26th February 2019, 19:19

Sounds good. Having solved the rav1e GUI issue (i.e. not removing the temp ivf files), I've run a couple of tests - I wasn't sure if it's best to disable Low Latency, which slows things down more. I was thinking I might have to set up for encoding on another PC so I can get other stuff done, but if you've already produced test files, all the better.

Edit: Quick test with the rav1e GUI and Crowd Run 1080/50p source - Quantizer 140, Speed 3, tune Psychovisual

'Low Latency' enabled - Bitrate 20.7 Mbps, VMAF 93.21
'Low Latency' disabled - Bitrate 15 Mbps, VMAF 88.75

So yes I think -q100 - 200 will be a good test range.

I'll run some additional x264/x265 CRF encode tests in the lower bitrate range.

ChaosKing

26th February 2019, 20:51

Here are my rav1e 1080/50p encodes: https://www.dropbox.com/s/vnx69xt3wbzn4ot/rav1e_crowd.zip?dl=1
Used rav1e.exe from here https://github.com/xiph/rav1e/releases/tag/20190219

WorBry

26th February 2019, 23:10

Great. I'll try and run the tests this evening.

Edit: It will take me a bit as I'll have to extend the x264/x265 series down to around CRF38 to match the low rav1e bitrates - q200 is just 2655 Kbps
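A series like this can be scripted. The following is a rough sketch that only builds the ffmpeg command lines for a CRF sweep, without running them. The output naming pattern and the CRF range are assumptions for illustration, not the settings actually used in these tests; `-c:v` and `-crf` are the usual ffmpeg options for libx264/libx265.

```python
def crf_sweep_commands(source, encoder, crf_values,
                       out_pattern="{encoder}_crf{crf}.mkv"):
    """Build one ffmpeg argument list per CRF value (nothing is executed)."""
    commands = []
    for crf in crf_values:
        output = out_pattern.format(encoder=encoder, crf=crf)
        commands.append(
            ["ffmpeg", "-i", source, "-c:v", encoder, "-crf", str(crf), output]
        )
    return commands


# Example: extend an x265 series down to CRF 38 in steps of 2 (assumed range).
for cmd in crf_sweep_commands("crowd_run_1080p50_x264_lossless.mp4",
                              "libx265", range(30, 39, 2)):
    print(" ".join(cmd))
```

Each list could then be passed to `subprocess.run` once the settings have been checked.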

WorBry

27th February 2019, 19:13

The results.

I ran the entire series of x264 and x265 encodes afresh with the last Zeranoe nightly build (ffmpeg-20190225-f948082-win64-static) for good measure. They were encoded with the default CRF settings. What were the rav1e encode settings, btw ?

I used VapourSynth-VMAF version r3 in model=0 (vmaf_v0.6.1.pkl) mode. 'Downsample' was applied in the muvsfunc SSIM and GMSD tests.
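For anyone wanting to reproduce the SSIM and GMSD part, here is a minimal sketch of reading the per-frame 'PlaneSSIM' and 'PlaneGMSD' properties and averaging them. The helper names are hypothetical, and the `downsample=True` keyword is my assumption about the muvsfunc call signatures; check it against your installed version.

```python
def collect_scores(clip, prop_name):
    """Read one frame property from every frame of a VapourSynth-style clip."""
    return [clip.get_frame(n).props[prop_name] for n in range(clip.num_frames)]


def mean_score(scores):
    """Arithmetic mean of the per-frame scores."""
    return sum(scores) / len(scores)


def measure(reference, distorted):
    """Return (mean SSIM, mean GMSD) for two clips of matching format."""
    # Imported lazily so the helpers above can be used without VapourSynth.
    import muvsfunc as muf

    ssim_clip = muf.SSIM(reference, distorted, downsample=True)
    gmsd_clip = muf.GMSD(reference, distorted, downsample=True)
    return (mean_score(collect_scores(ssim_clip, "PlaneSSIM")),
            mean_score(collect_scores(gmsd_clip, "PlaneGMSD")))
```

Calling `get_frame` on every frame is slow for long clips, so this is meant for short test sequences.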

http://i.imgur.com/IpT23ng.png (https://imgur.com/IpT23ng)

http://i.imgur.com/lAJH4nZ.png (https://imgur.com/lAJH4nZ)

http://i.imgur.com/QXovVYL.png (https://imgur.com/QXovVYL)

ChaosKing

27th February 2019, 19:28

I used default settings: --quantizer X --tune Psychovisual (just to be sure)
--speed <SPEED> Speed level (0 is best quality, 10 is fastest) [default: 3]

The result is more or less what I expected. After I made some x265 encodes and compared them, it always looked slightly worse. :D
But to be fairer you should also use the constant-quantizer (qp) mode in x264/x265 and not CRF. (I also used CRF out of habit.)