GMSD and SSIM Quality Metrics
WorBry
27th February 2019, 19:58
(I also used crf out of habit)
It didn't occur to me, especially as all the other tests were in CRF mode. I'll retest in -q mode when I have time.
WorBry
27th February 2019, 20:30
Since I'd run the tests, the ffmpeg SSIM and PSNR results:
http://i.imgur.com/C5DfLvz.png (https://imgur.com/C5DfLvz)
http://i.imgur.com/LdgQvtp.png (https://imgur.com/LdgQvtp)
I also ran MDSI tests on the rav1e encodes, but haven't yet on the x264/x265 series.
WorBry
28th February 2019, 03:40
(I also used crf out of habit)
It didn't occur to me, especially as all the other tests were in CRF mode. I'll retest in -q mode when I have time.
The VMAF, muvsfunc SSIM and GMSD results updated to include x264 encoded in CQP mode (qp 28 - 46) :
http://i.imgur.com/C9muivX.png (https://imgur.com/C9muivX)
http://i.imgur.com/UxWZmcP.png (https://imgur.com/UxWZmcP)
http://i.imgur.com/hM6CvJx.png (https://imgur.com/hM6CvJx)
Interesting that the VMAF scores for x264 in CQ mode are barely lower than CRF mode and then at around 22 Mbps CQ scores the highest of them all.
x265 in CQP mode to follow....at some point.
WorBry
28th February 2019, 18:41
The composite metric results, including x265 encoded in CQP mode (qp 28 - 44).
Note:
Interesting that the VMAF scores for x264 in CQ mode are barely lower than CRF mode and then at around 22 Mbps CQ scores the highest of them all.
Simple data transfer error. Corrected below.
http://i.imgur.com/elq9QV7l.png (https://imgur.com/elq9QV7)
Click on image to enlarge and on (+) cursor to enlarge further.
Clearly encoding x265 and x264 in CQP mode brings a different perspective. Now rav1e better matches CQP x265, as judged by SSIM and GMSD, and has the edge at the lowest bitrates. But VMAF still deems x265 to have higher perceptual quality over the entire bitrate range.
Knowing next to nothing about AV1, what are the prospects for CRF rate control in rav1e ?
ChaosKing
2nd March 2019, 02:38
What enc parameters did you use for x264/x265?
Maybe I will also make some 2pass encodes with rav1e if the bitrate mode doesn't suck like in 1pass. Just for completeness. It would be interesting to see by how much it will improve. (or not improve at all :D)
WorBry
2nd March 2019, 03:28
What enc parameters did you use for x264/x265?
For CRF:
ffmpeg -i Input.mp4 -vcodec libx264 -preset slow -crf {value} -pix_fmt yuv420p -r 50/1 -x264opts colorprim=bt709:transfer=bt709:colormatrix=bt709 Output.mp4
ffmpeg -i Input.mp4 -vcodec libx265 -preset slow -crf {value} -pix_fmt yuv420p -r 50/1 -x265-params colorprim=1:transfer=1:colormatrix=1 Output.mp4
For CQP:
ffmpeg -i Input.mp4 -vcodec libx264 -preset slow -qp {value} -pix_fmt yuv420p -r 50/1 -x264opts colorprim=bt709:transfer=bt709:colormatrix=bt709 Output.mp4
ffmpeg -i Input.mp4 -vcodec libx265 -preset slow -pix_fmt yuv420p -r 50/1 -x265-params qp={value}:colorprim=1:transfer=1:colormatrix=1 Output.mp4
In hindsight I should perhaps have left it at the default -preset medium; I used -preset slow in all of the prior testing and didn't think to change it. I suppose I could re-test if it's felt that this gave x264/x265 an unfair advantage, but I can't face doing it all over just now.
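For anyone wanting to repeat the series, the four command patterns above can be generated in a loop rather than typed by hand. This is only a minimal sketch: the function name, the output file naming and the step of 2 between QP values are assumptions made for illustration, not something stated in the post. Only the ffmpeg options themselves are taken from the commands above.

```python
import shlex

# Colour tags exactly as used in the commands above.
X264_COLOR = "colorprim=bt709:transfer=bt709:colormatrix=bt709"
X265_COLOR = "colorprim=1:transfer=1:colormatrix=1"


def build_cmd(codec, mode, value, src="Input.mp4", preset="slow"):
    """Return the ffmpeg argument list for one encode.

    codec is "x264" or "x265", mode is "crf" or "qp".
    The output file name pattern is hypothetical.
    """
    if codec not in ("x264", "x265") or mode not in ("crf", "qp"):
        raise ValueError("unsupported codec or mode")
    out = f"{codec}_{mode}{value}.mp4"
    cmd = ["ffmpeg", "-i", src, "-vcodec", f"lib{codec}", "-preset", preset]
    if codec == "x264":
        # libx264 takes -crf / -qp directly as ffmpeg options.
        cmd += [f"-{mode}", str(value)]
        tail = ["-x264opts", X264_COLOR]
    elif mode == "crf":
        cmd += ["-crf", str(value)]
        tail = ["-x265-params", X265_COLOR]
    else:
        # For x265 CQP the qp value is passed inside -x265-params.
        tail = ["-x265-params", f"qp={value}:{X265_COLOR}"]
    return cmd + ["-pix_fmt", "yuv420p", "-r", "50/1"] + tail + [out]


if __name__ == "__main__":
    # x264 CQP series over qp 28 - 46; the step of 2 is an assumption.
    for qp in range(28, 47, 2):
        print(shlex.join(build_cmd("x264", "qp", qp)))
```

The argument lists can be passed straight to subprocess.run() instead of being printed.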
I'm in the middle of some tests with visually lossless intermediate codecs. Seeing interesting results with Prores, MDSI and Butteraugli that deserve reporting when I'm done.
Maybe I will also make some 2pass encodes with rav1e if the bitrate mode doesn't suck like in 1pass. Just for completeness. It would be interesting to see by how much it will improve. (or not improve at all :D)
Sure. Bring it on ;)
WorBry
8th March 2019, 22:04
I'm in the middle of some tests with visually lossless intermediate codecs. Seeing interesting results with Prores, MDSI and Butteraugli that deserve reporting when I'm done.
So I've been looking at the behaviour of these metrics when applied to discern fine differences between 'visually lossless' codecs.
For this I used a 1080/50p 10bit 422 (MagicYUV) transcode of the original Crowd Run 2160/50p SGI sequence as the test source and reference:
ftp://vqeg.its.bldrdoc.gov/HDTV/SVT_MultiFormat/2160p50_CgrLevels_Master_SVTdec05_/1_CrowdRun_2160p50_CgrLevels_MASTER_SVTdec05_/
Encoded the source to:
- Prores HQ with VirtualDub2. The integrated (ffmpeg) encoder offers a range of 'Quality Levels' (q2-31), which appear to equate with qscale in ffmpeg CLI. I tested over q2 - 20 range.
- Cineform with VirtualDub2, using the 'native' Cineform SDK encoder. This implementation adds another quality level 'Filmscan 3' that is not normally available in other applications. I tested the full 'Filmscan 3' to 'Low' quality range.
- JPEG (RGB24, 100 - 85% compression range) using VapourSynth's very own ImageMagick Writer and the corresponding reader for re-importing the JPEG sequence:
I also encoded to Prores_ks HQ and DNxHR-HQX with ffmpeg CLI using the default settings.
Here are the muvsfunc VMAF, SSIM, GMSD, MDSI and ffmpeg SSIM/PSNR results for Prores HQ and Cineform:
http://i.imgur.com/EywOupol.png (https://imgur.com/EywOupo)
http://i.imgur.com/ICbJOIMl.png (https://imgur.com/ICbJOIM)
Nothing that remarkable about the muvsfunc VMAF, SSIM and GMSD results. For ffmpeg SSIM I also plotted the aggregate component Y, U and V scores. The Cineform 'Filmscan 3' scores were only marginally higher than 'Filmscan 2'. Clearly, the VMAF metric has no value in this context.
What is interesting in this case are the MDSI results. With Prores (and 'downsample' applied) the scores plateaued around q10 and actually decreased as the quality level was increased to q2. The MDSI scores with 'downsample' disabled also plateaued, but at a higher 'quality level' (around q4). The Cineform MDSI results do not show this behaviour.
I decided to examine this further. It made little difference if the Prores imports (and MagicYUV source) were converted to RGB48 or RGB24, with or without dithering. What did make a difference was converting the Prores and reference (MagicYUV) clips to greyscale before the RGB conversion - it dramatically relieved the apparent 'score saturation', implicating the chroma component:
http://i.imgur.com/d6GBbKdl.png (https://imgur.com/d6GBbKd)
The MDSI metric score is derived from pooled Gradient (Luminance) and Chromaticity similarity assessments. I asked WolframRhodium if it might be possible to generate the Gradient (GS), Chromaticity (CS) and pooled Gradient-Chromaticity (GCS) maps and he has kindly obliged:
https://forum.doom9.org/showthread.php?p=1867762#post1867762
The generated map traces are very faint but they can be brought up quite nicely with Unsharp Mask. For the tests with downsample enabled I applied 2-passes (Radius 25, Amount 10) in Gimp.
Here are the similarity maps for ProRes at q2, q10, q16 and q20 quality levels, with 'downsample' enabled - I did also generate maps with 'downsample' disabled, but have yet to examine them.
http://i.imgur.com/NtIYYbLm.png (https://imgur.com/NtIYYbL)
Click image to enlarge, and then on (+) cursor to enlarge to max.
Looking at the Gradient-Chromaticity Similarity (GCS) maps it's difficult to see how the final MDSI score at q2 would be less than that at q10. But the Chromaticity Similarity (CS) maps definitely have an issue with the higher saturation colours on the runners' shirts, right up to q2.
If anyone wants to examine them in more detail here are the matching frame grabs (VSEditor) of the imported clips (ffms2 decode) before and after conversion to RGB:
http://i.imgur.com/8OC28Tsm.jpg (https://imgur.com/8OC28Ts)
The frames in the downloaded image (right click > Save Image As) are the original 1920 x 1080 res.
And here are the similarity maps obtained with the Cineform and JPEG transcodes:
http://i.imgur.com/NNFPVlFm.png (https://imgur.com/NNFPVlF)
http://i.imgur.com/pUdCInWm.png (https://imgur.com/pUdCInW)
Big difference. Evidently MDSI is picking up chroma 'distortion' in the Prores encodes.
Bringing the clips and frame grabs into DaVinci Resolve I couldn't pick-up any obvious chroma shifts or saturation changes on the videoscopes, but I'll look at that in more detail with selective hue/saturation/luminance keys.
I don't think it's a 'Quicktime'-related issue per se - Cineform produced the same MDSI results whether encoded in AVI or MOV format. However, there are numerous reports of 'chroma shifts' associated with the ffmpeg Prores implementation.
The metric scores obtained with the ffmpeg CLI Prores encode (at default settings) put it on par with 'quality level' 12 using the VDub2 encoder.
More interesting still are the Butteraugli test results. They don't show this behaviour with the Prores encodes and converting to greyscale had much less impact on the scores:
http://i.imgur.com/Mk9ghqCl.png (https://imgur.com/Mk9ghqC)
In fact above Prores quality level 4 (the default), the greyscale scores were lower.
As mentioned above, the Butteraugli metric was tuned for detecting fine differences in JPEG images in the 90-95% compression domain. Note that the score drops quite markedly going from 90 to 85% quality.
Also interesting that ffmpeg DNxHR-HQX, whilst giving a 'bitrate-matched' MDSI score similar to that of Prores (VDub2), gives a much higher Butteraugli score, approaching that of JPEG 90%. Also the Butteraugli score for the ffmpeg Prores encode was lower than that of Prores (VDub2) at an equivalent bitrate.
I've yet to scrutinize the 'heat maps' to see what more they can reveal.
I'm also wondering now if the MDSI scores seen earlier with the ffmpeg x264/x265 encodes might have been influenced in the same way as Prores.
WorBry
9th March 2019, 06:01
I've yet to scrutinize the 'heat maps' to see what more they can reveal.
The Butteraugli 'heat maps' and corresponding RGB24 source frames.
Click on image to enlarge, and on (+) cursor to enlarge further.
Prores HQ (VirtualDub2):
http://i.imgur.com/jvzL8C6m.jpg (https://imgur.com/jvzL8C6)
Cineform:
http://i.imgur.com/auU6yQAm.jpg (https://imgur.com/auU6yQA)
JPEG:
http://i.imgur.com/GtTYHdjm.jpg (https://imgur.com/GtTYHdj)
In the Prores series, there is very little difference between the 'full colour' and greyscale heat maps. Looking at the image components that Butteraugli is targeting in the highest quality encodes of each series, it appears to be mostly the blacks and whites.
Default dithering was applied in the fmtconv YUV422P10 > RGB24 conversion btw:
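import vapoursynth as vs
core = vs.core
clip = core.ffms2.Source(source=r'{Path}:/10bit_422_Source.avi')
# Upsample chroma to 4:4:4, convert to RGB (BT.709 matrix), then reduce to 8-bit with the default dithering
clip = core.fmtc.resample(clip=clip, css="444")
clip = core.fmtc.matrix(clip=clip, mat="709", col_fam=vs.RGB)
clip = core.fmtc.bitdepth(clip=clip, bits=8)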
WorBry
9th March 2019, 19:48
Here are the MDSI similarity maps for ProRes at q2, q10, q16 and q20 quality levels, with 'downsample' enabled - I did also generate maps with 'downsample' disabled, but have yet to examine them.
http://i.imgur.com/NtIYYbLm.png (https://imgur.com/NtIYYbL)
Click image to enlarge, and then on (+) cursor to enlarge to max.
Looking at the Gradient-Chromaticity Similarity (GCS) maps it's difficult to see how the final MDSI score at q2 would be less than that at q10. But the Chromaticity Similarity (CS) maps definitely have an issue with the higher saturation colours on the runners' shirts, right up to q2.
After further investigation it appears that the issue was with the (FFMS2) decoding.
The reference 10bit 422 source was in MagicYUV AVI format and the test Prores encodes were in MOV format. First tried converting the MagicYUV AVI reference clip to MOV, but it made no difference. But when I converted the Prores MOV files to (lossless) MagicYUV 10bit 422 AVI and re-ran the MDSI tests, lo and behold, it resolved the 'chroma distortion'. Just why, I don't know. But here are the revised MDSI similarity maps:
http://i.imgur.com/YynjVDbm.png (https://imgur.com/YynjVDb)
What's interesting is that when I re-ran the Butteraugli tests with the converted Prores clips it did not change the scores or 'heat maps' at all. And no change in the VMAF, SSIM and GMSD scores either. It only affected MDSI.
I'll post the revised MDSI score graphs in due course.
WorBry
10th March 2019, 06:24
The revised Prores metric score charts:
http://i.imgur.com/WZ1HkDLm.png (https://imgur.com/WZ1HkDL)
And the revised MDSI scores compared with those of other transcodes:
http://i.imgur.com/xgaEanem.png (https://imgur.com/xgaEane)
Now that looks a lot healthier.
The ffmpeg (CLI) Prores-ks-HQ and DNxHR-HQX transcodes both showed the 'chroma distortion' in the MDSI tests as well and converting the MOV files to MagicYUV 10bit 422 AVI resolved that also. So this was not specifically a Prores issue.
As mentioned above, it is interesting that the Butteraugli scores and heat maps were not changed at all by the conversion. Suggests it is unresponsive to subtle color shifts.
As this exercise has shown, MDSI is very sensitive to subtle color shifts and could prove to be a useful tool for assessing chroma sampling efficiencies.
Boulder
10th March 2019, 14:17
Have you compared MDSI to GMSD? Based on my Bicubic resize parameter tests with Zopti, MDSI seems to favour a sharper image than GMSD. I don't know if there are other differences.
WorBry
10th March 2019, 17:17
Well this is how GMSD and MDSI compare for this particular source:
http://i.imgur.com/r5bI4h1m.png (https://imgur.com/r5bI4h1)
That said - whilst Crowd Run serves as a good test reference for its hard-to-compress complex/colorful content, I wouldn't say it's an especially sharp image, at least by contemporary 4K standards; there's a fair bit of motion and pan blur going on there and it was downscaled (Spline36) to 1080p for these tests also.
MDSI seems to favour a sharper image than GMSD
Might well explain why the MDSI score plots don't taper down to approach 0 as they do with GMSD.
I'll look and see what other HQ sharp sources I can test.
Boulder
10th March 2019, 17:56
Here's one from Black Sails season 1, which is a very high quality 1080p release. The texture of the characters' skin is very detailed-looking and sharp. A lot of the content in the series has shaky camera movement but this scene is rather still compared to others.
https://drive.google.com/open?id=1mZz-woQvyl949qCnZIXqKQE3igWDeRHg
WorBry
10th March 2019, 19:11
OK thanks. I'll have a look at it.
zorr
10th March 2019, 22:54
After further investigation it appears that the issue was with the (FFMS2) decoding.
Great detective work WorBry. MDSI looks pretty good in my tests too because it's more "picky" than the other metrics. Even if that's not the most important feature it can be an advantage.
WorBry
10th March 2019, 23:25
I spent a good while staring at images, videoscopes and selective H,S,V keys in DaVinci Resolve and could not see any chroma shifts in the Prores clips - came to the conclusion it must be occurring on VapourSynth import.
MDSI looks pretty good in my tests too because it's more "picky" than the other metrics.
I need to do more tests to reach a firm conclusion about MDSI. GMSD is really growing on me - as you've observed in your Zopti studies it does appear to be slightly more focused than SSIM (muvsfunc). But it's good to have a sensitive metric like MDSI that takes into account luminance (gradient) and chromaticity also. Pity it doesn't provide separate luminance and chromaticity measures. Still, a lot can be gleaned from the similarity maps, as in this case, and from running a parallel test in greyscale.
I'm looking at Boulder's clip just now.
Incidentally, what would be a good in-line option for 'normalizing' the very faint/nigh-on-invisible MDSI similarity maps in VapourSynth? Two Unsharp Mask passes in Gimp does it nicely, but it's a bit tedious copying frame grabs, especially when there's little visible to guide the frame selection.
Boulder
11th March 2019, 04:55
Incidentally, what would be a good in-line option for 'normalizing' the very faint/nigh-on-invisible MDSI similarity maps in VapourSynth? Two Unsharp Mask passes in Gimp does it nicely, but it's a bit tedious copying frame grabs, especially when there's little visible to guide the frame selection.
Maybe something like this would do?
result = core.std.Levels(result, min_in=127, max_in=132, min_out=0, max_out=255, planes=0)
For other than 8-bit clips, the values must be bigger. For 16-bit I use result = core.std.Levels(result, min_in=125*256, max_in=132*256, min_out=0, max_out=255*256, planes=0).
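The bit-depth scaling described here can be written once instead of hand-multiplying each value. A minimal sketch, assuming integer clips and following the same "multiply the 8-bit value by 256 for 16-bit" convention as the line above (so the 16-bit white point comes out as 255*256, not 65535); the helper name is made up for illustration:

```python
def scaled_levels(min_in_8bit, max_in_8bit, bits):
    """Scale 8-bit Levels thresholds to another integer bit depth.

    Returns keyword arguments in the shape std.Levels expects.
    Uses a plain left shift (x256 for 16-bit), as in the post.
    """
    if bits < 8:
        raise ValueError("bits must be 8 or more")
    scale = 1 << (bits - 8)
    return dict(
        min_in=min_in_8bit * scale,
        max_in=max_in_8bit * scale,
        min_out=0,
        max_out=255 * scale,
    )


if __name__ == "__main__":
    for bits in (8, 10, 16):
        print(bits, scaled_levels(127, 132, bits))
```

In a script the result would be used as core.std.Levels(result, planes=0, **scaled_levels(127, 132, 16)).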
WorBry
12th March 2019, 04:58
Here's one from Black Sails season 1, which is a very high quality 1080p release. The texture of the characters' skin is very detailed-looking and sharp. A lot of the content in the series has shaky camera movement but this scene is rather still compared to others.
https://drive.google.com/open?id=1mZz-woQvyl949qCnZIXqKQE3igWDeRHg
So I took a 12.5 sec section (frames 563 - 864) from the clip that spanned the head shots of the three actors, and converted to lossless MagicYUV YV12.
Since it's 8-bit 420 I thought I may as well do an x264 vs x265 comparison, but limited to the 'high-end' CRF range (x264 CRF 1-5 and x265 CRF 0-3) and Intra-frame only - I anticipated taking frame shots.
To avoid any decode issues, I converted the encodes to MagicYUV YV12 for the GMSD and MDSI tests. Ran the tests with and without downsample. Also ran a parallel series (with down-sampling) with the source and test clips converted to greyscale, which I did in AVISynth.
Here are the results:
http://i.imgur.com/PmeSy7Pm.png (https://imgur.com/PmeSy7P)
Both GMSD and MDSI consistently found differences between x264 and x265. The pattern is actually quite similar to that seen earlier with Crowd Run at the high bitrates:
http://i.imgur.com/QjFgFDMm.png (https://imgur.com/QjFgFDM)
Running the tests with greyscale clips improved the MDSI scores marginally. As noted in the Crowd Run 10bit 422 test series, converting to greyscale had no effect on the GMSD scores, as you might expect.
So what differences are GMSD and MDSI picking up between x264 and x265 ? Thought I'd see what the 'quality maps' could reveal. Fortuitously, x264 CRF=1 and x265 CRF=0 were both 161 Mbps. So I picked a frame (#778, I think, in the original mkv clip) that showed a visible map trace with both GMSD and MDSI (no downsample) in the x264 clip, generated the quality maps and amplified them with a two-pass Unsharp Mask in Gimp.
Here are the maps from the tests with downsample applied:
http://i.imgur.com/auyrVRsm.png (https://imgur.com/auyrVRs)
Very faint aren't they? I was tempted to run a third Unsharp pass but, for consistency, left it there.
And with no downsample:
http://i.imgur.com/QAA09zqm.png (https://imgur.com/QAA09zq)
And the matching frames (VSEditor YUV420P8 output) from the source and test clips:
Source:
http://i.imgur.com/R0D4zvdm.png (https://imgur.com/R0D4zvd)
x264:
http://i.imgur.com/8geESDam.png (https://imgur.com/8geESDa)
x265:
http://i.imgur.com/TygrruUm.png (https://imgur.com/TygrruU)
Clearly (in the 'no downsample' series at least), both the GMSD quality map and MDSI Gradient Similarity (GS) map are picking up distortions in the x264 clip that represent more than fine detail. The GMSD map for the x265 clip is markedly less dense. The MDSI GS map for x265 shows more of a selective improvement - see how traces representing fine detail on the actor's forehead, around the eyes and moustache are greatly diminished - the striped shirt sleeve pattern also. In the pooled Gradient-Chromaticity (GCS) map that gets offset by the faint Chromaticity (CS) map, but there also you can see an improvement in the x265 clip trace.
Anyhow, there it is. I was thinking about applying some controlled distortions to the source clip to see how GMSD and MDSI respond but I'm not sure I'll have time.
Boulder
12th March 2019, 05:16
Quite interesting that the maps show such an amount of difference at that bitrate. I think that there you can see the fundamental difference between x264 and x265, the first one has blocking/is more focused on enhancing the edges and higher frequencies by creating "fake detail" at default settings while the latter one likes to blur more.
WorBry
12th March 2019, 05:55
The maps also support Zorr's perception that MDSI tends to be more 'picky':
MDSI looks pretty good in my tests too because it's more "picky" than the other metrics.
BTW, I tried:
result = core.std.Levels(result, min_in=127, max_in=132, min_out=0, max_out=255, planes=0)
But inserted after....
result = core.std.SetFrameProp(result, prop='_Matrix', delete=True)
...to generate the MDSI maps, it outputs a black frame.
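One possible explanation, offered only as a guess: if the map clip at that point is a float format with samples in the 0 - 1 range (not confirmed anywhere in the thread), then 8-bit thresholds like 127 and 132 sit far above every sample and everything clamps to black. The arithmetic of a linear level mapping with gamma 1 can be checked in plain Python; the function below is a stand-in for illustration, not the VapourSynth filter itself.

```python
def apply_levels(x, min_in, max_in, min_out, max_out):
    """Linear level mapping (gamma = 1) with clamping of the input range."""
    t = (x - min_in) / (max_in - min_in)
    t = min(max(t, 0.0), 1.0)  # values below min_in clamp to 0
    return min_out + t * (max_out - min_out)


if __name__ == "__main__":
    # Hypothetical float samples in 0 - 1, run through the 8-bit thresholds.
    for sample in (0.0, 0.25, 0.5, 1.0):
        print(sample, "->", apply_levels(sample, 127, 132, 0, 255))
    # The same thresholds behave as intended on 8-bit integer samples.
    for sample in (127, 129.5, 132):
        print(sample, "->", apply_levels(sample, 127, 132, 0, 255))
```

If that is what is happening, the thresholds would need to be expressed in the clip's own sample range, or the map converted to an integer format first.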
WorBry
13th March 2019, 16:02
I think that there you can see the fundamental difference between x264 and x265, the first one has blocking/is more focused on enhancing the edges and higher frequencies by creating "fake detail" at default settings while the latter one likes to blur more.
I think you're right. Here are crops from another frame.
http://i.imgur.com/NR57nspm.png (https://imgur.com/NR57nsp)
Click image to enlarge and (+) cursor to enlarge further
The x265 image definitely has more blur than x264 (notably on the skin textures), which MDSI deems more acceptable.
ifb
13th March 2019, 22:24
How about a quick XAVC and/or AVC Intra test? It always seemed to me that prores_aw/ks was blurry compared to DNxHD, but DNxHD would get blocky pretty easily. I'm curious how the AVC variants do since they forbid deblocking.
WorBry
13th March 2019, 23:39
I don't have means for encoding bona fide XAVC or AVC-Intra, if that's what you mean ?
ifb
14th March 2019, 02:07
I don't have means for encoding bona fide XAVC or AVC-Intra, if that's what you mean ?
Something like:
x264 foo -o xavc.h264 --output-csp i422 --output-depth 10 --avcintra-class 100 --avcintra-flavor sony --level 4.2 --sar 1:1
The "panasonic" flavor will generate AVC-Intra. There's also Class 200 (doubles the bitrate).
WorBry
14th March 2019, 04:28
I'm not that familiar with x264.exe command line encoding - only ever used ffmpeg for encoding pseudo AVC-Intra 100 and XAVC-I Class 480 in the past, and that was not using this -avcintra-flavor option.
Transcoding the Crowd Run 1080/50p 10bit 422 master with your command as is, the resulting bitrate is 227 Mbps - shouldn't it be 100 Mbps for avcintra-class 100?
Edit: Ah, that's because it's 50p ?
http://www.xavc-info.org/xavc/share/data/XAVC_Profiles_and_OperatingPoints_120_Amd1.pdf
ifb
14th March 2019, 15:19
I'm not that familiar with x264.exe command line encoding - only ever used ffmpeg for encoding pseudo AVC-Intra 100 and XAVC-I Class 480 in the past, and that was not using this -avcintra-flavor option.
Transcoding the Crowd Run 1080/50p 10bit 422 master with your command as is, the resulting bitrate is 227 Mbps - shouldn't it be 100 Mbps for avcintra-class 100?
Edit: Ah, that's because it's 50p ?
http://www.xavc-info.org/xavc/share/data/XAVC_Profiles_and_OperatingPoints_120_Amd1.pdf
The names break down for 50p/60p rates. Bitrate switches to 2x the class name. Same for UHD where Class 300/480 is 600/960 Mbps, but there's no support in x264 for that right now anyway.
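As a quick sanity check of the doubling rule, here is the arithmetic for the Class 100 encode mentioned above. It is a sketch only: the function name is made up, and treating "50 fps and above" as the cut-off is an assumption about how the 50p/60p remark should be read.

```python
def nominal_bitrate_mbps(avc_class, fps):
    """Nominal bitrate implied by the class name.

    Assumption: the class number doubles at 50 fps and above,
    and is taken at face value below that.
    """
    return avc_class * 2 if fps >= 50 else avc_class


if __name__ == "__main__":
    measured = 227  # Mbps, the Class 100 encode of the 1080/50p source
    nominal = nominal_bitrate_mbps(100, 50)
    print(f"nominal {nominal} Mbps, measured {measured} Mbps, "
          f"ratio {measured / nominal:.3f}")
```

So the measured 227 Mbps is in the right region for a nominal 200 Mbps target rather than the 100 Mbps the class name suggests.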
For ffmpeg, you can pass custom params to libx264:
-c:v libx264 -x264-params avcintra-class=100:avcintra-flavor=sony
With avfs, you can set enable_v210 = True and encode with Adobe Media Encoder if you have that available. It was terrible at XAVC the last time I tried it. YMMV.
WorBry
14th March 2019, 21:56
OK, so I tested CrowdRun 1080/50p 10bit 422 encoded to AVC-Intra 100 (227 Mbps) and AVC-Intra 200 (436 Mbps), both 'Sony flavour'.
Here are the SSIM, GMSD, and MDSI scores (with and without downsample) compared with those of the other formats:
http://i.imgur.com/VCITk0sm.png (https://imgur.com/VCITk0s)
I was quite surprised.
Didn't test Butteraugli. I don't think it provides more useful information, in this context at least.
Interesting that ffmpeg DNxHR-HQX gives higher scores than Prores HQ when no downsampling is applied.
ifb
15th March 2019, 03:44
I'm not too surprised. AVC is more advanced than the other intra codecs you're testing, even if they limit the codec a bit (no deblocking, fixed slice sizes, custom CQM, no CABAC). I wasn't sure how it would turn out given the two bitrates you can choose from.
It'd be interesting to see an XF-HEVC comparison too.
WorBry
15th March 2019, 04:31
It'd be interesting to see an XF-HEVC comparison too.
Can ffmpeg even ingest Canon XF-HEVC (10bit 422) camera footage yet ?
ifb
16th March 2019, 02:45
Can ffmpeg even ingest Canon XF-HEVC (10bit 422) camera footage yet ?
Dunno. I have an XF705 sample but haven't looked at it yet.
I did see a (flawed) comparison between the in-camera HEVC on the Fuji X-T3 and an external recorder (ProRes). I thought HEVC won easily.
I'm not sure that intra coding in HEVC is that much better than AVC? Of course it might be a moot point if XF-HEVC was crippled for the sake of reducing complexity/power consumption. I haven't seen any docs on the format, so I just don't know.
Iron_Mike
16th March 2019, 03:59
import vapoursynth as vs
from zoptilib import Zopti
core = vs.core
# read input video
orig = core.ffms2.Source(source=r'source.avi')
# initialize output file and chosen metrics
zopti = Zopti('results.txt', metrics=['ssim', 'gmsd'])
# ... process the video ...
# alternate = some_process(orig)
# measure similarity of original and alternate videos, save results to output file
zopti.run(orig, alternate)
quick question:
trying to run some GMSD, SSIM and MDSI metrics via Zopti - same as outlined on page 1 in the thread...
when I specify all three metrics as a list to Zopti, it creates a log file w/ the three results... when I then run the same distorted clip again via Zopti but only use ONE of the metrics (e.g.) GMSD, I get a log file with only one metric but the result does not match any of the results in the log file that contains all three metrics...
Example - all on the same distorted clip
GMSD, SSIM, MDSI
stop 123.25129597713656 228.20727309992276 230.492983448053
GMSD
stop 83.77192685888292
SSIM
stop 431.9516453872494
What am I missing ?
I'm running Zopti via a .vpy file executed by vspipe...
Thanks.
WorBry
16th March 2019, 06:40
Would you mind posting your script(s) for the 3-in-1 and individual metric tests ?
poisondeathray
16th March 2019, 06:46
Works for me . Same values in the same colorspace
MDSI requires RGB , so I converted to RGB for the 3 way test . Same values for the individual runs in RGB
But I got the same values within YUV for 2 way test GMSD ,SSIM, as single runs (within YUV, different values than converted RGB)
What source filter were you using ? When something is off like that, usually it's a frame mismatch
Iron_Mike
16th March 2019, 08:32
Would you mind posting your script(s) for the 3-in-1 and individual metric tests ?
I'm using same file you've used for some of your tests: Crowdrun 1080p50 420, 500 frames (crowd_run_1080p50.y4m)
The encoded file was created via ffmpeg x265 CRF 28 preset slow
vid_ref = core.ffms2.Source(source=ref_fp)
vid_enc = core.ffms2.Source(source=enc_fp)
log_fp = r'd:\zopti.log'
metrics = ('gmsd', 'ssim')
matrix = None
zopti = Zopti(log_fp, metrics=metrics, matrix=matrix)
zopti.addParams('ssim', dict(downsample=False, show_map=False))
zopti.run(vid_ref, vid_enc)
Works for me . Same values in the same colorspace
MDSI requires RGB , so I converted to RGB for the 3 way test . Same values for the individual runs in RGB
But I got the same values within YUV for 2 way test GMSD ,SSIM, as single runs (within YUV, different values than converted RGB)
What source filter were you using ? When something is off like that, usually it's a frame mismatch
thanks for the hint that MDSI gets converted to RGB... that changes all values for GMSD and SSIM (if requested in the same run), which makes this absolutely pointless...
ONLY MDSI should be done in RGB, everything else in YUV, so that results stay consistent...
but even with that, the results using the Zopti class are as unstable as I've seen...
look at this... consecutive execution of the exact same script, just the requested metrics have been changed in each run - only using GMSD and SSIM... this is odd to say the least.... :confused:
GMSD #1
stop 82.89543157743563
SSIM #1
stop 436.64183973524297
GMSD #2
stop 95.29160100865921
GMSD #3
stop 82.89543157743563
GMSD #4
stop 82.89543157743563
GMSD #5
stop 82.89543157743563
GMSD #6
stop 82.89543157743563
GMSD, SSIM - #1
stop 122.23059696996059 236.93252855842488
GMSD, SSIM - #2
stop 121.9632105098167 237.4638095431857
SSIM #2
stop 436.641839735243
GMSD, SSIM - #3
stop 82.89543157743563 436.641839735243
GMSD, SSIM - #4
stop 122.14837149649146 236.40679167570883
GMSD, SSIM - #5
stop 123.8819626481414 228.86426005799092
GMSD, SSIM - #6
stop 120.18331285114424 246.86257738654993
ChaosKing
16th March 2019, 10:00
Check your video file with https://github.com/theChaosCoder/vapoursynth-portable-FATPACK/blob/master/VapourSynth64Portable/VapourSynth64/seek-test.py
python.exe seek-test.py video.mp4 0 100
0 100 = start_frame end_frame
Iron_Mike
16th March 2019, 12:46
Check your video file with https://github.com/theChaosCoder/vapoursynth-portable-FATPACK/blob/master/VapourSynth64Portable/VapourSynth64/seek-test.py
python.exe seek-test.py video.mp4 0 100
0 100 = start_frame end_frame
tested the encoded/distorted clip, result
Press 1 for FFMS2000
2 for L-SMASH-Works
3 for D2V Source
4 for AVISource
5 for FFMS2000(seekmode=0) [slow but more safe]
Number: 5
ffms2seek0
Clip has 500 frames.
Hashing: 80%
Clip hashed.
Test complete. No seeking issues found :D
ChaosKing
16th March 2019, 12:56
Have you also tested with number 1 (without seekmode=0)?
WorBry
16th March 2019, 16:32
I'm using same file you've used for some of your tests: Crowdrun 1080p50 420, 500 frames (crowd_run_1080p50.y4m)
The encoded file was created via ffmpeg x265 CRF 28 preset slow
Actually I converted the crowd_run_2160p50.y4m file to lossless 1080/50p x264 CRF0 Intra mp4 to serve as the 'Master' source and reference for the 1080/50p tests.
For the 2160/50p tests I used the original crowd_run_2160p50.y4m file as the 'Master'.
Re-testing the files I have archived, I get the following raw scores for 1080/50p x265 CRF28 (default settings, except -preset slow):
SSIM:GMSD (together): 478.4192336998458;43.506163861971956
SSIM(alone) : 478.4192336998458
GMSD (alone): 43.506163861971956
Exact same scores whether tested together or separately.
I'm using ChaosKing's Portable Fatpack VapourSynth and running the scripts with VSEditor > (F7) Benchmark. Now using Wolfberry's FFMS2 build:
https://forum.doom9.org/showthread.php?t=176198
poisondeathray
16th March 2019, 16:56
thanks for the hint that MDSI gets converted to RGB... that changes all values for GMSD and SSIM (if requested in the same run), which makes this absolutely pointless...
ONLY MDSI should be done in RGB, everything else in YUV, so that results stay consistent...
The point is you should get the same consistent results in RGB (the multi run should be same as individual runs in RGB). If you didn't - it would add support that there was something wrong .
You did not clarify how you converted to RGB or any of the other procedures, so maybe there were other errors in your procedure?
So I even added the 2way YUV runs, which were the same within YUV as the individual YUV runs - and it looks like you're still getting different inconsistent results
I did 5 runs of the same script, everything is consistent here
One thing that is odd is that sometimes the frame numbers will be out of numerical order in the log, e.g. it might go 45,44,46 . I assume it has to do with the threading. But the actual values are the same.
Another difference is I did not prevent SSIM downsample (I went by the 1st script you quoted) , but that shouldn't affect GMSD , and I used vsedit to run the benchmark like WorBry . But I cannot see how using vspipe would alter the results .
Try another ffms build.
If seeking is accurate, maybe something else is up with your machine. Maybe start looking at memory integrity tests, hardware checks, temps, overheating
WorBry
16th March 2019, 19:28
Another difference is I did not prevent SSIM downsample (I went by the 1st script you quoted), but that shouldn't affect GMSD
Disabling downsample changes the GMSD scores profoundly - examples:
https://forum.doom9.org/showthread.php?p=1866314#post1866314
...and
https://forum.doom9.org/showthread.php?p=1868858#post1868858
These are the raw SSIM and GMSD scores I get testing the x265 CRF28 encode with Downsample=False:
SSIM:GMSD (together): 446.2983305603785; 75.80431917408734
SSIM (alone): 446.2983305603785
GMSD (alone): 75.80431917408734
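To put a number on "profoundly", here is a minimal Python sketch that computes the relative change between the two sets of raw scores posted in this thread for the same x265 CRF28 encode (default downsampling vs. Downsample=False). The dictionary and function names are only illustrative; the values are copied from the posts.

```python
# Raw scores quoted in this thread for the 1080/50p x265 CRF28 encode.
with_downsample = {"SSIM": 478.4192336998458, "GMSD": 43.506163861971956}
without_downsample = {"SSIM": 446.2983305603785, "GMSD": 75.80431917408734}


def relative_change(before, after):
    """Return the percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


# Percentage change per metric when downsampling is disabled.
changes = {
    metric: relative_change(with_downsample[metric], without_downsample[metric])
    for metric in with_downsample
}

for metric, pct in changes.items():
    print(f"{metric}: {pct:+.1f}% with downsampling disabled")
```

Note that the two metrics move in opposite directions because SSIM is a similarity score (higher is better) while GMSD is a deviation score (lower is better).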
poisondeathray
16th March 2019, 21:32
Disabling downsample changes the GMSD scores profoundly - examples:
SSIM:GMSD (together): 446.2983305603785; 75.80431917408734
SSIM (alone): 446.2983305603785
GMSD (alone): 75.80431917408734
He only specified addParams for SSIM . Are you saying that changes GMSD too ? Don't you have to addParams('msdn' etc...) explicitly too ?
You don't expect downsampling to change the consistency of the results.
You would expect any operation resize, or filter, or blur etc.. to affect the actual values.
But you do not expect it to change the values between multi testing at once vs. single testing; unless that filter uses some random property (e.g. a noise generator with random seed)
ie. When you run it you should get the same values when the settings are the same. You expect repeatable, consistent results within the same testing parameters. It shouldn't give you different values based on the time of the day or the phase of the moon
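A rough way to check that kind of repeatability is to collect the totals from each run and see whether more than one distinct result shows up. This is only a sketch in plain Python: `distinct_results` is a made-up helper, and the sample values are the Downsample=False scores quoted above.

```python
def distinct_results(runs):
    """Return the distinct (SSIM, GMSD) totals seen across repeated runs."""
    return sorted(set(runs))


runs = [
    (446.2983305603785, 75.80431917408734),  # SSIM and GMSD computed together
    (446.2983305603785, 75.80431917408734),  # values from the two separate runs
]

# Deterministic filters should give bit-identical totals, so exact
# comparison (no tolerance) is intentional here.
reproducible = len(distinct_results(runs)) == 1
print("reproducible" if reproducible else "inconsistent", distinct_results(runs))
```

If the list ever contains more than one entry with identical settings, the difference is coming from something other than the metric parameters.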
Iron_Mike
16th March 2019, 22:03
Actually I converted the crowd_run_2160p50.y4m file to lossless 1080/50p x264 CRF0 Intra mp4 to serve as the 'Master' source and reference for the 1080/50p tests.
For the 2160/50p tests I used the original crowd_run_2160p50.y4m file as the 'Master'.
ah, thank you for clarifying that. I was looking at some scores I got for the 1080p50 src I used and they did not perfectly match yours...
okay, I'm gonna convert the 2160p src to see if I can match your results throughout the thread - as a little (although not perfect) unit test. :D
Could you state the command how you converted to "1080/50p x264 CRF0 Intra mp4" ? (just to be sure I match your result)
I'm using ChaosKing's Portable Fatpack VapourSynth and running the scripts with VSEditor > (F7) Benchmark. Now using Wolfberry's FFMS2 build:
https://forum.doom9.org/showthread.php?t=176198
I'm using CK's Fatpack as well.
Btw, when I ran extensive tests before (using CK's Fatpack and VSPipe) on the VS VMAF implementation (vs. ffmpeg VMAF), there were never any inconsistencies...
Iron_Mike
16th March 2019, 22:14
Have you also tested with number 1 (without seekmode=0)?
just did that, and yes it found many errors... seeking seems to be 3 frames off
Press 1 for FFMS2000
2 for L-SMASH-Works
3 for D2V Source
4 for AVISource
5 for FFMS2000(seekmode=0) [slow but more safe]
Number: 1
ffms2
Clip has 500 frames.
Hashing: 80%
Clip hashed.
Requested frame 478, got frame 481.
Previous requests: 478
Requested frame 358, got frame 361.
Previous requests: 478 358
Requested frame 284, got frame 287.
Previous requests: 478 358 83 284
Requested frame 274, got frame 277.
Previous requests: 478 358 83 284 274
Requested frame 242, got frame 245.
Previous requests: 478 358 83 284 274 242
Requested frame 214, got frame 217.
Previous requests: 478 358 83 284 274 242 98 214
Requested frame 195, got frame 198.
Previous requests: 478 358 83 284 274 242 98 214 30 195
Requested frame 200, got frame 203.
Previous requests: 358 83 284 274 242 98 214 30 195 11 200
Requested frame 381, got frame 384.
Previous requests: 83 284 274 242 98 214 30 195 11 200 381
Requested frame 184, got frame 187.
Previous requests: 284 274 242 98 214 30 195 11 200 381 184
Requested frame 405, got frame 408.
Previous requests: 98 214 30 195 11 200 381 184 5 55 405
Requested frame 436, got frame 439.
Previous requests: 30 195 11 200 381 184 5 55 405 33 436
Requested frame 275, got frame 278.
Previous requests: 195 11 200 381 184 5 55 405 33 436 275
Requested frame 425, got frame 428.
Previous requests: 200 381 184 5 55 405 33 436 275 47 425
Requested frame 247, got frame 250.
Previous requests: 381 184 5 55 405 33 436 275 47 425 247
Requested frame 395, got frame 398.
Previous requests: 184 5 55 405 33 436 275 47 425 247 395
Requested frame 346, got frame 349.
Previous requests: 5 55 405 33 436 275 47 425 247 395 346
Requested frame 427, got frame 430.
Previous requests: 55 405 33 436 275 47 425 247 395 346 427
Requested frame 323, got frame 326.
Previous requests: 436 275 47 425 247 395 346 427 3 499 323
Requested frame 160, got frame 163.
Previous requests: 275 47 425 247 395 346 427 3 499 323 160
Requested frame 111, got frame 114.
Previous requests: 47 425 247 395 346 427 3 499 323 160 111
Requested frame 145, got frame 148.
Previous requests: 395 346 427 3 499 323 160 111 76 52 145
Requested frame 316, got frame 319.
Previous requests: 346 427 3 499 323 160 111 76 52 145 316
Requested frame 158, got frame 161.
Previous requests: 427 3 499 323 160 111 76 52 145 316 158
Requested frame 122, got frame 125.
Previous requests: 3 499 323 160 111 76 52 145 316 158 122
Requested frame 230, got frame 233.
Previous requests: 499 323 160 111 76 52 145 316 158 122 230
Requested frame 251, got frame 254.
Previous requests: 160 111 76 52 145 316 158 122 230 94 251
Requested frame 253, got frame 256.
Previous requests: 111 76 52 145 316 158 122 230 94 251 253
Requested frame 166, got frame 169.
Previous requests: 76 52 145 316 158 122 230 94 251 253 166
... (many more)
Test complete. Seeking issues found :-(
so why does this encoded file have seeking issues ?
or is it my ffms2 version ? (it's the one from your Fatpack)
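For a long log like this, it can help to tally the offsets instead of reading them by eye. A minimal Python sketch, assuming the log keeps the "Requested frame N, got frame M." wording shown above; `seek_offsets` is a hypothetical helper, not part of the seek test script.

```python
import re
from collections import Counter

# Matches the mismatch lines printed by the seek test output above.
PATTERN = re.compile(r"Requested frame (\d+), got frame (\d+)\.")


def seek_offsets(log_text):
    """Count how far off each mismatched seek was (got - requested)."""
    return Counter(
        int(got) - int(requested)
        for requested, got in PATTERN.findall(log_text)
    )


# First few lines of the log above, used as sample input.
sample_log = """\
Requested frame 478, got frame 481.
Previous requests: 478
Requested frame 358, got frame 361.
Previous requests: 478 358
Requested frame 284, got frame 287.
Previous requests: 478 358 83 284
"""

offsets = seek_offsets(sample_log)
print(offsets)
```

A single key in the result means every reported mismatch has the same offset, which is what the excerpt above shows.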
WorBry
16th March 2019, 22:19
He only specified addParams for SSIM . Are you saying that changes GMSD too ? Don't you have to addParams('msdn' etc...) explicitly too ?
Sorry, misunderstanding on my part.
When you run it you should get the same values when the settings are the same. You expect repeatable, consistent results within the same testing parameters. It shouldn't give you different values based on the time of the day or the phase of the moon
Absolutely.
ChaosKing
16th March 2019, 22:20
I added the latest build from the ffms2 thread. You could try your old ffms2 version. Maybe something broke. L-SMASH is usually a bit more safe and is frame accurate for many more containers like m2ts or even vob.
p.s. that is why it is always good to test for seeking issues first. I have a h264 mkv but every ffms2 version has seeking issues with that file for some reason. lsmash no problemo.
WorBry
16th March 2019, 22:27
Could you state the command how you converted to "1080/50p x264 CRF0 Intra mp4" ? (just to be sure I match your result)
Probably better that I run the tests again with crowd_run_1080p50.y4m as source and post my results for x265 CRF28, rather than introducing another variable.
poisondeathray
16th March 2019, 22:51
I have a h264 mkv but every ffms2 version has seeking issues with that file for some reason. lsmash no problemo.
Is that with seekmode=0, threads=1 too ?
Did you examine that mkv to see what characteristics cause the problem ?
Can you share the file ? It would be a good debugging test clip
zorr
16th March 2019, 22:54
I did some tests too. With SSIM and GMSD the first two runs gave identical results, but the third one gave a different one. The difference is not big - starts with the 6th decimal - but it should not happen.
stop 49.08846907182173 3.484012159548981 3762.3334999999825
stop 49.088468378240414 3.4840229354201178 3428.1400000000417
There's no RGB conversion involved here so it has to be something else.
Looking at the individual frame results:
0; 0.9794317688604798; 0.07659219540647606; 0.0;
1; 0.9793688841540404; 0.07778895381762808; 0.0;
2; 0.9801722825175584; 0.07465046759649577; 0.0;
3; 0.9805196896947995; 0.07238442343729419; 0.0;
0; 0.9794317688604798; 0.0765921875457907; 0.0; <--- GMSD different from 8th digit
1; 0.9793688841540404; 0.07778895381762808; 0.0;
2; 0.9801722825175584; 0.07465046759649577; 0.0;
3; 0.9805189190488873; 0.07238807894499658; 0.0; <--- SSIM and GMSD different from 6th digit
Looks like there are differences on some frames only. Does not look like a seeking issue either, the errors are much smaller than what would happen if the frame was different.
I remember seeing this kind of issue when I first tested SSIM metric with Vapoursynth but then the error disappeared and I forgot about it.
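To find which frames diverge between two runs without comparing digits by hand, something like the following could be used. It is a sketch in plain Python that assumes the "frame; SSIM; GMSD; ..." line layout shown above; the function names are made up for illustration, and the sample data are the eight lines above with the annotations removed.

```python
def parse_frame_lines(text):
    """Parse 'frame; SSIM; GMSD; ...' lines into {frame: (ssim, gmsd)}."""
    scores = {}
    for line in text.strip().splitlines():
        fields = [field.strip() for field in line.split(";")]
        scores[int(fields[0])] = (float(fields[1]), float(fields[2]))
    return scores


def differing_frames(run_a, run_b):
    """Return {frame: (abs SSIM diff, abs GMSD diff)} for frames that differ."""
    diffs = {}
    for frame in sorted(run_a.keys() & run_b.keys()):
        (ssim_a, gmsd_a), (ssim_b, gmsd_b) = run_a[frame], run_b[frame]
        if (ssim_a, gmsd_a) != (ssim_b, gmsd_b):
            diffs[frame] = (abs(ssim_a - ssim_b), abs(gmsd_a - gmsd_b))
    return diffs


first_run = parse_frame_lines("""
0; 0.9794317688604798; 0.07659219540647606; 0.0;
1; 0.9793688841540404; 0.07778895381762808; 0.0;
2; 0.9801722825175584; 0.07465046759649577; 0.0;
3; 0.9805196896947995; 0.07238442343729419; 0.0;
""")
later_run = parse_frame_lines("""
0; 0.9794317688604798; 0.0765921875457907; 0.0;
1; 0.9793688841540404; 0.07778895381762808; 0.0;
2; 0.9801722825175584; 0.07465046759649577; 0.0;
3; 0.9805189190488873; 0.07238807894499658; 0.0;
""")

diffs = differing_frames(first_run, later_run)
for frame, (ssim_diff, gmsd_diff) in diffs.items():
    print(f"frame {frame}: SSIM diff {ssim_diff:.3e}, GMSD diff {gmsd_diff:.3e}")
```

Keying on the frame number rather than line position also copes with logs where frames are written out of order.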
Iron_Mike
16th March 2019, 23:01
Now using Wolfberry's FFMS2 build:
https://forum.doom9.org/showthread.php?t=176198
I just exchanged the FFMS2 plug in the Fatpack w/ Wolfberry's latest version...
same result running CK's seek test script (on the same encoded video file), it is always 3 frames off...
I also ran the #1 FFMS2 seek test on the original download of "crowd_run_1080p50.y4m" - zero issues found.
Alright, so I assume my encodes are bad...
did multiple re-encodes using Wolfberry's build or Zeranoe's build... all have frame seeking issues, always 3 frames off...
here's the ffmpeg command:
<ffmpeg path> -y -thread_queue_size 8092 -i <path/to/crowd_run_1080p50.y4m> -c:v libx265 -preset slow -crf 24 -color_range tv -color_primaries bt709 -color_trc bt709 -colorspace bt709 -pix_fmt yuv420p -x265-params "keyint=100:min-keyint=100:rc-lookahead=100" -r 50 <path/to/output.mp4>
I also tested w/o specifying colorspace and keyframes... same result, 3 frames off
any pointers what I'm doing wrong in my encoding settings ?
ifb
16th March 2019, 23:03
Can ffmpeg even ingest Canon XF-HEVC (10bit 422) camera footage yet ?
Yes. It was added (http://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=f95aee2b72535e14b7463750fd7afb6d1cdbe4d4) to the MXF demuxer 8 days ago.
Samples (https://www.dropbox.com/sh/uam3s1bvralba07/AADfc7RvwmhEA-rLJ8pDjz8la?dl=0)