Quality evaluation I cannot fully understand [Archive]


SpasV

29th May 2019, 09:38

benwaggoner

29th May 2019, 17:19

Working from the general idea that quantization is the only step that can be lossy, I thought encode quality depended on it alone, and that prediction determined the encode size but not the quality.

A simple test proved me wrong.
I ran two encodes with different presets, --preset veryfast and --preset veryslow, with all other options the same: --tune psnr --qp 8 --ipratio 1.0 --pbratio 1.0, on 1224 frames. The encodes are 10-bit 1080p HDR, made from the 2160p HDR source of Blade Runner 2049, frames 217348-218574.

The encoding results are shown in the attachment.
I evaluated the encodes' quality with FFmpeg's PSNR filter, using sqrt(MSE) as a more intuitively understandable metric.
The results show worse quality for --preset veryfast and a bigger file size for --preset veryslow:
                    sqrt(MSE)                    PSNR (dB)
                    avg    Y      U      V       avg     Y
--preset veryfast   1.756  1.909  1.537  1.246   56.490  55.250
--preset veryslow   1.558  1.660  1.442  1.210   57.040  55.860
Y sqrt(MSE) ratio, veryslow/veryfast = 1.660/1.909 = 0.869, i.e. veryslow's Y error is 13% lower.

File sizes: --preset veryslow 144,751,026 bytes vs. --preset veryfast 122,153,342 bytes, i.e. the veryslow file is 18.5% bigger.
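For a single MSE value, sqrt(MSE) and PSNR carry the same information: PSNR = 20·log10(peak / sqrt(MSE)). As far as I know, FFmpeg's psnr filter uses peak = 2^10 - 1 = 1023 for 10-bit input. Below is a minimal Python sketch of the conversion, plus the two ratios quoted above. Note that a mean of per-frame PSNRs is not the same as the PSNR of the mean MSE, so these formulas are not expected to turn one table column into the other exactly.

```python
import math

PEAK_10BIT = 2 ** 10 - 1  # 1023, the peak code value for 10-bit samples

def psnr_from_rmse(rmse: float, peak: float = PEAK_10BIT) -> float:
    """PSNR in dB for a given root mean squared error, sqrt(MSE)."""
    return 20.0 * math.log10(peak / rmse)

def rmse_from_psnr(psnr_db: float, peak: float = PEAK_10BIT) -> float:
    """Inverse of psnr_from_rmse: sqrt(MSE) for a given PSNR in dB."""
    return peak / (10.0 ** (psnr_db / 20.0))

# Ratios quoted in the post: Y-plane sqrt(MSE) and file sizes.
y_ratio = 1.660 / 1.909                         # veryslow / veryfast
size_increase = 144_751_026 / 122_153_342 - 1.0

print(f"Y sqrt(MSE) ratio: {y_ratio:.4f}")            # about 0.87, i.e. ~13% lower
print(f"veryslow size increase: {size_increase:.1%}")  # about 18.5%
```

Because the logarithm is monotonic, a lower sqrt(MSE) always means a higher PSNR for the same peak value.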

I can see these results but cannot explain them: which options actually produce the lower sqrt(MSE) value (the better quality) with --preset veryslow, and how?
Way more things happen than simple quantization! Entirely different tools get turned on at higher presets that aren't used at lower presets. Psychovisual modeling and optimization are applied. These might have an indirect impact on quantization, but it certainly isn't linear.

At higher presets, the correlation between QP and subjective quality probably worsens, since so many other ways to improve subjective quality get applied.

SpasV

29th May 2019, 19:16

Thanks.
x265 is complex software, and controlling the encoding process is not trivial.
By trial and error, I've found the option --rdoq-level <0|1|2> to be crucial in this case:
--rdoq-level
Specify the amount of rate-distortion analysis to use within quantization:
At level 2 rate-distortion cost is used to make decimate decisions on each 4x4 coding group, ...
I set --rdoq-level 2 with --preset medium and got good encoder performance and quality:
                                 sqrt(MSE)                    PSNR (dB)
                                 avg    Y      U      V       avg     Y
--preset veryfast                1.756  1.909  1.537  1.246   56.490  55.250
--preset veryslow                1.558  1.660  1.442  1.210   57.040  55.860
--preset medium --rdoq-level 2   1.587  1.697  1.451  1.224   56.800  55.590

Asmodian

31st May 2019, 21:42

Not that either of your metrics correlate particularly well with human perceptions of quality...

Have you tried using the slow preset? It is only one step slower than medium, and it is the fastest preset that enables rdoq-level 2 by default. I have found the presets to be very good for quality vs. speed tradeoffs.

SpasV

2nd June 2019, 14:34

I'm not an encoding enthusiast. When I do encode, my criterion is maximum quality at an acceptable size, which can be achieved by downscaling the source frame, for example from UHD to HD. At HD frame size and an average of 0.6-0.7 bpp (bits per pixel), which is typical BluRay quality, the encode size is quite good.
For this reason I don't rely on human perception: at high quality there are no differences for a human to see.
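Bits per pixel is the average bitrate divided by the number of pixels coded per second (width × height × frame rate). A small sketch, assuming a constant frame rate; the resolutions, frame rates and bitrates in the example are illustrative, not measurements from this thread.

```python
def bits_per_pixel(bitrate_bps: float, width: int, height: int, fps: float) -> float:
    """Average coded bits per pixel: bitrate / (pixels per frame * frames per second)."""
    return bitrate_bps / (width * height * fps)

def bitrate_for_bpp(bpp: float, width: int, height: int, fps: float) -> float:
    """Bitrate in bits per second that corresponds to a given bits-per-pixel budget."""
    return bpp * width * height * fps

# 30 Mbps at 1920x1080, 24 fps works out to roughly 0.6 bpp.
print(round(bits_per_pixel(30e6, 1920, 1080, 24), 3))          # 0.603
# Bitrate for 0.65 bpp at 1920x800 (a 2.40:1 crop), 23.976 fps, in Mbps.
print(round(bitrate_for_bpp(0.65, 1920, 800, 23.976) / 1e6, 1))  # 23.9
```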

That is also why I use --tune psnr (which disables adaptive quantization, psy-rd and cutree).

Nevertheless, what would you say about these specific options I've used:
--max-merge 4 --rdoq-level 2 --rd 4 --no-early-skip --rc-lookahead 40 --preset medium --tune psnr --qp 10 --ipratio 1.1 --pbratio 1.2 --no-deblock --no-sao
They were described as "beyond bad" settings.
Thanks.
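--ipratio and --pbratio are ratios of quantizer step sizes. x265, like x264, turns them into QP offsets of 6·log2(ratio), because the step size doubles every 6 QP. The sketch below shows the resulting per-frame-type QPs for the --qp 10 --ipratio 1.1 --pbratio 1.2 settings above. It assumes that in CQP mode --qp applies to P-frames and that the I- and B-frame QPs are rounded to the nearest integer; that is my reading of x265's rate control, not documented behaviour.

```python
import math

def cqp_frame_qps(qp: float, ipratio: float, pbratio: float) -> dict:
    """Per-frame-type QPs in constant-QP mode (assumed x265-like behaviour)."""
    ip_offset = 6.0 * math.log2(ipratio)  # I-frames get a lower QP than P-frames
    pb_offset = 6.0 * math.log2(pbratio)  # B-frames get a higher QP than P-frames
    return {
        "I": int(qp - ip_offset + 0.5),  # assumed round-to-nearest
        "P": int(qp),
        "B": int(qp + pb_offset + 0.5),
    }

print(cqp_frame_qps(10, 1.1, 1.2))  # {'I': 9, 'P': 10, 'B': 12}
print(cqp_frame_qps(12, 1.0, 1.0))  # {'I': 12, 'P': 12, 'B': 12}
```

With both ratios at 1.0 the offsets are zero, which is why ratios of 1.0 give every frame type the same QP.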

Forteen88

2nd June 2019, 17:40

@SpasV: you should have at least some deblocking, e.g. --deblock -3:-3,
because deblocking not only deblocks the image but also makes the video compress better.

sonnati

2nd June 2019, 23:23

The reason why "very slow" provides higher PSNR than "very fast" even at fixed QP is because recovery of information from previous frames (motion estimation and compensation) is better for slower presets. This means that the remaining delta signal carries less info and the given QP eliminates less info. Therefore the final amount of info (prediction + quantized delta) is higher for slower presets.

It is less clear why slower presets require so much more bitrate at the same QP if the delta signal is smaller. Part of it is probably the higher signaling cost of more accurate motion estimation and compensation, but in your example there's something more.
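The explanation above can be illustrated with a toy model: code a block as prediction plus a quantized residual, using a deadzone scalar quantizer with the same step for both cases (standing in for a fixed QP). The transform is skipped, and the sample values, step size and rounding offset of 1/6 are made up, so this only shows the intuition, not what x265 actually does.

```python
import math

def quantize_residual(residual, step, rounding=1.0 / 6.0):
    """Deadzone scalar quantization of each residual sample; returns integer levels."""
    return [int(math.copysign(math.floor(abs(r) / step + rounding), r)) for r in residual]

def reconstruction_mse(source, prediction, step):
    """MSE after coding `source` as `prediction` plus a quantized residual."""
    residual = [s - p for s, p in zip(source, prediction)]
    levels = quantize_residual(residual, step)
    recon = [p + lvl * step for p, lvl in zip(prediction, levels)]
    return sum((s - r) ** 2 for s, r in zip(source, recon)) / len(source)

source = [100.5, 99.7, 100.2, 100.0]
good_prediction = [100.0, 100.0, 100.0, 100.0]  # stands in for accurate motion compensation
poor_prediction = [93.2, 105.5, 91.1, 95.6]     # stands in for inaccurate motion compensation
step = 4.0  # the same quantizer step ("same QP") in both cases

print(reconstruction_mse(source, good_prediction, step))
print(reconstruction_mse(source, poor_prediction, step))
```

With the accurate prediction every residual sample falls into the deadzone, so the error is just the small residual itself; with the poor prediction the same step leaves much larger rounding errors.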

SpasV

3rd June 2019, 08:45

Thanks.
What I found was that the MSE was better with --rdoq-level 2.

--rdoq-level <0|1|2>
Specify the amount of rate-distortion analysis to use within quantization:

At level 0 rate-distortion cost is not considered in quant,
At level 1 rate-distortion cost is used to find optimal rounding values for each level,
At level 2 rate-distortion cost is used to make decimate decisions on each 4x4 coding group.

Level 2 is used by the presets slower than medium (slow and up).
Level 0 is used by the presets faster than slow (medium and down).

Probably, it is worth looking at the code.
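To make the three --rdoq-level descriptions above more concrete, here is a toy sketch. The rate model toy_bits, the lambda values and the group signaling cost are all hypothetical stand-ins (x265 estimates rates from its entropy-coder contexts and works on transform coefficients), so this only illustrates the shape of each decision, not x265's code.

```python
import math

def toy_bits(level: int) -> float:
    # Hypothetical rate model: a zero is cheap, larger magnitudes cost more bits.
    return 0.5 if level == 0 else 2.0 + 2.0 * math.log2(abs(level) + 1)

def quant_level0(coeff: float, step: float) -> int:
    # Like level 0: plain rounding to the nearest level, rate is ignored.
    return int(math.copysign(math.floor(abs(coeff) / step + 0.5), coeff))

def quant_level1(coeff: float, step: float, lam: float) -> int:
    # Like level 1: choose 0, round-down or round-up, whichever has the
    # lowest rate-distortion cost D + lambda * R.
    base = math.floor(abs(coeff) / step)
    best_cost, best_level = math.inf, 0
    for mag in sorted({0, base, base + 1}):
        level = int(math.copysign(mag, coeff))
        cost = (coeff - level * step) ** 2 + lam * toy_bits(level)
        if cost < best_cost:
            best_cost, best_level = cost, level
    return best_level

def quant_level2(group: list, step: float, lam: float, zero_group_bits: float = 1.0) -> list:
    # Like level 2: after the per-coefficient decisions, also test "decimating"
    # the whole 4x4 coding group (all 16 levels set to zero).
    levels = [quant_level1(c, step, lam) for c in group]
    keep_cost = sum((c - l * step) ** 2 + lam * toy_bits(l) for c, l in zip(group, levels))
    zero_cost = sum(c * c for c in group) + lam * zero_group_bits
    return [0] * len(group) if zero_cost < keep_cost else levels

group = [6.0] + [0.0] * 15  # one small coefficient in an otherwise empty 4x4 group
print(quant_level0(6.0, 10.0), quant_level1(6.0, 10.0, 3.0))  # 1 1
print(quant_level2(group, 10.0, 3.0))                           # sixteen zeros
```

Here plain rounding and the per-coefficient RD choice both keep the lone coefficient, while the group-level check finds it cheaper to drop the whole group.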

SpasV

13th July 2019, 20:30

x265 has ten predefined --preset options that optimize the trade-off between encoding speed (encoded frames per second) and compression efficiency (quality per bit in the bitstream).

I decided to look at compression efficiency under the two rate control modes, CRF and CQP.

In my understanding, the native rate control for video compression is constant QP. It is a "pure" mathematical method of compressing video. CRF, along with AQ, --psy-rd and --psy-rdoq, aims at uniform quality and better perceived visual quality in relatively low-quality encodes.

I ran some simple comparison tests to get rough estimates.

The setup is simplified. Source: Mad Max: Fury Road (2015) UHD BluRay HDR; encodes: 1080p 10-bit HDR, CRF and CQP.
Short clips of around 800 frames, encoded with --preset slower --no-cutree --ipratio 1.0 --pbratio 1.0 (so that all frames get the same QP).
In a probing run with CRF 15 --qcomp 0.9 --qpstep 1, I got average QPs of 12.18 (I-frames), 12.12 (P-frames) and 12.09 (B-frames), so I used QP 12 for the CQP mode.
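In HEVC the quantizer step size grows exponentially with QP, doubling every 6 QP (Qstep ≈ 2^((QP - 4) / 6)). A quick sketch of what that means for the probing run: the average CRF QPs of 12.09-12.18 correspond to step sizes only about 1-2% larger than at QP 12.

```python
def qstep(qp: float) -> float:
    """Approximate HEVC quantizer step size for a given QP (1.0 at QP 4)."""
    return 2.0 ** ((qp - 4.0) / 6.0)

for frame_type, avg_qp in (("I", 12.18), ("P", 12.12), ("B", 12.09)):
    # Relative step-size difference between the CRF average QP and CQP 12.
    print(frame_type, f"{qstep(avg_qp) / qstep(12) - 1:.1%}")
```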

I encoded four clips, taken from regions where the 2160p BluRay source stream runs at 70 Mbps and at 30 Mbps.

https://thumbs2.imgbox.com/13/1c/xs1nVhvy_t.png (http://imgbox.com/xs1nVhvy) https://thumbs2.imgbox.com/c6/18/TKDvFalp_t.png (http://imgbox.com/TKDvFalp)
https://thumbs2.imgbox.com/ba/44/hkkiUm0p_t.png (http://imgbox.com/hkkiUm0p) https://thumbs2.imgbox.com/53/36/ywFEUD36_t.png (http://imgbox.com/ywFEUD36)

I ran VMAF, PSNR, SSIM and MS-SSIM tests with vfam-master\Release\vmafossexec.exe and Model\vmaf_v0.6.1.pkl.

First look: the results.

Low bitrate (30 Mbps source)
Frames (start-count)  Mode      VMAF     PSNR     SSIM      MS-SSIM   Size (MB)  Size reduction
38594-777             CRF       99.590   54.036   0.99927   0.99911   106.876
                      CQP       99.518   53.898   0.99917   0.99900    95.441
                      CQP/CRF    0.999    0.997   0.99990   0.99988    89.30%    10.70%

43505-793             CRF       96.853   54.304   0.99936   0.99923    67.881
                      CQP       96.873   54.006   0.99902   0.99892    50.705
                      CQP/CRF    1.000    0.995   0.99966   0.99969    74.70%    25.30%

High bitrate (70 Mbps source)
Frames (start-count)  Mode      VMAF     PSNR     SSIM      MS-SSIM   Size (MB)  Size reduction
7066-772              CRF       99.742   53.453   0.99924   0.99907   113.454
                      CQP       99.724   53.325   0.99921   0.99900    95.441
                      CQP/CRF    1.000    0.998   0.99996   0.99994    84.12%    15.88%

116425-786            CRF       99.541   53.804   0.99944   0.99934    56.662
                      CQP       99.547   54.159   0.99931   0.99925    52.005
                      CQP/CRF    1.000    1.007   0.99987   0.99990    91.78%     8.22%

Second look: the bits allocated to the frames.
https://thumbs2.imgbox.com/5d/95/XL0VBed6_t.png (http://imgbox.com/XL0VBed6)
The first chart shows the bits per frame, as reported by x265, for the stream starting at frame 116425 (786 frames).
To the left of I-frame 500, which is actually frame 116924 (116424 + 500), i.e. earlier in time, there is a region where the CRF frames have higher bitrates than the CQP ones. Part of that region is shown below the first chart.

Below are a couple of comparison screenshots of frame 499.
Screenshots are 8-bit color.
The coordinates of the compared pixels (marked with a black dot) and their 10-bit YUV code values are shown in the boxes at the bottom right.
(The pixel at X:1031 Y:231 is in the spot of reflected sunlight on the left eye.
The pixel at X:900 Y:400 is under the nose, on the right.)

https://thumbs2.imgbox.com/d9/68/jJZifVuG_t.png (http://imgbox.com/jJZifVuG) https://thumbs2.imgbox.com/db/95/qygqWWOH_t.png (http://imgbox.com/qygqWWOH)
https://thumbs2.imgbox.com/40/45/0EmRkaLT_t.png (http://imgbox.com/0EmRkaLT) https://thumbs2.imgbox.com/0b/d7/BWxjhSB6_t.png (http://imgbox.com/BWxjhSB6)

Asmodian

13th July 2019, 22:19

This kind of comparison is of limited value, the sample is too short and the sizes are too different. What are you trying to learn? What is the purpose of looking at the single pixels?

SpasV

14th July 2019, 12:59

This kind of comparison is of limited value, the sample is too short and the sizes are too different.
Yes, this is not a research paper.

What are you trying to learn?

In essence: do I need AQ and the psy options when I'm aiming for a high-quality encode?
Although the comparison "is of limited value", my impression is that I do not need these options. CQP seems perfect to me, as long as I have chosen the QP and it stays unchanged.

What is the purpose of looking at the single pixels?

The purpose is to give the reader an impression of how close the images are. Although the brain can perceive the whole 1920x800 = 1,536,000-pixel image in an instant, it cannot distinguish individual pixels, or even pixel areas, when they are very close.
As for "the single pixels", let's look at the whole stream instead: 786 frames of 1,536,000 pixels each.
Here is a chart of RMSE for the stream.

https://thumbs2.imgbox.com/f1/54/DHAUyPfo_t.png (http://imgbox.com/DHAUyPfo)

RMSE stands for root mean squared error (the square root of the MSE), which is intuitively easier to understand.
Average RMSE: CRF 2.118, CQP 2.054.
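RMSE is the square root of the mean of the squared per-pixel differences. If only the share of pixels at each absolute difference is known, as in the scenario that follows, it can be computed from those shares directly. A minimal sketch with made-up inputs:

```python
import math

def rmse(frame_a, frame_b):
    """Root mean squared error between two equally sized sequences of code values."""
    squared = [(a - b) ** 2 for a, b in zip(frame_a, frame_b)]
    return math.sqrt(sum(squared) / len(squared))

def rmse_from_histogram(shares):
    """RMSE from {absolute difference: share of pixels}; zero-difference pixels may be omitted."""
    return math.sqrt(sum(share * d * d for d, share in shares.items()))

# Made-up 10-bit code values for four pixels of two encodes.
print(rmse([512, 514, 509, 600], [513, 512, 512, 600]))  # sqrt(14 / 4), about 1.87
print(rmse_from_histogram({1: 0.5, 2: 0.25, 3: 0.25}))   # sqrt(3.75), about 1.94
```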
These numbers are hard to interpret without knowing how the differences are distributed. Nevertheless, I'll try to give an understandable interpretation, assuming that every pixel's code value differs between the two encodes and that no difference is larger than 3.
In other words, I'm assuming the differences are 1, 2 or 3.
Here are possible distributions of such differences.

https://thumbs2.imgbox.com/05/ed/ty5XISSO_t.png (http://imgbox.com/ty5XISSO)
It seems likely to me that 2% of the pixels differ by 3, 33.8% by 2 and 69.7% by 1.
In fact, the distribution doesn't matter. Such 10-bit frames are indistinguishable when shown in 8-bit color, because differences smaller than 4 would become 0 (zero).
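A quick way to check what happens to small 10-bit differences on an 8-bit display path. The sketch assumes plain truncation (dropping the two least significant bits, i.e. a right shift by 2); real players and renderers may round or dither instead. Under that assumption many differences below 4 vanish, but a pair that straddles a multiple of 4 still ends up one 8-bit step apart.

```python
def to_8bit(code_10bit: int) -> int:
    """Convert a 10-bit code value to 8-bit by dropping the two low bits (truncation)."""
    return code_10bit >> 2

def diff_after_8bit(a: int, b: int) -> int:
    """Absolute difference between two 10-bit values once both are shown as 8-bit."""
    return abs(to_8bit(a) - to_8bit(b))

print(diff_after_8bit(512, 515))  # 0: both map to 128
print(diff_after_8bit(511, 512))  # 1: 127 vs 128, a 1-step 10-bit difference survives

# Share of all 10-bit pairs differing by d (1..3) that still differ after conversion.
for d in (1, 2, 3):
    survived = sum(diff_after_8bit(v, v + d) > 0 for v in range(1024 - d))
    print(d, f"{survived / (1024 - d):.1%}")
```

With truncation, about d/4 of the pairs that differ by d (d = 1, 2, 3) stay one 8-bit step apart; whether a single 8-bit step is visible is a separate question.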

benwaggoner

17th July 2019, 00:23

I decided to look at compression efficiency under the two rate control modes, CRF and CQP...

The setup is simplified. Source: Mad Max: Fury Road (2015) UHD BluRay HDR; encodes: 1080p 10-bit HDR, CRF and CQP.
I ran VMAF, PSNR, SSIM and MS-SSIM tests with vfam-master\Release\vmafossexec.exe and Model\vmaf_v0.6.1.pkl.

None of those metrics have demonstrated good subjective correlation with HDR content. Specifically, VMAF doesn't even claim to produce accurate scores with HDR content.

Also, there are absolutely psychovisual optimizations that improve subjective quality while harming all objective metrics, even VMAF. This kind of comparison with HDR content really needs to be done subjectively at this point.