x265's CRF versus low contrast HDR sources

benwaggoner

1st August 2018, 19:49

Try:
--aq-mode 3 --hdr-opt
(with 10 bit encoding)
https://x265.readthedocs.io/en/default/cli.html
--aq-mode 3 is basically --aq-mode 2 plus a bias towards low luma. This matters because cheap LCD displays often don't have enough contrast in that range to not cause banding/blocking. Also, Rec 709 isn't really perceptually uniform, and so is somewhat starved for code values near black and has an excess of code values near white.

HDR-10 uses PQ (Perceptual Quantizer) which doesn't have this issue, and has way more code values in low luma - the problem aq-mode 3 addresses doesn't exist. So --aq-mode 2 is appropriate when PQ is being used. Using aq-mode 3 will result in more bits being spent in low luma. You can probably switch to using --crf 16 and still get a bitrate reduction switching to --aq-mode 2.

--aq-mode 3 is good for SDR and probably HLG.

Boulder

2nd August 2018, 10:05

Is --aq-mode 2 recommended over 1 for HDR encodes? The docs really don't say much.

benwaggoner

2nd August 2018, 19:28

Is --aq-mode 2 recommended over 1 for HDR encodes? The docs really don't say much.
I generally prefer aq-mode 2 over 1, in the context of other tuning. aq-mode 1 is "safer" when bitrate isn't the paramount concern, but better low-bitrate quality is possible with aq-mode 2 (and particularly 3 for SDR - there's not an aq-mode 1 with low-luma bias).

The advantage of 2 over 1 in HDR is smaller than that of 3 over 1 with SDR. aq-mode 3 requires careful tuning with CRF, as it increases bitrate a decent amount in CRF mode, but since it improves quality in low luma, which is the place where low CRF is most needed, aq-mode 3 allows for higher CRF values to be used, and thus improved compression efficiency.

Getting aq-strength tuned in correctly is important; too much can waste bits in CRF, or hurt quality in highly detailed regions in lower bitrate CBR/ABR encodes.

Boulder

2nd August 2018, 21:04

Very interesting, thank you for the detailed explanation. I really wish there were proper, up-to-date tunings in x265 because these kinds of things would definitely belong there. I for one have never tried changing the AQ settings because I don't have enough understanding of the details.

Boulder

4th August 2018, 15:41

The advantage of 2 over 1 in HDR is smaller than that of 3 over 1 with SDR. aq-mode 3 requires careful tuning with CRF, as it increases bitrate a decent amount in CRF mode, but since it improves quality in low luma, which is the place where low CRF is most needed, aq-mode 3 allows for higher CRF values to be used, and thus improved compression efficiency.

Getting aq-strength tuned in correctly is important; too much can waste bits in CRF, or hurt quality in highly detailed regions in lower bitrate CBR/ABR encodes.

Some extra questions regarding these ones. I usually use CRF 19 for 720p or 1080p encodes (with mattes cropped) with --aq-mode 1. If I use --aq-mode 3, is it generally OK to raise CRF accordingly (and probably tune down --aq-strength to 0.8-0.9) to get around the same average bitrate? That is, is the biggest difference inside the areas of lower luma in the frames anyway or is there a danger of making "normal" areas worse by accident? In a short test, --aq-mode 3 required 40% more bits for the same CRF with all the other settings kept as they were.

EDIT: one more thing: --aq-mode 3 is meant for 8-bit SDR sources even though I use the Main10 profile to encode?
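
One way to approach the "raise CRF until the bitrate matches" idea is a simple bisection over CRF. The sketch below is only an illustration: `match_crf_to_bitrate` and `fake_aq3_kbps` are hypothetical names, and the bitrate model is invented (6000 kbps at CRF 19 with --aq-mode 1, the 40% increase reported above, and an assumed halving every 6 CRF points). In practice `measure_kbps` would have to run a real test encode and read back its average bitrate.

```python
from typing import Callable


def match_crf_to_bitrate(measure_kbps: Callable[[float], float],
                         target_kbps: float,
                         lo: float = 10.0,
                         hi: float = 30.0,
                         step: float = 0.25) -> float:
    """Return the lowest CRF on a grid whose bitrate is at or below target_kbps.

    Assumes bitrate falls monotonically as CRF rises. If even `hi` is above
    the target, `hi` is returned without being measured.
    """
    n_lo, n_hi = round(lo / step), round(hi / step)
    while n_lo < n_hi:
        mid = (n_lo + n_hi) // 2
        if measure_kbps(mid * step) > target_kbps:
            n_lo = mid + 1   # still too many bits, CRF must go up
        else:
            n_hi = mid       # at or under target, try a lower CRF
    return n_lo * step


def fake_aq3_kbps(crf: float) -> float:
    """Made-up stand-in for a test encode with --aq-mode 3 (not real data)."""
    return 6000.0 * 1.4 * 2.0 ** ((19.0 - crf) / 6.0)


print(match_crf_to_bitrate(fake_aq3_kbps, 6000.0))
```

Matching the average bitrate this way says nothing about where the bits went, so a visual check of the brighter areas is still needed.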

benwaggoner

7th August 2018, 00:15

Some extra questions regarding these ones. I usually use CRF 19 for 720p or 1080p encodes (with mattes cropped) with --aq-mode 1. If I use --aq-mode 3, is it generally OK to raise CRF accordingly (and probably tune down --aq-strength to 0.8-0.9) to get around the same average bitrate? That is, is the biggest difference inside the areas of lower luma in the frames anyway or is there a danger of making "normal" areas worse by accident? In a short test, --aq-mode 3 required 40% more bits for the same CRF with all the other settings kept as they were.

EDIT: one more thing: --aq-mode 3 is meant for 8-bit SDR sources even though I use the Main10 profile to encode?
--aq-mode 3 is meant for SDR sources. 8 versus 10 bit doesn't matter that much, although 8-bit tends to exacerbate the problems.

K.i.N.G

22nd August 2018, 17:54

benwaggoner

22nd August 2018, 21:52

Thanks for the replies!

So HDR10 footage doesn't really benefit that much from --aq-mode 3 even at low'ish bitrates (let's say 4K @ ±14Mbit/sec)?
(The x265 docs mention it does help also for low bitrate 10bit footage, but that is for SDR)
At any given bitrate, aq-mode 3 should help most SDR content, but hurt most PQ HDR content. Since HDR’s PQ is much more perceptually uniform than gamma, and HDR displays do a much better job rendering low luma, shifting more bits into low luma will hurt mid-high luma regions more than it will help low luma regions.

In general, aq-mode 3 will significantly increase bitrate at a given CRF, or increase the QP of mid-high luma at a fixed bitrate. It is just aq-mode 2 plus a negative offset to QP in low luma regions.
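
The trade-off described above can be shown with a toy model. This is not x265's actual AQ code: `toy_qp_offsets` and `rebalance_to_same_average` are hypothetical names, the log2(variance) term is just a common way to sketch variance-based AQ, and shifting all offsets back to a zero mean is a crude stand-in for what rate control does at a fixed bitrate.

```python
import math


def toy_qp_offsets(blocks, strength=1.0, dark_bias=0.0, dark_below=0.25):
    """Per-block QP offsets for a list of (mean_luma 0..1, variance) pairs.

    Toy model only: the offset follows log2(variance) relative to the frame
    average, and dark blocks get an extra negative offset of dark_bias.
    """
    logs = [math.log2(max(var, 1.0)) for _, var in blocks]
    avg = sum(logs) / len(logs)
    offsets = []
    for (luma, _), log_var in zip(blocks, logs):
        offset = strength * (log_var - avg)  # flat blocks get lower QP, busy ones higher
        if luma < dark_below:
            offset -= dark_bias              # hypothetical low-luma bias
        offsets.append(offset)
    return offsets


def rebalance_to_same_average(offsets):
    """Crude fixed-bitrate stand-in: shift offsets so their mean is zero."""
    mean = sum(offsets) / len(offsets)
    return [o - mean for o in offsets]


# Four blocks with equal variance: one dark, three mid/bright.
frame = [(0.10, 16.0), (0.50, 16.0), (0.60, 16.0), (0.80, 16.0)]
unbiased = toy_qp_offsets(frame)
biased = toy_qp_offsets(frame, dark_bias=2.0)
print(unbiased)                           # no low-luma bias
print(biased)                             # dark block gets a lower QP (more bits in CRF)
print(rebalance_to_same_average(biased))  # same average QP: brighter blocks pay for it
```

In the CRF-like case the dark block simply costs extra bits; once the average is held constant, the mid and bright blocks end up with a higher QP, which is the effect argued to hurt PQ content.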

K.i.N.G

23rd August 2018, 03:27

...and HDR displays do a much better job rendering low luma.

Yes (I just bought an OLED and it's amazing), but I'm not sure why you're mentioning the ability of a screen to render low luma as if it's a reason to not pay any attention to it...

You can have a TV that can render the highest quality low luma but if they are encoded badly that doesn't matter. They will still look bad.

My question is/was: does x265's adaptive CRF algorithm take the HDR PQ into account, or does it distribute bits using the old SDR gamma'ish curve(s) (and thus lose detail because of the resulting loss in contrast)?

I got the impression that this is happening, because I have to use a much lower CRF to get something more decent, and there is still a lot of quality lost in low contrast areas (more than expected compared to other parts).
Sadly I currently don't have enough time to experiment more...

benwaggoner

28th August 2018, 18:32

Yes (I just bought an OLED and it's amazing), but I'm not sure why you're mentioning the ability of a screen to render low luma as if it's a reason to not pay any attention to it...
We're going to pay attention to low luma! But with HDR, the steps between low-luma code values aren't more important than with mid or high luma. With SDR, a block of Y'=16 next to a block of Y'=17 can be painfully obvious. But in HDR, Y'=64 and Y'=68 are almost indistinguishable.

SDR needs lower QP in low luma. But HDR, with the Perceptual Quantizer, is much more perceptually uniform, so the same QP works well independent of mean luma.

My question is/was: does x265's adaptive CRF algorithm take the HDR PQ into account, or does it distribute bits using the old SDR gamma'ish curve(s) (and thus lose detail because of the resulting loss in contrast)?
My understanding is that all AQ modes other than aq-mode 3 assume perceptual uniformity. Although x264 being what it is, there may be some frame level luma offset constants in there somewhere. --aq-mode 3 exists exactly because AQ wasn't adequately sensitive to different QP needs at different luma values.

I got the impression that this is happening, because I have to use a much lower CRF to get something more decent, and there is still a lot of quality lost in low contrast areas (more than expected compared to other parts).
Sadly I currently don't have enough time to experiment more...
Using --aq-mode 3 will generally use more bits at the same CRF than --aq-mode 2, due to reduced QP in low luma regions. However, since low luma issues are often what drive picking a lower CRF level, you can often get away with a higher CRF with --aq-mode 3 in SDR. Comparing at the same --bitrate is the proper testing procedure of course.

Using --aq-mode 3 in HDR still gives you the bitrate hit due to lower QP in darker regions, but no visible quality improvements. At the same bitrate HDR --aq-mode 3 will look worse than --aq-mode 2 because the reduced QP in low luma results in higher QP in higher luma, which is a net reduction in perceptual quality.

K.i.N.G

29th August 2018, 05:55

We're going to pay attention to low luma! But with HDR, the steps between low-luma code values aren't more important than with mid or high luma. With SDR, a block of Y'=16 next to a block of Y'=17 can be painfully obvious. But in HDR, Y'=64 and Y'=68 are almost indistinguishable.

SDR needs lower QP in low luma. But HDR, with the Perceptual Quantizer, is much more perceptually uniform, so the same QP works well independent of mean luma.

My understanding is that all AQ modes other than aq-mode 3 assume perceptual uniformity. Although x264 being what it is, there may be some frame level luma offset constants in there somewhere. --aq-mode 3 exists exactly because AQ wasn't adequately sensitive to different QP needs at different luma values.

Using --aq-mode 3 will generally use more bits at the same CRF than --aq-mode 2, due to reduced QP in low luma regions. However, since low luma issues are often what drive picking a lower CRF level, you can often get away with a higher CRF with --aq-mode 3 in SDR. Comparing at the same --bitrate is the proper testing procedure of course.

Using --aq-mode 3 in HDR still gives you the bitrate hit due to lower QP in darker regions, but no visible quality improvements. At the same bitrate HDR --aq-mode 3 will look worse than --aq-mode 2 because the reduced QP in low luma results in higher QP in higher luma, which is a net reduction in perceptual quality.

That all has more to do with 8bit vs 10bit than HDR vs SDR...
Not that anyone in his right mind would ever encode an HDR video with 8bit, but it's technically possible.
Anyway, that was not what my question was about.

I think you are not quite understanding what I'm asking.

If we encoded HDR footage but told the decoder that it's an SDR video, then when we play it, it would be all washed out/low contrast, right?
This shows that post decoding there are some color adjustments going on in order to display the data/footage correctly (that is why we put flags in the video; the decoder needs to know how to interpret the data).
Just because the data has internally encoded a block as Y'= n doesn't mean it is displayed as Y' = n. It is most likely it never is.
When this data is decoded then the decoder checks which color space is assigned and which color space should be used to display the color.
Just to be sure I'm not misunderstood, I'm not talking about the TV's custom screen settings. I'm assuming a correctly calibrated screen.
There is a difference between a rec.709 and a rec.2020 flag, as you undoubtedly know. So x265's adaptive algorithms should be adjusted accordingly. If not, its bit distribution would be much

One would expect x265 to do this, but the (few) tests I've done made me doubt this.

alex1399

29th August 2018, 06:54

Actually, a more precise statement of your problem would help; otherwise others have to guess, and obviously guess wrong. Someone mentioned before that HDR footage is not linearly distributed. One way is to use a high-bit-depth transform to a linearly distributed plane, do all the processing there, and then transform back; the other is to use --aq-mode 2 with a higher --aq-strength to compensate for the lack of variance in large flat bright areas. I assumed that there is some upscaling in your encoding process.

Sometimes the background and sky will get too many bits with a high --aq-strength. And, maybe a --aq-mode 4 for BT.2020?

K.i.N.G

29th August 2018, 19:59

Yes, I am aware of the fact that I might be not doing a great job at explaining my thoughts on this.
And i mean no disrespect to benwaggoner at all. His signature mentions he's Principal Video Specialist, Amazon Instant Video.. So I'm sure he knows more about this than I do.

I do a lot of 3D rendering for work. We use something called adaptive sampling and importance sampling to speed things up.
Areas of an image that are more perceptually important to the human eye get more samples than areas in which the human eye (and brain actually) don't notice much.
Combine that with the fact that things are rendered in linear space but when shown on a screen a color profile is applied, which changes the look of the image drastically.
Thus this color profile needs to be taken into account or the adaptive algorithms will make things look bad.
The same should be applied to a video encoder that relies on psychovisual algorithms, no?

So it is not enough that we have algorithms that only take human perception into account.
It first should consider how it will actually be displayed (color profile) and apply the adaptive psychovisual algorithms based on human perception on top of that.
So my question is; does x265 do that?

As far as I know, aq-modes 1, 2 or 3 will assign bits the same way whether you are encoding a rec.709 SDR video or a bt.2020 HDR video.
This makes no sense to me because the data will be shown completely different on a screen.

Also (a bit off topic maybe), people seem to think that 10bit HDR will produce smoother gradients than 8bit SDR.
But if you take into account that HDR has a much higher peak brightness, those bits are getting stretched over a bigger range.
I know SDR has no defined brightness, so I added a bit more to what the average google results gave me, just to be sure.
8bit SDR = 256 steps stretched over about 250nits.
Let's say 256nits to make the math easy.
10bit HDR = 1024 steps stretched over 1000nits (or even +4000nits theoretically).
Let's say 1024nits to make the math easy.
So, both (in this case) have 1 gradient step per nit.
Thus 10bit won't prevent banding in HDR footage as much as it would in SDR content.

benwaggoner

29th August 2018, 23:47

That all has more to do with 8bit vs 10bit than HDR vs SDR...
Not that anyone in his right mind would ever encode an HDR video with 8bit, but it's technically possible.
PQ is more perceptually uniform than SDR irrespective of bit depth. At 8-bit, PQ will have more code values in low luma than 8-bit SDR will. --aq-mode 3 is in large part a fix for the lack of perceptual uniformity in low luma in SDR gamma, exacerbated by post-CRT displays having issues with accurate display of low luma values.

If we encoded HDR footage but told the decoder that it's an SDR video, then when we play it, it would be all washed out/low contrast, right?
Yeah, and the color will be more yellow/brown.

This shows that post decoding there are some color adjustments going on in order to display the data/footage correctly (that is why we put flags in the video; the decoder needs to know how to interpret the data).
True. Like 709 versus 601 versus 2020 versus 2020 PQ. There isn't really any "default" mapping from video levels to display levels.

Just because the data has internally encoded a block as Y'= n doesn't mean it is displayed as Y' = n. It is most likely it never is.
Well, the PQ curve HAS specified nit values for each code value. So Y'=n would appear the same on all perfectly calibrated PQ displays.

When this data is decoded then the decoder checks which color space is assigned and which color space should be used to display the color.
Just to be sure I'm not misunderstood, I'm not talking about the TV's custom screen settings. I'm assuming a correctly calibrated screen.
Not always a great assumption. And there weren't canonical display-referenced mappings for 601/709 like there are for PQ.

There is a difference between a rec.709 and a rec.2020 flag, as you undoubtedly know. So x265's adaptive algorithms should be adjusted accordingly. If not, its bit distribution would be much
One would expect x265 to do this, but the (few) tests I've done made me doubt this.
Correct, it doesn't do that. Most encoders assume perceptual linearity across code values unless otherwise specified. Which is why --aq-mode 3 was created for x264 and x265.

benwaggoner

29th August 2018, 23:56

I do a lot of 3D rendering for work. We use something called adaptive sampling and importance sampling to speed things up.
Areas of an image that are more perceptually important to the human eye get more samples than areas in which the human eye (and brain actually) don't notice much.
Combine that with the fact that things are rendered in linear space but when shown on a screen a color profile is applied, which changes the look of the image drastically.
Thus this color profile needs to be taken into account or the adaptive algorithms will make things look bad.
The same should be applied to a video encoder that relies on psychovisual algorithms, no?
I long for the world where all video is done in floating point linear light, but EXCEPT for 3D rendering and some VFX, most "normal" video stuff is done on some kind of curve. This is changing in film production with ACES.

No codec takes linear light as input, nor do codecs themselves do color volume conversions (encoders may do it, but that's upstream of the codec). That all gets done outside of the codec, which starts with subsampled Y'CbCr code values.

So it is not enough that we have algorithms that only take human perception into account.
It first should consider how it will actually be displayed (color profile) and apply the adaptive psychovisual algorithms based on human perception on top of that.
So my question is; does x265 do that?
It doesn't happen automatically. One could argue that it should default to --aq-mode 2 when HDR metadata or --hdr-opt is set, and otherwise default to --aq-mode 3, but nothing like that actually happens.

As far as I know, aq-modes 1, 2 or 3 will assign bits the same way whether you are encoding a rec.709 SDR video or a bt.2020 HDR video.
Pretty much, although --hdr-opt does some QP adjustments itself.

This makes no sense to me because the data will be shown completely different on a screen.
I shall not argue that video technology works the way it would work if we could start over today from scratch :).

Also (a bit off topic maybe), people seem to think that 10bit HDR will produce smoother gradients than 8bit SDR.
But if you take into account that HDR has a much higher peak brightness, those bits are getting stretched over a bigger range.
I know SDR has no defined brightness, so I added a bit more to what the average google results gave me, just to be sure.
8bit SDR = 256 steps stretched over about 250nits.
Let's say 256nits to make the math easy.
10bit HDR = 1024 steps stretched over 1000nits (or even +4000nits theoretically).
Let's say 1024nits to make the math easy.
So, both (in this case) have 1 gradient step per nit.
Thus 10bit won't prevent banding in HDR footage as much as it would in SDR content.
Actually, video only uses the 16-235 or 64-940 range; values above and below are supposed to be still max black or max white.

And TVs are all over the place with what they use peak white for. If anything, the main reason HDR gets higher peaks in practice is that you never get a 1000 nit white frame in HDR, while a frame of Y'=230 isn't that uncommon. So with thermal/power limitations, HDR can have more small points of brighter display, since the average frame light level really isn't higher than for SDR on typical consumer displays.
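
The "1 gradient step per nit" arithmetic assumes code values are spread linearly over the brightness range, which PQ does not do. As a minimal sketch, the SMPTE ST 2084 (PQ) EOTF below maps 10-bit code values to nits so the size of one code step can be compared at different levels. The helper names are made up for illustration, and narrow-range 10-bit luma (64 = black, 940 = peak) is an assumption about the signal.

```python
# Constants from the SMPTE ST 2084 / BT.2100 PQ definition.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_eotf(signal: float) -> float:
    """PQ EOTF: nonlinear signal in [0, 1] -> display luminance in nits."""
    p = signal ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def code_to_nits(code: int) -> float:
    """10-bit narrow-range luma code (assumed 64..940) -> nits."""
    signal = min(max((code - 64) / 876, 0.0), 1.0)
    return pq_eotf(signal)


def step_nits(code: int) -> float:
    """Luminance difference between one code value and the next."""
    return code_to_nits(code + 1) - code_to_nits(code)


for code in (100, 300, 500, 700, 900):
    print(code, code_to_nits(code), step_nits(code))
```

The steps near black come out as tiny fractions of a nit while the steps near peak white are tens of nits, so the code values are concentrated where the eye is most sensitive to absolute differences.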

K.i.N.G

31st August 2018, 14:25

I only mentioned the fact that video editing and 3D rendering is done in linear space because that is an easy example.

Pretty much, although --hdr-opt does some QP adjustments itself.

Ahh, and there we go! So there is room for improvement.
They could adjust the adaptive quantizer's algorithm depending on what color space is selected.
And I'm convinced this could potentially increase efficiency/quality by quite a margin.

I shall not argue that video technology works the way it would work if we could start over today from scratch :).

Sure, but this specifically 'only' requires adjusting the encoder not the entire history of how video works/evolved. :D

benwaggoner

31st August 2018, 16:22

I only mentioned the fact that video editing and 3D rendering is done in linear space because that is an easy example.
Very little video editing is done in linear. It's mainly seen in VFX and film coloring. Tools like After Effects and Premiere can have particular projects set to run in 32-bit linear light, but it's not the default. I do a lot in linear myself, but that's more doing corrections and conversions, not creative work.

Ahh, and there we go! So there is room for improvement.
They could adjust the adaptive quantizer's algorithm depending on what color space is selected.
And I'm convinced this could potentially increase efficiency/quality by quite a margin.
I'm not sure about "quite a margin" but it is a historically undervalued aspect of psychovisual optimization.

One could actually think of --aq-mode 3 as "--sdr-opt."

Sure, but this specifically 'only' requires adjusting the encoder not the entire history of how video works/evolved. :D
Yeah. Although an encoder that took linear light into the quantization stage and then quantized based on the output color volume could be awesome. It's always bothered me that we convert to final bit depth before doing the frequency transform, even though the iDCT values get more bits and those bits don't have 1:1 mapping with pixels anyway.

K.i.N.G

31st August 2018, 23:54

Very little video editing is done in linear. It's mainly seen in VFX and film coloring. Tools like After Effects and Premiere can have particular projects set to run in 32-bit linear light, but it's not the default. I do a lot in linear myself, but that's more doing corrections and conversions, not creative work.

Well, this I know a thing or two about and I can assure you that most creative work (CGI/VFX, Motion Graphics, ...) in after effects, nuke, etc... is done in linear or the math of the filters and layer blending modes isn't correct.
At least if the guy knows what he's doing. I can imagine that plenty of self-taught hobbyists and youtubers etc aren't aware of this and just 'play around' until it looks acceptable to them. But ask any big studio that knows what they're doing and they will confirm this.

benwaggoner

1st September 2018, 01:28

Well, this I know a thing or two about and I can assure you that most creative work (CGI/VFX, Motion Graphics, ...) in after effects, nuke, etc... is done in linear or the math of the filters and layer blending modes isn't correct.
At least if the guy knows what he's doing. I can imagine that plenty of self-taught hobbyists and youtubers etc aren't aware of this and just 'play around' until it looks acceptable to them. But ask any big studio that knows what they're doing and they will confirm this.
True, most professional scripted content is going to be done as linear float internally. But the source and output formats very rarely are live linear, which is what gets to the encoder. And no distribution codec encodes in linear.

Although I hope we get one someday. ACES has demonstrated some good visually lossless compression with linear float, and I think a lot of block-based motion compensated techniques in the frequency domain could be applied to linear float.

Boulder

29th October 2018, 08:01

benwaggoner

29th October 2018, 18:36

In case someone else is working with HDR sources and does CRF encoding: you need to lower CRF quite a lot compared to SDR sources. I did some testing with an episode of Game of Thrones S01 and the result was that CRF 15 produced sufficient quality when the final resolution was 1080p. With SDR sources, CRF 20.5 is the sweet spot for me.
Hmm. I haven’t seen nearly such a differential. Were you using --hdr-opt and --aq-mode 2 with the HDR?

RainyDog

31st October 2018, 10:23

In case someone else is working with HDR sources and does CRF encoding: you need to lower CRF quite a lot compared to SDR sources. I did some testing with an episode of Game of Thrones S01 and the result was that CRF 15 produced sufficient quality when the final resolution was 1080p. With SDR sources, CRF 20.5 is the sweet spot for me.

This is what I've found too. CRF needs to be a good 3-5 lower when encoding HDR sources.

Most HDR encodes will be about half the size and half the quality of the equivalent SDR encode at the same CRF.

RainyDog

31st October 2018, 10:36

Hmm. I haven’t seen nearly such a differential. Were you using --hdr-opt and --aq-mode 2 with the HDR?

My findings are the same. That's using --hdr-opt though, not --aq-mode 2, which I still think is the worst of the 3 AQ modes after testing it a bit again.

Grainy sources level it out somewhat as grain and noise just attract bits regardless. But I just don't think the encoder is properly adapted to deal with non tone-mapped HDR sources really.

Boulder

31st October 2018, 11:43

Hmm. I haven’t seen nearly such a differential. Were you using --hdr-opt and --aq-mode 2 with the HDR?

Yes, both were used, aq-strength 1.0. I actually ended up using CRF 14 for the first Harry Potter movie, average bitrate was around 7.5 Mbps 25% into the encode (very light denoising and downsizing to 1080p). I also feel that the bits are allocated based on the same flat image you see when you view the video on a non-HDR display, so it really cannot be optimal.

RainyDog

31st October 2018, 14:18

I also feel that the bits are allocated based on the same flat image you see when you view the video on a non-HDR display, so it really cannot be optimal.

They are and that's mainly the problem it would seem.

Bare HDR sources don't have the tone and luminance variation of SDR sources as that mapping is added by the decoder. So the encoder doesn't have that as a basis for bitrate distribution which would otherwise be a huge factor in where bits are allocated with SDR sources.

blublub

8th November 2018, 07:46

RainyDog

8th November 2018, 10:08

So the takeaway for HDR sources is currently:

lower your CRF about 3-5 for HDR encodes and avoid aq-mode 3 and 2 - is that correct? Does this also apply for encoding in UHD? (some ppl in this thread downsized to 1080p)

No, I've still used aq-mode 3 for my HDR encodes as I find it the best all round aq-mode for low-mid bitrate encodes where dark frames need all the extra bits they can get.

Depends what bitrates/quality level you're aiming for though, really. There's still a solid case for aq-mode 1 being the most dependable, balanced and safest mode for transparent high bitrate encodes though.

Same should be applicable to 1080p or 2160p.

blublub

8th November 2018, 13:55

Well if u r downsizing UHD to HD such a low CRF might make sense.
But when u keep the original resolution CRF values below 16 often result in compression ratios not worth the encoding time or when the source is noisy even in larger than original sizes

K.i.N.G

10th November 2018, 00:41

I found that bumping the AQ-Strength quite a lot can be helpful to retain the low contrast details better. (Since more AQ-Strength will focus the bits more on flatter/textured areas instead of edges/lines/higher contrast areas, thus it balances things out a bit better, though obviously not ideal)
Which confirms my initial thought that x265 doesn't take into account how HDR footage will be displayed and analyses it the same way as it analyses SDR footage. (so, low contrast areas will be seen by x265 as even flatter areas and thus will be blurred out or even result in banding).

Currently I'm encoding some HDR footage (it is 1080p though) with AQ-Strength bumped to 1.8, Qcomp to 0.7 and SubMe 7... Which brings it a bit 'closer' to expected results (still not as good as it could be imho).

Boulder

10th November 2018, 01:19

Which aq-mode are you using?

K.i.N.G

10th November 2018, 01:50

Which aq-mode are you using?

Currently, with this encode, I'm using 2 but for no particular reason. Normally I like 1 more... But I figured I'd try 2 for once because some ppl seem to recommend it, so I thought it might help with this. Maybe it's better for HDR... I'll do the same encode with mode 1 after this, so I can compare, but that's going to take a few days so...

Boulder

10th November 2018, 02:12

Please keep us posted, it's interesting to know if there are any differences. I've made only one HDR encode so far but didn't do any comparisons but just used the default aq-mode and strength.

benwaggoner

12th November 2018, 19:14

Currently, with this encode, I'm using 2 but for no particular reason. Normally I like 1 more... But I figured I'd try 2 for once because some ppl seem to recommend it, so I thought it might help with this. Maybe it's better for HDR... I'll do the same encode with mode 1 after this, so I can compare, but that's going to take a few days so...
aq-mode values aren't a "more" - they are modes. 0 is no aq, 1 is a static implementation, 2 adds auto variance, and 3 is 2 except with a bias towards lower QPs for low luma values.

We'd generally expect 2 to be better than 1 for most (but not all) content. And we'd expect 3 to be better for SDR as SDR has sparser code values in low luma than high luma. We'd expect 2 to be better than 3 for PQ curve HDR because PQ is much more perceptually uniform and has plenty of low-luma code values.

I'm not sure what would be best for HLG HDR.
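
As a rough sketch of that rule of thumb (it is a convention from this thread, not something x265 does by itself), a small wrapper could pick the AQ mode from the transfer function before building the command line. `suggested_aq_mode` and `x265_args` are hypothetical helpers; the file names are placeholders, and `--hdr-opt` is spelled as it is in this thread, so check `x265 --help` for your build since option names can change between versions.

```python
from typing import List, Optional


def suggested_aq_mode(transfer: str) -> Optional[int]:
    """Thread rule of thumb: 2 for PQ, 3 for SDR gamma, undecided for HLG."""
    t = transfer.lower()
    if t == "smpte2084":        # PQ (HDR10)
        return 2
    if t == "arib-std-b67":     # HLG: no firm recommendation in the thread
        return None
    return 3                    # treated as SDR gamma, e.g. bt709


def x265_args(src: str, dst: str, transfer: str, crf: float) -> List[str]:
    """Build an x265 argument list; it is only assembled here, never run."""
    args = ["x265", "--input", src, "--output", dst,
            "--output-depth", "10", "--crf", str(crf),
            "--transfer", transfer]
    mode = suggested_aq_mode(transfer)
    if mode is not None:
        args += ["--aq-mode", str(mode)]
    if transfer.lower() == "smpte2084":
        args.append("--hdr-opt")   # flag name as used in this thread
    return args


print(" ".join(x265_args("in.y4m", "hdr.hevc", "smpte2084", 16)))
print(" ".join(x265_args("in.y4m", "sdr.hevc", "bt709", 19)))
```

Leaving HLG without a suggestion is deliberate here; it just means the encoder's default AQ mode is used until someone tests it.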

benwaggoner

12th November 2018, 19:16

Boulder

12th November 2018, 19:21

What does the "auto variance" actually mean? I've always thought that aq-mode 1 is already some kind of automatically varying mode based on qg-size.

On HDR sources, mode 2 seems to produce noticeably smaller files for the same CRF than mode 1. This would somehow point to the problem with the flat, non-tonemapped image x265 seems to use while analyzing things. Using aq-strength 1.8 for mode 1 makes the bitrate shoot through the roof compared to strength 1.0. I'm going to do some visual compares as soon as I have the time, but it's difficult because a 2-pass encode is a no-no since I'm not hitting a specific size. Comparing those two strengths at the bitrate that strength 1.0 produces at CRF 15 or so is not fair because strength 1.8 would require so many more bits.

benwaggoner

12th November 2018, 19:27

What does the "auto variance" actually mean? I've always thought that aq-mode 1 is already some kind of automatically varying mode based on qg-size.
The aq-mode options are all inherited from x264, which didn't have qg-size. I don't recall the specific differences in the algorithms. I imagine they've evolved some in x265 in any case. Modes 2/3 are more adaptive, which is generally better. But 2 was experimental in x264 for a long time, IIRC because with some content it wouldn't adapt optimally.

On HDR sources, mode 2 seems to produce noticeably smaller files for the same CRF than mode 1. This would somehow point to the problem with the flat, non-tonemapped image x265 seems to use while analyzing things. Using aq-strength 1.8 for mode 1 makes the bitrate shoot through the roof compared to strength 1.0. I'm going to do some visual compares as soon as I have the time, but it's difficult because a 2-pass encode is a no-no since I'm not hitting a specific size. Comparing those two strengths at the bitrate that strength 1.0 produces at CRF 15 or so is not fair because strength 1.8 would require so many more bits.
I generally recommend using 2-pass VBR when doing comparisons of features like this. With CRF you are changing bitrate AND quality together, so it's hard to tease out any actual encoding efficiency improvements. Going to a fixed file size at a reasonably challenging bitrate (so you're going to see some artifacts) is what I've found as the most efficient way to do these comparisons.
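
A minimal sketch of that kind of comparison: build two-pass command lines for each variant at the same target bitrate, so only the setting under test differs. `two_pass_commands`, the file names, and the 6000 kbps figure are placeholders for illustration; the script only prints the commands and does not run the encoder.

```python
import shlex
from typing import Dict, List


def two_pass_commands(src: str, tag: str, kbps: int,
                      extra_args: List[str]) -> List[List[str]]:
    """Return [first_pass, second_pass] x265 argument lists for one variant."""
    common = ["x265", "--input", src, "--bitrate", str(kbps),
              "--stats", f"{tag}.stats", *extra_args]
    first = common + ["--pass", "1", "--output", f"{tag}.pass1.hevc"]
    second = common + ["--pass", "2", "--output", f"{tag}.hevc"]
    return [first, second]


# Same source and bitrate for every variant; only the AQ mode changes.
variants: Dict[str, List[str]] = {
    "aq1": ["--aq-mode", "1"],
    "aq2": ["--aq-mode", "2"],
}
plans = {tag: two_pass_commands("clip.y4m", tag, 6000, extra)
         for tag, extra in variants.items()}

for tag, passes in plans.items():
    for args in passes:
        print(shlex.join(args))
```

Each variant keeps its own stats file so the second pass of one run cannot pick up the first-pass analysis of another.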

Boulder

12th November 2018, 19:36

Going to a fixed file size at a reasonably challenging bitrate (so you're going to see some artifacts) is what I've found to be the most efficient way to do these comparisons.

Then again, if bitrate is a property that I don't need to control (because I cannot tell a static average bitrate for all material), I shouldn't worry about it. As CRF is as close to constant quality as we can ever get, I have been able to set a satisfying CRF level which means it should be fixed to that.

For example, I just ran two test encodes; aq-mode 1, strength 1.0 gave me 6013 kbps and strength 1.8 needed 11610 kbps for the same clip with all the other settings kept the same. Running a 2-pass encode comparison at 6000 kbps, it's quite easy to predict that strength 1.0 will have fewer artifacts or be sharper and more detailed. It still doesn't mean it's any better at my desired base quality level :)
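To put those two numbers side by side, a quick worked calculation of the relative bitrate increase (the function name is just for illustration):

```python
def bitrate_increase_percent(base_kbps, other_kbps):
    """Relative increase of other_kbps over base_kbps, in percent."""
    return (other_kbps - base_kbps) / base_kbps * 100.0

# Bitrates quoted above for aq-mode 1 at strength 1.0 and 1.8, same CRF.
extra = bitrate_increase_percent(6013, 11610)
print(f"strength 1.8 uses {extra:.1f}% more bits than strength 1.0")
```

So at the same CRF the two encodes are nowhere near the same size, which is why an equal-CRF comparison and an equal-bitrate comparison answer different questions.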

benwaggoner

12th November 2018, 20:50

Then again, if bitrate is a property that I don't need to control (because I cannot tell a static average bitrate for all material), I shouldn't worry about it. As CRF is as close to constant quality as we can ever get, I have been able to set a satisfying CRF level which means it should be fixed to that.

For example, I just ran two test encodes; aq-mode 1, strength 1.0 gave me 6013 kbps and strength 1.8 needed 11610 kbps for the same clip with all the other settings kept the same. Running a 2-pass encode comparison at 6000 kbps, it's quite easy to predict that strength 1.0 will have fewer artifacts or be sharper and more detailed. It still doesn't mean it's any better at my desired base quality level :)
The challenge is how to respond when something is, say, 11% smaller but looks a little bit worse.

Because CRF is really not that close to constant quality. It's just a psychovisual offset from QP, and lots of changes can impact how good a given CRF value looks in practice. For example, try adding --nr 1000 to your string. It'll be a lot smaller and will look quite different. Pretty much all the psychovisual stuff changes appearance and efficiency with a given CRF value.

One can think of codec parameter tuning as finding ways to make the worst parts of the video look better so that a higher CRF value can be used for the same perceptual quality.

K.i.N.G

12th November 2018, 23:23

On HDR sources, mode 2 seems to produce noticeably smaller files for the same CRF than mode 1.

With the footage I'm currently encoding (yes, it's HDR) it's the other way around.
AQ-Mode 1 turned out a lot smaller than AQ-Mode 2.
So it is highly dependent on the type of footage (grain, motion, contrast,...).

So, as Ben says, it's always recommended to use 2-pass when comparing.
Or even 3-pass because there still is quite a difference between the resulting bitrates when using 2-pass.
Maybe 3-pass is more accurate? (don't have the time to test it, sadly)

So far, in my (quick) tests on the current project, AQ-Mode 2 turned out as the 'winner'.
It's HQ footage with slight (but noticeable) noise. AQ-Mode 2 preserved details better in the chroma channels and resulted in quite a bit less distortion.
Keep in mind though that this probably varies with the type of footage... With footage that has a lot of grain, distortion might be a more acceptable type of artifact and result in a perceptually better image. (I'll test that probably another time)

K.i.N.G

12th November 2018, 23:33

Using aq-strength 1.8 for mode 1 makes the bitrate shoot through the roof compared to strength 1.0.

Yeah, that's normal. Just use higher CRF.

When testing to compare quality:
Test using 2-pass, and once you find your desired 'target', try to match it by raising/lowering CRF until you get close enough to the result you got with 2-pass.
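The raise/lower loop can be sketched as a simple bisection. This is only an illustration: `measure_kbps` stands for "run a test encode at this CRF and read back the bitrate", and the stand-in used here is a made-up model rather than a real encoder. The sketch assumes bitrate only ever falls as CRF rises.

```python
def match_crf(target_kbps, measure_kbps, lo=0.0, hi=51.0, tolerance=0.02, max_steps=12):
    """Bisect CRF until the measured bitrate is within `tolerance` of the target.

    Assumes bitrate falls monotonically as CRF rises.
    """
    crf = (lo + hi) / 2
    for _ in range(max_steps):
        crf = (lo + hi) / 2
        kbps = measure_kbps(crf)
        if abs(kbps - target_kbps) <= tolerance * target_kbps:
            break
        if kbps > target_kbps:
            lo = crf  # too many bits: search higher CRF values
        else:
            hi = crf  # too few bits: search lower CRF values
    return crf

# Hypothetical stand-in for a test encode: bitrate halves every 6 CRF steps.
def fake_encode_kbps(crf):
    return 48000 * 2 ** (-crf / 6)

crf = match_crf(6000, fake_encode_kbps)
print(round(crf, 2), round(fake_encode_kbps(crf)))
```

With a real encoder each step costs a full test encode, so a short representative clip and a loose tolerance keep this practical.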

Boulder

13th November 2018, 05:16

With the footage I'm currently encoding (yes, it's HDR) it's the other way around.
AQ-Mode 1 turned out a lot smaller than AQ-Mode 2.
So it is highly dependent on the type of footage (grain, motion, contrast,...).

So, as Ben says, it's always recommended to use 2-pass when comparing.
Or even 3-pass because there still is quite a difference between the resulting bitrates when using 2-pass.
Maybe 3-pass is more accurate? (don't have the time to test it, sadly)

So far, in my (quick) tests on the current project, AQ-Mode 2 turned out as the 'winner'.
It's HQ footage with slight (but noticeable) noise. AQ-Mode 2 preserved details better in the chroma channels and resulted in quite a bit less distortion.
Keep in mind though that this probably varies with the type of footage... With footage that has a lot of grain, distortion might be a more acceptable type of artifact and result in a perceptually better image. (I'll test that probably another time)

I've compared them on two quite different sources, the first Harry Potter movie (lots of noise) and then the Solo movie (quite clean). Aq-mode 1 kept the flat backgrounds better looking, mode 2 oversmoothed them even if I raised aq-strength to get the bitrate closer to what mode 1 gets. Didn't try lowering strength yet to see what happens. Nevertheless, the background looked ugly at default strength and mode 2 in both cases. Also in both cases the difference in bitrate was huge, mode 1 much higher than mode 2.

Boulder

13th November 2018, 13:43

Out of interest, would it be possible to run a 2-pass CRF encode so that the first pass uses a manually tonemapped version of the video and then the actual encode uses the real one? This would be really interesting to test at least :)

benwaggoner

13th November 2018, 17:14

I've compared them on two quite different sources, the first Harry Potter movie (lots of noise) and then the Solo movie (quite clean). Aq-mode 1 kept the flat backgrounds better looking, mode 2 oversmoothed them even if I raised aq-strength to get the bitrate closer to what mode 1 gets. Didn't try lowering strength yet to see what happens. Nevertheless, the background looked ugly at default strength and mode 2 in both cases. Also in both cases the difference in bitrate was huge, mode 1 much higher than mode 2.
Yeah, that's what I mean about VBR comparisons: "looked worse at a much lower bitrate" isn't that informative :). It could be that aq-mode 2 with CRF lowered by 2 would be both better and smaller.

Boulder

13th November 2018, 17:48

Yeah, that's what I mean about VBR comparisons: "looked worse at a much lower bitrate" isn't that informative :). It could be that aq-mode 2 with CRF lowered by 2 would be both better and smaller.

Well, going down on CRF so that the bitrates are close to each other still shows the ugly banding-like artifacts of mode 2 :)

I think I also need to compare aq-modes 1 and 3 with SDR sources in a similar way.
EDIT: tested, and aq-mode 3 looks better at least in the video I checked. So it would seem to be the go-to setting for SDR, but the jury's very much out on aq-mode 2 and HDR.

foxyshadis

17th November 2018, 22:37

What does the "auto variance" actually mean? I've always thought that aq-mode 1 is already some kind of automatically varying mode based on qg-size.

Mode 1 is based entirely on the energy of each individual block. Mode 2 is based on the energy of the block vs the energy of the frame as a whole. Mode 3 is mode 2 with dark-bias, of course.
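A toy sketch of that difference, for illustration only. This is not x265's actual formula: the log2 scaling, the `fixed_reference` constant and the sample energies are all assumptions chosen to show the idea, and mode 3's dark bias is left out.

```python
import math

def aq_offsets(block_energies, strength=1.0, mode=1, fixed_reference=10.0):
    """Toy per-block QP offsets; a negative offset means more bits for that block."""
    log_energy = [math.log2(max(e, 1.0)) for e in block_energies]
    if mode == 1:
        # Each block is judged on its own against a fixed constant (hypothetical value).
        reference = fixed_reference
    else:
        # Each block is judged against the average of the frame it belongs to.
        reference = sum(log_energy) / len(log_energy)
    return [strength * (le - reference) for le in log_energy]

# Made-up block energies for a frame that is low-energy everywhere.
flat_frame = [4.0, 8.0, 16.0, 8.0]
print(aq_offsets(flat_frame, mode=1))
print(aq_offsets(flat_frame, mode=2))
```

In this toy model the block-only variant pushes every block of a flat frame the same way, while the frame-relative variant produces offsets that average out to zero across the frame. Whether that matches what a given x265 build does on real footage is something to check by testing.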

hdr-opt basically steals bits from the dark to the light, the inverse of mode 3, since that's half the point of HDR. It also does something to the chroma, but I can't make heads or tails of that.

K.i.N.G

2nd December 2018, 23:02

Well, going down on CRF so that the bitrates are close to each other still shows the ugly banding-like artifacts of mode 2 :)

I think I also need to compare aq-modes 1 and 3 with SDR sources in a similar way.
EDIT: tested, and aq-mode 3 looks better at least in the video I checked. So it would seem to be the go-to setting for SDR, but the jury's very much out on aq-mode 2 and HDR.

It may also be useful to mention this:
I had an HDR video which, when viewed on a non-HDR monitor through MadVR's tonemapper, looked fine, but when viewed on an actual HDR monitor or TV there was obvious banding visible (in the lighter parts).

Which makes sense, since an HDR -> SDR tonemapper is essentially a soft clipping curve that compresses the highlights + some color space conversions.
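A rough numeric illustration of why a soft clipping curve can hide highlight banding. The curve below is a generic Reinhard-style x / (1 + x), used purely as an assumption; it is not what MadVR actually implements, and the levels are arbitrary relative values rather than real PQ code values.

```python
def soft_clip(x):
    """Toy highlight-compressing curve (Reinhard-style): maps [0, inf) into [0, 1)."""
    return x / (1.0 + x)

def step_after_tonemap(level, step):
    """Size of a brightness step of `step` at `level` after the soft clip."""
    return soft_clip(level + step) - soft_clip(level)

# The same input step, placed in shadows, midtones and highlights.
step = 0.05
for level in (0.1, 1.0, 4.0):
    print(level, round(step_after_tonemap(level, step), 4))
```

The same step gets smaller and smaller after tonemapping the brighter it starts out, so a band edge in the highlights can drop below what is visible on the SDR view while staying obvious on an HDR display.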