Encode RPU metadata in resized 444 encode


MonoS
11th June 2021, 22:22
Hi, I want to encode a 4K BD to a 1080p 444 HDR mkv and i have a couple of questions.

When enabling HDR10-Opt in x265 i get an error message because the input video should be 4:2:0, but mine is 4:4:4. Is it safe to comment out this line (https://bitbucket.org/multicoreware/x265_git/src/82786fccce10379be439243b6a776dc2f5918cb4/source/encoder/encoder.cpp#lines-4347)?

According to MakeMKV this BD in particular should be BL+MEL+RPU. I've demuxed the RPU stream with dovi_tool, getting a 40 MB file.
Then, when enabling RPU data in x265 by setting the Dolby-Vision version to 5.0 and the RPU file path, i get, again, an error because the colorspace is not 4:2:0. Again, is it safe to delete this line (https://bitbucket.org/multicoreware/x265_git/src/bf91444e034831141e0ce02b1200e51996f8b6c6/source/common/param.cpp?at=master#lines-1841) to skip the error?

Is it better to dither to 10bit in my script (i'm using the fmtc Ostromoukhov Error Diffusion algo) or to send the full 16bit clip to x265/ffmpeg and let it do its magic?

Thanks for the attention.

rwill
12th June 2021, 07:37
Given that there are probably no hardware players supporting Dolby Vision in a 444 HEVC Profile I am interested in which software player you are going to use to play back your file in Dolby Vision mode.

MonoS
12th June 2021, 09:11
Given that there are probably no hardware players supporting Dolby Vision in a 444 HEVC Profile I am interested in which software player you are going to use to play back your file in Dolby Vision mode.

I have no hardware player, my interest is to not throw away this metadata, maybe someday a FOSS player will support decoding of DV metadata and properly read my file.

benwaggoner
14th June 2021, 02:48
Scaling is a low-pass filter, so the metadata may well not be perfectly accurate in your 1080p in any case.

Archiving in scaled-down 1080p 444 is an unusual choice. If you're trying to save bits, 420 is more efficient in quality-for-bit. If you want to preserve quality, don't scale.

And the source is almost certainly in 10-bit, so why convert to 8-bit? 10-bit 420 is much more compatible and higher quality than 8-bit 444.

RanmaCanada
14th June 2021, 04:42
I have no hardware player, my interest is to not throw away this metadata, maybe someday a FOSS player will support decoding of DV metadata and properly read my file.

Kodi can already decode DolbyVision, but the problem is that the device needs to have a Dolby license to play it back. There are also hardware/profile limitations as well in that license. The odds of a box being able to play a UHD profile is going to be pretty small.

MonoS
14th June 2021, 09:49
Scaling is a low-pass filter, so the metadata may well not be perfectly accurate in your 1080p in any case.

Archiving in scaled-down 1080p 444 is an unusual choice. If you're trying to save bits, 420 is more efficient in quality-for-bit. If you want to preserve quality, don't scale.

And the source is almost certainly in 10-bit, so why convert to 8-bit? 10-bit 420 is much more compatible and higher quality than 8-bit 444.

I never said i'm converting to 8bit, my intermediate processing is in 16bit (in this case denoising and scaling), and the end of the chain i dither to 10bit which is what x265 will output to, i wanted to know if x265 will make use of the additional 6bit or dither it to the output bitdepth before any processing, i wasn't able to find any information in regard.

Regarding the accuracy of the metadata, it all depends on how the information is saved in it. If it's stored as relative positions (with a 0-1 range, for example) or at the frame level, then it will still be valid. If it's at the macroblock level, or something similar, then it will be complete garbage in a scaled-down video; in that case the DoVi metadata won't make any sense and will instead be detrimental to the quality of the encode. For example, i've encountered HDR10+ video (MediaInfo reports it as SMPTE ST 2094 App4 HDR10+) and the metadata seemed to be at the frame level.

Regarding the 444/420 and scale/don't scale choice, i like it that way. I want a middle ground between quality/space/speed: i don't see any reason to throw away chroma information as it compresses easily, but i don't see enough detail in the whole frame to necessitate the full 4k treatment.

MonoS
14th June 2021, 09:50
Kodi can already decode DolbyVision, but the problem is that the device needs to have a Dolby license to play it back. There are also hardware/profile limitations as well in that license. The odds of a box being able to play a UHD profile is going to be pretty small.

Well, if that is, and will be, the case there is no sense in saving the DoVi layer. I thought it could be decoded in software, my bad.

Boulder
14th June 2021, 17:21
I never said i'm converting to 8bit, my intermediate processing is in 16bit (in this case denoising and scaling), and the end of the chain i dither to 10bit which is what x265 will output to, i wanted to know if x265 will make use of the additional 6bit or dither it to the output bitdepth before any processing, i wasn't able to find any information in regard.

If you process in 16 bits, it's best to feed that to x265 and use the --dither parameter. There is at least some 16bit processing going on there.

benwaggoner
14th June 2021, 18:45
I never said i'm converting to 8bit, my intermediate processing is in 16bit (in this case denoising and scaling), and the end of the chain i dither to 10bit which is what x265 will output to, i wanted to know if x265 will make use of the additional 6bit or dither it to the output bitdepth before any processing, i wasn't able to find any information in regard.
If you use the --dither command I think it will dither the input to 10-bit. It's a pretty stock dithering algorithm, though.

Regarding the accuracy of the metadata, it all depends on how the information is saved in it. If it's stored as relative positions (with a 0-1 range, for example) or at the frame level, then it will still be valid. If it's at the macroblock level, or something similar, then it will be complete garbage in a scaled-down video; in that case the DoVi metadata won't make any sense and will instead be detrimental to the quality of the encode. For example, i've encountered HDR10+ video (MediaInfo reports it as SMPTE ST 2094 App4 HDR10+) and the metadata seemed to be at the frame level.
It's hard to predict how applicable it will be. I don't know of any decoders that can do 444 with Dolby Vision, so not sure how you could test this config.

Regarding the 444/420 and scale/don't scale choice, i like it that way. I want a middle ground between quality/space/speed: i don't see any reason to throw away chroma information as it compresses easily, but i don't see enough detail in the whole frame to necessitate the full 4k treatment.
Chroma compresses pretty easily, but not trivially. For any bitrate-constrained encode, it's nearly always more efficient to encode as 420 and spend the saved bits elsewhere: on a higher bitrate if the encode isn't yet transparent, or on a higher resolution if it already is. À la 1440p 420.
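
To put rough numbers on the 1080p 444 versus 1440p 420 comparison, here is a small Python sketch that counts raw samples per frame. It assumes plain 16:9 frame sizes (1920x1080, 2560x1440, 3840x2160), and the function name is only illustrative. Raw sample counts say nothing about how well each layout actually compresses, so treat this as a back-of-the-envelope check rather than an efficiency measurement.

```python
def samples_per_frame(width, height, chroma):
    """Count luma plus chroma samples in one frame for a given chroma format."""
    luma = width * height
    # Combined Cb + Cr sample count relative to luma for each format.
    chroma_factor = {"420": 0.5, "422": 1.0, "444": 2.0}[chroma]
    return int(luma * (1 + chroma_factor))


s_1080_444 = samples_per_frame(1920, 1080, "444")
s_1440_420 = samples_per_frame(2560, 1440, "420")
s_2160_420 = samples_per_frame(3840, 2160, "420")

print(s_1080_444)  # 6220800
print(s_1440_420)  # 5529600
print(s_2160_420)  # 12441600
```

Under these assumptions, 1440p 420 carries about 11% fewer raw samples per frame than 1080p 444 while having roughly 1.78 times as many luma samples.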

MonoS
14th June 2021, 18:57
If you process in 16 bits, it's best to feed that to x265 and use the --dither parameter. There is at least some 16bit processing going on there.

I've checked the documentation (https://x265-git.readthedocs.io/en/master/cli.html#input-output-file-options) for this option
These options all describe the input video sequence or, in the case of --dither, operations that are performed on the sequence prior to encode.
I remember this being worded differently in the past, but as i read now it's inconsequential if i dither before passing the video to the encoder.

MonoS
14th June 2021, 19:01
If you use the --dither command I think it will dither the input to 10-bit. It's a pretty stock dithering algorithm, though.

As i said in my first post i'm using the Ostromoukhov Error Diffusion algo from the fmtc package, so i guess it's technically better than the one used in x265?


It's hard to predict how applicable it will be. I don't know of any decoders that can do 444 with Dolby Vision, so not sure how you could test this config.

I do understand :)


Chroma compresses pretty easily, but not trivially. For any bitrate-constrained encode, it's nearly always more efficient to encode as 420 and spend the saved bits elsewhere: on a higher bitrate if the encode isn't yet transparent, or on a higher resolution if it already is. À la 1440p 420.

Currently i'm trying to achieve a transparent encode, if you're interested i can share my current command line but i'm scared of being blasted :D

benwaggoner
14th June 2021, 20:27
I've checked the documentation (https://x265-git.readthedocs.io/en/master/cli.html#input-output-file-options) for this option

I remember this being worded differently in the past, but as i read now it's inconsequential if i dither before passing the video to the encoder.
Right. If the input is the same bit depth as the encode, there will be no dithering. If the input is a higher bit depth than the encode, x265 will apply a very basic dither by default. Using the --dither parameter switches to a better, somewhat slower algorithm. IIRC, a standard Floyd-Steinberg implementation. Your external ditherer is presumably superior.
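
For readers who want to see what the difference between plain bit-depth reduction and error diffusion looks like, here is a minimal Python sketch. It is a textbook Floyd-Steinberg ditherer written for illustration, not x265's or fmtc's actual code; the function names and the flat 16x16 test patch are assumptions made up for the example.

```python
def truncate_16_to_10(img):
    """Drop the 6 least significant bits with no dithering."""
    return [[px >> 6 for px in row] for row in img]


def floyd_steinberg_16_to_10(img):
    """Reduce 16-bit samples to 10-bit with Floyd-Steinberg error diffusion."""
    h, w = len(img), len(img[0])
    # Working copy that accumulates the error diffused from earlier pixels.
    work = [[float(px) for px in row] for row in img]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = work[y][x]
            # Nearest 10-bit level, clamped to the valid range.
            q = min(1023, max(0, int(old / 64.0 + 0.5)))
            out[y][x] = q
            err = old - q * 64.0  # quantization error in 16-bit units
            # Classic 7/16, 3/16, 5/16, 1/16 weights; error that would
            # fall outside the image is simply dropped.
            if x + 1 < w:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


def mean(img):
    return sum(sum(row) for row in img) / (len(img) * len(img[0]))


# Flat patch sitting exactly halfway between 10-bit levels 512 and 513.
patch = [[32800] * 16 for _ in range(16)]
truncated = truncate_16_to_10(patch)
dithered = floyd_steinberg_16_to_10(patch)

print(mean(truncated))  # 512.0
print(mean(dithered))   # close to 512.5
```

Truncation collapses the whole patch to a single level, while the dithered version mixes 512 and 513 so that the local average stays near the original value. Whether that pattern survives the encoder is a separate question.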

Boulder
15th June 2021, 05:06
Yet, there is some calculation being done in 16 bits inside x265 so that may be affected. Some time ago, I did dig up the post where it was mentioned but didn't put the link anywhere to return to later. EDIT: found it, https://forum.doom9.org/showthread.php?p=1847537#post1847537.

Also, technically superior may not mean that it survives the encoding process better. That's why I trust the internal dithering method the most.

benwaggoner
15th June 2021, 17:58
Yet, there is some calculation being done in 16 bits inside x265 so that may be affected. Some time ago, I did dig up the post where it was mentioned but didn't put the link anywhere to return to later. EDIT: found it, https://forum.doom9.org/showthread.php?p=1847537#post1847537.
In 10-bit encoding, it's actually 32-bit internally IIRC. This is also signed floating point in the frequency domain, so it's pretty apples-to-kumquats when compared to source code values.

Also, technically superior may not mean that it survives the encoding process better. That's why I trust the internal dithering method the most.
The internal dithering isn't encoding-aware in any way, and would give the same results as if the same dithering algorithms were used externally. It's just pure preprocessing that happens outside of the actual codec itself.

You can compare that to the internal noise reduction (--nr-inter and --nr-intra), which operate like adaptive deadzones, and thus are in-loop and provide more compression friendly noise reduction than a pure preprocessor.

Creative intent noise reduction can still be superior of course, as one is removing the noise one wants to remove and leaving the noise one wants to retain.

MonoS
27th June 2021, 09:46
In 10-bit encoding, it's actually 32-bit internally IIRC. This is also signed floating point in the frequency domain, so it's pretty apples-to-kumquats when compared to source code values.

So should it be preferable to send to x265 the full 16bit clip instead of using --dither option or dithering beforehand?

benwaggoner
28th June 2021, 16:25
So should it be preferable to send to x265 the full 16bit clip instead of using --dither option or dithering beforehand?
If x265 is inputting more depth than it is outputting, you want to use --dither. That improves the quality of any bit depth reduction, and is ignored if input and output depth are identical.

Using an external ditherer and sending 10-bit to x265 makes sense when the external ditherer does a better job. Which is not a high bar, as the x265 --dither is still a pretty standard dithering algorithm, just somewhat better and a bit slower than the default one that gets used when reducing bit depth without specifying --dither.

x265 should really just make --dither on by default. The speed difference might have mattered a little in 2015, but we have so many cores now that I don't imagine that it materially impacts speed at anything above --preset faster.

MonoS
29th June 2021, 13:56
Using an external ditherer and sending 10-bit to x265 makes sense when the external ditherer does a better job. Which is not a high bar, as the x265 --dither is still a pretty standard dithering algorithm, just somewhat better and a bit slower than the default one that gets used when reducing bit depth without specifying --dither.

Ok, thank you for the explanation :) .

Any opinion regarding forcing the HDR10-Opt flag even when doing a 4:4:4 encode?

quietvoid
29th June 2021, 14:10
Edit: whoops I misread for dhdr10.

I think hdr10-opt still allocates more bits to highlights to preserve the luminance.

benwaggoner
29th June 2021, 17:32
Ok, thank you for the explanation :) .

Any opinion regarding forcing the HDR10-Opt flag even when doing a 4:4:4 encode?
It may be a good idea, but you'll need to test. Hdr10-opt is a psychovisual optimization to change the luma versus chroma QP ratios at different points on the luma scale, IIRC. The docs explicitly say 4:2:0.

Chroma QPs are automatically increased in 4:4:4, since more chroma pixels means each can be encoded with less precision. If hdr10-opt knows how to combine with those adjustments it could work great.

But HDR-10 4:4:4 is a pretty rare edge case, and wouldn't have got nearly as much testing and tuning as HDR-10 4:2:0. If it does help, it would be more luck than design.

Relevant parts from x265.readthedocs.io:

--hdr10-opt, --no-hdr10-opt
Enable block-level luma and chroma QP optimization for HDR10 content as suggested in ITU-T H-series Recommendations - Supplement 15. Source video should have HDR10 characteristics such as 10-bit depth 4:2:0 with Bt.2020 color primaries and SMPTE ST.2084 transfer characteristics. It is recommended that AQ-mode be enabled along with this feature. Default disabled.

Psychovisual Options (https://x265.readthedocs.io/en/master/cli.html#psycho-visual-options)
In 444, chroma gets twice as much resolution, so halve the quality when psy-rd is enabled. So, when psy-rd is enabled for 444 videos, cbQpOffset and crQpOffset are set to value 6, if they are not explicitly set.
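
The link between a +6 chroma QP offset and "halve the quality" comes from how H.264/HEVC-style quantizers scale: the quantization step size roughly doubles for every 6 QP. A minimal Python sketch of that relationship follows, using the commonly quoted approximation Qstep ≈ 2^((QP - 4) / 6). The function name is illustrative and this is not x265 code; the real chroma QP derivation has extra clamping and mapping steps that are ignored here.

```python
def approx_qstep(qp):
    """Approximate quantizer step size for a given QP (doubles every 6 QP)."""
    return 2.0 ** ((qp - 4) / 6.0)


luma_qp = 22  # arbitrary example value
for offset in (0, 6, -3):
    ratio = approx_qstep(luma_qp + offset) / approx_qstep(luma_qp)
    print(offset, round(ratio, 3))

ratio_plus6 = approx_qstep(luma_qp + 6) / approx_qstep(luma_qp)
ratio_minus3 = approx_qstep(luma_qp - 3) / approx_qstep(luma_qp)
```

With this approximation an offset of +6 gives a chroma step about twice the luma step, while an offset of -3 gives a step of roughly 0.71 times the luma step.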

MonoS
29th June 2021, 21:51
It may be a good idea, but you'll need to test. Hdr10-opt is a psychovisual optimization to change the luma versus chroma QP ratios at different points on the luma scale, IIRC. The docs explicitly say 4:2:0.

Chroma QPs are automatically increased in 4:4:4, since more chroma pixels means each can be encoded with less precision. If hdr10-opt knows how to combine with those adjustments it could work great.

But HDR-10 4:4:4 is a pretty rare edge case, and wouldn't have got nearly as much testing and tuning as HDR-10 4:2:0. If it does help, it would be more luck than design.

Relevant parts from x265.readthedocs.io:



Psychovisual Options (https://x265.readthedocs.io/en/master/cli.html#psycho-visual-options)

Yeah, i've also guessed it will probably work more by luck. If those QP optimizations are suggested in the spec then i guess MCW will never test something like that, as it's more of a "fansubber encode" sort of mode.

Not having an HDR display i can't test whether this option is truly beneficial either. I'm activating it anyway, recompiling x265 as specified in my first post and hoping it really helps. My guess was that if it helped a 4:2:0 source, why shouldn't it help a 4:4:4 one? It's the same information, only bigger.

Also i'm lowering the QP offset for chroma planes to -3/-3 (IIRC 4:4:4 raises it to +6/+6)

benwaggoner
30th June 2021, 01:26
Yeah, i've also guessed it will probably work more by luck. If those QP optimizations are suggested in the spec then i guess MCW will never test something like that, as it's more of a "fansubber encode" sort of mode.

Not having an HDR display i can't test whether this option is truly beneficial either. I'm activating it anyway, recompiling x265 as specified in my first post and hoping it really helps. My guess was that if it helped a 4:2:0 source, why shouldn't it help a 4:4:4 one? It's the same information, only bigger.

Also i'm lowering the QP offset for chroma planes to -3/-3 (IIRC 4:4:4 raises it to +6/+6)
If you don't have a HDR display, you can't know how your parameters impact quality.

Since you can't test your results, you should really limit yourself to standard, well-tested, and well-understood configurations. Otherwise you could find yourself getting subtle artifacts you won't know about until you get a proper display, or wind up wasting huge amounts of time or bits on parameters that don't make any actual impact on quality.

You are tweaking knobs here that would require quite a bit of visual evaluation to validate aren't regressions, let alone improvements.

MonoS
2nd July 2021, 18:41
If you don't have a HDR display, you can't know how your parameters impact quality.

Since you can't test your results, you should really limit yourself to standard, well-tested, and well-understood configurations. Otherwise you could find yourself getting subtle artifacts you won't know about until you get a proper display, or wind up wasting huge amounts of time or bits on parameters that don't make any actual impact on quality.

You are tweaking knobs here that would require quite a bit of visual evaluation to validate aren't regressions, let alone improvements.

Except for the hdr10-opt option, i've tested my command line with a couple of sources (anime, grainy, film) without HDR and then compared with the source frame by frame, so up to the HDR parameters i HOPE to have made the correct choices. Regarding hdr10-opt, i hope it's correct, as i've spent many CPU-weeks encoding with those parameters :D

benwaggoner
2nd July 2021, 22:29
Except for the hdr10-opt option, i've tested my command line with a couple of sources (anime, grainy, film) without HDR and then compared with the source frame by frame, so up to the HDR parameters i HOPE to have made the correct choices. Regarding hdr10-opt, i hope it's correct, as i've spent many CPU-weeks encoding with those parameters :D
You won't know if it encoded well without testing on a good HDR display. Just putting .mp4 files on a thumb drive and sticking it into a smart TV works (that's how I had to test final HDR encodes for several years after we launched HDR in 2015).

HDR encoding is fundamentally different in some important ways, and settings that work well for SDR can be less optimal for HDR. --aq-mode 3, for example, is only ever useful for SDR. In HDR compression errors of a given range of code values is equally problematic across the luma curve, while in SDR the same level of code value distortion is worst at black and gets less visible the brighter the image. Clean encoding of sharp edges is more important in HDR because the local contrast can be so much higher. Ringing artifacts in white can do weird things in HDR, while being almost invisible in SDR near-white.

That said, you are more likely to be wasting bits by using a lower than needed CRF than getting poor quality. Although if you are testing on an 8-bit SDR display then you may see banding and posterization that wouldn't exist in 10-bit HDR, and miss some subtle defects that aren't visible in 8-bit due to truncation.
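
One rough way to look at the code-value point is to compare how much relative luminance changes for a one-code-value error at different points of the curve. The Python sketch below does that for the SMPTE ST 2084 (PQ) EOTF and for a simple 2.4 power-law SDR display model. The power-law model, the 100 cd/m^2 SDR peak, the full-range 10-bit normalization (code / 1023) and the helper names are all simplifying assumptions for illustration, and relative luminance change is only a crude stand-in for actual visibility.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_eotf(n):
    """Map a normalized PQ signal n in [0, 1] to luminance in cd/m^2."""
    p = n ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def gamma_eotf(n, peak=100.0, gamma=2.4):
    """Simple power-law SDR display model (an assumption, not exact BT.1886)."""
    return peak * n ** gamma


def relative_step(eotf, code, levels=1023):
    """Relative luminance change caused by a +1 code value error."""
    lo = eotf(code / levels)
    hi = eotf((code + 1) / levels)
    return (hi - lo) / lo


codes = (100, 300, 500, 700, 900)
pq_steps = [relative_step(pq_eotf, c) for c in codes]
sdr_steps = [relative_step(gamma_eotf, c) for c in codes]

for c, pq, sdr in zip(codes, pq_steps, sdr_steps):
    print(c, f"PQ {pq * 100:.2f}%", f"gamma {sdr * 100:.2f}%")

# How unevenly a one-code error is spread across the range (max / min).
pq_spread = max(pq_steps) / min(pq_steps)
sdr_spread = max(sdr_steps) / min(sdr_steps)
print(pq_spread, sdr_spread)
```

Under these assumptions the relative step varies noticeably less across the PQ range than across the power-law range, which is at least consistent with the idea that a given code-value error matters more evenly across the curve in HDR than in SDR.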

MonoS
3rd July 2021, 11:42
You won't know if it encoded well without testing on a good HDR display. Just putting .mp4 files on a thumb drive and sticking it into a smart TV works (that's how I had to test final HDR encodes for several years after we launched HDR in 2015).

Sadly right now i don't have any HDR display at my disposal, and i'm not willing to spend the money right now to buy a proper one.


HDR encoding is fundamentally different in some important ways, and settings that work well for SDR can be less optimal for HDR. --aq-mode 3, for example, is only ever useful for SDR. In HDR compression errors of a given range of code values is equally problematic across the luma curve, while in SDR the same level of code value distortion is worst at black and gets less visible the brighter the image. Clean encoding of sharp edges is more important in HDR because the local contrast can be so much higher. Ringing artifacts in white can do weird things in HDR, while being almost invisible in SDR near-white.

THANK YOU VERY MUCH for the info about --aq-mode 3. I hope that using a low crf (right now i'm using 16) has counteracted the use of that option in my HDR encode; i'll remove it from future ones, thank you again :).
Regarding edges and such, i've put special care into looking for any artifacts in those regions, having encoded a couple of anime sources which show very nasty artifacts in those areas with standard settings.


That said, you are more likely to be wasting bits by using a lower than needed CRF than getting poor quality. Although if you are testing on an 8-bit SDR display then you may see banding and posterization that wouldn't exist in 10-bit HDR, and miss some subtle defects that aren't visible in 8-bit due to truncation.
I guess i'm wasting bits more than wasting quality right now. Using madVR and going back and forth between source and encode, i saw that the encode was transparent at a reasonable bitrate; for example, the one i'm making right now is around 10Mbps (1920*1038 4:4:4 24000/1001fps).
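
For scale, the figures above can be turned into bits per pixel with a few lines of Python. This is only arithmetic on the numbers quoted in the post, assuming "10Mbps" means 10,000,000 bits per second of video; the variable names are illustrative.

```python
from fractions import Fraction

width, height = 1920, 1038
fps = Fraction(24000, 1001)
bitrate = 10_000_000  # bits per second, assumed meaning of "10Mbps"

pixels_per_second = width * height * fps
bits_per_pixel = float(bitrate / pixels_per_second)
# 4:4:4 stores three full-resolution samples (Y, Cb, Cr) per pixel.
bits_per_sample = bits_per_pixel / 3

print(round(bits_per_pixel, 3))   # 0.209
print(round(bits_per_sample, 3))  # 0.07
```

So the encode spends roughly 0.21 bits per pixel, or about 0.07 bits per stored sample once the two full-resolution chroma planes are counted.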

benwaggoner
4th July 2021, 02:02
Sadly right now i don't have any HDR display at my disposal, and i'm not willing to spend the money right now to buy a proper one.
You probably should just wait until you get one before continuing with this project. You've been wasting a lot of watts and time on suboptimal encodes.


THANK YOU VERY MUCH for the info about --aq-mode 3. I hope that using a low crf (right now i'm using 16) has counteracted the use of that option in my HDR encode; i'll remove it from future ones, thank you again :).
It shouldn't hurt the quality, but your file sizes could easily be 30% higher than needed.

Regarding edges and such, i've put special care into looking for any artifacts in those regions, having encoded a couple of anime sources which show very nasty artifacts in those areas with standard settings.
You have HDR anime? What titles?

I guess i'm wasting bits more than wasting quality right now. Using madVR and going back and forth between source and encode, i saw that the encode was transparent at a reasonable bitrate; for example, the one i'm making right now is around 10Mbps (1920*1038 4:4:4 24000/1001fps).
You are only seeing that it is transparent in SDR. If you want to make 1080p, why not just convert to SDR so you can see what you're doing and actually watch the backups now? Among other things, the quality difference between 4K and 1080p is much more apparent in HDR than SDR, as there can be way more local contrast, and you can get sparkling specular highlights and stars. You won't be able to see those details on a SDR display, and so won't know if your converting to 1080p is losing those essential details. Which, for plenty of 4K titles, you are losing those details.

You'd be much better off encoding to 1440p 4:2:0 if you're trying to keep to ~10 Mbps.

Really, you literally cannot see what you are doing in your current config, and the odds that you're making good HDR backups as configured and tested is quite low.

MonoS
26th July 2021, 21:14
You probably should just wait until you get one before continuing with this project. You've been wasting a lot of watts and time on suboptimal encodes.

Ok i'll do, a friend should have a proper HDR screen, maybe i'll ask them.


It shouldn't hurt the quality, but your file sizes could easily be 30% higher than needed.

Well, that's a relief.


You have HDR anime? What titles?

Well, anime and animation, Dragon Trainer and the latest movies from Shinkai god released in 4k HDR.


You are only seeing that it is transparent in SDR. If you want to make 1080p, why not just convert to SDR so you can see what you're doing and actually watch the backups now? Among other things, the quality difference between 4K and 1080p is much more apparent in HDR than SDR, as there can be way more local contrast, and you can get sparkling specular highlights and stars. You won't be able to see those details on a SDR display, and so won't know if your converting to 1080p is losing those essential details. Which, for plenty of 4K titles, you are losing those details.

You'd be much better off encoding to 1440p 4:2:0 if you're trying to keep to ~10 Mbps.

Really, you literally cannot see what you are doing in your current config, and the odds that you're making good HDR backups as configured and tested is quite low.
Well, i guess the player (madVR in that case) is doing the conversion to SDR during playback.
Regarding the loss of detail, this should be evaluated case by case, but i see your point, i'll be more careful in that regard.

benwaggoner
26th July 2021, 23:51
Well, i guess the player (madVR in that case) is doing the conversion to SDR during playback.
Regarding the loss of detail, this should be evaluated case by case, but i see your point, i'll be more careful in that regard.
And with madVR tone mapping from HDR back to SDR, you can't verify if any detail loss is from the tone mapping, or if there are issues in HDR that get masked by the tone mapping. madVR will roll off highlights to make it SDR again, for example, and so if there are issues in your highlights you wouldn't see them.

You can get a small decent HDR TV or monitor for <$500 these days, and it's hard to buy anything NOT HDR >42" now.

Balling
16th September 2021, 09:59
RPU is only supported with 4:2:0.

RGB mezzanine uses XML RPU.