How to analyze an HDR video for peak brightness level for setting metadata?

TomArrow

3rd December 2019, 11:08

I wanna do an encode of an HDR video I made. I'd like to include the global peak brightness metric in the metadata, and maybe the average brightness too. How can I find the brightest pixel in an HDR video (PQ curve) and then calculate the brightness in nits from its value? And how could I go about calculating the average brightness?

Blue_MiSfit

4th December 2019, 02:28

How did you master your HDR video?

Usually mastering tools output this info or can report on it.

TomArrow

4th December 2019, 12:57

Well, I had a composition in After Effects in 32-bit floating point linear color space that had some clipped highlights. It basically came from a source with a high dynamic range already; I just adjusted exposure to look good in SDR and then decided to have the superbright details be HDR instead of tonemapping (curves or such) or clipping them. Then I exported into the Rec.2100 PQ color space. I checked the result and the highlights weren't clipped, so it worked just fine. But of course that didn't tell me what metadata I need.

Blue_MiSfit

5th December 2019, 09:07

I'm scratching my head a bit as to why you mastered content this way, but we can sidestep that.

I'll assume you want to encode your content into HDR10 using static metadata.

The key thing to remember here - the HDR10 metadata consists of a few things:

1 - Signaling the transfer function, color-difference matrix (matrix coefficients), and primaries in the HEVC VUI (you'll likely use SMPTE ST 2084, BT.2020 nc, and BT.2020 respectively, since you say you used Rec.2100 / PQ to export from After Effects)

2 - Signaling the mastering display characteristics, both in terms of the CIE 1931 xy chromaticity coordinates of the primaries / white point, and the min / max luminance of the display. This is also known as SMPTE ST 2086.

Note that there's nothing here about the max light level of the content. There is an additional piece of metadata you can signal called Maximum Content Light Level (MaxCLL), which indicates the "hottest" pixel in the whole sequence, and Maximum Frame-Average Light Level (MaxFALL), which indicates the highest average light level of any single frame. However, these are optional fields and they do not need to be signaled. In fact, they have little to no effect in most cases.

So, the only real question is the ST 2086 metadata.

What type of HDR display did you master the content on? It seems like maybe you didn't master on an HDR display?

benwaggoner

9th December 2019, 23:12

Colorfront Transkoder can do this very trivially.

TomArrow

27th December 2019, 09:26

Thanks, but that looks like an expensive software suite; I just need a trivial analysis.

TomArrow

27th December 2019, 09:31

Well, I did it that way because that's the only way I know how to do it. :) And it's a nice comfortable and logical workflow.

Hmm, that's interesting, I thought MaxCLL and MaxFALL were the required ones. Honestly, I'd prefer just using those. I'm mastering on an SDR display. Entering that SDR display's data will hardly lead to anything reasonable, I think; I do want to display the superbright values, not have them cut off.

I'll have to do some digging to understand this mastering display thing then I think.

Edit: Is there something like "ideal" mastering display data that I can enter? I realize it might compromise the quality a little bit, but the only other option is for me to simply "steal" the data of some other display, which seems pointless.

For max luminance of the display I figure I could just use my maximum light level (MaxCLL), but what about the primaries? What do they do?

Blue_MiSfit

28th December 2019, 02:23

I'm not sure what to enter when you master HDR content on an SDR display. That's generally not done since you don't really know what you're looking at.

You could just try using some arbitrary values - it's typical in Hollywood to master on a 1000 nit display using the P3 D65 color space. You could express this in x265 using this:

https://x265.readthedocs.io/en/default/cli.html#cmdoption-master-display

The example on that page is for a display with a minimum light level of 0.0001 nits; in practice most content is not mastered that low, and 0.002 is more common.
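The x265 page linked above describes the `--master-display` string: green, blue and red primaries plus white point as CIE 1931 xy coordinates in units of 0.00002, and max/min luminance in units of 0.0001 cd/m². The helper below is a hypothetical convenience function (not part of x265) that builds that string; the P3-D65 primaries and the 1000 / 0.0001 nit values are the commonly quoted example figures, not measurements of any particular display.

```python
def x265_master_display(g, b, r, wp, max_nits, min_nits):
    """Build a string for x265's --master-display option (SMPTE ST 2086).

    g, b, r, wp: CIE 1931 (x, y) chromaticity pairs for the green, blue and
    red primaries and the white point. x265 expects these in units of
    0.00002, and max/min luminance in units of 0.0001 cd/m^2.
    """
    def xy(pair):
        x, y = pair
        return f"({round(x / 0.00002)},{round(y / 0.00002)})"

    lum = f"L({round(max_nits / 0.0001)},{round(min_nits / 0.0001)})"
    return f"G{xy(g)}B{xy(b)}R{xy(r)}WP{xy(wp)}{lum}"


# P3-D65 primaries and white point (example values)
P3_D65 = dict(g=(0.265, 0.690), b=(0.150, 0.060), r=(0.680, 0.320), wp=(0.3127, 0.3290))

print(x265_master_display(**P3_D65, max_nits=1000, min_nits=0.0001))
# G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)
print(x265_master_display(**P3_D65, max_nits=1000, min_nits=0.002))
```

The first call should reproduce the P3-D65 1000-nit example from that page; the second swaps in the more common 0.002-nit black level, which ends in `L(10000000,20)`.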

This may or may not look good - especially if the content looks good on your display in SDR without any LUT applied to map it.

You could also just skip the --master-display parameter; the result would not be HDR10-compliant but may play correctly on some players.

TomArrow

29th December 2019, 04:45

Thanks, looks good! I think I'll try using the primaries of my own display (closer to sRGB) and the luminance values of that display you suggested, or something like that.

TomArrow

29th December 2019, 11:34

I ended up taking the code of the Average filter and committing horrible crimes to it:
https://github.com/TomArrow/MaxCLLFindAVS

Result is an AVS+ filter that kinda does what I needed, analyzing MaxFALL and MaxCLL of a HDR PQ clip. Read the README for usage, if anyone needs it.

The code is a horrifying mess, but I'm lazy so I can't say if and when I will clean it up. Feel free to do pull requests though.

Edit: The plugin is slow as hell at the moment. I figure it needs to be vectorized, optimized, quadrupolized and whatnot. ;)

Edit 2: Sped it up a little with some caching. Sometimes the text file doesn't get saved, not sure why.

kolak

29th December 2019, 14:06

Do you use any low-pass filtering?
I think some pro tools use gentle filtering so that single pixels don't dictate the end result (especially when those can be overshoots from compression).

There is also this:
https://github.com/HDRWCG/HDRStaticMetadata

TomArrow

29th December 2019, 18:09

No, I haven't. I'm not sure whether I agree with the reasoning, because that would also make it impossible to have individual bright pixels.

I also haven't done the weighting of the color channels that the repository you linked does. I naively assumed that I could simply average every single 16-bit value (that is not the alpha channel). I should probably read the specification on how to calculate this stuff. I'll clarify this in the README to avoid confusion.

Thanks for linking that repo. I'm happier with mine in the end because creating TIFFs eats up a lot of space that I don't necessarily have.

kolak

29th December 2019, 20:01

I'm not 100% sure if it's needed or not but I know some tools use it. It also depends what you use this value for.

I'm not sure about your formula, but there should be a proper, clearly defined formula from SMPTE etc. It should not be based on assumptions.

Look here:
https://spaces.hightail.com/space/nEaXy

there is ZIP with many documents about HDR.
Look in Study Group On High-Dynamic-Range-HDR-Ecosystem.pdf, page 43.

Looks like it's a bit more complex than you may think.

Creating TIFFs is a pain in the xx for sure.

TomArrow

29th December 2019, 21:33

Yeah it's fair enough. I suppose anyone can run a blur on the video before passing it into the calculation, heh. Though it's probably fair to point out that blurring in PQ space might give improper results.

Okay, so according to the Study Group on Page 43 I'm doing MaxCLL right. MaxFALL it does differently than I do. Not necessarily hard to implement it the way they do, but I don't quite understand their logic. They are taking the brightest channel of each pixel and averaging that value across the screen. Meanwhile I take the average of each channel of each pixel across the screen. But the value is supposed to be the average brightness of the frame. How does taking the brightest channel of each pixel and averaging that give me the average brightness of the image? That's just the average brightness of the brightest channels. I'm confused. :P
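A minimal sketch of the two MaxFALL variants being compared, assuming each frame has already been decoded to linear RGB in cd/m² (the PQ decode step is left out). `maxcll_maxfall` follows the max(R,G,B) approach described for page 43 of the Study Group document; `maxfall_channel_mean` is the per-channel average described as the original approach. Function names and the toy frames are hypothetical and not taken from the MaxCLLFindAVS plugin.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # linear R, G, B, already in cd/m^2


def maxcll_maxfall(frames: Sequence[Sequence[Pixel]]) -> Tuple[float, float]:
    """MaxCLL/MaxFALL using max(R,G,B) per pixel (frames must be non-empty)."""
    max_cll = 0.0
    max_fall = 0.0
    for frame in frames:
        # Brightest channel of each pixel, in cd/m^2
        per_pixel = [max(px) for px in frame]
        max_cll = max(max_cll, max(per_pixel))
        # Frame-average light level = mean of the per-pixel maxima
        max_fall = max(max_fall, sum(per_pixel) / len(per_pixel))
    return max_cll, max_fall


def maxfall_channel_mean(frames: Sequence[Sequence[Pixel]]) -> float:
    """Alternative variant: mean over every channel of every pixel."""
    return max(sum(sum(px) for px in frame) / (3 * len(frame)) for frame in frames)


frames: List[List[Pixel]] = [
    [(100.0, 50.0, 20.0), (0.0, 0.0, 0.0)],
    [(400.0, 300.0, 200.0), (10.0, 10.0, 10.0)],
]
print(maxcll_maxfall(frames))        # (400.0, 205.0)
print(maxfall_channel_mean(frames))  # 155.0
```

Because max(R,G,B) is never smaller than the channel mean, the max-based MaxFALL is always at least as high as the per-channel average, and the gap grows with saturation.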

Are you sure that document is from an official source? It just doesn't seem quite right.

However it should be trivial to implement that if it is indeed the way it's supposed to be. Just a small change really.

TomArrow

30th December 2019, 17:43

Okay, I've updated it to use the official MaxFALL algorithm, so it should give the proper values now.

It's possible there are still some bugs in it of course, as with any software, but judging purely by my logic, it should now give proper readings conforming to the standard, if the SMPTE recommendation is correct.

The old algorithm still exists as optional maxFallAlgorithm=1, but by default it will use the official algorithm.

I tried it on a PQ image I had at hand and the new algorithm gives slightly higher MaxFALL readings, which is to be expected. The difference might be greater or smaller depending on the source material's saturation.

Selur

1st January 2020, 18:57

@TomArrow: https://github.com/HDRWCG/HDRStaticMetadata might be interesting to compare your calculations,...

TomArrow

1st January 2020, 21:49

Could you do it with some test file(s) and then compare results with me using the same files? I don't know how to compile that code.

Selur

2nd January 2020, 21:02

Sorry, just stumbled over it and thought about this thread and that it might be interesting.

TomArrow

2nd January 2020, 21:39

Ah I see. kolak linked it above too. I looked at the code and my algorithm is identical to it from what I can tell, however it's always possible I made some mistake implementing the algorithm, so checking it against another implementation can never hurt. But as I said, I have no idea how to compile it. It only has Linux compiling instructions and needs several dependencies, I'd need a way to do it in Visual Studio (I'm a C++ noob).

benwaggoner

3rd January 2020, 21:21

Do you use any low pas filtering?
I think some pro tools use gentle filtering to avoid single pixels to be taken into account and dictate end result (specially when those can be overshoots from compression).

The metadata is derived from the uncompressed RGB source, not compressed output. So even if there were overshoots from compression, that shouldn't impact the metadata at all. Same deal when downscaling and chroma subsampling. Even though the scaling is a low-pass filter, the metadata is still supposed to be that of the full-rez uncompressed source.

benwaggoner

3rd January 2020, 21:29

Big picture, these values can be left at 0 for "undefined" and most HDR displays will do something pretty close to the right thing. It is a lot better to not specify a value at all than to specify a materially incorrect one.

There is a straightforward translation from Y' code values to nits. So as long as you have some way to measure the brightest pixel in Y', you could then convert that to nits for MaxCLL.
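For that code-value-to-nits translation, here is a minimal sketch of the SMPTE ST 2084 (PQ) EOTF. It assumes 10-bit limited-range codes by default (64 = 0.0, 940 = 1.0), which is the usual HDR10 delivery convention; pass `limited=False` for full-range data. The helper names are illustrative only.

```python
# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_eotf(e: float) -> float:
    """Normalized PQ signal (0..1) -> absolute luminance in cd/m^2 (nits)."""
    p = min(max(e, 0.0), 1.0) ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def code_to_nits(code: int, bit_depth: int = 10, limited: bool = True) -> float:
    """Integer code value -> nits. Limited range maps 64..940 to 0..1 at 10 bits."""
    scale = 1 << (bit_depth - 8)
    if limited:
        e = (code - 16 * scale) / (219 * scale)
    else:
        e = code / ((1 << bit_depth) - 1)
    return pq_eotf(e)


print(code_to_nits(940))             # 10000.0 (nominal peak)
print(round(code_to_nits(643), 1))   # roughly 432 cd/m^2
```

The same function applies to an R', G' or B' code as well as to Y'; whether decoding Y' alone gives meaningful luminance is a separate question.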

MaxFALL is trickier, since the nits of the mean of the code values can be quite different than the mean of the nits of all the code values. So every pixel needs to get converted to nits and then mean of those determined, and then MaxFALL is the nits of the highest nit frame.
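To illustrate why the order matters for MaxFALL, this small sketch (hypothetical helper names, single-channel samples for brevity, standard ST 2084 constants) compares averaging linear nits with linearizing the averaged PQ code value.

```python
M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32


def pq_eotf(e: float) -> float:
    """Normalized PQ signal (0..1) -> cd/m^2."""
    p = min(max(e, 0.0), 1.0) ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def mean_of_nits(signal):
    """Linearize every sample first, then average (what MaxFALL needs)."""
    return sum(pq_eotf(v) for v in signal) / len(signal)


def nits_of_mean(signal):
    """Average the PQ code values first, then linearize (NOT equivalent)."""
    return pq_eotf(sum(signal) / len(signal))


frame = [0.0, 0.8]  # one black sample and one ~1555 cd/m^2 sample
print(round(mean_of_nits(frame)))  # about 778
print(round(nits_of_mean(frame)))  # about 32
```

Because PQ is strongly non-linear, the shortcut underestimates the frame average by more than an order of magnitude here.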

I think that for content already in Y'CbCr, nits can be derived directly from Y'. However, all the tools I know of convert to RGB and then calculate from there. That should be empirically tested before that shortcut gets used, but obviously calculating Y' -> nits would be a lot faster than going Y'CbCr -> RGB -> luma -> nits.

TomArrow

3rd January 2020, 22:28

@benwaggoner

The SMPTE reference implementation works with RGB values, so I figure that's the way to go. You're absolutely right that the mean of the code values is not the same as the mean of the nits values; the mean I calculate in my code is a mean of the nits values according to the algorithm in the reference pseudo-code. Unless I have made a mistake in implementing it, it should give the correct results.

Nowhere did I see references to using Y' code values. I presume Y' is a form of weighted average of the RGB values, whereas the reference implementation uses max(R,G,B) [aka the highest value channel] for the mean. So it would likely lead to different results again.

I think specifying metadata is better than just passing 0, because otherwise the display has to dynamically adjust to the frames it sees, which could well lead to flickering or suddenly changing brightness, though I could be wrong.

suarsg

4th January 2020, 11:05

In a YUV420 10bit encoded video (HDR10), doesn't the Y-component of the pixel already define the brightness of the pixel?

For example, there's a flashlight in a dark scene located at coordinates (x=1000,y=800) in a frame. Decoding this frame of that scene and getting the Y component of the pixel at those coordinates gives you e.g. the value 643. Applying the PQ EOTF to 643 (as a limited-range 10-bit code) gives 432.5cd/m2.

Am I missing something?

TomArrow

4th January 2020, 13:28

Well the YUV data gets converted to RGB anyway for displaying. I think using YUV these days is mainly about saving bandwidth via color subsampling, but I could be wrong. I also think it's more of a pseudo-brightness that is roughly proportional to perceived brightness and not really a physical measure of brightness or anything.

This seems to be an example code for converting RGB to YUV (not sure which color matrix):

Y = (0.257 * R) + (0.504 * G) + (0.098 * B) + 16

Cr = V = (0.439 * R) - (0.368 * G) - (0.071 * B) + 128

Cb = U = -(0.148 * R) - (0.291 * G) + (0.439 * B) + 128

So it's basically what I suspected, a weighted average.

Let's take an RGB value of 0.5, 0.6, 0.8. The Y value would be 0.5093 (Edit: Yeah dunno what my brain was thinking here, I wasn't aware that the code was for an 8bit limited YUV range, just ignore this please). Whereas max(R,G,B) would be 0.8. The RGB mean would be ~0.63.

The nits of the Y value would be 101.224

The nits of max(R,G,B) would be 1555.178

The nits of the RGB mean would be 323.845

So depending on what algorithm is used the results can differ quite a lot. max(R,G,B) is the official recommendation afaik.

Edit: I made a little mistake there in computing the RGB mean, since I averaged the RGB values, not the nits values. That average would be (92+244+1555)/3 = ~630
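The per-channel figures in this example (about 92, 244 and 1555 cd/m² for PQ values 0.5, 0.6 and 0.8) can be reproduced with the sketch below, which also adds a third candidate for comparison: BT.2020 luminance computed from the linear channel values. The helper name is hypothetical; the ST 2084 constants and BT.2020 weights are standard.

```python
M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32


def pq_eotf(e: float) -> float:
    """Normalized PQ signal (0..1) -> cd/m^2."""
    p = min(max(e, 0.0), 1.0) ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def pixel_metrics(r, g, b):
    """Three candidate 'brightness' values for one normalized PQ R'G'B' pixel."""
    nits = [pq_eotf(v) for v in (r, g, b)]
    return {
        "max_rgb": max(nits),               # basis of the max(R,G,B) method
        "channel_mean": sum(nits) / 3,      # mean of per-channel nits
        # BT.2020 luminance weights applied to *linear* values
        "luminance": 0.2627 * nits[0] + 0.6780 * nits[1] + 0.0593 * nits[2],
    }


for name, value in pixel_metrics(0.5, 0.6, 0.8).items():
    print(f"{name}: {value:.0f} cd/m^2")
```

For this pixel the three numbers come out at roughly 1555, 630 and 282 cd/m², so the choice of metric changes the result by more than a factor of five.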

nevcairiel

4th January 2020, 14:30

I think using YUV these days is mainly about saving bandwidth via color subsampling, but I could be wrong.

Subsampling is of course one reason (technically one could come up with an RGB subsampling scheme, similar to how Bayer filters work in image sensors and LCD displays, if one really wanted to), but there are also significant compression-efficiency advantages from decorrelating the color planes, which reduces redundant coded data. Compressing RGB without such a scheme would use more bandwidth.

suarsg

4th January 2020, 15:44

Let's take an RGB value of 0.5, 0.6, 0.8. The Y value would be 0.5093. Whereas max(R,G,B) would be 0.8. The RGB mean would be ~0.63.

The nits of the Y value would be 101.22

Not sure why you're converting full range RGB to a limited color range 8bit YUV and then comparing it to 10bit RGB.

As per your example numbers, for full color range:

R = 0.5*1023 = 512
G = 0.6*1023 = 614
B = 0.8*1023 = 818
PQEOTF(818) = 2716 cd/m2

Y = 0.2627*R + 0.6780*G + 0.0593*B = 599
PQEOTF(599) = 270 cd/m2
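A quick numeric check of these figures, assuming the standard ST 2084 constants (helper names are hypothetical): 2716 and 270 cd/m² are what you get when 818 and 599 are decoded as limited-range 10-bit codes (64 to 940). Decoding them as full-range codes (divided by 1023), which is how R, G and B were derived from 0.5/0.6/0.8 here, gives roughly 1550 and 213 cd/m² instead, so the range convention has to match how the codes were produced.

```python
M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32


def pq_eotf(e: float) -> float:
    """Normalized PQ signal (0..1) -> cd/m^2."""
    p = min(max(e, 0.0), 1.0) ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def limited_10bit(code):
    """Treat code as limited range: 64 -> 0.0, 940 -> 1.0."""
    return pq_eotf((code - 64) / 876)


def full_10bit(code):
    """Treat code as full range: 0 -> 0.0, 1023 -> 1.0."""
    return pq_eotf(code / 1023)


r, g, b = 512, 614, 818
y_code = round(0.2627 * r + 0.6780 * g + 0.0593 * b)  # 599
for label, code in (("B", b), ("Y'", y_code)):
    print(f"{label}={code}: limited {limited_10bit(code):.0f}, "
          f"full {full_10bit(code):.0f} cd/m^2")
```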

But that wasn't really my point. My point was, the display won't output 10,000nits if all of the pixels are RGB(0,1023,0) - but your way of calculating MaxCLL would say 10,000. As a matter of fact, it would take just one fully lit green pixel in the entire movie to result in MaxCLL of 10,000 with your calculation, when in reality this is no more than 750cd/m2.
I'm not saying you're wrong in the sense that you're not following the instructions. I just don't understand why they'd suggest doing it that way. Unless we're both misinterpreting what they mean by:
convert the pixel’s non-linear (R’,G’,B’) values to linear values (R,G,B) calibrated to cd/m2
Can someone post his interpretation of the above? The way I understand the above line is to apply the PQEOTF to the pixel values, since the pixel values represent luminance in a non-linear way but cd/m2 is linear.

They also have this note which sort of supports my above point:
For MaxCLL, the unit is equivalent to cd/m2 when the brightest pixel in the entire video stream has the chromaticity of the white point of the encoding system used to represent the video stream. Since the value of MaxCLL is computed with a max() mathematical operator, it is possible that the true CIE Y Luminance value is less than the MaxCLL value. This situation may occur when there are very bright blue saturated pixels in the stream, which may dominate the max(R,G,B) calculation, but since the blue channel is an approximately 10% contributor to the true CIE Y Luminance, the true CIE Y Luminance value of the example blue pixel would be only approximately 10% of the MaxCLL value.
I just don't understand why they'd recommend it this way. This seems incredibly dumb. What am I missing?

TomArrow

4th January 2020, 18:26

Sorry, I wasn't aware about it being limited or not, I just took the code from some website I found at quick glance. I suppose that example was nonsense then, my bad.

Well, MaxCLL is supposed to be the brightest pixel, and a single RGB pixel, strictly speaking, consists of R, G and B subpixels on a display, so it makes sense to simply take the brightest one, because that one subpixel (R, G or B) would indeed reach 10,000 nits if that was its value.

It is indeed strange that they would use Max(R,G,B) for the MaxFALL measurement however since it's literally supposed to be an average value. Beats me why they do it that way. My original approach was to just average all the nits values of every single channel and since that made much more sense to me too, I left that in the plugin as alternate algorithm as mentioned earlier.

suarsg

4th January 2020, 19:58

because that one pixel (R,G or B) would indeed reach 10,000 nits if that was its value
Yes, if the pixel is (1023,1023,1023). My example was (0,1023,0) though. In BT709 or sRGB you can only achieve maximum luminance of a pixel when it's (255,255,255), i.e. if all three channels are at their max possible value (255).

I'd assume this is the same for SMPTE2084/HDR10. Anything else doesn't make sense in my opinion, but I'm not an expert.

As a counter example:
Let's think of a medium gray pixel with the values (128,128,128). If we were to apply your luminance to each pixel's channel, each channel would have the same luminance output. This however, would make your medium gray not gray but give it a very blue tint instead. Because the eye is much more sensitive to blue than to red or green.

Another example:
If (0,1023,0) was 10,000nits, that would mean (0,1023,1023) would now be 20,000 nits, since the blue-subpixel was dark before but is now also emitting 10,000nits luminance on top of the 10,000nits the green sub-pixel was emitting. The spec's max however is 10,000nits. So I don't see how anything other than (1023,1023,1023) can be 10,000nits.

TomArrow

4th January 2020, 20:49

Hmm good point about the eye's sensitivity, but I presume the nits value is simply the physical measurement disregarding the human eye's sensitivity? That would certainly explain it in my mind.

Ultimately I think the physical limit will be imposed by how much energy the individual colored dot can output, not how bright it appears to the eye. Since even if the eye isn't very sensitive to blue color, it will still heat up the component just as much as a red pixel of similar intensity.

Also the intensity wouldn't add up like you say I think, since nits is candela per square meter, so two pixels will have twice the candela but also twice the area, so it would still be 10,000 nits?

suarsg

5th January 2020, 01:20

Hmm good point about the eye's sensitivity, but I presume the nits value is simply the physical measurement disregarding the human eye's sensitivity? That would certainly explain it in my mind.

Ultimately I think the physical limit will be imposed by how much energy the individual colored dot can output, not how bright it appears to the eye. Since even if the eye isn't very sensitive to blue color, it will still heat up the component just as much as a red pixel of similar intensity.
That.. makes no sense. It seems like you lack basic understanding of how any of this works and almost feels like you keep misreading everything I write on purpose. And then repeat it the exact opposite.

Also the intensity wouldn't add up like you say I think, since nits is candela per square meter, so two pixels will have twice the candela but also twice the area, so it would still be 10,000 nits?
That was a bad example on my part in regards to cd/m2. My example was about one and the same pixel though - adding a second light source (the blue pixel) - will increase brightness of that pixel. Just like your room gets brighter the more white pixels your display shows.

TomArrow

5th January 2020, 07:21

Why would I misread something you write on purpose? The sheer paranoia! Yes, I'm after you and I only came to this forum specifically to piss you off. Everything I ever did in my life was specifically anchored around the final goal of going to a forum about video processing to purposely misread what YOU wrote. Finally I can rest in peace, having achieved my goal!

TomArrow

5th January 2020, 08:12

Aight, seems I misunderstood the measurement of luminance, but I think so did you.

Apparently luminance/luminous intensity (cd) is already weighted by the human eye's sensitivity: https://en.wikipedia.org/wiki/Luminous_intensity

Let's think of a medium gray pixel with the values (128,128,128). If we were to apply your luminance to each pixel's channel, each channel would have the same luminance output. This however, would make your medium gray not gray but give it a very blue tint instead.

So yes, it does make sense that a value of (128,128,128) with the ST2084 conversion would be neutral grey because luminance is already compensated for the human eye's spectral sensitivity.

The other way I misunderstood the nits measurement is in terms of the square meter. I thought it meant emitter area, but it means the area across which light output is measured.

If (0,1023,0) was 10,000nits, that would mean (0,1023,1023) would now be 20,000 nits, since the blue-subpixel was dark before but is now also emitting 10,000nits luminance on top of the 10,000nits the green sub-pixel was emitting. The spec's max however is 10,000nits. So I don't see how anything other than (1023,1023,1023) can be 10,000nits.

So yeah, I think this is exactly how it works. The 10,000 nit limit is then likely the limit of each individual *physical* pixel:
https://i.redd.it/erxv25g5vac11.jpg

Which makes more sense, since one physical pixel operates independently from the neighboring ones, so why should they be (from a physical perspective) arbitrarily grouped?

Wanted to think about one more angle, but my brain is falling asleep and I'm tired, will think about it more tomorrow.

suarsg

5th January 2020, 11:42

Why don't you run a few movies through your application and compare your calculated MaxCLL/MaxFALL results with what it says in their own metadata?

TomArrow

5th January 2020, 15:07

Sure. I looked at 3 HDR demos I downloaded a while back and they all seem to lack the MaxCLL/MaxFALL data, so there's nothing to compare to. Do you have any test videos in mind that are suitable?

suarsg

5th January 2020, 15:40

8 of the 10 most recently released UHD-BDs have MaxCLL/MaxFALL data. Pick any of them or check the latest one you already have. The tools to extract the HEVC stream exist.

Asmodian

5th January 2020, 23:02

If (0,1023,0) was 10,000nits, that would mean (0,1023,1023) would now be 20,000 nits, since the blue-subpixel was dark before but is now also emitting 10,000nits luminance on top of the 10,000nits the green sub-pixel was emitting. The spec's max however is 10,000nits. So I don't see how anything other than (1023,1023,1023) can be 10,000nits.

The way you (want to) do it is have the display understand how to drive the subpixels. If (1023,1023,1023) is 4000 nits the display could still drive the green subpixel harder to have it also reach 4000 nits by itself.

In practice this is impossible but that does not mean HDR video would not prefer 4000 nits for all pure colors as well as white or that HDR displays must drive the green subpixel the same when at (1023,1023,1023) and (0,1023,0).

benwaggoner

6th January 2020, 04:54

So yes, it does make sense that a value of (128,128,128) with the ST2084 conversion would be neutral grey because luminance is already compensated for the human eye's spectral sensitivity.
It's distracting to use full-range 8-bit values. All HDR content is encoded as limited-range Y'CbCr 4:2:0 10-bit.

Anyway, yes RGB 128, 128, 128 won't have any chroma. In 10-bit YUV it'd translate to (512,0,0). It's neutral in the sense that there is no chroma, but it's going to look quite white in PQ. The same values would be much more gray in 709.

So yeah, I think this is exactly how it works. The 10,000 nit limit is then likely the limit of each individual *physical* pixel.

Which makes more sense, since one physical pixel operates independently from the neighboring ones, so why should they be (from a physical perspective) arbitrarily grouped?

MaxFALL and MaxCLL aren't relative to any physical pixel. How could it be? Some displays have R, G, B elements, other R, G, B, and W. 4K HDR content can be played on 1080p or 8K panels.

Also, an RGB of 0, 512, 0 is going to be a lot brighter than one that's 0, 0, 512, as green is a much bigger portion of luma than blue.

I might want to ruminate on this a bit more, but I think that the conversion to luma used to figure out MaxFALL and MaxCLL is the same one used in converting to Y'CbCr, which is why actual brightness should be derivable with reasonable accuracy from Y'.

suarsg

6th January 2020, 10:18

The way you (want to) do it is have the display understand how to drive the subpixels. If (1023,1023,1023) is 4000 nits the display could still drive the green subpixel harder to have it also reach 4000 nits by itself.

In practice this is impossible but that does not mean HDR video would not prefer 4000nits for all pure colors as well as white or that HDR displays must drive the green subpixel the same when at (1023,1023,1023) and (0,1023,0).

Like I already mentioned twice, this was a poorly chosen example on my part. I was trying to keep it simple without too many details convoluting the actual issue. This was an attempt to make him see how flawed his logic was without going into particulars. He was thinking way too much in sRGB terms, without even considering how RGB relates to brightness or how to convert an RGB tuple to luminance.

Hence for BT2020 (the color space of HDR10!) we have:
Y = 0.2627*R + 0.6780*G + 0.0593*B
Solving for R, G, B when Y=1.0 (=10,000nits) makes it clear there's only one possible tuple. And in SMPTE 2084, that is (940+,940+,940+).
Not sure why you're bringing 4000nits or display tech into this, as it doesn't really matter for the theory. Obviously there's no consumer display out there that can do these values without tone mapping, each display's tech is completely different on a sub-pixel level, and how each display derives a pixel's brightness from its sub-pixel structure is its own job to figure out.
The important part is, if you have an RGB-tuple for a pixel, the pixel's brightness/luminance will not be based on the value of MAX(R,G,B) or AVG(R,G,B). My examples were there to demonstrate why it makes absolutely no sense.

MaxFALL and MaxCLL aren't relative to any physical pixel. How could it be? Some displays have R, G, B elements, other R, G, B, and W. 4K HDR content can be played on 1080p or 8K panels.

Also, an RGB of 0, 512, 0 is going to be a lot brighter than one that's 0, 0, 512, as green is a much bigger portion of luma than blue.
That's why I stopped responding to him, he made no attempt in trying to understand why that is but instead went on a rant about his life goals.

I might want to ruminate on this a bit more, but I think that the conversion to luma used to figure out MaxFALL and MaxCLL is the same one used in converting to Y'CbCr, which is why actual brightness should be derivable with reasonable accuracy from Y'.
I agree and this was my first post regarding this:
In a YUV420 10bit encoded video (HDR10), doesn't the Y-component of the pixel already define the brightness of the pixel?

TomArrow

6th January 2020, 23:18

MaxFALL and MaxCLL aren't relative to any physical pixel. How could it be? Some displays have R, G, B elements, other R, G, B, and W. 4K HDR content can be played on 1080p or 8K panels.

Also, an RGB of 0, 512, 0 is going to be a lot brighter than one that's 0, 0, 512, as green is a much bigger portion of luma than blue.

I might want to ruminate on this a bit more, but I think that the conversion to luma used to figure out MaxFALL and MaxCLL is the same one used in converting to Y'CbCr, which is why actual brightness should be derivable with reasonable accuracy from Y'.

You're right, the physical pixel of course probably doesn't correspond with the "perfect" pixel in terms of HDR primaries in most cases. Kind of an intriguing problem to be honest, because ultimately I think that's what the TV needs to know: how much it will have to drive any individual physical pixel. Still, I'd figure it's more useful for the TV to know the maximum brightness of any of the HDR primaries than the Y value, since Y could mean a wider range of different things. E.g. a value of 0.0593 for Y could mean a blue channel with the value of 1.0 (normalized to 0.0-1.0) or it could mean a green channel with the value of 0.09. I figure it would have to assume the highest possible value and thus end up using more tonemapping than necessary, since the corresponding maximum channel value could have been as low as 0.09 but it needed to assume 1.0.

Be that as it may, do you mean the encoded Y value in an HDR PQ encoded video when you say Y', or a separately calculated brightness based on the RGB PQ data? If you mean the former, I just looked up some PDFs earlier today and I believe most deliverables these days actually do the RGB to YUV conversion for PQ *after* the transfer function has already been applied, so reading that Y value and converting it to nits is a pretty invalid thing to do. I believe this distinction is called "non-constant luminance" vs "constant luminance": the latter derives Y from the linear RGB values, while the former (apparently the typical one) derives it from the already PQ-encoded RGB values.

To demonstrate, I did a bit of math:
Y = 0.2627*R + 0.6780*G + 0.0593*B

PQ equivalents (roughly, as normalized 0-1 signal values):
100 nits = 0.508
400 nits = 0.652
1600 nits = 0.803
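These PQ equivalents come from the inverse of the SMPTE ST 2084 EOTF. A minimal Python sketch using the published ST 2084 constants should land within about 0.001 of the rough values above:

```python
# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def nits_to_pq(nits: float) -> float:
    """Inverse PQ EOTF: absolute luminance in cd/m^2 -> normalized signal (0.0-1.0)."""
    y = min(max(nits, 0.0), 10000.0) / 10000.0
    ym = y ** M1
    return ((C1 + C2 * ym) / (1.0 + C3 * ym)) ** M2


for n in (100, 400, 1600, 10000):
    print(f"{n:>5} nits -> {nits_to_pq(n):.4f}")
```

10,000 nits maps to exactly 1.0, the top of the PQ range.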

With neutral grey, the Y value is the same as the PQ value, since 0.2627 + 0.6780 + 0.0593 = 1

Let's say we have a "pure HDR red" of 100 nits and 400 nits:
(0.508,0,0)
(0.652,0,0)

Converted to nits from RGB we get roughly:

(100,0,0)
(400,0,0)

Let's convert to Y:
0.508*0.2627 = 0.1334516
0.652*0.2627 = 0.1712804

Now convert those values to nits with the PQ EOTF:
0.714005 nits
1.491089 nits

Dividing the second value by the first gives
1.491089 / 0.714005 = 2.08834

We completely lose linearity. The higher value should be almost exactly 4 times as high as the lower value, but instead it is only 2.088 times as high. And the resulting value of course is also nowhere near the proper value.
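The same comparison can be scripted. This Python sketch applies the ST 2084 EOTF in two orders: weighting the PQ-encoded values first (the non-constant-luminance shortcut, which should reproduce roughly 0.714 and 1.491 nits and a ratio near 2.09), versus decoding each channel to linear light first and weighting afterwards (which should give a ratio close to 4). The function names are illustrative, not from any library.

```python
# SMPTE ST 2084 (PQ) constants and BT.2020 luma weights
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
KR, KG, KB = 0.2627, 0.6780, 0.0593


def pq_to_nits(e: float) -> float:
    """PQ EOTF: normalized signal (0.0-1.0) -> absolute luminance in cd/m^2."""
    ep = max(e, 0.0) ** (1.0 / M2)
    return 10000.0 * (max(ep - C1, 0.0) / (C2 - C3 * ep)) ** (1.0 / M1)


def luma_then_eotf(r: float, g: float, b: float) -> float:
    """Non-constant-luminance shortcut: weight the PQ-encoded values, then decode."""
    return pq_to_nits(KR * r + KG * g + KB * b)


def eotf_then_luminance(r: float, g: float, b: float) -> float:
    """Decode each channel to linear light first, then weight (actual luminance)."""
    return KR * pq_to_nits(r) + KG * pq_to_nits(g) + KB * pq_to_nits(b)


low, high = (0.508, 0.0, 0.0), (0.652, 0.0, 0.0)  # "pure HDR red" at ~100 and ~400 nits

a, b = luma_then_eotf(*low), luma_then_eotf(*high)
print(f"Y' -> nits:       {a:.6f}, {b:.6f}, ratio {b / a:.4f}")

c, d = eotf_then_luminance(*low), eotf_then_luminance(*high)
print(f"linear luminance: {c:.2f}, {d:.2f}, ratio {d / c:.4f}")
```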

Quick question, by the way: does the white point imply the point where all color components have equal perceptual brightness? I always intuitively assumed that, but now I'm questioning it a bit, since for example an sRGB (or linear sRGB) value of (0,255,0) appears brighter than (0,0,255), but (255,255,255) is perceived as white.

I suppose the answer is no. Gotta wrap my head around that. Hah!

benwaggoner

6th January 2020, 23:20

The way you (want to) do it is to have the display understand how to drive the subpixels. If (1023,1023,1023) is 4000 nits, the display could still drive the green subpixel harder to have it also reach 4000 nits by itself.

In practice this is impossible, but that does not mean HDR video would not prefer 4000 nits for all pure colors as well as white, or that HDR displays must drive the green subpixel the same at (1023,1023,1023) and (0,1023,0).
Yeah. Current display technologies make blue the easiest color to make bright, which is why brighter TV settings are also bluer. Since proper "movie" settings require a warm D65 white point, that caps the maximum bright white possible. Adding a white subpixel mitigates that some, at the cost of reducing maximum brightness of saturated colors.

Tone mapping is a complex and interesting field, and there is definitely not any one way to do it. Tradeoffs around how to optimally preserve brightness, hue, and saturation without banding involve a lot of proprietary alchemy by TV companies.

TomArrow

6th January 2020, 23:40

I was trying to keep it simple without too many details convoluting the actual issue. This was an attempt to make him see how flawed his logic was without going into particulars. He was thinking way too much in sRGB terms, without even considering how RGB relates to brightness or how to convert an RGB tuple to luminance.

That's why I stopped responding to him: he made no attempt to understand why that is, but instead went on a rant about his life goals.

First of all, what are you talking about? You responded to me twice after that. Lol! Have you considered not being a <redacted> and just presenting your arguments? If you have something to say and are able to teach me what I'm wrong about, please by all means "go into the particulars", but so far you've been doing a shit job at it. I never claimed to be an expert about this stuff, I'm just trying to figure it out. No reason to be condescending. If you're not gonna say anything actually helpful, whether you are knowledgeable or not, you may as well shut up.

Hence for BT2020 (the color space of HDR10!) we have:
Y = 0.2627*R + 0.6780*G + 0.0593*B
Solving for R, G, B when Y=1.0 (=10,000 nits) makes it clear there's only one possible tuple. And in SMPTE 2084 with 10-bit narrow-range coding, that is (940+,940+,940+).
Not sure why you're bringing 4000 nits or display tech into this, as it doesn't really matter for the theory. Obviously there's no consumer display out there that can do these values without tone mapping, and each display's tech is completely different at the sub-pixel level; how a display derives each pixel's brightness from its sub-pixel structure is its own job to figure out.
The important part is, if you have an RGB-tuple for a pixel, the pixel's brightness/luminance will not be based on the value of MAX(R,G,B) or AVG(R,G,B). My examples were there to demonstrate why it makes absolutely no sense.
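A short Python sketch of both points: luminance is the weighted sum of the linear-light channels (neither their maximum nor their average), and because the weights sum to 1 with each channel capped at 1.0, Y = 1.0 is only reachable at R = G = B = 1.0, which encodes to code 940 in 10-bit narrow range. The pixel values are made-up examples in nits.

```python
# BT.2020 luminance weights, applied to linear-light channels
KR, KG, KB = 0.2627, 0.6780, 0.0593


def luminance(r: float, g: float, b: float) -> float:
    """Luminance of linear-light BT.2020 RGB (each channel in nits)."""
    return KR * r + KG * g + KB * b


# Hypothetical pixels whose largest channel is 1000 nits in every case
pixels = {
    "white": (1000.0, 1000.0, 1000.0),
    "green": (0.0, 1000.0, 0.0),
    "blue": (0.0, 0.0, 1000.0),
}
for name, (r, g, b) in pixels.items():
    print(f"{name:>5}: luminance={luminance(r, g, b):7.1f}  "
          f"max={max(r, g, b):7.1f}  avg={(r + g + b) / 3:7.1f}")

# Weights sum to 1, so with every channel <= 1.0, Y = 1.0 forces R = G = B = 1.0
assert abs(KR + KG + KB - 1.0) < 1e-9


def to_narrow_10bit(e: float) -> int:
    """Normalized value -> 10-bit narrow ("limited") range code (64..940)."""
    return round(64 + 876 * e)


print(to_narrow_10bit(1.0))  # 940
```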

Hypothetically, assume that we had a subpixel structure whose primaries are 100% identical to the Rec2020 primaries. In that case, as I understand it, the nits value of each subpixel would be equal to its perceived brightness, as a nit is one candela per square meter and the candela is already a perceptually weighted unit (i.e. it measures blue lower than green and so forth). The TV would, in this hypothetical case, and unless I'm missing something, only have to apply the inverse of the luminosity function to derive the required radiant power (in watts) of the subpixel, then probably compensate for energy conversion efficiency, and voila, that should tell it how much to drive said subpixel.

See: https://en.wikipedia.org/wiki/Luminous_intensity
Also see: https://en.wikipedia.org/wiki/Luminosity_function
Also see: https://en.wikipedia.org/wiki/Radiant_intensity

Why would the brightness of an entire pixel (R, G and B combined) even matter to the TV? The TV, I think, has to be worried about not burning through any subpixel, and maybe about energy saving regulations. As long as no individual subpixel gets too hot (or whatever else matters for its longevity), it hardly matters what the value of the other subpixels is. Of course you can argue that the heat from a subpixel affects the subpixels around it; however, depending on the subpixel layout, the TV might then (e.g. in the case of a red subpixel) be more worried about the blue subpixel of the neighboring pixel, since it's closer to that red subpixel than the blue subpixel of the same pixel. So knowing the brightness of a pixel as a whole is barely useful, unless I'm missing something.

I agree and this was my first post regarding this: In a YUV420 10bit encoded video (HDR10), doesn't the Y-component of the pixel already define the brightness of the pixel?

See my last post and the whole thing about constant vs. non-constant. In short, as I think I learned, no.

TomArrow

6th January 2020, 23:54

So I tested my plugin on some of the MaxCLL test patterns here:
https://www.avsforum.com/forum/139-display-calibration/2943380-hdr10-test-patterns-set.html

What I learned is that some of the patterns will read out as almost 10,000 nits despite supposedly being meant to have MaxCLL 1000 or 4000. Other patterns will read slightly above 1000 nits, which seems about right. That seemed like an error to me, so I took a screenshot and checked the RGB values in Photoshop. Turns out the text superimposed over the colors creates some artifacts that lead to extreme overshoots in some cases, for example with the red test pattern. However, I'm not sure if this is a problem with the chart itself, or possibly with ffvideosource or ConvertToRGB64(matrix="Rec2020"), or simply with the chroma subsampling, or with a combination of all of those.

Generally speaking, judging by the patterns that gave reasonable readings, MaxCLL indeed seems to be the maximum subpixel value, if those charts are correct. Different colors that were supposed to have identical MaxCLL all turned out to have just that with my method of reading.
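That observation matches my reading of how CTA-861.3 defines the two values: per pixel, take the largest of the three linear-light components ("maxRGB"); MaxCLL is the largest maxRGB anywhere in the program, and MaxFALL is the largest per-frame average of maxRGB. A minimal pure-Python sketch, assuming frames already converted to full-range PQ R'G'B' normalized to 0.0-1.0 (e.g. ConvertToRGB64 output divided by 65535); it is far too slow for real footage and is not how MaxCLLFindAVS is implemented.

```python
from typing import Iterable, Sequence, Tuple

# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

Pixel = Tuple[float, float, float]  # normalized full-range PQ R'G'B'


def pq_to_nits(e: float) -> float:
    """PQ EOTF: normalized signal -> cd/m^2."""
    ep = max(e, 0.0) ** (1.0 / M2)
    return 10000.0 * (max(ep - C1, 0.0) / (C2 - C3 * ep)) ** (1.0 / M1)


def max_cll_fall(frames: Iterable[Sequence[Pixel]]) -> Tuple[float, float]:
    """Return (MaxCLL, MaxFALL) in cd/m^2 from per-pixel maxRGB."""
    max_cll = 0.0
    max_fall = 0.0
    for frame in frames:
        total = 0.0
        for r, g, b in frame:
            m = max(pq_to_nits(r), pq_to_nits(g), pq_to_nits(b))  # maxRGB in nits
            max_cll = max(max_cll, m)
            total += m
        if frame:
            max_fall = max(max_fall, total / len(frame))
    return max_cll, max_fall


# Two tiny hypothetical frames of two pixels each
frames = [
    [(0.508, 0.508, 0.508), (0.0, 0.0, 0.0)],  # ~100-nit grey + black
    [(0.0, 0.0, 0.7518), (0.0, 0.0, 0.0)],     # ~1000-nit pure blue + black
]
cll, fall = max_cll_fall(frames)
print(f"MaxCLL ~ {cll:.0f} nits, MaxFALL ~ {fall:.0f} nits")
```

Note that the pure blue pixel counts at its full ~1000 nits here, even though its luminance is far lower.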

I tried a few other videos, but the results were very inconsistent and any resemblance between my readings and their data was spurious at best. For example, I tested a JVC HDR test Blu-ray and it gave me 10,000 nits where the metadata said MaxCLL was 1020 cd/m2. I took a screenshot again and it was a very tiny highlight, a reflection on a shoe in a shoe store at night, that actually blew out the blue channel. This leads me to believe that the metadata is either wrong or they used the kind of blurring someone mentioned, which might explain why the metadata is lower. However, they also reused the same metadata for multiple clips, so I'm not sure it can be trusted at all.

I also tested this video: https://www.youtube.com/watch?v=hsbM2c6-9Wg

My readings for FALL (I just looked at single frames) were a bit less than half of what the video said they were supposed to be in each place I measured. I wonder if this might have to do with that little point about only measuring the "active pixels" and whether I need to just discard all black pixels from the calculation, though that seems a little strange to me. I then cropped the video to only show the right half and measured that, but it didn't really change much.

Maybe someone here can run the same videos through another MaxFALL/MaxCLL application to compare the results.

kolak

12th January 2020, 23:07

If I remember correctly, somewhere in the docs there was info about taking only the active area into account (if the image aspect ratio differs from the frame aspect ratio).
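If "active area" means the picture between letterbox bars, the frame average changes noticeably once the bars are excluded. A hypothetical Python sketch, assuming per-pixel maxRGB values already in nits and bar heights that are known (or detected separately); it crops bars only and does not discard every black pixel inside the picture.

```python
from typing import Sequence


def frame_average(max_rgb: Sequence[Sequence[float]], top: int = 0, bottom: int = 0) -> float:
    """Average per-pixel maxRGB (nits) over the rows between letterbox bars.

    `top` and `bottom` are hypothetical bar heights in rows.
    """
    rows = max_rgb[top:len(max_rgb) - bottom]
    count = sum(len(row) for row in rows)
    return sum(sum(row) for row in rows) / count if count else 0.0


# Hypothetical 4x2 frame: one black bar row at top and bottom, 200-nit picture between
frame = [
    [0.0, 0.0],
    [200.0, 200.0],
    [200.0, 200.0],
    [0.0, 0.0],
]
print(frame_average(frame))                   # 100.0 (whole frame)
print(frame_average(frame, top=1, bottom=1))  # 200.0 (active area only)
```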

I assume those overshoots are present because you are dealing with compressed footage (this is the problem I was talking about).
In high-end finishing, masters are done as e.g. 16-bit TIFF, so there should be no such problem (but even encoding to ProRes etc. will introduce overshoots).

There is also some 'inconsistency' with range: some documents use 0-1023, others 4-1019 (because the codes outside that range are reserved for timing references). As far as I understand, e.g. HDMI uses the 4-1019 range.

heisenberg

24th April 2020, 00:06

In a YUV420 10bit encoded video (HDR10), doesn't the Y-component of the pixel already define the brightness of the pixel?

For example, there's a flashlight in a dark scene located at coordinates (x=1000,y=800) in a frame. Decoding this frame of that scene and getting the Y component of the pixel at those coordinates gives you e.g. the value 643. Applying the PQ EOTF to 643 gives 432.5 cd/m2.

Am I missing something?

Seems right, for YUV420 10bit PQ transfer video, as you say.
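The 432.5 cd/m2 figure works out if 643 is read as a 10-bit narrow-range code (64..940); read as full range (0..1023), the same code decodes to roughly 320 cd/m2. A Python sketch of both interpretations follows. Note that it decodes Y' only; per the non-constant-luminance discussion earlier in the thread, that equals the pixel's luminance only for neutral grey.

```python
# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_to_nits(e: float) -> float:
    """PQ EOTF: normalized signal -> cd/m^2."""
    ep = max(e, 0.0) ** (1.0 / M2)
    return 10000.0 * (max(ep - C1, 0.0) / (C2 - C3 * ep)) ** (1.0 / M1)


def code_to_signal(code: int, narrow: bool = True) -> float:
    """10-bit code value -> normalized PQ signal.

    Narrow ("limited") range maps luma codes 64..940 to 0.0..1.0;
    full range maps 0..1023 to 0.0..1.0.
    """
    return (code - 64) / 876 if narrow else code / 1023


narrow_nits = pq_to_nits(code_to_signal(643))
full_nits = pq_to_nits(code_to_signal(643, narrow=False))
print(f"643 as narrow range: {narrow_nits:.1f} cd/m2")
print(f"643 as full range:   {full_nits:.1f} cd/m2")
```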

kedautinh12

29th September 2021, 17:49

New ver:
https://github.com/TomArrow/MaxCLLFindAVS/releases

quietvoid

29th September 2021, 18:26

By the way, in the SMPTE August 2021 journal, there is a paper describing an alternative calculation for the metadata.
Pseudocode:
https://i.postimg.cc/YvJVJCfW/Untitled.png (https://postimg.cc/YvJVJCfW)

kedautinh12

22nd February 2022, 12:25

0.36:
https://github.com/erazortt/MaxCLLFindAVS/releases

kolak

8th July 2022, 23:49

So does 0.36 include the new formula with outlier rejection?

quietvoid

8th July 2022, 23:58

Doesn't seem to.
It should be pretty trivial to do using VapourSynth and numpy's percentile.
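The SMPTE paper's pseudocode is only linked as an image, so the following is not that algorithm. It is a hypothetical Python sketch of the general percentile idea: take a high percentile of the per-pixel maxRGB values instead of the absolute maximum, so a handful of compression overshoots cannot dominate. The 99.99 cutoff is an arbitrary assumption. numpy.percentile would do the same job (interpolating by default); this version sticks to the standard library and uses the nearest-rank method.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, pct in 0..100."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def robust_max_cll(max_rgb_nits: Sequence[float], pct: float = 99.99) -> float:
    """Hypothetical MaxCLL variant ignoring the brightest (100 - pct)% of maxRGB values."""
    return percentile(max_rgb_nits, pct)


# Hypothetical: 10,000 pixels at 1000 nits plus one overshoot at 10,000 nits
samples = [1000.0] * 10_000 + [10_000.0]
print(max(samples))             # 10000.0
print(robust_max_cll(samples))  # 1000.0
```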

kolak

8th July 2022, 23:59

That's my impression as well. Could someone add it, please?
Other than this it seems to be accurate.