BT.601 and BT.709 compatibility benchmark


Petrakeas2
8th March 2012, 22:37
http://www.wiggler.gr/wp-content/wmplayer-2012-02-27-00-01-37-97.jpg

Without getting into much detail, two main color spaces exist nowadays for video: BT.601 and BT.709. If a video is encoded using one of them and then decoded with the other, a slight color shift will occur (see the post's image, which is split into three rows). Most people may not even notice it, but areas with blue and green colors are affected the most, and if you want your video reproduced the way you mastered it, you have to take some things into consideration.

That's why I made this benchmark, testing video editing applications like the Adobe suite and Sony Vegas, video players and many decoders such as Windows Media Player, QuickTime on a Mac, Adobe Flash Player, VLC and XBMC, and browsers with mp4 playback capabilities such as Chrome and Safari. If you are curious about their behavior, read below!

Possible behaviors

A video using the H.264 codec can carry flags that let the player know which color space was used when it was encoded, so that the same color matrix is used in decoding. Unfortunately, not all test candidates paid attention to those flags; most of them ignored them. Another way to decide is to use BT.601 for SD video (resolutions up to 576 lines vertically) and BT.709 for HD video. And yet another way is to stick with just one color matrix (the worst way).
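That "best" behavior (trust the flag when present, otherwise fall back to resolution) can be sketched as a small decision function. This is just my illustration of the logic, not any player's actual API; the function and tag names are placeholders:

```python
def pick_matrix(flagged_matrix, height):
    """Return the color matrix a well-behaved player should use.

    flagged_matrix: the "Matrix coefficients" tag from the stream,
    or None for an untagged video (names here are illustrative).
    height: vertical resolution in lines.
    """
    if flagged_matrix in ("bt601", "bt709"):
        return flagged_matrix  # trust the tag when present
    # fall back to the resolution heuristic: SD -> BT.601, HD -> BT.709
    return "bt601" if height <= 576 else "bt709"

print(pick_matrix(None, 480))      # untagged SD
print(pick_matrix(None, 1080))     # untagged HD
print(pick_matrix("bt601", 1080))  # tagged HD video that used BT.601
```

The worst behaviors in the results below correspond to skipping the first branch (ignoring tags) or ignoring both and hard-coding one matrix.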

Test creation

In order to test the above, I created a "ground truth" sRGB video and then encoded it to H.264 using all the possible combinations: SD vs HD, BT.601 vs BT.709, tagged vs untagged. "Tagged" are the clips where I stored the flag describing the color space used for encoding, and "untagged" are the videos without this flag. The flags I set are: "Color primaries", "Transfer characteristics" and "Matrix coefficients".

Encoding was performed using MeGUI, x264 and Avisynth. Using Avisynth I converted the colors from RGB to the YV12 color space using BT.601 or BT.709 (e.g. ConvertToYV12(matrix="rec709")). Using x264 (inside MeGUI), I encoded the video stream and either "tagged" it with the color matrix I had used earlier or left it untagged. I also used Sony Vegas and Adobe Media Encoder to test their behavior in encoding.
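For reference, the "tagging" step maps to x264's VUI options. Here is a sketch of how a front-end might assemble the command line (file names are placeholders; note that for BT.601 material x264 expects a name like "smpte170m" rather than a literal "bt601"):

```python
def x264_args(source, output, matrix=None):
    """Assemble an x264 command line; matrix=None leaves the stream
    untagged. The option names (--colormatrix, --colorprim,
    --transfer) are x264's VUI flags; file names are placeholders."""
    args = ["x264", "--output", output]
    if matrix is not None:  # e.g. "bt709", or "smpte170m" for BT.601
        args += ["--colormatrix", matrix,
                 "--colorprim", matrix,
                 "--transfer", matrix]
    return args + [source]

print(x264_args("video.avs", "tagged.264", "bt709"))
print(x264_args("video.avs", "untagged.264"))
```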

The comparison was done by taking screenshots on the Mac and using FRAPS on Windows. On both systems, an sRGB profile was used for the monitor.

You can download the video benchmark files here (http://bit.ly/xZfthG). You can use MediaInfo to see the details of the files (flags, resolution etc.).

Results

Windows media player

If the default decoder is used for H.264, then tags are read. If the video is untagged, BT.601 is used for SD video and BT.709 is used for HD.

If the decoder is ffdshow and it outputs YV12, tags are ignored: BT.601 is used for SD video and BT.709 for HD. If RGB32 is set as the output in ffdshow, then the YV12<->RGB conversion is done in software and tags are read. If the video is untagged, BT.601 is used for SD video and BT.709 for HD.

VLC in Windows

Tags are ignored. BT.601 is used for SD video and BT.709 is used for HD.

XBMC in Windows

If DXVA2 is enabled then tags are read. If the video is untagged, BT.601 is used for SD video and BT.709 is used for HD.

If DXVA2 is not enabled, tags are ignored and BT.601 is used for SD video and BT.709 is used for HD.


QuickTime Player on a Mac

QuickTime Player on a Mac reads the video's tags and uses the appropriate color matrix. If the video is untagged, it always uses BT.601.

A strange thing I noticed is that when the video is tagged BT.709, the image decoded by QT is a little different from what all the other software produces. That means that either the BT.709 color primaries QT uses are right and the other software is wrong, or the opposite. When "Color primaries" is tagged as SMPTE 240M (almost the same as BT.709), the image decoded by QT looks the same as the original.
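As a side note, "almost the same" can be quantified on the matrix side of things. These are the published luma coefficients (Kr, Kg, Kb) of the two standards; keep in mind the QT oddity above concerns the "Color primaries" flag, which is a separate property from the matrix:

```python
# Luma coefficients (Kr, Kg, Kb) as defined by each standard.
BT709     = (0.2126, 0.7152, 0.0722)
SMPTE240M = (0.212,  0.701,  0.087)

# Largest per-coefficient difference between the two standards.
max_diff = max(abs(a - b) for a, b in zip(BT709, SMPTE240M))
print(round(max_diff, 4))  # roughly 0.015 at most
```

So on the coefficient level the two really are close, which makes the visible difference QT produces with the BT.709 primaries tag all the stranger.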

Video players on a Mac

I tested VLC and XBMC. They ignore tags and always use BT.601.


Flash player 11

Flash videos are everywhere. I did this experiment on YouTube (1,2,3) and Vimeo, and tested on both a Mac and a PC because Flash Player behaved differently on each! Flags could not be tested, because YouTube removes them when it re-encodes the video (but it keeps the initial color space).

Flash Player on PC: always uses BT.601. When the video is in full screen and accelerated video rendering is used, BT.709 is always used. This means your video will look different in full screen! Vimeo stayed BT.601 in full screen.

Flash Player on Mac: plays SD videos using BT.709 and HD videos using BT.601, the reverse of what it should be! This means a color shift will occur when changing YouTube's player quality from 720p (HD) to 480p or lower.

I also noticed strange behavior on Snow Leopard, where I did the tests: if I have a browser with a YouTube video open (in Flash Player) and then open a second or even a third browser, the latter will use BT.709 for everything.

Chrome playing .mp4 (mac/windows)

I dragged an mp4 video into the browser (Flash was not used). Tags are ignored; it always uses BT.601.

Safari playing .mp4 (mac)

I dragged an mp4 video into the browser (Flash was not used). It behaves exactly the same as QuickTime on a Mac (obviously Safari uses QT).

iPhone

Whether you're playing a video from the YouTube app, from Safari (YouTube mobile) or from the Dropbox app, QuickTime is used, so the behavior is the same as QuickTime Player on a Mac.

Adobe suite CS5.5

Adobe Media Encoder behaves the same as After Effects and Premiere when importing and exporting video. It ignores the flags and uses BT.601 for SD video and BT.709 for HD. This means that if you import an HD video, downscale it to SD and export it, it will do the correct 709->601 transformation. The only problem is that an HD video using BT.601, or an SD video using BT.709 (very rare), will be imported wrong. One thing to note here is that the Adobe suite CS5 behaved differently: it used BT.601 for everything.

Sony Vegas

Behaves the same as the Adobe suite: it ignores flags and decides by resolution. One thing to note about Vegas is that it uses full range (0-255) when importing and exporting mp4/mpg. This means that if you want your video to be limited range (16-235) you have to do it yourself: just use the Sony Color Corrector on your main video bus and set it to "Computer RGB to Studio RGB" for proper export.
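The computer-to-studio RGB mapping that the Color Corrector preset applies is just the standard range squeeze. A quick sketch (the formula is the generic full-to-limited-range one, not pulled from Vegas internals):

```python
def computer_to_studio(v):
    """Map a full-range ("computer RGB", 0-255) component to limited
    range ("studio RGB", 16-235). Generic full-to-limited formula,
    not taken from Vegas internals."""
    return 16 + v * 219 / 255

print(computer_to_studio(0))    # black maps to 16
print(computer_to_studio(255))  # white maps to 235
```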

Conclusion

As you might have guessed, each piece of software behaves differently. The best behavior is to use the video's flag and, if the video is untagged, to use BT.601 for SD and BT.709 for HD. This is what Windows Media Player and XBMC on Windows do. QuickTime on the Mac behaved almost perfectly. Flash was a disaster: there is no way to guarantee what the end user will see. I hope Adobe fixes this in future versions.

Regarding the flags, "Transfer characteristics" had no effect on any player. "Matrix coefficients" was the one that mattered for proper BT.601/BT.709 reproduction, and it was read by QuickTime (Mac), Windows Media Player and XBMC on Windows, as I have already stated. "Color primaries" had an effect only on QuickTime, but it needed to be flagged as "SMPTE 240M" (when "Matrix coefficients" was flagged as BT.709) to give the same image as the other players. Setting "Color primaries" to BT.709 produced more contrast in the output! Very strange...

My original post on my blog:
http://www.wiggler.gr/2012/02/27/bt-601-and-bt-709-compatibility/
Useful links:
http://www.curtpalme.com/forum/viewtopic.php?t=21870
http://www.theasc.com/magazine/april05/conundrum2/page6.html
http://forum.doom9.org/showthread.php?t=133982
http://avisynth.org/mediawiki/Color_conversions
http://mewiki.project357.com/wiki/X264_Settings#colormatrix

dukey
8th March 2012, 23:58
I am sure there was logic behind the decision to use BT.709 for HD material, but it just creates a total mess for end users, especially PC users.
Using 16-235 wasn't such an insane idea originally: it means you can actually increase the brightness of the image without losing detail. Some videos actually encode values higher than 235 or lower than 16, so you can shift the colours without destroying the image at either end. But in practice this almost never gets used, and/or encoders just clamp the values to the range, so the extra bits are essentially wasted.

703
20th March 2012, 05:57
You should do a test on Gamut. If SD is upscaled to HD, is gamut re-mapping done?

Petrakeas2
20th March 2012, 16:35
You should do a test on Gamut. If SD is upscaled to HD, is gamut re-mapping done?
I think I tested that in adobe media encoder:

"This means that if you import an HD video, downscale it to SD and export it, it will do the correct transformations for 709->601"

It also works the opposite way (SD to HD). Is that what you mean?

703
20th March 2012, 20:09
I think I tested that in adobe media encoder:

"This means that if you import an HD video, downscale it to SD and export it, it will do the correct transformations for 709->601"

It also works the opposite way (SD to HD). Is that what you mean?

I mean players, not encoders. For example, in a typical scenario, at best, and for practical reasons, you would have your display calibrated to only one standard, e.g. BT.709, not both SMPTE-C and BT.709, but you would still play back both SD and HD videos.

In this setup, all SD playback will be slightly off gamut. From what I know, only MadVR is smart enough to do gamut conversion between these color spaces when you tell it which standard your display is calibrated to.

I have no idea what other software or hardware players do the same thing as MadVR!

Petrakeas2
21st March 2012, 01:27
If I understand correctly, all players convert the YUV space to sRGB (which has color primaries close to BT.709). So your monitor should be calibrated for sRGB, not 601 or 709. The player should be able to tell whether the video is encoded using 601 or 709 and convert it to sRGB with the correct color matrix. The answer to which players do that is the test above.
Software players can't output YUV directly, so a 601->709 conversion can't be used (encoders, on the other hand, should do that). Instead, players should choose wisely between the two available options: 601->sRGB or 709->sRGB.
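To see how big the shift from a wrong-matrix decode actually is, you can de-matrix the same Y'CbCr sample with both sets of coefficients. This is a toy calculation with normalized 0-1 values, not any player's code path:

```python
def ycbcr_to_rgb(y, cb, cr, kr, kb):
    """Standard Y'CbCr -> R'G'B' de-matrixing for luma weights kr/kb
    (normalized 0-1 values, no range scaling)."""
    kg = 1.0 - kr - kb
    r = y + 2 * (1 - kr) * cr
    b = y + 2 * (1 - kb) * cb
    g = (y - kr * r - kb * b) / kg
    return (r, g, b)

pixel = (0.5, 0.2, -0.1)  # an arbitrary fairly saturated sample
rgb_601 = ycbcr_to_rgb(*pixel, kr=0.299, kb=0.114)    # BT.601 decode
rgb_709 = ycbcr_to_rgb(*pixel, kr=0.2126, kb=0.0722)  # BT.709 decode
shift = max(abs(a - b) for a, b in zip(rgb_601, rgb_709))
print(rgb_601, rgb_709, round(shift, 3))
```

The per-channel difference is small but visible on saturated blues and greens, which matches what the benchmark images show.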

703
21st March 2012, 10:47
If I understand correctly all players convert YUV space to sRGB (which has color primaries close to BT. 709).

Oh OK, that isn't my understanding. I always thought that the decoder just de-matrixes the signal for the renderer, the renderer converts it to RGB, and RGB is a device-dependent color space.

Petrakeas2
21st March 2012, 19:32
Usually that's what happens: the decoder passes the video in YUV form to the renderer and the renderer converts it to RGB. But in ffdshow, for example, the decoder itself can convert it to RGB (it's an option).
Usually the renderer (or ffdshow) assumes that your monitor is sRGB-calibrated. But if your monitor isn't sRGB (it's wide gamut, for example), then you need to do color management. As far as I know, MadVR supports color management. But if your monitor is a typical sRGB monitor you don't need it.

JanWillem32
22nd March 2012, 00:11
Both BT.601 and BT.709 have a configuration in R'G'B' and a matrixed form in Y'CbCr (which is typically used when encoding). The conversion from Y'CbCr to R'G'B' is mathematically defined by the standards. The R'G'B' gamuts of sRGB and BT.709 are the same. For BT.601, there are different gamuts for PAL/SECAM and NTSC. (These three still use the same conversion matrix, though.) A typical renderer doesn't do anything but apply the standard Y'CbCr to R'G'B' conversion math of either BT.601 or BT.709. Any further processing, such as gamut re-mapping and color management, costs additional processing time. So in essence, a typical renderer assumes your monitor output matches the encoding standard of the video file you're playing. That's actually not such a bad thing: without an ICC/ICM/other profile for color management, a renderer can't do input-to-output color mapping anyway.
Note that using the wrong matrix to convert to R'G'B' is what this thread is about. Not all software correctly reads the tags set in a video file to pick the BT.601 or BT.709 matrix, and some even fail to pick the standard by video resolution. In such cases the wrong matrix is used during color conversion, which can't possibly be corrected by any color management solution afterwards.
The notion of a "typical sRGB monitor" is flawed. I've never seen a monitor come even close to a 1:1 sRGB standard output. Plenty of decent displays can display colors beyond these (rather small) gamuts, and pretty much no monitor has a native gamma like that of the standards. (I'll leave out the reference viewing conditions of the standard for now, but those are a factor, too.)
For the renderer I'm working on, you can feed the color management a generic sRGB profile (and a few extra settings). It will then re-map all colors to the sRGB spectrum, but it's rather unlikely that that will look any better than the base R'G'B' picture.
For a renderer it's ideal to do processing in neither Y'CbCr nor R'G'B', by the way. Neither is mathematically uniform in gamut, gamma and handling of luminance, which is an obstacle for any filtering. (The linear RGB form I've implemented in the renderer is a decent intermediate for rendering.) Color adaptation to the display output should be the last filtering step of a rendering chain; the filters before it should work in a uniform color space. Note: we make a small exception to that for re-up-sampling down-sampled chroma (typically 4:2:0 and 4:2:2 formats), but chroma down- and re-up-sampling is a bit of a messy thing anyway. For both, the two chroma channels (Cb and Cr) are assumed to be linear in their space (although they're not).

703
22nd March 2012, 08:51
JanW - well explained.

What is the difference between using a 3×3 matrix vs a 3D LUT to do gamut re-mapping between the SMPTE-C and BT.709 color spaces?

Petrakeas2
23rd March 2012, 20:48
Jan, thanks for the in-depth explanation. If I understand correctly, if you have a decent sRGB monitor and play a BT.709 video (using the correct Y'CbCr->R'G'B' matrix), you'll get correct color reproduction because the sRGB and BT.709 gamuts are the same. But if you play a BT.601 video (using the correct Y'CbCr->R'G'B' matrix) on the same sRGB monitor, the colors will not be correct because of the gamut difference.
Right?

JanWillem32
24th March 2012, 00:29
@703: A 3×3 matrix is used to convert a pixel mathematically; this is useful when a standard dictates the conversion math. Example:

// (C) 2011 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// RGB to Y'CbCr for SD&HD video input for floating point surfaces
// This shader should not be run as a screen space pixel shader.
// This shader requires compiling with ps_2_0, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// This shader will change BT.709 [HD] or BT.601 [SD] derived RGB to Y'CbCr of an image.

sampler s0;
float2 c0;

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	float3 s1 = tex2D(s0, tex).rgb;// original pixel
	if(c0.x < 1120 && c0.y < 630) return float3(dot(float3(.299, .587, .114), s1), dot(float3(-.1495/.886, -.2935/.886, .5), s1), dot(float3(.5, -.2935/.701, -.057/.701), s1)).xyzz;// SD RGB to Y'CbCr output
	return float3(dot(float3(.2126, .7152, .0722), s1), dot(float3(-.1063/.9278, -.3576/.9278, .5), s1), dot(float3(.5, -.3576/.7874, -.0361/.7874), s1)).xyzz;// HD RGB to Y'CbCr output
}

Note that a 3×3 matrix is handled through a series of three vector dot products or sometimes a bit simpler math. The mathematical precision is the only limiting factor for the output quality.

3D LUT stands for a three-dimensional look-up table with the dimensions mapped to the R', G' and B' or Y', Cb and Cr components. For example, for the color management solution integrated in the renderer I'm working on, a high-quality 3D LUT is 256×256×256 in size, storing a D3DFMT_A16B16G16R16 pixel for each entry. All incoming R'G'B' pixels are decomposed to R', G' and B' to become left-to-right, top-to-bottom and near-to-far vectors. These vectors are used to sample 8 pixels (a 2×2×2 block) from the look-up table and perform trilinear interpolation with.
For example: the input pixel is RGB (.51, .61, .71). Conversion to an index in the 256×256×256 LUT is easy, just multiply by 256, resulting in (130.56, 156.16, 181.76). For interpolation, the 8 pixels from (130, 156, 181) to (131, 157, 182) are requested. Trilinear interpolation is applied to the resulting R'G'B'A pixels, using the left-over (.56, .16, .76) to weigh these pixels into one R'G'B'A output.
The result of that interpolation is the converted R'G'B' output value adapted for the display (the sampled Alpha channel is ignored). It's optionally dithered afterwards and output to the display.
This is a very expensive method to convert values. For conversion of video R'G'B' (a mostly mathematically defined color space) to the erratic display output R'G'B' values, this is pretty much the only way (there's no defined conversion math for this task). I would not use this method if the mathematical conversion between the two would be less expensive.
There are also a lot of points that contribute to imprecision with this method. The D3DFMT_A16B16G16R16 format used in the example may be of acceptable quality, but the standard calculation format is still D3DFMT_A32B32G32R32F (a standard vector of four single-precision floating-point numbers). The interpolation between 8 pixels required to get one output makes the output value only accurate by approximation. The math used to get the 256×256×256 values into the 3D LUT also contributes to the imprecision (as everything has to be pre-processed and rounded). During creation of a 3D LUT of this size, memory usage of half a GB by the player is typical. After creation, the 3D LUT takes 128 MB of video memory.
To compare: the above pixel shader compiled with "fxc /Tps_3_0 /Emain /O3 /Foout.txt in.txt" takes 572 bytes. That file only contains executable math and a rather large file header. (If we integrated this shader into the renderer, so that the SD/HD selection and video resolution are resolved prior to compiling, it would be even smaller. Note that the lookup for the color management section in the renderer I'm working on also uses a pixel shader for pre- and post-processing.)
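The trilinear lookup described above can be sketched on the CPU in a few lines. This is a toy dictionary-based LUT purely for illustration; the real thing is a 256×256×256 texture sampled with hardware filtering on the GPU:

```python
def trilinear_sample(lut, n, r, g, b):
    """Trilinearly interpolate a 3D LUT at a normalized R'G'B' color.
    lut is a dict keyed by integer (r, g, b) grid coordinates, each
    value an RGB triple; n is the table size per dimension.
    A toy CPU version of the GPU texture fetch described above."""
    def split(v):
        pos = v * (n - 1)
        lo = min(int(pos), n - 2)
        return lo, pos - lo  # cell index and interpolation weight
    (ri, rf), (gi, gf), (bi, bf) = split(r), split(g), split(b)
    out = [0.0, 0.0, 0.0]
    # weigh the 8 corners of the 2x2x2 block surrounding the sample
    for di, wi in ((0, 1 - rf), (1, rf)):
        for dj, wj in ((0, 1 - gf), (1, gf)):
            for dk, wk in ((0, 1 - bf), (1, bf)):
                px = lut[(ri + di, gi + dj, bi + dk)]
                w = wi * wj * wk
                for c in range(3):
                    out[c] += w * px[c]
    return out

# identity LUT: sampling it should give back the input color unchanged
n = 9
lut = {(i, j, k): (i / (n - 1), j / (n - 1), k / (n - 1))
       for i in range(n) for j in range(n) for k in range(n)}
print(trilinear_sample(lut, n, 0.51, 0.61, 0.71))
```

A real display-calibration LUT would of course hold measured, non-identity values; the interpolation itself is the same.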

@Petrakeas2: I'll invert the reasoning for this situation.
If you create a raw image and separately down-grade it for publication to the BT.709, BT.601 NTSC and BT.601 PAL/SECAM formats, the outputs will be different by definition. For this example, we assume a single raw image whose colors the gamuts of each can completely contain, and a video renderer that correctly uses the right one of the 3 standards to interpret the input colors. The encoding in the file will be Y'CbCr. The video renderer converts the Y'CbCr to R'G'B' using either the BT.709 or BT.601 matrix. After that step, the video renderer 'knows' the color space of the working format.
Mapping that input to the display output using display calibration data (through a 3D LUT and some processing in the video renderer) should re-map the said BT.709, BT.601 NTSC and BT.601 PAL/SECAM formatted images to visually the same thing as the raw source image.
Without data on the output device calibration and user viewing conditions, it would be a total waste to let the video renderer do expensive color re-mapping, so a typical renderer will just output what the Y'CbCr to R'G'B' matrix produces. The differently encoded images will look different from one another when output by the renderer. Conversely, if the display device and viewing conditions meet the BT.709 standard, the displayed image will look like your raw source image. This is highly improbable to begin with, and the video renderer itself doesn't 'know' that it happens to match the standard's criteria in such a case either.

703
26th March 2012, 11:44
Thanks. For gamut mapping, do you use absolute colorimetric or relative colorimetric intent? Any idea what others use as well, such as yCMS?

JanWillem32
26th March 2012, 20:30
Absolute colorimetric and relative colorimetric modes are both fine to use; the key difference is only the white point. Relative colorimetric mode shifts the working color space's white point to the display's white point; absolute colorimetric mode doesn't. With absolute colorimetric mode the 100% R, 100% G, 100% B output color isn't used at all. For example, the white point of the monitor may be 97% R, 99% G and 100% B to match the reference white of the working color space.

I'll leave out saturation intent for now; it's not suitable for anything but business graphs. Perceptual intent distorts the color space too much for my taste. When working with large color spaces (such as XYZ and xyY), it's even completely unsuitable (it would compress everything to gray on display).
For yCMS, a very similar working-color-space R'G'B' to display R'G'B' 3D LUT is used. I don't know if the two are actually compatible (the main size and pixel formats of the systems do match). Look-up tables in hardware vary in quality. Those used in cinemas are probably larger and directly connected to the display hardware. For most other hardware color look-up tables, such a processor and memory block would be very expensive, so such quality is rather unlikely.

chris319
15th May 2017, 02:18
I dragged an mp4 video inside the browser (flash was not used). Tags are ignored. It always uses BT.601.


I have observed this behavior in all of the Windows browsers except Edge and IE. They default to bt.601 no matter what, so videos encoded with bt.709 display the wrong colors even if the video is HD (720p or 1080i) and even if it is explicitly tagged as bt.709.

Given that sRGB is the standard for computer video and it calls for bt.709 coefficients, it is curious that these browsers default to 601 -- all of them -- Firefox, SeaMonkey, Chrome, Chromium, Opera. VLC defaults to 709 and you can open YouTube videos with it.

I have contacted several of the Firefox programmers. One replies to emails but the others simply ignore them.

It is a simple matter to change the default color space. In Windows you use DXVA and in Linux it is VDPAU.

I have created several test patterns here: https://www.youtube.com/channel/UCp341qNSckbp4YOz88BTfsw

You can use an eyedropper program to check the colors.

"tag test pattern709" is explicitly tagged as 709 yet it still gets played back with 601 coefficients. Clearly the browsers ignore these tags.

Complete name : tag test_pattern709.mp4

Color primaries : BT.709
Transfer characteristics : BT.709
Matrix coefficients : BT.709

Here is the ffmpeg command line used to flag the videos as 709. Note that I am encoding an RGB bmp image.

ffmpeg -y -loop 1 -t 10 -i test_pattern709.bmp -vf scale=out_color_matrix=bt709 -color_primaries bt709 -color_trc bt709 -colorspace bt709 test_pattern709.mp4

chris319
21st July 2017, 03:22
If I understand correctly, if u have a decent sRGB monitor and play a BT.709 video (using the correct YCbCr->R'G'B' matrix), you'll get correct color reproduction because sRGB and BT.709 gamut are the same. But if you play a BT.601 video (using the correct YCbCr->R'G'B' matrix) in the same sRGB monitor, the color will not be correct because of the gamut difference.
Right?

Yes, you have that correct.

Put the concept of "my monitor" out of your mind. We're dealing with the RGB data which is ultimately delivered to the viewer's monitor. It is then up to the user to make sure his monitor is properly calibrated.

This is something every broadcaster has known for decades. You have videos on YouTube, Vimeo or wherever, being viewed by a potential audience of millions. As producers (as opposed to consumers) of these videos we have NO CONTROL over how each of these potentially millions of viewers has adjusted, or not adjusted, his monitor. All we can do is set a standard and hope the viewer's device complies with that standard.

NTSC was the original color video standard developed by RCA in the 1940's and 1950's, before PAL and SECAM, using analog vacuum-tube technology which was the state of the art at the time. For what it was it worked remarkably well until broadcasters in the U.S. went digital in 2009. The reason it worked so well is because there was ONE STANDARD which everyone was expected to adhere to, particularly manufacturers of broadcasting and receiving equipment. You didn't have manufacturers adopting their own proprietary standards which the masses were expected to adapt, or not adapt, to.

Where NTSC fell short was that the masses didn't have a viable way of adjusting their receivers. In 1978 CBS labs developed a color bar signal which won a technical Emmy award. It was widely used to calibrate studio monitors. The user could adjust hue and saturation by disabling the red and green channels and displaying only the blue channel. Many monitors had a button which would show a blue-only display. The problem was that it required bandwidth to transmit the signal and broadcasters would rather broadcast programming with that bandwidth than a test signal.

Much of the technology in today's digital video has its antecedents in analog NTSC video developed in the 1940's and 1950's, such as the concept of a luminance channel and two color-difference signals, with the color-difference signals having a lower resolution/bandwidth than the luminance signal which carries the detail. Gamma correction was used to compensate for the transfer characteristics of CRT phosphors and was set at 2.2. I've read a lot of cruft about gamma correction but that is the historical reason it exists.

Today in 2017 we have one standard for web video: sRGB. As noted above, it uses the same coefficients as bt.709 and 2.2 gamma. As also noted above, not all encoders flag their color space; in fact, many videos aren't flagged at all, and even if they were, the players/decoders don't necessarily read those flags and do the right thing with them. The best we can do as producers is to make our videos sRGB compliant.

My monitor calibration program with its USB sensor is one of the best investments I have ever made. The main parameters I use are 6500 K color temperature and 2.2 gamma. Many of these programs have a preset for sRGB.

https://en.wikipedia.org/wiki/SRGB

https://en.wikipedia.org/wiki/SMPTE_color_bars