FFDShow with H.264 10bit support

patrick_ · 14th July 2011, 17:11

I searched the web, but had a hard time finding a FFDShow build that supports H.264 10-bit. Finally I found one included in CCCP. I hate codec packs so I decided to replace the files of the default FFDShow Installer with the ones from CCCP. I've only tested it with a single 10-bit file using MPC-HC*, but it worked

Download FFDShow rev 3925 with H.264 10-bit support (BETA)
http://www.filesonic.com/file/144456...0711-10bit.exe

*if you use MPC-HC, don't forget to deactivate the internal filters.

Hypernova · 14th July 2011, 18:44

Thanks. I was looking for that also. I don't mind CCCP in comparison to bigger codec pack like k-lite, but still this is nice.

clsid · 14th July 2011, 19:24

Those builds are alpha quality and are not stable. They should be used with caution. For example 9-bit H.264 will just crash.

Btw, K-Lite has small variants as well. Its latest beta version even has Blu-ray playback capability.

JanWillem32 · 14th July 2011, 19:58

I know for a fact that quite a few people are working on this. I absolutely can't recommend usage of the 10-bit system yet with MPC-HC, until:
1.- A real ffdshow beta comes out that can output at least one of the recommended 10- or 16-bit formats for DirectShow to the player: http://msdn.microsoft.com/en-us/libr...=VS.85%29.aspx .
(Currently all output is rounded to 8-bit formats, so there's no gain by the 10-bit precision yet.)
2.- At least one of the internal renderers has been updated to become able to take in and mix the 10- or 16-bit formats.
3.- At least some quality and reliability testing has been done.

Dark Shikari · 14th July 2011, 20:15

Quote:

Originally Posted by JanWillem32

I know for a fact that quite a few people are working on this. I absolutely can't recommend usage of the 10-bit system yet with MPC-HC, until:
1.- A real ffdshow beta comes out that can output at least one of the recommended 10- or 16-bit formats for DirectShow to the player: http://msdn.microsoft.com/en-us/libr...=VS.85%29.aspx .
(Currently all output is rounded to 8-bit formats, so there's no gain by the 10-bit precision yet.)

This is completely false. 10-bit display support isn't merely unnecessary; it's beyond useless.

patrick_ · 14th July 2011, 20:47

JanWillem32, using 10bits in compression doesn't have anything to do with 10bit output. If it worked that way, you would need a 10bit display as well.

clsid, I updated the installer primary to allow x264 users to test/compare 8bit and 10bit outputs.

JanWillem32 · 14th July 2011, 20:51

10-bit display support is something entirely different than 10-bit input to the mixer.
A 10-bit bt.601 or bt.709 encode has a limited ranges [64, 940] [64, 960], [64, 960] Y'CbCr encoding, usually with chroma sub-sampling.
A 10-bit display has a full range [0, 1023] RGB display matrix.

In the link I gave, there's a list with the recommended replacement formats for the regular 8-bit YV12, I420/IYUV, NV12 and AYUV formats that do support more than 8-bit to feed to the mixer.

There's a lot going on in between to get the input format mixed and rendered to the output display. Even if the output of the display is just 8-bit RGB, there's no doubt that quality during mixing and rendering suffers if the input Y'CbCr format is rounded from 10- to 8-bit before even the mixer and renderer can receive the image.

By the way, there's plenty of scientific basis to why the Digital Cinema Initiative set 12-bit as a minimum requirement for both the encoding format and display capability for licensing.
There's also plenty of reasons why the studio formats store XYZ color data in a 32-bit floating-point format.

Dark Shikari · 14th July 2011, 21:01

Quote:

Originally Posted by JanWillem32

10-bit display support is something entirely different than 10-bit input to the mixer.
A 10-bit bt.601 or bt.709 encode has a limited ranges [64, 940] [64, 960], [64, 960] Y'CbCr encoding, usually with chroma sub-sampling.
A 10-bit display has a full range [0, 1023] RGB display matrix.

In the link I gave, there's a list with the recommended replacement formats for the regular 8-bit YV12, I420/IYUV, NV12 and AYUV formats that do support more than 8-bit to feed to the mixer.

There's a lot going on in between to get the input format mixed and rendered to the output display. Even if the output of the display is just 8-bit RGB, there's no doubt that quality during mixing and rendering suffers if the input Y'CbCr format is rounded from 10- to 8-bit before even the mixer and renderer can receive the image.

It isn't rounded, it's dithered. There's a slight, but critical difference.

The effect of your display on the effectiveness of 10-bit is negligible. A 6-bit $50 LCD benefits from 10-bit just as much as the world's most expensive IPS monitor because 10-bit is about internal codec precision, not output precision.

JanWillem32 · 14th July 2011, 23:33

How awful to use the dithering compromise right at the start of a rendering instance. I'm having enough trouble with performing convolutions and other filter passes on the inputs full of synthetic noise as it is.
I guess the dithering is less than a structured 128×128 dithering map lookup for every channel, too?
If it's about internal codec precision, please make the decoder 32-bit floating point or better on output, so I don't have to static cast every element of the mixer input to 32-bit floating-point anymore. I can handle pretty much any input quantization, only the double precision for the color management section can be a bit intense to process.
What comes out of the renderer takes at least 7 conversion passes from the decoder's output to the back buffer of the allocator-presenter. What goes on screen can hardly be called raw from what the decoder outputs. Depending on settings, things can look very bad. For instance, a low gamma setting of 2.0 to 2.2 is murder on darker scenes with most consumer-grade video.

LoRd_MuldeR · 14th July 2011, 23:55

Quote:

Originally Posted by JanWillem32

If it's about internal codec precision, please make the decoder 32-bit floating point or better on output, so I don't have to static cast every element of the mixer input to 32-bit floating-point anymore. I can handle pretty much any input quantization, only the double precision for the color management section can be a bit intense to process.

As far as I know, h.264 (and probably most video formats) internally use integer math only. So floating point output doesn't make much sense. You can as well do the conversion yourself, if you need FP math for your post-processing. Also: Even if the original input source only used 8-Bit precision (per color channel) and the final output is going to be 8-Bit again, using 10-bit (12-bit) internal codec precision improves compression efficiency. Whether the decoder outputs 8-Bit or 10-Bit (12-Bit) is up to the decoder's choice, i.e. we don't necessarily need "true" 10-bit (12-bit) output to benefit from "high bit-depth" h.264 video...

Dark Shikari · 15th July 2011, 00:28

Quote:

Originally Posted by JanWillem32

If it's about internal codec precision, please make the decoder 32-bit floating point or better on output, so I don't have to static cast every element of the mixer input to 32-bit floating-point anymore.

a) The H.264 spec only supports up to 14-bit.
b) Floating point math is incredibly slow relative to integer math. A decoder written in 32-bit float would probably not be able to decode 1080p on an overclocked 6-core Core i7.

JanWillem32 · 15th July 2011, 03:10

When looking at the specifications, I really don't see any reason to round any output to integer, and most certainly not 8-bit. The decoded structures are only guaranteed to be 8-bit on the 8-bit lossless profile. All lossy modes will generate in-between values, even on encoding integer input.
I've had absolutely no issues with floating-point performance, as long as you don't rely on the FPU to do the work. A nice example on how to use packed SSE (readable for non-programmers, too): http://software.intel.com/en-us/blog...-acceleration/ . It's also convenient to blend in GPU power when heavy floating-point operations are wanted, but the programming for that is very specific, and the GPU has only very little integer math power.

A bit of information why consumer formats are frustrating for studio workers:
You start off with a big, raw camera image format.
You take the camera color calibration scheme and project the input images to a nice studio format with the full XYZ color palette and plenty of quantization.
You edit in that studio format, towards the cinema format 2048×1080 or with less height on 1:2.40 movies. (The 4k profile of 4096×2160 is still very rare.)
You make the cinema screener: the XYZ color space remains intact in the encode, only when encoding the JPEG2000 video, colors are quantized to 12- or 14-bit and the gamma is set to 2.6. The encoding profile is lossless up to 250 Mbit/s. There's no visible change from the studio master to the encode at all. Features are generally some 300 GB in size.
On approval, the final version is encoded for distribution to cinemas with the same type of encoding.
When it's time to do the blu-ray and DVD release, things get nasty.
The image is clipped to 1920 wide for blu-ray, and usually the same amount for DVD, too.
The XYZ color space is converted to the HD, PAL and NTSC color spaces, losing about 2/3 of all possible colors in the space.
The input is limited on maximum lightness (a lowpass filter), because of limitations in the HDTV and SDTV standards.
The gamma is typically set to 2.4.
With DVD, the image is scaled down.
Chroma (2 color channels, relative to grayscale) is sub-sampled to half-resolution in height and width (4:2:0).
On the encoding step, the images are heavily dithered to mask the rounding to 8-bit+limited ranges.

There are plenty of "magic" filters in use by studios when doing the encodes, but you simply can't overcome the limitations set by the encoded format. Studio or cinema video and consumer video don't look nearly the same.
If there are no consumer products that can actually decode anything better than 8-bit, 4:2:0 Y'CbCr, studios hardly have a reason to use anything better. If the rare decoder found that can decode more than 8-bit, but rounds/dithers it again afterwards, studios don't have a reason either. They can take care of a much better set of filters for dithering to 8-bit themselves.
Considering the relative ages of the JPEG2000 and h.264 codecs, I do believe better results can be achieved with the 50 or 100 GB of space on a blu-ray.

On the other side: what I'm working on now. The mixer and renderer have to deal with images that have starvation in input precision, by the quantization level, by the image quality loss trough lossy encoding, by horizontal/vertical resolution and by the limitations in color space.
When writing filters for mixing and rendering stages, I've seen poor quality results with simple filters on even a lossless 8-bit RGB input from a BMP file. (The still image filter in Windows supports it.)
Quantization is one of the things that can be become better than what it is now. The recommended list of formats to transport 10- or 16-bit has been around for a while. Using those formats to for at least the 9-bit and better images is only logical.

Dark Shikari · 15th July 2011, 03:22

Quote:

Originally Posted by JanWillem32

When looking at the specifications, I really don't see any reason to round any output to integer, and most certainly not 8-bit. The decoded structures are only guaranteed to be 8-bit on the 8-bit lossless profile. All lossy modes will generate in-between values, even on encoding integer input.

Incorrect. The specification requires bit-exact decoding or decoder/encoder desync will occur, resulting in artifacts.

Quote:

Originally Posted by JanWillem32

I've had absolutely no issues with floating-point performance, as long as you don't rely on the FPU to do the work.

SSE is part of the FPU. It shares the same execution units.

Quote:

Originally Posted by JanWillem32

A nice example on how to use packed SSE (readable for non-programmers, too): http://software.intel.com/en-us/blog...-acceleration/ . It's also convenient to blend in GPU power when heavy floating-point operations are wanted, but the programming for that is very specific, and the GPU has only very little integer math power.

I've written thousands of lines of assembly code. I think I know how SIMD works.

Floating point addition in SSE is typically 3/1 (latency/invthroughput). By comparison, integer addition is typically 1/0.5, and with 16-bit integers you get twice as many per register (and four times as many with 8-bit). In the end this means the typical throughput from integer math is 4-8 times higher than floating point.

This isn't even considering the fact that integer math allows all sorts of useful shortcuts, like shifting instead of multiplication, bitmasking, and other performance tricks which are often impossible with floating point math.

JanWillem32 · 15th July 2011, 04:38

Quote:

Originally Posted by Dark Shikari

Incorrect. The specification requires bit-exact decoding or decoder/encoder desync will occur, resulting in artifacts.

Okay, so it's a "cleanup" cycle in decoding. Too bad, I had hoped for sine-wave-like transformations and convolution like with lossy audio that does benefit from floating-points.

Quote:

Originally Posted by Dark Shikari

SSE is part of the FPU. It shares the same execution units.

I mostly meant relying on the classical 1-level approach, instead of 2 or more SIMD instructions at a time.

Quote:

Originally Posted by Dark Shikari

This isn't even considering the fact that integer math allows all sorts of useful shortcuts, like shifting instead of multiplication, bitmasking, and other performance tricks which are often impossible with floating point math.

Bitmasking and such is indeed tricky on floating-points. I've done some on GPU work, and I can say that the bit field in floating-point mode is too variable to pull it off correctly usually. Only sign and some exponent tricks work well bit-wise.
The fact that I need to rely more and more on doubles and some floats for various things, does decrease my usage of bitmasking and such. The SSE example shows how to get at least some decent performance on doubles and floats on a CPU. (And give at least some impression to what it is for the regular people here.)

Dark Shikari · 15th July 2011, 04:42

Quote:

Originally Posted by JanWillem32

Okay, so it's a "cleanup" cycle in decoding. Too bad, I had hoped for sine-wave-like transformations and convolution like with lossy audio that does benefit from floating-points.

That's because audio codecs are, in terms of prediction, stuck in the 1980s: they still generally have no inter prediction. CELT and AAC Main (LTP) are the only audio codecs with inter prediction, and both do require nearly-bit-exact decoding as a result. IIRC, Long Term Prediction actually requires an implementation of 16-bit floats in order to work correctly.

JanWillem32 · 15th July 2011, 12:11

Interesting material. It seems that many lossless image formats also use a strict step rounding mechanism of the lossy internal structure, just before error correction to lossless.
I know the half float format very well. We currently use the D3DXFloat32To16Array function to write out to D3DFMT_A16B16G16R16F (can also carry non-ABGR data). It's not very fast, so I'd love to replace that one with something better. The structure it outputs is an unsigned __int16*, so the CPU can't really do anything useful with it, either. Once a D3DFMT_A16B16G16R16F texture is created, it goes straight into the GPU with a DMA transfer. Most GPU's nowadays don't have a half float calculation mode (anymore). It's just a supported format to save memory bandwidth. With every texture transfer, all vertices and subpixels are converted to 32-bit float for calculation. Other types are allowed in DirectX 10 for some calculation stages, DirectX 11 allows even doubles on the GPU. The performance is low with anything else but 32-bit float. That's one of the reasons why DXVA parts are separate on the silicon of a GPU.

Anyway, a bit more on topic. I very much hope that this advancement to leads a proper ffdshow beta soon. I'm more than willing to help writing and testing code for a video mixer part, so it's able to receive 10-bit or zero-padded 16-bit Y'CbCr formats. I'm looking at this from the perspective of the regular consumer and professional, too. Once a complete system has been proven to be able to maintain more measurable significant quality than what there was before, it becomes advertisable to the public and professionals in general.

TheRyuu · 15th July 2011, 13:50

Quote:

Originally Posted by clsid

For example 9-bit H.264 will just crash.

That's not going to get fixed until tryouts gets unborked.

No one uses 9bit h264 so I don't really think it matters that much.

clsid · 15th July 2011, 14:32

Quote:

Originally Posted by TheRyuu

That's not going to get fixed until tryouts gets unborked.

No one uses 9bit h264 so I don't really think it matters that much.

That part of the code is a huge mess and the people who wrote it are not around anymore. So any help is welcome. I have already done some major cleanup in Swscale, so that the difference with vanilla FFmpeg is now minimal.

As long as x264 supports 9-bit, such files will be made. If ffdshow can not be fixed to support it, then it should reject decoding such files. Crashing is unacceptable.

aegisofrime · 20th July 2011, 05:57

Quote:

Originally Posted by patrick_

I searched the web, but had a hard time finding a FFDShow build that supports H.264 10-bit. Finally I found one included in CCCP. I hate codec packs so I decided to replace the files of the default FFDShow Installer with the ones from CCCP. I've only tested it with a single 10-bit file using MPC-HC*, but it worked

Download FFDShow rev 3925 with H.264 10-bit support (BETA)
http://www.filesonic.com/file/144456...0711-10bit.exe

*if you use MPC-HC, don't forget to deactivate the internal filters.

Thanks so much for this, plays my 10-bit x264 encodes perfectly

BTW, is there a link or repository where I can keep up with future revisions?

SamKook · 20th July 2011, 14:51

Quote:

Originally Posted by aegisofrime

BTW, is there a link or repository where I can keep up with future revisions?

You can get beta for ffdshow anywhere, just do a search on google.

Or try here: http://www.afterdawn.com/software/au...m#all_versions

Not sure if they all have 10-bit support enabled though, I don't have anything to test them with.

14th July 2011, 17:11	#1 \| Link
patrick_ Registered User Join Date: Jan 2007 Posts: 85	FFDShow with H.264 10bit support I searched the web, but had a hard time finding a FFDShow build that supports H.264 10-bit. Finally I found one included in CCCP. I hate codec packs so I decided to replace the files of the default FFDShow Installer with the ones from CCCP. I've only tested it with a single 10-bit file using MPC-HC, but it worked Download FFDShow rev 3925 with H.264 10-bit support (BETA) http://www.filesonic.com/file/144456...0711-10bit.exe if you use MPC-HC, don't forget to deactivate the internal filters.

14th July 2011, 18:44	#2 \| Link
Hypernova Registered User Join Date: Feb 2006 Posts: 293	Thanks. I was looking for that also. I don't mind CCCP in comparison to bigger codec pack like k-lite, but still this is nice. __________________ Spec: Intel Core i5-3570K, 8g ram, Intel HD4000, Samsung U28D590 4k monitor+1080p Projector, Windows 10.

14th July 2011, 19:24	#3 \| Link
clsid ***** Join Date: Feb 2005 Posts: 5,647	Those builds are alpha quality and are not stable. They should be used with caution. For example 9-bit H.264 will just crash. Btw, K-Lite has small variants as well. Its latest beta version even has Blu-ray playback capability. __________________ MPC-HC 2.2.1

14th July 2011, 19:58	#4 \| Link
JanWillem32 Registered User Join Date: Oct 2010 Location: The Netherlands Posts: 1,083	I know for a fact that quite a few people are working on this. I absolutely can't recommend usage of the 10-bit system yet with MPC-HC, until: 1.- A real ffdshow beta comes out that can output at least one of the recommended 10- or 16-bit formats for DirectShow to the player: http://msdn.microsoft.com/en-us/libr...=VS.85%29.aspx . (Currently all output is rounded to 8-bit formats, so there's no gain by the 10-bit precision yet.) 2.- At least one of the internal renderers has been updated to become able to take in and mix the 10- or 16-bit formats. 3.- At least some quality and reliability testing has been done. __________________ development folder, containing MPC-HC experimental tester builds, pixel shaders and more: http://www.mediafire.com/?xwsoo403c53hv

14th July 2011, 20:51	#7 \| Link
JanWillem32 Registered User Join Date: Oct 2010 Location: The Netherlands Posts: 1,083	10-bit display support is something entirely different than 10-bit input to the mixer. A 10-bit bt.601 or bt.709 encode has a limited ranges [64, 940] [64, 960], [64, 960] Y'CbCr encoding, usually with chroma sub-sampling. A 10-bit display has a full range [0, 1023] RGB display matrix. In the link I gave, there's a list with the recommended replacement formats for the regular 8-bit YV12, I420/IYUV, NV12 and AYUV formats that do support more than 8-bit to feed to the mixer. There's a lot going on in between to get the input format mixed and rendered to the output display. Even if the output of the display is just 8-bit RGB, there's no doubt that quality during mixing and rendering suffers if the input Y'CbCr format is rounded from 10- to 8-bit before even the mixer and renderer can receive the image. By the way, there's plenty of scientific basis to why the Digital Cinema Initiative set 12-bit as a minimum requirement for both the encoding format and display capability for licensing. There's also plenty of reasons why the studio formats store XYZ color data in a 32-bit floating-point format. __________________ development folder, containing MPC-HC experimental tester builds, pixel shaders and more: http://www.mediafire.com/?xwsoo403c53hv Last edited by JanWillem32; 14th July 2011 at 20:54.

14th July 2011, 20:47	#6 \| Link
patrick_ Registered User Join Date: Jan 2007 Posts: 85	JanWillem32, using 10bits in compression doesn't have anything to do with 10bit output. If it worked that way, you would need a 10bit display as well. clsid, I updated the installer primary to allow x264 users to test/compare 8bit and 10bit outputs.

14th July 2011, 23:33	#9 \| Link
JanWillem32 Registered User Join Date: Oct 2010 Location: The Netherlands Posts: 1,083	How awful to use the dithering compromise right at the start of a rendering instance. I'm having enough trouble with performing convolutions and other filter passes on the inputs full of synthetic noise as it is. I guess the dithering is less than a structured 128×128 dithering map lookup for every channel, too? If it's about internal codec precision, please make the decoder 32-bit floating point or better on output, so I don't have to static cast every element of the mixer input to 32-bit floating-point anymore. I can handle pretty much any input quantization, only the double precision for the color management section can be a bit intense to process. What comes out of the renderer takes at least 7 conversion passes from the decoder's output to the back buffer of the allocator-presenter. What goes on screen can hardly be called raw from what the decoder outputs. Depending on settings, things can look very bad. For instance, a low gamma setting of 2.0 to 2.2 is murder on darker scenes with most consumer-grade video. __________________ development folder, containing MPC-HC experimental tester builds, pixel shaders and more: http://www.mediafire.com/?xwsoo403c53hv

15th July 2011, 03:10	#12 \| Link
JanWillem32 Registered User Join Date: Oct 2010 Location: The Netherlands Posts: 1,083	When looking at the specifications, I really don't see any reason to round any output to integer, and most certainly not 8-bit. The decoded structures are only guaranteed to be 8-bit on the 8-bit lossless profile. All lossy modes will generate in-between values, even on encoding integer input. I've had absolutely no issues with floating-point performance, as long as you don't rely on the FPU to do the work. A nice example on how to use packed SSE (readable for non-programmers, too): http://software.intel.com/en-us/blog...-acceleration/ . It's also convenient to blend in GPU power when heavy floating-point operations are wanted, but the programming for that is very specific, and the GPU has only very little integer math power. A bit of information why consumer formats are frustrating for studio workers: You start off with a big, raw camera image format. You take the camera color calibration scheme and project the input images to a nice studio format with the full XYZ color palette and plenty of quantization. You edit in that studio format, towards the cinema format 2048×1080 or with less height on 1:2.40 movies. (The 4k profile of 4096×2160 is still very rare.) You make the cinema screener: the XYZ color space remains intact in the encode, only when encoding the JPEG2000 video, colors are quantized to 12- or 14-bit and the gamma is set to 2.6. The encoding profile is lossless up to 250 Mbit/s. There's no visible change from the studio master to the encode at all. Features are generally some 300 GB in size. On approval, the final version is encoded for distribution to cinemas with the same type of encoding. When it's time to do the blu-ray and DVD release, things get nasty. The image is clipped to 1920 wide for blu-ray, and usually the same amount for DVD, too. The XYZ color space is converted to the HD, PAL and NTSC color spaces, losing about 2/3 of all possible colors in the space. The input is limited on maximum lightness (a lowpass filter), because of limitations in the HDTV and SDTV standards. The gamma is typically set to 2.4. With DVD, the image is scaled down. Chroma (2 color channels, relative to grayscale) is sub-sampled to half-resolution in height and width (4:2:0). On the encoding step, the images are heavily dithered to mask the rounding to 8-bit+limited ranges. There are plenty of "magic" filters in use by studios when doing the encodes, but you simply can't overcome the limitations set by the encoded format. Studio or cinema video and consumer video don't look nearly the same. If there are no consumer products that can actually decode anything better than 8-bit, 4:2:0 Y'CbCr, studios hardly have a reason to use anything better. If the rare decoder found that can decode more than 8-bit, but rounds/dithers it again afterwards, studios don't have a reason either. They can take care of a much better set of filters for dithering to 8-bit themselves. Considering the relative ages of the JPEG2000 and h.264 codecs, I do believe better results can be achieved with the 50 or 100 GB of space on a blu-ray. On the other side: what I'm working on now. The mixer and renderer have to deal with images that have starvation in input precision, by the quantization level, by the image quality loss trough lossy encoding, by horizontal/vertical resolution and by the limitations in color space. When writing filters for mixing and rendering stages, I've seen poor quality results with simple filters on even a lossless 8-bit RGB input from a BMP file. (The still image filter in Windows supports it.) Quantization is one of the things that can be become better than what it is now. The recommended list of formats to transport 10- or 16-bit has been around for a while. Using those formats to for at least the 9-bit and better images is only logical. __________________ development folder, containing MPC-HC experimental tester builds, pixel shaders and more: http://www.mediafire.com/?xwsoo403c53hv Last edited by JanWillem32; 15th July 2011 at 03:13. Reason: typo

15th July 2011, 12:11	#16 \| Link
JanWillem32 Registered User Join Date: Oct 2010 Location: The Netherlands Posts: 1,083	Interesting material. It seems that many lossless image formats also use a strict step rounding mechanism of the lossy internal structure, just before error correction to lossless. I know the half float format very well. We currently use the D3DXFloat32To16Array function to write out to D3DFMT_A16B16G16R16F (can also carry non-ABGR data). It's not very fast, so I'd love to replace that one with something better. The structure it outputs is an unsigned __int16*, so the CPU can't really do anything useful with it, either. Once a D3DFMT_A16B16G16R16F texture is created, it goes straight into the GPU with a DMA transfer. Most GPU's nowadays don't have a half float calculation mode (anymore). It's just a supported format to save memory bandwidth. With every texture transfer, all vertices and subpixels are converted to 32-bit float for calculation. Other types are allowed in DirectX 10 for some calculation stages, DirectX 11 allows even doubles on the GPU. The performance is low with anything else but 32-bit float. That's one of the reasons why DXVA parts are separate on the silicon of a GPU. Anyway, a bit more on topic. I very much hope that this advancement to leads a proper ffdshow beta soon. I'm more than willing to help writing and testing code for a video mixer part, so it's able to receive 10-bit or zero-padded 16-bit Y'CbCr formats. I'm looking at this from the perspective of the regular consumer and professional, too. Once a complete system has been proven to be able to maintain more measurable significant quality than what there was before, it becomes advertisable to the public and professionals in general. __________________ development folder, containing MPC-HC experimental tester builds, pixel shaders and more: http://www.mediafire.com/?xwsoo403c53hv