MPC-HC tester builds for internal renderer fixes - Page 36

Mierastor · 17th November 2011, 17:18

ffdshow tryouts has Lanczos as one of its resize options, so maybe there is some useful code there.

Keiyakusha · 17th November 2011, 21:02

Lanczos is a shortcut for bicubic with taps=3, or something like that; I don't really remember.
Lanczos4 is a bicubic too, but with taps=4.

Spline, on the other hand, is somewhat different from Lanczos, and "many people think it gives the best result".

Mierastor · 17th November 2011, 21:04

Quote:

Originally Posted by Keiyakusha

Lanczos is a shortcut for bicubic with taps=3, or something like that; I don't really remember.
Lanczos4 is a bicubic too, but with taps=4.

http://en.wikipedia.org/wiki/Lanczos_resampling

Keiyakusha · 17th November 2011, 21:09

Quote:

Originally Posted by Mierastor

http://en.wikipedia.org/wiki/Lanczos_resampling

Lanczos is a "customized" bicubic, customized by a guy named Cornelius Lanczos. What don't you understand?

Mierastor · 17th November 2011, 21:17

Quote:

Originally Posted by Keiyakusha

Lanczos is a "customized" bicubic, customized by a guy named Cornelius Lanczos. What don't you understand?

Both are convolution filters. The results can be quite different.

Just Google it to find numerous discussions of the advantages and disadvantages of Lanczos vs. bicubic.

Keiyakusha · 17th November 2011, 21:19

Quote:

Originally Posted by Mierastor

Both are convolution filters. The results can be quite different.

Just Google it to find numerous discussions of the advantages and disadvantages of Lanczos vs. bicubic.

I'm not saying they are the same. Obviously they are not. I'm saying bicubic can be tweaked to make Lanczos out of it. Something like that.

JanWillem32 · 17th November 2011, 21:28

The bicubic filters really don't look anything like the windowed sinc filters. I know the base filter, but I very much prefer a processed convolution using the weight functions.
Base Lanczos sample weight filter:

Code:

#include <math.h>

#define PI acos(-1.0)// this will generate the number PI with full precision, it is useful for example with the sin, cos and tan functions

double LanczosFilter(double x, double radius)
{
	if(x == 0) return 1;
	x = fabs(x);// the kernel is symmetric, so only the distance |x| matters
	if(x >= radius) return 0;// cutoff beyond 2 for Lanczos2, 3 for Lanczos3; radius is normally a static const
	return sin(x*PI)*sin(x*PI/radius)/(x*x*PI*PI/radius);// sinc(x)*sinc(x/radius)
}

All that's left is to fill in the variables and write out the convolution function with 4 weights for Lanczos2, 6 for Lanczos3, etc. A very tedious task.

Mierastor · 17th November 2011, 21:31

Quote:

Originally Posted by Keiyakusha

I'm not saying they are the same. Obviously they are not. I'm saying bicubic can be tweaked to make Lanczos out of it. Something like that.

That cannot be done in, for example, ffdshow, which includes a lot of customization options.

JanWillem32 · 17th November 2011, 23:14

Lanczos2, probably correct in function, but not optimized:

Code:

// (C) 2011 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// prototype Lanczos2 height resizer
// This shader should be run as a screen space pixel shader if you are up-scaling.
// This shader should not be run as a screen space pixel shader if you are down-scaling.
// This shader requires compiling with ps_2_0, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// If possible, avoid compiling with the software emulation modes (ps_?_sw). Pixel shaders require a lot of processing power to run in real-time software mode.
// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
// Use this shader to scale the height of an image by Lanczos2 interpolation.

// fractions, either decimal or not, are allowed
// set the magnification factor
#define Magnify (4/3.)

sampler s0;
float2 c0;
float2 c1;
#define sp(a, b) float4 a = tex2D(s0, float2(tex.x, coord+b*c1.y));
#define PI acos(-1)// this will generate the number PI with full precision

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	float coord = (tex.y/Magnify+.5-.5/Magnify)*c0.y;// assign the output position, normalized to texture height in pixels
	float t = frac(coord);// calculate the difference between the output pixel and the original surrounding two pixels
	coord = (coord-t+.5)*c1.y;// adjust sampling matrix to put the output pixel in the interval [Q1, Q2)

	sp(Q1, 0)// nearest original pixel to the top
	if (t) {
		sp(Q0, -1) sp(Q2, 1) sp(Q3, 2)// original pixels
		float Lanczos2wQ0 = sin((1.+t)*PI)*sin((1.+t)*PI/2.)/(pow(1.+t, 2)*PI*PI/2.);
		float Lanczos2wQ1 = sin(t*PI)*sin(t*PI/2.)/(t*t*PI*PI/2.);
		float Lanczos2wQ2 = sin((1.-t)*PI)*sin((1.-t)*PI/2.)/(pow(1.-t, 2)*PI*PI/2.);
		float Lanczos2wQ3 = sin((2.-t)*PI)*sin((2.-t)*PI/2.)/(pow(2.-t, 2)*PI*PI/2.);
		return Lanczos2wQ0*Q0+Lanczos2wQ1*Q1+Lanczos2wQ2*Q2+Lanczos2wQ3*Q3;}// interpolation output

	return Q1;// case float t == 0 is required to return sample Q1, because of a possible division by 0
}

// (C) 2011 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// prototype Lanczos2 width resizer
// This shader should be run as a screen space pixel shader if you are up-scaling.
// This shader should not be run as a screen space pixel shader if you are down-scaling.
// This shader requires compiling with ps_2_0, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// If possible, avoid compiling with the software emulation modes (ps_?_sw). Pixel shaders require a lot of processing power to run in real-time software mode.
// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
// Use this shader to scale the width of an image by Lanczos2 interpolation.

// fractions, either decimal or not, are allowed
// set the magnification factor
#define Magnify (4/3.)

sampler s0;
float c0;
float c1;
#define sp(a, b) float4 a = tex2D(s0, float2(coord+b*c1, tex.y));
#define PI acos(-1)// this will generate the number PI with full precision

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	float coord = (tex.x/Magnify+.5-.5/Magnify)*c0;// assign the output position, normalized to texture width in pixels
	float t = frac(coord);// calculate the difference between the output pixel and the original surrounding two pixels
	coord = (coord-t+.5)*c1;// adjust sampling matrix to put the output pixel in the interval [Q1, Q2)

	sp(Q1, 0)// nearest original pixel to the left
	if (t) {
		sp(Q0, -1) sp(Q2, 1) sp(Q3, 2)// original pixels
		float Lanczos2wQ0 = sin((1.+t)*PI)*sin((1.+t)*PI/2.)/(pow(1.+t, 2)*PI*PI/2.);
		float Lanczos2wQ1 = sin(t*PI)*sin(t*PI/2.)/(t*t*PI*PI/2.);
		float Lanczos2wQ2 = sin((1.-t)*PI)*sin((1.-t)*PI/2.)/(pow(1.-t, 2)*PI*PI/2.);
		float Lanczos2wQ3 = sin((2.-t)*PI)*sin((2.-t)*PI/2.)/(pow(2.-t, 2)*PI*PI/2.);
		return Lanczos2wQ0*Q0+Lanczos2wQ1*Q1+Lanczos2wQ2*Q2+Lanczos2wQ3*Q3;}// interpolation output

	return Q1;// case float t == 0 is required to return sample Q1, because of a possible division by 0
}

It also seems I really need to fix the post resize pixel shader parameters, but that's just a flaw in the renderer code.

Hera · 17th November 2011, 23:29

When the CPU cannot keep up, EVR, unlike Haali Renderer, desyncs up to a certain point and then cuts out audio, which is disastrous.
When you say "code to allow dropping frames", will this mean no audio dropouts when the CPU cannot keep up?
In that situation, if the CPU *always* cannot keep up, can the behavior match or exceed what Haali Renderer does?

Off-topic question:
Is the "#define PI acos(-1)" being cached by the compiler?

JanWillem32 · 18th November 2011, 00:22

The new schedulers indeed won't drop frames (they only try to modulate ready frames if those are really late or early), unless the global timer detects desynchronization (it tolerates huge time differences between the audio and video timers before doing so). In that case, the timers of both the video and the audio renderer are reset. In my case, the old scheduler still often drops ready frames from the mixer, or prevents frames from being decoded, at any sort of interruption.
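The behaviour described above can be sketched as a per-frame decision. This is a hypothetical simplification, not the MPC-HC scheduler code: the enum, the function name and the single resync_threshold parameter are assumptions for illustration, and the real scheduler also modulates presentation timing rather than only waiting or presenting.

```c
#include <assert.h>
#include <math.h>

typedef enum { SCHED_WAIT, SCHED_PRESENT, SCHED_RESET_CLOCKS } sched_action;

/* Decide what to do with one ready frame. Times are in seconds.
   resync_threshold is a hypothetical tolerance before a desync is declared. */
static sched_action schedule_frame(double due_time, double now, double resync_threshold)
{
    assert(resync_threshold > 0.0);
    double lateness = now - due_time;
    if (fabs(lateness) > resync_threshold)
        return SCHED_RESET_CLOCKS; /* desync detected: reset audio and video timers */
    if (lateness < 0.0)
        return SCHED_WAIT;         /* early: hold the frame until it is due */
    return SCHED_PRESENT;          /* on time or late: present it rather than drop it */
}
```

The key design point from the post is that a late frame is still presented; only a large desynchronization triggers a reset, and it resets the clocks rather than discarding frames.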

The "#define PI acos(-1)" just converts to a constant floating-point number. You can see what a pixel shader stores in its registers by opening the "Shader editor", letting it compile some code, enlarging the gray bottom box to a readable size, and scrolling through to the output assembly. (A somewhat faster alternative for developers is to compile a .TXT file with the fxc.exe utility from the DirectX SDK.)

Code:

// Parameters:
//
//   float c0;
//   float c1;
//   sampler2D s0;
//
//
// Registers:
//
//   Name         Reg   Size
//   ------------ ----- ----
//   c0           c0       1
//   c1           c1       1
//   s0           s0       1
//

    ps_3_0
    def c2, 0.375, 0.3125, 0.5, 2
    def c3, 1, 1.57079637, 6.28318548, -3.14159274
    def c4, 4.93480206, 1, 2, 0
    dcl_texcoord v0.xy  // tex<0,1>
    dcl_2d s0

#line 27
    mad r0.x, v0.x, c2.x, c2.y
    mul r0.y, r0.x, c0.x  // ::coord<0>
    frc r0.y, r0.y  // ::t<0>

#line 34
    add r0.z, r0.y, c3.x
    mad r0.w, r0.z, c2.z, c2.z
    frc r0.w, r0.w
    mad r0.w, r0.w, c3.z, c3.w
    sincos r1.y, r0.w
    mul r0.w, r0.z, c3.y
    mul r0.z, r0.z, r0.z
    mul r0.z, r0.z, c4.x
    rcp r0.z, r0.z
    sincos r2.y, r0.w
    mul r0.w, r1.y, r2.y
    mul r0.z, r0.z, r0.w  // ::Lanczos2wQ0<0>
    mul r1.xy, r0.y, c3_abs.wyzw
    sincos r2.y, r1.x
    sincos r3.y, r1.y
    mul r0.w, r2.y, r3.y
    mul r1.x, r0.y, r0.y
    mul r1.x, r1.x, c4.x
    rcp r1.x, r1.x
    mul r0.w, r0.w, r1.x  // ::Lanczos2wQ1<0>

#line 29
    mad r0.x, r0.x, c0.x, -r0.y
    add r0.x, r0.x, c2.z
    mul r1.x, r0.x, c1.x  // ::coord<0>

#line 33
    mov r1.yw, v0.y
    texld r2, r1, s0  // ::Q1<0,1,2,3>
    mov r3.w, c2.w
    mad r3.z, c1.x, r3.w, r1.x

#line 38
    mul r4, r0.w, r2

#line 33
    mad r1.z, r0.x, c1.x, -c1.x
    mad r3.x, r0.x, c1.x, c1.x
    texld r1, r1.zwzw, s0  // ::Q0<0,1,2,3>

#line 38
    mad r1, r0.z, r1, r4
    add r0.xz, -r0.y, c4.yyzw
    mul r4.xyz, r0.xxzw, c3_abs.wyyw
    sincos r5.y, r4.x
    sincos r6.y, r4.y
    sincos r7.y, r4.z
    mul r0.w, r5.y, r6.y
    mul r4.xy, r0.xzzw, r0.xzzw
    mad r0.x, r0.z, c2.z, c2.z
    frc r0.x, r0.x
    mad r0.x, r0.x, c3.z, c3.w
    sincos r5.y, r0.x
    mul r0.x, r7.y, r5.y
    mul r4.xy, r4, c4.x
    rcp r0.z, r4.x
    rcp r4.x, r4.y
    mul r0.x, r0.x, r4.x  // ::Lanczos2wQ3<0>
    mul r0.z, r0.z, r0.w  // ::Lanczos2wQ2<0>

#line 33
    mov r3.yw, v0.y
    texld r4, r3, s0  // ::Q2<0,1,2,3>
    texld r3, r3.zwzw, s0  // ::Q3<0,1,2,3>

#line 38
    mad r1, r0.z, r4, r1
    mad r1, r0.x, r3, r1  // ::main<0,1,2,3>
    cmp oC0, -r0.y, r2, r1  // ::main<0,1,2,3>

// approximately 114 instruction slots used (4 texture, 110 arithmetic)

Not too bad, as far as I can see. (Reading assembly isn't easy. Also note the nasty habit of the compiler to truncate numbers to fewer decimal places than it stores in its binary format.)
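Assuming the def constants are stored as IEEE 754 single-precision floats, the listed values look like rounded forms of PI/2 (1.57079637), 2*PI (6.28318548), -PI (-3.14159274) and PI*PI/2 (4.93480206), the denominator factor in the weight formulas. Nine significant digits are enough to identify any 32-bit float uniquely, so the printed digits should map back to exactly the stored value. A small C check of that idea; the helper names are hypothetical:

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* PI generated the same way as the shaders' #define. */
static double shader_pi(void)
{
    return acos(-1.0);
}

/* Returns 1 if the decimal text printed in an assembly listing denotes the same
   32-bit float as the exact value rounded to single precision. */
static int listing_matches(double exact, const char *printed)
{
    assert(printed != NULL);
    float stored = (float)exact;          /* what a 32-bit constant register would hold */
    float parsed = strtof(printed, NULL); /* nearest float to the printed digits */
    return stored == parsed;
}
```

With these inputs, the four listing values above should all match, while a shorter string such as "1.5707963" parses to a neighbouring float and does not.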

G_M_C · 18th November 2011, 09:29

Quote:

Originally Posted by JanWillem32

It's been a while since I've added any resizers, so I can indeed try. I'll look for some code or math for it, as long as it's easy to insert. (I'm not going to work out the programming for sampling nodes, like I did for the spline5 and spline6 forms anytime soon.)

AviSynth contains lots of different resizers. Maybe you can find some inspiration in AviSynth's source?

golagoda · 18th November 2011, 09:40

+1 for Lanczos. I use it with 4 taps in madVR and it gives results comparable to Spline 3/4 taps. Most people seem to prefer Lanczos to Spline, though; for the most part it depends on how much you hate ringing, since Lanczos gives a bit more.

Take your time; I look forward to it if you add it.

JanWillem32 · 18th November 2011, 09:44

Thanks for the tips, maybe I'll find something in the Avisynth code. I'll definitely take some time to add a set of resizers. It's a lot of work to get the shader code running, and even more work to write the menu entries (the main reason I added a whole set of six resizers at once the last time).

I've taken a look at the external filters. For the renderers it was easier to solve than I thought at first. (I was afraid it was one of the changes to the subtitle renderer.)
I've also fixed a minor bit of memory management for the subtitle renderer.

RGold · 19th November 2011, 05:35

Initial test looks good. No tearing.
Question regarding the mixer format: with the official version I get NV12 for both input and output. With this build I get NV12 input and X8R8G8B8 output. What is the difference between these mixer formats?

JanWillem32 · 19th November 2011, 08:13

I changed the stats screen a bit to show what raw format is transferred from the decoder to the mixer, and what format is transferred from the mixer to the renderer. The stats screen of the renderer in the trunk build isn't very clear on this point, but it really does the same thing.
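As background on the two formats named in the stats screen: NV12 is a planar YUV 4:2:0 format (a full-resolution 8-bit Y plane followed by a half-resolution plane of interleaved U/V bytes, 12 bits per pixel on average), while X8R8G8B8 is packed RGB with one 32-bit word per pixel. A minimal C sketch of their sizes and of where a pixel's samples live in NV12; the helper names are hypothetical and it assumes tightly packed rows (real Direct3D surfaces usually have a larger pitch than the width).

```c
#include <assert.h>
#include <stddef.h>

/* NV12: w*h bytes of Y, then (w/2)*(h/2) interleaved U,V byte pairs. */
static size_t nv12_size(size_t w, size_t h)
{
    assert(w % 2 == 0 && h % 2 == 0);
    return w * h + (w / 2) * (h / 2) * 2;
}

/* X8R8G8B8: 4 bytes per pixel (8 bits each for R, G, B plus 8 unused bits). */
static size_t x8r8g8b8_size(size_t w, size_t h)
{
    return w * h * 4;
}

/* Byte offsets of pixel (x, y)'s Y sample and its U sample (V follows at +1).
   Each U,V pair is shared by a 2x2 block of pixels. */
static void nv12_offsets(size_t w, size_t h, size_t x, size_t y,
                         size_t *y_off, size_t *u_off)
{
    assert(x < w && y < h);
    *y_off = y * w + x;
    *u_off = w * h + (y / 2) * w + (x / 2) * 2;
}
```

For a 1920x1080 frame this gives 3110400 bytes in NV12 versus 8294400 bytes in X8R8G8B8, which illustrates that an X8R8G8B8 mixer output means the YUV-to-RGB conversion has already happened in the mixer.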

G_M_C · 19th November 2011, 10:46

Quote:

Originally Posted by JanWillem32

I changed the stats screen a bit to show what raw format is transferred from the decoder to the mixer, and what format is transferred from the mixer to the renderer. The stats screen of the renderer in the trunk build isn't very clear on this point, but it really does the same thing.

I think I've asked before, but I cannot remember or find the answer:
Is it possible to find out what format the renderer actually outputs to the display? (Reason: to verify that 10-bit is actually output, and that no dithering is done in/by the renderer.)

JanWillem32 · 19th November 2011, 12:11

The renderer itself can only dither when it's commanded to do so (setting dithering level 0 will definitely disable it). As to the capacity of the driver to dither down a 10-bit front buffer (in exclusive mode) to 8-bit, I've never seen it happen. My main working display is analog (the analog connections on my HD4890 are fed by a 10-bit DAC) and my secondary device accepts 10-bit input easily through HDMI. Maybe someone can try it out again with an old 8-bit DVI-D monitor on a video card that's generally capable of 10-bit output but, of course, can't deliver it to such a monitor. Will the monitor give a black screen, will the driver refuse to enable 10-bit output mode (which can be seen in the stats screen), will it fake the 10-bit output by rounding to 8-bit, or will it fake the 10-bit output by dithering to 8-bit?
The last option is actually unlikely. Dithering is pretty heavy, so it's a task for the shader core (the biggest block of transistors in a GPU). Once the data is written to a back buffer (which later on swaps places to become the front buffer), the shader core generally doesn't read from or write to it again. Reading from the front buffer is done by the micro-controllers for the hardware output ports. These do have some logic on board to convert the front buffer format in memory to signals for output over TMDS (DVI/HDMI), micro-packets (DP) or analog (through a DAC). I can't imagine that any of those micro-controllers would have a dithering unit on board.
The other three options can simply be observed.
Note that some digital displays will actually report the type and bit depth of the incoming signal.

G_M_C · 19th November 2011, 12:33

Quote:

Originally Posted by JanWillem32

The renderer itself can only dither when it's commanded to do so (setting dithering level 0 will definitely disable it). As to the capacity of the driver to dither down a 10-bit front buffer (in exclusive mode) to 8-bit, I've never seen it happen. My main working display is analog (the analog connections on my HD4890 are fed by a 10-bit DAC) and my secondary device accepts 10-bit input easily through HDMI. Maybe someone can try it out again with an old 8-bit DVI-D monitor on a video card that's generally capable of 10-bit output but, of course, can't deliver it to such a monitor. Will the monitor give a black screen, will the driver refuse to enable 10-bit output mode (which can be seen in the stats screen), will it fake the 10-bit output by rounding to 8-bit, or will it fake the 10-bit output by dithering to 8-bit?
The last option is actually unlikely. Dithering is pretty heavy, so it's a task for the shader core (the biggest block of transistors in a GPU). Once the data is written to a back buffer (which later on swaps places to become the front buffer), the shader core generally doesn't read from or write to it again. Reading from the front buffer is done by the micro-controllers for the hardware output ports. These do have some logic on board to convert the front buffer format in memory to signals for output over TMDS (DVI/HDMI), micro-packets (DP) or analog (through a DAC). I can't imagine that any of those micro-controllers would have a dithering unit on board.
The other three options can simply be observed.
Note that some digital displays will actually report the type and bit depth of the incoming signal.

Thx for the answer. I'll have to read it again to understand it fully, but for now I get the idea that if the stats screen reports that the renderer is using 10-bit (i.e. the stats screen reports something like A2R10G10B10), I can safely assume the output format is 10-bit as well.
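For reference, the format name spells out the bit layout of each 32-bit pixel. A minimal C sketch of packing and unpacking the Direct3D 9 D3DFMT_A2R10G10B10 layout (channels named from the most significant bits down); the helper names are hypothetical. Compared with X8R8G8B8, each colour channel gets 10 bits (1024 levels) instead of 8 (256), while alpha shrinks to 2 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 31-30: alpha, 29-20: red, 19-10: green, 9-0: blue. */
static uint32_t pack_a2r10g10b10(uint32_t a, uint32_t r, uint32_t g, uint32_t b)
{
    assert(a <= 3 && r <= 1023 && g <= 1023 && b <= 1023);
    return (a << 30) | (r << 20) | (g << 10) | b;
}

static void unpack_a2r10g10b10(uint32_t px, uint32_t *a, uint32_t *r,
                               uint32_t *g, uint32_t *b)
{
    *a = px >> 30;
    *r = (px >> 20) & 0x3FF;
    *g = (px >> 10) & 0x3FF;
    *b = px & 0x3FF;
}
```

For example, opaque full-intensity red packs to 0xFFF00000.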

ForceX · 19th November 2011, 14:01

Apart from the crashes when resizing, dfr3810i was the last build that worked well for me. Ever since the new scheduler build I just can't get smooth playback of 1080p content with CUVID decoding, whether or not I tick Disable Scheduler. However, I don't think it's exactly related to the new scheduler, because it also happens with VMR9r, which doesn't even have the new scheduler. During playback at 16 bit the framerate drops erratically, but playback is fine if I switch to a 720p video or to 8 bit, or use software decoding. At first it seemed the problem was that video memory was getting filled, but GPU-Z reports the same usage for 3810i and 3836i. Moreover, the problem doesn't happen when Aero is disabled or when I use exclusive mode. GPU usage stays below 70%, so it's probably not due to workload.

3833i seemed to have a video memory leak in conjunction with CUVID decoding but it seems to have been fixed in 3836, so that's good.
