View Full Version : MPC-HC tester builds for internal renderer fixes


Mierastor
17th November 2011, 17:18
Ffdshow tryouts has Lanczos as one resize option so maybe there is some useful code there.

Keiyakusha
17th November 2011, 21:02
lanczos is a shortcut for bicubic with taps=3 or something like that. don't really remember.
lanczos4 is a bicubic too, but with taps=4

spline on the other hand is somewhat different than lanczos and "many people think it gives the best result"

Mierastor
17th November 2011, 21:04
Quote: lanczos is a shortcut for bicubic with taps=3 or something like that. don't really remember.
lanczos4 is a bicubic too, but with taps=4
http://en.wikipedia.org/wiki/Lanczos_resampling

Keiyakusha
17th November 2011, 21:09
Quote: http://en.wikipedia.org/wiki/Lanczos_resampling

Lanczos is a "customized" bicubic, customized by guy with the name Cornelius Lanczos, what you don't understand?

Mierastor
17th November 2011, 21:17
Quote: Lanczos is a "customized" bicubic, customized by guy with the name Cornelius Lanczos, what you don't understand?
Both are convolution filters. The results can be quite different.

Just Google to find numerous discussions regarding advantages or disadvantages for Lanczos vs. Bicubic.

Keiyakusha
17th November 2011, 21:19
Quote: Both are convolution filters. The results can be quite different. Just Google to find numerous discussions regarding advantages or disadvantages for Lanczos vs. Bicubic.

I'm not saying they are the same. Obviously they are not. I'm saying Bicubic can be tweaked to make lanczos out of it. Something like that.

JanWillem32
17th November 2011, 21:28
The bicubic filters really don't look a thing like the windowed sinc filters. I know the base filter, but I very much prefer a processed convolution using the weight functions.
base Lanczos sample weight filter:

#include <math.h>
#define PI acos(-1)// this will generate the number PI with full precision, it is useful for example with the sin, cos and tan functions

double LanczosFilter(double x, double radius)
{
if(x == 0) return 1;
if(x < 0) x = -x;// intrinsic sign(...) is faster as it only reads 1 bit, but for a quick sample this is sufficient
if(x >= radius) return 0;// cutoff if beyond 2 for lanczos2, 3 for lanczos3, radius is normally a static const
return sin(x*PI)*sin(x*PI/radius)/(x*x*PI*PI/radius);
}

All that's left is to fill in the variables, and write out the convolution function of 4 weights for Lanczos2, 6 for Lanczos3, etc. A very tedious task.

Mierastor
17th November 2011, 21:31
Quote: I'm not saying they are the same. Obviously they are not. I'm saying Bicubic can be tweaked to make lanczos out of it. Something like that.
That cannot be done in ffdshow, for example, which includes a lot of customization options.

JanWillem32
17th November 2011, 23:14
Lanczos2, probably correct in function, but not optimized:

// (C) 2011 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// prototype Lanczos2 height resizer
// This shader should be run as a screen space pixel shader if you are up-scaling.
// This shader should not be run as a screen space pixel shader if you are down-scaling.
// This shader requires compiling with ps_2_0, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// If possible, avoid compiling with the software emulation modes (ps_?_sw). Pixel shaders require a lot of processing power to run in real-time software mode.
// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
// Use this shader to scale the height of an image by Lanczos2 interpolation.

// fractions, either decimal or not, are allowed
// set the magnification factor
#define Magnify (4/3.)

sampler s0;
float2 c0;
float2 c1;
#define sp(a, b) float4 a = tex2D(s0, float2(tex.x, coord+b*c1.y));
#define PI acos(-1)// this will generate the number PI with full precision

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float coord = (tex.y/Magnify+.5-.5/Magnify)*c0.y;// assign the output position, normalized to texture height in pixels
float t = frac(coord);// calculate the difference between the output pixel and the original surrounding two pixels
coord = (coord-t+.5)*c1.y;// adjust sampling matrix to put the output pixel in the interval [Q1, Q2)

sp(Q1, 0)// nearest original pixel to the top
if (t) {
sp(Q0, -1) sp(Q2, 1) sp(Q3, 2)// original pixels
float Lanczos2wQ0 = sin((1.+t)*PI)*sin((1.+t)*PI/2.)/(pow(1.+t, 2)*PI*PI/2.);
float Lanczos2wQ1 = sin(t*PI)*sin(t*PI/2.)/(t*t*PI*PI/2.);
float Lanczos2wQ2 = sin((1.-t)*PI)*sin((1.-t)*PI/2.)/(pow(1.-t, 2)*PI*PI/2.);
float Lanczos2wQ3 = sin((2.-t)*PI)*sin((2.-t)*PI/2.)/(pow(2.-t, 2)*PI*PI/2.);
return Lanczos2wQ0*Q0+Lanczos2wQ1*Q1+Lanczos2wQ2*Q2+Lanczos2wQ3*Q3;}// interpolation output

return Q1;// case float t == 0 is required to return sample Q1, because of a possible division by 0
}

// (C) 2011 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// prototype Lanczos2 width resizer
// This shader should be run as a screen space pixel shader if you are up-scaling.
// This shader should not be run as a screen space pixel shader if you are down-scaling.
// This shader requires compiling with ps_2_0, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// If possible, avoid compiling with the software emulation modes (ps_?_sw). Pixel shaders require a lot of processing power to run in real-time software mode.
// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
// Use this shader to scale the width of an image by Lanczos2 interpolation.

// fractions, either decimal or not, are allowed
// set the magnification factor
#define Magnify (4/3.)

sampler s0;
float c0;
float c1;
#define sp(a, b) float4 a = tex2D(s0, float2(coord+b*c1, tex.y));
#define PI acos(-1)// this will generate the number PI with full precision

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float coord = (tex.x/Magnify+.5-.5/Magnify)*c0;// assign the output position, normalized to texture width in pixels
float t = frac(coord);// calculate the difference between the output pixel and the original surrounding two pixels
coord = (coord-t+.5)*c1;// adjust sampling matrix to put the output pixel in the interval [Q1, Q2)

sp(Q1, 0)// nearest original pixel to the left
if (t) {
sp(Q0, -1) sp(Q2, 1) sp(Q3, 2)// original pixels
float Lanczos2wQ0 = sin((1.+t)*PI)*sin((1.+t)*PI/2.)/(pow(1.+t, 2)*PI*PI/2.);
float Lanczos2wQ1 = sin(t*PI)*sin(t*PI/2.)/(t*t*PI*PI/2.);
float Lanczos2wQ2 = sin((1.-t)*PI)*sin((1.-t)*PI/2.)/(pow(1.-t, 2)*PI*PI/2.);
float Lanczos2wQ3 = sin((2.-t)*PI)*sin((2.-t)*PI/2.)/(pow(2.-t, 2)*PI*PI/2.);
return Lanczos2wQ0*Q0+Lanczos2wQ1*Q1+Lanczos2wQ2*Q2+Lanczos2wQ3*Q3;}// interpolation output

return Q1;// case float t == 0 is required to return sample Q1, because of a possible division by 0
}

It also seems I really need to fix the post resize pixel shader parameters, but that's just a flaw in the renderer code.

Hera
17th November 2011, 23:29
When the CPU cannot keep up, unlike Haali Renderer, EVR desyncs to a certain point and then cuts out audio and it is disastrous.
When you say "code to allow dropping frames" - will this mean no audio dropouts when CPU cannot keep up?
Under this situation, if the CPU *always* cannot keep up - can the behavior match/exceed what Haali Renderer does?

Off-topic question,
Is the "#define PI acos(-1)" being cached by the compiler?

JanWillem32
18th November 2011, 00:22
The new schedulers indeed won't drop frames (these will only try to modulate ready frames if it's really late or early), unless desynchronization by the global timer is detected (which allows huge time differences of the audio and video timers). In that case, timers for both the video and audio renderer are reset. The old scheduler will still often drop ready frames from the mixer, or prevent frames from decoding, with any sort of interruption in my case.

The "#define PI acos(-1)" just converts to a constant floating-point number. You can take a look at what a pixel shader stores in its registers by opening the "Shader editor", letting the compiler compile some code, extending the gray bottom box to a readable size, and scrolling through to the output assembly. (A somewhat faster alternative for developers is to compile a .TXT file, using the fxc.exe utility in the DirectX SDK.)

// Parameters:
//
// float c0;
// float c1;
// sampler2D s0;
//
//
// Registers:
//
// Name Reg Size
// ------------ ----- ----
// c0 c0 1
// c1 c1 1
// s0 s0 1
//

ps_3_0
def c2, 0.375, 0.3125, 0.5, 2
def c3, 1, 1.57079637, 6.28318548, -3.14159274
def c4, 4.93480206, 1, 2, 0
dcl_texcoord v0.xy // tex<0,1>
dcl_2d s0

#line 27
mad r0.x, v0.x, c2.x, c2.y
mul r0.y, r0.x, c0.x // ::coord<0>
frc r0.y, r0.y // ::t<0>

#line 34
add r0.z, r0.y, c3.x
mad r0.w, r0.z, c2.z, c2.z
frc r0.w, r0.w
mad r0.w, r0.w, c3.z, c3.w
sincos r1.y, r0.w
mul r0.w, r0.z, c3.y
mul r0.z, r0.z, r0.z
mul r0.z, r0.z, c4.x
rcp r0.z, r0.z
sincos r2.y, r0.w
mul r0.w, r1.y, r2.y
mul r0.z, r0.z, r0.w // ::Lanczos2wQ0<0>
mul r1.xy, r0.y, c3_abs.wyzw
sincos r2.y, r1.x
sincos r3.y, r1.y
mul r0.w, r2.y, r3.y
mul r1.x, r0.y, r0.y
mul r1.x, r1.x, c4.x
rcp r1.x, r1.x
mul r0.w, r0.w, r1.x // ::Lanczos2wQ1<0>

#line 29
mad r0.x, r0.x, c0.x, -r0.y
add r0.x, r0.x, c2.z
mul r1.x, r0.x, c1.x // ::coord<0>

#line 33
mov r1.yw, v0.y
texld r2, r1, s0 // ::Q1<0,1,2,3>
mov r3.w, c2.w
mad r3.z, c1.x, r3.w, r1.x

#line 38
mul r4, r0.w, r2

#line 33
mad r1.z, r0.x, c1.x, -c1.x
mad r3.x, r0.x, c1.x, c1.x
texld r1, r1.zwzw, s0 // ::Q0<0,1,2,3>

#line 38
mad r1, r0.z, r1, r4
add r0.xz, -r0.y, c4.yyzw
mul r4.xyz, r0.xxzw, c3_abs.wyyw
sincos r5.y, r4.x
sincos r6.y, r4.y
sincos r7.y, r4.z
mul r0.w, r5.y, r6.y
mul r4.xy, r0.xzzw, r0.xzzw
mad r0.x, r0.z, c2.z, c2.z
frc r0.x, r0.x
mad r0.x, r0.x, c3.z, c3.w
sincos r5.y, r0.x
mul r0.x, r7.y, r5.y
mul r4.xy, r4, c4.x
rcp r0.z, r4.x
rcp r4.x, r4.y
mul r0.x, r0.x, r4.x // ::Lanczos2wQ3<0>
mul r0.z, r0.z, r0.w // ::Lanczos2wQ2<0>

#line 33
mov r3.yw, v0.y
texld r4, r3, s0 // ::Q2<0,1,2,3>
texld r3, r3.zwzw, s0 // ::Q3<0,1,2,3>

#line 38
mad r1, r0.z, r4, r1
mad r1, r0.x, r3, r1 // ::main<0,1,2,3>
cmp oC0, -r0.y, r2, r1 // ::main<0,1,2,3>

// approximately 114 instruction slots used (4 texture, 110 arithmetic)

Not too bad, as far as I can see. (Reading assembly isn't easy. Also note the nasty habit of the compiler to truncate numbers to fewer decimal places than it usually stores in its binary format.)
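
The constants in the def lines can be cross-checked on the CPU. The following is a minimal C sketch, assuming the compiler folds acos(-1) into a single-precision constant and prints it with eight decimals; format_as_float32 is a hypothetical helper, not part of any SDK.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical helper: round a double to single precision, the way a 32-bit
   shader constant register would hold it, and print it with eight decimals. */
void format_as_float32(double value, char *buf, size_t size)
{
    float stored = (float)value;
    snprintf(buf, size, "%.8f", (double)stored);
}
```

Passing acos(-1.0), acos(-1.0) / 2.0 and 2.0 * acos(-1.0) reproduces 3.14159274, 1.57079637 and 6.28318548 from register c3 in the listing (apart from the sign on the first). Nine significant decimal digits are enough to identify a single-precision value uniquely, so the printed constants most likely lose nothing relative to what the register holds.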

G_M_C
18th November 2011, 09:29
Quote: It's been a while since I've added any resizers, so I can indeed try. I'll look for some code or math for it, as long as it's easy to insert. (I'm not going to work out the programming for sampling nodes, like I did for the spline5 and spline6 forms anytime soon.)

AviSynth contains lots of different resizers. Maybe you can find some inspiration in AviSynth's source?

golagoda
18th November 2011, 09:40
+1 for Lanczos, I use that with 4 taps on madVR and it provides comparable results to spline 3/4taps, most people seem to prefer lanczos to spline though, depends on how much you hate ringing for the most part, lanczos gives a bit more.

Take your time and I look forward to it if you add it ;)

JanWillem32
18th November 2011, 09:44
Thanks for the tips, maybe I'll find something in the Avisynth code. I'll definitely take some time to add a set of resizers. It's a lot of work to get the shader code running, and even more work to write the menu entries (the main reason I added a whole set of six resizers at once the last time).

I've taken a look at the external filters. For the renderers it was easier to solve than I thought at first. (I was afraid it was one of the changes to the subtitle renderer.)
I've also fixed a minor bit of memory management for the subtitle renderer.

RGold
19th November 2011, 05:35
Initial test looks good. No tearing.
Question regarding mixer format: With the official version I get NV12 for both input and output. With this build I get NV12 input and X8R8G8B8 output. What is the difference between these mixer formats?

JanWillem32
19th November 2011, 08:13
I changed the stats screen a bit to show what raw format is transferred from the decoder to the mixer, and what format is transferred from the mixer to the renderer. The stats screen of the renderer in the trunk build isn't very clear on this point, but it really does the same thing.

G_M_C
19th November 2011, 10:46
Quote: I changed the stats screen a bit to show what raw format is transferred from the decoder to the mixer, and what format is transferred from the mixer to the renderer. The stats screen of the renderer in the trunk build isn't very clear on this point, but it really does the same thing.

I think I've asked before, but I cannot remember or find the answer:
Is it possible to find out what format the renderer actually outputs to the display? (Reason for this: to verify that 10-bit is actually output, and that no dithering is done in/by the renderer.)

JanWillem32
19th November 2011, 12:11
The renderer itself can only dither when it's commanded to do so (setting dithering level 0 will definitely disable it). As to the capacity of the driver to dither down a 10-bit frontbuffer (in exclusive mode) to 8-bit, I've never seen it happening. My main working display is analog (analog connections on my HD4890 are fed by a 10-bit DAC) and my secondary device accepts 10-bit input easily through HDMI. Maybe someone can try it out again with an old 8-bit DVI-D monitor on a video card that's generally capable of 10-bit output, but of course won't be able to with such a monitor. Will the monitor give a black screen, will the driver refuse to enable 10-bit output mode (so it can be seen in the stats screen), fake the 10-bit output by rounding to 8-bit, or fake the 10-bit output by dithering to 8-bit?
The last option is actually unlikely. Dithering is pretty heavy, so it's a task for the shadercore (biggest block of transistors in a GPU). When the data is written to a back buffer (which later on shifts places to become a front buffer), the shadercore generally doesn't read or write to/from it again. Reading from the front buffer is done by the micro-controllers for the hardware output ports. These do have some logic on board to convert the front buffer format in memory to signals to output on TMDS (DVI/HDMI), Mini-packet (DP) or analog (through a DAC). I can't imagine that any of those micro-controllers would have a dithering unit on board.
The other three options can simply be observed.
Note that some digital displays will actually report the type and bit depth of the incoming signal.

G_M_C
19th November 2011, 12:33
Quote: The renderer itself can only dither when it's commanded to do so (setting dithering level 0 will definitely disable it). [...]

Thx for the answer. I'll have to read it again to understand it fully, but for now I get the idea that if the stats screen reports that the renderer is using 10-bit (i.e. the stats screen reports something like A2R10G10B10), I can safely assume the output format is 10-bit as well.

ForceX
19th November 2011, 14:01
Apart from the crashes when resizing, dfr3810i was the last good build working for me. Ever since the new scheduler build I just can't get smooth playback of 1080p content with CUVID decoding, whether I tick the Disable Scheduler or not. However, I don't think it's exactly related to the new scheduler, because it also happens with VMR9r which doesn't even have the new scheduler. During playback at 16 bit the framerate would drop erratically, but playback is fine if I switch to a 720p video or 8 bit or use software decoding. At first it seemed the problem was because video memory was getting filled but GPU-Z reports the same usage for 3810i and 3836i. Moreover, the problem doesn't happen when Aero is disabled or I use exclusive mode. GPU usage stays below 70% so it's probably not due to workload.

3833i seemed to have a video memory leak in conjunction with CUVID decoding but it seems to have been fixed in 3836, so that's good.

janos666
19th November 2011, 14:20
Quote: or fake the 10-bit output by dithering to 8-bit?

A spatial/temporal dithering is applied after the 16-bit 2D calibration LUT. It's easy to test.

I calibrated one of my friend's 8-bit (6bit+dithering) c-PVA display. I measured 10-bit effective precision, and ended up with very nice results (the uncalibrated state was disastrous but ArgyllCMS/dispcal did a very decent work, the white balance and gradation got smoothed out without noticeable banding).

My H5850 has only one HDMI output, but I need two cables between my VGA and my TV (one calibration settings memory per input and different refresh rates require different settings...). 10-bit output (EVR-CP) works through both HDMI and DVI->HDMI connections (I use HDMI->HDMI for 1080p24 playback and DVI->HDMI 1080p60 for PC gaming).


I am still not sure about what happens in the HDMI->HDMI situation (real or fake). My PDP is full of dithering noise already (as well as any 6-10+ bit LCD displays I tested had some amount of internal dithering), so I can't visually evaluate this.

Hera
21st November 2011, 05:25
Latest build works great, performance wise, on my netbook.

JanWillem32
22nd November 2011, 12:34
I don't have too much time before I have to go, but I mostly worked on new SSE code for the subtitle texture preparation functions. I've also fixed a "text falloff" problem with bitmap subtitles (still needs better aspect ratio correction).
I'll try to involve myself with the development of a better subtitle renderer if I can come to terms with the rest of the people involved (I'm asking for a lot, and I won't cooperate if the subtitle renderer isn't fundamentally changed to finally produce quality output). It's at the xy-vsfilter tracker, and I'll try to read that thread later on (lot of text, no time for it now). http://code.google.com/p/xy-vsfilter/issues/detail?id=40

@ForceX: A difficult situation, and pretty hard to analyze as well. It could be the combination with the FlipEx present mode (which is pretty new). Only I don't remember exactly when I implemented the FlipEx mode (I believe I did mention it somewhere in this thread). I'll take a look at creating a debug build later, maybe that will help to test.

@Hera: With all present modes or some in particular? I believe you were having issues with subtitles before, this has been solved?

I've worked out the (efficient) programming for a set of Lanczos filters and Bézier curves. Regarding the Lanczos filters, I know the base properties of a truncated windowed sinc function, and I was not very surprised when I simulated values for input:
For a simulated value of t = .312 , the weights for the function are:
-3-> 0.01353790110500565952473035467006
-2-> -0.01482150141104846335254841219284
-1-> -0.08627089893640504238179223189432
0-> 0.81387680430780980122826820168191
1-> 0.31372932636473093833162302177254
2-> -0.02780500481463081933622775622197
3-> -0.02055291682427676817514069470158
4-> 0.00582486728065545855547543976717

The weight sum for a Lanczos2 {-1, 0, 1, 2} is: 1.0135302269215048778418712353382 .
The weight sum for a Lanczos3 {-2, -1, 0, 1, 2, 3} is: 0.97815580868617964631418212844374 .
The weight sum for a Lanczos4 {-3, -2, -1, 0, 1, 2, 3, 4} is: 0.99751857707184076439438792288097 .

The current set of filters and the Bézier curves always sum up to 1 (those are simply built that way). My question is: does anyone know of a way to compensate the weight factors of truncated sinc functions? Also, how big should the filter window be? I've seen versions taking 16 pixels in each sampling direction (Lanczos8, 256 pixels weighed per output pixel), that's just a lot for a filter that weights the last set of pixels by less than ±0.001 (it's even less for the less dramatic base windowed functions than Lanczos).
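
One common way to compensate truncated windowed sinc weights is to divide every weight by the sum of all weights, which restores unity gain. The following is a minimal C sketch of that idea, using the four Lanczos2 taps listed above as sample input; normalize_weights is a hypothetical helper name, and this is a generic normalization step rather than anything taken from the renderer code.

```c
#include <stddef.h>

/* Scale the weights so that they sum to 1. Returns the sum before scaling,
   which shows how far the truncated kernel was from unity gain. */
double normalize_weights(double *weights, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; ++i)
        sum += weights[i];
    if (sum != 0.0) {
        for (size_t i = 0; i < count; ++i)
            weights[i] /= sum;
    }
    return sum;
}
```

With the Lanczos2 taps for t = .312 the returned sum is about 1.01353, and the rescaled weights add up to 1. In a shader the same effect could be had by dividing the interpolated result by the weight sum once, instead of rescaling each weight.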
Lanczos4:

// (C) 2011 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// prototype Lanczos4 height resizer
// This shader should be run as a screen space pixel shader if you are up-scaling.
// This shader should not be run as a screen space pixel shader if you are down-scaling.
// This shader requires compiling with ps_2_a, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
// Use this shader to scale the height of an image by Lanczos4 interpolation.

// fractions, either decimal or not, are allowed
// set the magnification factor
#define Magnify (4/3.)

sampler s0;
float2 c0;
float2 c1;
#define sp(a, b) float4 a = tex2D(s0, float2(tex.x, coord+b*c1.y));
#define PI acos(-1)// this will generate the number PI with full precision

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float coord = (tex.y/Magnify+.5-.5/Magnify)*c0.y;// assign the output position, normalized to texture height in pixels
float t = frac(coord);// calculate the difference between the output pixel and the original surrounding two pixels
coord = (coord-t+.5)*c1.y;// adjust sampling matrix to put the output pixel in the interval [Q3, Q4)

sp(Q3, 0)// nearest original pixel to the top
if (t) {
sp(Q0, -3) sp(Q1, -2) sp(Q2, -1) sp(Q4, 1) sp(Q5, 2) sp(Q6, 3) sp(Q7, 4)// original pixels
float4 wset0 = float4(3, 2, 1, 0)+t;
float4 wset1 = float4(1, 2, 3, 4)-t;
float4 w0 = sin(wset0*PI)*sin(wset0*PI/4.)/(wset0*wset0*PI*PI/4.);// the window term uses radius 4 for Lanczos4
float4 w1 = sin(wset1*PI)*sin(wset1*PI/4.)/(wset1*wset1*PI*PI/4.);
return w0.x*Q0+w0.y*Q1+w0.z*Q2+w0.w*Q3+w1.x*Q4+w1.y*Q5+w1.z*Q6+w1.w*Q7;}// interpolation output

return Q3;// case t == 0 is required to return sample Q3, because of a possible division by 0
}

// (C) 2011 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// prototype Lanczos4 width resizer
// This shader should be run as a screen space pixel shader if you are up-scaling.
// This shader should not be run as a screen space pixel shader if you are down-scaling.
// This shader requires compiling with ps_2_a, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
// Use this shader to scale the width of an image by Lanczos4 interpolation.

// fractions, either decimal or not, are allowed
// set the magnification factor
#define Magnify (4/3.)

sampler s0;
float c0;
float c1;
#define sp(a, b) float4 a = tex2D(s0, float2(coord+b*c1, tex.y));
#define PI acos(-1)// this will generate the number PI with full precision

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float coord = (tex.x/Magnify+.5-.5/Magnify)*c0;// assign the output position, normalized to texture width in pixels
float t = frac(coord);// calculate the difference between the output pixel and the original surrounding two pixels
coord = (coord-t+.5)*c1;// adjust sampling matrix to put the output pixel in the interval [Q3, Q4)

sp(Q3, 0)// nearest original pixel to the left
if (t) {
sp(Q0, -3) sp(Q1, -2) sp(Q2, -1) sp(Q4, 1) sp(Q5, 2) sp(Q6, 3) sp(Q7, 4)// original pixels
float4 wset0 = float4(3, 2, 1, 0)+t;
float4 wset1 = float4(1, 2, 3, 4)-t;
float4 w0 = sin(wset0*PI)*sin(wset0*PI/4.)/(wset0*wset0*PI*PI/4.);// the window term uses radius 4 for Lanczos4
float4 w1 = sin(wset1*PI)*sin(wset1*PI/4.)/(wset1*wset1*PI*PI/4.);
return w0.x*Q0+w0.y*Q1+w0.z*Q2+w0.w*Q3+w1.x*Q4+w1.y*Q5+w1.z*Q6+w1.w*Q7;}// interpolation output

return Q3;// case t == 0 is required to return sample Q3, because of a possible division by 0
}

CruNcher
22nd November 2011, 13:01
Crashes immediately after loading the stream :( tried all kind of decoder nothing, 3836i works :(

http://img810.imageshack.us/img810/2533/mpchctestcrash.png

JanWillem32
22nd November 2011, 13:29
Can you try a two-pass resizer (bicubic and onwards) and a one-pass resizer (first three options)? I think I've set the wrong borders for the resizer passes.

gilic
22nd November 2011, 14:22
The first 3 options crash with a D3DERR_INVALIDCALL; the rest seem to work.

ForceX
22nd November 2011, 14:51
Quote: It could be the combination with the FlipEx present mode (which is pretty new). Only I don't remember exactly when I implemented the FlipEx mode (I believe I did mention it somewhere in this thread).

FlipEx was introduced in tester dfr3752ir. dfr3810i was the last good build working for me. The new scheduler was introduced in tester dfr3833i and builds since then are problematic for me.

Hera
22nd November 2011, 16:18
Yeah performance is great in D3D:FS.
The complex subtitles on that music video work even.

CruNcher
22nd November 2011, 23:25
Quote: The first 3 options crash with a d3derr_invalidcall the rest seems to work.

confirmed

PS: The heaviest issue I've had so far: http://img194.imageshack.us/img194/2483/afterlavsplitterhangwit.png It seems to happen only in one constellation: the current LAV Splitter -> MainConcept MPEG-2 decoder (SDK 9) with bad DXVA bitstream decoding, and using File->Close during playback. After the hang, the MPC-HC test build told me this and stopped working. After a while Windows seemed to recover by itself, though it did some strange things for a while: when I tried to open another file after the hang with a new MPC-HC instance and LAV Splitter, it showed "Opening" for several minutes and nothing happened (an entirely different file, whereas this error message normally indicates an access violation), and when I tried to open Process Explorer it didn't respond for a long time, until 3 Process Explorer windows opened up ;) A typical stall situation, already known from NT5 it seems (though this is the first time I've encountered one on NT6 :D ).

Hera
26th November 2011, 20:20
For the previous build (avoiding this one), on XP, opening a second file seems to crash the player.

RGold
29th November 2011, 17:13
I get black screen when I'm launching MKV files in full screen with build 3845i.

Hera
29th November 2011, 22:30
I am having a hard time maintaining 24 Hz on an AMD laptop, which is weird: it drops frames and the GPU line zigzags out of control... Not codec related.

Sometimes it de-syncs and causes audio-delay. Pausing and starting it again fixes this issue.

JanWillem32
5th December 2011, 00:47
Firstly, I'm sorry I've been totally absent lately. My studies and such take up all my attention right now. It's a mess.
Anyway, I've added frame dropping code to the new scheduler (not in the constant frame interpolator yet) and added automatic chroma detection for chroma up-sampling functions (can still be disabled to test or for use with custom pixel shaders) to the renderer. I've re-written the code for the basic timer in x64 assembly (x86 is on a todo list). Lastly, I've edited the surface eraser function for the subtitle renderer.
Sorry that I haven't had the time to do more or answer questions right now. I'll try to free up some time in a few days to handle things as usual.

G_M_C
5th December 2011, 08:05
Quote: Firstly, I'm sorry I've been totally absent lately. My studies and such take up all my attention right now. It's a mess. [...]

Np, studies take priority. You are absolutely right on that.

XRyche
5th December 2011, 15:55
I take it that the SSE2 builds are still better (if you can use them.) than the SSE builds, right?

Hera
5th December 2011, 17:08
EDIT: Nevermind, this needs more testing.
EDIT2: When it comes to non-DXVA playback, Haali Renderer is still superior. Tested with 8 buffers. Haali Renderer did not de-sync where EVR:CP de-synced.

jackbluray
22nd December 2011, 10:26
Blu-ray subtitles are not displayed correctly. They disappear too fast. With ffdshow it works very well.

file here
http://www.megaupload.com/?d=24ZO2T1Y

bug tracker
https://sourceforge.net/apps/trac/mpc-hc/ticket/1907

Hera
24th December 2011, 07:12
What happened to JanW?

golagoda
24th December 2011, 08:59
What happened to JanW?

According to his profile he was last online today; he probably just doesn't have as much time as he used to, for whatever reason.

JanWillem32
24th December 2011, 20:45
Happy holidays!
Sorry I haven't been around for a while. Studies and family issues made me choose to restrict my area of activities. (Developing software for the projects I'm involved in is quite time consuming.)
I'm at my parents' house right now to spend the holidays with them.
I did compile before I left home. I'll make some time to process the questions I received and add the changelog later.

gilic
24th December 2011, 22:45
The new release (x86) crashes instantly for me, even without opening a file.

JanWillem32
25th December 2011, 01:26
I've confirmed that. The executables that came out of the compiler worked fine when I tested them, so I guess something went wrong during copying or archiving. The x64 version I've just downloaded runs fine. I'll upload an x86 version (or a new set) once I'm home again.

shadewither
29th December 2011, 12:10
Quoted from trunk MPC-HC thread:
If I ever get the DirectX 11 renderer in a working state (I can't get a mixer to work, unfortunately), I will most certainly not add any of the original scheduling and VSync code.
Does your mixer break the data flow, or does it malfunction in some graphs?
Maybe you can make a fake mixer+presenter combo and channel data out?

JanWillem32
29th December 2011, 17:25
The problem is that I don't have a custom mixer at all. If I did, I would have added a module like the one that handles the external EVR or VMR-9 mixers to the DirectX 9 allocator-presenter already. The mixer is the part that negotiates and accepts a DirectShow or Media Foundation video stream, transforms it into working surfaces in video memory, and in the case of EVR or VMR-9, also performs some basic initial filter passes.
I just never figured out how to accept a pin from the graph builder, and read the raw bytes exported by the video decoder. What I can easily do is initialize surfaces from a local unencoded resource in memory without using EVR or VMR-9.
It's too bad I haven't been able to get one to work, as I really don't like the superimposed filters that VMR-9 and EVR apply.
Those mixers will also only bind to DirectX 9 devices. I'm used to programming with DirectX 10 and 11. The interfaces and methods those have are a lot more modern and efficient, and they come with very decent minimum hardware requirements. DirectX 9 has a huge amount of legacy stuff to deal with. I learned the hard way that when working with DirectX 9, I have to insert legacy support options, legacy support tests and fallback functions in every nook and cranny of the renderer code.

Hera
29th December 2011, 18:57
Two common problems with the previous 32/64 version (not dfr3913i):
- Sometimes crashes while seeking
- Sometimes crashes when opening the nth file

shadewither
29th December 2011, 20:48
Happy holidays!
Sorry I haven't been around for a while. Studies and family issues made me choose to restrict my area of activities. (Developing software for the projects I'm involved in is quite time consuming.)
I'm at my parents' house right now to spend the holidays with them.
I did compile before I left home. I'll make some time to process the questions I received and add the changelog later.

x64: http://www.mediafire.com/?8ujd5fbyapnn2tv
A small bug: some necessary shaders don't seem to be enabled.

If all of the conditions below are met:
1. surface is not 8-bit;
2. no pre-/post- shader enabled;
3. at 100% size.
(IOW, change the surface format and leave the rest of the settings at default)

Symptoms:
1. wrong color (as if in linear RGB?);
2. "display stats"/"remaining time" won't display.

System:
Windows 7 x64, Aero disabled
EVR input pin is NV12

JanWillem32
29th December 2011, 23:15
EVR indeed doesn't handle changes on-the-fly very well. Changing the base color format while rendering invokes a full device reset and the renderer will try to release all resources to make that happen. Unfortunately, the EVR mixer adds a lot of shadow references to various DirectX 9 rendering objects, and the DXVA helper function adds shadow references to the DirectX 9 Ex device and Direct3D 9 Ex base object. Those are a bit difficult to control. It's even worse when changing from rendering on one video card to another (by dragging the player from one screen to another). That operation is still often fatal.
Note that inheriting (default) settings from the trunk build can cause problems in certain cases.
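
To illustrate why those shadow references get in the way of a reset, here is a toy model of COM-style reference counting. The class and method names are invented for this sketch and are not Direct3D or EVR interfaces; the only real-world fact it leans on is that a Direct3D 9 device reset requires the resources tied to the device to be released first.

```python
class Resource:
    """Toy COM-style object: freed only when its reference count reaches zero."""

    def __init__(self, name):
        self.name = name
        self.refs = 1  # the creator holds the first reference

    def add_ref(self):
        self.refs += 1
        return self.refs

    def release(self):
        self.refs -= 1
        return self.refs


class Device:
    """Toy device that refuses to reset while any resource is still alive."""

    def __init__(self):
        self.resources = []

    def create(self, name):
        res = Resource(name)
        self.resources.append(res)
        return res

    def reset(self):
        # Forget resources that were fully released, then check what is left.
        self.resources = [r for r in self.resources if r.refs > 0]
        return not self.resources  # True means the reset can go ahead
```

If a second party (standing in for the mixer here) calls `add_ref()` on a surface and never releases it, `reset()` keeps returning False even after the renderer has released its own reference. That is roughly the situation described above, just without the driver in the picture.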

Hera
2nd January 2012, 00:48
OGM/XviD stuff seems to crash when you're in non-D3D fullscreen and try to get to the seek bar.
The audio keeps playing, but the player fully crashes.

CruNcher
5th January 2012, 15:46
@JanWillem32
First, a happy new year to you and your family.

And now to something I experienced recently during Quick Sync decoding tests: http://forum.doom9.org/showpost.php?p=1549083&postcount=386
The MPC-HC experimental version was dfr3882i SSE 32-bit.

PS: Will there be a 32-bit compile again?

JanWillem32
19th January 2012, 01:30
The links are in the OP.

I've split x64, x86 SSE and x86 SSE2 code paths for a lot of functions: renderer class initializer, VSync (slightly), device creator/reset, color management (heavy modifications, but unfortunately it's only on the renderer side) and image saving function (heavy modifications, the previous copy function was unsuitable).
Little CMS has been updated again, so I'd like to evaluate if the color management functions all work properly.
The alternative scheduler was renamed, and now uses a different frame time estimation method. The code is shared with the constant frame interpolator. Refresh rate estimation for this scheduler is still statically modified by the "Refresh Rate Adjustment" item in the Output tab of the options screen.
For a better view on the scheduler, I've expanded the stats screen jitter graph (and right-aligned it).

Items to do next:
I suspect that the presenter modes used with Aero (both schedulers) and the constant frame interpolator are demanding too much exclusive access to system resources and reserve a lot of those as well. For some reason this doesn't seem to really happen at all when the D3D exclusive mode is used. For example, during testing I've seen differences of more than 200 MB in the video memory pool. That pool includes both used and reserved memory, but I can't get a clear view of it on a per-object basis. I can't get a reading on the different responses of the device I create for the renderer and the allocations done by the operating system and drivers. I'll prioritize some effort to solve this issue.
The windowed full screen seek bar is incompatible with the renderer. I don't know yet if it's easier for me to fix that bar or create a new bar and OSD system for the exclusive mode.
I've lined up both Bézier and Lanczos resizers for integration. I'm still wondering about the filter weights issue I posted about earlier: http://forum.doom9.org/showthread.php?p=1540386#post1540386 . Maybe someone knows more about compensation functions for windowed sinc interpolation?
I've written a lot of text over time, and a Russian website already hosts a guide on MPC-HC featuring some of that, but any guide in English is either very old or very incomplete. The newer texts written on most topics are in bits and pieces. (The guide to the stats screen is pretty much done though.) It all needs compiling and checking for grammar, spelling and completeness. I'll happily write some extra bits about some topics, but I'll probably have to ask someone else to put the guide together.
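
On the filter weights question in the Lanczos item above: one common compensation for windowed sinc resampling, and only an assumption about what could apply here, is to divide the sampled weights by their sum, so that the weights for each output pixel add up to exactly 1 and flat areas keep their brightness at every sub-pixel phase. A minimal sketch in Python (the function names are made up for this example, and it is not the renderer's shader code):

```python
import math

def lanczos(x, taps=3):
    """Lanczos kernel: sinc(x) * sinc(x / taps) for |x| < taps, else 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= taps:
        return 0.0
    px = math.pi * x
    return taps * math.sin(px) * math.sin(px / taps) / (px * px)

def lanczos_weights(phase, taps=3):
    """Weights for the 2*taps source samples around an output position.

    `phase` in [0, 1) is the offset of the output sample from the source
    sample on its left. Returns (raw, normalized) weight lists.
    """
    # Source sample indices relative to the left neighbour.
    offsets = range(-taps + 1, taps + 1)
    raw = [lanczos(phase - k, taps) for k in offsets]
    total = sum(raw)
    return raw, [w / total for w in raw]
```

For taps=3 at phase 0.5 the raw weights sum to about 0.994 rather than 1, which is the kind of small gain error the normalization removes. Whether plain normalization is enough, or a different compensation function is needed, is exactly the open question.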

@Hera and CruNcher: I'll take a look at both issues. The windowed full screen bar will be a tough cookie though.