17th November 2011, 21:02 | #702 | Link |
契約者
Join Date: Jun 2008
Posts: 1,576
|
lanczos is a shortcut for bicubic with taps=3 or something like that. don't really remember.
lanczos4 is a bicubic too, but with taps=4.
spline on the other hand is somewhat different than lanczos and "many people think it gives the best result"
Last edited by Keiyakusha; 17th November 2011 at 21:05. |
17th November 2011, 21:04 | #703 | Link | |
Registered User
Join Date: Nov 2010
Posts: 15
|
Quote:
|
|
17th November 2011, 21:17 | #705 | Link | |
Registered User
Join Date: Nov 2010
Posts: 15
|
Quote:
Just Google to find numerous discussions regarding advantages or disadvantages for Lanczos vs. Bicubic. |
|
17th November 2011, 21:28 | #707 | Link |
Registered User
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,083
|
The bicubic filters really don't look a thing like the windowed sinc filters. I know the base filter, but I very much prefer a processed convolution using the weights functions.
base Lanczos sample weight filter:
Code:
#include <math.h>

#define PI acos(-1.0) // this will generate the number PI with full precision, it is useful for example with the sin, cos and tan functions

double LanczosFilter(double x, double radius)
{
    if (x == 0) return 1;
    if (x < 0) x = -x; // intrinsic sign(...) is faster as it only reads 1 bit, but for a quick sample this is sufficient
    if (x >= radius) return 0; // cutoff if beyond 2 for lanczos2, 3 for lanczos3, radius is normally a static const
    return sin(x*PI)*sin(x*PI/radius)/(x*x*PI*PI/radius);
}
__________________
development folder, containing MPC-HC experimental tester builds, pixel shaders and more: http://www.mediafire.com/?xwsoo403c53hv Last edited by JanWillem32; 17th November 2011 at 21:39. Reason: removed a comment |
17th November 2011, 23:14 | #709 | Link |
Registered User
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,083
|
Lanczos2, probably correct in function, but not optimized:
Code:
// (C) 2011 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// prototype Lanczos2 height resizer
// This shader should be run as a screen space pixel shader if you are up-scaling.
// This shader should not be run as a screen space pixel shader if you are down-scaling.
// This shader requires compiling with ps_2_0, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// If possible, avoid compiling with the software emulation modes (ps_?_sw). Pixel shaders require a lot of processing power to run in real-time software mode.
// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
// Use this shader to scale the height of an image by Lanczos2 interpolation.

// fractions, either decimal or not, are allowed
// set the magnification factor
#define Magnify (4/3.)

sampler s0;
float2 c0;
float2 c1;
#define sp(a, b) float4 a = tex2D(s0, float2(tex.x, coord+b*c1.y));
#define PI acos(-1)// this will generate the number PI with full precision

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    float coord = (tex.y/Magnify+.5-.5/Magnify)*c0.y;// assign the output position, normalized to texture height in pixels
    float t = frac(coord);// calculate the difference between the output pixel and the original surrounding two pixels
    coord = (coord-t+.5)*c1.y;// adjust sampling matrix to put the output pixel in the interval [Q1, Q2)
    sp(Q1, 0)// nearest original pixel to the top
    if (t) {
        sp(Q0, -1) sp(Q2, 1) sp(Q3, 2)// original pixels
        float Lanczos2wQ0 = sin((1.+t)*PI)*sin((1.+t)*PI/2.)/(pow(1.+t, 2)*PI*PI/2.);
        float Lanczos2wQ1 = sin(t*PI)*sin(t*PI/2.)/(t*t*PI*PI/2.);
        float Lanczos2wQ2 = sin((1.-t)*PI)*sin((1.-t)*PI/2.)/(pow(1.-t, 2)*PI*PI/2.);
        float Lanczos2wQ3 = sin((2.-t)*PI)*sin((2.-t)*PI/2.)/(pow(2.-t, 2)*PI*PI/2.);
        return Lanczos2wQ0*Q0+Lanczos2wQ1*Q1+Lanczos2wQ2*Q2+Lanczos2wQ3*Q3;}// interpolation output
    return Q1;// case float t == 0 is required to return sample Q1, because of a possible division by 0
}

// (C) 2011 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// prototype Lanczos2 width resizer
// This shader should be run as a screen space pixel shader if you are up-scaling.
// This shader should not be run as a screen space pixel shader if you are down-scaling.
// This shader requires compiling with ps_2_0, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// If possible, avoid compiling with the software emulation modes (ps_?_sw). Pixel shaders require a lot of processing power to run in real-time software mode.
// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
// Use this shader to scale the width of an image by Lanczos2 interpolation.

// fractions, either decimal or not, are allowed
// set the magnification factor
#define Magnify (4/3.)

sampler s0;
float c0;
float c1;
#define sp(a, b) float4 a = tex2D(s0, float2(coord+b*c1, tex.y));
#define PI acos(-1)// this will generate the number PI with full precision

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    float coord = (tex.x/Magnify+.5-.5/Magnify)*c0;// assign the output position, normalized to texture width in pixels
    float t = frac(coord);// calculate the difference between the output pixel and the original surrounding two pixels
    coord = (coord-t+.5)*c1;// adjust sampling matrix to put the output pixel in the interval [Q1, Q2)
    sp(Q1, 0)// nearest original pixel to the left
    if (t) {
        sp(Q0, -1) sp(Q2, 1) sp(Q3, 2)// original pixels
        float Lanczos2wQ0 = sin((1.+t)*PI)*sin((1.+t)*PI/2.)/(pow(1.+t, 2)*PI*PI/2.);
        float Lanczos2wQ1 = sin(t*PI)*sin(t*PI/2.)/(t*t*PI*PI/2.);
        float Lanczos2wQ2 = sin((1.-t)*PI)*sin((1.-t)*PI/2.)/(pow(1.-t, 2)*PI*PI/2.);
        float Lanczos2wQ3 = sin((2.-t)*PI)*sin((2.-t)*PI/2.)/(pow(2.-t, 2)*PI*PI/2.);
        return Lanczos2wQ0*Q0+Lanczos2wQ1*Q1+Lanczos2wQ2*Q2+Lanczos2wQ3*Q3;}// interpolation output
    return Q1;// case float t == 0 is required to return sample Q1, because of a possible division by 0
}
|
17th November 2011, 23:29 | #710 | Link |
.NET Web App Dev
Join Date: May 2010
Location: USA
Posts: 291
|
When the CPU cannot keep up, unlike Haali Renderer, EVR desyncs to a certain point and then cuts out audio and it is disastrous.
When you say "code to allow dropping frames" - will this mean no audio dropouts when the CPU cannot keep up? Under this situation, if the CPU *always* cannot keep up - can the behavior match/exceed what Haali Renderer does?
Off-topic question: is the "#define PI acos(-1)" cached by the compiler?
__________________
Intel i7 5820k / 16 GB DDR4 / NV 970 / 4K ASUS Windows 8.1 |
18th November 2011, 00:22 | #711 | Link |
Registered User
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,083
|
The new schedulers indeed won't drop frames (these will only try to modulate ready frames if it's really late or early), unless desynchronization by the global timer is detected (which allows huge time differences of the audio and video timers). In that case, timers for both the video and audio renderer are reset. The old scheduler will still often drop ready frames from the mixer, or prevent frames from decoding, with any sort of interruption in my case.
The "#define PI acos(-1)" just converts to a constant floating-point number. You can take a look at what a pixel shader stores in its registers by opening the "Shader editor", making the compiler compile some code, extending the gray bottom box to a readable size, and scrolling through to the output assembly. (A somewhat faster alternative for developers is to compile a .TXT file, using the fxc.exe utility in the DirectX SDK.)
Code:
// Parameters:
//
//   float c0;
//   float c1;
//   sampler2D s0;
//
//
// Registers:
//
//   Name         Reg   Size
//   ------------ ----- ----
//   c0           c0       1
//   c1           c1       1
//   s0           s0       1
//

    ps_3_0
    def c2, 0.375, 0.3125, 0.5, 2
    def c3, 1, 1.57079637, 6.28318548, -3.14159274
    def c4, 4.93480206, 1, 2, 0
    dcl_texcoord v0.xy  // tex<0,1>
    dcl_2d s0

#line 27
    mad r0.x, v0.x, c2.x, c2.y
    mul r0.y, r0.x, c0.x  // ::coord<0>
    frc r0.y, r0.y  // ::t<0>

#line 34
    add r0.z, r0.y, c3.x
    mad r0.w, r0.z, c2.z, c2.z
    frc r0.w, r0.w
    mad r0.w, r0.w, c3.z, c3.w
    sincos r1.y, r0.w
    mul r0.w, r0.z, c3.y
    mul r0.z, r0.z, r0.z
    mul r0.z, r0.z, c4.x
    rcp r0.z, r0.z
    sincos r2.y, r0.w
    mul r0.w, r1.y, r2.y
    mul r0.z, r0.z, r0.w  // ::Lanczos2wQ0<0>
    mul r1.xy, r0.y, c3_abs.wyzw
    sincos r2.y, r1.x
    sincos r3.y, r1.y
    mul r0.w, r2.y, r3.y
    mul r1.x, r0.y, r0.y
    mul r1.x, r1.x, c4.x
    rcp r1.x, r1.x
    mul r0.w, r0.w, r1.x  // ::Lanczos2wQ1<0>

#line 29
    mad r0.x, r0.x, c0.x, -r0.y
    add r0.x, r0.x, c2.z
    mul r1.x, r0.x, c1.x  // ::coord<0>

#line 33
    mov r1.yw, v0.y
    texld r2, r1, s0  // ::Q1<0,1,2,3>
    mov r3.w, c2.w
    mad r3.z, c1.x, r3.w, r1.x

#line 38
    mul r4, r0.w, r2

#line 33
    mad r1.z, r0.x, c1.x, -c1.x
    mad r3.x, r0.x, c1.x, c1.x
    texld r1, r1.zwzw, s0  // ::Q0<0,1,2,3>

#line 38
    mad r1, r0.z, r1, r4
    add r0.xz, -r0.y, c4.yyzw
    mul r4.xyz, r0.xxzw, c3_abs.wyyw
    sincos r5.y, r4.x
    sincos r6.y, r4.y
    sincos r7.y, r4.z
    mul r0.w, r5.y, r6.y
    mul r4.xy, r0.xzzw, r0.xzzw
    mad r0.x, r0.z, c2.z, c2.z
    frc r0.x, r0.x
    mad r0.x, r0.x, c3.z, c3.w
    sincos r5.y, r0.x
    mul r0.x, r7.y, r5.y
    mul r4.xy, r4, c4.x
    rcp r0.z, r4.x
    rcp r4.x, r4.y
    mul r0.x, r0.x, r4.x  // ::Lanczos2wQ3<0>
    mul r0.z, r0.z, r0.w  // ::Lanczos2wQ2<0>

#line 33
    mov r3.yw, v0.y
    texld r4, r3, s0  // ::Q2<0,1,2,3>
    texld r3, r3.zwzw, s0  // ::Q3<0,1,2,3>

#line 38
    mad r1, r0.z, r4, r1
    mad r1, r0.x, r3, r1  // ::main<0,1,2,3>
    cmp oC0, -r0.y, r2, r1  // ::main<0,1,2,3>

// approximately 114 instruction slots used (4 texture, 110 arithmetic)
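The `def` lines in the listing show the folding: the literals match π/2, 2π, -π and π²/2 rounded to single precision. The C sketch below reproduces those values. The function names are arbitrary, and the remark about `Magnify` is an inference from the constants 0.375 and 0.3125 (they fit a magnification of 8/3 rather than the 4/3 in the posted source), not something stated in the post.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Values that folding "#define PI acos(-1)" at compile time would be
   expected to leave in the constant registers, as single-precision floats. */
float folded_half_pi(void)    { return (float)(acos(-1.0) / 2.0); }
float folded_two_pi(void)     { return (float)(acos(-1.0) * 2.0); }
float folded_neg_pi(void)     { return (float)(-acos(-1.0)); }
float folded_pi_sq_half(void) { return (float)(acos(-1.0) * acos(-1.0) / 2.0); }

/* The two terms of the coordinate mapping: 1/Magnify and .5 - .5/Magnify.
   They appear to correspond to c2.x and c2.y in the listing (assumption). */
float scale_term(double magnify)
{
    assert(magnify > 0.0);
    return (float)(1.0 / magnify);
}

float offset_term(double magnify)
{
    assert(magnify > 0.0);
    return (float)(0.5 - 0.5 / magnify);
}

/* Print the four pi-derived constants with 9 significant digits, the
   same precision the assembly listing uses. */
void print_folded_constants(void)
{
    printf("%.9g %.9g %.9g %.9g\n",
           folded_half_pi(), folded_two_pi(),
           folded_neg_pi(), folded_pi_sq_half());
}
```

So no `acos` call survives into the shader; only the `sincos` instructions for the `t`-dependent terms remain at run time.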
Last edited by JanWillem32; 18th November 2011 at 00:43. Reason: added a comment |
18th November 2011, 09:29 | #712 | Link | |
Registered User
Join Date: Feb 2006
Posts: 1,076
|
Quote:
|
|
18th November 2011, 09:40 | #713 | Link |
Registered User
Join Date: Aug 2011
Posts: 98
|
+1 for Lanczos. I use it with 4 taps in madVR and it gives results comparable to spline with 3 or 4 taps. Most people seem to prefer lanczos to spline though; for the most part it depends on how much you hate ringing, as lanczos gives a bit more.
Take your time, and I look forward to it if you add it. |
18th November 2011, 09:44 | #714 | Link |
Registered User
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,083
|
mpc-hc SSE tester dfr3836i
Thanks for the tips, maybe I'll find something in the Avisynth code. I'll definitely take some time to add a set of resizers. It's a lot of work to get the shader code running, and even more work to write the menu entries (the main reason I added a whole set of six resizers at once the last time).
I've taken a look at the external filters. For the renderers it was easier to solve than I thought at first. (I was afraid it was one of the changes to the subtitle renderer.) I've also fixed a minor bit of memory management for the subtitle renderer.
Last edited by JanWillem32; 5th December 2011 at 01:38. Reason: removed old link |
19th November 2011, 05:35 | #715 | Link |
Registered User
Join Date: Sep 2011
Posts: 33
|
Initial test looks good. No tearing.
Question regarding mixer format: With the official version I get NV12 for both input and output. With this build I get NV12 input and x8R8G8B8 output. What is the difference between these mixer formats?
__________________
Windows 7 SP1 X64 | AMD PHENOM II X4 P960 | ATI 5450 | 6GB RAM |
19th November 2011, 08:13 | #716 | Link |
Registered User
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,083
|
I changed the stats screen a bit to show what raw format is transferred from the decoder to the mixer, and what format is transferred from the mixer to the renderer. The stats screen of the renderer in the trunk build isn't very clear on this point, but it really does the same thing.
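For readers wondering what the two format names mean: NV12 is a planar 4:2:0 YUV layout (a full-resolution luma plane followed by a half-height plane of interleaved chroma pairs), while X8R8G8B8 is packed RGB with 8 bits per channel plus 8 unused bits. The C sketch below only illustrates the memory layouts, assuming tightly packed rows with no pitch padding. The function names are made up and nothing here is taken from the MPC-HC source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a tightly packed NV12 frame: an 8-bit Y plane plus a
   half-height plane of interleaved U,V bytes (12 bits per pixel). */
size_t nv12_frame_bytes(size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height + width * (height / 2);
}

/* Bytes needed for a tightly packed X8R8G8B8 frame (32 bits per pixel). */
size_t x8r8g8b8_frame_bytes(size_t width, size_t height)
{
    return width * height * 4;
}

/* Byte offset of the U sample that covers pixel (x, y) in an NV12 frame;
   the matching V sample is the next byte. One U,V pair is shared by a
   2x2 block of pixels. */
size_t nv12_u_offset(size_t width, size_t height, size_t x, size_t y)
{
    assert(x < width && y < height);
    return width * height + (y / 2) * width + (x / 2) * 2;
}

/* Pack one pixel as a 32-bit X8R8G8B8 value: unused X in the top byte,
   then R, G and B. */
uint32_t pack_x8r8g8b8(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}
```

For a 1920x1080 frame this works out to 3,110,400 bytes in NV12 against 8,294,400 bytes in X8R8G8B8, which is why decoders usually hand over NV12 and the conversion to RGB happens later in the chain.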
|
19th November 2011, 10:46 | #717 | Link | |
Registered User
Join Date: Feb 2006
Posts: 1,076
|
Quote:
Is it possible to find out what format the renderer actually outputs to the display? (Reason for this: to verify that 10-bit is actually output, and that no dithering is done in/by the renderer.) |
|
19th November 2011, 12:11 | #718 | Link |
Registered User
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,083
|
The renderer itself can only dither when it's commanded to do so (setting dithering level 0 will definitely disable it). As to the capacity of the driver to dither down a 10-bit front buffer (in exclusive mode) to 8-bit, I've never seen it happen. My main working display is analog (analog connections on my HD4890 are fed by a 10-bit DAC) and my secondary device accepts 10-bit input easily through HDMI. Maybe someone can try it out again with an old 8-bit DVI-D monitor on a video card that's generally capable of 10-bit output, but of course won't be able to deliver it to such a monitor. Will the monitor give a black screen, will the driver refuse to enable 10-bit output mode (so it can be seen in the stats screen), fake the 10-bit output by rounding to 8-bit, or fake the 10-bit output by dithering to 8-bit?
The last option is actually unlikely. Dithering is pretty heavy, so it's a task for the shadercore (biggest block of transistors in a GPU). When the data is written to a back buffer (which later on shifts places to become a front buffer), the shadercore generally doesn't read or write to/from it again. Reading from the front buffer is done by the micro-controllers for the hardware output ports. These do have some logic on board to convert the front buffer format in memory to signals to output on TMDS (DVI/HDMI), Mini-packet (DP) or analog (through a DAC). I can't imagine that any of those micro-controllers would have a dithering unit on board. The other three options can simply be observed. Note that some digital displays will actually report the type and bit depth of the incoming signal.
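To illustrate the difference between the rounding and dithering options listed above, here is a minimal C sketch that reduces 10-bit values to 8 bits both ways. The 2x2 ordered-dither matrix and the function names are assumptions chosen for brevity. This is not what any particular driver does, nor the renderer's own dithering stage.

```c
#include <assert.h>
#include <stdint.h>

/* Round a 10-bit value (0..1023) to 8 bits by dropping the two low bits
   with round-to-nearest, clamped to 255. Every pixel with the same input
   gets the same output, so the two low bits are simply lost. */
uint8_t round_10_to_8(uint16_t v)
{
    unsigned r;
    assert(v <= 1023);
    r = (v + 2u) >> 2;
    return (uint8_t)(r > 255u ? 255u : r);
}

/* Ordered dithering with a 2x2 Bayer matrix: a position-dependent offset
   of 0..3 is added before the low bits are dropped. Away from the clamp
   at the top of the range, the four outputs of a 2x2 block add up to the
   10-bit input, so the block average keeps the discarded bits. */
uint8_t dither_10_to_8(uint16_t v, unsigned x, unsigned y)
{
    static const unsigned bayer[2][2] = { { 0u, 2u }, { 3u, 1u } };
    unsigned r;
    assert(v <= 1023);
    r = (v + bayer[y & 1u][x & 1u]) >> 2;
    return (uint8_t)(r > 255u ? 255u : r);
}
```

With a flat 10-bit value of 514, rounding yields 129 everywhere, while the dithered version alternates between 128 and 129 so that a 2x2 block averages 128.5. A test pattern with a shallow 10-bit gradient would make the two cases easy to tell apart by eye: banding for rounding, a fine checker pattern for this kind of dithering.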
|
19th November 2011, 12:33 | #719 | Link | |
Registered User
Join Date: Feb 2006
Posts: 1,076
|
Quote:
|
|
19th November 2011, 14:01 | #720 | Link |
Registered User
Join Date: Oct 2006
Posts: 150
|
Apart from the crashes when resizing, dfr3810i was the last good build working for me. Ever since the new scheduler build I just can't get smooth playback of 1080p content with CUVID decoding, whether I tick the Disable Scheduler or not. However, I don't think it's exactly related to the new scheduler, because it also happens with VMR9r which doesn't even have the new scheduler. During playback at 16 bit the framerate would drop erratically, but playback is fine if I switch to a 720p video or 8 bit or use software decoding. At first it seemed the problem was because video memory was getting filled but GPU-Z reports the same usage for 3810i and 3836i. Moreover, the problem doesn't happen when Aero is disabled or I use exclusive mode. GPU usage stays below 70% so it's probably not due to workload.
3833i seemed to have a video memory leak in conjunction with CUVID decoding but it seems to have been fixed in 3836, so that's good. |