Video pixel shader pack [Archive] - Page 7

Qaq

30th August 2011, 18:20

Quote: Can you test the "draw grid coordinates" shader? It should draw single-pixel wide RGB lines horizontally and vertically when full picture resizing and aspect ratio correction are disabled. If there's something wrong with the processing after the renderer output, the lines will be imperfect.
Tried the shader with 1080 video. The picture looks exactly the same in MPC-HC and in an image viewer.
http://lostpic.net/thumbs/5a4f2f5483a7601144e257d3565c1ba0.png (http://lostpic.net/?view=5a4f2f5483a7601144e257d3565c1ba0)
Can't say I understand what to call perfect and imperfect here, but the lines seem straight at least.
BTW, I found that the picture looks completely different if I switch the HDMI input between YCC and RGB. Seems like this shader is a good test for the HDMI chroma sub-sampling bug too.

JanWillem32

31st August 2011, 07:53

The "draw grid coordinates" is exactly to test for chroma sub-sampling errors. The other function is a geometry checkup. If the lines are not projected as single-pixel wide on screen, because of errors with overscan/underscan, resolution or geometry settings, the lines will be blended. Failure to map pixels 1:1 to the screen is a common problem that causes visible errors.
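A minimal sketch of why a failed 1:1 mapping blends the lines, in Python. It assumes a plain linear-interpolation scaler (real display scalers may use other filters), and `resample_linear` is a made-up helper for this illustration:

```python
def resample_linear(row, out_len):
    """Resample a 1-D row of pixels to out_len pixels with linear interpolation."""
    n = len(row)
    out = []
    for i in range(out_len):
        x = (i + 0.5) * n / out_len - 0.5   # source position of the output pixel centre
        x = min(max(x, 0.0), n - 1.0)       # clamp at the borders
        i0 = min(int(x), n - 2)             # left neighbour
        f = x - i0                          # fraction towards the right neighbour
        out.append(row[i0] * (1.0 - f) + row[i0 + 1] * f)
    return out

line = [0, 0, 0, 1, 0, 0, 0, 0]             # one single-pixel-wide line

mapped_1to1 = resample_linear(line, 8)      # 1:1 mapping: the line stays intact
overscanned = resample_linear(line, 9)      # slight stretch: the line is smeared

print(mapped_1to1)
print(overscanned)
```

With the 1:1 mapping the line keeps its full value in one pixel; with the slight stretch it is spread over two neighbouring pixels at reduced intensity, which is the blending described above.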

JanWillem32

14th September 2011, 09:27

I've just added version 2 of the YCbCr-type sharpen complex test.
These shaders are a bit better at banding detection, especially for chroma. It was just somewhat difficult to configure a reasonable sharpness-to-debanding ratio for them.
For those that use the chroma up-sampling sets, the alternatives for the three 4:2:2 up-sampling shaders are included. Others can use one of the two "RGB to Y'CbCr for SD&HD video input"-type shaders to pre-process to Y'CbCr mode.
If anything could use improvement, please tell me.

TheElix

14th September 2011, 09:48

Please, could you tell me how this YCbCr-type version differs from the usual one, when one should prefer it over the usual version, and what benefits it would give?

JanWillem32

14th September 2011, 11:02

The regular ones work on linear RGB. That's near-optimal for input with a reasonably linear response, a large gamut, a well-defined color interval at [0, 1] and no chroma sub-sampling issues. Unfortunately, consumer-grade video fails on all four accounts.
It's usually chroma sub-sampled. I've seen reasonably proper Blu-ray masters that greatly overstep their nominal range (Y'CbCr, intervals [16, 235], [16, 240], [16, 240]). The source gamut of HD video is limited to just sRGB, the gamut of SD video is even less than that. The Cb and Cr channels are reasonably linear (these are usually overpowered in presence by the luma channel in the matrix), but the Y' channel is a very bad approximation to linear lightness.
Adapting the filter to work on the flaws of the source video in Y'CbCr colors is a good idea, but designing the shader was hard. The first few versions I made looked terrible; they couldn't keep a good distinction between the areas to deband.
I'm actually pretty satisfied with the filter as it is now, but perhaps the main function could use some extra customization user settings apart from the regular "NoiseLevel".
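To make the nominal range concrete, here is a small Python sketch, assuming 8-bit codes and the usual studio-range scaling. The function name is invented for this example:

```python
def normalize_ycbcr8(y, cb, cr):
    """Map 8-bit studio-range Y'CbCr codes to Y' in [0, 1] and Cb/Cr in [-0.5, 0.5]."""
    return ((y - 16) / 219.0, (cb - 128) / 224.0, (cr - 128) / 224.0)

# Nominal black and nominal white map to 0 and 1.
print(normalize_ycbcr8(16, 128, 128))
print(normalize_ycbcr8(235, 128, 128))

# A master that oversteps the nominal range ends up outside [0, 1].
print(normalize_ycbcr8(255, 128, 128))
print(normalize_ycbcr8(4, 128, 128))
```

Any filter that assumes a well-defined [0, 1] interval has to cope with such out-of-range values.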

TheElix

14th September 2011, 14:56

On what content do you recommend this filter to be used? I want to make some screenshot comparisons.

JanWillem32

14th September 2011, 15:11

Any standard Y'CbCr video will do fine. Try a variety, while adjusting "NoiseLevel" to a suitable value for each. Of course, the values for sharpening are by user preference. The current version is very sensitive to the sharpening parameters. A little too low blurs too much, a little too high and the typical sharpening artifacts become very visible.

TheElix

14th September 2011, 15:18

Isn't all DVD/BD content Y'CbCr? As well as TV broadcasts... Also, adjusting parameters in a shader for each video is a little extreme. But for the sake of experiment....

JanWillem32

14th September 2011, 15:36

Pretty much all consumer-grade digital video uses Y'CbCr. I wouldn't use this kind of filter on superior formats, like when loading a standard BMP through the still image filter, for instance.
I've always supplied 5 versions of the same shader, each with only the "NoiseLevel" parameter changed (values .625, .75, 1, 1.5 and 2.5).

TheElix

14th September 2011, 21:16

Regular chain: http://img7.imageshack.us/img7/6919/regularshaders.png
With YCbCr-type 3. special 4÷2÷2 cubic B-spline5 chroma up-sampling: http://img190.imageshack.us/img190/6629/ycbcrtypeshaders.png

Qaq

14th September 2011, 22:49

Yeah, I remember it was red when I did something wrong. TheElix, make sure you haven't missed anything from the shader code.

TheElix

14th September 2011, 23:53

Nothing is missed with ctrl+A, ctrl+V.

JanWillem32

15th September 2011, 05:17

The regular type requires pre-processing to linear RGB by a "gamma conversion of video RGB to linear RGB"-type shader or integrated gamma function. The regular type will not change that gamma setting, so correction of gamma just before display output is required, too.
The Y'CbCr-type shader requires Y'CbCr pre-processing by a "RGB to Y'CbCr for SD&HD video input"-type shader (used on non-linear input). For those that already use multi-pass chroma interpolation, it would be rather silly to do that conversion twice, so I added alternatives to step 3, and moved the set of color controls. The Y'CbCr-type shader outputs linear RGB by default, but can be reconfigured easily.

CeeJay.dk

30th November 2011, 00:16

I posted my LumaSharpen shader (http://forum.doom9.org/showthread.php?p=1541796#post1541796) in another thread earlier and today I found this thread.

My shader is basically a Y' version of Sharpen Complex, but optimized for speed (it still achieves better quality than the original), since it's primarily meant to be used for gaming.
I'm using an injector that allows me to do post-processing of the screenbuffer of any Direct3d game and the sharpen shader helps with the many upscaled textures you find in a game.

It's my first shader ever and my first real bit of programming I've done since school 16 years ago, but I feel I've already accomplished a lot in the few days I've been working on it.

Let me know what you think.

JanWillem32

5th December 2011, 01:26

That's quite interesting. I'll take a good look when I have more time. A quick glance at the code shows sqrt over mul. That's usually an improper way to deal with vectors. Try using abs on the delta values, next you can try using the more regular dot or length intrinsics. Be careful with pixel radius. The pixel shader sampler can be set to bilinear (or a few other functions on top of that), or in the case of MPC-HC, nearest neighbor. When sampling pixels, only use whole pixel sizes as distances. At the end, there's a function that uses pow with a floating-point exponent. That function will output undefined values if negative values are input. See a few of my shaders for reference on processing such values.
Lastly, I've made a similar function: "sharpen\unsharp luma mask for SD&HD video". (Although I've never really understood the black compensation, and I don't care much for versions without deband.) The shaders for the YCbCr-type sharpen complex test are a lot more advanced (and heavy), but maybe you can get some ideas from those for some of your own code. Note that the parameters are a bit extreme in those, I'm using a somewhat edited version of "r=6" myself... Might as well post that:

// (C) 2011 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

#define NoiseLevel .75
#define Blur NoiseLevel/64.
#define EdgeSharpen 1.75*Blur
#define Sharpen0 .0625
#define Sharpen1 .140625
#define Sharpen2 .31640625
#define Sharpen3 .2109375
#define Sharpen4 .140625
#define Sharpen5 .09375
#define VideoGamma 2.4
sampler s0;
float2 c1 : register(c1);
#define sp(a, b) tex2D(s0, tex+c1*float2(a, b))
#define H0 Sharpen0*(1.125-dv3)
#define H1 Sharpen1*(1.125-dv3)
#define H2 Sharpen2*(1.125-dv3)
#define H3 Sharpen3*(1.125-dv3)
#define H4 Sharpen4*(1.125-dv3)
#define H5 Sharpen5*(1.125-dv3)
#define qp(a, b) ((dv = max(max((dv3 = abs(a)).x, dv3.y), dv3.z)) > b)?
#define D0d(a, b) qp((s1+b)/2.-a, ES) s1*(H0+1.)-a*H0
#define D0o(a, b) qp((s1+b)/3.-a, ES) s1*(H0+1.)-a*H0
#define D1d(a, b) (dv > BN)? (t1+a)/1.125*(H1+1.)-b*H1
#define D1o(a, b) (dv > BN)? (t1+a)/1.125*(H1+1.)-b/2.*H1
#define D2d(a, b, c) qp((a+c)/3.-b, BN) (t1+a+b)/2.125*(H2+1.)-c/2.*H2
#define D2o(a, b, c) qp((a+c)/3.-b/2., BN) (t1+a+b)/3.125*(H2+1.)-c/2.*H2
#define D3d(a, b, c, d) qp((b+d)/5.-c/2., BN) (t1+a+b+c)/4.125*(H3+1.)-d/4.*H3
#define D3o(a, b, c, d) qp((b+d)/6.-c/2., BN) (t1+a+b+c)/5.125*(H3+1.)-d/4.*H3
#define D4d(a, b, c, d) qp((b+d)/6.-c/4., BN) (t1+a+b+c)/8.125*(H4+1.)-d/4.*H4
#define D4o(a, b, c, d) qp((b+d)/5.-c/4., BN) (t1+a+b+c)/9.125*(H4+1.)-d/3.*H4
#define D5d(a, b, c, d) qp((b+d)/8.-c/4., BN) (t1+a+b+c)/12.125*(H5+1.)-d/4.*H5
#define D5o(a, b, c, d) qp((b+d)/8.-c/3., BN) (t1+a+b+c)/12.125*(H5+1.)-d/4.*H5
#define D6(a) (t1+a)/16.125
#define Dd(a, b, c, d, e, f) (D0d(a, b) : D1d(a, b) : D2d(a, b, c) : D3d(a, b, c, d) : D4d(a+b, c, d, e) : D5d(a+b+c, d, e, f) : D6(a+b+c+d+e+f))
#define Do(a, b, c, d, e, f) (D0o(a, b) : D1o(a, b) : D2o(a, b, c) : D3o(a, b, c, d) : D4o(a+b, c, d, e) : D5o(a+b+c, d, e, f) : D6(a+b+c+d+e+f))
float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 s1 = sp(0, 0).rgb;
float3 s2 = sp(-1, -1).rgb;
float3 s3 = sp(0, -1).rgb;
float3 s4 = sp(1, -1).rgb;
float3 s5 = sp(-1, 0).rgb;
float3 s6 = sp(1, 0).rgb;
float3 s7 = sp(-1, 1).rgb;
float3 s8 = sp(0, 1).rgb;
float3 s9 = sp(1, 1).rgb;
float3 r2 = sp(-2, -1).rgb;
float3 r3 = (sp(-1, -2)+sp(0, -2)).rgb;
float3 r4 = sp(1, -2).rgb;
float3 r5 = (sp(-2, 1)+sp(-2, 0)).rgb;
float3 r6 = (sp(2, 0)+sp(2, -1)).rgb;
float3 r7 = sp(-1, 2).rgb;
float3 r8 = (sp(0, 2)+sp(1, 2)).rgb;
float3 r9 = sp(2, 1).rgb;
float3 q2 = (sp(-2, -2)+sp(-3, -1)).rgb;
float3 q3 = (sp(-1, -3)+sp(0, -3)).rgb;
float3 q4 = (sp(2, -2)+sp(1, -3)).rgb;
float3 q5 = (sp(-3, 1)+sp(-3, 0)).rgb;
float3 q6 = (sp(3, 1)+sp(3, 0)).rgb;
float3 q7 = (sp(-1, 3)+sp(-2, 2)).rgb;
float3 q8 = (sp(0, 3)+sp(1, 3)).rgb;
float3 q9 = (sp(2, 2)+sp(3, -1)).rgb;
float3 p2 = (sp(-4, -1)+sp(-4, -2)+sp(-3, -2)+sp(-3, -3)).rgb;
float3 p3 = (sp(-2, -3)+sp(-2, -4)+sp(-1, -4)+sp(0, -4)).rgb;
float3 p4 = (sp(2, -3)+sp(3, -3)+sp(1, -4)+sp(2, -4)).rgb;
float3 p5 = (sp(-4, 2)+sp(-3, 2)+sp(-4, 1)+sp(-4, 0)).rgb;
float3 p6 = (sp(4, 0)+sp(4, -1)+sp(3, -2)+sp(4, -2)).rgb;
float3 p7 = (sp(-2, 4)+sp(-1, 4)+sp(-3, 3)+sp(-2, 3)).rgb;
float3 p8 = (sp(0, 4)+sp(1, 4)+sp(2, 4)+sp(2, 3)).rgb;
float3 p9 = (sp(3, 3)+sp(3, 2)+sp(4, 2)+sp(4, 1)).rgb;
float3 o2 = (sp(-5, -1)+sp(-5, -2)+sp(-4, -3)+sp(-3, -4)).rgb;
float3 o3 = (sp(-2, -5)+sp(-1, -5)+sp(0, -5)).rgb;
float3 o4 = (sp(4, -3)+sp(3, -4)+sp(1, -5)+sp(2, -5)).rgb;
float3 o5 = (sp(-5, 2)+sp(-5, 1)+sp(-5, 0)).rgb;
float3 o6 = (sp(5, 0)+sp(5, -1)+sp(5, -2)).rgb;
float3 o7 = (sp(-2, 5)+sp(-1, 5)+sp(-3, 4)+sp(-4, 3)).rgb;
float3 o8 = (sp(0, 5)+sp(1, 5)+sp(2, 5)).rgb;
float3 o9 = (sp(3, 4)+sp(4, 3)+sp(5, 2)+sp(5, 1)).rgb;
float3 n2 = (sp(-4, -4)+sp(-5, -3)+sp(-6, -2)+sp(-6, -1)).rgb;
float3 n3 = (sp(0, -6)+sp(-1, -6)+sp(-2, -6)+sp(-3, -5)).rgb;
float3 n4 = (sp(4, -4)+sp(3, -5)+sp(2, -6)+sp(1, -6)).rgb;
float3 n5 = (sp(-5, 3)+sp(-6, 2)+sp(-6, 1)+sp(-6, 0)).rgb;
float3 n6 = (sp(6, 0)+sp(6, -1)+sp(6, -2)+sp(5, -3)).rgb;
float3 n7 = (sp(-1, 6)+sp(-2, 6)+sp(-3, 5)+sp(-4, 4)).rgb;
float3 n8 = (sp(3, 5)+sp(2, 6)+sp(1, 6)+sp(0, 6)).rgb;
float3 n9 = (sp(6, 1)+sp(6, 2)+sp(5, 3)+sp(4, 4)).rgb;

float dv;
float3 dv3;
float BN = Blur;
float ES = EdgeSharpen;
float3 t1 = s1/8.;
float3 t0 = ((Dd(s2, r2, q2, p2, o2, n2)+Do(s3, r3, q3, p3, o3, n3)+Dd(s4, r4, q4, p4, o4, n4)+Do(s5, r5, q5, p5, o5, n5)+Do(s6, r6, q6, p6, o6, n6)+Dd(s7, r7, q7, p7, o7, n7)+Do(s8, r8, q8, p8, o8, n8)+Dd(s9, r9, q9, p9, o9, n9))/8.);
t0 = float3(t0.x+1.5748*t0.z, dot(t0, float3(1, -.1674679/.894, -.4185031/.894)), t0.x+1.8556*t0.y);// HD Y'CbCr to RGB
//t0 = float3(t0.x+1.402*t0.z, dot(t0, float3(1, -.202008/.587, -.419198/.587)), t0.x+1.772*t0.y);// SD Y'CbCr to RGB
float3 sbl = sign(t0);
t0 = sbl*pow(abs(t0), VideoGamma);
return t0.rgbb;
}
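The two Y'CbCr to RGB lines near the end of the shader can be cross-checked numerically. The Python sketch below derives the conversion weights from the luma coefficients (BT.709 for HD, BT.601 for SD) and compares them with the constants in the shader. It assumes Cb and Cr are centred on zero with a range of [-0.5, 0.5]; `ycbcr_to_rgb_matrix` is a name made up for this example:

```python
def ycbcr_to_rgb_matrix(kr, kb):
    """Rows give R, G, B as weights on (Y', Cb, Cr), with Cb and Cr in [-0.5, 0.5]."""
    kg = 1.0 - kr - kb
    r_cr = 2.0 * (1.0 - kr)   # weight of Cr in R
    b_cb = 2.0 * (1.0 - kb)   # weight of Cb in B
    return [
        [1.0, 0.0, r_cr],
        [1.0, -kb * b_cb / kg, -kr * r_cr / kg],
        [1.0, b_cb, 0.0],
    ]

hd = ycbcr_to_rgb_matrix(0.2126, 0.0722)   # BT.709 luma coefficients
sd = ycbcr_to_rgb_matrix(0.299, 0.114)     # BT.601 luma coefficients

# Green-row constants as written in the shader, for comparison.
print(hd[1], [1.0, -0.1674679 / 0.894, -0.4185031 / 0.894])
print(sd[1], [1.0, -0.202008 / 0.587, -0.419198 / 0.587])
```

The derived weights agree with the shader's constants to about six decimal places, so the HD line corresponds to BT.709 and the commented-out SD line to BT.601.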

CruNcher

5th December 2011, 03:23

CeeJay.dk

5th December 2011, 18:55

Quote (JanWillem32): A quick glance at the code shows sqrt over mul. That's usually an improper way to deal with vectors. Try using abs on the delta values, next you can try using the more regular dot or length intrinsics.

if (sqrt( mul(delta1,delta1) + mul(delta2,delta2)) > SharpenEdge) //?? Verify that the mul and sqrt aren't just there to get a positive value.

That bit of code is part of the edge detection from Sharpen Complex 2.
I agree .. multiplying a variable with itself and then taking the square root is a silly roundabout way to get a positive number when you can just use abs().

I don't use that part anymore. Mainly because edge detection was detecting large differences in contrast and applying less sharpening to those areas (well, at least that's how I used it .. I prefer edges smooth, not sharp) if the difference was above a certain threshold. But this caused some pixels that were sometimes over the threshold and sometimes under it to flicker, which was very distracting to my eyes, so I turned it off.

I now just clamp the sharpening effect to a set maximum and this prevents edges from getting over-sharpened and it's much simpler and doesn't flicker.
There may be a better way of limiting the sharpen effect from creating harsh artifacts and haloing, but for now I'm satisfied with just clamping the effect.
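As a rough per-pixel sketch of the clamping approach in Python: the difference between the pixel and its blurred version is reduced to a luma value, scaled, and limited to a set maximum before being added back. The function name and default values are assumptions for illustration (BT.709 luma weights, strength 0.5, clamp 0.015):

```python
COEF_LUMA = (0.2126, 0.7152, 0.0722)  # BT.709 luma weights (assumed here)

def luma_sharpen(ori, blur, strength=0.5, clamp=0.015):
    """Sharpen one RGB pixel given its blurred counterpart.

    The luma correction is limited to +/- clamp so strong edges are not over-sharpened.
    """
    diff = [o - b for o, b in zip(ori, blur)]
    sharp = strength * sum(c * d for c, d in zip(COEF_LUMA, diff))
    sharp = max(-clamp, min(clamp, sharp))
    return [o + sharp for o in ori]

# A weak detail gets the full correction ...
print(luma_sharpen([0.5, 0.5, 0.5], [0.49, 0.49, 0.49]))
# ... while a hard edge is limited to the clamp value.
print(luma_sharpen([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

Unlike a threshold that switches sharpening on and off, the clamp is continuous, so a pixel hovering around the limit changes only slightly from frame to frame instead of flickering.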

I may use the edge detection later if I can improve it or just throw it out, but for now I've left it in the code, but disabled it.

Quote (JanWillem32): When sampling pixels, only use whole pixel sizes as distances.

But if I did that, then I couldn't exploit the hardware filtering trick that I use to greatly increase the speed over the original sharpen complex code.

Taking advantage of that is the best part of my shader.
When I sample on the edges of a pixel I get an average of the four pixels surrounding it. If I move the sampling point slightly I can adjust the weights of the pixels sampled.
Doing it this way even gets rid of the instructions that calculated the weights before .. those calculations are now free.

See http://prideout.net/archive/bloom/#Sneaky and http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/ for more detailed explanations of what I'm doing.
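The core of the linear-sampling trick can be shown in one dimension with a few lines of Python. Two adjacent weighted taps are replaced by one filtered fetch placed between them. This is only a sketch: it assumes texel centres at integer positions and ideal linear filtering, and `lerp_fetch` and `merge_taps` are names invented for this example:

```python
import math

def lerp_fetch(row, x):
    """Emulate a linearly filtered 1-D texture fetch at fractional position x."""
    i = math.floor(x)
    f = x - i
    return row[i] * (1.0 - f) + row[i + 1] * f

def merge_taps(o1, w1, o2, w2):
    """Replace two adjacent weighted taps by one filtered fetch: (offset, weight)."""
    w = w1 + w2
    return (o1 * w1 + o2 * w2) / w, w

row = [3.0, 7.0, 1.0, 9.0, 4.0]

# Two explicit taps: texel 1 with weight 0.3 and texel 2 with weight 0.1.
explicit = 0.3 * row[1] + 0.1 * row[2]

# The same contribution from a single fetch between the two texels.
offset, weight = merge_taps(1, 0.3, 2, 0.1)
merged = weight * lerp_fetch(row, offset)

print(explicit, merged)
```

Moving the fetch position towards one texel raises that texel's share of the combined weight, which is how the sample offsets double as filter weights.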

I'm currently also considering more exotic sampling patterns as well as using mipmap samples, to see if I can get a very large gaussian blur using very few samples.

For now I'm satisfied with using just 5 samples to get a 9-tap equivalent, but if I later wanted to try my hand at a local contrast enhancement (http://www.cambridgeincolour.com/tutorials/local-contrast-enhancement.htm) shader I would need a much larger blur.
You're using a huge amount of samples - Local contrast enhancement might be something you would want to try.

I use my shaders for games though, and I need them to be very efficient, so I'm holding off on that until I find a fast way to do very large blurs.
Maybe box blur or one of the blurs from http://incubator.quasimondo.com/ - Stack blur, Superfast blur or Son of Gauss.

Quote (JanWillem32): At the end, there's a function that uses pow with a floating-point exponent. That function will output undefined values if negative values are input. See a few of my shaders for reference on processing such values.

//done = float4(pow(done, 1.0 / 2.2 )); Convert a sRGB colorspace to non-linear gamma 2.2 - Turned off because of precision errors

I was experimenting with doing the calculations in linear colorspace to see if that would improve the quality at all. It produced errors instead. It should not get any negative values as input, but I may have overlooked something and you're right.
I'll take a look at your code to see how you do it. Reading your code is not that easy though, as you don't use comments much.

CeeJay.dk

6th December 2011, 23:04

Quote: Sounds cool CeeJay thx, i wonder if it is comparable to mirillis implementation in performance/quality they use their own Direct 3D renderer + Shader and in terms of GPU resources it's damn efficient and looks comparable to Sharpen Complex 2 though it seems to work different :) they call it "Detail Boost" http://mirillis.com/en/products/picture2.html it deblurs very heavy quantized stuff very efficiently with low GPU resources tests based on Sandy Bridge HD2000 (GT1) 6 EU upto 1080p

I haven't seen Mirillis Splash before, but it does look good. I'd love to know what technique they use.

I don't know how LumaSharpen compares to Detail Boost, but properly tweaked it's slightly better than Sharpen Complex 2 quality-wise and much faster.

Between LumaSharpen, Sharpen Complex and Jan's Sharpen/Denoise/Deband shaders, Jan's probably has the best quality but also takes a lot of GPU time.

jokerb47

21st December 2011, 20:55

How to use these PS for image viewing?

Qaq

22nd December 2011, 06:51

Read first post:
To make a pixel shader work in MPC-HC:....

CeeJay.dk

24th December 2011, 12:09

I've noticed that my shader in MPC seems to stretch the image from the center by about a pixel's length. It only does this in MPC, not when used in combination with the DX9 shader injector I'm developing it for.
Why is that?

I suspect I may need to change something in MPC's render settings or may force it to use bilinear filtering, but I'm not sure.

JanWillem32

28th December 2011, 01:58

@CeeJay.dk: Sorry for the delay.
I think it's a good thing you removed that part of the code you mentioned.
The original "Sharpen Complex 2" does indeed produce aliasing artifacts (along with banding and non-uniform filtering in its input to output color spectrum). With a branch in the code path implemented like that it's pretty inevitable.
I'm also familiar with the difficulty of getting good parameters for filter parts to get a proper, uniform response with various scenes and also varying video quality of course.

There's no speed increase by filtering sampling on 4 pixel borders using a bilinear sampler state over filtering sampling on 9 pixels using a nearest neighbor (point) sampler state (for this case). Shaders are interpreted by the display driver before being transported to the GPU's command queue and execution caches. Resolving a bilinear sampler state is done on the shadercore, the same processor chains that process the shaders themselves.
As usual, this is pretty hard to measure. Shadercores are superscalar engines, meaning that the pipelines can process multiple operations simultaneously, as long as those are in independent parts of the pipeline (doing different types of operations).
An example of that can be seen in the disassembly of a pixel shader with mixed code and multiple pixels being sampled. Under level 3.0 compiling (DirectX 9.0c) and onwards, sampling commands will be spread out over the length of the code. That's flow control (there are even some compiler flags to influence it). Sampling pixels is slow, and during sampling to a register, simple arithmetic can be done on independent registers (such as values from previous sample instructions). This means that applying a simple filter, whether it's from a bilinear averaging from a sampler state or simple self-written code, it's not likely to have any impact on the global speed of rendering at all.
(Note 1: Enabling flow control generally increases the total instruction count a bit but executes faster due to better pipeline utilization.
Note 2: DirectX 10 and onwards supports the intrinsics Sample() and Load(). Load() ignores sampler states and is better optimized by the compiler and driver because of that.)
The pages you posted both contain code for an OpenGL target (DirectX 9 and 10+ handle things quite a bit differently). Unfortunately, there's also no disassembly or data on globally active sampler states posted with any of the functions. A difference in code target vectorization (using four floats in a register for arithmetic functions, instead of fewer, or even only one float at a time) and serialization (flow control: mixed instructions using independent registers will take less time on average than instructions that repeat or execute while re-using the same registers multiple times in succession) can point out important performance issues.
(Note 3: SSE code on the CPU behaves pretty much the same way. The registers on the GPU's shadercore and the CPU's SSE core even support the same data configurations (with two rare exceptions). Just don't use any integer arithmetic on the GPU if it's not really necessary; the performance is very bad.
Note 4: Next to instruction vectorization and serialization there's parallelization, achieved through rendering many 4×4 output pixel chunks in separate threads on the many execution units of the shadercore.)

Using mipmaps as a source for square-blurred samples is an idea I looked into earlier as well. I render from and to usual render-target textures. When I want to use the mip levels of such a texture, I'll have to order the device to generate it from the top-level surface that was just rendered. That's because the mip levels are not written during rendering at all.
I haven't had much success with this kind of mipmap usage at all. The blur is low-resolution and purely square which made it look aliased. If you've found a better-looking method, please share it. I'm still looking at integrating efficient debanding, sharpening and other basic functions for the internal video renderers.
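A small Python sketch of why mip levels give a purely square, grid-aligned blur. It assumes each mip level is generated with a plain 2×2 box average, which is common but depends on the device and driver; `next_mip_level` is a name made up for this example:

```python
def next_mip_level(img):
    """Halve a 2-D grid (even width and height) by averaging each 2x2 block."""
    h, w = len(img), len(img[0])
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

# Two 4x4 images with one bright pixel each, one pixel apart.
a = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
b = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

level1_a = next_mip_level(a)
level1_b = next_mip_level(b)
level2_a = next_mip_level(level1_a)

print(level1_a, level1_b, level2_a)
```

Both images produce the same level 1, so the blurred value does not depend on where the pixel sits inside its block. The blur is tied to a fixed grid rather than centred on the pixel being filtered, which fits the aliased look described above.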

Local contrast enhancement is indeed very interesting, so thank you for the link. This method solves some of the problems with a single box sampling area to unsharp mask (as noted in the text).
The parts "Complications" and "Further reading" are informative as well, although "Complications" could be spiced up a bit more in a technical sense. The first part describes the very common distortions found when rendering in a non-uniform colorspace. The suggestions for a solution are insightful, but don't go all the way. The suggested LAB colorspace is designed to scale in a visually linear way along its luminance and chrominance dimensions. It's not mathematically linear or linear to lightness. In most of its encoded forms it also can't cover the entire range of visible colors. For professional purposes the uniform (and directly related) color spaces XYZ and xyY are often used for rendering. (I'm also thinking it would be a good format to use for the internal renderer of MPC-HC. Although it can't be used with the 8- or 10-bit buffer formats.)
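To put a number on how far the L* axis of CIE LAB departs from a linear scale of luminance, here is a short Python sketch of the standard CIE 1976 lightness function. It takes relative luminance Y with reference white at 1.0; the function name is made up for this example:

```python
def cie_lightness(y):
    """CIE 1976 L* for relative luminance y (1.0 = reference white)."""
    delta = 6.0 / 29.0
    if y > delta ** 3:
        f = y ** (1.0 / 3.0)                      # cube-root segment
    else:
        f = y / (3.0 * delta ** 2) + 4.0 / 29.0   # linear segment near black
    return 116.0 * f - 16.0

for y in (0.0, 0.18, 0.5, 1.0):
    print(y, round(cie_lightness(y), 2))
```

A luminance of 18% of white already lands near the middle of the L* scale, and 50% lands at roughly three quarters of it. Averaging or subtracting values in L* is therefore not the same as doing so on linear light.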
The second issue is the one I saw first in the sample pictures. Damage to the dynamic range is really common with image filters. If you butcher a picture with those bad enough, you can't even do a successful 'repair' filter pass afterwards (even with the most ideal precision intermediate picture buffers). As I understand, the pictures are dramatized to show a stronger effect of the filter than it is likely to be ever used.
I think this function could benefit from a convolution function: a gradual function to transform measured area sharpness to a factor used to directly blur or sharpen by that amount (possibly evaluating it for multiple directions from the base pixel). I'll take a look if I can write a prototype pixel shader based on this.

About the pow() intrinsic function; the compiler will warn that it will not work properly on negative numbers. I used to have problems with it too, before I found out you can simply carry over the sign bits with the sign() intrinsic. After that, it behaved exactly as expected.
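The same sign-carrying idea, sketched in Python as a stand-in for `sign(x)*pow(abs(x), gamma)` in a shader. Python's `math.pow` refuses a negative base with a fractional exponent, which plays the role of the undefined output here; `signed_pow` is a name made up for this example:

```python
import math

def signed_pow(x, gamma):
    """Apply the power function to |x| and carry the sign of x over to the result."""
    return math.copysign(abs(x) ** gamma, x)

try:
    math.pow(-0.25, 2.4)
except ValueError:
    print("math.pow rejects a negative base with a fractional exponent")

# With the sign carried over, negative input stays negative and well defined.
print(signed_pow(-0.25, 2.4), signed_pow(0.25, 2.4))
```

This keeps the curve odd-symmetric around zero, so small negative values produced by earlier filter stages pass through the gamma step without turning into garbage.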

About writing comments in the code, take a look at the links in the opening post of this thread or the code packages in my folder. As far as I know, I make plenty of comments in my code.

About quality of the various sharpening filters; I think a very good balance in efficiency and quality could be achieved using an integrated multi-pass solution coded natively in the renderer, instead of in a single pixel shader. As I prefer to measure the sharpness relative to the current pixel in multiple directions and process blurring or sharpening per direction as well, generating a look-up map or similar optimization like that is difficult. I'd love to try some new methods that are a bit lighter than my current set. I know that that takes a flash of inspiration, and then a lot of patience to actually see any decent results.

Finally, to answer your latest post;
I've specifically hard-coded the renderer to only enable a bilinear sampler state on the filter passes for the color management's look-up table, automatically for resizing of subtitles and by (one) user setting for the main video resizing passes. (In the version of the renderer I edited, I also removed it for the main video resizing pass.)
I don't exactly remember how the renderer in the trunk build managed vertices at all. When I dumped the renderer core and imported a new one, I made the vertex caches indexed. After that, I never had any problems with it anymore. (The version in the trunk build can't even manage two-pass resizing.) AFAIK the vertices I coded have no displacement problems, but I can't guarantee that for the renderer in the trunk build.

I hope I didn't bore anyone with my recent walls of text on this forum... I'm just trying to help people a bit after being away for a while.:)

CeeJay.dk

31st December 2011, 13:37

Quote (JanWillem32): There's no speed increase by filtering sampling on 4 pixel borders using a bilinear sampler state over filtering sampling on 9 pixels using a nearest neighbor (point) sampler state (for this case). ... As usual, this is pretty hard to measure.

AMD's GPU Shaderanalyzer, in-game FPS meters, and MSI Afterburner (used to measure GPU load) all tell me that using 4 texture fetches is faster than using 9.

Quote (JanWillem32): Sampling pixels is slow, and during sampling to a register, simple arithmetic can be done on independent registers (such as values from previous sample instructions). This means that applying a simple filter, whether it's from a bilinear averaging from a sampler state or simple self-written code, it's not likely to have any impact on the global speed of rendering at all.

If you are saying that optimizing the number of ALU instructions often doesn't matter, because the bottleneck when sampling several pixels is going to be the texture fetches, then I already know that.
GPU Shaderanalyzer reports that texture fetches are the bottleneck on 11 of the 17 AMD cards it displays statistics about.

Quote (JanWillem32): Just don't use any integer arithmetic on the GPU if it's not really necessary, the performance is very bad.

I know .. I won't.

Quote (JanWillem32): I hope I didn't bore anyone with my recent walls of text on this forum... I'm just trying to help people a bit after being away for a while.:)

I'm not bored .. in fact I have more to comment on it , I just don't have time right now .. it's New Years.

Happy New Year!

P.S.

I thought you might like to see how I'm progressing .. this is the latest iteration of LumaSharpen (there are still lots more to be done .. I have so many ideas - I'll share some later if I get the time.)

/*
____________________

LumaSharpen 1.1.2
____________________

by Christian Cann Schuldt Jensen ~ CeeJay.dk

Based on Sharpen Complex 2 from Media Player Classic
(I have rewritten most of the code by now though)

It blurs the original pixel with the surrounding pixels and then subtracts this blur to sharpen the image.
It does this in luma to avoid color artifacts and allows limiting the maximum sharpening to avoid or lessen halo artifacts.

This is similar to converting an image to LAB colorspace and using Unsharp Mask on the Lightness channel in Photoshop.

Compiles with both PS 2.0 and 3.0 (Faster with 3.0)
*/

// .----------------------------------------------------._User settings_.---------------------------------------------------.

// -- Sharpening --
#define offset_bias 1.0 // I suggest a value between 0.0 and 2.0 - default is 1.0
#define sharp_strength 0.5 // Strength of the sharpening - You should probably use something between 0.2 and 2.0 - default is 0.5
#define sharp_clamp 0.015 // Limits maximum amount of sharpening a pixel receives - Default is 0.015

#define pattern 2 // Choose a sample pattern ( 1, 2 or 3 )

// .--------------------------------------------------._Defining constants_.------------------------------------------------.

/* For use with SMAA injector.
#define s0 colorTexG
#define px BUFFER_RCP_WIDTH
#define py BUFFER_RCP_HEIGHT
*/

// For use with Shaderanalyzer and MPC-HC
sampler s0 : register(s0);
float4 p0 : register(c0);
float4 p1 : register(c1);

#define width (p0[0])
#define height (p0[1])

#define px (p1[0])
#define py (p1[1])

//#define dx (offset_bias*px)
//#define dy (offset_bias*py)

#define CoefLuma float4(0.2126, 0.7152, 0.0722, 0) // BT.709 & sRGB luma coefficient (Monitors and HD Television)
//#define CoefLuma float4(0.299, 0.587, 0.114, 0) // BT.601 luma coefficient (SD Television)
//#define CoefLuma float4(0.3333, 0.3334, 0.3333, 0) // Equal weight coefficient

#define sharp_strength_luma (CoefLuma * sharp_strength)

// .------------------------------------------------------._Main code_.-----------------------------------------------------.

//float4 SharpenPass( float2 tex )
float4 main( float2 tex : TEXCOORD0 ) : COLOR // Use with Shaderanalyzer and MPC-HC
{

// -- Get the original pixel --
float4 ori = tex2D(s0, tex); // ori = original pixel

// [ NW, , NE ] Each texture lookup (except ori)
// [ ,ori, ] samples 4 pixels
// [ SW, , SE ]

// -- Pattern 1 -- A 7 tap gaussian using 2+1 texture fetches.
#if pattern == 1

// -- Gaussian filter --
// [ 1/9, 2/9, ] [ 1 , 2 , ]
// [ 2/9, 8/9, 2/9] = [ 2 , 8 , 2 ]
// [ , 2/9, 1/9] [ , 2 , 1 ]

float4 blur_ori = tex2D(s0, tex + float2(-px,py) / 3 * offset_bias); // North West
blur_ori += tex2D(s0, tex + float2(px,-py) / 3 * offset_bias); // South East

//blur_ori += tex2D(s0, tex + float2(px,py) / 3 * offset_bias); // North East
//blur_ori += tex2D(s0, tex + float2(-px,-py) / 3 * offset_bias); // South West

blur_ori /= 2; //Divide by the number of texture fetches

#endif

// -- Pattern 2 -- A 9 tap gaussian using 4+1 texture fetches.
#if pattern == 2

// -- Gaussian filter --
// [ .25, .50, .25] [ 1 , 2 , 1 ]
// [ .50, 1, .50] = [ 2 , 4 , 2 ]
// [ .25, .50, .25] [ 1 , 2 , 1 ]

float4 blur_ori = tex2D(s0, tex + float2(-px,py) * 0.5 * offset_bias); // North West
blur_ori += tex2D(s0, tex + float2(px,-py) * 0.5 * offset_bias); // South East
blur_ori += tex2D(s0, tex + float2(px,py) * 0.5 * offset_bias); // North East
blur_ori += tex2D(s0, tex + float2(-px,-py) * 0.5 * offset_bias); // South West

blur_ori /= 4; //Divide by the number of texture fetches

#endif

// -- Pattern 3 -- An experimental 17 tap gaussian using 4+1 texture fetches.
#if pattern == 3

// -- Gaussian filter --
// [ , 4 , 6 , , ]
// [ ,16 ,24 ,16 , 4 ]
// [ 6 ,24 , ,24 , 6 ]
// [ 4 ,16 ,24 ,16 , ]
// [ , , 6 , 4 , ]

float4 blur_ori = tex2D(s0, tex + float2(-0.4*px,1.2*py) * offset_bias); // North North West
blur_ori += tex2D(s0, tex + float2(0.4*px,-1.2*py)* offset_bias); // South South East
blur_ori += tex2D(s0, tex + float2(1.2*px,0.4*py) * offset_bias); // East North East
blur_ori += tex2D(s0, tex + float2(-1.2*px,-0.4*py) * offset_bias); // West South West
blur_ori += ori; // Probably not needed. Only serves to lessen the effect.
blur_ori /= 5; //Divide by the number of texture fetches
#endif

// -- Calculate the sharpening --
float4 sharp = ori - blur_ori; //Subtracting the blurred image from the original image

// -- Adjust strength of the sharpening --
sharp = dot(sharp, sharp_strength_luma); //Calculate the luma and adjust the strength

// -- Clamping the maximum amount of sharpening to prevent halo artifacts --
sharp = clamp(sharp, -sharp_clamp, sharp_clamp); //TODO Try a curve function instead of a clamp

// -- Combining the values to get the final sharpened pixel --
float4 done = ori + sharp; // Add the sharpening to the original.

// .------------------------------------------------._Debugging and tweaking.-----------------------------------------------.

//For tweaking and debugging purposes you can show the sharpen effect or chroma.
//done = (sharp*4) + float4(0.5,0.5,0.5,0); // Uncomment to visualize the strength of the sharpen (multiplied by 4 to see it better)

//done = ori.a; // Visualize the alpha
//done = 1.0 - ori.a; // Visualize the inverted alpha

// .-------------------------------------------------._Returning the output_.-----------------------------------------------.

return done;
}

It basically does the same thing, but with fewer instructions, and the code is much cleaner and easier to read. There is also a new experimental 17-tap gaussian, or at least it is a gaussian if my math is correct.

I also corrected a bug in the fast version (pattern 1). Previously it moved the samples 0.5 pixels from the center; the fixed version moves them 1/3 of a pixel. The old offset worked, but resulted in something more like a highpass sharpen than a gaussian.
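
To show why the offset matters, here is a rough sketch of the weights a single fetch picks up. It assumes the sampler does bilinear filtering and that offset_bias is 1; BilinearWeights is a hypothetical helper for illustration only and is not part of the shader above.

```hlsl
// Hypothetical helper: bilinear weights of the 2x2 texel block touched by one
// fetch that is displaced by `offset` texels (each component in [0,1]) from a
// texel center. Returns (center, x-neighbor, y-neighbor, diagonal).
float4 BilinearWeights(float2 offset)
{
    float2 keep = 1.0 - offset; // share that stays on the center texel, per axis
    return float4(keep.x * keep.y,      // center texel
                  offset.x * keep.y,    // horizontal neighbor
                  keep.x * offset.y,    // vertical neighbor
                  offset.x * offset.y); // diagonal neighbor
}
// offset = (1/3, 1/3) gives about (4/9, 2/9, 2/9, 1/9): the center dominates.
// offset = (1/2, 1/2) gives (1/4, 1/4, 1/4, 1/4): a flat box over four texels.
```

With the 0.5 offset every touched pixel gets the same weight, so the blur has no gaussian-like falloff; with 1/3 the weights drop off away from the center.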

JanWillem32

2nd February 2012, 17:52

Sorry I'm responding this late; this thread had already moved to page 2 before I noticed a response. I looked up the sampling parameters for D3D and DXGI. It appears that the Sample intrinsic (in both versions of the DX API) by default reserves pipelines on the shader core to process anisotropic spatial filtering, and shaders themselves have no way to indicate a lower requirement. The Load intrinsic (DXGI only) is handled by an entirely different pipeline, and can't be filtered. I didn't expect such a big difference in the methods for the basic Load and Sample on nearest neighbor. Maybe in a while I'll try to add optional bilinear filtering on custom pixel shader sampler stages for the DirectX 9 renderer I'm working on.
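For reference, a minimal sketch of the two intrinsics side by side, in Direct3D 10+ HLSL syntax. The names src and pointSampler and the register assignments are placeholders, and this is not taken from any existing renderer.

```hlsl
Texture2D<float4> src : register(t0);     // placeholder resource name
SamplerState pointSampler : register(s0); // filter mode is set by the application

float4 main(float4 pos : SV_Position, float2 tex : TEXCOORD0) : SV_Target
{
    // Sample: goes through the sampler state, addressed with normalized coordinates.
    float4 viaSample = src.Sample(pointSampler, tex);

    // Load: unfiltered fetch, addressed with integer texel coordinates plus a mip level.
    float4 viaLoad = src.Load(int3(int2(pos.xy), 0));

    // Only here so that both fetches are used; a real shader would pick one.
    return 0.5 * (viaSample + viaLoad);
}
```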
Your shader looks very promising. The debug section to visualize each section is very helpful, I use such mechanisms in rendering all the time. Just don't look too long at those rainbow colors while adjusting settings. :D The effect is of course, quite psychedelic, and can cover up the main intention for a shader's usage.
The methods seem pretty much correct. The only note I have is that mentioning the LAB colorspace is a bit wrong in this context. This shader only converts to Y'CbCr. That's also a luma-chroma system, but doesn't nearly have as much colorspace coverage or control as LAB or XYZ:
http://en.wikipedia.org/wiki/CIE_1931_color_space
http://en.wikipedia.org/wiki/Lab_color_space
http://en.wikipedia.org/wiki/File:Colorspace.png
(Although I must note that Y'CbCr is easier to handle than LAB. LAB is even more mathematically non-uniform than Y'CbCr, mostly because of a difficult gamma slope.)
Anyway, good luck with further developing.

leeperry

22nd November 2012, 10:44

Hi Jan, thanks for the scripts! Now that madVR supports them, this thread is well worth a bump :)

I can't see any script that would process a mirroring effect, wouldn't that be possible via a PS script? All I can see is flipping :o

And an artificial film grain script much like GrainFactory3() (http://avisynth.org/mediawiki/GrainFactory3) could also be really handy for low bitrate encodes.

:thanks:

burfadel

23rd November 2012, 23:22

I agree!

A nice effect that would be good as a shader is 'temporal smoothing'. In FFDshow, under 'Blur & Noise Reduction', leave everything unchecked except for 'Temporal smooth' (and of course, the box that actually activates the smoothing options), set it to '1', and enable 'process color'.

leeperry

13th December 2012, 03:57

Hi Jan, I was wondering if you would have any plan to implement some sort of dynamic contrast stuff like Samsung's DNIE (http://forum.doom9.org/showpost.php?p=1600258&postcount=15433)?

It looks quite impressive on their TV's but it's neither defeatable nor finetunable, and quite frankly they went quite overboard with the default settings....I guess it's meant to stun you in the shop but it basically makes everything look like a cel-shading cartoon = very funny for a few days, then it WILL get old after a while :o

They do it to compensate for the infamous 2K:1 native CR of their grossly overpriced TV's but the idea is good, it would only need to be finetunable I think. They allow several settings for their motion interpolation stuff(that looks quite good in "crisp" mode) so I dunno why they don't provide it for DNIE :rolleyes:

Besides, the best looking scalers in madVR require quite a lot of horsepower for 1080p and/or 60fps scaling so let's rock with the PS scripts while we're at it :D

I'd love to hear your thoughts on that matter, :thanks:

toniash

13th December 2012, 12:19

Besides, the best looking scalers in madVR require quite a lot of horsepower for 1080p and/or 60fps scaling so let's rock with the PS scripts while we're at it :D

PS scripts can be also very heavy on GPU

leeperry

13th December 2012, 16:22

Indeed, but atm anything cheaper than a GTX660 is a waste of money if you like it green so that leaves a lot of GPU power unused.....but I see that Jan has been silent for a while.

leeperry

28th December 2012, 19:49

so more ideas in case there'd be any bored PS script coder around: http://www.youtube.com/watch?v=rRf2aEsJaQE

pretty funky way to watch 4:3 content on a 16/9 display, would love to try it on my own James Brown Soul Train DVD's :devil:

leeperry

30th December 2012, 02:46

CeeJay.dk

3rd January 2013, 16:37

also, would that be hard to make a "negative" script? you've made all kinds of complicated scripts that have no real world use AFAICS, but simple stuff such as flipping/mirroring/negative just isn't there :(

That would be the easiest thing in the world, but why would you want to see a negative of the screen?

Anyways :

/* --- Defining Constants --- */

sampler s0 : register(s0);

/* --- Negative --- */
/*
by Christian Cann Schuldt Jensen ~ CeeJay.dk

Inverts the color of the image, making it negative.
*/

float4 NegativePass( float4 colorInput )
{
return 1.0 - colorInput;
}

/* --- Main --- */

float4 main(float2 tex : TEXCOORD0) : COLOR {
float4 FinalColor = tex2D(s0, tex);

FinalColor = NegativePass(FinalColor);

return FinalColor;
}

EDIT: While testing the negative shader I made, I found that MPC-HC already has one. It's called Invert.
This is its code:

sampler s0 : register(s0);

float4 main(float2 tex : TEXCOORD0) : COLOR {
float4 c0 = float4(1, 1, 1, 1) - tex2D(s0, tex);

return c0;
}

It does exactly the same thing: it subtracts the pixel color from 1.0.
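
Mirroring, which was also asked about earlier in the thread, could be done in much the same way. This is only a minimal sketch: it assumes tex runs from 0 to 1 across the visible picture, and it is not one of the shaders shipped with MPC-HC.

```hlsl
sampler s0 : register(s0);

float4 main(float2 tex : TEXCOORD0) : COLOR {
    // Read from the horizontally opposite position; the row stays the same.
    float2 mirrored = float2(1.0 - tex.x, tex.y);

    return tex2D(s0, mirrored);
}
```

Using 1.0 - tex.y instead would flip the picture vertically.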

leeperry

4th January 2013, 02:10

sweeet, :thanks: a bunch!

well, PS scripts don't work in 8bit like ffdshow so I can use them without killing the PQ in mVR and I like to have troubleshooting scripts "just in case" ;)

I used to code Seka assembler on Amiga back in the days and I heard that coding PS scripts was great fun because you only had to care about the good side of coding, so I might read up on how to write these at some point. If anything, I'd crave for a less aggressive DNIE (http://forum.doom9.org/showpost.php?p=1600258&postcount=15433) :)

Dodgexander

31st January 2013, 15:32

When using these in MPC-HC. I get errors:

memory(203,11): warning X3571: pow(f, e) will not work for negative f, use abs(f) or conditionally handle negative values if you expect them
memory(155,9): error X5589: Invalid const register num: 32. Max allowed is 31.

leeperry

31st January 2013, 17:24

did you set them to PS 3.0? did you RTFM if any? :p

Dodgexander

1st February 2013, 13:54

did you set them to PS 3.0? did you RTFM if any? :p

The pixel shader was indeed the problem, for some reason when I set to Pixel Shader 3, it reverts back to 2 and I have to change back to 3 again for it to work!

The results here are truly amazing, now I just have to pick the best option and automate. Thanks a lot for the help.

Oh another question though, when I use the YCbCr-type sharpen complex test 2 scripts, my screen completely goes purple.

http://imageshack.us/scaled/thumb/255/testtpz.png (http://imageshack.us/photo/my-images/255/testtpz.png/)

leeperry

2nd February 2013, 03:33

told ya, SimHD is the usual commercial bs clueless companies brag about...they always promise a lot but actually end up delivering very little. Doom9's forum is where the party's at for supreme PQ =)

don't bother with YCbCr, you wanna use RGB scripts with madVR.

Dodgexander

3rd February 2013, 02:50

Doing these scripts reminds me of a while back following 8:13's post for post processing in FFdshow via Avisynth, the results back then were impressive and today, with the video card doing the work its even better, especially with low end cpu!

I wish however that someone would develop all this post processing into a decoder or renderer like ffdshow. It would be so much easier to set up.

Dodgexander

15th February 2013, 00:18

Is there an easier way to import all of these files to switch between them without having to copy and paste each text file into MPC-HC?

Also, I like the effect that the blur shader has, but can't use it with any of the sharpen, deband and denoise filters at the same time. Is there any way around this?

Also, what other shaders are people using on their SD material for best effects?

Finally, since I'm already using madVR to upscale chroma and luma, how can I make sure none of the effects are conflicting?

leeperry

1st May 2013, 11:01

BTW, mVR currently doesn't align chroma properly for MPEG1 as shown here (http://forum.doom9.org/showpost.php?p=1622754&postcount=18186).

Would anyone be kind enough to write a PS script that would fix this please? :thanks:

Is there an easier way to import all of these files to switch between them without having to copy and paste each text file into MPC-HC?
Dunno about MPC but you can just put them all in the /PxShader/ subdirectory of PotP, et voilà: http://thumbnails104.imagebam.com/25196/7c8097251955360.jpg (http://www.imagebam.com/image/7c8097251955360)

You can also setup automatic profiles with different combinations of PS scripts depending on frame rate/resolution/codec/etc :)

XRyche

6th July 2013, 01:54

Is there any way that you would make a YCbCr r=2 or r=1, sharpen complex, deband, denoise and color controls for SD&HD video input shader at this late date? The r=4 shader is a little too taxing on my rig without overclocking my video card for some 1200p content that could use some debanding.

JanWillem32

7th July 2013, 22:14

Sure, I still often write shaders. (Just most of them are not for video renderers.) What shader chain would you like to use? I can probably reduce some of the overhead, or use somewhat more lightweight methods to get what you want.

XRyche

8th July 2013, 06:34

I'm not that tech-savvy or a videophile, so I am not quite sure what you're asking. If you mean the order of shaders I use (they are all yours btw :) ), they are as follows, in this order; Pre-resize: RGB to Y'CbCr floating point, 4:2:0 to 4:2:2 Chroma Up-sampling, 4:2:2 Spline5 Chroma Up-Sampling floating point for Y'CbCr, one of your r=?, sharpen complex, deband, ? denoise and color controls for SD&HD video input, and lastly unsharp luma mask for SD&HD video (the black border compensation is very very nice :) ). I don't really use post-resize shaders often; I have never found a need for them with your shaders. What I think I'm looking for is your "r=?, sharpen complex, deband, ? denoise and color controls for SD&HD video input for Y'CbCr" shader using only 1 or 2 radial layered sharpening functions. I've used your linear gamma shaders before, using only 1 or 2 radial layered sharpening functions with 1200p, and that seemed to work for me. I just don't like the Linear Gamma shaders because I get massive artifacts when denoising and I can see the outlines from the sharpening (I'm probably doing something wrong). I tried raising the GammaCompensation value but that seems to have a negative impact on the denoising and sharpening effects.

While I have your attention, would there be any perceivable benefit in 4:4:4 Chroma Up-Sampling, and if there was, would you be willing to write a Y'CbCr shader for it? I've taken a really strong appreciation for your Y'CbCr shaders compared to RGB. Colours are much more subtle and not as harsh as when using RGB, at least to me. I prefer your video renderer with Y'CbCr shaders over a certain other extremely popular renderer simply for this fact.

mhourousha

8th July 2013, 08:20

I wrote a Color Vibrance Shader for some GPU without 'digital vibrance' feature (Intel HD Graphic for example)

sampler s0 : register(s0);
float4 p0 : register(c0);
float4 p1 : register(c1);

float3 ColorVibrance(float3 rgb,float vibrance)
{

//--Convert RGB to HSL--
float maxvalue = max(rgb.r,rgb.g);
float minvalue = min(rgb.r,rgb.g);
maxvalue = max(rgb.b,maxvalue);
minvalue = min(rgb.b,minvalue);
float CValue = maxvalue-minvalue;
float3 hsl = float3(0.0,0.0,0.0);
hsl.z = 0.5f*(maxvalue+minvalue);
float tempf = 1.0f-abs(hsl.z*2.0f-1.0f);
if(CValue<0.0001f)
{
return rgb;
}
hsl.y = CValue/tempf;

//--Boost the pixel's saturation based on its original saturation--
hsl.y +=hsl.y*(1.0f-hsl.y)*vibrance;
//--Convert HSL back to RGB--
float CValue2 = hsl.y*tempf;
float mValue = hsl.z-0.5f*CValue2;
float3 BaseColor = float3(0.0f,0.0f,0.0f);
if(maxvalue-0.00001f <= rgb.r)
{
BaseColor.x = 1.0f;
hsl.x = (rgb.g-rgb.b)/CValue;
if(hsl.x<0.0f)
{
BaseColor.z = -hsl.x;
}
else
{
BaseColor.y = hsl.x;
}
}
else if(maxvalue-0.00001f <= rgb.g)
{
BaseColor.y = 1.0f;
hsl.x = (rgb.b-rgb.r)/CValue;
if(hsl.x<0.0f)
{
BaseColor.x = -hsl.x;
}
else
{
BaseColor.z = hsl.x;
}
}
else
{
BaseColor.z = 1.0f;
hsl.x = (rgb.r-rgb.g)/CValue;
if(hsl.x<0.0f)
{
BaseColor.y = -hsl.x;
}
else
{
BaseColor.x = hsl.x;
}
}
return float3(mValue,mValue,mValue)+float3(CValue2,CValue2,CValue2)*BaseColor;
}
float4 main(float2 tex : TEXCOORD0):Color0
{
float4 c0 = saturate(tex2D(s0, tex));
float vibrance = 0.5;
return float4(ColorVibrance(c0.xyz,vibrance),1.0);
}
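
To make the boost step easier to judge, here is the saturation curve from ColorVibrance on its own, with a few hand-computed values. BoostSaturation is a hypothetical name used only for this sketch; the formula is the one from the shader above.

```hlsl
// Same formula as the "hsl.y += ..." line in ColorVibrance, isolated.
float BoostSaturation(float s, float vibrance)
{
    // s*(1-s) is zero at s = 0 and s = 1 and largest at s = 0.5, so grey and
    // fully saturated pixels are left alone and mid-saturation pixels get the
    // biggest push.
    return s + s * (1.0 - s) * vibrance;
}
// With vibrance = 0.5:
//   s = 0.0 -> 0.0
//   s = 0.5 -> 0.625
//   s = 0.9 -> 0.945
//   s = 1.0 -> 1.0
```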

PetitDragon

8th July 2013, 16:08

I'm not that tech-savvy or a videophile, so I am not quite sure what you're asking. If you mean the order of shaders I use (they are all yours btw :) ), they are as follows, in this order; Pre-resize: RGB to Y'CbCr floating point......

Could you tell us what version of Jan's test build you are using with these PS shaders?

JanWillem32

10th July 2013, 01:30

XRyche, do you actually need the chroma up-sampling shaders? They only work on AMD/ATi and Intel GPUs in the quality mode. For the performance mode, the default mixer up-sampling is used and for nVidia GPUs the chroma up-sampling can't be overridden unless you take out the VMR-9 or EVR mixer. If you do need them, which of the current types in the renderer do you like? The higher-order types from the pixel shader pack are somewhat faulty.
I can certainly mix the sharpen complex~ and the luma-type unsharp mask shaders, they pretty much act on the pixels in a similar fashion anyway.
Gamma linearization is still important for Y'CbCr to R'G'B' to RGB to XYZ stages. By default, the renderer I wrote takes good care of these steps in the quality mode. Letting external shaders do such a task is required when you set "Disable Initial Color Mixing Stages" for the renderer (which does give a lot of control over that stage, with nearly ideal efficiency).

mhourousha, this shader does seem interesting. A quick glance shows some minor things to possibly improve, though.
First of all, unlike C, HLSL does not need an 'F' or 'f' suffix for single precision. The suffix is accepted, but single precision is already the default; double precision takes an 'l' or 'L' suffix.
Secondly, for what reason is saturation used? Various rendering stages for video playback and other HDR imaging produce valid output beyond the 0 minimum and 1 maximum. Other than some anti-aliasing methods that have to deal with the spatial problems of multi-sampling pixels into one output, I've never seen valid reasons to actively saturate colors.
Third, some parts could improve by using shuffle masks, such as: "mValue.rrr" or just "mValue" instead of "float3(mValue,mValue,mValue)" and "ColorVibrance(c0.xyz,vibrance).rgbb" instead of "float4(ColorVibrance(c0.xyz,vibrance),1.0)".
Fourth, why did you use "0.00001f" instead of a true floating-point epsilon? On top of that, why did you use such a small value at all? The only reason I can see is because "hsl.y = CValue/tempf;" could possibly do a division by zero. For that one changing the previous comparison to "CValue <= 0." would suffice though.
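As a rough illustration of the shuffle-mask suggestions only: the ColorVibrance body below is a placeholder so that the sketch compiles on its own, not the real conversion, and its three local values are arbitrary stand-ins for the variables of the same name in the original.

```hlsl
sampler s0 : register(s0);

// Placeholder body; only the return expression matters here.
float3 ColorVibrance(float3 rgb, float vibrance)
{
    float mValue = min(rgb.r, min(rgb.g, rgb.b)); // stand-in value
    float CValue2 = vibrance;                     // stand-in value
    float3 BaseColor = rgb - mValue;              // stand-in value

    // A scalar is broadcast across all channels, so no float3(...) constructors are needed.
    return mValue + CValue2 * BaseColor;
}

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    float4 c0 = tex2D(s0, tex);
    float vibrance = 0.5;

    // One write to the output register; note that alpha becomes a copy of blue.
    return ColorVibrance(c0.rgb, vibrance).rgbb;
}
```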

mhourousha

10th July 2013, 06:51

JanWillem32:
about the saturate: some renderers (MadVR for example) don't clamp the color to [0,1] for the surface used as the shader's source, so artifacts would occur if I didn't saturate 'c0'.
about the shuffle mask: it's my habit :D, and I think the shader compiler would optimize it.
about the 0.00001f: first, it's not good to do an 'equal' comparison between float values. Second, I don't trust the GPU when dividing by a very small value. :p

XRyche

10th July 2013, 08:58

XRyche, do you actually need the chroma up-sampling shaders? They only work on AMD/ATi and Intel GPUs in the quality mode. For the performance mode, the default mixer up-sampling is used and for nVidia GPUs the chroma up-sampling can't be overridden unless you take out the VMR-9 or EVR mixer. If you do need them, which of the current types in the renderer do you like? The higher-order types from the pixel shader pack are somewhat faulty.
I can certainly mix the sharpen complex~ and the luma-type unsharp mask shaders, they pretty much act on the pixels in a similar fashion anyway.
Gamma linearization is still important for Y'CbCr to R'G'B' to RGB to XYZ stages. By default, the renderer I wrote takes good care of these steps in the quality mode. Letting external shaders do such a task is required when you set "Disable Initial Color Mixing Stages" for the renderer (which does give a lot of control over that stage, with nearly ideal efficiency).

Wow, I know enough just to make myself look like an idiot :rolleyes: . As a matter of fact it is the higher order types from your pixel shader pack that give me issues. I should have made that clear....oops again. I understand now that the chroma up-sampling shaders are unnecessary for my GPU (Nvidia). I checked myself and I didn't see any change at all with or without them. I incorrectly assumed that they worked without really bothering to compare. Am I correct in assuming that the RGB to Y'CbCr conversion shader does exactly that (I do definitely see a difference, if not....it's off to the optometrist I go :) .)?

As far as the sharpen complex and luma unsharpen mask shaders go, I would say "Yes, please". When I use the Luma unsharpen mask shader it has a tendency to clean up some of my old tv card rips without having to use excessive denoising. I assumed the "black border compensation" has something to do with that.

Also, it would be great if you could write an r=1 and an r=2 "sharpen complex, deband, ? denoise and color controls for SD&HD video input" Y'CbCr shader (I know they are "test" shaders but they appear to do everything they say they do without issue for me). I use your higher order ones for a lot of SD and 720p content anyway.

Thanks for setting me straight on the chroma upsampling issue. I had no idea EVR had that kind of limitation.

JanWillem32

10th July 2013, 10:39

XRyche, the RGB to Y'CbCr conversion shader is fairly basic. I assume you mean the chroma up-sampling shaders? These will distort the chroma if the values were altered before. Sharp chroma borders such as red on black and blue on black will have the worst artifacts.
I'll see what I can write today and tomorrow. Implementing this chain as three passes of pixel shaders should be easy enough.

mhourousha, not only MadVR preserves output beyond the 0 minimum and 1 maximum. It's standard in all quality rendering. Most video filters are expected to be able to take full floating point range inputs, correctly process everything and then output. That said, what artifacts do you expect? Of the entire set of pixel shaders in the pixel shader pack and all other shaders I've ever written except for the stages for a type of anti-aliasing, I never had to use saturation on colors at all. (Saturating and clamping on vertices is pretty common though.)
The ".rgbb" shuffle at the end is mostly a trick to prevent two write instructions to the output register (one for the three color channels and one for the alpha output color channel). The other shuffles can be optimized by the compiler indeed.
In my honest opinion, all those teachers that still teach "do not use direct comparisons with floating point logic" and don't bother with actually explaining machine epsilon and relative error accumulation by instructions should get a whipping.
In this case it's about a division. For the right-hand operand in floating-point division, there are 5 special cases: -infinity, infinity, -0, 0 and not-a-number inputs. NaN is not an issue in this case and division by infinity will work as expected. Only straight division by exactly -0 and 0 will produce a NaN, -infinity or infinity, depending on the left-hand operand.
Handling small numbers on a GPU has not been an issue ever since the introduction of the programmable pipeline. It's actually CPUs that had issues with small numbers up until recent models. Intel even introduced flags on x86 CPUs to allow flushing denormal values to zero and assume denormals as zero (not applied by default, as it breaks IEEE and most programming language standards). Most CPUs in use today do not have native denormal support in their floating-point processing units. Whenever a denormal is detected in such a CPU, an interrupt is emitted, the FPU is put offline and the calculation with the denormal is done by emulation. Such operations can cost over a thousand clock ticks. Full-speed single precision floating point denormal handling is important (and costly in terms of transistor counts) on GPUs. GPUs don't have any other logic on board to emulate any instructions in the first place. On top of that, slowing down certain instructions in one core will pause all other grouped cores in a GPU (the curse of massive parallelism, which does apply on any branching code).
In terms of accuracy of divisions, I looked up the 'rcp' instruction (HLSL will generally not compile to straight division, but rather use a fast reciprocal and multiply), and the documentation is what I expected it would be: http://msdn.microsoft.com/en-us/library/windows/desktop/bb147315%28v=vs.85%29.aspx .
There are no issues with handling small numbers on a GPU, as long as the numbers don't require such a large exponent or mantissa that only double precision or even greater will suffice for calculations.
In regards to floating-point equality evaluations, any floating-point number compared to itself for equality will yield true, except for NaNs. Floating-point numbers do get altered by arithmetic instructions, which accumulate relative error that you might have to compensate for. However, the only instruction I see before the last set of comparisons to "rgb" and "maxvalue" is 'max'. That is a flat branching instruction, not an arithmetic type. No relative error is accumulated, so the epsilon on these comparisons is useless.
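One possible way to apply this to the division, as a sketch only: test the divisor itself for zero instead of comparing against an arbitrary small constant. SaturationFromChroma is a hypothetical helper, not code from the shader under discussion, and returning 0 for a non-positive divisor is an assumption about what the caller wants.

```hlsl
// Hypothetical helper isolating the "hsl.y = CValue/tempf;" step.
// CValue = max - min of the RGB channels, lightness = 0.5 * (max + min).
float SaturationFromChroma(float CValue, float lightness)
{
    float divisor = 1.0 - abs(lightness * 2.0 - 1.0);

    // Only a divisor of exactly zero yields infinity or NaN. It can also turn
    // negative when the input channels lie outside [0,1]; both cases are
    // treated as "no saturation" here.
    if (divisor <= 0.0)
        return 0.0;

    return CValue / divisor;
}
```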

mhourousha

10th July 2013, 13:43

JanWillem32, thx for the reply
about saturation: because RGB<->HSL is not a linear transform, it requires lightness within [0,1]; values beyond this range will cause artifacts. http://en.wikipedia.org/wiki/HSL_color_space
about shader compiler optimization: in fact, the token-assembly shader language (the asm shader in D3D on the Windows platform) does not map 1 to 1 to HW instructions. .rgbb will indeed save a token-asm instruction like 'mov oC0.w,c0', but if you look at the HW assembly (using a tool like ShaderAnalyzer), the two methods cost the same number of clock cycles in most cases. And using .rgbb assumes that the alpha channel will not be used later, so it's not an always-safe trick, right? Oh I see, I should use 'return float4(ColorVibrance(c0.xyz,vibrance),c0.w);' instead of 'return float4(ColorVibrance(c0.xyz,vibrance),1.0);'
about floating point: for recent GPUs, you are right. But some old GPUs do not implement the IEEE standard strictly, like ATI R300-R400's fp24 internal precision, and I'm afraid some shader compilers' aggressive optimization may use fp16 instead of fp32 (for old NVidia cards). So I use the epsilon to be safe; it only causes a very small performance hit.