25th September 2020, 17:46 | #21 | Link | |
Registered User
Join Date: Jun 2020
Posts: 303
|
hlsl pixel shader Intro (dx9)
Quote:
I wrote a basic intro here (it also details the limitations): https://forum.videohelp.com/threads/...l)#post2587323

Checking out and modifying existing shaders is a good way to get started. You can have multiple return statements: use the screen to display your intermediate results and comment them out as required. Define your user parameters with #define.

For development, I would recommend mpc-hc and Notepad++ with extended C syntax highlighting for .hlsl files.
Notepad++ > Settings > Style Configurator... > C, with .hlsl as user extension and the following user-defined keywords:
- instructions: main tex2D saturate dot pow max min lerp mul frac sign step sqrt clamp
- types: sampler float4 float3 float2 float3x3
A User Defined Language (hlsl.xml) is also possible, but it is a bit flaky.
mpc-hc allows you to display the debug output / perf of your code.

----
// This shader can perform software alignment by Catmull-Rom spline6 interpolation for a 3LCD projector's red and blue panels
// This file is part of Video pixel shader pack.
// (C) 2011 Jan-Willem Krans (janwillem32 <at> hotmail.com)
I must say I have no idea what this pixel shader is used for (some type of projector calibration?).

// This shader should be run as a screen space pixel shader.
- screen space pixel shader: use as a post-resize shader.

// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
See https://forum.doom9.org/showthread.php?t=181651
There is an overhead to converting to linear gamma, so if the effect is negligible in practice you might want to just ignore it, but it could affect the accuracy of the result. If this is the only shader you intend to use, you could integrate the linear gamma conversion into the shader. Otherwise, I typically use a 10-bit integer surface for processing in mpc-hc to avoid too much precision loss with the shader chain.
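To illustrate the "multiple return statements" and "#define your parameters" advice, here is a minimal sketch, not taken from any existing shader. The Strength parameter and the desaturation effect are hypothetical, chosen only to have an intermediate value (luma) worth inspecting, and the Rec.709 luma coefficients are an assumption about the source.

```hlsl
// Minimal debugging sketch (hypothetical effect: partial desaturation).
#define Strength 0.5 // user parameter (assumption): 0 = no effect, 1 = full grayscale

sampler s0 : register(s0);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    float4 c0 = tex2D(s0, tex); // source pixel

    // Intermediate result: luma, assuming Rec.709 coefficients
    float luma = dot(c0.rgb, float3(0.2126, 0.7152, 0.0722));

    // Debug output: uncomment to display the intermediate result on screen
    // return float4(luma, luma, luma, 1);

    // Final result: blend between the source color and its luma
    c0.rgb = lerp(c0.rgb, float3(luma, luma, luma), Strength);
    return c0;
}
```

Uncommenting the early return shows the intermediate value full screen. Comment it out again once it looks right.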
__________________
bShaders: realtime Effects/filters for video players Last edited by butterw2; 30th October 2023 at 11:52. |
|
25th September 2020, 20:13 | #22 | Link |
Registered User
Join Date: Feb 2019
Posts: 235
|
Thanks, I actually just e-mailed Jan-Willem Krans this afternoon asking about it and he whipped up a simple shader to do what I wanted in this case.
Code:
#define RedControls 0
#define BlueControls 0
// RedShiftLeftToRight and BlueShiftLeftToRight, a value of 3. will shift three pixels to the right, 0 is disabled
#define RedShiftLeftToRight 0.
#define BlueShiftLeftToRight 0.
// RedShiftTopToBottom and BlueShiftTopToBottom, a value of 3. will shift three pixels to the bottom, 0 is disabled
#define RedShiftTopToBottom 0.
#define BlueShiftTopToBottom 0.

sampler s0;
float2 c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    float4 s1 = tex2D(s0, tex);// base pixel
#if RedControls == 1
    s1.r = tex2D(s0, tex+c1*float2(RedShiftLeftToRight, RedShiftTopToBottom)).r;// base red pixel
#endif
#if BlueControls == 1
    s1.b = tex2D(s0, tex+c1*float2(BlueShiftLeftToRight, BlueShiftTopToBottom)).b;// base blue pixel
#endif
    return s1;
}

Basically a projector can look like this: https://lowtek.ca/roo/wp-content/upl...03/avsbad3.jpg
Some higher-end projectors have internal software corrections for this sort of issue that you can adjust, but others do not, and that's where this shader comes in.
I used his shader that you posted, but that one has complex pixel interpolation and breaks the perfect 1:1 pixel mapping that I wanted. This new simple shader he wrote for me does exactly what I wanted.
I do have some other ideas for simple shaders that I want to try making, and I may post here if I have questions about how to do something as I start tinkering with them.
Last edited by SirMaster; 26th September 2020 at 17:49. |
25th September 2020, 21:48 | #23 | Link |
Registered User
Join Date: Jun 2020
Posts: 303
|
You just wanted a configurable (x, y) pixel offset on red and blue channels. Very simple indeed using a pixel shader. This could also be used as an effect of sorts, I suppose.
If I can add anything at all, p1 would be a more common name for the pixel offset: float2 p1: register(c1);
__________________
bShaders: realtime Effects/filters for video players |
30th September 2020, 17:51 | #24 | Link |
Registered User
Join Date: Feb 2019
Posts: 235
|
Here is a shader I made for simple 4 way masking.
Code:
#define width 1920.
#define height 1080.
#define left 0
#define right 0
#define top 0
#define bottom 0

sampler s0;

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    if(tex.x >= (left/width) && tex.x <= 1-(right/width) && tex.y >= (top/height) && tex.y <= 1-(bottom/height))
        return tex2D(s0, tex);
    return float4(0, 0, 0, 1);
}

This is useful for projectors that don't have a built-in 4 way masking control. |
30th September 2020, 18:27 | #25 | Link | |
Registered User
Join Date: Jun 2020
Posts: 303
|
Here's my own optimized implementation from barMask.hlsl (Mode==112)
"if" conditionals are quite inefficient with pixel shaders, so the code uses a boolean insideBox function instead.
All functions in HLSL are inline. An inline function generates a copy of the function body (when compiling) for each function call. #define macro functions are also commonly used.
Quote:
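The quoted barMask.hlsl code is not reproduced here. As a stand-in, this is a minimal sketch of a branch-free mask, not the author's implementation: it reuses the #define names from the 4 way masking shader above, and the insideBox helper is a hypothetical version built from step(). Whether it is actually faster than the "if" version would need checking in the compiler output.

```hlsl
// Branch-free 4 way masking sketch (assumption: not the barMask.hlsl code).
#define width 1920.
#define height 1080.
#define left 0
#define right 0
#define top 0
#define bottom 0

sampler s0 : register(s0);

// Returns 1.0 if lo <= pos <= hi on both axes, else 0.0 (hypothetical helper).
float insideBox(float2 pos, float2 lo, float2 hi)
{
    // step(a, b) is 1 when b >= a, so this tests pos >= lo and hi >= pos per axis
    float2 inside = step(lo, pos) * step(pos, hi);
    return inside.x * inside.y;
}

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    float2 lo = float2(left/width, top/height);
    float2 hi = float2(1 - right/width, 1 - bottom/height);
    float m = insideBox(tex, lo, hi);
    // m == 1: source pixel, m == 0: opaque black
    return lerp(float4(0, 0, 0, 1), tex2D(s0, tex), m);
}
```

The texture is sampled for every pixel, including masked ones, which is the price of avoiding the branch.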
__________________
bShaders: realtime Effects/filters for video players Last edited by butterw2; 30th September 2020 at 20:30. Reason: Code was copy-pasted, hopefully without mistakes |
|
30th September 2020, 21:11 | #27 | Link | |
Registered User
Join Date: Feb 2019
Posts: 235
|
Quote:
I normally have the whole chain set to RGB full range (lav filters / madVR, GPU output, and display). And everything is set to and calibrated to 2.2 power law gamma. So do I need to be concerned about doing some sort of linear gamma conversion? I am not sure why simply shifting the colored pixel channels around and adding black border masking would need to worry about anything like this. |
|
30th September 2020, 21:39 | #28 | Link |
Registered User
Join Date: Jun 2020
Posts: 303
|
# Linear Gamma Conversion
https://forum.doom9.org/showthread.php?t=181651 was mentioned in the original LCD calibration source code you linked, and yes, it can be confusing.
If you are just moving pixels around or masking the frame: it is not necessary. Then there are many cases where it might be theoretically necessary (any processing of pixel values), but it makes little or no difference to the end result: just ignore. In a few cases however it is necessary to convert to linear gamma before processing and then gamma encode when you are done. Incorrect processing is so common however, that the result will sometimes look "strange/wrong" when you go through the trouble/overhead of doing it right.
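For the cases where it does matter, the "convert to linear, process, then gamma encode" sequence can be sketched as below. This is only an illustration under stated assumptions: a pure 2.2 power law (as in the calibration mentioned above) rather than the exact transfer function of the source, a 3-tap horizontal average as a placeholder for the processing step, and c1 holding (1/width, 1/height) as in the channel-shift shader.

```hlsl
// Linear-light processing sketch (assumes a pure power-law gamma of 2.2).
#define Gamma 2.2

sampler s0 : register(s0);
float2 p1 : register(c1); // assumed pixel size: (1/width, 1/height)

float3 toLinear(float3 c) { return pow(saturate(c), Gamma); }       // gamma decode
float3 toGamma(float3 c)  { return pow(saturate(c), 1.0 / Gamma); } // gamma encode

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    float2 dx = float2(p1.x, 0);

    // 1. convert the input samples to linear light
    float3 a = toLinear(tex2D(s0, tex - dx).rgb);
    float3 b = toLinear(tex2D(s0, tex).rgb);
    float3 c = toLinear(tex2D(s0, tex + dx).rgb);

    // 2. process in linear light (placeholder: 3-tap horizontal average)
    float3 result = (a + b + c) / 3.0;

    // 3. gamma encode the result for output
    return float4(toGamma(result), 1);
}
```

Removing the toLinear/toGamma calls gives the "incorrect but common" version to compare against.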
__________________
bShaders: realtime Effects/filters for video players Last edited by butterw2; 19th October 2020 at 17:46. |
2nd October 2020, 17:17 | #29 | Link |
Registered User
Join Date: Oct 2018
Posts: 332
|
Hi @butterw2
Maybe you can help me with this. I'm using a Lanczos 3 shader that I found on the internet to do the VDSR scaling in my app. The results are very good, quite similar to AviSynth and Matlab, and using Lanczos instead of Bicubic, as is done in the VDSR paper, allows the final result to be much sharper. This is the code: Code:
/*
 Copyright (C) 2010 Team XBMC http://www.xbmc.org
 Copyright (C) 2011 Stefanos A. http://www.opentk.com

 This Program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.
 This Program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
 You should have received a copy of the GNU General Public License along with this library.
*/
sampler s0 : register(s0);
float4 p2 : register(c2);

#define get(x, y) tex2D(s0, float2(x, y)).rgb
#define FIX(c) max(abs(c), 1e-5);

static const float PI = 3.141592653;

float3 weight3(float x)
{
    const float radius = 3.0;
    float s1 = FIX(2.0 * PI * (x - 1.5));
    float s2 = FIX(2.0 * PI * (x - 0.5));
    float s3 = FIX(2.0 * PI * (x + 0.5));
    float3 sample = float3(s1, s2, s3);
    return sin(sample) * sin(sample / radius) / (sample * sample);
}

float3 line_run(float ypos, float3 xpos1, float3 xpos2, float3 linetaps1, float3 linetaps2)
{
    return get(xpos1.r, ypos) * linetaps1.r
         + get(xpos1.g, ypos) * linetaps2.r
         + get(xpos1.b, ypos) * linetaps1.g
         + get(xpos2.r, ypos) * linetaps2.g
         + get(xpos2.g, ypos) * linetaps1.b
         + get(xpos2.b, ypos) * linetaps2.b;
}

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    float2 pos = tex + p2.zw * 0.5;
    float2 f = frac(pos / p2.zw);

    float3 linetaps1 = weight3(0.5 - f.x * 0.5);
    float3 linetaps2 = weight3(1.0 - f.x * 0.5);
    float3 columntaps1 = weight3(0.5 - f.y * 0.5);
    float3 columntaps2 = weight3(1.0 - f.y * 0.5);

    float suml = dot(linetaps1, float3(1, 1, 1)) + dot(linetaps2, float3(1, 1, 1));
    float sumc = dot(columntaps1, float3(1, 1, 1)) + dot(columntaps2, float3(1, 1, 1));
    linetaps1 /= suml;
    linetaps2 /= suml;
    columntaps1 /= sumc;
    columntaps2 /= sumc;

    float2 xystart = (-2.5 - f) * p2.zw + pos;
    float3 xpos1 = float3(xystart.x, xystart.x + p2.z, xystart.x + p2.z * 2.0);
    float3 xpos2 = float3(xystart.x + p2.z * 3.0, xystart.x + p2.z * 4.0, xystart.x + p2.z * 5.0);

    return float4(
        line_run(xystart.y             , xpos1, xpos2, linetaps1, linetaps2) * columntaps1.r +
        line_run(xystart.y + p2.w      , xpos1, xpos2, linetaps1, linetaps2) * columntaps2.r +
        line_run(xystart.y + p2.w * 2.0, xpos1, xpos2, linetaps1, linetaps2) * columntaps1.g +
        line_run(xystart.y + p2.w * 3.0, xpos1, xpos2, linetaps1, linetaps2) * columntaps2.g +
        line_run(xystart.y + p2.w * 4.0, xpos1, xpos2, linetaps1, linetaps2) * columntaps1.b +
        line_run(xystart.y + p2.w * 5.0, xpos1, xpos2, linetaps1, linetaps2) * columntaps2.b,
        1.0);
}

Code:
float3 weight3(float x)
{
    const float radius = 3.0;
    float s1 = FIX(2.0 * PI * (x - 1.5));
    float s2 = FIX(2.0 * PI * (x - 0.5));
    float s3 = FIX(2.0 * PI * (x + 0.5));
    float3 sample = float3(s1, s2, s3);
    return sin(sample) / sample;
}

I'm not even sure that it will improve the results anyway, since the VDSR network is more limited in what it can improve when using good scaling, and the models that I'm using are quite reduced, but I think it may be worth a try. Once again, thank you in advance for your help.
EDIT: p2 contains the dimensions: (width, height, 1/width, 1/height)
__________________
AviSynth AiUpscale Last edited by Alexkral; 2nd October 2020 at 17:28. |
2nd October 2020, 22:57 | #30 | Link |
Registered User
Join Date: Jun 2020
Posts: 303
|
Your AviSynthAiUpscale project seems interesting. VDSR (Very Deep Super Resolution) is a deep learning approach for enlarging an image, but I haven't done anything with pixel shader resizers beyond bicubic so far. The reason is that Lanczos-3 requires a large kernel and is theoretically non-separable (2D sinc-windowed sinc: k(r), with r the polar radial coordinate).

Bicubic methods use a 4x4 convolution kernel, requiring 16 texture taps. A simple improvement is to separate the horizontal and vertical passes, bringing it down to 8 taps (4+4). Lanczos-3 (kernel: 2a*2a, a=3) requires 36 taps, as demonstrated by your code (36 texture, 199 arithmetic).

Your code comes from xbmc (Kodi). Do you have a URL? It seems similar to https://github.com/xbmc/xbmc/blob/ma...ion-6x6_d3d.fx
However, mpc-be has a 2-pass "compensated Lanczos3" resizer implementation, resizer_lanczos3_x.hlsl, which seems to be 6+6? https://sourceforge.net/p/mpcbe/code...anczos3_x.hlsl
I'm assuming you care more about quality than performance, but it may still be worth checking out.

By Sinc method, do you mean SincResize from Avisynth? "uses the truncated sinc function. It is very sharp, but prone to ringing artifacts."
Once you have the correct kernel, the implementation shouldn't be too challenging. If you are trying to implement the Avisynth method, validation of the shader would be straightforward enough, as you just need to compare to the Avisynth reference.

# Pixel shader resizers in video players
Video players can use pixel shaders to perform scaling/resize (vs cpu software scaling or fixed hardware gpu scaling).
Mpv allows user-defined scaling shaders, enabling the use of high-quality upscalers (a discrete gpu is recommended for these shaders as they are not lightweight).
Mpc-hc/be does not allow user-defined resize shaders with the EVR-CP renderer: you must use one of the pre-defined resize methods.
A pixel shader cannot change the frame resolution, but you can still test a scaling shader by zooming in to the top-left quarter of the screen, for instance. Ex: 1080p screen resolution and 1080p input video in fullscreen (no scaling by the video player).

A resize shader maps the output coordinates (tex) to the corresponding input pixels and interpolates the output pixel values from them. This is typically implemented using a convolution kernel (a more efficient 2-pass horizontal then vertical resizing approach is possible if the kernel is separable).
Note: resize operations should theoretically be performed in linear gamma.
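The top-left quarter test described above can be sketched as follows. This is a rough illustration only: bilinear interpolation is used as a stand-in for a real resize kernel, c1 is assumed to hold (1/width, 1/height) as in the channel-shift shader earlier in the thread, and samples falling outside the frame are assumed to be clamped by the sampler.

```hlsl
// 2x zoom of the top-left quarter with manual bilinear interpolation (sketch).
sampler s0 : register(s0);
float2 p1 : register(c1); // assumed pixel size: (1/width, 1/height)

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    // Map the output coordinate to the input: the full screen shows the top-left quarter
    float2 src = tex * 0.5;

    // Input position in pixel units, relative to texel centers (half-pixel coordinates)
    float2 pix = src / p1 - 0.5;
    float2 f = frac(pix);                  // interpolation weights
    float2 base = (floor(pix) + 0.5) * p1; // center of the top-left neighbor texel

    // 2x2 neighborhood
    float4 c00 = tex2D(s0, base);
    float4 c10 = tex2D(s0, base + float2(p1.x, 0));
    float4 c01 = tex2D(s0, base + float2(0, p1.y));
    float4 c11 = tex2D(s0, base + p1);

    // Interpolate horizontally, then vertically
    return lerp(lerp(c00, c10, f.x), lerp(c01, c11, f.x), f.y);
}
```

A higher-quality kernel would replace the two lerp stages with weighted sums over a larger neighborhood. The coordinate mapping stays the same.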
__________________
bShaders: realtime Effects/filters for video players Last edited by butterw2; 19th October 2020 at 17:47. Reason: +: Pixel shader resize in Video players, mpc-be repository link |
3rd October 2020, 16:14 | #31 | Link | |||
Registered User
Join Date: Oct 2018
Posts: 332
|
Ok, thanks for your answer anyway.
Quote:
Quote:
Quote:
__________________
AviSynth AiUpscale |
|||
3rd October 2020, 19:00 | #32 | Link | |
Registered User
Join Date: Jun 2020
Posts: 303
|
Quote:
Switching kernel functions should just work, but performance will not be dramatically improved. For reference, a 2-pass Catmull-Rom resizer only requires (8 texture, 44 arithmetic).

# Convolution Kernels
A convolution kernel k(x, y) is a table of weights of size X*Y (ex: 3x3, 5x5 or 7x7).
In a pixel shader, a convolution kernel might be calculated from parameters using the kernel function (if symmetry is present, it can be used to reduce the number of calculations required), or, if constant, the kernel can be hardcoded. The kernel is typically normalized so that the sum of the weights is equal to one.
To calculate each output pixel value, the corresponding X*Y input pixel values are sampled, multiplied by their corresponding kernel weights and finally summed.

A 2D separable kernel can be written as kh(x)*kv(y)=k(x, y), with kh and kv 1D-kernels. This allows 2D processing to be performed as 2 successive 1D-shader passes, resulting in much more efficient operation in the case of large kernels.
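As an illustration of a hardcoded, normalized, separable kernel, here is a minimal sketch of the horizontal pass kh(x) of a 5x5 blur. The binomial weights (1, 4, 6, 4, 1)/16 are my choice for the example, and c1 is assumed to hold (1/width, 1/height).

```hlsl
// Horizontal pass of a separable 5x5 blur (sketch, binomial weights).
// Weights (1, 4, 6, 4, 1)/16: they sum to one, so brightness is preserved.
#define w0 0.375  // center tap: 6/16
#define w1 0.25   // +-1 pixel:  4/16
#define w2 0.0625 // +-2 pixels: 1/16

sampler s0 : register(s0);
float2 p1 : register(c1); // assumed pixel size: (1/width, 1/height)

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    float2 dx = float2(p1.x, 0); // one pixel step along x

    // The kernel is symmetric, so taps at equal distance share one weight
    return w0 *  tex2D(s0, tex)
         + w1 * (tex2D(s0, tex - dx)     + tex2D(s0, tex + dx))
         + w2 * (tex2D(s0, tex - 2 * dx) + tex2D(s0, tex + 2 * dx));
}
```

The vertical pass kv(y) is the same shader with dx replaced by float2(0, p1.y). Run one after the other, the two passes use 5+5 taps instead of the 25 a single-pass 5x5 kernel would need.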
__________________
bShaders: realtime Effects/filters for video players Last edited by butterw2; 24th October 2020 at 22:05. Reason: grammar |
|
6th October 2020, 18:00 | #33 | Link |
Registered User
Join Date: Jun 2020
Posts: 303
|
# Split screen display with pixel/fragment shaders:
This simple code modification can be applied to any pixel shader, and is useful to adjust parameter values and visualize what a shader actually does.
ex: Edge detection from Edge Sharpen https://raw.githubusercontent.com/bu...tect_1080p.jpg
Code:
float4 main(float2 tex: TEXCOORD0): COLOR {
    float4 color = tex2D(s0, tex); //source pixel
    /* horizontal split screen, left-half: no effect, right-half: with effect */
    if (tex.x<0.5) return color;
    ...
}

if (tex.x<0.5) return tex2D(s0, tex+float2(0.5, 0)); //Comparison Splitscreen: no effect

Note: This technique can be used pre or post-resize in mpc-hc/be (but post-resize, it only fully works in fullscreen).

It can also be applied to a mpv glsl .hook: https://github.com/butterw/bShaders/...Side.hook.glsl
vec4 hook() {
    vec4 color=HOOKED_tex(HOOKED_pos);
    if (HOOKED_pos.x<0.5) return color;
    ...
}

Vertical split screen:
if (tex.y<0.5) return color; //top-half: no Effect

For a sharpening filter, enabling/disabling the shader might be a more effective way to grasp the visual difference. It is of course also possible to take screenshots for comparison.
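For completeness, here is a minimal self-contained sketch of the horizontal split screen. The grayscale effect is only a placeholder (with assumed Rec.709 luma coefficients) standing in for whatever shader is being evaluated.

```hlsl
// Split screen sketch: left half shows the source, right half the effect.
sampler s0 : register(s0);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    float4 color = tex2D(s0, tex); // source pixel

    // Left half: return the unprocessed source
    if (tex.x < 0.5) return color;

    // Right half: placeholder effect (grayscale, Rec.709 coefficients assumed)
    float y = dot(color.rgb, float3(0.2126, 0.7152, 0.0722));
    return float4(y, y, y, color.a);
}
```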
__________________
bShaders: realtime Effects/filters for video players Last edited by butterw2; 29th December 2020 at 12:28. Reason: +bSide.hook -screenshots |
8th October 2020, 18:52 | #34 | Link |
Registered User
Join Date: Jun 2020
Posts: 303
|
# Edge Detection
I've posted a couple of edge detection shaders based on convolution kernels at https://gist.github.com/butterw

- Edge_Sharpen.hlsl: mpc-hc Edge Sharpen optimization and luma edge detection mod (GPL v3 licensed). Screenshot linked in the preceding post.
Edge sharpening is one of the applications of edge detection: it can help a source which is a bit soft without generating artifacts.
On a clean high-res source, the following luma edge detection methods work very well on text, objects and hair, but maybe not so well on teeth.

- Frei-Chen edge detection in luma (mpv .hook shader): can be used in combination with NoChroma.hook.
Directly processing source luma seems to give a better result (vs obtaining luma from rgb).

- Sobel/Canny edge detection: https://github.com/butterw/bShaders/bSobel_Edge.hlsl. I've also made available a mpv .hook port of a glsl version of Sobel (in rgb).
Sobel is the most common algorithm for edge detection. It can be used stand-alone or as part of the Canny algorithm. The final stages of Canny are better implemented on cpu or in a compute shader, however.

Canny edge detection video filter in ffmpeg (cpu): https://ffmpeg.org/ffmpeg-filters.html#edgedetect
mpv --vf=lavfi="[edgedetect=low=0.1:high=0.4]" input.mp4

These filters are also available (apply a grayscale filter on output):
mpv --vf=sobel input.mp4
mpv --vf=prewitt input.mp4
mpv --vf=roberts input.mp4
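To show what a convolution-based edge detector looks like in practice, here is a minimal Sobel sketch on luma. It is written from the textbook Sobel kernels, not copied from bSobel_Edge.hlsl, and assumes Rec.709 luma coefficients and c1 = (1/width, 1/height).

```hlsl
// Sobel edge detection on luma (textbook sketch, 8 texture taps).
sampler s0 : register(s0);
float2 p1 : register(c1); // assumed pixel size: (1/width, 1/height)

// Luma of the pixel at an (x, y) offset in pixels (Rec.709 coefficients assumed)
float luma(float2 tex, float2 offset)
{
    return dot(tex2D(s0, tex + p1 * offset).rgb, float3(0.2126, 0.7152, 0.0722));
}

float4 main(float2 tex : TEXCOORD0) : COLOR
{
    // 3x3 neighborhood (the center pixel has zero weight in both kernels)
    float tl = luma(tex, float2(-1, -1));
    float t  = luma(tex, float2( 0, -1));
    float tr = luma(tex, float2( 1, -1));
    float l  = luma(tex, float2(-1,  0));
    float r  = luma(tex, float2( 1,  0));
    float bl = luma(tex, float2(-1,  1));
    float b  = luma(tex, float2( 0,  1));
    float br = luma(tex, float2( 1,  1));

    // Horizontal and vertical gradients
    float gx = (tr + 2 * r + br) - (tl + 2 * l + bl);
    float gy = (bl + 2 * b + br) - (tl + 2 * t + tr);

    // Gradient magnitude, displayed as grayscale
    float g = saturate(length(float2(gx, gy)));
    return float4(g, g, g, 1);
}
```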
__________________
bShaders: realtime Effects/filters for video players Last edited by butterw2; 24th October 2020 at 22:03. Reason: +ffmpeg --vf |
8th October 2020, 20:04 | #35 | Link |
Registered User
Join Date: Oct 2018
Posts: 332
|
So I gave this another try (not knowing how it works yet). As I see it, the kernel is in the weight3 function, so I have changed it again like this:
Code:
float3 weight3(float x)
{
    const float radius = 3.0;
    float s1 = 2.0 * PI * (x - 1.5);
    float s2 = 2.0 * PI * (x - 0.5);
    float s3 = 2.0 * PI * (x + 0.5);
    float ret1 = (abs(s1) < 2.0 * radius) ? sin(s1) / s1 : 0;
    float ret2 = (abs(s2) < 2.0 * radius) ? sin(s2) / s2 : 0;
    float ret3 = (abs(s3) < 2.0 * radius) ? sin(s3) / s3 : 0;
    return float3(ret1, ret2, ret3);
}

The metrics are not good, but that doesn't matter too much to me. I'll use it to train some networks and then we'll see what the result is.
__________________
AviSynth AiUpscale |
8th October 2020, 22:39 | #36 | Link |
Registered User
Join Date: Jun 2020
Posts: 303
|
# Optimizing gpu arithmetic operations
Whether variable x is float or float4, mad (multiply-add) counts as a single operation on gpu, ex:
x*x + y, 2*x +3*4.2 (1 arithmetic op)

Weight3 is the kernel function, yes, but the code uses vectorized float3 operations, which can make it more difficult to understand. I don't know how poorly a naive (brute force) implementation performs. There doesn't seem to be other code available for a single pass lanczos-3 pixel shader.
If you want to try it out, I can upload the 2-pass lanczos-3 shader code from mpc-be (which I've adapted to run in mpc-hc).

Texture fetches hit the gpu harder than arithmetic ops, but it is still useful to reduce arithmetic ops when possible.
You need to help out the compiler in some cases. Check the compiler output to see if this saves one operation, for instance:
float s1 = 2.0 * PI * (x - 1.5);
can be written as:
float s1 = 2 * PI * x - 2 * PI *1.5; //1 arithmetic
Instead of floats s1, s2, s3 you could use one vector operation to calculate float3 s and then use s.x, s.y, s.z

# Cost of hlsl functions (in arithmetic operations)
Single op: A*B+C (mad), float dot(A, B), frac(A), saturate(A)
2: lerp(A, B, 0.5), floor(A), step(A, 0.5), clamp(A, 0.2, 0.8)
3: float length(A)
4: fmod(A, 2)
4,5: smoothstep(0, 1.0, A)
trunc(A)
---
ops>8: If multiple operations on float variables are needed, look at packing a vector.
sqrt(A)
sin(A)
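The "one vector operation" suggestion could look like the sketch below, applied to the sinc-only weight3 variant posted earlier. It is untested against the compiler output, so whether it actually saves operations is an assumption to verify. The main function is only a hypothetical visualization that plots the three weights as RGB across the screen width.

```hlsl
// Vectorized weight3 sketch: the three sinc arguments come from a single float3 mad.
#define PI 3.141592653

sampler s0 : register(s0); // unused here, kept for consistency with the player setup

float3 weight3(float x)
{
    // s = 2*PI*x + 2*PI*(-1.5, -0.5, 0.5): one multiply-add, the constants are folded at compile time
    float3 s = 2 * PI * x + 2 * PI * float3(-1.5, -0.5, 0.5);
    s = max(abs(s), 1e-5); // same guard as the original FIX macro (sinc is an even function)
    return sin(s) / s;
}

// Hypothetical debug view: display |weights| as RGB, with x running across the screen
float4 main(float2 tex : TEXCOORD0) : COLOR
{
    return float4(abs(weight3(tex.x)), 1);
}
```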
__________________
bShaders: realtime Effects/filters for video players Last edited by butterw2; 27th October 2020 at 22:30. Reason: +arithmetic ops cost of functions |
8th October 2020, 23:55 | #37 | Link | |
Registered User
Join Date: Oct 2018
Posts: 332
|
Quote:
I needed to change that from the original code (the link I posted) because the max function gave wrong results, but you are right, now it can be done this way.
__________________
AviSynth AiUpscale |
|
9th October 2020, 00:58 | #38 | Link |
Registered User
Join Date: Jun 2020
Posts: 303
|
Compensated Lanczos3 pass X (from mpc-be, adapted for mpc-hc user shader with 2x magnification):
https://gist.github.com/butterw/8190...6bb6f188b9222b
You'll also need the second pass (pass-Y), which is the same as pass-X, but applied to the y-axis. In hlsl, texel centers (tex) are situated at half-pixel coordinates.
If you don't need 2x magnification, don't start from my adaptation, start from the original code (just define dxdy as required; I prefer using p (or p1) for this, the pixel widths are then p.x, p.y).

float4 p2: register(c2); //in this application, c2 contains: (width, height, 1/width, 1/height)
#define p p2.zw //pixel widths (p.x, p.y)

# Testing a shader with a png image input
A good way to test a video player shader is to use a png image as input. The output will update automatically when you modify/save the shader, and an output screenshot can be saved for comparison if required.
Input: RGB(A) png (preferably of the same resolution as the output screen), shader (preferably a pre-resize shader), output (fullscreen preferred for screenshots in Windows).
The idea is to bypass compression artifacts (png is lossless), yuv2rgb conversion/range expansion and video player scaling, so as to only test the pixel shader.
__________________
bShaders: realtime Effects/filters for video players Last edited by butterw2; 19th October 2020 at 17:40. Reason: clarity |
9th October 2020, 02:29 | #39 | Link |
Registered User
Join Date: Oct 2018
Posts: 332
|
Thanks, I'll have some time in the next few days to take a look at all this, hopefully soon I will understand a little bit more.
__________________
AviSynth AiUpscale |
12th October 2020, 05:08 | #40 | Link |
Registered User
Join Date: Oct 2018
Posts: 332
|
So I didn't have much luck trying to change this code either, but it's ok, I don't want to keep trying or filling this thread with posts related to this. I just wanted to point out that the mpc-be code is wrong, the changes you made fixed it, but it still produces a half pixel shift to the left (same as the xbmc code but in the opposite direction), you can fix it like this:
Code:
float coord = (tex.x - p1.x)*0.5*p0.x;
__________________
AviSynth AiUpscale Last edited by Alexkral; 12th October 2020 at 05:10. |
Tags |
hlsl, mpc-be, mpc-hc, mpv, pixel shaders |