MPC-HC tester builds for internal renderer fixes

v0lt
2nd February 2015, 19:34
MPC-BE + VMR-9 (renderless):
X8R8G8B8 - ok (bad two-pass interpolation)
A16B16G16R16F - ok (good two-pass interpolation!)
A32B32G32R32F - doesn't work

mpc-hc SSE2 tester dfr7370rrrrri.7z + VMR-9 (renderless): black screen

JanWillem32
3rd February 2015, 00:10
I replaced the builds and source code in my earlier post.
VMR-9 should work again. (It must have been broken for ages...) Resizing the window while media is loaded should be a bit less jerky than it used to be, but I might be able to optimize it some more later on.

I'll accept the resizing artifacts in the performance mode. We can't use anything better than plain 8-bit integer surfaces for that mode. As long as all quality modes are free of artifacts, I'm fine with this situation.

v0lt
3rd February 2015, 04:09
mpc-hc SSE2 tester dfr7370rrrrri.7z (on February 3) + VMR-9 (renderless):
X8R8G8B8 - ok (bad two-pass interpolation)
A16B16G16R16F - ok (good two-pass interpolation)
A16B16G16R16 - ok (good two-pass interpolation)
A32B32G32R32F - ok (good two-pass interpolation)

I see very contrasting figures on full screen.

JanWillem32
3rd February 2015, 12:03
For "I see very contrasting figures on full screen", what do you mean exactly? For instance, I made basic color correction the default, so images will not look the same in performance mode and quality mode. If you disable the option in the "Renderer Settings", "Color Management" menu, it will revert to a generic BT.709 R'G'B' output. For AMD cards in D3D fullscreen mode with the 10-bit output option enabled, colors can be way off: the images will look extremely banded and colorful, like a candy store. Just reset the driver and try again if that happens.

ts1
3rd February 2015, 15:26
VMR-9 doesn't work correctly for me. When the video appears after 2 seconds of black screen, the player hangs forever.

JanWillem32
3rd February 2015, 17:24
ts1, those things can happen. Let's try a debug build again. (As usual, do not use these builds for regular playback, and don't forget to add D3DCompiler_47.dll or D3DCompiler_43.dll.) If a warning occurs, just click on continue, but please note down what the message box says in addition to the normal log.

ts1
3rd February 2015, 17:39
ASSERT Failed
m_pGeneralVertexBuffer
At line 685 of vmr9allocatorpresenter.cpp

log: http://s000.tinyupload.com/index.php?file_id=67282764943006020139

fagoatse
3rd February 2015, 18:34
I noticed that AMD enabled 10-bit output by default in their newer drivers. Does that affect anything? It's in the display and not video settings btw.

v0lt
3rd February 2015, 18:54
For "I see very contrasting figures on full screen" what do you mean exactly?
I do not know whether it is bad or good. I only noticed the difference.
MPC-BE VMR9 Bicubic -1,0: http://i.imgur.com/J3S2Jxi.png
MPC-HC-test VMR-9 float32 Bicubic -1,0: http://i.imgur.com/KCk1abH.png

JanWillem32
3rd February 2015, 23:25
ts1, let's try another debug build. The previous one unfortunately triggered on a useless condition.

fagoatse, the Windows desktop compositor was limited to 8-bit surfaces the last time I checked. Unless something has changed in that regard, there's no influence on the renderer. (D3D fullscreen mode bypasses the desktop compositor, by the way. It is therefore much less restricted in output color depth and frame timing.)

v0lt, that difference is because I implemented rendering in an absolute, linear color system for the renderer. It's a good thing. Look up "gamma correct rendering" for more resources about the basics of the subject, and the XYZ, xyY and LMS color spaces for more advanced information about color theory.
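For illustration only, here is a minimal HLSL sketch of what blending in linear light means; it is not the tester build's actual renderer code. It assumes BT.709-encoded R'G'B' input and uses the BT.709 transfer function and its inverse to average a pixel with its right-hand neighbor. The helper names ToLinear709 and FromLinear709 are hypothetical, and c1 is assumed to hold the size of one source pixel, as in the shaders posted in this thread.

```hlsl
// Minimal sketch, not the tester build's renderer code.
// Assumes BT.709-encoded R'G'B' input in the [0, 1] interval.
sampler s0 : register(s0);
float2 c1 : register(c1);// assumed: size of one source pixel, as in the shaders in this thread

// Hypothetical helper: BT.709 R'G'B' to linear RGB (inverse of the BT.709 transfer function)
float3 ToLinear709(float3 v)
{
	return v < .081 ? v/4.5 : pow((v+.099)/1.099, 1./.45);
}

// Hypothetical helper: linear RGB back to BT.709 R'G'B'
float3 FromLinear709(float3 l)
{
	return l < .018 ? l*4.5 : 1.099*pow(l, .45)-.099;
}

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	float3 a = ToLinear709(tex2D(s0, tex).rgb);// this pixel, in linear light
	float3 b = ToLinear709(tex2D(s0, tex+float2(c1.x, 0.)).rgb);// right-hand neighbor, in linear light
	// average in linear light, then re-encode to R'G'B' for output
	return FromLinear709((a+b)*.5).rgbb;
}
```

Averaging a black and a white pixel this way gives a coded value of roughly 0.71 instead of the 0.5 that a plain average of the R'G'B' values would give. This is one reason images from a linear-light renderer can look noticeably different from the same images processed directly in R'G'B'.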

XRyche
4th February 2015, 00:20
JanWillem32, you only had to fix the image resizers recently and not the chroma upsamplers, right? Well, the Y'CbCr shaders appear to be working as they should, even leaving everything at their original default values (sharpening clamps @ 2, detection limits @ 32, and NoiseThreshold @ .015625). Those values used to make everything look like it had a thin curtain draped over it. Now they are actually debanding and denoising specific areas in the image and not the whole. Correcting the image scalers shouldn't affect the Y'CbCr shaders like that, should it? Only the chroma upsamplers should have an effect on the Y'CbCr shaders since, to my understanding, chroma is upsampled before the Y'CbCr shaders are even initiated.
I tried them on a media file I know for a fact was heavily denoised and debanded during encoding (to the point of losing some detail, in fact), and the shaders didn't appear to do any debanding or denoising at all. Maybe I'm just seeing the difference between using your chroma upsamplers and having to use ffdshow's chroma upsamplers.
I haven't a clue, but everything is working great now. In fact, it looks better than I have ever been able to get it to look. I wonder how long the image scalers were knackered for? Thanks for the hard work.

ts1
4th February 2015, 12:11
No more errors now, but the player still hangs. In EVR, when the video appears after 2 seconds because of that GetTimingReport() problem, it plays faster for a short period to catch up; VMR-9 just hangs instead (it worked fine twice; I noticed this with the last debug build). Did you also notice that it calls GetTimingReport() twice, at 310x200 and at the played video's resolution? I assume madVR uses this function too (in windowed mode its stats screen says "windowed mode (old path)"), but everything works fine there. By the way, madVR crashes with these builds.

JanWillem32
4th February 2015, 23:20
XRyche, all resizers had an update to deal with various issues and to increase efficiency. I'm glad you seem to like it.

ts1, do you have a new debug log, or does it get stuck at the same point as in the previous log? It took a bit of work to keep GetTimingReport() from failing multiple times (e.g. when resizing), but it was not too hard. The madVR/Haali renderer issue was a trivial alignment issue, and has been fixed.

ts1
4th February 2015, 23:35
It seems that everything works now.
A few minor bugs are left. Frame stepping is bugged with the Alternative scheduler; the video sometimes plays for a short period.
The frame disappears after a pause with the Alternative scheduler on Vista.

JanWillem32
4th February 2015, 23:48
It's a bit odd that this seems to have fixed the issues with VMR-9 r., as I didn't edit it much. I have no idea what was causing the problem. Oh well, as long as it's fixed.
I'll fix frame stepping next. It might be a bit tricky to get it right with the constant frame interpolator, but the alternative scheduler should be capable of it.
The disappearing frame issue is caused by the DWM scheduler of the system. I can't fix that.

ts1
5th February 2015, 00:04
You fixed GetTimingReport() so that it no longer fails multiple times, and that was the cause of the desync at the beginning. Apparently VMR-9 is not good at handling desyncs.

XRyche
5th February 2015, 09:28
I have run into a problem. For some reason I can't get your MPC-HC to recognize xy-vsfilter in its external filters. The weird thing is that it does recognize XySubFilter, which is useless for your version of MPC-HC. I'm actually perfectly fine using the internal subtitle renderer, but the media player shouldn't have a problem with xy-vsfilter, should it?

JanWillem32
5th February 2015, 12:25
I replaced the builds and source code in my earlier post.

ts1, I shaved a bit more off the maximum time that timing function spends trying the automatic mode. (You can also disable it completely by setting Refresh Rate Adjustment in the "Options", "Output" menu to something other than 1.)
Frame stepping is bugged in the EVR mixer. It actually starts to play, sends the frame step message, and then pauses. The renderer unfortunately can't compensate for that, and the alternative scheduler reacts quickly to the 'play' message, so it will often handle multiple frames when frame stepping under these conditions.

XRyche, I'll look at xy-vsfilter later. It should indeed not be such a problem.

ts1
6th February 2015, 12:38
If I set something other than 1 for Refresh Rate Adjustment, the playback frame rate changes with the Alternative scheduler, and with D3D FS + Alternative scheduler.
And DebugView shows memory leak on shutdown:

10.72630787 [3452] EVR: ReleaseServicePointers()
10.72678661 [3452] EVR: Worker thread stopped
10.72700119 [3452] EVR: InitServicePointers()
10.72711945 [3452] EVR: ReleaseServicePointers()
10.81614780 [3452] Detected memory leaks!
10.81620216 [3452] Dumping objects ->
10.81625938 [3452] c:\mpc-hc-rf\src\mpc-hc\mpcpngimage.cpp(104) :
10.81630611 [3452] {1767}
10.81635475 [3452] normal block at 0x00328608, 52 bytes long.
10.81641579 [3452] Data: < t > B4 EB 74 02 00 00 00 00 00 00 00 00 00 00 00 00
10.81646442 [3452] Object dump complete.
Also, regarding the frame disappearing after a pause: it only happens from the second pause onwards; on the first pause it's always OK.

JanWillem32
6th February 2015, 21:02
ts1, that memory leak is not from my code, but from the trunk MPC-HC. I haven't been editing that part of the code.
On a more positive note, I changed some things around and now frame stepping should be more accurate in all modes. I also edited the options and options menu.

XRyche, I just tested xy-vsfilter and I had no issues. Note that the filter enumerates on the "External Filters" tab as DirectVobSubFilter and DirectVobSubFilter (auto-loading version), and you have to add it there before it will activate.

Important note: the automatic refresh rate detection option with the old default setting of "1" will disable the new option.
Tick the new box on the "Options", "Output" tab if you want to remain using the automatic refresh rate detection option. A reset of the renderer settings will also activate the automatic refresh rate detection mode.

kasper93
7th February 2015, 00:04
Not the trunk... It was fixed 8 months ago. Additionally, it was just the logo image, which was not freed on exit. Still no big deal, because it only happened on exit.

XRyche
7th February 2015, 10:26
Ah, sorry I bothered you for such an obvious thing. I was looking for xy-vsfilter or vsfilter.

ts1
7th February 2015, 14:59
Frame stepping is working correctly now, but there are some issues with the slider (time). It moves at half the speed, and doesn't move at all if the video has been stopped.

JanWillem32
8th February 2015, 16:49
kasper93, the base build I'm using is just old. I should update.

ts1, can you describe it better? Do you mean that during frame stepping the slider doesn't move correctly?
A minor bug with the stop state in combination with EVR CP is known; it may get stuck at the first frame until you re-activate the mixer with a seek or something similar. Did you mean that specific bug?

ts1
8th February 2015, 17:06
Yes during frame stepping and yes that specific bug.

ts1
8th February 2015, 18:09
I think I was wrong; it's a different bug, because it happens with VMR-9 and EVR too, and the slider still won't move after a seek or a resize, for example, when playback was stopped. And VMR-9 r. hangs after several frame steps, but only if frame stepping is used in the first 5 seconds of playback; there's nothing useful in DebugView's log (I still have the debug build). The slider's movement speed during frame stepping is correct with EVR and seems correct with VMR-9 r., but not with EVR-CP.

Hera
8th February 2015, 18:14
Is the non-basic frame interpolator causing artifacts (randomly appearing pure-white pixels around 'sharp' edges) and occasional crashes (like when resizing) for anyone else?
EDIT: Harder to see in video with lots of noise such as film grain - much easier in animation. Just making sure it is not my GPU.

ts1
8th February 2015, 18:25
Hera, same here (and in D3D FS mode too), but I didn't experience any crashes.

Does the frame interpolator force the Alternative scheduler? I ask because the frame disappears after a pause with the frame interpolator enabled but without the Alternative scheduler.

XRyche
8th February 2015, 21:02
Actually, the "clipping" (as JanWillem32 called it: randomly appearing pure-white pixels around 'sharp' edges) was happening to me a while back, during the initial CEICAM2 shader testing. JanWillem32 fixed that for me, and it never happened with the XLRCAM shaders. The crashing issue never happened to me, though.

I haven't updated to the latest build that was posted on 2/06/15. I'm still using the 2/02/15 build.

JanWillem32
9th February 2015, 01:26
ts1, I might have solved the stop bug (by adding a seek when stopping). I haven't been able to replicate this bug anymore, so that's a good start.
When the video is in stopped mode the player should not allow any other point on the seek bar than the start. That's normal. Seeking is only available in running or paused modes.
Frame stepping is off for interlaced sources from what I've seen (the mixer will only frame step for every two steps). I can't fix this bug, as it's the mixer's doing. I could not replicate other frame stepping issues. Could you describe exactly how you get the frame stepping and the seek bar to not run at the same speed?
The bug with VMR-9 hanging when frame stepping in the first few seconds is due to the mixer not responding to an information request. I can't fix this bug. (VMR-9 r. actually doesn't have any specific frame-stepping code, it's all inside the mixer.)
As a small side-note, frame-stepping backwards will often take longer than forwards, due to how the decoding works. It might take a second to complete.
The constant frame interpolator will use features of the Alternative scheduler if available.

Hera, interpolation artifacts can happen with complicated filters like the constant frame interpolator. I'll take a look if anything seems out of place later.
I haven't been able to replicate any crashes. In which scenario do you get the crashes? (What settings are you using, what type of video, any subtitles, input frame rate, display refresh rate, and so on?)
We might have to try a new debug build to analyze these crashes.

Hera
9th February 2015, 04:26
32-bit Floating Point Surfaces
Alternative Scheduler
9 Dithering
Motion Adaptive, Medium
DXVA Native using LAV filters
Windows 8.1 64-bit / NV 970

Got a crash when resizing window by dragging it to the top with the mouse OR Winkey + UP.
Second crash on exit, not sure if related.

Will see if happens with new build....

Also, when toggling interpolation, video stops then fast forwards trying to catch up.
Also, subtitles are subject to interpolation from what I can tell - I get white pixel noise around subtitles.
Also, is it me or are subtitles now rendered to the size of the video and not desktop?

ts1
9th February 2015, 07:20
Never mind, the slider still doesn't move after a stop, but it's the same in the trunk build.

Could you describe exactly how you get the frame stepping and the seek bar to not run at the same speed?
On a short (3-10 s) video I used frame stepping from the beginning; by the time the video had ended, the slider was right in the middle.

XRyche
9th February 2015, 21:28
Never mind, the slider still doesn't move after a stop, but it's the same in the trunk build.


On a short (3-10 s) video I used frame stepping from the beginning; by the time the video had ended, the slider was right in the middle.

This isn't happening to me on the latest 64-bit build. It was an issue back when I was using Vista 32-bit, though.

JanWillem32
10th February 2015, 10:06
Hera, I still haven't been able to replicate any crashes related to the constant frame interpolator. I've added another debug build to try to debug this issue.
When toggling the constant frame interpolator on, multiple shaders have to be compiled. That takes some time. The renderer will try to catch up with the global timer to synchronize again after that.
Subtitles are indeed blended before frame interpolation passes are executed. I still haven't figured out a way to move the frame interpolation passes to another stage.
Subtitles can be rendered at video area size or full window size; see the "Options", "Subtitles" tab for the option.

ts1, I've replicated the slow frame stepping issue. It's not caused by the renderer code, but by the external mixer counting the frames up to too low a number. I can't fix this issue. Luckily, it only affects the seek bar, and nothing playback related.

x86 SSE2 debug: http://www.mediafire.com/download/4690xb2p7nhvevd/mpc-hc_SSE2_tester_dfr7370rrrrri_debug.7z

ts1
10th February 2015, 18:46
Too many resets during window resize. Resize is slow because of this. Is it intentional?

JanWillem32
10th February 2015, 19:02
The resets are actually quite limited. I set a maximum of twice per second. The default was checking every frame for changed settings, window size and some other environment changes.

ts1
10th February 2015, 19:39
Is it Alternative scheduler's limitation? Can it be done without resets, so that resize would be smooth as in the trunk mpc-hc?

JanWillem32
11th February 2015, 07:35
The renderer in the trunk build is a pile of garbage. Even in this regard it does everything wrong. It doesn't set up a presentation queue, but rather a copy-to-desktop swap chain at the monitor resolution it first initializes at. (You can move the player window over to another monitor with a higher resolution and see how buggy such a setup is.) I've always simply used correctly window-sized presentation swap chains that actually allow proper queuing. This does mean that you have to reset window size-dependent stuff with every resize, but that's just the way it is. DirectX 9 is slightly problematic in this regard as well, as it can't reset as fast as DirectX 10+ due to API limitations.
In many other cases I've also chosen to use the optimal efficiency during rendering, even if these optimizations require resets of some parts of the renderer.

I found a resizing bug that could cause crashes. This is probably the one Hera described earlier. I also optimized (re-)initialization for the renderer.

regular builds (just small updates this time):
x64: http://www.mediafire.com/download/0s9zceobckg496t/mpc-hc64_tester_dfr7370rrrrri.7z
x86 SSE2: http://www.mediafire.com/download/j7nv8jir4ahfg85/mpc-hc_SSE2_tester_dfr7370rrrrri.7z
x86 SSE: http://www.mediafire.com/download/5p2rdk7ll6ng3ve/mpc-hc_SSE_tester_dfr7370rrrrri.7z
source code: http://www.mediafire.com/download/x6xvnc5ty79exr3/mpc-hc_tester_dfr7370rrrrri_source_code.7z

ts1
11th February 2015, 10:43
The noise with the frame interpolator that was mentioned earlier: http://imgur.com/mZ84Yrw

Overall it works pretty well, only those bugs with the mixer are annoying.

v0lt
14th February 2015, 14:29
@JanWillem32
Why can't EVR-custom use floating-point textures on Intel graphics cards? Is it possible to fix that?
The problem occurs in MPC-HC and MPC-BE.

JanWillem32
14th February 2015, 19:48
ts1, these seem like reasonable interpolation artifacts. They result from the movement direction estimation faulting in a certain area and cause the pixel interpolation to be off.
I do agree that the mixer bugs are annoying. A custom mixer, however, would come at the cost of the mixer pipeline's maximum efficiency in some stages. There is no ideal solution for handling the mixers.

v0lt, Intel chose a fixed-function unit to do mixing for EVR. This unit does not support anything but 8-bit R'G'B' outputs. VMR-9 r. does work with 16- and 32-bit floating-point textures (but it needs some special settings to actually return floating-point quality from the mixer).

XRyche
15th February 2015, 00:08
What is the lowest NoiseThreshold value for general debanding, not denoising, for your vertical/horizontal pass sharpen complex, deband and denoise shaders? The NoiseThreshold value of .015625 is pretty close to perfect for my denoising needs, but it's way too aggressive for general debanding. Just a suggested value is good enough for me. I'll work from your suggestion to find the value that best suits my individual tastes. I really don't want to mess with the SharpenLimits or the Detection Values.

JanWillem32
15th February 2015, 06:19
Banding, static, encoder and grainy noise are all just noise to the analyzing step of that particular shader. It doesn't "see" any difference when debanding and denoising. For the "NoiseThreshold" value I personally use .0078125 . Note that you can set the values of the four stages for "NoiseThreshold" in the body of that shader to different amounts. The inner (upper) value is the most important for denoising, and somewhat for debanding, and the outer (lower) value is mostly important for debanding, and not so much for denoising.
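A sketch of how the four stages could be set independently, using the NoiseThreshold1 to NoiseThreshold4 names from the updated horizontal and vertical pass shaders posted in this thread. The split is only an example, not a recommended preset: the inner stages keep the old default of .015625 for denoising, while the outer stages use the .0078125 value mentioned above, which should make the debanding reach less aggressive. Presumably the same block would go into both the horizontal and the vertical pass shader.

```hlsl
// Example values only (hypothetical split, not a recommended preset)
// valid intervals [0, 1), banding threshold, higher numbers mean stronger deband and denoise
#define NoiseThreshold1 .015625// innermost stage, mostly affects denoising
#define NoiseThreshold2 .015625
#define NoiseThreshold3 .0078125
#define NoiseThreshold4 .0078125// outermost stage, mostly affects the debanding area
```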

XRyche
15th February 2015, 15:54
Could you give me an excerpt from the script showing the locations of the four stages and the upper and lower values? I think I might have an idea of where they are, but I'm not confident at all. Like I've said before, "I know just enough to be dangerous but not enough to be competent."

JanWillem32
15th February 2015, 18:30
I optimized these shaders and have given them multi-stage controls over the noise thresholds. Note that I can still expand the search area beyond the current 9 horizontal and vertical pixels if you wish (though I doubt the effect would gain much from that).
Use the renderer option "Disable Initial Color Mixing Stages" and set up these shaders as follows:
1- R'G'B' to Y'CbCr for LMS rendering
2- if required (on Intel/AMD graphics adapters), an up-sampling shader for 4:2:0 to 4:2:2
3- if required (on Intel/AMD graphics adapters), an up-sampling shader for 4:2:2 to 4:4:4
4- horizontal pass sharpen complex, deband and denoise for LMS rendering
5- vertical pass sharpen complex, deband and denoise, and color conversion to LMS for LMS rendering

// (C) 2015 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// R'G'B' to Y'CbCr for LMS rendering
// This shader should be run as the very first pixel shader in the chain.
// This shader requires compiling with ps_2_0, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.

#ifndef Ml// external compiler
#define Ml 0
#define Mi 0
#define Mr 0
#endif
#if Ml
#define tex2D(s, t) tex2Dlod(s, float4(t, 0., 0.))
#endif
sampler s0 : register(s0);
static const float3x3 mty = float3x3(
#if Mi == 0
.2126, -1063./9278., .5, .7152, -1788./4639., -1788./3937., .0722, .5, -361./7874.// BT.709 (+BT.2020) R'G'B' to Y'CbCr
#elif Mi == 1
.299, -299./1772., .5, .587, -587./1772., -587./1402., .114, .5, -57./701.// BT.601 (+BT.2020) R'G'B' to Y'CbCr
#else
.212, -106./913., .5, .701, -701./1576., -701./1576., .087, .5, -87./1576.// SMPTE 240M R'G'B' to Y'CbCr
#endif
)
#if Mr
*32767./65535.// convert to alternative limited ranges, first part
#endif
;
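// BT.709 case (Mi == 0): row mty[0] holds the R' weights, mty[1] the G' weights and mty[2] the B' weights for Y', Cb and Cr.
// Y' = .2126*R'+.7152*G'+.0722*B', Cb = (B'-Y')/1.8556 and Cr = (R'-Y')/1.5748,
// where 1.8556 = 2*(1-.0722) and 1.5748 = 2*(1-.2126) scale Cb and Cr to the interval [-.5, .5].
// For example, the R' weight for Cb is -.2126/1.8556 = -1063./9278. and the B' weight for Cr is -.0722/1.5748 = -361./7874.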

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 s1 = tex2D(s0, tex).rgb;// original pixel
s1 = s1.r*mty[0]+s1.g*mty[1]+s1.b*mty[2];// convert to Y'CbCr
#if Mr
s1 += float2(16384./65535., 32767./65535.).xyy;// convert to alternative limited ranges, second part
#endif
return s1.rgbb;
}

// (C) 2015 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// horizontal pass sharpen complex, deband and denoise for LMS rendering
// This shader should be run as a Y'CbCr-stage pixel shader.
// This shader requires compiling with ps_3_0, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.

// fractions, either decimal or not, are allowed
#define SharpenLimitLuma 2// valid interval [0, 10], luma-specific sharpening limit, 0 is disabled, lower numbers will allow more sharpening on contours
#define SharpenLimitChroma 2// valid interval [0, 10], chroma-specific sharpening limit, 0 is disabled, lower numbers will allow more sharpening on contours
#define LumaDetectionFactor 64// valid interval (1, 250], luma-specific detection factor, if set to the lowest amount no contours can be detected, higher numbers will shift the detection on color difference intervals of debanding to noise detection limit to minimum sharpening to maximum sharpening toward more sharpening
#define ChromaDetectionFactor 64// valid interval (1, 250], chroma-specific detection factor, if set to the lowest amount no contours can be detected, higher numbers will shift the detection on color difference intervals of debanding to noise detection limit to minimum sharpening to maximum sharpening toward more sharpening
// valid intervals [0, 1), banding threshold, higher numbers mean stronger deband and denoise
#define NoiseThreshold1 .015625// innermost threshold, mostly affects denoising
#define NoiseThreshold2 .015625
#define NoiseThreshold3 .015625
#define NoiseThreshold4 .015625// outermost threshold, mostly affects debanding area

#ifndef Ml// external compiler
#define Mr 0
#endif
sampler s0 : register(s0);
float2 c1 : register(c1);
#define sp(a) tex2Dlod(s0, float4(tex+c1*float2(a, 0), 0, 0)).rgb
static const float3 slimits = float3(-SharpenLimitLuma, -SharpenLimitChroma, -SharpenLimitChroma);
static const float3 dfactors = float3(LumaDetectionFactor, ChromaDetectionFactor, ChromaDetectionFactor)*(32768./32767.*float(Mr)+1.);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 n, p, s1 = sp(0);// original pixel
{
float3 s2 = sp(-1);
float3 af = 1.;// accumulated amount of colors from the samples
float3 ac = s1;// accumulate color
float3 cd = abs(s1-s2);// color difference
float3 rcd = max(slimits, 1.-dfactors*cd);// factor for both base and multiplicand is 1.0, the output will be in the interval (-inf, 1]
// invert interval on sharpening
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s2*rcd;
[branch] if(!any(cd >= NoiseThreshold1*(1.-32768./65535.*float(Mr)))) {// continue if all channels are below the noise threshold
float3 s3 = sp(-2);
cd = abs(s1-s3);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s3*rcd;
[branch] if(!any(cd >= NoiseThreshold2*(1.-32768./65535.*float(Mr)))) {
float3 s4 = sp(-3);
cd = abs(s1-s4);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s4*rcd;
[branch] if(!any(cd >= NoiseThreshold3*(1.-32768./65535.*float(Mr)))) {
float3 s5 = sp(-4);
cd = abs(s1-s5);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s5*rcd;
[branch] if(!any(cd >= NoiseThreshold4*(1.-32768./65535.*float(Mr)))) {
float3 s6 = sp(-5);
cd = abs(s1-s6);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s6*rcd;
}
}
}
}
n = ac/af;
}
{
float3 s2 = sp(1);
float3 af = 1.;// accumulated amount of colors from the samples
float3 ac = s1;// accumulate color
float3 cd = abs(s1-s2);// color difference
float3 rcd = max(slimits, 1.-dfactors*cd);// factor for both base and multiplicand is 1.0, the output will be in the interval (-inf, 1]
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s2*rcd;
[branch] if(!any(cd >= NoiseThreshold1*(1.-32768./65535.*float(Mr)))) {// continue if all channels are below the noise threshold
float3 s3 = sp(2);
cd = abs(s1-s3);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s3*rcd;
[branch] if(!any(cd >= NoiseThreshold2*(1.-32768./65535.*float(Mr)))) {
float3 s4 = sp(3);
cd = abs(s1-s4);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s4*rcd;
[branch] if(!any(cd >= NoiseThreshold3*(1.-32768./65535.*float(Mr)))) {
float3 s5 = sp(4);
cd = abs(s1-s5);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s5*rcd;
[branch] if(!any(cd >= NoiseThreshold4*(1.-32768./65535.*float(Mr)))) {
float3 s6 = sp(5);
cd = abs(s1-s6);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s6*rcd;
}
}
}
}
p = ac/af;
}
return ((n+p)*.5).rgbb;
}
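The shader code above repeats one weighting cascade for each direction. It computes a weight from the color difference to the center pixel, accumulates the weighted sample, and moves on to the next ring only while the difference stays below that ring's NoiseThreshold. Below is a minimal single-channel Python sketch of that logic as written. It is not the shader itself. The function names `sample_weight`, `filter_direction` and `filter_pixel` are hypothetical. `dfactor` is assumed to be already scaled the way `dfactors` is. The shader's `any()` test over three channels is reduced to one channel. For luma you would pass `SharpenLimitLuma` and `LumaDetectionFactor`, and for chroma the chroma settings.

```python
def sample_weight(cd, dfactor, limit):
    """Port of: rcd = max(-limit, 1.-dfactor*cd); if (rcd < 0) rcd = limit-abs(rcd);"""
    r = max(-limit, 1.0 - dfactor * cd)  # clamped to [-limit, 1]
    if r < 0.0:
        r = limit - abs(r)  # the shader's "invert interval on sharpening" step
    return r


def filter_direction(samples, dfactor, limit, thresholds, mr=0):
    """One direction of the pass for a single channel.

    samples[0] is the center pixel, samples[1:6] are the neighbors at offsets 1..5.
    thresholds holds NoiseThreshold1..4; mr mirrors the shader's Mr switch.
    """
    af = 1.0         # accumulated weight; the center pixel counts once
    ac = samples[0]  # accumulated color
    tscale = 1.0 - 32768.0 / 65535.0 * mr  # same threshold scaling as the shader
    for k in range(1, 6):
        cd = abs(samples[0] - samples[k])
        r = sample_weight(cd, dfactor, limit)
        af += abs(r)
        ac += samples[k] * r
        # stop after the fifth neighbor, or once this ring's noise threshold is reached
        if k == 5 or cd >= thresholds[k - 1] * tscale:
            break
    return ac / af


def filter_pixel(column, i, dfactor=64.0, limit=2.0, thresholds=(0.015625,) * 4, mr=0):
    """Average of the negative-offset (n) and positive-offset (p) passes, like (n+p)*.5.

    Hypothetical helper: no edge clamping, so i must be at least 5 away from both ends
    (on the GPU the texture sampler handles borders instead).
    """
    n = filter_direction([column[i - k] for k in range(6)], dfactor, limit, thresholds, mr)
    p = filter_direction([column[i + k] for k in range(6)], dfactor, limit, thresholds, mr)
    return (n + p) * 0.5
```

For example, `filter_pixel([0.25] * 11, 5)` returns 0.25. On a flat column every weight is 1, so the weighted average reproduces the input.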

JanWillem32
15th February 2015, 18:30
// (C) 2015 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// vertical pass sharpen complex, deband and denoise, and color conversion to LMS for LMS rendering
// This shader should be run as a Y'CbCr-stage pixel shader.
// This shader requires compiling with ps_3_0, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.

// fractions, either decimal or not, are allowed
#define SharpenLimitLuma 2// valid interval [0, 10], luma-specific sharpening limit, 0 is disabled, lower numbers will allow more sharpening on contours
#define SharpenLimitChroma 2// valid interval [0, 10], chroma-specific sharpening limit, 0 is disabled, lower numbers will allow more sharpening on contours
#define LumaDetectionFactor 64// valid interval (1, 250], luma-specific detection factor; if set to the lowest amount no contours can be detected, higher numbers shift the color difference intervals (debanding, noise detection limit, minimum sharpening, maximum sharpening) toward more sharpening
#define ChromaDetectionFactor 64// valid interval (1, 250], chroma-specific detection factor; if set to the lowest amount no contours can be detected, higher numbers shift the color difference intervals (debanding, noise detection limit, minimum sharpening, maximum sharpening) toward more sharpening
// valid intervals [0, 1), banding threshold, higher numbers mean stronger deband and denoise
#define NoiseThreshold1 .015625// innermost threshold, mostly affects denoising
#define NoiseThreshold2 .015625
#define NoiseThreshold3 .015625
#define NoiseThreshold4 .015625// outermost threshold, mostly affects debanding area

#ifndef Ml// external compiler
#define Mr 0
#define Mp 0
#define Mi 0
#endif
sampler s0 : register(s0);
float2 c1 : register(c1);
#define sp(a) tex2Dlod(s0, float4(tex+c1*float2(0, a), 0, 0)).rgb
static const float3 slimits = float3(-SharpenLimitLuma, -SharpenLimitChroma, -SharpenLimitChroma);
static const float3 dfactors = float3(LumaDetectionFactor, ChromaDetectionFactor, ChromaDetectionFactor)*(32768./32767.*float(Mr)+1.);
static const float3x3 mat = transpose(float3x3(
#if Mp == 0
722868859153469683239595115393861./2255010826531620584211453297294600., 2585192674261804536498018473512337./4059019487756917051580615935130280., 431657167547167713128315634772483./10147548719392292628951539837825700., 12180008436477856247752895389891./75167027551054019473715109909820., 307102197566215335489903665465341./405901948775691705158061593513028., 82569264131239864825730732355689./1014754871939229262895153983782570., 53058419719444384066923671296./3075014763452209887561072678129., 111365697442061984458791034130./1025004921150736629187024226043., 2687859251406579550117775904443./3075014763452209887561072678129.// BT.709 (+BT.2020) RGB to Kuo, Zeise & Lai-modified Hunt–Pointer–Estévez LMS
#elif Mp == 1
782910914364235953227073443841197./2505869460647788817876743483379180., 47437201892727083793508359035427793./75176083819433664536302304501375400., 4251554495779502145981742150711697./75176083819433664536302304501375400., 124459316400474051550350235432343./751760838194336645363023045013754., 5551656166817314597571817932574949./7517608381943366453630230450137540., 240453017040437113518303387746387./2505869460647788817876743483379180., 190503071007861562472684780870./11390315730217221899439743106269., 3484707111371743136520220253336./34170947190651665698319229318807., 30114730866256337874380954722861./34170947190651665698319229318807.// BT.470-2 System B,G/EBU 3213 (PAL/SECAM) (+BT.2020) RGB to Kuo, Zeise & Lai-modified Hunt–Pointer–Estévez LMS
#elif Mp == 2
4849241781348846485632006241./10503846560761984531342857600., 34978470582276588065997919961./73526925925333891719400003200., 35966897450120142609203434./574429108791671029057812525., 1160596510009705046375091833./5251923280380992265671428800., 24013221393818167547899894193./36763462962666945859700001600., 72282281230950671674601009./574429108791671029057812525., -76116568237764117836011./65649041004762403320892860., 25082445529946839408599149./459543287033336823246250020., 108748414370263583165625737./114885821758334205811562505.// BT.470-2 System M RGB to Kuo, Zeise & Lai-modified Hunt–Pointer–Estévez LMS
#else
4582795022958559125948622834369921./13693077790023334631838222431764800., 465527727195754310063646435959647./746895152183090979918448496278080., 431705826356959866042061203601013./10269808342517500973878666823823600., 77217992358463681223124506539751./456435926334111154394607414392160., 56048129428711216246386203396963./74689515218309097991844849627808., 82578571800427591248026245959679./1026980834251750097387866682382360., 42047021865948932675414158432./2334047350572159312245151550869., 1103514589028385241176926280029./9336189402288637248980606203476., 2688162241932152092367341096573./3112063134096212416326868734492.// SMPTE 170M/SMPTE 240M/SMPTE C (NTSC) (+BT.2020) RGB to Kuo, Zeise & Lai-modified Hunt–Pointer–Estévez LMS
#endif
)
#if Mr
*32767./65535.// convert to limited ranges, first part
#endif
);
static const float2x3 mtr = float2x3(
#if Mi == 0
0., -1674679./8940000., 1.8556, 1.5748, -4185031./8940000., 0.// BT.709 (+BT.2020) Y'CbCr to R'G'B'
#elif Mi == 1
0., -25251./73375., 1.772, 1.402, -209599./293500., 0.// BT.601 (+BT.2020) Y'CbCr to R'G'B'
#else
0., -79431./350500., 1.826, 1.576, -41764./87625., 0.// SMPTE 240M Y'CbCr to R'G'B'
#endif
)
#if Mr
*65535./32767.// restore to full range, first part
#endif
;

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 n, p, s1 = sp(0);// original pixel
{
float3 s2 = sp(-1);
float3 af = 1.;// accumulated amount of colors from the samples
float3 ac = s1;// accumulated color
float3 cd = abs(s1-s2);// color difference
float3 rcd = max(slimits, 1.-dfactors*cd);// weight: 1. at zero color difference, falling as the difference grows; 1.-dfactors*cd lies in (-inf, 1], max() clamps it to [-SharpenLimit, 1]
// invert interval on sharpening
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s2*rcd;
[branch] if(!any(cd >= NoiseThreshold1*(1.-32768./65535.*float(Mr)))) {// continue if all channels are below the noise threshold
float3 s3 = sp(-2);
cd = abs(s1-s3);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s3*rcd;
[branch] if(!any(cd >= NoiseThreshold2*(1.-32768./65535.*float(Mr)))) {
float3 s4 = sp(-3);
cd = abs(s1-s4);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s4*rcd;
[branch] if(!any(cd >= NoiseThreshold3*(1.-32768./65535.*float(Mr)))) {
float3 s5 = sp(-4);
cd = abs(s1-s5);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s5*rcd;
[branch] if(!any(cd >= NoiseThreshold4*(1.-32768./65535.*float(Mr)))) {
float3 s6 = sp(-5);
cd = abs(s1-s6);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s6*rcd;
}
}
}
}
n = ac/af;
}
{
float3 s2 = sp(1);
float3 af = 1.;// accumulated amount of colors from the samples
float3 ac = s1;// accumulated color
float3 cd = abs(s1-s2);// color difference
float3 rcd = max(slimits, 1.-dfactors*cd);// weight: 1. at zero color difference, falling as the difference grows; 1.-dfactors*cd lies in (-inf, 1], max() clamps it to [-SharpenLimit, 1]
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s2*rcd;
[branch] if(!any(cd >= NoiseThreshold1*(1.-32768./65535.*float(Mr)))) {// continue if all channels are below the noise threshold
float3 s3 = sp(2);
cd = abs(s1-s3);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s3*rcd;
[branch] if(!any(cd >= NoiseThreshold2*(1.-32768./65535.*float(Mr)))) {
float3 s4 = sp(3);
cd = abs(s1-s4);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s4*rcd;
[branch] if(!any(cd >= NoiseThreshold3*(1.-32768./65535.*float(Mr)))) {
float3 s5 = sp(4);
cd = abs(s1-s5);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s5*rcd;
[branch] if(!any(cd >= NoiseThreshold4*(1.-32768./65535.*float(Mr)))) {
float3 s6 = sp(5);
cd = abs(s1-s6);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s6*rcd;
}
}
}
}
p = ac/af;
}
s1 = (n+p)*.5;
#if Mr
s1 -= float2(16384./65535., 4294737923./8589672450.).xyy;// restore to full range, second part
s1 = s1.r*65535./32767.+s1.g*mtr[0]+s1.b*mtr[1];// restore Y' to full range, convert to R'G'B'
#else
s1 = s1.r+s1.g*mtr[0]+s1.b*mtr[1];// convert to R'G'B'
#endif
s1 = sign(s1)*pow(abs(s1), 2.4);// to linear RGB, negative input compatible
s1 = s1.r*mat[0]+s1.g*mat[1]+s1.b*mat[2];// convert to Kuo, Zeise & Lai-modified Hunt–Pointer–Estévez LMS
#if Mr
s1 += 16384./65535.;// convert to limited ranges, second part
#endif
return s1.rgbb;
}
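The end of `main()` above converts the filtered Y'CbCr sample back to R'G'B' and then to linear light, before the LMS matrix is applied. Below is a minimal Python sketch of only the full-range (`Mr == 0`), BT.709 (`Mi == 0`) path. The coefficients are copied from `mtr`. The function names are hypothetical. The limited-range offsets and the Kuo, Zeise & Lai LMS matrix are left out for brevity.

```python
import math

# BT.709 rows of the shader's mtr (Mi == 0): Cb and Cr contributions to R', G', B'
MTR_CB = (0.0, -1674679.0 / 8940000.0, 1.8556)
MTR_CR = (1.5748, -4185031.0 / 8940000.0, 0.0)


def ycbcr_to_rgb_bt709(y, cb, cr):
    """Full-range path: s1 = s1.r+s1.g*mtr[0]+s1.b*mtr[1]; Y' is added to every channel."""
    return tuple(y + cb * a + cr * b for a, b in zip(MTR_CB, MTR_CR))


def to_linear(v, gamma=2.4):
    """sign(s1)*pow(abs(s1), 2.4): a plain power law that stays defined for negative input."""
    return math.copysign(abs(v) ** gamma, v)
```

As written in the shader, the transfer function is a pure 2.4 power law rather than a piecewise curve. The `sign()`/`abs()` pair keeps negative values from out-of-range conversions well defined instead of feeding a negative base to `pow()`.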

XRyche
15th February 2015, 20:52
I'm going to assume that your integrated Chroma Upsampling Fix for Intel/AMD on this renderer is sufficient and someone would need the additional upsampling shaders for another renderer.

I'm just curious, but what made you change the LumaDetectionFactors?

Oh, they work like a charm. Much, much better than the previous shader suite. They do the same thing for the most part, but are easier to control and understand for your average person that doesn't know too much about HLSL and/or video in general. Thank you very much.

JanWillem32
15th February 2015, 21:29
Quote: I'm going to assume that your integrated Chroma Upsampling Fix for Intel/AMD on this renderer is sufficient and someone would need the additional upsampling shaders for another renderer.
If you disable the renderer's initial mixing stages, the renderer's chroma up-sampling and color conversion steps will be disabled. You have to replace them with custom shaders.

Quote: I'm just curious, but what made you change the LumaDetectionFactors?
The original value was for [16384./65535., 49151./65535.] interval data, but the new ones are for [0, 1] interval data (with automatic adaptation if required). That's why the new values are double the original value.

Quote: Oh, they work like a charm. Much, much better than the previous shader suite. They do the same thing for the most part, but are easier to control and understand for your average person that doesn't know too much about HLSL and/or video in general. Thank you very much.
I try my best. If you need anything else, please ask away. Minor modifications of old shaders are easy. Only brand new effects can take a while.
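The "automatic adaptation" for [0, 1] versus [16384./65535., 49151./65535.] data can be seen in the shader's `dfactors` constant and in the scaling applied to every NoiseThreshold comparison. That interval is 32767/65535 wide, so a color difference d in [0, 1] data shows up as d*32767/65535. Multiplying the detection factors by 1+32768/32767 = 65535/32767 restores the same product, and multiplying the thresholds by 1-32768/65535 = 32767/65535 shrinks them to match. Below is a minimal Python sketch of that arithmetic. The function names are hypothetical. Reading `Mr == 1` as the limited-interval switch is an interpretation based on the shader's "convert to limited ranges" comments.

```python
LIMITED_WIDTH = 32767.0 / 65535.0  # width of the [16384./65535., 49151./65535.] interval


def effective_detection_factor(factor, mr):
    """Shader: dfactors = factor*(32768./32767.*float(Mr)+1.)"""
    return factor * (32768.0 / 32767.0 * mr + 1.0)


def effective_noise_threshold(threshold, mr):
    """Shader: NoiseThreshold*(1.-32768./65535.*float(Mr))"""
    return threshold * (1.0 - 32768.0 / 65535.0 * mr)
```

With `mr = 1`, `effective_detection_factor(64.0, 1) * (d * LIMITED_WIDTH)` equals `64.0 * d` up to rounding. The same slider values therefore behave the same whichever interval the data is stored in.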

XRyche
15th February 2015, 22:22
Quote: If you disable the renderer's initial mixing stages, the renderer's chroma up-sampling and color conversion steps will be disabled. You have to replace them with custom shaders.

That explains a lot for me. Ever since I switched to AMD from Nvidia I haven't seen a significant change. This is why. I thought I was just a victim of the placebo effect.

The only chroma upsampling shaders I have are from your "Video shader pack 1.4", which doesn't have any of the more recent upsampling shaders you use in your renderer. Could I ask you to make those available? If that's too daunting a task, just the Lanczos shaders would be sufficient.

Quote: The original value was for [16384./65535., 49151./65535.] interval data, but the new ones are for [0, 1] interval data (with automatic adaptation if required). That's why the new values are double the original value.

Quote: I try my best. If you need anything else, please ask away. Minor modifications of old shaders are easy. Only brand new effects can take a while.

I was just curious as to why. There wasn't any motive behind wanting to know except for curiosity :) .

JanWillem32
16th February 2015, 16:35
http://www.mediafire.com/download/m7td27lofk6dyi1/YCbCr-type_sharpen_complex_test_3.7z
Just a reminder: these are renderer-specific shaders for the quality mode. This package includes the three shaders listed above for "sharpen complex, deband and denoise for LMS rendering" and also all current chroma up-sampling shaders.
The readme.txt file contains information about how to chain shaders for different renderer conditions.
The chroma up-sampling shaders are literally copies from the internal shaders. I didn't test them separately, so please inform me if anything is wrong with them.