
View Full Version : Video pixel shader pack



JanWillem32
8th July 2011, 22:51
DCI demands it as a minimum. The demands for processing are even higher. In terms of DirectX 9 color formats, only A16B16G16R16 (non-float) and A32B32G32R32F would suffice for processing from beginning to end.
My projector is 12-bit, too. I only wish I had more absolute control over it (to disable more stuff). It's not that bad compared to some other display devices, though. It exposes the full gamut in full-range RGB mode, and allows almost full user control over all the internal parts in the OSD. I just don't like the post-processing part.

TheElix
8th July 2011, 23:41
From the times I've seen projectors, I can't say I found their images more colourful or contrasty than PDPs, for example.

JanWillem32
9th July 2011, 01:04
Aside from those in labs and specialist environments, the digital (and even analog ones, in some ways) projectors in cinemas aren't bad at all. I'm talking about types that generally weigh more than 100 kg, of course.

Eliminateur
22nd July 2011, 00:43
Jan, I'm not clear on what is a screen-space shader and what is not.
The latest MPC builds have only two shader types: "pre resize" and "post resize".
So which do I use where? Could you update the readme?

I'm having the infamous red-on-black pixelation issue with ATI and DXVA, and I'm trying to solve it with your pack.
Which of the shaders should I use? I'm using FFPP, so I guess I'll use the ones for floating-point surfaces, but I have no idea which of the three to use.
I used 2 and 3 and got green and white converted to purple!

I also have a wide-gamut display, so I tried using "gamma conversion of linear RGB to wide gamut RGB for floating point surfaces" as post-resize and it didn't change... anything...

fairchild
22nd July 2011, 00:56
I asked something similar in the other thread. When using the YV12 chroma upsampling shader, does it have to be done pre- or post-resize, and/or on both? I'd imagine some of the chroma shaders in this pack are similar.

Eliminateur
22nd July 2011, 01:35
Hmm, I had to use 1->2->3 compiled for PS 3.0 and now it works.

One more thing: creating new shaders in MPC-HC is a CHORE. You can't select a "blank" one, so you need to change the name, press Enter, then delete everything in the window, THEN copy-paste, then enter the name again so it gets compiled, etc. etc.
The shader editor is quite crappy as it is.

JanWillem32
22nd July 2011, 01:47
The "post resize" item used to be called "screenspace". The skipped chroma up-sampling isn't specific to DXVA, but to enabling either HFPP, FFPP, or 10-bit RGB output on ATi hardware.
Your situation could use this basic setup:
Shader chain:
1. RGB to Y'CbCr for SD&HD video input for floating point surfaces
2. special 420 to 422 intermediate Catmull-Rom spline5 chroma up-sampling for SD&HD video input
3. special 422 Catmull-Rom spline5 chroma up-sampling for SD&HD video input (linear output enabled)

Post resize shaders:
gamma conversion of linear RGB to video RGB for floating point surfaces

I linearize the gamma to avoid darkening artifacts on R'G'B' color blending. Here's an example of no gamma correction throughout the filter stages: http://photocreations.ca/interpolator/index.html .
Of course, you can also edit all kinds of color controls in the step 3 shader, and add a few custom shaders.
The last shader can simply be edited to any display gamma. The common range is usually from 1/2.2 to 1/2.6 . The 256/563. and 5/12. values are just two presets. The linearization power in step 3 is 2.4.
For lighter processing, you can enable the single-pass bilinear filter instead of steps 1, 2 and 3.
A video file encoded with 4:2:2 chroma down-sampling doesn't need step 2.
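As a quick numeric check of those preset exponents (helper names are mine, not part of the shader pack): step 3 linearizes with a power of 2.4, 5/12. is exactly 1/2.4 and so inverts it precisely, while 256/563. closely approximates 1/2.2.

```cpp
#include <cassert>
#include <cmath>

// Sketch only (helper names are mine, not from the shader pack): step 3
// linearizes R'G'B' with a power of 2.4; the post-resize shader then applies
// one of the preset exponents. 5/12. is exactly 1/2.4, so it inverts step 3
// precisely; 256/563. is a close approximation of 1/2.2.
static double linearize(double v) { return std::pow(v, 2.4); }
static double delinearize(double v, double e) { return std::pow(v, e); }
```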

I'll take a look at adding a full list of chroma interpolation shaders later; I've finished quite a few resizers anyway. The idea of making the shader collection external to the player in .TXT files was rejected. Shaders will have to be loaded from the registry or from an .INI file (warning: the .INI file can only store small ones).
You can copy from the registry if you need a backup or portability: http://forum.doom9.org/showthread.php?p=1514362#post1514362 .

Eliminateur
22nd July 2011, 04:04
Hmmm, this is weird; I'm having problems with the 1-2-3 shaders.
When I tested before, it worked OK; now I keep seeing pixelation on red, and activating/deactivating/recompiling doesn't do anything.
If I open the shader editor and enter the name, it says "could not load shader". Is that the problem with the .INI file?
How big is "small ones"?

Also, I was using the 16-235 -> 0-255 [SD][HD] shader to correct levels; after using your 1-2-3 shaders it doesn't work anymore, even if I put it pre/post/before/after.
I really like that shader, because without it blacks are "grey" on my monitor (hmm, but if I enable that shader I lose black details, as they go very black).

Anyway, the linearization shader makes the gamma go extremely high. I've tried 5/12 and it's still too bright; I settled on 451/563 for now (I'm going to do much more trial and error). It brightens things up a little without ruining the blacks much.

BTW, all this is valid for DXVA, which outputs NV12, but when I use soft decoding I'm outputting RGB32 from ffdshow. How will these shaders act in that case?

Oh, about processing power: I compiled everything with PS 3.0, tried playing back a moderate 1080p clip with DXVA and all shaders active, and GPU usage hovered around 48% (with UVD clocks), so I have aaaample processing power to spare.

JanWillem32
22nd July 2011, 08:23
"Small ones", as in: the total may not exceed 15 KB, I believe. The Windows registry has a limitation in size of 1 MB per entry.

For "16-235 -> 0-255 [SD][HD]", see page one of this thread and a few pages onwards. It's one of the shaders I'm definitely going to delete. It has also been several years since the last report of a driver/mixer that exhibits this flaw.
Gray tints in video where you normally expect darker tints are usually caused by a low display gamma. Try your hand at a tool that can help set up your monitor controls. Note that most of those tools set up for a gamma of 2.2; consumer-grade video is normally a bit steeper, at around 2.4.

The "gamma conversion of linear RGB to video RGB for floating point surfaces" shader de-linearizes the gamma from step 3 (activated manually in the shader). You shouldn't need to set it to a value far beyond the normal interval of 1/2.2 to 1/2.6 .
For any normal color controls, you can also activate controls for Y'CbCr and RGB stages in step 3. Those are meant to correct specific flaws in an encoded video, not so much the flaws in a display.

RGB32 is a compatibility option for usage with older video cards that can't really deal with any video processing at all. It's not a default output normally. Unless you are viewing a .BMP file through the Windows still image filter, you should never get RGB32/X8R8G8B8 as input for the mixer. Valid input for the mixer with 8-bit 4:2:0 Y'CbCr video files is NV12 or YV12, with 8-bit 4:2:2 Y'CbCr video files is YUY2.
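As an aside, those mixer input formats have different memory footprints; a quick sketch (the function names are mine, assuming even width and height):

```cpp
#include <cstddef>

// Illustrative only (function names are mine): byte sizes of the mixer input
// formats named above, assuming even width and height.
static std::size_t nv12_bytes(std::size_t w, std::size_t h) { return w * h * 3 / 2; } // planar Y plane + interleaved CbCr
static std::size_t yv12_bytes(std::size_t w, std::size_t h) { return w * h * 3 / 2; } // planar Y + Cr + Cb, same total as NV12
static std::size_t yuy2_bytes(std::size_t w, std::size_t h) { return w * h * 2; }     // packed 4:2:2, 2 bytes per pixel
```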

I'm familiar with having headroom in processing power. I set my HD4890 to 700/975 clocks during video playback. It only has momentary spikes over 90% GPU usage now and then on 1080p video, even with my heavy rendering chain. (I do need to set the full 3D clocks of 850/975 for a 1080i sample I have, though.) It has a nasty habit of setting the clocks to 500/975 in combination with DXVA playback, which is too low for my needs on 1080p, so that's why I override them.

CiNcH
24th July 2011, 14:35
Hi Jan,

I am currently also fooling around with pixel shaders and have a little problem...

I am creating a pixel shader:

D3DXCompileShaderFromFile( buffer, NULL, NULL, "main", "ps_2_a",dwShaderFlags, &pCode,&pBufferErrors, &g_pConstantTablePS[i] );
g_pd3dDevice->CreatePixelShader( (DWORD*)pCode->GetBufferPointer(),&g_pPixelShader);


I then draw the quad:

g_pd3dDevice->SetVertexShader(NULL);
g_pd3dDevice->SetPixelShader(g_pPixelShader);
g_pd3dDevice->DrawPrimitive( D3DPT_TRIANGLESTRIP, 0, 2 );
g_pd3dDevice->SetPixelShader(NULL);


This works like a charm.

I then switch pixel shader version to 3.0:

D3DXCompileShaderFromFile( buffer, NULL, NULL, "main", "ps_3.0",dwShaderFlags, &pCode,&pBufferErrors, &g_pConstantTablePS[i] );


When I do this, the quad is not drawn. I only see the background color which has been set with the call to g_pd3dDevice->Clear.

Do you have any idea what is going wrong when setting pixel shader version to 3.0?

(I am loading D3DX9_43.dll BTW)

Eliminateur
24th July 2011, 16:31
I use RGB32 because I want maximum-quality output without any mixer or driver doing anything to the color conversion (I don't trust them), and I find ffdshow's software HQ conversion much better than the mixer's.

I've been toying with the shaders more, and with RGB32 input they actually lower the quality of the chroma; it's like it downscales it (the red-on-black borders start blurring and pixelating).

About the gamma: I tried your setting and it was so high it was unwatchable (I'm going to post screenshots later).
I've tried monitor setting wizards in the past; they all result in values I don't like the look of, either too dark or too bright. (I'm using custom RGB with values close to 100, brightness at 4 and contrast at 40.)


Eliminateur
25th July 2011, 01:59
I've uploaded the screenshots to Dropbox to keep them as PNG, since JPEG masked the errors; the filenames show the settings used:
http://dl.dropbox.com/u/3493496/screens.zip

Here's what I can conclude from them, in parts:

1st part - no shaders:
DXVA chroma looks horribly blocky, as expected.
Soft (ffdshow with RGB32 HQ output) and DXVA 8-bit look almost the same (differences in contrast/brightness are due to the CCC video improvements, and also because ffdshow uses a deband filter).
Verdict: I prefer the brighter-looking DXVA 8-bit output; soft looks more blurred.

2nd part - with shaders (using the 3 shaders in the chroma interpolation folder for floating-point surfaces):
DXVA 10-bit looks almost the same as 8-bit with no shaders (since "save image" does not capture post-shader output, I had to print-screen the file).
Soft decoding shows blocky chroma with shaders.
Verdict: again, soft looks blurred, and DXVA 8-bit without shaders looks exactly the same as with the chroma shaders.

What's the point of going the extra 10-bit route if I can go 8-bit and not use any shader?

For the gamma conversion screen, again the screen capture does not get the shader output, so I had to capture with print screen; you can see how horrible it looks with the default 256/563 value.

JanWillem32
25th July 2011, 15:48
@CiNcH: To test: manually change "ps_3.0" to "ps_3_0".
For a more advanced solution:
Header file:
typedef LPCSTR (WINAPI* D3DXGetPixelShaderProfilePtr)(LPDIRECT3DDEVICE9 pDevice);
typedef HRESULT (WINAPI* D3DXCompileShaderPtr)(LPCSTR pSrcData, UINT SrcDataLen, CONST D3DXMACRO* pDefines, LPD3DXINCLUDE pInclude, LPCSTR pFunctionName, LPCSTR pProfile, DWORD Flags, LPD3DXBUFFER* ppShader, LPD3DXBUFFER* ppErrorMsgs, LPD3DXCONSTANTTABLE* ppConstantTable);
typedef HRESULT (WINAPI* D3DXDisassembleShaderPtr)(CONST DWORD* pShader, bool EnableColorCode, LPCSTR pComments, LPD3DXBUFFER* ppDisassembly);

D3DXGetPixelShaderProfilePtr m_pD3DXGetPixelShaderProfile;
D3DXCompileShaderPtr m_pD3DXCompileShader;
D3DXDisassembleShaderPtr m_pD3DXDisassembleShader;

Initialization section:
const HINSTANCE hDll = ...;// handle to a recent version of the D3DX9 DLL
if(hDll) {
m_pD3DXCompileShader = reinterpret_cast<D3DXCompileShaderPtr>(GetProcAddress(hDll, "D3DXCompileShader"));
m_pD3DXDisassembleShader = reinterpret_cast<D3DXDisassembleShaderPtr>(GetProcAddress(hDll, "D3DXDisassembleShader"));
m_pD3DXGetPixelShaderProfile = reinterpret_cast<D3DXGetPixelShaderProfilePtr>(GetProcAddress(hDll, "D3DXGetPixelShaderProfile"));}
...
m_pProfile = m_pD3DXGetPixelShaderProfile(m_pD3DDev);// get the pixel shader profile level

m_pProfile is a LPCSTR you can use as a direct input for the "LPCSTR pProfile" parameter of m_pD3DXCompileShader; it should be renewed every time a device is re-created or reset.
The other two should behave properly after initialization like this.

It's possible to write a shader that has a very poor ordering system, so that it does compile on level 2.x, but not on level 3.0. It's possible to disable flow control for those.
For the command-line interface for the standalone compiler it's the /Gfa switch. (The compiler is fxc.exe in the "x86" and "x64" folders of "Microsoft DirectX SDK (June 2010)\Utilities\bin\" .)

For the other parts: DrawIndexedPrimitive can be more efficient, avoid "g_pd3dDevice->somecall(NULL);" as much as possible, as it doesn't come for free.

@Eliminateur: I've verified the color transfer matrices to be accurate up to quite a few bits for both VMR-9 and EVR. For the main set of matrices, see: http://msdn.microsoft.com/en-us/library/ms698715%28v=vs.85%29.aspx . (The bt.601 and bt.709 matrices are also used for the two xvYCC types.)
The driver only performs abnormal filtering because filters are set in the CCC in your case. I suggest that everyone manually set up the filters in the video options tab of the video card's driver suite. ATi, Intel and nVidia all implement the filters as optional, but some are enabled by default. Those filters do cost processing power, and may not be to your liking, either.
Using X8R8G8B8 (RGB32) when 8-bit RGB isn't the source format is pretty destructive; Y'CbCr/xvYCC sources lose the data from the [0, 15] and [236, 255], [241, 255], [241, 255] intervals (8-bit quantization assumed). That doesn't happen with the mixer working on A32B32G32R32F (FFPP) or A16B16G16R16F (HFPP) surfaces. As those names imply, the mixer quantization is quite a bit better with those two formats, too.
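The loss can be sketched numerically (the helper name is mine; an integer BT.601-style studio-to-full-swing expansion is assumed): once limited-range luma is expanded to 8-bit full range, codes below 16 and above 235 can no longer be distinguished.

```cpp
// Sketch of the clipping described above (helper name is mine; integer
// BT.601-style studio-to-full-swing expansion assumed): luma codes below 16
// and above 235 are clipped away after the expansion.
static int expand_luma(int y) {
    int v = (y - 16) * 255 / 219;         // maps 16 -> 0 and 235 -> 255
    return v < 0 ? 0 : v > 255 ? 255 : v; // blacker-than-black/whiter-than-white codes are lost
}
```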

For providing gamma correction for internal filtering stages (includes the internal resizers) after chroma up-sampling and color conversion, I added the "LinearRGBOutput" switch to step 3. It's made to linearize the encoding gamma of 1/2.4 to 1, so no darkening artifacts occur with the internal filtering stages when blending colors. To avoid some complaints, I've not enabled the function by default.

CiNcH
25th July 2011, 16:29
Thanks for your answer.

@CiNcH: To test: manually change "ps_3.0" to "ps_3_0".
Copy/paste mistake, sorry. I of course set it to ps_3_0.

I am already feeding the output of 'D3DXGetPixelShaderProfile' into 'D3DXCompileShader', which is ps_3_0.

Still, the ps_3_0 profile does not work for some reason. Within MPC-HC it does.

JanWillem32
25th July 2011, 16:37
Does fxc.exe generate any warnings or errors (even in strict mode)? It's rather rare that all ps 3.0 items would fail.
I assume that hardware vertex processing is enabled?

CiNcH
25th July 2011, 17:08
Yes, 'hardware vertex processing' is enabled. I use the very simple greyscale shader for testing purposes.

CiNcH
25th July 2011, 17:30
Might this be due to the fact that I use fixed-function transforms like SetTransform?

JanWillem32
25th July 2011, 18:33
SetTransform probably isn't a big problem, although a static index and vertex buffer in video rendering usually gains a few percent of performance for resolving vertices. I don't think I've ever used pre-defined transforms yet, except for world view input data to vertex shaders.

header:
#pragma pack(push, 1)// this directive is used on MYD3DVERTEX to copy 32-bit aligned vertex data to video memory on x86 and x64
template<unsigned texcoords>
struct MYD3DVERTEX {
float x, y, z, rhw;
struct {float u, v;} t[texcoords];};
template<>
struct MYD3DVERTEX<0ui32> {
float x, y, z, rhw;
DWORD Diffuse;};
#pragma pack(pop)

CComPtr<IDirect3DIndexBuffer9> m_pIBuffer;
CComPtr<IDirect3DVertexBuffer9> m_pVBuffer;


main:
if(!m_pIBuffer) {
// prepare the static index buffer in video RAM
static const short indices[6] = {0, 1, 2, 2, 1, 3};// two triangles
void* pVoid;// void pointer for memcpy
// create an index buffer interface
hr = m_pD3DDev->CreateIndexBuffer(sizeof(indices), D3DUSAGE_DONOTCLIP|D3DUSAGE_WRITEONLY, D3DFMT_INDEX16, D3DPOOL_DEFAULT, &m_pIBuffer, NULL);
// lock m_pIBuffer and load the indices into it
hr = m_pIBuffer->Lock(0, 0, reinterpret_cast<void**>(&pVoid), D3DLOCK_NOSYSLOCK);
memcpy(pVoid, indices, sizeof(indices));
hr = m_pIBuffer->Unlock();
// set the static index buffer
hr = m_pD3DDev->SetIndices(m_pIBuffer);}

m_pD3DDev->SetFVF(D3DFVF_XYZRHW|D3DFVF_TEX1);

...
Although unfinished, the current AlphaBlt code can serve as a nice example.

HRESULT CDX9RenderingEngine::AlphaBlt(IDirect3DTexture9* pSubtitleTexture, const CRect rcSubSrc, const CRect rcSubDest)
{// only used by DX9AllocatorPresenter.cpp to blend subtitles with resizing
// TODO: add resizing shaders
HRESULT hr;

static CRect Scrsrc, Scrdst;
if(Scrsrc != rcSubSrc || Scrdst != rcSubDest || !m_pAlphaBltVBuffer) {
m_pAlphaBltVBuffer = NULL;
Scrsrc = rcSubSrc;
Scrdst = rcSubDest;
D3DSURFACE_DESC d3dsd = {D3DFMT_UNKNOWN, D3DRTYPE_SURFACE, 0, D3DPOOL_DEFAULT, D3DMULTISAMPLE_NONE, 0, 0, 0};
if(FAILED(hr = pSubtitleTexture->GetLevelDesc(0, &d3dsd))) return hr;

const float wrp = 1.0f/static_cast<float>(d3dsd.Width), hrp = 1.0f/static_cast<float>(d3dsd.Height),
rsl = static_cast<float>(rcSubSrc.left)*wrp, rsr = static_cast<float>(rcSubSrc.right)*wrp, rst = static_cast<float>(rcSubSrc.top)*hrp, rsb = static_cast<float>(rcSubSrc.bottom)*hrp,
rdl = static_cast<float>(rcSubDest.left)-0.5f, rdr = static_cast<float>(rcSubDest.right)-0.5f, rdt = static_cast<float>(rcSubDest.top)-0.5f, rdb = static_cast<float>(rcSubDest.bottom)-0.5f;
const MYD3DVERTEX<1> v[4] = {
{rdl, rdt, 0.5f, 2.0f, rsl, rst},
{rdr, rdt, 0.5f, 2.0f, rsr, rst},
{rdl, rdb, 0.5f, 2.0f, rsl, rsb},
{rdr, rdb, 0.5f, 2.0f, rsr, rsb}};

// prepare the vertex buffer in video RAM
void* pVoid;// void pointer for memcpy
// create a vertex buffer interface
hr = m_pD3DDev->CreateVertexBuffer(sizeof(v), D3DUSAGE_DONOTCLIP|D3DUSAGE_WRITEONLY, D3DFVF_XYZRHW|D3DFVF_TEX1, D3DPOOL_DEFAULT, &m_pAlphaBltVBuffer, NULL);
// lock m_pVBuffer and load the vertices into it
hr = m_pAlphaBltVBuffer->Lock(0, 0, reinterpret_cast<void**>(&pVoid), D3DLOCK_NOSYSLOCK);
memcpy(pVoid, v, sizeof(v));
hr = m_pAlphaBltVBuffer->Unlock();}

// set the special vertex buffer
m_pD3DDev->SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_LINEAR);
m_pD3DDev->SetSamplerState(0, D3DSAMP_MINFILTER, D3DTEXF_LINEAR);
hr = m_pD3DDev->SetStreamSource(0, m_pAlphaBltVBuffer, 0, sizeof(MYD3DVERTEX<1>));

// draw the rectangle
hr = m_pD3DDev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, 4, 0, 2);

// cleanup: set the normal vertex buffer
hr = m_pD3DDev->SetStreamSource(0, m_pVBuffer, 0, sizeof(MYD3DVERTEX<1>));
m_pD3DDev->SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_POINT);
m_pD3DDev->SetSamplerState(0, D3DSAMP_MINFILTER, D3DTEXF_POINT);
return hr;
}

CiNcH
25th July 2011, 19:18
I have no idea anymore what could be wrong :( .

CiNcH
25th July 2011, 19:44
I now found this (http://www.gamedev.net/topic/489553-how-to-replace-setfvf-with-vertex-shader/page__view__findpost__p__4196329). I am using

D3DFVF_XYZ|D3DFVF_DIFFUSE|D3DFVF_TEX1

and not

D3DFVF_XYZRHW

JanWillem32
26th July 2011, 00:07
Interesting that it even worked on 2.x levels.
D3DFVF_DIFFUSE needs a color as a masked ABGR in the vertex data statement, D3DFVF_TEX1 would map a texture on top of that color. I guess it has some use with some kinds of alpha blending, material layering or lighting, but those things are more for end-stage design.
D3DFVF_XYZRHW is very convenient for 2D and some 3D work, as you don't have to mess too much with the Z-depth, culling and such (unless you want to).

CiNcH
27th July 2011, 06:15
I guess it has some use with some kinds of alpha blending, material layering or lighting, but those things are more for end-stage design.
Yes. I am rendering OSD textures on top of the video texture. Can I use 'D3DFVF_XYZRHW' for the video texture, apply shaders, and then render an OSD texture on top with D3DFVF_XYZ (as the OSD can be animated, it has to be transformable) with an alpha channel?

I am new to this whole area and it is just a hobby. Can you recommend some good reading material for D3D9?

JanWillem32
27th July 2011, 16:17
D3DFVF_XYZRHW|D3DFVF_TEX1 will work just fine for that. It means you are using one set of texture coordinates relative to the output surface (X, Y, Z, RHW) and one set of coordinates for the sampling register bound to the source texture (X, Y), so 6 floating points per vertex point. "const MYD3DVERTEX<1> v[4]" describes a rectangle, composed of 2 triangles by drawing the vertices by index from points 0, 1 to 2 and 2, 1 to 3.
To use the example I used earlier again, after ordering the draw operation in DrawIndexedPrimitive, use "SetTexture(0, pOSDtexture);", "SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);" and DrawIndexedPrimitive again. If you want to do the alpha blend with different vertices, pre-load them to the array "const MYD3DVERTEX<1> v[8]" located in 4 to 7 and use "hr = m_pD3DDev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 4, 0, 4, 0, 2);".
Note that I didn't change any other statuses in between the two draw operations. You might want to set "SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_POINT);", for example when you're doing a 1:1 pixel mapping from source to target. Point sampling is less intensive than linear filtering, of course. In a similar way, you can decide anti-aliasing and anisotropic filtering options for each draw operation.

CiNcH
27th July 2011, 16:54
Thanks so much for your input!

I now got ps_3_0 working with the vertex format 'D3DFVF_XYZRHW'. I still have some problems with vertex coordinates, though. When I used 'D3DFVF_XYZ', the coordinates for x and y ranged from -1 to 1, which must have been a fraction of the back buffer. This did not work with 'D3DFVF_XYZRHW'. I now use the dimensions of the back buffer in pixels, with origin (0,0). Why the difference?

You say I should use 'D3DFVF_XYZRHW' for the OSD texture as well? Even if I want to animate it? Can I use SetTransform (which I currently use to animate the OSD) with transformed vertices in the format 'D3DFVF_XYZRHW'?

Does it really make so much of a difference in performance when using an indexed buffer if we only have 4 vertices?

JanWillem32
27th July 2011, 18:58
If you use a vertex shader to modify vertices, or use a pixel shader to animate it like the "sphere" and "wave" shaders do, you don't have to change your base vertex statement.
If you use an index buffer and vertex buffer, the GPU isn't halted every time to let the driver interrupt to send new vertex data; the GPU command bus instead loads it from its queue in local video memory.

JanWillem32
30th July 2011, 21:05
releaselog:
added the Mitchell-Netravali cubic shaders
added "detect even or odd coordinates, alternative"
updated the complete set of chroma interpolation shaders; all include color controls now
updated Y'CbCr conversion efficiency for all shaders that perform Y'CbCr to RGB conversions

Eliminateur
31st July 2011, 19:40
Feedback:
I've updated to the latest pack and am using the following shaders:
1-2-3 for floating-point surfaces, Mitchell-Netravali cubic5, and the r=6 sharpen complex, deband and mild denoise as pre-resize
gamma conversion of linear to wide gamut as post-resize
all compiled with PS 3.0
plus all the filters in CCC active (smooth video, dynamic range, color vibrance, flesh tone correction, edge enhance, denoise, mosquito denoise, deblocking, dynamic contrast), as I like how it outputs.

GPU usage peaks at ~86% with UVD clocks, but I have a feeling that it's "framing"; the dropped frames counter advances continuously :(
(and I have an R5800, which should have massive amounts of PS power available)
And another weird thing: why the card doesn't go to 3D clocks at such high usage is beyond me...

If I disable the r=6 shader, I get ~2 dropped frames.

Disabling almost all filters in CCC doesn't drop much, except in long pans; it looks like the UVD clock isn't enough. Curiously, the dynamic range filter makes it drop more frames...

JanWillem32
31st July 2011, 21:37
The filters in the video section of CCC are render items on the shader core, and just like any pixel shader, they cost some processing power when enabled.
Note that "Enforce Smooth Video Playback" isn't a video filter, but a video filter controller. When enabled, it disables some other filters once the GPU usage is near maximum capacity.

The AMD driver locks the video card to 2D clocks when DXVA is active, irrespective of the actual processing load. I use a custom profile to clock my video card a bit higher than 2D when starting MPC-HC with a 1080p or 1080i video while using DXVA. I'm fine with 2D clocks when I'm playing back 720p video, but that may be different in other cases.

The dropped frames counter will always advance a bit when menus are switched, options are enabled or disabled, while seeking, and so on. I typically get 0 dropped frames if I don't do those things.
If the dropped frames counter continues to increase during normal video playback, there's a real bottleneck.
With an HD5830 or better at reasonable clocks, your processing chain shouldn't be a problem at all. Peaks of about 86% are very reasonable.
I get peaks of up to 95% on my HD4890, usually without any frames dropped. I use an alpha version of my modified MPC-HC builds' EVR CP, and a heavier processing chain than yours (I've enabled color management).

Eliminateur
31st July 2011, 23:11
The r=6 sharpen: I enabled it in the pre-resize chain, as the last shader before resizing; is this correct?
(BTW, I tested changing a value to 9, as I didn't notice any change in the video at first, but it's definitely there now that I modify it!)
I should try some of the more "severe" values.

Also, I guess that by using the "r=6, sharpen complex, deband and mild denoise" shader I can safely disable all the denoise and edge enhancement controls in CCC... kind of redundant.

When I mentioned dropped frames, it was during regular playback. What are you using for custom clock profiles? (Do you switch by hand for 1080p, or is some autodetection in place?)
Weird that I'm getting drops at less than 90% usage...
Even if you use D3DEX output, the GPU won't raise to 3D clocks? Seeing as with D3DEX it's essentially a 3D game...

BTW, you should add a notice on the "sharpen complex, deband and mild denoise" shaders: they don't compile on anything but PS 3.0 (at least at r=6).

Why is PS 4.0 not available for compiling? Shouldn't it provide even better performance and efficiency?

Anyway, I'll continue testing!
Thanks for your work and prompt responses, and sorry for some of my questions; I can never get my head around colorspaces and conversions (and a lot of filters!), but I still find all these matters fascinating.

JanWillem32
1st August 2011, 07:50
The "sharpen complex, deband and denoise"-type shaders can be inserted after making the RGB linear, but before resizing. Your placement is correct.
I made those shaders mostly with quality debanding in mind; the values for sharpening are very mild. (I don't like to give images the typical sharpened edges.)
For custom video card clock profiles, I use ATI Tray Tools. I generated two profiles, one intermediate for 1080p, and one with full 3D clocks for 1080i. These have a shortcut on my desktop that starts MPC-HC. Once MPC-HC exits, the video card is also set back to idle mode.
When I use a software decoder, the video card is set by default to full 3D clocks. (I generally use the intermediate profile to start the player in such a case, as it saves a bit of fan noise.)

All "sharpen complex, deband and denoise"-type shaders have the minimum required pixel shader level noted in line 9.
PS levels of 4.0 or higher are indeed more efficient, but you can only use them in DirectX 10 or higher. The renderer is sadly just DirectX 9.

CiNcH
1st August 2011, 08:51
What's the advantage of separating a scaler shader into a horizontal and vertical pass?

JanWillem32
1st August 2011, 09:39
A standard bicubic kernel samples 4 pixels in each direction. Separate horizontal and vertical passes require sampling 4+4 pixels; doing it in one pass takes 16 pixels at once. For reference, see the "(inefficient one-pass)"-type shaders in the development folder.
Of course, there's some overhead in adding another pass to the chain, plus the need for a special intermediate surface (one that's only resized in one direction), but that's still less than sampling 8 extra pixels for every output pixel. The gain grows even larger with kernels that take 5 or more pixels per pass.
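The sample-count arithmetic above can be written out (the framing and function names are mine):

```cpp
// An n-tap separable kernel costs n*n texture samples per output pixel in a
// single pass, but only n + n when split into a horizontal and a vertical
// pass (function names are mine, for illustration only).
static unsigned taps_one_pass(unsigned n) { return n * n; }
static unsigned taps_two_pass(unsigned n) { return n + n; }
```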

Eliminateur
1st August 2011, 23:41
Are there any plans to make the renderer DX11? (DX10 is kind of useless at this point...)

JanWillem32
2nd August 2011, 00:11
I don't have any objections against it, but at the moment the mixer functions, VSync functions and the subtitle renderer take precedence.

Eliminateur
8th August 2011, 00:11
Jan,
i was watching a video and noticed that it had a LOT of banding and was very noticeable, so i switched to ffdshow softdecode(RGB32+dither, deband thresh. 1.2 radius 16), disabled all shaders other than the gamma output one and it looks like a very good gradient as it should
i tried switching betweeen light and mild denoise R=6 shaders to no avail, it's VERY banded with DXVA

here's the DXVA one: http://dl.dropbox.com/u/3493496/DXVA.png
soft one: http://dl.dropbox.com/u/3493496/soft.png

any idea what do i need to tweak or do to have THAT kind of output with DXVA?

JanWillem32
8th August 2011, 01:05
Disable dynamic brightness, or any other function that's pushing the brightness so far up in the video card's control panel. Consumer video is already very limited in near black values due to the gamma and quantization, and there's also lossy compression on top of that.
The sharpening filters in control panel can also do quite a bit of harm, and I've never seen the standard denoise functions do any sort of debanding at all.
Also, all filtering settings in the video card's control panel simply work as long as the decoder input to the EVR mixer isn't RGB. It's not an exclusive feature of DXVA.

For "sharpen complex, deband and denoise"-type shaders, you can raise the "NoiseLevel" value for a more aggressive deband and denoise. There's also "GammaCompensation", but lowering it can be harsh on low-light, poor quality video.
Line 21: // GammaCompensation, interval [1, 2], default 1.5, the gamma compensation factor to diminish denoising on darker pixels, a factor 1 will use the same grade of filtering on darker and lighter pixels, the current standards for consumer-grade video are lacking in dynamics for the lower brightness range, so a factor 1 will usually cause too much filtering on darker pixels

Note, as usual, any proper debanding requires better working surfaces than the default X8R8G8B8 in the renderer chain.

dukestravels07
8th August 2011, 23:23
Hi Jan. I was wondering if you could give me a basic idea of what shaders to use with my setup?
I've read through the thread and while I get the basics, some of it is totally confusing. I have a very basic HTPC setup. It consists of a visiontek ATI HD3650 agp card. I run mpc-hc with vmr9 renderless and dxva works fine.

I have a really crap old projector (SL2U) that has a native res of 800x600, so most of the films I have are avi with roughly DVD like quality...some are VHS quality.

I was wondering how I could get the best possible quality using your shaders?

I have no idea about "gamma conversion for linear RGB". Is this something I need to use?

Will yv12 chroma upsampling improve crap avi's?

Basically, if you can give a basic list of shaders and the order they should go in to improve my videos, I'd be grateful.

Thanks in advance.

JanWillem32
9th August 2011, 01:26
There's unfortunately no magic filter that will make a display look like it has a higher resolution. We can only filter the video.
Intermediate conversion to linear gamma is used to prevent darkening artifacts on every processing step. We generally do encoded video gamma -> linear gamma -> display gamma.
Chroma up-sampling shaders are a trick to get video input where chroma up-sampling was skipped to be up-sampled by one of the 4 methods. Your video card will skip chroma up-sampling when a surface mode other than X8R8G8B8 is used with the VMR-9 or EVR mixers. (10-bit out, HFPP, FFPP settings for MPC-HC)

My settings will be a bit heavy for your setup, but it's better to start with more and take things away later.
Feel free to relax Full Floating Point Processing to Half, and replace "r=6, sharpen complex, deband and denoise" with something less extreme. There are plenty of choices to make your own working chain. It's just a matter of taste, but in general, don't try to apply two similar filters in one chain. It's usually better to edit settings in one shader to make it lighter or heavier in an aspect. Other than that, experiment a lot. Even messing around with some of the joke shaders can be very interesting.

Renderer settings I use in MPC-HC's EVR CP:
D3D Full Screen Mode, 10-bit RGB Output, Full Floating Point Processing, Disable desktop composition (Aero)

Shader chain:
1. RGB to Y'CbCr for SD&HD video input for floating point surfaces
2. special 420 to 422 intermediate Catmull-Rom spline5 chroma up-sampling for SD&HD video input
3. special 422 Catmull-Rom spline5 chroma up-sampling for SD&HD video input (linear output enabled, sometimes I also set colorfulness gamma to lower a high colorfulness on some video sources)
4. r=6, sharpen complex, deband and medium denoise
Post-resize:
gamma conversion of linear RGB to wide gamut RGB for floating point surfaces
final pass: color management with an ICC profile installed system-wide and random ordered dithering

mindbomb
28th August 2011, 02:56
C:\Program Files (x86)\Media Player Classic - Home Cinema\memory(64,29): warning X3206: implicit truncation of vector type
C:\Program Files (x86)\Media Player Classic - Home Cinema\memory(64,45): warning X3206: implicit truncation of vector type
C:\Program Files (x86)\Media Player Classic - Home Cinema\memory(64,63): warning X3206: implicit truncation of vector type
C:\Program Files (x86)\Media Player Classic - Home Cinema\memory(85,12): error X3014: incorrect number of arguments to numeric-type constructor

i get these errors when creating the bilinear 4:2:0 chroma filter.
anything to worry about?

Eliminateur
28th August 2011, 03:11
mindbomb, check the pixel shader version you're compiling the shaders with in the shader editor; use the highest your hardware supports (usually ps_3.0 by now)

JanWillem32
28th August 2011, 10:43
I already noticed that the bilinear up-samplers were broken when I was integrating them in the latest MPC-HC tester build. I waited some time to get a nice set of fixes and new shaders ready.

Changelog:
corrected "bilinear chroma up-sampling and color controls for SD&HD video input"
improved performance for many Y'CbCr mode shaders

YCbCr-type sharpen complex test
I've tried something new. I've adapted r=4, r=5 and r=6 sharpen complex, deband and medium denoise to Y'CbCr-mode shaders. So far I've been very satisfied with its debanding capabilities and performance, so I'm asking if others would like to test these, too. For changing the debanding and denoising strength, just edit the value for "NoiseLevel".
For those that use the chroma up-sampling sets, the alternatives for the three 4:2:2 up-sampling shaders are included. Others can use one of the two "RGB to Y'CbCr for SD&HD video input"-type shaders to pre-process to Y'CbCr mode.
If anything needs correction, please tell me.

mindbomb
28th August 2011, 14:22
thanks, the bilinear chroma shader from 1.4 works perfectly

Qaq
28th August 2011, 14:40
Personally, I found that *correct* HQ chroma upsampling produces aggressive colors with my setup. Same thing with madVR, even for FullHD videos (most of them). I've disabled all the adjustments in the video driver, and my TV doesn't allow color control in PC mode, so I can only hope for perfect color processing in the video renderer. Trying to avoid that color madness, I found that without chroma shaders the picture looks much closer to natural. Need to try the NN chroma scaler in madVR too. :devil:

Eliminateur
28th August 2011, 22:35
are those in the development folder or the production folders?

I already noticed that the bilinear up-samplers were broken when I was integrating them in the latest MPC-HC tester build. I waited some time to get a nice set of fixes and new shaders ready. [...]

JanWillem32
29th August 2011, 07:18
@Qaq: That's odd. With the sharper resizers, like the Mitchell-Netravali cubic and Catmull-Rom spline implementation I've used, over-saturation on some chroma borders is possible (but very rare). With resizers that can only blur, like the cubic B-Spline and bilinear implementation I've used, that's not possible.
I wonder what your TV is doing to the signal. When the RGB 4:4:4 full range output of the renderer is left unprocessed by the display device (no digital processing, only digital-to-analog conversion for the panel), any form of chroma up-sampling in the renderer should give a better picture over nearest neighbor (especially if full picture resizing or aspect ratio correction is used by the renderer).
It's known that in some situations the renderer output is converted afterwards, lowering the output quality. Can you test the "draw grid coordinates" shader? It should draw single-pixel wide RGB lines horizontally and vertically when full picture resizing and aspect ratio correction are disabled. If there's something wrong with the processing after the renderer output, the lines will be imperfect.

@Eliminateur: The full directory listing of the normal download is in the OP, the "YCbCr-type sharpen complex test" is a separate download.

Qaq
29th August 2011, 10:07
Jan, thanks for the tip, of course I'll try to test my display with that "draw grid coordinates" shader.

burfadel
29th August 2011, 21:24
I've noticed with the sharpen, deband, denoise filters (I use r=6), the following:

// VideoGamma, interval [2., 3.], default 2.4, the video gamma input factor used to convert between the video input RGB and linear RGB during pre-processing
#define VideoGamma 2.4

Results in a very, very dark image. If I set the video gamma to 1, it's fine (original)...

JanWillem32
29th August 2011, 22:43
The 2.4 is perfectly fine as it is; you are looking for "LinearRGBOutput", which has a default value of 1. You can also reverse the gamma setting just before dithering (insert it as a post-resize shader, as the last item), so the resizer and other shaders in between can work on linear RGB.

burfadel
29th August 2011, 23:59
hmmm, I don't know why, but I either have to set the video gamma to 2.4 or set '#define LinearRGBOutput' to 0, regardless of how I set it!

It's only with shader pack v1.4

JanWillem32
30th August 2011, 07:28
Only the version in the "YCbCr-type sharpen complex test" changes the gamma to linear RGB on output by default. The regular one already expects linear RGB input (it only uses the gamma value for brightness estimation).
What setup do you use? I might need to edit some parts of the code if it's really broken.

Qaq
30th August 2011, 18:20
Can you test the "draw grid coordinates" shader? It should draw single-pixel wide RGB lines horizontally and vertically when full picture resizing and aspect ratio correction are disabled. If there's something wrong with the processing after the renderer output, the lines will be imperfect.
Tried the shader with 1080 video. The picture looks exactly the same in MPC-HC and in an image viewer.
http://lostpic.net/thumbs/5a4f2f5483a7601144e257d3565c1ba0.png (http://lostpic.net/?view=5a4f2f5483a7601144e257d3565c1ba0)
Can't say I understand what to call perfect and imperfect here, but the lines seem straight at least.
BTW, I found that the picture looks completely different if I switch the HDMI input between YCC and RGB. Seems like this shader is a good test for the HDMI chroma sub-sampling bug too.

JanWillem32
31st August 2011, 07:53
The "draw grid coordinates" shader is exactly for testing chroma sub-sampling errors. The other function is a geometry checkup. If the lines are not projected as single-pixel wide on screen, because of errors with overscan/underscan, resolution or geometry settings, the lines will be blended. Failure to map pixels 1:1 to the screen is a common problem that causes visible errors.

JanWillem32
14th September 2011, 09:27
I've just added version 2 of the YCbCr-type sharpen complex test.
These shaders are a bit better at banding detection, especially for chroma. It was somewhat difficult to configure a reasonable sharpness-to-debanding ratio for them, though.
For those that use the chroma up-sampling sets, the alternatives for the three 4:2:2 up-sampling shaders are included. Others can use one of the two "RGB to Y'CbCr for SD&HD video input"-type shaders to pre-process to Y'CbCr mode.
If anything could use improvement, please tell me.

TheElix
14th September 2011, 09:48
Please, could you tell me how this YCbCr-type version differs from the usual one, when one should prefer it over the usual version, and what benefits it gives?

JanWillem32
14th September 2011, 11:02
The regular ones work on linear RGB. That's near-optimal for input with a reasonably linear response, a large gamut, a well-defined color interval at [0, 1] and no chroma sub-sampling issues. Unfortunately, consumer-grade video fails on all four accounts.
It's usually chroma sub-sampled. I've seen reasonably proper blu-ray masters that greatly overstep their nominal range (Y'CbCr, interval [16, 235], [16, 240], [16, 240]). The source gamut of HD video is limited to just sRGB, the gamut of SD video is even less than that. The Cb and Cr channels are reasonably linear (these are usually overpowered in presence by the luma channel in the matrix), but the Y' channel is a very bad approximation to linear lightness.
Adapting the filter to work on the flaws of the source video in Y'CbCr colors is a good idea, but designing the shader was hard. The first few versions I made looked terrible; they couldn't keep a good distinction in the areas to deband.
I'm actually pretty satisfied with the filter as it is now, but perhaps the main function could use some extra customization user settings apart from the regular "NoiseLevel".

TheElix
14th September 2011, 14:56
On what content do you recommend this filter to be used? I want to make some screenshot comparisons.

JanWillem32
14th September 2011, 15:11
Any standard Y'CbCr video will do fine. Try a variety, while adjusting "NoiseLevel" to a suitable value for each. Of course, the values for sharpening are by user preference. The current version is very sensitive to the sharpening parameters. A little too low blurs too much, a little too high and the typical sharpening artifacts become very visible.

TheElix
14th September 2011, 15:18
Isn't all DVD/BD content Y'CbCr? As well as TV broadcasts... Also, adjusting parameters in a shader for each video is a little extreme. But for the sake of experiment...

JanWillem32
14th September 2011, 15:36
Pretty much all consumer-grade digital video uses Y'CbCr. I wouldn't use this kind of filter on superior formats, like when loading a standard BMP through the still image filter, for instance.
I've always supplied 5 versions of the same shader, each with only the "NoiseLevel" parameter changed (values .625, .75, 1, 1.5 and 2.5).

TheElix
14th September 2011, 21:16
Regular chain: http://img7.imageshack.us/img7/6919/regularshaders.png
With YCbCr-type 3. special 422 cubic B-spline5 chroma up-sampling: http://img190.imageshack.us/img190/6629/ycbcrtypeshaders.png

Qaq
14th September 2011, 22:49
Yeah, I remember it was red when I did something wrong. TheElix, make sure you haven't missed anything from the shader code.

TheElix
14th September 2011, 23:53
Nothing is missed with ctrl+A, ctrl+V.

JanWillem32
15th September 2011, 05:17
The regular type requires pre-processing to linear RGB by a "gamma conversion of video RGB to linear RGB"-type shader or integrated gamma function. The regular type will not change that gamma setting, so correction of gamma just before display output is required, too.
The Y'CbCr-type shader requires Y'CbCr pre-processing by a "RGB to Y'CbCr for SD&HD video input"-type shader (used on non-linear input). For those that already use multi-pass chroma interpolation, it would be rather silly to do that conversion twice, so I added alternatives to step 3, and moved the set of color controls. The Y'CbCr-type shader outputs linear RGB by default, but can be reconfigured easily.

CeeJay.dk
30th November 2011, 00:16
I posted my LumaSharpen shader (http://forum.doom9.org/showthread.php?p=1541796#post1541796) in another thread earlier and today I found this thread.

My shader is basically a Y' version of Sharpen Complex, but optimized for speed (it still achieves better quality than the original), since it's primarily meant to be used for gaming.
I'm using an injector that allows me to do post-processing of the screenbuffer of any Direct3d game and the sharpen shader helps with the many upscaled textures you find in a game.

It's my first shader ever and the first real bit of programming I've done since school 16 years ago, but I feel I've already accomplished a lot in the few days I've been working on it.

Let me know what you think.

JanWillem32
5th December 2011, 01:26
That's quite interesting. I'll take a good look when I have more time. A quick glance at the code shows sqrt over mul. That's usually an improper way to deal with vectors. Try using abs on the delta values; next, you can try using the more regular dot or length intrinsics. Be careful with the pixel radius. The pixel shader sampler can be set to bilinear (or a few other functions on top of that), or in the case of MPC-HC, nearest neighbor. When sampling pixels, only use whole pixel sizes as distances. At the end, there's a function that uses pow with a floating-point exponent. That function will output undefined values if negative values are input. See a few of my shaders for reference on processing such values.
Lastly, I've made a similar function: "sharpen\unsharp luma mask for SD&HD video". (Although I've never really understood the black compensation, and I don't care much for versions without deband.) The shaders for the YCbCr-type sharpen complex test are a lot more advanced (and heavy), but maybe you can get some ideas from those for some of your own code. Note that the parameters are a bit extreme in those; I'm using a somewhat edited version of "r=6" myself... Might as well post that:
// (C) 2011 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

#define NoiseLevel .75
#define Blur NoiseLevel/64.
#define EdgeSharpen 1.75*Blur
#define Sharpen0 .0625
#define Sharpen1 .140625
#define Sharpen2 .31640625
#define Sharpen3 .2109375
#define Sharpen4 .140625
#define Sharpen5 .09375
#define VideoGamma 2.4
sampler s0;
float2 c1 : register(c1);
#define sp(a, b) tex2D(s0, tex+c1*float2(a, b))
#define H0 Sharpen0*(1.125-dv3)
#define H1 Sharpen1*(1.125-dv3)
#define H2 Sharpen2*(1.125-dv3)
#define H3 Sharpen3*(1.125-dv3)
#define H4 Sharpen4*(1.125-dv3)
#define H5 Sharpen5*(1.125-dv3)
#define qp(a, b) ((dv = max(max((dv3 = abs(a)).x, dv3.y), dv3.z)) > b)?
#define D0d(a, b) qp((s1+b)/2.-a, ES) s1*(H0+1.)-a*H0
#define D0o(a, b) qp((s1+b)/3.-a, ES) s1*(H0+1.)-a*H0
#define D1d(a, b) (dv > BN)? (t1+a)/1.125*(H1+1.)-b*H1
#define D1o(a, b) (dv > BN)? (t1+a)/1.125*(H1+1.)-b/2.*H1
#define D2d(a, b, c) qp((a+c)/3.-b, BN) (t1+a+b)/2.125*(H2+1.)-c/2.*H2
#define D2o(a, b, c) qp((a+c)/3.-b/2., BN) (t1+a+b)/3.125*(H2+1.)-c/2.*H2
#define D3d(a, b, c, d) qp((b+d)/5.-c/2., BN) (t1+a+b+c)/4.125*(H3+1.)-d/4.*H3
#define D3o(a, b, c, d) qp((b+d)/6.-c/2., BN) (t1+a+b+c)/5.125*(H3+1.)-d/4.*H3
#define D4d(a, b, c, d) qp((b+d)/6.-c/4., BN) (t1+a+b+c)/8.125*(H4+1.)-d/4.*H4
#define D4o(a, b, c, d) qp((b+d)/5.-c/4., BN) (t1+a+b+c)/9.125*(H4+1.)-d/3.*H4
#define D5d(a, b, c, d) qp((b+d)/8.-c/4., BN) (t1+a+b+c)/12.125*(H5+1.)-d/4.*H5
#define D5o(a, b, c, d) qp((b+d)/8.-c/3., BN) (t1+a+b+c)/12.125*(H5+1.)-d/4.*H5
#define D6(a) (t1+a)/16.125
#define Dd(a, b, c, d, e, f) (D0d(a, b) : D1d(a, b) : D2d(a, b, c) : D3d(a, b, c, d) : D4d(a+b, c, d, e) : D5d(a+b+c, d, e, f) : D6(a+b+c+d+e+f))
#define Do(a, b, c, d, e, f) (D0o(a, b) : D1o(a, b) : D2o(a, b, c) : D3o(a, b, c, d) : D4o(a+b, c, d, e) : D5o(a+b+c, d, e, f) : D6(a+b+c+d+e+f))
float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 s1 = sp(0, 0).rgb;
float3 s2 = sp(-1, -1).rgb;
float3 s3 = sp(0, -1).rgb;
float3 s4 = sp(1, -1).rgb;
float3 s5 = sp(-1, 0).rgb;
float3 s6 = sp(1, 0).rgb;
float3 s7 = sp(-1, 1).rgb;
float3 s8 = sp(0, 1).rgb;
float3 s9 = sp(1, 1).rgb;
float3 r2 = sp(-2, -1).rgb;
float3 r3 = (sp(-1, -2)+sp(0, -2)).rgb;
float3 r4 = sp(1, -2).rgb;
float3 r5 = (sp(-2, 1)+sp(-2, 0)).rgb;
float3 r6 = (sp(2, 0)+sp(2, -1)).rgb;
float3 r7 = sp(-1, 2).rgb;
float3 r8 = (sp(0, 2)+sp(1, 2)).rgb;
float3 r9 = sp(2, 1).rgb;
float3 q2 = (sp(-2, -2)+sp(-3, -1)).rgb;
float3 q3 = (sp(-1, -3)+sp(0, -3)).rgb;
float3 q4 = (sp(2, -2)+sp(1, -3)).rgb;
float3 q5 = (sp(-3, 1)+sp(-3, 0)).rgb;
float3 q6 = (sp(3, 1)+sp(3, 0)).rgb;
float3 q7 = (sp(-1, 3)+sp(-2, 2)).rgb;
float3 q8 = (sp(0, 3)+sp(1, 3)).rgb;
float3 q9 = (sp(2, 2)+sp(3, -1)).rgb;
float3 p2 = (sp(-4, -1)+sp(-4, -2)+sp(-3, -2)+sp(-3, -3)).rgb;
float3 p3 = (sp(-2, -3)+sp(-2, -4)+sp(-1, -4)+sp(0, -4)).rgb;
float3 p4 = (sp(2, -3)+sp(3, -3)+sp(1, -4)+sp(2, -4)).rgb;
float3 p5 = (sp(-4, 2)+sp(-3, 2)+sp(-4, 1)+sp(-4, 0)).rgb;
float3 p6 = (sp(4, 0)+sp(4, -1)+sp(3, -2)+sp(4, -2)).rgb;
float3 p7 = (sp(-2, 4)+sp(-1, 4)+sp(-3, 3)+sp(-2, 3)).rgb;
float3 p8 = (sp(0, 4)+sp(1, 4)+sp(2, 4)+sp(2, 3)).rgb;
float3 p9 = (sp(3, 3)+sp(3, 2)+sp(4, 2)+sp(4, 1)).rgb;
float3 o2 = (sp(-5, -1)+sp(-5, -2)+sp(-4, -3)+sp(-3, -4)).rgb;
float3 o3 = (sp(-2, -5)+sp(-1, -5)+sp(0, -5)).rgb;
float3 o4 = (sp(4, -3)+sp(3, -4)+sp(1, -5)+sp(2, -5)).rgb;
float3 o5 = (sp(-5, 2)+sp(-5, 1)+sp(-5, 0)).rgb;
float3 o6 = (sp(5, 0)+sp(5, -1)+sp(5, -2)).rgb;
float3 o7 = (sp(-2, 5)+sp(-1, 5)+sp(-3, 4)+sp(-4, 3)).rgb;
float3 o8 = (sp(0, 5)+sp(1, 5)+sp(2, 5)).rgb;
float3 o9 = (sp(3, 4)+sp(4, 3)+sp(5, 2)+sp(5, 1)).rgb;
float3 n2 = (sp(-4, -4)+sp(-5, -3)+sp(-6, -2)+sp(-6, -1)).rgb;
float3 n3 = (sp(0, -6)+sp(-1, -6)+sp(-2, -6)+sp(-3, -5)).rgb;
float3 n4 = (sp(4, -4)+sp(3, -5)+sp(2, -6)+sp(1, -6)).rgb;
float3 n5 = (sp(-5, 3)+sp(-6, 2)+sp(-6, 1)+sp(-6, 0)).rgb;
float3 n6 = (sp(6, 0)+sp(6, -1)+sp(6, -2)+sp(5, -3)).rgb;
float3 n7 = (sp(-1, 6)+sp(-2, 6)+sp(-3, 5)+sp(-4, 4)).rgb;
float3 n8 = (sp(3, 5)+sp(2, 6)+sp(1, 6)+sp(0, 6)).rgb;
float3 n9 = (sp(6, 1)+sp(6, 2)+sp(5, 3)+sp(4, 4)).rgb;

float dv;
float3 dv3;
float BN = Blur;
float ES = EdgeSharpen;
float3 t1 = s1/8.;
float3 t0 = ((Dd(s2, r2, q2, p2, o2, n2)+Do(s3, r3, q3, p3, o3, n3)+Dd(s4, r4, q4, p4, o4, n4)+Do(s5, r5, q5, p5, o5, n5)+Do(s6, r6, q6, p6, o6, n6)+Dd(s7, r7, q7, p7, o7, n7)+Do(s8, r8, q8, p8, o8, n8)+Dd(s9, r9, q9, p9, o9, n9))/8.);
t0 = float3(t0.x+1.5748*t0.z, dot(t0, float3(1, -.1674679/.894, -.4185031/.894)), t0.x+1.8556*t0.y);// HD Y'CbCr to RGB
//t0 = float3(t0.x+1.402*t0.z, dot(t0, float3(1, -.202008/.587, -.419198/.587)), t0.x+1.772*t0.y);// SD Y'CbCr to RGB
float3 sbl = sign(t0);
t0 = sbl*pow(abs(t0), VideoGamma);
return t0.rgbb;
}

CruNcher
5th December 2011, 03:23
Sounds cool, CeeJay, thx. I wonder if it is comparable to Mirillis' implementation in performance/quality. They use their own Direct3D renderer + shader, and in terms of GPU resources it's damn efficient; it looks comparable to Sharpen Complex 2, though it seems to work differently :)
They call it "Detail Boost" (http://mirillis.com/en/products/picture2.html). It deblurs very heavily quantized material efficiently with low GPU resources; tests were based on a Sandy Bridge HD2000 (GT1) with 6 EUs, up to 1080p.

CeeJay.dk
5th December 2011, 18:55
A quick glance at the code shows sqrt over mul. That's usually an improper way to deal with vectors. Try using abs on the delta values, next you can try using the more regular dot or length intrinsics.

if (sqrt( mul(delta1,delta1) + mul(delta2,delta2)) > SharpenEdge) //?? Verify that the mul and sqrt aren't just there to get a positive value.

That bit of code is part of the edge detection from Sharpen Complex 2.
I agree .. multiplying a variable with itself and then taking the square root is a silly roundabout way to get a positive number when you can just use abs().

I don't use that part anymore. Mainly because the edge detection was detecting large differences in contrast and applying less sharpening to those areas (well, at least that's how I used it... I prefer edges smooth, not sharp) if the difference was above a certain threshold. But this caused some pixels that were sometimes over the threshold and sometimes under it to flicker, which was very distracting to my eyes, so I turned it off.

I now just clamp the sharpening effect to a set maximum; this prevents edges from getting over-sharpened, and it's much simpler and doesn't flicker.
There may be a better way of keeping the sharpen effect from creating harsh artifacts and haloing, but for now I'm satisfied with just clamping the effect.

I may use the edge detection later if I can improve it or just throw it out, but for now I've left it in the code, but disabled it.


When sampling pixels, only use whole pixel sizes as distances.

But if I did that, then I couldn't exploit the hardware filtering trick that I use to greatly increase the speed over the original sharpen complex code.

Taking advantage of that is the best part of my shader.
When I sample on the edges of a pixel I get an average of the four pixels surrounding it. If I move the sampling point slightly I can adjust the weights of the pixels sampled.
Doing it this way even gets rid of the instructions that calculated the weights before .. those calculations are now free.

See http://prideout.net/archive/bloom/#Sneaky and http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/ for more detailed explanations of what I'm doing.

I'm currently also considering more exotic sampling patterns as well as using mipmap samples, to see if I can get a very large gaussian blur using very few samples.

For now I'm satisfied with using just 5 samples to get a 9-tap equivalent, but if I later wanted to try my hand at a local contrast enhancement (http://www.cambridgeincolour.com/tutorials/local-contrast-enhancement.htm) shader I would need a much larger blur.
You're using a huge number of samples; local contrast enhancement might be something you would want to try.

I use my shaders for games though, and I need them to be very efficient, so I'm holding off on that until I find a fast way to do very large blurs.
Maybe box blur or one of the blurs from http://incubator.quasimondo.com/ - Stack blur, Superfast blur or Son of Gauss.


At the end, there's a function that uses pow with a floating-point exponent. That function will output undefined values if negative values are input. See a few of my shaders for reference on processing such values

//done = float4(pow(done, 1.0 / 2.2 )); Convert a sRGB colorspace to non-linear gamma 2.2 - Turned off because of precision errors

I was experimenting with doing the calculations in linear colorspace to see if that would improve the quality any. It produced errors instead. It should not use any negative values as input, but I may have overlooked something, and you're right.
I'll take a look at your code to see how you do it. Reading your code is not that easy, though, as you don't use comments much.

CeeJay.dk
6th December 2011, 23:04
Sounds cool, CeeJay, thx. I wonder if it is comparable to Mirillis' implementation in performance/quality. They use their own Direct3D renderer + shader, and in terms of GPU resources it's damn efficient; it looks comparable to Sharpen Complex 2, though it seems to work differently :)
They call it "Detail Boost" (http://mirillis.com/en/products/picture2.html). It deblurs very heavily quantized material efficiently with low GPU resources; tests were based on a Sandy Bridge HD2000 (GT1) with 6 EUs, up to 1080p.

I haven't seen Mirillis Splash before, but it does look good. I'd love to know what technique they use.

I don't know how LumaSharpen compares to Detail Boost, but properly tweaked it's slightly better than Sharpen Complex 2 quality-wise and much faster.

Between LumaSharpen, Sharpen Complex and Jan's Sharpen/Denoise/Deband shaders, Jan's probably has the best quality but also takes a lot of GPU time.

jokerb47
21st December 2011, 20:55
How to use these PS for image viewing?

Qaq
22nd December 2011, 06:51
Read first post:
To make a pixel shader work in MPC-HC:....

CeeJay.dk
24th December 2011, 12:09
I've noticed my shader in MPC seems to stretch the image from the center by about a pixel's length. But it only does this in MPC, not when used in combination with the DX9 shader injector I'm developing it for.
Why is that?

I suspect I may need to change something in MPC's render settings, or maybe force it to use bilinear filtering, but I'm not sure.

JanWillem32
28th December 2011, 01:58
@CeeJay.dk: Sorry for the delay.
I think it's a good thing you removed that part of the code you mentioned.
The original "Sharpen Complex 2" does indeed produce aliasing artifacts (along with banding and non-uniform filtering in its input to output color spectrum). With a branch in the code path implemented like that it's pretty inevitable.
I'm also familiar with the difficulty of getting good filter parameters for a proper, uniform response across various scenes and varying video quality, of course.

There's no speed increase from sampling on 4-pixel borders with a bilinear sampler state over sampling 9 pixels with a nearest neighbor (point) sampler state (for this case). Shaders are interpreted by the display driver before being transported to the GPU's command queue and execution caches. Resolving a bilinear sampler state is done on the shadercore, the same processor chains that process the shaders themselves.
As usual, this is pretty hard to measure. Shadercores are superscalar engines, meaning that the pipelines can process multiple operations simultaneously, as long as those are in independent parts of the pipeline (doing different types of operations).
An example of that can be seen in the disassembly of a pixel shader with mixed code and multiple pixels being sampled. Under shader model 3.0 compiling (DirectX 9.0c) and onwards, sampling commands will be spread out over the length of the code. That's flow control (there are even some compiler flags to influence it). Sampling pixels is slow, and during sampling to a register, simple arithmetic can be done on independent registers (such as values from previous sample instructions). This means that applying a simple filter, whether it comes from bilinear averaging in a sampler state or from simple self-written code, is not likely to have any impact on the global speed of rendering at all.
(Note 1: Enabling flow control generally increases the total instruction count a bit but executes faster due to better pipeline utilization.
Note 2: DirectX 10 and onwards supports the intrinsics Sample() and Load(). Load() ignores sampler states and is better optimized by the compiler and driver because of that.)
In the pages you posted, both code for an OpenGL target (DirectX 9 and 10+ handle things quite a bit different). There's also no disassembly or data on globally active sampler states posted with any of the functions unfortunately. A difference in code target vectorization (four floats in a register usage for arithmetic functions, instead of fewer, or even only one float at a time) and serialization (flow control, mixed instructions using independent registers will take less time on average than instructions that repeat or executes while re-using the same registers multiple times in succession) can point out important performance issues.
(Note 3: SSE code on the CPU behaves pretty much the same way. The registers on the GPU's shadercore and the CPU's SSE core even support the same data configurations (with two rare exceptions). Just don't use any integer arithmetic on the GPU if it's not really necessary; the performance is very bad.
Note 4: Next to instruction vectorization and serialization there's parallelization, achieved through rendering many 4×4 output pixel chunks in separate threads on the many execution units of the shadercore.)

Using mipmaps as a source for square-blurred samples is an idea I looked into earlier as well. I render from and to usual render-target textures. When I want to use the mip levels of such a texture, I'll have to order the device to generate it from the top-level surface that was just rendered. That's because the mip levels are not written during rendering at all.
I haven't had much success with this kind of mipmap usage at all. The blur is low-resolution and purely square which made it look aliased. If you've found a better-looking method, please share it. I'm still looking at integrating efficient debanding, sharpening and other basic functions for the internal video renderers.

Local contrast enhancement is indeed very interesting, so thank you for the link. This method solves some of the problems with a single box sampling area to unsharp mask (as noted in the text).
The parts "Complications" and "Further reading" are informative as well, although "Complications" could be spiced up a bit more in a technical sense. The first part describes the very common distortions found when rendering in a non-uniform colorspace. The suggestions for a solution are insightful, but don't go all the way. The suggested LAB colorspace is designed to scale visually linearly in its internal luminance and chrominance dimensions. It's not mathematically linear, nor linear to lightness. In most of its encoded forms it also can't cover the entire range of visible colors. For professional purposes the uniform (and directly related) color spaces XYZ and xyY are often used for rendering. (I'm also thinking it would be a good format for the internal renderer of MPC-HC to use, although it can't be used with the 8- or 10-bit buffer formats.)
The second issue is the one I saw first in the sample pictures. Damage to the dynamic range is really common with image filters. If you butcher a picture with those badly enough, you can't even do a successful 'repair' filter pass afterwards (even with the most ideal precision for intermediate picture buffers). As I understand it, the pictures are dramatized to show a stronger effect of the filter than is ever likely to be used.
I think this function could benefit from a convolution function: a gradual function to transform measured area sharpness to a factor used to directly blur or sharpen by that amount (possibly evaluating it for multiple directions from the base pixel). I'll take a look if I can write a prototype pixel shader based on this.

About the pow() intrinsic function; the compiler will warn that it will not work properly on negative numbers. I used to have problems with it too, before I found out you can simply carry over the sign bits with the sign() intrinsic. After that, it behaved exactly as expected.
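The sign() carry-over can be sketched as a small helper (the name and usage are illustrative, not code from the pack):

```hlsl
// pow() is only defined for non-negative bases, so split off the sign,
// raise the magnitude, and multiply the sign back in afterwards.
float signed_pow(float x, float e)
{
	return sign(x) * pow(abs(x), e); // sign(0) == 0, so 0 stays 0
}
```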

About writing comments in the code, take a look at the links in the opening post of this thread or the code packages in my folder. As far as I know, I make plenty of comments in my code.


About quality of the various sharpening filters; I think a very good balance in efficiency and quality could be achieved using an integrated multi-pass solution coded natively in the renderer, instead of in a single pixel shader. As I prefer to measure the sharpness relative to the current pixel in multiple directions and process blurring or sharpening per direction as well, generating a look-up map or similar optimization like that is difficult. I'd love to try some new methods that are a bit lighter than my current set. I know that that takes a flash of inspiration, and then a lot of patience to actually see any decent results.

Finally, to answer your latest post;
I've specifically hard-coded the renderer to only enable a bilinear sampler state on the filter passes for the color management's look-up table, automatically for resizing of subtitles and by (one) user setting for the main video resizing passes. (In the version of the renderer I edited, I also removed it for the main video resizing pass.)
I don't exactly remember how the renderer in the trunk build managed vertices at all. When I dumped the renderer core and imported a new one, I made the vertex caches indexed. After that, I never had any problems with it anymore. (The version in the trunk build can't even manage two-pass resizing.) AFAIK the vertices I coded have no displacement problems, but I can't guarantee that for the renderer in the trunk build.


I hope I didn't bore anyone with my recent walls of text on this forum... I'm just trying to help people a bit after being away for a while.:)

CeeJay.dk
31st December 2011, 13:37
There's no speed increase by filtering sampling on 4 pixel borders using a bilinear sampler state over filtering sampling on 9 pixels using a nearest neighbor (point) sampler state (for this case).
...
As usual, this is pretty hard to measure.


AMD's GPU ShaderAnalyzer, in-game FPS meters, and MSI Afterburner GPU-load measurements all tell me that using 4 texture fetches is faster than using 9.


Sampling pixels is slow, and during sampling to a register, simple arithmetic can be done on independent registers (such as values from previous sample instructions). This means that applying a simple filter, whether it's from a bilinear averaging from a sampler state or simple self-written code, it's not likely to have any impact on the global speed of rendering at all.


If you are saying that it often doesn't matter to optimize the number of ALU instructions, because the bottleneck when sampling several pixels is going to be the texture fetches, then I already know.
GPU ShaderAnalyzer reports that texture fetches are the bottleneck on 11 of the 17 AMD cards it displays statistics about.

Just don't use any integer arithmetic on the GPU if it's not really necessary, the performance is very bad.

I know .. I won't.

I hope I didn't bore anyone with my recent walls of text on this forum... I'm just trying to help people a bit after being away for a while.:)

I'm not bored .. in fact I have more to comment on, I just don't have time right now .. it's New Year's.

Happy New Year!

P.S.

I thought you might like to see how I'm progressing .. this is the latest iteration of LumaSharpen (there are still lots more to be done .. I have so many ideas - I'll share some later if I get the time.)

/*
____________________

LumaSharpen 1.1.2
____________________

by Christian Cann Schuldt Jensen ~ CeeJay.dk

Based on Sharpen Complex 2 from Media Player Classic
(I have rewritten most of the code by now though)

It blurs the original pixel with the surrounding pixels and then subtracts this blur to sharpen the image.
It does this in luma to avoid color artifacts and allows limiting the maximum sharpening to avoid or lessen halo artifacts.

This is similar to converting an image to LAB colorspace and using Unsharp Mask on the Lightness channel in Photoshop.

Compiles with both PS 2.0 and 3.0 (Faster with 3.0)
*/

// .----------------------------------------------------._User settings_.---------------------------------------------------.

// -- Sharpening --
#define offset_bias 1.0 // I suggest a value between 0.0 and 2.0 - default is 1.0
#define sharp_strength 0.5 // Strength of the sharpening - You should probably use something between 0.2 and 2.0 - default is 0.5
#define sharp_clamp 0.015 // Limits maximum amount of sharpening a pixel receives - Default is 0.015

#define pattern 2 // Choose a sample pattern ( 1, 2 or 3 )

// .--------------------------------------------------._Defining constants_.------------------------------------------------.

/* For use with SMAA injector.
#define s0 colorTexG
#define px BUFFER_RCP_WIDTH
#define py BUFFER_RCP_HEIGHT
*/


// For use with Shaderanalyzer and MPC-HC
sampler s0 : register(s0);
float4 p0 : register(c0);
float4 p1 : register(c1);

#define width (p0[0])
#define height (p0[1])

#define px (p1[0])
#define py (p1[1])

//#define dx (offset_bias*px)
//#define dy (offset_bias*py)



#define CoefLuma float4(0.2126, 0.7152, 0.0722, 0) // BT.709 & sRGB luma coefficient (Monitors and HD Television)
//#define CoefLuma float4(0.299, 0.587, 0.114, 0) // BT.601 luma coefficient (SD Television)
//#define CoefLuma float4(0.3333, 0.3334, 0.3333, 0) // Equal weight coefficient

#define sharp_strength_luma (CoefLuma * sharp_strength)

// .------------------------------------------------------._Main code_.-----------------------------------------------------.

//float4 SharpenPass( float2 tex )
float4 main( float2 tex : TEXCOORD0 ) : COLOR // Use with Shaderanalyzer and MPC-HC
{

// -- Get the original pixel --
float4 ori = tex2D(s0, tex); // ori = original pixel

// [ NW, , NE ] Each texture lookup (except ori)
// [ ,ori, ] samples 4 pixels
// [ SW, , SE ]

// -- Pattern 1 -- A 7 tap gaussian using 2+1 texture fetches.
#if pattern == 1

// -- Gaussian filter --
// [ 2/9, 4/9, ] [ 1 , 2 , ]
// [ 4/9, 8/9, 4/9] = [ 2 , 4 , 2 ]
// [ , 2/9, 2/9] [ , 2 , 1 ]

float4 blur_ori = tex2D(s0, tex + float2(-px,py) / 3 * offset_bias); // North West
blur_ori += tex2D(s0, tex + float2(px,-py) / 3 * offset_bias); // South East

//blur_ori += tex2D(s0, tex + float2(px,py) / 3 * offset_bias); // North East
//blur_ori += tex2D(s0, tex + float2(-px,-py) / 3 * offset_bias); // South West

blur_ori /= 2; //Divide by the number of texture fetches

#endif

// -- Pattern 2 -- A 9 tap gaussian using 4+1 texture fetches.
#if pattern == 2

// -- Gaussian filter --
// [ .25, .50, .25] [ 1 , 2 , 0 ]
// [ .50, 1, .50] = [ 2 , 4 , 2 ]
// [ .25, .50, .25] [ 0 , 2 , 1 ]

float4 blur_ori = tex2D(s0, tex + float2(-px,py) * 0.5 * offset_bias); // North West
blur_ori += tex2D(s0, tex + float2(px,-py) * 0.5 * offset_bias); // South East
blur_ori += tex2D(s0, tex + float2(px,py) * 0.5 * offset_bias); // North East
blur_ori += tex2D(s0, tex + float2(-px,-py) * 0.5 * offset_bias); // South West

blur_ori /= 4; //Divide by the number of texture fetches

#endif

// -- Pattern 3 -- An experimental 17 tap gaussian using 4+1 texture fetches.
#if pattern == 3

// -- Gaussian filter --
// [ , 4 , 6 , , ]
// [ ,16 ,24 ,16 , 4 ]
// [ 6 ,24 , ,24 , 6 ]
// [ 4 ,16 ,24 ,16 , ]
// [ , , 6 , 4 , ]

float4 blur_ori = tex2D(s0, tex + float2(-0.4*px,1.2*py) * offset_bias); // North North West
blur_ori += tex2D(s0, tex + float2(0.4*px,-1.2*py)* offset_bias); // South South East
blur_ori += tex2D(s0, tex + float2(1.2*px,0.4*py) * offset_bias); // East North East
blur_ori += tex2D(s0, tex + float2(-1.2*px,-0.4*py) * offset_bias); // West South West
blur_ori += ori; // Probably not needed. Only serves to lessen the effect.
blur_ori /= 5; //Divide by the number of texture fetches
#endif

// -- Calculate the sharpening --
float4 sharp = ori - blur_ori; //Subtracting the blurred image from the original image

// -- Adjust strength of the sharpening --
sharp = dot(sharp, sharp_strength_luma); //Calculate the luma and adjust the strength

// -- Clamping the maximum amount of sharpening to prevent halo artifacts --
sharp = clamp(sharp, -sharp_clamp, sharp_clamp); //TODO Try a curve function instead of a clamp

// -- Combining the values to get the final sharpened pixel --
float4 done = ori + sharp; // Add the sharpening to the original.

// .------------------------------------------------._Debugging and tweaking.-----------------------------------------------.

//For tweaking and debugging purposes you can show the sharpen effect or chroma.
//float4 done = (sharp*4) + float4(0.5,0.5,0.5,0); // Uncomment to visualize the strength of the sharpen (multiplied by 4 to see it better)

//done = ori.a; // Visualize the alpha
//done = 1.0 - ori.a; // Visualize the inverted alpha

// .-------------------------------------------------._Returning the output_.-----------------------------------------------.

return done;
}

It basically does the same thing, but with fewer instructions and the code looks much cleaner and is easier to read. There is also a new experimental 17-tap gaussian or at least it is a gaussian if my math is correct.

Also, I corrected a bug in the fast version (pattern 1) where previously it moved the samples 0.5 texels from center (the fixed version moves them 1/3), which worked but resulted in something more like a high-pass sharpen than a gaussian.
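The 1/3-offset correction can be reasoned through with the bilinear interpolation identity. This is a sketch under the assumption that a bilinear sampler state is active; the function name is made up for illustration:

```hlsl
// With bilinear filtering, a fetch at a fractional offset d (in texels) from
// texel C toward its neighbour N returns lerp(C, N, d) = (1-d)*C + d*N.
//   d = 0.5 -> (C + N) / 2    : equal box weights (the old, high-pass-like tap)
//   d = 1/3 -> (2*C + N) / 3  : 2:1 weights, matching a 1-2-1 gaussian row
float4 gauss_tap(sampler s, float2 uv, float2 texel_offset)
{
	return tex2D(s, uv + texel_offset / 3.0); // hardware blends 2/3 near, 1/3 far
}
```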

JanWillem32
2nd February 2012, 17:52
Sorry I'm responding this late; this thread had already moved to page 2 before I noticed a response. I looked up the sampling parameters for D3D and DXGI. It appears that the Sample intrinsic (in both versions of the DX API) by default reserves pipelines on the shadercore to process anisotropic spatial filtering, and shaders themselves don't have indicators to signal a lower requirement. The Load intrinsic (DXGI only) is handled by an entirely different pipeline, and can't be filtered. I didn't expect such a big difference between the methods for a basic Load and a Sample with nearest neighbor. Maybe in a while I'll try to add optional bilinear filtering on custom pixel shader sampler stages for the DirectX 9 renderer I'm working on.
Your shader looks very promising. The debug section to visualize each stage is very helpful; I use such mechanisms in rendering all the time. Just don't look too long at those rainbow colors while adjusting settings. :D The effect is, of course, quite psychedelic, and can cover up the main intention of a shader's usage.
The methods seem pretty much correct. The only note I have is that mentioning the LAB colorspace is a bit wrong in this context. This shader only converts to Y'CbCr. That's also a luma-chroma system, but doesn't nearly have as much colorspace coverage or control as LAB or XYZ:
http://en.wikipedia.org/wiki/CIE_1931_color_space
http://en.wikipedia.org/wiki/Lab_color_space
http://en.wikipedia.org/wiki/File:Colorspace.png
(Although I must note that Y'CbCr is easier to handle than LAB. LAB is even more mathematically non-uniform than Y'CbCr, mostly because of a difficult gamma slope.)
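For reference, the kind of luma-chroma split this shader effectively performs looks like this in BT.709 terms (a hypothetical full-range helper written for comparison, not code from the pack):

```hlsl
// Full-range BT.709 R'G'B' -> Y'CbCr. The divisors are 2*(1 - Kb) and
// 2*(1 - Kr) with Kb = 0.0722, Kr = 0.2126, scaling Cb/Cr to [-0.5, 0.5].
float3 RGBToYCbCr709(float3 rgb)
{
	float y  = dot(rgb, float3(0.2126, 0.7152, 0.0722));
	float cb = (rgb.b - y) / 1.8556;
	float cr = (rgb.r - y) / 1.5748;
	return float3(y, cb, cr);
}
```

Note that this is still a gamma-encoded (non-uniform) space, which is the point of the comparison with LAB above.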
Anyway, good luck with further developing.

leeperry
22nd November 2012, 10:44
Hi Jan, thanks for the scripts! Now that madVR supports them, this thread is well worth a bump :)

I can't see any script that would process a mirroring effect, wouldn't that be possible via a PS script? All I can see is flipping :o

And an artificial film grain script much like GrainFactory3() (http://avisynth.org/mediawiki/GrainFactory3) could also be really handy for low bitrate encodes.

:thanks:

burfadel
23rd November 2012, 23:22
I agree!

A nice effect that would be good as a shader is 'temporal smoothing'. In FFDshow, under 'Blur & Noise Reduction', have everything unchecked except for 'Temporal smooth' (and of course, the box to actually activate the smoothing options), have it set to '1', and 'process color'.

leeperry
13th December 2012, 03:57
Hi Jan, I was wondering if you would have any plan to implement some sort of dynamic contrast stuff like Samsung's DNIE (http://forum.doom9.org/showpost.php?p=1600258&postcount=15433)?

It looks quite impressive on their TVs, but it's neither defeatable nor finetunable, and quite frankly they went quite overboard with the default settings....I guess it's meant to stun you in the shop, but it basically makes everything look like a cell-shading cartoon = very funny for a few days, then it WILL get old after a while :o

They do it to compensate for the infamous 2K:1 native CR of their grossly overpriced TV's but the idea is good, it would only need to be finetunable I think. They allow several settings for their motion interpolation stuff(that looks quite good in "crisp" mode) so I dunno why they don't provide it for DNIE :rolleyes:

Besides, the best looking scalers in madVR require quite a lot of horsepower for 1080p and/or 60fps scaling so let's rock with the PS scripts while we're at it :D

I'd love to hear your thoughts on that matter, :thanks:

toniash
13th December 2012, 12:19
Besides, the best looking scalers in madVR require quite a lot of horsepower for 1080p and/or 60fps scaling so let's rock with the PS scripts while we're at it :D


PS scripts can be also very heavy on GPU

leeperry
13th December 2012, 16:22
Indeed, but atm anything cheaper than a GTX660 is a waste of money if you like it green so that leaves a lot of GPU power unused.....but I see that Jan has been silent for a while.

leeperry
28th December 2012, 19:49
so more ideas in case there'd be any bored PS script coder around: http://www.youtube.com/watch?v=rRf2aEsJaQE

pretty funky way to watch 4:3 content on a 16/9 display, would love to try it on my own James Brown Soul Train DVD's :devil:

leeperry
30th December 2012, 02:46
also, would that be hard to make a "negative" script? you've made all kinds of complicated scripts that have no real world use AFAICS, but simple stuff such as flipping/mirroring/negative just isn't there :(

CeeJay.dk
3rd January 2013, 16:37
also, would that be hard to make a "negative" script? you've made all kinds of complicated scripts that have no real world use AFAICS, but simple stuff such as flipping/mirroring/negative just isn't there :(

That would be the easiest thing in the world, but why would you want to see a negative of the screen?

Anyways :

/* --- Defining Constants --- */

sampler s0 : register(s0);

/* --- Negative --- */
/*
by Christian Cann Schuldt Jensen ~ CeeJay.dk

Inverts the color of the image, making it negative.
*/

float4 NegativePass( float4 colorInput )
{
return 1.0 - colorInput;
}

/* --- Main --- */

float4 main(float2 tex : TEXCOORD0) : COLOR {
float4 FinalColor = tex2D(s0, tex);

FinalColor = NegativePass(FinalColor);

return FinalColor;
}

EDIT : While testing the negative shader I made , I found that MPC-HC already has one. It's called Invert.
This is its code:

sampler s0 : register(s0);

float4 main(float2 tex : TEXCOORD0) : COLOR {
float4 c0 = float4(1, 1, 1, 1) - tex2D(s0, tex);

return c0;
}

It does exactly the same. Subtracts the pixel color from 1.0

leeperry
4th January 2013, 02:10
sweeet, :thanks: a bunch!

well, PS scripts don't work in 8bit like ffdshow so I can use them without killing the PQ in mVR and I like to have troubleshooting scripts "just in case" ;)

I used to code Seka assembler on the Amiga back in the day, and I heard that coding PS scripts was great fun because you only have to care about the good side of coding, so I might document myself on how to write these at some point. If anything, I'd crave a less aggressive DNIE (http://forum.doom9.org/showpost.php?p=1600258&postcount=15433) :)

Dodgexander
31st January 2013, 15:32
When using these in MPC-HC. I get errors:

memory(203,11): warning X3571: pow(f, e) will not work for negative f, use abs(f) or conditionally handle negative values if you expect them
memory(155,9): error X5589: Invalid const register num: 32. Max allowed is 31.

leeperry
31st January 2013, 17:24
did you set them to PS 3.0? did you RTFM if any? :p

Dodgexander
1st February 2013, 13:54
did you set them to PS 3.0? did you RTFM if any? :p

The pixel shader was indeed the problem, for some reason when I set to Pixel Shader 3, it reverts back to 2 and I have to change back to 3 again for it to work!

The results here are truly amazing, now I just have to pick the best option and automate. Thanks a lot for the help.

Oh another question though, when I use the YCbCr-type sharpen complex test 2 scripts, my screen completely goes purple.

http://imageshack.us/scaled/thumb/255/testtpz.png (http://imageshack.us/photo/my-images/255/testtpz.png/)

leeperry
2nd February 2013, 03:33
told ya, SimHD is the usual commercial bs clueless companies brag about...they always promise a lot but actually end up delivering very little. Doom9's forum is where the party's at for supreme PQ =)

don't bother with YCbCr, you wanna use RGB scripts with madVR.

Dodgexander
3rd February 2013, 02:50
Doing these scripts reminds me of a while back following 8:13's post for post processing in FFdshow via Avisynth, the results back then were impressive and today, with the video card doing the work its even better, especially with low end cpu!

I wish however that someone would develop all this post processing into a decoder or renderer like ffdshow. It would be so much easier to set up.

Dodgexander
15th February 2013, 00:18
Is there an easier way to import all of these files to switch between them without having to copy and paste each text file into MPC-HC?

Also, I like the effect that the blur shader has, but can't use it with any of the sharpen, deband and denoise filters at the same time. Is there any way around this?

Also, what other shaders are people using on their SD material for best effects?

Finally, using Mad VR upscaling chroma and Luma already, how can i make sure none of the effects are conflicting?

leeperry
1st May 2013, 11:01
BTW, mVR currently doesn't align chroma properly for MPEG1 as shown here (http://forum.doom9.org/showpost.php?p=1622754&postcount=18186).

Would anyone be kind enough to write a PS script that would fix this please? :thanks:

Is there an easier way to import all of these files to switch between them without having to copy and paste each text file into MPC-HC?
Dunno about MPC, but you can just put them all in the /PxShader/ subdirectory of PotP, et voilà: http://thumbnails104.imagebam.com/25196/7c8097251955360.jpg (http://www.imagebam.com/image/7c8097251955360)

You can also setup automatic profiles with different combinations of PS scripts depending on frame rate/resolution/codec/etc :)

XRyche
6th July 2013, 01:54
Is there any way that you would make a YCbCr r=2 or r=1, sharpen complex, deband, denoise and color controls for SD&HD video input shader at this late date? The r=4 shader is a little too taxing on my rig without overclocking my video card for some 1200p content that could use some debanding.

JanWillem32
7th July 2013, 22:14
Sure, I still often write shaders. (Just most of them are not for video renderers.) What shader chain would you like to use? I can probably reduce some of the overhead, or use somewhat more lightweight methods to get what you want.

XRyche
8th July 2013, 06:34
I'm not that tech-savvy or a videophile, so I'm not quite sure what you're asking. If you mean the order of shaders I use (they are all yours btw :) ), they are as follows, in this order; Pre-resize: RGB to Y'CbCr floating point, 4:2:0 to 4:2:2 Chroma Up-sampling, 4:2:2 Spline5 Chroma Up-Sampling floating point for Y'CbCr, one of your r=?, sharpen complex, deband, ? denoise and color controls for SD&HD video input, and lastly unsharp luma mask for SD&HD video (the black border compensation is very very nice :) ). I don't really use post-resize shaders often; I've never found a need for them with your shaders. What I think I'm looking for is your "r=?, sharpen complex, deband, ? denoise and color controls for SD&HD video input for Y'CbCr" shader using only 1 or 2 radial layered sharpening functions. I've used your linear gamma shaders before with only 1 or 2 radial layered sharpening functions at 1200p and that seemed to work for me. I just don't like the linear gamma shaders because I get massive artifacts when denoising and I can see the outlines from the sharpening (I'm probably doing something wrong). I tried raising the GammaCompensation value, but that seems to have a negative impact on the denoising and sharpening effects.

While I have your attention, would there be any perceivable benefit in 4:4:4 chroma up-sampling, and if there was, would you be willing to write a Y'CbCr shader for it? I've developed a really strong appreciation for your Y'CbCr shaders compared to RGB. Colours are much more subtle and not as harsh as when using RGB, at least to me. I prefer your video renderer with Y'CbCr shaders over a certain other extremely popular renderer simply for this fact.

mhourousha
8th July 2013, 08:20
I wrote a Color Vibrance shader for GPUs without a 'digital vibrance' feature (Intel HD Graphics, for example)

sampler s0 : register(s0);
float4 p0 : register(c0);
float4 p1 : register(c1);

float3 ColorVibrance(float3 rgb, float vibrance)
{
	// -- Convert RGB to HSL --
	float maxvalue = max(rgb.r, rgb.g);
	float minvalue = min(rgb.r, rgb.g);
	maxvalue = max(rgb.b, maxvalue);
	minvalue = min(rgb.b, minvalue);
	float CValue = maxvalue-minvalue;
	float3 hsl = float3(0.0, 0.0, 0.0);
	hsl.z = 0.5f*(maxvalue+minvalue);
	float tempf = 1.0f-abs(hsl.z*2.0f-1.0f);
	if (CValue < 0.0001f)
	{
		return rgb;
	}
	hsl.y = CValue/tempf;

	// -- Boost the pixel's saturation based on its original saturation --
	hsl.y += hsl.y*(1.0f-hsl.y)*vibrance;
	// -- Convert HSL back to RGB --
	float CValue2 = hsl.y*tempf;
	float mValue = hsl.z-0.5f*CValue2;
	float3 BaseColor = float3(0.0f, 0.0f, 0.0f);
	if (maxvalue-0.00001f <= rgb.r)
	{
		BaseColor.x = 1.0f;
		hsl.x = (rgb.g-rgb.b)/CValue;
		if (hsl.x < 0.0f)
		{
			BaseColor.z = -hsl.x;
		}
		else
		{
			BaseColor.y = hsl.x;
		}
	}
	else if (maxvalue-0.00001f <= rgb.g)
	{
		BaseColor.y = 1.0f;
		hsl.x = (rgb.b-rgb.r)/CValue;
		if (hsl.x < 0.0f)
		{
			BaseColor.x = -hsl.x;
		}
		else
		{
			BaseColor.z = hsl.x;
		}
	}
	else
	{
		BaseColor.z = 1.0f;
		hsl.x = (rgb.r-rgb.g)/CValue;
		if (hsl.x < 0.0f)
		{
			BaseColor.y = -hsl.x;
		}
		else
		{
			BaseColor.x = hsl.x;
		}
	}
	return float3(mValue,mValue,mValue)+float3(CValue2,CValue2,CValue2)*BaseColor;
}

float4 main(float2 tex : TEXCOORD0) : COLOR0
{
	float4 c0 = saturate(tex2D(s0, tex));
	float vibrance = 0.5;
	return float4(ColorVibrance(c0.xyz,vibrance),1.0);
}

PetitDragon
8th July 2013, 16:08
I'm not that tech-savy or a video-phile so I am not quite sure what you're asking. If you mean the order of shaders I use (they are all yours btw :) ) they are as follows and in the following order; Pre-resize: RGB to Y'CbCr floating point......

Could you tell us what version of Jan's test build you are using with these PS shaders?

JanWillem32
10th July 2013, 01:30
XRyche, do you actually need the chroma up-sampling shaders? They only work on AMD/ATi and Intel GPUs in the quality mode. For the performance mode, the default mixer up-sampling is used and for nVidia GPUs the chroma up-sampling can't be overridden unless you take out the VMR-9 or EVR mixer. If you do need them, which of the current types in the renderer do you like? The higher-order types from the pixel shader pack are somewhat faulty.
I can certainly mix the sharpen complex~ and the luma-type unsharp mask shaders, they pretty much act on the pixels in a similar fashion anyway.
Gamma linearization is still important for Y'CbCr to R'G'B' to RGB to XYZ stages. By default, the renderer I wrote takes good care of these steps in the quality mode. Letting external shaders do such a task is required when you set "Disable Initial Color Mixing Stages" for the renderer (which does give a lot of control over that stage, with nearly ideal efficiency).
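A minimal sketch of one such linearization step, assuming the sRGB transfer function (the exact curves the renderer's internal stages use may differ, e.g. BT.1886):

```hlsl
// Approximate sRGB EOTF: gamma-encoded R'G'B' -> linear RGB.
// The conditional operator works per component on vectors in HLSL.
float3 SRGBToLinear(float3 c)
{
	return (c <= 0.04045) ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4);
}
```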

mhourousha, this shader does seem interesting. A quick glance shows some minor things to possibly improve, though.
First of all, unlike C, HLSL does not take 'F' or 'f' as a suffix for single precision. Single precision is the default, and double precision takes an 'l' or 'L' suffix.
Secondly, for what reason is saturation used? Various rendering stages for video playback and other HDR imaging produce valid output beyond the 0 minimum and 1 maximum. Other than some anti-aliasing methods that have to deal with the spatial problems of multi-sampling pixels into one output, I've never seen valid reasons to actively saturate colors.
Third, some parts could improve by using shuffle masks, such as: "mValue.rrr" or just "mValue" instead of "float3(mValue,mValue,mValue)" and "ColorVibrance(c0.xyz,vibrance).rgbb" instead of "float4(ColorVibrance(c0.xyz,vibrance),1.0)".
Fourth, why did you use "0.00001f" instead of a true floating-point epsilon? On top of that, why did you use such a small value at all? The only reason I can see is because "hsl.y = CValue/tempf;" could possibly do a division by zero. For that, changing the previous comparison to "CValue <= 0." would suffice, though.
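A sketch (not the original author's code; the function name and parameters are made up) of how the third and fourth suggestions might look when applied:

```hlsl
// A direct zero test guards the division path instead of an ad-hoc epsilon,
// and scalar swizzles replace the float3() replication on the return value.
float3 GreyOrChroma(float CValue, float mValue, float CValue2, float3 BaseColor)
{
	if (CValue <= 0.0) return mValue.xxx; // no chroma: return the grey level
	return mValue.xxx + CValue2.xxx * BaseColor;
}

// The single-write trick on the final output would then be:
// return ColorVibrance(c0.xyz, vibrance).rgbb;
```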

mhourousha
10th July 2013, 06:51
JanWillem32:
About the saturate: some renderers (madVR, for example) don't clamp the color to [0, 1] for the surface used as a source by the shader, so artifacts would occur if I didn't saturate 'c0'.
About the shuffle mask: it's my habit :D I think the shader compiler will optimize it.
About the 0.00001f: first, it's not good to do an 'equal' comparison between float values. Second, I don't trust the GPU to do division by a very small value. :p

XRyche
10th July 2013, 08:58
XRyche, do you actually need the chroma up-sampling shaders? They only work on AMD/ATi and Intel GPUs in the quality mode. For the performance mode, the default mixer up-sampling is used and for nVidia GPUs the chroma up-sampling can't be overridden unless you take out the VMR-9 or EVR mixer. If you do need them, which of the current types in the renderer do you like? The higher-order types from the pixel shader pack are somewhat faulty.
I can certainly mix the sharpen complex~ and the luma-type unsharp mask shaders, they pretty much act on the pixels in a similar fashion anyway.
Gamma linearization is still important for Y'CbCr to R'G'B' to RGB to XYZ stages. By default, the renderer I wrote takes good care of these steps in the quality mode. Letting external shaders do such a task is required when you set "Disable Initial Color Mixing Stages" for the renderer (which does give a lot of control over that stage, with nearly ideal efficiency).

Wow, I know enough just to make myself look like an idiot :rolleyes: . As a matter of fact, it is the higher-order types from your pixel shader pack that give me issues. I should have made that clear....oops again. I understand now that the chroma up-sampling shaders are unnecessary for my GPU (Nvidia); I checked myself and didn't see any change at all with or without them. I incorrectly assumed that they worked, without really bothering to compare. Am I correct in assuming that the RGB to Y'CbCr conversion shader does exactly that (I do definitely see a difference; if not....it's off to the optometrist I go :) )?

As far as the sharpen complex and luma unsharp mask shaders go, I would say "Yes, please". When I use the luma unsharp mask shader, it has a tendency to clean up some of my old TV card rips without having to use excessive denoising. I assumed the "black border compensation" has something to do with that.

Also, if you could write an r=1 and an r=2 "sharpen complex, deband, ? denoise and color controls for SD&HD video input" Y'CbCr shader (I know they are "test" shaders, but they appear to do everything they say they do without issue for me). I use your higher-order ones for a lot of SD and 720p content anyway.

Thanks for setting me straight on the chroma upsampling issue. I had no idea EVR had that kind of limitation.

JanWillem32
10th July 2013, 10:39
XRyche, the RGB to Y'CbCr conversion shader is fairly basic. I assume you mean the chroma up-sampling shaders? These will distort the chroma if the values were altered before. Sharp chroma borders such as red on black and blue on black will have the worst artifacts.
I'll see what I can write today and tomorrow. Implementing this chain as three passes of pixel shaders should be easy enough.

mhourousha, MadVR is not the only renderer that preserves output beyond the 0 minimum and 1 maximum. It's standard in all quality rendering. Most video filters are expected to be able to take full floating-point range inputs, correctly process everything and then output. That said, what artifacts do you expect? Of the entire set of pixel shaders in the pixel shader pack and all other shaders I've ever written, except for the stages of one type of anti-aliasing, I never had to use saturation on colors at all. (Saturating and clamping on vertices is pretty common though.)
The ".rgbb" shuffle at the end is mostly a trick to prevent two write instructions to the output register (one for the three color channels and one for the alpha output color channel). The other shuffles can indeed be optimized by the compiler.
In my honest opinion, all those teachers that still teach "do not use direct comparisons with floating point logic" and don't bother with actually explaining machine epsilon and relative error accumulation by instructions should get a whipping.
In this case it's about a division. For the right-hand operand in floating-point division, there are 5 special cases: -infinity, infinity, -0, 0 and not-a-number inputs. NaN is not an issue in this case and division by infinity will work as expected. Only straight division by exactly -0 and 0 will produce a NaN, -infinity or infinity, depending on the left-hand operand.
Handling small numbers on a GPU has not been an issue ever since the introduction of the programmable pipeline. It's actually CPUs that had issues with small numbers until fairly recent models. Intel even introduced flags on x86 CPUs to allow flushing denormal results to zero and assuming denormal inputs to be zero (not applied by default, as it breaks IEEE and most programming language standards). Most CPUs in use today do not have native denormal support in their floating-point units. Whenever a denormal is detected in such a CPU, an interrupt is raised, the FPU is taken offline and the calculation with the denormal is done by emulation. Such operations can cost over a thousand clock ticks. Full-speed single-precision floating-point denormal handling is important (and costly in terms of transistor count) on GPUs, as GPUs don't have any other logic on board to emulate instructions in the first place. On top of that, slowing down certain instructions in one core will pause all other grouped cores in a GPU (the curse of massive parallelism, which applies to any branching code as well).
In terms of accuracy of divisions, I looked up the 'rcp' instruction (hlsl will generally not compile to straight division, but rather use fast reciprocal and multiply), the documentation is what I expected it would be: http://msdn.microsoft.com/en-us/library/windows/desktop/bb147315%28v=vs.85%29.aspx .
There are no issues with handling small numbers on a GPU, as long as the numbers don't require such a large exponent or mantissa that only double precision or even greater will suffice for calculations.
In regards to floating-point equality evaluations: any floating-point number compared to itself for equality will yield true, except for NaNs. Floating-point numbers do get altered by arithmetic instructions, which accumulate relative error that you might have to compensate for. However, the only instruction I see before the last set of comparisons to "rgb" and "maxvalue" is 'max'. That is a flat branching instruction, not an arithmetic type. No relative error is accumulated, so the epsilon on these comparisons is useless.
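That distinction between arithmetic and selection can be illustrated in a few lines of Python (my illustration, not from the shader pack):

```python
x = 0.1 + 0.2       # arithmetic accumulates relative error...
assert x != 0.3     # ...which is why == right after arithmetic is suspect
assert x == x       # yet any value compared with itself is equal,
nan = float("nan")
assert nan != nan   # except NaN, which compares unequal even to itself
# max() merely selects one of its operands instead of computing a new
# value, so the result can still be compared exactly, with no epsilon
assert max(x, 0.3) == x
print("all comparisons behaved as described")
```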

mhourousha
10th July 2013, 13:43
JanWillem32, thanks for the reply.
About saturation: because RGB<->HSL is not a linear transform, it requires lightness in [0, 1]; values beyond this range will cause artifacts. http://en.wikipedia.org/wiki/HSL_color_space
About the shader compiler optimization: in fact, the token-assembly shader language (the asm shader format of D3D on the Windows platform) does not map 1:1 to hardware instructions. ".rgbb" will indeed save a token-asm instruction like 'mov oC0.w, c0', but if you look at the hardware assembly (with a tool like ShaderAnalyzer), both methods cost the same clock cycles in most cases. Besides, using ".rgbb" assumes the alpha channel will not be used later, so it's not an always-safe trick, right? Oh, I see, I should use 'return float4(ColorVibrance(c0.xyz, vibrance), c0.w);' instead of 'return float4(ColorVibrance(c0.xyz, vibrance), 1.0);'.
About floating point: for recent GPUs you are right, but some old GPUs did not implement the IEEE standard strictly, like the fp24 internal precision of the ATI R300-R400, and I'm afraid some shader compilers' aggressive optimizations may use fp16 instead of fp32 (for old Nvidia cards). So I use the epsilon to be safe; it only causes a very small performance hit.

XRyche
11th July 2013, 16:51
XRyche, the RGB to Y'CbCr conversion shader is fairly basic. I assume you mean the chroma up-sampling shaders? These will distort the chroma if the values were altered before. Sharp chroma borders such as red on black and blue on black will have the worst artifacts.
I'll see what I can write today and tomorrow. Implementing this chain as three passes of pixel shaders should be easy enough.



No, I didn't mean the chroma up-sampling shaders. I already understand that I don't need them since I have an Nvidia GPU. I meant that I wanted you to create two "sharpen complex, deband, denoise and color controls for SD&HD video input" shaders: one with 2 radial sharpening levels and one with 1 radial sharpening level, similar to what you have for the Linear RGB sharpen complex, deband and denoise shaders in your Video pixel shader pack. I happen to prefer Y'CbCr to RGB, and I would like to use the "RGB to Y'CbCr for SD&HD video input for floating point surfaces" shader in conjunction with the Y'CbCr "sharpen complex, deband, denoise and color controls for SD&HD video input" shaders with all my content. This includes some 1200p content, which is too much for my system to use with your Y'CbCr "r=4, sharpen complex, deband, medium denoise and color controls for SD&HD video input" shader. :thanks:

edit: I don't mean to sound demanding, if that's how it sounds.

JanWillem32
13th July 2013, 22:24
XRyche, I'm still experimenting a bit with some shader stages. The shaders are not that hard to combine, but getting a combination of good performance and amiable effects isn't easy. On top of that, these are pretty much the most complex shaders I've written for processing video. I'm pretty sure I should simplify some parts, but I'll just have to try some things and hope the effect gets better.

mhourousha, I took some time to analyze what kind of effect you were trying to apply. I saw that it only increases colorfulness. The bulk of the code in your shader is the rather involved process of RGB to HSL conversion and back again. From what I could make out of the articles about the HSV and HSL color models, they are RGB representations. They equally suffer from the same problem as RGB models: without values encoded outside the nominal color interval, it is impossible to represent the gamut of human vision. http://en.wikipedia.org/wiki/CIE_1931_color_space
(Note that even when rendering in a color space that does accommodate the gamut of human vision, saturation isn't used. Various filtering steps may just shift color data in and out of visible and invisible areas of the color space. That just happens when rendering. When dealing with a renderer you have to design things that can deal with ranges up to 1 just as easily as ranges up to 100 in most stages.)
HSV and HSL mostly seem to be convenient when doing or comparing shifts in the hue, which is a more complicated matter in RGB models. (It's not that hard if you can handle matrix transforms. I added a few shaders as an example.)
I already had a colorfulness shader, but I didn't mind making another type for the occasion. Note that you can do a lot more complex transforms than one simple multiply in the line "s1.rgb = (s1.rgb-inptot)*colorfulness+inptot;".
I saved quite a lot of instructions on not doing the complex color space transforms. In terms of (expensive) branching, I only had to use one to prevent a division by zero.
You specified that you wanted to cater to the 16-bit internal precision of the Nvidia NV30, NV31 and NV34 models introduced ten years ago (which were never DirectX 9.0 compliant because of that precision issue).
http://en.wikipedia.org/wiki/Machine_epsilon
The precision of the 16-bit floating point type is rather bad, its machine epsilon is 0.0009765625. The value you used was 0.0001. That value covers 0.2048 bits of maximum precision loss, assuming the interval [.5, 1).
The precision of the 24-bit floating point type is a lot better, its machine epsilon is 0.0000152587890625. The value 0.0001 covers 13.1072 bits of maximum precision loss, assuming the interval [.5, 1).
For the execution of this shader, all that doesn't matter though: in the cases where applying an epsilon was assumed useful, it actually was not. None of the cases here deal with precision loss due to the specific limited amount of mantissa bits.
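The epsilon figures quoted above can be verified numerically; an illustrative check in Python (numpy used purely as a calculator here):

```python
import numpy as np

eps16 = float(np.finfo(np.float16).eps)  # machine epsilon of fp16: 2^-10
print(eps16)              # 0.0009765625
# value spacing in [.5, 1) is eps/2; an epsilon constant of 0.0001 thus
# covers 0.0001 / 2^-11 steps of fp16 precision loss
print(0.0001 / 2.0**-11)  # 0.2048
print(2.0**-16)           # 1.52587890625e-05, the fp24 machine epsilon
print(0.0001 / 2.0**-17)  # 13.1072 steps of fp24 precision loss
```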
Hardware assembly execution speed and amount of D3D asm instructions are not tied 1:1 indeed. Modern processor architectures are superscalar. Everything that can be simultaneously executed without stalling will generally be executed at the same time as other instructions. Note that the rules vary per architecture. The D3D asm instruction count is still a good indication of how fast a shader will run on average.
In regard to the fourth component of the output of a pixel shader, I can be clear on that. The performance mode uses X8R8G8B8 to store color, thus not saving the fourth component at all. The quality modes use A16B16G16R16F, A16B16G16R16 and A32B32G32R32F. These can store the fourth component. If you want to design a specific chain of multiple shaders for some effect, you are welcome to put data in the fourth component for every pixel. However, none of the renderer components themselves will ever use that channel. I advise treating the fourth component as discarded for most pixel shaders.
Other renderer stages that handle textures with valid alpha do require processing the fourth component. For blending the OSD and subtitles two special pixel shaders are used in combination with alpha blending. The renderer has always been this way in regards to this aspect, I don't expect changes either.

Sample shaders (the XYZ types can only properly function with the renderer in quality mode):
// (C) 2013 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// very basic colorfulness control

#define colorfulness 1.5// default is one, this should not be zero

sampler s0 : register(s0);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float4 s1 = tex2D(s0, tex);
float inptot = dot(1/3., s1.rgb);// component average

[branch] if (inptot) {// prevent division by zero
s1.rgb = (s1.rgb-inptot)*colorfulness+inptot;
float intermtot = dot(1/3., s1.rgb);
s1.rgb *= inptot;
s1.rgb /= intermtot;}
return s1;
}
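The math of this shader can be followed in plain Python; an illustrative single-pixel version (mine, not part of the pack), showing the zero guard and the average-preserving renormalization:

```python
def colorfulness_control(rgb, colorfulness):
    inptot = sum(rgb) / 3.0            # component average
    if not inptot:                     # prevent division by zero
        return rgb
    # scale each channel's deviation from the average, then renormalize
    # so the average brightness is preserved
    scaled = [(c - inptot) * colorfulness + inptot for c in rgb]
    intermtot = sum(scaled) / 3.0
    return [c * inptot / intermtot for c in scaled]

out = colorfulness_control([0.8, 0.4, 0.3], 1.5)
print(out)                        # deviations from the average are amplified
print(round(sum(out) / 3.0, 12))  # 0.5: the input's component average is preserved
```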



// colorfulness control for XYZ rendering

// white point in xyY, default is TV-type D65 {.3127, .3290, 1}, the most basic is E {1/3., 1/3., 1}
#define wpx .3127
#define wpy .3290// this cannot be zero

#define colorfulness 1.5// default is one, this should not be zero

sampler s0 : register(s0);
static const float wpyr = 1./wpy;
static const float wpX = wpx*wpyr;
static const float wpZ = wpyr-wpX-1.;
static const float3 wpXYZ = {wpX, 1, wpZ};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float4 s1 = tex2D(s0, tex);
float inptot = dot(1/3., s1.rgb);// component average

[branch] if (inptot) {// prevent division by zero
s1.rgb /= wpXYZ;// adapt white point
s1.rgb = (s1.rgb-inptot)*colorfulness+inptot;
float intermtot = dot(1/3., s1.rgb);
s1.rgb *= inptot;
s1.rgb /= intermtot;
s1.rgb *= wpXYZ;}// revert white point adaptation
return s1;
}



// hue shift for XYZ rendering

// white point in xyY, default is TV-type D65 {.3127, .3290, 1}, the most basic is E {1/3., 1/3., 1}
#define wpx .3127
#define wpy .3290// this cannot be zero

#define hue 180// in degrees, default is zero

sampler s0 : register(s0);
static const float wpyr = 1./wpy;
static const float wpX = wpx*wpyr;
static const float wpZ = wpyr-wpX-1.;
static const float3 wpXYZ = {wpX, 1, wpZ};
static const float huecos = cos(radians(hue));
static const float huesin = sin(radians(hue));
static const float huecosp = 1/3.-huecos/3.;
static const float huesinp = sqrt(1/3.)*huesin;
static const float huebase = huecosp+huecos;
static const float huedera = huecosp+huesinp;
static const float hueders = huecosp-huesinp;
static const float3x3 hueshiftmat = {
huebase, huedera, hueders,
hueders, huebase, huedera,
huedera, hueders, huebase};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float4 s1 = tex2D(s0, tex);
s1.rgb /= wpXYZ;// adapt white point
s1.rgb = mul(s1.rgb, hueshiftmat);
s1.rgb *= wpXYZ;// revert white point adaptation
return s1;
}



// variable hue shift for XYZ rendering

// white point in xyY, default is TV-type D65 {.3127, .3290, 1}, the most basic is E {1/3., 1/3., 1}
#define wpx .3127
#define wpy .3290// this cannot be zero

#define hue c0.w// in radians, default is zero

sampler s0 : register(s0);
float4 c0 : register(c0);
static const float wpyr = 1./wpy;
static const float wpX = wpx*wpyr;
static const float wpZ = wpyr-wpX-1.;
static const float3 wpXYZ = {wpX, 1, wpZ};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float huecos = cos(hue);
float huesin = sin(hue);
float huecosp = 1/3.-huecos/3.;
float huesinp = sqrt(1/3.)*huesin;
float huebase = huecosp+huecos;
float huedera = huecosp+huesinp;
float hueders = huecosp-huesinp;
float3x3 hueshiftmat = {
huebase, huedera, hueders,
hueders, huebase, huedera,
huedera, hueders, huebase};

float4 s1 = tex2D(s0, tex);
s1.rgb /= wpXYZ;// adapt white point
s1.rgb = mul(s1.rgb, hueshiftmat);
s1.rgb *= wpXYZ;// revert white point adaptation
return s1;
}

// colorfulness control for XYZ rendering on 16-bit integer surfaces

// white point in xyY, default is TV-type D65 {.3127, .3290, 1}, the most basic is E {1/3., 1/3., 1}
#define wpx .3127
#define wpy .3290// this cannot be zero

#define colorfulness 1.5// default is one, this should not be zero

sampler s0 : register(s0);
static const float wpyr = 1./wpy;
static const float wpX = wpx*wpyr;
static const float wpZ = wpyr-wpX-1.;
static const float3 wpXYZ = {wpX, 1, wpZ};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float4 s1 = tex2D(s0, tex);
s1.rgb -= 16384/65535.;// remove input interval [16384/65535, 49151/65535] offset to black point
float inptot = dot(1/3., s1.rgb);// component average

[branch] if (inptot) {// prevent division by zero
s1.rgb /= wpXYZ;// adapt white point
s1.rgb = (s1.rgb-inptot)*colorfulness+inptot;
float intermtot = dot(1/3., s1.rgb);
s1.rgb *= inptot;
s1.rgb /= intermtot;
s1.rgb *= wpXYZ;}// revert white point adaptation
s1.rgb += 16384/65535.;// re-apply black point offset
return s1;
}



// hue shift for XYZ rendering on 16-bit integer surfaces

// white point in xyY, default is TV-type D65 {.3127, .3290, 1}, the most basic is E {1/3., 1/3., 1}
#define wpx .3127
#define wpy .3290// this cannot be zero

#define hue 180// in degrees, default is zero

sampler s0 : register(s0);
static const float wpyr = 1./wpy;
static const float wpX = wpx*wpyr;
static const float wpZ = wpyr-wpX-1.;
static const float3 wpXYZ = {wpX, 1, wpZ};
static const float huecos = cos(radians(hue));
static const float huesin = sin(radians(hue));
static const float huecosp = 1/3.-huecos/3.;
static const float huesinp = sqrt(1/3.)*huesin;
static const float huebase = huecosp+huecos;
static const float huedera = huecosp+huesinp;
static const float hueders = huecosp-huesinp;
static const float3x3 hueshiftmat = {
huebase, huedera, hueders,
hueders, huebase, huedera,
huedera, hueders, huebase};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float4 s1 = tex2D(s0, tex);
s1.rgb -= 16384/65535.;// remove input interval [16384/65535, 49151/65535] offset to black point
s1.rgb /= wpXYZ;// adapt white point
s1.rgb = mul(s1.rgb, hueshiftmat);
s1.rgb *= wpXYZ;// revert white point adaptation
s1.rgb += 16384/65535.;// re-apply black point offset
return s1;
}



// variable hue shift for XYZ rendering on 16-bit integer surfaces

// white point in xyY, default is TV-type D65 {.3127, .3290, 1}, the most basic is E {1/3., 1/3., 1}
#define wpx .3127
#define wpy .3290// this cannot be zero

#define hue c0.w// in radians, default is zero

sampler s0 : register(s0);
float4 c0 : register(c0);
static const float wpyr = 1./wpy;
static const float wpX = wpx*wpyr;
static const float wpZ = wpyr-wpX-1.;
static const float3 wpXYZ = {wpX, 1, wpZ};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float huesin, huecos;
sincos(hue, huesin, huecos);
float huecosp = 1/3.-huecos/3.;
float huesinp = sqrt(1/3.)*huesin;
float huebase = huecosp+huecos;
float huedera = huecosp+huesinp;
float hueders = huecosp-huesinp;
float3x3 hueshiftmat = {
huebase, huedera, hueders,
hueders, huebase, huedera,
huedera, hueders, huebase};

float4 s1 = tex2D(s0, tex);
s1.rgb -= 16384/65535.;// remove input interval [16384/65535, 49151/65535] offset to black point
s1.rgb /= wpXYZ;// adapt white point
s1.rgb = mul(s1.rgb, hueshiftmat);
s1.rgb *= wpXYZ;// revert white point adaptation
s1.rgb += 16384/65535.;// re-apply black point offset
return s1;
}
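To see why the hue shift shaders above work, the matrix construction can be checked in Python (an illustration of mine): at hue = 0 it degenerates to the identity matrix, and every row sums to 1, so the gray axis (r = g = b) is preserved while colors rotate around it.

```python
import math

# same constants as the hue shift shaders above
def hue_matrix(hue_radians):
    huecos = math.cos(hue_radians)
    huesin = math.sin(hue_radians)
    huecosp = 1.0/3.0 - huecos/3.0
    huesinp = math.sqrt(1.0/3.0) * huesin
    huebase = huecosp + huecos
    huedera = huecosp + huesinp
    hueders = huecosp - huesinp
    return [[huebase, huedera, hueders],
            [hueders, huebase, huedera],
            [huedera, hueders, huebase]]

m0 = hue_matrix(0.0)          # a hue of 0 yields the identity matrix
print(m0[0])                  # [1.0, 0.0, 0.0]
m = hue_matrix(math.radians(90))
print(round(sum(m[0]), 12))   # 1.0: rows sum to 1, so gray stays gray
```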

detmek
14th July 2013, 10:05
I am using one or two shaders for small corrections during playback, usually LumaSharpen or Sharpen Complex, and Vibrance for anime. But sometimes a video has banding, usually in my old encodes denoised with FFT3DFilter.

Is there a pure deband shader that I can use to replace the FFDShow Deband filter?

JanWillem32
16th July 2013, 18:58
The pure debanding shaders I tried were terrible. For the limit-based types, the worst cases were borders (even some pretty sharp ones). If the transition pixels on borders get blurred without compensating for the borders, aliasing occurs. For the gradual types, the main problem was blurring of pretty much everything.
So, I tried to use a gradual type, limit it, and additionally adapt a typical unsharp mask sharpening effect to prevent some of the visible artifacts. For these adaptive shaders, the sharpening can be made stronger to accentuate contrast (without making banding and noise worse like other sharpen effects do). I personally don't care for the sharpening effect, as I don't like seeing the sharpening halos at all. Though I know that if I set it too low or turn it off, the other artifacts will become visible.
Every shader I wrote claiming to be able to deband and denoise has a notice on how to disable the sharpening effect. (Though I'm not sure how effective r=1 and r=2 types can be at debanding and denoising.) For the larger types, the sharpen factor can even be set for every layer.
In the meantime, I wrote a multi-pass sharpen, deband, denoise and color controls filter that should be better in terms of performance, while still doing a reasonably good job at debanding larger areas (the most costly part, as it requires sampling a lot of pixels). If wanted, I can share it, but it's not quite finished yet.
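The unsharp-mask idea mentioned above can be sketched in one dimension (an illustration of the general principle, not JanWillem32's actual shader code):

```python
def unsharp_1d(signal, amount):
    # 3-tap box blur with clamped edges
    n = len(signal)
    blurred = [(signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3.0
               for i in range(n)]
    # original plus a weighted difference from the blur; amount = -1
    # degenerates to the plain blur (pure deband, no sharpening halos)
    return [b + (s - b) * (1.0 + amount) for s, b in zip(signal, blurred)]

step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(unsharp_1d(step, 1.0))   # the edge overshoots: the sharpening halo
print(unsharp_1d(step, -1.0))  # the plain blur: band transitions smoothed
```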

PetitDragon
17th July 2013, 00:37
.... If wanted, I can share it, but it's not quite finished yet.

Yes please. We need a new shader pack for XYZ rendering.
:script::thanks:

turbojet
17th July 2013, 20:24
F3kdb has set my bar really high for debanding but I'm always interested in trying new methods. Will it be separate from sharpener, denoising and color controls?

JanWillem32 would you have interest writing an f3kdb shader? Developer of the avisynth plugin mentioned this a few months ago: http://forum.doom9.org/showthread.php?p=1621484#post1621484

jerrymh
23rd July 2013, 21:32
Where can I find a shader to put scan lines or a CRT grille on the screen?

detmek
23rd July 2013, 22:32
Sure, I will be interested to try it. It's easier to use a shader than loading the FFDShow RAW filter.

JanWillem32
25th July 2013, 02:34
About the f3kdb shader, I'm quite interested. There's a limit to what I can do, though. Shaders work very differently compared to many other graphics filters. In most cases, translating to shaders is rather hard. I'll just have to try and be creative. It may just as well be easier than the combination effect shader chain I'm trying to write for XRyche. Where do I start?

jerrymh, that kind of effect is easy. I made this effect two-pass. If you want something more specific (such as a better quality lowpass), I can change a few parts. Depending on the input/output resolution ratio, you might need to blur a bit more or less.
// (C) 2013 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// horizontal blur

sampler s0 : register(s0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
return (tex2D(s0, tex+float2(-3.*c1, 0))+tex2D(s0, tex+float2(-2.*c1, 0))+tex2D(s0, tex+float2(-c1, 0))+tex2D(s0, tex)+tex2D(s0, tex+float2(c1, 0))+tex2D(s0, tex+float2(2.*c1, 0))+tex2D(s0, tex+float2(3.*c1, 0)))/7.;// blur and output
}



// old CRT scan lines

#define scanlines 480// 480 for NTSC, 576 for PAL/SECAM, fractions, either decimal or not are allowed
#define gamma 1// higher is brighter, fractions, either decimal or not are allowed

sampler s0 : register(s0);
float2 c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float4 s1 = (tex2D(s0, tex+float2(0, -3.*c1.y))+tex2D(s0, tex+float2(0, -2.*c1.y))+tex2D(s0, tex+float2(0, -c1.y))+tex2D(s0, tex)+tex2D(s0, tex+float2(0, c1.y))+tex2D(s0, tex+float2(0, 2.*c1.y))+tex2D(s0, tex+float2(0, 3.*c1.y)))/7.;// blur input

float br = 1.-pow(abs(frac(abs(tex.y*scanlines-.5*scanlines))*2.-1.), gamma);// generate scan lines
return s1*br;// modulate brightness and output
}



// old CRT scan lines for XYZ rendering on 16-bit integer surfaces

#define scanlines 480// 480 for NTSC, 576 for PAL/SECAM, fractions, either decimal or not are allowed
#define gamma 1// higher is brighter, fractions, either decimal or not are allowed

sampler s0 : register(s0);
float2 c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float4 s1 = (tex2D(s0, tex+float2(0, -3.*c1.y))+tex2D(s0, tex+float2(0, -2.*c1.y))+tex2D(s0, tex+float2(0, -c1.y))+tex2D(s0, tex)+tex2D(s0, tex+float2(0, c1.y))+tex2D(s0, tex+float2(0, 2.*c1.y))+tex2D(s0, tex+float2(0, 3.*c1.y)))/7.;// blur input

float br = 1.-pow(abs(frac(abs(tex.y*scanlines-.5*scanlines))*2.-1.), gamma);// generate scan lines
return (s1-16384/65535.)*br+16384/65535.;// modulate brightness and output
}
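The scan-line brightness term used in both shaders above can be evaluated on its own; an illustrative Python version (mine), parametrized by the position in scan-line units (i.e. tex.y pre-multiplied by 'scanlines'):

```python
import math

# br = 1 - |frac(|pos - .5*scanlines|)*2 - 1|^gamma
def scanline_brightness(line_pos, scanlines=480.0, gamma=1.0):
    f = abs(line_pos - 0.5 * scanlines)
    f -= math.floor(f)  # frac()
    return 1.0 - abs(f * 2.0 - 1.0) ** gamma

print(scanline_brightness(100.5))   # 1.0: full brightness mid-scan line
print(scanline_brightness(100.0))   # 0.0: dark exactly between two lines
# a higher gamma brightens the beam: at a quarter-line offset,
# gamma = 2 gives 1 - 0.5^2 = 0.75 instead of 0.5
print(scanline_brightness(100.25, gamma=2.0))  # 0.75
```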

turbojet
25th July 2013, 06:10
About the f3kdb shader, I'm quite interested. There's a limit to what I can do, though. Shaders work very differently compared to many other graphics filters. In most cases, translating to shaders is rather hard. I'll just have to try and be creative. It may just as well be easier than the combination effect shader chain I'm trying to write for XRyche. Where do I start?


Maybe try messaging SAPikachu, the developer of f3kdb.dll; he was open to helping in the message linked earlier. I have almost no programming experience, so I couldn't be of help.

XRyche
27th July 2013, 04:47
JanWillem32, First off, thank you for working on the hybrid shaders I've requested. I doubt I would be able to find anyone else so willing to do that. Second, I have been doing some experimenting with different methods for cleaning up the image quality on a lot of my old XviD/DivX AVI files, and it seems that they benefit more from deblocking (ffdshow's raw filter mplayer deblocking) than denoising. I suppose since most of the files are old VHS-to-AVI TV rips and old TV-card rips, that makes sense. Anyway, would you be open to possibly doing an adjustable deblocking shader? I've read that ATI used to (I don't know if they still do) use the shader core to do deblocking, so I assume (whether correctly or not) that it's possible to do it with an HLSL script. I don't have a clue what the math involved would be like, so if you can't, it's understandable.

turbojet
27th July 2013, 06:34
MPEG-4 ASP is notorious for banding (which could be mistaken for blocks on flat surfaces and faces); are you sure it's not banding?

f3kdb is a must for me with ASP, much less so for any decent AVC or MPEG-2 encode. Have you tried it through ffdshow's avisynth interface? Make sure to use SetMemoryMax(128 or more) to stop the leakage.

jerrymh
28th July 2013, 05:09
About the f3kdb shader, I'm quite interested. There's a limit to what I can do, though. Shaders work very differently compared to many other graphics filters. In most cases, translating to shaders is rather hard. I'll just have to try and be creative. It may just as well be easier than the combination effect shader chain I'm trying to write for XRyche. Where do I start?

jerrymh, that kind of effect is easy. I made this effect two-pass. If you want something more specific (such as a better quality lowpass), I can change a few parts. Depending on the input/output resolution ratio, you might need to blur a bit more or less. [...]

Thank you very much, :thanks::thanks:

How about an aperture grille like this one in Final Burn Alpha? It feels like a real old CRT, or an LG plasma.


http://img10.imageshack.us/img10/4519/6qhx.jpg

Or scan lines at 95%?

XRyche
28th July 2013, 08:19
@turbojet... Yes, there is some banding, but some of JanWillem32's shaders plus his modified EVR-CP already help with that (as well as madVR, which all but eliminates banding); blocking, however, is still there without using ffdshow's raw filter. Not that the raw filter is bad, I just would like to eliminate it from my playback chain. If JanWillem32 can kindly make a deblocking shader that does as good of a job as the raw filter or better, I would much rather use that.

Most of my problem video files are from old VHS recordings of TV shows converted to AVIs, as well as some TIVO-type files and early PC TV card recordings, so blocking is kind of a given, as well as massive banding ;) . Considering that madVR doesn't do deblocking (it actually accentuates the blocking on some of my files), using madVR for these is sort of a no-no without the raw filter or a shader script (one that madVR will not neuter because of gamma manipulation or such).

JanWillem32
28th July 2013, 14:17
Deblocking is mostly decoder territory. Many video codecs don't use the typical macroblocks at all. For those that do, you need the general blocking info for the luma, chroma and interlacing to deal with it. For h.264 (and some newer codecs) organized (de)blocking is mandatory for both encoder and decoder. The custom shader stages of the video renderer are a bit late in the rendering chain to properly work on blocking and such. I'm not sure if I can write a normal shader that can help with deblocking.

jerrymh, that picture mostly shows hand-drawn pixel art. No decent video will convert nicely to high contrast, low quantization images like that. I can approximate the effect by combining a few techniques, but note that posterization is a really messy effect (even in common 8-bit video and worst of all, it's everywhere).
It's a two-pass shader chain again. The warning for "should be divisible by 4" isn't too strict; the few artifacts are hard to see. For common resolutions such as 720- and 1080-line systems I can also adapt special shaders to compensate for this issue. I can also try to boost some of the contrast or colorfulness before posterization as well, but I didn't see much improvement with those effects enabled on the samples I used.
// (C) 2013 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// horizontal 4-pixel averaging
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 4

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float pos = (trunc(tex.x*c0*.25)+.125)*c1*4.;// calculate the left position of the current set of pixels
return (tex2D(s0, float2(pos, tex.y))+tex2D(s0, float2(pos+c1, tex.y))+tex2D(s0, float2(pos+2*c1, tex.y))+tex2D(s0, float2(pos+3*c1, tex.y)))*.25;// blur and output
}
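An illustrative trace of the position arithmetic in the shader above (my own Python rendering of it, with c0 = input width and c1 = 1/width): the texel centers of all four pixels in a block resolve to the same left position, so each block samples the same four source pixels.

```python
import math

# pos = (trunc(tex.x*c0*.25)+.125)*c1*4. with c0 = width, c1 = 1/width
def block_left(tex_x, width):
    return (math.trunc(tex_x * width * 0.25) + 0.125) * (1.0 / width) * 4.0

width = 8.0
print(block_left(0.5 / width, width))  # 0.0625: pixel 0 maps to the center of pixel 0
print(block_left(3.5 / width, width))  # 0.0625: pixel 3 maps to the same block start
print(block_left(4.5 / width, width))  # 0.5625: pixel 4 starts the next block
```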



// vertical 4-pixel averaging, dithering, posterizing and old CRT scan lines
// this shader only works properly on inputs that have a vertical resolution that is evenly divisible by 4

#define gamma 1// higher is brighter, fractions, either decimal or not are allowed
#define scanlinebasedarken .5// the default of .5 will darken outer pixels a bit on each set of vertical pixels to appear like old CRT scan lines, higher values will narrow the scan line beam
#define posterizedegamma 1// 0 or 1, apply dirty de-gamma for posterization, useful to preserve realistic gradients in low gamma modes
#define quantizationbits 4// posterization level, note that 'quantize' can actually take any amount, not just those based on powers of two

sampler s0 : register(s0);
float2 c0 : register(c0);
float2 c1 : register(c1);
static const float quantize = pow(2, quantizationbits)-1;
static const float quantizer = 1./quantize;
static const float qm = .0078125*quantizer;
static const float smalldithermap[8][8] = {
{-63*qm, qm, -47*qm, 17*qm, -59*qm, 5*qm, -43*qm, 21*qm},
{33*qm, -31*qm, 49*qm, -15*qm, 37*qm, -27*qm, 53*qm, -11*qm},
{-39*qm, 25*qm, -55*qm, 9*qm, -35*qm, 29*qm, -51*qm, 13*qm},
{57*qm, -7*qm, 41*qm, -23*qm, 61*qm, -3*qm, 45*qm, -19*qm},
{-57*qm, 7*qm, -41*qm, 23*qm, -61*qm, 3*qm, -45*qm, 19*qm},
{39*qm, -25*qm, 55*qm, -9*qm, 35*qm, -29*qm, 51*qm, -13*qm},
{-33*qm, 31*qm, -49*qm, 15*qm, -37*qm, 27*qm, -53*qm, 11*qm},
{63*qm, -qm, 47*qm, -17*qm, 59*qm, -5*qm, 43*qm, -21*qm}};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 basepos = tex.xyy*c0.xyy*float3(.03125, .03125, .25);
float3 basefrac = frac(basepos);
float2 lookups = basefrac.xy*8.;
float dithers = smalldithermap[lookups.x][lookups.y];

float br = 1.-pow(abs(basefrac.z*2.*scanlinebasedarken-scanlinebasedarken), gamma);// generate scan lines

float pos = (basepos.z-basefrac.z+.125)*c1.y*4.;// calculate the top position of the current set of pixels
float4 s1 = (tex2D(s0, float2(tex.x, pos))+tex2D(s0, float2(tex.x, pos+c1.y))+tex2D(s0, float2(tex.x, pos+2*c1.y))+tex2D(s0, float2(tex.x, pos+3*c1.y)))*.25;// blur input
#if posterizedegamma
float4 signbits = sign(s1);
s1 = signbits*pow(round(sqrt(abs(s1))*quantize+dithers)*quantizer, 2);// dither and posterize
return s1*br;// modulate brightness and output
#else
s1 = round(s1*quantize+dithers);// dither and posterize
return s1*(br*quantizer);// shrink interval back to normal after posterization, modulate brightness and output
#endif
}
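The dither-and-posterize step at the end of the vertical pass can be illustrated the same way; this NumPy sketch (my own, using a tiny 2x2 stand-in for the shader's 8x8 ordered-dither map) adds a tiled dither offset before rounding to 2^bits - 1 levels:

```python
import numpy as np

def posterize_dither(img, bits=4, dither_map=None):
    # Quantize img (values in [0, 1]) to 2**bits - 1 levels, adding a tiled
    # ordered-dither offset before rounding, like the shader's round() step.
    quantize = 2.0**bits - 1.0
    if dither_map is None:
        # tiny 2x2 stand-in for the shader's 8x8 map, scaled to the step size
        dither_map = np.array([[-0.25, 0.25], [0.25, -0.25]]) / quantize
    h, w = img.shape[:2]
    th, tw = dither_map.shape
    tiled = np.tile(dither_map, (h // th + 1, w // tw + 1))[:h, :w]
    return np.round(img * quantize + tiled) / quantize
```

With bits=1 a flat 50% grey dithers to a checkerboard: the map trades spatial noise for the banding that plain rounding would produce.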



// vertical 4-pixel averaging, dithering, posterizing and old CRT scan lines for XYZ rendering on 16-bit integer surfaces
// this shader only works properly on inputs that have a vertical resolution that is evenly divisible by 4

#define gamma 1// higher is brighter, fractions, either decimal or not are allowed
#define scanlinebasedarken .5// the default of .5 will darken outer pixels a bit on each set of vertical pixels to appear like old CRT scan lines, higher values will narrow the scan line beam
#define posterizedegamma 1// 0 or 1, apply dirty de-gamma for posterization, useful to preserve realistic gradients in low gamma modes
#define quantizationbits 4// posterization level, note that 'quantize' can actually take any amount, not just those based on powers of two

sampler s0 : register(s0);
float2 c0 : register(c0);
float2 c1 : register(c1);
static const float quantize = pow(2, quantizationbits)-1;
static const float quantizer = 1./quantize;
static const float qm = .0078125*quantizer;
static const float smalldithermap[8][8] = {
{-63*qm, qm, -47*qm, 17*qm, -59*qm, 5*qm, -43*qm, 21*qm},
{33*qm, -31*qm, 49*qm, -15*qm, 37*qm, -27*qm, 53*qm, -11*qm},
{-39*qm, 25*qm, -55*qm, 9*qm, -35*qm, 29*qm, -51*qm, 13*qm},
{57*qm, -7*qm, 41*qm, -23*qm, 61*qm, -3*qm, 45*qm, -19*qm},
{-57*qm, 7*qm, -41*qm, 23*qm, -61*qm, 3*qm, -45*qm, 19*qm},
{39*qm, -25*qm, 55*qm, -9*qm, 35*qm, -29*qm, 51*qm, -13*qm},
{-33*qm, 31*qm, -49*qm, 15*qm, -37*qm, 27*qm, -53*qm, 11*qm},
{63*qm, -qm, 47*qm, -17*qm, 59*qm, -5*qm, 43*qm, -21*qm}};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 basepos = tex.xyy*c0.xyy*float3(.03125, .03125, .25);
float3 basefrac = frac(basepos);
float2 lookups = basefrac.xy*8.;
float dithers = smalldithermap[lookups.x][lookups.y];

float br = 1.-pow(abs(basefrac.z*2.*scanlinebasedarken-scanlinebasedarken), gamma);// generate scan lines

float pos = (basepos.z-basefrac.z+.125)*c1.y*4.;// calculate the top position of the current set of pixels
float4 s1 = (tex2D(s0, float2(tex.x, pos))+tex2D(s0, float2(tex.x, pos+c1.y))+tex2D(s0, float2(tex.x, pos+2*c1.y))+tex2D(s0, float2(tex.x, pos+3*c1.y)))*.25;// blur input
#if posterizedegamma
s1 = s1*65535/32767.-16384/32767.;
float4 signbits = sign(s1);
s1 = signbits*pow(round(sqrt(abs(s1))*quantize+dithers)*quantizer, 2);// dither and posterize
return s1*(br*32767/65535.)+16384/65535.;// modulate brightness and output
#else
s1 = round((s1*65535/32767.-16384/32767.)*quantize+dithers);// dither and posterize
return s1*(br*quantizer*32767/65535.)+16384/65535.;// shrink interval back to normal after posterization, modulate brightness and output
#endif
}
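The XYZ variant differs only in how it unpacks and repacks values on the 16-bit integer surface; assuming the storage mapping implied by the constants above (stored = v*32767/65535 + 16384/65535), the round trip is (my own sketch):

```python
def unpack_xyz16(s):
    # invert the storage mapping, as in the shader's s1*65535/32767.-16384/32767.
    return s * 65535.0 / 32767.0 - 16384.0 / 32767.0

def pack_xyz16(v):
    # repack for output, as in the shader's *(...*32767/65535.)+16384/65535. step
    return v * 32767.0 / 65535.0 + 16384.0 / 65535.0
```

unpack_xyz16(pack_xyz16(v)) returns v, so the dithering and posterization in between operate on the normal working interval.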



// horizontal 5-pixel averaging
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 5

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float pos = (trunc(tex.x*c0*.2)+.1)*c1*5.;// calculate the left position of the current set of pixels
return (tex2D(s0, float2(pos, tex.y))+tex2D(s0, float2(pos+c1, tex.y))+tex2D(s0, float2(pos+2*c1, tex.y))+tex2D(s0, float2(pos+3*c1, tex.y))+tex2D(s0, float2(pos+4*c1, tex.y)))*.2;// blur and output
}



// vertical 5-pixel averaging, dithering, posterizing and old CRT scan lines
// this shader only works properly on inputs that have a vertical resolution that is evenly divisible by 5

#define gamma 1// higher is brighter, fractions, either decimal or not are allowed
#define scanlinebasedarken .5// the default of .5 will darken outer pixels a bit on each set of vertical pixels to appear like old CRT scan lines, higher values will narrow the scan line beam
#define posterizedegamma 1// 0 or 1, apply dirty de-gamma for posterization, useful to preserve realistic gradients in low gamma modes
#define quantizationbits 4// posterization level, note that 'quantize' can actually take any amount, not just those based on powers of two

sampler s0 : register(s0);
float2 c0 : register(c0);
float2 c1 : register(c1);
static const float quantize = pow(2, quantizationbits)-1;
static const float quantizer = 1./quantize;
static const float qm = .0078125*quantizer;
static const float smalldithermap[8][8] = {
{-63*qm, qm, -47*qm, 17*qm, -59*qm, 5*qm, -43*qm, 21*qm},
{33*qm, -31*qm, 49*qm, -15*qm, 37*qm, -27*qm, 53*qm, -11*qm},
{-39*qm, 25*qm, -55*qm, 9*qm, -35*qm, 29*qm, -51*qm, 13*qm},
{57*qm, -7*qm, 41*qm, -23*qm, 61*qm, -3*qm, 45*qm, -19*qm},
{-57*qm, 7*qm, -41*qm, 23*qm, -61*qm, 3*qm, -45*qm, 19*qm},
{39*qm, -25*qm, 55*qm, -9*qm, 35*qm, -29*qm, 51*qm, -13*qm},
{-33*qm, 31*qm, -49*qm, 15*qm, -37*qm, 27*qm, -53*qm, 11*qm},
{63*qm, -qm, 47*qm, -17*qm, 59*qm, -5*qm, 43*qm, -21*qm}};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 basepos = tex.xyy*c0.xyy*float3(.025, .025, .2);
float3 basefrac = frac(basepos);
float2 lookups = basefrac.xy*8.;
float dithers = smalldithermap[lookups.x][lookups.y];

float br = 1.-pow(abs(basefrac.z*2.*scanlinebasedarken-scanlinebasedarken), gamma);// generate scan lines

float pos = (basepos.z-basefrac.z+.1)*c1.y*5.;// calculate the top position of the current set of pixels
float4 s1 = (tex2D(s0, float2(tex.x, pos))+tex2D(s0, float2(tex.x, pos+c1.y))+tex2D(s0, float2(tex.x, pos+2*c1.y))+tex2D(s0, float2(tex.x, pos+3*c1.y))+tex2D(s0, float2(tex.x, pos+4*c1.y)))*.2;// blur input
#if posterizedegamma
float4 signbits = sign(s1);
s1 = signbits*pow(round(sqrt(abs(s1))*quantize+dithers)*quantizer, 2);// dither and posterize
return s1*br;// modulate brightness and output
#else
s1 = round(s1*quantize+dithers);// dither and posterize
return s1*(br*quantizer);// shrink interval back to normal after posterization, modulate brightness and output
#endif
}



// vertical 5-pixel averaging, dithering, posterizing and old CRT scan lines for XYZ rendering on 16-bit integer surfaces
// this shader only works properly on inputs that have a vertical resolution that is evenly divisible by 5

#define gamma 1// higher is brighter, fractions, either decimal or not are allowed
#define scanlinebasedarken .5// the default of .5 will darken outer pixels a bit on each set of vertical pixels to appear like old CRT scan lines, higher values will narrow the scan line beam
#define posterizedegamma 1// 0 or 1, apply dirty de-gamma for posterization, useful to preserve realistic gradients in low gamma modes
#define quantizationbits 4// posterization level, note that 'quantize' can actually take any amount, not just those based on powers of two

sampler s0 : register(s0);
float2 c0 : register(c0);
float2 c1 : register(c1);
static const float quantize = pow(2, quantizationbits)-1;
static const float quantizer = 1./quantize;
static const float qm = .0078125*quantizer;
static const float smalldithermap[8][8] = {
{-63*qm, qm, -47*qm, 17*qm, -59*qm, 5*qm, -43*qm, 21*qm},
{33*qm, -31*qm, 49*qm, -15*qm, 37*qm, -27*qm, 53*qm, -11*qm},
{-39*qm, 25*qm, -55*qm, 9*qm, -35*qm, 29*qm, -51*qm, 13*qm},
{57*qm, -7*qm, 41*qm, -23*qm, 61*qm, -3*qm, 45*qm, -19*qm},
{-57*qm, 7*qm, -41*qm, 23*qm, -61*qm, 3*qm, -45*qm, 19*qm},
{39*qm, -25*qm, 55*qm, -9*qm, 35*qm, -29*qm, 51*qm, -13*qm},
{-33*qm, 31*qm, -49*qm, 15*qm, -37*qm, 27*qm, -53*qm, 11*qm},
{63*qm, -qm, 47*qm, -17*qm, 59*qm, -5*qm, 43*qm, -21*qm}};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 basepos = tex.xyy*c0.xyy*float3(.025, .025, .2);
float3 basefrac = frac(basepos);
float2 lookups = basefrac.xy*8.;
float dithers = smalldithermap[lookups.x][lookups.y];

float br = 1.-pow(abs(basefrac.z*2.*scanlinebasedarken-scanlinebasedarken), gamma);// generate scan lines

float pos = (basepos.z-basefrac.z+.1)*c1.y*5.;// calculate the top position of the current set of pixels
float4 s1 = (tex2D(s0, float2(tex.x, pos))+tex2D(s0, float2(tex.x, pos+c1.y))+tex2D(s0, float2(tex.x, pos+2*c1.y))+tex2D(s0, float2(tex.x, pos+3*c1.y))+tex2D(s0, float2(tex.x, pos+4*c1.y)))*.2;// blur input
#if posterizedegamma
s1 = s1*65535/32767.-16384/32767.;
float4 signbits = sign(s1);
s1 = signbits*pow(round(sqrt(abs(s1))*quantize+dithers)*quantizer, 2);// dither and posterize
return s1*(br*32767/65535.)+16384/65535.;// modulate brightness and output
#else
s1 = round((s1*65535/32767.-16384/32767.)*quantize+dithers);// dither and posterize
return s1*(br*quantizer*32767/65535.)+16384/65535.;// shrink interval back to normal after posterization, modulate brightness and output
#endif
}
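The scan-line term shared by all the vertical shaders, br = 1.-pow(abs(basefrac.z*2.*scanlinebasedarken-scanlinebasedarken), gamma), is a simple brightness ramp over each block; a Python sketch of it (my own illustration):

```python
def scanline_brightness(frac, darken=0.5, gamma=1.0):
    # frac in [0, 1) is the vertical position within one pixel block;
    # brightness peaks at the block centre (frac = 0.5) and falls off to
    # 1 - darken**gamma at the edges, imitating a CRT beam profile.
    return 1.0 - abs(frac * 2.0 * darken - darken) ** gamma
```

Raising scanlinebasedarken narrows the bright band, while a higher gamma lifts the whole line (matching the "higher is brighter" note on the define).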

jerrymh
28th July 2013, 22:35
Deblocking is mostly decoder territory. Many video codecs don't use the typical macroblocks at all. For those that do, you need the general blocking info for the luma, chroma and interlacing to deal with it. For h.264 (and some newer codecs) organized (de)blocking is mandatory for both encoder and decoder. The custom shader stages of the video renderer are a bit late in the rendering chain to properly work on blocking and such. I'm not sure if I can write a normal shader that can help with deblocking.

jerrymh, that picture mostly shows hand-drawn pixel art. No decent video will convert nicely to high contrast, low quantization images like that. I can approximate the effect by combining a few techniques, but note that posterization is a really messy effect (even in common 8-bit video and worst of all, it's everywhere).


Maybe if you only try to draw the mask grille, not the other effects (only draw a mask in front of the video).

Anyway, I found the source code for the shader mask, but I'm too :( about code.

https://github.com/libretro/common-shaders/blob/master/crt/crt-geom-flat.cg

Also found this variants of the shader
http://emulation-general.wikia.com/wiki/CRT_Geom

and the image should look like this

http://images3.wikia.nocookie.net/__cb20130723004104/emulation-general/images/thumb/4/4c/Retroarch_2013-07-22_17-21-17-60.png/1000px-Retroarch_2013-07-22_17-21-17-60.png

JanWillem32
29th July 2013, 00:27
The host renderer for the shaders your link points to is organized very differently from the one used for the shaders here.
The first shaders I posted actually only blur and apply the scan line effect. The results are not stellar. The second version also properly degrades to low resolution and low quantization. It won't come close to pixel art like in that picture, but it will do a reasonable job on most typical video sources.
The default quantization in the shader is rather high compared to that picture. If it's a 256-color mode, try quantizationbits at 8/3. with posterizedegamma 0 (and probably a different gamma for the scan-line effect) when the renderer is in 8-bit mode, or 17/6. with posterizedegamma 1 in quality mode. (Quality mode wastes a few percent at the top of the usual [0, 1] interval for two of the three channels.)
Note that I edited my previous post to fix a few bugs in the code with dithering and negative inputs.
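The fractional quantizationbits settings suggested above can be sanity-checked numerically: with quantize = 2^quantizationbits - 1 there are 2^quantizationbits representable levels per channel, so three channels at 8/3 bits give exactly 256 colours (a quick check, my own):

```python
def palette_size(quantizationbits, channels=3):
    # quantize = 2**quantizationbits - 1 codes above zero means
    # 2**quantizationbits representable levels per channel
    levels = 2.0**quantizationbits
    return levels**channels

# 8/3 bits per channel -> (2**(8/3))**3 = 2**8 = 256 colours
# 17/6 bits per channel -> 2**8.5, about 362 colours, for quality mode
```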

turbojet
29th July 2013, 07:20
XRyche: can you post a short clip?

JanWillem32
29th July 2013, 08:36
Here are some extra shaders for larger pixels. I also edited the previous post because the forum has a maximum text length limit.
// (C) 2013 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// horizontal 8-pixel averaging
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 8

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float pos = (trunc(tex.x*c0*.125)+.0625)*c1*8.;// calculate the left position of the current set of pixels
return (tex2D(s0, float2(pos, tex.y))+tex2D(s0, float2(pos+c1, tex.y))+tex2D(s0, float2(pos+2*c1, tex.y))+tex2D(s0, float2(pos+3*c1, tex.y))+tex2D(s0, float2(pos+4*c1, tex.y))+tex2D(s0, float2(pos+5*c1, tex.y))+tex2D(s0, float2(pos+6*c1, tex.y))+tex2D(s0, float2(pos+7*c1, tex.y)))*.125;// blur and output
}



// vertical 8-pixel averaging, dithering, posterizing and old CRT scan lines
// this shader only works properly on inputs that have a vertical resolution that is evenly divisible by 8

#define gamma 1// higher is brighter, fractions, either decimal or not are allowed
#define scanlinebasedarken .5// the default of .5 will darken outer pixels a bit on each set of vertical pixels to appear like old CRT scan lines, higher values will narrow the scan line beam
#define posterizedegamma 1// 0 or 1, apply dirty de-gamma for posterization, useful to preserve realistic gradients in low gamma modes
#define quantizationbits 4// posterization level, note that 'quantize' can actually take any amount, not just those based on powers of two

sampler s0 : register(s0);
float2 c0 : register(c0);
float2 c1 : register(c1);
static const float quantize = pow(2, quantizationbits)-1;
static const float quantizer = 1./quantize;
static const float qm = .0078125*quantizer;
static const float smalldithermap[8][8] = {
{-63*qm, qm, -47*qm, 17*qm, -59*qm, 5*qm, -43*qm, 21*qm},
{33*qm, -31*qm, 49*qm, -15*qm, 37*qm, -27*qm, 53*qm, -11*qm},
{-39*qm, 25*qm, -55*qm, 9*qm, -35*qm, 29*qm, -51*qm, 13*qm},
{57*qm, -7*qm, 41*qm, -23*qm, 61*qm, -3*qm, 45*qm, -19*qm},
{-57*qm, 7*qm, -41*qm, 23*qm, -61*qm, 3*qm, -45*qm, 19*qm},
{39*qm, -25*qm, 55*qm, -9*qm, 35*qm, -29*qm, 51*qm, -13*qm},
{-33*qm, 31*qm, -49*qm, 15*qm, -37*qm, 27*qm, -53*qm, 11*qm},
{63*qm, -qm, 47*qm, -17*qm, 59*qm, -5*qm, 43*qm, -21*qm}};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 basepos = tex.xyy*c0.xyy*float3(.015625, .015625, .125);
float3 basefrac = frac(basepos);
float2 lookups = basefrac.xy*8.;
float dithers = smalldithermap[lookups.x][lookups.y];

float br = 1.-pow(abs(basefrac.z*2.*scanlinebasedarken-scanlinebasedarken), gamma);// generate scan lines

float pos = (basepos.z-basefrac.z+.0625)*c1.y*8.;// calculate the top position of the current set of pixels
float4 s1 = (tex2D(s0, float2(tex.x, pos))+tex2D(s0, float2(tex.x, pos+c1.y))+tex2D(s0, float2(tex.x, pos+2*c1.y))+tex2D(s0, float2(tex.x, pos+3*c1.y))+tex2D(s0, float2(tex.x, pos+4*c1.y))+tex2D(s0, float2(tex.x, pos+5*c1.y))+tex2D(s0, float2(tex.x, pos+6*c1.y))+tex2D(s0, float2(tex.x, pos+7*c1.y)))*.125;// blur input
#if posterizedegamma
float4 signbits = sign(s1);
s1 = signbits*pow(round(sqrt(abs(s1))*quantize+dithers)*quantizer, 2);// dither and posterize
return s1*br;// modulate brightness and output
#else
s1 = round(s1*quantize+dithers);// dither and posterize
return s1*(br*quantizer);// shrink interval back to normal after posterization, modulate brightness and output
#endif
}



// vertical 8-pixel averaging, dithering, posterizing and old CRT scan lines for XYZ rendering on 16-bit integer surfaces
// this shader only works properly on inputs that have a vertical resolution that is evenly divisible by 8

#define gamma 1// higher is brighter, fractions, either decimal or not are allowed
#define scanlinebasedarken .5// the default of .5 will darken outer pixels a bit on each set of vertical pixels to appear like old CRT scan lines, higher values will narrow the scan line beam
#define posterizedegamma 1// 0 or 1, apply dirty de-gamma for posterization, useful to preserve realistic gradients in low gamma modes
#define quantizationbits 4// posterization level, note that 'quantize' can actually take any amount, not just those based on powers of two

sampler s0 : register(s0);
float2 c0 : register(c0);
float2 c1 : register(c1);
static const float quantize = pow(2, quantizationbits)-1;
static const float quantizer = 1./quantize;
static const float qm = .0078125*quantizer;
static const float smalldithermap[8][8] = {
{-63*qm, qm, -47*qm, 17*qm, -59*qm, 5*qm, -43*qm, 21*qm},
{33*qm, -31*qm, 49*qm, -15*qm, 37*qm, -27*qm, 53*qm, -11*qm},
{-39*qm, 25*qm, -55*qm, 9*qm, -35*qm, 29*qm, -51*qm, 13*qm},
{57*qm, -7*qm, 41*qm, -23*qm, 61*qm, -3*qm, 45*qm, -19*qm},
{-57*qm, 7*qm, -41*qm, 23*qm, -61*qm, 3*qm, -45*qm, 19*qm},
{39*qm, -25*qm, 55*qm, -9*qm, 35*qm, -29*qm, 51*qm, -13*qm},
{-33*qm, 31*qm, -49*qm, 15*qm, -37*qm, 27*qm, -53*qm, 11*qm},
{63*qm, -qm, 47*qm, -17*qm, 59*qm, -5*qm, 43*qm, -21*qm}};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 basepos = tex.xyy*c0.xyy*float3(.015625, .015625, .125);
float3 basefrac = frac(basepos);
float2 lookups = basefrac.xy*8.;
float dithers = smalldithermap[lookups.x][lookups.y];

float br = 1.-pow(abs(basefrac.z*2.*scanlinebasedarken-scanlinebasedarken), gamma);// generate scan lines

float pos = (basepos.z-basefrac.z+.0625)*c1.y*8.;// calculate the top position of the current set of pixels
float4 s1 = (tex2D(s0, float2(tex.x, pos))+tex2D(s0, float2(tex.x, pos+c1.y))+tex2D(s0, float2(tex.x, pos+2*c1.y))+tex2D(s0, float2(tex.x, pos+3*c1.y))+tex2D(s0, float2(tex.x, pos+4*c1.y))+tex2D(s0, float2(tex.x, pos+5*c1.y))+tex2D(s0, float2(tex.x, pos+6*c1.y))+tex2D(s0, float2(tex.x, pos+7*c1.y)))*.125;// blur input
#if posterizedegamma
s1 = s1*65535/32767.-16384/32767.;
float4 signbits = sign(s1);
s1 = signbits*pow(round(sqrt(abs(s1))*quantize+dithers)*quantizer, 2);// dither and posterize
return s1*(br*32767/65535.)+16384/65535.;// modulate brightness and output
#else
s1 = round((s1*65535/32767.-16384/32767.)*quantize+dithers);// dither and posterize
return s1*(br*quantizer*32767/65535.)+16384/65535.;// shrink interval back to normal after posterization, modulate brightness and output
#endif
}



// horizontal 10-pixel averaging
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 10

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float pos = (trunc(tex.x*c0*.1)+.05)*c1*10.;// calculate the left position of the current set of pixels
return (tex2D(s0, float2(pos, tex.y))+tex2D(s0, float2(pos+c1, tex.y))+tex2D(s0, float2(pos+2*c1, tex.y))+tex2D(s0, float2(pos+3*c1, tex.y))+tex2D(s0, float2(pos+4*c1, tex.y))+tex2D(s0, float2(pos+5*c1, tex.y))+tex2D(s0, float2(pos+6*c1, tex.y))+tex2D(s0, float2(pos+7*c1, tex.y))+tex2D(s0, float2(pos+8*c1, tex.y))+tex2D(s0, float2(pos+9*c1, tex.y)))*.1;// blur and output
}



// vertical 10-pixel averaging, dithering, posterizing and old CRT scan lines
// this shader only works properly on inputs that have a vertical resolution that is evenly divisible by 10

#define gamma 1// higher is brighter, fractions, either decimal or not are allowed
#define scanlinebasedarken .5// the default of .5 will darken outer pixels a bit on each set of vertical pixels to appear like old CRT scan lines, higher values will narrow the scan line beam
#define posterizedegamma 1// 0 or 1, apply dirty de-gamma for posterization, useful to preserve realistic gradients in low gamma modes
#define quantizationbits 4// posterization level, note that 'quantize' can actually take any amount, not just those based on powers of two

sampler s0 : register(s0);
float2 c0 : register(c0);
float2 c1 : register(c1);
static const float quantize = pow(2, quantizationbits)-1;
static const float quantizer = 1./quantize;
static const float qm = .0078125*quantizer;
static const float smalldithermap[8][8] = {
{-63*qm, qm, -47*qm, 17*qm, -59*qm, 5*qm, -43*qm, 21*qm},
{33*qm, -31*qm, 49*qm, -15*qm, 37*qm, -27*qm, 53*qm, -11*qm},
{-39*qm, 25*qm, -55*qm, 9*qm, -35*qm, 29*qm, -51*qm, 13*qm},
{57*qm, -7*qm, 41*qm, -23*qm, 61*qm, -3*qm, 45*qm, -19*qm},
{-57*qm, 7*qm, -41*qm, 23*qm, -61*qm, 3*qm, -45*qm, 19*qm},
{39*qm, -25*qm, 55*qm, -9*qm, 35*qm, -29*qm, 51*qm, -13*qm},
{-33*qm, 31*qm, -49*qm, 15*qm, -37*qm, 27*qm, -53*qm, 11*qm},
{63*qm, -qm, 47*qm, -17*qm, 59*qm, -5*qm, 43*qm, -21*qm}};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 basepos = tex.xyy*c0.xyy*float3(.0125, .0125, .1);
float3 basefrac = frac(basepos);
float2 lookups = basefrac.xy*8.;
float dithers = smalldithermap[lookups.x][lookups.y];

float br = 1.-pow(abs(basefrac.z*2.*scanlinebasedarken-scanlinebasedarken), gamma);// generate scan lines

float pos = (basepos.z-basefrac.z+.05)*c1.y*10.;// calculate the top position of the current set of pixels
float4 s1 = (tex2D(s0, float2(tex.x, pos))+tex2D(s0, float2(tex.x, pos+c1.y))+tex2D(s0, float2(tex.x, pos+2*c1.y))+tex2D(s0, float2(tex.x, pos+3*c1.y))+tex2D(s0, float2(tex.x, pos+4*c1.y))+tex2D(s0, float2(tex.x, pos+5*c1.y))+tex2D(s0, float2(tex.x, pos+6*c1.y))+tex2D(s0, float2(tex.x, pos+7*c1.y))+tex2D(s0, float2(tex.x, pos+8*c1.y))+tex2D(s0, float2(tex.x, pos+9*c1.y)))*.1;// blur input
#if posterizedegamma
float4 signbits = sign(s1);
s1 = signbits*pow(round(sqrt(abs(s1))*quantize+dithers)*quantizer, 2);// dither and posterize
return s1*br;// modulate brightness and output
#else
s1 = round(s1*quantize+dithers);// dither and posterize
return s1*(br*quantizer);// shrink interval back to normal after posterization, modulate brightness and output
#endif
}



// vertical 10-pixel averaging, dithering, posterizing and old CRT scan lines for XYZ rendering on 16-bit integer surfaces
// this shader only works properly on inputs that have a vertical resolution that is evenly divisible by 10

#define gamma 1// higher is brighter, fractions, either decimal or not are allowed
#define scanlinebasedarken .5// the default of .5 will darken outer pixels a bit on each set of vertical pixels to appear like old CRT scan lines, higher values will narrow the scan line beam
#define posterizedegamma 1// 0 or 1, apply dirty de-gamma for posterization, useful to preserve realistic gradients in low gamma modes
#define quantizationbits 4// posterization level, note that 'quantize' can actually take any amount, not just those based on powers of two

sampler s0 : register(s0);
float2 c0 : register(c0);
float2 c1 : register(c1);
static const float quantize = pow(2, quantizationbits)-1;
static const float quantizer = 1./quantize;
static const float qm = .0078125*quantizer;
static const float smalldithermap[8][8] = {
{-63*qm, qm, -47*qm, 17*qm, -59*qm, 5*qm, -43*qm, 21*qm},
{33*qm, -31*qm, 49*qm, -15*qm, 37*qm, -27*qm, 53*qm, -11*qm},
{-39*qm, 25*qm, -55*qm, 9*qm, -35*qm, 29*qm, -51*qm, 13*qm},
{57*qm, -7*qm, 41*qm, -23*qm, 61*qm, -3*qm, 45*qm, -19*qm},
{-57*qm, 7*qm, -41*qm, 23*qm, -61*qm, 3*qm, -45*qm, 19*qm},
{39*qm, -25*qm, 55*qm, -9*qm, 35*qm, -29*qm, 51*qm, -13*qm},
{-33*qm, 31*qm, -49*qm, 15*qm, -37*qm, 27*qm, -53*qm, 11*qm},
{63*qm, -qm, 47*qm, -17*qm, 59*qm, -5*qm, 43*qm, -21*qm}};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 basepos = tex.xyy*c0.xyy*float3(.0125, .0125, .1);
float3 basefrac = frac(basepos);
float2 lookups = basefrac.xy*8.;
float dithers = smalldithermap[lookups.x][lookups.y];

float br = 1.-pow(abs(basefrac.z*2.*scanlinebasedarken-scanlinebasedarken), gamma);// generate scan lines

float pos = (basepos.z-basefrac.z+.05)*c1.y*10.;// calculate the top position of the current set of pixels
float4 s1 = (tex2D(s0, float2(tex.x, pos))+tex2D(s0, float2(tex.x, pos+c1.y))+tex2D(s0, float2(tex.x, pos+2*c1.y))+tex2D(s0, float2(tex.x, pos+3*c1.y))+tex2D(s0, float2(tex.x, pos+4*c1.y))+tex2D(s0, float2(tex.x, pos+5*c1.y))+tex2D(s0, float2(tex.x, pos+6*c1.y))+tex2D(s0, float2(tex.x, pos+7*c1.y))+tex2D(s0, float2(tex.x, pos+8*c1.y))+tex2D(s0, float2(tex.x, pos+9*c1.y)))*.1;// blur input
#if posterizedegamma
s1 = s1*65535/32767.-16384/32767.;
float4 signbits = sign(s1);
s1 = signbits*pow(round(sqrt(abs(s1))*quantize+dithers)*quantizer, 2);// dither and posterize
return s1*(br*32767/65535.)+16384/65535.;// modulate brightness and output
#else
s1 = round((s1*65535/32767.-16384/32767.)*quantize+dithers);// dither and posterize
return s1*(br*quantizer*32767/65535.)+16384/65535.;// shrink interval back to normal after posterization, modulate brightness and output
#endif
}
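For reference, the 8×8 `smalldithermap` constant used throughout these shaders appears to be a standard recursive Bayer ordered-dither matrix, rescaled so every offset is an odd multiple of `qm` (at most half a quantization step, centred on zero). The sketch below rebuilds the table from that assumption; it is in Python rather than HLSL only because a pixel shader cannot be run standalone, and the `bayer` helper name is mine, not part of the shader pack.

```python
# Sketch (assumption: 'smalldithermap' is a rescaled 8x8 Bayer matrix).
def bayer(n):
    """Recursively build the n x n Bayer index matrix (n a power of two)."""
    if n == 1:
        return [[0]]
    half = bayer(n // 2)
    m = [[0] * n for _ in range(n)]
    for i in range(n // 2):
        for j in range(n // 2):
            b = 4 * half[i][j]
            m[i][j] = b                        # top-left quadrant:     4B + 0
            m[i][j + n // 2] = b + 2           # top-right quadrant:    4B + 2
            m[i + n // 2][j] = b + 3           # bottom-left quadrant:  4B + 3
            m[i + n // 2][j + n // 2] = b + 1  # bottom-right quadrant: 4B + 1
    return m

quantizationbits = 4
quantize = 2 ** quantizationbits - 1  # number of quantization steps above zero
quantizer = 1.0 / quantize
qm = 0.0078125 * quantizer            # 1/128 of a quantization step

# Each Bayer index b in 0..63 maps to the odd offset (2*b-63)*qm, so the
# table averages to zero and never exceeds half a quantization step.
smalldithermap = [[(2 * b - 63) * qm for b in row] for row in bayer(8)]
```

The antisymmetry (row i is the elementwise negation of row 7-i) is what keeps the dither bias-free after the `round()` in the posterization step.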

fagoatse
29th July 2013, 09:34
The shaders jerrymh posted are meant to be used with emulators (RetroArch/libretro in this case), and as far as I know they are tailored for a specific resolution. RetroArch supports up to 8 passes, and you can build https://github.com/libretro/libretro-ffmpeg if you wish to test them in a video playback scenario.

jerrymh
4th August 2013, 07:11
Here are some extra shaders for larger pixels. I also edited the previous post because the forum has a maximum text length limit.

// (C) 2013 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// horizontal 8-pixel averaging
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 8

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float pos = (trunc(tex.x*c0*.125)+.0625)*c1*8.;// calculate the left position of the current set of pixels
return (tex2D(s0, float2(pos, tex.y))+tex2D(s0, float2(pos+c1, tex.y))+tex2D(s0, float2(pos+2*c1, tex.y))+tex2D(s0, float2(pos+3*c1, tex.y))+tex2D(s0, float2(pos+4*c1, tex.y))+tex2D(s0, float2(pos+5*c1, tex.y))+tex2D(s0, float2(pos+6*c1, tex.y))+tex2D(s0, float2(pos+7*c1, tex.y)))*.125;// blur and output
}



// vertical 8-pixel averaging, dithering, posterizing and old CRT scan lines
// this shader only works properly on inputs that have a vertical resolution that is evenly divisible by 8

#define gamma 1// higher is brighter, fractions, either decimal or not are allowed
#define scanlinebasedarken .5// the default of .5 will darken outer pixels a bit on each set of vertical pixels to appear like old CRT scan lines, higher values will narrow the scan line beam
#define posterizedegamma 1// 0 or 1, apply dirty de-gamma for posterization, useful to preserve realistic gradients in low gamma modes
#define quantizationbits 4// posterization level, note that 'quantize' can actually take any amount, not just those based on powers of two

sampler s0 : register(s0);
float2 c0 : register(c0);
float2 c1 : register(c1);
static const float quantize = pow(2, quantizationbits)-1;
static const float quantizer = 1./quantize;
static const float qm = .0078125*quantizer;
static const float smalldithermap[8][8] = {
{-63*qm, qm, -47*qm, 17*qm, -59*qm, 5*qm, -43*qm, 21*qm},
{33*qm, -31*qm, 49*qm, -15*qm, 37*qm, -27*qm, 53*qm, -11*qm},
{-39*qm, 25*qm, -55*qm, 9*qm, -35*qm, 29*qm, -51*qm, 13*qm},
{57*qm, -7*qm, 41*qm, -23*qm, 61*qm, -3*qm, 45*qm, -19*qm},
{-57*qm, 7*qm, -41*qm, 23*qm, -61*qm, 3*qm, -45*qm, 19*qm},
{39*qm, -25*qm, 55*qm, -9*qm, 35*qm, -29*qm, 51*qm, -13*qm},
{-33*qm, 31*qm, -49*qm, 15*qm, -37*qm, 27*qm, -53*qm, 11*qm},
{63*qm, -qm, 47*qm, -17*qm, 59*qm, -5*qm, 43*qm, -21*qm}};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 basepos = tex.xyy*c0.xyy*float3(.015625, .015625, .125);
float3 basefrac = frac(basepos);
float2 lookups = basefrac.xy*8.;
float dithers = smalldithermap[lookups.x][lookups.y];

float br = 1.-pow(abs(basefrac.z*2.*scanlinebasedarken-scanlinebasedarken), gamma);// generate scan lines

float pos = (basepos.z-basefrac.z+.0625)*c1.y*8.;// calculate the top position of the current set of pixels
float4 s1 = (tex2D(s0, float2(tex.x, pos))+tex2D(s0, float2(tex.x, pos+c1.y))+tex2D(s0, float2(tex.x, pos+2*c1.y))+tex2D(s0, float2(tex.x, pos+3*c1.y))+tex2D(s0, float2(tex.x, pos+4*c1.y))+tex2D(s0, float2(tex.x, pos+5*c1.y))+tex2D(s0, float2(tex.x, pos+6*c1.y))+tex2D(s0, float2(tex.x, pos+7*c1.y)))*.125;// blur input
#if posterizedegamma
float4 signbits = sign(s1);
s1 = signbits*pow(round(sqrt(s1)*quantize+dithers)*quantizer, 2);// dither and posterize
return s1*br;// modulate brightness and output
#else
s1 = round(s1*quantize+dithers);// dither and posterize
return s1*(br*quantizer);// shrink interval back to normal after posterization, modulate brightness and output
#endif
}



// vertical 8-pixel averaging, dithering, posterizing and old CRT scan lines for XYZ rendering on 16-bit integer surfaces
// this shader only works properly on inputs that have a vertical resolution that is evenly divisible by 8

#define gamma 1// higher is brighter, fractions, either decimal or not are allowed
#define scanlinebasedarken .5// the default of .5 will darken outer pixels a bit on each set of vertical pixels to appear like old CRT scan lines, higher values will narrow the scan line beam
#define posterizedegamma 1// 0 or 1, apply dirty de-gamma for posterization, useful to preserve realistic gradients in low gamma modes
#define quantizationbits 4// posterization level, note that 'quantize' can actually take any amount, not just those based on powers of two

sampler s0 : register(s0);
float2 c0 : register(c0);
float2 c1 : register(c1);
static const float quantize = pow(2, quantizationbits)-1;
static const float quantizer = 1./quantize;
static const float qm = .0078125*quantizer;
static const float smalldithermap[8][8] = {
{-63*qm, qm, -47*qm, 17*qm, -59*qm, 5*qm, -43*qm, 21*qm},
{33*qm, -31*qm, 49*qm, -15*qm, 37*qm, -27*qm, 53*qm, -11*qm},
{-39*qm, 25*qm, -55*qm, 9*qm, -35*qm, 29*qm, -51*qm, 13*qm},
{57*qm, -7*qm, 41*qm, -23*qm, 61*qm, -3*qm, 45*qm, -19*qm},
{-57*qm, 7*qm, -41*qm, 23*qm, -61*qm, 3*qm, -45*qm, 19*qm},
{39*qm, -25*qm, 55*qm, -9*qm, 35*qm, -29*qm, 51*qm, -13*qm},
{-33*qm, 31*qm, -49*qm, 15*qm, -37*qm, 27*qm, -53*qm, 11*qm},
{63*qm, -qm, 47*qm, -17*qm, 59*qm, -5*qm, 43*qm, -21*qm}};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 basepos = tex.xyy*c0.xyy*float3(.015625, .015625, .125);
float3 basefrac = frac(basepos);
float2 lookups = basefrac.xy*8.;
float dithers = smalldithermap[lookups.x][lookups.y];

float br = 1.-pow(abs(basefrac.z*2.*scanlinebasedarken-scanlinebasedarken), gamma);// generate scan lines

float pos = (basepos.z-basefrac.z+.0625)*c1.y*8.;// calculate the top position of the current set of pixels
float4 s1 = (tex2D(s0, float2(tex.x, pos))+tex2D(s0, float2(tex.x, pos+c1.y))+tex2D(s0, float2(tex.x, pos+2*c1.y))+tex2D(s0, float2(tex.x, pos+3*c1.y))+tex2D(s0, float2(tex.x, pos+4*c1.y))+tex2D(s0, float2(tex.x, pos+5*c1.y))+tex2D(s0, float2(tex.x, pos+6*c1.y))+tex2D(s0, float2(tex.x, pos+7*c1.y)))*.125;// blur input
#if posterizedegamma
s1 = s1*65535/32767.-16384/32767.;
float4 signbits = sign(s1);
s1 = signbits*pow(round(sqrt(s1)*quantize+dithers)*quantizer, 2);// dither and posterize
return s1*(br*32767/65535.)+16384/65535.;// modulate brightness and output
#else
s1 = round((s1*65535/32767.-16384/32767.)*quantize+dithers);// dither and posterize
return s1*(br*quantizer*32767/65535.)+16384/65535.;// shrink interval back to normal after posterization, modulate brightness and output
#endif
}



// horizontal 10-pixel averaging
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 10

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float pos = (trunc(tex.x*c0*.1)+.05)*c1*10.;// calculate the left position of the current set of pixels
return (tex2D(s0, float2(pos, tex.y))+tex2D(s0, float2(pos+c1, tex.y))+tex2D(s0, float2(pos+2*c1, tex.y))+tex2D(s0, float2(pos+3*c1, tex.y))+tex2D(s0, float2(pos+4*c1, tex.y))+tex2D(s0, float2(pos+5*c1, tex.y))+tex2D(s0, float2(pos+6*c1, tex.y))+tex2D(s0, float2(pos+7*c1, tex.y))+tex2D(s0, float2(pos+8*c1, tex.y))+tex2D(s0, float2(pos+9*c1, tex.y)))*.1;// blur and output
}



// vertical 10-pixel averaging, dithering, posterizing and old CRT scan lines
// this shader only works properly on inputs that have a vertical resolution that is evenly divisible by 10

#define gamma 1// higher is brighter, fractions, either decimal or not are allowed
#define scanlinebasedarken .5// the default of .5 will darken outer pixels a bit on each set of vertical pixels to appear like old CRT scan lines, higher values will narrow the scan line beam
#define posterizedegamma 1// 0 or 1, apply dirty de-gamma for posterization, useful to preserve realistic gradients in low gamma modes
#define quantizationbits 4// posterization level, note that 'quantize' can actually take any amount, not just those based on powers of two

sampler s0 : register(s0);
float2 c0 : register(c0);
float2 c1 : register(c1);
static const float quantize = pow(2, quantizationbits)-1;
static const float quantizer = 1./quantize;
static const float qm = .0078125*quantizer;
static const float smalldithermap[8][8] = {
{-63*qm, qm, -47*qm, 17*qm, -59*qm, 5*qm, -43*qm, 21*qm},
{33*qm, -31*qm, 49*qm, -15*qm, 37*qm, -27*qm, 53*qm, -11*qm},
{-39*qm, 25*qm, -55*qm, 9*qm, -35*qm, 29*qm, -51*qm, 13*qm},
{57*qm, -7*qm, 41*qm, -23*qm, 61*qm, -3*qm, 45*qm, -19*qm},
{-57*qm, 7*qm, -41*qm, 23*qm, -61*qm, 3*qm, -45*qm, 19*qm},
{39*qm, -25*qm, 55*qm, -9*qm, 35*qm, -29*qm, 51*qm, -13*qm},
{-33*qm, 31*qm, -49*qm, 15*qm, -37*qm, 27*qm, -53*qm, 11*qm},
{63*qm, -qm, 47*qm, -17*qm, 59*qm, -5*qm, 43*qm, -21*qm}};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 basepos = tex.xyy*c0.xyy*float3(.0125, .0125, .1);
float3 basefrac = frac(basepos);
float2 lookups = basefrac.xy*8.;
float dithers = smalldithermap[lookups.x][lookups.y];

float br = 1.-pow(abs(basefrac.z*2.*scanlinebasedarken-scanlinebasedarken), gamma);// generate scan lines

float pos = (basepos.z-basefrac.z+.05)*c1.y*10.;// calculate the top position of the current set of pixels
float4 s1 = (tex2D(s0, float2(tex.x, pos))+tex2D(s0, float2(tex.x, pos+c1.y))+tex2D(s0, float2(tex.x, pos+2*c1.y))+tex2D(s0, float2(tex.x, pos+3*c1.y))+tex2D(s0, float2(tex.x, pos+4*c1.y))+tex2D(s0, float2(tex.x, pos+5*c1.y))+tex2D(s0, float2(tex.x, pos+6*c1.y))+tex2D(s0, float2(tex.x, pos+7*c1.y))+tex2D(s0, float2(tex.x, pos+8*c1.y))+tex2D(s0, float2(tex.x, pos+9*c1.y)))*.1;// blur input
#if posterizedegamma
float4 signbits = sign(s1);
s1 = signbits*pow(round(sqrt(s1)*quantize+dithers)*quantizer, 2);// dither and posterize
return s1*br;// modulate brightness and output
#else
s1 = round(s1*quantize+dithers);// dither and posterize
return s1*(br*quantizer);// shrink interval back to normal after posterization, modulate brightness and output
#endif
}



// vertical 10-pixel averaging, dithering, posterizing and old CRT scan lines for XYZ rendering on 16-bit integer surfaces
// this shader only works properly on inputs that have a vertical resolution that is evenly divisible by 10

#define gamma 1// higher is brighter, fractions, either decimal or not are allowed
#define scanlinebasedarken .5// the default of .5 will darken outer pixels a bit on each set of vertical pixels to appear like old CRT scan lines, higher values will narrow the scan line beam
#define posterizedegamma 1// 0 or 1, apply dirty de-gamma for posterization, useful to preserve realistic gradients in low gamma modes
#define quantizationbits 4// posterization level, note that 'quantize' can actually take any amount, not just those based on powers of two

sampler s0 : register(s0);
float2 c0 : register(c0);
float2 c1 : register(c1);
static const float quantize = pow(2, quantizationbits)-1;
static const float quantizer = 1./quantize;
static const float qm = .0078125*quantizer;
static const float smalldithermap[8][8] = {
{-63*qm, qm, -47*qm, 17*qm, -59*qm, 5*qm, -43*qm, 21*qm},
{33*qm, -31*qm, 49*qm, -15*qm, 37*qm, -27*qm, 53*qm, -11*qm},
{-39*qm, 25*qm, -55*qm, 9*qm, -35*qm, 29*qm, -51*qm, 13*qm},
{57*qm, -7*qm, 41*qm, -23*qm, 61*qm, -3*qm, 45*qm, -19*qm},
{-57*qm, 7*qm, -41*qm, 23*qm, -61*qm, 3*qm, -45*qm, 19*qm},
{39*qm, -25*qm, 55*qm, -9*qm, 35*qm, -29*qm, 51*qm, -13*qm},
{-33*qm, 31*qm, -49*qm, 15*qm, -37*qm, 27*qm, -53*qm, 11*qm},
{63*qm, -qm, 47*qm, -17*qm, 59*qm, -5*qm, 43*qm, -21*qm}};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 basepos = tex.xyy*c0.xyy*float3(.0125, .0125, .1);
float3 basefrac = frac(basepos);
float2 lookups = basefrac.xy*8.;
float dithers = smalldithermap[lookups.x][lookups.y];

float br = 1.-pow(abs(basefrac.z*2.*scanlinebasedarken-scanlinebasedarken), gamma);// generate scan lines

float pos = (basepos.z-basefrac.z+.05)*c1.y*10.;// calculate the top position of the current set of pixels
float4 s1 = (tex2D(s0, float2(tex.x, pos))+tex2D(s0, float2(tex.x, pos+c1.y))+tex2D(s0, float2(tex.x, pos+2*c1.y))+tex2D(s0, float2(tex.x, pos+3*c1.y))+tex2D(s0, float2(tex.x, pos+4*c1.y))+tex2D(s0, float2(tex.x, pos+5*c1.y))+tex2D(s0, float2(tex.x, pos+6*c1.y))+tex2D(s0, float2(tex.x, pos+7*c1.y))+tex2D(s0, float2(tex.x, pos+8*c1.y))+tex2D(s0, float2(tex.x, pos+9*c1.y)))*.1;// blur input
#if posterizedegamma
s1 = s1*65535/32767.-16384/32767.;
float4 signbits = sign(s1);
s1 = signbits*pow(round(sqrt(s1)*quantize+dithers)*quantizer, 2);// dither and posterize
return s1*(br*32767/65535.)+16384/65535.;// modulate brightness and output
#else
s1 = round((s1*65535/32767.-16384/32767.)*quantize+dithers);// dither and posterize
return s1*(br*quantizer*32767/65535.)+16384/65535.;// shrink interval back to normal after posterization, modulate brightness and output
#endif
}

Thanks, long time without internet. :mad:

jerrymh
4th August 2013, 17:49
I found this libretro ffmpeg video shader; it really looks like an old CRT monitor, but I don't know if there is any build for Windows.

https://photos-2.dropbox.com/t/0/AAB0TgdA87Z8o0z--fUDPR3sdcXuJyIHpmvfdSaYd3L2Lg/12/149537/png/32x32/3/1375639200/0/2/RetroArch-0719-182234.png/0vm6SKy7KZQJmOAzkTFZ8f7lNzBL7vzYcDfVtaXnYqU%2C_iLphXBTqHEvh5k1JTJPn3pjPVrjYF6islARCB6XHlI?size=1280x960

JanWillem32
11th August 2013, 22:30
XRyche, I made a three-stage chain that might work. It's currently rather restricted and I'll probably need to change a few more parameters, but it's a good start. I only made one chain, meant for the combination of HD video with the renderer set to 16-bit integer surfaces and the "disable initial pass shaders" option enabled. I can add more shaders later, if these work well.

// (C) 2013 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// R'G'B' to Y'CbCr for HD video input for XYZ rendering on 16-bit integer surfaces

sampler s0;

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 s1 = tex2Dlod(s0, float4(tex, 0, 0)).rgb;// original pixel
return ((s1.rrr*float3(.2126, -.1063/.9278, .5)+s1.ggg*float3(.7152, -.3576/.9278, -.3576/.7874)+s1.bbb*float3(.0722, .5, -.0361/.7874))*32767/65535.+float3(16384/65535., 32767/65535., 32767/65535.)).rgbb;// HD RGB to Y'CbCr output
}
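As a sanity check of the matrix coefficients in this shader (and of the inverse used in the vertical pass later in the chain), here is a sketch of the underlying BT.709 math. It is in Python only for testability; the `rgb_to_ycbcr`/`ycbcr_to_rgb` names and the Kr/Kg/Kb constants are mine, not part of the shader pack, and the shader's extra 16-bit surface scaling and offsets are deliberately left out.

```python
# Sketch of BT.709 R'G'B' <-> Y'CbCr, matching the shader's coefficients
# (assumption: Cb = (B'-Y')/1.8556 and Cr = (R'-Y')/1.5748, without the
# 16-bit integer surface range compression).
Kr, Kg, Kb = 0.2126, 0.7152, 0.0722  # BT.709 luma coefficients

def rgb_to_ycbcr(r, g, b):
    y = Kr * r + Kg * g + Kb * b
    cb = (b - y) / (2 * (1 - Kb))  # 2*(1-Kb) = 1.8556
    cr = (r - y) / (2 * (1 - Kr))  # 2*(1-Kr) = 1.5748
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    # inverse: R = Y + 1.5748*Cr, B = Y + 1.8556*Cb, and G recovered
    # from the exact luma equation Y = Kr*R + Kg*G + Kb*B
    r = y + 2 * (1 - Kr) * cr
    b = y + 2 * (1 - Kb) * cb
    g = (y - Kr * r - Kb * b) / Kg
    return r, g, b
```

The G-channel constants in the inverse shader, .1674679/.894 and .4185031/.894, agree numerically with Kb·1.8556/Kg and Kr·1.5748/Kg.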



// horizontal pass sharpen complex, deband and denoise for HD video input for XYZ rendering on 16-bit integer surfaces

#define SharpenLimitLuma 2// valid interval [0, 10], luma-specific sharpening limit, 0 is disabled, lower numbers will allow more sharpening on contours
#define SharpenLimitChroma 2// valid interval [0, 10], chroma-specific sharpening limit, 0 is disabled, lower numbers will allow more sharpening on contours
#define LumaDetectionFactor 64// valid interval (65535/32767., 250], luma-specific detection factor, if set to the lowest amount no contours can be detected, higher numbers will shift the detection on color difference intervals (from debanding, to the noise detection limit, to minimum sharpening, to maximum sharpening) toward more sharpening
#define ChromaDetectionFactor 64// valid interval (65535/32767., 250], chroma-specific detection factor, if set to the lowest amount no contours can be detected, higher numbers will shift the detection on color difference intervals (from debanding, to the noise detection limit, to minimum sharpening, to maximum sharpening) toward more sharpening
#define NoiseThreshold .0078125// valid interval [0, 32767/65535.), banding threshold, higher numbers mean stronger deband and denoise

sampler s0 : register(s0);
float2 c1 : register(c1);
#define sp(a) tex2Dlod(s0, float4(tex+c1*float2(a, 0), 0, 0)).rgb
static const float3 slimits = float3(-SharpenLimitLuma, -SharpenLimitChroma, -SharpenLimitChroma);
static const float3 dfactors = float3(LumaDetectionFactor, ChromaDetectionFactor, ChromaDetectionFactor);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 n, p, s1 = sp(0);// original pixel
{
float3 s2 = sp(-1);
float3 af = 1.;// accumulated amount of colors from the samples
float3 ac = s1;// accumulate color
float3 cd = abs(s1-s2);// color difference
float3 rcd = max(slimits, 1.-dfactors*cd);// factor for both base and multiplicand is 1.0, the output will be in the interval (-inf, 1]
// invert interval on sharpening
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s2*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {// continue if all channels are below the noise threshold
float3 s3 = sp(-2);
cd = abs(s1-s3);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s3*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {
float3 s4 = sp(-3);
cd = abs(s1-s4);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s4*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {
float3 s5 = sp(-4);
cd = abs(s1-s5);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s5*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {
float3 s6 = sp(-5);
cd = abs(s1-s6);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s6*rcd;
}
}
}
}
n = ac/af;
}
{
float3 s2 = sp(1);
float3 af = 1.;// accumulated amount of colors from the samples
float3 ac = s1;// accumulate color
float3 cd = abs(s1-s2);// color difference
float3 rcd = max(slimits, 1.-dfactors*cd);// factor for both base and multiplicand is 1.0, the output will be in the interval (-inf, 1]
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s2*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {// continue if all channels are below the noise threshold
float3 s3 = sp(2);
cd = abs(s1-s3);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s3*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {
float3 s4 = sp(3);
cd = abs(s1-s4);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s4*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {
float3 s5 = sp(4);
cd = abs(s1-s5);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s5*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {
float3 s6 = sp(5);
cd = abs(s1-s6);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s6*rcd;
}
}
}
}
p = ac/af;
}
return ((n+p)*.5).rgbb;
}



// vertical pass sharpen complex, deband, denoise and color controls for HD video input for XYZ rendering on 16-bit integer surfaces

#define SharpenLimitLuma 2// valid interval [0, 10], luma-specific sharpening limit, 0 is disabled, lower numbers will allow more sharpening on contours
#define SharpenLimitChroma 2// valid interval [0, 10], chroma-specific sharpening limit, 0 is disabled, lower numbers will allow more sharpening on contours
#define LumaDetectionFactor 64// valid interval (65535/32767., 250], luma-specific detection factor, if set to the lowest amount no contours can be detected, higher numbers will shift the detection on color difference intervals (from debanding, to the noise detection limit, to minimum sharpening, to maximum sharpening) toward more sharpening
#define ChromaDetectionFactor 64// valid interval (65535/32767., 250], chroma-specific detection factor, if set to the lowest amount no contours can be detected, higher numbers will shift the detection on color difference intervals (from debanding, to the noise detection limit, to minimum sharpening, to maximum sharpening) toward more sharpening
#define NoiseThreshold .0078125// valid interval [0, 32767/65535.), banding threshold, higher numbers mean stronger deband and denoise

// YCbCrColorControls, 0 is disabled, 1 is enabled
#define YCbCrColorControls 0
// Brightness, interval [-10, 10], default 0
#define Brightness 0
// Contrast, interval [0, 10], default 1
#define Contrast 1
// GrayscaleGamma and ColorfulnessGamma, interval (0, 10], default 1
#define GrayscaleGamma 1
#define ColorfulnessGamma 1
// Hue, interval [-180, 180], default 0
#define Hue 0
// Saturation, interval [0, 10], default 1
#define Saturation 1
// VideoRedGamma, VideoGreenGamma and VideoBlueGamma, interval (0, 10], default 2.4, the video gamma input factors used to convert between the video input RGB and linear RGB
#define VideoRedGamma 2.4
#define VideoGreenGamma 2.4
#define VideoBlueGamma 2.4

sampler s0 : register(s0);
float2 c1 : register(c1);
#define sp(a) tex2Dlod(s0, float4(tex+c1*float2(0, a), 0, 0)).rgb
static const float3 slimits = float3(-SharpenLimitLuma, -SharpenLimitChroma, -SharpenLimitChroma);
static const float3 dfactors = float3(LumaDetectionFactor, ChromaDetectionFactor, ChromaDetectionFactor);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float3 n, p, s1 = sp(0);// original pixel
{
float3 s2 = sp(-1);
float3 af = 1.;// accumulated amount of colors from the samples
float3 ac = s1;// accumulate color
float3 cd = abs(s1-s2);// color difference
float3 rcd = max(slimits, 1.-dfactors*cd);// factor for both base and multiplicand is 1.0, the output will be in the interval (-inf, 1]
// invert interval on sharpening
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s2*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {// continue if all channels are below the noise threshold
float3 s3 = sp(-2);
cd = abs(s1-s3);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s3*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {
float3 s4 = sp(-3);
cd = abs(s1-s4);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s4*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {
float3 s5 = sp(-4);
cd = abs(s1-s5);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s5*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {
float3 s6 = sp(-5);
cd = abs(s1-s6);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s6*rcd;
}
}
}
}
n = ac/af;
}
{
float3 s2 = sp(1);
float3 af = 1.;// accumulated amount of colors from the samples
float3 ac = s1;// accumulate color
float3 cd = abs(s1-s2);// color difference
float3 rcd = max(slimits, 1.-dfactors*cd);// factor for both base and multiplicand is 1.0, the output will be in the interval (-inf, 1]
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s2*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {// continue if all channels are below the noise threshold
float3 s3 = sp(2);
cd = abs(s1-s3);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s3*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {
float3 s4 = sp(3);
cd = abs(s1-s4);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s4*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {
float3 s5 = sp(4);
cd = abs(s1-s5);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s5*rcd;
[branch] if(max(max(cd.x, cd.y), cd.z) < NoiseThreshold) {
float3 s6 = sp(5);
cd = abs(s1-s6);
rcd = max(slimits, 1.-dfactors*cd);
if(rcd.x < 0) rcd.x = SharpenLimitLuma-abs(rcd.x);
if(rcd.y < 0) rcd.y = SharpenLimitChroma-abs(rcd.y);
if(rcd.z < 0) rcd.z = SharpenLimitChroma-abs(rcd.z);
af += abs(rcd);
ac += s6*rcd;
}
}
}
}
p = ac/af;
}
float3 t0 = (n+p)*.5;
t0 = t0*65535/32767.-float3(16384/32767., 32767/65535.+.5, 32767/65535.+.5);
#if YCbCrColorControls == 1
t0.yz = mul(t0.yz, float2x2(cos(radians(Hue)), sin(radians(Hue)), -sin(radians(Hue)), cos(radians(Hue))));// process hue
t0.xyz *= float3(Contrast, 2*Saturation, 2*Saturation);// process contrast and saturation, extend the chroma interval from [-.5, .5] to [-1, 1] for gamma processing
t0.x += Brightness;// process brightness
// preserve the sign bits of Y'CbCr values
float3 sby = sign(t0);
t0 = sby*pow(abs(t0), float3(GrayscaleGamma, ColorfulnessGamma, ColorfulnessGamma));// gamma processing
t0 = t0.rrr+float3(0, -.5*.1674679/.894, .5*1.8556)*t0.ggg+float3(.5*1.5748, -.5*.4185031/.894, 0)*t0.bbb;// HD Y'CbCr to RGB, compensate for the chroma ranges
#else
t0 = t0.rrr+float3(0, -.1674679/.894, 1.8556)*t0.ggg+float3(1.5748, -.4185031/.894, 0)*t0.bbb;// HD Y'CbCr to RGB
#endif
// preserve the sign bits of RGB values
float3 sbl = sign(t0);
t0 = sbl*pow(abs(t0), float3(VideoRedGamma, VideoGreenGamma, VideoBlueGamma));// linear RGB gamma correction
t0 = mul(t0, float3x3(0.3786675215, 0.1952504408, 0.0177500401, 0.3283428626, 0.6566857251, 0.1094476209, 0.1657219631, 0.0662887852, 0.8728023391))*32767/65535.+16384/65535.;
return t0.rgbb;// XYZ output
}

JanWillem32
11th August 2013, 22:33
The "contour color expose banding" shader is useful for denoise and deband testing purposes.

// (C) 2013 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// contour color expose banding for XYZ rendering on 16-bit integer surfaces
// This shader can be run as a screen space pixel shader.
// This shader requires compiling with ps_2_0, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// Use this shader to add a color contoured effect to an image.

sampler s0;
float2 c1 : register(c1);
#define sp(a, b, c) float4 a = tex2D(s0, tex+c1*float2(b, c));

float4 main(float2 tex : TEXCOORD0) : COLOR
{
sp(s2, -1, -1) sp(s3, 0, -1) sp(s4, 1, -1) sp(s5, -1, 0) sp(s6, 1, 0) sp(s7, -1, 1) sp(s8, 0, 1) sp(s9, 1, 1)// sample surrounding pixels
return smoothstep(.0625, 0, abs(s2+s3+s4-s7-s8-s9)+abs(s2+s5+s7-s4-s6-s9)+abs(s2+s3+s5-s6-s8-s9)+abs(s3+s4+s6-s5-s7-s8))*32767/65535.+16384/65535.;// color contour output
}

JanWillem32
12th August 2013, 08:23
I just wrote some simple shaders for usage as a third pass, after the "vertical x-pixel averaging, dithering, posterizing and old CRT scan lines"-type shaders. These shaders separate the RGB channels of the input video into multiple real pixels, imitating aperture grilles that use rectangular masks. (Imitating the other common shadow mask pattern would be a lot harder to program; I'm not sure it's worth the effort.) The warnings about divisibility in these shaders are not that important: the artifacts are barely visible if the input isn't evenly divisible.

// horizontal 4-pixel RGB separation
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 4

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float scaletexx = tex.x*c0*.25;
float prepos = trunc(scaletexx);// calculate the left position of the current set of pixels
float posdif = scaletexx-prepos;
float4 mask;// create RGB mask based on the pixel location
if(posdif < .25) mask = float4(1, 0, 0, 0);
else if(posdif < .5) mask = float4(0, 1/3., 2/3., 0);
else if(posdif < .75) mask = float4(0, 2/3., 1/3., 0);
else mask = float4(0, 0, 1, 0);
return tex2D(s0, tex)*mask;// mask and output
}



// horizontal 5-pixel RGB separation
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 5

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float scaletexx = tex.x*c0*.2;
float prepos = trunc(scaletexx);// calculate the left position of the current set of pixels
float posdif = scaletexx-prepos;
float4 mask;// create RGB mask based on the pixel location
if(posdif < .2) mask = float4(1, 0, 0, 0);
else if(posdif < .4) mask = float4(0, 2/3., 1/3., 0);
else if(posdif < .6) mask = float4(0, 1, 0, 0);
else if(posdif < .8) mask = float4(0, 1/3., 2/3., 0);
else mask = float4(0, 0, 1, 0);
return tex2D(s0, tex)*mask;// mask and output
}



// horizontal 8-pixel RGB separation
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 8

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float scaletexx = tex.x*c0*.125;
float prepos = trunc(scaletexx);// calculate the left position of the current set of pixels
float posdif = scaletexx-prepos;
float4 mask;// create RGB mask based on the pixel location
if(posdif < .25) mask = float4(1, 0, 0, 0);
else if(posdif < 0.375) mask = float4(0, 2/3., 1/3., 0);
else if(posdif < 0.625) mask = float4(0, 1, 0, 0);
else if(posdif < .75) mask = float4(0, 1/3., 2/3., 0);
else mask = float4(0, 0, 1, 0);
return tex2D(s0, tex)*mask;// mask and output
}



// horizontal 10-pixel RGB separation
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 10

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float scaletexx = tex.x*c0*.1;
float prepos = trunc(scaletexx);// calculate the left position of the current set of pixels
float posdif = scaletexx-prepos;
float4 mask;// create RGB mask based on the pixel location
if(posdif < .3) mask = float4(1, 0, 0, 0);
else if(posdif < .4) mask = float4(0, 1/3., 2/3., 0);
else if(posdif < .6) mask = float4(0, 1, 0, 0);
else if(posdif < .7) mask = float4(0, 2/3., 1/3., 0);
else mask = float4(0, 0, 1, 0);
return tex2D(s0, tex)*mask;// mask and output
}



// horizontal 4-pixel RGB separation for XYZ rendering on 16-bit integer surfaces
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 4

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float scaletexx = tex.x*c0*.25;
float prepos = trunc(scaletexx);// calculate the left position of the current set of pixels
float posdif = scaletexx-prepos;
float4 mask;// create RGB mask based on the pixel location
if(posdif < .25) mask = float4(1, 0, 0, 0);
else if(posdif < .5) mask = float4(0, 1/3., 2/3., 0);
else if(posdif < .75) mask = float4(0, 2/3., 1/3., 0);
else mask = float4(0, 0, 1, 0);
return (tex2D(s0, tex)-16384/65535.)*mask+16384/65535.;// mask and output
}



// horizontal 5-pixel RGB separation for XYZ rendering on 16-bit integer surfaces
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 5

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float scaletexx = tex.x*c0*.2;
float prepos = trunc(scaletexx);// calculate the left position of the current set of pixels
float posdif = scaletexx-prepos;
float4 mask;// create RGB mask based on the pixel location
if(posdif < .2) mask = float4(1, 0, 0, 0);
else if(posdif < .4) mask = float4(0, 2/3., 1/3., 0);
else if(posdif < .6) mask = float4(0, 1, 0, 0);
else if(posdif < .8) mask = float4(0, 1/3., 2/3., 0);
else mask = float4(0, 0, 1, 0);
return (tex2D(s0, tex)-16384/65535.)*mask+16384/65535.;// mask and output
}



// horizontal 8-pixel RGB separation for XYZ rendering on 16-bit integer surfaces
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 8

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float scaletexx = tex.x*c0*.125;
float prepos = trunc(scaletexx);// calculate the left position of the current set of pixels
float posdif = scaletexx-prepos;
float4 mask;// create RGB mask based on the pixel location
if(posdif < .25) mask = float4(1, 0, 0, 0);
else if(posdif < 0.375) mask = float4(0, 2/3., 1/3., 0);
else if(posdif < 0.625) mask = float4(0, 1, 0, 0);
else if(posdif < .75) mask = float4(0, 1/3., 2/3., 0);
else mask = float4(0, 0, 1, 0);
return (tex2D(s0, tex)-16384/65535.)*mask+16384/65535.;// mask and output
}



// horizontal 10-pixel RGB separation for XYZ rendering on 16-bit integer surfaces
// this shader only works properly on inputs that have a horizontal resolution that is evenly divisible by 10

sampler s0 : register(s0);
float c0 : register(c0);
float c1 : register(c1);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float scaletexx = tex.x*c0*.1;
float prepos = trunc(scaletexx);// calculate the left position of the current set of pixels
float posdif = scaletexx-prepos;
float4 mask;// create RGB mask based on the pixel location
if(posdif < .3) mask = float4(1, 0, 0, 0);
else if(posdif < .4) mask = float4(0, 1/3., 2/3., 0);
else if(posdif < .6) mask = float4(0, 1, 0, 0);
else if(posdif < .7) mask = float4(0, 2/3., 1/3., 0);
else mask = float4(0, 0, 1, 0);
return (tex2D(s0, tex)-16384/65535.)*mask+16384/65535.;// mask and output
}

leeperry
11th November 2013, 23:54
Hi Jan,

So following your advice in another thread that was asking for a "film grain" PS script, I've played around with your "semi-random grayscale noise.txt" which looks quite good but do you think it would be possible to make a PS script version of GrainFactory3() (http://forum.doom9.org/showpost.php?p=1191292&postcount=30)?

It allows you to choose the size, strength and sharpness of grain depending on dark/mid-tone/bright areas (whose limits can also be defined) and it can really be finetuned either for deblocking purposes, grain-based EE or artistic effects meant to mimic reel grain.

The problem with Didée's script is that it quickly becomes a CPU hog and Avisynth works in 8-bit only, whereas the idea would be to process it in 32fp after scaling to Jinc3AR in mVR... so if there is any way you could work your magic to do the same within a PS script, this would be too good to be true :)

:thanks: you very much in advance for even considering it,

JanWillem32
12th November 2013, 05:37
What I could find out about GrainFactory3 was: "noise generator that tries to simulate the behaviour of silver grain on film".
I already wrote some basic noise effect shaders, but maybe I could get closer to the look of silver grain on film.
When I start coding to create an effect, I start with looking at examples on images. I don't just duplicate/imitate other filters. When I've gathered enough research materials, I just start writing out possible parameters for methods in an effect. After that, I try a few methods. These are just calculations that spring to mind, and I usually copy a lot of previously written methods, too. After a bit of tinkering, I usually get the desirable effect from a shader. When transforming the prototype shader to a final type, I optimize first, and add comments. After that, I extract the set of constants, give them names and offer them as user-configurable parameters.
The reason I'm telling you this is simple; I can practically guarantee that once I've finished something that resembles silver grain on film, the effect will not have a grand total of 19 user input variables like GrainFactory3.
On the other hand, I don't see any options in GrainFactory3 for using color. I would probably add an option or options for this type of filter related to color, for example; to use the properties for sepia toning instead of silver. (This possibly requires separate filters, though.)

To start off simple, the pictures I could find of real film used in cinemas varied strongly over the decades. The most evident change was the transition from grayscale or toned video to color. The form and amount of grain on film, and the cinema equipment varied, too. Some effects are available: projector film drive scratches, projector film dust, projector film lamp vignette, projector film sepia toning for SD&HD video input, grayscale, projector film shaking, semi-random colored surface noise and semi-random grayscale noise.
What era are you targeting for this kind of vintage look? Which effects are currently missing to complete the illusion of such a look? Please specify with some true vintage cinematic examples and name some very specific factors.

leeperry
12th November 2013, 18:17
Hi Jan, thanks for the swift reply.

Well, it would appear that to simulate silver film grain you'd need the ability to set different chunk sizes for dark/mid/bright pixels as grain would appear to look thicker in dark areas for instance. I think 21 grams (http://www.google.com/search?q=21+grams&tbm=isch) is a good example of what excessive reel grain can do, of course I want to keep it less intrusive.

My real-world use of GrainF3 was to set very low values in order to deblock (which tends to increase the subjective pop-effect IME), add some subtle grain-based EE, and give a DLP/silver reel look to sanitized "flat looking" digital movies, the same way DLP videoprojectors look pretty grainy in dark areas due to their very fast rotating mirrors (http://www2.hesston.edu/Physics/TelevisionDisplays/IMAGES/DLP.JPG) for instance.

I don't think I would be interested in chroma grain, but what I would need is the ability to:
-choose the size and strength of grain depending on dark/mid/bright areas
-choose the limits for dark/mid/bright, like in GrainF3
-choose the grain pattern or make it random, much like what foxyshadis explained here (http://forum.doom9.org/showpost.php?p=1403328&postcount=40). Once you find your favorite grain patterns for dark/mid/bright areas, you can use them permanently and the "movie grain" won't be quite random anymore as it'll be finetuned to your liking.

The intent of GrainF3 is not to mimic a 1930's sepia looking semi-busted projector but simply to add (possibly very subtle) silver looking grain. Once finetuned, it can really look great as adding grain does wonders for deblocking purposes IME. Of course it can also be used with excessive settings for artistic means.

The more I talk about it, the more I wanna play around with GrainF3 all over again but I would really rather have it done via a PS script for all the aforementioned reasons :)

PS: ouh, this looks impressive too: Cinema Film Grain Plugin for FCPX (http://www.fcpeffects.com/products/cinema-film-grain) :cool:

leeperry
9th December 2013, 18:17
Hi again Jan, so I've tried my old GrainF3() calls that looked so great on CRT/DLP but they look darn noisy on LCD... I would have to rework them from scratch with much more subtle settings, I guess.

Anyway, I've got a question for you :)

Here's another Avisynth script that has always impressed me: SmoothLevels() (http://forum.doom9.org/showthread.php?t=154971)

It's gone DLL but it used to be an .AVS script and here are a few older versions of it: SmoothLevels.rar (11 KB) (https://mega.co.nz/#!K0AlXLCK!VDlGv12GxSIiqSLgStdGQHXl2sEacjiVcvQanTI0oG8)

Using it to convert TV to PC, I've always found its resulting picture to look "deeper" and simply more 3D-looking than whatever ffdshow or madVR could offer.

I mentioned it to madshi a while ago who told me that its error diffusion was technically more advanced than the dithering currently done in mVR and that PS scripts couldn't process Floyd-Steinberg.

I've made screenshots comparisons on a gray ramp and a REC709 test pattern available at SmoothL2.rar (15.3 MB) (https://mega.co.nz/#!f9wCUa5a!KdRNjFkmCuL9p_xz2yPcnWuN8uZ6iBVjwLZfvvhhc68)

I'm colorblind so judging on banding and colorimetry is a very tedious task for me, but I've been told that banding didn't look any better than in mVR and that it seemed to make colorimetry shifts?

Comparing screenshots back and forth it would appear to me that SmoothL is doing some sort of colorimetry-based EE? The borders of the color squares in that REC709 pattern look quite different with SmoothL, don't they?

My point is that for instance in this very short sample Prisoners.mkv (22.9 MB) (https://mega.co.nz/#!HtwVTBYa!ZYBMzxAkSPBBxcmyr06Hp1heUW8uqMuFPcgzfYFWUuQ) when comparing SmoothLevels(preset="tv2pc",HQ=true) to anything else, that animal in the woods seems to appear much deeper in the picture and the passing cars on the road feel less "flat" and far more natural to me :cool:

"HQ" would stand for "HQ interpolation" BTW.

All this said, do you agree that the picture looks deeper with that sample when using SmoothL? Is that because its error diffusion is more advanced than what mVR can do, or because of some -as I'm suspecting- colorimetry-based EE? If so, could you possibly provide the same kind of trick with a PS script?

My problem is that SmoothL outputs 8-bit from ffdshow to mVR, it's a real CPU hog (especially in HQ mode on 1.78 1080p) and I'm totally hooked to its 3D look :o

Hope you can look into it, :thanks: in advance!

foxyshadis
13th December 2013, 18:05
Hi Jan! Excellent work with the new filters. Can I request that the main archive be updated to include the last couple of years' work, as well? I know some aren't 100% finished, but it'd be nice to have them all handy for quick downloading onto the various systems I have set up for media.

vBm
13th December 2013, 21:32
Would be quite nice to have centralized repository for all the shaders.

JanWillem32
17th December 2013, 15:41
leeperry, sorry that I took a while to give a status update. I've tried several prototype shaders for the grain/noise effects. None are really finished yet, and it's going to take some more development. I'm currently struggling with getting larger, well-shaped, random grain particles efficiently in the output images. The prototype shaders are currently hardly any better than the simple types I already made before. (projector film dust, projector film drive scratches, semi-random colored surface noise and semi-random grayscale noise) The shaders that do generate larger grain particles generate nasty artifacts, and the shaders that generate the medium to fine noise are randomized almost purely per pixel.

As for "chroma grain", I'm not going to use any Y'CbCr channels for this shader, neither luma nor chroma. I do occasionally use the chroma parameter of CIECAM02 for some color-related filtering, but in this case that's not relevant at all. For the grayscale, toned (such as sepia) and three-layered film types, the grain/noise modifies the intensity of each tinted chemical applied on the film. At the moment I unfortunately still have to guess which xyY primary color (or color function) each of these tinted chemicals governs for both the recording and the playback phases, but I'll research that some more. If anybody knows these details, please notify me.

I'm not going to add support for textured patterns. It's extremely impractical from the renderer's perspective with the current interfaces, hardware support and file management. Of course the renderer does have easy access to internal resources. I'll elaborate more on that point later.

Multiple general shapes and sizes for the grain in the shaders should be possible, but it's currently not that easy to configure with the prototypes.

For shaders that do color corrections, I already wrote quite a few. These all need to be edited, though. The descriptions in these shaders are sub-par. I also wrote some new prototypes. I'll publish some implementations of these at a later date.

Dithering is a complicated transformation done in the absolute final filtering pass of a video renderer. I'm not going to write a dithering filter outside the context of exactly that.
Floyd–Steinberg dithering is, due to its filtering kernel, indeed unsuitable for pretty much any modern dithering implementation, including the dithering done in pixel shading passes. There are plenty of good alternatives in ordered dithering and such. I already wrote three modes to dither for a final filtering pass, and I'll happily add more modes if another suitable method pops up, but for now, I'm not going to write more dithering filters.

As for banding, when using either of the two advanced video renderers currently available for filtering; it's pretty much always a problem in the video source. For video filters that don't keep much quality in quantization and dynamics during filter transformations, additional banding will indeed occur. Debanding the source is quite a difficult problem. I've written several filters that can more or less deband a bit. (Don't expect much from the debanding filters with only a small filtering radius.)

As for the other effects you name that you attribute to SmoothLevels(), there's only so much you can transform in a width-height-color-time domain. A different ditherer isn't going to make things "appear much deeper" or "less flat". That's something generally achieved by sharpening, contrast enhancements or even gamma controls. I've written several of these filters already.

foxyshadis and vBm, as I mentioned in the main MPC-HC thread here on the software players board:

The Pixel Shader Pack v1.5 will need quite a lot of extra work. I don't like the blocks of comments below the title of the shaders. It still mentions "screen space pixel shader" and such, which is just outdated. I also want to split the entire set of shaders per rendering color space+format. I'll have to add a 'readme' to properly document what to use in what renderer for that. Most of the pixel shaders are already available as multi-version (see the last few pages of the shader thread for examples).
Some shaders need to be scrapped for the simple reason that they should never be used (or I'll just put them in the junk folder called 'development'). I also wrote new shaders and corrected some wrong methods in older ones.
My main problem is actually the set of newly developed shaders. These hardly contain any comments, and some of them are just not user-friendly. I just don't have time to edit and test over 350 files.

I just write a lot of, well... crap. There are a lot of prototype and 'finished' shaders that I should really edit to make them decent. Some other shaders need to be dumped in the 'development' junk folder with correct comments inside. And again some others can just be deleted. Just collecting all shaders in their current state isn't going to help at all.

leeperry
17th December 2013, 16:30
As for the other effects you name that you attribute to SmoothLevels(), there's only so much you can transform in a width-height-color-time domain. A different ditherer isn't going to make things "appear much deeper" or "less flat". That's something generally achieved by sharpening, contrast enhancements or even gamma controls. I've written several of these filters already.
Hi Jan, thanks for the reply!

I've played around with SmoothL quite a bit lately and I still suspect it to be processing chroma-based EE, isn't that possible whatsoever?

Could you please try the aforementioned sample and SmoothL Avisynth call and see for yourself what it does? It looks like EE to me, but not the usual halo-based luma EE; there is no halo as far as I can tell and yet things look a bit like a cel-shaded cartoon to me... there's more to it than what it would appear, IMO. Or maybe it does mess with gamma on high contrast edges, if that's even possible? I wish I could find proper test patterns to find out wth it does :sly:

All this said, it outputs 8-bit to mVR so noise quickly becomes an issue, and it's a CPU hog when OTOH mVR is able to process the TV>PC conversion with lower banding at a low GPU load, at that.

Also, its subjective EE is impressive at first, much like Samsung's DNIE but they both kinda look "artificial" to me....if you could find out what the trick is, you could possibly allow us to finetune it using much weaker settings :)

FWIW, SmoothL supports a "debug=true" argument that shows all kinds of histograms in real time.

JanWillem32
18th December 2013, 19:50
leeperry, I just tested SmoothLevels().
I'm sorry but I can't really see much difference beyond the obviously clipped levels outside the {[16, 235], [16, 240], [16, 240]} intervals in synthetic tests. I tested "SmoothLevels(preset="tv2pc",HQ=true)" with the "Prisoners.mkv" sample, with input intervals set to [0, 255] for the conversion to RGB. I compared it with the regular conversion without SmoothLevels() and the regular {[16, 235], [16, 240], [16, 240]} input intervals setting.
Note that for some reason the "Prisoners.mkv" sample with a resolution of 1280×720 pixels was set to an anamorphic display ratio of 1279:720. I removed the anamorphic ratio before testing to prevent distortions.
Here's a link containing the two screenshots, taken of frame 600 of the sample, and the absolute diffence result: - .
The absolute difference in terms of R'G'B' is mostly 0. The maximum difference is 6./255., in the form of some completely blue differences on some of the sides of a tree.
The differences are mostly scattered R'G'B' pixels, but some groups can be seen on the sides of the trees and there are some patches on the brighter areas. The difference map also shows that only rarely are combinations of R', G' and B' differences found in the same pixel, indicating that there's not much change in brightness/luminance/contrast/luma/lightness.
My analysis is that all of this is mostly due to different dithering and dithering twice over, and not much else. Did I do something wrong during testing?
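(For reference, an absolute-difference map like this boils down to a trivial two-sampler shader. This is only a sketch, not the tool actually used for the comparison, and it assumes a host that binds the two screenshots to samplers s0 and s1:)

```hlsl
// absolute difference of two input images (comparison sketch)
sampler s0 : register(s0);// first screenshot
sampler s1 : register(s1);// second screenshot

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	return abs(tex2D(s0, tex)-tex2D(s1, tex));// per-channel absolute difference of the two inputs
}
```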

leeperry
19th December 2013, 02:52
Ah....So what did you compare "SmoothLevels(preset="tv2pc",HQ=true)" to exactly? mVR's TV to PC conversion?

Yes, the sides of the trees look different to me with SmoothL and the difference I'm seeing is definitely not placebo as I was able to DBT it with the help of a friend :o

I just recompared the REC709 test patterns from SmoothL2.rar (15.3 MB) (https://mega.co.nz/#!f9wCUa5a!KdRNjFkmCuL9p_xz2yPcnWuN8uZ6iBVjwLZfvvhhc68) and it seems clear that SmoothL outputs a higher level of dithering (I disabled dithering in mVR when using SmoothL), but maybe my brain likes it better because it's more advanced than what mVR is able to achieve... madshi made it clear that it's technically impossible to implement error diffusion with PS but that it might be possible with OpenCL/CUDA (which mVR does not support yet).

So you did write PS scripts for dithering? Would they be more efficient than what mVR currently does? No error diffusion I guess?

:thanks:

JanWillem32
19th December 2013, 07:04
I didn't use a video renderer. I compared the regular dithered 8-bit R'G'B' output of ffdshow tryouts' conversion. The internal {[16, 235], [16, 240], [16, 240]} intervals conversion to full range and that of SmoothLevels() don't differ all that much. The only consequence of letting SmoothLevels() do the conversion is that the image is converted to YV12 in between expanding the levels and converting to R'G'B'. The consequence is dithering twice over, which is visible, but not really a good thing (I don't like adding noise at all).
The easiest 'dither' to implement is a random dither. You need to apply a lot of it so that it actually works, but it's not an elegant way of handling the issue. Example: http://caca.zoy.org/wiki/libcaca/study/1
Next are the static and random ordered dithering, which use a matrix. These require some attention on implementing, but these are very efficient single-level dithering methods. Example: http://caca.zoy.org/wiki/libcaca/study/2
Error diffusion and other methods are either pixel-progressive or multi-pass. This makes them very undesirable in modern systems. I'm not even going to attempt to implement one of those. Examples: http://caca.zoy.org/wiki/libcaca/study/3 and http://caca.zoy.org/wiki/libcaca/study/4
In general, if you want better dithering, just use a larger matrix for random ordered dithering. The other options would, due to performance problems, make no sense at all.
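A static ordered dither in its simplest form is only a few lines. This is a sketch for illustration (the 4×4 Bayer matrix and the 8-bit quantization target are assumptions here; a real implementation would use a larger and/or per-frame randomized matrix):

```hlsl
// static ordered dithering to 8 bits per channel with a 4x4 Bayer matrix (sketch)
sampler s0 : register(s0);
float c0 : register(c0);// horizontal resolution
float c1 : register(c1);// vertical resolution

static const float bayer[16] = {0, 8, 2, 10, 12, 4, 14, 6, 3, 11, 1, 9, 15, 7, 13, 5};

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	float2 pos = floor(tex*float2(c0, c1));// integer pixel position
	float t = (bayer[(int)(fmod(pos.x, 4.)+fmod(pos.y, 4.)*4.)]+.5)/16.;// ordered threshold in (0, 1)
	return floor(tex2D(s0, tex)*255.+t)/255.;// quantize to 8 bits per channel using the threshold
}
```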
In terms of comparing my work to madVR, that's not easy. Because madVR has had a good many years of active development, it has many options. I however can't compare the internal code. I generally write really efficient, but wonderfully complicated code/assembly, that's for sure.

leeperry
19th December 2013, 19:18
Oh, I'm only parroting what madshi said: http://forum.doom9.org/showpost.php?p=1594415&postcount=14334
Screenshots are always converted to 8bit fullrange RGB (0-255), using error diffusion, which is a higher quality dithering algorithm compared to what madVR does during playback.
http://forum.doom9.org/showpost.php?p=1594420&postcount=14336
Error diffusion can't be done with GPU pixel shaders because error diffusion processes one pixel at a time, using the result of the previous pixel calculation for the next pixel. GPU pixel shaders more or less work on all pixels at the same time, which is the opposite of what error diffusion needs. *Maybe* it might be possible to do error diffusion using OpenCL/CUDA, but it will be difficult to do and likely not perform very well. For screenshots error diffusion is easy because madVR is doing the screenshot processing via CPU instead of GPU.
IIRC he told me that SmoothL does use error diffusion and that it wouldn't be possible in mVR without resorting to OpenCL/CUDA.

So you're saying that SmoothL does double dithering, interesting! I know many ppl used to enjoy my MT("GrainFactory_MT2 (http://pastebin.com/H6fDkSBp)(3,5,100,100,1.0,0.7,0,0,0,96,0)",4) call as much as I did, so I guess my brain likes grain after all... I might just try some of your scripts for moar dither "grain" :)

romulous
29th December 2013, 04:51
Hi JanWillem32,

I was looking for a rotation PS (to rotate video files captured via mobile devices in portrait mode for example) - I just tried your "flip and rotate sampling direction for RGB" (for your v1.4 pack) in MPC-HC, and I finally managed to get it to load. Question - is this what the output is supposed to look like?
http://i.imgur.com/m3cNrQ3.jpg

If so, I have badly misunderstood what the PS is meant to do (I was looking for one that can rotate an entire video 90 degrees, 180 degrees, 270 degrees etc depending on how the person who filmed it was holding their camera at the time).

This is the same frame in the actual video itself for comparison with the PS:
http://i.imgur.com/pE0VuVR.jpg (as you can see, all this video requires is a simple 90 degree clockwise rotation)

Thanks.

JanWillem32
29th December 2013, 17:33
That shader performs an effect to flip and rotate each R, G and B channel every frame on square textures.
You are probably looking for the renderer rotation effects. These are listed in the menus for resizing, rotation, pan and scan, et cetera, if available. Not all video renderers support this feature. (I wrote one with only partial support.) If that doesn't work, I can write a shader that does 90 degrees rotation, horizontal flipping and vertical flipping.
Note that geometry changes due to rotation usually require manual compensation, as the video renderer host usually doesn't compensate for it automatically.

romulous
30th December 2013, 02:48
You are probably looking for the renderer rotation effects.


Correct - I'm looking for a way to rotate entire videos. These will mainly be videos shot on mobile devices where the user has the camera not pointing the right way up. Here's an example posted here on Doom9 previously for this very thing:
http://www39.zippyshare.com/v/28485654/file.html


Not all video renderers support this feature. (I wrote one with only partial support.)


Heh - I wish I had asked here first before spending the entire day yesterday working on this (worked with your PS, various playing software, and a freely available DS rotation filter and that's the conclusion I came to).

The player I use is Zoom Player, and we have had an increasing number of people asking for video rotation, so I went looking for a solution. The only native Windows player so far that I have found that can do video rotation is MPC-BE, and only when using its own EVR CP (which doesn't help folks wanting the feature in other players such as Zoom).


If that doesn't work, I can write a shader that does 90 degrees rotation, horizontal flipping and vertical flipping.
Note that geometry changes due to rotation usually require manual compensation, as the video renderer host usually doesn't compensate for it automatically.

That would be great if it isn't too much trouble :) Zoom doesn't support PS as yet, but Blight is willing to add it as madVR already supports them (so only player support is required). I don't actually create mobile videos myself, but from what I've seen from samples various folk have posted, most seem to require a 90 degrees clockwise rotation, though I don't suppose you could really rule out a 90 degrees counter-clockwise rotation or maybe even a full 180 (how many different ways can you hold a camera that is not the correct way up?).

JanWillem32
31st December 2013, 02:47
The only native Windows player so far that I have found that can do video rotation is MPC-BE, and only when using its own EVR CP (which doesn't help folks wanting the feature in other players such as Zoom).

In my honest opinion, rotation options should be implemented natively in the video renderer's mixer or resizer filtering passes. A separate pixel shader can do rotations, but these are not ideal. It's neither efficient nor user-friendly. The efficient way of implementing these transforms is by using custom vertices for sampling the source texture (which in turn can be fed to any pixel shading pass in a single, combined transform). A separate pixel shader can't properly adjust the global resizing factor to fit an image nicely on screen (except when you only use the flip horizontal and flip vertical options, which don't change geometry).
I'm well aware that there is no DirectShow interface to regulate video rotation, and integrated renderer filters such as this one are difficult to implement. I'll try to write a basic rotation shader today.

I don't actually create mobile videos myself, but from what I've seen from samples various folk have posted, most seem to require a 90 degrees clockwise rotation, though I don't suppose you could really rule out a 90 degrees counter-clockwise rotation or maybe even a full 180 (how many different ways can you hold a camera that is not the correct way up?)

180 degrees rotation is done by setting both 'flip horizontal' and 'flip vertical' options, 270 degrees rotation is done by setting all three options. I'll put that in the comments.
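The core of such a shader is just a remapping of the sampling coordinates. A minimal sketch, with the three options as hard-coded constants rather than the final user-configurable form:

```hlsl
// flip/rotate sketch; a diagonal flip combined with one straight flip yields a 90 degrees rotation,
// both straight flips alone yield 180 degrees, and all three options together yield 270 degrees
sampler s0 : register(s0);

#define FLIPDIAGONAL true
#define FLIPHORIZONTAL true
#define FLIPVERTICAL false

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	if(FLIPDIAGONAL) tex = tex.yx;// transpose the sampling direction
	if(FLIPHORIZONTAL) tex.x = 1.-tex.x;// mirror horizontally
	if(FLIPVERTICAL) tex.y = 1.-tex.y;// mirror vertically
	return tex2D(s0, tex);// note: the host doesn't compensate the output geometry for the transposed axes
}
```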

romulous
31st December 2013, 07:30
In my honest opinion, rotation options should be implemented natively in the video renderer's mixer or resizer filtering passes.

Indeed, Blight agrees - which is why he asked madshi to add it to madVR a while back. It's low on his to-do list though (and at the moment, madshi is not doing feature requests at all, so this will be a long time coming I'm afraid).


I'm well aware that there is no DirectShow interface to regulate video rotation, and integrated renderer filters such as this one are difficult to implement. I'll try to write a basic rotation shader today.

Thanks, at least that will be something in the meantime. For anyone wondering, this is the rotate filter I was testing:
http://videoprocessing.sourceforge.net/#rotate

Freeware and open source. It only works in RGB24 or RGB32, so you do have to load the colour space converter filter as well. 180 degree rotation works fine, as do vertical and horizontal flipping (though vertical seems to not actually change the image in any way). 90 degrees, 270 degrees and diagonal all produce garbled images though:
http://i.imgur.com/4riVR5p.jpg

I'm told that would be because none of the video renderers support dynamic resolution changing, and that test clip is not the same width as it is height (meaning when you rotate it in certain ways, the resolution will change).

pirlouy
31st December 2013, 13:37
@JanWillem32: I'm quite sure you won't be interested, but maybe curiosity will persuade you.

Samsung TVs offer a setting called "Dynamic Contrast" in the options (not "CE dimming"), which changes the original image, but I find it to be "nice looking" sometimes. There are 3 settings, but "low" is the best; the other 2 change the image too much.

Do you think you could write a shader for this, or would it be too complicated for the GPU?
I suppose it changes pixel like this:
(post-resized by renderer)
Old Pixel: R=50; G=40; B=230
New Pixel:
R= 50+((50-128)*5%) = 46 (rounded)
G= 40+((40-128)*5%) = 36 (rounded)
B= 230+((230-128)*5%) = 235 (rounded)

Do you think my reasoning is stupid? :confused:
Is there a simple way to test this algorithm with MPC ?
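The arithmetic in the post can at least be sanity-checked outside MPC with a few lines of Python. This is a sketch of the same away-from-center formula; the 5% strength and the 128 center are taken straight from the example above, not from any known TV firmware:

```python
def dynamic_contrast(value, center=128, strength=0.05):
    """Push an 8-bit channel value away from the center by a fixed percentage."""
    return round(value + (value - center) * strength)

# the example pixel from the post: R=50, G=40, B=230
new_pixel = [dynamic_contrast(v) for v in (50, 40, 230)]  # [46, 36, 235]
```

The rounded results match the hand-worked values in the post.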

fagoatse
31st December 2013, 14:15
@JanWillem32: I'm quite sure you won't be interested, but maybe curiosity will persuade you.

Samsung TVs offer a setting called "Dynamic Contrast" in the options (not "CE dimming"), which changes the original image, but I find it to be "nice looking" sometimes. There are 3 settings, but "low" is the best; the other 2 change the image too much.

Do you think you could write a shader for this, or would it be too complicated for the GPU?
I suppose it changes pixel like this:
(post-resized by renderer)
Old Pixel: R=50; G=40; B=230
New Pixel:
R= 50+((50-128)*5%) = 46 (rounded)
G= 40+((40-128)*5%) = 36 (rounded)
B= 230+((230-128)*5%) = 235 (rounded)

Do you think my reasoning is stupid? :confused:
Is there a simple way to test this algorithm with MPC ?

AFAIK the dynamic contrast option works by measuring ambient light in one's room, so I guess it's impossible to simulate it correctly using shaders alone.

JanWillem32
1st January 2014, 00:00
romulous, this shader will work for now. I can also write a variant that masks the artifacts that occur when using diagonal flipping on rectangular textures.

pirlouy, that's a static contrast method. It pushes values away from gray, and will crush near-black and near-white in the process. I don't like static or dynamic contrast filters, as these always cause distortion. (I've nonetheless already written several static contrast methods.)
It's possible to do correct environment light adaptations (see the CIECAM02 transforms for reference). These filters require a lot of parameters, but are reasonably simple otherwise. These filters don't offer 'low'/'medium'/'high' options, by the way; there's no place for user preferences in such filters. The ambient light factors can both be measured and estimated (again, see the CIECAM02 shader).

Example shader code (a funny one, as it only requires sampling a pixel and one multiply-add operation):

#define CenterValueRed .5
#define CenterValueGreen .5
#define CenterValueBlue .5
#define ContrastRed .05
#define ContrastGreen .05
#define ContrastBlue .05
static const float4 Contrast = 1.+float4(ContrastRed, ContrastGreen, ContrastBlue, 0.);
static const float4 ScaledCenterValue = float4(CenterValueRed*ContrastRed, CenterValueGreen*ContrastGreen, CenterValueBlue*ContrastBlue, 0.);

sampler s0 : register(s0);

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	float4 s1 = tex2D(s0, tex);// original pixel
	return s1*Contrast-ScaledCenterValue;// process contrast and output
}

// (C) 2013 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// flip and rotate
// This shader should be run as a screen space pixel shader when diagonal flipping is enabled; otherwise either mode will work.
// This shader requires compiling with ps_2_0, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// If possible, avoid compiling with the software emulation modes (ps_?_sw). Pixel shaders require a lot of processing power to run in real-time software mode.
// This shader can flip and rotate full images.
// Note that when enabling diagonal flipping, the resizing factor has to be lowered in advance to allow the full image to be visible.

// FlipHorizontal, FlipVertical and FlipDiagonal: 0 is disabled, 1 is enabled
// To rotate by a quarter clockwise, enable FlipHorizontal and FlipDiagonal.
// To rotate by a half, enable FlipHorizontal and FlipVertical.
// To rotate by a quarter counter-clockwise, enable FlipVertical and FlipDiagonal.
#define FlipHorizontal 0
#define FlipVertical 0
#define FlipDiagonal 0

sampler s0;
#if FlipDiagonal
float2 c0;
float2 c1;
#endif

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	tex -= .5;
#if FlipHorizontal && FlipVertical
	tex = -tex;
#elif FlipHorizontal
	tex.x = -tex.x;
#elif FlipVertical
	tex.y = -tex.y;
#endif
#if FlipDiagonal
	tex = tex.yx*c1*c0.yx;
#endif
	return tex2D(s0, tex+.5);// sample and output
}

romulous
1st January 2014, 01:46
romulous, this shader will work for now. I can also write a variant that masks the artifacts that occur when using diagonal flipping on rectangular textures.

Hi Jan,

Thanks - just trying it out in MPC-HC now (using madVR). Just testing the 90 degrees clockwise flip to begin with. On some of the videos, I see this (I tested 5 videos, these occurred on 4 of them - the remaining video was fine):
http://i.imgur.com/AcbOWwi.png

Is that the artifacts that you were referring to, or are these different ones?

Thanks!

JanWillem32
1st January 2014, 04:06
This shader is a little bit heavier, but can mask artifacts:

// (C) 2013 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// flip and rotate with background mask
// This shader should be run as a screen space pixel shader when diagonal flipping is enabled; otherwise either mode will work.
// This shader requires compiling with ps_2_0, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// If possible, avoid compiling with the software emulation modes (ps_?_sw). Pixel shaders require a lot of processing power to run in real-time software mode.
// This shader can flip and rotate full images.
// Note that when enabling diagonal flipping, the resizing factor has to be lowered in advance to allow the full image to be visible.

// FlipHorizontal, FlipVertical and FlipDiagonal: 0 is disabled, 1 is enabled
#define FlipHorizontal 0
#define FlipVertical 0
#define FlipDiagonal 0
// BackgroundMask: red, green, blue and alpha vector to return on border values when FlipDiagonal is activated, intervals [0, 1]
#define BackgroundMask float4(0., 0., 0., 0.)

sampler s0;
#if FlipDiagonal
float2 c0;
float2 c1;
#endif

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	tex -= .5;
#if FlipDiagonal
	float2 geometryswap = c1*c0.yx;
	float2 border = .5*geometryswap;
	if((abs(tex.x) > border.x) || (abs(tex.y) > border.y)) return BackgroundMask;// draw background color on background pixels
#endif
#if FlipHorizontal && FlipVertical
	tex = -tex;
#elif FlipHorizontal
	tex.x = -tex.x;
#elif FlipVertical
	tex.y = -tex.y;
#endif
#if FlipDiagonal
	tex = tex.yx*geometryswap;
#endif
	return tex2D(s0, tex+.5);// sample and output
}

romulous
1st January 2014, 04:28
Yes, thanks - that seems to remove the artifacts pretty well. It's much appreciated - you make writing these things look so easy! :) Is how 'heavy' a shader is determined by the video card alone, by the video card plus CPU, or by the overall system?

I think Blight was wondering what pixel shader profile was required - is that the "ps_2_0" listed in the header?

JanWillem32
1st January 2014, 05:55
This shader really isn't that heavy compared to complex filters, such as some forms of resizing, debanding, denoising, sharpening and frame interpolation (and it was indeed really easy to write as well). An old, low-end GPU/IGP might choke on executing this shader in some cases, but setting a few shaders is usually not a problem. This particular shader indeed requires the Direct3D 9 minimum of PS 2.0 support from the GPU, as indicated. Not that it really matters, as a video renderer can simply auto-detect the ps_2_0, ps_2_a, ps_2_b and ps_3_0 levels from the D3D9 support caps report and use the highest level available. If a shader fails, that particular shader's title can simply be reported back to the user (already a required feature, as a video renderer supporting custom shaders has to deal with truly faulty shaders as well). There also aren't that many active users left with a GPU that supports less than PS 3.0 (but at least PS 2.0).
For MPC-HC I advised some time ago to no longer even store the pixel shader compiling level with every shader. The interface to the video renderers doesn't need this parameter at all. The pixel shader menus can simply default to PS 3.0 every time, and offer the other three modes for testing purposes only.
For madVR the case is even easier. madVR notes in its list of system requirements "graphics card with full D3D9 / PS3.0 hardware support". There is no use in compiling pixel shaders at a lower level than that.
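The auto-detection described above boils down to picking the highest compile target the caps report allows. A hedged Python sketch of that selection logic: in real code the major/minor numbers come from `D3DCAPS9.PixelShaderVersion` filled in by `GetDeviceCaps`, and the 2.x sub-profiles additionally depend on `D3DPSHADERCAPS2_0` feature bits, modeled here as a plain boolean:

```python
def pick_ps_profile(major, minor, extended_2x_caps=False):
    """Pick the highest HLSL pixel shader compile target a device supports.
    extended_2x_caps is a hypothetical stand-in for the D3DPSHADERCAPS2_0
    feature checks that separate ps_2_a/ps_2_b hardware from plain ps_2_0."""
    if (major, minor) >= (3, 0):
        return "ps_3_0"
    if (major, minor) >= (2, 0):
        return "ps_2_a" if extended_2x_caps else "ps_2_0"
    raise ValueError("PS 2.0 is the Direct3D 9 minimum for these shaders")
```

A renderer would call this once at startup and compile every custom shader against the returned profile, falling back only for testing.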

James Freeman
1st January 2014, 11:44
@JanWillem32

Can you please make "Sharpen Complex 2" only for Chroma?

I am on to something big here (huge improvement to chroma upscaling resolution).
I can almost make 4:2:0 to look like 4:4:4.


I'll explain:

First I captured a Belle-Nuit (http://www.belle-nuit.com/test-chart) test chart in 4:4:4 (rgb32) video.
Then I ran it through x264 to make a 4:2:0 video (and lose chroma information).
Then I opened both videos side by side and activated your pixel shader called "chroma for SD&HD video input" to show only chroma information on both of them.
Then I activated "sharpen complex 2" (pre-resize) on the 4:2:0 video and carefully tweaked it to look like the 4:4:4 video (yes, it can even do that).

The idea is quite simple: Sharpen the chroma before upscaling.
The sharpened 4:2:0 chroma results are very nice, almost like the original 4:4:4 (giving the saturated colors back to high-frequency details).
Now I only need to put the Sharpen Chroma together with the untouched Luma for a perfect 4:2:0 -> 4:4:4 conversion.
That's why I need the Sharpen Complex 2 only for Chroma, leaving the Luma untouched.

Hope that captures your interest.

Happy New Year !

XRyche
1st January 2014, 12:33
@JanWillem32: I'm quite sure you won't be interested, but maybe curiosity will persuade you.

Samsung TVs offer a setting called "Dynamic Contrast" in the options (not "CE dimming"), which changes the original image, but I find it to be "nice looking" sometimes. There are 3 settings, but "low" is the best; the other 2 change the image too much.

Do you think you could write a shader for this, or would it be too complicated for the GPU?
I suppose it changes pixel like this:
(post-resized by renderer)
Old Pixel: R=50; G=40; B=230
New Pixel:
R= 50+((50-128)*5%) = 46 (rounded)
G= 40+((40-128)*5%) = 36 (rounded)
B= 230+((230-128)*5%) = 235 (rounded)

Do you think my reasoning is stupid? :confused:
Is there a simple way to test this algorithm with MPC ?

Just as JanWillem32 suggested, use his CIECAM02 shaders. If, for example, your movies appear too dark (to the point of even losing some detail in dark ambiance and/or night-time scenes) and lowering gamma just washes everything out, then the CIECAM02 shaders using the "average" "User Settings" are for you. Just try them; they will pleasantly surprise you.

pirlouy
1st January 2014, 19:40
AFAIK the dynamic contrast option works by measuring ambient light in one's room so I guess it's impossible to simulate it correctly using shaders alone.
My TV does not have any sensor like this; it really is a modification with a precise algorithm.

@JanWillem32: Your example code works well. But what are those CIECAM02 shaders you're talking about? I tried to search but did not find/understand what they are or where to download them. :/

JanWillem32
2nd January 2014, 03:10
James Freeman, many chroma up-sampling and other resizing methods already have sharpening internally. Modifying any sharpening factors on these is easy. The sharpest chroma up-samplers (I implemented several) are not always the best-looking though. The sharpness factors are just one part of good scaling. What processing chain did you use for this test? It's possible that all you want is a different chroma upsampler.

pirlouy, "dynamic contrast" just compresses image dynamics or changes gamma without using much logic. Using "precise algorithm" to describe such an effect is quite an overstatement.
The CIECAM02 transforms were already integrated in the EVR-CP/VMR9 r. color management options I helped to integrate. For the new basic color correction option I added, there is no such option yet. I wrote a separate pixel shader before integrating it into newer builds to test the function. See for reference: http://forum.doom9.org/showthread.php?p=1658826#post1658826 .

James Freeman
2nd January 2014, 07:06
James Freeman, many chroma up-sampling and other resizing methods already have sharpening internally. Modifying any sharpening factors on these is easy. The sharpest chroma up-samplers (I implemented several) are not always the best-looking though. The sharpness factors are just one part of good scaling. What processing chain did you use for this test? It's possible that all you want is a different chroma upsampler.

Jan

I use madVR Lanczos/Jinc for chroma upscaling, but even that is not sharp enough for what I have in mind.
The goal of this experiment is to make the 4:2:0 chroma look like 4:4:4 chroma, which is a lot sharper and more saturated.

I simply want to have control over the sharpness of the chroma, and "Sharpness Complex 2" is the best in my taste.

I can make a short video to demonstrate what I have done so far.

EDIT:

Here is a short video where I compare 4:4:4 vs 4:2:0 vs 4:2:0+Sharpen: Chroma Sharpen Demo (http://www.mediafire.com/watch/plaj4kh59zv8jpg/Chroma%20Sharpen%20Demo.avi)
*It's a screen capture, 28 MB uncompressed rgb32.
*Please download the original video (don't watch online => quality destroyed).

The upscaler in the video is MadVR's Lanczos 3+AR which is not sharp enough.

There are two presets that I use in this video.
Chroma Only: Shows just the chroma.
Chroma Sharpen: Shows just chroma + Sharpen Complex 2 (Pre-Upscaler/resize).
*Placing the Sharpen after-upscaling does not have the same effect!!
*Note that the Sharpen Complex 2 is carefully tweaked to look like 4:4:4 and is not at its default settings.

Look at the green lines as they become more vibrant like in the original 4:4:4.
Look at the red/cyan fine lines as they become more distinct and colorful like in the original 4:4:4.
You can clearly see that 4:2:0+Sharpen looks a lot closer to the original 4:4:4.

Leaving the Luma untouched to prevent ringing and artifacts, just a better looking, more "True to the Source" chroma.


EDIT 2:

An even clearer video of this "trick"
Chroma Sharpen Demo 2 (http://www.mediafire.com/watch/yzozc6mb7ujxcbo/Chroma%20Sharpen%20Demo%202.avi)

Here you can clearly see that sharpening the chroma makes "magic" in fine details, almost recreating the missing data.

Thanks for your time.

XRyche
2nd January 2014, 09:59
My TV does not have any sensor like this; it really is a modification with a precise algorithm.

@JanWillem32: Your example code works well. But what are those CIECAM02 shaders you're talking about ? I tried to search but did not find/understand what it is or where to download these shaders ? :/

The shaders are located in 2 text documents in the compressed file (.7z) of JanWillem32's latest build of his renderer, found here: http://forum.doom9.org/showthread.php?t=161047 . At the bottom of the very first post.

toniash
2nd January 2014, 14:03
The shaders are located in 2 text documents in the compressed file (.7z) Of JanWillem32's latest build of his renderer found here: http://forum.doom9.org/showthread.php?t=161047 . At the bottom of the very first post.

What's the name of the shaders? I can't find them ...

JanWillem32
2nd January 2014, 14:07
James Freeman, what exact processing chain did you use? When feeding textures containing Y'CbCr into the "Sharpen Complex 2" shader it does not exclude the Y' channel without modifications.
Also note that Lanczos filtering is sharp, but it's not the sharpest resizer. There are other options that sharpen more, but as expected, these give heavy artifacts.

James Freeman
2nd January 2014, 15:11
Jan,

My chain in MPC-HC:
http://www.mediafire.com/convkey/b68f/n6svgk5389vsghyfg.jpg

There is no luma in those videos, just chroma.
That is why I ask you for help.
There is currently no sharpen shader only for chroma.

We need something that will separate Y' from CbCr, apply sharpening only to CbCr, then combine them into Y'CbCr again without touching the Y'.

Lanczos filtering is sharp, but it's not the sharpest resizer. There are other options that sharpen more, but as expected, these give heavy artifacts.
Lanczos is the sharpest resizer in madVR (to my eyes), but not sharp enough.
That's why I use Sharpen Complex 2, to carefully sharpen the chroma image without adding unwanted artifacts.
But as stated, there is currently no way to combine my chain back with the untouched Y' (luma) and the already-sharpened chroma.

JanWillem32
2nd January 2014, 23:34
That actually explains a lot. "chroma for SD&HD input" is a test shader. It converts R'G'B' to YCbCr, discards original luma, multiplies chroma by 1.5 (to make it more visible), and converts back to R'G'B' with the Y' channel fixed at .5. The output isn't chroma, but just weird R'G'B'. To get real Y'CbCr, use a "R'G'B' to Y'CbCr"-type shader. You can then really filter chroma. I've already written Y'CbCr filtering shaders. Doing any filtering in Y'CbCr is risky. The Y'CbCr forms don't directly specify any color; all specifications are dependent on R'G'B'/RGB color space (which in turn has specifications in XYZ color space and a reference surround specification). In the category of "everyday-use effect pixel shaders" I only made chroma up-sampler shaders and the deband/denoise/sharpen effect shaders for now. (That last effect is only to deal with encoder noise and bad raw image quantization, these are really not suitable for any dramatic effects.)
When I think of effects to compensate anything of a resizer, it will probably be completely dependent on a specific resizer as a compensation factor, and also be integrated into that resizer as well. Chroma up-samplers are essentially resizers, with pretty much the same problems.
As for sharper resizers than Lanczos, there are a few bicubic variants that you might like to try. I'm not sure any of those are integrated in madVR. (I'm also not too sure that these would be really useful as well.)

James Freeman
3rd January 2014, 14:56
Jan,

If I understood you correctly, to sharpen only the chroma, this chain is necessary:

1. R'G'B' to Y'CbCr.
2. CbCr Sharpen Complex 2 (chroma only)
3. Y'CbCr to R'G'B'.

If so,
Is there any way to do #2 with any of the current shaders?
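The three steps of that chain can be sketched numerically. This is a hedged Python sketch: the BT.709 full-range coefficients are an illustrative assumption (renderer chains may use limited-range variants), and a plain chroma gain stands in for the actual Sharpen Complex 2 kernel, which is a neighborhood filter rather than a per-pixel scale:

```python
# BT.709 luma coefficients (full-range matrix is an illustrative choice)
KR, KG, KB = 0.2126, 0.7152, 0.0722

def rgb_to_ycbcr(r, g, b):
    """Step 1: R'G'B' to Y'CbCr."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))
    cr = (r - y) / (2.0 * (1.0 - KR))
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Step 3: Y'CbCr back to R'G'B' (exact inverse of the matrix above)."""
    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b

def sharpen_chroma_only(rgb, gain=1.2):
    """Step 2: filter only Cb/Cr; Y' passes through untouched."""
    y, cb, cr = rgb_to_ycbcr(*rgb)
    return ycbcr_to_rgb(y, cb * gain, cr * gain)
```

Converting the output back to Y'CbCr shows the luma channel is bit-for-bit the input luma, which is the whole point of the chain.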


I also modified the "Chroma for SD&HD video input" shader to show only pure chroma (0.5 => 0 and 1.5 => 1.0).
The image looks Black, Red & Blue (as expected).

Now I have tweaked the Sharpen shader even more accurately.
It looks even closer to 4:4:4 now, almost identical.

Jan,
How hard would it be to insert a "Sharpen Complex 2" that only affects the CbCr data into the above 1, 2, 3 chain,
and make a single shader that does this?

JanWillem32
4th January 2014, 03:21
That chain is correct. To simplify it for other Y'CbCr-type shaders, I've merged the Y'CbCr to R'G'B' step (and onward standard color transforms) into the previous shader for some of the chains I designed earlier. The first shader is better off being separate, as the next step takes multiple samples. (If a shader takes only one sample, you can always merge it with the previous shader. Even merging several shaders is possible.)
The shaders for Y'CbCr I've already written are chroma up-samplers and the deband/denoise/sharpen effects. Both have sharpening to partially compensate some of the base problems of the main filter.
'Simple' sharpening filters are useful for raw, high-quality video and images straight from the camera, in a decent color space, to compensate for areas that should be more in the foreground, but seem a bit 'weak' (determined by the person that does the mastering).
I see no place for such 'simple' sharpening outside of that context. The various Y'CbCr forms are only used to encode color, and you need some filters in Y'CbCr stages to filter some of the encode's problems and later on convert it to R'G'B'. However, Y'CbCr does a terrible job at actually describing a uniform or human vision-relative color space. Every flaw in a mixer's/renderer's Y'CbCr filtering stages becomes amplified a lot in terms of visible artifacts. You can't really do much with it in filters directly.

vood007
4th January 2014, 09:31
Jan, is it possible to have a single color plane with a fixed size in the background and the actual video centered on top of it? In other words: when the player window is bigger than the video and scaling is turned off, the video is obviously surrounded by a black frame. Can we fill this frame with a color other than black?

JanWillem32
4th January 2014, 22:39
If an interface allows you to set a background color in the video renderer, it's easy. I implemented such a function a while ago for the renderers I edit (miscellaneous tab). For the cases that require a special approach, use the "letterbox" and "pillarbox" shaders.

toniash, sorry for the late response. I simply overlooked your question earlier. The CIECAM02 shaders are included along with the five tester builds at the bottom of the first post of that other thread. (It's not included with the source code archive.)

vood007
5th January 2014, 20:22
If an interface allows you to set a background color in the video renderer, it's easy. I implemented such a function a while ago for the renderers I edit (miscellaneous tab). For the cases that require a special approach, use the "letterbox" and "pillarbox" shaders.

What you added to the misc tab of your study build is exactly what I want, so how do we get this into the main MPC-HC branch?

JanWillem32
6th January 2014, 00:47
Sorry, but I don't edit the video renderers in the main branch anymore. The mixer classes part of the internal renderers (VMR-9 r., EVR CP, RealMedia DX9, QuickTime DX9) and the two renderer sockets were reasonably well written, so I only had to modify those. The main renderer is beyond salvation. I tried to modify it at first. Some of the simple patches I made a few years ago worked well, so those were integrated. When I tried to edit more parts, disaster upon disaster followed.

Once I finally had enough, I decided to no longer edit it. I integrated a renderer I wrote/edited earlier that worked, and got rid of the old one. That broke pretty much all compatibility with the old code parts. It took weeks to get it to compile, and a bit longer to run well. I'm very satisfied with the results. I even integrated some complicated functions without too much of a hassle.

The main branch renderer and the one I wrote are similar in settings and menus, and I didn't sacrifice original functionality (although I'd rather not have transplanted the original "VSync" functions). However, I still modified several megabytes of code. Integration will not be easy. Smaller parts of the code can go into the main branch as patches without a problem, but the main renderer change and all of its functionality would be one huge patch.

vood007
6th January 2014, 01:55
Not a big problem, there's still good old FFDShow. Anyway, thanks for answering.

turbojet
7th January 2014, 08:03
James Freeman: Try defining chroma coefficients in the developer section of the LumaSharpen shader (http://dropcanvas.com/l46x9) to sharpen only chroma. It sharpens much like Sharpen Complex 2, but with fewer edge artifacts.

James Freeman
10th January 2014, 19:17
James Freeman: Try defining chroma coefficients in the developer section of the LumaSharpen shader (http://dropcanvas.com/l46x9) to sharpen only chroma. It sharpens much like Sharpen Complex 2, but with fewer edge artifacts.

Thanks.

Can you please guide me how to do this?

I tested the MPC LumaSharpen and it's very nice!!

turbojet
11th January 2014, 01:04
I don't know the chroma coefficients, maybe someone else does?

I think just this line needs to be commented out and a new line inserted in its place:
#define CoefLuma float4(0.2126, 0.7152, 0.0722, 0)

Unrelated, but there's an interesting comment that might be worth trying for someone with the knowledge:
// -- Clamping the maximum amount of sharpening to prevent halo artifacts --
sharp_luma = clamp(sharp_luma, -sharp_clamp, sharp_clamp); //TODO Try a curve function instead of a clamp
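For reference, the `CoefLuma` vector above holds the BT.709 luma weights. One plausible way a chroma-only variant could use them is to split each pixel into luma and a per-channel chroma residual and sharpen only the residual. This is a hedged sketch of that split, not LumaSharpen's actual internals:

```python
COEF_LUMA = (0.2126, 0.7152, 0.0722)  # BT.709, matching the CoefLuma define

def split_luma_chroma(rgb):
    """Split an R'G'B' pixel into its luma and the chroma residual per channel."""
    luma = sum(c * v for c, v in zip(COEF_LUMA, rgb))
    chroma = tuple(v - luma for v in rgb)  # what a chroma-only sharpen would touch
    return luma, chroma

# a neutral gray has a zero chroma residual, so sharpening it would change nothing
luma, chroma = split_luma_chroma((0.5, 0.5, 0.5))
```

Since the three weights sum to 1, adding the (filtered) residual back onto the untouched luma reconstructs the pixel.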

James Freeman
11th January 2014, 17:20
JanWillem32,

I have been using a small Avisynth script to create black frame insertion, to see how a video looks without motion blur.
It looks fantastic (leaving the flicker aside)!!
But it takes a huge amount of memory and is not fluid (stutters).

Is it possible to write a simple shader that replaces 1 or 2 frames out of every few display refreshes with black?

for example:
72Hz monitor.
1, B, B, 4, B, B, 7, B, B, 10, B, B etc... to 72.
The goal is to show only One out of Three frames.
or
1, 2, B, 4, 5, B, 7, 8, B, 10, 11, B etc... to 72.
show only 2 out of 3 frames.

This way after each movie frame (24fps) the display will show 1 or 2 black frames to eliminate motion blur.
There is no need to take the movie frame rate into consideration, only the monitor refresh rate.

It would also be nice if the shader could automatically read the display refresh rate, or let the user select how many black frames to insert after how many normal frames.
Of course it should be in-sync with the monitor refresh rate.

Are you familiar with Backlight Strobing or BFI?
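The two example patterns are just a modulo rule on the display refresh index. A minimal Python sketch of the schedule (only the pattern; the actual presentation timing would have to live in the renderer):

```python
def bfi_schedule(refreshes, shown_per_cycle, cycle=3):
    """For each display refresh, give the 1-based refresh number to present,
    or None where a black frame is inserted instead."""
    return [i + 1 if i % cycle < shown_per_cycle else None
            for i in range(refreshes)]

# 72 Hz, show 1 of every 3 refreshes: 1, B, B, 4, B, B, ...
# 72 Hz, show 2 of every 3 refreshes: 1, 2, B, 4, 5, B, ...
```

With a 24 fps source on a 72 Hz display, either schedule keeps the shown frames locked to the source cadence.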

nevcairiel
11th January 2014, 17:36
Shaders work on a given image, I don't think they can produce new images.

JanWillem32
11th January 2014, 21:16
Correct, this is more a task for an internal renderer routine. A pixel shader can change entire frames into solid colors or gradients, and with some temporal effects using the frame and time counters, but actually presenting more output frames per second has to be done by the host renderer.
I can easily (temporarily) add this function to the 'Alternative Scheduler' renderer function (requires Vista or newer) for the renderer I'm editing if you just want to test a more lightweight solution than using Avisynth for this case.
As for these presenting techniques, some parts are easy, some are not:
-Setting a frame to black before presenting it is very easy.
-Setting the amount of refreshes for a presented frame is not supported under XP, handled by a nasty function under Vista in windowed mode and can be handled normally under 7 and newer in windowed mode, plus under Vista and newer in D3D full-screen exclusive mode.
-Frame-time, jitter and other presentation statistics require managing a lot of code.
-The frame time stamps of incoming video source frames are unreliable. The average over a lot of frames does add up nicely to a frame-time clock, but individual time stamps will often be off by several milliseconds. That means that a renderer has to estimate for how long a frame should be presented, and then also implement scheduling for getting a matching input frame rate to display refresh rate. All of this stuff is also implemented along with the statistics parts, as you always have to compensate for jitter and imprecision of timing data in the long run.
-Getting the monitor refresh rate is difficult. The pixel clock of the video card is actually really stable, but most software interfaces (even the ones that seem to have decent precision) report its rate incorrectly. The few that do work have low precision and will fail sometimes.
-The video renderer runs its main loop for every incoming frame (and once in a while for a repeated frame in paused mode). The display refresh rate only comes into play with presenting, after the scheduling functions/constant frame interpolator correlated it with the video frame rate (with compensation). It really doesn't work the other way around.
All in all, the various presentation functions usually take a lot of code to handle. Black frame insertion is easy to add to the 'Alternative Scheduler' renderer function, because of its method. Backlight strobing has to be handled by the hardware of the display device. The data stream from the source device doesn't include such a control.
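The timestamp remark can be illustrated numerically: individual frame deltas jitter by milliseconds, but an estimate over many frames converges on the true frame period. A hedged sketch (a toy model with uniform jitter, not the renderer's actual scheduler):

```python
import random

def estimate_frame_period(timestamps):
    """Average frame period (ms) from noisy source time stamps."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return sum(deltas) / len(deltas)

random.seed(42)
true_period = 1000.0 / 24.0  # 24 fps source, ~41.67 ms per frame
# each individual stamp may be off by several milliseconds
stamps = [i * true_period + random.uniform(-3.0, 3.0) for i in range(500)]
estimate = estimate_frame_period(stamps)  # close to true_period despite jitter
```

Because the deltas telescope, the estimate's error shrinks with the number of frames observed, which is why the average "adds up nicely" while single stamps cannot be trusted.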

James Freeman
11th January 2014, 23:00
Thanks Jan.

I can see it's a big hassle, so never mind.

I've played enough with avisynth BFI to be convinced of this effect.
I think I should buy a proper Backlight Strobing monitor.

Though, until an IPS panel with 120 Hz and backlight strobing exists, I'm waiting.

JanWillem32
17th January 2014, 02:26
I was asked to improve the quality of the sphere shader, so I did a bit of work on the version I already edited. It's a bit more complicated, so it takes a lot more instructions compared to the original, but it's worth it.
If someone can help a bit with the old cinema film grain effect shader with colorimetry data of silver grains, sepia toning and such, or with documented methods that could help (that don't use external textures), I would appreciate it. I'm a bit stuck at getting significantly better results from prototype shaders compared to the already available shaders unfortunately. I had to guess a lot of variables, so that's one of the problems. Another problem is getting multiple grain sizes on screen, with decent randomness.

// (C) 2011-2014 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// sphere, Catmull-Rom spline44 interpolated
// This shader should be run as a screen space pixel shader if you are up-scaling.
// This shader should not be run as a screen space pixel shader if you are down-scaling.
// This shader requires compiling with ps_2_a, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
// Use this shader to apply an effect that looks like projecting the video from a rotating sphere.

// fractions, either decimal or not, are allowed
// rotation speed, in rotations per second
#define rs .125
// border canvas size
#define bs 7.75
// border gamma factor
#define bf 128.
// light position
#define pl float3(6., -6., -7.)
// light intensity
#define cl .5
// light size
#define sl 64.
// camera position
#define pc float3(0., 0., -1.)
// sphere position
#define ps float3(0., 0., .75)
// sphere radius
#define ra acos(-1.)*.25
// base size constant
#define Ai 1.

sampler s0;
float4 c0 : register(c0);
float2 c1 : register(c1);
#define sp(a, b, c) float3 a; {float2 tmp = tex+c1*float2(b, c); a = tex2D(s0, frac(abs(float2(tmp.x, abs(tmp.y)*-1.+1.)))).rgb;}

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float ar = c0.x*c1.y;// aspect ratio
tex.x = tex.x*ar+.5-.5*ar;// surface rectangle to square
// relate the sphere to the camera position
float3 pd = normalize(float3(tex.xy-.5, 0.)-pc);
float B = dot(pd, 2.*(pc-ps));
float C = dot(pc-ps, pc-ps)-pow(ra, 2);
float D = B*B-4.*Ai*C;// calculate the sphere
if(D < 0.) return 0.;// output black borders, only map if things are inside the sphere

// intersection data
float3 p = pc+pd*(-sqrt(D)-B)*.5/Ai;
float3 n = normalize(p-ps);
float3 l = normalize(pl-p);

float2 cd = .75*(.5-tex);// measure the distance to the image center
tex = acos(-n.xy)/acos(-1.);// mapping the image onto the sphere
tex.x = frac(tex.x+c0.w*rs);// rotation
tex.y = (tex.y-.5)*ar-.5;// aspect ratio correction
tex *= c0.xy;// normalize to texture size in pixels
float2 t = frac(tex);// calculate the difference between the output pixel and the original surrounding two pixels
tex = (tex-t+.5)*c1+float2(1., 0.);// make the sampling position line up with an exact pixel coordinate for L1, normalized to the interval (1, 2), not (0, 1) as we want texture wrapping in this case

// weights
float2 t2 = t*t, t3 = t2*t;
float4 w13 = t3.xyxy*float2(1.5, .5).xxyy+t2.xyxy*float2(-2.5, -.5).xxyy;
float4 w02 = t3.xyxy*float2(-.5, -1.5).xxyy+t2.xyxy*float2(1., 2.).xxyy+t.xyxy*float2(-.5, .5).xxyy;
w13.xw += 1.;

// original pixels
sp(L0, -1., -1.) sp(L1, -1., 0.) sp(L2, -1., 1.) sp(L3, -1., 2.)
sp(K0, 0., -1.) sp(K1, 0., 0.) sp(K2, 0., 1.) sp(K3, 0., 2.)
sp(J0, 1., -1.) sp(J1, 1., 0.) sp(J2, 1., 1.) sp(J3, 1., 2.)
sp(I0, 2., -1.) sp(I1, 2., 0.) sp(I2, 2., 1.) sp(I3, 2., 2.)

// vertical interpolation
float3 Q0 = L0*w02.y+L1*w13.y+L2*w02.w+L3*w13.w;
float3 Q1 = K0*w02.y+K1*w13.y+K2*w02.w+K3*w13.w;
float3 Q2 = J0*w02.y+J1*w13.y+J2*w02.w+J3*w13.w;
float3 Q3 = I0*w02.y+I1*w13.y+I2*w02.w+I3*w13.w;
float3 P0 = Q0*w02.x+Q1*w13.x+Q2*w02.z+Q3*w13.z;// horizontal interpolation

return ((P0+cl*pow(max(dot(l, reflect(pd, n)), 0.), sl))*min(pow(bs*dot(cd, cd), -bf), 1.)*dot(n, l)).rgbb;// add specular mapping, sphere edges, diffuse mapping and then output
}

// (C) 2011-2014 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// sphere, Catmull-Rom spline44 interpolated on 16-bit integer surfaces
// This shader should be run as a screen space pixel shader if you are up-scaling.
// This shader should not be run as a screen space pixel shader if you are down-scaling.
// This shader requires compiling with ps_2_a, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
// Use this shader to apply an effect that looks like projecting the video from a rotating sphere.

// fractions, either decimal or not, are allowed
// rotation speed, in rotations per second
#define rs .125
// border canvas size
#define bs 7.75
// border gamma factor
#define bf 128.
// light position
#define pl float3(6., -6., -7.)
// light intensity
#define cl .5
// light size
#define sl 64.
// camera position
#define pc float3(0., 0., -1.)
// sphere position
#define ps float3(0., 0., .75)
// sphere radius
#define ra acos(-1.)*.25
// base size constant
#define Ai 1.

sampler s0;
float4 c0 : register(c0);
float2 c1 : register(c1);
#define sp(a, b, c) float3 a; {float2 tmp = tex+c1*float2(b, c); a = tex2D(s0, frac(abs(float2(tmp.x, abs(tmp.y)*-1.+1.)))).rgb;}

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float ar = c0.x*c1.y;// aspect ratio
tex.x = tex.x*ar+.5-.5*ar;// surface rectangle to square
// relate the sphere to the camera position
float3 pd = normalize(float3(tex.xy-.5, 0.)-pc);
float B = dot(pd, 2.*(pc-ps));
float C = dot(pc-ps, pc-ps)-pow(ra, 2);
float D = B*B-4.*Ai*C;// calculate the sphere
if(D < 0.) return 0.;// output black borders, only map if things are inside the sphere

// intersection data
float3 p = pc+pd*(-sqrt(D)-B)*.5/Ai;
float3 n = normalize(p-ps);
float3 l = normalize(pl-p);

float2 cd = .75*(.5-tex);// measure the distance to the image center
tex = acos(-n.xy)/acos(-1.);// mapping the image onto the sphere
tex.x = frac(tex.x+c0.w*rs);// rotation
tex.y = (tex.y-.5)*ar-.5;// aspect ratio correction
tex *= c0.xy;// normalize to texture size in pixels
float2 t = frac(tex);// calculate the difference between the output pixel and the original surrounding two pixels
tex = (tex-t+.5)*c1+float2(1., 0.);// make the sampling position line up with an exact pixel coordinate for L1, normalized to the interval (1, 2), not (0, 1) as we want texture wrapping in this case

// weights
float2 t2 = t*t, t3 = t2*t;
float4 w13 = t3.xyxy*float2(1.5, .5).xxyy+t2.xyxy*float2(-2.5, -.5).xxyy;
float4 w02 = t3.xyxy*float2(-.5, -1.5).xxyy+t2.xyxy*float2(1., 2.).xxyy+t.xyxy*float2(-.5, .5).xxyy;
w13.xw += 1.;

// original pixels
sp(L0, -1., -1.) sp(L1, -1., 0.) sp(L2, -1., 1.) sp(L3, -1., 2.)
sp(K0, 0., -1.) sp(K1, 0., 0.) sp(K2, 0., 1.) sp(K3, 0., 2.)
sp(J0, 1., -1.) sp(J1, 1., 0.) sp(J2, 1., 1.) sp(J3, 1., 2.)
sp(I0, 2., -1.) sp(I1, 2., 0.) sp(I2, 2., 1.) sp(I3, 2., 2.)

// vertical interpolation
float3 Q0 = L0*w02.y+L1*w13.y+L2*w02.w+L3*w13.w;
float3 Q1 = K0*w02.y+K1*w13.y+K2*w02.w+K3*w13.w;
float3 Q2 = J0*w02.y+J1*w13.y+J2*w02.w+J3*w13.w;
float3 Q3 = I0*w02.y+I1*w13.y+I2*w02.w+I3*w13.w;
float3 P0 = Q0*w02.x+Q1*w13.x+Q2*w02.z+Q3*w13.z;// horizontal interpolation

return ((P0*65535./32767.-16384./32767.+cl*pow(max(dot(l, reflect(pd, n)), 0.), sl))*min(pow(bs*dot(cd, cd), -bf), 1.)*dot(n, l)*32767./65535.+16384./65535.).rgbb;// add specular mapping, sphere edges, diffuse mapping and then output
}

romulous
31st January 2014, 13:58
Hi JanWillem32,

Blight is doing some work to integrate the rotation shader you wrote on page 20 into Zoom Player. He just had a couple of questions he wanted me to ask you if you don't mind. Some quoted text first, and then the question afterwards.

Question #1:

Note that geometry changes due to rotation usually require manual compensation, as the video renderer host usually doesn't compensate for it automatically.

Where does the manual compensation have to be performed?

Question #2:

// Note that when enabling diagonal flipping, the resizing factor has to be lowered in advance to allow the full image to be visible.

Where do you set the resize factor?

Thanks!

hadi79
31st January 2014, 18:10
Hi guys,

Could anyone help me?

Why this shader doesn't work on my MPC?
This is the error: error X3507: 'main': Not all control paths return

link (http://dl.dropboxusercontent.com/s/fh3mqtqcylhe2r4/discman.txt)

JanWillem32
1st February 2014, 02:40
hadi79, I don't see why that shader wouldn't compile. It could use a few changes, though. Change "static float" to "static const float", as it's a compile-time constant. Change "1.0f" to "1.0" or "1.", as the 'f' or 'F' suffixes are considered bad style when working with HLSL. Change "#define BlockCount 75" to "#define BlockCount 75.0" or "#define BlockCount 75.", as it's never used as an integer. The HLSL compiler doesn't reliably do implicit type casts when compiling. Always specify floating-point values as floating-point, or use explicit type casts. (I really think all C, C++ and HLSL compilers should warn about all implicit type casts.)
The effect itself is actually quite a funny one. It's pretty basic, though. (It doesn't pre-process with a low-pass, nor does it apply anti-aliasing on its mask.)

If PS 3.0 code isn't a problem, you can also add branching to optimize out the sampling of the zero-value output pixels:

#define BlockCount 75.
#define Min .2
#define Max .45
sampler2D input : register(S0);

// static constants used as optimization
static const float BlockSize = 1./BlockCount;

float4 main(float2 uv : TEXCOORD) : COLOR
{
// Calculate block center
float2 blockPos = floor(uv*BlockCount);
float2 blockCenter = blockPos*BlockSize+BlockSize*.5;

// Round the block by testing the distance of the pixel coordinate to the center
float dist = length(uv-blockCenter)*BlockCount;
[branch] if(dist < Min) {
return 0.;}
else if(dist > Max) {
return 0.;}
else {// Sample color at the calculated coordinate
return tex2Dlod(input, float4(blockCenter, 0., 0.));}
}
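As an illustration of the mask anti-aliasing I mentioned (just a sketch, untested in an actual renderer): fwidth() estimates how fast the distance value changes per output pixel, so the ring edges can be faded over roughly one pixel instead of being cut off by the branches. This also requires ps_3_0, for fwidth() and tex2Dlod().

```hlsl
#define BlockCount 75.
#define Min .2
#define Max .45
sampler2D input : register(S0);

static const float BlockSize = 1./BlockCount;

float4 main(float2 uv : TEXCOORD) : COLOR
{
	// Calculate block center
	float2 blockCenter = (floor(uv*BlockCount)+.5)*BlockSize;
	// distance of the pixel coordinate to the block center, in block units
	float dist = length(uv-blockCenter)*BlockCount;
	// per-pixel edge width estimate for the anti-aliasing ramps
	float aa = fwidth(dist);
	// 1 inside the ring, fading to 0 over about one pixel at both edges
	float mask = smoothstep(Min-aa, Min+aa, dist)*(1.-smoothstep(Max-aa, Max+aa, dist));
	// Sample color at the block center and apply the anti-aliased mask
	return tex2Dlod(input, float4(blockCenter, 0., 0.))*mask;
}
```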
romulous, setting the resizing factor is usually handled by the host player. Most video renderers do have a default setting which just resizes video to the full window size plus letterboxing/pillarboxing it to maintain the intended aspect ratio of the video source.
For standard interfaces to renderers there are these calls:
EVR/MF interface control:
IMFVideoDisplayControl::SetVideoPosition()
Standard DirectShow interface controls:
IBasicVideo::SetDefaultSourcePosition()
IBasicVideo::SetDestinationPosition()
IBasicVideo also has a lot of other calls to handle input/output video size for a video renderer
IVideoWindow::SetWindowPosition()
IVideoWindow also has a few other calls to handle the state of the presentation window

The EVR also natively supports rotation, but I've never worked with that, so I don't know how it behaves.

As for resizing factors, if you take the simple example of a 1920×1080 image on a 1920×1080 window without resizing or repositioning, your source and destination rectangles will simply be {0, 0, 1920, 1080}, {0, 0, 1920, 1080} (left, top, right, bottom).
When you use diagonal flipping and you intend to keep the image in the same aspect ratio, the destination rectangle will have to change. For this case, {420, 0, 1500, 1080} would be suitable if you want a centered image. (That's a resizing factor of 9/16.)

For the case of a DirectX 9 renderer that you can edit, you can just set different vertices. (Note that when you can edit vertices, you don't need the rotation shader at all.)
Example for a very basic FVF-style (D3DFVF_XYZRHW | D3DFVF_TEX1) pre-transformed vertex statement in a top-left, top-right, bottom-left, bottom-right index order:

struct CUSTOMVERTEX_TEX1 {
float x, y, z, rhw, u, v;
};
float hres = 1920.f, vres = 1080.f;
// lists for DrawIndexedPrimitive() or DrawPrimitive()
// standard
CUSTOMVERTEX_TEX1 v[4] = {
{ -.5f, -.5f, 0.f, 1.f, 0.f, 0.f},
{hres-.5f, -.5f, 0.f, 1.f, 1.f, 0.f},
{ -.5f, vres-.5f, 0.f, 1.f, 0.f, 1.f},
{hres-.5f, vres-.5f, 0.f, 1.f, 1.f, 1.f}};
// flipped horizontally
CUSTOMVERTEX_TEX1 v[4] = {
{hres-.5f, -.5f, 0.f, 1.f, 0.f, 0.f},
{ -.5f, -.5f, 0.f, 1.f, 1.f, 0.f},
{hres-.5f, vres-.5f, 0.f, 1.f, 0.f, 1.f},
{ -.5f, vres-.5f, 0.f, 1.f, 1.f, 1.f}};
// flipped vertically
CUSTOMVERTEX_TEX1 v[4] = {
{ -.5f, vres-.5f, 0.f, 1.f, 0.f, 0.f},
{hres-.5f, vres-.5f, 0.f, 1.f, 1.f, 0.f},
{ -.5f, -.5f, 0.f, 1.f, 0.f, 1.f},
{hres-.5f, -.5f, 0.f, 1.f, 1.f, 1.f}};
// flipped horizontally and vertically
CUSTOMVERTEX_TEX1 v[4] = {
{hres-.5f, vres-.5f, 0.f, 1.f, 0.f, 0.f},
{ -.5f, vres-.5f, 0.f, 1.f, 1.f, 0.f},
{hres-.5f, -.5f, 0.f, 1.f, 0.f, 1.f},
{ -.5f, -.5f, 0.f, 1.f, 1.f, 1.f}};
// note: diagonal flipping requires adaptation of the input-to-output resolution
// flipped diagonally over the top-left-to-bottom-right axis
CUSTOMVERTEX_TEX1 v[4] = {
{ -.5f, -.5f, 0.f, 1.f, 0.f, 0.f},
{ -.5f, vres-.5f, 0.f, 1.f, 1.f, 0.f},
{hres-.5f, -.5f, 0.f, 1.f, 0.f, 1.f},
{hres-.5f, vres-.5f, 0.f, 1.f, 1.f, 1.f}};

As a last note, the IDirect3DDevice9::StretchRect() function can do nearest-neighbor and bilinear image resizing, but it does not support rotation. Rotation requires a call to DrawIndexedPrimitive() or DrawPrimitive() on a correctly configured device, optionally with a resizing shader enabled for that stage.
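For comparison, when the vertices can't be edited, a pure pixel shader can only remap its sampling coordinates. A minimal (hypothetical) 90-degree clockwise rotation sketch looks like this; the host still has to swap the output width and height itself, which is exactly the compensation problem discussed above:

```hlsl
sampler s0;

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	// rotate 90 degrees clockwise by swapping the axes and mirroring one of them;
	// if the host doesn't swap the output resolution, the aspect ratio will be wrong
	return tex2D(s0, float2(tex.y, 1.-tex.x));
}
```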

hadi79
1st February 2014, 15:12
Thanks for your help. JanWillem32

JanWillem32
2nd February 2014, 20:07
You're welcome. If you need any help with other filtering code, such as the low-pass or anti-aliasing effects I mentioned, I'm happy to help.

romulous
7th February 2014, 09:11
Jan-Willem: Many thanks for your response to my questions from Blight - he has made some progress in integrating the script into Zoom thanks to your help :)

Regards,

romulous

romulous
12th February 2014, 07:55
Hi again Jan-Willem,

Blight had a follow-up question regarding the rotation shader. He isn't sure if you can compensate for the cropping like you described previously. He has tried, but he believes the actual content is cropped and is therefore not accessible to the video renderer. He wanted me to ask you how to make it work using IBasicVideo/IVideoWindow if the source video resolution is 1920x1080 (for example, what values to pass to the IBasicVideo/IVideoWindow interfaces).

Regards,

romulous

JanWillem32
12th February 2014, 11:49
I just reviewed the rotation shader again. I added the comment "// This shader should be run as a screen space pixel shader when enabling diagonal flipping, else both modes will work.", but that's actually not entirely correct. If the resizer section clips the input video rectangle before the rotation (common when down-sizing), a part of the input is lost. In such a case, the rotation shader can be inserted before resizing.
Note that there's no proper way to compensate if you need down-sizing in one direction and up-sizing in another. Also, positioning the input on the output rectangle can cause clipping in the resizing section as well. However, if you edit the vertices of the resizing section like I described before, none of these problems occur.

romulous
12th February 2014, 12:19
Thanks - this implementation is using madVR, which does not allow access to the vertices as far as Blight knows. I think that means we are stuck until madshi adds rotation support to madVR?

JanWillem32
12th February 2014, 14:51
I think you are correct.

leeperry
26th March 2014, 18:45
Hi Jan, would that be possible to align MPEG1 to look like MPEG2/H264 chroma via a PS script?

mVR only does MPEG2/H264 chroma alignment so this would be very handy.

I was under the impression that outputting YUY2 from ffdshow would align chroma properly but I ain't too sure anymore, I guess only AYUV & Y416 would do what I want?

Anyway, PotPlayer's seamless playback feature likes to crash randomly when not using its built-in decoders so that doesn't really solve anything :o

:thanks: for your help,

JanWillem32
27th March 2014, 13:52
Do you mean chroma cositing? http://msdn.microsoft.com/en-us/library/windows/desktop/ms694252%28v=vs.85%29.aspx
It's part of the standard of how to convert Y'CbCr to R'G'B'. The pixel shader stages accessible by the user are already R'G'B' or onward, and there's no real way to convert back if it was handled in the wrong way. I think it's best to just wait for the renderer to be fixed. I implemented support for both common forms of chroma cositing a few months ago. It's really not that hard to implement.

leeperry
27th March 2014, 15:53
HI Jan, thanks for the reply.

From http://bengal.missouri.edu/~kes25c/ddcc.zip\ddcc - ReadMe.txt
cplace -

Specifies horizontal chroma placement... used when input is yuy2. Possible settings:

0 - chroma is aligned with left pixel in each pair (mpeg2, mpeg4, h264)

1 - chroma is centered between each pair of pixels (h261, h263, mpeg1)

default: 0

I do realize that it'll require some work considering that PS scripts work in RGB, but Leak wrote a script to smooth chroma (http://forum.doom9.org/showpost.php?p=1184975&postcount=32) so it still should be possible? Or not? :o

What I would need is a PS script that would align MPEG1 chroma like MPEG2 so mVR would align it properly afterwards, basically a PS script of one of those scripts: http://avisynth.nl/index.php/External_filters#Chroma_correction
ChromaShiftSP : This script can shift chroma in all directions with subpixel accuracy.

Humm, so forcing AYUV/Y416 output from ffdshow is the only way to overcome the problem?

nevcairiel
27th March 2014, 17:10
Humm, so forcing AYUV/Y416 output from ffdshow is the only way to overcome the problem?

That is assuming ffdshow even does it correctly, especially when it comes to the 16-bit formats I wouldn't really trust it.

foxyshadis
28th March 2014, 02:18
No, forcing any output won't change anything, except RGB output. (ffdshow will sample chroma correctly.)

To perform it in pixel shader, you'd have to convert it back to YUV exactly how it was converted to RGB, then convert it back to RGB correctly. The pixel shader never sees the YUV data, because it's already done by the time you can run a shader. Given the many possible ways to convert to RGB in the renderer, that's a pretty tall order, and madshi would have to help. Presumably, that workflow would also break dithering.

You have to do it in the decoding pipeline, not the rendering. Consider enabling ffdshow's avisynth processing and using that script to process the YUV data while decoding.

DarkSpace
28th March 2014, 13:00
To perform it in pixel shader, you'd have to convert it back to YUV exactly how it was converted to RGB, then convert it back to RGB correctly.
Why? From my limited understanding, shouldn't it be enough to separate Luma from Chroma using any matrix, then shift the Chroma channels (I think by 0.25 to the left?), and then convert back to RGB using the same matrix?
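Something like this minimal sketch, perhaps (untested; the BT.709 matrix is assumed, the shift amount is a placeholder that depends on how the renderer already upsampled the chroma, and the fractional sampling offset relies on bilinear texture filtering being enabled):

```hlsl
sampler s0;
float2 c1 : register(c1);// (1/width, 1/height) of the shader stage

// horizontal chroma shift in pixels of the current stage; placeholder value
#define SHIFT .5

// forward BT.709 R'G'B' to Y'CbCr, normalized [0, 1] ranges
float3 RGBtoYCbCr(float3 c)
{
	return float3(dot(c, float3(.2126, .7152, .0722)),
		dot(c, float3(-.2126, -.7152, .9278)/1.8556),
		dot(c, float3(.7874, -.7152, -.0722)/1.5748));
}

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	float y = RGBtoYCbCr(tex2D(s0, tex).rgb).x;// keep luma in place
	float2 cbcr = RGBtoYCbCr(tex2D(s0, tex+float2(SHIFT*c1.x, 0.)).rgb).yz;// shifted chroma
	// inverse BT.709 matrix
	float3 rgb = float3(y+1.5748*cbcr.y, y-.1873*cbcr.x-.4681*cbcr.y, y+1.8556*cbcr.x);
	return rgb.rgbb;
}
```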

leeperry
28th March 2014, 17:58
OK, thanks for the replies. I also thought that it would be fairly easy, just like DarkSpace is explaining it; I even assumed that YUY2 out of ffdshow would look properly aligned, duh... I'll have to find a proper test pattern and give it a whirl.

leeperry
28th March 2014, 18:01
I'll just ask DragonQ for this sample (http://forum.doom9.org/showpost.php?p=1622754&postcount=18186) :)

JanWillem32
29th March 2014, 01:50
I have been thinking. Converting R'G'B' back to Y'CbCr is possible, but it will be in the 4:4:4 form. Given the context of video playback in this case, it is absolutely impossible to get the 4:2:2 or 4:2:0 chroma back as it was output by the decoder if any attempt at chroma upsampling was made, with the notable exception of nearest-neighbor filtering. Nearest-neighbor filtering doesn't change any values, nor does it discard any pixels. Given the 2:1 magnification factor, the sampling grid is simple as well. I already made pixel-shaded filters that exploit this trick. Both EVR and VMR-9 will use nearest-neighbor chroma upsampling in certain cases (but still convert to flawed R'G'B' anyway). Fixing the internal stages of EVR or VMR-9 isn't possible. Using the renderer to convert the flawed R'G'B' back to 4:2:2 or 4:2:0 Y'CbCr, upsampling the chroma to 4:4:4 with one of the 15 filters I wrote and converting to R'G'B' again isn't ideal, but it does work.
I'm fine with publishing the 15 chroma up-sampling filters (in both common cositing modes) if people really want to use them. If anyone would like a sneak peek, you can open the file containing the source code for the internal filters I've written. It's the set of initial pass shaders in the bottom half of \src\filters\renderer\VideoRenderers\InternalDX9Shaders.h . However, the preferred fix for the problem is still, of course, to edit the renderer. As I said before, if you have access to the Y'CbCr mixing stages, implementing chroma upsampling for both common modes of cosited chroma is easy.

leeperry (first part), I advise not giving much regard to Leak's shader. None of the filtering stages do anything correctly. That shader should never be used.
As for the set of Avisynth filters, I can port a few on request. I guess that some of them could be useful.

foxyshadis, both madVR and my renderer in quality mode keep quantization high enough to generally not cause more quantization artifacts than are present in the source video. To be more specific: operations are done in single-precision floating-point, and intermediate storage of pixels in textures is formatted as half-precision floating-point (lowest quality), 16-bit unsigned integer (very decent quality) or single-precision floating-point (the same as the processing format, mostly useful for debugging because of the high memory consumption). Between filtering stages, pixels are never dithered. Dithering is only done in the very last filtering stage, as the backbuffer presented on the video adapter's output is merely 10-bit R'G'B', or worse, 8-bit R'G'B'. Breaking dithering stages is an artificial problem of all those old methods that employ poor quantization and vapid integer color processing methods of a bygone era.
In short, the video renderer stages don't have to bother with dithering at all until creating the output image for the video port.
On top of that, I'm really not a fan of moving any filters that the renderer should handle to some intermediate filtering pass attached to the decoder. Taking raw output from the decoder to the renderer is the best situation, partially because of the reasons I already noted. Sadly, I'm stuck at the moment with the VMR-9, EVR, RealMedia and QuickTime mixers, which are extremely limited in terms of input formats and various other capabilities. The renderer actually far exceeds the capabilities of the mixers. For madVR there are no such limitations, but that comes at the cost of the minimum processing efficiency of the mixer stages.

leeperry (second part), YUY2 (8-bit 4:2:2 Y'CbCr) has the same chroma cositing issues as any other 4:2:0 or 4:2:2 format.
As for the sample posted by DragonQ, I support the comment posted by nevcairiel following that sample. For what I've seen, the EVR and VMR-9 always upsample chroma using the MPEG1 method. (I actually don't understand why the MPEG2 chroma cositing method was ever implemented at all, as there are only disadvantages to it. Not that I really mind though, the cositing issues are just yet another fine example of what an absolutely terrible format Y'CbCr is and that it shoud be supplanted by a good implementation of color representation as soon as possible.)

yok833
30th March 2014, 07:06
Is it normal that when I apply LumaSharpen several times in pre- or post-resize, there is no difference compared to applying it only once? I really would like to try this shader, but I do not see any real difference (unlike when I am using complex 2 or FineSharp), so I am not sure I am using it correctly...
However, everything seems to be configured well (v1.4.1), and I have also tried to raise the sharpening in the lumasharpen.hlsl file...
Maybe the sharpening is lighter or more subtle with this shader?

turbojet
30th March 2014, 07:25
It's not normal, but there is at least one LumaSharpen 1.4.1 out there that does nothing but load the GPU. Depending on settings, it might not be very strong. If you open the file in a text editor, what's next to strength and pattern?

http://forum.doom9.org/showthread.php?t=170357 (http://forum.doom9.org/showthread.php?t=170357) is the thread for lumasharpen, better to post your reply there.

CiNcH
28th June 2014, 09:42
Hi JanWillem32,

I am currently playing around with shaders and stuff. I am not a lot into D3D and video algorithms yet. There is one thing I wonder about, though. The vertices you define for resize shaders contain the texture coordinates in screen space (so in pixels) rather than [0,1]. I wonder what difference that makes. I can't find anywhere that this is actually legal; I thought D3D would clamp that to [0,1]. What happens with those texture coordinates? Are they passed to the shader?

JanWillem32
4th July 2014, 17:23
Texture coordinates are simply scaled in the space that you declare with the vertex declaration. For the resizers I just optimized the input and output parameters to suit the routines.
The vertices for the five main stages are set up like this:

struct CUSTOMVERTEX_TEX1 {
float x, y, z, rhw, u, v;
};
...
// dCenterXo and dCenterYo already have .5 subtracted from them
float utlX = static_cast<float>(dTopLeftX + dCenterXo), utlY = static_cast<float>(dTopLeftY + dCenterYo);// offset to the top-left point
float utrX = static_cast<float>(dTopRightX + dCenterXo), utrY = static_cast<float>(dTopRightY + dCenterYo);// offset to the top-right point
float ublX = static_cast<float>(dBottomLeftX + dCenterXo), ublY = static_cast<float>(dBottomLeftY + dCenterYo);// offset to the bottom-left point
float ubrX = static_cast<float>(dBottomRightX + dCenterXo), ubrY = static_cast<float>(dBottomRightY + dCenterYo);// offset to the bottom-right point
float vidw = m_fVideoWidth - .5f, vidh = m_fVideoHeight - .5f, wndw = m_fWindowWidth - .5f, wndh = m_fWindowHeight - .5f;

__declspec(align(16)) CUSTOMVERTEX_TEX1 v[20] = {// lists for DrawIndexedPrimitive() with the number used for the BaseVertexIndex item
{-.5f, -.5f, 0.f, 1.f, 0.f, 0.f},// window size: 0
{wndw, -.5f, 0.f, 1.f, 1.f, 0.f},
{-.5f, wndh, 0.f, 1.f, 0.f, 1.f},
{wndw, wndh, 0.f, 1.f, 1.f, 1.f},
{-.5f, -.5f, 0.f, 1.f, 0.f, 0.f},// video size: 4
{vidw, -.5f, 0.f, 1.f, 1.f, 0.f},
{-.5f, vidh, 0.f, 1.f, 0.f, 1.f},
{vidw, vidh, 0.f, 1.f, 1.f, 1.f},
{utlX, utlY, 0.f, 1.f, -.5f, -.5f},// 1 pass resizers: 8
{utrX, utrY, 0.f, 1.f, vidw, -.5f},
{ublX, ublY, 0.f, 1.f, -.5f, vidh},
{ubrX, ubrY, 0.f, 1.f, vidw, vidh},
{utlX, -.5f, 0.f, 1.f, -.5f, 0.f},// 2 pass resizers x: 12
{utrX, -.5f, 0.f, 1.f, vidw, 0.f},
{ublX, vidh, 0.f, 1.f, -.5f, 1.f},
{ubrX, vidh, 0.f, 1.f, vidw, 1.f},
{-.5f, utlY, 0.f, 1.f, 0.f, -.5f},// 2 pass resizers y: 16
{wndw, utrY, 0.f, 1.f, 1.f, -.5f},
{-.5f, ublY, 0.f, 1.f, 0.f, vidh},
{wndw, ubrY, 0.f, 1.f, 1.f, vidh}
};

ryrynz
2nd September 2014, 09:00
Anyone else seen Samsung's auto depth enhancer at work on the U9000? It's IMO very effective; it brings about an almost 3D experience that I would hope other manufacturers apply to their TVs in future. With some excellent sharpeners already available, how much work would it be to release something similar?

burfadel
2nd September 2014, 09:07
Anyone else seen Samsung's auto depth enhancer at work on the U9000? It's IMO very effective; it brings about an almost 3D experience that I would hope other manufacturers apply to their TVs in future. With some excellent sharpeners already available, how much work would it be to release something similar?

That can be done; it's just a question of knowing how to do it, and of doing it so the performance penalty is low.

sucht
28th January 2015, 20:56
hi, is it possible to take an Avisynth script like FastLineDarken/FastLineDarkenMod and make a Pixel Shader out of it?

I want to post-process my anime DVD's in realtime without ffdshow/avisynth, because ffdshow and avisynth are crashing a lot.

We can use video pixel shaders in something like MPC-HC, so i thought i'd ask here in this thread... sorry if it was the wrong thing to do.



ciao
thomas

JanWillem32
28th January 2015, 21:14
There are some effects that are hard to implement in purely a pixel shader. For example, the rotation shader mentioned earlier in the thread can't change the output vertices to produce a 1080×1920 image from a 1920×1080 image. These filters require renderer support (which can also be arranged, by the way). Effects that don't change the size of the output image, nor require special intermediate steps in between pixel shaders (such as resizers and 3D shadow effects), can be implemented as simple pixel shaders.

sucht
29th January 2015, 00:08
Ok, so basically i'm better off using ffdshow and Avisynth for this specific stuff, because it sounds like a lot of work and time is needed to make something like this, or better yet write a shader that does something similar.


Sorry for my English, and Thank you for your Time.



ciao
thomas

JanWillem32
29th January 2015, 00:40
I think you misunderstood. All effects can be achieved by pixel shaders and in some cases custom renderer steps for when you need control over textures, texture size, orientation and such. For the example of FastLineDarken/FastLineDarkenMod; these are mere sharpeners. I've written plenty of those already. I personally don't like sharpening very much (I dislike visual artifacts of any kind that deviate from good studio quality masters), but they are reasonably easy to write.

Shiandow
29th January 2015, 00:56
Ok, so basically i'm better off using ffdshow and Avisynth for this Specific Stuff, because it sounds like alot of work and time is needed to make something like this, or better yet write a shader that does something similar.

Sorry for my English, and Thank you for your Time.

ciao
thomas

I had a quick look; it doesn't seem too difficult to make something like FastLineDarken, although it would be quite a bit easier if you set its "thinning" argument to 0. But I do agree that it would be a lot easier to make a shader that does something similar. If you just want to darken some lines, you could take some simple sharpening shader like LumaSharpen and make a small change such that it only applies sharpening when the result is darker.
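A rough sketch of that last suggestion (not LumaSharpen itself; just a plain 4-neighbour unsharp mask clamped so it can only darken, with a placeholder strength value):

```hlsl
sampler s0;
float2 c1 : register(c1);// (1/width, 1/height) of the shader stage

#define strength .6

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	float3 c = tex2D(s0, tex).rgb;
	// 4-neighbour average as the low-pass reference
	float3 blur = (tex2D(s0, tex+float2(c1.x, 0.)).rgb
		+tex2D(s0, tex-float2(c1.x, 0.)).rgb
		+tex2D(s0, tex+float2(0., c1.y)).rgb
		+tex2D(s0, tex-float2(0., c1.y)).rgb)*.25;
	float3 sharp = c+(c-blur)*strength;// standard unsharp mask
	return min(sharp, c).rgbb;// only keep the result where it darkens
}
```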

sucht
29th January 2015, 01:22
I did misunderstand you then, and I'm sorry.

I also don't like them much... but they(FastLineDarken/FastLineDarkenMod) helped some of my older Anime DVD's to look better when played with mpc-hc/madVR on my 50" 1080p TV.

I'm learning to write shaders(mostly for emulators), and that was more or less the reason of asking.

Nevertheless, i will use ffdshow and Avisynth and will try your Shaders, maybe i find something that i like, and can move on/away from ffdshow and avisynth.


and again Thank you for your Time.



ciao
thomas

XRyche
1st February 2015, 02:14
JanWillem32, your "(JW) H&V sharpen complex, deband, denoise SD-HD 16 bit FP for XLRCAM and LMS" shaders that you created a while ago always have to be used with the "initial colour mixing stage" disabled. Is there any way to modify them to work without disabling colour mixing?

JanWillem32
1st February 2015, 02:56
No, there isn't. Though I would like a more extended shader control panel with a lot more shader stages as options, e.g.: 4:2:0-, 4:2:2 and 4:4:4-Y'CbCr stages, and a post-subtitle-OSD-stats-stage. Even with those, I would certainly allow users to disable the default initial passes to be able to use custom versions.

XRyche
1st February 2015, 03:37
No, there isn't. Though I would like a more extended shader control panel with a lot more shader stages as options, e.g.: 4:2:0-, 4:2:2 and 4:4:4-Y'CbCr stages, and a post-subtitle-OSD-stats-stage. Even with those, I would certainly allow users to disable the default initial passes to be able to use custom versions.

I didn't think so. Just wanted to check. I sometimes forget to disable colour mixing before I use those shaders and it makes me scratch my head for a moment :) .

burfadel
12th May 2015, 11:45
JanWillem32, I have a question relating to the 'semi-random grayscale noise' shader, part of the video pixel shader pack. I'm using the latest MPDN from Doom9 (http://forum.doom9.org/showthread.php?t=171120), set up for quality. There is an option in the settings to use shaders through the image processor. The optional shaders are available through the extensions pack, which you extract; the files then end up in Extensions\RenderScripts\ImageProcessingShaders

Anyways, if I put your shader file in the MPC-HC shader folder and enable it in MPDN, it doesn't work. If I alter the noise setting it still does nothing, unless I set a very extreme setting like 40/64, at which point the screen turns white. Up to the point where it goes white, the picture is unchanged. This filter is, of course, set last in the chain.

I was wondering whether you could take a look at this, as the shader is very good at perceptually increasing the picture quality. The added grain is very nice when set at a low but visible strength: very small grains, and only on the luma channel.

JanWillem32
12th May 2015, 14:35
That shader uses the time and frame counters. Try the "flash every frame" and "flash every second" shaders from the development folder. If those don't work, the renderer isn't passing the counters and you should ask the MPDN team to implement them.

Shiandow
12th May 2015, 15:09
They are implemented (sort of) although they might need some work. The problem seems to be that the value of c0.w is simply too large, at the moment it's the amount of time elapsed since 00:00, 01/01/0001, in tenths of microseconds.

What kind of values do your shaders expect?

JanWillem32
12th May 2015, 18:34
-c0.x shader stage width in pixels
-c0.y shader stage height in pixels
-c0.z source video frame count from start of playback
-c0.w source video time stamp in seconds from start of playback
-c1.x reciprocal of the shader stage width in pixels
-c1.y reciprocal of the shader stage height in pixels
-c1.z (post-resize) left whole pixel offset from the video rectangle to the display device
-c1.w (post-resize) top whole pixel offset from the video rectangle to the display device
-c2.x (post-resize) left normalized [0, 1] position of the borders of the video rectangle relative to the rendering stage window rectangle
-c2.y (post-resize) top normalized [0, 1] position of the borders of the video rectangle relative to the rendering stage window rectangle
-c2.z (post-resize) right normalized [0, 1] position of the borders of the video rectangle relative to the rendering stage window rectangle
-c2.w (post-resize) bottom normalized [0, 1] position of the borders of the video rectangle relative to the rendering stage window rectangle

c0.zw are 0-based, so intro-effects can work. They are also reset once in a while to deal with 16- and 32-bit floating-point limitations when counting.
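
For illustration, a minimal MPC-HC-style pixel shader that reads these constants could look like this (a hypothetical test sketch of mine, not part of the pack; it dims the picture during every odd second of playback using c0.w):

sampler s0 : register(s0);
float4 p0 : register(c0);// x = width, y = height, z = frame count, w = time stamp in seconds
float4 p1 : register(c1);// x = 1/width, y = 1/height

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	float4 c = tex2D(s0, tex);// sample the current pixel
	// dim the picture during every odd second of playback; a visible test of the c0.w counter
	if(frac(p0.w*.5) >= .5) c.rgb *= .5;
	return c;
}

If the picture never blinks with a shader like this, the renderer isn't filling in the time stamp, which matches the behaviour described above.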

Shiandow
12th May 2015, 19:33
Ok, thanks, I'll add a quick fix for the clock parameters.

burfadel
12th May 2015, 20:12
Ok, thanks, I'll add a quick fix for the clock parameters.

I'll let you know if that has fixed the issue for me :). It's a great shader when the grain is very slight, it brings 'life' to the picture.

James Freeman
23rd May 2015, 06:46
Hi Jan.

Is there a trapezoid correction for projectors somewhere inside your filter package?

JanWillem32
25th May 2015, 17:28
There are two shaders (which I should update with newer interpolation methods some time):
"development\3LCD panel horizontal software alignment, Catmull-Rom spline6 interpolated.txt" and
"development\3LCD panel vertical software alignment, Catmull-Rom spline6 interpolated.txt".
If you need a simpler or different effect I can write a custom one.

James Freeman
27th May 2015, 15:15
Thank you.
I have found the 3LCD shader and tried to use it, but without success.
I could not create a trapezoid shape.

Maybe something simpler that allows controlling the geometry by pixel number X/Y in each corner,
like 3LCD but with all colors together.

JanWillem32
29th May 2015, 14:27
That's easy:

// (C) 2015 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// projector panel horizontal software alignment, Catmull-Rom spline4 interpolated
// This shader should be run as a screen space pixel shader.
// This shader requires compiling with ps_2_0, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
// This shader can only work when the display device receives input in its native display resolution, with over- and underscan disabled. This is to facilitate 1:1 pixel mapping.
// This shader can perform software alignment by Catmull-Rom spline4 interpolation for a projector's panels.

// fractions, either decimal or not, are allowed
// set the horizontal resolution
#define HorizontalResolution 1920.
// ShiftLeftToRight, a value of 3. will shift three pixels to the right, 0 is disabled
#define ShiftLeftToRight 64.
// ScaleHorizontal, the centered horizontal magnification factor, a value of HorizontalResolution-3. will scale the output to 3 pixels larger, a value of HorizontalResolution means disabled
#define ScaleHorizontal HorizontalResolution-128.
// Parallelogram, the centered horizontal offset factor on the bottom, a value of 3. will shift 0 pixels to the left on the top, and 3 pixels to the left on the bottom, 0 is disabled
#define Parallelogram 256.
// Keystone, the centered horizontal magnification factor on the bottom, a value of -3. will scale the output to 0 pixels larger on the top, and 3 pixels larger on the bottom, 0 is disabled
#define Keystone -192.

sampler s0;
#define sp(A, O) float4 A = tex2D(s0, float2(coord+O/HorizontalResolution, tex.y));

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float coord = (tex.x-.5)*(ScaleHorizontal+tex.y*Keystone)+.5*HorizontalResolution-ShiftLeftToRight+tex.y*Parallelogram;// assign the output position, normalized to texture width in pixels
float t = frac(coord);// calculate the difference between the output pixel and the original surrounding two pixels
coord = (coord-t+.5)/HorizontalResolution;

sp(Q0, -1.) sp(Q1, 0.) sp(Q2, 1.) sp(Q3, 2.)// original pixels
// calculate weights
float4 w0123 = ((float4(-.5, 1.5, -1.5, .5)*t+float4(1., -2.5, 2., -.5))*t+float4(-.5, 0., .5, 0.))*t;
w0123.y += 1.;
return w0123.x*Q0+w0123.y*Q1+w0123.z*Q2+w0123.w*Q3;// interpolation output
}

James Freeman
29th May 2015, 16:40
Thank you.

Though, it does not work for me.
I can't create a trapezoid shape at all.

Maybe I should be more specific?
My goal is to narrow the upper or lower part of the image to create a symmetric trapezoid.
http://etc.usf.edu/clipart/40600/40699/pb_trp_40699_lg.gif

The current shader also stretches the sides when changing shape, while the correct way is to make everything black outside the image.

JanWillem32
29th May 2015, 19:52
Same shader, but with added borders:

// (C) 2015 Jan-Willem Krans (janwillem32 <at> hotmail.com)
// This file is part of Video pixel shader pack.
// This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
// This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
// You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

// projector panel horizontal software alignment, Catmull-Rom spline4 interpolated with borders
// This shader should be run as a screen space pixel shader.
// This shader requires compiling with ps_2_0, but higher is better, see http://en.wikipedia.org/wiki/Pixel_shader to look up what PS version your video card supports.
// This shader is meant to work with linear RGB input and output. Regular R'G'B' with a video gamma encoding will have to be converted with the linear gamma shaders to work properly.
// This shader can only work when the display device receives input in its native display resolution, with over- and underscan disabled. This is to facilitate 1:1 pixel mapping.
// This shader can perform software alignment by Catmull-Rom spline4 interpolation for a projector's panels and apply borders where the signal is cut off.

// fractions, either decimal or not, are allowed
// set the horizontal resolution
#define HorizontalResolution 1920.
// ShiftLeftToRight, a value of 3. will shift three pixels to the right, 0 is disabled
#define ShiftLeftToRight 64.
// ScaleHorizontal, the centered horizontal magnification factor, a value of HorizontalResolution-3. will scale the output to 3 pixels larger, a value of HorizontalResolution means disabled
#define ScaleHorizontal HorizontalResolution-128.
// Parallelogram, the centered horizontal offset factor on the bottom, a value of 3. will shift 0 pixels to the left on the top, and 3 pixels to the left on the bottom, 0 is disabled
#define Parallelogram 256.
// Keystone, the centered horizontal magnification factor on the bottom, a value of -3. will scale the output to 0 pixels larger on the top, and 3 pixels larger on the bottom, 0 is disabled
#define Keystone -192.
// BorderColor, the color to apply at the borders created by this shader
#define BorderColor float4(0., 0., 0., 1.)

sampler s0;
#define sp(A, O) float4 A = BorderColor; {float pl = coord+O/HorizontalResolution; if(all(float2(pl, -pl) >= float2(0., -1.))) A = tex2D(s0, float2(pl, tex.y));}

float4 main(float2 tex : TEXCOORD0) : COLOR
{
float coord = (tex.x-.5)*(ScaleHorizontal+tex.y*Keystone)+.5*HorizontalResolution-ShiftLeftToRight+tex.y*Parallelogram;// assign the output position, normalized to texture width in pixels
float t = frac(coord);// calculate the difference between the output pixel and the original surrounding two pixels
coord = (coord-t+.5)/HorizontalResolution;

sp(Q0, -1.) sp(Q1, 0.) sp(Q2, 1.) sp(Q3, 2.)// original pixels
// calculate weights
float4 w0123 = ((float4(-.5, 1.5, -1.5, .5)*t+float4(1., -2.5, 2., -.5))*t+float4(-.5, 0., .5, 0.))*t;
w0123.y += 1.;
return w0123.x*Q0+w0123.y*Q1+w0123.z*Q2+w0123.w*Q3;// interpolation output
}

James Freeman
30th May 2015, 09:14
Jan, sorry to bother you, but this shader simply can't create the trapezoid shape I posted above.

JanWillem32
30th May 2015, 15:51
Really?
I get a nice bordered (red) keystone with these values:

// fractions, either decimal or not, are allowed
// set the horizontal resolution
#define HorizontalResolution 1920.
// ShiftLeftToRight, a value of 3. will shift three pixels to the right, 0 is disabled
#define ShiftLeftToRight 0.
// ScaleHorizontal, the centered horizontal magnification factor, a value of HorizontalResolution-3. will scale the output to 3 pixels larger, a value of HorizontalResolution means disabled
#define ScaleHorizontal HorizontalResolution-
// Parallelogram, the centered horizontal offset factor on the bottom, a value of 3. will shift 0 pixels to the left on the top, and 3 pixels to the left on the bottom, 0 is disabled
#define Parallelogram 0.
// Keystone, the centered horizontal magnification factor on the bottom, a value of -3. will scale the output to 0 pixels larger on the top, and 3 pixels larger on the bottom, 0 is disabled
#define Keystone -777.
// BorderColor, the color to apply at the borders created by this shader
#define BorderColor float4(1., 0., 0., 1.)

Notice "HorizontalResolution-" for ScaleHorizontal, it's a trick for auto-aligning negative keystone values.
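
For clarity (this explanation is added here, not part of the original post): the #define substitution is purely textual, so the trailing minus combines with the negative Keystone value when the compiler parses the scale expression:

// the scale expression in the shader, ScaleHorizontal+tex.y*Keystone, expands textually to:
//   HorizontalResolution- +tex.y*(-777.)
// which parses as 1920. - (+(tex.y*-777.)) = 1920.+777.*tex.y
// so the scale is exactly 1920 at the top edge (tex.y = 0) for any negative Keystone value,
// and only the bottom of the picture is shrunk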

James Freeman
30th May 2015, 18:19
Now try to create the shape I posted, with the narrow side on top.
The borders also don't touch the edge of the screen.

JanWillem32
30th May 2015, 18:41
The keystone the other way around is easy. Is this what you want?

// fractions, either decimal or not, are allowed
// set the horizontal resolution
#define HorizontalResolution 1920.
// ShiftLeftToRight, a value of 3. will shift three pixels to the right, 0 is disabled
#define ShiftLeftToRight 0.
// ScaleHorizontal, the centered horizontal magnification factor, a value of HorizontalResolution-3. will scale the output to 3 pixels larger, a value of HorizontalResolution means disabled
#define ScaleHorizontal HorizontalResolution+777.
// Parallelogram, the centered horizontal offset factor on the bottom, a value of 3. will shift 0 pixels to the left on the top, and 3 pixels to the left on the bottom, 0 is disabled
#define Parallelogram 0.
// Keystone, the centered horizontal magnification factor on the bottom, a value of -3. will scale the output to 0 pixels larger on the top, and 3 pixels larger on the bottom, 0 is disabled
#define Keystone -777.
// BorderColor, the color to apply at the borders created by this shader
#define BorderColor float4(1., 0., 0., 1.)

James Freeman
30th May 2015, 18:51
Thank you.

Magik Mark
30th May 2015, 23:06
Sorry for my ignorance, how do we apply this in MPDN player? It uses hsl extension.

foxyshadis
30th May 2015, 23:36
Sorry for my ignorance, how do we apply this in MPDN player? It uses hsl extension.

The extension doesn't matter, although it defaults the same as MPC, .hlsl. You want to select Image Processor (by itself or in a script chain), open its properties, click '+', and find the shader you want. You'll have to configure everything within the shader in notepad or similar, since there's no internal shader editor.

James Freeman
4th June 2015, 10:20
Can anyone please suggest a shader for white balance correction of R, G, B?
I tried some of the shaders in the pixel shader pack, but they also change the contrast, brightness or gamma of each color.

Is there a shader that changes just the R, G, B levels (not contrast, brightness, or gamma), for white balance correction?
This is for my not-so-good projector, in which the color management is non-existent.

Thank you.

JanWillem32
4th June 2015, 22:22
There's the Multi-stage gamma controls (http://forum.doom9.org/showthread.php?p=1650254#post1650254) shader you asked for before. For altering the high red, green and blue values I could alter the z-segment to not end on 1 (full intensity) or add contrast and brightness controls there.
What controls do you need exactly? (Note that altering this particular shader with more advanced features will take some time and effort. It's rather complicated.)

James Freeman
7th June 2015, 10:33
Hi Jan.

I need a shader that alters the RGB levels just like a TV or a monitor does.
NOT contrast, brightness or gamma; just R, G, B LEVELS (in floating point, of course).
It should affect luma and chroma, just like tweaking the display pixels themselves.

Just remove or add green (for example) across the whole range, without affecting the contrast or brightness.

Thanks

JanWillem32
7th June 2015, 17:21
I don't have a clue what to implement for a "levels" control to be honest.

The RGB controls act on a linear 0 to 1 function.
-Changing the origin of the linear 0 to 1 function changes brightness (and dislodges the black and white points from their origins in the process).
-Changing the angle of the linear 0 to 1 function changes contrast (and dislodges the white point from its origin in the process).
-Applying a power function to the linear 0 to 1 function changes gamma (and keeps the black and white points on their origins in the process).
(None of the controls are really independent.)

Higher complexity than that can involve applying S-curves to the function, but these behave similarly to gamma controls.
The Multi-stage gamma controls shader also adds higher complexity in the form of allowing the user to set way-points for the gamma controls.

Beyond these functions I don't have a clue what function to add, especially one that doesn't "alter" contrast, brightness or gamma.
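
In shader form, the three operations described above boil down to one line (a sketch of mine on linear RGB with made-up example values, not one of the pack's shaders; a per-channel Contrast slightly below 1 acts as the kind of white-balance gain asked about):

sampler s0 : register(s0);
// example per-channel values; 0. disables Brightness, 1. disables Contrast and Gamma
#define Brightness float3(0., 0., 0.)// shifts the origin of the linear function
#define Contrast float3(1., .95, 1.)// changes the angle, e.g. slightly less green for white balance
#define Gamma float3(1., 1., 1.)// power function, keeps the black and white points in place

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	float4 c = tex2D(s0, tex);
	c.rgb = pow(saturate(c.rgb*Contrast+Brightness), Gamma);// angle, then origin, then power
	return c;
}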

James Freeman
7th June 2015, 18:20
I don't have a clue what to implement for a "levels" control to be honest.
The RGB controls act on a linear 0 to 1 function.
-Changing the origin of the linear 0 to 1 function changes brightness (and dislodges the black and white points from their origins in the process).
-Changing the angle of the linear 0 to 1 function changes contrast (and dislodges the white point from its origin in the process).
-Applying a power function to the linear 0 to 1 function changes gamma (and keeps the black and white points on their origins in the process).
(None of the controls are really independent.)


Thank you!

Apparently what I need is a simple brightness control for R, G, B, which dislodges the black and white points linearly.
I even think that something like this already exists.

EDIT:
It does exist!
It's called: brightness, contrast and gamma controls for RGB
Sorry to bother you once again Jan.

nevcairiel
7th June 2015, 18:57
Most graphics drivers have such settings as well, which might be easier and more universally useful.

romulous
21st October 2015, 13:50
Hi JanWillem,

Any chance of an updated pack with all the new scripts, and changes you have made to the existing scripts over the years (the current pack is dated 2011)?

romulous

JanWillem32
21st October 2015, 18:17
I could use some help with collecting all the newer pixel shaders and putting them into .txt files. I don't have all of them on file anymore. After that, it's going to take a while for me to re-vamp all the shaders. Some use pretty outdated methods, all of them need renewed main comments and some could use some comments in the programming details.

vBm
21st October 2015, 23:56
I could use some help with collecting all the newer pixel shaders and putting them into .txt files. I don't have all of them on file anymore. After that, it's going to take a while for me to re-vamp all the shaders. Some use pretty outdated methods, all of them need renewed main comments and some could use some comments in the programming details.

IMHO the ideal would be to have a repository for the shaders on GitHub or wherever xD

andybkma
3rd December 2015, 02:57
Hi, I'm new to the whole shader thing, as Zoom Player just added support for them. My question is whether there is a shader that approximates the effects of the Avisynth script LimitedSharpenFaster (LSF) in real time? Would be nice to finally stop having to use LSF in order to drastically improve the picture quality of videos. Thanks

XRyche
4th December 2015, 00:33
Hi, I'm new to the whole shader thing, as Zoom Player just added support for them. My question is whether there is a shader that approximates the effects of the Avisynth script LimitedSharpenFaster (LSF) in real time? Would be nice to finally stop having to use LSF in order to drastically improve the picture quality of videos. Thanks

JanWillem32 is away for a bit but to my knowledge he doesn't have any shaders like that. The best sharpening shader he has is a tweaked unsharpenmask shader intended for pre-processing.

I don't think LSF can even be ported to HLSL. If it could, I imagine it would have already been done.

There is "Adaptive sharpen" http://forum.doom9.org/showthread.php?t=172131 which can be a decent shader sharpener if used right.

IanD
19th October 2017, 00:43
I'm interested in using a shader to scale 1920x1080 video to 3840x2160 using a 2x2 matrix via a lookup table, operating on each pixel in the source frame in turn, not convolving 2x2 groups of pixels (i.e. similar to point resize, but setting each pixel in the 2x2 block to a different value based on the matrix, instead of setting each pixel to the same value as the source pixel).

Can anyone point me in the direction of tutorials that explain how to do this, or explain why it can't be done this way?

My 4k LG OLED TV has quantisation issues near black, where the effective bit depth is less than 8 bits (for an alleged 10-bit panel). I want to see whether a simple spatial 2x2 dither of 8-bit 1920x1080 input to 6-bit 3840x2160 output can help ameliorate the issue, with a further development of only applying the dither to near-black pixels and using more advanced scaling for the other pixels. I'm not interested in a traditional diffusion dither, because I think LG already uses one and it creates an obviously much noisier image.
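
As a starting point, the per-position offset could be sketched as a post-resize shader running on the 3840x2160 surface, assuming an exact 2x enlargement with nearest-neighbour resizing so that each source pixel cleanly covers one 2x2 block (an untested sketch of mine; the offset magnitudes are placeholders, not a worked-out lookup table):

sampler s0 : register(s0);
float4 p0 : register(c0);// x = output width, y = output height

float4 main(float2 tex : TEXCOORD0) : COLOR
{
	float4 c = tex2D(s0, tex);// the covering source pixel, given nearest-neighbour resizing
	// derive the position of this output pixel within its 2x2 block (0 or 1 per axis)
	float2 par = floor(frac(tex*p0.xy*.5)*2.);
	// map the four block positions to offsets in [-.5, .5] and apply half a 6-bit quantization step
	float o = (par.x*2.+par.y)/3.-.5;
	c.rgb += o*.5/63.;
	return c;
}

Restricting the effect to near-black could then be a simple threshold test on c.rgb before adding the offset.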

jeanlee20xx
18th December 2017, 07:41
I want to play 2D video as SBS (side-by-side); how do I write the code for that? Thank you! :thanks:

PCU
15th February 2018, 14:34
Request: a deblocking shader for low-quality videos, like old Bink videos.

JNW
20th February 2018, 05:44
I don't think anyone ever made a deblocking shader. This is usually accomplished with post-processing, like in PotPlayer. Why don't you use the ffdshow raw filter? That way you can still use MPC-HC and LAV.

A good shader for MPC-HC, I always thought, would be LumaSharpen. Works just like in madVR: https://gist.github.com/sthalik/c1b09db3465001e31144

PCU
20th February 2018, 16:17
well ffdshow is dead, isn't it?

JNW
20th February 2018, 17:26
Yes, that's true, but if you're only using it for the deblocking feature it doesn't really matter. Think of it as using a shader; they don't get updated.

PCU
20th February 2018, 19:13
Yes, that’s true but if you’re only using it for the deblocking feature it doesn’t really matter. Think of it as using a shader, they don’t get updated.

How do I deblock this file (Max Payne 2 intro) correctly?
http://www.filepup.net/files/a0ba90621519150241.html
No, maybe shaders get updated, but ffdshow (even the tryouts) is 100% dead & ffdshow has lots of other features that I don't use.

v0lt
22nd February 2018, 20:41
well ffdshow is dead, isn't it?
No, ffdshow still works as well as before.
no, maybe shaders gets updated, but ffdshow (even tryouts) is 100% dead & ffdshow has lots of other features that i don't use.
The presence of additional functionality does not prevent you from using only what is needed.

JNW
23rd February 2018, 09:18
Thanks v0lt, that's what I was getting at.

I've uploaded a couple of shaders. These are from the SweetFX shader pack, which I had laying around from many moons ago, where some were updated for better use in MPC. I'm uploading them for people running old or low-spec PCs (I still have one laying around myself) who are unable to use madVR to its full advantage. Simply put them in the shader folder.

The first one is LumaSharpen, better than the one I posted before, as it has a default strength of 0.65 (same as madVR), so you don't have to go in and change it.
http://s000.tinyupload.com/index.php?file_id=00110955234181062890

The second is a sort of faux-HDR shader that tries to imitate the effect.
http://s000.tinyupload.com/index.php?file_id=09603921432233252135

As I said, it's for low-spec machines, but maybe it will help somebody.

j82k
22nd April 2018, 00:01
Is it possible to modify a shader so that it is only applied to part of the picture? I found this LCD angle correction shader here: https://github.com/zachsaw/RenderScripts/blob/master/RenderScripts/ImageProcessingShaders/MPC-HC/LCD%20angle%20correction.hlsl

My intent is to reduce the vertical banding on my OLED TV, and I think this shader could work if it can be modified to apply only to a user-defined vertical column of the picture, for example from pixel 2880 to 3260. I have zero knowledge about shader writing and only got as far as making the shader correction horizontal instead of vertical. Is what I want possible, and can anyone do it?

v0lt
22nd April 2018, 06:43
JNW
Can I add the LumaSharpen shader to MPC-BE?