Video pixel shader pack [Archive] - Page 6

JanWillem32

22nd July 2011, 01:47

The "post resize" item used to be "screenspace". Skipping chroma up-sampling isn't specific to DXVA; it happens when either HFPP, FFPP, or 10-bit RGB output is enabled on ATi hardware.
Your situation could use this basic setup:
Shader chain:
1. RGB to Y'CbCr for SD&HD video input for floating point surfaces
2. special 4÷2÷0 to 4÷2÷2 intermediate Catmull-Rom spline5 chroma up-sampling for SD&HD video input
3. special 4÷2÷2 Catmull-Rom spline5 chroma up-sampling for SD&HD video input (linear output enabled)

Post resize shaders:
gamma conversion of linear RGB to video RGB for floating point surfaces

I linearize the gamma to avoid darkening artifacts on R'G'B' color blending. Here's an example of no gamma correction throughout the filter stages: http://photocreations.ca/interpolator/index.html .
Of course, you can also edit all kinds of color controls in the step 3 shader, and add a few custom shaders.
The last shader can simply be edited to any display gamma. The common range is usually from 1/2.2 to 1/2.6. The 256/563. (about 1/2.2) and 5/12. (exactly 1/2.4) values are just two presets. The linearization power in step 3 is 2.4.
For lighter processing, you can enable the single-pass bilinear filter instead of steps 1, 2 and 3.
A video file encoded with 4:2:2 chroma down-sampling doesn't need step 2.
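
To illustrate the darkening artifact mentioned above, here is a minimal C++ sketch (not taken from the shader pack). It assumes plain power-law curves with the power of 2.4 and its inverse 5/12 quoted above; the function names are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Assumption: plain power-law curves, using the values quoted in this thread.
// Decode a gamma-encoded R'G'B' channel value (range [0, 1]) to linear light.
double linearize(double encoded, double power = 2.4) {
    assert(encoded >= 0.0);
    return std::pow(encoded, power);
}

// Encode linear light back to a gamma-encoded value; 5/12 equals 1/2.4.
double delinearize(double linear, double exponent = 5.0 / 12.0) {
    assert(linear >= 0.0);
    return std::pow(linear, exponent);
}

// 50/50 blend performed directly on the gamma-encoded values.
double blend_encoded(double a, double b) {
    return 0.5 * (a + b);
}

// 50/50 blend performed in linear light, then re-encoded.
double blend_linear(double a, double b) {
    return delinearize(0.5 * (linearize(a) + linearize(b)));
}
```

Blending black (0.0) with white (1.0) gives 0.5 with blend_encoded, but about 0.75 with blend_linear. The blend on encoded values is the darker one, which is the artifact that linearizing before the filter stages is meant to avoid.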

I'll take a look at adding a full list of chroma interpolation shaders later, I've finished quite a few resizers, anyway. The idea of making the shader collection external to the player in .TXT files was rejected. Shaders will have to be loaded from the registry or from a .INI file (warning: the .INI file can only store small ones).
You can copy from the registry if you need a backup or portability: http://forum.doom9.org/showthread.php?p=1514362#post1514362 .

Eliminateur

22nd July 2011, 04:04

hmmm this is weird i'm having problems with the 1-2-3 shaders.
when i tested before, it worked ok, now i keep seeing pixelation on red, and activating/deactivating/recompiling is not doing anything.
if i open shader editor and i enter the name it says "could not load shader", is that the problem of the .ini file?
how big is "small ones"?

Also, i was using the 16-235 -> 0-255 [SD][HD] to correct levels, after using your 1-2-3 shaders it doesn't work anymore, even if i put it in pre/post/before/after
i really like that shader because without it, blacks are "grey" on my monitor (hmm but if i enable that shader i lose black details as they go very black).

anyway, the linearization shader makes the gamma go extremely high, i've tried with 5/12 and it's still too bright, settled with 451/563 for now -i'm gonna do much more trial and error-, it brightens it up a little without ruining the blacks much.

btw, all this is valid for DXVA that outputs NV12, but when i use soft decoding i'm outputting RGB32 from ffdshow, how will these shaders act in that case?

oh, about processing power, compiled everything with PS3.0, tried playing back a moderate 1080p clip w/DXVA and all shaders active, GPU usage hovering 48%(with UVD clocks) so i have aaaample processing power to spare

JanWillem32

22nd July 2011, 08:23

"Small ones", as in: the total may not exceed 15 KB, I believe. The Windows registry has a limitation in size of 1 MB per entry.

For "16-235 -> 0-255 [SD][HD]", see page one of this thread and a few onwards. It's one of the shaders I'm definitely going to delete. It has also been several years since the last report of a driver/mixer that makes this flaw, as well.
Gray tints in video where you would expect darker tints are usually caused by a low display gamma. Try your hand at a tool that can help set up your monitor controls. Note that most of those tools set up for a gamma of 2.2, while consumer-grade video is normally a bit steeper at around 2.4.

The "gamma conversion of linear RGB to video RGB for floating point surfaces" shader de-linearizes the gamma from step 3 (activated manually in the shader). You shouldn't need to set it to a value far beyond the normal interval of 1/2.2 to 1/2.6 .
For any normal color controls, you can also activate controls for Y'CbCr and RGB stages in step 3. Those are meant to correct specific flaws in an encoded video, not so much the flaws in a display.
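
As a rough sketch of how the display exponent in the last shader interacts with the linearization in step 3, the following C++ fragment runs one channel value through both stages. It assumes plain power curves; through_chain and kLinearizationPower are names invented for this example, not identifiers from the shaders.

```cpp
#include <cassert>
#include <cmath>

// Assumption: step 3 linearizes with a plain power of 2.4 (value quoted in this thread).
const double kLinearizationPower = 2.4;

// Full chain for one channel: encoded video value -> linear light -> display value.
// display_exponent is the value edited in the post-resize shader, e.g. 5.0/12.0.
double through_chain(double encoded, double display_exponent) {
    assert(encoded >= 0.0 && encoded <= 1.0);
    assert(display_exponent > 0.0);
    const double linear = std::pow(encoded, kLinearizationPower);
    return std::pow(linear, display_exponent);
}
```

With 5/12 the chain is an identity: mid-gray 0.5 stays 0.5. With 256/563 the same input comes out at about 0.47 (darker), and with 1/2.6 at about 0.53 (brighter), so within the normal interval the effect on the picture is fairly modest.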

RGB32 is a compatibility option for usage with older video cards that can't really deal with any video processing at all. It's not a default output normally. Unless you are viewing a .BMP file through the Windows still image filter, you should never get RGB32/X8R8G8B8 as input for the mixer. Valid input for the mixer with 8-bit 4:2:0 Y'CbCr video files is NV12 or YV12, with 8-bit 4:2:2 Y'CbCr video files is YUY2.
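
For a feel of what these formats mean in memory, here is a small C++ sketch that computes the size of one tightly packed frame. It assumes no row padding (real surfaces usually have a pitch that is larger than the width); the enum and function names are only for illustration.

```cpp
#include <cassert>
#include <cstddef>

enum class PixelFormat { NV12, YV12, YUY2, X8R8G8B8 };

// Bytes for one tightly packed frame; assumption: no row padding (pitch == width).
std::size_t frame_bytes(PixelFormat format, std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    switch (format) {
        case PixelFormat::NV12:      // 4:2:0, full-size luma plus interleaved half-size chroma
        case PixelFormat::YV12:      // 4:2:0, full-size luma plus two quarter-size chroma planes
            return width * height * 3 / 2;
        case PixelFormat::YUY2:      // 4:2:2, packed, 2 bytes per pixel
            return width * height * 2;
        case PixelFormat::X8R8G8B8:  // RGB32, 4 bytes per pixel
            return width * height * 4;
    }
    return 0;
}
```

For a 1920x1080 frame this gives 3110400 bytes for NV12 or YV12, 4147200 bytes for YUY2 and 8294400 bytes for X8R8G8B8.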

I'm familiar with having headroom in processing power. I set my HD4890 at 700/975 clocks during video playback. It only has momentary spikes over 90% GPU usage now and then on 1080p video, even with my heavy rendering chain. (I do need to set the full 3D clocks of 850/975 for a 1080i sample I have, though.) It has a nasty habit of setting the clocks to 500/975 in combination with DXVA playback, which is too low for my needs on 1080p, so that's why I override them.

CiNcH

24th July 2011, 14:35

Hi Jan,

I am currently also fooling around with pixel shaders and have a little problem...

I am creating a pixel shader:

D3DXCompileShaderFromFile( buffer, NULL, NULL, "main", "ps_2_a",dwShaderFlags, &pCode,&pBufferErrors, &g_pConstantTablePS[i] );
g_pd3dDevice->CreatePixelShader( (DWORD*)pCode->GetBufferPointer(),&g_pPixelShader);

I then draw the quad:

g_pd3dDevice->SetVertexShader(NULL);
g_pd3dDevice->SetPixelShader(g_pPixelShader);
g_pd3dDevice->DrawPrimitive( D3DPT_TRIANGLESTRIP, 0, 2 );
g_pd3dDevice->SetPixelShader(NULL);

This works like a charm.

I then switch pixel shader version to 3.0:

D3DXCompileShaderFromFile( buffer, NULL, NULL, "main", "ps_3.0",dwShaderFlags, &pCode,&pBufferErrors, &g_pConstantTablePS[i] );

When I do this, the quad is not drawn. I only see the background color which has been set with the call to g_pd3dDevice->Clear.

Do you have any idea what is going wrong when setting pixel shader version to 3.0?

(I am loading D3DX9_43.dll BTW)

Eliminateur

24th July 2011, 16:31

i use rgb32 because i want maximum quality output without any mixer or driver doing anything to the color conversion(i don't trust them) and i find ffdshow software HQ conversion is much better than the mixer.

i've been toying with the shaders more, and with RGB32 input they actually lower the quality of the chroma, it's like it downscales it(starts blurring and pixellating the red on black borders)

about the gamma, i tried your setting and it was so high it was unwatchable (i'm going to post screenshots later).
I've tried monitor setting wizards in the past, they all result in values that i don't like how they appear, either too dark or too bright (i'm using custom RGB with close to 100 value, brightness at 4 and contrast 40)

Eliminateur

25th July 2011, 01:59

i've uploaded the screens to dropbox to maintain them in PNG as jpg masked the errors, the filename shows the settings used
http://dl.dropbox.com/u/3493496/screens.zip

here's what i can conclude from them in parts:

1st part - no shaders
DXVA chroma looks horribly blocky as expected
soft(ffdshow w/RGB32 HQ output) and DXVA 8bit looks almost the same(differences in contrast/brightness due to CCC video improvements and also because ffdshow uses deband filter).
Verdict: i prefer the brighter looking DXVA 8bit output, soft looks more blurred

2nd part - with shaders(using the 3 shaders in the chroma interpolation folder for floating surfaces):
DXVA 10b looks almost the same as 8bit with no shaders(since save image does not capture post-shader output i had to printscreen the file)
soft decoding shows blocky chroma with shaders
Verdict: again, soft looks blurred and dxva 8b w/out shaders looks exactly the same as with chroma shaders.

what's the point of going the extra 10b route if i can go to 8b and not use any shader?

for the gamma conversion screen, again the screen capture does not get the shader output so i had to capture with printscreen, you can see how horrible it looks with default 256/563 value

JanWillem32

25th July 2011, 15:48

@CiNcH: To test: manually change "ps_3.0" to "ps_3_0".
For a more advanced solution:
Header file:
typedef LPCSTR (WINAPI* D3DXGetPixelShaderProfilePtr)(LPDIRECT3DDEVICE9 pDevice);
typedef HRESULT (WINAPI* D3DXCompileShaderPtr)(LPCSTR pSrcData, UINT SrcDataLen, CONST D3DXMACRO* pDefines, LPD3DXINCLUDE pInclude, LPCSTR pFunctionName, LPCSTR pProfile, DWORD Flags, LPD3DXBUFFER* ppShader, LPD3DXBUFFER* ppErrorMsgs, LPD3DXCONSTANTTABLE* ppConstantTable);
typedef HRESULT (WINAPI* D3DXDisassembleShaderPtr)(CONST DWORD* pShader, BOOL EnableColorCode, LPCSTR pComments, LPD3DXBUFFER* ppDisassembly);

D3DXGetPixelShaderProfilePtr m_pD3DXGetPixelShaderProfile;
D3DXCompileShaderPtr m_pD3DXCompileShader;
D3DXDisassembleShaderPtr m_pD3DXDisassembleShader;

Initialization section:
const HINSTANCE hDll = recent version of D3DX9Dll;
if(hDll) {
    // resolve the three entry points from the loaded D3DX9 DLL
    m_pD3DXCompileShader = reinterpret_cast<D3DXCompileShaderPtr>(GetProcAddress(hDll, "D3DXCompileShader"));
    m_pD3DXDisassembleShader = reinterpret_cast<D3DXDisassembleShaderPtr>(GetProcAddress(hDll, "D3DXDisassembleShader"));
    m_pD3DXGetPixelShaderProfile = reinterpret_cast<D3DXGetPixelShaderProfilePtr>(GetProcAddress(hDll, "D3DXGetPixelShaderProfile"));
}
...
m_pProfile = m_pD3DXGetPixelShaderProfile(m_pD3DDev);// get the pixel shader profile level

m_pProfile is a LPCSTR you can use as a direct input for the "LPCSTR pProfile" parameter of m_pD3DXCompileShader. It should be renewed every time a device is re-made or reset.
The other two should behave properly after initialization like this.
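
For a hand-written profile string, a quick sanity check before calling the compiler can catch typos such as "ps_3.0". The sketch below is a hypothetical helper, not part of D3DX; the list holds the Direct3D 9 level pixel shader profile names as I recall them from fxc.exe, so please verify it against the SDK documentation before relying on it.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not a D3DX function): checks a profile string against
// the Direct3D 9 level pixel shader profile names.
// Assumption: the list below is complete for the SDK version in use.
bool is_known_ps_profile(const char* profile) {
    assert(profile != nullptr);
    static const char* const known[] = {
        "ps_1_1", "ps_1_2", "ps_1_3", "ps_1_4",
        "ps_2_0", "ps_2_a", "ps_2_b", "ps_2_sw",
        "ps_3_0", "ps_3_sw"
    };
    for (const char* name : known) {
        if (std::strcmp(profile, name) == 0) return true;
    }
    return false;
}
```

With this check, "ps_3_0" and "ps_2_a" pass, while "ps_3.0" is rejected before it ever reaches the compile call.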

It's possible to write a shader that has a very poor ordering system, so that it does compile on level 2.x, but not on level 3.0. It's possible to disable flow control for those.
For the command-line interface for the standalone compiler it's the /Gfa switch. (The compiler is fxc.exe in the "x86" and "x64" folders of "Microsoft DirectX SDK (June 2010)\Utilities\bin\" .)

For the other parts: DrawIndexedPrimitive can be more efficient. Also avoid "g_pd3dDevice->somecall(NULL);" as much as possible, as such calls don't come for free.

@Eliminateur: I've verified the color transfer matrices to be accurate up to quite a few bits for both VMR-9 and EVR. For the main set of matrices, see: http://msdn.microsoft.com/en-us/library/ms698715%28v=vs.85%29.aspx . (The bt.601 and bt.709 matrices are also used for the two xvYCC types.)
The driver only performs abnormal filtering because filters are set in the CCC in your case. I suggest that everyone manually set up the filters in the video options tab of the video card's driver suite. ATi, Intel and nVidia all implement the filters as optional, but some are enabled by default. Those filters cost processing power, and may not be to your liking either.
Using X8R8G8B8 (RGB32) when 8-bit RGB isn't the source format is pretty destructive; Y'CbCr/xvYCC sources lose the data outside the nominal range: the [0, 15] interval at the bottom, and the [236, 255], [241, 255], [241, 255] intervals at the top for Y', Cb and Cr respectively (8-bit quantization assumed). That doesn't happen with the mixer working on A32B32G32R32F (FFPP) or A16B16G16R16F (HFPP) surfaces. As those names imply, the mixer quantization is quite a bit better with those two formats, too.
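
The clipping described above can be sketched in a few lines of C++. This is a simplified model, not the mixer's actual code: it only looks at a gray pixel (Cb = Cr = 128, so R = G = B equals the expanded luma), and the function names are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Expand 8-bit limited-range luma (nominal black 16, nominal white 235)
// to a 0..255 scale, without clamping.
double expand_luma(int y) {
    assert(y >= 0 && y <= 255);
    return (y - 16) * 255.0 / 219.0;
}

// Store in an 8-bit integer channel, as on an X8R8G8B8 surface:
// everything below 0 or above 255 is clipped.
std::uint8_t store_8bit(double value) {
    const double clipped = std::min(255.0, std::max(0.0, value));
    return static_cast<std::uint8_t>(std::lround(clipped));
}

// Store in a floating-point channel, as on an A16B16G16R16F or A32B32G32R32F
// surface: values below 0.0 and above 1.0 survive.
// (The reduced precision of half floats is not modelled here.)
float store_float(double value) {
    return static_cast<float>(value / 255.0);
}
```

A luma of 250 ends up as 255 in the 8-bit channel, indistinguishable from nominal white at 235, while the floating-point channel keeps it as a value above 1.0. The same happens at the bottom of the range with luma values below 16.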

For providing gamma correction for internal filtering stages (includes the internal resizers) after chroma up-sampling and color conversion, I added the "LinearRGBOutput" switch to step 3. It's made to linearize the encoding gamma of 1/2.4 to 1, so no darkening artifacts occur with the internal filtering stages when blending colors. To avoid some complaints, I've not enabled the function by default.

CiNcH

25th July 2011, 16:29

Thanks for your answer.

@CiNcH: To test: manually change "ps_3.0" to "ps_3_0".
Copy/paste mistake, sorry. I of course set it to ps_3_0.

I am already feeding the output of 'D3DXGetPixelShaderProfile' into 'D3DXCompileShader', which is ps_3_0.

Still ps_3_0 profile does not work for some reason. Within MPC-HC it does.

JanWillem32

25th July 2011, 16:37

Does fxc.exe generate any warnings or errors (even in strict mode)? It's rather rare that all ps 3.0 items would fail.
I assume that hardware vertex processing is enabled?

CiNcH

25th July 2011, 17:08

Yes, 'hardware vertex processing' is enabled. I use the very simple greyscale shader for testing purpose.

CiNcH

25th July 2011, 17:30

May this be due to the fact that i use fixed-function transforms like SetTransform?

JanWillem32

25th July 2011, 18:33

SetTransform probably isn't a big problem, although a static index and vertex buffer in video usually gains a few % of performance for resolving vertices. I don't think I've ever used pre-defined transforms yet, except for world view input data to vertex shaders.

header:
#pragma pack(push, 1)// this directive is used on MYD3DVERTEX to copy 32-bit aligned vertex data to video memory on x86 and x64
template<unsigned texcoords>
struct MYD3DVERTEX {
    float x, y, z, rhw;
    struct {
        float u, v;
    } t[texcoords];
};
template<>
struct MYD3DVERTEX<0ui32> {
    float x, y, z, rhw;
    DWORD Diffuse;
};
#pragma pack(pop)

CComPtr<IDirect3DIndexBuffer9> m_pIBuffer;
CComPtr<IDirect3DVertexBuffer9> m_pVBuffer;

main:
if(!m_pIBuffer) {
    // prepare the static index buffer in video RAM
    static const short indices[6] = {0, 1, 2, 2, 1, 3};// two triangles
    void* pVoid;// void pointer for memcpy
    // create an index buffer interface
    hr = m_pD3DDev->CreateIndexBuffer(sizeof(indices), D3DUSAGE_DONOTCLIP|D3DUSAGE_WRITEONLY, D3DFMT_INDEX16, D3DPOOL_DEFAULT, &m_pIBuffer, NULL);
    // lock m_pIBuffer and load the indices into it
    hr = m_pIBuffer->Lock(0, 0, reinterpret_cast<void**>(&pVoid), D3DLOCK_NOSYSLOCK);
    memcpy(pVoid, indices, sizeof(indices));
    hr = m_pIBuffer->Unlock();
    // set the static index buffer
    hr = m_pD3DDev->SetIndices(m_pIBuffer);
}

m_pD3DDev->SetFVF(D3DFVF_XYZRHW|D3DFVF_TEX1);

...
Although unfinished, the current AlphaBlt code can serve as a nice example.

HRESULT CDX9RenderingEngine::AlphaBlt(IDirect3DTexture9* pSubtitleTexture, const CRect rcSubSrc, const CRect rcSubDest)
{// only used by DX9AllocatorPresenter.cpp to blend subtitles with resizing
    // TODO: add resizing shaders
    HRESULT hr;

    // the vertex buffer is only rebuilt when the rectangles change or the buffer is missing
    static CRect Scrsrc, Scrdst;
    if(Scrsrc != rcSubSrc || Scrdst != rcSubDest || !m_pAlphaBltVBuffer) {
        m_pAlphaBltVBuffer = NULL;
        Scrsrc = rcSubSrc;
        Scrdst = rcSubDest;
        D3DSURFACE_DESC d3dsd = {D3DFMT_UNKNOWN, D3DRTYPE_SURFACE, 0, D3DPOOL_DEFAULT, D3DMULTISAMPLE_NONE, 0, 0, 0};
        if(FAILED(hr = pSubtitleTexture->GetLevelDesc(0, &d3dsd))) return hr;

        // source rectangle as normalized texture coordinates (reciprocals of the texture size)
        const float wrp = 1.0f/static_cast<float>(d3dsd.Width), hrp = 1.0f/static_cast<float>(d3dsd.Height),
            rsl = static_cast<float>(rcSubSrc.left)*wrp, rsr = static_cast<float>(rcSubSrc.right)*wrp,
            rst = static_cast<float>(rcSubSrc.top)*hrp, rsb = static_cast<float>(rcSubSrc.bottom)*hrp,
            // destination rectangle in pixels, shifted by half a pixel to align texels with pixels
            rdl = static_cast<float>(rcSubDest.left)-0.5f, rdr = static_cast<float>(rcSubDest.right)-0.5f,
            rdt = static_cast<float>(rcSubDest.top)-0.5f, rdb = static_cast<float>(rcSubDest.bottom)-0.5f;
        const MYD3DVERTEX<1> v[4] = {
            {rdl, rdt, 0.5f, 2.0f, rsl, rst},
            {rdr, rdt, 0.5f, 2.0f, rsr, rst},
            {rdl, rdb, 0.5f, 2.0f, rsl, rsb},
            {rdr, rdb, 0.5f, 2.0f, rsr, rsb}};

        // prepare the vertex buffer in video RAM
        void* pVoid;// void pointer for memcpy
        // create a vertex buffer interface
        if(FAILED(hr = m_pD3DDev->CreateVertexBuffer(sizeof(v), D3DUSAGE_DONOTCLIP|D3DUSAGE_WRITEONLY, D3DFVF_XYZRHW|D3DFVF_TEX1, D3DPOOL_DEFAULT, &m_pAlphaBltVBuffer, NULL))) return hr;
        // lock m_pAlphaBltVBuffer and load the vertices into it
        if(FAILED(hr = m_pAlphaBltVBuffer->Lock(0, 0, reinterpret_cast<void**>(&pVoid), D3DLOCK_NOSYSLOCK))) {
            m_pAlphaBltVBuffer = NULL;// force a rebuild on the next call
            return hr;
        }
        memcpy(pVoid, v, sizeof(v));
        hr = m_pAlphaBltVBuffer->Unlock();
    }

    // set the special vertex buffer
    m_pD3DDev->SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_LINEAR);
    m_pD3DDev->SetSamplerState(0, D3DSAMP_MINFILTER, D3DTEXF_LINEAR);
    hr = m_pD3DDev->SetStreamSource(0, m_pAlphaBltVBuffer, 0, sizeof(MYD3DVERTEX<1>));

    // draw the rectangle
    hr = m_pD3DDev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, 4, 0, 2);

    // cleanup: set the normal vertex buffer
    hr = m_pD3DDev->SetStreamSource(0, m_pVBuffer, 0, sizeof(MYD3DVERTEX<1>));
    m_pD3DDev->SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_POINT);
    m_pD3DDev->SetSamplerState(0, D3DSAMP_MINFILTER, D3DTEXF_POINT);
    return hr;
}

CiNcH

25th July 2011, 19:18

I have no idea anymore what could be wrong :( .

CiNcH

25th July 2011, 19:44

I now found this (http://www.gamedev.net/topic/489553-how-to-replace-setfvf-with-vertex-shader/page__view__findpost__p__4196329). I am using

D3DFVF_XYZ or D3DFVF_DIFFUSE or D3DFVF_TEX1

and not

D3DFVF_XYZRHW

JanWillem32

26th July 2011, 00:07

Interesting that it even worked on 2.x levels.
D3DFVF_DIFFUSE needs a color as a packed ARGB DWORD (D3DCOLOR) in the vertex data statement; D3DFVF_TEX1 would map a texture on top of that color. I guess it has some use with some kinds of alpha blending, material layering or lighting, but those things are more for end-stage design.
D3DFVF_XYZRHW is very convenient for 2D and some 3D work, as you don't have to mess too much with the Z-depth, culling and such (unless you want to).

CiNcH

27th July 2011, 06:15

I guess it has some use with some kinds of alpha blending, material layering or lighting, but those things are more for end-stage design.
Yes. I am rendering OSD textures on top of the video texture. Can I use 'D3DFVF_XYZRHW' for the video texture, apply shaders, and then render an OSD texture with an alpha channel on top with D3DFVF_XYZ (the OSD can be animated, so it has to be transformable)?

I am new to the whole stuff and it is just a hobby. Can you recommend some good reading material for D3D9?

JanWillem32

27th July 2011, 16:17

D3DFVF_XYZRHW|D3DFVF_TEX1 will work just fine for that. It means you are using one set of position coordinates relative to the output surface (X, Y, Z, RHW) and one set of texture coordinates for the sampler bound to the source texture (U, V), so 6 floats per vertex. "const MYD3DVERTEX<1> v[4]" describes a rectangle, composed of 2 triangles by drawing the vertices by index: points 0, 1, 2 and points 2, 1, 3.
To use the example I used earlier again, after ordering the draw operation in DrawIndexedPrimitive, use "SetTexture(0, pOSDtexture);", "SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);" and DrawIndexedPrimitive again. If you want to do the alpha blend with different vertices, pre-load them into the array "const MYD3DVERTEX<1> v[8]" at positions 4 to 7 and use "hr = m_pD3DDev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 4, 0, 4, 0, 2);".
Note that I didn't change any other statuses in between the two draw operations. You might want to set "SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_POINT);", for example when you're doing a 1:1 pixel mapping from source to target. Point sampling is less intensive than linear filtering, of course. In a similar way, you can decide anti-aliasing and anisotropic filtering options for each draw operation.
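
The vertex layout and index order described above can be checked with a small standalone C++ sketch that doesn't need Direct3D at all. QuadVertex, kQuadIndices and make_quad are names invented for this example; the z and rhw values (0.5 and 2.0) and the half-pixel offset are simply copied from the AlphaBlt code posted earlier, and the quad is assumed to sample the whole texture.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Layout matching D3DFVF_XYZRHW|D3DFVF_TEX1: 4 position floats + 2 texture floats.
struct QuadVertex {
    float x, y, z, rhw;  // position on the output surface, in pixels
    float u, v;          // normalized texture coordinates
};
static_assert(sizeof(QuadVertex) == 6 * sizeof(float), "vertex must be tightly packed");

// Two triangles (0, 1, 2) and (2, 1, 3) that share the edge between vertices 1 and 2.
constexpr std::array<std::uint16_t, 6> kQuadIndices = {{0, 1, 2, 2, 1, 3}};

// Hypothetical helper: corners of a destination rectangle given in pixels,
// sampling the entire source texture.
std::array<QuadVertex, 4> make_quad(float left, float top, float right, float bottom) {
    assert(left < right && top < bottom);
    // half-pixel shift, as in the AlphaBlt example
    const float l = left - 0.5f, t = top - 0.5f, r = right - 0.5f, b = bottom - 0.5f;
    return {{
        {l, t, 0.5f, 2.0f, 0.0f, 0.0f},   // 0: top left
        {r, t, 0.5f, 2.0f, 1.0f, 0.0f},   // 1: top right
        {l, b, 0.5f, 2.0f, 0.0f, 1.0f},   // 2: bottom left
        {r, b, 0.5f, 2.0f, 1.0f, 1.0f}}}; // 3: bottom right
}
```

The stride that goes into SetStreamSource is then sizeof(QuadVertex), which is 24 bytes with 4-byte floats, and the index array is what gets copied into the static index buffer.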

CiNcH

27th July 2011, 16:54

Thanks so much for your input!

I now got ps_3_0 working with vertex format 'D3DFVF_XYZRHW'. I still have some problems with vertex coordinates though. When I used 'D3DFVF_XYZ', coordinates for x and y ranged from -1 to 1, which must have been the percentage of the back buffer. This did not work with 'D3DFVF_XYZRHW'. I now use the dimensions of the back buffer in pixels with the origin (0,0). Why the difference?

You say I should use 'D3DFVF_XYZRHW' for the OSD texture as well? Even if I want to animate it? Can I use SetTransform (which I currently use to animate the OSD) with transformed vertices in the format 'D3DFVF_XYZRHW'?

Does it really make so much of a difference in performance when using an indexed buffer if we only have 4 vertices?

JanWillem32

27th July 2011, 18:58

If you use a vertex shader to modify vertices or use a pixel shader to animate it like the "sphere" and "wave" shaders, you don't have to change your base vertex statement.
If you use an index buffer and vertex buffer, the GPU isn't halted every time to let the driver interrupt it to send new vertex data. The GPU command bus loads the data from its queue in local video memory instead.
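
On the coordinate question: as far as I know, D3DFVF_XYZRHW marks vertices as already transformed, so Direct3D skips the transform stage (including anything set with SetTransform) and reads x and y directly as pixel positions on the render target. With D3DFVF_XYZ and identity matrices the positions end up in clip space instead, where the visible range is -1 to 1 with y pointing up. A minimal C++ sketch of the mapping between the two, with names made up for the example and the half-pixel offset taken from the AlphaBlt code:

```cpp
#include <cassert>

struct ScreenPos {
    float x, y;
};

// Map normalized coordinates ([-1, 1], y pointing up) to pre-transformed
// screen coordinates (pixels, y pointing down), including the half-pixel
// shift used in the AlphaBlt example.
ScreenPos ndc_to_screen(float ndc_x, float ndc_y, float width, float height) {
    assert(width > 0.0f && height > 0.0f);
    return {
        (ndc_x + 1.0f) * 0.5f * width - 0.5f,
        (1.0f - ndc_y) * 0.5f * height - 0.5f
    };
}
```

For a 1920x1080 back buffer, the top left corner (-1, 1) maps to (-0.5, -0.5) and the bottom right corner (1, -1) maps to (1919.5, 1079.5).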

JanWillem32

30th July 2011, 21:05

releaselog:
added the Mitchell-Netravali cubic shaders
added "detect even or odd coordinates, alternative"
updated the complete set of chroma interpolation shaders, all include color controls now
updated Y'CbCr conversion efficiency for all shaders that perform Y'CbCr to RGB conversions

Eliminateur

31st July 2011, 19:40

feedback:
i've updated with latest pack and using the following shaders:
1-2-3 for floating surfaces Mitchell-Netravali cubic5 and r=6 sharpen complex, deband and mild denoise as pre-resize
gamma conversion for linear to wide gamut as post-resize
all compiled with PS 3.0
plus all the filters in CCC active(smooth video, dynamic range, color vibrance, flesh tone correction, edge enhance, denoise, mosquito denoise, deblocking, dynamic contrast) as i like how it outputs.

gpu usage peaks ~86% with UVD clocks but i have a feeling that it's "framing", dropped frames counter advances continuously :(
(and i have a R5800 which should have massive amounts of PS power available)
and another weird thing: why the card doesn't go to 3D clocks on such high usage is beyond me....

if i disable the r=6 i get ~2 dropped frames.

with almost all filters in ccc disabled it doesn't drop much, except in long pans, looks like uvd clock ain't enough, curiously the dynamic range filter makes it drop more frames...

JanWillem32

31st July 2011, 21:37

The filters in the video section of CCC are render items on the shader core, and just like any pixel shader, they cost some processing power when enabled.
Note that "Enforce Smooth Video Playback" isn't a video filter, but a video filter controller. When enabled, it disables some other filters once the GPU usage is near maximum capacity.

The AMD driver locks the video card to 2D clocks when DXVA is active, irrespective of the actual processing load. I use a custom profile to clock my video card a bit higher than 2D when starting MPC-HC with a 1080p or 1080i video while using DXVA. I'm fine with 2D clocks when I'm playing back 720p video, but that may be different in other cases.

The dropped frames counter will always advance a bit when menus are switched, options are enabled or disabled, while seeking, and so on. I typically get 0 dropped frames if I don't do those things.
If the dropped frames counter continues to increase during normal video playback, there's a real bottleneck.
With a HD5830 or better on reasonable clocks, your processing chain shouldn't be a problem at all. Peaks of about 86% are very reasonable.
I get peaks of up to 95% on my HD4890, usually without any frames dropped. I use an alpha version of my modified MPC-HC builds' EVR CP, and a heavier processing chain than yours (I've enabled color management).

Eliminateur

31st July 2011, 23:11

The r=6 sharpen, i enabled it on the pre-resize chain, as final resizer, is this correct?
(btw i tested adding a 9 as i didn't notice any noticeable change in video but it's definitively there now that i modify it!).
I should try some of the more "severe" values.

Also, i guess that by using the "r=6, sharpen complex, deband and mild denoise" i can safely disable all denoise and edge enhancements controls in CCC.... kind of redundant

When i mentioned dropped frames it was during regular playback. What are you using for custom clock profiles?(do you switch it by hand on 1080p or some autodetection is in place?).
Weird that i'm getting drops on less than 90% use...
even if you use D3DEX output the GPU won't raise to 3D clocks?, seeing as in D3DEX it's essentially a 3D game....

BTW, you should add a notice on the "sharpen complex, deband and mild denoise", they don't compile on anything but PS3.0 -at least on R=6-

Why is PS 4.0 not available for compiling?, shouldn't it provide even better performance and efficiency?

Anyway, i'll continue testing!
thanks for your work and prompt responses, sorry for some of my questions as i can never get my head over colorspaces and conversions(and a lot of filters!) but i still find all these matters fascinating

JanWillem32

1st August 2011, 07:50

The "sharpen complex, deband and denoise"-type shaders can be inserted after making the RGB linear, but before resizing. Your placement is correct.
I made those shaders mostly with quality debanding in mind, the values for sharpening are very mild. (I don't like to give images the typical sharpened edges.)
For custom video card clock profiles, I use ATI Tray Tools. I generated two profiles, one intermediate for 1080p, and one with full 3D clocks for 1080i. These have a shortcut on my desktop that starts MPC-HC. Once MPC-HC exits, the video card is also set back to idle mode.
When I use a software decoder, the video card is set by default to full 3D clocks. (I generally use the intermediate profile to start the player in such a case, as it saves a bit of fan noise.)

All "sharpen complex, deband and denoise"-type shaders have the minimum required pixel shader level noted in line 9.
PS levels of 4.0 or higher are indeed more efficient, but you can only use them in DirectX 10 or higher. The renderer is sadly just DirectX 9.

CiNcH

1st August 2011, 08:51

What's the advantage of separating a scaler shader into a horizontal and vertical pass?

JanWillem32

1st August 2011, 09:39

A standard bicubic kernel samples 4 pixels in each direction. For separate horizontal and vertical passes that requires sampling 4+4 pixels. Doing that in one pass takes 16 pixels at once. For reference, see the "(inefficient one-pass)"-type shaders in the development folder.
Of course, there's some overhead in adding another pass in the chain, plus the need for a special intermediate surface (one that's only resized in one direction), but that's still less than sampling 8 pixels extra for every output pixel. The gain grows quickly with the larger kernels that take 5 or more pixels per pass: an n-tap kernel needs n×n samples in one pass against n+n in two passes, so the number of samples saved grows quadratically with the kernel size.
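
The sample counts above, and the reason the split is possible at all, can be sketched in C++. The counts are a simplification (per output pixel, ignoring that the first pass runs over the intermediate surface), and the weights use the common Catmull-Rom formulation, which I assume but have not checked against the resizers in the pack.

```cpp
#include <array>
#include <cassert>

// Texture samples per output pixel for an n-tap kernel (simplified model).
constexpr int samples_one_pass(int taps) { return taps * taps; }
constexpr int samples_two_pass(int taps) { return taps + taps; }

// Catmull-Rom weights for the 4 neighbouring samples at positions -1, 0, 1, 2,
// with t the fractional position between samples 0 and 1.
// A 2D weight is just the product of a horizontal and a vertical weight,
// which is what allows the filter to be split into two 1D passes.
std::array<double, 4> catmull_rom_weights(double t) {
    assert(t >= 0.0 && t <= 1.0);
    const double t2 = t * t, t3 = t2 * t;
    return {{
        0.5 * (-t3 + 2.0 * t2 - t),
        0.5 * (3.0 * t3 - 5.0 * t2 + 2.0),
        0.5 * (-3.0 * t3 + 4.0 * t2 + t),
        0.5 * (t3 - t2)}};
}
```

For 4 taps this gives 16 samples against 8, and for 6 taps 36 against 12. The four weights always sum to 1, so each pass preserves the overall brightness.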

Eliminateur

1st August 2011, 23:41

are there any plans to make the renderer DX11?(as DX10 is kind of useless at this point...)

JanWillem32

2nd August 2011, 00:11

I don't have any objections against it, but at the moment the mixer functions, VSync functions and the subtitle renderer take precedence.

Eliminateur

8th August 2011, 00:11

Jan,
i was watching a video and noticed that it had a LOT of banding and was very noticeable, so i switched to ffdshow softdecode(RGB32+dither, deband thresh. 1.2 radius 16), disabled all shaders other than the gamma output one and it looks like a very good gradient as it should
i tried switching between light and mild denoise R=6 shaders to no avail, it's VERY banded with DXVA

here's the DXVA one: http://dl.dropbox.com/u/3493496/DXVA.png
soft one: http://dl.dropbox.com/u/3493496/soft.png

any idea what do i need to tweak or do to have THAT kind of output with DXVA?

JanWillem32

8th August 2011, 01:05

Disable dynamic brightness, or any other function that's pushing the brightness so far up in the video card's control panel. Consumer video is already very limited in near black values due to the gamma and quantization, and there's also lossy compression on top of that.
The sharpening filters in control panel can also do quite a bit of harm, and I've never seen the standard denoise functions do any sort of debanding at all.
Also, all filtering settings in the video card's control panel simply work as long as the decoder input to the EVR mixer isn't RGB. It's not an exclusive feature of DXVA.

For "sharpen complex, deband and denoise"-type shaders, you can raise the "NoiseLevel" value for a more aggressive deband and denoise. There's also "GammaCompensation", but lowering it can be harsh on low-light, poor quality video.
Line 21: // GammaCompensation, interval [1, 2], default 1.5, the gamma compensation factor to diminish denoising on darker pixels, a factor 1 will use the same grade of filtering on darker and lighter pixels, the current standards for consumer-grade video are lacking in dynamics for the lower brightness range, so a factor 1 will usually cause too much filtering on darker pixels

Note, as usual, any proper debanding requires better working surfaces than the default X8R8G8B8 in the renderer chain.

dukestravels07

8th August 2011, 23:23

Hi Jan. I was wondering if you could give me a basic idea of what shaders to use with my setup?
I've read through the thread and while I get the basics, some of it is totally confusing. I have a very basic HTPC setup. It consists of a visiontek ATI HD3650 agp card. I run mpc-hc with vmr9 renderless and dxva works fine.

I have a really crap old projector (SL2U) that has a native res of 800x600, so most of the films I have are avi with roughly DVD like quality...some are VHS quality.

I was wondering how I could get the best possible quality using your shaders?

I have no idea about "gamma conversion for linear RGB" is this something I need to use?

Will yv12 chroma upsampling improve crap avi's?

Basically if you can give a basic list of shaders and in what order they should go, in order to improve my videos I'd be grateful.

Thanks in advance.

JanWillem32

9th August 2011, 01:26

There's unfortunately no magic filter that will make a display look like it has a higher resolution. We can only filter the video.
Intermediate conversion to linear gamma is used to prevent darkening artifacts on every processing step. We generally do encoded video gamma -> linear gamma -> display gamma.
Chroma up-sampling shaders are a trick to get video input where chroma up-sampling was skipped to be up-sampled by one of the 4 methods. Your video card will skip chroma up-sampling when a surface mode other than X8R8G8B8 is used with the VMR-9 or EVR mixers. (10-bit out, HFPP, FFPP settings for MPC-HC)

My settings will be a bit heavy for your setup, but it's better to start with more and take away things later.
Feel free to relax Full Floating Point Processing to Half, and replace "r=6, sharpen complex, deband and denoise" with something less extreme. There are plenty of choices to make your own working chain. It's just a matter of taste, but in general, don't try to apply two similar filters in one chain. It's usually better to edit settings in one shader to make it lighter or heavier in an aspect. Other than that, try a lot. Even messing around with some of the joke shaders can be very interesting.

Renderer settings I use in MPC-HC's EVR CP:
D3D Full Screen Mode, 10-bit RGB Output, Full Floating Point Processing, Disable desktop composition (Aero)

Shader chain:
1. RGB to Y'CbCr for SD&HD video input for floating point surfaces
2. special 4÷2÷0 to 4÷2÷2 intermediate Catmull-Rom spline5 chroma up-sampling for SD&HD video input
3. special 4÷2÷2 Catmull-Rom spline5 chroma up-sampling for SD&HD video input (linear output enabled, sometimes I also set colorfulness gamma to lower a high colorfulness on some video sources)
4. r=6, sharpen complex, deband and medium denoise
Post-resize:
gamma conversion of linear RGB to wide gamut RGB for floating point surfaces
final pass: color management with an ICC profile installed system-wide and random ordered dithering
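Steps 2 and 3 of the chain split chroma up-sampling into a vertical pass (4:2:0 to 4:2:2) and a horizontal pass (4:2:2 to 4:4:4). The Python sketch below only illustrates that two-pass structure. It swaps in plain linear interpolation instead of the Catmull-Rom spline, ignores chroma sample siting, and all function names are hypothetical.

```python
def upsample_2x(samples):
    """Double a 1-D list of samples with linear interpolation (edge clamped)."""
    out = []
    last = len(samples) - 1
    for i, s in enumerate(samples):
        nxt = samples[min(i + 1, last)]
        out.append(s)                # keep the original sample
        out.append((s + nxt) / 2.0)  # midpoint towards the next sample
    return out


def upsample_vertical(plane):
    """4:2:0 -> 4:2:2 stand-in: double the number of chroma rows."""
    columns = [upsample_2x(list(col)) for col in zip(*plane)]
    return [list(row) for row in zip(*columns)]


def upsample_horizontal(plane):
    """4:2:2 -> 4:4:4 stand-in: double the number of chroma columns."""
    return [upsample_2x(row) for row in plane]


# A tiny 2x2 Cb plane, as stored for a 4x4 picture with 4:2:0 sub-sampling.
cb_420 = [[0.0, 1.0],
          [1.0, 0.0]]

cb_422 = upsample_vertical(cb_420)    # 4 rows x 2 columns
cb_444 = upsample_horizontal(cb_422)  # 4 rows x 4 columns

for row in cb_444:
    print(row)
```

A source that is already 4:2:2 only needs the horizontal pass, which is why step 2 can be dropped for such files.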

mindbomb

28th August 2011, 02:56

C:\Program Files (x86)\Media Player Classic - Home Cinema\memory(64,29): warning X3206: implicit truncation of vector type
C:\Program Files (x86)\Media Player Classic - Home Cinema\memory(64,45): warning X3206: implicit truncation of vector type
C:\Program Files (x86)\Media Player Classic - Home Cinema\memory(64,63): warning X3206: implicit truncation of vector type
C:\Program Files (x86)\Media Player Classic - Home Cinema\memory(85,12): error X3014: incorrect number of arguments to numeric-type constructor

I get these errors when creating the bilinear 4:2:0 chroma filter.
Anything to worry about?

Eliminateur

28th August 2011, 03:11

mindbomb, check which PS version the shader editor is compiling the shaders with. Use the highest your hardware supports (usually PS 3.0 by now).

JanWillem32

28th August 2011, 10:43

I already noticed that the bilinear up-samplers were broken when I was integrating them in the latest MPC-HC tester build. I waited some time to get a nice set of fixes and new shaders ready.

Changelog:
corrected "bilinear chroma up-sampling and color controls for SD&HD video input"
improved performance for many Y'CbCr mode shaders

YCbCr-type sharpen complex test
I've tried something new. I've adapted r=4, r=5 and r=6 sharpen complex, deband and medium denoise to Y'CbCr-mode shaders. So far I've been very satisfied with its debanding capabilities and performance, so I'm asking if others would like to test these, too. For changing the debanding and denoising strength, just edit the value for "NoiseLevel".
For those that use the chroma up-sampling sets, the alternatives for the three 4:2:2 up-sampling shaders are included. Others can use one of the two "RGB to Y'CbCr for SD&HD video input"-type shaders to pre-process to Y'CbCr mode.
If anything needs correction, please tell me.

mindbomb

28th August 2011, 14:22

thanks, the bilinear chroma shader from 1.4 works perfectly

Qaq

28th August 2011, 14:40

Personally, I found that *correct* HQ chroma upsampling gives aggressive colors with my setup. Same thing with madVR, even for Full HD videos (most of them). I've disabled all the adjustments in the video driver, and my TV doesn't allow color control in PC mode, so I can only hope for perfect color processing in the video renderer. Trying to avoid that color madness, I found that without chroma shaders the picture looks much closer to natural. I need to try the NN chroma scaler in madVR too. :devil:

Eliminateur

28th August 2011, 22:35

are those in the development folder or the production folders?

JanWillem32

29th August 2011, 07:18

@Qaq: That's odd. With the sharper resizers, like the Mitchell-Netravali cubic and Catmull-Rom spline implementation I've used, over-saturation on some chroma borders is possible (but very rare). With resizers that can only blur, like the cubic B-Spline and bilinear implementation I've used, that's not possible.
I wonder what your TV is doing to the signal. When the RGB 4:4:4 full range output of the renderer is left unprocessed by the display device (no digital processing, only digital-to-analog conversion for the panel), any form of chroma up-sampling in the renderer should give a better picture than nearest neighbor (especially if full picture resizing or aspect ratio correction is used by the renderer).
It's known that in some situations the renderer output is converted afterwards, lowering the output quality. Can you test the "draw grid coordinates" shader? It should draw single-pixel wide RGB lines horizontally and vertically when full picture resizing and aspect ratio correction are disabled. If there's something wrong with the processing after the renderer output, the lines will be imperfect.
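The difference between the sharper and the blur-only resizers comes down to the kernel weights. The Python sketch below uses the textbook Catmull-Rom and cubic B-spline weight formulas on one row of four samples. It is not taken from the shaders themselves, and the function names are hypothetical.

```python
def catmull_rom_weights(t):
    """Weights for 4 neighboring samples at offset t in [0, 1] (Catmull-Rom)."""
    return (
        (-t**3 + 2 * t**2 - t) / 2,
        (3 * t**3 - 5 * t**2 + 2) / 2,
        (-3 * t**3 + 4 * t**2 + t) / 2,
        (t**3 - t**2) / 2,
    )


def b_spline_weights(t):
    """Weights for 4 neighboring samples at offset t in [0, 1] (cubic B-spline)."""
    return (
        (1 - t) ** 3 / 6,
        (3 * t**3 - 6 * t**2 + 4) / 6,
        (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
        t**3 / 6,
    )


def interpolate(samples, weights):
    """Weighted sum of four samples."""
    return sum(s * w for s, w in zip(samples, weights))


# Chroma just after a hard border: one low sample followed by three high ones.
edge = (0.0, 1.0, 1.0, 1.0)

sharp = interpolate(edge, catmull_rom_weights(0.5))
soft = interpolate(edge, b_spline_weights(0.5))

print("Catmull-Rom:", sharp)  # above 1.0, an overshoot past the input range
print("B-spline:   ", soft)   # stays inside the input range
```

The Catmull-Rom weights have negative outer lobes, so the result can land outside the range of the input samples next to an edge. The B-spline weights are all non-negative and sum to 1, so the result is always a blend inside that range.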

@Eliminateur: The full directory listing of the normal download is in the OP, the "YCbCr-type sharpen complex test" is a separate download.