Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
13th December 2014, 09:23 | #61 | Link | |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
Quote:
Also, I've come to understand that compute shaders are a better fit for this purpose than pixel shaders. Can anyone tell me, am I right that NNEDI3 is currently only available here @doom9 as pixel shader HLSL sources? @madshi I can't reproduce the F11-F12 bug you reported from your description. Maybe you have more details to report? Does it always happen for you? Last edited by Orf; 13th December 2014 at 09:43. |
|
13th December 2014, 16:15 | #62 | Link | |
Registered User
Join Date: Dec 2013
Posts: 753
|
Quote:
Edit: I was mistaken. It seems that MadVR also uses OpenCL. And I think part of the code actually comes with MadVR (in the folder 'legal stuff'). Last edited by Shiandow; 13th December 2014 at 16:26. |
|
13th December 2014, 16:19 | #63 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
|
NNEDI3 is done in OpenCL in madVR as well. Only the dithering shaders are DirectCompute, afaik.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
13th December 2014, 16:29 | #65 | Link |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
I've tried converting NNEDI3 to DirectCompute, but it performed *much* slower than it does with OpenCL, unlike error diffusion, which was actually slightly faster with DirectCompute. So nevcairiel is right: NNEDI3 is done in OpenCL, error diffusion in DirectCompute.
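For readers unfamiliar with it, error diffusion is a dithering family in which each pixel's quantization error is pushed onto not-yet-processed neighbours, which creates the serial data dependency that makes its GPU mapping interesting. Below is a minimal Floyd-Steinberg sketch in Python, purely to illustrate the data flow; it is not madVR's implementation, whose actual kernels run on the GPU.

```python
def floyd_steinberg(pixels, levels=2):
    """Quantize a 2D grayscale image (values in 0..1) to `levels` shades,
    diffusing each pixel's quantization error to neighbouring pixels."""
    h = len(pixels)
    w = len(pixels[0])
    img = [row[:] for row in pixels]  # work on a copy
    step = 1.0 / (levels - 1)
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = round(old / step) * step  # nearest representable shade
            img[y][x] = new
            err = old - new
            # Floyd-Steinberg weights: 7/16 right, 3/16 down-left,
            # 5/16 down, 1/16 down-right
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1][x + 1] += err * 1 / 16
    return img

out = floyd_steinberg([[0.4, 0.4], [0.4, 0.4]])
```

The `img[...] += err * w` updates are what make the algorithm order-dependent: every output depends on errors diffused from pixels processed earlier, which is why a naive one-thread-per-pixel shader can't express it directly.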
@Orf, unfortunately I don't have any time atm. But I know that it was 100% reproducible for me when I reported the problem. |
13th December 2014, 19:36 | #66 | Link |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
Shiandow, nevcairiel, madshi,
thanks, guys, for your comments; they really help me find my way in the dark. But on second thought, and maybe I'm missing something here, why did you all start talking in one voice about OpenCL vs DirectCompute, when I initially asked about pixel shaders vs compute shaders? |
15th December 2014, 06:39 | #68 | Link |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
madshi, I agree, maybe I didn't ask the question in the correct way. I'll try to correct myself now. As far as I understand, besides using the DirectCompute/OpenCL APIs there's a third way to do it: simply drawing a quad via Direct3D and applying a pixel shader to it. Did you test whether that method is any faster than OpenCL?
|
15th December 2014, 08:09 | #69 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
|
Pixel Shaders are much more limited, and something as complex as NNEDI3 is unlikely to be possible with pixel shaders alone.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
15th December 2014, 08:15 | #70 | Link | |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
The reason why pixel shaders are so much slower than OpenCL is that pixel shaders apply math to every destination pixel separately. OpenCL and DirectCompute are more flexible: you can configure them to render multiple destination pixels with one kernel pass. Doing that allows you to cache things cleverly and to share some calculations between multiple pixels, etc. Especially for NNEDI3 that's very important to get things up to speed. |
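The caching idea described here can be sketched outside of HLSL. The toy Python below compares a "pixel shader" style 3-tap blur, where every output fetches all of its own taps, with a "compute" style tiled version that loads a tile plus a one-pixel apron into a local cache once and lets all outputs in the tile reuse it. The tile size and the blur itself are illustrative, not anything from madVR.

```python
def blur_per_pixel(src):
    """Pixel-shader style: every output fetches its own 3 taps."""
    reads = 0
    out = []
    n = len(src)
    for x in range(n):
        taps = [src[min(max(x + d, 0), n - 1)] for d in (-1, 0, 1)]
        reads += 3
        out.append(sum(taps) / 3)
    return out, reads

def blur_tiled(src, tile=4):
    """Compute-shader style: one 'thread group' loads a tile plus a
    1-pixel apron into fast local storage (modelling groupshared
    memory), then all outputs in the tile reuse those cached samples."""
    reads = 0
    out = []
    n = len(src)
    for start in range(0, n, tile):
        end = min(start + tile, n)
        # load tile + apron once
        local = [src[min(max(i, 0), n - 1)] for i in range(start - 1, end + 1)]
        reads += len(local)
        for x in range(start, end):
            i = x - (start - 1)  # index into the cached slice
            out.append((local[i - 1] + local[i] + local[i + 1]) / 3)
    return out, reads

src = [float(v) for v in range(16)]
a, reads_pp = blur_per_pixel(src)
b, reads_tl = blur_tiled(src)
```

On this 16-pixel example both versions produce identical output, but the per-pixel version issues 48 reads while the tiled one issues 24; for a footprint as large as NNEDI3's the savings are far bigger, which matches the speed difference reported here.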
|
16th December 2014, 05:32 | #71 | Link |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
nevcairiel, madshi, thanks
I do understand OpenCL/DirectCompute is more powerful; that was why I initially asked. But this NEDI/NNEDI is really confusing me. To summarize what I've learned from you:
- NNEDI3 is the heaviest algorithm, but it gives the best result in the end
- NEDI as implemented here, for example, is less heavy, so using pixel shaders is acceptable
- madVR internally should have at least two image processing pipelines: #1 is a pixel shader pipeline, #2 is an OpenCL pipeline. It may also have a DirectCompute pipeline as #3
- IMadVRExternalPixelShaders supports only #1 (?)
- To make image processing general and flexible, all three pipelines would have to be supported
- Which one of the three is better is kind of an open question. Also, picking and implementing only one of them would require rewriting the HLSL/CL code (a thing I'm very unlikely to manage myself)
Am I still missing something? |
16th December 2014, 09:28 | #72 | Link | |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
Although the names suggest otherwise, NNEDI3 and NEDI are *totally* different algorithms which have almost nothing in common (except for doing an exact 2x upscale). IMO NNEDI3 has better image quality, but it's also quite a bit slower than NEDI. And yes, NEDI works fine with simple PS3.0 pixel shaders, while NNEDI3 requires OpenCL to run at a decent speed. FYI, Shiandow has written a super-res post-processing algorithm (using simple pixel shaders, once again) which improves NEDI quality even further, bringing it even nearer to NNEDI3 quality. This super-res algorithm is currently available for NEDI only, I think, but in theory it could also be used to improve other 2x upscale algorithms, e.g. NNEDI3, or even Bicubic/Lanczos. I'm not sure whether this super-res algorithm would improve NNEDI3 quality too; we haven't tried yet, I think. But it might. I'm hoping that the super-res algorithm will sooner or later become a separate filter, running after any other 2x upscaling algorithm. |
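A rough sketch of the general super-res idea, in the reconstruction sense: nudge the upscaled image until downscaling it reproduces the source. This is only a guess at the principle in toy 1-D Python, not Shiandow's actual shader code, and all function names and parameters here are made up for the example.

```python
def upscale_2x(src):
    """Simple linear 2x upscale (a stand-in for NEDI in this sketch)."""
    out = []
    for i, v in enumerate(src):
        nxt = src[min(i + 1, len(src) - 1)]
        out.extend([v, (v + nxt) / 2])
    return out

def downscale_2x(img):
    """Box downscale: average each pair of pixels."""
    return [(img[2 * i] + img[2 * i + 1]) / 2 for i in range(len(img) // 2)]

def superres_refine(src, upscaled, iterations=8, strength=0.5):
    """One reading of the super-res idea: repeatedly measure how far
    the downscaled result drifts from the original source, and push
    the upscaled pixels back toward consistency with it."""
    img = list(upscaled)
    for _ in range(iterations):
        diff = [s - d for s, d in zip(src, downscale_2x(img))]
        for i, e in enumerate(diff):
            img[2 * i] += strength * e
            img[2 * i + 1] += strength * e
    return img

src = [0.0, 1.0, 0.5]
up = upscale_2x(src)
refined = superres_refine(src, up)
```

Because the refinement step only needs the source image and the output of *some* 2x upscaler, a loop like this is agnostic to which upscaler produced its input, which is consistent with the hope above of running it as a separate filter after any 2x algorithm.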
|
16th December 2014, 12:24 | #73 | Link | |
Registered User
Join Date: Dec 2013
Posts: 753
|
Quote:
|
|
16th December 2014, 14:45 | #75 | Link | |
Registered User
Join Date: May 2014
Posts: 292
|
Quote:
This is just a proposal for expanding/improving your product. The problem lies in understanding the translation. It would be good to contact you via e-mail (Vkontakte). |
|
17th December 2014, 06:13 | #76 | Link |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
madshi,
can you please share the PS and DirectCompute versions of your NNEDI3? It would be a nice example for me to estimate the differences in the code base, and maybe to test performance. I quickly looked through Shiandow's SuperRes implementation; am I right in guessing that the separate PS HLSL files could theoretically be combined into one DirectCompute HLSL file or one OpenCL .cl file? Shiandow, as far as I understand, SuperRes only requires that the separate HLSL files are applied in the correct order to work. So what benefits do MPDN's render scripts give you compared to the MPC-HC way of configuring shaders? Gravitator, I do not use any social networks. But you can use PM, I guess.
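On the question of combining separate per-pixel passes: pointwise stages can in principle be fused into one kernel, since each output depends only on the same coordinate in the previous stage. A toy Python illustration follows; the stages are invented for the example, and real shader passes that sample neighbours (as SuperRes's passes do) cannot be fused this naively, because later taps need completed results from the previous pass.

```python
def pass_gamma(img, g=2.2):
    """First 'pixel shader' pass: apply gamma (pointwise)."""
    return [v ** (1 / g) for v in img]

def pass_gain(img, k=0.9):
    """Second 'pixel shader' pass: scale brightness (pointwise)."""
    return [v * k for v in img]

def fused(img, g=2.2, k=0.9):
    """Both stages in one loop, so no intermediate buffer is written.
    This fusion is only safe because neither stage samples neighbours."""
    return [(v ** (1 / g)) * k for v in img]

img = [0.0, 0.25, 1.0]
two_pass = pass_gain(pass_gamma(img))
one_pass = fused(img)
```

The fused version trades one intermediate texture (and a render-target round trip) for a longer kernel, which is usually the point of merging passes in a compute shader.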
17th December 2014, 12:07 | #77 | Link | |
Registered User
Join Date: Dec 2013
Posts: 753
|
Quote:
|
|
19th December 2014, 10:03 | #78 | Link |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
Shiandow,
sorry for the delay, I had to check some things before answering. That's because we're implementing the same thing. Does it simply mean you need another sampler with the source image at some stage of processing? Because from what I've found here it looks more complex, like you need samplers with the results of all previous stages, or maybe something more. |
19th December 2014, 10:34 | #79 | Link |
Registered User
Join Date: Dec 2013
Posts: 753
|
There's only one step where I'd need another sampler with the source image, but more importantly, that is still not quite enough to implement SuperRes. To implement SuperRes it's more or less necessary to be able to create new samplers and to send multiple samplers to one shader. It's technically possible to do SuperRes for one of the channels by storing things in the alpha channel, but that's not ideal.
The way this is achieved in MPDN is by building a chain of so-called 'filters', which keeps track of allocating textures and sending the right textures to the right shaders. It might seem that you can use the results from all previous stages, but under the hood it will try to allocate as few textures as possible; it also won't calculate results that aren't used, and since recently it can even optimize away unnecessary conversions (so if you have X -> ConvertToYUV -> ConvertToRGB -> Y, it will simply do X -> Y). |
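The conversion-collapsing step just described can be illustrated with a small peephole pass; this is a toy model of the idea, not MPDN's actual filter code, and the stage names are just labels.

```python
# A toy version of MPDN-style filter-chain cleanup: adjacent conversions
# that cancel each other are removed before any textures would be
# allocated, so X -> ConvertToYUV -> ConvertToRGB -> Y becomes X -> Y.

INVERSES = {
    ("ConvertToYUV", "ConvertToRGB"),
    ("ConvertToRGB", "ConvertToYUV"),
}

def optimize(chain):
    """Peephole pass: repeatedly drop adjacent inverse conversion
    pairs until the chain stops changing."""
    changed = True
    while changed:
        changed = False
        for i in range(len(chain) - 1):
            if (chain[i], chain[i + 1]) in INVERSES:
                chain = chain[:i] + chain[i + 2:]
                changed = True
                break
    return chain

optimized = optimize(["X", "ConvertToYUV", "ConvertToRGB", "Y"])
```

Running the simplification before allocation is what lets a chain builder like this skip both the intermediate textures and the shader dispatches for stages that cancel out.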
19th December 2014, 11:02 | #80 | Link |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
In other words, some logic needs to be programmed anyway. I'm currently trying to understand whether creating a texture is possible inside a compute shader HLSL. I can't find any useful information so far. Maybe madshi will shed some light on this matter.
|