Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
13th December 2014, 09:23 | #61 | Link | |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
Quote:
Also, I've come to understand that compute shaders are a better fit for this purpose than pixel shaders. Can anyone tell me, am I right that NNEDI3 is currently only available here @doom9 as pixel shader HLSL sources? @madshi I can't reproduce the F11-F12 bug you reported from your description. Maybe you have more details to report? Does it always happen for you? Last edited by Orf; 13th December 2014 at 09:43. |
|
13th December 2014, 16:15 | #62 | Link | |
Registered User
Join Date: Dec 2013
Posts: 753
|
Quote:
Edit: I was mistaken. It seems that MadVR also uses OpenCL. And I think part of the code actually comes with MadVR (in the folder 'legal stuff'). Last edited by Shiandow; 13th December 2014 at 16:26. |
|
13th December 2014, 16:19 | #63 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
|
NNEDI3 is done in OpenCL in madVR as well. Only the dithering shaders are DirectCompute, afaik.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
13th December 2014, 16:29 | #65 | Link |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
I've tried converting NNEDI3 to DirectCompute, but it performed *much* slower than it does with OpenCL, unlike error diffusion, which was actually slightly faster with DirectCompute. So nevcairiel is right: NNEDI3 is done in OpenCL, error diffusion in DirectCompute.
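For readers unfamiliar with it, error diffusion is a dithering family in which each pixel's quantization error is pushed onto not-yet-processed neighbours, which creates the serial data dependency that makes its GPU mapping interesting. Below is a minimal Floyd-Steinberg sketch in Python, purely to illustrate the data flow; it is not madVR's implementation, whose actual kernels run on the GPU.

```python
def floyd_steinberg(pixels, levels=2):
    """Quantize a 2D grayscale image (values in 0..1) to `levels` shades,
    diffusing each pixel's quantization error to neighbouring pixels."""
    h = len(pixels)
    w = len(pixels[0])
    img = [row[:] for row in pixels]  # work on a copy
    step = 1.0 / (levels - 1)
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = round(old / step) * step  # nearest representable shade
            img[y][x] = new
            err = old - new
            # Floyd-Steinberg weights: 7/16 right, 3/16 down-left,
            # 5/16 down, 1/16 down-right
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1][x + 1] += err * 1 / 16
    return img

out = floyd_steinberg([[0.4, 0.4], [0.4, 0.4]])
```

The `img[...] += err * w` updates are what make the algorithm order-dependent: every output depends on errors diffused from pixels processed earlier, which is why a naive one-thread-per-pixel shader can't express it directly.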
@Orf, unfortunately I don't have any time atm. But I know that it was 100% reproducible for me when I reported the problem. |
13th December 2014, 19:36 | #66 | Link |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
Shiandow, nevcairiel, madshi,
thanks, guys, for your comments; they really help me find my way in the dark. But on second thought, and maybe I'm missing something here, why did you all start talking in one voice about OpenCL vs DirectCompute, when I initially asked about pixel shaders vs compute shaders? |
15th December 2014, 06:39 | #68 | Link |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
madshi, I agree, maybe I didn't ask the question in the correct way. I'll try to correct myself now. As far as I understand, besides using the DirectCompute/OpenCL APIs there's a third way to do it: simply drawing a quad via Direct3D and applying a pixel shader to it. Did you test whether that method is any faster than OpenCL?
|
15th December 2014, 08:09 | #69 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
|
Pixel Shaders are much more limited, and something as complex as NNEDI3 is unlikely to be possible with pixel shaders alone.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
15th December 2014, 08:15 | #70 | Link | |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
The reason why pixel shaders are so much slower than OpenCL is that pixel shaders apply math to every destination pixel separately. OpenCL and DirectCompute are more flexible: you can configure them to render multiple destination pixels with one kernel pass. Doing that allows you to cache things cleverly and to share some calculations between multiple pixels, etc. Especially for NNEDI3 that's very important to get things up to speed. |
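The caching idea described here can be sketched outside of HLSL. The toy Python below compares a "pixel shader" style 3-tap blur, where every output fetches all of its own taps, with a "compute" style tiled version that loads a tile plus a one-pixel apron into a local cache once and lets all outputs in the tile reuse it. The tile size and the blur itself are illustrative, not anything from madVR.

```python
def blur_per_pixel(src):
    """Pixel-shader style: every output fetches its own 3 taps."""
    reads = 0
    out = []
    n = len(src)
    for x in range(n):
        taps = [src[min(max(x + d, 0), n - 1)] for d in (-1, 0, 1)]
        reads += 3
        out.append(sum(taps) / 3)
    return out, reads

def blur_tiled(src, tile=4):
    """Compute-shader style: one 'thread group' loads a tile plus a
    1-pixel apron into fast local storage (modelling groupshared
    memory), then all outputs in the tile reuse those cached samples."""
    reads = 0
    out = []
    n = len(src)
    for start in range(0, n, tile):
        end = min(start + tile, n)
        # load tile + apron once
        local = [src[min(max(i, 0), n - 1)] for i in range(start - 1, end + 1)]
        reads += len(local)
        for x in range(start, end):
            i = x - (start - 1)  # index into the cached slice
            out.append((local[i - 1] + local[i] + local[i + 1]) / 3)
    return out, reads

src = [float(v) for v in range(16)]
a, reads_pp = blur_per_pixel(src)
b, reads_tl = blur_tiled(src)
```

On this 16-pixel example both versions produce identical output, but the per-pixel version issues 48 reads while the tiled one issues 24; for a footprint as large as NNEDI3's the savings are far bigger, which matches the speed difference reported here.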
|
16th December 2014, 05:32 | #71 | Link |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
nevcairiel, madshi, thanks
I do understand OpenCL/DirectCompute is more powerful; that was why I initially asked. But this NEDI/NNEDI is really confusing me. To summarize what I've learned from you:
- NNEDI3 is the heaviest algorithm, but it gives the best result in the end
- NEDI as implemented here, for example, is less heavy, so using pixel shaders is acceptable
- madVR internally should have at least two image processing pipelines: #1 is a pixel shader pipeline, #2 is an OpenCL pipeline. It may also have a DirectCompute pipeline as #3
- IMadVRExternalPixelShaders supports only #1 (?)
- To make image processing general and flexible, all three pipelines would have to be supported
- Which one of the three is better is kind of an open question. Also, picking and implementing only one of them would require rewriting the HLSL/CL code (a thing I'm very unlikely to manage myself)
Am I still missing something? |
16th December 2014, 09:28 | #72 | Link | |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
Although the names suggest otherwise, NNEDI3 and NEDI are *totally* different algorithms which have almost nothing in common (except for doing an exact 2x upscale). IMO NNEDI3 has better image quality, but it's also quite a bit slower than NEDI. And yes, NEDI works fine with simple PS3.0 pixel shaders, while NNEDI3 requires OpenCL to run at a decent speed. FYI, Shiandow has written a super-res post-processing algorithm (using simple pixel shaders, once again) which improves NEDI quality even further, bringing it even nearer to NNEDI3 quality. This super-res algorithm is currently available for NEDI only, I think, but in theory it could also be used to improve other 2x upscale algorithms, e.g. NNEDI3, or even Bicubic/Lanczos. I'm not sure whether this super-res algorithm would improve NNEDI3 quality too; we haven't tried yet, I think. But it might. I'm hoping that the super-res algorithm will sooner or later become a separate filter, running after any other 2x upscaling algorithm. |
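A rough sketch of the general super-res idea, in the reconstruction sense: nudge the upscaled image until downscaling it reproduces the source. This is only a guess at the principle in toy 1-D Python, not Shiandow's actual shader code, and all function names and parameters here are made up for the example.

```python
def upscale_2x(src):
    """Simple linear 2x upscale (a stand-in for NEDI in this sketch)."""
    out = []
    for i, v in enumerate(src):
        nxt = src[min(i + 1, len(src) - 1)]
        out.extend([v, (v + nxt) / 2])
    return out

def downscale_2x(img):
    """Box downscale: average each pair of pixels."""
    return [(img[2 * i] + img[2 * i + 1]) / 2 for i in range(len(img) // 2)]

def superres_refine(src, upscaled, iterations=8, strength=0.5):
    """One reading of the super-res idea: repeatedly measure how far
    the downscaled result drifts from the original source, and push
    the upscaled pixels back toward consistency with it."""
    img = list(upscaled)
    for _ in range(iterations):
        diff = [s - d for s, d in zip(src, downscale_2x(img))]
        for i, e in enumerate(diff):
            img[2 * i] += strength * e
            img[2 * i + 1] += strength * e
    return img

src = [0.0, 1.0, 0.5]
up = upscale_2x(src)
refined = superres_refine(src, up)
```

Because the refinement step only needs the source image and the output of *some* 2x upscaler, a loop like this is agnostic to which upscaler produced its input, which is consistent with the hope above of running it as a separate filter after any 2x algorithm.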
|
16th December 2014, 12:24 | #73 | Link | |
Registered User
Join Date: Dec 2013
Posts: 753
|
Quote:
|
|
16th December 2014, 14:45 | #75 | Link | |
Registered User
Join Date: May 2014
Posts: 292
|
Quote:
This is just a proposal for expanding/improving your product. The problem lies in understanding the translation. It would be good to contact you via e-mail (Vkontakte). |
|
17th December 2014, 06:13 | #76 | Link |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
madshi,
can you please share the PS and DirectCompute versions of your NNEDI3? It would be a nice example for me to estimate the differences in the code base, and maybe to test performance. I quickly looked through Shiandow's SuperRes implementation; am I right in guessing that the separate PS HLSL files could theoretically be combined into one DirectCompute HLSL file or one OpenCL .cl file? Shiandow, as far as I understand, SuperRes only requires that the separate HLSL files are applied in the correct order to work. So what benefits do MPDN's render scripts give you compared to the MPC-HC way of configuring shaders? Gravitator, I do not use any social networks. But you can use PM, I guess.
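On the question of combining separate per-pixel passes: pointwise stages can in principle be fused into one kernel, since each output depends only on the same coordinate in the previous stage. A toy Python illustration follows; the stages are invented for the example, and real shader passes that sample neighbours (as SuperRes's passes do) cannot be fused this naively, because later taps need completed results from the previous pass.

```python
def pass_gamma(img, g=2.2):
    """First 'pixel shader' pass: apply gamma (pointwise)."""
    return [v ** (1 / g) for v in img]

def pass_gain(img, k=0.9):
    """Second 'pixel shader' pass: scale brightness (pointwise)."""
    return [v * k for v in img]

def fused(img, g=2.2, k=0.9):
    """Both stages in one loop, so no intermediate buffer is written.
    This fusion is only safe because neither stage samples neighbours."""
    return [(v ** (1 / g)) * k for v in img]

img = [0.0, 0.25, 1.0]
two_pass = pass_gain(pass_gamma(img))
one_pass = fused(img)
```

The fused version trades one intermediate texture (and a render-target round trip) for a longer kernel, which is usually the point of merging passes in a compute shader.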
17th December 2014, 12:07 | #77 | Link | |
Registered User
Join Date: Dec 2013
Posts: 753
|
Quote:
|
|
19th December 2014, 10:03 | #78 | Link |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
Shiandow,
sorry for the delay, I had to check some things before answering. That's because we're implementing the same thing. Does it simply mean you need another sampler with the source image at some stage of processing? Because from what I've found here it looks more complex, like you need samplers with the results of all previous stages, or maybe something more. |
19th December 2014, 10:34 | #79 | Link |
Registered User
Join Date: Dec 2013
Posts: 753
|
There's only one step where I'd need another sampler with the source image, but more importantly, that is still not quite enough to implement SuperRes. To implement SuperRes it's more or less necessary to be able to create new samplers and to send multiple samplers to one shader. It's technically possible to do SuperRes for one of the channels by storing things in the alpha channel, but that's not ideal.
The way this is achieved in MPDN is by building a chain of so-called 'filters', which keeps track of allocating textures and sending the right textures to the right shaders. It might seem that you can use the results from all previous stages, but under the hood it will try to allocate as few textures as possible; it also won't calculate results that aren't used, and since recently it can even optimize away unnecessary conversions (so if you have X -> ConvertToYUV -> ConvertToRGB -> Y, it will simply do X -> Y). |
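The conversion-collapsing step just described can be illustrated with a small peephole pass; this is a toy model of the idea, not MPDN's actual filter code, and the stage names are just labels.

```python
# A toy version of MPDN-style filter-chain cleanup: adjacent conversions
# that cancel each other are removed before any textures would be
# allocated, so X -> ConvertToYUV -> ConvertToRGB -> Y becomes X -> Y.

INVERSES = {
    ("ConvertToYUV", "ConvertToRGB"),
    ("ConvertToRGB", "ConvertToYUV"),
}

def optimize(chain):
    """Peephole pass: repeatedly drop adjacent inverse conversion
    pairs until the chain stops changing."""
    changed = True
    while changed:
        changed = False
        for i in range(len(chain) - 1):
            if (chain[i], chain[i + 1]) in INVERSES:
                chain = chain[:i] + chain[i + 2:]
                changed = True
                break
    return chain

optimized = optimize(["X", "ConvertToYUV", "ConvertToRGB", "Y"])
```

Running the simplification before allocation is what lets a chain builder like this skip both the intermediate textures and the shader dispatches for stages that cancel out.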
19th December 2014, 11:02 | #80 | Link |
YAP author
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
|
In other words, some logic needs to be programmed anyway. I'm currently trying to understand whether creating a texture is possible inside a compute shader HLSL. I can't find any useful information so far. Maybe madshi will shed some light on this matter.
|