YAP (Yet Another Player) v0.9.5 - Page 5

madshi · 19th December 2014, 11:21

Creation of textures etc is something you need to do in CPU code. AFAIK, kernels (doesn't matter if it's pixel shaders, DirectCompute, CUDA or OpenCL) can't create textures, they can just use them.

Orf · 20th December 2014, 09:35

madshi, and what about this one?

madshi · 20th December 2014, 09:43

Quote:

Originally Posted by Orf

can you please share your PS and DirectCompute versions of nnedi3?

The DirectCompute kernel is shipping with madVR. The PS version for upscaling in X direction is here:

Code:

    "sampler SourceSampler : register(s0);\n"
    "sampler WeightSampler : register(s2);\n"
    "float4 floatConsts1 : register(c0);\n"
    "#define pixSizeX (floatConsts1[0])\n"
    "#define pixSizeY (floatConsts1[1])\n"
    "static float1 SumWeights1[nns] = (float1[nns]) packedSumWeights1Array;\n"
    "static float1 SumWeights2[nns] = (float1[nns]) packedSumWeights2Array;\n"
    "static float4x4 rgbToHd = {+0.2126000000000000, +0.7152000000000000, +0.0722000000000000, 0,\n"   // BT.709 RGB -> YCbCr
    "                           -0.1145721060573400, -0.3854278939426600, +0.5000000000000000, 0,\n"
    "                           +0.5000000000000000, -0.4541529083058166, -0.0458470916941834, 0,\n"
    "                           0, 0, 0, 0};\n"
    "\n"
    "float4 main(float2 Tex : TEXCOORD0) : COLOR0\n"
    "{\n"
    "  float input[32];\n"   // luma of a 4 (x) by 8 (y) neighbourhood
    "  float mstd0, mstd1, mstd2;\n"   // mean, standard deviation, 1 / standard deviation
    "  {\n"
    "    float sum = 0;\n"
    "    float sumsq = 0;\n"
    "    int index = 0;\n"
    "    float xpos = Tex.x - 1.0 * pixSizeX;\n"
    "    for (int ix = 0; ix < 4; ix++)\n"
    "    {\n"
    "      float ypos = Tex.y - 3.0 * pixSizeY;\n"
    "      for (int iy = 0; iy < 8; iy++)\n"
    "      {\n"
    "        float4 sample = tex2Dlod(SourceSampler, float4(xpos, ypos, 0, 0));\n"
    "        sample = (sample - 16.0f / 255.0f) / (219.0f / 255.0f);\n"   // d3d9Float8 16-235 -> 0-255
    "        sample = mul(rgbToHd, sample) * 255.0;\n"
    "        ypos += pixSizeY;\n"
    "        input[index++] = sample[0];\n"
    "        sum += sample[0];\n"
    "        sumsq += sample[0] * sample[0];\n"
    "      }\n"
    "      xpos += pixSizeX;\n"
    "    }\n"
    "    mstd0 = sum / 32.0;\n"
    "    mstd1 = sumsq / 32.0 - mstd0 * mstd0;\n"
    "    mstd1 = (mstd1 <= 1.19209290e-07) ? 0.0 : sqrt(mstd1);\n"
    "    mstd2 = (mstd1 > 0) ? (1.0 / mstd1) : 0.0;\n"
    "  }\n"
    "  float vsum = 0;\n"
    "  float wsum = 0;\n"
    "  {\n"
    "    float ypos = 0.5 / nns;\n"
    "    for (int i1 = 0; i1 < nns; i1++)\n"   // one weight texture row per neuron
    "    {\n"
    "      float xpos = 0.5 / 16.0;\n"
    "      float sum1 = 0;\n"
    "      float sum2 = 0;\n"
    "      int index = 0;\n"
    "      for (int i2 = 0; i2 < 16; i2++)\n"   // each texel holds weights for two samples
    "      {\n"
    "        float4 weights = tex2Dlod(WeightSampler, float4(xpos, ypos, 0, 0));\n"   // 2D lookup: x = texel, y = neuron
    "        xpos += 1.0 / 16.0;\n"
    "        float sample = input[index++];\n"
    "        sum1 += sample * weights[0];\n"
    "        sum2 += sample * weights[1];\n"
    "        sample = input[index++];\n"
    "        sum1 += sample * weights[2];\n"
    "        sum2 += sample * weights[3];\n"
    "      }\n"
    "      ypos += 1.0 / nns;\n"
    "      float temp1 = sum1 * mstd2 + SumWeights1[i1];\n"
    "      float temp2 = sum2 * mstd2 + SumWeights2[i1];\n"
    "      temp1 = exp(clamp(temp1, -80.0, +80.0));\n"
    "      vsum += temp1 * (temp2 / (1.0 + abs(temp2)));\n"
    "      wsum += temp1;\n"
    "    }\n"
    "  }\n"
    "  float result = (mstd0 + ((wsum > 1e-10) ? (((5.0 * vsum) / wsum) * mstd1) : 0.0)) / 255.0;\n"
    "  return result * (219.0f / 255.0f) + 16.0f / 255.0f;\n"   // d3d9Float8 0-255 -> 16-235
    "}";

This kernel needs the NNEDI3 weight "database" uploaded to the "WeightSampler" texture and the "SumWeights1/2" constants, all in the right order, though, and unfortunately I don't have that information in a form that's easy to share.
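For readers following the arithmetic, here is a rough CPU-side sketch in Python of what the shader computes for one output pixel once the 32 luma samples are gathered. The parameter names and the per-neuron weight layout (`weights1`, `weights2`, `bias1`, `bias2`) are assumptions made for illustration; they are not the real packing of the weight texture or the SumWeights1/2 constants, which is not given in this thread.

```python
import math


def nnedi3_predict(samples, weights1, weights2, bias1, bias2):
    """Mirror the shader's per-pixel math on plain Python lists.

    samples:   32 luma values in the 0..255 range (the 4x8 neighbourhood).
    weights1,
    weights2:  one list of 32 weights per neuron (hypothetical layout).
    bias1,
    bias2:     one float per neuron (stand-ins for SumWeights1/2).
    """
    n = len(samples)
    mean = sum(samples) / n
    variance = sum(s * s for s in samples) / n - mean * mean
    std = 0.0 if variance <= 1.19209290e-07 else math.sqrt(variance)
    inv_std = 1.0 / std if std > 0 else 0.0

    vsum = 0.0
    wsum = 0.0
    for w1, w2, b1, b2 in zip(weights1, weights2, bias1, bias2):
        dot1 = sum(s * w for s, w in zip(samples, w1))
        dot2 = sum(s * w for s, w in zip(samples, w2))
        # Clamped exponential acts as the (unnormalised) weight of this neuron.
        t1 = math.exp(max(-80.0, min(80.0, dot1 * inv_std + b1)))
        t2 = dot2 * inv_std + b2
        vsum += t1 * (t2 / (1.0 + abs(t2)))
        wsum += t1

    if wsum > 1e-10:
        return mean + 5.0 * vsum / wsum * std
    return mean
```

The result is in the same 0..255 luma range as the input; the shader then divides by 255 and maps it back to 16-235.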

Orf · 22nd December 2014, 06:52

madshi,
thanks, will look into it

Shiandow,
I have thought a bit about it. If I add support for a simple script like the one in the example below, would that let you replace the RenderScript part without changing your HLSL files?

For example, say we have a shader pack of 5 HLSL scripts (Shader1.hlsl, Shader2.hlsl, Shader3.hlsl, Shader4.hlsl, Shader5.hlsl).
The default generated script would be one pass:
Shader1(Source)->Shader2(Shader1)->Shader3(Shader2)->Shader4(Shader3)->Shader5(Shader4);
but you can change it to something like this:
Shader1(Source)->Shader2(Shader1)->Shader3(Shader2); // first pass
Shader4(Source); // second pass
Shader5(Source, Shader3, Shader4); // third pass
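One way to read this notation is as a small dependency graph in which every shader's output is stored under the shader's name. The following is a minimal Python sketch of that reading, not YAP's actual implementation: `parse_script`, `run_script` and the idea of modelling shaders as plain Python callables are all hypothetical, and output sizes are not modelled at all.

```python
import re

# One call such as "Shader5(Source, Shader3, Shader4)".
CALL = re.compile(r"\s*(\w+)\s*\(([^)]*)\)\s*")


def parse_script(text):
    """Return a list of (shader_name, [input_names]) in execution order."""
    steps = []
    for line in text.splitlines():
        code = line.split("//", 1)[0]  # drop a trailing comment
        for statement in code.split(";"):
            if not statement.strip():
                continue
            for call in statement.split("->"):
                match = CALL.fullmatch(call)
                if match is None:
                    raise ValueError(f"cannot parse step: {call!r}")
                name, args = match.groups()
                inputs = [a.strip() for a in args.split(",") if a.strip()]
                steps.append((name, inputs))
    return steps


def run_script(text, shaders, source):
    """Run every step; each output is kept under the shader's name."""
    textures = {"Source": source}
    for name, inputs in parse_script(text):
        textures[name] = shaders[name](*(textures[i] for i in inputs))
    return textures
```

With the three-pass example above, `parse_script` yields five steps, the last one being `("Shader5", ["Source", "Shader3", "Shader4"])`.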

Shiandow · 22nd December 2014, 11:27

Quote:

Originally Posted by Orf

Shiandow,
I have thought a bit about it. If I add support for a simple script like the one in the example below, would that let you replace the RenderScript part without changing your HLSL files?

For example, say we have a shader pack of 5 HLSL scripts (Shader1.hlsl, Shader2.hlsl, Shader3.hlsl, Shader4.hlsl, Shader5.hlsl).
The default generated script would be one pass:
Shader1(Source)->Shader2(Shader1)->Shader3(Shader2)->Shader4(Shader3)->Shader5(Shader4);
but you can change it to something like this:
Shader1(Source)->Shader2(Shader1)->Shader3(Shader2); // first pass
Shader4(Source); // second pass
Shader5(Source, Shader3, Shader4); // third pass

Well, I'd need to be able to choose the sizes of the output of a shader. And for SuperRes I'm also using MPDN's internal scaling algorithms, but if you can use shaders and change their output size then it shouldn't be too hard to recreate those scaling algorithms.

Orf · 22nd December 2014, 13:00

I see, something like Shader2(Shader1); ResizeOutput(2, 1); Shader3(Shader2);
where ResizeOutput(2, 1) will result in OutputImage.Width = 2 * InputImage.Width?

Shiandow · 22nd December 2014, 13:15

That notation is a bit ambiguous, does it resize the output of Shader2 or does it render Shader2 onto a larger texture? The latter is more important to have, but the former can also be convenient.

v0lt · 22nd December 2014, 19:23

I ran YAP.exe from the archive and, without asking anything, it added a context menu entry for folders.

Orf · 23rd December 2014, 04:48

Quote:

Originally Posted by Shiandow

That notation is a bit ambiguous, does it resize the output of Shader2 or does it render Shader2 onto a larger texture? The latter is more important to have, but the former can also be convenient.

Yes, that was just the first thing that came to mind. I meant that Shader3(Shader2) will be rendered onto a texture that is twice as wide.

v0lt,
uncheck the "Explorer context menu entry" option and YAP will remove it. But if you only plan to move the executable to some other place, don't worry, YAP will update the path automatically.

madshi · 23rd December 2014, 09:51

Quote:

Originally Posted by Shiandow

I'd need to be able to choose the sizes of the output of a shader. And for SuperRes I'm also using MPDN's internal scaling algorithms, but if you can use shaders and change their output size then it shouldn't be too hard to recreate those scaling algorithms.

JFMI: What do you need to scale for? I thought SuperRes would look at the original/unscaled image and at the final/scaled image and then post-process the scaled image, based on analyzing both images? Do you need to manually scale another time to make SuperRes work?

Shiandow · 23rd December 2014, 13:38

Quote:

Originally Posted by madshi

JFMI: What do you need to scale for? I thought SuperRes would look at the original/unscaled image and at the final/scaled image and then post-process the scaled image, based on analyzing both images? Do you need to manually scale another time to make SuperRes work?

The basic idea behind SuperRes is to try to invert a downscaling algorithm. To do this, it tries to minimize the difference between the original image and a downscaled version of the final image. If A and B are the original and upscaled images and D is a downscaling operator, then you can reduce "||A - D B||^2" (which measures the difference between A and a downscaled version of B) by changing the pixel values of B by an amount proportional to:

D^t (A - D B)

Where D^t is the transpose of the downscaling operator, which (surprisingly) is the corresponding upscaling operator. For instance if D performs bicubic downscaling then D^t performs bicubic upscaling. This means that you can calculate this part by downscaling 'B', subtracting this from A and then upscaling this again. You could technically do this in only one go, but that is several orders of magnitude slower.

This method does effectively invert the downscaling operation, but it is ill-behaved: it creates lots of ringing and aliasing. To avoid that, some post-processing is needed to remove them, but of course this may cause the image to deviate from the original again, so you have to correct that again. Anyway, that goes back and forth a few times and (hopefully) converges onto a final image. In practice 2 iterations seem to be enough to get reasonable results.
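To make the D / D^t relationship concrete, here is a toy 1D sketch in Python. It assumes D is a 2x box downscale (averaging pairs), which is a deliberate simplification and not one of the scalers SuperRes actually uses; the function names and the `step` parameter are hypothetical. It shows that the transpose of this D is nearest-neighbour upscaling up to a constant factor, and that one correction step along D^t (A - D B) reduces ||A - D B||^2. The post-processing described above is left out.

```python
def downscale(b):
    """D: average each pair of neighbouring samples (2x box downscale)."""
    return [(b[2 * i] + b[2 * i + 1]) / 2 for i in range(len(b) // 2)]


def downscale_transpose(a):
    """D^t: spread each sample back over the two positions it came from."""
    out = []
    for v in a:
        out += [v / 2, v / 2]
    return out


def dot(x, y):
    """Plain inner product, used to check that D^t really is the transpose."""
    return sum(p * q for p, q in zip(x, y))


def error(a, b):
    """||A - D B||^2 for the current estimate B."""
    return sum((p - q) ** 2 for p, q in zip(a, downscale(b)))


def superres_step(a, b, step=1.0):
    """Move B along D^t (A - D B), the descent direction of the error."""
    residual = [p - q for p, q in zip(a, downscale(b))]
    correction = downscale_transpose(residual)
    return [v + step * c for v, c in zip(b, correction)]
```

For any vectors of matching size, `dot(downscale(b), a)` equals `dot(b, downscale_transpose(a))`, which is exactly the defining property of the transpose.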

madshi · 24th December 2014, 11:49

Ok, so let me try to sum that up:

1) SuperRes needs access to the original image A and the upscaled image B.
2) SuperRes downscales B internally to the resolution of A.
3) SuperRes calculates the difference between A and the downscaled B.
4) SuperRes upscales the difference to the resolution of B.
5) SuperRes applies the upscaled difference to B.

Is that correct? I suppose the algorithms for steps 2) and 4) should be "identical" (e.g. both Bicubic AR)? Should they also be identical to the original upscaling algorithm used to upscale A to B? Or is that not necessary?

Shiandow · 24th December 2014, 12:05

Quote:

Originally Posted by madshi

Ok, so let me try to sum that up:

1) SuperRes needs access to the original image A and the upscaled image B.
2) SuperRes downscales B internally to the resolution of A.
3) SuperRes calculates the difference between A and the downscaled B.
4) SuperRes upscales the difference to the resolution of B.
5) SuperRes applies the upscaled difference to B.

Is that correct? I suppose the algorithms for steps 2) and 4) should be "identical" (e.g. both Bicubic AR)? Should they also be identical to the original upscaling algorithm used to upscale A to B? Or is that not necessary?

Yeah, that's more or less correct. Although in the last step you should also do some post-processing (SuperRes currently does anti-aliasing, anti-ringing and some sharpening).

Anyway, according to the mathematics steps 2) and 4) should have "identical" scaling algorithms, but in practice using a better algorithm for step 4) has far more benefit than using a better algorithm for step 2). My current favorite combination is to use bilinear for downscaling and Gaussian for upscaling (low aliasing and ringing). In theory you can use whatever algorithm you want for the initial scaling of A to B, but it's generally better to use one without too much aliasing; NEDI is almost ideal in that regard.

madshi · 24th December 2014, 12:15

Ok, thanks.

Anima123 · 24th December 2014, 23:38

Shiandow, NEDI doubles the resolution in both directions. Does SuperRes with NEDI enabled need a final scaling step to reach the target rectangle, while SuperRes without NEDI does not?

foxyshadis · 25th December 2014, 02:18

Anima123, it looks like it hands off to the player/renderer/next in chain to make any final adjustments, like plain NEDI. Without NEDI it'll go direct to the output resolution (unless you have a weird chain and force something else), so no further resize should be done.

Shiandow, something I'm curious about with SuperRes: Once the algorithm is pretty locked in, will converting it to OpenCL make a big difference? Also, would you eventually be willing to make an AviSynth or VapourSynth filter out of it? (NEDI-based upsizing is definitely better than NNEDI for some things.) If not, at least having the code available makes it possible for others. It just keeps getting better, I really like how well it works!

Also, if you guys don't mind, I think it's best to split this discussion out of the YAP thread.

Shiandow · 25th December 2014, 11:03

Quote:

Originally Posted by foxyshadis

Shiandow, something I'm curious about with SuperRes: Once the algorithm is pretty locked in, will converting it to OpenCL make a big difference? Also, would you eventually be willing to make an AviSynth or VapourSynth filter out of it? (NEDI-based upsizing is definitely better than NNEDI for some things.) If not, at least having the code available makes it possible for others. It just keeps getting better, I really like how well it works!

Also, if you guys don't mind, I think it's best to split this discussion out of the YAP thread.

I don't think converting to OpenCL will make much difference. One of the main advantages of NEDI and SuperRes is that a pixel only depends on the pixels that immediately surround it, so it's pretty easy to do all the work with shaders. So far I'm not planning to make an AviSynth or VapourSynth filter out of it. I had a quick look and I think it would take me a lot of time to figure out how to use those shaders in either of them.

Anyway, this discussion is deviating quite a bit from the original topic, so I agree that it would probably be better to move it to its own thread.

Orf · 26th December 2014, 04:41

Shiandow, to be completely sure

For SuperRes:
NEDI-pre -> <Upscale>-I -> <Upscale>-II -> SuperRes-pre -> SuperRes -> [SuperRes-inf -> SuperRes] -> NEDI-pst
Does Upscale mean a 2x upscale in both directions? Where should the final scaling happen, and what algorithm is used (or is best to use) for it? And how did MPC-HC know that it should reallocate and resize the output texture on steps 2 and 3?

And can you please draw the same scheme for MPDN NEDI ?

Shiandow · 26th December 2014, 14:36

Quote:

Originally Posted by Orf

Shiandow, to be completely sure

For SuperRes:
NEDI-pre -> <Upscale>-I -> <Upscale>-II -> SuperRes-pre -> SuperRes -> [SuperRes-inf -> SuperRes] -> NEDI-pst
Does Upscale mean a 2x upscale in both directions? Where should the final scaling happen, and what algorithm is used (or is best to use) for it? And how did MPC-HC know that it should reallocate and resize the output texture on steps 2 and 3?

And can you please draw the same scheme for MPDN NEDI ?

That version of SuperRes is more of a proof of concept. It worked well enough, but MPDN's version is nicer. In that version I got around the problem of allocating a new texture by abusing the alpha channel: I think either "NEDI-pre" or "Upscale-I" stores the original image in the alpha channel, and "SuperRes-pre" / "SuperRes-inf" downscale the image and store the difference with the original in the alpha channel. I only did this for the luma channel since the others didn't fit. The more complete MPDN diagram is something like:

Code:

       /---------------------------------------\
       |                                       | 
       v                                       |
Initial Guess ---> Downscale ---> Diff ---> SuperRes
                                   ^           ^
                                   |           |
Original --------------------------+-----------/

For clarity I've left out the various colour conversions. Basically I do downscaling in linear light and everything else in L*a*b.

For the NEDI shaders the diagram is something like:

Code:

NEDI-pre -> NEDI-I -> NEDI-II -> NEDI-pst

but in MPDN I could allocate new textures and did it as follows (the part in parentheses is the size of the resulting image).

Code:

Input (w,h)--->NEDI-Hinterleave (2w,h)--->NEDI-Vinterleave(2w,2h)
   |                ^         |                   ^
   V                |         V                   |
NEDI-I (w,h)--------/       NEDI-II (2w,h)--------/

The NEDI-XInterleave shaders reshuffle the pixels of their two inputs, resulting in an image that is scaled 2x either horizontally or vertically. Doing it this way avoids some unnecessary calculations.
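As a rough illustration of the reshuffling, here is a Python sketch that interleaves two images stored as lists of rows. The function names are hypothetical, and plain even/odd alternation with the first input taking the even positions is an assumption; the pixel order the real shaders use is not stated here.

```python
def interleave_horizontal(first, second):
    """Combine two (w, h) images into one (2w, h) image, alternating columns."""
    return [
        [px for pair in zip(row_a, row_b) for px in pair]
        for row_a, row_b in zip(first, second)
    ]


def interleave_vertical(first, second):
    """Combine two (w, h) images into one (w, 2h) image, alternating rows."""
    out = []
    for row_a, row_b in zip(first, second):
        out.append(list(row_a))
        out.append(list(row_b))
    return out
```

Following the diagram, the input and the NEDI-I result (both w by h) would go through `interleave_horizontal`, and that 2w by h image together with the NEDI-II result would go through `interleave_vertical` to give the 2w by 2h output.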
