Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

Old 19th December 2014, 11:21   #81  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Creation of textures etc is something you need to do in CPU code. AFAIK, kernels (doesn't matter if it's pixel shaders, DirectCompute, CUDA or OpenCL) can't create textures, they can just use them.
madshi is offline   Reply With Quote
Old 20th December 2014, 09:35   #82  |  Link
Orf
YAP author
 
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
madshi, and what about this one?
Orf is offline   Reply With Quote
Old 20th December 2014, 09:43   #83  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by Orf View Post
Can you please share your PS and DirectCompute versions of nnedi3?
The DirectCompute kernel is shipping with madVR. The PS version for upscaling in X direction is here:

Code:
    "sampler SourceSampler : register(s0);\n"
    "sampler WeightSampler : register(s2);\n"
    "float4 floatConsts1 : register(c0);\n"
    "#define pixSizeX (floatConsts1[0])\n"
    "#define pixSizeY (floatConsts1[1])\n"
    "static float1 SumWeights1[nns] = (float1[nns]) packedSumWeights1Array;\n"
    "static float1 SumWeights2[nns] = (float1[nns]) packedSumWeights2Array;\n"
    "static float4x4 rgbToHd = {+0.2126000000000000, +0.7152000000000000, +0.0722000000000000, 0,\n"
    "                           -0.1145721060573400, -0.3854278939426600, +0.5000000000000000, 0,\n"
    "                           +0.5000000000000000, -0.4541529083058166, -0.0458470916941834, 0, 0, 0, 0, 0};\n"
    "\n"
    "float4 main(float2 Tex : TEXCOORD0) : COLOR0\n"
    "{\n"
    "  float input[32];\n"
    "  float mstd0, mstd1, mstd2;\n"
    "  {\n"
    "    float sum = 0;\n"
    "    float sumsq = 0;\n"
    "    int index = 0;\n"
    "    float xpos = Tex.x - 1.0 * pixSizeX;\n"
    "    for (int ix = 0; ix < 4; ix++)\n"
    "    {\n"
    "      float ypos = Tex.y - 3.0 * pixSizeY;\n"
    "      for (int iy = 0; iy < 8; iy++)\n"
    "      {\n"
    "        float4 sample = tex2Dlod(SourceSampler, float4(xpos, ypos, 0, 0));\n"
    "        sample = (sample - 16.0f / 255.0f) / (219.0f / 255.0f);\n"   // d3d9Float8 16-235 -> 0-255
    "	       sample = mul(rgbToHd, sample) * 255.0;\n"
    "        ypos += pixSizeY;\n"
    "        input[index++] = sample[0];\n"
    "        sum += sample[0];\n"
    "        sumsq += sample[0] * sample[0];\n"
    "      }\n"
    "      xpos += pixSizeX;\n"
    "    }\n"
    "    mstd0 = sum / 32.0;\n"
    "    mstd1 = sumsq / 32.0 - mstd0 * mstd0;\n"
    "    mstd1 = (mstd1 <= 1.19209290e-07) ? 0.0 : sqrt(mstd1);\n"
    "    mstd2 = (mstd1 > 0) ? (1.0 / mstd1) : 0.0;\n"
    "  }\n"
    "  float vsum = 0;\n"
    "  float wsum = 0;\n"
    "  {\n"
    "    float ypos = 0.5 / nns;\n"
    "    for (int i1 = 0; i1 < nns; i1++)\n"
    "    {\n"
    "      float xpos = 0.5 / 16.0;\n"
    "      float sum1 = 0;\n"
    "      float sum2 = 0;\n"
    "      int index = 0;\n"
    "      for (int i2 = 0; i2 < 16; i2++)\n"
    "      {\n"
    "        float4 weights = tex1Dlod(WeightSampler, float4(xpos, ypos, 0, 0));\n"
    "        xpos += 1.0 / 16.0;\n"
    "        float sample = input[index++];\n"
    "        sum1 += sample * weights[0];\n"
    "        sum2 += sample * weights[1];\n"
    "        sample = input[index++];\n"
    "        sum1 += sample * weights[2];\n"
    "        sum2 += sample * weights[3];\n"
    "      }\n"
    "      ypos += 1.0 / nns;\n"
    "      float temp1 = sum1 * mstd2 + SumWeights1[i1];\n"
    "      float temp2 = sum2 * mstd2 + SumWeights2[i1];\n"
    "      temp1 = exp(clamp(temp1, -80.0, +80.0));\n"
    "      vsum += temp1 * (temp2 / (1.0 + abs(temp2)));\n"
    "      wsum += temp1;\n"
    "    }\n"
    "  }\n"
    "  float result = (mstd0 + ((wsum > 1e-10) ? (((5.0 * vsum) / wsum) * mstd1) : 0.0)) / 255.0;\n"
    "  return result * (219.0f / 255.0f) + 16.0f / 255.0f;\n"   // d3d9Float8 0-255 -> 16-235
    "}";
This kernel needs the NNEDI3 weight "database" uploaded to the "WeightSampler" texture and the "SumWeights1/2" constants in the right order, though, and unfortunately I don't have that information in a form that's easy to share.
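To experiment with the neuron evaluation outside of HLSL, the structure of the shader's inner loops can be sketched in NumPy. This is an illustrative stand-in only: the weight layout (`w1`, `w2`) and the biases are hypothetical placeholders for the NNEDI3 weight data, which is not reproduced here.

```python
import numpy as np

def nnedi3_predict(window, w1, w2, b1, b2):
    """Predict one pixel from a 4x8 neighbourhood, mirroring the shader:
    normalise by the window's mean/std, evaluate nns neuron pairs, then
    take a softmax-style weighted average of their outputs.

    window : (32,) luma samples (the block the shader reads)
    w1, w2 : (nns, 32) neuron weights  (hypothetical layout)
    b1, b2 : (nns,) biases             (SumWeights1/2 in the shader)
    """
    mstd0 = window.mean()
    var = (window * window).mean() - mstd0 * mstd0
    mstd1 = 0.0 if var <= 1.19209290e-07 else float(np.sqrt(var))
    mstd2 = 1.0 / mstd1 if mstd1 > 0 else 0.0

    t1 = w1 @ window * mstd2 + b1        # shader: sum1 * mstd2 + SumWeights1
    t2 = w2 @ window * mstd2 + b2        # shader: sum2 * mstd2 + SumWeights2
    e = np.exp(np.clip(t1, -80.0, 80.0))
    vsum = np.sum(e * (t2 / (1.0 + np.abs(t2))))
    wsum = np.sum(e)
    corr = (5.0 * vsum) / wsum * mstd1 if wsum > 1e-10 else 0.0
    return mstd0 + corr
```

A quick sanity check: for a flat window the variance is zero, so the correction term vanishes and the prediction degenerates to the window mean.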
madshi is offline   Reply With Quote
Old 22nd December 2014, 06:52   #84  |  Link
Orf
YAP author
 
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
madshi,
thanks, will look into it

Shiandow,
I've thought about it a bit. If I add support for a simple script that looks like the example below, will that allow you to replace the RenderScript part without changing your HLSL files?

For example, say we have a shader pack of 5 HLSL scripts (Shader1.hlsl, Shader2.hlsl, Shader3.hlsl, Shader4.hlsl, Shader5.hlsl);
the default generated script would be a single pass:
Shader1(Source)->Shader2(Shader1)->Shader3(Shader2)->Shader4(Shader3)->Shader5(Shader4);
but you can change it to something like this:
Shader1(Source)->Shader2(Shader1)->Shader3(Shader2); // first pass
Shader4(Source); // second pass
Shader5(Source, Shader3, Shader4); // third pass
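Such a pass script boils down to a small interpreter: each shader is a function from one or more named input textures to an output texture, evaluated in order. A toy sketch of that idea (the names, the `stretch`/`blend` stand-in shaders, and the size function playing the role of a ResizeOutput-style step are all hypothetical, not YAP's actual API):

```python
def dims(img):
    """(width, height) of an image stored as a list of rows."""
    return len(img[0]), len(img)

def run_passes(passes, textures):
    """Evaluate shader passes in order.

    passes: list of (output_name, shader_fn, input_names, out_size_fn)
      shader_fn(inputs, (w, h)) -> new image of the given size
      out_size_fn(w, h) -> output size derived from the first input,
                           standing in for a ResizeOutput(2, 1)-style step
    textures: dict name -> image; initially holds just "Source".
    """
    for out_name, shader_fn, input_names, out_size_fn in passes:
        inputs = [textures[n] for n in input_names]
        out_size = out_size_fn(*dims(inputs[0]))
        textures[out_name] = shader_fn(inputs, out_size)
    return textures

# Two toy "shaders": nearest-neighbour copy to the target size, and a
# pixel-wise average of all inputs (which must share the target size).
def stretch(inputs, size):
    src = inputs[0]
    sw, sh = dims(src)
    w, h = size
    return [[src[y * sh // h][x * sw // w] for x in range(w)]
            for y in range(h)]

def blend(inputs, size):
    w, h = size
    return [[sum(img[y][x] for img in inputs) / len(inputs)
             for x in range(w)] for y in range(h)]
```

For instance, `("Shader2", stretch, ["Shader1"], lambda w, h: (2 * w, h))` would render Shader2 onto a texture twice as wide as Shader1's output.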
Orf is offline   Reply With Quote
Old 22nd December 2014, 11:27   #85  |  Link
Shiandow
Registered User
 
Join Date: Dec 2013
Posts: 753
Quote:
Originally Posted by Orf View Post
Shiandow,
I've thought about it a bit. If I add support for a simple script that looks like the example below, will that allow you to replace the RenderScript part without changing your HLSL files?

For example, say we have a shader pack of 5 HLSL scripts (Shader1.hlsl, Shader2.hlsl, Shader3.hlsl, Shader4.hlsl, Shader5.hlsl);
the default generated script would be a single pass:
Shader1(Source)->Shader2(Shader1)->Shader3(Shader2)->Shader4(Shader3)->Shader5(Shader4);
but you can change it to something like this:
Shader1(Source)->Shader2(Shader1)->Shader3(Shader2); // first pass
Shader4(Source); // second pass
Shader5(Source, Shader3, Shader4); // third pass
Well, I'd need to be able to choose the sizes of the output of a shader. And for SuperRes I'm also using MPDN's internal scaling algorithms, but if you can use shaders and change their output size then it shouldn't be too hard to recreate those scaling algorithms.
Shiandow is offline   Reply With Quote
Old 22nd December 2014, 13:00   #86  |  Link
Orf
YAP author
 
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
I see, something like Shader2(Shader1); ResizeOutput(2, 1); Shader3(Shader2);
where ResizeOutput(2, 1) would result in OutputImage.Width = 2 * InputImage.Width?
Orf is offline   Reply With Quote
Old 22nd December 2014, 13:15   #87  |  Link
Shiandow
Registered User
 
Join Date: Dec 2013
Posts: 753
That notation is a bit ambiguous: does it resize the output of Shader2, or does it render Shader2 onto a larger texture? The latter is more important to have, but the former can also be convenient.
Shiandow is offline   Reply With Quote
Old 22nd December 2014, 19:23   #88  |  Link
v0lt
Registered User
 
Join Date: Dec 2008
Posts: 1,959
I ran YAP.exe from the archive, and without asking anything it added a context menu entry for folders.
v0lt is offline   Reply With Quote
Old 23rd December 2014, 04:48   #89  |  Link
Orf
YAP author
 
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
Quote:
Originally Posted by Shiandow View Post
That notation is a bit ambiguous: does it resize the output of Shader2, or does it render Shader2 onto a larger texture? The latter is more important to have, but the former can also be convenient.
Yes, that was just the first thing that came to mind. I meant that Shader3(Shader2) would be rendered to a texture twice as wide.

v0lt,
uncheck "Explorer context menu entry" option and YAP will remove it. But if you only plan move executable to some other place, don't worry YAP will update path automatically
Orf is offline   Reply With Quote
Old 23rd December 2014, 09:51   #91  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by Shiandow View Post
I'd need to be able to choose the sizes of the output of a shader. And for SuperRes I'm also using MPDN's internal scaling algorithms, but if you can use shaders and change their output size then it shouldn't be too hard to recreate those scaling algorithms.
JFMI: What do you need to scale for? I thought SuperRes would look at the original/unscaled image and at the final/scaled image and then post-process the scaled image, based on analyzing both images? Do you need to manually scale another time to make SuperRes work?
madshi is offline   Reply With Quote
Old 23rd December 2014, 13:38   #92  |  Link
Shiandow
Registered User
 
Join Date: Dec 2013
Posts: 753
Quote:
Originally Posted by madshi View Post
JFMI: What do you need to scale for? I thought SuperRes would look at the original/unscaled image and at the final/scaled image and then post-process the scaled image, based on analyzing both images? Do you need to manually scale another time to make SuperRes work?
The basic idea behind SuperRes is to try to invert a downscaling algorithm. To do this, it tries to minimize the difference between the original image and a downscaled version of the final image. If A and B are the original and upscaled images and D is a downscaling operator, then you can minimize ||A - D B||^2 (which measures the difference between A and a downscaled version of B) by changing the pixel values of B by something like:

- D^t (A - D B)

where D^t is the transpose of the downscaling operator, which (surprisingly) is the corresponding upscaling operator. For instance, if D performs bicubic downscaling then D^t performs bicubic upscaling. This means that you can calculate this part by downscaling B, subtracting the result from A, and then upscaling that difference again. You could technically do it in one go, but that is several orders of magnitude slower.

This method does effectively invert the downscaling operation, but it is ill-behaved: it will create lots of ringing and aliasing. To avoid that, it is necessary to do some post-processing to remove them, but of course this may cause the image to deviate from the original again, so you have to correct that again. Anyway, that goes back and forth a few times and (hopefully) converges onto a final image. In practice, 2 iterations seem to be enough to get reasonable results.
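The correction step can be demonstrated in one dimension with NumPy. This is purely a toy illustration (an explicit matrix for a simple 2:1 box downscaler), not how any renderer implements it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                # low-res length; high-res is 2n

# D: a 2:1 box downscaler written out as an explicit (n x 2n) matrix.
# Its transpose D^t repeats each low-res sample (up to a 0.5 factor),
# i.e. it acts as the corresponding upscaler.
D = np.zeros((n, 2 * n))
for i in range(n):
    D[i, 2 * i] = D[i, 2 * i + 1] = 0.5

A = rng.random(n)                               # original low-res signal
B = np.repeat(A, 2) + 0.1 * rng.random(2 * n)   # perturbed upscaled guess

errors = [np.linalg.norm(A - D @ B)]
for _ in range(5):
    residual = A - D @ B             # compare at low resolution
    B = B + D.T @ residual           # upscale the residual with D^t, apply
    errors.append(np.linalg.norm(A - D @ B))
```

With this particular D, each step multiplies the residual by (I - D D^t) = 0.5 I, so ||A - D B|| halves every iteration; real scalers converge less neatly, which is where the post-processing comes in.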
Shiandow is offline   Reply With Quote
Old 24th December 2014, 11:49   #93  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Ok, so let me try to sum that up:

1) SuperRes needs access to the original image A and the upscaled image B.
2) SuperRes downscales B internally to the resolution of A.
3) SuperRes calculates the difference between A and B.
4) SuperRes upscales the difference to the resolution of B.
5) SuperRes applies the upscaled difference to B.

Is that correct? I suppose the algorithms for steps 2) and 4) should be "identical" (e.g. both Bicubic AR)? Should they also be identical to the original upscaling algorithm used to upscale A to B? Or is that not necessary?
madshi is offline   Reply With Quote
Old 24th December 2014, 12:05   #94  |  Link
Shiandow
Registered User
 
Join Date: Dec 2013
Posts: 753
Quote:
Originally Posted by madshi View Post
Ok, so let me try to sum that up:

1) SuperRes needs access to the original image A and the upscaled image B.
2) SuperRes downscales B internally to the resolution of A.
3) SuperRes calculates the difference between A and B.
4) SuperRes upscales the difference to the resolution of B.
5) SuperRes applies the upscaled difference to B.

Is that correct? I suppose the algorithms for steps 2) and 4) should be "identical" (e.g. both Bicubic AR)? Should they also be identical to the original upscaling algorithm used to upscale A to B? Or is that not necessary?
Yeah, that's more or less correct. Although in the last step you should also do some post-processing (SuperRes currently does anti-aliasing, anti-ringing and some sharpening).

Anyway, according to the mathematics, steps 2) and 4) should use "identical" scaling algorithms, but in practice using a better algorithm for step 4) has far more benefit than using a better one for step 2). My current favourite combination is to use bilinear for downscaling and Gaussian for upscaling (low aliasing and ringing). In theory you can use whatever algorithm you want for the initial scaling of A to B, but it's generally better to use one without too much aliasing; NEDI is almost ideal in that regard.
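A toy 2D version of one correction pass (steps 2-5, minus the post-processing) can be sketched in NumPy. Here a 2:1 box downscale is a crude stand-in for bilinear and a blurred nearest-neighbour upscale stands in for Gaussian; none of this is MPDN's actual code:

```python
import numpy as np

def downscale2(img):
    """2:1 box downscale (a crude stand-in for bilinear)."""
    return 0.25 * (img[::2, ::2] + img[1::2, ::2]
                   + img[::2, 1::2] + img[1::2, 1::2])

def upscale2(img):
    """2x nearest-neighbour upscale followed by a small Gaussian-ish blur."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    k = np.array([0.25, 0.5, 0.25])        # separable 3-tap kernel
    pad = np.pad(up, 1, mode="edge")
    rows = k[0] * pad[:-2, 1:-1] + k[1] * pad[1:-1, 1:-1] + k[2] * pad[2:, 1:-1]
    pad2 = np.pad(rows, ((0, 0), (1, 1)), mode="edge")
    return k[0] * pad2[:, :-2] + k[1] * pad2[:, 1:-1] + k[2] * pad2[:, 2:]

def superres_pass(original, guess):
    """Downscale the guess, diff against the original at low resolution,
    upscale the difference, and apply it as a correction."""
    diff = original - downscale2(guess)
    return guess + upscale2(diff)
```

A simple check: if the guess is off by a constant offset, one pass removes it exactly, since both kernels preserve constants.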
Shiandow is offline   Reply With Quote
Old 24th December 2014, 12:15   #95  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Ok, thanks.
madshi is offline   Reply With Quote
Old 24th December 2014, 23:38   #96  |  Link
Anima123
Registered User
 
Join Date: Jun 2005
Posts: 504
Shiandow, NEDI doubles the resolution in both directions; does SuperRes with NEDI enabled need a final scaling step to reach the target rectangle, while SuperRes without NEDI does not?
Anima123 is offline   Reply With Quote
Old 25th December 2014, 02:18   #97  |  Link
foxyshadis
ангел смерти
 
foxyshadis's Avatar
 
Join Date: Nov 2004
Location: Lost
Posts: 9,558
Anima123, it looks like it hands off to the player/renderer/next in chain to make any final adjustments, like plain NEDI. Without NEDI it'll go direct to the output resolution (unless you have a weird chain and force something else), so no further resize should be done.

Shiandow, something I'm curious about with SuperRes: Once the algorithm is pretty locked in, will converting it to OpenCL make a big difference? Also, would you eventually be willing to make an AviSynth or VapourSynth filter out of it? (NEDI-based upsizing is definitely better than NNEDI for some things.) If not, at least having the code available makes it possible for others. It just keeps getting better, I really like how well it works!

Also, if you guys don't mind, I think it's best to split this discussion out of the YAP thread.
foxyshadis is offline   Reply With Quote
Old 25th December 2014, 11:03   #98  |  Link
Shiandow
Registered User
 
Join Date: Dec 2013
Posts: 753
Quote:
Originally Posted by foxyshadis View Post
Shiandow, something I'm curious about with SuperRes: Once the algorithm is pretty locked in, will converting it to OpenCL make a big difference? Also, would you eventually be willing to make an AviSynth or VapourSynth filter out of it? (NEDI-based upsizing is definitely better than NNEDI for some things.) If not, at least having the code available makes it possible for others. It just keeps getting better, I really like how well it works!

Also, if you guys don't mind, I think it's best to split this discussion out of the YAP thread.
I don't think converting to OpenCL will make much difference. One of the main advantages of NEDI and SuperRes is that a pixel only depends on the pixels that immediately surround it, so it's pretty easy to do all the work with shaders. So far I'm not planning to make an AviSynth or VapourSynth filter out of it; I had a quick look and I think it would take me a lot of time to figure out how to use those shaders in either of them.

Anyway, this discussion is deviating quite a bit from the original topic, so I agree that it would probably be better to move it to its own thread.
Shiandow is offline   Reply With Quote
Old 26th December 2014, 04:41   #99  |  Link
Orf
YAP author
 
Join Date: Jul 2014
Location: Russian Federation
Posts: 111
Shiandow, just to be completely sure:

For SuperRes:
NEDI-pre -> <Upscale>-I -> <Upscale>-II -> SuperRes-pre -> SuperRes -> [SuperRes-inf -> SuperRes] -> NEDI-pst
Does Upscale mean a 2x upscale in both directions? Where should the final scaling happen, and which algorithm is used (or better to use) for it? And how does MPC-HC know that it should reallocate and resize the output texture in steps 2 and 3?

And can you please draw the same scheme for MPDN NEDI ?

Last edited by Orf; 26th December 2014 at 04:51.
Orf is offline   Reply With Quote
Old 26th December 2014, 14:36   #100  |  Link
Shiandow
Registered User
 
Join Date: Dec 2013
Posts: 753
Quote:
Originally Posted by Orf View Post
Shiandow, just to be completely sure:

For SuperRes:
NEDI-pre -> <Upscale>-I -> <Upscale>-II -> SuperRes-pre -> SuperRes -> [SuperRes-inf -> SuperRes] -> NEDI-pst
Does Upscale mean a 2x upscale in both directions? Where should the final scaling happen, and which algorithm is used (or better to use) for it? And how does MPC-HC know that it should reallocate and resize the output texture in steps 2 and 3?

And can you please draw the same scheme for MPDN NEDI ?
That version of SuperRes is more of a proof of concept; it worked well enough, but MPDN's version is nicer. In that version I got around the problem of allocating a new texture by abusing the alpha channel: I think either "NEDI-pre" or "Upscale-I" stores the original image in the alpha channel, and "SuperRes-pre" / "SuperRes-inf" downscale the image and store the difference with the original in the alpha channel. I only did this for the luma channel since the others didn't fit. The more complete MPDN diagram is something like:

Code:
       /---------------------------------------\
       |                                       | 
       v                                       |
Initial Guess ---> Downscale ---> Diff ---> SuperRes
                                   ^           ^
                                   |           |
Original --------------------------+-----------/
For clarity I've left out the various colour conversions. Basically I do downscaling in linear light and everything else in L*a*b.

For the NEDI shaders the diagram is something like:

Code:
NEDI-pre -> NEDI-I -> NEDI-II -> NEDI-pst
but in MPDN I could allocate new textures, and did it as follows (the part in parentheses is the size of the resulting image):

Code:
Input (w,h)--->NEDI-Hinterleave (2w,h)--->NEDI-Vinterleave(2w,2h)
   |                ^         |                   ^
   V                |         V                   |
NEDI-I (w,h)--------/       NEDI-II (2w,h)--------/
The NEDI-XInterleave shaders reshuffle the pixels of their two inputs, resulting in an image that is scaled 2x either horizontally or vertically. Doing it this way avoids some unnecessary calculations.
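The interleave step itself is simple to sketch (this is an illustration in NumPy, not MPDN's actual shader code): NEDI-Hinterleave alternates columns of its two inputs, and NEDI-Vinterleave alternates rows.

```python
import numpy as np

def h_interleave(a, b):
    """Combine two (h, w) images into one (h, 2w) image: even columns
    come from `a` (the known pixels), odd columns from `b` (the
    predicted in-between pixels)."""
    h, w = a.shape
    out = np.empty((h, 2 * w), dtype=a.dtype)
    out[:, 0::2] = a
    out[:, 1::2] = b
    return out

def v_interleave(a, b):
    """Same idea for rows: (h, w) + (h, w) -> (2h, w)."""
    return h_interleave(a.T, b.T).T
```

Applying `h_interleave` to the input and NEDI-I's output gives the (2w, h) image, and `v_interleave` on that image and NEDI-II's output gives the final (2w, 2h) result.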
Shiandow is offline   Reply With Quote