Old 27th June 2015, 06:19   #1  |  Link
MysteryX
Soul Architect
 
Shaders in AviSynth

Here's a feature request. It would be great to have a generic bridge between shaders and AviSynth.

This is possible and has been done before with AviShader, but its source code is apparently lost. Nonetheless, it can be done.
http://forum.doom9.org/showthread.php?t=86793

One shader I would especially love to use is SuperRes, written by Shiandow. I spoke with him; he'll keep working on the SuperRes shader, but an AviSynth version is not a priority for him.
https://github.com/zachsaw/MPDN_Exte...ow.SuperRes.cs

There are two options: rewriting the shader for AviSynth, or writing a generic bridge for shaders. I think the latter option would be best, for two reasons.

1. As the SuperRes shader evolves, those changes would be reflected in AviSynth without having to be ported again.

2. It would allow the use of other shaders.

What other shaders would be useful?

Is anyone interested in taking this project?


Edit: This is the latest response I got from Shiandow
Quote:
I have no way of knowing if there still is a version of the source floating around somewhere but the author hasn't been seen in several years and all links are dead so I'm not too optimistic.

If someone wants to adapt SuperRes to Avisynth it should be reasonably clear what they need to do. The algorithm itself isn't that complicated, in fact it's simplified somewhat in the last version.

Old 6th July 2015, 20:15   #2  |  Link
MysteryX
Soul Architect
I'm looking into this. Shaders can be run with DirectCompute, which has the advantage of running on the GPU (while very few AviSynth filters use the GPU).
https://code.msdn.microsoft.com/wind...Win32-7d5a7408

The only thing that would have to be done is to translate AviSynth frame data into a DirectCompute texture buffer, and then translate it back.

The downside, however, is that it requires the DirectX 11 API; it can run on DirectX 10-class hardware, but it will fail on DirectX 9 hardware. I'm not sure there's a way to run the filter with DX9 or on the CPU as a fallback plan.
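As a rough sketch (mine, untested), the feature level could be probed up front with D3D11CreateDevice, so the filter can bail out cleanly on DX9-class hardware:

#include <d3d11.h>
#pragma comment(lib, "d3d11.lib")

// Returns true if compute shaders are available: DX11 hardware, or DX10-class
// hardware exposing the optional cs_4_x support. Returns false on DX9-class hardware.
bool SupportsDirectCompute()
{
    const D3D_FEATURE_LEVEL levels[] = { D3D_FEATURE_LEVEL_11_0, D3D_FEATURE_LEVEL_10_0 };
    ID3D11Device* device = NULL;
    D3D_FEATURE_LEVEL obtained;
    if (FAILED(D3D11CreateDevice(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, 0, levels, 2,
                                 D3D11_SDK_VERSION, &device, &obtained, NULL)))
        return false;

    bool ok = (obtained >= D3D_FEATURE_LEVEL_11_0);
    if (!ok)
    {
        // On 10.x hardware, compute shader support is optional; query the cap bit.
        D3D11_FEATURE_DATA_D3D10_X_HARDWARE_OPTIONS opts = {};
        device->CheckFeatureSupport(D3D11_FEATURE_D3D10_X_HARDWARE_OPTIONS, &opts, sizeof(opts));
        ok = (opts.ComputeShaders_Plus_RawAndStructuredBuffers_Via_Shader_4_x != FALSE);
    }
    device->Release();
    return ok;
}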

If that were done, it would allow running other shaders like NNEDI3 on the GPU, probably with MUCH better performance!
Old 7th July 2015, 17:03   #3  |  Link
MysteryX
Soul Architect
I've done a bunch of research and testing. At the pace things are going (knowing nothing of C++, AviSynth development or DirectX), I'm better off leaving this aside. If someone wants to pick it up, here's what I found.

This code does an approximation of SuperRes. It's missing a bunch of color conversions, the core SuperRes function and a custom Diff that copies the source's Y into the diff's A.

SuperRes(5, 1, .25, true, """nnedi3_rpow2(2, cshift="spline16resize")""")

function SuperRes(clip input, int "passes", float "strength", float "softness", bool "hqdownscaling", string "upscalecommand")
{
passes = default(passes, 3)
strength = default(strength, 1)
softness = default(softness, 0)
hqdownscaling = default(hqdownscaling, true)

Assert(passes > 0 && passes <= 5, chr(10) + "Passes must be between 1 and 5" + chr(10))
Assert(strength >= 0 && strength <= 1, chr(10) + "Strength must be between 0 and 1" + chr(10))
Assert(softness >= 0 && softness <= 1, chr(10) + "Softness must be between 0 and 1" + chr(10))
Assert(Defined(upscalecommand), chr(10) + "You must specify upscalecommand" + chr(10))

original = input
Eval("input = input." + upscalecommand)
input = input.SuperResPass(original, strength, softness, hqdownscaling)
input = passes > 1 ? input.SuperResPass(original, strength, softness, hqdownscaling) : input
input = passes > 2 ? input.SuperResPass(original, strength, softness, hqdownscaling) : input
input = passes > 3 ? input.SuperResPass(original, strength, softness, hqdownscaling) : input
input = passes > 4 ? input.SuperResPass(original, strength, softness, hqdownscaling) : input

return input
}

function SuperResPass(clip c, clip original, float strength, float softness, bool hqdownscaling)
{
downsample = hqdownscaling ? c.BicubicResize(original.Width, original.Height) : c.BilinearResize(original.Width, original.Height)
diff = mt_makediff(downsample, original, chroma="process")
# The real SuperResCore (HLSL) is still missing; the two lines below merely re-apply the raw diff.
# return SuperResCore(c, diff, strength, softness)
diff = diff.BicubicResize(c.width, c.height)
return mt_makediff(c, diff, chroma="process")
}

The challenge is that SuperRes works with RGB and Lab color spaces.

Basically, there are two possible approaches to this.

The first would be to run HLSL scripts with DirectX 11's DirectCompute. It might also be possible to run HLSL in DirectX 9, but I don't know how. The SuperRes shader uses a 2D texture that stores the video frame data as 4 floats per pixel. All that would need to be done is convert the frame data from a PClip to a 2D texture and then back. This would have the great advantage of running on the GPU, and the disadvantage of failing if the GPU doesn't support it.
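To illustrate the texture side of that round trip, here is a sketch (mine, not tested) that creates a float RGBA texture from a packed system-memory buffer; it assumes the frame has already been converted to 4 floats per pixel:

#include <d3d11.h>

// Create a GPU texture with one float4 per pixel and fill it from a packed
// system-memory buffer (width * height * 4 floats, no row padding).
ID3D11Texture2D* CreateFrameTexture(ID3D11Device* device, const float* rgba, int width, int height)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = width;
    desc.Height = height;
    desc.MipLevels = 1;
    desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;   // 16 bytes per pixel
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;

    D3D11_SUBRESOURCE_DATA init = {};
    init.pSysMem = rgba;
    init.SysMemPitch = width * 4 * sizeof(float);   // row stride of the source buffer

    ID3D11Texture2D* texture = NULL;
    if (FAILED(device->CreateTexture2D(&desc, &init, &texture)))
        return NULL;
    return texture;
}

Reading the result back would go the other way: copy the output into a D3D11_USAGE_STAGING texture, Map() it, and copy row by row using the mapped RowPitch.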

The second option would be to emulate the float2, float3 and float4 structures and translate the HLSL code to run on the CPU. Either way, the frame data needs to be converted from YV12 to float4 Lab and then back.
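Emulating the HLSL vector types is straightforward; a minimal sketch (swizzles and intrinsics like dot() or saturate() would be added as the ported code needs them):

// Minimal stand-ins for the HLSL vector types, enough to start porting simple shader math.
struct float2 { float x, y; };
struct float3 { float x, y, z; };
struct float4
{
    float x, y, z, w;
    float4 operator+(const float4& o) const { return float4{ x + o.x, y + o.y, z + o.z, w + o.w }; }
    float4 operator-(const float4& o) const { return float4{ x - o.x, y - o.y, z - o.z, w - o.w }; }
    float4 operator*(float s) const { return float4{ x * s, y * s, z * s, w * s }; }
};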

The very first task would be to replace the diff in the script above with this diff. It takes clip1 as RGB32 and clip2 as Lab. It converts clip1 to Lab, performs the diff and copies clip1's original Y into the diff's A.
https://github.com/zachsaw/MPDN_Exte...rRes/Diff.hlsl

The shader execution option would be very interesting for a few reasons.

1. Faster code execution on the GPU
2. It runs on the GPU while the rest of AviSynth runs on the CPU, making use of both simultaneously.
3. It gives access to many shaders unavailable in AviSynth, such as Super-xBR
4. All development efforts that are put into shaders would also benefit AviSynth users
5. Easy to maintain. When shaders are updated, the script can just be copied and used in AviSynth.


Btw, I have to say that the SuperRes approach is smart and relatively simple. After an upscale, it resizes the result back down with Bicubic and compares it with the original frame, in the highest quality possible, using the Lab color space to produce a diff. The diff is a map of upscaling defects: data missing from the upscaled frame. SuperRes can then re-insert that missing data into the upscaled frame to increase its accuracy. Genius!

Old 8th July 2015, 01:26   #4  |  Link
MysteryX
Soul Architect
I think shader support should be part of AviSynth's core. It would bring GPU support, and it should be done "right" by a master AviSynth developer. It shouldn't be difficult, but if it is to be done, and if it is to become a core component, it should be done right.

That's how madVR can work on the GPU: it's all coded in HLSL. It runs on DX9 so it definitely can be done.

I did another Google search, and this time some sample code came up that I couldn't find before:
Simple Vertex & Pixel Shader (HLSL)

The code to call the shader is there. All it would take is converting the data back and forth.
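For reference, compiling and creating a pixel shader in DX9 is only a few calls; a sketch (mine, untested; the "main" entry point and ps_3_0 profile are assumptions):

#include <d3dx9.h>   // D3DXCompileShaderFromFile (D3DX9 SDK)

// Compile an .hlsl file and create a DX9 pixel shader (error handling kept minimal).
IDirect3DPixelShader9* LoadPixelShader(IDirect3DDevice9* device, const char* file)
{
    LPD3DXBUFFER code = NULL;
    LPD3DXBUFFER errors = NULL;
    if (FAILED(D3DXCompileShaderFromFileA(file, NULL, NULL, "main", "ps_3_0",
                                          0, &code, &errors, NULL)))
        return NULL;   // errors, if present, holds the compiler output

    IDirect3DPixelShader9* shader = NULL;
    device->CreatePixelShader((const DWORD*)code->GetBufferPointer(), &shader);
    code->Release();
    return shader;
}

The shader would then be bound with SetPixelShader() and run by drawing a full-screen quad into a render target.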

As for the SuperRes code, one way is to use the function written above together with a generic function that calls shaders. Another option would be to rewrite that AVS code in C++. Shaders will generally need to work with RGBA data at float precision. Can 16-byte-per-pixel data be stored in AviSynth clip objects? If yes, the result of one shader can be passed to the next shader. If not, the whole chain needs to be written in C++, storing the float data in an internal buffer.
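I don't believe the classic colorspaces can hold 16 bytes per pixel directly, so one possible workaround (purely an idea, untested; the helper below is hypothetical) is to allocate an RGB32 frame four times as wide and reinterpret its rows as float4 pixels:

#include "avisynth.h"

// Hypothetical helper: allocate an RGB32 frame four times as wide, so that every
// group of four RGB32 pixels (16 bytes) holds one float4, and the data can travel
// between filters as an ordinary clip.
PVideoFrame NewFloatFrame(IScriptEnvironment* env, VideoInfo vi, int width, int height)
{
    vi.pixel_type = VideoInfo::CS_BGR32;   // 4 bytes per RGB32 pixel
    vi.width = width * 4;                  // 4 * 4 bytes = 16 bytes = one float4 per real pixel
    vi.height = height;
    return env->NewVideoFrame(vi);
}

// Writing into it: reinterpret each row as float.
//   float* row = reinterpret_cast<float*>(frame->GetWritePtr() + y * frame->GetPitch());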

Old 8th July 2015, 04:12   #5  |  Link
MysteryX
Soul Architect
Here's how to create a shader texture in DX9.

g_pD3D = Direct3DCreate9( D3D_SDK_VERSION );

D3DPRESENT_PARAMETERS d3dpp;
ZeroMemory( &d3dpp, sizeof(d3dpp) );
d3dpp.Windowed = TRUE;                      // required for CreateDevice to succeed
d3dpp.SwapEffect = D3DSWAPEFFECT_DISCARD;
// Set any other D3DPRESENT_PARAMETERS options here

// NULLREF is enough to create the texture; actually running shaders would need D3DDEVTYPE_HAL.
// A focus window must still be supplied even though nothing is rendered on screen.
g_pD3D->CreateDevice( D3DADAPTER_DEFAULT, D3DDEVTYPE_NULLREF, GetDesktopWindow(),
                      D3DCREATE_SOFTWARE_VERTEXPROCESSING, &d3dpp, &g_pd3dDevice );

// One float4 (16 bytes) per pixel, in system memory so it can be locked and filled
g_pd3dDevice->CreateTexture( FrameWidth, FrameHeight, 1, 0, D3DFMT_A32B32G32R32F,
                             D3DPOOL_SYSTEMMEM, &MyBuffer, NULL );

That's it! Before calling this, you create a memory buffer of FrameWidth * FrameHeight * 16 bytes (one float4 per pixel).

The main difference between this texture and AviSynth's frame buffer is the pitch: each reports its own row pitch, so rows should be copied one at a time rather than in a single block.
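A sketch of what that row-by-row copy might look like (mine, untested):

#include <d3d9.h>
#include <cstring>

// Copy one AviSynth plane (or packed frame) into a locked DX9 texture,
// honoring both the source pitch and the texture's own pitch.
void CopyToTexture(const BYTE* src, int srcPitch, int rowSize, int height, IDirect3DTexture9* texture)
{
    D3DLOCKED_RECT rect;
    if (FAILED(texture->LockRect(0, &rect, NULL, 0)))
        return;
    BYTE* dst = (BYTE*)rect.pBits;
    for (int y = 0; y < height; y++)
        memcpy(dst + y * rect.Pitch, src + y * srcPitch, rowSize);
    texture->UnlockRect(0);
}

Copying back out of the texture is the mirror image, with source and destination swapped.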

I don't want to spend too much time into this as I know I won't be able to code the whole thing, but I keep getting ideas and will keep exploring.

For RGB to Lab conversion, it can very well be done within a shader itself.

What needs to be implemented is
- YV12 (12 bits per pixel) to float RGBA (16 bytes per pixel) converter
- float RGBA to YV12 converter
- copy frame data from AviSynth to texture (copy each row and discard pitch)
- copy frame data from texture to AviSynth (copy each row and add pitch)
- create and configure shader
- run shader on the texture

The performance bottleneck will be the YV12 to float RGBA conversion. That conversion would ideally be optimized in assembly. Copying the float data back and forth, on the other hand, will be very quick, since complete rows are copied at once without any per-pixel processing.
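For reference, here is roughly what the plain, unoptimized version of that conversion could look like before any assembly work (my sketch; it assumes 8-bit limited-range Rec.709 input and point-sampled chroma):

#include <cstdint>

// Unoptimized reference: convert a YV12 frame to packed float RGBA (16 bytes per pixel).
// Assumes 8-bit limited-range Rec.709 input; chroma is point-sampled (no interpolation).
void Yv12ToFloatRgba(const uint8_t* srcY, int pitchY,
                     const uint8_t* srcU, const uint8_t* srcV, int pitchUV,
                     float* dst, int width, int height)
{
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            float Y  = (srcY[y * pitchY + x]              -  16.0f) / 219.0f;
            float Cb = (srcU[(y / 2) * pitchUV + (x / 2)] - 128.0f) / 224.0f;
            float Cr = (srcV[(y / 2) * pitchUV + (x / 2)] - 128.0f) / 224.0f;

            float* p = dst + (y * width + x) * 4;
            p[0] = Y + 1.5748f * Cr;                    // R
            p[1] = Y - 0.1873f * Cb - 0.4681f * Cr;     // G
            p[2] = Y + 1.8556f * Cb;                    // B
            p[3] = 1.0f;                                // A, unused for now
        }
    }
}

The reverse conversion inverts the same matrix and clamps the result back to 0-255.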

That's pretty much all it takes!

And with these 6 functions, you get GPU acceleration of your scripts! I wonder how the NNEDI3 GPU-accelerated shader would compare with the AviSynth version.

I think the research and design part is done. All it takes now is someone who has time (and skills) to translate it into code.

With this, the SuperRes filter could be run from the AVS script I wrote above by running 4 shaders: RgbToLab, LabToRgb, SuperResDiff and SuperResCore (they're already written, the HLSL files would be copy/pasted)

Old 8th July 2015, 04:32   #6  |  Link
MysteryX
Soul Architect
Here's a collection of filters that could be added to your AviSynth scripts
https://github.com/zachsaw/MPDN_Exte.../RenderScripts

My computer background is in .NET and architecture design (and VB6 and QBasic). I will *not* code this. My poor ability to focus on low-level details in an unfamiliar language would result in a poor filter. Someone else will do a MUCH better job at this than me.

Someone with the right skills could do this in 3 days. It would take me 3 weeks (having to google "c++ how to copy memory", "c++ what is **", etc.)

I'll check back on Friday to see if it's done

Old 8th July 2015, 06:27   #7  |  Link
MysteryX
Soul Architect
Just had another thought. The reason I couldn't find any simple code to convert YV12 to RGB is that a color matrix must be specified to do the conversion; I suppose Rec.709 should be the default. It's not much different from doing an integer conversion (as in ConvertToRGB), except that there is no rounding at the end. From what I saw in that code, ConvertToRGB uses Rec.601 rather than Rec.709 by default. You can specify the color matrix, and it runs the conversion in MMX assembly, failing if MMX isn't supported. That optimized assembly won't be useful here because we need float output instead of int, and the standard 601 code won't be useful either. But a quick search on Rec.709 gives the exact formula to use.
https://en.wikipedia.org/wiki/Rec._709
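The coefficients fall straight out of the Rec.709 luma weights; a short sketch just to make the formula explicit (standard values, nothing project-specific):

// Rec.709 luma weights (from the spec / the Wikipedia page above).
const float Kr = 0.2126f;
const float Kb = 0.0722f;
const float Kg = 1.0f - Kr - Kb;          // 0.7152

// YCbCr -> RGB coefficients derived from them:
const float CrToR = 2.0f * (1.0f - Kr);   //  1.5748
const float CbToB = 2.0f * (1.0f - Kb);   //  1.8556
const float CbToG = -CbToB * Kb / Kg;     // -0.1873
const float CrToG = -CrToR * Kr / Kg;     // -0.4681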
Old 9th July 2015, 18:24   #8  |  Link
MysteryX
Soul Architect
Heck, I might just do it. I keep getting the message to "just get it done quick".

The right way to optimize the float YUV to RGB conversion would be to do it in an HLSL shader: send the YUV data to a shader that does the conversion.
Old 9th July 2015, 18:42   #9  |  Link
MysteryX
Soul Architect
Alright. It can be done easily and will be done.

I created a GitHub project with what I have so far and anyone can look into it

https://github.com/mysteryx93/AviSynthShader
Old 9th July 2015, 19:55   #10  |  Link
TheFluff
Excessively jovial fellow
I don't know if you noticed, but you're only talking to yourself, here.
Old 9th July 2015, 20:14   #11  |  Link
MysteryX
Soul Architect
If someone is interested in the topic, all the information is here. Plus I'm often going back to the information I compiled here.

If nobody else is interested in running shaders in AviSynth, then perhaps the benefits are not being understood.

Benefits:

1. Having access to various HLSL filters that aren't currently available in AviSynth

2. Using GPU acceleration

3. Having access to the SuperRes shader, which considerably improves upscale quality. Even bilinear upscaling looks decent with SuperRes. Anyone using NNEDI3 would want this, as it makes the output even more precise.

Old 9th July 2015, 21:06   #12  |  Link
StainlessS
HeartlessS Usurer
Quote:
2. Using GPU acceleration
Not sure if you will get much acceleration by going through the float conversions.
Old 9th July 2015, 22:24   #13  |  Link
MysteryX
Soul Architect
Quote:
Originally Posted by StainlessS View Post
Not sure if you will get much acceleration by going through the float conversions.
That would have to be tested. The GPU is *much* more efficient at doing float operations. In fact, the CPU is kind of useless compared to the GPU when doing such repetitive work. The CPU won't need to process float data if we delegate the conversion to the GPU.

Then it only needs to be converted once per frame to bring the data in and once to bring the data out. Compared to highly intensive operations such as SuperRes or NNEDI3, that would probably be a very small fraction of the work, even if doing float conversion on the CPU.

Once I get the basic conversion working, I'll have a better idea as to the cost on performance and quality, by looping through the conversion 5 or 10 times in a row.
Old 10th July 2015, 01:25   #14  |  Link
MysteryX
Soul Architect
Performance-wise, I'm getting 200 fps with AVSMeter for a basic conversion back and forth with very unoptimized code. Commenting out the actual float conversion has little impact on performance, which probably indicates that most of the cost is in the unoptimized loops themselves. That really isn't bad and could easily be improved.

In the upscaling script I'd use it for, which runs at 4-6fps, I would convert back and forth twice (NNEDI3 -> SuperRes shader -> NNEDI3 -> SuperRes shader). However, if NNEDI3 shader works and provides better performance (the HLSL version is designed for live playback), I could run all 4 filters in a row with a single conversion.

In other words: conversion isn't a problem on performance.

Furthermore, texture shaders have the option of working with half-float data, where each component is 2 bytes instead of 4. Considering the source is 1-byte data, working with 2 bytes would probably be enough and provide a major performance gain. After the 4-byte float version is working, an option can easily be added to use half-float instead, and from there tests can be done to compare quality and performance.
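If that option is added, the packing itself is cheap; a sketch using DirectXMath's helpers (assuming that dependency is acceptable), producing data for a D3DFMT_A16B16G16R16F texture:

#include <DirectXPackedVector.h>
using namespace DirectX::PackedVector;

// Pack 32-bit floats into 16-bit halves, e.g. to feed a D3DFMT_A16B16G16R16F texture.
void PackToHalf(const float* src, HALF* dst, size_t count)
{
    // One call converts the whole buffer; the stride arguments are in bytes.
    XMConvertFloatToHalfStream(dst, sizeof(HALF), src, sizeof(float), count);
}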
Old 10th July 2015, 13:46   #15  |  Link
TheFluff
Excessively jovial fellow
There is a reason there is nobody doing shader programming in Avisynth today. There are a few (not very many) GPU-accelerated filters - the improvement just isn't big enough to justify it in most cases - but none that I can think of is implemented using shaders. Can you figure out why that is?

Quote:
Originally Posted by MysteryX View Post
If that were done, it would allow running other shaders like NNEDI3 on the GPU, probably with MUCH better performance!
Are you aware of the fact that a GPU-accelerated version of NNEDI3 already exists? It doesn't use shaders, though (there's a hint for you here).

Old 10th July 2015, 17:45   #16  |  Link
MysteryX
Soul Architect
Quote:
Originally Posted by TheFluff View Post
There is a reason there is nobody doing shader programming in Avisynth today. There are a few (not very many) GPU-accelerated filters - the improvement just isn't big enough to justify it in most cases - but none that I can think of is implemented using shaders. Can you figure out why that is?


Are you aware of the fact that a GPU-accelerated version of NNEDI3 already exists? It doesn't use shaders, though (there's a hint for you here).
Of course the AviSynth version doesn't use shaders; AviSynth doesn't support that! But there *is* a shaders version of NNEDI3.

The GPU-enabled NNEDI3 that currently exists doesn't perform any better than the non-GPU version. Meanwhile, the HLSL version renders video live during playback. I'd be curious to compare the two; honestly, I don't know what the difference will be.

But the main reason I'm doing this is to have access to filters that aren't available in AviSynth.
Old 11th July 2015, 15:01   #17  |  Link
TheFluff
Excessively jovial fellow
Quote:
Originally Posted by MysteryX View Post
Of course the AviSynth version doesn't use shaders; AviSynth doesn't support that!
wrong answer

what do you even mean by "doesn't support that"? Avisynth plugins are C++, you can write arbitrary code in them, there's absolutely nothing stopping you from doing whatever you want

Quote:
Originally Posted by MysteryX View Post
the HLSL version renders live videos
You mean the madVR implementation? Because I'm pretty sure that one isn't written in HLSL and does not, in fact, use shaders.

Old 11th July 2015, 17:31   #18  |  Link
MysteryX
Soul Architect
Here's the NNEDI3 HLSL code
https://github.com/zachsaw/MPDN_Exte...Scripts/NNEDI3

I asked Madshi. He indeed did program madVR mostly with HLSL, and that's how he's using GPU acceleration.

I'm curious about something with the NNEDI3 OpenCL implementation. It has been reported to perform no better than the standard version. Shouldn't that depend purely on the graphics card, or is the CPU still the bottleneck? Perhaps I'd be better off asking that in the NNEDI3 OpenCL thread.
Old 11th July 2015, 20:33   #19  |  Link
TheFluff
Excessively jovial fellow
Quote:
Originally Posted by MysteryX View Post
That's interesting because the last I heard of it (last fall) was that the shader implementation was about a factor 1000 slower than the OpenCL one. Which sounds completely reasonable to me.

I'll give you the answer to my question, though. See, the reason nobody writes Avisynth filters in HLSL is that for the vast majority of work you want to do in Avisynth, shaders are simply the wrong tool for the job. They're tailored to a very specific set of 3D rendering jobs and they're simply not very useful for the majority of image processing Avisynth programmers want to do. Even the much more general-purpose OpenCL stuff is kind of a niche thing - there are only a few filters that are well suited to running on a GPU.
Old 11th July 2015, 23:14   #20  |  Link
MysteryX
Soul Architect
Shiandow doesn't seem to agree with you, and it's his shader I'm interested in.

Madshi doesn't seem to agree either, but he wants to keep his code private.