Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

Old 2nd October 2015, 21:50   #1  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 1,980
AviSynthShader + SuperRes

Shiandow wrote a scaling algorithm called SuperRes that greatly enhances upscaling. Unfortunately, it wasn't possible to use this code in AviSynth and I really wanted to use it. So, I wrote an AviSynth plugin that allows running any HLSL pixel shaders through DirectX9.

Download the latest release here (v1.6.4, September 20th 2017)
https://github.com/mysteryx93/AviSynthShader/releases
Source code available on GitHub
https://github.com/mysteryx93/AviSynthShader

This plugin allows running HLSL pixel shaders within AviSynth. This gives access to various HLSL filters that haven't been programmed in AviSynth.

Note: Shiandow's SuperRes is not what is typically called SuperRes; it's something else. It does not "create" any details the way traditional SuperRes or sharpening algorithms do. It doesn't have any temporal effect either; it works on frames one by one. Here's how it works. It wraps around other resizers. After doubling the image size (with NNEDI3 for example), it resizes the result back down with Bicubic and compares it with the original, producing a diff map representing details that were lost while upscaling. Then it does its magic from that diff map. How? Well... with this code. Results speak for themselves. I found it to work best with NNEDI3(nns=4).


Syntax information on GitHub


Special thanks to Shiandow for writing such amazing code, and especially for making it open source!

Special thanks to Madshi for taking the time to give very valuable pointers when nobody else was able to help!


Here are comparison images
ImageSource("Lighthouse.png").ConvertToRGB24()

Note: Since uploading these comparisons, SuperRes has slightly changed. The newer version has less ringing and produces softer images.

1. Original
2. Spline16
3. nnedi3_rpow2(4, nns=4, cshift="Spline16Resize")
4. SuperXBR(edgeStrength=.6, weight=.6), twice
5. SuperRes(2, .43, 0, """edi_rpow2(2, nns=4, cshift="Spline16Resize")"""), twice
6. SuperResXBR(2, .6, xbrEdgeStrength=2.3, xbrSharpness=1.2), twice
7. SuperResXBR(1, .7, xbrEdgeStrength=.1, xbrSharpness=.7), twice

Lighthouse


Clown


Eclipse (x2)

Last edited by MysteryX; 20th September 2017 at 17:59.
Old 2nd October 2015, 21:52   #2  |  Link
MysteryX
SuperRes is one of many shaders that can be run with AviSynthShader. This particular shader greatly enhances upscaling quality and runs on top of any other algorithm you're already using, such as NNEDI3.

SuperRes(input, passes, strength, softness, hqdownscaling, upscalecommand)

In Shaders\SuperRes\SuperRes.avsi. Thanks to Shiandow for writing this great code!

Arguments:

passes: How many SuperRes passes to run. Default=1.

strength: How aggressively we want to run SuperRes, between 0 and 1. Default=1.

softness: How much smoothness we want to add, between 0 and 1. Default=0.

hqdownscaling: True to downscale using Bicubic, false to downscale using Bilinear.

upscalecommand: An upscaling command that must contain offset-correction. Ex: """nnedi3_rpow2(2, cshift="Spline16Resize")"""

Shiandow provides many other HLSL shaders available here that can be integrated into AviSynth.

https://github.com/zachsaw/MPDN_Exte.../RenderScripts

Here's a comparison of NNEDI3 with and without SuperRes.

Original


NNEDI3


NNEDI3 + SuperRes (passes=1, strength=1, softness=0)

Last edited by MysteryX; 2nd October 2015 at 22:08.
Old 2nd October 2015, 22:06   #3  |  Link
MysteryX
As of right now, ConvertFromFloat doesn't work when the target is RGB32. Fixing this would avoid an unnecessary RGB-YUV conversion within SuperRes and may improve quality and performance.

The conversion code is here, could someone familiar with RGB32 help me on this one?

https://github.com/mysteryx93/AviSyn...mFloat.cpp#L54

Then, we could also avoid an unnecessary conversion by doing the SuperRes downscaling via a Shader; I don't have any such code right now, and I don't know if the DLL code will have to be tweaked to allow HLSL resizing.

Another limitation is that although the library contains an HLSL YUV-RGB conversion, and converting back and forth works fine, I get weird distortion after running any other shaders. When using CPU conversion, it works fine. However, the CPU conversion code is Rec601 instead of Rec709. Since it gets converted back using the same algorithm, it doesn't distort the output, but it may cause a slight distortion in the shader processing.
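To illustrate the Rec601 vs Rec709 point, here is a minimal Python sketch. It is not the plugin's code: the function names are made up for this example, and it assumes full-range values normalized to Y in [0, 1] and U/V in [-0.5, 0.5]. Only the luma coefficients (Kr, Kb) come from the two standards. It shows that converting with one matrix and back with the same matrix returns the original pixel, while the intermediate RGB values that a shader would see differ between the two matrices.

```python
# Luma coefficients (Kr, Kb) defined by the two standards.
REC601 = (0.299, 0.114)
REC709 = (0.2126, 0.0722)


def yuv_to_rgb(y, u, v, coeffs):
    """Convert normalized YUV to RGB using the given (Kr, Kb) pair."""
    kr, kb = coeffs
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * v
    b = y + 2.0 * (1.0 - kb) * u
    g = (y - kr * r - kb * b) / kg
    return r, g, b


def rgb_to_yuv(r, g, b, coeffs):
    """Inverse of yuv_to_rgb for the same (Kr, Kb) pair."""
    kr, kb = coeffs
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    u = (b - y) / (2.0 * (1.0 - kb))
    v = (r - y) / (2.0 * (1.0 - kr))
    return y, u, v


pixel = (0.5, 0.1, 0.2)                 # an arbitrary test pixel (Y, U, V)
rgb601 = yuv_to_rgb(*pixel, REC601)     # what the shader sees today
rgb709 = yuv_to_rgb(*pixel, REC709)     # what it would see with Rec709
back = rgb_to_yuv(*rgb601, REC601)      # round trip with the same matrix
```

The round trip is lossless up to floating-point error, which matches the observation that the output isn't distorted, but `rgb601` and `rgb709` are different colors, so a shader operating on them works on slightly different data.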

Last edited by MysteryX; 2nd October 2015 at 22:48.
Old 3rd October 2015, 01:34   #4  |  Link
foxyshadis
ангел смерти
 
Join Date: Nov 2004
Location: Lost
Posts: 9,175
You're officially my hero.

Edit: You're not incrementing dst properly. Since it's char, you're writing to it like:

00000000
ABC00000
ADEF0000
ADGHI000
etc.

A simple fix is swapping &dst[x],&dst[x+1],&dst[x+2] with &dst[(x*4)+2],&dst[(x*4)+1],&dst[x*4]. Note that I swapped the order because RGB32 is actually BGRA in memory.

The way the function's designed now isn't the greatest, but at least it'll work, that's most important.
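The overlapping writes and the x*4 fix can be reproduced in a small Python sketch. This only mimics the indexing on a byte buffer; the function names are invented for illustration, and writing 0 to the alpha byte is an assumption, not something taken from the plugin source.

```python
def pack_row_buggy(row_rgb):
    """Overlapping pattern: pixel x starts at byte x instead of byte x*4."""
    dst = bytearray(len(row_rgb) * 4)
    for x, (r, g, b) in enumerate(row_rgb):
        dst[x], dst[x + 1], dst[x + 2] = r, g, b
    return dst


def pack_row_bgra(row_rgb):
    """Fixed pattern: 4 bytes per pixel, stored as B, G, R, A."""
    dst = bytearray(len(row_rgb) * 4)   # zero-initialized buffer
    for x, (r, g, b) in enumerate(row_rgb):
        dst[x * 4 + 0] = b
        dst[x * 4 + 1] = g
        dst[x * 4 + 2] = r
        dst[x * 4 + 3] = 0              # alpha byte (assumed to be 0 here)
    return dst


row = [(1, 2, 3), (4, 5, 6)]            # two pixels as (r, g, b)
buggy = pack_row_buggy(row)
fixed = pack_row_bgra(row)
```

With the buggy version, the second pixel overwrites two bytes of the first one and the tail of the buffer stays empty, which is the staircase pattern shown in the diagram.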
__________________
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. ~ Ed Howdershelt

Last edited by foxyshadis; 3rd October 2015 at 01:48.
Old 3rd October 2015, 03:22   #5  |  Link
MysteryX
Thanks. Now it almost looks OK, but the image is still corrupted. The effect is similar to what I get when I try to use the HLSL YUV-RGB conversion.



I also added a "folder" parameter to SuperRes to specify where to find the .cso files.
Old 3rd October 2015, 05:03   #6  |  Link
foxyshadis
Is dst memset to 0? There could be garbage in the alpha byte. Actually, it's only happening on white, so it's probably a missing clamp. You should move the clamping outside of the if yuv/rgb block, along with the memset (or just manually set alpha to 0 for each pixel).
Old 3rd October 2015, 19:10   #7  |  Link
MysteryX
Great, it now supports converting to/from RGB32. SuperRes's downscaling is now done in RGB32, avoiding an unnecessary RGB-YUV conversion. This increases quality considerably further.



2 areas that still need improvement:

1. Find a Bicubic HLSL shader to avoid unnecessary conversion for downscaling.

2. Convert YUV-RGB in Rec709 colorspace. Because it is currently converting in the Rec601 colorspace, the sharpening effect may slightly pull towards certain colors.

Contributions or suggestions are welcome on those.

And then, of course, there are all kinds of performance optimizations that could be implemented.

Here are some performance numbers. CPU-Z isn't reading my GPU usage properly so I can't measure what's happening on the GPU.

nnedi3_rpow2(2, cshift="Spline16Resize")
60fps @ 55% CPU
SuperRes(1, 1, 0, true, """nnedi3_rpow2(2, cshift="Spline16Resize")""")
4.5fps @ 12% CPU

MT=8
nnedi3_rpow2(2, cshift="Spline16Resize", Threads=2)
80fps @ 83% CPU
SuperRes(1, 1, 0, true, """nnedi3_rpow2(2, cshift="Spline16Resize", Threads=2)""")
15fps @ 85% CPU

Last edited by MysteryX; 3rd October 2015 at 19:32.
Old 3rd October 2015, 19:29   #8  |  Link
Khanattila
Registered User
 
Join Date: Nov 2014
Posts: 416
I have not yet had the opportunity to look at how Shiandow's SuperRes works. Maybe it is Farsiu or Mitzel. I really have no idea.
I could make it a plugin like KNLMeansCL, if I ever have time.
__________________
https://github.com/Khanattila
Old 4th October 2015, 01:28   #9  |  Link
vivan
/人 ◕ ‿‿ ◕ 人\
 
Join Date: May 2011
Location: Russia
Posts: 648
Quote:
Originally Posted by Khanattila View Post
I have not yet had the opportunity to look at how Shiandow's SuperRes works. Maybe it is Farsiu or Mitzel. I really have no idea.
One thing that should be noted is that Shiandow's SuperRes is not a Super Resolution algorithm. I'd call it a "reverse downscaler": it modifies the upscaled image so that, when downscaled again, it is closer to the original image.
Old 4th October 2015, 15:37   #10  |  Link
Shiandow
Registered User
 
Join Date: Dec 2013
Posts: 727
Quote:
Originally Posted by vivan View Post
One thing that should be noted is that Shiandow's SuperRes is not a Super Resolution algorithm. I'd call it a "reverse downscaler": it modifies the upscaled image so that, when downscaled again, it is closer to the original image.
It's called "single frame super resolution" in the literature though. The current implementation is most closely related to the method described in this paper. The algorithms described by Farsiu and Mitzel are for multi frame super resolution, which uses similar techniques but with a different goal.
Old 3rd October 2015, 19:41   #11  |  Link
MysteryX
SuperRes doesn't work like typical resize algorithms. It wraps around other resizers. Let's say you want to use NNEDI3: it takes the enlarged image and the original image, does a Bicubic resize on the enlarged image to bring it back down to the original size, and then creates a difference map between the two, showing the details that were lost during the upsizing. From that diff map, it restores details and edges that were lost. Brilliant idea. It makes even basic resizers like Bilinear look decent.

Originally he was working in the Lab colorspace, which definitely requires half-float data; lately he dropped that in favor of the linear RGB colorspace. Perhaps I could try running with 8-bit-per-channel to see if there is a significant quality penalty.
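The wrap-around idea can be sketched on a 1D signal in Python. This is only an illustration of the diff-map loop, not Shiandow's shader: the linear-interpolation upscaler stands in for NNEDI3, a box filter stands in for Bicubic, the correction is spread back with nearest-neighbor, and all function names are made up for this example. The real algorithm does considerably more with the diff map (softness, for instance, is not modeled here).

```python
def upscale_linear_2x(signal):
    """Stand-in for the wrapped upscaler (NNEDI3 in the thread)."""
    out = []
    for i, v in enumerate(signal):
        nxt = signal[min(i + 1, len(signal) - 1)]   # clamp at the edge
        out.extend([v, (v + nxt) / 2.0])
    return out


def downscale_box_2x(signal):
    """Stand-in for the Bicubic downscale: average each pair of samples."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]


def residual(original, upscaled):
    """Largest difference between the original and the re-downscaled image."""
    return max(abs(o - d) for o, d in zip(original, downscale_box_2x(upscaled)))


def refine(original, upscaled, passes=1, strength=1.0):
    """Push the upscaled signal so that downscaling it matches the original."""
    current = list(upscaled)
    for _ in range(passes):
        down = downscale_box_2x(current)
        diff = [o - d for o, d in zip(original, down)]      # the diff map
        for i, d in enumerate(diff):                        # spread it back
            current[2 * i] += strength * d
            current[2 * i + 1] += strength * d
    return current


original = [0.0, 0.0, 1.0, 1.0]                 # a hard edge
upscaled = upscale_linear_2x(original)          # the edge gets blurred
before = residual(original, upscaled)
after = residual(original, refine(original, upscaled, passes=1, strength=1.0))
half = residual(original, refine(original, upscaled, passes=2, strength=0.5))
```

In this toy setup one full-strength pass removes the residual entirely, while two half-strength passes approach it gradually, which gives a rough feel for how the passes and strength arguments interact.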
Old 3rd October 2015, 22:06   #12  |  Link
Khanattila
Quote:
Originally Posted by MysteryX View Post
SuperRes doesn't work like typical resize algorithms. It wraps around other resizers. Let's say you want to use NNEDI3: it takes the enlarged image and the original image, does a Bicubic resize on the enlarged image to bring it back down to the original size, and then creates a difference map between the two, showing the details that were lost during the upsizing. From that diff map, it restores details and edges that were lost. Brilliant idea. It makes even basic resizers like Bilinear look decent.

Originally he was working in the Lab colorspace, which definitely requires half-float data; lately he dropped that in favor of the linear RGB colorspace. Perhaps I could try running with 8-bit-per-channel to see if there is a significant quality penalty.
Farsiu: pdf.
Mitzel: pdf.

That's what I was talking about. It's like tritical and me: we did not invent NLMeans, we simply implemented a well-known algorithm.
Old 4th October 2015, 00:45   #13  |  Link
MysteryX
Quote:
Originally Posted by Khanattila View Post
That's what I was talking about. It's like tritical and me: we did not invent NLMeans, we simply implemented a well-known algorithm.
Shiandow isn't just implementing an existing algorithm, and this code isn't final. He's still working on it to try to find a better way to use Smoothness. Right now, it works better with Smoothness=0.

I'm seeing a weird bug when using 2 passes with strength=0.425: sometimes I get an Access Violation, and it's random. Sometimes it starts anyway and then fails unexpectedly. Now I'm running it with 1 pass with strength=0.85 and it's been running for half an hour without any issue. I really don't see why running a second pass could cause such an issue...

Last edited by MysteryX; 4th October 2015 at 00:50.
Old 3rd October 2015, 20:47   #14  |  Link
MysteryX
I have added support for processing standard 8-bit-per-channel instead of 16-bit-per-channel. Simply add the parameter "precision=1" to each conversion and shader call. Quality is considerably lower.



With MT=8
SuperRes(1, 1, 0, true, """nnedi3_rpow2(2, cshift="Spline16Resize", Threads=2)""")
I get 24fps @ 78% CPU
Old 3rd October 2015, 22:26   #15  |  Link
MysteryX
When using SuperRes, NNEDI3 and EEDI3 give almost identical output, which means I can get rid of the ridiculously expensive EEDI3.

1 pass with strength=.85 gives almost the same result as 2 passes with strength=.425, though 2 passes look slightly better. 2 passes with NNEDI3 are faster than 1 pass with EEDI3. Splitting into 3 or more passes gives exactly the same result as 2 passes.
Old 4th October 2015, 18:45   #16  |  Link
MysteryX
I fixed a crash when the DX9 device was lost.

I also replaced the code to copy the data in and out of DX9 with AviSynth's BitBlt, and it considerably increases performance by itself.

input.ConvertToFloat().ConvertFromFloat() renders 38fps @ 12% CPU, and if I disable YUV-RGB conversion, 42fps. Could someone look at how this short code could be optimized?
Old 4th October 2015, 19:53   #17  |  Link
MysteryX
The Access Violation error seems to be an out of memory error, plain and simple.

After playing some more with the settings, softness is actually working pretty well when using Strength=1 and 2 or 3 passes.

This is the best result I got with Passes=3, Strength=1, Softness=.25



There should be a further increase in quality once I get to do Bicubic downscaling via a shader, as it will avoid clamping on the diff map.
Old 5th October 2015, 03:15   #18  |  Link
MysteryX
One thing I love about SuperRes is that before, to get the best quality upscaling, I would have to combine EEDI3 and NNEDI3 in the right order and add just the right amount of sharpening at 3 different stages.

With SuperRes, I get better quality with only NNEDI3 and no artificial sharpening. No per-video fine-tuning is necessary: standard NNEDI3 frame doubling with SuperRes(Passes=2, Strength=1, Softness=.3) works perfectly for all the videos I have tried so far.

It will make the media encoder I was working on much simpler.
Old 7th October 2015, 00:44   #19  |  Link
MysteryX
Updated the code and DLL.

- Increased performance
- Added Shader Width and Height parameters to set the output frame size. Default = same as source clip.
Old 8th October 2015, 06:58   #20  |  Link
MysteryX
Fixed float-byte rounding to be more accurate by adding .5f before rounding. Slight performance improvement.

This results in slightly brighter colors and a more accurate SuperRes diff map, which slightly improves its effectiveness.
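The effect of adding .5f before the float-to-byte conversion can be shown with a short Python sketch. It assumes the earlier code simply truncated, the way a C float-to-int cast does (Python's int() also truncates toward zero); the function names are hypothetical and the clamping to [0, 255] is included only to keep the example safe.

```python
def to_byte_truncate(value):
    """Float in [0, 1] to byte by plain truncation (always rounds down)."""
    return max(0, min(255, int(value * 255.0)))


def to_byte_round(value):
    """Float in [0, 1] to byte with +0.5 before truncation (round to nearest)."""
    return max(0, min(255, int(value * 255.0 + 0.5)))


sample = 0.999                       # almost white
truncated = to_byte_truncate(sample)
rounded = to_byte_round(sample)
```

Truncation can only lose brightness, never gain it, so switching to round-to-nearest lifts the average output by up to one code value, consistent with the slightly brighter colors noted above.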

Quote:
Originally Posted by Khanattila View Post
I could make it a plugin like KNLMeansCL. If ever I have time.
As for writing a native AviSynth version, I don't know if that would work. Originally, Shiandow was using the Lab colorspace, which definitely requires half-float processing. He finally dropped it in favor of the linear RGB (not gamma-encoded) colorspace. I doubt the YUV-RGB conversion could be avoided, and in my tests with non-float data the quality is considerably lower. This algorithm is very sensitive to details and must be processed with half-float precision. In that sense, perhaps a native approach wouldn't even be better than this. The GPU is much better at processing float data than the CPU.
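One way to see why linear RGB is hard to hold in 8 bits is the following Python sketch. It rests on several assumptions that are not from the plugin: the sRGB transfer curve stands in for the video gamma, intermediate storage is modeled as integer quantization rather than half-float, and the function names are invented. It counts how many of the 256 gamma-encoded code values fail to survive a trip through linear light at a given bit depth.

```python
def srgb_to_linear(c):
    """sRGB-encoded value in [0, 1] to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(l):
    """Linear light in [0, 1] back to the sRGB-encoded value."""
    return l * 12.92 if l <= 0.0031308 else 1.055 * l ** (1 / 2.4) - 0.055


def codes_damaged(bits):
    """Count 8-bit codes that change after storing linear light at `bits` depth."""
    levels = (1 << bits) - 1
    damaged = 0
    for code in range(256):
        linear = srgb_to_linear(code / 255.0)
        stored = round(linear * levels) / levels       # quantized intermediate
        restored = round(linear_to_srgb(stored) * 255.0)
        if restored != code:
            damaged += 1
    return damaged


lost_at_8_bits = codes_damaged(8)
lost_at_16_bits = codes_damaged(16)
```

Dark tones are packed very tightly in linear light, so an 8-bit intermediate merges many of them, while a 16-bit intermediate keeps every code distinct. This only models storage precision; it says nothing about the extra sensitivity of the diff map itself.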

Last edited by MysteryX; 8th October 2015 at 07:15.