Old 8th October 2015, 12:27   #21  |  Link
Ghostlamer
Registered User
 
Join Date: Aug 2009
Posts: 25
Is there any way to make it work with rfactor larger than 2?
My script:

Code:
XviD4PSPPluginsPath = "C:\Program Files (x86)\XviD4PSP 5\dlls\AviSynth\plugins\"

LoadPlugin(XviD4PSPPluginsPath+"nnedi3.dll")
LoadPlugin(XviD4PSPPluginsPath+"Shader.dll")

Import(XviD4PSPPluginsPath+"SuperRes.avsi")

SetMTMode(3,3)


AviSource("F:\2K\00.avi", audio=false, pixel_type="YV12")



SetMTMode(2)

SuperRes(2, 1, 0, false, """nnedi3_rpow2(rfactor=4, cshift="Spline16Resize", Threads=1)""")

Spline64Resize(2560,1380)
Without MT mode, rfactor=4 is stable but runs at only 0.80 fps, with 6% CPU usage and 0% GPU usage. With rfactor=4 and SetMTMode using more than 2 threads, it crashes; with 2 threads I get 1.40-1.60 fps and very low CPU and GPU usage.
Resolution of the original video: 712x384.

Last edited by Ghostlamer; 8th October 2015 at 12:48.
Old 8th October 2015, 17:14   #22  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,176
It crashes because you're going past the 2GB memory limit. The code will need to be optimized.

Performance can be improved by rewriting the functions that convert frames to/from float. The float/half-float conversion can be done on a whole buffer at once instead of one value at a time, which would probably increase performance. Having an HLSL Bicubic resize function would also help.

As far as memory usage, I'm not sure what can be done. Each DirectX 9 device is creating its own threads and managing its own memory. A DX9 device is created each time a Shader is called. If there are 8 shader calls within SuperRes, and 4 threads, then that's 32 DX9 devices.
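To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It multiplies shader calls by threads as described above and estimates the size of a single half-float RGBA surface (8 bytes per pixel, as in D3DFMT_A16B16G16R16F) for the 712x384 source from the earlier post upscaled by rfactor. The function names are hypothetical, and how many surfaces each device really allocates is not stated in this thread, so this is only an order-of-magnitude aid.

```python
# Rough estimate only; these helper names are hypothetical and not part of the plugin.

BYTES_PER_PIXEL_HALF_RGBA = 8  # 4 channels x 16-bit half float


def device_count(shader_calls: int, threads: int) -> int:
    """One DX9 device per shader call per thread, as described in the post."""
    return shader_calls * threads


def frame_bytes(width: int, height: int, rfactor: int = 1) -> int:
    """Size in bytes of one half-float RGBA surface after upscaling by rfactor."""
    return (width * rfactor) * (height * rfactor) * BYTES_PER_PIXEL_HALF_RGBA


if __name__ == "__main__":
    print("devices:", device_count(8, 4))
    size = frame_bytes(712, 384, rfactor=4)
    print("one surface at rfactor=4: %.1f MiB" % (size / 2**20))
```

Even a handful of such surfaces per device adds up quickly in a 32-bit process once the device count reaches the dozens.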

You can analyze your script with AVSMeter. Using 1 pass instead of 2 also will increase performance. According to my tests, NNEDI3 works best with 2 threads.

Try this
SuperRes(1, .85, 0, false, """nnedi3_rpow2(rfactor=4, cshift="Spline16Resize", Threads=2)""")

Last edited by MysteryX; 8th October 2015 at 17:17.
Old 8th October 2015, 17:42   #23  |  Link
Khanattila
Registered User
 
Join Date: Nov 2014
Posts: 433
Quote:
Originally Posted by MysteryX View Post
Fixed float-byte rounding to be more accurate by adding .5f before rounding. Slight performance improvement.

This results in the colors being slightly brighter, and the SuperRes Diff map to be more accurate which slightly improve its effectiveness.


As far as writing a native AviSynth version, I don't know if that would work. Originally, Shiandow was using the Lab colorspace which definitely requires half-float processing. He finally dropped it to use RGB Linear (not Gamma) colorspace. I doubt the YUV-RGB conversion could be avoided, and from my tests processing it with non-float data, the quality is considerably lower. This algorithm is very sensitive to details and must be processed with half-float precision. In that sense, perhaps native approaches wouldn't even be better than this. The GPU is much better at processing float data than the CPU.
Have you tried normalizing the data to uint32_t, without using floating-point numbers?
If the work is done on the CPU, that is much faster.
unorm32 = (UINT32_MAX * (value - VALUE_MIN)) / (VALUE_MAX - VALUE_MIN)

If for example VALUE_MIN is 0 and VALUE_MAX is 255.
unorm32 = (4294967295 * value) / 255 = 16843009 * value
0 --> 0
1 --> 16843009
2 --> 33686018
...
255 --> UINT32_MAX
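The mapping above can be checked with a few lines of Python. This is only a sketch of the formula as written; `to_unorm32` and `from_unorm32` are hypothetical helper names, and the inverse with round-to-nearest is an addition for illustration. In C or C++ the product `UINT32_MAX * value` does not fit in 32 bits, so a 64-bit intermediate would be needed there; Python integers do not overflow.

```python
UINT32_MAX = 2**32 - 1


def to_unorm32(value: int, value_min: int = 0, value_max: int = 255) -> int:
    """Map value from [value_min, value_max] onto [0, UINT32_MAX]."""
    return (UINT32_MAX * (value - value_min)) // (value_max - value_min)


def from_unorm32(unorm: int, value_min: int = 0, value_max: int = 255) -> int:
    """Inverse mapping, rounding to the nearest integer step."""
    span = value_max - value_min
    return value_min + (unorm * span + UINT32_MAX // 2) // UINT32_MAX


if __name__ == "__main__":
    for v in (0, 1, 2, 255):
        print(v, "-->", to_unorm32(v))
```

For the 8-bit case the scale factor is exactly 16843009 (0x01010101), so the conversion reduces to a single integer multiply.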
__________________
github.com

Last edited by Khanattila; 8th October 2015 at 17:49.
Old 8th October 2015, 18:16   #24  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 4,634
Quote:
Originally Posted by Khanattila View Post
Have you tried normalizing the data to uint32_t, without using floating-point numbers?
If the work is done on the CPU, that is much faster.
Indeed. What's up with the obsession of some folks using floats lately? Even 64 bit (u)int is faster than 32 bit float.
Old 8th October 2015, 19:04   #25  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,176
Quote:
Originally Posted by Khanattila View Post
Have you tried normalizing the data to uint32_t, without using floating-point numbers?
I could experiment some more with that; the shader processing can be done with uint data. However, the way color conversion is currently done creates overflow, so that won't work just yet; the color conversion would need fixing first. The HLSL color conversion code should avoid overflow, but I saw weird behavior when I tried using it, so that's not working yet.

If we can get color conversion to avoid overflow and work properly, then we could try processing with uint data and see what performance difference it makes.
Old 8th October 2015, 19:35   #26  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,176
I'm leaving for China for 2 weeks and won't be playing with this. If someone wants to look into the code, you could look into:
1. Getting the HLSL color conversion to work; or having CPU conversion that avoids overflows
2. Getting the shader to run with uint data (changing buffer format); which requires not having overflows
3. Optimizing the ConvertToFloat and ConvertFromFloat functions. Precision could have 3 values for ConvertToFloat, ConvertFromFloat and Shader functions: 1 (8-bit per channel), 2 (16-bit uint per channel) or 3 (16-bit float per channel)
Old 8th October 2015, 19:37   #27  |  Link
luigizaninoni
Registered User
 
Join Date: Apr 2015
Posts: 163
I must be missing something obvious, but do you have any idea why AvsPMod is giving this error:

LoadPlugin: unable to load "c:\users\admin\desktop\shaders\shader.dll",Module not found. Install missing library ?

shader.dll is definitely in that directory

My script:
LoadPlugin("C:\Users\admin\Desktop\Video\Staxrip\Applications\DGMPGDec\DGDecode.dll")
LoadPlugin("c:\users\admin\desktop\shaders\shader.dll")
MPEG2Source("C:\Users\admin\Desktop\10-07-05-40-01-mozzibrb-TELECOLOR temp files\10-07-05-40-01-mozzibrb-TELECOLOR.d2v",cpu=6,ipp=true,moderate_h=40,moderate_v=60,idct=5)
Crop(2, 2, -2, -2)
QTGMC(Preset="Slow")
SelectEven()
SuperRes(2, 0.85, 0, true, """nnedi3_rpow2(rfactor=2, cshift="Spline16Resize", Threads=2)""", "C:\users\admin\desktop\shaders\")
Old 8th October 2015, 20:14   #28  |  Link
Ghostlamer
Registered User
 
Join Date: Aug 2009
Posts: 25
Quote:
It crashes because you're going past the 2GB memory limit. The code will need to be optimized.
I'm using VirtualDub, and when the crash occurs the vdub process is using only 800-1100 MB.
The script works with SetMTMode 3 or 5 instead of 2 (with more than 2 threads), but the output video is buggy in mode 3 (4-5 fps) and performance is low in mode 5 (0.5-0.8 fps).

Quote:
Try this
SuperRes(1, .85, 0, false, """nnedi3_rpow2(rfactor=4, cshift="Spline16Resize", Threads=2)""")
Thanks for the advice.

Last edited by Ghostlamer; 8th October 2015 at 20:36.
Old 8th October 2015, 20:41   #29  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,176
btw, if anyone wants to play with the code, it's pretty simple, but you need
- DirectX SDK
- Visual Studio (I'm making very little use of C++, it could be adapted to standard C with little changes)

GitHub allows you to download the source code, make your own changes and upload your contributions to the code. It takes some time to learn how to use, but then it is very useful for collaborative projects. TortoiseGit makes it much easier to use.
Old 8th October 2015, 20:51   #30  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 4,634
Quote:
Originally Posted by luigizaninoni View Post
I must be missing something obvious, but do you have any idea why AvsPMod is giving this error:

LoadPlugin: unable to load "c:\users\admin\desktop\shaders\shader.dll",Module not found. Install missing library ?

shader.dll is definitely in that directory
Which OS are you using? If you're using Vista or above, use Dependency Walker to find out what's missing, possibly some of the DX stuff.

Last edited by Groucho2004; 8th October 2015 at 20:55.
Old 8th October 2015, 20:59   #31  |  Link
Ghostlamer
Registered User
 
Join Date: Aug 2009
Posts: 25
MysteryX, can the triple quotes be avoided? I ask because there is a script, mt_pipeline (http://forum.doom9.org/showthread.php?t=163281), that allows bypassing the 2 GB limit, but it also uses triple quotation marks and conflicts with SuperRes (it works very well with many other filters). I'm no AviSynth guru, so I don't know much about this.

Last edited by Ghostlamer; 8th October 2015 at 21:03.
Old 8th October 2015, 21:34   #32  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,176
I tried applying the 4GB patch to AVSMeter.exe, and it didn't work. The flag D3DXCONSTTABLE_LARGEADDRESSAWARE must also be added within the DX9 source code to make that work. Not sure why it hasn't worked yet.
Old 8th October 2015, 21:47   #33  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 4,634
Quote:
Originally Posted by MysteryX View Post
I tried applying the 4GB patch to AVSMeter.exe, and it didn't work.
LARGEADDRESSAWARE is one of the linker options I use for the 32-bit binary. No need to patch.
Old 8th October 2015, 22:14   #34  |  Link
luigizaninoni
Registered User
 
Join Date: Apr 2015
Posts: 163
Quote:
Originally Posted by Groucho2004 View Post
Which OS are you using? If you're using Vista or above, use Dependency Walker to find out what's missing, possibly some of the DX stuff.
Problem solved. d3dx9_43.dll was actually missing. Thank you very much for your kind advice
Old 8th October 2015, 22:39   #35  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,176
Quote:
Originally Posted by luigizaninoni View Post
Problem solved. d3dx9_43.dll was actually missing. Thank you very much for your kind advice
How come that file was missing? Isn't it a system file that "should" already be there?


I just did an experiment with processing frames with int data instead of float. When initializing the device, I replaced the format D3DFMT_A16B16G16R16F by D3DFMT_A16B16G16R16. The performance is slightly faster but not that much. We might save some more in the data conversion. Obviously, with this test, the image was corrupt because it was processing half-float data as if it was uint, but it looked better than I would have expected.

There is definitely a bottleneck somewhere and it isn't the half-float shader processing.

In terms of numbers, I'm using this script
Quote:
SetMTMode(3,4)
AviSource("Preview.avi", audio=false, pixel_type="YV12")
SetMTMode(2)
SuperRes(2, .42, 0, true, """nnedi3_rpow2(2, cshift="Spline16Resize", Threads=2)""")
Distributor()
With D3DFMT_A16B16G16R16 or D3DFMT_A16B16G16R16F, I get almost exactly the same numbers: 12fps @ 53% CPU. Memory usage also is the same.

One thing I found out is that creating a .def file (such as AVSMeter.def) with "STACKSIZE 512KB" in it slightly increases performance.

I also just did another quick test: removing the half-float conversions. It still calculates in float but then converts into short. Performance went way up from 12fps to 15-16fps.

Last edited by MysteryX; 8th October 2015 at 23:04.
Old 8th October 2015, 23:32   #36  |  Link
Shiandow
Registered User
 
Join Date: Dec 2013
Posts: 752
Quote:
Originally Posted by Khanattila View Post
Have you tried normalizing the data to uint32_t, without using floating-point numbers?
If the work is done on the CPU, that is much faster.
unorm32 = (UINT32_MAX * (value - VALUE_MIN)) / (VALUE_MAX - VALUE_MIN)

If for example VALUE_MIN is 0 and VALUE_MAX is 255.
unorm32 = (4294967295 * value) / 255 = 16843009 * value
0 --> 0
1 --> 16843009
2 --> 33686018
...
255 --> UINT32_MAX
Quote:
Originally Posted by Groucho2004 View Post
Indeed. What's up with the obsession of some folks using floats lately? Even 64 bit (u)int is faster than 32 bit float.
Well, the original SuperRes code (designed for MPDN) used 16-bit uint for most of the processing. It does store an intermediate result in float, but that conversion is handled by the GPU itself. I'm not even sure that part is necessary; signed ints would probably work just as well.

However the shaders will still use floats (single precision) internally. And as far as I know GPUs aren't that good at integer (or fixed point) arithmetic, but maybe that's changed.
Old 8th October 2015, 23:43   #37  |  Link
Khanattila
Registered User
 
Join Date: Nov 2014
Posts: 433
Quote:
Originally Posted by Shiandow View Post
Well, the original SuperRes code (designed for MPDN) used 16-bit uint for most of the processing. It does store an intermediate result in float, but that conversion is handled by the GPU itself. I'm not even sure that part is necessary; signed ints would probably work just as well.

However the shaders will still use floats (single precision) internally. And as far as I know GPUs aren't that good at integer (or fixed point) arithmetic, but maybe that's changed.
GPUs are TERRIBLE with integers, but they have fast internal conversion from integer to float.

Like KNLMeansCL, this is the way forward:
Read Integer Buffer --> GPU internal Conversion to normalized float --> Processing float --> GPU internal Conversion to integer --> Write to Integer Buffer.

Anyway, in this case it is better to avoid float altogether than to convert on the CPU.
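The integer-in, integer-out flow described above can be mimicked on the CPU to show where the two conversions sit. This is a pure-Python simulation and not GPU code: the 16-bit buffer depth, the `gain` step standing in for the shader, and all function names are assumptions made for illustration. The write-back adds 0.5 before truncating, which is the same rounding fix quoted earlier in the thread.

```python
U16_MAX = 65535


def unorm16_to_float(v: int) -> float:
    """Integer sample -> normalized float in [0.0, 1.0] (done by the GPU on read)."""
    return v / U16_MAX


def float_to_unorm16(x: float) -> int:
    """Normalized float -> integer sample, clamped and rounded to nearest."""
    x = min(max(x, 0.0), 1.0)
    return int(x * U16_MAX + 0.5)


def process(pixels, gain):
    """Read integer buffer -> float -> process in float -> write integer buffer."""
    return [float_to_unorm16(unorm16_to_float(p) * gain) for p in pixels]


if __name__ == "__main__":
    print(process([0, 32768, 65535], 1.0))
    print(process([40000], 0.5))
```

With this layout the host only ever touches integer buffers, so no float or half-float conversion runs on the CPU.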
__________________
github.com
Old 9th October 2015, 01:06   #38  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,176
I wouldn't be surprised if the DX9 function to convert half-float data is delegated to the GPU then, and that's what the buffer-processing function is for. If that's the case, then right now I'm sending commands to the GPU one by one. If I batch them into a buffer to be processed all at once, then performance would probably be MUCH better. Worth a try!
Old 9th October 2015, 02:51   #39  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,176
I edited ConvertToFloat to use a buffer for half-float conversions. It still calculates as float (which could be optimized by calculating int instead), stores all data into a large float buffer, converts the whole frame at once, then copies it back into the frame. ConvertFromFloat doesn't have those changes yet.

That change brought the performance up from 12fps to 14.5fps.
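The difference between the two approaches can be sketched in Python, with the standard `struct` module's half-float format (`e`) standing in for whatever DX9 routine the plugin actually calls. The function names are hypothetical, and this only illustrates the call pattern (one call per value versus one call per buffer), not the plugin's real speed.

```python
import struct


def floats_to_half_one_by_one(values):
    """Convert each float to a 16-bit half float with a separate call."""
    out = bytearray()
    for v in values:
        out += struct.pack("<e", v)
    return bytes(out)


def floats_to_half_buffered(values):
    """Convert the whole buffer to half floats in a single call."""
    return struct.pack("<%de" % len(values), *values)


def half_to_floats(data):
    """Convert a buffer of half floats back to Python floats in one call."""
    count = len(data) // 2
    return list(struct.unpack("<%de" % count, data))


if __name__ == "__main__":
    row = [0.0, 0.25, 0.5, 1.0]
    packed = floats_to_half_buffered(row)
    print(len(packed), half_to_floats(packed))
```

Both paths produce identical bytes; the buffered version simply pays the per-call overhead once per frame instead of once per sample.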
Old 9th October 2015, 03:36   #40  |  Link
MysteryX
Soul Architect
 
Join Date: Apr 2014
Posts: 2,176
ConvertToFloat and ConvertFromFloat are now both using a buffer for half-float conversion. Performance is now 18.5fps instead of 12fps. It could be further improved by calculating int data instead of float.

With this optimization, the CPU usage is also now higher, even with only 4 threads, so the whole script is running considerably faster.

Edit: I adapted ConvertToFloat to calculate color conversion with INT instead of FLOAT, performance further increased.
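As an illustration of what integer color conversion can look like, here is a minimal fixed-point YUV to RGB sketch in Python. It assumes 8-bit limited-range BT.601 with the commonly used coefficients scaled by 256; the thread does not say which matrix or scaling ConvertToFloat uses, so the constants and the function names here are assumptions rather than the plugin's code.

```python
def clamp8(x: int) -> int:
    """Clamp to the 8-bit range so out-of-gamut values cannot overflow."""
    return 0 if x < 0 else 255 if x > 255 else x


def yuv_to_rgb_int(y: int, u: int, v: int):
    """Limited-range BT.601 YUV -> RGB using only integer math (coefficients x256)."""
    c = y - 16
    d = u - 128
    e = v - 128
    r = (298 * c + 409 * e + 128) >> 8            # +128 rounds before the shift
    g = (298 * c - 100 * d - 208 * e + 128) >> 8
    b = (298 * c + 516 * d + 128) >> 8
    return clamp8(r), clamp8(g), clamp8(b)


if __name__ == "__main__":
    print(yuv_to_rgb_int(16, 128, 128))
    print(yuv_to_rgb_int(235, 128, 128))
```

The clamp after the shift is what keeps saturated colors from wrapping around, which is the kind of overflow discussed earlier in the thread.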

Last edited by MysteryX; 9th October 2015 at 05:41.