Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.
8th October 2015, 12:27 | #21
Registered User
Join Date: Aug 2009
Posts: 25
Is there any way to make it work with an rfactor larger than 2?
My script:
Code:
XviD4PSPPluginsPath = "C:\Program Files (x86)\XviD4PSP 5\dlls\AviSynth\plugins\"
LoadPlugin(XviD4PSPPluginsPath+"nnedi3.dll")
LoadPlugin(XviD4PSPPluginsPath+"Shader.dll")
Import(XviD4PSPPluginsPath+"SuperRes.avsi")
SetMTMode(3,3)
AviSource("F:\2K\00.avi", audio=false, pixel_type="YV12")
SetMTMode(2)
SuperRes(2, 1, 0, false, """nnedi3_rpow2(rfactor=4, cshift="Spline16Resize", Threads=1)""")
Spline64Resize(2560,1380)
The resolution of the original video is 712x384.
Last edited by Ghostlamer; 8th October 2015 at 12:48.
8th October 2015, 17:14 | #22
Soul Architect
Join Date: Apr 2014
Posts: 2,559
It crashes because you're going past the 2GB memory limit. The code will need to be optimized.
Performance can be improved by rewriting the functions that convert frames to and from float. The float/half-float conversion could be done over a whole buffer instead of one value at a time, which would probably increase performance. An HLSL bicubic resize function would also help.
As for memory usage, I'm not sure what can be done. Each DirectX 9 device creates its own threads and manages its own memory, and a DX9 device is created each time Shader is called. If there are 8 Shader calls within SuperRes and 4 threads, that's 32 DX9 devices. You can analyze your script with AVSMeter.
Using 1 pass instead of 2 will also increase performance. According to my tests, NNEDI3 works best with 2 threads. Try this:
SuperRes(1, .85, 0, false, """nnedi3_rpow2(rfactor=4, cshift="Spline16Resize", Threads=2)""")
__________________
FrameRateConverter | AvisynthShader | AvsFilterNet | Natural Grounding Player with Yin Media Encoder, 432hz Player, Powerliminals Player and Audio Video Muxer
Last edited by MysteryX; 8th October 2015 at 17:17.
8th October 2015, 17:42 | #23
Registered User
Join Date: Nov 2014
Posts: 440
Quote:
If you're working on the processor (CPU), this is much faster:
unorm32 = (UINT32_MAX * (value - VALUE_MIN)) / (VALUE_MAX - VALUE_MIN)
If, for example, VALUE_MIN is 0 and VALUE_MAX is 255:
unorm32 = (4294967295 * value) / 255 = 16843009 * value
0 --> 0
1 --> 16843009
2 --> 33686018
...
255 --> UINT32_MAX
__________________
github.com
Last edited by Khanattila; 8th October 2015 at 17:49.
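The unorm32 scaling from the post above can be sketched in C++ as follows. This is a minimal illustration, not code from the plugin: the function names are hypothetical, and it assumes `VALUE_MIN <= value <= VALUE_MAX`. The general form forms the product in 64 bits so `UINT32_MAX * (value - VALUE_MIN)` cannot overflow before the division; for 8-bit input the division collapses to a single multiply by 16843009.

```cpp
#include <cstdint>
#include <limits>

// General form: unorm32 = UINT32_MAX * (value - vmin) / (vmax - vmin).
// The product is formed in 64 bits so it cannot overflow before the division.
// Assumes vmin <= value <= vmax and vmin < vmax.
std::uint32_t ToUnorm32(std::uint32_t value, std::uint32_t vmin, std::uint32_t vmax) {
    const std::uint64_t kMax = std::numeric_limits<std::uint32_t>::max();
    const std::uint64_t num = kMax * (value - vmin);
    return static_cast<std::uint32_t>(num / (vmax - vmin));
}

// 8-bit fast path: 4294967295 / 255 == 16843009 exactly,
// so the division disappears and one 32-bit multiply is enough.
std::uint32_t ToUnorm32From8(std::uint8_t value) {
    return static_cast<std::uint32_t>(value) * 16843009u;
}
```

Because 255 * 16843009 is exactly 4294967295, the fast path never overflows a 32-bit unsigned integer.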
8th October 2015, 19:04 | #25
Soul Architect
Join Date: Apr 2014
Posts: 2,559
Quote:
If we can get color conversion to avoid overflow and work properly, then we could try processing with uint data and see what performance difference it makes.
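One common way to get a CPU color conversion that cannot overflow is to stay in integer fixed point and clamp before narrowing to 8 bits. The sketch below assumes the conversion in question is YUV to RGB using BT.601 limited-range coefficients in the widely published 8.8 fixed-point form (298, 409, 100, 208, 516); whether SuperRes uses this matrix is an assumption, and `YuvToRgb601`/`ClampShift8` are hypothetical names.

```cpp
#include <cstdint>

struct Rgb8 { std::uint8_t r, g, b; };

// Scale an 8.8 fixed-point intermediate back to 8 bits, clamping to [0, 255].
// Negative values are caught before the shift, so nothing can wrap around.
static std::uint8_t ClampShift8(int v) {
    if (v < 0) return 0;
    v >>= 8;
    return static_cast<std::uint8_t>(v > 255 ? 255 : v);
}

// BT.601 limited-range YUV -> RGB in 32-bit integer arithmetic.
// Intermediates stay well inside int range (|v| < 2^18), so no overflow occurs.
Rgb8 YuvToRgb601(std::uint8_t y, std::uint8_t u, std::uint8_t v) {
    const int c = static_cast<int>(y) - 16;
    const int d = static_cast<int>(u) - 128;
    const int e = static_cast<int>(v) - 128;
    return Rgb8{
        ClampShift8(298 * c + 409 * e + 128),            // R
        ClampShift8(298 * c - 100 * d - 208 * e + 128),  // G
        ClampShift8(298 * c + 516 * d + 128),            // B
    };
}
```

Out-of-gamut inputs such as Y = U = V = 255 simply saturate to 255 instead of wrapping, which is the property needed before switching buffers to uint data.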
8th October 2015, 19:35 | #26
Soul Architect
Join Date: Apr 2014
Posts: 2,559
I'm leaving for China for 2 weeks and won't be playing with this. If someone wants to look into the code, you could look into:
1. Getting the HLSL color conversion to work, or having a CPU conversion that avoids overflows.
2. Getting the shader to run with uint data (changing the buffer format), which requires not having overflows.
3. Optimizing the ConvertToFloat and ConvertFromFloat functions.
Precision could have 3 values for the ConvertToFloat, ConvertFromFloat and Shader functions: 1 (8-bit per channel), 2 (16-bit uint per channel) or 3 (16-bit float per channel).
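The three proposed Precision values can be sketched as a small enum. This is illustrative only: the enum names are hypothetical, and the Direct3D 9 format paired with each level is an assumption (D3DFMT_A16B16G16R16F is the half-float format already used, and D3DFMT_A16B16G16R16 is its 16-bit uint counterpart). For Precision 2, an 8-bit sample is expanded to 16-bit unorm by multiplying by 257.

```cpp
#include <cstdint>

// Hypothetical names for the three proposed Precision values.
enum class Precision : int { Uint8 = 1, Uint16 = 2, Half16 = 3 };

// Assumed Direct3D 9 surface format for each precision (RGBA data).
const char* D3DFormatName(Precision p) {
    switch (p) {
        case Precision::Uint8:  return "D3DFMT_A8R8G8B8";
        case Precision::Uint16: return "D3DFMT_A16B16G16R16";
        case Precision::Half16: return "D3DFMT_A16B16G16R16F";
    }
    return "unknown";
}

// Bytes per RGBA pixel: 4 channels x 1 byte, or 4 channels x 2 bytes.
int BytesPerPixel(Precision p) {
    return p == Precision::Uint8 ? 4 : 8;
}

// 8-bit -> 16-bit unorm for Precision 2: v * 257 maps 0 -> 0 and 255 -> 65535
// (257 == 65535 / 255), so the full range is preserved exactly.
std::uint16_t Expand8To16(std::uint8_t v) {
    return static_cast<std::uint16_t>(v * 257u);
}
```

Precision 2 and 3 move the same number of bytes per pixel, so any speed difference between them comes from conversion cost rather than transfer size.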
8th October 2015, 19:37 | #27
Registered User
Join Date: Apr 2015
Posts: 163
I must be missing something obvious, but do you have any idea why AvsPMod is giving this error:
LoadPlugin: unable to load "c:\users\admin\desktop\shaders\shader.dll",Module not found. Install missing library ?
shader.dll is definitely in that directory.
My script:
LoadPlugin("C:\Users\admin\Desktop\Video\Staxrip\Applications\DGMPGDec\DGDecode.dll")
LoadPlugin("c:\users\admin\desktop\shaders\shader.dll")
MPEG2Source("C:\Users\admin\Desktop\10-07-05-40-01-mozzibrb-TELECOLOR temp files\10-07-05-40-01-mozzibrb-TELECOLOR.d2v",cpu=6,ipp=true,moderate_h=40,moderate_v=60,idct=5)
Crop(2, 2, -2, -2)
QTGMC(Preset="Slow")
SelectEven()
SuperRes(2, 0.85, 0, true, """nnedi3_rpow2(rfactor=2, cshift="Spline16Resize", Threads=2)""", "C:\users\admin\desktop\shaders\")
8th October 2015, 20:14 | #28
Registered User
Join Date: Aug 2009
Posts: 25
Quote:
The script works with SetMTMode(3) and SetMTMode(5) instead of 2 (with more than 2 threads), but with mode 3 the output video is buggy (4-5 fps), and with mode 5 performance is low (0.5-0.8 fps).
Quote:
Last edited by Ghostlamer; 8th October 2015 at 20:36.
8th October 2015, 20:41 | #29
Soul Architect
Join Date: Apr 2014
Posts: 2,559
By the way, if anyone wants to play with the code, it's pretty simple, but you need:
- DirectX SDK
- Visual Studio (I'm making very little use of C++; it could be adapted to standard C with few changes)
GitHub allows you to download the source code, make your own changes and upload your contributions to the code. It takes some time to learn how to use it, but then it's very useful for collaborative projects. TortoiseGit makes it much easier to use.
8th October 2015, 20:59 | #31
Registered User
Join Date: Aug 2009
Posts: 25
MysteryX, can the triple quotes be avoided? Here's why I ask: there is a script, mt_pipeline (http://forum.doom9.org/showthread.php?t=163281), that lets you get around the 2GB limit, but it also uses triple quotation marks and conflicts with SuperRes (while working very well with many others). I'm just not an AviSynth guru and don't know much.
Last edited by Ghostlamer; 8th October 2015 at 21:03. |
8th October 2015, 22:39 | #35
Soul Architect
Join Date: Apr 2014
Posts: 2,559
Quote:
I just did an experiment with processing frames with int data instead of float. When initializing the device, I replaced the format D3DFMT_A16B16G16R16F with D3DFMT_A16B16G16R16. Performance is slightly faster, but not by much. We might save some more in the data conversion. Obviously, with this test the image was corrupt because it was processing half-float data as if it was uint, but it looked better than I would have expected. There is definitely a bottleneck somewhere, and it isn't the half-float shader processing. In terms of numbers, I'm using this script:
Quote:
One thing I found out is that creating a .def file (such as AVSMeter.def) with "STACKSIZE 512KB" in it slightly increases performance.
I also just did another quick test: removing the half-float conversions. It still calculates in float but then converts into short. Performance went way up, from 12fps to 15-16fps.
Last edited by MysteryX; 8th October 2015 at 23:04.
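The quick test described above (compute in float, then store 16-bit integers instead of half-floats) could look roughly like this. It is a sketch under assumptions, not the plugin's code: the function names are hypothetical, values are assumed to be normalized to 0..1, and "short" is read as 16-bit unsigned normalized data, the layout of a D3DFMT_A16B16G16R16 surface. Each value is clamped, scaled by 65535 and rounded.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Normalized float (nominally 0..1) -> 16-bit unsigned normalized integer.
// Values are clamped first, so out-of-range results cannot wrap around.
std::uint16_t FloatToUnorm16(float f) {
    if (!(f > 0.0f)) return 0;          // negative, zero or NaN
    if (f >= 1.0f) return 65535;
    return static_cast<std::uint16_t>(std::lround(f * 65535.0f));
}

// Convert a whole buffer (e.g. one plane or row) in a single tight loop.
void FloatToUnorm16Array(std::uint16_t* out, const float* in, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = FloatToUnorm16(in[i]);
}
```

This avoids the float-to-half step entirely, which is consistent with the speed-up reported in the post, though the sketch itself makes no performance claim.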
8th October 2015, 23:32 | #36
Registered User
Join Date: Dec 2013
Posts: 753
Quote:
Quote:
However, the shaders will still use floats (single precision) internally. And as far as I know, GPUs aren't that good at integer (or fixed-point) arithmetic, but maybe that's changed.
8th October 2015, 23:43 | #37
Registered User
Join Date: Nov 2014
Posts: 440
Quote:
Like KNLMeansCL, this is the way forward: Read Integer Buffer --> GPU Internal Conversion to Normalized Float --> Processing in Float --> GPU Internal Conversion to Integer --> Write to Integer Buffer. Anyway, in this case it is better not to use float at all than to do the conversion on the CPU.
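The integer → float → integer pipeline described above can be illustrated on the CPU as follows. This is only a sketch: it assumes 8-bit samples, and `op` is a hypothetical stand-in for the shader or kernel body. Per the post, in KNLMeansCL both conversions happen inside the GPU kernel; here they are written out explicitly so the round trip is visible.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// CPU-side sketch of: read integer buffer -> convert to normalized float ->
// process in float -> convert back to integer -> write integer buffer.
// `op` is a hypothetical stand-in for the shader / kernel body.
template <typename Op>
void ProcessNormalized(const std::uint8_t* in, std::uint8_t* out, std::size_t n, Op op) {
    for (std::size_t i = 0; i < n; ++i) {
        const float x = in[i] / 255.0f;          // integer -> normalized float
        float y = op(x);                         // all processing happens in float
        if (!(y >= 0.0f)) y = 0.0f;              // clamp (also catches NaN)
        if (y > 1.0f) y = 1.0f;
        out[i] = static_cast<std::uint8_t>(std::lround(y * 255.0f));  // float -> integer
    }
}
```

With an identity `op`, every 8-bit value survives the round trip unchanged, so the only cost of the float stage is the conversion work itself.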
9th October 2015, 01:06 | #38
Soul Architect
Join Date: Apr 2014
Posts: 2,559
I wouldn't be surprised if the DX9 function to convert half-float data is delegated to the GPU then, and that's what the buffer-processing function is for. If that's the case, then right now I'm sending commands to the GPU one by one. If I batch them into a buffer to be processed all at once, then performance would probably be MUCH better. Worth a try!
9th October 2015, 02:51 | #39
Soul Architect
Join Date: Apr 2014
Posts: 2,559
I edited ConvertToFloat to use a buffer for half-float conversions. It still calculates as float (which could be optimized by calculating with int instead), stores all the data in a large float buffer, converts the whole frame at once, then copies it back into the frame. ConvertFromFloat doesn't have those changes yet.
That change brought performance up from 12fps to 14.5fps.
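The buffered approach described above (compute every sample as float into one large buffer, then convert the whole frame to half-float in a single call) can be sketched like this. Direct3D 9's D3DX library offers an array converter, D3DXFloat32To16Array, but whether the plugin uses it is not stated; so that the example runs without the DirectX SDK, a portable float-to-half routine (round to nearest, ties to even) stands in for it. All function names here are hypothetical, and the input is assumed to be 8-bit samples normalized to 0..1.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// IEEE 754 binary32 -> binary16 (half float), rounding to nearest, ties to even.
std::uint16_t FloatToHalf(float f) {
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    const std::uint32_t sign = (x >> 16) & 0x8000u;
    const std::uint32_t absx = x & 0x7FFFFFFFu;

    std::uint32_t h;
    if (absx >= 0x7F800000u) {                 // Inf or NaN
        h = absx > 0x7F800000u ? 0x7E00u : 0x7C00u;
    } else if (absx >= 0x47800000u) {          // >= 65536: too large, becomes Inf
        h = 0x7C00u;
    } else if (absx < 0x38800000u) {           // below 2^-14: half subnormal or zero
        // value = m * 2^-24; nearbyint rounds ties to even in the default FP mode.
        h = static_cast<std::uint32_t>(std::nearbyint(std::fabs(f) * 16777216.0f));
    } else {                                   // normal range
        h = (absx - 0x38000000u) >> 13;        // rebias exponent 127 -> 15, keep 10 mantissa bits
        const std::uint32_t rem = absx & 0x1FFFu;   // the 13 dropped bits
        if (rem > 0x1000u || (rem == 0x1000u && (h & 1u))) ++h;
    }
    return static_cast<std::uint16_t>(sign | h);
}

// Portable stand-in for a batch converter such as D3DXFloat32To16Array.
void FloatToHalfArray(std::uint16_t* out, const float* in, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = FloatToHalf(in[i]);
}

// Buffered conversion: compute every sample as float into one staging buffer,
// convert the whole frame in a single call, then hand back the half-float data.
std::vector<std::uint16_t> ConvertFrameToHalf(const std::uint8_t* src, std::size_t count) {
    std::vector<float> staging(count);
    for (std::size_t i = 0; i < count; ++i)
        staging[i] = src[i] / 255.0f;          // per-sample float calculation
    std::vector<std::uint16_t> half(count);
    FloatToHalfArray(half.data(), staging.data(), count);  // one batched conversion
    return half;
}
```

The design point is the single call over the whole staging buffer: if the real batch converter does its work in one pass (or off the CPU), per-sample call overhead disappears, which matches the direction of the reported speed-up.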
9th October 2015, 03:36 | #40
Soul Architect
Join Date: Apr 2014
Posts: 2,559
ConvertToFloat and ConvertFromFloat now both use a buffer for half-float conversion. Performance is now 18.5fps instead of 12fps. It could be improved further by calculating with int data instead of float.
With this optimization, CPU usage is also higher now, even with only 4 threads, so the whole script runs considerably faster. Edit: I adapted ConvertToFloat to calculate the color conversion with INT instead of FLOAT, and performance increased further.
Last edited by MysteryX; 9th October 2015 at 05:41.