Old 23rd April 2015, 14:16   #1801  |  Link
aufkrawall
Registered User
 
Join Date: Dec 2011
Posts: 1,812
With a GTX 980, I'm seeing slightly higher GPU load with MPDN at 64 neurons than with madVR (using the exact settings you gave).
With madVR, GPU usage is ~32% when scaling 720p25 -> WQHD; with MPDN it's ~39%. The difference seems to grow when I additionally let both renderers downsample to the maximized window resolution.
In both players, the render and presentation queues are set to 4 frames (new windowed path).
I don't put much weight on render times, since the GPU doesn't always run at the same clock.

But it works without any glitches on a quick try. I'll do quality comparisons as soon as I have more time.
Very nice progress again!

Edit: Could someone please explain again the technical reason why NNEDI3 via DirectCompute is much slower than via OpenCL?

Last edited by aufkrawall; 23rd April 2015 at 14:33.
Old 23rd April 2015, 22:51   #1802  |  Link
Scyna
Registered User
 
Join Date: Jul 2013
Posts: 33
Code:
1) AMD 290
2)A 16/Scalar: 84%/534 Avg Render 10.50ms, 32/Scalar: 88%/564 Avg Render 11.85ms, 64/Scalar: 83%/610 Avg Render 15.34ms, 128/Scalar: 100%/725 Avg Render 21.30ms, 256/Scalar: 100%/922 Avg Render 33.63ms, DWM glitches present 84
2)B 16/Vector: 67%/606 Avg Render 11.11ms, 32/Vector: 72%/586 Avg Render 12.71ms, 64/Vector: 85%/630 Avg Render 16.35ms, 128/Vector: 90%/743 Avg Render 23.57ms, 256/Vector: 100%/953 Avg Render 37.27ms, DWM glitches present 221
2)C 16/Branches: 84%/544 Avg Render 11.28ms, 32/Branches: 80%/605 Avg Render 12.78ms, 64/Branches: 81%/642 Avg Render 16.35ms, 128/Branches: 96%/753 Avg Render 23.48ms, 256/Branches: 100%/953 Avg Render 37.34ms, DWM glitches present 229
3) 1280x720 24p
4) 1920x1059
5) Max Performance

2)A 16/Scalar: 67%/565MHz Avg Render 11ms, 32/Scalar: 96%/574 Avg Render 12ms, 64/Scalar: 80%/630 Avg Render 15.65ms, 128/Scalar: 100%/720 Avg Render 21.42ms, 256/Scalar: 100%/923 Avg Render 33.72ms, DWM glitches present 68
2)B 16/Vector: 80%/553MHz Avg Render 11.48ms, 32/Vector: 62%/561 Avg Render 13.08ms, 64/Vector: 81%/655 Avg Render 16.45ms, 128/Vector: 100%/764 Avg Render 23.73ms, 256/Vector: 100%/955 Avg Render 37.56ms, DWM glitches present 127
2)C 16/Branches: 72%/523 Avg Render 10.83ms, 32/Branches: 95%/584 Avg Render 13.19ms, 64/Branches: 98%/648 Avg Render 16.44ms, 128/Branches: 100%/760 Avg Render 23.58ms, 256/Branches: 100%/953 Avg Render 37.70ms, DWM glitches present 203
5) Image Quality
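
For anyone who wants to collect reports like the one above into a table, here is a minimal Python sketch that pulls the "neurons/path: load%/clock Avg Render Xms" entries out of a line. The field names (neurons, path, load_pct, clock_mhz, render_ms) are labels invented for this sketch, not MPDN terminology, and the pattern only covers entries that include both a load and a clock figure. Matching is case-insensitive and an optional "MHz" suffix after the clock is accepted.

```python
import re

# One entry looks like "16/Scalar: 84%/534 Avg Render 10.50ms".
# Group names are hypothetical labels chosen for this sketch.
ENTRY = re.compile(
    r"(?P<neurons>\d+)/(?P<path>\w+):\s*"
    r"(?P<load>\d+)%/(?P<clock>\d+)\s*(?:mhz)?\s*"
    r"avg render\s*(?P<render_ms>\d+(?:\.\d+)?)ms",
    re.IGNORECASE,
)


def parse_entries(line):
    """Return one dict per "neurons/path" entry found in the given line."""
    results = []
    for m in ENTRY.finditer(line):
        results.append({
            "neurons": int(m.group("neurons")),
            "path": m.group("path").lower(),
            "load_pct": int(m.group("load")),
            "clock_mhz": int(m.group("clock")),
            "render_ms": float(m.group("render_ms")),
        })
    return results


sample = ("2)A 16/Scalar: 84%/534 Avg Render 10.50ms, "
          "32/Scalar: 88%/564 Avg render 11.85ms, "
          "64/Scalar: 83%/610 Avg Render 15.34ms")
for entry in parse_entries(sample):
    print(entry)
```

The trailing "DWM glitches present N" count is not captured here; it would need its own pattern.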
Old 24th April 2015, 02:20   #1803  |  Link
Zachs
Suptitle, MediaPlayer.NET
 
Join Date: Nov 2001
Posts: 1,721
Quote:
Originally Posted by aufkrawall View Post
With a GTX 980, I'm seeing slightly higher GPU load with MPDN at 64 neurons than with madVR (using the exact settings you gave).
With madVR, GPU usage is ~32% when scaling 720p25 -> WQHD; with MPDN it's ~39%. The difference seems to grow when I additionally let both renderers downsample to the maximized window resolution.
In both players, the render and presentation queues are set to 4 frames (new windowed path).
I don't put much weight on render times, since the GPU doesn't always run at the same clock.

But it works without any glitches on a quick try. I'll do quality comparisons as soon as I have more time.
Very nice progress again!

Edit: Could someone please explain again the technical reason why NNEDI3 via DirectCompute is much slower than via OpenCL?
It's very hard to compare GPU load if the clock speed isn't the same between the two. A lower clock speed produces a higher load reading, but that doesn't mean the GPU is doing more work - in fact it could be doing less. What I'd like to know is which "path" is the fastest for your GTX 980.
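
To illustrate the point about load and clock speed, here is a rough Python sketch that multiplies the load percentage by the clock it was measured at. This is only a first-order approximation: it assumes the work done scales linearly with clock, which real GPUs need not obey. The function name busy_mhz is made up for this sketch, and the two readings are the 16-neuron Scalar and Vector figures from Scyna's first run above.

```python
def busy_mhz(load_pct, clock_mhz):
    """Clock-normalised load: the share of the clock kept busy, in MHz.

    Hypothetical helper for this sketch; assumes work scales linearly
    with clock speed.
    """
    return load_pct * clock_mhz / 100.0


# (load %, clock MHz) taken from Scyna's report, 16 neurons, first run.
readings = {
    "16/Scalar": (84, 534),
    "16/Vector": (67, 606),
}

for name, (load, clock) in readings.items():
    print(f"{name}: {load}% at {clock} MHz ~ {busy_mhz(load, clock):.1f} busy MHz")
```

Two readings with the same product (say 100% at 500 MHz and 50% at 1000 MHz) would count as the same amount of work under this approximation, even though the load percentages differ by a factor of two.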

Quote:
Originally Posted by Scyna View Post
Code:
1) AMD 290
2)A 16/Scalar: 84%/534 Avg Render 10.50ms, 32/Scalar: 88%/564 Avg Render 11.85ms, 64/Scalar: 83%/610 Avg Render 15.34ms, 128/Scalar: 100%/725 Avg Render 21.30ms, 256/Scalar: 100%/922 Avg Render 33.63ms, DWM glitches present 84
2)B 16/Vector: 67%/606 Avg Render 11.11ms, 32/Vector: 72%/586 Avg Render 12.71ms, 64/Vector: 85%/630 Avg Render 16.35ms, 128/Vector: 90%/743 Avg Render 23.57ms, 256/Vector: 100%/953 Avg Render 37.27ms, DWM glitches present 221
2)C 16/Branches: 84%/544 Avg Render 11.28ms, 32/Branches: 80%/605 Avg Render 12.78ms, 64/Branches: 81%/642 Avg Render 16.35ms, 128/Branches: 96%/753 Avg Render 23.48ms, 256/Branches: 100%/953 Avg Render 37.34ms, DWM glitches present 229
3) 1280x720 24p
4) 1920x1059
5) Max Performance

2)A 16/Scalar: 67%/565MHz Avg Render 11ms, 32/Scalar: 96%/574 Avg Render 12ms, 64/Scalar: 80%/630 Avg Render 15.65ms, 128/Scalar: 100%/720 Avg Render 21.42ms, 256/Scalar: 100%/923 Avg Render 33.72ms, DWM glitches present 68
2)B 16/Vector: 80%/553MHz Avg Render 11.48ms, 32/Vector: 62%/561 Avg Render 13.08ms, 64/Vector: 81%/655 Avg Render 16.45ms, 128/Vector: 100%/764 Avg Render 23.73ms, 256/Vector: 100%/955 Avg Render 37.56ms, DWM glitches present 127
2)C 16/Branches: 72%/523 Avg Render 10.83ms, 32/Branches: 95%/584 Avg Render 13.19ms, 64/Branches: 98%/648 Avg Render 16.44ms, 128/Branches: 100%/760 Avg Render 23.58ms, 256/Branches: 100%/953 Avg Render 37.70ms, DWM glitches present 203
5) Image Quality
Thank you for such detailed feedback!

All these reports will be very handy for coming up with automatic "path" selection.
Old 24th April 2015, 04:22   #1804  |  Link
Anime Viewer
Troubleshooter
 
Join Date: Feb 2014
Posts: 339
Avoid Branches

Quote:
Originally Posted by Zachs View Post
I've uploaded MPDN build 3066 to the test builds folder. It fixes a problem in 3065 where luma scaling wasn't skipped even when the render script told it to, so luma was unnecessarily scaled with MPDN's internal scaler even when nnedi3 had already scaled it. So it should make things a little faster.

I've also updated the nnedi3 script on GitHub - it now has an extra option called "Path" which presents you with 3 options: Prefer Scalar, Prefer Vector or Avoid Branches. Each GPU+driver works differently (as I've gathered from the tests you have done so far), so try out each path for each neuron setting; you'll find one that's fastest.

Same as before, I'd like feedback to be in the following format.
1) GPU name
2) 16/32/64/128/256 Neurons - Selected Path (Scalar/Vector/NoBranch) - GPU load / clock speed when it's running + render time (optional)
3) Source clip resolution / frame rate
4) Target resolution (must be bigger than source clip res)
5) Render quality settings (preferably "Image Quality" which is the MPDN default)



Code:
1) Nvidia GTX 680M (Optimus system)
2)
16/Scalar: 24%/718.5MHz (GPU), 900MHz (mem) (render 12-13ms),
16/Vector: 22-17% GPU load (clocks remain the same as in the other tests) (render 11-12ms),
16/Branch: 22% GPU (render 11-12ms),
32/Scalar: 33% GPU (render 16ms),
32/Vector: 31% GPU (render 15ms),
32/Branch: 30% GPU (render 15ms),
64/Scalar: 52% GPU (render 24ms),
64/Vector: 48% GPU (render 24ms),
64/Branch: 48% GPU (render 22ms),
128/Scalar: 94% GPU (render 64ms), in other words dropped frames, so not a worthwhile setting,
128/Vector: 92% GPU (render 62ms), in other words dropped frames, so not a worthwhile setting,
128/Branch: 84% GPU (render 36-37ms) - Actually worked!?!
3) 848x480 23p
4) 1920x1080
5) Image Quality
At first I thought they all performed pretty similarly, and that it wouldn't make much difference, but then I got to the 128-neuron test and there was a very significant difference. "Avoid Branches" seems to be the better choice for GTX 680M Optimus systems.
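
The dropped frames at 128 neurons follow from simple arithmetic, sketched below in Python. The assumption here is that "23p" means 24000/1001 (about 23.976) fps, which leaves roughly 41.7 ms per frame. The sketch only compares the reported render time against that interval and ignores everything else in the pipeline, so treat it as an estimate. The upper value of the 36-37ms range is used for the Branch path.

```python
FPS = 24000 / 1001            # assumption: "23p" means 23.976 fps
FRAME_BUDGET_MS = 1000 / FPS  # roughly 41.7 ms available per frame

# Render times in ms at 128 neurons, from the report above.
render_ms_128 = {"Scalar": 64, "Vector": 62, "Branch": 37}


def fits_budget(render_ms, budget_ms=FRAME_BUDGET_MS):
    """True if one frame can be rendered within one frame interval."""
    return render_ms <= budget_ms


for path, ms in render_ms_128.items():
    verdict = "fits" if fits_budget(ms) else "exceeds"
    print(f"128/{path}: {ms} ms {verdict} the {FRAME_BUDGET_MS:.1f} ms budget")

fastest = min(render_ms_128, key=render_ms_128.get)
print("fastest path at 128 neurons:", fastest)
```

At 62-64ms the Scalar and Vector paths need about one and a half frame intervals per frame, while the Branch path stays under the budget.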
__________________
System specs: Sager NP9150 SE with i7-3630QM 2.40GHz, 16 GB RAM, 64-bit Windows 10 Pro, NVidia GTX 680M/Intel 4000 HD optimus dual GPU system. Video viewed on LG notebook screen and LG 3D passive TV.

Last edited by Anime Viewer; 24th April 2015 at 04:37.
Old 24th April 2015, 04:29   #1805  |  Link
Zachs
Suptitle, MediaPlayer.NET
 
Join Date: Nov 2001
Posts: 1,721
Quote:
Originally Posted by Anime Viewer View Post
At first I thought they all performed pretty similarly, and that it wouldn't make much difference, but then I got to the 128-neuron test and there was a very significant difference. "Avoid Branches" seems to be
If the driver does a good job of utilizing the GPU's resources with any given shader code, you should get the same performance regardless of which option you choose. NV hardware seems to like the "Avoid Branches" path, and yes, the difference can be massive, like what you saw. Blame the driver!
Old 24th April 2015, 04:46   #1806  |  Link
Anime Viewer
Troubleshooter
 
Join Date: Feb 2014
Posts: 339
Quote:
Originally Posted by Zachs View Post
If the driver does a good job of utilizing the GPU's resources with any given shader code, you should get the same performance regardless of which option you choose. NV hardware seems to like the "Avoid Branches" path, and yes, the difference can be massive, like what you saw. Blame the driver!
While the Intel HD 4000 can't handle any form of NNEDI3 (render times are too high), Intel GPUs (or at least Intel GPUs in Optimus systems) seem to like the Vector setting best (70ms vs 190ms for Scalar and 206ms for Branches).

I feel like it would be slightly amusing if AMD turned out to work best with the remaining option (Scalar). Then each of the three vendors' cards would work best with a different shader code path.
__________________
System specs: Sager NP9150 SE with i7-3630QM 2.40GHz, 16 GB RAM, 64-bit Windows 10 Pro, NVidia GTX 680M/Intel 4000 HD optimus dual GPU system. Video viewed on LG notebook screen and LG 3D passive TV.

Last edited by Anime Viewer; 24th April 2015 at 04:49.
Old 24th April 2015, 04:47   #1807  |  Link
ryrynz
Registered User
 
Join Date: Mar 2009
Posts: 3,646
Quote:
Originally Posted by Anime Viewer View Post
At first I thought they all performed pretty similarly, and that it wouldn't make much difference, but then I got to the 128-neuron test and there was a very significant difference. "Avoid Branches" seems to be the better choice for GTX 680M Optimus systems.
Can confirm: on an 850M, Avoid Branches is fastest at 64+ neurons. 32 neurons was a touch faster on Prefer Scalar, and at 16 it's too close to determine a winner.
Old 24th April 2015, 04:54   #1808  |  Link
Zachs
Suptitle, MediaPlayer.NET
 
Join Date: Nov 2001
Posts: 1,721
Quote:
Originally Posted by Anime Viewer View Post
While the Intel HD 4000 can't handle any form of NNEDI3 (render times are too high), Intel GPUs (or at least Intel GPUs in Optimus systems) seem to like the Vector setting best (70ms vs 190ms for Scalar and 206ms for Branches).

I feel like it would be slightly amusing if AMD turned out to work best with the remaining option (Scalar). Then each of the three vendors' cards would work best with a different shader code path.
My Intel HD 4600 runs best with scalar - it even does 320x180 at 256 neurons with only 73% GPU load!

From Scyna's feedback, AMD seems to prefer scalar/vector depending on the neuron setting.

Last edited by Zachs; 24th April 2015 at 04:59.
Old 24th April 2015, 05:10   #1809  |  Link
Anime Viewer
Troubleshooter
 
Join Date: Feb 2014
Posts: 339
Quote:
Originally Posted by Zachs View Post
My Intel HD 4600 runs best with scalar - it even does 320x180 at 256 neurons with only 73% GPU load!
That's interesting. I would have guessed that all 4000-series Intel GPUs would work best with the same option. I just found that there is an Intel GPU driver update available for my card (version 15.33.35.64.4176), so I'm going to download, install, and test it to see if that changes anything (I expect NNEDI3 to still be unusable with the 4000 after the update, but I'm curious to see whether scalar then works better or whether the render times drop at all).
Edit: with the new drivers the scalar render time dropped slightly (from 190ms to 187ms), but vector still seemed to perform better with the Intel 4000. Maybe it's an Optimus thing more than a 4000 thing, but regardless it doesn't matter since NNEDI3 is unusable with the 4000 anyway.
__________________
System specs: Sager NP9150 SE with i7-3630QM 2.40GHz, 16 GB RAM, 64-bit Windows 10 Pro, NVidia GTX 680M/Intel 4000 HD optimus dual GPU system. Video viewed on LG notebook screen and LG 3D passive TV.

Last edited by Anime Viewer; 24th April 2015 at 05:20.
Old 24th April 2015, 05:14   #1810  |  Link
Zachs
Suptitle, MediaPlayer.NET
 
Join Date: Nov 2001
Posts: 1,721
Quote:
Originally Posted by Anime Viewer View Post
That's interesting. I would have guessed that all 4000-series Intel GPUs would work best with the same option. I just found that there is an Intel GPU driver update available for my card (version 15.33.35.64.4176), so I'm going to download, install, and test it to see if that changes anything (I expect NNEDI3 to still be unusable with the 4000 after the update, but I'm curious to see whether scalar then works better or whether the render times drop at all).
Can't remember the exact improvements, but the new driver improved it quite a bit for me.

Anyway, I've just found two other code paths that could make a difference. I'll add them and let you guys have a play.
Old 24th April 2015, 05:50   #1811  |  Link
ryrynz
Registered User
 
Join Date: Mar 2009
Posts: 3,646
Quote:
Originally Posted by Zachs View Post
My Intel HD 4600 runs best with scalar - it even does 320x180 at 256 neurons with only 73% GPU load!

From Scyna's feedback, AMD seems to prefer scalar/vector depending on the neuron setting.
Same with Nvidia. Intel HD Graphics does considerably better with Prefer Vector. What's strange is that 64 neurons with Prefer Vector is twice as fast as 32 neurons with the same setting. Any ideas why? In fact, 64 neurons is faster than 16!

Last edited by ryrynz; 24th April 2015 at 06:00.
Old 24th April 2015, 05:56   #1812  |  Link
Zachs
Suptitle, MediaPlayer.NET
 
Join Date: Nov 2001
Posts: 1,721
Quote:
Originally Posted by ryrynz View Post
Same with Nvidia. The 820M does considerably better with Prefer Vector. What's strange is that 64 neurons with Prefer Vector is twice as fast as 32 neurons with the same setting. Any ideas why?
Nope. Only Nvidia can answer that question I'm afraid!
My only very rough guess is that they do very specific optimizations for games and some conditions must've triggered one. In other cases, they couldn't be bothered.
Old 24th April 2015, 06:08   #1813  |  Link
ryrynz
Registered User
 
Join Date: Mar 2009
Posts: 3,646
Quote:
Originally Posted by Zachs View Post
Nope. Only Nvidia can answer that question I'm afraid!
My only very rough guess is that they do very specific optimizations for games and some conditions must've triggered one. In other cases, they couldn't be bothered.
Thought something was fishy there. It was the HD Graphics, not the 820M, that was running. It's surprising that what works for one GPU doesn't work so well for others, and that the best choice changes with the neuron count. I think the path should perhaps be settable individually per neuron count. Trying to choose it automatically might not work too well.
Old 24th April 2015, 06:14   #1814  |  Link
Zachs
Suptitle, MediaPlayer.NET
 
Join Date: Nov 2001
Posts: 1,721
Quote:
Originally Posted by ryrynz View Post
Thought something was fishy there. It was the HD Graphics, not the 820M, that was running. It's surprising that what works for one GPU doesn't work so well for others, and that the best choice changes with the neuron count. I think the path should perhaps be settable individually per neuron count. Trying to choose it automatically might not work too well.
Yeah, it certainly looks that way now. I might just leave it the way it is so you can select the neuron count and the code path individually.
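
As an illustration of what "select the neuron count and the code path individually" could look like if you keep your own notes, here is a small Python lookup keyed by GPU and neuron count. This is not how MPDN or the nnedi3 script stores its settings; the table layout, the choose_path name and the DEFAULT_PATH fallback are all assumptions of this sketch. The entries are limited to results reported in this thread.

```python
# Hypothetical fallback for combinations nobody has measured yet.
DEFAULT_PATH = "Prefer Scalar"

# (GPU name, neuron count) -> fastest reported path in this thread.
PATH_OVERRIDES = {
    ("GTX 680M", 64): "Avoid Branches",
    ("GTX 680M", 128): "Avoid Branches",
    ("850M", 32): "Prefer Scalar",
    ("850M", 64): "Avoid Branches",
    ("850M", 128): "Avoid Branches",
    ("850M", 256): "Avoid Branches",
}


def choose_path(gpu, neurons):
    """Look up the measured best path, falling back to DEFAULT_PATH."""
    return PATH_OVERRIDES.get((gpu, neurons), DEFAULT_PATH)


print(choose_path("GTX 680M", 128))
print(choose_path("850M", 32))
print(choose_path("Some other GPU", 16))
```

Keying on the neuron count as well as the GPU matters because, as the 850M entries show, the fastest path can change from one neuron setting to the next on the same card.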
Old 24th April 2015, 06:19   #1815  |  Link
Zachs
Suptitle, MediaPlayer.NET
 
Join Date: Nov 2001
Posts: 1,721
OK guys, grab the latest nnedi3 script from GitHub - it adds two more code paths. I've also optimized it further.
Old 24th April 2015, 06:42   #1816  |  Link
ryrynz
Registered User
 
Join Date: Mar 2009
Posts: 3,646
Have the first three been optimized further?
Old 24th April 2015, 06:47   #1817  |  Link
Zachs
Suptitle, MediaPlayer.NET
 
Join Date: Nov 2001
Posts: 1,721
It's a general (minor) optimization that affects all paths.
Old 24th April 2015, 07:10   #1818  |  Link
Zachs
Suptitle, MediaPlayer.NET
 
Join Date: Nov 2001
Posts: 1,721
Quote:
Originally Posted by huhn View Post
but it works fine with EVR/madVR using direct audio with a 90% buffer.

but I wouldn't be shocked if the drivers for this card are kind of "broken".
Asus and Creative don't really care about their drivers.

the AC3 filter is loaded but not used; I tried both 32-bit and 64-bit MPDN.
Can you try lowering MPDN's priority to High to see if it helps? I suspect the problem may be that you've only got 2 CPU cores. If it doesn't help, try lowering it further and let me know which setting works.
Old 24th April 2015, 07:37   #1819  |  Link
Scyna
Registered User
 
Join Date: Jul 2013
Posts: 33
Code:
1) AMD 290
2)A 16/Scalar small code: 88%/530 Avg Render 10.78ms, 32: 78%/575 Avg Render 12.14ms, 64: 87%/638 Avg Render 15.29ms, 128: 100%/717 Avg Render 21.17ms, 256: 100%/905 Avg Render 32.87ms, DWM glitches present 13
2)B 16/Vector small code: 78%/556 Avg Render 10.93ms, 32: 50%/610 Avg Render 12.96ms, 64: 93%/634 Avg Render 16.54ms, 128: 100%/749 Avg Render 23.18ms, 256: 100%/951 Avg Render 36.82ms, DWM glitches present 259
3) 1280x720 24p
4) 1920x1059
5) Max Performance

2)A 16/Scalar small code: 59%/579 Avg Render 10.85ms, 32: 86%/589 Avg Render 12.62ms, 64: 80%/667 Avg Render 15.15ms, 128: 100%/734 Avg Render 21.41ms, 256: 100%/918 Avg Render 33.06ms, DWM glitches present 6
2)B 16/Vector small code: 61%/564 Avg Render 11.30ms, 32: 82%/596 Avg Render 13.02ms, 64: 81%/649 Avg Render 16.22ms, 128: 100%/743 Avg Render 23.48ms, 256: 100%/951 Avg Render 37.07ms, DWM glitches present 251
5) Image Quality
Old 24th April 2015, 08:24   #1820  |  Link
Zachs
Suptitle, MediaPlayer.NET
 
Join Date: Nov 2001
Posts: 1,721
Looks like the small code versions did help AMD at 16 neurons!
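
To put numbers on that, the Python sketch below compares the 16-neuron average render times from Scyna's two reports (posts #1802 and #1819). The figures are copied from those posts; the percent_change helper and the dictionary layout are just conveniences of this sketch. The differences are a few tenths of a millisecond, so run-to-run variation may account for part of them.

```python
# (before_ms, after_ms) average render times at 16 neurons:
# "before" is the regular path (post #1802), "after" is the small code
# variant of the same path (post #1819).
runs = {
    ("Max Performance", "Scalar"): (10.50, 10.78),
    ("Max Performance", "Vector"): (11.11, 10.93),
    ("Image Quality", "Scalar"): (11.00, 10.85),
    ("Image Quality", "Vector"): (11.48, 11.30),
}


def percent_change(before, after):
    """Relative change in percent; negative means the render got faster."""
    return (after - before) / before * 100.0


for (quality, path), (before, after) in runs.items():
    change = percent_change(before, after)
    print(f"{quality:15s} {path:6s} {before:.2f} -> {after:.2f} ms ({change:+.1f}%)")
```

The same comparison could be repeated for the other neuron counts and for the DWM glitch totals, which are also listed in both reports.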
Tags
direct3d, mpdn, nnedi3, opencl, reclock
