Old 7th December 2016, 14:51   #981  |  Link
Khanattila
Registered User
 
 
Join Date: Nov 2014
Posts: 440
Quote:
Originally Posted by ShogoXT
A new version of the AMD performance drivers is coming in a day or so, with new features including HEVC encoding support on AMD AMF 1.4. Since I have the RX 480, which versions do you want me to test and compare?

Also, should I be using AviSynth+ or MT instead?

v1.0.0-beta.1 is stable enough if you want to try it.
If you have problems, go back to v0.7.7.

Using AviSynth MT is useless. AviSynth+ is supported but not required.
__________________
github.com
Old 7th December 2016, 15:08   #982  |  Link
Atak_Snajpera
RipBot264 author
 
 
Join Date: May 2006
Location: Poland
Posts: 7,806
I wonder why ocl_y = 8 appears to be a "magic number"? (At least for AMD/Intel.)
Old 7th December 2016, 16:48   #983  |  Link
Khanattila
Registered User
 
 
Join Date: Nov 2014
Posts: 440
Quote:
Originally Posted by Atak_Snajpera
I wonder why ocl_y = 8 appears to be a "magic number"? (At least for AMD/Intel.)

Global memory accesses by the threads can be coalesced into one transaction.

It should be down to the alignment of the memory.
Old 7th December 2016, 17:50   #984  |  Link
Khanattila
Registered User
 
 
Join Date: Nov 2014
Posts: 440
Intel i5-4690

Code:
// (ocl_x * ocl_y) = 2048
ocl_x = 512, ocl_y = 4, ocl_r = 1, 1.798 FPS
ocl_x = 256, ocl_y = 8, ocl_r = 1, 1.761 FPS
ocl_x = 128, ocl_y = 16, ocl_r = 1, 1.805 FPS
ocl_x = 64, ocl_y = 32, ocl_r = 1, 1.738 FPS

// (ocl_x * ocl_y) = 1024
ocl_x = 256, ocl_y = 4, ocl_r = 2, 2.358 FPS
ocl_x = 128, ocl_y = 8, ocl_r = 2, 2.290 FPS
ocl_x = 64, ocl_y = 16, ocl_r = 2, 2.259 FPS
ocl_x = 32, ocl_y = 32, ocl_r = 2, 2.225 FPS

// (ocl_x * ocl_y) = 1024
ocl_x = 256, ocl_y = 4, ocl_r = 3, 2.555 FPS
ocl_x = 128, ocl_y = 8, ocl_r = 3, 2.672 FPS
ocl_x = 64, ocl_y = 16, ocl_r = 3, 2.573 FPS
ocl_x = 32, ocl_y = 32, ocl_r = 3, 2.485 FPS

// (ocl_x * ocl_y) = 1024
ocl_x = 256, ocl_y = 4, ocl_r = 4, 2.796 FPS
ocl_x = 128, ocl_y = 8, ocl_r = 4, 2.792 FPS
ocl_x = 64, ocl_y = 16, ocl_r = 4, 2.705 FPS
ocl_x = 32, ocl_y = 32, ocl_r = 4, 2.641 FPS

Last edited by Khanattila; 7th December 2016 at 18:51.
Old 7th December 2016, 22:19   #985  |  Link
dipje
Registered User
 
Join Date: Oct 2014
Posts: 268
My speed seems to be fastest with ocl_y = 8 as well.
Maxing out my work-group size means ocl_x = 128, ocl_y = 8 (which is the fastest of all the combinations with ocl_x * ocl_y = 1024).

But every combination with ocl_y = 8 and a smaller ocl_x is faster still, except ocl_x = 4. So my fastest ended up being ocl_x = 16, ocl_y = 8.

Then, increasing ocl_r boosts performance some more. It tops out eventually, but the result is completely different from the combinations posted before.

I will post all my benchmarks in a few minutes.

----------------------------------
Core i7 860 @ 3.36 GHz, GTX 1060 6GB

First off, 0.7.6 / 0.7.7 / 1.0b1 on AviSynth x64, d = 2, a = 2:
0.7.6 = 14.52
0.7.7 = 14.45
1.0b1 = 12.95

That is roughly the same speed decrease as with the 0.8 test/alpha build.

----------------------------------

The 'benchmark only' build, 32-bit (it took me a while to figure out the DLL wasn't x64 :P), d = 1, a = 1

Nvidia max work-group size = 1024, so I started with this:
Code:
ocl_x = 256		ocl_y = 4			ocl_r = 1		123.6
ocl_x = 128		ocl_y = 8			ocl_r = 1		126.2
ocl_x = 64		ocl_y = 16			ocl_r = 1		119.4
ocl_x = 32		ocl_y = 32			ocl_r = 1		121.1
ocl_x = 16		ocl_y = 64			ocl_r = 1		106.7
ocl_x = 8		ocl_y = 128			ocl_r = 1		102.7
ocl_x = 4		ocl_y = 256			ocl_r = 1		97.25
ocl_y = 8 is supreme again apparently.
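
The sweep above can be generated mechanically rather than typed by hand. A minimal Python sketch, assuming only power-of-two values are of interest and that 4 is the smallest dimension worth trying (as in the table); the name `work_group_shapes` is made up for illustration and is not part of KNLMeansCL:

```python
def work_group_shapes(max_size, smallest=4):
    """List (ocl_x, ocl_y) power-of-two pairs whose product equals max_size."""
    if max_size < 1 or max_size & (max_size - 1):
        raise ValueError("max_size must be a power of two")
    shapes = []
    ocl_y = smallest
    # Stop once ocl_x would drop below the smallest allowed dimension.
    while ocl_y * smallest <= max_size:
        shapes.append((max_size // ocl_y, ocl_y))
        ocl_y *= 2
    return shapes


for ocl_x, ocl_y in work_group_shapes(1024):
    print(f"ocl_x = {ocl_x}, ocl_y = {ocl_y}")
```

For a maximum of 1024 this yields the same seven pairs as the table, from 256 x 4 down to 4 x 256.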

Then, just out of curiosity, I tried the other ocl_x values but kept ocl_y at 8:
Code:
ocl_x = 4		ocl_y = 8			ocl_r = 1		91.12
ocl_x = 8		ocl_y = 8			ocl_r = 1		140.5
ocl_x = 16		ocl_y = 8			ocl_r = 1		141.3
ocl_x = 32		ocl_y = 8			ocl_r = 1		134.3
ocl_x = 64		ocl_y = 8			ocl_r = 1		134.5
ocl_x = 128		ocl_y = 8			ocl_r = 1		126.2
Most of them are much faster! So I started testing ocl_r with the x = 16, y = 8 combination:
Code:
ocl_x = 16		ocl_y = 8			ocl_r = 1		141.5
ocl_x = 16		ocl_y = 8			ocl_r = 2		162.5
ocl_x = 16		ocl_y = 8			ocl_r = 3		169.7
ocl_x = 16		ocl_y = 8			ocl_r = 4		171.4
ocl_x = 16		ocl_y = 8			ocl_r = 5		173.5
ocl_x = 16		ocl_y = 8			ocl_r = 6		172.7
ocl_x = 16		ocl_y = 8			ocl_r = 7		173.4
ocl_x = 16		ocl_y = 8			ocl_r = 8		172.1
ocl_x = 16		ocl_y = 8			ocl_r = 9		172.7
ocl_x = 16		ocl_y = 8			ocl_r = 10		170.9
There is a huge speed increase with the first few ocl_r increases; from around ocl_r = 5 and up it seems to be 'generally the same', probably within the margin of benchmark error / variation.

I wanted to see the effect of ocl_r when using the complete work-group size my OpenCL info reports, so I tested some more with ocl_x = 128, ocl_y = 8:
Code:
ocl_x = 128		ocl_y = 8			ocl_r = 1		126.2
ocl_x = 128		ocl_y = 8			ocl_r = 2		151.2
ocl_x = 128		ocl_y = 8			ocl_r = 3		160.9
ocl_x = 128		ocl_y = 8			ocl_r = 4		165.5
ocl_x = 128		ocl_y = 8			ocl_r = 5		168.4
ocl_x = 128		ocl_y = 8			ocl_r = 6		167.2
ocl_x = 128		ocl_y = 8			ocl_r = 7		167.6
ocl_x = 128		ocl_y = 8			ocl_r = 8		168.8
ocl_x = 128		ocl_y = 8			ocl_r = 9		168.6
ocl_r = 10 gave an error; it is probably asking for too much, or running out of memory, or something.
I see the same speed increase in the first few ocl_r increases, and again from ocl_r = 5 it seems to reach a good 'max', leaving the rest in the 'pretty much the same' ballpark.
The speed comes very close to ocl_x = 16, ocl_y = 8, ocl_r = 5, but never quite matches it.
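
The "tops out around ocl_r = 5" observation can be checked against the figures rather than by eye. A rough Python sketch, assuming that "pretty much the same" means within 1% of the best result; that tolerance is an arbitrary choice, not something measured in this thread, and `plateau_start` is a made-up helper name:

```python
def plateau_start(fps_by_r, tolerance=0.01):
    """Return the smallest ocl_r whose FPS is within `tolerance` of the best."""
    best = max(fps_by_r.values())
    threshold = best * (1.0 - tolerance)
    return min(r for r, fps in fps_by_r.items() if fps >= threshold)


# Figures copied from the two ocl_r tables above.
x16_y8 = {1: 141.5, 2: 162.5, 3: 169.7, 4: 171.4, 5: 173.5,
          6: 172.7, 7: 173.4, 8: 172.1, 9: 172.7, 10: 170.9}
x128_y8 = {1: 126.2, 2: 151.2, 3: 160.9, 4: 165.5, 5: 168.4,
           6: 167.2, 7: 167.6, 8: 168.8, 9: 168.6}

print(plateau_start(x16_y8))   # 5
print(plateau_start(x128_y8))  # 5
```

With a 1% tolerance both work-group shapes reach their plateau at ocl_r = 5; a looser tolerance would move that point earlier.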

Are you looking to find an x/y/r sweet spot for all cards, or to pick different parameters depending on the generation and brand of the card, its maximum memory, or whatever?

For what it's worth, not specifying the parameters gives me 166.2 fps.

Last edited by dipje; 7th December 2016 at 23:03.
Old 7th December 2016, 23:08   #986  |  Link
dipje
Registered User
 
Join Date: Oct 2014
Posts: 268
Btw, Khanattila, when you say 'Intel i5-4690', do you mean the GPU on that i5? Because if I try CPU only (device_type = "CPU") with the benchmark-only version, it says "no compatible opencl platforms available".
Old 7th December 2016, 23:53   #987  |  Link
Atak_Snajpera
RipBot264 author
 
 
Join Date: May 2006
Location: Poland
Posts: 7,806
Install the Intel OpenCL runtime and drivers.
Old 8th December 2016, 00:54   #988  |  Link
ShogoXT
Registered User
 
Join Date: Dec 2011
Posts: 95
He might have the iGPU disabled in his BIOS. It's usually disabled by default on custom motherboards. You have to jump through hoops to enable it alongside a normal GPU on Windows 7, but you can have both enabled fine on Windows 10.

You can use Quick Sync that way.
Old 8th December 2016, 08:49   #989  |  Link
dipje
Registered User
 
Join Date: Oct 2014
Posts: 268
I don't have an iGPU. But the CPU device type works with the previous versions. Maybe it's an x86 / x64 thing.

Anyway, I was asking whether, when he said 'Core i5', he meant the CPU or the iGPU.
Old 8th December 2016, 12:03   #990  |  Link
Khanattila
Registered User
 
 
Join Date: Nov 2014
Posts: 440
Quote:
Originally Posted by dipje
I don't have an iGPU. But the CPU device type works with the previous versions. Maybe it's an x86 / x64 thing.

Anyway, I was asking whether, when he said 'Core i5', he meant the CPU or the iGPU.

CPU; otherwise I would have written HD Graphics 4600 or Haswell (GT2).
Old 8th December 2016, 12:21   #991  |  Link
Khanattila
Registered User
 
 
Join Date: Nov 2014
Posts: 440
@dipje

Thanks for the information; it explains why the new version is slower.

ocl_x, ocl_y and ocl_r will be set internally by the program. I can read some information from the device, such as local memory (L2 cache) and local work size.
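
The thread does not show how the plugin reads these values, so the following is only a sketch of the kind of query involved, written in Python with the pyopencl package rather than the plugin's own code. The helper names are made up; the stand-in device lets the sketch run without an OpenCL driver, and `describe_all_devices()` is the part that would talk to real hardware:

```python
from types import SimpleNamespace


def describe_device(device):
    """Collect the limits a launch-size heuristic would most likely look at."""
    return {
        "name": device.name,
        # Corresponds to CL_DEVICE_MAX_WORK_GROUP_SIZE.
        "max_work_group_size": device.max_work_group_size,
        # Corresponds to CL_DEVICE_LOCAL_MEM_SIZE (in bytes).
        "local_mem_bytes": device.local_mem_size,
    }


def describe_all_devices():
    """Query every OpenCL device on the machine (needs the pyopencl package)."""
    import pyopencl as cl  # imported lazily so the rest runs without it
    return [describe_device(dev)
            for platform in cl.get_platforms()
            for dev in platform.get_devices()]


# Stand-in object: the name and local memory figure are made up for
# illustration; 1024 is the work-group limit reported earlier in the thread.
fake = SimpleNamespace(name="example device", max_work_group_size=1024,
                       local_mem_size=49152)
print(describe_device(fake))
```

Listing the devices this way is also a quick check of whether an OpenCL CPU device is visible at all.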
Old 8th December 2016, 20:52   #992  |  Link
dipje
Registered User
 
Join Date: Oct 2014
Posts: 268
Quote:
Originally Posted by Khanattila
CPU; otherwise I would have written HD Graphics 4600 or Haswell (GT2).

So is there any reason why the 'benchmark only' build will not run in CPU mode on my i7-860 when previous versions worked fine?

Or am I mixing up my i7-860 and my i7-2670QM, and is it another case of AVX support or something like that?
Old 9th December 2016, 10:03   #993  |  Link
ShogoXT
Registered User
 
Join Date: Dec 2011
Posts: 95
I'm trying to use 1.0.0b1 in regular use in StaxRip to see if it can help with some rainbow and dot crawl issues.
How do you use the new channels setting? The wiki only mentions cmode.
Old 9th December 2016, 16:56   #994  |  Link
Khanattila
Registered User
 
 
Join Date: Nov 2014
Posts: 440
Quote:
Originally Posted by ShogoXT
I'm trying to use 1.0.0b1 in regular use in StaxRip to see if it can help with some rainbow and dot crawl issues.
How do you use the new channels setting? The wiki only mentions cmode.

channels = {"YUV", "Y", "UV", "RGB", "auto"}.

"auto" is "Y" for a YUV colour space and "RGB" for an RGB colour space.
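
The "auto" rule for the channels setting can be written out explicitly. A small Python sketch of the mapping as described in the reply above; `resolve_channels` is a hypothetical helper for illustration, not a function of the plugin, and it does not check which values are valid for which colour space because the thread does not say:

```python
VALID_CHANNELS = ("YUV", "Y", "UV", "RGB", "auto")


def resolve_channels(channels, colour_space):
    """Map "auto" to the concrete value described above; pass others through."""
    if channels not in VALID_CHANNELS:
        raise ValueError(f"unknown channels value: {channels!r}")
    if channels != "auto":
        return channels
    if colour_space == "YUV":
        return "Y"
    if colour_space == "RGB":
        return "RGB"
    raise ValueError(f"unknown colour space: {colour_space!r}")


print(resolve_channels("auto", "YUV"))  # Y
print(resolve_channels("auto", "RGB"))  # RGB
print(resolve_channels("UV", "YUV"))    # UV
```

So leaving channels at "auto" on a YUV clip processes luma only; chroma has to be requested with "UV" or "YUV".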
Old 10th December 2016, 07:29   #995  |  Link
ShogoXT
Registered User
 
Join Date: Dec 2011
Posts: 95
Fiddling around with it in StaxRip, I found a few things.

With default settings and with d = 0 (temporal on or off made no difference), there was significant blurring compared to the older 0.7.7 version.

See here:
http://screenshotcomparison.com/comp...3278/picture:0
It's two sets: the first is 1.0b1 vs 0.7.7, and the second is 1.0b1 vs OFF.

Keep in mind this is after it was upsampled and sharpened (aWarpSharp2 and LSFMod) a good bit, so the blurring is probably a bit worse before that. (I can't take a picture without LSFMod because I'd have to redo how havfunc is loaded to make NNedi3 work.)

Also, with channels = "YUV", although StaxRip doesn't complain, the preview window only shows this:
http://imgur.com/mmjCT8e

EDIT: I tried setting all the basic values to 1 (a, s and h; I did d as well to play with it), but it had no effect on reducing the blur.

Last edited by ShogoXT; 10th December 2016 at 20:22.
Old 12th December 2016, 11:34   #996  |  Link
tormento
Acid fr0g
 
 
Join Date: May 2002
Location: Italy
Posts: 2,542
I have tested SMDegrain with the newer KNLMeansCL on a lot of movies over the last few days.

The final result is on average 3-7% less noise reduction, measured very roughly from output bitrate and frame-by-frame visual comparison.

Any idea why? What has changed in the newer version that could have so much impact?

Plus: with the stable version I can use 8 threads, with the beta only 7.
__________________
@turment on Telegram

Last edited by tormento; 12th December 2016 at 11:55.
Old 12th December 2016, 15:42   #997  |  Link
WolframRhodium
Registered User
 
Join Date: Jan 2016
Posts: 162
Quote:
Originally Posted by tormento
I have tested SMDegrain with the newer KNLMeansCL on a lot of movies over the last few days.

The final result is on average 3-7% less noise reduction, measured very roughly from output bitrate and frame-by-frame visual comparison.

Any idea why? What has changed in the newer version that could have so much impact?

Plus: with the stable version I can use 8 threads, with the beta only 7.

Maybe because "wmode" has changed? For v1.0.0-beta.1, you should set it to 0 (the Welsch weighting function), which v0.7.7 uses by default.
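
For reference, the Welsch function in its textbook form gives a weight that decays exponentially with the squared distance, so dissimilar patches still get a small positive weight. A minimal Python sketch of that generic form; how KNLMeansCL measures the distance and scales h is not stated in this thread, so `distance` and `h` here are illustrative inputs only:

```python
import math


def welsch_weight(distance, h):
    """Welsch weight exp(-(distance / h)^2); h sets how quickly it decays."""
    if h <= 0:
        raise ValueError("h must be positive")
    return math.exp(-(distance / h) ** 2)


for d in (0.0, 0.5, 1.0, 2.0):
    print(f"distance = {d:.1f}  weight = {welsch_weight(d, h=1.0):.4f}")
```

A different weighting function with the same h changes how much each patch contributes, which would be one plausible reason for a different amount of noise reduction at otherwise identical settings.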
Old 12th December 2016, 16:20   #998  |  Link
tormento
Acid fr0g
 
 
Join Date: May 2002
Location: Italy
Posts: 2,542
Quote:
Originally Posted by WolframRhodium
Maybe because "wmode" has changed? For v1.0.0-beta.1, you should set it to 0 (the Welsch weighting function), which v0.7.7 uses by default.

Uh... are there any docs for the newer version? I found nothing in the release zip file, only a changelog on GitHub.
Old 12th December 2016, 17:37   #999  |  Link
Khanattila
Registered User
 
 
Join Date: Nov 2014
Posts: 440
Quote:
Originally Posted by tormento
Uh... are there any docs for the newer version? I found nothing in the release zip file, only a changelog on GitHub.

v1.0.0-beta.2 will be well documented.
Old 12th December 2016, 17:39   #1000  |  Link
WolframRhodium
Registered User
 
Join Date: Jan 2016
Posts: 162
Quote:
Originally Posted by tormento
Uh... are there any docs for the newer version? I found nothing in the release zip file, only a changelog on GitHub.

I'm also looking for official documentation for the new version...

Update: oh, the next version will be documented, that's great!

Last edited by WolframRhodium; 12th December 2016 at 17:42.