7th December 2016, 14:51 | #981 | Link | |
Registered User
Join Date: Nov 2014
Posts: 440
|
Quote:
If you have problems, go back to v0.7.7. Using AviSynth MT is useless. AviSynth+, on the other hand, is supported but not required.
__________________
github.com |
|
7th December 2016, 15:08 | #982 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,806
|
I wonder why ocl_y = 8 appears to be a "magic number"? (at least for AMD/Intel)
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper |
7th December 2016, 16:48 | #983 | Link | |
Registered User
Join Date: Nov 2014
Posts: 440
|
Quote:
Should be the alignment of the memory.
__________________
github.com |
|
7th December 2016, 17:50 | #984 | Link |
Registered User
Join Date: Nov 2014
Posts: 440
|
Intel i5-4690
Code:
// (ocl_x * ocl_y) = 2048
ocl_x = 512, ocl_y = 4, ocl_r = 1, 1.798 FPS
ocl_x = 256, ocl_y = 8, ocl_r = 1, 1.761 FPS
ocl_x = 128, ocl_y = 16, ocl_r = 1, 1.805 FPS
ocl_x = 64, ocl_y = 32, ocl_r = 1, 1.738 FPS

// (ocl_x * ocl_y) = 1024
ocl_x = 256, ocl_y = 4, ocl_r = 2, 2.358 FPS
ocl_x = 128, ocl_y = 8, ocl_r = 2, 2.290 FPS
ocl_x = 64, ocl_y = 16, ocl_r = 2, 2.259 FPS
ocl_x = 32, ocl_y = 32, ocl_r = 2, 2.225 FPS

// (ocl_x * ocl_y) = 1024
ocl_x = 256, ocl_y = 4, ocl_r = 3, 2.555 FPS
ocl_x = 128, ocl_y = 8, ocl_r = 3, 2.672 FPS
ocl_x = 64, ocl_y = 16, ocl_r = 3, 2.573 FPS
ocl_x = 32, ocl_y = 32, ocl_r = 3, 2.485 FPS

// (ocl_x * ocl_y) = 1024
ocl_x = 256, ocl_y = 4, ocl_r = 4, 2.796 FPS
ocl_x = 128, ocl_y = 8, ocl_r = 4, 2.792 FPS
ocl_x = 64, ocl_y = 16, ocl_r = 4, 2.705 FPS
ocl_x = 32, ocl_y = 32, ocl_r = 4, 2.641 FPS
__________________
github.com Last edited by Khanattila; 7th December 2016 at 18:51. |
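The benchmark in the post above holds the product ocl_x * ocl_y fixed (2048 or 1024) and varies only how it is split between the two dimensions. As a rough illustration of how such a candidate list can be generated, here is a minimal Python sketch. The function name `work_group_shapes` and the `min_side` parameter are hypothetical and are not part of KNLMeansCL; the sketch also assumes both sides are powers of two, as in the posted runs.

```python
def work_group_shapes(total, min_side=4):
    """Return every (ocl_x, ocl_y) pair of powers of two whose product is `total`.

    `total` and `min_side` are assumed to be powers of two; neither side of a
    returned pair is smaller than `min_side`.
    """
    if total <= 0 or total & (total - 1):
        raise ValueError("total must be a positive power of two")
    shapes = []
    ocl_y = min_side
    # Double ocl_y each step; ocl_x shrinks so the product stays constant.
    while ocl_y <= total // min_side:
        shapes.append((total // ocl_y, ocl_y))
        ocl_y *= 2
    return shapes


if __name__ == "__main__":
    for ocl_x, ocl_y in work_group_shapes(1024):
        print(f"ocl_x = {ocl_x}, ocl_y = {ocl_y}")
```

For a total of 1024 this yields seven shapes, from 256 x 4 down to 4 x 256; the posted i5-4690 runs cover the first four of them.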
7th December 2016, 22:19 | #985 | Link |
Registered User
Join Date: Oct 2014
Posts: 268
|
My speed seems to be fastest with ocl_y = 8 as well.
But maxing out my work-group size means ocl_x = 128, ocl_y = 8 (which is the fastest of all the combinations with ocl_x * ocl_y = 1024). However, any other combination with ocl_y = 8 is faster still, except ocl_x = 4. So my fastest ended up being ocl_x = 16, ocl_y = 8. Then, increasing ocl_r pushes performance up some more. It tops out eventually, but the result is completely different from the combinations posted before. Will post all my benches in a few minutes.

----------------------------------

Core i7 860 @ 3.36 GHz, GTX 1060 6GB

First off, 0.7.6 / 0.7.7 / 1.0b1. AviSynth x64, d = 2, a = 2:
0.7.6 = 14.52
0.7.7 = 14.45
1.0b1 = 12.95
Sort of the same speed decrease as the 0.8 test/alpha thing.

----------------------------------

The 'benchmark only' thing, 32-bit (took me a while to figure out the DLL wasn't x64 :P), d = 1, a = 1.

Nvidia max work size = 1024, so I started with this:
Code:
ocl_x = 256 ocl_y = 4 ocl_r = 1 123.6
ocl_x = 128 ocl_y = 8 ocl_r = 1 126.2
ocl_x = 64 ocl_y = 16 ocl_r = 1 119.4
ocl_x = 32 ocl_y = 32 ocl_r = 1 121.1
ocl_x = 16 ocl_y = 64 ocl_r = 1 106.7
ocl_x = 8 ocl_y = 128 ocl_r = 1 102.7
ocl_x = 4 ocl_y = 256 ocl_r = 1 97.25

Then, just out of curiosity, I tried the other ocl_x values but kept ocl_y at 8:
Code:
ocl_x = 4 ocl_y = 8 ocl_r = 1 91.12
ocl_x = 8 ocl_y = 8 ocl_r = 1 140.5
ocl_x = 16 ocl_y = 8 ocl_r = 1 141.3
ocl_x = 32 ocl_y = 8 ocl_r = 1 134.3
ocl_x = 64 ocl_y = 8 ocl_r = 1 134.5
ocl_x = 128 ocl_y = 8 ocl_r = 1 126.2

Code:
ocl_x = 16 ocl_y = 8 ocl_r = 1 141.5
ocl_x = 16 ocl_y = 8 ocl_r = 2 162.5
ocl_x = 16 ocl_y = 8 ocl_r = 3 169.7
ocl_x = 16 ocl_y = 8 ocl_r = 4 171.4
ocl_x = 16 ocl_y = 8 ocl_r = 5 173.5
ocl_x = 16 ocl_y = 8 ocl_r = 6 172.7
ocl_x = 16 ocl_y = 8 ocl_r = 7 173.4
ocl_x = 16 ocl_y = 8 ocl_r = 8 172.1
ocl_x = 16 ocl_y = 8 ocl_r = 9 172.7
ocl_x = 16 ocl_y = 8 ocl_r = 10 170.9

I wanted to see the effect of ocl_r when using the complete work size my OpenCL info reports. So I tested some more with ocl_x = 128, ocl_y = 8:
Code:
ocl_x = 128 ocl_y = 8 ocl_r = 1 126.2
ocl_x = 128 ocl_y = 8 ocl_r = 2 151.2
ocl_x = 128 ocl_y = 8 ocl_r = 3 160.9
ocl_x = 128 ocl_y = 8 ocl_r = 4 165.5
ocl_x = 128 ocl_y = 8 ocl_r = 5 168.4
ocl_x = 128 ocl_y = 8 ocl_r = 6 167.2
ocl_x = 128 ocl_y = 8 ocl_r = 7 167.6
ocl_x = 128 ocl_y = 8 ocl_r = 8 168.8
ocl_x = 128 ocl_y = 8 ocl_r = 9 168.6

I see the same speed increase over the first few ocl_r steps, and again from ocl_r = 5 it seems to reach a good 'max', leaving the rest in the 'pretty much the same' ballpark. The speed comes very close to ocl_x = 16, ocl_y = 8, ocl_r = 5, but never quite matches it.

Are you looking to find an x/y/r sweet spot for all cards, or to pick different parameters for different generations and brands of cards, or based on max memory, or whatever?

For what it's worth, not specifying the parameters gives me 166.2 fps.

Last edited by dipje; 7th December 2016 at 23:03. |
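The two ocl_r sweeps in the post above can be summarised with a few lines of Python. This is only a sketch for reading the posted numbers: the FPS values are copied from the post, while the helper names (`best_setting`, `plateau_start`, `gain_percent`) and the 1% tolerance are assumptions made for illustration.

```python
# FPS figures copied from the two ocl_r sweeps in the post (GTX 1060, d = 1, a = 1).
# Keys are ocl_r values, values are frames per second.
SWEEP_16x8 = {1: 141.5, 2: 162.5, 3: 169.7, 4: 171.4, 5: 173.5,
              6: 172.7, 7: 173.4, 8: 172.1, 9: 172.7, 10: 170.9}
SWEEP_128x8 = {1: 126.2, 2: 151.2, 3: 160.9, 4: 165.5, 5: 168.4,
               6: 167.2, 7: 167.6, 8: 168.8, 9: 168.6}


def best_setting(sweep):
    """Return (ocl_r, fps) for the fastest run in the sweep."""
    return max(sweep.items(), key=lambda item: item[1])


def plateau_start(sweep, tolerance=0.01):
    """Return the smallest ocl_r whose FPS is within `tolerance` of the peak."""
    peak = max(sweep.values())
    for ocl_r in sorted(sweep):
        if sweep[ocl_r] >= peak * (1 - tolerance):
            return ocl_r


def gain_percent(sweep):
    """Return the peak FPS gain over the smallest ocl_r in the sweep, in percent."""
    baseline = sweep[min(sweep)]
    return (max(sweep.values()) / baseline - 1) * 100


if __name__ == "__main__":
    for name, sweep in (("16x8", SWEEP_16x8), ("128x8", SWEEP_128x8)):
        ocl_r, fps = best_setting(sweep)
        print(f"{name}: best ocl_r = {ocl_r} ({fps} fps), "
              f"plateau from ocl_r = {plateau_start(sweep)}, "
              f"gain = {gain_percent(sweep):.1f}%")
```

With a 1% tolerance both sweeps reach their plateau at ocl_r = 5, which agrees with the observation in the post; the differences beyond that point are small enough that they may just be run-to-run noise.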
7th December 2016, 23:08 | #986 | Link |
Registered User
Join Date: Oct 2014
Posts: 268
|
Btw, Khanattila, when you say 'Intel i5-4690', do you mean the GPU on that i5? Because if I try CPU only (device_type = "CPU") with the benchmark-only version, it says "no compatible opencl platforms available".
|
7th December 2016, 23:53 | #987 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,806
|
Install the Intel OpenCL runtime and drivers.
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper |
8th December 2016, 00:54 | #988 | Link |
Registered User
Join Date: Dec 2011
Posts: 95
|
He might have the iGPU disabled in his BIOS. It's usually disabled by default on custom motherboards. You have to jump through hoops to enable it alongside a normal GPU on Windows 7, but you can have them both enabled fine on Windows 10.
You can use Quick Sync that way. |
8th December 2016, 12:03 | #990 | Link |
Registered User
Join Date: Nov 2014
Posts: 440
|
The CPU; otherwise I would have written HD Graphics 4600 or Haswell (GT2).
__________________
github.com |
8th December 2016, 12:21 | #991 | Link |
Registered User
Join Date: Nov 2014
Posts: 440
|
@ dipje
Thanks for the information; it explains why the new version is slower. ocl_x, ocl_y and ocl_r will be set internally by the program. I can read some information from the device, such as local memory (L2 cache) and local work size.
__________________
github.com |
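How the program will derive ocl_x, ocl_y and ocl_r from the device information is not stated in the thread. Purely as an illustration of one way a device limit narrows the search, the sketch below lists the shapes that fit under a maximum work-group size while holding ocl_y fixed. The function name `shapes_within_limit` and its parameters are hypothetical; the limit is passed in as a plain number (in a real OpenCL program it would come from querying `CL_DEVICE_MAX_WORK_GROUP_SIZE`), and local memory is not modelled at all.

```python
def shapes_within_limit(max_work_group_size, ocl_y=8, min_x=4):
    """List (ocl_x, ocl_y) shapes with ocl_x * ocl_y <= max_work_group_size.

    ocl_x starts at `min_x` and doubles each step. This is an illustrative
    filter only, not the selection logic used by KNLMeansCL.
    """
    shapes = []
    ocl_x = min_x
    while ocl_x * ocl_y <= max_work_group_size:
        shapes.append((ocl_x, ocl_y))
        ocl_x *= 2
    return shapes


if __name__ == "__main__":
    # 1024 is the Nvidia maximum work size reported earlier in the thread.
    for ocl_x, ocl_y in shapes_within_limit(1024):
        print(f"ocl_x = {ocl_x}, ocl_y = {ocl_y}")
```

With a limit of 1024 and ocl_y = 8 this gives ocl_x = 4 through 128, the same six shapes dipje benchmarked; choosing among them would still need measurements like the ones posted.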
8th December 2016, 20:52 | #992 | Link | |
Registered User
Join Date: Oct 2014
Posts: 268
|
Quote:
Or am I confusing my i7-860 with my i7-2670QM, and is it another case of AVX support or something like that? |
|
9th December 2016, 16:56 | #994 | Link | |
Registered User
Join Date: Nov 2014
Posts: 440
|
Quote:
"auto" is "Y" for YUV colour space and "RGB" for RGB color space.
__________________
github.com |
|
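The "auto" rule described in the post above amounts to a small lookup. A minimal Python sketch follows, assuming a hypothetical helper `resolve_channels` and a `color_family` string of either "YUV" or "RGB"; neither name comes from the plugin itself.

```python
def resolve_channels(channels, color_family):
    """Map channels="auto" to a concrete value based on the clip's color family.

    Any value other than "auto" is returned unchanged.
    """
    if channels != "auto":
        return channels
    if color_family == "YUV":
        return "Y"
    if color_family == "RGB":
        return "RGB"
    raise ValueError(f"unsupported color family: {color_family!r}")


if __name__ == "__main__":
    print(resolve_channels("auto", "YUV"))
    print(resolve_channels("auto", "RGB"))
```

In other words, a YUV clip left on "auto" is filtered on luma only, so chroma is processed only if channels is set explicitly.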
10th December 2016, 07:29 | #995 | Link |
Registered User
Join Date: Dec 2011
Posts: 95
|
Fiddling around with it in StaxRip, I found a few things.
With default settings and with d=0 (Temp on or off made no difference), there was significant blurring compared to the older 0.7.7 version. See here: http://screenshotcomparison.com/comp...3278/picture:0

It's two sets: the first is 1.0b1 vs 0.7.7, and the second is 1.0b1 vs OFF. Keep in mind this is after the image was upsampled and sharpened a good bit (awarpsharp2 and LSFMod), so the blurring is probably a bit worse before that. (I can't take a picture without LSFMod because I'd have to redo how havfunc is loaded to make NNedi3 work.)

Also, with channels = "YUV", although StaxRip doesn't complain, the preview window only shows this: http://imgur.com/mmjCT8e

EDIT: Tried setting all the basic values (a, s and h) to 1, but it had no effect on reducing the blur. Did d as well to play with it. Last edited by ShogoXT; 10th December 2016 at 20:22. |
12th December 2016, 11:34 | #996 | Link |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,542
|
I have tested SMDegrain with the newer KNLMeansCL on a lot of movies over the last few days.
The final result is, on average, 3-7% less noise reduction, measured very roughly from output bitrate and frame-by-frame visual comparison. Any idea why? What has changed in the newer version that could have so much impact? Plus: with the stable version I can use 8 threads, with the beta only 7.
__________________
@turment on Telegram Last edited by tormento; 12th December 2016 at 11:55. |
12th December 2016, 15:42 | #997 | Link | |
Registered User
Join Date: Jan 2016
Posts: 162
|
Quote:
|
|
12th December 2016, 17:37 | #999 | Link |
Registered User
Join Date: Nov 2014
Posts: 440
|
v1.0.0-beta.2 will be well documented.
__________________
github.com |
12th December 2016, 17:39 | #1000 | Link | |
Registered User
Join Date: Jan 2016
Posts: 162
|
Quote:
Update: oh the next version will be documented, that's great! Last edited by WolframRhodium; 12th December 2016 at 17:42. |
|