10th November 2018, 01:10 | #21 | Link | ||
Helenium(Easter)
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 99
|
Quote:
Quote:
Code:
configure:15780: checking whether C compiler accepts -mincoming-stack-boundary=2
configure:15795: gcc -c -mincoming-stack-boundary=2 conftest.c >&5
cc1.exe: error: -mincoming-stack-boundary=2 is not between 3 and 12
__________________
Monochrome Anomaly |
||
28th January 2019, 11:32 | #23 | Link |
Registered User
Join Date: Dec 2006
Location: Germany
Posts: 91
|
Hi,
currently I am using AviSynth 2.6.0 MT (SEt) (x86) and almost always FFT3DGPU 0.8.2 (with my GeForce GTX 660 Ti); my CPU is an Intel Core i5-3470S (Features: MMX, SSE, SSE-2, SSE-3, SSSE-3, SSE4.1, AVX, DEP, VMX, SMX, SMEP, EM64T, EIST, TM1, TM2, Turbo, AES-NI, RDRAND). I want to save energy, so I am going to switch to AviSynth+ r1576 (x64) and to FFT3DFilter 2.5 (x64); the latter needs FFTW.
At the moment I have fftw3.dll 3.3.4 (x86) (original FFTW build). I tested it against the original 3.3.5 build, but 3.3.4 is faster (tested with my old environment, AviSynth 2.6.0 x86 MT).
The 3.3.7 build (x86, SSE) from Lord_Mulder sadly does not work in my environment. Another 3.3.7 build linked here (from GMJCZP) is apparently an AVX2 build. The next link (from GMJCZP) to File Dropper is offline. Another one (from Groucho2004) is offline too. For the link to another 3.3.7 version (from Midzuki), I don't know which build it is. And the last link (from Wolfberry), to a 3.3.8 build, I cannot find (and it is AVX2, which my CPU does not have, and I don't know what "codelets" means either).
So ... I would be very happy if someone could compile a 3.3.8 version (with SSE2 and AVX; I think that's all my CPU is able to do, or rather all that FFTW can utilize on my CPU?) in x86 and x64! That would be great :-) I tried to install MinGW, but could not figure out how to build it on my own (so I uninstalled it).
Last edited by almosely; 28th January 2019 at 12:07. |
28th January 2019, 12:49 | #24 | Link | |
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
|
Quote:
What makes you think that you'll save energy switching to the CPU variant of that filter? Did you measure power consumption and, more importantly, task energy (power consumption * time)? Why do you want to use the very old AVS+ r1576? It does not even support multi-threading. The recent builds of AVS+ can be found here.
__________________
Groucho's Avisynth Stuff |
|
28th January 2019, 13:10 | #25 | Link | |
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
|
Quote:
Pinterf is apparently considering opening his own avs+ thread, where this prob should no longer exist.
__________________
I sometimes post sober. StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace "Some infinities are bigger than other infinities", but how many of them are infinitely bigger ??? |
|
28th January 2019, 13:19 | #26 | Link |
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
|
Sounds good.
__________________
Groucho's Avisynth Stuff |
28th January 2019, 13:29 | #27 | Link |
Registered User
Join Date: Dec 2006
Location: Germany
Posts: 91
|
Oh, thanks for the fast replies and the hint for the newer versions of AVS+! Of course I am going with the newer ones then :-)
I did not measure the consumption with a device, but I know the TDPs of my CPU and GPU and watch the consumption in HWInfo. The GTX uses 15% of 150 W when idle, another 15% when just playing a video through its engine and another 10% for FFT3DGPU. That is 60 W in total for the GTX 660 Ti, plus 31 W from the CPU (6.5 W idle plus 24.5 W while encoding).
I switched to QuickSync as video engine and used FFT3dFilter (same parameters as FFT3DGPU, block size 32 and bt=3), encoded a short clip, and got 24 fps at a power consumption of 56 W for both CPU (encoding) and GPU (idle). The same clip using the GTX video engine and FFT3DGPU (utilizing the GTX) got 29 fps and consumed 91 W. So the pure CPU encoding takes 20% longer, which scales its 56 W to the equivalent of 67 W for the same clip, against 91 W when I use my GPU in addition. Sad enough that the GTX uses 24 W when idle, grrr.
Last edited by almosely; 28th January 2019 at 13:50. |
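The comparison in the post above boils down to task energy, i.e. average power multiplied by encoding time. A minimal Python sketch of that arithmetic follows, using only the power and fps figures quoted in the post. The clip length `FRAMES` and both function names are hypothetical and chosen for illustration; the clip length cancels out of the relative saving.

```python
# Power and fps figures are taken from the post; FRAMES is a hypothetical clip length.
FRAMES = 100_000


def task_energy_wh(power_w: float, fps: float, frames: int = FRAMES) -> float:
    """Task energy in watt-hours: average power times encoding time."""
    hours = frames / fps / 3600.0
    return power_w * hours


def equivalent_power_w(power_w: float, fps: float, reference_fps: float) -> float:
    """Scale a run's power by how much longer it takes than the reference run."""
    return power_w * reference_fps / fps


gpu_power_w, gpu_fps = 91.0, 29.0  # CUVID + FFT3DGPU
cpu_power_w, cpu_fps = 56.0, 24.0  # QuickSync + FFT3dFilter

cpu_equivalent_w = equivalent_power_w(cpu_power_w, cpu_fps, gpu_fps)
saving = 1.0 - task_energy_wh(cpu_power_w, cpu_fps) / task_energy_wh(gpu_power_w, gpu_fps)

print(f"CPU-only run, scaled to the GPU run's duration: {cpu_equivalent_w:.1f} W")
print(f"Task energy saved by the CPU-only run: {saving:.1%}")
```

With these inputs the scaled CPU figure comes out a little under 68 W, in line with the roughly 67 W stated in the post. The numbers are HWInfo readings and TDP estimates, not wall-socket measurements, so the result is only as good as those inputs.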
28th January 2019, 13:38 | #28 | Link |
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
|
Quality-wise, the CPU version is probably the better bet; methinks that GPU FFT3DFilter is now quite old and works in (I think) less precise fixed-point calculations,
and the CPU version has been updated many times since the GPU version. (Leastwise, that's what I seem to remember; maybe I'm wrong on that.)
__________________
I sometimes post sober. StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace "Some infinities are bigger than other infinities", but how many of them are infinitely bigger ??? |
28th January 2019, 13:42 | #29 | Link |
Registered User
Join Date: Dec 2006
Location: Germany
Posts: 91
|
I noticed a big difference at first sight when comparing fft3dfilter against fft3dgpu in AvsPmod (histogram "luma" activated). But so far I have not compared them in depth - I just checked that everything works so I could test the consumption. Does AvsPmod still work with AVS+? Or is there a better tool than AvsPmod by now? I use it every time before I start an encode.
-edit- Btw, my computer runs approx. 12 hours a day, almost non-stop, as a NAS and workstation. So the idle consumption is there anyway. The pure consumption for the encodes is therefore 31 W (31 W CPU + 0 W GPU) against 61 W (25 W CPU + 36 W GPU) (task energy) - 50% less/more, that's a huge difference. Last edited by almosely; 28th January 2019 at 14:02. |
28th January 2019, 13:54 | #30 | Link | |
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
|
Quote:
__________________
Groucho's Avisynth Stuff |
|
28th January 2019, 14:24 | #31 | Link |
Registered User
Join Date: Dec 2006
Location: Germany
Posts: 91
|
Thanks for the hint; I already downloaded them an hour ago (these are the ones from Midzuki), but have not tested them yet. If this 3.3.7 version works, I still don't know whether it is optimised for SSE2 and AVX - probably not. For now, it would be okay. But I think with SSE2 and AVX my encodes would finish earlier, which saves more energy - especially when the computer is running for encoding only (sometimes at night) and shuts down afterwards. And of course, sometimes faster would just be nice. Also, maybe I could use the sigma2, sigma3 and sigma4 parameters of fft3dfilter, which the fft3dGPU version does not provide - sometimes the noise/grain is very different, so that could help a lot, while keeping the same encoding speed as with a non-SSE2/AVX build. Maybe someone will compile a new one, I hope so.
Last edited by almosely; 28th January 2019 at 14:49. |
28th January 2019, 14:32 | #32 | Link |
Helenium(Easter)
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 99
|
I can compile 3.3.8 with ICC 19.0 in a few days. Once it is done, you may do some tests to see if it is any faster.
__________________
Monochrome Anomaly |
28th January 2019, 14:40 | #33 | Link | |
Registered User
Join Date: Dec 2006
Location: Germany
Posts: 91
|
3.3.8 with SSE2 and AVX (not AVX2) in x86 and x64, please - that would be great! :-) (x86 for safety and to test against AVS 2.6.0). In a few days is fine - I also need some time to set up AVS+ and read up on it (and have some other urgent jobs to do). Thank you in advance! :-) I will do some different tests, of course.
- edit, because I forgot to mention this in my first post -
Quote:
HWInfo shows watts for the "CPU Package" but only percent for the "Total GPU Power" of the GTX. The default block size of FFT3DGPU is 32x32; for FFT3dFilter it is 48x48. When I tested FFT3DGPU vs FFT3dFilter, 48x48 was significantly slower than 32x32, and the same goes for bt=3 vs bt=5; so I compared the two filters with block size 32x32 and bt=3 (even though FFT3dFilter would probably give better quality with its bt=5, which FFT3DGPU is not able to do; even bt=4 is buggy there).
Oh, and getting QuickSync to work while the GTX 660 Ti is still in use was very tricky - so stupid and annoying of Microsoft, grrr! After many setbacks and various discoveries (e.g. that the HD Graphics is absolutely not capable of OpenCL, even though Intel says so), the final solution is now to use the GTX as the first initialized GPU (in the BIOS), let the IGPU (HD Graphics 2500) work in "Multi-Monitor" mode (in the BIOS) and let Windows 7 "think" that there is another display device, working in "extended screen" mode, connected as "VGA" to the HD Graphics (even though no device is actually connected), and to put that "display" at the top left corner of my real display, so that I can still use window-edge snapping and don't slide off the screen with the mouse and windows at any display border.
That way I can use QuickSync and CUVID (and OpenCL and any other feature of the GTX) in parallel, at the same time. So far that setup is working well - I hope installing and playing games won't be a problem. But it would be such a waste of potential and energy not to use QuickSync when it is available (1 W with QS vs 25 W with CUVID for pure and "simple" video decoding is a joke/effrontery!).
Last edited by almosely; 28th January 2019 at 23:23. |
|
29th January 2019, 15:10 | #34 | Link |
Helenium(Easter)
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 99
|
FFTW 3.3.8 built with ICC 19.0 now in my signature.
Have fun
__________________
Monochrome Anomaly |
29th January 2019, 15:30 | #35 | Link |
Registered User
Join Date: Dec 2006
Location: Germany
Posts: 91
|
Coooooool :-) Thank you very much!!!
Just downloaded every generated version. I will test later, today or in 1-2 days, and report my results. Question: when building FFTW, can only a single SIMD instruction set be chosen, or can several be combined, e.g. SSE2+AVX? ;-) |
29th January 2019, 15:39 | #36 | Link |
Registered User
Join Date: Dec 2005
Location: Germany
Posts: 1,795
|
thx, I get 2-4 fps more now on a Ryzen xD
Will add it to my vs portable fatpack.
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth VapourSynth Portable FATPACK || VapourSynth Database |
29th January 2019, 15:52 | #37 | Link | |
Registered User
Join Date: Dec 2006
Location: Germany
Posts: 91
|
Quote:
|
|
29th January 2019, 18:46 | #38 | Link | |
Registered User
Join Date: Dec 2006
Location: Germany
Posts: 91
|
Quote:
Yeah! :-) So no other build with SSE2 + AVX is needed. I am going to test a little bit with AviSynth 2.6.0 for now. |
|
29th January 2019, 23:26 | #39 | Link |
Registered User
Join Date: Dec 2006
Location: Germany
Posts: 91
|
So, I did some tests with AviSynth 2.6.0 (x86), this time with another clip (the fastest results came with multithreading=off, ncpu=1 (fft3dfilter) and no RequestLinear(); everything else was slower; blk-size 32, bt=3).
FFTW 3.3.8 AVX is 4% faster than FFTW 3.3.4 (I guess it's an AVX build too, given the file size of 2259 KB), at least with this setup :-) Thank you! And I am very curious about comparing that with AVS+ (x64)!

26.88 fps CUVID
26.54 fps CUVID + FFT3DGPU 0.8.2 (sigma 1.0)
26.41 fps CUVID + FFT3DGPU 0.8.2 (sigma 1.0, sharpen 0.16) (GPU Power 59 W)
26.14 fps CUVID + FFT3DGPU 0.8.4 (sigma 1.0, sharpen 0.16) (GPU Power 59 W)
26.90 fps QuickSync

FFTW 3.3.4 (AVX)
------------------
22.10 fps QuickSync + FFT3dFilter 2.5 (sigma 1.0)
21.81 fps QuickSync + FFT3dFilter 2.5 (sigma 1.0, sharpen 0.16)

FFTW 3.3.8 SSE2
-----------------
22.84 fps QuickSync + FFT3dFilter 2.5 (sigma 1.0)
22.31 fps QuickSync + FFT3dFilter 2.5 (sigma 1.0, sharpen 0.16)

FFTW 3.3.8 AVX
----------------
22.99 fps QuickSync + FFT3dFilter 2.5 (sigma 1.0)
22.39 fps QuickSync + FFT3dFilter 2.5 (sigma 1.0, sharpen 0.16)

Speed Comparison
------------------
100.00 % 120.09 % 115.44 % 26.54 fps CUVID + FFT3DGPU 0.8.2 (sigma 1.0)
083.27 % 100.00 % 096.13 % 22.10 fps QuickSync + FFT3dFilter 2.5 (sigma 1.0) FFTW 3.3.4 (AVX)
086.06 % 103.35 % 099.35 % 22.84 fps QuickSync + FFT3dFilter 2.5 (sigma 1.0) FFTW 3.3.8 SSE2
086.62 % 104.03 % 100.00 % 22.99 fps QuickSync + FFT3dFilter 2.5 (sigma 1.0) FFTW 3.3.8 AVX

Consumption Comparison: CPU+GPU total (workload only)
------------------------------------------------------
100 % 90 Watt (100 % 60 Watt) 60 min 26.54 fps CUVID + FFT3DGPU 0.8.2 (sigma 1.0)
-26 % 67 Watt (-48 % 31 Watt) 72 min 22.10 fps QuickSync + FFT3dFilter 2.5 (sigma 1.0) FFTW 3.3.4 (AVX)
-29 % 64 Watt (-50 % 30 Watt) 69 min 22.99 fps QuickSync + FFT3dFilter 2.5 (sigma 1.0) FFTW 3.3.8 AVX

- edit - I did some tests with AviSynth+ 0.1.0 r2772 MT (x64) with the same clip and script, and this time with the x64 filter plugins, of course - compared with the earlier QuickSync results of AVS 2.6.0 ...

098.96 % 26.62 fps QuickSync

FFTW 3.3.8 AVX
----------------
099.91 % 22.97 fps QuickSync + FFT3dFilter 2.5 (sigma 1.0)
101.25 % 22.67 fps QuickSync + FFT3dFilter 2.5 (sigma 1.0, sharpen 0.16)

... so, a little bit slower, sadly, except the last one. Maybe with MT activated it will be better. BUT: I can't use FFT3dFilter - it destroys the luma :-( Regarding this, I posted in the corresponding thread: https://forum.doom9.org/showpost.php...0&postcount=53
- edit - Tested QuickSync + FFT3dFilter 2.5 (sigma 1.0, sharpen 0.16) in MT mode: every trial was slower (2, 4, 6 threads, with and without RequestLinear; 19-21 fps)
Last edited by almosely; 1st February 2019 at 17:21. |
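For anyone who wants to redo the speed comparison with their own measurements: each percentage in the post above appears to be one configuration's fps divided by a baseline's fps. Below is a minimal Python sketch under that assumption, using the fps values from the post. The function name `relative_speed` and the choice of the three baseline columns (CUVID + FFT3DGPU, FFTW 3.3.4, FFTW 3.3.8 AVX) are my reading of the table, not something the poster stated.

```python
# fps values are taken from the post; labels are shortened for readability.
results = [
    ("CUVID + FFT3DGPU 0.8.2", 26.54),
    ("QuickSync + FFT3dFilter 2.5, FFTW 3.3.4 (AVX)", 22.10),
    ("QuickSync + FFT3dFilter 2.5, FFTW 3.3.8 SSE2", 22.84),
    ("QuickSync + FFT3dFilter 2.5, FFTW 3.3.8 AVX", 22.99),
]

# Assumed baselines for the three percentage columns, in column order.
baselines = [26.54, 22.10, 22.99]


def relative_speed(fps: float, baseline_fps: float) -> float:
    """Speed of a configuration as a percentage of the baseline's speed."""
    return 100.0 * fps / baseline_fps


# One row per configuration, one column per baseline, rounded to two decimals.
table = [
    [round(relative_speed(fps, base), 2) for base in baselines]
    for _, fps in results
]

for (label, fps), row in zip(results, table):
    cells = " ".join(f"{value:6.2f} %" for value in row)
    print(f"{cells}  {fps:.2f} fps  {label}")
```

Since fps differences of 1 % or less are within normal run-to-run variation, repeating each encode a few times and averaging the fps before feeding it in would make such a table more trustworthy.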
30th January 2019, 16:55 | #40 | Link |
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
|
Multi-threading should speed this up. Post your complete script.
__________________
Groucho's Avisynth Stuff |
Tags |
fftw, fftw3.dll |