Old 28th March 2017, 15:46   #1  |  Link
Sparktank
47.952fps@71.928Hz
 
Join Date: Mar 2011
Posts: 940
Newer FFTW DLLs for Windows?

Edit 3: Currently active links
Quote:
Originally Posted by HolyWu View Post

Quote:
Originally Posted by filler56789 View Post





EDIT2:
Quote:
Originally Posted by HolyWu View Post
(removed dead link; see above)


EDIT:
Quote:
Originally Posted by GMJCZP View Post
libfftw3f-3.dll 3.3.7 (rename it to FFTW if you wish) is inside the DFTTest VapourSynth version:

Here
Quote:
Originally Posted by Overdrive80 View Post
From official mirror: ftp://ftp.fftw.org/pub/fftw/

Old:
I notice on the FFTW site that the current Windows builds are for 3.3.5.
http://fftw.org/install/windows.html

Since then, there have been a few updates for the 'stable' 3.3.6 version:
http://fftw.org/release-notes.html
Quote:
FFTW 3.3.6-pl2
Mar 25th, 2017

Bugfix: MPI Fortran-03 headers were missing in FFTW 3.3.6-pl1.

FFTW 3.3.6-pl1 (withdrawn)
Jan 16th, 2017

Bugfix: FFTW 3.3.6 had the wrong libtool version number, and generated shared libraries of the form libfftw3.so.2.6.6 instead of libfftw3.so.3.*.

FFTW 3.3.6 (withdrawn)
Jan 15th, 2017

The fftw_make_planner_thread_safe() API introduced in 3.3.5 didn't work, and this 3.3.6 fixes it. Sorry about that.
Compilation fixes for IBM XLC.
Compilation fixes for threads on Windows.
fix SIMD autodetection on amd64 when (_MSC_VER > 1500)

media-autobuild_suite doesn't include building fftw.
And the last time I tried to compile something myself, I ended up spending 2 days trying to get everything together, and I don't even remember whether I finished or not.

EDIT: Can someone please compile and upload?
The official site might update once they reach 3.3.7.
I'm pretty sure the version I have currently on my system is one of the deprecated versions.
__________________
Win10 (x64) build 19041
NVIDIA GeForce GTX 1060 3GB (GP106) 3071MB/GDDR5 | (r435_95-4)
NTSC | DVD: R1 | BD: A
AMD Ryzen 5 2600 @3.4GHz (6c/12th, I'm on AVX2 now!)

Last edited by Sparktank; 23rd April 2020 at 22:54.
Old 21st December 2017, 21:40   #2  |  Link
GMJCZP
Registered User
 
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 744
libfftw3f-3.dll 3.3.7 (rename it to FFTW if you wish) is inside the DFTTest VapourSynth version:

Here
__________________
By law and justice!

GMJCZP's Arsenal
Old 21st December 2017, 22:31   #3  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
FWIW, here is FFTW 3.3.7 for Win32 (SSE2 build):

fftw-3.3.7-win32.zip
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 21st December 2017 at 22:34.
Old 22nd December 2017, 22:16   #4  |  Link
Overdrive80
Anime addict
 
Join Date: Feb 2009
Location: Spain
Posts: 673
From official mirror: ftp://ftp.fftw.org/pub/fftw/
__________________
Intel i7-6700K + Noctua NH-D15 + Z170A XPower G. Titanium + Kingston HyperX Savage DDR4 2x8GB + Radeon RX580 8GB DDR5 + ADATA SX8200 Pro 1 TB + Antec EDG750 80 Plus Gold Mod + Corsair 780T Graphite
Old 23rd December 2017, 01:09   #5  |  Link
GMJCZP
Registered User
 
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 744
Direct link (32 and 64 bit versions):

Here
__________________
By law and justice!

GMJCZP's Arsenal
Old 23rd December 2017, 23:45   #6  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
I built libfftw3f-3.dll with gcc 4.9.3 (SSE2, x86) and was surprised to see it perform better than the ICC version that comes with dfttest.

Tested on i5 2500K @4GHz.
Script for testing:
Code:
colorbars(width = 1280, height = 720, pixel_type = "yv12").killaudio().assumefps(50, 1).trim(0, 49)
RemoveNoise()

function RemoveNoise(clip video, int "threshold")
{
  last = video
  sc = MSuper(hpad = 16, vpad = 16)
  backward_vector = MAnalyse(sc, isb =  true, delta = 1, blksize = 16, overlap = 4, truemotion = false, sadx264 = 4, dct = 1)
  forward_vector =  MAnalyse(sc, isb = false, delta = 1, blksize = 16, overlap = 4, truemotion = false, sadx264 = 4, dct = 1)
  MDegrain1(sc, backward_vector, forward_vector, thSAD = 300)
  return last
}
ICC build taken from DFTTest:
Code:
Frames processed:               50 (0 - 49)
FPS (min | max | average):      0.742 | 1.529 | 0.760
Memory usage (phys | virt):     51 | 47 MiB
Thread count:                   9
CPU usage (average):            25%
gcc 4.9.3 build:
Code:
Frames processed:               50 (0 - 49)
FPS (min | max | average):      0.899 | 1.806 | 0.921 (+21%)
Memory usage (phys | virt):     49 | 45 MiB
Thread count:                   9
CPU usage (average):            25%
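The "+21%" above is the relative gain in average FPS over the ICC build. A minimal sketch of that arithmetic, using only the two averages reported in the results (the function name `relative_gain` is just an illustrative choice):

```python
def relative_gain(baseline_fps: float, new_fps: float) -> float:
    """Return the percentage change of new_fps relative to baseline_fps."""
    return (new_fps / baseline_fps - 1.0) * 100.0


icc_avg = 0.760  # average FPS, ICC build taken from DFTTest
gcc_avg = 0.921  # average FPS, gcc 4.9.3 build

# Formats the gain with an explicit sign and no decimals.
print(f"{relative_gain(icc_avg, gcc_avg):+.0f}%")  # prints "+21%"
```

Note that this compares averages of a single 50-frame run each, so small differences between builds may be within run-to-run noise.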
__________________
Groucho's Avisynth Stuff
Old 23rd December 2017, 23:48   #7  |  Link
GMJCZP
Registered User
 
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 744
Thank you, brother Marx! Better in speed and memory usage, superb!

Edit: unless I missed something, which part of the script calls dfttest or fft3dfilter?
__________________
By law and justice!

GMJCZP's Arsenal

Last edited by GMJCZP; 23rd December 2017 at 23:52.
Old 24th December 2017, 00:03   #8  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by GMJCZP View Post
Thank you, brother Marx! Better in speed and memory usage, superb!

Edit: unless I missed something, which part of the script calls dfttest or fft3dfilter?
The test was with MDegrain where MAnalyse uses FFTW when dct = 1.

However, with dfttest the gcc build is quite a bit slower. So, not recommended for dfttest. With fft3dfilter it's about the same speed.
__________________
Groucho's Avisynth Stuff
Old 2nd January 2018, 13:37   #9  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by HolyWu View Post
Interesting. I had compiled FFTW with GCC 7.2.0 as well, but only tested it with FFTW's benchmark program (benchf.exe) and DFTTest, since I don't use DCT mode in MVTools. After investigating, I found that MVTools dislikes ICC's O3 optimization for an unknown reason; changing to O2 gives better performance. I also discovered that FFTW by default only generates efficient codelets of size 8 for DCT/IDCT transforms. I specifically generated codelets for the typical sizes 4, 16, 32 and 64, so it is now at least 50% faster than before when blksize is one of them. DFTTest and FFT3DFilter are unaffected since they use real DFT transforms. The 7z file on GitHub is updated.
Nice, the new fftw DLL is almost twice as fast with the script I posted above.
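For readers wondering what the DCT codelets mentioned in the quote compute: the sketch below is a naive 1-D, unnormalised DCT-II (the transform FFTW calls REDFT10), written in Python purely for illustration. It assumes that this is the transform applied to each block when dct = 1; it is not FFTW's or MVTools' code, and `dct2_naive` is a made-up name. A codelet is, roughly, a hard-coded and optimised routine for one fixed transform size, which is why having one for the block size in use matters.

```python
import math


def dct2_naive(x):
    """Unnormalised DCT-II: Y[k] = 2 * sum_j x[j] * cos(pi * (j + 0.5) * k / n)."""
    n = len(x)
    return [
        2.0 * sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
        for k in range(n)
    ]


# A flat row of 8 samples has only a DC component.
flat = [1.0] * 8
coeffs = dct2_naive(flat)
print(round(coeffs[0], 6))                      # 16.0 (DC term: 2 * 8 * 1.0)
print(all(abs(c) < 1e-9 for c in coeffs[1:]))   # True (all AC terms vanish)
```

This direct form costs O(n^2) operations per row; the point of generated codelets is to replace it with a much cheaper fixed-size routine.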
__________________
Groucho's Avisynth Stuff
Old 2nd May 2018, 09:31   #10  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,542
Quote:
Originally Posted by GMJCZP View Post
Direct link (32 and 64 bit versions)
The link has expired. Could someone please post a link to a working Windows x64 build of 3.3.7?
__________________
@turment on Telegram
Old 2nd May 2018, 19:31   #11  |  Link
Midzuki
Unavailable
 
Join Date: Mar 2009
Location: offline
Posts: 1,480
Quote:
Originally Posted by tormento View Post
The link has expired. Could someone please post a link to a working Windows x64 build of 3.3.7?
https://forum.videohelp.com/threads/...Ls-for-Windows
Old 21st October 2018, 13:35   #12  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
@Wolfberry... Nice, but if you don't mind, could you build the SSE2 with v141_XP and AVX/AVX2 with the normal v141?

Thank you in advance.
Old 22nd October 2018, 09:13   #13  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 1,795
THX for the builds.
O2 is always faster on my Ryzen 1700 (by about 0.5 fps), and for some reason the AVX build is the slowest. Tested with dfttest.
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth
VapourSynth Portable FATPACK || VapourSynth Database
Old 25th October 2018, 22:45   #14  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
I tested it, it works fine on WinXP, but I had to include LIBGCC_S_DW2-1.DLL myself in system32 as it was missing.

Thanks. ^_^
Old 5th November 2018, 09:12   #15  |  Link
ErazorTT
Registered User
 
Join Date: Mar 2003
Location: Germany
Posts: 215
@Wolfberry:
I tested the performance of your x64 builds with dfttest on a mobile i5 Haswell. The one named "simd128+256" was around 10% faster than the official 3.3.5 build; all the others were 5% slower.
However, using fft3dfilter with "simd128+256" produces an access violation. As far as I know, the Haswell instruction set should be complete up to 256-bit AVX. Has anyone managed to get fft3dfilter working with this build, and if so, on what system?
Old 8th November 2018, 11:24   #16  |  Link
Wolfberry
Helenium(Easter)
 
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 99
@ErazorTT
Confirmed.
It seems that fft3dfilter dislikes generic (gcc) 256-bit SIMD optimizations.
Updated original post.
__________________
Monochrome Anomaly
Old 8th November 2018, 11:39   #17  |  Link
ErazorTT
Registered User
 
Join Date: Mar 2003
Location: Germany
Posts: 215
Great, I will test the performance with your new builds when I’m back home.

Quote:
Originally Posted by Wolfberry View Post
It seems that fft3dfilter dislikes generic (gcc) 256-bit SIMD optimizations.
To me this sounds like some kind of alignment error.
Can you show the differences between the flags you use for your new builds and those of the old 128+256 build?
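To illustrate what an alignment error would mean here (this is only the hypothesis raised above, not a confirmed diagnosis): aligned 128-bit loads and stores need addresses that are multiples of 16 bytes, while aligned 256-bit ones need multiples of 32 bytes. A small sketch with made-up addresses, where `is_aligned` is a hypothetical helper:

```python
def is_aligned(address: int, boundary: int) -> bool:
    """True if address is a multiple of boundary (boundary must be a power of two)."""
    return (address & (boundary - 1)) == 0


# Hypothetical buffer addresses, chosen only for illustration.
addr_a = 0x1000  # multiple of 32
addr_b = 0x1010  # multiple of 16, but not of 32

for addr in (addr_a, addr_b):
    print(hex(addr), "16-byte:", is_aligned(addr, 16), "32-byte:", is_aligned(addr, 32))
```

A buffer like `addr_b` would be fine for aligned 128-bit accesses but could fault on aligned 256-bit ones, which would be consistent with only the 256-bit build crashing.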
Old 9th November 2018, 09:05   #18  |  Link
ErazorTT
Registered User
 
Join Date: Mar 2003
Location: Germany
Posts: 215
OK, so all builds apart from simd256 work with fft3dfilter. However, the old simd128+256 build appears to have been a wee bit faster with dfttest than all the new builds.

Out of curiosity: what do you actually mean by SIMD 128 or 256 as opposed to SSE2/AVX/AVX2? After all, SSE2 is 128-bit SIMD, and AVX/AVX2 add 256-bit SIMD instructions on top of SSE2.

Last edited by ErazorTT; 9th November 2018 at 09:13.
Old 9th November 2018, 13:08   #19  |  Link
Wolfberry
Helenium(Easter)
 
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 99
Code:
  --enable-sse2             enable SSE/SSE2 optimizations
  --enable-avx              enable AVX optimizations
  --enable-avx2             enable AVX2 optimizations
  --enable-avx512           enable AVX512 optimizations
  --enable-avx-128-fma      enable AVX128/FMA optimizations
  --enable-kcvi             enable Knights Corner vector instructions optimizations
  --enable-altivec          enable Altivec optimizations
  --enable-vsx              enable IBM VSX optimizations
  --enable-neon             enable ARM NEON optimizations
  --enable-generic-simd128  enable generic (gcc) 128-bit SIMD optimizations
  --enable-generic-simd256  enable generic (gcc) 256-bit SIMD optimizations
Above are some of the flags that you can use during configuration.
The SIMD builds also have SSE2/AVX/AVX2 enabled, but I am not sure whether it is worth it.
AFAIK, generic-simd128/256 is some kind of generic AVX(2); I am not sure how generic they are.

The fftw release note says:
Quote:
enabling them all at the same time is a bad idea, because it increases the planning time for minimal gain
And the more code paths you enable, the fatter the DLLs will be.
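As a rough sketch of how the flags listed above combine, the snippet below assembles one configure command line per build variant. The variant names and flag groupings are assumptions for illustration only, not the settings actually used for the builds in this thread; `--enable-float` and `--enable-shared` are FFTW configure options for building a single-precision shared library.

```python
import shlex

# Flags common to every variant: single precision (libfftw3f), shared library.
BASE_FLAGS = ["--enable-float", "--enable-shared"]

# Hypothetical variants; these groupings are illustrative assumptions.
VARIANTS = {
    "sse2": ["--enable-sse2"],
    "avx": ["--enable-sse2", "--enable-avx"],
    "avx2": ["--enable-sse2", "--enable-avx", "--enable-avx2"],
    "simd128": ["--enable-generic-simd128"],
}


def configure_command(variant: str) -> str:
    """Return a shell-quoted ./configure command line for the given variant."""
    return shlex.join(["./configure", *BASE_FLAGS, *VARIANTS[variant]])


for name in VARIANTS:
    print(f"{name}: {configure_command(name)}")
```

Keeping each variant to a small set of code paths avoids the longer planning time and larger DLLs that come with enabling everything at once.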
Quote:
Originally Posted by HolyWu View Post
I specifically generated codelets for the typical sizes 4, 16, 32 and 64, so it is now at least 50% faster than before when blksize is one of them. DFTTest and FFT3DFilter are unaffected since they use real DFT transforms.
The future builds will have these codelets generated as well.
__________________
Monochrome Anomaly

Last edited by Wolfberry; 9th November 2018 at 13:12.
Old 9th November 2018, 18:46   #20  |  Link
ErazorTT
Registered User
 
Join Date: Mar 2003
Location: Germany
Posts: 215
So the generic options are based on the compiler vectorization and optimization.

Have you tried increasing the alignment using --with-incoming-stack-boundary, as suggested here: https://forum.doom9.org/showthread.p...80#post1857180
Tags
fftw, fftw3.dll
