Go Back   Doom9's Forum > Announcements and Chat > General Discussion

Old 28th March 2017, 15:46   #1  |  Link
Sparktank
47.952fps@71.928Hz
 
Join Date: Mar 2011
Posts: 856
Newer FFTW DLLs for Windows?

I notice on the FFTW site that the current Windows builds are for 3.3.5.

http://fftw.org/install/windows.html

Since then, there have been a few updates for the 'stable' 3.3.6 version:

http://fftw.org/release-notes.html
Quote:
FFTW 3.3.6-pl2
Mar 25th, 2017

Bugfix: MPI Fortran-03 headers were missing in FFTW 3.3.6-pl1.

FFTW 3.3.6-pl1 (withdrawn)
Jan 16th, 2017

Bugfix: FFTW 3.3.6 had the wrong libtool version number, and generated shared libraries of the form libfftw3.so.2.6.6 instead of libfftw3.so.3.*.

FFTW 3.3.6 (withdrawn)
Jan 15th, 2017

The fftw_make_planner_thread_safe() API introduced in 3.3.5 didn't work, and this 3.3.6 fixes it. Sorry about that.
Compilation fixes for IBM XLC.
Compilation fixes for threads on Windows.
fix SIMD autodetection on amd64 when (_MSC_VER > 1500)
media-autobuild_suite doesn't include building FFTW,
and the last time I tried to compile something myself, I ended up spending 2 days trying to get everything together, and I don't even remember whether I finished or not.

EDIT: Can someone please compile and upload?
The official site might update once they reach 3.3.7.
I'm pretty sure the version I have currently on my system is one of the deprecated versions.
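
One way to check which FFTW version an installed DLL actually contains is to read the version string the library carries. The sketch below is only an illustration under assumptions, not something taken from the FFTW site: it assumes the DLL exports the `fftwf_version` character array declared in fftw3.h (single-precision builds; a double-precision build would use `fftw_version`), that the string looks roughly like `fftw-3.3.7-sse2-avx`, and that the Python interpreter has the same bitness as the DLL. The helper names and the path in the usage comment are hypothetical.

```python
import ctypes
import re


def read_fftw_version(dll_path, symbol="fftwf_version"):
    """Return the version string stored in an FFTW DLL.

    Assumption: the DLL exports the version character array under `symbol`.
    If it does not, ctypes raises ValueError; a bitness mismatch raises OSError.
    """
    lib = ctypes.CDLL(dll_path)
    # The symbol is a char array, not a pointer: take the address of its
    # first element and read up to the terminating NUL byte.
    first_char = ctypes.c_char.in_dll(lib, symbol)
    return ctypes.string_at(ctypes.addressof(first_char)).decode("ascii")


def parse_version(version_string):
    """Extract (major, minor, patch) from a string such as 'fftw-3.3.7-sse2-avx'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"no version number in {version_string!r}")
    return tuple(int(part) for part in match.groups())


# Hypothetical usage on Windows:
# print(parse_version(read_fftw_version(r"C:\path\to\libfftw3f-3.dll")))
```

Comparing the resulting tuple against `(3, 3, 6)` would then tell whether the installed build predates the fixes listed in the release notes quoted above.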
__________________
Win10 (x64) build 17134 | GPU Caps Viewer v1.39.0.0
NVIDIA GeForce GT 640 (GK107) 2047MB/DDR3 | (R398.11)
NTSC | DVD: R1 | BD: A
Old 21st December 2017, 21:40   #2  |  Link
GMJCZP
Registered User
 
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 514
libfftw3f-3.dll 3.3.7 (rename it to FFTW if you wish) is inside the VapourSynth version of DFTTest:

Here
__________________
By law and justice!

Flea Market
Old 21st December 2017, 22:31   #3  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 12,928
FWIW, here is FFTW 3.3.7 for Win32 (SSE2 build):

fftw-3.3.7-win32.zip
__________________
There was of course no way of knowing whether you were being watched at any given moment.
How often, or on what system, the Thought Police plugged in on any individual wire was guesswork.



Last edited by LoRd_MuldeR; 21st December 2017 at 22:34.
Old 22nd December 2017, 22:16   #4  |  Link
Overdrive80
Anime addict
 
Join Date: Feb 2009
Location: Spain
Posts: 625
From official mirror: ftp://ftp.fftw.org/pub/fftw/
__________________
Intel i7-6700K + Noctua NH-D15 + Z170A XPower G. Titanium + Kingston HyperX Savage DDR4 2x8GB + Nvidia GTX750 2GB DDR5 + SSD Vertex 4 256 GB + Antec EDG750 80 Plus Gold Mod + Corsair 780T Graphite
Old 23rd December 2017, 01:09   #5  |  Link
GMJCZP
Registered User
 
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 514
Direct link (32 and 64 bit versions):

Here
Old 23rd December 2017, 23:45   #6  |  Link
Groucho2004
 
Join Date: Mar 2006
Posts: 3,896
I built libfftw3f-3.dll with gcc 4.9.3 (SSE2, x86) and was surprised to see it perform better than the ICC version that comes with dfttest.

Tested on i5 2500K @4GHz.
Script for testing:
Code:
# 50 frames of 1280x720 YV12 test video at 50 fps, audio removed
colorbars(width = 1280, height = 720, pixel_type = "yv12").killaudio().assumefps(50, 1).trim(0, 49)
RemoveNoise()

# Note: "threshold" is declared but not used in this test
function RemoveNoise(clip video, int "threshold")
{
  last = video
  sc = MSuper(hpad = 16, vpad = 16)
  # dct = 1 is what makes MAnalyse use FFTW
  backward_vector = MAnalyse(sc, isb =  true, delta = 1, blksize = 16, overlap = 4, truemotion = false, sadx264 = 4, dct = 1)
  forward_vector =  MAnalyse(sc, isb = false, delta = 1, blksize = 16, overlap = 4, truemotion = false, sadx264 = 4, dct = 1)
  MDegrain1(sc, backward_vector, forward_vector, thSAD = 300)
  return last
}
ICC build taken from DFTTest:
Code:
Frames processed:               50 (0 - 49)
FPS (min | max | average):      0.742 | 1.529 | 0.760
Memory usage (phys | virt):     51 | 47 MiB
Thread count:                   9
CPU usage (average):            25%
gcc 4.9.3 build:
Code:
Frames processed:               50 (0 - 49)
FPS (min | max | average):      0.899 | 1.806 | 0.921 (+21%)
Memory usage (phys | virt):     49 | 45 MiB
Thread count:                   9
CPU usage (average):            25%
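
For reference, the "+21%" figure follows from the two average FPS values above. A minimal sketch of the arithmetic (the function name is just for illustration):

```python
def relative_speedup(baseline_fps, new_fps):
    """Return the speed change of new_fps relative to baseline_fps, in percent."""
    return (new_fps / baseline_fps - 1.0) * 100.0


icc_avg = 0.760  # average FPS of the ICC build, from the results above
gcc_avg = 0.921  # average FPS of the gcc 4.9.3 build, from the results above

print(f"{relative_speedup(icc_avg, gcc_avg):+.0f}%")  # prints "+21%"
```

The same function gives a negative number when the new build is slower, which is handy when comparing several builds against one baseline.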
Old 23rd December 2017, 23:48   #7  |  Link
GMJCZP
Registered User
 
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 514
Thank you, brother Marx! Better in speed and memory usage, superb!

Edit: maybe I missed something, but in which part of the script is dfttest or fft3dfilter called?

Last edited by GMJCZP; 23rd December 2017 at 23:52.
Old 24th December 2017, 00:03   #8  |  Link
Groucho2004
 
Join Date: Mar 2006
Posts: 3,896
Quote:
Originally Posted by GMJCZP View Post
Thank you, brother Marx! Better in speed and memory usage, superb!

Edit: maybe I missed something, but in which part of the script is dfttest or fft3dfilter called?
The test was with MDegrain where MAnalyse uses FFTW when dct = 1.

However, with dfttest the gcc build is quite a bit slower. So, not recommended for dfttest. With fft3dfilter it's about the same speed.
Old 30th December 2017, 17:44   #9  |  Link
HolyWu
Registered User
 
Join Date: Aug 2006
Location: Taiwan
Posts: 492
Quote:
Originally Posted by Groucho2004 View Post
I built libfftw3f-3.dll with gcc 4.9.3 (SSE2, x86) and was surprised to see it perform better than the ICC version that comes with dfttest.
Interesting. I had compiled FFTW with GCC 7.2.0 as well, but only tested it with FFTW's benchmark program (benchf.exe) and DFTTest, since I don't use DCT mode in MVTools. After some investigation I found that MVTools dislikes ICC's O3 optimization for some unknown reason; switching to O2 gives better performance. I also discovered that by default FFTW only generates efficient codelets of size 8 for DCT/IDCT transforms. I specifically generated codelets for the typical sizes 4, 16, 32 and 64, so it is now at least 50% faster than before when blksize is one of them. DFTTest and FFT3DFilter are unaffected since they use real DFT transforms. The 7z file on GitHub is updated.

Last edited by HolyWu; 30th December 2017 at 17:46.
Old 2nd January 2018, 13:37   #10  |  Link
Groucho2004
 
Join Date: Mar 2006
Posts: 3,896
Quote:
Originally Posted by HolyWu View Post
Interesting. I had compiled FFTW with GCC 7.2.0 as well, but only tested it with FFTW's benchmark program (benchf.exe) and DFTTest, since I don't use DCT mode in MVTools. After some investigation I found that MVTools dislikes ICC's O3 optimization for some unknown reason; switching to O2 gives better performance. I also discovered that by default FFTW only generates efficient codelets of size 8 for DCT/IDCT transforms. I specifically generated codelets for the typical sizes 4, 16, 32 and 64, so it is now at least 50% faster than before when blksize is one of them. DFTTest and FFT3DFilter are unaffected since they use real DFT transforms. The 7z file on GitHub is updated.
Nice, the new fftw DLL is almost twice as fast with the script I posted above.
Old 2nd May 2018, 09:31   #11  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 976
Quote:
Originally Posted by GMJCZP View Post
Direct link (32 and 64 bit versions):
The link has expired. Could someone please post a link to a working Windows x64 build of 3.3.7?
__________________
@turment on Telegram
Old 2nd May 2018, 19:31   #12  |  Link
Midzuki
Unavailable
 
Join Date: Mar 2009
Location: offline
Posts: 1,477
Quote:
Originally Posted by tormento View Post
The link has expired. Could someone please post a link to a working Windows x64 build of 3.3.7?
https://forum.videohelp.com/threads/...Ls-for-Windows
Old 21st October 2018, 13:35   #13  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Germany
Posts: 393
@Wolfberry... Nice, but if you don't mind, could you build the SSE2 with v141_XP and AVX/AVX2 with the normal v141?

Thank you in advance.
__________________
Broadcast Encoder
Old 21st October 2018, 14:50   #14  |  Link
Wolfberry
Helenium(Easter)
 
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 81
Quote:
The primary build mechanism for FFTW remains GNU autoconf/automake. CMake support is meant to offer an easy way to compile FFTW on Windows, and as such it does not cover all the features of the automake build system, such as exotic cycle counters, cross-compiling, or build of binaries for a mixture of ISA's
@FranceBB I decided not to provide VS builds as CMake support is still experimental.

I will build it with MSYS+MinGW instead, should work fine on XP, I guess.
__________________
media-autobuild_suite builds / FFTW
Old 22nd October 2018, 09:13   #15  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 462
THX for the builds.
O2 is always faster on my Ryzen 1700 (about 0.5 fps faster). For some reason the AVX build is the slowest. Tested with dfttest.
__________________
Search and denoise
Old 25th October 2018, 02:07   #16  |  Link
Wolfberry
Helenium(Easter)
 
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 81
x64 SSE2/AVX/AVX2/Generic SIMD builds (GCC 8.2.0 Rev4 built by MSYS2 project)

Link

Last edited by Wolfberry; 8th November 2018 at 11:19.
Old 25th October 2018, 22:45   #17  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Germany
Posts: 393
I tested it and it works fine on WinXP, but I had to put LIBGCC_S_DW2-1.DLL into system32 myself as it was missing.
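
A quick way to see whether a runtime DLL such as the one mentioned above is already present somewhere is to look for it in a list of directories. The sketch below is a rough illustration only: it checks just the directories it is given (here the entries of `PATH`), which is not the full Windows DLL search order, and the function name is made up for this example.

```python
import os


def find_dll(dll_name, search_dirs):
    """Return the first path in search_dirs that contains dll_name, else None.

    The comparison is case-insensitive, as file names are on Windows.
    """
    wanted = dll_name.lower()
    for directory in search_dirs:
        try:
            entries = os.listdir(directory)
        except OSError:
            continue  # skip empty, missing or unreadable directories
        for entry in entries:
            if entry.lower() == wanted:
                return os.path.join(directory, entry)
    return None


path_dirs = os.environ.get("PATH", "").split(os.pathsep)
print(find_dll("libgcc_s_dw2-1.dll", path_dirs))  # None if it is not on PATH
```

If this prints `None`, the DLL has to be supplied alongside the FFTW build or copied to a system directory by hand, as described in the post.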

Thanks. ^_^
Old 5th November 2018, 09:12   #18  |  Link
ErazorTT
Registered User
 
Join Date: Mar 2003
Location: Germany
Posts: 58
@Wolfberry:
I tested the performance of your x64 builds with dfttest on a mobile i5 Haswell. The one named "simd128+256" was around 10% faster than the official 3.3.5 build; all the others were 5% slower.
However, using fft3dfilter with "simd128+256" produces an access violation. As far as I know, the instruction set of Haswell CPUs should be complete up to AVX-256. Has anyone managed to get fft3dfilter working with this build, and if so, on what system?
Old 8th November 2018, 11:24   #19  |  Link
Wolfberry
Helenium(Easter)
 
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 81
@ErazorTT
Confirmed.
It seems that fft3dfilter dislikes generic (gcc) 256-bit SIMD optimizations.
Updated original post.
Old 8th November 2018, 11:39   #20  |  Link
ErazorTT
Registered User
 
Join Date: Mar 2003
Location: Germany
Posts: 58
Great, I will test the performance with your new builds when I’m back home.

Quote:
Originally Posted by Wolfberry View Post
It seems that fft3dfilter dislikes generic (gcc) 256-bit SIMD optimizations.
To me this sounds like some kind of alignment error.
Can you show the differences between the flags you use for your new builds and the ones used for the old 128+256 build?
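
To illustrate what an alignment error would mean here (this is only the hypothesis above, nothing in the thread confirms it as the cause): aligned 128-bit SSE loads and stores need addresses that are multiples of 16 bytes, while aligned 256-bit AVX ones need multiples of 32 bytes, so a buffer that is fine for a 128-bit build can still fault in a 256-bit one. A minimal sketch of the check, with a made-up helper name:

```python
import ctypes


def is_aligned(address, boundary):
    """Return True if address is a multiple of boundary (in bytes)."""
    return address % boundary == 0


# An ordinary buffer of 64 floats; its alignment depends on the allocator.
buffer = (ctypes.c_float * 64)()
address = ctypes.addressof(buffer)

print(hex(address))
print("16-byte aligned (128-bit SIMD):", is_aligned(address, 16))
print("32-byte aligned (256-bit SIMD):", is_aligned(address, 32))
```

An address such as 48 passes the 16-byte check but fails the 32-byte one, which is the kind of mismatch that would only show up with the 256-bit build.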

Tags
fftw, fftw3.dll

All times are GMT +1.