9th November 2018, 09:05   #21
ErazorTT
Registered User
 
Join Date: Mar 2003
Location: Germany
Posts: 58
OK, so all builds apart from simd256 work with fft3dfilter. However, with dfttest the old simd128+256 build appears to have been a wee bit faster than all the new builds.

Out of curiosity: what do you actually mean by simd 128 or 256, as opposed to sse2/avx/avx2? After all, SSE2 is a 128-bit SIMD instruction set, and AVX/AVX2 add 256-bit SIMD instructions on top of it.

Last edited by ErazorTT; 9th November 2018 at 09:13.
9th November 2018, 13:08   #22
Wolfberry
Helenium(Easter)
 
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 81
Code:
  --enable-sse2             enable SSE/SSE2 optimizations
  --enable-avx              enable AVX optimizations
  --enable-avx2             enable AVX2 optimizations
  --enable-avx512           enable AVX512 optimizations
  --enable-avx-128-fma      enable AVX128/FMA optimizations
  --enable-kcvi             enable Knights Corner vector instructions optimizations
  --enable-altivec          enable Altivec optimizations
  --enable-vsx              enable IBM VSX optimizations
  --enable-neon             enable ARM NEON optimizations
  --enable-generic-simd128  enable generic (gcc) 128-bit SIMD optimizations
  --enable-generic-simd256  enable generic (gcc) 256-bit SIMD optimizations
Above are some of the flags you can use during configuration.
The SIMD builds also had SSE2/AVX/AVX2 enabled, but I am not sure it is worth it.
AFAIK, generic-simd128/256 are some kind of generic AVX(2); I am not sure how generic they are.

The FFTW release notes say:
Quote:
enabling them all at the same time is a bad idea, because it increases the planning time for minimal gain
And the more code paths you enable, the fatter the DLLs will be.
Quote:
Originally Posted by HolyWu View Post
I especially generate codelets of typical sizes 4, 16, 32 and 64 so now it's at least 50% faster than before when blksize is one of them. DFTTest and FFT3DFilter are unaffected since they use real DFT transforms.
Future builds will have these codelets generated as well.
__________________
media-autobuild_suite builds / FFTW

Last edited by Wolfberry; 9th November 2018 at 13:12.
9th November 2018, 18:46   #23
ErazorTT
Registered User
 
Join Date: Mar 2003
Location: Germany
Posts: 58
So the generic options are based on the compiler vectorization and optimization.

Have you tried increasing the alignment using --with-incoming-stack-boundary? As suggested here: https://forum.doom9.org/showthread.p...80#post1857180
10th November 2018, 01:10   #24
Wolfberry
Helenium(Easter)
 
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 81
Quote:
Originally Posted by Groucho2004 View Post
There are a number of guidelines here about building fftw.
The official guideline for building fftw on Windows is outdated; I consider BUILD-MINGW32, BUILD-MINGW64 and PKGBUILD better references.

Quote:
On win32, some versions of gcc assume that the stack is 16-byte aligned, but code compiled with other compilers may only guarantee a 4-byte alignment, resulting in mysterious segfaults.
As quoted, --with-incoming-stack-boundary=2 only applies to win32 (x86), not x64. On x64, gcc rejects the value during configure:
Code:
configure:15780: checking whether C compiler accepts -mincoming-stack-boundary=2
configure:15795: gcc -c -mincoming-stack-boundary=2  conftest.c >&5
cc1.exe: error: -mincoming-stack-boundary=2 is not between 3 and 12
Tags
fftw, fftw3.dll

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +1.

