Myrsloik
7th August 2020, 16:06
This is a continuation of the previous thread (https://forum.doom9.org/showthread.php?t=175199). Instead of simply porting things quickly, I took a look at the actual algorithm and split it up into logical blocks to reduce threading contention. The most expensive modes, like bt=5, should now run several times faster: roughly 5x for a single plane compared to my previous build, and even more when filtering multiple planes.
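The post doesn't include code, so here is only a generic C++ sketch of the general idea of splitting work into independent blocks so threads don't contend. It is not the actual fft3dfilter code. The names `filter_rows` and `process_plane_in_blocks` are hypothetical, and halving each sample is a placeholder for the real FFT work. Each worker gets a disjoint range of rows plus its own scratch buffer, so no locks are needed and the only synchronization point is the final join.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-block work; stands in for the expensive transform/filter step.
// The scratch buffer belongs to the calling thread, so no mutable state is shared.
static void filter_rows(std::vector<float>& plane, int width, int row_begin, int row_end,
                        std::vector<float>& scratch) {
    for (int y = row_begin; y < row_end; ++y) {
        float* row = plane.data() + static_cast<std::size_t>(y) * width;
        // Copy into thread-local scratch, "process", then write back to this thread's own row.
        std::copy(row, row + width, scratch.begin());
        for (int x = 0; x < width; ++x)
            row[x] = scratch[x] * 0.5f;  // placeholder for the real filtering
    }
}

// Split the plane into contiguous, disjoint row blocks, one per worker thread.
// Because no two threads touch the same rows, no mutex is required.
void process_plane_in_blocks(std::vector<float>& plane, int width, int height, int num_threads) {
    num_threads = std::max(1, std::min(num_threads, height));
    const int rows_per_block = (height + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        const int begin = t * rows_per_block;
        const int end = std::min(height, begin + rows_per_block);
        if (begin >= end) break;
        workers.emplace_back([&plane, width, begin, end] {
            std::vector<float> scratch(width);  // per-thread buffer, never shared
            filter_rows(plane, width, begin, end, scratch);
        });
    }
    for (auto& w : workers) w.join();  // plane stays valid: all threads finish before return
}
```

The design choice being illustrated is that contention disappears when each thread owns its data outright, rather than when threads share buffers behind locks.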
The ncpu argument is still present because the transform preparation isn't fully parallelized yet. Maybe I'll fix that some day too. It's a simple problem, but the solution is annoying to write.
Output should be identical. Expect a blog post at some point with interesting observations and small optimizations that all the Avisynth developers seem to have missed.
REQUIRES THE LATEST AUDIO TEST BUILD (https://forum.doom9.org/showthread.php?t=177623)
Test1 x64 avx2 (https://www.dropbox.com/s/qqibqvg8kqn05z1/fft3dfilter_x64_avx2_clang_test1.7z?dl=1)