FFT3DFilter - the optimizationing of terror!


Myrsloik
7th August 2020, 16:06
This is kind of a continuation from the previous thread (https://forum.doom9.org/showthread.php?t=175199). Instead of simply porting things quickly, I took a look at the actual algorithm and split it up into logical blocks to reduce threading contention. Now the most expensive modes like bt=5 should run several times faster. We're talking 5x territory for a single plane compared to my previous build (and even more for multiple planes).

The ncpu argument is still present since the transform preparation wasn't completely parallelized. Maybe I'll try to fix that some day too. It's a simple problem but with an annoying solution to write.

Output should be identical. Expect a blog post at some point with interesting observations and small optimizations all the avisynth developers seem to have missed.

REQUIRES THE LATEST AUDIO TEST BUILD (https://forum.doom9.org/showthread.php?t=177623)

Test1 x64 avx2 (https://www.dropbox.com/s/qqibqvg8kqn05z1/fft3dfilter_x64_avx2_clang_test1.7z?dl=1)

lansing
7th August 2020, 16:41
Quick speed test; source is a 720x480 DVD, on a Ryzen 3900X

previous version:
bt=3: 70 fps, 7% cpu
bt=5: 51 fps, 6.5% cpu

new version:
bt=3: 285 fps, 22.5% cpu
bt=5: 273 fps, 24% cpu

That is more than 5x speed gain on bt=5, wow. The thing I noticed is that 9 of my 24 threads didn't run.
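For reference, the speedup factors implied by the numbers above can be worked out with a few lines of Python. This is only a sketch of the arithmetic; the `results` table and the `speedup` helper are names made up for illustration, and the fps values are copied from this post.

```python
# Frame rates (fps) as reported in this post: 720x480 DVD source, Ryzen 3900X.
results = {
    "bt=3": {"previous": 70, "new": 285},
    "bt=5": {"previous": 51, "new": 273},
}


def speedup(previous_fps, new_fps):
    """Return how many times faster the new build is than the previous one."""
    return new_fps / previous_fps


for mode, fps in results.items():
    print(f"{mode}: {speedup(fps['previous'], fps['new']):.2f}x")
# prints:
# bt=3: 4.07x
# bt=5: 5.35x
```

So bt=5 lands a little above 5x, while bt=3 is closer to 4x.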

ChaosKing
7th August 2020, 17:17
Default values, NTSC DVD, Ryzen 2600. Tested in vseditor, which adds a bit of overhead, so it is a bit faster in reality.

fft "old" 70fps
fft new 246fps
neo_fft 191fps (I just noticed that it changes the colors a bit)

bt = 5
fft "old" 48fps
fft new 230fps
neo_fft 176fps

Impressive optimization!

Myrsloik
7th August 2020, 17:35
lansing wrote: "The thing I noticed is that 9 of my 24 threads didn't run."

Since the transform stage itself is parallelized only through parallel frame requests, you can probably get slightly more performance with ncpu=12.

This turned out surprisingly fast for something that only uses clang's auto vectorizer and no intrinsics at all.
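A minimal sketch of what trying ncpu=12 might look like in a VapourSynth script. It assumes the plugin is exposed as `core.fft3dfilter.FFT3DFilter` and accepts `bt` and `ncpu` as keyword arguments; the `denoise` wrapper is a hypothetical helper, and the right value for `ncpu` depends on your CPU.

```python
def denoise(core, clip, bt=5, ncpu=12):
    """Run FFT3DFilter on clip with the given temporal mode and ncpu value.

    core is assumed to be a VapourSynth core with the FFT3DFilter plugin
    loaded; the namespace and argument names here are assumptions.
    """
    return core.fft3dfilter.FFT3DFilter(clip, bt=bt, ncpu=ncpu)


# Typical use inside a .vpy script (not executed here):
#   import vapoursynth as vs
#   clip = denoise(vs.core, clip)
#   clip.set_output()
```

Benchmark a few ncpu values on your own source before settling on one, since the gain is expected to be small.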

ChaosKing
9th February 2021, 19:17
Any plans for a proper release in the near future?

Selur
9th February 2021, 19:36
.. ideally one that works with the normal release. ;)

Myrsloik
10th February 2021, 18:13
Selur wrote: ".. ideally one that works with the normal release. ;)"

I do plan to get back to the API R4 branch at some point soon after making another maintenance release. Basically it only has a few small quirks left and I'll probably add proper clip properties too if I can think of an elegant way to do it.

ChaosKing
28th April 2022, 20:20
Any updates on this?

kedautinh12
29th April 2022, 00:46
ChaosKing wrote: "Any updates on this?"

Here:
https://github.com/AmusementClub/VapourSynth-FFT3DFilter/releases