DFTTest2, experiences?


Selur
8th August 2022, 17:38
I stumbled over DFTTest2 (https://github.com/AmusementClub/vs-dfttest2) and was wondering if anyone else has played around with it.

I made a quick mod of havsfunc.QTGMC by adding a new input parameter:
    cuda: bool = False,
and changing the Denoiser == 'dfttest' section to:
    elif Denoiser == 'dfttest':
        if cuda:
            import dfttest2
            if noiseTD == 1 or noiseTD == 3 or noiseTD == 5 or noiseTD == 7:
                dnWindow = dfttest2.DFTTest(clip=noiseWindow, sigma=Sigma * 4, tbsize=noiseTD, planes=CNplanes, backend=dfttest2.Backend.NVRTC())
            else:
                dnWindow = dfttest2.DFTTest(clip=noiseWindow, sigma=Sigma * 4, tbsize=noiseTD, planes=CNplanes)
        else:
            dnWindow = noiseWindow.dfttest.DFTTest(sigma=Sigma * 4, tbsize=noiseTD, planes=CNplanes)
and using:
clip = havsfunc.QTGMC(Input=clip, Preset="Slow", TFF=False, opencl=True, cuda=True, Denoiser='dfttest', NoiseProcess=1)
compared to
clip = havsfunc.QTGMC(Input=clip, Preset="Slow", TFF=False, opencl=True, cuda=False, Denoiser='dfttest', NoiseProcess=1)
gave a speed-up from ~20fps to ~30fps for SD content (50% speed-up). For HD content the speed difference was a lot smaller (~5-10%).
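
For reference, the percentage above is just the ratio of the two frame rates minus one. A minimal sketch of that arithmetic (the function name is made up for illustration, it is not part of havsfunc or dfttest2):

```python
def speedup_percent(baseline_fps: float, new_fps: float) -> float:
    """Return the relative speed-up of new_fps over baseline_fps, in percent."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return (new_fps / baseline_fps - 1.0) * 100.0


# SD figures from the post: ~20fps with cuda=False, ~30fps with cuda=True
print(speedup_percent(20.0, 30.0))
```

With the rough SD numbers quoted above this works out to 50%.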

Seeing this, and since I personally don't use dfttest that often, especially in QTGMC, I was wondering whether anyone has played around with DFTTest2 and would share some experiences.

Cu Selur

Greenhorn
10th August 2022, 04:03
Just from playing with it now, on a system with a Ryzen 3700X and GTX 1660 Super, tested on a Y' plane extracted from HD content:

-- The output is not *exactly* the same as CPU DFTTest, but the differences appear to be smaller than 1/65536, so I wouldn't worry about it too much.
-- As the author's wiki says, the NVRTC backend is much, much, much faster than the CUDA backend even on my more modest hardware, with CUDA being about 50% faster than CPU and NVRTC being about 2.5x faster than CPU/1.6x faster than CUDA.
-- Going down to 16bits with CPU DFTTest is still a lot slower than 32bit GPU DFTTest, even with the CUDA backend.
-- Other than the very tiny output differences and bitdepth restrictions, it does appear to just work.
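
A rough way to check the "differences smaller than 1/65536" point yourself is to compare the two outputs sample by sample. This is only a sketch: it assumes you have already pulled the same plane of the same frame out of both the CPU and the GPU result as flat sequences of floats (how you extract them is up to you), and the helper names are hypothetical.

```python
TOLERANCE = 1 / 65536  # threshold mentioned in the post


def max_abs_diff(a, b):
    """Largest absolute per-sample difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(a, b, tol=TOLERANCE):
    """True if every sample differs by strictly less than tol."""
    return max_abs_diff(a, b) < tol


# Toy data standing in for two filtered planes (not real DFTTest output)
cpu_plane = [0.5, 0.25, 0.125]
gpu_plane = [0.5 + 1 / 131072, 0.25, 0.125]
print(outputs_match(cpu_plane, gpu_plane))
```

The toy values are made up and only show how the check behaves. They say nothing about what DFTTest actually outputs.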

I have to wonder if the actual computational cost of dfttest2 may be higher than CPU DFTTest, considering the relative power of the hardware being used, but actually figuring that out is well above my knowledge level. It is definitely faster though.

Also, CPU DFTTest appears to be a bit faster (not really meaningfully, but it was consistent across multiple 2400-frame runs) when you split a clip's planes, process them separately, and join them back at the end than when you just filter everything in one call. (This is why I used a gray clip to test, since IDK what dfttest2 is doing in regard to this.)
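
The split/process/join approach described above could look roughly like this in a VapourSynth script. It is a sketch, not the script used for the test: the function name is made up, the sigma and tbsize values are placeholders, and it assumes the CPU dfttest plugin is loaded and that all planes should get the same settings.

```python
def dfttest_per_plane(clip, sigma=8.0, tbsize=1):
    """Filter each plane of `clip` with CPU DFTTest separately, then rejoin."""
    # Imported here so the helper can be defined without VapourSynth installed.
    import vapoursynth as vs
    core = vs.core

    num_planes = clip.format.num_planes
    if num_planes == 1:
        # Gray clip: nothing to split.
        return clip.dfttest.DFTTest(sigma=sigma, tbsize=tbsize)

    # Extract every plane as its own GRAY clip.
    planes = [
        core.std.ShufflePlanes(clips=clip, planes=i, colorfamily=vs.GRAY)
        for i in range(num_planes)
    ]
    # Run the denoiser on each single-plane clip.
    filtered = [p.dfttest.DFTTest(sigma=sigma, tbsize=tbsize) for p in planes]
    # Merge the filtered planes back into the original color family.
    return core.std.ShufflePlanes(
        clips=filtered,
        planes=[0] * num_planes,
        colorfamily=clip.format.color_family,
    )
```

In a script you would call it as clip = dfttest_per_plane(clip) and compare the speed against a single clip.dfttest.DFTTest(...) call on the full clip.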