8th August 2022, 17:38 | #1 | Link |
Registered User
Join Date: Oct 2001
Location: Germany
Posts: 7,277
|
DFTTest2, experiences?
Stumbled over DFTTest2 (https://github.com/AmusementClub/vs-dfttest2) and I was wondering if anyone else has played around with it.
I made a quick mod of havsfunc.QTGMC by adding a new input parameter:
Code:
    cuda: bool = False,
and changing the dfttest branch to:
Code:
    elif Denoiser == 'dfttest':
        if cuda:
            import dfttest2
            if noiseTD == 1 or noiseTD == 3 or noiseTD == 5 or noiseTD == 7:
                dnWindow = dfttest2.DFTTest(clip=noiseWindow, sigma=Sigma * 4, tbsize=noiseTD, planes=CNplanes, backend=dfttest2.Backend.NVRTC())
            else:
                dnWindow = dfttest2.DFTTest(clip=noiseWindow, sigma=Sigma * 4, tbsize=noiseTD, planes=CNplanes)
        else:
            dnWindow = noiseWindow.dfttest.DFTTest(sigma=Sigma * 4, tbsize=noiseTD, planes=CNplanes)
With cuda=True:
Code:
    clip = havsfunc.QTGMC(Input=clip, Preset="Slow", TFF=False, opencl=True, cuda=True, Denoiser='dfttest', NoiseProcess=1)
With cuda=False:
Code:
    clip = havsfunc.QTGMC(Input=clip, Preset="Slow", TFF=False, opencl=True, cuda=False, Denoiser='dfttest', NoiseProcess=1)
Seeing this, and since I personally don't use dfttest that often, especially in QTGMC, I was wondering whether someone has played around with DFTTest2 and would share some experiences?
Cu Selur |
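For readers skimming the mod: the branch boils down to a small decision about which DFTTest implementation and backend gets used for a given `cuda` flag and `noiseTD`. Below is a minimal sketch of only that decision. It returns labels instead of calling the plugins, so it runs without VapourSynth. `pick_dfttest_backend` and the label strings are hypothetical names for illustration, not part of havsfunc or vs-dfttest2.

```python
def pick_dfttest_backend(cuda: bool, noise_td: int) -> str:
    """Mirror the branch structure of the modified QTGMC dfttest path."""
    if not cuda:
        # Original path: noiseWindow.dfttest.DFTTest(...)
        return "cpu"
    if noise_td in (1, 3, 5, 7):
        # dfttest2.DFTTest(..., backend=dfttest2.Backend.NVRTC())
        return "dfttest2-nvrtc"
    # dfttest2.DFTTest(...) with no explicit backend argument
    return "dfttest2-default"


for td in (1, 2, 3):
    print(td, pick_dfttest_backend(cuda=True, noise_td=td))
```

Only the odd `noiseTD` values 1, 3, 5 and 7 request the NVRTC backend explicitly; any other value falls back to whatever dfttest2 picks when no backend is passed.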
10th August 2022, 04:03 | #2 | Link |
Registered User
Join Date: Apr 2018
Posts: 61
|
Just from playing with it now, on a system with a Ryzen 3700X and GTX 1660 Super, tested on a Y' plane extracted from HD content:
-- The output is not *exactly* the same as CPU DFTTest, but the differences appear to be smaller than 1/65536, so I wouldn't worry about it too much.
-- As the author's wiki says, the NVRTC backend is much, much, much faster than the CUDA backend even on my more modest hardware, with CUDA being about 50% faster than CPU and NVRTC being about 2.5x faster than CPU/1.6x faster than CUDA.
-- Going down to 16 bits with CPU DFTTest is still a lot slower than 32-bit GPU DFTTest, even with the CUDA backend.
-- Other than the very tiny output differences and bitdepth restrictions, it does appear to just work.
I have to wonder if the actual computational cost of dfttest2 may be higher than CPU DFTTest, considering the relative power of the hardware being used, but actually figuring that out is well above my knowledge level. It is definitely faster though.
Also, CPU DFTTest appears to be a bit faster (not really meaningfully, but it was consistent across multiple 2400-frame runs) when you split a clip's planes, process them separately, and join them back at the end than if you just filter everything in one call. (This is why I used a gray clip to test, since IDK what dfttest2 is doing in regard to this.)
Last edited by Greenhorn; 10th August 2022 at 04:55. |
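The "smaller than 1/65536" observation amounts to checking the largest per-sample absolute difference between two float outputs against one 16-bit step. Here is a minimal sketch on plain Python lists, assuming samples normalized to the 0..1 range. `max_abs_diff` is a hypothetical helper, and the sample values are made up for illustration, not measurements from the test above.

```python
SIXTEEN_BIT_STEP = 1 / 65536  # size of one code value at 16-bit precision


def max_abs_diff(a, b):
    """Return the largest absolute per-sample difference between two planes."""
    if len(a) != len(b):
        raise ValueError("planes must have the same number of samples")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


# Made-up float samples standing in for CPU and GPU filter output.
cpu_plane = [0.25, 0.5, 0.75]
gpu_plane = [0.25, 0.500001, 0.749999]

diff = max_abs_diff(cpu_plane, gpu_plane)
print(diff < SIXTEEN_BIT_STEP)  # True: the outputs agree to within one 16-bit step
```

If the largest difference stays below one 16-bit step, the two outputs would at most differ by a rounding decision after conversion to 16-bit integer.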