Old 8th August 2022, 17:38   #1  |  Link
Selur
DFTTest2, experiences?

I stumbled over DFTTest2 (https://github.com/AmusementClub/vs-dfttest2) and was wondering whether anyone else has played around with it.

I made a quick mod of havsfunc.QTGMC by adding a new input parameter:
Code:
cuda: bool = False,
and changing the Denoiser == 'dfttest' section to
Code:
        elif Denoiser == 'dfttest':
            if cuda:
                import dfttest2
                # Explicitly request the NVRTC backend for tbsize 1, 3, 5 or 7;
                # for any other tbsize leave the backend at dfttest2's default.
                if noiseTD in (1, 3, 5, 7):
                    dnWindow = dfttest2.DFTTest(clip=noiseWindow, sigma=Sigma * 4, tbsize=noiseTD, planes=CNplanes, backend=dfttest2.Backend.NVRTC())
                else:
                    dnWindow = dfttest2.DFTTest(clip=noiseWindow, sigma=Sigma * 4, tbsize=noiseTD, planes=CNplanes)
            else:
                # Original CPU path: the regular dfttest plugin
                dnWindow = noiseWindow.dfttest.DFTTest(sigma=Sigma * 4, tbsize=noiseTD, planes=CNplanes)
and using:
Code:
clip = havsfunc.QTGMC(Input=clip, Preset="Slow", TFF=False, opencl=True, cuda=True, Denoiser='dfttest', NoiseProcess=1)
compared to
Code:
clip = havsfunc.QTGMC(Input=clip, Preset="Slow", TFF=False, opencl=True, cuda=False, Denoiser='dfttest', NoiseProcess=1)
gave a speed-up from ~20fps to ~30fps for SD content (a 50% speed-up). For HD content the speed difference was a lot smaller (~5-10%).
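
For reference, the percentage above is just the fps ratio minus one. A minimal sketch of that arithmetic follows; the helper names speedup_percent and seconds_for are made up for illustration and are not part of havsfunc or dfttest2, and the 3000-frame clip length is a hypothetical value used only to turn fps into run time.

```python
def speedup_percent(fps_before: float, fps_after: float) -> float:
    """Relative throughput gain in percent when going from fps_before to fps_after."""
    if fps_before <= 0:
        raise ValueError("fps_before must be positive")
    return (fps_after / fps_before - 1.0) * 100.0


def seconds_for(frames: int, fps: float) -> float:
    """Wall-clock time needed to process `frames` frames at a steady `fps`."""
    return frames / fps


# SD numbers from the post: ~20 fps (CPU dfttest) vs ~30 fps (dfttest2)
sd_gain = speedup_percent(20, 30)
print(f"SD speed-up: {sd_gain:.0f}%")

# Hypothetical 3000-frame clip, only to express the fps values as run time
print(f"{seconds_for(3000, 20):.0f}s vs {seconds_for(3000, 30):.0f}s")
```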

Seeing this, and since I personally don't use dfttest that often, especially in QTGMC, I was wondering whether someone has played around with DFTTest2 and would share some experiences?

Cu Selur
Old 10th August 2022, 04:03   #2  |  Link
Greenhorn
Just from playing with it now, on a system with a Ryzen 3700X and GTX 1660 Super, tested on a Y' plane extracted from HD content:

-- The output is not *exactly* the same as CPU DFTTest, but the differences appear to be smaller than 1/65536, so I wouldn't worry about it too much.
-- As the author's wiki says, the NVRTC backend is much, much, much faster than the CUDA backend, even on my more modest hardware: CUDA is about 50% faster than CPU, and NVRTC is about 2.5x faster than CPU/1.6x faster than CUDA.
-- Going down to 16 bits with CPU DFTTest is still a lot slower than 32-bit GPU DFTTest, even with the CUDA backend.
-- Other than the very tiny output differences and bitdepth restrictions, it does appear to just work.
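
The "smaller than 1/65536" observation in the first point can be checked with a maximum-absolute-difference comparison. Below is a rough sketch on plain Python lists standing in for float sample values; in a real script the values would come from the two filtered clips. The function names and the sample numbers are invented for illustration and are not measured output.

```python
# Threshold quoted in the post; roughly one 16-bit step on a 0..1 float scale
TOLERANCE = 1 / 65536


def max_abs_diff(a, b):
    """Largest absolute per-sample difference between two equally sized sample sequences."""
    if len(a) != len(b):
        raise ValueError("sample sequences must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def within_tolerance(a, b, tol=TOLERANCE):
    """True if every sample differs by less than `tol`."""
    return max_abs_diff(a, b) < tol


# Made-up sample values, only to exercise the check
cpu_samples = [0.250000, 0.500000, 0.750000]
gpu_samples = [0.250004, 0.499998, 0.750001]
print(max_abs_diff(cpu_samples, gpu_samples))
print(within_tolerance(cpu_samples, gpu_samples))
```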

I have to wonder if the actual computational cost of dfttest2 may be higher than CPU DFTTest, considering the relative power of the hardware being used, but actually figuring that out is well above my knowledge level. It is definitely faster though.

Also, CPU DFTTest appears to be a bit faster (not really meaningfully, but it was consistent across multiple 2400-frame runs) when you split a clip's planes, process them separately, and join them back at the end than when you filter everything in one call. (This is why I used a gray clip to test, since I don't know what dfttest2 does in this regard.)
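
The split/process/join approach described above could look roughly like this. std.SplitPlanes and std.ShufflePlanes are VapourSynth core functions (SplitPlanes needs a reasonably recent VapourSynth). The wrapper name filter_per_plane, and passing the core, the colour family and the per-plane filter in as arguments, are choices made here for illustration and are not taken from havsfunc or dfttest2. The sigma and tbsize values in the usage comment are arbitrary.

```python
def filter_per_plane(clip, denoise, core, color_family):
    """Split `clip` into gray planes, run `denoise` on each, and merge them back.

    clip         -- a YUV VapourSynth clip
    denoise      -- callable taking a gray clip and returning the filtered gray clip
    core         -- the VapourSynth core (vapoursynth.core)
    color_family -- colour family of the result, e.g. vapoursynth.YUV
    """
    planes = core.std.SplitPlanes(clip)  # one gray clip per plane
    filtered = [denoise(plane) for plane in planes]
    # Plane 0 of each filtered gray clip becomes Y, U and V of the output
    return core.std.ShufflePlanes(clips=filtered, planes=[0, 0, 0], colorfamily=color_family)


# Possible usage inside a VapourSynth script (assumes the dfttest plugin is loaded):
#   import vapoursynth as vs
#   out = filter_per_plane(clip, lambda c: c.dfttest.DFTTest(sigma=8, tbsize=1), vs.core, vs.YUV)
```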

Last edited by Greenhorn; 10th August 2022 at 04:55.