dfttest - 2D/3D frequency domain denoiser.


Terranigma
28th November 2007, 03:17
dfttest by tritical

Info:
2D/3D frequency domain denoiser.

Requires libfftw3f-3.dll to be in the search path.
http://www.fftw.org/install/windows.html

Download (http://web.missouri.edu/~kes25c/)

DeathAngelBR
28th November 2007, 05:57
Straight to the point. Here are some screens. Yep, anime! No conclusion from me, except that anime is always a bitch to denoise.
It seems a sigma value of 0.25 to 0.5 is enough for grainy anime like this one.

Original noisy frame:
http://img503.imageshack.us/img503/9152/27730000originaltx9.th.png (http://img503.imageshack.us/img503/9152/27730000originaltx9.png)

Denoised with a chain that doesn't kill much detail:

fluxsmoothst(9,9)
ttempsmooth()
fastlinedarkenmod(thinning=0, strength=25)
limitedsharpenfaster(smode=4, strength=50)
tweak(sat=1.10)

http://img525.imageshack.us/img525/310/27730001denoisedmw9.th.png (http://img525.imageshack.us/img525/310/27730001denoisedmw9.png)

Next is mc_spuds()

mc_spuds(frames=2, strength=1, anime=true, starfield=true, lsfstr=50, blocksize=8, overlap=8/2)
ttempsmooth()
fastlinedarkenmod(thinning=0, strength=25)
tweak(sat=1.10)

http://img267.imageshack.us/img267/1323/27730006mcspudswp5.th.png (http://img267.imageshack.us/img267/1323/27730006mcspudswp5.png)

Using dfttest() defaults:

dfttest()
fastlinedarkenmod(thinning=0, strength=25)
limitedsharpenfaster(smode=4, strength=50)
tweak(sat=1.10)

http://img523.imageshack.us/img523/764/27730002dfttestdefaulteo8.th.png (http://img523.imageshack.us/img523/764/27730002dfttestdefaulteo8.png)
Oops! Too washed out.

dfttest(sigma=1.0), everything else the same.
http://img523.imageshack.us/img523/8152/27730003dfttestsigma1qm4.th.png (http://img523.imageshack.us/img523/8152/27730003dfttestsigma1qm4.png)

dfttest(sigma=0.5)
http://img509.imageshack.us/img509/3465/27730004dfttestsigma05md9.th.png (http://img509.imageshack.us/img509/3465/27730004dfttestsigma05md9.png)

dfttest(sigma=0.25)
http://img509.imageshack.us/img509/2218/27730005dfttestsigma025lx0.th.png (http://img509.imageshack.us/img509/2218/27730005dfttestsigma025lx0.png)

Adub
28th November 2007, 06:04
What kind of speed are we getting?

cestfait
28th November 2007, 07:54
WOW! :eek:

That is one powerful filter-- I've never seen such a good stand-alone temporal denoiser without artifacts (fluxsmooth and degrainmedian's blurring or temporalsoften and fft3d's ghosting, for example). It also temporally denoises much better than the former examples in general, imho (even when these others are mv-compensated).

It looks like you can set tbsize really high w/o side effects beyond slower processing. (ok, with high temporal radii it can get pretty slow, actually. . .)

Unfortunately, being quite mathematically illiterate, I still have some total blanks here and there in the docs. I wonder if tritical has any explanations for us about the differences between the analysis/synthesis windows. . .

It's really fun to play with, though. Now for some play with the spatial thresholds!

Fabulous! ^_^

:thanks:


P.S. I'm not big on spending my time clocking these filters, but I can say that with low sigma and temporal radii, this filter is acceptably fast. In fact, considering its power, it is pleasingly quick, indeed!

foxyshadis
28th November 2007, 12:30
btw, if you feel like really slowing it down, dfttest improves a lot with motion compensation; it was originally made for that (http://forum.doom9.org/showthread.php?p=866255#post866255), in fact.

You could use it as a drop-in replacement for mvdegrain in mc_spuds, too, just in case it's still too fast for you. :p

2Bdecided
28th November 2007, 15:14
The parameters for dfttest have changed since that thread was written. What would be the equivalents of...

dfttest(sigma=3.6,bsize=8,osize=6,ssr=0,tsr=2,max2dblocks=5)

...in that post? I did try to figure the changes out from the help file, but some parameters have vanished!

Cheers,
David.

tritical
28th November 2007, 21:28
closest thing to:

dfttest(sigma=3.6,bsize=8,osize=6,ssr=0,tsr=2,max2dblocks=5)

would be:

dfttest(sigma=3.6,sbsize=8,sosize=6,tbsize=5)

The max2dblocks parameter is gone now along with ssr/tsr. ssr/tsr (spatial search radius, temporal search radius) were part of its internal mc, which I decided to remove since external mc with mvtools is faster and better. Previously, it would search within the area defined by ssr/tsr for the best matching blocks (most similar to the block on the current frame), and then choose the best 'max2dblocks' worth to include in the 3d transform. With ssr=0/tsr=2 and max2dblocks=5 it is essentially the same as using tbsize=5 and tmode=0 in the new filter.
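The old internal search can be pictured roughly as below. This is a simplified C sketch, not dfttest's removed code. The function and type names are hypothetical, SAD is assumed as the similarity measure, and frames are assumed to be 8-bit planes with stride equal to width. It scans every frame within tsr of the current one and every offset within ssr, then keeps the max2dblocks (here `k`) lowest-SAD blocks. With ssr=0, tsr=2 and k=5, there are exactly five candidates (one per frame), so all of them are always kept. That is why tritical says the old settings are essentially tbsize=5 with tmode=0.

```c
#include <stdlib.h>

/* One candidate block: frame offset dt, spatial offset (dx, dy), and its SAD. */
typedef struct { int dt, dx, dy; long sad; } Candidate;

/* Sum of absolute differences between two bs x bs blocks sharing a stride. */
static long block_sad(const unsigned char *a, const unsigned char *b,
                      int stride, int bs) {
    long s = 0;
    for (int y = 0; y < bs; y++)
        for (int x = 0; x < bs; x++)
            s += labs((long)a[y * stride + x] - (long)b[y * stride + x]);
    return s;
}

static int by_sad(const void *p, const void *q) {
    const Candidate *a = p, *b = q;
    return (a->sad > b->sad) - (a->sad < b->sad);
}

/* Hypothetical select_blocks(): search frames t-tsr..t+tsr and offsets
   -ssr..ssr around block (bx, by) of frame t, and keep the k lowest-SAD blocks.
   frames[] holds nframes w x h 8-bit planes with stride w.
   Returns how many candidates were written to out (at most k). */
int select_blocks(const unsigned char *const *frames, int nframes,
                  int w, int h, int t, int bx, int by, int bs,
                  int ssr, int tsr, int k, Candidate *out) {
    int side = 2 * ssr + 1;
    Candidate *c = malloc(sizeof *c * (size_t)(2 * tsr + 1) * side * side);
    if (!c) return 0;
    const unsigned char *cur = frames[t] + by * w + bx;
    int n = 0;
    for (int dt = -tsr; dt <= tsr; dt++) {
        int f = t + dt;
        if (f < 0 || f >= nframes) continue;          /* outside the clip */
        for (int dy = -ssr; dy <= ssr; dy++)
            for (int dx = -ssr; dx <= ssr; dx++) {
                int x = bx + dx, y = by + dy;
                if (x < 0 || y < 0 || x + bs > w || y + bs > h) continue;
                Candidate cand = { dt, dx, dy,
                                   block_sad(cur, frames[f] + y * w + x, w, bs) };
                c[n++] = cand;
            }
    }
    qsort(c, (size_t)n, sizeof *c, by_sad);           /* best match first */
    if (k > n) k = n;
    for (int i = 0; i < k; i++) out[i] = c[i];
    free(c);
    return k;
}
```

The current block always matches itself with SAD 0, so it is always the first one kept.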

Atm, the temporal operation of this filter (tbsize>1) is quite a bit slower than it needs to be since I don't cache the 2d transforms on each frame. The upside of the current method is that it uses a lot less memory (especially for small windows with large overlaps). If I get the time I will add optional caching. Also, there are 2 more filter types I would like to add.

Quote:
Unfortunately, being quite mathematically illiterate, I still have some total blanks here and there in the docs. I wonder if tritical has any explanations for us about the differences between the analysis/synthesis windows. . .

I'm not an expert on windowing function characteristics either. Basically, I just included a bunch of different ones to play with. This page gives a nice overview, and has most of the ones I implemented listed:

http://en.wikipedia.org/wiki/Window_function

*.mp4 guy
29th November 2007, 23:25
Did you implement a "square" (sometimes called rectangular, Dirichlet, or sinc) windowing function? (A square windowing function is just a plain cut-off; it's the simplest one.) With high overlap, square often looks the best, and square also always retains the most detail.

Terranigma
29th November 2007, 23:49
Quote:
Did you implement a "square" (sometimes called rectangular, Dirichlet, or sinc) windowing function? (A square windowing function is just a plain cut-off; it's the simplest one.) With high overlap, square often looks the best, and square also always retains the most detail.


swin,twin -

Sets the type of analysis/synthesis window to be used for spatial (swin) and
temporal (twin) processing. Possible settings:

0: hanning
1: hamming
2: blackman
3: 4 term blackman-harris
4: kaiser-bessel
5: 7 term blackman-harris
6: flat top
7: rectangular
8: Bartlett
9: Bartlett-Hann
10: Nuttall
11: Blackman-Nuttall

default: 0,0

So you think tritical should change the default to 7?

Also, do you think you could explain how each method works (if you don't mind)?

*.mp4 guy
30th November 2007, 00:05
Well, I don't know how the windowing functions will affect denoising specifically, but generally speaking a rectangular windowing function retains the most detail and has the most artifacts (ringing). All of the other windowing functions give less "weight" to pixels close to the edges of the window when performing the DFT. They do this to lower the amount of ringing while retaining a certain type of information as well as possible (what is targeted varies among windowing functions). I would guess that 7 would work the best, but obviously, testing would have to be done to be sure.

The Wikipedia page should give you a good idea of how each method works. The blue shape in the picture associated with each function gives a visual representation of how the weight of outlying pixels is reduced compared to the center pixel (in actuality they don't represent exactly this, but for this usage of windowing functions it's safe to think of them this way).

tritical
30th November 2007, 03:14
I guess I'll try to give an explanation of some of the differences between the windows. There are basically three main characteristics to look at... the main lobe width, the peak side lobe level, and the side lobe roll off.

A small main lobe width gives better resolution. That is, given two equal magnitude peaks in the frequency spectrum, the smaller the main lobe width, the closer together the two peaks can be and we will still be able to resolve them. The rectangular window has the smallest main lobe width.

A small peak side lobe level makes it possible to resolve a weak (small) peak that is next to a, relatively speaking, strong (large) peak. The peak side lobe level determines the maximum response outside the main lobe. In other words, if the peak side lobe level is too high, then the energy from a strong peak can leak into nearby frequency bins and cover up a smaller peak if one is present. Of the windows implemented in dfttest, flat top or 7-term blackman-harris has the smallest peak side lobe level. The rectangular window has the largest peak side lobe level.

The third characteristic is the side lobe roll off, which gives the rate of decrease in the peak of each side lobe as you move away from the main lobe. Sometimes trading a faster roll off for a larger peak side lobe level (the peak side lobe level generally occurs in the side lobes closest to the main lobe) is good if there are no nearby peaks, but there are some farther away.

Thus, the best window would have a small main lobe width, a small side peak, and a fast roll off. The problem is that you can't have all of that at the same time. Generally, the smaller the main lobe width, the higher the side peak and vice versa. Of the windows in dfttest, the order from smallest to largest in terms of main lobe width is:

rectangular
bartlett
hamming
hanning
bartlett-hann
blackman
blackman-harris (4-term)
blackman-nuttall
nuttall
blackman-harris (7-term)
flat top

I left out kaiser-bessel because it is adjustable and could end up anywhere in there.

Exactly which window is best for denoising is hard to say, and would also depend on overlap amount, window size, etc... I can say from my tests in the objective denoiser thread that hanning/hamming with large overlap >= 75% and window size around 8-16 seemed to work best for removal of gaussian white noise (in terms of resulting psnr and ssim).

http://en.wikipedia.org/wiki/Image:Window_function_%28comparsion%29.png

This graph from wikipedia has pretty much all of the windows listed above, and shows the main lobe, along with the frequency envelope (you can see the peak side lobe level and how fast the roll off is).

*.mp4 guy
30th November 2007, 04:00
Quote:
Exactly which window is best for denoising is hard to say, and would also depend on overlap amount, window size, etc... I can say from my tests in the objective denoiser thread that hanning/hamming with large overlap >= 75% and window size around 8-16 seemed to work best for removal of gaussian white noise (in terms of resulting psnr and ssim).


Nitpick: it's important to note that addgrain produces noise that may differ from other Gaussian-like noise commonly seen in actual video.

Anyway, sorry for veering a bit off topic; I just wanted to know if the option was available (it's left out surprisingly often when multiple windowing functions are available). I'm sure the default settings dfttest uses are as close to optimal for any given source as you can expect defaults to be.

Chainmax
1st December 2007, 04:11
Quote:
...
Requires libfftw3f-3.dll to be in the search path.
http://www.fftw.org/install/windows.html
...


What exactly does that mean? All I see on that link are compilation instructions.

Dark Shikari
1st December 2007, 04:18
Quote:
What exactly does that mean? All I see on that link are compilation instructions.

The top section on the page is titled "precompiled..." ;)

Chainmax
1st December 2007, 23:03
dfttest already has a pre-compiled DLL included. What I don't understand is what does "Requires libfftw3f-3.dll to be in the search path" mean, which doesn't seem to be clarified in the URL.

Terranigma
1st December 2007, 23:19
The post was taken verbatim from dfttest's .txt document. What it means is that libfftw3f-3.dll needs to be somewhere Windows searches for DLLs, such as the Windows system directory, like fftw3.dll.

Chainmax
4th December 2007, 00:07
Here's a comparison between DeGrainMedian(limitY=5,limitUV=7,mode=0) and dfttest(sigma=1.25) on an extremely crappy source:


Source (enlarged 2x in VDub's preview window):
http://img84.imageshack.us/img84/4918/sourceos5.png

DeGrainMedian(limitY=5,limitUV=7,mode=0):
http://img81.imageshack.us/img81/1800/degrainmedianlimity5limox4.png

dfttest(sigma=1.25):
http://img209.imageshack.us/img209/8708/dfttestsigma125bi2.png


If you want more pics, let me know.

BlueCup
4th December 2007, 05:01
I think I can count the blocks in that first picture, whoa. Is that even watchable?

Chainmax
4th December 2007, 10:06
From a distance :p :(.

It's so sad because it's one of my favorite clips and as far as I know it's only available in this crappy YouTube form. :( x20

Chainmax
26th December 2007, 13:00
It seems dfttest can cause a problem with the order in which frames are displayed. I was trying to re-encode Pirate Baby's Cabana Battle Street Fight 2006 (freeware video, available here (http://www.selectparks.net/modules.php?file=article&name=News&sid=442)), and after encoding, the video would sometimes jerk around as if the wrong field order had been assumed. Here's the filter chain:

SetMemoryMax(512)

LoadPlugin("X:\wherever\MT.dll")
LoadPlugin("X:\wherever\MT_MaskTools.dll")
LoadPlugin("X:\wherever\DGDecode.dll")
LoadPlugin("X:\wherever\DCTFilter.dll")
Import("X:\wherever\DeBlock_QED_MT2.avs")
LoadPlugin("X:\wherever\RemoveGrainSSE3.dll")
LoadPlugin("X:\wherever\dfttest.dll")
Import("X:\wherever\LimitedSharpenFaster.avs")
Import("X:\wherever\Soothe_MT2.avs")

SetMTMode(5)
DirectShowSource("X:\wherever\PaulRobertson_PirateBabysCabanaBattleStreetFight2006.mpg",fps=25,audio=false)
ConvertToYV12()

SetMTMode(2)
DeBlock_QED_MT2(quant1=35,aOff1=16,quant2=45,aOff2=6)
RemoveGrain(mode=5)
Crop(12,52,328,184,align=true)
dfttest(sigma=1.5)
Spline36Resize(320,176)
RemoveGrain(mode=5)
LimitedSharpenFaster(SMode=4,Strength=200)

Commenting out the dfttest line would solve the issue.


[edit] Hmm... commenting out the SetMTMode calls instead also solves the issue. Should I post this in the MT thread as well?

tritical
28th December 2007, 12:18
With the settings you used dfttest isn't even a temporal filter... on every getframe call it simply requests the current frame, filters it, and then returns it. I'm really not sure how it could have any effect on the order of frames.

dfttest might have an issue with SetMTMode(2). Unfortunately, I'm not up to date on how all the SetMTMode and built-in AviSynth multithreading works. I do remember that ColorMatrix initially had some problems working with MT AviSynth, and dfttest uses the same kind of threading model. The main thread creates a set of worker threads, which it maintains in a thread pool, and on every GetFrame call the main thread distributes work to the worker threads. The main thread then waits for the worker threads to finish and returns the final frame. tsp would probably know whether this can work with SetMTMode(2) or not.

On another note, in the currently available version of dfttest I accidentally left 3 function pointers as global variables. This will cause an issue if multiple instances of dfttest are used in the same script and they have different values for ftype or zmean. Also, if sbsize%4==0 in one instance and not in another, there will be a problem. I've fixed this issue in the version on my computer. Hopefully I'll be able to release it this weekend.
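A minimal C sketch of this class of bug and the usual fix is shown below. The two kernels, the struct and the function names are hypothetical; dfttest's actual routines are not shown in this thread. The point is that a routine chosen per instance (by ftype, zmean or sbsize%4 in tritical's description) has to be stored in that instance. If it is stored in a global, the last instance constructed overwrites it for everyone.

```c
/* Signature shared by the hypothetical per-coefficient filter routines. */
typedef void (*FilterFn)(float *coeffs, int n, float sigma);

/* Hypothetical routine A: zero coefficients whose power is below sigma. */
static void filter_threshold(float *c, int n, float sigma) {
    for (int i = 0; i < n; i++)
        if (c[i] * c[i] < sigma) c[i] = 0.0f;
}

/* Hypothetical routine B: attenuate every coefficient by 1 / (1 + sigma). */
static void filter_attenuate(float *c, int n, float sigma) {
    for (int i = 0; i < n; i++) c[i] *= 1.0f / (1.0f + sigma);
}

/* Buggy pattern: a single pointer shared by every instance in the script. */
static FilterFn g_filter;

/* Fixed pattern: the pointer is a member of each filter instance. */
typedef struct {
    int ftype;
    float sigma;
    FilterFn filter;          /* chosen once, in the constructor */
} Instance;

void instance_init(Instance *inst, int ftype, float sigma) {
    inst->ftype = ftype;
    inst->sigma = sigma;
    inst->filter = (ftype == 0) ? filter_threshold : filter_attenuate;
    g_filter = inst->filter;  /* the buggy version: last constructor wins */
}

/* Fixed: each instance calls the routine it selected itself. */
void instance_process(const Instance *inst, float *c, int n) {
    inst->filter(c, n, inst->sigma);
}

/* Buggy: every instance calls whatever the last-built instance selected. */
void instance_process_buggy(const Instance *inst, float *c, int n) {
    g_filter(c, n, inst->sigma);
}
```

Suppose two instances with different ftype values are built in sequence. The buggy path makes the first one silently run the second one's routine, which matches the symptom tritical describes.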

Kumo
12th February 2008, 12:18
I'm finding this filter very useful for a video I'm trying to convert. At the moment I'm using dfttest like this:
source = last
backward_vec3 = source.MVAnalyse(isb = true, delta = 3, pel = 2, overlap=4, sharp=1, idx = 1)
backward_vec2 = source.MVAnalyse(isb = true, delta = 2, pel = 2, overlap=4, sharp=1, idx = 1)
backward_vec1 = source.MVAnalyse(isb = true, delta = 1, pel = 2, overlap=4, sharp=1, idx = 1)
forward_vec1 = source.MVAnalyse(isb = false, delta = 1, pel = 2, overlap=4, sharp=1, idx = 1)
forward_vec2 = source.MVAnalyse(isb = false, delta = 2, pel = 2, overlap=4, sharp=1, idx = 1)
forward_vec3 = source.MVAnalyse(isb = false, delta = 3, pel = 2, overlap=4, sharp=1, idx = 1)
source.MVDegrain3(backward_vec1,forward_vec1,backward_vec2,forward_vec2,backward_vec3,forward_vec3,thSAD=400,idx=1)
dfttest(sigma=1)
with MVDegrain before it. Is there a way to motion-compensate dfttest with MVTools instead of using MVDegrain3 as a previous filter? Can anyone post a sample script? Thanks

Terranigma
12th February 2008, 14:49
(tritical's example, modded)

source=last
vf1=source.mvanalyse(pel=2,blksize=8,isb=false,idx=1,overlap=4,sharp=2,truemotion=true)
vf2=source.mvanalyse(pel=2,blksize=8,isb=false,idx=1,delta=2,overlap=4,sharp=2,truemotion=true)
vb1=source.mvanalyse(pel=2,blksize=8,isb=true,idx=1,overlap=4,sharp=2,truemotion=true)
vb2=source.mvanalyse(pel=2,blksize=8,isb=true,idx=1,delta=2,overlap=4,sharp=2,truemotion=true)
interleave(\
mvcompensate(source,vf2,idx=1,thSCD1=800)\
, mvcompensate(source,vf1,idx=1,thSCD1=800)\
, source\
, mvcompensate(source,vb1,idx=1,thSCD1=800)\
, mvcompensate(source,vb2,idx=1,thSCD1=800))
dfttest(sigma=1)
selectevery(5,2)

elguaxo
12th February 2008, 15:49
Isn't it necessary to have tbsize=3 in dfttest?

Note that with default settings, dfttest works purely spatially (tbsize=1), so it won't benefit at all from the compensation.
You should instead use dfttest(sigma=[something], tbsize=3) in that script, to actually make use of the motion compensation.

Didée
12th February 2008, 18:28
With tbsize=3, dfttest uses 3 frames: current, current-1, current+1.
To use 2 backward and 2 forward compensations in the code Terranigma posted, set tbsize=5.
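The index arithmetic behind Didée's answer can be sketched in C. It assumes, as he describes, a temporal window of odd tbsize centred on the current frame (tmode=0), and it ignores clip-edge handling. Interleave() of five clips places clip j of source frame n at position 5n+j. In Terranigma's script, clips 0..4 are the vf2 and vf1 compensations, the source, and the vb1 and vb2 compensations. SelectEvery(5,2) keeps position 5n+2, the uncompensated source. The helper names are made up for illustration.

```c
/* Position of a frame inside the interleaved clip: source frame and sub-clip. */
typedef struct { int frame; int clip; } Slot;

/* Interleave() of nclips clips puts clip j of source frame n at index
   n*nclips + j (valid for non-negative indices). */
static Slot slot_of(int i, int nclips) {
    Slot s = { i / nclips, i % nclips };
    return s;
}

/* Slots seen by a centred temporal window of odd length tbsize when the
   output is taken at interleaved index n*nclips + center, which is what
   SelectEvery(nclips, center) keeps. Writes tbsize entries to out. */
void window_slots(int n, int nclips, int center, int tbsize, Slot *out) {
    int mid = n * nclips + center;
    for (int k = 0; k < tbsize; k++)
        out[k] = slot_of(mid - tbsize / 2 + k, nclips);
}
```

With tbsize=5, the window covers exactly the five versions of frame n. With tbsize=3, it only reaches the ±1 compensations. Anything larger than 5 starts pulling in the neighbouring frames' compensated copies.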

Adub
12th February 2008, 18:44
I find it interesting how people just copy and paste the script information from that DV cleaning thread, even after Didee already explained this.

Terranigma
12th February 2008, 18:55
Quote:
I find it interesting how people just copy and paste the script information from that DV cleaning thread, even after Didee already explained this.

...and what I find interesting, is that I missed it, since I don't go searching and waiting for all posts by Didée. :D

elguaxo
12th February 2008, 19:21
Quote:
With tbsize=3, dfttest uses 3 frames: current, current-1, current+1.
To use 2 backward and 2 forward compensations in the code Terranigma posted, set tbsize=5.

:thanks:

Chainmax
17th February 2008, 03:55
Quote:
With the settings you used dfttest isn't even a temporal filter... on every getframe call it simply requests the current frame, filters it, and then returns it. I'm really not sure how it could have any effect on the order of frames.
...
On another note, in the currently available version of dfttest I accidentally left 3 function pointers as global variables. This will cause an issue if multiple instances of dfttest are used in the same script and they have different values for ftype or zmean. Also, if in one instance sbsize%4==0 and in another it doesn't then there will be a problem. I've fixed this issue in the version on my computer. Hopefully I'll be able to release it this weekend.
If you can, please set default values that make it act as a 3D filter, so that lazy filterers like me only have to change the sigma value :p.

tritical
19th February 2008, 01:22
I'll change it in the next version.

murrsturr
30th April 2008, 01:43
Thanks Tritical. Very much appreciated.

kwak
7th May 2008, 17:07
Hi Guys :) this is my first post:D

I'm having trouble finding the right directory for the libfftw3f-3.dll file on Windows Vista 64-bit. I've used this filter in XP without problems, and I've tried a solution from the FFTW site without success.

Please help me :)

:thanks: forward to all

lol_123
7th May 2008, 17:43
I am confused about the term "denoise". When you guys say it, what exactly does it mean? Removing MPEG mosquito noise?

R3Z
8th May 2008, 07:58
Quote:
I am confused about the term "denoise". When you guys say it, what exactly does it mean? Removing MPEG mosquito noise?

Predominantly random dots and particles that are not part of the actual detail of the movie. Denoising also covers film grain and other small imperfections.

foxyshadis
8th May 2008, 08:08
kwak, if you can't get it into the windows system32 folder (or syswow64 in x64) then you have to put it into the same folder as the program you're running, or another folder that's on the "path".

kwak
9th May 2008, 19:41
thanks foxyshadis, now dfttest runs great also in vista :)

Chainmax
20th July 2008, 22:44
So, for using its temporal component, all one has to set is tbsize to something higher than 1, right?

Adub
30th July 2008, 17:54
Yes. I usually find 3-5 a good range.

weisskreuz
24th January 2009, 00:19
Thanks, it gives an amazing result.
But I've got a problem: when I use dfttest on Win2003, the output has random black/green bars (on random frames, at random locations).
I tried AviSynth 2.5.6 through 2.5.8, and also tried putting only dfttest and the default AviSynth filters in the plugins directory, updating FFTW and updating drivers, but couldn't solve the problem.

This clip shows what I mean:
http://www.mediafire.com/download.php?jgtn2el3xjn


I'm using:
Microsoft Windows Server 2003, Enterprise Edition SP2
QuadCore Intel Xeon E5335, 2000 MHz (6 x 333)
Intel Sapello S5000VSA (1 PCI, 2 PCI-E x8, 2 PCI-X, 4/8 FB-DIMM, Video, Dual Gigabit LAN)
4089 MB (DDR2-667 Fully Buffered ECC DDR2 SDRAM)
ATI ES1000 (16 MB)

masterkivat
24th January 2009, 00:42
I had that issue too, but I've since reformatted my PC (now running Vista x64 with updated drivers), and so far I haven't seen it again. I hope tritical can tell us what we must do to avoid it :D

mikeytown2
24th January 2009, 21:38
Try it with mod16 width and height. That usually takes care of problems related to random green.

tritical
24th January 2009, 23:15
Please post the scripts causing problems.

tritical
25th January 2009, 05:58
I finished/cleaned up the code for the next version that I had from November 08. Try it, and see if it fixes the problems... dfttest v1.2 (http://bengal.missouri.edu/~kes25c/dfttestv12.zip). Changes:

+ added filter types 3/4 and corresponding parameters (sigma2,pmin,pmax,sfile2,pminfile,pmaxfile)
+ more asm optimizations
- fixed problem with global function pointers and multiple instances in the same script
- changed name of 'cfile' parameter to 'sfile'
- the value given for sigma is no longer squared on initialization
- sigma now defaults to 2.0
- tbsize now defaults to 5

Note that tbsize now defaults to 5... so it will be slow as hell :p.

Dark Shikari
25th January 2009, 06:32
Oh come on, you must be able to make this one faster:

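void addMean_SSE(float *dftc, const int ccnt, const float *dftc2)
{
    __asm
    {
        mov edx,dftc        ; edx -> dftc (16-byte aligned, updated in place)
        mov edi,dftc2       ; edi -> dftc2 (16-byte aligned)
        mov ecx,ccnt        ; ecx = number of floats (multiple of 4)
        xor eax,eax         ; eax = float index
four_loop:
        movaps xmm0,[edx+eax*4]
        addps xmm0,[edi+eax*4]
        movaps [edx+eax*4],xmm0
        add eax,4           ; 4 floats per iteration
        cmp eax,ecx
        jl four_loop
    }
}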

---> (not tested but you get the idea)

void addMean_SSE(float *dftc, const int ccnt, const float *dftc2)
{
    __asm
    {
        mov edx,dftc        ; edx -> dftc
        mov ecx,dftc2       ; ecx -> dftc2
        mov eax,ccnt        ; eax counts down from ccnt (assumes ccnt % 16 == 0)
four_loop:
        movaps xmm0,[edx+eax*4-16]
        movaps xmm1,[edx+eax*4-32]
        movaps xmm2,[edx+eax*4-48]
        movaps xmm3,[edx+eax*4-64]
        addps xmm0,[ecx+eax*4-16]
        addps xmm1,[ecx+eax*4-32]
        addps xmm2,[ecx+eax*4-48]
        addps xmm3,[ecx+eax*4-64]
        movaps [edx+eax*4-16],xmm0
        movaps [edx+eax*4-32],xmm1
        movaps [edx+eax*4-48],xmm2
        movaps [edx+eax*4-64],xmm3
        sub eax,16          ; 16 floats (64 bytes) per iteration
        jg four_loop
    }
}

(Even if it isn't guaranteed to be mod64, it probably costs less to modify the code to allow it than to have such a short loop...)
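The same unrolling can be written with SSE intrinsics, and a remainder loop then lifts the mod-16 requirement. This is a hedged C sketch, not dfttest's code. The name `addMean_unrolled` is hypothetical, and it uses unaligned loads/stores so it works on any float array. The asm above uses movaps and needs 16-byte-aligned buffers.

```c
#include <xmmintrin.h>

/* dftc[i] += dftc2[i] for i in [0, ccnt).
   Main loop: 16 floats per iteration as four independent 128-bit adds.
   Then a 4-wide loop and a scalar loop handle any remainder, so ccnt
   does not have to be a multiple of 16 (or of 4). */
void addMean_unrolled(float *dftc, int ccnt, const float *dftc2) {
    int i = 0;
    for (; i + 16 <= ccnt; i += 16) {
        __m128 a0 = _mm_loadu_ps(dftc + i);
        __m128 a1 = _mm_loadu_ps(dftc + i + 4);
        __m128 a2 = _mm_loadu_ps(dftc + i + 8);
        __m128 a3 = _mm_loadu_ps(dftc + i + 12);
        a0 = _mm_add_ps(a0, _mm_loadu_ps(dftc2 + i));
        a1 = _mm_add_ps(a1, _mm_loadu_ps(dftc2 + i + 4));
        a2 = _mm_add_ps(a2, _mm_loadu_ps(dftc2 + i + 8));
        a3 = _mm_add_ps(a3, _mm_loadu_ps(dftc2 + i + 12));
        _mm_storeu_ps(dftc + i, a0);
        _mm_storeu_ps(dftc + i + 4, a1);
        _mm_storeu_ps(dftc + i + 8, a2);
        _mm_storeu_ps(dftc + i + 12, a3);
    }
    for (; i + 4 <= ccnt; i += 4)            /* leftover groups of 4 */
        _mm_storeu_ps(dftc + i, _mm_add_ps(_mm_loadu_ps(dftc + i),
                                           _mm_loadu_ps(dftc2 + i)));
    for (; i < ccnt; i++)                    /* leftover single floats */
        dftc[i] += dftc2[i];
}
```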

Also, sqrtps is really slow. Apparently rsqrtps is 3 cycles, while sqrtps is up to 58. And rcpps is just 3, so it's far faster to do rsqrtps + rcpps than to do sqrtps. At least on the Core 2.
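The rsqrtps + rcpps trick maps onto two SSE intrinsics. The C sketch below places it next to the exact version. The cycle counts are Dark Shikari's Core 2 figures and are not reproduced here. Intel documents a maximum relative error of 1.5 x 2^-12 for each of rsqrtps and rcpps, so the result is only good to roughly 3 decimal digits. Whether that is acceptable for dfttest's magnitudes is an assumption to check.

```c
#include <xmmintrin.h>

/* Exact square root of four floats (sqrtps). */
__m128 sqrt_exact(__m128 x) {
    return _mm_sqrt_ps(x);
}

/* Approximate square root as 1 / (1 / sqrt(x)): rsqrtps followed by rcpps.
   For x == 0, rsqrtps gives +inf and rcpps(+inf) gives +0, so zero maps to zero. */
__m128 sqrt_approx(__m128 x) {
    return _mm_rcp_ps(_mm_rsqrt_ps(x));
}
```

A common alternative is x * rsqrt(x), which saves the rcpps but maps x == 0 to NaN (0 * inf) unless that case is masked out.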

By the way, any commentary about the merits of the various ftypes?

tritical
25th January 2009, 22:22
Unrolling there does make addmean faster, but it isn't one of the more time consuming functions. However, it did make more of a difference in overall speed than I thought it would based on previous profiling results:

smode=1, sbsize=12, sosize=9, tmode=0, tbsize=5
libfftw3f-3.dll    67.75
dfttest.dll        19.03
    filter_0_SSE       38.16
    proc0_SSE2_4       25.03
    removeMean_SSE     14.89
    proc1_SSE          10.22
    addMean_SSE         7.31
    func_1              2.38

smode=1, sbsize=12, sosize=9, tbsize=1
libfftw3f-3.dll    48.79
dfttest.dll        28.34
    proc1_SSE          31.01
    filter_0_SSE       28.62
    proc0_SSE2_4       14.42
    removeMean_SSE      9.96
    addMean_SSE         6.34
    func_0              3.15

So I changed that, made a few other small tweaks, and fixed tmode=1 operation so that during linear access it doesn't have to recalculate all involved temporal blocks on every frame. Will put it in the next version later tonight.

ftype=0/1 are probably the only useful ones for typical video denoising. The others allow for lots of frequency domain effects, but to really be useful you'd probably have to spend the time to specify different sigma/sigma2/pmin/pmax values for each dft coefficient. Plus, as there is currently no option to show the dft coefficients/psd values it would be rather difficult to set all of those. I have plans to add a 'show' option of some kind.

Dark Shikari
25th January 2009, 22:30
Since I'm rather interested in this going faster, is there any general reason why you can't do the transforms in integer math?

I imagine that special-casing transforms could make things a whole lot faster, though switching the whole program over to DCT/iDCT instead of FFT might be quite complicated.

weisskreuz
25th January 2009, 23:55
Quote:
Try it with mod16 width and height. That usually takes care of problems related to random green.

:rolleyes: 512x288 is already mod 16 & 32



Quote:
Please post the scripts causing problems.

directshowsource("cb_nico_04.mp4")
dfttest(sigma=0.8)
crop(0,48,0,-48)# video height is 384px with borders
trim(0,3000)

It doesn't seem to be caused by the script or the video, because I also tested it on XP and found no problems.



Quote:
I finished/cleaned up the code for the next version that I had from November 08. Try it, and see if it fixes the problems... dfttest v1.2 (http://bengal.missouri.edu/~kes25c/dfttestv12.zip).


:D Thx tritical

The problem still exists. Here is a 301-frame comparison between 1.1 and 1.2 with another video:
http://www.mediafire.com/download.php?dnyj4z4nnzy

Also tested with default settings; script:

dss2("Druaga2_tvs_03.mp4")
dfttest(sigma=1)
trim(0,300)

Sagekilla
26th January 2009, 00:25
I assume that was a typo and you mean 512x256 yes? ;)

weisskreuz
26th January 2009, 01:24
sorry, that's 512x288(16:9)

post update

Caroliano
26th January 2009, 03:39
Regarding the filter going faster: as dfttest seems to be similar to FFT3DFilter, how difficult would it be to make a "dfttGPU"? Or am I comparing apples to oranges here?

I would love to see that, because most of my encoding time is dfttest, and I can't drop it, because it is so good in anime.

tritical
27th January 2009, 09:43
I put up [link removed], changes:

+ some small assembly optimizations
+ tmode=1 caching (don't need to recalculate all involved temporal blocks on every frame)
- replicate temporal dimension at beginning/end, don't mirror


@weisskreuz
Please try v1.3. If the problem with win2003 is still there then I have no idea what is causing it.

@Dark Shikari
Quote:
Since I'm rather interested in this going faster, is there any general reason why you can't do the transforms in integer math?

I imagine that special-casing transforms could make things a whole lot faster, though switching the whole program over to DCT/iDCT instead of FFT might be quite complicated.
I don't have any experience with integer transforms, but if you can only do transformations of integer inputs then it wouldn't be useable because the windowing function needs to be applied beforehand. I guess you could do scaling/integer approximation. You probably know more than I do about it. Switching from DFT to DCT wouldn't take that much work.

IMO, dfttest isn't that slow in spatial only mode (tbsize=1). The default of 12x12 blocks with 75% overlap is simply a lot of calculations. More reasonable settings, 16x16 w/ 25-50% overlap, are quite fast. Of course, the results aren't usually as good. dfttest is slow with tbsize>1, especially with tmode=0, because it computes full 3D transforms instead of computing 2D transforms on each frame, storing the results, and then applying the temporal window and 1D transform as needed to those. That method uses significantly more memory, especially for large overlaps, but eliminates a lot of redundant calculations. I've started making the changes to do this, but it is going to take a little while to complete.

@Caroliano
Quote:
Regarding the filter going faster: as dfttest seems to be similar to FFT3DFilter, how difficult would it be to make a "dfttGPU"? Or am I comparing apples to oranges here?
Let's just say that I will not be the one to make a dfttGPU :p.

tritical
7th April 2009, 03:26
I forgot to upload v1.4 with the fix for weisskreuz's issue from February. It's now up on my website. Turns out the problem was caused by threads>1 creating the possibility for multiple threads to try to update the same memory location (add their value to the current value) simultaneously. Well, I should say I think that is what the problem was, as fixing that problem also fixed his issue. However, for some reason I could never make threads>1 result in significant differences (more than +-1 for a given pixel value) on my quadcore machine running XP. Unfortunately, the fix does slow it down a little bit.

tritical
11th April 2009, 21:09
I've been doing some work on dfttest, v1.5 is up on my website.

I've added an exponent to the 0 filter type:

mult = max((psd-sigma)/psd,0)^f0beta

1.0 (the previous default) corresponds to a Wiener filter with spectral subtraction as the signal power estimate. If you set f0beta=0.5 you get spectral subtraction, which I've been finding to be slightly better in ssim/psnr tests of removing white noise. Other values are also possible (actually I got even better ssim/psnr with f0beta=[0.2,0.3]), but are not typically used. The 1.0 and 0.5 cases have special code paths for speed; other cases use a general routine that needs to compute pow(), so they are slower.

I also added the ability for dfttest to estimate the noise spectrum based on pure noise blocks that the user can specify using the 'nfile' parameter. This way dfttest can be more easily used for noise that doesn't have a flat power spectrum, but the noise still needs to be stationary (power spectrum doesn't change).

Finally, sigma/sigma2, for ftype<2, and pmin/pmax are now normalized based on the non-coherent power gain... so sigma corresponds directly to power regardless of the window size/function.

I'm also playing with some SNR adaptive spectral subtraction techniques, one of which should make it into v1.6.

vampiredom
12th April 2009, 19:18
I have a question; and please pardon me if it seems silly or obvious: Would it not be a good idea to include an interlaced=true/false option in dfttest (like exists in FFT3DFilter)?

In the absence of such an option, what is the best way to handle interlaced sources? Should we use SeparateFields().dfttest().Weave() ?

Thanks! (the denoiser quality looks great, BTW)

tritical
13th April 2009, 02:59
I don't include an interlaced parameter, and probably never will, because IMO you are better off handling it yourself... with separatefields().dfttest().weave() as you suggested, or more involved, but usually better quality, approaches such as bobbing the video (with a bobber that preserves the original fields), filtering, and then re-interlacing. The many possible approaches have been discussed before in various threads. Here is one I found: http://forum.doom9.org/showthread.php?t=112551. I thought there were some more recent ones, but can't seem to find any right now.

vampiredom
13th April 2009, 05:38
Thanks for the clarification. I think I understand what you mean: a simple separate -> filter -> weave does not apply the filter to a realistic spatial representation of the image.

Yes, bobbing beforehand is the "quality" way to go; but it requires (at least) twice the processing time.

Out of curiosity: does anyone happen to know if the interlaced=true option in FFT3DFilter simply separates and then re-weaves the fields, or is there more going on behind the scenes? Does it force different handling of the chroma planes (when chroma denoising is selected) for YV12 sources?

nonsens112
2nd June 2009, 05:00
tritical, thanks for such a great denoiser.

but since it is rather slow, especially with big tbsize values, it would be great to add a clip parameter which can locally modify sigma.
I noticed that when using MC+dfttest, it's good to apply dfttest with a bigger sigma in bad-SAD areas and to almost disable dfttest (replacing it with mdegrain) in areas with too little local contrast; otherwise it eats some weak detail. So I have to blend two or even three instances of dfttest whenever I just need to control its sigma locally with an external mask.

hope you'll consider such ability in further versions ;)

nonsens112
2nd June 2009, 20:04
another similar idea which also looks attractive in MC+dfttest(big tbsize) case is external clip that controls local tbsize.

In high-motion or heavily flickering areas, big tbsize values don't work as well as small ones, because most of the compensated frames in such areas violate the SAD limits, so most of the 3D block ends up coming from the current frame. In static areas, the bigger tbsize is, the better the detail retention and stability that can be obtained. For noisy DVD sources, the maximum tbsize should be larger than the GOP size, which is about 16 frames if I'm not mistaken.
Thus, blending multiple instances of dfttest with various tbsize values (for example, from 3 to 17) seems like a good idea quality-wise, but speed-wise (and memory-wise, if done in one pass) it is impractical right now.
Adding external control of the local tbsize could easily solve this problem, I think.

tritical
2nd June 2009, 23:53
Modifying sigma while the filter is running based on an external mask is easy enough. How do you want the mask clip to be interpreted? How should sigma change based on it?

Modifying tbsize while the filter is running would require more substantial code changes. Atm, the 3D window/transform are computed during initialization. I guess the easiest way would be to compute all possible windows/transforms for the varying tbsize range, and then switch between those during runtime. How do you want the external mask to be interpreted? I also had the idea of limiting tbsize based on sad or ssd to the center frame block as you move outwards... if the difference exceeds some threshold then stop (sort of how most temporal denoisers with adaptive radius work).
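The adaptive-radius idea at the end of the paragraph above can be sketched as follows. This is a hypothetical illustration (the function names, the symmetric growth, and the plain SAD measure are assumptions), not dfttest code.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def adaptive_tbsize(blocks, center, max_radius, threshold):
    """Grow the temporal radius outward from the center block and stop as soon
    as either neighbor at the new distance differs by more than threshold.

    blocks: co-located (e.g. motion-compensated) blocks, one per frame.
    The radius grows symmetrically, so the result is always odd.
    """
    radius = 0
    for d in range(1, max_radius + 1):
        lo, hi = center - d, center + d
        if lo < 0 or hi >= len(blocks):
            break  # ran out of frames on one side
        if sad(blocks[lo], blocks[center]) > threshold or sad(blocks[hi], blocks[center]) > threshold:
            break  # difference too large: stop expanding
        radius = d
    return 2 * radius + 1
```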

In other dfttest news, a few weeks back I worked on adding local neighborhood (i.e. neighboring blocks) smoothing of filtered power spectra prior to taking inverse fft and combining results. This is similar to smoothing the short time fourier spectra when using spectral subtraction in audio processing in order to reduce 'musical' noise. I only added it to the 2D (spatial only) denoising case. It works quite well to remove the typical artifacts of fft based denoising, especially when using strong settings. Applying it to the 3D case is more difficult though.

nonsens112
3rd June 2009, 18:47
Modifying sigma while the filter is running based on an external mask is easy enough. How do you want the mask clip to be interpreted? How should sigma change based on it?
I think the most natural way is: block's sigma = ("sigma" parameter) * (luma of ext clip averaged on spatial block of current frame) / 255
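The proposed mapping written out as a small Python function. block_sigma is a hypothetical name, and the mask block is assumed to be a flat list of 8-bit luma values covering the current spatial block.

```python
def block_sigma(sigma: float, mask_block: list) -> float:
    """Proposed mapping: sigma scaled by the block's average mask luma (0-255)."""
    avg = sum(mask_block) / len(mask_block)
    return sigma * avg / 255.0
```

A white mask keeps the full sigma, a black mask disables filtering for that block, and intermediate values scale sigma linearly.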

Modifying tbsize while the filter is running would require more substantial code changes. Atm, the 3D window/transform are computed during initialization. I guess the easiest way would be to compute all possible windows/transforms for the varying tbsize range, and then switch between those during runtime. How do you want the external mask to be interpreted? I also had the idea of limiting tbsize based on sad or ssd to the center frame block as you move outwards... if the difference exceeds some threshold then stop (sort of how most temporal denoisers with adaptive radius work).
I think the best way to interpret ext mask is: block's tbsize = 2 * round(("tbsize" parameter/2 -1) * (255 - ext clip luma at center of block) / 255) + 1
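The proposed tbsize mapping as a Python sketch. It reads "tbsize/2" as floating-point division and rounds half up (Python's built-in round() rounds half to even, which would change some results); with that reading a black mask pixel yields the full tbsize and a white one yields 1. block_tbsize is a hypothetical name.

```python
import math


def block_tbsize(tbsize: int, mask_luma: int) -> int:
    """Proposed mapping: 2 * round((tbsize/2 - 1) * (255 - luma) / 255) + 1."""
    scale = (255 - mask_luma) / 255.0
    # round half up explicitly; Python's round() would send 2.5 to 2
    half = math.floor((tbsize / 2.0 - 1.0) * scale + 0.5)
    return 2 * half + 1  # always odd, so the current frame stays centered
```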
limiting tbsize based on sad or ssd to the center frame block as you move outwards is a good idea. the only drawback I see here is double SAD/SSD computation when using MC - in mvanalyse and here. if I'm not wrong, mvmask doesn't compute SAD but just takes it from vectors' clip, so using ext clip based on mvmask could be faster. whether it really matters compared to your DFT magic or not is the other question :)

Also, as a possible further improvement of this idea, I would consider determining the temporal borders independently instead of limiting tbsize, so the current frame would not have to be at the center of the block. External control would require two clips in this case: forward and backward temporal depth. I'd expect much better stability, especially at scene changes. AFAIK, mdegrain works this way.

tritical
5th June 2009, 02:33
@nonsens112
Your suggestions seem fine to me. Making the temporal radius unequal going backwards/forwards is rather difficult taking into account windowing. For now I'll stick with only the first two items :). No guarantees on when it will get done though.

@all
I discovered a bug in dfttest's window normalization under the following scenarios:

tbsize>1, tmode=0
consequence is a rectangular temporal window regardless of what twin is set to

sbsize>1, smode=0
consequence is a rectangular spatial window regardless of what swin is set to

In the smode=tmode=0 case you got rectangular for both temporal/spatial.

I should have a fix up later tonight.

nonsens112
8th June 2009, 16:59
Making the temporal radius unequal going backwards/forwards is rather difficult taking into account windowing.
I wonder what's difficult: current implementation limitations, possible quality issues (3D ringing or the like), something else, or a mixture? :)
I'll stick with only the first two items. No guarantees on when it will get done though.
that's great anyways :) hope the features will be useful to many people.

Keiyakusha
9th July 2009, 02:02
Hi. Can someone please tell me what the options of this filter actually do? I mean, when do I need to adjust them? I have read the documentation but I don't understand it.

Y,U,V - If true, the corresponding plane is processed. Otherwise, it is copied through to the output image as is. default: true,true,true
So I need to add u=false,v=false parameters if I want to process only luma?
ftype - Controls the filter type. default: 0 - generalized wiener filter; mult = max((psd-sigma)/psd,0)^f0beta
I guess I'd better leave it at the default?
sigma,sigma2 - Value of sigma and sigma2 (used as described in ftype parameter description)
Is the "ftype parameter description" that formula? I don't understand what it means. Is this option something like sigma in FFT3DFilter?
pmin,pmax - Used as described in the ftype parameter description.
Again, I have no idea what it means or when I should adjust it.
tmode - Sets the mode for temporal operation
Is the default good most of the time?
swin,twin - Sets the type of analysis/synthesis window to be used for spatial (swin) and temporal (twin) processing.
Should I adjust these parameters, or are the defaults good?

Sorry for dumb questions. Please give me a link if I can read about it somewhere else.

Nightshiver
9th July 2009, 02:23
The sigmas are the strength of the filter: the higher you set them, the stronger the denoising. For just about every use you will have for this filter, you will not need to alter anything except sigma.

tritical
9th July 2009, 06:22
So I need to add u=false,v=false parameters if I want to process only luma?
yep.

For general denoising, only ftype=0 and ftype=1 are useful. ftype=0 actually becomes ftype=1 as the value for f0beta approaches 0. pmin/pmax/sigma2 are only used in ftype>1 so you don't need to worry about those. ftype>1 is really only useful if you plan to use the 'sfile' parameter to specify sigma values for specific fft coefficients... in which case you can accomplish all sorts of frequency domain manipulations.
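A quick numeric check of the statement that ftype=0 approaches ftype=1 as f0beta goes to 0. The ftype=1 function below is an assumed hard threshold (keep bins with psd >= sigma, zero the rest); a bin whose psd equals sigma exactly is an edge case where the two differ.

```python
def ftype0_gain(psd: float, sigma: float, f0beta: float) -> float:
    """Generalized Wiener gain (ftype=0)."""
    return max((psd - sigma) / psd, 0.0) ** f0beta


def ftype1_gain(psd: float, sigma: float) -> float:
    """Assumed ftype=1 behaviour: hard threshold on the bin's power."""
    return 0.0 if psd < sigma else 1.0
```

For psd above sigma, (psd-sigma)/psd is a positive number below 1, and raising it to a power near 0 gives almost exactly 1; for psd below sigma the base is clamped to 0 and stays 0.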

As for tmode, I would always use 0. 1 is faster (it's the same idea as smode=0 vs smode=1 in the spatial domain), but in the temporal domain it causes artifacts. swin/twin you could play around with. The defaults are what I've found to work best in most cases. They also worked best in the ssim/psnr tests I've done. See http://en.wikipedia.org/wiki/Window_function for some information on the different windows.

If you are trying to remove noise that isn't white, the nfile parameter (which allows you to specify blocks containing only noise, from which dfttest estimates the noise power spectrum) can be very useful.

Keiyakusha
9th July 2009, 14:32
Oh, big thanks for the explanation!:thanks:

rkalwaitis
27th July 2009, 08:36
Tritical,

I am trying to use the nfile option and I receive an error that says "unable to open nfile". Is there something special I should be doing? Perhaps a little example? Thanks. Also, is there a way to make nfile process each frame and remember the results for use by the denoiser, instead of working on an average?

Thanks

tritical
28th July 2009, 19:55
I am trying to use the nfile option and I receive an error that says "unable to open nfile". Is there something special I should be doing? Perhaps a little example?
What is your script? What is in your nfile? When I use the nfile option it works fine... so I need a little more information about your problem.

Also, is there a way to make nfile process each frame and remember the results for use by the denoiser, instead of working on an average?
You actually have a clip where the noise power spectrum is changing on every frame, and the same position on the image is always flat with just noise? Or do you intend to manually specify locations for every frame?

rkalwaitis
3rd August 2009, 21:03
Tritical, I have no real script that uses the option. I was trying to make it work from your readme file. I thought that the nfile took a sample of a frame and used that to compute denoising. I was just wondering if it could do it frame by frame.

Thanks K

onesloth
3rd August 2009, 21:48
I thought that the nfile took a sample of a frame and used that to compute denoising. I was just wondering if it could do it frame by frame.
Somebody correct me if I'm wrong but my understanding is that one uses nfile by defining an area of one frame from the clip where there is nothing but noise (e.g. a flat color background). The analysis of that noise area is used to tweak the denoising parameters for the whole clip *not* just the frame in which the area is defined.

I think Tritical's point is that he would be really surprised if the noise in your clip changed significantly from frame to frame (thus requiring a new noise sample) and that it would likely be impossible for you to find a flat noise-only area in *every* frame that would be suitable for defining the noise.

rkalwaitis
4th August 2009, 16:13
This makes sense to me. Thanks for the explanation. Thanks, Tritical, for the great denoiser.

lansing
3rd September 2009, 18:35
my script was this:

MPEG2Source("sample.d2v", info=3)
colormatrix(hints=true)

dfttest()

And I'm seeing ghosting in some dark areas. Notice the blue coat.

Original:
http://img2.imageshack.us/img2/8519/originalz.png

dfttest:
http://img40.imageshack.us/img40/4609/dfttestghosting.png


I've uploaded a sample here (http://www.mediafire.com/download.php?tdqiwqynztf).

tritical
3rd September 2009, 19:20
Have you tried reducing tbsize and/or sigma, or running dfttest on a motion compensated version of the video? dfttest is definitely not immune to ghosting when using 3d filtering.

mgh
3rd September 2009, 20:43
dfttest is excellent at removing compression artefacts such as blocking and also at removing the noise you get when capturing analog tv from mildly weak channels. Have seen its deblocking powers mentioned in other threads here but not in this one:)
Nothing else I have tried is as good as dfttest at preserving details.
thanks for a superb filter.

lansing
3rd September 2009, 21:31
Have you tried reducing tbsize and/or sigma, or running dfttest on a motion compensated version of the video? dfttest is definitely not immune to ghosting when using 3d filtering.

I get what you mean now. I decreased the tbsize to 1 and the ghosting went away. Thanks for the help.

yup
2nd October 2009, 08:11
Hi all!
I wrote a simple script for deblocking VCDs (my daughter has a big collection of anime on VCD, and I want to transfer it to DVD, replacing 6-8 VCDs with one DVD DL). The result is very good.
MPEG2Source("AVSEQ01.d2v",5,6,info=3)
source=ColorMatrix(hints=true)
sigma=110*1.5
sigmamc=110
# Spatial-only (tbsize=1) prefilter used only for motion search: luma with 8x8 blocks, chroma with 16x16 blocks
sourcef=source.dfttest_dfttest(Y=true,U=false,V=false,tbsize=1,ftype=1,sbsize=8,sosize=6,sigma=sigma)
sourcef=sourcef.dfttest_dfttest(Y=false,U=true,V=true,tbsize=1,ftype=1,sbsize=16,sosize=12,sigma=1.5*sigma)
# Vectors are searched on the prefiltered clip; compensation below is applied to the unfiltered source
sclip = source.msuper()
sclipf = sourcef.msuper()
vf1=manalyse(sclipf,isb=false,delta=1,overlap=4,badSAD=1000)
vf2=manalyse(sclipf,isb=false,delta=2,overlap=4,badSAD=1000)
vf3=manalyse(sclipf,isb=false,delta=3,overlap=4,badSAD=1000)
vb1=manalyse(sclipf,isb=true,delta=1,overlap=4,badSAD=1000)
vb2=manalyse(sclipf,isb=true,delta=2,overlap=4,badSAD=1000)
vb3=manalyse(sclipf,isb=true,delta=3,overlap=4,badSAD=1000)

# Neighbors n-3..n-1, the current frame, and n+1..n+3, interleaved in groups of 7
mcomp7 = interleave(\
mcompensate(source,sclip,vf3,thSCD1=600)\
, mcompensate(source,sclip,vf2,thSCD1=600)\
, mcompensate(source,sclip,vf1,thSCD1=600)\
, source\
, mcompensate(source,sclip,vb1,thSCD1=600)\
, mcompensate(source,sclip,vb2,thSCD1=600)\
, mcompensate(source,sclip,vb3,thSCD1=600))
# tbsize=7 filters each group of 7 temporally; selectevery(7,3) keeps the filtered center frame
chroma=mcomp7.dfttest_dfttest(Y=false,U=true,V=true,tbsize=7,ftype=1,sbsize=16,sosize=12,sigma=1.5*sigmamc).selectevery(7,3)
mcomp7.dfttest_dfttest(Y=true,U=false,V=false,tbsize=7,ftype=1,sbsize=8,sosize=6,sigma=sigmamc).selectevery(7,3)
# Combine the filtered luma (last) with the separately filtered chroma
MergeChroma(chroma)
#StackVertical(source,last)

I use different spatial block sizes for luma and chroma because the source is YV12, so the chroma resolution is half that of luma (please advise).
Also, I think it would be useful to find an optimal matrix (sfile parameter) based on the MPEG quantizer, but how? For analog captures I use only sigma, and I increase sigma for chroma.
Any suggestions are welcome!
With kind regards, yup.

VincAlastor
21st November 2009, 09:35
dfttest is way too slow for HD encoding. Can I use the GPUFFTW library or a CUDA library instead? (http://gamma.cs.unc.edu/GPUFFTW/)

markanini
21st November 2009, 17:16
dfttest is way too slow for HD encoding. Can I use the GPUFFTW library or a CUDA library instead? (http://gamma.cs.unc.edu/GPUFFTW/)
Wow, that would be sweet! :D

Vitaliy Gorbatenko
25th November 2009, 07:43
That project seems to be abandoned; it hasn't been updated. I think we should look for an FFT implementation in OpenCL, as it is the most promising and, unlike CUDA, universal.

Great Dragon
28th March 2010, 09:54
It's a great filter you've made, tritical. Thanks.
And I have one question: can I completely disable temporal filtering and use only the spatial features?

nonsens112
28th March 2010, 10:56
It's a great filter you've made, tritical. Thanks.
And I have one question: can I completely disable temporal filtering and use only the spatial features?
tbsize=1 makes filter spatial.

Dogway
27th May 2010, 01:34
I'm wondering whether this is the spiritual successor of tnlmeans. Is it?
I use tnlmeans because of its wonderful results on anime, although it's deadly slow and blurs a little in undefined areas. It's also abandoned and no longer supported.

Is there a dfttest setting aimed at animation, or one similar to tnlmeans(ax=3,ay=3,az=6,sx=2,sy=2,bx=1,by=1,h=0.6,a=1000.0,sse=false)?

EDIT:
http://img64.imageshack.us/img64/9092/compac.th.png (http://img64.imageshack.us/img64/9092/compac.png)

tritical
1st June 2010, 04:31
dfttest isn't related to tnlmeans in any way other than being a denoiser. dfttest works in the frequency domain. tnlmeans is basically bilateral filtering with no distance weighting and similarity weighting based on local neighborhood similarity (difference) instead of individual pixel difference. The only way to find equivalent settings (if they even exist) is to test I'm afraid.

Dogway
1st June 2010, 05:54
Thanks for the answer. I added the adjective "spiritual" only because it's the next denoiser you made after tnlmeans. I haven't tested dfttest beyond sigma and tbsize yet, but at default settings the results don't seem as animation-oriented as tnlmeans'. That's why I'm trying to speed up tnlmeans with MT+ms=true (which doesn't seem to work), or to find a good dfttest setting for this kind of source. But it's good to know that, unlike tnlmeans with sse=false, dfttest doesn't have any specific setting that would undoubtedly work better for animation.

foxyshadis
3rd June 2010, 22:53
I think the noise power spectrum analysis might be a little off; it smooths the image way out. For instance, I get this on one image:

0.013 18650.967 1413.832 255.401 143.324 45.611 0.009
43671.273 846.859 561.533 310.418 87.381 18.431 7.488
4325.101 438.551 167.673 90.019 45.118 18.157 23.700
134.528 189.763 4.576 86.177 57.927 23.606 24.473
13.794 7.955 4.505 34.447 15.854 4.820 5.515
2.716 2.340 6.929 0.477 0.114 0.087 0.327
11.135 3.224 0.270 0.179 0.501 0.372 0.417
2.716 3.802 0.464 1.590 0.274 0.150 0.327
13.794 26.689 67.291 47.565 6.188 7.503 5.515
134.528 1.487 60.844 123.280 64.566 35.531 24.473
4325.101 3644.524 187.824 80.187 130.267 33.799 23.700
43671.273 40664.180 1224.066 10.090 92.101 19.694 7.488


avg noise power = 2006.033321
The text file is just "0,0,486,368", and attached is the image that I used to create it. The script is:

ImageReader("C:\Temp\grafix\4454627606_fb119cdbfb_o.jpg",end=0).crop(0,0,0,-1).converttoyv12(matrix="pc.709")
dfttest(tbsize=1,nfile="nclip.txt")


On the other hand, when I crop the image to just that 12x12 square and use "0,0,0,0", I get this:
0.001 21.195 74.256 26.044 4.195 1.394 0.660
2.550 17.687 66.975 59.390 7.629 1.961 0.575
33.436 24.428 33.947 27.089 2.309 0.416 0.342
51.260 28.261 10.396 8.658 2.295 0.064 0.096
11.952 6.892 0.965 3.310 1.639 0.018 0.067
7.062 3.432 0.490 3.285 1.215 0.011 0.127
6.039 3.123 0.284 2.904 1.121 0.030 0.101
7.062 3.752 0.355 3.274 1.124 0.198 0.127
11.952 7.017 1.482 6.035 2.040 0.227 0.067
51.260 36.053 4.138 5.044 4.195 0.013 0.096
33.436 15.871 3.712 13.901 8.128 0.281 0.342
2.550 11.079 28.088 11.395 5.175 1.396 0.575


avg noise power = 10.156793
Which seems much more reasonable (it's rather chunky noise).

Edit: Further investigation reveals that the numbers keep increasing as the sbsize goes up.

tritical
4th June 2010, 04:16
I think you have x and y switched in nclip.txt. The ordering is:

frame_number,plane,y,x

If I use:

0,0,368,486

instead of

0,0,486,368

I get the same result as you show for the 12x12 cropped image. As far as dependence on sbsize/sosize/tbsize/tosize goes, the values are normalized based on the non-coherent power gain, and should be independent of window size (of course, that only applies to ideal cases, such as different windows containing only white noise, etc...). Overlapping isn't considered when estimating the noise spectrum, but the estimated values are adjusted accordingly before use. If you continue to see any weirdness please let me know :thanks:.

foxyshadis
4th June 2010, 05:04
Thanks. Sorry for the bad call; given that I've been staring at the code in question all day, I have no excuse. Also, I made a patch which may or may not be of any use to you: It allows specifying nframe, nplane, nx, ny as arguments if you only want one set, without having to make a file. Also included is an experimental nclip parameter, where instead of specifying a location, it uses a naive random algorithm to find a suitable location in the blank area of an edge map. It's pretty half-baked right now. My original plan was to use tdtrans to find the farthest locations from any edge, but it wasn't really any better than just choosing a location at random.

For noisy edge maps, one of my works in progress is:

dn=dfttest(tbsize=1,sigma=40,sigma2=40)
ed1=dn.scalededge(1)
ed2=dn.scalededge(2)
ed4=dn.scalededge(4)
ed=average(ed1,.4,ed2,.4,ed4,.2).mt_binarize(64,chroma="process").removegrain(2)

function scalededge(clip c, int "scale") {
c
sm=Spline16Resize(m(2,width/scale),m(2,height/scale))
ved=sm.mt_convolution(horizontal="1 2 1",vertical="1 0 -1",chroma="process")
hed=sm.mt_convolution(vertical="1 2 1",horizontal="1 0 -1",chroma="process")
return mt_lutxy(hed,ved,expr="x y +",chroma="process").Spline16Resize(width,height)
}

dfttest(tbsize=1,nframe=0,nclip=ed)
There are probably more elegant ways of getting a good mask, but I'm a bit sleepy now.

tritical
4th June 2010, 17:45
Definitely good ideas. For some reason, trying automatic placement of the window for noise estimation never even crossed my mind. I'll try to get your changes added.

dansrfe
12th June 2010, 10:20
Is there any way to speed up dfttest() at default settings? I'm really happy with the results of this filter, but it's just way too slow to use in practice. I MT'd it, but I still get about the same speed, maybe a bit more.

tritical
12th June 2010, 16:27
dfttest is internally multithreaded, so I wouldn't expect much if any speed increase from using MT. By default dfttest uses threads = # of processors... in some cases it might be faster with slightly more threads, but I've never tested that. The first option to make it faster is to decrease tbsize to 3 or 1 (default is 5). The second option is to decrease the amount of spatial overlap and increase the spatial window size. By default it uses a spatial window size of 12 with 9 overlap (75%). For the same window size, less overlap will be faster. For the same overlap percentage, a larger window size will be faster.

dansrfe
12th June 2010, 19:47
Thanks, tritical! Also, can you post the documentation for dfttest so I can read up on all the options? I can't find the documentation anywhere.
Also do you have any plans to revise and upgrade dfttest in the future? :)

Didée
12th June 2010, 20:13
I can't find the documentation anywhere.
Huh? The documentation is included in dfttestv16.zip from tritical's page (http://web.missouri.edu/~kes25c).

dansrfe
12th June 2010, 21:39
*Oops*, sorry about that. I had the dll downloaded a while ago and must have forgotten to save the documentation. I didn't look in the one place it had to be :eek:

Thanks Didée! :)

tritical
21st June 2010, 19:30
Posted v1.7. Changes:

+ added nstring/sstring/ssx/ssy/sst parameters and functionality
+ allow space as delimiter in input files
- fixed missing emms in sse routine for f0beta != (1.0 or 0.5) and ftype=0

Explanations for new parameters (from v1.8 readme):


nstring -

Same functionality as 'nfile', but allows entering window locations directly in
the script instead of creating a separate file. The list of frame/plane/ypos/xpos
quadruples is stored as a string with each quadruple separated by a space.
Example:

If you use an nfile that looks like:

a=4.0
35,0,45,68
28,0,23,87

You can use the following nstring and get the same result:

nstring="a:4.0 35,0,45,68 28,0,23,87"

The one restriction is that the oversubtraction factor (a:x.x) must be the first
entry in the string (as opposed to nfiles where the a=x.x can be placed anywhere).
If it is not supplied, then the same default oversubtraction factor is used as
is used for the nfile option.

default: ""
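A sketch of how the nstring format maps onto the nfile format, written as two small Python parsers. They are illustrative only (no error reporting, no comment handling), use the frame,plane,ypos,xpos ordering given in the readme, and return None to mean "use dfttest's default oversubtraction factor". The space-delimited nfile lines follow the v1.7 note that space is allowed as a delimiter in input files.

```python
def parse_nstring(s):
    """'a:4.0 35,0,45,68 28,0,23,87' -> (4.0, [(35, 0, 45, 68), (28, 0, 23, 87)])."""
    tokens = s.split()
    a = None  # None = use dfttest's default oversubtraction factor
    if tokens and tokens[0].startswith("a:"):  # must be the first entry
        a = float(tokens[0][2:])
        tokens = tokens[1:]
    windows = []
    for tok in tokens:
        frame, plane, ypos, xpos = (int(v) for v in tok.split(","))
        windows.append((frame, plane, ypos, xpos))
    return a, windows


def parse_nfile_text(text):
    """Same result from nfile contents; 'a=x.x' may appear on any line."""
    a = None
    windows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("a="):
            a = float(line[2:])
        else:
            # accept commas or spaces between the four numbers
            frame, plane, ypos, xpos = (int(v) for v in line.replace(",", " ").split())
            windows.append((frame, plane, ypos, xpos))
    return a, windows
```

Note that y comes before x; swapping them silently moves the window, which is the mix-up diagnosed earlier in this thread.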


sstring/ssx/ssy/sst -

Used to specify functions of sigma based on frequency. If you want sigma to vary
based on frequency, then use 'sstring' instead of the 'sigma' parameter. sstring
allows you to enter values of sigma for different normalized [0.0,1.0] frequency
locations. Values for locations between the ones you explicitly specify are computed
via linear interpolation. The frequency range, which is dependent on sbsize/tbsize,
is normalized to [0.0,1.0] with 0.0 being the lowest frequency and 1.0 being the
highest frequency. You MUST specify sigma values for those end point locations
(0.0 and 1.0)! You can specify as many other locations as you wish, and they don't
have to be in any particular order. Each frequency/sigma pair is given as "f.f:s.s".
The list of frequency/sigma pairs is saved as a string, with each pair separated by
a space.

For example, if you want a linear ramp of sigma from 1.0 for the lowest frequency
to 10.0 for the highest frequency use:

sstring = "0.0:1.0 1.0:10.0"

"0.0:1.0" => this means sigma=1.0 at frequency 0.0

"1.0:10.0" => this means sigma=10.0 at frequency 1.0

Sigma values for frequencies between 0.0 and 1.0 will be computed via
linear interpolation.

Or if you want a band-stop filter that passes low and high frequencies (filters
middle frequencies) use something like:

sstring = "0.0:0.0 0.15:10.0 0.85:10.0 1.0:0.0"

To help visualize the process, the resulting filter spectrum is output to
"filter_spectrum-date_string.txt" using the same format as the "noise_spectrum.txt"
file that is output by the nfile/nstring options. The format of this file is compatible
with 'sfile' input.

There are two methods for computing sigma values for a given frequency bin based on
sstring. The first computes the normalized frequency location of each dimension
(horizontal,vertical,temporal), interpolates sigma for each of those dimensions,
and then multiplies the individual sigmas to obtain the final sigma value. So that
everything scales correctly, all sigma values entered in sstring are first raised to
the 1/#_dimensions power before performing linear interpolation and multiplying.
The second method (based on fft3dfilter's system) works by computing a single location
from the separate dimension locations (x,y,z) as:

new = sqrt((x*x+y*y+z*z)/3.0)

sigma is then interpolated to this location. By default the first system is used.
To use the second system simply put a '$' sign at the beginning of sstring as shown
below:

sstring = "$ 0.0:1.0 1.0:10.0"
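The two sstring methods can be sketched in Python as below. Frequencies are assumed to be already normalized to [0.0,1.0] per dimension; parse_sstring, interp, sigma_method1, and sigma_method2 are hypothetical names, and method 2 uses the 3D formula exactly as quoted above.

```python
import math


def parse_sstring(s):
    """Return (uses_second_method, sorted [(freq, sigma), ...])."""
    tokens = s.split()
    radial = bool(tokens) and tokens[0] == "$"
    if radial:
        tokens = tokens[1:]
    pts = sorted((float(f), float(v)) for f, v in (t.split(":") for t in tokens))
    if not pts or pts[0][0] != 0.0 or pts[-1][0] != 1.0:
        raise ValueError("sstring must define sigma at 0.0 and 1.0")
    return radial, pts


def interp(pts, f):
    """Piecewise-linear interpolation between the given (freq, sigma) points."""
    for (f0, s0), (f1, s1) in zip(pts, pts[1:]):
        if f0 <= f <= f1:
            if f1 == f0:
                return s0
            return s0 + (s1 - s0) * (f - f0) / (f1 - f0)
    raise ValueError("frequency outside [0, 1]")


def sigma_method1(pts, locs):
    """Default: interpolate sigma**(1/ndim) per dimension, then multiply."""
    d = len(locs)
    root = [(f, s ** (1.0 / d)) for f, s in pts]
    result = 1.0
    for f in locs:
        result *= interp(root, f)
    return result


def sigma_method2(pts, x, y, z):
    """'$' method (fft3dfilter-style): one combined location, then interpolate."""
    return interp(pts, math.sqrt((x * x + y * y + z * z) / 3.0))
```

With "0.0:1.0 1.0:10.0", method 1 gives 1.0 at the lowest and 10.0 at the highest frequency, and method 2 gives 5.5 at x=y=z=0.5.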


---------------- ssx/ssy/sst explanation -------------------------------

sstring breaks the 1D (sbsize=1), 2D (for tbsize=1), or 3D (for sbsize>1 and tbsize>1)
frequency spectrum into chunks by normalizing each dimension to [0.0,1.0]... i.e. the
frequency range [0.0,0.25] is a cube covering the first 1/4 of each dimension. This works
fine if you want to treat all dimensions the same in terms of how sigma should vary.
However, if you wanted to ramp sigma based only on temporal frequency or horizontal
frequency, this is too limited. This is where ssx/ssy/sst come in!

ssx/ssy/sst allow you to specify sigma as a function of horizontal (ssx), vertical (ssy),
and temporal (sst) frequency only. The syntax is exactly the same as that of sstring. To
get the final sigma value for a frequency location, the three separate values (one for
each dimension) are computed and then multiplied together. As with sstring the sigma values
are first raised to the 1/#_dimensions power before performing linear interpolation and
multiplying. If you don't specify all three strings, then a flat function equal to the
'sigma' parameter is used for the missing dimensions. For dimensions of size one (the
spatial dimensions if sbsize=1 or the temporal dimension for tbsize=1) the corresponding
string is ignored.

For example:

ssx="0.0:1.0 1.0:10.0",ssy="0.0:1.0 1.0:10.0",sst="0.0:1.0 1.0:10.0"

will give the same result as

sstring="0.0:1.0 1.0:10.0"

Or if you want to ramp sigma based on temporal frequency:

sigma=10.0,sst="0.0:1.0 1.0:10.0"

This will use 10.0 for the horizontal/vertical dimensions, and ramp
sigma from 1.0 to 10.0 in the temporal dimension.

If 'sstring' is specified, it takes precedence over ssx/ssy/sst. Again, the
"filter_spectrum-date_string.txt" output file is helpful in visualizing the result.

default: ""
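A sketch of the ssx/ssy/sst combination rule as described above: missing strings fall back to a flat 'sigma', each value is raised to 1/#_dimensions before interpolation, and the per-dimension results are multiplied. Only dimensions of size greater than one should be passed in. The names are hypothetical. Under this reading, the temporal-ramp example in 3D gives a final sigma running from 10^(2/3) (about 4.64) to 10.0.

```python
def parse_pairs(s):
    """'0.0:1.0 1.0:10.0' -> [(0.0, 1.0), (1.0, 10.0)]."""
    return sorted((float(f), float(v)) for f, v in (t.split(":") for t in s.split()))


def interp(pts, f):
    """Piecewise-linear interpolation between (freq, value) points."""
    for (f0, s0), (f1, s1) in zip(pts, pts[1:]):
        if f0 <= f <= f1:
            return s0 if f1 == f0 else s0 + (s1 - s0) * (f - f0) / (f1 - f0)
    raise ValueError("frequency outside [0, 1]")


def ss_sigma(sigma, locs, strings):
    """locs: {'x'|'y'|'t': normalized freq} for the active (size > 1) dimensions.
    strings: {'x'|'y'|'t': ssx/ssy/sst string}; entries for inactive dims are ignored."""
    d = len(locs)
    result = 1.0
    for dim, f in locs.items():
        s = strings.get(dim)
        pts = parse_pairs(s) if s else [(0.0, sigma), (1.0, sigma)]  # flat fallback
        root = [(ff, v ** (1.0 / d)) for ff, v in pts]
        result *= interp(root, f)
    return result
```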


dither -

Controls whether dithering is performed when converting from float to unsigned char
for output. Internally dfttest works on floating point values. For output the
result must be quantized back to unsigned char values. Prior to v1.8 this was always
done by simply rounding. Possible settings:

0 - no dithering (same as v1.7 and prior)
1 - Floyd-Steinberg dithering
2-100 - Floyd-Steinberg dithering with increasing amounts of uniform random
noise added prior to the dithering process

Obviously dither=0 is the fastest, and dither=1 is slightly faster than dither>=2
due to not having to generate a random number for every pixel. However, this part
doesn't take much time compared to the actual filtering operation. dither=1 should
combat any banding introduced by dfttest's quantization, but probably won't help
banding in the source. At least, not anymore than the filtering itself. dither>=2 can
combat banding in the source that is left over after filtering.

default: 0
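A minimal Floyd-Steinberg sketch of what dither=1 does conceptually: quantize float pixels to 0-255 and push each pixel's rounding error onto its unprocessed neighbors. How dfttest maps dither=2-100 to a noise amplitude is not stated here, so the sketch takes an explicit, hypothetical 'noise' amplitude (in 8-bit levels) that is added to the image before dithering.

```python
import math
import random


def quantize(img, dither=0, noise=0.0, seed=0):
    """img: list of rows of floats. dither=0 rounds only; dither=1 uses Floyd-Steinberg.
    noise: hypothetical uniform noise amplitude added before quantizing."""
    h, w = len(img), len(img[0])
    rng = random.Random(seed)
    buf = [[v + (rng.uniform(-noise, noise) if noise else 0.0) for v in row] for row in img]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = min(255, max(0, math.floor(old + 0.5)))  # round half up, clamp
            out[y][x] = new
            if dither == 0:
                continue
            err = old - new
            # classic weights: 7/16 right, 3/16 down-left, 5/16 down, 1/16 down-right
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out
```

On a flat area of 100.25, plain rounding outputs only 100, while error diffusion mixes in some 101s so the local average stays closer to 100.25, which is what counters banding.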

Needless to say 'nstring' came from foxyshadis' idea for not having to create a separate file. The sstring/ssx/ssy/sst I had been meaning to do for a while since it is a frequently requested feature.

For reference, the sigma4/sigma3/sigma2/sigma system of fft3dfilter converts to - sstring="$ 0.0:sigma4 0.1768:sigma3 0.3536:sigma2 1.0000:sigma". Note the '$', which makes dfttest use the second sstring method for determining sigma values.
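The conversion above as a tiny Python helper (hypothetical name), handy when porting fft3dfilter settings; it only builds the string with the breakpoints quoted above.

```python
def fft3d_to_sstring(sigma, sigma2, sigma3, sigma4):
    """Build the '$' (second-method) sstring equivalent of fft3dfilter's sigma system."""
    return "$ 0.0:{} 0.1768:{} 0.3536:{} 1.0000:{}".format(sigma4, sigma3, sigma2, sigma)
```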

tritical
22nd June 2010, 19:52
Posted v1.8, changes:

+ added dither parameter and functionality
+ attach date string to filter_spectrum.txt and noise_spectrum.txt output
+ changed sstring handling and added option to function like fft3dfilter

See post above for more info on parameters.

foxyshadis
22nd June 2010, 23:22
Wow, nice. Thanks!

AddGrain pre-generates a field of grain and quickly applies it to the image each frame, starting from a random offset, rather than coming up with a new random number each pixel. Since dither isn't nearly as visible, you could probably get away with pre-generating as few as 128-256 values to repeatedly add prior to floyd-steinberg.
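foxyshadis' suggestion, sketched in Python: generate a small table of uniform noise once, then read it cyclically from a random starting offset for each frame instead of drawing a new random number per pixel. The table size, amplitude, and class name are assumptions for illustration.

```python
import random


class NoiseTable:
    """Pre-generated noise read from a random offset per frame (AddGrain-style idea)."""

    def __init__(self, size=256, amplitude=0.5, seed=0):
        self.rng = random.Random(seed)
        self.values = [self.rng.uniform(-amplitude, amplitude) for _ in range(size)]

    def frame_noise(self, count):
        """One random draw per frame; the rest is a cyclic table lookup."""
        start = self.rng.randrange(len(self.values))
        n = len(self.values)
        return [self.values[(start + i) % n] for i in range(count)]
```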

Terranigma
23rd June 2010, 02:26
Thanks a lot for the update. After fiddling around with it, i've really come to love this new version, even more so than fft3dfilter in every conceivable way. :D

And oh yea, thanks for adding in my request. :)

Lorax2161
23rd June 2010, 04:49
Thanks for your hard work on this. It just keeps getting better and better.

Question: might it be in your future plans to include a "show" parameter which would display either the noise identified for denoising without the video--or which would paint the pixels that have been identified as noise to be removed a different color?

Of course I presume this would make it slower, but it would only be for testing/analysis... not to be left on during an encode.

In any event, I really do appreciate your contributions.

:thanks:

Stephen R. Savage
23rd June 2010, 05:02
Lorax, that feature would not need to be implemented in dfttest. You can already create the difference map between two clips with the internal Subtract() filter or the mt_makediff() function from the MaskTools plugin.

tormento
23rd June 2010, 08:15
Sob. If only it could be a little faster on HD content.

Tritical: do you plan to update the x64 version too?

cretindesalpes
23rd June 2010, 08:28
Thanks for the dithering! Could you please also implement a 16 bit output (like the way I modified the 1.6 version here (http://forum.doom9.org/showthread.php?p=1386559#post1386559)) so the dithering can be performed later in the processing chain with other tools or algorithms?

tritical
23rd June 2010, 15:13
AddGrain pre-generates a field of grain and quickly applies it to the image each frame, starting from a random offset, rather than coming up with a new random number each pixel. Since dither isn't nearly as visible, you could probably get away with pre-generating as few as 128-256 values to repeatedly add prior to floyd-steinberg.
I might use a faster method like you suggest in the future. I didn't actually benchmark it at all, but it probably doesn't take up much time compared to all of the other processing dfttest does... so may not be worth it to change it. Will do some testing before the next version.

Question: might it be in your future plans to include a "show" parameter which would display either the noise identified for denoising without the video--or which would paint the pixels that have been identified as noise to be removed a different color?
Hm, is this for nstring/nfile use (painting the windows being used to estimate the noise spectrum) or just in general? If in general, then Stephen R. Savage's suggestion will show you what dfttest thinks is noise. For the nstring/nfile case I don't plan on making a mode that paints the windows, since you can easily locate the windows on your own. The estimated spectrum is already output to a file. The one thing that might be cool is to take the inverse transform of the estimated noise spectrum and show that against a flat image. If multiple windows are being used in the nfile/nstring, dfttest could take into account the variance of the estimation when doing this (otherwise it would really only need to show one block).
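The idea of turning an estimated noise spectrum back into visible noise can be sketched roughly like this. The Python below is hypothetical. It assumes the spectrum is an N x N grid of power values (psd) and uses an unnormalized forward DFT convention, so the scaling will not match dfttest's nfile output exactly. Each frequency gets amplitude sqrt(psd) and a random phase. Conjugate symmetry is enforced so the inverse DFT is real, and the result is added to a flat mid-grey block.

```python
import cmath
import math
import random


def synthesize_noise(psd, seed=0):
    """Turn an N x N noise power spectrum into a real spatial noise block.

    Each bin gets magnitude sqrt(psd) and a random phase; Hermitian symmetry
    (X[-u,-v] = conj(X[u,v])) is enforced so the inverse DFT is real.
    """
    n = len(psd)
    rng = random.Random(seed)
    spec = [[None] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            pu, pv = (-u) % n, (-v) % n          # mirrored frequency
            mag = math.sqrt(psd[u][v])
            if spec[pu][pv] is not None:          # partner already chosen: conjugate it
                spec[u][v] = spec[pu][pv].conjugate()
            elif (pu, pv) == (u, v):              # self-mirrored bin must be real
                spec[u][v] = complex(mag if rng.random() < 0.5 else -mag, 0.0)
            else:
                spec[u][v] = cmath.rect(mag, rng.uniform(0.0, 2.0 * math.pi))
    # naive 2-D inverse DFT (fine for an 8x8 or 16x16 block)
    block = []
    for y in range(n):
        row = []
        for x in range(n):
            s = sum(spec[u][v] * cmath.exp(2j * math.pi * (u * y + v * x) / n)
                    for u in range(n) for v in range(n))
            row.append((s / (n * n)).real)
        block.append(row)
    return block


noise = synthesize_noise([[1.0] * 8 for _ in range(8)], seed=1)
preview = [[128 + 16 * x for x in row] for row in noise]   # noise shown on flat grey
```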

Tritical: do you plan to update the x64 version too?
No. The reason I don't make x64 builds of any of my filters is that I don't run a 64-bit Windows OS on any computers. When that changes I will probably start making them. Of course, another holdup is all of the inline assembly, which has to be moved out (never mind rewriting to take advantage of the extra registers) to yasm or similar. dfttest has a lot of that :(

Thanks for the dithering! Could you please also implement a 16-bit output (like the way I modified the 1.6 version here) so the dithering can be performed later in the processing chain with other tools or algorithms?
Yep.

Lorax2161
23rd June 2010, 15:44
Stephen R. Savage

Lorax, that feature would not need to be implemented in dfttest. You can already create the difference map between two clips with the internal Subtract() filter or the mt_makediff() function from the MaskTools plugin.

Thanks for pointing me in the right direction; I will look into those suggestions.

tritical

Hm, is this for nstring/nfile use (paint the windows being used to estimate the noise spectrum) or just in general? If in general then Stephen R. Savage's suggestion will show you what dfttest thinks is noise. For the nstring/nfile case I don't plan on making a mode that paints the windows since you can easily locate the windows on your own. The estimated spectrum is already output to a file.

I meant just in general. At the moment I use a script someone posted here that splits the screen into "before" and "after" halves. It's useful for fairly obvious changes, but not so much for finer differences unless they are more isolated.

This came about as I tweaked dfttest to improve speed on my dual core, and I wanted to see what I was giving up in noise reduction. But Stephen's suggestion should do the trick. I have some reading up to do on that, though. Thanks for the reply.

tritical
23rd June 2010, 16:22
Personally, I like to use interleave() to see the visual differences between clips since the output from subtract() can be hard to interpret (in relation to how different the two frames actually look to your eyes). So something like:

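src = source()              # your source filter, e.g. mpeg2source(...)
a = src.dfttest()           # old settings (defaults here, as an example)
b = src.dfttest(tbsize=3)   # new settings to compare
Interleave(a, b)            # alternates frames: a0, b0, a1, b1, ...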

Then open it in VirtualDub and flip back and forth between the same frame in both clips. Everyone has their own preference for this, though.

Keiyakusha
23rd June 2010, 17:36
doesn't take up much time compared to all of the other processing dfttest does...

Well... I guess "not much time" is somewhat subjective. With dfttest(sstring="0.0:1.0 1.0:16.0",dither=0,tbsize=1) I get an average of 9.2531 fps; with dither=1 I get 8.1968.
It would be nice if it could be faster. However, the results are perfect, thanks!
EDIT: Or is what foxyshadis proposed only relevant for dither>1?
By the way, what does tbsize=1 actually mean? Does it process 1 frame in the temporal dimension, or does it turn the filter into a purely spatial one? Because with more than 1 I see some temporal artifacts...
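tbsize is the length of the block in the temporal dimension, i.e. how many frames go into each 3D transform. A DFT over a single sample is the identity, so with tbsize=1 only the spatial transform does any work and the filter is effectively purely spatial. That would also explain why the temporal artifacts go away. The Python sketch below assumes an odd tbsize centred on the current frame, with frame numbers clamped at the clip edges. How dfttest actually handles edges, tmode and tosize is not covered, and the function names are hypothetical.

```python
import cmath


def temporal_window(n, tbsize, num_frames):
    """Frame numbers forming the temporal dimension of the block for frame n.

    Assumes an odd tbsize centred on n, with frame numbers clamped at the clip edges.
    """
    if tbsize % 2 == 0:
        raise ValueError("this sketch assumes an odd tbsize")
    r = tbsize // 2
    return [min(max(n + i, 0), num_frames - 1) for i in range(-r, r + 1)]


def dft(xs):
    """Naive 1-D DFT."""
    N = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / N) for t, x in enumerate(xs))
            for k in range(N)]


print(temporal_window(10, 5, 100))  # [8, 9, 10, 11, 12]
print(temporal_window(0, 3, 100))   # [0, 0, 1]
print(dft([42.0]))                  # [(42+0j)]: a 1-point DFT changes nothing
```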

VincAlastor
23rd June 2010, 20:19
tritical, thank you very much. Too slow for "daily" use, but dfttest is an outstanding filter for extraordinary sources :)

tritical
23rd June 2010, 20:50
@VincAlastor
It's only slow if you use slow settings :). For example, processing 720x480 material on the Q9650 here in the lab with the following script:

mpeg2source()
dfttest()

I get 5.75 fps with dither=2 (everything else at defaults), and 6.3 fps with dither=0. If I use (tbsize=3,sbsize=48,sosize=16,U=false,V=false), which is equivalent to fft3dfilter's defaults, I get 54 fps. If I use (tbsize=1,sbsize=48,sosize=16,U=false,V=false) I get 112 fps. Finally, if I use (tbsize=1,sbsize=16,sosize=0,U=false,V=false) I get 190 fps. Of course, the trade-off for speed is quality (usually). As with most of my other filters (nnedi3, for example), the defaults are tuned heavily in favor of quality. Though I should really change the tbsize default to 3 instead of 5, since 5 usually causes visible artifacts in motion areas unless motion compensation is used.
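Here is a back-of-the-envelope way to see why the defaults are so much slower. With block overlap, each pixel passes through many transforms. The Python model below is a rough heuristic, not dfttest's actual cost. It ignores the log factor of the FFT, the filtering step and the float-to-8-bit conversion. It assumes that version's defaults are sbsize=12, sosize=9, tbsize=5 (check the readme). Note that the last three measurements were luma-only (U=false,V=false).

```python
def transform_load(sbsize, sosize, tbsize):
    """Rough relative amount of data pushed through the transforms per output pixel.

    Spatial blocks start every (sbsize - sosize) pixels in x and y, so each pixel
    lies in about (sbsize / step)**2 blocks, and each block spans tbsize frames.
    """
    step = sbsize - sosize
    return (sbsize / step) ** 2 * tbsize


# (sbsize, sosize, tbsize) and the fps tritical measured at 720x480 on a Q9650
configs = {
    "defaults (assumed 12/9/5), dither=0": ((12, 9, 5), 6.3),
    "fft3dfilter-like, luma only": ((48, 16, 3), 54),
    "tbsize=1, sbsize=48, luma only": ((48, 16, 1), 112),
    "tbsize=1, sbsize=16, luma only": ((16, 0, 1), 190),
}
for name, (cfg, fps) in configs.items():
    print(f"{name:36s} load={transform_load(*cfg):6.2f}  fps={fps}")
```

The model predicts the same ordering as the measurements (80, 6.75, 2.25 and 1 against 6.3, 54, 112 and 190 fps), though not the exact ratios.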

@Keiyakusha
The slowdown is more than I thought it would be, but it's also relative to the settings being used. If you use fast settings, then the float to unsigned char conversion at the end takes up more time relative to the filtering. Also, dither=0 has SSE/SSE2 code paths while dither>0 (dithering or dithering+random_noise) is just C code for now. The difference between dither=1 and dither>1 isn't enough to make me use a faster method for generating the noise. If I have time I'll try to write SSE versions of dither>0, which should narrow the gap a lot.
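Converting the fps figures in this thread into per-frame time makes the cost of dithering easier to compare. Keiyakusha's numbers are dither=1 on unknown hardware and resolution. tritical's are dither=2 at 720x480 with default settings. The two pairs are therefore not directly comparable; this only does the arithmetic.

```python
def dither_cost(fps_plain, fps_dither):
    """Return (extra milliseconds per frame, percent drop in fps) when dithering is on."""
    extra_ms = 1000.0 / fps_dither - 1000.0 / fps_plain
    drop_pct = 100.0 * (fps_plain - fps_dither) / fps_plain
    return extra_ms, drop_pct


for label, plain, dith in [("Keiyakusha, dither=1", 9.2531, 8.1968),
                           ("tritical, dither=2", 6.3, 5.75)]:
    ms, pct = dither_cost(plain, dith)
    print(f"{label}: +{ms:.1f} ms/frame, {pct:.1f}% fewer fps")
```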

Bi11
23rd June 2010, 22:40
Could someone explain to me in plain English what makes dfttest better than other 2D/3D frequency domain denoisers like fft3dfilter? The documentation doesn't easily say what differentiates it from other denoisers, only how to use the parameters.

How does dfttest compare to tnlmeans? I know they operate very differently but are there sources under certain conditions that are more favorable to tnlmeans than dfttest?

Though I should really change the tbsize default to 3 instead of 5, since 5 usually causes visible artifacts in motion areas unless motion-comp is used.
What would the ultimate MC script with dfttest look like? :devil:

Off topic, tritical, if you have time, could you release an optimized version of EEDI3 please? It is very very useful with TGMC, but very very sloooow. :(

pandy
24th June 2010, 10:47
Big THX, tritical! Is there any chance of adding a fixed, time-constant dither (ordered dithering, for example)? THX once more!
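For reference, ordered dithering compares each pixel's fractional part against a fixed threshold matrix. The pattern is therefore identical in every frame, which is what "time constant" means here. Below is a minimal Python sketch using the standard 4x4 Bayer matrix. This is not something dfttest offers, and the function name is made up.

```python
import math

# Standard 4x4 Bayer index matrix (values 0..15)
BAYER4 = [[0, 8, 2, 10],
          [12, 4, 14, 6],
          [3, 11, 1, 9],
          [15, 7, 13, 5]]


def ordered_dither(plane):
    """Quantize a float plane (8-bit scale) to ints 0..255 with a 4x4 Bayer threshold.

    The threshold depends only on (x, y), so the same pattern appears in every frame.
    """
    out = []
    for y, row in enumerate(plane):
        out_row = []
        for x, v in enumerate(row):
            t = (BAYER4[y % 4][x % 4] + 0.5) / 16   # threshold in (0, 1)
            base = math.floor(v)
            q = base + (1 if v - base > t else 0)    # round up if fraction beats threshold
            out_row.append(max(0, min(255, q)))
        out.append(out_row)
    return out


flat = [[100.25] * 4 for _ in range(4)]
print(ordered_dither(flat))   # 4 of the 16 pixels become 101, the rest stay 100
```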

Caroliano
24th June 2010, 15:24
Thanks for the dither function! It seems much less destructive than GradFunkMirror's. I hope it is as effective. I always wanted a lower strength for dither, and now I have it. But I assume that for sources with a lot of banding I'm better off using an external dither utility, right? I don't like to add grain.

Though I should really change the tbsize default to 3 instead of 5, since 5 usually causes visible artifacts in motion areas unless motion-comp is used.
I very much agree with that. I never use tbsize=5 anymore because of that. Couldn't you make a mode that makes dfttest's temporal filtering as safe as ttempsmooth? Or would that not work (well) with dfttest? It would also be a good idea to make the filter a little faster by default, as that is the main complaint I hear about dfttest() elsewhere, from people who don't dare to mess with the settings.


I myself don't mess much with the sbsize and sosize settings. The readme could be a bit more "user-friendly". For example, the sbsize description could say that it is a speed/quality trade-off, and that with a higher setting the filter will be faster. Or the ftype description could say something like "this is much slower, but likely to be more precise", "works better for heavy denoising" or "blurs more", etc. To me, "mult = sigma*sqrt((psd*pmax)/((psd+pmin)*(psd+pmax)))" doesn't mean anything.
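What that formula does is easier to see by evaluating it. The Python below simply implements the expression quoted from the readme. It assumes, as the name suggests, that mult is the gain applied to a frequency coefficient whose power spectral density is psd. The parameter values are arbitrary picks for illustration.

```python
import math


def mult(psd, sigma, pmin, pmax):
    """Gain from the readme formula: sigma*sqrt((psd*pmax)/((psd+pmin)*(psd+pmax)))."""
    return sigma * math.sqrt((psd * pmax) / ((psd + pmin) * (psd + pmax)))


sigma, pmin, pmax = 1.0, 1.0, 1e6
for p in (1e-3, 1.0, 1e3, 1e9):
    print(f"psd={p:g}: mult={mult(p, sigma, pmin, pmax):.3f}")
```

In plain terms, coefficients much weaker than pmin or much stronger than pmax are attenuated toward 0. Those with power between pmin and pmax get a gain of roughly sigma, and at psd = pmin it is about sigma/sqrt(2).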

Dfttest() is really a great filter.

What would the ultimate MC script with dfttest look like? :devil:
I think it looks like this: http://forum.doom9.org/showthread.php?p=1308269#post1308269

foxyshadis
24th June 2010, 23:30
Rather than faster by default, a preset argument for fastest/fast/normal/quality would make sense to control default size/overlap. It's still a bit of an experimental filter, though.
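foxyshadis's preset idea could look something like this, shown as a small Python helper that writes the AviSynth call. Both the preset names and the helper are hypothetical; dfttest has no preset argument. The values are simply the configurations tritical benchmarked above, minus the U=false,V=false he also used. "quality" just means dfttest's own defaults.

```python
# Hypothetical presets: dfttest has no such argument.
PRESETS = {
    "fastest": {"tbsize": 1, "sbsize": 16, "sosize": 0},
    "fast":    {"tbsize": 1, "sbsize": 48, "sosize": 16},
    "normal":  {"tbsize": 3, "sbsize": 48, "sosize": 16},   # ~ fft3dfilter defaults
    "quality": {},                                          # keep dfttest's defaults
}


def dfttest_call(preset, **overrides):
    """Build a dfttest(...) call string for an AviSynth script from a preset name."""
    args = dict(PRESETS[preset])
    args.update(overrides)   # explicit arguments win over the preset
    inner = ",".join(f"{k}={v}" for k, v in args.items())
    return f"dfttest({inner})"


print(dfttest_call("fastest"))            # dfttest(tbsize=1,sbsize=16,sosize=0)
print(dfttest_call("quality"))            # dfttest()
print(dfttest_call("normal", sigma=8.0))  # dfttest(tbsize=3,sbsize=48,sosize=16,sigma=8.0)
```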

ftype (and swin/twin/sbeta/tbeta/f0beta/zmean) isn't something that really even belongs at the front of the docs. There are a few situations where you might try another ftype or tweak those arguments, but they're way out in experimental left field, for when you just can't get the results you want or want a very different effect. The defaults are definitely sufficient for everyday use.

nibus
25th June 2010, 18:59
I've been using this for the past few days and I have to say I am getting some amazing results, especially on HD content with the dither function.

It is slow, but it does do a very good job of utilizing the whole CPU.

Keiyakusha
26th June 2010, 00:25
Yeah, but there is one problem: this dither is so small that it's hard to keep it at sane bitrates. Even a simple removegrain(mode=1), which I always use, shaves it off :P

HolyWu
15th November 2010, 10:58
Hi tritical. I see you have built nnedi3 x64 and eedi3 x64 in another thread. Do you plan to build dfttest x64 as well? http://forum.doom9.org/showthread.php?t=152800 is the outdated 1.6 version. It would be much appreciated.

Dogway
2nd January 2011, 08:54
When it comes to speed I like using dfttest. I have found it works nicely along with MDegrain, so I make an MDegrain prefilter and then run dfttestMC. The only drawbacks are the ghosting and the heavy blur in dark areas. I wanted to ask about the latter: is there a way in the filter to protect these dark areas? For now I'm raising levels -> dfttest -> lowering levels, which preserves most detail in the penumbra but might harm the overall image/dynamic range.
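One reason the raise-levels/lower-levels round trip can hurt: in 8 bits, a brightening gamma curve spreads dark codes apart but squeezes bright codes together. Some values collide and cannot be restored. Here is a small Python check. It assumes a gamma-style adjustment like Levels(gamma=...) with a hypothetical gamma of 2.0, and it ignores dfttest itself.

```python
def gamma_8bit(v, gamma):
    """Apply a gamma curve (exponent 1/gamma) to an 8-bit value and round back to 8 bits."""
    return round(255 * (v / 255) ** (1 / gamma))


g = 2.0                                          # hypothetical brightening gamma
forward = [gamma_8bit(v, g) for v in range(256)]
back = [gamma_8bit(f, 1 / g) for f in forward]   # inverse curve, again in 8 bits

print(len(set(forward)))                            # fewer than 256 distinct codes survive
print(sum(1 for v in range(256) if back[v] != v))   # inputs not restored exactly
```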

thewebchat
11th January 2011, 04:08
When it comes to speed I like using dfttest. I have found it works nicely along with MDegrain, so I make an MDegrain prefilter and then run dfttestMC. The only drawbacks are the ghosting and the heavy blur in dark areas. I wanted to ask about the latter: is there a way in the filter to protect these dark areas? For now I'm raising levels -> dfttest -> lowering levels, which preserves most detail in the penumbra but might harm the overall image/dynamic range.

You generally shouldn't be using MDeGrain to preprocess for MAnalyse. This doesn't make sense, because if MDeGrain could remove the grain, that means MAnalyse found the vectors and doesn't need any preprocessing. You should use a spatial denoiser like dfttest or FFT3DFilter instead.

Aside from that, you should check to see if you get the same artifacts running just dfttest, since they could be related to the MC function. I would try lowering sigma (obviously) and the temporal radius to see if that prevents the artifacts. If you find the artifacts are specific to dfttestMC, try reducing the spatial component (sbsize/sosize) and the number of compensated frames (mc). Also, don't set the mdg parameter to true, it's typically counter-productive.

Dogway
11th January 2011, 17:36
I just found someone using a similar principle (http://forum.doom9.org/showpost.php?p=1444252&postcount=31). The idea is to denoise with MDegrain (not via the mdg flag) and only then denoise with dfttest at a low sigma, using the MC version to avoid ghosting. This way you get a well-denoised result while preserving detail.

From here you can enhance the mechanism, which is probably what you were pointing out: preprocess with a spatial denoiser for vector estimation, run MDegrain, run dfttest using the previous vectors, and probably use the lumamask function I posted a few days ago to mix MDegrain+dfttest in bright areas with MDegrain only in dark areas.
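The bright/dark mix described above boils down to a per-pixel blend driven by a luma mask. Here is a hypothetical Python sketch. It is not Dogway's lumamask function, which isn't shown here, and the thresholds lo=40 and hi=80 are arbitrary. Below lo only the MDegrain result is used, above hi the MDegrain+dfttest result is used, and a linear ramp covers the range in between. In an AviSynth script the same idea would usually be expressed with a mask and something like MaskTools' mt_merge.

```python
def luma_weight(y, lo=40, hi=80):
    """0 for dark pixels (keep MDegrain only), 1 for bright ones, linear ramp in between."""
    if y <= lo:
        return 0.0
    if y >= hi:
        return 1.0
    return (y - lo) / (hi - lo)


def blend_by_luma(dark_clip, bright_clip, luma):
    """Per-pixel mix: dark_clip where luma is low, bright_clip where luma is high."""
    return [[round(d + luma_weight(l) * (b - d)) for d, b, l in zip(rd, rb, rl)]
            for rd, rb, rl in zip(dark_clip, bright_clip, luma)]


mdg = [[30, 60, 200]]    # MDegrain-only result (also used as the luma mask here)
both = [[34, 50, 190]]   # MDegrain + dfttest result
print(blend_by_luma(mdg, both, mdg))   # [[30, 55, 190]]
```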

maki
25th March 2011, 09:09
Question: should I try to keep the result close to the original video (in this case), or is it okay to leave this amount of noise?
(I'm trying to keep the video close to the source, but I don't know whether to keep this amount of noise or try to remove it completely.
For those who are used to encoding anime, what should I do?)

Original: No filtering whatsoever
http://img825.imageshack.us/img825/769/haruhiwithoutfilter.th.png (http://img825.imageshack.us/i/haruhiwithoutfilter.png/)


http://img101.imageshack.us/img101/71/haruhiwithfilter.th.png (http://img101.imageshack.us/i/haruhiwithfilter.png/)

With filters
toonlite(0.05)
dfttest(sigma=0.05, tbsize=3)
WarpSharp(96, 10, 128, -0.6)
LimitedSharpenFaster(strength=200).LimitedSharpenFaster()
Dehalo_alpha()

http://img192.imageshack.us/img192/7672/haruhi2006version.th.jpg (http://img192.imageshack.us/i/haruhi2006version.jpg/)

It sort of bothers me: how can I thin the lines enough to look close to the thin lines of the 2006 version? Any help with the WarpSharp settings would be appreciated.

As for the noise, dfttest seems to affect the background as the preview shows, but moving objects like Haruhi and Kyon look blocky, nothing like the preview. How can I fix the chroma of moving objects without affecting the background too much?
Deen's defaults don't seem to do much, and I don't really get DeBlock_QED. I could also use some other noise remover, but I'm lost; any help would be appreciated.

naoan
27th March 2011, 18:13
Uh... shouldn't that be up to what you want?

IMO, that WarpSharp destroys it too much, though.

GMJCZP
20th January 2012, 05:20
First of all, the order of the filters should be:

dfttest(sigma=0.05, tbsize=3)
toonlite(0.05)
LimitedSharpenFaster(strength=200).LimitedSharpenFaster()
WarpSharp(96, 10, 128, -0.6)
Dehalo_alpha()

I suggest using aWarpSharp2 instead of WarpSharp for anime. ;)

VincAlastor
20th January 2012, 22:08
The faster-than-fast Fourier transform

For a large range of practically useful cases, MIT researchers find a way to increase the speed of one of the most important algorithms in the information sciences.

http://web.mit.edu/newsoffice/2012/faster-fourier-transforms-0118.html
http://groups.csail.mit.edu/netmit/sFFT/

redfordxx
20th January 2012, 23:48
"Under some circumstances, the improvement can be dramatic — a tenfold increase in speed." Wow, hopefully dfttest gets lucky;-)

mandarinka
22nd January 2012, 18:41
Doesn't dfttest stand for >discrete< Fourier transform, as opposed to fast Fourier transform?

But if dfttest uses FFT for lower complexity (no idea), it might benefit if that method got integrated into the FFTW library it uses? I assume it uses that code to do the transform.

gyth
23rd January 2012, 05:49
FFT is the class of algorithms used to compute the DFT.
http://en.wikipedia.org/wiki/Fast_Fourier_transform

A naive implementation of the DFT multiplies the input by an n-by-n matrix, which takes a number of steps equal to the number of matrix entries, O(n^2). An FFT computes the same result in O(n log n). For sparse spectra (when many of the frequency coefficients are zero), things can potentially be even faster.
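To make the complexity difference concrete, here is a naive O(n^2) DFT next to a textbook recursive radix-2 Cooley-Tukey FFT in Python. Both compute the same transform. This is only an illustration, not how FFTW (the library dfttest uses) is implemented.

```python
import cmath


def naive_dft(x):
    """O(n^2): one complex multiply-add per entry of the n-by-n DFT matrix."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT, O(n log n); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])   # transforms of even/odd samples
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


sig = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.0]
assert all(abs(a - b) < 1e-9 for a, b in zip(naive_dft(sig), fft(sig)))
print(1024 ** 2, 1024 * 10)   # 1048576 10240: rough step counts for n = 1024
```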

Many of the values are zero for heavily compressed files. Denoising won't zero out as many values unless you are using aggressive settings.