Go Back   Doom9's Forum > Capturing and Editing Video > Avisynth Development

Old 18th June 2005, 12:17   #1  |  Link
insanedesio
Registered User
 
Join Date: May 2003
Posts: 27
filter: ExtendedBilateral

Well, following up from this thread, I decided to try to implement an extended version of the bilateral filter. Using the paper in that thread, as well as another that suggests a couple of improvements to the bilateral filter, I came up with ExtendedBilateral.

The extension itself is an "initial estimation preprocess" that runs before the regular bilateral filtering step. Because of this, weaker smoothing settings are used when both the preprocessor and the main bilateral filtering step are enabled. ExtendedBilateral can also be used in conjunction with TBilateral via its "clip2" parameter.

I haven't had much time to test it out fully, but the quick tests I did do were slow until I added a new LUT (thanks to tritical for the suggestion). I still have one more thing to do (remove the upsampling, which I may not get around to for another couple of days), but even after that I suspect it will still be a bit slower than TBilateral, given my lack of experience in this kind of thing.

Since I haven't tested all the features, I'd love comments or bug reports; knowing me, I made some dumb mistake somewhere that completely screws up one part of the process or another. I suspect that this ought to work better on sources with large blocks of colour (cartoon/anime) and not quite as well on live sources (though I haven't tested that). I could also use some suggestions for speed-ups, if anyone actually looks at the code.
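For the curious, the LUT idea tritical suggested amounts to something like this - a rough sketch, not the plugin's actual code (the function name and the Gaussian range kernel here are just illustrative):

```cpp
#include <cmath>
#include <vector>

// Precompute the range weight for every possible 8-bit pixel difference
// once, so the per-pixel inner loop becomes a table lookup instead of a
// call to exp().
std::vector<double> buildRangeLUT(double sigma)
{
    std::vector<double> lut(256);
    for (int d = 0; d < 256; ++d)
        lut[d] = std::exp(-(static_cast<double>(d) * d) / (2.0 * sigma * sigma));
    return lut;
}

// Inside the filter loop, the weight between two pixels a and b is then:
//   double w = lut[std::abs(a - b)];
```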

Here's the latest version of the filter:
gone now

Last edited by insanedesio; 17th April 2011 at 10:43.
Old 21st June 2005, 21:37   #2  |  Link
AVIL
Registered User
 
Join Date: Nov 2004
Location: Spain
Posts: 408
Hi,

After giving ExtendedBilateral a try, I can report:

it's sloooooooooooooow

It took about two minutes to start processing (the time to load and process the script in VirtualDub) on my Pentium III 800 PC, and roughly the same again to advance to the next frame.

I used it to produce a ppclip for tritical's TBilateral filter. I can say there is a slight, but noticeable, improvement in edge preservation.

The speed, alas, makes the filter unusable for me (perhaps on a super Cray computer?)

Best regards
Old 22nd June 2005, 04:32   #3  |  Link
insanedesio
Yeah, it's pretty damn slow... though not as slow as you're saying (the PIII 800 might have something to do with it). In my tests it ran at about 40-50% of the speed of TBilateral (for the main bilateral process), and the preprocessor was a bit faster than that. Once I take out the upsampling it should be faster (though by how much, I'm not sure... a quick test showed it faster by an fps or two, but the actual difference ought to be more, since the pixel processing changes and becomes simpler). If you're getting processing times that are really slow (like less than 30% of TBilateral's speed) then I'd be confused. (If you think this version is slow, you should have seen 0.5.0.1 - eight times or so slower.)

What settings did you use - the defaults? Some settings will slow the filter down significantly at certain values. Also, when using it with TBilateral, you need to make sure that both filters use the same corresponding dev and sigma values (info in the readme). You might want to use smaller devs in TBilateral than you'd normally use, since the preprocessor adds a bit of smoothing on its own.

As for the "edge preservation": the real strength of this process is that it's supposed to be able to tackle noise closer to edges than a regular bilateral process can. So yeah, in some sense the edge preservation is supposed to be better (more noise removal close to edges).

EDIT:
Another thing ExtendedBilateral offers is multiple kernels (the flat one is supposed to be pretty fast... some of the others should be a lot faster than the default Gaussian, too... check the PRASA paper mentioned in the readme for more info), as well as median bilateral filters, which are also suggested as improvements in the same paper.

Last edited by insanedesio; 22nd June 2005 at 08:56.
Old 22nd June 2005, 09:53   #4  |  Link
ambrotos
Evil tweaker...
 
 
Join Date: Sep 2002
Posts: 33
ok you win ^^
Old 22nd June 2005, 19:03   #5  |  Link
insanedesio
Win? Who said it was a competition? You should try to continue yours; chances are it'll end up being faster than mine.
Old 22nd June 2005, 21:49   #6  |  Link
AVIL
Scripting slowly

Hi:

I have repeated my tests. My script was:

setmemorymax(128)
loadplugin("C:\Archivos de programa\AviSynth 2.5\plugins2\extendedbilateral.dll")
t=avisource("bvi_v1.avi").assumetff().crop(256,0,256,0,align=true)

q=t.TDeint(order=-1,mode=1,field=1,type=0,sharp=true,mtnmode=0,mthreshL=1,mthreshC=1,cthresh=1).converttoyv12()

# ExtendedBilateral as preprocessor only, feeding TBilateral's ppclip
i=q.ExtendedBilateral(preprocess=0)
v=q.tbilateral(sdevl=5,sdevc=5,ppclip=i,gui=false,chroma=true)

# original | filtered | difference, side by side
stackhorizontal(q,v,subtract(q,v))

and the timing:

- Time to process frame 0 = 2 minutes 30 seconds.
- Time to skip to frame 1 = 2 minutes 25 seconds.

On my PC:
Pentium III 800 EB (with the larger cache)
384 MB RAM
Windows XP Professional with no other processes running (other than antivirus and housekeeping)

Version 0.5.0.2 of the filter.

By the way: the more, the merrier. I appreciate all the filters that people share free of charge. All of them. Even the less applicable ones.
Old 22nd June 2005, 22:59   #7  |  Link
Dreassica
Registered User
 
Join Date: May 2002
Posts: 384
Weird - I'm on an Athlon 64 3000+ with 1 GB of DDR400 RAM. It takes 3 minutes to load and 3 minutes to render 1 frame.
Feels like I'm using an Amiga for encoding with this filter :P
Old 23rd June 2005, 02:57   #8  |  Link
insanedesio
@Dreassica
That's even weirder, because that's what I'm on... exactly. It ran at, I think, 3-4 fps for me (TBilateral at 7), though I can't remember exactly (just the main bilateral filter). Of course, I was running it with just AviSource() and ConvertToYUY2() and default settings. What settings did you use - the defaults? Are you using it in conjunction with TBilateral? Maybe ExtendedBilateral doesn't like working with other filters for some reason. The clip I tested on was only 40 frames long... but I don't think that should matter.

@AVIL
That's even worse than I thought originally, since that's only the preprocess, which ran at ~4-6 fps for me. It's possible the SetMemoryMax() might have something to do with it, but there's not much avoiding that if you're on 384 MB of RAM. My guess would be that your 800 MHz processor has something to do with it, being ~4 times slower than mine; even so, 2 minutes doesn't make sense. I didn't test it in conjunction with TBilateral, but the preprocessor on its own took at most six seconds or so to load and then less than half a second (probably around a quarter or a fifth of a second) to render. Try ExtendedBilateral on its own, without TBilateral (with preprocess=2, the default). I'm going to run some tests using both in conjunction.

Last edited by insanedesio; 23rd June 2005 at 03:10.
Old 23rd June 2005, 03:47   #9  |  Link
insanedesio
Added a new version (0.5.0.3) in the first post. This should be faster (thanks to a tip from tritical - he pointed out something that just ate up time in the upsampling routine... which I'm still going to kill, eventually), especially if you're using YV12. In my YUY2 tests it brought the load time down significantly and brought the processing speed up by 1 fps (a 20% increase; I explain in the readme how it could actually be up to 1.9 fps, which would be 42% in that case... though that's probably pushing it... a lot). Hopefully that works better for you guys.

Last edited by insanedesio; 23rd June 2005 at 03:52.
Old 23rd June 2005, 04:13   #10  |  Link
ambrotos
@insanedesio

I'm sorry ^^
I don't take it as a competition, but I have plenty of work and almost no time to implement ^^ so you win because you did it before me, that's all ^^ (and it's really a good thing - I'm eager to try it)

I will keep giving you my opinion (and/or advice) on this topic, but I wonder whether I'll find the time to throw away Matlab and focus on a real AviSynth plugin.

Long life to the skillful developers ^^

(signed: the evil - but inexperienced - tweaker)

BTW, be happy: in Matlab it takes about 10 minutes for one frame.

For the moment I have some ideas for new functions to replace the Gaussian one, but... even slower ^^ (I have to stop doing maths and write some code ^^)

Last edited by ambrotos; 23rd June 2005 at 04:45.
Old 23rd June 2005, 04:38   #11  |  Link
insanedesio
Let me know about those functions; I can add them as kernels to mine (with due credit, of course), especially since it already allows multiple kernels.
Old 23rd June 2005, 07:32   #12  |  Link
ambrotos
In fact I was thinking about something for the range filter (the estimator of the weight, given the "colour distance") in the second step of the bilateral filter.

For a given distance (in colour space): d = abs[ pixel_value(x) - pixel_value(e) ]

weight(d) = 1 / sqrt[ 1 + (d / d_c) ^ (2*n) ]

It is a kind of low-pass filter, where d_c plays the role of the cutoff frequency and n is the order of the filter. The shape can be chosen very close to a Gaussian (d_c = 1), but it is more flexible, because we can set up a filter that gives high weights up to a certain colour distance and cuts off sharply just after (with a high enough order).

Example for n = 5 (red), n = 50, and n = 500 (blue), d_c = 500

So, in the case of a high order (e.g. n > 500), for a uniformly coloured area with noise (but under the cutoff level) it acts as a real Gaussian filter, and if there is a detail (above the threshold) it uses the bilateral ability.
But it is not only a binary threshold... it can be softened by using a lower order.

And by tweaking the d_c value you can delay the response to a change of colour.

Any questions?
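A quick sketch of that weight function in C++ (illustrative only - the function name is mine, not from any actual plugin code):

```cpp
#include <cmath>

// Butterworth-style range kernel: d is the colour distance, d_c the cutoff,
// n the order. Higher n makes the falloff sharper (closer to a hard threshold).
double butterworthWeight(double d, double d_c, int n)
{
    return 1.0 / std::sqrt(1.0 + std::pow(d / d_c, 2.0 * n));
}
```

Note that at d = d_c the weight is 1/sqrt(2) regardless of the order; raising n just steepens the transition around that point.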

Last edited by ambrotos; 23rd June 2005 at 12:14.
Old 23rd June 2005, 20:12   #13  |  Link
insanedesio
Can't say I feel like adding that, since it takes two parameters, and all the kernels I use take only one. It's also mathematically similar to the El-Fallah Ford kernel; the only difference is that the El-Fallah Ford kernel doesn't have a variable order... it's just (1 + (diff / sigma) ^ 2) ^ (-0.5).
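For comparison, a sketch of the El-Fallah Ford kernel as written above (again just illustrative; the name is mine):

```cpp
#include <cmath>

// El-Fallah Ford range kernel: (1 + (diff/sigma)^2)^(-0.5).
// Same general shape as the proposed kernel with the order fixed at n = 1.
double elFallahFordWeight(double diff, double sigma)
{
    double r = diff / sigma;
    return 1.0 / std::sqrt(1.0 + r * r);
}
```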

Last edited by insanedesio; 23rd June 2005 at 20:14.
Old 23rd June 2005, 21:41   #14  |  Link
tsp
Registered User
 
 
Join Date: Aug 2004
Location: Denmark
Posts: 807
insanedesio, have you done any profiling to figure out what makes this filter so slow?
Old 24th June 2005, 00:11   #15  |  Link
insanedesio
I'm pretty new to *actual* programming, so I don't really know what you mean by "profiling." I've looked at the code and tried to find places where I think it might be slow, but I have no idea how to narrow down the places that are slowing it down in reality. I believe the most recent change I made might've been a culprit when working in YV12, and I'm pretty sure it'll be faster still when I take out the upsampling, but beyond that I really have no clue.
Old 24th June 2005, 01:15   #16  |  Link
tsp
You can use the free CodeAnalyst tool from AMD (http://www.amd.com/us-en/Processors/...9_3604,00.html); it works even on a non-AMD processor, although only the timer part (but that is the part you need). Alternatively, you can use QueryPerformanceCounter or another timer to measure the time it takes to complete different parts of your code.
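The manual-timer approach boils down to bracketing a suspect section of code with a high-resolution clock. A rough, portable sketch (std::chrono standing in for the Windows-specific QueryPerformanceCounter, and a dummy loop standing in for the real filter code):

```cpp
#include <chrono>
#include <cstdio>

// Time one section of code and report the elapsed milliseconds.
double timeSection()
{
    auto t0 = std::chrono::steady_clock::now();

    volatile double acc = 0.0;          // dummy work standing in for
    for (int i = 0; i < 1000000; ++i)   // e.g. the upsampling routine
        acc = acc + i * 0.5;

    auto t1 = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    std::printf("section took %.3f ms\n", ms);
    return ms;
}
```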
Old 24th June 2005, 01:29   #17  |  Link
insanedesio
Wicked. I'll play with it later today or tomorrow.
Old 24th June 2005, 06:08   #18  |  Link
insanedesio
Well, I took a look at CodeAnalyst and couldn't quite figure out how to get it to work (especially with a .dll). Mind giving me a quick walkthrough? I must say I'm rather interested in finding out whether the upsampling really is that slow, especially after the change in 0.5.0.3 that made it faster. Keeping it in there makes the coding simpler - the way I did it, at least... if it's only a small culprit then it probably isn't worth taking out, but if it's bigger I'll have to take the time to get rid of it :-\

Last edited by insanedesio; 24th June 2005 at 06:34.
Old 24th June 2005, 15:02   #19  |  Link
tsp
Sure. First, create a release build with Debug Information Format set to Program Database (it hides in Project Properties -> C/C++ -> General), Generate Debug Info set to Yes (under Project Properties -> Linker -> Debugging), and Enable Incremental Linking set to Yes (under Project Properties -> Linker -> General).
Next, start CodeAnalyst and create a new project (File -> New). Set Session Name to "Timer 1", Project Directory to the place where you want to save the generated files, and Project Name to ExtendedBilateral. Ignore Working Directory and set Launch App to the player you use to open the .avs file. I use VirtualDub, so I set Launch App to this: "c:\vd16\virtualdub.exe" "c:\test.avs", so that VirtualDub opens the .avs script that contains the filter you want to profile.
Set the session type to Timer Trigger, set Project Information -> Terminate App, and set the duration to 360 seconds and the start delay to 10 seconds (the time it takes for VirtualDub to open the .avs file and for you to click play once it's open).
Now click Play, and when VirtualDub has opened the .avs file, hit Play in VirtualDub and wait while CodeAnalyst collects the data.
When CodeAnalyst is done and VirtualDub has closed, find your session under TBP Sessions and double-click it. The System Data tab appears with all the loaded programs and DLLs. Find ExtendedBilateral and double-click it. A list with the time per function appears; the function name is shown in the Symbol + Offset column. If it only displays NO SYMBOL, it might be necessary to use the command-line tool, or to check whether the linker options are being disabled by some other setting (like Enable Global Optimization). If you double-click a function name, the source code appears with the timing information.
Old 24th June 2005, 23:02   #20  |  Link
insanedesio
Alright, the test results basically told me that the upsampling/downsampling take about 1.5% of the whole filter's processing time; the rest goes to the pprocess() and process() functions, which are the main functions anyway. Small amounts (a couple of samples amidst thousands) went to the LUT-building functions (and thus the kernel functions). The only thing that can be done now, I suppose, is to somehow speed up the main functions themselves. Though the sampling didn't take up much time, the main functions ought to be faster once I get rid of it, since they'll have less to process - assuming I set it up right, which I still need to think through before I get started.

Other than that, there wasn't much else. _ftol() took up 6% or so, but I don't think much can be done about that.