24th May 2015, 08:54   #1
mawen1250
Registered User
Join Date: Aug 2011
Posts: 103
BM3D-r7 | state-of-the-art image/video denoising filter

Source & Readme

Binary: GitHub | NMM

After more than a month of work, it's finally complete.
It includes the 2D (image) denoising filter BM3D and the 3D (video) denoising filter V-BM3D.

This denoising algorithm has fairly high computational complexity, so it is very slow.
Moreover, V-BM3D consumes a lot of memory, because for each current frame multiple neighboring frames are requested, and the filtering result is also aggregated back into those frames.
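A plain-Python sketch of what "multiple frames are requested" means for one output frame. It assumes the two-stage V-BM3D pipeline used in this thread (bm3d.VBasic or bm3d.VFinal, then bm3d.VAggregate with the same radius) and clamps frame indices at the clip edges; the plugin's real boundary handling may differ, and the function names are hypothetical.

```python
def source_window(n, radius, num_frames):
    """Source frame indices touched when producing output frame n.

    Assumption: stage 1 (VBasic/VFinal) filters frame m using source frames
    m-radius..m+radius, and stage 2 (VAggregate) combines the stage-1 results
    of frames n-radius..n+radius. Indices are clamped to the clip (also an
    assumption about boundary handling)."""
    lo = max(0, n - 2 * radius)
    hi = min(num_frames - 1, n + 2 * radius)
    return list(range(lo, hi + 1))


def frames_per_output(radius):
    """Away from the clip edges: 2*radius+1 stage-1 frames are aggregated,
    and together they span 4*radius+1 distinct source frames."""
    return {"stage1": 2 * radius + 1, "source_span": 4 * radius + 1}


for r in (0, 1, 2):
    print(r, frames_per_output(r), source_window(100, r, 1000))
```

Under these assumptions a single output frame touches 4*radius+1 source frames (9 for radius=2), plus the intermediate stage-1 frames held in the cache, which is one reason memory use climbs quickly with radius.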

For convenience, I wrote a script named mvsfunc, with which you can apply it with a simple call like:
Code:
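import vapoursynth as vs
import mvsfunc as mvf

core = vs.core
core.max_cache_size = 4000  # in MB; set a big enough cache size. For V-BM3D you may need 8000 or even more (depending on resolution and radius)

# clip = ...  (any input format, loaded with the source filter of your choice)
clip = mvf.BM3D(clip, sigma=[3,3,3], radius1=0)  # radius1=0 for BM3D, radius1>0 for V-BM3D
# The output has the same format as the input
clip.set_output()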

Last edited by mawen1250; 16th August 2016 at 06:44. Reason: update r7
24th May 2015, 18:49   #2
mawen1250
Registered User
Join Date: Aug 2011
Posts: 103
Update r2

Fixed memory leaks in VAggregate.
26th May 2015, 16:06   #3
mawen1250
Registered User
Join Date: Aug 2011
Posts: 103
Update r3

Fixed a stupid mistake: the order of applying DCT and IDCT was reversed (it should be DCT -> filter -> IDCT), which led to over-filtering of fine structures and produced artifacts such as blocking, ringing and desaturation.
Now it actually filters as expected.
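To illustrate the corrected order, here is a minimal NumPy sketch for a single 2D block using plain hard thresholding. It is only an illustration under assumptions: the real filter transforms whole 3D groups of matched blocks with its own transforms and thresholds, and `dct_matrix`, `hard_threshold_block` and the threshold value are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    c[0, :] = np.sqrt(1.0 / n)
    return c


def hard_threshold_block(block, threshold):
    """DCT -> filter -> IDCT on one square 2D block."""
    c = dct_matrix(block.shape[0])
    coefs = c @ block @ c.T          # forward 2D DCT
    keep = np.abs(coefs) >= threshold
    keep[0, 0] = True                # always keep the DC (mean) coefficient
    coefs = coefs * keep             # the "filter" step: hard thresholding
    return c.T @ coefs @ c           # inverse 2D DCT


rng = np.random.default_rng(0)
sigma = 3.0
noisy = 100.0 + rng.normal(0.0, sigma, size=(8, 8))
denoised = hard_threshold_block(noisy, threshold=2.7 * sigma)  # factor 2.7 is illustrative
```

Because the transform is orthonormal, a threshold of 0 reconstructs the block exactly; swapping the forward and inverse steps would instead threshold in the wrong domain.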
28th May 2015, 01:06   #4
MonoS
Registered User
Join Date: Aug 2012
Posts: 203
I get incorrect results when disabling chroma processing in the OPP colorspace (the same happens when disabling only one channel):
Code:
import vapoursynth as vs
import nnedi3_resample as edi

core = vs.core
# crop = ...  (the cropped source clip)
res = edi.nnedi3_resample(crop, crop.width, crop.height, mats="601", fulls=False, curves="709", sigmoid=True, scale_thr=1.0, csp=vs.RGB48)

# Denoise only the first OPP plane (sigma=0 for the other two planes)
res = core.bm3d.RGB2OPP(res)
ref = core.bm3d.VBasic(res, radius=1, matrix=100, sigma=[8,0,0]).bm3d.VAggregate(radius=1)
flt = core.bm3d.VFinal(res, ref, radius=1, matrix=100, sigma=[8,0,0]).bm3d.VAggregate(radius=1)
flt = core.bm3d.OPP2RGB(flt)

flt = edi.nnedi3_resample(flt, flt.width, flt.height, matd="601", fulld=False, curves="709", sigmoid=True, scale_thr=1.0, csp=vs.YUV420P16)
IIRC, OPP is similar to YUV, so this operation should be possible; correct me if I'm wrong.
28th May 2015, 13:59   #5
mawen1250
Registered User
Join Date: Aug 2011
Posts: 103
Ah, I forgot to mention it in the README...
When a plane is not processed by V-BM3D, the result for that plane will be garbage, so you should merge the planes manually with std.ShufflePlanes.
This is mainly because the V-BM3D implementation is split into 2 functions, which makes it inconvenient and inefficient to pass the unprocessed planes through.

I will add a note and an example about this to the README later.
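In VapourSynth terms, the merge for the script above would take plane 0 from the denoised clip and planes 1 and 2 from the OPP clip before denoising, for example `flt = core.std.ShufflePlanes([flt, res, res], planes=[0, 1, 2], colorfamily=res.format.color_family)` placed before `bm3d.OPP2RGB`. The NumPy sketch below only mimics what that merge does to one frame; frames are modelled as hypothetical (3, H, W) arrays and the "garbage" planes as NaN, so it is an illustration rather than VapourSynth code.

```python
import numpy as np


def shuffle_planes(frames, planes):
    """Output plane i is plane planes[i] of frames[i] (one frame per output plane)."""
    if len(frames) != len(planes):
        raise ValueError("need one source frame per output plane")
    return np.stack([frame[p] for frame, p in zip(frames, planes)])


h, w = 4, 4
# OPP frame before denoising (three planes with arbitrary values)
src = np.stack([np.full((h, w), v) for v in (0.5, 0.1, -0.1)])
# V-BM3D result where only plane 0 was processed; planes 1 and 2 are garbage (NaN here)
flt = np.stack([np.full((h, w), 0.45), np.full((h, w), np.nan), np.full((h, w), np.nan)])

merged = shuffle_planes([flt, src, src], [0, 1, 2])
print(np.isnan(merged).any())
```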

Last edited by mawen1250; 28th May 2015 at 14:49.
21st November 2015, 06:46   #6
mawen1250
Registered User
Join Date: Aug 2011
Posts: 103
Update r4

Added SSE2 optimizations
Basic, Final: profile="fast" now uses group_size=8 instead of 16 for better speed
OPP2RGB, RGB2OPP: now override the frame property "_Matrix"
Compiled with Visual Studio 2015

~35% faster with the same settings
~175% faster with default settings for bm3d.Basic+bm3d.Final
11th December 2015, 00:23   #7
MonoS
Registered User
Join Date: Aug 2012
Posts: 203
This goes into my backlog of AVX optimizations.
23rd January 2016, 02:54   #8
mawen1250
Registered User
Join Date: Aug 2011
Posts: 103
Update r5

Final estimate: fixed NaN or INF results, which could produce corrupt output and/or break subsequent filters.
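For context, one common way non-finite values can appear in a BM3D-style Wiener (final) stage is the aggregation weight 1 / (sigma^2 * sum(W^2)) from the original BM3D formulation: if every Wiener coefficient W is zero (for example, an all-zero reference group), the weight becomes 1/0. The sketch below shows one way to guard against that. It is an assumption about where such values can come from, not the actual r5 change, and `wiener_group` and `eps` are hypothetical names.

```python
import numpy as np


def wiener_group(noisy_coefs, basic_coefs, sigma, eps=1e-12):
    """Wiener shrinkage of one group's transform coefficients.

    Returns (filtered coefficients, aggregation weight), keeping both finite."""
    noisy = np.asarray(noisy_coefs, dtype=float)
    basic_energy = np.asarray(basic_coefs, dtype=float) ** 2
    denom = basic_energy + sigma ** 2
    # W = |basic|^2 / (|basic|^2 + sigma^2); where the denominator vanishes
    # (basic == 0 and sigma == 0) pass the coefficient through instead of 0/0
    shrink = np.divide(basic_energy, denom, out=np.ones_like(denom), where=denom > eps)
    filtered = shrink * noisy
    # Aggregation weight 1 / (sigma^2 * ||W||^2), clamped so it never becomes 1/0
    norm = max(sigma ** 2 * float(np.sum(shrink ** 2)), eps)
    return filtered, 1.0 / norm


# An all-zero reference group with sigma > 0: unguarded, the weight would be 1/0
filtered, weight = wiener_group(np.ones((4, 4)), np.zeros((4, 4)), sigma=8.0)
print(np.isfinite(filtered).all(), np.isfinite(weight))
```

Clamping gives such a group a very large but finite weight; skipping the group during aggregation would be another reasonable choice.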
23rd January 2016, 04:38   #9
Hagi
Registered User
Join Date: Jan 2013
Posts: 4
Thanks mawen1250. It is, by far, the very best denoising filter I've ever used.

In case you use Paypal, I'd be happy to buy you a few beers. Cheers!
23rd January 2016, 09:58   #10
mawen1250
Registered User
Join Date: Aug 2011
Posts: 103
Quote:
Originally Posted by Hagi
Thanks mawen1250. It is, by far, the very best denoising filter I've ever used.

In case you use Paypal, I'd be happy to buy you a few beers. Cheers!
Wow, thanks for your support! I'm glad you like this filter; that's the best reward for me. I'll PM you my Paypal.
30th January 2016, 06:03   #11
Hagi
Registered User
Join Date: Jan 2013
Posts: 4
Quote:
Originally Posted by mawen1250
Wow, thanks for your support! I'm glad you like this filter; that's the best reward for me. I'll PM you my Paypal.
You're welcome. Hard work deserves a fair reward, within your means.
8th April 2016, 15:09   #12
asarian
Registered User
Join Date: May 2005
Posts: 1,464
Looks wonderful! What is the best way to pre-calculate core.max_cache_size, though? Is there some kind of formula I can use?
__________________
Gorgeous, delicious, deculture!
9th April 2016, 10:43   #13
mawen1250
Registered User
Join Date: Aug 2011
Posts: 103
Quote:
Originally Posted by asarian
Looks wonderful! What is the best way to pre-calculate core.max_cache_size, though? Is there some kind of formula I can use?
I just did a simple test with mvf.BM3D:
1920x1080 input
threads=8, max_cache_size=20000
profile="fast"

radius | Memory consumption
0 | ~4GB
1 | ~10GB
2 | ~15GB

Then try lowering max_cache_size to see whether there is an obvious speed penalty. In my tests, 2/3 of the memory consumption above is a well-balanced point.
That is, max_cache_size=2500 for radius=0, 7000 for radius=1 and 10000 for radius=2 (values in MB), for 1920x1080 video.
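The rule of thumb above as a small calculation: 2/3 of the measured peak gives roughly 2667, 6667 and 10000 MB, which the suggestion rounds to 2500, 7000 and 10000. The sketch assumes 1000 MB per GB. The second helper is a hypothetical extension for other resolutions that assumes memory grows linearly with pixel count; that was not measured here, so check actual memory use before relying on it.

```python
def suggested_cache_mb(measured_gb, fraction=2 / 3, mb_per_gb=1000):
    """max_cache_size (MB) as a fraction of the peak memory measured with a huge cache."""
    return round(measured_gb * mb_per_gb * fraction)


def scale_to_resolution(cache_mb, width, height, ref_width=1920, ref_height=1080):
    """Hypothetical rescaling, assuming memory grows linearly with pixel count."""
    return round(cache_mb * (width * height) / (ref_width * ref_height))


# Peak memory (GB) measured above: 1920x1080, profile="fast", threads=8
measured = {0: 4, 1: 10, 2: 15}
for radius, gb in measured.items():
    print(radius, suggested_cache_mb(gb), scale_to_resolution(suggested_cache_mb(gb), 640, 480))
```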
9th April 2016, 11:17   #14
asarian
Registered User
Join Date: May 2005
Posts: 1,464
Thank you very much!

I only have 12 GB installed, so I guess I can forget about radius=5, then (although that was on a 640x480 source, so maybe I can make it work anyway).
9th April 2016, 11:57   #15
jackoneill
unsigned int
Join Date: Oct 2012
Location: 🇪🇺
Posts: 760
Quote:
Originally Posted by mawen1250
I just did a simple test with mvf.BM3D:
1920x1080 input
threads=8, max_cache_size=20000
profile="fast"

radius | Memory consumption
0 | ~4GB
1 | ~10GB
2 | ~15GB

Then try lowering max_cache_size to see whether there is an obvious speed penalty. In my tests, 2/3 of the memory consumption above is a well-balanced point.
That is, max_cache_size=2500 for radius=0, 7000 for radius=1 and 10000 for radius=2 (values in MB), for 1920x1080 video.
How many frames did you filter? The cache reevaluates its behaviour every few hundred frames, so the speed/memory consumption may improve (or degrade) over time.
__________________
Buy me a "coffee" and/or hire me to write code!
9th April 2016, 18:20   #16
mawen1250
Registered User
Join Date: Aug 2011
Posts: 103
Quote:
Originally Posted by jackoneill
How many frames did you filter? The cache reevaluates its behaviour every few hundred frames, so the speed/memory consumption may improve (or degrade) over time.
Approximately 500 frames.

From previous experience, the memory consumption of VS gradually decreases during encoding. It's an interesting question whether it also decreases when V-BM3D is used, so that we could set a lower max_cache_size. Maybe I'll do more tests to find out.
10th April 2016, 13:30   #17
asarian
Registered User
Join Date: May 2005
Posts: 1,464
Quote:
Originally Posted by jackoneill
How many frames did you filter? The cache reevaluates its behaviour every few hundred frames, so the speed/memory consumption may improve (or degrade) over time.
I noticed that the other day, when I set

core.max_cache_size = 4096

(using TempLinearApproximateMC) just to see what would happen. Python started off at about 6 GB (as expected for the job), but slowly tapered off to almost not using the cache at all. It seems Python is clever enough to figure out what it actually needs over time. I like that. (And it's kind of logical, too: it is, after all, max_cache_size, not just cache_size.)
11th April 2016, 08:08   #18
mawen1250
Registered User
Join Date: Aug 2011
Posts: 103
Quote:
Originally Posted by asarian
I noticed that the other day, when I set

core.max_cache_size = 4096

(using TempLinearApproximateMC) just to see what would happen. Python started off at about 6 GB (as expected for the job), but slowly tapered off to almost not using the cache at all. It seems Python is clever enough to figure out what it actually needs over time. I like that. (And it's kind of logical, too: it is, after all, max_cache_size, not just cache_size.)
Python is just an interface to VS, though; it's the VS core (written in C++) that manages the frame cache.

By the way, I've tested processing more frames with BM3D(radius=2).
The results so far:
Code:
frame   memory(MB)
100     15500
750     15500
1950    15300
8400    13400
37600   12700
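To put a number on the trend in the table: a small sketch computing the relative drop from the first to the last sample. The data are the measurements above; `relative_drop` is a hypothetical helper name.

```python
# (frame number, memory in MB) from the BM3D(radius=2) run above
samples = [(100, 15500), (750, 15500), (1950, 15300), (8400, 13400), (37600, 12700)]


def relative_drop(samples):
    """Fractional decrease from the first to the last memory sample."""
    first, last = samples[0][1], samples[-1][1]
    return (first - last) / first


print(f"{relative_drop(samples):.1%} lower after {samples[-1][0]} frames")
```

That is about an 18% reduction, most of it appearing after the first couple of thousand frames.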

Last edited by mawen1250; 11th April 2016 at 08:12.
8th June 2016, 17:19   #19
asarian
Registered User
Join Date: May 2005
Posts: 1,464
Is this, despite its wicked memory requirements, perchance a good alternative to KNLMeansCL? (Something I could use on my VM, sans relevant GPU, to tide me over until I have the GTX 750 for it).
8th June 2016, 17:26   #20
feisty2
I'm Siri
Join Date: Oct 2012
Location: void
Posts: 2,633
Quote:
Originally Posted by asarian
Is this, despite its wicked memory requirements, perchance a good alternative to KNLMeansCL? (Something I could use on my VM, sans relevant GPU, to tide me over until I have the GTX 750 for it).
NLMeans -> for white noise
BM3D -> for Gaussian noise

Gaussian noise is just one kind of white noise; there are other kinds out there as well.

They are different algorithms, and the two work best when combined rather than one replacing the other. You could slightly over-denoise the image with BM3D and then refine it with an NLMeans loop; the result will be stunning and better than either plain BM3D or plain NLMeans.
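To make the "Gaussian is just one kind of white noise" point concrete, the NumPy sketch below draws Gaussian and uniform noise with the same variance. Both are white (neighbouring samples are uncorrelated), but their amplitude distributions differ, as the kurtosis (about 3 vs about 1.8) shows. The distributions, seed and sample size are arbitrary choices for illustration; for reference, the original BM3D formulation models the noise as additive white Gaussian noise with standard deviation sigma.

```python
import numpy as np


def lag1_autocorr(x):
    """Correlation between neighbouring samples (about 0 for white noise)."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


def kurtosis(x):
    """Fourth standardized moment: 3 for Gaussian, 1.8 for uniform."""
    x = x - x.mean()
    return float(np.mean(x ** 4) / np.mean(x ** 2) ** 2)


rng = np.random.default_rng(0)
n = 200_000
gaussian = rng.normal(0.0, 1.0, n)
uniform = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n)  # also variance 1

for name, x in (("gaussian", gaussian), ("uniform", uniform)):
    print(name, round(lag1_autocorr(x), 3), round(kurtosis(x), 2))
```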

Last edited by feisty2; 8th June 2016 at 17:37.

All times are GMT +1.