
View Full Version : xClean 3-Pass Denoiser



MysteryX
24th September 2021, 18:41
xClean 3-Pass Denoiser for VapourSynth (https://github.com/mysteryx93/xClean/blob/main/xClean.py)

Here's a denoising comparison video (https://www.youtube.com/watch?v=Z0OrPP0VOUI)

beta 7 (2021-10-27) by Etienne Charland
Supported formats: YUV, RGB, GRAY
Requires: rgsf, rgvs, fmtc, mv, mvsf, tmedian, knlm, bm3d, bm3dcuda_rtc, bm3dcpu, neo_f3kdb, akarin, nnedi3_resample, nnedi3cl

xClean runs MVTools -> BM3D -> KNLMeans in that order, passing the output of each pass as the ref of the next denoiser.

The objective is to remove noise while preserving as much detail as possible. Removing noise is easy -- just blur out everything.
The hard work is in preserving the details in a way that feels natural.

Designed for raw camera footage to remove noise in dark areas while preserving the fine details. It works for most types of content.

Performance-wise, BM3D pass is the heaviest and helps recover fine details, but this script runs 1 pass of BM3D whereas stand-alone BM3D runs twice.


+++ Short Doc (TL;DR) +++
Default settings provide the best quality in most cases. Simply use
xClean(sharp=..., outbits=...)

If only darker areas contain noise, set strength=-50
For better performance, set m1=0 or m2=0, or set m1=.5 and m2=3.6 (downscale)
BM3D performance can be greatly improved by setting radius=0, block_step=7, bm_range=7, ps_range=5

For 720p WebCam, optimal settings are: sharp=9.5, m1=.65, h=2.8
For 288p anime, optimal settings are: sharp=9.5, m1=.7, rn=0, optional depth=1
For 4-5K GoPro (with in-camera sharpening at Low), optimal settings are: sharp=7.7, m1=.5, m2=3.7, optional strength=-50 (or m1=.6, m2=3.8 if your computer can handle it)
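
The three recommended setting groups above can be kept as plain keyword-argument presets. This is only a sketch: the dictionary layout, the preset names and the helper `xclean_kwargs` are hypothetical and not part of xClean; only the parameter values come from the lines above.

```python
# Presets transcribed from the short doc above. The dict layout, the preset
# names and the helper below are illustrative only, not part of xClean.
XCLEAN_PRESETS = {
    "webcam_720p": {"sharp": 9.5, "m1": 0.65, "h": 2.8},
    "anime_288p": {"sharp": 9.5, "m1": 0.7, "rn": 0},
    "gopro_4k5k": {"sharp": 7.7, "m1": 0.5, "m2": 3.7},
}


def xclean_kwargs(preset, **overrides):
    """Return a copy of the preset with caller overrides applied last."""
    kwargs = dict(XCLEAN_PRESETS[preset])
    kwargs.update(overrides)
    return kwargs


# GoPro preset plus the optional dynamic strength for noisy dark areas.
gopro = xclean_kwargs("gopro_4k5k", strength=-50)
print(gopro)
```

In a VapourSynth script the result would then be passed along the lines of xClean(clip, **gopro).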


+++ Description +++
KNLMeans does a good job at denoising but can soften the image, lose details and give an artificial plastic look. I found that on any given source
(tested 5K GoPro footage and noisy WebCam), denoising with less than h=1.4 looks too noisy, and anything above it blurs out the details.
KNLMeans also keeps a lot of data from the clip passed as rclip, so doing a good prefilter highly impacts the output.

Similarly, BM3D performs best with sigma=9. A lower value doesn't remove enough noise, and a higher value only makes the edges sharper.

xClean is essentially KNLMeans with advanced pre-filtering and with post-processing to renoise & sharpen to make the image look more natural.

One strange aspect of xClean is that denoising is automatic and there's very little room to configure denoising strength other than reducing the overall effect.
It runs with BM3D sigma=9 and KNL h=1.4, and generally you shouldn't change that. One setting that can allow increasing denoising (and performance)
is downscaling MVTools and BM3D passes. You can also set h=2.8 if the output remains too noisy. h = 1.4 or 2.8 are generally the best values.

According to my tests, water & cliff 5K video with little noise preserves the details very well while removing subtle grain, and with same settings,
very noisy 720p WebCam footage has HUGE noise reduction while preserving a surprising amount of natural details.

The default settings are very tolerant to various types of clips.

All processing is done in YUV444 format. When conv=True, processing is done in YCgCoR, and in OPP colorspace for BM3D.


+++ Denoising Methods Overview +++
To provide the best output, processing is done in 3 passes, passing the output of each pass as the ref clip of the next pass. Each denoiser has its strengths and weaknesses.

Pass 1: MVTools (m1)
Strength: Removes a lot of noise, good at removing temporal noise.
Weakness: Can remove too much, especially with delicate textures like water.
Ref: Impacts vectors analysis but low impact on outcome

Pass 2: BM3D (m2)
Strength: Good at preserving fine details!
Weakness: Doesn't remove much grain.
Ref: Moderate impact on outcome. A blurry ref will remove more grain while BM3D puts back a lot of details.

Pass 3: KNLMeansCL (m3)
Strength: Best general-purpose denoiser
Weakness: Can blur out details and give an artificial plastic effect
Ref: High impact on the outcome. All prefilters benefit from running KNLMeans over it.


+++ Denoising Pass Configuration (m1=.6, m2=2, m3=2) +++
Each pass (method) can be configured with m1 (MVTools), m2 (BM3D) and m3 (KNLMeansCL) parameters to run at desired bitdepth.
This means you can fine-tune for quality vs performance.

0 = Disabled, 1 = 8-bit, 2 = 16-bit, 3 = 32-bit

Note: BM3D always processes in 32-bit, KNLMeansCL always processes in 16-bit+, and post-processing always processes at least in 16-bit, so certain
values such as m2=1, m3=1 will behave the same as m2=2, m3=2. Setting m2=2 instead of 3 will only affect BM3D post-processing (YUV444P16 instead of YUV444PS)

MVTools (m1) and BM3D (m2) passes can also be downscaled for performance gain, and it can even improve quality! Values between .5 and .8 generally work best.

Optional resize factor is set after the dot:
m1 = .6 or 1.6 processes MVTools in 8-bit at 60% of the size. m2 = 3.6 processes BM3D in 32-bit at 60% of the size.
You may want to downscale MVTools (m1) because of high CPU usage and low impact on outcome.
You may want to downscale BM3D (m2) because of high memory usage. If you run out of memory, lower the size until you get no hard-drive paging.
Note: Setting radius=0 greatly reduces BM3D memory usage!
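
To make the "bit depth before the dot, resize factor after the dot" encoding concrete, here is a minimal sketch of how such a value could be decoded. The function name `parse_pass_setting` is hypothetical and this is not the code xClean uses; it assumes the 0/1/2/3 table above, and that a bare fraction such as .6 means 8-bit, as in the m1 example.

```python
def parse_pass_setting(m):
    """Decode an m1/m2-style value into (bit_depth, scale), or None if disabled.

    Hypothetical helper: assumes 0 = disabled, 1 = 8-bit, 2 = 16-bit,
    3 = 32-bit, and that the fractional part is the resize factor.
    """
    whole = int(m)
    # Round to hide float noise such as 3.6 - 3 == 0.6000000000000001.
    fraction = round(m - whole, 3)
    if whole == 0 and fraction == 0:
        return None  # pass disabled
    bits = {0: 8, 1: 8, 2: 16, 3: 32}[whole]
    scale = fraction if fraction > 0 else 1.0  # no fraction means full size
    return bits, scale


for value in (0, 0.6, 1.6, 2, 3.6):
    print(value, "->", parse_pass_setting(value))
```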


+++ Renoise and Sharpen (rn=14, sharp=9.5) +++
The idea comes from mClean by Burfadel (https://forum.doom9.org/showthread.php?t=174804) and the algorithm was changed by someone else while porting
to VapourSynth, producing completely different results -- original Avisynth version blurs a lot more, VapourSynth version keeps a lot more details.

It may sound counter-productive at first, but the idea is to combat the flat or plastic effect of denoising by re-introducing part of the removed noise.
The noise is processed and stabilized before re-inserting so that it's less distracting.
Renoise also helps reduce large-radius grain, but it should be disabled for anime (rn=0).

Using the same analysis data, it's also sharpening to compensate for denoising blur.
Sharpening must be between 0 and 20. Actual sharpening calculation is scaled based on resolution.


+++ Strength / Dynamic Denoiser Strength (strength=20) +++
A value of 20 will denoise normally.
A value between 1 and 19 will reduce the denoising effect by partially merging back with the original clip.
A value between 0 and -200 will activate Dynamic Denoiser Strength, useful when bright colors require little or no denoising and dark colors contain more noise.
It applies a gradual mask based on luma. Specifying a value of -50 means that out of 255 (or 219 tv range), the 50 blackest values have full-reduction
and the 50 whitest values are merged at a minimal strength of 50/255 = 20%.
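
The -50 example can be worked through numerically. The sketch below is not the xClean mask code: the function name is hypothetical, it works on 8-bit full-range luma, and the linear ramp between the two plateaus is an assumption (the text above only says the mask is gradual).

```python
def dynamic_weight(luma, strength, full_scale=255):
    """Denoising merge weight (0..1) for one luma value, for negative strength.

    Follows the description above; the linear ramp between the dark and
    bright plateaus is an assumption, not taken from xClean.
    """
    n = -strength                 # strength=-50 -> n=50
    floor = n / full_scale        # minimal weight, 50/255 for strength=-50
    lo, hi = n, full_scale - n    # plateau boundaries
    if luma <= lo:
        return 1.0                # darkest values: full noise reduction
    if luma >= hi:
        return floor              # brightest values: minimal strength
    t = (luma - lo) / (hi - lo)   # position on the assumed linear ramp
    return 1.0 + t * (floor - 1.0)


for y in (0, 50, 128, 205, 255):
    print(y, round(dynamic_weight(y, -50), 3))
```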

+++ Radius (radius=0) +++
BM3D radius. Low impact on individual frames.
Pros: Helps stabilize temporal grain. Can significantly improve video compressibility.
Cons: High impact on performance and memory usage! May require downscaling BM3D for HD content with m2 between 3.6 and 3.8
For moving water, the temporal stabilization may be undesirable.

+++ Depth (depth=0) +++
This applies a modified warp sharpening on the image that may be useful for certain things, and can improve the perception of image depth.
Settings range from 0 to 5. This function will distort the image; for animation, a setting of 1 or 2 can be beneficial to improve lines.

+++ Deband (deband=False) +++
This will perceptibly improve the quality of the image by reducing banding effect and adding a small amount of temporally stabilised grain
to both luma and chroma. Default settings are suitable for most cases without having a large effect on compressibility.

+++ Output (outbits, dmode=0) +++
Specifies the output bitdepth. If not specified it will be converted back to the bitdepth of the source clip using dithering method specified by dmode.
For high-quality dithering, you can set dmode=3 if you won't be doing any further processing.

+++ Chroma upsampling/downsampling (chroma=nnedi3, downchroma=True) +++
Chroma upsampling options:
none = don't touch chroma
bicubic = bicubic(0, .5) upsampling
nnedi3 = NNEDI3 upsampling
reconstructor = feisty2's ChromaReconstructor_faster v3.0 HBD mod

downchroma: whether to downscale back to match source clip. Default is False for reconstructor and True for other methods.

+++ Anime +++
For anime, set rn=0. Optionally, you can set depth to 1 or 2 to thicken the lines.

+++ Advanced Settings +++
gpuid = 0: The GPU id to use for KNLMeans and BM3D, or -1 to use CPU.
gpucuda = 0: The GPU id to use for BM3D, or -1 to use CPU.
h = 1.4: KNLMeans strength, can increase slightly if the output is still too noisy. 1.4 or 2.8 generally work best.
block_step = 4, bm_range = 16, ps_range = 8: BM3D parameters for performance vs quality. No impact on CPU and memory. Adjust based on GPU capability.
Fast settings are block_step = 5, bm_range = 7, ps_range = 5

Normally you shouldn't have to touch these
rgmode = 18: RemoveGrain mode used during post-processing. Setting this to 0 disables post-processing, useful to compare raw denoising.
thsad = 400: Threshold used for MVTools analysis.
d = 2: KNLMeans temporal radius. Setting 3 can either slightly improve quality or give a slight plastic effect.
a = 2: KNLMeans spatial radius.
sigma = 9: BM3D strength.
bm3d_fast = False. BM3D fast.
conv = True. Whether to convert to OPP format for BM3D and YCgCoR for everything else. If false, it will process in standard YUV444.

MysteryX
24th September 2021, 18:53
A few things remain to fix:

- KNLMeansCL, what would be the optimal settings for dark scenes? Should chroma be processed with the same strength as luma? Would rclip be useful here?
- BM3D, what would be the right way to use it? It could be added as method=2
- I'll do some improvement to the dark scene detection
- You can test that strength, sharp and other parameters are doing what they're supposed to be doing

It will be converted to Avisynth only when the VapourSynth script is completed. MvTools2 temporal analysis runs MUCH smoother in VapourSynth.

I did a very quick benchmark. Processing 5K video at
8-bit: 1.4 fps
16-bit: 0.8 fps
32-bit: 0.20 fps
8-bit with outbits=16: 1.2 fps

ChaosKing
24th September 2021, 19:47
Nice work.
The new Vapoursynth version (API4) does not support COMPAT anymore.
mvsf.Degrain3, mvsf.Degrain4 means it refers to the old mvtools-sf version. The new version only knows mvsf.Degrain.

MysteryX
24th September 2021, 20:00
Nice work.
The new Vapoursynth version (API4) does not support COMPAT anymore.
mvsf.Degrain3, mvsf.Degrain4 means it refers to the old mvtools-sf version. The new version only knows mvsf.Degrain.
So just change mvsf.Degrain4 to mvsf.Degrain, and what about mv.Degrain3?

ChaosKing
24th September 2021, 21:59
It seems you now need to specify a radius in Analyze and this is then basically Degrain3, 4, 24 etc. It works now with any radius.

https://forum.doom9.org/showthread.php?p=1911978#post1911978

MysteryX
25th September 2021, 02:46
With r10-prerelease, I cannot specify the 4 vector clips to Degrain, the syntax is completely different. For now I'll stay with r9.

8-bit runs at 1.4fps, 16-bit at 0.8fps, and 32-bit at 0.20fps. 32-bit output has a very slight colour shift.

MysteryX
25th September 2021, 02:53
Fixed the sharpener. It works in 32-bit with mvsf r9.

I disabled boost by default. There's no good default value for it, as it depends on each video source. Many videos don't need it at all.

If you have a troublesome video with dark scenes, best thing to do is to scan the video using this to find the right average luma threshold.


clip = clip.std.PlaneStats().text.FrameProps(scale=4)


For example, I have a video in a boat under the sun, then we enter the "pirate's cave" and it gets dark yet we partially see the boat or the cave entrance. 28% seems to be the right threshold for that case. However, in another part of the video, we're looking at the big cliff and it also goes down to 18-22% although that scene isn't noisy.

Note that if I set boost=28 and a scene has 26% luma, it doesn't fully take the blurrier KNLMeansCL, but it's a partial merge based on how dark it goes below the threshold.

I could do a fancier dark scene detection by discarding the brightest pixels if they cover less than 15%, to average everything else, but I'm not sure it's worth it.

MysteryX
25th September 2021, 22:35
Added proper outbits support with dither at the end if necessary. Post-processing in 16-bit is worth it with minimal performance penalty, with 10-bit dithering if encoding in 10-bit.

Question: which dmode dithering method (https://akarinvs.github.io/fmtconv/doc/fmtconv.html#bitdepth) is recommended to use here?

Curiously, bit depth and dithering impact the file size more than I thought:
- 8-bit processing: 7359kb
- 8-bit processing with 16-bit post-processing with 10-bit dithering: 6992kb
- 16-bit processing with 10-bit dithering: 6742kb
- 32-bit processing with 10-bit dithering: 6686kb
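
As a quick check on how much those sizes differ, the relative savings against the plain 8-bit encode can be computed from the four figures above. The dictionary keys are shortened labels chosen for this sketch.

```python
# File sizes (kb) as reported above; the labels are shortened for this sketch.
sizes_kb = {
    "8-bit": 7359,
    "8-bit + 16-bit post, 10-bit dither": 6992,
    "16-bit, 10-bit dither": 6742,
    "32-bit, 10-bit dither": 6686,
}

baseline = sizes_kb["8-bit"]
# Percentage saved relative to the plain 8-bit encode, rounded to one decimal.
savings = {
    name: round(100 * (baseline - kb) / baseline, 1)
    for name, kb in sizes_kb.items()
}

for name, pct in savings.items():
    print(f"{name}: {pct}% smaller than 8-bit")
```

That works out to roughly 5%, 8.4% and 9.1% smaller than the 8-bit encode.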

MysteryX
25th September 2021, 23:51
Fixed full-range detection that was reversed (full-range is 1 not 0)

Added method=2: BM3D (CUDA). Its effect seems to be very subtle and need to set sigma (p1) very high to see some difference. Since it's a conservative denoiser, not sure renoise/sharp will be so useful here but you can test it.

For KNLMeansCL, what's the best way to process chroma? Right now it's processing luma first, then chroma separately. Should instead convert to YUV444 and process all planes at once? Or process chroma differently?

Quadratic
26th September 2021, 14:56
Added proper outbits support with dither at the end if necessary. Post-processing in 16-bit is worth it with minimal performance penalty, with 10-bit dithering if encoding in 10-bit.

Question: which dmode dithering method (https://akarinvs.github.io/fmtconv/doc/fmtconv.html#bitdepth) is recommended to use here?

I think the best dithering technique is going to depend on the type of content you're processing, but Sierra-2-4A is probably good for general use.

That aside I am going to argue against this kind of option, controlling the depth of your clips is something very fundamental and something all users should know how to do without relying on the function to provide the option. It also adds needless complexity to the function.

For KNLMeansCL, what's the best way to process chroma? Right now it's processing luma first, then chroma separately. Should instead convert to YUV444 and process all planes at once? Or process chroma differently?
What benefits can KNLMeansCL provide when processing at 4:4:4? (BM3D has CBM3D). One potential downside I can imagine with having two KNLMeansCL calls (Y/CbCr) is perhaps higher overhead.

BTW if possible, I think vcm is windows only and managing that on Linux is kind of annoying. Any chance a replacement could be found? I don't know of any other windows exclusive vapoursynth filters.

MysteryX
26th September 2021, 15:16
That aside I am going to argue against this kind of option, controlling the depth of your clips is something very fundamental and something all users should know how to do without relying on the function to provide the option. It also adds needless complexity to the function.

If I want to process MVTools (and/or KNLMeansCL) in 8-bit, and do all the post-processing in 16-bit, then I need that.

Only thing that would be optional is auto-dithering for 10bit, 12bit and 14bit outbits; but then supporting 10-bit outbits instead of 16-bit outbits is also optional.


What benefits can KNLMeansCL provide when processing at 4:4:4? (BM3D has CBM3D). One potential downside I can imagine with having two KNLMeansCL calls (Y/CbCr) is perhaps higher overhead.

Output isn't the same. I believe I read that it's processing chroma based on Luma analysis or something, but can't find where I read that (or I imagined it)

MysteryX
27th September 2021, 06:22
Added finalm parameter to run KNLMeansCL (finalm=1) or BM3D (finalm=2) with xClean as ref. Results are impressive. (https://forum.doom9.org/showthread.php?p=1953208#post1953208)

This is the syntax to get the best results according to my tests.


clean = xclean.xClean(clip, sharp=21, strength=-50, outbits=16, finalm=1, f1=1.4)

lansing
28th September 2021, 00:46
I see no difference in your comparison

MysteryX
28th September 2021, 01:18
You need to zoom into the 5K image to see it at full resolution to see the details. There's noise at the top, difficult details in the beard, moving trees in the background, and square textures on the seat. Small differences if you look closely.

If you just open the web page it doesn't allow zooming to 100%, need to drag that image into a new tab.

Edit: the xClean image seemed to be wrong, I updated it. Haven't double-checked if there's any other mixup.

MysteryX
29th September 2021, 06:16
xClean now fully supports MVTools (0), KNLMeansCL (1) and BM3D (2) methods with renoising and sharpening, and can then apply KNLMeans (1) or BM3D (2) with the first method as ref using finalm.

I did extensive testing with screenshots here. (https://forum.doom9.org/showthread.php?p=1953348#post1953348) Very interesting results!

Remains a question. Renoise and sharpen run on the first denoise method, which then gets fed to the 2nd denoiser. Is it useful to re-apply renoise and/or sharpen again? The images already look pretty damn good. Re-applying them would require running RGTools to then calculate the sharpen and renoise masks. I do re-apply dynamic denoising strength; that's the only post-processing I'm re-applying the 2nd time.

MysteryX
2nd October 2021, 01:57
Now all 3 methods can be used as final methods; but only KNLMeansCL(1) is useful. MVTools doesn't keep much from prefilter so it's a waste to feed complex denoising into it. BM3D doesn't take much from prefilter either and has poor temporal stability. As for feeding a prefilter into KNLMeans, I have yet to see cases where it produces worse results than prefilter itself.

Also adjusted sharpening to be consistent across methods.

Selur
2nd October 2021, 09:39
Now all 3 methods can be used as final methods; but only KNLMeansCL(1) is useful.
sounds like the other methods should be removed then ;)

MysteryX
2nd October 2021, 16:20
sounds like the other methods should be removed then ;)

As final method. I'm refactoring it so that you can enable/disable each component but they stay in the same order: MVTools -> BM3D -> KNLMeans. Plus configure in what bit-depth each will run (post-processing will be in 16-bit)

MysteryX
3rd October 2021, 01:46
Refactored to run in this order: MVTools -> BM3D -> KNLMeans, each with configurable bitdepth with m1 (MVTools), m2 (BM3D) and m3 (KNLMeans). 0=disabled, 1=8bit, 2=16bit, 3=32bit

Default is m1=1, m2=0, m3=2. All content looks better with m1=1, m2=1, m3=2 (m2=1 or m2=2 is the same as it runs in 32-bit)

I also removed vcm.Median as part of the output, now it was only pre-processing. Saw it was a HUGE performance hog so replaced it as MVTools prefilter with Convolution matrix.

Here are some benchmarks on 5K footage, running on Intel i7-10750H with GeForce GTX 2060

"1 0 1" means m1=1, m2=0, m3=1. In order from lower quality to best quality.

0 0 1: 2.26fps (KNLMeans d=2, a=2)
1 0 1: 1.69fps
1 0 2: 1.77fps (faster??)
1 0 2 d=3: 0.95fps
0 1 2: 1.42fps
0 1 2 d=3 radius=1: 0.32fps
1 1 2: 0.82fps
1 1 2 d=3: 0.28fps
1 1 2 radius=1: 0.11fps -- 16GB RAM isn't enough

Not sure why d=3 is absolutely killing performance when paired with BM3D.

SMDegrain could probably be optimized for performance; if someone knows how it works and has ideas.

Now the quality test is interesting. On a general clip, 1 0 2 d=3 (0.95fps) and 1 1 2 (0.82fps) have almost the exact same quality improvement over 1 0 2. They look nearly identical. The difference is that 1 1 2 will be more consistent across a variety of scenes. As for adding d=3 and radius=1 over 1 1 2... I looked very very closely and can't tell the difference. Not worth investing $4000 into a new Threadripper with 64GB RAM.

Mode 1 1 2 thus seems to be the best option overall, and 1 0 2 for performance.

Image comparison: Driving (https://slow.pics/c/atcHSsRb)

I challenge you to find a difference between 1 1 2 and 1 1 2 d3r1. It's also comparing with simple KNLMeans. When you add renoise & sharpen, file size goes up from 13.4MB to 15.5MB and it looks more natural. Since I removed Median at the start, it makes the image sharper... sharp=21 may be too high now. (screenshots are with sharp=21)

Performance test on 720p WebCam:
KNLMeansCL(d=2, a=2): 33.3fps
0 0 1: 33.1fps
1 0 1: 32.2fps
1 0 2: 32.25fps
1 0 2 d=3: 23.0fps
0 1 2: 27.65fps
0 1 2 d=3 radius=1: 20.6fps
1 1 2: 26.84fps
1 1 2 d=3: 20.43fps
1 1 2 d=3 radius=1: 17.46fps

Wow, the highest setting runs only at half the performance of plain KNLMeans!? RAM really was the issue above.
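
The "half the performance" remark can be checked from the 720p figures above by expressing each configuration as a fraction of plain KNLMeans throughput. A small sketch using only the numbers already listed:

```python
# 720p WebCam throughput (fps) as listed above; keys are "m1 m2 m3" plus extras.
fps_720p = {
    "KNLMeansCL(d=2, a=2)": 33.3,
    "0 0 1": 33.1,
    "1 0 1": 32.2,
    "1 0 2": 32.25,
    "1 0 2 d=3": 23.0,
    "0 1 2": 27.65,
    "0 1 2 d=3 radius=1": 20.6,
    "1 1 2": 26.84,
    "1 1 2 d=3": 20.43,
    "1 1 2 d=3 radius=1": 17.46,
}

baseline = fps_720p["KNLMeansCL(d=2, a=2)"]
# Throughput of each configuration relative to plain KNLMeans (1.0 = same speed).
relative = {name: fps / baseline for name, fps in fps_720p.items()}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.0%} of plain KNLMeans")
```

The heaviest setting comes out at about 52% of plain KNLMeans, and 1 1 2 at about 81%.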

Image comparison: WebCam (https://slow.pics/c/8ejtPfD6)

It's surprising the amount of noise that it's taking out and the amount of details it preserves with a natural feel. I've included an image with radius=1 and radius=2. Radius=1 helps very slightly but radius=2 seems counter-productive. BTW, 26.84fps means real-time camera denoising.

Because it's running 3 different denoisers, it is very forgiving and works for any kind of content (that I tested so far).

Finally... because of 16GB RAM limitations, my real options are
A) 1 1 2 encoding with NVEnc (0.80fps)
B) 1 0 2 or 0 1 2 depending on content, encoding with x265 (... fps) never mind 0.1fps even that isn't an option. x265 alone encodes at 1.5fps though, but using 9GB ram on its own.

MysteryX
3rd October 2021, 17:06
Wow found a weird quality/performance setting. My 2 main concerns were
1) Performance
2) Inability to increase denoising strength

For the first pass with MVTools, if you resize it down, it barely impacts the output and may even increase quality by increasing the denoising strength.

So now if m1 is between 0 and 1, I resize it by that factor.

Performance on 5K:
1 1 2: 0.82fps
.5 1 2: 1.23fps
.6 1 2: 1.14fps
.6 1 2 d=3: 0.74fps (best output)

To compensate for very slight blur, increase sharp=10 to sharp=11

Now I can re-encode my raw GoPro 5K footage. With x265 preset=fast and .5 1 2, it goes at 0.38fps with good RAM margin left. Or, with NVEnc at 1fps.

lansing
4th October 2021, 00:29
I am completely lost at what you're trying to do. Those "0 1 2", "m1" "m2" renaming are just unreadable. And there's like no differences between all your comparisons on the 5K image. How do you even come up with conclusion like who's better? One thing I've seen for sure is the color noise across all your variation settings, meaning that your filter is not getting the job done on chroma denoise.

MysteryX
4th October 2021, 03:40
I am completely lost at what you're trying to do. Those "0 1 2", "m1" "m2" renaming are just unreadable. And there's like no differences between all your comparisons on the 5K image. How do you even come up with conclusion like who's better? One thing I've seen for sure is the color noise across all your variation settings, meaning that your filter is not getting the job done on chroma denoise.

I haven't yet rewritten the documentation but it's simple. It goes in 3 passes that can be switched on or off. m1 is first pass, m2 is second pass, m3 is 3rd pass. 0=disabled, 1=8bit, 2=16bit. Will add 3=YUV444P16 and 4=YUV444PS.

I'm not trying to remove "all" of the noise. I'm trying to remove as much as possible while preserving as much details and natural feel as possible. KNLMeans tends to give a "plastic" feeling to images, especially on things like water. xClean removes more noise and doesn't have that artificial feel to it. Definite difference. Especially in video, the noise that remains has more temporal stability and is less distracting.

Removing noise is easy. Just blur out everything. The hard part is to preserve the details.

WebCam frame -- you're not seeing the difference there??

One challenge with HD footage (4K, 5K) is that denoising works in a local way based on surrounding pixels. If looking at it on a 1080p display, the noise is a lot more stable, but the visible spatial noise is very difficult to remove without removing a lot of fine details.

MysteryX
4th October 2021, 08:29
I am completely lost at what you're trying to do. Those "0 1 2", "m1" "m2" renaming are just unreadable. And there's like no differences between all your comparisons on the 5K image. How do you even come up with conclusion like who's better? One thing I've seen for sure is the color noise across all your variation settings, meaning that your filter is not getting the job done on chroma denoise.
When comparing images, don't look at the noise. Look at what's left. The noise varies very little based on the settings. What the settings alter is the precision of the image that remains.

m1|m2|m3 now support 3 (YUV444P16) and 4 (YUV444PS) processing. m1 takes a float where the fraction after the decimal is the resize factor and before the dot is the format.

ex: .6 or 1.6 processes in YUV420P8 at 60% size
3.6 processes in YUV444P16 at 60% size
I don't know whether that has benefits but it's supported

Default is now .6 3 3 for best quality. 2nd pass (BM3D) is already in YUV444PS anyway, and for 3rd pass, it avoids running KNL twice on Luma and Chroma. Performance hit is very minimal with a slight gain of precision on the chroma.

MysteryX
4th October 2021, 19:18
Updated documentation and released beta 2 (https://github.com/mysteryx93/xClean/blob/main/xClean.py)

ChaosKing
5th October 2021, 10:42
You can now vsrepo install xclean

MysteryX
5th October 2021, 16:28
You can now vsrepo install xclean
Nice.

Note that there's still an important detail missing. BM3D conversion back from RGBS is currently converting to full-range and not respecting matrix, transfer, primaries and chromalocation. Just be aware of that while testing.

MysteryX
5th October 2021, 17:45
To answer my earlier question: encoding with NVEnc vs x265. With NVEnc, water splashes suffer. Definite difference -- even with just preset=Fast

Thus downscaling MVTools to allow running x265 on top of it is important.

Looking at the clips, xClean does a GREAT job at:
- Temporal denoising (stability)
- Small video denoising like webcam
- Preserving details for any type of content

Where it struggles:
- Large radius noise. Like for 5K content on a 1080p display, the local spatial noise removed is within a single visible pixel, and thus it's very hard to remove large radius grain.

MVTools is the only one that addresses large-radius grain to some degree. vcm.Median also aims to help with that, but performance is very bad, 4fps with maxgrid=5 and 2.9fps with maxgrid=9. It needs maxgrid=9 to make a difference and it does soften large radius grain without any noticeable image deterioration.

For small quality improvement, you can run vcm.Median(maxgrid=9) before xClean. Whether it's worth the performance hit is debatable.

sl1pkn07
10th October 2021, 18:37
BTW if possible, I think vcm is windows only and managing that on Linux is kind of annoying. Any chance a replacement could be found? I don't know of any other windows exclusive vapoursynth filters.

https://github.com/AmusementClub/vcm

MysteryX
11th October 2021, 05:45
I went through the to-do list. Only item remaining is to convert back with the right range and matrix after BM3D.

It now has proper support for high-bit-depth input, YUV422, YUV444 or GRAY input. Can also now disable GPU...

OK got a problem with disabling GPU. With KNLMeans, if I set device_type="cpu", it throws: "no compatible opencl platforms available!" If I use mvf.BM3D instead of bm3dcuda, I get "matrix: unsupported color family for output." How to fix those?

Also I added experimental SpotLess to replace pass 1. Due to a bug in dubhater version, it's only working in the sf version. It will use SpotLess if you set m1 to 4 (or 4.5 with 50% downscaling), calculating in 32-bit. With preliminary tests, it seems that with SpotLess, the image has more depth, appears more realistic and less flat. It makes moving objects disappear but we restore them in pass 2 and 3 so that's not much of an issue.

Edit: I reported the bug to KNLMeans author, and this works for BM3D instead of mvsfunc

ref = core.bm3d.Basic(src, sigma=9)
flt = core.bm3d.Final(src, ref, sigma=9)

lansing
11th October 2021, 19:37
I tested the plugin on both my 4K outdoor basketball court video and a grainy 1080p anime. On the 4K video it wiped out all the little rocks on a floor. And on the anime, it couldn't take out bigger grain. And the renoise feature doesn't really look good, the suppressed noise that's added back from the plugin does not move with the objects. So on anime where there're big patches of color, it would look as if the noise was sticking to my screen.

MysteryX
11th October 2021, 19:54
I tested the plugin on both my 4K outdoor basketball court video and a grainy 1080p anime. On the 4K video it wiped out all the little rocks on a floor. And on the anime, it couldn't take out bigger grain. And the renoise feature doesn't really look good, the suppressed noise that's added back from the plugin does not move with the objects. So on anime where there're big patches of color, it would look as if the noise was sticking to my screen.
I'm mostly focusing my tests on raw camera footage for now, with the focus on preserving details.

Can you send me your 4K clip to test on if it's not too big? (or trim 10 seconds of it using StaxRip)

As for anime, it often requires different settings. Got tweaking ideas? I'm struggling with removing larger grain that neither KNLMeans nor BM3D take out. What do you find effective with anime grain?

Renoise is to combat the "flat" effect of denoising. For anime, is flat good? Is a little bit of renoise beneficial or not at all? It can be disabled with rn=0

There's also the option of replacing one of the passes with a denoiser that fundamentally works well with grainy anime.

MysteryX
11th October 2021, 20:14
Conversion back to YUV after BM3D now respects source attributes. Tweaked settings of BM3D for better quality. BM3D now can work in CPU mode.

There might have been a bug where even with settings to process in 16-bit, it would use the 8-bit clip as source.

I also changed dmode default to 0; set dmode=3 if you won't do any further processing.

MysteryX
12th October 2021, 04:59
About Renoise... it was designed to combat the flatness of the original mClean that was a lot more aggressive in blurring things out. Is it still needed with this version that is a lot more detailed? It seems to still benefit, but probably doesn't need to be set as high.

BUT..... higher renoise somehow attenuates some of the larger grain that is hard to remove??? How do you explain that? I'm not sure what it's doing anymore. This will require more testing to find the optimal default value.

Yep. I confirm. Renoise in fact removes more of the bigger grain... default value of 14 is probably fine.

MysteryX
12th October 2021, 17:57
I tested the plugin on both my 4K outdoor basketball court video and a grainy 1080p anime. On the 4K video it wiped out all the little rocks on a floor. And on the anime, it couldn't take out bigger grain. And the renoise feature doesn't really look good, the suppressed noise that's added back from the plugin does not move with the objects. So on anime where there're big patches of color, it would look as if the noise was sticking to my screen.

I tested on a random music video with light noise and it worked fine. MVTools tends to make small rocks disappear, but BM3D generally fixes that problem. I'm surprised you're still seeing missing rocks. I'm not seeing that. What settings are you using?

Also fixed a bug where renoise wasn't accounting for TV vs Full range when applying contrast.

I tested on an anime video. Indeed, renoise is not good on anime. Set rn=0, and optionally can play with depth parameter with anime to make the lines thicker. With this, it worked pretty good. Since the anime was only 288p, it worked best with m1=.8 instead of .6

So for anime, I used: rn=0, m1=.8, depth=1

MysteryX
13th October 2021, 20:20
Beta 3 is ready. (https://github.com/mysteryx93/xClean/blob/main/xClean.py)

ChaosKing, can you update it on vsrepo. It's pretty much feature-complete.

Evaluating noise contrast from the per-frame ColorRange property with FrameEval, which built a LUT on every frame, killed performance. I took out all FrameEval calls and now assume frame properties based on the first frame. It gives a performance boost.
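
A rough sketch of reading the range once from the first frame instead of on every frame. This is only an illustration under assumptions: the helper names are made up, the fallback to limited range when the property is missing is my choice, and fake_clip is a stand-in so the snippet runs without VapourSynth. A real clip would be read the same way through get_frame(0).props.

```python
from types import SimpleNamespace


def is_limited_range(clip, default=True):
    """Read _ColorRange from frame 0 only (1 = limited/TV, 0 = full/PC)."""
    props = clip.get_frame(0).props
    value = props.get("_ColorRange")
    if value is None:
        # Assumption: treat untagged clips as limited range.
        return default
    return int(value) == 1


def fake_clip(props):
    """Stand-in for a video clip, for demonstration only."""
    frame = SimpleNamespace(props=props)
    return SimpleNamespace(get_frame=lambda n: frame)
```

The trade-off is the one described above: the value is fetched once, so a clip whose properties change mid-stream would be handled according to its first frame.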

There seems to be an issue of a very slight luma shift. Can you test this out and see if you can pinpoint the problem? Could renoise be responsible? (https://github.com/mysteryx93/xClean/blob/main/xClean.py#L316)

ChaosKing
13th October 2021, 20:55
Beta 3 is ready. (https://github.com/mysteryx93/xClean/blob/main/xClean.py)

ChaosKing, can you update it on vsrepo. It's pretty much feature-complete.


done: vsrepo install xclean

MysteryX
14th October 2021, 01:08
- Dynamically adapt sharpness to not be affected by m1 downscaling and by KNLMeans h parameter.
- Changed sharp default value from 11 to 9.5.
- Sharp and renoise can now be float.
- Moved Depth to be run once after all passes are done.

These are significant QOL improvements. You can update vsrepo again.

I tested whether some videos benefit from altering KNL's h. The webcam video remains too noisy. Doubling it at h=2.8 is beneficial. Sharpness is now slightly increased to make up for it (without over-sharpening the other passes).

For 720p WebCam, optimal settings are: sharp=9.5, m1=.65, h=2.8
For 288p anime, optimal settings are: sharp=9.5, m1=.7, rn=0, optional depth=1
For 5K GoPro (with in-camera sharpening at Low), optimal settings are: sharp=7.7, strength=-50, m1=.6 (or .5 for performance)
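
The settings above could be kept as presets, for example like this. The preset names and the preset_args helper are invented for the sketch, and the commented call assumes xClean.py is importable and takes the clip as first argument.

```python
# Values copied from the list above; preset names are hypothetical.
PRESETS = {
    "webcam_720p": dict(sharp=9.5, m1=0.65, h=2.8),
    "anime_288p": dict(sharp=9.5, m1=0.7, rn=0),
    "gopro_5k": dict(sharp=7.7, strength=-50, m1=0.6),
}


def preset_args(name, **overrides):
    """Return a copy of a preset with optional per-clip overrides."""
    args = dict(PRESETS[name])
    args.update(overrides)
    return args


# Inside a VapourSynth script (assumed usage, not verified here):
# from xClean import xClean
# clip = xClean(clip, **preset_args("anime_288p", depth=1))
```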

Adjusting sharpening of pass 1's downscaling considerably reduces the need for sharpening of further passes. I also removed overboost sharpening (sharp=21-25); we don't need nearly that much sharpening.

MysteryX
17th October 2021, 02:24
After more testing... BM3D with radius=1 is pretty much required. It makes little difference on a frame-by-frame comparison, but in motion, it's required to stabilize the noise. Performance is a problem though. I cannot run radius=1 on my 5K videos with 16GB RAM. Radius is super heavy on RAM!

I added the option to downscale the BM3D pass. There is very little quality loss for running it at 70% size. For my 5K clips, 100% size with radius=0 runs about the same speed as 65% size with radius=1. That's acceptable. It also helps to deal with large-radius noise! MVTools runs at 50% size, then BM3D runs at 65% size, then KNLMeans runs at full size.

I also added various parameters to configure BM3D: block_step = 5, bm_range = 15, ps_range = 7, radius = 1

Question: for downscaling and upscaling, is default Bicubic enough, or is there some tweak that might give better results? It downscales, denoises and sharpens to make up for the downscaling blur, and upscales back. Perhaps I could try sharper downscale/upscale methods, requiring less additional sharpening.

EDIT: Now use Bicubic b=0 c=.75 for first downscale and Spline16 for other resizes. No longer need to sharpen to compensate for downscaling.

There is a weird bug: with some random downscale factors, it just freezes. I guess some plugin (BM3D perhaps?) doesn't like some resolutions. I guess I just need to resize to multiples of 4 instead of multiples of 2.
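
One way to keep the downscaled sizes on a safe grid is to round each dimension to the nearest multiple of 4. This is a sketch of that idea only; scaled_size is a made-up helper, and whether mod 4 is actually what the plugins need is the guess from the post above, not something confirmed.

```python
def scaled_size(width, height, factor, mod=4):
    """Scale a frame size by `factor`, rounding each side to a multiple of `mod`."""

    def snap(value):
        # Round to the nearest multiple of mod, never below mod itself.
        return max(mod, round(value * factor / mod) * mod)

    return snap(width), snap(height)
```

For the pass sizes mentioned earlier, scaled_size(w, h, 0.5) would give the MVTools size and scaled_size(w, h, 0.65) the BM3D size, both divisible by 4.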

MysteryX
18th October 2021, 05:25
Damn I never finish tweaking this thing. Each tweaking detail adds up, and it's looking pretty damn good. I could produce a video showing denoiser comparison. Only problem is that it runs slow. 5K clip, first pass at 50% size, 2nd pass at 65% size radius=1, 3rd pass normal, encodes at ~.3fps. Made some improvement on memory usage, it's now leaving a good 1-3GB free of my 16GB.

BM3D now converts to OPP format and uses radius=1 by default. Tweaked settings for max quality.
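
For readers unfamiliar with OPP: it is an opponent color transform of RGB. The sketch below uses the matrix I understand the VapourSynth BM3D plugin to use (average, R-B difference, and a green opponent term). Treat the exact coefficients as an assumption, and note that this works on single float pixels, not on clips.

```python
def rgb_to_opp(r, g, b):
    """RGB -> opponent color space (assumed BM3D-style matrix)."""
    o1 = (r + g + b) / 3          # achromatic channel
    o2 = (r - b) / 2              # red vs blue
    o3 = (r - 2 * g + b) / 4      # green vs red+blue
    return o1, o2, o3


def opp_to_rgb(o1, o2, o3):
    """Exact algebraic inverse of rgb_to_opp."""
    r = o1 + o2 + 2 * o3 / 3
    g = o1 - 4 * o3 / 3
    b = o1 - o2 + 2 * o3 / 3
    return r, g, b
```

A neutral gray pixel ends up with zero in the two opponent channels, which is the point of denoising in this space: most of the signal sits in the first channel.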

MVTools downscales with Bicubic(0, .75) and post-processing is downscaled.
It then resizes for BM3D using Spline16, and post-processing is done at full-resolution using Spline16 again.

KNL d=3 gives a slight plastic effect without helping much with the noise, so d=2 a=2 should be left like that. BM3D radius=1 helps stabilize the noise and extract textures, and OPP conversion helps get clearer objects. Increasing BM3D quality with block_step=4, bm_range=16, ps_range=8 is optional but it helps.

I also did a comparison with running MVTools+BM3D at full res without KNL. It doesn't extract the object details nearly as much as when running KNL afterwards.

MysteryX
18th October 2021, 15:42
Tweaked resize methods again:
Downscale for MVTools using Bicubic(0, .75) # Sharpest
Downscale source clip for BM3D using Bicubic(0, .5) # Most numerically accurate
All other resizes use Spline36 # Balanced
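
The resize choices above could be expressed as a small lookup. This is a sketch, not the script's code: the stage names are invented, and apply_resize assumes a VapourSynth core object whose resize functions accept width, height, filter_param_a and filter_param_b.

```python
# Stage names are hypothetical labels for this sketch.
RESIZE_SPECS = {
    "mvtools_down": ("Bicubic", {"filter_param_a": 0, "filter_param_b": 0.75}),  # sharpest
    "bm3d_down": ("Bicubic", {"filter_param_a": 0, "filter_param_b": 0.5}),      # most accurate
}
DEFAULT_SPEC = ("Spline36", {})  # balanced, used everywhere else


def resize_spec(stage):
    """Return (resizer name, keyword arguments) for a processing stage."""
    name, kwargs = RESIZE_SPECS.get(stage, DEFAULT_SPEC)
    return name, dict(kwargs)


def apply_resize(core, clip, width, height, stage):
    """Call the chosen resizer on the clip (needs a real VapourSynth core)."""
    name, kwargs = resize_spec(stage)
    resizer = getattr(core.resize, name)
    return resizer(clip, width=width, height=height, **kwargs)
```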

MysteryX
19th October 2021, 04:59
Released beta 4 (https://github.com/mysteryx93/xClean/blob/main/xClean.py)

TODO:
- re-test GRAY source and chroma=False
- for CPU mode, use BM3DCPU 32-bit or BM3D 16-bit?
- once FMTC releases a bug fix, use it for OPP-RGB conversion for better performance

Let me know if you see missing dependencies in the doc.

MysteryX
20th October 2021, 17:28
Beta 5 (https://github.com/mysteryx93/xClean/blob/main/xClean.py)

It's feature-complete. The only thing I would do at this point is
- look for a more optimized YUV-OPP conversion
- fix bugs
- convert to Avisynth

One possible performance improvement would be to process Luma and Chroma with different settings; but I run it at max quality anyway.

MysteryX
22nd October 2021, 18:10
Optimized RGB-OPP conversion for BM3D (10x faster)

Convert to YCgCoR for KNLMeans, which removes a slight "bleach" effect.
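
Assuming YCgCoR here means the reversible lifting form of YCgCo (YCgCo-R), the transform looks like this on integer pixels. It is a per-pixel illustration only; how the script performs the conversion on clips is not shown here.

```python
def rgb_to_ycgco_r(r, g, b):
    """Integer RGB -> YCgCo-R using lifting steps (lossless)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_r_to_rgb(y, cg, co):
    """Undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because every step is an integer add or shift that is undone exactly, the round trip returns the original RGB values; the cost is that Cg and Co need one extra bit of range.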

MysteryX
23rd October 2021, 04:14
Could someone help me out with optimization?

I wrote a 2nd version of the script that manages formats and colorspaces differently. The numbers are the output of vspipe -p showing plugins performance (showing the top of the list)

Script #1 (https://github.com/mysteryx93/xClean/blob/main/xClean.py) runs at 0.66fps

BM3D parallel 99.94 46.98
KNLMeansCL parreq 67.36 31.66
Bicubic parallel 47.61 22.38
Degrain3 parallel 39.56 18.59
Analyse parallel 33.87 15.92
Analyse parallel 33.84 15.91
Analyse parallel 33.68 15.83
Analyse parallel 33.66 15.82
Analyse parallel 33.44 15.72
Analyse parallel 33.19 15.60
Bicubic parallel 30.92 14.53
Super parallel 25.06 11.78
Bicubic parallel 24.06 11.31


Script #2 (https://github.com/mysteryx93/xClean/blob/main/xClean2.py) is crawling at 0.45fps

Bicubic parallel 91.75 61.10
BM3D parallel 67.49 44.94
bitdepth parallel 55.19 36.75
KNLMeansCL parreq 54.42 36.24
Degrain3 parallel 43.89 29.22
Analyse parallel 40.95 27.27
Analyse parallel 40.59 27.03
Analyse parallel 39.86 26.55
Analyse parallel 39.49 26.30
Analyse parallel 38.80 25.83
Analyse parallel 38.39 25.56
Bicubic parallel 36.30 24.17
Super parallel 23.90 15.92


Bicubic at the top of the list!?? It's the Bicubic at the top that converts YUV420P8 to RGB48. Why does it crawl like that?

Strange thing is that Script #2 has only a single resize, whereas Script #1 has 2 such resizes at the top.
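
Since the same filter shows up several times in these listings, it can help to total the time per filter name. A small sketch, assuming the four columns are filter name, mode, time in percent and time in seconds (my reading of the vspipe -p output), using the Script #1 rows as sample input:

```python
from collections import defaultdict

SCRIPT1 = """\
BM3D parallel 99.94 46.98
KNLMeansCL parreq 67.36 31.66
Bicubic parallel 47.61 22.38
Degrain3 parallel 39.56 18.59
Analyse parallel 33.87 15.92
Analyse parallel 33.84 15.91
Analyse parallel 33.68 15.83
Analyse parallel 33.66 15.82
Analyse parallel 33.44 15.72
Analyse parallel 33.19 15.60
Bicubic parallel 30.92 14.53
Super parallel 25.06 11.78
Bicubic parallel 24.06 11.31
"""


def total_seconds_by_filter(text):
    """Sum the last column per filter name, highest total first."""
    totals = defaultdict(float)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or unrelated lines
        name, _mode, _percent, seconds = parts
        try:
            totals[name] += float(seconds)
        except ValueError:
            continue  # not a data row
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Running the same function on the Script #2 listing would show how much of the gap comes from the single Bicubic conversion versus the other filters.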

Does anyone know what's wrong?

In terms of quality, I'm unable to detect a difference. Both process BM3D in OPP colorspace and KNLMeans in YCgCoR. MVTools running in YUV isn't having much impact on the output.

EDIT: I tested with a 720p video. Script 1 goes at 14.6fps with 3.1GB memory usage, Script 2 goes at 14.0fps with 2.8GB memory usage. That's more what I would expect.

Thus, the problem is a VapourSynth performance problem with high-res videos. I reported it here. (https://github.com/vapoursynth/vapoursynth/issues/824)

MysteryX
24th October 2021, 06:06
Released beta 6 (https://github.com/mysteryx93/xClean/blob/main/xClean.py)

Added feisty2's ChromaReconstructor_faster and nnedi3 chroma upscaling options to script 2. Set chroma = none|bicubic|nnedi3|reconstructor. nnedi3 has very little impact on performance, but reconstructor is very heavy, and that is already a faster variation of the original script.

Unfortunately that 2nd script isn't very usable on 4K+ content until the performance bug is fixed in VapourSynth, but works perfectly fine for SD.

MysteryX
27th October 2021, 22:42
Released beta 7. (https://github.com/mysteryx93/xClean/blob/main/xClean.py)

A few tweaks, and it now supports RGB source format. For 4K+ sources, it seems to work better in VapourSynth r55 than r57 in certain cases.

It looks stable now. Let me know if you encounter any issue.

PatchWorKs
6th November 2021, 08:42
Hi there, very nice work !

About optimizations: I fear that a port into a more performing language (such as C / C ++ or, even better, ASM) is mandatory, but a colab version like this (https://colab.research.google.com/github/rlaPHOENiX/VSGAN/blob/master/VSGAN.ipynb) would be also nice.

Quadratic
6th November 2021, 14:17
About optimizations: I fear that a port into a more performing language (such as C / C ++ or, even better, ASM) is mandatory, but a colab version like this (https://colab.research.google.com/github/rlaPHOENiX/VSGAN/blob/master/VSGAN.ipynb) would be also nice.

I'm not sure that would bring much benefit at all, since all the heavy processing is being done by plugins which are already written in the usual suspects (C / C++ / Rust).

MysteryX
7th November 2021, 04:28
Yes all the hard work is done by plugins; BM3D, MVTools and KNLMeans being most of the processing. They are already optimized; MVTools is probably the most poorly optimized old code.

Btw did you know it's faster to write using Expr than to write it in C++? Because Expr uses SIMD automatically whereas C++ code requires complicated inline assembly to not be slow.

MysteryX
10th November 2021, 18:22
Beta 8 (https://github.com/mysteryx93/xClean/blob/main/xClean.py), use FMTC for resize and resampling for performance.