xClean 3-Pass Denoiser - Page 3

MysteryX · 19th October 2021, 04:59

Released beta 4

TODO:
- re-test GRAY source and chroma=False
- for CPU mode, use BM3DCPU 32-bit or BM3D 16-bit?
- once FMTC releases a bug fix, use it for OPP-RGB conversion for better performance

Let me know if you see missing dependencies in the doc.

MysteryX · 20th October 2021, 17:28

Beta 5

It's feature-complete. The only things left to do at this point are:
- look for a more optimized YUV-OPP conversion
- fix bugs
- convert to Avisynth

One possible performance improvement would be to process Luma and Chroma with different settings; but I run it at max quality anyway.

MysteryX · 22nd October 2021, 18:10

Optimized RGB-OPP conversion for BM3D (10x faster)

Convert to YCgCoR for KNLMeans, which removes a slight "bleach" effect.
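For readers who haven't met YCgCo-R: it is the lifting-based, exactly reversible variant of YCgCo. Below is a minimal sketch of the textbook integer form in plain Python. The function names are made up for illustration, and this is not necessarily how xClean performs the conversion internally.

```python
def rgb_to_ycgco_r(r, g, b):
    """Forward YCgCo-R lifting transform on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_r_to_rgb(y, cg, co):
    """Inverse transform: undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Every integer RGB triple survives the round trip unchanged.
for rgb in [(128, 128, 128), (255, 0, 0), (12, 200, 77)]:
    assert ycgco_r_to_rgb(*rgb_to_ycgco_r(*rgb)) == rgb
```

Because each lifting step is undone exactly, no rounding error accumulates, at the cost of one extra bit of range on the Cg and Co channels.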

MysteryX · 23rd October 2021, 04:14

Could someone help me out with optimization?

I wrote a 2nd version of the script that manages formats and colorspaces differently. The numbers below are the output of vspipe -p, which shows per-plugin performance (only the top of each list is shown).

Script #1 runs at 0.66fps

Code:

BM3D                 parallel        99.94      46.98
KNLMeansCL           parreq          67.36      31.66
Bicubic              parallel        47.61      22.38
Degrain3             parallel        39.56      18.59
Analyse              parallel        33.87      15.92
Analyse              parallel        33.84      15.91
Analyse              parallel        33.68      15.83
Analyse              parallel        33.66      15.82
Analyse              parallel        33.44      15.72
Analyse              parallel        33.19      15.60
Bicubic              parallel        30.92      14.53
Super                parallel        25.06      11.78
Bicubic              parallel        24.06      11.31

Script #2 is crawling at 0.45fps

Code:

Bicubic              parallel        91.75      61.10
BM3D                 parallel        67.49      44.94
bitdepth             parallel        55.19      36.75
KNLMeansCL           parreq          54.42      36.24
Degrain3             parallel        43.89      29.22
Analyse              parallel        40.95      27.27
Analyse              parallel        40.59      27.03
Analyse              parallel        39.86      26.55
Analyse              parallel        39.49      26.30
Analyse              parallel        38.80      25.83
Analyse              parallel        38.39      25.56
Bicubic              parallel        36.30      24.17
Super                parallel        23.90      15.92

Bicubic at the top of the list!?? It's the Bicubic at the top that converts YUV420P8 to RGB48. Why does it crawl like that?

Strange thing is that Script #2 has only a single resize, whereas Script #1 has 2 such resizes at the top.
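To compare listings like these without eyeballing them, the rows can be parsed and summed per filter. This is a small sketch in plain Python. It assumes the third column is a percentage of wall time and the fourth is seconds, which is how the two columns relate in both listings; the helper names are hypothetical.

```python
from collections import defaultdict

# First rows of the Script #2 listing, copied from the post.
PROFILE = """\
Bicubic              parallel        91.75      61.10
BM3D                 parallel        67.49      44.94
bitdepth             parallel        55.19      36.75
KNLMeansCL           parreq          54.42      36.24
Degrain3             parallel        43.89      29.22
"""


def parse_profile(text):
    """Return (name, mode, percent, seconds) tuples, one per well-formed line."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        name, mode, pct, secs = parts
        rows.append((name, mode, float(pct), float(secs)))
    return rows


def seconds_by_filter(rows):
    """Sum the seconds column over all instances of each filter name."""
    totals = defaultdict(float)
    for name, _mode, _pct, secs in rows:
        totals[name] += secs
    return dict(totals)


def implied_wall_time(row):
    """Wall time implied by one row, if percent means share of wall time."""
    _name, _mode, pct, secs = row
    return secs / (pct / 100.0)


rows = parse_profile(PROFILE)
totals = seconds_by_filter(rows)
for name, secs in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12} {secs:7.2f}")
print(f"implied wall time: {implied_wall_time(rows[0]):.1f} s")
```

Under that reading of the columns, the Script #2 run took about 67 seconds of wall time, and the first Bicubic was busy for over 90% of it.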

Does anyone know what's wrong?

In terms of quality, I'm unable to detect a difference. Both process BM3D in OPP colorspace and KNLMeans in YCgCoR. MVTools running in YUV isn't having much impact on the output.

EDIT: I tested with a 720p video. Script 1 goes at 14.6fps with 3.1GB memory usage, Script 2 goes at 14.0fps with 2.8GB memory usage. That's more what I would expect.

Thus, the problem is a VapourSynth performance problem with high-res videos. I reported it here.
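The gap between the two scripts at each resolution can be put as a percentage. This is a minimal sketch using only the fps figures quoted in this post; the function name is made up for illustration.

```python
def relative_slowdown(fps_fast, fps_slow):
    """Fraction by which fps_slow lags behind fps_fast."""
    return 1.0 - fps_slow / fps_fast


high_res = relative_slowdown(0.66, 0.45)   # Script 1 vs Script 2, high-res source
hd_720p = relative_slowdown(14.6, 14.0)    # Script 1 vs Script 2, 720p source
print(f"high-res: {high_res:.0%} slower, 720p: {hd_720p:.0%} slower")
```

Script 2 is roughly a third slower on the high-res source but only a few percent slower at 720p, which is what points to a resolution-dependent bottleneck rather than the script's logic.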

MysteryX · 24th October 2021, 06:06

Released beta 6

Added feisty2's ChromaReconstructor_faster and nnedi3 chroma upscaling options to script 2. Set chroma = none|bicubic|nnedi3|reconstructor. nnedi3 has very little impact on performance, but reconstructor is very heavy, even though it's a faster variation of the original script.

Unfortunately that 2nd script isn't very usable on 4K+ content until the performance bug is fixed in VapourSynth, but works perfectly fine for SD.

MysteryX · 27th October 2021, 22:42

Released beta 7.

A few tweaks, and it now supports RGB source formats. For 4K+ sources, it seems to work better in VapourSynth r55 than r57 in certain cases.

It looks stable now. Let me know if you encounter any issue.

PatchWorKs · 6th November 2021, 08:42

Hi there, very nice work !

About optimizations: I fear that a port to a faster language (such as C/C++ or, even better, ASM) is mandatory, but a Colab version like this would also be nice.

Quadratic · 6th November 2021, 14:17

Quote:

Originally Posted by PatchWorKs

About optimizations: I fear that a port to a faster language (such as C/C++ or, even better, ASM) is mandatory, but a Colab version like this would also be nice.

I'm not sure that would bring much benefit at all, since all the heavy processing is being done by plugins which are already written in the usual suspects (C / C++ / Rust).

MysteryX · 7th November 2021, 04:28

Yes, all the hard work is done by plugins; BM3D, MVTools and KNLMeans account for most of the processing. They are already optimized; MVTools, being old code, is probably the least optimized.

Btw, did you know that a filter written with Expr runs faster than one written in plain C++? That's because Expr uses SIMD automatically, whereas C++ code requires complicated inline assembly to not be slow.
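For anyone who hasn't used Expr: the expression is a reverse Polish notation string applied to every pixel, with x, y, ... standing for the input clips. The toy evaluator below only illustrates what such a string computes for a single pixel. It is a scalar sketch in plain Python with a hypothetical function name, not the plugin's implementation, and it supports only a handful of operators.

```python
def eval_rpn(expr, **pixels):
    """Evaluate an Expr-style RPN string for one pixel (toy, scalar only)."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
        "max": max,
        "min": min,
    }
    stack = []
    for tok in expr.split():
        if tok in ops:
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[tok](a, b))
        elif tok in pixels:
            stack.append(float(pixels[tok]))   # clip variable such as x or y
        else:
            stack.append(float(tok))           # numeric constant
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


print(eval_rpn("x y + 2 /", x=100, y=50))    # average of two clips
print(eval_rpn("x y - 0 max", x=10, y=30))   # difference, clamped at zero
```

The real filter runs the same kind of expression over whole rows of pixels at once, which is where the automatic SIMD mentioned above comes from.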

MysteryX · 10th November 2021, 18:22

Beta 8: now uses FMTC for resizing and resampling, for better performance.

PatchWorKs · 20th November 2021, 09:14

@Selur just tested HINet; how does it perform (in terms of both speed and fidelity) compared to xClean?

The authors claim:

Quote:

With the help of HIN Block, HINet surpasses the state-of-the-art (SOTA) on various image restoration tasks. For image denoising, we exceed it 0.11dB and 0.28 dB in PSNR on SIDD dataset, with only 7.5% and 30% of its multiplier-accumulator operations (MACs), 6.8 times and 2.9 times speedup respectively. For image deblurring, we get comparable performance with 22.5% of its MACs and 3.3 times speedup on REDS and GoPro datasets. For image deraining, we exceed it by 0.3 dB in PSNR on the average result of multiple datasets with 1.4 times speedup.

Quote:

Originally Posted by Selur

Has anyone tested https://github.com/HolyWu/vs-hinet ?

Here are a few screen shots: (not sure what to make of them and for what content this is really useful)

Mode: Deblur GoPro

Mode: Deblur REDS

Mode: denoise

Mode: derain

Reclusive Eagle · 17th January 2022, 00:19

Bit of an older post, but why not use Dfttest as the reference for BM3D?

I also find that pre-sharpening the clip before creating the reference preserves more detail.
Using this combination preserves all but the most minute details in heavily noisy clips.
I'm talking about a loss of less than 0.5-1%, and only in details that were already the faintest.

For anything that isn't SD compressed DVD grain, you will preserve all detail this way.

In fact honestly after testing this I would recommend:

1: Double the resolution of your clip and then sharpen it. (This reduces the effect of sharpening noise)
2: Denoise with Dfttest
3: Use the Dfttest result as a BM3D reference
4: Use the first BM3D result as the reference for a second BM3D
5: Downscale to the original resolution

This will preserve massive amounts of detail, if not all of it, and because it relies on BM3D CUDA, it renders at 2fps or more.
So about 2 hours for 25 minutes of footage. If you have a better GPU than my 2014 750 Ti, you will get dramatically faster results.
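To turn a render speed into a time estimate for your own clip, the arithmetic is simple. This sketch assumes a 23.976 fps source (typical after IVTC, but not stated in the post) and uses a made-up function name.

```python
def render_hours(duration_min, source_fps, render_fps):
    """Hours needed to render a clip at a given processing speed."""
    frames = duration_min * 60 * source_fps
    return frames / render_fps / 3600


SOURCE_FPS = 24000 / 1001  # assumed frame rate after IVTC

for speed in (2.0, 5.5):
    hours = render_hours(25, SOURCE_FPS, speed)
    print(f"{speed} fps -> {hours:.1f} h for 25 minutes of footage")
```

Under that assumption, 25 minutes takes about 5 hours at 2 fps and a little under 2 hours at the 5.5 fps quoted further down for block_step=3.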

Here is an example of a very noisy DVD IVTC:

Here is the original image after IVTC:
https://i.slow.pics/4c69Dt8B.png

Here is the same image after denoising:
https://i.slow.pics/jNtO6qyC.png

Even the clouds keep all their detail, which is extremely hard to preserve in older anime backgrounds when denoising.

To achieve this I:
1: Pre-sharpened with LSFMod
2: Denoised with Dfttest
3: Denoised with BM3D using Dfttest as a reference
4: Denoised with a second BM3D using the first BM3D as a reference.

Here is the code I used if you or anyone else would like to replicate it.
Just remember that this Dfttest nlocation is custom-made for this DVD to preserve extremely small details. It won't work on all clips.

Quote:

#Imports (assumes `clip` already holds the IVTC'd source)
import vapoursynth as vs
import havsfunc as haf
core = vs.core

#Upscale
clip = core.resize.Lanczos(clip, 1440, 1080)

#Pre-sharpen clip, then convert to 32-bit float
clip = haf.LSFmod(clip, preblur=3, strength=110, Smode=1, Smethod=3, soothe=True, edgemaskHQ=True, secure=True, soft=10)
clip = core.fmtc.bitdepth(clip, bits=32)

#3 Pass Denoise: each pass is the reference for the next one
ref = core.dfttest.DFTTest(clip, ftype=1, tbsize=3, nlocation=[35,0,0,10, 28,0,83,87, 95,0,0,100], sigma=10.0, sst=[0.0,2.0, 1.0,20.0], ssystem=1, opt=3)
ref2 = core.bm3dcuda.BM3D(clip, ref=ref, sigma=[25,25,25], block_step=3)
clip = core.bm3dcuda.BM3D(clip, ref=ref2, sigma=[17,17,17], block_step=3)

#Back to 8-bit for output
clip = core.fmtc.bitdepth(clip, bits=8)
clip.set_output()

Notice how each denoise pass decreases in intensity. Also, for BM3D, a block_step of 1 will reduce performance by over 80%.
Having it at 3 is the difference between rendering at 5.5fps vs 0.6fps, and there is ZERO difference in quality (in this case).
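A rough way to see why block_step matters so much is to count how many reference block positions fit on the frame at each step. The sketch below assumes 8x8 blocks laid out on a plain regular grid and ignores edge handling. Both are simplifying assumptions for illustration, not details taken from the plugin source, and the function name is hypothetical.

```python
def reference_positions(width, height, block_size=8, block_step=1):
    """Count block positions on a regular grid (edge handling ignored)."""
    across = (width - block_size) // block_step + 1
    down = (height - block_size) // block_step + 1
    return across * down


dense = reference_positions(1440, 1080, block_step=1)
sparse = reference_positions(1440, 1080, block_step=3)
print(dense, sparse, round(dense / sparse, 2))
```

The step applies in both directions, so going from 1 to 3 cuts the number of positions by about 9x, which is in the same ballpark as the 5.5fps vs 0.6fps difference reported above.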

Reclusive Eagle · 17th January 2022, 01:22

In fact, at this point the denoising with BM3D is so good that, if you want, you can just keep stacking BM3D references at decreasing intensity.
You can have 8-pass noise reduction with 1 dfttest and 7 BM3Ds at lightning speed compared to KNLMeans.
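The stacking idea can be written as a small loop instead of a growing list of ref2, ref3, ... variables. This is a sketch only: `cascade` and `fake_denoise` are hypothetical helpers, and the stand-in denoiser just builds a string so the example runs without VapourSynth installed.

```python
def cascade(clip, first_ref, sigmas, denoise):
    """Run one denoise pass per sigma; each result is the next pass's reference."""
    ref = first_ref
    for sigma in sigmas:
        ref = denoise(clip, ref, sigma)
    return ref


calls = []


def fake_denoise(clip, ref, sigma):
    """Stand-in for a real denoiser: records the call and describes the result."""
    calls.append((ref, sigma))
    return f"bm3d(sigma={sigma}, ref={ref})"


result = cascade("src", "dfttest", [25, 17, 13], fake_denoise)
print(result)
```

In a real script, `denoise` could be something like `lambda c, r, s: core.bm3dcuda.BM3D(c, ref=r, sigma=[s, s, s], block_step=3)`, mirroring the calls used in these posts, with the Dfttest output passed as `first_ref`.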

And don't be afraid to add more sharpness after every few stacks.
For example, I took the above code and stacked it 4 times, this time adding more sharpness to the original clip. Because the references were so good,
I retained more detail, increased overall sharpness and gained zero new noise compared to the above example, with this code:

Quote:

#Imports (assumes `clip` already holds the IVTC'd source)
import vapoursynth as vs
import havsfunc as haf
core = vs.core

#Upscale
clip = core.resize.Lanczos(clip, 1440, 1080)

#Pre-sharpen, then convert to 32-bit float
clip = haf.LSFmod(clip, preblur=3, strength=110, Smode=1, Smethod=3, soothe=True, edgemaskHQ=True, secure=True, soft=10)
clip = core.fmtc.bitdepth(clip, bits=32)

#3 Pass denoise: each pass is the reference for the next one
ref = core.dfttest.DFTTest(clip, ftype=1, tbsize=3, nlocation=[35,0,0,10, 28,0,83,87, 95,0,0,100], sigma=10.0, sst=[0.0,2.0, 1.0,20.0], ssystem=1, opt=3)
ref2 = core.bm3dcuda.BM3D(clip, ref=ref, sigma=[25,25,25], block_step=3)
ref3 = core.bm3dcuda.BM3D(clip, ref=ref2, sigma=[17,17,17], block_step=3)

#Post reference sharpen
clip = haf.LSFmod(clip, preblur=3, strength=40, Smode=1, Smethod=3, soothe=True, edgemaskHQ=True, secure=True, soft=10)

#Final 4th pass denoise
clip = core.bm3dcuda.BM3D(clip, ref=ref3, sigma=[13,13,13], block_step=3)

#Back to 8-bit for output
clip = core.fmtc.bitdepth(clip, bits=8)
clip.set_output()

The result is, again, increased sharpness from the post-reference sharpen, but with zero increase in overall noise after the final 4th denoise pass.
Obviously you can see sharpening halos, but those can be fixed entirely by masking.

You can find a really detailed masking tutorial on kageru's blog:
https://blog.kageru.moe/legacy/edgemasks.html

and you can fix sharpening halos directly with this tutorial also by Kageru:
https://guide.encode.moe/encoding/ma...iting-etc.html

Btw, with this setup, including nnedi3, IVTC, upscaling and 4-pass denoising,
I am rendering at 2.8fps to ProRes 1440x1080 with an i5 9600K and a GTX 750 Ti.

Just as a baseline for anyone who wants to compare future settings.

I also recommend tbilateral after your pre-sharpen but before your denoise references, or
apply it to your final reference before the final denoise (same effect either way).

This will give you an insanely clean image.

kedautinh12 · 17th January 2022, 06:21

Wow, it's great

MysteryX · 23rd January 2022, 10:12

Reclusive Eagle, your reference clip is an anime while mine is raw camera footage -- great method, but completely different use-case.

It gives a nice polishing effect on your anime; but that will most likely give the plastic effect on other videos that I've been trying to avoid.

MysteryX · 6th February 2022, 20:42

Here's a denoising comparison video for xClean

WarnerBrah · 19th November 2022, 01:05

Script Error
Script error: There is no function named 'nmod'.
TransformsPack - Main.avsi, line 456
xClean.avsi, line 232

any help?

kedautinh12 · 19th November 2022, 01:46

You're missing this script:
https://github.com/Dogway/Avisynth-S...izersPack.avsi

WarnerBrah · 19th November 2022, 21:11

nnedi3_weights.bin is missing now

kedautinh12 · 19th November 2022, 21:21

You're missing NNEDI3CL. Download the "extra" package from release 1.0.3 to get the .bin file, and put it in the same folder as the .dll file from 1.0.5:
https://github.com/Asd-g/AviSynthPlus-NNEDI3CL/releases
