Old 8th March 2023, 12:29   #221  |  Link
anton_foy
Registered User
 
Join Date: Dec 2005
Location: Sweden
Posts: 702
Quote:
Originally Posted by DTL View Post
An important note on multi-generation MV refinement: the thSAD for MDegrain needs to be significantly reduced after the first generation, once MAnalyse runs on the first-generation MDegrain output. The SAD between a mostly cleaned 'current' block and a noisy input block becomes about 2 times lower, so thSAD for the intermediate generations and for the final output MDegrain needs to be reduced to about 0.5 of the initial value.

So a better multi-generation MV refinement looks something like:
Code:
init_thSAD=400

s1=MSuper()
mv1 = MAnalyse(s1)
dg1 = MDegrain(s1, mv1, thSAD=init_thSAD)

gen1_thSAD = Int(init_thSAD / 1.8) # divisor - subject to Zopti refinement?

s2=MSuper(dg1)
mv2 = MAnalyse(s2, SuperCurrent=s1) # or (s1, SuperCurrent=s2) - the difference may not be visible
dg2=MDegrain(s1, mv2, thSAD=gen1_thSAD)
It was also found that enabling trymany=true in MAnalyse, while good at refining zero MVs, may also add some significantly bad MVs. So it is planned to add flags for the predictors used in trymany mode, to skip possibly bad predictors and to make performance visibly better.
Love those updates! Lowering the thSAD in the next step seems logical. Does this work using DX12-ME?
Old 8th March 2023, 13:14   #222  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
Yes - MAnalyse with the optional SuperCurrent can be used with any optSearchOption value (so including the hardware search options). I am even thinking of using more than one HWacc in the system for better performance in a pipelined way: the first HWacc does the initial analysis and the second does the refining step.

Later we will have many cheap second-hand old HWaccs capable of DX12-ME, so it can be tested. For now you can try a different accnum for each MAnalyse:
Code:
init_thSAD=400

s1=MSuper()
mv1 = MAnalyse(s1, optSearchOption=5, accnum=1) # use first DX12-ME accelerator in system or accnum=0 ? need testing
dg1 = MDegrain(s1, mv1, thSAD=init_thSAD)

gen1_thSAD = Int(init_thSAD / 1.8) # divisor - subject to Zopti refinement?

s2=MSuper(dg1)
mv2 = MAnalyse(s2, SuperCurrent=s1, optSearchOption=5, accnum=2) # use second DX12-ME accelerator in system or accnum=1 - need testing
dg2=MDegrain(s1, mv2, thSAD=gen1_thSAD)
Currently, using 2 MAnalyse calls with a single HWacc drops performance by about 2 times.

A combination of one external PCI-board DX12-ME accelerator and one built into the CPU may also be tested where available.

Also, as I read, some NVIDIA boards/chips have more than one MPEG encoder ASIC (?), so they may expose more than one full-speed DX12-ME interface to applications.

See the columns "# of chips", "# of NVENC/chip" and "Total # of NVENC" at https://developer.nvidia.com/video-e...ort-matrix-new
So GeForce GTX 965M > 980M / 980MX Maxwell (2nd Gen) may have 2 full-speed DX12-ME interfaces?
Also GeForce GTX 960 Ti / 970 / 980, GeForce GTX 980 Ti, GeForce GTX Titan X,
GeForce GTX 1070M / 1080M, GeForce GTX 1070 / 1070 Ti, GeForce GTX 1080, GeForce GTX 1080 Ti, GeForce GTX Titan X / Titan Xp.

The same applies to GeForce RTX 4080 Laptop, GeForce RTX 4080 16GB, GeForce RTX 4090 Laptop and GeForce RTX 4090, but these are much more expensive.

Also Titan V - 3 NVENC.

Does Dogway have a GTX 1070? It may be good to ask him to test 1 vs 2 MAnalyse performance (and also whether different accnum values >0 or >1 are accepted).

Addition: I am not sure whether several MPEG encoder ASICs located on a single physical board will be exposed as different Direct3D12 devices selectable with the 'accnum' param. Maybe the environment will auto-spread motion estimation tasks if a single board has several task dispatch resources available. So 2-NVENC boards may simply allow running 2 MAnalyse at about equal speed with the default accnum=0.

Last edited by DTL; 8th March 2023 at 14:59.
Old 8th March 2023, 20:26   #223  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
New release: https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.20

Fixed possible bug with trymany in MAnalyse.
Added trymany into optPredictorType=1 mode (zero, global and median predictors only).

Added a partial fix for the chroma shift issue when processing 4:2:x formats in MAnalyse, MDegrainN, MCompensate (maybe also MRecalculate), using the current pel precision from MSuper.

The multi-generation MV refining also appears to work very visibly against blurring of complex motion such as facial animation.

Processing script, cleaned of MShow calls:
Code:
my_DMFlags=1
my_thSAD=300
my_thSAD2=250

my_thSAD_mg=150
my_thSAD2_mg=100

my_thSCD=500

my_global=true
my_pzero=10
my_pnew=10
my_pglobal=10

my_pel=2
my_trymany=true

my_oPT=1

tr=6
super=MSuper(last,chroma=true, mt=false, pel=my_pel)

multi_vec=MAnalyse(super, multi=true, delta=tr, search=3, searchparam=2, trymany=my_trymany, overlap=0, chroma=true, mt=false, optSearchOption=1, truemotion=false, pnew=my_pnew, pzero=my_pzero, pglobal=my_pglobal, global=my_global, levels=4, DMFlags=my_DMFlags, optPredictorType=my_oPT)
g1=MDegrainN(super, multi_vec, tr, thSAD=my_thSAD, thSAD2=my_thSAD2, mt=false, wpow=4, thSCD1=my_thSCD, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=3)

gen1=g1

super_g1=MSuper(gen1,chroma=true, mt=false, pel=my_pel)
multi_vec_g2=MAnalyse(super_g1, SuperCurrent=super, multi=true, delta=tr, search=3, searchparam=2, trymany=my_trymany, overlap=0, chroma=true, mt=false, optSearchOption=1, truemotion=false, pnew=my_pnew, pzero=my_pzero, pglobal=my_pglobal, global=my_global, levels=4, DMFlags=my_DMFlags, optPredictorType=my_oPT)

g2=MDegrainN(super, multi_vec_g2, tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, mt=false, wpow=4, thSCD1=my_thSCD, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=3)

gen2=g2

super_g2=MSuper(gen2,chroma=true, mt=false, pel=my_pel)
multi_vec_g3=MAnalyse(super_g2, SuperCurrent=super, multi=true, delta=tr, search=3, searchparam=2, trymany=my_trymany, overlap=0, chroma=true, mt=false, optSearchOption=1, truemotion=false, pnew=my_pnew, pzero=my_pzero, pglobal=my_pglobal, global=my_global, levels=4, DMFlags=my_DMFlags, optPredictorType=my_oPT)
g3=MDegrainN(super, multi_vec_g3, tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, mt=false, wpow=4, thSCD1=my_thSCD, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=3)

gen3=g3

super_g3=MSuper(gen3,chroma=true, mt=false, pel=my_pel)
multi_vec_g4=MAnalyse(super_g3, SuperCurrent=super, multi=true, delta=tr, search=3, searchparam=2, trymany=my_trymany, overlap=0, chroma=true, mt=false, optSearchOption=1, truemotion=false, pnew=my_pnew, pzero=my_pzero, global=my_global, levels=4, DMFlags=my_DMFlags, optPredictorType=my_oPT)
g4=MDegrainN(super, multi_vec_g4, tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, mt=false, wpow=4, thSCD1=my_thSCD, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=3)

gen4=g4

super_g4=MSuper(gen4,chroma=true, mt=false, pel=my_pel)
multi_vec_g5=MAnalyse(super_g4, SuperCurrent=super, multi=true, delta=tr, search=3, searchparam=2, trymany=my_trymany, overlap=0, chroma=true, mt=false, optSearchOption=1, truemotion=false, pnew=my_pnew, pzero=my_pzero, pglobal=my_pglobal, global=my_global, levels=4, DMFlags=my_DMFlags, optPredictorType=my_oPT)
g5=MDegrainN(super, multi_vec_g5, tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, mt=false, wpow=4, thSCD1=my_thSCD, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=3)

gen5=g5

super_g5=MSuper(gen5,chroma=true, mt=false, pel=my_pel)
multi_vec_g6=MAnalyse(super_g5, SuperCurrent=super, multi=true, delta=tr, search=3, searchparam=2, overlap=0, chroma=true, mt=false, optSearchOption=1, truemotion=false,  pnew=my_pnew, pzero=my_pzero, pglobal=my_pglobal, global=my_global, levels=4, DMFlags=my_DMFlags, optPredictorType=my_oPT)
g6=MDegrainN(super, multi_vec_g6, tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, mt=false, wpow=4, thSCD1=my_thSCD, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=3)

return Interleave(g6.Subtitle("g6"), g2.Subtitle("g2"), g1.Subtitle("g1"))
Generally gen2 is already much sharper in motion compared with gen1 (standard MDegrainN). Gen6 sometimes looks a bit better and sometimes more blurry. Maybe a good average number of generations is between 2 and 6 (or some detail-restoration processing may be added to regain details from a higher generation if that frame, or that area of the frame, is sharper).

Frames g1, g2 and g6 2x enlarged with BSpline:




The source was interlaced and not field-separated, so 2 fields are present.

Maybe these many calls to MSuper/MAnalyse/MDegrainN for each generation of MV refining can be compacted into an AVS function to make the script smaller.

imgsli comparisons:
https://imgsli.com/MTYwODgx
https://imgsli.com/MTYwODgw

Last edited by DTL; 9th March 2023 at 13:26. Reason: fixed error in last gen6 MDegrainN
Old 9th March 2023, 22:57   #224  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
A real working script with both accelerator and CPU search and refining functions, for 1920x1080i input.

Code:
# Input plugins
LoadPlugin("ffms2.dll")
LoadPlugin("mvtools2.dll")

SetFilterMTMode("DEFAULT_MT_MODE", 3)

my_thSAD=260
my_thSAD2=240

my_thSAD_mg=130
my_thSAD2_mg=120

my_thSCD=500

my_pzero=10
my_pnew=10
my_pglobal=10

my_pel=2
my_thCohMV=5 # 5..8 for pel=2, 10..16 for pel=4 ?
my_trymany=true

my_oPT=1
my_overlap=0
my_IntOvlp=3
my_searchparam=2

my_MPBNumIt=2

my_init_tr=12
my_refine_tr=12

Function RefineMV(clip mvclip, clip super_ref, clip src, int _thSAD, int _thSAD2, int in_tr, int refine_tr, int my_thSCD, int my_pel, bool my_trymany, int my_pnew, int my_pzero, int my_pglobal, \
int my_oPT, int my_overlap, int my_searchparam, int my_IntOvlp, int my_thCohMV)
{
 g_next=MDegrainN(src, super_ref, mvclip, in_tr, thSAD=_thSAD, thSAD2=_thSAD2, mt=false, wpow=4, thSCD1=my_thSCD, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=my_thCohMV, \
MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_IntOvlp)
 super_g_next=MSuper(g_next,chroma=true, mt=false, pel=my_pel)
 return MAnalyse(super_g_next, SuperCurrent=super_ref, multi=true, delta=refine_tr, search=3, searchparam=my_searchparam, trymany=my_trymany, overlap=my_overlap, chroma=true, mt=false,\
 optSearchOption=1, truemotion=false, pnew=my_pnew, pzero=my_pzero, pglobal=my_pglobal, global=true, optPredictorType=my_oPT)
}

Function RefineMV_HW(clip mvclip, clip super_ref, clip src, int _thSAD, int _thSAD2, int in_tr, int refine_tr, int my_thSCD, int my_pel, int my_thCohMV)
{
 g_next=MDegrainN(src, super_ref, mvclip, in_tr, thSAD=_thSAD, thSAD2=_thSAD2, mt=false, wpow=4, thSCD1=my_thSCD, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=my_thCohMV, \
MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=3, UseSubShift=1)
 super_g_next=MSuper(g_next,chroma=true, mt=false, pel=my_pel, levels=1, pelrefine=false)
 return MAnalyse(super_g_next, SuperCurrent=super_ref, multi=true, delta=refine_tr, chroma=true, mt=false, optSearchOption=5, levels=1)
}


FFmpegSource2("1920x1080i.mp4")

AddBorders(0,0,0,72)

noproc=last

SeparateFields()

super_hwa=MSuper(last, mt=false, chroma=true, pel=my_pel, hpad=8, vpad=8, levels=1, pelrefine=false)
super_cpu=MSuper(last, mt=false, chroma=true, pel=my_pel, hpad=8, vpad=8, levels=0, pelrefine=true)

multi_vec_hwa=MAnalyse(super_hwa, multi=true, blksize=8, delta=my_init_tr, overlap=0, chroma=true, optSearchOption=5, mt=false, levels=1)
multi_vec_cpu=MAnalyse(super_cpu, multi=true, delta=my_init_tr, search=3, searchparam=my_searchparam, trymany=my_trymany, overlap=my_overlap, chroma=true, mt=false, \
optSearchOption=1, truemotion=false, pnew=my_pnew, pzero=my_pzero, pglobal=my_pglobal, global=true, optPredictorType=my_oPT)

multi_vec_cpu2=RefineMV(multi_vec_cpu, super_cpu, last, my_thSAD, my_thSAD2, my_init_tr, my_refine_tr, my_thSCD, my_pel, my_trymany, my_pnew, my_pzero, my_pglobal, my_oPT, \
my_overlap, my_searchparam, my_IntOvlp, my_thCohMV)
multi_vec_hybr2=RefineMV(multi_vec_hwa, super_cpu, last, my_thSAD, my_thSAD2, my_init_tr, my_refine_tr, my_thSCD, my_pel, my_trymany, my_pnew, my_pzero, my_pglobal, my_oPT, \
my_overlap, my_searchparam, my_IntOvlp, my_thCohMV)
multi_vec_hwa2=RefineMV_HW(multi_vec_hwa, super_hwa, last, my_thSAD, my_thSAD2, my_init_tr, my_refine_tr, my_thSCD, my_pel, my_thCohMV)

cpu2=MDegrainN(last,super_cpu, multi_vec_cpu2, my_refine_tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, mt=false, wpow=4, UseSubShift=1, thSCD1=my_thSCD, adjSADzeromv=0.7, \
adjSADcohmv=0.7, thCohMV=my_thCohMV, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_IntOvlp, MPBthSub=5, MPBthAdd=20, MPBNumIt=my_MPBNumIt, \
MPB_SPCsub=0.5, MPB_SPCadd=1.5, MPBthIVS=2200, showIVSmask=false).Weave().Subtitle("cpu2")
hwa2=MDegrainN(last,super_hwa, multi_vec_hwa2, my_refine_tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, mt=false, wpow=4, UseSubShift=1, thSCD1=my_thSCD, adjSADzeromv=0.7, \
adjSADcohmv=0.7, thCohMV=my_thCohMV, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_IntOvlp, MPBthSub=5, MPBthAdd=20, MPBNumIt=my_MPBNumIt, \
MPB_SPCsub=0.5, MPB_SPCadd=1.5, MPBthIVS=2200, showIVSmask=false).Weave().Subtitle("hwa2")
hybr2=MDegrainN(last,super_hwa, multi_vec_hybr2, my_refine_tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, mt=false, wpow=4, UseSubShift=1, thSCD1=my_thSCD, adjSADzeromv=0.7, \
adjSADcohmv=0.7, thCohMV=my_thCohMV, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_IntOvlp, MPBthSub=5, MPBthAdd=20, MPBNumIt=my_MPBNumIt, \
MPB_SPCsub=0.5, MPB_SPCadd=1.5, MPBthIVS=2200, showIVSmask=false).Weave().Subtitle("hybr2")

Interleave(cpu2, hybr2, hwa2, noproc.Subtitle("src"))

#last=hybr2

Crop(0,0,1920,1080)

Prefetch(6)
Examples of accelerator-only and CPU-only search and refining, and of hybrid mode (accelerator for the first search and CPU for refining). Quality on CPU is a bit better. Full CPU search and refine on an i5-9600K with a 1920x1080i frame runs at about 0.28 fps.
Hybrid mode with a GTX 1060 and the i5-9600K CPU runs at about 1.24 fps (pel=4) and 1.75 fps (pel=2). Quality is close to full CPU search.
Full accelerator search and refine runs only a bit faster (about 1.3 fps with pel=4) and its quality is a bit lower than hybrid mode in some scenes.

Last edited by DTL; 10th March 2023 at 17:25. Reason: updated script
Old 15th March 2023, 18:39   #225  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
New version: https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.21
Added new processing mode to MDegrainN: MEL (Most Equal Looking) search mode for TTH (Temporal Thresholding).
New params to MDegrainN:

pmode=0 (default) - standard blending, pmode=1 - MEL search and TTH only.

TTH_DMFlags - dismetric (dissimilarity metric) flags for estimating block difference in the TTH compare. Flags 0x01 to 0x20 are valid (except 0x08).

TTH_thUPD (default 0: additional thresholding disabled, 100% linear mode; must be >0 for pmode=1) - integer threshold for the selection: either keep the block from memory (the old output in pmode=0 or the 'best' block in pmode=1), or update the block in memory and output the new block. Typical working values are expected to be significantly below thSAD (like thSAD/3..thSAD/4 and less), starting from 0. 0 means no blocks from memory are used (standard MDegrainN mode - FIR filter).

TTH_chroma - use chroma in TTH dismetric analysis (slower, better quality) or not (faster).

Fixed performance issue with double processing of chroma planes in combined YUV processing with no overlap.

Current testscript:
Code:
tr=10
super=MSuper(last, mt=false, chroma=true, pel=2, hpad=8, vpad=8, levels=0, pelrefine=true)
multi_vec=MAnalyse(super, multi=true, blksize=8, delta=tr, search=3, searchparam=2, overlap=0, optSearchOption=1, optPredictorType=0, chroma=false, mt=false)
ref=MDegrainN(last,super, multi_vec, tr, thSAD=185, thSAD2=170, mt=false, wpow=4, thSCD1=350, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, IntOvlp=3)

super2=MSuper(ref, mt=false, chroma=true, pel=2, hpad=8, vpad=8, levels=0, pelrefine=true)
MDegrainN(ref,super2, multi_vec, tr, thSAD=250, thSAD2=240, mt=false, thSCD1=350, pmode=1, TTH_thUPD=100, IntOvlp=3)
TTH_thUPD may also be enabled in the 'standard' blending mode (pmode=0, the default), for both overlap and no-overlap combined YUV processing. It is (much) faster but may give somewhat lower quality. Currently no motion block tracking is available, so it is mostly effective for completely static blocks only. Some limited motion tracking is expected in future versions.

The complexity of analysis in pmode=1 is currently ~tr^2, so it may use a separate tr value (and an mvclip created with a lower tr value). Quality is expected to scale with ~tr (the probability of finding the most commonly looking block in the total tr-pool). The thSAD param in pmode=1 also controls initial block skipping when accumulating blocks in the analysis pool.

TTH_thUPD is the main param to adjust: the higher its value, the more noise blocks are skipped, but too high a value may cause visible 'hanging' blocks or motion quality degradation. Setting thSAD too high in pmode=1 may also cause more artifacts.
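
The keep-or-update decision controlled by TTH_thUPD can be sketched roughly as follows. This is a minimal Python illustration, not the mvtools code: the names `sad` and `tth_step`, the flat-list block representation and the use of plain SAD as the dismetric are assumptions made for the example.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def tth_step(memory_block, new_block, th_upd):
    """One temporal-thresholding step for a single block position.

    Returns (output_block, updated_memory_block). Hypothetical helper;
    in mvtools this logic lives inside MDegrainN.
    """
    if th_upd <= 0 or memory_block is None:
        # Thresholding disabled or nothing stored yet: pass the new block through.
        return new_block, new_block
    if sad(memory_block, new_block) < th_upd:
        # The new block looks the same as the stored one: keep the stored block.
        return memory_block, memory_block
    # Visible change: update the memory and output the new block.
    return new_block, new_block


# A static 4-sample block with small noise, followed by a real change.
frames = [
    [100, 100, 100, 100],
    [101, 99, 100, 100],
    [100, 101, 100, 99],
    [150, 150, 150, 150],
]
memory = None
outputs = []
for block in frames:
    out, memory = tth_step(memory, block, th_upd=10)
    outputs.append(out)
```

In this toy run the first three output blocks are identical (the noise is held back by the memory) and the fourth follows the real change. It also shows why too high a threshold leaves 'hanging' blocks: a real change smaller than the threshold would never reach the output.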

pmode=1 is expected to be used as 'final cleaning' after an initial MDegrainN (it must also use a new super clip built from the pre-denoised frames) when the highest quality is required. For general everyday encodes it may be enough to play with the TTH_thUPD param in standard pmode=0.

The last MDegrainN with pmode=1 may or may not use a refined mv-clip (for best results the best refined mvclip is recommended).

pmode=1 does not blend at all, so it does not degrade detail quality at any thSAD. It is only an additional way to select the 'best' looking block in the current tr-scope and duplicate it in the output frames for as long as the visual difference from the current frame block stays below the threshold.
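
As a rough illustration of why the pmode=1 analysis costs ~tr^2, here is a small Python sketch. It assumes that 'most equal looking' means the block with the lowest summed difference to every other block in the pool; that reading, the function names and the plain-SAD metric are assumptions for the example, not the actual MDegrainN code.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def most_equal_looking(pool):
    """Index of the block with the lowest summed difference to all others.

    The full pairwise table is evaluated, so the cost grows as len(pool) ** 2.
    """
    row_sums = [sum(sad(candidate, other) for other in pool) for candidate in pool]
    return row_sums.index(min(row_sums))


# Three noisy versions of the same block plus one outlier (e.g. a bad MV).
pool = [
    [100, 100, 100, 100],
    [104, 96, 100, 100],
    [101, 100, 100, 100],
    [120, 120, 120, 120],
]
best = most_equal_looking(pool)  # -> 2, the block closest to all the others
```

The selected block is copied to the output unchanged, which is why no blending blur is introduced; a larger pool (higher tr) gives more candidates at quadratic cost.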

TTH_thUPD may be enabled in any MDegrainN in the processing script (in MV refining, in the final degrain and in the final cleanup).
TTH_DMFlags may select any available dismetric for the visual difference analysis (SAD - faster, SSIM and VIF - slower) in any MDegrainN with TTH_thUPD or pmode=1 enabled.

Last edited by DTL; 16th March 2023 at 10:25.
Old 30th March 2023, 11:04   #226  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
A quick morning implementation of this year's idea of noise bitrate estimation to check the degrain quality.

Release 30.03.2023 - https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.22

Added computing and display of the residual noise bits (RNB) count per frame to MCompensate.
It computes the sum of log2 of the absolute sample differences between the source and the motion-compensated output frame of MCompensate. For a completely static frame sequence RNB=0 bits/frame. For noise bitrate per second, the value should be multiplied by the frame rate.

New param to MCompensate: showRNB (default = false).

Usage example:
Code:
super=MSuper()
mv=MAnalyse(super)
MCompensate(super, mv, showRNB=true)

Currently for 8-bit sources only. The processing function needs to be moved to a templated one for HBD support. It can process a Y-only input clip or YUV/RGB (3 planes present). For more than one plane the sum over all planes is displayed.

Computing part:
Code:
    for (int y = 0; y < nHeight; y++)
    {
      uint8_t* pDstFrame = pDst[0] + nDstPitches[0] * y;
      uint8_t* pSrcFrame = (uint8_t*)pSrc[0] + nSrcPitches[0] * y + nOffset[0];
      for (int x = 0; x < nWidth; x++)
      {
        // bit length of |src - dst|: 0 for identical samples, floor(log2(d)) + 1 otherwise
        iSumNzBits += 32 - __lzcnt(SADABS((int)pSrcFrame[x] - (int)pDstFrame[x]));
      }
    }
Not directly applicable to float32 samples (they need to be converted to some finite-precision integer of <32 bits first).
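
For checking the numbers outside of AviSynth, the same per-plane sum can be reproduced in a few lines of Python. This is only a sketch of the arithmetic in the loop above: the function name `residual_noise_bits` and the list-of-rows plane layout are assumptions for the example.

```python
def residual_noise_bits(src, dst):
    """Sum of bit lengths of |src - dst| over all samples of one 8-bit plane.

    src and dst are lists of rows of integer samples with equal dimensions.
    """
    total = 0
    for src_row, dst_row in zip(src, dst):
        for s, d in zip(src_row, dst_row):
            # int.bit_length() is 0 for 0 and floor(log2(n)) + 1 otherwise,
            # the same value as 32 - lzcnt(n) for a 32-bit integer.
            total += abs(s - d).bit_length()
    return total


static = [[10, 20], [30, 40]]
noisy = [[11, 17], [30, 45]]
rnb_static = residual_noise_bits(static, static)  # identical frames give 0
rnb_noisy = residual_noise_bits(static, noisy)    # differences 1, 3, 0, 5 give 1 + 2 + 0 + 3
```

Multiplying the per-frame value by the frame rate gives the noise bitrate per second, as described in the release note.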

Usage example to measure denoise process:
Code:
SeparateFields()

fields_orig=last

tr=3

super=MSuper(last, mt=false, chroma=true, pel=2, hpad=8, vpad=8, levels=0, pelrefine=true)
multi_vec=MAnalyse(super, multi=true, blksize=8, delta=tr, search=3, searchparam=2, overlap=0, optSearchOption=1, optPredictorType=0, chroma=false, mt=false, DMFlags=1)
ref=MDegrainN(last,super, multi_vec, tr, thSAD=185, thSAD2=170, mt=false, wpow=4, thSCD1=350, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, IntOvlp=3)

super2=MSuper(ref, mt=false, chroma=true, pel=2, hpad=8, vpad=8, levels=0, pelrefine=true)
MDegrainN(ref,super2, multi_vec, tr, thSAD=350, thSAD2=340, mt=false, thSCD1=350, pmode=1, TTH_thUPD=100, IntOvlp=3)

super_ref=MSuper(ref)
mv_ref=MAnalyse(super_ref)
rnb_den_ref=MCompensate(super_ref, mv_ref, showRNB=true)

super2=MSuper()
mv2=MAnalyse(super2)
rnb_den=MCompensate(super2, mv2, showRNB=true)

super_orig=MSuper(fields_orig)
mv_orig=MAnalyse(super_orig)
rnb_orig=MCompensate(super_orig, mv_orig, showRNB=true)

StackHorizontal(rnb_den, rnb_den_ref,  rnb_orig)

Weave()
Output sample frame


Yes - the fields are not blended very nicely. It shows that for static areas the second MDegrainN (pmode=1) decreases the noise bit count about 10 times. The first denoise stage decreases the noise bitrate about 2.9 times. Adding the secondary non-linear IIR-type filter with memory decreases the noise bitrate about 30 times relative to the source.

A completely (100%) temporally denoised frame sequence for zero calibration is:
Trim(1,1)
Loop()

Last edited by DTL; 30th March 2023 at 14:30.
Old 30th March 2023, 11:49   #227  |  Link
TDS
Formally known as .......
 
TDS's Avatar
 
Join Date: Sep 2021
Location: Down Under.
Posts: 965
Quote:
Originally Posted by DTL View Post
Some morning quicky implementation of this year about noise bitrate estimation to check the degrain quality.

Release 30.03.2023 - https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.22
Just to confirm, and probably a newbie question: do the 2 variants, DX12 and noDX12, depend on whether DirectX is installed on the system or not?
__________________
Long term RipBot264 user.

RipBot264 modded builds..
Old 30th March 2023, 14:03   #228  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
If you have Win10 or later and compatible hardware, you can use the DX12 build. It will not load on Win7 or any other system without DX12 installed. If you do not use the DX12 search modes in MAnalyse, you can safely use the noDX12 build.
Old 30th March 2023, 18:54   #229  |  Link
mastrboy
Registered User
 
Join Date: Sep 2008
Posts: 365
DTL, I can't get your builds to work at all in Windows11, I have tried all 4 different .dll's in the zip file...

AVSmeter just stops at 0 frames forever, until I hit ctrl+c.
Code:
AVSMeter64.exe -o d:\test.avs

AVSMeter 3.0.9.0 (x64), (c) Groucho2004, 2012-2021
AviSynth+ 3.7.3 (r3973, 3.7, x86_64) (3.7.3.0)

Number of frames:                    33304
Length (hh:mm:ss.ms):         00:23:09.054
Frame width:                           960
Frame height:                          720
Framerate:                          23.976 (24000/1001)
Colorspace:                           YV12

Frame (current | last):         0 | 33303
Virtualdub gives me a cryptic memory violation message:
Code:
An out-of-bounds memory access (access violation) occurred in module 'VirtualDub64'...
...reading address FFFFFFFFFFFFFFFF.
AvsPmod gives a similar error message:
Code:
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 315, in 'calling callback function'
  File "avsp.pyo", line 5136, in local_wnd_proc
WindowsError: exception: access violation reading 0xFFFFFFFFFFFFFFFF
I have none of these issues with https://github.com/pinterf/mvtools/releases

I also have no idea how to troubleshoot this other than give you some information about my system and hope you have any idea of what is wrong:
AviSynth+ 3.7.3 (r3973, 3.7, x86_64)
Windows 11 22H2 (22621.1413)

Avisynth script I tested with:
Code:
FFVideoSource("test.mkv")
SMDegrain(tr=3, thSAD=400, RefineMotion=false, contrasharp=false, plane=4, prefilter=0, chroma=true)
__________________
(i have a tendency to drunk post)

Last edited by mastrboy; 30th March 2023 at 18:56.
Old 30th March 2023, 19:41   #230  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
Unfortunately my builds may not be compatible with many old scripts (those using the non-default block size of 16x16 and maybe some other untested options). They are still pre-release demos of some features, mostly tested with the example scripts provided here and typically with a block size of 8x8. I even made a somewhat modified QTGMC to use with my builds when I tested deinterlacing.

So it is not a good idea to put this .dll in the 'common' folder; loading it with LoadPlugin() from the current working folder is recommended. It is expected that maybe in some years (in the beginning of 2024 a great all-planet celebration of 20 years of mvtools is expected) some features will be ported to the 'more official' pinterf build, or maybe to that of another programmer able to test and bugfix most of the supported modes of mvtools. But that has not happened yet. I am going to make a table (maybe in Google web docs?) of all the new features and ideas accumulated and partially implemented for the post-2.7.45 version, with current 'status' and other data, for analysis and for creating a list of the most important features to port/bugfix.

Also, I do not use the SMDegrain script and make my own denoise scripts based on mvtools only. So I do not know what causes the crash there. Maybe some day I will have time to install SMDegrain and check with a debugger where it crashes with those settings and whether it can be fixed more or less quickly.

As a first possible solution it is recommended to test with a block size of 8x8 (the internal default for mvtools).

Though if you use SMDegrain, as far as I have read it still does not support any new features of post-2.7.45 mvtools, so it is safe to use the old 'stable' 2.7.45 build from pinterf. It may still be many years until we have a more stable post-2.7.45 build that is fully compatible with 2.7.45 processing under the new default settings, and until Dogway changes SMDegrain to use the new features.

Last edited by DTL; 30th March 2023 at 20:14.
Old 31st March 2023, 22:22   #231  |  Link
takla
Registered User
 
Join Date: May 2018
Posts: 182
@DTL
Have you seen this
https://devblogs.microsoft.com/direc...y-sdk-1-710-0/
Is it applicable for mvtools?
Old 31st March 2023, 23:05   #232  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
I have a strategic idea for making the current post-2.7.45 version more compatible with old scripts and the 2.7.45 build: rename all filters by adding _a to the end (as in alpha state). Then it may be possible to load both 2.7.45 and post-2.7.45 mvtools in a single AVS environment and use only selected filters from post-2.7.45 when required (super and mv clips may also be (partially) compatible between the two). Now, because of the identical naming, it either does not load or may cause undefined use of different filters from different .dlls. Maybe in the next build.

"Is it applicable for mvtools?"

About the new heaps mode: currently performance is limited very little by texture upload, and the backward download of MVs and SAD data is very small in size. About sampling: currently a 'simple' sampling mode is used in the SAD compute shader (a sample(x,y) style request, as the CPU does from host RAM, not the 'complex 3D' sampling where a texture is mapped onto some virtual triangle or other patch). So no sampling update is required and it cannot help performance.
Old 17th June 2023, 21:20   #233  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
New release: https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.23

Added a denoise mask clip input to MDegrainN. It works only in block-based mode. It must be in Y8 format with a frame size equal to the number of blocks to process (taking any overlap mode in use into account).

New param to MDegrainN:
dnmask - clip. 0 is full standard denoise, 255 is no denoise (so a positive Y channel can be used as a mask to degrain only low brightness levels).

Example script (for IntOvlp=3):
dn_mask1=ConvertToY8()

blksize=8
#int_ ovlp=3
dn_mask_x=dn_mask1.width/blksize
overlap_size=blksize/2
dn_mask_y=(dn_mask1.height-overlap_size)/(blksize-overlap_size)
dn_mask1=BilinearResize(dn_mask1, dn_mask_x, dn_mask_y)

dn_mask1=Levels(dn_mask1, 0, 1, 100, 0, 255, coring=false)

dn_masked=MDegrainN(.., IntOvlp=3, dnmask=dn_mask1)

Added updating of the MEL memory with the best block (lowest sum of DM table row), and memory for the sum of the block currently stored in IIR memory.

A really fast way to get the block numbers is to feed a mask clip of any size and read the error message if the size is not correct: it will show the current number of blocks in the H and V directions for the overlap mode in use.
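
The mask size arithmetic from the example script can also be checked by hand. The Python sketch below only repeats the formulas used there for IntOvlp=3 with an 8x8 block; the helper name `dnmask_size` is made up for the example, and if MDegrainN reports different block numbers in its error message, the error message is the one to trust.

```python
def dnmask_size(width, height, blksize=8):
    """Mask dimensions as computed in the example script for IntOvlp=3."""
    overlap_size = blksize // 2
    mask_x = width // blksize
    mask_y = (height - overlap_size) // (blksize - overlap_size)
    return mask_x, mask_y


print(dnmask_size(1920, 1080))  # prints (240, 269)
```

Integer division here truncates the same way as integer `/` in AviSynth for positive values, so the result matches what the script passes to BilinearResize.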

A simple BilinearResize does not make a perfect mask for every overlap mode, because even block rows are shifted to the right by the overlap size (typically half the block size at max overlap). So it is better to separate odd and even rows, shift the even rows to the right and combine them back into a frame. But with block sizes that are not too large, the overlap processing modes seem to hide these errors.

Last edited by DTL; 18th June 2023 at 15:09.
Old 6th July 2023, 13:08   #234  |  Link
anton_foy
Registered User
 
Join Date: Dec 2005
Location: Sweden
Posts: 702
@DTL

How is it possible to use mdegrain2 with your version, or is it only possible with tr=1 and mdegrain()?

I would like to make my Clay script to work with your version separately to get a speed boost and also a quality boost and yet keep close to the results I get with the current Pinterf version of mvtools.
But I think I have to restructure the script without using mrecalculate and overlap in manalyse.

Edit: maybe you have any further ideas for improvement in both speed and quality when using your version? Need to make it quite simple with for example HQ=true/false or I will have to add many possible parameters.

Last edited by anton_foy; 6th July 2023 at 13:16.
Old 6th July 2023, 16:06   #235  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
"is it only possible with tr=1 and mdegrain()?"

MDegrain2 is tr=2. Yes - all new features are included only in MDegrainN.

Also, as was found while testing IIR mode with TTempSmooth, any IIR filter (one with memory of previous frames) can run correctly only in the MT_SERIALIZE AVS+ MT mode. So with any IIR setting enabled (TTH_thUPD > 0), the current MDegrainN release (up to a.23) also requires MT_SERIALIZE to be set manually for MDegrainN (with SetFilterMTMode(.., force=true)); to keep multithreading, use only the internal AVSTP-based multithreading (mt=true, with the updated avstp.dll from pinterf to avoid hangs). Thank the gods, pinterf found and fixed that odd issue in avstp, so mvtools can again run with internal multithreading, which is really the only possible MT mode with 'temporal' processing like IIR filtering enabled. MT_SERIALIZE is also auto-activated for MAnalyse if the 'temporal' predictor is used, for the same reason.

In the next versions MDegrainN will auto-register itself as MT_SERIALIZE if any IIR setting is activated. Maybe it should also try to set mt=true?

"I would like to make my Clay script to work with your version separately to get a speed boost and also a quality boost and yet keep close to the results I get with the current Pinterf version of mvtools.
But I think I have to restructure the script without using mrecalculate and overlap in manalyse."

The best quality of MVs is expected only from multi-generation MV refining - there is also an example in
https://forum.doom9.org/showthread.p...64#post1987964

It is more complex to control because at least 2 different thSAD values must be adjusted, for the first and for the next generations. With a not very small tr value for the first generation, a significant part of the noise is expected to be removed in the first generation, so the thSAD for the second generation may be about 0.6..0.7 of the first-generation thSAD. With perfect noise removal the last thSAD is expected to be first_thSAD/2. But the best strategy for the number of refining generations and for decreasing thSAD at each generation (and maybe changing the tr value from lower at the first generation to higher at the second and next generations) is a subject of research (maybe also with Zopti). For such research the quality metric should preferably be structure-aware (like SSIM, VIF or another).
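
As a starting point for such experiments, the thSAD value per generation can be tabulated with a tiny helper. This Python sketch is only one possible reading of the rule above: the full thSAD for the first generation and one reduced value for all later ones. The function name `thsad_schedule` and its `ratio` parameter are hypothetical, and the ratio (0.5 for ideal noise removal, about 0.6..0.7 otherwise) is exactly the value that still needs tuning.

```python
def thsad_schedule(first_thsad, generations, ratio=0.5):
    """thSAD for each generation: full value first, reduced value afterwards.

    ratio is the assumed reduction factor for all generations after the first.
    """
    reduced = round(first_thsad * ratio)
    return [first_thsad] + [reduced] * (generations - 1)


ideal = thsad_schedule(400, 3)           # [400, 200, 200]
typical = thsad_schedule(400, 3, 0.65)   # [400, 260, 260]
```

A schedule that keeps decreasing at every generation would be an equally valid thing to try; nothing in the thread settles which variant is better.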

Also, the best quality is expected from onCPU MAnalyse and full 4x overlap in both MAnalyse and MDegrainN (this is the slowest mode). So currently very many performance/quality modes are possible.

Last edited by DTL; 10th July 2023 at 10:00.
Old 30th July 2023, 19:37   #236  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
New version - https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.24

Added Auto-thSAD for MDegrainN.

New params to MDegrainN:
thSADA_a (float), default = 0. Multiplier proportional to estimated noise level
thSADA_b (float), default = 0. Offset to calculated Auto-thSAD.

If both thSADA_a and thSADA_b = 0 - Auto-thSAD is disabled.

The Auto-thSAD used is a scaled and offset arithmetic mean of the block SAD values below thSCD1 (noise_estimate). The adjusting params are then applied:
Auto_thSAD = thSADA_a * noise_estimate + thSADA_b

thSAD2, thSADC, thSADC2 are calculated proportionally to the provided old param values.

For a typical workflow the user must provide both non-default thSADA_a and thSADA_b values. If only thSADA_b is provided, the result is equal to a static thSAD. Expected start values are thSADA_a = 1.0 and thSADA_b = 10.

Setting thSADA_a < 1.0 will give higher denoising on low-noise scenes and lower on high-noise scenes.
Setting thSADA_a > 1.0 will give higher denoising on high-noise scenes and lower on low-noise scenes.

thSADA_b is a simple additive offset (may be negative too).

Initial release of Auto-thSAD feature for testing.

Example:
Code:
MDegrainN(last,super, multi_vec, tr, thSAD=135, thSAD2=120, mt=false, wpow=4, thSCD1=450, thSADA_a=1.05, thSADA_b=5, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, IntOvlp=3)
To provide a roll-off slope for thSAD, if required, the user must set thSAD and thSAD2 (also thSADC and thSADC2 for chroma if required). These may be in abstract units if Auto-thSAD is enabled (only their relative ratio is used internally). Users must also take care to set thSCD1 correctly for medium- and high-noise scenes. Only blocks with SAD below thSCD1 are used in noise estimation, so a too-low thSCD1 will cause either a poor estimate or a fallback to the 'static thSAD' provided by the old params. It may also quickly disable all denoising if every frame is detected as a 'scenechange'.
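
A rough sketch of the Auto-thSAD arithmetic described above (Python, not the plugin source). The formula Auto_thSAD = thSADA_a * noise_estimate + thSADA_b and the use of only blocks below thSCD1 are from this post; how thSAD2 is scaled (keeping the thSAD2/thSAD ratio) and the exact fallback condition are my reading of it and should be treated as assumptions. auto_thsad and its argument names are hypothetical.

```python
def auto_thsad(block_sads, thscd1, a, b, static_thsad, static_thsad2):
    """Return (thSAD, thSAD2) for one frame.

    block_sads   - SAD values of the frame's blocks (hypothetical input)
    a, b         - thSADA_a and thSADA_b
    static_*     - the old thSAD / thSAD2 params, used for the ratio and as fallback
    """
    if a == 0 and b == 0:
        return static_thsad, static_thsad2      # Auto-thSAD disabled

    usable = [s for s in block_sads if s < thscd1]
    if not usable:
        # Assumed fallback: no block below thSCD1, so no noise estimate.
        return static_thsad, static_thsad2

    noise_estimate = sum(usable) / len(usable)  # arithmetic mean
    thsad = a * noise_estimate + b
    # Assumption: thSAD2 keeps the ratio given by the old params.
    thsad2 = thsad * static_thsad2 / static_thsad
    return thsad, thsad2


# Three 'normal' blocks and one block above thSCD1 that is ignored.
result = auto_thsad([100, 120, 140, 900], thscd1=450, a=1.0, b=10,
                    static_thsad=200, static_thsad2=160)
```

With a too-low thSCD1 the usable list shrinks or becomes empty, which is the poor estimate / static fallback case mentioned above.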
Old 1st August 2023, 00:09   #237  |  Link
anton_foy
Registered User
 
Join Date: Dec 2005
Location: Sweden
Posts: 702
So cool! Does this new feature slow things down a lot?

Edit: btw, I tried to make your version correspond visually to pinterf's latest version of mvtools2, but yours with optSearchOption=5 and intOvlp=3 gave less denoising and less temporal stability even if I turned up thsad. I will post the full script comparisons later today if I can (Clay with fast=true, since your version does not have MDegrain2 now). Also, I did not get any speed improvement, which I guess is because of my old GPU.

Last edited by anton_foy; 1st August 2023 at 08:13.
Old 1st August 2023, 09:22   #238  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
"Does this new feature slow things down alot?"

I did not run performance tests. But it is expected to be very fast and not to cause a visible performance hit. If a hit is visible, performance may be improved in later versions by pre-calculating the tr-weights. Currently the tr-weights roll-off for each frame (defined by the thSAD/thSAD2 difference) is calculated using the float cos() function.
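
The pre-calculation idea can be sketched like this (Python, illustration only). I assume a raised-cosine roll-off from thSAD at the nearest reference frame to thSAD2 at distance tr; the exact curve used inside MDegrainN is not given in this thread, and rolloff_weights / per_frame_thsads are hypothetical names. The point is only that the cos() part depends on tr alone, so it can be computed once and reused while Auto-thSAD changes thSAD on every frame.

```python
import math


def rolloff_weights(tr):
    """Cosine weights per temporal distance 1..tr: 1.0 at distance 1, 0.0 at distance tr.

    Assumed raised-cosine shape; computed once, since it depends only on tr.
    """
    weights = []
    for d in range(1, tr + 1):
        t = 0.0 if tr == 1 else (d - 1) / (tr - 1)
        weights.append(0.5 * (1.0 + math.cos(math.pi * t)))
    return weights


def per_frame_thsads(weights, thsad, thsad2):
    """Blend thSAD (near frames) into thSAD2 (far frames) without calling cos() again."""
    return [thsad2 + (thsad - thsad2) * w for w in weights]


weights = rolloff_weights(3)                    # once per filter instance
frame_a = per_frame_thsads(weights, 250, 200)   # per frame, e.g. with Auto-thSAD values
frame_b = per_frame_thsads(weights, 130, 104)
```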

" optSearchOption=5 and intOvlp=3 gave less denoising and less temporal stability"

In my tests the quality of ME with the GTX1060 card is somewhat worse than with onCPU MAnalyse. But it is acceptable for denoising documentary series, where offloading part of the work from the CPU makes the total mvtools+x264 encoding run faster. For highest-quality denoising only onCPU MAnalyse is recommended (optSearchOption != 5/6).

Hardware ME from the MPEG encoder ASIC is not simply a hardware-accelerated MAnalyse but a completely different ME engine, possibly optimized for faster MPEG encoding rather than for quality. Also, each hardware version and each vendor (NVIDIA/AMD/Intel/others ?) may provide different quality and performance.

Maybe hardware ME can be used as the first generation of MAnalyse to make multi-generation ME refining faster.

My current test script for 2-generation MV refining with Auto-thSAD:
Code:
# Input plugins
LoadPlugin("ffms2.dll")
LoadPlugin("mvtools2.dll")

SetFilterMTMode("DEFAULT_MT_MODE", 3)

my_thSADA_a=1.1
my_thSADA_b=50

my_thSAD=250
my_thSAD2=Int(Float(my_thSAD) * 0.8)

my_thSAD_mg=150
my_thSAD2_mg=Int(Float(my_thSAD_mg) * 0.8)

my_thSCD=my_thSAD+200

my_pzero=10
my_pnew=10
my_pglobal=10

my_pel=2
my_thCohMV=5 # 5..8 for pel=2, 10..16 for pel=4 ?
my_trymany=false

my_oPT=1
my_overlap=0
my_IntOvlp=3
my_searchparam=2

my_MPBNumIt=2

my_init_tr=6
my_refine_tr=6

Function RefineMV(clip mvclip, clip super_ref, clip src, int _thSAD, int _thSAD2, int in_tr, int refine_tr, int my_thSCD, int my_pel, bool my_trymany, int my_pnew, int my_pzero, int my_pglobal, \
int my_oPT, int my_overlap, int my_searchparam, int my_IntOvlp, int my_thCohMV)
{
 g_next=MDegrainN(src, super_ref, mvclip, in_tr, thSAD=_thSAD, thSAD2=_thSAD2, mt=false, wpow=4, thSCD1=my_thSCD, adjSADzeromv=0.6, adjSADcohmv=0.6, thCohMV=my_thCohMV, \
MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_IntOvlp)
 super_g_next=MSuper(g_next,chroma=true, mt=false, pel=my_pel)
 return MAnalyse(super_g_next, SuperCurrent=super_ref, multi=true, delta=refine_tr, search=3, searchparam=my_searchparam, trymany=my_trymany, overlap=my_overlap, chroma=true, mt=false,\
 optSearchOption=1, truemotion=false, pnew=my_pnew, pzero=my_pzero, pglobal=my_pglobal, global=true, optPredictorType=my_oPT)
}

Function RefineMVa(clip mvclip, clip super_ref, clip src, int _thSAD, int _thSAD2, float _thSADA_a, float _thSADA_b, int in_tr, int refine_tr, int my_thSCD, int my_pel, bool my_trymany, int my_pnew, int my_pzero, int my_pglobal, \
int my_oPT, int my_overlap, int my_searchparam, int my_IntOvlp, int my_thCohMV)
{
 g_next=MDegrainN(src, super_ref, mvclip, in_tr, thSAD=_thSAD, thSAD2=_thSAD2, thSADA_a=_thSADA_a, thSADA_b=_thSADA_b, mt=false, wpow=4, thSCD1=my_thSCD, adjSADzeromv=0.6, adjSADcohmv=0.6, thCohMV=my_thCohMV, \
MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_IntOvlp)
 super_g_next=MSuper(g_next,chroma=true, mt=false, pel=my_pel)
 return MAnalyse(super_g_next, SuperCurrent=super_ref, multi=true, delta=refine_tr, search=3, searchparam=my_searchparam, trymany=my_trymany, overlap=my_overlap, chroma=true, mt=false,\
 optSearchOption=1, truemotion=false, pnew=my_pnew, pzero=my_pzero, pglobal=my_pglobal, global=true, optPredictorType=my_oPT)
}

FFmpegSource2("src.mp4")

noproc=last

super_cpu=MSuper(last, mt=false, chroma=true, pel=my_pel, hpad=8, vpad=8, levels=0, pelrefine=true)

multi_vec_cpu=MAnalyse(super_cpu, multi=true, delta=my_init_tr, search=3, searchparam=my_searchparam, trymany=my_trymany, overlap=my_overlap, chroma=true, mt=false, \
optSearchOption=1, truemotion=false, pnew=my_pnew, pzero=my_pzero, pglobal=my_pglobal, global=true, optPredictorType=my_oPT)

multi_vec_cpu2=RefineMV(multi_vec_cpu, super_cpu, last, my_thSAD, my_thSAD2, my_init_tr, my_refine_tr, my_thSCD, my_pel, my_trymany, my_pnew, my_pzero, my_pglobal, my_oPT, \
my_overlap, my_searchparam, my_IntOvlp, my_thCohMV)

multi_vec_cpu2a=RefineMVa(multi_vec_cpu, super_cpu, last, my_thSAD, my_thSAD2, my_thSADA_a, my_thSADA_b, my_init_tr, my_refine_tr, my_thSCD, my_pel, my_trymany, my_pnew, my_pzero, my_pglobal, my_oPT, \
my_overlap, my_searchparam, my_IntOvlp, my_thCohMV)

cpu2=MDegrainN(last,super_cpu, multi_vec_cpu2, my_refine_tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, mt=false, wpow=4, UseSubShift=1, thSCD1=my_thSCD, adjSADzeromv=0.7, \
adjSADcohmv=0.7, thCohMV=my_thCohMV, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_IntOvlp, MPBthSub=5, MPBthAdd=20, MPBNumIt=my_MPBNumIt, \
MPB_SPCsub=0.5, MPB_SPCadd=1.5, MPBthIVS=2200, showIVSmask=false)

cpu2a=MDegrainN(last,super_cpu, multi_vec_cpu2a, my_refine_tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, thSADA_a=my_thSADA_a, thSADA_b=my_thSADA_b, mt=false, wpow=4, UseSubShift=1, thSCD1=my_thSCD, adjSADzeromv=0.7, \
adjSADcohmv=0.7, thCohMV=my_thCohMV, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_IntOvlp, MPBthSub=5, MPBthAdd=20, MPBNumIt=my_MPBNumIt, \
MPB_SPCsub=0.5, MPB_SPCadd=1.5, MPBthIVS=2200, showIVSmask=false)

cpu2a_s=MDegrainN(last,super_cpu, multi_vec_cpu2a, my_refine_tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, thSADA_a=my_thSADA_a, thSADA_b=my_thSADA_b, mt=false, wpow=4, UseSubShift=1, thSCD1=my_thSCD, adjSADzeromv=0.7, \
adjSADcohmv=0.7, thCohMV=my_thCohMV, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_IntOvlp)

cpu_s=MDegrainN(last,super_cpu, multi_vec_cpu, my_init_tr, thSAD=my_thSAD, thSAD2=my_thSAD2, mt=false, thSCD1=my_thSCD, IntOvlp=my_IntOvlp)

Interleave(noproc.Subtitle("src"),cpu2.Subtitle("cpu2"), cpu2a_s.Subtitle("cpu2a_s"), cpu_s.Subtitle("cpu_s"))

Prefetch(..)
The interleaved output frames are:
src - input source
cpu2 - 2 generations of MV refining with MPB and static thSAD
cpu2a_s - 2 generations of MV refining without MPB, with Auto-thSAD at all generations
cpu_s - single-generation MAnalyse and MDegrain with static thSAD (closest to the 2.7.45 version, only interpolated overlap is used for better performance).

The settings for MAnalyse in the script are not the best possible for quality - I set them lower for better performance on my old E7500 test CPU. Better quality is expected with
my_pel=4
my_thCohMV=12 # 10..16 for pel=4 ?
my_trymany=true

my_oPT=0 # all predictors used
my_overlap=4 # full 4x real search overlap - slowest
my_IntOvlp=0
my_searchparam=2 # better expected with >2 and also pelsearch > 4 (for pel=4)

MPB processing in MDegrainN still does not seem to make things visibly better (at least on my grainy test footage), so it may currently be disabled for slightly better performance. 2-generation MV refining sometimes also reduces search errors at object borders and in dark parts of scenes. Using Auto-thSAD (together with the earlier SAD-related tweaks for static, moving and 'coherently moving' blocks, adjSADzeromv and adjSADcohmv) keeps more detail in some areas such as moving parts, with lower denoising in those areas.

Last edited by DTL; 1st August 2023 at 09:49.
Old 3rd August 2023, 10:27   #239  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,542
I hope to see all those news in SMDegrain soon
__________________
@turment on Telegram
Old 7th August 2023, 11:40   #240  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
It is not likely to be soon, not until we get good programmers to fix the bugs already added. As I found when trying to enable internal MT with avstp.dll, both MAnalyse and MDegrain crash with something like memory corruption. It only works stably with AVS+ global frame-based MT. So it looks like compatibility with internal MT via AVSTP is severely broken. Internal MT in MDegrainN is highly recommended if you use the IIR-based additional temporal filtering (it only works well in MT_SERIALIZED). So I think Dogway would not like to use such unstable versions.

About using hardware ME with very slow filtering: it really helps a lot in 2-generation MV refining. Two MAnalyse calls with 'very' slow settings like pel=4, tr=12, trymany=true are close to not running at all. With DX12-ME from the GTX1060 in the first MAnalyse, total transcoding runs at about 0.25 fps with an i5-9600K CPU.

Current practical script with 'best quality' settings is:
Code:
# Input plugins
LoadPlugin("ffms2.dll")
LoadPlugin("mvtools2.dll")

SetMemoryMax(10000)

my_thSADA_a=1.3
my_thSADA_b=80

my_thSAD=250
my_thSAD2=Int(Float(my_thSAD) * 0.8)

my_thSAD_mg=150
my_thSAD2_mg=Int(Float(my_thSAD_mg) * 0.8)

my_thSCD=my_thSAD+200

my_pzero=10
my_pnew=10
my_pglobal=10

my_pel=4
my_thCohMV=4 # 5..8 for pel=2, 10..16 for pel=4 ?
my_trymany=true

my_oPT=0
my_overlap=4
my_IntOvlp=0
my_searchparam=4
my_pelsearchparam=4

my_MPBNumIt=2

my_init_tr=12
my_refine_tr=12

my_MT=false

Function RefineMVa(clip mvclip, clip super_hwa, clip super_ref, clip src, int _thSAD, int _thSAD2, float _thSADA_a, float _thSADA_b, int in_tr, int refine_tr, int my_thSCD, int my_pel, bool my_trymany, int my_pnew, int my_pzero, int my_pglobal, \
int my_oPT, int my_overlap, int my_searchparam, int _pelsearchparam, int my_IntOvlp, int my_thCohMV, bool _my_MT)
{
 g_next=MDegrainN(src, super_hwa, mvclip, in_tr, thSAD=_thSAD, thSAD2=_thSAD2, thSADA_a=_thSADA_a, thSADA_b=_thSADA_b, mt=_my_MT, wpow=4, UseSubShift=1, thSCD1=my_thSCD, adjSADzeromv=0.6, adjSADcohmv=0.6, thCohMV=my_thCohMV, \
MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=3)
 super_g_next=MSuper(g_next,chroma=true, mt=_my_MT, pel=my_pel)
 return MAnalyse(super_g_next, SuperCurrent=super_ref, multi=true, delta=refine_tr, search=3, searchparam=my_searchparam, pelsearch=_pelsearchparam, trymany=my_trymany, overlap=my_overlap, chroma=true, mt=false,\
 optSearchOption=1, truemotion=false, pnew=my_pnew, pzero=my_pzero, pglobal=my_pglobal, global=true, optPredictorType=my_oPT)
}

FFmpegSource2("src.mp4")

super_cpu=MSuper(last, mt=my_MT, chroma=true, pel=my_pel, hpad=8, vpad=8, levels=0, pelrefine=true)
super_hwa=MSuper(last, mt=my_MT, chroma=true, pel=4, hpad=8, vpad=8, levels=1, pelrefine=false)

multi_vec_cpu=MAnalyse(super_cpu, multi=true, delta=my_init_tr, search=3, searchparam=my_searchparam, trymany=my_trymany, overlap=my_overlap, chroma=true, mt=false, \
optSearchOption=1, truemotion=false, pnew=my_pnew, pzero=my_pzero, pglobal=my_pglobal, global=true, optPredictorType=my_oPT)

multi_vec_hwa=MAnalyse(super_hwa, multi=true, delta=my_init_tr, chroma=true, mt=false, \
optSearchOption=5, levels=1)

multi_vec_cpu2a=RefineMVa(multi_vec_hwa, super_hwa, super_cpu, last, my_thSAD, my_thSAD2, my_thSADA_a, my_thSADA_b, my_init_tr, my_refine_tr, my_thSCD, my_pel, my_trymany, my_pnew, my_pzero, my_pglobal, my_oPT, \
my_overlap, my_searchparam, my_pelsearchparam, my_IntOvlp, my_thCohMV, my_MT)

MDegrainN(last,super_cpu, multi_vec_cpu2a, my_refine_tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, thSADA_a=my_thSADA_a, thSADA_b=my_thSADA_b, mt=my_MT, wpow=4, UseSubShift=1, thSCD1=my_thSCD, adjSADzeromv=0.7, \
adjSADcohmv=0.7, thCohMV=my_thCohMV, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_IntOvlp)

Prefetch(..)
But as I see, at a total transcoding speed of about 0.25 fps with FullHD frames it uses only about 1..2% of the hardware encoder. So most of the time is apparently spent on the CPU in the second MAnalyse with its slow best-quality settings.

So I got a new idea for better MV quality using the still-free resources of the hardware accelerator. The hardware ME accelerator is typically not capable of overlapped processing in mvtools order within a single search job. To use its spare capacity, send it several frames shifted by small steps, so that the search runs with slightly different block assignments (like +-1 sample for 4:4:4 formats and +-2 samples for 4:2:0 formats). This generates 4 or 8 additional MVs around the 'current' block position, and some averaging of these 5 or 9 MVs may give a more noise-free MV for the current block. Averaging modes may be arithmetic mean or median (or other non-linear filtering of the data as a 1D vector or even a 2D array).

To make it usable with any MAnalyse mode and any other filter that consumes MV data, it should finally be a separate mvtools filter like MVProc() with 5 or 9 possible inputs from several MAnalyse calls (or, in the future, 1 input from a single MAnalyse in a special multi-mode). MVLPF (and other possible future intermediate processing of MV data) could also be moved into this filter, so it can be used with any MV-consuming filter in complex scripts and its output can be split to different filters using AVS scripting - for example as an additional predictor for multi-generation search scripts (see feature 48 also).

The number of search positions around the current block may be increased up to filling all possible integer block positions. Subsample-shifted positions could be added too (to fill a radius of 0.5..0.25 to 1.25 and more around the current block position).

Expected new features script is like:
Code:
#current block pos
super=MSuper(last)
current_mvclip=MAnalyse(super,..)

#shifted 2 samples up block pos (crop 2 rows at the top, pad 2 rows at the bottom to keep the frame size)
sh2_up=Crop(0,2,last.width, last.height-2).AddBorders(0,0,0,2)
super_sh2up=MSuper(sh2_up)
sh2_up_mvclip=MAnalyse(super_sh2up,..)

# same here for shifting 2 samples left, down, right

# combine 5 MV clips from current and shifted blocks assignment
mvclip=MVProc(current_mvclip, sh2_up_mvclip, sh2_down_mvclip, sh2_left_mvclip, sh2_right_mvclip, average_mode="median",.., optional MVLPF and other)

MDegrainN(last, super, mvclip,...)
So it still requires some development time to check this idea of MV refining in the typical difficult places like low-contrast and heavily noisy areas, where the noise is close to or above the amplitude of the texture details, so a simple MV search typically fails badly and causes detail blurring with MDegrain.
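
To show what the averaging step of the proposed MVProc() might do for a single block, here is a small sketch (Python, not mvtools code). MVProc does not exist yet, so everything here is an assumption: combine_mvs is a hypothetical helper, MVs are plain (dx, dy) pairs, and the median is taken per component (a true vector median would be another option).

```python
from statistics import median


def combine_mvs(mvs, mode="median"):
    """Combine the MVs found for one block from the current and shifted searches.

    mvs  - list of (dx, dy) pairs, e.g. 5 or 9 candidates
    mode - "median" (per component) or "mean" (arithmetic mean)
    """
    xs = [mv[0] for mv in mvs]
    ys = [mv[1] for mv in mvs]
    if mode == "median":
        return (median(xs), median(ys))
    if mode == "mean":
        return (sum(xs) / len(xs), sum(ys) / len(ys))
    raise ValueError("unknown mode: " + mode)


# Current position plus 4 shifted searches; one search failed on noise.
candidates = [(4, -2), (5, -2), (4, -1), (40, 17), (4, -2)]
mv_median = combine_mvs(candidates, "median")
mv_mean = combine_mvs(candidates, "mean")
```

In this made-up example the median ignores the single failed search, while the mean is pulled far away by it, so for noisy areas a median (or other non-linear mode) looks like the safer default.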