MVTools-pfmod - Page 36

DTL · 14th October 2021, 20:56

A first, possibly working, test build made with VS2017: https://drive.google.com/file/d/15v6...ew?usp=sharing . It implements the new AVX2 exhaustive search.
It supports only chroma=false with searchparam=2 (the default?) and 4. Exhaustive search at all levels needs search=3, but one of the levels always uses esa (radius=2), so the new code will probably be used at least partly with most search methods. Only the default 8x8 block size is supported.
Tested only in the Intel SDE AVX emulator with a debug build (I still have no AVX-capable chip at home), so I cannot measure performance. I will have access to an AVX-capable CPU only at work, a few days a month.
MDegrainN triggers an assert (I do not know why; maybe VS2017 is also not fully compatible with the mvtools project?), so I tested with MDegrain2() in the debug build.

Unfortunately the new functions still have to check the input vector for validity. Calling the search with invalid vectors needs to be fixed in the future, because these otherwise unnecessary checks may slow processing. If the check is disabled, the search runs outside the buffer addresses and crashes.

The vectors displayed by MShow differ from those of the old search (expanding search), but degraining appears to work without visible blurring, so the vectors do not look too bad.

DTL · 15th October 2021, 15:25

Design ideas for SAD processing of Y+PseudoHueSat (PHS) 8+8-bit samples, in 1 pass instead of the (up to) 3 passes typical for 8+8+8-bit YUV:

1. Hue is naturally a 2D (circular) quantity, so unrolling it to 1D creates a discontinuity point. Unfortunately the CPU cannot compute SAD separately on the 4+4 low and high bit groups of an 8-bit byte (or I do not yet know of any maths that makes it possible). The discontinuity plus natural noise will cause large changes in the PHS value near one colour tone, so blocks of that colour tone will be treated as non-equal and will not be denoised. We can only place the discontinuity at a sufficiently rare colour tone (I think close to magenta). A simple 4-quadrant adjustment of the colour tone where the discontinuity falls is to feed the PHS() calculation function with +-U and/or +-V values.

2. The Y+PHS coded plane can be processed in 1 pass with AVX2 SIMD SAD, the same as (or close to) a 16x8 block of 8-bit samples.

3. A simple enough calculation of the PHS value from UV (8-bit unsigned, centred on code level 127 for the green colour tone and/or zero saturation):
(DiamondAngle(U,V)-2)*Sat(U,V)*some_norm+127
where DiamondAngle() is:

Code:

float DiamondAngle(float y, float x)
{
	if (y >= 0)
		return (x >= 0 ? y / (x + y) : 1 - x / (-x + y));
	else
		return (x < 0 ? 2 - y / (-x - y) : 3 + x / (x - y));
} // x and y are signed (e.g. -1..1); result is in the 0..4 range. x == y == 0 gives 0/0 and must be handled by the caller.

and Sat(U,V) is about ((abs(U)+abs(V)) >> some_norm_value), a sort of saturation.
It can be computed with SIMD at runtime or read from a simple LUT of 8=f(8,8)-bit values (64 kBytes for a full 256x256 table; 16 kBytes, which fits in L1 cache, if the inputs are reduced to 7+7 bits). Both ways are worth testing.
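The three points above can be sketched in scalar C++ as follows. This is only an illustration under stated assumptions, not code from any mvtools build: the names PseudoHueSat and BuildPhsLut are hypothetical, chroma is assumed to be centred on code 128, the scale kNorm is chosen only so that the result stays inside 0..255, and the LUT index layout (U in the high byte) is arbitrary. A guard for U = V = neutral is added because the diamond angle is 0/0 there.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Diamond angle of the direction (x, y): a monotonic stand-in for atan2
// that needs no trigonometry. Result is in [0, 4); (0, 0) is mapped to 0.
inline float DiamondAngle(float y, float x)
{
    if (x == 0.0f && y == 0.0f)
        return 0.0f; // neutral sample: avoid 0/0
    if (y >= 0)
        return (x >= 0 ? y / (x + y) : 1 - x / (-x + y));
    return (x < 0 ? 2 - y / (-x - y) : 3 + x / (x - y));
}

// Hypothetical PHS mapping for 8-bit chroma codes centred on 128 (assumption).
inline uint8_t PseudoHueSat(uint8_t u, uint8_t v)
{
    const int cu = int(u) - 128;
    const int cv = int(v) - 128;
    const float sat = float(std::abs(cu) + std::abs(cv)); // 0..256
    const float kNorm = 127.0f / (2.0f * 256.0f);         // assumed scale
    const float phs =
        (DiamondAngle(float(cu), float(cv)) - 2.0f) * sat * kNorm + 127.0f;
    return uint8_t(std::clamp(std::lround(phs), 0L, 255L));
}

// Full 8=f(8,8) table: 256 * 256 one-byte entries, indexed by (u << 8) | v.
inline std::vector<uint8_t> BuildPhsLut()
{
    std::vector<uint8_t> lut(256 * 256);
    for (int u = 0; u < 256; ++u)
        for (int v = 0; v < 256; ++v)
            lut[(u << 8) | v] = PseudoHueSat(uint8_t(u), uint8_t(v));
    return lut;
}
```

With a full 8+8-bit index the table holds 65536 bytes; a 16 kB table corresponds to a 7+7-bit index (16384 entries). Negating cu and/or cv before the DiamondAngle() call moves the discontinuity to another quadrant, which is the 4-quadrant adjustment mentioned in point 1.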

DTL · 16th October 2021, 12:59

A working release for testing: https://github.com/DTL2020/mvtools/r...s/tag/2.7.45-1 .
Because there are no user controls yet, it comes as several builds:
mvtools2.dll - standard PseudoEPZSearch; Esa search with searchparam 2 and 4 and chroma=false uses the new AVX2 SAD.
mvtools2_ml1.dll - maxlevels limited to 1, searchparam only 2.
mvtools2_ml2.dll - maxlevels limited to 2.
mvtools2_glob_med_pred_ml2.dll - maxlevels limited to 2, only global and medium MV predictors, medium speed and quality.
mvtools2_no_pred_ml1.dll - maxlevels limited to 1, no predictors, fastest speed, lowest motion search quality.

Test script (1080i source):

Code:

SeparateFields()
tr=12
super=MSuper(last,chroma=true)
multi_vec=MAnalyse (super, multi=true, delta=tr, search=3, searchparam=2, overlap=2, chroma=false)
MDegrainN(last,super, multi_vec, tr, thSAD=300, thSAD2=200)
Weave()

CPU i5-9600K
AVSMeter results (fps):
release-2.7.45: 7.2
mvtools2.dll: 9.53
mvtools2_glob_med_pred_ml2.dll: 11.46
mvtools2_ml1.dll: 11.31
mvtools2_ml2.dll: 10.03
mvtools2_no_pred_ml1.dll: 18.35

Average data rate of a test encode with x264, in kbit/s (without MDegrainN: 24252):
release-2.7.45: 7866
mvtools2.dll: 7812
mvtools2_glob_med_pred_ml2.dll: 8464
mvtools2_ml1.dll: 8376
mvtools2_ml2.dll: 7844
mvtools2_no_pred_ml1.dll: 9342

So the 'compression ratio' with the best (slowest) motion search is about 3.08 and with the worst (fastest) about 2.6, with a difference in MAnalyse speed of about 2 times.
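The arithmetic behind these figures can be reproduced with a few lines of C++. The helper name Ratio and the constant names are made up for this sketch; the numbers are copied from the two tables above.

```cpp
#include <cassert>

// Generic ratio helper, used for both the bitrate and the fps figures.
inline double Ratio(double numerator, double denominator)
{
    assert(denominator > 0.0);
    return numerator / denominator;
}

// Figures copied from the AVSMeter and x264 tables in this post.
constexpr double kNoDegrainKbps = 24252.0; // encode without MDegrainN
constexpr double kReleaseKbps   = 7866.0;  // release-2.7.45
constexpr double kNoPredKbps    = 9342.0;  // mvtools2_no_pred_ml1.dll
constexpr double kReleaseFps    = 7.2;     // release-2.7.45
constexpr double kNoPredFps     = 18.35;   // mvtools2_no_pred_ml1.dll
```

Ratio(kNoDegrainKbps, kReleaseKbps) evaluates to about 3.08 and Ratio(kNoDegrainKbps, kNoPredKbps) to about 2.60. Ratio(kNoPredFps, kReleaseFps) is about 2.5, but that is the fps ratio of the whole script, not of MAnalyse alone.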

So it really is heavily limited by memory access for the SADs, both in the Refine() search and in predictor testing. An attempt to add Prefetch* into FetchPredictors() did not help; the data really needs to be reorganised in memory for better access with the pattern defined by the predictors.

A pull request with the current version has been created. The current idea is to add new user controls to MAnalyse, such as 'predictors=(all, partial, none)' and 'levels=(1,2,all/auto)', to control the quality/speed ratio.

tormento · 16th October 2021, 14:44

Quote:

Originally Posted by DTL

So the 'compression ratio' with the best (slowest) motion search is about 3.08 and with the worst (fastest) about 2.6, with a difference in MAnalyse speed of about 2 times.

You could run some synthetic benchmark such as SSIM or PSNR. Not always less is better, IMHO.

DTL · 23rd October 2021, 09:46

Is the project compatible with the Intel C++ compiler (integrated into Visual Studio)?

I see some include defines for the Intel compiler. From the command line I can also compile and build a .lib with Intel C++ 19.1 (from Parallel Studio XE 2020) and link it with the rest of the build made in Visual Studio 2019 (excluding that .cpp from the VS2019 build).

But when I try to switch 'Platform Toolset' to the Intel compiler in Visual Studio (even only for selected .cpp files), something strange happens: the Visual Studio project breaks, the project (mvtools) property pages can no longer be opened, and it cannot be built. The same happens with Visual Studio 2017 and 2019 (tried 16.3 and 16.8). The only fix is to restore the VS project files (.sln, .vcxproj) from the old state.
Maybe the VS project files are not compatible with the Visual Studio IDE using the Intel C++ build tools? Or does a special command-line batch file exist to build mvtools.dll from the Intel C++ command-line environment?

I want to try an Intel C++ build to test its speed with the many available hardware-specific optimisations.

Quote:

Originally Posted by tormento

You could run some synthetic benchmark such as SSIM or PSNR. Not always less is better, IMHO.

It would need to be very synthetic, because with degraining we do not have a 'clean' source to compare with. So the only way is to take a clean enough source and add synthetic noise.

There are 2 possible types of motion search error for MDegrain:

1. A block is marked as equal (SAD below threshold) by mistake.
2. The block search misses (no better position is found compared with the initial one).

Type 1 errors cause blurring in motion areas and lower the output MPEG bitrate. It is the worst error because it is typically clearly visible.
Type 2 errors only decrease the degrain ratio (typically in motion areas), which is less visible.

kedautinh12 · 23rd October 2021, 10:12

I get an error message when using the latest version of MVTools2 with TemporalDegrain2Mod

Code:

Avisynth script error: 
Evaaluate: Unhandled C++ exception! 
(C:/Program Files (x86)/Avisynth+/plugin64+/TemparalDegrain-v2.3.1MOD.avsi, line 276)

After going back to the old version 2.7.44 the error was gone.
Location of the error:
https://github.com/kedaitinh12/AVSPl...1MOD.avsi#L276

My script:

Code:

TemporalDegrain2(limitFFT=2, postFFT=6)

pinterf · 23rd October 2021, 18:25

Hi kedautinh12, I'd need the clip width and height and video format (e.g. YV12?)

tormento · 23rd October 2021, 19:52

Quote:

Originally Posted by DTL

It would need to be very synthetic, because with degraining we do not have a 'clean' source to compare with. So the only way is to take a clean enough source and add synthetic noise.

I meant comparing the previous version of MVTools with yours.

kedautinh12 · 23rd October 2021, 21:07

Quote:

Originally Posted by pinterf

Hi kedautinh12, I'd need the clip width and height and video format (e.g. YV12?)

Here:
https://drive.google.com/file/d/1Jrf...w?usp=drivesdk

pinterf · 23rd October 2021, 21:18

Quote:

Originally Posted by kedautinh12

Here: ...

Thanks for the sample clip, so far it works nicely with 2.7.45 at 0.61 fps (no Prefetch)

I've got no NVidia card on developer machine so I'm using

Code:

TemporalDegrain2(limitFFT=2, postFFT=5) # BM3DCPU

instead.

arnea · 23rd October 2021, 22:00

Quote:

Originally Posted by arnea

I need to use MDepan with super clip created with negative delta (i.e. I want to stabilize clip globally in relation to one specific frame). However MDepan does not support this at the moment. There is a check in constructor that throws error when mvclip.nDeltaFrame is not 1.

...

Or are there any other stabilizers that could do this?

I've made some progress on this. wonkey_donkey has a plugin in development that did quite a good job on the test clip that I provided. However, the plugin is not ready yet.

I tried to change the MDepan plugin and removed the check for negative delta. It worked, Depan plugin stabilized the video, but there was still too much movement across the clip.

I then decided to implement my own ideas about using sprocket holes to match the frames globally. It took some time, but it's ready now: https://github.com/arnean/PerfPan Not very efficient implementation, but it worked on the single clip I have at the moment. I will scan more films and see how it behaves.

kedautinh12 · 26th October 2021, 15:36

Quote:

Originally Posted by pinterf

Thanks for the sample clip, so far it works nicely with 2.7.45 at 0.61 fps (no Prefetch)

I've got no NVidia card on developer machine so I'm using

Code:

TemporalDegrain2(limitFFT=2, postFFT=5) # BM3DCPU

instead.

I checked again and found that my NVIDIA driver was the standard one, not DCH. I installed the DCH driver and it works correctly.

DTL · 27th October 2021, 11:05

A sort of *testers needed* request:

For the test release https://github.com/DTL2020/mvtools/r....7.46-pre.a.01 (also copied to Google Drive, because the repository is frequently removed to fork the new current version: https://drive.google.com/file/d/1Qm5...ew?usp=sharing ).

Recommended testscript:

Code:

tr = 8 # Temporal radius - needs testing with low (2..3) and high (10+) values
super = MSuper ()
multi_vec = MAnalyse (super, multi=true, delta=tr,chroma=false,mt=false, levels=2)
MDegrainN (super, multi_vec, tr, thSAD=400, thSAD2=400-1)

2 test cases:
1. Early skip of blocks with zero weight. It adds 2 zero checks, which can decrease speed, but when the weight is zero it skips fetching the ref block from memory and skips its processing, which increases speed. The total effect on speed depends on the thSAD value and the noise level in the processed footage. (Zero weights occur when blockSAD > thSAD, and blockSAD increases with noise amplitude.)

Files: mvtools2_ww_es.dll - uses the early zero-weight skip; mvtools2_ww_ns.dll - uses the old processing of all blocks without conditions. Currently the fast zero-weight skip is used only in MDegrainN_sse2, so to test with low tr (2..3) thSAD must differ from thSAD2, to force MDegrainN instead of MDegrain2/3. tr values that need testing: low (about 2..3) and higher (like 10).

2. Modification of the weights of the blocks being averaged. Old/classic MVTools applies additional weighting to blocks below thSAD with the function
f(x)=(thSAD^2-blockSAD^2)/(thSAD^2+blockSAD^2)
It gives blocks with blockSAD close to thSAD(2) low weights (close to zero if blockSAD is slightly below thSAD). That decreases the degrain ratio but may help to prevent more blurring if a 'Type 1' error occurs (the block has SAD below thSAD, yet it is not the same block with noise but a different block: moved, scaled, rotated etc.).
So using the same SAD-based method both for motion compensation and for selecting (and weighting) blocks for averaging is not totally error-free, and may need supplementing with more logic.
So the 'weighting of weights' method looks like a great field for fine-tuning and needs to be an additional user parameter for the MDegrain*() functions.
Currently there are 2 builds for testing:
mvtools2_ww_es.dll - old/classic weighting of weights
mvtools2_eqw_es.dll - equal weighting of all blocks below thSAD(2). It is also a bit faster because no double-precision float division is required.
Using a thSAD2 in MDegrainN that is only a bit lower than thSAD increases the degrain ratio but may cause more blurring (and other bugs) with motion.
Test task: check whether mvtools2_eqw_es.dll produces a better degrain ratio with all other params equal (mostly tr and thSAD(2)), and if it introduces some visual quality degradation, describe it. It possibly applies to all MDegrain* functions because of the MDegrain3.h header file.
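Both test cases can be illustrated with a small scalar model. This is a sketch under assumptions, not the MDegrainN code: the struct RefBlock and the functions ClassicWeight, EqualWeight and DegrainSample are hypothetical names, each block is reduced to a single sample value, and the current block is assumed to have a fixed weight of 1.

```cpp
#include <cassert>
#include <vector>

// Classic weighting: 1 at blockSAD == 0, falling to 0 at blockSAD >= thSAD.
inline double ClassicWeight(double blockSAD, double thSAD)
{
    assert(thSAD > 0.0);
    if (blockSAD >= thSAD)
        return 0.0;
    const double t2 = thSAD * thSAD;
    const double s2 = blockSAD * blockSAD;
    return (t2 - s2) / (t2 + s2);
}

// Equal weighting: every block below thSAD counts fully, no division needed.
inline double EqualWeight(double blockSAD, double thSAD)
{
    return blockSAD < thSAD ? 1.0 : 0.0;
}

// One reference block, reduced to a single sample value plus its SAD.
struct RefBlock { double value; double sad; };

// Weighted average of the current sample with its reference blocks.
// Zero-weight references are skipped before their data is touched
// (the "early skip"); the number of skipped blocks is reported optionally.
template <typename WeightFn>
double DegrainSample(double src, const std::vector<RefBlock>& refs,
                     double thSAD, WeightFn weight, int* skipped = nullptr)
{
    double sum = src;  // current block, assumed weight 1
    double wsum = 1.0;
    for (const RefBlock& r : refs) {
        const double w = weight(r.sad, thSAD);
        if (w == 0.0) {
            if (skipped) ++*skipped;
            continue; // early skip: r.value is never read
        }
        sum += w * r.value;
        wsum += w;
    }
    return sum / wsum;
}
```

With thSAD=400, ClassicWeight(200, 400) is 0.6 and ClassicWeight(399, 400) is about 0.0025, while EqualWeight gives 1 in both cases, so the equal-weight build lets near-threshold blocks contribute fully.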

These builds may also contain a test of OpenMP enabled in MAnalyse(), so it is better to disable avstp mt for MAnalyse(). MAnalyse() also has new params:
optSearchOption=0 (default); if =1 - use the new _avx2 esa search functions for search=3 and searchparam=2,3,4 (3 and 4 are still not debugged in this build), with block size 8x8, chroma=false and nPel=1 (in MSuper).
optPredictorType=0 (default); if =1 - use partial predictors (faster, but fewer vectors found and lower degrain quality); if =2 - use no predictors (faster than 1 but lower quality). To use searchparam > 2, 'levels' must be >1.

One bad use case for the 'classic' weighting of weights:
Natural photon shot noise has a Poisson distribution (approaching Gaussian with practically large numbers of photons per sample), with both low and high deviations from the mean value. This causes both low and high SAD deviations.
When a block with a high SAD deviation is within the tr-scope of the degrain process (but still below thSAD), it gets a low weight relative to most other blocks, so little of its content is included in them. But when processing reaches this block itself, all the other blocks in the tr-scope get low weights, and this block keeps most of its deviation after the weighted averaging. Later, in MPEG compression, this block is treated as a different block and increases the number of bits for the frame.
I think the 1-pass idea of comparing the current block's SAD with the others is not perfect; it may be better to compare the SAD of a 'previously averaged' block (which is close to how the block is expected to look) with the others. But that seems to need multi-pass degraining, with a second SAD search of the '1-pass degrained plane' against the input source. It may be hard to implement in 1-pass scripting without writing the intermediate 1-pass degrained full plane to a temp file.

So the current thSAD value needs to be high enough to give good weights to blocks with large noise-deviated SADs, but low enough to decrease the rate of 'block not equal' errors. I think this contradiction can somehow be resolved (better). With SAD-independent weighting (simple averaging of all tr-scope blocks with SAD below thSAD) we can use a lower thSAD value while still bringing more blocks with high noise-deviated SADs into processing.

Dogway · 1st November 2021, 22:03

I think some MDegrain values are not auto-scaled for 32-bit float; I saw this with limit and limitc, and didn't test more.

Dogway · 13th November 2021, 14:13

2 weeks later but with the same issue, the wiki states that MDegrain and MSuper support 32-bit float, yet all I get is a black screen.

Code:

ConvertBits(32)

super_search = ConvertBits(8).MSuper(rfilter=4)
bv1 = super_search.MAnalyse(isb = true,  delta = 1, overlap= 4)
fv1 = super_search.MAnalyse(isb = false, delta = 1, overlap= 4)
MDegrain1(MSuper(levels=1), bv1, fv1, thSAD=300, thSADC=150)

DTL · 13th November 2021, 14:49

Looks like a bug in the 32-bit codepath. MDegrainN seems to have some support for the 32-bit format, but only in the C reference code (slow). 32-bit PlanarRGB gives the same black screen.

Update - a test build with SIMD-based FetchPredictors(): https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.03 . It seems it was slow because of too many conditional tests and branches in CheckMV() and Median(). Now it is fully redesigned for SSE (4.1 minimum required). The clipping may differ from old versions, so the new code is only called with optSearchOption > 0. So MAnalyse with full predictors should now run faster on SSE4.1+ CPUs with all other option sets, plus an additional speedup for 8x8 blocks, 8-bit, chroma=false, pel=1 on AVX2-capable chips.
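The general idea of replacing branches in a median and a vector clip with min/max operations can be sketched in scalar C++. This is not the code from the test build: the struct MV and the functions Median3 and MedianPredictor are hypothetical names, and the clip range is passed in explicitly. Each std::min/std::max on 32-bit integers has a direct SSE4.1 counterpart (_mm_min_epi32 / _mm_max_epi32), which is one reason such a formulation suits a SIMD rewrite.

```cpp
#include <algorithm>
#include <cassert>

// Motion vector with integer components (illustrative type).
struct MV { int x; int y; };

// Median of three values expressed only with min/max:
// max(min(a, b), min(max(a, b), c)).
inline int Median3(int a, int b, int c)
{
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Component-wise median of three candidate vectors, then clipped into
// the allowed range [lo, hi] per component.
inline MV MedianPredictor(MV a, MV b, MV c, MV lo, MV hi)
{
    assert(lo.x <= hi.x && lo.y <= hi.y);
    MV m{ Median3(a.x, b.x, c.x), Median3(a.y, b.y, c.y) };
    m.x = std::clamp(m.x, lo.x, hi.x);
    m.y = std::clamp(m.y, lo.y, hi.y);
    return m;
}
```

For example, the candidates (10,-3), (4,7) and (6,0) with a clip range of -8..8 give the predictor (6,0).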

DTL · 20th November 2021, 12:03

Quote:

Originally Posted by StainlessS

FranceBB,
In MRecalculate, with square side = 1, does that mean that it searches only 1 pixel side to side, up and down, or NONE.

Not sure if problem, just asking. [EDIT: ie, is your MRecalculate a timewasting NOP]

It looks like a hidden (still undocumented) feature of MAnalyse: it ignores the user's searchparam at the finest (largest) level of the search and always uses searchparam=pel there. So setting it to zero does not disable the search at the finest level.
I think it is an optimisation by the old developers, because otherwise it would be better to let the user feed a vector/array of searchparams, one per search level, to find the best speed/quality balance. Or at least to add another user parameter like searchparamfinest=int or useequalsearchparamatalllevels=bool.
There is still the idea that with a fast enough search engine something like pel=1 sp=2 levels=1 could be faster than pel=1 sp=1 levels=2. But currently the program does not allow sp > pel in the fastest processing with levels=1 and silently falls back to sp=pel.
This is because each level requires a full-frame memory read (though each next level's buffer is 1/4 the size of the previous one). So if memory speed is the final limit, a 1-pass search with levels=1 may be faster than a 2-pass search with levels=2 (1 + 1/4 memory).
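The two points in this post, the fallback to searchparam=pel at the finest level and the 1 + 1/4 memory estimate, can be written down as a small model. The function names are hypothetical and level index 0 is assumed to mean the finest level; the sketch only restates the behaviour described above and is not taken from the MAnalyse source.

```cpp
#include <cassert>

// Search radius actually used at a given level, following the behaviour
// described in the post: the finest level (index 0 here, an assumption)
// ignores the user's searchparam and uses pel instead.
inline int EffectiveSearchParam(int level, int userSearchParam, int pel)
{
    return level == 0 ? pel : userSearchParam;
}

// Relative amount of frame data read over `levels` hierarchy levels,
// in units of the finest-level plane; each coarser level is 1/4 the size.
inline double RelativeMemoryTraffic(int levels)
{
    assert(levels >= 1);
    double total = 0.0;
    double plane = 1.0;
    for (int i = 0; i < levels; ++i) {
        total += plane;
        plane *= 0.25;
    }
    return total;
}
```

RelativeMemoryTraffic(1) is 1.0 and RelativeMemoryTraffic(2) is 1.25, so levels=2 reads 25% more data than levels=1 in this model; the series never exceeds 4/3 however many levels are added.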

takla · 21st November 2021, 17:37

@DTL

With my Ryzen 3900x:

2.7.45 (mvtools-2.7.45-with-depans20210608)
time=70.115s

2.7.46 (Release_2.7.46_pre_a_03 AVX2)
time=63.457s

Using these settings
Except this: ConvertBits(8, dither=1)

Quote:

ffmpeg -benchmark -i TEST.avs -c:v ffv1 TEST.mkv

Very nice speed-up. Filesize was identical too.

DTL · 21st November 2021, 19:44

Thank you for testing and reporting.

There are many new changes since that test build. While fixing optPredictorType=2 (something like an 'only hierarchical predictor' mode) I found a possibly useful method of returning a different SAD value from MAnalyse to MDegrain, and it helps significantly in some cases of noise or noise-like aliasing from an interlaced camera. It returns the SAD not from the finest level of the search but from the previous one.
The currently fixed optPredictorType=2 mode works like this (it is possibly the only possible mode for this limited predictor mode), but works acceptably only with levels=1 or 2. It may produce severe artifacts with levels >2. But it is still the fastest mode, for possibly the lowest quality degrain work. Also, with levels of 2 and more it starts to blur more and more (I still do not know why; maybe too many low SAD values).

But with optPredictorType=1 this 'previous-level SAD return' can be applied only at the finest level (using the hidden, still working feature of nSearchParam==1 at the finest level), which allows optPredictorType=1 to be used with any number of levels (2 is faster than auto/all, like 6). It is also possible in the 'all predictors' mode optPredictorType=0 if required (not added yet).

So the current latest test build is https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.04 . The *sf.dll executables return the special SAD from a non-finest level of the MAnalyse search. It is typically 2 or more times lower than the 'standard' SAD, so it allows lower thSAD(2) values in MDegrain. With too high a thSAD it may cause more artifacts, like visible block edges/corners, so it may require increasing the overlap param (I currently use overlap=2). So it requires more precise thSAD adjustment per given content. If this feature proves useful, it may become an additional user-switch param.

Current valid optSearchOption values:
0 - standard processing equal to 2.7.45 (for compatibility)
1 - partial SIMD optimisations that can be used with all options, like ClipMV and FetchPredictors. Exhaustive search (search type 3) for searchparam 1 and 2 (and 3, 4) is the new AVX2-based one. Though I see that r=3 and r=4 are of little use, because the default r=2 is used for all levels except the finest (slowest), and the finest/slowest level uses the fixed r=1.
2 - special 'preset=fast' for degraining work, with many 'default' params hardcoded for faster execution (fewer condition checks).
Params currently aggregated and fixed for optSearchOption=2:
blksize = blksizeV = required to be 8,
search=3,
searchparam=2,
pel = required to be 1 in MSuper
chroma = false,
outfile = disabled,
dct = disabled,
badSAD = (infinite,disabled)
badrange = (disabled),
temporal = false,
trymany = false,
required CPU opts: SSE, SSE4.1, AVX2.

There may also be a very first 'tech demo' of multi-(4)-block search with optSearchOption=3, but it does not work for degraining at all yet; it is simply a speed test of possibly more SIMD-oriented processing on AVX2-capable chips.

My typical production script is currently about:

Code:

tr=12
super=MSuper(chroma=true, mt=false, pel=1)
multi_vec=MAnalyse (super, multi=true, delta=tr, search=3, searchparam=2, overlap=2, chroma=false, mt=false, optSearchOption=2, optPredictorType=1,levels=4)
MDegrainN(last,super, multi_vec, tr, thSAD=175, thSAD2=160, mt=false,wpow=4)

takla · 22nd November 2021, 01:24

@DTL

Code:

function EZdenoise(clip Input, int "thSAD", int "thSADC", int "TR", int "BLKSize", int "Overlap")
{
thSAD = default(thSAD, 150)
thSADC = default(thSADC, thSAD)
TR = default(TR, 3)
BLKSize = default(BLKSize, 8)
Overlap = default(Overlap, BLKSize/2)

Super = Input.MSuper(pel=1)
Multi_Vector = Super.MAnalyse(Multi=True, Delta=TR, BLKSize=BLKSize, Overlap=Overlap, optSearchOption=1)

Input.MDegrainN(Super, Multi_Vector, TR, thSAD=thSAD, thSAD2=thSAD/2, thSADC=thSADC, thSADC2=thSADC/2)
}

Code:

LWLibavVideoSource("C:\Users\Admin\Documents\01.mkv")
Trim(0, 1440)
ConvertBits(16)
EZdenoise()
ConvertBits(8, dither=1)
Prefetch(12, 48)

Code:

2.7.45
time=70.115s

2.7.46
optSearchOption=0
time=63.457s

2.7.46
optSearchOption=1
time=58.777s

Code:

optSearchOption=0

is ~10% faster than the original

Code:

optSearchOption=1

is ~16% faster than the original

Very nice speedups. Thank you very much!

14th October 2021, 20:56	#701 \| Link
DTL Registered User Join Date: Jul 2018 Posts: 1,075	Some first possibly working testbuild with VS2017 - https://drive.google.com/file/d/15v6...ew?usp=sharing . For new AVX2 exhaustive search way. Only chroma=false and searchparam=2 (default ?) and 4. Full levels esa search is search=3 but one of levels always esa(radius=2) so it possibly will partially works on most of search methods. Block size only default 8x8. Tested only in SDE intel AVX simulator in debug build (still not have AVX-capable chip at my home). So can not measure the performance. Will have access to AVX-capable cpu only at my work a few days in a month. The MDegrainN triggers some assert (do not know why - may be VS2017 not completely compatible with mvtools project also ?) . So tested with MDegrain2() in debug build. Unfortunately the new functions still require sadly check for input vector validity - the calling of search with invalid vectors need to be fixed in the future because it may slow process with non-needed checks. If disabled - it will run-out of buffers addresses and crash. The MShow displays vectors differs from old search (Expanding search) but degrain looks like works without visible blurring so the vectors looks like not very bad. Last edited by DTL; 14th October 2021 at 21:41.

15th October 2021, 15:25	#702 \| Link
DTL Registered User Join Date: Jul 2018 Posts: 1,075	Design ideas about Y+PseudoHueSat 8+8bit samples SAD processing: (instead of YUV 8+8+8 bit typically (up to) 3 passes) 1. Hue looks like naturally 2D essence so if unrolled to 1D will have discontinuity point. Unfortunately CPU can not process SAD of 4+4 low and high packs of bits of 8bit byte. (Or I still do not know if some math possible). This discontinuity + natural noise will cause large changes of PHS value near some colour tone and it will cause this colour tone blocks to treat as non-equal so they will not be denoised. We can only place it to some rare enough colour tone (I think close to Magenta). The simple 4-quadrants adjusting of the colour tone where discontinuity is happens is feeding PHS() calculation function with +-U and/or +-V values. 2. The Y+PHS coded plane may be processed with 1 pass SAD SIMD AVX2 same (close to) as for 16x8 block size of 8bit samples. 3. Simple enough calculation of PHS value from UV: (in 8bit unsigned, center to 127 codelevel for green colour tone and/or zero saturation) (DiamondAngle(U,V)-2)Sat(UV)some_norm+127. where DiamondAngle(x,y): Code: float DiamondAngle(float y, float x) { if (y >= 0) return (x >= 0 ? y / (x + y) : 1 - x / (-x + y)); else return (x < 0 ? 2 - y / (-x - y) : 3 + x / (x - y)); } //x and y in 0..1 range, out is 0..4 range. and Sat(x,y) about ((abs(U)+abs(V)) >> some_norm_value) - sort of saturation. It may be made as SIMD calculation at runtime or may be simple LUT of 8=f(8,8) bit values that is 16 kBytes in size and will fit in L1 cache. Good to test both ways. Last edited by DTL; 15th October 2021 at 15:54.

16th October 2021, 12:59	#703 \| Link
DTL Registered User Join Date: Jul 2018 Posts: 1,075	Some working release for test: https://github.com/DTL2020/mvtools/r...s/tag/2.7.45-1 . Due to no user-controls yet it have many build versions: mvtools2.dll - standard PseudoEPZSearch, Esa search and searchparam 2 and 4 with chroma=false uses new AVX2 SAD. mvtools2_ml1.dll - maxlevels limited to 1, searchparam only 2. mvtools2_ml1.dll - maxlevels limited to 2. mvtools2_glob_med_pred_ml2.dll - maxlevels limited to 2, only global and medium MV predictors, medium speed and quality mvtools2_no_pred_ml1.dll maxlevels limited to 1, no predictors, fastest speed, lowest motion search quality. Test script (1080i source): Code: SeparateFields() tr=12 super=MSuper(last,chroma=true) multi_vec=MAnalyse (super, multi=true, delta=tr, search=3, searchparam=2, overlap=2, chroma=false) MDegrainN(last,super, multi_vec, tr, thSAD=300, thSAD2=200) Weave() CPU i5-9600K AVSMeter results (fps): release-2.7.45: 7.2 mvtools2.dll: 9.53 mvtools2_glob_med_pred_ml2.dll: 11.46 mvtools2_ml1.dll: 11.31 mvtools2_ml2.dll: 10.03 mvtools2_no_pred_ml1.dll: 18.35 Test encoding average datarate with x264 (no MdegrainN : 24252) (kbit/s) release-2.7.45: 7866 mvtools2.dll: 7812 mvtools2_glob_med_pred_ml2.dll: 8464 mvtools2_ml1.dll: 8376 mvtools2_ml2.dll: 7844 mvtools2_no_pred_ml1.dll: 9342 So 'compression ratio' with best motion search (slowest) is about 3.08 and worst (fastest) is about 2.6. With difference in MAnalyse speed about 2 times. So it is really greatly limited with memory-access for SADs for both Refine() search and predictors testing. Attempt to add Prefetch* into FetchPredictors() helps nothing - it really need to reorganize data in memory for better access with predictors-defined pattern. Pull-request with current version created. Current idea is to add new user-input control like 'predictors=(all, partial, none)' and 'levels=(1,2,all/auto) for MAnalyse. To control quality/speed ratio. Last edited by DTL; 16th October 2021 at 13:05.

23rd October 2021, 10:12	#706 \| Link
kedautinh12 Registered User Join Date: Jan 2018 Posts: 2,156	I meet error notice when use last ver MVTool2 with TemporalDegrain2Mod Code: Avisynth script error: Evaaluate: Unhandled C++ exception! (C:/Program Files (x86)/Avisynth+/plugin64+/TemparalDegrain-v2.3.1MOD.avsi, line 276) Turnback to old ver 2.7.44 error was gone Location error https://github.com/kedaitinh12/AVSPl...1MOD.avsi#L276 My script: Code: TemporalDegrain2(limitFFT=2, postFFT=6) Last edited by kedautinh12; 23rd October 2021 at 10:15.

23rd October 2021, 18:25	#707 \| Link
pinterf Registered User Join Date: Jan 2014 Posts: 2,314	Hi kedautinh12, I'd need the clip width and height and video format (e.g. YV12?) __________________ AviSynth+ on github, Other repos: RgTools, Masktools2, MvTools2, TIVTC, Average

27th October 2021, 11:05	#713 \| Link
DTL Registered User Join Date: Jul 2018 Posts: 1,075	Sort of need for testers request: For test-release https://github.com/DTL2020/mvtools/r....7.46-pre.a.01 (also copy to google disk because of frequent repository removing for forking new actual version - https://drive.google.com/file/d/1Qm5...ew?usp=sharing ). Recommended testscript: Code: tr = 8 # Temporal radius - need testing low (2..3 and high values like 10+) super = MSuper () multi_vec = MAnalyse (super, multi=true, delta=tr,chroma=false,mt=false, levels=2) MDegrainN (super, multi_vec, tr, thSAD=400, thSAD2=400-1) 2 test cases: 1. Early skip of processing blocks with zero weight. It adds 2 conditions of zero checks that can decrease speed but if weight is zero it skips fetching from memory ref block and skips its processing that will increase speed. The total influence on speed depends on thSAD value and noise level in the processed footage. (Zero weights occur when blockSAD > thSAD and blockSAD increases with increasing of noise amplitude). Files: mvtools2_ww_es.dll - use early zero weight skip, mvtools2_ww_ns.dll - use old all blocks processing without condiitons. Currently fast zero weight skip only used in MDegrainN_sse2 so to test with low tr (2..3 the thSAD must be different from thSAD2 - to use MDegrainN instead of MDegrain2,3). Required tested tr values - low as about 2..3 and higer like 10. 2. Modification of weights of blocks to average. Old/classic MVtools uses additional weighting of blocks below thSAD with linear function f(x)=(thSAD^2-blockSAD^2)/(thSAD^2+blockSAD^2) It causes the blockSADs close to thSAD(2) to get low weights (close to zero if blockSAD slightly below thSAD). That decreases degrain ratio but may help to prevent more blurring if 'Type1' error occur (block have SAD below thSAD but it is not noised block but different block (moved/scaled/rotated etc). 
So using same sad-wise method for both motion compensating and selecting (and weigthing) blocks for avaraging is not totally error-free and nice. And may need more logical supplementing. So it looks the 'weighting of weights' method is the great field for finetuning and need to be additional user-param for MDegrain() functions. Currently for test is 2 builds: mvtools2_ww_es.dll - old/classic linear weighting of weights mvtools2_eqw_es.dll - equal weighting of all blocks below thSAD(2). It is also a bit faster because no double float division is required. The using of thSAD2 in MDegrainN only a bit lower thSAD increase degrain ratio but may cause more blurring (and other bugs) with motion. Test task: to check if mvtools2_eqw_es.dll produce better degrain ratio with all other equal params (tr mostly and thSAD(2)) and if introduce some visual quality degradation - to describe it. It possibly applies to all MDegrain functions because of MDegrain3.h header file. These builds also may contain test of OpenMP in MAnalyse() enabled to it better to disable avstp mt for MAnalyse(). Also MAnalyse() have new params optSearchOption=0 (default), if =1 - use new _avx2 esa search functions for search=3 and searchparam=2,3,4 (3 and 4 still not debugged in this build), and block size 8x8 and chroma=false and nPel=1 (in MSuper). optPredictorType=0 (default), if =1 - use partial predictors (faster but less vectors found and less degrain quality), if =2 - use no predictors - faster than (1) but less quality. To use searchparam > 2 the 'levels' must be >1. One bad use case for 'classic' weights weighting: The natural photon-shot noise have Poisson distrubution (coming to Gauss with practically large numbers of photons per sample) and it have both low and high deviations from mean value. It cause SAD deviations also low and high. 
When block with high SAD deviation is in tr-scope of degrain process (but still below thSAD) it got low weight to most of other blocks and low include its content to other blocks. But when processing reach this block - all other blocks in tr-scope got low weights and this block keeps most of its deviation after weighted-averaging. And in MPEG-compression later this block treats as different block and increases bit number for frame. I think 1-pass idea of comparing current block SAD with others is not perfect - may be better to compare 'previous-averaged' (that is close to expected block looking) SAD with others. But it looks like need multi-pass degraining. With second SAD-search process of '1-pass-degrained plane' with input source. May be it is hard to implement in 1-pass scripting without intermediate 1-pass degrained full plane output to temp file. So current thSAD value need to be high enough to make good weights for blocks with large noise-deviated SADs but low enough to decrease rate of 'block not equal' errors. I think this contradiction may be somehow (better) resolved. With sad-independent weighting (simple averaging of all tr-scope blocks with sad below trSAD) we can have lower thSAD value with still catching into processing the more blocks with high noise-deviated SADs. Last edited by DTL; 27th October 2021 at 11:47.

Dogway · 1st November 2021, 22:03

I think some MDegrain values are not auto-scaled for 32-bit float; I could see this with limit and limitc, but didn't test more.

Dogway · 13th November 2021, 14:13

Two weeks later but with the same issue: the wiki states that MDegrain and MSuper support 32-bit float, yet all I get is a black screen.

Code:
ConvertBits(32)
super_search = ConvertBits(8).MSuper(rfilter=4)
bv1 = super_search.MAnalyse(isb = true, delta = 1, overlap= 4)
fv1 = super_search.MAnalyse(isb = false, delta = 1, overlap= 4)
MDegrain1(MSuper(levels=1), bv1, fv1, thSAD=300, thSADC=150)

DTL · 13th November 2021, 14:49

Looks like a bug in the 32-bit codepath. It looks like MDegrainN has some support for the 32-bit format, but in the C reference only (slow). PlanarRGB 32-bit gives the same black screen too.

Update - a testbuild with SIMD-based FetchPredictors() - https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.03 . It looks like it was slow because of too many conditional tests and branching in CheckMV() and Median(). Now it is fully redesigned to SSE (4.1 minimum required). The clipping may differ from old versions, so the new code is only called with optSearchOption > 0. So now full-predictors MAnalyse should run faster on SSE4.1+ CPUs with all other possible option sets, plus an additional speedup for blocks 8x8, 8-bit, chroma=false, pel=1 on AVX2-capable chips.

Last edited by DTL; 13th November 2021 at 21:45.
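As a rough illustration of why removing branches from a median and a vector clip matters for SIMD, here is a minimal sketch. It is not the mvtools code: the function names are hypothetical and the real FetchPredictors() may be organized quite differently. The point is only that a median of three and a clamp can be written purely with min/max, which map onto packed min/max instructions. SSE4.1 is the level that added packed signed 32-bit min/max, which is one plausible (unconfirmed) reason for the stated minimum.

```python
def median3_branchy(a, b, c):
    """Median of three using comparisons and early returns (scalar style)."""
    if a > b:
        a, b = b, a          # now a <= b
    if c < a:
        return a
    if c > b:
        return b
    return c


def median3_minmax(a, b, c):
    """Same median using only min/max, so there are no data-dependent branches."""
    return max(min(a, b), min(max(a, b), c))


def clip(v, lo, hi):
    """Clamp v into [lo, hi] with min/max only."""
    return max(lo, min(v, hi))


def median_predictor(mv_a, mv_b, mv_c):
    """Component-wise median of three (x, y) motion vectors."""
    return (median3_minmax(mv_a[0], mv_b[0], mv_c[0]),
            median3_minmax(mv_a[1], mv_b[1], mv_c[1]))


# Hypothetical neighbour vectors and a hypothetical allowed range of [-8, 8].
mx, my = median_predictor((4, -2), (12, 0), (-1, 3))
print(clip(mx, -8, 8), clip(my, -8, 8))
```

Both median functions return the same value for every input; the min/max form simply trades a few extra operations for the absence of conditional jumps, which is what allows several vectors to be processed per instruction.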

DTL · 21st November 2021, 19:44

Thank you for testing and the report. There are many new changes since that test build.

While fixing optPredictorType=2 (something like an 'only hierarchical predictor' mode) I found a possibly useful method of returning a different SAD value from MAnalyse to MDegrain, and it significantly helps in some cases of noise or noise-like aliasing from an interlaced camera. It returns the SAD not from the finest level of search but from the previous one. The currently fixed optPredictorType=2 mode works like this (it is possibly the only possible mode for this limited predictor mode) but works acceptably only with levels=1 or 2. It may produce severe artifacts with levels >2. But it is still the fastest mode, for possibly the lowest-quality degrain work. Also with levels of 2 and more it starts to blur more and more (still do not know why - maybe too many low-SAD values).

In optPredictorType=1 this 'previous-level SAD return' may be controlled for the finest level only (using a hidden, still working feature of nSearchParam==1 at the finest level), so it allows optPredictorType=1 to be used with any number of levels (2 is faster compared with auto/all, like 6). It is also possible in the 'all predictors' mode optPredictorType=0 if required (still not added).

So the current latest testbuild is https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.04 . The sf.dll executables return the special SAD from a non-finest level to MAnalyse. It is typically 2 or more times lower compared with the 'standard' SAD, so it allows lower thSAD(2) values in MDegrain. With too high a thSAD it may cause more artifacts, like visible block edges/corners, so it may require increasing the overlap param (I currently use overlap=2). So it requires more precise thSAD adjustment per given content. If this feature is found useful, it may become an additional user-switch param.

Current optSearchOption valid values:
0 - standard processing equal to 2.7.45 (for compatibility).
1 - partial SIMD optimizations that may be used for all options, like ClipMV and FetchPredictors. Exhaustive search (type=3) for searchparam 1 and 2 (and 3,4) is the new AVX2-based one. Though I see that r=3 and r=4 are of little use, because the default r=2 is used for all levels except the finest (slowest), and the finest/slowest level uses the fixed r=1.
2 - special 'preset=fast' for degraining work, with many 'default' params hardcoded for faster execution (fewer condition checks).

Currently aggregated and fixed params for optSearchOption=2:
blksize = blksizeV = required to be 8,
search=3,
searchparam=2,
pel = required to be 1 in MSuper,
chroma = false,
outfile = disabled,
dct = disabled,
badSAD = (infinite, disabled),
badrange = (disabled),
temporal = false,
trymany = false,
required CPU opts: SSE, SSE4.1, AVX2.

There may also be the very first 'tech demo' of multi-(4)-block search with optSearchOption=3, but it still does not work for degrain at all - it is simply a speed test of possibly more SIMD-oriented processing on AVX2-capable chips.

My typical production script currently is about:

Code:
tr=12
super=MSuper(chroma=true, mt=false, pel=1)
multi_vec=MAnalyse (super, multi=true, delta=tr, search=3, searchparam=2, overlap=2, chroma=false, mt=false, optSearchOption=2, optPredictorType=1,levels=4)
MDegrainN(last,super, multi_vec, tr, thSAD=175, thSAD2=160, mt=false,wpow=4)

Last edited by DTL; 21st November 2021 at 20:00.
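Since optSearchOption=2 hardcodes several settings, a small pre-flight check on the script side can catch a mismatch before a long encode. The sketch below is a hypothetical helper, not part of mvtools: the required values are copied from the list in the post above, while the function name and the dict layout are assumptions made for the example. Whether the plugin silently ignores or rejects conflicting values is not stated in the post.

```python
# Values the post lists as fixed for optSearchOption=2.
# pel belongs to MSuper; the other entries belong to MAnalyse.
FAST_PRESET = {
    "blksize": 8,
    "blksizeV": 8,
    "search": 3,
    "searchparam": 2,
    "pel": 1,
    "chroma": False,
    "temporal": False,
    "trymany": False,
}


def fast_preset_conflicts(params):
    """Return {name: (given, required)} for settings that differ from the preset.

    Settings missing from `params` are skipped, because the preset
    supplies them itself according to the post.
    """
    return {
        name: (params[name], required)
        for name, required in FAST_PRESET.items()
        if name in params and params[name] != required
    }


# Settings taken from the production script quoted in the post: no conflicts.
print(fast_preset_conflicts({"search": 3, "searchparam": 2, "chroma": False, "pel": 1}))

# A made-up script that would not match the preset.
print(fast_preset_conflicts({"blksize": 16, "chroma": True, "pel": 2}))
```

The first call prints an empty dict; the second reports blksize, chroma and pel together with the values the preset expects.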