Doom9's Forum > Capturing and Editing Video > Avisynth Development

Old 7th August 2023, 12:29   #241  |  Link
kedautinh12
Registered User
 
Join Date: Jan 2018
Posts: 2,161
Pinterf has fixed a bug in AVSTP:
https://github.com/pinterf/AVSTP/releases
Old 7th August 2023, 17:21   #242  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,175
Yes - so it is usable with AVS+ again. But for now it can only run with the 2.7.45 build, because I did not test it during the years of development of my version and it looks like several crash bugs have accumulated in the internal MT that uses AVSTP.

A test script with 5x overlap and hardware ME (with vsTTempSmooth pmode=1 at the end as a median-like engine that selects the 'best' sample value):
Code:
# Input plugins
LoadPlugin("ffms2.dll")
LoadPlugin("mvtools2.dll")
LoadPlugin("vsTTempSmooth.dll")

SetMemoryMax(10000)

my_thSADA_a=1.1
my_thSADA_b=60

my_thSAD=250
my_thSAD2=Int(Float(my_thSAD) * 0.8)

my_thSAD_mg=150
my_thSAD2_mg=Int(Float(my_thSAD_mg) * 0.8)

my_thSCD=my_thSAD+200

my_thCohMV=4 

my_refine_tr=12

my_MT=false

FFmpegSource2("src.mp4")

AddBorders(16,16,16,16)

super_hwa_center=MSuper(last, mt=my_MT, chroma=true, pel=4, hpad=8, vpad=8, levels=1, pelrefine=false)

src=last
shift_val=4

src_up=Crop(0,shift_val,src.width-0,src.height-shift_val).AddBorders(0,0,0,shift_val)
src_down=Crop(0,0,src.width-0,src.height-shift_val).AddBorders(0,shift_val,0,0)
src_left=Crop(shift_val,0,src.width-shift_val,src.height-0).AddBorders(0,0,shift_val,0)
src_right=Crop(0,0,src.width-shift_val,src.height-0).AddBorders(shift_val,0,0,0)

super_hwa_up=MSuper(src_up, mt=my_MT, chroma=true, pel=4, hpad=8, vpad=8, levels=1, pelrefine=false)
super_hwa_down=MSuper(src_down, mt=my_MT, chroma=true, pel=4, hpad=8, vpad=8, levels=1, pelrefine=false)
super_hwa_left=MSuper(src_left, mt=my_MT, chroma=true, pel=4, hpad=8, vpad=8, levels=1, pelrefine=false)
super_hwa_right=MSuper(src_right, mt=my_MT, chroma=true, pel=4, hpad=8, vpad=8, levels=1, pelrefine=false)

# multi=true with delta=tr: use the same tr value that is passed to MDegrainN below
mv_hwa_center=MAnalyse(super_hwa_center, multi=true, delta=my_refine_tr, chroma=true, mt=false, optSearchOption=5, levels=1)
mv_hwa_up=MAnalyse(super_hwa_up, multi=true, delta=my_refine_tr, chroma=true, mt=false, optSearchOption=5, levels=1)
mv_hwa_down=MAnalyse(super_hwa_down, multi=true, delta=my_refine_tr, chroma=true, mt=false, optSearchOption=5, levels=1)
mv_hwa_left=MAnalyse(super_hwa_left, multi=true, delta=my_refine_tr, chroma=true, mt=false, optSearchOption=5, levels=1)
mv_hwa_right=MAnalyse(super_hwa_right, multi=true, delta=my_refine_tr, chroma=true, mt=false, optSearchOption=5, levels=1)

dg_center=MDegrainN(src, super_hwa_center, mv_hwa_center, my_refine_tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, thSADA_a=my_thSADA_a, thSADA_b=my_thSADA_b, mt=my_MT, wpow=4, UseSubShift=1, thSCD1=my_thSCD, adjSADzeromv=0.7, \
adjSADcohmv=0.7, thCohMV=my_thCohMV, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=3)

dg_up=MDegrainN(src_up, super_hwa_up, mv_hwa_up, my_refine_tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, thSADA_a=my_thSADA_a, thSADA_b=my_thSADA_b, mt=my_MT, wpow=4, UseSubShift=1, thSCD1=my_thSCD, adjSADzeromv=0.7, \
adjSADcohmv=0.7, thCohMV=my_thCohMV, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=3)

dg_down=MDegrainN(src_down, super_hwa_down, mv_hwa_down, my_refine_tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, thSADA_a=my_thSADA_a, thSADA_b=my_thSADA_b, mt=my_MT, wpow=4, UseSubShift=1, thSCD1=my_thSCD, adjSADzeromv=0.7, \
adjSADcohmv=0.7, thCohMV=my_thCohMV, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=3)

dg_left=MDegrainN(src_left, super_hwa_left, mv_hwa_left, my_refine_tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, thSADA_a=my_thSADA_a, thSADA_b=my_thSADA_b, mt=my_MT, wpow=4, UseSubShift=1, thSCD1=my_thSCD, adjSADzeromv=0.7, \
adjSADcohmv=0.7, thCohMV=my_thCohMV, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=3)

dg_right=MDegrainN(src_right, super_hwa_right, mv_hwa_right, my_refine_tr, thSAD=my_thSAD_mg, thSAD2=my_thSAD2_mg, thSADA_a=my_thSADA_a, thSADA_b=my_thSADA_b, mt=my_MT, wpow=4, UseSubShift=1, thSCD1=my_thSCD, adjSADzeromv=0.7, \
adjSADcohmv=0.7, thCohMV=my_thCohMV, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=3)


#move shifted back
dg_up=Crop(dg_up, 0,0,dg_up.width-0,dg_up.height-shift_val).AddBorders(0,shift_val,0,0)
dg_down=Crop(dg_down, 0,shift_val,dg_down.width-0,dg_down.height-shift_val).AddBorders(0,0,0,shift_val)
dg_left=Crop(dg_left, 0,0,dg_left.width-shift_val,dg_left.height-0).AddBorders(shift_val,0,0,0)
dg_right=Crop(dg_right, shift_val,0,dg_right.width-shift_val,dg_right.height-0).AddBorders(0,0,shift_val,0)


intrl=Interleave(dg_center, dg_up, dg_down, dg_left, dg_right) 
vstt=vsTTempSmooth(intrl, ythresh=200, uthresh=200, vthresh=200, pmode=1, maxr=2)
SelectEvery(vstt, 5,2)

Crop(16,16,width-32, height-32)

Prefetch(..)
Using 5x 'overlapped' processing fixes some small search errors and gives a lower overall noise level in comparison with the 'center' output clip. But it still has some general search errors in comparison with on-CPU MAnalyse at 'max' settings. It has not yet been tested as 'prefilter/1st generation' processing in 2 or more generations of MV refining.
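Why the script ends with SelectEvery(vstt, 5,2) can be checked with a bit of index arithmetic. The sketch below is plain Python, not AviSynth. It assumes that a temporal filter with radius maxr=2 reads frames n-2..n+2 of the interleaved clip; the helper names are made up for illustration.

```python
# Index arithmetic for Interleave(5 clips) -> temporal filter (radius 2) -> SelectEvery(5, 2).
# Assumption: a temporal filter with radius r reads interleaved frames n-r .. n+r.

VARIANTS = ["center", "up", "down", "left", "right"]  # order used in Interleave()


def window(n, radius):
    """Interleaved frame numbers read when filtering interleaved frame n."""
    return list(range(n - radius, n + radius + 1))


def describe(n):
    """Map an interleaved frame number to (source frame, variant name)."""
    return (n // len(VARIANTS), VARIANTS[n % len(VARIANTS)])


def selected_window(src_frame, offset=2, radius=2):
    """Window seen by the frame that SelectEvery(5, offset) keeps for src_frame."""
    n = src_frame * len(VARIANTS) + offset
    return [describe(i) for i in window(n, radius)]


if __name__ == "__main__":
    for item in selected_window(7):
        print(item)
```

With offset 2 all five samples in the window belong to the same source frame, so the median-like selection sees exactly the five shifted variants. Any other offset would mix in variants of a neighbouring source frame.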

" see all those news in SMDegrain soon"

Maybe only the small but still important features, like the second input to MAnalyse and auto-thSAD for MDegrain, could be ported as yet another 'simple addition to the 2.7.45 pinterf version', because these changes are mostly safe from bugs. Then Dogway could test it in SMDegrain. I hope pinterf returns in 2023 so I can ask about porting a very limited pack of simple new features to make an 'official post-2.7.45' build of mvtools.

Last edited by DTL; 8th August 2023 at 22:55.
Old 26th September 2023, 19:59   #243  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,175
New release https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.25

Added a non-linear, median-like MV filtering mode to MDegrainN, in addition to the linear low-pass filtering.

New params to MDegrainN:
MVMedF (integer) - default 0 (disabled). Temporal radius of the median filter applied to the temporal sequence of MVs. Valid range: from 1 to about 1/3 of the tr value used.
MVMedF_em (integer) - default 0. Processing mode for the edges of the temporal MV sequence. Mode 0 - copy the non-filtered MVs from the input. Mode 1 - invalidate the MVs of the non-filtered frames to avoid possible blending of bad MVs. The number of non-filtered frames is equal to the MVMedF value.
MVMedF_cm (integer) - default 0. MV coordinates processing mode. Mode 0 - separate median filtering of the x and y components. Mode 1 - uses the length of the difference vector as the dissimilarity metric.
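
To make the three params easier to picture, here is a rough sketch in plain Python (not the plugin code). It assumes that cm=1 behaves like a classic vector median, i.e. it picks the MV of the window with the smallest summed distance to the others, and that the edge entries are the first and last `radius` entries of the sequence. Function names are made up.

```python
import math
from statistics import median


def median_separate(window):
    """cm=0 reading: median of x and median of y, taken independently."""
    return (median(mv[0] for mv in window), median(mv[1] for mv in window))


def median_vector(window):
    """cm=1 reading (assumed): the MV of the window with the smallest
    summed length of difference vectors to all other MVs in the window."""
    def cost(mv):
        return sum(math.hypot(mv[0] - o[0], mv[1] - o[1]) for o in window)
    return min(window, key=cost)


def filter_sequence(mvs, radius, cm=0, em=0):
    """Filter a temporal MV sequence of one block. None marks an invalidated MV."""
    pick = median_vector if cm == 1 else median_separate
    out = []
    for i in range(len(mvs)):
        if i < radius or i >= len(mvs) - radius:
            # Edge entries have no full window: em=0 copies them, em=1 invalidates them.
            out.append(mvs[i] if em == 0 else None)
        else:
            out.append(pick(mvs[i - radius:i + radius + 1]))
    return out


if __name__ == "__main__":
    seq = [(1, 1), (1, 2), (40, -30), (2, 1), (1, 1), (2, 2), (1, 1)]
    print(filter_sequence(seq, radius=2, cm=0, em=0))
    print(filter_sequence(seq, radius=2, cm=1, em=1))
```

In the sample sequence the single false long MV (40, -30) is replaced by a value close to its temporal neighbours, which a linear low-pass filter would only smear.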

Example:
Code:
MDegrainN(last,super, multi_vec, tr, thSAD=135, thSAD2=120, mt=false, wpow=4, thSCD1=450, thSADA_a=my_thA, thSADA_b=my_thB, \
adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, MVMedF=2, MVMedF_cm=0, MVMedF_em=0, IntOvlp=3)
For any MV filtering mode, thMVLPFCorr must be non-zero (filtered MVs are used only when their coordinates differ from the input MVs by less than this value).
Linear and non-linear MV filtering may be enabled in any combination. The non-linear filtering is executed first, then the linear.
MVMedF_em=1 may be used for possibly higher quality processing (the non-filtered MVs/blocks at the edges of tr are excluded from blending). But it eats MVMedF frames from the total tr pool and decreases the possible denoise level (so to keep the same max denoise level with MVMedF_em=1, tr needs to be tr+MVMedF).
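
The tr bookkeeping for MVMedF_em=1 as a tiny worked example in plain Python. The model (effective tr = tr - MVMedF) is only a reading of the note above, not taken from the plugin source, and the function names are made up.

```python
def effective_tr(tr, mvmedf, em):
    """tr that still takes part in blending under the assumed model."""
    if not 0 <= mvmedf <= tr:
        raise ValueError("MVMedF must be between 0 and tr")
    return tr - mvmedf if em == 1 else tr


def tr_for_same_denoise(target_tr, mvmedf, em):
    """tr to request so that the effective tr stays at target_tr."""
    return target_tr + mvmedf if em == 1 else target_tr


if __name__ == "__main__":
    print(effective_tr(14, 2, em=1))         # effective tr with tr=14, MVMedF=2
    print(tr_for_same_denoise(12, 2, em=1))  # tr needed to keep an effective tr of 12
```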

MVMedF_cm=1 may produce more shifted areas on moving objects - this is still being tested and may be fixed in the next releases.

Expected working values for the MVMedF temporal radius of non-linear filtering are about 1..3 (possibly up to about 1/3 of the tr value used, so for MVMedF=3 the recommended tr is about 6..10). Values of 3 and more are not yet tested.

Non-linear filtering is expected to work better than linear filtering at skipping false long strike MVs at lower tr.

Test script to compare the new MV processing features with the old ones (source from the 2022 (?) post https://forum.doom9.org/showthread.p...40#post1974040):
Code:
LoadPlugin("mvtools2_260923.dll")
LoadPlugin("ffms2.dll")

FFmpegSource2("test_org.mkv")

examp=FFMpegSource2("test_enc_thSAD200.mkv").Crop(250,200,500,500).ConvertToYUV420(matrix="Rec709").ConvertBits(8).Subtitle("test_enc")

ConvertToYUV420(matrix="Rec709")

Crop(250,200,500,500)

noproc=last.Subtitle("src")

super_std=MSuper(mt=false, pel=2)

tr=14

my_thA = 1.3
my_thB = 30

multi_vec=MAnalyse(super_std, blksize=8, multi=true, search=3, temporal=false, trymany=true, searchparam=2, chroma=true, delta=tr, truemotion=false,\
 pzero=10, pnew=10, pglobal=10, levels=0, mt=false, overlap=4)
new_m2=MDegrainN(last,super_std, multi_vec, tr, thSAD=150, thSAD2=140, thSADA_a=my_thA, thSADA_b=my_thB, mt=false, wpow=4, thSCD1=500, \
adjSADzeromv=0.7, adjSADcohmv=0.7, thCohMV=6, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, MVMedF=2, MVMedF_cm=0, MVMedF_em=1,\
 IntOvlp=0).Subtitle("new_m2")
old=MDegrainN(last,super_std, multi_vec, tr, thSAD=200, thSAD2=190, mt=false,IntOvlp=0).Subtitle("old_thSAD200")
ConvertBits(8)

Interleave(examp, new_m2, old, noproc)
Sharpen(0.5)

SincLin2Resize(width*2, height*2)

Prefetch(2)
Comparison at frame 155 (src) of the interleaved sequence: https://imgsli.com/MjA5OTQx/1/3

Last edited by DTL; 27th September 2023 at 22:28.
Old 28th September 2023, 17:06   #244  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,175
New release - https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.26

Added new MVF_fm param to MDegrainN. Fixed MV filtering in the non-YUV-combined processing modes.

MVF_fm (integer), default 0. Block failing mode during MV filtering. Mode 0 (default) - pass blocks that fail the SAD re-check unchanged to the blending engine. Mode 1 - fail (invalidate for blending) blocks that fail the SAD re-check at the filtered MV coordinates.

MVF_fm=1 saves more blocks from blurring, but it typically degrades denoising in these areas, so the denoise becomes uneven over the frame area. This may be more visible when comparing static frames.
Old 2nd October 2023, 20:19   #245  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,175
A strategic announcement about fully hardware multi-generation MV refining for noisy sources.

I made some on-CPU tech tests of multi-generation refining in MDegrainN (using the simple ESA search algorithm) - even with a search radius of 4 it is much slower, though still faster than script-based refining (and it uses about 2 times less RAM). It is expected to get somewhat faster after the most feasible SIMD optimizations, but not by much.

So, as the hardware MV search ASIC typically shows significant underload, it is possible to put MV refining into the hardware accelerator (new modes for MAnalyse). The major part is the development of MDegrainN (simple or most featured processing) as a Compute Shader, so it can be dispatched in the HWA without downloading the current generation of MVs for external processing. It does not break the logic of MAnalyse in the AVS filter chain: it still outputs a single MV clip, with the MVs refined completely inside the hardware accelerator. The accelerator uses the hardware MV search ASIC and the universal shader dispatch units to run the SAD and MDegrainN shaders, which provide filtered frames to the next generations of MV refining by the same hardware MV search ASIC. For boards with 2 or more NVENC ASICs I hope the drivers are smart enough to create the full filter chain inside the accelerator board, without download-upload of resources between the degrain and MV search stages, and to spread the load over all available ASICs on board.
Old 12th March 2024, 07:49   #246  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,625
Quote:
Originally Posted by DTL View Post
New release
Maybe this can be useful.
__________________
@turment on Telegram
Old 31st March 2024, 08:04   #247  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,175
New release https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.27

Added more 'area' predictors to MAnalyse. Extended optPredictorType to -1,-2, -3.

Added AreaMode MVs refining to MAnalyse.
New params:
AreaMode (integer), valid values 0,1,2,3,4.
0 - disabled
1 - x5 total positions checks (center + 4 diagonal offsets of +-1)
2 - x9 total positions checks (center + 8 diagonal offsets of +-1 and +-2)
3 - x13 total positions checks (center + 12 diagonal offsets of +-1 and +-2 and +-3)
4 - x17 total positions checks (center + 16 diagonal offsets of +-1 and +-2 and +-3 and +-4)
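
The position counts in the list above follow 1 + 4*AreaMode. A small sketch in plain Python that generates such a set of offsets. The exact geometry used by the plugin is an assumption here (four diagonal corners per ring), and `area_positions` and its `step` argument are made-up names.

```python
def area_positions(area_mode, step=1):
    """Offsets (dx, dy) checked for one block: the center plus 4 diagonal
    offsets per ring, with rings at +-1*step .. +-area_mode*step (assumed layout)."""
    positions = [(0, 0)]
    for ring in range(1, area_mode + 1):
        d = ring * step
        positions += [(-d, -d), (d, -d), (-d, d), (d, d)]
    return positions


if __name__ == "__main__":
    for mode in range(5):
        print(mode, len(area_positions(mode)))
```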

AMdiffSAD (integer), valid values 0 and positive. Recommended range 1..1000.
Adds the absolute difference of the MVs found by the AreaMode search to the block SAD, as an additional hint about the quality of the block's MV. It is a multiplier for the mean sum of the absolute differences of the MV coordinates. Values of about 1000 and higher make the SAD of the block fail completely.

Example of a progressive film processing script that shows the difference between AreaMode settings:
Code:
tr=12
my_AMDiffSAD=0
my_thSADA_a=1.2

my_intOvlp=0
my_ovlp=4

super=MSuper(last, mt=false, pel=2, hpad=8, vpad=8)

multi_vec_cpu=MAnalyse (super, multi=true, blksize=8, delta=tr, search=3, searchparam=2, truemotion=true, overlap=my_ovlp, chroma=false, optSearchOption=1, optPredictorType=0, mt=false)
multi_vec_am5=MAnalyse (super, multi=true, blksize=8, delta=tr, search=3, searchparam=2, truemotion=true, overlap=my_ovlp, chroma=false, optSearchOption=1, optPredictorType=0, mt=false, AreaMode=1, AMdiffSAD=my_AMDiffSAD)
multi_vec_am9=MAnalyse (super, multi=true, blksize=8, delta=tr, search=3, searchparam=2, truemotion=true, overlap=my_ovlp, chroma=false, optSearchOption=1, optPredictorType=0, mt=false, AreaMode=2, AMdiffSAD=my_AMDiffSAD)
multi_vec_am13=MAnalyse (super, multi=true, blksize=8, delta=tr, search=3, searchparam=2, truemotion=true, overlap=my_ovlp, chroma=false, optSearchOption=1, optPredictorType=0, mt=false, AreaMode=3, AMdiffSAD=my_AMDiffSAD)

ma_cpu=MDegrainN(last,super, multi_vec_cpu, tr, thSADA_a=my_thSADA_a, thSADA_b=50, mt=false, wpow=4, adjSADzeromv=0.8, adjSADcohmv=0.8, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_intOvlp).Subtitle("ma_cpu")
ma_cpu_am5=MDegrainN(last,super, multi_vec_am5, tr, thSADA_a=my_thSADA_a, thSADA_b=50, mt=false, wpow=4, adjSADzeromv=0.8, adjSADcohmv=0.8, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_intOvlp).Subtitle("ma_am5")
ma_cpu_am9=MDegrainN(last,super, multi_vec_am9, tr, thSADA_a=my_thSADA_a, thSADA_b=50, mt=false, wpow=4, adjSADzeromv=0.8, adjSADcohmv=0.8, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_intOvlp).Subtitle("ma_am9")
ma_cpu_am13=MDegrainN(last,super, multi_vec_am13, tr, thSADA_a=my_thSADA_a, thSADA_b=50, mt=false, wpow=4, adjSADzeromv=0.8, adjSADcohmv=0.8, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_intOvlp).Subtitle("ma_am13")

Interleave(ma_cpu, ma_cpu_am5, ma_cpu, ma_cpu_am9, ma_cpu, ma_cpu_am13, Subtract(ma_cpu, ma_cpu_am13).Levels(100,1, 140, 0,255))
Sharpen(1.0)
The benefit from the added predictors is very small, but it may be tested. They are the median (mode ?) of the surrounding blocks' predictors from the previous levels of search. The performance cost is not big (in single-predictor refining mode with trymany=false).

The performance cost of the AreaMode search is large: it runs +4, +8, +12 or +16 new full searches around the current block position and then computes the median (mode ?) of the resulting MVs to create the output MV (its SAD is the max of the SADs of the MVs selected as best dx and best dy). AreaMode=3 (12 additional block searches) runs about 5 times slower than the standard single search per block on an i5-9600K CPU.
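
A rough sketch of that last step in plain Python, under one possible reading: the median is taken separately for dx and dy, the two medians may come from two different candidate MVs, and the output SAD is the larger of the SADs of those two candidates. This is not the plugin code and `area_median_mv` is a made-up name.

```python
def area_median_mv(candidates):
    """candidates: list of (dx, dy, sad) from the center and shifted searches
    (odd count, 1 + 4*AreaMode). Returns the output (dx, dy, sad)."""
    by_dx = sorted(candidates, key=lambda c: c[0])
    by_dy = sorted(candidates, key=lambda c: c[1])
    mid = len(candidates) // 2
    best_dx, best_dy = by_dx[mid], by_dy[mid]
    # The SAD of the output MV is taken as the worse of the two selected candidates.
    return (best_dx[0], best_dy[1], max(best_dx[2], best_dy[2]))


if __name__ == "__main__":
    found = [(2, 0, 100), (2, 1, 120), (3, 0, 90), (-9, 7, 400), (2, 0, 110)]
    print(area_median_mv(found))
```

The one wild candidate (-9, 7) does not pull the result, which is the point of using a median instead of a mean.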


Dark film scene comparison (gamma=2.0 added to show the darks better): https://imgsli.com/MjUxNjY3

This new level of processing also generates more data for the analysis of the 'quality of MV'. Currently a simple idea is used to send an additional signal to the denoising engine: if the MVs of the small shifted positions around the current block are not coherent, the MV estimation may be unstable. So the absolute difference of the MV coordinates (averaged over the number of MVs in the search pool, to make it independent (less dependent ?) of the AreaMode used), multiplied by the AMdiffSAD param, may be added to the block SAD. The block then gets less degrain weight in MDegrain. Users can try to check the effect of this setting with MShow, using the average SAD per frame (showsad=true).
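
The penalty idea as a plain Python sketch. The scaling is an assumption: the real plugin may normalize differently (the note that values around 1000 fail the block completely suggests some internal divisor), so only the shape of the formula is illustrated. `sad_with_area_penalty` is a made-up name.

```python
def sad_with_area_penalty(block_sad, out_mv, area_mvs, am_diff_sad):
    """Add AMdiffSAD * mean(|dx_i - dx| + |dy_i - dy|) to the block SAD.

    out_mv   - (dx, dy) selected as the output MV of the block
    area_mvs - list of (dx, dy) found at the shifted AreaMode positions
    """
    if am_diff_sad == 0 or not area_mvs:
        return block_sad
    total = sum(abs(dx - out_mv[0]) + abs(dy - out_mv[1]) for dx, dy in area_mvs)
    mean_diff = total / len(area_mvs)  # averaging reduces dependence on AreaMode
    return block_sad + am_diff_sad * mean_diff


if __name__ == "__main__":
    coherent = [(2, 0), (2, 0), (2, 0), (2, 0)]
    unstable = [(2, 0), (3, 0), (2, 1), (-2, 0)]
    print(sad_with_area_penalty(100, (2, 0), coherent, 10))
    print(sad_with_area_penalty(100, (2, 0), unstable, 10))
```

Coherent neighbours leave the SAD unchanged, while scattered neighbours raise it, so the block gets a lower weight in the blend.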

In areas of stable MVs, enabling AreaMode makes almost no difference but costs a lot of performance. In areas of unstable MVs it adds some denoising quality (it may be well visible at 600..800% crops). The x264 encoding bitrate at fixed CRF=18 is also a bit lower (about 3% in some quick test encodes). In the future I hope to add something close to this idea to the DX12-accelerated ME, so the ME ASIC may finally be well loaded with useful work. But I have still not fully restored my development environment (so no DX12-ME build is currently available), and I may also need to ask Microsoft support how to quickly shift a loaded resource in the accelerator, so that a queue of searches can be sent to the ASIC in 1 job list instead of uploading lots of shifted copies of the frames to make searches with a shifted block tessellation grid.

The AreaMode level sits on top of the standard levels of search in MAnalyse and is compatible with any old mode, but currently it looks like it works only with block size 8x8, because this experimental release is not fully debugged. In the future it is planned to move it into the -e.XX builds, which are expected to be more stable than the -a.XX builds.

Also an observation: truemotion=true significantly helps MV stability in low-contrast, low-detail noisy areas, so it is recommended to enable it in high quality use cases.

Addition: current ideas on performance optimization. Limit the AreaMode search to the full-pel level only (or to some non-finest level, depending on the current pel setting). But the full search algorithm can still settle on the 'best SAD' and not the really best MV while checking lots of predictors (starting from the zero predictor). So, to use the full search with AreaMode down to the full-pel level and only sub-sample refining below it, some more predictor types may be added (like optPredictorType=5 or something else). Or, as I have been thinking for a long time, it would be better to make many control params of MAnalyse arrays instead of a single value for all levels. With arrays the user can set a much more flexible performance/quality balance on each level. Currently MAnalyse only has a separate param like pelsearch to set a different search radius at the sub-pel levels. This could be expanded to many other search params, so that the performance/quality balance is adjusted on the user side and not hardcoded into MAnalyse as 'hard presets'. But I still have no experience with array params for AVS filters. So the easier way for now is to add more params like AMll (AreaMode level limit), and maybe pelPT (PredictorType selection for the fixed sets of predictors used at the sub-sample levels of search) and peltrymany (to select refining of all predictors or of the best only at the sub-sample levels). Then the user can select a higher quality search at the levels down to full-sample precision (fast enough) and only a limited search at the slowest pel=2 and pel=4 levels for better performance.

Last edited by DTL; 1st April 2024 at 11:12.
Old 2nd April 2024, 01:23   #248  |  Link
TDS
Formally known as .......
 
Join Date: Sep 2021
Location: Down Under.
Posts: 1,086
Quote:
Originally Posted by DTL View Post
I hate to say it DTL, but any build after Release_2.7.46_e.03 causes a "Cannot render the file" error in RipBot264 when previewing the script in AVSMeter.

With either DX12 or noDX12, build .03 had AVX options.
__________________
Long term RipBot264 user.

RipBot264 modded builds..
*new* x264 & x265 addon packs..

Last edited by TDS; 2nd April 2024 at 01:47.
Old 2nd April 2024, 09:34   #249  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,175
Oh - a.XX builds are the most featured but unstable for many blocksizes/bitdepths. They are typically tested only with YV12 and 8x8 blocksize. So if it crashes even with YV12 and 8x8 blocksize, post at least the frame size (or better, the full script) so I can try to look at it in the debugger. The e.03 build adds only a very small number of new features on top of 2.7.45, but it is expected to be the most stable and to support all bitdepths/blocksizes, as 2.7.45 does.

I plan to add the AreaMode search mode to the e.XX builds, but some time later, after most of its settings are finalized.

With the very limited developer resources left in what remains of the current civilization, the set of modes in which new features are tested and debugged quickly shrinks to the most used formats, like YV12 and 8x8 blocksize (as at the very beginning of AVS and mvtools). Though I understand that better MV quality is expected with at least dual-blocksize processing, like a first search with a blocksize of 16x16 followed by a refine to 8x8 with MRecalculate. So most of the new controls of MAnalyse need to be added to MRecalculate too.

This may also be a natural way to limit AreaMode at the finest level (with a different area size too): a first search with MAnalyse(AreaMode=3, blksize=16) and then a refine with MRecalculate(AreaMode=1, blksize=8), for better performance and maybe quality too, because of the larger blocksize at the first search.

Last edited by DTL; 2nd April 2024 at 10:02.
Old 2nd April 2024, 10:53   #250  |  Link
TDS
Formally known as .......
 
Join Date: Sep 2021
Location: Down Under.
Posts: 1,086
Quote:
Originally Posted by DTL View Post
Oh - a.XX builds are the most featured but unstable for many blocksizes/bitdepths.
I will double check the script I used, but I generally don't have any blocksize settings, and don't know what the video I'm using is, either.

Will report back....
Old 2nd April 2024, 13:55   #251  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,175
Can you post sample video and script used ?
Old 3rd April 2024, 02:45   #252  |  Link
TDS
Formally known as .......
 
Join Date: Sep 2021
Location: Down Under.
Posts: 1,086
Quote:
Originally Posted by DTL View Post
Can you post sample video and script used ?
Well, another day, and a different result...

.27 is now working, and all I have changed is to update FFmpeg...
Old 3rd April 2024, 08:48   #253  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,175
The sad truth about complex 'memory corruption' bugs in the still poorly protected memory of current OS/CPU designs: if some thread writes outside its buffer, nothing may be visible as long as the overwritten memory region is not used as an executable page or holds data bytes that nobody checks. This may be the main source of 'hidden bugs' accumulation: they are invisible until they damage something important and finally cause a crash. These bugs are hard to find because there is no record of who wrote to the damaged memory bytes, or when. The crash happens only when the damaged data is read and used somewhere. Also, because 4KB RAM pages are located close to randomly in the address space, each OS reboot and each process run may damage a different memory area.

So the simple version: an a.XX build with some settings corrupts some RAM page, but not all ffmpeg builds use the same memory allocation scheme, so this corruption may stay hidden and not damage any important data. Then ffmpeg runs without a crash and maybe without image data distortion. But everything may change at any time. This is the sad process of constant accumulation of overall instability, and it is hard to find in the debugger the real place where it starts. There are automated software tools that instrument a program to check many (every ?) write operations and help find these bugs. But they may work only with C-level writes and not check long SIMD data writes. And mvtools uses many hand-coded asm parts, which may not be compatible with automated instruments for memory damage analysis.
Old 4th April 2024, 13:28   #254  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,625
Quote:
Originally Posted by DTL View Post
New release 2.7.46-a.27
I can't test on my Sandy Bridge, gives instruction error.

Did you compile for AVX2 only?
Old 4th April 2024, 20:28   #255  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,175
New release: https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.28

Added new params to MAnalyse and MRecalculate:

AMstep (integer): default 0 (auto), or 1 and higher. Step of the (diagonal) offset of each group of 4 area search positions around the center position of the block. 0 (auto) means scaled relative to the 8x8 block size (1 for 8x8 block size, 2 for 16x16 block size and so on).
AMoffset (integer): default 0. Offset from 1 to start of area positions.
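
The auto rule for AMstep written out in plain Python. The linear scaling beyond 16x16 (e.g. 4 for 32x32) and the lower bound of 1 for blocks smaller than 8x8 are assumptions based on the 'and so on' above. `effective_am_step` is a made-up name.

```python
def effective_am_step(am_step, blksize):
    """AMstep=0 (auto) scales with the block size relative to 8x8;
    any positive value is used as given."""
    if am_step < 0:
        raise ValueError("AMstep must be 0 (auto) or positive")
    if am_step > 0:
        return am_step
    return max(1, blksize // 8)


if __name__ == "__main__":
    for size in (8, 16, 32):
        print(size, effective_am_step(0, size))
```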

New params for MAnalyse:
PTpel (integer): PredictorType at sub-sample levels of search. Default = optPredictorType.
AMpel (integer): AreaMode for sub-sample levels (level 0 for pel=1 or level 0 and level 1 for pel=2). Default = AreaMode.

New params for MRecalculate:
AreaMode, AMdiffSAD - same as for MAnalyse.

Fixed a crash with block size 16x16 (and possibly others) when SIMD instructions are enabled (SetMaxCPU above 'none').

MRecalculate now passes the tr value to the new nTrad member of the analysisdata structure of the mvmulti clip (to pass the compatibility check in the new MDegrainN).

"I can't test on my Sandy Bridge, gives instruction error."

Yes - that was AVX2 only build. I added SSE2 and AVX2 in update to current release a.28 at github.

Example of test script starting with pel=1 and blocksize of 16x16 and MRecalculate to pel=2 and blocksize 8x8:
Code:
my_tr=12
my_AMDiffSAD=0
my_thSADA_a=1.2

my_intOvlp=0
my_ovlp=0
my_blksize=16
mymrecthSAD=200

super_p2=MSuper(last, mt=false, pel=2, hpad=32, vpad=32)
super_p1=MSuper(last, mt=false, pel=1, hpad=32, vpad=32)

multi_vec_cpu=MAnalyse (super_p2, multi=true, blksize=8, delta=my_tr, search=3, searchparam=2, truemotion=true, overlap=my_ovlp, chroma=false, optSearchOption=1, optPredictorType=0, mt=false)

vec_am0=MAnalyse (super_p1, multi=true, blksize=my_blksize, delta=my_tr, search=3, searchparam=2, truemotion=false, pnew=0, global=true,  overlap=my_ovlp, chroma=true,\
 optSearchOption=1, optPredictorType=0, mt=false, AreaMode=0, AMstep=2, AMdiffSAD=my_AMDiffSAD)
vec_am1=MAnalyse (super_p1, multi=true, blksize=my_blksize, delta=my_tr, search=3, searchparam=2, truemotion=false, pnew=0, global=true,  overlap=my_ovlp, chroma=true,\
 optSearchOption=1, optPredictorType=0, mt=false, AreaMode=1, AMstep=2, AMdiffSAD=my_AMDiffSAD)


multi_vec_mrec_am0=MRecalculate(super_p2, vec_am0, thSAD=mymrecthSAD, blksize=8, search=3, searchparam=4, truemotion=false, pnew=0, chroma=true, overlap=my_ovlp,\
 AreaMode=0, tr=my_tr)
multi_vec_mrec_am2=MRecalculate(super_p2, vec_am1, thSAD=mymrecthSAD, blksize=8, search=3, searchparam=4, truemotion=false, pnew=0, chroma=true, overlap=my_ovlp, \
AreaMode=2, tr=my_tr)


ma_cpu=MDegrainN(last,super_p2, multi_vec_cpu, my_tr, thSADA_a=my_thSADA_a, thSADA_b=50, mt=false, wpow=4, adjSADzeromv=0.8, adjSADcohmv=0.8, thCohMV=16,\
MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_intOvlp).Subtitle("ma_cpu")
ma_cpu_mrec_am0=MDegrainN(last,super_p2, multi_vec_mrec_am0, my_tr, thSADA_a=my_thSADA_a, thSADA_b=50, mt=false, wpow=4, adjSADzeromv=0.8, adjSADcohmv=0.8,\
thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_intOvlp).Subtitle("ma_am_mrec_am0")
ma_cpu_mrec_am=MDegrainN(last,super_p2, multi_vec_mrec_am2, my_tr, thSADA_a=my_thSADA_a, thSADA_b=50, mt=false, wpow=4, adjSADzeromv=0.8, adjSADcohmv=0.8,\
thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_intOvlp).Subtitle("ma_am_mrec_am")


Interleave(ma_cpu, ma_cpu_mrec_am0, ma_cpu_mrec_am, Subtract(ma_cpu, ma_cpu_mrec_am0).Levels(100,1, 140, 0,255), Subtract(ma_cpu, ma_cpu_mrec_am).Levels(100,1, 140, 0,255))
Sharpen(1.0)
No params are tuned for best results yet - this is just an initial working setup for first tests.

Last edited by DTL; 4th April 2024 at 20:34.
Old 5th April 2024, 09:34   #256  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,625
Quote:
Originally Posted by DTL View Post
Yes - that was AVX2 only build. I added SSE2 and AVX2 in update to current release a.28 at github.

Last edited by tormento; 5th April 2024 at 09:36.
Old 5th April 2024, 09:35   #257  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,625
Quote:
Originally Posted by DTL View Post
Yes - that was AVX2 only build. I added SSE2 and AVX2 in update to current release a.28 at github.
I can see AVX2/SSE2 build only.

If you can, please provide AVX version too, so I can report issues if any.

Last edited by tormento; 5th April 2024 at 10:21.
Old 5th April 2024, 10:29   #258  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,625
Benchmark on a simple SMDegrain:

MVTools 2.7.46e03 (AVX ) 7,93 fps
MVTools 2.7.46a28 (SSE2) 7,28 fps


Then I checked some metrics between MVTools 2.7.46e03 and MVTools 2.7.46a28:

PSNR 52.9363
SSIM 0.9974
VMAF 96.0528
Old 5th April 2024, 15:48   #259  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,175
I updated release from 04.04.24 with DX12 and noDX12 SSE2/AVX/AVX2 builds with Visual Studio 2019.

Unfortunately the a.XX versions may really be slower than the e.03 version (which is very close to 2.7.45), because of the active use of many possible dissimilarity metrics (SAD, SSIM, VIF) and runtime sub-sample shifting in MAnalyse. The a.XX builds are full featured for MAnalyse but contain lots of conditional jumps. These conditional jumps disturb the out-of-order and branch-prediction units of CPUs and may cause a visible performance penalty.

To regain some performance you need to enable at least optSearchOption=1.

To partly fix this new performance penalty, it is planned to redesign the MAnalyse source into templated versions of the processing functions (with the DMFlags dissimilarity metric control or the old SAD only, with UseSubShift or without). Manually copying the functions for these options and selecting the one to use at startup would make the file PlaneofBlocks.cpp even more scary and more complex for future development. But this templated redesign is only possible in some future version. For now, if testing shows that the a.28 version finally starts to work with the typical complex scripts still in use, like SMDegrain and QTGMC, I can only ask Asd-g to make LLVM builds (typically somewhat faster). And if the LLVM compiler is smarter and AI-powered inside, as we expect from the still not completely dead progress of civilization, it may auto-optimize these conditional jumps better than the simple VS2019 compiler.
Old 6th April 2024, 11:03   #260  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,625
Quote:
Originally Posted by DTL View Post
I updated release from 04.04.24 with DX12 and noDX12 SSE2/AVX/AVX2 builds with Visual Studio 2019.
Thanks. I will try and report ASAP.

I think you should give a try to Intel Compiler. AFAIK it produces the fastest builds, even for AMD.
Quote:
Originally Posted by DTL View Post
To regain some performance you need to enable at least optSearchOption=1.
We need to convince Dogway to implement some of your new features in his very good scripts.