6th April 2024, 12:11   #11
DTL
" Intel Compiler. AFAIK it produces the fastest builds, even for AMD."

I tried it in the early 2020s and it gave a small but visible benefit over the Visual Studio compiler. But as we see with the Asd-g builds, development of the LLVM compilers is progressing well and they produce the best binaries. Maybe someday I will also try to install LLVM on my development system.

Current best-quality degrain script: the first search uses 16x16 blocksize with pel=1, then the vectors are recalculated to 8x8 blocksize with pel=2 before MDegrainN (at a speed that is not too slow):
Code:
my_tr=12
my_AMDiffSAD=0
my_thSADA_a=1.2

my_intOvlp=3
my_ovlp=0
my_blksize=16

super_p2=MSuper(last, mt=false, pel=2, hpad=32, vpad=32)
super_p1=MSuper(last, mt=false, pel=1, hpad=32, vpad=32)


vec_am1o4=MAnalyse (super_p1, multi=true, blksize=my_blksize, delta=my_tr, search=3, searchparam=2, truemotion=true, pnew=0, global=true,  overlap=my_ovlp, chroma=true,\
 optSearchOption=1, optPredictorType=0, mt=false, AreaMode=1, AMoffset=4, AMdiffSAD=my_AMDiffSAD)


multi_vec_mrec_am1o4am2recall=MRecalculate(super_p2, vec_am1o4, thSAD=20, blksize=8, search=3, searchparam=4, truemotion=true, pnew=0, chroma=true, overlap=my_ovlp, \
AreaMode=2, AMoffset=0, tr=my_tr)


MDegrainN(last,super_p2, multi_vec_mrec_am1o4am2recall, my_tr, thSADA_a=my_thSADA_a, thSADA_b=50, mt=false, wpow=4, adjSADzeromv=0.8, adjSADcohmv=0.8,\
thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, adjSADLPFedmv=0.9, IntOvlp=my_intOvlp)
The thSAD parameter of MRecalculate is a significant quality/performance balancing value:

1. Low or zero thSAD: a refining search is performed for all newly recalculated (interpolated) MVs. Slowest mode, but may give the best quality.
2. Default of 200: depending on noise level and motion, this causes more or fewer refining searches, performed only for blocks with SAD > thSAD.
3. Very high value (about thSCD1): MVs are only interpolated and the SAD is re-checked for the interpolated MV. Fastest mode, but may give lower quality (depending on the noise level/profile and possibly more).
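The three regimes above can be illustrated with a small sketch. It only models the thresholding idea described in this post (refine a block when the SAD of its interpolated MV is above thSAD); it is not the MRecalculate implementation, and `count_refined_blocks` and the SAD values are hypothetical.

```python
def count_refined_blocks(interpolated_sads, th_sad):
    """Count the blocks that would get a refining search.

    Assumption: a block is refined only when the SAD measured for its
    interpolated MV is greater than th_sad.
    """
    return sum(1 for sad in interpolated_sads if sad > th_sad)


# Made-up SAD values for six blocks after MV interpolation.
sads = [15, 80, 150, 230, 400, 900]

refined_low = count_refined_blocks(sads, 0)        # every block is refined
refined_script = count_refined_blocks(sads, 20)    # thSAD=20 as in the script
refined_default = count_refined_blocks(sads, 200)  # only the high-SAD blocks
refined_high = count_refined_blocks(sads, 10000)   # interpolate only
```

With these made-up values, lowering thSAD moves more blocks into the slow refining path, which is the quality/speed trade-off the list describes.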

About low AreaMode settings (starting from 1 'layer'): quality looks somewhat better with a non-zero AMoffset value, and a good value is expected to be about blocksize/4. For higher AreaMode settings, the best AMstep/AMoffset values for a given blocksize still need many tests.
For a blocksize of 16x16, AMoffset=0 with AreaMode=1 only adds searches at +-1 from the block's center position, which is only 1/16 of the block size. With AMoffset=4, more of the surrounding samples are used with a smaller number of new search calculations, and this looks to give more quality benefit per CPU cycle used.
The current equation for the offset from the block's center position for each AreaMode step/'layer' (i) is int iOffset = (iAMstep * i + iAMoffset + 1); and MAnalyse performs 4 new searches at the 4 positions of +-iOffset (diagonal, not sides). So for AreaMode=1, AMoffset=0, AMstep=1, 4 new searches are performed with +-1 offset (in the current 'pel' scale, which differs for each level and pel setting). For areas with complex motion it may not be effective to make additional searches with iOffset > blocksize, or even > blocksize/2.
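As a rough sketch of the offset equation, the diagonal search positions can be listed like this. The function name `area_mode_offsets` is hypothetical, and counting layers from i=0 is an assumption inferred from the AreaMode=1, AMoffset=0, AMstep=1 example giving +-1.

```python
def area_mode_offsets(area_mode, am_step, am_offset):
    """List the diagonal (dx, dy) offsets from the block center.

    Assumption: layer index i runs from 0 to area_mode - 1, so that
    AreaMode=1, AMstep=1, AMoffset=0 gives an offset of 1.
    """
    positions = []
    for i in range(area_mode):
        off = am_step * i + am_offset + 1  # iOffset from the post
        # Four diagonal positions per layer, no side positions.
        positions += [(-off, -off), (off, -off), (-off, off), (off, off)]
    return positions


print(area_mode_offsets(1, 1, 0))  # one layer at +-1
print(area_mode_offsets(1, 1, 4))  # one layer with AMoffset=4
```

Under this reading, AMoffset=4 with AreaMode=1 places the four searches at +-5 in the current pel scale, because the equation always adds 1 to AMoffset.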

Also, the AreaMode algorithm may have some 'natural' limit on how much MV quality improves as the number of new search positions increases. Currently the internal (not checked) limit of the search positions vector is 100 (such settings are expected to be too slow and probably no one has ever used them). So users can test AreaMode offsets much larger than blocksize as experiments.

Currently compute complexity scales linearly with the AreaMode setting (the number of new search positions is 4 x AreaMode). It is possible to add a check of all other integer positions around the center within a given radius (including side positions and all intermediate ones), but that would raise compute complexity to up to the square of the AreaMode setting. Some future version may add at least a sides-check option (it only doubles the linear complexity). Maybe as an additional AMflags param like:
AMflags 1 - diagonal offsets
AMflags 2 - sides offsets
AMflags 4 - all offsets in the area defined by AreaMode+AMstep+AMoffset.
So the user could select AMflags 1+2=3 for diagonal+sides offsets, for example. For 'big' blocksizes like 16x16 or 32x32, using AreaMode=1, AMoffset=blocksize/4, AMflags=3 may give a better performance/quality balance. For blocksize 8x8, AMflags=3 is equal to all possible positions with AreaMode=1 (4 diagonal and 4 sides), and so on.
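To make the proposal more concrete, here is a sketch of how such flags could select search positions and how the position count grows. AMflags is only an idea from this post and not an existing parameter; the names below are hypothetical, and treating flag 4 as "every integer position in the square up to the largest offset" is an assumption.

```python
AM_DIAGONAL = 1  # proposed: diagonal offsets
AM_SIDES = 2     # proposed: side offsets
AM_ALL = 4       # proposed: all offsets in the covered area


def am_positions(area_mode, am_step, am_offset, am_flags):
    """Return the set of (dx, dy) search offsets for the given flags.

    Assumption: layers are counted from 0, as in
    iOffset = iAMstep * i + iAMoffset + 1.
    """
    offsets = [am_step * i + am_offset + 1 for i in range(area_mode)]
    positions = set()
    if am_flags & AM_DIAGONAL:
        for o in offsets:
            positions |= {(-o, -o), (o, -o), (-o, o), (o, o)}
    if am_flags & AM_SIDES:
        for o in offsets:
            positions |= {(-o, 0), (o, 0), (0, -o), (0, o)}
    if am_flags & AM_ALL and offsets:
        r = max(offsets)  # radius of the covered square
        positions |= {(dx, dy)
                      for dx in range(-r, r + 1)
                      for dy in range(-r, r + 1)
                      if (dx, dy) != (0, 0)}
    return positions


# Position counts for AreaMode=3, AMstep=1, AMoffset=0.
diag_count = len(am_positions(3, 1, 0, AM_DIAGONAL))
diag_sides_count = len(am_positions(3, 1, 0, AM_DIAGONAL | AM_SIDES))
all_count = len(am_positions(3, 1, 0, AM_ALL))
```

In this sketch the diagonal and diagonal+sides counts grow linearly with AreaMode (4 and 8 positions per layer), while the all-positions count grows with the square of the radius. For AreaMode=1 with AMoffset=0, flags 3 and 4 select the same 8 positions.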

Last edited by DTL; 6th April 2024 at 12:18.