Old 27th September 2022, 09:39   #161  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
It looks like the current 'starting' MPB params of MPBNumIt=2 and MPB_SPC=1.5 may be too high for noisy SD sources like VHS captures, and maybe for other very noisy sources. They cause a significant decrease of 'denoising' over the whole frame even with high enough tr and thSAD params. For such sources MPB_SPC may need to be as low as 1.02..1.05 (it is expected to be > 1.0) and MPBNumIt set to 1. These are unexpectedly low values, so in future versions the internal math may be changed to give a more natural adjustment range of about 1.1..2.0. On the other hand, leaving some residual noise may be used instead of 'debanding' after a typical too-clean degrain.

Also the thCohMV param depends on pel, at about pel*4, so thCohMV=16 is my typical value for pel=4. If you use pel=1 it may be lowered to about thCohMV=4.
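
As a quick illustration of the rule of thumb above (thCohMV of about pel*4), here is a minimal Python sketch. The helper name suggested_thcohmv is made up for this example and is not part of mvtools; it only restates the arithmetic from the post.

```python
def suggested_thcohmv(pel):
    """Rule of thumb from the post: thCohMV is roughly pel * 4 (hypothetical helper)."""
    if pel not in (1, 2, 4):
        raise ValueError("MSuper pel is expected to be 1, 2 or 4")
    return pel * 4


# Print the starting value for each pel mode.
for pel in (1, 2, 4):
    print("pel=%d -> thCohMV of about %d" % (pel, suggested_thcohmv(pel)))
```

The result is only a starting point; the post gives it as a typical value, not a hard rule.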

Addition from 02.10.2022: It was found that hardware motion search on an NVIDIA GTX1060 card sometimes produces significant MV errors, which cause significant loss of detail with MDegrainN processing. It is more visible with 'high enough' tr values like 10. Maybe other chips from newer NVIDIA families or from AMD produce better results. So currently only onCPU MAnalyse can be recommended for highest quality work.

Last edited by DTL; 2nd October 2022 at 07:10.
Old 6th October 2022, 18:55   #162  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
New test release https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.17

Added the MPB_PartBlend param to MDegrainN to compare real partial blending vs subtraction of a block. Default is false (use subtraction). If set to true, a full block blend is performed with the test block removed (slower but a bit more accurate in SAD). The subtraction method with integer 8-bit (16-bit intermediate) processing may still give up to +-1 error per sample, probably due to integer rounding, and the total SAD difference from real partial blending, even for a small 8x8 block of 64 samples, may be comparable with a low threshold value. So for best possible quality MPB_PartBlend=true is recommended (it may be significantly slower).

Separated the subtractive and additive coefficients into different params, MPB_SPCsub and MPB_SPCadd, for better flexibility in fine-tuning experiments.

Added the isMVsStable function to check whether the MVs in the current tr-pool for the current block are coherent (stable) enough, so that MPB processing is attempted only in areas where motion search is stable enough across the tr-pool of frames. New MDegrainN param MPBthIVS: the threshold against which the calculated measure of MV non-stability (sum of accelerations multiplied by sum of vector angle differences) is compared. The param is internally scaled by the squared pel value but may depend significantly on tr and other settings. To help adjust this threshold, use the IVS-mask display with showIVSmask=true.

Added protection to MDegrainN against too small padding (it now needs to be at least the block size) and against unequal temporal radius params in MAnalyse and MDegrain: error messages are displayed instead of corrupted output.

Added the showIVSmask param to MDegrainN to mark blocks detected as having stable enough MVs with black. Default false. Black blocks are the ones detected as ready for MPB processing.

Added the mvmultivs param to MDegrainN as an option to provide a separate MV clip, with a different search source or options, for IVS-mask creation. The provided clip must match mvmulti in block count and overlap mode. It is recommended to use the truemotion=false preset of MAnalyse so that noise-moved blocks show up as well as possible, and it is not recommended to make it from a prefiltered clip. The mvmulti clip may use any params required for best denoising, can be created from a prefiltered source, and so on.

Example of using mvmultivs clip with separate MAnalyse settings:
Code:
tr=10

super=MSuper(last, mt=false, chroma=true, pel=2)

multi_vec=MAnalyse(super, multi=true, blksize=8, delta=tr, search=3, searchparam=2, optSearchOption=1, chroma=true, mt=false)
multi_vec_vs=MAnalyse(super, multi=true, blksize=8, delta=tr, search=3, searchparam=2, optSearchOption=1, truemotion=false, pnew=0, pzero=0, chroma=true, mt=false)

MDegrainN(last,super, multi_vec, tr, thSAD=200, thSAD2=190, mt=false, wpow=4, thSCD1=400, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, \
UseSubShift=1, IntOvlp=3, MPBthSub=5, MPBthAdd=20, MPBNumIt=2, MPB_SPCsub=0.5, MPB_SPCadd=1.5, MPB_PartBlend=false, MPBthIVS=1500, showIVSmask=false, mvmultivs=multi_vec_vs)
Typically onCPU MAnalyse with truemotion=false provides significantly better MVs for IVS-mask generation compared with hwAcc search on an NVIDIA GTX1060, but the processing is slower.

The idea of IVS-masked MPB processing is not to increase noise in non-detailed areas (out of focus, clear sky and so on), and to enhance details (and also some noise) only in areas where the motion search engine detects at least some detail and the MVs have some temporal coherence within the current tr-scope around the current frame. The current IVS-mask generation engine is not a final design and may change in the future.

If no mvmultivs clip is provided, the single mvmulti clip is used for all operations (for example with hardware-only MV search). Though if the MV clip is created from an anti-noise prefiltered source, the quality of the IVS-mask may be degraded.

Addition: MPB processing in this build is also significantly redesigned. The old processing tried to adjust the initial weights array from the old DegrainWeight function based on the ratio of block SAD to the current thSAD. The new design creates a new weight array starting from an equal-weight condition (equal to wpow=7). After 1 or more iterations of weight aligning, it finally applies the newly calculated weight array to the initial one using proportional scaling.

Addition2: Currently the MPBthIVS param is auto-scaled internally by pel*pel of the 'main working' mvmulti clip, because the acceleration part of the metric depends on the squared absolute value of the MV coordinate differences, and those are scaled with the pel value. But to make processing faster, a separate MV clip provided for the IVS-mask may be generated with a lower pel mode of MAnalyse (like pel=2 or pel=1), so in future versions it would be good to auto-scale by the pel value of the mvmultivs clip (if provided). The threshold may still depend significantly on other params like the tr value, so it is not completely auto-corrected anyway and needs to be checked, after the many other params are adjusted, before starting production processing.

Last edited by DTL; 7th October 2022 at 12:59.
Old 9th October 2022, 18:30   #163  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
New release https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.18

Added the SSIM metric to MAnalyse and to MPB processing in MDegrainN. The possible dissimilarity metric flags (as a bit mask) are now:
bit 0 - SAD,
bit 1 - SSIM luma only,
bit 2 - SSIM contrast and structure.

Examples:
SAD only = 1
SSIM luma only = 2
SSIM contrast and structure = 4
Full (standard) SSIM = 6
SAD + SSIM contrast and structure = 5
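
The flag arithmetic above can be checked with a small Python sketch. It only models the three bits listed in this post; the DM_FLAGS table and the describe() helper are illustrative names, not part of mvtools.

```python
# Bit values as listed in the post: bit 0 = 1, bit 1 = 2, bit 2 = 4.
DM_FLAGS = {
    1: "SAD",
    2: "SSIM luma",
    4: "SSIM contrast and structure",
}


def describe(value):
    """Return the names of the metrics enabled in a DMFlags-style bitmask."""
    return [name for bit, name in DM_FLAGS.items() if value & bit]


full_ssim = 2 | 4      # SSIM luma + SSIM contrast and structure
sad_plus_cs = 1 | 4    # SAD + SSIM contrast and structure

print(full_ssim, describe(full_ssim))
print(sad_plus_cs, describe(sad_plus_cs))
```

Combining flags with a bitwise OR gives the same numbers as the examples: 6 for full SSIM and 5 for SAD plus SSIM contrast and structure.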

Selecting the dissimilarity metric is supported only in some of the optSearchOption and optPredictorType modes of MAnalyse. For example, optSearchOption=2 has SAD hardcoded in SIMD and cannot be switched.

New param for MDegrainN: MPB_DMFlags. Integer, any combination of the dissimilarity metric bitmask flags, default=1.
New param for MAnalyse: DMFlags. Integer, any combination of the dissimilarity metric bitmask flags, default=1.
New param for MRecalculate: DMFlags. Integer, any combination of the dissimilarity metric bitmask flags, default=1.

The current release has only C-reference, partially float32, SSIM calculation functions (best precision, all block sizes and bit depths supported), so it is very slow and mostly for quality checks. MDegrainN MPB with SSIM is currently about 2 times slower, and MAnalyse with SSIM about 4 times slower. Good benefit is expected from SIMD versions in the future (for a limited number of block sizes and bit depths).

Example of interleaved frames with different dissimilarity metric for MPB processing:
Code:
ssim=MDegrainN(last,super, multi_vec, tr, thSAD=250, thSAD2=240, thSADC=240, mt=false, wpow=4, thSCD1=400, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, \
UseSubShift=1, IntOvlp=3, MPBthSub=2, MPBthAdd=2000, MPBNumIt=2, MPB_SPCadd=1.5, MPB_SPCsub=0.6, MPBthIVS=1500, showIVSmask=false, MPB_DMFlags=6).Subtitle("SSIM")
sad=MDegrainN(last,super, multi_vec, tr, thSAD=250, thSAD2=240, thSADC=240, mt=false, wpow=4, thSCD1=400, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, thMVLPFCorr=50, \
UseSubShift=1, IntOvlp=3, MPBthSub=2, MPBthAdd=2000, MPBNumIt=2, MPB_SPCadd=1.5, MPB_SPCsub=0.6, MPBthIVS=1500, showIVSmask=false, MPB_DMFlags=1).Subtitle("SAD")
Interleave(ssim, sad)
The SSIM metric produces a much larger deviation of the metric value, so when it is used in MAnalyse and MDegrain the thSCD1 param of MDegrain must be raised significantly (like from 400 to 15000 or higher). Check the MShow mean-SAD output to see the expected value. thSAD, though, can stay about the same as with the SAD metric.
Combinations of SAD+SSIM are additive (so thSAD may need to be about 2 times higher if used).
Combinations of SSIM-luma and SSIM-contrast-and-structure are multiplied (to form 'standard SSIM'), so enabling standard SSIM may not require significant thSAD adjustment.

The current conversion of SSIM into something close to SAD for the computing engine:

Dissimilarity metric = (1-SSIM) * maxSAD/2.
Where maxSAD = (3 * nBlkSizeX * nBlkSizeY * (pixelsize == 4 ? 1 : (1 << nBPP)))

So SSIM = -1 (totally different blocks) should reach maxSAD. However, the intermediate values of SAD and SSIM for intermediate block dissimilarities look very different. Maybe it would be good to add some power function to align them a bit better in the future. Though the much higher gain of SSIM around low SAD may help MDegrainN and the MV search.
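
To make the conversion concrete, here is a minimal Python sketch of the two formulas quoted above. The function and argument names are chosen for this example only; the arithmetic mirrors the expressions from the post, and the 8x8, 8-bit defaults are just a sample case.

```python
def max_sad(blk_x, blk_y, bits_per_sample, pixelsize):
    """maxSAD = 3 * nBlkSizeX * nBlkSizeY * (pixelsize == 4 ? 1 : (1 << nBPP))."""
    scale = 1 if pixelsize == 4 else (1 << bits_per_sample)
    return 3 * blk_x * blk_y * scale


def ssim_to_dissimilarity(ssim, blk_x=8, blk_y=8, bits_per_sample=8, pixelsize=1):
    """Dissimilarity metric = (1 - SSIM) * maxSAD / 2, as quoted in the post."""
    return (1.0 - ssim) * max_sad(blk_x, blk_y, bits_per_sample, pixelsize) / 2.0


# Identical blocks, uncorrelated blocks and fully inverted blocks.
for ssim in (1.0, 0.0, -1.0):
    print(ssim, ssim_to_dissimilarity(ssim))
```

With these sample settings, SSIM = 1 maps to 0 and SSIM = -1 maps to the full maxSAD, which is the intended end-point behaviour of the conversion.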

Other metrics like CW-SSIM are expected to be added in the future. A new intermediate helper class, DisMetric, is now used, so adding new metrics is much easier. The new class adds some overhead when selecting and calling the metric function, though, so the performance of the SAD-only metric may be visibly degraded.

Last edited by DTL; 9th October 2022 at 19:11.
Old 10th October 2022, 08:05   #164  |  Link
anton_foy
Registered User
 
Join Date: Dec 2005
Location: Sweden
Posts: 702
Thanks DTL! Just a thought here: would it be possible to implement in mvtools a way to analyse and detect the grain/noise amount (using the GPU?) so that it automatically and dynamically adjusts tr/thSAD as an option? I guess it would be a runtime thing that will slow things down a lot though.
Old 11th October 2022, 15:42   #165  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
I still do not have a good idea of how to implement auto-thresholding. The current single mvmulti clip with low tr may not be applicable and/or may be too slow for something like scene-wide lookahead. So maybe one more analysis MV clip (with single-frame temporal stepping) could be added to MDegrainN, so it can request many following frames fast enough to look from the current frame to the next scene-change point, calculate something like the mean SAD over all frames of the scene, and apply it.

Currently there are still many more ideas to implement.

In the post about the previous version it is not directly mentioned, but with optSearchOption=6 it should now be possible to apply a different metric to MVs searched onHWAcc. This may be the real reason why the DX12 ME API does not output any metric with the MVs: it may be up to the end user to select and calculate a similarity/dissimilarity metric if required. For other search options the metric is either limited to the search metric (optSearchOption=0 and 1) or limited to SAD only (2,3,4,5). So I think MAnalyse requires more options to select separate search and output metrics (a single metric may not be best for search in all cases, I think), and the MV consumer filter may work better with a different metric provided. As I see in the article https://videoprocessing.ai/metrics/w...e-metrics.html there are lots of different metrics available now, and it would be good to check how they work with denoising or other mvtools activity.
Also the compute shader for optSearchOption=5 needs to be redesigned to support the additional metrics selected by the DMFlags (or DMFlagsOutput) option of MAnalyse. Then the accelerator external to the CPU will offload more of the work of computing metrics much more complex than simple SAD.
Old 13th October 2022, 16:51   #166  |  Link
anton_foy
Registered User
 
Join Date: Dec 2005
Location: Sweden
Posts: 702
Yes, loading MDegrain at runtime will probably be too slow, but what about using a mask at runtime to average the degraining? Now that you have added different metrics, could they also help to make this mask?
Old 13th October 2022, 17:16   #167  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
Currently the IVS-mask is there to decrease the level of detail blurring in 'important' areas of the frame while keeping degraining in non-detailed areas good enough. As I see from the latest tests, MAnalyse produces significantly different MVs with the SAD and SSIM metrics in non-detailed areas. So this may be an additional method to create the IVS-mask, from a search over even a single pair of frames, which is faster than analysing the MV differences of a block over many frames. So in new versions MDegrainN may get 2 additional MV clip inputs and a separate tr value for this part of processing, like mvmultivs2 and MPB_IVStr params. Then these mvmulti clips may be created with a much smaller tr and less CPU load, maybe down to tr=1. It may also help to reduce an issue of the current IVS-mask creation algorithm: after a scene change it does not produce any good mask for about tr/2 or more frames.

MAnalyse with the SSIM metric works about 2 times slower than with the SAD metric, even with AVX2 calculation of SSIM (getting sigmas and means with 16..32-bit integer subtraction/addition/multiplication, and final processing in float32 including a full-precision square root). So for better performance it is good to decrease the amount of processing with the SSIM metric where possible.

Last edited by DTL; 13th October 2022 at 21:40.
Old 13th October 2022, 21:46   #168  |  Link
anton_foy
Registered User
 
Join Date: Dec 2005
Location: Sweden
Posts: 702
Very interesting with the new IVS-mask.
Quote:
So in the new versions the MDegrainN may have 2 additional inputs of MVs clips and separate tr-value for this part of processing.
Do you mean like conditional filtering or a more seamless/smooth transition? I was thinking something like this:

Tr1 = MdegrainN(last,super1,multivec, tr=1, thsad=200...)
Tr6 = MdegrainN(last,super2,multivec, tr=6, thsad=400...)
Mask = "ScriptClip the mask, e.g. Blackness() with opacity changing"
Mt_merge(tr6,tr1,mask,luma=true,u=3,v=3)

Changes dynamically between tr1 and tr6 (min/max).

Last edited by anton_foy; 13th October 2022 at 23:02.
Old 13th October 2022, 23:49   #169  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
It is all done inside a single blending engine for better speed. Any sequence of 'filters' in AVS takes lots of RAM for caching between filters, and for each thread. The second main reason for a single blending engine: it can load all the src and ref frame block data from RAM into the CPU L1D cache once and do a lot of processing without touching host RAM any more.

If you chain several filters in AVS, each filter makes a full-frame scan in some order, so data is loaded from host RAM into the CPU many times, and it is much slower.

So for performance reasons it is better to do all processing for each input/output block in a single (maybe very complex) blending engine, rather than writing intermediate results into AVS clip objects in RAM and doing the final blend/merge in separate objects. Unfortunately AVS filters cannot exchange image data in small chunks like blocks or samples: downstream filters request only whole frames from upstream. And a whole-frame request invalidates most of the caches (at least L1D, the fastest one but only about <100 kBytes in size).

If the IVS-mask is useful for other scripting, a special mode of MDegrainN could output the mask in a nicer form, like 256-level grayscale or just plain black and white. Currently its output is designed only for checking the mask placement over the frame data, so it is not clean of image data.

I think that to better understand how the old and current MDegrainN (MAnalyse + MDegrainN) work, and where the many adjustment params come in, it would be good to create a structure scheme. Maybe I will do one in some form for the documentation.

One sad issue with 'direct output' of the IVS-mask: in the better-quality 'overlapped' mode of MDegrainN the mask is also generated for each block of the 'overlapped blending space', and the block count in any overlapped mode is greater than in non-overlapped mode (simple tessellation of the frame into width/blocksizeH by height/blocksizeV blocks). So the IVS-mask cannot be output in simple single-frame form for the good-quality overlapped processing modes. Otherwise some way of outputting an overlapped mask needs to be designed (like doubling or quadrupling the output frame rate to output each part of the mask in a separate frame), which is much more complex. This is another reason for using a single blending engine instead of trying to transfer the mask via the current AVS scripting interface, which is based mostly on simple 'clip' objects of fixed frame size and probably expects masks to be produced at the same size.
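
The difference in block counts can be illustrated with a short Python sketch. The formula used here, (size - overlap) // (blocksize - overlap) per axis, is an assumption based on the usual mvtools block layout and is not taken from this post; the IntOvlp modes of this build may count blocks differently. The function name block_count is made up for the example.

```python
def block_count(width, height, blksize, overlap=0):
    """Blocks per frame for square blocks with a given overlap (assumed layout)."""
    if not 0 <= overlap < blksize:
        raise ValueError("overlap must be in [0, blksize)")
    step = blksize - overlap              # distance between block origins
    blocks_x = (width - overlap) // step
    blocks_y = (height - overlap) // step
    return blocks_x * blocks_y


plain = block_count(1920, 1080, 8)                 # simple tessellation
overlapped = block_count(1920, 1080, 8, overlap=4) # half-block overlap
print(plain, overlapped)
```

Under this assumption a half-block overlap gives roughly four times as many blocks for a 1920x1080 frame, which is why a per-block mask no longer fits a single frame-sized mask clip.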

Last edited by DTL; 14th October 2022 at 00:08.
Old 18th October 2022, 15:16   #170  |  Link
anton_foy
Registered User
 
Join Date: Dec 2005
Location: Sweden
Posts: 702
https://forum.doom9.org/showthread.p...1962308

https://forum.doom9.org/showthread.p...1973516

Possibly for this thread or not, but since it is about improving mvtools I guess it is suitable.
Would using RIFE for/in mvtools be possible?
Old 19th October 2022, 06:18   #171  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
Well, RIFE on a GTX1060 can run 1080i degrain, simulating MDegrain1. It runs at about 2 fps, with only 3 AVS+ threads fitting in the 6 GB RAM of the HWA. 4 or more threads typically throw vkmemoryallocation errors, and the Win10 GPU memory graph hits its max value. Most of the HWA load is Compute1, at about 99..100% in the Win10 GPU performance counters.

For a 1080i source the script was
Code:
FFmpegSource2("src.mxf")

yadifmod2(mode=1)

ConvertToRGB(interlaced=false)

ConvertToPlanarRGB()
ConvertBits(32)
src32=last

even=SelectEvery(2,0)
odd=SelectEvery(2,1) 

even_d=RIFE(even)
odd_d=RIFE(odd)

even_d=SelectEvery(even_d,2,1)
odd_d=SelectEvery(odd_d,2,1)

den_int=Interleave(even_d, odd_d).Trim(1,0)

src32_trim=Trim(src32,1,0)

den_int2=Average(den_int, 0.666, src32_trim, 0.3333)
den_int2.ConvertToRGB24().ConvertToYV12()

SeparateFields()
SelectEvery(4,0,3)
Weave()

Prefetch(3)
It is not a clean field-based denoise of interlaced video (it uses an intermediate 50 fps yadif-deinterlaced form), but it works with interlaced input and interlaced output. It is also not a clean quality test, because the simple yadif deinterlacer adds its own distortions.

The MPEG x264 file size saving with a crf=18 encode is still very visible: the noisy input source, passed via the same yadif deint->reint path, gives about 17.5 Mbps, and the denoised one only 7.8 Mbps on average.

The one good thing for users: it does not have any user-adjustable params like thSAD and the 10..20+ other tweaking params of current mvtools.

A separate RIFE degrain1 function may look like this
Code:
function RIFE_degrain1(clip src)
{
ConvertToRGB(src)
ConvertToPlanarRGB()
ConvertBits(32)
src32=last
even=SelectEvery(2,0)
odd=SelectEvery(2,1) 
even_d=RIFE(even)
odd_d=RIFE(odd)
even_d=SelectEvery(even_d,2,1)
odd_d=SelectEvery(odd_d,2,1)
den_int=Interleave(even_d, odd_d).Trim(1,0)
src32_trim=Trim(src32,1,0)
den_int2=Average(den_int, 0.666, src32_trim, 0.3333)
return den_int2.ConvertToRGB24().ConvertToYV12()
}
It uses the Average plugin - http://avisynth.nl/index.php/Average - which may be replaced with some internal AVS+ weighted blending with a mask, like Layer() or Overlay(). As a tr=1 degrain, it skips the 1st frame of the source clip to keep the function simpler, so the output frame count is the input count minus 1.
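
The 0.666 / 0.3333 weights in the Average() call can be read as an approximation of equal-weight MDegrain1 blending. The Python sketch below works that out under one loudly stated assumption that is not confirmed in the post: the RIFE-interpolated frame behaves like an equal-weight, motion-compensated blend of the previous and next frames. The function name effective_weights is made up for this example.

```python
def effective_weights(w_interp=0.666, w_src=0.3333):
    """Per-frame weights, ASSUMING the interpolated frame is a 50/50
    motion-compensated mix of the previous and next source frames."""
    per_neighbour = w_interp / 2.0
    return {"previous": per_neighbour, "current": w_src, "next": per_neighbour}


weights = effective_weights()
for name, weight in weights.items():
    print(name, round(weight, 4))
print("sum", round(sum(weights.values()), 4))
```

Under that assumption each of the three frames contributes about one third, which matches what an unweighted tr=1 degrain would do. The weights from the script sum to slightly less than 1.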

Last edited by DTL; 19th October 2022 at 07:40.
Old 19th October 2022, 10:05   #172  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 1,795
Not sure if it was asked here, but would DirectStorage be of any use for mvtools? Or maybe other plugins?

https://devblogs.microsoft.com/direc...1-coming-soon/

As far as I understand, a constant CPU<->GPU transfer can be a big bottleneck. DirectStorage solves this by communicating directly with the SSD/NVMe storage without touching the CPU.
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth
VapourSynth Portable FATPACK || VapourSynth Database
Old 19th October 2022, 10:25   #173  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
DirectStorage is mostly about loading large resources from storage to the GPU: the GPU can issue load commands directly to storage, freeing CPU cycles for useful computing. Most mvtools processing is about fetching resources from RAM and math computing. DirectStorage may be an 'environment' feature for AVS (mostly for source plugins) because it loads data from storage into the AVS environment. But as I see it, the AVS developers do not like making the AVS core Windows-dependent. So it may be a feature for some source plugin: load the file from storage directly to the GPU for decompression, then download the decompressed frames to host RAM as AVS resources (frames of a clip). It may save some CPU cycles, but maybe not many.

Maybe some developer can make a DirectShow plugin that uses DirectStorage, so AVS can use it via the existing DirectShowSource input.

About RIFE denoising in a test on some long footage: with default settings its scene change detection looks too poor, and it also creates flickering on some repeating texture patterns in architecture. It may require manual model tweaking/selection and adjusting params like scene change.
Old 24th October 2022, 20:35   #174  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
New test release: https://github.com/DTL2020/mvtools/r.../r.2.7.46-a.19
Added the MPB_MVlth param to MDegrainN. It limits the allowed MV length for weight correction by MPB processing. It can decrease possible 'ghosting' with too extreme MPB_SPCsub/add values. Currently not scaled by pel. Recommended values: about 2..3 times the squared pel value. Valid working range is from 0 to the squared frame size (unlimited). Zero may disable MPB weight adjusting completely.
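
Since MPB_MVlth is not scaled by pel, the recommended range has to be worked out by hand. A minimal Python sketch of that arithmetic follows; the helper name recommended_mvlth is hypothetical and not part of mvtools.

```python
def recommended_mvlth(pel):
    """Recommended MPB_MVlth range from the post: about 2..3 times pel squared."""
    squared = pel * pel
    return (2 * squared, 3 * squared)


# Show the range for each pel mode of MSuper.
for pel in (1, 2, 4):
    low, high = recommended_mvlth(pel)
    print("pel=%d -> MPB_MVlth about %d..%d" % (pel, low, high))
```

For pel=2 the lower end of the range is 8, which is the MPB_MVlth value used in the example settings of this post.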

Added a reduced tr blending mode for MPB, currently enabled by MPB_SPCadd > 10.

Added the MPBtgtTR param to MDegrainN:
In standard MPB mode it controls the initial number of weights used to calculate the initial blend estimate (may be 0: only the current block is used).
Valid range: from 0 to tr.
In reduced weights MPB mode it controls the number of ref frames (2 * tr) used for blending without any other weight adjustment by MPB.

Added the MPB_DMFlags=64 flag. It uses the covariance metric only. Can be used only with MDegrainN.

Added the VIF (DWT-based) metric, controlled by the 0x10 (VIF-Approximation) and 0x20 (VIF-Edges) flags, 16 and 32 decimal. Full VIF (VIF-A * VIF-E) is DMFlags=16+32=48. Can be used both in the MDegrainN MPB flags and in MAnalyse.

Current possibly best settings for processing:
Code:
MDegrainN(last,super, multi_vec, tr, thSAD=250, thSAD2=240, thSADC=240, mt=false, wpow=4, thSCD1=400, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, \
thMVLPFCorr=50, UseSubShift=1, IntOvlp=3, MPBthSub=5, MPBthAdd=5, MPBNumIt=3, MPB_SPCadd=3, MPB_SPCsub=0.3, MPBthIVS=1500, showIVSmask=false, MPB_DMFlags=64, MPB_MVlth=8, MPBtgtTR=tr)
Example of reduced tr blending mode without MPB weights adjusting:
Code:
MDegrainN(last,super, multi_vec, tr, thSAD=250, thSAD2=240, thSADC=240, mt=false, wpow=4, thSCD1=400, adjSADzeromv=0.5, adjSADcohmv=0.5, thCohMV=16, MVLPFGauss=0.9, \
thMVLPFCorr=50, UseSubShift=1, IntOvlp=3, MPBNumIt=1, MPB_SPCadd=11, MPBthIVS=1500, showIVSmask=false, MPBtgtTR=tr-5)
The blocks whose tr is cut to tr-5 are selected by the IVS mask.

With aggressive enough settings like MPBNumIt=3, MPB_SPCadd=3, MPB_SPCsub=0.3 it is recommended to use a low enough MPB_MVlth=8 (about 2 times the squared pel value), or small moving objects may start to have 'ghosts'. This 'ghosting' feature of MPB is not properly fixed (and not even completely debugged), hence the added limit on the max MV length for weight adjustment. The current idea about this ghosting: some blocks far enough from moving objects, with erroneous MVs, get some non-zero weight in the total blending pool of blocks, and the MPB multi-pass greatly amplifies the weight of such blocks because they have some details, so they become visible at the output. So they seem to exist in standard mvtools/MDegrainN processing too, but with their typical low weight they are invisible.

The current MPB_DMFlags=64 uses a covariance-only metric, which is inverted relative to the typical metrics (covariance increases with increasing block similarity) and has no good expected max value that would make it easy to invert and align with the other dissimilarity metrics. So it is currently implemented outside the DisMetric class and uses a separate processing function in MDegrainN only. Currently, with aggressive enough MPB settings, it looks to give the best result for keeping details (and even seems to produce some 'sharpening' effect).

Quote:
Originally Posted by anton_foy View Post
Just a thought here: would it be possible to implement in mvtools a way to analyse and detect the grain/noise amount (using the GPU?) so that it automatically and dynamically adjusts tr/thSAD as an option? I guess it would be a runtime thing that will slow things down a lot though.
Currently, with the progress of multi-pass blending, and in the future multi-frame multi-pass blending, thSAD will become less and less important. In this processing the weights of the blocks to blend are adjusted many times by many functions. So the initial thSAD is for first approximations and for skipping too bad blocks. In the same way, as you see, RIFE does not have thSAD or a similar adjustment and handles block similarity automatically.

About offloading MDegrain MPB and MFMPB to the accelerator: it may be possible, but in some far future. With the increasing computing complexity of 'self-adjusting blending', the penalty for uploading to and downloading results from the accelerator may not be very big even for per-block processing. It is better to upload the full set of frames and download only the resulting frame, but that requires a more complex compute shader design. Also I do not have a good way to debug compute shaders with remote debug now. Though with MDegrainN the host does not need DX12-ME features to run a separate compute shader, so maybe local debugging will work on the much poorer accelerator that exists in some form on my development host with Visual Studio.

Currently MPB is still not very slow, despite much more computing compared with the old single-pass blending mode with thSAD-defined weights only, because it uses very small blocks of data that, once loaded, are well cached in the CPU L1D cache and do not disturb the very slow host RAM any more. So it benefits well from multi-core CPUs and fast cores, even with slow end-user RAM with few memory channels and the poorly performing cache/memory controllers of cheap end-user CPUs.

Last edited by DTL; 26th October 2022 at 04:39.
Old 5th February 2023, 17:46   #175  |  Link
anton_foy
Registered User
 
Join Date: Dec 2005
Location: Sweden
Posts: 702
MCompensate

@DTL, with your versions of mvtools I can run MDegrain scripts but not motion-compensated scripts with MCompensate. It just stops and aborts without an error. Do you know why?

Edit: sorry I just read this
But does it mean your version does not even support the mcompensate at all?

Also here you mention the problem with thSCD but if I do not use scenechange detection at all I put it to zero? Will it make it easier for mvtools to denoise?

For HBD problems mentioned here could you just use two MSuper? Does MAnalyse really degrade quality too much if 8-bit yv12?

Something like this:
Code:
# HBD source
source  = last
source8 = source.ConvertBits(8).ConvertToYV12()
sup     = source.MSuper(levels=1)
sup8    = source8.MSuper()
# MVs are searched on the 8-bit clip; the vector clip itself needs no bit depth conversion
multi_vec_vs = sup8.MAnalyse(multi=true, blksize=8, ...)
MDegrainN(last, sup, multi_vec_vs, ...)

Last edited by anton_foy; 5th February 2023 at 20:11.
Old 5th February 2023, 20:02   #176  |  Link
Selur
Registered User
 
Join Date: Oct 2001
Location: Germany
Posts: 7,259
Any hope for this getting Vapoursynth support?
__________________
Hybrid here in the forum, homepage
Old 5th February 2023, 22:16   #177  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
"does it mean your version does not even support the mcompensate at all?"

MCompensate should be the same as in version 2.7.45 from pinterf. But blocksize 16x16 may be broken (or bit depth >8). Also, blocksize 16x16 does not run well on my GTX1060 with hardware search (in MAnalyse) and causes some crash that the remote debugger cannot break into. So I use only a block size of 8x8. With scripting it is possible to feed a half-size frame to MAnalyse with blocksize 8x8 and use MScaleVect to map the output MVs to the full frame size with 16x16 blocks, if you want to try. It may also be faster for large frame sizes.

"thSCD but if I do not use scenechange detection at all I put it to zero?"

Scene-change detection is disabled when thSCD is around maxSAD (and the value may be internally silently clipped to maxSAD). So to disable scene-change detection you need to set thSCD to some very big value (maybe >1000 or >10000), not to zero.
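As a sketch of what that could look like in a script: stock mvtools2 exposes the scene-change thresholds as thSCD1 (per-block SAD threshold) and thSCD2 (share of changed blocks, 0..255). That "thSCD" here means these two parameters is an assumption, and the values below are only illustrative.

```avisynth
# Hypothetical source: replace 'last' with your own 8-bit YV12 clip.
src = last
sup = src.MSuper(pel=2)
bv  = MAnalyse(sup, isb=true,  delta=1, blksize=8)
fv  = MAnalyse(sup, isb=false, delta=1, blksize=8)

# Very large thSCD1 (it may be clipped internally) so that practically no block
# counts as "changed"; thSCD2=255 additionally requires nearly all blocks
# to be changed before a frame is treated as a scene change.
MDegrain1(src, sup, bv, fv, thSAD=300, thSCD1=10000, thSCD2=255)
```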

"could you just use two MSuper? Does MAnalyse really degrade quality too much if 8-bit yv12?"

Yes - you can use an 8-bit source for MAnalyse and use the output MVs clip for MDegrain processing at any bit depth. If the source is noisy enough, it is also self-dithered enough, so there may be no significant degradation of motion search quality. This can also be tested against software 16-bit MAnalyse.

"Any hope for this getting Vapoursynth support?"

That is a question for the Vapoursynth developers: whether they can take the sources and port them to Vapoursynth. I do not know how to do it.
Old 5th February 2023, 23:19   #178  |  Link
anton_foy
Registered User
 
Join Date: Dec 2005
Location: Sweden
Posts: 702
Quote:
Originally Posted by DTL View Post
MCompensate should be the same as in pinterf's 2.7.45 version, but block size 16x16 (or bit depth >8) may be broken. Block size 16x16 also does not run well on my GTX1060 with hardware search (in MAnalyse): it causes a crash that the remote debugger cannot break into. So I use only a block size of 8x8. If you want to try, it is possible with scripting to feed a half-size frame to MAnalyse with blocksize 8x8 and use MScaleVect to map the output MVs to the full frame size with 16x16 blocks. It may also be faster for large frame sizes.

"thSCD but if I do not use scenechange detection at all I put it to zero?"

Scene-change detection is disabled when thSCD is around maxSAD (and the value may be internally silently clipped to maxSAD). So to disable scene-change detection you need to set thSCD to some very big value (maybe >1000 or >10000), not to zero.

"could you just use two MSuper? Does MAnalyse really degrade quality too much if 8-bit yv12?"

Yes - you can use an 8-bit source for MAnalyse and use the output MVs clip for MDegrain processing at any bit depth. If the source is noisy enough, it is also self-dithered enough, so there may be no significant degradation of motion search quality. This can also be tested against software 16-bit MAnalyse.
Thanks! Then I will try 8x8 blocks with MAnalyse and MCompensate. But will there be no speed benefit from DX12-ME (GPU) when using MCompensate? Is it only used for MAnalyse? That sounds a bit similar to SVAnalyse in SVPflow then. When using MScaleVect I had bad experiences with too soft/blurry results, but I will try.
Old 5th February 2023, 23:32   #179  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
MCompensate is a client function of the MAnalyse MVs server, same as MDegrain and all the others in mvtools. MAnalyse can offload the MVs search to the DX12-ME hardware accelerator.
MRecalculate is something intermediate: it receives MVs from MAnalyse and makes a refinement search (which can not be offloaded to the DX12-ME accelerator, because the accelerator makes only a single search from the beginning).

In theory, as MDegrainN progresses, it may also make some refinement search in multi-pass processing, on CPU only.

MAnalyse in DX12-ME mode should be able to do both a single frame-pair search and the 'multi' search for MDegrainN (keeping the current frame stored in the accelerator instead of uploading it for each search pair in 'multi' mode, for better performance). So it should be compatible with all other MVs clients in mvtools.

The hardware ME from the MPEG encoder does not look designed for best quality; software search may be better (and with iterative multi-pass, better still). So for highest-quality work, hardware ME may be used only for the initial search, or maybe for prefiltering.
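To illustrate the server/client split described above, a minimal sketch with stock mvtools2 calls only: one pair of MAnalyse vector clips feeds MCompensate, MRecalculate and MDegrain1. The clip names and all parameter values are assumptions for illustration; no DX12-ME-specific options are used.

```avisynth
# Hypothetical source: replace 'last' with your own 8-bit YV12 clip.
src = last
sup = src.MSuper(pel=2)

# "Server": MAnalyse produces the motion vector clips once.
bv = MAnalyse(sup, isb=true,  delta=1, blksize=8)
fv = MAnalyse(sup, isb=false, delta=1, blksize=8)

# Client 1: motion compensation of the neighbour frames onto the current one.
comp_b = MCompensate(src, sup, bv)
comp_f = MCompensate(src, sup, fv)

# Intermediate step: refine the existing vectors on the CPU with smaller blocks.
bv_r = MRecalculate(sup, bv, blksize=4, thSAD=200)
fv_r = MRecalculate(sup, fv, blksize=4, thSAD=200)

# Client 2: temporal denoising driven by the refined vectors.
den = MDegrain1(src, sup, bv_r, fv_r, thSAD=300)

# Return the denoised clip; Interleave(comp_f, src, comp_b) would show the compensation instead.
return den
```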
Old 7th February 2023, 13:22   #180  |  Link
anton_foy
Registered User
 
Join Date: Dec 2005
Location: Sweden
Posts: 702
Code:
#DTL:
pre=convertbits(8,dither=-1).converttoyv12().dipre()
tr=6
super = last.MSuper (pel=4, levels=1, chroma=true)
sup8 = pre.MSuper (pel=4, chroma=true)
#multi_vec = MAnalyse (sup8, multi=true, blksize=8, delta=tr, optSearchOption=5, overlap=0, levels=1, chroma=true).convertbits(16).converttoYUV444()
multi_vec=MAnalyse(sup8, multi=true, blksize=8, delta=tr, search=3, searchparam=2, overlap=0, optSearchOption=1, optPredictorType=4, chroma=false, mt=false, levels=1).convertbits(16).converttoYUV444()
#MDegrainN (super, multi_vec, tr, thSAD=150, thSAD2=250)
Mcompensate(last,super,multi_vec,thsad=thsad,tr=tr,center=true,mt=false)
temporalsoften(6,75,76,255,2)
selectevery(tr*2+1,tr)
I get the error "unhandled C++ exception" with this script, and the same error if I use the MDegrainN line instead. If I use the first MAnalyse line, I get this error: "MAnalyse: frame width not supported by DX12_ME, max supported width 0".