Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
13th February 2022, 19:28 | #781 | Link |
Registered User
Join Date: Feb 2016
Location: Nonsense land
Posts: 339
|
There is an Adobe plugin called Twixtor that allows blend interpolation like mvtools, and another mode where it takes one of the interpolated frames and returns it without blending. That mode requires only one motion vector and may give a better interpolation result; at least Twixtor users say it is often superior to blending. Is there any chance of getting such a blend-free interpolation function?
|
16th February 2022, 11:05 | #782 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
I think it would be an important addition to the project: an interface to export motion data from MAnalyse into a user-accessible format inside the scripting environment, and to read it back into client filters.
Dogway already asked for some way to get the MVs out of MAnalyse for inspection. As I found with static images (static parts of moving images) - https://forum.doom9.org/showthread.p...66#post1963966 - the 2-frame-based MAnalyse search cannot decrease noise on the temporal axis, so when processing noisy sources the MV data between MAnalyse and the MDegrain client may require additional filtering. I can add some form of temporal filtering to MDegrainN, but it may be very content-dependent, so it may be better to let many external (non-C) developers experiment with in-between motion-data processing before the final MDegrain blending.

Currently it looks like we have no common/standard exchange format for motion data. With DX12, the Microsoft way of exchange is conversion to a 2D texture of 16+16-bit signed 2-component format (texture size = number of blocks HxV), but it only provides translational x,y motion data. For current AVS+ the idea is to convert to an RGBPS-format texture/clip with the mapping R = x, G = y, B = SAD, and maybe add 2 functions to convert the MAnalyse output pseudo-clip to a MotionData-RGBPS clip and back. Then users of scripts with sample-accessing functions could read, or read and send back, processed (filtered) motion data to the downstream client filters. It may be enough to provide motion data only for level=0 and not the other levels (does MDegrain only use level=0 data?).

Further in the future we would need some extended format, maybe XML-like (?), with many motion (transform) params for each block:
1. Translate (x, y)
2. Rotate (rz (rx, ry)?)
3. Scale (sx, sy)
4. Skew (..., sx, sy)
5..N. SAD scalar unsigned data
For multi-movement search engines, a set of separate RGBPS or Y-PS clips could be used for each type of movement (translate/rotate/scale/etc.), so a client filter that accepts different types of motion data would accept several input motion clips. That would keep compatibility with old versions/scripts.
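The proposed RGBPS layout can be sketched in a few lines. This is only an illustration of the data layout (one float32 plane per component, R = x, G = y, B = SAD, plane size = number of blocks HxV); the converter helpers below are hypothetical and are not real mvtools functions:

```python
import numpy as np

# Hypothetical sketch of the proposed exchange layout. These helpers
# are illustrations only, not existing mvtools/AVS+ APIs.

def mvs_to_rgbps(mvs):
    """(blkV, blkH, 3) records of (x, y, SAD) -> (3, blkV, blkH) planes."""
    mvs = np.asarray(mvs, dtype=np.float32)
    return np.ascontiguousarray(np.moveaxis(mvs, -1, 0))

def rgbps_to_mvs(planes):
    """Inverse: (3, blkV, blkH) planes -> (blkV, blkH, 3) records."""
    planes = np.asarray(planes, dtype=np.float32)
    return np.ascontiguousarray(np.moveaxis(planes, 0, -1))
```

A script could then filter the x/y planes (plane 0 and 1) like any other float clip and convert back before the MDegrain stage.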
Last edited by DTL; 16th February 2022 at 12:18. |
6th March 2022, 11:30 | #783 | Link | |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
Quote:
1. Good news: data-compute accelerators also support dedicated hardware instructions for computing many SADs on input vectors - the msad4 intrinsic in HLSL: https://docs.microsoft.com/en-us/win...ics-hlsl-msad4 . There is also an example of searching for the position of a reference pattern in a buffer. That means the SAD-based motion search of the current MAnalyse could be ported fairly efficiently to a Compute Shader version.

2. A very strange situation with 'low-level' programming of compute accelerators: they do not support 'assembler'-level programming, only C-like languages. Examples of questions about direct 'assembler-level' programming: for DirectX - https://stackoverflow.com/questions/...-in-directx-11 , for NVIDIA CUDA - https://stackoverflow.com/questions/...guage-for-cuda . So currently the compiler must produce the best possible executable for the accelerator and no manual hand-crafting is possible (not officially supported); the programmer has to use higher-level intrinsics or plain C-like statements. That is a partial answer to the question about the current (and maybe future) state of compiler optimization for at least part of today's computing hardware (which is often higher in performance than the host desktop CPU). Accelerators from different manufacturers may not have fully compatible instruction sets, so even in 'compiled' state a program may still carry some pseudo-code for adaptation to the executing hardware at runtime.

It looks like AMD will also someday support the AVX-512 register file and some of its instructions: https://www.extremetech.com/computin...2-channel-ddr5 . So it would be good to test the sub-sample processing (upsizing the 1x level to 2x or 4x for pel=2 and pel=4 search in MAnalyse, and shifting for MDegrainN) with AVX-512 functions. Last edited by DTL; 7th March 2022 at 12:16. |
|
13th March 2022, 14:36 | #784 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,904
|
Quote:
- 3x Intel Xeon 56c/112t AVX-512, 128 GB of RAM
- 2x Intel Xeon 20c/40t AVX2, 64 GB of RAM
- 1x Intel Xeon 10c/20t AVX2, 32 GB of RAM (old server)

and soon-ish I'm gonna have 19 more of those, so:

- 20x Intel Xeon 10c/20t AVX2, 32 GB of RAM (old server)

That should take care of all the extra work. Ideally the 3 monsters will pick up and handle almost only the ProRes, XAVC, MJPEG2000 etc UHD clips, and the other ones will take care of the old/legacy XDCAM-50 FULL HD version and the MPEG-2 12 Mbit/s Long GOP M=3 N=15 SD version of movies, tv series etc. The only nag is that for SD versions only, I always have to call ommcp.exe and remux with the Omneon muxer, 'cause Omneon playout ports have their own special flag and don't follow the normal container or stream flag for aspect ratio. Once you flag it within Omneon, it tells the playback port what to do: not only whether it's 4:3 or 16:9, but also whether you want to crop it, add borders, leave it as it is, etc. I really honestly wish SD would die, 'cause encoding 2022 movies in SD BT.601 only to serve some people really breaks my heart. |
|
28th June 2022, 09:01 | #785 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
Made some graphs to show how MDegrain works with incoming SAD deviations, and to help understand how to set the thSAD (and wpow) value for different input noise-based SAD distributions.
https://i4.imageban.ru/out/2022/06/2...6c7a979dc0.png Also attached to the post to keep a copy on this server.

As for the thSCD1 value: the current idea is that it should be at least as large as thSAD. The default thSCD1 looks to be 400, so when increasing thSAD above 400 it may be required to raise thSCD1 too; otherwise it acts as an internal thSAD limiter and also throws blocks out of processing completely, because they are detected as scene-changed blocks (completely wrongly compensated).

As for AVX-512: I am currently finishing testing of a new processing mode for MDegrainN with pel > 1 that generates the sub-pel compensated block inside the CPU core, instead of fetching a block pre-computed by MSuper() from host memory (the old fully pel-refined super clip is 4x size for pel=2 and 16x size for pel=4). It works about as well and is faster, but for best speed it requires processing inside the chip's register file. The 512-byte AVX2 register file is enough only for 8x8 8-bit and smaller block sizes (so it can fully service only the YV12 8-bit colour format, with 8x8 luma and 4x4 chroma blocks; 4:2:2 YV16 could be added with a 4x8 chroma block size, but I am not sure it is widely used, or even supported by current mvtools at all). For anything larger it is better to use the 2048-byte AVX-512 register file (which also processes the twice-longer vectors a bit faster). So today's 8x8 16-bit and 16x16 16-bit blocks already require AVX-512 for best speed. AVX2 functions for 16-bit 8x8 and 16x16 blocks can be designed, but they will have lower performance because of storing/loading temporary results between the register file and L1d cache, which slows the memory-touching partial operations by a factor of about 5 (not the total function speed). The main reason for these functions would be to save RAM usage, which also adds to processing speed.

I will also post a test sample of MAnalyse with the same 'runtime-calculated' pel > 1 block fetching. But it looks like even with the optimized UMH search it is still about 2 times slower compared with a pre-calculated pel=4 super clip on an i5-9600K chip. I will try to test on an i5-11600 chip with AVX-512 versions of the sub-shift functions later. So the main speed benefit of the new pel > 1 MDegrainN processing comes when using an ME hardware accelerator, where the main speed limit was memory fetching of blocks from the large 16x-sized pel=4 super clip at 4K resolution: about 1.2 fps on an i5-9600K with the old super-clip mode and about 6 fps with the new inside-chip shifting of 1x-sized frames. Last edited by DTL; 28th June 2022 at 09:37. |
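The register-file arithmetic behind the AVX2-vs-AVX-512 claim is easy to check; the 512-byte and 2048-byte totals are from the post itself, and a block's footprint is just block_size squared times bytes per sample:

```python
# Rough capacity arithmetic for the register-file claim above.
# AVX2: 16 ymm registers x 32 bytes = 512 B total.
# AVX-512: 32 zmm registers x 64 bytes = 2048 B total.

def regfile_bytes(n_regs, bytes_per_reg):
    return n_regs * bytes_per_reg

def block_bytes(block_size, bits_per_sample):
    return block_size * block_size * (bits_per_sample // 8)

avx2_total = regfile_bytes(16, 32)      # 512 B
avx512_total = regfile_bytes(32, 64)    # 2048 B

# An 8x8 8-bit block is 64 B, so a few working copies (src, ref,
# temporaries) fit into 512 B. A 16x16 16-bit block alone is 512 B,
# so several working copies already need the AVX-512 register file.
```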
28th June 2022, 10:45 | #786 | Link | |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,577
|
Quote:
What is the latest version of MVTools that I can use with SMDegrain, with no AVSI modification and without an AVX2/512 requirement?
__________________
@turment on Telegram |
|
28th June 2022, 15:00 | #787 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
"How this graph would help us to find the correct values for parameters?"
I hope it can help in understanding MDegrainN activity as parameters change:
1. When thSAD is too low, no tr value will help with degraining.
2. A good working thSAD value has a visible enough 'barrier' or 'step' effect: until it reaches the lower noise-SAD levels it is mostly ineffective. Once it is set above the noise-SAD values it is already nearly maximally effective, and raising thSAD further mostly does nothing useful but may cause more detail blurring.
3. After thSAD reaches its optimal level, most of the degrain-strength adjustment comes from the tr width (value) alone. Increasing thSAD to twice or more above optimal will mostly not add anything useful to the degrain activity.

On the left is placed a rotated graph of different DegrainWeight() functions, scaled to the 'possibly optimal' thSAD vertical value, showing how block weights depend on the SAD and thSAD values with different wpow params. Maybe it would be good to shoot a video lecture of several minutes or more attempting to describe this drawing better.

"What is the latest version of MVTools that I can use with SMDegrain with no AVSI modification, without AVX2/512 requirement?"

In theory any build should be backward compatible with old scripts: all new params default to disabled, and the maximum available version of SIMD co-processor is auto-detected. It was funny to find in the old program text around the MAnalyse() functions that the very ancient 'isse' common param was truncated to SSE or nothing only, and there were no newer functions above 128-bit SSE to use newer chips, so this truncation was never detected until the AVX2 functions were added. Last edited by DTL; 28th June 2022 at 15:05. |
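The falloff described in point 2 can be sketched numerically. The quadratic form below is the classic MVTools DegrainWeight (the fixed wpow=2 case); generalizing the exponent to other wpow values is my assumption for illustration, not the exact MDegrainN code:

```python
def degrain_weight(sad, thsad, wpow=2):
    # Block weight falls smoothly from 1 at SAD=0 to 0 at SAD=thsad.
    # wpow=2 is the classic MVTools formula; other exponents are an
    # assumed generalization for illustration only.
    if sad >= thsad:
        return 0.0
    a = float(thsad) ** wpow
    b = float(sad) ** wpow
    return (a - b) / (a + b)
```

Higher wpow keeps block weights closer to 1 for SAD values well inside thSAD (at SAD=200, thSAD=400: 0.6 with wpow=2 but about 0.88 with wpow=4), which matches the later advice in this thread to raise wpow instead of raising thSAD.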
28th June 2022, 18:23 | #788 | Link | |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,577
|
Quote:
tr=3, thsad=300 -> x265 -> file size tr=4, thsad=400 -> x265 -> file size etc... But it's very time consuming and I have always wondered if there is a way to do it automatically. I remember that you sent here a version with one hardcoded parameter that could really increase speed without affecting quality in a visible manner. Would it be possible to have that version, updated?
__________________
@turment on Telegram Last edited by tormento; 28th June 2022 at 18:26. |
|
28th June 2022, 18:49 | #789 | Link | |
21 years and counting...
Join Date: Oct 2002
Location: Germany
Posts: 716
|
Quote:
I never paid much attention to anything else but prefiltering, tr and thSAD. Simply because I didn't understand enough how all the other parameters are connected with each other. Yes, I read the docs. But that still was way over my head at times. So the very big question is how to determine the somewhat best SAD value for a source file. Right now, much like tormento already mentioned, it's more like a lot of trial and error for me. So I was wondering if there may be any automated way to measure this somehow. It doesn't have to measure the best possible settings but something like a decent base of params to just tweak a little here and there to personal liking. Otherwise starting from scratch for every movie will be a life's work if I ever want to recode my collection. |
|
28th June 2022, 20:06 | #790 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
" that version, updated?"
The main newly developed feature, internal shifting for MDegrainN, needs at least an AVX2 CPU to run faster than the old versions, and this mode still supports only the YV12 format with a Y-block size of 8x8. So it will not run faster on an old AVX-only chip.

"always wondered if there is a way to do it automatically"

An initial estimate of thSAD can be made with MShow(showsad=true). To make it work stably I use a single pair of frames for the search:

Code:
super = MSuper(mt=false, pel=1)
forward_vec1 = MAnalyse(super, isb = false, delta = 1, search=3, chroma=true, mt=false)
MShow(super, forward_vec1, showsad=true)

"increasing tr and thsad at the same time"

thSAD mostly controls 'quality by blurring' and tr the 'amount of degraining'. So initially it is good to set thSAD as high as possible without degrading fine details too much, while already degraining with a relatively low tr like 3, and then increase tr to balance the speed/degraining ratio. Also watch the ratio of thSAD to thSCD1: if thSAD is visibly greater than thSCD1 (or if the mean SAD reported by MShow(showsad=true) is > thSCD1), you need to start raising thSCD1 too. Last edited by DTL; 28th June 2022 at 20:16. |
29th June 2022, 06:55 | #791 | Link | |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,731
|
Quote:
__________________
And if the band you're in starts playing different tunes I'll see you on the dark side of the Moon... |
|
29th June 2022, 08:21 | #792 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
Updated post:
After more looking into the source: yes, the 'usable_flag_arr' array has one entry per frame. The usage of thSCD1 and thSCD2 is mostly as
Code:
bool FakePlaneOfBlocks::IsSceneChange(sad_t nTh1, int nTh2) const
{
	int sum = 0;
	for (int i = 0; i < nBlkCount; i++)
		sum += (blocks[i].GetSAD() > nTh1) ? 1 : 0;
	return (sum > nTh2);
}

So the engine allows some blocks to have SAD > thSCD1 and still be processed (but only while the frame is still marked as usable!). Once the percentage of such blocks exceeds thSCD2, the whole frame is marked unusable and thrown out of processing.

thSCD2 (int, 130): threshold which sets how many blocks have to change for the frame to be considered a scene change. It ranges from 0 to 255, 0 meaning 0 %, 255 meaning 100 %. Default is 130 (which means 51 %).

So the user must carefully watch for the situation where thSAD > thSCD1: it may quickly stop any useful processing, and any increase of tr will then be useless. Last edited by DTL; 29th June 2022 at 10:15. |
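A line-for-line Python transcription of the quoted check, with the 0..255 scaling from the docs made explicit (exactly where nTh2 is pre-scaled from the user's thSCD2 is my reading of the docs, not verified against every source version):

```python
def is_scene_change(block_sads, thSCD1, thSCD2):
    # Mirror of FakePlaneOfBlocks::IsSceneChange: count blocks whose
    # SAD exceeds thSCD1, then flag a scene change once their share
    # exceeds thSCD2 scaled from its 0..255 range (255 = 100 %).
    bad = sum(1 for s in block_sads if s > thSCD1)
    return bad / len(block_sads) > thSCD2 / 255.0
```

With the default thSCD2=130 a frame is thrown away once more than about 51 % of its blocks look scene-changed, which is why raising thSAD past thSCD1 can silently disable processing on noisy frames.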
29th June 2022, 12:53 | #793 | Link |
21 years and counting...
Join Date: Oct 2002
Location: Germany
Posts: 716
|
Just throwing in some numbers...
I had: thSAD=800, tr=12 and thSCD1=default => total file size 7.72GB
now retried with:
thsad=800, tr=12 and thSCD1=600 => 6.59GB
thsad=800, tr=24 and thSCD1=600 => 5.69GB
I think tr=24 and the higher thSCD1 were totally worth it, at least for this source. However, I will not go any higher than 24; speed gets too low for general use. With tr=12 I had 5.18 FPS and with tr=24 it dropped to 2.91 FPS, including prefiltering and a modified "slower" setting of x265. That's barely still okay, but any slower would kill me with my next electrical bill.
Will now lower thSAD closer to thSCD1 and see how much the file size increases, because 800 is already smoothing too much for my taste. |
29th June 2022, 13:35 | #794 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
"thsad=800, and thSCD1=600"
I think in practical use cases thSCD1 should be not less than thSAD. The default is thSAD=thSCD1=400, as I see from the documentation.

" total file size 7.72GB"

If you encode in crf mode with x264 (maybe x265 too) there may also be a 'threshold effect' of the crf value interacting with MDegrain activity: if you set crf high enough that the MPEG encoder does not detect changed blocks, it encodes them as static, there are no residual noise changes, and the output file size drops very visibly. The same happens after increasing thSAD and tr to values high enough that the residual noise falls below the encoder's 'crf threshold': you also get a significant file-size decrease. So you may try researching (thSAD, tr) in mvtools together with crf in the MPEG encoder and see how the output file size changes.

So practically mvtools is a pre-processor for higher-ratio MPEG compression. Moving from MPEG-4 AVC to HEVC may give about +50 % compression ratio, but denoising before MPEG-4 AVC can add hundreds of percent of compression ratio and also make the image cleaner and clearer. I currently get about 22000 kbit/s for non-denoised documentaries in Full HD and about 4500 kbit/s after 'deep denoising' with the same crf=18 x264 encoder, so the additional compression ratio from denoise pre-processing is about 488 %.

"Will now lower thSAD more closer to thSCD1 "

In 2.7.45 and older, the DegrainWeight weighting function has a fixed control param of 2. In newer versions it is the wpow param of MDegrainN and may be set up to 6 (7 = equal weight). It allows increasing block weight inside thSAD without setting thSAD too high. I typically use wpow=4 now. 
BlockWeight = f(wpow, blockSAD) is the graph rotated 90 degrees at the left of the big combined image: https://i4.imageban.ru/out/2022/06/2...6c7a979dc0.png

Unfortunately, with the math function used, wpow > 6 is too slow to compute, so above 6 the SAD-based smooth falloff weighting is disabled and equal weighting is used. So wpow=7 gives the maximum possible degraining at a given thSAD, but may cause additional visual issues. Last edited by DTL; 29th June 2022 at 14:06. |
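For the record, the denoise-preprocessing gain quoted earlier in this post (about 22000 vs 4500 kbit/s at the same crf=18) works out as roughly a 4.9x bitrate reduction:

```python
# Compression gain from denoise pre-processing, using the bitrates
# quoted in this post (same x264 crf=18 for both encodes).
before_kbps = 22000   # non-denoised Full HD documentary
after_kbps = 4500     # after 'deep denoising' with mvtools

ratio = before_kbps / after_kbps    # ~4.89x smaller files
percent = ratio * 100               # ~489 %, i.e. the "about 488 %"
```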
29th June 2022, 22:00 | #795 | Link | |
21 years and counting...
Join Date: Oct 2002
Location: Germany
Posts: 716
|
Quote:
With the source I currently have it hardly goes over 400. I use 600 now which still leaves a little noise in the picture. Also I start to think there is more than just grain in this film that makes it compress so badly. I think I noticed some flickering and mosquito noise around edges too. Looks like the source bitrate was already chosen too low when this Blu-ray was created. |
|
29th June 2022, 22:59 | #796 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
"Is this the average value found for a frame or the maximum?"
MShow -> showsad shows the mean SAD (scaled to an 8x8 block) after compensating the picture, and the quantity (thSCD2) of bad (thSCD1) blocks. 'Mean' is about the average, I think. It may also be good to add more statistics to MShow, like the ends of the distribution: the mean of the 5..10 % smallest SADs and the mean of the 5..10 % highest SADs. Last edited by DTL; 29th June 2022 at 23:03. |
30th June 2022, 05:59 | #797 | Link |
Registered User
Join Date: May 2018
Posts: 184
|
Yeah. Using whatever thSAD MShow shows ain't gonna work. A quick test with one of my sources shows a range from ~50 to ~200 (so my default of 150 was actually pretty good here).
And yes, having the statistics show some % of lowest and highest SAD would be very helpful.

thSAD NEEDS to be adjusted dynamically and automatically; it should not be a fixed value. What SHOULD be a fixed value is something like a thSADmax, which caps the maximum. OR you could make it so there is a logfile that writes every frame's SAD value and have MDegrain read from it. Basically a 2-pass mode. |
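The 2-pass idea could look something like this sketch: pass 1 logs a per-frame mean SAD (e.g. read off MShow(showsad=true)), and pass 2 derives a per-frame thSAD from it, capped by a fixed thSADmax. Everything here is hypothetical; MDegrain has no such log interface today, and the scale factor is an arbitrary placeholder:

```python
def dynamic_thsad(frame_mean_sads, scale=2.0, thsad_max=800):
    # Hypothetical pass-2 helper: derive a per-frame thSAD as a
    # multiple of the measured mean SAD, capped by thsad_max as
    # suggested above. Both scale and the cap are placeholders.
    return [min(int(s * scale), thsad_max) for s in frame_mean_sads]
```

A real implementation would also need MDegrain (or a wrapper) to accept a per-frame thSAD, which current builds do not expose.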
30th June 2022, 07:48 | #798 | Link | |
Registered User
Join Date: Dec 2005
Location: Sweden
Posts: 703
|
Quote:
Although my material is noisy 4K sLog-2 footage, so for other material the "noise detection" may need to be tweaked or changed to fit the purpose better:
o=last

# Prefilter:
b=fastblur(3)
P=merge(o,b,0.5).ex_levels(12,1.2,100)

pk = converttoRGB()
pl = pk.converttoPlanarRGB()
in = pl.ex_invert()

t=ScriptClip(function[in,pl,pk,p] () {
    lum = in.averageR()
    rgb = pk.RGBDifferenceFromPrevious()
    luma = int(lum + rgb)
    ttr = int(luma*0.0001)
    ths = int(luma*0.0067)
    TSMC(tradius=ttr,lumathresh=ths,auxclip=p)
} )
|
30th June 2022, 08:22 | #799 | Link | |
Registered User
Join Date: May 2018
Posts: 184
|
Quote:
|
|
30th June 2022, 09:29 | #800 | Link | |
21 years and counting...
Join Date: Oct 2002
Location: Germany
Posts: 716
|
Quote:
Last edited by LeXXuz; 30th June 2022 at 09:34. |
|