Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
13th February 2022, 19:28 | #781 | Link |
Registered User
Join Date: Feb 2016
Location: Nonsense land
Posts: 339
|
There is an Adobe plugin called Twixtor that allows blend interpolation like mvtools, and another mode where it takes one of the interpolated frames and returns it without blending. That mode requires only one motion vector and may give a better interpolation result; at least Twixtor users say it is often superior to blending. Is there any chance of getting such a blend-free interpolation function?
|
16th February 2022, 11:05 | #782 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
I think it would be an important addition to the project: an interface to export motion data from MAnalyse into a user-accessible format inside the scripting environment, and to read it back into client filters.
Dogway already asked for some way to get the MVs out of MAnalyse for inspection. As I found with static images (static parts of moving images) - https://forum.doom9.org/showthread.p...66#post1963966 - the 2-frame-based MAnalyse search cannot decrease noise on the temporal axis, so when processing noisy sources the MV data between MAnalyse and the MDegrain client may require additional filtering. I can add some form of temporal filtering to MDegrainN, but it may be very content-dependent, so it may be better to let many external (non-C) developers experiment with in-between motion-data processing before the final MDegrain blending.

Currently it looks like we have no common/standard exchange format for motion data. With DX12, the Microsoft way of exchange is conversion to a 2D texture of 16+16-bit signed 2-component format (texture size = number of blocks HxV), but it only provides translational x,y motion data. For current AVS+ the idea is to convert to an RGBPS-format texture/clip with the mapping R = x, G = y, B = SAD, and maybe add 2 functions to convert the MAnalyse output pseudo-clip to a MotionData-RGBPS clip and back. Then users of scripts with sample-accessing functions could read, or read and send back, processed (filtered) motion data to the downstream client filters. It may be enough to provide motion data only for level=0 and not the other levels (does MDegrain only use level=0 data?).

Further in the future we would need some extended format, maybe XML-like (?), with many motion (transform) params for each block:
1. Translate (x, y)
2. Rotate (rz (rx, ry)?)
3. Scale (sx, sy)
4. Skew (..., sx, sy)
5..N. SAD scalar unsigned data
For multi-movement search engines, a set of separate RGBPS or Y-PS clips could be used for each type of movement (translate/rotate/scale/etc.), so a client filter that accepts different types of motion data would accept several input motion clips. That would keep compatibility with old versions/scripts.
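The proposed RGBPS layout can be sketched in a few lines. This is only an illustration of the data layout (one float32 plane per component, R = x, G = y, B = SAD, plane size = number of blocks HxV); the converter helpers below are hypothetical and are not real mvtools functions:

```python
import numpy as np

# Hypothetical sketch of the proposed exchange layout. These helpers
# are illustrations only, not existing mvtools/AVS+ APIs.

def mvs_to_rgbps(mvs):
    """(blkV, blkH, 3) records of (x, y, SAD) -> (3, blkV, blkH) planes."""
    mvs = np.asarray(mvs, dtype=np.float32)
    return np.ascontiguousarray(np.moveaxis(mvs, -1, 0))

def rgbps_to_mvs(planes):
    """Inverse: (3, blkV, blkH) planes -> (blkV, blkH, 3) records."""
    planes = np.asarray(planes, dtype=np.float32)
    return np.ascontiguousarray(np.moveaxis(planes, 0, -1))
```

A script could then filter the x/y planes (plane 0 and 1) like any other float clip and convert back before the MDegrain stage.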
Last edited by DTL; 16th February 2022 at 12:18. |
6th March 2022, 11:30 | #783 | Link | |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
Quote:
1. Good news: data-compute accelerators also support dedicated hardware instructions for computing many SADs on input vectors - the msad4 intrinsic in HLSL: https://docs.microsoft.com/en-us/win...ics-hlsl-msad4 . There is also an example of searching for the position of a reference pattern in a buffer. That means the SAD-based motion search of the current MAnalyse could be ported fairly efficiently to a Compute Shader version.

2. A very strange situation with 'low-level' programming of compute accelerators: they do not support 'assembler'-level programming, only C-like languages. Examples of questions about direct 'assembler-level' programming: for DirectX - https://stackoverflow.com/questions/...-in-directx-11 , for NVIDIA CUDA - https://stackoverflow.com/questions/...guage-for-cuda . So currently the compiler must produce the best possible executable for the accelerator and no manual hand-crafting is possible (not officially supported); the programmer has to use higher-level intrinsics or plain C-like statements. That is a partial answer to the question about the current (and maybe future) state of compiler optimization for at least part of today's computing hardware (which is often higher in performance than the host desktop CPU). Accelerators from different manufacturers may not have fully compatible instruction sets, so even in 'compiled' state a program may still carry some pseudo-code for adaptation to the executing hardware at runtime.

It looks like AMD will also someday support the AVX-512 register file and some of its instructions: https://www.extremetech.com/computin...2-channel-ddr5 . So it would be good to test the sub-sample processing (upsizing the 1x level to 2x or 4x for pel=2 and pel=4 search in MAnalyse, and shifting for MDegrainN) with AVX-512 functions. Last edited by DTL; 7th March 2022 at 12:16. |
|
13th March 2022, 14:36 | #784 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,904
|
Quote:
- 3x Intel Xeon 56c/112t AVX-512, 128 GB of RAM
- 2x Intel Xeon 20c/40t AVX2, 64 GB of RAM
- 1x Intel Xeon 10c/20t AVX2, 32 GB of RAM (old server)

and soon-ish I'm gonna have 19 more of those, so:

- 20x Intel Xeon 10c/20t AVX2, 32 GB of RAM (old server)

That should take care of all the extra work. Ideally the 3 monsters will pick up and handle almost only the ProRes, XAVC, MJPEG2000 etc UHD clips, and the other ones will take care of the old/legacy XDCAM-50 FULL HD version and the MPEG-2 12 Mbit/s Long GOP M=3 N=15 SD version of movies, tv series etc. The only nag is that for SD versions only, I always have to call ommcp.exe and remux with the Omneon muxer, 'cause Omneon playout ports have their own special flag and don't follow the normal container or stream flag for aspect ratio. Once you flag it within Omneon, it tells the playback port what to do: not only whether it's 4:3 or 16:9, but also whether you want to crop it, add borders, leave it as it is, etc. I really honestly wish SD would die, 'cause encoding 2022 movies in SD BT.601 only to serve some people really breaks my heart. |
|
28th June 2022, 09:01 | #785 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
Made some graphs to show how MDegrain works with incoming SAD deviations, and to help understand how to set the thSAD (and wpow) value for different input noise-based SAD distributions.
https://i4.imageban.ru/out/2022/06/2...6c7a979dc0.png Also attached to the post to keep a copy on this server.

As for the thSCD1 value: the current idea is that it should be at least as large as thSAD. The default thSCD1 looks to be 400, so when increasing thSAD above 400 it may be required to raise thSCD1 too; otherwise it acts as an internal thSAD limiter and also throws blocks out of processing completely, because they are detected as scene-changed blocks (completely wrongly compensated).

As for AVX-512: I am currently finishing testing of a new processing mode for MDegrainN with pel > 1 that generates the sub-pel compensated block inside the CPU core, instead of fetching a block pre-computed by MSuper() from host memory (the old fully pel-refined super clip is 4x size for pel=2 and 16x size for pel=4). It works about as well and is faster, but for best speed it requires processing inside the chip's register file. The 512-byte AVX2 register file is enough only for 8x8 8-bit and smaller block sizes (so it can fully service only the YV12 8-bit colour format, with 8x8 luma and 4x4 chroma blocks; 4:2:2 YV16 could be added with a 4x8 chroma block size, but I am not sure it is widely used, or even supported by current mvtools at all). For anything larger it is better to use the 2048-byte AVX-512 register file (which also processes the twice-longer vectors a bit faster). So today's 8x8 16-bit and 16x16 16-bit blocks already require AVX-512 for best speed. AVX2 functions for 16-bit 8x8 and 16x16 blocks can be designed, but they will have lower performance because of storing/loading temporary results between the register file and L1d cache, which slows the memory-touching partial operations by a factor of about 5 (not the total function speed). The main reason for these functions would be to save RAM usage, which also adds to processing speed.

I will also post a test sample of MAnalyse with the same 'runtime-calculated' pel > 1 block fetching. But it looks like even with the optimized UMH search it is still about 2 times slower compared with a pre-calculated pel=4 super clip on an i5-9600K chip. I will try to test on an i5-11600 chip with AVX-512 versions of the sub-shift functions later. So the main speed benefit of the new pel > 1 MDegrainN processing comes when using an ME hardware accelerator, where the main speed limit was memory fetching of blocks from the large 16x-sized pel=4 super clip at 4K resolution: about 1.2 fps on an i5-9600K with the old super-clip mode and about 6 fps with the new inside-chip shifting of 1x-sized frames. Last edited by DTL; 28th June 2022 at 09:37. |
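The register-file arithmetic behind the AVX2-vs-AVX-512 claim is easy to check; the 512-byte and 2048-byte totals are from the post itself, and a block's footprint is just block_size squared times bytes per sample:

```python
# Rough capacity arithmetic for the register-file claim above.
# AVX2: 16 ymm registers x 32 bytes = 512 B total.
# AVX-512: 32 zmm registers x 64 bytes = 2048 B total.

def regfile_bytes(n_regs, bytes_per_reg):
    return n_regs * bytes_per_reg

def block_bytes(block_size, bits_per_sample):
    return block_size * block_size * (bits_per_sample // 8)

avx2_total = regfile_bytes(16, 32)      # 512 B
avx512_total = regfile_bytes(32, 64)    # 2048 B

# An 8x8 8-bit block is 64 B, so a few working copies (src, ref,
# temporaries) fit into 512 B. A 16x16 16-bit block alone is 512 B,
# so several working copies already need the AVX-512 register file.
```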
28th June 2022, 10:45 | #786 | Link | |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,577
|
Quote:
What is the latest version of MVTools that I can use with SMDegrain, with no AVSI modification and without an AVX2/512 requirement?
__________________
@turment on Telegram |
|
28th June 2022, 15:00 | #787 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
"How this graph would help us to find the correct values for parameters?"
I hope it can help in understanding MDegrainN activity as parameters change:
1. When thSAD is too low, no tr value will help with degraining.
2. A good working thSAD value has a visible enough 'barrier' or 'step' effect: until it reaches the lower noise-SAD levels it is mostly ineffective. Once it is set above the noise-SAD values it is already nearly maximally effective, and raising thSAD further mostly does nothing useful but may cause more detail blurring.
3. After thSAD reaches its optimal level, most of the degrain-strength adjustment comes from the tr width (value) alone. Increasing thSAD to twice or more above optimal will mostly not add anything useful to the degrain activity.

On the left is placed a rotated graph of different DegrainWeight() functions, scaled to the 'possibly optimal' thSAD vertical value, showing how block weights depend on the SAD and thSAD values with different wpow params. Maybe it would be good to shoot a video lecture of several minutes or more attempting to describe this drawing better.

"What is the latest version of MVTools that I can use with SMDegrain with no AVSI modification, without AVX2/512 requirement?"

In theory any build should be backward compatible with old scripts: all new params default to disabled, and the maximum available version of SIMD co-processor is auto-detected. It was funny to find in the old program text around the MAnalyse() functions that the very ancient 'isse' common param was truncated to SSE or nothing only, and there were no newer functions above 128-bit SSE to use newer chips, so this truncation was never detected until the AVX2 functions were added. Last edited by DTL; 28th June 2022 at 15:05. |
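The falloff described in point 2 can be sketched numerically. The quadratic form below is the classic MVTools DegrainWeight (the fixed wpow=2 case); generalizing the exponent to other wpow values is my assumption for illustration, not the exact MDegrainN code:

```python
def degrain_weight(sad, thsad, wpow=2):
    # Block weight falls smoothly from 1 at SAD=0 to 0 at SAD=thsad.
    # wpow=2 is the classic MVTools formula; other exponents are an
    # assumed generalization for illustration only.
    if sad >= thsad:
        return 0.0
    a = float(thsad) ** wpow
    b = float(sad) ** wpow
    return (a - b) / (a + b)
```

Higher wpow keeps block weights closer to 1 for SAD values well inside thSAD (at SAD=200, thSAD=400: 0.6 with wpow=2 but about 0.88 with wpow=4), which matches the later advice in this thread to raise wpow instead of raising thSAD.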
28th June 2022, 18:23 | #788 | Link | |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,577
|
Quote:
tr=3, thsad=300 -> x265 -> file size tr=4, thsad=400 -> x265 -> file size etc... But it's very time consuming and I have always wondered if there is a way to do it automatically. I remember that you sent here a version with one hardcoded parameter that could really increase speed without affecting quality in a visible manner. Would it be possible to have that version, updated?
__________________
@turment on Telegram Last edited by tormento; 28th June 2022 at 18:26. |
|
28th June 2022, 18:49 | #789 | Link | |
21 years and counting...
Join Date: Oct 2002
Location: Germany
Posts: 716
|
Quote:
I never paid much attention to anything else but prefiltering, tr and thSAD. Simply because I didn't understand enough how all the other parameters are connected with each other. Yes, I read the docs. But that still was way over my head at times. So the very big question is how to determine the somewhat best SAD value for a source file. Right now, much like tormento already mentioned, it's more like a lot of trial and error for me. So I was wondering if there may be any automated way to measure this somehow. It doesn't have to measure the best possible settings but something like a decent base of params to just tweak a little here and there to personal liking. Otherwise starting from scratch for every movie will be a life's work if I ever want to recode my collection. |
|
28th June 2022, 20:06 | #790 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
" that version, updated?"
The main newly developed feature, internal shifting for MDegrainN, needs at least an AVX2 CPU to run faster than the old versions, and this mode still supports only the YV12 format with a Y-block size of 8x8. So it will not run faster on an old AVX-only chip.

"always wondered if there is a way to do it automatically"

An initial estimate of thSAD can be made with MShow(showsad=true). To make it work stably I use a single pair of frames for the search:

Code:
super = MSuper(mt=false, pel=1)
forward_vec1 = MAnalyse(super, isb = false, delta = 1, search=3, chroma=true, mt=false)
MShow(super, forward_vec1, showsad=true)

"increasing tr and thsad at the same time"

thSAD mostly controls 'quality by blurring' and tr the 'amount of degraining'. So initially it is good to set thSAD as high as possible without degrading fine details too much, while already degraining with a relatively low tr like 3, and then increase tr to balance the speed/degraining ratio. Also watch the ratio of thSAD to thSCD1: if thSAD is visibly greater than thSCD1 (or if the mean SAD reported by MShow(showsad=true) is > thSCD1), you need to start raising thSCD1 too. Last edited by DTL; 28th June 2022 at 20:16. |
29th June 2022, 06:55 | #791 | Link | |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,731
|
Quote:
__________________
And if the band you're in starts playing different tunes I'll see you on the dark side of the Moon... |
|
29th June 2022, 08:21 | #792 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
Updated post:
After more looking into the source: yes, the 'usable_flag_arr' array has one entry per frame. The usage of thSCD1 and thSCD2 is mostly as
Code:
bool FakePlaneOfBlocks::IsSceneChange(sad_t nTh1, int nTh2) const
{
	int sum = 0;
	for (int i = 0; i < nBlkCount; i++)
		sum += (blocks[i].GetSAD() > nTh1) ? 1 : 0;
	return (sum > nTh2);
}

So the engine allows some blocks to have SAD > thSCD1 and still be processed (but only while the frame is still marked as usable!). Once the percentage of such blocks exceeds thSCD2, the whole frame is marked unusable and thrown out of processing.

thSCD2 (int, 130): threshold which sets how many blocks have to change for the frame to be considered a scene change. It ranges from 0 to 255, 0 meaning 0 %, 255 meaning 100 %. Default is 130 (which means 51 %).

So the user must carefully watch for the situation where thSAD > thSCD1: it may quickly stop any useful processing, and any increase of tr will then be useless. Last edited by DTL; 29th June 2022 at 10:15. |
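A line-for-line Python transcription of the quoted check, with the 0..255 scaling from the docs made explicit (exactly where nTh2 is pre-scaled from the user's thSCD2 is my reading of the docs, not verified against every source version):

```python
def is_scene_change(block_sads, thSCD1, thSCD2):
    # Mirror of FakePlaneOfBlocks::IsSceneChange: count blocks whose
    # SAD exceeds thSCD1, then flag a scene change once their share
    # exceeds thSCD2 scaled from its 0..255 range (255 = 100 %).
    bad = sum(1 for s in block_sads if s > thSCD1)
    return bad / len(block_sads) > thSCD2 / 255.0
```

With the default thSCD2=130 a frame is thrown away once more than about 51 % of its blocks look scene-changed, which is why raising thSAD past thSCD1 can silently disable processing on noisy frames.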
29th June 2022, 12:53 | #793 | Link |
21 years and counting...
Join Date: Oct 2002
Location: Germany
Posts: 716
|
Just throwing in some numbers...
I had: thSAD=800, tr=12 and thSCD1=default => total file size 7.72GB
now retried with:
thsad=800, tr=12 and thSCD1=600 => 6.59GB
thsad=800, tr=24 and thSCD1=600 => 5.69GB
I think tr=24 and the higher thSCD1 were totally worth it, at least for this source. However, I will not go any higher than 24; speed gets too low for general use. With tr=12 I had 5.18 FPS and with tr=24 it dropped to 2.91 FPS, including prefiltering and a modified "slower" setting of x265. That's barely still okay, but any slower would kill me with my next electrical bill.
Will now lower thSAD closer to thSCD1 and see how much the file size increases, because 800 is already smoothing too much for my taste. |
29th June 2022, 13:35 | #794 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
"thsad=800, and thSCD1=600"
I think in practical use cases thSCD1 should be not less than thSAD. The default is thSAD=thSCD1=400, as I see from the documentation.

" total file size 7.72GB"

If you encode in crf mode with x264 (maybe x265 too) there may also be a 'threshold effect' of the crf value interacting with MDegrain activity: if you set crf high enough that the MPEG encoder does not detect changed blocks, it encodes them as static, there are no residual noise changes, and the output file size drops very visibly. The same happens after increasing thSAD and tr to values high enough that the residual noise falls below the encoder's 'crf threshold': you also get a significant file-size decrease. So you may try researching (thSAD, tr) in mvtools together with crf in the MPEG encoder and see how the output file size changes.

So practically mvtools is a pre-processor for higher-ratio MPEG compression. Moving from MPEG-4 AVC to HEVC may give about +50 % compression ratio, but denoising before MPEG-4 AVC can add hundreds of percent of compression ratio and also make the image cleaner and clearer. I currently get about 22000 kbit/s for non-denoised documentaries in Full HD and about 4500 kbit/s after 'deep denoising' with the same crf=18 x264 encoder, so the additional compression ratio from denoise pre-processing is about 488 %.

"Will now lower thSAD more closer to thSCD1 "

In 2.7.45 and older, the DegrainWeight weighting function has a fixed control param of 2. In newer versions it is the wpow param of MDegrainN and may be set up to 6 (7 = equal weight). It allows increasing block weight inside thSAD without setting thSAD too high. I typically use wpow=4 now. 
BlockWeight = f(wpow, blockSAD) is the graph rotated 90 degrees at the left of the big combined image: https://i4.imageban.ru/out/2022/06/2...6c7a979dc0.png

Unfortunately, with the math function used, wpow > 6 is too slow to compute, so above 6 the SAD-based smooth falloff weighting is disabled and equal weighting is used. So wpow=7 gives the maximum possible degraining at a given thSAD, but may cause additional visual issues. Last edited by DTL; 29th June 2022 at 14:06. |
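For the record, the denoise-preprocessing gain quoted earlier in this post (about 22000 vs 4500 kbit/s at the same crf=18) works out as roughly a 4.9x bitrate reduction:

```python
# Compression gain from denoise pre-processing, using the bitrates
# quoted in this post (same x264 crf=18 for both encodes).
before_kbps = 22000   # non-denoised Full HD documentary
after_kbps = 4500     # after 'deep denoising' with mvtools

ratio = before_kbps / after_kbps    # ~4.89x smaller files
percent = ratio * 100               # ~489 %, i.e. the "about 488 %"
```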
29th June 2022, 22:00 | #795 | Link | |
21 years and counting...
Join Date: Oct 2002
Location: Germany
Posts: 716
|
Quote:
With the source I currently have it hardly goes over 400. I use 600 now which still leaves a little noise in the picture. Also I start to think there is more than just grain in this film that makes it compress so badly. I think I noticed some flickering and mosquito noise around edges too. Looks like the source bitrate was already chosen too low when this Blu-ray was created. |
|
29th June 2022, 22:59 | #796 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,067
|
"Is this the average value found for a frame or the maximum?"
MShow -> showsad shows the mean SAD (scaled to an 8x8 block) after compensating the picture, and the quantity (thSCD2) of bad (thSCD1) blocks. 'Mean' is about the average, I think. It may also be good to add more statistics to MShow, like the ends of the distribution: the mean of the 5..10 % smallest SADs and the mean of the 5..10 % highest SADs. Last edited by DTL; 29th June 2022 at 23:03. |
30th June 2022, 05:59 | #797 | Link |
Registered User
Join Date: May 2018
Posts: 184
|
Yeah. Using whatever thSAD MShow shows ain't gonna work. A quick test with one of my sources shows a range from ~50 to ~200 (so my default of 150 was actually pretty good here).
And yes, having the statistics show some % of lowest and highest SAD would be very helpful.

thSAD NEEDS to be adjusted dynamically and automatically; it should not be a fixed value. What SHOULD be a fixed value is something like a thSADmax, which caps the maximum. OR you could make it so there is a logfile that writes every frame's SAD value and have MDegrain read from it. Basically a 2-pass mode. |
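The 2-pass idea could look something like this sketch: pass 1 logs a per-frame mean SAD (e.g. read off MShow(showsad=true)), and pass 2 derives a per-frame thSAD from it, capped by a fixed thSADmax. Everything here is hypothetical; MDegrain has no such log interface today, and the scale factor is an arbitrary placeholder:

```python
def dynamic_thsad(frame_mean_sads, scale=2.0, thsad_max=800):
    # Hypothetical pass-2 helper: derive a per-frame thSAD as a
    # multiple of the measured mean SAD, capped by thsad_max as
    # suggested above. Both scale and the cap are placeholders.
    return [min(int(s * scale), thsad_max) for s in frame_mean_sads]
```

A real implementation would also need MDegrain (or a wrapper) to accept a per-frame thSAD, which current builds do not expose.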
30th June 2022, 07:48 | #798 | Link | |
Registered User
Join Date: Dec 2005
Location: Sweden
Posts: 703
|
Quote:
Although my material is noisy 4K sLog-2 footage, so for other material the "noise detection" may need to be tweaked or changed to fit the purpose better:
o=last

# Prefilter:
b=fastblur(3)
P=merge(o,b,0.5).ex_levels(12,1.2,100)

pk = converttoRGB()
pl = pk.converttoPlanarRGB()
in = pl.ex_invert()

t=ScriptClip(function[in,pl,pk,p] () {
    lum = in.averageR()
    rgb = pk.RGBDifferenceFromPrevious()
    luma = int(lum + rgb)
    ttr = int(luma*0.0001)
    ths = int(luma*0.0067)
    TSMC(tradius=ttr,lumathresh=ths,auxclip=p)
} )
|
30th June 2022, 08:22 | #799 | Link | |
Registered User
Join Date: May 2018
Posts: 184
|
Quote:
|
|
30th June 2022, 09:29 | #800 | Link | |
21 years and counting...
Join Date: Oct 2002
Location: Germany
Posts: 716
|
Quote:
Last edited by LeXXuz; 30th June 2022 at 09:34. |
|