Doom9's Forum
24th November 2021, 19:16 | #741
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,542
Quote:
I can't even get a nice error, they simply produce a 0 size file.
__________________
@turment on Telegram
24th November 2021, 19:47 | #742
Registered User
Join Date: Jul 2018
Posts: 1,041
Oh, there is at least one error: optPredictorType is hard-set internally, so it looks like every build runs with PT=0. Sorry, I didn't test the output after building. I'll try to find out what is wrong.
"simply produce a 0 size file." Maybe it simply crashes at startup. Can you check the Windows Event Viewer or the Dr. Watson log for any crash records? It is strange: it should use only SSE instructions up to SSE4.1, and it was checked on an Intel Core2Duo E7500. I will enable a check for the required SSE version in the next test build.
Edit: Found the source of one error: Visual Studio opened files for editing from a different copy of the project, so all .dlls were built from identical sources. Will rebuild now. This still doesn't explain why processing produces no output at all. Last edited by DTL; 24th November 2021 at 20:19.
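A minimal sketch of the kind of startup check mentioned above, assuming GCC or Clang on x86. The `SimdLevel` enum and both function names are hypothetical, not taken from the plugin; an MSVC build would query `__cpuid` instead, and on such compilers this sketch simply reports `None`:

```cpp
#include <cassert>

// Hypothetical SIMD levels, ordered so that a higher value implies the lower ones.
enum class SimdLevel { None = 0, SSE2, SSSE3, SSE41, SSE42, AVX2, AVX512F };

// Highest level reported by the CPU at runtime. __builtin_cpu_supports is a
// GCC/Clang builtin available only for x86 targets; other compilers/targets
// fall back to None.
SimdLevel detect_simd_level() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return SimdLevel::AVX512F;
    if (__builtin_cpu_supports("avx2"))    return SimdLevel::AVX2;
    if (__builtin_cpu_supports("sse4.2"))  return SimdLevel::SSE42;
    if (__builtin_cpu_supports("sse4.1"))  return SimdLevel::SSE41;
    if (__builtin_cpu_supports("ssse3"))   return SimdLevel::SSSE3;
    if (__builtin_cpu_supports("sse2"))    return SimdLevel::SSE2;
#endif
    return SimdLevel::None;
}

// A build compiled for `required` may run only if the CPU reaches that level;
// otherwise the plugin could report an error instead of crashing silently.
bool can_run_build(SimdLevel available, SimdLevel required) {
    return static_cast<int>(available) >= static_cast<int>(required);
}
```

For example, an SSE4.1 build would call `can_run_build(detect_simd_level(), SimdLevel::SSE41)` once at load time and refuse to start with a clear message if it returns false.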
24th November 2021, 21:28 | #743
Registered User
Join Date: Jul 2018
Posts: 1,041
Well, here is a new fixed test build, checked to run on a Core2Duo E7500: https://drive.google.com/file/d/1ry3...ew?usp=sharing
It has both SO0 and SO1 builds for all PT values. The PT2 versions were removed because they cannot run correctly without limiting levels to about 2..3. PT4 requires lowering thSAD by about 1.5 times compared with the other PT values. On the old E7500 CPU, optSearchOption=1 with some new SSE optimizations enabled runs slower; it looks like not all old SSE CPUs run faster with SSE versions of functions instead of C. So it definitely must not be enabled unconditionally in the final release; it has to stay user-controlled.
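For PT4, the rule of thumb above amounts to dividing the thSAD used with the other PT values by about 1.5. A tiny sketch, where `thsad_for_pt4` is a hypothetical helper and 1.5 is only the approximate factor from the post, so it is left as a parameter:

```cpp
#include <cassert>

// Hypothetical helper: scale a thSAD tuned for PT0/PT1 down for PT4.
// The ~1.5 factor is a rough rule of thumb, not an exact constant.
int thsad_for_pt4(int standard_thsad, double factor = 1.5) {
    assert(standard_thsad >= 0 && factor > 0.0);
    // Round to the nearest integer for use as a script value.
    return static_cast<int>(standard_thsad / factor + 0.5);
}
```

For example, a script using thSAD=300 with PT0 would start at about thSAD=200 with PT4 and then be tuned by eye.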
26th November 2021, 01:08 | #744
Registered User
Join Date: Jul 2018
Posts: 1,041
Issue found with the latest test build: a crash with block size 16x16 and ALIGN_SOURCEBLOCK = 1 ('asb1' in the file names, aligned copy disabled) at the default padding of 8. If the padding is increased to 16 (in MSuper), the crash does not happen. So users of scripts with default MSuper params (hpad=vpad=8) will get a crash if they use the faster 'asb1' builds with block size 16x16. It looks like the x264 SSE2 and SSSE3 16x16 SAD functions were never tested with the aligned copy of the source block disabled.
I hope to find a workaround for this issue. The current user-side workaround: if a crash occurs, increase hpad and vpad to about the block size or larger.
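The user-side workaround can be written down as a small rule. This is a hypothetical helper (`SuperPadding` and `safe_padding` are illustrative names, not part of MVTools). It only encodes "raise hpad/vpad to at least the block size for asb1 builds" and does not fix the underlying SAD-function issue:

```cpp
#include <algorithm>
#include <cassert>

// Padding values as they would be passed to MSuper (hpad, vpad).
struct SuperPadding { int hpad; int vpad; };

// For builds with the aligned source-block copy disabled ('asb1'), raise the
// padding to at least the block size; other builds keep the requested values.
SuperPadding safe_padding(int blksize, SuperPadding requested, bool aligned_copy_disabled) {
    assert(blksize > 0 && requested.hpad >= 0 && requested.vpad >= 0);
    if (!aligned_copy_disabled) return requested;
    return { std::max(requested.hpad, blksize), std::max(requested.vpad, blksize) };
}
```

With blksize=16 and the default hpad=vpad=8, an asb1 build would get hpad=vpad=16, which is the setting reported above to avoid the crash.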
30th November 2021, 16:14 | #745
Registered User
Join Date: Nov 2009
Posts: 2,352
I got a BSOD with the latest official build while trying to return a scaled MV clip... I panicked. I lost the dev version of my SMDegrain script (it was filled with zeros); luckily it wasn't much, only a few commented expressions and notes.
This was more or less the trigger. I was trying to check the bv1 clip dimensions to debug an issue I was having. Environment: i7-4790K, Win7 SP1 x64, AVS+ test29 x64, no avstp.dll in the plugin path.
Code:
setmemorymax(2048)
DGSource(bluray source)
ConvertBits(16)
w=width()
h=height()
bicubicresize(w*2,h*2)
pref8 = ConvertBits(8, dither=-1)
pref8 = pref8.BilinearResize(w, h)
pref8 = pref8.ConvertToYUV420(false,"","MPEG1","spline16","top_left").ConvertBits(16)
pref8 = pref8.ex_Luma_Rebuild(S0=3.0,c=0.0625,uv=3,tv_range=true,fulls=false).ConvertBits(8, dither=-1)
super_search = MSuper(pref8, pel=1, chroma=true, hpad=0, vpad=0, sharp=1, rfilter=4, mt=true)
Recalculate = MSuper(pref8, pel=1, chroma=true, hpad=0, vpad=0, sharp=1, rfilter=4, mt=true, levels=1)
bv1 = super_search.MAnalyse(isb=true, delta=1, overlap=8, blksize=16, search=4, chroma=true, truemotion=false, divide=0, dct=0, searchparam=2, pelsearch=1, temporal=false, trymany=false, scaleCSAD=1, mt=true)
bv1 = MRecalculate(Recalculate, bv1, overlap=4, blksize=8, thSAD=200, chroma=true, truemotion=false, divide=0, dct=0, scaleCSAD=1, mt=true)
bv1 = bv1.MScaleVect()
bv1 # without prefetch, in avspmod
__________________
i7-4790K@Stock::GTX 1070 | AviSynth+ filters and mods on GitHub + Discussion thread
Last edited by Dogway; 30th November 2021 at 16:30.
30th November 2021, 16:55 | #746
Registered User
Join Date: Jan 2014
Posts: 2,309
Quote:
bv1 is a 172444 x 1 RGB32 clip. It works for me from avsmeter64 and in 64-bit AvsPmod as well.
30th November 2021, 17:30 | #747
Registered User
Join Date: Nov 2009
Posts: 2,352
Oh well, thanks for testing; I didn't feel brave enough to reproduce it. I guess the very long clip did something to my RAM; I was also running low on disk space, so that could be a factor. I thought bv1 was similar to an MSuper clip. From now on I will debug without returning MV clips, lesson learned.
3rd December 2021, 12:31 | #748
Registered User
Join Date: Jul 2018
Posts: 1,041
Small but important update based on pinterf's sources from 9 November 2021: https://drive.google.com/file/d/1EEY...ew?usp=sharing (it should run stably with block size 16).
Added a check of predictor coordinates to skip re-checking a predictor that has already been checked. This should make optPredictorType=0 (all predictors, the old default) close to PT1 in speed while still keeping all possible predictors. In real footage many predictors are equal (among Zero, Global, Median and the 4 neighbours, plus Temporal if enabled), so keeping track of already checked predictors saves some calls to the single-block SAD() function, which is not SIMD-friendly and hard to optimize. The speed gain is content-dependent: completely static sources like ColorBars() will show a larger speedup, so it is better to test speed on real footage with varied motion. Also included: a separate file with very small SSE4.1 optimizations, with SO=1 hardcoded inside for users of old scripts. Last edited by DTL; 3rd December 2021 at 12:44.
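A sketch of the predictor de-duplication idea, assuming candidates are plain integer vectors and using a caller-supplied `sad_at` callback in place of the real SAD() function (all names here are hypothetical, not the plugin's code). With a static source every candidate collapses to (0,0), so only one SAD is computed:

```cpp
#include <cassert>
#include <vector>

struct MV { int x; int y; };

struct PredictorResult {
    MV best;        // predictor with the lowest SAD
    int best_sad;   // its SAD
    int sad_calls;  // how many times the SAD callback actually ran
};

// Evaluate candidate predictors (e.g. Zero, Global, Median, 4 neighbours,
// optional Temporal) but skip any whose coordinates were already checked,
// since the same vector always yields the same SAD for the same block.
template <typename SadFn>
PredictorResult best_predictor(const std::vector<MV>& candidates, SadFn sad_at) {
    assert(!candidates.empty());
    std::vector<MV> checked;  // only a handful of entries, so a linear scan is cheap
    PredictorResult r{candidates.front(), 0, 0};
    for (const MV& c : candidates) {
        bool seen = false;
        for (const MV& p : checked)
            if (p.x == c.x && p.y == c.y) { seen = true; break; }
        if (seen) continue;  // duplicate coordinates: no new SAD call
        checked.push_back(c);
        const int sad = sad_at(c);
        if (r.sad_calls == 0 || sad < r.best_sad) { r.best = c; r.best_sad = sad; }
        ++r.sad_calls;
    }
    return r;
}
```

With seven candidates that contain only three distinct vectors, the callback runs three times instead of seven; the saving depends entirely on how often predictors coincide in the footage.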
3rd December 2021, 17:31 | #750
Registered User
Join Date: Jul 2018
Posts: 1,041
It was built by the system: https://drive.google.com/file/d/1B6E...ew?usp=sharing
but it has not been properly tested yet, so it may not work correctly.
7th December 2021, 18:20 | #751
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,542
Quote:
I am now using the SO1 version instead of the stable one, because of its speed and good results. If you want to try an SSE42 build, my CPU supports it and perhaps we can get a little speed bump.
Last edited by tormento; 7th December 2021 at 22:57.
7th December 2021, 23:20 | #752
Registered User
Join Date: Jul 2018
Posts: 1,041
It looks like I have since found a bug in that build from December 3: it may skip some valuable predictors and decrease degraining quality. Here is a hopefully bug-fixed build (both x64 and x86):
https://drive.google.com/file/d/1kMc...ew?usp=sharing
"SSE42 build, my CPU supports it and perhaps we can get a little speed bump." Unfortunately, SSE4.2 does not add anything significant. The only way to boost performance with all predictors and refining at all levels is an AVX2 or, better, an AVX512-capable chip. For old CPUs, the only option is to try 'logical optimizations' like PT=4 mode, with pure interpolated prediction at level 0. It may give lower degraining quality, but it is the fastest possible mode. It is also planned to port the InterpolatePrediction() function to SIMD (of a low family like plain SSE), which may add some speed on SSE-level chips. But that is still lower priority: I am currently developing multi-block search for AVX2 and AVX512, and I am interested in the difference between 4/8-block AVX2 processing and 16/32-block AVX512 processing, with block size 8x8. Today the 4-block 8x8 sp1 AVX2 function turned from a pure technical speed test into something that works for degraining.
Addition: PT=4 does require re-adjusting the thSAD value in MDegrain (about 1.5 times lower than the 'standard' value, because it outputs the SAD from level 1, which is typically lower). Using the 'standard' thSAD value may cause too much detail blurring, as with any too-high thSAD setting. Last edited by DTL; 8th December 2021 at 00:57.
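To show the data flow of a 4-block sp1 search, here is a plain scalar sketch, not the actual AVX2 code. It assumes sp1 means testing the 8 positions at distance 1 around each block's current vector, and `sad8x8` and `refine_sp1_4blocks` are hypothetical names. A SIMD version would evaluate the 4 blocks in parallel instead of in the outer loop:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>

constexpr int kBlk = 8;     // 8x8 blocks, as in the post
constexpr int kBlocks = 4;  // 4 blocks per call, as in the AVX2 prototype

struct Vec { int x; int y; };

// Plain SAD of an 8x8 block in the current frame vs. the reference frame.
// Both planes share one stride; the caller keeps positions inside the plane.
int sad8x8(const uint8_t* cur, const uint8_t* ref, int stride) {
    assert(stride >= kBlk);
    int sum = 0;
    for (int y = 0; y < kBlk; ++y)
        for (int x = 0; x < kBlk; ++x)
            sum += std::abs(int(cur[y * stride + x]) - int(ref[y * stride + x]));
    return sum;
}

// For each block: evaluate its current vector, then the 8 neighbours at
// distance 1, keeping the lowest SAD (4 blocks x 8 positions = 32 points).
void refine_sp1_4blocks(const uint8_t* cur, const uint8_t* ref, int stride,
                        const std::array<Vec, kBlocks>& block_pos,
                        std::array<Vec, kBlocks>& mv,
                        std::array<int, kBlocks>& best_sad) {
    for (int b = 0; b < kBlocks; ++b) {
        const uint8_t* c = cur + block_pos[b].y * stride + block_pos[b].x;
        auto ref_at = [&](Vec v) {
            return ref + (block_pos[b].y + v.y) * stride + (block_pos[b].x + v.x);
        };
        best_sad[b] = sad8x8(c, ref_at(mv[b]), stride);
        const Vec start = mv[b];
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                if (dx == 0 && dy == 0) continue;  // centre already evaluated
                const Vec v{start.x + dx, start.y + dy};
                const int s = sad8x8(c, ref_at(v), stride);
                if (s < best_sad[b]) { best_sad[b] = s; mv[b] = v; }
            }
    }
}
```

Because every block runs the same fixed sequence of 8 offsets, the outer loop over blocks is what maps naturally onto vector lanes; wider registers allow more blocks per pass.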
8th December 2021, 08:25 | #756
Registered User
Join Date: Jul 2018
Posts: 1,041
I think the best home chip right now is something like the i5-11400. But it looks like it needs about 200 W of unlocked power, and a suitable cooler, to run AVX512 at good performance. If run within its rated 65 W TDP limit, it apparently throttles itself to a much lower performance level.
8th December 2021, 08:32 | #757
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,542
Quote:
Alder Lake is a nice beast; unfortunately, you have to disable the E-cores to get AVX-512 back.
8th December 2021, 11:48 | #758
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
Quote:
Bring it on for the next stable release; Sky servers will thank you for the AVX512 build speed-up!
8th December 2021, 12:43 | #759
Registered User
Join Date: Jul 2018
Posts: 1,041
Quote:
As for DDR5 vs DDR4, I am not sure it makes much difference. As I see it, with a typical latency of about 50 ns, the real random-access byte-read speed is about 20 MBytes/s, while linear transfer is typically > 50 GBytes/s nowadays. The gap is about 2500 times. Unfortunately, SDRAM latency has improved only about 2 times in about two decades.
Can you run a speed test of 64x16 vs 16x64 block processing? On the old Core 2 Duo E7500 CPU I got about a 60% speed difference, but on the i5-9600 and i5-11500 much less (it looks like the latest Intels have better-tuned hardware prefetchers and really have about 10 times more cache). I am thinking about redesigning the MDegrainN memory access pattern for faster memory access, but that also needs time, plus data on whether it will significantly help newer CPUs.
"AVX512?" Yes: it has a 4 times larger register file and allows processing 4 times more blocks in a single search op (if the vector coherency domain is large enough, which frequently happens). But it looks like something is still bad with consumer-level AVX512 Intels: testers report large power over-budget when the CPU is loaded with calculations and power is not limited at the motherboard power supply. So it either overheats (with a small, funny box cooler) and auto-throttles, or overloads the motherboard power supply and causes a crash/BSOD/etc., or even burns the motherboard. I personally had a motherboard burn out in the Pentium II era: it had 2 slots, and 1 of the 2 burned out one night. So it looks like 14 nm Intel chips cannot run AVX512 processing even at nominal frequency and start to auto-throttle. So the performance of consumer-level AVX512 chips may still be limited, or a very good (water?) cooler is required, plus a special motherboard with a large power margin over the rated chip TDP (like 3 times larger). I wonder how server-class Intel chips with > 10 AVX512 cores run at full speed for years. I hope newer 7 nm Intel chips will be less power-hungry with AVX512 processing. But that is still the future.
" will thank you for the AVX512 build speed-up!" Unfortunately, creating 'massive multi-block' versions of the search functions takes lots of time for checking. The 'very simple' 4-block sp1 AVX2 function took a good part of a day to check all 4 blocks x 8 positions per block = 32 test points. For AVX512, up to 32 blocks are planned: 32x8 = 256 test points. Otherwise, special software would have to be built to automate the testing. And for level > 0, sp2 versions are required, with 24..25 search positions for each block: 32x25 = 800 points to test for full checking. The performance of new hardware quickly outpaces a programmer's ability to write software for it. I hope an AVX512 16/32-block 'tech demo' of SearchOption=4 will soon be available to check the possible AVX512 speedup. Last edited by DTL; 8th December 2021 at 13:17.
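The figures in this post are simple arithmetic and can be checked directly. A sketch under stated assumptions: each random byte read waits the full ~50 ns with no overlap or prefetching; register counts are those of x86-64 (16 ymm registers for AVX2, 32 zmm registers for AVX-512); and sp1/sp2 are taken to mean an exhaustive square search of radius 1/2, so a block has (2r+1)^2 positions, or (2r+1)^2 - 1 without the centre, which matches the 8 and 24..25 counts quoted. All function names are hypothetical:

```cpp
#include <cassert>

// Memory: latency-bound random byte reads (one byte per full latency).
constexpr double random_byte_rate(double latency_ns) {
    return 1.0 / (latency_ns * 1e-9);  // bytes per second
}
constexpr double kLinearBytesPerSec = 50e9;  // linear transfer figure from the post

// Register file: number of vector registers times their width in bytes.
constexpr int register_file_bytes(int num_regs, int width_bits) {
    return num_regs * width_bits / 8;
}
constexpr int kAvx2Bytes = register_file_bytes(16, 256);    // 16 ymm -> 512 bytes
constexpr int kAvx512Bytes = register_file_bytes(32, 512);  // 32 zmm -> 2048 bytes
static_assert(kAvx512Bytes == 4 * kAvx2Bytes, "AVX-512 register file is 4x larger");

// Test points: positions of a square search of radius r, per block.
constexpr int positions_per_block(int radius, bool include_centre) {
    return (2 * radius + 1) * (2 * radius + 1) - (include_centre ? 0 : 1);
}
constexpr int test_points(int blocks, int radius, bool include_centre) {
    return blocks * positions_per_block(radius, include_centre);
}
```

For example, 1 byte per 50 ns is 20,000,000 bytes/s (20 MB/s), and 50 GB/s divided by that gives 2500; 4 blocks at radius 1 without the centre give 4 x 8 = 32 points, and 32 blocks at radius 2 with the centre give 32 x 25 = 800.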
8th December 2021, 13:27 | #760
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
Quote:
Quote:
Quote:
Well, fingers crossed again, then.