Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.

#2262
Registered User
Join Date: Jan 2018
Posts: 2,169
AviSynthPlus r3936 clang and IntelLLVM build
https://gitlab.com/uvz/AviSynthPlus-Builds
#2264
Registered User
Join Date: Jan 2014
Posts: 2,473
I'd like to see benchmarks with different builds. I found that non-optimized, pure C++ code was indeed quicker with clang/LLVM. The difference was even bigger in 32-bit builds, where LLVM was much smarter at using the available (limited number of) CPU registers.
I also remember RgTools, where it depended on the specific filter mode: one mode was better with Clang, while other modes were much quicker with the plugin version built with Microsoft. If I remember well, it was especially the AVX2 optimizations where MS shined. This benchmark should probably be rechecked periodically for each new compiler generation.

EDIT: I see that the README.md in https://gitlab.com/uvz/AviSynthPlus-Builds contains a short script where clang is +10% and IntelLLVM is +14% quicker than MSVC. I wonder if the gain is evenly distributed among the filters, or if there are one or two specific filters which are the bottleneck.

EDIT 2: My benchmarks, two measurements per Avisynth+ version.
Machine: Win11 Pro, 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz 2.50 GHz
MS: 49,78; 49,93 fps
Clang: 50,90; 50,74 fps
IntelLLVM: 53,29; 53,25 fps

EDIT 3: The difference is mainly in the dither=1 option (which is written in pure C). When changing that option in ConvertBits to dither=0:
MS: 154,3 fps
Clang: 154,9 fps
Intel: 154,8 fps
Script for this last run:
Code:
ffms2("myvideo")
convertbits(16)
converttoyuv444(chromaresample="spline36")
convertbits(32, fulls=false, fulld=true)
converttoplanarrgb()
convertbits(16,dither=0)
Spline36Resize(width*2, height*2)
convertbits(8, dither=0)

Last edited by pinterf; 23rd February 2023 at 10:26. Reason: benchmarks added
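To set the EDIT 2 measurements against the README's +10%/+14% figures, the relative gain of each build over the MS build can be computed from the mean of its two runs. A minimal C++ sketch; the helper names `mean` and `gain_percent` are hypothetical and not part of any AviSynth+ tool:

```cpp
#include <numeric>
#include <vector>

// Mean of repeated fps measurements (expects a non-empty vector).
double mean(const std::vector<double>& fps) {
    return std::accumulate(fps.begin(), fps.end(), 0.0) / fps.size();
}

// Speed of `build` relative to `baseline`, in percent (positive = faster).
double gain_percent(const std::vector<double>& build,
                    const std::vector<double>& baseline) {
    return (mean(build) / mean(baseline) - 1.0) * 100.0;
}
```

With the EDIT 2 numbers (decimal commas written as dots), `gain_percent({50.90, 50.74}, {49.78, 49.93})` gives about +1.9% for Clang and `gain_percent({53.29, 53.25}, {49.78, 49.93})` about +6.9% for IntelLLVM on this machine, compared with the README's +10% and +14%.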
#2265
Registered User
Join Date: Jan 2014
Posts: 2,473
O.K. Challenge accepted.
Avisynth+ 3.7.3 test 7 (20230223)
Changes since test6:
Code:
20230223 3.7.3 WIP
------------------
- Update build documentation with 2023 Intel C++ tools. See Compiling Avisynth+ https://avisynthplus.readthedocs.io/...g_avsplus.html
- CMakeLists.txt: add support for Intel C++ Compiler 2023.
- Enhanced performance in ConvertBits Floyd dither (dither=1) for the 10->8, 16->8 and 16->10 bit cases by providing special function templates that allow compilers to optimize them much better. (Both Microsoft and Intel Classic 19.2 benefit; the LLVM-based clangCL and IntelLLVM compilers do not.)
- Fix crash when outputting VfW (e.g. VirtualDub) for YUV422P16, or P10 in Intel SSE2 optimization due to aligned SIMD write to an unaligned pointer - did not affect Microsoft builds. As seen in https://forum.doom9.org/showthread.p...43#post1983343
Code:
            test6    test7 (fps)
Microsoft   49,86    53,71
Clang       50,85*   52,99**
Intel       53,43*   58,17**
** provided in test7 package by me
My Intel version was built with Intel Classic 19.2; uvz's version named the folder IntelLLVM. I was not able to reach even MS's speed with my IntelLLVM build without any tweaks.
Script:
Code:
ffms2("myvideo720x576.avi")
convertbits(16)
converttoyuv444(chromaresample="spline36")
convertbits(32, fulls=false, fulld=true)
converttoplanarrgb()
convertbits(16,dither=1)
Spline36Resize(width*2, height*2)
convertbits(8, dither=1)
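The changelog attributes the dither=1 speed-up to special function templates that let compilers optimize better. A rough C++ sketch of that idea, assuming a plain left-to-right Floyd-Steinberg pass and a simple shift-based bit-depth mapping (this is not the AviSynth+ source; `floyd_dither` and its signature are hypothetical): the bit depths are template parameters, so the shift, rounding offset and clamp limits become compile-time constants, and each of the 10->8, 16->8 and 16->10 cases gets its own instantiation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch, not the AviSynth+ implementation: Floyd-Steinberg
// error diffusion from SrcBits to DstBits on one plane (row-major, no padding).
// SrcBits/DstBits are template parameters, so shift, half and the clamp
// limits are compile-time constants the optimizer can fold into the loop.
template <int SrcBits, int DstBits, typename SrcT, typename DstT>
void floyd_dither(const SrcT* src, DstT* dst, int width, int height) {
    static_assert(SrcBits > DstBits, "dithering must reduce the bit depth");
    assert(width > 0 && height > 0);
    constexpr int shift = SrcBits - DstBits;
    constexpr int half = 1 << (shift - 1);        // rounding offset
    constexpr int max_src = (1 << SrcBits) - 1;
    constexpr int max_dst = (1 << DstBits) - 1;

    // Error carried into the current and the next row, padded by one slot on
    // each side so that x-1 and x+1 never index out of bounds.
    std::vector<int> cur(width + 2, 0), next(width + 2, 0);

    for (int y = 0; y < height; ++y) {
        std::fill(next.begin(), next.end(), 0);
        for (int x = 0; x < width; ++x) {
            // Source sample plus the error diffused into it, kept in range.
            int value = static_cast<int>(src[y * width + x]) + cur[x + 1];
            value = std::clamp(value, 0, max_src);
            const int q = std::min((value + half) >> shift, max_dst);
            dst[y * width + x] = static_cast<DstT>(q);

            // Quantization error in source units, spread 7/16 to the right and
            // 3/16, 5/16, 1/16 to the row below; the last share takes the
            // integer-division remainder so the four shares sum to err.
            const int err = value - (q << shift);
            const int e7 = err * 7 / 16, e3 = err * 3 / 16, e5 = err * 5 / 16;
            const int e1 = err - e7 - e3 - e5;
            cur[x + 2] += e7;    // right neighbour
            next[x] += e3;       // below-left
            next[x + 1] += e5;   // below
            next[x + 2] += e1;   // below-right
        }
        std::swap(cur, next);  // next row's accumulated error becomes current
    }
}
```

For example, `floyd_dither<16, 8>(src, dst, width, height)` takes `uint16_t` input and writes `uint8_t` output, while `floyd_dither<16, 10>` would write into a `uint16_t` buffer. A runtime-bit-depth version would carry `shift` and the limits as variables through the inner loop instead.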
#2266
Registered User
Join Date: Mar 2012
Location: Texas
Posts: 1,675
Surprisingly, the test 7 MSVC build is the fastest for me. I'm embarrassed to post the speeds of my 10-year-old PC, but here they are. Edit: well, I'm not that embarrassed anymore; I didn't realize pinterf's video was 720x576.
Windows 7 (x64) - Intel i7-4930K
Script:
Code:
ColorBars(1920, 1080, pixel_type="YUV420P8")
Loop()
Trim(0,1000)
ConvertBits(16)
ConvertToYUV444(chromaresample="spline36")
ConvertBits(32, fulls=false, fulld=true)
convertToPlanarRGB()
ConvertBits(16, dither=1)
Spline36Resize(width*2, height*2)
ConvertBits(8, dither=1)
Code:
FPS (min | max | average): 4.652 | 7.187 | 6.517
Process memory usage (max): 456 MiB
Time (elapsed): 00:02:33.598
Code:
FPS (min | max | average): 4.505 | 6.913 | 6.272
Process memory usage (max): 449 MiB
Time (elapsed): 00:02:39.605
Code:
FPS (min | max | average): 4.266 | 6.755 | 6.062
Process memory usage (max): 449 MiB
Time (elapsed): 00:02:45.114
Code:
FPS (min | max | average): 4.367 | 7.026 | 5.999
Process memory usage (max): 449 MiB
Time (elapsed): 00:02:46.860
Code:
FPS (min | max | average): 4.072 | 6.310 | 5.578
Process memory usage (max): 450 MiB
Time (elapsed): 00:02:59.451
Code:
FPS (min | max | average): 3.812 | 6.551 | 5.554
Process memory usage (max): 456 MiB
Time (elapsed): 00:03:00.226

Last edited by Reel.Deel; 24th February 2023 at 06:38.
#2267
Registered User
Join Date: Dec 2005
Location: Denmark
Posts: 52
Request: Expr() error message, unbalanced stack
Is it possible to get a little more information in the Expr() error message when the stack is unbalanced on return? The message is currently
"Expr: Stack unbalanced at end of expression. Need to have exactly one value on the stack to return" (in AvsPMod). The same error message appears both for an empty stack and for more than one value on the stack at return. Could the number of elements remaining on the stack be shown in the error message? That would really help with debugging Expr expressions.
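A minimal sketch of the request, assuming a toy RPN evaluator in C++ (numbers and + - * / only; `eval_rpn` is hypothetical and not the actual AviSynth+ Expr parser). The only point is the final check, which puts the remaining stack depth into the message:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Toy RPN evaluator (hypothetical, not the AviSynth+ Expr parser): numbers
// and the binary operators + - * / only. Its point is the final check, which
// reports how many values were left on the stack instead of a generic message.
double eval_rpn(const std::string& expr) {
    std::vector<double> stack;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            if (stack.size() < 2)
                throw std::runtime_error("Expr: not enough operands for '" + tok + "'");
            const double b = stack.back(); stack.pop_back();
            const double a = stack.back(); stack.pop_back();
            stack.push_back(tok == "+" ? a + b : tok == "-" ? a - b
                          : tok == "*" ? a * b : a / b);
        } else {
            stack.push_back(std::stod(tok));  // throws std::invalid_argument if not a number
        }
    }
    if (stack.size() != 1)
        throw std::runtime_error(
            "Expr: Stack unbalanced at end of expression: " +
            std::to_string(stack.size()) +
            " value(s) left on the stack, need exactly one to return");
    return stack.front();
}
```

With this check, `1 2 3 +` reports 2 values left and an empty expression reports 0, so the two cases described in the request produce different messages.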
#2268
Registered User
Join Date: Jan 2014
Posts: 2,473
Quote:
#2269
Registered User
Join Date: Jul 2018
Posts: 1,320
Current test results of resizers for 2:1 downsizing, checked for display with SincResize (taps=8):
https://drive.google.com/file/d/1ds4...ew?usp=sharing
Support of 2.2 for UserDefined2Resize was tested with a custom debug build of the current AviSynth+ sources.

Last edited by DTL; 24th February 2023 at 17:27.
#2271
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,817
Tested the three builds on my 5950X using Reel.Deel's script. The Intel build is the fastest on Zen 3 as well, although only by a very small margin.
Code:
MSVC
FPS (min | max | average): 11.12 | 14.64 | 14.19
Process memory usage (max): 367 MiB
Thread count: 33
CPU usage (average): 3.1%
Time (elapsed): 00:01:10.544
Code:
Clang
FPS (min | max | average): 11.14 | 14.84 | 14.09
Process memory usage (max): 367 MiB
Thread count: 33
CPU usage (average): 3.1%
Time (elapsed): 00:01:11.053
Code:
Intel
FPS (min | max | average): 11.20 | 14.81 | 14.22
Process memory usage (max): 367 MiB
Thread count: 33
CPU usage (average): 3.1%
Time (elapsed): 00:01:10.417
__________________
And if the band you're in starts playing different tunes I'll see you on the dark side of the Moon...
#2272
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,213
Quote:
#2273
Registered User
Join Date: Mar 2012
Location: Texas
Posts: 1,675
Quote:
Quote:
#2274
Registered User
Join Date: Jul 2018
Posts: 1,320
Quote:
The SinPow kernel is built on several non-linear hacks and only works at a heavily truncated size (with more or less discontinuity at the edges). The UD kernel can be safely expanded to a larger 'support' (until it reaches very low values). Initially the idea was to give UD(2) resize a larger fixed internal 'support' (3, for example), but experiments showed that it may be useful to let the user limit the 'support' too, via an additional parameter taking a finely adjustable float value (most of the change happens between 2 and 3, in steps of about 0.1..0.2). For example, with a 'support' of 2.2, UD2 gives less residual ringing/artifacts at transients while using the 'higher' b/c control values of 75/-25, and still gets more sharpness (closer to SinPow). With the old 'support' of 2, the 'lower' (more extreme) b/c pair of 70/-30 is needed to reach comparable visible sharpness, with more residual ringing/artifacts. So limiting the UD kernel's 'support' to 2 significantly limits its 'peaking/sharpness' capability with extreme b/c settings of the kernel members, and also prevents it from showing its 'linear' ringing-control properties at full scale. I therefore supplement the current issue description with a request to add a third control parameter, 'support' or 's', to UD(2) resize. With s=2 (as in the old implementation in jpsdr's plugin), UD(2)Resize can be tuned close enough to SinPowResize output (at least in some range of the control parameter). Increasing 'support' to 3 (and maybe more) gives less residual ringing, with lower sharpness and thicker 'peaking' contours around sharp transients. So adjusting the 'support' parameter moves UD(2)Resize between a 'partially non-linear' resize at s=2 and a 'more linear' one at s>2. The real difference between the SinPow and UD resizers, even with today's fixed support=2, is not easily visible, though, and UD(2) with s>2 is mostly for 'perfectionists', e.g. for building a high-end linear processing workflow.
(The more linear form understandably has lower 'per-sample' sharpness, so you need to sit farther from the screen, or use higher-DPI displays and more samples per frame, to keep the same 'visual sharpness'.) So for rips of UHD/HD sources with a very limited sample count per frame (small-frame-size internet torrents, like a 700 MB version at something like 640x360), the 'partially non-linear' form of the current SinPow and UD2 s=2 resizers may have the benefit of slightly higher sharpness, at the cost of some more residual ringing/artifacts. I am currently checking my new 4K->Full HD rip of a nature documentary, created with UD2(width/2, height/2, b=70, c=-30), and it looks very good. So the SinPow kernel seems limited to its initial design: a single control parameter, a 'support' of 2 and some by-design non-linearity, with practically no room for expansion. UD, on the other hand, can easily be expanded in both the number of kernel members and the 'support' size, with some gain in quality (control over residual ringing and artifacts). For example, a UD10Resize with a 'support' of about 10 and 10 user-provided kernel members is easily possible. With a 'user-provided vector of arguments', the more general form is UserDefinedNResize(kernel members list, s, ...); with args_count=2 (and s=2) it is the current UserDefined2Resize. But I am poor at AVS programming and still do not know how to make a filter with a variable number of arguments, so this would require asking the very busy pinterf, or someone else, to do more programming. Also, as noted in https://forum.doom9.org/showthread.p...80#post1983080 and https://forum.doom9.org/showthread.p...19#post1983119, for chroma subsampling conversion it is now possible to use different types of 4:4:4<->4:2:x UV filtering in the Convert() filters. Here is an example of multi-generation chroma sharpness degradation with the default bicubic filter at 4:4:4<->4:2:0:
Code:
ColorBars(960*4, 540*4, pixel_type="YUV444P8") UserDefined2Resize(width/4, height/4, b=105, c=0) # put some conditioning sinc=ConvertToYUV420(chromaresample="sinclin2") sinc=ConvertToYUV444(sinc,chromaresample="sinclin2") bicub=ConvertToYUV420() bicub=ConvertToYUV444(bicub) sinc=ConvertToYUV420(sinc,chromaresample="sinclin2") sinc=ConvertToYUV444(sinc,chromaresample="sinclin2") bicub=ConvertToYUV420(bicub) bicub=ConvertToYUV444(bicub) sinc=ConvertToYUV420(sinc,chromaresample="sinclin2") sinc=ConvertToYUV444(sinc,chromaresample="sinclin2") bicub=ConvertToYUV420(bicub) bicub=ConvertToYUV444(bicub) sinc=ConvertToYUV420(sinc,chromaresample="sinclin2") sinc=ConvertToYUV444(sinc,chromaresample="sinclin2") bicub=ConvertToYUV420(bicub) bicub=ConvertToYUV444(bicub) sinc=ConvertToYUV420(sinc,chromaresample="sinclin2") sinc=ConvertToYUV444(sinc,chromaresample="sinclin2") bicub=ConvertToYUV420(bicub) bicub=ConvertToYUV444(bicub) sinc=ConvertToYUV420(sinc,chromaresample="sinclin2") sinc=ConvertToYUV444(sinc,chromaresample="sinclin2") bicub=ConvertToYUV420(bicub) bicub=ConvertToYUV444(bicub) sinc=ConvertToYUV420(sinc,chromaresample="sinclin2") sinc=ConvertToYUV444(sinc,chromaresample="sinclin2") bicub=ConvertToYUV420(bicub) bicub=ConvertToYUV444(bicub) sinc=ConvertToYUV420(sinc, chromaresample="sinclin2") bicub=ConvertToYUV420(bicub) sinc_mon=ConvertToYUV444(sinc,chromaresample="userdefined2", param1=105, param2=0).Subtitle("sinc_mon") #monitoring chroma anti-Gibbs sinc_no_mon=ConvertToYUV444(sinc,chromaresample="sinclin2").Subtitle("sinc_no_mon") bicub=ConvertToYUV444(bicub).Subtitle("bicub") Interleave(sinc_mon, sinc_no_mon, bicub) SincLin2Resize(width*2, height*2) So with w-param of w='none' or 'lin2' and extending of s to 16: Code:
ColorBars(960*4, 540*4, pixel_type="YUV444P8") # our natural infinite resolution scene, really RGBP8 or better RGBPS
# main digital video camera transform of full conditioning of RGB/YUV (band-limiting and anti-Gibbs residual spectrum shaping
# to required look/makeup)
UserDefined2Resize(width/4, height/4, b=105, c=0, s=3, w="none") # film-look / makeup softer
#or UserDefined2Resize(width/4, height/4, b=80, c=-20, s=3, w="none") # video-look / makeup sharper
# put 2:1 system compression converting to 4:2:0 with partial conditioning (band-limiting, no anti-Gibbs)
ConvertToYUV420(chromaresample="userdefined2", b=16, c=16, s=16, w="lin2")
#MPEG compression for distribution
#digital moving pictures production transform ends here
##########
# broadcasting / distribution / archive digital movie compressed content
###########
#enduser transform:
#MPEG decompression to 4:2:0
#1:2 decompression of 4:2:0 to 4:4:4 and continue 2:1 bandlimited UV data conditioning with anti-Gibbs
ConvertToYUV444(chromaresample="userdefined2", b=105, c=0, s=3, w="none")
UserDefined2Resize(width*4, height*4, b=16, c=16, s=16, w="lin2") # equal to SincLin2Resize(), decompression of sampled data to 'infinite' resolution (DAC)
ConvertToRGB() # for feed to RGB physical display

Last edited by DTL; 25th February 2023 at 12:46.
#2275
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,434
Hello.
I tested the Test7 version provided by pinterf, and with the clang version I still have the same issue, the following error message in VirtualDub2:
Code:
File open error
AVI Import Filter error: (unknow) (80040154)
The MS build works fine.
__________________
My github.
#2276
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,213
Sox doesn't seem to work in either x86 or x64.
Tested on both Windows XP x86 and Windows Server 2019 x64. How to reproduce:
Code:
ColorBars(848, 480, pixel_type="YV12")
UpSoundOnSound()
This has been broken since 2017, and the last version working with Sox was Avisynth 2.6.1 from 2016 (it's been a minute, I know). I reported it here: https://forum.doom9.org/showthread.php?p=1887661 a while ago, and it has been reproduced by Tebasuna as well, for both x86 and x64. Since a new Avisynth version is in the making (3.7.3 Test 7), is there a way to get this sorted once and for all so that I can go back to using Sox?
#2277
Registered User
Join Date: Jan 2014
Posts: 2,473
Quote:
#2278
Moderator
Join Date: Feb 2005
Location: Spain
Posts: 7,241
Quote:
It must be MaskChannels instead of NumChannels (which can be calculated from MaskChannels).
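Deriving the channel count from a channel mask is just a population count of the set speaker bits. A small C++ sketch, assuming the Windows WAVEFORMATEXTENSIBLE `dwChannelMask` speaker-bit layout (the constant and function names are hypothetical, not the AviSynth+ or SoxFilter API):

```cpp
#include <bitset>
#include <cstdint>

// Speaker bits in the Windows WAVEFORMATEXTENSIBLE dwChannelMask layout
// (assumed here for illustration; names are hypothetical).
constexpr std::uint32_t kFrontLeft = 0x1;
constexpr std::uint32_t kFrontRight = 0x2;
constexpr std::uint32_t kFrontCenter = 0x4;
constexpr std::uint32_t kLowFrequency = 0x8;
constexpr std::uint32_t kBackLeft = 0x10;
constexpr std::uint32_t kBackRight = 0x20;

// The channel count is the number of speaker bits set in the mask.
int channels_from_mask(std::uint32_t mask) {
    return static_cast<int>(std::bitset<32>(mask).count());
}
```

The reverse does not hold: a count of 2 does not say which two speakers are meant, which is why the mask carries more information than the count.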
#2279
Registered User
Join Date: Jan 2014
Posts: 2,473
Quote:
And this is a CPP 2.5-style plugin. Nothing is guaranteed; if it works, we are happy. I'm gonna put zero effort into checking why it does not work. Instead, the whole Sox library integration must be refreshed; too bad they changed the API, first in 2006, then ... who knows when. Anyway, in two hours of effort I could neither make Sox compile as a static library to link with the SoxFilter AviSynth filter, nor do any integration. Everything has changed since then. For example, I had no grey hair in 2006.
#2280
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,213
Yeah, I know.
It's a shame that they haven't updated anything since 2006, and I'm also sad to see it fail to compile against the new Avisynth+ headers... Unfortunately, I used to rely on those upmix methods a lot in the past, in particular when I had to insert distribution bumpers and the like into what would otherwise be a 5.1 movie or TV series, before creating the final DCP and sending it over to the cinemas. Nowadays, ever since it stopped working with newer versions of AVS, I have been using the surround filter in FFmpeg like this:
Quote:
However, ideally I'd like to go back to doing everything inside Avisynth in the future, like I used to. Someone should really pick up the Sox filters and properly maintain them instead of leaving them as Avisynth 2.5 abandonware...

Last edited by FranceBB; 2nd March 2023 at 18:41.