#2341 | Link | |
Registered User
Join Date: Jan 2014
Posts: 2,233
|
Quote:
|
|
#2343 | Link |
Registered User
Join Date: Jul 2018
Posts: 783
|
Unfortunately the VS2019 debugger does not break on the Illegal Instruction exception in AVS (or I still do not know how to configure it to break and show the disassembly). My Core2Duo E7500 CPU has MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, EM64T and VT-x, yet those builds still crash with an invalid instruction unless SDE emulates up to AVX2.
It is definitely something around AVX/AVX2. Disabling AVX and AVX2 emulation with
sde.exe -avx 0 -avx2 0 -- G:\Distr\VirtualDub-1.10.4-AMD64\veedub64.exe
causes a crash with the x64_clang build:
Illegal instruction at address = 7fed7947e0a: c5 fc 10 05 9e 17 19 00 c5 fc 11 05 e6 b4 24
Image name: C:\Windows\system32\AviSynth.dll
Offset in image: 0x4a7e0a
IDA shows this in the disassembly:
.text:00000001804A7E0A vmovups ymm0, cs:ymmword_1806395B0
which is an AVX or AVX2 instruction. The VS2019 debug output agrees:
Exception thrown at 0x000007FEC4FF7E0A (AviSynth.dll) in Veedub64.exe: 0xC000001D: Illegal Instruction.
Is the matching 7E0A at the end of both addresses some offset within a page?
Last edited by DTL; 17th March 2023 at 18:48. |
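The crash bytes quoted above can be checked by hand. The following is a minimal sketch, not a real disassembler: it assumes 64-bit mode, only understands the two-byte VEX prefix (first byte 0xC5), and the names `Vex2`, `decode_vex2` and `rip_relative_target` are made up for illustration. It shows that `c5 fc 10 05 9e 17 19 00` is a VEX-encoded 256-bit load (hence AVX) and that its RIP-relative operand lands on offset 0x6395b0, which matches IDA's ymmword_1806395B0 if the image base is 0x180000000.

```cpp
#include <cstddef>
#include <cstdint>

// Fields of a two-byte VEX prefix (first byte 0xC5), 64-bit mode assumed.
struct Vex2 {
  bool valid;       // first byte was 0xC5
  bool is256;       // VEX.L bit: true means 256-bit (ymm) operands
  unsigned pp;      // implied legacy prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2
  unsigned opcode;  // opcode byte that follows the prefix
};

// Hypothetical helper: pick apart the prefix of a VEX-encoded instruction.
inline Vex2 decode_vex2(const std::uint8_t* code, std::size_t len) {
  Vex2 v{false, false, 0, 0};
  if (code == nullptr || len < 3 || code[0] != 0xC5) return v;
  v.valid = true;
  v.is256 = (code[1] & 0x04) != 0;  // bit 2 of the second byte is VEX.L
  v.pp = code[1] & 0x03;            // bits 1..0 select the implied prefix
  v.opcode = code[2];
  return v;
}

// Hypothetical helper: offset addressed by a RIP-relative operand, i.e. the
// offset of the next instruction plus the signed little-endian disp32.
inline std::uint64_t rip_relative_target(std::uint64_t insn_offset,
                                         std::size_t insn_len,
                                         const std::uint8_t* disp) {
  const std::uint32_t raw = static_cast<std::uint32_t>(disp[0]) |
                            (static_cast<std::uint32_t>(disp[1]) << 8) |
                            (static_cast<std::uint32_t>(disp[2]) << 16) |
                            (static_cast<std::uint32_t>(disp[3]) << 24);
  const std::int64_t d = static_cast<std::int32_t>(raw);
  return insn_offset + insn_len + static_cast<std::uint64_t>(d);
}

// Load address of the module, given a runtime address and its image offset.
inline std::uint64_t module_base(std::uint64_t runtime_addr, std::uint64_t image_offset) {
  return runtime_addr - image_offset;
}
```

On the 7E0A question: subtracting the image offset 0x4a7e0a from either reported address gives a base whose low 16 bits are zero. Windows maps DLLs on 64 KiB boundaries, so the low four hex digits of a runtime address equal those of the image offset, whatever the load address is.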
#2344 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,220
|
My quick-test PC has no AVX, but it has SSE4.2. I never tested on my dedicated video processing PC, which has AVX2, because the build did not work in the quick test.
I'll test again, but on my video PC. Is it possible that, despite the compile options, clang still emits AVX/AVX2 when it shouldn't? Or is some intrinsic code with AVX/AVX2 in a place where it doesn't belong? It is strange that both Intel and clang produce the same result. Then again, Visual Studio has already changed things silently before (for example the alignment bug in AVISource, where VS compiled an unaligned variant when the code asked for an aligned one).
__________________
My github. Last edited by jpsdr; 17th March 2023 at 21:03. |
#2345 | Link | |
Registered User
Join Date: Jul 2018
Posts: 783
|
Quote:
sub_1804A7E00 proc near
mov cs:dword_1806F32F0, 10100h
vmovups ymm0, cs:ymmword_1806395B0 <-- crash
vmovups cs:ymmword_1806F3300, ymm0
mov cs:dword_1806F3320, 1030200h
vmovups ymm0, cs:ymmword_1806395D0
vmovups cs:ymmword_1806F3330, ymm0
vmovaps xmm0, cs:xmmword_18052C510
vmovaps cs:xmmword_1806F3350, xmm0
vmovups ymm0, cs:ymmword_1806395F0
vmovups cs:ymmword_1806F3360, ymm0
vmovups ymm0, cs:ymmword_180639610
vmovups cs:ymmword_1806F3380, ymm0
vmovaps xmm0, cs:xmmword_18052C520
vmovaps cs:xmmword_1806F33A0, xmm0
vmovups ymm0, cs:ymmword_180639630
vmovups cs:ymmword_1806F33B0, ymm0
vmovups ymm0, cs:ymmword_180639650
vmovups cs:ymmword_1806F33D0, ymm0
vmovups ymm0, cs:ymmword_180639690
vmovups cs:ymmword_1806F3410, ymm0
vmovups ymm0, cs:ymmword_180639670
vmovups cs:ymmword_1806F33F0, ymm0
vmovups ymm0, cs:ymmword_180639710
vmovups cs:ymmword_1806F3490, ymm0
vmovups ymm0, cs:ymmword_1806396F0
vmovups cs:ymmword_1806F3470, ymm0
vmovups ymm0, cs:ymmword_1806396D0
vmovups cs:ymmword_1806F3450, ymm0
vmovups ymm0, cs:ymmword_1806396B0
vmovups cs:ymmword_1806F3430, ymm0
and so on. It looks like a compiler-generated block.
Last edited by DTL; 17th March 2023 at 19:42. |
|
#2349 | Link |
Registered User
Join Date: Jul 2018
Posts: 783
|
So does the LLVM build environment have a setting to force-disable emitting AVX and later instructions? Or can these builds only be marked as AVX(2) minimum?
Users may still have second-hand multi-core, multi-chip Xeon systems that are capable of AVS processing but lack even AVX. |
#2350 | Link |
Registered User
Join Date: Jan 2014
Posts: 2,233
|
It is optimized: it prechecks the data, then automatically handles multiple processor paths, small blocks, aligned blocks, large blocks and overlaps. Small ones are usually inlined, and a different technique is used when the amount is known. Both MS and LLVM do it.
|
#2351 | Link |
Registered User
Join Date: Jan 2014
Posts: 2,233
|
Finally, 37 minutes after switching on, my PC has happily loaded the avs+ source. OMG, so slow; there is no SSD in this machine.
Intel(R) Core(TM) i7 860, it has SSE4.1 at most. This PC has VS 2019 with LLVM 12.0. Made a debug build. Ran it. No problem. |
#2352 | Link | |
Registered User
Join Date: Jan 2014
Posts: 2,233
|
Quote:
(But the problem is interesting, very interesting, I can say. It may also have other causes, which is why I put days into investigating it.) |
|
#2353 | Link |
Registered User
Join Date: Jan 2014
Posts: 2,233
|
Ehh...
First of all, this slow machine makes me mad. 13 minutes for a simple cmake install. 17 minutes after starting VS2022 until the Avisynth project is loaded.
But then! BUT THEN!
The Debug build was not failing. The Release with debug build was. Yeah. Strange. It really would run AVX code on my pre-AVX machine.
This is what happened.
convert_bits_avx2.cpp includes "convert_bits.h"
https://github.com/AviSynth/AviSynth...s_avx2.cpp#L54
convert_bits.h contains static initialization of the dither structures:
https://github.com/AviSynth/AviSynth...ert_bits.h#L89
like this:
Code:
// repeated 8x for sse size 16
static const struct dither2x2a_t
{
  const BYTE data[4] = {
    0, 1,
    1, 0,
  };
  // cycle: 2
  alignas(16) const BYTE data_sse2[2 * 16] = {
    0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
    1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0
  };
  dither2x2a_t() {};
} dither2x2a;
Code:
dither2x2a_t() {}
Unfortunately, this initialization was requested from an AVX2 module. Though the failure happened at the first struct initialization, there are other predefined dither structs, and they would all fail as well.
I'm going to stop putting such active code into common header files. It must be moved out to a hpp and be included into both the _sse2 and the _avx2 source.
I wonder what happens if I make that change. Will it recognise that static initialization from AVX2-compiled code is forbidden?
Disassembly, for the record.
Code:
AviSynth.dll!_GLOBAL__sub_I_convert_bits_avx2.cpp(void):
00007FFBF987F4E0 mov dword ptr [dither2x2a (07FFBF9B03620h)],10100h
00007FFBF987F4EA vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+10h (07FFBF9A3CF60h)]
00007FFBF987F4F2 vmovups ymmword ptr [dither2x2a+10h (07FFBF9B03630h)],ymm0
00007FFBF987F4FA mov dword ptr [dither2x2 (07FFBF9B03650h)],1030200h
00007FFBF987F504 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+30h (07FFBF9A3CF80h)]
00007FFBF987F50C vmovups ymmword ptr [dither2x2+10h (07FFBF9B03660h)],ymm0
00007FFBF987F514 vmovaps xmm0,xmmword ptr [__xmm@02060307040005010307020605010400 (07FFBF991D470h)]
00007FFBF987F51C vmovaps xmmword ptr [dither4x4a (07FFBF9B03680h)],xmm0
00007FFBF987F524 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+50h (07FFBF9A3CFA0h)]
00007FFBF987F52C vmovups ymmword ptr [dither4x4a+10h (07FFBF9B03690h)],ymm0
00007FFBF987F534 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+70h (07FFBF9A3CFC0h)]
00007FFBF987F53C vmovups ymmword ptr [dither4x4a+30h (07FFBF9B036B0h)],ymm0
00007FFBF987F544 vmovaps xmm0,xmmword ptr [__xmm@050d070f09010b03060e040c0a020800 (07FFBF991D480h)]
00007FFBF987F54C vmovaps xmmword ptr [dither4x4 (07FFBF9B036D0h)],xmm0
00007FFBF987F554 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+90h (07FFBF9A3CFE0h)]
00007FFBF987F55C vmovups ymmword ptr [dither4x4+10h (07FFBF9B036E0h)],ymm0
00007FFBF987F564 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+0B0h (07FFBF9A3D000h)]
00007FFBF987F56C vmovups ymmword ptr [dither4x4+30h (07FFBF9B03700h)],ymm0
00007FFBF987F574 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+0F0h (07FFBF9A3D040h)]
00007FFBF987F57C vmovups ymmword ptr [dither8x8a+20h (07FFBF9B03740h)],ymm0
00007FFBF987F584 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+0D0h (07FFBF9A3D020h)]
00007FFBF987F58C vmovups ymmword ptr [dither8x8a (07FFBF9B03720h)],ymm0
00007FFBF987F594 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+170h (07FFBF9A3D0C0h)]
00007FFBF987F59C vmovups ymmword ptr [dither8x8a+0A0h (07FFBF9B037C0h)],ymm0
00007FFBF987F5A4 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+150h (07FFBF9A3D0A0h)]
00007FFBF987F5AC vmovups ymmword ptr [dither8x8a+80h (07FFBF9B037A0h)],ymm0
00007FFBF987F5B4 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+130h (07FFBF9A3D080h)]
00007FFBF987F5BC vmovups ymmword ptr [dither8x8a+60h (07FFBF9B03780h)],ymm0
00007FFBF987F5C4 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+110h (07FFBF9A3D060h)]
00007FFBF987F5CC vmovups ymmword ptr [dither8x8a+40h (07FFBF9B03760h)],ymm0
00007FFBF987F5D4 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+190h (07FFBF9A3D0E0h)]
00007FFBF987F5DC vmovups ymmword ptr [dither8x8 (07FFBF9B037E0h)],ymm0
00007FFBF987F5E4 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+1B0h (07FFBF9A3D100h)]
00007FFBF987F5EC vmovups ymmword ptr [dither8x8+20h (07FFBF9B03800h)],ymm0
00007FFBF987F5F4 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+1D0h (07FFBF9A3D120h)]
00007FFBF987F5FC vmovups ymmword ptr [dither8x8+40h (07FFBF9B03820h)],ymm0
00007FFBF987F604 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+1F0h (07FFBF9A3D140h)]
00007FFBF987F60C vmovups ymmword ptr [dither8x8+60h (07FFBF9B03840h)],ymm0
00007FFBF987F614 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+210h (07FFBF9A3D160h)]
00007FFBF987F61C vmovups ymmword ptr [dither8x8+80h (07FFBF9B03860h)],ymm0
00007FFBF987F624 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+230h (07FFBF9A3D180h)]
00007FFBF987F62C vmovups ymmword ptr [dither8x8+0A0h (07FFBF9B03880h)],ymm0
00007FFBF987F634 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+330h (07FFBF9A3D280h)]
00007FFBF987F63C vmovups ymmword ptr [dither16x16a+0E0h (07FFBF9B03980h)],ymm0
00007FFBF987F644 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+310h (07FFBF9A3D260h)]
00007FFBF987F64C vmovups ymmword ptr [dither16x16a+0C0h (07FFBF9B03960h)],ymm0
00007FFBF987F654 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+2F0h (07FFBF9A3D240h)]
00007FFBF987F65C vmovups ymmword ptr [dither16x16a+0A0h (07FFBF9B03940h)],ymm0
00007FFBF987F664 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+2D0h (07FFBF9A3D220h)]
00007FFBF987F66C vmovups ymmword ptr [dither16x16a+80h (07FFBF9B03920h)],ymm0
00007FFBF987F674 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+2B0h (07FFBF9A3D200h)]
00007FFBF987F67C vmovups ymmword ptr [dither16x16a+60h (07FFBF9B03900h)],ymm0
00007FFBF987F684 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+290h (07FFBF9A3D1E0h)]
00007FFBF987F68C vmovups ymmword ptr [dither16x16a+40h (07FFBF9B038E0h)],ymm0
00007FFBF987F694 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+270h (07FFBF9A3D1C0h)]
00007FFBF987F69C vmovups ymmword ptr [dither16x16a+20h (07FFBF9B038C0h)],ymm0
00007FFBF987F6A4 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+250h (07FFBF9A3D1A0h)]
00007FFBF987F6AC vmovups ymmword ptr [dither16x16a (07FFBF9B038A0h)],ymm0
00007FFBF987F6B4 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+430h (07FFBF9A3D380h)]
00007FFBF987F6BC vmovups ymmword ptr [dither16x16+0E0h (07FFBF9B03A80h)],ymm0
00007FFBF987F6C4 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+410h (07FFBF9A3D360h)]
00007FFBF987F6CC vmovups ymmword ptr [dither16x16+0C0h (07FFBF9B03A60h)],ymm0
00007FFBF987F6D4 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+3F0h (07FFBF9A3D340h)]
00007FFBF987F6DC vmovups ymmword ptr [dither16x16+0A0h (07FFBF9B03A40h)],ymm0
00007FFBF987F6E4 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+3D0h (07FFBF9A3D320h)]
00007FFBF987F6EC vmovups ymmword ptr [dither16x16+80h (07FFBF9B03A20h)],ymm0
00007FFBF987F6F4 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+3B0h (07FFBF9A3D300h)]
00007FFBF987F6FC vmovups ymmword ptr [dither16x16+60h (07FFBF9B03A00h)],ymm0
00007FFBF987F704 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+390h (07FFBF9A3D2E0h)]
00007FFBF987F70C vmovups ymmword ptr [dither16x16+40h (07FFBF9B039E0h)],ymm0
00007FFBF987F714 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+370h (07FFBF9A3D2C0h)]
00007FFBF987F71C vmovups ymmword ptr [dither16x16+20h (07FFBF9B039C0h)],ymm0
00007FFBF987F724 vmovups ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+350h (07FFBF9A3D2A0h)]
00007FFBF987F72C vmovups ymmword ptr [dither16x16 (07FFBF9B039A0h)],ymm0
00007FFBF987F734 vzeroupper
00007FFBF987F737 ret |
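The `_GLOBAL__sub_I_convert_bits_avx2.cpp` routine above is a load-time initializer: the struct has a user-provided, non-constexpr constructor, so the compiler is allowed to fill the tables with code that runs when the DLL loads, and in an AVX2 translation unit that code may use AVX. One way to rule this out, shown here as a minimal sketch and not necessarily what the actual fix does, is to make the table a plain aggregate and declare the object `constexpr`, which requires constant initialization. The `BYTE` alias and the `dither2x2a_at` helper are stand-ins added for illustration.

```cpp
#include <cstdint>

// Stand-in for the BYTE typedef used by the quoted header (assumption).
using BYTE = std::uint8_t;

// Plain aggregate: no user-provided constructor, no default member initializers.
struct dither2x2a_t {
  BYTE data[4];
  alignas(16) BYTE data_sse2[2 * 16];  // each row repeated 8x for a 16-byte SSE register
};

// constexpr forces constant initialization, so the values are placed in the
// binary's data section and no initializer code runs at load time.
static constexpr dither2x2a_t dither2x2a = {
  { 0, 1,
    1, 0 },
  { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
    1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0 }
};

// Compile-time checks: these only compile if the table is a constant expression.
static_assert(dither2x2a.data[1] == 1, "table must be usable at compile time");
static_assert(alignof(dither2x2a_t) >= 16, "SSE2 rows need 16-byte alignment");

// Hypothetical accessor: dither value for pixel (x, y), cycle length 2.
constexpr int dither2x2a_at(int x, int y) {
  return dither2x2a.data[(y & 1) * 2 + (x & 1)];
}
```

Because the object is `static constexpr` in a header, every source file that includes it gets its own read-only copy, which is harmless for small tables like these.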
#2355 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,220
|
@pinterf
Nice that you found something, good work as always. Out of curiosity, do you have any idea why the Visual Studio builds work but the others don't? If you don't, no big deal; at least there is something, and it's in the code, so it is the "rare bug" case, not compiler related. That means it's eventually fixable.
__________________
My github. |
#2358 | Link | |
...?
Join Date: Nov 2005
Location: Florida
Posts: 1,372
|
Quote:
That's probably the simplest explanation.
*A very long-lived, known difference that makes GCC very annoying when it comes to using intrinsics with runtime CPU detection, and that dictates how one has to build the sources. This is the reason there is an entire block in avs_core/CMakeLists.txt doing exactly what was described there: keeping separate sources for intrinsics-using code, then using the build system to slice GCC's and LLVM's intrinsics flags down to just those files that require them to even compile, and trying to keep them separate from all of the other files in the sources so that the compiler doesn't globally optimize non-intrinsics code for the most recent instruction set and bork the entire purpose of having runtime CPU detection. (Although since the SSE paths were templated a while back, those blocks probably need to be cleaned up, since there are no files that would match the query for *_ssse3.cpp or *_sse41.cpp.)
I doubt that the builds that crash on non-AVX machines could be made to work by declaring SetMaxCPU in the script. But if you are building with GCC, Clang, or Intel, you would almost certainly see it start working if you turn off SIMD entirely (-DENABLE_INTEL_SIMD:bool=off) and just use the appropriate -march flag in -DCMAKE_CXX_FLAGS to globally optimize for your particular CPU. That would probably take a pretty big theoretical** performance hit, but still.
**Theoretical, because on non-x86 CPUs (like the IBM 970MP, ARM Cortex A72, or Apple M1) the core still works fairly well, at least on synthetic tests. It's just that on x86 those same tests get absolutely stupid numbers because of the intrinsics. Almost incomprehensibly meaningless numbers in some cases (Version getting north of 32000 fps, versus a much more 'I can understand this' 2000-8000 fps range on the M1 or on a 9th Gen Core i5 with SetMaxCPU("None")).
But if a large amount of the testing is on external plugins and uses few or no core functions, it's questionable whether you would see a performance hit at all. It may also be that what makes the most difference for 'normal' core use is not the intrinsics in all of the filters you may never use, but the x86-optimized memcpy or bitblt, and it is unclear how much of it is down to non-SIMD-related compiler optimizations. I've not tried a '-march=native -O3' build on x86 that also has the intrinsics disabled, but that might be pretty enlightening. |
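The layout described above (per-instruction-set source files plus runtime CPU detection) boils down to choosing a function pointer once the CPU features are known. The sketch below is only an illustration of that pattern: the flag names, `ConvertFn`, and the three `convert_*` functions are hypothetical, not AviSynth+ identifiers, and in a real build each variant would sit in its own source file compiled with its own flags (for example a *_avx2.cpp built with -mavx2).

```cpp
// Hypothetical CPU feature bits; the real project has its own flag set.
enum CpuFlag : unsigned {
  kCpuSse2 = 1u << 0,
  kCpuAvx2 = 1u << 1
};

using ConvertFn = const char* (*)();

// Stand-ins for the per-instruction-set implementations. They are ordinary
// functions here so the sketch runs on any machine.
inline const char* convert_c()    { return "c"; }
inline const char* convert_sse2() { return "sse2"; }
inline const char* convert_avx2() { return "avx2"; }

// Pick the most capable variant that the detected CPU supports.
inline ConvertFn select_convert(unsigned cpu_flags) {
  if (cpu_flags & kCpuAvx2) return convert_avx2;
  if (cpu_flags & kCpuSse2) return convert_sse2;
  return convert_c;
}
```

The dispatch only protects code that is reached through the pointer. Anything in the AVX2 file that runs without being called, such as an initializer for a global object, bypasses the check entirely, which is the failure discussed in this thread.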
|
#2359 | Link |
Registered User
Join Date: Jan 2014
Posts: 2,233
|
A theoretical fix is up on git, along with
https://github.com/AviSynth/AviSynthPlus/issues/347
In the end I think this is not the compiler's fault. Programmers must really take care with statically initialized tables in classes (and avoid them) when they would end up in modules built for anything other than the base CPU architecture. |
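For tables that cannot be made compile-time constants, another option is to build them on first use instead of at load time. This is a generic C++ idiom sketched under assumptions, not a description of the fix in the linked issue; `make_dither2x2`, `get_dither2x2` and the `init_calls` counter are hypothetical names, and the values {0, 2, 3, 1} are read off the `mov dword ptr [dither2x2],1030200h` line in the disassembly earlier in the thread.

```cpp
#include <cstdint>

struct dither2x2_t {
  std::uint8_t data[4];
};

// Hypothetical counter, present only so the sketch can show when the
// initializer actually runs.
inline int& init_calls() {
  static int n = 0;
  return n;
}

// Builds the table at run time.
inline dither2x2_t make_dither2x2() {
  ++init_calls();
  return dither2x2_t{{0, 2, 3, 1}};
}

// Construct on first use: the function-local static is initialized the first
// time this function is called, not when the module is loaded.
inline const dither2x2_t& get_dither2x2() {
  static const dither2x2_t table = make_dither2x2();
  return table;
}
```

If such an accessor is only ever called from code reached after the CPU check, its initializer cannot run on a machine that lacks the instruction set, at the cost of a small guard check on each call.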