Doom9's Forum > Capturing and Editing Video > Avisynth Development

17th March 2023, 17:40   #2341
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,309
Quote:
Originally Posted by DTL
Yes, it looks like clang uses some instructions beyond SSE. VirtualDub loaded via the SDE emulator
Code:
set PATH=%PATH%;G:\sde
sde.exe -avx 1 -avx2 1 -emu_fast 1 -fma 1 -sse41 1 -- G:\Distr\VirtualDub-1.10.4-AMD64\veedub64.exe
can load a script with the x64-clang build of AVS+ 3.7.3 test7. Slow, but working. So is it possible to disable the use of AVX and higher in clang? (IntelClassic must also have switches for it.)

Maybe clang and LLVM are so visibly faster because they finally make use of the SIMD register file for temporaries and function arguments instead of pushing them to the stack, and so on. But old chips do not have a register file of the required size (256-bit registers for AVX) or the instructions to load and store data from it. So are new clang and LLVM no longer compatible with SSE-only chips?
I doubt they are incompatible by design. I wonder whether the illegal instruction comes from Avisynth code or from somewhere in a library. I know that some of my plugins require SSE4.1 when compiled with clang (in places where it would be too painful to separate the SSE2 and SSE4.1 code into different functions with different pragmas), while with MSVC I could do it simply with templates.
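For reference, a minimal sketch of the per-function approach mentioned above, assuming GCC or clang on x86: `__attribute__((target("sse4.1")))` lets a single function use SSE4.1 intrinsics while the rest of the file stays at the baseline, and `__builtin_cpu_supports` guards the call at run time. The names `mul32`, `mul32_sse41`, `mul32_c` and `HAVE_SSE41_PATH` are hypothetical, not Avisynth code; on MSVC or non-x86 targets only the plain C++ path is compiled.

```cpp
#include <cassert>
#include <cstdint>

using std::int32_t;

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <smmintrin.h>  // SSE4.1 intrinsics, usable here thanks to the target attribute
#define HAVE_SSE41_PATH 1

// Only this function is compiled with SSE4.1 enabled; the rest of the file is not.
__attribute__((target("sse4.1")))
static void mul32_sse41(const int32_t* a, const int32_t* b, int32_t* out, int n) {
  int i = 0;
  for (; i + 4 <= n; i += 4) {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
    // _mm_mullo_epi32 (pmulld) is an SSE4.1 instruction.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_mullo_epi32(va, vb));
  }
  for (; i < n; ++i) out[i] = a[i] * b[i];  // scalar tail
}
#endif

// Baseline C++ path, compiled with the file's default flags.
static void mul32_c(const int32_t* a, const int32_t* b, int32_t* out, int n) {
  for (int i = 0; i < n; ++i) out[i] = a[i] * b[i];
}

// Hypothetical entry point: uses the SSE4.1 path only if the CPU reports it.
void mul32(const int32_t* a, const int32_t* b, int32_t* out, int n) {
#ifdef HAVE_SSE41_PATH
  if (__builtin_cpu_supports("sse4.1")) {
    mul32_sse41(a, b, out, n);
    return;
  }
#endif
  mul32_c(a, b, out, n);
}
```

With MSVC no attribute is needed, since intrinsics can be used in any function regardless of /arch, which is why templates alone are enough there.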
17th March 2023, 17:56   #2342
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,309
Check CMakeLists.txt. I think SSE4.1 is set as the minimum for LLVM compilation.
17th March 2023, 18:31   #2343
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
Unfortunately, the VS2019 debugger does not break on the Illegal Instruction exception in AVS (or I still don't know how to configure it to break and show the disassembly). But my Core 2 Duo E7500 CPU does have MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, EM64T and VT-x, and those builds still crash with an invalid instruction unless SDE emulates up to AVX2.

It is definitely something related to AVX/AVX2.
Disabling AVX and AVX2 emulation:
sde.exe -avx 0 -avx2 0 -- G:\Distr\VirtualDub-1.10.4-AMD64\veedub64.exe

causes a crash with the x64_clang build:
Illegal instruction at address = 7fed7947e0a: c5 fc 10 05 9e 17 19 00 c5 fc 11 05 e6 b4 24
Image name: C:\Windows\system32\AviSynth.dll
Offset in image: 0x4a7e0a

IDA shows this in the disassembly:
.text:00000001804A7E0A vmovups ymm0, cs:ymmword_1806395B0 - this is an AVX instruction (a 256-bit load into a YMM register).

Same as the VS2019 debug output: Exception thrown at 0x000007FEC4FF7E0A (AviSynth.dll) in Veedub64.exe: 0xC000001D: Illegal Instruction.
Is the 7E0A at the end of both addresses some page offset?
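On the 7E0A question, a quick check with the numbers from the logs above, assuming the usual Windows behaviour of placing DLL image bases on 64 KB allocation-granularity boundaries. Subtracting the reported "Offset in image" (0x4A7E0A) from each faulting address gives a 64 KB aligned base, so the low 16 bits are 0x7E0A wherever the DLL happens to be loaded. The constant names are just labels for this sketch.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Values copied from the logs in this post.
constexpr u64 kOffsetInImage = 0x4A7E0A;          // "Offset in image" (SDE log)
constexpr u64 kSdeFault      = 0x7FED7947E0AULL;  // SDE: "Illegal instruction at address"
constexpr u64 kVsFault       = 0x7FEC4FF7E0AULL;  // VS2019: "Exception thrown at"
constexpr u64 kIdaAddr       = 0x1804A7E0AULL;    // IDA: .text address

// Assumption: DLL image bases are multiples of the 64 KB allocation granularity.
constexpr u64 kGranularity = 0x10000;

// Load address of AviSynth.dll in a given run = faulting address - offset inside the image.
constexpr u64 image_base(u64 fault_addr) { return fault_addr - kOffsetInImage; }

static_assert(image_base(kIdaAddr) == 0x180000000ULL, "IDA shows the file's preferred base");
static_assert(image_base(kSdeFault) % kGranularity == 0, "SDE run: 64 KB aligned base");
static_assert(image_base(kVsFault) % kGranularity == 0, "VS run: 64 KB aligned base");

// An aligned base leaves the low 16 bits untouched, so they always equal
// the low 16 bits of the offset: 0x7E0A.
static_assert((kSdeFault & 0xFFFF) == (kOffsetInImage & 0xFFFF), "SDE low bits");
static_assert((kVsFault & 0xFFFF) == (kOffsetInImage & 0xFFFF), "VS low bits");
```

So all three addresses point at the same instruction; only the load base differs between runs.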

Last edited by DTL; 17th March 2023 at 18:48.
17th March 2023, 19:17   #2344
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,309
My quick-test PC doesn't have AVX, but has SSE4.2. I never tested on my dedicated video-processing PC, which has AVX2 (since it didn't work on the quick test).
I'll test again, this time on my video PC.
Is it possible that, despite the compile options, clang still emits AVX/AVX2 where it shouldn't? Or that some AVX/AVX2 intrinsic code is in a place it doesn't belong? It's strange that both the Intel and clang builds produce the same result.
Then again, Visual Studio has already changed things without telling (for example, the alignment bug in AVISource, where VS compiled an unaligned operation when the code asked for an aligned one).
__________________
My github.

Last edited by jpsdr; 17th March 2023 at 21:03.
17th March 2023, 19:39   #2345
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
Quote:
Originally Posted by jpsdr
Or that some AVX/AVX2 intrinsic code is in a place it doesn't belong?
The IDA disassembly of the clang build doesn't look like anything handwritten:

sub_1804A7E00 proc near
mov cs:dword_1806F32F0, 10100h
vmovups ymm0, cs:ymmword_1806395B0 <-- crash
vmovups cs:ymmword_1806F3300, ymm0
mov cs:dword_1806F3320, 1030200h
vmovups ymm0, cs:ymmword_1806395D0
vmovups cs:ymmword_1806F3330, ymm0
vmovaps xmm0, cs:xmmword_18052C510
vmovaps cs:xmmword_1806F3350, xmm0
vmovups ymm0, cs:ymmword_1806395F0
vmovups cs:ymmword_1806F3360, ymm0
vmovups ymm0, cs:ymmword_180639610
vmovups cs:ymmword_1806F3380, ymm0
vmovaps xmm0, cs:xmmword_18052C520
vmovaps cs:xmmword_1806F33A0, xmm0
vmovups ymm0, cs:ymmword_180639630
vmovups cs:ymmword_1806F33B0, ymm0
vmovups ymm0, cs:ymmword_180639650
vmovups cs:ymmword_1806F33D0, ymm0
vmovups ymm0, cs:ymmword_180639690
vmovups cs:ymmword_1806F3410, ymm0
vmovups ymm0, cs:ymmword_180639670
vmovups cs:ymmword_1806F33F0, ymm0
vmovups ymm0, cs:ymmword_180639710
vmovups cs:ymmword_1806F3490, ymm0
vmovups ymm0, cs:ymmword_1806396F0
vmovups cs:ymmword_1806F3470, ymm0
vmovups ymm0, cs:ymmword_1806396D0
vmovups cs:ymmword_1806F3450, ymm0
vmovups ymm0, cs:ymmword_1806396B0
vmovups cs:ymmword_1806F3430, ymm0

and so on. It looks like a compiler-generated block.

Last edited by DTL; 17th March 2023 at 19:42.
17th March 2023, 19:49   #2346
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,309
It's an unrolled loop doing memcpy.
Fortunately I have a non-avx machine at home.
17th March 2023, 20:19   #2347
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
It is doing a memcpy of a fairly large block. Shouldn't a typical general-purpose memcpy support any size from 1 byte up?
17th March 2023, 21:04   #2348
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,309
Tested on my video PC with AVX2 (and Windows 7): my LLVM build works.
17th March 2023, 21:20   #2349
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
So does the LLVM build environment have some setting to force-disable emitting AVX and later instructions? Or can these builds only be marked as requiring AVX(2) at minimum?

Users may still have second-hand multi-core, multi-CPU Xeon systems that are capable of AVS processing but don't even have AVX.
17th March 2023, 21:30   #2350
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,309
Quote:
Originally Posted by DTL
It is doing a memcpy of a fairly large block. Shouldn't a typical general-purpose memcpy support any size from 1 byte up?
It is optimized: it prechecks the data, then automatically handles multiple processor paths, small blocks, aligned blocks, large blocks and overlaps. Small copies are usually inlined, and a different technique is used when the size is known at compile time; both MS and LLVM do this.
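A toy illustration of the "different path depending on size" idea described above. This is not the MSVC or LLVM memcpy: it has no alignment handling and no SIMD, and `toy_memcpy` is a hypothetical name. Like memcpy, it requires that source and destination do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy copy routine: picks a strategy from the size.
void* toy_memcpy(void* dst, const void* src, std::size_t n) {
  unsigned char* d = static_cast<unsigned char*>(dst);
  const unsigned char* s = static_cast<const unsigned char*>(src);

  if (n < 8) {
    // Small block: plain byte loop.
    for (std::size_t i = 0; i < n; ++i) d[i] = s[i];
    return dst;
  }

  // Larger block: move 8 bytes per step. A fixed-size std::memcpy is the
  // portable way to express an unaligned 64-bit load/store; compilers
  // typically turn it into a single move because the size is a constant.
  std::uint64_t w;
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    std::memcpy(&w, s + i, 8);
    std::memcpy(d + i, &w, 8);
  }
  if (i < n) {
    // Tail: redo the last 8 bytes, overlapping bytes already written
    // (safe because n >= 8 and src/dst do not overlap).
    std::memcpy(&w, s + n - 8, 8);
    std::memcpy(d + n - 8, &w, 8);
  }
  return dst;
}
```

The constant-size case is what shows up in the disassembly above: the copy of a fixed-size struct is simply expanded into a run of wide moves.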
17th March 2023, 21:37   #2351
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,309
Finally, 37 minutes after switching it on, my PC has happily loaded the AVS+ source. OMG, so slow; there's no SSD in this machine.
Intel(R) Core(TM) i7 860: it goes up to SSE4.2, no AVX.
This PC has VS 2019 with LLVM 12.0. I made a debug build and ran it. No problem.
17th March 2023, 21:41   #2352
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,309
Quote:
Originally Posted by DTL
So does the LLVM build environment have some setting to force-disable emitting AVX and later instructions? Or can these builds only be marked as requiring AVX(2) at minimum?

Users may still have second-hand multi-core, multi-CPU Xeon systems that are capable of AVS processing but don't even have AVX.
Yep: use the MSVC build. Simple as that. It may even be quicker than the LLVM one; LLVM alone is not a magic wand.
(But the problem is interesting, very interesting, I must say. And it may have other causes, which is why I'm putting in days to investigate.)
18th March 2023, 00:25   #2353
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,309
Ehh...

First of all, this slow machine makes me mad. 13 minutes for a simple CMake install. 17 minutes from starting VS2022 until the Avisynth project was loaded. But then! BUT THEN!

The Debug build was not failing.
The Release-with-debug-info build did. Yeah.

Strange.

It really did run AVX code on my pre-AVX machine.
This is what happened.

convert_bits_avx2.cpp includes "convert_bits.h"
https://github.com/AviSynth/AviSynth...s_avx2.cpp#L54

convert_bits.h contains the static initialization of the dither structures:
https://github.com/AviSynth/AviSynth...ert_bits.h#L89

like this:

Code:
// repeated 8x for sse size 16
static const struct dither2x2a_t
{
  const BYTE data[4] = {
    0, 1,
    1, 0,
  };
  // cycle: 2
  alignas(16) const BYTE data_sse2[2 * 16] = {
    0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
    1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0
  };
  dither2x2a_t() {};
} dither2x2a;
The illegal instruction came from
Code:
dither2x2a_t() {}
which triggered the static initialization.

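Unfortunately, this initialization was emitted from an AVX2 module. Because the header defines these objects as static, every .cpp that includes it gets its own copy with its own initializer. The copy in convert_bits_avx2.cpp was compiled with AVX2 enabled (note _GLOBAL__sub_I_convert_bits_avx2.cpp in the disassembly below), and static initializers run when the DLL is loaded, before any CPU check can happen.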

Although the failure happened at the first struct initialization, there are other predefined dither structs, and they would all fail as well.

I'm going to stop putting such active code into common header files. It must be moved out into a .hpp and included in both the _sse2 and the _avx2 sources. I wonder what happens when I make that change. Will it recognise that static initialization in AVX2-compiled code is forbidden?
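One possible way to keep such tables out of the AVX2 translation unit, as a sketch (hypothetical layout, not the actual AviSynth+ change): the shared header only declares the table with `extern`, and exactly one source file compiled for the base architecture defines it. The AVX2 file then only references the data and has no initializer of its own. Both halves are shown in one file here. A namespace-scope `const` object has internal linkage by default, so the defining .cpp must see the `extern` declaration (include the header) or repeat `extern` itself.

```cpp
#include <cassert>
#include <cstdint>

using BYTE = std::uint8_t;  // assumption: BYTE is an unsigned 8-bit type, as in windows.h

// ---- would go in the shared header ----
struct dither2x2a_t {
  BYTE data[4];
  alignas(16) BYTE data_sse2[2 * 16];  // cycle: 2
};
extern const dither2x2a_t dither2x2a;  // declaration only: no initializer code here

// ---- would go in ONE .cpp compiled for the base architecture (SSE2) ----
// Aggregate initialization with constant values is constant initialization,
// so the data is set up at compile time in this one place.
const dither2x2a_t dither2x2a = {
  {0, 1,
   1, 0},
  {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
   1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0}
};
```

The design choice here is a single definition: even if some compiler did decide to generate run-time initialization, it would live only in the baseline-compiled file.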

Disassembly, for the record:

Code:
AviSynth.dll!_GLOBAL__sub_I_convert_bits_avx2.cpp(void):
00007FFBF987F4E0  mov         dword ptr [dither2x2a (07FFBF9B03620h)],10100h  
00007FFBF987F4EA  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+10h (07FFBF9A3CF60h)]  
00007FFBF987F4F2  vmovups     ymmword ptr [dither2x2a+10h (07FFBF9B03630h)],ymm0  
00007FFBF987F4FA  mov         dword ptr [dither2x2 (07FFBF9B03650h)],1030200h  
00007FFBF987F504  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+30h (07FFBF9A3CF80h)]  
00007FFBF987F50C  vmovups     ymmword ptr [dither2x2+10h (07FFBF9B03660h)],ymm0  
00007FFBF987F514  vmovaps     xmm0,xmmword ptr [__xmm@02060307040005010307020605010400 (07FFBF991D470h)]  
00007FFBF987F51C  vmovaps     xmmword ptr [dither4x4a (07FFBF9B03680h)],xmm0  
00007FFBF987F524  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+50h (07FFBF9A3CFA0h)]  
00007FFBF987F52C  vmovups     ymmword ptr [dither4x4a+10h (07FFBF9B03690h)],ymm0  
00007FFBF987F534  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+70h (07FFBF9A3CFC0h)]  
00007FFBF987F53C  vmovups     ymmword ptr [dither4x4a+30h (07FFBF9B036B0h)],ymm0  
00007FFBF987F544  vmovaps     xmm0,xmmword ptr [__xmm@050d070f09010b03060e040c0a020800 (07FFBF991D480h)]  
00007FFBF987F54C  vmovaps     xmmword ptr [dither4x4 (07FFBF9B036D0h)],xmm0  
00007FFBF987F554  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+90h (07FFBF9A3CFE0h)]  
00007FFBF987F55C  vmovups     ymmword ptr [dither4x4+10h (07FFBF9B036E0h)],ymm0  
00007FFBF987F564  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+0B0h (07FFBF9A3D000h)]  
00007FFBF987F56C  vmovups     ymmword ptr [dither4x4+30h (07FFBF9B03700h)],ymm0  
00007FFBF987F574  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+0F0h (07FFBF9A3D040h)]  
00007FFBF987F57C  vmovups     ymmword ptr [dither8x8a+20h (07FFBF9B03740h)],ymm0  
00007FFBF987F584  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+0D0h (07FFBF9A3D020h)]  
00007FFBF987F58C  vmovups     ymmword ptr [dither8x8a (07FFBF9B03720h)],ymm0  
00007FFBF987F594  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+170h (07FFBF9A3D0C0h)]  
00007FFBF987F59C  vmovups     ymmword ptr [dither8x8a+0A0h (07FFBF9B037C0h)],ymm0  
00007FFBF987F5A4  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+150h (07FFBF9A3D0A0h)]  
00007FFBF987F5AC  vmovups     ymmword ptr [dither8x8a+80h (07FFBF9B037A0h)],ymm0  
00007FFBF987F5B4  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+130h (07FFBF9A3D080h)]  
00007FFBF987F5BC  vmovups     ymmword ptr [dither8x8a+60h (07FFBF9B03780h)],ymm0  
00007FFBF987F5C4  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+110h (07FFBF9A3D060h)]  
00007FFBF987F5CC  vmovups     ymmword ptr [dither8x8a+40h (07FFBF9B03760h)],ymm0  
00007FFBF987F5D4  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+190h (07FFBF9A3D0E0h)]  
00007FFBF987F5DC  vmovups     ymmword ptr [dither8x8 (07FFBF9B037E0h)],ymm0  
00007FFBF987F5E4  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+1B0h (07FFBF9A3D100h)]  
00007FFBF987F5EC  vmovups     ymmword ptr [dither8x8+20h (07FFBF9B03800h)],ymm0  
00007FFBF987F5F4  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+1D0h (07FFBF9A3D120h)]  
00007FFBF987F5FC  vmovups     ymmword ptr [dither8x8+40h (07FFBF9B03820h)],ymm0  
00007FFBF987F604  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+1F0h (07FFBF9A3D140h)]  
00007FFBF987F60C  vmovups     ymmword ptr [dither8x8+60h (07FFBF9B03840h)],ymm0  
00007FFBF987F614  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+210h (07FFBF9A3D160h)]  
00007FFBF987F61C  vmovups     ymmword ptr [dither8x8+80h (07FFBF9B03860h)],ymm0  
00007FFBF987F624  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+230h (07FFBF9A3D180h)]  
00007FFBF987F62C  vmovups     ymmword ptr [dither8x8+0A0h (07FFBF9B03880h)],ymm0  
00007FFBF987F634  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+330h (07FFBF9A3D280h)]  
00007FFBF987F63C  vmovups     ymmword ptr [dither16x16a+0E0h (07FFBF9B03980h)],ymm0  
00007FFBF987F644  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+310h (07FFBF9A3D260h)]  
00007FFBF987F64C  vmovups     ymmword ptr [dither16x16a+0C0h (07FFBF9B03960h)],ymm0  
00007FFBF987F654  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+2F0h (07FFBF9A3D240h)]  
00007FFBF987F65C  vmovups     ymmword ptr [dither16x16a+0A0h (07FFBF9B03940h)],ymm0  
00007FFBF987F664  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+2D0h (07FFBF9A3D220h)]  
00007FFBF987F66C  vmovups     ymmword ptr [dither16x16a+80h (07FFBF9B03920h)],ymm0  
00007FFBF987F674  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+2B0h (07FFBF9A3D200h)]  
00007FFBF987F67C  vmovups     ymmword ptr [dither16x16a+60h (07FFBF9B03900h)],ymm0  
00007FFBF987F684  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+290h (07FFBF9A3D1E0h)]  
00007FFBF987F68C  vmovups     ymmword ptr [dither16x16a+40h (07FFBF9B038E0h)],ymm0  
00007FFBF987F694  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+270h (07FFBF9A3D1C0h)]  
00007FFBF987F69C  vmovups     ymmword ptr [dither16x16a+20h (07FFBF9B038C0h)],ymm0  
00007FFBF987F6A4  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+250h (07FFBF9A3D1A0h)]  
00007FFBF987F6AC  vmovups     ymmword ptr [dither16x16a (07FFBF9B038A0h)],ymm0  
00007FFBF987F6B4  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+430h (07FFBF9A3D380h)]  
00007FFBF987F6BC  vmovups     ymmword ptr [dither16x16+0E0h (07FFBF9B03A80h)],ymm0  
00007FFBF987F6C4  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+410h (07FFBF9A3D360h)]  
00007FFBF987F6CC  vmovups     ymmword ptr [dither16x16+0C0h (07FFBF9B03A60h)],ymm0  
00007FFBF987F6D4  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+3F0h (07FFBF9A3D340h)]  
00007FFBF987F6DC  vmovups     ymmword ptr [dither16x16+0A0h (07FFBF9B03A40h)],ymm0  
00007FFBF987F6E4  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+3D0h (07FFBF9A3D320h)]  
00007FFBF987F6EC  vmovups     ymmword ptr [dither16x16+80h (07FFBF9B03A20h)],ymm0  
00007FFBF987F6F4  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+3B0h (07FFBF9A3D300h)]  
00007FFBF987F6FC  vmovups     ymmword ptr [dither16x16+60h (07FFBF9B03A00h)],ymm0  
00007FFBF987F704  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+390h (07FFBF9A3D2E0h)]  
00007FFBF987F70C  vmovups     ymmword ptr [dither16x16+40h (07FFBF9B039E0h)],ymm0  
00007FFBF987F714  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+370h (07FFBF9A3D2C0h)]  
00007FFBF987F71C  vmovups     ymmword ptr [dither16x16+20h (07FFBF9B039C0h)],ymm0  
00007FFBF987F724  vmovups     ymm0,ymmword ptr [__xmm@0f80800e80800d80800c80800b80800a+350h (07FFBF9A3D2A0h)]  
00007FFBF987F72C  vmovups     ymmword ptr [dither16x16 (07FFBF9B039A0h)],ymm0  
00007FFBF987F734  vzeroupper  
00007FFBF987F737  ret
Eh... At least I'm finally gonna sleep well. I'll return to it next week.
18th March 2023, 10:41   #2354
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
@DTL: If you had wrapped these listings in a BBCode CODE block, they would have taken less space, in scrollable boxes.

Oops, missed another page of replies, sorry.
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
18th March 2023, 11:14   #2355
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,309
@pinterf
Nice that you found something, good work as always.
Out of curiosity, do you have any idea why the Visual Studio builds work but not the others? If you don't, no big deal: at least there is something, and it's in the code, so it's the "rare bug" case, not a compiler issue. That means it's eventually fixable.
18th March 2023, 11:39   #2356
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,041
I deleted those posts, as I found a better way to report the error via the SDE crash log.
18th March 2023, 12:04   #2357
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
eheheheh I knew there must have been something else there.

Very nicely spotted, Ferenc!
You're the "Avisynth Grandmaster" after all.
18th March 2023, 20:25   #2358
qyot27
...?
 
Join Date: Nov 2005
Location: Florida
Posts: 1,419
Quote:
Originally Posted by jpsdr
@pinterf
Nice that you found something, good work as always.
Out of curiosity, do you have any idea why the Visual Studio builds work but not the others? If you don't, no big deal: at least there is something, and it's in the code, so it's the "rare bug" case, not a compiler issue. That means it's eventually fixable.
Judging by some of the replies, there are/were functions in headers and in the process dispatching that were getting dinged by LLVM and not by MSVC, because of the difference* in the way GCC/Clang (and presumably Intel now too, since it uses LLVM) and CL handle enabling intrinsics. This was causing AVX instructions to be emitted for sources they shouldn't have been generated for.

That's probably the simplest explanation.

*a very long-lived, known difference that makes GCC very annoying when it comes to using intrinsics with runtime CPU detection, and in how one has to build the sources. This is the reason there's an entire block in avs_core/CMakeLists.txt doing exactly what was described there: keeping intrinsics-using code in separate sources, then using the build system to restrict GCC's and LLVM's intrinsics flags to just the files that need them to even compile. That keeps those flags away from all the other files, so the non-intrinsics code doesn't get globally optimized for the most recent instruction set and bork the entire purpose of having runtime CPU detection.

(although since the SSE paths were templated a while back, those blocks probably need to be cleaned up, as no files match the *_ssse3.cpp or *_sse41.cpp patterns anymore)
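A minimal sketch of the runtime-dispatch side of that arrangement. All names and flag bits are hypothetical, not the AviSynth+ API. The selector is ordinary code meant to be compiled with baseline flags only; in a real build each kernel would sit in its own source file and only that file would get -msse4.1 or -mavx2. If the whole project were built with -mavx2 instead, the selector itself (and any static initializer) could contain AVX instructions and crash before the check ever runs. The `max_allowed` mask plays a role similar in spirit to SetMaxCPU.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU feature bits (not AviSynth+'s CPUF_* values).
enum CpuFlag : std::uint32_t { kSSE2 = 1u << 0, kSSE41 = 1u << 1, kAVX2 = 1u << 2 };

enum class Path { C, SSE41, AVX2 };
using Kernel = Path (*)();

// Stand-ins for real kernels. In a real build each would live in its own
// file (e.g. foo_c.cpp, foo_sse41.cpp, foo_avx2.cpp) and only that file would
// be compiled with the matching instruction-set flag.
Path kernel_c()     { return Path::C; }
Path kernel_sse41() { return Path::SSE41; }
Path kernel_avx2()  { return Path::AVX2; }

// Intended to be compiled with baseline flags only.
// detected:    features reported by the CPU at run time
// max_allowed: user cap on the instruction sets that may be used
Kernel select_kernel(std::uint32_t detected, std::uint32_t max_allowed) {
  const std::uint32_t usable = detected & max_allowed;
  if (usable & kAVX2)  return kernel_avx2;
  if (usable & kSSE41) return kernel_sse41;
  return kernel_c;
}
```

Usage: resolve the function pointer once (for example when a filter is constructed) and call it per frame, so the feature check is not repeated in the inner loop.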

I doubt that the builds that crash on non-AVX machines could be made to work by declaring SetMaxCPU in the script, but if building with GCC, Clang, or Intel, you would almost certainly see it start working if you turn off SIMD entirely (-DENABLE_INTEL_SIMD:bool=off) and just use the appropriate -march flag in -DCMAKE_CXX_FLAGS to globally optimize for your particular CPU. It would probably take a pretty big theoretical** performance hit, but still.

**theoretical, because on non-x86 CPUs (like the IBM 970MP, ARM Cortex A72, or Apple M1), the core still works fairly well, at least on synthetic tests. It's just that on x86, those same tests produce absolutely stupid numbers because of the intrinsics. Almost comprehensibly meaningless numbers in some cases (Version getting north of 32000 fps, vs. a much more 'I can understand this' 2000-8000 fps range on the M1 or on a 9th Gen Core i5 with SetMaxCPU("None")). But if a large amount of the testing is on external plugins, not using many or any core functions, it's questionable whether you'd see a performance hit at all, or whether what makes the most difference for 'normal' core use is not the intrinsics in all the filters you may never use, but the x86-optimized memcpy or bitblt. And how much of it is down to non-SIMD compiler optimizations. I've not tried a '-march=native -O3' build on x86 that also has the intrinsics disabled, but that might be pretty enlightening.
20th March 2023, 12:26   #2359
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,309
A theoretical fix is up on git, along with an issue report:

https://github.com/AviSynth/AviSynthPlus/issues/347

In the end, I think this is not the compiler's fault.

Programmers must really take care with statically initialized tables in classes (and avoid them) when they would end up in modules other than the base CPU-architecture ones.
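A minimal sketch of one way to follow that rule, assuming C++11 or later: make the table `constexpr`. A constexpr variable must be constant-initialized, so the compiler does not generate a run-time initializer for it in any translation unit, AVX2 or not. In the original struct, the user-provided, non-constexpr constructor `dither2x2a_t() {}` removes that guarantee, which is what allows a load-time initializer like the one clang emitted. This is a simplified version of the struct from convert_bits.h and not necessarily what the committed fix does; C++20 `constinit` is another option.

```cpp
#include <cassert>
#include <cstdint>

using BYTE = std::uint8_t;  // assumption: BYTE is an unsigned 8-bit type, as in windows.h

// Simplified from convert_bits.h: an aggregate, no user-provided constructor.
struct dither2x2a_t {
  BYTE data[4];
  alignas(16) BYTE data_sse2[2 * 16];
};

// constexpr forces constant initialization: the bytes are part of the binary
// image, so no code has to run at DLL load time to fill them in.
constexpr dither2x2a_t dither2x2a = {
  {0, 1,
   1, 0},
  // cycle: 2
  {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
   1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0}
};

// Checked by the compiler, which only works because the values are
// compile-time constants.
static_assert(dither2x2a.data[1] == 1, "row 0, column 1");
static_assert(dither2x2a.data_sse2[16] == 1, "second SSE row starts with 1");
```

If a translation unit ever needs the table in a form that cannot be constexpr, the `static_assert` lines stop compiling, which makes a regression visible at build time rather than as a crash on an old CPU.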
20th March 2023, 16:13   #2360
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,309
I've made an LLVM build of r3966 and tested it: it works.

All times are GMT +1.