Questions about assembler [Archive]

Jamaika

28th August 2021, 14:06

I don't know how to build codecs with nasm and gcc.

Does assembler mean SIMD AVX, AVX2, AVX3?
I know that assembler can be used with 64bit files.
Do you need a computer with an AVX2 CPU for the codecs?
What options should gcc and nasm be given?
Is nasm the best for assembler? Maybe there is something better and newer.
Maybe a third program is needed to merge the others?
How to check if assembler is included and works with the codec?

LoRd_MuldeR

28th August 2021, 15:53

Does assembler mean SIMD AVX, AVX2, AVX3?
No. Not necessarily. Assembler means you are writing code in assembly language (https://en.wikipedia.org/wiki/Assembly_language), instead of using a "high level" language, such as C, C++, Rust, etc.

If you write code in assembly language, for the x86 or x64 platform, then you may use SIMD instructions (MMX, SSE/2/3/4, AVX/2, etc.), but you don't have to. Using SIMD instructions limits the CPUs that your code will run on.

Note: Usually ~99% of the code of an application or library is written in a "high level" language. Only the "critical" functions are written in assembly language, for optimization purposes.

Do you need a computer with an AVX2 CPU for the codecs?

If code written in assembly language uses AVX2 instructions, then yes, that code requires a CPU which supports (at least) the AVX2 instruction set extension. Otherwise it would crash with an "illegal instruction" exception :eek:

But: Usually, developers create multiple versions of the assembly code targeting different types of CPU. Then, at runtime, the "best" version of the code for the particular CPU can be selected.

For example, the same function can be implemented as "plain C" (runs on all CPUs and serves as a baseline), as AVX-optimized assembly code (runs on CPUs with AVX support) and as AVX2-optimized assembly code (runs on CPUs with AVX2 support). At runtime, the application can check the capabilities of the CPU that it is running on, using the CPUID (https://en.wikipedia.org/wiki/CPUID) instruction, and then select the implementation that matches the actual CPU.

Of course, all that does not happen "automatically". The programmer has to implement it that way! ;)
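
A minimal sketch of such runtime dispatching in C, assuming GCC or Clang on x86 (the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins are compiler extensions). The kernel `sum_bytes` and all function names are hypothetical, and the "AVX" and "AVX2" versions are placeholders that simply call the plain C code where a real project would link hand-written assembly:

```c
#include <stddef.h>
#include <stdint.h>

/* Signature shared by every implementation of the (hypothetical) kernel. */
typedef uint32_t (*sum_bytes_fn)(const uint8_t *data, size_t len);

/* Plain C baseline: runs on every CPU. */
uint32_t sum_bytes_c(const uint8_t *data, size_t len)
{
    uint32_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += data[i];
    return total;
}

/* Placeholders standing in for assembly versions; here they only
 * forward to the baseline so that the sketch stays self-contained. */
uint32_t sum_bytes_avx(const uint8_t *data, size_t len)
{
    return sum_bytes_c(data, len);
}

uint32_t sum_bytes_avx2(const uint8_t *data, size_t len)
{
    return sum_bytes_c(data, len);
}

/* Selection logic: pick the most capable version the CPU can run. */
sum_bytes_fn select_sum_bytes(int has_avx, int has_avx2)
{
    if (has_avx2)
        return sum_bytes_avx2;
    if (has_avx)
        return sum_bytes_avx;
    return sum_bytes_c;
}

/* Ask the CPU once at startup; fall back to plain C elsewhere. */
sum_bytes_fn init_sum_bytes(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return select_sum_bytes(__builtin_cpu_supports("avx"),
                            __builtin_cpu_supports("avx2"));
#else
    return select_sum_bytes(0, 0);
#endif
}
```

An application would call `init_sum_bytes()` once and keep the returned pointer; every later call goes through that pointer, so the CPU check is not repeated.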

Is nasm the best for assembler? Maybe there is something better and newer.

Because an assembler just translates assembly instructions into binary opcodes in a "1:1" fashion, one assembler is as good as any other – provided that it supports the desired target platform and all required instruction set extensions you want to use. However, be aware that different assemblers may use different "dialects" of the assembly language. So, existing assembly code generally needs to be assembled with the specific assembler that it was written for.

How to check if assembler is included and works with the codec?

By looking at the source code?

Also, applications and libraries often provide diagnostic output at runtime, which shows whether they were built with assembly code enabled and, if so, which specific assembly code optimizations are actually in use.

x264 is a good example of that:
x264 [info]: using cpu capabilities MMX MMXEXT SSE SSE2

(It means that x264 has determined that the CPU it is running on supports MMX, MMXEXT, SSE and SSE2. Thus x264 is going to use its optimized assembly code for those specific instruction set extensions)
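
If such a diagnostic line has to be checked from a script or test harness, a whole-word match is needed, because "SSE" is also a prefix of "SSE2". A small sketch in C, assuming the line has the space-separated form shown above (the function name `line_has_capability` is made up for illustration):

```c
#include <string.h>

/* Returns 1 if `name` occurs in `line` as a complete space-separated
 * word, 0 otherwise. A plain substring search is not enough, since
 * "SSE" would also match inside "SSE2". */
int line_has_capability(const char *line, const char *name)
{
    size_t n = strlen(name);
    const char *p = line;

    if (n == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        int starts_word = (p == line) || (p[-1] == ' ');
        int ends_word = (p[n] == '\0') || (p[n] == ' ') || (p[n] == '\n');
        if (starts_word && ends_word)
            return 1;
        p++; /* keep searching after this partial match */
    }
    return 0;
}
```

With the x264 line quoted above, `line_has_capability(line, "SSE2")` returns 1 and `line_has_capability(line, "AVX2")` returns 0.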

videoh

28th August 2021, 16:15

Because an assembler just translates assembly instructions into binary opcodes in a "1:1" fashion
You are forgetting about assembler directives, macros, etc.

lvqcl

28th August 2021, 16:47

Maybe by "make codecs" he means "compile existing code into an executable"?

videoh

28th August 2021, 17:10

Maybe by "make codecs" he means "compile existing code into an executable"?
Yes. The source code generally comes with a full project or build script that would specify all the tools used.

Jamaika

28th August 2021, 18:30

No. Not necessarily. Assembler means you are writing code in assembly language (https://en.wikipedia.org/wiki/Assembly_language), instead of using a "high level" language, such as C, C++, Rust, etc.

If you write code in assembly language, for the x86 or x64 platform, then you may use SIMD instructions (MMX, SSE/2/3/4, AVX/2, etc.), but you don't have to. Using SIMD instructions limits the CPUs that your code will run on.

Note: Usually ~99% of the code of an application or library is written in a "high level" language. Only the "critical" functions are written in assembly language, for optimization purposes.
Please explain more clearly. What does "SIMD instructions (MMX, SSE/2/3/4, AVX/2, etc.)" mean?
AVX2 covers everything from MMX and SSE2/3/4 up to AVX2, but does that mean I will be able to use SSE2 alone?
But: Usually, developers create multiple versions of the assembly code targeting different types of CPU. Then, at runtime, the "best" version of the code for the particular CPU can be selected.
Here I don't understand what is going on. Let me give an example: libjpeg-turbo and the new dav1d come with assembly code that has files for both SSE2 and AVX2. To build the assembly code I have to add both kinds of files, SSE2 and AVX2. The question is: does the assembly code then run as SSE2 or as AVX2?
The SSE2 or AVX/AVX2/AVX3 asm files can all be added to a gcc build that uses the same option (-msse2), but are these then dead functions, or do they contribute anything? It doesn't have rare comumic.
Because an assembler just translates assembly instructions into binary opcodes in a "1:1" fashion, one assembler is as good as any other – provided that it supports the desired target platform and all required instruction set extensions you want to use. However, be aware that different assemblers may use different "dialects" of the assembly language. So, existing assembly code generally needs to be assembled with the specific assembler that it was written for.
I asked about assembler programs because I read that there are different ones, e.g. yasm. I also wanted to know whether, in the assembly code itself, you can choose to use only e.g. SSE2.
By looking at the source code?

Also, applications and libraries often provide diagnostic output at runtime, which shows whether they were built with assembly code enabled and, if so, which specific assembly code optimizations are actually in use.

x264 is a good example of that:
It seems there should be no problem with something so simple. Unfortunately, I have had no success.
For x265 I get the message: none. At first I thought my computer was too old, but I downloaded another user's build of the codec, and there the assembly code reports that it works.
I wonder what I am doing wrong. These are the options I pass to the programs:

g++.exe -std=gnu++11 -ggdb3 -flto -O3 -fPIC -DWINVER=0x0602 -D_WIN32_WINNT=0x0602 -DEXPORT_C_API=0 -DX265_NS=x265_12bit -DX86_64=1 -DX265_VERSION=3.5+13 -DHIGH_BIT_DEPTH=1 -DX265_DEPTH=12 -DNX265_ARCH_X86=1 -DENABLE_HDR10_PLUS=1 -DNENABLE_LIBVMAF=1 -DENABLE_ASSEMBLY=1 -DHAVE_STRTOK_R -c ... -o ...
nasm.exe -f win64 -O3 -DARCH_X86_64=1 -DBIT_DEPTH=12 -DHIGH_BIT_DEPTH=1 -DX265_NS=x265_12bit -DPIC=1 -DSUFFIX=o -Xgnu ... -o ...
Some recommend adding "-march=native" and some say it is not necessary. The processor must be specified.
On Windows, should the nasm output format for gcc be win64 or elf64?
Or maybe gcc 12.0.0 just has bugs and doesn't work.

Sorry for my English

LoRd_MuldeR

28th August 2021, 18:35

Most applications or libraries that do contain optimized assembly code will check for the required assembler tool (e.g. nasm) when you run the provided ./configure script.

Usually the ./configure script will simply error out when the required assembler tool is missing – unless you explicitly disable assembly code by passing the --disable-asm option (or whatever it is called). Sometimes the ./configure script "silently" disables the assembly code, if the required assembler tool wasn't found. In that case you should see whether assembly code is enabled or not from the final output of the ./configure script, e.g.:
./configure
[...]
platform: X86
shared: yes
static: yes
asm: yes <--- !!!

Note: It should be clear that all this requires that the software which you are trying to build already does support assembly code. If the software does not support assembly code, you cannot "magically" enable it ;)
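
The build system usually passes the result of that check on to the C/C++ code as a preprocessor macro, such as the -DENABLE_ASSEMBLY=1 or -DHAVE_ASM=1 definitions seen in the command lines in this thread. A sketch of how a program might report the setting it was built with (the function `asm_build_status` is hypothetical, not taken from any of these projects):

```c
/* If the build system did not define the macro, assume assembly
 * code is disabled. ENABLE_ASSEMBLY mirrors the name used on the
 * x265 command line in this thread; other projects use other names. */
#ifndef ENABLE_ASSEMBLY
#define ENABLE_ASSEMBLY 0
#endif

/* Report, as a string, whether this build was compiled with
 * assembly code enabled. */
const char *asm_build_status(void)
{
#if ENABLE_ASSEMBLY
    return "asm: yes";
#else
    return "asm: no";
#endif
}
```

Compiled without any -D option this reports "asm: no"; compiled with -DENABLE_ASSEMBLY=1 it reports "asm: yes". Defining the macro alone only flips the switch: the assembled object files still have to be built and linked in.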

LoRd_MuldeR

28th August 2021, 18:45

Please explain more clearly. What does "SIMD instructions (MMX, SSE/2/3/4, AVX/2, etc.)" mean?
AVX2 covers everything from MMX and SSE2/3/4 up to AVX2, but does that mean I will be able to use SSE2 alone?

MMX, SSE, SSE2, SSE3, SSSE3, SSE4, AVX, AVX2 and AVX-512 – just to name a few – all are extensions of the original x86 instruction set. They all are so-called SIMD (https://en.wikipedia.org/wiki/SIMD) instruction sets.

If "optimized" assembly code uses any of those instruction set extensions, then that code will run only on CPUs which support the specific instruction set extension.

So, if the assembly code uses any SSE2 instructions, it requires a CPU that supports (at least) SSE2. If the assembly code uses any AVX instructions, it requires a CPU that supports (at least) AVX. If the assembly code uses any AVX2 instructions, it requires a CPU that supports (at least) AVX2. And so on. Of course, SSE2 and AVX (or AVX2) instructions can be "mixed" in the assembly code, but then a CPU with support for SSE2 and AVX (or AVX2) is required to run that code!

(Note: If a CPU supports AVX, then support for SSE2 is implied, but certainly not the other way around! Also AVX2 support implies AVX support, but again not the other way around)
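
Each of these extensions is reported by its own CPUID feature bit, which is why they have to be tested one by one. A sketch in C, assuming GCC or Clang with the `<cpuid.h>` header (the struct and function names are made up for illustration). It is deliberately simplified: a complete AVX/AVX2 check also has to confirm that the operating system saves the wider registers (OSXSAVE bit plus XGETBV), which is left out here:

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#else
#define HAVE_CPUID_H 0
#endif

struct simd_features {
    int sse2;
    int avx;
    int avx2;
};

/* Decode raw CPUID register values into feature flags:
 *   SSE2: leaf 1, EDX bit 26
 *   AVX:  leaf 1, ECX bit 28
 *   AVX2: leaf 7 (subleaf 0), EBX bit 5 */
struct simd_features decode_simd(uint32_t leaf1_ecx, uint32_t leaf1_edx,
                                 uint32_t leaf7_ebx)
{
    struct simd_features f;
    f.sse2 = (int)((leaf1_edx >> 26) & 1u);
    f.avx  = (int)((leaf1_ecx >> 28) & 1u);
    f.avx2 = (int)((leaf7_ebx >> 5) & 1u);
    return f;
}

/* Query the CPU this program runs on; reports "nothing supported"
 * when built for a non-x86 target. OS support is NOT checked. */
struct simd_features query_simd(void)
{
    uint32_t l1_ecx = 0, l1_edx = 0, l7_ebx = 0;
#if HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        l1_ecx = ecx;
        l1_edx = edx;
    }
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        l7_ebx = ebx;
#endif
    return decode_simd(l1_ecx, l1_edx, l7_ebx);
}
```

On an SSE2-only CPU, `query_simd()` would return `sse2 = 1` with `avx` and `avx2` both 0, so only the SSE2 code paths may be used there.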

As said before, an application or library may contain multiple versions of the same assembly code. For example, one version that uses SSE2 only and another version that uses AVX. This allows the "SSE2 only" assembly code to be used on CPUs that do support SSE2 but not AVX. And the "AVX" assembly code can be used on CPUs that support AVX. But, to be clear, this kind of "runtime CPU dispatching" does not happen automatically; it needs to be implemented!

Jamaika

28th August 2021, 19:06

MMX, SSE, SSE2, SSE3, SSSE3, SSE4, AVX, AVX2 and AVX-512 – just to name a few – all are extensions of the original x86 instruction set.
I wonder about the ARCH_X86_64 definition. Does that mean x265 works with win64?

Should the gcc options -flto or -ffast-math be used, or are they not necessary? Which -g option should be used: -g0, -ggdb or -gdwarf?

lvqcl

28th August 2021, 19:13

https://en.wikipedia.org/wiki/Programming_by_permutation

LoRd_MuldeR

28th August 2021, 20:59

I wonder about the ARCH_X86_64 definition. Does that mean x265 works with win64?

"x86-64" (aka "x64", aka "AMD64", aka "Intel 64", not to be confused with "IA64") is the 64-Bit extension of Intel's original "x86" architecture.

It is the architecture that all Intel and AMD processors from the last ~15 years use.

Binaries built for "x86-64" architecture will run on 64-Bit Windows (Win64), as long as we are talking about the 64-Bit Windows for Intel/AMD "x86-64" processors.

Note: There now is a version of Windows for "arm64" processors too, but that is something different!
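
Inside C code, the target architecture can be told apart with the macros that the compiler predefines. A small sketch, assuming the usual GCC/Clang names (`__x86_64__`, `__i386__`, `__aarch64__`) and their MSVC counterparts; note that project-specific macros such as ARCH_X86_64 are set by the project's own build scripts, not by the compiler. The function names here are hypothetical:

```c
/* Name of the architecture this file is being compiled for. */
const char *target_arch(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86 (32-bit)";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#else
    return "unknown";
#endif
}

/* 1 when compiling for 64-bit Windows (x86-64 or arm64), else 0.
 * _WIN64 is predefined by MSVC and by MinGW-w64 gcc. */
int is_win64_target(void)
{
#if defined(_WIN64)
    return 1;
#else
    return 0;
#endif
}
```

A 64-bit MinGW-w64 build for Intel/AMD processors would report "x86-64" together with `is_win64_target() == 1`.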

Should the gcc options -flto or -ffast-math be used, or are they not necessary? Which -g option should be used: -g0, -ggdb or -gdwarf?

Regarding link-time optimization (LTO) see:
https://en.wikipedia.org/wiki/Interprocedural_optimization

Regarding the "-ffast-math" option:
This enables some "unsafe" floating-point math optimizations. Specifically, it drops strict compliance with the IEEE rules/specifications in order to yield even faster code. This may break some applications!
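
One concrete example of what "dropping strict compliance" can mean: floating-point addition is not associative, and with -ffast-math the compiler is allowed to regroup additions as if it were. The sketch below (function names are illustrative) spells out both groupings by hand, so the difference is visible even without the option; `volatile` is used only to force each intermediate result to be rounded to a double:

```c
/* Computes (a + b) + c, rounding the intermediate sum to double. */
double sum_left_to_right(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Computes a + (b + c), rounding the intermediate sum to double. */
double sum_right_to_left(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

For a = 1e16, b = -1e16, c = 0.5 the first grouping yields 0.5, while the second yields 0.0, because -1e16 + 0.5 rounds back to -1e16. Code that depends on one particular grouping can therefore change its results under -ffast-math.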

Jamaika

28th August 2021, 21:08

Regarding link-time optimization (LTO) see:
https://en.wikipedia.org/wiki/Interprocedural_optimization

Regarding the "-ffast-math" option:
This enables some "unsafe" optimizations for math functions. Specifically, it drops compliance with the IEEE or ISO rules/specifications in order to yield even faster code. This may break some applications!
So generally speaking do not use.

LoRd_MuldeR

28th August 2021, 21:16

So generally speaking do not use.

Well, LTO may give a nice speed-up for some applications, but may not have any noteworthy effect for others. You really have to test it out. I think it's something worth trying.

But the "-ffast-math" option is something you should use with care. Even though it may give an extra speed-up, if you do not understand exactly what the consequences are (for your particular application), then it is better not to use it ;)

Jamaika

29th August 2021, 13:00

Searching for the bug, I read a bit.
For libjpeg-turbo, the cmake build recommends using debugging information, "-g". In nasm the equivalent is also "-g".
I used "-g0". My mistake.
"Level 0 produces no debug information at all. Thus, -g0 negates -g." The default level is 2.
Then I ran a test with "-ggdb3". The assembly test for x265 failed on SSE2 or AVX2.
For svt-av1, the cmake build recommends "-gdwarf", an option used under Unix. The problem is that "-gdwarf" in gcc defaults to gdwarf32 for the nasm elf32/64 formats.
Trying the "-gdwarf" option did not work either.

For me, it's a big deal to combine nasm with gcc 12.0.0. The suggestion that I am just permuting program options is like throwing a ball against a fence: nasm doesn't have a lot of options. The programs either work or they don't.

libjpeg-turbo
nasm_2.15.05.exe -f win64 -g -F cv8 -O3 -DWIN64=1 -D__x86_64__=1 -DPIC=1 %%f -Xgnu -o %%~nf.o
gcc_12.0.0.exe -std=gnu11 -g -O3 -fPIC -DWITH_SIMD=1 -DUSE_WINDOWS_MESSAGEBOX=1 -DBITS_IN_JSAMPLE=8 -DINLINE="inline __attribute__((always_inline))" -DLOCAL(type)="static type" -c %%f -o %%~nf.o

dav1d
nasm_2.15.05.exe -f win64 -g -F cv8 -O3 -DWIN64=1 -DARCH_X86_64=1 -DPIC=1 -Dprivate_prefix=dav1d -Xgnu %%f -o %%~nf.o
gcc_12.0.0.exe -std=gnu11 -g -O3 -fPIC -DARCH_X86_64=1 -DHAVE_ASM=1 -D__USE_MINGW_ANSI_STDIO=1 -DUNICODE=1 -D_UNICODE=1 -DCONFIG_8BPC -DCONFIG_16BPC -DBITDEPTH=8 -DHAVE_ALIGNED_MALLOC=1 -c %%f -o %%~nf.o

svt-av1 ???
nasm_2.15.05.exe -f win64 -g -F cv8 -O3 -DWIN64=1 -DARCH_X86_64=1 -DPIC=1 -Xgnu %%f -o %%~nf.o
gcc_12.0.0.exe -std=gnu11 -g -O3 -fPIC -DARCH_X86_64=1 -c %%f -o %%~nf.o

x265
nasm_2.15.05.exe -fwin32 -g -F cv8 -O3 -DX265_ARCH_X86=1 -DARCH_X86_64=0 -DBIT_DEPTH=8 -DHIGH_BIT_DEPTH=0 -DX265_NS=x265 -DPIC=1 -DSUFFIX=o -Xgnu %%f -o %%~nf.o
g++_12.0.0.exe -std=gnu++11 -g -O3 -fPIC -DX86_64=0 -DWINVER=0x0602 -D_WIN32_WINNT=0x0602 -DEXPORT_C_API=1 -DX265_NS=x265 -DLINKED_10BIT=1 -DLINKED_12BIT=1 -DX265_VERSION=3.5+13 -DHIGH_BIT_DEPTH=0 -DX265_DEPTH=8 -DENABLE_HDR10_PLUS=1 -DHAVE_STRTOK_R -DENABLE_ASSEMBLY=1 -c %%f -o %%~nf.o

Jamaika

31st August 2021, 06:01

Building libheif with assembly code. Assume that we don't use cmake, but want to test new functions.
After hours of guessing, I find the libjpeg-turbo, dav1d, and x265 assembly code a pain to build.
Defects:
Very large gcc output files when only the options {-g -O3} are used. The user can only use -fPIC under 64-bit if the assembly files contain the appropriate functions. The remaining options {-ffast-math -flto -ftree-vectorize} can't be included: there would be differences between the nasm and gcc files and the codec would not work properly.
x265 works for me only in 32-bit, with the AVX2 and AVX3 functions turned off, since they only work in 64-bit. When I use the x86_64 gcc for C++ in 32-bit mode {-m32}, gcc doesn't allow this to work with the additional {mingw.thread} software. There are bugs. Moreover, the x265 assembly code has bugs and doesn't work in 64-bit even after corrections, as it reports.
libjpeg and dav1d only work in x86_64 32-bit and 64-bit builds. The user should use the {-mavx2} option, because the assembly code covers MMX, SSE2 and AVX2. However, the user may not want {-mavx2}, and then it runs only under SSE2.

So for photo codecs the user should not use assembler.

With libavif it should be easier: there is no C++ and no x265. AV1 doesn't have 64-bit assembly code yet.
https://www.sendspace.com/file/qmalk7

nevcairiel

31st August 2021, 07:41

All these projects already come with build systems that compile the C code and the assembly files for you. Why are you manually trying to compile them when someone already took all the effort to make it easy to compile with just a few commands?

Jamaika

31st August 2021, 08:25

Why can't I? Is inquiry prohibited?
Secondly, I wanted to investigate the latest dav1d assembly fixes that are not included in libheif. I know this upsets some.
https://github.com/videolan/dav1d

"When I use gcc in C++ x86_64 32bit {m32} then gcc doesn't allow this to work with the additional {mingw.thread} software." My mistake. The downloaded gcc 64bit did not contain 32bit files.

Jamaika

1st September 2021, 07:57

Assembler in GCC for AVX / AVX2 / AVX3 processors. I have an old computer, so this topic didn't interest me. However, this change awaits me.
Amateur observations: how is it that AVX2 can open files in C11 from C++11, but not in C++11 from C11? How can I observe it? Help options aren't displayed in C++11. What am I doing wrong? I read on forums that combining the C11 and C++11 languages is troublesome. The advice is to use "-march=native". It doesn't work for me.
I thought that I had an old computer that can't open the next-generation AVX2 codecs. Today I compiled libavif in C11. The codecs work in SSE2. So, does the user need to separate the files into SSE2 and AVX2? What should be added to C++11 to make it work?
Despite the effort, I was unable to run libheif or libwebp2 in C++11 with AVX2.

Examples:
https://www.sendspace.com/filegroup/qZYFQ7gzg60Ouqaacpaf5Q