Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

28th August 2021, 14:06   #1
Jamaika
Registered User
 
Join Date: Jul 2015
Posts: 705
Questions about assembler

I don't know how to make codecs with nasm and gcc.

Does assembler mean SIMD AVX, AVX2, AVX3?
I know that assembler can be used in 64-bit builds.
Do you need a computer with an AVX2 CPU for the codecs?
What options should be passed to gcc and nasm?
Is nasm the best assembler? Maybe there is something better and newer.
Maybe a third program is needed to combine the others?
How can I check whether assembler code is included and works in the codec?
28th August 2021, 15:53   #2
LoRd_MuldeR
Software Developer
 
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by Jamaika
Does assembler mean SIMD AVX, AVX2, AVX3?
No, not necessarily. Assembler means you are writing code in assembly language, instead of using a "high level" language, such as C, C++, Rust, etc.

If you write code in assembly language, for the x86 or x64 platform, then you may use SIMD instructions (MMX, SSE/2/3/4, AVX/2, etc.), but you don't have to. Using SIMD instructions limits the CPUs that your code will run on.

Note:
Usually ~99% of the code of an application or library is written in a "high level" language. Only the "critical" functions are written in assembly language, for optimization purposes.

Quote:
Originally Posted by Jamaika
Do you need a computer with an AVX2 CPU for the codecs?
If code written in assembly language uses AVX2 instructions, then yes, that code requires a CPU which supports (at least) the AVX2 instruction set extension. Otherwise it would crash with an "illegal instruction" exception.

But: Usually, developers create multiple versions of the assembly code targeting different types of CPU. Then, at runtime, the "best" version of the code for the particular CPU can be selected.

For example, the same function can be implemented as "plain C" (runs on all CPUs and serves as a baseline), as AVX-optimized assembly code (runs on CPUs with AVX support) and as AVX2-optimized assembly code (runs on CPUs with AVX2 support). At runtime, the application can check the capabilities of the CPU that it is running on, using the CPUID instruction, and then select the implementation that matches the actual CPU.

Of course, all that does not happen "automatically". The programmer has to implement it that way!
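A minimal sketch of the "plain C plus AVX2" pattern described above, assuming GCC or Clang on x86-64. It uses C intrinsics instead of hand-written assembly to stay short: `__attribute__((target("avx2")))` allows AVX2 instructions in one function without building the whole file with `-mavx2`, and `__builtin_cpu_supports` performs the CPUID check at runtime. The names `add_u8_c`, `add_u8_avx2` and `select_add_u8` are made up for illustration; real codecs such as x264 use their own assembly and their own dispatch tables.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline: plain C, runs on every x86 CPU. Saturating byte-wise add. */
static void add_u8_c(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned s = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(s > 255 ? 255 : s);
    }
}

/* AVX2 version. The target attribute allows AVX2 instructions in this one
   function only; calling it on a CPU without AVX2 raises "illegal instruction". */
__attribute__((target("avx2")))
static void add_u8_avx2(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {                /* 32 bytes per iteration */
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_adds_epu8(va, vb));
    }
    add_u8_c(dst + i, a + i, b + i, n - i);       /* leftover tail bytes */
}

typedef void (*add_u8_fn)(uint8_t *, const uint8_t *, const uint8_t *, size_t);

/* Runtime dispatch: ask the CPU (via CPUID) once and pick the best version. */
static add_u8_fn select_add_u8(void)
{
    add_u8_fn fn = add_u8_c;                      /* safe default */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        fn = add_u8_avx2;
    assert(fn != NULL);
    return fn;
}
```

A caller would typically run `add_u8_fn add_u8 = select_add_u8();` once at startup and then call `add_u8(...)` everywhere. On a CPU without AVX2 the AVX2 function is never executed, which is why the same binary still runs there.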

Quote:
Originally Posted by Jamaika
Is nasm the best assembler? Maybe there is something better and newer.
Because an assembler just translates assembly instructions into binary opcodes in a "1:1" fashion, one assembler is as good as any other – provided that it supports the desired target platform and all required instruction set extensions you want to use. However, be aware that different assemblers may use different "dialects" of the assembly language. So, existing assembly code generally needs to be assembled with the specific assembler that it was written for.
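To illustrate the "dialect" point, here is the same x86 instruction written in two syntaxes. This is a minimal sketch, assuming GCC or Clang on x86/x86-64, where inline assembly uses AT&T syntax by default (the syntax of the GNU assembler). Compared with the Intel syntax used by nasm, the operand order is reversed, registers carry a `%` prefix and the mnemonic gets a size suffix. The function name `add_att` is hypothetical.

```c
#include <assert.h>

/* Adds b to a with a single x86 ADD instruction written in AT&T syntax.
   AT&T order is "source, destination":   addl %ebx, %eax
   The nasm (Intel syntax) spelling of the same instruction: add eax, ebx */
static int add_att(int a, int b)
{
    __asm__("addl %1, %0"    /* %0 = a (read and written), %1 = b */
            : "+r"(a)
            : "r"(b)
            : "cc");         /* ADD modifies the flags register */
    return a;
}
```

Both spellings end up as the same ADD opcode, but source code written for nasm cannot simply be fed to the GNU assembler (or vice versa). GCC's `-masm=intel` option switches inline assembly to Intel syntax, but that still does not give you nasm's directives and macros.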

Quote:
Originally Posted by Jamaika
How can I check whether assembler code is included and works in the codec?
By looking at the source code?

Also, applications and libraries often provide diagnostic output at runtime, which shows whether they were built with assembly code enabled and, if so, which specific assembly code optimizations are actually in use.

x264 is a good example of that:
Code:
x264 [info]: using cpu capabilities MMX MMXEXT SSE SSE2
(It means that x264 has determined that the CPU it is running on supports MMX, MMXEXT, SSE and SSE2. Thus x264 is going to use its optimized assembly code for those specific instruction set extensions)
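A minimal sketch of how such a capability line can be produced, assuming GCC or Clang on x86/x86-64 and the compiler-provided `<cpuid.h>` header (`__get_cpuid` and the `bit_*` masks). The `CPU_*` flags and the output format are made up for this example and are not x264's internal constants. AVX and later extensions are deliberately left out: checking them correctly also requires asking the operating system (via XGETBV) whether it saves the wider registers.

```c
#include <assert.h>
#include <cpuid.h>
#include <stdio.h>

/* Hypothetical flag names for this sketch (not x264's own constants). */
enum {
    CPU_MMX   = 1 << 0,
    CPU_SSE   = 1 << 1,
    CPU_SSE2  = 1 << 2,
    CPU_SSE3  = 1 << 3,
    CPU_SSSE3 = 1 << 4,
    CPU_SSE41 = 1 << 5,
    CPU_SSE42 = 1 << 6,
};

/* Query CPUID leaf 1 and translate the feature bits into our own flags. */
static unsigned detect_cpu_flags(void)
{
    unsigned eax, ebx, ecx, edx, flags = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                                   /* leaf 1 not available */
    if (edx & bit_MMX)    flags |= CPU_MMX;
    if (edx & bit_SSE)    flags |= CPU_SSE;
    if (edx & bit_SSE2)   flags |= CPU_SSE2;
    if (ecx & bit_SSE3)   flags |= CPU_SSE3;
    if (ecx & bit_SSSE3)  flags |= CPU_SSSE3;
    if (ecx & bit_SSE4_1) flags |= CPU_SSE41;
    if (ecx & bit_SSE4_2) flags |= CPU_SSE42;
    return flags;
}

/* Print a log line in the spirit of x264's "using cpu capabilities" message. */
static void print_cpu_caps(FILE *out, unsigned flags)
{
    static const struct { unsigned bit; const char *name; } names[] = {
        { CPU_MMX, "MMX" },     { CPU_SSE, "SSE" },     { CPU_SSE2, "SSE2" },
        { CPU_SSE3, "SSE3" },   { CPU_SSSE3, "SSSE3" },
        { CPU_SSE41, "SSE4.1" }, { CPU_SSE42, "SSE4.2" },
    };
    assert(out != NULL);
    fprintf(out, "sketch [info]: using cpu capabilities:");
    if (flags == 0)
        fprintf(out, " none");
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (flags & names[i].bit)
            fprintf(out, " %s", names[i].name);
    fputc('\n', out);
}
```

Calling `print_cpu_caps(stdout, detect_cpu_flags());` on any x86-64 machine lists at least MMX, SSE and SSE2, because those are part of the x86-64 baseline.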
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 28th August 2021 at 16:33.
28th August 2021, 16:15   #3
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
Quote:
Originally Posted by LoRd_MuldeR
Because an assembler just translates assembly instructions into binary opcodes in a "1:1" fashion
You are forgetting about assembler directives and macros, etc.
28th August 2021, 16:47   #4
lvqcl
Registered User
 
Join Date: Aug 2015
Posts: 294
Maybe by "make codecs" he means "compile existing code into an executable"?
28th August 2021, 17:10   #5
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
Quote:
Originally Posted by lvqcl
Maybe by "make codecs" he means "compile existing code into an executable"?
Yes. The source code generally comes with a full project or build script that would specify all the tools used.
28th August 2021, 18:30   #6
Jamaika
Registered User
 
Join Date: Jul 2015
Posts: 705
Quote:
Originally Posted by LoRd_MuldeR
No, not necessarily. Assembler means you are writing code in assembly language, instead of using a "high level" language, such as C, C++, Rust, etc.

If you write code in assembly language, for the x86 or x64 platform, then you may use SIMD instructions (MMX, SSE/2/3/4, AVX/2, etc.), but you don't have to. Using SIMD instructions limits the CPUs that your code will run on.

Note:
Usually ~99% of the code of an application or library is written in a "high level" language. Only the "critical" functions are written in assembly language, for optimization purposes.
Please describe this more clearly. What does "SIMD instructions (MMX, SSE/2/3/4, AVX/2, etc.)" mean?
If AVX2 covers everything from MMX and SSE2/3/4 up to AVX2, does that mean I will be able to use SSE2 alone?
Quote:
Originally Posted by LoRd_MuldeR
But: Usually, developers create multiple versions of the assembly code targeting different types of CPU. Then, at runtime, the "best" version of the code for the particular CPU can be selected.
And here I don't know what's going on. Take an example: libjpeg-turbo or the new dav1d with assembler, which has separate files for SSE2 and AVX2. To compile the assembler code, I have to add both kinds of files, SSE2 and AVX2. The question is: is the assembler code then SSE2 or AVX2?
The added SSE2 or AVX/AVX2/AVX3 asm files can all be compiled with gcc under the same option (-msse2), but are these then dead functions, and do they contribute anything? There is no message about it.
Quote:
Originally Posted by LoRd_MuldeR
Because an assembler just translates assembly instructions into binary opcodes in a "1:1" fashion, one assembler is as good as any other – provided that it supports the desired target platform and all required instruction set extensions you want to use. However, be aware that different assemblers may use different "dialects" of the assembly language. So, existing assembly code generally needs to be assembled with the specific assembler that it was written for.
I asked about assembler programs because I read that they differ, e.g. yasm. I was also interested in whether, in the assembly language itself, you can choose only e.g. SSE2.
Quote:
Originally Posted by LoRd_MuldeR
By looking at the source code?

Also, applications and libraries often provide diagnostic output at runtime, which shows whether they were built with assembly code enabled and, if so, which specific assembly code optimizations are actually in use.

x264 is a good example of that:
It seems that such a simple issue should be no problem. Unfortunately, I have had no success.
For x265 I get the message: none. At first I thought I had an old computer, but I downloaded another user's build of the codec, and there the assembler reports that it works.
These are the options I pass to the programs:
I wonder what I am doing wrong.
Code:
g++.exe -std=gnu++11 -ggdb3 -flto -O3 -fPIC -DWINVER=0x0602 -D_WIN32_WINNT=0x0602 -DEXPORT_C_API=0 -DX265_NS=x265_12bit -DX86_64=1 -DX265_VERSION=3.5+13 -DHIGH_BIT_DEPTH=1 -DX265_DEPTH=12 -DNX265_ARCH_X86=1 -DENABLE_HDR10_PLUS=1 -DNENABLE_LIBVMAF=1 -DENABLE_ASSEMBLY=1 -DHAVE_STRTOK_R -c ... -o ...
nasm.exe -f win64 -O3 -DARCH_X86_64=1 -DBIT_DEPTH=12 -DHIGH_BIT_DEPTH=1 -DX265_NS=x265_12bit -DPIC=1 -DSUFFIX=o -Xgnu ... -o  ...
Some recommend adding "-march=native" and some say it is not necessary. The processor must be specified.
Should the nasm output format be win64 or elf64 for gcc?
Or maybe gcc 12.0.0 just has bugs and doesn't work.

Sorry for my English

Last edited by Jamaika; 28th August 2021 at 18:59.
28th August 2021, 18:35   #7
LoRd_MuldeR
Software Developer
 
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Most applications or libraries that do contain optimized assembly code will check for the required assembler tool (e.g. nasm) when you run the provided ./configure script.

Usually the ./configure script will simply error out when the required assembler tool is missing – unless you explicitly disable assembly code by passing the --disable-asm option (or whatever it is called). Sometimes the ./configure script "silently" disables the assembly code, if the required assembler tool wasn't found. In that case you can see from the final output of the ./configure script whether assembly code is enabled or not, e.g.:
Code:
./configure
[...]
platform:      X86
shared:        yes
static:        yes
asm:           yes <--- !!!
Note: It should be clear that all this requires that the software which you are trying to build already supports assembly code. If the software does not support assembly code, you cannot "magically" enable it.
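A rough sketch of what the "asm: yes/no" switch typically turns into on the C side, assuming a generic project: the build system writes a macro into config.h, and the C code only references the assembly functions when that macro is set. `HAVE_ASM` (defaulting to 0 here), `sum_c`, `sum_simd_asm` and `pick_sum` are placeholder names for this sketch, not the code of any particular codec. With the macro at 0, even an AVX2-capable CPU ends up on the plain C path. If the source has no assembly functions at all, there is nothing for such a switch to enable.

```c
#include <assert.h>
#include <stddef.h>

/* Normally generated by ./configure (or CMake/meson) into config.h.
   HAVE_ASM is a placeholder name for this sketch. */
#ifndef HAVE_ASM
#define HAVE_ASM 0
#endif

/* Portable C version, always compiled. */
static int sum_c(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if HAVE_ASM
/* Hypothetical function living in a separate .asm file assembled by nasm;
   it is only declared (and linked) when assembly is enabled. */
int sum_simd_asm(const int *v, size_t n);
#endif

typedef int (*sum_fn)(const int *, size_t);

/* cpu_has_simd would come from a runtime CPUID check. */
static sum_fn pick_sum(int cpu_has_simd)
{
#if HAVE_ASM
    if (cpu_has_simd)
        return sum_simd_asm;
#endif
    (void)cpu_has_simd;    /* unused when assembly is disabled */
    return sum_c;
}
```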

Last edited by LoRd_MuldeR; 28th August 2021 at 18:53.
28th August 2021, 18:45   #8
LoRd_MuldeR
Software Developer
 
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by Jamaika
Please describe this more clearly. What does "SIMD instructions (MMX, SSE/2/3/4, AVX/2, etc.)" mean?
If AVX2 covers everything from MMX and SSE2/3/4 up to AVX2, does that mean I will be able to use SSE2 alone?
MMX, SSE, SSE2, SSE3, SSSE3, SSE4, AVX, AVX2 and AVX-512 – just to name a few – all are extensions of the original x86 instruction set. They all are so-called SIMD instruction sets.

If "optimized" assembly code uses any of those instruction set extensions, then that code will run only on CPUs which support the specific instruction set extension.

So, if the assembly code uses any SSE2 instructions, it requires a CPU that supports (at least) SSE2. If the assembly code uses any AVX instructions, it requires a CPU that supports (at least) AVX. If the assembly code uses any AVX2 instructions, it requires a CPU that supports (at least) AVX2. And so on. Of course, SSE2 and AVX (or AVX2) instructions can be "mixed" in the assembly code, but then a CPU with support for SSE2 and AVX (or AVX2) is required to run that code!

(Note: If a CPU supports AVX, then support for SSE2 is implied, but certainly not the other way around! Also AVX2 support implies AVX support, but again not the other way around)

As said before, an application or library may contain multiple versions of the same assembly code. For example, one version that uses SSE2 only and another version that uses AVX. This allows the "SSE2 only" assembly code to be used on CPUs that do support SSE2 but not AVX. And the "AVX" assembly code can be used on CPUs that support AVX. But, to be clear, this kind of "runtime CPU dispatching" does not happen automatically; it needs to be implemented!
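The implication rules can also be observed on the compiler side. This is a minimal sketch, assuming GCC or Clang. Compiler options such as `-msse2`, `-mavx`, `-mavx2` or `-march=native` define predefined macros like `__SSE2__`, `__AVX__` and `__AVX2__`, and enabling a newer extension also defines the older ones it implies. These macros only describe what the *compiler* may use for the C code. They say nothing about the CPU the program later runs on, and they do not affect separately assembled nasm files. The rank numbers and names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* If these ever fired, the compiler would be violating the implication rules. */
#if defined(__AVX2__) && !defined(__AVX__)
#error "AVX2 enabled without AVX"
#endif
#if defined(__AVX__) && !defined(__SSE2__)
#error "AVX enabled without SSE2"
#endif

static const char *const simd_rank_names[] = {
    "below SSE2", "SSE2", "SSE4.2", "AVX", "AVX2", "AVX-512F"
};

/* Highest extension this translation unit was compiled for (not the CPU's!). */
static int compile_time_simd_rank(void)
{
#if defined(__AVX512F__)
    return 5;
#elif defined(__AVX2__)
    return 4;
#elif defined(__AVX__)
    return 3;
#elif defined(__SSE4_2__)
    return 2;
#elif defined(__SSE2__)
    return 1;
#else
    return 0;
#endif
}

static void report_build_simd(void)
{
    int r = compile_time_simd_rank();
    assert(r >= 0 && r < (int)(sizeof simd_rank_names / sizeof simd_rank_names[0]));
    printf("C code compiled for: %s\n", simd_rank_names[r]);
}
```

Building this file once with no `-m` options and once with `-mavx2` should print different results. The second build may then contain AVX2 instructions anywhere in the compiled C code and therefore requires an AVX2 CPU.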

Last edited by LoRd_MuldeR; 28th August 2021 at 19:00.
28th August 2021, 19:06   #9
Jamaika
Registered User
 
Join Date: Jul 2015
Posts: 705
Quote:
Originally Posted by LoRd_MuldeR
MMX, SSE, SSE2, SSE3, SSSE3, SSE4, AVX, AVX2 and AVX-512 – just to name a few – all are extensions of the original x86 instruction set.
I wonder about the ARCH_X86_64 definition. Does that mean x265 works with Win64?

Should I use the gcc options -flto or -ffast-math, or are they not necessary? Which -g option should be used: -g0, -ggdb or -gdwarf?

Last edited by Jamaika; 28th August 2021 at 19:10.
28th August 2021, 19:13   #10
lvqcl
Registered User
 
Join Date: Aug 2015
Posts: 294
https://en.wikipedia.org/wiki/Progra...by_permutation
28th August 2021, 20:59   #11
LoRd_MuldeR
Software Developer
 
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by Jamaika
I wonder about the ARCH_X86_64 definition. Does that mean x265 works with Win64?
"x86-64" (aka "x64", aka "AMD64", aka "Intel 64", not to be confused with "IA64") is the 64-Bit extension of Intel's original "x86" architecture.

It is the architecture that virtually all Intel and AMD desktop and notebook processors of the last ~15 years use.

Binaries built for "x86-64" architecture will run on 64-Bit Windows (Win64), as long as we are talking about the 64-Bit Windows for Intel/AMD "x86-64" processors.

Note: There is now a version of Windows for "arm64" processors too, but that is something different!
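A related practical point for combining nasm with gcc is that the nasm output format (`-f ...`) has to match the object format of the compiler and linker, not just the CPU architecture. This is a minimal sketch, assuming the usual predefined macros of GCC, Clang or MSVC (`__x86_64__`/`_M_X64`, `__i386__`/`_M_IX86`, `_WIN64`, `_WIN32`, `__APPLE__`, `__ELF__`, `__LP64__`). It reports which nasm format would match the current compiler target. MinGW-w64 gcc on 64-bit Windows produces Win64 COFF objects, so it pairs with `-f win64`, while `elf64` is for x86-64 Linux and similar ELF systems. `SKETCH_ARCH_X86_64` is a made-up name. Projects such as x265 or dav1d typically have their build system pass their own `ARCH_X86_64` define instead.

```c
#include <assert.h>
#include <string.h>

/* Compiler-predefined macros: GCC/Clang spell it __x86_64__, MSVC _M_X64. */
#if defined(__x86_64__) || defined(_M_X64)
#define SKETCH_ARCH_X86_64 1
#else
#define SKETCH_ARCH_X86_64 0
#endif
#if defined(__i386__) || defined(_M_IX86)
#define SKETCH_ARCH_X86_32 1
#else
#define SKETCH_ARCH_X86_32 0
#endif

/* nasm "-f" output format matching this compiler's target. */
static const char *matching_nasm_format(void)
{
#if SKETCH_ARCH_X86_64 && defined(_WIN64)
    return "win64";    /* 64-bit Windows, including MinGW-w64 gcc */
#elif SKETCH_ARCH_X86_32 && defined(_WIN32)
    return "win32";
#elif SKETCH_ARCH_X86_64 && defined(__APPLE__)
    return "macho64";
#elif SKETCH_ARCH_X86_64 && defined(__ELF__) && defined(__LP64__)
    return "elf64";    /* e.g. x86-64 Linux */
#elif SKETCH_ARCH_X86_64 && defined(__ELF__)
    return "elfx32";   /* x32 ABI: 64-bit mode with 32-bit pointers */
#elif SKETCH_ARCH_X86_32 && defined(__ELF__)
    return "elf32";
#else
    return "unknown";  /* not x86, or a combination this sketch does not cover */
#endif
}

/* True for the 64-bit object formats handled above. */
static int is_64bit_object_format(const char *fmt)
{
    assert(fmt != NULL);
    return strcmp(fmt, "win64") == 0 || strcmp(fmt, "macho64") == 0 ||
           strcmp(fmt, "elf64") == 0;
}
```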

Quote:
Originally Posted by Jamaika
Should I use the gcc options -flto or -ffast-math, or are they not necessary? Which -g option should be used: -g0, -ggdb or -gdwarf?
Regarding link-time optimization (LTO) see:
https://en.wikipedia.org/wiki/Interp...l_optimization

Regarding the "-ffast-math" option:
This enables some "unsafe" optimizations for floating-point math. Specifically, it drops strict compliance with the IEEE rules/specifications in order to yield even faster code. This may break some applications!

Last edited by LoRd_MuldeR; 28th August 2021 at 21:30.
28th August 2021, 21:08   #12
Jamaika
Registered User
 
Join Date: Jul 2015
Posts: 705
Quote:
Originally Posted by LoRd_MuldeR
Regarding link-time optimization (LTO) see:
https://en.wikipedia.org/wiki/Interp...l_optimization

Regarding the "-ffast-math" option:
This enables some "unsafe" optimizations for floating-point math. Specifically, it drops compliance with the IEEE or ISO rules/specifications in order to yield even faster code. This may break some applications!
So, generally speaking, do not use them.
28th August 2021, 21:16   #13
LoRd_MuldeR
Software Developer
 
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by Jamaika
So, generally speaking, do not use them.
Well, LTO may give a nice speed-up for some applications, but may not have any noteworthy effect for others. You really have to test it out. I think it's something worth trying.

But option "-ffast-math" is something you should use with care. Even though it may give an extra speed-up, if you do not exactly understand the consequences (for your particular application), then better not use it.

Last edited by LoRd_MuldeR; 28th August 2021 at 21:32.
29th August 2021, 13:00   #14
Jamaika
Registered User
 
Join Date: Jul 2015
Posts: 705
Searching for a bug.
I read up a bit. For libjpeg-turbo, the CMake build recommends debugging information "-g". In nasm, the equivalent is also just "-g".
I used "-g0". My mistake:
"Level 0 produces no debug information at all. Thus, -g0 negates -g." The default level is 2.
Then I ran a test with -ggdb3. The assembler test for x265 failed on SSE2 or AVX2.
For svt-av1, the CMake build recommends "-gdwarf", an option for Unix. The problem is that "-gdwarf" in gcc defaults to -gdwarf32, which goes with nasm's elf32/elf64 output formats.
Trying the "-gdwarf" option did not work either.

For me, it's a big deal to combine nasm with gcc 12.0.0. The suggestion to try permutations of options feels futile. Nasm doesn't have a lot of options. The programs either work or they don't.

libjpeg-turbo
Code:
nasm_2.15.05.exe -f win64 -g -F cv8 -O3 -DWIN64=1 -D__x86_64__=1 -DPIC=1 %%f -Xgnu -o %%~nf.o
gcc_12.0.0.exe -std=gnu11 -g -O3 -fPIC -DWITH_SIMD=1 -DUSE_WINDOWS_MESSAGEBOX=1 -DBITS_IN_JSAMPLE=8 -DINLINE="inline __attribute__((always_inline))" -DLOCAL(type)="static type" -c %%f -o %%~nf.o
dav1d
Code:
nasm_2.15.05.exe -f win64 -g -F cv8 -O3 -DWIN64=1 -DARCH_X86_64=1 -DPIC=1 -Dprivate_prefix=dav1d -Xgnu %%f -o %%~nf.o
gcc_12.0.0.exe -std=gnu11 -g -O3 -fPIC -DARCH_X86_64=1 -DHAVE_ASM=1 -D__USE_MINGW_ANSI_STDIO=1 -DUNICODE=1 -D_UNICODE=1 -DCONFIG_8BPC -DCONFIG_16BPC -DBITDEPTH=8 -DHAVE_ALIGNED_MALLOC=1 -c %%f -o %%~nf.o
svt-av1 ???
Code:
nasm_2.15.05.exe -f win64 -g -F cv8 -O3 -DWIN64=1 -DARCH_X86_64=1 -DPIC=1 -Xgnu %%f -o %%~nf.o
gcc_12.0.0.exe -std=gnu11 -g -O3 -fPIC -DARCH_X86_64=1 -c %%f -o %%~nf.o
x265
Code:
nasm_2.15.05.exe -fwin32 -g -F cv8 -O3 -DX265_ARCH_X86=1 -DARCH_X86_64=0 -DBIT_DEPTH=8 -DHIGH_BIT_DEPTH=0 -DX265_NS=x265 -DPIC=1 -DSUFFIX=o -Xgnu %%f -o %%~nf.o
g++_12.0.0.exe -std=gnu++11 -g -O3 -fPIC -DX86_64=0 -DWINVER=0x0602 -D_WIN32_WINNT=0x0602 -DEXPORT_C_API=1 -DX265_NS=x265 -DLINKED_10BIT=1 -DLINKED_12BIT=1 -DX265_VERSION=3.5+13 -DHIGH_BIT_DEPTH=0 -DX265_DEPTH=8 -DENABLE_HDR10_PLUS=1 -DHAVE_STRTOK_R -DENABLE_ASSEMBLY=1 -c %%f -o %%~nf.o

Last edited by Jamaika; 31st August 2021 at 06:27.
31st August 2021, 06:01   #15
Jamaika
Registered User
 
Join Date: Jul 2015
Posts: 705
Building libheif with assembler. Assume that we do not use CMake, but want to test new functions.
After hours of guessing, I find the libjpeg-turbo, dav1d and x265 assembler code a pain to compile.
Problems:
Very large gcc output file sizes with only the {-g -O3} options. The user can only use -fPIC under 64-bit if the assembly files contain the appropriate code. The other options {-ffast-math -flto -ftree-vectorize} are left out, because otherwise there are differences between the nasm and gcc object files and the codec does not work properly.
x265 works for me only in 32-bit, when I turn off the AVX2 and AVX3 code, which only works in 64-bit. When I use gcc for C++ on x86_64 in 32-bit mode {-m32}, gcc doesn't allow this to work with the additional {mingw.thread} library. There are bugs. Moreover, the x265 assembler code has bugs and doesn't work in 64-bit even after corrections, as its messages show.
libjpeg and dav1d work in both 32-bit and 64-bit x86_64. The user should use the {-mavx2} option, because the assembler code covers MMX, SSE2 and AVX2. However, the user may not accept {-mavx2}, and then it runs only with SSE2.

So, for image codecs, the user should not use assembler.

In libavif it should be easier: there is no C++ and no x265. AV1 doesn't have 64-bit assembler yet.
https://www.sendspace.com/file/qmalk7

Last edited by Jamaika; 31st August 2021 at 06:36.
31st August 2021, 07:41   #16
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,346
All these projects already come with build systems that compile the C code and the assembly files for you. Why are you manually trying to compile them when someone already took all the effort to make it easy to compile with just a few commands?
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
31st August 2021, 08:25   #17
Jamaika
Registered User
 
Join Date: Jul 2015
Posts: 705
Why shouldn't I? Is asking prohibited?
Secondly, I wanted to investigate the latest dav1d assembler fixes that are not included in libheif. I know this upsets some people.
https://github.com/videolan/dav1d

"When I use gcc for C++ on x86_64 in 32-bit mode {-m32}, gcc doesn't allow this to work with the additional {mingw.thread} library." My mistake. The downloaded 64-bit gcc did not contain the 32-bit files.
1st September 2021, 07:57   #18
Jamaika
Registered User
 
Join Date: Jul 2015
Posts: 705
Assembler in GCC for AVX/AVX2/AVX3 processors. I have an old computer, so this topic didn't interest me. However, this change awaits me.
Amateur observations: how is it that with AVX2, files in C11 can be opened from C++11, but not C++11 files from C11? How can I observe this? Help options aren't displayed in C++11. What am I doing wrong? I read on forums that combining C11 and C++11 is troublesome. Use "-march=native"? It doesn't work for me.
I thought I had an old computer that can't run the next-generation AVX2 codecs. Today I compiled libavif in C11, and the codecs work with SSE2. So: does the user need separate builds for SSE2 and AVX2? What should I add to C++11 to make it work?
Despite the effort, I was unable to run libheif or libwebp2 in C++11 with AVX2.

Examples:
https://www.sendspace.com/filegroup/...g60Ouqaacpaf5Q

Last edited by Jamaika; 1st September 2021 at 09:19.
All times are GMT +1.