x264 "--asm": Forcing certain CPU instruction set extensions


GrandAdmiralThrawn
14th February 2014, 16:00
Greetings!

For testing purposes, I would like to be able to force x264 to not use certain instruction set extensions that my CPU provides. For that, x264 provides the --asm <integer> parameter at runtime. However, I can't find any documentation that describes how combinations of CPU features are encoded in the integer passed to --asm!

I tried to get that info from the source code, but failed. :( So far I have tried values blindly, and mostly just made a mess, like trying to feed my Core i7 some Altivec and NEON (at the same time!! lol!).

I would like to be able to use this for VM vs. native comparisons, where the VM hypervisor does not pass through all my instruction sets.

For instance, my CPU natively has SSE+SSE2+SSE3+SSSE3+SSE4.1+SSE4.2.

The VirtualBox hypervisor stops at SSSE3, and does not pass through SSE4.1 and SSE4.2 to the guest operating system.

Now, I would like to emulate this behavior on my host operating system or other VMs using the --asm parameter.

(All of this is for quick, "more accurate than not" operating system comparisons.)

Where can I find documentation for --asm?

Thanks a lot!

And my apologies if this is the wrong subforum. Not sure whether "hey, I looked at the source code" really qualifies. ;)

LoRd_MuldeR
14th February 2014, 17:13
It's probably just the x264 CPU flags that you can find in "x264.h":
/* x86 */
#define X264_CPU_CMOV 0x0000001
#define X264_CPU_MMX 0x0000002
#define X264_CPU_MMX2 0x0000004 /* MMX2 aka MMXEXT aka ISSE */
#define X264_CPU_MMXEXT X264_CPU_MMX2
#define X264_CPU_SSE 0x0000008
#define X264_CPU_SSE2 0x0000010
#define X264_CPU_SSE3 0x0000020
#define X264_CPU_SSSE3 0x0000040
#define X264_CPU_SSE4 0x0000080 /* SSE4.1 */
#define X264_CPU_SSE42 0x0000100 /* SSE4.2 */
#define X264_CPU_LZCNT 0x0000200 /* Phenom support for "leading zero count" instruction. */
#define X264_CPU_AVX 0x0000400 /* AVX support: requires OS support even if YMM registers aren't used. */
#define X264_CPU_XOP 0x0000800 /* AMD XOP */
#define X264_CPU_FMA4 0x0001000 /* AMD FMA4 */
#define X264_CPU_AVX2 0x0002000 /* AVX2 */
#define X264_CPU_FMA3 0x0004000 /* Intel FMA3 */
#define X264_CPU_BMI1 0x0008000 /* BMI1 */
#define X264_CPU_BMI2 0x0010000 /* BMI2 */
/* x86 modifiers */
#define X264_CPU_CACHELINE_32 0x0020000 /* avoid memory loads that span the border between two cachelines */
#define X264_CPU_CACHELINE_64 0x0040000 /* 32/64 is the size of a cacheline in bytes */
#define X264_CPU_SSE2_IS_SLOW 0x0080000 /* avoid most SSE2 functions on Athlon64 */
#define X264_CPU_SSE2_IS_FAST 0x0100000 /* a few functions are only faster on Core2 and Phenom */
#define X264_CPU_SLOW_SHUFFLE 0x0200000 /* The Conroe has a slow shuffle unit (relative to overall SSE performance) */
#define X264_CPU_STACK_MOD4 0x0400000 /* if stack is only mod4 and not mod16 */
#define X264_CPU_SLOW_CTZ 0x0800000 /* BSR/BSF x86 instructions are really slow on some CPUs */
#define X264_CPU_SLOW_ATOM 0x1000000 /* The Atom is terrible: slow SSE unaligned loads, slow
* SIMD multiplies, slow SIMD variable shifts, slow pshufb,
* cacheline split penalties -- gather everything here that
* isn't shared by other CPUs to avoid making half a dozen
* new SLOW flags. */
#define X264_CPU_SLOW_PSHUFB 0x2000000 /* such as on the Intel Atom */
#define X264_CPU_SLOW_PALIGNR 0x4000000 /* such as on the AMD Bobcat */

Note that those are given in hexadecimal notation.

And if you want to combine multiple flags, you'll have to do a bitwise OR, and you will probably need to pass the result as a decimal value on the command line.

For example, to use CMOV, MMX, MMX2 and SSE we get:
0x0000001|0x0000002|0x0000004|0x0000008 = 0x0000000F = 15
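
The OR arithmetic above can be sketched as a small Python helper. This is only an illustration: the flag values are copied from the x264.h excerpt in this post, the dictionary and function names are made up for the sketch, and the resulting numbers only mean anything for an x264 binary built from that same header. The second mask covers everything up to SSSE3, which is the cut-off the original question asks about.

```python
# Flag values copied from the x264.h excerpt quoted above (x86 section).
# Assumption: the x264 binary was built from this exact header version.
X86_FLAGS = {
    "CMOV": 0x0000001,
    "MMX": 0x0000002,
    "MMX2": 0x0000004,
    "SSE": 0x0000008,
    "SSE2": 0x0000010,
    "SSE3": 0x0000020,
    "SSSE3": 0x0000040,
    "SSE4": 0x0000080,   # SSE4.1
    "SSE42": 0x0000100,  # SSE4.2
}


def asm_mask(names):
    """Bitwise-OR the bits of the given flag names into one integer."""
    mask = 0
    for name in names:
        mask |= X86_FLAGS[name]
    return mask


# The example from the post: CMOV, MMX, MMX2 and SSE.
basic = asm_mask(["CMOV", "MMX", "MMX2", "SSE"])

# Everything up to and including SSSE3, leaving out SSE4.1 and SSE4.2.
upto_ssse3 = asm_mask(["CMOV", "MMX", "MMX2", "SSE", "SSE2", "SSE3", "SSSE3"])

print(f"--asm {basic}")       # decimal form of 0xF
print(f"--asm {upto_ssse3}")  # decimal form of 0x7F
```

Since every flag is a distinct bit, adding distinct flags gives the same number as OR; OR simply stays correct if a flag is listed twice.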

Dark Shikari
14th February 2014, 17:24
Just list the flags with commas:

--asm mmx2,sse2,sse2fast,ssse3,sse42,avx for example.

GrandAdmiralThrawn
14th February 2014, 17:56
I did remember x264.h (because I requested a list of all assembler code paths here way back :) ), and I did attempt to convert the hex values to decimal and then just add them up. That was when I got my i7 to try and run Altivec and NEON code. ;) So that didn't work. I hadn't thought of ORing the values, but even that doesn't seem to work: I got another round of Altivec thrown into the mix with "15".

Quote (Dark Shikari):
Just list the flags with commas:

--asm mmx2,sse2,sse2fast,ssse3,sse42,avx for example.

THAT however, DID work!
:thanks:

Maybe the output of x264 --fullhelp should be updated to reflect that, because it specifically says "--asm <integer>"! And it does seem to interpret arbitrary integer values somehow. But a list of strings is just so much easier! :)

Thanks for the help!

LoRd_MuldeR
14th February 2014, 18:50
Calculating the value according to the flags in "x264.h" works for me. Using "--asm 15" gives this, as expected:
x264 [info]: using cpu capabilities: MMX2 SSE

Anyway, just passing a comma-separated list of flag names is much more convenient, of course ;)

About the help: I don't think this is an option regular users should ever need to mess with. x264 uses runtime CPU detection and will only enable optimized ASM code for instructions that the CPU actually supports.

Now, if VirtualBox doesn't support certain CPU instructions (although the host CPU supports them), but fails to emulate CPUID accordingly (i.e. still reports them as available), then that's not x264's fault, but a bug in VirtualBox!

GrandAdmiralThrawn
14th February 2014, 21:32
I'm not sure what the code does exactly, but for me "--asm 15" gives me "Altivec Cache32 Cache64"?!?

And those virtualization things, well.. it's a "bug" in many hypervisors unfortunately. :(

LoRd_MuldeR
14th February 2014, 21:42
Quote (GrandAdmiralThrawn):
I'm not sure what the code does exactly, but for me "--asm 15" gives me "Altivec Cache32 Cache64"?!?

Well, "AltiVec" is a PowerPC extension, so you must be running on PowerPC. The same flag values have different meanings on different architectures. I am running on x86 ;)

#define X264_CPU_CMOV 0x0000001
#define X264_CPU_MMX 0x0000002
#define X264_CPU_MMX2 0x0000004 /* MMX2 aka MMXEXT aka ISSE */

/* PowerPC */
#define X264_CPU_ALTIVEC 0x0000001

/* ARM */
#define X264_CPU_ARMV6 0x0000001
#define X264_CPU_NEON 0x0000002 /* ARM NEON */
#define X264_CPU_FAST_NEON_MRC 0x0000004 /* Transfer from NEON to ARM register is fast (Cortex-A9) */
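
To make the "same value, different meaning" point concrete, here is a rough Python sketch that decodes the value 15 against each of the per-architecture tables. The tables contain only the defines quoted in this thread (SSE = 0x8 is taken from the longer x86 excerpt earlier on); the names ARCH_FLAGS and decode are invented for the sketch. "Leftover" bits are simply bits with no entry in these excerpts, which does not prove they are undefined in the full header.

```python
# Bit -> name tables, limited to the defines quoted in this thread.
ARCH_FLAGS = {
    "x86": {0x1: "CMOV", 0x2: "MMX", 0x4: "MMX2", 0x8: "SSE"},
    "PowerPC": {0x1: "ALTIVEC"},
    "ARM": {0x1: "ARMV6", 0x2: "NEON", 0x4: "FAST_NEON_MRC"},
}


def decode(mask, table):
    """Return (names of set bits found in table, bits not covered by table)."""
    names = [name for bit, name in sorted(table.items()) if mask & bit]
    known = 0
    for bit in table:
        known |= bit
    return names, mask & ~known


for arch, table in ARCH_FLAGS.items():
    names, leftover = decode(15, table)
    print(f"{arch}: {names}, bits without an entry here: {hex(leftover)}")
```

The same integer therefore names four x86 features, one PowerPC feature, or three ARM features, depending on which table the binary was compiled with.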

GrandAdmiralThrawn
14th February 2014, 21:46
While I do own a PPC machine among others (like ARMs, where the NEON instructions are), I have to say "no" here.

The machine I'm doing all of this on at the moment is an x86_64, a Core i7 980X. So it's showing "Altivec Cache32 Cache64" while running on an actual Core i7 980X CPU!

It seems some other part of the code is blocking the actual Altivec code from running, otherwise I would be getting some "illegal instruction" error?

LoRd_MuldeR
14th February 2014, 23:00
x264 does not even have the string "Altivec" compiled in, unless built for PowerPC :confused:

const x264_cpu_name_t x264_cpu_names[] =
{
#if HAVE_MMX
[...]
#elif ARCH_PPC
{"Altivec", X264_CPU_ALTIVEC},
#elif ARCH_ARM
[...]
#endif
{"", 0},
};

GrandAdmiralThrawn
15th February 2014, 11:35
I guess it's just a cosmetic thing then? I managed to get "Altivec" displayed on its own with some integer value I tried (don't remember which one). The code runs exactly as fast as it would with --no-asm. Of course it would. It either doesn't even try to run Altivec, or it would crash immediately, having called an illegal CPU instruction. Since it seems to be running pure C/C++ code, it has to be a cosmetic error.

Makes sense to not even have Altivec when built for x86_32 or x86_64. Seems what I get here is a display glitch only. Same for ARM NEON. Weird that it works fine for you though.. I will retry this on a different operating system later (I can try Linux and BSD UNIX as well as Haiku OS, right now I'm on XP x64). Not sure if my OS should have any say in this?

LoRd_MuldeR
15th February 2014, 17:41
Seems like something with your build is messed up. Did you compile that yourself or get a pre-compiled binary?

GrandAdmiralThrawn
17th February 2014, 08:35
Ok, here it comes. It's a version thing.

The one on Windows is an official 32-bit binary from x264.nl, albeit a bit older. A current, official x86_64 Windows binary (x264-r2389-956c8d8.exe) shows the behavior you described (MMX2 SSE). I also tried a not-completely-recent custom build (no fishy configure options set, just plain default, about 1 year old!) on both CentOS 6 Linux x86_64 and PC-BSD/FreeBSD 9.2 UNIX x86_64, and in all those cases --asm 15 results in:

x264 [info]: using cpu capabilities: Altivec Cache32 Cache64

uname -a with hostname cut out where it's a FQDN shows:

Linux <HOST> 2.6.32-358.18.1.el6.x86_64 #1 SMP Wed Aug 28 17:19:38 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

..and:

FreeBSD unixbox 9.2-RELEASE-p12 FreeBSD 9.2-RELEASE-p12 #0: Thu Jan 16 21:12:30 UTC 2014
root@amd64-builder.pcbsd.org:/usr/obj/usr/src/sys/GENERIC amd64

So I assume there were some changes to the --asm stuff in the last year. I usually don't update x264 that often, and wouldn't have expected that parameter to change behavior when fed integers, but that seems to be the case. In any case, the way described by Dark Shikari seems to work on all "relatively recent" versions I have lying around.

LoRd_MuldeR
17th February 2014, 17:58
Be aware that you must always look at the "x264.h" that matches the version of your x264 binary!

Applying constants from the latest "x264.h" to some old x264 binary, or the other way around, can result in unexpected behavior.

There is an X264_BUILD define, which is incremented every time the public API of x264 changes, for a reason ;)
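
One way to follow this advice is to read the values straight from the header that belongs to your binary instead of copying them from a post. Below is a minimal Python sketch, assuming the defines look like the excerpts in this thread. The regular expression, the function name and the sample text are made up for illustration, and the X264_BUILD number in the sample is a placeholder, not a value taken from this thread.

```python
import re

# Matches "#define X264_BUILD <number>" and "#define X264_CPU_<NAME> <number>".
# Alias lines such as "#define X264_CPU_MMXEXT X264_CPU_MMX2" are skipped on
# purpose, because their value is another name rather than a number.
DEFINE_RE = re.compile(
    r"^[ \t]*#define[ \t]+(X264_BUILD|X264_CPU_\w+)[ \t]+(0x[0-9A-Fa-f]+|\d+)\b",
    re.MULTILINE,
)


def parse_header(text):
    """Return (build number or None, {flag name: integer value})."""
    build = None
    flags = {}
    for name, value in DEFINE_RE.findall(text):
        number = int(value, 16) if value.lower().startswith("0x") else int(value)
        if name == "X264_BUILD":
            build = number
        else:
            flags[name] = number
    return build, flags


# Sample input modeled on the excerpts in this thread.
# The build number is a placeholder chosen for this sketch.
SAMPLE = """
#define X264_BUILD 142
#define X264_CPU_CMOV 0x0000001
#define X264_CPU_MMX 0x0000002
#define X264_CPU_MMX2 0x0000004 /* MMX2 aka MMXEXT aka ISSE */
#define X264_CPU_MMXEXT X264_CPU_MMX2
#define X264_CPU_SSE 0x0000008
"""

build, flags = parse_header(SAMPLE)
print(build, flags)
```

With a real source tree you would pass in the contents of the x264.h that matches your binary; where that file lives depends on your setup.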


This is an excerpt from "x264.h" of an older version:
/* CPU flags
*/
#define X264_CPU_CACHELINE_32 0x0000001 /* avoid memory loads that span the border between two cachelines */
#define X264_CPU_CACHELINE_64 0x0000002 /* 32/64 is the size of a cacheline in bytes */
#define X264_CPU_ALTIVEC 0x0000004
#define X264_CPU_MMX 0x0000008
#define X264_CPU_MMX2 0x0000010 /* MMX2 aka MMXEXT aka ISSE */
#define X264_CPU_MMXEXT X264_CPU_MMX2
#define X264_CPU_SSE 0x0000020
#define X264_CPU_SSE2 0x0000040

:eek: :D
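
Decoding 15 against this older layout shows where the odd output comes from. The sketch below uses only the values from the excerpt above; the names OLD_FLAGS and decode_old are invented for it.

```python
# (bit, name) pairs copied from the older x264.h excerpt above.
OLD_FLAGS = [
    (0x0000001, "CACHELINE_32"),
    (0x0000002, "CACHELINE_64"),
    (0x0000004, "ALTIVEC"),
    (0x0000008, "MMX"),
    (0x0000010, "MMX2"),
    (0x0000020, "SSE"),
    (0x0000040, "SSE2"),
]


def decode_old(mask):
    """List the flag names whose bit is set in mask, per the old layout."""
    return [name for bit, name in OLD_FLAGS if mask & bit]


print(decode_old(15))

# Going by this excerpt alone, MMX + MMX2 + SSE in the old layout would be:
old_mmx2_sse = 0x0000008 | 0x0000010 | 0x0000020
print(old_mmx2_sse)
```

Under the old layout, 15 sets the two cacheline bits, the Altivec bit and MMX, which lines up with the reported "Altivec Cache32 Cache64" apart from MMX. Why the old binary leaves MMX out of its log line is not something this excerpt explains, and the second value is untested against any actual old binary.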

GrandAdmiralThrawn
18th February 2014, 14:14
I see, I wasn't aware of that, thanks!