Trying to compile x264 for legacy x86 CPUs (Win32) *solved*
GrandAdmiralThrawn
25th November 2011, 21:40
Hello!
I am currently trying to recompile x264 on Windows using Komisar's MinGW/MSYS and libpack. My goal is to get rid of all modern x86 instruction set dependencies, like SSE.
I have already compiled x264 on MIPSEL, so I know it should definitely be possible.
So my own x264 build works on modern systems running XP/Vista/7 when sitting on modern CPUs (>= Pentium III). However, I am currently desperately trying to get it to work on an ancient Quad Pentium PRO machine. Pentium PROs are P6 architecture, just like Pentium II/III, however they lack MMX/SSE.
x264 seemingly starts to work, then terminates without any error message.
My question would be: Where do I have to look? Maybe the libav libraries from Komisar's libpack are the problem? I am linking them in statically, but maybe they depend on SSE (I am using an H.264/AVC elementary stream for input, so I kind of depend on libav/lavf).
If it's not lavf's fault, what else could be amiss? I have no error output, so this is kind of hard to diagnose.
My CFLAGS/CXXFLAGS are: "-O3 -march=pentiumpro -mtune=pentiumpro -mfpmath=387 -mno-sse -ffast-math -static". I have also told the configure script: "--disable-asm". YASM is not building any assembler codepaths and GCC is not passing any options to generate more modern instructions than pure i686/387.
The build works on modern systems, where x264 reports that it is using no CPU capabilities.
Thanks!
LoRd_MuldeR
25th November 2011, 21:55
If you link in any pre-compiled libraries (such as pthreads, libavcodec, ffms2, etc.), then these of course must have been compiled with settings suitable for the target CPU too!
This also applies to static libraries, because static libraries simply are archives that contain object files. And object files contain compiled machine code - compiled with whatever settings were used to create the static library.
The only exception is when "link-time code generation" (LTCG) is used, in which case the object files in the static library contain a pre-compiled byte-code that is compiled to machine code in the final linking phase.
So if you compile an executable with compiler settings that make it compatible with the target CPU, but then link in some libraries that had been compiled with incompatible settings, the final output is likely to fail on that CPU :eek:
In other words: CFLAGS apply at compile-time, not at link-time (except for LTCG). For static libraries the CFLAGS apply when the object files in that library are created. Not when the library is linked into some binary.
Consequently, the CFLAGS that are used to compile the "main" object files of the binary (i.e. the CFLAGS that are chosen by you) do NOT affect the already-compiled object files contained in some third-party static library.
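One rough way to check whether a pre-compiled static library already contains SIMD code is to disassemble it and look for MMX/SSE register names. The following is only a sketch under assumptions: it expects the binutils `objdump` tool on the PATH, the function names are made up for illustration, and the register match is a heuristic (it will miss SIMD-era instructions that take no vector register operand).

```python
import re
import subprocess
import sys
from collections import Counter

# Register names that only show up in MMX (mmN), SSE (xmmN) or AVX (ymmN) code.
SIMD_REGISTER = re.compile(r"%?\b(xmm|ymm|mm)\d+\b")


def count_simd_lines(disassembly):
    """Count disassembly lines per SIMD register family (first match per line)."""
    counts = Counter()
    for line in disassembly.splitlines():
        match = SIMD_REGISTER.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def disassemble(path):
    """Return the objdump disassembly of an object file or .a archive (assumes objdump exists)."""
    result = subprocess.run(
        ["objdump", "-d", path], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (library path is whatever you pass in): python scan_simd.py libavcodec.a
    for library in sys.argv[1:]:
        print(library, dict(count_simd_lines(disassemble(library))))
```

A non-zero count does not prove that the code path is actually executed on the target CPU, since libraries may select SIMD routines at runtime, but a library with zero hits is at least not the obvious culprit.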
GrandAdmiralThrawn
25th November 2011, 22:11
Thanks for your detailed explanation! So maybe I was right to suspect the precompiled libav.. I will try to recompile the entire libav from source in MinGW/MSYS and link x264 against it, in the hope that all the other libs are simple i386-i686 code.
Let's see if that works...
LoRd_MuldeR
25th November 2011, 22:30
I think you have to take care of all libraries you link in. Even the compiler's own libraries, such as the runtime and standard libs. And also take care of the pthreads library.
(I would expect the compiler's libraries to have been compiled with "maximum compatibility" settings. But there's no guarantee for that, depending on whose GCC/MinGW build you use)
GrandAdmiralThrawn
25th November 2011, 23:08
If I really need to recompile everything, I will probably have to give up anyway; I can't handle that. Too complicated. I just pray that it's only libav.
Currently I'm struggling to get the configure script of x264 to even find and link against libav... I guess I installed into the correct prefix, but "lavf.... no".
I have a feeling this stuff is never going to work...
LoRd_MuldeR
25th November 2011, 23:19
Is it actually worth compiling x264 for CPUs that don't meet the minimum requirements to compile x264 with ASM enabled? :confused:
Not only will x264 be extraordinarily slow without all the insane ASM optimization (even on CPUs that would be able to run the ASM code), but CPUs that don't support x264 with ASM are also old and thus too slow for video encoding anyway!
(BTW: I usually compile x264 with Komisar's libav libraries. Adding a "-L<path_to_lib_files>" to the LDFLAGS and "-I<path_to_header_files>" to the CFLAGS is sufficient to make x264's configure script recognize the libs)
GrandAdmiralThrawn
25th November 2011, 23:23
You're right. But the intended purpose is actually not real encoding, but benchmarking. I am hosting a custom x264 benchmark list for a forum that I am active in, and we already have some crazy stuff like Transmeta, little-endian MIPS (MIPSEL) and PowerPC CPUs in the mix.
So the idea is to extend the range of testable CPUs to lower-end x86 running Windows (on Linux it's comparatively easy to build libav+x264 without any asm).
Seems to be seriously hard though.
Edit: I am also using Komisar's stuff. But there seems to be SSE code in his libpack (maybe also in other parts of his prebuilt toolchain, who knows).
Edit 2: Strange. x264 seems to link against Komisar's libpack just fine. But as soon as I replace the libraries with my own, x264's configure says "lavf... no". No idea why it won't link against my self-built libav.
LoRd_MuldeR
25th November 2011, 23:27
About the benchmark: I hope your plan is to use the build that was compiled without ASM only on those CPUs that can't run x264 otherwise.
If instead you plan to use one build on all CPUs (and make that build as compatible as possible), then your results will be seriously crippled and unfair on all half-way up-to-date processors!
Even if done properly, running the benchmark on those "old" CPUs that can't run x264 normally is kind of pointless. We know beforehand their score will be beyond good and evil...
GrandAdmiralThrawn
25th November 2011, 23:38
You are correct again. This is why there are certain flags to be set in my list of results. One of those flags is "Custom Build", which means that a binary different from the "proper" binary was used to generate that specific result.
So, results generated by a version which was compiled without any SSE support would be flagged as such. All modern CPUs are tested with full support for SSE, SSE2, SSSE3, SSE4 etc.
This is how results from Linux on PowerPC or MIPS were flagged also, because naturally, those CPUs don't have SSE (even though x264 can make use of AltiVec on PPC).
So, no worries about that. I would just love to get it to work. :)
Edit: In case you'd like to take a look, here is the English version of that list: x264 Benchmark list (http://www.xin.at/x264/index-en.php).
Edit 2: I managed to get it done using a full Cygwin environment for compilation. libav decoding works fine, and speed seems relatively ok on a Pentium PRO quad machine.
GrandAdmiralThrawn
4th February 2012, 09:12
Update: I have found out that while the Cygwin environment has no SSE-dependent libraries, the libs are still i686. So I got this to run on a Pentium PRO processor (Intel's first 686) and on an AMD Athlon, a K7 Thunderbird. But when you go for "almost-686" like the AMD K6 or pure 586 like the Pentium MMX etc., you're out of luck. Since there is obviously no i586/i486/i386 Cygwin or MinGW/MSYS, it has to be Linux for anything older.
Here is my current record holder in massive slowness: an Intel i486 DX4-S at 100 MHz running on an ASUS PCI/I-486SP3G with 128 MB of FPM DRAM (of which at most 64 MB are cacheable). The OS is Debian Lenny. For swap space, there is a fast Adaptec AHA-19160 on PCI 2.1 with a 10,000 rpm Seagate Cheetah. The image below is updated every 10 minutes. The run has been going for 2 months now and will probably continue for another 7:
http://www.xin.at/xin/snapshot.jpg
It's running a 2-pass encode with 15,691 frames per pass, and this is currently pass 1. The real average is currently 0.00129 fps. The machine is in time zone UTC+1.
I doubt anyone has ever run x264 on a CPU older or at least slower than this, haha. ;)
Dark Shikari
4th February 2012, 16:18
It's probably not fair to run x264 out of swap space; that's too crippling! Probably best to pick settings (or an input resolution) that lets it fit in RAM.
LoRd_MuldeR
5th February 2012, 02:34
The i486 DX4-S is done. Now the next challenge for you:
http://img845.imageshack.us/img845/6061/x26464.png
GrandAdmiralThrawn
5th February 2012, 08:17
HAHA, I actually have one here that still works. But I seriously doubt that it can complete this encoding run in a man's lifetime. :D
@Dark Shikari: True. But I don't hear the hard drive doing too much work. Now and then it loads a little bit. Bandwidth-wise the SCSI system is probably not that much slower than the main RAM.
I tried to give the 486 as much RAM as possible, but the board won't take more than 128MB, which is pretty crazy already for a 486. And I can't alter the test anymore, otherwise it wouldn't be comparable any longer.
So I just gave it the fastest swap space I could come up with: U160 SCSI.
Dark Shikari
5th February 2012, 11:50
Bandwidth isn't why swap is slow; latency is. x264's access patterns are not designed for a system with a 10ms memory access latency, especially one without a modern hardware prefetcher.
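To get a feel for why latency dominates, here is a small sketch of the usual average-access-time estimate: a weighted mix of RAM latency and swap latency. The 10 ms swap figure is the one mentioned above; the 100 ns RAM latency and the sample fault rates are pure assumptions chosen for illustration, not measurements from the 486.

```python
def effective_access_time(fault_rate, ram_latency_s, swap_latency_s):
    """Average time per memory access when a fraction of accesses hits swap."""
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be between 0 and 1")
    return (1.0 - fault_rate) * ram_latency_s + fault_rate * swap_latency_s


RAM_LATENCY_S = 100e-9   # assumption: rough order of magnitude for DRAM
SWAP_LATENCY_S = 10e-3   # the 10 ms figure from the post above

if __name__ == "__main__":
    for rate in (0.0, 1e-6, 1e-4, 1e-2):  # hypothetical fault rates
        t = effective_access_time(rate, RAM_LATENCY_S, SWAP_LATENCY_S)
        print(f"fault rate {rate:g}: {t * 1e6:.2f} us average, "
              f"{t / RAM_LATENCY_S:.1f}x plain RAM")
```

Under these assumptions, even a tiny fraction of accesses going to swap multiplies the average access time, regardless of how much sequential bandwidth the disk has.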
GrandAdmiralThrawn
5th February 2012, 18:41
Yes, of course, access time. I knew this would be brought up as soon as I said "bandwidth". ;)
So, what I'm seeing here is this: The drive stays idle for a few minutes, then it becomes quite active for around 5-10 seconds, then it goes back to almost idle. During all that time, CPU load stays at 98-99%. It's just that during that short period of more intense activity, kswapd will consume more of the CPU, like 20-30% instead of near-zero.
I have the impression that most of the time, the CPU is working with data in RAM... Of course my observation is very amateurish, but at least it seems like it's not totally bogging x264 down.
Dark Shikari
5th February 2012, 19:44
In that case it's probably fine. As long as x264 can fit the working set for a single thread into RAM, things will probably be okay.
Bloax
5th February 2012, 19:54
If the first pass has taken two months already, I doubt the second one will finish this year. :P
Anyways, insane project, no less. (Also shows how much better modern hardware is!)
GrandAdmiralThrawn
5th February 2012, 21:29
Hmm, usually Pass 2 runs about as long as Pass 1. (I am doing pass 1 with --slow-firstpass). Currently, pass 1 isn't even half way done. ;) The entire runtime (pass 1 + 2) is currently estimated at 280 days approximately. To compare that to more modern systems, just [see the results list (http://www.xin.at/x264/index-en.php)].
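For reference, the runtime estimate can be reproduced from the numbers posted earlier in the thread. This is only a sketch: it assumes the 0.00129 fps average quoted for pass 1 still holds and that pass 2 runs at the same speed as pass 1.

```python
FRAMES_PER_PASS = 15691   # frame count per pass, from the earlier post
PASSES = 2
AVERAGE_FPS = 0.00129     # average quoted for pass 1; assumed to hold for both passes
SECONDS_PER_DAY = 86400


def estimated_runtime_days(frames_per_pass, passes, fps):
    """Total encode time in days for a multi-pass run at a constant frame rate."""
    total_frames = frames_per_pass * passes
    return total_frames / fps / SECONDS_PER_DAY


if __name__ == "__main__":
    days = estimated_runtime_days(FRAMES_PER_PASS, PASSES, AVERAGE_FPS)
    print(f"estimated total runtime: {days:.1f} days")
```

With these inputs the result lands very close to the roughly 280 days mentioned above.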
Actually it seems that what boosts x86 the most are the assembler code paths for SSEx on Pentium III and newer CPUs...
Dark Shikari
5th February 2012, 21:51
Technically, MMX alone would be very helpful, but due to general laziness and utter lack of reason to care about CPUs that are 15 years old, x264 simply doesn't support asm on anything without MMXEXT (aka iSSE), which was introduced on the Pentium 3.
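On a Linux box, a quick way to see whether a CPU clears that bar is to look at the feature flags in /proc/cpuinfo. The sketch below is an assumption-laden illustration, not x264's own detection logic: it relies on the x86 Linux flag names "mmxext" (reported by AMD CPUs) and "sse" (whose integer instructions cover the MMX extensions), and the helper names are made up.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU listed in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_mmxext(flags):
    """True if the flags indicate MMXEXT, either directly or implied by SSE."""
    return "mmxext" in flags or "sse" in flags


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            flags = cpu_flags(handle.read())
    except OSError:
        flags = set()  # not Linux, or /proc is unavailable
    print("MMX:", "mmx" in flags, "| MMXEXT or SSE:", has_mmxext(flags))
```

A Pentium PRO or a 486 would report neither flag, which matches the situation described in this thread.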
Also, the instructions in your benchmark post are incorrect; -O3 does not make x264 "use more memory", it makes the compiler generate larger code. Code size is insignificant compared to allocated memory, and most CPUs in the past 20 years have similar size code caches.
GrandAdmiralThrawn
6th February 2012, 14:35
I can fully understand that there is not much incentive or point in optimizing x264 for ancient or rare processors. I would have loved to see optimizations for the SIMD unit of my ICT Loongson-2f processor (MIPSEL), or for HP ALPHA EV7 processors, but those are so rare, it probably doesn't make any sense to work hard on those things. And who would really use a 486 for this.. or even a Pentium 3? ;)
And:
Thank you for the information regarding the -O3 switch! I guess you mean my post at voodooalert.de, I'll correct the false information there asap.
Bloax
6th February 2012, 17:43
Next challenge: Decode that file. :P
GrandAdmiralThrawn
6th February 2012, 21:07
Funny thing is: I have already transcoded that on several machines which are completely incapable of actually DEcoding the video stream properly. :p
(Yes, it makes no practical sense at all, but I do not care! ;))
GrandAdmiralThrawn
5th September 2012, 08:46
To whoever may be interested: The 486 DX4-S/100 (presumably now the oldest x86 to ever run x264) has finished its task of transcoding Elephants Dream.
http://www.xin.at/xin/snapshot.jpg
The result is as abysmal as expected: in HHHH:MM:SS.sss notation, it took the machine 6261:29:58.872 (over six thousand hours) to do the job. In calendar terms, the benchmark ran for 8 months, 16 days, 21 hours, 29 minutes and ~59 seconds. That's actually a few hours off because of the machine's RTC clock skew, but the deviation is not too significant; the clock was about 2 hours ahead when the test ended.
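The final time can be turned back into an overall frame rate. A minimal sketch, assuming the 15,691 frames per pass and the two passes mentioned earlier in the thread; the function name is made up for illustration.

```python
FRAMES_PER_PASS = 15691   # from the earlier post
PASSES = 2
SECONDS_PER_DAY = 86400


def parse_duration(text):
    """Convert an HHHH:MM:SS.sss string into seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


if __name__ == "__main__":
    total_seconds = parse_duration("6261:29:58.872")
    total_frames = FRAMES_PER_PASS * PASSES
    print(f"runtime: {total_seconds / SECONDS_PER_DAY:.2f} days")
    print(f"average over both passes: {total_frames / total_seconds:.5f} fps")
```

The overall average comes out slightly higher than the 0.00129 fps seen early in pass 1.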
To compare this result to other machines, please see the [results list (http://www.xin.at/x264/index-en.php)] again.
It was definitely a fun project to make Debian Linux run on the machine and get that x264 transcode done. Quite something. :rolleyes: