x264, core2 duo, mingw and sse2


morph166955
5th February 2007, 05:21
For those who didn't see my post a few weeks ago, here was my original problem. I was getting a bad instruction error when using x264 with mencoder within the first few frames of the file (generally fewer than 7 frames). This was really annoying, since it meant that I couldn't use x264 with mencoder (my personal preference in encoders). I surmised that it was a problem with the way either x264 or mencoder was compiled, so I finally took the time to do a whole lot of encodes with different options, noticing that some worked and some didn't.

I found two fixes and I'm not sure which is better, so that's why I'm posting this here. Ultimately the problem lies in the way x264 does its SSE2 assembly optimizations. After searching on here I found a similar issue seen on the newer Macs (the ones with the new Intel Core and Core 2 chips) dealing with common/i386/i386inc.asm and the following code (note: this is the WORKING version of it, basically all commented out):


; Name of the .rodata section. On OS X we cannot use .rodata because NASM
; is unable to compute address offsets outside of .text so we use the .text
; section instead until NASM is fixed.
%macro SECTION_RODATA 0
; %ifidn __OUTPUT_FORMAT__,macho
; SECTION .text align=16
; fakegot:
; %else
; SECTION .rodata data align=16
; %endif
%endmacro


Now I'm not entirely sure what this does, but it seems to work when I compile everything normally with only this patch applied to the source. It should be noted that x264 does compile successfully and none of the nasm runs make any complaints; it only fails upon usage.

The second fix is removing -DHAVE_SSE2 from the gcc options in the config.mak file before compiling. Both fixes have generated almost identical frame rates with mencoder, so I'd assume that they both do roughly the same thing.

So the question is: which is better, and why is this happening in the first place? Without one of these fixes, the only x264 encode I can do on my shiny new Core 2 Duo laptop is a first pass run. Neither a one-pass CRF nor a second pass run works unless one of the two is applied. I have tried multiple versions of mencoder from all different sources (including Sharktooth's builds, both the CPU detection one and his Core 2 build, as well as the other generic vanilla builds) and all of them fail. I have to assume that a few other people on here have Core 2s or Cores, and unless they aren't speaking up, they aren't having issues, so why am I? My laptop is a Lenovo X60 with a Core 2 Duo 2.0 GHz, 1.5 GB RAM, 160 GB HDD, and all the other generic things in the system.

Thanks in advance!

akupenguin
5th February 2007, 07:27
Removing -DHAVE_SSE2 disables SSE2 (did you expect it to do something else?). This doesn't really matter on AMD CPUs, but should be a measurable speed penalty on Core 2 (17% slower here).
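
For readers unfamiliar with how a define like this works, here is a minimal C sketch of a compile-time switch gating an optimized code path. It is not x264's actual code: the names sad_fn, sad_c, sad_sse2 and pick_sad are hypothetical, and the "SSE2" routine is only a stand-in that calls the plain C version. The point is just that without -DHAVE_SSE2 the optimized routine is never compiled in, so the program always takes the plain C path.

```c
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this is not x264's real dispatch code. */
typedef int (*sad_fn)(const uint8_t *a, const uint8_t *b, size_t n);

/* Plain C reference routine: sum of absolute differences. */
int sad_c(const uint8_t *a, const uint8_t *b, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
    return sum;
}

#ifdef HAVE_SSE2
/* Stand-in for a hand-written SSE2 routine. A real build would link an
 * assembly implementation here; this placeholder just reuses the C code. */
int sad_sse2(const uint8_t *a, const uint8_t *b, size_t n)
{
    return sad_c(a, b, n);
}
#endif

/* Choose an implementation. When HAVE_SSE2 is not defined at compile time,
 * the SSE2 branch does not exist and the C routine is always returned,
 * whatever the CPU supports. */
sad_fn pick_sad(int cpu_has_sse2)
{
#ifdef HAVE_SSE2
    if (cpu_has_sse2)
        return sad_sse2;
#else
    (void)cpu_has_sse2; /* unused in this configuration */
#endif
    return sad_c;
}
```

Built with plain gcc, pick_sad(1) hands back sad_c; built with gcc -DHAVE_SSE2, it hands back sad_sse2. Both give the same result, only the speed would differ in a real build.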

If there's any difference between 1st pass and 2nd pass or 1pass, it can only mean that the crash is specific to some features you disabled for fast-1st-pass.

morph166955
5th February 2007, 14:45
Yeah, removing -DHAVE_SSE2 is obviously disabling that; I can only assume that the other change disables the same thing (or more?). I'd love to know why this is happening and how to fix it (even if it's an unofficial patch that I can apply myself) so that I can get that speed back. I really want to see what this thing can do, and so far I've been very limited in testing because of all this.