Old 26th March 2007, 23:39   #1  |  Link
morph166955
Registered User
 
Join Date: Mar 2006
Posts: 443
ffmpeg/libavcodec assembly DOES NOT WORK FOR CORE2!

I have now spent about a month rebuilding and reworking gcc and everything else I could think of to get mplayer/mencoder to compile libavcodec with -march/-mtune set to core2... it DOESN'T WORK. I have searched through everything I can, and I have come to the conclusion that the assembly code in libavcodec is incompatible with the Core 2's design. The Core 2 architecture uses technology that earlier processors did not have (such as 128-bit execution units, as opposed to the 64-bit ones used on all the previous processors), and I believe that is what we are seeing here.

I found a bug on gcc's bugzilla here, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11203, which directly discusses the offending file/function, and it has a post as recent as Feb 2007 (the thread was originally started in 2004). I've posted bug reports on the bugzilla for ffmpeg/mplayer, and they have been closed because they were supposedly "problems with the compiler".

Since this is the new processor architecture and mplayer/mencoder/ffmpeg is a pretty important piece of software, I come here to ask anyone who knows what they're doing in assembly to give me a hand in fixing this so that we can use the benefits of the Core 2. I have significant experience writing C/C++ programs. I have not, however, had any experience (other than the limited experience from the past month) with assembly inside of C, specifically assembly involving MMX instructions. I have written programs in raw assembly (very basic stuff, I had a class on it a few years back while getting my degree), so I understand what it's trying to do, I just am not sure how it's doing it all.

The problem lies, as best I can tell, in libavcodec/i386/dsputil_mmx.c around line 636, with the assembly in the function transpose4x4(). The failure occurs when it is called by h264_h_loop_filter_luma_mmx2 (or h264_loop_filter_luma_mmx2, depending on what optimization setting I use in gcc) in libavcodec/i386/h264dsp_mmx.c on line 449. The error that gcc spits out is:

Code:
i386/h264dsp_mmx.c: In function 'h264_h_loop_filter_luma_mmx2':
i386/dsputil_mmx.c:636: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'
i386/dsputil_mmx.c:636: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'
i386/dsputil_mmx.c:636: error: 'asm' operand has impossible constraints
i386/dsputil_mmx.c:636: error: 'asm' operand has impossible constraints
i386/dsputil_mmx.c:636: error: 'asm' operand has impossible constraints
i386/dsputil_mmx.c:636: error: 'asm' operand has impossible constraints
i386/h264dsp_mmx.c:393: error: 'asm' operand has impossible constraints
i386/dsputil_mmx.c:636: error: 'asm' operand has impossible constraints
i386/dsputil_mmx.c:636: error: 'asm' operand has impossible constraints
i386/dsputil_mmx.c:636: error: 'asm' operand has impossible constraints
i386/dsputil_mmx.c:636: error: 'asm' operand has impossible constraints
i386/dsputil_mmx.c:636: error: 'asm' operand has impossible constraints
i386/dsputil_mmx.c:636: error: 'asm' operand has impossible constraints
i386/h264dsp_mmx.c:393: error: 'asm' operand has impossible constraints
i386/dsputil_mmx.c:636: error: 'asm' operand has impossible constraints
i386/dsputil_mmx.c:636: error: 'asm' operand has impossible constraints
Everything else that I have compiled with gcc 4.3 with core2 enabled has worked without a problem (x264 being a prime example, since the h264 function is what's failing here). I am asking anyone who knows what they're doing to take a look at this, because I have hit the limit of what I can do with this and of the time I can spend getting it to work.

Thank you in advance.
Old 27th March 2007, 00:12   #2  |  Link
akupenguin
x264 developer
 
Join Date: Sep 2004
Posts: 2,392
Related thread on the ffmpeg mailinglist.

This is a long-standing bug in gcc that affects all x86_32 architectures. Core2 has nothing to do with it, nor does any other value of -march or -mtune. Workarounds: -m64, -fomit-frame-pointer, and/or -fno-PIC. (Actually, the bug affects x86_64 too, but then it only triggers after 15 registers rather than 6.)

x264 solves it by not using gcc asm, we use nasm instead.
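The register budget described above can be illustrated with a small sketch. This is not the libavcodec code; `sum4` is a hypothetical helper made up for this example. Every operand constrained with "r" must live in its own general-purpose register for the duration of the asm statement, and on x86_32 only a handful are free once the stack pointer, the frame pointer (kept unless -fomit-frame-pointer is given) and, typically, the PIC register are taken. Operands constrained with "g" may instead be left in memory or used as immediates, so the statement below needs only one register no matter how few are available.

```c
/* Hypothetical helper, not taken from ffmpeg: sums four ints with gcc
 * extended asm on x86, and falls back to plain C everywhere else. */
static int sum4(int a, int b, int c, int d)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int acc = a;
    /* "+r": acc must be held in a general-purpose register (read and written).
     * "g" : b, c and d may be a register, a memory location or an immediate,
     *       so gcc is free to leave them in memory when registers run out.
     * "cc": addl modifies the flags register. */
    __asm__ ("addl %1, %0\n\t"
             "addl %2, %0\n\t"
             "addl %3, %0"
             : "+r"(acc)
             : "g"(b), "g"(c), "g"(d)
             : "cc");
    return acc;
#else
    return a + b + c + d;
#endif
}
```

Changing all three "g" constraints to "r" would make the statement demand four registers at once; an asm block that demands more than gcc can supply is the kind of situation that produces "can't find a register in class 'GENERAL_REGS'".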

Last edited by akupenguin; 27th March 2007 at 00:23.
Old 27th March 2007, 00:38   #3  |  Link
morph166955
Registered User
 
Join Date: Mar 2006
Posts: 443
So then how come the only -march that fails on my end is core2? If I manually set it to prescott or any of the others that gcc supports, it works perfectly.
Old 27th March 2007, 10:27   #4  |  Link
akupenguin
x264 developer
 
Join Date: Sep 2004
Posts: 2,392
Dunno. gcc-4.1.2 doesn't have -march=core2, and I failed to compile gcc-4.3-svn

Last edited by akupenguin; 27th March 2007 at 14:30.
Old 27th March 2007, 18:55   #5  |  Link
morph166955
Registered User
 
Join Date: Mar 2006
Posts: 443
If you want to try it out, I have put a gcc 4.3 build that I did yesterday online; you can get it at http://www.benswebs.com/mingw/

We also have a thread going about building gcc 4.x up to 4.3 right now at http://forum.doom9.org/showthread.php?t=108215

Also, in case you're wondering, I have successfully compiled x264 with -march/-mtune set to core2 and -mssse3 enabled. I did have to manually enable them in your config.mak file, since it doesn't do it otherwise. Oddly enough, I also had to add -DHAVE_SSE3, since it didn't recognize it. My guess from looking at your code was that it's looking for an X86_64 system and I'm only running 32-bit Windows, and/or that it's because my mingw shows up as i686-pc-mingw32.
Old 27th March 2007, 20:45   #6  |  Link
akupenguin
x264 developer
 
Join Date: Sep 2004
Posts: 2,392
Manually adding -DHAVE_SSE3 did absolutely nothing, because there aren't any 32bit ssse3 functions in x264. When I implement some, I'll make sure configure detects ssse3 support on x86_32, but until then it's intentionally disabled so that you don't think you're getting ssse3. (Though by that argument I should remove the "using cpu capabilities: 3dnow" message too, since there aren't any 3dnow functions either.)

Last edited by akupenguin; 27th March 2007 at 20:51.
Old 27th March 2007, 23:19   #7  |  Link
morph166955
Registered User
 
Join Date: Mar 2006
Posts: 443
I definitely see your point there about how it's pointless for me to add it. So ultimately what I would need is XP64 loaded on this system to get that to work, I assume? Would the x64 version of FreeBSD also do the trick? I was considering dual booting this laptop with FreeBSD x64.

So, a little bit back on topic: does anyone have an idea what I may be able to do to make the ffmpeg code work (be it changing which registers to use or something else)?
Old 12th April 2007, 08:29   #8  |  Link
julesh
Registered User
 
Join Date: Jan 2005
Posts: 18
Have you tried using -fomit-frame-pointer? That might make an additional register available that you would otherwise need.

Otherwise, it could be rewritten so that, instead of the four values being specified as inputs and four as outputs, the source and destination pointers and two stride values are specified as inputs with no outputs, which would probably be easier for gcc to deal with.
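As a rough illustration of the interface suggested here, the following is a plain C reference with exactly four inputs: two pointers and two strides. It is only a sketch of the calling convention, not the MMX implementation from dsputil_mmx.c (whose operand list is not quoted in this thread), and the name `transpose4x4_ref` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain C reference for a 4x4 byte transpose: row y of src becomes
 * column y of dst. Strides are in bytes and may differ between the
 * source and destination buffers. */
static void transpose4x4_ref(uint8_t *dst, const uint8_t *src,
                             ptrdiff_t dst_stride, ptrdiff_t src_stride)
{
    assert(dst != NULL && src != NULL);
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[x * dst_stride + y] = src[y * src_stride + x];
}
```

An asm version with the same signature would need at most four general-purpose registers for its operands and would compute the row addresses itself inside the asm block, rather than asking gcc to materialize eight separate memory operands.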
Old 12th April 2007, 18:50   #9  |  Link
morph166955
Registered User
 
Join Date: Mar 2006
Posts: 443
I haven't tested it in a while... I'm going to shortly, though, so that I can create an optimized mencoder for my new Xeon setup.