x264: [maybe] a marginal speed-up for CPUs that support SSE3
CREXbzh
1st April 2006, 11:56
There there,
I've made a very simple patch to take advantage of SSE3's lddqu instruction. Intel claims that it can bring a great speed-up, but also warns that in some cases it can harm speed (see http://72.14.203.104/search?q=cache:Woi1D7ELJQMJ:www.intel.com/technology/itj/2004/volume08issue01/art01_microarchitecture/p06_sse.htm+lddqu&hl=fr&gl=fr&ct=clnk&cd=1&client=firefox).
Anyway, here's the patch: http://tuxrip.free.fr/transperl/MPlayer/SSE3_lddqu.diff
I don't have a CPU that supports SSE3, so I just can't test, but I'd be happy to read some benchmarks.
Sirber
1st April 2006, 15:11
Did you just replace all movdqu with lddqu?
celtic_druid
1st April 2006, 16:59
From a brief scan of the diff, it definitely looks that way.
ChronoCross
1st April 2006, 19:42
syskin and I were discussing this last night, and I think we both came to the same conclusion: it would provide almost no speed-up in this case. Perhaps 1%, but nothing to get excited about.
CREXbzh
1st April 2006, 19:53
Did you just replace all movdqu with lddqu?
Yes, that's all that's in it (or at least, the only thing that was intended to go into the patch, as the first version I uploaded had some CFLAGS changes).
I don't expect the speed-up to be huge, if there's any at all... I just don't see the harm in testing it out and seeing what happens.
Sharktooth
1st April 2006, 21:27
Well, I knew SSE3 would give no speed-up, but I wanted to try the patch anyway. The result: it's almost useless, at least on AMD (yes, with SSE3 support).
foxyshadis
1st April 2006, 21:42
May I try your patched mplayer? I'd like to test it on my Core Duo, at least to see if it makes any difference on Intel.
akupenguin
1st April 2006, 22:55
+ lddqu [edi], xmm0
That shouldn't even work... lddqu is load, not store.
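To illustrate the point, here is a minimal sketch of the operand forms in yasm/NASM syntax. The registers are chosen arbitrarily for the example and are not taken from the patch. lddqu only exists with an XMM register as destination and memory as source, so any unaligned store has to remain a movdqu.

```nasm
; Loads: destination is an XMM register, source is memory.
lddqu   xmm0, [esi]      ; SSE3 unaligned 16-byte load
movdqu  xmm0, [esi]      ; SSE2 unaligned 16-byte load (same result)

; Stores: only movdqu has a memory-destination form.
movdqu  [edi], xmm0      ; unaligned 16-byte store
; lddqu [edi], xmm0      ; invalid: lddqu has no store encoding
```

So a blanket search-and-replace of movdqu would need to skip every line whose first operand is a memory reference.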
CREXbzh
1st April 2006, 22:58
Well, I knew SSE3 would give no speed-up, but I wanted to try the patch anyway. The result: it's almost useless, at least on AMD (yes, with SSE3 support).
I'm not surprised. No AMD core has been built from the ground up to benefit from it, unlike Intel's Prescott core.
CREXbzh
1st April 2006, 23:18
+ lddqu [edi], xmm0
That shouldn't even work... lddqu is load, not store.
For some reason, yasm didn't complain on my machine (amd64).
Anyway, new patch here: http://tuxrip.free.fr/transperl/MPlayer/SSE3_lddqu.2.diff
squid_80
2nd April 2006, 13:04
For some reason, yasm didn't complain on my machine (amd64).
Yasm can be too forgiving of illegal instructions. In fact, if you were compiling for the AMD64 architecture, you would definitely not want to store to [edi], no matter what instruction was used to do it.
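A rough sketch of why, assuming the destination pointer is held in rdi (the register choice is only illustrative): in 64-bit mode, addressing through edi makes the assembler emit an address-size prefix, and the CPU then uses only the low 32 bits of the pointer.

```nasm
; 64-bit mode, destination pointer in rdi
movdqu  [rdi], xmm0      ; full 64-bit address, as intended
movdqu  [edi], xmm0      ; 0x67 prefix: address is edi zero-extended,
                         ; so the upper 32 bits of rdi are ignored
```

This goes unnoticed as long as the buffer happens to sit below 4 GB, and writes to the wrong address otherwise.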
Sharktooth
2nd April 2006, 14:49
I'm not surprised. No AMD core has been built from the ground up to benefit from it, unlike Intel's Prescott core.
I have yet to see any improvement from using SSE3 in ANY software on ANY CPU...
IMHO SSE3 was introduced only for marketing reasons and to try to fix HT (Hyper-Threading).