Does x264 utilize SSE4?
twolfe18
9th January 2008, 01:08
I was reading a review of the new Penryn-based Intel processors, and they say that there are HUGE gains to be made on Penryn when you use a program that is SSE4-optimized (see http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3195&p=4).
Does x264 currently use SSE4 instructions? If not, are there plans to re-optimize the code for SSE4?
Dark Shikari
9th January 2008, 01:10
The "huge gains" are completely contrived BS. The SAD ESA instruction, the source behind the supposed "huge gains," is not only useless unless an exhaustive search is used, but it's slower than the mathematically equivalent successive elimination algorithm (SEA) implemented in software. Since x264 already uses SEA (which was recently sped up by over 30% in rev. 717), the new instruction is even more useless than it already was.
check
9th January 2008, 05:39
Doesn't SSE4 include more than one new instruction? Are the rest just unrelated to video work?
Dark Shikari
9th January 2008, 05:43
Doesn't SSE4 include more than one new instruction? Are the rest just unrelated to video work?
I don't think akupenguin found anything useful in SSE4. SSE4.1, if I recall correctly, had something or other that might be useful in certain cases.
Manao
9th January 2008, 06:47
http://forum.doom9.org/showthread.php?p=993986#post993986
SSE4 will help here and there, but nothing groundbreaking, and nothing that would beat algorithmic refinements such as SEA vs ESA (ie, brute force, aka SSEx, can't beat brain, especially Loren's it seems...)
IgorC
9th January 2008, 07:38
(ie, brute force, aka SSEx, can't beat brain, especially Loren's it seems...)
But it may beat the DivX developer's brain. As we can remember, SSE4 brings a "huge speedup" to DivX 6, doesn't it? Or, more than likely, it's just a lack of optimizations in the DivX code.
akupenguin
9th January 2008, 07:56
The SAD ESA instruction, the source behind the supposed "huge gains," is not only useless unless an exhaustive search is used, but it's slower than the mathematically equivalent successive elimination algorithm (SEA) implemented in software.
Actually, I take that back. Not the part about x264's SSE2 algorithm being faster than the SSE4 code Intel used to show off MPSADBW, that's still true. But I think I found a way to make MPSADBW compatible with SEA, so you can get the benefit of both at once. Still, you won't get nearly as huge a speedup as advertised, because SAD isn't even the majority of the CPU cost in SEA.
I don't think akupenguin found anything useful in SSE4. SSE4.1, if I recall correctly, had something or other that might be useful in certain cases.
SSE numbers are no longer sequential. SSE4.1 is a subset of SSE4, SSE4.2 is the other subset, and SSE5 is completely unrelated.
Manao
9th January 2008, 08:35
IgorC : http://forum.doom9.org/showthread.php?p=1033903#post1033903
http://forum.doom9.org/showthread.php?t=125417
So SSE4 doesn't bring a speedup to DivX 6, but to a modded version of it to which a dumb ESA search was added.
froggy1
9th January 2008, 10:13
The "huge gains" are completely contrived BS. The SAD ESA instruction, the source behind the supposed "huge gains," is not only useless unless an exhaustive search is used, but it's slower than the mathematically equivalent successive elimination algorithm (SEA) implemented in software. Since x264 already uses SEA (which was recently sped up by over 30% in rev. 717), the new instruction is even more useless than it already was.
Maybe a bit off topic, but I remember reading here that ESA in x264 is still single-threaded. Is this still true, or did the recent changes in x264 make ESA multi-threaded?
akupenguin
9th January 2008, 10:19
Maybe a bit off topic, but I remember reading here that ESA in x264 is still single-threaded.
fixed as of r676 / 2007-09-15
Dark Shikari
9th January 2008, 15:17
But it may beat the DivX developer's brain. As we can remember, SSE4 brings a "huge speedup" to DivX 6, doesn't it? Or, more than likely, it's just a lack of optimizations in the DivX code.
The DivX code was intentionally changed so as to make the new instruction useful. I.e. they changed their motion search to a pure exhaustive search and then said "hey look how good SSE4 is!"
mcka
16th January 2008, 17:08
The DivX code was intentionally changed so as to make the new instruction useful. I.e. they changed their motion search to a pure exhaustive search and then said "hey look how good SSE4 is!"
Some days ago I read a benchmark on a German site which used DivX 6.7 to encode an MPEG-2 file with the default settings (not the Intel setting with "no audio"...). Just by enabling SSE4, the encoding time was reduced by 15% on a Core 2 Extreme QX9770.
http://www.computerbase.de/artikel/hardware/prozessoren/2007/test_intel_core_2_extreme_qx9770_q9450/13/#abschnitt_divx_6_7
So you said DivX "changed their motion search to a pure exhaustive search", but has this "new, slow search" become the default in the current DivX 6.7 with SSE4 disabled (making encoding slower for everyone who doesn't own one of these CPUs, which have only been available for a few days and are quite expensive)?
If not, I think 15% is a really nice result for a complete encoding process, just from enabling SSE4.
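A rough Amdahl's-law reading of that 15%, as a sketch: assume the SSE4 path speeds up a fraction p of total encoding time by a factor s and leaves everything else unchanged (p and s are illustrative symbols, not values measured in that benchmark).

$$
\frac{T_{\text{SSE4}}}{T_{\text{base}}} = (1 - p) + \frac{p}{s},
\qquad
r = 1 - \frac{T_{\text{SSE4}}}{T_{\text{base}}} = p\left(1 - \frac{1}{s}\right) < p
$$

So a reduction of r = 0.15 means more than 15% of the encode was spent in the accelerated code, however fast the new instruction is, and it corresponds to a throughput gain of 1/(1 - 0.15), about 1.18x. By the same formula, if SAD is not the majority of the cost (p <= 0.5), even an arbitrarily fast SAD stays below a 2x overall speedup.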
Inventive Software
16th January 2008, 17:29
The pure exhaustive search was implemented as a proof of concept of the SSE4 instruction. It's not the default, and is much slower than the "standard" search algorithm. This has been discussed many times since it was found in DivX 6.6. ;)