18th April 2007, 20:50 | #1 | Link |
Registered User
Join Date: Sep 2004
Posts: 429
|
question for the devs - x264.exe and SSE4
With all these news articles boasting "more than 40% faster video encoding with SSE4 optimized video encoders, Intel promises" in the soon-to-be-released Penryn processor, I wanted to ask some x264 devs about this. As I understand it, the SSE4 instruction set is what will enable this speed increase in these new chips. What is the status of this in x264, and how soon after the release of these chips can we expect to see this gain? I ask because I'm almost ready to buy a Q6600/Intel P965 based system, but if the Penryn chips are really 40% faster for x264 encoding, I may rethink the purchase.
Thanks for any info. |
18th April 2007, 21:07 | #2 | Link |
Registered User
Join Date: Aug 2006
Posts: 2,229
|
I'm guessing as soon as somebody is available to implement the SSE4 instructions! Is the SSE4 instruction information currently available? Maybe if someone had time they could implement it in the x264 code and offer it as an option (as in --sse4) until it can be properly tested and debugged. That way, even if it takes a couple of months to write, it would be available for testing as soon as somebody gets an SSE4 processor, any bugs could then be fixed quickly, and a later change could enable SSE4 without the --sse4 switch. The switch would only be there so that x264 doesn't crash straight away for non-testers who happen to get an SSE4-capable processor before the code can be fully tested.
Just an idea |
18th April 2007, 21:23 | #3 | Link |
x264 developer
Join Date: Sep 2004
Posts: 2,392
|
SSE4 will be implemented as soon as I get access to a SSE4 cpu, or someone else with such decides to write it.
No need for such a complicated interface. I can't write SSE4 without a cpu to test it on, and if I can test it then there's very little chance that it would break on other cpus. No one has complained about SSSE3 crashes...
What's your source for "40%"? This one says "Motion estimation ... often accounts for about 40% of the total CPU cycles consumed by an encoder. ... This white paper will describe how video encoders can benefit from the Intel SSE4 instructions, achieving 1.6x to 3.8x performance speedups in integer motion vector search." Then they go on to describe ESA. And their results are probably correct for ESA.
But the fast integer motion searches in x264 are more like 10-15% of the total cpu-time, plus SSE4 won't help them as much as it helps ESA. And even x264's successive-elimination ESA might be as fast as Intel's brute-force SSE4 ESA.
Last edited by akupenguin; 18th April 2007 at 22:05. |
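To make the ESA comparison in the post above a bit more concrete, here is a minimal Python sketch of an exhaustive block search with and without successive elimination. It is not x264's or Intel's code: the function names, the summed-area table used for the window sums, and the raster search order are all assumptions made for illustration. The idea it shows is that |sum(cur) - sum(window)| can never exceed the SAD of the two blocks, so any window whose bound already reaches the best SAD found so far can be skipped without computing its SAD.

```python
def integral_image(img):
    """Summed-area table: table[y][x] is the sum of img[0:y][0:x]."""
    height, width = len(img), len(img[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_total = 0
        for x in range(width):
            row_total += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_total
    return table


def window_sum(table, rx, ry, n):
    """Sum of the n x n window with top-left corner (rx, ry), in O(1)."""
    return (table[ry + n][rx + n] - table[ry][rx + n]
            - table[ry + n][rx] + table[ry][rx])


def sad(cur, ref, rx, ry, n):
    """Sum of absolute differences between cur and the window at (rx, ry)."""
    return sum(abs(cur[y][x] - ref[ry + y][rx + x])
               for y in range(n) for x in range(n))


def exhaustive_search(cur, ref, n, use_elimination):
    """Test every window position in ref (hypothetical helper).

    Returns (best_sad, best_position, number_of_sads_computed).
    """
    table = integral_image(ref)
    cur_sum = sum(map(sum, cur))
    best_sad, best_pos, computed = None, None, 0
    for ry in range(len(ref) - n + 1):
        for rx in range(len(ref[0]) - n + 1):
            if use_elimination and best_sad is not None:
                # Lower bound on the SAD; if it cannot beat the current
                # best, the full SAD would not beat it either.
                bound = abs(cur_sum - window_sum(table, rx, ry, n))
                if bound >= best_sad:
                    continue
            computed += 1
            score = sad(cur, ref, rx, ry, n)
            if best_sad is None or score < best_sad:
                best_sad, best_pos = score, (rx, ry)
    return best_sad, best_pos, computed
```

Both modes return the same best SAD and position, because a skipped window could never have replaced the current best; the difference is only in how many full SADs get computed. That is the sense in which a successive-elimination search might keep up with a brute-force search whose individual SADs are faster.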
18th April 2007, 22:40 | #4 | Link |
I'm Shpongled
Join Date: Nov 2001
Location: Lithuania
Posts: 303
|
aku:
http://www.anandtech.com/cpuchipsets...spx?i=2972&p=3 See the DivX result. The systems are quite unequal and the share of the speedup due to SSE4 is unknown, but the difference is huge. |
18th April 2007, 22:50 | #6 | Link |
Registered User
Join Date: Oct 2004
Posts: 68
|
SSE4 lol
I have a quad-core Q6600 and it's 3 times faster than a 3.4 dual core (945). I'd rather have multithreading before SSE4, because neither MeGUI nor VirtualDub with Xvid or x264 gives me more than 50% CPU usage; usually when I load a complex AVS script, it runs at 30%. |
18th April 2007, 23:51 | #7 | Link |
Registered User
Join Date: Mar 2006
Posts: 443
|
I see your 4...and raise you 4... lol j/k. I agree...multithread all the way! Let's get these things running our systems at full load before we start to go crazy with more optimization. SSSE3 is good enough for basically anyone on here, at least for the next few months.
|
19th April 2007, 00:52 | #8 | Link |
Registered User
Join Date: Sep 2004
Posts: 429
|
|
19th April 2007, 03:44 | #9 | Link |
Angel of Night
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
|
That's just what the Intel press brief says, and well, they can say anything they want. Based on the wording of the actual press release, I'd expect that SSE4 is not the primary reason for the speed increase; the other aspects of the architecture moving forward matter more (much more memory bandwidth and 12M L2 cache especially, plus faster operations and more SSE gates). SSE4, like SSSE3, will push it a few extra percent beyond that.
Last edited by foxyshadis; 19th April 2007 at 03:47. |
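As a rough sanity check on the "few extra percent" estimate, Amdahl's law can be applied to the figures quoted earlier in the thread: fast integer motion search at 10-15% of total CPU time, and the 1.6x to 3.8x speedup Intel's white paper reports for integer motion vector search. The sketch below is only an upper-bound calculation under those assumptions, since post #3 also notes that SSE4 won't help x264's fast searches as much as it helps ESA. The function name is made up for illustration.

```python
def overall_speedup(fraction, part_speedup):
    """Amdahl's law: whole-program speedup when only `fraction` of the
    run time is accelerated by a factor of `part_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if part_speedup <= 0:
        raise ValueError("part_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)


if __name__ == "__main__":
    # Fractions and speedups taken from the figures quoted in post #3.
    for fraction in (0.10, 0.15):
        for part_speedup in (1.6, 3.8):
            gain = (overall_speedup(fraction, part_speedup) - 1.0) * 100.0
            print(f"motion search {fraction:.0%} of time, "
                  f"{part_speedup}x faster: about {gain:.1f}% overall")
```

With those inputs the overall gain works out to roughly 4% at the low end (10% of the time, 1.6x) and roughly 12% at the high end (15% of the time, 3.8x), well short of 40%.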
19th April 2007, 04:06 | #10 | Link |
Registered User
Join Date: Jan 2004
Posts: 849
|
As always, ArsTechnica has some good info on the subject. Here is a link (includes some tables and benches): Intel details Penryn performance, new SSE4 extensions
It appears that not only do we get new instructions, but some old instructions also get faster (from 5 cycles down to 3 cycles of execution on some). So Penryn will be a good improvement (regardless of exactly how good) even if no SSE4 code is added to x264. Niceness all around
__________________
Geforce GTX 260 Windows 7, 64bit, Core i7 MPC-HC, Foobar2000 |
19th April 2007, 05:13 | #11 | Link |
Registered User
Join Date: Dec 2006
Posts: 44
|
http://www.bit-tech.net/hardware/200..._penryn/2.html
The last picture on that page shows how Intel accounts for the 105% performance boost in DivX. |