Doom9's Forum > Video Encoding > MPEG-4 AVC / H.264

18th April 2007, 20:50 #1
graysky
Registered User
Join Date: Sep 2004
Posts: 429
question for the devs - x264.exe and SSE4

With all these news articles boasting "more than 40% faster video encoding with SSE4 optimized video encoders, Intel promises" for the soon-to-be-released Penryn processor, I wanted to ask the x264 devs about this. As I understand it, the SSE4 instruction set is what will enable this speed increase in the new chips. What is the status of SSE4 in x264, and how soon after these chips are released can we expect to see this gain? I ask because I'm almost ready to buy a Q6600/Intel P965-based system, but if the Penryn chips really are 40% faster for x264 encoding, I may rethink the purchase.

Thanks for any info.

18th April 2007, 21:07 #2
burfadel
Registered User
Join Date: Aug 2006
Posts: 2,229
I'm guessing as soon as somebody is available to implement the SSE4 instructions! Is the SSE4 instruction information currently available? If so, maybe someone with time could implement it in the x264 code and have it as an option (e.g. --sse4) until it can be properly tested and debugged. That way, even if it takes a couple of months to include, it will be available for testing as soon as somebody gets an SSE4 processor, and any bugs can then be fixed quickly. A later change would then enable SSE4 without the switch. (The switch would only be there so that x264 doesn't crash straight away for non-testers who happen to get an SSE4-capable processor before the code has been fully tested.)

Just an idea

18th April 2007, 21:23 #3
akupenguin
x264 developer
Join Date: Sep 2004
Posts: 2,392
SSE4 will be implemented as soon as I get access to an SSE4 CPU, or someone else who has one decides to write it.

No need for such a complicated interface. I can't write SSE4 without a cpu to test it on, and if I can test it then there's very little chance that it would break on other cpus. No one has complained about SSSE3 crashes...
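The reason no opt-in switch is needed is presumably that SIMD code paths are chosen at run time from the features the CPU reports, so a machine without SSE4 never reaches the SSE4 routine. Below is a minimal Python sketch of that dispatch pattern, not x264's actual code: CPU detection is stubbed out as a set of feature-name strings, and the function names and flag strings (`sad_c`, `sad_sse4`, `"sse4"`) are hypothetical.

```python
def sad_c(a, b):
    """Portable reference SAD: runs on any CPU."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sad_sse4(a, b):
    """Placeholder for a hypothetical SSE4 routine.

    In a real encoder this would be hand-written assembly; here it simply
    computes the same result so the dispatch logic can be exercised.
    """
    return sum(abs(x - y) for x, y in zip(a, b))


# Candidate implementations, most specialised first, each paired with the
# (hypothetical) CPU feature names it requires. The portable one needs none.
SAD_IMPLEMENTATIONS = [
    (frozenset({"sse4"}), sad_sse4),
    (frozenset(), sad_c),
]


def select_sad(cpu_features):
    """Return the first implementation whose required features are all present."""
    for required, func in SAD_IMPLEMENTATIONS:
        if required <= cpu_features:
            return func
    raise RuntimeError("no usable SAD implementation")


if __name__ == "__main__":
    without_sse4 = {"sse2", "ssse3"}       # falls back to the portable path
    with_sse4 = {"sse2", "ssse3", "sse4"}  # picks the SSE4 path
    print(select_sad(without_sse4).__name__, select_sad(with_sse4).__name__)
```

Because the selection happens once, at startup, testing the SSE4 routine on an SSE4 machine is enough: on every other CPU the check simply fails and the older code runs, which matches the argument above.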

What's your source for "40%"? This one says "Motion estimation ... often accounts for about 40% of the total CPU cycles consumed by an encoder. ... This white paper will describe how video encoders can benefit from the Intel SSE4 instructions, achieving 1.6x to 3.8x performance speedups in integer motion vector search." Then they go on to describe ESA. And their results are probably correct for ESA. But the fast integer motion searches in x264 are more like 10-15% of the total cpu-time, plus SSE4 won't help them as much as it helps ESA. And even x264's successive-elimination ESA might be as fast as Intel's brute-force SSE4 ESA.

Last edited by akupenguin; 18th April 2007 at 22:05.
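The post contrasts Intel's brute-force ESA with x264's successive-elimination ESA. Below is a minimal pure-Python sketch of the general successive-elimination idea, not x264's C/assembly implementation: since |sum(A) - sum(B)| <= SAD(A, B), any candidate whose block-sum difference is already at least the best SAD found so far cannot win, so its full SAD is skipped. Computing block sums with a summed-area table is an illustrative choice, and all function names and parameters here are hypothetical.

```python
import random


def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def block_sum(ii, x, y, n):
    """Sum of the n x n block with top-left corner (x, y), in O(1)."""
    return ii[y + n][x + n] - ii[y][x + n] - ii[y + n][x] + ii[y][x]


def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[by + i][bx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))


def exhaustive_search(cur, ref, bx, by, n, radius, eliminate=False):
    """Full search over a (2*radius+1)^2 window around (bx, by).

    With eliminate=True, candidates ruled out by the block-sum lower bound
    are skipped; the best vector is identical, only fewer SADs are computed.
    Returns (best_sad, dx, dy, sad_evaluations).
    """
    h, w = len(ref), len(ref[0])
    ii = integral_image(ref)  # an encoder would build this once per frame
    cur_sum = sum(cur[by + i][bx + j] for i in range(n) for j in range(n))
    best = (float("inf"), 0, 0)
    evaluations = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = bx + dx, by + dy
            if not (0 <= rx <= w - n and 0 <= ry <= h - n):
                continue  # candidate block lies outside the reference frame
            if eliminate and abs(cur_sum - block_sum(ii, rx, ry, n)) >= best[0]:
                continue  # lower bound already >= best SAD: cannot improve
            cost = sad(cur, ref, bx, by, rx, ry, n)
            evaluations += 1
            if cost < best[0]:
                best = (cost, dx, dy)
    return best + (evaluations,)


if __name__ == "__main__":
    rng = random.Random(1)
    W = H = 32
    ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
    # Current frame = reference shifted so the true motion is (3, -2).
    cur = [[ref[(y - 2) % H][(x + 3) % W] for x in range(W)] for y in range(H)]
    print(exhaustive_search(cur, ref, 12, 12, 8, 4))
    print(exhaustive_search(cur, ref, 12, 12, 8, 4, eliminate=True))
```

Both calls find the same vector; the `eliminate=True` run just performs fewer SAD evaluations. How much is saved depends on the content, which is one reason a cleverer search can compete with a faster brute-force one.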

18th April 2007, 22:40 #4
slavickas
I'm Shpongled
Join Date: Nov 2001
Location: Lithuania
Posts: 303
aku:
http://www.anandtech.com/cpuchipsets...spx?i=2972&p=3
See the DivX result. The systems are quite unequal and the share of the speedup due to SSE4 is unknown, but the difference is huge.

18th April 2007, 22:46 #5
morph166955
Registered User
Join Date: Mar 2006
Posts: 443
What processors with SSE4 are even out (and publicly available) as of now? Aren't they debuting in the Penryn processors, which won't be out until Q1 '08?

18th April 2007, 22:50 #6
harissa
Registered User
Join Date: Oct 2004
Posts: 68
SSE4 lol
I have a quad-core Q6600;
it's 3 times faster than a 3.4 GHz dual core (945).
I'd rather have multithreading before SSE4,
because neither MeGUI nor VirtualDub with Xvid or x264 gives me more than 50% CPU usage; usually when I load a complex AVS script, it runs at 30%.

18th April 2007, 23:51 #7
morph166955
Registered User
Join Date: Mar 2006
Posts: 443
Quote:
Originally Posted by harissa
SSE4 lol
I have a quad-core Q6600;
it's 3 times faster than a 3.4 GHz dual core (945).
I'd rather have multithreading before SSE4,
because neither MeGUI nor VirtualDub with Xvid or x264 gives me more than 50% CPU usage; usually when I load a complex AVS script, it runs at 30%.
I see your 4... and raise you 4... lol j/k. I agree, multithreading all the way! Let's get these things running our systems at full load before we go crazy with more optimization. SSSE3 is good enough for basically anyone on here, at least for the next few months.

19th April 2007, 00:52 #8
graysky
Registered User
Join Date: Sep 2004
Posts: 429
Quote:
Originally Posted by akupenguin
What's your source for "40%"?
It came from this tgdaily story. It's very vague.

Thanks to all for the replies, btw!

19th April 2007, 03:44 #9
foxyshadis
Angel of Night
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
That's just what the Intel press brief says, and, well, they can say anything they want. Based on the wording of the actual press release, I'd expect that SSE4 is not the primary source of the speed increase, but rather the other architectural improvements (especially much more memory bandwidth and the 12 MB L2 cache, plus faster operations and more SSE gates). SSE4, like SSSE3, will push it a few extra percent beyond that.

Last edited by foxyshadis; 19th April 2007 at 03:47.
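Why a large motion-search speedup yields a much smaller overall gain follows from Amdahl's law. The sketch below applies it to figures quoted earlier in the thread: Intel's claim that motion estimation takes about 40% of encoder cycles with 1.6x to 3.8x motion-search speedups, and akupenguin's estimate that x264's fast integer searches take 10-15% of total CPU time. It optimistically assumes the full 1.6x-3.8x would carry over to x264 (akupenguin expects less), so the results are upper bounds under that assumption, not benchmarks.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of run time is made s times faster."""
    return 1.0 / ((1.0 - p) + p / s)


def percent_faster(p, s):
    """The same result expressed as 'X% faster' overall."""
    return (amdahl_speedup(p, s) - 1.0) * 100.0


if __name__ == "__main__":
    # Share of encoding time spent in integer motion search:
    # 0.40 = Intel's figure; 0.10 and 0.15 = akupenguin's estimate for x264.
    for p in (0.40, 0.10, 0.15):
        # Intel's quoted motion-search speedup range, assumed to apply in full.
        for s in (1.6, 3.8):
            print(f"share {p:.0%}, search {s}x faster -> "
                  f"encode {percent_faster(p, s):.1f}% faster")
```

With Intel's 40% share this gives roughly 18% to 42% overall, whereas a 10-15% share gives only about 4% to 12%, even before accounting for SSE4 helping x264's searches less than it helps ESA.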

19th April 2007, 04:06 #10
lexor
Registered User
Join Date: Jan 2004
Posts: 849
As always, Ars Technica has some good info on the subject. Here is a link (includes some tables and benchmarks): Intel details Penryn performance, new SSE4 extensions

It appears that not only do we get new instructions, but some existing instructions also get faster (down from 5 cycles to 3 on some). So Penryn will be a good improvement (regardless of exactly how good) even if no SSE4 code is added to x264. Niceness all around.

19th April 2007, 05:13 #11
RaynQuist
Registered User
Join Date: Dec 2006
Posts: 44
http://www.bit-tech.net/hardware/200..._penryn/2.html

The last picture on that page shows how Intel accounts for the 105% performance boost in DivX.

All times are GMT +1.