Old 30th March 2007, 05:33   #1  |  Link
Registered User
Join Date: Mar 2006
Posts: 443
Looking for some opinions/options to optimize my new 8-Core system for x264 speed!

So first read here for system specs and other system info (long story short, dual X5355 quad core xeons...):


For obvious reasons I'm looking for ways to squeeze every fps out of this system. For the time being, overclocking this thing is NOT an option. It cost way too damn much and I don't have enough overclocking experience to risk it quite yet. Maybe eventually...but not now...and not any time in the next few weeks/months.

I have some ideas, but I'm not sure if they will do anything, which is why I'm asking. So here they are...

1a) Screw the hard drives, let's do this all in ramdisks! OK, now for the reasoning. I have 4GB of DDR2-667 FB-DIMM RAM in this box. It's all on a 1333MHz FSB and it's operating in four-channel mode. That's about as fast as you can get memory today (don't start with the CRAZY expensive stuff that no one can ever really use, please). If I create some ramdisks, say one to hold the source and one to hold the output, and copy my source into the ramdisk, would that drastically (or even slightly) speed up the encode, since read/write times would be faster than those of a 7200RPM SATA hard disk? In other discussions I've seen here and elsewhere, CPU speed limited the encode far more than the hard drives did. Is that still the case here, or will 8x 2.66GHz cores running simultaneously push past that CPU/HDD limit to the point where this becomes beneficial?
1b) Would there be any benefit to creating a small (say 64 MB or less) ramdisk to hold the x264 binary and any other libraries, binaries, or anything else that's more "static" like those? I realize that this option is probably 110% worthless, but I figured I'd ask.

2) Optimized SMP kernel. The primary OS is going to be FreeBSD 6.2, amd64 branch, which despite its name also handles Intel's 64-bit x86 CPUs. Is there anything I can do when building the kernel to speed things up or make the SMP parts work better? I know there are several options I can put into the kernel and I have to do some more research on them, but if anyone can point me in any specific directions that would be awesome.

3) Optimized x264 binary/libraries. I'm planning on using gcc 4.3 with -march/-mtune set to core2 as well as -mssse3, plus I'm going to make sure that I'm on the new x264 with the SSSE3 support (I believe that was r635?). Is there any way for me to make that thing go even faster? Anything I can manually patch in x264 that will make it go faster (like maybe something in there designed for older systems that could run faster on mine and isn't already handled in r635)?

That's about it on my brain for now...Lemme know what you all think about those and/or any other options or ideas you have that could help out.

There's an old saying that I think could be changed a little to apply here. "Keep track of the pennies and the dollars will keep track of themselves" or in this case "Keep track of every .1 fps and the final speed will blow your mind!" Thanks all!
morph166955 is offline   Reply With Quote
Old 30th March 2007, 06:14   #2  |  Link
ангел смерти
Join Date: Nov 2004
Location: Lost
Posts: 9,175
Even if you're going to use it for encoding, it's still 100% hard/software discussion.

1) Memory speed will probably be a large bottleneck, but I couldn't tell for sure without tuning. It's doubtful that the hard drive will be a problem if you have more than one (in any configuration: single input drive + output, RAID 0, RAID 5, dual RAID 0...). With a single drive, you might end up thrashing, especially if you use multiple x264 instances instead of simple x264 threads.

Quad channel memory will help, but if you're doing HD then DDR3 might be attractive; every 1080p frame is 3 megs, which eats L2 cache in a hurry.
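
For anyone wondering where the 3 megs comes from, here is a quick sketch of the arithmetic. It assumes 8-bit planar 4:2:0 frames (YV12-style), which the post doesn't actually state, and the helper name is made up for illustration:

```python
def yuv420_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit planar 4:2:0 frame (pixel format is an assumption)."""
    luma = width * height                      # one byte per pixel for the Y plane
    chroma = 2 * (width // 2) * (height // 2)  # U and V planes, each subsampled 2x2
    return luma + chroma


size = yuv420_frame_bytes(1920, 1080)
print(size, "bytes, about", round(size / 2**20, 2), "MiB")
# 3110400 bytes, about 2.97 MiB
```

So a single uncompressed 1080p frame is already in the same ballpark as a few-megabyte L2 cache, before counting reference frames.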

3) Besides the AQ patch, I doubt it. If you were on altivec, Guillaume might have experimental code for you. But you never know... you could try asking the x264 list directly.
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. ~ Ed Howdershelt
foxyshadis is offline   Reply With Quote
Old 30th March 2007, 06:25   #3  |  Link
Registered User
Join Date: Mar 2006
Posts: 443
True...it is a hard/software discussion...the reason I made a post here was that I was also looking for some x264 CLI options and/or anything else related directly to x264 itself that is semi-outside of that boundary.

As for your comments:

1) I'm going to be using 2 drives to do encoding. I have 2x WD5000YS drives in the system. One will hold the source, the other will be for writing the new files. I'm planning on running x264 with threads (I'll start at 8 but then see if something like 12 or 16 or higher is faster...going to take some testing). Any recommendations on what may be a good value? I realize it's completely hypothetical since we don't yet know what kind of load each CPU will get when running x264, but we do have some experience with the existing quad-cores that I believe could be used to come up with a theory.

3) How do I get on the x264 list? I went looking a while ago for a link and I don't remember why I couldn't find one. I'm probably stupid and it's probably in one of the stickies and I'm just blind and not seeing it.
EDIT: Never mind...found it...subscribing...I'm stupid.

Last edited by morph166955; 30th March 2007 at 06:31.
morph166955 is offline   Reply With Quote
Old 30th March 2007, 18:01   #4  |  Link
Registered User
Join Date: Sep 2006
Posts: 602
Originally Posted by morph166955 View Post
(ill start at 8 but then see if something like 12 or 16 or higher is faster...going to take some testing). Any recommendations on what may be a good value?
Number of threads = number of CPUs.

There's little point adding more threads since, at any given time, each CPU can only process one. Having more threads than CPUs requires additional thread handling overhead, more cache pollution etc etc.

Alternatively, if you have multiple encodings to do, run them in parallel with the appropriate number of threads. e.g., if you have 4 similarly sized files to encode, run each one with 2 threads and, if possible, assign the processor affinity accordingly.
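
A rough sketch of that kind of split, purely illustrative: the function name is hypothetical, it assumes the cores are numbered 0..N-1, and it only works out the plan. How you actually pin a process to its cores depends on the OS and is left out here.

```python
def plan_parallel_encodes(n_cores: int, n_jobs: int) -> list:
    """Divide n_cores evenly among n_jobs; returns threads and core IDs per job."""
    if n_jobs < 1 or n_cores < n_jobs:
        raise ValueError("need at least one core per job")
    per_job = n_cores // n_jobs  # leftover cores (if any) stay unassigned
    plan = []
    for job in range(n_jobs):
        cores = list(range(job * per_job, (job + 1) * per_job))
        plan.append({"job": job, "threads": per_job, "cores": cores})
    return plan


# 8 cores, 4 files: each encode gets 2 threads and its own pair of cores
for entry in plan_parallel_encodes(8, 4):
    print(entry)
```

Each job's "threads" value is what you would pass to x264's --threads option, and "cores" is the set you would assign as that process's affinity.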
JohnnyMalaria is offline   Reply With Quote
Old 30th March 2007, 06:41   #5  |  Link
Registered User
Join Date: Mar 2006
Posts: 443
:-( It won't let me on the list...error msg is below:
This is the Postfix program at host krishna.via.ecp.fr.

I'm sorry to have to inform you that your message could not be
be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to <postmaster>

If you do so, please include this problem report. You can
delete your own text from the attached returned message.

The Postfix program

<ecartis@krishna.via.ecp.fr>: Command died with status 1:
morph166955 is offline   Reply With Quote
Old 30th March 2007, 09:08   #6  |  Link
Registered User
Join Date: Aug 2005
Posts: 132
3) Using make fprofiled instead of make may squeeze some last bits of performance out of x264.
Hellworm is offline   Reply With Quote
Old 30th March 2007, 21:16   #7  |  Link
Registered User
Join Date: Jan 2007
Posts: 27
From a dual 5160 / 5000X chipset user with 5GB RAM (Mac Pro), some remarks on threading:

1) x264's sliceless threading is quite nice, but it is beneficial to have more threads than cores. If you let x264 choose, it will set the number of threads to 1.5x the number of cores (in my case, 6), but I saw a performance increase when going to 10 or more (1080p material).

2) Because of the sliceless method, there is no drawback to increasing the number of threads... Instead, you increase the parallelism of the algorithm by specifying the "mvrange_thread" parameter, which sets a maximum motion vector size. By default, x264 deduces it from the number of threads you specified. It works as in

Lower values = more threads able to work simultaneously

The drawback to this is a potential loss in compressibility if the ideal motion vector happens to be larger than mvrange_thread. Beware! You have to set mvrange to the same value as mvrange_thread. Otherwise, x264 may first calculate an MV that is too large, and then crash as it realizes its mistake. I've had it happen to me.

3) Your best bet, of course, is not to rely on parallelizing x264 itself but to parallelize your encoding jobs temporally, as in cluster encoding.

a) In constant quantizer/quality mode, it's as simple as encoding fractions of the movie, 8 at a time, with single-threaded x264s and then stitching them together. The resulting encode will have (possibly unnecessary) I-frames inserted at the stitching points, which will result in a (negligible) loss in compressibility.

b) In multi-pass mode, a more intelligent method has been designed by Orion (x264farm): after a fast "0th pass", which finds the scenecuts, scenes are encoded in parallel a first time. The resulting 1st-pass log is reassembled, and GOPs that were encoded with an inadequate bitrate at first are reprocessed. This achieves the same result as multipass x264, but in a massively parallel fashion.

You could run x264farm locally (unwieldy, as it requires AviSynth/Windows) or adapt it to your needs... In my experience, temporal parallelization will always beat x264's threading.

Do post some scalability benchmarks with 1080p!
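
To make option a) above a bit more concrete, here is a minimal sketch of the splitting step only. It is not how x264farm does it: the function names, output file names and the qp value are placeholders, and it assumes you know the total frame count up front. The command lines use x264's --qp, --threads, --seek and --frames options; the exact invocation depends on your build and input format, and the stitching afterwards is not shown.

```python
def split_frames(total_frames: int, n_chunks: int) -> list:
    """Return (start, count) pairs covering frames 0..total_frames-1 contiguously."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(total_frames, n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        count = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        chunks.append((start, count))
        start += count
    return chunks


def chunk_commands(source: str, total_frames: int, n_chunks: int = 8, qp: int = 22) -> list:
    """Build one single-threaded constant-QP x264 command line per chunk (not executed)."""
    cmds = []
    for i, (start, count) in enumerate(split_frames(total_frames, n_chunks)):
        cmds.append(["x264", "--qp", str(qp), "--threads", "1",
                     "--seek", str(start), "--frames", str(count),
                     "-o", f"part{i:02d}.264", source])  # hypothetical output names
    return cmds


for cmd in chunk_commands("source.y4m", 100):
    print(" ".join(cmd))
```

Each command can then be launched as its own process, one per core. Every chunk starts with its own I-frame, which is the small compressibility cost mentioned in a).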

AGDenton is offline   Reply With Quote
