10th April 2007, 17:53 | #1
Registered User
Join Date: Mar 2006
Posts: 443
x264 multi-core (4+) threading optimization
EDIT: To make it easier for people to find it, the thread_pool04 patch is available at http://www.benswebs.com/public/x264/....04c.r680.diff
I've made the disturbing realization that, as of right now, every method I have tested for sending video to x264 is faster on XP than it is on FreeBSD... which really should not be! I'm looking for some options (or what other people are doing) to try out on this system to get x264 running at its peak speed. This machine is an octa-core Xeon that pulls well over 100 fps in XP on the MeGUI benchmark using AviSynth with mt.dll, yet it barely hits 100 fps, if at all, in BSD... even if I pipe it raw Y4M files! I really want to get this machine running as fast as humanly possible (which for this thing should be absurdly fast), and I'd rather not run it in XP. x264 is not running anywhere near 100% CPU in the first pass (in fact it's closer to 35% on all of the cores) and only maybe 75% in the second pass, so I know it can definitely go faster. So please... any ideas on how to pipe the video into x264 to push this thing to its max would be greatly appreciated. (Note for mods: I'm not looking for "bests" here... I'm looking for options to test and opinions on what may or may not be the ideal settings for this setup.) Last edited by morph166955; 7th January 2009 at 03:40.
10th April 2007, 19:00 | #3
Registered User
Join Date: Jan 2007
Posts: 27
Raw Y4M == HDD bottleneck. It is imperative that you decode the input in another program, then pipe it to x264 (using | or a fifo). However, x264 treats its standard input as raw YUV, not Y4M, so you'll have to apply the patch that adds a --y4m-input option (or use a fifo whose name ends in .y4m).
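To put a rough number on the HDD bottleneck: an uncompressed source has to be read from disk as fast as x264 encodes it. The sketch below assumes 8-bit 4:2:0 video with even dimensions and ignores the small per-frame Y4M header; the function names are made up for illustration, and the resolution and frame rate are just example figures.

```python
def frame_bytes(width: int, height: int) -> int:
    """Size of one 8-bit 4:2:0 frame: a full-size luma plane plus two
    quarter-size chroma planes. Assumes even width and height."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


def required_read_rate(width: int, height: int, fps: float) -> float:
    """Bytes per second the source disk must deliver to sustain `fps`."""
    return frame_bytes(width, height) * fps


# 640x480 is the resolution of the MeGUI test clip used later in this thread;
# 189 fps is roughly the first-pass speed reported there.
rate = required_read_rate(640, 480, 189.0)
print(f"{frame_bytes(640, 480)} bytes/frame -> {rate / 1e6:.1f} MB/s")
```

At first-pass speeds that is tens of MB/s of sustained reads, which is why decoding on the fly and piping tends to beat reading a huge raw file.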
Have you tried ffmpeg -f yuv4mpegpipe | ...? Anyhow, there's always the option of using AviSynth under Wine with avs2yuv, which was written for exactly that purpose. It will be slower than under Windows, but not by much. AG Last edited by AGDenton; 10th April 2007 at 19:03.
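A minimal sketch of the decode-and-pipe approach, written with Python's `subprocess` instead of a shell `|`. It assumes ffmpeg and x264 are on PATH and that an unpatched x264 build reads raw YUV from stdin when given `-` as the input name, followed by a `WIDTHxHEIGHT` argument (the resolution argument matches the commands posted later in this thread; the `-` convention is an assumption). File names and options are placeholders.

```python
import subprocess
from typing import List, Tuple


def build_commands(src: str, out: str, width: int, height: int,
                   x264_opts: List[str]) -> Tuple[List[str], List[str]]:
    """Return (decoder argv, encoder argv) for a raw-YUV pipe."""
    # ffmpeg decodes the source and writes raw 4:2:0 frames to stdout ("-").
    decoder = ["ffmpeg", "-i", src, "-f", "rawvideo", "-pix_fmt", "yuv420p", "-"]
    # Assumption: x264 reads raw YUV from stdin when the input name is "-".
    encoder = ["x264", *x264_opts, "-o", out, "-", f"{width}x{height}"]
    return decoder, encoder


def run_pipeline(decoder: List[str], encoder: List[str]) -> int:
    """Run decoder | encoder without touching the disk; return x264's exit code."""
    dec = subprocess.Popen(decoder, stdout=subprocess.PIPE)
    enc = subprocess.Popen(encoder, stdin=dec.stdout)
    dec.stdout.close()  # so ffmpeg gets SIGPIPE if x264 exits early
    enc_rc = enc.wait()
    dec.wait()
    return enc_rc

# Usage (launches ffmpeg and x264, so it is left as a comment):
# dec_cmd, enc_cmd = build_commands("test.mpg", "test.264", 640, 480,
#                                   ["--threads", "auto", "--progress"])
# run_pipeline(dec_cmd, enc_cmd)
```

A named fifo works the same way; the anonymous pipe just avoids creating and cleaning up a file.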
10th April 2007, 23:13 | #6
Registered User
Join Date: Mar 2006
Posts: 443
At this point I've found that FreeBSD, at least, has some issues with scheduling on more than 4 cores: it won't let x264, ffmpeg or mencoder have more than 9 threads, which I assume is 1 for input and 8 for x264.
I'm (sadly) going to end up running XP as the primary OS on this box for now, and possibly run some form of Linux in VMware Workstation for testing, even though this will slow things down slightly (although with that many cores it's probably negligible). I'm moving things around on the drives now so that I can change my partition types without having to export my files off the system onto external drives. I already have Windows loaded on another partition on this box, so I'm going to use Acronis to redo the partitions and just make one huge one on this disk. Once I get something working stably in VMware I'll probably migrate over to that OS, but I at least want something to tinker with until then, since that may take a good while. I'll let you all know how it goes in a few hours once I do some more tweaking.
11th April 2007, 00:57 | #7
Registered User
Join Date: Sep 2004
Location: Italy
Posts: 154
Quote:
11th April 2007, 01:42 | #8
Registered User
Join Date: Mar 2006
Posts: 443
I'm going to run VMware on the XP install to test different distros for speed (assuming that VMware slows each one down by roughly the same amount) as well as stability and ease of use. I was going to try Debian, Fedora and Ubuntu first... then maybe a few others depending on how those do. I think it's going to take a lot of trial and error to find the optimal setup for me.
11th April 2007, 01:54 | #9
x264 developer
Join Date: Sep 2004
Posts: 2,392
Quote:
Responsiveness of their default desktop environment? (Choice of window manager doesn't have much to do with distro.)
Speed at which they run x264? (Should depend only on the kernel's SMP scheduler and the amount of crap running in the background, not anything distro-specific.)
Boot time? (The amount of time I spend waiting for a reboot over the lifespan of a computer is probably less than the time it would take to install one alternate distro.)
11th April 2007, 02:32 | #10
Registered User
Join Date: Mar 2006
Posts: 443
Yeah, I was going to aim for encoding speed, basically... but mostly it was going to be about ease of installing/compiling software and such (and just how I liked it overall). Speed-wise, the only thing I can think of is whether the OS has extra software running that takes up CPU time, and/or different libraries (such as the threads library) that may affect how it all runs. It's mostly going to come down to how I like the system overall, and I'm probably just going to end up playing with it over the course of a few weeks... I'll just run everything in XP until then; it should do for now.
11th April 2007, 19:17 | #12
Registered User
Join Date: Mar 2006
Posts: 443
OK, I loaded up Fedora Core 6, compiled x264, and piped the MeGUI test through mencoder using the x264 options from their job1-2.xml file for a first pass. I'm still only getting 35-40% CPU utilization on the first pass, which puts me at ~177 fps (yeah, I know I'm complaining about 177 fps on x264... sounds almost too good to be true, right?).
@akupenguin: what would be the best way to convert the test mpeg into a file that a non-y4m-patched x264 can read the fastest? ffmpeg? mplayer/mencoder?
11th April 2007, 20:38 | #13
x264 developer
Join Date: Sep 2004
Posts: 2,392
ffmpeg, mplayer, and mencoder will all produce identical yuv files.
The shortest command line is `ffmpeg -i test.mpg -f rawvideo -y test.yuv`

I might have reproduced your problem (though not at as great a magnitude, given only 2 cores). I have two Core 2 Duo systems here, and one gets 105% SMP efficiency (i.e. 100% CPU use and 5% less total CPU time than 1 thread needed), while the other gets 90% efficiency (i.e. 85% CPU use and 5% less total CPU time). This is true even with intra-only encoding, which should have no synchronization overhead at all. So I don't know why it happens, but at least I can test alternatives. Last edited by akupenguin; 11th April 2007 at 20:42.
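The two efficiency figures follow from a little arithmetic. This sketch assumes "SMP efficiency" means speedup over a single thread divided by the number of cores, which is consistent with both numbers quoted; the function name is made up for illustration.

```python
def smp_efficiency(cpu_use: float, cpu_time_ratio: float) -> float:
    """Speedup over one thread divided by the core count (assumed definition).

    cpu_use        -- fraction of all cores kept busy (1.0 = every core at 100%)
    cpu_time_ratio -- total CPU time of the threaded run / CPU time of a 1-thread run

    With T = single-thread CPU time and n cores, the threaded run burns
    cpu_time_ratio * T CPU-seconds at utilisation cpu_use, so its wall time is
    cpu_time_ratio * T / (n * cpu_use). Speedup = T / wall = n * cpu_use / cpu_time_ratio,
    and dividing by n leaves cpu_use / cpu_time_ratio (n cancels out).
    """
    return cpu_use / cpu_time_ratio


# The two Core 2 Duo boxes: both need 5% less total CPU time than one thread.
for use in (1.00, 0.85):
    print(f"{use:.0%} CPU use -> {smp_efficiency(use, 0.95):.1%} efficiency")
```

Read this way, the "missing" efficiency on the second box is entirely idle CPU, not extra work per frame.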
11th April 2007, 20:52 | #14
Registered User
Join Date: Mar 2006
Posts: 443
Here are some CPU utilization numbers for you. The source was generated by running the MeGUI test .avs file through mencoder into a raw yuv file. The x264 runs were done using the MeGUI test options (see below). I believe you have to divide the CPU percentage by 8 to get the per-core average. Pass 1 is still only around 45% and pass 2 is around 80%. These numbers were collected after all the passes had run (each run was pass 1 followed by pass 2, not all the pass 1s followed by all the pass 2s). I did 10 runs with these settings.
Code:
PASS1:
encoded 1488 frames, 189.16 fps, 989.84 kb/s 27.75user 0.42system 0:07.87elapsed 357%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 182.12 fps, 989.84 kb/s 28.00user 0.41system 0:08.17elapsed 347%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 187.77 fps, 989.84 kb/s 27.84user 0.41system 0:07.93elapsed 356%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 188.89 fps, 989.84 kb/s 27.83user 0.46system 0:07.88elapsed 358%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 188.22 fps, 989.84 kb/s 27.87user 0.42system 0:07.91elapsed 357%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 187.57 fps, 989.84 kb/s 27.79user 0.40system 0:07.94elapsed 355%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 185.17 fps, 989.84 kb/s 27.85user 0.41system 0:08.04elapsed 351%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 187.65 fps, 989.84 kb/s 27.78user 0.46system 0:07.93elapsed 356%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 185.20 fps, 989.84 kb/s 27.72user 0.42system 0:08.04elapsed 350%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 188.29 fps, 989.84 kb/s 27.77user 0.44system 0:07.91elapsed 356%CPU (0avgtext+0avgdata 0maxresident)k
PASS2:
encoded 1488 frames, 116.38 fps, 1009.85 kb/s 84.39user 0.90system 0:12.87elapsed 662%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 116.82 fps, 1009.74 kb/s 84.26user 0.91system 0:12.82elapsed 664%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 116.19 fps, 1009.75 kb/s 84.25user 0.90system 0:12.88elapsed 660%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 116.74 fps, 1009.39 kb/s 84.41user 0.91system 0:12.83elapsed 665%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 116.34 fps, 1009.38 kb/s 84.42user 0.84system 0:12.87elapsed 662%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 116.08 fps, 1009.44 kb/s 84.27user 0.93system 0:12.90elapsed 660%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 116.15 fps, 1009.78 kb/s 84.28user 0.89system 0:12.89elapsed 660%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 115.93 fps, 1009.79 kb/s 84.22user 0.92system 0:12.91elapsed 659%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 115.99 fps, 1009.38 kb/s 84.24user 0.87system 0:12.91elapsed 659%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 115.59 fps, 1009.48 kb/s 84.33user 0.87system 0:12.95elapsed 657%CPU (0avgtext+0avgdata 0maxresident)k
Code:
x264 --pass 1 --bitrate 1000 --stats test-NEW.stats --bframes 3 --b-pyramid --direct auto --subme 1 --analyse none --vbv-maxrate 25000 --me dia --merange 12 --threads auto --thread-input --progress --no-psnr --no-ssim -o /dev/null stream.yuv 640x480
x264 --pass 2 --bitrate 1000 --stats test-NEW.stats --ref 3 --bframes 3 --b-pyramid --weightb --direct auto --subme 6 --trellis 1 --analyse all --8x8dct --vbv-maxrate 25000 --me umh --merange 12 --threads auto --thread-input --progress --no-psnr --no-ssim -o /dev/null stream.yuv 640x480
Last edited by morph166955; 11th April 2007 at 20:55.
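The "divide by 8" step can be done straight from the GNU time summary lines above. A minimal sketch, assuming the default `...user ...system M:SS.ssElapsed ...%CPU` layout shown in the runs (it does not handle the H:MM:SS form used for runs over an hour) and 8 cores for this box; the function name is made up for illustration.

```python
import re

# Matches e.g. "27.75user 0.42system 0:07.87elapsed 357%CPU"
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)user\s+(?P<sys>[\d.]+)system\s+"
    r"(?P<min>\d+):(?P<sec>[\d.]+)elapsed\s+(?P<cpu>\d+)%CPU"
)


def per_core_utilisation(line: str, ncores: int) -> float:
    """Average fraction of each core kept busy: (user + system) / wall / ncores."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError("no GNU time summary (M:SS.ss form) found in line")
    busy = float(m["user"]) + float(m["sys"])
    wall = int(m["min"]) * 60 + float(m["sec"])
    return busy / wall / ncores


pass1 = "27.75user 0.42system 0:07.87elapsed 357%CPU (0avgtext+0avgdata 0maxresident)k"
pass2 = "84.39user 0.90system 0:12.87elapsed 662%CPU (0avgtext+0avgdata 0maxresident)k"
for name, line in (("pass 1", pass1), ("pass 2", pass2)):
    print(f"{name}: {per_core_utilisation(line, 8):.1%} of each core on average")
```

Computing from user+system over elapsed rather than trusting the rounded %CPU field gives the same answer here, but it keeps the arithmetic visible.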
11th April 2007, 22:02 | #17
Registered User
Join Date: Mar 2006
Posts: 443
@akupenguin
How is mvrange-thread calculated in x264 when it's set to auto (I assume that's the default)? I'm a little confused about the whole mvrange-thread thing altogether, as far as what exactly it does, and there isn't a whole lot about it on here apart from the odd post saying "yes, use it, it's good". What's a good way for me to calculate it to optimize speed without going so far that it affects my bitrate?
11th April 2007, 22:24 | #19
x264 developer
Join Date: Sep 2004
Posts: 2,392
Quote:
Quote:
Then take half and round it up to a multiple of the macroblock height. That's the space allocated to mvrange-thread. The other half is scheduling leeway, so that threads don't have to wait for each other after every row. The derived value of mvrange-thread is printed at the beginning if you run `x264 -v ...` Last edited by akupenguin; 11th April 2007 at 23:56.
11th April 2007, 22:50 | #20
Registered User
Join Date: Mar 2006
Posts: 443
Ah... cool, thanks for the explanation! It's coming up by default at 24 now. I'm moving some files around on those drives right now, so once that's done (should be soon) I'll run a few passes with lower values for it (I'm going to try threads=12 first with different values to see what that does, then I'll play around with different numbers of threads).
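The sweep described above (fixed --threads, varying --mvrange-thread, timing each first pass) is easy to script. A minimal sketch: the option set is a subset of the pass-1 command posted earlier in this thread, while the helper names, the value list, the per-run stats file names and the input file are placeholders for illustration.

```python
import subprocess
import time
from typing import List

# Subset of the pass-1 options posted earlier in this thread.
BASE_OPTS = ["--pass", "1", "--bitrate", "1000", "--subme", "1",
             "--me", "dia", "--no-psnr", "--no-ssim"]


def sweep_commands(threads: int, mvranges: List[int],
                   src: str = "stream.yuv", res: str = "640x480") -> List[List[str]]:
    """One x264 argv per mvrange-thread value; each run gets its own stats file."""
    return [["x264", *BASE_OPTS, "--threads", str(threads),
             "--mvrange-thread", str(r), "--stats", f"sweep-{r}.stats",
             "-o", "/dev/null", src, res]
            for r in mvranges]


def time_command(cmd: List[str]) -> float:
    """Wall-clock seconds for one encode; raises if x264 fails."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

# Usage (runs x264, so it is left as a comment):
# for cmd in sweep_commands(12, [8, 16, 24]):
#     value = cmd[cmd.index("--mvrange-thread") + 1]
#     print(value, time_command(cmd))
```

Running each value several times and comparing averages, as done with the 10 runs above, helps separate real differences from run-to-run noise.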