Doom9's Forum > Video Encoding > MPEG-4 AVC / H.264

10th April 2007, 17:53   #1
morph166955
Registered User
Join Date: Mar 2006
Posts: 443
x264 multi-core (4+) threading optimization

EDIT: To make it easier for people to find, the thread_pool04 patch is available at http://www.benswebs.com/public/x264/....04c.r680.diff


I've made the disturbing realization that, as of right now, every method I have tested for sending video to x264 is faster on XP than it is on FreeBSD...which really should not be! I'm looking for some options (or what other people are doing) to try out on this system to get x264 to run at its peak, speed-wise. This machine is an octa-core Xeon which is pulling well over 100fps in XP when doing the MeGUI benchmark using avisynth/mt.dll, yet it barely hits 100, if at all, in BSD...even if I pipe it raw y4m files! I really want to get this machine running as fast as humanly possible (which for this thing should be absurdly fast) and I'd rather not run it in XP. x264 is not running anywhere near 100% CPU (in fact it's closer to 35% on all of the cores) in the first pass and only maybe 75% in the second pass, so I know it can definitely speed up. So please...any ideas on how to pipe the video into x264 to run this thing to its max would be greatly appreciated.

(note for mods: I'm not looking for "bests" here...I'm looking for options to test and opinions on what may or may not be the ideal settings on this setup)

Last edited by morph166955; 7th January 2009 at 03:40.

10th April 2007, 18:31   #2
Dark Shikari
x264 developer
Join Date: Sep 2005
Posts: 8,690
Have you tried using more threads, if it's not using all the CPU power?

10th April 2007, 19:00   #3
AGDenton
Registered User
Join Date: Jan 2007
Posts: 27
Raw Y4M == HDD bottleneck. It is imperative that you decode the input in another program, then pipe it to x264 (using | or a fifo). However, x264 tends to assume its standard input to be raw YUV, not Y4M, so you'll have to apply the patch that adds a --y4m-input option (or use a fifo with name **.y4m).

Have you tried ffmpeg -f yuv4mpegpipe | ...?

Anyhow, there's always the option of using Avisynth under wine with avs2yuv, which was done for that exact purpose. It will be slower than under Windows, but not by much.

AG

Last edited by AGDenton; 10th April 2007 at 19:03.

10th April 2007, 22:07   #4
morph166955
Registered User
Join Date: Mar 2006
Posts: 443
Forgot to mention...the source was loaded in a ramdisk. The RAM is 4 gig worth of 240-pin DDR2-667 FB-DIMMs...so, shall we say...no slowdown in accessing the source.

10th April 2007, 23:05   #5
akupenguin
x264 developer
Join Date: Sep 2004
Posts: 2,393
No problems here. x264 reading from a raw YUV file uses all the CPU I can give it. (Not a ramdisk, just relying on Linux's file cache to keep it in RAM.)

10th April 2007, 23:13   #6
morph166955
Registered User
Join Date: Mar 2006
Posts: 443
At this point I've found that FreeBSD (at least) has some issues with scheduling for more than 4 cores (it won't even consider letting x264, ffmpeg or mencoder have more than 9 threads, which I assume is 1 for input and 8 for x264).

I'm (sadly) going to end up running XP as the primary OS on this box for now, and possibly run some form of Linux in a VMware Workstation for testing, even though running under VMware will slow things down slightly (although with that many cores it's probably negligible). I'm moving things around on the drives now so that I can change my partition types w/o having to export my files off of the system onto externals or w/e. I already have Windows loaded on this box on another partition, so I'm going to use Acronis to redo the partitions and just make it one huge one on this disk. Once I get something working stably in VMware I'll probably migrate over to that OS, but I at least want something to tinker with until then, seeing as it may take a good while to do that.

I'll let ya all know how it goes in a few hours once I do some more tweaking.

11th April 2007, 00:57   #7
giandrea
Registered User
Join Date: Sep 2004
Location: Italy
Posts: 154
Quote:
Originally Posted by morph166955
At this point I've found that FreeBSD (at least) has some issues with scheduling for more than 4 cores (it won't even consider letting x264, ffmpeg or mencoder have more than 9 threads, which I assume is 1 for input and 8 for x264).

I'm (sadly) going to end up running XP as the primary OS on this box for now, and possibly run some form of Linux in a VMware Workstation for testing, even though running under VMware will slow things down slightly (although with that many cores it's probably negligible). I'm moving things around on the drives now so that I can change my partition types w/o having to export my files off of the system onto externals or w/e. I already have Windows loaded on this box on another partition, so I'm going to use Acronis to redo the partitions and just make it one huge one on this disk. Once I get something working stably in VMware I'll probably migrate over to that OS, but I at least want something to tinker with until then, seeing as it may take a good while to do that.

I'll let ya all know how it goes in a few hours once I do some more tweaking.
That's what I thought. Try running Linux, it's better at multithreading. Try Ubuntu Server and build x264, MPlayer, ffmpeg and whatever else with optimizations for your processor.

11th April 2007, 01:42   #8
morph166955
Registered User
Join Date: Mar 2006
Posts: 443
I'm going to run VMware on the XP install to test different distros for speed (assuming that VMware degrades each one's speed approximately equally) as well as stability and ease of use. I was going to try Debian, Fedora and Ubuntu first...then maybe a few of the others depending on how those do. I think it's going to take a lot of trial and error to find the optimal setup for me.

11th April 2007, 01:54   #9
akupenguin
x264 developer
Join Date: Sep 2004
Posts: 2,393
Quote:
test different distros for speed
What kind of speed could you test?

Responsiveness of their default desktop environment? (choice of window manager doesn't have much to do with distro)
Speed at which they run x264? (should depend only on the kernel's smp scheduler and the amount of crap running in the background, not anything distro-specific)
Boot time? (the amount of time I spend waiting for a reboot over the lifespan of a computer is probably less than the time it would take to install one alternate distro)

11th April 2007, 02:32   #10
morph166955
Registered User
Join Date: Mar 2006
Posts: 443
Yeah, I was going to shoot for encoding speed basically...most of it was going to be ease of installing/compiling software and such (and just how I liked it overall). Speed-wise, the only things I can think of are whether the OS has any added software running that can take up CPU time, and/or different libraries (such as the threads library) that may affect how it all runs. It's mostly going to come down to how I like the system overall, and I'm probably just going to end up playing with it over the course of a few weeks...I'll just run everything in XP until then; it should do for now.

11th April 2007, 05:26   #11
morph166955
Registered User
Join Date: Mar 2006
Posts: 443
And XP bites the dust...it won't let me access more than 2 gig of my RAM w/o going to XP64...Fedora Core 6, here I come!

11th April 2007, 19:17   #12
morph166955
Registered User
Join Date: Mar 2006
Posts: 443
OK, I loaded up Fedora Core 6, compiled x264, and piped the MeGUI test through mencoder using the x264 options in their job1-2.xml file for a first pass. I'm still only getting 35-40% CPU utilization on the first pass, putting me at ~177fps (yeah, I know I'm complaining about 177fps on x264...sounds almost too good to be true, right).

@akupenguin
What would be the best way for me to convert the test MPEG into something that a non-y4m-patched x264 can read the fastest? ffmpeg? mplayer/mencoder?

11th April 2007, 20:38   #13
akupenguin
x264 developer
Join Date: Sep 2004
Posts: 2,393
ffmpeg, mplayer, and mencoder will all produce identical yuv files.
shortest commandline is `ffmpeg -i test.mpg -f rawvideo -y test.yuv`
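
Since a raw YUV file carries no header, its size is a quick sanity check on whatever tool wrote it. A minimal Python sketch, assuming 8-bit 4:2:0 (I420) output, which is what an MPEG source would normally decode to; the function name is made up for illustration, and the 640x480 / 1488-frame figures are those of the test clip used elsewhere in this thread:

```python
def raw_yuv420_bytes(width, height, frames=1):
    """Size in bytes of headerless 8-bit 4:2:0 video (assumed layout)."""
    luma = width * height  # one Y sample per pixel
    # U and V planes, each subsampled by 2 horizontally and vertically
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return (luma + chroma) * frames


if __name__ == "__main__":
    print(raw_yuv420_bytes(640, 480))        # bytes per frame
    print(raw_yuv420_bytes(640, 480, 1488))  # bytes for the whole clip
```

If the file on disk is not a whole multiple of the per-frame size, the resolution or pixel format being passed to x264 is probably not what the decoder actually wrote.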

I might have reproduced your problem (though not to as great a magnitude, given only 2 cores). I have two Core 2 Duo systems here, and one gets 105% SMP efficiency (i.e. 100% CPU use and 5% less total CPU time than 1 thread needed) while the other gets 90% efficiency (i.e. 85% CPU use and 5% less total CPU time). This is true even with intra-only encoding, which should have no synchronization overhead at all. So I don't know why it happens, but at least I can test alternatives.
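
One way to read those efficiency figures (an interpretation, not something the post spells out): efficiency is the fraction of CPU kept busy divided by the total CPU time relative to the single-threaded run. A small Python sketch with a hypothetical helper name:

```python
def smp_efficiency(cpu_use, cpu_time_ratio):
    """Per-core speedup relative to a single-threaded run.

    cpu_use:        fraction of the available CPU kept busy (1.0 = all cores)
    cpu_time_ratio: total CPU time of the threaded run divided by the
                    CPU time of the single-threaded run
    """
    return cpu_use / cpu_time_ratio


if __name__ == "__main__":
    # Both systems needed 5% less total CPU time than one thread did.
    for use in (1.00, 0.85):
        print(f"{use:.0%} CPU use -> {smp_efficiency(use, 0.95):.1%} efficiency")
```

With these inputs the results land close to the 105% and 90% quoted above, which suggests this is the definition being used.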

Last edited by akupenguin; 11th April 2007 at 20:42.

11th April 2007, 20:52   #14
morph166955
Registered User
Join Date: Mar 2006
Posts: 443
Here are some CPU utilization numbers for you. The source was generated by running the MeGUI test .avs file through mencoder into a raw YUV file. The x264 runs were done using the MeGUI test options (see below). You must divide the CPU percentage by 8, I believe, to get the per-core average. Pass 1 is still only around 45% and pass 2 is around 80%. These final numbers were generated after the passes were all run (they were obviously run pass 1 then pass 2, not all the 1s then all the 2s). I did 10 runs with these settings.

Code:
PASS1:
encoded 1488 frames, 189.16 fps, 989.84 kb/s
27.75user 0.42system 0:07.87elapsed 357%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 182.12 fps, 989.84 kb/s
28.00user 0.41system 0:08.17elapsed 347%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 187.77 fps, 989.84 kb/s
27.84user 0.41system 0:07.93elapsed 356%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 188.89 fps, 989.84 kb/s
27.83user 0.46system 0:07.88elapsed 358%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 188.22 fps, 989.84 kb/s
27.87user 0.42system 0:07.91elapsed 357%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 187.57 fps, 989.84 kb/s
27.79user 0.40system 0:07.94elapsed 355%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 185.17 fps, 989.84 kb/s
27.85user 0.41system 0:08.04elapsed 351%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 187.65 fps, 989.84 kb/s
27.78user 0.46system 0:07.93elapsed 356%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 185.20 fps, 989.84 kb/s
27.72user 0.42system 0:08.04elapsed 350%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 188.29 fps, 989.84 kb/s
27.77user 0.44system 0:07.91elapsed 356%CPU (0avgtext+0avgdata 0maxresident)k

PASS2:
encoded 1488 frames, 116.38 fps, 1009.85 kb/s
84.39user 0.90system 0:12.87elapsed 662%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 116.82 fps, 1009.74 kb/s
84.26user 0.91system 0:12.82elapsed 664%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 116.19 fps, 1009.75 kb/s
84.25user 0.90system 0:12.88elapsed 660%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 116.74 fps, 1009.39 kb/s
84.41user 0.91system 0:12.83elapsed 665%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 116.34 fps, 1009.38 kb/s
84.42user 0.84system 0:12.87elapsed 662%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 116.08 fps, 1009.44 kb/s
84.27user 0.93system 0:12.90elapsed 660%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 116.15 fps, 1009.78 kb/s
84.28user 0.89system 0:12.89elapsed 660%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 115.93 fps, 1009.79 kb/s
84.22user 0.92system 0:12.91elapsed 659%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 115.99 fps, 1009.38 kb/s
84.24user 0.87system 0:12.91elapsed 659%CPU (0avgtext+0avgdata 0maxresident)k
encoded 1488 frames, 115.59 fps, 1009.48 kb/s
84.33user 0.87system 0:12.95elapsed 657%CPU (0avgtext+0avgdata 0maxresident)k
x264 options:
Code:
x264 --pass 1 --bitrate 1000 --stats test-NEW.stats --bframes 3 --b-pyramid --direct auto --subme 1 --analyse none --vbv-maxrate 25000 --me dia --merange 12 --threads auto --thread-input --progress --no-psnr --no-ssim -o /dev/null stream.yuv 640x480

x264 --pass 2 --bitrate 1000 --stats test-NEW.stats --ref 3 --bframes 3 --b-pyramid --weightb --direct auto --subme 6 --trellis 1 --analyse all  --8x8dct --vbv-maxrate 25000 --me umh --merange 12 --threads auto --thread-input --progress --no-psnr --no-ssim -o /dev/null stream.yuv 640x480
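
The divide-by-8 step can be done mechanically. A Python sketch, assuming the `time` output format shown in the listings above and 8 cores; the function name is only illustrative:

```python
import re


def avg_core_utilization(time_output, cores=8):
    """Average per-core utilization (percent) from the 'NNN%CPU' fields."""
    totals = [int(m) for m in re.findall(r"(\d+)%CPU", time_output)]
    if not totals:
        raise ValueError("no %CPU fields found")
    # Mean of the whole-machine percentages, spread across the cores.
    return sum(totals) / len(totals) / cores


if __name__ == "__main__":
    sample = (
        "27.75user 0.42system 0:07.87elapsed 357%CPU (0avgtext+0avgdata 0maxresident)k\n"
        "28.00user 0.41system 0:08.17elapsed 347%CPU (0avgtext+0avgdata 0maxresident)k\n"
    )
    print(f"{avg_core_utilization(sample):.1f}% per core")
```

Pasting a full PASS1 or PASS2 listing into `sample` gives the per-core average for that pass.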

Last edited by morph166955; 11th April 2007 at 20:55.

11th April 2007, 21:15   #15
AGDenton
Registered User
Join Date: Jan 2007
Posts: 27
What happens if you lower mvrange-thread manually and increase the number of threads?

11th April 2007, 21:27   #16
akupenguin
x264 developer
Join Date: Sep 2004
Posts: 2,393
The extreme case is --keyint 1. I-frames can all be encoded independently without any mvrange-thread.

11th April 2007, 22:02   #17
morph166955
Registered User
Join Date: Mar 2006
Posts: 443
@akupenguin
How is mvrange-thread calculated in x264 when it's set to auto (I assume that's the default?)

I'm a little confused about the whole mvrange-thread thing altogether, as far as what exactly it does, and there isn't a whole lot of stuff on here apart from the random post saying "yes use it, it's good". What's a good way for me to calculate it to optimize speed w/o it being so extreme that it affects my bitrate?

11th April 2007, 22:08   #18
AGDenton
Registered User
Join Date: Jan 2007
Posts: 27
Quote:
Originally Posted by akupenguin
The extreme case is --keyint 1. I-frames can all be encoded independently without any mvrange-thread.
That's the case here? I thought default keyint was 250 or something...

11th April 2007, 22:24   #19
akupenguin
x264 developer
Join Date: Sep 2004
Posts: 2,393
Quote:
Originally Posted by AGDenton
That's the case here? I thought default keyint was 250 or something...
Right. But if you're going to change options, keyint 1 is the obvious one to try.

Quote:
Originally Posted by morph166955
How is mvrange-thread calculated in x264 when it's set to auto
Divide the frame height by the number of threads, subtract the buffer needed for encoding and filtering. That's the amount of space available between threads.
Then take half, round up to a multiple of the macroblock height. That's the space allocated to mvrange-thread.
The other half is scheduling leeway, so that threads don't have to wait for each other after every row.

The derived value of mvrange-thread is printed at the beginning if you run `x264 -v ...`
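
The steps described above can be sketched roughly as follows. This is only an illustration in Python, not x264's actual code: the function name is invented, and `buffer_rows` is an assumed input because the post does not say how large the encoding/filtering buffer is (the 24 used below is an arbitrary placeholder).

```python
MB_HEIGHT = 16  # H.264 macroblock height in pixels


def mvrange_thread_estimate(frame_height, threads, buffer_rows):
    """Estimate mvrange-thread from the description in this post.

    buffer_rows is a hypothetical stand-in for the rows reserved for
    encoding and filtering; its real value is not given in the thread.
    """
    # Rows available between neighbouring threads.
    space = frame_height // threads - buffer_rows
    if space <= 0:
        raise ValueError("too many threads for this frame height")
    # Take half of the space, rounded up to a whole number of macroblock rows.
    mb_rows = -(-space // (2 * MB_HEIGHT))  # ceiling division
    return mb_rows * MB_HEIGHT


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, mvrange_thread_estimate(480, n, buffer_rows=24))
```

The real implementation may round and clamp differently, so the value printed by `x264 -v` is the one to trust; the sketch is mainly useful for seeing how quickly the available range shrinks as threads are added.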

Last edited by akupenguin; 11th April 2007 at 23:56.

11th April 2007, 22:50   #20
morph166955
Registered User
Join Date: Mar 2006
Posts: 443
Ah...cool. Thanks for the explanation! It's coming up now with a default of 24. I'm moving some files around right now on those drives, so once that's done (should be soon) I'll run a few passes with lower numbers for that (I'm going to try it at threads=12 first with different values to see what that does, then I'll play around with different numbers of threads).