x264 speed on Core i7


chipzoller
28th June 2009, 20:09
--crf 21 --ref 5 --mixed-refs --no-fast-pskip --bframes 16 --b-adapt 2 --b-pyramid --weightb --direct auto --subme 9 --trellis 2 --partitions all --8x8dct --me tesa --threads auto --thread-input --progress --no-psnr --no-ssim --output "output" "input"



Using 32-bit x264 with the MT avisynth DLL and a very simple script that only loads MPEG2Source, run through megui. All 8 threads are being used, but not constantly (~60% total load overall), and I'm getting around 10fps on an SD source. Since I'm new to multithreading in x264 and avisynth, is this consistent with what I should be getting?

Dark Shikari
28th June 2009, 20:11
Hyperthreading isn't magic; no program will ever fill 8 threads because 4 of the cores aren't real.

chipzoller
28th June 2009, 20:15
Yes, I understand this, I guess I should have asked if MPEG2Source was creating a bottleneck for x264. While I'm using the MT avisynth DLL, I didn't see a difference using SetMTMode vs. without. I'm not expecting 100% all the time since I've read your responses on other forum threads, just simply curious...

Manao
28th June 2009, 20:23
Unless the avisynth script is quite complicated, 10fps isn't indicative of a source bottleneck. --b-adapt 2 -b 16, however, will create a bottleneck.

chipzoller
28th June 2009, 20:42
Where, and how large of one?

LoRd_MuldeR
28th June 2009, 20:47
Use "--b-adapt 2 --bframes 4". A higher B-frame limit is useless, but can slow down encoding significantly ;)

(That's because frame-type decision is NOT multi-threaded in x264 yet)

chipzoller
28th June 2009, 20:53
Wow, that did make a big difference. Now getting around 16fps with ~95% total usage. Since I'm not as adept with the inner workings of x264 as you all, is there some way you could explain why this made this large of a difference? I knew consecutive b-frames after 4 or 5 dropped off considerably, but I didn't know that look-ahead caused such a speed penalty. Thank you, m'lord!

LoRd_MuldeR
28th June 2009, 20:59
I already explained why. B-adapt is really slow in mode 2, at least for high b-frame limits. And since B-adapt is part of the non-multithreaded lookahead, it can easily become the bottleneck...
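
To put a rough number on why a single-threaded stage hurts so much, here is a minimal sketch using Amdahl's law. It is an idealized model, not a measurement of x264: the serial fractions below are hypothetical values picked only to show the trend, and `amdahl_speedup` is an illustrative name.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Ideal speedup when `serial_fraction` of the work runs on one thread only."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


# Hypothetical serial shares (assumptions, not measured from x264).
for serial_fraction in (0.05, 0.20, 0.40):
    gain = amdahl_speedup(serial_fraction, threads=8)
    print(f"serial share {serial_fraction:.0%}: at most {gain:.2f}x on 8 threads")
```

The larger the share of time spent in the non-threaded lookahead, the lower the ceiling on total speedup, no matter how many threads the rest of the encoder can use.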

chipzoller
28th June 2009, 21:00
Sorry, I responded before I saw your edit. Thanks for your explanation, makes perfect sense.

Boolsheet
28th June 2009, 21:11
Dark Shikari wrote: "Hyperthreading isn't magic;"

But it does a pretty good job with applications that have enough work for all threads such as x264 with a reasonable number of B frames. ;)
I haven't done many tests but the performance gain must be over 10%, maybe even 20... Is the fps counter in x264 the right thing to measure the effect of hyperthreading?

LoRd_MuldeR
28th June 2009, 21:18
Yes. Throughput (frames per second) is the correct measure of performance. CPU load alone is not a measure of performance.

But you should measure the average FPS for a longer clip (in order to iron out fluctuations). And of course you must make sure that x264 isn't bottlenecked by slow input...
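
A minimal sketch of what "average FPS over a longer clip" means in practice: divide total frames by total wall-clock time, and do not average short per-segment readings. The segment numbers below are hypothetical, and `average_fps` is an illustrative helper, not part of x264.

```python
def average_fps(frames_encoded: int, elapsed_seconds: float) -> float:
    """Overall throughput: frames divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return frames_encoded / elapsed_seconds


# Hypothetical (frames, seconds) readings for three parts of one encode.
segments = [(1000, 40.0), (1000, 100.0), (1000, 60.0)]

total_frames = sum(frames for frames, _ in segments)
total_seconds = sum(seconds for _, seconds in segments)

# Correct: one figure for the whole clip.
overall = average_fps(total_frames, total_seconds)

# Misleading: the mean of per-segment fps overweights the fast segments.
naive = sum(average_fps(f, s) for f, s in segments) / len(segments)

print(f"overall: {overall:.2f} fps, mean of segments: {naive:.2f} fps")
```

The slow segment takes the most time, so it dominates the real throughput even though it counts only once in the naive mean.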

10L23r
29th June 2009, 00:52
what speeds do u get with me umh and 6 bframes? i would like to know b/c i'm building an i7 rig soon.

chipzoller
29th June 2009, 01:27
With my 920 conservatively overclocked to 3.3GHz and the above script, I get around 16-17fps. Best advice on building a Core i7 rig is to immediately recycle the stock cooler and get something nice like the Akasa Nero, which is what I have, or the Megahalems, and accept that they run hot compared to previous processors and that's by design.

10L23r
29th June 2009, 01:37
wait, 16-17 fps for me tesa or umh?
cus i thought umh is much much faster than tesa. but 10fps on maxed out settings (except for merange) is pretty awesome.

btw, i'm planning on getting the noctua nh-u12p se1366 :D

oh, and how much does overclocking improve performance?

chipzoller
29th June 2009, 01:39
16-17 on tesa, so umh would be a few fps faster. And these speeds are on an SD source, btw.

Shinigami-Sama
29th June 2009, 03:09
10L23r wrote: "wait, 16-17 fps for me tesa or umh?
cus i thought umh is much much faster than tesa. but 10fps on maxed out settings (except for merange) is pretty awesome.

btw, i'm planning on getting the noctua nh-u12p se1366 :D

oh, and how much does overclocking improve performance?"

my noctua cools my 920 @ 3.6ghz quite nicely, never broke 50C even when ambient temp was 25C

and the i7s overclock very very well :D

chipzoller
29th June 2009, 03:11
Did you test the upper end of that with 100% usage? Use Prime95 as a good gauge. As for performance increase, I don't know figures since I haven't done a benchmark yet.

aegisofrime
29th June 2009, 03:43
I'm considering upgrading my Q6600 to a Core i7, and I would also like to know the performance increase that I can get with it, specifically with TGMC. Does anyone have any ballpark figures?

Boolsheet
29th June 2009, 04:02
I've done one more test and thought I'd share the data.

The source is the Star Wars The Old Republic Deceived Trailer (swtor.com (http://www.swtor.com/media/trailers/deceived-cinematic-trailer) - there's a download button in the flash player). It's a 231-second-long CG 720p video with 13876 frames and a framerate of 59.94.
Decoded with TheRyuu's mt build of FFMS2 (http://forum.doom9.org/showthread.php?t=127037) and piped to x264 with avs2yuv. Encoded with JEEB's modified 64-bit build, revision 1173.
Options:
--crf 28
-i 49 -I 500 (I don't know why my mind couldn't see the 60 in 59.94)
--me umh --subme 9 --merange 32
--ref 6 --mixed-refs
--bframes 4 --b-adapt 2 --b-pyramid --weightb
--trellis 2 --psy-rd 1.0:0.4
--8x8dct --direct auto
--no-psnr --no-ssim
--threads auto (6 and 12 threads in this test)
-o NUL


Without HT:
x264 [info]: 1280x720 @ 59.94 fps
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 FastShuffle SSE4.2
x264 [info]: profile High, level 4.0
x264 [info]: slice I:97 Avg QP:20.69 size: 44627
x264 [info]: slice P:7319 Avg QP:23.27 size: 8681
x264 [info]: slice B:6460 Avg QP:26.42 size: 592
x264 [info]: consecutive B-frames: 10.9% 76.8% 9.4% 1.2% 1.7%
x264 [info]: mb I I16..4: 38.4% 51.9% 9.7%
x264 [info]: mb P I16..4: 6.6% 6.4% 0.5% P16..4: 32.4% 4.4% 4.1% 0.0% 0.0% skip:45.5%
x264 [info]: mb B I16..4: 0.2% 0.0% 0.0% B16..8: 16.0% 0.0% 0.0% direct: 0.1% skip:83.6% L0:22.2% L1:77.3% BI: 0.5%
x264 [info]: 8x8 transform intra:47.2% inter:76.9%
x264 [info]: direct mvs spatial:100.0% temporal:0.0%
x264 [info]: coded y,uvDC,uvAC intra:42.4% 37.9% 6.5% inter:6.0% 7.5% 0.3%
x264 [info]: ref P L0 77.0% 10.4% 6.1% 2.5% 2.1% 2.0%
x264 [info]: ref B L0 80.0% 9.6% 5.7% 2.6% 2.1%
x264 [info]: ref B L1 99.1% 0.9%
x264 [info]: kb/s:2477.2

encoded 13876 frames, 16.76 fps, 2477.73 kb/s

additional runs:
encoded 13876 frames, 16.76 fps, 2477.88 kb/s
encoded 13876 frames, 16.71 fps, 2477.85 kb/s



With HT:
x264 [info]: slice I:97 Avg QP:20.79 size: 43563
x264 [info]: slice P:7319 Avg QP:23.33 size: 8695
x264 [info]: slice B:6460 Avg QP:26.41 size: 591
x264 [info]: consecutive B-frames: 10.9% 76.8% 9.4% 1.2% 1.7%
x264 [info]: mb I I16..4: 38.3% 52.2% 9.5%
x264 [info]: mb P I16..4: 6.6% 6.5% 0.5% P16..4: 32.4% 4.4% 4.0% 0.0% 0.0% skip:45.5%
x264 [info]: mb B I16..4: 0.2% 0.0% 0.0% B16..8: 16.0% 0.0% 0.0% direct: 0.1% skip:83.6% L0:22.2% L1:77.3% BI: 0.5%
x264 [info]: 8x8 transform intra:47.5% inter:76.9%
x264 [info]: direct mvs spatial:99.9% temporal:0.1%
x264 [info]: coded y,uvDC,uvAC intra:42.5% 38.0% 6.5% inter:6.0% 7.5% 0.3%
x264 [info]: ref P L0 77.0% 10.4% 6.1% 2.5% 2.1% 2.0%
x264 [info]: ref B L0 80.0% 9.6% 5.7% 2.6% 2.1%
x264 [info]: ref B L1 99.1% 0.9%
x264 [info]: kb/s:2477.2

encoded 13876 frames, 19.51 fps, 2477.73 kb/s

additional runs:
encoded 13876 frames, 19.52 fps, 2477.68 kb/s
encoded 13876 frames, 19.48 fps, 2477.69 kb/s

Looks like it's about 16% faster (19.51 vs. 16.76 fps, or about 14% less encoding time). Of course, as usual, this always depends on the source. Static scenes should be faster without Hyper-Threading, since the CPU load drops there and only one thread seems to be fully working.
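
For anyone who wants to redo the comparison, here is a small sketch that pulls the fps figure out of the "encoded ..." summary lines and averages the three runs for each setting. The regular expression is an assumption based only on the lines pasted above, and the helper name is illustrative.

```python
import re
from statistics import mean
from typing import List

# Matches summary lines of the form pasted above (an assumption about the format).
SUMMARY_LINE = re.compile(r"encoded (\d+) frames, ([\d.]+) fps, ([\d.]+) kb/s")


def fps_values(log_text: str) -> List[float]:
    """Return the fps figure from every summary line found in log_text."""
    return [float(match.group(2)) for match in SUMMARY_LINE.finditer(log_text)]


without_ht = """
encoded 13876 frames, 16.76 fps, 2477.73 kb/s
encoded 13876 frames, 16.76 fps, 2477.88 kb/s
encoded 13876 frames, 16.71 fps, 2477.85 kb/s
"""

with_ht = """
encoded 13876 frames, 19.51 fps, 2477.73 kb/s
encoded 13876 frames, 19.52 fps, 2477.68 kb/s
encoded 13876 frames, 19.48 fps, 2477.69 kb/s
"""

fps_off = mean(fps_values(without_ht))
fps_on = mean(fps_values(with_ht))

throughput_gain = fps_on / fps_off - 1.0  # extra frames per second
time_saved = 1.0 - fps_off / fps_on       # shorter encode for the same clip

print(f"HT off: {fps_off:.2f} fps, HT on: {fps_on:.2f} fps")
print(f"throughput gain: {throughput_gain:.1%}, encoding time saved: {time_saved:.1%}")
```

Note that the two percentages are not the same thing: for these six runs the throughput gain is roughly 16%, while the encoding time drops by about 14%.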


10L23r wrote: "oh, and how much does overclocking improve performance?"

I've overclocked my i7 920 about 8% to BCLK 145 (2900 MHz or 3045 if turbo mode kicks in (it always does for me)), still got the intel cooler on it so this is as far as I want to go. ;)
With the default BCLK of 133 (2660 or 2793 MHz) x264 encodes this video at 17.86 fps. Is it safe to say it scales 1 to 1 if compared to 19.51 fps? I'm new to overclocking.
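
The clock figures above follow from BCLK times the CPU multiplier. A minimal sketch, assuming multipliers of 20 (normal) and 21 (turbo); those two values are inferred from the numbers in this post (2900 / 145 and 3045 / 145), not read from a datasheet.

```python
BASE_MULTIPLIER = 20   # assumption inferred from 2900 / 145 and 2660 / 133
TURBO_MULTIPLIER = 21  # assumption inferred from 3045 / 145 and 2793 / 133


def core_clock_mhz(bclk_mhz: int, multiplier: int) -> int:
    """Core clock in MHz for a given base clock (BCLK) and multiplier."""
    return bclk_mhz * multiplier


for bclk in (133, 145):
    base = core_clock_mhz(bclk, BASE_MULTIPLIER)
    turbo = core_clock_mhz(bclk, TURBO_MULTIPLIER)
    print(f"BCLK {bclk}: {base} MHz, {turbo} MHz with turbo")
```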

10L23r
29th June 2009, 05:26
aegisofrime wrote: "I'm considering upgrading my Q6600 to a Core i7, and I would also like to know the performance increase that I can get with it, specifically with TGMC. Does anyone have any ballpark figures?"

i had the same question, cus i was thinking that my i7/gtx260 setup would be overkill...

and what fps's do you guys get with single-thread?

aurorix
29th June 2009, 08:08
Boolsheet wrote: "I've overclocked my i7 920 about 8% to BCLK 145 (2900 MHz or 3045 if turbo mode kicks in (it always does for me)), still got the intel cooler on it so this is as far as I want to go. ;)
With the default BCLK of 133 (2660 or 2793 MHz) x264 encodes this video at 17.86 fps. Is it safe to say it scales 1 to 1 if compared to 19.51 fps? I'm new to overclocking."

Well just look at the numbers...

speedup = performance(new) / performance(old)

For BCLK: speedup = 145/133 = 1.0902... (roughly 9%)
For fps: speedup = 19.51/17.86 = 1.0923... (roughly 9%)

So yes, it scales 1:1 with BCLK in this case.
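
The same check as a short script, in case someone wants to plug in their own clocks and frame rates. The inputs are the figures quoted above; `speedup` and `deviation_from_linear` are illustrative names.

```python
def speedup(new: float, old: float) -> float:
    """Ratio of new to old performance (1.0 means no change)."""
    if old <= 0:
        raise ValueError("old must be positive")
    return new / old


clock_speedup = speedup(145, 133)    # BCLK, overclocked vs. default
fps_speedup = speedup(19.51, 17.86)  # x264 fps, overclocked vs. default

# How far the fps gain strays from perfect 1:1 scaling with the clock.
deviation_from_linear = fps_speedup / clock_speedup - 1.0

print(f"clock: {clock_speedup:.4f}x, fps: {fps_speedup:.4f}x")
print(f"deviation from linear scaling: {deviation_from_linear:+.2%}")
```

A deviation close to zero means the encode is tracking the clock. This is still a single data point at a small overclock.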

btw this looks awesome coming from a 2.0GHz core2 laptop. depending on the options I encode at about 3fps...

can't wait for my new i7 rig to arrive, it'll be night and day :D

10L23r
29th June 2009, 23:48
i don't think it's safe to say it scales 1:1 cus 9% is a very slight OC

3 fps OUCH. what settings do you use? tesa and subme9???

aurorix
30th June 2009, 04:03
10L23r wrote: "i don't think it's safe to say it scales 1:1 cus 9% is a very slight OC"

I agree, which is why I said "in this case." Though as a general rule i think it makes sense for performance to scale close to linearly with clock speed... i mean unless you hit an input bandwidth limit, the encoder itself is a good computational benchmark (tight code running in the CPU's cache).

This link http://www.anandtech.com/bench/default.aspx?b=28 has a list of benchmark scores and you can compare each processor family to see how they scale. Core i7 from 2.66GHz through 3.33GHz is more or less linear.

10L23r wrote: "3 fps OUCH. what settings do you use? tesa and subme9???"

hehe not tesa, but I do use umh + trellis2 + subme9 etc. which slow things down a bunch

10L23r
30th June 2009, 06:23
thx for the link. evidence that an i7 920 is definitely worth it. and then the i5's come and show me that i should have invested $300 in a gfx card or a new monitor... :P

so my guess is that an i7 can do SD encodes with umh+subme9... at 20-40 fps.

i have some questions on multi-threading.
what happens to the framerate when multi-threading is turned off (i.e. utilize only one core for encoding)??
i've heard that multi-threading impacts quality a little. why does this happen and by how much is the quality affected?

ash925
30th June 2009, 11:45
10L23r wrote: "i have some questions on multi-threading.
what happens to the framerate when multi-threading is turned off (i.e. utilize only one core for encoding)??
i've heard that multi-threading impacts quality a little. why does this happen and by how much is the quality affected?"

Encoding speed will drop if you set x264 to use only one core. The quality degradation from multi-threading is negligible AFAIK, as long as the number of threads is a sane value, so giving up the speed to avoid it is not worth it. There has been some discussion on it; search the forum to learn more.