x264 using 50% CPU only, what happened to multithreading?
St Devious
30th October 2008, 02:20
Trying to encode a Fraps-recorded video to MP4 with MeGUI/x264 998. It's only using 50% of my quad core. I thought it was supposed to use 100% on all 4 cores. What happened to that?
Am I doing something wrong?
EDIT: Here is the command line:
program --pass 2 --bitrate 4500 --stats ".stats" --ref 8 --mixed-refs --bframes 3 --b-adapt 2 --b-pyramid --weightb --direct auto --filter -1:-1 --subme 8 --trellis 2 --partitions p8x8,b8x8,i4x4,i8x8 --8x8dct --me esa --threads auto --thread-input --aq-strength 1.5 --cqm "jvt" --progress --no-psnr --no-ssim --output "output" "input"
ChronoCross
30th October 2008, 02:21
post your command line
LoRd_MuldeR
30th October 2008, 02:24
1. If you use "--threads n" instead of "--threads auto", then "n" should be the number of cores * 3/2 (e.g. "--threads 6" for a quadcore)
2. If you use "--b-adapt 2", then lower "--bframes" to a sane value, such as "5" or below
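The rule of thumb in point 1 is just a bit of arithmetic. Here is a minimal sketch of it; the function name `suggested_threads` is made up for illustration and is not part of x264 or MeGUI, and rounding down for odd core counts is an assumption, since the post only gives the quad-core case.

```python
import os


def suggested_threads(cores: int) -> int:
    """Rule of thumb from this thread: threads = cores * 3/2 (rounded down)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * 3 // 2


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs, which may include hyperthreads.
    cores = os.cpu_count() or 1
    print(f"--threads {suggested_threads(cores)}")
```

For a quad core this gives 6, matching the example above.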
St Devious
30th October 2008, 02:26
post your command line
program --pass 2 --bitrate 4500 --stats ".stats" --ref 8 --mixed-refs --bframes 3 --b-adapt 2 --b-pyramid --weightb --direct auto --filter -1:-1 --subme 8 --trellis 2 --partitions p8x8,b8x8,i4x4,i8x8 --8x8dct --me esa --threads auto --thread-input --aq-strength 1.5 --cqm "jvt" --progress --no-psnr --no-ssim --output "output" "input"
Thanks !
Comatose
30th October 2008, 02:26
1. If you use "--threads n" instead of "--threads auto", then "n" should be the number of cores * 2/3 (e.g. "--threads 6" for a quadcore)
3/2, not 2/3 :P
LoRd_MuldeR
30th October 2008, 02:29
Well, "--me esa" is a bit overkill. But I think it's multi-threaded, so this shouldn't be the problem. Also why do you use "--cqm jvt" ???
The rest looks quite normal. Are you sure that your source (decoder + filters) isn't the performance bottleneck here?
3/2, not 2/3 :P
Look at the timestamp of your post. Then look at "Last edited by LoRd_MuldeR; Today at 03:26." :p
Blue_MiSfit
30th October 2008, 02:40
I often bottleneck when decoding high bitrate sources. AviSynth can also easily bottleneck x264.
What's your source?
~MiSfit
Sagekilla
30th October 2008, 02:45
Can you put up a small sample? I want to test whether I can reproduce this, because I find it very odd that you're getting such slow speeds. Nothing in your command line seems wrong: your settings are so slow that --b-adapt 2 isn't the bottleneck, and the rest of the x264 functions are eating up most of the CPU time.
Very odd..
St Devious
30th October 2008, 02:52
Just noticed that only the first pass uses 50% CPU; the 2nd pass is up at full 100% usage.
Odd, shouldn't the first pass be at 100% too?
Sagekilla
30th October 2008, 03:23
Can you try with 1st pass using b-adapt 1? b-adapt 2 is slower than b-adapt 1 but it shouldn't be a lot slower @ 3 B's. If B-frame placement is the bottleneck, 50% on 1st pass is normal.
Dark Shikari
30th October 2008, 06:46
Just noticed that only the first pass uses 50% CPU; the 2nd pass is up at full 100% usage.
Odd, shouldn't the first pass be at 100% too?
If he's using "Turbo" or whatever it is in MeGUI that makes the first pass use faster settings, this wouldn't be at all surprising.
Snowknight26
30th October 2008, 12:28
Considering --b-adapt 2 and --bframes 3 are in his 2nd pass, it's likely that they are in the 1st pass as well, hence the low CPU usage.
St Devious
1st November 2008, 02:24
I am using "Turbo" mode for the first pass. That doesn't reduce the quality significantly, right?
Anyway, is there any way to use 100% CPU in the first pass?
Adub
1st November 2008, 05:44
No, the turbo option does not reduce quality significantly. In fact, it's very hard to tell the difference.
However, my guess is that your biggest problem is either the size of your video and the decoder you are using, or the fact that you are using b-adapt 2. Using b-adapt 2 is a good thing! The only problem is that it hasn't been properly threaded yet, so it is actually running slower right now. We just have to be patient and wait for the x264 team to finish optimizing it for us.
Speaking of which, Dark, can you comment on the threading issues? I believe that I read somewhere that it had to do with slice threading or something, but I am not one hundred percent sure. I am about to upgrade to a quad core myself, so having a fully threaded b-adapt 2 would be very nice.
Dark Shikari
1st November 2008, 05:45
Speaking of which, Dark, can you comment on the threading issues? I believe that I read somewhere that it had to do with slice threading or something, but I am not one hundred percent sure. I am about to upgrade to a quad core myself, so having a fully threaded b-adapt 2 would be very nice.
No, someone just has to write a patch to move slicetype decision into a lookahead pass. This has a number of subtleties to it, but gives the benefit that it would mean a VBV lookahead would become comparatively trivial once it was done.
Adub
1st November 2008, 05:52
Ah, slicetype was the word. Has there been any interest in the development team to do such a thing, or are you guys focusing elsewhere right now?
Comatose
1st November 2008, 14:02
Look at the timestamp of your post. Then look at "Last edited by LoRd_MuldeR; Today at 03:26." :p
Oh noes, the time space continuum!
Lele-brz
2nd November 2008, 10:25
I have a similar "issue". In my case it's because I'm using a resizing (BicubicResize) in my avs script.
something like:
DirectShowSource("d:\bentest\DEMAPartTwo.mp4", fps=30.000)
BicubicResize(512, 288, 0, 0.5)
ConvertToYV12()
The source here is 1280x720 (H.264 at 5 Mbps).
With BicubicResize, CPU usage is around 50%; if I remove the resize, 95% of the CPU is used.
My system is an Intel Xeon 5130 Dual Core.
Not sure if there is something I can do to speed up the process in my case.
roozhou
2nd November 2008, 12:01
Avoid using the AviSynth built-in resizer since it's slow and not multi-threaded. Use the swscaler from ffmpeg, available in both ffdshow and ffmpegsource. It's multi-threaded and about 1~2x faster than the AviSynth built-in resizer.
Lele-brz
2nd November 2008, 13:35
Thanks, now I'm trying with the following AVS input:
FFVideoSource("d:\bentest\DEMAPartTwo.mp4")
SWScale(512, 288)
ConvertToYV12()
But apparently it's still using little CPU; even in this case, removing the scaling improves the CPU usage.
I don't want to go off-topic, but how can I try doing the same with ffdshow, as you suggested?
Thanks again
p.s.: I even tried with mencoder as an input, piping it to x264; even in that case resizing was the bottleneck (but I need more tests there...)
roozhou
2nd November 2008, 14:59
Why do you use ConvertToYV12? It is not needed and will only bring extra overhead.
I don't believe resizing from 1280x720 to 512x288 is the bottleneck. IMO it takes at most 50% as much CPU time as decoding.
To resize in ffdshow, find the "resize & aspect" tab in "ffdshow video decoder configuration".
LoRd_MuldeR
2nd November 2008, 16:02
Why do you use ConvertToYV12? It is not needed and will only bring extra overhead.
If your source already is YV12, then ConvertToYV12() is a NOP. It may be superfluous, but shouldn't cause any problems...
Lele-brz
2nd November 2008, 16:18
Yes, in fact I still see that an encode without any resizing can use all of the CPU's power, especially in the second pass.
Maybe it depends on the machine I'm using.
nm
2nd November 2008, 18:18
Encoding 512x288 is many times faster than encoding 720p, so when resizing to a low resolution, decoding the 720p source may become a bottleneck. Are you using a multithreaded H.264 decoder?
Lele-brz
2nd November 2008, 20:53
What I'm checking is the CPU usage during the encoding. If I don't do any resizing, I get 100% CPU on a Dual Core machine.
When doing the resizing, the CPU usage decreases dramatically, so I guess it's not on the decoding side.
I'll test again with different kind of sources.
kemuri-_9
2nd November 2008, 21:15
No, it definitely seems that the decoder could be the cause of the low CPU usage:
with a smaller video, the encoder can run faster than at the original size.
With the full-size video, x264 takes all the CPU it can to encode the large frames, while AviSynth keeps up with or gets ahead of x264, hence 100% or near-100% utilization.
With the small-size video, x264 blasts through the small frames and runs ahead of AviSynth.
So now AviSynth's decoding of the original full-size material is what limits utilization, because it can't decode fast enough to keep up with x264's encoding.
That's the situation the others were mentioning.
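A rough way to picture this: the pipeline runs at the speed of its slower stage, and the encoder only keeps the CPU busy for the fraction of time it isn't waiting for frames. The sketch below is a toy model, not a measurement. `decode_fps` and `encode_fps` are hypothetical standalone speeds for each stage, and the numbers are invented purely to show the two cases.

```python
def pipeline_fps(decode_fps: float, encode_fps: float) -> float:
    """A two-stage pipeline runs no faster than its slower stage."""
    return min(decode_fps, encode_fps)


def encoder_busy_fraction(decode_fps: float, encode_fps: float) -> float:
    """Fraction of time the encoder has a frame to work on (1.0 = never waiting)."""
    return pipeline_fps(decode_fps, encode_fps) / encode_fps


if __name__ == "__main__":
    # Hypothetical speeds, chosen only to illustrate the two situations.
    cases = {
        "full size (encoder is slower)": (60.0, 10.0),
        "resized (decoder is slower)": (60.0, 120.0),
    }
    for name, (dec, enc) in cases.items():
        print(f"{name}: {pipeline_fps(dec, enc):.0f} fps, "
              f"encoder busy {encoder_busy_fraction(dec, enc):.0%}")
```

In the second case the encoder spends half its time waiting for frames, which on a machine where x264 would otherwise saturate every core looks like roughly half the CPU sitting idle.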
Lele-brz
3rd November 2008, 08:19
Yes, I think this is the case. Thanks for clarifying.
So, I guess there's no way with free decoders to avoid this bottleneck?
pcordes
3rd November 2008, 09:25
p.s.: I even tried with mencoder as an input, piping it to x264; even in that case resizing was the bottleneck (but I need more tests there...)
If you're piping raw yuv4mpeg, like mplayer -vo yuv4mpeg:file=pipe.y4m, you need a bigger buffer between mplayer and x264 than standard Unix pipe buffers (64kB by default on recent Linux kernels, far less than one raw frame). e.g. bfr -b11M. See http://forum.doom9.org/showthread.php?p=1206916#post1206916
This was necessary for keeping both cores of my C2D busy, esp. on faster encodes.
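To get a feel for the sizes involved, here is a small sketch of the arithmetic. It assumes 8-bit 4:2:0 video (YV12/I420) with even dimensions, assumes "11M" means 11 MiB, and ignores the few header bytes yuv4mpeg puts before each frame. The function names are made up for illustration.

```python
def yuv420_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit 4:2:0 frame: a full-size luma plane plus two quarter-size chroma planes."""
    return width * height * 3 // 2


def frames_in_buffer(buffer_bytes: int, width: int, height: int) -> int:
    """How many whole raw frames fit in a buffer of the given size."""
    return buffer_bytes // yuv420_frame_bytes(width, height)


if __name__ == "__main__":
    buffer_bytes = 11 * 1024 * 1024  # assumption: "11M" means 11 MiB
    for w, h in [(1280, 720), (512, 288)]:
        print(f"{w}x{h}: {yuv420_frame_bytes(w, h)} bytes per frame, "
              f"{frames_in_buffer(buffer_bytes, w, h)} frames fit in the buffer")
```

A single raw 720p frame is already well over a megabyte, so a pipe buffer measured in kilobytes cannot hold even one frame.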
Lele-brz
3rd November 2008, 10:32
Thanks, pcordes. In my case I'm using Windows.
I'll see if there's some option to set in mencoder to increase the buffer size (or is it probably something in the OS?)
roozhou
3rd November 2008, 12:14
The buffer size has nothing to do with mencoder. I think you are piping in cmd.exe, right? So it is cmd.exe that calls CreatePipe to redirect mencoder's stdout to x264's stdin. You need to write your own program to implement piping. In my personal experience, a buffer of 2x~3x the frame size gives a good result. And IMO large buffers may increase cache misses, so larger is not always better.
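As a rough idea of what such a program could look like, here is a minimal sketch of a relay that copies one stream to another through a bounded in-memory buffer. It is an illustration under assumptions, not a tested replacement for bfr or for a native CreatePipe-based tool: the name `relay` and its parameters are made up, and it is written against generic file-like objects so it can be tried without any encoder installed.

```python
import io
import queue
import threading


def relay(src, dst, chunk_size: int, max_chunks: int = 3) -> int:
    """Copy src to dst through a buffer holding at most max_chunks chunks.

    Returns the number of bytes copied.
    """
    buf = queue.Queue(maxsize=max_chunks)

    def reader():
        while True:
            chunk = src.read(chunk_size)
            buf.put(chunk)      # blocks while the buffer is full
            if not chunk:       # empty bytes object means end of stream
                return

    t = threading.Thread(target=reader, daemon=True)
    t.start()

    total = 0
    while True:
        chunk = buf.get()       # blocks while the buffer is empty
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    t.join()
    return total


if __name__ == "__main__":
    # Self-contained demo with in-memory streams instead of real pipes.
    frame = 512 * 288 * 3 // 2          # one 4:2:0 frame at 512x288
    data = bytes(frame * 5)             # five blank frames
    out = io.BytesIO()
    copied = relay(io.BytesIO(data), out, chunk_size=frame, max_chunks=3)
    print(copied == len(data) and out.getvalue() == data)
```

Placed between the two programs, it would read from `sys.stdin.buffer` and write to `sys.stdout.buffer`, with `chunk_size` set to one frame and `max_chunks` to 2 or 3 to match the sizing suggested above. The reader thread lets the producer keep filling the buffer while the consumer is busy, which is the whole point of the extra buffering.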