View Full Version : x264 number of threads issue


Racer
17th December 2012, 21:07
Hello,

I have done several tests with my AMD X6 1045T and the x264 HD Benchmark Ver. 5.0.1.

I get much higher encoding speed if I set the number of threads for x264 manually. This could be improved by the x264 developers.
It seems to be a general issue with x264 and AMD CPUs. With "--threads auto" I never get 100% CPU load.

Results for x264.exe r2200
x264 Benchmark: 64-bit
==========================

Pass 1
------
encoded 11812 frames, 52.56 fps, 7758.17 kb/s --> 24 threads
encoded 11812 frames, 50.89 fps, 7757.62 kb/s --> 18 threads
encoded 11812 frames, 41.29 fps, 7754.40 kb/s --> 12 threads
encoded 11812 frames, 27.22 fps, 7754.03 kb/s --> auto

Pass 2
------
encoded 11812 frames, 9.88 fps, 7999.65 kb/s --> 24 threads
encoded 11812 frames, 9.91 fps, 8000.46 kb/s --> 18 threads
encoded 11812 frames, 9.66 fps, 8002.22 kb/s --> 12 threads
encoded 11812 frames, 9.23 fps, 8001.96 kb/s --> auto

Atak_Snajpera
17th December 2012, 21:46
What is your CPU utilization in the x264 FHD benchmark?

Dark Shikari
17th December 2012, 21:55
It's because it's allocating more lookahead threads on the first pass; a better way to change this is just to set --lookahead-threads accordingly. I need to find some good way to automatically come up with a reasonable number of lookahead threads on the first pass; currently the formula is optimized more for CRF.

Racer
17th December 2012, 22:11
It's not just the 1st pass. I could also see the same issue if I set the preset to "superfast" or "ultrafast" for the 2nd pass.

Asmodian
18th December 2012, 02:35
Aren't superfast or ultrafast more or less equivalent to 1st pass?

I assume the issue is that lookahead is a greater percentage of the total cpu time needed than usual (usual being crf, preset medium+) so more lookahead threads are needed to keep the rest of the threads busy.

If I set --lookahead-threads higher than what auto would pick, does the total number of threads go up, or does it reallocate threads usually used for something else to lookahead? I.e., if auto picks 12 threads and I set lookahead to 4, do I get 14 total threads or still 12?

Dark Shikari
18th December 2012, 07:51
If auto picks 12 threads and you set lookahead to 4, you'll get 16 threads.

Asmodian
18th December 2012, 19:19
Ah! unexpectedly easy. Thanks :)

Racer
21st December 2012, 16:40
I have done further tests. 4 lookahead threads seem to be the best for my AMD X6 1045T.
If I also increase the number of threads, I see slightly better performance on pass 1, too.


Results for x264.exe r2200
x264 Benchmark: 64-bit
==========================

Pass 1
------
encoded 11812 frames, 27.17 fps, 7754.05 kb/s --threads auto
encoded 11812 frames, 27.08 fps, 7753.94 kb/s --threads auto --lookahead-threads 1
encoded 11812 frames, 41.53 fps, 7753.80 kb/s --threads auto --lookahead-threads 2
encoded 11812 frames, 50.65 fps, 7753.92 kb/s --threads auto --lookahead-threads 3
encoded 11812 frames, 53.27 fps, 7753.38 kb/s --threads auto --lookahead-threads 4
encoded 11812 frames, 53.25 fps, 7753.93 kb/s --threads auto --lookahead-threads 5
encoded 11812 frames, 53.75 fps, 7754.19 kb/s --threads 12 --lookahead-threads 4
encoded 11812 frames, 53.73 fps, 7757.38 kb/s --threads 18 --lookahead-threads 4
encoded 11812 frames, 54.10 fps, 7758.01 kb/s --threads 24 --lookahead-threads 4
encoded 11812 frames, 53.96 fps, 7759.41 kb/s --threads 30 --lookahead-threads 4


Pass 2
------
encoded 11812 frames, 9.31 fps, 8002.04 kb/s --threads auto
encoded 11812 frames, 9.32 fps, 8002.02 kb/s --threads auto --lookahead-threads 1
encoded 11812 frames, 9.34 fps, 8002.71 kb/s --threads auto --lookahead-threads 2
encoded 11812 frames, 9.35 fps, 8002.07 kb/s --threads auto --lookahead-threads 3
encoded 11812 frames, 9.24 fps, 8002.65 kb/s --threads auto --lookahead-threads 4
encoded 11812 frames, 9.22 fps, 8001.45 kb/s --threads auto --lookahead-threads 5
encoded 11812 frames, 9.66 fps, 8002.34 kb/s --threads 12 --lookahead-threads 4
encoded 11812 frames, 9.89 fps, 8001.11 kb/s --threads 18 --lookahead-threads 4
encoded 11812 frames, 10.03 fps, 7999.71 kb/s --threads 24 --lookahead-threads 4
encoded 11812 frames, 10.05 fps, 7998.62 kb/s --threads 30 --lookahead-threads 4

Asmodian
21st December 2012, 23:34
Doesn't the quality start to drop when using a really large number of threads? I remember Dark Shikari mentioned this; I think he said it wasn't really a worry until going over 18. I am also sure it isn't a hard line at 18; the effect just starts to be measurable around there.

I notice the bit-rate of the second pass for "--threads 24" or higher starts to drop a little bit. This seems odd to me.

ajp_anton
23rd December 2012, 16:30
You think a 0.05% difference between maximum and minimum bitrates is reason to worry, or even to think it's odd?
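
For anyone who wants to check that figure: a minimal C sketch that computes the relative spread of the second-pass bitrates from Racer's 21st December table. The function and array names are made up for illustration; only the bitrate values come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Second-pass bitrates (kb/s) copied from Racer's 21st December results. */
const double pass2_kbps[] = {
    8002.04, 8002.02, 8002.71, 8002.07, 8002.65,
    8001.45, 8002.34, 8001.11, 7999.71, 7998.62
};
const size_t pass2_count = sizeof pass2_kbps / sizeof pass2_kbps[0];

/* Relative spread (max - min) / min of n bitrates, in percent.
 * Hypothetical helper, not part of x264. */
double bitrate_spread_percent(const double *kbps, size_t n)
{
    assert(n > 0);
    double lo = kbps[0], hi = kbps[0];
    for (size_t i = 1; i < n; i++) {
        if (kbps[i] < lo) lo = kbps[i];
        if (kbps[i] > hi) hi = kbps[i];
    }
    return (hi - lo) / lo * 100.0;
}
```

Calling bitrate_spread_percent(pass2_kbps, pass2_count) gives roughly 0.05, i.e. the gap between 8002.71 and 7998.62 kb/s is about 4 kb/s out of 8000.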

Bleck
23rd December 2012, 21:43
I have an 8-thread i7. Do I get all threads with --threads 0, or do I need to set another option like Racer did?

J_Darnley
24th December 2012, 10:43
Huh? If that is a CPU with 8 virtual cores (4 + HT) then x264 will use 12 encoding threads and 2 input threads. Whether this will maximise your encoding speed, we don't know. You haven't told us anything yet. Based on Racer's tests it may be advantageous to use more input threads.
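
A rough C sketch of the arithmetic in this thread. The 1.5x factor is an assumption inferred from the numbers quoted here (8 logical cores giving 12 threads, and 24 logical cores giving 36 in a later post); it is not copied from the x264 source. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed heuristic: --threads auto picks 1.5 encoding threads per
 * logical core. Inferred from figures in this thread, not from x264. */
int auto_encoding_threads(int logical_cpus)
{
    assert(logical_cpus > 0);
    return logical_cpus * 3 / 2;
}

/* Per Dark Shikari's answer, lookahead threads are added on top of the
 * encoding threads rather than taken out of them. */
int total_threads(int encoding_threads, int lookahead_threads)
{
    assert(encoding_threads > 0 && lookahead_threads >= 0);
    return encoding_threads + lookahead_threads;
}
```

Under that assumption, auto_encoding_threads(8) is 12, and total_threads(12, 4) is 16, which matches the "12 threads plus 4 lookahead gives 16" answer earlier in the thread.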

sneaker_ger
12th January 2013, 15:55
https://github.com/DarkShikari/x264-devel/commit/8e145ffcbd53717261f969b2a78eaa36d224075b

Blue_MiSfit
12th January 2013, 22:19
Don't use a lot of threads with relatively small VBV buffers.

I ran across a really nasty problem when using 36 threads (result of --threads auto on a 24 logical core system) and a 1 second buffered CBR encode. Quality would occasionally hit the floor for a few seconds at a time. Reducing threads to 12 fixed the issue.

I don't know if this was ever fixed, but the devs didn't seem interested in fixing it at the time.

Dark Shikari
13th January 2013, 01:03
It's not that people aren't interested in fixing it, it's that nobody knows how or if it's even actually possible to do so. I'd absolutely love to fix it... if anyone had any ideas regarding how.

Blue_MiSfit
13th January 2013, 06:57
I stand corrected!

Dark Shikari
13th January 2013, 10:50
More precisely, trying to get VBV to work well when the thread buffer is larger than the VBV is extremely difficult. It's probably not impossible, but improving things would require a seriously better understanding of the weaknesses of the current approach and why it fails when it does. I've tried, but I'm really not very good at this.

Blue_MiSfit
14th January 2013, 04:49
Is this any easier with sliced threads? I doubt that's helpful but it's the only thing that came to mind...

Dark Shikari
14th January 2013, 04:52
Sliced threads won't get you nearly the throughput of frame threads, so while it'll be easier on VBV, you'll probably get better results with frame threading. That is, frame threading with 4-8 threads will probably get better performance than any number of slice threads.

Blue_MiSfit
14th January 2013, 05:29
Makes sense. How could one go about calculating a "safe" amount of buffer required for VBV to work properly with a given number of threads?

Put another way, 12 threads is fine with a 1 second buffer, but 36 is not, so is there any way to know how many seconds would be required for VBV to work as expected with 36 threads?

Dark Shikari
14th January 2013, 06:24
No idea; I haven't done enough testing.

Racer
20th September 2014, 09:29
How many lookahead threads are actually used if I set --threads=12,16,18? Is there some kind of formula? Thanks!

fionag
20th September 2014, 13:16
It's a little bit complicated:

if( h->param.i_lookahead_threads == X264_THREADS_AUTO )
{
    if( h->param.b_sliced_threads )
        h->param.i_lookahead_threads = h->param.i_threads;
    else
    {
        /* If we're using much slower lookahead settings than encoding settings, it helps a lot to use
         * more lookahead threads. This typically happens in the first pass of a two-pass encode, so
         * try to guess at this sort of case.
         *
         * Tuned by a little bit of real encoding with the various presets. */
        int badapt = h->param.i_bframe_adaptive == X264_B_ADAPT_TRELLIS;
        int subme = X264_MIN( h->param.analyse.i_subpel_refine / 3, 3 ) + (h->param.analyse.i_subpel_refine > 1);
        int bframes = X264_MIN( (h->param.i_bframe - 1) / 3, 3 );

        /* [b-adapt 0/1 vs 2][quantized subme][quantized bframes] */
        static const uint8_t lookahead_thread_div[2][5][4] =
        {{{6,6,6,6}, {3,3,3,3}, {4,4,4,4}, {6,6,6,6}, {12,12,12,12}},
         {{3,2,1,1}, {2,1,1,1}, {4,3,2,1}, {6,4,3,2}, {12, 9, 6, 4}}};

        h->param.i_lookahead_threads = h->param.i_threads / lookahead_thread_div[badapt][subme][bframes];
        /* Since too many lookahead threads significantly degrades lookahead accuracy, limit auto
         * lookahead threads to about 8 macroblock rows high each at worst. This number is chosen
         * pretty much arbitrarily. */
        h->param.i_lookahead_threads = X264_MIN( h->param.i_lookahead_threads, h->param.i_height / 128 );
    }
}

Basically, in sliced-threads mode, lookahead threads == threads; otherwise it's a rough formula that tries to guess how slow the main encode will be relative to the lookahead, based on a few settings.
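
To try the formula with concrete numbers, here is a standalone C restatement of the excerpt above that works without the x264 headers. The function name, the plain-int parameters and the MIN_INT macro are made up for this sketch. The final clamp to at least 1 is an assumption: the quoted excerpt does not show what happens when the division yields 0.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_INT(a, b) ((a) < (b) ? (a) : (b))

/* [b-adapt 0/1 vs 2][quantized subme][quantized bframes],
 * copied from the excerpt above. */
static const uint8_t lookahead_thread_div[2][5][4] =
    {{{6,6,6,6}, {3,3,3,3}, {4,4,4,4}, {6,6,6,6}, {12,12,12,12}},
     {{3,2,1,1}, {2,1,1,1}, {4,3,2,1}, {6,4,3,2}, {12,9,6,4}}};

/* Hypothetical helper mirroring the quoted logic.
 * b_adapt is the --b-adapt value (0, 1 or 2); 2 is the trellis mode. */
int auto_lookahead_threads(int threads, int sliced_threads, int b_adapt,
                           int subme, int bframes, int height)
{
    assert(threads > 0 && subme >= 0 && bframes >= 0 && height > 0);

    /* Sliced threads: one lookahead thread per encoding thread. */
    if (sliced_threads)
        return threads;

    int badapt    = (b_adapt == 2);
    int q_subme   = MIN_INT(subme / 3, 3) + (subme > 1);
    int q_bframes = MIN_INT((bframes - 1) / 3, 3);

    int n = threads / lookahead_thread_div[badapt][q_subme][q_bframes];

    /* Cap so each lookahead thread covers at least 128 pixel rows. */
    n = MIN_INT(n, height / 128);

    /* Assumption: never report fewer than one lookahead thread. */
    return n < 1 ? 1 : n;
}
```

As an example, assuming subme 7, 3 B-frames, b-adapt 1 and a 1080-line source, --threads 12, 16 and 18 come out as 2, 2 and 3 lookahead threads. With b-adapt 2 and subme 2 the divisor drops to 2, so 18 threads would give 9, which the height cap then limits to 8.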