Old 1st October 2013, 18:22   #1  |  Link
Gregleto
Registered User
 
Join Date: Sep 2013
Posts: 38
Too many threads when set to auto

Hi all,

I'm just trying to clear up a little confusion about setting threads in x264. From all the advice I've read and received here and elsewhere, using a large number of threads is ill-advised, and even has a negative impact on quality. Everyone seems to have a different opinion about how many threads is too many, but most advice is to keep them under 20. However, when threads is set to auto (the recommended default), x264 seems to set the number of threads using its default formula (1.5 * logical processors). On many modern multiprocessor/multi-core servers, this will almost always result in a number of threads that most x264 users consider, anecdotally, to be "too many". For example...

We are running our x264 encodes on a Dell R710 server with dual 6-core processors for a total of 12 cores (24 processor threads using hyperthreading). When we set threads to "auto" in x264, the number of threads reported by MediaInfo is 36. That makes sense, so long as you consider a processor thread to be equivalent to a "logical processor" (more on that later).

So while x264 seems to be working according to its formula, my question is why does the default "auto" setting permit the threads to rise above a threshold that most everyone seems to be saying will have a negative impact on quality? Or is it the case that the quality impact only comes into play when you set the number of threads higher than the actual number of logical processors on your machine (times 1.5)?

In other words, do people see a reduction in quality with a large number of threads, even if their machine has enough processor threads to support the thread number calculated according to x264's formula?

Thanks,

Gregleto
Old 1st October 2013, 19:36   #2  |  Link
detmek
Registered User
 
Join Date: Aug 2009
Posts: 463
I just did a small test (one short clip) and compared SSIM values for --threads 3 and --threads 36. With the increased number of threads, quality dropped by 0.36%. Not much, if you ask me.
Old 1st October 2013, 21:16   #3  |  Link
Asmodian
Registered User
 
Join Date: Feb 2002
Location: San Jose, California
Posts: 4,406
As I understand it (imperfectly at best), the reduction in quality is due to the individual threads not sharing information with each other; when the thread count gets too high, each thread works on such a small section that this lack of information starts to become significant. Exactly what "information" they are not sharing I don't know, but I think it is some aspect of the results from each thread.

I think the auto threads setting was designed a while ago, with more typical systems in mind. With a multiprocessor, many-core system I suggest you set threads manually if you want quality over speed.

@detmek
What resolution was that clip? I believe that the higher the resolution, the less impact the thread count has on quality.
Old 1st October 2013, 21:20   #4  |  Link
detmek
Registered User
 
Join Date: Aug 2009
Posts: 463
Scene from I Am Legend.
720x304, around 1000 frames. I used 2-pass at 700 kbps, preset medium, tune ssim.
Old 1st October 2013, 21:52   #5  |  Link
akupenguin
x264 developer
 
Join Date: Sep 2004
Posts: 2,392
There's two ways that lots of threads can lose quality:
  • When using VBV, we have to plan the size of one frame using only predictions of, not exact knowledge of, the size of previous frames that are still in progress. Threads do tell each other their exact size-so-far at the end of each row of macroblocks, not just when finishing a whole frame; but that can still be a significant amount of stuff in progress, adding up to a significant amount of uncertainty. This is bad because if we overcommitted bits to the beginning of a frame and then have to revise quality downwards for the bottom of the frame, (or conversely if we undercommitted for the top and only later learned that we have more bits available for the bottom), the result looks worse than if we had encoded the whole frame at an intermediate quality that gives the same total size. If you're not using VBV, this doesn't happen; none of the other ratecontrol modes care about such small deviations from the predicted size.
  • Limits to the vertical component of motion vectors, since a block can only be predicted from some part of a previous frame that has already been encoded. Which is bad if the actual motion is faster than that. However, x264 imposes a lower bound on this (default 24 pixels, can be overridden by --mvrange-thread). 24 implies that the spacing between one thread and the next that are working on mutually dependent frames, is 40 pixels (the other 16 is from the height of a macroblock). Which means that a 1080p video won't use more than 27 threads for a sequence of P-frames, though it can use more threads if there are B-frames involved, because those don't add to the dependency chain. And a 304p video won't use more than 8 threads for a sequence of P-frames. (Though perhaps the default mvrange-thread should depend on resolution, since typical speed of motion as measured in pixels depends on resolution.)
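
The P-frame limits quoted above can be reproduced with a short sketch. This is only an illustration of the arithmetic in this post, not x264's actual code: the function name `max_pframe_threads` is hypothetical, and rounding up is an assumption chosen because it matches both quoted figures (27 for 1080 rows, 8 for 304 rows).

```python
import math

MACROBLOCK_HEIGHT = 16  # pixels; the "other 16" mentioned in the post


def max_pframe_threads(height: int, mvrange_thread: int = 24) -> int:
    """Upper bound on threads usable for a chain of mutually dependent P-frames.

    Each thread has to trail the thread it depends on by
    mvrange_thread + MACROBLOCK_HEIGHT pixels (40 with the default of 24).
    """
    spacing = mvrange_thread + MACROBLOCK_HEIGHT
    # Assumption: round up, which reproduces the figures quoted in the post.
    return math.ceil(height / spacing)


for height in (304, 1080):
    print(height, max_pframe_threads(height))
```

Raising the per-thread motion range widens the spacing and so lowers the bound; for example, a hypothetical value of 56 would give 1080 / 72 = 15 threads.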
Old 1st October 2013, 22:07   #6  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Something like this should be sticky. Thanks for the excellent explanation.
Old 2nd October 2013, 04:07   #7  |  Link
Gregleto
Registered User
 
Join Date: Sep 2013
Posts: 38
So based on the feedback here, below is the formula for properly setting threads that seems to make sense to me.

Threads should be equal to the lesser result of the following two calculations…
1. The number of logical processors multiplied by 1.5
2. The number of vertical pixels of video divided by 40

Examples
A. Video encoded to 640x360 on a machine with a total of 8 cores and hyperthreading enabled.
(8 cores with hyperthreading equals 16 logical processors)
1. 16*1.5 = 24
2. 360/40 = 9
Set threads to 9 (the lower result)

B. Video encoded to 1280x720 on a machine with a total of 4 cores and hyperthreading enabled.
(4 cores with hyperthreading equals 8 logical processors)
1. 8*1.5 = 12
2. 720/40 = 18
Set threads to 12 (the lower result)

C. Video encoded to 1920x1080 on a machine with a total of 12 cores and hyperthreading enabled.
(12 cores with hyperthreading equals 24 logical processors)
1. 24*1.5 = 36
2. 1080/40 = 27
Set threads to 27 (the lower result)
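
The rule of thumb above can be written as a small helper. This is a minimal sketch of the proposal in this post, not something x264 provides: the name `suggested_threads` is hypothetical, and rounding the height term up is an assumption (the post does not say which way to round; it makes no difference for examples A to C).

```python
import math


def suggested_threads(logical_processors: int, height: int, spacing: int = 40) -> int:
    """Return the lesser of the two calculations proposed in the post."""
    by_cpu = logical_processors * 3 // 2     # 1. logical processors * 1.5
    by_height = math.ceil(height / spacing)  # 2. vertical pixels / 40 (rounding assumed)
    return min(by_cpu, by_height)


# (logical processors, vertical pixels) for examples A, B and C
examples = {"A": (16, 360), "B": (8, 720), "C": (24, 1080)}
for name, (logical, height) in examples.items():
    print(name, suggested_threads(logical, height))
```

The result would then be passed explicitly with --threads instead of leaving the setting on auto.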

Based on recent feedback and my (limited) understanding, with threads set to “auto” in x264, examples A and C would both result in an excessively high number of threads that could/may hinder performance and/or negatively affect quality.

If anyone sees any problems with this approach please let me know.

Thanks,

Gregleto
Old 2nd October 2013, 16:55   #8  |  Link
Gregleto
Registered User
 
Join Date: Sep 2013
Posts: 38
Thanks from me as well. However, I just want to confirm one thing in your response that will add to my understanding. Near the end of your answer you state....

"And a 304p video won't use more than 8 threads for a sequence of P-frames."

Did you mean to say "304p"?

Thanks,

Gregleto
Old 2nd October 2013, 21:23   #9  |  Link
Selur
Registered User
 
Join Date: Oct 2001
Location: Germany
Posts: 7,259
Probably some resolution with a height of 304 pixels (304/40 rounded up is 8).
__________________
Hybrid here in the forum, homepage
Old 2nd October 2013, 21:50   #10  |  Link
Gregleto
Registered User
 
Join Date: Sep 2013
Posts: 38
Maybe, but there was no discussion about rounding up or down, and that's what threw me a bit. If he had said "320p", it would make perfect sense to me. I just want to make sure I'm not missing something.
Old 3rd October 2013, 12:41   #11  |  Link
the_weirdo
Yes, I'm weird.
 
Join Date: May 2010
Location: Southeast Asia
Posts: 271
Maybe he's referring to detmek's post.
__________________
“Never argue with stupid people, they will drag you down to their level and then beat you with experience.” — Mark Twain
Old 3rd October 2013, 16:09   #12  |  Link
Gregleto
Registered User
 
Join Date: Sep 2013
Posts: 38
Ah, I missed that. Thanks for pointing it out.
Old 3rd October 2013, 16:31   #13  |  Link
detmek
Registered User
 
Join Date: Aug 2009
Posts: 463
Yes, it is a comment on my test. Thanks, akupenguin. I didn't know that. I will try to perform some tests with 1080p videos.
Old 15th October 2013, 19:35   #14  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
Is there much overhead in terms of CPU or RAM use with multithreading? For example, if I was encoding 36 individual identical streams simultaneously, would I see total fps throughput and RAM utilization go down significantly if I encoded each with --threads 1 instead of the implicit default of 36 threads for each instance (1296 threads! I'd like THAT workstation...)
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 15th October 2013, 19:45   #15  |  Link
Gregleto
Registered User
 
Join Date: Sep 2013
Posts: 38
Quote:
Originally Posted by benwaggoner View Post
Is there much overhead in terms of CPU or RAM use with multithreading? For example, if I was encoding 36 individual identical streams simultaneously, would I see total fps throughput and RAM utilization go down significantly if I encoded each with --threads 1 instead of the implicit default of 36 threads for each instance (1296 threads! I'd like THAT workstation...)
Ben, I'm not qualified to answer your question. However, I posted a theoretical approach to setting threads more accurately here...

http://forum.doom9.org/showthread.php?t=168848

It's at the bottom of the thread and it received no replies. I think it's more appropriate for this thread. Could it be moved from there to here?

Thanks,

Gregleto
Old 15th October 2013, 21:59   #16  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by benwaggoner View Post
Is there much overhead in terms of CPU or RAM use with multithreading? For example, if I was encoding 36 individual identical streams simultaneously, would I see total fps throughput and RAM utilization go down significantly if I encoded each with --threads 1 instead of the implicit default of 36 threads for each instance (1296 threads! I'd like THAT workstation...)
Running just one thread (+ 1 input thread) is slightly more efficient than the default, since it avoids overhead from processor scheduling, the threading code in x264, etc. I just ran a test:
1 thread: 5.5 fps @ 25% CPU usage
Auto (6 threads): 20.2 fps @ 100% CPU usage

So, running 4 simultaneous encodings with "--threads 1" is more efficient but uses more memory, although that memory is spread across separate processes.
Running 36 simultaneous processes is probably more of a problem for the hard drive serving your sources (although with a fast SSD maybe not so much).
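
The test figures above can be compared on a per-CPU basis. This is a minimal sketch, assuming that independent single-threaded encodes scale linearly until the CPU is fully used (an assumption, not something measured in this post); the function name `projected_fps` is hypothetical.

```python
def projected_fps(fps: float, cpu_percent: float) -> float:
    """Aggregate fps if enough identical processes ran to reach 100% CPU.

    Assumes linear scaling across independent processes (not measured).
    """
    return fps * (100.0 / cpu_percent)


single = projected_fps(5.5, 25)   # four "--threads 1" encodes side by side
auto = projected_fps(20.2, 100)   # one encode with auto (6 threads)
gain = single / auto - 1.0

print(f"parallel single-thread: {single:.1f} fps")
print(f"auto threads:           {auto:.1f} fps")
print(f"relative gain:          {gain:.1%}")
```

Under that assumption the four single-threaded encodes come out at about 22 fps in total against 20.2 fps, roughly 9% more, at the cost of four sets of encoder memory.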

Last edited by Groucho2004; 16th October 2013 at 01:24.
Old 16th October 2013, 00:33   #17  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
Quote:
Originally Posted by Gregleto View Post
I think it's more appropriate for this thread. Could it be moved from there to here?
Done.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 28th August 2014, 08:29   #18  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,309
Quote:
Originally Posted by akupenguin View Post
There's two ways that lots of threads can lose quality:
When using VBV, we have to plan the size of one frame using only predictions of, not exact knowledge of, the size of previous frames that are still in progress. Threads do tell each other their exact size-so-far at the end of each row of macroblocks, not just when finishing a whole frame; but that can still be a significant amount of stuff in progress, adding up to a significant amount of uncertainty. This is bad because if we overcommitted bits to the beginning of a frame and then have to revise quality downwards for the bottom of the frame, (or conversely if we undercommitted for the top and only later learned that we have more bits available for the bottom), the result looks worse than if we had encoded the whole frame at an intermediate quality that gives the same total size. If you're not using VBV, this doesn't happen; none of the other ratecontrol modes care about such small deviations from the predicted size.
Unless it's already done, is it possible, in the case of a multi-pass, multi-threaded encode, to add data to the stats file to help bit distribution in the passes after the first? In that case it may significantly help and improve the result, no? Or am I saying something stupid?