View Full Version : What is the maximum number of threads in x264?


Jamaika
7th December 2024, 18:21
I know x264 is no longer being actively improved, but the codec is still in use.
I'd like to understand how x264 picks its default number of threads on an i5 13600K processor. For me x264 sets threads=30. Isn't that too many?

https://code.videolan.org/videolan/x264/-/commit/da14df5535fd46776fb1c9da3130973295c87aca

microchip8
8th December 2024, 05:56
The number of threads is, IIRC, 1.5x the number of logical CPUs. You should just use 'auto' and not bother.
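
A minimal sketch of that rule of thumb, for illustration only. The 1.5x factor is recalled from memory in this thread and has not been checked against the x264 source; the function name is made up, and "CPU" is assumed to mean logical CPUs (hardware threads). With 20 logical CPUs it gives 30, which matches the threads=30 reported above.

```python
def auto_thread_estimate(logical_cpus: int) -> int:
    """Rule of thumb from this thread: 1.5x the CPU count, rounded down.

    Assumption: "CPU" means logical CPUs (hardware threads). This is a
    sketch of the forum rule, not a copy of what x264 actually does.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    # Integer arithmetic avoids floating point: n * 3 // 2 == floor(1.5 * n)
    return logical_cpus * 3 // 2


if __name__ == "__main__":
    for cpus in (4, 20):
        print(f"{cpus} logical CPUs -> about {auto_thread_estimate(cpus)} threads")
```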

Jamaika
8th December 2024, 07:21
My question is about the 30 threads that auto picks. Is this value from outer space, and will it heat the processor until I finally burn it? For vvenc and xeve there are protections that cap it at 12 threads.
resolution < 720p: 4, < 5K 2880p: 8, >= 5K 2880p: 12 threads

https://www.intel.com/content/www/us/en/products/sku/230493/intel-core-i513600k-processor-24m-cache-up-to-5-10-ghz/specifications.html
For the 13600K, total threads: 20

For x265 it is:
x265 [info]: Thread pool created using 20 threads

microchip8
8th December 2024, 08:45
It's not from outer space. There's a formula behind it. I recall that back in the day Dark Shikari once explained how the amount is calculated, but I can't find it anymore :/

jpsdr
8th December 2024, 14:08
If I remember properly, the max = height/40.
Another thing I vaguely remember: it also wasn't recommended to use more than 20, as above that there is a severe quality drop.

Jamaika
8th December 2024, 18:50
If I remember properly, the max = height/40.
Another thing I vaguely remember: it also wasn't recommended to use more than 20, as above that there is a severe quality drop.
Impossible. For my video, 1920 divided by 40 is 48.

FranceBB
8th December 2024, 21:38
You may wanna take a look at my post from earlier this year: https://forum.doom9.org/showthread.php?p=1999533
Jean Philippe was spot on then and is spot on now:

max = height/40.


In my example, I was encoding a UHD source, so 3840x2160, which means 2160/40 = 54 maximum threads, and sure enough, if you count all the "boxes" you get 54:

https://i.imgur.com/hy0C0bf.png


We can make a little list here:

720x480: 480/40 = 12 maximum threads (SD NTSC)
720x576: 576/40 = 14 maximum threads (SD PAL, 14.4 rounded down)
1280x720: 720/40 = 18 maximum threads (HD)
1920x1080: 1080/40 = 27 maximum threads (FULL HD)
3840x2160: 2160/40 = 54 maximum threads (UHD)
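
The same list can be reproduced with a few lines of Python. This is only a sketch of the height/40 rule as quoted in this thread, not something taken from the x264 source; the function name and the rounding down are assumptions made for the example.

```python
def max_threads_for_height(height: int, divisor: int = 40) -> int:
    """Maximum thread count under the height/40 rule quoted in this thread.

    Assumption: fractional results are rounded down and the minimum is 1.
    """
    if height < 1:
        raise ValueError("height must be positive")
    return max(1, height // divisor)


# Resolutions from the list above: (width, height, label)
RESOLUTIONS = [
    (720, 480, "SD NTSC"),
    (720, 576, "SD PAL"),
    (1280, 720, "HD"),
    (1920, 1080, "FULL HD"),
    (3840, 2160, "UHD"),
]

if __name__ == "__main__":
    for width, height, label in RESOLUTIONS:
        threads = max_threads_for_height(height)
        print(f"{width}x{height}: {threads} maximum threads ({label})")
```

Note that only the height goes into the rule; the width is carried along for printing.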


To me, however, more than the lack of ability to scale on multiple cores/threads, the biggest downside is the lack of AVX512 optimizations, which are there for x264 8bit but are missing for x264 10bit.

Jamaika
8th December 2024, 21:56
To me, however, more than the lack of ability to scale on multiple cores/threads, the biggest downside is the lack of AVX512 optimizations, which are there for x264 8bit but are missing for x264 10bit.
Thanks for the answer. That makes more sense, although x264 sets 30 threads and not 27. Is that correct or not? Should a maximum thread count be set?
x264 is an old codec that only has AVX512F. I think AVX10 is already out.
Free OpenCL has also been abandoned.

FranceBB
9th December 2024, 07:27
OpenCL is still supported in x264 8bit and you can still compile x264 with support for it. I use it with --opencl all the time. The problem, once again, is that it's only for the 8bit version, as it never made its way into the 10bit version (and probably never will). :(

Jamaika
9th December 2024, 08:23
True but X264 OpenCL doesn't support the motherboard and is at version level 1.2.

Z2697
9th December 2024, 14:39
My question is about the 30 threads that auto picks. Is this value from outer space, and will it heat the processor until I finally burn it? For vvenc and xeve there are protections that cap it at 12 threads.
resolution < 720p: 4, < 5K 2880p: 8, >= 5K 2880p: 12 threads

https://www.intel.com/content/www/us/en/products/sku/230493/intel-core-i513600k-processor-24m-cache-up-to-5-10-ghz/specifications.html
For 13600K total threads 20

For x265 is:
x265 [info]: Thread pool created using 20 threads

Using more threads than "physical" threads won't burn your CPU any more than "1:1 threads" would.

Also, I don't think that 12-thread thing is a protection.

Jamaika
9th December 2024, 16:34
Also I don't think that 12 threads thing is protection.
xeve allows a maximum of 8 threads. Where does this caution come from?
I've been reading and reading. Back when I had an old 4-thread processor, I didn't care that x264 used 6. Today I found out that vvenc is only tested with 8/12 threads, because otherwise other functions may work incorrectly and be, e.g., fps incompatible.

How compatible are x264/x265 with a maximum of 30 threads? 10 years ago it was not possible to use 30 threads.

Z2697
9th December 2024, 17:56
xeve allows a maximum of 8 threads. Where does this caution come from?
I've been reading and reading. Back when I had an old 4-thread processor, I didn't care that x264 used 6. Today I found out that vvenc is only tested with 8/12 threads, because otherwise other functions may work incorrectly and be, e.g., fps incompatible.

How compatible are x264/x265 with a maximum of 30 threads? 10 years ago it was not possible to use 30 threads.

They are limitations, I guess. Either they lack an implementation that automatically chooses the thread count based on "physical threads", or they lack enough parallel capability.

High core count CPUs did exist 10 years ago, but they were less common. I know some people prefer to limit the threads in x264 to 16 and use multiple concurrent encoding sessions to utilize the rest of the total threads, but that's just their personal preference; there's no major downside to using more threads.

microchip8
9th December 2024, 18:15
Do note that x264 is not very effective at utilizing so many threads. I have a 12c/24t CPU and even though x264 runs with 25 threads on it, the cores/threads are not fully pegged at 100%. I get more like 80-85% utilization per core/thread. Also, the more threads you throw at it, there will be a small decrease in quality.

Z2697
9th December 2024, 21:09
Do note that x264 is not very effective at utilizing so many threads. I have a 12c/24t CPU and even though x264 runs with 25 threads on it, the cores/threads are not fully pegged at 100%. I get more like 80-85% utilization per core/thread. Also, the more threads you throw at it, there will be a small decrease in quality.

I tested the "more threads, less quality" claim with x265 and found it's "not quite" true.
The situations which make a difference are:
1) pools < 4 / pools >= 4
2) frame-threads = 1 / frame-threads > 1
3) no-wpp / wpp
For pools and frame-threads, numbers beyond the threshold (4 and 1 respectively, as above) don't seem to affect quality at all.

I might find some time to test x264 as well.

microchip8
9th December 2024, 22:52
The loss is there, you just can't see it. It's not like it loses so much quality that it's blatantly visible.

Z2697
10th December 2024, 05:31
The loss is there, you just can't see it. It's not like it loses so much quality that it's blatantly visible.

For x265, the results which I described as "don't impact quality" are bit-exact if --no-info is used.
The loss is not there.

But for the situations that do make a difference, you are correct: the loss is very small and very hard to see.
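
For anyone who wants to repeat the bit-exact check, here is a minimal sketch that compares two encoded files by SHA-256 hash. The function names are made up for this example, and the two paths are supplied by the caller. It assumes both encodes were made with --no-info, presumably so that the settings text embedded in the stream does not differ between runs.

```python
import hashlib
import sys


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bit_exact(path_a: str, path_b: str) -> bool:
    """True if both files have identical contents (same SHA-256)."""
    return file_sha256(path_a) == file_sha256(path_b)


if __name__ == "__main__":
    # Usage: python compare.py first_encode second_encode
    if len(sys.argv) != 3:
        sys.exit("usage: compare.py FILE_A FILE_B")
    same = bit_exact(sys.argv[1], sys.argv[2])
    print("bit-exact" if same else "outputs differ")
```

Matching hashes only show that the two outputs are identical; differing hashes say nothing about how large or visible the difference is.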

microchip8
10th December 2024, 09:19
I was talking about x264, since this thread is about x264. And the quality drop due to very many threads doesn't come from my mouth but from a past main dev of x264, Dark Shikari.

Z2697
10th December 2024, 14:33
The source is a 3840x2160 video upscaled from 1920x1080, 'cause it's just for testing.
(How do I limit the display size?)
https://files.catbox.moe/vo1usq.png

Jamaika
10th December 2024, 15:14
Thanks for the test. As I understand it, there is no difference in speed between 48 and 64 threads. What processor is it? What does adding the graphics card OpenCL option do?

So the 30-thread setting is recommended for editing movies.

Z2697
10th December 2024, 17:44
The processor used in the test above is my old Ryzen 7950X. (16C / 32T)

Sagittaire
15th December 2024, 20:57
You may wanna take a look at my post from earlier on this year: https://forum.doom9.org/showthread.php?p=1999533
To me, however, more than the lack of ability to scale on multiple cores/threads, the biggest downside is the lack of AVX512 optimizations, which are there for x264 8bit but are missing for x264 10bit.

Well, I think that the ability to scale on multiple cores/threads is a much bigger problem for encoding speed than the lack of AVX512 support.

In the best case, the AVX512 speed improvement would be ~10% for x264, and certainly less in a real-life encoding scenario.

If you want, I have a really interesting test for your 52C/112T Xeon to prove that ... ;-)

LunaRabbit
15th December 2024, 21:18
In the best case, the AVX512 speed improvement would be ~10% for x264, and certainly less in a real-life encoding scenario.

Did they ever solve the problem with AVX512 causing chips to heat up quickly and throttle? I've held off on buying any Intel stuff for several years now because of that. AVX512 does sound interesting but I've never heard anything good about it. But I don't have many friends running the latest Xeon either.

FranceBB
15th December 2024, 23:32
Did they ever solve the problem with AVX512 causing chips to heat up quickly and throttle?

Yes, however in my case we're talking about a very controlled environment, namely a server room, which is not what most people would run those things in. That being said, Xeons are very different from consumer chips. I wouldn't recommend any Intel consumer CPU these days, at least given that they seem to be all hands on power efficiency rather than performance, and I'm not a fan of the hybrid Efficiency Core / Performance Core approach either. I mean, it makes sense on a laptop but definitely not on a desktop. My personal workstation from 2017 is powered by a 20c/40t Xeon (AVX2 only, no AVX512) and an NVIDIA Quadro P4000, with 64 GB of RAM. It's getting older, but it's still pretty good encoding-wise. Still, with October 2025 getting closer and Windows 10 approaching the end of support, Microsoft would deem such a thing "obsolete" as it only has a TPM 1.2 and not a TPM 2.0 chip. Sure, there are workarounds to get that abomination of Windows 11 running anyway despite the lack of a TPM 2.0 chip, but still this just shows how ridiculous the requirement is...

If you want, I have a really interesting test for your 52C/112T Xeon to prove that ... ;-)

Yeah, I saw the benchmark post and I've been eager to try it, but I have to find a bit of time to do that. Generally those have very little downtime aside from every second Wednesday of the month when I install the Windows Security Updates (yes, I know that Patch Tuesday is the second Tuesday of the month, but it's in U.S time, which translates to Wednesday in UK time).

Sagittaire
23rd December 2024, 10:04
Yeah, I saw the benchmark post and I've been eager to try it, but I have to find a bit of time to do that. Generally those have very little downtime aside from every second Wednesday of the month when I install the Windows Security Updates (yes, I know that Patch Tuesday is the second Tuesday of the month, but it's in U.S time, which translates to Wednesday in UK time).

In fact, for your monster Xeon you need a special version of the benchmark.

Certainly one with 8 simultaneous multipart encoding sessions to saturate your 56C/112T CPU.

I can make a special benchmark to show the benefit of multipart encoding.
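
As a rough illustration of how such a multipart run could be planned, the sketch below splits a clip into 8 contiguous frame ranges and divides 112 hardware threads between the sessions. It assumes "multipart" means encoding separate segments of one source at the same time; the function names and the 100000-frame clip length are hypothetical, and it ignores details such as aligning the splits to scene cuts.

```python
def split_frames(total_frames: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, total_frames) into contiguous (start, end) ranges.

    The end of each range is exclusive. Range lengths differ by at most one.
    """
    if parts < 1 or total_frames < parts:
        raise ValueError("need at least one frame per part")
    base, extra = divmod(total_frames, parts)
    ranges = []
    start = 0
    for index in range(parts):
        # The first `extra` parts take one additional frame each.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


def threads_per_session(total_threads: int, sessions: int) -> int:
    """Evenly divide hardware threads between concurrent encoding sessions."""
    if sessions < 1:
        raise ValueError("sessions must be at least 1")
    return max(1, total_threads // sessions)


if __name__ == "__main__":
    sessions = 8
    print("threads per session:", threads_per_session(112, sessions))
    for number, (start, end) in enumerate(split_frames(100000, sessions), 1):
        print(f"session {number}: frames {start} to {end - 1}")
```

With 112 threads and 8 sessions, each session gets 14 threads, which stays below the 16-thread limit some people prefer for x264, as mentioned earlier in the thread.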