X265 pool, quality and performance
YaBoyShredderson
19th March 2023, 21:00
I have a 5900X (12 cores/24 threads). With x265 (and x264, for that matter) I tend to limit thread usage per encode because it is more efficient, and then run multiple encodes at the same time to max out CPU usage. With x265 I have frame threads and lookahead slices set to 1, and with WPP each encode uses about 40% of the CPU, so I run three of them.
Each StaxRip instance reports, for each encode, that a thread pool has been created using 24 threads. I was wondering whether I should set pools to 8 so that my three encodes use 24 threads in total.
How will this affect quality and performance?
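A minimal Python sketch of the thread budgeting described above, assuming 24 logical threads shared evenly by three parallel x265 instances. The `EncodeJob` class, the file names, and the helper functions are hypothetical; the x265 options are the ones discussed in this thread plus `--input`/`--output`. The sketch only builds and prints the command lines. Note that `--pools 8` caps how many worker threads each instance creates, but on a single-NUMA-node CPU such as the 5900X it does not by itself assign specific cores to each instance.

```python
from dataclasses import dataclass


@dataclass
class EncodeJob:
    source: str  # hypothetical input file (e.g. a .y4m)
    output: str  # hypothetical output .hevc file


def split_threads(total_threads: int, jobs: int) -> list[int]:
    """Divide the logical threads as evenly as possible between jobs."""
    base, extra = divmod(total_threads, jobs)
    return [base + (1 if i < extra else 0) for i in range(jobs)]


def build_commands(jobs: list[EncodeJob], total_threads: int) -> list[list[str]]:
    """Build one x265 argv per job, each with a capped thread pool."""
    commands = []
    for job, threads in zip(jobs, split_threads(total_threads, len(jobs))):
        commands.append([
            "x265",
            "--pools", str(threads),   # cap this instance's worker pool
            "--frame-threads", "1",    # settings from the original post
            "--lookahead-slices", "1",
            "--wpp",
            "--input", job.source,
            "--output", job.output,
        ])
    return commands


if __name__ == "__main__":
    jobs = [EncodeJob(f"clip{i}.y4m", f"clip{i}.hevc") for i in range(3)]
    for argv in build_commands(jobs, total_threads=24):
        print(" ".join(argv))
```

With 24 threads and three jobs, each command gets `--pools 8`, so the three pools together add up to the CPU's 24 logical threads.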
benwaggoner
20th March 2023, 23:57
You sure can. I have a dual-socket workstation, and typically pin an encode to one or the other socket, or do two in parallel like that. You want to make sure that you're allocating different cores for each of the encodes, of course.
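One way to give each encode its own cores on Linux is to set the CPU affinity of each x265 process, sketched below in Python. The helper names are hypothetical, and the topology (physical core c = logical CPUs c and c + 12 on a 12-core/24-thread CPU) is an assumption; check the real layout with `lscpu -e` before relying on it. `os.sched_setaffinity` is Linux-only; on Windows, Task Manager's "Set affinity" or `start /affinity <hex mask>` in cmd does the same job.

```python
import os
import subprocess


def partition_cores(cores: list[tuple[int, ...]], jobs: int) -> list[set[int]]:
    """Give each job a disjoint group of whole physical cores.

    `cores` lists the logical CPU ids (SMT siblings) of each physical core.
    """
    base, extra = divmod(len(cores), jobs)
    groups, start = [], 0
    for i in range(jobs):
        size = base + (1 if i < extra else 0)
        group: set[int] = set()
        for core in cores[start:start + size]:
            group.update(core)  # keep both SMT siblings together
        groups.append(group)
        start += size
    return groups


def launch_pinned(argv: list[str], cpus: set[int]) -> subprocess.Popen:
    """Start argv restricted to `cpus` (Linux only: os.sched_setaffinity)."""
    return subprocess.Popen(argv, preexec_fn=lambda: os.sched_setaffinity(0, cpus))


if __name__ == "__main__":
    # ASSUMED topology for a 12-core/24-thread CPU: core c = logical CPUs c and c + 12.
    topology = [(c, c + 12) for c in range(12)]
    for cpus in partition_cores(topology, jobs=3):
        print(sorted(cpus))
```

Grouping by physical core rather than by consecutive logical CPU ids avoids two encodes sharing the SMT siblings of one core.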
HD MOVIE SOURCE
21st March 2023, 04:19
Do you need special software to do that, or can you literally do it in Task Manager? I use VidCoder, for instance; if I set the core count to 2 and then open a second instance, would each instance use 2 cores?
Isn't it faster to use all cores for one encode and then move on to the next?
I do appreciate that a single core gives a better encode.
Guest
21st March 2023, 04:28
Have you tried the "chunk" encoding option in StaxRip?
benwaggoner
21st March 2023, 17:20
A single core doesn't intrinsically give a better encode. The big difference is that frame threading has some inefficiencies, and --frame-threads defaults to higher values on machines with more cores. If you set --frame-threads 1, quality will be the same irrespective of core count, but the maximum number of cores used will be lower.
To do my per-socket parallelism, I use --pools "+,-" for the first x265 instance and --pools "-,+" for the second. That syntax specifies which NUMA nodes each instance uses; there are two in my configuration.
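The per-node `--pools` strings generalize to any node count: "+" gives an instance all threads on that NUMA node and "-" gives it none. A small Python sketch (the helper name is hypothetical) that reproduces the two strings above for node 0 and node 1:

```python
def pools_for_node(node: int, num_nodes: int) -> str:
    """--pools value: all threads on `node` ("+"), none on every other node ("-")."""
    if not 0 <= node < num_nodes:
        raise ValueError(f"node {node} out of range for {num_nodes} nodes")
    return ",".join("+" if i == node else "-" for i in range(num_nodes))


if __name__ == "__main__":
    for n in range(2):
        print(pools_for_node(n, 2))  # "+,-" then "-,+"
```

On a four-node machine the same helper would yield strings such as "-,-,+,-" for the third instance.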
AMD's chiplet designs use multiple NUMA nodes per socket, and I haven't had one to experiment with to figure out optimal tuning and the performance impact.
excellentswordfight
21st March 2023, 20:27
This is only true for Naples; Rome and Milan don't behave like this. Cross-chiplet communication became much less of an issue once AMD moved to a shared I/O die. For video encoding, I think the penalty of the design will be negligible.
benwaggoner
22nd March 2023, 16:18
Good to hear, and makes sense.
How does NUMA work in Zen 4?
excellentswordfight
23rd March 2023, 09:30
I don't have any Genoa (the Zen 4-based EPYC generation) servers available, but I doubt it differs from Rome and Milan in that respect. It looks pretty much the same, except that it has up to 12 Zen 4 chiplets instead of up to 8. From a user's perspective, they behave as, and are reported by the OS as, a single NUMA node.
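To check what the OS actually reports, Linux lists one `nodeN` directory per NUMA node under `/sys/devices/system/node`. A minimal Python sketch that counts them (the function name is hypothetical; `numactl --hardware` or `lscpu` show the same information):

```python
import re
from pathlib import Path
from typing import Optional

NODE_DIR = re.compile(r"node\d+")


def numa_node_count(sysfs: Path = Path("/sys/devices/system/node")) -> Optional[int]:
    """Number of NUMA nodes the Linux kernel exposes, or None if sysfs is missing."""
    if not sysfs.is_dir():
        return None  # not Linux, or sysfs not mounted
    return sum(1 for entry in sysfs.iterdir()
               if entry.is_dir() and NODE_DIR.fullmatch(entry.name))


if __name__ == "__main__":
    print(numa_node_count())
```

If this prints 1, per-node `--pools` strings such as "+,-" would have nothing to split, and per-instance thread counts or CPU affinity are the more relevant tools.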
benwaggoner
24th March 2023, 16:47
Excellent. I likely need to replace my old dual-Xeon workstation soon, and it looks like I'll be back to a single-socket main system for the first time in at least 12 years. That said, these new high-core-count server/workstation chips seem to have absorbed the extra cost of dual-socket motherboards, two CPUs, etc. into a much higher MSRP.
rwill
24th March 2023, 19:46
@benwaggoner
Maybe just get a Threadripper Pro 5975WX. I think it has the right balance between base/boost clock and core count to run about two UHD encode jobs. I think Lenovo sells some.
See here: https://www.tomshardware.com/reviews/amd-threadripper-pro-5995wx-5975wx-cpu-review/4
I have a Threadripper 3970X, one of the previous models, and it has helped me a lot to cut down encode time while remaining efficient.
The only problem with these systems is the near-idle power consumption, which is somewhat high. I personally solved that with a small system I use for all work tasks except heavy encoding; for heavy video tasks I start up the Threadripper and connect to it via Remote Desktop. Power consumption is not something one cares much about in a work environment, but when you work from home in Western Europe... ah, let's not talk about energy prices.
excellentswordfight
24th March 2023, 20:19
Yes, we use a lot of the EPYC version of that model, the EPYC 7543P; it's a very good CPU for a reasonable price. For models above 32 cores, the all-core clock starts to drop quite rapidly, so if per-encode throughput also matters, it's something of a sweet spot. Its Zen 4 replacement is the 9354P (the P suffix denotes single-socket-only models, which are about 25% cheaper).
We also have some Threadripper Pros, but nothing higher than the 5965WX (an excellent model for single UHD encodes and content creation in general). And yes, for workstations you can get either the Lenovo P620 or Dell's new Precision 7865.
I have an upcoming meeting with a supplier about 4th-gen Xeon SP (Sapphire Rapids), as we are finally starting to see products launch. I hope to do some comparisons between those and Genoa soon; I would love to compare the 6414U with the 9354P, and the new 32-core Xeon W with the 5975WX. But to be honest, it doesn't look that promising. In my experience, base clock is the frequency you can expect under all-core load at TDP, and the 9354P has a 3.25 GHz base clock while the 6414U has only 2 GHz, so I suspect that for these two models the AMD part will outperform the Intel one by quite a margin.
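As a rough worked check of that reasoning: take the poster's rule of thumb that base clock is the sustained all-core clock at TDP, assume both parts have the same core count N (as the like-for-like comparison implies), and ignore per-clock performance differences between the two architectures. The expected throughput ratio is then just the ratio of base clocks:

```latex
\frac{\text{throughput}_{9354P}}{\text{throughput}_{6414U}}
\approx \frac{N \cdot 3.25\,\text{GHz}}{N \cdot 2.0\,\text{GHz}}
= \frac{3.25}{2.0} \approx 1.63
```

That is roughly 60% more all-core clock for the AMD part, before any IPC difference is taken into account.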