Doom9's Forum > Video Encoding > High Efficiency Video Coding (HEVC)

19th March 2023, 21:00   #1
YaBoyShredderson
Registered User
 
Join Date: Jul 2020
Posts: 76
x265 pools, quality and performance

I have a Ryzen 9 5900X (12 cores/24 threads). When using x265 (and x264, for that matter), I tend to limit thread usage per encode because it improves efficiency, and then run multiple encodes at the same time to max out CPU usage. With x265 I have frame threads and lookahead-slices set to 1, and with WPP this gets me about 40% CPU usage per encode, so I run three of them.

Each instance of StaxRip reports that a thread pool has been created using 24 threads, and I was wondering if I should set pools to 8 so my three encodes end up at 24 threads total.

How will this affect quality and performance?
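A minimal sketch of the setup described above, assuming an `x265` CLI binary on PATH and placeholder `.y4m` inputs (the file names and the helpers `build_x265_args` and `plan_parallel_encodes` are hypothetical). It only builds the command lines: three jobs sharing 24 hardware threads get `--pools 8` each, with `--frame-threads 1` and `--lookahead-slices 1` as in the post.

```python
from typing import List, Sequence, Tuple


def build_x265_args(src: str, dst: str, pool_threads: int) -> List[str]:
    """Return one x265 command line with a fixed-size worker pool (hypothetical helper)."""
    return [
        "x265",
        "--input", src,                # assumed .y4m input, so no --input-res needed
        "--output", dst,
        "--pools", str(pool_threads),  # a single pool of N worker threads
        "--frame-threads", "1",        # no frame parallelism
        "--lookahead-slices", "1",
        "--wpp",                       # wavefront parallel processing
    ]


def plan_parallel_encodes(jobs: Sequence[Tuple[str, str]],
                          total_threads: int = 24) -> List[List[str]]:
    """Split total_threads evenly across (src, dst) jobs and build each command."""
    if not jobs:
        return []
    per_job = total_threads // len(jobs)
    if per_job < 1:
        raise ValueError("more jobs than hardware threads")
    return [build_x265_args(src, dst, per_job) for src, dst in jobs]
```

Each list could then be started with `subprocess.Popen(cmd)`. As far as I understand it, `--pools 8` only caps the size of each encoder's worker pool; it does not pin the three jobs to different cores, so the OS scheduler still decides where the threads run.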
20th March 2023, 23:57   #2
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,819
Quote:
Originally Posted by YaBoyShredderson
I have a Ryzen 9 5900X (12 cores/24 threads). When using x265 (and x264, for that matter), I tend to limit thread usage per encode because it improves efficiency, and then run multiple encodes at the same time to max out CPU usage. With x265 I have frame threads and lookahead-slices set to 1, and with WPP this gets me about 40% CPU usage per encode, so I run three of them.

Each instance of StaxRip reports that a thread pool has been created using 24 threads, and I was wondering if I should set pools to 8 so my three encodes end up at 24 threads total.

How will this affect quality and performance?
You sure can. I have a dual-socket workstation, and typically pin an encode to one or the other socket, or do two in parallel like that. You want to make sure that you're allocating different cores for each of the encodes, of course.
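One way to make sure each encode gets its own cores is to split the logical CPUs into disjoint sets and pin each process to one set. This is a sketch assuming Linux (`os.sched_setaffinity` is Linux-only) and that logical CPUs 0..N-1 can simply be cut into contiguous blocks; SMT sibling numbering varies by OS and firmware, so check it first (for example with `lscpu -e`). The helper names are hypothetical.

```python
import os
import subprocess
from typing import List, Sequence, Set


def partition_cpus(n_cpus: int, n_jobs: int) -> List[Set[int]]:
    """Split logical CPUs 0..n_cpus-1 into n_jobs contiguous, non-overlapping sets."""
    if n_jobs < 1 or n_jobs > n_cpus:
        raise ValueError("need 1 <= n_jobs <= n_cpus")
    base, extra = divmod(n_cpus, n_jobs)
    sets: List[Set[int]] = []
    start = 0
    for j in range(n_jobs):
        size = base + (1 if j < extra else 0)  # spread any remainder over the first jobs
        sets.append(set(range(start, start + size)))
        start += size
    return sets


def launch_pinned(argv: Sequence[str], cpus: Set[int]) -> subprocess.Popen:
    """Start argv with its CPU affinity restricted to cpus (Linux only)."""
    # preexec_fn runs in the child after fork() and before exec(),
    # so only the new encoder process is pinned, not this script.
    return subprocess.Popen(list(argv), preexec_fn=lambda: os.sched_setaffinity(0, cpus))
```

From a shell, `taskset -c 0-7 x265 ...` does the same on Linux; on Windows, Task Manager's "Set affinity" or `start /affinity <hex mask>` in cmd set a per-process affinity mask.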
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
21st March 2023, 04:19   #3
HD MOVIE SOURCE
Registered User
 
 
Join Date: Mar 2021
Location: North Carolina
Posts: 138
Quote:
Originally Posted by benwaggoner
You sure can. I have a dual-socket workstation, and typically pin an encode to one or the other socket, or do two in parallel like that. You want to make sure that you're allocating different cores for each of the encodes, of course.
Do you need special software to do that, or can you literally do it in Task Manager? I use VidCoder, for instance; if I set the core count to only use 2 and then open another instance, would both instances use 2 cores each?

Is it not faster to use all cores for one encode and then move on to the next?

I do appreciate that a single core gets a better encode.
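Whether one wide encode at a time beats several narrow ones in parallel comes down to simple arithmetic: with N equal jobs, running them concurrently wins when N times the per-job fps of a narrow encode exceeds the fps of a single all-core encode. A back-of-envelope sketch; the function names are hypothetical and the fps values in the test are placeholders, not measurements, so measure your own with identical settings.

```python
def serial_time(frames_per_job: int, jobs: int, fps_all_cores: float) -> float:
    """Seconds to finish all jobs one after another, each using every core."""
    return jobs * frames_per_job / fps_all_cores


def parallel_time(frames_per_job: int, fps_per_narrow_job: float) -> float:
    """Seconds to finish when all jobs run at once, each at its own (lower) fps."""
    return frames_per_job / fps_per_narrow_job


def parallel_wins(jobs: int, fps_all_cores: float, fps_per_narrow_job: float) -> bool:
    """True if aggregate throughput of concurrent narrow jobs beats one wide job."""
    return jobs * fps_per_narrow_job > fps_all_cores
```

This assumes all jobs have the same frame count and start together; uneven jobs leave cores idle at the end of a parallel batch.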
21st March 2023, 04:28   #4
FTLOY
Friend of a friend..
 
 
Join Date: Feb 2023
Posts: 290
Quote:
Originally Posted by YaBoyShredderson
I have a Ryzen 9 5900X (12 cores/24 threads). When using x265 (and x264, for that matter), I tend to limit thread usage per encode because it improves efficiency, and then run multiple encodes at the same time to max out CPU usage. With x265 I have frame threads and lookahead-slices set to 1, and with WPP this gets me about 40% CPU usage per encode, so I run three of them.

Each instance of StaxRip reports that a thread pool has been created using 24 threads, and I was wondering if I should set pools to 8 so my three encodes end up at 24 threads total.

How will this affect quality and performance?
Have you tried the "chunk" encoding option in StaxRip?
21st March 2023, 17:20   #5
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,819
Quote:
Originally Posted by HD MOVIE SOURCE
Do you need special software to do that, or can you literally do it in Task Manager? I use VidCoder, for instance; if I set the core count to only use 2 and then open another instance, would both instances use 2 cores each?

Is it not faster to use all cores for one encode and then move on to the next?

I do appreciate that a single core gets a better encode.
Single core doesn't intrinsically give a better encode. The big difference is that frame threading has some inefficiencies, and --frame-threads defaults to higher values with more cores. If you set --frame-threads 1, quality will be the same irrespective of core count, but the maximum number of cores used will be lower.

To do my per-socket parallelism, I use --pools "+,-" for the first x265 instance and --pools "-,+" for the second. That syntax specifies which NUMA nodes to use; my configuration has two.
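To make the per-node syntax concrete, here is a deliberately simplified Python model of how a --pools string maps onto NUMA nodes. It understands only `+` (all cores on that node), `-` (no threads on that node), and a plain integer thread count per node, and it expects exactly one entry per node; the real x265 parser accepts more forms (see the x265 CLI documentation). `expand_pools` and `cores_per_node` are hypothetical names, and the core counts are an assumed input rather than something detected.

```python
from typing import List


def expand_pools(spec: str, cores_per_node: List[int]) -> List[int]:
    """Return the number of worker threads each NUMA node gets under spec."""
    entries = [e.strip() for e in spec.split(",")]
    if len(entries) != len(cores_per_node):
        raise ValueError("this sketch expects one entry per NUMA node")
    threads: List[int] = []
    for entry, cores in zip(entries, cores_per_node):
        if entry == "+":
            threads.append(cores)  # use every core on this node
        elif entry == "-":
            threads.append(0)      # leave this node unused
        else:
            count = int(entry)     # explicit thread count for this node
            if count < 0:
                raise ValueError(f"negative thread count: {entry}")
            threads.append(count)
    return threads
```

With two nodes, "+,-" and "-,+" expand to complementary allocations, which is why the two instances never compete for the same socket.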

AMD uses multiple NUMA nodes per socket with their chiplets, and I've not had one to play with to figure out optimal tuning and perf impacts.
21st March 2023, 20:27   #6
excellentswordfight
Lost my old account :(
 
Join Date: Jul 2017
Posts: 332
Quote:
Originally Posted by benwaggoner
AMD uses multiple NUMA nodes per socket with their chiplets, and I've not had one to play with to figure out optimal tuning and perf impacts.
This is only true for Naples; Rome and Milan do not behave like this. Cross-chiplet communication became much less of an issue when they moved to a shared I/O die. For video encoding, I think the penalty of the design will be negligible.

Last edited by excellentswordfight; 21st March 2023 at 20:34.
22nd March 2023, 16:18   #7
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,819
Quote:
Originally Posted by excellentswordfight
This is only true for Naples; Rome and Milan do not behave like this. Cross-chiplet communication became much less of an issue when they moved to a shared I/O die. For video encoding, I think the penalty of the design will be negligible.
Good to hear, and makes sense.

How does NUMA work in Zen 4?
23rd March 2023, 09:30   #8
excellentswordfight
Lost my old account :(
 
Join Date: Jul 2017
Posts: 332
Quote:
Originally Posted by benwaggoner
Good to hear, and makes sense.

How does NUMA work in Zen 4?
I don't have any Genoa (the Zen 4-based EPYC generation) servers available, but I doubt there will be any difference from Rome and Milan in that respect. The layout looks pretty much the same, except that it has up to 12 Zen 4 chiplets instead of up to 8. From a user's perspective, a socket behaves as, and is reported by the OS as, a single NUMA node.
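If you want to check what the OS actually reports, Linux exposes one `nodeN` directory per NUMA node under sysfs. A small sketch assuming a Linux system with `/sys/devices/system/node`; the function name is hypothetical, and `numactl --hardware` gives the same information plus per-node CPU lists.

```python
import os
import re

NODE_DIR = "/sys/devices/system/node"


def count_numa_nodes(node_dir: str = NODE_DIR) -> int:
    """Count the nodeN entries Linux exposes; return 0 if the directory is missing."""
    try:
        names = os.listdir(node_dir)
    except FileNotFoundError:
        return 0
    # Other entries such as "online" or "possible" are files, not nodes.
    return sum(1 for name in names if re.fullmatch(r"node\d+", name))
```

On a single-socket Rome, Milan, or Genoa system this would be expected to return 1, matching the description above.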
24th March 2023, 16:47   #9
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,819
Excellent. I likely need to replace my old dual-Xeon workstation soon, and it looks like I'll be back to a single-socket main system for the first time in at least 12 years. That said, these new high-core-count server/workstation chips seem to have absorbed the extra cost of dual-socket motherboards, two CPUs, etc. into a much higher MSRP.
24th March 2023, 19:46   #10
rwill
Registered User
 
Join Date: Dec 2013
Location: Berlin, Germany
Posts: 375
Quote:
Originally Posted by benwaggoner
Excellent. I likely need to replace my old dual-Xeon workstation soon, and it looks like I'll be back to a single-socket main system for the first time in at least 12 years. That said, these new high-core-count server/workstation chips seem to have absorbed the extra cost of dual-socket motherboards, two CPUs, etc. into a much higher MSRP.
@benwaggoner

Maybe just get a Threadripper Pro 5975WX. I think it has the right balance between base/boost clock and core count to run around two UHD encode jobs. I think Lenovo sells some.

See here: https://www.tomshardware.com/reviews...x-cpu-review/4

I have a Threadripper 3970X, one of the previous models, and it helped me a lot to cut down encode time while staying efficient.

The only problem with these systems is their near-idle power consumption, which is somewhat high. I personally solved that with a small system I use for all work tasks except the heavy encoding. For heavy video tasks I start up the Threadripper and Remote Desktop into it. Power consumption is not something one cares much about in a work environment, but when you are working from home in Western Europe, ah, let's not talk about energy prices...
24th March 2023, 20:19   #11
excellentswordfight
Lost my old account :(
 
Join Date: Jul 2017
Posts: 332
Quote:
Originally Posted by benwaggoner
Excellent. I likely need to replace my old dual-Xeon workstation soon, and it looks like I'll be back to a single-socket main system for the first time in at least 12 years. That said, these new high-core-count server/workstation chips seem to have absorbed the extra cost of dual-socket motherboards, two CPUs, etc. into a much higher MSRP.
Quote:
Originally Posted by rwill
@benwaggoner

Maybe just get a Threadripper Pro 5975WX. I think it has the right balance between base/boost clock and core count to run around two UHD encode jobs. I think Lenovo sells some.

See here: https://www.tomshardware.com/reviews...x-cpu-review/4

I have a Threadripper 3970X, one of the previous models, and it helped me a lot to cut down encode time while staying efficient.

The only problem with these systems is their near-idle power consumption, which is somewhat high. I personally solved that with a small system I use for all work tasks except the heavy encoding. For heavy video tasks I start up the Threadripper and Remote Desktop into it. Power consumption is not something one cares much about in a work environment, but when you are working from home in Western Europe, ah, let's not talk about energy prices...
Yes, we use a lot of the EPYC counterpart of that model, the EPYC 7543P; it's a very good CPU for a reasonable price. For models above 32 cores, all-core clocks start to drop quite rapidly, so if individual encode throughput also matters, it's a bit of a sweet spot. The Zen 4 replacement for that model is the 9354P (the P suffix marks single-socket-only models, which are about 25% cheaper).

We also have some Threadripper Pros, but nothing higher than the 5965WX (an excellent model for single UHD encodes and content creation in general). And yes, for workstations you can get either the Lenovo P620 or Dell's new Precision 7865.

I have an upcoming meeting with a supplier about 4th-gen Xeon SP (Sapphire Rapids), as we are finally starting to see products launch. I hope to do some comparisons between those and Genoa soon; I would love to compare the 6414U with the 9354P, and the new 32-core Xeon W with the 5975WX. But to be honest, it doesn't look that promising. In my experience, base clock is the frequency you can expect under an all-core load at TDP; the 9354P has a 3.25 GHz base clock and the 6414U only 2 GHz, so I suspect that for these two models the AMD part will outperform the Intel one by quite a margin.
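The arithmetic behind that suspicion, as a rough sketch: it uses only the two base clocks quoted above and assumes, as the post does, that base clock approximates the all-core clock at TDP. It ignores IPC, memory bandwidth, and any core-count difference, so it is a heuristic rather than a benchmark. The function name is hypothetical.

```python
def clock_advantage(base_a_ghz: float, base_b_ghz: float) -> float:
    """Return how much higher A's base clock is than B's, in percent."""
    return (base_a_ghz / base_b_ghz - 1.0) * 100.0


# Base clocks as quoted in the post: 9354P = 3.25 GHz, 6414U = 2.0 GHz.
print(f"9354P vs 6414U: {clock_advantage(3.25, 2.0):.1f}% higher base clock")
```

Running it prints a 62.5% base-clock advantage for the 9354P, which is the margin the post is gesturing at before any per-clock differences are considered.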

Last edited by excellentswordfight; 27th March 2023 at 09:10.