View Full Version : HEVC CPU load with Threadripper - "threading" it right?!


RanmaCanada
30th June 2019, 05:05
NO. Energy consumption will go up much faster than fps.

http://www.nagorak.com/imagedump/R7-1700-Efficiency.jpg

Dude, where do you keep getting these totally awesome informational graphs and pictures? Do you build them yourself with all the benchmarking software you have? This one is glorious, and I wish we had more of these curves for more processors haha. It would allow people to know about getting that lower end CPU and overclock it and see if it's worth just paying for that next one up.

mariush
30th June 2019, 11:43
you can make one yourself.
All it costs is a power meter and a lot of time.

Here's an example of a power meter: https://www.amazon.com/P3-P4400-Electricity-Usage-Monitor/dp/B00009MDBU/
You plug the computer into it and it will tell you how much power the system consumes.

From there, it's a repetitive process.
1. Find the lowest voltage at which the CPU is stable at a particular frequency (e.g. 3000 MHz).
2. Run a benchmark for a reasonable amount of time, let's say 1 hour.
3. Read the power meter's peak watts value or, if the meter can record energy use, divide the energy used (kWh) by the run time in hours to get the average power draw.
4. Increase the CPU frequency by 50 or 100 MHz and raise the CPU voltage until it is stable again at the new frequency (see step 1).

The results will not be perfect, because the power meter measures the power consumption of the whole system, which includes the losses in the power supply (its efficiency isn't fixed; for example, it may be 85% efficient at 200 W and 87% efficient at 400 W).
You could use a clamp meter or multimeter to measure the current drawn by the CPU VRM through the 8-pin EPS connector; that will give you even better numbers.
These will still include some very small errors due to variation in the CPU VRM's efficiency (VRM = the DC-DC converter that steps the 12 V from the PSU down to the 1 V to 1.4 V the CPU needs), but this variation should be too small to affect your graphs in any significant way.
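
As a rough illustration of step 3, the arithmetic for turning a meter reading into average power and an efficiency figure could be sketched in Python like this. The function names and the sample readings are made up for illustration; they are not measurements from this thread.

```python
def average_watts(energy_kwh, hours):
    """Average power draw in watts from the energy metered over a run."""
    return energy_kwh * 1000.0 / hours


def fps_per_watt(fps, watts):
    """Efficiency figure that can be plotted against clock speed."""
    return fps / watts


# Hypothetical readings: (clock in MHz, encoder fps, kWh used in a 1 h run)
runs = [(3000, 10.0, 0.10), (3500, 11.5, 0.14), (3900, 12.5, 0.20)]

for mhz, fps, kwh in runs:
    watts = average_watts(kwh, 1.0)
    print(f"{mhz} MHz: {watts:.0f} W, {fps_per_watt(fps, watts):.3f} fps/W")
```

Repeating this for each frequency step gives the points of an efficiency curve like the one in the linked image.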

Atak_Snajpera
30th June 2019, 11:44
Dude, where do you keep getting these totally awesome informational graphs and pictures? Do you build them yourself with all the benchmarking software you have? This one is glorious, and I wish we had more of these curves for more processors haha. It would allow people to know about getting that lower end CPU and overclock it and see if it's worth just paying for that next one up.

https://www.reddit.com/r/Amd/comments/61lgtw/whats_the_max_safe_temperature_for_oc_1700_for/

blublub
9th July 2019, 17:17
Also, frame-threading artifacts are typically around the GOP boundary, and with a long GOP that can get buried in the overall VMAF. Comparing the minimum frame VMAF or the lowest 1% of frame VMAF values can find regressions much more effectively.
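
A minimal Python sketch of that comparison, assuming the per-frame VMAF scores have already been extracted into a plain list (the function name and the sample scores are hypothetical):

```python
def vmaf_summary(frame_scores, low_percent=1):
    """Return the mean, the minimum, and the mean of the lowest N% of frames."""
    scores = sorted(frame_scores)
    # Always keep at least one frame in the "lowest N%" bucket.
    n_low = max(1, len(scores) * low_percent // 100)
    low = scores[:n_low]
    return {
        "mean": sum(scores) / len(scores),
        "min": scores[0],
        "low_mean": sum(low) / n_low,
    }


# Hypothetical clip: 198 good frames and 2 bad ones near a GOP boundary.
scores = [95.0] * 198 + [60.0, 70.0]
print(vmaf_summary(scores))
```

In this made-up example the overall mean barely moves, while the minimum and the lowest-1% mean expose the two bad frames.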

phew, quite some posts I have to catch up with since my last visit.

The encoder sets frame-threads to "5" in my setup when it is not otherwise specified, so auto mode is "5".
If I manually reduce that to a max of "3" it certainly won't degrade quality compared to the defaults.

My primary question is whether the option "numa-pools=48" reduces quality, as I manually raised it from the default of 32 to over 50, which gave me an incredible performance boost.

Atak_Snajpera
9th July 2019, 18:48
Did you set up your Threadripper in NUMA mode instead of UMA?

benwaggoner
10th July 2019, 18:04
phew, quite some posts I have to catch up with since my last visit.

The encoder sets frame-threads to "5" in my setup when it is not otherwise specified, so auto mode is "5".
If I manually reduce that to a max of "3" it certainly won't degrade quality compared to the defaults.

My primary question is whether the option "numa-pools=48" reduces quality, as I manually raised it from the default of 32 to over 50, which gave me an incredible performance boost.
The number of frame threads can have a significant impact on quality. It was pretty overwhelming on early versions of x265, and it's still ideal to run at -F 1 if possible for high quality/efficiency encoding.

Lookahead threading is reduced in slower presets, but I'm fuzzy on if and how it actually impacts quality much.

Other than that, I don't think that thread count impacts quality so much.

For Intel dual-socket systems at least, I've often found that pinning a job to just one socket ala --pools "-,+" has minimal impact on performance, although that probably varies with encoded frame size and number of cores per socket.
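
For illustration, a pool string such as "-,+" could be generated in Python as below. x265 reads --pools as one entry per NUMA node, with "+" meaning all cores on that node and "-" meaning none; the helper name and the file names are hypothetical.

```python
def pools_for_node(num_nodes, node):
    """Build an x265 --pools string that uses only the given NUMA node."""
    return ",".join("+" if i == node else "-" for i in range(num_nodes))


# Pin a job to the second socket of a dual-socket machine, with -F 1.
cmd = [
    "x265", "--input", "in.y4m",          # hypothetical input file
    "--pools", pools_for_node(2, 1),       # "-,+"
    "--frame-threads", "1",
    "-o", "out.hevc",                      # hypothetical output file
]
print(" ".join(cmd))
```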

Stereodude
12th July 2019, 18:27
The number of frame threads can have a significant impact on quality. It was pretty overwhelming on early versions of x265, and it's still ideal to run at -F 1 if possible for high quality/efficiency encoding.
So you recommend breaking the encode up into pieces and running multiple concurrent encodes that are -F 1?

Do you also limit the total number of threads per encode? Like if you're going to run 4 -F 1 concurrent encodes you set thread pool for each encode to logical cores / 4?
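
The division asked about here could be sketched in Python as follows. It only illustrates the arithmetic in the question (logical cores / number of jobs); whether that split is the best choice is not settled in this thread, and the function names are hypothetical.

```python
import os


def pool_size_per_job(logical_cores, jobs):
    """Evenly divide logical cores between concurrent encodes (at least 1 each)."""
    return max(1, logical_cores // jobs)


def job_args(logical_cores, jobs):
    """x265 arguments for one of several concurrent -F 1 encodes."""
    n = pool_size_per_job(logical_cores, jobs)
    return ["--frame-threads", "1", "--pools", str(n)]


# os.cpu_count() can return None, so fall back to 1 in that case.
cores = os.cpu_count() or 1
print(job_args(cores, 4))
```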

mparade
12th July 2019, 21:39
Did you set up your Threadripper in NUMA mode instead of UMA?

Which should be faster with x265: UMA mode with --pools, or NUMA mode with --numa-pools?

Thanks for the answer.

Atak_Snajpera
13th July 2019, 12:41
Always use UMA. Video encoders, unlike games, do not need super low latency when accessing memory.

mparade
13th July 2019, 13:57
Thank you very much!

RanmaCanada
18th July 2019, 20:38
So, stupid question: with the 3950X being released in September, and the 3900X available now, does the advice in this thread work well with those processors, or is it even needed? Looking to upgrade to one of them come Black Friday, and just curious!

Stereodude
18th July 2019, 23:58
So stupid question, with the 3950x being released in September, and the current 3900 available now, does the advice in this thread work well with those processors, or is it even needed? Looking to upgrade to one of them come black friday, and just curious!
The 3950X should have about 33% more performance than the 3900X for 50% more money. Whether it's worth it or not is up to you.
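
Taking those two percentages at face value, the performance per unit of money works out as in this small Python sketch (the function name is hypothetical, and the figures are the estimates from the post, not benchmark results):

```python
def relative_value(perf_gain, price_gain):
    """Performance per unit of money of the pricier part relative to the cheaper one."""
    return (1 + perf_gain) / (1 + price_gain)


# 33% more performance for 50% more money
print(f"{relative_value(0.33, 0.50):.2f}")
```

A result below 1 means each unit of money buys less performance on the more expensive part, assuming all of its cores can be kept busy.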

blublub
21st July 2019, 19:59
Always use UMA. Video encoders unlike games do not need super low latency when accessing memory.

I used NUMA in my test, and with the default settings x265 also chose NUMA.
Since I only have a single 16-core CPU, I'll try UMA next time with --pools 52 or something like that.

blublub
21st July 2019, 20:00
The 3950X should have about 33% more performance than the 3900X for 50% more money. Whether it's worth it or not is up to you.

Only if you are able to utilize all of those cores. If not then the 12c is likely the much better buy.

blublub
21st July 2019, 20:01
The number of frame threads can have a significant impact on quality. It was pretty overwhelming on early versions of x265, and it's still ideal to run at -F 1 if possible for high quality/efficiency encoding.

Lookahead threading is reduced in slower presets, but I'm fuzzy on if and how it actually impacts quality much.

Other than that, I don't think that thread count impacts quality so much.

For Intel dual-socket systems at least, I've often found that pinning a job to just one socket ala --pools "-,+" has minimal impact on performance, although that probably varies with encoded frame size and number of cores per socket.


Since the default frame-threads was 5 and I am currently setting it to 2 or 3, I am already improving quality, so with frame-threads 2 or 3 I doubt that anyone will see a difference.

Stereodude
21st July 2019, 22:14
Only if you are able to utilize all of those cores. If not then the 12c is likely the much better buy.
Well, you can make the same argument until you're using the 6c or 4c model.

blublub
21st July 2019, 22:21
Well, you can make the same argument until you're using the 6c or 4c model.
No u can't.
With x265 it's easy to saturate 8c 16t but it's hard to do 16c 32t at 100% - if u leave the defaults you are only utilizing 60 to 70% so about a 10c 20t CPU on UHD resolution.
With tweaking u can get close to 100% but it's not yet clear if that degrades image quality or not - I can't see a difference but that doesn't mean there isn't any.

Stereodude
22nd July 2019, 00:46
No u can't.
Sure I can.
With x265 it's easy to saturate 8c 16t but it's hard to do 16c 32t at 100% - if u leave the defaults you are only utilizing 60 to 70% so about a 10c 20t CPU on UHD resolution.
I can saturate any number of cores, even 64, with x265 without degrading quality.

blublub
22nd July 2019, 05:19
With default settings? Which software and material are u using?

Stereodude
22nd July 2019, 12:29
With default settings? Which software and material are u using?
I didn't say with a single x265 instance. Break your source up into pieces and encode them at the same time. Or, run more than one encoding job at the same time. Think a little outside the box.
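
A rough Python sketch of the "break it into pieces" idea, using x265's --seek and --frames options to give each instance its own frame range. The file names and helper names are hypothetical, the even split is only for illustration (a real tool would more likely cut at keyframes or scene changes), and launching the commands and joining the resulting chunks is left out.

```python
def chunk_ranges(total_frames, chunks):
    """Split total_frames into (start, length) pieces of near-equal size."""
    base, extra = divmod(total_frames, chunks)
    ranges, start = [], 0
    for i in range(chunks):
        # The first `extra` chunks get one additional frame each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, length))
        start += length
    return ranges


def chunk_commands(source, total_frames, chunks):
    """One x265 command line per chunk, each limited to -F 1."""
    return [
        ["x265", "--input", source,
         "--seek", str(start), "--frames", str(length),
         "--frame-threads", "1",
         "-o", f"chunk_{i:03d}.hevc"]
        for i, (start, length) in enumerate(chunk_ranges(total_frames, chunks))
    ]


for cmd in chunk_commands("in.y4m", 10, 3):
    print(" ".join(cmd))
```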

Atak_Snajpera
22nd July 2019, 14:58
With default settings? Which software and material are u using?

Just activate Distributed Encoding in Ripbot264 and done
https://i.imgsafe.org/5c/5c0a9ddd47.png

This way I can saturate hundreds of threads!

Stereodude
22nd July 2019, 15:54
Just activate Distributed Encoding in Ripbot264 and done
https://i.imgsafe.org/5c/5c0a9ddd47.png

This way I can saturate hundreds of threads!
How does Ripbot264 decide where to splice? Does it correctly handle breaking up a qpfile between the segments?

Atak_Snajpera
22nd July 2019, 15:57
Splitting is done at keyframes reported by FFMS.
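
The general idea of cutting only at keyframes can be sketched in Python as below. This is not Ripbot264's actual code, just one possible way to do it: the function name is hypothetical, and the keyframe list is assumed to be a sorted, non-empty list of frame numbers starting at 0, as an indexer such as FFMS could report.

```python
from bisect import bisect_left


def split_at_keyframes(keyframes, total_frames, segments):
    """Return (start, end) frame ranges, end exclusive, cut only at keyframes."""
    cuts = [0]
    for i in range(1, segments):
        ideal = i * total_frames // segments
        pos = bisect_left(keyframes, ideal)
        # Compare the keyframe just before and just after the ideal cut point.
        candidates = keyframes[max(pos - 1, 0):pos + 1]
        best = min(candidates, key=lambda k: abs(k - ideal))
        # Skip cuts that would create an empty or out-of-order segment.
        if best > cuts[-1]:
            cuts.append(best)
    cuts.append(total_frames)
    return list(zip(cuts[:-1], cuts[1:]))


# Hypothetical keyframe positions for a 1500-frame source
keyframes = [0, 240, 490, 760, 1000, 1250]
print(split_at_keyframes(keyframes, 1500, 3))
```

If the source has fewer usable keyframes than requested segments, this sketch simply returns fewer segments.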

blublub
29th July 2019, 12:03
Mhh gotta try that

nakTT
3rd May 2021, 17:30
There can be quality regressions when using frame threading. Although that can be turned off directly via -F 1.
Hi, how can i do that in MeGUI? Currently I'm just using the option in the GUI menu to set the frame-threads to 1 to force it to just 1 thread because i don't know how to actually disable the frame threading altogether. All this while i only use GUI menu, not sure how to do it manually with MeGUI. Many thanks in advance.

benwaggoner
3rd May 2021, 21:21
Hi, how can i do that in MeGUI? Currently I'm just using the option in the GUI menu to set the frame-threads to 1 to force it to just 1 thread because i don't know how to actually disable the frame threading altogether. All this while i only use GUI menu, not sure how to do it manually with MeGUI. Many thanks in advance.
-F 1 is the same as --frame-threads 1, so you're probably getting what you wanted. Actually turning the number of encoding threads down to 1 would be --pools "1", which you definitely don't want to do outside of some specific multi-bitrate encoding scenarios. Even without frame-threads, there is plenty of parallelization available to x265 from WPP and such, which scales with frame size. If you're finding only a fraction of your available cores are being used, you should try --pmode.

Also, the potential quality regressions are proportional to the number of frame threads. I've found -F 2 to be generally safe while offering a big perf boost compared to -F 1.

Boulder
4th May 2021, 07:59
Before you start lowering the frame-threads parameter in all your encodes, I suggest you test how it looks. It's very likely that you won't be able to see the difference between -F 1 or -F 4.

benwaggoner
4th May 2021, 18:29
Before you start lowering the frame-threads parameter in all your encodes, I suggest you test how it looks. It's very likely that you won't be able to see the difference between -F 1 or -F 4.
True that. I've not really seen any substantial issues in a couple of years with higher frame threads. Doing more basic settings works reliably. When new features are introduced (RADL for example), there may be frame threading issues initially that get fixed later.

tonemapped
9th January 2022, 09:32
Sorry to dig up an old thread, but I've found the most straightforward way to utilise 12 cores is to run two instances (and use 2 for frame threads). That results in constant 99.5%+ CPU utilisation. Without that, and with pmode, I'd only see ~65-70% utilisation.