AMD Epyc (32 Core/64 Threads) very slow x265 Encoding.
-QfG-
18th April 2022, 11:54
Hi Community,
I have a problem with x265 encoding on an AMD EPYC processor. The encode (4K HDR) is extremely slow, at about 1.3 frames/sec.
In the StaxRip window I see that x265 created 4 NUMA pools with 16 threads each.
But only 16 cores run at full speed, not all 32 :(
My 3950X is considerably faster (2-3 fps).
Any idea or solution for how I can encode using all cores at full power?
FranceBB
18th April 2022, 13:28
Post your full command line and output, but in the meantime you can try with: --wpp --pmode --pme --lookahead-slices 16
-QfG-
18th April 2022, 18:20
--------------------------- Video encoding ---------------------------
x265 3.4.0.2
"C:\Program Files\StaxRip\Apps\Support\avs2pipemod\avs2pipemod64.exe" -y4mp D:\xxxx.avs | "C:\Program Files\StaxRip\Apps\Encoders\x265\x265.exe" --crf 17 --preset slow --profile main10 --level-idc 5.1 --output-depth 10 --aq-mode 1 --refine-analysis-type off --min-keyint 24 --keyint 240 --no-open-gop --master-display "G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)" --colorprim bt2020 --colormatrix bt2020nc --transfer smpte2084 --range limited --max-cll "3560,427" --chromaloc 2 --repeat-headers --hrd --aud --deblock -3:-3 --no-sao --no-strong-intra-smoothing --refine-mv 0 --wpp --pmode --pme --lookahead-slices 16 --frames 143959 --y4m --output E:\xxxx.hevc -
avs2pipemod[info]: writing 143959 frames of 24/1 fps, 3840x1920,
sar 0:0, YUV-420-planar-10bit progressive video.
y4m [info]: 3840x1920 fps 24/1 i420p10 unknown frame count
raw [info]: output file: E:\xxxx.hevc
x265 [info]: HEVC encoder version 3.4+2-73ca1d7be377
x265 [info]: build info [Windows][MSVC 1925][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [warning]: Limit reference options 2 and 3 are not supported with pmode. Disabling limit reference
x265 [warning]: Specifying a decoder level with constant rate factor rate-control requires
x265 [warning]: enabling VBV with vbv-bufsize=160000kb vbv-maxrate=160000kbps. VBV outputs are non-deterministic!
x265 [info]: Main 10 profile, Level-5.1 (High tier)
x265 [info]: Thread pool 0 using 16 threads on numa nodes 0
x265 [info]: Thread pool 1 using 16 threads on numa nodes 1
x265 [info]: Thread pool 2 using 16 threads on numa nodes 2
x265 [info]: Thread pool 3 using 16 threads on numa nodes 3
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 5 / wpp(30 rows)+pmode+pme
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : star / 57 / 3 / 3
x265 [info]: Keyframe min / max / scenecut / bias : 24 / 240 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 25 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 0
x265 [info]: References / ref-limit cu / depth : 4 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-17.0 / 0.60
x265 [info]: VBV/HRD buffer / max-rate / init : 160000 / 160000 / 0.900
x265 [info]: tools: rect limit-modes rd=4 psy-rd=2.00 rdoq=2 psy-rdoq=1.00
x265 [info]: tools: rskip mode=1 signhide tmvp lslices=12 deblock(tC=-3:B=-3)
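To make the thread pool layout in that log easier to check at a glance, here is a minimal Python sketch that pulls the "Thread pool" lines out of an x265 log and totals the threads. The regular expression is an assumption based only on the four lines shown above (in particular, that multiple NUMA nodes would be comma-separated); `parse_pools` is a hypothetical helper, not part of x265 or StaxRip.

```python
import re

# Lines copied from the log above.
LOG = """\
x265 [info]: Thread pool 0 using 16 threads on numa nodes 0
x265 [info]: Thread pool 1 using 16 threads on numa nodes 1
x265 [info]: Thread pool 2 using 16 threads on numa nodes 2
x265 [info]: Thread pool 3 using 16 threads on numa nodes 3
"""

# Assumed line format, inferred from the log lines above.
POOL_RE = re.compile(r"Thread pool (\d+) using (\d+) threads on numa nodes ([\d,]+)")


def parse_pools(log_text):
    """Return a list of (pool_id, threads, numa_nodes) tuples found in the log."""
    pools = []
    for match in POOL_RE.finditer(log_text):
        pool_id = int(match.group(1))
        threads = int(match.group(2))
        nodes = [int(n) for n in match.group(3).split(",")]
        pools.append((pool_id, threads, nodes))
    return pools


pools = parse_pools(LOG)
total_threads = sum(threads for _, threads, _ in pools)
print(len(pools), "pools,", total_threads, "threads in total")
```

So x265 does see all 64 hardware threads, split into four pools of 16; the log alone does not say why only part of them is busy.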
excellentswordfight
18th April 2022, 19:15
I just ran a test on a 7502P and saw no issue at default settings (only setting --crf, --preset, --profile and --level-idc) for UHD. Utilization isn't pinned at 100%, but that's not expected either; I do get spikes to 100%.
x265 [info]: Thread pool 0 using 16 threads on numa nodes 0
x265 [info]: Thread pool 1 using 16 threads on numa nodes 1
x265 [info]: Thread pool 2 using 16 threads on numa nodes 2
x265 [info]: Thread pool 3 using 16 threads on numa nodes 3
For me, x265 does not detect multiple NUMA nodes on the 7502P; I only get "x265 [info]: Thread pool created using 64 threads". What generation of EPYC and which OS are you using?
-QfG-
18th April 2022, 19:29
Using Windows Server 2022 and an AMD EPYC 7551p.
excellentswordfight
18th April 2022, 19:44
I guess that is a Naples EPYC? That would explain that part, I think, as the first generation doesn't have a shared I/O die and is more a case of four CPUs put together. But even so, it's still a bit odd, as it still seems to assign 16 threads per NUMA node (which should be correct), so it shouldn't be an issue. I haven't actually used x265 on systems with more than two NUMA nodes (and even there you lose some performance), so I'm not sure how it handles four, but I would expect it just to have performance issues, not to ignore two of the NUMA nodes altogether.
And just FYI, first-gen Zen is not that competitive for x265. I wouldn't be surprised if current-gen consumer models outperform it, especially for single-file throughput, as you won't see 100% utilization either way with 64 threads for UHD, and you get some penalty from the Naples architecture as well as some AVX limitations on the first Zen generation. The 5800X is almost twice as fast (https://tpucdn.com/review/amd-ryzen-7-5800x/images/x265.png) as the 1800X with the same number of threads. But as you already have it, it might be better to try running one instance per NUMA node instead?
RanmaCanada
18th April 2022, 22:42
Have you tried using ripbot to do chunking? Though I do have to agree with excellentswordfight that the first gen was really bad for x265.
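For anyone wondering what chunking amounts to in practice, here is a rough Python sketch that splits the 143959 frames from the log into four contiguous ranges, one per encoder instance. It is not a description of how ripbot does it; `split_frames` is a hypothetical helper, and mapping each range onto x265's `--seek` and `--frames` options is just one possible way to feed the chunks to separate instances.

```python
def split_frames(total_frames, chunks):
    """Split [0, total_frames) into `chunks` contiguous (start, count) ranges."""
    base, extra = divmod(total_frames, chunks)
    ranges = []
    start = 0
    for i in range(chunks):
        # The first `extra` chunks get one additional frame each.
        count = base + (1 if i < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


ranges = split_frames(143959, 4)
for start, count in ranges:
    # One (start, count) pair per encoder instance.
    print(f"--seek {start} --frames {count}")
```

Each chunk then becomes its own encode, and the resulting pieces have to be joined afterwards, so chunk boundaries end up as forced keyframes.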
benwaggoner
19th April 2022, 17:17
On my two-socket Xeon system, I've never seen anything like a 2x performance improvement using both CPUs versus --pools "+,-" and "-,+". Definitely not for 4K and below, and not much better for 8K.
When I'm doing encoding tests, I just run two at once, one per NUMA, for maximum throughput.
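The same "+"/"-" notation extends to a four-node machine like the 7551P. A small Python sketch that generates one `--pools` string per NUMA node, so each x265 instance is confined to a single node; `pool_masks` is a hypothetical helper, and the rest of the command line is left as a placeholder on purpose.

```python
def pool_masks(numa_nodes):
    """Return one --pools string per node: '+' on that node, '-' on all others."""
    masks = []
    for node in range(numa_nodes):
        mask = ",".join("+" if i == node else "-" for i in range(numa_nodes))
        masks.append(mask)
    return masks


for mask in pool_masks(4):
    # "<other options>" is a placeholder, not a real x265 argument.
    print(f'x265 --pools "{mask}" <other options>')
```

With two nodes this yields exactly "+,-" and "-,+"; with four nodes, each instance gets the 16 threads of one node.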
excellentswordfight
20th April 2022, 08:29
Yes, and especially now that we are seeing such high core counts per CPU, it makes even more sense, as it's easier to fully utilize all threads, so there is a performance gain from that as well and not just from avoiding the penalty of multiple NUMA nodes.
I recently ran a simple test on 32C/64T systems; it looked like this (UHD encode using preset slow):
Epyc "Rome" single socket 32C/64T: 6.26 fps
Epyc "Rome" single socket (two instances): 3.9 * 2 = 7.8 fps
Xeon "Cascade Lake R" dual socket 2*16C/32T: 4.85 fps
Xeon "Cascade Lake R" dual socket (one instance per CPU): 3.2 * 2 = 6.4 fps
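To put those numbers in relative terms, a short Python sketch that computes the aggregate gain of running two instances over one instance spanning the whole machine. Only the fps values come from the test above; `gain_percent` is a hypothetical helper.

```python
# (single-instance fps, fps per instance when running two instances)
results = {
    'Epyc "Rome" single socket 32C/64T': (6.26, 3.9),
    'Xeon "Cascade Lake R" dual socket 2*16C/32T': (4.85, 3.2),
}


def gain_percent(single_fps, per_instance_fps, instances=2):
    """Percent gain in aggregate fps from split instances versus one instance."""
    aggregate = per_instance_fps * instances
    return (aggregate / single_fps - 1.0) * 100.0


for name, (single, per_instance) in results.items():
    gain = gain_percent(single, per_instance)
    print(f"{name}: {gain:.1f}% more aggregate throughput")
```

That works out to roughly 25% more total throughput on the Epyc and roughly 32% on the dual Xeon, at the cost of each individual file taking longer.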
-QfG-
21st April 2022, 06:53
Thanks very much @all here. Ok, the old Epyc must go :D.
PatchWorKs
3rd May 2022, 08:15
Any idea or solution for how I can encode using all cores at full power?
...it's not x265 of course, but what about hardware encoding?
https://github.com/rigaya/VCEEnc
note: you can use FastFlix (https://github.com/cdgriffith/FastFlix) as GUI to easily obtain commandline(s)
RanmaCanada
3rd May 2022, 16:11
This needs an AMD GPU, which EPYC does not have. It also uses VCE which is ancient. VCN is the current AMD hardware encoder.
As of AMD Raven Ridge (released January 2018), VCE was succeeded by Video Core Next (VCN).
PatchWorKs
5th May 2022, 15:47
Well, according to this CodeCalamity article (https://codecalamity.com/amd-hardware-encoding-in-2021-vce-vcn/) VCE and VCN are just AMD naming entries, so can be used interchangeably.