#1 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,173
|
x264-10bit UHD and Numa Nodes support?
Hi there guys,
I'm currently using x264-10bit with an AVS Script.avs to encode UHD BT2020 HLG 50p 4:2:2 500 Mbit/s 10bit planar files like so: Quote:
What I'm interested in is the CPU usage: whether I could get any gains with the "big guns", and why my two dual-socket configurations behave so differently.

First configuration:
CPU 0: Intel Xeon E5-2640 v4 2.40GHz 10c/20th (AVX2 max)
CPU 1: Intel Xeon E5-2640 v4 2.40GHz 10c/20th (AVX2 max)
RAM: 64 GB DDR4
OS: Windows 10 Enterprise x64

This configuration reaches 26fps and x264 saturates all cores and all threads, so there isn't anything to optimize here.

Second configuration:
CPU 0: Intel Xeon Gold 6238R 2.20GHz 28c/56th (AVX512 max)
CPU 1: Intel Xeon Gold 6238R 2.20GHz 28c/56th (AVX512 max)
RAM: 128 GB DDR4
OS: Windows Server 2019 Standard x64

This configuration reaches 32.9fps, only slightly faster than the first one, and x264 only saturates the cores and threads of CPU 0 instead of using both CPUs.

In other words, the reason it's 26fps vs 32.9fps is that it's as if the 20c/40th machine was competing against a single 28c/56th CPU instead of a 56c/112th one... On top of that, despite having AVX512, it's only using up to AVX2, 'cause x264 has AVX512 assembly optimizations only for the 8bit version and not for the 10bit version, sadly (or at least that's what the command line output from the prompt says).

What I don't understand is why this happens. Up until now I thought only x265 was NUMA-aware and therefore able to use both CPUs in a dual-socket configuration. That matches what is happening on the more powerful 56c/112th configuration; however, the 20c/40th is also a dual-socket configuration and there x264 is using both CPUs at 100%, so... what's going on here? And most importantly, is there anything I can do in this regard?

The x264 build I'm using is c164_r3107_a8b68eb from the 17th of July 2023, so it's fairly up to date, in case you're wondering. Avisynth is also up to date, as it's 3.7.3 stable, Ferenc's build of course.

Last edited by FranceBB; 22nd March 2024 at 23:36.
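One possible explanation, which nobody in this thread confirms: Windows places logical processors in processor groups of at most 64, and on Windows versions older than Windows 11 / Server 2022 a process stays inside a single group unless it sets thread group affinities itself. The sketch below uses a deliberately simplified model (one group if everything fits in 64, otherwise one group per socket) and applies it to the two machines described above. The function name and the grouping rule are illustrative assumptions, not the exact algorithm Windows uses.

```python
MAX_GROUP_SIZE = 64  # Windows limit on logical processors per processor group


def processor_groups(sockets: int, threads_per_socket: int) -> list[int]:
    """Return the size of each processor group under a simplified model.

    Assumption: a single group when all logical processors fit in 64,
    otherwise one group per socket (each socket must fit in one group).
    """
    total = sockets * threads_per_socket
    if total <= MAX_GROUP_SIZE:
        return [total]
    if threads_per_socket > MAX_GROUP_SIZE:
        raise ValueError("model does not cover sockets above 64 logical processors")
    return [threads_per_socket] * sockets


if __name__ == "__main__":
    machines = [("2x Xeon E5-2640 v4", 2, 20), ("2x Xeon Gold 6238R", 2, 56)]
    for name, sockets, threads in machines:
        groups = processor_groups(sockets, threads)
        # A process that never leaves its starting group sees only groups[0].
        print(f"{name}: groups={groups}, visible to one process={groups[0]}")
```

Under this model the 40-thread machine is a single group, so one x264 process can reach both sockets, while the 112-thread machine splits into two groups of 56, which would match x264 filling only CPU 0.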
#2 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,385
|
The number of threads in x264 is limited by the vertical size of the video: if I remember properly, the smallest piece it splits the frame into is something around 40 lines.
You didn't specify the size of your video, but if it's 2160 lines high, 2160/40 = 54, so this is the maximum number of threads you can expect.
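A quick sketch of that arithmetic, assuming the roughly 40-line minimum recalled above (a remembered figure, not a value checked against the x264 source). The function name is hypothetical.

```python
MIN_LINES_PER_THREAD = 40  # assumption: the figure recalled in the post, unverified


def max_useful_threads(height: int, min_lines: int = MIN_LINES_PER_THREAD) -> int:
    """Upper bound on threads if each thread needs at least `min_lines` rows."""
    return max(1, height // min_lines)


if __name__ == "__main__":
    cap = max_useful_threads(2160)
    for logical_cpus in (40, 112):  # the two machines from the first post
        busy = min(logical_cpus, cap)
        print(f"{logical_cpus} logical CPUs -> at most {busy} busy threads (cap {cap})")
```

With this cap, the 40-thread machine can be fully loaded, while the 112-thread machine could keep only about half of its logical processors busy.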
__________________
My github. |
#3 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,173
|
Quote:
Thank you Jean Philippe, as always!
#5 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,173
|
Quote:
It only accepts MPEG-2 8bit and H.264 10bit and it plays them back via SDI for playout. It's part of the "Versio Integrated Playout" platform for UHD TX used in linear channels by plenty of TVs out there.

Last edited by FranceBB; 15th February 2024 at 01:31.
#7 | Link | |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,951
|
Quote:
As far as optimization goes, you've already implemented all the clever ideas I thought of. Maybe use a faster preset, if it gets you sufficient quality within the bitrate? Doing split and stitch with two x264 instances, each pinned to a single NUMA node, would offer quite a bit better throughput. The "NU" part of NUMA introduces a fair amount of overhead, as you're seeing from your fps going up less than your thread utilization. I doubt AVX512 would benefit you that much; x264 has fewer opportunities for >256 bit SIMD than x265.
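A minimal sketch of the split-and-stitch idea, under several assumptions: the encoder binary is called `x264.exe`, the input is `Script.avs`, the frame count is known in advance, and the pinning is done with the `/NODE` switch of the Windows `start` command. The x264 options used (`--seek`, `--frames`, `--threads`, `--stitchable`, `-o`) exist in the x264 CLI, but the remaining encoding settings from the original command line were not preserved in this thread and are left out. The script only builds the command strings; it does not launch anything.

```python
def split_frames(total_frames: int, parts: int) -> list[tuple[int, int]]:
    """Split a frame count into contiguous (first_frame, frame_count) ranges."""
    base, extra = divmod(total_frames, parts)
    ranges = []
    start = 0
    for i in range(parts):
        count = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        ranges.append((start, count))
        start += count
    return ranges


def build_commands(script: str, total_frames: int, nodes: int = 2,
                   threads_per_node: int = 56) -> list[str]:
    """Build one NUMA-pinned x264 command line per node (hypothetical file names)."""
    commands = []
    for node, (seek, frames) in enumerate(split_frames(total_frames, nodes)):
        commands.append(
            f'start "" /NODE {node} x264.exe --threads {threads_per_node} '
            f'--stitchable --seek {seek} --frames {frames} '
            f'-o part{node}.264 "{script}"'
        )
    return commands


if __name__ == "__main__":
    # 90000 frames = 30 minutes at 50p; an illustrative value only.
    for cmd in build_commands("Script.avs", total_frames=90000):
        print(cmd)
```

Joining `part0.264` and `part1.264` afterwards, and making sure the VBV constraints still hold across the join, are not handled here.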
#8 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,173
|
Quote:
I used to think that 500 Mbit/s was plenty for UHD in H.264 and that it wouldn't have mattered anyway, but I actually found out that x264 wants to stay at around 800 Mbit/s in order to be "happy", and if left uncapped it would overshoot most of the time (hence the buffer size = bitrate / fps constraint). Crazy, I know, but it is what it is. Even so, it's still way more compressed than the original DNX running at a whopping 1.3 Gbit/s.
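The buffer constraint works out as follows for the 500 Mbit/s, 50p target from the first post. This is a sketch only: the actual command line was not preserved in the thread, so the way the values map onto `--bitrate`, `--vbv-maxrate` and `--vbv-bufsize` (which x264 expects in kbit/s and kbit) is an assumption about how the cap might be expressed.

```python
def vbv_bufsize_kbit(bitrate_kbit: int, fps: float) -> int:
    """Buffer size equal to one frame's worth of bits: bitrate / fps."""
    return round(bitrate_kbit / fps)


if __name__ == "__main__":
    bitrate_kbit = 500_000      # 500 Mbit/s target
    fps = 50                    # 50p
    source_kbit = 1_300_000     # the 1.3 Gbit/s DNX source mentioned above

    bufsize = vbv_bufsize_kbit(bitrate_kbit, fps)
    print(f"--bitrate {bitrate_kbit} --vbv-maxrate {bitrate_kbit} --vbv-bufsize {bufsize}")
    print(f"source/target bitrate ratio: {source_kbit / bitrate_kbit:.1f}x")
```

A one-frame buffer leaves the encoder no room to borrow bits from neighbouring frames, which is what keeps it from overshooting.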
#9 | Link | |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,951
|
Quote:
#10 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,240
|
You can try starting more encoding processes and limiting the number of threads for each process. It may not be a good idea to start lots of threads with too small a work unit for each thread, because thread synchronisation also adds significant overhead.
#11 | Link |
Registered User
Join Date: Oct 2012
Posts: 8,287
|
Time to necro a bit.
Can this Versio Imagine box handle the soft telecine flag, --pulldown double? It may just double the frame rate, if you ask it to output progressive, with no deinterlacing artefacts. Or it may ruin the image.