View Full Version : Xeon question


burnix
6th July 2021, 07:27
Hello Community.

I admit I'm being a little lazy with this question, but is there an x265 or ffmpeg command or version that is optimized for Xeon processors?

I have some servers at work that are not fully exploited :p:p

Thanks for your response

FranceBB
6th July 2021, 08:59
Optimized in which way? How?
x265 has hand-written assembly that makes use of the CPU's SIMD instruction set extensions like SSE, SSE2, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512.
I personally have a 56c/112th Intel Xeon over here and although it's impossible to make x265 use the CPU at 100% unless you're using very high resolutions and bit depths, there are some additional parameters that can be triggered to make use of more cores.
That being said, I think that for this kind of server the real strength is parallelism rather than single encodes; in fact, I bought them to encode several files at the same time.
Heck, with MPEG-2 I can encode up to 56 files at the same time without the blink of an eye from the CPU, so, you know...
About x264, I found it to be quite parallelized BUT to this very date it cannot make use of NUMA Nodes properly, which means that if you have a Dual Socket configuration it will only use 1 CPU with all its cores/threads but not the other one.
This doesn't happen with x265 which is able to use both CPUs of a dual socket configuration.
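For anyone who wants to try the "several encodes at once" approach on a dual-socket box, here is a minimal Python sketch, not the setup used in this post. It builds one x265 command line per NUMA node using x265's `--pools` option ("+" selects a node, "-" skips it) and, optionally, `--asm avx512`, which as far as I know is how AVX512 gets enabled since x265 leaves it off by default. The file names, the helper names and the two-node layout are assumptions for illustration only.

```python
import shlex
import subprocess


def pools_for_node(node, node_count):
    """Build an x265 --pools string that selects a single NUMA node.

    Example: node 1 of 2 gives "-,+" (skip node 0, use all cores on node 1).
    """
    if not 0 <= node < node_count:
        raise ValueError("node index out of range")
    return ",".join("+" if i == node else "-" for i in range(node_count))


def build_command(src, dst, node, node_count, preset="slow", avx512=False):
    """Return the x265 argument list for one encode pinned to one NUMA node."""
    cmd = ["x265", "--input", src, "--output", dst,
           "--preset", preset,
           "--pools", pools_for_node(node, node_count)]
    if avx512:
        # Assumption: AVX512 is opt-in and requested through --asm avx512.
        cmd += ["--asm", "avx512"]
    return cmd


def run_parallel(commands):
    """Start every encode at once and wait for all of them to finish."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


# Hypothetical input files, one encode per node on a two-socket machine.
jobs = [build_command(f"clip{n}.y4m", f"clip{n}.hevc", node=n, node_count=2)
        for n in range(2)]
for job in jobs:
    print(shlex.join(job))
```

Calling `run_parallel(jobs)` would actually launch the encodes; the sketch only prints the command lines so they can be checked first.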


My personal remark (and please note that others here may not share this view): I don't trust AMD CPUs for work, I never did. I've been running Intel Xeon CPUs on all my servers for years and I'm very happy with the encoding performance. I'm also experimenting with AVX512, which is still Intel only. Many people said they can't use it because their CPUs get way too hot and end up throttling: the clock goes down, the CPU cools, the clock goes up again, then down again, and so on, ending up with worse performance than plain AVX2. So I gave it a go in our server room with a controlled temperature, and I gotta say I was able to keep it around 83°C, so it's doable, and I had a speed benefit over plain AVX2. I'll post some benchmarks later, but I'm happy with the results I've got.
Professionally speaking, everyone has Intel Xeon CPUs and NVIDIA Quadro GPUs for workstations and all servers are also running Intel Xeon CPUs. It's just the way it is and there's a good reason for it.

microchip8
6th July 2021, 12:38
@FranceBB

You may prefer Xeons, but AMD EPYC CPUs are making strong inroads into datacenters and supercomputers, and for good reasons (more power efficient, faster, cheaper, more cores). Also, if you need a lot of PCIe lanes, you cannot ignore AMD. As for encoding, Threadripper/EPYC will be faster overall compared to a similar Xeon.

Boulder
6th July 2021, 13:20
I see Intel has done its propaganda work very well :D

benwaggoner
6th July 2021, 19:33
The concern with AVX512 wasn't that it was never helpful, but that it had a reliable improvement over AVX2 only with 4K HDR --preset veryslow or higher. Which I've seen, and I've also seen improvements doing 8K encoding.

I could see really good cooling helping some (although 83 isn't that cool). And CPU architecture improvements can change these dynamics a lot. AVX2 was in a similar thermal-throttled boat when originally launched, but was much improved a generation or two later.

FranceBB
7th July 2021, 11:02
(although 83 isn't that cool)

Yeah but it's in a server room kept at very low temperatures and with the fan spinning like crazy, I think it would skyrocket to 97°C if it was in a normal room and nothing was in place other than the normal fan eheheheh

rwill
7th July 2021, 14:00
Yeah but it's in a server room kept at very low temperatures and with the fan spinning like crazy, I think it would skyrocket to 97°C if it was in a normal room and nothing was in place other than the normal fan eheheheh

Might I suggest an industrial chiller of around 4 kW for this use case?

RanmaCanada
8th July 2021, 05:49
Might I suggest an industrial chiller of around 4 kW for this use case?

Pretty sure that's only used when you want to lie about the specs of your processor. haha

excellentswordfight
8th July 2021, 20:04
Yeah but it's in a server room kept at very low temperatures and with the fan spinning like crazy, I think it would skyrocket to 97°C if it was in a normal room and nothing was in place other than the normal fan eheheheh
I doubt it would make that much difference. Server hardware usually complies strictly with Intel's specifications and the cooling is designed to handle it. Most if not all Intel servers I have experience with do not exceed the TDP value in power consumption under sustained load, and thermal solutions are designed to handle that at least at the reference ambient temperature (22°C). It's not like consumer platforms, where system builders are allowed to increase power consumption if the cooling allows for it.

I can understand why people stayed away from the first generation of Epyc. Even Rome, which solved most of the performance-related issues compared to Intel, had some initial issues (supply and slow implementation in mature servers). But for the last year or two the Epyc platform has been an absolute killer. I would assume that for encoding the higher sustained clock speeds of Epyc will easily outweigh any potential performance gain that AVX512 might give on an Intel platform. I was blown away when we received our first Rome server and saw sustained clock speeds of over 3 GHz under full load; most high-core-count Intel systems we have go down to a rather low base clock of just over 2 GHz, and in AVX512 tests some have even gone under 2 GHz.

What clockspeed are you getting under 100% load with x265 with avx512 enabled?

FranceBB
9th July 2021, 09:20
What clockspeed are you getting under 100% load with x265 with avx512 enabled?

https://i.imgur.com/7t7yXBz.png

excellentswordfight
9th July 2021, 09:28
https://i.imgur.com/7t7yXBz.png
That's actually higher than I expected! Xeons usually hit base clock under full load (2.2 for the Gold 6238R, it seems). I wonder if Intel has improved AVX512 on Cascade Lake. I know that they have on Ice Lake, but the test I did on Skylake-SP had a pretty large penalty (almost 500 MHz).

What is the vendor of that server?

FranceBB
9th July 2021, 10:35
Xeons usually hit base clock under full load (2.2 for Gold 6238R it seems).

Yeah, this was what happened with my old 28c/56th and 20c/40th, which reverted to base clock under full load, but apparently Intel improved things in this new generation.


What is the vendor of that server?

We have a running contract with HP.
This particular server comes with a Gold Xeon which doesn't get much higher, but the Platinum Xeon which has the very same specs as the one I've got (same cache, same cores/threads) actually does get up to 3 GHz, but I wonder if it can sustain this load with such a high clock... (I don't think it can, though, but I don't have it so we'll never know).

excellentswordfight
9th July 2021, 11:09
This particular server comes with a Gold Xeon which doesn't get much higher, but the Platinum Xeon which has the very same specs as the one I've got (same cache, same cores/threads) actually does get up to 3 GHz, but I wonder if it can sustain this load with such a high clock... (I don't think it can, though, but I don't have it so we'll never know).
Yeah, there are some more expensive 28C SKUs; they have a higher TDP (i.e. higher sustained frequencies) and/or support for octa-socket configurations. But not all servers can be configured with dual-socket 200 W+ models.

Judging from the lineup, yeah, the 6238R seems to offer the best value of the 28C models (the 200 W model, the 6258R, has a 50% price increase lol).

benwaggoner
9th July 2021, 16:20
That's actually higher than I expected! Xeons usually hit base clock under full load (2.2 for the Gold 6238R, it seems). I wonder if Intel has improved AVX512 on Cascade Lake. I know that they have on Ice Lake, but the test I did on Skylake-SP had a pretty large penalty (almost 500 MHz).
This is a really good question! Anyone know any details here? We saw something similar with AVX2 where the first implementation had really bad thermal throttling that impaired its utility, but later implementations were able to throttle less and thus made AVX2 almost essential to good performance.

excellentswordfight
9th July 2021, 19:49
This is a really good question! Anyone know any details here? We saw something similar with AVX2 where the first implementation had really bad thermal throttling that impaired its utility, but later implementations were able to throttle less and thus made AVX2 almost essential to good performance.
https://www.computerbase.de/2019-04/intel-cascade-lake-sp-lcc-hcc-xcc-avx-512-takt/

According to Intel it looks like there is about a 400-500 MHz penalty on Cascade Lake. FranceBB has a model from the refresh lineup though, and it could be that vendors have some leeway with the power budget nowadays, but in my experience it's usually pretty strictly enforced. Also note that not all AVX512 loads are equal, and they will not all demand the same amount of power.
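To put that penalty into perspective, here is a rough back-of-the-envelope sketch in Python. It assumes encode speed scales linearly with clock speed, which is a simplification, and the 3.0 GHz AVX2 clock is a made-up example value, not a measurement from this thread. It works out how much more work per clock AVX512 would have to do just to break even with AVX2.

```python
def required_per_clock_gain(avx2_clock_ghz, penalty_ghz):
    """Per-clock speedup AVX512 needs to match AVX2 throughput.

    Assumes throughput is proportional to clock speed (a simplification).
    """
    avx512_clock_ghz = avx2_clock_ghz - penalty_ghz
    if avx512_clock_ghz <= 0:
        raise ValueError("penalty must be smaller than the AVX2 clock")
    return avx2_clock_ghz / avx512_clock_ghz


# Hypothetical all-core AVX2 clock; the penalties are the 400-500 MHz quoted above.
ASSUMED_AVX2_CLOCK_GHZ = 3.0
for penalty in (0.4, 0.5):
    gain = required_per_clock_gain(ASSUMED_AVX2_CLOCK_GHZ, penalty)
    print(f"{penalty:.1f} GHz penalty: AVX512 needs "
          f"{100 * (gain - 1):.1f}% more work per clock to break even")
```

The lower the starting clock, the more a fixed 400-500 MHz drop hurts, which is one reason the same penalty can matter more on high-core-count parts.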

edit.
Can't find specific numbers for 3rd gen SP (Ice Lake), but it should be improved: https://www.servethehome.com/wp-content/uploads/2020/11/Intel-Xeon-Ice-Lake-AVX512-SC20-Higher-res-Update.jpg

Just for reference, I don't get lower than 3.1 GHz on a 32C 7502P under x265 AVX2 load. And the new Milan model increases the base clock by 300 MHz, so I wouldn't be surprised if those can get close to 3.5 GHz. It's insane if you consider what kind of Xeon models were offered a few years ago in that price range; we are talking about double the price/performance, if not more!

https://i.imgur.com/rn6g7n7.jpg

excellentswordfight
16th December 2021, 10:54
I now have access to two very similarly specced servers for a rather good apples-to-apples comparison between the last-gen models from Intel & AMD.

Single instance UHD encoding with x265 3.4+54 (GCC 10.2.0) preset slow

Intel Xeon "Cascade Lake Refresh" 2x 6226R, 16c/32t, 150 W, MSRP 1300 USD (each)
avx256: 4.85 fps
avx512: 4.49 fps

AMD EPYC "Rome" 7502P, 32c/64t, 180 W, MSRP 2300 USD
avx256: 6.26 fps

Two instances for 100% CPU load (running in separate NUMA nodes on Intel)

Intel
avx256: 2x 3.2 fps

AMD
avx256: 2x 3.92 fps
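The numbers above can be turned into ratios with a few lines of Python. This is only arithmetic on the posted figures; the one assumption added here is that the CPU MSRPs (2x 1300 USD for Intel, 2300 USD for AMD) are a fair stand-in for cost, which ignores the rest of the server.

```python
# Figures copied from the benchmark above (fps, MSRP in USD).
intel = {"price_usd": 2 * 1300, "single_avx2": 4.85,
         "single_avx512": 4.49, "two_instances": 2 * 3.2}
amd = {"price_usd": 2300, "single_avx2": 6.26, "two_instances": 2 * 3.92}


def percent_change(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return 100.0 * (new / old - 1.0)


def fps_per_1000_usd(system):
    """Total fps with two instances, per 1000 USD of CPU MSRP (assumed cost basis)."""
    return system["two_instances"] / (system["price_usd"] / 1000.0)


avx512_change = percent_change(intel["single_avx512"], intel["single_avx2"])
single_ratio = amd["single_avx2"] / intel["single_avx2"]
loaded_ratio = amd["two_instances"] / intel["two_instances"]
value_ratio = fps_per_1000_usd(amd) / fps_per_1000_usd(intel)

print(f"Intel, avx512 vs avx256 (single instance): {avx512_change:+.1f}%")
print(f"AMD vs Intel, single instance: {single_ratio:.2f}x")
print(f"AMD vs Intel, two instances: {loaded_ratio:.2f}x")
print(f"AMD vs Intel, fps per 1000 USD of CPU: {value_ratio:.2f}x")
```

Note that this is a single preset and a single source, so the ratios describe this one test rather than the platforms in general.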

DJATOM
16th December 2021, 17:37
I've noticed that x265 somehow ignores the 2nd NUMA node, or at least doesn't utilize it as much as the 1st one.

tonemapped
16th December 2021, 19:19
The concern with AVX512 wasn't that it was never helpful, but that it had a reliable improvement over AVX2 only with 4K HDR --preset veryslow or higher. Which I've seen, and I've also seen improvements doing 8K encoding.

I could see really good cooling helping some (although 83 isn't that cool). And CPU architecture improvements can change these dynamics a lot. AVX2 was in a similar thermal-throttled boat when originally launched, but was much improved a generation or two later.

The AVX2 performance on the 5900X (at stock values with a 360mm radiator) is impressive at 4.35 GHz all-core with temperatures of ~70°C. That's in a room that's ~25°C.

My server is in a colder room and has AVX2 support, but only 4C/4T (Xeon 1225). Performance is reasonable, given its limitations, but encoding takes 4.4x longer than my workstation. Power differences (measured at the wall) are 170W for the server (4U full of drives) and 230W for the workstation. That's a massive saving in power (it's not cheap in the UK like it is in the US).
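For what it's worth, the energy per encode can be estimated from those figures as power times time. The Python sketch below assumes both wall readings were taken under encoding load and that each machine draws that power for the whole encode; the post doesn't say either way, so treat the result as a rough estimate. The one-hour workstation encode is a hypothetical placeholder and cancels out of the ratio.

```python
def energy_wh(power_w, hours):
    """Energy in watt-hours for a constant power draw over a duration."""
    return power_w * hours


WORKSTATION_W = 230.0  # wall power quoted above
SERVER_W = 170.0       # wall power quoted above (4U full of drives)
SLOWDOWN = 4.4         # server takes 4.4x longer per encode

workstation_hours = 1.0  # hypothetical encode duration
server_hours = workstation_hours * SLOWDOWN

workstation_energy = energy_wh(WORKSTATION_W, workstation_hours)
server_energy = energy_wh(SERVER_W, server_hours)
energy_ratio = server_energy / workstation_energy

print(f"Workstation: {workstation_energy:.0f} Wh per encode")
print(f"Server: {server_energy:.0f} Wh per encode")
print(f"Server / workstation energy per encode: {energy_ratio:.2f}x")
```

Under these assumptions the lower wall draw does not automatically mean less energy per encode; if the server is powered on around the clock for its drives anyway, the extra cost of encoding on it would be smaller than this suggests.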