build x265 for Ryzen
ghostshadow
3rd March 2023, 15:33
Hi, are there any options to enable when compiling the x265 sources to optimize for Ryzen 5xxx CPUs?
Ditto for the Intel 13xxx series.
THANKS
microchip8
3rd March 2023, 16:11
if using GCC, add -mtune=znver3
for Intel, add -mtune=alderlake
ghostshadow
3rd March 2023, 19:36
thanks, but i am using visual studio 2019
john33
3rd March 2023, 22:43
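In the project properties, set 'Enable Enhanced Instruction Set' to Intel(R) Advanced Vector Extensions 2 (/arch:CORE-AVX2). That entry is the Intel compiler's wording; with the stock MSVC toolset the equivalent setting is /arch:AVX2.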
ghostshadow
4th March 2023, 09:19
OK thanks.
Another question :
How do we compile x264 this time, but with the libraries libswscale, libavformat, ffmpegsource and lsmash?
I'm using msys64 on windows to compile the x264.
Thanks in advance
Selur
4th March 2023, 10:07
You might want to look at https://github.com/m-ab-s/media-autobuild_suite it does compile x264 and x265 with all those dependencies.
ghostshadow
4th March 2023, 20:29
You might want to look at https://github.com/m-ab-s/media-autobuild_suite it does compile x264 and x265 with all those dependencies.
Thanks, the auto script 'media-autobuild_suite.bat' works fine.
But if I want to make my own build from a source tree that I download (the git of an x264 mod), I find myself in the same situation.
In the source folder of x264 I type this :
./configure --build=$MINGW_CHOST --host=$MINGW_CHOST --prefix=$LOCALDESTDIR --bindir=$LOCALDESTDIR/bin-video --enable-static --enable-pic --extra-cflags="-I/c/media-autobuild_suite-master/local64/include" --extra-ldflags=-L/c/media-autobuild_suite-master/local64/lib
But the result is not good: it does not integrate libswscale, libavformat, etc.
./configure --build=$MINGW_CHOST --host=$MINGW_CHOST --prefix=$LOCALDESTDIR --bindir=$LOCALDESTDIR/bin-video --enable-static --enable-pic --extra-cflags="-I/c/media-autobuild_suite-master/local64/include" --extra-ldflags=-L/c/media-autobuild_suite-master/local64/lib
Unknown option --build=x86_64-w64-mingw32, ignored
platform: X86_64
byte order: little-endian
system: WINDOWS
cli: yes
libx264: internal
shared: no
static: yes
bashcompletion: no
asm: yes
interlaced: yes
avs: yes
lavf: no
ffms: no
mp4: no
gpl: yes
thread: win32
opencl: yes
filters: resize crop select_every
lto: no
debug: no
gprof: no
strip: no
PIC: yes
bit depth: all
chroma format: all
thanks
microchip8
4th March 2023, 21:06
Try with --enable-lavf
ghostshadow
4th March 2023, 21:09
Try with --enable-lavf
Thanks, but this does not work:
./configure --build=$MINGW_CHOST --host=$MINGW_CHOST --prefix=$LOCALDESTDIR --bindir=$LOCALDESTDIR/bin-video --enable-static --enable-pic --enable-lavf --extra-cflags="-I/c/media-autobuild_suite-master/local64/include" --extra-ldflags=-L/c/media-autobuild_suite-master/local64/lib
Unknown option --build=x86_64-w64-mingw32, ignored
Unknown option --enable-lavf, ignored
ghostshadow
9th March 2023, 19:11
It's all good: I managed to build my codecs by modifying the media-autobuild_suite-master script.
On the other hand, with my x265 build where I integrated AviSynth, I can't use the bat script that I had when I used avs2yuv64.exe:
@echo off
echo Encoding movie
echo encoding start %date% %time%
@echo on
cd /D "C:\Program Files\CreaVideo\proEncod"
start "" /b x265-x64-v3.5+20+10-aMod-gcc12.2.0 --pass 1 --bitrate 35303 --stats "D:\Log265\TestQP.stats" --output-depth 10 --profile main10 --no-slow-firstpass --level-idc 5.1 --high-tier --rd 5 --rskip 2 --sao --limit-sao --selective-sao 4 --cutree --qpstep 4 --ctu 64 --deblock -3:-3 --min-cu-size 8 --bframes 9 --b-adapt 2 --rc-lookahead 60 --lookahead-slices 1 --hist-scenecut --hist-threshold 0.03 --ref 5 --limit-refs 2 --merange 57 --subme 7 --rect --amp --max-merge 5 --aq-mode 4 --aq-strength 0.90 --tu-intra 4 --tu-inter 4 --limit-tu 4 --me 3 --hdr10 --hdr10-opt --master-display G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,0) --min-luma 0 --max-luma 1023 --max-cll 1000,638 --gop-lookahead 0 --weightp --weightb --repeat-headers --aud --hrd --qcomp 0.60 --rdoq-level 2 --qblur 0.6 --psy-rd 2.10 --psy-rdoq 1.35 --vbv-maxrate 160000 --vbv-bufsize 160000 --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020nc --output NUL "C:\AVS265\TestQP.avs" - 2> "D:\Log265\TestQP.log"
echo encoding end %date% %time%
timeout 30
exit
I get this warning: x265 [warning]: extra unused command arguments given <->
HD MOVIE SOURCE
15th March 2023, 03:57
if using GCC, add -mtune=znver3
for Intel, add -mtune=alderlake
Just wondering, what exactly does this do? Would this work on other x265 encoding platforms like Handbrake or Vidcoder, for instance?
Does this take advantage of more hardware options?
microchip8
15th March 2023, 07:17
Just wondering, what exactly does this do? Would this work on other x265 encoding platforms like Handbrake or Vidcoder, for instance?
Does this take advantage of more hardware options?
It optimizes code, where possible, for the specific CPU architecture. Most of the time, this results in a slightly faster program compared to a generically compiled one.
Yes, it does work with other programs, as long as they are compiled with that flag. It's just a compiler optimization.
HD MOVIE SOURCE
16th March 2023, 18:47
It optimizes code, where possible, for the specific CPU architecture. Most of the time, this results in a slightly faster program compared to a generically compiled one.
Yes, it does work with other programs, as long as they are compiled with that flag. It's just a compiler optimization.
Okay, good to know, thanks for the response. I have an AMD CPU, how do I know which version I'm using like 1,2 or 3?
microchip8
16th March 2023, 20:58
Okay, good to know, thanks for the response. I have an AMD CPU, how do I know which version I'm using like 1,2 or 3?
Ryzen 1xxx is znver1
Ryzen 2xxx / 3xxx is znver2
Ryzen 5xxx is znver3
Or read the man page of GCC.
ghostshadow
16th March 2023, 22:07
Hello, for me, on an Alder Lake laptop I gained 0.5 fps in x264 with an AVS denoise filter (6.15 fps).
On the Zen 3 Ryzen PC I gain between 0.5 and 0.8 fps in x265 (5.2 fps).
To compile for the full Zen 3 architecture, for example, you have to use: -march=znver3 -mtune=znver3
But in this case the binary will only run on CPUs that support the Zen 3 instruction set.
If you use only -mtune=znver3, the compiler does not generate any code that cannot run on the default machine.
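The difference between the two flags can be sketched as a small helper that assembles the compiler flags for a build. This is only an illustration: the function name gcc_cpu_flags and its portable parameter are made up here, and the resulting string would still have to be handed to the build system (for example through the standard CMake variable CMAKE_CXX_FLAGS). GCC documents -march=X as implying -mtune=X, so the pair above is equivalent to -march=znver3 alone.

```python
def gcc_cpu_flags(cpu, portable=True):
    """Return GCC flags for a CPU target (hypothetical helper, not part of any tool)."""
    if portable:
        # -mtune only changes scheduling and cost decisions; the instruction
        # set stays at the compiler's default, so the binary runs elsewhere too.
        return ["-mtune=" + cpu]
    # -march lets the compiler emit that CPU's instructions (and implies -mtune),
    # so the binary may crash on CPUs that lack them.
    return ["-march=" + cpu, "-mtune=" + cpu]


print(" ".join(gcc_cpu_flags("znver3")))
print(" ".join(gcc_cpu_flags("znver3", portable=False)))
```

The portable variant is the safer default when the build will be shared with other people.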
~ VEGETA ~
17th March 2023, 21:39
i have ryzen 7900x, which version should i use? and what is the best website for it?
any cli command to use which increases performance?
microchip8
17th March 2023, 22:11
i have ryzen 7900x, which version should i use? and what is the best website for it?
any cli command to use which increases performance?
-mtune=znver4
it is only present in GCC 13 and above
~ VEGETA ~
17th March 2023, 22:56
-mtune=znver4
it is only present in GCC 13 and above
can you point me to the website page where i can download such version?
microchip8
17th March 2023, 23:29
can you point me to the website page where i can download such version?
I can point you to the site of GCC but it only offers raw code. You'll have to compile it yourself. I do not know who offers Windows/MinGW binaries for GCC. There are some here on this forum
filler56789
18th March 2023, 01:05
can you point me to the website page where i can download such version?
http://www.msystem.waw.pl/x265/
NOTICE: experimental stuff.
Play at your own risk :-/
HD MOVIE SOURCE
18th March 2023, 05:21
Ryzen 1xxx is znver1
Ryzen 2xxx / 3xxx is znver2
Ryzen 5xxx is znver3
Or read the man page of GCC.
I appreciate it thank you.
ghostshadow
18th March 2023, 12:19
-mtune=znver4
it is only present in GCC 13 and above
-march=znver4 -mtune=znver4
john33
19th March 2023, 15:01
can you point me to the website page where i can download such version?
And at Winlibs, here: https://github.com/brechtsanders/winlibs_mingw/releases/
~ VEGETA ~
19th March 2023, 20:24
And at Winlibs, here: https://github.com/brechtsanders/winlibs_mingw/releases/
I mean best x265 gcc build suitable for ryzen 7900x.
microchip8
20th March 2023, 15:43
I mean best x265 gcc build suitable for ryzen 7900x.
I don't think you'll find a build tuned for your Zen 4 CPU yet. GCC 13, which supports tuning for znver4, is not officially out yet (though that doesn't say much), and I haven't seen any of the local people who offer Windows builds provide a Zen 4 optimized version of x265.
I'm afraid you'll have to compile it yourself and use Zen 4 optimization flags.
~ VEGETA ~
20th March 2023, 20:54
I don't think you'll find a build tuned for your Zen 4 CPU yet. GCC 13, which supports tuning for znver4, is not officially out yet (though that doesn't say much), and I haven't seen any of the local people who offer Windows builds provide a Zen 4 optimized version of x265.
I'm afraid you'll have to compile it yourself and use Zen 4 optimization flags.
ok fine, but i am ok with the next best choice.
this is the website I use currently: http://msystem.waw.pl/x265/
I get AVX2 GCC 12.2 version (64-bit, 10-bit).
does it offer that ryzen 4 cli command?
HD MOVIE SOURCE
21st March 2023, 04:03
When looking for a CPU for encoding, is there a core limit? Or does x265 take advantage of as many cores as you can throw at it? I've heard that it might not be as efficient with more cores, however my goal is to understand whether more cores are better, or to look at CPUs that have faster single-thread performance.
So if you had an Intel Core i9-13900KS, which has great single-core performance, would something that is slower but has double the number of cores finish encodes faster?
ghostshadow
21st March 2023, 10:07
Intel Core i9-13900K Review encoding
https://www.techpowerup.com/review/intel-core-i9-13900k/16.html
benwaggoner
21st March 2023, 17:14
When looking for a CPU for encoding, is there a core limit? Or does x265 take advantage of as many cores as you can throw at it? I've heard that it might not be as efficient with more cores, however my goal is to understand whether more cores are better, or to look at CPUs that have faster single-thread performance.
So if you had an Intel Core i9-13900KS, which has great single-core performance, would something that is slower but has double the number of cores finish encodes faster?
Yes, absolutely. x265 can scale up to lots of cores. The max useful cores/threads depends on frame size and frame threading parameters. 4K can easily saturate 8 physical cores even with a single frame thread. I'd think that would be the absolute minimum I'd want in an encoding box.
Strong single-thread performance is really important, of course, and encoding speed scales linearly with it, while more cores are additive but not linear, and don't help at all past a certain point. With consumer CPUs, 2x the cores will still be a lot faster even if each core is 20% slower.
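A rough way to put numbers on that trade-off is Amdahl's law. The sketch below is a textbook model swapped in for illustration, not a measurement of x265: the function name and the 95% parallel fraction are assumptions, and the real figure depends on resolution and threading settings.

```python
def relative_throughput(cores, core_speed, parallel_fraction):
    """Amdahl's-law throughput relative to one core running at speed 1.0."""
    if cores < 1 or core_speed <= 0 or not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("invalid arguments")
    serial_fraction = 1.0 - parallel_fraction
    # Per-core speed scales everything linearly; extra cores only shrink
    # the parallel part of the work.
    return core_speed / (serial_fraction + parallel_fraction / cores)


P = 0.95  # assumed share of the encode that parallelizes

fast_few = relative_throughput(cores=8, core_speed=1.0, parallel_fraction=P)
slow_many = relative_throughput(cores=16, core_speed=0.8, parallel_fraction=P)

print(f"8 cores at full speed:   {fast_few:.2f}x")
print(f"16 cores, each 20% slower: {slow_many:.2f}x")
print(f"ratio: {slow_many / fast_few:.2f}")
```

Lowering the assumed parallel fraction shrinks the advantage of the larger chip, which matches the point that extra cores stop helping past a certain count.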
benwaggoner
21st March 2023, 17:15
Intel Core i9-13900K Review encoding
https://www.techpowerup.com/review/intel-core-i9-13900k/16.html
Impressive!
I'm really curious to see how Zen 4 will stack up.
ghostshadow
21st March 2023, 19:46
Impressive!
I'm really curious to see how Zen 4 will stack up.
The Zen 4 result (Ryzen 9 7950X) is in that review.
benwaggoner
22nd March 2023, 16:17
The Zen 4 result (Ryzen 9 7950X) is in that review.
Oh, right. I meant the Zen 4 version of Epyc more specifically.
Has anyone toyed with --pools on Zen 4 to see how much efficiency improves by pinning to a single NUMA node? I'd think that Zen 4 chiplet NUMA would have less of a perf hit than between Xeon sockets, but I have no idea what it actually would be.
~ VEGETA ~
23rd March 2023, 23:06
http://www.msystem.waw.pl/x265/
NOTICE: experimental stuff.
Play at your own risk :-/
i used the gcc13 avx2 latest one.
however, could not find such option for ryzen cpu? can you elaborate on which version is fastest for me?
thanks
guest
24th March 2023, 01:24
i used the gcc13 avx2 latest one.
however, could not find such option for ryzen cpu? can you elaborate on which version is fastest for me?
thanks
Well, that's new :)
You could also try this one :-
x265 v3.5+97 (https://forum.doom9.org/showthread.php?p=1984890#post1984890)
I have both a 13900KF & Ryzen 7950X, and in my experience, the 7950X is slightly faster with x265 encodes (using RipBot264).
I have tried these so called AVX2 & Ryzen Zen3 optimised builds and there is no noticeable improvement..
I also have the avx512 switch in my x265 command line...
HD MOVIE SOURCE
21st April 2023, 19:21
Yes, absolutely. x265 can scale up to lots of cores. The max useful cores/threads depends on frame size and frame threading parameters. 4K can easily saturate 8 physical cores even with a single frame thread. I'd think that would be the absolute minimum I'd want in an encoding box.
Strong single-thread performance is really important, of course, and scales linearly while more cores are additive, but not linear, and don't help at all past a certain point. With consumer CPUs, 2x the CPUs even if each is 20% slower will still be a lot faster.
When looking for a PC to buy would you absolutely buy a workstation to encode on? Can you buy workstations pre-built? Because I don't know how to build a PC.
benwaggoner
21st April 2023, 21:48
When looking for a PC to buy would you absolutely buy a workstation to encode on? Can you buy workstations pre-built? Because I don't know how to build a PC.
Yeah, the major PC manufacturers sell high performance workstations, both pre-built and configurable.
The component prices can have a huge markup, but sometimes worth it to ensure everything works reliably together.
It is surprisingly easy to get into five figure prices.
RanmaCanada
22nd April 2023, 03:18
When looking for a PC to buy would you absolutely buy a workstation to encode on? Can you buy workstations pre-built? Because I don't know how to build a PC.
As Ben said, yes you absolutely can buy workstations. The issue is they are insanely expensive. I know Lenovo had Threadripper Pro workstations in the past, and Dell has started selling them as well. The issue is cost. As stated, they can easily get to 5 figures, and at that point you're better off to buy multiple high end desktop units and use software like Ripbot for distributed encoding. For example, Dell sells a 64 core 5995wx workstation for $21000 CDN (https://www.dell.com/en-ca/shop/workstations/precision-7865-tower-workstation-configurable/spd/precision-7865-workstation/xctopt7865ca_vp?configurationid=7ad24f68-cfd5-4552-901a-dfc1c6d9e205) (16 gigs of Ram and a T420 potato graphics with a 256gb SSD). Which is insane. You can easily buy 4 16 core systems for less than that and then use Ripbot. Yes you will use more power in the long run, but how long will it take you to use $8-11k in power? Assuming of course you are spending $2500-$3000 per system.
HD MOVIE SOURCE
22nd April 2023, 03:38
As Ben said, yes you absolutely can buy workstations. The issue is they are insanely expensive. I know Lenovo had Threadripper Pro workstations in the past, and Dell has started selling them as well. The issue is cost. As stated, they can easily get to 5 figures, and at that point you're better off to buy multiple high end desktop units and use software like Ripbot for distributed encoding. For example, Dell sells a 64 core 5995wx workstation for $21000 CDN (https://www.dell.com/en-ca/shop/workstations/precision-7865-tower-workstation-configurable/spd/precision-7865-workstation/xctopt7865ca_vp?configurationid=7ad24f68-cfd5-4552-901a-dfc1c6d9e205) (16 gigs of Ram and a T420 potato graphics with a 256gb SSD). Which is insane. You can easily buy 4 16 core systems for less than that and then use Ripbot. Yes you will use more power in the long run, but how long will it take you to use $8-11k in power? Assuming of course you are spending $2500-$3000 per system.
WOW, seriously? Those are crazy prices. So what is it like encoding on systems like this? Can you encode a 2 hour movie using Placebo on x265 in a day?
Can you rent ultra high end workstations in a cloud? Encode and perform all the tasks you need, controlled by your own PC. Get a job done super quick, and then send yourself the file once you've finished.
Because it seems that computing power really holds back encoding.
What about CPU capabilities like MMX2? I know there's a bunch of others I see when I encode. Does this help non-workstations catch up a little? How do I know which CPUs have all the CPU capabilities that x265 needs?
RanmaCanada
22nd April 2023, 05:49
WOW, seriously? Those are crazy prices. So what is it like encoding on systems like this? Can you encode a 2 hour movie using Placebo on x265 in a day?
Can you rent ultra high end workstations in a cloud? Encode and perform all the tasks you need, controlled by your own PC. Get a job done super quick, and then send yourself the file once you've finished.
Because it seems that computing power really holds back encoding.
What about CPU capabilities like MMX2? I know there's a bunch of others I see when I encode. Does this help non-workstations catch up a little? How do I know which CPUs have all the CPU capabilities that x265 needs?
To answer your last question, every modern CPU will have everything that x265 "needs", with the caveat being AVX-512: some chips have it, some don't. You could always rent a VPS or another system like AWS, but as I understand it the costs there are usually tied to the number of cores available and/or the time used, and can get very high again.
I'll be honest, most of what you are asking is way beyond my area of expertise, so hopefully someone comes in to further expand on, and/or correct, anything I've written in error.
The last time encoding on 64 cores was tested, it was slower than on a 16 core (https://www.anandtech.com/show/16478/64-cores-of-rendering-madness-the-amd-threadripper-pro-3995wx-review/5). And in older tests, 32 cores was the sweet spot (https://singhkays.com/blog/x265-128-core-scaling-4k-hevc-hdr-azure-vm/).
But for current results, there are some users here with access to ridonkulous hardware; if they could chime in, that would be fantastic.
benwaggoner
23rd April 2023, 23:12
WOW, seriously? Those are crazy prices. So what is it like encoding on systems like this? Can you encode a 2 hour movie using Placebo on x265 in a day?
You're not going to get 2x faster than the fastest i9 at 4x the cost. Workstations like that make sense when the difference between a 3 and a 2 day encode is worth money, or if there's a highly paid operator using it. Spending an extra $20K to improve the efficiency of a $200K/year professional by 20% has a great ROI. Workstations often get used for other stuff, like uncompressed input/output, GPU-accelerated rendering, etc. This is a world where the I/O board can cost $4000, the GPU can cost $6500, the monitor can cost $2000-$30,000, the RAID controller and drives $3000. 128-256 GB of RAM is typical. So even if the bare workstation costs $10K more than a high-end consumer setup, that's still only a fraction of the whole system cost.
Can you rent ultra high end workstations in a cloud? Encode and perform all the tasks you need, controlled by your own PC. Get a job done super quick, and then send yourself the file once you've finished.
Because it seems that computing power really holds back encoding.
Getting source files uploaded to the cloud can be the bottleneck here. The ProRes mezzanine for a long movie can get close to 1 TB. Can be a long upload with asymmetric consumer broadband.
The most cost-effective AWS instance for fast 4K encoding would probably be the c7g.8xlarge, with an on-demand cost of $1.16/hour. So that 2 hour movie in 2 days would cost about $56. Prices can get a lot lower as usage goes up.
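The arithmetic behind that estimate can be checked in a couple of lines. The hourly rate and the 2-day duration are the figures quoted above; the function name is made up for illustration, and real bills would add storage and data transfer, which are not modelled here.

```python
def on_demand_cost(hourly_rate_usd, hours):
    """Cost of keeping one on-demand instance running for `hours` hours."""
    if hourly_rate_usd < 0 or hours < 0:
        raise ValueError("rate and hours must be non-negative")
    return hourly_rate_usd * hours


# c7g.8xlarge at the quoted $1.16/hour, running for 2 days
cost = on_demand_cost(1.16, 2 * 24)
print(f"${cost:.2f}")  # prints $55.68
```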
Spot instances can be a lot cheaper, but aren't great for traditional encoding since each operation can take hours or days. A segmenting encoder could take good advantage of spot pricing, but full quality with VBV compliance requires overlapping segments using --chunk-start and/or --chunk-end.
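The overlapping-segment idea can be sketched as a frame-range planner. This is a minimal sketch under stated assumptions: plan_chunks and its parameters are hypothetical, the ranges are 0-based with exclusive ends, and translating them into actual --seek, --frames, --chunk-start and --chunk-end values is deliberately left out because the exact frame-index convention should be checked against the x265 documentation.

```python
def plan_chunks(total_frames, chunk_len, overlap):
    """Split [0, total_frames) into kept ranges padded by `overlap` frames.

    Returns (read_start, keep_start, keep_end, read_end) tuples, 0-based with
    exclusive ends. The encoder would process read_start..read_end so that
    rate control and lookahead see the neighbouring frames, but only
    keep_start..keep_end would end up in the output of that segment.
    """
    if total_frames <= 0 or chunk_len <= 0 or overlap < 0:
        raise ValueError("total_frames and chunk_len must be > 0, overlap >= 0")
    chunks = []
    for keep_start in range(0, total_frames, chunk_len):
        keep_end = min(keep_start + chunk_len, total_frames)
        read_start = max(keep_start - overlap, 0)        # clamp at first frame
        read_end = min(keep_end + overlap, total_frames)  # clamp at last frame
        chunks.append((read_start, keep_start, keep_end, read_end))
    return chunks


for chunk in plan_chunks(total_frames=1000, chunk_len=400, overlap=48):
    print(chunk)
```

Because the kept ranges tile the whole clip without gaps, each segment can run on a separate spot instance and the outputs can be concatenated in order afterwards.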
It's good to test how many cores your content and settings can saturate, and then use the smallest instance that can accommodate that. More cores don't make encoding faster beyond that, but do make it more expensive.
What about CPU capabilities like MMX2? I know there's a bunch of others I see when I encode. Does this help non-workstations catch up a little? How do I know which CPUs have all the CPU capabilities that x265 needs?
On AWS you can get the processor names and look up their capabilities. https://aws.amazon.com/ec2/instance-types/#Compute_Optimized.
For x86-64, AVX2 is pretty much required for good perf (and universal). Whether AVX512 is useful depends on settings, resolution, and specific processor.
The ARM-based Graviton2 has received really great NEON etc. optimizations for x265, ffmpeg, and other open source encoder components over the last 18 months, and can deliver performance and price/performance at least as good as x86-64. Graviton3 using the current MASTER branch versions seems about the best price/performance today, and should pull away even more over time as wider SIMD optimization gets implemented for it. AWS has been checking in tons of optimized assembly for ARM.
excellentswordfight
24th April 2023, 15:51
WOW, seriously? Those are crazy prices. So what is it like encoding on systems like this? Can you encode a 2 hour movie using Placebo on x265 in a day?
First of all, x265 doesn't scale indefinitely with more cores; no advanced GOP-based codecs do. At somewhat stock threading settings, 4K gets good utilization up to about 16-24C; to improve speed beyond that, chunk encoding is a necessity. So for single file encoding an expensive 64C model won't help; it will probably be slower, as those models have lower clock speeds.
I encode at work on a CPU comparable to the 5975WX (32C Threadripper). At preset veryslow it gets about 1-1.5 fps, and I can do two simultaneously at about 1 fps. But even professionally, for production I never go slower than preset slower; there just isn't any ROI at that point, and I have yet to encounter anyone using something like x265 placebo in production environments. That's just bad business, as the returns diminish so much at that point. Even if we had hw where the raw throughput was usable, I doubt I would even go to veryslow; slower does indeed already use most of the potential in the encoder. At that point most gains come from content and compression level specific tuning.
benwaggoner
24th April 2023, 20:06
First of all, x265 doesn't scale indefinitely with more cores; no advanced GOP-based codecs do. At somewhat stock threading settings, 4K gets good utilization up to about 16-24C; to improve speed beyond that, chunk encoding is a necessity. So for single file encoding an expensive 64C model won't help; it will probably be slower, as those models have lower clock speeds.
I encode at work on a CPU comparable to the 5975WX (32C Threadripper). At preset veryslow it gets about 1-1.5 fps, and I can do two simultaneously at about 1 fps. But even professionally, for production I never go slower than preset slower; there just isn't any ROI at that point, and I have yet to encounter anyone using something like x265 placebo in production environments. That's just bad business, as the returns diminish so much at that point. Even if we had hw where the raw throughput was usable, I doubt I would even go to veryslow; slower does indeed already use most of the potential in the encoder. At that point most gains come from content and compression level specific tuning.
Great points! I pretty much only go above slower for lower bitrates & resolutions that won't be the long pole in encoding an adaptation set, or for test content that needs to be as accurate as possible.
You're absolutely right that most of the quality improvements beyond slower (and maybe beyond medium) come from content or scenario specific tuning. Last night's test encode had a 1033 character command line, 762 just of parameters excluding paths and file names.
Lucius Snow
26th April 2023, 22:41
I tried x265 (official build) with DEE for Dolby Vision Profile 7 encoding (FEL). I used a Ryzen 3990X (64 cores). That's terribly slow: something like 1 hour for 5 seconds of footage. The CPU usage never exceeded 30%. I couldn't test with znver2 because it doesn't seem to be included in the parameters.