Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
11th April 2018, 21:17 | #6001 | Link | |
Unavailable
Join Date: Mar 2009
Location: offline
Posts: 1,480
|
Quote:
https://bitbucket.org/multicoreware/x265/commits/all Last edited by Midzuki; 11th April 2018 at 21:18. Reason: :-/ |
|
11th April 2018, 21:25 | #6002 | Link |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,783
|
Damn. I waited for the "Re: 0/307 — approved" mail.
Time to build.
P.S.: Compiling x265 with AVX-512 support works only for x86-64 architecture targets. A "bailout" for x86 (Win32) architecture targets seems to be missing, so it throws "invalid opcode" errors for the 8-bit depth core where assembler is still enabled.
x265 2.7+332-593e63cda903 (Win64)
Support for AVX-512 assembly optimized kernels; remember: enable it manually by adding --asm avx512 to the CLI, and don't fry your CPU...
Only an x86-64 (Win64) version is available; skipping it in x86 (Win32) mode is necessary so that NASM does not break compilation completely.
Last edited by LigH; 11th April 2018 at 22:52. |
12th April 2018, 02:10 | #6004 | Link |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,783
|
I already have the feeling that one day, x265 will be used as a benchmark for the efficiency of the AVX implementations in a specific CPU rather than as a benchmark for efficient video encoding ...
|
12th April 2018, 05:04 | #6006 | Link |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,905
|
Yep, but in 2018 20.7 MB is still very small and the increase is negligible. Unfortunately I can't test the latest AVX512 instruction set 'cause I have an Intel Xeon E5-2660 v4 that supports AVX2 only, sadly.
I look forward to benchmarks. |
12th April 2018, 05:09 | #6007 | Link |
Angel of Night
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
|
I thought having Kaby Lake meant I had them, but nope, servers only. I have one customer who has a brand spanking new Skylake-X server that I can remote into, so I should be able to get benchmarks tomorrow.
|
12th April 2018, 06:37 | #6008 | Link |
Registered User
Join Date: Feb 2002
Location: San Jose, California
Posts: 4,407
|
AVX-512 is faster!
I did some benchmarks using LigH's build x265 2.7+332-593e63cda903 (Win64) above. I used the same build for the AVX2 tests, simply without the "--asm avx512" option.
i9-7900X @ 4.5 GHz all cores, 3.0 GHz mesh/cache, DDR4 4000-17-18-18-41-1T. No AVX2 or AVX-512 multiplier offsets. Max 92 degC package CPU temperature during both veryslow encodes. The faster modes did not saturate all 20 threads.
The source is 1920x1080 8-bit gradient MagicYUV 4:2:0 on an NVMe SSD, encoding to another NVMe SSD. I used the first 1000 frames from Firefly episode 9, which I had already denoised (SMDegrain) and had on my drive.
Code:
avs2pipemod.exe -y4mp=1:1 "fireflyshort.avs" | x265_AVX512.exe --input - --y4m -o "D:\temp\fireflyshort.mkv" --asm avx512 --preset veryslow --crf 18.5 --output-depth 10
x265 [info]: HEVC encoder version 2.7+332-593e63cda903
x265 [info]: build info [Windows][GCC 7.3.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512
x265 [info]: Main 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 20 threads
veryslow:
AVX512: encoded 1000 frames in 303.44s (3.30 fps), 4037.41 kb/s, Avg QP:20.64
AVX2: encoded 1000 frames in 335.83s (2.98 fps), 4037.41 kb/s, Avg QP:20.64
medium:
AVX512: encoded 1000 frames in 28.41s (35.20 fps), 3183.67 kb/s, Avg QP:20.46
AVX2: encoded 1000 frames in 30.71s (32.57 fps), 3183.67 kb/s, Avg QP:20.46
veryfast:
AVX512: encoded 1000 frames in 15.47s (64.64 fps), 2769.26 kb/s, Avg QP:20.89
AVX2: encoded 1000 frames in 16.89s (59.20 fps), 2769.26 kb/s, Avg QP:20.89
ultrafast:
AVX512: encoded 1000 frames in 6.86s (145.77 fps), 1398.46 kb/s, Avg QP:25.00
AVX2: encoded 1000 frames in 7.22s (138.41 fps), 1398.46 kb/s, Avg QP:25.00
Thanks to everyone who works on x265 and thanks for the regular builds LigH.
__________________
madVR options explained Last edited by Asmodian; 12th April 2018 at 07:12. |
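To put the numbers from the post above side by side, here is a small Python sketch that parses the x265 summary lines and computes the relative gain per preset. The summary strings are copied from the post; the helper names (`parse_summary`, `speedup_percent`, `all_speedups`) are made up for illustration and are not part of x265.

```python
import re

# Summary lines copied verbatim from the benchmark post above.
RESULTS = {
    "veryslow": {
        "AVX512": "encoded 1000 frames in 303.44s (3.30 fps), 4037.41 kb/s, Avg QP:20.64",
        "AVX2": "encoded 1000 frames in 335.83s (2.98 fps), 4037.41 kb/s, Avg QP:20.64",
    },
    "medium": {
        "AVX512": "encoded 1000 frames in 28.41s (35.20 fps), 3183.67 kb/s, Avg QP:20.46",
        "AVX2": "encoded 1000 frames in 30.71s (32.57 fps), 3183.67 kb/s, Avg QP:20.46",
    },
    "veryfast": {
        "AVX512": "encoded 1000 frames in 15.47s (64.64 fps), 2769.26 kb/s, Avg QP:20.89",
        "AVX2": "encoded 1000 frames in 16.89s (59.20 fps), 2769.26 kb/s, Avg QP:20.89",
    },
    "ultrafast": {
        "AVX512": "encoded 1000 frames in 6.86s (145.77 fps), 1398.46 kb/s, Avg QP:25.00",
        "AVX2": "encoded 1000 frames in 7.22s (138.41 fps), 1398.46 kb/s, Avg QP:25.00",
    },
}

# Hypothetical helper pattern: matches the "encoded N frames in Ts" part only.
SUMMARY_RE = re.compile(r"encoded (\d+) frames in ([\d.]+)s")


def parse_summary(line):
    """Return (frames, seconds) from an x265 'encoded N frames in Ts' line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not an x265 summary line: {line!r}")
    return int(match.group(1)), float(match.group(2))


def speedup_percent(avx2_line, avx512_line):
    """Percent throughput gain of the AVX-512 run over the AVX2 run."""
    frames2, secs2 = parse_summary(avx2_line)
    frames512, secs512 = parse_summary(avx512_line)
    fps2 = frames2 / secs2
    fps512 = frames512 / secs512
    return (fps512 / fps2 - 1.0) * 100.0


def all_speedups():
    """Map each preset to its AVX-512 gain in percent, rounded to 0.1."""
    return {
        preset: round(speedup_percent(runs["AVX2"], runs["AVX512"]), 1)
        for preset, runs in RESULTS.items()
    }


if __name__ == "__main__":
    for preset, gain in all_speedups().items():
        print(f"{preset:>9}: AVX-512 is {gain:.1f}% faster than AVX2")
```

On these figures the gain is roughly 5 to 11 percent depending on the preset, with identical bitrate and average QP in each pair. It is computed from the elapsed time rather than the printed fps, since the fps values are already rounded to two decimals.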
12th April 2018, 10:41 | #6009 | Link |
Lost my old account :(
Join Date: Jul 2017
Posts: 326
|
Slower here.
Using LigH's build with a Dell 2U rack server with a Xeon Gold 6126 (12c/24t). CPU utilization dropped by about 10% (both for 1080p and 2160p) and clock speed dropped from 2.9 GHz to 2.4 GHz. I'm guessing that the gains from AVX512 didn't outweigh the drop in clock speed and utilization.
Tears of Steel source (10-bit UHD Blu-ray compatible x265 source for the 2160p test, 8-bit Blu-ray compatible x264 source for 1080p).
2160p with avx512: 80-90% CPU usage, 2.28 fps
Code:
--asm avx512 --preset slow --profile main10 --level-idc 51 --crf 22
2160p: 100% CPU usage, 2.36 fps
Code:
--preset slow --profile main10 --level-idc 51 --crf 22
1080p with avx512: 45-55% CPU usage, 6.54 fps
Code:
--asm avx512 --preset slow --profile main10 --level-idc 41 --crf 18
1080p: 55-65% CPU usage, 7.14 fps
Code:
--preset slow --profile main10 --level-idc 41 --crf 18
Last edited by excellentswordfight; 12th April 2018 at 11:58. |
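The guess in the post above can be checked with some quick arithmetic: compare how much the frame rate fell with how much the clock fell. A minimal Python sketch, using only the figures reported in the post; the function names are hypothetical, and it assumes the quoted 2.9 GHz and 2.4 GHz are representative averages for each run. It also ignores the drop in CPU utilization, so treat the result as a rough indication only.

```python
# Figures reported in the post above (Xeon Gold 6126).
CLOCK_GHZ = {"avx2": 2.9, "avx512": 2.4}
FPS = {
    "2160p": {"avx2": 2.36, "avx512": 2.28},
    "1080p": {"avx2": 7.14, "avx512": 6.54},
}


def fps_ratio(resolution):
    """AVX-512 frame rate relative to the AVX2 frame rate (1.0 = equal)."""
    return FPS[resolution]["avx512"] / FPS[resolution]["avx2"]


def clock_ratio():
    """AVX-512 clock relative to the AVX2 clock (1.0 = no downclock)."""
    return CLOCK_GHZ["avx512"] / CLOCK_GHZ["avx2"]


def per_clock_ratio(resolution):
    """Frames per clock cycle with AVX-512 relative to AVX2.

    Above 1.0 means more work was done per cycle, even if the encode
    finished later because the clock was lower.
    """
    return fps_ratio(resolution) / clock_ratio()


if __name__ == "__main__":
    print(f"clock: {clock_ratio():.3f}x")
    for resolution in FPS:
        print(
            f"{resolution}: fps {fps_ratio(resolution):.3f}x, "
            f"per clock {per_clock_ratio(resolution):.3f}x"
        )
```

The clock fell by about 17 percent while the frame rate fell by only about 3 percent (2160p) and 8 percent (1080p), so under these assumptions the AVX-512 kernels did more work per cycle, just not enough to make up for the downclock.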
12th April 2018, 11:10 | #6010 | Link | |
Registered User
Join Date: Aug 2016
Posts: 60
|
Quote:
|
|
12th April 2018, 12:47 | #6011 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,348
|
Asmodian runs without an AVX512 offset, which would instantly crash his system if a strong AVX512 workload were to run, so clearly it's faster with some "light" AVX512 usage. Usually you need at least a -10 offset or so to get it working stably under strong AVX512 load (or boost voltages substantially, for more heat). Non-OCed Xeon CPUs probably downclock quite substantially.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
12th April 2018, 14:18 | #6012 | Link |
Registered User
Join Date: Feb 2007
Location: Sweden
Posts: 483
|
x265 v2.7+337-54ff74d2b635 (GCC 7.3.0, 32 & 64-bit 8/10/12bit Multilib Windows Binaries)
Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default |
12th April 2018, 14:26 | #6013 | Link |
Registered User
Join Date: Aug 2006
Posts: 2,229
|
Probably best to utilise AVX-512 where it gives the best gains without triggering thermal throttling. The good thing, at least, is that with 307 separate patches this can be whittled down. If a function is frequently used and gives only a small gain, the encode may actually be faster with that patch omitted, due to the throttling it causes. Even if throttling isn't triggered on a particular rig, the temperature difference should be taken into account to cover typical situations.
|
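The trade-off described in the post above can be sketched with a simple Amdahl-style model. This is an illustration under loud assumptions, not how x265 decides anything: it assumes a fraction `f` of the encode time is spent in the vectorised kernels, that AVX-512 speeds those kernels up by a factor `s`, and that the whole encode then runs at a clock ratio `r` (AVX-512 clock divided by normal clock). All names and example numbers are hypothetical.

```python
import math


def net_speedup(kernel_fraction, kernel_speedup, clock_ratio):
    """Overall speedup of the encode under the simplified model.

    kernel_fraction: share of runtime spent in the AVX-512 kernels (0..1)
    kernel_speedup:  how much faster those kernels get per clock (>= 1)
    clock_ratio:     downclocked frequency / normal frequency (<= 1)
    """
    relative_cycles = (1.0 - kernel_fraction) + kernel_fraction / kernel_speedup
    return clock_ratio / relative_cycles


def break_even_kernel_speedup(kernel_fraction, clock_ratio):
    """Kernel speedup needed just to cancel out the downclock.

    Returns math.inf when the kernels are too small a share of the
    runtime for any speedup to compensate.
    """
    headroom = clock_ratio - (1.0 - kernel_fraction)
    if headroom <= 0.0:
        return math.inf
    return kernel_fraction / headroom


if __name__ == "__main__":
    # Hypothetical example: kernels are 10% of runtime and get 20% faster,
    # but the clock drops to 90%. The encode ends up slower overall.
    print(f"{net_speedup(0.10, 1.20, 0.90):.3f}x")
    # Hypothetical example: kernels are 50% of runtime, clock drops to 80%.
    print(f"{break_even_kernel_speedup(0.50, 0.80):.3f}x needed")
```

In this model a frequently called kernel with a small gain can lose outright once it triggers a downclock, which is the argument for enabling the patches selectively.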
12th April 2018, 17:52 | #6014 | Link |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,783
|
x265 2.7+337-54ff74d2b635
New CLI parameters: Code:
--atc-sei <integer> Emit the alternative transfer characteristics SEI message where the integer is the preferred transfer characteristics. Default disabled
--pic-struct <integer> Set the picture structure and emits it in the picture timing SEI message. Values in the range 0..12. See D.3.3 of the HEVC spec. for a detailed explanation. |
12th April 2018, 17:57 | #6015 | Link | |
Registered User
Join Date: Feb 2002
Location: San Jose, California
Posts: 4,407
|
Quote:
I also ran some tests at my normal OC settings with -2, -4 multiplier offsets. 4.8 GHz max core, 4.6 GHz AVX2, 4.4 GHz AVX-512.
AVX512: encoded 1000 frames in 310.46s (3.22 fps), 4037.41 kb/s, Avg QP:20.64
AVX2: encoded 1000 frames in 335.85s (2.98 fps), 4037.41 kb/s, Avg QP:20.64
It would probably still melt with a heavy AVX-512 load but it also wasn't completely maxed. AVX-512 ran cooler than AVX2 at these settings. I am not sure why my AVX2 run only had the same speed as the previous 4.5 GHz encode, maybe a latency penalty due to the core changing states.
This is a binned, delidded, and water cooled CPU... other systems may have different results.
Edit: If I run Prime95 (p95v294b8) with AVX-512 at 4.5 GHz I do get thermal throttling.
__________________
madVR options explained Last edited by Asmodian; 12th April 2018 at 19:09. |
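For readers unfamiliar with multiplier offsets, the clocks quoted in the post above follow directly from the offsets. A short Python sketch, assuming a 100 MHz base clock (not stated in the post, but consistent with its 4.8 / 4.6 / 4.4 GHz figures); the constant and function names are hypothetical.

```python
BCLK_MHZ = 100  # assumed base clock; the post does not state it
CORE_MULTIPLIER = 48  # 4.8 GHz max core at the assumed base clock
OFFSETS = {"AVX2": -2, "AVX-512": -4}  # multiplier offsets from the post

# Encode times in seconds for the veryslow runs with offsets applied.
ENCODE_SECONDS = {"AVX2": 335.85, "AVX-512": 310.46}


def effective_clock_mhz(core_multiplier, offset):
    """Clock in MHz once the (negative) multiplier offset is applied."""
    return (core_multiplier + offset) * BCLK_MHZ


def gain_percent(baseline_seconds, new_seconds):
    """Percent throughput gain of the new run over the baseline run."""
    return (baseline_seconds / new_seconds - 1.0) * 100.0


if __name__ == "__main__":
    for name, offset in OFFSETS.items():
        print(f"{name}: {effective_clock_mhz(CORE_MULTIPLIER, offset)} MHz")
    gain = gain_percent(ENCODE_SECONDS["AVX2"], ENCODE_SECONDS["AVX-512"])
    print(f"AVX-512 gain with offsets: {gain:.1f}%")
```

With the offsets applied the AVX-512 run is still about 8 percent faster than the AVX2 run, even though it runs 200 MHz lower.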
|
12th April 2018, 20:57 | #6016 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,348
|
Try with LinX/Linpack and see your system die. Prime95 does not fully use AVX512 yet (only trial factoring, not full FFTs).
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
12th April 2018, 21:06 | #6017 | Link |
Registered User
Join Date: Nov 2009
Posts: 327
|
It's actually not so bad at higher frequencies, because each 100 MHz increment saves a lot more power, compared to 2.5 GHz server SKUs. i9-7900X can reach 4.1-4.2 GHz AVX-512 frequency with an aftermarket cooling solution.
|
12th April 2018, 21:37 | #6018 | Link | |
Registered User
Join Date: Dec 2014
Posts: 240
|
Quote:
|
|
12th April 2018, 21:37 | #6019 | Link | |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,348
|
Quote:
But this is probably going a bit off-topic for x265. I would've thought the x265 people had already learned the down-clocking lesson with AVX2 though, where they experienced the same effect: fancy instructions that made the overall encode slower, especially on server systems, due to clock changes.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
|
12th April 2018, 21:44 | #6020 | Link | |
Registered User
Join Date: Jan 2007
Posts: 729
|
https://forums.anandtech.com/threads...#post-39149633
Quote:
|
|
|
|