Old 10th April 2018, 13:17   #6001  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 5,493
Quote:
Originally Posted by RieGo View Post
afaik there are some settings that haven't really been adopted to presets. are there any plans to revise presets?
I'm no developer, but ... probably; presets have been revised in x264 several times as well. And in addition, there are also tunings to be revised and added. But first: Satisfy needs of paying customers. Second: e.g. complete spec coverage as much as sensible; etc. etc. The developers will surely face no boredom.
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
Old 11th April 2018, 21:17   #6002  |  Link
Midzuki
Unavailable
 
Join Date: Mar 2009
Location: offline
Posts: 1,465
Quote:
Originally Posted by LigH View Post


307 patches with AVX-512 (and other improved assembly) code uploaded to the developer mailing list. That will take a little while to review.
They are up and running 0_o

https://bitbucket.org/multicoreware/x265/commits/all

Last edited by Midzuki; 11th April 2018 at 21:18. Reason: :-/
Old 11th April 2018, 21:25   #6003  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 5,493
Damn. I waited for the "Re: 0/307 — approved" mail.

Time to build.
_

P.S.: Compiling x265 with AVX-512 support works only for x86-64 architecture targets. A "bailout" for x86 (Win32) architecture targets seems to be missing, so it throws "invalid opcode" errors for the 8-bit depth core where assembler is still enabled.
_

x265 2.7+332-593e63cda903 (Win64)

Support for AVX-512 assembly optimized kernels; remember: enable it manually by adding --asm avx512 to the CLI — and don't fry your CPU...

Only an x86-64 (Win64) build is available; AVX-512 has to be skipped in NASM for x86 (Win32) targets, otherwise compilation breaks completely.

Last edited by LigH; 11th April 2018 at 22:52.
Old 12th April 2018, 01:15   #6004  |  Link
hajj_3
Registered User
 
Join Date: Mar 2004
Posts: 823
anyone with an avx512 capable processor fancy doing benchmarks comparing it to the previous build?
Old 12th April 2018, 02:10   #6005  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 5,493
I already have the feeling that one day x265 will be used as a benchmark for the efficiency of the AVX implementation in a specific CPU, rather than as a benchmark for efficient video encoding ...
Old 12th April 2018, 02:54   #6006  |  Link
Midzuki
Unavailable
 
Join Date: Mar 2009
Location: offline
Posts: 1,465
More AVX-512 code = bigger filesize

Old 12th April 2018, 05:04   #6007  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Germany
Posts: 347
Yep, but in 2018 20.7 MB is still very small and the increase is negligible. Unfortunately I can't test the latest AVX512 instruction set 'cause I have an Intel Xeon E5-2660 v4 that supports AVX2 only, sadly.
I look forward to the benchmarks.
Old 12th April 2018, 05:09   #6008  |  Link
foxyshadis
ангел смерти
 
Join Date: Nov 2004
Location: Lost
Posts: 9,356
I thought having Kaby Lake meant I had them, but nope, servers only. I have one customer who has a brand spanking new Skylake-X server that I can remote into, I should be able to get benchmarks tomorrow.
__________________
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. ~ Ed Howdershelt
Old 12th April 2018, 06:37   #6009  |  Link
Asmodian
Registered User
 
Join Date: Feb 2002
Location: San Jose, California
Posts: 3,200
AVX-512 is faster!

I did some benchmarks using LigH's build x265 2.7+332-593e63cda903 (Win64) above. I used the same build for the AVX2 tests, simply without the "--asm avx512" option.
i9-7900X @ 4.5 GHz all cores, 3.0 GHz mesh/cache, DDR4 4000-17-18-18-41-1T. No AVX2 or AVX-512 multiplier offsets. Max 92 degC package CPU temperature during both veryslow encodes. The faster modes did not saturate all 20 threads.

The source is 1920x1080 8-bit gradient MagicYUV 4:2:0 on an NVMe SSD, encoding to another NVMe SSD. I used the first 1000 frames from Firefly episode 9, which I had already denoised (SMDegrain) and had on my drive.

avs2pipemod.exe -y4mp=1:1 "fireflyshort.avs" | x265_AVX512.exe --input - --y4m -o "D:\temp\fireflyshort.mkv" --asm avx512 --preset veryslow --crf 18.5 --output-depth 10
x265 [info]: HEVC encoder version 2.7+332-593e63cda903
x265 [info]: build info [Windows][GCC 7.3.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512
x265 [info]: Main 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 20 threads

veryslow:
AVX512: encoded 1000 frames in 303.44s (3.30 fps), 4037.41 kb/s, Avg QP:20.64
AVX2: encoded 1000 frames in 335.83s (2.98 fps), 4037.41 kb/s, Avg QP:20.64
medium:
AVX512: encoded 1000 frames in 28.41s (35.20 fps), 3183.67 kb/s, Avg QP:20.46
AVX2: encoded 1000 frames in 30.71s (32.57 fps), 3183.67 kb/s, Avg QP:20.46
veryfast:
AVX512: encoded 1000 frames in 15.47s (64.64 fps), 2769.26 kb/s, Avg QP:20.89
AVX2: encoded 1000 frames in 16.89s (59.20 fps), 2769.26 kb/s, Avg QP:20.89
ultrafast:
AVX512: encoded 1000 frames in 6.86s (145.77 fps), 1398.46 kb/s, Avg QP:25.00
AVX2: encoded 1000 frames in 7.22s (138.41 fps), 1398.46 kb/s, Avg QP:25.00
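
For anyone who prefers relative gains over raw times, here is a minimal Python sketch that turns the encode times above into percentage speedups. The dictionary layout and the `speedup_percent` helper are just illustrative names, not part of x265 or any tool used here; the numbers are copied from the results above.

```python
# Encode times in seconds for 1000 frames, copied from the results above.
results = {
    "veryslow": {"avx512": 303.44, "avx2": 335.83},
    "medium": {"avx512": 28.41, "avx2": 30.71},
    "veryfast": {"avx512": 15.47, "avx2": 16.89},
    "ultrafast": {"avx512": 6.86, "avx2": 7.22},
}


def speedup_percent(avx2_seconds, avx512_seconds):
    """Throughput gain of the AVX-512 run over the AVX2 run, in percent."""
    return (avx2_seconds / avx512_seconds - 1.0) * 100.0


for preset, times in results.items():
    gain = speedup_percent(times["avx2"], times["avx512"])
    print(f"{preset:<10} {gain:5.1f}% faster with --asm avx512")
```

On these numbers the gain works out to roughly 5% (ultrafast) up to roughly 11% (veryslow), which fits the note that the faster presets did not saturate all 20 threads.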

Thanks to everyone who works on x265 and thanks for the regular builds LigH.
__________________
madVR options explained

Last edited by Asmodian; 12th April 2018 at 07:12.
Old 12th April 2018, 10:41   #6010  |  Link
excellentswordfight
Lost my old account :(
 
Join Date: Jul 2017
Posts: 31
Slower here.

Using LigH's build on a Dell 2U rack server with a Xeon Gold 6126 (12c/24t). CPU utilization dropped by about 10% (both for 1080p and 2160p) and clock speed dropped from 2.9 GHz to 2.4 GHz. I'm guessing that the gains from AVX512 didn't outweigh the drop in clock speed and utilization.

Tears of Steel source (10-bit UHD Blu-ray compatible x265 source for the 2160p test, 8-bit Blu-ray compatible x264 source for 1080p)

2160p with avx512: 80-90% CPU usage, 2.28 fps
Code:
--asm avx512 --preset slow --profile main10 --level-idc 51 --crf 22

2160p: 100% CPU usage, 2.36 fps
Code:
--preset slow --profile main10 --level-idc 51 --crf 22

1080p with avx512: 45-55% CPU usage, 6.54 fps
Code:
--asm avx512 --preset slow --profile main10 --level-idc 41 --crf 18

1080p: 55-65% CPU usage, 7.14 fps
Code:
--preset slow --profile main10 --level-idc 41 --crf 18

Last edited by excellentswordfight; 12th April 2018 at 11:58.
Old 12th April 2018, 11:10   #6011  |  Link
WhatZit
Registered User
 
Join Date: Aug 2016
Posts: 55
Quote:
Originally Posted by excellentswordfight View Post
I'm guessing that the gains for AVX512 didn't outweigh the drop in clockspeed and utilization.
Yep, a Catch-22 also discovered by Cloudflare after some cryptography assessments: https://blog.cloudflare.com/on-the-d...uency-scaling/
Old 12th April 2018, 12:47   #6012  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 9,312
Asmodian runs without an AVX512 offset, which would instantly crash his system if a strong AVX512 workload were to run, so clearly it's faster with some "light" AVX512 usage. Usually you need at least a -10 offset or so to get it working stably under strong AVX512 load (or boost voltages substantially, for more heat). Non-OCed Xeon CPUs probably downclock quite substantially.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 12th April 2018, 14:18   #6013  |  Link
Barough
Registered User
 
Join Date: Feb 2007
Location: Sweden
Posts: 291
x265 v2.7+337-54ff74d2b635 (GCC 7.3.0, 32 & 64-bit 8/10/12bit Multilib Windows Binaries)

Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default
Old 12th April 2018, 14:26   #6014  |  Link
burfadel
Registered User
 
Join Date: Aug 2006
Posts: 2,235
Probably best to utilise AVX-512 where it gives the best gains without triggering thermal throttling. The good thing, at least, is that with 307 separate patches this can be whittled down. If a function is frequently used and gives only a small gain, encoding may actually be faster with it omitted, due to the throttling the patch causes. Even if throttling isn't triggered on a particular rig, the temperature difference should be taken into account to cover typical situations.
Old 12th April 2018, 17:34   #6015  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 336
I tested Barough's build in comment #6013 on an i9-7900X at Intel POR settings.

Code:
Processor: i9-7900X
OS: Windows 10 1709
Memory: 4x DDR4-3200 CL 16

Extended processor information:
Turbo frequency: 4.0 / 3.6 / 3.3 GHz
PL1: 140 W
PL2: 168 W
PL1tau: 1 s

Input: 1280x720 4:4:4 10-bit, 2160 frames
Encoder settings: --preset veryslow --output-depth 10 --crf 18

AVX2: --asm avx2
Speed: 2.36 fps
Power: 92 W
Efficiency: 0.026 fps/W

AVX-512: --asm avx512
Speed: 2.45 fps
Power: 87 W
Efficiency: 0.028 fps/W
The memory is non-JEDEC (DDR4-2666 CL 17), but it should not matter for video encoding. After many months(?) of work, AVX-512 has managed to deliver a meager 3.8% speedup. The efficiency story is slightly better, at almost 10% more frames/W, but since the power never came close to TDP, efficiency is of little importance.
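
The two percentages can be re-derived from the measured speed and power figures with a short Python sketch. The `fps_per_watt` helper and the dictionary keys are illustrative names only; the inputs are the fps and watt values from the table above.

```python
def fps_per_watt(fps, watts):
    """Encoding efficiency as frames per second per watt."""
    return fps / watts


# Measured values copied from the table above.
avx2 = {"fps": 2.36, "watts": 92}
avx512 = {"fps": 2.45, "watts": 87}

# Relative gains of the AVX-512 run over the AVX2 run, in percent.
speed_gain = (avx512["fps"] / avx2["fps"] - 1.0) * 100.0
efficiency_gain = (fps_per_watt(**avx512) / fps_per_watt(**avx2) - 1.0) * 100.0

print(f"speed gain:      {speed_gain:.1f}%")
print(f"efficiency gain: {efficiency_gain:.1f}%")
```

Note that the "almost 10%" comes from the unrounded ratios; dividing the rounded table values 0.028 and 0.026 would suggest a smaller gain.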

Quote:
Originally Posted by burfadel View Post
Probably best to utilise AVX-512 where it gives the best gains without triggering thermal throttling. The good thing, at least, is that with 307 separate patches this can be whittled down. If a function is frequently used and gives only a small gain, encoding may actually be faster with it omitted, due to the throttling the patch causes. Even if throttling isn't triggered on a particular rig, the temperature difference should be taken into account to cover typical situations.
It's actually the opposite. They need to pack all the AVX-512 functions so that they run back-to-back, or else the whole application will be frequency-throttled, yet gain no benefit. It takes on the order of 1 ms to change AVX states on the processor.

Last edited by Stephen R. Savage; 12th April 2018 at 23:27.
Old 12th April 2018, 17:52   #6016  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 5,493
x265 2.7+337-54ff74d2b635
  • Merge with default; prep for v3.0
  • Support for HLG-graded content and pic_struct
  • Fix conditions for single-sei NAL
  • Fix 32 bit build error (means: AVX-512 support is only included in x86-64 architecture target)
(VMAF support to report per frame and aggregate VMAF score — unfortunately not yet? available for Windows builds)

New CLI parameters:

Code:
   --atc-sei <integer>           Emit the alternative transfer characteristics SEI message where the integer is the preferred transfer characteristics. Default disabled
   --pic-struct <integer>        Set the picture structure and emits it in the picture timing SEI message. Values in the range 0..12. See D.3.3 of the HEVC spec. for a detailed explanation.
Old 12th April 2018, 17:57   #6017  |  Link
Asmodian
Registered User
 
Join Date: Feb 2002
Location: San Jose, California
Posts: 3,200
Quote:
Originally Posted by nevcairiel View Post
Asmodian runs without an AVX512 offset, which would instantly crash his system if a strong AVX512 workload were to run, so clearly it's faster with some "light" AVX512 usage. Usually you need at least a -10 offset or so to get it working stably under strong AVX512 load (or boost voltages substantially, for more heat). Non-OCed Xeon CPUs probably downclock quite substantially.
I had downclocked from my normal max clocks when running without an AVX offset.

I also ran some tests at my normal OC settings with -2, -4 multiplier offsets. 4.8 GHz max core, 4.6 GHz AVX2, 4.4 GHz AVX-512.

AVX512: encoded 1000 frames in 310.46s (3.22 fps), 4037.41 kb/s, Avg QP:20.64
AVX2: encoded 1000 frames in 335.85s (2.98 fps), 4037.41 kb/s, Avg QP:20.64

It would probably still melt with a heavy AVX-512 load but it also wasn't completely maxed. AVX-512 ran cooler than AVX2 at these settings. I am not sure why my AVX2 run only had the same speed as the previous 4.5 GHz encode, maybe a latency penalty due to the core changing states.

This is a binned, delidded, and water cooled CPU... other systems may have different results.

Edit: If I run Prime95 (p95v294b8) with AVX-512 at 4.5 GHz I do get thermal throttling.

Last edited by Asmodian; 12th April 2018 at 19:09.
Old 12th April 2018, 20:31   #6018  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 336
Quote:
Originally Posted by nevcairiel View Post
Asmodian runs without an AVX512 offset, which would instantly crash his system if a strong AVX512 workload were to run, so clearly it's faster with some "light" AVX512 usage. Usually you need at least a -10 offset or so to get it working stably under strong AVX512 load (or boost voltages substantially, for more heat). Non-OCed Xeon CPUs probably downclock quite substantially.
Anandtech has the Xeon Scalable turbo frequencies for each AVX state: https://www.anandtech.com/show/11544...f-the-decade/8

At a glance, it's clear that most Xeon Gold/Platinum SKUs have a 20% AVX-512 vs AVX penalty, which is far more than the x265 "optimization" achieves. The high frequency client parts (Core-X/Xeon-W) have a lower penalty of only 10%, where the new x265 just barely comes out ahead. Another 10% frequency loss would reverse the gains, which matches up with what excellentswordfight found.
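
The trade-off can be sketched with a back-of-the-envelope model in Python. It rests on loud assumptions: throughput scales linearly with clock frequency, the whole encode runs at the AVX-512 clock, and the per-clock gain is backed out from the 3.8% net speedup at a roughly 10% frequency penalty. The function names are hypothetical and a real encode will not follow this model exactly.

```python
def net_change_percent(per_clock_gain, freq_penalty):
    """Net throughput change in percent, assuming speed scales linearly with clock."""
    return ((1.0 + per_clock_gain) * (1.0 - freq_penalty) - 1.0) * 100.0


def break_even_penalty(per_clock_gain):
    """Frequency penalty at which the per-clock gain is exactly cancelled."""
    return 1.0 - 1.0 / (1.0 + per_clock_gain)


# Assumption: 3.8% net gain was observed at a ~10% AVX-512 frequency penalty,
# so the implied gain per clock cycle is about 15%.
per_clock_gain = 1.038 / (1.0 - 0.10) - 1.0

for penalty in (0.10, 0.20):
    change = net_change_percent(per_clock_gain, penalty)
    print(f"{penalty:.0%} frequency penalty -> {change:+.1f}% net speed")

print(f"break-even penalty: {break_even_penalty(per_clock_gain):.1%}")
```

Under these assumptions a 20% penalty gives a net loss of around 8%, in the same ballpark as the drops excellentswordfight measured on the Xeon Gold 6126 (2.36 to 2.28 fps at 2160p, 7.14 to 6.54 fps at 1080p).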
Old 12th April 2018, 20:57   #6019  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 9,312
Quote:
Originally Posted by Asmodian View Post
Edit: If I run Prime95 (p95v294b8) with AVX-512 at 4.5 GHz I do get thermal throttling.
Try with LinX/Linpack and see your system die. Prime95 does not fully use AVX512 yet (only trial factoring, not full FFTs)
Old 12th April 2018, 21:06   #6020  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 336
Quote:
Originally Posted by nevcairiel View Post
Try with LinX/Linpack and see your system die. Prime95 does not fully use AVX512 yet (only trial factoring, not full FFTs)
It's actually not so bad at higher frequencies, because each 100 MHz increment saves a lot more power, compared to 2.5 GHz server SKUs. i9-7900X can reach 4.1-4.2 GHz AVX-512 frequency with an aftermarket cooling solution.