Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.


Doom9's Forum > Video Encoding > High Efficiency Video Coding (HEVC)

19th April 2024, 12:17   #9281
tormento
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,600
Quote:
Originally Posted by excellentswordfight
But AFAIK the downclocking under AVX512 load has been improved since its introduction, and was less of an issue on consumer desktop models where frequency control is possible. But given what I've seen, I'm pretty sure that running it with AVX512 produces worse performance/watt.
It’s proven that AVX512 has more performance per watt. About downclocking: if you have a powerful heatsink, AFAIK you can disable downclocking in the BIOS.
__________________
@turment on Telegram
19th April 2024, 14:38   #9282
excellentswordfight
Lost my old account :(
Join Date: Jul 2017
Posts: 332
Quote:
Originally Posted by tormento
It’s proven that AVX512 has more performance per watt
Proven? For all systems with all software? It's pretty much a fact that it has been worse in my cases, where enterprise server hardware has been used. These systems are pretty strict about staying within their power specification, so when I get worse performance with AVX512 enabled while it consumes the same amount of power (I have not measured the power draw, mind you, but I'm pretty sure that's the case given the temperature and frequency readings), it means that it has a negative impact on performance/W. There is software where AVX512 instructions can do wonders, boosting performance by a great amount, and there I'm sure it brings performance-per-watt benefits. But for x265 the performance gains of AVX512 are not that big, and at least with older-generation hardware the performance per watt has been lower, so unless you also increase power it has given me no gains.
Quote:
About downclocking: if you have a powerful heatsink, AFAIK you can disable downclocking in the BIOS.
Yes, and that's also why I wrote:

"...less of an issue on consumer desktop models where frequency control is possible"

I have only tested AVX512 on Xeon/server hardware and my old Ice Lake laptop. But as I said, I will try it on Sapphire Rapids. It will be interesting to see how it has improved, because on the first generation, both Xeon and consumer (Ice Lake), the downclocking was rather big.

Last edited by excellentswordfight; 19th April 2024 at 14:53.
19th April 2024, 17:10   #9283
nevcairiel
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,355
AVX512 benefits start if you use it well and thoroughly; using it sparsely or inefficiently can cause more overhead than gain.
Blanket statements never make sense with complex optimizations and complex software. Test your own use-case on your own hardware.
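As a concrete starting point for such a test, here is a minimal Python sketch that builds matched x265 command lines for a few presets, differing only in whether AVX-512 is requested. It assumes an x265 binary on the PATH and a .y4m source (the file name `clip_2160p.y4m` and the helper `x265_ab_commands` are hypothetical), and relies on x265's `--asm avx512` switch, since x265 does not enable its AVX-512 kernels by default. It only builds and prints the commands; time each run yourself (x265 reports the average fps when it finishes) and repeat runs to gauge run-to-run variance.

```python
import shlex
from typing import Dict, List


def x265_ab_commands(source: str, preset: str, crf: int,
                     binary: str = "x265") -> Dict[str, List[str]]:
    """Return two x265 command lines that differ only in AVX-512 usage.

    Assumes a .y4m source, so no raw-input geometry flags are needed.
    """
    base = [binary, "--preset", preset, "--crf", str(crf)]
    return {
        # x265's default CPU detection, which leaves AVX-512 off.
        "default": base + [source, "-o", f"{preset}_default.hevc"],
        # Explicitly opt in to the AVX-512 assembly kernels.
        "avx512": base + ["--asm", "avx512", source,
                          "-o", f"{preset}_avx512.hevc"],
    }


if __name__ == "__main__":
    for preset in ("medium", "slow", "slower"):
        for label, cmd in x265_ab_commands("clip_2160p.y4m", preset, 16).items():
            print(f"{label:8s} {shlex.join(cmd)}")
```

Keeping everything except `--asm` identical is the point of the design: any fps difference between the pair can then be attributed to the instruction set (plus run-to-run noise).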
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
19th April 2024, 23:18   #9284
Atak_Snajpera
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,849
AVX512 will soon be abandoned by Intel. AVX10 is the new toy.
22nd April 2024, 04:22   #9285
MeteorRain
結城有紀
Join Date: Dec 2003
Location: Oregon
Posts: 895
Quote:
Originally Posted by DMD
I apologize, but maybe that is why the AVX512 version does not appear in the latest release from Patman86?
https://github.com/Patman86/x265-Mod-by-Patman/releases

Thank you
That's just a targeted build, which affects everything other than the core. The core will still pick its instruction set at run time (meaning you should be able to use AVX512 on a Sandy Bridge build if you wish). Only the C++ code (which is not part of the core) targets a different instruction set.
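A rough illustration of what "the core picks its instruction set at run time" means: a minimal Python sketch of a dispatch table keyed on CPU feature flags. It is not x265's code; the kernel names, the flag spellings (Linux /proc/cpuinfo style) and `select_kernel` are all hypothetical, and the AVX-512 opt-in only mirrors the idea that x265 keeps AVX-512 off unless asked.

```python
from typing import Callable, List, Sequence, Set, Tuple

Kernel = Callable[[Sequence[int], Sequence[int]], int]


def sad_reference(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences, a typical encoder primitive."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical dispatch table, most preferred first: (name, required CPU flag,
# implementation). In a real encoder each entry would be separate hand-written
# SIMD code; here all share one body so the sketch runs anywhere.
KERNELS: List[Tuple[str, str, Kernel]] = [
    ("avx512", "avx512f", sad_reference),
    ("avx2", "avx2", sad_reference),
    ("sse2", "sse2", sad_reference),
    ("c", "", sad_reference),  # empty flag: always-usable fallback
]


def select_kernel(cpu_flags: Set[str], allow_avx512: bool = False) -> Tuple[str, Kernel]:
    """Pick the best kernel the running CPU supports.

    The choice depends only on the flags detected at run time, not on the
    compiler target used for the surrounding C++ code.
    """
    for name, flag, fn in KERNELS:
        if name == "avx512" and not allow_avx512:
            continue  # treat AVX-512 as opt-in
        if not flag or flag in cpu_flags:
            return name, fn
    raise RuntimeError("no usable kernel")  # unreachable: "c" has no requirement
```

For example, `select_kernel({"sse2", "avx2", "avx512f"})` returns the "avx2" entry, and only returns "avx512" when `allow_avx512=True` is passed; a CPU reporting none of the flags falls back to "c".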

Quote:
Originally Posted by tormento
I'd be curious to see AVX2 vs AVX512 performance on a very noisy source with the veryslow/placebo preset on modern CPUs.
Unfortunately the choice for Intel is now limited to the 11th generation and Xeons.
I need to decide whether to go with Arrow Lake or Zen 5.
Not having AVX512 could be a big no-no, as a lot of plugins and software that I use support that instruction set.
AVX10 for Arrow Lake is a big enigma, as it's not fully confirmed whether it will be 10.1 (no AVX512) or 10.2.
AVX512 on different CPUs with different implementations will behave differently. On Zen 4 you should generally see a slight improvement with AVX512, while on an Intel CPU you'll see some throttling and slowdown if you mix AVX512 workloads with AVX2 ones.

Zen 4 basically runs AVX512 on its 256-bit AVX2 hardware, but since some AVX512 instructions are more efficient than their plain AVX2 equivalents, speed sees some improvement.
Intel CPUs run AVX512 natively, so they do get a great improvement. However, Intel CPUs have that infamous clock throttling, so they clock lower, impacting normal workloads.

It'll end up being a personal preference. AVX10 is a new thing and will take time to be adopted. And AVX10 is, well, just a way to put "AVX512" back into the existing product lines.

Just my 2 cents.
__________________
My Projects
x265 - Yuuki-Asuna-mod
TS - ADTS AAC Splitter | LATM AAC Splitter | BS4K-ASS
Neo AviSynth+ filters - F3KDB | FFT3D | DFTTest | MiniDeen | Temporal Median
22nd April 2024, 11:26   #9286
tormento
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,600
Quote:
Originally Posted by MeteorRain
It'll end up being a personal preference. AVX10 is a new thing and will take time to be adopted. And AVX10 is, well, just a way to put "AVX512" back into the existing product lines.
AVX10.1 will be a subset of AVX512. With 10.2 they will embrace and improve AVX512.
22nd April 2024, 11:35   #9287
Boulder
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,747
I really wouldn't hold my breath waiting for x265 to get big optimizations regarding any new instruction sets.
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
24th April 2024, 19:25   #9288
benwaggoner
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,804
Quote:
Originally Posted by MeteorRain
AVX512 on different CPUs with different implementations will behave differently. On Zen 4 you should generally see a slight improvement with AVX512, while on an Intel CPU you'll see some throttling and slowdown if you mix AVX512 workloads with AVX2 ones.
And the throttling and performance impact vary between major Intel design generations. The current design performs quite a bit better with AVX512 than the first round of CPUs did.

Quote:
Zen 4 basically runs AVX512 on its 256-bit AVX2 hardware, but since some AVX512 instructions are more efficient than their plain AVX2 equivalents, speed sees some improvement.
Intel CPUs run AVX512 natively, so they do get a great improvement. However, Intel CPUs have that infamous clock throttling, so they clock lower, impacting normal workloads.
Yeah, even if it is the same fused instruction under the hood, AVX512 has a smaller instruction size per bit of data processed, and so can stay in L3 cache better.

Quote:
It'll end up being a personal preference. AVX10 is a new thing and will take time to be adopted. And AVX10 is, well, just a way to put "AVX512" back into the existing product lines.
Has anyone checked the x265 source code to see if many non-AVX10 instructions are being used? I wouldn't be surprised if the AVX10 subset includes most or all of what x265 uses. Video encoding is made of pretty well understood DSP algorithms, and ones that have broad applicability to things like gaming. So I'd expect them to have better odds of making it to a more consumer-focused AVX512 subset.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
26th April 2024, 15:47   #9289
excellentswordfight
Lost my old account :(
Join Date: Jul 2017
Posts: 332
Ok, so here is a fresh AVX512 test with the current generation from AMD. Unfortunately I only had STEM2 to test with right now; I'm going to see if I can find a more complex/grainy title to retest with later. I will also add a test with a Sapphire Rapids Xeon next week.

STEM2 2160p24 re-encode @ crf16

AMD Threadripper PRO 7955WX (Storm Peak) 16C/32T @ 350W.

Medium:
Non-AVX512: 15,59fps
AVX512: 15,42fps

slow:
Non-AVX512: 5,85fps
AVX512: 5,83fps

slower:
Non-AVX512: 1,66fps
AVX512: 1,67fps

Frequency for both the non-AVX512 and AVX512 loads was between 4,65 and 4,75 GHz. So even though I didn't detect any significant downclocking, speed was pretty much within the margin of error (since the load and frequency are so dynamic and not pinned at 100%, run-to-run deviations will occur). This is actually worse than I expected: given that the downclocking under the AVX512 load didn't look like it could have been more than 100 MHz, at the same frequency I would have expected a 5-10% increase or so for preset 'slower'. It will be interesting to see how grainy/complex content differs at slower.
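To put these differences in perspective, a small Python sketch that turns the posted fps values into the relative speed change of AVX512 versus non-AVX512 per preset. The helper name `pct_change` is just for illustration; the numbers are the ones above, with decimal commas written as points.

```python
# fps results from the Threadripper PRO 7955WX post (decimal commas -> points).
RESULTS = {
    "medium": (15.59, 15.42),  # (non-AVX512, AVX512)
    "slow": (5.85, 5.83),
    "slower": (1.66, 1.67),
}


def pct_change(baseline: float, candidate: float) -> float:
    """Speed change of `candidate` relative to `baseline`, in percent."""
    return (candidate / baseline - 1.0) * 100.0


if __name__ == "__main__":
    for preset, (no_avx512, avx512) in RESULTS.items():
        print(f"{preset:7s} {pct_change(no_avx512, avx512):+.2f}%")
```

This gives roughly -1.09% (medium), -0.34% (slow) and +0.60% (slower), i.e. deltas of about a percent or less, which fits the "within margin of error" reading.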

Last edited by excellentswordfight; 26th April 2024 at 16:02.
26th April 2024, 17:52   #9290
Atak_Snajpera
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,849
Quote:
Originally Posted by excellentswordfight
Ok, so here is a fresh AVX512 test with the current generation from AMD. Unfortunately I only had STEM2 to test with right now; I'm going to see if I can find a more complex/grainy title to retest with later. I will also add a test with a Sapphire Rapids Xeon next week.

STEM2 2160p24 re-encode @ crf16

AMD Threadripper PRO 7955WX (Storm Peak) 16C/32T @ 350W.

Medium:
Non-AVX512: 15,59fps
AVX512: 15,42fps

slow:
Non-AVX512: 5,85fps
AVX512: 5,83fps

slower:
Non-AVX512: 1,66fps
AVX512: 1,67fps

Frequency for both the non-AVX512 and AVX512 loads was between 4,65 and 4,75 GHz. So even though I didn't detect any significant downclocking, speed was pretty much within the margin of error (since the load and frequency are so dynamic and not pinned at 100%, run-to-run deviations will occur). This is actually worse than I expected: given that the downclocking under the AVX512 load didn't look like it could have been more than 100 MHz, at the same frequency I would have expected a 5-10% increase or so for preset 'slower'. It will be interesting to see how grainy/complex content differs at slower.
AVX512 on Zen 4 is performed on 2x256-bit units instead of 2x512-bit like on Intel CPUs. Zen 5 will finally have 2x512-bit.
26th April 2024, 23:46   #9291
benwaggoner
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,804
Quote:
Originally Posted by excellentswordfight
Ok, so here is a fresh AVX512 test with the current generation from AMD. Unfortunately I only had STEM2 to test with right now; I'm going to see if I can find a more complex/grainy title to retest with later.
I doubt graininess will have much of an impact on speed. I'm curious if you'll find out otherwise!

Quote:
I will also add a test with a Sapphire Rapids Xeon next week.
That should show an improvement over older Intel and current AMD. Curious to see if it does, and by how much.

Quote:
STEM2 2160p24 re-encode @ crf16

AMD Threadripper PRO 7955WX (Storm Peak) 16C/32T @ 350W.

Medium:
Non-AVX512: 15,59fps
AVX512: 15,42fps

slow:
Non-AVX512: 5,85fps
AVX512: 5,83fps

slower:
Non-AVX512: 1,66fps
AVX512: 1,67fps
FWIW, when AVX512 support first came out, MultiCoreWare said it was most likely to show a performance benefit with 4K veryslow 10-bit. On the first Intel AVX512 implementations, with more common encoding parameters it was common to see encoding fps drop with AVX512 turned on.

AMD at least has no regression, so it's presumably safe to have it on by default instead of only for specific use cases.
26th April 2024, 23:48   #9292
benwaggoner
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,804
Quote:
Originally Posted by Boulder
I really wouldn't hold my breath waiting for x265 to get big optimizations regarding any new instruction sets.
Intel has historically funded new instruction implementation and optimization for both x264 and x265. I imagine we'll see a new round for new instructions if they think it will give them a higher "% improvement from last generation/competition" number.

Video compression has often been the "up to" in "up to X% faster".
26th April 2024, 23:51   #9293
benwaggoner
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,804
Quote:
Originally Posted by tormento
AVX10.1 will be a subset of AVX512. With 10.2 they will embrace and improve AVX512.
I'm not personally deep into this, but that disagrees with what Wikipedia says: https://en.wikipedia.org/wiki/Advanc...tensions#AVX10

Quote:
The first and "early" version of AVX10, notated AVX10.1, will not introduce any instructions or encoding features beyond what is already in AVX-512 (specifically, in Intel Sapphire Rapids: AVX-512F, CD, VL, DQ, BW, IFMA, VBMI, VBMI2, BITALG, VNNI, GFNI, VPOPCNTDQ, VPCLMULQDQ, VAES, BF16, FP16).
Is that missing anything? And is anything missing used in x265?
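One practical way to approach the first question on a given machine is to compare the CPU's reported features against that list. A minimal Python sketch, assuming Linux and the usual kernel spellings of these flags in /proc/cpuinfo (the helper names are hypothetical). It only tells you what the CPU supports; whether x265 uses anything outside the list would still have to be checked in x265's sources.

```python
import os
from typing import List, Set

# AVX-512 subsets from the quoted AVX10.1 list, spelled the way the Linux
# kernel is assumed to report them in /proc/cpuinfo.
AVX10_1_BASELINE = {
    "avx512f", "avx512cd", "avx512vl", "avx512dq", "avx512bw",
    "avx512ifma", "avx512vbmi", "avx512_vbmi2", "avx512_bitalg",
    "avx512_vnni", "gfni", "avx512_vpopcntdq", "vpclmulqdq", "vaes",
    "avx512_bf16", "avx512_fp16",
}


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing(required: Set[str], available: Set[str]) -> List[str]:
    """Sorted list of required flags the CPU does not report."""
    return sorted(required - available)


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        flags = cpu_flags(f.read())
    print("not reported:", missing(AVX10_1_BASELINE, flags) or "none")
```

Only the first `flags` line is read, since on x86 every logical CPU normally reports the same feature set.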
27th April 2024, 07:50   #9294
Boulder
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,747
Quote:
Originally Posted by benwaggoner
Intel has historically funded new instruction implementation and optimization for both x264 and x265. I imagine we'll see a new round for new instructions if they think it will give them a higher "% improvement from last generation/competition" number.

Video compression has often been the "up to" in "up to X% faster"
If Intel is involved, I'd expect them to put their effort into SVT-AV1, which is being actively developed. x265 is sadly mostly on life support, and we have not seen anything really new in a long time.
Last edited by Boulder; 27th April 2024 at 07:59.
27th April 2024, 10:12   #9295
tormento
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,600
Quote:
Originally Posted by benwaggoner
I'm not personally deep into this, but that disagrees with what Wikipedia says: https://en.wikipedia.org/wiki/Advanc...tensions#AVX10

Is that missing anything? And is anything missing used in x265?
I read the Intel documentation. It's usually more precise than the wiki.
27th April 2024, 10:15   #9296
tormento
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,600
x265 HEVC Encoder

Quote:
Originally Posted by benwaggoner
I doubt graininess will have much of an impact on speed. I'm curious if you'll find out otherwise!
On my poor i7-2600K, grainy material vs. a denoised version sometimes shows a huge impact on bitrate (of course) and speed.
Last edited by tormento; 28th April 2024 at 10:31.
27th April 2024, 10:31   #9297
Boulder
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,747
The higher the bitrate gets, the more computationally complex things tend to get.
28th April 2024, 11:48   #9298
excellentswordfight
Lost my old account :(
Join Date: Jul 2017
Posts: 332
Quote:
Originally Posted by Atak_Snajpera
AVX512 on Zen 4 is performed on 2x256-bit units instead of 2x512-bit like on Intel CPUs. Zen 5 will finally have 2x512-bit.
I'm fully aware of that, but there have been quite a few tests demonstrating that AVX512 performance on Zen 4 is actually quite good regardless.

"On average for the tested AVX-512 workloads, making use of the AVX-512 instructions led to around 59% higher performance compared to when artificially limiting the Ryzen 9 7950X to AVX2 / no-AVX512.

From these results I am rather impressed by the AVX-512 performance out of the AMD Ryzen 9 7950X. While initially being disappointed when hearing of their "double pumping" approach rather than going for a 512-bit data path, these benchmark results speak for themselves. For software that can effectively make use of AVX-512 (and compiled so), there is significant performance uplift to enjoy while no negative impact in terms of reduced CPU clock speeds / higher power consumption (with oneDNN being one of the only exceptions seen so far in terms of higher power draw)."


https://www.phoronix.com/review/amd-zen4-avx512/6


Quote:
Originally Posted by benwaggoner
I doubt graininess will have much of an impact on speed. I'm curious if you'll find out otherwise!
Quote:
Originally Posted by Boulder
The higher the bitrate gets, the more computationally complex things tend to get.
Yes, and that was partly my reasoning as well: since complex content can pretty much halve the encoding speed compared to easier content, it might also shift the bottleneck a bit and maybe favor AVX512 more. But as I'm not a programmer and have no idea which calculations the AVX512 instructions are (supposed) to speed up in x265, this was just a wild theory.

Anyway, I tried it with the good ol' SVT sequence: straight up half the speed, but the result compared to not using AVX512 was identical to the other test. As mentioned, I'm going to test with Intel as well, but I'm pretty sure that AVX512 isn't going to do anything there either, and that it simply doesn't do much of anything for x265. Whether the instructions just aren't suited to x265/video encoding or it's a poor implementation, I'll leave for a programmer to decide, but as a user I'll probably continue not to bother with it.

Even though it was minor, you are right, benwaggoner: for presets medium and slow I always saw AVX512 being slower, and this is without any downclocking! It was minor/negligible, mind you, but still. Is that the case for veryslow? Dunno, don't care, as I never use it and see very little rationale for its use cases; I barely go down to slower, even at work.

Last edited by excellentswordfight; 28th April 2024 at 12:02.
28th April 2024, 17:11   #9299
Barough
Registered User
Join Date: Feb 2007
Location: Sweden
Posts: 493
x265 v3.6+8
Built on April 27, 2024, GCC 13.2.0

https://bitbucket.org/multicoreware/.../branch/master

Download:
https://www.mediafire.com/file/1lee9l8p0qgrxau
__________________
Do NOT re-post any of my Mediafire links. Download & re-host the content(s) if you want to share it somewhere else.
29th April 2024, 15:58   #9300
excellentswordfight
Lost my old account :(
Join Date: Jul 2017
Posts: 332
As mentioned, here is the Intel test as well.

Intel Xeon Gold 6426Y (Sapphire Rapids) 16C/32T

Medium:
Non-AVX512: 11,72fps
AVX512: 11,87fps

slow:
Non-AVX512: 4,17fps
AVX512: 4,49fps

slower:
Non-AVX512: 1,15fps
AVX512: 1,22fps

So yes, there have been some major improvements under AVX512 load from Intel; I actually didn't see any downclocking at all! So it looks like that part is finally solved on Intel CPUs as well. And without downclocking I saw the 5-10% improvement (at slow and slower) that I had expected on the AMD side. So it does indeed look like AMD is hurt by the "double pumping" of AVX512 in this scenario.

So it looks like for Sapphire Rapids (and newer) you might want to run with AVX512. That said, I'm pretty sure that, apples to apples, AMD will outperform Intel regardless of AVX512.
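To quantify the Sapphire Rapids results, a small Python sketch computing the AVX512 speed change per preset from the fps values in this post (decimal commas written as points; the helper names are illustrative).

```python
# fps results from the Xeon Gold 6426Y post (decimal commas -> points).
RESULTS = {
    "medium": (11.72, 11.87),  # (non-AVX512, AVX512)
    "slow": (4.17, 4.49),
    "slower": (1.15, 1.22),
}


def pct_change(baseline: float, candidate: float) -> float:
    """Speed change of `candidate` relative to `baseline`, in percent."""
    return (candidate / baseline - 1.0) * 100.0


def gains(results: dict) -> dict:
    """Per-preset AVX512 gain in percent, rounded to two decimals."""
    return {p: round(pct_change(off, on), 2) for p, (off, on) in results.items()}


if __name__ == "__main__":
    print(gains(RESULTS))
```

This gives about +1.28% (medium), +7.67% (slow) and +6.09% (slower), so the 5-10% gain shows up at the slower presets while medium barely moves.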

Last edited by excellentswordfight; 29th April 2024 at 16:20.
All times are GMT +1.