View Full Version : x265 HEVC Encoder
vadlerg
9th April 2024, 15:01
Made a build of the new version of my custom mod. Also added an AVX LLVM version; I don't remember who asked for it...
Thanks :)!
benwaggoner
10th April 2024, 23:52
No it's not. It's for the H.265 Film grain characteristics SEI message. Or now it's better to reference it as H.274 FGS.
It can be used to generate the expected binary file.
Oh, I haven't heard much about that yet. What's the intended use case?
quietvoid
11th April 2024, 00:07
Oh, I haven't heard much about that yet. What's the intended use case?
Film grain synthesis in MPEG codecs. It's been specified for decades and is still barely seeing any use.
benwaggoner
11th April 2024, 00:32
Film grain synthesis in MPEG codecs. It's been specified for decades and is still barely seeing any use.
Oh, the old optional one from HD-DVD? I haven't looked at it in AGES, but I remember it not being very capable.
AFGS1 is a much better FGS implementation.
LigH
11th April 2024, 09:08
Sounds like the next more complex model beyond "noise reduction" (--nr-[inter|intra]) in x265, which was more a modelling than a simple reduction too.
FranceBB
11th April 2024, 12:58
Hey guys, I know most of you saw the screenshots of the 56c/112th and other Intel Xeon monsters CPU I use at work, but you should know that at home I'm still using an old i7 5930K CPU 6c/12th at 3.50GHz which can go up to 4.00GHz.
It served me fairly well over the years, but it's getting older and older and I was thinking about buying a new one.
Unfortunately, nowadays Intel seems to be focused on releasing only mixed-core CPUs for consumers, so much so that only Intel Xeons have the good old normal cores, but I'm too broke to afford one.
This means that I'll probably end up buying another consumer CPU (like the Core Ultra or whatever they named them now that they killed the i7-i9 nomenclature) which has Efficiency cores and Performance cores.
I know that the efficiency ones are low powered, low clock and also miss some instruction set support like they don't have full AVX512 support etc while the Performance cores have a higher clock and full instruction set support (SSE, SSE2, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512). The idea being that the Efficiency cores are supposed to be used for low intensity background activities like checking your email while the Performance cores are supposed to be used for high intensity activities like... encoding I guess.
So, the question is:
assuming I'm on Windows 11 on a new PC and I fire up my good old x265 BAT to encode from an AVS Script.avs as I've always done, what's gonna happen?
Is it only gonna use the Performance cores?
Is it gonna use both the performance and the efficiency cores in a very bad way so that once a frame is split in the various threads I'll end up with the performance cores waiting for the efficiency cores to finish their piece (i.e threads synchronization) and thus they'll impact performances negatively?
Is everything gonna work smartly automagically as it's handled by the OS (Win11) and I don't have to care about it?
Sorry for the various questions, but this is almost definitely gonna impact my choice of what to buy 'cause depending on the answer and given the prices of the Xeon nowadays, I'm flirting with the idea of moving to AMD.
guest
11th April 2024, 13:13
I'm flirting with the idea of moving to AMD.
I have been using AMD's for a lot of years, I still have a 3950X, 5900X, 5950X, and when the 7950X were released I knew I had to get one.
I exclusively use RipBot264, and the 7950X is an encoding beast, BUT, 16C CPU's do have issues with x265 encoding, but due to the "behind the scenes settings" in RipBot, this can be easily overcome.
But then the 13th Gen Intel's came out, so took a big chance and got a 13900KF, I wasn't sure how the different architecture would work, but after some time fiddling around, I'm more than happy with it.
In a head-to-head encode, the 7950X is just slightly faster... I would think that the 14th Gen Intels may be a little faster.
And to just sum up, you could build an Intel system quite a bit cheaper than an AMD one (DDR4 vs DDR5), but one thing that IS very important: you WILL need a really good cooling system, as both run really hot, but they are stable as.
PS:- No AVX512 support on the Intel's :( Hadn't heard of the Core Ultra 'til your post.
https://www.intel.com/content/www/us/en/products/docs/processors/core-vs-ultra.html
benwaggoner
11th April 2024, 18:07
Sounds like the next more complex model beyond "noise reduction" (--nr-[inter|intra]) in x265, which was more a modelling than a simple reduction too.
It's about grain reconstruction based on metadata, à la AV1's Film Grain Synthesis, but it is a very old, basic version originally designed for H.264. The only platform that supported it IIRC was HD-DVD, but AFAIK it wasn't ever used on actual discs. People thought it was limited at best back then.
Perhaps coupled with modern de-grain and parameterization technology, maybe?
I think AFGS1 is more likely to be important.
benwaggoner
11th April 2024, 18:09
PS:- No AVX512 support on the Intel's :( Hadn't heard of the Core Ultra 'til your post.
https://www.intel.com/content/www/us/en/products/docs/processors/core-vs-ultra.html
AVX512 hasn't been all that helpful for x265 outside of 4K resolutions at slower presets. It's probably not a big loss.
qyot27
11th April 2024, 19:56
assuming I'm on Windows 11 on a new PC and I fire up my good old x265 BAT to encode from an AVS Script.avs as I've always done, what's gonna happen?
Is it only gonna use the Performance cores?
Is it gonna use both the performance and the efficiency cores in a very bad way so that once a frame is split in the various threads I'll end up with the performance cores waiting for the efficiency cores to finish their piece (i.e threads synchronization) and thus they'll impact performances negatively?
Is everything gonna work smartly automagically as it's handled by the OS (Win11) and I don't have to care about it?
If it's anything like macOS, the efficiency cores are executed second.
Not using x265, but an example of the core topology is included in this post from 2022 (https://forum.doom9.org/showpost.php?p=1974382&postcount=2111). Short answer: if Intel's heterogeneous-architecture CPUs act the same way in conjunction with the thread scheduler in Linux and/or Windows, just use Prefetch() to target the Performance cores if you don't want them mixing.
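A thread count alone does not pin anything to particular cores, so for anyone who wants to experiment with explicit pinning, here is a minimal sketch of the mask arithmetic. It assumes (nothing in this thread confirms it) that the P-core logical processors are numbered first, two per core with Hyper-Threading, and that the hex mask would then be handed to something like cmd's `start /affinity`. The function name is made up for illustration; check the real core layout on your own machine first.

```python
def p_core_affinity_mask(p_cores: int, threads_per_core: int = 2) -> str:
    """Return a hex affinity mask covering the first logical CPUs.

    Assumption: the OS enumerates P-core logical processors before E-cores.
    """
    logical = p_cores * threads_per_core
    mask = (1 << logical) - 1  # set the lowest `logical` bits
    return format(mask, "X")


# Example: 8 P-cores with Hyper-Threading, i.e. logical CPUs 0..15
print(p_core_affinity_mask(8))
```

The printed value is only the mask; whether pinning actually helps is something to benchmark rather than assume.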
excellentswordfight
11th April 2024, 22:18
assuming I'm on Windows 11 on a new PC and I fire up my good old x265 BAT to encode from an AVS Script.avs as I've always done, what's gonna happen?
Is it only gonna use the Performance cores?
Is it gonna use both the performance and the efficiency cores in a very bad way so that once a frame is split in the various threads I'll end up with the performance cores waiting for the efficiency cores to finish their piece (i.e threads synchronization) and thus they'll impact performances negatively?
Is everything gonna work smartly automagically as it's handled by the OS (Win11) and I don't have to care about it?
I've been encoding on Intel's CPUs pretty much since they introduced the hybrid architecture. And yes, that's pretty much the case. From a performance perspective everything is pretty much just working. I have played with turning off e-cores etc., but I gain nothing from it; there is a net gain in using the e-cores. And as the e-cores have better performance per watt, in cases where you are power restricted (which is always nowadays) and can saturate the extra threads, you will get more performance out of having, say, 4 e-cores compared to having 1-2 extra p-cores, or giving the existing p-cores more of the power budget.
With that said, it's not all smooth sailing; Windows does weird shit with its scheduler. For example, if you minimize the application you are encoding with (this is at least the case when running applications through CMD), that application will no longer use p-cores AT ALL, as Windows now treats it as a "background application". This can be annoying, and even a dealbreaker, as it can also be triggered just by running something in fullscreen in front of it. But I actually use it as a feature, because if I wanna game while encoding I just minimize it and it will free up the p-cores completely.
With that said, for just a straight-up encoding machine, if we are talking about consumer CPUs I would just get a 7000-series Ryzen instead, given the huge performance/W advantage in this performance class (Intel is actually not that bad in this regard if you limit your CPU to, say, 125W and below), and there you really don't have to care about any hybrid core stuff. Ignoring downclocking and power limits etc., out of the box the 7950X gets 90% of the 14900K's encoding performance at 1/2 of the power consumption.
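To put the last claim in performance-per-watt terms, here is a tiny worked example. It only uses the two relative figures quoted above (90% of the performance at half the power); the function name is illustrative and no measured wattage is implied.

```python
def relative_perf_per_watt(rel_performance: float, rel_power: float) -> float:
    """Performance per watt relative to a reference system (reference = 1.0)."""
    return rel_performance / rel_power


# 7950X relative to 14900K, using the figures from the post above
ratio = relative_perf_per_watt(0.90, 0.50)
print(f"{ratio:.2f}x the performance per watt")
```

So if those two figures hold, the efficiency gap works out to 1.8x.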
DMD
16th April 2024, 22:49
AVX512 hasn't been all that helpful for x265 outside of 4K resolutions at slower presets. It's probably not a big loss.
I apologize but maybe that is why in the latest release of Patman86 the avx512 version does not appear?
https://github.com/Patman86/x265-Mod-by-Patman/releases
Thank you
tormento
17th April 2024, 12:27
AVX512 hasn't been all that helpful for x265 outside of 4K resolutions at slower presets. It's probably not a big loss.
I'd be curious to see AVX2 vs AVX512 performances of a very noisy source in very slow/placebo preset on modern CPUs.
Unfortunately the choice for Intel is now limited to 11 series and Xeons.
I need to decide if to go with Arrow Lake or Zen 5.
Not having AVX512 could be a big no no, as a lot of plugins and software that I use support that instruction set.
AVX10 for Arrow Lake is a big enigma, as it's not completely confirmed if it will be 10.1 (no AVX512) or 10.2.
tormento
17th April 2024, 17:29
A Highly Parallel and Scalable Motion Estimation Algorithm with GPU for HEVC (https://www.hindawi.com/journals/sp/2017/1431574/)
rwill
18th April 2024, 05:07
"Wer schreibt der bleibt"
guest
18th April 2024, 06:18
"Wer schreibt der bleibt"
Those who write remain
rwill
18th April 2024, 15:58
Yeah, but in the context of the linked paper the correct translation is "Publish or Perish".
Like, I skimmed through it and it seems to be missing BD-Rate graphs. To quote: "The decrease of the average PSNR is less than 1.5%." This makes the whole approach questionable, bordering on worthless.
I mean, it's nice; I did GPU-based motion fields as a hobby exercise with CUDA in 2007 or 2008. That some guys seriously present some HEVC ME stuff 10 years later with a complete disregard for video coding, well, I'd better not comment further.
excellentswordfight
18th April 2024, 19:38
I'd be curious to see AVX2 vs AVX512 performances of a very noisy source in very slow/placebo preset on modern CPUs.
Unfortunately the choice for Intel is now limited to 11 series and Xeons.
I need to decide if to go with Arrow Lake or Zen 5.
Not having AVX512 could be a big no no, as a lot of plugins and software that I use support that instruction set.
AVX10 for Arrow Lake is a big enigma, as it's not completely confirmed if it will be 10.1 (no AVX512) or 10.2.
I have access to some 4th gen Xeon SP/Sapphire rapids systems (golden cove cores, same as 12th gen Core/Alder lake), when I get some time I can redo some AVX512 tests.
But in the past, I have never gotten better performance with AVX512 in x265; the downclocking it triggers has eaten all the potential performance gain. But AFAIK the downclocking under AVX512 load has been improved since its introduction, and it was less of an issue on consumer desktop models where frequency control is possible. But given what I've seen, I'm pretty sure that running it with AVX512 produces worse performance/watt.
vpupkind
18th April 2024, 20:04
Don't look at the man behind the curtain there!
This feature is in preemptive support for players that can decode HEVC with AFGS1 film grain synthesis metadata. It's essentially the FGS component of AV1 made general, with a fix so that grain size is based on grain size in the content, not display resolution.
I don't think that there is anything out yet that can play such a file. The HEVC will decode, but the FGS metadata will be ignored.
I believe VLC already supports it.
In any case, MCW has an open source library, libfgm, that should be able to both filter and estimate parameters.
tormento
19th April 2024, 12:17
But AFAIK the downclocking of AVX512 load have been improved since its introduction, and was less of an issue on consumer desktop models were frequency control is possible. But given what i've seen, i'm pretty sure that running it with AVX512 produces worse performance/watt.
It’s proven that AVX512 has more performance per watt. About downclocking: if you have a powerful heatsink AFAIK you can disable downclocking in bios.
excellentswordfight
19th April 2024, 14:38
It’s proven that AVX512 has more performance per watt
Proven? For all systems with all software? It's pretty much a fact that it has been worse in my cases, where enterprise server hardware has been used. These systems are pretty strict about staying within their power specification, so when I get worse performance with AVX512 enabled but it consumes the same amount of power (I have not measured the power draw, mind you, but I'm pretty sure that's the case given the temperature and frequency readings), it means that it has a negative impact on performance/W. There is software where AVX512 instructions can do wonders, where they boost performance by a great amount, and there I'm sure that it has performance-per-watt benefits. But for x265, the performance gains of AVX512 are not that big, and at least with older-generation hardware performance per watt has been lower, so unless you also increase power it has given me no gains.
About downclocking: if you have a powerful heatsink AFAIK you can disable downclocking in bios.
Yes, and that's also why I wrote:
"...less of an issue on consumer desktop models where frequency control is possible"
I have only tested AVX512 on Xeon/server hardware and my old Ice Lake laptop. But as I said, I will try it on Sapphire Rapids. It will be interesting to see how it has been improved, because on the first generation, both Xeon and consumer (Ice Lake), the downclocking was rather big.
nevcairiel
19th April 2024, 17:10
AVX512 benefits start if you use it well and thoroughly; using it sparsely or inefficiently can cause more overhead than gain.
Blanket statements never make sense with complex optimizations and complex software. Test your own use-case on your own hardware.
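For anyone who wants to run that kind of A/B test, a minimal sketch of building the two command lines follows. It assumes an `x265` binary on the PATH and relies on my understanding that x265 builds leave the AVX-512 kernels off unless `--asm avx512` is passed; check `x265 --help` on your own build. The source file name and the helper function are placeholders.

```python
import os
import shlex


def x265_cmd(source: str, preset: str, crf: int, use_avx512: bool) -> list[str]:
    """Build an x265 command line; the output is discarded, only speed matters."""
    cmd = ["x265", "--preset", preset, "--crf", str(crf)]
    if use_avx512:
        # Assumption: AVX-512 is opt-in and enabled through --asm avx512
        cmd += ["--asm", "avx512"]
    cmd += [source, "-o", os.devnull]
    return cmd


# Print one command per variant; run each several times and compare the fps
for avx512 in (False, True):
    print(shlex.join(x265_cmd("source.y4m", "slow", 16, avx512)))
```

Running each variant more than once matters here, since the differences reported in this thread are often within run-to-run noise.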
Atak_Snajpera
19th April 2024, 23:18
Avx512 will be soon abandoned by Intel. AVX10 is a new toy.
MeteorRain
22nd April 2024, 04:22
I apologize but maybe that is why in the latest release of Patman86 the avx512 version does not appear?
https://github.com/Patman86/x265-Mod-by-Patman/releases
Thank you
That's just a targeted build, which affects everything other than the core part. The core will still pick its instruction set at run time (meaning you should be able to use AVX512 on a Sandy Bridge build if you wish). Only the C++ code (which is not part of the core parts) will target a different instruction set.
I'd be curious to see AVX2 vs AVX512 performances of a very noisy source in very slow/placebo preset on modern CPUs.
Unfortunately the choice for Intel is now limited to 11 series and Xeons.
I need to decide if to go with Arrow Lake or Zen 5.
Not having AVX512 could be a big no no, as a lot of plugins and software that I use support that instruction set.
AVX10 for Arrow Lake is a big enigma, as it's not completely confirmed if it will be 10.1 (no AVX512) or 10.2.
AVX512 on different CPUs with different implementations will behave differently. On Zen4 you should generally see slight improvement with AVX512, while on an Intel CPU you'll see some throttling and slow down if you mix AVX512 workload with AVX2 ones.
Zen4 basically runs AVX512 on AVX2 platform, but since some AVX512 instructions are more efficient than their plain AVX2 equivalent, speed sees some improvement.
Intel CPUs run AVX512 natively, so they do get a great improvement. However, Intel CPUs have that infamous clock throttling, so they clock lower, impacting normal workloads.
It'll end up being a personal preference. AVX10 is a new thing and will take time to adopt. And AVX10 is, well, to just put "AVX512" back to existing product line.
Just my 2 cents.
tormento
22nd April 2024, 11:26
It'll end up being a personal preference. AVX10 is a new thing and will take time to adopt. And AVX10 is, well, to just put "AVX512" back to existing product line.
AVX 10.1 will be a subset of AVX512. With 10.2 they will embrace and improve AVX512.
Boulder
22nd April 2024, 11:35
I really wouldn't hold my breath waiting for x265 to get big optimizations regarding any new instruction sets.
benwaggoner
24th April 2024, 19:25
AVX512 on different CPUs with different implementations will behave differently. On Zen4 you should generally see slight improvement with AVX512, while on an Intel CPU you'll see some throttling and slow down if you mix AVX512 workload with AVX2 ones.
And the throttling and performance impact varies between different major Intel design versions. The current design performs quite a bit better with AVX512 than the first round of CPUs did.
Zen4 basically runs AVX512 on AVX2 platform, but since some AVX512 instructions are more efficient than their plain AVX2 equivalent, speed sees some improvement.
Intel CPUs runs AVX512 natively, so they do get great improvement. However Intel CPU has that infamous clock throttling so it clocks lower, impacting normal workloads.
Yeah, even if it is the same fused instruction under the hood, AVX512 has smaller instruction size per bit, and so can stay in L3 cache better.
It'll end up being a personal preference. AVX10 is a new thing and will take time to adopt. And AVX10 is, well, to just put "AVX512" back to existing product line.
Has anyone checked the x265 source code to see if many non-AVX10 instructions are being used? I wouldn't be surprised if the AVX10 subset includes most or all of what x265 uses. Video encoding is made of pretty well understood DSP algorithms, and ones that have broad applicability to things like gaming. So I'd expect them to have better odds of making it to a more consumer-focused AVX512 subset.
excellentswordfight
26th April 2024, 15:47
Ok, so here is a fresh AVX512 test with the current generation from AMD, unfortunately I only had STEM2 to test with right now, going to see if I can find a more complex/grainy title to retest with later. I will also add test with Sapphire Rapids Xeon next week.
STEM2 2160p24 re-encode @ crf16
AMD Threadripper PRO 7955WX (Storm Peak) 16C/32T @ 350W.
Medium:
Non-AVX512: 15,59fps
AVX512: 15,42fps
slow:
Non-AVX512: 5,85fps
AVX512: 5,83fps
slower:
Non-AVX512: 1,66fps
AVX512: 1,67fps
Frequency for both the non-AVX512 and the AVX512 load was between 4,65 and 4,75GHz. So even though I didn't detect any significant downclock, speed was pretty much within margin of error (as the load and frequency are so dynamic and not pinned at 100%, run-to-run deviations will occur). This is actually worse than I expected: given that the downclocking while running the AVX512 load didn't look like it could have been more than 100MHz, at the same frequency I would expect a 5-10% increase or so for preset 'slower'. It will be interesting to see how grainy/complex content differs at slower.
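To make "within margin of error" concrete, the fps figures above can be turned into percentage differences. This is a small sketch that only parses the comma-decimal numbers as posted; the helper names are illustrative.

```python
def parse_fps(text: str) -> float:
    """Convert a comma-decimal figure such as '15,59fps' to a float."""
    return float(text.replace("fps", "").replace(",", "."))


def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


# (non-AVX512, AVX512) per preset, copied from the Threadripper results above
results = {
    "medium": ("15,59fps", "15,42fps"),
    "slow": ("5,85fps", "5,83fps"),
    "slower": ("1,66fps", "1,67fps"),
}

for preset, (base, avx512) in results.items():
    delta = percent_change(parse_fps(base), parse_fps(avx512))
    print(f"{preset}: {delta:+.2f}%")
```

That works out to roughly -1.1%, -0.3% and +0.6%, so nowhere near the 5-10% that was expected for 'slower'.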
Atak_Snajpera
26th April 2024, 17:52
Ok, so here is a fresh AVX512 test with the current generation from AMD, unfortunately I only had STEM2 to test with right now, going to see if I can find a more complex/grainy title to retest with later. I will also add test with Sapphire Rapids Xeon next week.
STEM2 2160p24 re-encode @ crf16
AMD Threadripper PRO 7955WX (Storm Peak) 16C/32T @ 350W.
Medium:
Non-AVX512: 15,59fps
AVX512: 15,42fps
slow:
Non-AVX512: 5,85fps
AVX512: 5,83fps
slower:
Non-AVX512: 1,66fps
AVX512: 1,67fps
Frequency for both non and AVX512 load was between 4,65 & 4,75Ghz. So even though I didnt detect any significant downclock speed was pretty much within margin of error (as the load and frequncy is so dynamic and not pinned at 100% run2run diviations will occour). This is actually worse than I expected, given that the downclocking while running avx512 load didnt look like it could have been more than 100MHz, at the same frequency I would expect 5-10% increase or so for preset 'slower', will be interesting to see how grainy/complex content differ at slower.
AVX512 in Zen 4 is performed on 2x256bit instead of 2x512bit like in Intel cpus. Zen5 will finally have 2x512bit.
benwaggoner
26th April 2024, 23:46
Ok, so here is a fresh AVX512 test with the current generation from AMD, unfortunately I only had STEM2 to test with right now, going to see if I can find a more complex/grainy title to retest with later.
I doubt graininess will have much of an impact on speed. I'm curious if you'll find out otherwise!
I will also add test with Sapphire Rapids Xeon next week.
That should show improvement from older Intel and current AMD. Curious to see if it does, and by how much.
STEM2 2160p24 re-encode @ crf16
AMD Threadripper PRO 7955WX (Storm Peak) 16C/32T @ 350W.
Medium:
Non-AVX512: 15,59fps
AVX512: 15,42fps
slow:
Non-AVX512: 5,85fps
AVX512: 5,83fps
slower:
Non-AVX512: 1,66fps
AVX512: 1,67fps
FWIW, when AVX512 support first came out, MultiCoreWare said it was most likely to show a performance benefit with 4K veryslow 10-bit. On the first Intel AVX512 implementations, with more common encoding parameters it was common to see encoding fps drop with AVX512 turned on.
AMD at least has no regression, so it's presumably safe to have it on by default instead of only for specific use cases.
benwaggoner
26th April 2024, 23:48
I really wouldn't hold my breath waiting for x265 to get big optimizations regarding any new instruction sets.
Intel has historically funded new instruction implementation and optimization for both x264 and x265. I imagine we'll see a new round for new instructions if they think it will give them a higher "% improvement from last generation/competition" number.
Video compression has often been the "up to" in "up to X% faster"
benwaggoner
26th April 2024, 23:51
AVX 10.1 will be a subset of AVX512. With 10.2 they will embrace and improve AVX512.
I'm not personally deep on this, but that disagrees with what Wikipedia says: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#AVX10
The first and "early" version of AVX10, notated AVX10.1, will not introduce any instructions or encoding features beyond what is already in AVX-512 (specifically, in Intel Sapphire Rapids: AVX-512F, CD, VL, DQ, BW, IFMA, VBMI, VBMI2, BITALG, VNNI, GFNI, VPOPCNTDQ, VPCLMULQDQ, VAES, BF16, FP16).
Is that missing anything? And is anything missing used in x265?
Boulder
27th April 2024, 07:50
Intel has historically funded new instruction implementation and optimization for both x264 and x265. I imagine we'll see a new round for new instructions if they think it will give them a higher "% improvement from last generation/competition" number.
Video compression has often been the "up to" in "up to X% faster"
If Intel is involved, I'd expect them to put their effort on SVT-AV1 which is being developed constantly. x265 is sadly mostly on life support and we have not seen anything really new in a long time.
tormento
27th April 2024, 10:12
I'm not personally deep on this, but that disagrees with what Wikipedia says: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#AVX10
Is that missing anything? And is anything missing used in x265?
I read Intel documentation. Usually it’s more precise than wiki ;)
tormento
27th April 2024, 10:15
I doubt graininess will have much of an impact on speed. I'm curious if you'll find out otherwise!
On my poor i7-2600k, grainy material vs denoised one shows sometimes huge impact on bitrate (ofc) and speed.
Boulder
27th April 2024, 10:31
The higher the bitrate gets, the more computationally complex things tend to get.
excellentswordfight
28th April 2024, 11:48
AVX512 in Zen 4 is performed on 2x256bit instead of 2x512bit like in Intel cpus. Zen5 will finally have 2x512bit.
Im fully aware of that, but there has been quite a bit of tests demonstrating that AVX512 performance on zen4 is actually quite good regardless.
"On average for the tested AVX-512 workloads, making use of the AVX-512 instructions led to around 59% higher performance compared to when artificially limiting the Ryzen 9 7950X to AVX2 / no-AVX512.
From these results I am rather impressed by the AVX-512 performance out of the AMD Ryzen 9 7950X. While initially being disappointed when hearing of their "double pumping" approach rather than going for a 512-bit data path, these benchmark results speak for themselves. For software that can effectively make use of AVX-512 (and compiled so), there is significant performance uplift to enjoy while no negative impact in terms of reduced CPU clock speeds / higher power consumption (with oneDNN being one of the only exceptions seen so far in terms of higher power draw)."
https://www.phoronix.com/review/amd-zen4-avx512/6
I doubt graininess will have much of an impact on speed. I'm curious if you'll find out otherwise!
The higher the bitrate gets, the more computationally complex things tend to get.
Yes, and that was a bit of my reasoning as well: as complex content can pretty much halve the encoding speed compared to easier content, it might also shift the bottleneck a bit and maybe favor AVX512 more. But I'm not a programmer and have no idea what calculations AVX512 instructions are (supposed) to speed up in x265, so this was just a wild theory.
Anyway, I tried it with the good ol' SVT sequence: straight up half the speed, but the result compared to not using AVX512 was identical to the other test. As mentioned, I'm going to test with Intel as well, but I'm pretty sure that AVX512 isn't going to do anything there either, and that it simply doesn't do much of anything for x265. Whether the instructions just aren't suited for x265/video encoding or it's a poor implementation I'm gonna leave up to any programmer to decide, but as a user, I'll probably continue not to bother with it. Because even though it was minor, you are right, benwaggoner: for presets medium and slow I always saw AVX512 being slower, and this is without any downclocking! It was minor/negligible, mind you, but still. Is that the case for veryslow? Dunno, don't care, as I never use it and see very little rationale for its use cases; I barely go down to slower, even at work.
excellentswordfight
29th April 2024, 15:58
As mentioned, Intel test as well.
Intel Xeon Gold 6426Y (Sapphire Rapids) 16C/32T
Medium:
Non-AVX512: 11,72fps
AVX512: 11,87fps
slow:
Non-AVX512: 4,17fps
AVX512: 4,49fps
slower:
Non-AVX512: 1,15fps
AVX512: 1,22fps
So, yes, there have been some major improvements under AVX512 load from Intel; I actually didn't see any downclocking at all! So it looks like that part is finally solved on Intel CPUs as well. And without downclocking I saw the 5-10% improvement that I was expecting on the AMD side. So it does indeed look like AMD is hurt by the "double pumping" of the AVX512 load used in this scenario.
So, it looks like for Sapphire Rapids (and newer) you might wanna run with AVX512. With that said, I'm pretty sure that apples to apples, AMD will outperform Intel regardless of AVX512.
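Putting the two benchmark posts side by side, here is a short sketch that computes the AVX512 gain per system and the AMD-over-Intel ratio. It assumes both runs used the same clip and settings, which the posts imply but do not state outright, and the two systems are not power-matched, so the cross-vendor ratio is only indicative.

```python
AMD = "Threadripper PRO 7955WX"
INTEL = "Xeon Gold 6426Y"

# (non-AVX512 fps, AVX512 fps) per preset, copied from the posts in this thread
RESULTS = {
    AMD: {"medium": (15.59, 15.42), "slow": (5.85, 5.83), "slower": (1.66, 1.67)},
    INTEL: {"medium": (11.72, 11.87), "slow": (4.17, 4.49), "slower": (1.15, 1.22)},
}


def avx512_gain(system: str, preset: str) -> float:
    """AVX512 speed change versus non-AVX512, in percent."""
    base, avx512 = RESULTS[system][preset]
    return (avx512 / base - 1.0) * 100.0


def amd_over_intel(preset: str, avx512: bool = True) -> float:
    """How many times faster the AMD system is for a given preset."""
    idx = 1 if avx512 else 0
    return RESULTS[AMD][preset][idx] / RESULTS[INTEL][preset][idx]


for preset in ("medium", "slow", "slower"):
    print(
        f"{preset}: Intel {avx512_gain(INTEL, preset):+.1f}%, "
        f"AMD {avx512_gain(AMD, preset):+.1f}%, "
        f"AMD/Intel with AVX512 {amd_over_intel(preset):.2f}x"
    )
```

On these numbers the Xeon gains about 1.3% at medium and roughly 6-8% at slow and slower, while the Threadripper stays around 1.3x faster or better even with AVX512 enabled on both.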
kurkosdr
30th April 2024, 14:45
It's about grain reconstruction based on metadata, à la AV1's Film Grain Synthesis, but it is a very old, basic version originally designed for H.264. The only platform that supported it IIRC was HD-DVD, but AFAIK it wasn't ever used on actual discs. People thought it was limited at best back then.
Perhaps coupled with modern de-grain and parameterization technology, maybe?
I think AFGS1 is more likely to be important.
I am wondering, is it possible to get AFGS1 implemented on not just H.265 encoders and decoders but also H.264 encoders and decoders? (I don't see why not.) I am one of those people who think that film grain should always be removed in post-production, in the same way that interlaced scenes that are added to a progressive stream should always be de-interlaced. Film grain was typically removed when authoring DVDs; the whole idea of leaving film grain in was a flex of HD-DVD and Blu-ray to advertise their large capacity (aka average bitrates) compared to AVCHD/BD9/HD-VMD and the then-nascent streaming services.
If film grain can be made into an effect instead of making the encoder scream, I'd be much happier. Yes, even if the effect isn't present on existing H.264 and H.265 decoders. Nobody other than film buffs cares.
benwaggoner
30th April 2024, 18:25
I am wondering, is it possible to get AFGS1 implemented on not just H.265 encoders and decoders but also H.264 encoders and decoders? (I don't see why not.) I am one of those people who think that film grain should always be removed in post-production, in the same way that interlaced scenes that are added to a progressive stream should always be de-interlaced. Film grain was typically removed when authoring DVDs; the whole idea of leaving film grain in was a flex of HD-DVD and Blu-ray to advertise their large capacity (aka average bitrates) compared to AVCHD/BD9/HD-VMD and the then-nascent streaming services.
If film grain can be made into an effect instead of making the encoder scream, I'd be much happier. Yes, even if the effect isn't present on existing H.264 and H.265 decoders. Nobody other than film buffs cares.
AFGS1 is entirely codec agnostic, and absolutely could be applied to H.264 or any other codec that supports SEI. And the grain removal and parameterization process is also codec agnostic, and takes place on uncompressed frames pre-encoder. The input to the encoder is the degrained frames and the SEI messages to embed. If one is encoding to multiple AFGS1 streams, the grain removal and parameterization would be shared across codecs, all of them getting the same frames and SEI stream.
FWIW, HD-DVD did mandate support for the old H.264 FGS technology. No mainstream titles used it, however, due to technical limitations at the time. Degraining and parameterization were vastly more challenging in 2006 than in 2024. We can spend a couple of orders of magnitude more FLOPS/pixel now, and have AI/ML, for which this is a well-suited task.
I never got to play with it enough to determine if the synthesis part was flexible and high quality enough, though.
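As a rough illustration of the split described above (degrain and parameterize before the encoder, re-synthesize after the decoder), here is a deliberately toy sketch on a 1-D row of pixel values. It is not the AFGS1 or H.274 model: the box blur, the single "strength" parameter and the Gaussian synthesis are all simplifying assumptions, and every name is made up.

```python
import random
import statistics


def box_blur(samples, radius=2):
    """Crude degrain: average each sample with its neighbours."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def analyse_grain(samples, radius=2):
    """Return (degrained samples, grain strength); strength stands in for the metadata."""
    clean = box_blur(samples, radius)
    residual = [s - c for s, c in zip(samples, clean)]
    return clean, statistics.pstdev(residual)


def synthesise_grain(clean, strength, seed=0):
    """Decoder side: add fresh noise with the signalled strength."""
    rng = random.Random(seed)
    return [c + rng.gauss(0.0, strength) for c in clean]


# A flat grey row with synthetic "grain" as the source
rng = random.Random(1)
source = [128 + rng.gauss(0, 4) for _ in range(256)]

clean, strength = analyse_grain(source)      # pre-encoder step
restored = synthesise_grain(clean, strength)  # post-decoder step
print(f"estimated grain strength: {strength:.2f}")
```

The encoder in this picture only ever sees `clean`, which is the point: the noise never has to be compressed, only described.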
benwaggoner
30th April 2024, 18:34
Intel Xeon Gold 6426Y (Sapphire Rapids) 16C/32T
Medium:
Non-AVX512: 11,72fps
AVX512: 11,87fps
slow:
Non-AVX512: 4,17fps
AVX512: 4,49fps
slower:
Non-AVX512: 1,15fps
AVX512: 1,22fps
So, yes, there has been some major improvements on avx512 load from Intel, I actually didnt see any downclocking at all! So its looks like that part is finally solved on Intel cpus as well. And without downclocking I saw the 5-10% improvement that I was expecting on the AMD side without downclocking. So it does indeed look like AMD is hurt by the "double pumping" of the avx512 load used in this scenario.
IIRC, the AVX-512 down clocking was internal, and didn't result in the actual CPU frequency dropping. But some internal components wound up running at half clock speed. It took MCW quite some time to figure out what was going on with poor AVX-512 performance, and AVX2 before that.
So, it looks like for sapphire rapids (and newer), you might wanna run with avx512, with that said, pretty sure apples to apples, AMD will outperform Intel regardless of avx512.
I expect that there will be bigger gains with higher resolution, and less or perhaps none with lower resolutions.
excellentswordfight
30th April 2024, 20:50
IIRC, the AVX-512 down clocking was internal, and didn't result in the actual CPU frequency dropping. But some internal components wound up running at half clock speed. It took MCW quite some time to figure out what was going on with poor AVX-512 performance, and AVX2 before that.
No, it sure did, and it was pretty aggressive as well, the first Skylake-SP models I tried it on dropped from like 2,5GHz to 1,9GHz, tanking performance. But it was already vastly improved in Ice Lake, and pretty much "fixed" now, I guess, in Sapphire Rapids.
benwaggoner
30th April 2024, 20:53
No, it sure did, and it was pretty aggressive as well, the first Skylake-SP models I tried it on dropped from like 2,5GHz to 1,9GHz, tanking performance.
Yeah, but the internal SIMD clock speed dropped more in a way that was very challenging to measure. If it was only a 25% drop like above, AVX-512 still would have been a lot faster for anything >256 bits.
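To put a number on that: taking the Skylake-SP clocks quoted above at face value, and assuming purely for illustration that 512-bit code does exactly twice the work per cycle of 256-bit code (real workloads won't scale that cleanly), the back-of-envelope math looks like this.

```python
base_clock_ghz = 2.5     # reported clock without AVX-512 load (from the post above)
avx512_clock_ghz = 1.9   # reported clock under AVX-512 load (from the post above)

# Assumption: ideal scaling, i.e. 512-bit vectors do 2x the work per cycle of 256-bit.
width_ratio = 512 / 256

clock_ratio = avx512_clock_ghz / base_clock_ghz   # fraction of clock speed retained
clock_drop_percent = (1.0 - clock_ratio) * 100.0
ideal_speedup = width_ratio * clock_ratio         # net throughput vs. 256-bit code

print(f"clock drop: {clock_drop_percent:.0f}%")
print(f"ideal net speedup: {ideal_speedup:.2f}x")
```

So a 24% drop in reported clock alone would still leave about a 1.5x theoretical advantage for code that can fill the wider vectors.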
excellentswordfight
30th April 2024, 21:12
Yeah, but the internal SIMD clock speed dropped more in a way that was very challenging to measure.
Well you stated:
"and didn't result in the actual CPU frequency dropping"
But as I said, it sure did.
https://images.anandtech.com/doci/11616/8180_turbo.png
If it was only a 25% drop like above, AVX-512 still would have been a lot faster for anything >256 bits.
Regardless of whether there are any internal SIMD frequency things going on, even the first generation of CPUs with AVX512 support could easily double performance with the right workload (although one could argue that you might as well run that on GPUs). And still x265, even now when you don't have to make up for a frequency deficit, only sees minor performance gains, and to me that implies that it's simply not a great fit for this workload.
benwaggoner
30th April 2024, 21:17
Well you stated:
"and didn't result in the actual CPU frequency dropping"
But as I said, it sure did.
Yes, absolutely. I should have said "resulted in frequency dropping unrelated to the reported CPU frequency"
Regardless of whether there are any internal SIMD frequency things going on, even the first generation of CPUs with AVX512 support could easily double performance with the right workload (although one could argue that you might as well run that on GPUs). And still x265, even now, only sees minor performance gains, and to me that implies that it's simply not a great fit for this workload.
Encoding is a mix of multiple performance-stressing features. CABAC is all about single-threaded arithmetic performance, for example.
x265 turned out to be more CPU intensive than Intel's then internal "worst case" CPU stress test software, which was pretty surprising to all involved.
Hellboy.
15th May 2024, 02:18
I did some tests (slow, crf19, 4K-HDR) with these 3 settings, "repeat-headers, aud, hrd", and the quality was exactly the same, the speed was almost the same and the size was almost the same.
I compared some frames using a lot of zoom and there is no difference.
So can I leave these 3 settings always enabled when encoding 1080p, 4K-SDR and 4K-HDR?
I'm not 100% sure what these settings do, but I think they create a more compatible file.
Thanks.
Emulgator
15th May 2024, 10:00
They do create a more compatible file satisfying BD restrictions in that regard,
so BD capable devices/software should have no problems if you satisfy their other restrictions as well.
benwaggoner
15th May 2024, 21:50
They do create a more compatible file satisfying BD restrictions in that regard,
so BD capable devices/software should have no problems if you satisfy their other restrictions as well.
You need to use --repeat-headers for adaptive streaming as well, especially for HDR to make sure the 2084 metadata goes into each GOP.
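For anyone following along, here is a rough sketch of a command line with the three options turned on, built as an argument list in Python. The option names are real x265 CLI flags; the file names and VBV numbers are placeholders, not recommendations. As far as I know x265 only signals HRD when VBV is configured, which is why --vbv-maxrate and --vbv-bufsize are included; treat that as an assumption and check the docs for your build.

```python
import shlex


def build_x265_command(input_path, output_path, vbv_maxrate_kbps, vbv_bufsize_kbits):
    """Return an x265 argument list with repeat-headers, AUD and HRD enabled."""
    return [
        "x265",
        "--input", input_path,
        "--preset", "slow",
        "--crf", "19",
        "--repeat-headers",   # emit VPS/SPS/PPS with every keyframe
        "--aud",              # emit access unit delimiters
        "--hrd",              # signal HRD parameters (assumed to need VBV settings)
        "--vbv-maxrate", str(vbv_maxrate_kbps),
        "--vbv-bufsize", str(vbv_bufsize_kbits),
        "--output", output_path,
    ]


# Placeholder file names and VBV values, for illustration only.
cmd = build_x265_command("input.y4m", "output.hevc", 20000, 40000)
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd)` as-is. An actual HDR encode would also need the colour and mastering metadata options, which are left out here.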
Hellboy.
15th May 2024, 22:28
So there is no point in using these 3 settings if I don't create a fully BD-compliant file?
Edit:
Except for --repeat-headers that is needed for HDR.
benwaggoner
16th May 2024, 01:47
So there is no point in using these 3 settings if I don't create a fully BD-compliant file?
Edit:
Except for --repeat-headers that is needed for HDR.
Or if you're making streams for something else that wants them for compatibility. They don't affect the encode itself one way or another.