Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
8th October 2019, 20:56 | #7082 | Link |
Registered User
Join Date: May 2015
Posts: 68
|
He has raised a valid and pertinent point. After all these years, x265 is still painfully slow on "normal" processors. You are right that the only way to encode faster is to get faster processors with more cores. I wish that weren't the case, and that by now the developers could have made it faster. Maybe it is mathematically/programmatically impossible to squeeze out any more speed. That's a pity.
|
8th October 2019, 22:36 | #7083 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,905
|
Quote:
A thing it won't use properly, though, is a second CPU: I noticed that if you have a dual-socket configuration, like a dual Xeon with a high number of cores and threads, x264 will only use one of the CPUs, which reduces the speed. Anyway, x264 is generally so fast by now that it's fine, also thanks to modern hand-written assembly optimizations like AVX2 that were not available to older encoders such as x262 and Xvid. x265, on the other hand, was developed with modern hardware in mind: not only does it use modern assembly optimizations (like x264), but it is also heavily parallelized, it uses both CPUs in a dual-socket environment, and it even offers additional settings if your CPU has so many cores that a single encode can't max it out.

You mentioned the mathematical complexity, and indeed H.264 is based on the Discrete Cosine Transform (whose cosine basis functions work with real numbers and are periodic in 2π) plus the Hadamard Transform, which is very light and is meant to take care of what the DCT couldn't compress well enough. As for H.265, it is indeed more demanding in terms of computational cost, as it uses both the Discrete Cosine Transform and the Discrete Sine Transform. Keep in mind that it could have been even more demanding: years ago, before 2013, there were proposals to use the Karhunen-Loève transform, which is the heaviest transform I know of and very demanding computationally, because back in the day it seemed impossible to achieve a 40% reduction over H.264 with a linear-algebra-only approach. According to the results, the KLT did beat the DCT, but the improvements were so small in some cases, and the computational cost so high, that they decided not to proceed with that approach, which then led to the modern DCT/DST approach.
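The transform split described above can be sketched numerically. This is purely my own illustration (naive matrix implementations, not encoder code); it only shows why the Hadamard transform is "light" — its matrix contains only ±1, so it needs no multiplications — while both transforms preserve the block's energy:

```python
import numpy as np

N = 8

def dct_matrix(n):
    # Orthonormal DCT-II basis: row k, column i.
    c = np.array([[np.cos(np.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
                  for k in range(n)])
    c[0] /= np.sqrt(2)
    return c * np.sqrt(2.0 / n)

def hadamard_matrix(n):
    # Sylvester construction: entries are only +1/-1, so the unnormalized
    # transform needs additions and sign flips, no multiplies.
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

block = np.arange(N * N, dtype=float).reshape(N, N)   # toy "residual" block
C, H = dct_matrix(N), hadamard_matrix(N)

dct_coeffs = C @ block @ C.T    # separable 2D DCT-II
had_coeffs = H @ block @ H.T    # separable 2D Hadamard

# Both bases are orthonormal, so the block's energy is preserved (Parseval).
print(np.allclose((dct_coeffs ** 2).sum(), (block ** 2).sum()))   # True
print(np.allclose((had_coeffs ** 2).sum(), (block ** 2).sum()))   # True
```

The real encoders of course use fixed-point, SIMD versions of these, but the matrix shapes are the same.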
If you take a look at the "future", you'll see 8K and H.266 VVC, which inherited the Discrete Cosine Transform and Discrete Sine Transform approach from H.265 HEVC but also uses an adaptive multiple transform (AMT) scheme for residual coding of both inter-coded and intra-coded blocks. This approach basically consists of a set of five DCT- and DST-based transforms, namely DCT-II, DCT-V, DCT-VIII, DST-I and DST-VII, plus a Signal Dependent Transform (SDT) that competes with the AMT output. The SDT approximates the optimal Karhunen-Loève transform (KLT), which is signal dependent, by estimating the current signal to be coded (the transform block) from similar, already-coded signals (reference patches) available at the decoder. This way a lot of computational power is saved by not using the KLT directly, which is far too demanding in terms of computational cost.

Anyway, you can be sure of one thing: it will be even more demanding. But, you know, in a world in which we have configurations with Intel Xeon CPUs like the Xeon Platinum 9282 (56c/112t), this doesn't seem to be a problem. For instance, I myself encode at work on an Intel Xeon 28c/56t with 64GB of RAM and a Quadro GPU, even though I've asked several times for a CPU upgrade, as it's been like this ever since 2017 and it's now "old" for what I gotta do on a daily basis. What do I have at home? Well, a crappy i7 4c/8t with 32 GB DDR4 and an NVIDIA RTX GPU, but it doesn't matter, since the PC I use at home is for general purposes: browsing (replying to you folks here on Doom9 :P), occasionally watching videos (although I do have my 4K Panasonic Blu-ray player for that), listening to music and studying (I'm in the middle of my master's at university while working as an encoder for a company). In a nutshell: computational cost will always get higher and higher, but CPUs will get better and better. :P Last edited by FranceBB; 8th October 2019 at 22:44. |
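To make the KLT remark above concrete, here's a toy sketch (entirely my own, using an AR(1) correlation model, a common textbook stand-in for image rows, and nothing from the HEVC/VVC code). It shows the two sides of the trade-off: the KLT basis must be computed per signal via an eigendecomposition — the cost that fixed DCT/DST bases avoid — and in return it packs almost all the energy into very few coefficients:

```python
import numpy as np

n, rho = 8, 0.95   # block size and AR(1) correlation (illustrative values)

# Covariance of a first-order autoregressive source: cov[i, j] = rho^|i-j|.
cov = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# KLT basis: eigenvectors of the covariance, strongest component first.
# This eigendecomposition is the part that must be redone per signal,
# which is why the KLT was judged too expensive for H.265.
evals, evecs = np.linalg.eigh(cov)
klt = evecs[:, ::-1].T            # rows are the KLT basis vectors

# Energy compaction: fraction of total variance in the first 2 coefficients.
captured = evals[::-1][:2].sum() / evals.sum()
print(captured > 0.85)            # True: almost all energy in 2 coefficients
```

The SDT trick described above amounts to approximating `klt` from already-decoded neighbouring patches instead of paying for the eigendecomposition on the encoder's actual statistics.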
|
9th October 2019, 02:21 | #7084 | Link |
Registered User
Join Date: May 2009
Posts: 331
|
FranceBB hits the nail on the head. It's also why AV1 is literally an order of magnitude slower than x265. As codecs and compression get more complex, the processing power required climbs steeply. There are ways around this, like the SVT implementations of HEVC and AV1, but they are seriously garbage in comparison to a dedicated CPU encode with the reference encoders. In time they might get better, but currently, no.
|
9th October 2019, 03:05 | #7085 | Link | |
Registered User
Join Date: May 2005
Location: Swansea, Wales, UK
Posts: 196
|
Quote:
It's not a question of whether it 'might' get better, though: the SVT codecs are owned/controlled by Intel, with Netflix working on them too, so unless they get bored and shelve them, development will continue, because it's a perfect way to show off and benchmark their super-mega-core-a-paloosa CPUs. The libaom encoder is more of a development platform/reference implementation of AV1 optimised into a working encoder, much like libvpx for VP8/VP9 from Google - I wouldn't ever expect it to reach the speed of the other implementations, since they are also developing the next-gen codec on an experimental branch at the moment. |
|
9th October 2019, 03:07 | #7086 | Link | ||
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
It's practically nonexistent: it doesn't have a listed price, and it has probably never been sold to anyone. Only rumors. Just for papers. Quote:
Try a 64C/128T EPYC CPU and you will get double the processing power for half the money of the Xeon 28C/56T. Yes, it's that simple.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
||
9th October 2019, 17:55 | #7087 | Link | |
結城有紀
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 894
|
Quote:
May I ask what a "normal" processor is? When x264 was released, I was one of the pioneers using it as a daily driver. What was a normal processor back then? An Athlon 64 4000+ with 1 core / 1 thread at 2.4GHz was probably a HEDT(?) processor. A Sempron 2400+ was probably a fairly normal processor, with 1c/1t at 1.66GHz. Does the latest x264 run faster on a Sempron 2400+? Probably not. Within 5 years after that, around 2010, we got the Phenom II X6 1055T at a reasonable price, with 6c/6t at 2.8GHz, which is about 10x as fast as an Athlon 64 4000+. You used to get 3 fps from x264; now it's 30 fps, which sounds very reasonable.

Now regarding x265: it was released around 2013, the year Haswell was released. The i7-4770K came with 4c/8t at 3.5GHz. Within 5 years, what did we get? A Core i9-9900K with 8c/16t at 4GHz, if you can afford it. From its PassMark score it's barely 2x the performance of the 4770K, and even if you take AVX2 into consideration it's not gonna be 3x or 4x. You used to get 3 fps; now it's 9 fps, which is still slow. So blame the CPU manufacturers, not the developers.

Also, you probably assumed that the code can still be optimized by a good margin. That's not really true for the versions after the full AVX2 optimization was done. Yes, before that it was a bit slow, because the full advantage of a modern CPU wasn't being used. For now, the AVX2 code (and AVX512 code as we speak) has little room for further optimization. We may be able to squeeze out a little time with some early-exit and skip heuristics. Again, a marginal difference.

HEVC is designed for the (near) future, with the ability to reduce processor demands by downgrading the parameters. If you want something fast, use a lower setting. Full-featured, high-setting encoding is supposed, and indeed desired, to be slow even on a high-end processor, let alone a "normal" one. Hope this helps. |
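The arithmetic above can be restated as a one-line model (my own simplification; the fps and speedup figures are the illustrative ones from the post, not measurements):

```python
# Hedged back-of-the-envelope model: encode speed roughly tracks aggregate
# CPU throughput. The parallel_efficiency knob hedges the fact that
# encoders never scale perfectly with core count.
def projected_fps(baseline_fps: float, cpu_speedup: float,
                  parallel_efficiency: float = 1.0) -> float:
    return baseline_fps * cpu_speedup * parallel_efficiency

# x264, ~2005 -> ~2010: roughly 10x more CPU (Athlon 64 4000+ -> Phenom II X6)
print(projected_fps(3.0, 10.0))   # 30.0
# x265, 2013 -> 2018: roughly 3x more CPU (i7-4770K -> i9-9900K incl. AVX2)
print(projected_fps(3.0, 3.0))    # 9.0
```

Crude, but it captures the point: with the encoder already near its optimization ceiling, throughput gains have to come from the hardware side of the product.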
|
10th October 2019, 10:30 | #7088 | Link | ||
Lost my old account :(
Join Date: Jul 2017
Posts: 326
|
Quote:
And tbh, I don't think x265 is better parallelized than x264; at default settings it's actually worse for resolutions under 4K because of the large CU size. And both x264 and x265 have a hard time scaling beyond 24-ish threads for 1080p at slower settings. Above that, clock speed should be prioritized over thread count if you're not doing chunked encoding. And this is by no means a criticism of x264 or x265; the parallelization and thread scaling are already very impressive for this task, and I don't think we can assume they will get much better without sacrificing something else. To increase speed and to utilize render farms, chunked encoding will be the way forward. And for live/realtime content, it will be hw-encoding doing the job. Quote:
Last edited by excellentswordfight; 10th October 2019 at 12:29. |
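The CU-size point in the post above can be made concrete with a rough count of wavefront rows. This is a simplification of my own: with wavefront parallel processing, per-frame parallelism is capped by the number of block rows, and x265's default 64x64 CTU yields far fewer rows at 1080p than x264's 16x16 macroblocks (real encoders add frame-level parallelism on top of this):

```python
import math

def wpp_rows(frame_height: int, block_size: int) -> int:
    # Each wavefront thread works on one row of blocks, so the row count
    # is an upper bound on per-frame parallelism.
    return math.ceil(frame_height / block_size)

print(wpp_rows(1080, 64))   # x265 default 64x64 CTU at 1080p -> 17 rows
print(wpp_rows(1080, 16))   # x264 16x16 macroblocks at 1080p -> 68 rows
print(wpp_rows(2160, 64))   # the same 64x64 CTU at 4K -> 34 rows
```

Which lines up with the "hard time scaling beyond 24-ish threads at 1080p" observation, and with why 4K fares better.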
||
10th October 2019, 12:18 | #7089 | Link | |
Registered User
Join Date: May 2015
Posts: 68
|
Quote:
The part about how processors haven't improved much is absolutely right, and I had that in mind as well. From 2000 to 2010, processors (low-end, mid-range, high-end, everything) became several times faster and sold for the same or even lower prices, probably due to the Intel-AMD competition. But incremental improvements have been a lot smaller this decade, especially after Sandy Bridge or Haswell. I could blame the CPU manufacturers, or blame the x265 developers for not anticipating that processing power per price would stop increasing at the rate it used to, but I'm not really trying to assign blame; I'm just observing that x265 is extremely slow on "normal" processors - by which I meant a reasonable home desktop without a gazillion cores and threads. (I'd say a quad-core i7 or a hexacore is the mainstream now.) I wasn't assuming that the code can be optimized further - I was wondering out loud whether it could. I was lamenting that perhaps they have reached a point where further speed optimizations just aren't possible - in which case only professional encoders or studios with 16+ core machines can use it at decent speeds, not casual home users. |
|
10th October 2019, 12:34 | #7090 | Link |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,784
|
CPUs are not developed with the one and only purpose to encode video.
If you want video encoding at top speed, use a dedicated video encoder chip... but as always, there are the usual conflicts between speed, accuracy/complexity, and other factors: "You can't have them all at maximum at the same time." |
10th October 2019, 19:32 | #7093 | Link |
Registered User
Join Date: Feb 2007
Location: Sweden
Posts: 483
|
x265 v3.2+6-f46aa2bc1c341 (32 & 64-bit 8/10/12bit Multilib Windows Binaries) (GCC 9.2.0)
Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default |
10th October 2019, 21:42 | #7094 | Link | |
結城有紀
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 894
|
Quote:
Haswell was the first generation to introduce AVX2. However, AVX2 performance is not consistent across generations: manufacturers constantly improve instruction throughput, and AVX2 on Haswell is slower than AVX2 on Skylake or later generations by some percentage. That's why I said the AVX2 performance difference needs to be taken into consideration. |
|
10th October 2019, 22:02 | #7095 | Link | |
結城有紀
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 894
|
Quote:
Casual home users can still use x265, just with lower settings. On one of my workstations, with only a 6-core Sandy Bridge, I can still get 4-5 FPS at 1080p with reasonable settings (10-bit, medium preset, LP tuning). Today a $199 6-core AMD is already twice the speed of that SNB, so 10 FPS is what you get if you max it out. I personally feel this is an acceptable speed for casual home users. If someone needs to do extensive HEVC encoding at a decent speed, a higher-performance CPU is inevitable. (The Ryzen 3900X is only $499, which is still kinda in the affordable range.) And I apologize for my incorrect assumption. Thanks for your input.
__________________
Projects x265 - Yuuki-Asuna-mod Download / GitHub TS - ADTS AAC Splitter | LATM AAC Splitter | BS4K-ASS Neo AviSynth+ filters - F3KDB | FFT3D | DFTTest | MiniDeen | Temporal Median Last edited by MeteorRain; 10th October 2019 at 22:05. |
|
10th October 2019, 23:19 | #7096 | Link |
Registered User
Join Date: May 2009
Posts: 331
|
And we already know that the current highest-end Epyc 7742 (64 cores) can do 8K encoding faster than real time (79 fps). Now imagine what a 32-core Zen 2 Threadripper will be able to do. Heck, I wish someone here who has a 3900X would run Sagitare's benchmark and post their results. We haven't had any of the new Zen 2 chips benched. Though I think he would need to update all the binaries to reflect the changes that have been made in x265, as the benchmark is 2 years old.
|
11th October 2019, 02:56 | #7098 | Link |
Registered User
Join Date: Jun 2017
Posts: 8
|
I'm getting some contradictory results now so this may just be some mpv wonkiness. I did notice that 3.1 stable (3.1.2+4-dc2dcb5) concatenates the cll settings with the master-display settings which might be an issue. Other versions have them separate.
3.1.2+4-dc2dcb5:   master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)cll=0,0
3.2+5-354901970679: master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1) / cll=0,0

I've also noticed AQ2 seems to have problems in HDR encodes (3.1 & 3.2 builds I've tested), leaving a lot of blotchy grain/puddling in flat areas of some scenes. AQ1 does much better. It's not that visible when you look at it without tone mapping, but in MPV it's very obvious. I tried AQ3 & 4 with no improvement. |
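For what it's worth, a quick script (my own, with a hand-rolled regex that just mirrors the two strings quoted above, not anything from x265) shows the values are identical and only the formatting differs, so tools that split the SEI string on whitespace or the " / " separator are the ones likely to trip:

```python
import re

def parse_hdr_params(s: str):
    """Split a master-display string into its primaries/luminance terms
    plus any cll pair. The pattern is mine, keyed to the quoted strings."""
    primaries = dict(re.findall(r'(G|B|R|WP|L)\((\d+,\d+)\)', s))
    cll = re.search(r'cll=(\d+),(\d+)', s)
    return primaries, (cll.groups() if cll else None)

glued = ("G(13250,34500)B(7500,3000)R(34000,16000)"
         "WP(15635,16450)L(10000000,1)cll=0,0")       # 3.1.2+4 style
separated = glued.replace(")cll", ") / cll")          # 3.2+5 style

print(parse_hdr_params(glued) == parse_hdr_params(separated))   # True
print(parse_hdr_params(glued)[1])                               # ('0', '0')
```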
11th October 2019, 06:08 | #7099 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Oh guys, please.
How many times do I have to post it? x265 is not optimized for speed/multi-threading on AMD CPUs. The 64C/128T EPYC processor managed to encode an 8K H.265 stream faster than real time using the Beamr H.265 encoding software, which IS optimized for speed/multi-threading and for EPYC. H.265 (the standard) is not the same thing as x265 (one encoder of it). https://www.tomshardware.com/news/am...ing,40400.html
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
11th October 2019, 06:52 | #7100 | Link | |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,733
|
Quote:
EDIT: I forgot that I was "there" as well: https://forum.doom9.org/showthread.php?t=175631 . I've not done any HDR encodes in a long time, but I would probably go for aq-mode 1 at the default strength. I think you should report an issue for both of these cases at https://bitbucket.org/multicoreware/x265/issues. I have a feeling that HDR encoding is not optimal (I also opened an issue regarding this, without a response), judging by how much you need to change aq-strength and CRF compared to SDR encodes.
__________________
And if the band you're in starts playing different tunes I'll see you on the dark side of the Moon... Last edited by Boulder; 11th October 2019 at 07:02. |
|