Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
8th October 2019, 20:56 | #7082 | Link |
Registered User
Join Date: May 2015
Posts: 68
|
He has raised a valid and pertinent point. After all these years, x265 is still painfully slow on "normal" processors. You are right that the only way to encode faster is to get faster processors with more cores. I wish that weren't the case, and that by now the developers could have made it faster. Maybe it is mathematically/programmatically impossible to squeeze out any more speed. That's a pity.
|
8th October 2019, 22:36 | #7083 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,905
|
Quote:
A thing it won't use properly, though, is a second CPU: I noticed that if you have a dual-socket configuration, like a dual Xeon with a high number of cores and threads, x264 will only use one of the CPUs, which reduces the speed. Anyway, x264 is generally so fast by now that it's fine, also thanks to modern hand-written assembly optimizations like AVX2 that were not available to older encoders such as x262 and Xvid. x265, on the other hand, was developed with modern hardware in mind: not only does it use modern assembly optimizations (like x264), but it is also heavily parallelized, it uses both CPUs in a dual-socket environment, and it even offers additional settings if your CPU has so many cores that a single encode can't max it out.

You mentioned the mathematical complexity, and indeed H.264 is based on the Discrete Cosine Transform (whose cosine basis functions work with real numbers and are periodic in 2π) plus the Hadamard Transform, which is very light and is meant to take care of what the DCT couldn't compress well enough. As for H.265, it is indeed more demanding in terms of computational cost, as it uses both the Discrete Cosine Transform and the Discrete Sine Transform. Keep in mind that it could have been even more demanding: years ago, before 2013, there were proposals to use the Karhunen-Loève transform, which is the heaviest transform I know of and very demanding computationally, because back in the day it seemed impossible to achieve a 40% reduction over H.264 with a linear-algebra-only approach. According to the results, the KLT did beat the DCT, but the improvements were so small in some cases, and the computational cost so high, that they decided not to proceed with that approach, which then led to the modern DCT/DST approach.
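The transform split described above can be sketched numerically. This is purely my own illustration (naive matrix implementations, not encoder code); it only shows why the Hadamard transform is "light" — its matrix contains only ±1, so it needs no multiplications — while both transforms preserve the block's energy:

```python
import numpy as np

N = 8

def dct_matrix(n):
    # Orthonormal DCT-II basis: row k, column i.
    c = np.array([[np.cos(np.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
                  for k in range(n)])
    c[0] /= np.sqrt(2)
    return c * np.sqrt(2.0 / n)

def hadamard_matrix(n):
    # Sylvester construction: entries are only +1/-1, so the unnormalized
    # transform needs additions and sign flips, no multiplies.
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

block = np.arange(N * N, dtype=float).reshape(N, N)   # toy "residual" block
C, H = dct_matrix(N), hadamard_matrix(N)

dct_coeffs = C @ block @ C.T    # separable 2D DCT-II
had_coeffs = H @ block @ H.T    # separable 2D Hadamard

# Both bases are orthonormal, so the block's energy is preserved (Parseval).
print(np.allclose((dct_coeffs ** 2).sum(), (block ** 2).sum()))   # True
print(np.allclose((had_coeffs ** 2).sum(), (block ** 2).sum()))   # True
```

The real encoders of course use fixed-point, SIMD versions of these, but the matrix shapes are the same.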
If you take a look at the "future", you'll see 8K and H.266 VVC, which inherited the Discrete Cosine Transform and Discrete Sine Transform approach from H.265 HEVC but also uses an adaptive multiple transform (AMT) scheme for residual coding of both inter-coded and intra-coded blocks. This approach basically consists of a set of five DCT- and DST-based transforms, namely DCT-II, DCT-V, DCT-VIII, DST-I and DST-VII, plus a Signal Dependent Transform (SDT) that competes with the AMT output. The SDT approximates the optimal Karhunen-Loève transform (KLT), which is signal dependent, by estimating the current signal to be coded (the transform block) from similar, already-coded signals (reference patches) available at the decoder. This way a lot of computational power is saved by not using the KLT directly, which is far too demanding in terms of computational cost.

Anyway, you can be sure of one thing: it will be even more demanding. But, you know, in a world in which we have configurations with Intel Xeon CPUs like the Xeon Platinum 9282 (56c/112t), this doesn't seem to be a problem. For instance, I myself encode at work on an Intel Xeon 28c/56t with 64GB of RAM and a Quadro GPU, even though I've asked several times for a CPU upgrade, as it's been like this ever since 2017 and it's now "old" for what I gotta do on a daily basis. What do I have at home? Well, a crappy i7 4c/8t with 32 GB DDR4 and an NVIDIA RTX GPU, but it doesn't matter, since the PC I use at home is for general purposes: browsing (replying to you folks here on Doom9 :P), occasionally watching videos (although I do have my 4K Panasonic Blu-ray player for that), listening to music and studying (I'm in the middle of my master's at university while working as an encoder for a company). In a nutshell: computational cost will always get higher and higher, but CPUs will get better and better. :P Last edited by FranceBB; 8th October 2019 at 22:44. |
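To make the KLT remark above concrete, here's a toy sketch (entirely my own, using an AR(1) correlation model, a common textbook stand-in for image rows, and nothing from the HEVC/VVC code). It shows the two sides of the trade-off: the KLT basis must be computed per signal via an eigendecomposition — the cost that fixed DCT/DST bases avoid — and in return it packs almost all the energy into very few coefficients:

```python
import numpy as np

n, rho = 8, 0.95   # block size and AR(1) correlation (illustrative values)

# Covariance of a first-order autoregressive source: cov[i, j] = rho^|i-j|.
cov = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# KLT basis: eigenvectors of the covariance, strongest component first.
# This eigendecomposition is the part that must be redone per signal,
# which is why the KLT was judged too expensive for H.265.
evals, evecs = np.linalg.eigh(cov)
klt = evecs[:, ::-1].T            # rows are the KLT basis vectors

# Energy compaction: fraction of total variance in the first 2 coefficients.
captured = evals[::-1][:2].sum() / evals.sum()
print(captured > 0.85)            # True: almost all energy in 2 coefficients
```

The SDT trick described above amounts to approximating `klt` from already-decoded neighbouring patches instead of paying for the eigendecomposition on the encoder's actual statistics.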
|
9th October 2019, 02:21 | #7084 | Link |
Registered User
Join Date: May 2009
Posts: 331
|
FranceBB hits the nail on the head. It's also why AV1 is literally an order of magnitude slower than x265. As codecs and compression get more complex, the processing power required climbs steeply. There are ways around this, like the SVT implementations of HEVC and AV1, but they are seriously garbage in comparison to a dedicated CPU encode with the reference encoders. In time they might get better, but currently, no.
|
9th October 2019, 03:05 | #7085 | Link | |
Registered User
Join Date: May 2005
Location: Swansea, Wales, UK
Posts: 196
|
Quote:
It's not a question of whether it 'might' get better, though: the SVT codecs are owned/controlled by Intel, with Netflix working on them too, so unless they get bored and shelve them, development will continue, because it's a perfect way to show off and benchmark their super-mega-core-a-paloosa CPUs. The libaom encoder is more of a development platform/reference implementation of AV1 optimised into a working encoder, much like libvpx for VP8/VP9 from Google - I wouldn't ever expect it to reach the speed of the other implementations, since they are also developing the next-gen codec on an experimental branch at the moment. |
|
9th October 2019, 03:07 | #7086 | Link | ||
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
It's practically nonexistent: it doesn't have a listed price, and it has probably never been sold to anyone. Only rumors. Just for papers. Quote:
Try a 64C/128T EPYC CPU and you will get double the processing power for half the money of the Xeon 28C/56T. Yes, it's that simple.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
||
9th October 2019, 17:55 | #7087 | Link | |
結城有紀
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 894
|
Quote:
May I ask what a "normal" processor is? When x264 was released, I was one of the pioneers using it as a daily driver. What was a normal processor back then? An Athlon 64 4000+ with 1 core / 1 thread at 2.4GHz was probably a HEDT(?) processor. A Sempron 2400+ was probably a fairly normal processor, with 1c/1t at 1.66GHz. Does the latest x264 run faster on a Sempron 2400+? Probably not. Within 5 years after that, around 2010, we got the Phenom II X6 1055T at a reasonable price, with 6c/6t at 2.8GHz, which is about 10x as fast as an Athlon 64 4000+. You used to get 3 fps from x264; now it's 30 fps, which sounds very reasonable.

Now regarding x265: it was released around 2013, the year Haswell was released. The i7-4770K came with 4c/8t at 3.5GHz. Within 5 years, what did we get? A Core i9-9900K with 8c/16t at 4GHz, if you can afford it. From its PassMark score it's barely 2x the performance of the 4770K, and even if you take AVX2 into consideration it's not gonna be 3x or 4x. You used to get 3 fps; now it's 9 fps, which is still slow. So blame the CPU manufacturers, not the developers.

Also, you probably assumed that the code can still be optimized by a good margin. That's not really true for the versions after the full AVX2 optimization was done. Yes, before that it was a bit slow, because the full advantage of a modern CPU wasn't being used. For now, the AVX2 code (and AVX512 code as we speak) has little room for further optimization. We may be able to squeeze out a little time with some early-exit and skip heuristics. Again, a marginal difference.

HEVC is designed for the (near) future, with the ability to reduce processor demands by downgrading the parameters. If you want something fast, use a lower setting. Full-featured, high-setting encoding is supposed, and indeed desired, to be slow even on a high-end processor, let alone a "normal" one. Hope this helps. |
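The arithmetic above can be restated as a one-line model (my own simplification; the fps and speedup figures are the illustrative ones from the post, not measurements):

```python
# Hedged back-of-the-envelope model: encode speed roughly tracks aggregate
# CPU throughput. The parallel_efficiency knob hedges the fact that
# encoders never scale perfectly with core count.
def projected_fps(baseline_fps: float, cpu_speedup: float,
                  parallel_efficiency: float = 1.0) -> float:
    return baseline_fps * cpu_speedup * parallel_efficiency

# x264, ~2005 -> ~2010: roughly 10x more CPU (Athlon 64 4000+ -> Phenom II X6)
print(projected_fps(3.0, 10.0))   # 30.0
# x265, 2013 -> 2018: roughly 3x more CPU (i7-4770K -> i9-9900K incl. AVX2)
print(projected_fps(3.0, 3.0))    # 9.0
```

Crude, but it captures the point: with the encoder already near its optimization ceiling, throughput gains have to come from the hardware side of the product.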
|
10th October 2019, 10:30 | #7088 | Link | ||
Lost my old account :(
Join Date: Jul 2017
Posts: 326
|
Quote:
And tbh, I don't think x265 is better parallelized than x264; at default settings it's actually worse for resolutions under 4K because of the large CU size. And both x264 and x265 have a hard time scaling beyond 24-ish threads for 1080p at slower settings. Above that, clock speed should be prioritized over thread count if you're not doing chunked encoding. And this is by no means a criticism of x264 or x265; the parallelization and thread scaling are already very impressive for this task, and I don't think we can assume they will get much better without sacrificing something else. To increase speed and to utilize render farms, chunked encoding will be the way forward. And for live/realtime content, it will be hw-encoding doing the job. Quote:
Last edited by excellentswordfight; 10th October 2019 at 12:29. |
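The CU-size point in the post above can be made concrete with a rough count of wavefront rows. This is a simplification of my own: with wavefront parallel processing, per-frame parallelism is capped by the number of block rows, and x265's default 64x64 CTU yields far fewer rows at 1080p than x264's 16x16 macroblocks (real encoders add frame-level parallelism on top of this):

```python
import math

def wpp_rows(frame_height: int, block_size: int) -> int:
    # Each wavefront thread works on one row of blocks, so the row count
    # is an upper bound on per-frame parallelism.
    return math.ceil(frame_height / block_size)

print(wpp_rows(1080, 64))   # x265 default 64x64 CTU at 1080p -> 17 rows
print(wpp_rows(1080, 16))   # x264 16x16 macroblocks at 1080p -> 68 rows
print(wpp_rows(2160, 64))   # the same 64x64 CTU at 4K -> 34 rows
```

Which lines up with the "hard time scaling beyond 24-ish threads at 1080p" observation, and with why 4K fares better.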
||
10th October 2019, 12:18 | #7089 | Link | |
Registered User
Join Date: May 2015
Posts: 68
|
Quote:
The part about how processors haven't improved much is absolutely right, and I had that in mind as well. From 2000 to 2010, processors (low-end, mid-range, high-end, everything) became several times faster and sold for the same or even lower prices, probably due to the Intel-AMD competition. But incremental improvements have been a lot smaller this decade, especially after Sandy Bridge or Haswell. I could blame the CPU manufacturers, or blame the x265 developers for not anticipating that processing power per price would stop increasing at the rate it used to, but I'm not really trying to assign blame; I'm just observing that x265 is extremely slow on "normal" processors - by which I meant a reasonable home desktop without a gazillion cores and threads. (I'd say a quad-core i7 or a hexacore is the mainstream now.) I wasn't assuming that the code can be optimized further - I was wondering out loud whether it could. I was lamenting that perhaps they have reached a point where further speed optimizations just aren't possible - in which case only professional encoders or studios with 16+ core machines can use it at decent speeds, not casual home users. |
|
10th October 2019, 12:34 | #7090 | Link |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,784
|
CPUs are not developed with the one and only purpose to encode video.
If you want video encoding at top speed, use a dedicated video encoder chip... but as always, there are the usual conflicts between speed, accuracy/complexity, and other factors: "You can't have them all at maximum at the same time." |
10th October 2019, 19:32 | #7093 | Link |
Registered User
Join Date: Feb 2007
Location: Sweden
Posts: 483
|
x265 v3.2+6-f46aa2bc1c341 (32 & 64-bit 8/10/12bit Multilib Windows Binaries) (GCC 9.2.0)
Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default |
10th October 2019, 21:42 | #7094 | Link | |
結城有紀
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 894
|
Quote:
Haswell was the first generation to introduce AVX2. However, AVX2 performance is not consistent across generations: manufacturers constantly improve instruction throughput, and AVX2 on Haswell is slower than AVX2 on Skylake or later generations by some percentage. That's why I said the AVX2 performance difference needs to be taken into consideration. |
|
10th October 2019, 22:02 | #7095 | Link | |
結城有紀
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 894
|
Quote:
Casual home users can still use x265, just with lower settings. On one of my workstations, with only a 6-core Sandy Bridge, I can still get 4-5 FPS at 1080p with reasonable settings (10-bit, medium preset, LP tuning). Today a $199 6-core AMD is already twice the speed of that SNB, so 10 FPS is what you get if you max it out. I personally feel this is an acceptable speed for casual home users. If someone needs to do extensive HEVC encoding at a decent speed, a higher-performance CPU is inevitable. (The Ryzen 3900X is only $499, which is still kinda in the affordable range.) And I apologize for my incorrect assumption. Thanks for your input.
__________________
Projects x265 - Yuuki-Asuna-mod Download / GitHub TS - ADTS AAC Splitter | LATM AAC Splitter | BS4K-ASS Neo AviSynth+ filters - F3KDB | FFT3D | DFTTest | MiniDeen | Temporal Median Last edited by MeteorRain; 10th October 2019 at 22:05. |
|
10th October 2019, 23:19 | #7096 | Link |
Registered User
Join Date: May 2009
Posts: 331
|
And we already know that the current highest-end Epyc 7742 (64 cores) can do 8K encoding faster than real time (79 fps). Now imagine what a 32-core Zen 2 Threadripper will be able to do. Heck, I wish someone here who has a 3900X would run Sagitare's benchmark and post their results. We haven't had any of the new Zen 2 chips benched. Though I think he would need to update all the binaries to reflect the changes that have been made in x265, as the benchmark is 2 years old.
|
11th October 2019, 02:56 | #7098 | Link |
Registered User
Join Date: Jun 2017
Posts: 8
|
I'm getting some contradictory results now so this may just be some mpv wonkiness. I did notice that 3.1 stable (3.1.2+4-dc2dcb5) concatenates the cll settings with the master-display settings which might be an issue. Other versions have them separate.
3.1.2+4-dc2dcb5:   master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)cll=0,0
3.2+5-354901970679: master-display=G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1) / cll=0,0

I've also noticed AQ2 seems to have problems in HDR encodes (3.1 & 3.2 builds I've tested), leaving a lot of blotchy grain/puddling in flat areas of some scenes. AQ1 does much better. It's not that visible when you look at it without tone mapping, but in MPV it's very obvious. I tried AQ3 & 4 with no improvement. |
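For what it's worth, a quick script (my own, with a hand-rolled regex that just mirrors the two strings quoted above, not anything from x265) shows the values are identical and only the formatting differs, so tools that split the SEI string on whitespace or the " / " separator are the ones likely to trip:

```python
import re

def parse_hdr_params(s: str):
    """Split a master-display string into its primaries/luminance terms
    plus any cll pair. The pattern is mine, keyed to the quoted strings."""
    primaries = dict(re.findall(r'(G|B|R|WP|L)\((\d+,\d+)\)', s))
    cll = re.search(r'cll=(\d+),(\d+)', s)
    return primaries, (cll.groups() if cll else None)

glued = ("G(13250,34500)B(7500,3000)R(34000,16000)"
         "WP(15635,16450)L(10000000,1)cll=0,0")       # 3.1.2+4 style
separated = glued.replace(")cll", ") / cll")          # 3.2+5 style

print(parse_hdr_params(glued) == parse_hdr_params(separated))   # True
print(parse_hdr_params(glued)[1])                               # ('0', '0')
```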
11th October 2019, 06:08 | #7099 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Oh guys, please.
How many times do I have to post it? x265 is not optimized for speed/multi-threading on AMD CPUs. The 64C/128T EPYC processor managed to encode an 8K H.265 stream faster than real time using the Beamr H.265 encoding software, which IS optimized for speed/multi-threading and for EPYC. H.265 (the standard) is not the same thing as x265 (one encoder of it). https://www.tomshardware.com/news/am...ing,40400.html
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
11th October 2019, 06:52 | #7100 | Link | |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,733
|
Quote:
EDIT: I forgot that I was "there" as well: https://forum.doom9.org/showthread.php?t=175631 . I've not done any HDR encodes in a long time, but I would probably go for aq-mode 1 at the default strength. I think you should report an issue for both of these cases at https://bitbucket.org/multicoreware/x265/issues. I have a feeling that HDR encoding is not optimal (I also opened an issue regarding this, without a response), judging by how much you need to change aq-strength and CRF compared to SDR encodes.
__________________
And if the band you're in starts playing different tunes I'll see you on the dark side of the Moon... Last edited by Boulder; 11th October 2019 at 07:02. |
|