Old 23rd September 2014, 11:53   #1  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Evaluation of HEVC decoders (SW, Hybrid and HW)

Welcome to this thread on the evaluation of HEVC decoders (HW, Hybrid and SW).

The latest update, with new samples for evaluating pure fixed-function HW HEVC decoders, is here, testing Kaby Lake HD 610, Skylake HD 530, Nvidia Turing GTX 1660, Nvidia Pascal GTX 1050 Ti/1060, Nvidia Maxwell 3rd gen GTX 960 and AMD Polaris RX 470 - 01/03/2017

Added some info on VP9 acceleration support in browsers playing YouTube on various hardware (including Polaris cards) here - 22/07/2017

The first official confirmation of Skylake's (HD 530) 8K (!) HEVC 8bit decoding capability in pure HW decoding mode here by Yups - 05/09/2015

Here is the comparison between Skylake HD 530 and Nvidia GTX 960 - 24/08/2015

Probably world's first Skylake 4K HEVC 8bit (HD 530) decoding results here and here and here by Yups: (looks like the world's fastest HW decoder for 8bit HEVC) - 22/08/2015

For updated Optimus results, take a look here - 27/05/2015


For those interested in hybrid only results on Intel's iGPU (DXVA, OpenCL), look at the results of the new PowerDVD 15 x86 (DXVA, OpenCL) and latest LAV 0.65/MPC 1.4.4.286 DXVA x64/x86 decoders here - 22/04/2015

Updated CPU results for the latest Lentoid v2.1.0.0, LAV v0.67.3 and MS H.265 decoder (build 10586) here - 29/11/2015

For those interested in CPU only results, look at the results of the new Lentoid x64/x86 v2.0.3.2, PowerDVD 15 on various CPU architectures and systems including Win 10 Build 10049 using latest LAV 0.65/MPC 1.4.4.286 decoders.
Also added a 10bit HEVC video here - 22/04/2015


Added new results for the first public release of UHDcode, the HEVC decoder of MultiCoreWare here - 01/04/2015


For HW decoder results of the Nvidia GTX 960, see the post from Nevcairiel here - 23/01/2015
(these are the world's first benchmark results of the fixed-function HEVC decoder inside GTX 960)

For slightly older tests - 14/01/2015 - including the PowerDVD v14 decoder, Microsoft's MFT Windows 10 decoder and a 10bit clip, on various CPU architectures (from Core 2 Duo up to Core i7 Haswell), take a look here


The tool I always use is DXVA Checker, which you can download from here


The first post (23/09/2014) starts here:

I used the latest DXVA Checker x86/x64, version v3.1.2, with DXVA processing and a scaling resolution of 1280x720 for all clips.


1080p clips

1.ProRes-1080p@30fps-2Mbps
https://www.sendspace.com/file/mo8exg


LAV x64 i7-4790 158/230/251

LAV x86 i7-4790 121/151/188

LAV x64 i7-4790 (DXVA) 83/134/157

LAV x86 i7-4790 (DXVA) 84/133/158

Strongene x86 i7-4790 (OpenCL) 83/112/146

Strongene x86 i7-4790 97/109/111

LAV x64 i5-4200M 76/98/114

LAV x64 i5-4200M (DXVA) 60/84/96

LAV x64 Core 2 Quad 45/52/54




2.Elephants Dream-1080p@24fps-1.7Mbps
http://www.libde265.org/hevc-bitstre...1080-cfg02.mkv


LAV x64 i7-4790 93/195/242

LAV x64 i7-4790 (DXVA) 78/138/204

LAV x86 i7-4790 (DXVA) 66/135/200

LAV x86 i7-4790 62/133/259

Strongene x86 i7-4790 66/107/116

LAV x64 i5-4200M 40/97/130

LAV x64 i5-4200M (DXVA) 44/84/112

LAV x64 Core 2 Quad 35/52/55

Strongene x86 i7-4790 (OpenCL) Crashed



3.Big Buck Bunny-1080p@60fps-2Mbps
http://www.libde265.org/hevc-bitstre...1080-cfg06.mkv


LAV x64 i7-4790 144/232/267

LAV x64 i7-4790 (DXVA) 123/165/230

LAV x86 i7-4790 (DXVA) 113/161/201

LAV x86 i7-4790 67/146/259

LAV x64 i5-4200M 66/109/131

LAV x64 i5-4200M (DXVA) 68/97/114

LAV x64 Core 2 Quad 45/53/55

Strongene x86 i7-4790 (OpenCL) Crashed

Strongene x86 i7-4790 Crashed just before the end of the file



UHD - 3840x2160p clips


1.Beauty-2160p@30fps-12.3Mbps
http://ultravideo.cs.tut.fi/video/Be...t_HEVC_MP4.mp4


LAV x64 i7-4790 51/63/65

Strongene x86 i7-4790 (OpenCL) 26/47/47

LAV x86 i7-4790 25/40/43

Strongene x86 i7-4790 20/34/35

LAV x64 i7-4790 (DXVA) 29/33/35

LAV x86 i7-4790 (DXVA) 30/32/33

LAV x64 i5-4200M 15/25/27

LAV x64 i5-4200M (DXVA) 19/21/22

LAV x64 Core 2 Quad 6/13/13



2.Fitness-2160p@30fps-8Mbps
http://cloud.ultrahdtv.net/fitness-trailer-8000.mkv


LAV x64 i7-4790 57/71/82

LAV x64 i7-4790 (DXVA) 41/55/77

LAV x86 i7-4790 (DXVA) 40/52/76

LAV x86 i7-4790 35/48/71

Strongene x86 i7-4790 28/40/50

LAV x64 i5-4200M (DXVA) 25/30/39

LAV x64 i5-4200M 22/29/38

LAV x64 Core 2 Quad 9/13/13

Strongene x86 i7-4790 (OpenCL) Crashed



3.Ducks-2160p@50fps-4Mbps
https://www.sendspace.com/file/cyiv49


LAV x64 i7-4790 62/72/73

Strongene x86 i7-4790 (OpenCL) 29/46/50

LAV x86 i7-4790 (DXVA) 43/45/47

LAV x64 i7-4790 (DXVA) 42/45/47

Strongene x86 i7-4790 27/41/42

LAV x86 i7-4790 29/40/44

LAV x64 i5-4200M 25/31/33

LAV x64 i5-4200M (DXVA) 27/29/30

LAV x64 Core 2 Quad 10/13/13
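
The result lines above all follow the same pattern: decoder and build, CPU, an optional mode in parentheses, then three slash-separated figures (or "Crashed"). For anyone who wants to load them into a script, here is a minimal sketch in Python. The names Result and parse_result are made up for illustration, the "CPU" label for lines without a mode is my own convention, and the three figures are kept as a plain tuple because the post does not spell out what each one stands for.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Result(NamedTuple):
    decoder: str                            # e.g. "LAV x64"
    cpu: str                                # e.g. "i7-4790" or "Core 2 Quad"
    mode: str                               # "DXVA", "OpenCL", or "CPU" if none is given
    values: Optional[Tuple[int, ...]]       # the three figures, or None if the run crashed


# decoder + build, CPU name, optional "(mode)", then "a/b/c" or a "Crashed..." note
LINE = re.compile(
    r"^(?P<decoder>\S+ x(?:86|64)) "
    r"(?P<cpu>.+?)"
    r"(?: \((?P<mode>[^)]+)\))?"
    r" (?P<rest>\d+/\d+/\d+|Crashed.*)$"
)


def parse_result(line: str) -> Result:
    """Parse one result line from the lists above."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    rest = m.group("rest")
    if rest.startswith("Crashed"):
        values = None
    else:
        values = tuple(int(v) for v in rest.split("/"))
    return Result(m.group("decoder"), m.group("cpu"), m.group("mode") or "CPU", values)


if __name__ == "__main__":
    for text in (
        "LAV x64 i7-4790 (DXVA) 83/134/157",
        "LAV x64 Core 2 Quad 45/52/54",
        "Strongene x86 i7-4790 (OpenCL) Crashed",
    ):
        print(parse_result(text))
```

A line with a missing space between the CPU name and the figures will raise ValueError, which is a quick way to spot typos in the lists.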



Comments:

I used the latest nightly of LAV Filters, 0.62.46 (24-09-2014), with LAV Video threads set to Auto, and the latest Strongene OpenCL and CPU HEVC decoders.

The Core i7 system tested was:
Win 8.1 Pro x64 - Core i7-4790 - HD 4600 - Drivers v.3907

The Core i5 system tested was:
Win 8.1 x64 - Core i5-4200M@2.5GHz (battery mode) - HD 4600 - Drivers v.3907 - Nvidia 740M - Drivers 344.11 - Optimus technology

The Core 2 Quad system tested was:
Win 8.1 Pro x64 - Core 2 Quad Q9550@2.26GHz (266MHz x 8.5) - Nvidia GT610 - Drivers 344.11

For laptop and Nvidia hybrid decoder, see my post here

For tests using more threads than Auto, null renderer, overclocked CPUs and different OS (Win 7) check this post:

CPU decoders for Core i7


With the CPU decoders, GPU usage was as low as possible, ~25%-35% at a 600MHz clock, but with Strongene's CPU decoder it was higher, ~80%@600MHz.

1080p
For the LAV x86/x64 CPU decoders, CPU utilization was 55% - 60% using all 8 threads at a clock of 3.8GHz. For Strongene's CPU decoder it was only 16% (!!!) at a clock of ~4.0GHz, with only a few threads in use.

2160p
For the LAV x64 CPU decoder, CPU utilization was 77% - 91% using all 8 threads at a clock of 3.8GHz. For the LAV x86 CPU decoder it was 92% - 94% using all 8 threads at 3.8GHz. For Strongene's CPU decoder it was only 25% - 29% (!!!) at a clock of ~4.0GHz, with only a few threads in use.



CPU decoders for Core i5/ Core 2 Quad


I used only the LAV x64 decoder, and the results are quite odd.

Core i5-4200M@2.5GHz is a dual core processor with four threads (2C/4T) and achieved a CPU utilization of 76% - 86% for 1080p and 76% - 89% for 2160p

Core 2 Quad Q9550@2.26GHz is a quad core processor with four threads (4C/4T) and achieved a CPU utilization of 50% - 58% for 1080p and 55% - 60% for 2160p

For all clips, the performance of the dual core Core i5@2.5GHz is almost double that of the quad core Core 2 Quad@2.26GHz, with clearly higher CPU utilization (!) on the dual core (2C/4T) chip: 76% - 89% versus 50% - 60% (!!)
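
To put a number on "almost double", here is a small sketch in Python that divides the Core i5 result by the Core 2 Quad result for each clip. It uses the middle of the three figures from the LAV x64 CPU lines above; picking the middle figure is my choice for illustration, and the names MIDDLE_FIGURES and speedup are made up.

```python
# Middle figure of the LAV x64 (CPU) lines, copied from the results above:
# clip -> (Core i5-4200M, Core 2 Quad Q9550)
MIDDLE_FIGURES = {
    "ProRes 1080p":          (98, 52),
    "Elephants Dream 1080p": (97, 52),
    "Big Buck Bunny 1080p":  (109, 53),
    "Beauty 2160p":          (25, 13),
    "Fitness 2160p":         (29, 13),
    "Ducks 2160p":           (31, 13),
}


def speedup(i5: int, c2q: int) -> float:
    """How many times faster the Core i5 is than the Core 2 Quad."""
    return i5 / c2q


if __name__ == "__main__":
    for clip, (i5, c2q) in MIDDLE_FIGURES.items():
        print(f"{clip:<24} {speedup(i5, c2q):.2f}x")
```

The ratio stays in a fairly narrow band around 2x across both resolutions.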



GPU decoders for Core i7


1080p
The DXVA decoders used the GPU at ~100% at its max 1200MHz clock, and the CPU at ~14% with a clock of 2.6GHz - 4.0GHz.

Strongene's OpenCL decoder used the GPU much like Strongene's CPU decoder, only at a slightly higher clock (600MHz - 750MHz), with a CPU usage of ~11% at 4.0GHz.


2160p
The DXVA decoders used the GPU at ~90% at its max 1200MHz clock, and the CPU at ~14% with a clock of 2.6GHz - 4.0GHz.

Memory usage went up to the limit of ~960MB-990MB, the maximum that the iGPU can use.

Strongene's OpenCL decoder used the GPU a lot more than in 1080p, at 85%@850MHz - 900MHz, with a CPU usage of 36% - 50% at 3.8GHz-4.0GHz, which is double that of Strongene's CPU decoder!


GPU decoders for Core-i5

See here



----------------------------------------------------------------------------------------------------------------------------------
LAV DXVA x86 and x64 have almost the same performance, and in 2160p they are slower than the OpenCL decoder, mainly because of the OpenCL decoder's CPU usage rather than its GPU usage.

It's interesting that LAV DXVA x86/x64 is faster than LAV CPU x86 in 1080p clips and low bitrate 2160p clips.
But in the 2160p clip at just 12Mbps, LAV x86 CPU is faster than DXVA.

During benchmarking, the LAV CPU decoders drew more power (watts) than the DXVA or OpenCL decoders, but I think the opposite occurs during playback (I haven't tested this yet).

I used the latest Intel GPA tool to monitor any QuickSync (fixed-function HW) activity, but it was dead zero during DXVA decoding.

So for the HEVC DXVA2 decoder everything is decoded on the EUs, plus a little on the CPU.

----------------------------------------------------------------------------------------------------------------------------------

Conclusion

LAV DXVA uses the maximum iGPU memory of 1GB for 4K decoding, and even at a low bitrate of 12Mbps it is slower than LAV x86 CPU.
It's more useful for lower resolutions.

Strongene's OpenCL decoder is definitely a lot more useful for 4K than for 1080p, but it has incompatibilities, and in order to be fast at 4K it leans heavily on the CPU, not so much on the GPU.

Strongene's CPU decoder is almost always the slowest of all.

LAV x64 CPU is always faster than anything else at all bitrates and resolutions.

CPU utilization of all CPU decoders rises with increased resolution and low bitrate.
The best example is the Ducks 2160p@4Mbps clip, with LAV x64 CPU utilization of 91% and LAV x86 at 94%.

When the bitrate increases a little, both CPU utilization and decoding performance drop.

For all decoders I used a favorable scenario: a 1280x720 display resolution.

When the native resolution is used for display, at 1080p and 2160p, the results are lower by a good percentage.

For 4K BluRay with a bitrate of 100Mbps and 10 bit depth, expect CPU decoders and hybrid decoders to be useless (?)* even with Haswell-E or Xeon processors.

We are definitely going to need pure fixed-function HW decoders for 4K BluRay.

On the other hand, 4K BluRay will appear in the winter holidays of 2015, so until then CPU and hybrid decoders are just fine for low bitrate, low fps clips, where HEVC is the best codec to use.
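
One way to read "just fine" is to compare the benchmark figure with the frame rate of the clip itself. Below is a rough sketch in Python, assuming the benchmark figures are decode rates in frames per second and using the middle figure of the LAV x64 CPU lines for the three 2160p clips. The names RESULTS, headroom and keeps_up are made up for illustration, and a ratio above 1.0 in the benchmark (at 1280x720 output) does not guarantee smooth playback at native resolution.

```python
# clip -> (clip frame rate, {system: middle figure of the LAV x64 CPU result})
RESULTS = {
    "Beauty 2160p":  (30, {"i7-4790": 63, "i5-4200M": 25, "Core 2 Quad": 13}),
    "Fitness 2160p": (30, {"i7-4790": 71, "i5-4200M": 29, "Core 2 Quad": 13}),
    "Ducks 2160p":   (50, {"i7-4790": 72, "i5-4200M": 31, "Core 2 Quad": 13}),
}


def headroom(decode_fps: float, clip_fps: float) -> float:
    """Decode rate as a multiple of the clip's own frame rate."""
    return decode_fps / clip_fps


def keeps_up(decode_fps: float, clip_fps: float) -> bool:
    """True when the benchmark rate is at least the clip's frame rate."""
    return headroom(decode_fps, clip_fps) >= 1.0


if __name__ == "__main__":
    for clip, (clip_fps, systems) in RESULTS.items():
        for system, fps in systems.items():
            mark = "ok" if keeps_up(fps, clip_fps) else "too slow"
            print(f"{clip:<14} {system:<12} {headroom(fps, clip_fps):.2f}x  {mark}")
```

With these figures only the Core i7 stays above 1.0x on all three 2160p clips.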

I have already encoded 4K H.264 and 4K H.265 at up to 600Mbps (!), and I can say for sure that the Haswell QuickSync decoder reaches about 190fps on a 4K H.264 100Mbps clip at 1280x720 display resolution.

Intel and Nvidia decided to offer a hybrid solution for now, which is a useful choice for low fps, low bitrate clips even at 4K resolution, and by the time 4K BluRay arrives, fixed-function decoders will be ready.

Hopefully AMD will follow.

* According to these results by cyberbeing, an overclocked CPU (even without HT) and a really fast Nvidia card can handle even harder-to-decode 4K HEVC clips at greater speed than in my results.

But I think 4K BluRay will still be out of their reach.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all

Last edited by NikosD; 5th May 2019 at 07:59.