#1 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Evaluation of HEVC decoders (SW, Hybrid and HW)
Welcome to this thread regarding the evaluation of HEVC decoders (HW, Hybrid and SW)

The latest update with new samples for evaluating pure fixed-function HW HEVC decoders is here testing Kabylake HD 610, Skylake HD 530, nVidia Turing GTX 1660, Nvidia Pascal GTX 1050 Ti/1060, Nvidia Maxwell 3rd gen GTX 960 and AMD Polaris RX 470 - 01/03/2017
Added some info on VP9 acceleration support of browsers using Youtube on various hardware (including Polaris cards) here - 22/07/2017
The first official confirmation of Skylake's (HD 530) 8K (!) HEVC 8bit decoding capability in pure HW decoding mode here by Yups - 05/09/2015
Here is the comparison between Skylake HD 530 and Nvidia GTX 960 - 24/08/2015
Probably world's first Skylake 4K HEVC 8bit (HD 530) decoding results here and here and here by Yups: (looks like the world's fastest HW decoder for 8bit HEVC) - 22/08/2015
For updated Optimus results, take a look here - 27/05/2015
For those interested in hybrid only results on Intel's iGPU (DXVA, OpenCL), look at the results of the new PowerDVD 15 x86 (DXVA, OpenCL) and latest LAV 0.65/MPC 1.4.4.286 DXVA x64/x86 decoders here - 22/04/2015
New updated CPU results for latest Lentoid v2.1.0.0, LAV v0.67.3 and MS H.265 decode (build 10586) here - 29/11/2015
For those interested in CPU only results, look at the results of the new Lentoid x64/x86 v2.0.3.2, PowerDVD 15 on various CPU architectures and systems including Win 10 Build 10049 using latest LAV 0.65/MPC 1.4.4.286 decoders. Also added a 10bit HEVC video here - 22/04/2015
Added new results for the first public release of UHDcode, the HEVC decoder of MultiCoreWare here - 01/04/2015
For HW decoder results of Nvidia 960 GTX, you have to see the post from Nevcairiel here - 23/01/2015 (these are the world's first benchmark results of the fixed-function HEVC decoder inside GTX 960)
For a little older tests - 14/01/2015 - including PowerDVD v14 decoder, Microsoft's MFT Windows 10 decoder, a 10bit clip, on various CPU architectures (from Core 2 Duo up to Core i7 Haswell) take a look here
The tool I use is always DXVA Checker you can download from here

The first post (23/09/2014) starts here:

I used latest DXVA Checker x86/x64 version v3.1.2 using DXVA processing and a scaling resolution of 1280x720 for all clips.

1080p clips

1.ProRes-1080p@30fps-2Mbps
https://www.sendspace.com/file/mo8exg
LAV x64 i7-4790 158/230/251
LAV x86 i7-4790 121/151/188
LAV x64 i7-4790 (DXVA) 83/134/157
LAV x86 i7-4790 (DXVA) 84/133/158
Strongene x86 i7-4790 (OpenCL) 83/112/146
Strongene x86 i7-4790 97/109/111
LAV x64 i5-4200M 76/98/114
LAV x64 i5-4200M (DXVA) 60/84/96
LAV x64 Core 2 Quad 45/52/54

2.Elephants Dream-1080p@24fps-1.7Mbps
http://www.libde265.org/hevc-bitstre...1080-cfg02.mkv
LAV x64 i7-4790 93/195/242
LAV x64 i7-4790 (DXVA) 78/138/204
LAV x86 i7-4790 (DXVA) 66/135/200
LAV x86 i7-4790 62/133/259
Strongene x86 i7-4790 66/107/116
LAV x64 i5-4200M 40/97/130
LAV x64 i5-4200M (DXVA) 44/84/112
LAV x64 Core 2 Quad 35/52/55
Strongene x86 i7-4790 (OpenCL) Crashed

3.Big Buck Bunny-1080p@60fps-2Mbps
http://www.libde265.org/hevc-bitstre...1080-cfg06.mkv
LAV x64 i7-4790 144/232/267
LAV x64 i7-4790 (DXVA) 123/165/230
LAV x86 i7-4790 (DXVA) 113/161/201
LAV x86 i7-4790 67/146/259
LAV x64 i5-4200M 66/109/131
LAV x64 i5-4200M (DXVA) 68/97/114
LAV x64 Core 2 Quad 45/53/55
Strongene x86 i7-4790 (OpenCL) Crashed
Strongene x86 i7-4790 Crashed just before the end of the file

UHD - 3840x2160p clips

1.Beauty-2160p@30fps-12.3Mbps
http://ultravideo.cs.tut.fi/video/Be...t_HEVC_MP4.mp4
LAV x64 i7-4790 51/63/65
Strongene x86 i7-4790 (OpenCL) 26/47/47
LAV x86 i7-4790 25/40/43
Strongene x86 i7-4790 20/34/35
LAV x64 i7-4790 (DXVA) 29/33/35
LAV x86 i7-4790 (DXVA) 30/32/33
LAV x64 i5-4200M 15/25/27
LAV x64 i5-4200M (DXVA) 19/21/22
LAV x64 Core 2 Quad 6/13/13

2.Fitness-2160p@30fps-8Mbps
http://cloud.ultrahdtv.net/fitness-trailer-8000.mkv
LAV x64 i7-4790 57/71/82
LAV x64 i7-4790 (DXVA) 41/55/77
LAV x86 i7-4790 (DXVA) 40/52/76
LAV x86 i7-4790 35/48/71
Strongene x86 i7-4790 28/40/50
LAV x64 i5-4200M (DXVA) 25/30/39
LAV x64 i5-4200M 22/29/38
LAV x64 Core 2 Quad 9/13/13
Strongene x86 i7-4790 (OpenCL) Crashed

3.Ducks-2160p@50fps-4Mbps
https://www.sendspace.com/file/cyiv49
LAV x64 i7-4790 62/72/73
Strongene x86 i7-4790 (OpenCL) 29/46/50
LAV x86 i7-4790 (DXVA) 43/45/47
LAV x64 i7-4790 (DXVA) 42/45/47
Strongene x86 i7-4790 27/41/42
LAV x86 i7-4790 29/40/44
LAV x64 i5-4200M 25/31/33
LAV x64 i5-4200M (DXVA) 27/29/30
LAV x64 Core 2 Quad 10/13/13

Comments:
I used latest nightly of LAV filters 0.62.46 (24-09-2014) with LAV Video threads set to Auto and latest Strongene's OpenCL and CPU HEVC decoders.
The Core i7 system tested was: Win 8.1 Pro x64 - Core i7-4790 - HD 4600 - Drivers v.3907
The Core i5 system tested was: Win 8.1 x64 - Core i5-4200M@2.5GHz (battery mode) - HD 4600 - Drivers v.3907 - Nvidia 740M - Drivers 344.11 - Optimus technology
The Core 2 Quad system tested was: Win 8.1 Pro x64 - Core 2 Quad Q9550@2.26GHz (266MHz x 8.5) - Nvidia GT610 - Drivers 344.11

For laptop and Nvidia hybrid decoder, see my post here
For tests using more threads than Auto, null renderer, overclocked CPUs and different OS (Win 7) check this post:

CPU decoders for Core i7

The CPU decoders were using the GPU as little as possible, ~25%-35% with a 600MHz clock, but Strongene's CPU decoder was higher, at ~80%@600MHz

1080p
For LAV x86/x64 CPU decoders, CPU utilization was 55% - 60% using all 8 threads with a clock of 3.8GHz. For Strongene CPU decoder it was only 16% !!! with a clock of ~4.0GHz and only a few threads enabled.

2160p
For LAV x64 CPU decoder, CPU utilization was 77% - 91% using all 8 threads with a clock of 3.8GHz. For LAV x86 CPU decoder, CPU utilization was 92% - 94% using all 8 threads with a clock of 3.8GHz. For Strongene CPU decoder it was only 25% - 29% !!! with a clock of ~4.0GHz and only a few threads enabled.

CPU decoders for Core i5 / Core 2 Quad

I used only LAV x64 decoder and the results are completely weird.
Core i5-4200M@2.5GHz is a dual core processor with four threads (2C/4T) and achieved a CPU utilization of 76% - 86% for 1080p and 76% - 89% for 2160p
Core 2 Quad Q9550@2.26GHz is a quad core processor with four threads (4C/4T) and achieved a CPU utilization of 50% - 58% for 1080p and 55% - 60% for 2160p
For all clips the performance of the dual core Core i5@2.5GHz is almost double that of the quad core Core 2 Quad@2.26GHz, with more than 50% CPU utilization (!) for the dual core (2C/4T) chip (!!)
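The "almost double" claim can be checked directly against the numbers posted above. A minimal sketch, assuming the middle number of each result triple is the average fps (the post does not label the three values); the dictionary layout and function name are only illustrative:

```python
# Average fps copied from the LAV x64 (CPU) result lines in this post.
# Assumption: the middle value of each "a/b/c" triple is the average fps.
avg_fps = {
    "ProRes 1080p":          {"i5-4200M": 98,  "Core 2 Quad": 52},
    "Elephants Dream 1080p": {"i5-4200M": 97,  "Core 2 Quad": 52},
    "Big Buck Bunny 1080p":  {"i5-4200M": 109, "Core 2 Quad": 53},
    "Beauty 2160p":          {"i5-4200M": 25,  "Core 2 Quad": 13},
    "Fitness 2160p":         {"i5-4200M": 29,  "Core 2 Quad": 13},
    "Ducks 2160p":           {"i5-4200M": 31,  "Core 2 Quad": 13},
}


def speedup(clip):
    """Return how many times faster the Core i5 is than the Core 2 Quad."""
    row = avg_fps[clip]
    return row["i5-4200M"] / row["Core 2 Quad"]


ratios = {clip: round(speedup(clip), 2) for clip in avg_fps}
for clip, ratio in ratios.items():
    print(f"{clip}: Core i5 is {ratio}x the Core 2 Quad")
```

With these inputs every ratio lands between roughly 1.9x and 2.4x, which matches the "almost double" wording for 1080p and is slightly above it for 2160p.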
GPU decoders for Core i7

1080p
The DXVA decoders were using GPU at ~100% and max 1200MHz clock and CPU at ~14% with a clock of 2.6GHz - 4.0GHz
Strongene's OpenCL decoder used GPU like Strongene's CPU decoder - only a few MHz higher (600MHz - 750MHz) and a CPU usage of ~11% at 4.0GHz

2160p
The DXVA decoders were using GPU at ~90% and max 1200MHz clock and CPU at ~14% with a clock of 2.6GHz - 4.0GHz. The memory usage went to the limit of ~960MB-990MB, the maximum that the iGPU can use.
Strongene's OpenCL decoder used the GPU a lot more than at 1080p, at 85%@850MHz - 900MHz, with a CPU usage of 36% - 50% at 3.8GHz-4.0GHz, which is double that of Strongene's CPU decoder!

GPU decoders for Core-i5

See here
----------------------------------------------------------------------------------------------------------------------------------
LAV DXVA x86 and x64 have almost the same performance and in 2160p they are slower than OpenCL decoder, mainly because of the CPU usage of OpenCL decoder and not GPU.
It's interesting that LAV DXVA x86/x64 is faster than LAV CPU x86 in 1080p clips and low bitrate 2160p clips. But in the 2160p clip at just 12Mbps, LAV x86 CPU is faster than DXVA.
During benchmarking LAV CPU decoders were using more power (watts) than DXVA or OpenCL decoder, but I think during playback the opposite occurs (I haven't tested this yet)
I used latest Intel's GPA tool to monitor any QuickSync (fixed-function HW) activity, but it was a dead zero during DXVA decoding. So everything is decoded in EUs and a little CPU for HEVC DXVA2 decoder.
----------------------------------------------------------------------------------------------------------------------------------
Conclusion

LAV DXVA uses the maximum iGPU memory of 1GB for 4K decoding and, even at a low bandwidth of 12Mbps, is slower than LAV x86 CPU. It's more useful for lower resolutions.
Strongene's OpenCL decoder is definitely a lot more useful for 4K than 1080p, but it has incompatibilities and in order to be fast at 4K it uses the CPU a lot, not so much the GPU.
Strongene's CPU decoder is almost always the slowest of all.
LAV x64 CPU is always faster than anything else in all bitrates and resolutions.
CPU utilization of all CPU decoders rises with increased resolution and low bitrate. Best example is the Ducks 2160p@4Mbps clip with LAV x64 CPU utilization of 91% and LAV x86 at 94%
When bandwidth increases a little, the CPU utilization and decoding performance drop further.
For all decoders I used a good case scenario of a 1280x720 display resolution. When native resolution is used for display at 1080p and 2160p, the results are lower by a good percentage.

For 4K BluRay with a bitrate of 100Mbps and 10 bit resolution, expect CPU decoders and hybrid decoders to be useless (?)* even with Haswell-E or Xeon processors. We are definitely going to need pure fixed-function HW decoders for 4K BluRay.
On the other hand, 4K BluRay will appear around the winter holidays of 2015, so until then, CPU and hybrid decoders are just fine for low bandwidth and low fps clips, for which HEVC is the best codec to use.
I have already encoded 4K H.264 and 4K H.265 up to 600Mbps (!) and I can say for sure that 4K H.264 performance of Haswell QuickSync decoder is about 190fps for a 4K H.264 100Mbps clip at 1280x720 display resolution.
Intel and Nvidia decided to offer a hybrid solution now, which is a useful choice for low fps, low bandwidth clips even at 4K resolution and by the time of 4K BluRay arrival, fixed-function decoders will be ready. Hopefully AMD will follow.

* according to these results by cyberbeing, an overclocked CPU - even without HT - and a real fast Nvidia card can handle even more difficult to decode 4K HEVC clips at greater speed than my results. But still 4K BluRay will be out of their reach I think.
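Whether a decoder is "just fine" for a clip comes down to comparing its worst-case decode speed with the clip's frame rate. A rough sketch of that check, assuming the first number of each triple is the minimum fps (the post does not label the values) and using only the LAV x64 CPU results for the UHD clips; the `headroom` helper is an illustrative name, not part of any tool mentioned here:

```python
# (clip, nominal frame rate from the clip title, minimum measured fps per CPU)
# Assumption: the first value of each "a/b/c" triple is the minimum fps.
results = [
    ("Beauty 2160p",  30, {"i7-4790": 51, "Core 2 Quad": 6}),
    ("Fitness 2160p", 30, {"i7-4790": 57, "Core 2 Quad": 9}),
    ("Ducks 2160p",   50, {"i7-4790": 62, "Core 2 Quad": 10}),
]


def headroom(min_fps, clip_fps):
    """Worst-case decode speed relative to real time (>= 1.0 means it keeps up)."""
    return min_fps / clip_fps


report = {}
for clip, clip_fps, mins in results:
    for cpu, min_fps in mins.items():
        h = headroom(min_fps, clip_fps)
        report[(clip, cpu)] = h
        verdict = "keeps up" if h >= 1.0 else "too slow"
        print(f"{clip} on {cpu}: {h:.2f}x real time ({verdict})")
```

These figures were measured with output scaled to 1280x720, so by the post's own caveat the headroom at native 2160p output would be lower.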
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all Last edited by NikosD; 5th May 2019 at 07:59. |
#2 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Next thing to test is a special driver from Nvidia or Intel or a LAV filters special version with significant gain in performance (CPU or Hybrid) for Nvidia or Intel.
I'll definitely try to repeat the tests with Optimus laptop to check if anything has changed regarding Nvidia's hybrid decoder performance. If any of you has interesting samples (difficult to decode), I would give them a try, especially in 1080p resolution for which I didn't find a lot of samples.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all Last edited by NikosD; 25th September 2014 at 18:51. |
#4 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,376
|
They are fast with stealing my code, apparently.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
#5 | Link | |
Registered User
Join Date: Dec 2002
Posts: 5,565
|
Quote:
https://www.dropbox.com/s/6rbqlrrhhv...%20UHD.ts?dl=0 (probably not cut correctly at GOP border) But it uses 10 bit coding - I don't know if Nvidia and Strongene support that? |
#7 | Link | ||
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
I think there are threads here in Doom9 with that subject - I think it has been discussed thoroughly - or you can start a new thread about that.
Quote:
Well, you are going to make me buy a new CPU for that file! I used latest nightly MPC-HC x64 and I got a lot of stuttering using EVR-CP. I had to use plain EVR in order to be smoother, but I can't say it was absolutely smooth.
LAV DXVA, Strongene's OpenCL and CPU decoders are all incompatible with 10-bit HEVC, so I can't use them for decoding comparisons. I don't know yet about Nvidia's HEVC decoder, but I think only LAV CPU can decode 10-bit HEVC clips.
BTW, I have a lot of 10 bit 4K HEVC clips from here http://demo-uhd3d.com/ all of which stutter during playback. It's not exactly stutter or judder, it's more like a repeated spasmodic movement after a few seconds of normal playback each time - you can call it stutter.
The problem with those 6 demo files I have is that the CPU utilization is below 50% when the movement looks stuttered. I thought the problem was in the encoding or muxing and stopped downloading other clips from that site for that reason. Is it possible that the decoding speed is the reason for the stuttering?
Those clips are huge. 6 of them are 5.15GBytes on my disk. I don't know if you grabbed your sample from there and cut a smaller sample, but if you have a link to the original source I would like to download the whole file. I would also be interested in HEVC 1080p files.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
#8 | Link |
Registered User
Join Date: Dec 2002
Posts: 5,565
|
Yes, it's very demanding. My Core i7-860 manages less than the required minimum of 50 fps on that sample, even with the Null Renderer. But compared to the 4K Blu-Ray this is low bitrate. We're looking at 4Kp60, 10 bit and more bitrate than current Blu-Rays.
It's a direct capture from the Astra 19.2° satellite. |
![]() |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
Quote:
OK, thanks.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
#11 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Added 2160p clips and substantially changed the comments for the new results.
I added a conclusion, too.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
#12 | Link |
Broadband Junkie
Join Date: Oct 2005
Posts: 1,859
|
i5-3570K @4.4Ghz, 16GB DDR3 1866Mhz 8-9-9-24
NVIDIA GTX 770 2GB, PCI-E 3.0 x8
Win7 SP1 x64
LAV Filters git-33f0b3 (2014-09-21)
DXVA Checker 3.1.2

1.Beauty-2160p@30fps-12.3Mbps
LAV x64 CPU (Null) 91/106/110
LAV x64 CPU (1280x720 scale) 78/100/110
LAV x64 DXVA (Null) 78/86/87
LAV x64 DXVA (1280x720 scale) 78/84/86
LAV x86 DXVA (Null) 76/83/86
LAV x86 CPU (Null) 36/47/53

2.Fitness-2160p@30fps-8Mbps
LAV x64 CPU (Null) 109/132/142
LAV x64 CPU (1280x720 scale) 88/125/142
LAV x64 DXVA (Null) 100/113/146
LAV x64 DXVA (1280x720 scale) 97/111/125
LAV x86 DXVA (Null) 86/101/123
LAV x86 CPU (Null) 41/62/118

3.Ducks-2160p@50fps-4Mbps
LAV x64 DXVA (Null) 193/197/198
LAV x64 DXVA (1280x720 scale) 183/187/189
LAV x86 DXVA (Null) 185/189/190
LAV x64 CPU (Null) 128/140/144
LAV x64 CPU (1280x720 scale) 107/131/140
LAV x86 CPU (Null) 36/48/54

Astra-UHD@50fps-18Mbps (10bit)
https://www.dropbox.com/s/6rbqlrrhhv...%20UHD.ts?dl=0
LAV x64 CPU (Null-P010) 70/86/107
LAV x64 CPU (1280x720 scale-NV12) 65/86/107
LAV x86 CPU (Null-P010) 28/44/86

UHD_ENT_Transformer_Quad@24fps-51Mbps (10bit)
http://demo-uhd3d.com/files/4k/Trans...Trailer_4K.zip
LAV x64 CPU (Null-P010) 45/59/102
LAV x64 CPU (1280x720 scale-NV12) 45/59/102
LAV x86 CPU (Null-P010) 25/39/86

Last edited by cyberbeing; 7th October 2014 at 20:18.
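Result lines in this thread follow a regular "decoder name, then three fps values" pattern, so they are easy to parse and rank. A small sketch, assuming the three values are minimum/average/maximum fps (an interpretation, since the posts do not label them); `Result`, `LINE_RE` and `parse_result` are illustrative names:

```python
import re
from typing import NamedTuple, Optional


class Result(NamedTuple):
    decoder: str
    min_fps: int
    avg_fps: int
    max_fps: int


# Decoder label, whitespace, then "a/b/c" at the end of the line.
LINE_RE = re.compile(r"^(?P<decoder>.+?)\s+(?P<min>\d+)/(?P<avg>\d+)/(?P<max>\d+)$")


def parse_result(line: str) -> Optional[Result]:
    """Parse one result line; return None for lines without an fps triple."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # e.g. entries that only say "Crashed"
    return Result(m["decoder"], int(m["min"]), int(m["avg"]), int(m["max"]))


# The Ducks-2160p lines from this post, copied verbatim.
ducks = """\
LAV x64 DXVA (Null) 193/197/198
LAV x64 DXVA (1280x720 scale) 183/187/189
LAV x86 DXVA (Null) 185/189/190
LAV x64 CPU (Null) 128/140/144
LAV x64 CPU (1280x720 scale) 107/131/140
LAV x86 CPU (Null) 36/48/54
""".splitlines()

parsed = [r for r in map(parse_result, ducks) if r is not None]
ranked = sorted(parsed, key=lambda r: r.avg_fps, reverse=True)
for r in ranked:
    print(f"{r.avg_fps:4d} fps avg  {r.decoder}")
```

Sorting by the assumed average keeps the comparison stable when the minimum is dragged down by a single slow frame; sorting by `min_fps` instead would be the stricter choice for judging smooth playback.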
#13 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,376
|
Are you sure DXVA was actually used? The performance seems extremely high for 2160p content. I never saw it go above 50 fps at that resolution, no matter how "simple" I made the encodes.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
#14 | Link |
Broadband Junkie
Join Date: Oct 2005
Posts: 1,859
|
Yes, I'm sure 'DXVA (Native)' was used. CPU usage was low, GPU usage was high. LAV reported 'dxva2n' during playback/benchmark.
Ducks UHD, GPU load ~90%, GTX 770 Core 1200Mhz (Boost)
Fitness UHD, GPU load ~55%, GTX 770 Core 1110Mhz (non-Boost)
Beauty UHD, GPU load ~65%, GTX 770 Core 1110Mhz (non-Boost)

Last edited by cyberbeing; 24th September 2014 at 17:58.
#15 | Link | |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
I think HEVC DXVA2 decoder is only for Intel cards, tomorrow I'll try Nvidia cards to see if it works for them too. I'm sure that Nvidia cards have NVCUVID ability to decode HEVC. Which card do the LAV DXVA results refer to? Is it the Nvidia card using NVCUVID?
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
#17 | Link |
Broadband Junkie
Join Date: Oct 2005
Posts: 1,859
|
I don't even have the driver from my CPU's HD4000 installed and it's disabled in the BIOS, so it's impossible that it would be used.
LAV added support for both DXVA2 & CUVID on supported NVIDIA cards (Kepler and newer?). Use of the CUVID or DXVA2 Copyback modes is of course much slower than DXVA2 Native, especially since I'm only using a PCI-E 3.0 x8 link. Though GPU & CPU usage is also much lower when benching in these modes, since the hybrid decoder seems to become heavily bottlenecked by 3840x2160 copyback performance. For example:

Ducks-2160p@50fps-4Mbps
LAV x64 DXVA Native (Null) 193/197/198
LAV x64 CUVID (Null) 73/78/78
LAV x64 DXVA Copyback (Null) 69/73/73

Fitness-2160p@30fps-8Mbps
LAV x64 DXVA Native (Null) 100/113/146
LAV x64 CUVID (Null) 50/57/70
LAV x64 DXVA Copyback (Null) 49/55/66

Beauty-2160p@30fps-12.3Mbps
LAV x64 DXVA Native (Null) 78/86/87
LAV x64 CUVID (Null) 43/51/55
LAV x64 DXVA Copyback (Null) 41/48/50

Last edited by cyberbeing; 24th September 2014 at 19:17.
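To put the copyback penalty in numbers, the sketch below computes the native-to-copyback slowdown and the amount of frame data that has to be read back per second. It assumes the middle value of each triple is the average fps and that each copied frame is a full 3840x2160 NV12 surface (8-bit 4:2:0, 1.5 bytes per pixel); the post states neither, so treat the output as an estimate:

```python
# Average fps from the lists above (middle value of each triple, assumed average).
avg = {
    "Ducks":   {"native": 197, "cuvid": 78, "copyback": 73},
    "Fitness": {"native": 113, "cuvid": 57, "copyback": 55},
    "Beauty":  {"native": 86,  "cuvid": 51, "copyback": 48},
}

WIDTH, HEIGHT = 3840, 2160
NV12_BYTES_PER_PIXEL = 1.5  # full-resolution luma plus half-resolution interleaved chroma

# Assumption: every decoded frame is copied back as one full NV12 surface.
frame_bytes = int(WIDTH * HEIGHT * NV12_BYTES_PER_PIXEL)


def copyback_rate_mb_s(fps):
    """Approximate readback traffic in MB/s (10**6 bytes) at the given frame rate."""
    return fps * frame_bytes / 1e6


slowdown = {clip: v["native"] / v["copyback"] for clip, v in avg.items()}
for clip, v in avg.items():
    rate = copyback_rate_mb_s(v["copyback"])
    print(f"{clip}: native is {slowdown[clip]:.2f}x copyback, "
          f"copyback moves about {rate:.0f} MB/s")
```

The slowdown is largest on the clip that decodes fastest (Ducks), which is consistent with the copy step, rather than the decode itself, being the limiting factor in these modes.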
#18 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Nice!
I didn't know. So I have much work to do tomorrow if I want to test every possible decoder for 740M. Hopefully I will see HEVC DXVA2 for 740M. Yes your CPU is Ivy so your card is HD 4000, not 4600. I thought that CUVID due to 3D clocks and zero memory performance penalty would be about equal to DXVA2 native. Thanks for the info!
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
#20 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,376
|
I think I mostly tested performance with Copy-Back, so that may explain it, but I'm surprised it's such a massive difference. I wonder if I can tune something there.
On H.264, for example, the difference was always minimal, but I guess that was 1080p and not 2160p.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |