24th September 2014, 20:03 | #21 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
From what nevcairiel said, I was under the impression that Nvidia's hybrid HEVC decoder was a CUVID-only feature, like MPEG4-ASP.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
24th September 2014, 20:10 | #22 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
|
Who ever said MPEG4-ASP is a CUVID-only feature? It's just that no one ever bothered to implement it in DXVA2, because it's quite a bit of work for practically no return.
With CUVID it just comes for free, since it's all handled in the driver and not much code is needed at all. HEVC is obviously implemented in DXVA2, and as such it of course also works on NVIDIA, and not only through CUVID.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders Last edited by nevcairiel; 24th September 2014 at 20:33. |
24th September 2014, 20:40 | #23 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
OK I'll test it and I'll post the results.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
25th September 2014, 03:22 | #26 | Link |
Registered User
Join Date: Dec 2005
Posts: 106
|
The x64 LAV Video decoder does a good job here, but there is another problem: we still don't have a mature playback suite. For example, there is no x64 DTS-HD software audio decoder so far.
So we still need to wait for a real HW decoder that supports at least 4K 60p HEVC (GM1XX does not have a full HW HEVC decoder, but GM2XX does). |
25th September 2014, 04:19 | #27 | Link | |
Registered User
Join Date: Oct 2012
Posts: 7,903
|
Quote:
you can read about this here. http://www.anandtech.com/show/8526/n...x-980-review/5 |
|
25th September 2014, 10:12 | #28 | Link | |
Registered User
Join Date: Sep 2004
Posts: 146
|
Quote:
I didn't know that it is not implemented. BTW, the hybrid HEVC decoder is pretty disappointing. Is there any room for improvement? Last edited by sheppaul; 25th September 2014 at 10:41. |
|
25th September 2014, 13:39 | #30 | Link | |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
Look at the Ducks sample. It's more than two times faster.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
|
25th September 2014, 16:52 | #31 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Using a laptop with Nvidia's Optimus technology proved a difficult task for video decoding benchmarking.
First of all, I couldn't find a way to disable it in the BIOS, so I had to live with it.
Starting with Intel's HD 4600 iGPU inside the Core i5-4200M, I saw that even with DXVA native decoding the HD 4600 never reached its maximum clock at full utilization. Most of the time it sat at 850MHz at 100% usage for both 1080p and 2160p clips, against a max clock of 1150MHz. CPU utilization was also too high for DXVA native mode, ~40% for 1080p clips and 33% - 57% for 2160p, and the CPU clock often hit its 2.5GHz maximum during benchmarking. It looks like a system that pairs an iGPU with a discrete Nvidia card through Optimus can't make full use of DXVA native mode for both GPUs. Of course, DXVA copy-back was even slower. In the Control Panel I chose max performance for the iGPU, and I tried the laptop both on battery and plugged in. Nothing changed.
With Nvidia's 740M GPU the problems were a lot bigger. With the 340.52 driver, the clocks went up to 980MHz (max 1060MHz) but GPU utilization stayed low. Then the video dropped to 0fps and stopped playing. Unfortunately that happened with almost all the clips and all the modes (DXVA copy-back, NVCUVID) I tried. I tried both LAV x86 and x64 with no success.
I updated to the latest driver, 344.11, but still had issues. The GPU clocks dropped even further, to 140MHz - 230MHz (!), with very little GPU usage. At least most of the clips now decoded fully without sudden stops, but not all of them; some still have incompatibilities. So there are still compatibility problems with the 740M, even with the latest drivers and LAV nightly 0.62.46.
The results were extremely low for the clips that decoded fine, and some clips didn't reach the end, so I decided not to include any results. I don't know whether the problem lies with Optimus, the driver or LAV Video. I will try again after a while to see if anything has changed.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all Last edited by NikosD; 26th September 2014 at 17:14. |
25th September 2014, 19:15 | #33 | Link | ||
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
Especially for Nvidia, the issue is certainly not there. Quote:
OK, you have a:
Core i5-3570K @4.4GHz (Ivy) 4cores/4threads (No HT) - Win7 SP1 x64 - 16GB - DDR3 1866MHz CL9 - LAV Filters git-33f0b3 (2014-09-21) - DXVA Checker 3.1.2
I have a:
Core i7-4790@3.8GHz (Haswell) 4cores/8threads (HT On) - Win 8.1 Pro x64 - 8GB - DDR3 1600MHz CL9 - LAV filters git 52 (25-09-2014) - DXVA Checker 3.1.2 (DXVA Decoding)
Your processor has a 16% higher frequency, but it has only 4 threads vs 8 threads and it's an older architecture (Ivy vs Haswell).
Results:
1.Beauty-2160p@30fps-12.3Mbps
LAV x64 i5-3570K 97/107/111 CPU Utilization: 95% (Threads=12) (Windows 7) (4.4GHz)
LAV x64 i5-3570K 89/103/109 CPU Utilization: 91% (Threads=12) (Windows 8.1) (4.4GHz)
LAV x64 i7-3770K 85/101/105 CPU Utilization: 79% (Threads=16) (HT on) (Windows 7) (4.0GHz)
LAV x64 i7-4790 82/95/99 CPU Utilisation: 78% (Threads=16) (HT on) (Windows 8.1) (3.8GHz)
LAV x64 i7-3770K 88/93/100 CPU Utilization: 95% (Threads=16) (HT off) (Windows 7) (4.0GHz)
LAV x64 i7-3770K 80/93/99 CPU Utilization: 70% (Threads=Auto) (HT on) (Windows 7) (4.0GHz)
LAV x64 i7-3770K 77/92/97 CPU Utilization: 94% (Threads=12) (HT off) (Windows 7) (4.0GHz)
LAV x64 i7-4790 78/90/95 CPU Utilisation: 73% (Threads=Auto) (HT on) (Windows 8.1) (3.8GHz)
LAV x64 i5-3570K 76/82/86 CPU Utilization: 72% (Threads=Auto) (Windows 7) (4.4GHz)
LAV x64 i7-4790 69/81/87 CPU Utilisation: 90% (Threads=16) (HT off) (Windows 8.1) (3.8GHz)
LAV x64 i7-4790 69/81/85 CPU Utilisation: 90% (Threads=12) (HT off) (Windows 8.1) (3.8GHz)
LAV x64 i7-3770K 61/71/76 CPU Utilization: 71% (Threads=Auto) (HT off) (Windows 7) (4.0GHz)
LAV x64 i7-4790 61/68/72 CPU Utilisation: 90% (Threads=Auto) (HT off) (Windows 8.1) (3.8GHz)
2.Fitness-2160p@30fps-8Mbps
LAV x64 i5-3570K (Null) 109/132/142 Threads=12
LAV x64 i7-4790 (Null) 99/120/148 CPU Utilisation: 92% Threads=16
LAV x64 i7-4790 (Null) 97/113/136 CPU Utilisation: 85% Threads=Auto
3.Ducks-2160p@50fps-4Mbps
LAV x64 i5-3570K (Null) 128/140/144 Threads=12
LAV x64 i7-4790 (Null) 109/122/127 CPU Utilisation: 92% Threads=16
LAV x64 i7-4790 (Null) 110/120/122 CPU Utilisation: 91% Threads=Auto
4.Astra-UHD@50fps-18Mbps (10bit)
LAV x64 i5-3570K 71/86/108 CPU Utilization: 94% (Threads=12) (Windows 7) (4.4GHz)
LAV x64 i5-3570K 76/82/86 CPU Utilization: 72% (Threads=Auto) (Windows 7) (4.4GHz)
LAV x64 i5-3570K 73/81/86 CPU Utilization: 69% (Threads=Auto) (Windows 8.1) (4.4GHz)
LAV x64 i7-4790 61/72/85 CPU Utilisation: 80% (Threads=16) (HT on) (Windows 8.1) (3.8GHz)
LAV x64 i7-4790 59/69/81 CPU Utilisation: 76% (Threads=Auto) (HT on) (Windows 8.1) (3.8GHz)
LAV x64 i5-3570K 55/69/95 CPU Utilization: 75% (Threads=Auto) (Windows 7) (4.4GHz)
LAV x64 i7-4790 51/64/77 CPU Utilisation: 93% (Threads=12) (HT off) (Windows 8.1) (3.8GHz)
LAV x64 i7-4790 47/63/77 CPU Utilisation: 92% (Threads=16) (HT off) (Windows 8.1) (3.8GHz)
LAV x64 i7-4790 47/55/76 CPU Utilisation: 79% (Threads=Auto) (HT off) (Windows 8.1) (3.8GHz)
5.UHD_ENT_Transformer_Quad@24fps-51Mbps (10bit)
LAV x64 i5-3570K(Null-P010) 45/59/102
LAV x64 i7-4790 (Null) 45/58/85 CPU Utilisation: 96% Threads=16
LAV x64 i7-4790 (Null-P010) 45/57/80 CPU Utilisation: 94% Threads=Auto
Can you explain at least the last result to me? A Core i7-4790 at 96% CPU utilization (all 8 threads decoding) has lower decoding performance than a Core i5-3570K with a 16% higher frequency. I find it unbelievable.
I used DXVA Checker v3.1.2's DXVA decoding option for the Null renderer and the latest (today's) LAV Filters 0.62.52. Can you try that tool with that filters version to check the performance, reporting CPU utilization too? Thanks!
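One way to look at the last result is to divide the fps by the clock of each CPU. The sketch below is only an illustration: it assumes the three numbers in each result are min/avg/max fps (the post does not spell that out), it takes the clocks from the system descriptions above, and the helper names are made up for this example.

```python
import re

# Matches result lines as they appear in the post, e.g.
#   "LAV x64 i7-4790 (Null) 45/58/85 CPU Utilisation: 96% Threads=16"
# The optional "(Null)" / "(Null-P010)" tag after the CPU name is skipped.
RESULT_LINE = re.compile(
    r"LAV x64 (?P<cpu>i\d-\d+K?)\s*(?:\([^)]*\))?\s*"
    r"(?P<low>\d+)/(?P<mid>\d+)/(?P<high>\d+)"
)


def middle_fps(line: str) -> int:
    """Return the middle number of the a/b/c triple (assumed to be the average fps)."""
    match = RESULT_LINE.search(line)
    if match is None:
        raise ValueError(f"not a result line: {line!r}")
    return int(match.group("mid"))


def fps_per_ghz(line: str, clock_ghz: float) -> float:
    """Middle fps divided by the CPU clock in GHz, rounded to two decimals."""
    return round(middle_fps(line) / clock_ghz, 2)


# UHD_ENT_Transformer_Quad results, clocks taken from the post (4.4GHz vs 3.8GHz).
i5 = fps_per_ghz("LAV x64 i5-3570K(Null-P010) 45/59/102", 4.4)
i7 = fps_per_ghz("LAV x64 i7-4790 (Null) 45/58/85 CPU Utilisation: 96% Threads=16", 3.8)
print(i5, i7)  # 13.41 15.26
```

Per GHz the i7-4790 comes out ahead on this clip under those assumptions, so the near-identical raw numbers are at least consistent with the i5's clock advantage. It says nothing about why the extra threads bring so little.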
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all Last edited by NikosD; 26th September 2014 at 21:28. |
||
25th September 2014, 19:56 | #34 | Link |
Broadband Junkie
Join Date: Oct 2005
Posts: 1,859
|
All my previous results were with LAV Video set to Threads=12, since as you also noticed, CPU utilization and resulting performance is lower than expected on some HEVC samples with LAV set to Threads=Auto. I'd assume this is the discrepancy you are seeing.
|
25th September 2014, 20:46 | #35 | Link | |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
I first tried Threads=12, but it was equal to or a little slower than Auto on my CPU, so I went all the way up to Threads=16. I edited my previous post to show the Threads=16 results. There is an increase, but I still can't reach your numbers. I still find it extremely weird. Do you have a secret? Did you use GraphStudioNext for benchmarking with the Null renderer? I can't figure it out, unless it's the different LAV Filters version. I can't think of anything else.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
|
25th September 2014, 21:41 | #36 | Link |
Broadband Junkie
Join Date: Oct 2005
Posts: 1,859
|
I'm not doing anything special:
DXVAChecker 3.1.2 -> Decoder tab -> Drag/Drop Video -> Click Arrow -> Benchmark -> DXVA decoding
Threading differences between Win7 & Win8, HT vs no-HT, architecture jumps not always being superior in all metrics, and Intel CPUs sometimes unbottlenecking themselves when overclocked could all be possible explanations.
Also, since you have a laptop, it's possible your CPU is power/thermal throttling and not using maximum TurboBoost when benchmarking. If you download RealTemp TI, it should show you the actual clock speed at any given time.
It looks like the LAV HEVC threading problem I was referring to only occurs on the following two samples:
3570K@4.4Ghz LAV Filters git-1d591 (Betaking build)
1.Beauty-2160p@30fps-12.3Mbps
LAV x64 i5-3570K 97/107/111 CPU Utilization: 95% (Threads=12)
LAV x64 i5-3570K 76/82/86 CPU Utilization: 72% (Threads=Auto)
Astra-UHD@50fps-18Mbps (10bit)
LAV x64 i5-3570K 71/86/108 CPU Utilization: 94% (Threads=12)
LAV x64 i5-3570K 55/69/95 CPU Utilization: 75% (Threads=Auto)
The other samples had good utilization (>92%) with Threads=Auto, and only ~1% lower performance vs Threads=12. Not so for the two samples above.
Last edited by cyberbeing; 25th September 2014 at 21:52. |
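To put the Threads=12 vs Threads=Auto gap on the two samples above into percentages, here is a small worked calculation. It is only a sketch: it assumes the middle number of each triple is the average fps, and the function names are invented for this example.

```python
def average_fps(triple: str) -> int:
    """Middle value of an 'a/b/c' result string (assumed to be the average fps)."""
    return int(triple.split("/")[1])


def gain_percent(fixed_threads: str, auto_threads: str) -> float:
    """How much faster the fixed thread count is than Auto, in percent."""
    ratio = average_fps(fixed_threads) / average_fps(auto_threads)
    return round((ratio - 1) * 100, 1)


# i5-3570K @ 4.4GHz, Threads=12 vs Threads=Auto, numbers from the post above.
print("Beauty:", gain_percent("97/107/111", "76/82/86"))  # 30.5
print("Astra: ", gain_percent("71/86/108", "55/69/95"))   # 24.6
```

On those two clips the fixed setting is roughly a quarter to a third faster, which is far from the ~1% difference reported for the other samples.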
25th September 2014, 22:27 | #38 | Link |
Broadband Junkie
Join Date: Oct 2005
Posts: 1,859
|
Above NikosD was testing Null, which implies no scaling.
My results using 'DXVA Processing' and 1280x720 scaling with actual renderer output explicitly state so. There is not much difference in performance either way.
Since you both have CPUs with Hyperthreading, try disabling HT and see if your results improve. And you, huhn, since you have a 3770K, could try matching my 4.4GHz overclock. If you are only seeing 60% CPU utilization on your 3770K, that is extremely strange... were you also using Win8 and not Win7 like I am? I'll try booting into Win8.1 later and see if there is any difference.
Also, the LAV Filters nightly builds I've been using for testing are those from Betaking: here. Which builds have you guys been using?
Edit (Win7 vs Win8.1 Threading): Windows 8.1 does seem to have marginally lower CPU utilization and benchmark performance compared to Windows 7, but not significant enough to explain what NikosD or huhn are seeing.
1.Beauty-2160p@30fps-12.3Mbps
LAV x64 i5-3570K 97/107/111 CPU Utilization: 95% (Threads=12) (Windows 7)
LAV x64 i5-3570K 89/103/109 CPU Utilization: 91% (Threads=12) (Windows 8.1)
LAV x64 i5-3570K 76/82/86 CPU Utilization: 72% (Threads=Auto) (Windows 7)
LAV x64 i5-3570K 73/81/86 CPU Utilization: 69% (Threads=Auto) (Windows 8.1)
Edit2 (Betaking 2014-09-25 vs K-Lite 10.76 LAV nightly build): No difference in performance.
Last edited by cyberbeing; 26th September 2014 at 01:20. |
26th September 2014, 00:58 | #39 | Link |
Registered User
Join Date: Oct 2012
Posts: 7,903
|
i used windows 7.
lavfilter is 0.62.0.52, i took it from KLCP.
4.4 GHz results in a blue screen after reboot. my 3770k has pretty bad quality silicon so i stay at 4.0.
my RAM was at 1333 for some reason, it's now at the normal 1600. the system hadn't been rebooted for some time.
looks a lot better with 1600 and a rebooted system:
Beauty_3840x2160_120fps_420_8bit_HEVC_MP4
80 93 99 / 70% threads auto (should be 12 with 8 thread system)
85 101 105 / 79% threads 16
i'm pretty sure an i7 will look pretty good with 24-32 threads.
all used Null rendering
edit: non HT
61 71 76 / 71% threads auto (should be 6)
77 92 97 / 94% threads 12
88 93 100 / 95% threads 16
looks like using 3 times the number of threads the CPU has helps for HEVC decoding.
12 threads on a 4 core cpu and 12 threads on a 4 core cpu with HT have about the same speed, but the min FPS is a lot higher with HT.
at 16 threads it looks like HT gives a decent speed boost, where non HT shows no real difference except min fps.
i guess HT does a good job with a lot of threads too, but this will take "a lot" of RAM. i guess even with 32 threads this shouldn't be a real problem for people with 8 thread CPUs.
Last edited by huhn; 26th September 2014 at 01:18. Reason: formating |
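The thread counts discussed above follow simple multiples of the number of hardware threads, which can be listed for any CPU. This is only a sketch for planning a benchmark sweep: the 1.5x value for Auto is huhn's guess ("should be 12 with 8 thread system", "should be 6"), not something confirmed for LAV Video, and the function name is invented.

```python
import os
from typing import List


def thread_candidates(hardware_threads: int) -> List[int]:
    """Thread counts worth benchmarking for a CPU with this many hardware threads.

    First entry: 1.5x the hardware threads (huhn's guess at what Auto picks).
    Remaining entries: 2x, 3x and 4x the hardware threads.
    """
    auto_guess = hardware_threads * 3 // 2
    return [auto_guess] + [hardware_threads * factor for factor in (2, 3, 4)]


print(thread_candidates(4))  # 4C/4T (no HT): [6, 8, 12, 16]
print(thread_candidates(8))  # 4C/8T (HT on): [12, 16, 24, 32]
print(thread_candidates(os.cpu_count() or 1))  # this machine
```

For a 4-thread CPU the 3x entry is the Threads=12 setting used above, and for an 8-thread CPU the 3x and 4x entries are the 24-32 range huhn expects to do well.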
26th September 2014, 08:02 | #40 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
|
Some clips are just not encoded in a way that benefits from multi-threading, and while you can try throwing more and more threads at them to improve the CPU utilization, it's usually a bad solution.
For such clips it may be useful to combine both frame and slice multi-threading, so that big I frames can use slice threading (or WPP) to speed up their decoding, since everything hinges on them being ready as a reference for the others. I've seen preliminary patches for this, but it's never been finished.
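To see how a particular clip responds to each threading mode on its own, one option is to time ffmpeg's software decoder from the command line. The sketch below only builds the command lines, using ffmpeg's `-threads`, `-thread_type` and `-benchmark` options. It does not combine the two modes (that is the unfinished work mentioned above), the file name is a placeholder, and numbers from the ffmpeg CLI will not necessarily match what LAV Video does inside a DirectShow graph.

```python
from typing import List


def decode_benchmark_cmd(path: str, threads: int, thread_type: str) -> List[str]:
    """Build an ffmpeg command that decodes `path` and discards the output.

    thread_type is "frame" or "slice". Both threading options are placed
    before -i so that they apply to the decoder of the input file.
    """
    if thread_type not in ("frame", "slice"):
        raise ValueError(f"unsupported thread_type: {thread_type!r}")
    return [
        "ffmpeg", "-hide_banner", "-benchmark",
        "-threads", str(threads),
        "-thread_type", thread_type,
        "-i", path,
        "-an",              # ignore audio, only video decoding is of interest
        "-f", "null", "-",  # null muxer: decode everything, write nothing
    ]


# "clip.mkv" is a hypothetical file name.
for mode in ("frame", "slice"):
    print(" ".join(decode_benchmark_cmd("clip.mkv", 12, mode)))
```

Each list can be passed to `subprocess.run` on a machine with ffmpeg installed. Whether the slice mode helps at all depends on how the clip was encoded, which is the point made above.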
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |