View Full Version : H.264 CPU/DXVA codec comparison - Core2Duo vs UVD 2.2



NikosD
14th December 2011, 22:36
I think the latest versions of DXVA Checker (2.6.1/2.6.2) are rock solid in terms of stability of results, and more or less bug free.
I'm talking about ATI UVD hardware and the Microsoft + AMD decoders (DS & MFT).

Do you have any examples of bad behavior or funny results from DXVA Checker?

As for Mirillis, when I first wrote here about Splash Pro in the DiAVC post, nobody knew the program, and everyone tried to put it down by talking only about its bugs at the time.

It's good to see that people other than me see talent in that company.

vivan
16th December 2011, 12:38
Any particular reason that you tested both VC-1 and H.264 clips with the FFDShow QS version in the VMR renderer?

Why don't you use EVR with FFDShow QS and add the results to the above post?

It should be faster...

Actually, it was slightly slower and less stable.

7.
Renderer: Enhanced Video Renderer (DirectShow)
Decoder: ffdshow Video Decoder
Decoder Device: ModeH264_VLD_NoFGT
Time: 00:03.303
Average FPS: 112,322
Min/Max FPS: 96 / 117
CPU Usage (%): Avg: 27 Min: 25 Max: 30


8.
Renderer: Enhanced Video Renderer (DirectShow)
Decoder: ffdshow Video Decoder
Decoder Device: ModeH264_VLD_NoFGT
Time: 00:04.932
Average FPS: 111,517
Min/Max FPS: 79 / 115
CPU Usage (%): Avg: 27 Min: 25 Max: 31

And with VC-1, ffdshow QS failed.
If I choose Play -> EVR, it plays smoothly (http://2.firepic.org/2/images/2011-12/16/zoslxl93k364.png).
But if I choose Benchmark -> EVR, it just shows a blank window (http://2.firepic.org/2/images/2011-12/16/ztjkn2iam60i.png), and it forces Aero off in 90% of cases.
Looks like a DXVA Checker bug...

As for CPU usage, egur recently wrote about it in his thread:
http://forum.doom9.org/showthread.php?p=1544519#post1544519
Anyway, CPU usage is low enough and the decoder is quite fast. Plus it's possible to use madVR :)

NikosD
16th December 2011, 13:00
Strange behavior, which can only be explained by the non-pure DXVA implementation of FFDShow QS.
Pure DXVA uses EVR only.
It could be a bug in DXVA Checker, in FFDShow QS, or in both.

The work that has already been done by Egur is impressive, but Intel has to do two things:

1) Make pure, working DXVA decoders (DS/MFT).
2) Make HW acceleration (DXVA) COMPLETELY CPU INDEPENDENT.

It's a combination of drivers and hardware that I hope to see in Ivy Bridge, which will be capable of decoding multiple H.264 streams in hardware at up to 4K resolution.

It will support 4K x 4K too.

nevcairiel
16th December 2011, 13:07
2) Make HW acceleration (DXVA) COMPLETELY CPU INDEPENDENT.


Their GPU is in their CPU. :P
For the record, using a good DXVA decoder (or a QuickSync decoder) will already result in very low CPU usage, allowing it to remain in the lowest performance state possible.

NikosD
16th December 2011, 13:55
Their GPU is in their CPU. :P
For the record, using a good DXVA decoder (or a QuickSync decoder) will already result in very low CPU usage, allowing it to remain in the lowest performance state possible.

The CPU resources used by a very powerful CPU like SB or Ivy etc when decoding in benchmark mode even the toughest H.264 clips should be near 0%.

With my poor Core2Duo at 1.6GHz, when I benchmark even the toughest H.264 clips with UVD 2.2, I never go beyond 8%-9% at the max. The average is 3%-5%!!! With a dual core @ 1.6GHz!

They have to implement the whole H.264 pipeline (including the bitstream format stage) in drivers/hardware in Ivy Bridge.

BTW, the DXVA decoder equals, or should equal, the QS decoder, meaning that the QuickSync decoder should use a pure DXVA decoder only (not the Intel Media SDK).
The latter (Intel Media SDK) is far more flexible and useful for some people than pure DXVA, but there should exist an option for pure speed (DXVA only) by Intel.

vivan
16th December 2011, 14:13
should exist an option for pure speed (DXVA only) by Intel.

The lowest fps is ~120 fps on the 8th video, while both NVIDIA and ATI have much slower performance ;)
And for "pure speed" there is an option called "CoreAVC", at least.
Also, madVR is a much better option than higher benchmark results :)

nevcairiel
16th December 2011, 15:03
The CPU resources used by a very powerful CPU like SB or Ivy etc when decoding in benchmark mode even the toughest H.264 clips should be near 0%.

Since I don't believe any benchmark I didn't do myself, here it is.

I used the "Girls" clip, because I still had it on my disk.

NVIDIA GTX 570
Renderer: Enhanced Video Renderer (DirectShow)
Decoder: Microsoft DTV-DVD Video Decoder
Decoder Device: ModeH264_VLD_NoFGT
Processor Device: -
Time: 02:36.734
Average FPS: 76,237
Min/Max FPS: 74 / 79
CPU Usage (%): Avg: 01 Min: 00 Max: 02

Intel i7 2600k:
Renderer: Enhanced Video Renderer (DirectShow)
Decoder: Microsoft DTV-DVD Video Decoder
Decoder Device: ModeH264_VLD_NoFGT
Processor Device: -
Time: 00:28.663
Average FPS: 416,879
Min/Max FPS: 398 / 427
CPU Usage (%): Avg: 03 Min: 02 Max: 05

I'll leave everyone to judge these results however they want, but a 5-fold increase in decoding speed will simply need more CPU, no questions asked. 1% at 80 fps is just 5% at 400 fps.
Not sure why this clip is so extremely fast, though; others play at around 200-300 fps, but still at a maximum of 4-5% CPU.
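
The scaling argument can be checked with a few lines of Python. This is only a rough sketch: it assumes CPU usage grows linearly with decoded frames per second, which the numbers suggest but do not prove, and the function name `scale_cpu_usage` is made up for illustration. The inputs are the "Girls" clip results posted above.

```python
def scale_cpu_usage(cpu_percent, measured_fps, target_fps):
    """Estimate CPU usage at target_fps, assuming usage is proportional to fps."""
    return cpu_percent * target_fps / measured_fps


# Results for the "Girls" clip from the post above (decimal commas read as points).
gtx570_fps, gtx570_cpu = 76.237, 1.0
i7_fps, i7_cpu = 416.879, 3.0

# What the GTX 570 run's 1% average would become at the i7's decoding speed.
projected_cpu = scale_cpu_usage(gtx570_cpu, gtx570_fps, i7_fps)

# Average CPU percent spent per decoded frame per second, for a like-for-like view.
gtx570_cost = gtx570_cpu / gtx570_fps
i7_cost = i7_cpu / i7_fps

print(f"Projected CPU at {i7_fps:.0f} fps: {projected_cpu:.1f}%")
print(f"CPU % per fps: GTX 570 {gtx570_cost:.4f}, i7 2600k {i7_cost:.4f}")
```

Under that linear assumption, the 3% average measured on the i7 2600k is below what a straight projection from the GTX 570 run would give.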

So, no, they don't need to change anything, and they also don't need their own decoder, the MS decoder is doing just fine. :)

PS:
The Intel Media SDK is just a wrapper around DXVA2, there is no special API like there is with CUDA for NVIDIA.

NikosD
16th December 2011, 17:40
If we compare different things we are not going to extract useful and valid results.

First of all, you didn't state your CPU frequency during benchmarking.
5% of what CPU? At what frequency?

The toughest clips I know of, which Vivan also tried, are clips 7 and 8, and according to him the CPU usage was ~25% during benchmarking.
This is a DAMN HIGH CPU USAGE FOR A QUAD CORE CPU AS POWERFUL AS SANDY.

Try to benchmark clips 7 and 8, WRITING DOWN THE PROCESSOR FREQUENCY.

But above all...
When Vivan forced the CPU frequency down to 800MHz, he got half the fps in clips 7 and 8!!

This is UNACCEPTABLE.
CPU frequency and CPU decoding ability should have nothing to do with decoding performance when using DXVA.

It's that simple!

And for "pure speed" there is an option called "CoreAVC", at least.


It uses Intel Media SDK too, I think.
Pure DXVA means no CUDA or OpenVideo or Intel Media SDK involved.

nevcairiel
16th December 2011, 18:30
You sure are an unfriendly fellow

Anyhow, here are results for sample 8.

Intel i7 2600k
Renderer: Enhanced Video Renderer (DirectShow)
Decoder: Microsoft DTV-DVD Video Decoder
Decoder Device: ModeH264_VLD_NoFGT
Processor Device: -
Time: 00:03.700
Average FPS: 143,243
Min/Max FPS: 133 / 148
CPU Usage (%): Avg: 08 Min: 07 Max: 09

The CPU goes to full clock during the benchmark (3.8 GHz).
In "Playback" mode, the CPU stays at its lowest setting (1.6 GHz), at around 3-4% usage.

The NVIDIA GTX570 doesn't even manage to reach 24 fps on it:
Renderer: Enhanced Video Renderer (DirectShow)
Decoder: Microsoft DTV-DVD Video Decoder
Decoder Device: ModeH264_VLD_NoFGT
Processor Device: -
Time: 00:23.928
Average FPS: 22,150
Min/Max FPS: 20 / 24
CPU Usage (%): Avg: 03 Min: 02 Max: 05

Again, a 5 to 6 fold performance advantage, with only triple the CPU usage (which is partly from the demuxer and other DirectShow components).
I don't know what more proof you need: the decoding is done completely in hardware, and it's the fastest out there. The relatively high CPU load in benchmark mode is only natural given the very high fps achieved.

Don't forget that you still need a CPU to pump 120 fps of a very high bitrate clip from your HDD through DirectShow to the decoder.
DXVA only does the decoding, but the splitter still needs to process the video, and if it's very high bitrate, there is also more data to process --> higher CPU load.
120 Mbit/s at 24 fps equals 600 Mbit/s at 120 fps, quite a lot of data to shuffle around!
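
The bitrate arithmetic can be written out as a small Python sketch. It assumes the amount of data per frame stays the same when the clip is decoded faster than its nominal frame rate; the function name `effective_bitrate_mbit` is hypothetical, and the numbers are the ones quoted in the post.

```python
def effective_bitrate_mbit(nominal_bitrate_mbit, nominal_fps, benchmark_fps):
    """Data rate the splitter must sustain when the clip is decoded at benchmark_fps."""
    return nominal_bitrate_mbit * benchmark_fps / nominal_fps


# Figures from the post: a 120 Mbit/s clip at 24 fps, benchmarked at 120 fps.
rate_mbit = effective_bitrate_mbit(120, 24, 120)
rate_mbyte = rate_mbit / 8  # 8 bits per byte

print(f"{rate_mbit:.0f} Mbit/s, i.e. {rate_mbyte:.0f} MB/s to read and demux")
```

That data still has to be read from disk and passed through the splitter by the CPU, whatever does the actual decoding.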

Oh well, I'm done. Believe what you want, but the hardware decoder in the Intel CPUs is some fine work.
Now if only they fixed some of the driver bugs, it would be even better. :)

NikosD
16th December 2011, 18:41
Keep up the good work with LAV Filters.

Hope you add an MFT MKV splitter and DXVA for all hardware (ATI, Nvidia, Intel) soon :)

NikosD
28th December 2011, 20:58
I recently installed the latest Nvidia beta drivers (290.53) on a GeForce GT440 under Windows 7 Home x86, and I didn't get any codecs from Nvidia.

I thought that Nvidia provided the "NVIDIA Video Decoder MFT" in their drivers.

Does anyone know what happened and why Nvidia stopped providing MFT decoders?

Which is the last driver with Nvidia MFT decoders?

Where can I find the NVIDIA MFT decoders?

Thanks!

nevcairiel
28th December 2011, 21:57
NVIDIA hasn't provided their own decoder for quite a long time, because the MS decoder works just fine with their cards.
I don't know which driver was the last to ship the decoder, but it wasn't in the 2xx series of drivers, so no driver that supports the 440 will have it. :p

NikosD
29th December 2011, 08:53
Thanks for the info.

But then, if Nvidia decided so, how could someone use a VC-1/WMV3 DXVA VLD decoder?

Because the MS decoders (DS/MFT) for DXVA VC-1/WMV3 don't support VLD decoding.

MS decoders provide full acceleration (VLD) only for MPEG2 and H.264.

That's why AMD decided to implement its own MFT decoders for VLD VC-1/WMV3/MPEG4ASP (DivX, Xvid), which was a very nice move by AMD.

BTW, how could someone use DXVA MPEG4ASP with VP4?

I think that the official DivX codec only works with UVD3, and your CUVID decoder is discontinued.

Is there any other way?

JohnnyFu
3rd August 2012, 08:50
Hi guys,

I have a question regarding DXVA and I believe this is a good place to ask.

During evaluation of H.264 software decoders we realized we cannot decode four H.264 streams at the same time without violating our software specification requirements for CPU load.

I tried CoreAVC, Mainconcept, Elecard and Microsoft. Our i7-620M platform is above 70% load when decoding more than two streams (1080p 25fps, avg 20Mbit/s) at the same time, and at 100% load when decoding four streams at the same time.

I noticed that when decoding a single stream using the EVR renderer, at least Elecard, Mainconcept and Microsoft make use of DXVA, reducing CPU load to almost 0%.

However, I also noticed that when decoding more than one stream at the same time, only one GraphStudio instance seems to make use of DXVA.

So I wonder: is it possible to make use of DXVA with multiple decoding processes at the same time? Or does it simply not work by design?

Any tip on how to decode up to four HD streams at the same time on our i7-620M platform without causing CPU load higher than 60% would be greatly appreciated! Our target is streams at 1080p, 20Mbps, 23-30fps.

EDIT: just noticed that with MPC-HC both processes seem to make use of DXVA, as they are both causing just 2% load instead of 30%. However, the video stutters horribly...

NikosD
3rd August 2012, 09:43
Your CPU/platform (i7-620M) uses the Nehalem architecture for mobile, called Arrandale.

Arrandale doesn't have an integrated GPU, or more exactly a VPU (Video Processing Unit), powerful enough to process four 1080p30 H.264 streams at the same time.

It is possible that you can't even process two streams of that type with your hardware.

So whatever software/codec you try (CoreAVC, Mainconcept, etc.), it doesn't change anything: the hardware is the limiting factor for the decoding performance of your platform.

Using my discrete card, an ATI Radeon 5750, I can only decode two 1080p30 H.264 streams at 20Mbps in HW (DXVA).
Not even three.

But I can do four 720p30 H.264 streams at 10Mbps.

The QuickSync hardware inside most Sandy Bridge and Ivy Bridge Intel processors, and maybe discrete Nvidia cards with VP5, can decode four 1080p30 H.264 streams at 20Mbps.

For QuickSync I'm sure.

JohnnyFu
3rd August 2012, 10:01
We are able to decode two to three 1080p streams at 20-30Mbps using CoreAVC in software mode. However, this violates our CPU load SRS.

When using DXVA with two streams, both Microsoft and Elecard seem to make use of DXVA for both streams. However, it looks like they are competing for the DXVA API in some way, as the video stutters horribly as soon as I start the second stream.

I just realized our encoding platform uses Sandy Bridge. I'll reconfigure the lab and see if there is any stuttering when decoding two streams using DXVA.

vivan
3rd August 2012, 10:14
Run a benchmark using DXVA Checker. If you get less than 60 fps on your video, then your hardware is too slow to decode even 2 such streams.
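
That rule of thumb can be sketched in Python. It is a rough upper bound only: it assumes the decoder's benchmark throughput divides evenly between streams with no extra overhead, and that each stream plays at 30 fps (which is what "60 fps for 2 streams" implies). The function name `max_realtime_streams` is made up for illustration; 22.150 and 143.243 are the sample 8 benchmark results posted earlier in the thread, and 59.0 is a hypothetical value just under the threshold.

```python
def max_realtime_streams(benchmark_fps, stream_fps=30.0):
    """Rough upper bound on simultaneous streams the decoder can keep up with."""
    if stream_fps <= 0:
        raise ValueError("stream_fps must be positive")
    # Each stream needs stream_fps worth of decoder throughput (assumed linear).
    return int(benchmark_fps // stream_fps)


for fps in (22.150, 59.0, 143.243):
    count = max_realtime_streams(fps)
    print(f"{fps:7.3f} fps in benchmark -> at most {count} stream(s) at 30 fps")
```

Real multi-stream playback can do worse than this estimate, since the renderer and splitter add their own load per stream.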

JohnnyFu
3rd August 2012, 10:17
Wow... I just decoded four streams using Microsoft's Decoder without any problems, 10% CPU load with Sandy Bridge! Thank you very much, huge step forward for me!

fashionman
28th November 2012, 10:10
hi,

I am developing a video decoder with Media Foundation and DXVA2. After decoding a frame, GPU usage increases by about 20%, but no video is displayed on screen. Could you give me some advice?

thanks

NikosD
28th November 2012, 12:51
Sorry I'm not a developer.

Maybe Egur (Eric) or Nevcairiel (Hendrik) could help you.

rubait
23rd July 2020, 00:28
Can you give me access to ftp://helpedia.com/pub/multimedia/x264/testvideos/? I wanted to test these out on an Ice Lake system for comparison.

NikosD
23rd July 2020, 11:11
Can you give me access to ftp://helpedia.com/pub/multimedia/x264/testvideos/? I wanted to test these out on an Ice Lake system for comparison.

The server seems to be down, but it's not mine.
I think a user called @mariush had uploaded all those files on that server.
Maybe you could reach him.