H.264 CPU/DXVA codec comparison - Core2Duo vs UVD 2.2
pirlouy
2nd April 2011, 21:05
Of course I don't have nevcairiel's knowledge, but for me, OpenCL is better than CUDA because even if you seem to disagree, it seems more open than CUDA.
And if people start developing applications for CUDA, it's the beginning of dependence on a single company.
Imagine if there were a C for Intel and a C for AMD, it would be boring for everyone.
From what I've read, OpenCL can use DXVA (with its limitations), but it seems it's up to GPU companies to allow direct connections between OpenCL and their H.264/VC1 etc. modules. And from NikosD's link, it seems it's the case now with AMD.
But indeed, all this is young and probably still has a lot of bugs. In a few years, though, I hope there won't be a lot of applications "optimized for Nvidia"....
NikosD
3rd April 2011, 10:34
The only reason Nvidia would never implement an "OpenCL" video decoder is CUDA.
If Nvidia had the choice, they would prefer not to implement anything in OpenCL. CUDA is their exclusive API, which they try to spread everywhere and to everyone in order to sell more cards.
Eventually we will see an "OpenCL" video decoder, but for ATI only.
pirlouy
3rd April 2011, 11:33
I think they support OpenCL, but through CUDA (OpenCL -> CUDA -> hardware). So it should work, but not as efficiently as it would with a direct "connection". But at least they try...
thuan
4th April 2011, 04:50
I run everything on my home system through a UPS, and according to its software, my power usage with accelerated DXVA playback and with ffdshow on EVR-CP is not that different, hovering around 123 W and 125 W respectively. This, plus the problems I have with accelerated video playback, is more than enough for me to abandon it.
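A minimal sketch of that comparison, using only the two wattage readings from the post. The two-hour movie length is an assumption added for illustration, not something that was measured.

```python
# Wall-power readings reported in the post (watts), taken from the UPS software.
dxva_watts = 123.0       # accelerated playback with DXVA
software_watts = 125.0   # ffdshow with EVR-CP


def energy_wh(watts: float, hours: float) -> float:
    """Energy in watt-hours for a constant draw over a given duration."""
    return watts * hours


saving_watts = software_watts - dxva_watts
saving_percent = 100.0 * saving_watts / software_watts

# ASSUMPTION: a hypothetical two-hour movie, just to put the gap in perspective.
movie_hours = 2.0
saving_wh = energy_wh(software_watts, movie_hours) - energy_wh(dxva_watts, movie_hours)

print(f"DXVA saves {saving_watts:.0f} W ({saving_percent:.1f}%), "
      f"about {saving_wh:.0f} Wh over a {movie_hours:.0f} h movie")
```

On these numbers the whole-system difference is small, which matches the conclusion drawn in the post.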
CruNcher
4th April 2011, 11:01
Hmm, using FFmpeg-MT, CoreAVC or DiAVC? And yes, it's a valid question if you only look at playback power consumption, especially as CPUs become more and more energy efficient. Nobody has yet measured how much energy, for example, CoreAVC needs for playback on SB with the integrated GPU not doing anything else, versus with the GPU doing the decode. In this non-discrete case, with the logic combined on one die, it's almost clear that it would save a lot of energy even compared to a highly efficient decoder like CoreAVC (it could look completely different for a discrete system, where you have a base idle consumption + decoder DSP consumption + the other card components' consumption while decoding) ;)
For other system configurations, though, it would lose efficiency depending on several factors. Then again, a desktop system with a discrete GPU is mostly also used for gaming, so the GPU is active either way, including the logic you may not need. Being able to use it to save some power in certain tasks, or just to offload the CPU, is nothing wrong. And yes, decoding alone can be a very small power saving depending on the system, but if you combine more and more tasks smartly and efficiently you can save a lot.
nevcairiel
4th April 2011, 11:08
Of course once CPUs become more powerful and efficient, the difference will always be smaller. However, a lot of people have fairly weak CPUs in their dedicated HTPCs, which might just manage to decode it in software, but do that at nearly full load, which adds noise and heat.
Or people want to do post-processing (ok, which does not work with DXVA, but does work with for example CUVID based decoders), and need the CPU for that.
There will always be valid use-cases for accelerated decoding, just not for everyone.
thuan
4th April 2011, 11:44
Agreed, and I was using ffdshow, so ffmpeg-mt. The tested video was an H.264 encode of the last part of Planet Earth ep 5, where there were a lot of locusts.
CruNcher
4th April 2011, 12:11
Of course once CPUs become more powerful and efficient, the difference will always be smaller. However, a lot of people have fairly weak CPUs in their dedicated HTPCs, which might just manage to decode it in software, but do that at nearly full load, which adds noise and heat.
Or people want to do post-processing (ok, which does not work with DXVA, but does work with for example CUVID based decoders), and need the CPU for that.
There will always be valid use-cases for accelerated decoding, just not for everyone.
Not entirely true: Roozhou is working on DXVA frame-grabbing based stuff which could also be utilized for post-processing :)
True, the first tests don't look as efficient as Nvcuvid, on XP at least (see Roozhou's test: http://forum.doom9.org/showthread.php?t=160371), but test results and real-life results are sometimes nowhere near each other. See my performance issues with LAV CUVID and Haali/madVR: I can't yet explain why I lose so much efficiency compared to VMR on my system configuration, where in theory LAV CUVID + madVR should perform about the same as rendering onto VMR (and LAV CUVID on VMR9 renderless also shows problems here). Somewhere there is a bottleneck; it might be inside my system configuration, or in how LAV CUVID, CoreAVC CUDA and CUDA Video Decoder provide the decoded frames, I dunno yet. Another explanation could be high-resolution timing.
I mean if you see that
LAV CUVID + MadVR
http://img189.imageshack.us/img189/5263/disappearingframes.png
LAV CUVID + VMR9 renderless
http://img190.imageshack.us/img190/4484/lavcuvidvmr9.png
and this, you would ask yourself the same questions too ;)
Cyberlink DXVA + VMR9 renderless
http://img23.imageshack.us/img23/9450/justwow.png
nevcairiel
4th April 2011, 12:26
Especially ATI cards are *really slow* when downloading the contents of a D3D texture (where the DXVA decoded image ends up), so it will cut into the performance. (This shows especially on XP; the new driver model used by Vista/7 seems to improve it somewhat. But then XP is really getting old, and personally I don't care about any statistics on it.)
CruNcher
4th April 2011, 16:44
Could somebody Benchmark this in Vista/7 with Forceware 270.51 ?
DXVA + VMR9 Renderless + Complex Sharpening 2
The 4 Girls stream can be found @ the first post
http://img860.imageshack.us/img860/1104/shaderperformancebench.png
Don't expect to get that stream playing back in real time. Neither Nvidia nor ATI hardware can do that (or some overhead doesn't allow them to; could someone with VDPAU on Linux test?), according to the benchmarks here (also for Vista/7) and my own. Intel HD 2000/3000 results on Vista/7 would still be needed, though, and results from someone with a Broadcom Crystal or any other hardware-based Windows decoder setup would also be great :)
nevcairiel
4th April 2011, 18:25
Don't expect to get that stream playing back in real time. Neither Nvidia nor ATI hardware can do that (or some overhead doesn't allow them to; could someone with VDPAU on Linux test?)
Obviously you're wrong about that
http://images.gammatester.com/pics/7ab488d8c82576b826ccdd54a1d8e0f8.png
Since i'm on Win7, i had to use EVR Custom for DXVA (VMR9 DXVA is only XP)
So, this is MPC-HC DXVA decoder, EVR Custom + sharpen complex 2, on a VP4 card (GTX 570)
Btw, the shaders shouldn't influence the performance, as long as you don't overload the shader unit. The decoder is completely separate from the shaders.
CruNcher
4th April 2011, 18:34
Nice :) So the major question now is whether VP4 is responsible, or Win 7, e.g. WDDM 1.1 and DWM.
Do you get the same results with LAV CUVID (nvcuvid) instead of DXVA?
VP2 is known to be clocked @ 400 MHz :D
And the fact that you don't show a picture of the decoded content, I guess, means you can't capture the surface on EVR Custom?
thuan
4th April 2011, 18:48
Remember I have a VP2 9800GT, and with the new driver, DXVA playback (and playback through CUDA, for that matter) on Windows 7 is fubar. I'd say whether it works or not will always come down to a combination of factors. As I don't want to play roulette anymore, I'm giving up on accelerated playback for now :sign:.
nevcairiel
4th April 2011, 18:58
Do you get the same results with LAV CUVID (nvcuvid) instead of DXVA?
I get about 50 dropped frames with my CUVID decoder over the whole file, but I attribute that to it not being optimized. Didn't test CoreAVC in CUDA mode.
CruNcher
4th April 2011, 19:12
I get about 50 dropped frames with my CUVID decoder over the whole file, but i attribute that to its not being optimized. Didn't test CoreAVC in CUDA mode.
Not bad :) Did you compare CPU utilization in both cases already?
Do you also use the 64-bit version like thuan?
Remember I have a VP2 9800GT, and with the new driver, DXVA playback (and playback through CUDA, for that matter) on Windows 7 is fubar. I'd say whether it works or not will always come down to a combination of factors. As I don't want to play roulette anymore, I'm giving up on accelerated playback for now :sign:.
Do you get the same results as nevcairiel with the same playback setup on the 4 Girls clip?
MPC-HC DXVA decoder + EVR Custom + sharpen complex 2
you could also try
Cyberlink DXVA + EVR Custom + sharpen complex 2
or
LAV CUVID/CoreAVC CUDA/CUDA Video Decoder + EVR Custom + sharpen complex 2
Not sure why you're giving up; it shouldn't be a problem to get it working. If 270.51 doesn't work on your 9800GT and Win 7, then try the latest Win 7 WHQL driver for the 9800GT, which should work.
Currently that would be this one for the 9800GT: http://us.download.nvidia.com/Windows/266.58/266.58_desktop_win7_winvista_64bit_international_whql.exe
If by fubar you mean the performance is bad, then please tell us by how much. And what do you mean by a combination of factors (MB BIOS, CPU frequency switching, RAM timings/frequency, gfx card BIOS, gfx card frequency switching)? :)
thuan
5th April 2011, 03:51
I don't want to do another test because I have done enough of them, but my observation is this:
As I said in another thread, only with driver version 258.96 and older can I get accelerated playback at acceptable/normal performance. With a newer driver and my VP2 card, performance is worse, and on certain high-bitrate 1080p streams it simply turns into a slideshow. This happens consistently with any method I use to play video, regardless of whether a CUDA-based or DXVA-based decoder is used, with madVR or EVR-CP: with 258.96 and older it works, with newer drivers it does not.
So I think it's the combination of my VP2 card and the driver. I can't be completely sure that's correct in my case, but I believe it is.
CruNcher
5th April 2011, 17:00
I'm slowly also leaning toward the driver as the cause of my issues with LAV CUVID + madVR + XP. I've now tried kernel timing and other things, and I still lose a heavy amount of frames, which can't be normal. Benchmarks show no issues at all (though DXVA Checker only seems to bench VMR7/9 windowed), but as soon as video output comes into play it crawls, although I have zero problems with DXVA except on the very heavy streams that most others here also have issues with.
Though we still have too few results here, nevcairiel's results on his 570 with VP4 and Win 7 currently beat everything else in terms of performance :) Very impressive results (number-wise) for both NVCUVID and DXVA on EVR-CP; we have to trust him on the real performance ;)
PS: Im currently in a state where i don't understand the results anymore (LAV CUVID,MADVR CoreAVC CUDA + MADVR + Haali)
Here is, IMHO, the most direct way to test the DSP with as little overhead as possible, directly via nvcuvid on XP
Display overhead
http://img846.imageshack.us/img846/8108/directdspuse.png
Without display overhead
http://img52.imageshack.us/img52/8686/disabledisplayout.png
These pure performance results, though, absolutely don't come through for me with LAV CUVID or any other NVCUVID DirectShow implementation on any renderer in XP; only with DXVA do I get to the same decoding performance level (obviously only on VMR7/9 windowed).
It matches the VMR7 windowed results I get with Cyberlink's DXVA in MPC-HC
Huh, after closing DGDecNV and retrying it in MPC-HC with VMR7 Renderless (3D surface)... ah, OK, that is explainable: it's not using DXVA at all. It shows it in the status display, though ;) but it doesn't use it :P
http://img696.imageshack.us/img696/9230/vmr7renderless.png
VMR7 windowed right after it
http://img268.imageshack.us/img268/3197/vmr7windowed.png
VMR9 Renderless (no shaders) (Alternate Sync, 3D surfaces,Bicubic,VMR9-mixer mode)
http://img854.imageshack.us/img854/1049/vmr9renderless.png
VMR9 windowed
http://img851.imageshack.us/img851/3816/vmr9windowed.png
LAV CUVID + VMR7 windowed
http://img703.imageshack.us/img703/9936/lavcuvidvmr7windowed.png
LAV CUVID + VMR9 windowed
http://img651.imageshack.us/img651/6042/lavcuvidvmr9windowed.png
LAV CUVID + VMR7 Renderless (any mode) doesn't work; it falls back to Video Renderer, i.e. VMR7 windowed (see the result above)
LAV CUVID + VMR9 Renderless (no shaders) (Alternate Sync, 3D surfaces,Bicubic,VMR9-mixer mode)
http://img849.imageshack.us/img849/4875/lavcuvidvmr9renderless.png
LAV CUVID + MADVR (NV12) (heavy frame dropping fps comparable to Haali + CoreAVC CUDA (YUY2))
http://img813.imageshack.us/img813/6388/lavcuvidmadvr.png
Haali + CoreAVC CUDA (YUY2)
http://img851.imageshack.us/img851/5713/haaliyuy2coreavccuda.png
CoreAVC CUDA (NV12) VMR7 windowed (comparable to the LAV CUVID result)
http://img20.imageshack.us/img20/579/coreavcnv12vmr7windowed.png
Currently the best NVCUVID dshow result: -22 fps compared to the DXVA result on VMR9 Renderless
CoreAVC CUDA (NV12) + VMR9 Renderless (no shaders) (Alternate Sync, 3D surfaces,Bicubic,VMR9-mixer mode)
http://img69.imageshack.us/img69/8035/coreavccudavmr9renderle.png
So, quite normal, expected results: the windowed modes are faster, especially VMR7 windowed (not without reason the default XP renderer).
VMR9 renderless obviously has the highest overhead, at -8 fps compared to VMR7 windowed, though that's not much to worry about; in return it has all the benefits of being DXVA-capturable and supporting subtitles, alternate sync and shader customization. Still, this makes it inexplicable why I get such bad results with LAV CUVID + madVR/VMR9 renderless or CoreAVC CUDA + madVR, especially looking at the NV12 bandwidth results.
D3D9 Surface speed test:
NV12: upload 403 fps, download 532 fps, trick download failed
YV12: upload 75 fps, download 15 fps, trick download failed
A8R8G8B8: upload 387 fps, download 248 fps, trick download failed
DXVA Surface speed test:
NV12: upload 425 fps, download 526 fps, trick download failed
YV12: upload 75 fps, download 14 fps, trick download failed
A8R8G8B8: upload 383 fps, download 242 fps, trick download failed
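To read those fps figures as memory throughput, here is a rough sketch for the D3D9 numbers. The 1920x1080 surface size is an assumption (the test output above does not state the frame size), so treat the absolute MB/s values as illustrative; the ratios between formats hold regardless of the size chosen.

```python
# Bytes per pixel: NV12 and YV12 are 4:2:0 formats (12 bits per pixel),
# A8R8G8B8 is 32 bits per pixel.
BYTES_PER_PIXEL = {"NV12": 1.5, "YV12": 1.5, "A8R8G8B8": 4.0}

# D3D9 surface speed test results quoted above: (upload fps, download fps).
D3D9_RESULTS = {
    "NV12": (403, 532),
    "YV12": (75, 15),
    "A8R8G8B8": (387, 248),
}

# ASSUMPTION: the test surface is 1920x1080; the post does not say.
WIDTH, HEIGHT = 1920, 1080


def throughput_mb_s(fmt: str, fps: float) -> float:
    """Convert a frames-per-second figure into megabytes per second."""
    frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL[fmt]
    return frame_bytes * fps / 1e6


for fmt, (upload_fps, download_fps) in D3D9_RESULTS.items():
    print(f"{fmt:9s} upload {throughput_mb_s(fmt, upload_fps):8.1f} MB/s, "
          f"download {throughput_mb_s(fmt, download_fps):8.1f} MB/s")
```

Under that assumption, NV12 download comes out well above what 1080p60 playback would need, while YV12 download (15 fps) would not even sustain 24 fps material.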
So maybe someone else has an idea why the performance of LAV CUVID + madVR on XP with a 9800 GT and Forceware 270.51 could be so extremely slow, even with NV12.
The major difference between LAV CUVID on VMR7/9 windowed (especially 9 windowed, which is not overlay at all) and madVR is what I'm trying to understand. Or is it really the missing hardware memcopy improvements that came after G92 that play a major role here and explain nevcairiel's result? Or is my bandwidth simply not enough?
http://img541.imageshack.us/img541/6451/cudabandwith.png
IgorC
6th April 2011, 17:10
NikosD,
I was looking for such information. Thank you for it.
Also, it would be great to see a laptop battery life scenario for different decoders.
One particular decoder can be the fastest but consume more power. It depends on how smartly resources are used.
NikosD
7th April 2011, 10:03
Thanx, good to hear.
Unfortunately, I don't have a laptop to test it.
renq
8th April 2011, 10:52
Tested the clips on my old laptop (Core 2 Duo T5600 1,83GHz, W7 Pro X32 sp1 beta, 7600Go)
A. Twinpeaks-30fps
CoreAVC v2.5.1 CPU 59/74/82
Microsoft MFT CPU 42/48/54
B. Samsung-30fps
CoreAVC v2.5.1 CPU 21/35/77
C. Basket-60fps
CoreAVC 2.5.1 CPU 55/66/92
D. Girls-60fps
CoreAVC 2.5.1 CPU 46/51/72
E. Birds-60fps
CoreAVC 2.5.1 CPU 31/38/49
F. Cat-60fps
CoreAVC 2.5.1 CPU 45/49/53
thuan
10th April 2011, 18:03
Seems like I have found my issue. I don't know what nvidia did, but when I checked GPU-Z with an H264 video played back using MPC DXVA on 258.96, I typically got around 20% lower video engine load compared to the newer driver. This gave me a clue that it is actually something nvidia did in their driver, which I hope will be fixed in a later driver version, if ever. Do any of you guys know where I can report this issue to nvidia?
Rain1
10th April 2011, 18:15
Do any of you guys know where I can report this issue to nvidia?
Try this (http://nvidia-submit.custhelp.com/cgi-bin/nvidia_submit.cfg/php/enduser/std_alp.php)
NikosD
2nd June 2011, 16:10
Maybe Quick Sync can directly access the decoded image in the GPU's memory; saving the memory transfer there would make it a lot faster. I still think that the raw decode performance isn't that much greater. In any case, I will upgrade to a Z68 board once it's out, so I can use the integrated GPU. If no one else does some tests before then, I can do it then.
Z68 boards are out!
Tell me that you did the upgrade in order to see some results of Quicksync...
nevcairiel
2nd June 2011, 16:24
I did upgrade; here are some quick tests, with only the MS DS H264 decoder - Min/Avg/Max values.
1. Twin Peaks: 193/195/198
3. Basketball: 193/202/207
4. Girls: 199/200/213
5. Birds: 198/200/200
Now if only the actual GPU were fast enough to do some madVR processing with that..
NikosD
2nd June 2011, 16:41
Extremely fast as I told you!
Although the results seem a little odd because they are all around 200 fps for every clip and every value (min/avg/max)
Did you try the newest DXVAchecker v2.5 ?
Maybe it can handle Quicksync better.
Are you sure that DXVA only mode is used by Quicksync and no DXVA by GPU or software CPU ?
I'm sure the answer is DXVA Quicksync only, but I'm just wondering...
nevcairiel
2nd June 2011, 16:46
CPU load was consistently low. Constant fps results are to be expected for a good hardware decoder; they usually don't care how complex a movie is.
Looking forward to the next nvidia decoder generation; they have some catching up to do performance-wise.
And yes, DXVA Checker 2.5
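To put a number on "constant fps", here is a small sketch that measures how far the per-clip averages posted earlier in this exchange sit from each other. The spread metric (max minus min, as a share of the mean) is just one simple choice picked for illustration.

```python
from statistics import mean

# Average fps per clip from the Quick Sync results posted earlier
# (MS DS H264 decoder; clip 2 was not listed).
avg_fps = {
    "Twin Peaks": 195,
    "Basketball": 202,
    "Girls": 200,
    "Birds": 200,
}

values = list(avg_fps.values())
mean_fps = mean(values)
spread_fps = max(values) - min(values)
spread_percent = 100.0 * spread_fps / mean_fps

print(f"mean {mean_fps:.2f} fps, spread {spread_fps} fps "
      f"({spread_percent:.1f}% of the mean)")
```

A spread of a few percent across clips of very different complexity is consistent with a fixed-function decoder being the limit.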
NikosD
2nd June 2011, 16:57
So it seems that Quicksync is about 3 times faster than VP4 and 2-2.5 times slower than Core i7 2600 in H.264 decoding.
Looking forward to ATI next generation video decoder, too :)
Thanks for your results.
CruNcher
2nd June 2011, 19:14
@ nev
Could you also bench this with Intel's Quicksync decoder supplied in the Media SDK?
dukey
2nd June 2011, 19:38
How does cuda compare using CoreAVC ?
CruNcher
2nd June 2011, 20:58
As NikosD said, around 3 times slower :) Though 200 fps is far more than anyone is ever going to use for playback. It makes sense for the Quicksync encoder, though, as it allows encoding a lot of streams in parallel, as Intel showed off, and at very acceptable quality; see Tom's Hardware or the last MSU test :)
Nev, could you measure the power consumption at those 200 fps?
Also, one great thing with Quicksync is live desktop recording: you could easily do 60 fps recordings of the full Windows desktop (compressed), even while running high-precision timer applications, without any dropped frames (in theory) :)
nevcairiel
2nd June 2011, 21:09
How does cuda compare using CoreAVC ?
CUDA gets about the same performance as the DXVA decoders, as it's limited by the hardware, not by any software interface.
dukey
2nd June 2011, 21:30
Strange,
when I tried CUDA vs DXVA, CUDA came out about 2x as fast, or rather used about half the CPU.
CruNcher
2nd June 2011, 23:53
That would indeed be very strange, as CPU usage should be almost the same; only performance should be a tad better for DXVA.
dukey
3rd June 2011, 00:52
I average something like ~20% CPU usage on this video with CUDA, and around 45% with DXVA, so CUDA totally blows DXVA away. DXVA on my system seems to offer almost no improvement at all. Perhaps it's just this video I am testing.
XinHong
3rd June 2011, 07:05
CUDA's performance is more system-dependent than DXVA's. On my "old" system with an 8600GT, DXVA is faster than CUDA.
NikosD
21st June 2011, 14:08
It seems that with new Fusion APU, there is a chance for fast DXVA + madVR decoding/ rendering due to UVD3 + No data copy between GPU memory and main memory.
We need tests, as always.
roozhou
22nd June 2011, 06:37
DXVA usually uses less memory than CUVID.
NikosD
22nd June 2011, 06:51
It doesn't matter.
Up to now, PotPlayer + DXVA renderless mode + madVR, doesn't work due to slow memory copies in ATI cards.
There is a hope that Fusion APU can do better.
CruNcher
22nd June 2011, 07:07
It doesn't matter.
Up to now, PotPlayer + DXVA renderless mode + madVR, doesn't work due to slow memory copies in ATI cards.
There is a hope that Fusion APU can do better.
Yes, it should. Both Brazos and Llano should be capable of it. What Brazos lacks, though, is encoding performance, as they cut out needed SIMD functions, so against Quicksync, at least, Brazos has no chance; it looks different for Llano. AMD's GPU encoder research is, to put it nicely, "minimalistic" anyway, and with the decoder bugs, UVD hasn't earned them a good community response over the last years either ;)
NikosD
22nd June 2011, 07:44
UVD3 is fully featured and quite fast.
The bugs were always in software.
No serious bugs after Catalyst 10.12 + AMD MFT codecs for DivX, VC-1, WMV3
MatLz
19th November 2011, 20:34
Hi,
I am having trouble with the "Twinpeaks" sample in the first post.
FFdshow and ffms2 produce artifacts from 4s to 6s.
VLCrap is less crappy than I thought because it plays the file without problem.
Can someone confirm this ?
the_weirdo
20th November 2011, 03:24
VLC 1.2.0 nightly build (http://nightlies.videolan.org/build/win32/last/) as well as latest LAVFilters and mplayer2 produce artifacts with that sample too. So maybe it's a regression of libavcodec.
MatLz
20th November 2011, 03:36
My VLC is 1.1.9, from April 2011.
NikosD
9th December 2011, 20:04
CPU load was consistently low. Constant fps results are to be expected for a good hardware decoder; they usually don't care how complex a movie is.
Looking forward to the next nvidia decoder generation; they have some catching up to do performance-wise.
And yes, DXVA Checker 2.5
Hello. I have some questions regarding QuickSync and DXVA.
I've recently read that the bitstream formatting stage is done on the CPU, even on Sandy Bridge with Quicksync.
Can you confirm it?
What is the situation regarding DXVA support of real DXVA capable video players like WMP12, MPC-HC(internal codecs), PotPlayer(internal codecs) and QuickSync ?
Are the players compatible with QuickSync DXVA decoding ?
What about DXVA codecs like DivX, CoreAVC, MS DS, MS MFT ?
Are there any DS or MFT codecs made by Intel ?
Is there a full DXVA VLD VC-1 support exposed by drivers for QuickSync?
And my last question:
Is it possible to post a DXVA checker screenshot (latest version/ latest drivers ?) of QuickSync ?
Thank you in advance!
vivan
10th December 2011, 10:40
What is the situation regarding DXVA support of real DXVA capable video players like WMP12, MPC-HC(internal codecs), PotPlayer(internal codecs) and QuickSync ?
I'm using only MPC-HC, and it supports QS with the H264/AVC (DXVA) internal decoder (however, it produces artifacts on 1080p video with 16 ReFrames).
What about DXVA codecs like DivX, CoreAVC, MS DS, MS MFT ?
At least CoreAVC.
Btw, 2.Samsung.Demo.Oceanic.Life-1080p30fpsRef16-40Mbps using CoreAVC DXVA:
Average FPS: 269,454
Min/Max FPS: 237 / 330
Are there any DS or MFT codecs made by Intel ?
Maybe http://forum.doom9.org/showthread.php?t=162442
Is it possible to post a DXVA checker screenshot (latest version/ latest drivers ?) of QuickSync ?
http://2.firepic.org/2/images/2011-12/10/63telyjuv6ve.png
NikosD
10th December 2011, 13:31
I'm using only MPC-HC, and it supports QS with the H264/AVC (DXVA) internal decoder (however, it produces artifacts on 1080p video with 16 ReFrames).
At least CoreAVC.
Btw, 2.Samsung.Demo.Oceanic.Life-1080p30fpsRef16-40Mbps using CoreAVC DXVA:
Average FPS: 269,454
Min/Max FPS: 237 / 330
Maybe http://forum.doom9.org/showthread.php?t=162442
http://2.firepic.org/2/images/2011-12/10/63telyjuv6ve.png
Thank you for your replies.
It seems that more work has to be done in drivers, at least.
It is good that VC-1 VLD is present at DXVA checker.
Could you do some more benchmarks with samples from these links ?
The second VC-1 clip from here: (The tough one with 1080p60 fps at 40Mbps)
http://forum.doom9.org/showthread.php?t=156660
And two very demanding H.264 clips in terms of bandwidth from here: (7th and 8th clip)
http://forum.doom9.org/showthread.php?t=163110
The link that you provided is referring to an impressive work of Eric Gur, which involves Intel Media SDK and DXVA.
It's not a pure DXVA solution.
During benchmarking, please take a look at the frequency of your processor; it should go down to ~1.6GHz.
What is the CPU usage in DXVA checker during benchmarking ? Min/Avg/Max
Thanks
vivan
11th December 2011, 11:12
About H/W - I have an i5-2410M CPU; it has a 2,3 GHz base clock speed, 2,9 GHz max turbo and 800 MHz lowest clock speed.
During playback it runs at minimum clock speed (800 MHz). During benchmarks it runs at base clock speed (2,3 GHz). If I choose "Power saving mode" it runs at 800 MHz, but benchmark results are lower: 65-70 fps for files 7-8.
VC-1
Renderer: Video Mixing Renderer
Decoder: ffdshow Video Decoder
ffdshow VMR
Time: 00:16.983
Average FPS: 201,201
Min/Max FPS: 194 / 204
CPU Usage (%): Avg: 27 Min: 25 Max: 30
Renderer: Video Mixing Renderer
Decoder: WMVideo Decoder DMO
Time: 00:50.363
Average FPS: 67,847
Min/Max FPS: 60 / 88
CPU Usage (%): Avg: 43 Min: 39 Max: 47
Renderer: Enhanced Video Renderer (Media Foundation)
Decoder: WMVideo Decoder MFT
Time: 01:12.893
Average FPS: 46,863
Min/Max FPS: 43 / 54
CPU Usage (%): Avg: 25 Min: 24 Max: 27
8. AVC
Renderer: Enhanced Video Renderer (DirectShow)
Decoder: CoreAVC Video Decoder
Decoder Device: ModeH264_VLD_NoFGT_ClearVideo
Time: 00:04.268
Average FPS: 128,866
Min/Max FPS: 121 / 129
CPU Usage (%): Avg: 06 Min: 02 Max: 15
Renderer: Enhanced Video Renderer (DirectShow)
Decoder: Microsoft DTV-DVD Video Decoder
Decoder Device: ModeH264_VLD_NoFGT
Time: 00:04.275
Average FPS: 123,977
Min/Max FPS: 116 / 127
CPU Usage (%): Avg: 26 Min: 21 Max: 37
Renderer: Video Mixing Renderer
Decoder: ffdshow Video Decoder
Time: 00:04.671
Average FPS: 117,748
Min/Max FPS: 99 / 120
CPU Usage (%): Avg: 30 Min: 26 Max: 40
I have some problems with file number 7; DXVA Checker showed only one decoder:
7. AVC
Renderer: Enhanced Video Renderer (Media Foundation)
Decoder: Microsoft H264 Video Decoder MFT
Decoder Device: ModeH264_VLD_NoFGT_ClearVideo
Time: 00:02.846
Average FPS: 130,358
Min/Max FPS: 129 / 130
CPU Usage (%): Avg: 24 Min: 22 Max: 27
But when I remuxed it to mkv:
7. AVC (remuxed to mkv)
Renderer: Enhanced Video Renderer (DirectShow)
Decoder: Microsoft DTV-DVD Video Decoder
Decoder Device: ModeH264_VLD_NoFGT
Time: 00:02.587
Average FPS: 131,040
Min/Max FPS: 127 / 131
CPU Usage (%): Avg: 27 Min: 20 Max: 33
Renderer: Enhanced Video Renderer (DirectShow)
Decoder: CoreAVC Video Decoder
Decoder Device: ModeH264_VLD_NoFGT_ClearVideo
Time: 00:02.894
Average FPS: 128,196
Min/Max FPS: 124 / 129
CPU Usage (%): Avg: 11 Min: 06 Max: 16
Renderer: Video Mixing Renderer
Decoder: ffdshow Video Decoder
Time: 00:03.271
Average FPS: 113,421
Min/Max FPS: 98 / 120
CPU Usage (%): Avg: 28 Min: 26 Max: 31
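For anyone who wants to compare these result blocks programmatically, here is a minimal parsing sketch. It assumes the "Key: value" layout shown above (with a decimal comma in the average fps and a mm:ss.mmm time); the function name `parse_result` and the returned field names are made up for illustration and are not part of DXVA Checker.

```python
import re


def parse_result(block: str) -> dict:
    """Parse one benchmark result block in the layout shown above."""
    fields = {}
    for line in block.strip().splitlines():
        # Split on the first colon only, so "Time: 00:16.983" keeps its value intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    minutes, seconds = fields["Time"].split(":")
    min_fps, max_fps = (int(v) for v in fields["Min/Max FPS"].split("/"))
    cpu = {name.lower(): int(number)
           for name, number in re.findall(r"(Avg|Min|Max):\s*(\d+)",
                                          fields["CPU Usage (%)"])}
    return {
        "renderer": fields["Renderer"],
        "decoder": fields["Decoder"],
        "seconds": int(minutes) * 60 + float(seconds),
        "avg_fps": float(fields["Average FPS"].replace(",", ".")),  # decimal comma
        "min_fps": min_fps,
        "max_fps": max_fps,
        "cpu": cpu,
    }


# The ffdshow VC-1 result from the post above.
sample = """
Renderer: Video Mixing Renderer
Decoder: ffdshow Video Decoder
Time: 00:16.983
Average FPS: 201,201
Min/Max FPS: 194 / 204
CPU Usage (%): Avg: 27 Min: 25 Max: 30
"""

result = parse_result(sample)
# Average fps times elapsed time approximates the number of frames decoded.
approx_frames = round(result["avg_fps"] * result["seconds"])
print(result["decoder"], result["avg_fps"], "fps,", approx_frames, "frames")
```

Multiplying average fps by elapsed time gives roughly the same frame count for the ffdshow and WMVideo DMO runs of the VC-1 clip, which is a quick sanity check that both decoded the whole file.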
NikosD
12th December 2011, 10:04
Good job!
These are VERY INTERESTING results.
It seems obvious to me now that, indeed, even on Intel's latest HW (QuickSync), there is no pure DXVA solution for video decoding.
It reminds me of the debate in my other thread about the same thing: a missing stage in the H.264 decoding pipeline that has to be done on the CPU when processed by the G45 chipset.
That debate had a lot of intense comments about Intel's HW/drivers efficiency in DXVA decoding.
Not a lot has changed since then, regarding CPU dependency.
For a pure HW decoding implementation like UVDx or VPx, the CPU always runs at its lowest frequency, even in benchmark mode, and the results are always the same, no matter what the CPU frequency is.
This clearly doesn't happen with the QS implementation, because a higher CPU frequency leads to higher framerates in benchmark mode.
That bitstream formatting stage is being processed by the CPU.
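A rough sketch of that argument with vivan's numbers: if decoding were fully independent of the CPU, the fps ratio between the two clock speeds would be about 1.0, and if it were fully CPU-bound it would approach the clock ratio. Picking 128.866 fps (CoreAVC, file 8) as the full-clock figure and the midpoint of the 65-70 fps range are assumptions made here for illustration; the power-saving mode may also change GPU clocks, so this is an indication, not proof.

```python
# Figures reported by vivan for the i5-2410M.
base_clock_ghz = 2.3   # clock during normal benchmark runs
low_clock_ghz = 0.8    # clock in "Power saving mode"

# ASSUMPTION: 128.866 fps (CoreAVC, file 8) stands in for the full-clock result,
# and 67.5 fps is the midpoint of the reported 65-70 fps range.
fps_at_base = 128.866
fps_at_low = (65 + 70) / 2

clock_ratio = base_clock_ghz / low_clock_ghz
fps_ratio = fps_at_base / fps_at_low

print(f"clock ratio {clock_ratio:.2f}x, fps ratio {fps_ratio:.2f}x")
if fps_ratio > 1.1:
    print("fps follows the CPU clock, so some stage appears to be CPU-bound")
else:
    print("fps is independent of the CPU clock, as expected for pure HW decode")
```

The fps ratio lands between 1.0 and the clock ratio, which fits a pipeline where part of the work scales with the CPU and part does not.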
It's also clear that the FFDShow VC-1 codec (Intel Media SDK) is using QS, but with a large CPU usage of ~25%.
CoreAVC is definitely the way to go for DXVA decoding of H.264 with QS: fast performance with minimal CPU usage, although I think I read that it is not a pure DXVA solution on Intel.
It uses the Intel Media SDK too, like FFDShow QS.
Interesting comparison between FFDshow QS vs CoreAVC in DXVA (Intel Media SDK) mode
After your results I have to ask Intel one question:
Where are your DXVA DS/ MFT decoders for H.264/ VC-1 ?
Thank you Vivan for your time and the results.
NikosD
12th December 2011, 17:21
@vivan
Any particular reason that you tested both VC-1 and H.264 clips with FFDShow QS version in VMR renderer ?
Why don't you use EVR with FFDShow QS and add the results to the above post?
It should be faster...
CruNcher
14th December 2011, 19:48
Unfortunately it's not that easy to bench the performance, as DXVA Checker also has problems with several DXVA decoder + splitter combinations. You sometimes get very weird benchmark or playback results, and some decoder + splitter combinations even fail completely with a zero bench result (they just break off, returning 1000 fps). This also happens at the plain graph level with software like GraphStudio.
Also, CoreAVC is a somewhat special DXVA implementation: it can avoid some x264 bitstream issues that other DXVA decoders like Cyberlink's or ArcSoft's would fail on, and I would prefer it anytime in terms of overall bitstream stability. Even better for DXVA, though, is Mirillis Splash Player's DXVA + Direct3D render combination, which is pure awesome (it even avoids extreme bitstream issues, with not one glitch where CoreAVC would show at least one on Intel's hardware), and it's also the most resource-efficient decoder + renderer combination I have yet seen on Windows :)
Mirillis is really a company to look out for: very talented Windows multimedia coders (with very deep codec knowledge) who even beat the Asians on some stuff. They are surely going to give CoreCodec a run for its money in the future. Just recently they started to go after FRAPS, for now on the lossy intra codec part, and they aren't doing badly. They also have very talented GUI designers and artists; it reminds me a little of DivX's awesome web designer back then (who created the Cow and Alien stuff), just for the desktop ;)
Though his style was awesome http://web.archive.org/web/20041216090025im_/http://images.divx.com/home/banner_lotr.jpg :) Not for nothing did he win several awards :D
Normally I'm not so much into application design as into usability and efficiency, but if you get all of those right, or well balanced, it's awesome ;) http://mirillis.com/gfx/action/mirillis_action_window_desktop_recording.jpg