Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing
theoneofgod
15th June 2013, 09:43
Works with up to 3 discrete GPUs in theory. As far as I know, systems with more than 2 dGPUs don't have QuickSync (they use Extreme Edition processor models without QuickSync).
Note that using QuickSync when the display is not connected to the processor graphics (iGPU) is only supported in Windows 8. In Windows 7 you need to extend the desktop to the disconnected iGPU.
I run a 3770K and crossfire with Windows 8. FFDShow won't detect QSV but Arcsoft Media Converter does.
egur
18th June 2013, 00:00
Isn't the adapter number based on the number of connected screens as well? I can have one dGPU with 4 connected screens, and the iGPU would only come after that. An extreme setup, I suppose, but possible :)
That's how it works with D3D9. With D3D10/11 every adapter is enumerated, even if disconnected. I'm not sure how multiple screens connected to multiple adapters are enumerated.
That's very strange, are you sure Media Converter actually manages to work with QS?
Maybe more than 4 devices get enumerated... Shouldn't happen.
theoneofgod
18th June 2013, 00:45
I'm pretty sure. CPU usage is low while converting.
Please check the attached images.
itsonlyjustincase
18th June 2013, 11:39
I'm really impressed by how powerful QuickSync is.
Egur, did you get an answer from Nvidia?
egur
18th June 2013, 17:40
@theoneofgod
I'm currently on a business trip and can't test my desktop setup.
Please try to specify your setup in detail so I can reproduce as closely as possible (upon my return).
-- Intel GPU Driver version
-- How the display(s) are connected (which GPU).
-- Any other special drivers installed (e.g. Virtu)?
-- Player used (name + 32/64 bit).
-- Version of ffdshow
-- Does ffdshow use libavcodec or does it use QS but displays a bad image?
theoneofgod
19th June 2013, 03:18
1) 9.17.10.2932 - 12/12/2012
2) 1 display, connected to DVI from the dGPU.
3) Yes, using Virtu MVP (otherwise nothing works, including Media Converter). Virtu MVP Desktop Edition: 2.1.221.24927
4) GOM Player x86 2.1.50
5) ffdshow tryouts rev4512, May 25 2013 (the latest you released): ffdshow_rev4512_20130525_egur
6) ffdshow shows libavcodec, no Intel QSV is displayed.
Thank you.
egur
19th June 2013, 05:07
Since you use Virtu MVP, you should set up your player to use the iGPU. Virtu may block using the iGPU when set to use the dGPU.
I currently don't have Virtu MVP, but I'll try to get a copy when I return home.
In any case, I don't believe that crossfire has anything to do with this.
theoneofgod
19th June 2013, 06:53
I forgot to mention that I tried that already; still the same result, QSV isn't shown in the ffdshow codec setup.
Even when opening the ffdshow config menu directly, still doesn't show it.
NikosD
19th June 2013, 10:04
A performance preview of Snapdragon 800 at Anandtech:
Along with the Adreno 330's 2x performance over the Adreno 320, we have the first mobile SoC capable of 4K H.264 hardware encoding/decoding (playback).
It can hardware encode/decode H.264 at 3840x2160@25fps with a 120Mbps bitrate, in smartphones and tablets.
4K sample here (3840x2160@25fps, 120Mbps): http://images.anandtech.com/reviews/gadgets/qualcomm/MDP8974/VID_20130618_161251.renametomp4.zip (from http://anandtech.com/show/7082/snapdragon-800-msm8974-performance-preview-qualcomm-mobile-development-tablet)
and on YouTube here (3840x2160@25fps, 56Mbps): http://youtu.be/H2eoSEPIQPQ
I'm sure that Ivy and Haswell can play both of them easily, but I have to say that my tiny VP5 can play both of them too - with an almost zero CPU utilization of my poor Core2Duo ;)
theoneofgod
19th June 2013, 11:07
Playing that sample in VLC uses a whole 25% CPU. Crazy.
screamz
20th June 2013, 12:53
I downloaded the file but can't extract it. Do you have similar problems?
NikosD
20th June 2013, 13:45
The crazy thing is that you can play it with ONLY 25% CPU.
This is impossible, unless you have a secret Chinese CPU :eek:
To view it in real time at 25fps you need a decent video player like PotPlayer. In software mode it takes ~55% of a quad-core Core i5 @ 3.1GHz (Sandy Bridge); in hardware mode (DXVA/QS) you need Ivy Bridge, Haswell or an Nvidia VP5 card.
@screamz You can't extract it.
The file name says "rename to mp4", so change the extension of the file from .zip to .mp4.
vivan
20th June 2013, 14:42
On my i7 970 I get ~20% CPU load.
Also note that my CPU (and probably his too) has HT, and every second logical core has zero load (at least when I'm using LAV Decoder). If we assume HT efficiency is about 15%, then this 20% becomes 20*2/1.15 ≈ 35% of "real" CPU load.
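vivan's back-of-the-envelope correction can be written as a tiny helper. This is a sketch under his stated assumptions only: two logical threads per core with one of them idle, and a guessed HT throughput gain (15% here). The function name is made up for illustration; no monitoring tool exposes it.

```python
def ht_adjusted_load(reported_pct: float, ht_gain: float, threads_per_core: int = 2) -> float:
    """Estimate 'real' per-physical-core load from a Task Manager figure.

    Assumes the OS averages load over all logical threads, that the work
    only keeps one thread per core busy, and that enabling HT adds
    `ht_gain` (e.g. 0.15 for 15%) to each core's total throughput.
    """
    if not 0 <= reported_pct <= 100:
        raise ValueError("reported_pct must be a percentage")
    # Scale up to physical cores, then discount the extra capacity HT provides.
    return reported_pct * threads_per_core / (1.0 + ht_gain)


print(round(ht_adjusted_load(20, 0.15)))  # prints 35
```

The result is only as good as the guessed `ht_gain`, which is exactly the number the next replies argue about.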
NikosD
20th June 2013, 15:02
That figure is misleading: the CPU monitor software counts logical cores as if they were physical cores.
A more accurate way to measure the real load is to disable the HT - if it's possible - and run a CPU video benchmark with DXVA Checker or GraphEdit and then repeat the same benchmark with HT on.
The difference will show you the real benefit of HT for video decoding - I think 15% is too much for HT regarding video decoding and Core i7 970.
Then disable HT again and measure CPU utilization during normal playback.
You can subtract the real benefit of HT from the above utilization with HT off to find a more accurate CPU utilization of your hyperthreaded CPU.
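The measurement procedure above reduces to two small calculations. A minimal sketch, assuming you already have decode-benchmark frame rates with HT off and on (e.g. from DXVA Checker) plus the playback CPU load with HT off. The helper names are hypothetical, and dividing by (1 + gain) is one reading of "subtract the real benefit of HT", not a formula given in the thread. The example numbers are made up to produce a 30% gain, the figure nevcairiel reports for his 2600K.

```python
def ht_gain(fps_ht_off: float, fps_ht_on: float) -> float:
    """Relative throughput gained by enabling HT in a decode benchmark."""
    if fps_ht_off <= 0:
        raise ValueError("benchmark fps must be positive")
    return fps_ht_on / fps_ht_off - 1.0


def load_with_ht(load_ht_off_pct: float, gain: float) -> float:
    """Estimate playback load on the HT-enabled CPU.

    With HT on, the same decode work runs on a CPU that is (1 + gain)
    times as capable, so the HT-off load shrinks by that factor.
    """
    return load_ht_off_pct / (1.0 + gain)


# Hypothetical numbers: 100 fps with HT off, 130 fps with HT on, 40% load with HT off.
gain = ht_gain(100.0, 130.0)
print(f"HT gain: {gain:.0%}, estimated load with HT: {load_with_ht(40.0, gain):.1f}%")
```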
nevcairiel
20th June 2013, 15:09
I did such tests before on my SNB 2600k. Enabling HT gives a 30% performance advantage when benchmarking decoding.
screamz
20th June 2013, 15:15
Omg thank you! I didn't read that ;-)
NikosD
20th June 2013, 15:22
Intel didn't manage to have the best HT from the beginning - Pentium 4 had HT too :scared:
SNB's HT is surely better than HT of Core i7 970.
I have never had a hyperthreaded CPU and I haven't done the tests I describe above myself. Still, 30% seems too much, although video decoding could be a fully parallel procedure.
I won't insist on numbers since I don't have them; nevertheless, I think I have made clear what real CPU utilization means.
egur
20th June 2013, 17:07
Hyper Threading like any other performance feature doesn't improve performance across the board. It improves performance for most flows and may lower performance for few.
It gains less than doubling the core count would, but it uses significantly less die area than doubling the core count.
For flows that have lots of I/O, CPU execution resources are utilized better with HT. A good example is compilation (e.g. build a C++ project using Visual Studio).
Also if one thread is hammering the SSE units and another is scalar integer based, they'll use different execution resources and the overall performance would increase.
Compared to the die area HT adds, a 30% performance boost is very good (it can sometimes be more).
theoneofgod
20th June 2013, 17:42
I have a 3770k at 4.2ghz and I assure you, with VLC 25% CPU was used.
Pat357
22nd June 2013, 13:45
I'm looking for a way to access my HD4600 (QuickSync) with my monitor connected to my NVidia GTX-680.
I remember that Egur posted a trick to accomplish this by setting up a "fake" monitor, but now that I have the proper HW, I can't find this post anymore.
Does anyone have a pointer to Egur's post?
nevcairiel
22nd June 2013, 13:54
Here is the link:
http://forum.doom9.org/showpost.php?p=1532786&postcount=186
sheppaul
22nd June 2013, 14:30
I've tried it with 3570K not overclocked and the CPU usage of VLC was roughly 43% in task manager of windows 8.
cf. 41% in potplayer (with type 1 decoder)
theoneofgod
22nd June 2013, 16:20
@Pat357 If your motherboard supports Virtu MVP, it's possible.
@sheppaul Even with HT disabled, still clocked at 4.2GHz, VLC/GOM Player still take around 23%/25%.
jkauff
22nd June 2013, 17:31
Motherboard needs to support iGPU and dGPU enabled at the same time. Eric's "fake monitor" trick works fine for decoding (like with LAV), but for encoding you do need Virtu MVP, and only certain apps work with that.
For example, Arcsoft Media Converter has explicit support in Virtu, but you can make Nero Recode and Handbrake QuickSync beta work by adding them to the Virtu iMode apps list. Media Coder, however, does not work.
theoneofgod
22nd June 2013, 17:45
Eric's ffdshow didn't work for me (QSV), even with Virtu.
Pat357
22nd June 2013, 18:12
Thanks Nev !
I have it working, but with one very annoying issue: I often lose my mouse cursor (probably stuck in the area of the non-existent fake screen).
Does this mean I did something wrong or is this a known issue ?
Is there a way to quickly recover from this situation besides rebooting the system ?
Pat357
22nd June 2013, 20:48
@theoneofgod At first, it didn't work for me either :)
I did have my monitor connected to the Intel HD Graphics (DVI), but no QuickSync for me.
I noticed that there was no file called "IntelQuickSyncDecoder.dll" in my fresh installed FFDShow folder, while AFAIR it should be there.
What I did was copy the "IntelQuickSyncDecoder.dll" from my LAV-filters installation to the FFDSHow directory and .. indeed.. quicksync was available in FFDshow !!
This could be a bug in the FFDshow installation : maybe it didn't detect my HD4600 and therefore it didn't install this needed file ?
I took the FFDshow installation file "ffdshow_rev4512_20130525_egur.exe" directly from this thread....
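The workaround Pat357 describes (copy the missing DLL from LAV Filters into the ffdshow folder) can be scripted. A minimal sketch, assuming you pass in both install folders yourself; the folder paths in the `__main__` block are hypothetical, so check where ffdshow and LAV Filters actually live on your system. Also note that a 32-bit player (like GOM x86) needs the 32-bit DLL.

```python
import shutil
from pathlib import Path

DLL_NAME = "IntelQuickSyncDecoder.dll"


def ensure_qs_dll(ffdshow_dir: Path, lav_dir: Path) -> bool:
    """Copy the QuickSync DLL from LAV Filters into ffdshow if it is missing.

    Returns True if ffdshow's folder contains the DLL afterwards.
    """
    target = ffdshow_dir / DLL_NAME
    if target.exists():
        return True  # the installer already put it there
    source = lav_dir / DLL_NAME
    if not source.exists():
        return False  # nothing to copy from
    shutil.copy2(source, target)  # copy2 keeps the file timestamps
    return True


if __name__ == "__main__":
    # Hypothetical install locations; adjust to your system.
    ok = ensure_qs_dll(Path(r"C:\Program Files (x86)\ffdshow"),
                       Path(r"C:\Program Files (x86)\LAV Filters\x86"))
    print("QuickSync DLL present:", ok)
```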
egur
23rd June 2013, 01:58
I'll check ffdshow's installer.
Anyway, Nev has found a graphics driver bug that causes QS to display green images when D3D11 is used on IvyBridge/Haswell drivers.
I hope to release a new version today.
egur
23rd June 2013, 03:12
Version 0.44 is out with the following changes:
* Improved D3D11 compatibility with 15.31 Intel drivers.
* H264 playback now properly supports fragmented packets (some live TV splitters), with better error handling. The code was rewritten and is now BSD licensed, like all the rest of the IQSD code.
* FFDShow: r4519
Downloads
* For the latest cutting-edge FFDShow builds, download my builds from the Intel QuickSync Decoder SourceForge home page (http://sourceforge.net/projects/qsdecoder/)
* FFDShow-tryout site (http://ffdshow-tryout.sourceforge.net/download.php)
* LAV Splitter builds (http://forum.doom9.org/showthread.php?t=156191)
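What "fragmented packets" means in practice: some live TV splitters deliver an H.264 Annex B stream in chunks that don't line up with NAL unit boundaries, so the decoder has to buffer bytes until the next start code arrives. The sketch below is a generic illustration of that idea, not IQSD's actual code; the class name is hypothetical. It relies on two facts from the H.264 spec: emulation prevention keeps `00 00 01` out of NAL payloads, and a NAL unit never ends in a `0x00` byte, so trailing zeros before a start code can be dropped.

```python
from __future__ import annotations

START_CODE = b"\x00\x00\x01"


class AnnexBReassembler:
    """Rebuild complete H.264 NAL units from arbitrarily split Annex B data."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def _start_positions(self) -> list[int]:
        """Offsets of every 3-byte start code currently in the buffer."""
        positions, i = [], self._buf.find(START_CODE)
        while i != -1:
            positions.append(i)
            i = self._buf.find(START_CODE, i + 3)
        return positions

    def push(self, chunk: bytes) -> list[bytes]:
        """Add a fragment; return the NAL units it completed."""
        self._buf += chunk
        starts = self._start_positions()
        nals = []
        # A NAL unit is complete once the *next* start code has arrived.
        for begin, end in zip(starts, starts[1:]):
            # Trailing zeros belong to the next (4-byte) start code, not the NAL.
            nal = bytes(self._buf[begin + 3:end]).rstrip(b"\x00")
            if nal:
                nals.append(nal)
        if len(starts) > 1:
            del self._buf[:starts[-1]]  # keep only the unfinished last NAL
        return nals

    def flush(self) -> list[bytes]:
        """Return the NAL unit still buffered at end of stream."""
        starts = self._start_positions()
        tail = bytes(self._buf[starts[0] + 3:]).rstrip(b"\x00") if starts else b""
        self._buf.clear()
        return [tail] if tail else []
```

Feeding a stream one byte at a time yields the same NAL units as feeding it whole, which is the property a decoder needs when the splitter fragments its packets.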
itsonlyjustincase
23rd June 2013, 13:29
Any news from Nvidia?
egur
23rd June 2013, 17:25
Sorry, no.
Nev has found an issue with live TV playback (I can't test this flow), so a fix will be made shortly.
itsonlyjustincase
24th June 2013, 21:58
I doubt we will get any answer or fix for that.
egur
24th June 2013, 22:04
Just to be clear, the Nvidia issue is separate from the live TV issue.
The Nvidia issue is under investigation, it takes time to reach the right people.
itsonlyjustincase
25th June 2013, 00:37
Sorry, I should have specified that I was talking about Nvidia, not Nev.
I truly believe we won't get any interesting feedback from Nvidia.
theoneofgod
26th June 2013, 12:39
Issues with the latest version: watching Spartacus, I see the picture freezing (not the audio) and then catching up. It happens a lot during action scenes. It's not part of the show, as it plays fine with GOM Player's internal filter.
egur
26th June 2013, 17:21
Previous versions were OK?
What's your full setup (player, OS, splitter, driver, etc)?
Can you share a clip (<100MB) or specify the release name?
CharlieCL
26th June 2013, 18:10
My CPU usage testing with the 4K sample (3840x2160@25fps, 120Mbps) posted by NikosD, on a 2.0GHz Ivy Bridge with HD 4000, Windows 8 Pro, Intel graphics driver 9.18.10.3165:
Windows Metro UI 6%
Windows Desktop 11% with LAV and Quick Sync enabled
Intel's new driver 10% ~ 20% better than Microsoft's default driver in Windows 8 pro.
I guess the Quick Sync hardware acceleration unit may not be shared between tasks.
This may be a disadvantage of hardware-accelerated codecs vs. software codecs, given how the hardware acceleration unit is currently used.
egur
26th June 2013, 18:23
Metro UI (all Metro apps) uses the MFT infrastructure. The graphics driver ships with hardware MFTs to enable decode/VPP/encode (this is not new, BTW).
On Windows 7, WMP and WMC could use those MFTs. Windows (Metro) UI drops support for DirectShow in favor of MFT.
The MFT flow is very similar to DXVA as far as system resources are concerned. So it will always be my decoder.
Historically, Intel MFTs and Microsoft's DXVA2 video decoder (DTV-DVD video decoder) were not stable enough for general purpose use and lacked several features. That's why I started the Intel QuickSync Decoder project.
With 4K, the copy back scheme still works, but at a greater cost.
Each user should decide what's best for him/her.
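To put "at a greater cost" into numbers: copy-back moves every decoded frame from GPU memory to system memory. A rough sketch, assuming NV12 output (8-bit luma plus half-resolution interleaved chroma, i.e. 1.5 bytes per pixel) and ignoring pitch padding; the function names are made up for illustration.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Size of one NV12 frame: full-res Y plane plus a half-size UV plane."""
    return width * height * 3 // 2


def copy_back_mb_per_s(width: int, height: int, fps: float) -> float:
    """Bytes copied from GPU to system memory each second, in MB/s (10^6 bytes)."""
    return nv12_frame_bytes(width, height) * fps / 1e6


for name, (w, h) in {"1080p": (1920, 1080), "2160p": (3840, 2160)}.items():
    print(f"{name}@25fps: {copy_back_mb_per_s(w, h, 25):.1f} MB/s")
```

4K has four times the pixels of 1080p, so at the same frame rate the copy traffic is four times higher (roughly 78 vs 311 MB/s here).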
theoneofgod
26th June 2013, 18:58
The video was Spartacus.S02E06.720p.HDTV.x264-IMMERSE around 47th minute during the fight. It doesn't happen all the time. (this is just an example as it happened a few times)
I've not seen this before so I think it's just the new version.
GOM Player 2.1.50, Windows 8 x64 (fully updated), 9.18.10.3165, 3770k. No dGPU right now, using iGPU. I use ffdshow for both audio and video, spdif to my receiver.
egur
26th June 2013, 19:08
I can check this only on Monday (I'm on a business trip). Please fill in the blanks for the other details.
theoneofgod
26th June 2013, 22:08
Not sure, I don't use a specific splitter, just GOM Player and ffdshow.
GTPVHD
26th June 2013, 23:05
http://msdn.microsoft.com/en-us/library/windows/apps/bg182880.aspx
Windows 8.1 Preview introduces DirectX 11.2, which brings a host of new features to improve performance in your games and graphics apps.
Eric, does anything in DirectX 11.2 improve QS decoder when using D3D11?
CharlieCL
27th June 2013, 14:37
The advantage of the Quick Sync decoder is that it's renderer independent. Since Haswell's CPU and GPU share the same memory space, is it possible to remove the copy-back and keep it renderer independent?
nevcairiel
27th June 2013, 15:17
No, that's not possible.
Even if you could transfer the memory address, the GPU stores the image differently than your CPU needs it.
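One likely reason the GPU's copy is "stored differently" is tiling. The sketch below assumes Intel's X-major tile layout (4 KB tiles, 512 bytes wide by 8 rows) and a pitch that is a multiple of 512 bytes, and it ignores the extra address swizzling some configurations apply; it is an illustration, not driver code. It shows why a CPU walking the surface as a plain row-major buffer would read the wrong bytes.

```python
TILE_W = 512                 # bytes per tile row (X-major tiling, assumed)
TILE_H = 8                   # rows per tile
TILE_SIZE = TILE_W * TILE_H  # 4096 bytes


def linear_offset(x: int, y: int, pitch: int) -> int:
    """Byte offset of byte column x in row y of a row-major surface."""
    return y * pitch + x


def x_tiled_offset(x: int, y: int, pitch: int) -> int:
    """Byte offset of the same byte in an X-tiled surface (no swizzling)."""
    if pitch % TILE_W:
        raise ValueError("pitch must be a multiple of the tile width")
    tiles_per_row = pitch // TILE_W
    tile_index = (y // TILE_H) * tiles_per_row + x // TILE_W
    within_tile = (y % TILE_H) * TILE_W + x % TILE_W
    return tile_index * TILE_SIZE + within_tile


# Same pixel, different addresses: row 1 starts 512 bytes in, not `pitch` bytes.
print(linear_offset(0, 1, 2048), x_tiled_offset(0, 1, 2048))  # prints 2048 512
```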
egur
27th June 2013, 16:49
There's an option for me to expose a DXVA2 (D3D9 only) interface, in which the output will be an IMediaSample interface. This requires that D3D9 be active, and the DirectShow decoder (LAV) would need to add support for it. For DXVA2, the decoder is in charge of memory allocation, not the renderer. This option prevents adding a filter between the decoder and renderer, and it only works with EVR-style renderers.
Very complex (requires massive cooperation between me and Nev) and not worth the effort ATM. LAV already has DXVA2 in place.
I don't know how this is done with D3D11 or even which renderers support D3D11 (the surfaces are different)
Just exposing the address of the D3D9 surface data is something I already support (move the copy back to LAV). For D3D11, it's impossible to get the address.
The memory allocated for GPU surfaces is always WC (write-combining) memory, which is uncached. It's the same physical memory as WB (write-back) memory, behind the same memory controller.
GPU drivers use WC memory and not WB because:
1) WC has excellent write performance, and writes are 99% of GPU memory traffic. WC uses dedicated write-combining buffers to combine writes to DDR.
2) When writing to WC memory, the CPU caches are not modified (since it's uncached) hence CPU performance is not affected. Otherwise, every time you write a frame buffer to memory, many cache lines used by the application (code, data) will get evicted to higher caches or even main memory.
That's the theory anyway, I didn't test this on modern architectures where the memory controller is shared between GPU and CPU and the L3 cache is large. To test if this theory is still correct (my guess, it is), I'd need to rewrite the graphics driver.
theoneofgod
28th June 2013, 11:48
Eric, have you looked at the issue where ffdshow won't use QSV when Virtu MVP is in use, even with a single card?
edit: Strangely enough, after saying this, it seems to work now. I removed GOM Player from Virtu MVP. I had ffdshow installed and the codecs were selected before installing the graphics card this time.
edit2: Added GOM Player to Virtu MVP, now ffdshow won't use QSV.
Thanks Eric.
jkauff
30th June 2013, 03:45
In my experience with Virtu MVP, adding unsupported applications is a real crap shoot. Are you using Eric's "phantom display" trick for the iGPU? That seems to work with the video filters that can use QS like ffdshow and LAV.
theoneofgod
30th June 2013, 13:06
No, I haven't done anything. First I installed my system with the iGPU, including Eric's ffdshow. Set it up to allow QSV (all working)
Installed my HD 7950, and that was that, ffdshow works with QSV (D3D11)
jkauff
30th June 2013, 13:40
Win 8 is supposed to support a headless iGPU and a dGPU setup, but this is the first report I've seen of it actually working. Maybe installing the dGPU last is the key.
theoneofgod
30th June 2013, 13:43
Me too. I do have Virtu MVP installed if that makes any difference.