Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing
linzki
22nd December 2011, 05:59
I have an i5-2500K, a GTX460 and a P8H61-M LX.
With these, can I get the Intel QuickSync decoder to work with Lucid Virtu and FFDShow?
I use my Acer H5360 projector as a monitor. Does it work if the GTX460 is connected to the projector, or must the iGPU's VGA output be connected too? I'm asking because the HDMI cable coming from my projector is connected to my GTX460 with an HDMI-DVI adapter. I don't have a VGA cable, so I can't connect directly to the iGPU.
So can I get this working with only the HDMI cable going to my GTX460 and no cable on the iGPU, and if so, how?
egur
22nd December 2011, 09:26
Yes, you can make it work. In fact, you can have 2 setups:
1) The renderer uses the iGPU (HD2000/3000): you must use Virtu to redirect the display. Connect the display to the GTX460 and add your player to Virtu's application list. Use EVR as the renderer for best quality.
2) Hybrid GPU setup: the renderer uses the dGPU HW and the decoder uses QuickSync. This needs a short setup, as explained in a previous post:
http://forum.doom9.org/showthread.php?p=1532786#post1532786
EVR will use NVIDIA's video processing, but a dGPU this strong can also run madVR at its highest settings. Try both and see which is more to your liking.
BTW, with this setup there's no need to set up Virtu. The player must be directed to output to the dGPU (e.g. the GTX460).
I only validate on Windows 7. I'm not sure the trick works on older versions of Windows.
BTW, Virtu works very well for 32-bit playback, but I had issues with the 64-bit version (v1.06).
You'll get different quality from either setup; it's up to you to decide what's best for you.
linzki
22nd December 2011, 14:12
I tried to install the Lucid Virtu driver 1.2.11 (64-bit) on Windows 7 64-bit. I get: "This chipset is not supported for this version of "Virtu". Setup will now exit."
Is it my cheap motherboard that isn't supported by Lucid Virtu, or is there another version somewhere that supports my MB?
Edit: I tried option 2 as well. Whatever I try, it always uses libavcodec instead of the Intel QuickSync decoder.
egur
22nd December 2011, 16:03
Personally, I don't buy the cheapest boards, because they are made from the cheapest components and die sooner.
Virtu might install after a BIOS update; check with your motherboard manufacturer whether an update exists. My board was recognized by Virtu only after an update. It installed fine but was working in evaluation mode. If your MB manufacturer bought a Virtu license, it will work.
For playback with practically any player, you can use my decoder with the hybrid GPU setup described in my previous post. BTW, it's a one-time setup.
TPoise
22nd December 2011, 16:40
BTW, I installed the latest x2559 Intel drivers and that fixed my video corruption issues. Glad things work now.
Slightly off-topic, but I have a few questions about QuickSync in general:
1. Would it ever be possible to hardware-decode VP8 (WebM) video? (Whether or not WebM actually takes off is a different question...)
2. Any update on integrating the ENCODER into x264? There was a lot of back-and-forth with the x264 developers on whether the QS hardware could be used to accelerate x264 (without using the "out-of-the-box" QuickSync transcoder), and I think an Intel engineer was going to end up doing it.
3. Is there a program to monitor LFM for a specific process? I use CoreTemp, which measures total voltage across the entire system, but I don't know if there's a way to verify that the QS decoder is using only 800MHz on laptops.
egur
22nd December 2011, 19:41
I can't answer all of your questions. It's not my place to reveal future HW/driver/SDK features. There are official forums for these topics. Since most of it is confidential, don't get your hopes up...
1. I'm not aware of any public announcement on the matter. Usually new features are exposed close to the launch date.
2. I'm not familiar enough with the encoder myself; it's a little out of my scope. I think the Intel Media SDK allows setting some encoder parameters on the fly, so a SW encoder can act as a high-level encoder, e.g. telling the HW encoder when to create an I/B/P frame and deciding the quantization parameters. Maybe someone within Intel is working on it like you said, but I don't know who it is or which group they belong to. Intel has 100K employees...
3. The CPU frequency is global to the entire CPU: all cores run at the same frequency. This frequency may be above the stock frequency (Turbo) or below it. In Turbo mode some cores may not run at all, "giving" their power budget to other cores. LFM is the low frequency mode, and it's also global to the CPU's cores. The frequency doesn't change as a result of a process context switch, so CoreTemp measures the only things there are to measure: CPU frequency (the same for all cores) and package power (cores + system agent + graphics). So if CoreTemp shows 800MHz while QS is being used, then that's it.
BTW, CPU utilization is heavily affected by core speed. Reduce the frequency by 50% and utilization will double for the same program.
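To make the last point concrete, here is a minimal Python sketch of that relationship. It assumes a first-order model in which a program needs a fixed number of CPU cycles per second; the helper name and the 0.8 G cycles/s workload are illustrative, not measurements. Memory-bound work such as frame copying does not scale perfectly with core clock, so treat this as an approximation.

```python
def cpu_utilization(work_cycles_per_sec: float, core_freq_hz: float) -> float:
    """Fraction of one core's cycles used by a workload that needs a fixed
    number of cycles per second (first-order model; hypothetical helper)."""
    if core_freq_hz <= 0:
        raise ValueError("core frequency must be positive")
    return work_cycles_per_sec / core_freq_hz


# The same hypothetical workload (0.8 G cycles/s) at two core clocks.
turbo = cpu_utilization(0.8e9, 3.2e9)  # a quarter of the core
lfm = cpu_utilization(0.8e9, 1.6e9)    # half the clock, so double the utilization
print(f"{turbo:.0%} at 3.2GHz, {lfm:.0%} at 1.6GHz")
```

Because utilization is work divided by available cycles, a utilization figure alone says little unless the clock it was measured at is known.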
I hope this helped.
hajj_3
23rd December 2011, 01:53
Will you be adding support for Ivy Bridge once it is released?
egur
23rd December 2011, 09:42
It will work on Ivy Bridge as well.
hoborg
23rd December 2011, 15:04
Hi.
I have an Intel Core i5-2400S + Radeon 6750 + Win7 x64 in my system.
If I try to install the Intel GPU drivers, I get an error message that my system does not meet the minimum requirements.
What trick needs to be done? :)
wanezhiling
24th December 2011, 07:00
Hi egur
Could it support the Pentium G620 and Celeron G530? They are both SandyBridge.
vivan
24th December 2011, 08:01
http://ark.intel.com/products/53480
http://ark.intel.com/products/53414
Intel® Quick Sync Video - No
=> No.
wanezhiling
24th December 2011, 08:25
Thx.
This is the DXVA Checker result for the G620 (http://we.pcinlife.com/data/attachment/forum/201111/12/094547v8hjmtttph2p1hxp.png)
How do you explain it? :confused:
hajj_3
24th December 2011, 10:40
This just checks whether the CPU can hardware-decode video, which those processors can. QuickSync is for encoding video; that is something different.
Those Intel links are correct: QuickSync is only on Core i3, i5 and i7 Sandy Bridge.
egur
24th December 2011, 17:10
A few clarifications.
QuickSync is the name of HW video acceleration for decode, video processing and encode.
The Intel QuickSync Decoder uses only the decode part, which is available on Pentium (and probably Celeron) versions of SandyBridge. Users have already confirmed that it works.
If one wants HW encode (very fast transcoding), he/she should buy at least an i3.
egur
24th December 2011, 17:14
Hi.
I have an Intel Core i5-2400S + Radeon 6750 + Win7 x64 in my system.
If I try to install the Intel GPU drivers, I get an error message that my system does not meet the minimum requirements.
What trick needs to be done? :)
You should enable the iGPU in the BIOS. The default BIOS behavior is to disable the iGPU if an external GPU is present. If the iGPU is enabled in the BIOS, it will appear in Device Manager (right-click "Computer", Manage, Device Manager). If a screen isn't connected to the iGPU, you will not be able to open its control panel.
Add a fake display to enable hybrid GPU support; see this link: http://forum.doom9.org/showthread.php?p=1532786#post1532786
hoborg
24th December 2011, 21:22
Thank you for the info, but I didn't find such an option in the BIOS. Maybe it is not supported by the Gigabyte PH67-UD3-B3 motherboard.
CruNcher
24th December 2011, 22:21
A few clarifications.
QuickSync is the name of HW video acceleration for decode, video processing and encode.
The Intel QuickSync Decoder uses only the decode part, which is available on Pentium (and probably Celeron) versions of SandyBridge. Users have already confirmed that it works.
If one wants HW encode (very fast transcoding), he/she should buy at least an i3.
We can be sure it's physically present on the die but disabled, most probably for some reason like: "we can't afford two production lines, so just disable it at the IME level" (one bit and support is gone; see the upgrade possibility), or it was simply malfunctioning (a production defect, so it was disabled either at the hardware level with a "laser cut" or via the IME (UEFI/BIOS)), or it will become part of Intel's hardware-by-software-lock upgrade program (for some more money you can buy a code, protected by the cryptographic part of Intel's silicon, for the BIOS that flips this one bit, and boom, you've got support for xx$). That's where Intel is heading anyway: one silicon die upgradable into every tier of the line with just one highly hardware-protected key system. Maximum profit with low effort. It's much better for resource usage, though, but if you guess it's being done because we need to protect our world and use fewer resources, unfortunately that's not the main reason; it's rather China starting the next phase of their economic war game ;)
egur
24th December 2011, 23:48
Thank you for the info, but I didn't find such an option in the BIOS. Maybe it is not supported by the Gigabyte PH67-UD3-B3 motherboard.
The board has the right chipset (H67) according to Gigabyte's website, but it looks like a typo.
It behaves like a P67 chipset motherboard: it has OC (H67 doesn't) and lacks processor graphics support (which H67 should have); there's no graphics connector of any kind on the back panel. Here are the specs from Gigabyte's website: http://www.gigabyte.com/products/product-page.aspx?pid=3767&dl=1#ov. There's a photo of the back panel.
The manual doesn't reference the processor graphics anywhere.
Looks like a P67 board to me. So no video from the processor.
Does your board's back panel have graphics connectors (dvi/vga/hdmi/dp)?
egur
25th December 2011, 00:17
We can be sure it's physically present on the die but disabled, most probably for some reason like: "we can't afford two production lines, so just disable it at the...
That's not quite accurate. I'd like to give a general explanation of how multiple versions of a chip are made. This isn't specific to Intel; it applies across the entire semiconductor industry.
Disabling features is how there's a large selection of CPUs to choose from. Some will only want the cheapest, some will pay a few more dollars or euros for a better one (more features), some will pay more than $1000 for an Extreme Edition, and some will buy a $6000 Xeon-based workstation with two sockets and 64GB of RAM. It makes no sense to create 30 versions of a processor (some very similar) due to the enormous validation costs. So if Intel (and probably any semiconductor company) created many different dies, they would lose money or the products would cost double (probably a lot more than double). It makes sense to produce different dies if and only if there's a significant die area reduction (die area == $$$) and the expected sales are high enough to recover the validation costs as well as the increased manufacturing costs.
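The last sentence is a break-even condition, which a tiny Python sketch can express. The parameter names and all numbers below are made up for illustration; real cost models are far more detailed. The sketch only encodes the stated rule: a dedicated die pays off when the per-chip saving from its smaller area, multiplied by the expected volume, exceeds the one-time extra costs.

```python
def separate_die_pays_off(area_saving_per_unit: float,
                          expected_units: int,
                          extra_validation_cost: float,
                          extra_fixed_mfg_cost: float) -> bool:
    """True if a dedicated (smaller) die recovers its extra one-time costs.

    area_saving_per_unit: manufacturing cost saved per chip by the smaller die.
    extra_validation_cost / extra_fixed_mfg_cost: one-time costs of validating
    and producing another die (hypothetical inputs).
    """
    total_saving = area_saving_per_unit * expected_units
    return total_saving > extra_validation_cost + extra_fixed_mfg_cost


# Made-up numbers: $2 saved per chip, $15M validation, $3M fixed production cost.
print(separate_die_pays_off(2.0, 10_000_000, 15e6, 3e6))  # high volume
print(separate_die_pays_off(2.0, 5_000_000, 15e6, 3e6))   # too few units
```

With a small area saving or modest volume, the inequality fails, which is why fusing off features on one die is usually the cheaper route.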
wanezhiling
25th December 2011, 07:00
A few clarifications.
QuickSync is the name of HW video acceleration for decode, video processing and encode.
The Intel QuickSync Decoder uses only the decode part, which is available on Pentium (and probably Celeron) versions of SandyBridge. Users have already confirmed that it works.
If one wants HW encode (very fast transcoding), he/she should buy at least an i3.
:thanks:
So what about previous-generation desktop processors (i3/i5/i7...)?
egur
25th December 2011, 07:57
The decoder will work on the previous generation, but at much lower efficiency due to a different memory architecture (integrated memory controller) as well as different decode HW. Even Core 2 Duo machines will work (with GMCH graphics). Because of that, I don't test these setups. It's better to use pure SW (similar performance to my decoder) or pure DXVA (faster but limited).
Fortunately, future generations (IvyBridge, ...) will have at least the same performance as SandyBridge, probably much better. So my work isn't targeting a single generation.
The same goes for OS support: I'm officially testing only on Windows 7, although Vista should work fine. Windows 8 will be supported as well when it's released.
wanezhiling
25th December 2011, 12:55
:thanks: I got it.
pulbitz
28th December 2011, 04:33
I'm sorry, I don't speak English very well.
ffdshow video decoder configuration | Output | Stream settings
Check "Set interlaced flag in output media type"
- libavcodec: perfectly smooth playback of 1080i content (double frame rate).
- Intel QuickSync: motion is steppier (poor).
Is this a bug, or a QuickSync decoder limitation?
You need a SandyBridge (or newer) to enjoy the HW acceleration used in the QuickSync decoder.
Please specify which renderer was used as well as the rest of your setup (driver version, OS version, 32/64-bit, player, whether the display is connected to the iGPU or dGPU, etc.). If you can post part of the clip (10-20 seconds), that would help development.
This thread is very busy so please report issues at the QuickSync decoder thread:
http://forum.doom9.org/showthread.php?t=162442
CPU: i5-2500 with HD Graphics 2000 (I don't use external GPU)
OS: Windows 7 Ultimate x64
Driver: 2559 (15.22.52.64.2559)
Renderer: EVR (not custom EVR)
Players:
1. The KMPlayer(3.0.0.1441) http://cdn.kmplayer.com/KMP/Download/KMPlayer_EN_3.0.0.1441_R2.exe
2. PotPlayer(2011.12.26. beta) http://get.daum.net/PotPlayer/Beta/PotPlayerSetup.exe
sample file(1080i60) http://www.multiupload.com/2JMT9SLZQI
P.S. PotPlayer with its built-in QuickSync decoder is OK (unlike the ffdshow QuickSync decoder).
egur
28th December 2011, 22:34
I've reproduced/confirmed the issue on another player. Still root-causing it.
BTW, PotPlayer seems to use a DLL (quicksync.dll) with very similar export functions and file size. Do they use an older version of my DLL, or did they build their own? (Either way is fine, BTW.)
clsid
28th December 2011, 23:19
PotPlayer/KMPlayer are notorious for stealing code from open-source projects (MPC and ffdshow in particular) and violating the GPL.
NikosD
30th December 2011, 20:39
Hello.
I did some benchmarking with QS and I put my results here:
http://forum.doom9.org/showthread.php?t=163110
After the whole experience, I have some questions I would like to ask you.
1) After installing the latest Intel driver, 15.22.52.2559, I found 3 MFT decoders by Intel in C:\Program Files\Common Files\Intel\Media SDK\s1\2.0\
Their names are Intel Hardware H.264/MPEG-2/VC-1 Decoder MFT.
But when DXVA Checker enumerates the available codecs to benchmark a video file, those decoders never show up.
Why?
Also, their properties don't seem to have a DXVA option (enable/disable).
2) Can your QS decoder be installed without FFDShow?
I don't want to install the FFDShow package just to get your QS decoder.
3) After installing your QS decoder through FFDShow and configuring it to use QS for AVC, VC-1 and MPEG-2, FFDShow usually didn't show up when the available codecs were enumerated.
And when it was available for benchmarking, it didn't work at all.
My setup was a little different from what you recommend.
I didn't use Lucid Virtu or virtual displays.
I just connected cables from my two graphics cards (HD 2000 and GT440) to the two inputs of my display at the same time, and the driver automatically switched to Extended mode.
QuickSync worked OK that way.
4) Why does Intel restrict such a POWERFUL DECODER as QS to 1920x1080 only?
I think the driver team should "open" the driver up to 4K x 2K, which QS could handle with ease.
And of course your decoder and every other decoder using QS would need to be updated too, to support 4K x 2K.
5) Video file number 9 has some issues (artifacts) during the last 3-4 seconds.
Thanks
egur
31st December 2011, 11:25
Hello.
I did some benchmarking with QS and I put my results here:
http://forum.doom9.org/showthread.php?t=163110
After the whole experience, I have some questions I would like to ask you.
1) After installing the latest Intel driver, 15.22.52.2559, I found 3 MFT decoders by Intel in C:\Program Files\Common Files\Intel\Media SDK\s1\2.0\
Their names are Intel Hardware H.264/MPEG-2/VC-1 Decoder MFT.
But when DXVA Checker enumerates the available codecs to benchmark a video file, those decoders never show up.
Why?
Also, their properties don't seem to have a DXVA option (enable/disable).
I don't know; I'm not part of the Media SDK dev team or the graphics driver team. I'll forward your question.
2) Can your QS decoder be installed without FFDShow?
I don't want to install the FFDShow package just to get your QS decoder.
My decoder is not a DirectShow filter. It's a DLL with a simple API that allows decoding: an abstraction layer (with enhancements) above the Intel Media SDK API. This is why it's bundled with FFDShow. It can be integrated into other DS filters quite easily, but I started with FFDShow because it's very popular. It's now part of the official FFDShow builds. This site lets people test the newest cutting-edge versions.
3) After installing your QS decoder through FFDShow and configuring it to use QS for AVC, VC-1 and MPEG-2, FFDShow usually didn't show up when the available codecs were enumerated.
And when it was available for benchmarking, it didn't work at all.
My setup was a little different from what you recommend.
I didn't use Lucid Virtu or virtual displays.
I just connected cables from my two graphics cards (HD 2000 and GT440) to the two inputs of my display at the same time, and the driver automatically switched to Extended mode.
QuickSync worked OK that way.
Your setup seems fine. Please tell me what's not working in greater detail.
4) Why does Intel restrict such a POWERFUL DECODER as QS to 1920x1080 only?
I think the driver team should "open" the driver up to 4K x 2K, which QS could handle with ease.
And of course your decoder and every other decoder using QS would need to be updated too, to support 4K x 2K.
See answer #1. My guess would be that it made the HW more expensive and not worth the cost. I'll forward the question.
5) Video file number 9 has some issues (artifacts) during the last 3-4 seconds.
I'll check it out and report back.
NikosD
31st December 2011, 13:03
Your setup seems fine. Please tell me what's not working in greater detail.
Of the whole range of clips, 1 to 10, only a few gave me the option to benchmark FFDShow.
I mean that only for a few clips did the FFDShow codec appear red, which means a DXVA2-capable codec ready for benchmarking with a suitable decoder device and codec.
Most of the time it was grey, available only for DXVA1.
And the few times it was red, when I tried to benchmark FFDShow, the only thing that happened was that the benchmark window opened and got stuck there.
The clip wasn't moving at all.
If you haven't run into that situation before, never mind.
I'll look into it myself next time.
My guess would be that it made the HW more expensive and not worth the cost.
I mean the hardware as is, the first-generation QuickSync, seems more than capable of 4K x 2K decoding.
I believe it's a matter of appropriate drivers and codecs to unleash its power.
I know that Intel advertises that the second generation of QS inside Ivy Bridge is capable of not only 4K x 2K, but of multiple 4K x 2K streams and 4K x 4K (square resolution).
I think a single 4K x 2K stream is feasible on first-generation QS too.
Waiting for your feedback.
Thanks
egur
31st December 2011, 14:47
3) After installing your QS decoder through FFDShow and configuring it to use QS for AVC, VC-1 and MPEG-2, FFDShow usually didn't show up when the available codecs were enumerated.
And when it was available for benchmarking, it didn't work at all.
DXVA Checker only checks DXVA: it loads FFDShow-DXVA, which doesn't use the Intel QuickSync decoder.
Try benchmarking with GraphStudio; it's very easy to use (menu->view->decoder performance). There you can select the decoder and renderer (or the NULL renderer).
BTW, I'm having a hard time downloading the clips since I don't have a RapidShare account. Please share them on www.multiupload.com; it's the easiest way to share files for free, as it currently doesn't impose limitations (it allows concurrent downloads with download managers).
NikosD
31st December 2011, 15:31
From your answer, I now understand that I was looking in the wrong direction.
Your DLL appears as a CPU codec, not a DXVA codec; that's why you told me to benchmark it with GraphStudio.
Of course, it's a CPU codec using DXVA through the Intel Media SDK!
I have used GraphStudio and TimeCodec in the past to benchmark CPU codecs, but since DXVA Checker came along I use it for all kinds of codecs, even CPU ones.
DXVA Checker can use all kinds of codecs (CPU, DXVA, CUDA, etc.).
So I will try again with DXVA Checker using the FFDShow CPU codec, not the FFDShow DXVA codec, once I have access to the Core i5 system again.
Maybe a good idea would be for Nevcairiel to include the QS decoder in his LAV Video alongside his pure CPU and CUVID codecs, if that's OK with you.
BTW, I found the artifacts in the 9th clip during playback with PotPlayer and its default QuickSync.dll; then I replaced it with your latest DLL (renamed), and the artifacts were still there.
You don't have to download everything, just clip 9.
I will try to re-upload it to mediashare, which is like multiupload, I think.
NikosD
31st December 2011, 16:02
One more thing...
There are a lot of minor issues regarding HD Graphics and QuickSync.
I couldn't find a desktop gadget that shows the status of the graphics card (clocks, GPU usage, memory).
Even the latest GPU-Z reports the Core i5's HD 2000 as GT2 at 45nm(!) with 12 GU, though it does report GPU load.
QS is even harder to get info on.
No gadget, no program, no sign when it's used, no load.
Also, DXVA Checker disables CPU and GPU usage monitoring during playback and benchmarking with QS.
Because of the nature of the graphics core and QS (inside the processor), I don't know whether Intel can find a way of "separating" the operations of the graphics core and QS from the "rest" of the CPU and monitoring the activity and features of each component.
egur
31st December 2011, 16:16
There's this tool for analysis:
http://software.intel.com/en-us/articles/vcsource-tools-intel-gpa/
It's mostly used by developers.
BTW, the 9th clip plays fine using ffdshow-quicksync. I'll test the other clips for issues and performance (using GraphStudio).
CruNcher
31st December 2011, 17:25
Huh, it's free now? Wasn't it at first only for game studios under NDA?
Anyway, this is really cool :)
http://software.intel.com/file/40560
Much better than just using Microsoft's generic stuff (WPA, WPR), which also goes deep but is not as specific on the GPU (DSP) parts :D
Also some very interesting information about the motion estimation part (and how Turbo Boost relates to it) :D http://software.intel.com/sites/landingpage/vcsource/frame.php?u=http%3A//software.intel.com/en-us/articles/using-intel-graphics-performance-analyzer-gpa-to-analyze-intel-media-software-development-kit-enabled-applications
This also shows the problem with locking/unlocking the frame (copy back):
There are two reasons: first, the compressed frame is smaller in size as compared to uncompressed frame, and second, the Intel Media SDK utilizes an optimized data copy with a combination of MOVNTDQA and MFENCE instructions.
Ahh, and we are at Beta 5 now: http://software.intel.com/en-us/articles/vcsource-tools-media-sdk-beta/
Transcode enhancements
Increased performance
Enhanced quality
Easier usage with opaque memory
Abstraction of system buffer and DirectX* surface
Simplified memory between CPU and Intel® Processor Graphics
MVC encode and decode - stereoscopic 3D
Improved videoconferencing extensions
Dynamic bit rate control
Improved robustness (error resilience)
Improved error detection and reporting
Lower latency: Improved responsiveness
Improved long-term reference frame control
New samples
Decoding and stereoscopic 3D rendering sample
Videoconferencing usage sample
OpenCL™ parallel programming sample*
Is there any specific information here about the improvements compared to Beta 4 (especially at which level the quality improvements were made, and how they show in metric measurements)?
I love Intel :)
http://software.intel.com/sites/landingpage/vcsource/frame.php?u=http%3A//software.intel.com/en-us/articles/introducing-the-intel-gpa-advisor
You're really doing it the right way :)
egur
1st January 2012, 20:39
Version 0.21 beta is out with the following changes:
* Performance boost (~20%) + lower-latency decoding by using a worker thread to perform post-decode work (mostly frame copying).
* FFDShow rev4216
Download from the SourceForge home page (http://sourceforge.net/projects/qsdecoder/)
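The v0.21 change above moves post-decode work (mostly the frame copy) onto a worker thread. The Python sketch below shows that producer/consumer structure under stated assumptions: `decode` and `copy_to_system_memory` are hypothetical callables standing in for the real decode call and the GPU-to-system-memory copy, and this is not the decoder's actual code (which is a native DLL). A bounded queue lets decoding of the next frame overlap with copying of the current one while capping the number of frames in flight.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the stream


def decode_with_copy_worker(compressed_frames, decode, copy_to_system_memory, depth=4):
    """Decode on the calling thread; a single worker thread does the post-decode
    copy. One FIFO queue and one worker keep the output in decode order."""
    handoff = queue.Queue(maxsize=depth)  # bounded: limits frames in flight
    output, errors = [], []

    def worker():
        while True:
            surface = handoff.get()
            if surface is _SENTINEL:
                return
            try:
                output.append(copy_to_system_memory(surface))
            except Exception as exc:  # keep draining so the producer never blocks
                errors.append(exc)

    t = threading.Thread(target=worker)
    t.start()
    try:
        for data in compressed_frames:
            handoff.put(decode(data))  # blocks if the worker falls behind
    finally:
        handoff.put(_SENTINEL)  # always stop the worker, even if decode raised
        t.join()
    if errors:
        raise errors[0]
    return output


frames = decode_with_copy_worker([b"a", b"b", b"c"],
                                 decode=lambda d: d.upper(),
                                 copy_to_system_memory=bytes)
print(frames)
```

In CPython the GIL limits true parallelism for pure-Python callables, so the sketch illustrates the pipeline shape rather than the speedup; in native code the copy genuinely runs concurrently with the next decode.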
NikosD
1st January 2012, 22:07
There's this tool for analysis:
http://software.intel.com/en-us/articles/vcsource-tools-intel-gpa/
It's mostly used by developers.
BTW, the 9th clip plays fine using ffdshow-quicksync. I'll test the other clips for issues and performance (using GraphStudio).
Happy New Year.
I tried the software and I really loved GPA Monitor.
It's very easy to use and shows the utilization of both the GPU and QS in a beautiful, informative way.
I updated my results to include the performance of your QS FFDShow v0.20. Unfortunately, I didn't expect you to work today and release a new version!
From my results here http://forum.doom9.org/showthread.php?t=163110, there is a big penalty for using DXVA through the Intel Media SDK compared with using DXVA directly.
It's two times slower on low-bitrate clips!
But on the high-bitrate clips, 7 to 10, the speed is the same.
I started monitoring MFX (QS) performance with GPA Monitor v4.3 and found that during benchmarking, QS FFDShow was using 65%-75% of the QS decoding hardware, while QS DXVA was using 96%-98%.
Also, while monitoring the benchmarks, I found that although QS FFDShow put the CPU in Turbo mode at 3.2GHz, whereas QS DXVA kept the CPU in the 1.6GHz-2.1GHz range, QS utilization was only slightly lower with QS FFDShow despite the twofold CPU frequency.
Probably because of the generally lower QS utilization of QS FFDShow that I mentioned above.
But the main problem, in my opinion, is the CPU usage of QS FFDShow.
During playback in both WMP12 and PotPlayer using DXVA, the CPU stayed at its lowest point, 1.6GHz, for most of the clips, most of the time.
During playback with DXVA Checker and PotPlayer using your QS FFDShow DLL, the CPU went all the way up to Turbo mode at 3.2GHz on all 1080p60 clips!
For the rest of the clips, even the "tough" ones (7 to 10), the CPU frequency ranged from 1.6GHz to 3.1GHz.
PotPlayer in DXVA mode couldn't play clips 2 and 3; it falls back to software mode.
WMP12 plays clips 1 to 10 fine in DXVA mode, as does PotPlayer with QS FFDShow.
I INSIST on telling you that during playback of clip 9 (Ducks Take Off) with both DXVA Checker and PotPlayer using QS FFDShow, I see ARTIFACTS in the last few seconds with version 0.20.
I don't know about version 0.21 yet.
I'll try it when I'm at the Core i5 system again.
egur
1st January 2012, 22:48
Happy New Year.
Happy new year to you too :)
I tried the software and I really loved GPA Monitor.
Very easy to use and see the utilization of both GPU and QS in a beautiful and informative way.
Very nice.
I updated my results to include performance of your QS FFDshow v0.20. Unfortuantely I didn't expect you to work today and release a new version!
Well, most of the work was done in '11.
From what I see from my results here http://forum.doom9.org/showthread.php?t=163110, there is a big penalty by using DXVA through Intel Media than DXVA directly.
It's two times slower in low bandwidth clips!
But in high bitrate clips from 7 to 10, the speed is the same.
There's a performance penalty which is low during normal playback. When going full speed, the CPU utilization goes up because of the memory copying (GPU->CPU). The CPU stays at LFM (1.6GHz) for the duration of playback.
FYI, QuickSync is part of the GPU not the CPU so it operates on the GPU frequency (650-1350GHz). The GPU frequency is determined by dynamically. There's a nice overview on SandyBridge's architecture from AnandTech.
Regarding performance, I'm not a magician: DXVA will always be faster and use less power. But it's not suitable for everyone due to its many restrictions. The hassles of DXVA are abstracted away by my code as well as by the Media SDK, and the Media SDK adds practically zero overhead. Copying the frames to system memory, so that all players can enjoy HW acceleration, is the main feature of the Intel QuickSync decoder.
It also has features such as using a different GPU for rendering.
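To illustrate what "copying the frames to system memory" involves, here is a minimal C++ sketch, assuming a locked NV12 surface that exposes a row pitch wider than the visible width. `LockedSurface` and `CopyNv12ToSystem` are hypothetical names, not Media SDK API; the copy goes row by row because of the pitch. Production code typically reads write-combined GPU memory with SSE4.1 streaming loads (MOVNTDQA) instead of plain `memcpy`, which is where most of the speed comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical view of a locked NV12 surface: a Y plane and an
// interleaved UV plane, both using the same row pitch (in bytes).
struct LockedSurface {
    const uint8_t* y;   // start of the luma plane
    const uint8_t* uv;  // start of the interleaved CbCr plane
    size_t pitch;       // bytes per row in GPU memory (>= width)
    size_t width;       // visible width in pixels
    size_t height;      // visible height in pixels (even for NV12)
};

// Copies the visible part of the surface into a tightly packed NV12
// buffer in system memory: height luma rows, then height/2 chroma rows.
std::vector<uint8_t> CopyNv12ToSystem(const LockedSurface& s) {
    std::vector<uint8_t> out(s.width * s.height * 3 / 2);
    uint8_t* dst = out.data();
    for (size_t row = 0; row < s.height; ++row) {      // luma rows
        std::memcpy(dst, s.y + row * s.pitch, s.width);
        dst += s.width;
    }
    for (size_t row = 0; row < s.height / 2; ++row) {  // chroma rows
        // Each UV row holds width/2 Cb/Cr pairs, i.e. width bytes.
        std::memcpy(dst, s.uv + row * s.pitch, s.width);
        dst += s.width;
    }
    return out;
}
```

The padding bytes past `width` in each GPU row are skipped, so the system-memory copy is smaller than the surface.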
You should try v0.21, which is the first beta. It's faster and will probably get even faster once I finish tuning the code in a week or two.
BTW, DXVA is fast, but it doesn't always work well for me (and others).
Regarding corruption, sometimes it's a matter of which splitter is used.
The best ones to date are LAV and Haali.
Neither of them is perfect, but they are very good. Haali has aspect ratio issues, and LAV doesn't seek as well on broken TS streams (slow seeking + corrupted frames). I had minor issues with MPC's splitter as well. With PotPlayer I got crashes I don't know how to fix.
I played clip #9 using ZoomPlayer (32 bit) and Windows Media Center (64 bit) and saw no corruption.
Another thing: when benchmarking with GraphStudio, the first run is always slower for some reason, so you should either omit it or run 10 passes. It looks like the graph init time (which happens only on the first run) gets into the stats.
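If you collect per-pass FPS numbers yourself, the "omit the first run" advice can be applied mechanically. This is a small C++ sketch under that assumption; `FpsStats` and `SummarizeSkippingWarmup` are hypothetical helpers, not part of GraphStudio, and the input values in the test are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <stdexcept>
#include <vector>

struct FpsStats { double lowest, average, highest; };

// Summarizes per-pass FPS results, dropping pass #1, whose timing
// includes one-time graph construction.
FpsStats SummarizeSkippingWarmup(const std::vector<double>& passFps) {
    if (passFps.size() < 2)
        throw std::invalid_argument("need at least one pass after the warm-up");
    auto first = passFps.begin() + 1;  // skip the warm-up pass
    auto mm = std::minmax_element(first, passFps.end());
    double sum = std::accumulate(first, passFps.end(), 0.0);
    double count = static_cast<double>(passFps.end() - first);
    return {*mm.first, sum / count, *mm.second};
}
```

With 10 passes the warm-up pass barely moves the average anyway, but excluding it keeps the "lowest" figure honest.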
egur
1st January 2012, 23:17
Setup:
* GraphStudio, 10 passes, NULL renderer.
* 10 clips from http://forum.doom9.org/showthread.php?t=163110
* FFDShow rev4216 (QS 0.21) 32 bit.
* Windows 7 Ultimate 64 bit, Aero on.
* Intel driver: v2559.
* Lucid Virtu: not installed.
* CPU: i7-2600 (3.4GHz), power management on. HD 2000 (GT1) iGPU.
* DDR3 @1333MHz (nothing fancy)
* Scores are lowest/avg/highest frame rates (fps) for the entire clip.
* Note: the first pass was always slower due to graph construction time, which affects short clip benchmarks. The most interesting results are the highest FPS, as the median score is very close to them.
1. Twinpeaks1080p30fpsRef2-27Mbps.mov: 264/297/303
2. Samsung.Demo.Oceanic.Life-1080p30fpsRef16-40Mbps.mkv: 240/260/263
3. Basketball-1088p60fpsRef8-10Mbps.mkv: 310/315/319
4. Girls.YoonYoon-1080p60fpsRef5-21Mbps.mkv: 298/307/310
5. Birds_1080p60fpsReF2-30Mbps.mp4: 283/294/298
6. Cat-1080p60fpsRef4-25Mbps.m2ts: 291/290/301
7. Vortexx_1088p24fpsRef3-109Mpbs.mp4: 119/134/140
8. Birds_1080p24fpsRef4-112Mbps.mkv: 122/134/137
9. Ducks.Take.Off.1080p30fpsRef5-108Mbps.mkv: 147/154/156
10. Crowd.Run.1080p25Ref4-116Mbps.mkv: 115/126/128
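Since these runs use a NULL renderer, the numbers measure decode throughput only. One way to read them is to compare each result with the clip's own playback rate. The two helpers below are hypothetical and give only a rough picture (they ignore pipelining and rendering cost).

```cpp
#include <cassert>
#include <cmath>

// Speed-up over realtime: measured decode FPS / the clip's playback FPS.
double RealtimeFactor(double decodeFps, double clipFps) {
    return decodeFps / clipFps;
}

// Roughly the share of each playback second the decoder alone would be
// busy; 1.0 means it only just keeps up, lower leaves headroom.
double DecoderDutyCycle(double decodeFps, double clipFps) {
    return clipFps / decodeFps;
}
```

Using the averages above, clip 1 (297 fps, 30 fps source) decodes about 9.9x faster than realtime, and clip 10 (126 fps, 25 fps source) about 5x, so even the ~110Mbps clips leave plenty of headroom.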
NikosD
1st January 2012, 23:36
Thanks for the info.
Next time I will try to check the GPU frequency, but I'm not sure GPA Monitor provides it.
I think GPU-Z can help, although it reports some HD 2000 features incorrectly.
CruNcher
1st January 2012, 23:40
Happy New Year everyone :)
@Egur
Did you benchmark how much Aero's glass shader hurts performance? I did some tests and it seems quite heavy compared to all the other effects: disabling it gives me a rather big boost and also lowers GPU usage without losing Aero's V-Sync :). I'm also experimenting with how the GPU frequency impacts performance; my motherboard is capable of overclocking the HD 2000 (GPU overclocking bits enabled) :)
Also, it seems the CPU never actually stays at the same frequency but keeps changing it; even with the High Performance profile it constantly switches here. See for yourself with http://www.mediafire.com/?kwrwnj41428lzzg, which is coded to Intel's specs, very good work (it can change the multiplier on the fly if the BIOS allows it).
@NikosD
Hwinfo32 is a great free tool for monitoring GT1 frequency and power consumption, and it shows correct data, unlike GPU-Z.
egur
1st January 2012, 23:51
Happy New Year everyone :)
@Egur
Did you benchmark how much Aero's glass shader hurts performance? I did some tests and it seems quite heavy compared to all the other effects: disabling it gives me a rather big boost and also lowers GPU usage without losing Aero's V-Sync :). I'm also experimenting with how the GPU frequency impacts performance; my motherboard is capable of overclocking the HD 2000 (GPU overclocking bits enabled) :)
These are preliminary tests that will serve as a performance benchmark, mostly for development purposes: I want to get the maximum out of the code, and I think I can do a little better.
It's best that "official" benchmarks are produced by others, for objectivity's sake :)
Also, with an 8-thread CPU (i7-2600), SW decoders can perform very well, especially on low-bitrate clips.
I also didn't test with faster memory; my new HTPC setup will have an i7-2600K with 1600MHz DDR3. My aging Core 2 Duo + Nvidia 7600 is having a hard time driving a Full HD TV...
CruNcher
2nd January 2012, 00:34
Egur, I'm going to benchmark this new version as well, against the competition: generic DXVA2 frame copy without the MSDK ;)
NikosD
2nd January 2012, 10:31
@NikosD
Hwinfo32 is a great free tool for monitoring GT1 frequency and power consumption, and it shows correct data, unlike GPU-Z.
I didn't find power consumption in Hwinfo32.
I found this tool from Intel which works for SandyBridge and later processors only.
I haven't tried it yet, but it looks promising and very accurate, though only for the CPU as a whole (no separate components).
http://software.intel.com/en-us/articles/intel-power-gadget/
I'm looking for a Windows desktop gadget that shows GPU load, clocks and memory for Intel HD Graphics (GT1, GT2).
nevcairiel
2nd January 2012, 13:20
Hi Eric,
I've been looking into adding your QuickSync decoder to LAV Video, because the API is so trivial that I really can't go wrong with it.
Without having tested this stuff yet, I have a few questions/concerns:
1) I'm really not a big fan of your timestamp interpolation logic. There are so many cases where I'm not sure it would work properly.
I've been trying to fully understand the timestamp code, and I'm still wondering why you don't just use the timestamps provided by the source, if present. Calculating the average frame rate and recalculating all timestamps from it seems dangerous, considering applications like live TV where there could easily be gaps.
Anyway, I guess what I'm asking for is an option to disable all your fancy logic and just give me back the timestamps I put in, re-ordered by the decoder (PTS timestamps). I can take care of gaps and whatnot in the timestamps myself.
Would you consider this, maybe if i provide a patch?
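A common way to implement such a pass-through is sketched below in C++. It is not the QuickSync decoder's code: `PtsPassthrough` is a hypothetical class, and it assumes every input sample produces exactly one decoded frame and that the input timestamps are PTS. Because the decoder emits frames in display order, the smallest outstanding input PTS always belongs to the next output frame, so a min-heap is enough and gaps survive untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical pass-through timestamp handler: no interpolation, the
// source's own PTS values are handed back in presentation order.
class PtsPassthrough {
public:
    // Called for every compressed sample fed to the decoder (decode
    // order; PTS may go backwards because of B-frames).
    void OnInput(int64_t pts) { pending_.push(pts); }

    // Called for every decoded frame (display order). Returns false if
    // no timestamp is outstanding.
    bool OnOutput(int64_t* pts) {
        if (pending_.empty()) return false;
        *pts = pending_.top();  // smallest outstanding PTS
        pending_.pop();
        return true;
    }

    // On seek/flush the outstanding timestamps are stale.
    void Reset() { pending_ = {}; }

private:
    std::priority_queue<int64_t, std::vector<int64_t>, std::greater<int64_t>> pending_;
};
```

Samples without timestamps and decoders that drop frames would need extra handling on top of this.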
2) How is the new multi-threading handled, specifically is it "transparent" to the caller?
More specifically, which thread calls the deliver callback? Is it the worker thread, or the caller's thread? I'm a bit cautious about exposing different threads to DirectShow, and I would prefer that the caller's thread is always used to deliver frames.
From looking at the code, it seems like its doing it the way i hope it does, but i just want to make sure.
Are there any reasons i would not want your worker thread (considering you added an option)?
3) How is the memory in the QsFrameData structure handled?
Do I have to free the structure and the y/u/v pointers? Or is it re-used on the next frame automatically, so that I should copy it into another buffer?
Also, considering it's NV12 data, the names are not chosen all that wisely; I would've gone with a planes[4] array or something (for future-proofing) instead of 3 named parameters. :)
CruNcher
2nd January 2012, 13:42
I didn't find power consumption in Hwinfo32.
I found this tool from Intel which works for SandyBridge and later processors only.
I haven't tried it yet, but it looks promising and very accurate, though only for the CPU as a whole (no separate components).
http://software.intel.com/en-us/articles/intel-power-gadget/
I'm looking for a Windows desktop gadget that shows GPU load, clocks and memory for Intel HD Graphics (GT1, GT2).
Yep, the problem with it is that you need the F8 unsigned-driver workaround at bootup (on 64-bit NT6). The ring0 driver Throttlestop uses should be just as low-latency and doesn't need F8 or an unsigned-driver bootup :)
Hwinfo32 should be able to read the power consumption (in watts) of both the CPU and the GPU, and it also has a gadget (though I prefer integration into the almost excellent RivaTuner sensor framework (shared memory) over a heavy gadget; RivaTuner has some design issues with its monitoring options).
http://img265.imageshack.us/img265/6975/hwinfo32cpugpupower.png
The drawback, though, is that Hwinfo32's finest sampling interval is 100ms, whereas Intel's tool can go as low as 25ms (impressive, thanks to HPET). Throttlestop can achieve much lower latency than Hwinfo32 using the "More Data" option.
That said, very short sampling intervals aren't always a good idea on a non-RTOS such as Windows: done wrong, they can have a big impact on performance. Hwinfo32 in particular wasn't made for very short intervals, and its sensor monitor puts a lot of stress on the system when updating that fast (always be careful when measuring time-critical work in software at such low latencies).
1) I'm really not a big fan of your timestamp interpolation logic. There are so many cases where I'm not sure it would work properly.
I've been trying to fully understand the timestamp code, and I'm still wondering why you don't just use the timestamps provided by the source, if present. Calculating the average frame rate and recalculating all timestamps from it seems dangerous, considering applications like live TV where there could easily be gaps.
Yep, this already showed up on several samples; the craziest one was an Asian (I think) TS commercial recording (a girl sings about a petrol brand :D) that went totally out of sync. I guess my last report (some pages back), about the lost-lock problem in MPC-HC after I found the auto-deinterlacing option problems, is also caused by this: http://forum.doom9.org/showpost.php?p=1538481&postcount=265. I have to recheck with the new version though, maybe it's fixed :)
egur
2nd January 2012, 15:11
I've been looking into adding your QuickSync decoder to LAV Video, because the API is so trivial that I really can't go wrong with it.
Excellent :)
Without having tested this stuff yet, I have a few questions/concerns:
1) I'm really not a big fan of your timestamp interpolation logic. There are so many cases where I'm not sure it would work properly.
...
I'll add a disable bit, no problem. In fact, the timestamp logic took too much effort and should probably be handled in the DS (DirectShow) filter.
2) How is the new multi-threading handled, specifically is it "transparent" to the caller?
More specifically, which thread calls the deliver callback? Is it the worker thread, or the caller's thread? I'm a bit cautious about exposing different threads to DirectShow, and I would prefer that the caller's thread is always used to deliver frames.
From looking at the code, it seems like its doing it the way i hope it does, but i just want to make sure.
Are there any reasons i would not want your worker thread (considering you added an option)?
The decode thread will receive samples and output samples. The worker thread will do processing in the background. I think EVR doesn't like it any other way.
There's an option to disable multithreading in case it's not stable enough. I've tested it quite a bit and it works great, but since I can't guarantee 100% functionality, I've added the chicken bit. When it's disabled, the worker thread is not created. Like all settings, this must be set before calling the Init function.
The downside is that more system memory is used - ~3 extra frames.
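The threading model described above (a background worker does the processing, while frames are only handed out on the caller's thread) can be sketched in C++ as follows. This is an illustration under assumptions, not the decoder's implementation: `PipelinedDecoder`, `Decode`, `Flush` and `Process` are hypothetical, `Process` is a stand-in for real work, and the `useWorker` flag mimics the chicken bit by being fixed at construction, i.e. before decoding starts.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical decoder front-end: an optional worker does the processing,
// but the deliver callback only ever runs on the caller's thread.
class PipelinedDecoder {
public:
    using Deliver = std::function<void(int)>;

    PipelinedDecoder(Deliver deliver, bool useWorker)
        : deliver_(std::move(deliver)), useWorker_(useWorker) {
        if (useWorker_) worker_ = std::thread([this] { WorkerLoop(); });
    }

    ~PipelinedDecoder() {  // unprocessed samples are dropped
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        if (worker_.joinable()) worker_.join();
    }

    // Caller's thread: hand over a sample, then deliver whatever is ready.
    void Decode(int sample) {
        if (!useWorker_) { deliver_(Process(sample)); return; }
        { std::lock_guard<std::mutex> lk(m_); in_.push_back(sample); }
        cv_.notify_all();
        DrainReady();
    }

    // Caller's thread: wait for all queued work, then deliver it.
    void Flush() {
        if (!useWorker_) return;
        {
            std::unique_lock<std::mutex> lk(m_);
            done_.wait(lk, [this] { return in_.empty() && busy_ == 0; });
        }
        DrainReady();
    }

private:
    static int Process(int sample) { return sample * 2; }  // stand-in work

    void DrainReady() {
        std::deque<int> ready;
        { std::lock_guard<std::mutex> lk(m_); ready.swap(out_); }
        for (int frame : ready) deliver_(frame);  // still the caller's thread
    }

    void WorkerLoop() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return stop_ || !in_.empty(); });
            if (stop_) return;
            int sample = in_.front();
            in_.pop_front();
            ++busy_;
            lk.unlock();
            int frame = Process(sample);  // heavy work outside the lock
            lk.lock();
            out_.push_back(frame);
            --busy_;
            done_.notify_all();
        }
    }

    Deliver deliver_;
    bool useWorker_;
    std::thread worker_;
    std::mutex m_;
    std::condition_variable cv_, done_;
    std::deque<int> in_, out_;
    int busy_ = 0;
    bool stop_ = false;
};
```

The frames sitting in the queues while the worker runs ahead are the reason this design costs a few extra frames of memory.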
3) How is the memory in the QsFrameData structure handled?
Do I have to free the structure and the y/u/v pointers? Or is it re-used on the next frame automatically, so that I should copy it into another buffer?
Also, considering it's NV12 data, the names are not chosen all that wisely; I would've gone with a planes[4] array or something (for future-proofing) instead of 3 named parameters. :)
The buffers are reused so you shouldn't free them. I currently use aligned_malloc but this can change. Also the addresses of y/u/v do not point to the allocation address. They are at an offset for faster GPU-CPU copying.
The buffers are also writable (currently) - you can modify their content (I'll never read from them). There's a bool in the QsFrameData that specifies this.
The buffer is a single allocation (stride * height). If you need the buffer to be larger (e.g. for in-place format conversion), let me know.
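For readers wondering how plane pointers can sit at an offset inside one allocation, here is a C++ sketch. It swaps the `aligned_malloc` mentioned above for a `std::vector` plus `std::align`, and `Nv12Frame`/`MakeFrame` are hypothetical names, not the decoder's types. For NV12 the luma plane takes pitch*height bytes and the interleaved chroma plane adds half that, which the sketch accounts for explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical single-allocation NV12 frame. The owner keeps and reuses
// `block`; y/uv point inside it, so callers must never free them.
struct Nv12Frame {
    std::vector<uint8_t> block;  // the one allocation
    uint8_t* y = nullptr;        // aligned, not the allocation address
    uint8_t* uv = nullptr;       // interleaved CbCr plane after the luma
    size_t pitch = 0;

    Nv12Frame() = default;
    Nv12Frame(const Nv12Frame&) = delete;             // a copy would leave y/uv
    Nv12Frame& operator=(const Nv12Frame&) = delete;  // pointing at the old block
    Nv12Frame(Nv12Frame&&) = default;                 // moving keeps the buffer
    Nv12Frame& operator=(Nv12Frame&&) = default;
};

// alignment must be a power of two.
Nv12Frame MakeFrame(size_t width, size_t height, size_t alignment) {
    Nv12Frame f;
    f.pitch = (width + alignment - 1) / alignment * alignment;
    const size_t frameBytes = f.pitch * height * 3 / 2;  // Y + UV
    f.block.resize(frameBytes + alignment);  // slack for the aligned start
    void* p = f.block.data();
    size_t space = f.block.size();
    f.y = static_cast<uint8_t*>(std::align(alignment, frameBytes, p, space));
    f.uv = f.y + f.pitch * height;
    return f;
}
```

Rounding the pitch up keeps every row start aligned too, which is what makes wide SIMD copies from the GPU surface cheap.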
Regarding pointer names, NV12 is the only supported format ATM, but this may change (I don't know if or when). Regarding a fourth channel (you meant alpha?), I can add it for completeness.
Usually when I deal with images, I use unions for clarity:
union{
char* red; //RGB colorspace
char* y; //YCbCr colorspace
char* luma; //HSL colorspace
};
So I'll add the unions; they're clearer than a vague pointer array.
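A self-contained version of the union idea, extended to all four channels, might look like the C++ below. `FramePlanes` is a hypothetical name. This sketch names the HSL slots by component order (hue, saturation, lightness), since hue is HSL's first component. Note that reading a union member other than the one last written is formally undefined behaviour in C++, although compilers widely support it in practice; here all members are the same pointer type, so the layout is unambiguous.

```cpp
#include <cassert>

// One pointer per channel; each slot can be used under the name that
// matches the colorspace in use.
struct FramePlanes {
    union { char* red;   char* y; char* hue;        };  // channel 0
    union { char* green; char* u; char* saturation; };  // channel 1
    union { char* blue;  char* v; char* lightness;  };  // channel 2
    union { char* alpha; char* reserved;            };  // optional 4th
};

static_assert(sizeof(FramePlanes) == 4 * sizeof(char*),
              "the unions add names, not storage");
```

Compared to a `planes[4]` array, the named members cost nothing at runtime and make the intended colorspace visible at each use site.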
egur
2nd January 2012, 15:14
Yep, the problem with it is that you need the F8 unsigned-driver workaround at bootup (on 64-bit NT6). The ring0 driver Throttlestop uses should be just as low-latency and doesn't need F8 or an unsigned-driver bootup
:confused:
CruNcher
2nd January 2012, 16:57
Oops, I see it got updated and has a signed kernel driver now, hehe :D
BTW, http://software.intel.com/en-us/blogs/2011/03/31/accessing-intel-power-gadget-from-intel-energy-checker-sdk/: you could implement this in the ffdshow OSD. It would be cool to see it in real time during video playback. There are other ways via the DirectX surface, but encoding it directly into the video would also be nice for several tasks, e.g. roughly how much power it took the framework to decode a frame. It could go directly into MPC-HC's OSD too ;). I guess it won't have high enough resolution, given all the latency, to pinpoint exactly one frame's decoding power consumption, but a rough estimate per GOP could also be interesting (of course, to be useful at all, the system must be free of background noise when measuring anything specific) ;)
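The per-GOP estimate boils down to integrating power over time and dividing by the number of frames. The C++ sketch below assumes you already have power samples (in watts) at a fixed interval and an idle baseline measured beforehand; `JoulesPerFrame` is a hypothetical helper, not part of the Power Gadget or Energy Checker API. It also shows why per-frame precision is out of reach: a 15-frame GOP at 30 fps lasts 0.5s, which is only about 20 samples even at 25ms resolution.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <vector>

// Rough energy per decoded frame over one GOP. `watts` are power samples
// taken every `intervalSec` seconds while (only) that GOP was decoding;
// `idleWatts` is the background baseline, subtracted from every sample.
double JoulesPerFrame(const std::vector<double>& watts, double intervalSec,
                      double idleWatts, int frames) {
    if (frames <= 0 || intervalSec <= 0.0)
        throw std::invalid_argument("frames and interval must be positive");
    double joules = 0.0;
    for (double w : watts)
        joules += (w - idleWatts) * intervalSec;  // energy above idle in this slice
    return joules / frames;
}
```

Subtracting the idle baseline is the "background noise free" requirement in numeric form: anything else running during the GOP inflates the result.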
NikosD
2nd January 2012, 17:22
I tried Hwinfo32 on my Core 2 Duo system and I didn't see power usage.
From your pictures and Egur's reply, it's not clear to me whether I could see power usage if I run it on a Win 7 SP1 x86, Core i5-2400 system without any tricks.
What is the procedure to enable power usage reporting if it's not on by default?
I'm planning to run some tests again on Friday or Saturday on SandyBridge.
CruNcher
2nd January 2012, 17:39
Power consumption measurement is only available on Sandy Bridge and later:
Known Limitations/Issues
Only works on 2nd Generation Intel® Core™ processor family (Sandy Bridge) or later
Use 32-bit installer only on 32-bit OS
Not sure, but I think this feature actually came from Atom :) Sandy Bridge combines a lot of the low-power work that started with Atom :)
nevcairiel
2nd January 2012, 18:32
Hey Eric,
how did you ever manage to compile a release build of the decoder? :D
1>d:\dev\multimedia\lavfsplitter\intel qs\qsdecoder\intelquicksyncdecoder\quicksyncdecoder.cpp(643): error C2220: warning treated as error - no 'object' file generated
1>d:\dev\multimedia\lavfsplitter\intel qs\qsdecoder\intelquicksyncdecoder\quicksyncdecoder.cpp(643): warning C4715: 'CQuickSyncDecoder::SetD3DDeviceManager' : not all control paths return a value
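C4715 means a function declared to return a value has a path that reaches the closing brace without a `return`; flowing off the end like that is undefined behaviour, and with warnings treated as errors (/WX) MSVC reports it as error C2220. The real body of `CQuickSyncDecoder::SetD3DDeviceManager` isn't shown here, so the C++ below is a hypothetical function with the same shape of bug and its fix; `Result` and `SetDeviceManager` are illustrative names only.

```cpp
#include <cassert>

enum class Result { Ok, InvalidArg, NotImplemented };

// Hypothetical function, not the decoder's code.
Result SetDeviceManager(void* manager, bool hwSupported) {
    if (manager == nullptr)
        return Result::InvalidArg;
    if (hwSupported) {
        // ... keep a reference to the manager ...
        return Result::Ok;
    }
    // Fix: this path used to reach the closing brace with no return,
    // which is exactly what C4715 flags.
    return Result::NotImplemented;
}
```

Giving every path an explicit result also makes the unsupported case visible to the caller instead of returning garbage.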