Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing
Correct. It should be better quality anyway, since the desktop is displayed at full range. The display pipeline within the GPU can convert it to limited range later on, but data will be lost.
So to minimize range conversions, I output at full range.
The optimal would be to output YCbCr at standard levels (16-235 for Y) but Windows doesn't support that as far as I know.
Maybe one day Windows will support 30bpp output but till then, my setup is probably optimal.
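To see why an extra range conversion loses data, here is a minimal sketch. It assumes the standard 8-bit luma levels (full range 0-255, limited/TV range 16-235) and plain linear scaling with rounding. A real GPU display pipeline may dither or round differently, so treat this as an illustration rather than a model of any specific driver.

```python
# Standard 8-bit luma levels: full range 0-255, limited ("TV"/studio) range 16-235.
def full_to_limited(y: int) -> int:
    """Scale a full-range luma code (0-255) into limited range (16-235)."""
    return round(16 + y * 219 / 255)


def limited_to_full(y: int) -> int:
    """Scale a limited-range luma code back to full range, clamping to 0-255."""
    return min(255, max(0, round((y - 16) * 255 / 219)))


# Push every full-range level through one full -> limited -> full round trip.
levels = range(256)
limited_codes = {full_to_limited(y) for y in levels}
survivors = [y for y in levels if limited_to_full(full_to_limited(y)) == y]

print(f"{len(limited_codes)} distinct limited-range codes")
print(f"{256 - len(survivors)} of 256 full-range levels do not survive the round trip")
```

Run as-is, it reports 220 distinct limited-range codes and 36 lost levels: 256 inputs squeezed into 220 codes cannot all be recovered. That is the data loss the post refers to.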
nevcairiel
1st June 2013, 18:04
You only run into problems if your TV doesn't tell your GPU that it accepts full range (even if you tell your TV to expect full range) and the Intel option doesn't let you override it (which probably was never fixed).
NVIDIA kinda has the same issue, as they don't even have a visible option to override the HDMI range detection, but at least they have sneaky ways to override it.
hajj_3
2nd June 2013, 07:19
Quick Sync Performance
With more graphics EUs under the hood of all desktop Haswells (at least those launching today), Quick Sync performance improves a bit over Ivy Bridge. Intel claims to have focused heavily on improving the quality of Quick Sync transcodes however in my testing I saw a slight regression in quality. I didn’t have a ton of time to dig further to find out what’s going on but I plan on doing so post-Computex.
The other big news is Handbrake now officially supports Quick Sync, something Ganesh will be testing with his HTPC look at Haswell.
Needless to say, Quick Sync performance is better on Haswell than on Ivy Bridge. And it’s even better if you happen to have a Haswell with a 128MB L4 cache.
http://images.anandtech.com/graphs/graph6993/55316.png
Source: http://www.anandtech.com/show/7003/the-haswell-review-intel-core-i74770k-i54560k-tested/8
Egur: are you able to post the settings for the new higher-quality profile that Quick Sync uses on Haswell? The AnandTech article doesn't say what they are.
No, I don't have them.
NikosD
2nd June 2013, 13:23
That's the easiest option, you can later uninstall it. Once installed, copy the QS DLL.
The easiest option would be to get it as a single file, without all the trouble of installing/uninstalling.
PMs work on Doom9, I think, if you can't post it in public :)
Your setup looks great :)
SW decoding is missing but you don't need that anyway.
The problem is that the above MSDK configuration is visible only if I make a fake display.
But that is the Windows 7 way.
In Windows 8 we shouldn't make any fake displays.
So if I connect my monitor to dGPU in Win 8 without making a fake display I get this - state is 08, not active:
Intel Media SDK System Analyzer (32 bit)
The following versions of Media SDK API are supported by platform/driver:
  Version  Target  Supported  Dec  Enc
  1.0      HW      No
  1.0      SW      No
  1.1      HW      No
  1.1      SW      No
  1.3      HW      No
  1.3      SW      No
  1.4      HW      No
  1.4      SW      No
  1.5      HW      No
  1.5      SW      No
  1.6      HW      No
  1.6      SW      No
Graphics Devices:
  Name                       Version        State
  Intel(R) HD Graphics       9.17.10.3062   08
  AMD Radeon HD 6700 Series  12.104.0.0     Active
System info:
  CPU:  Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
  OS:   Microsoft Windows 8 Pro
  Arch: 64-bit
Installed Media SDK packages (be patient...processing takes some time):
Installed Media SDK DirectShow filters:
Installed Intel Media Foundation Transforms:
  Intel® Quick Sync Video H.264 Encoder MFT : {4BE8D3C0-0515-4A37-AD55-E4BAE19AF471}
Tips:
  - SW target does not work: make sure Media SDK DLL (e.g. libmfxsw64.dll)
    is located in your executable path or in system path
Analysis complete... [press ENTER]
NikosD
2nd June 2013, 16:52
http://www.anandtech.com/show/7003/the-haswell-review-intel-core-i74770k-i54560k-tested
Haswell is anything but a desktop CPU. From a desktop point of view, the Haswell CPU is boring, to say the least.
It's about 8% faster than Ivy Bridge on average (AVX code is faster, and AVX2 code a lot faster).
The rival Haswell is going to compete with is not an AMD processor or a previous-generation Intel.
The enemy is ARM.
The new PCs are not desktop PCs. They are smartphones and tablets.
http://www.anandtech.com/show/6993/intel-iris-pro-5200-graphics-review-core-i74950hq-tested
Haswell reviews are up, Iris Pro 5200 is really impressive, it trounces the 100W desktop Trinity real good.
Haswell GPU is far more interesting than Haswell CPU.
Iris Pro 5200 (55W) didn't reach the target Intel set, the Nvidia GT 650M.
Actually, it didn't even catch the GT 640.
It's slower in almost every game, but it's very close or even faster in synthetic benchmarks.
Is it a hardware limitation or a lack of driver optimization?
We'll see...
I was amazed by OpenCL numbers though.
In LuxMark 2.0, GT3e is even faster than my 5750 :eek:
VT-d and the other extensions are absent not only in K models, but in R (GT3e) models, too.
SAD
I think I'm gonna stick to my signature system again - with my new VP5.
If the 4770R becomes available in a non-OEM product (maybe a motherboard with a soldered CPU), I could think about an upgrade.
NikosD,
I just re-ran the analyzer and it didn't find HW acceleration (iGPU is headless in Win8). But QS works. In my setup the status of "08" was also there.
Your setup is fine.
Here's the QS DLL (32 bit) (http://www.multiupload.nl/NWVLRXI93D).
Here's a test DLL with debug colors (http://www.multiupload.nl/V911D3P4QY). Look at the top left corner, if QS is working you'll see a color rectangle:
* Blue - D3D9 HW
* Magenta (pink) - D3D11 HW
* Red - SW.
* No color - another decoder is active :)
NikosD
2nd June 2013, 19:20
Well done, Eric!
It works like a charm...
It works with a simple copy-paste into the LAV folder, but PotPlayer has a different DLL that isn't compatible with your structure.
So PotPlayer doesn't work; they'll have to do it on their own, as always.
Some benchmark results with DXVA Checker: (GPU clock is default max 1100MHz)
D3D11
Every 1080p H.264 clip I tried runs at 110-130 fps or more, with an average CPU load of 7% at the lowest CPU clock of 1600MHz. Unbelievable!
Even with 1080p60 clips the CPU stays below 2000MHz, dropping to 1600MHz towards the end of the clips, with an average CPU load of 8%!
It was like testing DXVA native, but with a lot lower performance.
Samsung demo is the only one that goes below 100fps - even if it's not the most difficult of my test suite.
D3D9
It's a lot faster than D3D11 (from 100% to more than 250% faster), but the CPU clock goes up to 3100MHz and beyond, and average utilization rises to 40%!
Conclusion:
For laptops and power consumption aware desktop users, I would definitely suggest D3D11 use instead of D3D9 use of QuickSync decoder.
It can play every 1080p clip with ease and with minimum power consumption out of the box, with no "fake display" or other tricks.
andyvt
2nd June 2013, 19:47
Egur: are you able to post the settings for the new higher-quality profile that Quick Sync uses on Haswell? The AnandTech article doesn't say what they are.
Do you mean the expanded target usage values?
GTPVHD
3rd June 2013, 08:32
http://www.anandtech.com/show/7007/intels-haswell-an-htpc-perspective
Unfortunately, there's trouble with Haswell Quick Sync's encoding quality.
In our opinion, the QuickSync results on HD4600 appear to be worse than what is obtained on the HD4000. With Haswell, Intel introduced seven levels of quality/performance settings that application developers can choose from. According to Intel, even the lowest quality Haswell QSV settings should be better than what we had with Ivy Bridge. In practice, this simply isn't the case. There's a widespread regression in image quality ranging from appreciably worse to equal at best with Haswell compared to Ivy Bridge. I'm not sure what's going on here but QuickSync remains one of the biggest missed opportunities for Intel over the past few years. The fact that it has taken this long to get Handbrake support going is a shame. Now that we have it, the fact that Intel seems to have broken image quality is the icing on a really terrible cake.
nevcairiel
3rd June 2013, 08:50
It's very possible that Handbrake's settings aren't optimized for the new Haswell presets yet; I wouldn't jump to conclusions too early.
wanezhiling
3rd June 2013, 09:14
The few who care about advanced madVR scaling algorithms (such as Jinc and the anti-ringing filters for Lanczos) may need to fork out for a discrete GPU
http://i.imgur.com/egBCTtS.gif
But this refers to the HD 4600; we know the HD 5200 is about twice as powerful. :)
ryrynz
3rd June 2013, 09:28
Yup, the 4600 will be adequate for lanczos as even the HD 3000 has no issues there, Jinc will probably be still a bit too demanding for some content though. But yeah the Iris Pro (HD 5200) is what I'm interested in too.
wanezhiling
3rd June 2013, 09:57
the Iris Pro (HD 5200) is what I'm interested in too.
http://en.wikipedia.org/wiki/Haswell_(microarchitecture)#Mobile_processors
4750HQ 4850HQ 4950HQ :)
nevcairiel
3rd June 2013, 10:06
http://en.wikipedia.org/wiki/Haswell_(microarchitecture)#Mobile_processors
4750HQ 4850HQ 4950HQ :)
Can't really put a mobile CPU into your HTPC. :p
The 4770R would be interesting if they sold it soldered on an ITX board without a whole PC around it.
Conclusion:
For laptops and power consumption aware desktop users, I would definitely suggest D3D11 use instead of D3D9 use of QuickSync decoder.
If you've reached your conclusion based upon DXVA checker benchmarks, then let me correct you if I may.
You should check power and CPU utilization/speed at normal playback speeds. DXVA checker tries running at full speed which doesn't matter for playback.
Even for transcoding, your conclusion is a little hasty. You need to sum up the energy used for the entire clip in both cases and see which one is better. Looking at the power draw alone is not enough, as D3D9 will finish much sooner.
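The point about summing up over the whole clip can be sketched as energy = average power × time, or, if you log power at fixed intervals, the sum of the readings times the interval. The figures below are hypothetical and chosen only to show how a higher-draw run that finishes sooner can still use less energy. They are not measurements from this thread.

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str
    avg_power_w: float  # average power while decoding/transcoding the clip (W)
    duration_s: float   # wall-clock time to get through the whole clip (s)

    def energy_j(self) -> float:
        # Energy (J) = average power (W) x time (s).
        return self.avg_power_w * self.duration_s


def energy_from_samples(samples_w: list[float], interval_s: float) -> float:
    """Integrate evenly spaced power readings (rectangle rule) into joules."""
    return sum(samples_w) * interval_s


def more_efficient(a: Run, b: Run) -> Run:
    """Return whichever run used less total energy for the same clip."""
    return a if a.energy_j() <= b.energy_j() else b


# Hypothetical figures for illustration only, not measurements from this thread.
d3d11 = Run("D3D11", avg_power_w=20.0, duration_s=60.0)  # lower draw, finishes later
d3d9 = Run("D3D9", avg_power_w=35.0, duration_s=25.0)    # higher draw, finishes sooner
print(more_efficient(d3d11, d3d9).name, d3d11.energy_j(), d3d9.energy_j())
```

With these made-up numbers, D3D9 draws more power but uses 875 J against 1200 J for D3D11. That is exactly why comparing power draw alone can point the wrong way for transcoding.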
CiNcH
3rd June 2013, 15:32
I read that with Haswell, the display controller also moved to the CPU. Could this mean that we now have a common clock for audio (HDMI) and video!? It's quite an improvement that the video clock can now hit an accurate 23.976Hz. But that doesn't help at all if the audio clock (or rather the DirectShow graph clock) is off, because the real problem has never been the inaccurate video clock but the deviation between the audio/graph clock and the video clock...
The digital display interfaces moved to the system agent (north bridge) within the processor instead of being routed over the DMI bus to the PCH. This is more efficient, but as for the audio clock, I don't know.
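CiNcH's point can be illustrated with a simplified model. Assume audio (the DirectShow graph clock) is the master and the renderer drops or repeats a frame whenever the accumulated drift reaches one frame. Then the time between corrections is 1 / |display rate − effective content rate|. The 50 ppm clock error below is an illustrative assumption, not a measured value for any Haswell system.

```python
import math

FILM_FPS = 24000 / 1001  # "23.976" fps content


def correction_interval_s(display_hz: float, content_fps: float, ref_clock_ppm: float = 0.0) -> float:
    """Seconds between one dropped or repeated frame.

    Simplified model: the audio/DirectShow graph clock is the master, so frames are
    consumed at content_fps as measured by that clock. ref_clock_ppm is how far that
    reference clock deviates from the display clock, in parts per million.
    """
    effective_fps = content_fps * (1.0 + ref_clock_ppm * 1e-6)
    mismatch_hz = abs(display_hz - effective_fps)
    return math.inf if mismatch_hz == 0 else 1.0 / mismatch_hz


# A display that hits 23.976 Hz exactly still needs corrections if the reference clock is off.
print(correction_interval_s(FILM_FPS, FILM_FPS))                    # inf
print(correction_interval_s(FILM_FPS, FILM_FPS, ref_clock_ppm=50))  # roughly 834 s
```

Even a perfect 23.976Hz video clock gives a dropped or repeated frame roughly every 14 minutes in this model if the reference clock is 50 ppm off. Only a shared (or matched) audio/video clock removes the mismatch term.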
jkauff
3rd June 2013, 15:57
It's very possible that Handbrake's settings aren't optimized for the new Haswell presets yet; I wouldn't jump to conclusions too early.
QS quality in Handbrake beta is not as good on an HD4000 as other implementations (Nero Recode, Arcsoft Media Converter, Media Coder). These implementations vary in quality, too, so I think there's probably a lot of tweaking that needs to happen to get the best results out of QS. Intel is reportedly working closely with the Handbrake team, so I'd expect improvements in future betas.
Iris Pro 5200 (55W) didn't reach the target Intel set, the Nvidia GT 650M.
Actually, it didn't even catch the GT 640.
According to AnandTech, they matched this target. Anand tested a 900/2500 GDDR5 650M, so they should match an 800 MHz 650M with DDR3 memory. The GT640 on the desktop side is difficult to judge, because its competitor will be a 65W BGA GT3e.
QS quality in Handbrake beta is not as good on an HD4000 as other implementations (Nero Recode, Arcsoft Media Converter, Media Coder). These implementations vary in quality, too, so I think there's probably a lot of tweaking that needs to happen to get the best results out of QS. Intel is reportedly working closely with the Handbrake team, so I'd expect improvements in future betas.
This is absolutely wrong. Handbrake QS quality is top notch for Ivy Bridge. I did lots of tests. As for Haswell there might be some issues with the new presets.
jkauff
5th June 2013, 01:55
This is absolutely wrong. Handbrake QS quality is top notch for Ivy Bridge. I did lots of tests. As for Haswell there might be some issues with the new presets.
I guess I'll have to do additional testing with different source files. The three files I transcoded in the various apps had many more visible artifacts with Handbrake QS.
wanezhiling
5th June 2013, 11:05
Haswell family (http://ark.intel.com/products/codename/42174/Haswell)
Why is the Iris Pro 5200 family named separately as Crystal Well (http://ark.intel.com/products/codename/51802/Crystal-Well)? o.O
NikosD
5th June 2013, 12:13
If you've reached your conclusion based upon DXVA checker benchmarks, then let me correct you if I may.
You should check power and CPU utilization/speed at normal playback speeds. DXVA checker tries running at full speed which doesn't matter for playback.
Even for transcoding, your conclusion is a little hasty. You need to sum up the energy used for the entire clip in both cases and see which one is better. Looking at the power draw alone is not enough, as D3D9 will finish much sooner.
Sorry for the late response. I was really busy the last few days...
For me benchmarking with DXVA Checker exposes the potential of the hardware and has only one real use, transcoding, because it uses hardware at its limits - like benchmarking.
The other real use is normal playback of course - more important to me.
Well, during normal playback DXVA native and DXVA copy-back are by far the most power-efficient solutions.
QS decoder power efficiency during playback is almost the same for various clips I tried under both situations - D3D11 and D3D9.
So, it seems that the real advantage of D3D11 is the headless, no fake display use of QS decoder.
According to AnandTech, they matched this target. Anand tested a 900/2500 GDDR5 650M, so they should match an 800 MHz 650M with DDR3 memory. The GT640 on the desktop side is difficult to judge, because its competitor will be a 65W BGA GT3e.
It's true that Anand tested the best variant of the 650M, but the difference in games is big enough that GT3e won't close it even against a slightly lower-clocked 650M with DDR3 memory. We have to see real numbers, of course.
The Iris Pro 5200 tested by Anand is inside a mobile CPU with a lower TDP than desktop parts, but I think the special 55W version used in the benchmarks is the top-performing GT3e anyone could get, regardless of desktop or mobile CPU package.
Eric could tell us more about that...
itsonlyjustincase
5th June 2013, 14:32
I have a question for everyone: has anyone with a Windows 8 laptop with Sandy Bridge or Ivy Bridge plus a discrete GPU tested this?
The latest Intel drivers were supposed to allow using Intel Quick Sync even when the iGPU isn't used. But in my case, when the dGPU is in use, the option to use Quick Sync through LAV Filters disappears.
Haswell family (http://ark.intel.com/products/codename/42174/Haswell)
Why is the Iris Pro 5200 family named separately as Crystal Well (http://ark.intel.com/products/codename/51802/Crystal-Well)? o.O
CrystalWell is the code name for the EDRAM chip itself.
The combined name for GT3+EDRAM is Iris Pro.
I have a question for everyone: has anyone with a Windows 8 laptop with Sandy Bridge or Ivy Bridge plus a discrete GPU tested this?
The latest Intel drivers were supposed to allow using Intel Quick Sync even when the iGPU isn't used. But in my case, when the dGPU is in use, the option to use Quick Sync through LAV Filters disappears.
I didn't get any feedback except yours. No problems with Win8 on desktop.
wanezhiling
5th June 2013, 16:08
Thanks Eric.
http://ark.intel.com/compare/76087,76086,76085
I noticed that the Max Memory Bandwidth of these three models is 76.8GB/s, while all other Haswell models (even the 4770R) are 25.6GB/s. Is that true?
I guess I'll have to do additional testing with different source files. The three files I transcoded in the various apps had many more visible artifacts with Handbrake QS.
Arcsoft is crap to be honest. High profile isn't possible and I can't even choose the preset. It must be the balanced or speed preset. Quality preset not possible.
It's true that Anand tested the best variant of the 650M, but the difference in games is big enough that GT3e won't close it even against a slightly lower-clocked 650M with DDR3 memory. We have to see real numbers, of course.
The Iris Pro 5200 tested by Anand is inside a mobile CPU with a lower TDP than desktop parts, but I think the special 55W version used in the benchmarks is the top-performing GT3e anyone could get, regardless of desktop or mobile CPU package.
Eric could tell us more about that...
GDDR5 makes quite a big difference. See how the GT640 performs.
GT650M 900/2500 GDDR5
GT 640 925/1700 DDR3
Despite having the slightly faster GPU, the GT640 loses by a good margin. Except in BF3, Iris Pro isn't much slower than a GT640 925/1700. On this basis I would say Iris Pro should match an 800 MHz 650M with DDR3. As for the 65W model: Iris Pro is clearly TDP-limited in a 47W power envelope, and the 55W Iris Pro runs 10% or so faster, so a 65W desktop Iris Pro could be slightly faster still.
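The GDDR5 vs DDR3 gap can be put into numbers with peak bandwidth = effective transfer rate × bus width in bytes. This sketch rests on several assumptions. It reads the "2500" GDDR5 figure as the data clock, i.e. about 5000 MT/s effective. It reads the DDR3 "1700" as already being the effective rate. It also assumes both cards use a 128-bit bus. These are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gb_s(effective_mt_s: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth (GB/s) = effective transfers per second x bus width in bytes."""
    return effective_mt_s * 1e6 * (bus_width_bits // 8) / 1e9


# Assumptions: GDDR5 "2500" is the data clock, i.e. ~5000 MT/s effective;
# DDR3 "1700" is already the effective rate; both cards use a 128-bit bus.
gt650m_gddr5 = peak_bandwidth_gb_s(5000, 128)
gt640_ddr3 = peak_bandwidth_gb_s(1700, 128)
print(f"GT 650M GDDR5: {gt650m_gddr5:.1f} GB/s, GT 640 DDR3: {gt640_ddr3:.1f} GB/s, "
      f"ratio {gt650m_gddr5 / gt640_ddr3:.1f}x")
```

Under these assumptions the GDDR5 650M has roughly 80 GB/s against roughly 27 GB/s for the DDR3 GT 640, close to a 3x difference. That is consistent with the GT 640 losing despite its slightly higher core clock.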
itsonlyjustincase
5th June 2013, 16:43
CrystalWell is the code name for the EDRAM chip itself.
The combined name for GT3+EDRAM is Iris Pro.
I didn't get any feedback except yours. No problems with Win8 on desktop.
That's what I suspected. OK, so as you may have pointed out before, Optimus entirely disables the iGPU when the dGPU is forced. Since Nvidia surely won't do anything about it, I won't be able to use it that way.
nevcairiel
5th June 2013, 16:51
http://ark.intel.com/compare/76087,76086,76085
I noticed that the Max Memory Bandwidth of these three models is 76.8GB/s, while all other Haswell models (even the 4770R) are 25.6GB/s. Is that true?
The EDRAM is a L4 Cache, which means it can boost memory bandwidth.
wanezhiling
6th June 2013, 01:11
But 4770R is still 25.6GB/s
The math is:
1.6GHz*8*2 = 25.6GB/s
- 8 bytes per channel bus width (64 bit)
- 2 memory channels.
Anyway, I agree that the various Iris Pro SKUs should have the same memory bandwidth. Will be fixed.
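The bandwidth math above can be written as a small function. Note that the "1.6GHz" is DDR3-1600's transfer rate of 1600 MT/s (the I/O clock is 800 MHz, double data rate). This is a minimal sketch of the theoretical peak only. It ignores the L4 EDRAM, whose bandwidth is separate from system memory.

```python
def dram_bandwidth_gb_s(transfer_rate_mt_s: int, bus_width_bits: int, channels: int) -> float:
    """Peak DRAM bandwidth in decimal GB/s, as quoted on spec sheets."""
    bytes_per_transfer = bus_width_bits // 8  # a 64-bit channel moves 8 bytes per transfer
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


dual = dram_bandwidth_gb_s(1600, 64, 2)    # dual-channel DDR3-1600, as in the post
single = dram_bandwidth_gb_s(1600, 64, 1)  # same memory in single-channel mode
print(dual, single)
```

The dual-channel case gives the 25.6 GB/s from the post. Dropping to one channel halves it, which is why the channel count is part of the formula.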
wanezhiling
6th June 2013, 13:49
9.18.10.3186 ;)
32bit (http://file2.mydrivers.com/display/intel_hd_graphics_9.18.10.3186_Win32.zip)
64bit (http://file2.mydrivers.com/display/intel_hd_graphics_9.18.10.3186_Win64.zip)
Tacio
8th June 2013, 17:38
Any news about NVIDIA Optimus support? On my system (Asus U36 with GT520M), QuickSync doesn't work with the latest ffdshow; it just shows that libavcodec is in use.
I didn't get a clear answer on the Nvidia forum. Sadly no progress on this front.
I'll try to ask within Intel.
itsonlyjustincase
9th June 2013, 12:18
I also tried asking on the Nvidia forum, but got no answer
https://forums.geforce.com/default/topic/546217/geforce-mobile-gpus/intel-quicksync-when-dgpu-is-forced/
I also tried emailing driver support, but I think I'll never get an answer
FYI, here's my post (https://forums.geforce.com/default/topic/544766/geforce-drivers/optimus-behavior-with-secondary-gpu-win8-dx11-1-/) in the Nvidia forums. If enough people interact, Nvidia will answer :)
NikosD
9th June 2013, 15:38
Very nice integration of Media Performance in System Analyzer of latest GPA 2013 R2.
Performance metrics of CPU & GPU & QuickSync & Power all-in-one and each one separately.
Just one thing.
It would be nice if there was an option "Always on top", for better monitoring of system resources.
Superb
11th June 2013, 12:57
Just wanted to post that I've installed Windows 7 on a friend's laptop yesterday with the Intel driver from Windows Update... and QuickSync worked. (tested w/ LAV Filters; ran an actual video and checked the status)
egur
11th June 2013, 13:15
I got a report that this issue was fixed with Microsoft.
Personally, I still strongly advise against it.
egur
11th June 2013, 19:38
An issue was found with D3D11 playback using 15.31 drivers (IVB/HSW). The issue is a green playback screen.
A fix will be released shortly.
madchicken265
13th June 2013, 00:25
Hey Egur, any love for Sandybridge (new drivers)?
egur
13th June 2013, 07:25
Hey Egur, any love for Sandybridge (new drivers)?
Maybe (I don't know). New drivers for SNB will have bug fixes, not new features. Driver support has always covered the current generation plus one previous generation.
theoneofgod
13th June 2013, 18:44
I find when I use FFDShow for Audio and Video, audio seems slightly out of sync. Without the video decoder it seems fine, not sure which is the culprit. I use optical to my receiver. I've tried both an Asus Xonar DG and my Realtek ALC898 soundcards, both have the same results.
NikosD
14th June 2013, 11:20
Maybe (I don't know). New drivers for SNB will have bug fixes, not new features. Driver support has always covered the current generation plus one previous generation.
AMD (ATI) and Nvidia support several generations before the current one (at least 3).
theoneofgod
15th June 2013, 06:44
Does this work fine with a Crossfire setup? Other things are detecting QSV but this won't.
egur
15th June 2013, 07:27
Does this work fine with a Crossfire setup? Other things are detecting QSV but this won't.
Works with up to 3 discrete GPUs in theory. As far as I know, systems with more than 2 dGPUs don't have QuickSync (they have extreme edition processor models w/o QuickSync).
Note that using QuickSync when the display is not connected to the processor graphics (iGPU) is only supported in Windows 8. In Windows 7 you need to extend the desktop to the disconnected iGPU.
nevcairiel
15th June 2013, 08:25
With the correct motherboard with a PCIe switch, you can have 4 dGPUs with a "normal" CPU
egur
15th June 2013, 09:06
May be possible but not likely. In any case no support for systems where the iGPU is adapter number 4 (zero based) as enumerated by d3d.
nevcairiel
15th June 2013, 09:25
Isn't the adapter number based on the number of connected screens as well? I can have one dGPU with 4 connected screens, and the iGPU would only come after that. An extreme setup, I suppose, but possible :)
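nevcairiel's scenario can be modelled with a toy enumeration. It assumes that, as in Direct3D 9, each active display output gets its own adapter ordinal, and that the dGPU's outputs are enumerated first. It also takes egur's statement to mean that ordinals 0-3 work and ordinal 4 (zero-based) does not. All names here (`Output`, `igpu_ordinal`, `quicksync_usable`) are hypothetical. This is not how the decoder actually selects its adapter.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption drawn from egur's post: the iGPU must sit at adapter ordinal 0-3 (zero-based).
MAX_SUPPORTED_ORDINAL = 3


@dataclass
class Output:
    gpu: str      # GPU driving this display output
    display: str  # connected screen (or an extended "fake" desktop on the iGPU)


def igpu_ordinal(outputs: list[Output], vendor: str = "Intel") -> Optional[int]:
    """First adapter ordinal owned by the iGPU, assuming one ordinal per active output."""
    for ordinal, out in enumerate(outputs):
        if vendor in out.gpu:
            return ordinal
    return None


def quicksync_usable(outputs: list[Output]) -> bool:
    ordinal = igpu_ordinal(outputs)
    return ordinal is not None and ordinal <= MAX_SUPPORTED_ORDINAL


# nevcairiel's example: one dGPU with four screens, then the iGPU's extended desktop.
four_screens = [Output("NVIDIA dGPU", f"screen {i}") for i in range(1, 5)]
four_screens.append(Output("Intel HD Graphics", "extended desktop"))
print(igpu_ordinal(four_screens), quicksync_usable(four_screens))  # 4 False

one_screen = [Output("NVIDIA dGPU", "TV"), Output("Intel HD Graphics", "extended desktop")]
print(igpu_ordinal(one_screen), quicksync_usable(one_screen))      # 1 True
```

Under these assumptions, a single dGPU driving four screens pushes the iGPU to ordinal 4 even though only two physical GPUs are present. That is the edge case nevcairiel describes.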