Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing
GTPVHD
24th April 2013, 17:22
I didn't know of any hardware decoder that can decode 10bit encodes. :p
nevcairiel
24th April 2013, 17:28
I didn't know of any hardware decoder that can decode 10bit encodes. :p
I suppose QuickSync's software fallback can decode 10-bit, but it's already butt-slow in comparison, and Eric didn't write it; it's part of the driver.
I think you're really confused right now. :)
egur
24th April 2013, 18:41
I've implemented an AVX2 copy-back function; the rest of my code takes almost zero CPU time.
AVX2 isn't any faster for this operation. In fact it was a little slower.
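For readers unfamiliar with the term, "copy-back" means copying each decoded frame out of the decoder's GPU surface into an ordinary system-memory buffer that the next filter can read. Below is a minimal sketch in plain C. It assumes an NV12 frame (a full-resolution Y plane followed by an interleaved, half-height UV plane) whose rows are padded to a `pitch`, and that the UV plane starts right after the padded Y plane. The function name `copy_back_nv12` is hypothetical, and this is not the decoder's actual routine. A real copy-back typically replaces the per-row `memcpy` with SSE4.1 streaming loads, because decoder surfaces are usually mapped as uncached write-combining memory. Since the copy is bound by memory bandwidth, wider AVX2 registers plausibly buy nothing, which fits the measurement above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy one NV12 frame out of a pitched (row-padded)
 * decoder surface into a tightly packed buffer of width * height * 3 / 2
 * bytes. A plain per-row memcpy stands in for the SIMD copy loop. */
void copy_back_nv12(const unsigned char *src, size_t pitch,
                    unsigned char *dst, size_t width, size_t height)
{
    assert(pitch >= width);                      /* padding only adds bytes */
    assert(width % 2 == 0 && height % 2 == 0);   /* NV12 needs even sizes */

    /* Y plane: `height` rows of `width` luma bytes each. */
    for (size_t row = 0; row < height; ++row)
        memcpy(dst + row * width, src + row * pitch, width);

    /* UV plane (assumed to start right after the padded Y plane in the
     * source, and right after the packed Y plane in the destination).
     * Each of its height / 2 rows holds width / 2 interleaved U,V byte
     * pairs, i.e. `width` bytes. */
    const unsigned char *src_uv = src + height * pitch;
    unsigned char *dst_uv = dst + height * width;
    for (size_t row = 0; row < height / 2; ++row)
        memcpy(dst_uv + row * width, src_uv + row * pitch, width);
}
```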
As for 10bit, I'm not aware that the SW Intel implementation can do it. I think it should imitate the HW capabilities.
andyvt
24th April 2013, 18:47
As for 10bit, I'm not aware that the SW Intel implementation can do it. I think it should imitate the HW capabilities.
Do you expect support for 10-bit anytime soon? Without demand for it, it hardly seems like it would make the cut.
egur
25th April 2013, 12:36
Do you expect support for 10-bit anytime soon? Without demand for it, it hardly seems like it would make the cut.
If "any time soon" means Haswell, then no. Afterwards, I don't know. My guess is the same as yours: 10-bit is a niche profile. I assume that supporting it requires massive HW changes (+die area), and the added value is small (for most use cases).
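One concrete cost of 10-bit, offered as a sketch and not as anything from Intel: 8-bit 4:2:0 frames are normally stored as NV12 (1 byte per sample), while 10-bit 4:2:0 surfaces such as P010 store each sample in a 16-bit word with the 10 significant bits at the top. Every frame buffer, reference picture and copy-back transfer therefore doubles in size. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of samples in a 4:2:0 frame: full-resolution luma plus two
 * chroma planes at quarter resolution, i.e. 1.5 samples per pixel. */
static uint64_t samples_420(uint64_t width, uint64_t height)
{
    return width * height * 3 / 2;
}

/* NV12: 8-bit samples, 1 byte each. */
uint64_t nv12_frame_bytes(uint64_t width, uint64_t height)
{
    return samples_420(width, height);
}

/* P010: 10-bit samples stored in 16-bit words, 2 bytes each. */
uint64_t p010_frame_bytes(uint64_t width, uint64_t height)
{
    return samples_420(width, height) * 2;
}

/* P010 keeps the 10 significant bits in the high end of each word. */
uint16_t p010_pack_sample(uint16_t value10)
{
    assert(value10 < 1024);             /* must fit in 10 bits */
    return (uint16_t)(value10 << 6);
}
```

For a 1080p frame this means 3,110,400 bytes in NV12 versus 6,220,800 bytes in P010.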
nevcairiel
25th April 2013, 12:43
It's doubtful that H.264 10-bit will ever be supported.
For H.265 we may still hope: the initial spec includes a consumer 10-bit profile, so maybe..
itsonlyjustincase
25th April 2013, 19:22
Can't wait for the Win8/DirectX 11.1 version :). I have high hopes for it. Nevcairiel might hate me for how I've bothered people here about my Serato Video software, which is not compatible with Intel GPUs. Launching the software on the other GPU but using QuickSync to decode video would let me play 2 HD videos at the same time, smoothly. Video mixing software is different from players, as video and sound are processed in real time and optimised for rewind, fast forward, scratching (vinyl or CD emulation in such video mixing software), etc.
GTPVHD
26th April 2013, 17:54
http://newsroom.intel.com/community/intel_newsroom/blog/2013/04/26/chip-shot-4th-generation-intel-core-coming-soon
nevcairiel
26th April 2013, 18:09
So they are just repeating that Haswell will be shown on June 4th, at Computex, which was already announced as the release event? :p
huhn
26th April 2013, 20:37
This time with a way to output/force PC levels over HDMI, and an accurate 23p clock?
GTPVHD
2nd May 2013, 05:41
http://www.anandtech.com/show/6926/intel-iris-iris-pro-graphics-haswell-gt3gt3e-gets-a-brand
hajj_3
3rd May 2013, 09:34
http://www.anandtech.com/show/6926/intel-iris-iris-pro-graphics-haswell-gt3gt3e-gets-a-brand
It does look potentially rather good. I doubt they will reveal more info until the official unveil date, though; maybe then egur will give us in-depth answers to any questions we have.
Most details about Haswell have already been disclosed. Don't ask me about performance (yet). I also don't have a GT3e unit yet. Hopefully I'll get those numbers after launch.
It's not clear to me whether GT3e performs better than GT3 in video acceleration use cases.
BTW, the new 15.31 drivers, which are the first Haswell drivers, also support IvyBridge but not SandyBridge. SandyBridge owners (myself included) are limited to the 15.28 drivers.
For users considering a Haswell for an HTPC, I recommend buying fast RAM in order to get the most out of it (this is also true for previous architectures).
NikosD
3rd May 2013, 16:49
There are 6 different GPU versions:
GT1, GT2, two GT3 and two GT3e.
The 4 GT3 versions all have 40 shaders but different clocks, and GT3e has 128MB of embedded DRAM (eDRAM).
The fastest GPU is definitely the 4770R (Iris Pro desktop version).
But apart from clocks and eDRAM, what are the differences between the 4 GT3 versions, and how do they differ in performance in video, computing (OpenCL) and games?
GT1 and GT2 have fewer EUs (cores) than GT3.
GT1 < GT2 < GT3.
Edit
Frequencies (usually) depend a lot on TDP and less on the size of the GPU.
See AnandTech's IDF coverage of HSW's GPU (http://www.anandtech.com/show/6355/intels-haswell-architecture/12)
Edit2
The R models are listed as desktops but they are BGA parts. This means that they are soldered to the motherboard. Their main usage will be All In One (AIO) PCs (like iMac).
NikosD
4th May 2013, 06:35
Edit2
The R models are listed as desktops but they are BGA parts. This means that they are soldered to the motherboard. Their main usage will be All In One (AIO) PCs (like iMac).
So you mean we can't actually buy a 4770R, since it's not sold separately.
But can we buy a motherboard with an R model soldered onto it?
Because if we can't do that either, then our only options are to buy a brand-name PC (All-In-One) or to build a Haswell PC ourselves with a GT1/GT2 part only.
ryrynz
4th May 2013, 09:05
I'm also wondering about this: will anyone make an ATX BGA board with the 4770R? Whoever does is likely going to rake in some cash.
I'd certainly consider buying one.
My personal opinion is that for serious gamers, a discrete GPU is still a must. Beating a 200W dGPU with an iGPU that draws much less power is not practical (yet).
As Intel gives more and more attention (die area) to the GPU and keeps its superiority in silicon manufacturing, this might change.
Video enthusiasts who use MadVR's high end setups require a strong GPU, but this is a small niche (unfortunately).
I can't discuss which OEM is doing what platform, but if an OEM concludes that it makes sense, they'll build it. Going BGA makes things cheaper, and since most users don't upgrade, it might be a good HTPC solution.
The problem, of course, is that when the motherboard dies, replacing a BGA part is not practical.
So if a good OEM does this, upgrading is not a real issue.
Video enthusiasts who use MadVR's high end setups require a strong GPU, but this is a small niche (unfortunately).
With the new iGPU there shouldn't be any performance issue with madVR; the HD 4000 already works pretty well. I used the HD 4000 for some time with madVR, but...
The real HTPC issue with Intel is this: http://communities.intel.com/thread/29420?start=0&tstart=0 There are many more threads like it. It's a known issue and most likely easy to fix.
If I understand correctly, Intel is working on a better way to detect the output color range. But this will never fix the problem, because many TVs report limited range yet accept full range only in PC mode, so the EDID is wrong.
AMD fixed this with a simple drop-down menu where you can force the output color range.
Nvidia has the same problem as Intel, but there are workarounds; madshi has created one too. So this is still bad, but better than nothing.
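To make the color-range problem concrete: in 8-bit video, "limited" (TV) range puts black at 16 and white at 235, while "full" (PC) range uses 0 to 255. The minimal C sketch below shows the two mappings with integer rounding; the function names are hypothetical, and this is not Intel's, AMD's, Nvidia's or madshi's code.

```c
#include <assert.h>
#include <stdint.h>

/* Full range (0..255) -> limited range (16..235), with integer rounding. */
uint8_t full_to_limited(uint8_t v)
{
    return (uint8_t)(16 + (v * 219 + 127) / 255);
}

/* Limited range (16..235) -> full range (0..255). Values outside 16..235
 * ("blacker than black" / "whiter than white") are clipped. */
uint8_t limited_to_full(uint8_t v)
{
    if (v <= 16)
        return 0;
    if (v >= 235)
        return 255;
    int out = ((v - 16) * 255 + 109) / 219;   /* +109 is about half of 219 */
    assert(out >= 0 && out <= 255);
    return (uint8_t)out;
}
```

If the GPU sends limited range but the TV treats the signal as full range, black shows up as dark grey. In the opposite mismatch, everything below 16 is clipped to black and shadow detail is crushed. A wrong EDID produces exactly these cases, which is why a manual override like AMD's drop-down is useful.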
itsonlyjustincase
5th May 2013, 10:33
Are any donations needed to help the development of QuickSync support in the Intel QuickSync Decoder, even when the SandyBridge iGPU isn't in use?
http://techreport.com/news/24604/performance-boosting-intel-igp-drivers-are-out
NikosD
5th May 2013, 12:29
As Intel gives more and more attention (die area) to the GPU and keeps its superiority in silicon manufacturing, this might change.
128MB of eDRAM is a huge amount of memory to be on-die.
I wonder what HW resources Intel had to remove to make room for the eDRAM in GT3e.
UPDATE:
I think I found my answer
www.techpowerup.com/182841/intel-readies-haswell-variants-with-large-graphics-cores-and-edram-caches.html
Are any donations needed to help the development of QuickSync support in the Intel QuickSync Decoder, even when the SandyBridge iGPU isn't in use?
http://techreport.com/news/24604/performance-boosting-intel-igp-drivers-are-out
Code donations are always welcome. Money donations are not needed.
128MB of eDRAM is a huge amount of memory to be on-die.
I wonder what HW resources Intel had to remove to make room for the eDRAM in GT3e.
eDRAM is on a separate die (there are photos of this chip on AnandTech and other sites). It's a standard DRAM die. For reasons beyond the scope of this thread, DRAM and logic (e.g. Haswell) dies can't be made using the same process, hence GT3e parts are multi-chip packages (MCP).
When a DRAM is very close to the memory controller (millimeters instead of centimeters), it's possible to use very high frequencies and/or low voltage. The eDRAM is effectively a large cache. It's slower than the L3 cache but faster than main memory.
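The "large cache between L3 and main memory" idea can be written with the standard average-memory-access-time model. This is a generic textbook formula, not Intel data: $t$ denotes the average access latency of a level, $m$ its miss rate, and $m_{L3} > 0$ is assumed.

```latex
\begin{align*}
T_{\text{no eDRAM}} &= t_{L3} + m_{L3}\, t_{\text{DRAM}} \\
T_{\text{eDRAM}}    &= t_{L3} + m_{L3}\left(t_{\text{eDRAM}} + m_{\text{eDRAM}}\, t_{\text{DRAM}}\right) \\
T_{\text{eDRAM}} < T_{\text{no eDRAM}} &\iff t_{\text{eDRAM}} < (1 - m_{\text{eDRAM}})\, t_{\text{DRAM}}
\end{align*}
```

So the extra level pays off as long as eDRAM hits are common and its latency sits well below that of main memory, which is the "slower than L3, faster than main memory" position described above.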
GTPVHD
6th May 2013, 18:56
http://www.anandtech.com/show/6936/intels-silvermont-architecture-revealed-getting-serious-about-mobile
Eric, does Silvermont have the same hardware decoder as Ivy Bridge? So would the QuickSync decoder just work on Silvermont?
nevcairiel
6th May 2013, 19:50
That depends on whether the drivers for those GPUs support the Media SDK out of the box.
The copy-back architecture of the decoder may however not be the best course on a very power-conscious system like a phone or tablet.
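A back-of-the-envelope sketch of what copy-back costs in memory traffic. It assumes NV12 output (1.5 bytes per pixel) and counts one read from the decoder surface plus one write to system memory per byte; caching effects and whatever the renderer does afterwards are ignored. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate memory traffic (bytes per second) caused by copying every
 * decoded NV12 frame back to system memory: one read plus one write. */
uint64_t copy_back_traffic(uint64_t width, uint64_t height, uint64_t fps)
{
    assert(width % 2 == 0 && height % 2 == 0);   /* NV12 needs even sizes */
    uint64_t frame_bytes = width * height * 3 / 2;
    return frame_bytes * fps * 2;
}
```

For 1080p60 this gives 373,248,000 bytes/s, roughly 187 MB/s in each direction spent only on moving frames. On a phone or tablet that memory traffic costs power, which is the concern with the copy-back architecture.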
Nev's answer is correct.
It doesn't matter what GPU technology is used as long as the Intel graphics driver and Media SDK abstract it.
NikosD
7th May 2013, 13:10
This year, hardware decoding on smartphones and tablets will be even more interesting because of Nvidia Tegra 4, Qualcomm Snapdragon 800 and ARM Mali-450.
All of the above SoCs have one thing in common.
4K H.264 HW acceleration.
I haven't seen any performance numbers yet, but they claim they can do it.
So they can do what no AMD card can do even today, and neither can Nvidia VP4 (or older) or Intel SandyBridge (or older).
It doesn't look feasible for Silvermont, with only 4 EUs and a very constrained power envelope, to include something like QuickSync, let alone 4K HW acceleration like Ivy's, which has 16 EUs and a large TDP.
The priority for Silvermont is definitely CPU not GPU or VPU.
I don't agree with your conclusions.
4 GEN7 EUs are in the same performance neighborhood as 6 GEN6 EUs (SandyBridge GT1). I use the latter for development, and the performance is very good, as you know.
Most of the decode work is done in fixed function (MFX engine) not the EUs.
If the playing field is similar (GPU clock, memory speed, memory latency, cache latency), I can speculate that performance will be around SNB GT1.
As for 4K, SandyBridge didn't have a performance problem, it had a HW limitation (to my knowledge).
The problem is that even if the driver ships with Media SDK, I have no way to test this. I don't have such a system.
NikosD
7th May 2013, 17:48
The playing field can't possibly be similar, because of power constraints.
The clocks of the Silvermont GPU will be lower than GT1's.
I'll risk a prediction for the GPU performance of next-generation SoCs:
Tegra 4 > Snapdragon 800 > A6X (swift) > Silvermont
nevcairiel
7th May 2013, 18:19
Sandy Bridge HD2000 GPU performance on a phone? That's insanely fast compared to today's mobile GPUs.
I think you'll be surprised. :p
itsonlyjustincase
7th May 2013, 18:25
Code donations are always welcome. Money donations are not needed.
I wish I could, but it's beyond my skills :p
Intel QuickSync is so powerful! I record my video mixes and games with software called Action!, developed by Mirilis. I can mix or play games on my Nvidia GPU and at the same time record an HDMI-connected screen using Intel QuickSync, without any impact on the fps. I had tried many programs before, like MSI Afterburner, but all of them affected the fps. Can't wait to be able to use QuickSync, even when the Intel GPU isn't in use, with your next QS decoder.
It doesn't look feasible for Silvermont, with only 4 EUs and a very constrained power envelope, to include something like QuickSync, let alone 4K HW acceleration like Ivy's, which has 16 EUs and a large TDP.
QuickSync is confirmed for Intel's next-gen Atom tablets. The newest Media SDK supports it. The fixed-function unit is highly power efficient and fast.
jkauff
8th May 2013, 02:53
The problem is that even if the driver ships with Media SDK, I have no way to test this. I don't have such a system.
What? You mean you can't just put in a new hardware request at work? ;)
NikosD
8th May 2013, 09:32
QuickSync is confirmed for Intel's next-gen Atom tablets. The newest Media SDK supports it. The fixed-function unit is highly power efficient and fast.
Every smartphone and tablet nowadays has hardware decoding/encoding capabilities.
For example, my Note II can hardware encode/decode H.264 up to 1080p@30 fps (Blu-ray compatible).
The GPU is a Mali-400 MP4 (4 cores) @ 533 MHz.
My benchmarks show that the HW decoder is 100% faster (i.e. twice as fast) than the optimized software decoder running multithreaded with NEON SIMD instructions on a quad-core Cortex-A9 @ 1.6GHz (Exynos 4412).
So Silvermont will definitely have a HW decoder.
But even if you or Intel call it QuickSync, what are its characteristics and performance compared to Ivy's QS, which is more than capable of even 4K H.264?
I don't know the performance of the yet-unreleased Bay Trail. At this point it's not important (the SW stack may not be optimized enough yet). Performance becomes relevant near launch, and the numbers will be disclosed by then.
wanezhiling
10th May 2013, 11:01
15.31.7.3131 for IvyBridge/Haswell
Win7/8 x86: http://file2.mydrivers.com/display/intel_graphics_15.31.7.3131-WIN32.exe
Win7/8 x64: http://file2.mydrivers.com/display/intel_graphics_15.31.7.3131-WIN64.exe
NikosD
14th May 2013, 09:25
For people who know Chinese, first review of Haswell is here:
http://www.chinadiy.com.cn/html/24/n-9024-7.html
It's a 3770K vs 4770K comparison.
Even if you don't know Chinese (like me) you can read the numbers and the names of benchmark applications (they are in English).
I worked out an initial version of QS that supports D3D11 HW video decoding (as an alternative to the limited D3D9).
Direct3D11 provides headless (disconnected iGPU) video acceleration. This will hopefully solve the problems with multi-GPU setups. Media servers (no screen at all) also benefit from this.
Minimum requirements:
- SandyBridge or newer (like before)
- 15.28 or 15.31 driver.
- Windows 8 (D3D11 and D3D9). Windows 7 (D3D9 only)
- 32 bit player
- LAV (32bit) and/or ffdshow (32bit) installed
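The requirements above boil down to a small decision table. The sketch below encodes only what this thread states: SandyBridge or newer, a 15.28/15.31 driver and a 32-bit player are mandatory; D3D11 (which also works headless) is available only on Windows 8; Windows 7 has D3D9 only, which needs the Intel GPU to drive a display. The enum and function names are hypothetical, they do not come from the decoder's source, and preferring D3D11 on Windows 8 is an assumption.

```c
#include <stdbool.h>

/* Hypothetical model of which decode path is usable, following the
 * requirements posted in this thread. Not the decoder's real logic. */
typedef enum {
    QS_PATH_NONE,   /* fall back to software decoding (e.g. avcodec) */
    QS_PATH_D3D9,   /* Intel GPU must be driving a display */
    QS_PATH_D3D11   /* Windows 8 only; also works headless */
} qs_path;

qs_path choose_qs_path(bool sandybridge_or_newer, bool supported_driver,
                       bool player_is_32bit, bool windows8,
                       bool igpu_has_display)
{
    if (!sandybridge_or_newer || !supported_driver || !player_is_32bit)
        return QS_PATH_NONE;
    if (windows8)
        return QS_PATH_D3D11;        /* assumed preference: no display needed */
    if (igpu_has_display)
        return QS_PATH_D3D9;         /* Windows 7: D3D9 only */
    return QS_PATH_NONE;             /* Windows 7 with a headless iGPU */
}
```

The last case appears to match the Windows 7 multi-GPU reports in this thread, where QuickSync shows as available but avcodec ends up decoding.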
Get the DLL from here (http://www.mediafire.com/?w78b4d4p4x71pnt). Rename the original IntelQuickSyncDecoder.dll file and use the new one instead.
This is an "alpha" SW, not well tested, etc. Please provide feedback.
My test system:
- Win8 Pro 64bit fully patched
- 2nd generation Core i7-2600 (SandyBridge)
- Radeon HD 6950 (connected to display), Catalyst 13.1.
- Intel HD 2000, latest 15.28 driver.
Update
The above DLL was updated (link updated too) and now it works on Win7. VC1 playback (D3D11) is operational too.
nussman
16th May 2013, 23:37
Hi,
Thanks for this test build!
I did a quick test, but on my system it's not working yet.
Win7 32-bit
LAV 56.2
Intel HD4000 15.31 driver
AMD HD6570 CCC 13.1
DVBViewer Pro
QuickSync is shown as available, but avcodec is still used. Same problem with ffdshow (your latest build).
Is there anything else that has to be done besides replacing the IntelQuickSyncDecoder.dll?
egur
That's great news. I've been anticipating headless support for a long time, and we have it now. Thank you.
Though, it is not working on my system. Whole setup below.
System: Windows 7 x64 SP1 with almost all updates including KB2670838.
Hardware: Intel HD Graphics 2000 + GeForce gtx 470.
Drivers: Intel 15.28.15.64.3062 (9.17.10.3062) + Nvidia 314.07
Software: Daum Potplayer x86 + LAV 56.2
For people who know Chinese, first review of Haswell is here:
http://www.chinadiy.com.cn/html/24/n-9024-7.html
It's a 3770K vs 4770K comparison.
Even if you don't know Chinese (like me) you can read the numbers and the names of benchmark applications (they are in English).
That is a rather inconsistent test: Turbo issues on Haswell, oddly high 3770K results and so on. Here is a better one: http://diy.pconline.com.cn/329/3297240.html
Pretty good GT2 results and convincing system power improvements.
nevcairiel
17th May 2013, 07:22
Though, it is not working on my system. Whole setup below.
Headless only works on Windows 8.
Windows 7 lacks the required D3D11 support.
Correct. Headless works only on Windows 8 (d3d11).
Windows 7 users should check that what worked before still does.
My own Windows 7 HTPC works fine with the new DLL as far as I could tell.
Full Screen Exclusive (FSE) layers like Windows Media Center do not exist on Windows 8 as far as I know. Please correct me if I'm wrong. That's why I don't know if D3D11 works in FSE applications.
wanezhiling
17th May 2013, 08:23
Wow, a good reason to convince myself to move to Win8? :p
This is sad. I wonder if it is technically impossible to fully port D3D11 functionality or it is just the way Microsoft is doing business... Well at least I can buy Lucid MVP, cause there is no way I am going to buy Windows 8.
itsonlyjustincase
17th May 2013, 10:33
WOOOWWW, great news! Thank you guys!!! Can't wait to go home tonight to test it :)
itsonlyjustincase
17th May 2013, 10:34
This is sad. I wonder if it is technically impossible to fully port D3D11 functionality or it is just the way Microsoft is doing business... Well at least I can buy Lucid MVP, cause there is no way I am going to buy Windows 8.
Lucid MVP needs a GeForce GTX model. I bought it before realizing that I needed a GTX. So if you want, I can give it to you... the standard edition.
nevcairiel
17th May 2013, 13:33
I wonder if it is technically impossible to fully port D3D11 functionality
Parts of these functions need the new WDDM 1.2 driver model, which is not available for Windows 7 either, so they would need to port that as well, and who knows what else it contains and how deep it goes.
So one can understand why it's not done, even if it makes us sad.
itsonlyjustincase
18th May 2013, 09:14
Unfortunately it doesn't work with my app. It uses avcodec when I select QuickSync.
WIN8 64 bits
Asus ux32vd
Please specify your setup: what SW is used (must be 32-bit)? What driver? Is the iGPU enabled?
itsonlyjustincase
18th May 2013, 10:16
win8 64bits
i7 3517u
Intel HD 4000 + Nvidia GeForce 620M
Intel HD 4000 enabled by default.
GeForce 620M forced for my software, Serato Video 32-bit (www.serato.com), because they disabled Intel GPU compatibility: you can't run the software on the Intel GPU, as it gives you an error message saying your Intel GPU isn't supported. That is why I wanted to launch it on the Nvidia GPU but use QuickSync to play video.
Driver 9.18.10.3071