Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing
NikosD
24th February 2015, 19:01
Well, are they supported ?
I'm not sure and I don't know an easy way to check it out.
andyvt
24th February 2015, 19:38
Well, are they supported ?
I'm not sure and I don't know an easy way to check it out.
Look at the MSDK documentation. IIRC, the most recent version does not yet support these formats, but I might have missed a release.
NikosD
24th February 2015, 20:30
The latest MSDK documentation I have seen is from January 2014, more than a year old.
And I think the latest MSDK is still the 2014 release, from January 2014.
In there, there is no VP8/VP9 support for sure.
But there is support in the drivers, though.
So, the MSDK is very old, but the drivers are fully updated.
Is this possible ?
andyvt
24th February 2015, 20:49
Within the realm of possibility? Probably, but it would entail a significant amount of [otherwise unnecessary] effort because you would have to build support for those formats into the parts of the MSDK that sit above the driver.
Nintendo Maniac 64
24th February 2015, 20:51
I'd like to mention that QuickSync-accelerated VP8 decoding wouldn't even be that useful since VP8 software decoding uses even less CPU utilization than h.264 baseline.
Now VP9? I could definitely get behind that.
NikosD
24th February 2015, 22:49
I've started to look into this. My new laptop is a Haswell so this should speed things up.
What matters to me is if the codec is supported by the Media SDK.
If a codec is supported by it, I can add support for it as well.
DXVA codecs can be supported by LAV w/o me doing anything...
Within the realm of possibility? Probably, but it would entail a significant amount of [otherwise unnecessary] effort because you would have to build support for those formats into the parts of the MSDK that sit above the driver.
Eric, give me a hint.
If it's so difficult to build a decoder for a format that MediaSDK doesn't support, like VP8/VP9 (even though the driver supports them), then how are you going to support the hybrid HEVC decoder? It isn't supported directly by MediaSDK either, only (probably) via a plugin, which I think is a SW-only implementation (not hybrid).
If I had to choose between hybrid HEVC and VP9, I would definitely go for VP9, because LAV already supports hybrid HEVC and there is no VP9 decoder.
xooyoozoo
25th February 2015, 00:06
There is also a VP8 decoder in Haswell
Where did you hear this?
AFAIK, the VP8 decoder only exists in Atom SoCs (at least Bay Trail) and Broadwell. Those chips also support hybrid VP9, so I'm guessing the VP9 decoder piggybacks the VP8 hardware blocks.
I don't think Eric will be able to test VP8/VP9 on Haswell (though my mental image of Intel engineers consists of them having a buffet of Intel chips to play with :)).
NikosD
25th February 2015, 03:28
I can see it in my system.
There is a VP8 MFT decoder.
xooyoozoo
25th February 2015, 04:13
Ah, you're right. My mistake.
egur
25th February 2015, 09:31
The latest Media SDK (2015) is part of a larger SW package called INDE.
It can be downloaded from here (https://software.intel.com/en-us/media-client-solutions).
The free version is good enough for developing QuickSync Decoder.
I have already updated the code to use the headers and library (r101 in qsdecoder's SVN).
NikosD
25th February 2015, 11:16
I have already downloaded INDE 2015 update 1 in order to use latest GPA 2014 R4 which is included inside.
But I haven't found documentation for MediaSDK 2015.
Could you give us a link to it?
Or should I install MediaSDK separately from INDE in order to get the documentation too?
Thanks.
EDIT:
Look at the MSDK documentation. IIRC, the most recent version does not yet support these formats, but I might have missed a release.
In the release notes of INDE 2015 Update 1 there is this paragraph:
The Intel® INDE 2015 Media SDK for Windows* introduces API version 1.13
Versions of HW SDK library having API version 1.13 also support the following features of older API versions, previously unsupported:
o HW accelerated HEVC decode MAIN 10 profile via plugin interface. HW HEVC decode plugin is distributed with graphics driver. HEVC HW UID is defined in mfxplugin.h as MFX_PLUGINID_HEVCD_HW.
o HW accelerated VP8 decode via plugin interface. HW VP8 decode plugin is distributed with graphics driver. HW VP8 UID is defined in mfxplugin.h as MFX_PLUGINID_VP8D_HW.
o HW accelerated VP9 decode via plugin interface, HW VP9 decode plugin is distributed with graphics driver. HW VP9 UID is defined in mfxplugin.h as MFX_PLUGINID_VP9D_HW.
So, Eric and andyvt, there is everything inside MediaSDK 2015 and distributed with graphics driver.
All of the above HW decoders are hybrid, GPU accelerated (not ASIC).
According to this table https://communities.intel.com/message/273816#273816, Core Haswell has no HEVC 10bit/VP9 support, but it says nothing about VP8, which I think is supported, like HEVC 8bit.
Broadwell-based Celeron/Pentium/Core M have VP9 support but no HEVC support, and Broadwell Core iX supports everything (HEVC 8bit/10bit, VP9).
You can find the whole MediaSDK 2015 release notes here:
https://software.intel.com/sites/default/files/managed/eb/ee/mediasdk_release_notes.pdf
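For reference, the platform summary above can be written down as a small lookup table. This is only a sketch of what this post claims, not an official support matrix: the type and function names (`Support`, `PlatformSupport`, `FindPlatform`) are made up for illustration, and VP8 is marked `Unstated` everywhere because the linked table says nothing about it.

```cpp
#include <cstddef>
#include <string>

// Yes/No come from the summary in this post; Unstated means the source
// table gives no information for that codec.
enum class Support { No, Yes, Unstated };

struct PlatformSupport {
    const char* platform;
    Support hevc8;   // HEVC 8bit
    Support hevc10;  // HEVC 10bit
    Support vp8;
    Support vp9;
};

// Rows mirror the three platform groups named in the post.
static const PlatformSupport kPlatforms[] = {
    {"Haswell Core", Support::Yes, Support::No, Support::Unstated, Support::No},
    {"Broadwell Celeron/Pentium/Core M", Support::No, Support::No, Support::Unstated, Support::Yes},
    {"Broadwell Core iX", Support::Yes, Support::Yes, Support::Unstated, Support::Yes},
};

// Returns the row for `name`, or nullptr if the platform is not listed.
const PlatformSupport* FindPlatform(const std::string& name) {
    for (std::size_t i = 0; i < sizeof(kPlatforms) / sizeof(kPlatforms[0]); ++i) {
        if (name == kPlatforms[i].platform) return &kPlatforms[i];
    }
    return nullptr;
}
```

A decoder front end could consult a table like this before trying to load a plugin, and fall back to software decoding when the entry is `No` or the platform is missing.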
egur
11th March 2015, 22:10
I started to look at the HEVC implementation for the Intel QuickSync Decoder.
I'll also try to get a clear picture on what codec/profile is supported on which platform/driver.
When I'm done coding, I'll need to build a test clip library for HEVC. I hope you guys can share some horribly hard clips :sly:
Also, since FFDShow (R.I.P) doesn't support the new codecs, all engineering/test builds will be LAV.
Zachs
11th March 2015, 23:30
Hi egur, just wondering if you've fixed the resource leak issue I reported about half a year ago?
xooyoozoo
11th March 2015, 23:32
I don't know much about the corporate structure at Intel, but you guys do offer HEVC (and VP9) stress clips. Maybe you can borrow a few? :)
There are some high bitrate HEVC clips floating around, but that mainly stresses entropy decoding. Ideally, the clip would be "legitimately" high-bitrate, instead of just low-QP, to make sure deblocking is actually on. To add a lot more complexity, TMVP skip-mode should be disabled and 8x8 weighted bi-prediction blocks should be used almost all the time.
egur
12th March 2015, 08:36
Hi egur, just wondering if you've fixed the resource leak issue I reported about half a year ago?
Couldn't find it in my code.
I don't know much about the corporate structure at Intel, but you guys do offer HEVC (and VP9) stress clips. Maybe you can borrow a few? :)
Intel has 100K employees. The lowest level manager I have in common with the graphics guys is the Intel president :)
In the past it was much more productive to ask users for test clips.
Using the Intel test clips is a little pointless since they are already being tested at Intel...
Zachs
12th March 2015, 10:31
Couldn't find it in my code.
From memory, you did manage to replicate the issue though, correct?
egur
12th March 2015, 14:31
From memory, you did manage to replicate the issue though, correct?
Yes, I replicated it but couldn't isolate the leak to my code, the MSDK or the graphics driver.
On my HTPC, the player has run for many weeks without any issues that could have resulted from a serious leak.
Zachs
13th March 2015, 00:30
Yes, I replicated it but couldn't isolate the leak to my code, the MSDK or the graphics driver.
On my HTPC, the player has run for many weeks without any issues that could have resulted from a serious leak.
The application we are using it with plays 1-minute footage files that are seamlessly merged into one big piece of footage. We get a resource leak each time the QuickSync decoder opens/closes a new file, which is quite catastrophic because it is not memory that leaks but GPU resources, which are highly limited.
Are you saying the QuickSync decoder is only meant for HTPC purposes?
cybersans
24th March 2015, 15:48
guys, why do i always get stuttering video with mp4 when there is no sound in the background, and a huge delay on blank screens (for example a pause between one scene and the next)? the picture also breaks up (lots of pixellated image) when i forward/reverse or drag the progress bar to another position.
it happens with mp4 and mkv when using ffdshow with the quicksync decoder. i have no problem when using libavcodec for mp4, or with other formats such as avi and so on.
anyway i am using windows 7 with windows media player with latest ffdshow and intel hd drivers .4072
the same goes with older drivers.
egur
25th March 2015, 14:17
Do you know which splitter is used?
Does it happen with LAV filters as the decoder (also has QS support)?
VFR maniac
25th March 2015, 15:29
Hi.
I encountered some AVC-in-AVI files that crash on opening with LAV Splitter + QuickSync Decoder. (With other splitters, it works fine.)
It seems the workaround added at r69 (http://sourceforge.net/p/qsdecoder/code/69/) uses the bitstream handler wrongly and can do an out-of-bounds read.
The following patch fixes the crash, but I'm not sure the workaround still works for the case it was originally introduced for. Also, I tested only with LAV Video Decoder, not ffdshow.
From e89b6c332fe5c6e2f401a23eca611a0b3351bc79 Mon Sep 17 00:00:00 2001
From: Yusuke Nakamura <muken.the.vfrmaniac@gmail.com>
Date: Wed, 25 Mar 2015 03:59:06 +0900
Subject: [PATCH] Fix possible out-of-bounds read of workaround for
DecodeHeader.
---
QuickSyncDecoder.cpp | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/QuickSyncDecoder.cpp b/QuickSyncDecoder.cpp
index 747f348..55e4ff0 100644
--- a/QuickSyncDecoder.cpp
+++ b/QuickSyncDecoder.cpp
@@ -419,9 +419,10 @@ mfxStatus CQuickSyncDecoder::DecodeHeader(mfxBitstream* bs, mfxVideoParam* par)
{
mfxBitstream bs2 = *bs;
+ bs2.DataOffset = 0;
bs2.Data = new mfxU8[bs->DataLength + 5];
- memcpy(bs2.Data, bs->Data + bs->DataOffset, bs->DataLength - bs->DataOffset);
- bs2.MaxLength = bs2.DataLength = bs->DataLength + 5 - bs->DataOffset;
+ memcpy(bs2.Data, bs->Data + bs->DataOffset, bs->DataLength);
+ bs2.MaxLength = bs2.DataLength = bs->DataLength + 5;
// Write H264 start code + start of splice section
*((unsigned*)(bs2.Data + bs2.DataLength - 5)) = 0x01000000;
--
1.9.2.msysgit.0
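To make the idea of the patch easier to follow, here is a minimal stand-alone sketch. It assumes, as the patch does, that DataLength counts the valid bytes starting at Data + DataOffset. The `Bitstream` struct is a stand-in with only the fields the patch touches (not the real mfxBitstream), and `CopyWithStartCode` is a hypothetical helper name. The fifth appended byte is left at zero here because the patch excerpt does not show what the original code writes there.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for the mfxBitstream fields used by the workaround.
// The valid payload is DataLength bytes starting at Data + DataOffset.
struct Bitstream {
    std::uint8_t* Data = nullptr;
    std::uint32_t DataOffset = 0;
    std::uint32_t DataLength = 0;
    std::uint32_t MaxLength = 0;
};

// Copies the valid payload of `src` into `storage`, appends 5 extra bytes
// (a 00 00 00 01 start code plus one byte left at zero) and returns a
// bitstream that views the copy.
Bitstream CopyWithStartCode(const Bitstream& src, std::vector<std::uint8_t>& storage) {
    storage.assign(static_cast<std::size_t>(src.DataLength) + 5, 0);
    if (src.DataLength > 0) {
        // Copy exactly DataLength bytes: the offset is already accounted
        // for by starting the read at Data + DataOffset.
        std::memcpy(storage.data(), src.Data + src.DataOffset, src.DataLength);
    }
    storage[src.DataLength + 3] = 0x01;  // last byte of the 4-byte start code

    Bitstream out;
    out.Data = storage.data();
    out.DataOffset = 0;  // the copy starts at the beginning of the new buffer
    out.DataLength = static_cast<std::uint32_t>(storage.size());
    out.MaxLength = out.DataLength;
    return out;
}
```

The two things the patch changes are both visible here: the copy length no longer has the offset subtracted from it, and the offset of the new bitstream is reset to zero so a reader does not skip into (or past) the freshly allocated buffer.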
cybersans
25th March 2015, 15:46
Do you know which splitter is used?
Does it happen with LAV filters as the decoder (also has QS support)?
err is it because i am using haali? need to use lav?
theoneofgod
26th March 2015, 05:59
err is it because i am using haali? need to use lav?
Video Renderer and Decoder aren't the same things.
egur
26th March 2015, 17:22
Haali may produce horrible time stamps with some streams.
QS decoder tries to correct them and find the frame rate. I've seen cases where the time stamps are so wrong that QS decoder will guess the wrong frame rate.
LAV is much better in that sense.
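As a rough illustration of what guessing a frame rate from time stamps can look like, here is a sketch that takes the median gap between sorted time stamps. This is not the QS decoder's actual algorithm (the thread does not describe it); `EstimateFrameRate` is a made-up name, and the 100 ns time unit is assumed because that is what DirectShow's REFERENCE_TIME uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Estimates frames per second from presentation time stamps given in
// 100 ns units. Returns 0 when there is not enough usable data.
double EstimateFrameRate(std::vector<std::int64_t> timestamps) {
    // Frames may arrive in decode order, so sort into display order first.
    std::sort(timestamps.begin(), timestamps.end());

    std::vector<std::int64_t> deltas;
    for (std::size_t i = 1; i < timestamps.size(); ++i) {
        const std::int64_t d = timestamps[i] - timestamps[i - 1];
        if (d > 0) deltas.push_back(d);  // ignore duplicated stamps
    }
    if (deltas.empty()) return 0.0;

    // The median gap is less sensitive to a few wild stamps than the mean.
    const std::size_t mid = deltas.size() / 2;
    std::nth_element(deltas.begin(), deltas.begin() + mid, deltas.end());
    return 1e7 / static_cast<double>(deltas[mid]);
}
```

Even a median breaks down once most of the stamps are bad, which fits the cases above where the guess comes out wrong.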
Corruption after seeks has been visible for many years. Nothing I can do about it. libavcodec handles corrupted streams much better. Ideally the splitter will produce a proper stream after a seek, but neither Haali nor LAV is perfect.
Overall, I got a much better experience with the LAV splitter.
VFR maniac, thanks for the patch! I recently noticed the bug during my work on HEVC.
NikosD
8th April 2015, 09:32
Main features
* HW deinterlacing -auto or forced, with half or full (50/60p) output rate
I tried using the HW DI of LAV Video (Enable Adaptive HW Deinterlacing) during a transcode, and the speed is almost half of what I get with HW DI in the encoding stage.
During transcoding it seems better (almost double the speed) to let the HW encoder handle both encoding and deinterlacing while the QS decoder only decodes, rather than letting the QS decoder handle both decoding and deinterlacing while the HW encoder only encodes.
Does the QS decoder use a different DI algorithm than the HW encoder, does it use the VPP engine differently, or is it because of different pipeline usage?
andyvt
8th April 2015, 09:40
I tried using the HW DI of LAV Video (Enable Adaptive HW Deinterlacing) during a transcode, and the speed is almost half of what I get with HW DI in the encoding stage.
During transcoding it seems better (almost double the speed) to let the HW encoder handle both encoding and deinterlacing while the QS decoder only decodes, rather than letting the QS decoder handle both decoding and deinterlacing while the HW encoder only encodes.
Does the QS decoder use a different DI algorithm than the HW encoder, does it use the VPP engine differently, or is it because of different pipeline usage?
When you DI in the decoder you are forcing a 2x increase in the # of frames output at that stage. In a DS pipeline, where the frames are copied from GPU to system memory, this will cause a significant decrease in performance.
Is the encoder decimating or are you getting the same output in either case (e.g. 1080i30 ->1080p60)?
egur
8th April 2015, 09:43
It's the same DI in both cases and the same API calls to use it.
The performance difference is that the images in the encoding scenario stay in GPU memory.
Also, are you using the DI in double or single rate in both cases?
Also when decoding, the HW DI takes GPU resources and lowers the overall performance.
The HW DI feature in QS Decoder is to view deinterlaced content. If you have the means to deinterlace in advance that will increase the performance of the decoder.
NikosD
8th April 2015, 10:39
When you DI in the decoder you are forcing a 2x increase in the # of frames output at that stage. In a DS pipeline, where the frames are copied from GPU to system memory, this will cause a significant decrease in performance.
The performance difference is that the images in the encoding scenario stay in GPU memory.
Also when decoding, the HW DI takes GPU resources and lowers the overall performance.
If you have the means to deinterlace in advance that will increase the performance of the decoder.
Thanks for the replies, they certainly explain a lot.
On the other hand, one could argue that because decoding is a lot easier than encoding, the extra DI work should be done at the easier stage: the decoder is already a lot faster than the encoder, so the performance it loses to the extra GPU utilization shouldn't hold the encoder back.
If you use the GPA monitor, you'll see that during transcoding of progressive material the decoder sits at around 25-30% utilization and the encoder at around 50-60%.
So adding the extra DI work to the encoding stage, which is already the busier one, should lower performance even more.
Is the penalty of copying frames from the GPU to system memory so big that it outweighs the encoding delay and cuts the speed of the whole transcode almost in half?
Is the encoder decimating or are you getting the same output in either case (e.g. 1080i30 ->1080p60)?
Also, are you using the DI in double or single rate in both cases?
My setup as it is right now, allows me only 1080i25fps -> 1080p25fps conversion, so I used that in both scenarios.
I don't know yet if 1080i25fps -> 1080p50fps conversion shrinks the performance gap between those two scenarios.
andyvt
8th April 2015, 12:02
Thanks for the replies, they certainly explain a lot.
On the other hand, one could argue that because decoding is a lot easier than encoding, the extra DI work should be done at the easier stage: the decoder is already a lot faster than the encoder, so the performance it loses to the extra GPU utilization shouldn't hold the encoder back.
If you use the GPA monitor, you'll see that during transcoding of progressive material the decoder sits at around 25-30% utilization and the encoder at around 50-60%.
So adding the extra DI work to the encoding stage, which is already the busier one, should lower performance even more.
Either way you need to decode->DI->encode. The DI HW is the same HW either way.
When you're doing it:
1) decode->DI->2((GPU->RAM)->(RAM->GPU))->encode
v.
2) decode->(GPU->RAM)->(RAM->GPU)->DI->encode
There is a significant performance hit for 1 because you're copying 2x the data GPU->RAM->GPU.
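A quick back-of-the-envelope sketch of the copy traffic in the two pipelines. The numbers are assumptions for illustration only: NV12 frames (12 bits per pixel), 1920x1080 with no row padding, 25 frames/s before DI and 50 frames/s after double-rate DI. The function names are made up.

```cpp
#include <cstdint>

// Bytes in one NV12 frame: full-resolution luma plus half-size chroma,
// i.e. 1.5 bytes per pixel. Row padding/alignment is ignored.
constexpr std::uint64_t Nv12FrameBytes(std::uint64_t width, std::uint64_t height) {
    return width * height * 3 / 2;
}

// Bytes per second moved by the GPU->RAM->GPU round trip:
// every frame is copied twice (down to system memory and back up).
constexpr std::uint64_t RoundTripBytesPerSecond(std::uint64_t frameBytes, std::uint64_t fps) {
    return frameBytes * fps * 2;
}

// Pipeline 1: DI runs before the copies, so 50 frames/s cross the bus.
constexpr std::uint64_t kDiBeforeCopy =
    RoundTripBytesPerSecond(Nv12FrameBytes(1920, 1080), 50);

// Pipeline 2: DI runs after the copies, so only 25 frames/s cross the bus.
constexpr std::uint64_t kDiAfterCopy =
    RoundTripBytesPerSecond(Nv12FrameBytes(1920, 1080), 25);
```

Under these assumptions pipeline 1 moves about 311 MB/s and pipeline 2 about 156 MB/s, which is the 2x difference described above.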
Is the penalty of copying frames from the GPU to system memory so big that it outweighs the encoding delay and cuts the speed of the whole transcode almost in half?
Depending on the system it can be more resource intensive to copy the frame GPU->RAM than decode it in SW...
My setup as it is right now, allows me only 1080i25fps -> 1080p25fps conversion, so I used that in both scenarios.
I don't know yet if 1080i25fps -> 1080p50fps conversion shrinks the performance gap between those two scenarios.
When you DI in the decoding stage, as you are, you are actually doing:
1080i25->1080p50->1080p25
vs.
1080i25->1080p25 (the MSDK will auto-decimate during DI in this scenario)
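To picture the last step of the 1080i25->1080p50->1080p25 chain: going from 50p back to 25p amounts to keeping every second frame. The sketch below is only an illustration. Which frames the downstream stage actually drops is an assumption, and `Decimate` is a made-up helper.

```cpp
#include <cstddef>
#include <vector>

// Keeps every `factor`-th frame, starting with the first one.
// A factor of 2 turns a 50p sequence into a 25p sequence.
template <typename Frame>
std::vector<Frame> Decimate(const std::vector<Frame>& frames, std::size_t factor) {
    std::vector<Frame> kept;
    if (factor == 0) return kept;  // nothing sensible to keep
    for (std::size_t i = 0; i < frames.size(); i += factor) {
        kept.push_back(frames[i]);
    }
    return kept;
}
```

In the first chain, half of the frames produced by the DI (and copied out of the GPU) are thrown away again by a step like this, while in the second chain they are never produced.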
JohnLai
12th April 2015, 09:36
egur, how is the HEVC QS progress?
cybersans
15th April 2015, 05:07
Haali may produce horrible time stamps with some streams.
QS decoder tries to correct them and find the frame rate. I've seen cases where the time stamps are so wrong that QS decoder will guess the wrong frame rate.
LAV is much better in that sense.
Corruption after seeks has been visible for many years. Nothing I can do about it. libavcodec handles corrupted streams much better. Ideally the splitter will produce a proper stream after a seek, but neither Haali nor LAV is perfect.
Overall, I got a much better experience with the LAV splitter.
VFR maniac, thanks for the patch! I recently noticed the bug during my work on HEVC.
dear egur, the same thing happens with either lav or haali. i still get the delay when there is no image (blank screen between scenes) or in a scene without any sound. i suspect ffdshow causes it, because in the ffdshow video decoder's intel quicksync settings, when i tick "enable time stamp correction" there is no video corruption or delay on blank screens, but the audio is out of sync with the video. when i untick it, the corruption and the delay on blank screens come back.
NikosD
23rd April 2015, 14:13
New drivers for Haswell & Broadwell v4170
New API v1.15
64 - http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=24870
32 - http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=24869
NikosD
28th April 2015, 13:31
Eric,
reading the Intel forums I'm under the impression that using MSDK to decode HEVC can be done in two ways: SW and HW.
For SW you need a SW plugin, which is inside the MSS (Media Server Studio), and for HW (hybrid solution) you need another plugin, MFX_PLUGINID_HEVCD_HW, which is inside INDE.
So, it seems that there are no solutions without plugins right now.
How are you going to implement your HEVC QS decoder ?
kalehrl
28th April 2015, 18:00
I've just installed a discrete graphics card and I can no longer choose QuickSync in LAV filters or use Intel HW H.264 encoding.
Is there a way to use a discrete graphics card and still be able to use these 2 things in Windows 7 64bit?
Thank you.
Taurus
28th April 2015, 20:29
I've just installed a discrete graphics card and I can no longer choose QuickSync in LAV filters or use Intel HW H.264 encoding.
Is there a way to use a discrete graphics card and still be able to use these 2 things in Windows 7 64bit?
Thank you.
Did you try this: https://mirillis.com/en/products/tutorials/action-tutorial-intel-quick-sync-setup_for_desktops.html
or this: https://mirillis.com/en/products/tutorials/action-tutorial-intel-quick-sync-setup_for_desktops_2_displays.html#top
Courtesy of mirillis...
P.J
29th April 2015, 10:09
I've just installed a discrete graphics card and I can no longer choose QuickSync in LAV filters or use Intel HW H.264 encoding.
Is there a way to use a discrete graphics card and still be able to use these 2 things in Windows 7 64bit?
Thank you.
No issue with Windows 8.1
NikosD
29th April 2015, 16:41
Win 8/8.1 have no need of the "fake" monitor setup.
But Win 7 needs that trick, in order to enable QuickSync.
kalehrl
29th April 2015, 19:40
Did you try this: https://mirillis.com/en/products/tutorials/action-tutorial-intel-quick-sync-setup_for_desktops.html
or this: https://mirillis.com/en/products/tutorials/action-tutorial-intel-quick-sync-setup_for_desktops_2_displays.html#top
Courtesy of mirillis...
Thank you for pointing me to these links.
At first, I wasn't able to fix the issue because when I clicked on the 'Detect' button, no additional displays would appear.
Then I went into BIOS, IGP settings and enabled multiple monitors and then the instructions worked just fine.
Taurus
29th April 2015, 22:55
Then I went into BIOS, IGP settings and enabled multiple monitors and then the instructions worked just fine.
Aaah, I forgot about this....
The same happened to me a while back :devil:
NikosD
30th April 2015, 17:28
The new INDE 2015 Update 2 includes a new MSDK as well, with API v1.15 documentation, mainly about HEVC encoding on Skylake.
So, it's official:
Skylake supports HW H.265 encoding.
Release notes:
https://software.intel.com/sites/default/files/managed/48/ce/mediasdk_release_notes.pdf
GTPVHD
4th May 2015, 15:32
http://www.anandtech.com/show/9219/the-surface-3-review/4
In addition to the GPU update, the ISP and hardware decode capabilities get a bump as well. There is full hardware acceleration for decode of H.263, MPEG4, H.264, H.265 (HEVC), VP8, VP9, MVC, MPEG2, VC1, and JPEG, as well as hardware encode for H.264, H.263, VP8, MVC, and JPEG. This marks the first Intel product to ship with the company's full, fixed-function HEVC decoder, making Atom the company's most advanced media processor, at least for this short moment.
NikosD
6th May 2015, 09:57
I've just realized that LAV x64 0.65 QS decoder uses SW fallback for VC-1/ WMV3 (!)
For the other formats (H.264, MPEG2) LAV x64 QS decoder works OK.
Also LAV x86 0.65 QS decoder works OK for all formats (H.264, MPEG2, VC-1/WMV3).
wanezhiling
25th July 2015, 15:10
https://downloadcenter.intel.com/download/25143/Intel-Iris-Iris-Pro-and-HD-Graphics-Driver-for-Windows-7-8-8-1-64-bit
https://downloadcenter.intel.com/download/25146/Intel-Iris-Iris-Pro-and-HD-Graphics-Driver-for-Windows-7-8-8-1-32-bit
Intel 10.18.14.4251 driver for Haswell & Broadwell.
I can't download 64-bit driver
I can't download 64-bit driver
Me neither
clsid
30th July 2015, 13:28
Does this driver version also have the P010/P016 bug? You can easily test by loading a 10bit video in GraphStudioNext.
NikosD
30th July 2015, 13:36
Of course it has the bug.
It's not fixed yet.
detmek
20th August 2015, 18:33
https://downloadcenter.intel.com/download/25232/Intel-Iris-Iris-Pro-and-HD-Graphics-Driver-for-Windows-7-8-8-1-64bit
https://downloadcenter.intel.com/download/25233/Intel-Iris-Iris-Pro-and-HD-Graphics-Driver-for-Windows-7-8-8-1-32bit
Intel 10.18.14.4264 driver for Haswell & Broadwell.
Thanks. Everything works as it should.
One question, if anyone knows. This and previous drivers for Windows 8 support 4k H.264 decoding on the Intel Pentium G3220 (Haswell). But the drivers for Windows 10 (4256) only support up to full HD QSV decoding. Is it just an omission from Intel, or are they removing 4k QSV decoding in Windows 10 for Pentium processors?
leonccyiu
6th September 2015, 06:15
Thanks. Everything works as it should.
One question, if anyone knows. This and previous drivers for Windows 8 support 4k H.264 decoding on the Intel Pentium G3220 (Haswell). But the drivers for Windows 10 (4256) only support up to full HD QSV decoding. Is it just an omission from Intel, or are they removing 4k QSV decoding in Windows 10 for Pentium processors?
I've only had windows 10 on my pentium g3258 and have never seen 4k hardware h264 decoding which would be really useful for me for high bit rate content
P.J
10th September 2015, 17:53
:thanks: