Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing

nevcairiel
25th July 2012, 21:03
The renderer needs to know that it is getting a D3D surface instead of a plain memory buffer. That's nothing you can hide from the renderer, so the renderer needs explicit support for it. Nothing else makes any sense.
You can of course use LAV in DXVA2 copy-back mode. That way it works with every renderer, but it's also slower, because you no longer have a direct GPU-to-GPU transfer.
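To show where the extra cost of copy-back comes from, here is a minimal sketch of the per-frame copy a copy-back decoder performs. It assumes the decoded NV12 surface has already been locked and is exposed as a pointer plus row pitch. The `LockedSurface` struct is hypothetical and stands in for whatever a real surface lock returns. The sketch only shows the row-by-row copy into a tightly packed system-memory frame that any renderer can accept.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical view of a locked GPU surface. Rows are 'pitch' bytes apart,
// and pitch may be larger than the visible width (padding).
struct LockedSurface {
    const std::uint8_t* bits;
    std::size_t pitch;
};

// Copy an NV12 frame (Y plane, then interleaved UV plane starting at
// pitch * height) into a packed buffer of width * height * 3 / 2 bytes.
// Direct DXVA2 rendering avoids exactly this copy.
std::vector<std::uint8_t> copy_back_nv12(const LockedSurface& s,
                                         std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0 && s.pitch >= width);
    std::vector<std::uint8_t> out(width * height * 3 / 2);
    // Luma: 'height' rows of 'width' bytes each.
    for (std::size_t y = 0; y < height; ++y)
        std::memcpy(&out[y * width], s.bits + y * s.pitch, width);
    // Chroma: 'height / 2' rows of 'width' bytes (U,V pairs).
    const std::uint8_t* uv_src = s.bits + s.pitch * height;
    std::uint8_t* uv_dst = out.data() + width * height;
    for (std::size_t y = 0; y < height / 2; ++y)
        std::memcpy(uv_dst + y * width, uv_src + y * s.pitch, width);
    return out;
}
```

The padding bytes are dropped, so the renderer receives an ordinary memory buffer. The price is touching every byte of every frame on the CPU.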

egur
25th July 2012, 21:48
"However, what I expected is to run with hardware acceleration whether the renderer supports DXVA2 or not."

DXVA2 implies a different connection protocol than the default.
The idea is to avoid any copying between decoder and renderer.
There's no in-between solution, except for copy-back decoders such as mine or LAV's DXVA copy-back mode.

"Why can't CreateD3DDeviceManager() be done in the decoder?
And why can't any renderer access the buffer inside the decoder?"
It can be done - that's what I do if I don't get one from the renderer.
Note that when the renderer is already in fullscreen exclusive (FSE) mode before the decoder is created (e.g. Windows Media Player), DirectX will not allow creating any HW device. This forces the decoder to get the device manager, as well as the DirectX object, from the renderer.
FSE was originally made for games. Too bad it was extended to video, because it really complicates things.
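The fallback order described in this post can be sketched as a small decision function. This is an illustration only. The struct, the enum and their fields are hypothetical and do not correspond to QS decoder or DirectShow APIs. They encode three cases: a device manager supplied by the renderer, a decoder-created device, and software-only decoding.

```cpp
// Hypothetical snapshot of what the decoder can see when it decides.
struct DeviceEnvironment {
    bool renderer_offers_device_manager;  // renderer exposed its D3D device manager
    bool fullscreen_exclusive_active;     // the renderer already holds FSE
};

enum class DeviceSource { FromRenderer, OwnDevice, SoftwareOnly };

// Prefer the renderer's device manager. Otherwise try a private device.
// Per the post, DirectX refuses to create one while the renderer is already
// in FSE, which leaves software decoding as the only option.
DeviceSource choose_device_source(const DeviceEnvironment& env) {
    if (env.renderer_offers_device_manager) return DeviceSource::FromRenderer;
    if (!env.fullscreen_exclusive_active) return DeviceSource::OwnDevice;
    return DeviceSource::SoftwareOnly;
}
```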

CharlieCL
26th July 2012, 00:43
Good. I also tested QS in LAV under Windows 8 CP 64-bit. I believe HW acceleration still doesn't work with my renderer, because CPU usage is still 30-40%. I unchecked NV12 in LAV; maybe in that case a new filter is inserted to convert NV12 to RGB. When I enabled NV12, I could only capture the Y plane. The picture is black and white only, and the UV plane is all zeros.

I can switch from windowed mode to FSE; that is done through DirectX.

BTW, is the QS HW decoder thread-safe? I know it is task-safe, because I can run a multitasking player with HW enabled.

nevcairiel
26th July 2012, 06:44
The latest driver for Windows 8 does not work properly with QS.
Also, if you get no colors from NV12, you must be doing something wrong. It seems to work fine with every other renderer.
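One plausible cause of a grayscale NV12 picture, not confirmed in the thread, is reading only the first plane. The sketch below assumes a single buffer with one row pitch shared by both planes, which is typical for a locked NV12 surface. The Y plane is `height` rows of `pitch` bytes. The interleaved UV plane starts right after it at offset `pitch * height`, with one U,V pair per 2x2 block of pixels. The function and struct names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Yuv { std::uint8_t y, u, v; };

// Read Y, U and V for pixel (x, y) from an NV12 buffer whose rows are
// 'pitch' bytes apart. The UV plane begins at pitch * height and stores
// interleaved U,V bytes at half horizontal and half vertical resolution.
Yuv nv12_pixel(const std::uint8_t* buf, std::size_t pitch, std::size_t height,
               std::size_t x, std::size_t y) {
    assert(y < height);
    const std::uint8_t* uv_plane = buf + pitch * height;
    const std::uint8_t* uv = uv_plane + (y / 2) * pitch + (x / 2) * 2;
    return Yuv{buf[y * pitch + x], uv[0], uv[1]};
}
```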

Regarding thread-safety, I don't know what you had in mind there. The way DirectShow works, there is usually one thread that processes one stream, so one thread per stream. LAV and QS fully support decoding multiple streams at once, so decoding can run in parallel in multiple threads, if that's what you mean. Tests show that up to 4 streams work OK with the hardware; anything above that makes it choke.
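An application that opens many streams at once could work around the roughly-four-stream limit by capping concurrent hardware sessions and sending extra streams to software decoding. This is a hypothetical sketch. Neither LAV nor QS is stated to do this, and the limit of 4 comes from the test result above, so it may differ with other hardware and drivers. The mutex matters because, as described above, each stream runs on its own thread.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical process-wide counter of active hardware decoding sessions.
class HwSessionLimiter {
public:
    explicit HwSessionLimiter(std::size_t max_sessions) : max_(max_sessions) {}

    // Returns true if the caller may open a HW session; false means "use software".
    bool try_acquire() {
        std::lock_guard<std::mutex> lock(m_);
        if (active_ >= max_) return false;
        ++active_;
        return true;
    }

    // Call once for every successful try_acquire() when the stream closes.
    void release() {
        std::lock_guard<std::mutex> lock(m_);
        assert(active_ > 0);
        --active_;
    }

private:
    std::mutex m_;
    std::size_t max_;
    std::size_t active_ = 0;
};
```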

egur
26th July 2012, 07:40
Adding to Nev's response, the beta Win8 driver is not so healthy with video.
I've made some changes to my code to account for some of the failures but sadly not all of them :(

Personally, I'm too old to install a beta OS on my own machines... I might wait a few months after the Win8 launch so that everything is more stable.
I'll provide an ugly patch for the 15.28 drivers that allows QS to work fine with some performance penalty (limited multithreading).

nevcairiel
26th July 2012, 09:52
Just wait for the Win8 release, and only worry about supporting it if the driver hasn't improved by then.

Shark007
29th July 2012, 03:45
A suggestion: instead of releasing full ffdshow builds on SourceForge, how about just releasing the x86/x64 IntelQuickSyncDecoder.dll(s)?
It is not much trouble for users to place it into the appropriate ffdshow or LAV Filters directory themselves.

If releasing the DLL on SourceForge isn't a good idea, could you consider releasing it in the first post of this thread?

nevcairiel
29th July 2012, 07:07
For the record, I will not offer any support to anyone who has replaced the DLL in their LAV install. I release a version that's tested with LAV, and it should usually also be reasonably up to date.

egur
29th July 2012, 07:16
As QS decoder versions move forward, the ffdshow code may change as well (to use new options or handle changed parameters), or at least need a recompile.
Replacing the QS DLL is not recommended and will increase my support effort needlessly.
Usually new versions of QS decoder are quickly merged into the ffdshow trunk.

Shark007
29th July 2012, 07:32
Thanks, both of you. Your response is appreciated.
Perfect reasoning/explanation.

egur
30th July 2012, 19:53
Version 0.38 beta is out with the following changes:
* A fix for audio sync issues in broken streams.
* Bugfixes for broken Win8 drivers
* Better handling of incomplete sequence headers.
* Able to recover after multiple initializations. New stream must be from the same codec as the old stream. Calling InitDecoder or SetConfig will reset the decoder.
* FFDShow: Added fine grain multithreading support
* FFDShow: r4477

Note to Windows 7/8 users with drivers 15.28.xx.xx:
Do not enable full multithreading support in FFDShow, or the player will abort playback at some point in many clips.

Downloads
* For the latest cutting-edge FFDShow builds, download my builds from the Intel QuickSync Decoder SourceForge home page (http://sourceforge.net/projects/qsdecoder/)
* FFDShow-tryout site (http://ffdshow-tryout.sourceforge.net/download.php)
* LAV Splitter builds (http://forum.doom9.org/showthread.php?t=156191)

NikosD
2nd August 2012, 07:45
Windows 8 reached its final RTM (Release To Manufacturing) build yesterday and will be available on August 15th via MSDN and TechNet.

The final build is 9200 (9200.16384.win8_rtm.120725-1247)

On October 26th it will be available to retail customers and on systems preloaded with Win 8.

egur
2nd August 2012, 08:08
Great. I'll wait a few months before even considering using it at home.

NikosD
2nd August 2012, 11:29
Judging by the Metro UI, I would say don't consider using Win 8 on a home computer at all.

The Metro UI was built for tablets and touch screens, and I would probably skip this version of Windows completely if the "classic" Windows Explorer didn't exist.

On the other hand, we are more or less an enthusiast community, so I will definitely install Win 8 as a second system no matter what, as soon as I get my hands on a copy. I have already seen the Win 8 preview release on my older second system.

Esperado
6th August 2012, 08:06
It works OK with LAV Filters, but when I try to use FFDShow, FFDShow says it uses libavcodec for H.264 (TNT HD), even though I have chosen Intel QuickSync.
FFDShow says it uses Intel on non-HD channels (MPEG-2), so I believe the install is correct.
I tried both the 64-bit and 32-bit versions, because I use DVBViewer, which is 32-bit, to watch TNT TV.
What the hell?

(My config: Windows 7 64-bit with a Sandy Bridge GT2 i5-2500K @ 4.6 GHz (HD Graphics 3000) on a Z68 chipset and 4 GB of RAM)

egur
6th August 2012, 19:57
What's the driver version? Splitter version?

Esperado
6th August 2012, 20:50
FFDShow and LAV: the latest ones, downloaded yesterday.
Intel driver: 8.15.10.2696

Not a big problem, as LAV works perfectly and I can use FFDShow raw for some additional post-processing. This is just to let you know, and also to try to understand what is happening.

egur
7th August 2012, 08:42
Unfortunately, I can't test live TV streams since I don't have such a source available.
You can try installing a newer driver (from the 15.26 family) although this driver worked fine as far as I remember.
Can you play H264 files?
If LAV works with QS decoder and ffdshow doesn't, it means that some checks within ffdshow fail.

Esperado
7th August 2012, 13:05
The latest one I found at the Intel Download Center is win7_64-152612, version 8.15.10.2761, even though Windows said I already had the latest ;-(
Same behavior.

Yes, you're right: FFDShow says "Intel @Quicksync" with recorded files from TNT. Strange.

Could DirectShow's "Unintelligent Connect" be the cause?

BTW, are you in contact with this guy? http://doom10.org/index.php?topic=717.0

Esperado
7th August 2012, 13:24
Also, I experience some freezes after playing TV streams with QuickSync for a long time. Closing and reloading the application, changing channels or rebuilding the graph makes it good for another run. There is no such problem with software decoders.

egur
7th August 2012, 14:26
FFDShow may receive an incompatible FourCC or something of the sort. I don't know and can't test this.

I don't know the guy in the post. Intel has 100K employees...

Esperado
7th August 2012, 16:30
How can I help?

Thunderbolt8
22nd August 2012, 04:55
Does Intel QuickSync hardware acceleration only work when the i5 (or similar CPU) AND the Intel HD graphics are both running at the same time?

With the Avatar 60p demo, I get almost 400 fps with the Intel decoder, compared to only 120 with CoreAVC. But since I use madVR, the Intel GPU, with only 25 GB/s of bandwidth, seems rather weak here. I get dropped frames if I prioritize it over the Radeon Mobility 7670, no matter which decoder (CoreAVC, LAV Video or ffdshow) I use. It seems the huge decoding advantage is lost because of the Intel HD 4000's poor performance with madVR.

Or am I doing something wrong?

egur
22nd August 2012, 07:37
Esperado, sorry for the late response; I was on vacation without internet access.
You'll need VS2010 and the ability to build ffdshow.
Another option is for me to provide a debug build that pops up a lot of message boxes showing the status. Let me know which option you prefer.

Thunderbolt8, unfortunately the decoder will work only if the iGPU is active, i.e. connected to a screen. This is a Direct3D9 limitation.
There's an option to trick D3D9 by extending the desktop to a disconnected display. See here (http://forum.doom9.org/showthread.php?p=1532786#post1532786).

Esperado
22nd August 2012, 15:49
As I haven't developed with Visual Studio for a decade, the second solution seems more reasonable on my side. But isn't it blind and time-consuming on your side?

Pulp Catalyst
7th September 2012, 03:15
Is there a way to use the QuickSync decoding function with MeGUI or AviSynth, the way DGDecNV can use Nvidia GPUs?

I'm looking for any feedback or theories. Is this something that can be done, or will it be like AMD (Radeon), where the decoding information can't be obtained easily?

I know that the GPU can help in some respects, yet I would have thought it would be even better than Nvidia GPU decoding, because Intel's is on-chip (no data traveling across the motherboard and bottlenecking the PCI Express lanes).

Can I use FFDShow to assist MeGUI (offloading H.264 decoding to the GPU) so that x264 in MeGUI will be faster? Even a 15% gain in speed would be great.

If not, will or could this be possible, or will QuickSync be like the AMD (Radeon) system, where the decoding information can't be accessed for use outside of DirectX (AviSynth needs to be the one to have the info)?

easyfab
7th September 2012, 10:46
Perhaps you can try creating a .grf file (see here: http://avisynth.org/mediawiki/Importing_media) with the ffdshow/LAV Filters QuickSync decoder and loading it in AviSynth with DirectShowSource().

egur
7th September 2012, 11:01
ffdshow/LAV Filters can be used with QuickSync as long as a display is connected to the Intel GPU in some manner.
The performance gains are not as straightforward as with standard decoding, because all HW decoders output NV12, while AviSynth wants either YUY2 or YV12. YV12 is preferred, because that conversion is lossless and very fast. Video renderers, on the other hand, like to get NV12, so no conversion is needed.
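The NV12-to-YV12 conversion is only a rearrangement of bytes. The Y plane is copied as-is, and the interleaved UV plane is split into two separate planes, with YV12 storing V before U. No sample values are computed, which is why the conversion is lossless. A minimal sketch, assuming tightly packed buffers (pitch equal to width) and even dimensions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Convert a packed NV12 frame (Y plane, then interleaved U,V) into a packed
// YV12 frame (Y plane, then V plane, then U plane). This is pure byte
// shuffling, so no information is lost.
std::vector<std::uint8_t> nv12_to_yv12(const std::vector<std::uint8_t>& nv12,
                                       std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    const std::size_t luma = width * height;
    const std::size_t chroma = luma / 4;  // samples per chroma plane
    assert(nv12.size() == luma + 2 * chroma);

    std::vector<std::uint8_t> yv12(nv12.size());
    std::memcpy(yv12.data(), nv12.data(), luma);  // Y is unchanged
    std::uint8_t* v_plane = yv12.data() + luma;   // YV12 stores V first...
    std::uint8_t* u_plane = v_plane + chroma;     // ...then U
    const std::uint8_t* uv = nv12.data() + luma;
    for (std::size_t i = 0; i < chroma; ++i) {
        u_plane[i] = uv[2 * i];      // NV12 stores U first in each pair
        v_plane[i] = uv[2 * i + 1];
    }
    return yv12;
}
```

A production version would also honor separate source and destination pitches and would typically use SIMD for the de-interleave. The byte mapping stays the same.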

As for complex/smart transcoding flows that use metadata from the decoder (e.g. motion vectors or quantization parameters), I'm afraid we're out of luck. The driver doesn't expose that data, which is a shame, since this information is also useful for denoising/deblocking.

Pulp Catalyst
7th September 2012, 14:26
I see, so partially closed then, like AMD. Nvidia seems to be the only one that has really opened this area up. It's an awful shame that so many developers still don't take into account that AviSynth is where the power is (so much can be done with AviSynth). Here's hoping that programs like DGDecNV can be created for Intel QuickSync... although the NV12 needing conversion is worrying. Nvidia's GPU architecture doesn't have this issue though... why? It's still a GPU (VP decoding chip).

nevcairiel
7th September 2012, 15:14
NVIDIA also outputs NV12 from its GPU decoder.

Pulp Catalyst
7th September 2012, 15:23
I see. Well, in that case NV12 shouldn't be an issue for QuickSync either, as far as the conversion hit goes. Here's hoping, then, that "neuron2" can pull it out of the bag as he did with Nvidia. Hopefully, since the GPU is actually on the CPU, QuickSync will be even more efficient than a discrete GPU.

egur
8th September 2012, 07:33
QuickSync overhead is lower, since the video memory resides in main memory. The QS HW is also much faster.
In transcoding, the benefits are smaller, since most of the work is done by the encoder.

GTPVHD
11th September 2012, 19:23
http://www.anandtech.com/show/6263/intel-haswell-architecture-disclosure-live-blog


Higher encode quality, faster Quick Sync with GT3

Introducing a hardware-based SVC codec: can encode once and play back multiple times

4Kx2K video acceleration is supported

Moved some video processing stuff off the EU array into a dedicated video quality engine

Hardware image stabilization is new in Haswell

Now there are three concurrent video engines: codec, imaging and scale/composition

In the past there were only two concurrent engines (codec and imaging/scale/composite); now you can do more in parallel, as long as there's enough bandwidth to sustain it


Haswell looking pretty good at the moment for video decoding and post processing.

egur
11th September 2012, 21:39
Yes, the GTs are getting (much) larger every generation. It will force Nvidia/AMD to push their dGPUs to higher performance at lower prices.

Other interesting things mentioned are:
* TSX - a new way of reducing locking (mutexes). May speed up multithreaded code.
* AVX2 - extends the AVX instructions (256-bit SIMD) to integer math, plus a new instruction to copy memory from USWC memory (GPU RAM).
* S0ix power states and a bunch of other power management features - battery life

VS2012 supports (or should support) AVX2, so I'll write a new GPU->CPU copy function using AVX2.

Blight
13th September 2012, 03:23
Cool :)

nussman
13th September 2012, 08:57
I can't see how Intel can push AMD/Nvidia without full RGB 0-255 output (HDMI) and proper 24p handling. :p

QuickSync is very nice. Good performance and processing!
What about this point?
http://forum.doom9.org/showthread.php?p=1583369#post1583369

Quicksync + madVR + AMD GPU would be perfect for HTPC.

egur
13th September 2012, 15:07
The MSDK and the driver should support DX11 surfaces soon; maybe they already do, I'm not sure. DX11 should solve this problem (using the HW without a screen). I'm not sure whether it will solve the fullscreen exclusive problem.

nussman
13th September 2012, 20:12
Cool.

I tried this driver (15.28.0.2792): http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=21235

Not possible with this one. I am looking forward to the next. :)

egur
13th September 2012, 21:22
What's not possible? What did you try to do?
I didn't implement DX11 so it will not work until I do.

nussman
13th September 2012, 21:42
"using HW w/o a screen" (= killer feature imho ;) )
I didn't know that you had to implement it first.

egur
13th September 2012, 22:26
Yes. Currently I'm using the D3D9 API to allocate graphics surfaces, and this can't be done because D3D9 will not enumerate a disconnected GPU, so that's about it. This happens even before the HW decoder is actually created.

dna108
16th September 2012, 09:28
http://www.mediafire.com/?digabe69wig5ktr

Hardware
i7 3770k
Intel HD4000
AMD HD6850

Software
Win7 x64
MPC-HC (32+64)
Lav splitter
Driver 8.15.10.2761

Playback is fine with libavcodec/coreavc.
QS shows corruption around 5s and 12s.
Clip taken from a larger ts file.

egur
16th September 2012, 13:02
I managed to reproduce on newer drivers as well. I'll report this clip.

NikosD
19th September 2012, 16:27
Eric hi.

The new Ivy Bridge drivers for Win 7/8, expected to be released in October, will enable 4K x 2K resolution support for suitable monitors/TVs. Most of the articles also mention hardware-accelerated 4K video decoding.

I don't have an Ivy Bridge but I know that 4K accelerated decoding is already supported by the current drivers.

What exactly will be newly supported regarding 4K decoding?

Thanks!

egur
19th September 2012, 20:03
4K decoding is already supported; I checked that months ago. The new driver will raise the display output resolution to 4K.

CiNcH
20th September 2012, 10:51
Hi Eric,

I am currently writing an EVR Custom Presenter. The target output resolution is set for the IMFMediaType (via attribute MF_MT_FRAME_SIZE) and passed to the EVR Mixer via 'SetOutputType'. IMFSamples with surfaces of the target output resolution are also created and passed to the EVR Mixer to get the video frame data. I guess that the EVR Mixer will then take the source frame from the decoder and trigger DXVA for scaling it into the surface of the IMFSample, right?

I also tried another approach, where I request the source resolution from the EVR Mixer and then scale into the back buffer with D3D's bilinear filter. Pretty bad quality, of course, and it also eats up GPU resources for 4K video.

I then found a quote by you:
"HW scaling doesn't have to go through Media SDK - EVR doesn't use it. Media SDK has a few limitations for proper use in a renderer; better to use the D3D/DXVA API for that."
I am now wondering how I can trigger those high-quality video scalers with the D3D/DXVA method you mentioned, without using the EVR (Mixer) or Media SDK!? Do you have some references for me?

egur
20th September 2012, 11:07
You should use a DXVA2 video processor device. I think there's sample code in the DirectX SDK. EVR probably does the same.
BTW, I didn't say not to use EVR, I said that EVR doesn't use the Media SDK, it uses DXVA2.
I'll check if I can find some concrete useful references. I never tried doing this.

I've asked the MSDK support team and they confirmed my answer.
Here's the full answer (Thanks Tony):

Intel's video processing hardware is exposed by the driver's support of DXVA-VP (VideoProcessor) and DXVA-HD interfaces.

There are a variety of processing and rendering models that use these interfaces. For example, when using the DirectShow model, Microsoft's "VMR" makes use of the capability.

Microsoft examples tend to be based on Media Foundation usage, but you can see how these DXVA interfaces work here:

http://msdn.microsoft.com/en-us/library/windows/desktop/bb970335(v=vs.85).aspx

and

http://msdn.microsoft.com/en-us/library/windows/desktop/ee663586(v=vs.85).aspx

vad74
2nd October 2012, 12:46
Hi Eric,
My config: Core i3, HD 3000, Win7. I use QS in MPC without problems; it works well. I set up XBMC_DSPlayer and selected QS for hardware acceleration in the LAV settings. After I started an H.264 video, I opened the LAV decoder. In LAV I see "active decoder : avdecoder" (software decoding), not QS! This problem arises when XBMC runs in fullscreen mode (exclusive mode). How can QS be connected with XBMC (in fullscreen)? Note: DXVA works with XBMC in fullscreen.

egur
2nd October 2012, 13:01
In fullscreen exclusive mode, QS can't create a Direct3D9 HW device, so it reports to the LAV decoder that it can only work in SW mode. Only after the renderer is connected can QS create a HW device, provided the renderer passes a certain interface.
An identical issue occurs in Windows Media Center.
I've fixed this for ffdshow.
This is something the LAV author can fix if he has the time.
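Deciding at initialization and deciding after the renderer connects give different results. The sketch below illustrates the difference. Everything in it is hypothetical: the class and method names are illustrative, and this is neither LAV nor ffdshow code, nor a description of the actual ffdshow fix. Under FSE, a decoder that probes for HW only at init and commits to the result ends up "SW only". A decoder that postpones the decision until the renderer has connected and handed over its device manager can still switch to HW.

```cpp
enum class Mode { Undecided, Software, Hardware };

// Hypothetical model of the negotiation. In FSE the decoder cannot create its
// own Direct3D9 device, so an early probe fails. The renderer's device manager
// only becomes available after the renderer is connected.
class DecoderModel {
public:
    explicit DecoderModel(bool defer_decision) : defer_(defer_decision) {}

    // Called when the decoder is initialized, before the renderer connects.
    void on_init(bool can_create_own_device) {
        if (can_create_own_device) mode_ = Mode::Hardware;
        else if (!defer_) mode_ = Mode::Software;  // early, final "SW only"
        // With deferral, stay Undecided and wait for the renderer.
    }

    // Called after the renderer connects. 'renderer_device_manager' is true
    // if the renderer passed its device manager interface to the decoder.
    void on_renderer_connected(bool renderer_device_manager) {
        if (mode_ != Mode::Undecided) return;
        mode_ = renderer_device_manager ? Mode::Hardware : Mode::Software;
    }

    Mode mode() const { return mode_; }

private:
    bool defer_;
    Mode mode_ = Mode::Undecided;
};
```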

vad74
3rd October 2012, 07:08
Thanks, I will try to send your post to the LAV author, and I hope that together you will fix this.