
View Full Version : Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing



CharlieCL
19th June 2014, 19:57
Quick Sync copy-back can cost a lot of performance, especially at high data rates. Without copy-back the CPU usage may be around 7%, but with copy-back it may be 20%. Is there any solution? Perhaps passing a buffer pointer to the video renderer?

nevcairiel
19th June 2014, 20:03
Use DXVA2-Native, then you don't get any copy-back, and it's fully compatible with recent Intel GPUs.

theoneofgod
20th June 2014, 01:55
Use DXVA2-Native, then you don't get any copy-back, and it's fully compatible with recent Intel GPUs.

Is there a way to use the iGPU for DXVA while running a dGPU?

CharlieCL
20th June 2014, 01:58
Use DXVA2-Native, then you don't get any copy-back, and it's fully compatible with recent Intel GPUs.

My renderer is embedded and is not compatible with EVR. So when I enabled DXVA2 native, hardware acceleration was disabled. Quick Sync copy-back can enable hardware acceleration in my case.

It is an architectural mistake by all the CPU and GPU designers to put video hardware acceleration in GPU memory space.

nevcairiel
20th June 2014, 02:08
It is an architectural mistake by all the CPU and GPU designers to put video hardware acceleration in GPU memory space.

For playback, which is the major use case, it's the most efficient way to do it, since in an ideal case you'll never have to copy the image anywhere, which is how DXVA2 works.
From decoding right to the display, the image doesn't have to be copied once. If you put the decoder anywhere else, you would always have to copy the image at least once.

In any case, you can implement DXVA2 support in your own renderer if you really wanted to. Otherwise, you'll just have to use the copy-back decoders.
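
To put a rough number on the copy that copy-back adds, here is a minimal Python sketch. It assumes 8-bit NV12 (4:2:0) output and exactly one copy per decoded frame; the resolution and frame rate are illustrative values, not measurements from this thread.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Size of one 8-bit NV12 frame: a full-resolution luma plane
    plus an interleaved chroma plane at half the luma size."""
    return width * height * 3 // 2


def copyback_bytes_per_second(width: int, height: int, fps: int) -> int:
    """Bytes moved from GPU to system memory per second,
    assuming one copy per decoded frame (an assumption of this sketch)."""
    return nv12_frame_bytes(width, height) * fps


if __name__ == "__main__":
    rate = copyback_bytes_per_second(1920, 1080, 60)
    print(f"1080p60 copy-back: {rate / 1e6:.1f} MB/s")
```

The raw byte count is only a lower bound on the cost; how expensive each copy is in practice depends on how the decoder reads the GPU surface.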

CharlieCL
20th June 2014, 03:52
For playback, which is the major use case, it's the most efficient way to do it, since in an ideal case you'll never have to copy the image anywhere, which is how DXVA2 works.
From decoding right to the display, the image doesn't have to be copied once. If you put the decoder anywhere else, you would always have to copy the image at least once.

In any case, you can implement DXVA2 support in your own renderer if you really wanted to. Otherwise, you'll just have to use the copy-back decoders.

My use case is to feed the decoded frame in as a texture buffer.
So sharing the decoder output buffer with the texture buffer would remove one image buffer. Can I have a DXVA2-compatible renderer embedded in my program? I only know that the EVR renderer is built as a .DLL filter. I do not need a .DLL.

Whether it is DXVA2 Native or Quick Sync, I can allocate an image buffer and pass a pointer to the codec; however, I do not know the protocol for doing so.

I expect that HSA can solve this kind of hardware acceleration problem. Unfortunately, HSA today exists only on paper as far as software is concerned.

andyvt
20th June 2014, 09:41
Can I have a DXVA2-compatible renderer embedded in my program? I only know that the EVR renderer is built as a .DLL filter. I do not need a .DLL.


Are you asking if it's possible to load an unregistered COM object embedded in your exe as a DirectShow renderer (filter)? If so, the answer is yes (http://www.gdcl.co.uk/2011/June/UnregisteredFilters.htm).

To support DXVA2 you just need to follow the rules.

Out of curiosity, why are you writing a custom renderer? In most cases you should be able to use the EVR w/ a custom presenter or mixer (depending on what your use case is).

CharlieCL
20th June 2014, 14:50
Are you asking if it's possible to load an unregistered COM object embedded in your exe as a DirectShow renderer (filter)? If so, the answer is yes (http://www.gdcl.co.uk/2011/June/UnregisteredFilters.htm).

To support DXVA2 you just need to follow the rules.

Out of curiosity, why are you writing a custom renderer? In most cases you should be able to use the EVR w/ a custom presenter or mixer (depending on what your use case is).

No. I just included the entire source code of a renderer in my exe. EVR + Custom Presenter is limited for post-processing.

andyvt
20th June 2014, 15:22
No. I just included the entire source code of a renderer in my exe. EVR + Custom Presenter is limited for post-processing.

If you need to do something pre-post processing (e.g. before DI) that's where you could implement a custom mixer.

NikosD
27th June 2014, 17:27
Eric hi.

How hard would it be for you to implement HW acceleration for the MJPEG codec inside your QS decoder?

egur
28th June 2014, 12:38
Eric hi.

How hard would it be for you to implement HW acceleration for the MJPEG codec inside your QS decoder?

Hard to tell. If everything is simple (never is) it's pretty quick - a day or two. If things get complicated it can take a week or two.

There's very little appeal to MJPEG, this format is dying and not used much by HD content. For SD support, better to use the SW codecs.

Since I'm working 130% on my current tasks, I'm really in no position to experiment with low-priority features.
Other codecs are coming soon. When/if MSDK supports them, I'll try to find time to add those.

My hope is that both DXVA2 and the DX11 video API will become good enough that the QS decoder no longer needs to exist. It was meant as an intermediate solution.

Anyway, all of my time is dedicated to overclocking of next-gen CPUs (fun :) ) and a few new security features (less fun).
Media OC is on my TODO list as well.

egur
28th June 2014, 12:40
Japan clip analysis:
It seems the decoder isn't doing anything wrong. In ffdshow, the stalls occur when ffdshow asks the renderer for a surface.
I don't have a clue as to why this happens, I thought maybe the timestamps caused this, but I stopped handling those and no change.

So I'm out of ideas about this specific clip.

egur
28th June 2014, 14:15
Version 0.45 is out with the following changes:
* Bugfix - frames were sometimes treated as interlaced.
* Bugfix - time stamps are passed 'as is' when TS manipulation is off.
* Bugfix - time stamps handling was causing A/V delay.
* Changed: AnnexB type packets (AVC in TS files) are not pre-processed and are sent to the HW decoder directly. May break a broken clip or two but save many others.
* Sync with MSDK 2014 files.
* FFDShow: r4531
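
For readers unfamiliar with the term: Annex B is the H.264 byte-stream format used in TS files, where each NAL unit is preceded by a `00 00 01` (or `00 00 00 01`) start code instead of a length field. The following is a minimal Python sketch of splitting such a stream into NAL units. It only illustrates the packet format and is not taken from the decoder's source; the function names are made up for this example.

```python
def split_annexb(data: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    nals = []
    start = None  # index of the first byte of the current NAL unit
    i = 0
    n = len(data)
    while i + 3 <= n:
        if data[i] == 0 and data[i + 1] == 0 and data[i + 2] == 1:
            if start is not None:
                # Zero bytes before a start code are padding or the
                # leading zero of a 4-byte start code, not NAL payload.
                nals.append(data[start:i].rstrip(b"\x00"))
            start = i + 3
            i += 3
        else:
            i += 1
    if start is not None:
        nals.append(data[start:].rstrip(b"\x00"))
    return nals


def nal_unit_type(nal: bytes) -> int:
    """H.264 nal_unit_type is the low 5 bits of the first NAL byte."""
    return nal[0] & 0x1F


if __name__ == "__main__":
    stream = (b"\x00\x00\x00\x01\x67\x42\x00\x1f"  # SPS (type 7)
              b"\x00\x00\x01\x68\xce"               # PPS (type 8)
              b"\x00\x00\x00\x01\x65\x88\x84")      # IDR slice (type 5)
    print([nal_unit_type(nal) for nal in split_annexb(stream)])
```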

Downloads
* For the latest cutting edge FFDShow builds, download my builds from the Intel QuickSync Decoder SourceForge home page (http://sourceforge.net/projects/qsdecoder/)
* FFDShow-tryout site (http://ffdshow-tryout.sourceforge.net/download.php)
* LAV Splitter builds (http://forum.doom9.org/showthread.php?t=156191)

NikosD
28th June 2014, 15:13
Nice!

It's been a long time since your last update.
I'm sure that we will see the new version in both PotPlayer and LAV Video.

Do the changes for the .ts container mean that the QS decoder will finally work together with LAV Splitter without artifacts after seeking?

UPDATE
I forgot to ask:
Are you going to support HEVC_VLD_Main mode in QS decoder ?

clsid
28th June 2014, 15:54
New stable ffdshow releases are now available on SourceForge as well.

egur
28th June 2014, 21:21
TS containers have long standing issues with the HW decoder but also with libavcodec.
The fix is related to timestamps which have nothing to do with image corruption that shows up in first few frames after a seek.
I hope the timestamp issue is solved.

Changing the splitter to produce the same bitstream as the old Matroska splitter is probably hard.
Fixing this in my decoder is not simple. If someone knows how to do it, I can give it a try.

HEVC, when supported properly by the MSDK will be added. I'm not sure what kind of HW acceleration is supported in Haswell/Broadwell if at all.

NikosD
28th June 2014, 21:35
So fixed timestamps will improve seeking and frame rate.

Is there anything more ?

I'm not sure if I had any problems with .ts besides image corruption.

About HEVC, it's officially supported by the latest driver using a hybrid mode of HW acceleration, leveraging the EUs and, of course, the CPU.

Full fixed-function HW accelerated HEVC decoding will not be possible even by Broadwell.

We have to wait two generations from Haswell.

NikosD
29th June 2014, 09:57
New beta ver. 3651, mainly for GRID Autosport* game:

https://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProdId=3719&DwnldID=23985&ProductFamily=Graphics&ProductLine=Desktop+graphics+drivers&ProductProduct=4th+Generation+Intel%c2%ae+Core%e2%84%a2+Processors+with+Intel%c2%ae+HD+Graphics+4400&lang=eng

ryrynz
29th June 2014, 11:11
Just set up a fake display, is there a way to actually hide the now invisible second screen I have to the right?

theoneofgod
29th June 2014, 15:07
Just set up a fake display, is there a way to actually hide the now invisible second screen I have to the right?

With Windows 8+ you don't need a fake display to use QuickSync.

Tacio
29th June 2014, 16:28
With Windows 8+ you don't need a fake display to use QuickSync.
I have Windows 8.1 Pro x64 and a Core i5-2410M (Sandy Bridge), but QuickSync decoding is not available in LAV Filters, while MPC-BE is set to use the NVIDIA 520M dGPU in the NVIDIA control panel settings.
LAV Filters version: 0.62.0
Intel HD Graphics driver ver. 3517
What's wrong with my setup?

wanezhiling
29th June 2014, 17:12
Because it doesn't support NVIDIA Optimus.
On desktop it works.

egur
29th June 2014, 20:38
Because it doesn't support NVIDIA Optimus.
On desktop it works.

Sad but true. Optimus hides the iGPU. Nothing I can do about it.

cybersans
30th June 2014, 04:19
http://i1116.photobucket.com/albums/k565/cybersans/ffdshow_rev4531_20140628_egur_x64-quicksync.jpg (http://s1116.photobucket.com/user/cybersans/media/ffdshow_rev4531_20140628_egur_x64-quicksync.jpg.html)

I don't know what to call this, but the video ends up looking like this when using the latest ffdshow. FYI, I am using 64-bit Windows Media Player with ffdshow x64.
Before this I was using ffdshow_rev4519_20130622_egur_x64.exe, but audio and video were out of sync: sometimes the audio was ahead of the video, or vice versa.

FYI, I also tested the video with the 32-bit version and got the same result.

theoneofgod
30th June 2014, 08:49
Intel 10.18.10.3652 beta drivers for Haswell. This beta driver will only install on Haswell; Ivy Bridge and Bay Trail are not supported in this beta driver.

https://communities.intel.com/thread/52705

https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=23985

Ivy is 2 generations behind now, I didn't realize. Oh well...

NikosD
30th June 2014, 09:52
Ivy is just the previous generation.
Not two generations, just one.

SandyBridge is two generations back.

theoneofgod
30th June 2014, 18:07
I thought z97 was the 5th generation.

nevcairiel
30th June 2014, 18:11
Z97 is not a new CPU generation, it's just a new chipset with a refresh of Haswell, not a new graphics generation by itself.

theoneofgod
30th June 2014, 18:49
It's shamefully marketed as the 5th generation. Thanks for clearing that up for me.

andyvt
30th June 2014, 19:10
Z97 is not a new CPU generation, it's just a new chipset with a refresh of Haswell, not a new graphics generation by itself.

9x was intended as the chipset for Broadwell (socket compatible w/ Haswell), but when Broadwell missed its target date (now expected pre-holidays) they shipped the chipset anyway.

Deihmos
30th June 2014, 23:29
I have an i3 processor and during mkv video playback in WMC my CPU usage is about 10-15 with software decoding. Is there any benefit if I use intel quick sync? The gpu is intel HD 4400.

theoneofgod
1st July 2014, 13:51
I have an i3 processor and during mkv video playback in WMC my CPU usage is about 10-15 with software decoding. Is there any benefit if I use intel quick sync? The gpu is intel HD 4400.

QuickSync does use very little CPU, but there are issues with it too.

cybersans
20th July 2014, 15:55
Hello, anyone?
Can someone help me with my post #2526?
With "enable time stamp correction" turned on, the corruption disappears but video and audio are out of sync.
With it disabled, video and audio are in sync but the image gets corrupted when dragging the seek bar.

egur
20th July 2014, 21:06
I released a new version recently, try it out. If it still fails, share the clip.

cybersans
27th July 2014, 10:38
FYI, I already installed your current version, as per my post #2526. It happens with all my mp4/mkv clips. I have already tried different versions of the Intel HD Graphics driver too.

egur
27th July 2014, 13:10
FYI, I already installed your current version, as per my post #2526. It happens with all my mp4/mkv clips. I have already tried different versions of the Intel HD Graphics driver too.

The latest version is newer than what you used. The version you specified is from last year.
Go to the downloads (http://sourceforge.net/projects/qsdecoder/) section and try the new version.

cybersans
2nd August 2014, 09:39
Sorry, that was 2013, but the codec used for that video is rev4512. I tried rev4519, same result. Today I tried the latest one, rev4531, also the same. FYI, my Intel HD Graphics driver is 15.33.22.64.3621.

CiNcH
7th August 2014, 21:02
While reading the Media SDK 2014 documentation, I stumbled over the following thing...

Many implementations of Intel® Iris™ Pro Graphics, Intel® Iris™ Graphics and Intel® HD Graphics (4200+ Series) support the creation of interpolated frame content by using the MFX_FRCALGM_FRAME_INTERPOLATION algorithm. If the ratio of input-to-output frame rate is not supported, the VPP operation will report MFX_WRN_FILTER_SKIPPED. Commonly supported conversion ratios are 1:2 and 2:5 (useful for creating 60Hz content from 30Hz and 24Hz, respectively).

Could this be integrated into the QuickSync Decoder for easy usage in LAV? I also checked out the Direct3D11 Video APIs, where there is also FRC support. But this seems to require a Media Foundation / Direct3D 11 pipeline. I don't know of any hardware that supports those Video APIs, nor of any MFT that uses them. And the only more modern renderer supporting an up-to-date Direct3D might be madVR (which has its own FRC already). So I don't see any Direct3D11 FRC on the horizon. An easy to use Intel solution for existing infrastructures would be a cool thing.

Do you by chance know whether this is done by an ASIC inside the Haswell/4th Gen Core i processors? Or is it done by the EUs?
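
As a small illustration of the ratios in the quoted documentation, the following Python sketch checks whether a given input/output frame-rate pair reduces to 1:2 or 2:5. It is plain arithmetic only and does not call the Media SDK; whether a particular GPU actually accepts a ratio is decided by the VPP at runtime, and the function names here are made up for the example.

```python
from fractions import Fraction

# Input:output ratios the quoted documentation calls "commonly supported".
COMMON_FRC_RATIOS = {Fraction(1, 2), Fraction(2, 5)}


def frc_ratio(fps_in, fps_out) -> Fraction:
    """Exact input-to-output frame-rate ratio, reduced to lowest terms."""
    return Fraction(fps_in) / Fraction(fps_out)


def is_common_frc(fps_in, fps_out) -> bool:
    """True if the conversion matches one of the commonly supported ratios."""
    return frc_ratio(fps_in, fps_out) in COMMON_FRC_RATIOS


if __name__ == "__main__":
    for fps_in, fps_out in [(30, 60), (24, 60), (25, 60)]:
        print(fps_in, "->", fps_out, frc_ratio(fps_in, fps_out),
              is_common_frc(fps_in, fps_out))
```

Using `Fraction` keeps NTSC-style rates such as 24000/1001 exact, so 23.976 to 59.94 also reduces to 2:5.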

fvisagie
8th August 2014, 09:49
I want to correctly target my system's GPU when encoding to ensure it can use DXVA for playback. At the moment I'm specifically concerned with H.264 encoding, but learning about the other available formats like MPEG-2 will also be useful. So far I've come up empty-handed and will welcome any suggestions.

Searching on the Internet indicates a lot of confusion around levels and reference frame settings, although it seems certain different GPUs have different capabilities. Scouring the Intel site for almost a day hasn't yielded anything either, beyond the statement that Clear Video is supported on my Core i5-2540M with HD Graphics 3000 GPU (http://ark.intel.com/products/50072/Intel-Core-i5-2540M-Processor-3M-Cache-up-to-3_30-GHz). From what I can tell, DXVA Checker doesn't provide this level of information (no pun intended) about the system.

Thanks,
Francois

GTPVHD
11th August 2014, 17:16
http://images.anandtech.com/doci/8355/GPUMedia_575px.png

http://www.anandtech.com/show/8355/intel-broadwell-architecture-preview/3

Moving on, last but not least in our GPU discussion, Intel is also upgrading their GPU’s media capabilities for Broadwell. The aforementioned increase in sub-slices and the resulting increase in samplers will have a direct impact on the GPU’s video processing capabilities – the Video Quality Engine and QuickSync – further increasing the throughput of each of them, up to 2x in the case of the video engine. Intel is also promising quality improvements in QuickSync, though they haven’t specified whether this is from technical improvements to the encoder or having more GPU resources to work with.

Broadwell’s video decode capabilities will also be increasing compared to Haswell. On top of Intel’s existing codec support, Broadwell will be implementing a hybrid H.265 decoder, allowing Broadwell to decode the next-generation video codec in hardware, but not with the same degree of power efficiency as H.264 today. In this hybrid setup Intel will be utilizing both portions of their fixed function video decoder and executing decoding steps on their shaders in order to offer complete H.265 decoding. The use of the shaders for part of the decoding process is less power efficient than doing everything in fixed function hardware but it’s better than the even less optimal CPU.

The use of a hybrid approach is essentially a stop-gap solution to the problem – the lead time on the finalization of H.265 would leave little time to develop a fixed function encoder for anyone with a long product cycle like Intel – and we expect that future generation products will have a full fixed function decoder. In the meantime Intel will be in the company of other GPU manufacturers such as NVIDIA, who is using a similar hybrid approach for H.265 on their Maxwell architecture.

Zachs
16th August 2014, 12:57
Dear Intellier (used to be one myself), I found a resource leak issue with the decoder. I reported it to LAVFilter's author but was asked to report here instead.

Bug report here: https://code.google.com/p/lavfilters/issues/detail?id=473

Cheers.

egur
16th August 2014, 14:22
Dear Intellier (used to be one myself), I found a resource leak issue with the decoder. I reported it to LAVFilter's author but was asked to report here instead.

Bug report here: https://code.google.com/p/lavfilters/issues/detail?id=473

Cheers.

Well, this doesn't provide a lot of details...
I opened a media file 10 times in a row and memory stayed about the same (went up and down again and again).

Maybe one of the small memory footprint interfaces is leaked.

Do you know which resource type/interface is being leaked?
This could be a driver or Media SDK issue as well (my code is not the only allocator).

In any case this is a minor issue that shouldn't concern end users.

egur
16th August 2014, 14:28
I want to correctly target my system's GPU when encoding to ensure it can use DXVA for playback. At the moment I'm specifically concerned with H.264 encoding, but learning about the other available formats like MPEG-2 will also be useful. So far I've come up empty-handed and will welcome any suggestions.

Searching on the Internet indicates a lot of confusion around levels and reference frame settings, although it seems certain different GPUs have different capabilities. Scouring the Intel site for almost a day hasn't yielded anything either, beyond the statement that Clear Video is supported on my Core i5-2540M with HD Graphics 3000 GPU (http://ark.intel.com/products/50072/Intel-Core-i5-2540M-Processor-3M-Cache-up-to-3_30-GHz). From what I can tell, DXVA Checker doesn't provide this level of information (no pun intended) about the system.

Thanks,
Francois

QuickSync HW supports (at least) Level 5.1, High profile (8-bit 4:2:0).
Some tablets only support up to level 4.0.
My suggestion is to use H264 (a.k.a MPEG4 part 10 AVC) High profile with Level 4.0 or 4.1. This is a trade off between the most optimized high bitrate encoding and having lots of devices be able to play it.
MPEG2 is old and should already be dead. H264 is superior in every aspect so use that.
H265 is too new for now but within a year or two it should start eating market share from H264.
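
A minimal Python sketch of what that suggestion could look like as an encoder command line. The thread does not say which encoder is being used, so ffmpeg with libx264 is an assumption here, and the file names are placeholders.

```python
def x264_compat_args(src: str, dst: str, level: str = "4.1") -> list[str]:
    """Build an ffmpeg command line for H.264 High profile at the given
    level with 8-bit 4:2:0 output. ffmpeg/libx264 is an assumption of
    this sketch, not something named in the thread."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-profile:v", "high",   # High profile
        "-level:v", level,      # 4.0 or 4.1 as suggested above
        "-pix_fmt", "yuv420p",  # 8-bit 4:2:0
        "-c:a", "copy",         # leave the audio stream untouched
        dst,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run() to actually encode.
    print(" ".join(x264_compat_args("input.mkv", "output.mkv")))
```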

Zachs
16th August 2014, 15:24
Hi egur,

You should use D3D9 debug runtimes with max validations and break on all errors and memory leaks, rather than looking at memory usage. In fact, it is not a memory leak but a GPU resource leak. Outstanding Alloc Counts when the app terminates can refer to either GPU or sys mem resource leak depending on where the resource was allocated.

Thanks,
Zach

egur
16th August 2014, 21:15
I didn't see how I can do this on Win8.1. I'll give it a look at work, where I still use Win7.

Zachs
17th August 2014, 02:18
Yeah Win8.1 isn't ideal for D3D9 development.

fvisagie
19th August 2014, 07:57
I want to correctly target my system's GPU when encoding to ensure it can use DXVA for playback ... Clear Video is supported on my Core i5-2540M with HD Graphics 3000 GPU (http://ark.intel.com/products/50072/Intel-Core-i5-2540M-Processor-3M-Cache-up-to-3_30-GHz).

QuickSync HW supports (at least) Level 5.1, High profile (8-bit 4:2:0).
Some tablets only support up to level 4.0.


Thanks for your response. Is a given Intel GPU supposed to support Clear Video and QuickSync to the same H.264 levels? Initial testing on HD Graphics 3000 seems to suggest that on playback (Clear Video) it only supports up to level 4.1 High profile, and playback is what I'm mostly concerned with at this point.

Your suggestion of targeting Level 4.0/1 during encoding would be even more appropriate if it turns out Intel GPUs are not guaranteed to support Clear Video to the same level as the 5.1 on QuickSync you mention. Just to be clear, I'm using non-HW-accelerated encoding.

nevcairiel
19th August 2014, 08:05
All Intel GPUs since the Sandy Bridge generation support decoding of H.264 up to level 5.1, however with resolution constraints. SNB only supports 1080p, not 4K (even though 4K is technically allowed in level 5.1).
That means 1080p with 16 refs or very high bitrate would decode just fine (which needs L5.1), but 4K would of course not.

In short, the support doesn't fit into any levels. If you needed to be very strict, then yes, only L4.1 is supported completely on Sandy Bridge, L5.1 only on later GPUs with 4K decoding capability.
But if you encode at 1080p, then L5.1 works just fine on those GPUs as well.
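
The point about 1080p with 16 refs needing L5.1 can be checked against the level limits in the H.264 specification. Below is a minimal Python sketch using the MaxFS and MaxDpbMbs values from Table A-1 for a few levels. It only covers frame size and reference count, ignores the macroblock-rate and bitrate limits, and says nothing about what a given GPU accepts beyond the standard.

```python
# level: (MaxFS in macroblocks, MaxDpbMbs), from H.264 Table A-1
LEVEL_LIMITS = {
    "4.0": (8192, 32768),
    "4.1": (8192, 32768),
    "5.0": (22080, 110400),
    "5.1": (36864, 184320),
}


def frame_mbs(width: int, height: int) -> int:
    """Frame size in 16x16 macroblocks (dimensions rounded up)."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def max_ref_frames(level: str, width: int, height: int) -> int:
    """Maximum reference frames the level allows at this resolution,
    or 0 if the frame is too large for the level at all."""
    max_fs, max_dpb_mbs = LEVEL_LIMITS[level]
    mbs = frame_mbs(width, height)
    if mbs > max_fs:
        return 0
    return min(max_dpb_mbs // mbs, 16)


if __name__ == "__main__":
    for level in LEVEL_LIMITS:
        print(level,
              "1080p refs:", max_ref_frames(level, 1920, 1080),
              "4K refs:", max_ref_frames(level, 3840, 2160))
```

At 1080p, Level 4.1 allows only 4 reference frames, while 16 are reached only at Level 5.1, which matches the statement that such streams need L5.1 even though the resolution is still 1080p.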

egur
19th August 2014, 08:32
Every generation has a limit on the decoder's resolution handling.
Latest generation (Haswell and Broadwell) supports up to 4K. 8K is still a dream.

NikosD
19th August 2014, 11:28
Seems like 15.36 drivers drop support for Ivybridge.

I'll give them a try.