Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing

egur
27th October 2012, 22:27
I think the MFT decoders were dropped and made a comeback. I don't know why (either way).
As for the 3rd generation i3 (IVB) not being able to output 4Kx2K, that might be an incentive to buy a better processor.
I checked 4K decode a while back using a much older driver. I'll check again on Sunday.
Special note:
There are several known issues with this driver with respect to my decoder. Use only the latest ffdshow (4488) or LAV (0.52). Older versions might not decode H264 at all. All multithreading options (ffdshow) must be disabled (MT copy is fine) or playback will break on many clips.
In fact, I'm seriously considering dropping multithreaded decoding and video processing since there's no fix in sight. This will definitely simplify my design and validation efforts, allowing me to add new features much faster.
If I decide to drop MT, the next build will be a cleanup build that will be 100% (hopefully) compatible with current production drivers (15.28) and beta drivers (15.31).
It turns out that multithreading was a larger headache than I anticipated.
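For readers wondering what "MT copy" refers to, as opposed to multithreaded decoding: each decoded frame still has to be copied out of the GPU surface, and that copy can be split into row bands handled by separate threads. The sketch below is a hypothetical Python illustration, not the decoder's actual code; `mt_copy_plane` and `pitch` (the padded row stride a locked surface typically reports) are assumed names, and under CPython's GIL it demonstrates only the work partitioning, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def mt_copy_plane(src: bytes, width: int, height: int, pitch: int,
                  threads: int = 4) -> bytearray:
    """Copy a pitched plane into a tightly packed buffer, one row band per task."""
    dst = bytearray(width * height)

    def copy_band(first_row: int, last_row: int) -> None:
        # Each task writes a disjoint range of rows, so no locking is needed.
        for y in range(first_row, last_row):
            s = y * pitch   # source rows are `pitch` bytes apart (padding dropped)
            d = y * width   # destination rows are tightly packed
            dst[d:d + width] = src[s:s + width]

    band = (height + threads - 1) // threads  # rows per task, rounded up
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(copy_band, start, min(start + band, height))
                   for start in range(0, height, band)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return dst
```

With a 4-pixel-wide plane stored at a pitch of 6 bytes, only the first 4 bytes of each row are kept.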

wanezhiling
28th October 2012, 05:13
Sorry Eric, I can't find any reason to use the QS decoder because it is too reliant on the drivers...:o

NikosD
28th October 2012, 05:40
I think the MFT decoders were dropped and made a comeback. I don't know why (either way).


I think it's the first time they are enumerated in DXVAChecker, but they don't work, at least with DXVAChecker v2.9.1: they output an error code. We'll see whether it's a DXVAChecker bug or whether the MFT decoders (transcoders, to be more exact) really don't work.


As for 3rd generation i3 (IVB) not being able to output 4Kx2K might be an incentive to buy a better processor.
I checked 4K decode a while back using a much older driver. I'll check again on Sunday.


If Intel disabled 4K decoding even on IVB (Core i3), I can forget about 4K decoding on SNB for sure.
But I wouldn't mind if they followed the same pattern and enabled it on SNB Core i5/i7 only.


If I decide to drop MT, the next build will be a cleanup build that will be 100% (hopefully) compatible with current production drivers (15.28) and beta drivers (15.31).
It turns out that multithreading was a larger headache than I anticipated.

How much performance will the QS decoder lose if you drop the multithreaded code?

Could you tell me, if you don't mind, a simple way to enable and use the HW video processing capabilities of the QS decoder in PotPlayer or MPC-HC? My time with the Core i5 system is always limited (it's not mine!).

egur
28th October 2012, 14:33
Sorry Eric, I can't find any reason to use the QS decoder because it is too reliant on the drivers...:o
Using HW acceleration implies relying on drivers. The best anyone can do is mask out driver issues.

If Intel disabled 4K decoding even on IVB (Core i3), I can forget about 4K decoding on SNB for sure.
But I wouldn't mind if they followed the same pattern and enabled it on SNB Core i5/i7 only.

How much performance will the QS decoder lose if you drop the multithreaded code?

Could you tell me, if you don't mind, a simple way to enable and use the HW video processing capabilities of the QS decoder in PotPlayer or MPC-HC? My time with the Core i5 system is always limited (it's not mine!).

I don't know the details on i3 but SandyBridge doesn't support 4K (HW limitation) as far as I know.

QS will lose some performance in benchmarks but will be more efficient in normal playback scenarios. I might keep some sort of MT that isn't affected by the driver.

For MPC-HC, just install LAV or ffdshow. In ffdshow, enable QuickSync on the config->codecs page for H264, MPEG2 and VC1. In LAV, select QuickSync on the HW acceleration tab.
ffdshow also has a relatively new Intel QuickSync decoder config page of its own (within the ffdshow config).

HoP
28th October 2012, 16:13
@egur
this driver:
http://downloadcenter.intel.com/Detail_Desc.aspx?ProductID=3231&DwnldID=21776&lang=eng&iid=dc_rss

is compatible with this CPU:

http://upit.cc/i/82c35ac5.png

Why do I ask? Because I think the system has been a little unstable since the update.
[sorry for off-topic]

NikosD
28th October 2012, 17:04
For MPC-HC, just install LAV or ffdshow. In ffdshow, enable QuickSync on the config->codecs page for H264, MPEG2 and VC1. In LAV, select QuickSync on the HW acceleration tab.
ffdshow also has a relatively new Intel QuickSync decoder config page of its own (within the ffdshow config).

I didn't mean exactly that.

I meant: how do I choose the right options inside the video player to be sure that, for example, deinterlacing, scaling or other HW post-processing is executed by the QuickSync hardware and not by something else?

Which options in the player's menus control this, and is there any indication that the QS HW is being used for post-processing (deinterlacing, scaling etc.)?

Thanks!

egur
28th October 2012, 17:23
@HoP
I'm sorry but I'm not using these CPUs. I don't have any means of reporting driver issues directly to the driver team. You can try the driver support page.

@NikosD
LAV doesn't use any HW video processing from my decoder, only decoding. To check whether QS is active while playing a clip, open the LAV Video Decoder filter properties (how you get there varies from player to player). "Available" means QS is not currently in use; "Active" means it is.

EVR uses deinterlacing from the GPU connected to the active screen. In a multi-monitor setup the other GPU might do the work; the same goes for scaling.

MadVR will use HW deinterlacing the same way EVR does (I think so, anyway). MadVR scales the image on its own. Questions about MadVR should be directed to its thread.

Other renderers (e.g. EVR-CP) - I'm not sure what they do.

CiNcH
28th October 2012, 20:21
EVR uses deinterlacing from the GPU connected to the active screen.
It is also possible to use DXVA deinterlacing within XBMC. It does not work with Intel iGPUs, however. You can choose between 'DXVA BOB' and 'DXVA Best'. BOB quality is quite bad, of course, and 'DXVA Best' just results in a still image. It works with NVIDIA and AMD GPUs, though. Any clue?

NikosD
28th October 2012, 21:14
Well, I did some tests on the following system:

Win 7 SP1 x86 - Core i5 2400 (SNB) - Intel drivers v2867

I installed:

PotPlayer v1.5.34442, LAV 0.52, MPC-HC v1.6.5.6103 ICL12 by XhmikosR, FFDShow v4488 ICL12 by XhmikosR

and the following monitoring tools:

GPU-Z v0.6.6, Intel Media Checker 2.0.1.18, Intel GPA Monitor 2012 R4.

I did some tests mainly for Deinterlacing.
PotPlayer crashes all the time (it closes before the end of almost every clip, progressive or interlaced) in both modes (QS, DXVA).
It ships with an older QS .dll, not 0.39.
I added LAV Video as an external filter, but LAV doesn't support HW DI: Intel Media Checker didn't catch any pre-processing functions.

But I saw that, with interlaced content, EVR-CP produces a lot more GPU load and less CPU load than EVR in both internal modes (DXVA, QS).
The same goes for LAV Video too.

It seems that EVR relies a lot more on the CPU than the GPU, while EVR-CP does the opposite.
Strange.

In general, I had the feeling that PotPlayer is badly incompatible with the Intel platform.

In MPC-HC I added FFDShow (LAV Video didn't work for HW DI) as an external filter with default options and enabled only one more option, DI, in its configuration menu.

FFDShow worked with interlaced content and Media Checker caught HW video pre-processing functions in both EVR and EVR-CP renderers.
But GPA Monitor didn't catch any. Only DECODE functions all the time for both EU and MFX. No VPP functions at all.
Strange.

GPU load was very high with EVR-CP (85%) and the GPU clock went to top speed (1100 MHz).
The interlaced content was MPEG-2, H.264 and VC-1, 1080i.
The MFX load (DECODE) was below 15%.

On the other hand, EVR had only 60% GPU load at 850 MHz, consuming less than half the GPU power of EVR-CP, and pushed CPU utilization higher for the same interlaced content.

The playback of clips was in window mode - not full screen - because I wanted to see various other windows (GPU-Z, Media Checker, GPA Monitor).

That's all folks!

egur
28th October 2012, 22:08
In ffdshow video decoder->config->Intel QuickSync you should enable the deinterlacer. The config page is self explanatory.
Make sure that the combo box with multithreading options is set to "Multithreaded copy".
PotPlayer crashes with QS because it allows all the MT options, which are sadly not stable on the 15.28 drivers, and there's nothing I can do about it (except decode in 1 thread). I didn't change the hard-coded defaults in the latest release, so replacing the DLL will not help. This is something the PotPlayer developers can easily fix.

NikosD
28th October 2012, 22:20
All the options you mention are default options except DI.
That was the only option I had to select.

Unfortunately PotPlayer crashes in DXVA too.

What about EVR-CP vs EVR, regarding GPU load?

And GPA Monitor regarding VPP functions ?

egur
30th October 2012, 08:06
EVR-CP uses more GPU resources than EVR.
EVR is the most power friendly and performance friendly alternative in win7/8. One can always use VMR9 (DXVA1) but it's outdated and I don't recommend it.
Using DI (or any other VPP operation) within my decoder has these properties:
* Allows you to add subtitles, OSD afterwards via standard SW functions.
* DI can double the frame rate (via option). EVR will always double the frame rate. Both use the same DXVA2 video processor device. Doubling the frame rate in the decoder will hurt performance but current systems are more than strong enough to handle this.
* Detail/Denoise are used via the same video processing device in QS/EVR. These operations must work on progressive video, so DI is implicitly enabled, even if user disables it.
* Same goes for ProcAMP (contrast, brightness, etc) and scaling which I didn't implement.
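To put numbers on the frame-rate doubling mentioned above: DirectShow timestamps (REFERENCE_TIME) count 100 ns ticks, so doubling the output rate halves each frame's duration. The helper names below are hypothetical and the sketch covers only the rate arithmetic, not deinterlacing itself.

```python
from fractions import Fraction

UNITS_PER_SECOND = 10_000_000  # DirectShow REFERENCE_TIME: 100 ns per tick


def frame_duration(fps: Fraction) -> int:
    """Duration of one output frame in 100 ns ticks, rounded to the nearest tick."""
    return round(UNITS_PER_SECOND / fps)


def deinterlaced_output(fps: Fraction, double_rate: bool) -> tuple[Fraction, int]:
    """Output frame rate and per-frame duration after deinterlacing.

    With double_rate=True every field becomes its own frame, so the
    output rate doubles and each frame lasts half as long."""
    out_fps = fps * 2 if double_rate else fps
    return out_fps, frame_duration(out_fps)
```

For 25 fps interlaced input, single-rate output keeps 400000-tick frames, while double-rate output produces 50 fps with 200000-tick frames, i.e. twice as many frames for everything downstream to process.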

Pulp Catalyst
30th October 2012, 16:53
I'd be lying if I said I understood half of what's being discussed here, so my question may have been answered already, but from my understanding of this thread... sorry, anyway.

My question is: I use MeGUI. Can the decoding of an H264/VC1 stream be shifted over to the QS decoder, and from there go through the normal path of Avisynth - filters - encoder - output?

I use DGDecNV, which shifts decoding over to the GPU, and MeGUI happily supports DGDecNV. Is there any way QS can be used in the same manner with the techniques available at present, or will this perhaps be possible in the future? Having the GPU do the decoding has been very rewarding for me with DGDecNV, but QS is said to be far superior for technical reasons and I would love to take advantage of that. Any feedback on whether this can be done now, or perhaps in the future, would be great. I know the new ffdshow has AviSynth capabilities... can this be used somehow together with MeGUI?

My system:
i7-3770K
9800GT
Virtu Universal MVP

vivan
30th October 2012, 17:25
Pulp Catalyst,
Just use DirectShowSource() to decode the video in AviSynth. If you have ffdshow installed, it will use it, and thus use QS.

Pulp Catalyst
30th October 2012, 17:29
So inside a MeGUI AviSynth script I use DirectShowSource, OK, and this will load ffdshow. How do I tell ffdshow to use the QuickSync decoder, or will it do this automatically if I use this one?

http://sourceforge.net/projects/qsdecoder/

Even so, surely it must be a bit more complicated than this. Isn't the image coming back in the wrong format, that kind of thing?

With DirectShowSource(), doesn't the encoder need a certain kind of format... or am I reading too much into this? LOL

QS decoding is all well and good, but the data coming back along the pipe needs to be YV12, or whatever x264/Xvid wants. Will I need to do anything inside AviSynth to make the decoded output compatible?

NikosD
31st October 2012, 18:25
I tried a new beta of DXVAChecker (v2.9.2), which seems not only to enumerate the Intel MFT decoders but also to use them for playback and benchmarking, though not for every format.

It can only enumerate Intel MFT decoders for H.264 (progressive & interlaced) and VC-1 progressive.
So, no MPEG-2, no WMV and no VC-1 interlaced for Intel MFT.

Performance-wise, it's definitely using the Intel MSDK (big surprise :eek:), so it isn't comparable with the native DXVA HW codecs; it's comparable with your QS decoder (it seems you have an internal competitor :p).

Your QS decoder in both implementations (FFDShow 4488, LAV 0.52) is generally ~20% faster than Intel MFT.

LAV > FFDShow > Intel MFT

Intel MFT is a little faster only on very specific clips.

If I have something new about future beta of DXVAChecker adding more codecs, I'll post again.

crotecun
1st November 2012, 03:57
I don't know what Linux drivers do and what options they support.

I see, I guess I'll pick up a Windows laptop to be sure I take advantage of Sandy/Ivy Bridge graphics technology.

Speaking of which, concerning this quote:

EVR-CP uses more GPU resources than EVR.
EVR is the most power friendly and performance friendly alternative in win7/8.

Are there any differences in video playback between Windows 7 and Windows 8? I was wondering if there is any benefit in moving to Microsoft's newest OS when it comes to watching videos.

egur
1st November 2012, 19:48
I tried a new beta of DXVAChecker (v2.9.2), which seems not only to enumerate the Intel MFT decoders but also to use them for playback and benchmarking, though not for every format.

It can only enumerate Intel MFT decoders for H.264 (progressive & interlaced) and VC-1 progressive.
So, no MPEG-2, no WMV and no VC-1 interlaced for Intel MFT.

Performance-wise, it's definitely using the Intel MSDK (big surprise :eek:), so it isn't comparable with the native DXVA HW codecs; it's comparable with your QS decoder (it seems you have an internal competitor :p).

Your QS decoder in both implementations (FFDShow 4488, LAV 0.52) is generally ~20% faster than Intel MFT.

LAV > FFDShow > Intel MFT

Intel MFT is a little faster only on very specific clips.

If I have something new about future beta of DXVAChecker adding more codecs, I'll post again.
Very odd, it should connect via DXVA2. If it doesn't, that means it does a copy-back like I do. Strange that MPEG2, WMV9 and VC-1 interlaced aren't supported.
Mine should generally be faster, probably because my code is parallelized a little better. My copy function is also faster.

So inside a MeGUI AviSynth script I use DirectShowSource, OK, and this will load ffdshow. How do I tell ffdshow to use the QuickSync decoder, or will it do this automatically if I use this one?

http://sourceforge.net/projects/qsdecoder/

Even so, surely it must be a bit more complicated than this. Isn't the image coming back in the wrong format, that kind of thing?

With DirectShowSource(), doesn't the encoder need a certain kind of format... or am I reading too much into this? LOL

QS decoding is all well and good, but the data coming back along the pipe needs to be YV12, or whatever x264/Xvid wants. Will I need to do anything inside AviSynth to make the decoded output compatible?
Use whichever build is newer: the one on my download page or the one on the ffdshow-tryouts page. My build is made from the main source code.
In ffdshow's configuration page (e.g. from the Start Menu), go to the Codecs tab and select QuickSync for H264, MPEG2, VC1 and WMV9. This is a one-time setup.

Are there any differences in video playback between Windows 7 and Windows 8?
There are some differences like DXVA over D3D11. I'm not sure who/what actually uses the new features.

NikosD
1st November 2012, 21:03
Very odd, it should connect via DXVA2. If it doesn't, that means it does a copy-back like I do. Strange that MPEG2, WMV9 and VC-1 interlaced aren't supported.


Looking at my benchmark numbers, DXVA CB is a lot slower (more than 20%) than QS decoder.

Also if it's using DXVA CB, how is it possible to play/benchmark VC-1 progressive content ?
VC-1 HW acceleration of Intel doesn't use proprietary DXVA mode.

It seems that MPEG-2 doesn't belong to MediaFoundation supported video formats.
But then again, why did Intel release an MPEG-2 MFT decoder (transcoder)?

Pulp Catalyst
2nd November 2012, 01:07
Can someone help me get this working with MeGUI? I have looked around, and there is no option in MeGUI to create an AVS script that uses DirectShowSource instead of the d2v.

I looked for some guides, but all I find is "create an AVS script in MeGUI but use DirectShowSource". I have been on this for two days now.

Can anyone help and give me some feedback other than "use DirectShowSource"? I'd appreciate it.

The following doesn't work:

# Set DAR in encoder to 4 : 3. The following line is for automatic signalling
global MeGUI_darx = 4
global MeGUI_dary = 3
SetMemoryMax(1024)
SetMTMode(5,2)
#LoadPlugin("D:\Program Files (x86)\MeGUI\tools\dgindex\DGDecode.dll")
DirectShowSource("H:\MEGUI\7\VTS_01_1.d2v", info=3)
LoadPlugin("D:\Program Files (x86)\MeGUI\tools\avisynth_plugin\ColorMatrix.dll")
ColorMatrix(hints=true, threads=0)
SetMTMode(2,2)
#deinterlace
crop(18, 10, -14, -6)
#resize
#denoise


So I messed around manually and did this:
# Set DAR in encoder to 4 : 3. The following line is for automatic signalling
global MeGUI_darx = 4
global MeGUI_dary = 3
SetMemoryMax(1024)
SetMTMode(5,2)
#LoadPlugin("D:\Program Files (x86)\MeGUI\tools\dgindex\DGDecode.dll")
DirectShowSource("G:\DVD\03\VIDEO_TS\VTS_01_1.VOB")
LoadPlugin("D:\Program Files (x86)\MeGUI\tools\avisynth_plugin\ColorMatrix.dll")
#ColorMatrix(hints=true, threads=0)
SetMTMode(2,2)
#deinterlace
crop(18, 10, -14, -6)
#resize
#denoise


Now this worked, but AVSMeter showed about a 45% drop in speed compared with libavcodec.
Also, when used with QTGMC (which is really important to me, hence why I'm trying to coax out every bit of speed I can),

I get this error in AVSMeter:

unsupported colourspace, masktools only supports YUV colorspaces

YV12, YV16, YV24

EDIT

ConvertToYV12(interlaced=true)

EDIT 2

Done some more testing. AVSMeter seems to be giving false readings: if I drag an AVS file onto it, it says average 525; then I stop, drag it over again, and it goes down to 458. Each time it's different, so I'm going to have to think about this one. Maybe I'll have to do actual timed tests... and those take so long to do, LOL.

egur
2nd November 2012, 12:55
Looking at my benchmark numbers, DXVA CB is a lot slower (more than 20%) than QS decoder.
There's more than one way to do things. I've put in the effort to do it as fast as possible.
Also if it's using DXVA CB, how is it possible to play/benchmark VC-1 progressive content ?
VC-1 HW acceleration of Intel doesn't use proprietary DXVA mode.
Intel code doesn't use proprietary DXVA anything. The Media SDK probably has some workarounds, but this is just a guess. I don't really know.
It seems that MPEG-2 doesn't belong to MediaFoundation supported video formats.
But then again, why did Intel release an MPEG-2 MFT decoder (transcoder)?
No clue.

Can someone help me get this working with MeGUI? I have looked around, and there is no option in MeGUI to create an AVS script that uses DirectShowSource instead of the d2v.

....

Try limiting the ffdshow or LAV decoder output to YV12 (better!) or YUY2, depending on the version of AviSynth you use.
Do this via the config page of either decoder. Make sure these decoders are actually in use: ffdshow shows a tray icon, and you can check whether LAV is working by using Process Explorer and looking at the DLLs loaded by the process.
FYI,
It's a known fact that for low resolutions and/or low bitrates you will not get any benefit from HW acceleration:
1) More memory copying is needed than with a SW decoder.
2) The output has to be converted from NV12 to YV12, or, even worse, to YUY2.
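Point 2 can be made concrete. NV12 stores a full-resolution Y plane followed by one half-height plane of interleaved U/V bytes; YV12 stores Y, then a separate V plane, then a separate U plane, both already 4:2:0. The conversion is therefore a pure de-interleave, as in the minimal Python sketch below, which assumes tightly packed buffers (no row padding) and even dimensions; real decoders do this with SIMD on pitched surfaces. YUY2 is packed 4:2:2, so reaching it from 4:2:0 additionally requires vertical chroma upsampling and produces more data, which is why it is the worse target.

```python
def nv12_to_yv12(nv12: bytes, width: int, height: int) -> bytes:
    """Convert a tightly packed NV12 frame to YV12 (Y plane, then V, then U).

    Assumes even width/height and pitch == width (no row padding)."""
    y_size = width * height
    y_plane = nv12[:y_size]
    uv = nv12[y_size:y_size + y_size // 2]  # interleaved U0 V0 U1 V1 ...
    u_plane = uv[0::2]  # even bytes: U (Cb)
    v_plane = uv[1::2]  # odd bytes: V (Cr)
    # YV12 stores the V plane before the U plane (I420 uses the opposite order).
    return bytes(y_plane) + bytes(v_plane) + bytes(u_plane)
```

A 4x2 frame, for example, has 8 luma bytes and 4 chroma bytes in either format; only the chroma order changes.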

The big difference is with the high-bitrate stuff, e.g. Blu-ray.
As for transcoding, the bulk of the work is done by the encoder, so even if the decoder took zero time, the difference would be small.
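The transcoding point is essentially Amdahl's law: speeding up only the decode stage cannot shrink the encoder's share of the time. In the sketch below, the function name is made up and the 10% decode share in the example is an illustrative figure, not a measurement from this thread.

```python
def transcode_speedup(decode_share: float, decode_speedup: float) -> float:
    """Overall speedup when only the decode stage gets faster (Amdahl's law).

    decode_share: fraction of total transcode time spent decoding (0..1).
    decode_speedup: how many times faster decoding becomes; float("inf")
    models a decoder that takes no time at all."""
    remaining = (1.0 - decode_share) + decode_share / decode_speedup
    return 1.0 / remaining
```

If decoding were 10% of the total, even an infinitely fast decoder would give only about 1.11x overall (`transcode_speedup(0.1, float("inf"))`), and any extra copy or conversion cost can easily eat that margin.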

nevcairiel
4th November 2012, 12:00
Hey Eric,

I found a somewhat annoying behaviour in the QS Decoder.
I recently added support for using it with EVR in Fullscreen Exclusive mode, but now its broken in another use-case.
Because I cannot really detect whether I'm in FSE mode, I always try to grab the interface from EVR if it's present, and use it (ffdshow does the same).

The problem starts when i try to move the player from one screen to another. Playback simply freezes. You can reproduce this with ffdshow as well.
What EVR does is reconnect its pin and give me a new D3D Device Manager, because obviously the old one was for the other screen. After it's done, the old device manager ceases to function.

At this point, the QS decoder also fails, because it doesn't seem to be able to change its D3D interface on the fly.
What I would hope it does is try to re-create its D3D interfaces using the new device manager. Alternatively, it could just try to create the D3D interfaces directly, and only use the renderer-supplied version if the direct way doesn't work.

Thoughts?

Edit:
http://git.1f0.de/gitweb?p=qsdecoder.git;a=commitdiff;h=6fede91c290c1eece59357f220a62c65768a6433

This is my change implementing my alternative idea above; so far it seems to work perfectly. The comment in the code also seems to assume it's done like this, so the question is: is there any reason it wasn't?
I have been testing both FSE usage and normal usage, and everything seems perfect so far.
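The fallback order described above (create the D3D interfaces directly first, use the renderer's only if that fails) can be sketched as below. This is a hypothetical Python outline of the decision logic only, not the qsdecoder code from the linked commit; `acquire_device` and both callables are stand-ins that return a device object or `None`.

```python
from typing import Callable, Optional, Tuple


def acquire_device(create_direct: Callable[[], Optional[object]],
                   from_renderer: Callable[[], Optional[object]]) -> Tuple[str, object]:
    """Return ("direct", dev) or ("renderer", dev), preferring a self-created device.

    create_direct stands in for creating a D3D device inside the decoder;
    from_renderer stands in for using the device manager obtained from EVR.
    Each returns None on failure."""
    device = create_direct()
    if device is not None:
        return "direct", device
    device = from_renderer()
    if device is not None:
        return "renderer", device
    raise RuntimeError("no D3D device available from either source")
```

A decoder-owned device is not invalidated when EVR reconnects its pin on another monitor. The fallback branch, however, still depends on the renderer's device, so a later device change on that path would need separate handling.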

egur
4th November 2012, 17:27
It looks fine. It might cause slightly slower initialization.
It will not solve this scenario:
* Start in FSE
* Player leaves FSE into windowed mode.
* Player/renderer moved to another monitor.

Not a really strong use case, but possible. I'll try to solve both.
I want to check another option, before committing.

nevcairiel
4th November 2012, 17:47
Yeah, I don't consider starting in FSE and then going into windowed mode a strong use case. :p

If you can fix it another way, it's fine with me. I just don't want to break something that worked before I added FSE support (i.e. moving screens with EVR).

egur
4th November 2012, 21:46
Did 2 small commits.
r70 fixes the above problem (hopefully).
r71 just changes the default MT options: only MT copy is enabled by default. This doesn't matter to either LAV or ffdshow.

Pulp Catalyst
5th November 2012, 07:53
I've been going through this entire thread, page by page...

I finally came across this, which explains why my results are so much worse than I expected:
"Luckily NVIDIA was smarter and lets you access the HW decoder without a D3D device. Even works without a screen connected at all. Intel should totally do that"

I realize now that the QS decoder can't be used the way the NVIDIA one can (DGDecNV). Like AMD's, QS can't be accessed directly either, so going through the D3D surfaces is the only way. This now explains why I'm getting inferior results in my tests (there is an innate performance overhead in going through this entire chain), whereas the NVIDIA system allows direct communication, eliminating many of the overheads the QS system has... and only getting the decoder to process what's necessary!

So am I right in saying that, for decoding streams directly, QS will always be inferior to NVIDIA's solution unless Intel opens up the decoder for direct access?

Also, will this happen, or is there any reason why it can't be done (AMD hasn't done this yet either, although I haven't checked for a while)?

Also, is it Intel's decision not to open the decoder up, or is it because of technical issues (drivers would need to be developed specifically for this)?

nevcairiel
5th November 2012, 08:09
The QS decoder doesn't have any higher actual overhead than the NVIDIA CUDA decoder. Sure, in theory there is D3D in between, which requires a connected screen, but if you have the screen connected anyway, the actual CPU overhead is the same: both need to copy the frame from the GPU back to main system memory. Additionally, the QS decoder is quite a lot faster than NVIDIA's. So from a pure performance perspective, QS still wins.

It would still be great to have access to the QS engine without D3D, but I for one doubt that will ever happen. Not for performance reasons, but simply to be able to use it without trickery with virtual screens and whatnot when you have a dedicated GPU.

Pulp Catalyst
5th November 2012, 08:47
I have been doing tests for several days now. Maybe it's the implementations, maybe there are other cogs in the chain causing a loss of performance, I don't know. What I do know is that I have done many tests with MeGUI encodes through ffdshow (QuickSync decoding), and I am getting results that are the same as, if not worse than, software decoding (the numbers seem to jump all over the place). I also get huge mouse lag and stutter in Windows, and memory usage in AviSynth shoots up to 1 GB (it's capped so it can't go over that). The fact is, ffdshow decoding at high speed (for use with encoding) feels very unstable.

Whereas with DGDecNV (offloading the decoding to my 9800GT, which has the VP2 decoder engine) I get a decrease in CPU usage of around 6-12% for high-def Blu-ray video. Not much, but it does give x264 extra processing power (I know this is in simple terms), and the end result can't be denied: my average of 22.3 fps can go to 24.3 fps (on a good day; an extra 1.5 fps is more usual), even though my 9800GT has an old decoding engine compared to the Ivy Bridge 3770K.

I'm thinking that the added elements of ffdshow (how it works, having to use DirectShowSource in AviSynth, plus anything else in the middle that I can't see) may be what's causing the overall performance drop, compared with having a direct link to the decoder itself through a tool similar to DGDecNV, only for QS instead.

I finished reading this yesterday, actually; it may or may not be relevant here:

http://software.intel.com/en-us/articles/performance-interactions-of-opencl-code-and-intel-quick-sync-video-on-intel-hd-graphics

egur
5th November 2012, 19:34
@Pulp Catalyst
Try specifying your MeGUI setup as much as possible.
I'll try to replicate your setup and maybe find a better one (performance, same quality).

Pulp Catalyst
5th November 2012, 22:31
Thanks, but quite frankly there isn't one (I know, bold statement). When you think about it, how could there be? Whatever gains QS gives in decoding will be lost once you take into account the extra steps that process needs... (which, I feel strongly, is what I have been witnessing).

Offloading the decoding process to hardware doesn't gain that much anyway, as you well know, so it doesn't take much for the performance gains of hardware decoding to disappear. With ffdshow doing this at that end (I won't pretend to understand the technical stuff at this point), those small gains quickly diminish. By the time the decoded stream comes back to AviSynth I'm lucky if I've broken even (from what I can tell, I have usually lost performance by this point, NOT GAINED it).


Very kind of you to offer, though. I think I will just invest in a new GPU with a VP5 decoder, as it offers a lot more performance (thanks, NVIDIA):

http://en.wikipedia.org/wiki/Nvidia_PureVideo#The_Fifth_Generation_PureVideo_HD

Along with the PCIe 3.0 that Ivy brings to the table, with a VP5 decoder I reckon even more work will be offloaded to the GPU (with the faster express lane, hopefully with less protocol overhead too).

Also, the link I posted in my previous post has got me very concerned about how the GPU in Ivy works (if the CPU and GPU are both fully utilized at the same time, a bottleneck forms... something like that, if I understood it correctly, because of how Turbo Boost works or something). Worrying, but I can't be sure it's relevant to my case.

Your main focus has been decoding for a display device. Perhaps in time, when that is finalized, you will focus on decoding for transcoding purposes (not QS encoding... just decoding, thanks LOL). But given how busy you are already, decoding for encoding purposes is probably the last thing you need to start dealing with right now...


What would it take for Intel to open their decoder up for direct access? (I couldn't help noticing you didn't answer my previous questions on this matter.)
And if they won't... WHY?

If you don't know... who would?

egur
6th November 2012, 07:59
@Pulp Catalyst
I want to clarify some technical details:
Both Nvidia and Intel do not give you direct access to the HW because that's the driver's job.
Both provide an API (user mode) that abstracts the hassles of dealing with HW.
Intel provides 2 APIs:
1) DXVA - decode and video processing.
2) Media SDK - decode, video processing & encoding.
The latter is more abstract (much less code for developers) and allows encoding, which is not available in DXVA (Microsoft didn't define an encoding API).
The Media SDK dependency on DXVA/D3D9 is not a real problem when the display is connected to iGPU.

All HW decoders output NV12. This surface type is the best optimized surface for HW implementation.
QS doesn't actually output frames in system memory, my implementation copies (as fast as possible) the D3D surfaces from the GPU memory space. GPU memory space is organized differently than the CPU memory space so the copy is ~half the speed of standard memory copy.
ffdshow copies the output yet again to the surface provided by the downstream filter (usually renderer). Instead of copying, it may perform surface type or colorspace conversion.
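A rough sense of how much data that copy-back moves: a tightly packed NV12 frame is width x height bytes of luma plus half that again for the interleaved chroma. In the sketch below the function names are hypothetical and `copy_gb_per_s` is an assumed effective copy rate, not a measured one; it also ignores surface padding.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Size of one tightly packed NV12 frame: Y plane plus half-height UV plane."""
    return width * height * 3 // 2


def copy_cost(width: int, height: int, fps: float,
              copy_gb_per_s: float) -> tuple[float, float]:
    """Return (MB/s copied out of GPU memory, milliseconds of copying per frame).

    copy_gb_per_s is an assumed effective GPU-to-system copy rate."""
    frame = nv12_frame_bytes(width, height)
    mb_per_s = frame * fps / 1e6
    ms_per_frame = frame / (copy_gb_per_s * 1e9) * 1e3
    return mb_per_s, ms_per_frame
```

A 1920x1080 frame is 3,110,400 bytes, so 24 fps means roughly 75 MB/s leaving GPU memory. Halving the assumed copy rate (in line with the "~half the speed" remark above) doubles the per-frame copy time, and the second copy into the downstream filter's buffer comes on top of that.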

I'm interested in improving performance in all domains related to decoding, and I think there are still things to be done.
BTW the HW encoder can be used freely by developers, including open source via the Media SDK. It's not too complicated but I don't have the bandwidth to add it to my code and integrate it into an encoder that will work under MeGUI.

I couldn't make Avisynth work at all with DirectShowSource - I'm having setup troubles :(

Pulp Catalyst
6th November 2012, 09:57
Yeah, I knew about the API. Reading back what I wrote... I really shouldn't have used the word "direct", sorry. Nothing is ever direct, or there would be BSODs all over the place LOL.

What I meant is that ffdshow is playing piggy in the middle, whereas...

The following is quite informative about the DGDecNV development process; it's great that neuron2 has shared it for others to see.

It would be great if you could take the time to read it and comment on how it could relate to Intel's API, and whether it's possible at all.
(There is a lot I don't understand, but I'm sure you will.)

http://neuron2.net/dgdecnv/cuda/cuda.html

(it's a long read, you may need a couple of coffees if you decide to read it),

I will understand if you don't!

andybkma
6th November 2012, 10:16
Greets, I have a question about how QS is implemented differently in ffdshow vs LAV on my Ivy Bridge i7-3610QM. I have two AVC vids that play fine in Zoom Player using LAV Splitter + LAV Video QS. But when I use ffdshow as the QS decoder (with defaults) with the same LAV Splitter, those two vids play too fast (above normal speed and out of sync). If I disable "Enable Time Stamp correction" in ffdshow, those two vids play a little better and stay in sync, but seeking is still a tad off (they play fast, then revert to what should be normal speed). They still don't play nearly as nicely as when I use LAV Video as the QS decoder. And if I then play other AVC vids with "Enable Time Stamp Correction" left off, I get a different seeking problem, so that's not a viable solution.

Which leads me back to wondering why I am getting such different results with these two vids using two different QS decoders but the same LAV Splitter. Probably a bug in ffdshow? I'm using the newest clsid build, rev 4489.

egur, if you want, I can pm you the file download links to the two vids I am having probs with in ffdshow using your QS decoder...

Note: In case anyone is wondering why I am reporting this problem with ffdshow and hoping it will get fixed: ffdshow has that realtime avisynth plugin option (I use Limited Sharpen Faster). If anyone can tell me how to use a realtime avisynth plugin with LAV without ffdshow, that would be great.

egur
6th November 2012, 19:31
@andybkma
The main difference between ffdshow and LAV in that respect is that LAV computes the time stamps based on heuristics derived from information which is not known to my decoder. This usually works best (not 100%). ffdshow doesn't do it, so I calculate the time stamps (and frame rate) from the stream itself which in some (rare) cases doesn't work, especially if the splitter is buggy (e.g. Haali). LAV splitter is the best choice.
I would appreciate a sample since all my test clips work fine.
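For readers wondering what deriving time stamps "from the stream itself" involves: DirectShow expresses media times as REFERENCE_TIME values in 100-nanosecond units, so the AvgTimePerFrame of 417084 reported for a 23.976 fps clip later in this thread corresponds to 10,000,000 / 417084 ≈ 23.976 frames per second. The Python sketch below is an illustration only, assuming a constant frame rate and ignoring B-frame reordering; the function names are hypothetical and this is not the decoder's actual algorithm.

```python
from fractions import Fraction

# DirectShow REFERENCE_TIME values are in 100-nanosecond units.
UNITS_PER_SECOND = 10_000_000


def frame_rate_from_avg_time_per_frame(avg_time_per_frame: int) -> float:
    """Frame rate implied by a VIDEOINFOHEADER AvgTimePerFrame value."""
    return UNITS_PER_SECOND / avg_time_per_frame


def start_time_for_frame(first_start: int, n: int, fps: Fraction) -> int:
    """Hypothetical helper: start time of frame n for a constant frame rate.

    Uses exact rational arithmetic from the frame index, so rounding error
    does not build up from frame to frame.
    """
    return first_start + int(n * UNITS_PER_SECOND / fps)


if __name__ == "__main__":
    ntsc_film = Fraction(24000, 1001)  # nominal "23.976" fps
    print(f"{frame_rate_from_avg_time_per_frame(417084):.3f} fps")
    print(f"{frame_rate_from_avg_time_per_frame(200000):.3f} fps")
    for n in range(3):
        print(n, start_time_for_frame(0, n, ntsc_film))
```

Computing each start time from the frame index, rather than repeatedly adding a rounded duration, keeps the error from accumulating: the exact 24000/1001 frame duration is 417083.33 units, and summing 417084 per frame drifts by 16,000 units (1.6 ms) after 24,000 frames.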

BTW ffdshow can be used as a post processor (e.g. raw video).

@Pulp Catalyst
I read part of the long post. Anything special you wanted me to notice?

Pulp Catalyst
6th November 2012, 22:53
didn't know whether any of it would help, as there is some overlap here with what you're doing in a way, thought maybe you would get some ideas or something.... obviously not LOL

to me what neuron2 has already done seemed familiar in some way to what you're doing now (thought there could be some info in his post that would explain why ffdshow decoding above REALTIME speed gives such poor results for me with Intel)

his system is a frameserver (frames come back directly to avisynth, for example); maybe it's this difference in design that makes all the difference, because ffdshow is not a frameserver, and the end goal for ffdshow is to play back frames in real time or better on a display device...... somewhere between these two worlds there must be a difference which would explain the lack of performance i'm getting from Intel decoding of high def...... although your system and a frameserver share many similar design elements, there must be a branch somewhere near the end or something (clearly this is where i have no clue..... just shouting out some possible ideas)

feel free to shoot me down if i'm way off base....which i probably am...

andybkma
7th November 2012, 02:43
Thanks for your detailed explanation :-) PM sent with clip download link

egur
7th November 2012, 22:28
Version 0.40 is out with the following changes:
* Removed all MT code, cleaning up the design. MT copy is still here
* Wrote basic AVX2 copy function (unused and untested).
* Enabled DVD decode. Not enabled well in ffdshow. Used in LAV 0.53 and up.
* Out of beta!
* FFDShow: r4490

Downloads
* For the latest cutting-edge FFDShow builds, download my builds from the Intel QuickSync Decoder SourceForge home page (http://sourceforge.net/projects/qsdecoder/)
* FFDShow-tryout site (http://ffdshow-tryout.sourceforge.net/download.php)
* LAV Splitter builds (http://forum.doom9.org/showthread.php?t=156191)

MarkT
9th November 2012, 03:38
Hi,

I am using the most recent LAVFilters 0.53 build from yesterday and am trying to decode H.264 video (TV) with QuickSync. DXVA native works, and QuickSync used to work too, but for the last couple of Intel video driver releases on my Ivy Bridge HD 4000 CPU it falls back to avcodec for H.264. The most recent 9.17.10.2875 driver also makes it fall back. Is this supposed to be that way? :D

I imagine something with regard to the SDK was changed some weeks ago, but it is out of my league to determine what that might be.

Thanks,
Mark

egur
9th November 2012, 10:42
@Mark
What about playing files?

Anyone else had this issue?

MarkT
9th November 2012, 17:51
@egur

File works:

Filter : LAV Video Decoder - CLSID : {EE30215D-164F-4A92-A4EB-9D4C13390F9F}

- Connected to:

CLSID: {171252A0-8820-4AFE-9DF8-5C92B2D66B04}
Filter: LAV Splitter
Pin: Video

- Connection media type:

Video: MPEG4 Video (H264) 1920x816 23.976fps

AM_MEDIA_TYPE:
majortype: MEDIATYPE_Video {73646976-0000-0010-8000-00AA00389B71}
subtype: Unknown GUID Name {31435641-0000-0010-8000-00AA00389B71}
formattype: FORMAT_MPEG2_VIDEO {E06D80E3-DB46-11CF-B4D1-00805F6CBBEA}
bFixedSizeSamples: 0
bTemporalCompression: 1
lSampleSize: 1
cbFormat: 165

VIDEOINFOHEADER:
rcSource: (0,0)-(1920,816)
rcTarget: (0,0)-(1920,816)
dwBitRate: 0
dwBitErrorRate: 0
AvgTimePerFrame: 417084

VIDEOINFOHEADER2:
dwInterlaceFlags: 0x00000000
dwCopyProtectFlags: 0x00000000
dwPictAspectRatioX: 40
dwPictAspectRatioY: 17
dwControlFlags: 0x00000000
dwReserved2: 0x00000000

MPEG2VIDEOINFO:
dwStartTimeCode: 0
cbSequenceHeader: 33
dwProfile: 0x00000064
dwLevel: 0x00000029
dwFlags: 0x00000004

BITMAPINFOHEADER:
biSize: 40
biWidth: 1920
biHeight: 816
biPlanes: 1
biBitCount: 12
biCompression: AVC1
biSizeImage: 2350080
biXPelsPerMeter: 0
biYPelsPerMeter: 0
biClrUsed: 0
biClrImportant: 0

pbFormat:
0000: 00 00 00 00 00 00 00 00 80 07 00 00 30 03 00 00 ........€...0...
0010: 00 00 00 00 00 00 00 00 80 07 00 00 30 03 00 00 ........€...0...
0020: 00 00 00 00 00 00 00 00 3c 5d 06 00 00 00 00 00 ........<]......
0030: 00 00 00 00 00 00 00 00 28 00 00 00 11 00 00 00 ........(.......
0040: 00 00 00 00 00 00 00 00 28 00 00 00 80 07 00 00 ........(...€...
0050: 30 03 00 00 01 00 0c 00 41 56 43 31 00 dc 23 00 0.......AVC1.#.
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0070: 00 00 00 00 21 00 00 00 64 00 00 00 29 00 00 00 ....!...d...)...
0080: 04 00 00 00|00 18 67 64 00 29 ac 34 e5 01 e0 19 ......gd.)4..
0090: e8 40 00 65 d3 c0 13 12 d0 23 c6 0c 65 80 00 05 @.e..#.e€..
00a0: 68 ee b2 c8 b0 hȰ


I recorded a small segment in DVB-Viewer and playing it in MPC-HC it also uses QuickSync:

Filter : LAV Video Decoder - CLSID : {EE30215D-164F-4A92-A4EB-9D4C13390F9F}

- Connected to:

CLSID: {171252A0-8820-4AFE-9DF8-5C92B2D66B04}
Filter: LAV Splitter
Pin: Video

- Connection media type:

Video: MPEG4 Video (H264) 1280x720 50fps

AM_MEDIA_TYPE:
majortype: MEDIATYPE_Video {73646976-0000-0010-8000-00AA00389B71}
subtype: Unknown GUID Name {31435641-0000-0010-8000-00AA00389B71}
formattype: FORMAT_MPEG2_VIDEO {E06D80E3-DB46-11CF-B4D1-00805F6CBBEA}
bFixedSizeSamples: 0
bTemporalCompression: 1
lSampleSize: 1
cbFormat: 288

VIDEOINFOHEADER:
rcSource: (0,0)-(1280,720)
rcTarget: (0,0)-(1280,720)
dwBitRate: 0
dwBitErrorRate: 0
AvgTimePerFrame: 200000

VIDEOINFOHEADER2:
dwInterlaceFlags: 0x00000000
dwCopyProtectFlags: 0x00000000
dwPictAspectRatioX: 16
dwPictAspectRatioY: 9
dwControlFlags: 0x00000000
dwReserved2: 0x00000000

MPEG2VIDEOINFO:
dwStartTimeCode: 0
cbSequenceHeader: 156
dwProfile: 0x00000064
dwLevel: 0x00000028
dwFlags: 0x00000004

BITMAPINFOHEADER:
biSize: 40
biWidth: 1280
biHeight: 720
biPlanes: 1
biBitCount: 12
biCompression: AVC1
biSizeImage: 1382400
biXPelsPerMeter: 0
biYPelsPerMeter: 0
biClrUsed: 0
biClrImportant: 0

pbFormat:
0000: 00 00 00 00 00 00 00 00 00 05 00 00 d0 02 00 00 ...............
0010: 00 00 00 00 00 00 00 00 00 05 00 00 d0 02 00 00 ...............
0020: 00 00 00 00 00 00 00 00 40 0d 03 00 00 00 00 00 ........@.......
0030: 00 00 00 00 00 00 00 00 10 00 00 00 09 00 00 00 ................
0040: 00 00 00 00 00 00 00 00 28 00 00 00 00 05 00 00 ........(.......
0050: d0 02 00 00 01 00 0c 00 41 56 43 31 00 18 15 00 .......AVC1....
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0070: 00 00 00 00 9c 00 00 00 64 00 00 00 28 00 00 00 ....œ...d...(...
0080: 04 00 00 00|00 27 27 64 00 28 ac c8 60 14 01 6e .....''d.(`..n
0090: c0 5a 80 80 80 f8 00 00 03 00 08 00 00 03 03 27 Z€€€.........'
00a0: 44 00 05 7b c0 00 36 65 d7 bd c0 50 00 00 05 28 D..{.6e׽P...(
00b0: fa 43 cb 00 00 61 28 5e 90 f3 82 80 24 42 00 82 C..a(^‚€$B.‚
00c0: 10 84 61 00 c0 80 f7 06 1f ff e0 c3 ff fc 0b 01 .„a.€....
00d0: 08 43 11 c2 02 04 84 21 88 30 ff ff 06 1f ff e0 .C...„!ˆ0..
00e0: a0 18 43 01 90 c0 43 e0 80 22 10 ff c2 18 20 c1 *.C.C€"..
00f0: 0e 11 84 21 88 88 60 40 90 8c 48 80 b7 ff f0 2c ..„!ˆˆ`@ŒH€,
0100: 08 88 c4 71 1f f8 8f c4 7f 11 88 13 88 88 c4 88 .ˆq..ˆ.ˆˆˆ
0110: f1 1c 21 88 88 c0 00 00 07 28 7e 90 f3 00 c0 00 .!ˆˆ...(~..



I am guessing now it's an interaction between the DVB source filter and LAV, not related to your QuickSync code. I have to experiment some more and probably talk to the LAV author in the other thread. Thanks!
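The header fields in the two media type dumps above (and in the VC-1 dump further down) are mutually consistent, which is a handy sanity check when reading such reports. With biBitCount 12 (4:2:0 sampling: a full-resolution luma plane plus two quarter-size chroma planes), biSizeImage is width × height × 12 / 8, and for square pixels the picture aspect ratio is the frame size reduced by its greatest common divisor. A minimal Python sketch, assuming square pixels; the function names are made up for illustration.

```python
from math import gcd
from typing import Tuple


def image_size_bytes(width: int, height: int, bits_per_pixel: int = 12) -> int:
    """biSizeImage for an uncompressed frame; 12 bpp corresponds to 4:2:0."""
    return width * height * bits_per_pixel // 8


def picture_aspect_ratio(width: int, height: int) -> Tuple[int, int]:
    """Frame size in lowest terms (equals the picture AR only for square pixels)."""
    g = gcd(width, height)
    return width // g, height // g


if __name__ == "__main__":
    for w, h in [(1920, 816), (1280, 720), (1920, 1080)]:
        print(w, h, image_size_bytes(w, h), picture_aspect_ratio(w, h))
```

This reproduces the biSizeImage values 2350080, 1382400 and 3110400 and the dwPictAspectRatio values 40:17, 16:9 and 16:9 from the dumps.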

corporalgator
11th November 2012, 19:49
I'm having trouble with a particular file while using quicksync. I was on lav .52, so I updated to .53 and it didn't get any better. First question is, do I need to download the quicksync update separately or is it automatically included with lav .53?

So, when I play this file, a blu-ray remuxed into an mkv, every 5 minutes or so, a few of the frames become corrupted with several blocks of random solid colors. If I switch the lav video decoder to dxva, the problem goes away.

Here's the information on the file:


- Connected to:

CLSID: {B98D13E7-55DB-4385-A33D-09FD1BA26338}
Filter: LAV Splitter Source
Pin: Video

- Connection media type:

Video: WVC1 1920x1080 23.976fps

AM_MEDIA_TYPE:
majortype: MEDIATYPE_Video {73646976-0000-0010-8000-00AA00389B71}
subtype: Unknown GUID Name {31435657-0000-0010-8000-00AA00389B71}
formattype: FORMAT_VideoInfo2 {F72A76A0-EB0A-11D0-ACE4-0000C0CC16BA}
bFixedSizeSamples: 0
bTemporalCompression: 1
lSampleSize: 1
cbFormat: 146

VIDEOINFOHEADER:
rcSource: (0,0)-(1920,1080)
rcTarget: (0,0)-(1920,1080)
dwBitRate: 0
dwBitErrorRate: 0
AvgTimePerFrame: 417083

VIDEOINFOHEADER2:
dwInterlaceFlags: 0x00000000
dwCopyProtectFlags: 0x00000000
dwPictAspectRatioX: 16
dwPictAspectRatioY: 9
dwControlFlags: 0x00000000
dwReserved2: 0x00000000

BITMAPINFOHEADER:
biSize: 74
biWidth: 1920
biHeight: 1080
biPlanes: 1
biBitCount: 12
biCompression: WVC1
biSizeImage: 3110400
biXPelsPerMeter: 0
biYPelsPerMeter: 0
biClrUsed: 0
biClrImportant: 0

pbFormat:
0000: 00 00 00 00 00 00 00 00 80 07 00 00 38 04 00 00 ........€...8...
0010: 00 00 00 00 00 00 00 00 80 07 00 00 38 04 00 00 ........€...8...
0020: 00 00 00 00 00 00 00 00 3b 5d 06 00 00 00 00 00 ........;]......
0030: 00 00 00 00 00 00 00 00 10 00 00 00 09 00 00 00 ................
0040: 00 00 00 00 00 00 00 00 4a 00 00 00 80 07 00 00 ........J...€...
0050: 38 04 00 00 01 00 0c 00 57 56 43 31 00 76 2f 00 8.......WVC1.v/.
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0070:|00 00 00 01 0f db 7e 3b f2 1b 8a 3b f8 86 f1 80 .....~;.Š;†€
0080: 4a 02 02 03 09 af 27 07 27 04 00 00 01 0e 5a df J....'.'.....Z
0090: f8 40 @





Is this a QuickSync problem or a LAV problem? Do I need to upload a clip?

egur
12th November 2012, 07:54
@MarkT
Nev, the LAV author, has created a fix for the DVB source. Judging by the fix, it seems that the DVB source doesn't fill in the initial media sample correctly, causing QS's stream compatibility check to fail.


@corporalgator
You need to upload the clip, or better yet, just the part containing the issue. It could be a Media SDK issue rather than a driver issue. MediaFire is a recommended file share for this purpose.

CiNcH
12th November 2012, 08:09
It is possible to activate format pre-detection inside the DVBViewer, which will properly set profile and level. It is disabled by default, which means that the decoder has to parse the information from the bitstream all by itself. Leaving format pre-detection disabled works for most decoders and improves channel switching delay.
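When the source filter leaves the profile and level fields of the media type empty, the decoder has to recover them from the bitstream. For the AVC1 media types shown above, MPEG2VIDEOINFO carries the SPS and PPS in dwSequenceHeader, each NAL unit preceded by a two-byte big-endian length. In the first dump, the bytes after the "|" are 00 18 (length 24), then 67 (an SPS NAL header), 64 (profile_idc 100, High) and, after the constraint-flags byte, 29 (level_idc 41, Level 4.1), matching dwProfile 0x64 and dwLevel 0x29. The Python sketch below shows such a fallback; it is a hypothetical illustration, not the actual LAV fix or the QuickSync compatibility check.

```python
from typing import Optional, Tuple

SPS_NAL_TYPE = 7  # H.264 nal_unit_type for a sequence parameter set


def parse_sps_profile_level(seq_header: bytes) -> Optional[Tuple[int, int]]:
    """Return (profile_idc, level_idc) from the first SPS in a sequence header
    made of NAL units with 2-byte big-endian length prefixes (AVC1 style)."""
    pos = 0
    while pos + 2 <= len(seq_header):
        size = int.from_bytes(seq_header[pos:pos + 2], "big")
        nal = seq_header[pos + 2:pos + 2 + size]
        if len(nal) < size:
            raise ValueError("truncated NAL unit in sequence header")
        # NAL header byte, then profile_idc, constraint flags, level_idc.
        if size >= 4 and nal[0] & 0x1F == SPS_NAL_TYPE:
            return nal[1], nal[3]
        pos += 2 + size
    return None


def effective_profile_level(dw_profile: int, dw_level: int,
                            seq_header: bytes) -> Optional[Tuple[int, int]]:
    """Hypothetical fallback: trust the media type fields when they are filled
    in, otherwise parse the values from the SPS."""
    if dw_profile and dw_level:
        return dw_profile, dw_level
    return parse_sps_profile_level(seq_header)


if __name__ == "__main__":
    # dwSequenceHeader bytes (cbSequenceHeader = 33) from the first dump above.
    header = bytes.fromhex(
        "00 18 67 64 00 29 ac 34 e5 01 e0 19 e8 40 00 65 d3 c0 13 12"
        " d0 23 c6 0c 65 80 00 05 68 ee b2 c8 b0")
    print(parse_sps_profile_level(header))        # (100, 41)
    print(effective_profile_level(0, 0, header))  # (100, 41)
```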

MarkT
12th November 2012, 13:22
@egur: Thanks for taking the time, fix works, Nev is the man. ;-)

corporalgator
12th November 2012, 15:46
Here's the link to the clip. It displays the pixelization twice right in the middle.

http://www.mediafire.com/?m92175z7o5r59so

To get the latest QuickSync decoder, do I just need to download the latest LAV Filters, or do I have to download QuickSync separately?

I found others in my library that have the same problem, and they are all VC-1 encoded.

egur
12th November 2012, 16:56
I've reproduced the issues. Looks like a driver bug. I'll report it and hopefully it will get fixed.
Please share more clips that show corruption if you can.

As for the Intel QuickSync Decoder, it's shipped with either ffdshow or LAV. The latest release of either (usually) has the latest version. Don't copy the DLL around between ffdshow and LAV, as the API might have changed. It will usually work, but no guarantees.

As a rule of thumb, ffdshow is updated first because I have permission to update the ffdshow source code. LAV updates very quickly, so if the latest LAV release is recent, it's up to date.
Note that in the future, certain features will be available in LAV and not in ffdshow and vice versa.
Currently the differences are:
LAV: QS DVD playback.
ffdshow: HW deinterlacing, denoise, detail, time stamp correction, soft inverse telecine.

corporalgator
13th November 2012, 05:02
Thanks, I'll upload clips for the others Wednesday probably.

egur
14th November 2012, 21:24
I just got hold of the newest driver - 15.28.8.64.2875.
It fixes your clip, so next released driver will do the job. The driver was built before my report so I guess this was a known issue.

Please share more failing clips.
As a rule of thumb, if libavcodec/ffmpeg shows no issues and QuickSync does, this is a good clip to report.
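Following that rule of thumb, one way to locate the corrupted frames before uploading a clip is to decode the same frames with libavcodec and with QuickSync, dump the luma planes, and compare them frame by frame. The Python sketch below assumes the planes have already been extracted as raw 8-bit bytes of equal size (how to dump them is outside its scope); the 30 dB PSNR threshold is an arbitrary assumption, chosen only because gross corruption such as the blocks of solid color reported above falls far below it.

```python
import math
from typing import List, Sequence


def psnr(reference: bytes, candidate: bytes, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two 8-bit planes."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("planes must be non-empty and the same size")
    sse = sum((a - b) ** 2 for a, b in zip(reference, candidate))
    if sse == 0:
        return math.inf  # bit-exact match
    mse = sse / len(reference)
    return 10 * math.log10(peak * peak / mse)


def suspicious_frames(reference_frames: Sequence[bytes],
                      quicksync_frames: Sequence[bytes],
                      threshold_db: float = 30.0) -> List[int]:
    """Indices of frames whose QuickSync output diverges from the reference."""
    return [i for i, (ref, qs) in enumerate(zip(reference_frames, quicksync_frames))
            if psnr(ref, qs) < threshold_db]


if __name__ == "__main__":
    clean = bytes([128] * 64)
    corrupt = bytes([128] * 32 + [0] * 32)  # half the plane replaced by a solid block
    print(suspicious_frames([clean, clean], [clean, corrupt]))  # [1]
```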

corporalgator
15th November 2012, 06:03
Here's another one doing the same thing. You'll be able to see which movie it is, heh.

http://www.mediafire.com/?yzrmqsxy19g7m7b

So basically the next version of lav filters will have this fixed?

nevcairiel
15th November 2012, 07:32
No, the next Intel driver fixes this (a version higher than 2875 at least). You need to update the driver, not LAV. :)