
Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing



CruNcher
24th February 2012, 11:48
Was Aero off (especially the glass shader)?

NikosD
24th February 2012, 12:44
Eric, Nevcairiel, Cruncher

In my opinion there is no such thing as a NULL Renderer.
In real life scenarios - during playback - you will always use a renderer.

If you have such a performance loss just from using a renderer (EVR), the first thing you should do is improve the performance of the QuickSync decoder.

If you think it's not worth it, or you can't improve it due to software or hardware limitations, then make a native DXVA decoder using Intel's MSDK for H.264, MPEG-2 and VC-1.

The performance of the QuickSync decoder as it is right now should not make you happy, given hardware like QuickSync.

The performance of LAV DXVA copy-back is embarrassing for QS HW.

nevcairiel
24th February 2012, 12:51
In real life scenarios - during playback - you will always use a renderer.


In real life scenarios - during playback - you'll watch content at a maximum of 60 fps.
Why would anyone care if it outputs 250 or 350 fps in benchmarks?

The decoder is not slow (in fact, it's very fast); what's making it slow is the renderer. Sure, it's probably possible to make the benchmark go faster, but... why? All it changes is the benchmark; playback will be exactly the same.
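To put this point in numbers: playback only needs each frame delivered within one frame interval, so benchmark fps above the display rate is idle headroom. A minimal sketch, assuming a steady per-frame cost; 250 and 350 fps are the example figures from the post, and the function names are hypothetical.

```python
def frame_budget_ms(fps: float) -> float:
    """Time per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def headroom(decoder_fps: float, playback_fps: float = 60.0) -> float:
    """Fraction of each playback frame interval left idle when the decoder
    can sustain decoder_fps (assumes a constant per-frame cost)."""
    return 1.0 - frame_budget_ms(decoder_fps) / frame_budget_ms(playback_fps)


for bench in (250.0, 350.0):
    print(f"{bench:.0f} fps -> {frame_budget_ms(bench):.2f} ms/frame, "
          f"{headroom(bench):.0%} of a 60 fps interval idle")
```

Both figures leave most of each 16.7 ms interval idle at 60 fps (about 76% and 83%), which is why going from 250 to 350 fps changes the benchmark but not playback.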

If you think you need 400 fps with EVR to watch a movie, then make a native DXVA decoder using Intel's MSDK for H.264, MPEG-2 and VC-1. :)

NikosD
24th February 2012, 12:57
Because:

A) There is always the possibility of a new Intel driver for SNB, allowing 4K decoding of H.264 - where you would need every fps possible.

B) There is always the pressure "from inside" the perfectionist developer - like Eric - to optimize as much as he can.
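As a rough illustration of point A: 3840x2160 has four times the pixels of 1920x1080. If decode cost scaled linearly with pixel count (an assumption; it ignores the fixed-function HW limits discussed in the replies), a 1080p benchmark figure would shrink by about 4x at 4K. The function names are hypothetical.

```python
def pixels(width: int, height: int) -> int:
    return width * height


def scaled_fps(fps_1080p: float, width: int, height: int) -> float:
    """Estimate fps at another resolution, ASSUMING cost ~ pixel count."""
    return fps_1080p * pixels(1920, 1080) / pixels(width, height)


print(pixels(3840, 2160) / pixels(1920, 1080))  # 4.0
print(scaled_fps(250.0, 3840, 2160))            # 62.5
```

Under that assumption, a decoder doing 250 fps at 1080p would manage only about 62 fps at 4K, barely above a 60 fps playback target.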

egur
24th February 2012, 13:51
Because:

A) There is always the possibility of a new Intel driver for SNB, allowing 4K decoding of H.264 - where you would need every fps possible.

B) There is always the pressure "from inside" the perfectionist developer - like Eric - to optimize as much as he can.

I don't believe SNB will ever support 4K because of HW limitations. Luckily for most users, 4K is a very small niche at the moment, with low bitrates that ffmpeg can handle without problems (on SNB).

I plan to investigate EVR performance a little more, but not much more. I seem to have hit a wall here. Like Nev said, performance is already very good.

I personally don't like the DXVA route as it cripples the SW architecture severely.

Regarding performance: if you look at CPU utilization and power usage during playback, the state is very good. MT doesn't help that one bit. It only reduces latency and time spent in the decoding thread, which allows faster seeks, a good feature.
But benchmark results may be (are) tainted by locks and waits associated with MT: the CPU can wait for queues to fill or operations to finish. These waits are meaningless with respect to playback efficiency; they only affect "full speed" playback.

Nev is working on a DXVA decoder which should have the same performance as other pure DXVA solutions. I want to concentrate on HW video processing in the near future.

nevcairiel
24th February 2012, 13:53
Nev is working on a DXVA decoder which should have same performance as other pure DXVA solutions. I want to concentrate on HW video processing in the near future.

That would be much more fun if Intel would adhere to DXVA "standards", or at least disclose the differences in their implementation. :p

Apparently, only the MSDK really knows how it's supposed to work with Intel (which is why it would also be nice to get support for older GPUs back somehow).

egur
24th February 2012, 13:57
So this performance issue is only a problem if you use QuickSync while rendering on the iGPU at the same time?
Though it makes sense: the MFX uses the EUs, and if the EUs are under pressure there should be a performance impact. Using EVR pressures the EUs the same way Aero (DWM) does, and with deinterlacing at the same time I guess the pressure would be even higher (you can actually measure the overhead; it's small, but it's there).
I'm pretty sure that with encoding (H.264, where it's even official that the MFX uses the EUs for motion estimation) and rendering directly out (EVR) at the same time, you're going to see the same effect (any PP in the Intel Control Panel might add even more stress).

Aero was off for the tests.
I didn't test with deinterlacing active so that's another issue.
The EUs don't do much for decode, video processing or encode. The bulk of the work is done via ASIC (fixed-function HW). That's why it's so fast. That's also why it's hard or impossible to add features (codecs, profiles, etc).
You can't have your cake and eat it ;)

When EVR uses the same HW, it burdens the memory system. Since I don't know the internals of EVR, it would be hard to find the optimal way of using it with respect to performance.
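For a sense of the memory traffic involved: an NV12 frame (the only format the decoder outputs, as egur notes later in the thread) holds a full-resolution 8-bit Y plane plus an interleaved UV plane at half resolution in both directions, i.e. 1.5 bytes per pixel. The sketch below counts bytes moved per second for a copy that reads each frame once and writes it once; that single read plus single write is an assumption, and a real path through EVR may touch the data more often.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """NV12: Y plane (w*h bytes) + interleaved UV plane (w*h/2 bytes).
    Assumes even dimensions and no row padding."""
    return width * height * 3 // 2


def copy_traffic_mb_per_s(width: int, height: int, fps: float,
                          touches: int = 2) -> float:
    """MB/s moved if every frame is read once and written once (touches=2)."""
    return nv12_frame_bytes(width, height) * touches * fps / 1e6


print(nv12_frame_bytes(1920, 1080))                      # 3110400
print(round(copy_traffic_mb_per_s(1920, 1080, 60), 1))   # 373.2
print(round(copy_traffic_mb_per_s(1920, 1080, 250), 1))  # 1555.2
```

A 1080p decode pushed to benchmark speed moves several times the bytes that 60 fps playback needs, which is consistent with EVR and the decoder contending for the shared memory system.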

NikosD
24th February 2012, 14:13
I don't believe SNB will ever support 4K because of HW limitations. Lucky for most users is that 4K is a very small niche ATM with low bitrates that ffmpeg can handle without problems (on SNB).


HW limitations ? That's a new one.

I remember a previous post of yours explaining that, in theory, the QS ASIC is capable of processing 4K resolutions (both pixel count and decoding bandwidth).


Regarding performance. If you look at CPU utilization and power usage during playback, the state is very good. MT doesn't help that one bit. It only reduces latency and time spent in the decoding thread. This allows faster seeks which is a good feature.


The last time I checked playback, the QuickSync decoder was still drawing a lot of power compared to native DXVA implementations, especially for 60 fps clips (even low-bitrate ones).

I wouldn't recommend the QuickSync decoder, as it is right now, to laptop users, especially when there are such fast and efficient native DXVA implementations as PotPlayer's internal codecs, MS DS/MFT, CoreAVC, etc.

And bugless too (as bugless as they can be with Intel hardware/drivers)

If Intel doesn't want to make a 4K driver for SNB, it should at least open VC-1 VLD to all and provide the appropriate documentation for native DXVA implementations of all formats (H.264, MPEG-2, VC-1).

Grow up Intel :sly:

andyvt
24th February 2012, 14:59
I wouldn't recommend QuickSync decoder - as it is right now - to laptop users, especially when there are so fast and efficient native DXVA implementations like PotPlayer internal codecs and MS DS/MFT, CoreAVC etc.

On most laptops the screen is so poor that there's no benefit to messing with any of this stuff.

egur
24th February 2012, 22:32
@NikosD,
You can recommend or not recommend whatever you want.

QS decoder was meant for those who find Microsoft's decoder insufficient and/or want the SW-like look and feel so they can keep using their video setup without dramatic changes.
There's no competition between my decoder and CoreAVC or any other proprietary codec. In fact, I've released my source code under the BSD license so they can use it, in whole or as reference code, and we can all enjoy high quality video using the HW resources.

The MS decoder is free, and yet there's room for CoreAVC, ffdshow, LAV, CyberLink, Arcsoft, etc. Why? Because the MS decoder isn't working very well and can't be used on a daily basis (for many people).

Since my resources are very limited, I have to channel my efforts to what people want most.

If you don't appreciate what I do, then there are other threads on doom9 for you to post in. Let's keep the discussions civilized.

NikosD
25th February 2012, 10:39
The MS DS decoder sure has some problems.

That's why I put the asterisk (*) whenever I find out decoding bugs (Artifacts) during playback or benchmarking.

I think the asterisk is obvious in my benchmarks post.

Also, MS MFT is one of the fastest decoders with no decoding errors (as far as I can tell), but it's limited to WMP12 and other Media Foundation players.

As for other decoders, I can say competition is good as long as it produces better products.

I have liked your effort from the beginning; that's why I have contributed a lot, I think, to improve it: not from the developer's view (I'm not a developer), but from a user's view with some knowledge and skills to test and push some things forward.

I take every opportunity I can get to be "uncivilized", not to you personally, but to every tactic I see from big companies like Intel, AMD and Nvidia that is against the majority of us users.

And because I respect you personally, I will stop "attacking" Intel, because you work for them.

But you have to understand, even if you aren't responsible, that you "represent" Intel here in a way.

wanezhiling
25th February 2012, 11:35
@NikosD
I think PotPlayer's internal DXVA decoder is good enough on Intel, except that ModeVC1_VLD isn't accessible.

NikosD
25th February 2012, 16:31
The new DXVA checker v2.8.0 (it's in beta) has fixed the problem with LAV Video and Basketball clip and it will support renderless VMR and EVR benchmark modes.

egur
26th February 2012, 21:27
The new DXVA checker v2.8.0 (it's in beta) has fixed the problem with LAV Video and Basketball clip and it will support renderless VMR and EVR benchmark modes.

Made some nice progress optimizing the flow for EVR workloads. I'll release a new version in a few days.

BTW, where can I download DXVA checker v2.8.0 beta?

CruNcher
26th February 2012, 21:35
I wonder if it clears up the break-up issues when decoding + PP (ffdshow) under heavy CPU core load (http://forum.doom9.org/showpost.php?p=1558451&postcount=785), which Microsoft's and MPC-HC's decoders have fewer or no issues with :)

wanezhiling
27th February 2012, 10:08
15.26.3.64.2639 is out, only for IVY.

http://i.imgur.com/8isVE.png

nevcairiel
27th February 2012, 10:20
15.26.3.64.2639 is out, only for IVY.

http://i.imgur.com/8isVE.png

A driver is "out" when it's available on Intel's site, and not on some Chinese website. :p
Still more than a month until the Ivy release.

egur
27th February 2012, 10:28
15.26.3.64.2639 is out, only for IVY.

http://i.imgur.com/8isVE.png

It's an IvyBridge driver (15.26 family), but it also installs on SandyBridge.
You can get it from Intel's download center here (http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=20910&keyword=2639&DownloadType=Drivers&lang=eng). IvyBridge Drivers are available by selecting "Desktop boards" from the download center, then select one of the IvyBridge boards (e.g. 77 series). Don't download drivers from unofficial sites.

Adds initial WMV9 HW support. Not perfect. Otherwise nothing major with respect to video as far as I've seen. Doesn't fix any issues I know about.

With the exception of developers or very bored people, users should stick with the current SNB drivers (2509, 2559, 2622).

wanezhiling
27th February 2012, 12:04
IvyBridge driver (15.26 family) but also installs on SandyBridge.
You can get it from Intel's download center here (http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=20910&keyword=2639&DownloadType=Drivers&lang=eng). IvyBridge Drivers are available by selecting "Desktop boards" from the download center, then select one of the IvyBridge boards (e.g. 77 series). Don't download drivers from unofficial sites.
Thanks Eric. I DID DOWNLOAD it from Intel's official site. :)


Adds initial WMV9 HW support.
PotPlayer's QuickSync decoder has supported WMV9 since they integrated your DLL in PotPlayer 1.5.30927 (19.12.2011). How did they do that?
http://i.imgur.com/tmVc2.jpg

nevcairiel
27th February 2012, 12:08
PotPlayer' QuickSync decoder could support WMV9 since they integrated your DLL in PotPlayer 1.5.30927(19.12.2011), how did they do that?
http://i.imgur.com/tmVc2.jpg

The QuickSync decoder has always supported WMV9, but check your CPU usage: until now it did so in software.

egur
27th February 2012, 12:13
...
PotPlayer' QuickSync decoder could support WMV9 since they integrated your DLL in PotPlayer 1.5.30927(19.12.2011), how did they do that?
http://i.imgur.com/tmVc2.jpg

I added support for WMV9 months ago. The SandyBridge drivers only offered support for SW decoding but as soon as HW acceleration was available I tested it and it worked.
BTW, the 2622 driver doesn't have HW acceleration for WMV9 but 2639 does. Actually all 15.26 drivers have WMV9 support. BTW, 2622 belongs to the 15.22 family.
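The driver-family rule in this post can be written as a small check. A sketch only: it parses Intel-style version strings such as "15.26.3.64.2639" or "15.22.54.2622", treats the first two fields as the driver family, and encodes just the two data points given here (15.22 without WMV9 HW decode, 15.26 with it). Treating every later family as supported is an assumption, the function names are hypothetical, and the Windows-style numbering seen elsewhere in the thread (e.g. 8.15.10.2622) is not handled.

```python
def driver_family(version: str) -> tuple[int, int]:
    """Return (major, minor), e.g. "15.26.3.64.2639" -> (15, 26)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def has_wmv9_hw_decode(version: str) -> bool:
    # Per the post: 15.22 (e.g. build 2622) lacks WMV9 HW decode, 15.26 has it.
    # ASSUMPTION: families newer than 15.26 keep it.
    return driver_family(version) >= (15, 26)
```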

CruNcher
27th February 2012, 13:04
I already saw this support creeping out in the preview Sandy Bridge driver 2626 with QuickSync; I hope 2639 is more stable ;)
Though I guess modding this driver will be a little harder this time, as it's most probably missing all the main hardware IDs for SandyBridge, not only the specific subsystem ones.

wanezhiling
27th February 2012, 13:12
I added support for WMV9 months ago. The SandyBridge drivers only offered support for SW decoding but as soon as HW acceleration was available I tested it and it worked.
BTW, the 2622 driver doesn't have HW acceleration for WMV9 but 2639 does. Actually all 15.26 drivers have WMV9 support.

Er... I'm confused. QS is a HW decoder, so you mean that in fact this (http://i.imgur.com/tmVc2.jpg) is in software mode, not QS mode?

CruNcher
27th February 2012, 13:16
Er..I'm confused, QS is a HW decoder,so you mean in fact this (http://i.imgur.com/tmVc2.jpg) is in software mode not QS mode?

Yep, Intel can dynamically dispatch between their HW decoder DLL and software decoder DLL based on the bitstream. Clever thing; in theory they could also offer a 4:2:2 or 10-bit fallback, for example, though their software decoder doesn't support either of those ;)
If you install 2626, this fallback is disabled and it tries to decode the bitstream in hardware, which looked odd (like trying to play a DRM-encrypted WMV: lots of strange colored blocks).
Though I didn't look into it much.

egur
27th February 2012, 13:19
Er..I'm confused, QS is a HW decoder,so you mean in fact this (http://i.imgur.com/tmVc2.jpg) is in software mode not QS mode?

Yes, SW mode.

In some cases, the Media SDK HW DLL (libmfxhwXX.dll) will work in SW mode (i.e. SW fallback) if it doesn't support a profile.
Some examples:
* Video is wider or taller than 1080p.
* WMV9 (VC1 simple and main profiles)

Some features may exist (currently) only in the SW version of the MSDK dll (libmfxswXX.dll), available as part of the Media SDK 2012 install:
* H264-3D: stereo or MVC profiles
* MJPG
The above are meant for developers to test their code before HW support is available.
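The fallback cases listed above can be summarized as a predicate. This is a hypothetical sketch, not the Media SDK's actual logic: it only encodes the examples in this post, assumes "wider or taller than 1080p" means beyond a 1920x1088 coded frame (1080 rounded up to a multiple of 16), and uses made-up codec/profile labels. The `wmv9_hw` flag stands for a driver that accelerates WMV9 (per the earlier posts, the 15.26 family).

```python
from typing import Optional

# ASSUMPTION: the 1080p limit expressed as a 16-aligned coded size.
MAX_HW_WIDTH, MAX_HW_HEIGHT = 1920, 1088


def likely_sw_path(codec: str, profile: Optional[str], width: int,
                   height: int, wmv9_hw: bool = False) -> bool:
    """Guess whether decoding ends up in software rather than the ASIC."""
    if width > MAX_HW_WIDTH or height > MAX_HW_HEIGHT:
        return True  # larger than 1080p
    if codec == "vc1" and profile in ("simple", "main"):
        return not wmv9_hw  # WMV9 is VC-1 simple/main profile
    if codec == "mjpeg" or (codec == "h264" and profile == "mvc"):
        return True  # only in libmfxswXX.dll for now, per the post
    return False
```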

nevcairiel
27th February 2012, 13:20
Er..I'm confused, QS is a HW decoder,so you mean in fact this (http://i.imgur.com/tmVc2.jpg) is in software mode not QS mode?

Technically, the name "QuickSync" decoder isn't accurate, it really should've been called "Intel Media SDK" decoder.
They offer both software and hardware decoders for the 3 formats, and the software decoder also supported WMV3, but the hardware version didn't until now.

Their software decoders are generally slower than, for example, the ffmpeg versions, so they don't see much use.

wanezhiling
27th February 2012, 13:24
egur,nevcairiel,CruNcher, thanks a lot!
Very useful information for me.:thanks::thanks:

egur
27th February 2012, 13:49
QuickSync is the Intel brand name for HW accelerated video decode/process/encode starting with SandyBridge. The brand name strongly implies fast transcoding (which it does but I don't support).
The Media SDK is the method I used to get the job done.

Intel QuickSync Decoder sounds better than Media SDK Decoder, just like "LAV filters" sounds better than "FFMPEG filters", as FFMPEG is also a means to an end and might be replaced if a better alternative arises (not very likely :) ).

wanezhiling
27th February 2012, 14:46
@egur
When I use your ffdshow QS to play a WMV3 file, does this (http://i.imgur.com/Am0cd.png) mean I'm in QS(SW) mode?
If I play a normal H.264/MPEG2/VC1 file like this (http://i.imgur.com/eL4L5.jpg), does that mean I'm in QS(HW) mode?

PS: Driver is still 2622.


@nev
As above, LAV shows QS as "Available", not "Active", when playing a WMV3 file. Does this mean I'm in QS(SW) mode as well?

:thanks:

egur
27th February 2012, 14:51
@egur
When I use your ffdshow QS to play a WMV3 file, does this (http://i.imgur.com/Am0cd.png) mean I'm in QS(SW) mode?
If playing a normal H.264/MPEG2/VC1 file like this (http://i.imgur.com/eL4L5.jpg), then means I'm in QS(HW) mode?

PS: Driver is still 2622.


@nev
As above, LAV QS says available not active when playing a WMV3 file, does this mean I'm in QS(SW) mode as well?

:thanks:

The latest official builds of both ffdshow and LAV removed support for SW playback through the QS decoder. You should see 1-2% CPU utilization when HW is used; if you see >10%, then it's probably SW.

If you build a debug version of my DLL, it will color the top left corner blue for HW and red for SW.
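The CPU-usage rule of thumb from this post (1-2% with HW decoding, more than 10% probably SW) can be written as a tiny classifier. A sketch: the thresholds come from the post, the "unclear" band in between is an assumption, the function name is hypothetical, and how CPU usage is sampled (Task Manager, a profiler) is left open.

```python
def guess_decode_path(cpu_percent: float) -> str:
    """Classify total CPU usage during playback through the QS decoder."""
    if cpu_percent <= 2.0:
        return "HW"           # ~1-2%: fixed-function decode
    if cpu_percent > 10.0:
        return "probably SW"  # >10%: likely software decoding
    return "unclear"          # ASSUMPTION: in-between readings are inconclusive
```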

wanezhiling
27th February 2012, 15:09
Latest official builds of either ffdshow or LAV removed support for SW playback through the QS decoder. You should see 1-2% CPU utilization when HW is used. if you see >10% than it's probably SW.
Thanks, updated to rev4336.
http://i.imgur.com/48GaY.jpg :)

egur
27th February 2012, 15:12
Thanks, update to rev4336
http://i.imgur.com/48GaY.jpg :)

If you build a debug version of my DLL, it will color the top left corner blue for HW and red for SW.

NikosD
27th February 2012, 20:36
BTW, where can I download DXVA checker v2.8.0 beta?

Unfortunately, beta versions of DXVA Checker are not publicly available.

BTW, I changed one clip at my benchmark collection.

I removed Birds-60fps and I added Avatar-60fps due to higher bitrate.

ryrynz
27th February 2012, 22:20
Be a nice guy and link a build for him anyway :p

egur
2nd March 2012, 16:12
Version 0.29 beta is out with the following changes:
* Support for VFW (under FFDShow-VFW).
* Optimized code path for playback under real world conditions (at the expense of GraphStudio).
* Bug fixes.
* FFDShow rev4364

Downloads
* For the latest cutting-edge FFDShow builds, download my builds from the Intel QuickSync Decoder SourceForge home page (http://sourceforge.net/projects/qsdecoder/)
* FFDShow-tryout site (http://ffdshow-tryout.sourceforge.net/download.php)
* LAV Splitter builds (http://forum.doom9.org/showthread.php?t=156191)

nevcairiel
2nd March 2012, 16:49
So, I benchmarked the new version, and the only clip that changed was the Samsung ref16 clip; all others remained the same. I think we're really just at the limit of EVR's capability to accept frames in system memory there.

Here are the benchmarks:
https://docs.google.com/spreadsheet/ccc?key=0Ajo8vvjNtaZ5dC1abjBSeVlmcnZXSjYwampfamk3ZWc

The good thing is, GraphStudio didn't really slow down either.
Good job, I guess!

How about that deinterlacing now? :)

egur
2nd March 2012, 17:07
So, i benchmarked the new version, and the only clip that changed was the samsung ref16 clip, all others remained the same. I think we're really just at the limit of EVRs capability to accept frames in system memory there.

Here are the benchmarks:
https://docs.google.com/spreadsheet/ccc?key=0Ajo8vvjNtaZ5dC1abjBSeVlmcnZXSjYwampfamk3ZWc

The good thing is, GraphStudio didn't really slow down either.
Good job, i guess!

How about that deinterlacing now? :)

Part of the slowdown noticed before in ffdshow was caused by ffdshow itself; I fixed that.

Yes, EVR is very limited when working in system memory; even if the GPU copy-back function took zero time, it would still output less than 230 fps on the Samsung clip...
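The reasoning behind "even if the copy took zero time" is a per-frame time budget: if each frame costs a copy-back time plus a renderer time, the renderer part alone caps the frame rate. A sketch assuming the two costs are strictly serial (no overlap), with a hypothetical function name; the numbers in the example are illustrative, not measurements from the thread.

```python
def fps_ceiling_without_copy(measured_fps: float, copy_ms: float) -> float:
    """Frame rate if copy-back cost dropped to zero, ASSUMING per-frame time
    is simply copy time + renderer time with no overlap."""
    frame_ms = 1000.0 / measured_fps
    render_ms = frame_ms - copy_ms
    if render_ms <= 0:
        raise ValueError("copy time cannot exceed the measured frame time")
    return 1000.0 / render_ms


# Illustrative numbers only: 200 fps measured, 0.5 ms of copy per frame.
print(round(fps_ceiling_without_copy(200.0, 0.5), 1))  # 222.2
```

Even removing the whole copy only raises the ceiling to 1000 / (renderer ms per frame), so once the renderer dominates, a faster copy buys little.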

Up next is deinterlacing. It will be active if and only if the decoder is active (at first, anyway).

Nev, maybe you should separate your benchmark results for DXVA Checker and GraphStudio so they'll be clearer.

CharlieCL
3rd March 2012, 22:59
If you'll build a debug version of my DLL, it will color the top left corner in blue for HW and red for SW.

Where is the Debug version?

I had difficulty telling whether Quick Sync was enabled or not.
From CPU usage I guess QS codec was not used.

I have a test PC with a Core i5 2405S, a DH61AG motherboard and 4GB DDR3,
running Windows 7 64-bit. But when I tried to install Intel's Win7Vista_152254 graphics driver, it said my system did not satisfy the minimum requirements. This is weird.

What are the minimum requirements to use the QS hardware codec?

I am not sure about the architecture of Sandy Bridge. The video acceleration hardware seems to be connected to the CPU directly, unlike a GPU card, where the video acceleration hardware is connected through PCIe. Everything looks right for Sandy Bridge. However, in my testing, the data rate between CPU and GPU on Sandy Bridge looked slow.

egur
4th March 2012, 00:57
Where is the Debug version?

I had difficult to know if Quick Sync was enabled or not.
From CPU usage I guess QS codec was not used.

I have a testing PC with Core i5 2405S DH61AG MB 4GB DDR3
Windows 7 64-bit. But when I tried to install Intel's Win7Vista_152254 Graphic driver, it said my system was not satisfied the mini requirements. This is weird.

What is the mini requirements to use QS hardware codec?

I am not sure the architecture of Sandy Bridge. The video acceleration hardware seems to connect CPU directly. This is unlike GPU card which the video acceleration hardware is connected through PCIe. Everything looks right for Sandy Bridge. However in my testing, Sandy Bridge looked slow in data rate between CPU and GPU.

A debug version is not supplied. One has to download the sources from SourceForge and compile them.

The driver you're trying to install is the same one I use. Very strange. Try the latest driver from the board manufacturer. Do you see the driver in the Device Manager under Display adapters?
BTW, if by a long shot you have an engineering sample of SandyBridge, the production (standard) drivers will not install.

FFDShow reports it's using QS in its tray icon, and also in the config dialog while a clip is running. LAV Video Decoder reports "Available" when QS is enabled and "Active" when it's actually used.

Although the CPU and GPU share the same RAM and even share the L3 cache, the GPU uses memory in a special way (called USWC) which is optimized for burst reads/writes (which the GPU likes) but takes longer for the CPU to process.

For low bitrate clips, CPU will be faster.
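A common way to make CPU reads from USWC memory faster is to pull the data in with streaming loads (the SSE4.1 MOVNTDQA instruction) into a small cacheable buffer, a few KB at a time, and then copy from that buffer to the destination. Python cannot issue streaming loads, so this sketch only shows the chunked, two-stage structure of such a copy; the 4 KB chunk size is an assumption, the function name is hypothetical, and nothing here claims to be how egur's copy function works.

```python
def chunked_copy(src: memoryview, dst: memoryview, chunk: int = 4096) -> None:
    """Copy src into dst in chunk-sized pieces through a small reusable buffer.
    In native code, stage 1 would use streaming loads from USWC memory."""
    if len(dst) < len(src):
        raise ValueError("destination too small")
    bounce = memoryview(bytearray(chunk))
    for off in range(0, len(src), chunk):
        n = min(chunk, len(src) - off)
        bounce[:n] = src[off:off + n]   # stage 1: source -> cached buffer
        dst[off:off + n] = bounce[:n]   # stage 2: cached buffer -> destination
```

Keeping the bounce buffer small means it stays in cache, so only the first stage pays the cost of reading uncached memory.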

Esperado
4th March 2012, 01:45
FFDShow reports it's using QS in its tray icon.

It says libavcodec in my config, while QuickSync was chosen in FFDShow for H264 and MPEG2. Why?
My two screens are plugged in the Intel connectors, my CPU is Sandy bridge I5-2500k, graphic card is: Sandy Bridge-DT GT2 (Integrated 8086 / 0112, Rev 09), drivers are "Intel(R) HD Graphics Family" 8.15.10.2509.
What am I doing wrong?

[edit] Tried with driver 8.15.10.2622 (10/01/2012): same issue.

CharlieCL
4th March 2012, 06:00
A debug version is not supplied. one has to download the sources from sourceforge and compile them.

The driver you're trying to install is the same one I use. Very strange. Try the latest driver from the board manufacturer. Do you see the driver in the Device Manager under Display adapters?
BTW, if by a long shot you have an engineering sample of SandyBridge, the production (standard) drivers will not install.

FFDShow reports it's using QS in it's tray icon. Also in the config dialog when a clip is running. LAV video decoder reports "Available" when QS is enabled and "Active" when it's actually used.

Although CPU and GPU share the same RAM and even share L3 cache. The GPU uses memory in a special way (called USWC) which makes it optimized for burst reads/writes (GPU like it) but take longer for the CPU to process.

For low bitrate clips, CPU will be faster.

My driver was 8.15.10.2372, dated 4/15/2011; this is the one on Intel's CD. Now I have upgraded to 8.15.10.2509 from online, dated 8/31/2011. The latest driver is 15.22.54.2622, dated 01/12/2012.

My Sandy Bridge came in a retail box, so I guess it is not an engineering sample.

I tested the LAV decoder. There was a Quick Sync item and it was set to "Available", but no "Active" status appeared in the property list.

What I want is a display showing that the QS hardware accelerator is being used while a video is playing. Could you distribute a non-debug version that can show whether QS is in use?

So far I am disappointed with the performance of Sandy Bridge. I ran my program on a Pentium dual core 2.9GHz with DDR2 800MHz and an Nvidia Quadro FX 3500 card, and got 54 FPS. But on my new SB quad core 2.5GHz with DDR3 1333, the FPS is only 31.
In both PCs, only the ffdshow software codec was used.

CruNcher
4th March 2012, 12:23
@ Egur
I'm currently testing 2639 (libmfxhw32-s1 3.0.357.38898), but decodinerror.ts is still not fixed (LAV Video QuickSync, ffdshow QuickSync) :( ?

No problems with CoreAVC DXVA and other implementations; Nev's DXVA is absolutely broken here.

No difference with MC.ts; that one is still fixed, no regression visible :)

Nev's DXVA still crashes with 720p.mpg (the x264 MPEG-2 benchmark sequence).

Though neither of the Nev issues is driver related; they're implementation issues.

egur
4th March 2012, 12:52
@ Egur
im currently testing 2639 but the decodinerror.ts is still not fixed (lav video quicksync, ffdshow quicksync) :( ?

I know, it's a different issue than the mc.ts clip. I have sent this clip and others for the driver and MSDK teams to analyze. It seems (to me) that these corruptions happen on scene changes. So maybe there's a way to fix them by manipulating the stream headers somehow...
Now that I use QS to play all my movies and TV shows on my i7-2600k HTPC, I get this sort of corruption from time to time (about once per 3-4 clips) - always very short and always on scene changes. I mostly watch 720p h264 mkv files. VC1 decode errors were reported here as well (or maybe in the AVS forum thread), this issue is also handled.

The good thing is that these issues are being addressed and not ignored. A solution isn't always quick or simple, but the general direction is definitely positive.

The 2639 driver (which is not a final IVB driver or even the latest) gave me a few problems so I uninstalled it. I suggest you do the same...

Anyway, keep the issues coming, it will help drivers/MSDK mature faster.

Edit
If CoreAVC DXVA is working well then my hunch on stream pre processing looks even more viable.

CruNcher
4th March 2012, 13:08
Yup definitely great support in fixing problems :)

The only issue I have is when doing realtime surface manipulation with simultaneous video output (low latency). Testing that with ffdshow's SPP processing (which is very heavy on the CPU) with QuickSync, I can't get it stable: it stalls very often and sync is lost. I reported that some pages back :(

Though something like this doesn't always work nicely; it seems heavily dependent on the decoder. So far I got the best results with MPC-HC's and Microsoft's own MPEG-2 decoders; those seem really stable in such a time-critical workflow.

egur
4th March 2012, 13:21
Yup definitely great support in fixing problems :)

The only issue i have is when doing realtime surrface manipulation with @ the same time video output testing that with ffdshows and SPP processing (which is very heavy on the CPU) with Quicksync i dont get it stable its stucking very often, reported that some pages back :(

I'll look into that today.

Esperado
4th March 2012, 15:01
It says libavcodec on my config, while QuickSync was chosen in FFDShow for H264 and MPEG2. Why ?
My two screens are plugged in the Intel connectors, my CPU is Sandy bridge I5-2500k, graphic card is: Sandy Bridge-DT GT2 (Integrated 8086 / 0112, Rev 09), drivers are "Intel(R) HD Graphics Family" 8.15.10.2509.
What am-I doing wrong ?

[edit] Tried with driver 8.15.10.2622 (10/01/2012): same issue.

Any help?

CruNcher
4th March 2012, 15:19
@Egur

I'm not sure, but it seems the normal decoding overhead (copy-back) is too high to get stable fps. Though I wonder why Intel's decoder also has problems, with its very small overhead (apart from the green line problem, it also stalls from time to time, like with QuickSync decoding, but not as badly, most probably because of the lower overhead; see below), and many software decoders as well.
Though it seems that pushing throughput too far isn't very good for latency and decreases performance in such specific realtime workflows (especially when mixing multithreaded with single-threaded parts). That might also be why MPC-HC's libmpeg2 performs so well: its raw performance isn't really that good (better than MainConcept single-threaded), but because of that it stays very stable in this case :)


First of all, though, I really have to find out when and if the reference decoder is using QuickSync hardware and when it isn't.

I also wonder if this improves just by switching to Windows 8 ;)


So with EVR output it uses the Hardware:

http://img62.imageshack.us/img62/5487/realtimemanipulationtes.png

It also uses the hardware with the Null renderer, so those amazing improvements over the software decoders are indeed due to the hardware (ASIC), not software. Performance-wise, in direct comparison with the best decoders, I would say you get almost the multithreaded performance of 4 cores (90%) at single-threaded CPU utilization levels (25%). Very impressive: the ASIC saves something like 65%, depending on the software decoder's multithreading efficiency.

http://img256.imageshack.us/img256/3753/inteldecoderanalyzenull.png

So I guess the stalls and sync issues in the realtime test will be the same for everything that is hardware decoded, and the copy-back overhead on the CPU (double the utilization, spread across the cores by LAV Video QuickSync and ffdshow's QuickSync) just amplifies the problem :( ?

egur
4th March 2012, 16:14
Any help ?
Do you mean that in the codecs tab you selected the QS decoder and in the info tab you saw libavcodec? What player/splitter do you use? Does it happen with LAV video decoder?

@CruNcher
I fixed ffdshow to use PP on all QS codecs (VC1, mpeg2, H264).
I analyzed SPP with a profiler on an SD clip (720x480), MPEG2. Results are that SPP is taking practically all the CPU cycles.
It's also single threaded so it doesn't scale well with modern CPU architectures.
My decoder only outputs NV12 so it might add some overhead to SPP.
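For context on the NV12 remark: NV12 stores chroma as one interleaved U/V plane, while planar 4:2:0 formats such as YV12 keep separate V and U planes, so a filter that wants planar input needs a de-interleave pass on every frame. Whether SPP works on YV12 internally is not stated in the thread; this sketch only shows the conversion, assuming 8-bit samples, even dimensions and no row padding (stride equal to width). The function name is hypothetical.

```python
def nv12_to_yv12(frame: bytes, width: int, height: int) -> bytes:
    """Convert a tightly packed NV12 frame to YV12 (Y plane, then V, then U)."""
    y_size = width * height
    uv = frame[y_size:y_size + y_size // 2]  # interleaved U0 V0 U1 V1 ...
    u = uv[0::2]
    v = uv[1::2]
    return frame[:y_size] + v + u
```

YV12 stores V before U; I420 uses the opposite order, so the same split with `u + v` would produce I420.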

To make matters worse, ffdshow uses inline functions to wrap intrinsic function calls. This works nicely in optimized builds, but it's dead slow in debug builds (1/2 fps!), making the debug process very hard.

I'll contact clsid about solutions to these problems. MT can be done with OpenMP, and using the intrinsic functions 'as is' without wrappers would be very fast in debug builds.
OpenMP requires a few simple but significant changes in ffdshow: linking with the dynamic version of the CRT. This can complicate ffdshow distribution.
OpenMP has a bug where it crashes on exit when used with a static version of the CRT (libc).

nevcairiel
4th March 2012, 17:42
OpernMP has a bug that it crashes on exit when being used with a static version of the CRT (libc).

An alternative would be Microsoft's Parallel Patterns Library (PPL); I use it in LAV Video and it seems to work just fine.
Super easy to create a parallel for loop with it.
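The idea behind a parallel for (OpenMP's `#pragma omp parallel for`, PPL's `concurrency::parallel_for`) applied to a frame filter is to split the image into independent horizontal bands and process them concurrently. This sketch is a Python analogue using `concurrent.futures`, not PPL or OpenMP, and `process_rows` is a hypothetical stand-in filter. CPython threads only run such work in parallel when the per-band code releases the GIL (e.g. NumPy or native extensions), so treat it as an illustration of the partitioning, not a speed-up recipe. A real filter like SPP also reads neighboring pixels, so its bands would need some overlap, which is ignored here.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(height: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, height) into `parts` contiguous (start, end) bands."""
    base, extra = divmod(height, parts)
    bands, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        bands.append((start, end))
        start = end
    return bands


def process_rows(frame: list[list[int]], start: int, end: int) -> None:
    """Hypothetical per-band filter: invert 8-bit samples in place."""
    for row in frame[start:end]:
        for x, value in enumerate(row):
            row[x] = 255 - value


def parallel_filter(frame: list[list[int]], workers: int = 4) -> None:
    bands = split_rows(len(frame), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_rows, frame, s, e) for s, e in bands]
        for f in futures:
            f.result()  # re-raise any exception from a worker
```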

PS:
Intrinsics in debug builds will always be much slower than in release builds, because debug intrinsics always have an extra step that moves register contents back to system memory so the debugger can easily look into it.

egur
4th March 2012, 18:16
An alternative would be using Microsofts Parallel Patterns Library (PPL), i use it in LAV Video and it seems to work just fine.
Super easy to create a parallel for loop with it.
I haven't worked with it before; what are the dependencies?
OpenMP is super easy too and it ships with the compiler.

Intrinsics in debug builds will always be much slower then release builds, because the debug intrinsics always have an extra step to move register content back to system memory so the debugger can easily look into it.
My copy function is written using intrinsics and it works very fast in debug builds.