
View Full Version : Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing



egur
24th September 2011, 21:00
@timestamps in general:

I'm sure the MSDK offers an ability to handle timestamps by itself. This will probably be fine with all PTS timestamps. At least that's the case with NVIDIA's API and DXVA2.
Just for DTS timestamps, you need to manually map the incoming times to the outgoing frames. Since the number of frames coming in and going out is usually the same, that shouldn't be much of a problem. I have that implemented in LAV Video using a FIFO buffer for the CUVID decoder (because I don't know its exact processing delay), and a fixed circular buffer in the avcodec decoder (because there I know its exact decoding delay).

First, I really appreciate your help. Thanks.
Is there a 1:1 relation between samples coming in and frames coming out?
E.g. is every media sample a single frame, or can a frame be divided into an arbitrary number of samples? The latter would make mapping time stamps impossible.

nevcairiel
24th September 2011, 21:12
In theory, both are possible, to some extent.

In reality, VC1 and H264 are usually one sample = one frame; MPEG2 might be split over multiple data samples.
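
The FIFO mapping mentioned in this exchange can be sketched roughly as follows. This is only an illustration, assuming one input sample per frame and a decoder with an unknown but constant delay; the class and method names are hypothetical and are not taken from LAV Video, ffdshow or the Media SDK.

```python
from collections import deque


class TimestampFifo:
    """Hands incoming timestamps to outgoing frames in arrival order."""

    def __init__(self):
        self._pending = deque()

    def on_input_sample(self, timestamp):
        # Called once per input sample (assumption: one sample == one frame).
        self._pending.append(timestamp)

    def on_output_frame(self):
        # Called once per decoded frame; returns None if nothing is queued.
        return self._pending.popleft() if self._pending else None


fifo = TimestampFifo()
stamped = []
# Hypothetical decoder with a two-frame delay: the first frame comes out
# only after the third sample has gone in.
for i, dts in enumerate([0, 40, 80, 120, 160]):
    fifo.on_input_sample(dts)
    if i >= 2:
        stamped.append(fifo.on_output_frame())
# Flush the frames still held inside the decoder at end of stream.
while (ts := fifo.on_output_frame()) is not None:
    stamped.append(ts)
```

Because the queue does not care how long the decoder holds a frame, the same code works for any delay. It breaks down as soon as a frame is split over several samples, which is the MPEG2 case mentioned above.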

egur
25th September 2011, 21:46
New and improved version. The zip files contain installer documentation; please read it.

Download version 0.14 alpha:
32 bit http://www.multiupload.com/LPX2JXB06S
64 bit http://www.multiupload.com/8H7ZPV5XKC
Source code http://www.multiupload.com/AXET69LL3P

Revision highlights:
v0.14:
* Created ffdshow installer. Installer will default to enabling the Intel QuickSync decoder on new installations.
* More speed optimizations. CPU is at its lowest frequency during playback with very low utilization. 2-3% on desktop and 5-6% on mobile. Mobile lowest frequency is half of desktop (800/1600).
* Fixed handling of interlaced content which is encoded as progressive frames.
* More robust and faster codec initialization

Atak_Snajpera
26th September 2011, 12:46
Are you going to merge your code with official builds? I'm asking because there is zero movement at the moment in ffdshow :( Only libav is currently being updated. That's all.

egur
26th September 2011, 13:51
Are you going to merge your code with official builds? I'm asking because there is zero movement at the moment in ffdshow :( Only libav is currently being updated. That's all.

That's the idea. I wanted to get a minimum feature set working and I'm very close to it. clsid was very skeptical that I could pull this off because of the memory copying involved, but this part is now solved :)
I'll contact clsid again and push things forward.

BTW, did you try my build? It should improve Avisynth performance when ffdshow is used as a decoder.

millercentral
26th September 2011, 19:13
I know this effort has been focused around ffdshow, but does anyone know of any similar efforts to enhance the ffmpeg encoding library for QuickSync? I would love to see this for transcoding improvements...

Blight
26th September 2011, 19:15
millercentral:
Eric said he'll wrap this up in a nice DLL.
From there, ffmpeg coders should be able to integrate with little effort.

clsid
29th September 2011, 02:28
Hi Eric,
I haven't tested your build yet, but here are a few questions:
- How are multiple instances of the decoder handled? I assume the HW can only handle a certain amount of streams at once. Will ffdshow simply fail graph connection for instance N+1, will it fall back to software mode (libavcodec), or will it crash and burn?
- Does this only work on Sandy Bridge and similar future CPUs?
- Does it work if a mobo doesn't use the integrated GPU? Or if disabled in BIOS? Or if no monitor is attached?

I can give you SVN access if you want. Then you can commit it when deemed stable and also perform updates and fixes.

ajp_anton
29th September 2011, 03:30
How can I make the QS decoder ignore unsupported streams (10bit, 4:4:4 chroma, lossless etc) and let for example LAV take care of it?

egur
29th September 2011, 08:38
- How are multiple instances of the decoder handled? I assume the HW can only handle a certain amount of streams at once. Will ffdshow simply fail graph connection for instance N+1, will it fall back to software mode (libavcodec), or will it crash and burn?
There should not be a (practical) limit. I can modify ffdshow to revert to libavcodec if initialization fails. Most likely the platform will run out of RAM before this happens.

- Does this only work on Sandy Bridge and similar future CPUs?
Correct, that's guaranteed. It might work on small-core processors (Atom) in the future, no promises.
The user has the option to use this decoder or not via the standard ffdshow config. Just like other decoders (e.g. wmv9).
- Does it work if a mobo doesn't use the integrated GPU? Or if disabled in BIOS? Or if no monitor is attached?
The BIOS must enable the IGP, otherwise the driver will not load. The IGP driver must be loaded and enabled. This is not a problem: my own system has an AMD Radeon 6950 on an H67 chipset MB.
DirectX will not enumerate the IGP if a screen is not connected to it (it enumerates only the displays).
The current version needs either the IGP connected to a screen, switchable graphics, or Lucid Virtu installed and configured to bind the media players to the IGP.
A VGA dummy can also be used.
I’m in the process of enabling the HW acceleration without the above hacks.

I can give you SVN access if you want. Then you can commit it when deemed stable and also perform updates and fixes.
That would be great! Please send me instructions by mail if there’s any kind of procedures I’m required to follow.

egur
29th September 2011, 08:48
How can I make the QS decoder ignore unsupported streams (10bit, 4:4:4 chroma, lossless etc) and let for example LAV take care of it?

The QS decoder will fail to initialize on such streams. The idea is to have ffdshow fall back to libavcodec and, if that fails too, notify the graph/player of the connection failure. The player can then load another filter (e.g. LAV). That's the most elegant way to do it as far as I can see.
Future HW might (or not) support these stream types.
It would help if you pointed me to such streams for testing purposes, or to a guide on how to transcode to these formats.
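
A rough sketch of the fallback order described in this post: try the hardware decoder, then the software one, and report failure only when both refuse. The acceptance rules and all names here are made-up stand-ins for illustration, not ffdshow code.

```python
class DecoderInitError(Exception):
    """Raised when a decoder refuses a stream at initialization."""


def quicksync_init(stream):
    # Stand-in rule (assumption): hardware takes only 8-bit 4:2:0 video.
    if stream["bit_depth"] != 8 or stream["chroma"] != "4:2:0":
        raise DecoderInitError("unsupported by hardware")
    return "hw-session"


def libavcodec_init(stream):
    # Stand-in rule (assumption): the software path refuses lossless streams.
    if stream.get("lossless", False):
        raise DecoderInitError("unsupported by software decoder")
    return "sw-session"


CHAIN = [("quicksync", quicksync_init), ("libavcodec", libavcodec_init)]


def open_decoder(stream, chain=CHAIN):
    """Return (name, session) of the first decoder that initializes."""
    for name, init in chain:
        try:
            return name, init(stream)
        except DecoderInitError:
            continue  # try the next decoder in the chain
    # Nothing worked: report failure so the player can pick another filter.
    raise DecoderInitError("connection refused")


plain = {"bit_depth": 8, "chroma": "4:2:0"}
high10 = {"bit_depth": 10, "chroma": "4:2:0"}
chosen_plain = open_decoder(plain)[0]
chosen_high10 = open_decoder(high10)[0]
```

The final exception stands in for a failed pin connection: it is what lets the player move on to another filter such as LAV.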

gvaley
29th September 2011, 08:52
Reading the news in AnandTech really raised my hopes but they were to quickly plunge after I read egur's full post.

Not to say this initiative isn't commendable, but what I (and the rest of the world) would like to see is a true open source port of the SDK available to all OSes. This would enable the ffdshow, mplayer, VLC, you name it guys to build both encoding and decoding code into their projects and really put QuickSync to good use.

jakmal
29th September 2011, 09:15
The QS decoder will fail to initialize on such streams. The idea is to have ffdshow fall back to libavcodec and, if that fails too, notify the graph/player of the connection failure. The player can then load another filter (e.g. LAV). That's the most elegant way to do it as far as I can see.
Future HW might (or not) support these stream types.
It would help if you pointed me to such streams for testing purposes, or to a guide on how to transcode to these formats.

Eric,

You can find a 10b H264 file at this location:

http://dl.dropbox.com/u/15890479/480p_H264.mkv

Regards
Ganesh

egur
29th September 2011, 13:49
Reading the news in AnandTech really raised my hopes but they were to quickly plunge after I read egur's full post.

Not to say this initiative isn't commendable, but what I (and the rest of the world) would like to see is a true open source port of the SDK available to all OSes. This would enable the ffdshow, mplayer, VLC, you name it guys to build both encoding and decoding code into their projects and really put QuickSync to good use.

Agree, but before an open source SDK, you'll need driver support. BTW I'm not an expert on the driver, most of my knowledge is public knowledge.
* Linux has a very basic driver. I'm not aware of a video acceleration API (like DXVA).
* Mac OS X driver is developed by Apple not Intel (with Intel support of course and probably full disclosure of Windows driver code). Apple controls the driver API and functionality, because they own the OS (like Microsoft owns D3D & DXVA APIs).

Making a cross platform video acceleration SDK without a proper VA API is probably possible but would require bypassing the DXVA API and communicating with the driver in an alternative way. This usually requires consent from the OS vendor (Microsoft & Apple) - not an easy task.
The best solution would be to have an open standard for video acceleration that would be accepted by all OS vendors (like OpenGL, OpenCL).

I believe the Media SDK will be ported to other OSes in the future, so my code will be a good starting point for understanding how to utilize it. My only "Windows" specific code deals with the GPU memory allocations, but this can be abstracted by the SDK.

Having an open source SDK is not really a must. The Intel Media SDK is free to use and you also have a large company backing it up. The SDK abstracts the HW enough so that the code is future proof.

nevcairiel
29th September 2011, 14:01
Funny that you would say "VA API"

Linux has an API which is supported by Intel; it's called simply "VA API (http://en.wikipedia.org/wiki/Video_Acceleration_API)". It was designed by Intel initially, but AFAIK you can get adapters to run hwaccel through VAAPI on other GPUs as well these days.

egur
29th September 2011, 14:01
Eric,

You can find a 10b H264 file at this location:

http://dl.dropbox.com/u/15890479/480p_H264.mkv

Regards
Ganesh

Thanks

nm
29th September 2011, 14:02
Agree, but before an open source SDK, you'll need driver support. BTW I'm not an expert on the driver, most of my knowledge is public knowledge.
* Linux has a very basic driver. I'm not aware of a video acceleration API (like DXVA).

The current driver is much more than basic. As nevcairiel pointed out, Intel supports hardware-accelerated decoding on Linux through VAAPI: http://intellinuxgraphics.org/h264.html

There's also initial encoding support through libva, but I haven't heard anyone outside Intel having tested it yet.

clsid
29th September 2011, 14:40
The QS decoder will fail to initialize on such streams. The idea is to have ffdshow fall back to libavcodec and, if that fails too, notify the graph/player of the connection failure. The player can then load another filter (e.g. LAV). That's the most elegant way to do it as far as I can see.
Future HW might (or not) support these stream types.
It would help if you pointed me to such streams for testing purposes, or to a guide on how to transcode to these formats.
Have a look at ffglobals.cpp. There is SPS parsing code there that is currently used for checking if a stream is supported. Maybe you can also fix the code to no longer depend on the libavcodec golomb stuff.

Send me a PM with your SourceForge username and I can give you commit rights.
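
For illustration, a coarse "is this stream supported" check can be done on an H.264 SPS without any Exp-Golomb decoding, because profile_idc is a plain byte right after the NAL header. The sketch below is not the ffglobals.cpp code. The set of hardware-supported profiles is an assumption, and the input is assumed to start at the NAL header byte (no start code).

```python
# profile_idc values defined by the H.264 specification.
PROFILE_NAMES = {
    66: "Baseline",
    77: "Main",
    100: "High",
    110: "High 10",
    122: "High 4:2:2",
    244: "High 4:4:4 Predictive",
}

# Assumption for this sketch: the hardware handles only these profiles.
HW_PROFILES = {66, 77, 100}

NAL_TYPE_SPS = 7


def sps_profile_idc(nal):
    """Return profile_idc from an SPS NAL unit given as bytes."""
    if len(nal) < 4:
        raise ValueError("SPS too short")
    if nal[0] & 0x1F != NAL_TYPE_SPS:  # low 5 bits hold nal_unit_type
        raise ValueError("not an SPS NAL unit")
    return nal[1]  # byte 1: profile_idc, byte 3: level_idc


def hw_can_decode(nal):
    return sps_profile_idc(nal) in HW_PROFILES


high_sps = bytes([0x67, 100, 0x00, 41])    # High profile, level 4.1
high10_sps = bytes([0x67, 110, 0x00, 41])  # High 10 profile, level 4.1
```

Checks on fields further into the SPS, such as chroma_format_idc or the bit depths, do need Exp-Golomb parsing, so this only covers the coarse profile filter.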

egur
30th September 2011, 13:24
The current driver is much more than basic. As nevcairiel pointed out, Intel supports hardware-accelerated decoding on Linux through VAAPI: http://intellinuxgraphics.org/h264.html

There's also initial encoding support through libva, but I haven't heard anyone outside Intel having tested it yet.

Good to know. I'm currently trying to make a good Windows based decoder that's easy to integrate. Cross platform code will be available once Media SDK supports it.

CruNcher
1st October 2011, 01:25
This issue might not be easy to find, egur: currently ffdshow-quicksync has a tendency to silently self-destruct :( After a lot of streams have been loaded continuously in MPC-HC it dies, and on the next stream MPC-HC falls back to another decoder :( ffdshow-quicksync won't load again until MPC-HC is restarted. It isn't tied to a specific stream; it seems that changing streams quickly without unloading them can already trigger it, for example drag & drop in MPC-HC while another stream is still loaded.
Another problem here is that MPC-HC can't cleanly unload the DirectShow filter: when a file is closed the filter is still held. I'm not sure whether that has something to do with ffdshow-quicksync currently breaking after a while. Maybe it's also a memory leak; I haven't looked into that yet.


http://www.mediafire.com/download.php?ri4sdpjlafxxrry <- hangs (@ the cat) when the lion scene should follow (with both lav splitter and mpc-hc).

Btw the 4:2:2 Mpeg-2 Fallback already works perfectly in combination with Lav Video :)
Only High10, High422 and x264 lossless fail

vivan
1st October 2011, 10:54
Some problems when choosing the Russian language in the installer:
http://2.firepic.org/2/images/2011-10/01/mauicafzgk79.png
http://2.firepic.org/2/images/2011-10/01/vzuvye7g62o1.png
http://2.firepic.org/2/images/2011-10/01/t5vo2z4dznm3.png
But with regular ffdshow everything is ok...

jmone
2nd October 2011, 00:02
Well done egur, I have had good success testing this on my i7-2600K and it worked pretty well with few issues on dropping frames etc. with madVR. I did not have a good result, however, with interlaced HD material when compared with LAV CUVID. LAV CUVID would output frame-doubled material that looks great at both 50 and 59.94, whereas the Intel version is pushing out 25 and 29.97 frames respectively and the interlaced fields are easily visible. Also, on one VC-1(i) clip (Eagles Farewell Tour muxed to M2TS) there were bouts of video corruption that seemed to be related to scene changes.

Great start!
Thanks
Nathan

nevcairiel
2nd October 2011, 00:06
It does not deinterlace at all yet, that's why you see the interlacing artifacts and only get 25/29.97 fps. :p

jmone
2nd October 2011, 00:16
That would explain it! I'll be very interested to see if the IGP is fast enough to handle 50/60fps in madVR when/if deinterlacing works, as the rendering times already look high(ish) compared to the 550Ti.

ajp_anton
2nd October 2011, 01:56
Did some speed testing on a Blu-ray video (Pixar's Day&Night).
i7 2600K. CPU at 4000MHz, GPU at 1700MHz
I also looked at the CPU power draw readings, don't know how accurate they are, and they are fluctuating a bit...


decoder                 speed (fps)   power (W)
ffdshow (QuickSync)           460.4        33**
LAV video                     326.1        90
ffdshow                       287.0        86
LAV video (1 thread)           67.9        48
ffdshow (1 thread)             66.6        48

ffdshow (QuickSync)              24        12*
LAV video                        24        15*
ffdshow                          24        15*

** CPU at 1600MHz, GPU at 1700MHz
* CPU at 1600MHz

egur
2nd October 2011, 10:03
decoder                 speed (fps)   power (W)
ffdshow (QuickSync)           460.4        33**
LAV video                     326.1        90
ffdshow                       287.0        86
LAV video (1 thread)           67.9        48
ffdshow (1 thread)             66.6        48

ffdshow (QuickSync)              24        12*
LAV video                        24        15*
ffdshow                          24        15*

** CPU at 1600MHz, GPU at 1700MHz
* CPU at 1600MHz


These are very good numbers. I haven't had the time to measure power yet. I think power would be better if the GPU wasn't overclocked so high, but a 20% saving (12 W vs 15 W) on 1080p@24 is a good starting point.
Interlaced content should provide better results (than SW implementations) when using the EVR as it uses the HW deinterlacer.
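
As a quick check of the arithmetic, the figures in the table above can be plugged into a few lines. The helper name is hypothetical, and the power readings are the approximate, fluctuating values reported by ajp_anton.

```python
def saving_percent(baseline_watts, measured_watts):
    """Relative power saving of `measured` against `baseline`, in percent."""
    return 100.0 * (baseline_watts - measured_watts) / baseline_watts


# Real-time 1080p@24 playback: QuickSync at 12 W vs 15 W for software decoding.
playback_saving = saving_percent(15, 12)

# Unthrottled decoding: throughput ratio and frames per second per watt.
speedup_vs_lav = 460.4 / 326.1
qs_fps_per_watt = 460.4 / 33
lav_fps_per_watt = 326.1 / 90
```

This reproduces the 20% playback saving, a speedup of roughly 1.4x over multi-threaded LAV Video, and a much larger gap in frames per watt when decoding at full speed.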

Well done egur, I have had good success testing this on my i7-2600K and it worked pretty well with few issues on dropping frames etc. with madVR. I did not have a good result, however, with interlaced HD material when compared with LAV CUVID. LAV CUVID would output frame-doubled material that looks great at both 50 and 59.94, whereas the Intel version is pushing out 25 and 29.97 frames respectively and the interlaced fields are easily visible. Also, on one VC-1(i) clip (Eagles Farewell Tour muxed to M2TS) there were bouts of video corruption that seemed to be related to scene changes.

Great start!
Thanks
Nathan

Thanks.
As nevcairiel pointed out, LAV Video decoder doesn't deinterlace the video. I also do not deinterlace, it's the job of the renderer. If I incorrectly flag a clip as progressive, please share it to help me fix the problem.
EVR uses HW deinterlacing and produces 50-60fps. That's the most stable renderer ATM.
MadVR compatibility is on my TODO list.
VC1 is a little problematic and I've seen corruption in some clips. I do not know the root cause of the problem (HW, driver, media SDK) but it's being dealt with by the Media SDK team. I hope a solution will come promptly. If you provide/share a short sample of the corrupted scene it would help.

I want to thank everyone for providing valuable feedback, this is the highway to a stable product and a good reference for other products.

leomax
2nd October 2011, 11:28
Keep up the great job egur.
Would it be possible to use this in a notebook with switchable graphics (optimus)?

egur
2nd October 2011, 12:00
This issue might not be easy to find, egur: currently ffdshow-quicksync has a tendency to silently self-destruct :( After a lot of streams have been loaded continuously in MPC-HC it dies, and on the next stream MPC-HC falls back to another decoder :( ffdshow-quicksync won't load again until MPC-HC is restarted. It isn't tied to a specific stream; it seems that changing streams quickly without unloading them can already trigger it, for example drag & drop in MPC-HC while another stream is still loaded.
Another problem here is that MPC-HC can't cleanly unload the DirectShow filter: when a file is closed the filter is still held. I'm not sure whether that has something to do with ffdshow-quicksync currently breaking after a while. Maybe it's also a memory leak; I haven't looked into that yet.

I managed to reproduce the drag & drop causing a freeze in MPC-HC. ffdshow.ax stays resident in memory but my decoder DLL is unloaded. Looks like some kind of race condition. Almost impossible to debug as this doesn't occur if I place breakpoints :(
I couldn't reproduce with ZoomPlayer, maybe MPC-HC is handling the loading differently. If I knew how, I could find a solution. Unless I'm wrong here, this is a low-medium priority bug. If an MPC-HC developer can give a hint that would help.
I scanned for memory leaks and fixed them (for next release). They were minor and didn't affect anything.


http://www.mediafire.com/download.php?ri4sdpjlafxxrry <- hangs (@ the cat) when the lion scene should follow (with both lav splitter and mpc-hc).

Btw the 4:2:2 Mpeg-2 Fallback already works perfectly in combination with Lav Video :)
Only High10, High422 and x264 lossless fail
The crash/hang is actually a critical bug in my code - fixed.

Regarding the various H264 formats, I now filter them within ffdshow and fall back to libavcodec silently.

egur
2nd October 2011, 12:02
Keep up the great job egur.
Would it be possible to use this in a notebook with switchable graphics (optimus)?

One report said it works. I don't have a system to check myself.
Please try and let me know.

CruNcher
2nd October 2011, 12:31
I managed to reproduce the drag & drop causing a freeze in MPC-HC. ffdshow.ax stays resident in memory but my decoder DLL is unloaded. Looks like some kind of race condition. Almost impossible to debug as this doesn't occur if I place breakpoints :(
I couldn't reproduce with ZoomPlayer, maybe MPC-HC is handling the loading differently. If I knew how, I could find a solution. Unless I'm wrong here, this is a low-medium priority bug. If an MPC-HC developer can give a hint that would help.
I scanned for memory leaks and fixed them (for next release). They were minor and didn't affect anything.


The crash/hang is actually a critical bug in my code - fixed.

Regarding the various H264 formats I now filter them within ffdshow and fallback to libavcodec silently.

I asked Jan if he might have an idea. About the fallback: so it's falling back to the internal ffdshow libavcodec, not to the DirectShow chain, right? Could you maybe add a switch to let the user choose whether he prefers internal or external in such a fallback case?

jmone
2nd October 2011, 12:46
Here is the first 1:20 of the VC-1(i) clip, showing the corruption that comes and goes --> http://www.megaupload.com/?d=40NY4V2C

Deinterlacing: For me the great benefit of LAV CUVID is not the decoding as much as that you get the best deinterlacing I've ever seen (for supported formats). It really is very very good and puts to shame anything that EVR or FFDSHOW/YADIF can do. Are you saying that access to GPU deinterlacing will not be part of the Intel GPU Video Decoder?

Integration with LAV Video: I know that nevcairiel has hinted that your project may be accessible from LAV Video (as an alternative to FFDSHOW) or am I reading this incorrectly? & if not any timeline?

Thanks again.

Blight
2nd October 2011, 13:26
jmone:
I believe deinterlacing and other PP effects will be added once general-playback is considered stable.

CruNcher
2nd October 2011, 13:38
jmone
Yes, as egur already stated, your issue is known and it's now up to the driver and Media SDK team @ Intel to fix it. Hopefully it is fixable at all, but Nvidia also fixed it, so it seems doable without replacing the ASIC. These ASICs aren't as basic as they were years ago; they are partly programmable nowadays. In the hardest case Intel could do what Nvidia did for an MPEG-2 implementation bug and use the EUs to work around the "hardware bug" (though Intel has less room for such EU workarounds, since its EUs aren't as powerful).
But I'm very confident the Intel engineers are going to find a way to fix this problem one way or another :)

Also, I'm sure Intel has known about this problem for some time now, since it's hardly possible that none of the top 4 ISVs (Mainconcept, Arcsoft, Cyberlink, Corel) has already passed on feedback from customers who experience this issue. The pressure just gets higher the more reports flow in :)
Btw, it took Nvidia's engineers 1 driver cycle to fix this back then ;)

egur
2nd October 2011, 13:55
jmone:
I believe deinterlacing and other PP effects will be added once general-playback is considered stable.

Correct. One thing at a time.

pulbitz
2nd October 2011, 22:49
I'm sorry. I don't speak English very well.

sample files
http://www.mediafire.com/?sem1jx36pnnae2i or http://www.multiupload.com/VJPU41ELBI

file: slow video.20100518.직캠.서울시립대학교 축제.아이유(IU) - Boo.flv
file: slow video.20110516.S-OIL.TV-CM.즐거운 세상 만드는 좋은기름 1리터의 힘.20초.아이유(IU).RAiN.ts
quicksync is slow motion. libavcodec is OK.

file: can't decode.20101006.직캠.숭실대 얼Ssu!.6.아이유(IU) - 멘트.mkv
quicksync can't decode video(720x1280).
if HW can't decode it, then need fallback to libavcodec.

egur
3rd October 2011, 08:32
file: slow video.20100518.직캠.서울시립대학교 축제.아이유(IU) - Boo.flv
file: slow video.20110516.S-OIL.TV-CM.즐거운 세상 만드는 좋은기름 1리터의 힘.20초.아이유(IU).RAiN.ts
quicksync is slow motion. libavcodec is OK.

file: can't decode.20101006.직캠.숭실대 얼Ssu!.6.아이유(IU) - 멘트.mkv
quicksync can't decode video(720x1280).
if HW can't decode it, then need fallback to libavcodec.

Issue 1: Slow decode (Boo.flv). Root caused to several splitters not sending the frame rate (defaulting to 24fps) - will be fixed in the next release (I'm working hard on stabilizing the time stamps and identifying the frame rate :( ).
Unfortunately, I haven't locked on to a good algorithm that takes care of all the corner cases. The LAV decoder's solution for time stamp handling works, according to nevcairiel, but I don't want to use it as it involves querying the filter graph and needs some metadata I don't have. I want to make a standalone algorithm independent of DirectShow that relies only on the time stamps given at input and those output by the Media SDK decoder API. I may release a less than perfect implementation so that the other fixes can get out.

Issue 2: Slow decode (Oil TV.ts). The clip has inverse telecine flags at the beginning of the clip. Due to a bug, the QS decoder stays locked at 23.976 fps. This will be fixed together with issue 1.

Issue 3: Can't decode. A duplicate bug reported by CruNcher. Already fixed in my code. Will be in next release - hopefully this week.
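
One possible shape for a standalone frame rate estimate that uses nothing but time stamps is sketched below. This is only an illustration and not the algorithm egur ended up using. It assumes time stamps in 100 ns units (DirectShow REFERENCE_TIME) and a constant frame rate; the function name is hypothetical.

```python
from statistics import median

UNITS_PER_SECOND = 10_000_000  # REFERENCE_TIME ticks are 100 ns each


def estimate_fps(timestamps):
    """Estimate a constant frame rate from a window of frame time stamps.

    Sorting makes the result independent of decode-order reordering, and
    the median keeps a few dropped frames or gaps from skewing it.
    """
    ordered = sorted(set(timestamps))
    deltas = [b - a for a, b in zip(ordered, ordered[1:])]
    if not deltas:
        return None  # need at least two distinct time stamps
    return UNITS_PER_SECOND / median(deltas)


# 25 fps content (400000 ticks per frame) arriving out of display order.
reordered = [0, 1200000, 400000, 800000, 2400000, 1600000, 2000000]
fps_reordered = estimate_fps(reordered)

# Same clip with one frame missing: the median still gives the true interval.
with_gap = [0, 1200000, 400000, 800000, 2400000, 2000000]
fps_with_gap = estimate_fps(with_gap)
```

A real implementation would also have to cope with content that changes rate mid-stream, such as the telecine flags in issue 2, which this sketch deliberately ignores.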


pulbitz - thanks a lot for the feedback!

egur
3rd October 2011, 12:30
CruNcher:
The CPU utilization seems a little high.
During my testing on both desktop and laptop CPUs, during playback the CPU frequency would drop to a minimum (1600MHz for desktop and 800MHz for mobile) with single digit CPU utilization.
It's obvious that an (unnecessary) surface conversion is going on; not sure why. Did you use the standard EVR?

CruNcher
3rd October 2011, 12:36
It can't drop lower because I'm using the High Performance plan to keep overall system latency as low as possible (but you're right, at the beginning I used Balanced) :) And no, this is EVR-CP (deinterlacing is completely lost, but it has different capabilities for higher-quality output). I'm not sure what's going on, especially since that YUY2 conversion seems to hit (it's using bilinear PS 2.0 for the scaling in both cases). Still, it does better than at the beginning, and that progress is what I wanted to visualize, even though config stuff keeps changing :P I'm also not sure what causes the OSD of MPC-HC main to be rendered differently (blurred); it could be either one of the Microsoft D2D updates or the Intel driver update.

Also, my experience is that SB itself is so efficient that clock changes don't impact power consumption much at all and aren't very useful on a desktop. We aren't in the early days of C&C and SpeedStep anymore, where you could save tons of watts; the whole power management, from the lowest level (CPU, chipset) through the middle (BIOS) up to the highest (OS), is already efficient even when running at High Performance ;)

PS: Also keep in mind that I haven't yet compared EVR-CP (Bicubic PS 2.0) with the hardware scaling used on EVR by Intel's driver (so far I have an idea of the quality of the deinterlacing, IVTC and sharpen PP, but not of how the adaptive scaling of different content compares to a hardcoded bicubic PS approach running on the EUs).


Ok here is the current state 32 bit (Balanced) :)

ffdshow-quicksync 0.14 Alpha

http://img27.imageshack.us/img27/7227/rocks32bitffdbicgpu.png

Cyberlink DXVA (Worlds most efficient DXVA Decoder)

http://img849.imageshack.us/img849/6203/rocks32bitdxvabicgpu.png

Please don't ask why the OSD is so blurred; I have no idea what changed this. It's the same build and settings as in the first-day test of ffdshow-quicksync; only the system around it changed (driver and subsystems, possibly the optional Microsoft D2D patches for fixing problems with IE9). The scaling is now Bicubic PS 2.0 by default; I'm not sure anymore what I used in the first-day test, but it is in no way responsible for the blur :(
Feel free to tell if you have any idea or have this same issue with the OSD currently on Intel Graphics and the Main MPC-HC builds with EVR-CP :)

So this is how ffdshow-quicksync started its a big improvement :)

http://img835.imageshack.us/img835/4941/ffdshowquicksyncomgover.png

egur
3rd October 2011, 17:29
...
PS: Also keep in mind i didn't compared yet EVR-CP (Bicubic PS 2.0) vs the Hardware Scaling used on EVR by Intels Driver yet (i got a quality idea of the Deinterlacing, IVTC and Sharpen PP so far but not how the Adaptive Scaling off different content works out compared to a hardcoded Bicubic PS approach running over the EUs).

Bicubic is not good enough when scaling factors are high (>2), the image is a little blurry. Bilinear shouldn't be used for anything as it creates horrible scaling artifacts.
The SNB HW scaler has the capability to downgrade to Lanczos4, Lanczos3, Lanczos2 and all the bicubic variants. I'm not sure if the driver supports any of these modes though, but I'll check. It makes sense to utilize it in any case.

A few questions for everyone:
* Does the EVR have an interface to configure the scaler quality?
* Is the EVR-CP the same one supplied with Media SDK 4.0b4 (they have the same file name)?
* A tough one - does anyone know how to create a virtual adapter - so DXVA can enumerate a GPU not connected to a screen?

CruNcher
3rd October 2011, 17:50
Btw, I also lost Microsoft's DTV decoder for this file. I'm not sure yet why, but it doesn't connect anymore. Anyway, for DXVA Cyberlink is the best choice, so I added it to have a comparison against DXVA. It has shown me many times in the past that it stays stable and performant across different DSPs, even in very picky situations where other DXVA implementations fail, and just recently in an NT 6 test it again showed superior results when testing CoreCodec's new implementation :)

Egur all of these are very good questions :)
1. Somehow there must be, or else how could the driver manipulate the EVR directly in terms of IVTC, deinterlacing and sharpening? :P Though I guess at the lower level it's not really documented at all, so most probably only from the driver kernel level?
2. In theory Intel could indeed have borrowed MPC-HC's version for their samples ;)
3. Yeah, there must be a way, though I'm not sure whether this was also done for security reasons in terms of the Protected Media Path. Hmm, if Lucid's Virtu is in dGPU mode, QuickSync can still be used, can't it? So they must have found a way (or does encoding work only in iGPU mode?) :D

JanWillem32
3rd October 2011, 21:05
I managed to reproduce the drag & drop causing a freeze in MPC-HC. ffdshow.ax stays resident in memory but my decoder DLL is unloaded. Looks like some kind of race condition. Almost impossible to debug as this doesn't occur if I place breakpoints :(
I couldn't reproduce with ZoomPlayer, maybe MPC-HC is handling the loading differently. If I knew how, I could find a solution. Unless I'm wrong here, this is a low-medium priority bug. If an MPC-HC developer can give a hint that would help.

I'm quite familiar with the MPC-HC graph builder and related parts. There are several initialization problems. I'm currently evaluating two fixes for problems that are indeed causing race conditions (and a scaling bug for DVD menus). I'm also looking at possibilities for seamless playback. The default settings for debugging MPC-HC are okay for a quick checkup on a project, but not for core debugging of renderers, profile-guided optimization, assembly checkup, et cetera.

Bicubic is not good enough when scaling factors are high (>2), the image is a little blurry. Bilinear shouldn't be used for anything as it creates horrible scaling artifacts.
The SNB HW scaler has the capability to downgrade to Lanczos4, Lanczos3, Lanczos2 and all the bicubic variants. I'm not sure if the driver supports any of these modes though, but I'll check. It makes sense to utilize it in any case.

A few questions for everyone:
* Does the EVR have an interface to configure the scaler quality?
* Is the EVR-CP the same one supplied with Media SDK 4.0b4 (they have the same file name)?
* A tough one - does anyone know how to create a virtual adapter - so DXVA can enumerate a GPU not connected to a screen?

The basic VMR and EVR property pages can be called when the appropriate DLL files are registered for it (proppage.dll and evrprop.dll, separate versions exist for x86 and x64).
For reading the additional EVR-CP mixer settings, querying the mixer interface is required. The same rule applies for applying most settings. As for the scaling quality, I've simply assumed it's always bilinear coming out of the scaling by VMR and EVR judging by the type of square or rectangular scaling artifacts and blur.

As for the variants of custom renderers, MPC-HC has one. It's completely shared between the VMR-9 r., EVR-CP, RealMedia DX9 and Quicktime DX9 mixers. EVR Sync contains a nearly 1:1 copy of that code. It evolved from the DirectX 7, then 8 renderer that was there before it. I never liked it.

When I decided that I could come up with something better, I dumped the renderer core and started developing: http://forum.doom9.org/showthread.php?t=161047 .
In the beginning, the renderer wouldn't work properly at all, but I'm slowly making progress over time. Once I've finally figured out how to receive the raw bits from a DirectShow pin (or if anyone would help with that part), I'll also add a custom mixer in time to replace the "borrowed" mixers. I'm already trying to disable all possible built-in filters of the mixers and making implementations of useful filters in the custom renderer core. That includes all resizing options (currently doesn't always work for disabling chroma filtering, though). The handling of deinterlacing at the mixer level is an abomination, by the way.

A few virtual DirectX 9 COM pointer functions can be derived from a DirectX 10/11 device, but that's for expert-level DirectX programmers. The resource management is difficult.
For DirectX usage without a screen, look into DirectX 10.1 and 11 DirectCompute. Else, just ignore the implicit swap chain completely and set the window handle to invisible.

I can assist with debugging. I can upload a full source code of what I'm working on, explain the debug methods for some parts and help with other things in my field of expertise.

CruNcher
4th October 2011, 16:14
Something is strange. I'm trying ffdshow-quicksync with MPC-HC test, and there is something odd going on with colorspaces and the internal OSD (the one of ffdshow, not the one of MPC-HC). If the OSD is off, the output from ffdshow-quicksync is NV12, as it should be. If I enable the OSD during playback, it stays at NV12. But if I leave the OSD on and reopen the file, the output from ffdshow-quicksync changes to YV12?

Happens also with the normal MPC-HC so it seems to be a ffdshow thing.


Open the OSD during playback:

http://img829.imageshack.us/img829/6356/firstcall.png

OSD still enabled on next file open:

http://img692.imageshack.us/img692/3042/nextcalls.png

JanWillem32
4th October 2011, 17:41
Both NV12 and YV12 are valid types for transporting progressive Y'CbCr 4:2:0 video, but it's indeed a bit odd that the type changes on re-opening. The mixer input format is a bigger problem: "YUY2" indicates format conversion to an incompatible type.
Can you take a look at the output pins of the video parts in the "Play", "Filters" menu? Neither EVR nor VMR-9 should ever use a YUY2 input pin if NV12, YV12, IYUV or I420 is offered. A possible cause is insertion of the color space converter filter.
You can get a full graphic overview of the DirectShow playback chain with GraphEdit. Simply use "Connect to Remote Graph...". You might be able to find out things easier when using that interface.
For an overview of all supported video decoder and processor types, with and without deinterlacing, see DXVAChecker. If there are no compatible processor types for Y'CbCr 4:2:0 video, we will have to take a look at the options for color space conversion to a more proper type than YUY2.
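
For readers unfamiliar with the two layouts: NV12 and YV12 carry the same 4:2:0 samples and differ only in how the chroma is stored, which is why a switch between them is harmless for quality. A minimal sketch of the repacking follows, assuming tightly packed buffers with no row padding (real decoder surfaces normally have a pitch larger than the width). The function name nv12ToYv12 is made up for illustration and is not ffdshow code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Repack a tightly packed NV12 frame as YV12.
// NV12: Y plane (w*h bytes), then one interleaved plane U0 V0 U1 V1 ...
// YV12: Y plane (w*h bytes), then the whole V plane, then the whole U plane.
// Assumption: stride == width (no row padding), even width and height.
std::vector<std::uint8_t> nv12ToYv12(const std::vector<std::uint8_t>& nv12,
                                     std::size_t width, std::size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    const std::size_t lumaSize = width * height;
    const std::size_t chromaSize = lumaSize / 4;  // samples per chroma plane
    assert(nv12.size() == lumaSize + 2 * chromaSize);

    std::vector<std::uint8_t> yv12(nv12.size());

    // The luma plane is identical in both formats.
    std::copy(nv12.begin(),
              nv12.begin() + static_cast<std::ptrdiff_t>(lumaSize),
              yv12.begin());

    // De-interleave UVUV... into separate V and U planes (V comes first in YV12).
    const std::uint8_t* uv = nv12.data() + lumaSize;
    std::uint8_t* vPlane = yv12.data() + lumaSize;
    std::uint8_t* uPlane = vPlane + chromaSize;
    for (std::size_t i = 0; i < chromaSize; ++i) {
        uPlane[i] = uv[2 * i];
        vPlane[i] = uv[2 * i + 1];
    }
    return yv12;
}
```

Calling nv12ToYv12(frame, 4, 2) on a 12-byte buffer returns 12 bytes with the 8 luma bytes unchanged, followed by the two V samples and then the two U samples.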

CruNcher
4th October 2011, 18:43
Hmm, that reminds me that ffdshow's NV12 implementation might be buggy anyway if your mixer can't recognize it. There were some heavy issues with its NV12 when it was put into a full hardware decoding chain with Nvidia's Nvcuvid without any conversions (it crashed the whole rendering). It's a little problematic to repeat that test with the QuickSync decoder, though, as ffdshow is now a part of it ;)

http://forum.doom9.org/showthread.php?p=1501021&highlight=ffdshow#post1501021


PS: I can't connect to the remote graph. I tried both with elevated admin rights (Graphstudio 64, MPC-HC Tester 64): I see the graph, but nothing happens when I push connect, and the graph isn't loaded.
And Graphedit 64 from the Windows SDK doesn't even show any graph to connect to at all :(

http://img21.imageshack.us/img21/6775/hmmcantconnect.png


Here it is (this is a mockup; I wish MPC-HC would finally be able to do this window multitasking without interrupting its workflow ;))

http://img847.imageshack.us/img847/2908/evrcpffdshow.png

nevcairiel
4th October 2011, 18:51
ffdshow's problem is with raw NV12 input, and the problem is quite obvious if you get it.

egur
4th October 2011, 22:09
ffdshow has issues with NV12 - libavcodec bug actually. It crashes when copying NV12 surfaces in some cases. ffdshow has an alternative method which works fine - that's what I used.
Regarding the connection issues with MPC-HC: they should be fixed in the next release. I'm almost done with the time stamp code fix, so a release is very close. It will probably be the last release before integration with ffdshow's official code base.

egur
5th October 2011, 23:56
A few virtual DirectX 9 COM pointer functions can be derived from a DirectX 10/11 device, but that's for expert-level DirectX programmers. The resource management is difficult.
For DirectX usage without a screen, look into DirectX 10.1 and 11 DirectCompute. Else, just ignore the implicit swap chain completely and set the window handle to invisible.


Thanks for the explanations, but I need a DirectX adapter to be enumerated for a screenless adapter. Otherwise the Media SDK will not initialize.
Do you know why the EVR CP drops frames? There's plenty of compute headroom and it falls to ~30fps, dropping about half the frames.

CruNcher
6th October 2011, 17:48
@jan

Is it possible to keep this active by default for EVR-CP (or selectable via the EVR-CP config tab, or even better, directly in the menu options, bound to the stop graph call it needs)? It works, but it disables itself after a close and you have to enable it again (it would be much easier to have access via the normal menu for on-demand usage) :(

http://img267.imageshack.us/img267/826/evrpropkeepactive.png

That seems to be Intel's hardware deinterlacer

And this is Intel's IVTC

http://img845.imageshack.us/img845/9980/intelivtc.png


Depending on the interlaced frames config, though, the deinterlacer sometimes fails in terms of motion results (I wonder if it does any frame analysis at all, as it shows NumForwardRefSamples = 0 and NumBackwardRefSamples = 0). Still, it's better than nothing if no flag is available, and combined with the flag call from the decoder it seems a nice combo. It also doesn't really seem to hit the progressive frames :)

I wonder if it's possible to get access to the denoiser (NoiseFilterTechnology) and sharpener (DetailFilterTechnology) via this DXVA2 interface as well; it suggests so :)

JanWillem32
8th October 2011, 23:45
Thanks for the explanations, but I need a DirectX adapter to be enumerated for a screenless adapter. Otherwise the Media SDK will not initialize.
Do you know why the EVR CP drops frames? There's plenty of compute headroom and it falls to ~30fps, dropping about half the frames.

If you're trying to use the internal VSync and/or flush functions, you'll see that the GPU will be doing nothing about half of the time. These functions are made to flush the command queue, hold the GPU and the paint thread (on the CPU) inactive until the estimated next VBlank period, and then call a present. After the present call, the paint sequence for the next frame is stalled for a while, until the scheduler is sure that the next frame won't be presented early. In that time the presenter thread and GPU are idle.
A normal renderer never flushes and rarely stalls, except for reset and world transition situations. It's pretty normal to allow up to about 3 fully rendered frames in a queue.
The trunk MPC-HC build renderer also can't properly queue even basic drawing sequence commands, let alone queue entire frames.
For the first question, are you looking for the IDirect3D9 adapter functions? http://msdn.microsoft.com/en-us/library/bb174300%28v=VS.85%29.aspx
When creating a DX9 or DX9Ex device, the default adapter is always given the number 0. Don't forget to give a private HWND input pointer if you intend to make the window invisible (don't use the one of the main window). An ignored swapchain will generally be 1×1 in size, and have a single back buffer, that is never used. Not creating an implicit swap chain is not allowed under DirectX 9.
I'm looking forward to your fixes, I'll keep an eye on this thread to see if I can help with anything.

@CruNcher: Video processor devices are registered by the display driver and are also handled by the display driver in the external EVR mixer phase, as a black box to the video application. There are some options the video application can set, but most items are vendor-specific and many settings are simply ignored when initializing the external mixer. Graphics drivers should have a tab for video options in their control panel.
I was already not amused by setting up the color controls for EVR on the Miscellaneous page. These sorts of controls won't ever work for RGB input types, duplicate those in most recent control panels and, most importantly, don't make it clear to the user which software is responsible for executing this filter (not the video player itself in this case). That's why I'm against integrating these sorts of controls in MPC-HC.
In the pictures you see the debug window for the loaded EVR filter. It doesn't keep settings. It's the graphics driver's job to write out defaults and settings to the registry, and in the case of interlacing types, even for several scenarios. Remember that the vanilla EVR also doesn't come with a regular settings panel for these sorts of things.

egur
9th October 2011, 08:15
If you're trying to use the internal VSync and/or flush functions, you'll see that the GPU will be doing nothing about half of the time. These functions are made to flush the command queue, hold the GPU and the paint thread (on the CPU) inactive until the estimated next VBlank period, and then call a present. After the present call, the paint sequence for the next frame is stalled for a while, until the scheduler is sure that the next frame won't be presented early. In that time the presenter thread and GPU are idle.
A normal renderer never flushes and rarely stalls, except for reset and world transition situations. It's pretty normal to allow up to about 3 fully rendered frames in a queue.
The trunk MPC-HC build renderer also can't properly queue even basic drawing sequence commands, let alone queue entire frames.
For the first question, are you looking for the IDirect3D9 adapter functions? http://msdn.microsoft.com/en-us/library/bb174300%28v=VS.85%29.aspx
When creating a DX9 or DX9Ex device, the default adapter is always given the number 0. Don't forget to give a private HWND input pointer if you intend to make the window invisible (don't use the one of the main window). An ignored swapchain will generally be 1×1 in size, and have a single back buffer, that is never used. Not creating an implicit swap chain is not allowed under DirectX 9.
I'm looking forward to your fixes, I'll keep an eye on this thread to see if I can help with anything.

Thanks for the explanations; they should come in handy in the future. Unfortunately, the MSDK initialization (educated guess) queries the IDirect3D9 interface for available adapters via calls to GetAdapterCount and GetAdapterIdentifier and looks for an Intel GPU. When the Intel GPU is not connected to a monitor, it will not be enumerated by the above calls. My home setup has a Radeon connected to the screen. If the monitor is connected to the Radeon, the Intel GPU will be hidden. It will show in Windows Device Manager, but is not accessible from DirectX.
The big question is how do I force DirectX to enumerate it?
Is there a way to programmatically connect the disconnected GPU to a virtual monitor?
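
The adapter search described above can be sketched as follows. This is only a guess at the kind of loop the MSDK might run internally (the post itself calls it an educated guess), not its actual code. The helper names isIntelVendor and findIntelAdapter are hypothetical; 0x8086 is Intel's PCI vendor ID, and the Direct3D part only compiles on Windows.

```cpp
#include <cstdint>

// PCI vendor ID assigned to Intel.
constexpr std::uint32_t kIntelVendorId = 0x8086;

// Portable helper: true if an adapter's VendorId field is Intel's.
constexpr bool isIntelVendor(std::uint32_t vendorId)
{
    return vendorId == kIntelVendorId;
}

#ifdef _WIN32
#include <d3d9.h>  // link with d3d9.lib

// Returns the ordinal of the first Intel adapter that Direct3D 9 enumerates,
// or -1 if none is listed. An adapter with no display attached is typically
// missing from this list, which is the situation described in the post.
int findIntelAdapter()
{
    IDirect3D9* d3d = Direct3DCreate9(D3D_SDK_VERSION);
    if (!d3d) {
        return -1;
    }

    int found = -1;
    const UINT count = d3d->GetAdapterCount();
    for (UINT i = 0; i < count; ++i) {
        D3DADAPTER_IDENTIFIER9 id = {};
        if (SUCCEEDED(d3d->GetAdapterIdentifier(i, 0, &id)) &&
            isIntelVendor(id.VendorId)) {
            found = static_cast<int>(i);
            break;
        }
    }
    d3d->Release();
    return found;
}
#endif
```

On the setup described (monitor on the Radeon only), a loop like this would be expected to return -1, since the Intel GPU never shows up in the enumeration.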