View Full Version : Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing
Superb
17th September 2011, 02:25
I believe it's not the moderator's view... It's the license's view...
egur
17th September 2011, 09:07
Since ffdshow is licensed under the GPL and you are distributing modified builds of it, I think you are required to make the source available regardless of anyone requesting it or not. At least that's the point of view a certain moderator on this forum took with my work in the past.
I didn't contact CLSID (ffdshow's admin) about when and if I can integrate my ffdshow changes into the ffdshow source control in SourceForge.
BTW, the changes to ffdshow itself are small and trivial - just add a new decoder and assign it H264/MPEG2/VC1.
The majority of my code (a decoder DLL) should be on its own and I'm not sure what's the best way to post its code. The most generic solution would be to create a separate project in SourceForge or another repository and have ffdshow use it as an external lib (like it uses other decoders).
I'd like the decoder DLL to be LGPL not GPL so it can be used in any project (open or closed source).
ATM, it's much easier for me to have a single VC2010 solution and develop on it. The entire ffdshow source code is ~60MB zip and it's a little big to post, so I don't post it. My DLL's source code is ~130K and I have no problems sending it to anyone.
I'm not an expert in these matters and I'd be happy to get suggestions on the matter - both technical and legal.
tetsuo55
17th September 2011, 14:12
I didn't contact CLSID (ffdshow's admin) about when and if I can integrate my ffdshow changes into the ffdshow source control in SourceForge.
BTW, the changes to ffdshow itself are small and trivial - just add a new decoder and assign it H264/MPEG2/VC1.
The majority of my code (a decoder DLL) should be on its own and I'm not sure what's the best way to post its code. The most generic solution would be to create a separate project in SourceForge or another repository and have ffdshow use it as an external lib (like it uses other decoders).
I'd like the decoder DLL to be LGPL not GPL so it can be used in any project (open or closed source).
ATM, it's much easier for me to have a single VC2010 solution and develop on it. The entire ffdshow source code is ~60MB zip and it's a little big to post, so I don't post it. My DLL's source code is ~130K and I have no problems sending it to anyone.
I'm not an expert in these matters and I'd be happy to get suggestions on the matter - both technical and legal.
The best thing you can do right now is open a new project. You can use GIT trickery to link to ffdshow at compile time and then patch from your own tree (that way you do not need to copy all of ffdshow).
Here is some extra info http://www.joelonsoftware.com/articles/fog0000000043.html
(P.S. Make sure to keep the LGPL and GPL code in separate sub-projects)
BetaBoy
18th September 2011, 05:35
I'd like the decoder DLL to be LGPL not GPL so it can be used in any project (open or closed source)
LGPL is not great for closed source as it would raise more concerns than what I think you're trying to accomplish. I would propose a BSD based license.
Alternatively make it licensed under all 3: GPL, LGPL, BSD to satisfy everyone.
Blight
18th September 2011, 12:42
I second it, BSD is better than LGPL for closed source applications.
egur
18th September 2011, 23:01
Then BSD license it is.
Next build will be tomorrow (Monday).
egur
19th September 2011, 06:48
Thanks for the replies.
Do you think this code will at some later point in any way help older DXVA implementations of pre-sandy-bridge hardware?
Technically (MSDK docs), it should work on Core 2 Duo with Intel graphics. But no one has reported success yet. And the real world is not aligned with MSDK docs :(
I'll try to make it work on a Penryn laptop and report back
egur
19th September 2011, 10:47
New and improved version. Zip files contain documentation, please read.
Download version 0.13 alpha:
32 bit http://www.multiupload.com/Z284JFR06X
64 bit http://www.multiupload.com/1YFGXD786C
Source Code http://www.multiupload.com/ZVMCN124A0
Revision highlights:
v0.13:
* Optimized memory copy even further. Memory copy has 2-6% overhead (out of process CPU usage).
* Fixed bug in memory copy when frame width wasn't mod128.
* VC1 playback is more stable. Still corruption on some clips.
* Fixed some small memory leaks.
* Compatibility with 2509 driver.
* Bug fixes & cleanup.
* Tested with driver versions 2509 and 2372.
CruNcher
19th September 2011, 13:07
Egur, for performance and behaviour testing, also of the MSDK parts, you should really take WAC, WPR and WPA from the new ADK into your dev chain (though I guess Intel has been using it for longer than we are aware of, since Build, given these massive usability changes). I can really recommend it: it's damn powerful, down to the stack, and much easier now than Xperf was when it started with NT 6. This is a must-use for every Windows developer in application development/assessment :)
Like Valgrind is for *nix Devs :D
http://img52.imageshack.us/img52/457/powerfulz.png
Argh
* Tested with driver versions 2509 and 2372.
I wasn't aware of a new version. Unfortunately the news about new Intel drivers doesn't spread as fast as for, say, Nvidia or AMD driver releases :(
Also, since you have inside knowledge, could you please explain the difference in release cycle and numbering scheme between the Platform Drivers and the Mainboard Driver releases, and the difference between them?
So the difference currently between the 2 branches (also in terms of MSDK integration)
8.15.10.2476
15.22.50.64.2509
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=20334&ProdId=3283&lang=eng&OSVersion=Windows%20Vista%2064*&DownloadType=Treiber
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=20391&ProdId=3319&lang=eng&OSVersion=Windows%207%20Ultimate*%2C%2064-Bit-Version&DownloadType=Treiber
Does it just have to do with testing and certification for correct functioning on the Intel mainboards, or what are those 2 different CPU driver categories for?
egur
19th September 2011, 14:06
Also, since you have inside knowledge, could you please explain the difference in release cycle and numbering scheme between the Platform Drivers and the Mainboard Driver releases, and the difference between them?
So the difference currently between the 2 branches (also in terms of MSDK integration)
8.15.10.2476
15.22.50.64.2509
Does it just have to do with testing and certification for correct functioning on the Intel mainboards, or what are those 2 different CPU driver categories for?
I'm as clueless as you are on driver version numbers and release dates :(
The last driver released to the public was in April. The new one is about a week old.
I didn't notice any positive changes about the new driver. It broke parts of my code and I have opened a thread on the MSDK forum. VC1 corruption is still there.
I also have a laptop with an engineering sample SNB processor that has an April driver. I use the two systems to produce code that works on both.
Some MSDK functions fail using one driver and succeed on the other and vice versa.
BTW, do you mind if I share the clips you've posted here with the MSDK team?
CruNcher
19th September 2011, 14:14
Sure you can share them so the MSDK Devs become aware of these issues and fix them in Driver, this is the goal of such a collaboration improving by sharing problems so the whole Ecosystem can leverage from it from ISVs, Vendors to Consumers in the End :)
BetaBoy
19th September 2011, 14:23
New and improved version. Zip files contain documentation, please read.
egur... thank you for this and your continued work on it.
CruNcher
19th September 2011, 14:28
Btw Egur this is also something that interests myself http://software.intel.com/en-us/forums/showthread.php?t=86355&o=a&s=lr :)
The documentation says there needs to be at least a display connected, so in theory it should work with a discrete card installed and connected if another monitor is also connected to the IGPU (or maybe a dongle is enough to fool Windows, the driver and so the MSDK into believing a monitor is connected ;) ). I could well imagine that you suddenly get an answer back that doesn't say unsupported anymore, though maybe Intel has now decided to remove it completely after the Lucid Logix partnership ;)
I'm especially interested in how to leverage the DSP encoder in such a scenario without needing 3rd party software like Lucid Logix for framebuffer copying (if a dongle is enough it would be perfect; I haven't tried it yet, I'm still testing the full capabilities of Intel's GT1 alone, especially power consumption, but obviously I keep a backup of every SDK and driver to check whether something changed dramatically or has been removed on purpose) :)
I'm a little sad that my mainboard manufacturer didn't decide to give this capability to their customers for free, especially early adopters. Intel did, so in the end they gave their users something for free for the chipset disaster (but if it should turn out that Quicksync alone can be leveraged on a multi-GPU system without 3rd party software, then it would have been just a clever marketing step for both Intel and Lucid Logix). I wish other vendors had gone the same way, but they made it a feature for higher-class SKUs ;)
So I'm really interested myself in the answer you're going to get ;)
egur
20th September 2011, 07:30
Im a little sad that my Mainboard manufacture didn't decided to give this capability...
FYI, I had to update my BIOS to get Virtu working on my Intel DH67GD motherboard. New BIOS had other enhancements like a much smarter fan control and fast boot (1-2s POST).
I think that with Virtu you can actually use the 2 GPUs, but you'll need 2 processes. Each process will use a different GPU (add one of them to Virtu's app list) and data needs to be copied to shared memory (memory mapped file). This is a somewhat complex setup and I don't have the resources to explore it. At least I've proven that copying the data from the Intel GPU isn't too bad. In the latest benchmark, copying all the frames of a 243 frame 1920x816 clip took 110ms according to VTune Amplifier 2011.
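To make the two-process idea a bit more concrete, here is a minimal Python sketch of passing one frame through a memory mapped file. It is only an illustration under assumptions: the frame is assumed to be NV12 (12 bits per pixel), the file name and function names are made up, both sides run in one script for brevity, and there is no synchronization between producer and consumer, which a real setup would need.

```python
import mmap
import os
import tempfile

# One NV12 frame (12 bits per pixel) at the 1920x816 clip size mentioned above.
# NV12 as the transfer format is an assumption for this sketch.
FRAME_BYTES = 1920 * 816 * 3 // 2


def create_shared_file(path):
    """Create the backing file once, sized for a single frame."""
    with open(path, "wb") as f:
        f.truncate(FRAME_BYTES)


def write_frame(path, frame):
    """Producer side (e.g. the process decoding on the Intel GPU)."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), FRAME_BYTES) as view:
        view[:len(frame)] = frame


def read_frame(path):
    """Consumer side (e.g. the process using the other GPU)."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), FRAME_BYTES) as view:
        return view[:]


# Hypothetical file name; both "processes" are simulated in one script here.
path = os.path.join(tempfile.mkdtemp(), "frame.nv12")
create_shared_file(path)
frame = b"\x10\x80" * (FRAME_BYTES // 2)  # dummy payload, not real video
write_frame(path, frame)
print(read_frame(path) == frame)
```

In a real setup the two functions would live in different processes, and some signalling (an event or a frame counter in the mapping) would be needed so the consumer never reads a half-written frame.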
I'll report in this thread if there's anything new on the matter.
egur
20th September 2011, 07:31
egur... thank you for this and your continued work on it.
10x, I appreciate it.
CruNcher
20th September 2011, 13:14
Egur i expected that Marketing Answer and im not happy with it @ all ;)
egur
20th September 2011, 13:26
Egur i expected that Marketing Answer and im not happy with it @ all ;)
What marketing answer?
CruNcher
20th September 2011, 15:41
What marketing answer?
Eric,
This feature is only supported for systems with switchable graphics. More details can be found here:
http://www.intel.com/support/graphics/sb/CS-031103.htm
Details about how to set this up can be very system specific. We're hoping to add some clarifications to the documentation in the future.
Regards,
Jeff
That one ;)
egur
22nd September 2011, 09:52
I've run a few vtune sessions to optimize my code. New version (0.14) will be slightly faster than 0.13.
Test platform:
* Windows 7, 64 bit
* Core i7 2840 @2.4GHz (45W)
* MPC-HC (current version)
* A 10s clip. H264/AVC1, 1920x816, 243 frames
Vtune showed that the latest sse4_memcpy took 112ms for the entire clip. That's less than 0.5ms per frame (almost 1080p).
CPU usage was in the low single digits ~5%.
My DLL's code contributed 1/50 of that 5%.
A more important thing is that the CPU frequency stayed at 800MHz, the lowest frequency SNB-mobile will go to, for the entire clip. This is about 1/3 of the stock frequency and ~1/4 of max turbo.
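A quick sanity check of the figures above, as a minimal Python sketch. It assumes the clip is exactly 10 s long and that the 112ms covers all 243 frames; the variable names are made up for illustration.

```python
# Figures quoted in the post above.
CLIP_FRAMES = 243        # frames in the test clip
CLIP_SECONDS = 10.0      # clip length (assumed to be exactly 10 s)
COPY_TOTAL_MS = 112.0    # total time spent in the memory copy, per VTune
PROCESS_CPU = 0.05       # ~5% process CPU usage
DLL_FRACTION = 1 / 50    # share of that usage attributed to the decoder DLL

copy_ms_per_frame = COPY_TOTAL_MS / CLIP_FRAMES
copy_share_of_playback = COPY_TOTAL_MS / (CLIP_SECONDS * 1000.0)
dll_cpu = PROCESS_CPU * DLL_FRACTION

print(f"copy per frame: {copy_ms_per_frame:.2f} ms")
print(f"copy share of playback time: {copy_share_of_playback:.2%}")
print(f"DLL CPU usage: {dll_cpu:.2%}")
```

That works out to roughly 0.46 ms of copying per frame, about 1.1% of the playback time, and about 0.1% CPU for the DLL itself.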
Eliminateur
22nd September 2011, 12:42
i'm really looking forward to see your decoder implemented in mpc-hc!(if it's possible at all), since right now dxva decoding is broken for SNB on MPC-HC
egur
22nd September 2011, 12:48
i'm really looking forward to see your decoder implemented in mpc-hc!(if it's possible at all), since right now dxva decoding is broken for SNB on MPC-HC
It works with MPC-HC (32/64 bit).
MPC-HC is my only test platform for 64 bit BTW.
Using EVR in MPC-HC is very solid, except for several VC1 clips which are under investigation and which only libwmv9 can play properly.
It's still work in progress, but things are quite stable and I'd appreciate more testers.
In MPC-HC just uncheck the internal filters for MPEG2/H264/VC1 in the "options->Internal Filters" dialog. Add ffdshow to the external filter list, configure it to use IntelQuickSync and you're set to go.
Latest version is always available on the 1st page.
Comments are welcome.
Next release will come as an FFDshow installer like the standard builds.
Eliminateur
22nd September 2011, 13:00
what i meant was working as in "integrated" into the internal filters, not as part of ffdshow separate installation.
When i get a new Pentium Gxxx machine built here in the shop i'll test if it works with that series
egur
22nd September 2011, 13:04
what i meant was working as in "integrated" into the internal filters, not as part of ffdshow separate installation.
It's on my TODO list.
CruNcher
22nd September 2011, 13:23
Yep it got amazing fast now and CPU overhead is in the range of Lav Cuvid now for Yoon Yoon it was a dramatic improvement from that heavy utilization @ the beginning to 18% and now only 7-8% pretty good (for non DXVA) :)
It would now even make sense to try it in a Quicksync based Framework and see how it does there :D including the Encoder inside ffdshow also looks like a good idea :)
Yeah the decoding issue with the MC.ts bitstream is still a problem it's funny that also Nvidia had problems in the beginning of their API with this i wonder why this bitstream type was overlooked by Nvidia and Intel now ;).
Also i might have found another H.264 issue but i have to isolate this first it happened in a pretty normal playback scenario.
I also tested Intels PP system but it's fairly weak (Denoise,Sharpening) are pretty basic implementations currently
Also Deinterlacing and IVTC work only Efficient on EVR with EVR-CP MBAFF Deinterlacing and IVTC are failing currently, though not much of a big deal as Shader based PP are usable on both and with Aero on tearing is history anyways (for my weak GT1 6 EU it works still pretty reliable and i still have clock headroom to improve higher res input) :) :(
PS: I see that a Dummy works like expected from the Documentation very nice no need for the Lucid solution :D though that you have no reference what for P-States the GT1 is using i wonder how much power it draws if in this headless mode i guess as much as with a real display though, maybe a little lower depends on how the DSP is weaved together with the rest of the GPU and CPU and the efficiency of the Power Management Intel implemented :)
Now i slowly getting there todo a complete framework test between Nvidia and Intel :D
nevcairiel
22nd September 2011, 14:34
Nice to see the performance improvements, that'll surely make it much more usable in the future.
Luckily SSE4.1 is available since Penryn, so any recent Intel iGPU will be able to use it. :)
Looking forward to working on integrating it in LAV Video when i'm done integrating CUVID properly (and maybe wmv9, depending on what i decide to do first).
PS:
Regarding "integrating into MPC-HC", the MPC-HC integrated decoders are overall outdated, the only useful thing they offer is the DXVA decoder which works better than ffdshow's (which is based on the same code, but never was truly maintained)
I've always aimed to replace those decoders with an equally simple and easy to use, yet modern, decoder, which is exactly what my LAV Audio & Video are providing.
egur
22nd September 2011, 15:48
Yep it got amazing fast now and CPU overhead is in the range of Lav Cuvid now for Yoon Yoon it was a dramatic improvement from that heavy utilization @ the beginning to 18% and now only 7-8% pretty good (for non DXVA) :)
We need a method for measuring CPU usage. ffdshow-quicksync CPU usage (your example) went down from 18%@3.1GHz to 7-8%@0.8GHz.
Maybe a normalized formula is needed:
NormalizedCpuUsage = CpuUsage * NumPhysicalCores * Freq
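As a rough illustration of the formula above, here is a minimal Python sketch applied to the two measurements quoted in this post. The core count of 4 is an assumption (it is not stated for the test machine, and it cancels out in the ratio anyway), and 7.5% is used as the midpoint of "7-8%".

```python
def normalized_cpu_usage(cpu_usage_percent, num_physical_cores, freq_ghz):
    """CpuUsage * NumPhysicalCores * Freq, as proposed above."""
    return cpu_usage_percent * num_physical_cores * freq_ghz


CORES = 4  # assumption: the core count of the test machine is not stated

before = normalized_cpu_usage(18.0, CORES, 3.1)  # 18% at 3.1GHz
after = normalized_cpu_usage(7.5, CORES, 0.8)    # 7-8% at 0.8GHz, midpoint used

print(f"before: {before:.1f}, after: {after:.1f}, ratio: {before / after:.1f}x")
```

With these inputs the normalized figure drops by about 9.3x, while the raw percentages alone (18% vs 7.5%) would suggest only about 2.4x.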
Regarding power - SNB reduced voltage at 800MHz to about 0.7V-0.75V. In turbo it's ~1.2V. That's a major power drop.
It would now even make sense to try it in a Quicksync based Framework and see how it does there :D including the Encoder inside ffdshow also looks like a good idea :)
I think it's a good idea too. But not feasible in the short term. BTW, there are encoders in MSDK, but I know nothing about them.
Yeah the decoding issue with the MC.ts bitstream is still a problem it's funny that also Nvidia had problems in the beginning of their API with this i wonder why this bitstream type was overlooked by Nvidia and Intel now ;).
I gave the MC.ts clip to the MSDK team to check out.
There’s something wrong with it. My AMD Radeon 6950 DXVA crashes on it, libavcodec doesn’t work. Only WMV9 works well.
Also i might have found another H.264 issue but i have to isolate this first it happened in a pretty normal playback scenario.
I’m aware of the following bug (not root caused yet) :
Open an MKV/AVC1 clip in MPC-HC using EVR-CP as renderer --> crash.
But… if you open another file first and then open the crashing clip it will not crash!
It also doesn't crash with normal EVR. Very strange and very repeatable. The crash is within ffdshow.ax but before my constructor is called. In ZoomPlayer it never happened (no EVR-CP).
I also tested Intels PP system but it's fairly weak (Denoise,Sharpening) are pretty basic implementations currently
I get more detail using EVR on the IGP than with my AMD card. I guess it's a matter of taste.
Please post images for comparison. Also the IGP scaling is much better (I designed it :) )
Also Deinterlacing and IVTC work only Efficient on EVR with EVR-CP MBAFF Deinterlacing and IVTC are failing currently,…
Please explain. I don’t fully understand.
PS: I see that a Dummy works like expected from the Documentation very nice no need for the Lucid solution :D though that you have no reference what for P-States the GT1 is using i wonder how much power it draws if in this headless mode i guess as much as with a real display though, maybe a little lower depends on how the DSP is weaved together with the rest of the GPU and CPU and the efficiency of the Power Management Intel implemented :)
P states should be high (my guess) – a lot of memory traffic but the EUs should be idle. They don’t do much.
Now i slowly getting there todo a complete framework test between Nvidia and Intel :D
Excellent – this would be good for everyone. Someone needs to replace HQV with something more professional.
Nice to see the performance improvements, that'll surely make it much more usable in the future.
Luckily SSE4.1 is available since Penryn, so any recent Intel iGPU will be able to use it. :)
It works on Penryn (I have a Penryn laptop T400 Thinkpad), but poorly :(. The HW isn’t the same as SNB…
Looking forward to working on integrating it in LAV Video when i'm done integrating CUVID properly (and maybe wmv9, depending on what i decide to do first).
Excellent! I’m gathering requirements now. Please send them to me.
PS:
Regarding "integrating into MPC-HC", the MPC-HC integrated decoders are overall outdated, the only useful thing they offer is the DXVA decoder which works better than ffdshow's (which is based on the same code, but never was truly maintained)
I've always aimed to replace those decoders with an equally simple and easy to use, yet modern, decoder, which is exactly what my LAV Audio & Video are providing.
I don’t have the bandwidth to create a standalone DirectShow decoder. The MPC-HC devs will have to do it themselves, I guess. What I can do is create a standalone decoder with a C++ interface that’s not dependent on anything. This is BTW very close to where I am now. I’m missing interface requirements to make integration a smooth process (<1 week)
nevcairiel
22nd September 2011, 16:18
I gave the MC.ts clip to the MSDK team to check out.
There’s something wrong with it. My AMD Radeon 6950 DXVA crashes on it, libavcodec doesn’t work. Only WMV9 works well.
It's interlaced VC-1, so of course libavcodec won't work - interlaced VC-1 is not supported at all, sadly.
As far as i am aware, CUVID decoders on NVIDIA work fine with it, though. Sadly i don't have a copy of that file to check it out, and it appears no-one ever linked it publicly in this thread, or i was too blind to find it.
Excellent! I’m gathering requirements now. Please send them to me.
I'll get back to you on that. I don't really have a set of requirements defined, as most of the time as a developer of these components i just have to adapt to the APIs i have, be it CUDA/CUVID, the WMV9 decoder, DXVA2 or the Intel MSDK.
All i really need is some API at which i can throw compressed frames, and it somehow gives me back the decoded frames, including all necessary metadata.
Then again, there is timestamp handling, which will never work out of the box, so defining requirements for that is non-trivial. H264 and MPEG2 are easy, VC-1 is hard.
I'll think some about that.
If anything, your code will be a great template to build upon.
CruNcher
22nd September 2011, 16:29
I get more detail using EVR in the IGP then my AMD card. I guess a matter of taste.
Please post images for comparison. Also the IGP scaling is much better (I designed it )
Nice :) Yes, I'm going to do some, also of the PP stages and the differences to what is basically available to consumers, including a pretty basic comparison of things like SimHD (which IMHO is entirely overrated, at least in its current incarnation) to basic shader PP :)
This H.264 problem is not directly related to ffdshow-quicksync but to another popular 3rd party component that makes use of the decoder via DXVA, though I'm still checking this.
egur
22nd September 2011, 16:45
Sadly i don't have a copy of that file to check it out, and it appears no-one ever linked it publicly in this thread, ...
For MC.ts & CD.ts see CruNcher's post with links:
http://forum.doom9.org/showthread.php?p=1526099#post1526099
CD.ts plays fine and it's VC1 interlaced. CruNcher said in the post that MC.ts is field interlaced and CD.ts is frame interlaced. Something is completely screwed with the MC clip, I don't know what yet. It was sent to the MSDK guys for a solution.
Update
wmv9 (from ffdshow 3978) - reports clip as progressive (wrong). No block artifacts. No deinterlacing.
Intel decoder - clip is interlaced (TFF). EVR deinterlaces OK. Strong block artifacts in decoder at the macro block level. No idea why.
nevcairiel
22nd September 2011, 16:48
Field interlacing is rather rare for VC-1, however i've run across another clip that uses this just a short while ago - but i've never seen it before that. :)
Edit:
I can confirm that MC.ts plays fine with my CUVID decoder. :)
Blight
22nd September 2011, 22:13
cruncher:
When eric said he designed the scaler, he should have given a bit of background.
Prior to Intel, he worked on scaling algorithms (noise reduction as well if I remember correctly) at a company that designed image processing chips.
The company was bought out by Intel and the scaler algorithm work was integrated into future Intel chips.
A few years later... and now it's part of Sandy Bridge :)
CruNcher
22nd September 2011, 23:14
So the Scaler inside the first Atoms is also from Eric :D ?
They also bought Silicon Hive at the beginning of the year, most probably to improve the power consumption of future Atoms (and maybe even Ivy Bridge already, or at least Haswell) :D
Silicon Hive processors save tremendous area and power by moving almost all control from run-time to compile time.
In a statement Intel said that the acquisition would bring better still-imaging and multimedia video processor technology, compilers and software tools to the Atom processor portfolio. "The Silicon Hive capabilities will aid in the delivery of more differentiated Atom-processor based SoCs as multimedia and imaging grow in importance across the mobile smart device segments," Intel said.
Ahh i see Oplus Technologies Inc is the Deinterlacing Oplus Pixel-Entropy Deinterlacing ?
hehe
Image enhancement filter - enhances images and provides spatial noise reduction
Color transition improvement - improves picture sharpness by emphasizing color transitions
3D motion noise reduction filter
Motion adaptive de-interlacing
Color control - providing independent saturation and luminance control per color component
Skin tone enhancement - providing detection and improvement of skin tones
Adaptive contrast enhancement - dynamically adjusts the contrast curve, enhancing image detail
Pixel mode recognition algorithm - enables seamless 3:2 (NTSC) and 2:2 (PAL) reverse pulldown, at the pixel level, of TV-adapted film material.
Image detection, adjustment and positioning
High-quality, patented, linear and non-linear scaling
Patented keystone correction algorithm for high-quality electronic correction for projectors
Flexible windowing technology supporting any combination of video and graphics or video and HDTV, with size, position and order control.
At least i know what im comparing here ;)
So basically the equivalent of HQV from Israel
hehe have to find that patent ;)
hehe thats cool what he done with his Skin tone Research :D
Segman examined thousands of photos of people with different skin tones. “The aim was to establish if a certain skin color could indicate illness. Could a certain skin tone, for example, be influenced by liver problems?” He uncovered a relationship between skin colors that indicate normal health or illness. Segman found that when people are tense, their skin tone changes.
The first trials of Cnoga’s technology will soon be tested at Bnei Zion Hospital in Haifa, supervised by Prof. Eli Zuckerman.
egur
22nd September 2011, 23:39
So the Scaler inside the first Atoms is also from Eric :D ?
No, it's a completely different GPU, I'm not familiar with the details on the Atom...
egur
22nd September 2011, 23:41
...Also Deinterlacing and IVTC work only Efficient on EVR with EVR-CP MBAFF Deinterlacing and IVTC are failing currently...
Can you upload failing clips?
CruNcher
23rd September 2011, 01:23
Can you upload failing clips?
http://www.mediafire.com/?4gcnazd6y78c9yp
http://www.mediafire.com/?wlnkh80oaou827h
Sure works fine on EVR only EVR-CP is the Problem :(
It really seems there are only 2 ways to use the Adaptive Deinterlacing either using EVR or EVR-CP + DXVA Decoder
PS: I got Desktop Capturing to work via Quicksync trying to capture though 60 fps on DWM with BitBlt seems not really that efficient :D
http://img148.imageshack.us/img148/7202/thisiscool.png
The performance drop with Firefox is extreme here on both sides; IE is doing much better. Obviously there are dropped frames in the EVR, but not as bad as with Firefox :) (they are still far away from the performance of Microsoft's D2D engine)
http://img20.imageshack.us/img20/7202/thisiscool.png
egur
23rd September 2011, 10:12
Ahh i see Oplus Technologies Inc is the Deinterlacing Oplus Pixel-Entropy Deinterlacing ?
Before 2005 (when Oplus was bought), the technology went into video processors for flat panel TV and projectors.
The SNB algorithms were done when Oplus was already owned by Intel, almost 2 years later. Segman was one of the Oplus founders and he invented the algorithms for the early Oplus video processors, a little before my time, and I don't want to comment on his work. He didn't get along with Intel and left very quickly after the acquisition. He got a few million dollars from the sale like all the founders and now he pursues his exotic algorithms elsewhere...
I had a bug with identifying the interlace flag correctly, hence the sample-mbaff.ts clip didn't deinterlace. Fixed.
The other clip telecine-test.ts plays fine (EVR).
Regarding EVR-CP. Most video clips will not show video using this on a GT1. I think GT2 works, I'll recheck on Sunday when I come to work.
Almost all renderers disable Aero when MPC-HC starts playback (EVR doesn't :) ).
Does anyone have a solution for this?
Why is this happening?
CruNcher
23rd September 2011, 10:22
No such problems here (though I know that black frames also tend to happen if a bitstream is wrongly marked, for example as High 5.1; some decoders in the past didn't like that). My Win 7 is a very clean install though, and I haven't yet installed some optional updates that have to do with the D2D subsystem.
And please look carefully at the telecine one: the problem happens at the fade (the cut after the man gets dragged out of the frame by the bodyguard and the camera switches back to her on stage). It's almost invisible, but the difference between EVR and EVR-CP in that fade is visually recognizable.
nevcairiel
23rd September 2011, 10:47
Almost all renderers disable Aero when MPC-HC starts playback (EVR doesn't :) ).
Does anyone have a solution for this?
Why is this happening?
It's an option. You probably applied some settings preset which included this option.
Check the renderer options in the right-click menu.
CruNcher
23rd September 2011, 11:12
ouhh yeah that nasty thing i totally ignored that, since i first turned it off don't wanna remember it ;)
Blight
23rd September 2011, 11:28
I know that at least with MadVR, the reason for the option to disable Aero with fullscreen exclusive mode is that it makes the switch between exclusive/window mode faster.
Which is good and fine if you're running a standalone HTPC which never sees the desktop.
Cruncher:
I'm interested to see a visual comparison between SNB DeInterlacing and the NVIDIA/ATI/Software modes.
I believe eric's plan is to look into adding the hardware deinterlacing support after the decoding is fully stable :)
egur:
What other PP does the SNB support? Denoise? Sharpen? Weren't you talking about facial color correction a while back?
egur
23rd September 2011, 13:00
The various EVR CP black screen issues were solved by resetting the renderer to its default settings.
EVR CP gives a jitter of 7-8ms and an offset of ~8ms very constantly. I have no clue why, changing the EVR buffer count doesn't change anything. If someone has an idea, let me know.
CruNcher:
Regarding the IVTC drop (telecine-test.ts) – I can’t reproduce in my current version. I’ve placed a breakpoint in the code that drops IVTC and it doesn’t trigger. BTW IVTC isn’t activated on the first frame, I look for a field doubling flag and only when I see one I activate IVTC. The IVTC dropping heuristic is to count 4 frames from the last frame that had the field doubling flag. After 4 frames the 3:2 cadence is broken and IVTC is dropped. If someone has a better way of doing this, don’t be shy.
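The dropping heuristic described above can be sketched roughly as follows, in Python for brevity. This is not the actual decoder code: the class and method names are made up, and whether IVTC is dropped on the 4th unflagged frame or just after it is my reading of the description.

```python
class IvtcTracker:
    """Tracks whether IVTC should be active, per the heuristic described above."""

    # From the post: count 4 frames from the last frame with the flag.
    MAX_FRAMES_WITHOUT_FLAG = 4

    def __init__(self):
        self.active = False          # IVTC is not activated on the first frame
        self.frames_since_flag = 0   # frames seen since the last flagged frame

    def on_frame(self, has_field_doubling_flag):
        """Feed one decoded frame; returns True while IVTC is active."""
        if has_field_doubling_flag:
            # A flagged frame (re)activates IVTC and restarts the count.
            self.active = True
            self.frames_since_flag = 0
        elif self.active:
            self.frames_since_flag += 1
            if self.frames_since_flag >= self.MAX_FRAMES_WITHOUT_FLAG:
                # The 3:2 cadence is considered broken: drop IVTC.
                self.active = False
        return self.active


tracker = IvtcTracker()
flags = [False, True, False, True, False, False, False, False, True]
states = [tracker.on_frame(f) for f in flags]
print(states)
# [False, True, True, True, True, True, True, False, True]
```

IVTC turns on at the first flagged frame, stays on while flags keep arriving, drops on the 4th consecutive unflagged frame and comes back with the next flag.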
SNB video processing blocks are:
* Motion detection based deinterlacer
* Film cadence detection (for interlaced content) – works on raw video, not part of decoder
* Denoise (temporal and spatial)
* Context adaptive sharpening
* Context adaptive 8 tap polyphase scaler. Supports Non Linear Adaptive Scaling (4:3 -> 16:9)
* Total Color Control – something like digital vibrance.
* Automatic Contrast Enhancement
* Skin tone correction
* Color space conversion
* Surface conversion (e.g. YUY2 -> NV12)
* Frame rate conversion.
Note that most are pure ASIC implementations and don't affect CPU or EU utilization.
Other algorithms might have been added via kernels (shaders)
CruNcher
23rd September 2011, 18:58
Egur, I see. It seems there is indeed an interlaced frame that didn't get flagged as such, and the adaptive deinterlacing catches it on EVR.
Yep, I guess that latency has something to do with the nature of EVR-CP, there is a lot going on. You could try Jan's current test builds and see if they lower the latency.
Blight, at first glance, just from experience with different interlaced content, it does a very good job. Also look at what it does to this telecine clip http://forum.doom9.org/showthread.php?p=1526139#post1526139 :) where it catches that one fade very accurately; of course, as it is blended, it's very hard to get that right.
Egur, yeah, that is the big plus versus Nvidia/AMD's shader implementations: having the most important performance-hungry PP stuff, aside from decoding/encoding, natively and adaptive :)
Though my first test with context adaptive sharpening didn't look so good versus the SharpenComplex2 PS implementation. Obviously SharpenComplex2 almost kills the GT1 with high-resolution content, but with SD it does a good job and the GT1 survives it easily (and I rarely use it with HD content anyway). The results from Intel's hardware sharpening via the control panel don't look as good at first sight with heavily compressed streams (even pushing the slider to the edge), but obviously it does much better than SharpenComplex2 when you feed it a not so compressed image ;)
egur
23rd September 2011, 20:37
CruNcher:
I looked closer at the problematic telecine-test.ts clip. Indeed it has several interlaced frames right where you said it has.
The fix isn't easy and I don't know if it will be in the next release. At least I root caused the problem. My fix will include a much quicker switch from telecined to non telecined material.
There's also the PAL telecining 24->25 frames by repeating the 24th frame, but it's low priority now.
I'm having issues with time stamps. Several clips produce out of order time stamps. Currently I take the 1st good (non garbage) time stamp and use it as a reference. Subsequent output time stamps are derived from that first one. This isn't perfect and I'd like to know if anyone has a good (bulletproof) solution for this.
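The reference-based approach described above could look roughly like the following C++ sketch. It assumes a constant frame duration and a sentinel value for garbage time stamps; TimestampGenerator, kNoTimestamp and RefTime are hypothetical names, and how the decoder really recognises a garbage stamp is not stated in the post.

```cpp
#include <cassert>
#include <cstdint>

// 100 ns units, the same scale DirectShow uses for REFERENCE_TIME.
typedef std::int64_t RefTime;

// Hypothetical marker for a missing or garbage time stamp.
const RefTime kNoTimestamp = INT64_MIN;

class TimestampGenerator {
public:
    explicit TimestampGenerator(RefTime frameDuration)
        : frameDuration_(frameDuration) {
        assert(frameDuration > 0);
    }

    // Call once per output frame with the stamp that arrived with it.
    // Until a usable stamp has been seen, kNoTimestamp is passed through.
    RefTime Next(RefTime incoming) {
        if (!haveReference_) {
            if (incoming == kNoTimestamp) {
                return kNoTimestamp;   // nothing to anchor to yet
            }
            reference_ = incoming;     // first good stamp becomes the anchor
            haveReference_ = true;
        }
        // Every later stamp is derived from the anchor, ignoring the input.
        return reference_ + frameDuration_ * framesOut_++;
    }

private:
    RefTime frameDuration_;
    RefTime reference_ = 0;
    std::int64_t framesOut_ = 0;
    bool haveReference_ = false;
};
```

The weakness mentioned in the post is visible here: once the anchor is taken, a discontinuity or a variable frame rate in the stream is never picked up.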
nevcairiel
23rd September 2011, 20:56
I've never had any real issues with H264 or MPEG2 timestamps, they usually are rather consistent. (VC-1 is another matter, different splitters give different results, but there are luckily only 2 ways they do it)
If you just look at the incoming timestamps, they will of course be out of order, because those are presentation timestamps. They belong to the frame they are attached to, and frames are delivered in decode order, not in presentation order.
Any specific clip that gives you trouble?
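To illustrate the point about decode order versus presentation order, here is a small C++ sketch that puts presentation time stamps back in display order with a min-heap. PtsReorderQueue and its depth parameter are hypothetical; the depth stands for the number of frames the stream may reorder, which a real decoder would take from the bitstream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: collects PTS values in decode order and hands them
// back smallest first.
class PtsReorderQueue {
public:
    explicit PtsReorderQueue(std::size_t depth) : depth_(depth) {}

    void Push(std::int64_t pts) { heap_.push(pts); }

    // True once more than `depth` stamps are queued, i.e. the smallest one
    // can no longer be preceded by a later arrival.
    bool Ready() const { return heap_.size() > depth_; }

    bool Empty() const { return heap_.empty(); }

    // Removes and returns the smallest queued stamp.
    std::int64_t Pop() {
        assert(!heap_.empty());
        std::int64_t pts = heap_.top();
        heap_.pop();
        return pts;
    }

private:
    std::size_t depth_;
    std::priority_queue<std::int64_t, std::vector<std::int64_t>,
                        std::greater<std::int64_t>> heap_;
};
```

For an I P B B group delivered with stamps 0, 3, 1, 2 and a depth of 2, popping whenever Ready() is true and draining at end of stream yields 0, 1, 2, 3.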
CruNcher
23rd September 2011, 22:07
I know that at least with MadVR, the reason for the option to disable Aero with fullscreen exclusive mode is that it makes the switch between exclusive/window mode faster.
Which is good and fine if you're running a standalone HTPC which never sees the desktop.
Cruncher:
I'm interested to see a visual comparison between SNB DeInterlacing and the NVIDIA/ATI/Software modes.
I believe Eric's plan is to look into adding the hardware deinterlacing support after the decoding is fully stable :)
egur:
What other PP does the SNB support? Denoise? Sharpen? Weren't you talking about facial color correction a while back?
Blight, let's start with what the user currently sees and has direct access to: 7 types of output configuration.
Color
Denoise
Sharpening
Telecine
Skin tone
Adaptive Contrast
Scaling
In that order. Denoise and Sharpening have an automatic setting and a manual one. Telecine, like the others (AMD/Nvidia), has only an on/off switch (fully adaptive).
They are in the Media section, under 3 different subsections in the UI (Color, Picture, Scale). Deinterlacing is fully adaptive like Telecine and has only one quality level, so it is not configurable either and is not mentioned at all in the UI (makes perfect sense).
Color:
http://img842.imageshack.us/img842/7931/colorie.png
Picture:
http://img716.imageshack.us/img716/950/pictureqo.png
Scale:
http://img840.imageshack.us/img840/4071/scaler.png
Each of these 3 sections has 2 preview pictures to choose from (basic still pictures).
CruNcher:
I looked closer at the problematic telecine-test.ts clip. Indeed it has several interlaced frames right where you said it has.
The fix isn't easy and I don't know if it will be in the next release. At least I root caused the problem. My fix will include a much quicker switch from telecined to non telecined material.
There's also the PAL telecining 24->25 frames by repeating the 24th frame, but it's low priority now.
Nice, can't wait to see this in action :)
egur
24th September 2011, 08:20
I've never had any real issues with H264 or MPEG2 timestamps, they usually are rather consistent. (VC-1 is another matter, different splitters give different results, but there are luckily only 2 ways they do it)
If you just look at the incoming timestamps, they will of course be out of order, because those are presentation timestamps. They belong to the frame they are attached to, and frames are delivered in decode order, not in presentation order.
Any specific clip that gives you trouble?
The DC.ts is a good example:
Time stamps are fine when LAV's "Enable VC1 timestamp correction" is checked. Out of order otherwise (including the default 'auto' option) or when using MPC internal splitter. Haali splitter is also fine (although MPC-HC refuses to use it for mpeg2 transport for some reason).
I don't have access to the DShow graph from within my DLL to query the connected filters, and from a design point of view I think it's a terrible idea to patch my code according to the connected filters. Ill-behaved filters should be handled by the DShow decoder filter, if at all. I'll do a deeper scan of my clips to see if non VC1 formats behave well enough.
nevcairiel:
BTW LAV Splitter produces incorrect (or inconsistent with other splitters) frame rates: interlaced or telecined clips receive 59.97 AvgTimePerFrame in the initial VIDEOINFOHEADER. This is very confusing.
CruNcher
24th September 2011, 12:30
The DC.ts is a good example:
Time stamps are fine when LAV's "Enable VC1 timestamp correction" is checked. Out of order otherwise (including the default 'auto' option) or when using MPC internal splitter. Haali splitter is also fine (although MPC-HC refuses to use it for mpeg2 transport for some reason).
I don't have access to the DShow graph from within my DLL to query the connected filters, and from a design point of view I think it's a terrible idea to patch my code according to the connected filters. Ill-behaved filters should be handled by the DShow decoder filter, if at all. I'll do a deeper scan of my clips to see if non VC1 formats behave well enough.
nevcairiel:
BTW LAV Splitter produces incorrect (or inconsistent with other splitters) frame rates: interlaced or telecined clips receive 59.97 AvgTimePerFrame in the initial VIDEOINFOHEADER. This is very confusing.
Egur, do these inconsistencies vanish if you run it in Source mode, instead of the default File Source Async way?
That is still a big difference from how the internal MPC-HC splitter works, and I saw major differences here, especially with partly broken streams.
That also got fixed when running LAV Splitter in Source mode instead of going through "File Source Async" first, especially with evil trees.ts.
nevcairiel
24th September 2011, 12:57
nevcairiel:
BTW LAV Splitter produces incorrect (or inconsistent with other splitters) frame rates: interlaced or telecined clips receive 59.97 AvgTimePerFrame in the initial VIDEOINFOHEADER. This is very confusing.
The information in that header is really not interesting.
It's usually just taken from the container, if it's stored there. If not, the splitter will try to measure it. I would never assume that it's 100% reliable with any splitter. It's more of a "hint" than a reliable source of information. Decoding must work without it; a media type is perfectly valid with it set to 0.
The measured values can be off, depending on how the stream is encoded. Luckily, it's only measured on MPEG-TS and similar formats, and MPEG-TS is tightly timestamped, so the value is not needed to calculate missing timestamps.
MKV might contain timestamp gaps, but it does include a proper fps header in the container instead.
It never was a problem before for me.
PS:
Filters are not "ill behaved", there are just two ways to handle VC1. I believe I explained that in a PM before. There are PTS and DTS timestamps. MPC-HC's MPEG Splitter outputs PTS, Haali outputs DTS, LAV can output both, depending on which decoder is connected (and ffdshow expects PTS).
PTS timestamps would look out of order if you don't handle them properly; that's perfectly normal and valid. H264 and MPEG2 always use PTS.
To top the VC1 problem off, VC1 in MKV or in WMV only contains DTS, while MPEG-TS contains both DTS and PTS.
egur
24th September 2011, 14:45
CruNcher:
I use LAV/Haali/MPC splitters only as source filters.
BTW, I doubt I can decode both evil_trees.ts and telecine-test.ts using a common logic. Both clips stop marking the frames as telecined (field doubling) mid playback. evil_trees.ts looks best if I don't drop out of IVTC immediately, and telecine-test.ts looks bad no matter what I do, but only for a fraction of a second, almost impossible to notice. Even if I switch correctly back to interlaced, the renderer doesn't handle this well and shows the same frame 3 times and then 2-3 frames that should have been interlaced but have not.
Since both clips have identical behavior (same flags on the decoded frames) it's impossible to tell them apart. DGindex reports that evil trees is 12% interlaced, 88% film...
I'd like to hear your opinion on this.
nevcairiel:
How can I tell if I get DTS or PTS, or missing time stamps? Sounds like hell :)
Is there code for dealing with this somewhere?
To avoid license problems (is there a problem?), I can put the time stamp normalizing code in ffdshow, not in my DLL.
BTW expecting a valid value for AvgTimePerFrame is legit. A zero value might mean I need to measure (need a few frames to do so), but a wrong value is misleading. Microsoft's documentation doesn't say you can ignore this value.
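As a rough C++ illustration of measuring the frame duration from a few frames when AvgTimePerFrame is zero, the sketch below takes the median gap between sorted presentation time stamps. EstimateFrameDuration is a hypothetical helper, and the median is simply one plausible estimator; for variable frame rate content no single value is correct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: estimates the frame duration (same units as the
// input, e.g. 100 ns) from the stamps of the first few frames.
// Returns 0 when there is not enough data to measure.
std::int64_t EstimateFrameDuration(std::vector<std::int64_t> pts) {
    if (pts.size() < 2) {
        return 0;
    }
    // Stamps arrive in decode order, so sort them into display order first.
    std::sort(pts.begin(), pts.end());

    std::vector<std::int64_t> deltas;
    for (std::size_t i = 1; i < pts.size(); ++i) {
        std::int64_t d = pts[i] - pts[i - 1];
        if (d > 0) {
            deltas.push_back(d);       // skip duplicate stamps
        }
    }
    if (deltas.empty()) {
        return 0;
    }
    // The median is less sensitive to a single gap than the mean.
    std::size_t mid = deltas.size() / 2;
    std::nth_element(deltas.begin(), deltas.begin() + mid, deltas.end());
    std::int64_t median = deltas[mid];
    assert(median > 0);
    return median;
}
```

A caller could fall back to this only when the media type carries 0, and otherwise treat the header value as the hint it is.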
nevcairiel
24th September 2011, 15:12
Even if it's legit, it's not required. Just use the timestamps you get from the source filter, that's what they are for! :)
Zero AvgFrameTime means it could as well be variable frame rate, which sounds like something your code would crash and burn with.
The fun about VFR is that it will most likely not be 0; it'll probably be whatever frame rate the first segment is.
@timestamps in general:
I'm sure the MSDK offers an ability to handle timestamps by itself. This will probably be fine with all PTS timestamps. At least that's the case with NVIDIA's API and DXVA2.
Just for DTS timestamps, you need to manually map the incoming times to the outgoing frames. Since the number of frames coming in and going out is usually the same, that shouldn't be much of a problem. I have that implemented in LAV Video using a FIFO buffer for the CUVID decoder (because I don't know its exact processing delay), and a fixed circular buffer in the avcodec decoder (because there I know its exact decoding delay).
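The two mapping strategies mentioned above can be sketched in C++ as follows. This is not LAV Video's code; FifoTimestampMap and RingTimestampMap are hypothetical names, and both assume exactly one output frame per input frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// FIFO variant, for a decoder whose delay is unknown: stamps are queued on
// input and handed out in the same order whenever a frame comes out.
class FifoTimestampMap {
public:
    void OnInput(std::int64_t dts) { pending_.push(dts); }

    std::int64_t OnOutput() {
        assert(!pending_.empty());
        std::int64_t dts = pending_.front();
        pending_.pop();
        return dts;
    }

private:
    std::queue<std::int64_t> pending_;
};

// Ring variant, for a decoder known to lag by exactly `delay` frames.
class RingTimestampMap {
public:
    explicit RingTimestampMap(std::size_t delay)
        : slots_(delay + 1), delay_(delay) {}

    void OnInput(std::int64_t dts) {
        slots_[count_ % slots_.size()] = dts;
        ++count_;
    }

    // Stamp for the frame emitted after the most recent input.
    // Returns false while the decoder is still filling its pipeline.
    bool OnOutput(std::int64_t& dts) const {
        if (count_ <= delay_) {
            return false;
        }
        dts = slots_[(count_ - 1 - delay_) % slots_.size()];
        return true;
    }

private:
    std::vector<std::int64_t> slots_;
    std::size_t delay_;
    std::size_t count_ = 0;
};
```

The FIFO needs no knowledge of the pipeline depth but relies on the caller seeing every output frame; the ring uses fixed storage and keeps working if an output is skipped, at the cost of needing the exact delay.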