View Full Version : Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing
pulbitz
21st October 2011, 17:08
I'm sorry. I don't speak English very well.
audio/video unsync (with Gabest Splitter) sample files.
(2011.09.28) Hyun Young 조현영 _A_ @ Gachon University Festival Celebration Fancam(720p_H.264-AAC).mp4
http://o-o.preferred.fra02s05.v5.lscache1.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Cratebypass%2Ccp&fexp=904539%2C914032%2C903119%2C900221&itag=22&ip=121.0.0.0&signature=8C7C37527E0C230FA0A01CFAE0620BFA923D7B6E.8EA9A67F9D7C0A227858F78B0DFBF2ADE9A5CF1B&sver=3&ratebypass=yes&source=youtube&expire=1319148000&key=yt1&ipbits=8&cp=U0hQTlFPVl9FSkNOMF9JSVpBOk12b085N3JnWmY4&id=1089491d982d9386
(2011.10.06) Hyun Young 조현영 _Mach_ @ Gyeonggi University of S&T Festival Fancam(720p_H.264-AAC).mp4
http://o-o.preferred.fra02s05.v7.lscache8.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Csource%2Cratebypass%2Ccp&fexp=904539%2C914032%2C903119%2C900221&itag=22&ip=121.0.0.0&signature=5F64ED712E76CE6F1A91E8CB333257FA14091F93.9D07A528297624A90D5979D530452CFEF44787AE&sver=3&ratebypass=yes&source=youtube&expire=1319148000&key=yt1&ipbits=8&cp=U0hQTlFPVl9FSkNOMF9JSVpBOk12b085N3JnWmY4&id=2130d6979f680c4c
QuickSync = 30.303fps
libavcodec = 29.97xfps
please improve your timestamp code more. :)
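A possible explanation for the two figures above, offered as a guess rather than anything confirmed in this thread: DirectShow timestamps count 100 ns ticks, so a 29.97 fps frame lasts about 333667 ticks, while a frame duration cut down to a whole 33 ms (330000 ticks) works out to exactly 30.303 fps. A minimal C++14 sketch of that arithmetic (the helper name is hypothetical, not taken from the decoder source):

```cpp
#include <cstdint>

// DirectShow REFERENCE_TIME counts 100 ns ticks: 10,000,000 per second.
constexpr std::int64_t kTicksPerSecond = 10000000;

// Frame rate implied by the duration of a single frame.
// Hypothetical helper for illustration only.
constexpr double fpsFromDuration(std::int64_t durationTicks) {
    return static_cast<double>(kTicksPerSecond) /
           static_cast<double>(durationTicks);
}

// 330000 ticks (33 ms)      -> about 30.303 fps
// 333667 ticks (33.3667 ms) -> about 29.97 fps
```

If this reading is right, the fix is to keep frame durations in ticks instead of rounding them to whole milliseconds.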
I know I need to improve the time stamps. Fixed a few things in the v0.16 but there's still more work to do...
The links you've posted are not working - "access denied" for both. You can share very quickly on http://www.multiupload.com
I uploaded the files. Please try again.
http://www.mediafire.com/?jql2qu8xj22c2ar
http://www.mediafire.com/?ye6oudmltypr7ey
egur
21st October 2011, 23:00
I uploaded the files. Please try again.
http://www.mediafire.com/?jql2qu8xj22c2ar
http://www.mediafire.com/?ye6oudmltypr7ey
Got the files.
The newest version (to be released soon) seems to play them well. They have a constant frame rate of 29.97 fps, which holds steady for the entire clip.
Other than audio/video sync issues, are there any other problems?
BTW, do you know what camera produced these clips?
I want to thank all you guys for helping me make this a better product by testing and providing clips :)
JanWillem32
21st October 2011, 23:50
I haven't seen anything like this - two DXVA devices from different GPUs passing surfaces from one to the other?
I can take your word for it but it's probably extremely complicated to accomplish.
In the Intel GPU, I don't think there's any DMA going on when copying surfaces back and forth to the CPU. It's the same memory sitting on the same memory controller. A special SSE4 instruction was introduced in Penryn to address the complex mapping to solve the speed issues.
DMA just means getting valid pointers to memory outside of regular system memory. It's just a matter of pointer logic between two DirectX 9 devices and giving direct access, without invoking a copy operation on an entire texture. As far as I know, DXVA uses rather normal render targets. Sharing a texture would look like this:
HANDLE SharedHandle = NULL; // not yet initialized
// The handle is initialized by this call; the texture is created on device 1.
D3DDevice1->CreateTexture(Width, Height, 1, D3DUSAGE_RENDERTARGET, SurfaceType, D3DPOOL_DEFAULT, &RTTexture1, &SharedHandle);
// The handle is used by this call; no extra texture is created. The COM pointer RTTexture2 is usable only by D3DDevice2.
D3DDevice2->CreateTexture(Width, Height, 1, D3DUSAGE_RENDERTARGET, SurfaceType, D3DPOOL_DEFAULT, &RTTexture2, &SharedHandle);
This way device 2 will have read/write access to that texture hosted in device 1's memory. Ownership will stay with device 1, so when quitting, release RTTexture2 before it becomes invalid by releasing RTTexture1. (The same old story as with "new" and "delete": get rid of additional pointers first before destroying the object itself.)
The helper function for DXVA on EVR does a similar thing (although usually on the same adapter). Shared handles are also used when you want to access textures from a DirectX 9 to a DirectX 11 device.
pulbitz
22nd October 2011, 06:54
Got the files.
The newest version (to be released soon) seems to play them well. They have a constant frame rate of 29.97 fps, which holds steady for the entire clip.
Other than audio/video sync issues, are there any other problems?
BTW, do you know what camera produced these clips?
I want to thank all you guys for helping me make this a better product by testing and providing clips :)
No other problems.
Probably Samsung HMX-S10. (I found it difficult. :p)
Thanks for the fix!
P.S. Not related to the QuickSync decoder, but can you look at this bug? http://communities.intel.com/thread/24972
markanini
22nd October 2011, 08:59
P.S. Not related QuickSync decoder. But can you look at this bug? http://communities.intel.com/thread/24972
Not sure if related but I'm seeing similar poor chroma upsampling on flash video.
egur
22nd October 2011, 15:59
P.S. Not related QuickSync decoder. But can you look at this bug? http://communities.intel.com/thread/24972
Not sure if related but I'm seeing similar poor chroma upsampling on flash video.
I only use the decoder part of QuickSync ATM. The post processing comes from the renderer.
The video processing pipeline, which I'm very familiar with, doesn't care about the surface format; what matters is that both are 4:2:0. If NV12 and YV12 give different results, it looks like a driver bug.
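To illustrate why the two formats carry the same information, here is a small C++ sketch of the NV12 and YV12 memory layouts. It assumes tightly packed 8-bit surfaces (pitch equal to width) with even dimensions; real D3D surfaces usually have a larger pitch, and all function names are made up for this example.

```cpp
#include <cstddef>

// Both formats store a full-resolution Y plane followed by chroma sampled
// at half resolution horizontally and vertically (4:2:0).
constexpr std::size_t lumaBytes(std::size_t w, std::size_t h) { return w * h; }
constexpr std::size_t chromaPlaneBytes(std::size_t w, std::size_t h) {
    return (w / 2) * (h / 2);
}

// NV12: Y plane, then a single plane of interleaved U,V byte pairs.
constexpr std::size_t nv12Bytes(std::size_t w, std::size_t h) {
    return lumaBytes(w, h) + 2 * chromaPlaneBytes(w, h);
}

// YV12: Y plane, then a complete V plane, then a complete U plane.
constexpr std::size_t yv12Bytes(std::size_t w, std::size_t h) {
    return lumaBytes(w, h) + chromaPlaneBytes(w, h) + chromaPlaneBytes(w, h);
}

// Byte offset of the U sample for chroma position (cx, cy).
constexpr std::size_t nv12UOffset(std::size_t w, std::size_t h,
                                  std::size_t cx, std::size_t cy) {
    return lumaBytes(w, h) + cy * w + 2 * cx;  // U is the first byte of each pair
}
constexpr std::size_t yv12UOffset(std::size_t w, std::size_t h,
                                  std::size_t cx, std::size_t cy) {
    // Skip the Y plane and the V plane that precedes U.
    return lumaBytes(w, h) + chromaPlaneBytes(w, h) + cy * (w / 2) + cx;
}
```

The same chroma samples are present in both cases; only their arrangement in memory differs, which is why a difference in output quality points at the code path handling one of the layouts.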
CruNcher
22nd October 2011, 19:17
Wow, I just realized (testing ffdshow-quicksync) that some vendor DXVA implementations can avoid ref frame issues on EVR :D CoreAVC and ArcSoft are among them, without losing hardware playback or needing to fall back to software. Very impressive :)
egur
22nd October 2011, 19:29
Wow, I just realized (testing ffdshow-quicksync) that some vendor DXVA implementations can avoid ref frame issues on EVR :D CoreAVC and ArcSoft are among them, without losing hardware playback or needing to fall back to software. Very impressive :)
Can you explain what the ref frame issue with EVR is? And what did I do better than the others?
BTW, my decoder can fall back to SW under certain conditions. WMV3 isn't HW accelerated at all, and clips with a height or width larger than 1080p also fall back to SW.
This is noticeable by a significant change in CPU usage.
In a future version I'll notify the app (ffdshow in my case) that no HW acceleration is available, and the app can choose whether to use Intel's SW implementation or another SW decoder.
Update:
CPU usage is very high for Intel SW implementation because I keep using the D3D surfaces as frame buffers. This (of course) is far from optimal but will be addressed in the next release.
So head to head benchmarks between Intel's SW implementation and libavcodec/libwmv9 will have to wait till then.
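The fallback rule described in this post can be sketched as a simple predicate. This is only an illustration: the enum, the function name and especially the exact size limits are assumptions, since the post only says "larger than 1080p".

```cpp
// Codecs mentioned in this thread; the enum itself is hypothetical.
enum class Codec { H264, MPEG2, VC1, WMV3 };

// ASSUMPTION: "larger than 1080p" is read as a coded frame wider than 1920
// or taller than 1088 (1080 rounded up to whole 16-pixel macroblocks).
constexpr int kMaxHwWidth = 1920;
constexpr int kMaxHwHeight = 1088;

// True when the clip would be decoded in hardware under the rule above.
constexpr bool canUseHardware(Codec codec, int width, int height) {
    return codec != Codec::WMV3 &&
           width <= kMaxHwWidth &&
           height <= kMaxHwHeight;
}
```

A host application notified of a `false` result could then pick its preferred software decoder instead of relying on the CPU usage jump to notice the fallback.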
vivan
22nd October 2011, 20:38
CoreAVC-dxva (the same applies to the mpc-dxva, ffdshow-dxva and mirillis splash player) + 1080p with 16 ReFrames:
http://2.firepic.org/2/images/2011-10/22/mv9yqyb9f86m.png
But with your decoder everything is perfect :)
P.s. another one sample with vfr (at least mediaInfo says so), that causes a/v desync: http://www.mediafire.com/?d54fscbebq2p4ag
And this one is with real vfr (60/30): http://akross.info/guest/[akross.ru]_Artofeel_-_Minimalistique_alt.mkv
egur
22nd October 2011, 21:11
CoreAVC-dxva (the same applies to the mpc-dxva, ffdshow-dxva and mirillis splash player) + 1080p with 16 ReFrames:
http://2.firepic.org/2/images/2011-10/22/mv9yqyb9f86m.png
But with your decoder everything is perfect :)
P.s. another one sample with vfr (at least mediaInfo says so), that causes a/v desync: http://www.mediafire.com/?d54fscbebq2p4ag
And this one is with real vfr (60/30): http://akross.info/guest/[akross.ru]_Artofeel_-_Minimalistique_alt.mkv
Thanks for the clip, I'll check it out.
CoreCodec are free to use my code as reference or 'as is' if they want to. Hence the BSD license.
If they do, another goal has been met at some level: improving SW that uses the QuickSync technology.
If someone has a clip that demonstrates this failure, please share. Don't be shy.
CruNcher
22nd October 2011, 22:24
egur look @ http://forum.doom9.org/showthread.php?t=159486 i tested with the oceanic samsung x264 encode and CoreAVC 3.0.1 and Arcsofts DXVA2 survive it :) and i confirmed that they don't fallback to Software decoding either (some implementations do that after a bitstream check)
Vivan, strange: I tested several clips that exceed the ref frame limit, from several x264 builds, and it seems CoreCodec at least found a workaround, the same as ArcSoft did. It's impressive; other solutions either switch to software or show errors in areas of each frame :D
egur this though isn't a problem that should interest you as you don't use DXVA your decoder will not suffer from this :)
CyberLink's DXVA really struggles with this, erroring out on every frame with core 56 bitstreams.
egur
22nd October 2011, 23:33
egur this though isn't a problem that should interest you as you don't use DXVA your decoder will not suffer from this :)
Actually, my decoder uses the Intel Media SDK which uses DXVA2. So indirectly I'm using DXVA...
CruNcher
23rd October 2011, 10:55
Yup, I'm not sure, but it seems only DXVA1 implementations are affected, and CyberLink seems to still be DXVA1. Both ArcSoft and CoreAVC are DXVA2 implementations and don't suffer from it, though CoreAVC plays even more test bitstreams than ArcSoft does :)
vivan
23rd October 2011, 11:18
egur look @ http://forum.doom9.org/showthread.php?t=159486 i tested with the oceanic samsung x264 encode and CoreAVC 3.0.1 and Arcsofts DXVA2 survive it :) and i confirmed that they don't fallback to Software decoding either (some implementations do that after a bitstream check)
Even mpc-dxva plays it well; I guess it just doesn't use that many ref frames.
Vivan strange i tested several over ref frames clips from several x264 builds and it seems CoreCodec at least found a workaround the same as Arcsoft did, its impressive other solutions either switch to Software or error per frame areas :D
Maybe their implementations are a bit better, but they are still far from this decoder.
E.g. with this sample (http://www.mediafire.com/?id7cd8aujnmwslf), CoreAVC shows a mess for about 2/3 of the time and mpc-dxva for about 9/10, but Mirillis Splash player has only a few artifacts (so ~1/30 of the time).
At least on my i5-2410M with intel HD 3000.
CruNcher
23rd October 2011, 12:06
Thx for the sample, going to check. And yes, sure, ffdshow-quicksync is more robust, the same as NVIDIA CUVID is, though CUVID also works under XP (NT5), which Intel doesn't support anymore for valid reasons :)
But the ref frame problem is one of the major reasons people change from DXVA to other solutions, along with the upcoming 10-bit and 4:2:2 wave (though that is a problem at the hardware support level). Obviously, at least under NT 6, it's much better in terms of flexibility when mixing different inputs (subtitles, interactive layers (GUIs), different content, PP systems (shader, CPU, compute)), and also in terms of editing via the copy and transfer capabilities between GPU memory, though not many make use of this currently :)
My work on my High Efficiency DWM Desktop (DirectX,OpenGL) Capture and Transcode framework leverages all of this (except compute currently) mixing it in near realtime (latency of H.264, still testing different efficiency scenarios because of GPU/CPU dependency) im confident i can do even better then Mirillis with Action http://mirillis.com/en/products/action.html (and their FIC codec) does in the End on supported systems :D
PS: Yep Vivan, that bitstream is hardcore (exaggerated anime encoding style ;) ). ArcSoft DXVA freezes straight at the start, CyberLink's goes haywire, and only CoreAVC's DXVA can produce something, with the errors you showed from here to there :) It's amazing how CoreCodec compensates even at this level; lower levels seem to be no problem for them to fix entirely (which might also be the reason it's not up to the performance of CyberLink's DXVA, which fails even at lower levels) :)
Mirillis really pushes the boundary here once again wow as you said ultra low CPU ultra low GPU and only 1 issue their custom Renderer and Decoder is really efficient as hell (these polish guys really fascinate, they are on the super right track)
Action is definitely gonna beat Fraps hands down i give you my word for it these guys know what they do :)
Though we should also be honest: these bitstreams are rare ;) They exist, but they are still rare (mostly from old x264 encodes, with libx264 used via ffmpeg in the commercial space by users who have no idea what they are actually doing), though it's good to know that somebody cares about efficiency beyond the specs ;) (Vivan, be also advised I'm no fan of this exaggerated anime encoding style, as I find it useless in most cases. It's just not worth breaking specs for per-pixel quality that will hardly be visible at all, also in power consumption terms, and yeah, I'm no fan of placebo either ;))
egur
23rd October 2011, 14:12
Does anyone have an idea if I should correct 1440x1080 with aspect ratio of 4:3 to an aspect ratio of 16:9?
I've noticed some clips that have this wrong AR and thus displayed wrong.
The wrong aspect ratio of 4:3 exists in both the media type and the SPS (H.264), which is where the stream carries its aspect ratio information.
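For reference, the two candidate aspect ratios follow directly from the pixel shape: 1440x1080 displays as 16:9 when each pixel is 4:3 wide (the anamorphic layout used by HDV), and as 4:3 when pixels are square. A short C++ sketch of that arithmetic, with hypothetical names:

```cpp
#include <cstdint>

struct Ratio {
    std::uint32_t num;
    std::uint32_t den;
};

// Greatest common divisor, used to reduce the ratio to lowest terms.
constexpr std::uint32_t gcdU32(std::uint32_t a, std::uint32_t b) {
    return b == 0 ? a : gcdU32(b, a % b);
}

// Display aspect ratio = (width * pixelAspectNum) : (height * pixelAspectDen).
constexpr Ratio displayAspect(std::uint32_t width, std::uint32_t height,
                              std::uint32_t parNum, std::uint32_t parDen) {
    return Ratio{
        (width * parNum) / gcdU32(width * parNum, height * parDen),
        (height * parDen) / gcdU32(width * parNum, height * parDen)};
}

// displayAspect(1440, 1080, 4, 3) reduces 5760:3240 to 16:9.
// displayAspect(1440, 1080, 1, 1) reduces 1440:1080 to 4:3.
```

So a 1440x1080 clip shown as 4:3 means something in the chain is treating the pixels as square.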
nevcairiel
23rd October 2011, 14:49
If the stream is flagged improperly, don't touch it. It could just as well be meant to be 4:3; you will never know.
The only correction I do automatically is to crop a height of 1088 to 1080, because in 99.99% of all cases, that's correct. (The cropping flags are just missing from the bitstream.)
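The 1088 figure comes from H.264 coding pictures in 16x16 macroblocks: 1080 is not a multiple of 16, so the coded height is rounded up to 1088 and the extra 8 lines are meant to be cropped. A minimal C++ sketch of the heuristic described above (function names are illustrative, not from any decoder mentioned here):

```cpp
// H.264 macroblocks are 16x16 pixels.
constexpr int kMacroblockSize = 16;

// Coded height for a given display height: round up to whole macroblocks.
constexpr int codedHeightFor(int displayHeight) {
    return (displayHeight + kMacroblockSize - 1) / kMacroblockSize *
           kMacroblockSize;
}

// Heuristic from the post: treat a coded height of 1088 with no cropping
// information as 1080, and leave every other height alone.
constexpr int guessDisplayHeight(int codedHeight) {
    return codedHeight == 1088 ? 1080 : codedHeight;
}
```

Restricting the guess to exactly 1088 keeps it from touching streams whose height is legitimately something else.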
Blight
23rd October 2011, 16:53
I would add an optional check-box.
I've seen quite a few 1440x1080 clips and they were all 16:9.
I have never seen one at 4:3.
nevcairiel
23rd October 2011, 17:21
I have never seen a 1440x1080 clip that was wrongly flagged and playing as 4:3 even though it is 16:9, tbh.
One would imagine that in all this time, someone would've posted such a sample as a bug report.
egur
23rd October 2011, 20:01
If the stream is flagged improperly, dont touch it. It could as well be meant to be 4:3, you will never know.
The only correction i do automatically is crop a height of 1088 to 1080, because in 99.99% of all cases, thats correct. (Just missing the cropping flags in the bitstream)
I have never seen a 1440x1080 that was wrongly flagged and playing as 4:3 eventhough it is 16:9, tbh.
One would imagine that in all this time, someone would've posted such a sample as a bug report.
Totally right, it turned out to be a bug in the splitter (Haali). With LAV splitter the AR is correct. The clip was test.ts, posted by CruNcher a while back in this thread.
nevcairiel
23rd October 2011, 20:09
Ah, Haali's "feature" that replaces all AR definitions in the stream with the container-defined AR. Too bad a TS file does not have a container AR. ;)
Blight
24th October 2011, 14:16
egur/nevcairiel:
Was this reported to Haali?
nevcairiel
24th October 2011, 15:00
It's a feature in his book, because CoreAVC didn't have an option to ignore the stream AR, so instead of adding that option in the decoder, Haali just overwrites the stream AR with the container-defined AR. For MKV that might make sense; for any other format that's just terrible.
Anyhow, I have never seen him give any feedback whatsoever on issue reports on HMS.
egur
24th October 2011, 23:25
I get image corruption right after seeks on quite a few h264 TS clips.
After some testing, I've found out that Haali Media Splitter doesn't cause the corruptions. LAV and MPC do.
With Haali, the first NALU after a seek is an SEI followed by a SLICE. The other splitters deliver only SLICE NALUs, without anything else.
Before posting a bug report (or feature request) to nevcairiel, does anyone have an idea on the differences?
Also, it seems that Haali is significantly faster at seeks, or at least my decoder as well as libavcodec produce a frame much faster. The difference is instant vs 1-2 seconds.
Can someone explain this behavior difference? Possible workaround?
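For anyone who wants to inspect what a splitter delivers after a seek, the NAL unit type is the low five bits of the first byte after the start code. The C++14 sketch below classifies header bytes and finds the first IDR slice; the helper names are hypothetical, and waiting for an IDR is only one possible workaround, not what any splitter or decoder in this thread is confirmed to do.

```cpp
#include <cstddef>
#include <cstdint>

// H.264 nal_unit_type values relevant to this discussion.
enum NalType : std::uint8_t {
    kSliceNonIdr = 1,
    kSliceIdr = 5,
    kSei = 6,
    kSps = 7,
    kPps = 8
};

// The type is stored in the low five bits of the NAL header byte.
constexpr std::uint8_t nalUnitType(std::uint8_t headerByte) {
    return static_cast<std::uint8_t>(headerByte & 0x1F);
}

// Index of the first IDR slice in a list of NAL header bytes,
// or count when the list contains none.
constexpr std::size_t firstIdrIndex(const std::uint8_t* headers,
                                    std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (nalUnitType(headers[i]) == kSliceIdr) {
            return i;
        }
    }
    return count;
}
```

Broadcast TS streams often start their random access points with a non-IDR I slice plus a recovery point SEI, so a real implementation would need more than this check to decide when output is clean.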
nevcairiel
25th October 2011, 05:50
It's not a splitter "bug", it's just not perfect behavior. You either ignore it, or you take steps to not show an image until it's artifact free.
TS is not a format that was designed with seeking in mind, which means it does not carry any metadata about keyframes or such to make seeking easier.
It's already being tracked as an enhancement to try to find a key frame before delivery from the splitter, so the decoder has an easier job.
PS:
How does "nav" get into peoples minds when they read my name?
egur
25th October 2011, 07:58
Its already being tracked as an enhancement to try to find a key frame before delivery from the splitter, so the decoder has an easier job.
Very well, the snappy and clean seeks are worth the effort.
PS:
How does "nav" get into peoples minds when they read my name?
It means people are too tired when writing a post - corrected :)
nevcairiel
25th October 2011, 08:09
It means people are too tired when writing a post - corrected :)
I don't mind "nev", just i dont get where people get the a from, you're not the only one. :p
HeadlessCow
25th October 2011, 19:30
nev(cairiel) + LAV = nav
Probably.
egur
25th October 2011, 20:23
New and improved version. The zip files contain the installer and documentation; please read it.
Download version 0.17 alpha:
32 bit http://www.multiupload.com/I6XHZWQP2Y
64 bit http://www.multiupload.com/OUR6SXPVT1
Source code http://www.multiupload.com/J8X9WPAKXM
Revision highlights:
v0.17:
* Support variable frame rate video.
* More stable time stamps (audio sync issues).
* Fixed FFDShow’s frame rate measurement to better view frame rate changes.
* Better Media SDK initialization.
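As an illustration of what variable frame rate means at the timestamp level, the C++14 sketch below flags a clip as VFR when any frame duration differs from the first one by more than a tolerance. It is a generic check written for this note, with hypothetical names; it is not the logic used in the decoder.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::int64_t absDiff(std::int64_t a, std::int64_t b) {
    return a > b ? a - b : b - a;
}

// startTimes holds frame start times in 100 ns ticks, in display order.
// Returns true when some frame duration deviates from the first duration
// by more than toleranceTicks.
constexpr bool isVariableFrameRate(const std::int64_t* startTimes,
                                   std::size_t count,
                                   std::int64_t toleranceTicks) {
    if (count < 3) {
        return false;  // fewer than two durations: nothing to compare
    }
    const std::int64_t first = startTimes[1] - startTimes[0];
    for (std::size_t i = 2; i < count; ++i) {
        const std::int64_t duration = startTimes[i] - startTimes[i - 1];
        if (absDiff(duration, first) > toleranceTicks) {
            return true;
        }
    }
    return false;
}
```

With a 1 ms tolerance (10000 ticks), a steady 29.97 fps clip passes as constant, while a clip switching from 60 to 30 fps is flagged.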
vivan
26th October 2011, 20:56
With the new version QS decoding doesn't work on any video o_O
ffdshow is the preferred decoder, so on older versions it works perfectly. But with 0.17 MPC-HC uses the next decoder (LAV decoder, in my case). If I change the QS decoder (in ffdshow settings) to libavcodec, ffdshow works as it should (but, obviously, without QS).
I've rolled back to the older version and everything behaves as it should. Then I updated to the new version and QS is not working again. So the problem is not in the system settings...
SW: MPC-HC 1.5.2.3456, nVidia driver - 285.62, intel driver - .2509. W7 HP x64
HW: i5-2410M + GT540M.
egur
26th October 2011, 23:26
With new version QS decoding doesn't work on any video o_O
ffdshow is prefered decoder, so on older versions it works perfectly. But with 1.17 mpc-hc is using next decoder (lav decoder, in my case). If I change QS decoder (in ffdshow settings) to libavcodec - ffdshow works as it should (but, obviously, without QS).
I've rolled back to older version - everything behave as it should. Than updated to new version - QS is not working again. So, problem is not in the system settings...
SW: MPC-HC 1.5.2.3456, nVidia driver - 285.62, intel driver - .2509. W7 HP x64
HW: i5-2410M + GT540M.
Just reinstalled 32 and 64 bit using the installers (they also register the filters) and had no problem on MPC-HC 32 and 64 (same build as yours). All DLLs exist; I double checked with Dependency Walker to make sure the debug builds didn't escape.
Even deleted the files in program files and reinstalled again, still works.
Does anyone else have issues?
JEskandari
28th October 2011, 13:23
Well, when I try to use this as the decoder for PotPlayer I receive this error:
"Unhandled exception occurred [0xc0000005@0x6EEA545A] at
IntelQuickSyncDecoder.dll
Additional exception information has been stored locally
and this application will be terminated"
and when I use MPC-HC it crashes with this error report:
Problem signature:
Problem Event Name: APPCRASH
Application Name: mpc-hc.exe
Application Version: 1.5.2.3456
Application Timestamp: 4e29d332
Fault Module Name: IntelQuickSyncDecoder.dll
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 4ea5f220
Exception Code: c0000005
Exception Offset: 0000545a
OS Version: 6.1.7601.2.1.0.768.3
Locale ID: 1033
Additional Information 1: 0a9e
Additional Information 2: 0a9e372d3b4ad19135b953a78882e789
Additional Information 3: 0a9e
Additional Information 4: 0a9e372d3b4ad19135b953a78882e789
By the way, the system I tried this on has a Core i3-2310M and no discrete GPU, only the HD 3000, and I have installed the latest Intel driver.
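A note on reading the two crash reports above: exception code 0xC0000005 is an access violation, and subtracting the reported offset from the PotPlayer fault address gives a plausible load address for the DLL. The C++ sketch below just does that subtraction; it assumes both reports come from the same build of IntelQuickSyncDecoder.dll, which the thread does not state explicitly.

```cpp
#include <cstdint>

// Windows status code for an access violation (STATUS_ACCESS_VIOLATION).
constexpr std::uint32_t kAccessViolation = 0xC0000005u;

// Load address of a module, given an absolute fault address inside it and
// the fault offset relative to the module start.
constexpr std::uint32_t moduleBase(std::uint32_t faultAddress,
                                   std::uint32_t offsetInModule) {
    return faultAddress - offsetInModule;
}

// PotPlayer reports 0x6EEA545A; MPC-HC reports offset 0x0000545A.
// moduleBase(0x6EEA545A, 0x545A) == 0x6EEA0000
```

The result is aligned to 64 KB, as DLL load addresses are, which is consistent with both players crashing at the same instruction inside the decoder.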
egur
28th October 2011, 17:49
well ,when I want to use this as decoder for potplayer I receive this error
"Unhandled exception occurred [0xc0000005@0x6EEA545A] at
IntelQuickSyncDecoder.dll
Additional exception information has been stored locally
and this application will be terminated"
I've found a strange issue with PotPlayer: the HW VC1 decoder fails to play interlaced VC1 clips that MPC-HC and ZoomPlayer play well. This is a splitter difference. Using the Haali or LAV splitter solves the issue. I'll look into the matter further.
Regarding MPC-HC I didn't have a single issue. I need more information:
* All videos crash or a specific clip? If a single clip, please share.
* Driver version
* splitter used.
* Renderer used.
* Other non default setting for player
* Clip has subtitles or not. Which filter renders the subs.
Did you try other splitters?
egur
30th October 2011, 22:17
New and improved version. The zip files contain the installer and documentation; please read it.
Download version 0.18 alpha:
32 bit http://www.multiupload.com/VWSVG172HQ
64 bit http://www.multiupload.com/AR3VDNBA6P
Source code http://www.multiupload.com/UQEUJB4WST
Revision highlights:
v0.18:
* Fixed FFDShow’s H264 sequence header parsing crash. A lot of users reported crashes with the last build. This was a long standing FFDShow issue that affected specific clips.
* Added black borders to images whose width is not a multiple of 16. Retaining a non-standard width can cause downstream filters to crash (DirectVobSub/VSFilter).
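The width padding in the last item amounts to rounding up to the next multiple of 16. A minimal C++ sketch of that calculation follows; the function names are made up here, and how the decoder distributes the border between the left and right edges is not stated in the release notes.

```cpp
// Round value up to the next multiple of alignment (alignment > 0).
constexpr int alignUp(int value, int alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Total number of black border pixels needed to reach a width that is a
// multiple of 16.
constexpr int borderPixels(int width) {
    return alignUp(width, 16) - width;
}

// Example: a 718 pixel wide image is padded to 720, so borderPixels(718) is 2.
```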
vivan
31st October 2011, 06:02
With new version my problem was fixed. Thanks!
egur
31st October 2011, 08:57
With new version my problem was fixed. Thanks!
Great!
BTW, PotPlayer's internal splitter is still not supported, and neither is WMC full screen. For the latter I have no clue why ffdshow is not loaded, and since it's fullscreen exclusive it's impossible to debug with a single screen :(. If someone has any insight on this, let me know.
boliver10
5th November 2011, 01:32
I have an I7 2600k processor and two monitors (with two HDMI from the motherboard directly: no separate videocard)
My test is 4:2:0 interlaced PAL mkv MPEG2 file.
But when I run your FFDShow (with Intel selected for all MPEG2) I still get libavcodec running. And I have interlacing lines...
I'm using MPC-HC and am sure FFDShow is being used.
What can I do?
Note, I also get interlace lines with Microsoft's DTV codec.
egur
5th November 2011, 13:43
I have an I7 2600k processor and two monitors (with two HDMI from the motherboard directly: no separate videocard)
My test is 4:2:0 interlaced PAL mkv MPEG2 file.
But when I run your FFDShow (with Intel selected for all MPEG2) I still get libavcodec running. And I have interlacing lines...
I'm using MPC-HC and am sure FFDShow is being used.
What can I do?
Note, I also get interlace lines with Microsoft's DTV codec.
Deinterlacing is (usually) performed by the renderer.
Try the following setup:
Select EVR as a renderer (in MPC-HC press "O" then "output").
In ffdshow configuration uncheck the deinterlacing check box.
Note that some renderers do not perform deinterlacing.
If you still see interlace lines (lack of deinterlacing), I'll need to see the clip myself. Maybe the clip is flagged wrong. Please share it (or a portion of it). A large clip will take time to download. www.multiupload.com is the easiest way to go.
CruNcher
6th November 2011, 22:34
Egur finally having the capability to do this http://www.mediafire.com/download.php?6uhkiqiajnak5c4 a side by side compare of ffdshows-quicksync (left PID:6276) overhead vs cyberlink native dxva (right PID:7524) :)
egur
7th November 2011, 15:59
Egur finally having the capability to do this http://www.mediafire.com/download.php?6uhkiqiajnak5c4 a side by side compare of ffdshows-quicksync (left PID:6276) overhead vs cyberlink native dxva (right PID:7524) :)
Relatively high cpu usage, what are the details (resolution, CPU type, output surface format, etc.)
egur
7th November 2011, 16:14
SourceForge homepage:
http://sourceforge.net/p/qsdecoder
Currently only useful for source control (SVN).
The FFDShow code changes were merged into FFDShow's code trunk. They will be part of the next official FFDShow release (very similar to 0.18 alpha).
Next on my task list (v0.19):
* Create configuration to enable/disable certain features as asked by several developers for easy integration.
* Fix fullscreen problem in WMC (not loading ffdshow for some reason).
* Export D3D surfaces (DXVA2 samples) instead of system memory buffers. Will provide DXVA speed without actually dealing with DXVA...
If all goes well, version 0.20 will add video postprocessing (deinterlacing, film cadence correction, noise reduction, sharpness, etc.)
rsd78
7th November 2011, 17:14
Hi Eric,
Very interested by your work here! Will definitely check it out once the WMC fullscreen issue is fixed since I'm a WMC only user. One question I did have is since this is using ffdshow, will using the Mediacontrol plugin continue to work as well? Mediacontrol is huge for me (easy sub/audio stream control and ff/rew), so I'm hoping so.
Thanks for your great work!
nevcairiel
7th November 2011, 17:31
I just looked over your ffdshow changes, and for the record: it's always much nicer to keep changes separated into multiple commits. For example, the bug fixes and the addition of the QS decoder should've been at least two commits, or more. Just sayin', it's not my project or anything. :)
One thing I noticed though: your SSE2 memcpy seems superfluous. If ffdshow is configured to use function intrinsics, the MS compiler will already use an optimized memcpy using SSE2 if available. I did some testing along those lines recently, and a custom SSE2 memcpy was actually not faster.
In addition to that, I don't think ffdshow had a hard dependency on SSE2 before.
egur
7th November 2011, 18:52
I just looked over your ffdshow changes, and for the record: Its always much nicer to keep changes separated amongst multiple commits. For example, the bug fixes and the addition of the QS decoder should've at least been two commits, or more. Just sayin', its not my project or anything. :)
Usually, yes, but it was hard to separate everything since a lot have changed.
One thing i noticed though. Your sse2 memcpy seems superflous. If ffdshow is configured to use function intrinsics, the MS compiler will already use a optimized memcpy using sse2 if available. I did some testing along those lines recently, and a custom sse2 memcpy was actually not faster.
In addition to that, i don't think ffdshow had a hard dependency on sse2 before.
Maybe VS2010 got it right :) I just copied the function from another program that was compiled with VS2005. Back then, it was 2x faster (on Core2Duo and P4).
SSE2 implies a Pentium 4 or later, if I remember correctly (the Pentium 3 only has SSE). Not a crazy dependency :)
I'll run a few more tests and kill it if performance is not gained.
egur
7th November 2011, 22:06
Hi Eric,
Very interested by your work here! Will definitely check it out once the WMC fullscreen issue is fixed since I'm a WMC only user. One question I did have is since this is using ffdshow, will using the Mediacontrol plugin continue to work as well? Mediacontrol is huge for me (easy sub/audio stream control and ff/rew), so I'm hoping so.
I'll try the Media Control plugin, shouldn't be a problem as I didn't change the ffdshow API. I'll probably fix the WMC issue soon. Been busy lately.
Thanks for your great work!
10x
CruNcher
8th November 2011, 00:14
Relatively high cpu usage, what are the details (resolution, CPU type, output surface format, etc.)
Video: NV12 1920x1080 59.94fps, Intel Core I-5 2400
General
Complete name : G:\WipEout_HD_English_1080p.mp4
Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42
File size : 171 MiB
Duration : 1mn 11s
Overall bit rate : 19.9 Mbps
Encoded date : UTC 2008-08-01 17:57:37
Tagged date : UTC 2008-08-01 17:57:43
Video
ID : 2
Format : AVC
Format/Info : Advanced Video Codec
Format profile : Main@L4.2
Format settings, CABAC : No
Format settings, ReFrames : 2 frames
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 1mn 11s
Bit rate : 19.8 Mbps
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 59.940 fps
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.159
Stream size : 169 MiB (99%)
Language : English
Encoded date : UTC 2008-08-01 17:57:05
Tagged date : UTC 2008-08-01 17:57:43
Color primaries : BT.709-5, BT.1361, IEC 61966-2-4, SMPTE RP177
Transfer characteristics : BT.709-5, BT.1361
Matrix coefficients : BT.709-5, BT.1361, IEC 61966-2-4 709, SMPTE RP177
Audio
ID : 1
Format : AAC
Format/Info : Advanced Audio Codec
Format profile : LC
Codec ID : 40
Duration : 1mn 11s
Bit rate mode : Constant
Bit rate : 144 Kbps
Nominal bit rate : 160 Kbps
Channel(s) : 2 channels
Channel positions : Front: L R
Sampling rate : 48.0 KHz
Compression mode : Lossy
Stream size : 1.25 MiB (1%)
Language : English
Encoded date : UTC 2008-08-01 17:57:04
Tagged date : UTC 2008-08-01 17:57:43
Material_Duration : 71851
Material_StreamSize : 1314528
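The "Bits/(Pixel*Frame)" value in the report above can be reproduced from the other fields: it is the video bit rate divided by the number of pixels delivered per second. A small C++ sketch, with a hypothetical function name and the rounded 19.8 Mbps figure as input:

```cpp
// Average number of compressed bits spent on each pixel of each frame.
constexpr double bitsPerPixelFrame(double bitRateBps, int width, int height,
                                   double fps) {
    return bitRateBps / (static_cast<double>(width) * height * fps);
}

// 19,800,000 / (1920 * 1080 * 59.94) is roughly 0.159,
// matching the MediaInfo line.
```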
Interesting: with CyberLink's decoder I additionally get the Line 21 Decoder 2 loaded. Wasn't line 21 an analog thing?
This gets loaded on EVR Input 1 with Cyberlink DXVA
Filter: Line 21 Decoder 2
Pin: XForm Out
- Connection media type:
Video: AI44 720x480 29.97fps 82861kbps
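The 82861kbps figure on this pin is consistent with plain uncompressed video rather than anything encoded: AI44 uses 8 bits per pixel (4 bits alpha, 4 bits palette index), and 720 x 480 x 8 bits x 29.97 fps comes to 82,861,056 bits per second. A C++ sketch of that arithmetic (the function name is illustrative):

```cpp
// Bit rate of uncompressed video in kilobits per second (1 kbit = 1000 bits).
constexpr double rawKbps(int width, int height, int bitsPerPixel, double fps) {
    return static_cast<double>(width) * height * bitsPerPixel * fps / 1000.0;
}

// rawKbps(720, 480, 8, 29.97) is about 82861.056, i.e. the 82861kbps shown.
```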
Hmm the Filter is in quartz.dll so it's coming from Microsoft i never saw this one on XP
ffdshow-quicksync though doesn't load it
http://msdn.microsoft.com/en-us/library/windows/desktop/dd390642%28v=vs.85%29.aspx
Yep, it's the analog line that was used to transmit different types of data back in the analog days (VideoDat, captions). It seems CyberLink loads it by default with some data, which is strange, or there is some hidden analog data in this Sony stream :P
Filter : CyberLink Video Decoder (PDVD11) - CLSID : {9699092D-91FC-4DA1-8A63-112D865EB1D2}
- Connected to:
CLSID: {E4206432-01A1-4BEE-B3E1-3702C8EDC574}
Filter: Line 21 Decoder 2
Pin: XForm In
- Connection media type:
Unknown
AM_MEDIA_TYPE:
majortype: MEDIATYPE_AUXLine21Data {670AEA80-3A82-11D0-B79B-00AA003767A7}
subtype: MEDIASUBTYPE_Line21_GOPPacket {6E8D4A23-310C-11D0-B79A-00AA003767A7}
formattype: FORMAT_None {0F6417D6-C318-11D0-A43F-00A0C9223196}
bFixedSizeSamples: 1
bTemporalCompression: 1
lSampleSize: 200
cbFormat: 0
- Enumerated media type 0:
Set as the current media type
Seems indeed Cyberlink opens it by default now, i get it loaded for every stream :D (never experienced this before seems to be new behavior)
PS: Egur, I checked the overhead more carefully. Without Quick Sync recording, CPU usage is 10% during playback with ffdshow-quicksync, and it increases to 20% while recording. CyberLink DXVA CPU usage also increases while recording, but not by such a heavy amount: just 2% more, from 1% to 3%. So recording with Quick Sync at the same time currently seems to lower the efficiency of the decoder. That is somewhat expected; recording time-critical stuff (D2D browser demos) also shows a slowdown (low latency recording helps here a little).
Yup, it seems more GPU resources are allocated to the recording, which are then lost to the decoding, and so CPU usage increases :)
egur
8th November 2011, 12:30
PS: Egur i checked more carefully into the overhead and it seems without Quick Sync Recording the CPU usage is 10% @ playback with ffdshow quicksync and it increases it to 20% while Recording
That's more aligned to what I see for 1080p@60.
BTW, when I'll add HW deinterlacing, this is the expected performance (for outputting 1080p@60).
I can live with this level of performance but maybe driver improvements can lower CPU usage - more than half the CPU usage within my decoder DLL (not in FFDSHOW) goes into locking the D3D surface - slower than copying the surface back to system memory...
egur
8th November 2011, 21:19
Current version has a limitation that was exposed by Windows Media Center.
It can't initialize in full screen exclusive mode.
The D3D surfaces are allocated through a D3D9 device created with IDirect3D9::CreateDevice(). It's not used to display anything.
Only in full screen exclusive mode (which WMC seem to use) CreateDevice fails.
Does anyone have a workaround?
nevcairiel
8th November 2011, 21:47
What you could try is to ask the EVR for the device. Part of the DXVA API is an interface to get the device from the renderer.
Luckily for me, CUVID also functions without a D3D device, so i have never had to try (yet).
Edit:
Specifically this: http://msdn.microsoft.com/en-us/library/windows/desktop/ms704727(v=vs.85).aspx
JanWillem32
8th November 2011, 22:00
Indeed, that's the regular DXVA helper. Note that it's not actually an EVR object. It inherits from DXVA2.dll, and calls mfplat.dll.
typedef HRESULT (WINAPI *DXVA2CreateDirect3DDeviceManager9Ptr)(__out UINT *pResetToken, __out IDirect3DDeviceManager9 **ppDXVAManager);
DXVA2CreateDirect3DDeviceManager9Ptr pfDXVA2CreateDirect3DDeviceManager9;
m_hDXVA2Lib = LoadLibrary(L"dxva2.dll");
if (m_hDXVA2Lib) {
    pfDXVA2CreateDirect3DDeviceManager9 = reinterpret_cast<DXVA2CreateDirect3DDeviceManager9Ptr>(GetProcAddress(m_hDXVA2Lib, "DXVA2CreateDirect3DDeviceManager9"));
} else {
    _Error += L"Could not find dxva2.dll\n";
    hr = E_FAIL;
    return;
}
Edit:
If you're using a device passed to DXVA2CreateVideoService (http://msdn.microsoft.com/en-us/library/windows/desktop/ms704721%28v=VS.85%29.aspx), make sure that the HWND pointer used when creating the device isn't linked to a monitor that will be used for exclusive mode.