7th November 2011, 16:14 | #241 | Link |
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
|
Open source project has been set up
SourceForge homepage:
http://sourceforge.net/p/qsdecoder
Currently only useful for source control (SVN).
FFDshow code changes were merged into FFDshow's code trunk. They will be part of the next official FFDshow release (very similar to 0.18 alpha).
Next on my task list (v0.19):
* Create configuration to enable/disable certain features, as asked by several developers for easy integration.
* Fix fullscreen problem in WMC (not loading ffdshow for some reason).
* Export D3D surfaces (DXVA2 samples) instead of system memory buffers. Will provide DXVA speed without actually dealing with DXVA...
If all goes well, version 0.20 will add video postprocessing (deinterlacing, film cadence correction, noise reduction, sharpness, etc.)
__________________
Eric Gur, Processor Application Engineer for Overclocking and CPU technologies Intel QuickSync Decoder author Intel Corp. |
7th November 2011, 17:14 | #242 | Link |
Registered User
Join Date: Jan 2009
Posts: 73
|
Hi Eric,
Very interested by your work here! Will definitely check it out once the WMC fullscreen issue is fixed since I'm a WMC only user. One question I did have is since this is using ffdshow, will using the Mediacontrol plugin continue to work as well? Mediacontrol is huge for me (easy sub/audio stream control and ff/rew), so I'm hoping so. Thanks for your great work! |
7th November 2011, 17:31 | #243 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,346
|
I just looked over your ffdshow changes, and for the record: It's always much nicer to keep changes separated amongst multiple commits. For example, the bug fixes and the addition of the QS decoder should've at least been two commits, or more. Just sayin', it's not my project or anything.
One thing i noticed though. Your sse2 memcpy seems superfluous. If ffdshow is configured to use function intrinsics, the MS compiler will already use an optimized memcpy using sse2 if available. I did some testing along those lines recently, and a custom sse2 memcpy was actually not faster. In addition to that, i don't think ffdshow had a hard dependency on sse2 before.
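For anyone who wants to repeat that comparison, here is a minimal sketch of what a hand-written SSE2 copy typically looks like next to the library memcpy. It is not the routine from the ffdshow patch: `copy_sse2` and `copies_match` are hypothetical names, and the sketch only checks that both copies produce identical bytes. Whether the custom loop is any faster has to be timed on real frame-sized buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Use SSE2 intrinsics only where the compiler says they are available.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

// Hypothetical hand-written copy: 16 bytes per iteration with unaligned
// loads/stores, then the library memcpy for the remaining tail bytes.
inline void copy_sse2(void* dst, const void* src, std::size_t n) {
#if HAVE_SSE2
    std::uint8_t* d = static_cast<std::uint8_t*>(dst);
    const std::uint8_t* s = static_cast<const std::uint8_t*>(src);
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(d + i), v);
    }
    if (i < n) std::memcpy(d + i, s + i, n - i);
#else
    std::memcpy(dst, src, n);  // no SSE2: fall back to the library routine
#endif
}

// Copies the same n-byte pattern (n > 0) with both routines and reports
// whether the results are byte-for-byte identical.
inline bool copies_match(std::size_t n) {
    std::vector<std::uint8_t> src(n), a(n, 0), b(n, 0);
    for (std::size_t i = 0; i < n; ++i)
        src[i] = static_cast<std::uint8_t>(i * 31u + 7u);
    copy_sse2(a.data(), src.data(), n);
    std::memcpy(b.data(), src.data(), n);
    return a == b;
}
```

Wrapping both calls in a timer over a 1920x1080 frame buffer would give the numbers being argued about here; the compile-time guard also shows one way to avoid a hard SSE2 dependency.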
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
7th November 2011, 18:52 | #244 | Link | ||
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
|
Quote:
Quote:
SSE2 implies a Pentium 4 or newer if I remember correctly. Not a crazy dependency. I'll run a few more tests and kill it if there is no performance gain.
Last edited by egur; 7th November 2011 at 19:07.
||
7th November 2011, 22:06 | #245 | Link | |
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
|
Quote:
10x
8th November 2011, 00:14 | #246 | Link | |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,926
|
Quote:
General
Complete name : G:\WipEout_HD_English_1080p.mp4
Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42
File size : 171 MiB
Duration : 1mn 11s
Overall bit rate : 19.9 Mbps
Encoded date : UTC 2008-08-01 17:57:37
Tagged date : UTC 2008-08-01 17:57:43

Video
ID : 2
Format : AVC
Format/Info : Advanced Video Codec
Format profile : Main@L4.2
Format settings, CABAC : No
Format settings, ReFrames : 2 frames
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 1mn 11s
Bit rate : 19.8 Mbps
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 59.940 fps
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.159
Stream size : 169 MiB (99%)
Language : English
Encoded date : UTC 2008-08-01 17:57:05
Tagged date : UTC 2008-08-01 17:57:43
Color primaries : BT.709-5, BT.1361, IEC 61966-2-4, SMPTE RP177
Transfer characteristics : BT.709-5, BT.1361
Matrix coefficients : BT.709-5, BT.1361, IEC 61966-2-4 709, SMPTE RP177

Audio
ID : 1
Format : AAC
Format/Info : Advanced Audio Codec
Format profile : LC
Codec ID : 40
Duration : 1mn 11s
Bit rate mode : Constant
Bit rate : 144 Kbps
Nominal bit rate : 160 Kbps
Channel(s) : 2 channels
Channel positions : Front: L R
Sampling rate : 48.0 KHz
Compression mode : Lossy
Stream size : 1.25 MiB (1%)
Language : English
Encoded date : UTC 2008-08-01 17:57:04
Tagged date : UTC 2008-08-01 17:57:43
Material_Duration : 71851
Material_StreamSize : 1314528

Interesting: with Cyberlink's decoder I additionally get the Line 21 Decoder 2 loaded. Wasn't line 21 an analog thing?
This gets loaded on EVR Input 1 with Cyberlink DXVA:
Filter: Line 21 Decoder 2
Pin: XForm Out
- Connection media type: Video: AI44 720x480 29.97fps 82861kbps
Hmm, the filter is in quartz.dll, so it's coming from Microsoft. I never saw this one on XP. ffdshow-quicksync doesn't load it, though.
http://msdn.microsoft.com/en-us/libr...=vs.85%29.aspx
Yep, it's the analog line that was used to transmit different types of data back in the analog days (videodat, captions). It seems Cyberlink loads it by default with some data. Strange, or there is some hidden analog data in this Sony stream :P.
Filter: CyberLink Video Decoder (PDVD11)
- CLSID: {9699092D-91FC-4DA1-8A63-112D865EB1D2}
- Connected to: CLSID: {E4206432-01A1-4BEE-B3E1-3702C8EDC574} Filter: Line 21 Decoder 2 Pin: XForm In
- Connection media type: Unknown AM_MEDIA_TYPE:
  majortype: MEDIATYPE_AUXLine21Data {670AEA80-3A82-11D0-B79B-00AA003767A7}
  subtype: MEDIASUBTYPE_Line21_GOPPacket {6E8D4A23-310C-11D0-B79A-00AA003767A7}
  formattype: FORMAT_None {0F6417D6-C318-11D0-A43F-00A0C9223196}
  bFixedSizeSamples: 1
  bTemporalCompression: 1
  lSampleSize: 200
  cbFormat: 0
- Enumerated media type 0: Set as the current media type
It seems Cyberlink indeed opens it by default now. I get it loaded for every stream (never experienced this before, so it seems to be new behavior).
PS: Egur, I checked the overhead more carefully. Without Quick Sync recording, CPU usage is 10% during playback with ffdshow quicksync, and it increases to 20% while recording. Cyberlink DXVA CPU usage also increases while recording, but not by such a heavy amount: just 2% more, from 1% to 3%. So recording with Quick Sync at the same time currently seems to lower the efficiency of the decoder. That is somewhat expected, since recording time-critical stuff (D2D browser demos) also shows a slowdown (low latency recording helps here a little). It seems more GPU resources are allocated to the recording and are lost to the decoding, so CPU usage increases.
__________________
all my compares are riddles so please try to decipher them yourselves :) It is about Time Join the Revolution NOW before it is to Late ! http://forum.doom9.org/showthread.php?t=168004 Last edited by CruNcher; 8th November 2011 at 10:39. |
|
8th November 2011, 12:30 | #247 | Link | |
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
|
Quote:
BTW, when I add HW deinterlacing, this is the expected performance (for outputting 1080p@60). I can live with this level of performance, but maybe driver improvements can lower CPU usage. More than half the CPU usage within my decoder DLL (not in FFDSHOW) goes into locking the D3D surface, which is slower than copying the surface back to system memory...
8th November 2011, 21:19 | #248 | Link |
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
|
Help needed
Current version has a limitation that was exposed by Windows Media Center.
It can't initialize in full screen exclusive mode. The D3D surfaces are allocated through a D3D9 device created with IDirect3D9::CreateDevice(). The device is not used to display anything. CreateDevice fails only in full screen exclusive mode (which WMC seems to use). Does anyone have a workaround?
8th November 2011, 21:47 | #249 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,346
|
What you could try is asking the EVR for the device. Part of the DXVA APIs is an interface to get the device from the renderer.
Luckily for me, CUVID also functions without a D3D device, so i have never had to try (yet). Edit: Specifically this: http://msdn.microsoft.com/en-us/libr...(v=vs.85).aspx
Last edited by nevcairiel; 8th November 2011 at 21:49.
8th November 2011, 22:00 | #250 | Link |
Registered User
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,083
|
Indeed, that's the regular DXVA helper. Note that it's not actually an EVR object. It inherits from DXVA2.dll, and calls mfplat.dll.
Code:
typedef HRESULT (WINAPI *DXVA2CreateDirect3DDeviceManager9Ptr)(__out UINT *pResetToken, __out IDirect3DDeviceManager9 **ppDXVAManager);
DXVA2CreateDirect3DDeviceManager9Ptr pfDXVA2CreateDirect3DDeviceManager9;

m_hDXVA2Lib = LoadLibrary(L"dxva2.dll");
if (m_hDXVA2Lib) {
    pfDXVA2CreateDirect3DDeviceManager9 = reinterpret_cast<DXVA2CreateDirect3DDeviceManager9Ptr>(GetProcAddress(m_hDXVA2Lib, "DXVA2CreateDirect3DDeviceManager9"));
} else {
    _Error += L"Could not find dxva2.dll\n";
    hr = E_FAIL;
    return;
}

If you're using a device passed to DXVA2CreateVideoService (http://msdn.microsoft.com/en-us/libr...=VS.85%29.aspx), make sure that the HWND pointer used when creating the device isn't linked to a monitor that will be used for exclusive mode.
__________________
development folder, containing MPC-HC experimental tester builds, pixel shaders and more: http://www.mediafire.com/?xwsoo403c53hv Last edited by JanWillem32; 8th November 2011 at 22:27. |
8th November 2011, 23:00 | #251 | Link |
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
|
Problem is that my decoder is not connected to the renderer, it's unaware of the graph. I can redesign.
Currently I create my own device, but it fails in fullscreen (only when the decoder is instantiated during fullscreen).
9th November 2011, 07:28 | #252 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,346
|
It's the only way. In FSE mode you cannot create a new device, you have to use the one the EVR gives you.
9th November 2011, 09:46 | #253 | Link | |
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
|
Quote:
Can you point me to the relevant reading material? I didn't see this anywhere.
Update: I was referred to this article: http://msdn.microsoft.com/en-us/libr...(v=vs.85).aspx
Basically this means I can create a device when these conditions are met:
* They are created by the same Direct3D object that created the device that is full-screen.
* They have the same focus window as the device that is full-screen.
* They represent a different adapter from any full-screen device.
So nevcairiel is right. I need to postpone decoder initialization until after the graph is connected, which is a little ugly for current use cases and requires shotgun surgery in ffdshow's code. Future use cases (output DXVA samples) will enjoy this design change, as I'll be able to know how many surfaces are queued in the renderer.
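Read as a checklist, and assuming all three conditions have to hold together (that is one reading of the article, not something confirmed in this thread), the rule can be modelled in a few lines. `DeviceRequest` and `can_create_alongside` are hypothetical names for illustration only; nothing here calls Direct3D.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the inputs that matter to the rule. None of these
// types exist in the Direct3D headers.
struct DeviceRequest {
    std::uintptr_t d3d_object;    // identity of the IDirect3D9 object used for creation
    std::uintptr_t focus_window;  // identity of the focus window passed at creation
    unsigned adapter;             // adapter ordinal the device lives on
};

// Returns true when `wanted` satisfies all three quoted conditions relative to
// the device that currently owns full-screen exclusive mode.
inline bool can_create_alongside(const DeviceRequest& fullscreen,
                                 const DeviceRequest& wanted) {
    return wanted.d3d_object == fullscreen.d3d_object      // same Direct3D object
        && wanted.focus_window == fullscreen.focus_window  // same focus window
        && wanted.adapter != fullscreen.adapter;           // different adapter
}
```

Under this model, a decoder that creates its own Direct3D object fails the first condition, and on a single-GPU machine the third condition can never be met, which matches the failure seen in WMC.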
Last edited by egur; 10th November 2011 at 10:27.
10th November 2011, 10:33 | #255 | Link | |
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
|
Quote:
* When there's just 1 GPU and one screen.
* Even in multi GPU setups I don't know how to do this.
My home setup has 2 GPUs, Intel + AMD. The screen is connected to the AMD and I still can't create the device on the Intel GPU. Maybe I'm sending an HWND that's associated with the AMD-connected monitor. But I don't know how to create a generic HWND on the other monitor that will actually result in a functioning d3d device...
10th November 2011, 13:06 | #256 | Link |
Registered User
Join Date: Dec 2005
Posts: 560
|
EVR sets things up like this: it creates a window 1x1 pixels in size, then

D3DPRESENT_PARAMETERS pp;
ZeroMemory(&pp, sizeof(pp));
pp.BackBufferWidth = 1;
pp.BackBufferHeight = 1;
pp.Windowed = TRUE;
pp.SwapEffect = D3DSWAPEFFECT_COPY;
pp.BackBufferFormat = D3DFMT_UNKNOWN;
pp.hDeviceWindow = hwnd;
pp.Flags = D3DPRESENTFLAG_VIDEO;
pp.PresentationInterval = D3DPRESENT_INTERVAL_DEFAULT;

It doesn't use that for rendering. It creates additional swap chains, which it renders into and presents when they are done. Don't know if that helps. |
10th November 2011, 18:52 | #257 | Link |
Registered User
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,083
|
That points out the hwnd handle nicely. Creating the window handle with WS_MINIMIZE|WS_POPUP and possibly WS_DISABLED should work. http://msdn.microsoft.com/en-us/libr...=VS.85%29.aspx
If creating it minimized is a problem, using CloseWindow should also do the trick of minimizing it. http://msdn.microsoft.com/en-us/libr...=VS.85%29.aspx
Also, structs can be assigned on creation. Using ZeroMemory is more something for class members (and also only if you can't zero them on class initialization).

D3DPRESENT_PARAMETERS pp = {1, 1, D3DFMT_X8R8G8B8, 1, D3DMULTISAMPLE_NONE, 0, D3DSWAPEFFECT_DISCARD, hWnd, TRUE, FALSE, D3DFMT_UNKNOWN, 0, 0, D3DPRESENT_INTERVAL_IMMEDIATE};
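To make the two initialization styles seen so far easier to compare, here is a small sketch that builds the same values both ways and checks that the results agree. `PresentParams` is a stand-in that only mirrors the member order of D3DPRESENT_PARAMETERS; the types are simplified and the constant values are made up so that it compiles without the DirectX SDK. All names in it are assumptions for illustration.

```cpp
#include <cassert>
#include <cstring>

// Stand-in constants with made-up values (not the real D3D ones).
enum Format { FMT_UNKNOWN = 0, FMT_X8R8G8B8 = 1 };
enum SwapKind { SWAP_DISCARD = 1, SWAP_COPY = 2 };
const unsigned INTERVAL_IMMEDIATE = 1u;

// Stand-in struct: same member order as D3DPRESENT_PARAMETERS, plain types.
struct PresentParams {
    unsigned BackBufferWidth;
    unsigned BackBufferHeight;
    Format   BackBufferFormat;
    unsigned BackBufferCount;
    int      MultiSampleType;
    unsigned MultiSampleQuality;
    SwapKind SwapEffect;
    void*    hDeviceWindow;
    int      Windowed;
    int      EnableAutoDepthStencil;
    Format   AutoDepthStencilFormat;
    unsigned Flags;
    unsigned FullScreen_RefreshRateInHz;
    unsigned PresentationInterval;
};

// Style 1: zero everything, then assign each interesting member by name.
inline PresentParams make_by_assignment(void* hwnd) {
    PresentParams pp;
    std::memset(&pp, 0, sizeof(pp));  // portable equivalent of ZeroMemory
    pp.BackBufferWidth = 1;
    pp.BackBufferHeight = 1;
    pp.BackBufferFormat = FMT_X8R8G8B8;
    pp.BackBufferCount = 1;
    pp.SwapEffect = SWAP_DISCARD;
    pp.hDeviceWindow = hwnd;
    pp.Windowed = 1;
    pp.PresentationInterval = INTERVAL_IMMEDIATE;
    return pp;
}

// Style 2: positional initializer; every value must appear in member order.
inline PresentParams make_by_initializer(void* hwnd) {
    PresentParams pp = {1, 1, FMT_X8R8G8B8, 1, 0, 0, SWAP_DISCARD, hwnd,
                        1, 0, FMT_UNKNOWN, 0, 0, INTERVAL_IMMEDIATE};
    return pp;
}

// Member-wise comparison (avoids memcmp, which would also compare padding).
inline bool same(const PresentParams& a, const PresentParams& b) {
    return a.BackBufferWidth == b.BackBufferWidth
        && a.BackBufferHeight == b.BackBufferHeight
        && a.BackBufferFormat == b.BackBufferFormat
        && a.BackBufferCount == b.BackBufferCount
        && a.MultiSampleType == b.MultiSampleType
        && a.MultiSampleQuality == b.MultiSampleQuality
        && a.SwapEffect == b.SwapEffect
        && a.hDeviceWindow == b.hDeviceWindow
        && a.Windowed == b.Windowed
        && a.EnableAutoDepthStencil == b.EnableAutoDepthStencil
        && a.AutoDepthStencilFormat == b.AutoDepthStencilFormat
        && a.Flags == b.Flags
        && a.FullScreen_RefreshRateInHz == b.FullScreen_RefreshRateInHz
        && a.PresentationInterval == b.PresentationInterval;
}
```

Both functions yield the same struct. The trade-off is that the positional form can be declared const, while swapping two neighbouring values of the same type (for example Windowed and EnableAutoDepthStencil) still compiles without complaint.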
10th November 2011, 19:05 | #258 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,346
|
For readability, everyone should always favor the syntax as dukey posted it. For complex structs like that, using the inline initializers is just asking for trouble.
10th November 2011, 20:22 | #259 | Link |
Registered User
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,083
|
That would most certainly extend the size of the renderers I'm working on considerably (the color management section declares dozens of various structs). Also, I often declare structs like this as static const. Using ZeroMemory might inhibit making elements "rommable" (a syntax of: D3DPRESENT_PARAMETERS pp = {0}; is illegal in this case). I usually just add comments to mark the interesting bits (most elements of this type of struct are 0 or 1). In this case only the hWnd parameter is worth noting, the rest just describes parameters for a 1×1 pixel backbuffer without extras.
10th November 2011, 20:23 | #260 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,346
|
If the code is already too long, using these things to "shorten" it is the worst idea ever. It'll just make already long code even harder to read/understand.
Tags |
ffdshow, h264, intel, mpeg2, quicksync, vc1, zoom player |