6th August 2010, 12:55 | #1 | Link |
Registered User
Join Date: Jan 2005
Posts: 99
|
DirectShow Media Types for Stereoscopic Content
I'm opening this thread to discuss media types and media formats to be used by DirectShow filters for stereoscopic formats.
Last edited by pwimmer; 7th August 2010 at 00:41. |
6th August 2010, 12:57 | #2 | Link |
Registered User
Join Date: Jan 2005
Posts: 99
|
There is a need to specify media types for two categories of stereo content. This posting will be updated periodically to reflect the outcome of the discussions in this thread.
1) Compressed H.264 stereoscopic/multiview content.
The following FOURCCs have been specified and are supported by the MPC HC MP4 and MPEG-TS splitters after applying the 3D patches. Upcoming versions of the Haali splitter and CoreAVC will support these media types, too.

AVC1: Should be used for 2D content. If a decoder input pin uses the AVC1 FOURCC, the decoder should also offer a second input pin that supports the EMVC FOURCC only.

AMVC: Should be used for MVC streams where both the base view NALs and the MVC extension NALs (coded slice extension NAL units and prefix NAL units) are delivered on the same DirectShow filter pin. If the AMVC FOURCC is used, there must be a subset SPS and a second PPS in the format block. A DirectShow splitter filter pin offering the AMVC FOURCC can also offer the AVC1 FOURCC as a second media type so that it also connects to legacy 2D decoders. However, when the decoder input pin is connected using the AMVC FOURCC, the decoder must not offer a second pin for EMVC FOURCC media types.

EMVC: Should be used if the pin delivers MVC extension NALs only. The format block must contain a subset SPS and the corresponding PPS. NAL units of the base view must be delivered on another pin using the AVC1 FOURCC. Timestamps of the base view and MVC extension samples must be derived from the same clock, otherwise the decoder cannot synchronize them.

2) Uncompressed stereoscopic/multiview content.
The open media format is an extension to Microsoft's DirectShow technology to enable interoperability between stereo-capable DirectShow filters of different vendors. The draft of the open media format specification is available here:
DirectShow Open Media Format SourceForge Project
OpenMediaFormat.zip on 3dtv.at server
Last edited by pwimmer; 1st August 2011 at 22:49. |
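As a rough illustration of the pin rules in 1), the sketch below maps what a splitter pin carries to the FOURCCs it would offer. Only the three FOURCCs come from the posting; `PinContent`, `OfferedFourCCs` and `DecoderOffersEmvcPin` are hypothetical names made up for this example and do not exist in any of the filters mentioned.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Pack four characters into a FOURCC the way the Windows MAKEFOURCC macro does
// (first character ends up in the least significant byte).
constexpr std::uint32_t MakeFourCC(char a, char b, char c, char d) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

constexpr std::uint32_t FOURCC_AVC1 = MakeFourCC('A', 'V', 'C', '1');
constexpr std::uint32_t FOURCC_AMVC = MakeFourCC('A', 'M', 'V', 'C');
constexpr std::uint32_t FOURCC_EMVC = MakeFourCC('E', 'M', 'V', 'C');

// Hypothetical description of what a splitter output pin delivers.
enum class PinContent {
    BaseViewOnly,      // 2D stream, or the base view of an MVC stream split over two pins
    BaseAndExtension,  // base view NALs and MVC extension NALs on the same pin
    ExtensionOnly      // MVC extension NALs only
};

// Media types the pin offers, in order of preference.
std::vector<std::uint32_t> OfferedFourCCs(PinContent content) {
    switch (content) {
    case PinContent::BaseViewOnly:
        return {FOURCC_AVC1};
    case PinContent::BaseAndExtension:
        // AVC1 is listed second so that legacy 2D decoders can still connect.
        return {FOURCC_AMVC, FOURCC_AVC1};
    case PinContent::ExtensionOnly:
        return {FOURCC_EMVC};
    }
    assert(false && "unhandled PinContent");
    return {};
}

// Decoder side: the extra EMVC-only input pin is offered only when the
// first input pin was connected with AVC1, never when it uses AMVC.
bool DecoderOffersEmvcPin(std::uint32_t connectedFourCC) {
    return connectedFourCC == FOURCC_AVC1;
}
```

A splitter that keeps both views on one pin would call `OfferedFourCCs(PinContent::BaseAndExtension)`; a decoder connected via AMVC then reports `DecoderOffersEmvcPin(FOURCC_AMVC) == false`.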
6th August 2010, 13:00 | #4 | Link |
Registered User
Join Date: Jan 2005
Posts: 99
|
Ad 2)
The Stereoscopic Player that I'm working on already uses a special stereo media type for uncompressed stereo samples sent from the 3dtv.at Stereo Transformation filter to the 3dtv.at Stereo Renderer filter and from the 3dtv.at Stereo Image Source to the 3dtv.at Stereo Transformation filter. Edit: Removed the code to avoid confusion with the new Open Media Format definition. If you are interested in the media format currently used by the Stereoscopic Player, please contact me offlist. Last edited by pwimmer; 6th August 2010 at 23:55. Reason: Removed the code to avoid confusion with the new Open Media Format definition |
6th August 2010, 13:08 | #5 | Link |
Registered User
Join Date: Jan 2005
Posts: 99
|
I propose a different IMediaSampleEx interface that derives from IMediaSample and adds just two methods:
    HRESULT GetPointerEx(BYTE **ppBuffer, int iView);
    long GetSizeEx(int iView);

This allows using separate buffers for each view, which I would prefer over a single buffer for all views. The format type adds a dwNumViews field to the VIDEOINFOHEADER2:

    DEFINE_GUID(STEREOLAYOUT_MONOSCOPIC, ...);   // Monoscopic
    DEFINE_GUID(STEREOLAYOUT_MULTIBUFFERS, ...); // Use IMediaSampleEx to get buffers != 0
    DEFINE_GUID(STEREOLAYOUT_SIDEBYSIDELEFTFIRST, ...);
    DEFINE_GUID(STEREOLAYOUT_SIDEBYSIDERIGHTFIRST, ...);
    DEFINE_GUID(STEREOLAYOUT_OVERUNDERLEFTTOP, ...);
    DEFINE_GUID(STEREOLAYOUT_OVERUNDERRIGHTTOP, ...);
    DEFINE_GUID(STEREOLAYOUT_INTERLACEDLEFTFIRST, ...);
    DEFINE_GUID(STEREOLAYOUT_INTERLACEDRIGHTFIRST, ...);
    DEFINE_GUID(STEREOLAYOUT_FRAMESQUENTIALLEFTFIRST, ...);
    DEFINE_GUID(STEREOLAYOUT_FRAMESQUENTIALRIGHTFIRST, ...);

    typedef struct tagSTEREOVIDEOINFOHEADER {
        RECT rcSource;
        RECT rcTarget;
        DWORD dwBitRate;
        DWORD dwBitErrorRate;
        REFERENCE_TIME AvgTimePerFrame;
        DWORD dwInterlaceFlags;
        DWORD dwCopyProtectFlags;
        DWORD dwPictAspectRatioX;
        DWORD dwPictAspectRatioY;
        DWORD dwControlFlags;
        DWORD dwReserved2;
        // Stereo related fields
        GUID guidStereoLayout; // See supported STEREOLAYOUT_xxx GUIDs above
        DWORD dwNumViews;      // Number of views (only valid for STEREOLAYOUT_MULTIBUFFERS)
        BITMAPINFOHEADER bmiHeader;
    } STEREOVIDEOINFOHEADER;

Last edited by pwimmer; 6th August 2010 at 23:56. Reason: Added GetSizeEx |
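To make the per-view buffer idea more concrete, here is a minimal, platform-independent sketch of a sample object that exposes `GetPointerEx` and `GetSizeEx` with the signatures proposed above. It is not a COM implementation and does not derive from the real `IMediaSample`; `MultiViewSample`, `SwapViews`, `kOk` and `kInvalidArg` are hypothetical stand-ins so that the example compiles without the Windows SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-ins for the Windows types and result codes (assumptions for this sketch).
using BYTE = std::uint8_t;
using HRESULT = std::int32_t;
const HRESULT kOk = 0;
const HRESULT kInvalidArg = -1;

// One separately allocated buffer per view, as in STEREOLAYOUT_MULTIBUFFERS.
class MultiViewSample {
public:
    MultiViewSample(std::size_t numViews, std::size_t bytesPerView)
        : views_(numViews, std::vector<BYTE>(bytesPerView)) {
        assert(numViews > 0);
    }

    // Returns the buffer of the requested view; view 0 corresponds to what
    // the plain IMediaSample::GetPointer would return.
    HRESULT GetPointerEx(BYTE** ppBuffer, int iView) {
        if (ppBuffer == nullptr || !IsValidView(iView)) return kInvalidArg;
        *ppBuffer = views_[static_cast<std::size_t>(iView)].data();
        return kOk;
    }

    // Returns the buffer size of the requested view, or 0 for an invalid index.
    long GetSizeEx(int iView) const {
        if (!IsValidView(iView)) return 0;
        return static_cast<long>(views_[static_cast<std::size_t>(iView)].size());
    }

    // Changing the view order only exchanges buffers; no pixel data is copied.
    void SwapViews(int a, int b) {
        assert(IsValidView(a) && IsValidView(b));
        std::swap(views_[static_cast<std::size_t>(a)], views_[static_cast<std::size_t>(b)]);
    }

private:
    bool IsValidView(int iView) const {
        return iView >= 0 && static_cast<std::size_t>(iView) < views_.size();
    }

    std::vector<std::vector<BYTE>> views_;
};
```

With separate buffers, a filter that needs to exchange left and right only calls `SwapViews(0, 1)`, whereas a packed side-by-side layout would require copying pixel data.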
6th August 2010, 13:30 | #6 | Link |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
A few comments:
(1) Let's prepare for *Multi* views and not just stereo. After all it's h264 M(ulti)VC and not h264 S(tereo)VC.
(2) HDMI 1.4a not only knows left/right, but it also supports funny things like "L + Depth". Not sure if the new media type should support such things, too?
(3) I'd like to use the opportunity to add more information to the header structure. Which means that the purpose of the new media type might not be limited to 3D. I'd also like to see the new media type being used for 2D. So maybe we should not even name it 3D or MVC. Maybe we should name it completely differently, e.g. "FORMAT_VideoInfo3". However, that name might collide with a future Microsoft format, so maybe we should use "FORMAT_Doom9VideoInfo" or "FORMAT_OpenVideoInfo" or whatever. I don't think there should be "Stereo" in the name. The new format should be stereo capable, but that would only be one benefit of the new media type. I want to have more benefits. And if the new format is used for 2D, the standard IMediaSample interface should suffice.
(4) I strongly suggest reusing the same FOURCC values that also exist for conventional 2D frames, instead of defining new ones. According to Microsoft, FOURCC definitions already exist for YCbCr 4:2:0, 4:2:2 and 4:4:4, with 8bit, 10bit and 16bit each. See here: http://msdn.microsoft.com/en-us/libr...px#_420formats
(5) I don't really like the idea of locking the buffer for different "components". After all, depending on the FOURCC, the buffers may not be planar, and thus locking separate component buffers doesn't always make sense. I think there should be a way to get the pointer/size for every view, but it should be one simple pointer, similar to how the standard IMediaSample interface works. How the data is stored in the buffer depends on the FOURCC.
(6) Let's try to name the methods of the extended IMediaSample interface similar to the original functions. E.g. instead of "LockBuffer" I'd suggest "GetPointerEx/GetSizeEx".
(7) I'd suggest starting the structure with VIDEOINFOHEADER2, and just appending more data to it. This way it would be extremely easy for existing software to add support for the new media type. They'd just need to interpret the structure as VIDEOINFOHEADER2. They'd only need to do extra coding if they want to actually support the new fields of the structure.
(8) Additional fields I want to have in the header, all taken from the h264 specification:
- video_format
- video_full_range_flag
- colour_primaries
- transfer_characteristics
- matrix_coefficients
- chroma_format_idc
- chroma_sample_loc_type_top_field
- chroma_sample_loc_type_bottom_field
- bit_depth_luma
- bit_depth_chroma
Maybe more. The fields above might have to be adjusted to be more flexible. We don't want to limit ourselves to what the h264 spec supports. Other video formats may support more variations than h264. |
6th August 2010, 13:45 | #7 | Link | |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,348
|
Quote:
Some formats require codec/format specific data that gets appended after the BITMAPINFOHEADER in VIDEOINFOHEADER2. If you want to append data to the end of VIDEOINFOHEADER2, you would occupy this space. I don't see a clear way to preserve compatibility like this without crippling some functionality of the format, tbh. Another question that I just asked myself: This new format is mainly designed for data from the decoder to the renderer, not from the source/splitter to the decoder, right? Would decoders benefit from any additional information we could provide on the splitter side of things? I'm currently working on a DS source/splitter and just wondering if anything new is required on this side. Last edited by nevcairiel; 6th August 2010 at 13:54. |
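The conflict described above can be sketched as follows. The example assumes the common convention that a legacy parser treats every byte of the format block beyond the fixed VIDEOINFOHEADER2 size as codec-specific data; the `...Lite` structures are heavily shortened stand-ins for the real ones, and `AppendedFieldsLite` is a hypothetical set of new fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Shortened stand-ins; the real structures have many more members.
struct BitmapInfoHeaderLite {
    std::uint32_t biSize;
    std::int32_t  biWidth;
    std::int32_t  biHeight;
};

struct VideoInfoHeader2Lite {
    std::uint32_t dwPictAspectRatioX;
    std::uint32_t dwPictAspectRatioY;
    BitmapInfoHeaderLite bmiHeader;  // last member, as in VIDEOINFOHEADER2
};

// Hypothetical new fields that a naive extension would append after the header.
struct AppendedFieldsLite {
    std::uint32_t numViews;
    std::uint32_t flags;
};

// What a legacy parser assumes: everything past the fixed header is extradata.
std::size_t LegacyExtraDataSize(std::size_t cbFormat) {
    return cbFormat > sizeof(VideoInfoHeader2Lite)
               ? cbFormat - sizeof(VideoInfoHeader2Lite)
               : 0;
}

// Start of the bytes that the legacy parser hands to the codec as extradata.
const std::uint8_t* LegacyExtraDataPtr(const std::uint8_t* pbFormat, std::size_t cbFormat) {
    assert(pbFormat != nullptr);
    return LegacyExtraDataSize(cbFormat) > 0
               ? pbFormat + sizeof(VideoInfoHeader2Lite)
               : nullptr;
}
```

For a format block of `sizeof(VideoInfoHeader2Lite) + sizeof(AppendedFieldsLite)` bytes, `LegacyExtraDataSize` reports the appended fields as extradata, which is exactly the space collision the post warns about.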
|
6th August 2010, 13:57 | #8 | Link | |
Registered User
Join Date: Jun 2009
Location: London, United Kingdom
Posts: 707
|
Quote:
You should also include Side-by-side MPEG-2 and H.264 broadcasts. There are also 720p120 H.264 3D broadcasts. There's a special SEI in H.264 for indicating side-by-side 3D but not many people use it yet. Presumably there's something similar in MPEG-2. |
|
6th August 2010, 14:06 | #9 | Link | |||||||
Registered User
Join Date: Jan 2005
Posts: 99
|
Quote:
Quote:
I suggest defining a separate media type for 2D+Depth and not covering it in our new media type. Quote:
FORMAT_OpenVideoInfo is my personal preference. Quote:
Quote:
Quote:
Quote:
VIDEOINFOHEADER2 already supports some of these values. Microsoft's doc says: If the AMCONTROL_COLORINFO_PRESENT flag is set in the dwControlFlags member, you can cast the dwControlFlags value to a DXVA_ExtendedFormat structure to access the extended color information. http://msdn.microsoft.com/en-us/libr...67(VS.85).aspx |
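A portable sketch of that decoding step is shown below. Instead of casting to the real DXVA_ExtendedFormat bit-field structure, it extracts the fields with shifts and masks. The flag value 0x80 and the bit positions are written from memory of the SDK headers and should be verified against dxva.h before relying on them; `ExtendedColorInfo` and `DecodeControlFlags` are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value of AMCONTROL_COLORINFO_PRESENT (verify against the SDK headers).
const std::uint32_t kColorInfoPresent = 0x00000080u;

// Fields of the extended color information. The low 8 bits of dwControlFlags
// carry the AMCONTROL_* flags, so only the upper 24 bits are decoded here.
struct ExtendedColorInfo {
    unsigned chromaSubsampling;  // assumed bits 8..11
    unsigned nominalRange;       // assumed bits 12..14
    unsigned transferMatrix;     // assumed bits 15..17
    unsigned lighting;           // assumed bits 18..21
    unsigned primaries;          // assumed bits 22..26
    unsigned transferFunction;   // assumed bits 27..31
};

// Extract `width` bits starting at bit `shift`.
static unsigned Bits(std::uint32_t value, unsigned shift, unsigned width) {
    assert(width > 0 && width < 32 && shift + width <= 32);
    return (value >> shift) & ((1u << width) - 1u);
}

// Returns false if the flag is not set, in which case `out` is left untouched.
bool DecodeControlFlags(std::uint32_t dwControlFlags, ExtendedColorInfo* out) {
    assert(out != nullptr);
    if ((dwControlFlags & kColorInfoPresent) == 0) return false;
    out->chromaSubsampling = Bits(dwControlFlags, 8, 4);
    out->nominalRange      = Bits(dwControlFlags, 12, 3);
    out->transferMatrix    = Bits(dwControlFlags, 15, 3);
    out->lighting          = Bits(dwControlFlags, 18, 4);
    out->primaries         = Bits(dwControlFlags, 22, 5);
    out->transferFunction  = Bits(dwControlFlags, 27, 5);
    return true;
}
```

A filter would call `DecodeControlFlags(vih2->dwControlFlags, &info)` and fall back to its own defaults when the function returns false.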
|||||||
6th August 2010, 15:02 | #10 | Link |
CoreCodec Founder
Join Date: Oct 2001
Location: San Francisco
Posts: 1,421
|
I'm bringing in squid_80 and Haali to follow/discuss.
__________________
Dan "BetaBoy" Marlin Ubiquitous Multimedia Technologies and Developer Tools http://corecodec.com |
6th August 2010, 17:12 | #11 | Link | ||||||
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
Quote:
Quote:
Quote:
Quote:
No, you've inserted your new fields *before* the bmiHeader field. As a result your structure definition is not compatible with VIDEOINFOHEADER2. Quote:
That's great - thanks! Here comes my suggestion: Code:
    // Helper structures are declared first so that they are known
    // when OPENVIDEOINFOHEADER uses them.
    typedef struct tagMULTIVIEWHEADER {
        DWORD ViewType;  // enum to be defined
        DWORD NumViews;
    } MULTIVIEWHEADER;

    typedef struct tagVIDEOPROPS {
        BYTE ColourPrimaries;          // enum to be defined
        BYTE TransferCharacteristics;  // enum to be defined
        BYTE MatrixCoefficients;       // enum to be defined
        BYTE ChromaFormat;             // enum to be defined
        BYTE ChromaLocTopField;        // enum to be defined
        BYTE ChromaLocBottomField;     // enum to be defined
        BYTE LumaBitdepth;
        BYTE ChromaBitdepth;
    } VIDEOPROPS;

    typedef struct tagOPENVIDEOINFOHEADER {
        VIDEOINFOHEADER2 VideoInfoHeader2;
        MULTIVIEWHEADER  MultiView;
        VIDEOPROPS       VideoProps;
        DWORD            Flags[8];  // Flags[0] & 0x1 = Full Range
    } OPENVIDEOINFOHEADER;
|
||||||
7th August 2010, 00:09 | #12 | Link | |
Registered User
Join Date: Jan 2005
Posts: 99
|
I've written a C++ header file that includes all the suggestions made so far. The link is in the second message of this thread.
It was possible to use a union including the VIDEOPROPS structure and dwControlFlags (which can be cast to DXVA_ExtendedFormat). This means I didn't break compatibility with VIDEOINFOHEADER2. New fields are only required for the stereo stuff. Quote:
|
|
7th August 2010, 09:02 | #13 | Link | |||||||||||
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
Quote:
Quote:
Quote:
BTW, your structure is still incompatible with VIDEOINFOHEADER2, because the bmiHeader field is in the wrong place, as I said before. Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
|
|||||||||||
8th August 2010, 00:33 | #14 | Link |
Registered User
Join Date: Jan 2005
Posts: 99
|
I've addressed the issues you noticed.
* The license terms are less strict.
* A Doom9 thread cannot be a copyright holder, only individuals. Of course I'll add other contributors as well. Let me know name & email address.
* VIDEOPROPS will always be optional because in many cases the information is unknown. In this case I prefer the value "unknown" instead of forcing a filter to write some mess in these fields. If it is "unknown", it gives the application the chance to ask the user or apply application-specific defaults.
* You should define new FOURCCs for currently unsupported bit depths. Using existing FOURCCs with wrong bit depths will introduce more problems than it solves because developers will not expect other bit depths than those defined by Microsoft. Is there really content with unusual bit depths? The H.264 spec is very flexible, but that doesn't mean that all possible combinations are used in practice. Let's see the opinion of other people on this issue...
* Using dwPictAspectRatioX and dwPictAspectRatioY is a very convenient way to make squeezed stereo files display properly in 2D media players. Although the output is not stereo but still two views, at least the image is not distorted. Imho it is a clean solution, it is pretty much the same situation as with anamorphic 2D content. dwPictAspectRatioX and dwPictAspectRatioY are designed for the purpose of making the video display in the proper aspect ratio, so why not use them for stereo content as well? Anyway, I added an alternative that allows leaving dwPictAspectRatioX and dwPictAspectRatioY untouched: StereoFlags_HalfHorizontalResolution and StereoFlags_HalfVerticalResolution. But I would allow both methods.
* I modified the text to mention "multiview" in addition to "stereoscopic". As multiview content is typically used for autostereoscopic displays, I do not consider the term "stereoscopic" wrong or inappropriate. The specification currently doesn't cover any other use of multiview content than for autostereoscopic displays. If somebody is familiar with other usage scenarios of multiview content (e.g. multi viewpoint video) and has suggestions for what could be added to better support such scenarios, then let me know.
* I completely gave up the union stuff in favor of a flat structure.
* The concept of a passthrough matrix is rather confusing and I doubt any filter developer would support it. I added a much simpler feature that allows finding out if a media type is the native one: The flags AdvancedFlags_NativeFormatUnknown, AdvancedFlags_NativeFormatTrue and AdvancedFlags_NativeFormatFalse.
* I have different "STEREOLAYOUT_SIDEBYSIDELEFTFIRST" and "SIDEBYSIDERIGHTFIRST" GUIDs because I want the GUIDs to be as precise as possible. "Views must be ordered from left to right" only applies to the STEREOLAYOUT_MULTIBUFFER GUID. A filter can easily change the order by swapping pointers while it would mean memcpys for other layouts. Thus I only created a single GUID for the multi buffer layout but not for the other layouts. I updated the comments to clarify the issue.
* The latest version contains many new stereo layouts and descriptions for all layouts. Most notably, I added STEREOLAYOUT_UNKNOWN that should be used instead of STEREOLAYOUT_MONOSCOPIC if the actual content is unknown.
* The latest version also includes many new fields in the new STEREOLAYOUTPROPS structure, e.g. orientation (rotation and flipping), cropping, parallax adjustment. All this stuff is used and required in practice, e.g. for content recorded with mirror rigs, beam splitters or other optical stereo attachments.
Last edited by pwimmer; 8th August 2010 at 04:07. |
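The aspect ratio approach from the list above can be illustrated with a little arithmetic. The sketch assumes that both views share the same per-view display aspect ratio and that the frame packs exactly two views; `FrameLayout`, `AspectRatio` and `FrameAspectRatio` are hypothetical helper names, and the result is what a filter would write into dwPictAspectRatioX and dwPictAspectRatioY.

```cpp
#include <cassert>
#include <cstdint>

struct AspectRatio {
    std::uint32_t x;
    std::uint32_t y;
};

// Hypothetical, simplified set of packed layouts.
enum class FrameLayout { Monoscopic, SideBySide, OverUnder };

// Greatest common divisor, used to reduce the ratio.
static std::uint32_t Gcd(std::uint32_t a, std::uint32_t b) {
    while (b != 0) {
        const std::uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Display aspect ratio of the whole packed frame, so that a 2D-only renderer
// shows each view undistorted even though it is unaware of the stereo layout.
AspectRatio FrameAspectRatio(AspectRatio perView, FrameLayout layout) {
    assert(perView.x != 0 && perView.y != 0);
    AspectRatio frame = perView;
    if (layout == FrameLayout::SideBySide) frame.x *= 2;  // two views next to each other
    if (layout == FrameLayout::OverUnder)  frame.y *= 2;  // two views stacked vertically
    const std::uint32_t g = Gcd(frame.x, frame.y);
    frame.x /= g;
    frame.y /= g;
    return frame;
}
```

For 16:9 views this gives 32:9 for side-by-side and 8:9 for over-under, independent of whether the views are stored at half or full resolution.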
8th August 2010, 08:33 | #15 | Link |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
I like some/most of the changes, but I still see some issues. You seem to know a lot more about 3D/multiview related things than I do, so there's not much I can say about that.
(1) Again: Your structure is not compatible with VIDEOINFOHEADER2, because bmiHeader is in the wrong place.
(2) The license terms are much better. But IMHO we don't need any copyright at all. But let's hear what other people have to say about that.
(3) About VIDEOPROPS being optional: The same argument you're using could also be used for the multiview properties - yet all those fields are there and must be filled. There's no "this_is_multiview_content" flag which must be set first to activate the multiview structures. I want the same for VIDEOPROPS. Of course there should be an "undefined" or "unknown" value for every VIDEOPROPS field. But having to activate the VIDEOPROPS structure first by setting an additional flag is not good enough for me. If the VIDEOPROPS information is unknown, then filters can set the fields that way. But very often the information is known (or could easily be retrieved) and it's just not set because of laziness.
(4) I just noticed that one original field I was asking for, namely a "ChromaFormat (4:2:0, 4:2:2, 4:4:4)", is missing. Yeah, I already hear you say: "You can see that from the FOURCC". No, you can't, because an intermediate filter (e.g. ffdshow raw video processor) could already have applied chroma upsampling. Or the decoder could have internally upsampled chroma for e.g. YUY2 output. I want to know what format the original source had, not what the upstream filters have converted the original source data to.
(5) Right now (4) made me think: Maybe there should be 2 VIDEOPROPS structures: One for the native format. And another one for the "current" data format. Something like "NativeVideoProps" and "CurrentVideoProps".
(6) "Using existing FOURCCs with wrong bit depths will introduce more problems than it solves". I never meant to suggest using incorrect bitdepths. Of course if a filter uses a 10bit FOURCC then the data must really be 10bit. But if the original video source was only 9bit then the least significant bit of that 10bit data will always be 0. And that is an important thing to know for the video renderer. It makes no sense at all to define dozens of different FOURCCs for any wild bitdepth combination. No decoder writer would ever implement dozens of FOURCCs for that purpose. Too much programming time needed. Too much time needed for testing. It makes *much* more sense to e.g. always have the filter use a 16bit FOURCC with the video data "upconverted" to 16bit, with additional information fields that indicate how many least significant bits are zeroed out.
(7) About dwPictAspectRatioX/Y. I was just starting to write a long text about how we should set it to 16:9, but then I thought: If there's a video renderer which doesn't really support/know 3D at all, then if we set aspect ratio to 16:9 for "side-by-side (full)" content, the video renderer will draw the image incorrectly. So I have to change my opinion and agree with your original suggestion to use 32:9 for side-by-side (full), for better compatibility with non-3D-aware video renderers. Which also means that I vote for removing those half/full flags again.
(8) What happens if the decoder wants to fill the VIDEOPROPS fields, but the content is originally RGB? In that case the current transfer matrix and chroma subsampling options don't fit. The correct transfer matrix for RGB is "passthrough". You could also name it "no matrix (RGB)" or something like that.
(9) I fear that the native flags might not work as intended. E.g. if a decoder programmer finds that his decoder does not connect to my renderer if he sets "NativeFormatFalse", then he might misunderstand that as a bug in my renderer and as a "fix" he may decide to always set the flag to "True" or "Unknown". As a result I'd be screwed. I'd prefer to have decoder programmers fill in the correct values in the VIDEOPROPS structure, then I can check myself what they've done.
So my suggestion would be this: Code:
    typedef struct tagOPENVIDEOINFOHEADER {
        VIDEOINFOHEADER2  videoInfo2;
        VIDEOPROPS        nativeVideoProps;
        VIDEOPROPS        currentVideoProps;
        STEREOLAYOUTPROPS stereoLayoutProps;
        UINT              reserved[8];  // Set to zero when writing and ignore when reading
    } OPENVIDEOINFOHEADER;

Last edited by madshi; 8th August 2010 at 08:36. |
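The bit-depth argument in this post (carry the data in a wider container and record how many low bits are padding) can be sketched in a few lines. This is only an illustration of the arithmetic; `UpconvertTo16Bit` and `ZeroedLowBits` are hypothetical helpers, not part of any proposed header.

```cpp
#include <cassert>
#include <cstdint>

// Number of least significant bits that are always zero after a source with
// `sourceBits` significant bits has been shifted into a 16-bit container.
unsigned ZeroedLowBits(unsigned sourceBits) {
    assert(sourceBits >= 1 && sourceBits <= 16);
    return 16u - sourceBits;
}

// Shift a sample into the most significant bits of a 16-bit container,
// leaving the low bits zeroed.
std::uint16_t UpconvertTo16Bit(std::uint16_t sample, unsigned sourceBits) {
    const unsigned shift = ZeroedLowBits(sourceBits);
    // The sample must fit into the stated source bit depth.
    assert((static_cast<std::uint32_t>(sample) >> sourceBits) == 0);
    return static_cast<std::uint16_t>(static_cast<std::uint32_t>(sample) << shift);
}
```

A 9-bit source stored this way always has its lowest 7 bits cleared, so a renderer that is told `ZeroedLowBits(9) == 7` knows the real precision without needing a dedicated 9-bit FOURCC.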
8th August 2010, 15:14 | #16 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,348
|
You should forget the compat with VIH2. Devs need to modify their code anyway for the new format, and as long as the fields are the same, plus new ones, you can still parse it with the same code.
Having a VIH2 at the top is rather confusing, because the BMI in the VIH2 can actually grow with extradata. How would you handle this? Move it to the end of the OVH? Your compat is already gone, then. I'm with pwimmer on the basic design, copy VIH2 and add new fields, keep BMI at the end. |
8th August 2010, 17:08 | #17 | Link | |
Registered User
Join Date: Jan 2005
Posts: 99
|
Quote:
All the other fields should be identical, which is the reason why I would like to keep the VIH2-compatible VIDEOPROPS. Anyway, the VIDEOPROPS is a big improvement over the dwControlFlags, because it's now obvious what information it contains. Many developers didn't know that dwControlFlags can be cast to DXVA_ExtendedFormat. At least this issue is solved, which should increase the chance that the structure is actually used. By keeping it compatible we even increase the chance that developers use it for VIH2, too. |
|
8th August 2010, 17:58 | #18 | Link | ||||||
Registered User
Join Date: Jan 2005
Posts: 99
|
Quote:
There is no flag to activate the STEREOLAYOUTPROPS, but it is optional, of course. If all bytes are set to zero (thus, if the developer ignores it), this defaults to STEREOLAYOUT_UNKNOWN (which is deliberately defined as GUID_NULL). All the other fields in the structure are defined in a way that ensures that 0 always means the default. A renderer that ignores STEREOLAYOUTPROPS and a renderer that obeys STEREOLAYOUTPROPS will output the same image when all fields of STEREOLAYOUTPROPS are zero. Quote:
Quote:
EDIT: I was wrong. http://msdn.microsoft.com/en-us/libr...px#_420formats specifies that bit shifting should be used! Quote:
Imho, there is no clear winner which approach is better, so I would keep both and let the developer decide. Quote:
This is the way it is already handled in VIH2 and I see no need to change it. Quote:
It's more likely that developers are willing to properly set a single flag than a complex structure. Last edited by pwimmer; 8th August 2010 at 18:09. |
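The "all zero means default" convention described in this post can be sketched as follows. The real STEREOLAYOUTPROPS has more fields than shown here; `GuidLite` and `StereoLayoutPropsLite` are shortened, hypothetical stand-ins that only demonstrate why a zero-filled structure ends up as STEREOLAYOUT_UNKNOWN when that GUID is defined as GUID_NULL.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in with the same layout as a Windows GUID.
struct GuidLite {
    std::uint32_t Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint8_t  Data4[8];
};

// Shortened, hypothetical version of the stereo properties structure.
struct StereoLayoutPropsLite {
    GuidLite      guidStereoLayout;  // all zero = STEREOLAYOUT_UNKNOWN (GUID_NULL)
    std::uint32_t dwNumViews;        // 0 = default
    std::uint32_t dwFlags;           // 0 = default
};

// True if every part of the GUID is zero, i.e. it equals GUID_NULL.
bool IsNullGuid(const GuidLite& g) {
    if (g.Data1 != 0 || g.Data2 != 0 || g.Data3 != 0) return false;
    for (int i = 0; i < 8; ++i) {
        if (g.Data4[i] != 0) return false;
    }
    return true;
}

bool IsLayoutUnknown(const StereoLayoutPropsLite& props) {
    return IsNullGuid(props.guidStereoLayout);
}

// What a filter that ignores the stereo fields effectively produces when it
// zero-fills the whole format block before writing the fields it knows about.
StereoLayoutPropsLite MakeZeroedProps() {
    StereoLayoutPropsLite props;
    std::memset(&props, 0, sizeof(props));
    assert(IsLayoutUnknown(props));
    return props;
}
```

A downstream filter can therefore treat `IsLayoutUnknown(props)` as "no stereo information supplied" without needing a separate activation flag.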
||||||
8th August 2010, 18:26 | #19 | Link | |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
Code:
    typedef struct tagOPENVIDEOINFOHEADER {
        VIDEOINFOHEADER2  videoInfo2;
        RGBQUAD           bmiColors[256];
        VIDEOPROPS        nativeVideoProps;
        VIDEOPROPS        currentVideoProps;
        STEREOLAYOUTPROPS stereoLayoutProps;
        UINT              reserved[8];  // Set to zero when writing and ignore when reading
    } OPENVIDEOINFOHEADER;
|
|
8th August 2010, 18:43 | #20 | Link | |||||
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
Quote:
Quote:
Quote:
Quote:
With nativeVideoProps/currentVideoProps, the programmer would have to actually lie to make my filter connect. With the flags you have suggested he simply has to leave it at "0", which is even the default value (provided that he uses memset(0))! Furthermore, if I have nativeVideoProps and currentVideoProps, I can see exactly what the filter chain has done and so I can give exact tips to the end user about what he has to change to improve image quality. |
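How a renderer might use the two structures to produce such tips can be sketched like this. The field set, the enum values and the message texts are assumptions made for the example; `VideoPropsLite` and `DescribeConversions` are hypothetical and much smaller than the VIDEOPROPS structure discussed in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical chroma format values; 0 is reserved for "unknown".
enum ChromaFormatLite : std::uint8_t {
    kChromaUnknown = 0,
    kChroma420 = 1,
    kChroma422 = 2,
    kChroma444 = 3
};

// Shortened stand-in for VIDEOPROPS; a bit depth of 0 means "unknown".
struct VideoPropsLite {
    std::uint8_t chromaFormat;
    std::uint8_t lumaBitDepth;
    std::uint8_t chromaBitDepth;
};

// Compares the native and the current properties and lists what the upstream
// filter chain appears to have changed. Unknown values are never reported.
std::vector<std::string> DescribeConversions(const VideoPropsLite& native,
                                             const VideoPropsLite& current) {
    assert(native.lumaBitDepth <= 16 && current.lumaBitDepth <= 16);
    std::vector<std::string> hints;
    if (native.chromaFormat != kChromaUnknown &&
        current.chromaFormat != kChromaUnknown &&
        native.chromaFormat != current.chromaFormat) {
        hints.push_back("chroma was resampled by an upstream filter");
    }
    if (native.lumaBitDepth != 0 && current.lumaBitDepth != 0 &&
        native.lumaBitDepth != current.lumaBitDepth) {
        hints.push_back("luma bit depth was changed by an upstream filter");
    }
    if (native.chromaBitDepth != 0 && current.chromaBitDepth != 0 &&
        native.chromaBitDepth != current.chromaBitDepth) {
        hints.push_back("chroma bit depth was changed by an upstream filter");
    }
    return hints;
}
```

With a single native/non-native flag the renderer would only learn that something changed; comparing both structures tells it what changed, which is the advantage argued for above.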
|||||
|
|