Go Back   Doom9's Forum > Capturing and Editing Video > New and alternative a/v containers
Old 6th August 2010, 12:55   #1  |  Link
pwimmer
Registered User
 
Join Date: Jan 2005
Posts: 99
DirectShow Media Types for Stereoscopic Content

I'm opening this thread to discuss media types and media formats to be used by DirectShow filters for stereoscopic formats.

Last edited by pwimmer; 7th August 2010 at 00:41.
Old 6th August 2010, 12:57   #2  |  Link
pwimmer
There is a need to specify media types for two categories of stereo content. This posting will be updated periodically to reflect the outcome of the discussions in this thread.


1) Compressed H.264 stereoscopic/multiview content.

The following FOURCCs have been specified and are supported by the MPC HC MP4 and MPEG-TS splitters after applying the 3D patches. Upcoming versions of the Haali splitter and CoreAVC will support these media types, too.

The AVC1 FOURCC should be used for 2D content. If a decoder input pin uses the AVC1 FOURCC, the decoder should also offer a second input pin that supports the EMVC FOURCC only.

The AMVC FOURCC should be used for MVC streams where both the base view NALs and the MVC extension NALs (coded slice extension NAL units and prefix NAL units) are delivered on the same DirectShow filter pin. If the AMVC FOURCC is used, there must be a subset SPS and a second PPS in the format block.

A DirectShow splitter filter pin offering the AMVC FOURCC can also offer the AVC1 FOURCC as second media type so that it also connects to legacy 2D decoders. However, when the decoder input pin is connected using the AMVC FOURCC, the decoder must not offer a second pin for EMVC FOURCC media types.

The EMVC FOURCC should be used if the pin delivers MVC extension NALs only. The format block must contain a subset SPS and the corresponding PPS.

NAL units of the base view must be delivered on another pin using the AVC1 FOURCC. Timestamps of the base view and MVC extension samples must be derived from the same clock, otherwise the decoder cannot synchronize them.
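As a sketch of how a filter might tell the three proposed FOURCCs apart: the `FourCC` helper below mirrors the byte order of the Windows `MAKEFOURCC` macro, and `NeedsSecondEmvcPin` is a hypothetical name invented for this illustration, encoding the pin rule described above.

```cpp
#include <cstdint>
#include <cassert>

// Build a FOURCC the way the Windows MAKEFOURCC macro does
// (least significant byte first).
constexpr uint32_t FourCC(char a, char b, char c, char d) {
    return uint32_t(uint8_t(a)) | uint32_t(uint8_t(b)) << 8 |
           uint32_t(uint8_t(c)) << 16 | uint32_t(uint8_t(d)) << 24;
}

constexpr uint32_t FCC_AVC1 = FourCC('A','V','C','1'); // 2D / base view only
constexpr uint32_t FCC_AMVC = FourCC('A','M','V','C'); // base + MVC NALs on one pin
constexpr uint32_t FCC_EMVC = FourCC('E','M','V','C'); // MVC extension NALs only

// Hypothetical helper: per the rules above, only an AVC1 connection
// implies that the MVC extension NALs arrive on a separate EMVC pin.
constexpr bool NeedsSecondEmvcPin(uint32_t fcc) {
    return fcc == FCC_AVC1;
}
```

With AMVC, base and extension NALs share one pin, so no second pin is offered.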


2) Uncompressed stereoscopic/multiview content.

The Open Media Format is an extension to Microsoft's DirectShow technology to enable interoperability between stereo-capable DirectShow filters from different vendors. See the draft of the Open Media Format specification:
DirectShow Open Media Format SourceForge Project
OpenMediaFormat.zip on 3dtv.at server

Last edited by pwimmer; 1st August 2011 at 22:49.
Old 6th August 2010, 12:58   #3  |  Link
pwimmer
Ad 1) I'm currently working on the MVC decoder and thinking about possible solutions. Things are still evolving...

Last edited by pwimmer; 6th August 2010 at 13:10.
Old 6th August 2010, 13:00   #4  |  Link
pwimmer
Ad 2)

The Stereoscopic Player that I'm working on already uses a special stereo media type for uncompressed stereo samples sent from the 3dtv.at Stereo Transformation filter to the 3dtv.at Stereo Renderer filter, and from the 3dtv.at Stereo Image Source to the 3dtv.at Stereo Transformation.

Edit: Removed the code to avoid confusion with the new Open Media Format definition. If you are interested in the media format currently used by the Stereoscopic Player, please contact me offlist.

Last edited by pwimmer; 6th August 2010 at 23:55. Reason: Removed the code to avoid confusion with the new Open Media Format definition
Old 6th August 2010, 13:08   #5  |  Link
pwimmer
I propose a different IMediaSampleEx interface that derives from IMediaSample and adds just two methods:

HRESULT GetPointerEx(BYTE **ppBuffer, int iView);
long GetSizeEx(int iView);

This allows separate buffers to be used for each view, which I would prefer over a single buffer for all views.

The format type adds a guidStereoLayout GUID and a dwNumViews field to the VIDEOINFOHEADER2:

DEFINE_GUID(STEREOLAYOUT_MONOSCOPIC, ...); // Monoscopic
DEFINE_GUID(STEREOLAYOUT_MULTIBUFFERS, ...); // Use StereoMediaSampleEx to get buffers != 0
DEFINE_GUID(STEREOLAYOUT_SIDEBYSIDELEFTFIRST, ...);
DEFINE_GUID(STEREOLAYOUT_SIDEBYSIDERIGHTFIRST, ...);
DEFINE_GUID(STEREOLAYOUT_OVERUNDERLEFTTOP, ...);
DEFINE_GUID(STEREOLAYOUT_OVERUNDERRIGHTTOP, ...);
DEFINE_GUID(STEREOLAYOUT_INTERLACEDLEFTFIRST, ...);
DEFINE_GUID(STEREOLAYOUT_INTERLACEDRIGHTFIRST, ...);
DEFINE_GUID(STEREOLAYOUT_FRAMESEQUENTIALLEFTFIRST, ...);
DEFINE_GUID(STEREOLAYOUT_FRAMESEQUENTIALRIGHTFIRST, ...);


typedef struct tagSTEREOVIDEOINFOHEADER {
    RECT rcSource;
    RECT rcTarget;
    DWORD dwBitRate;
    DWORD dwBitErrorRate;
    REFERENCE_TIME AvgTimePerFrame;
    DWORD dwInterlaceFlags;
    DWORD dwCopyProtectFlags;
    DWORD dwPictAspectRatioX;
    DWORD dwPictAspectRatioY;
    DWORD dwControlFlags;
    DWORD dwReserved2;

    // Stereo related fields
    GUID guidStereoLayout;  // See supported STEREOLAYOUT_xxx GUIDs above
    DWORD dwNumViews;       // Number of views (only valid for STEREOLAYOUT_MULTIBUFFERS)

    BITMAPINFOHEADER bmiHeader;
} STEREOVIDEOINFOHEADER;
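For the packed layouts (side-by-side, over/under), the start of each view inside a single frame buffer follows directly from the stride and height. A minimal sketch for 8-bit single-plane data, assuming the left/top view comes first; the names are illustrative, not part of the proposal.

```cpp
#include <cstddef>
#include <cassert>

// Packed two-view layouts sharing one buffer.
enum class Layout { SideBySide, OverUnder };

// Byte offset of a view inside one frame buffer.
// 'stride' is the number of bytes per full frame row,
// 'height' the number of rows in the full frame.
inline size_t ViewOffset(Layout layout, int view,
                         size_t stride, size_t height) {
    if (view == 0) return 0;                              // left/top view
    switch (layout) {
        case Layout::SideBySide: return stride / 2;            // right half of each row
        case Layout::OverUnder:  return stride * (height / 2); // bottom half of buffer
    }
    return 0;
}
```

Note that for side-by-side the second view is not contiguous: each of its rows still uses the full-frame stride, which is one reason separate per-view buffers (the MULTIBUFFERS layout) can be more convenient.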

Last edited by pwimmer; 6th August 2010 at 23:56. Reason: Added GetSizeEx
Old 6th August 2010, 13:30   #6  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
A few comments:

(1) Let's prepare for *Multi* views and not just stereo. After all it's h264 M(ulti)VC and not h264 S(tereo)VC.

(2) HDMI 1.4a not only knows left/right, but it also supports funny things like "L + Depth". Not sure if the new media type should support such things, too?

(3) I'd like to use the opportunity to add more information to the header structure. Which means that the purpose of the new media type might not be limited to 3D. I'd also like to see the new media type being used for 2D. So maybe we should not even name it 3D or MVC. Maybe we should name it something completely different, e.g. "FORMAT_VideoInfo3". However, that name might collide with a future Microsoft format, so maybe we should use "FORMAT_Doom9VideoInfo" or "FORMAT_OpenVideoInfo" or whatever. I don't think there should be "Stereo" in the name. The new format should be stereo-capable, but that would only be one benefit of the new media type. I want to have more benefits. And if the new format is used for 2D, the standard IMediaSample interface should suffice.

(4) I strongly suggest reusing the same FOURCC values that also exist for conventional 2D frames, instead of defining new ones. According to Microsoft, there already exist FOURCC definitions for YCbCr 4:2:0, 4:2:2 and 4:4:4, with 8bit, 10bit and 16bit each. See here:

http://msdn.microsoft.com/en-us/libr...px#_420formats

(5) I don't really like the idea of locking the buffer for different "components". After all, depending on the FOURCC, the buffers may not be planar, and thus locking separate component buffers doesn't always make sense. I think there should be a way to get the pointer/size for every view, but it should be one simple pointer, similar to how the standard IMediaSample interface works. How the data is stored in the buffer depends on the FOURCC.

(6) Let's try to name the methods of the extended IMediaSample interface similar to the original functions. E.g. instead of "LockBuffer" I'd suggest "GetPointerEx/GetSizeEx".

(7) I'd suggest to start the structure with VIDEOINFOHEADER2, and to just append more data to it. This way it would be extremely easy for existing software to add support for the new media type. They'd just need to interpret the structure as VIDEOINFOHEADER2. They'd only need to do extra coding if they want to actually support the new fields of the structure.
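The idea in (7) can be shown with mock structs standing in for the real SDK types (the real VIDEOINFOHEADER2 lives in the DirectShow headers; these cut-down stand-ins only demonstrate the layout trick): because the legacy header is the first member, legacy code can read the extended header through a plain cast.

```cpp
#include <cstddef>
#include <cassert>

// Mock stand-ins for the real SDK structs; only the layout idea matters.
struct MockVideoInfoHeader2 {
    int rcSource[4];
    int rcTarget[4];
    unsigned dwBitRate;
};

struct MockOpenVideoInfoHeader {
    MockVideoInfoHeader2 base;  // existing 2D fields come first...
    unsigned dwNumViews;        // ...new fields are appended after them
};

// Because 'base' is the first member, the extended header starts at the
// same address as the legacy header.
static_assert(offsetof(MockOpenVideoInfoHeader, base) == 0,
              "extended header must start with the legacy header");

// What a legacy (non-3D-aware) filter would effectively do.
inline unsigned ReadBitRateAsLegacy(const MockOpenVideoInfoHeader *ext) {
    const MockVideoInfoHeader2 *legacy =
        reinterpret_cast<const MockVideoInfoHeader2 *>(ext);
    return legacy->dwBitRate;
}
```

As nevcairiel points out below, the catch is codec-specific extradata that conventionally follows the BITMAPINFOHEADER, which this simple append scheme would displace.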

(8) Additional fields I want to have in the header, all taken from the h264 specification:

- video_format
- video_full_range_flag
- colour_primaries
- transfer_characteristics
- matrix_coefficients
- chroma_format_idc
- chroma_sample_loc_type_top_field
- chroma_sample_loc_type_bottom_field
- bit_depth_luma
- bit_depth_chroma

Maybe more. The fields above might have to be adjusted to be more flexible. We don't want to limit ourselves to what the h264 spec supports. Other video formats may support more variations than h264.
Old 6th August 2010, 13:45   #7  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,348
Quote:
Originally Posted by madshi View Post
(7) I'd suggest to start the structure with VIDEOINFOHEADER2, and to just append more data to it. This way it would be extremely easy for existing software to add support for the new media type. They'd just need to interpret the structure as VIDEOINFOHEADER2. They'd only need to do extra coding if they want to actually support the new fields of the structure.
While this idea sounds good in theory, there is one problem with it.
Some formats require codec/format specific data that gets appended after the BITMAPINFOHEADER in VIDEOINFOHEADER2.
If you want to append data to the end of VIDEOINFOHEADER2, you would occupy this space.

I don't see a clear way to preserve compatibility in a way like this without crippling some functionality of the format, tbh.

Another question that I just asked myself:
This new format is mainly designed for data from the decoder to the renderer, not from the source/splitter to the decoder, right?
Would decoders benefit from any additional information we could provide on the splitter side of things?

I'm currently working on a DS source/splitter and just wondering if on this side anything new is required.

Last edited by nevcairiel; 6th August 2010 at 13:54.
Old 6th August 2010, 13:57   #8  |  Link
kieranrk
Registered User
 
Join Date: Jun 2009
Location: London, United Kingdom
Posts: 707
Quote:
Originally Posted by madshi View Post
(1) Let's prepare for *Multi* views and not just stereo. After all it's h264 M(ulti)VC and not h264 S(tereo)VC.
Agreed. Just because Stereo is all the rage now doesn't mean everything else should be excluded.

You should also include Side-by-side MPEG-2 and H.264 broadcasts. There are also 720p120 H.264 3D broadcasts.

There's a special SEI in H.264 for indicating side-by-side 3D but not many people use it yet. Presumably there's something similar in MPEG-2.
Old 6th August 2010, 14:06   #9  |  Link
pwimmer
Quote:
Originally Posted by madshi View Post
(1) Let's prepare for *Multi* views and not just stereo. After all it's h264 M(ulti)VC and not h264 S(tereo)VC.
Agreed. The dwNumViews field already allows for more than two views.

Quote:
Originally Posted by madshi View Post
(2) HDMI 1.4a not only knows left/right, but it also supports funny things like "L + Depth". Not sure if the new media type should support such things, too?
I don't think so. It is not used in practice and would make the new media type much more complicated. The only 2D+Depth files are the ones for the Philips WOW 3D displays, and some of them use not just 2D+Depth but 2D+Depth plus a background 2D plane and a background depth plane.

I suggest defining a separate media type for 2D+Depth and not covering it in our new media type.

Quote:
Originally Posted by madshi View Post
(3) I'd like to use the opportunity to add more information to the header structure. Maybe we should name it completely different, e.g. "FORMAT_VideoInfo3". However, that name might collide with a future Microsoft format, so maybe we should use "FORMAT_Doom9VideoInfo" or "FORMAT_OpenVideoInfo" or whatever.
The name can always be changed without breaking binary compatibility; only a search & replace in the source code is required. The GUID is unique in any case, even if the names collide.

FORMAT_OpenVideoInfo is my personal preference.

Quote:
Originally Posted by madshi View Post
(4) I strongly suggest reusing the same FOURCC values that also exist for conventional 2D frames, instead of defining new ones.
Right. I defined those because I needed FOURCCs for formats where each plane resides in a different format. I've published them so that everybody can see how the existing Stereoscopic Player works, but they should not be part of the new standard.

Quote:
Originally Posted by madshi View Post
(6) Let's try to name the methods of the extended IMediaSample interface similar to the original functions. E.g. instead of "LockBuffer" I'd suggest "GetPointerEx/GetSizeEx".
Agreed. I forgot the GetSizeEx method in my posting above. Will fix it later.

Quote:
Originally Posted by madshi View Post
(7) I'd suggest to start the structure with VIDEOINFOHEADER2, and to just append more data to it.
Agreed. That's what I did.

Quote:
Originally Posted by madshi View Post
(8) Additional fields I want to have in the header, all taken from the h264 specification:

- video_format
- video_full_range_flag
- colour_primaries
- transfer_characteristics
- matrix_coefficients
- chroma_format_idc
- chroma_sample_loc_type_top_field
- chroma_sample_loc_type_bottom_field
- bit_depth_luma
- bit_depth_chroma

VIDEOINFOHEADER2 already supports some of these values. Microsoft's documentation says: If the AMCONTROL_COLORINFO_PRESENT flag is set in the dwControlFlags member, you can cast the dwControlFlags value to a DXVA_ExtendedFormat structure to access the extended color information.

http://msdn.microsoft.com/en-us/libr...67(VS.85).aspx
Old 6th August 2010, 15:02   #10  |  Link
BetaBoy
CoreCodec Founder
 
Join Date: Oct 2001
Location: San Francisco
Posts: 1,421
I'm bringing in squid_80 and Haali to follow/discuss.
__________________
Dan "BetaBoy" Marlin
Ubiquitous Multimedia Technologies and Developer Tools

http://corecodec.com
Old 6th August 2010, 17:12   #11  |  Link
madshi
Quote:
Originally Posted by nevcairiel View Post
While this idea sounds good in theory, there is one problem with it.
Some formats require codec/format specific data that gets appended after the BITMAPINFOHEADER in VIDEOINFOHEADER2.
If you want to append data to the end of VIDEOINFOHEADER2, you would occupy this space.
True, so it would not be *perfectly* compatible with FORMAT_VideoInfo2, but only "mostly". But isn't 90% compatibility better than none at all? Or in other words: What do we lose if we start with VIDEOINFOHEADER2?

Quote:
Originally Posted by nevcairiel View Post
This new format is mainly designed for data from the decoder to the renderer, not from the source/splitter to the decoder, right?
Would decoders benefit from any additional information we could provide on the splitter side of things?

I'm currently working on a DS source/splitter and just wondering if on this side anything new is required.
There's been some 3D related discussion on the Matroska mailing list in the last couple of days/weeks, too. So I think this new media type could also be very useful to connect splitters with decoders. If you have any splitter related new fields in mind that would be useful, we can consider them.

Quote:
Originally Posted by kieranrk View Post
You should also include Side-by-side MPEG-2 and H.264 broadcasts. There are also 720p120 H.264 3D broadcasts.
Yes, absolutely.

Quote:
Originally Posted by pwimmer View Post
Agreed. The dwNumView field already allows for more than two views.

[...]

I don't think so. It is not used in practice and would make the new media type much more complicated. The only 2D+Depth files are the ones for the Philips WOW 3D displays, and some of them use not just 2D+Depth but 2D+Depth plus a background 2D plane and a background depth plane.
Why would it make the media type more difficult? We could simply add one more "guidStereoLayout" value (we should change the name of that field, though) for 2D+Depth etc. View index 0 would then be the 2D image, index 1 the depth map. I don't see how that would make anything more complicated?

Quote:
Originally Posted by pwimmer View Post
I suggest to define a separate media type for 2D+Depth and do not cover it in our new media type.
Why? I'd have to implement 2 different media types in my renderer, then. That only makes things more complicated. I'd much prefer to have only one new media type which can handle all new formats.

Quote:
Originally Posted by pwimmer View Post
Agreed. That's what I did.
No, you've inserted your new fields *before* the bmiHeader field. As a result your structure definition is not compatible to VIDEOINFOHEADER2.

Quote:
Originally Posted by pwimmer View Post
VIDEOINFOHEADER2 already has support for parts of this values. Microsoft's doc says: If the AMCONTROL_COLORINFO_PRESENT flag is set in the dwControlFlags member, you can cast the dwControlFlags value to a DXVA_ExtendedFormat structure to access the extended color information.

http://msdn.microsoft.com/en-us/libr...67(VS.85).aspx
Cool, I didn't know that! However, does any current splitter/decoder actually fill in these values? I doubt it. I think if we add dedicated fields to the structure and make them mandatory (although they may be set to "not specified"), that would very much increase the chance of splitters & decoders actually filling in proper values. Furthermore, the DXVA_ExtendedFormat structure does not contain all the information I'm looking for.
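To illustrate the general mechanism being discussed (several small enum values sharing one DWORD of flags): the field widths and bit positions below are invented for this sketch; the real layout is whatever DXVA_ExtendedFormat defines.

```cpp
#include <cstdint>
#include <cassert>

// Illustrative color metadata; field names loosely follow the thread.
struct ColorInfo {
    uint32_t matrix;    // e.g. BT.601 / BT.709 (enum value)
    uint32_t primaries; // color primaries (enum value)
    uint32_t fullRange; // 0 = limited range, 1 = full range
};

// Pack the fields into one DWORD. Widths (3/5/1 bits) and positions
// are made up for illustration only.
inline uint32_t PackColorInfo(const ColorInfo &c) {
    return (c.matrix & 0x7) |
           (c.primaries & 0x1F) << 3 |
           (c.fullRange & 0x1) << 8;
}

// Recover the fields from the packed DWORD.
inline ColorInfo UnpackColorInfo(uint32_t flags) {
    return { flags & 0x7, (flags >> 3) & 0x1F, (flags >> 8) & 0x1 };
}
```

The round trip preserves the values, which is exactly what casting dwControlFlags to a structure relies on: a fixed, agreed-upon bit layout.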

Quote:
Originally Posted by BetaBoy View Post
I'm bringing in squid_80 and Haali to follow/discuss.
That's great - thanks!

Here comes my suggestion:

Code:
typedef struct tagOPENVIDEOINFOHEADER
{
  VIDEOINFOHEADER2 VideoInfoHeader2;
  MULTIVIEWHEADER  MultiView;
  VIDEOPROPS       VideoProps;
  DWORD            Flags[8];   // Flags[0] & 0x1 = Full Range
} OPENVIDEOINFOHEADER;

typedef struct tagMULTIVIEWHEADER
{
  DWORD ViewType;   // enum to be defined
  DWORD NumViews;
} MULTIVIEWHEADER;

typedef struct tagVIDEOPROPS
{
  BYTE ColourPrimaries;           // enum to be defined
  BYTE TransferCharacteristics;   // enum to be defined
  BYTE MatrixCoefficients;        // enum to be defined
  BYTE ChromaFormat;              // enum to be defined
  BYTE ChromaLocTopField;         // enum to be defined
  BYTE ChromaLocBottomField;      // enum to be defined
  BYTE LumaBitdepth;
  BYTE ChromaBitdepth;
} VIDEOPROPS;
Old 7th August 2010, 00:09   #12  |  Link
pwimmer
I've written a C++ header file that includes all the suggestions made so far. The link is in the second message of this thread.

It was possible to use a union combining the VIDEOPROPS structure and dwControlFlags (which can be cast to DXVA_ExtendedFormat). This means I didn't break compatibility with VIDEOINFOHEADER2. New fields are only required for the stereo stuff.


Quote:
Originally Posted by madshi View Post
Code:
typedef struct tagVIDEOPROPS
{
  BYTE ColourPrimaries;           // enum to be defined
  BYTE TransferCharacteristics;   // enum to be defined
  BYTE MatrixCoefficients;        // enum to be defined
  BYTE ChromaFormat;              // enum to be defined
  BYTE ChromaLocTopField;         // enum to be defined
  BYTE ChromaLocBottomField;      // enum to be defined
  BYTE LumaBitdepth;
  BYTE ChromaBitdepth;
}
ColourPrimaries, TransferCharacteristics, MatrixCoefficients, ChromaFormat and FullRange are already covered by DXVA_ExtendedFormat. I just added a few new items to the enumerations. To ensure that we do not have a conflict if Microsoft should add their own items in the future as well, I used the highest possible values for the new items instead of the next available value. ChromaLocTopField and ChromaLocBottomField are imho not required; they are covered by DXVA2_VideoChromaSubsampling. LumaBitdepth and ChromaBitdepth are implicitly defined by the FOURCC, so there is no need to store them twice. Consequently, all required fields are already present in DXVA_ExtendedFormat, so we can stay compatible.
Old 7th August 2010, 09:02   #13  |  Link
madshi
Quote:
Originally Posted by pwimmer View Post
I've written a C++ header file that includes all the suggestions made so far.
We're working together to define the structures. No offense, but does it make sense to put your name as the copyright holder above it? If at all, shouldn't this thread be the copyright holder?

Quote:
Originally Posted by pwimmer View Post
Redistributions in binary form must reproduce the above copyright
No, thanks!! Your copyright conditions are worse than using media types defined by MS!

Quote:
Originally Posted by pwimmer View Post
It was possible to use a union combining the VIDEOPROPS structure and dwControlFlags (which can be cast to DXVA_ExtendedFormat). This means I didn't break compatibility with VIDEOINFOHEADER2. New fields are only required for the stereo stuff.
I would still prefer dedicated fields in the header, because this whole logic with "AMCONTROL_COLORINFO_PRESENT" makes the VIDEOPROPS fields look very much optional. I want them to be mandatory. If we make them optional, then once again nobody will bother using them.

Quote:
Originally Posted by pwimmer View Post
LumaBitdepth and ChromaBitdepth are implicitly defined by the FOURCC, there is no need to store them twice.
FOURCC defines 8bit, 10bit or 16bit. Intermediate values are not defined (at least not by MS). Also the existing FOURCCs always define the same bitdepth for Luma and Chroma. The h264 spec allows *any* bitdepth between 8bit and 14bit, and it allows different bitdepths for Luma and Chroma. So let's consider a video track with 9bit Luma and 8bit Chroma. What will a renderer output? Probably 10bit for both Luma and Chroma. But now the video renderer does not know the *native* bitdepth of the source. And that information could be very useful for specific processing algorithms (e.g. anti-banding post processing).
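The point about native bitdepth can be made concrete: a 9-bit source carried in a 16-bit FOURCC has its bottom 16-9 = 7 bits always zero, and only the producer knows that. A hypothetical sketch (function names are invented for illustration):

```cpp
#include <cstdint>
#include <cassert>

// Shift a sample with 'nativeBits' of real precision up to the
// 'carriedBits' precision of the FOURCC used on the wire.
inline uint16_t Upconvert(uint16_t sample, int nativeBits, int carriedBits) {
    return (uint16_t)(sample << (carriedBits - nativeBits));
}

// How many least-significant bits are guaranteed zero after upconversion?
// This is the number a renderer would want signalled in the media type.
inline int ZeroedLsbs(int nativeBits, int carriedBits) {
    return carriedBits - nativeBits;
}
```

A renderer receiving only the 16-bit FOURCC cannot distinguish genuine 16-bit content from upshifted 9-bit content; a native-bitdepth field in the header would, e.g. for tuning anti-banding processing.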

BTW, your structure is still incompatible with VIDEOINFOHEADER2, because the bmiHeader field is in the wrong place, as I said before.

Quote:
Originally Posted by pwimmer View Post
// For example, for a 1920 x 1080 video containing squeezed side-by-side format,
// the dwPictAspectRatioX should be set to 32 and dwPictAspectRatioY to 9.
This is debatable. The 32:9 aspect ratio would describe what is transferred downstream. But dwPictAspectRatioX/Y actually does not describe what is transferred, but what should ultimately be visible on the screen. And that is 16:9. Whether the side-by-side is squeezed or not is very obvious from the bmiHeader information. Actually it could even be half squeezed, or it could be anamorphically squeezed in addition to the side-by-side squeeze, no problem at all. No need to misuse the dwPictAspectRatioX/Y fields.
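The arithmetic behind the two readings, as a small illustrative helper: a 1920x1080 squeezed side-by-side frame transports at 16:9, but once both 960x1080 views are unsqueezed to full width and placed next to each other the canvas is 3840x1080, i.e. 32:9.

```cpp
#include <utility>
#include <cassert>

// Greatest common divisor (Euclid), for reducing the ratio.
inline int Gcd(int a, int b) {
    while (b) { int t = a % b; a = b; b = t; }
    return a;
}

// Reduce width:height to its simplest aspect-ratio form.
inline std::pair<int, int> AspectRatio(int w, int h) {
    int g = Gcd(w, h);
    return { w / g, h / g };
}
```

So 16:9 describes the stored frame (and each unsqueezed view), while 32:9 describes the unsqueezed pair; the thread later settles on 32:9 for better compatibility with non-3D-aware renderers.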

Quote:
// ----------------------------------------------------------------------------
// Stereoscopic layout information
// ----------------------------------------------------------------------------
As said before, please don't use "stereo", please replace it with "Multiview" or something not stereo-specific.

Quote:
GUID guidStereoLayout; // See supported STEREOLAYOUT_xxx GUIDs above
union {
STEREOLAYOUTPARAMS1 stereoLayoutParams1; // Layout-specific parameters
STEREOLAYOUTPARAMS2 stereoLayoutParams2; // Layout-specific parameters
DWORD dwReserved3[2]; // Set to zero if neither of the STEREOLAYOUTPARAMS structures is used
};
The union solution is somewhat "clever", but I'd prefer a simple flat structure with all fields in it. Just imagine you want to log the information fields of the media type and the multiview GUID is not known to you (it might be a new GUID defined after your software was written). In this situation you don't know which member of the union is the correct one to use.

Quote:
// Video transfer matrix. Values are identical to the
// DXVA2_VideoTransferMatrix and MFVideoTransferMatrix enumerations.
I'd like the media type to have a way to signal that the original source was encoded in RGB and not YCbCr. Maybe we could add a transfer matrix for RGB (basically a passthrough matrix) for that purpose? The reason why I want to know which format the original source had is that my renderer insists on doing all video processing, so it refuses a connection which would result in the decoder doing any kind of conversion. Which means that I'd like to be able to see from the media type information whether the connection would be a "native" connection or not. If the original source was YCbCr, my renderer will not accept an RGB connection and vice versa...

Quote:
Views must be ordered from left to right. For stereoscopic content (dwNumViews = 2), view 0 is the left and view 1 the right view. For multiview content (dwNumViews > 2), view 0 is the leftmost view and view dwNumViews-1 is the rightmost view.
If that is the case then why do you have different "STEREOLAYOUT_SIDEBYSIDELEFTFIRST" and "SIDEBYSIDERIGHTFIRST" GUIDs? I'd simply drop that quoted paragraph.

Quote:
The IOpenMediaSample interface must be supported by media samples if the field guidStereoLayout in the STEREOVIDEOINFOHEADER structure is set to the value STEREOLAYOUT_MULTIBUFFERS.
Why only for MULTIBUFFERS? E.g. for 2DDEPTH IOpenMediaSample would also be needed.

Quote:
Number of horizontal tiles (only for STEREOLAYOUT_TILEDxxx, zero otherwise)
I'd say "1 otherwise". Makes more sense to me.
Old 8th August 2010, 00:33   #14  |  Link
pwimmer
I've addressed the issues you noticed.

* The license terms are less strict.

* A Doom9 thread cannot be a copyright holder, only individuals. Of course I'll add other contributors as well. Let me know your name & email address.

* VIDEOPROPS will always be optional because in many cases the information is unknown. In this case I prefer the value "unknown" instead of forcing a filter to write some mess into these fields. If it is "unknown", it gives the application the chance to ask the user or apply application-specific defaults.

* You should define new FOURCCs for currently unsupported bit depths. Using existing FOURCCs with wrong bit depths will introduce more problems than it solves because developers will not expect bit depths other than those defined by Microsoft. Is there really content with unusual bit depths? The H.264 spec is very flexible, but that doesn't mean that all possible combinations are used in practice. Let's see the opinion of other people on this issue...

* Using dwPictAspectRatioX and dwPictAspectRatioY is a very convenient way to make squeezed stereo files display properly in 2D media players. Although the output is not stereo but still two views, at least the image is not distorted. Imho it is a clean solution; it is pretty much the same situation as with anamorphic 2D content. dwPictAspectRatioX and dwPictAspectRatioY are designed for the purpose of making the video display in the proper aspect ratio, so why not use them for stereo content as well? Anyway, I added an alternative that allows leaving dwPictAspectRatioX and dwPictAspectRatioY untouched: StereoFlags_HalfHorizontalResolution and StereoFlags_HalfVerticalResolution. But I would allow both methods.

* I modified the text to mention "multiview" in addition to "stereoscopic". As multiview content is typically used for autostereoscopic displays, I do not consider the term "stereoscopic" wrong or inappropriate. The specification currently doesn't cover any other use of multiview content than for autostereoscopic displays. If somebody is familiar with other usage scenarios of multiview content (e.g. multi-viewpoint video) and has suggestions for what could be added to better support such scenarios, then let me know.

* I completely gave up the union stuff in favor of a flat structure.

* The concept of a passthrough matrix is rather confusing and I doubt any filter developer would support it. I added a much simpler feature that makes it possible to find out if a media type is the native one: the flags AdvancedFlags_NativeFormatUnknown, AdvancedFlags_NativeFormatTrue and AdvancedFlags_NativeFormatFalse.

* I have different "STEREOLAYOUT_SIDEBYSIDELEFTFIRST" and "SIDEBYSIDERIGHTFIRST" GUIDs because I want the GUIDs to be as precise as possible. "Views must be ordered from left to right" only applies to the STEREOLAYOUT_MULTIBUFFER GUID. A filter can easily change the order by swapping pointers while it would mean memcpys for other layouts. Thus I only created a single GUID for the multi buffer layout but not for the other layouts. I updated the comments to clarify the issue.
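pwimmer's point about reordering by pointer swap, as a two-line sketch (the types are invented for illustration): with the multi-buffer layout a filter can change the view order in O(1), while packed layouts would require copying pixel data.

```cpp
#include <utility>
#include <cstdint>
#include <cassert>

// Two separately buffered views, as in the MULTIBUFFER layout.
struct TwoViewFrame {
    const uint8_t *view[2];  // view[0] = left, view[1] = right
};

// Swap left and right by exchanging pointers; no pixel data is touched.
inline void SwapViews(TwoViewFrame &f) {
    std::swap(f.view[0], f.view[1]);
}
```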

* The latest version contains many new stereo layouts and descriptions for all layouts. Most notably, I added STEREOLAYOUT_UNKNOWN that should be used instead of STEREOLAYOUT_MONOSCOPIC if the actual content is unknown.

* The latest version also includes many new fields in the new STEREOLAYOUTPROPS structure, e.g. orientation (rotation and flipping), cropping and parallax adjustment. All of this is used and required in practice, e.g. for content recorded with mirror rigs, beam splitters or other optical stereo attachments.

Last edited by pwimmer; 8th August 2010 at 04:07.
Old 8th August 2010, 08:33   #15  |  Link
madshi
I like some/most of the changes, but I still see some issues. You seem to know a lot more about 3D/multiview related things than I do, so there's not much I can say about that.

(1) Again: Your structure is not compatible with VIDEOINFOHEADER2, because bmiHeader is in the wrong place.

(2) The license terms are much better. But IMHO we don't need any copyright at all; let's hear what other people have to say about that.

(3) About VIDEOPROPS being optional: The same argument you're using could also be used for the multiview properties - yet all those fields are there and must be filled. There's no "this_is_multiview_content" flag which must be set first to activate the multiview structures. I want the same for VIDEOPROPS. Of course there should be an "undefined" or "unknown" value for every VIDEOPROPS field. But having to activate the VIDEOPROPS structure first by setting an additional flag is not good enough for me. If the VIDEOPROPS information is unknown, then filters can set the fields that way. But very often the information is known (or could easily be retrieved) and it's just not set because of laziness.

(4) I just noticed that one original field I was asking for, namely a "ChromaFormat (4:2:0, 4:2:2, 4:4:4)" is missing. Yeah, I already hear you say: "You can see that from the FOURCC". No, you can't, because an intermediate filter (e.g. ffdshow raw video processor) could already have applied chroma upsampling. Or the decoder could have internally upsampled chroma for e.g. YUY2 output. I want to know what format the original source had, not what the upstream filters have converted the original source data to.

(5) Right now (4) made me think: Maybe there should be 2 VIDEOPROPS structures: One for the native format. And another one for the "current" data format. Something like "NativeVideoProps" and "CurrentVideoProps".

(6) "Using existing FOURCCs with wrong bit depths will introduce more problems than it solves". I never meant to suggest using incorrect bitdepths. Of course if a filter uses a 10bit FOURCC then the data must really be 10bit. But if the original video source was only 9bit then the least significant bit of that 10bit data will always be 0. And that is an important thing to know for the video renderer. It makes no sense at all to define dozens of different FOURCCs for any wild bitdepth combination. No decoder writer would ever implement dozens of FOURCCs for that purpose. Too much programming time needed. Too much time needed for testing. It makes *much* more sense to e.g. always have the filter use a 16bit FOURCC with the video data "upconverted" to 16bit, with additional information fields that indicate how many least significant bits are zeroed out.

(6) About dwPictAspectRatioX/Y. I was just starting to write a long text about how we should set it to 16:9, but then I thought: If there's a video renderer which doesn't really support/know 3D at all, then if we set the aspect ratio to 16:9 for "side-by-side (full)" content, the video renderer will draw the image incorrectly. So I have to change my opinion and agree with your original suggestion to use 32:9 for side-by-side (full), for better compatibility with non-3D-aware video renderers. Which also means that I vote for removing those half/full flags again.
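The arithmetic behind the 32:9 value, as a quick sketch (function names are illustrative): for side-by-side (full), the coded frame holds two views next to each other, so the picture aspect ratio a 3D-unaware renderer needs is the per-view ratio with the X term doubled.

```c
#include <assert.h>

/* Greatest common divisor, used to reduce the ratio. */
static unsigned gcd(unsigned a, unsigned b)
{
    while (b) { unsigned t = a % b; a = b; b = t; }
    return a;
}

/* For side-by-side (full), the coded frame is two views wide, so the
 * dwPictAspectRatioX/Y a legacy renderer should see is the per-view
 * ratio with X doubled: 16:9 per view -> 32:9 coded. */
static void sbs_full_aspect(unsigned viewX, unsigned viewY,
                            unsigned *codedX, unsigned *codedY)
{
    unsigned x = viewX * 2, y = viewY;
    unsigned g = gcd(x, y);
    *codedX = x / g;
    *codedY = y / g;
}
```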

(6) What happens if the decoder wants to fill the VIDEOPROPS fields, but the content is originally RGB? In that case the current transfer matrix and chroma subsampling options don't fit. The correct transfer matrix for RGB is "passthrough". You could also name it "no matrix (RGB)" or something like that.

(7) I fear that the native flags might not work as intended. E.g. if a decoder programmer finds that his decoder does not connect to my renderer if he sets "NativeFormatFalse", then he might misunderstand that as a bug in my renderer and as a "fix" he may decide to always set the flag to "True" or "Unknown". As a result I'd be screwed. I'd prefer to have decoder programmers fill in the correct values in the VIDEOPROPS structure, then I can check myself what they've done.

So my suggestion would be this:

Code:
typedef struct tagOPENVIDEOINFOHEADER {
  VIDEOINFOHEADER2 videoInfo2;
  VIDEOPROPS nativeVideoProps;
  VIDEOPROPS currentVideoProps;
  STEREOLAYOUTPROPS stereoLayoutProps;
  UINT reserved[8];    // Set to zero when writing and ignore when reading
} OPENVIDEOINFOHEADER;
I'd also like to suggest embedding "videoInfo2" like that, because it makes the structure look much smaller and easier to understand. If you list all VIDEOINFOHEADER2 fields one by one in your structure, it looks much more intimidating.

Last edited by madshi; 8th August 2010 at 08:36.
madshi is offline   Reply With Quote
Old 8th August 2010, 15:14   #16  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,348
You should forget the compat with VIH2. Devs need to modify their code anyway for the new format, and as long as the fields are the same, plus new ones, you can still parse it with the same code.
Having a VIH2 at the top is rather confusing, because the BMI in the VIH2 can actually grow with extradata. How would you handle this? Move it to the end of the OVH? Your compat is already gone, then.

I'm with pwimmer on the basic design, copy VIH2 and add new fields, keep BMI at the end.
nevcairiel is offline   Reply With Quote
Old 8th August 2010, 17:08   #17  |  Link
pwimmer
Registered User
 
Join Date: Jan 2005
Posts: 99
Quote:
Originally Posted by nevcairiel View Post
Having a VIH2 at the top is rather confusing, because the BMI in the VIH2 can actually grow with extradata. How would you handle this? Move it to the end of the OVH? Your compat is already gone, then.
Having the full VIH2 at the top would be nice, but as you say it is not possible because BITMAPINFOHEADER is a variable size structure, so it must be at the end.

All the other fields should be identical, which is the reason why I would like to keep the VIH2-compatible VIDEOPROPS. Anyway, VIDEOPROPS is a big improvement over dwControlFlags, because it's now obvious what information it contains. Many developers didn't know that dwControlFlags can be cast to DXVA_ExtendedFormat. At least this issue is solved, which should increase the chance that the structure is actually used. By keeping it compatible we even increase the chance that developers use it for VIH2, too.
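For reference, this is the cast in question. The stand-in types below mirror the documented DXVA_ExtendedFormat bitfields; see dvdmedia.h and dxva.h for the authoritative definitions, so treat this as illustrative:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-ins for the Windows definitions (dvdmedia.h / dxva.h). */
typedef uint32_t DWORD;

typedef struct {
    unsigned SampleFormat           : 8;  /* also carries the AMCONTROL_* flags */
    unsigned VideoChromaSubsampling : 4;
    unsigned NominalRange           : 3;
    unsigned VideoTransferMatrix    : 3;
    unsigned VideoLighting          : 4;
    unsigned VideoPrimaries         : 5;
    unsigned VideoTransferFunction  : 5;
} DXVA_ExtendedFormat;                    /* 32 bits total */

/* The non-obvious trick in VIDEOINFOHEADER2: the dwControlFlags DWORD
 * is reused as a DXVA_ExtendedFormat when the color-info flag is set. */
static DXVA_ExtendedFormat read_extended_format(DWORD dwControlFlags)
{
    DXVA_ExtendedFormat fmt;
    memcpy(&fmt, &dwControlFlags, sizeof(fmt));
    return fmt;
}
```

Because the reuse is invisible in the type system, a developer reading only the VIDEOINFOHEADER2 declaration has no hint that these fields exist - which is exactly the discoverability problem a named VIDEOPROPS structure fixes.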
pwimmer is offline   Reply With Quote
Old 8th August 2010, 17:58   #18  |  Link
pwimmer
Registered User
 
Join Date: Jan 2005
Posts: 99
Quote:
Originally Posted by madshi View Post
(3) About VIDEOPROPS being optional: The same argument you're using could also be used for the multiview properties - yet all those fields are there and must be filled. There's no "this_is_multiview_content" flag which must be set first to activate the multiview structures. I want the same for VIDEOPROPRS. Of course there should be an "undefined" or "unknown" value for every VIDEOPROPS field. But having to activate the VIDEOPROPRS structure first by setting an additional flag is not good enough for me.
I agree it is not good that there is a separate flag to activate the VIDEOPROPS structure. But that's how Microsoft specified it and since we can reuse the structure, it makes sense to stay compatible.

There is no flag to activate the STEREOLAYOUTPROPS, but it is optional, of course. If all bytes are set to zero (thus, if the developer ignores it), this defaults to STEREOLAYOUT_UNKNOWN (which is deliberately defined as GUID_NULL). All the other fields in the structure are defined in a way that ensures that 0 always means the default. A renderer that ignores STEREOLAYOUTPROPS and a renderer that obeys it will output the same image when all fields of STEREOLAYOUTPROPS are zero.
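A sketch of that zero-defaults convention (the STEREOLAYOUTPROPS layout here is hypothetical; only the all-zero-means-unknown rule matters):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in GUID with the usual 16-byte layout. */
typedef struct { uint32_t Data1; uint16_t Data2, Data3; uint8_t Data4[8]; } GUID;
static const GUID GUID_NULL_ = {0};      /* == STEREOLAYOUT_UNKNOWN */

/* Hypothetical layout: every field is defined so that all-zero means
 * the 2D-compatible default. A filter that memset()s the format block
 * to zero therefore produces valid "unknown layout" properties without
 * even knowing the structure exists. */
typedef struct {
    GUID     stereoLayout;   /* GUID_NULL means STEREOLAYOUT_UNKNOWN */
    uint32_t flags;          /* 0 means defaults */
} STEREOLAYOUTPROPS;

static int is_unknown_layout(const STEREOLAYOUTPROPS *p)
{
    return memcmp(&p->stereoLayout, &GUID_NULL_, sizeof(GUID)) == 0;
}
```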

Quote:
Originally Posted by madshi View Post
(4) I just noticed that one original field I was asking for, namely a "ChromaFormat (4:2:0, 4:2:2, 4:4:4)" is missing. Yeah, I already hear you say: "You can see that from the FOURCC". No, you can't, because an intermediate filter (e.g. ffdshow raw video processor) could already have applied chroma upsampling. Or the decoder could have internally upsampled chroma for e.g. YUY2 output. I want to know what format the original source had, not what the upstream filters have converted the original source data to.
Ok, now I understand what you mean. But I don't believe the fields are well-thought-out yet. What do you do if the filter delivers 4:4:4 or RGB and says the original chroma format was 4:2:0? You can't do anything, because you do not know how the upsampling was performed. You cannot undo it. It is simply impossible to restore the original data. To find out if it is not the native format, the flags I've already introduced are sufficient.

Quote:
Originally Posted by madshi View Post
(6) But if the original video source was only 9bit then the least significant bit of that 10bit data will always be 0.
Actually padding with 0 is the wrong way to convert to higher bit depths. This would make the image slightly darker than it should be.

EDIT: I was wrong. http://msdn.microsoft.com/en-us/libr...px#_420formats specifies that bit shifting should be used!

Quote:
Originally Posted by madshi View Post
(6) About dwPictAspectRatioX/Y. I was just starting to write a long text about how we should set it to 16:9, but then I thought: If there's a video renderer which doesn't really support/know 3D at all, then if we set aspect ratio to 16:9 for "side-by-side (full)" content, the video renderer will draw the image incorrectly. So I have to change my opinion and agree with your original suggestion to use 32:9 for side-by-side (full), for better compatability to non-3D-aware video renderers. Which also means that I vote for removing those half/full flags again.
There are pros and cons for both approaches. The con of dwPictAspectRatioX/Y is that the display size in a 2D player depends on the decoder being used: an old decoder not supporting the spec would output a squeezed image, while a new decoder would output a full-size image. The flags avoid this issue, while still allowing a 3D-enabled renderer/player to properly decode the side-by-side format and display it at the correct aspect ratio.

Imho, there is no clear winner as to which approach is better, so I would keep both and let the developer decide.

Quote:
Originally Posted by madshi View Post
(6) What happens if the decoder wants to fill the VIDEOPROPS fields, but the content is originally RGB? In that case the current transfer matrix and chroma subsampling options don't fit. The correct transfer matrix for RGB is "passthrough". You could also name it "no matrix (RGB)" or something like that.
For RGB content, filters should write VideoPrimaries_Unknown and VideoTransferMatrix_Unknown; when reading, a filter should ignore videoTransferMatrix and videoPrimaries.

This is the way it is already handled in VIH2 and I see no need to change it.

Quote:
Originally Posted by madshi View Post
(7) I fear that the native flags might not work as intended. E.g. if a decoder programmer finds that his decoder does not connect to my renderer if he sets "NativeFormatFalse", then he might misunderstand that as a bug in my renderer and as a "fix" he may decide to always set the flag to "True" or "Unknown". As a result I'd be screwed. I'd prefer to have decoder programmers fill in the correct values in the VIDEOPROPS structure, then I can check myself what they've done.
But if the programmer correctly fills in nativeVideoProps and currentVideoProps, your filter would refuse the connection as well (because it is not the native format), resulting in exactly the same problems.

It's more likely that developers are willing to properly set a single flag than a complex structure.

Last edited by pwimmer; 8th August 2010 at 18:09.
pwimmer is offline   Reply With Quote
Old 8th August 2010, 18:26   #19  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by nevcairiel View Post
Having a VIH2 at the top is rather confusing, because the BMI in the VIH2 can actually grow with extradata. How would you handle this?
How big can it grow? As far as I can see, the max size would be "RGBQUAD bmiColors[256]", right? So the problem could be solved like this:

Code:
typedef struct tagOPENVIDEOINFOHEADER {
  VIDEOINFOHEADER2 videoInfo2;
  RGBQUAD bmiColors[256];
  VIDEOPROPS nativeVideoProps;
  VIDEOPROPS currentVideoProps;
  STEREOLAYOUTPROPS stereoLayoutProps;
  UINT reserved[8];    // Set to zero when writing and ignore when reading
} OPENVIDEOINFOHEADER;
Voilà, perfect compatibility with VIDEOINFOHEADER2. Or am I missing something?
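A quick check of that size bound, using stand-in structs with the same field sizes as the Win32 originals. Note that this only bounds the palette: as nevcairiel pointed out, codec extradata appended after the header is not covered by it.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins matching the Win32 sizes:
 * sizeof(BITMAPINFOHEADER) == 40, sizeof(RGBQUAD) == 4. */
typedef struct {
    uint32_t biSize;
    int32_t  biWidth, biHeight;
    uint16_t biPlanes, biBitCount;
    uint32_t biCompression, biSizeImage;
    int32_t  biXPelsPerMeter, biYPelsPerMeter;
    uint32_t biClrUsed, biClrImportant;
} BITMAPINFOHEADER;

typedef struct { uint8_t rgbBlue, rgbGreen, rgbRed, rgbReserved; } RGBQUAD;

/* Worst case for the *palette* is 256 RGBQUAD entries: 40 + 1024 bytes.
 * Codec extradata appended after the header is not bounded this way. */
enum { MAX_PALETTED_BMI = sizeof(BITMAPINFOHEADER) + 256 * sizeof(RGBQUAD) };
```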
madshi is offline   Reply With Quote
Old 8th August 2010, 18:43   #20  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by pwimmer View Post
I agree it is not good that there is a separate flag to activate the VIDEOPROPS structure. But that's how Microsoft specified it and since we can reuse the structure, it makes sense to stay compatible.
You say yourself that many devs didn't know about DXVA_ExtendedFormat. So I don't see a big loss if we don't use that, but define our own fields instead. Using our own fields would not be incompatible, either, because they'd be outside of VIDEOINFOHEADER2.

Quote:
Originally Posted by pwimmer View Post
Ok, now I understand what you mean. But I don't believe the fields are well-thought-out yet. What do you do if the filter delivers 4:4:4 or RGB and says the original chroma format was 4:2:0? You can't do anything, because you do not know how the upsampling was performed. You cannot undo it. It is simply impossible to restore the original data.
Correct. But I can inform the user, saying that the decoder has upsampled chroma and that he should modify the decoder settings to disable the chroma upsampling, or switch to another decoder.

Quote:
Originally Posted by pwimmer View Post
Actually padding with 0 is the wrong way to convert to higher bit depths.

EDIT: I was wrong. http://msdn.microsoft.com/en-us/libr...px#_420formats specifies that bit shifting should be used!
Doing anything else would introduce banding, so bit shifting is the only reasonable solution. (Unless you want to go the whole way with floating point and dithering, etc.)

Quote:
Originally Posted by pwimmer View Post
To con of dwPictAspectRatioX/Y is that the display size in a 2D player depends on the decoder being used. A old decoder not supporting the spec would output a squeezed image while a new decoder would output a full-size image.
I don't understand what you mean here. Can you explain?

Quote:
Originally Posted by pwimmer View Post
For RGB content, filters should write VideoPrimaries_Unknown and VideoTransferMatrix_Unknown
Wrong. The primaries are very important for RGB content, too!!

Quote:
Originally Posted by pwimmer View Post
But if the programmer correctly fills in nativeVideoProps and currentVideoProps, you filter would refuse connection as well (because it is not the native format), resulting in exactly the same problems.
With nativeVideoProps/currentVideoProps, the programmer would have to actually lie to make my filter connect. With the flags you have suggested he simply has to leave it at "0", which is even the default value (provided that he uses memset(0))! Furthermore, if I have nativeVideoProps and currentVideoProps, I can see exactly what the filter chain has done and so I can give exact tips to the end user about what he has to change to improve image quality.
madshi is offline   Reply With Quote