Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
|
|
#1 | Link |
|
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,398
|
Smooth Streaming and *.ismv fragmented MPEG-4
So, as some of you may have heard, we (Microsoft) launched the beta of Smooth Streaming this week. It's an adaptive streaming technology combining IIS and Silverlight. The core file format is fragmented MPEG-4, where each "chunk" of video is transmitted as a moof fragment starting with a closed GOP, via a single HTTP request. A chunk will typically be 2-4 seconds long. Audio can either be muxed into the same chunk or be provided in a parallel series of chunks to enable multilanguage audio or what have you.
The file format used is straight-up ISO fragmented MPEG-4, using XML and SMIL manifests to indicate which bitrates are in the file set and where the fragments in those files are. We're not trying to make up a new file format here; just take advantage of existing technologies in a novel way.
Lots of other details to be had, including this roundup of links: http://on10.net/blogs/benwagg/Beta-R...oth-Streaming/
More importantly, zambelli had a great post showing how we're using the file format: http://alexzambelli.com/blog/2009/02...-architecture/
And we now have some sample files up that I encoded, including the media files and manifest files: http://on10.net/blogs/benwagg/Big-Bu...-for-download/
(if you're curious how it was encoded) http://on10.net/blogs/benwagg/Behind...ig-Buck-Bunny/
The current samples are VC-1 and WMA 10 Pro, but we'll be supporting H.264 and AAC-LC payloads with the next version of Silverlight later this year. So, if you start seeing these *.ismv files out there, just know that they're
We've become big believers in fragmented MPEG-4 as a file format for all kinds of usage, and have been getting similar feedback across the industry, so I expect we'll be seeing a lot more fMP4 in the future. I haven't been able to find many players that can demux fMP4 yet, but I expect there will be increasing demand for that feature. While the actual file switching heuristics can be highly complicated and highly tuned to the content (they're delivered on the fly as part of the Silverlight application), being able to play back a single-bitstream local file should be quite straightforward. |
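For anyone who wants to poke at the structure of a fragmented MPEG-4 file, here is a minimal Python sketch that walks the top-level boxes of an ISO base media stream and counts the moof fragments. It relies only on the generic box header layout (32-bit big-endian size, four-character type, 64-bit size when the 32-bit size is 1, "to end of file" when it is 0). The function names and the synthetic demo bytes are my own illustrations, not anything taken from a real .ismv file, and I have not run this against the Smooth Streaming samples.

```python
import io
import struct


def iter_top_level_boxes(stream):
    """Yield (box_type, offset, size) for each top-level box in the stream."""
    offset = 0
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # end of stream (or a truncated header)
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type.
            large = stream.read(8)
            if len(large) < 8:
                return
            size = struct.unpack(">Q", large)[0]
            header_len = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the stream.
            current = stream.tell()
            end = stream.seek(0, io.SEEK_END)
            size = end - offset
            stream.seek(current)
        if size < header_len:
            return  # malformed box, stop rather than loop forever
        yield box_type.decode("latin-1"), offset, size
        offset += size
        stream.seek(offset)


def count_fragments(stream):
    """Count top-level 'moof' boxes, i.e. movie fragments."""
    return sum(1 for box_type, _, _ in iter_top_level_boxes(stream)
               if box_type == "moof")


def make_box(box_type, payload=b""):
    """Build a simple box for the demo (hypothetical helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic stand-in for a fragmented file: header boxes plus two fragments.
DEMO_BYTES = (
    make_box(b"ftyp", b"\0" * 8)
    + make_box(b"moov")
    + make_box(b"moof", b"\0" * 4) + make_box(b"mdat", b"\0" * 10)
    + make_box(b"moof", b"\0" * 4) + make_box(b"mdat", b"\0" * 10)
)
```

To try it on a real file, open it in binary mode and pass the file object to `iter_top_level_boxes`; each moof/mdat pair it reports should correspond to one chunk as described above.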
|
|
|
|
|
#3 | Link | |
|
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,398
|
Quote:
ASF was a great format for bit-pumping - the MIPS-per-Mbps ratio was much better than other formats of its era. However, it simply wasn't well set up to allow a byte range to encapsulate a fragment that could potentially be a single closed GOP. Since server hardware is so much more powerful these days, plus proxy caching so dramatically reduces the load on the origin server, going for a somewhat more complex server-side parsing methodology made good sense. The MPEG-4 fragmented format was able to do everything we needed, so it was simplest to just use that rather than make up something new. Plus it's a good excuse to say "moof" in public. Although I'm sure the more wizened members of the board are having Clarus the Dogcow flashbacks... |
|
|
|
|
|
|
#4 | Link |
|
Registered User
Join Date: Aug 2004
Posts: 133
|
It would surely be great for the adoption in splitters if you could register your WMA extension to the MP4-container at MP4RA.
Furthermore: apparently the VC-1 track in the .ismv files is not compliant with SMPTE RP 2025-2007, which I thought described the one and only official way to put VC-1 into MP4 (even though so far nobody(?) supports it). Can you comment? |
|
|
|
|
|
#5 | Link | ||
|
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,398
|
Quote:
Quote:
|
||
|
|
|
|
|
#8 | Link |
|
Registered User
Join Date: Feb 2003
Location: Palmcoast of Norway
Posts: 362
|
I like this approach, but one question comes to mind:
Is the "chunking" done per GOP? Or can one have multiple GOPs per chunk? Or does one have really long GOPs stretching over, let's say, 5 seconds?
Another thing: will the Smooth Streaming plugin for IIS be open-sourced at some point? Like for Apache?
How does this work on normal proxy servers like Squid and such? What about edge proxies running in reverse proxy mode?
Sorry for all the questions
Best regards
TEB |
|
|
|
|
|
#9 | Link | |||
|
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,398
|
Quote:
The key thing is that Silverlight is using the MediaStreamSource API to pass off a single bitstream to the decoders. So whenever there's a change in the bitrate or frame size or whatever, it needs a new sequence header for that change. So, really the requirement is that each chunk start with a sequence header.
Quote:
Only the origin server needs to run IIS.
Quote:
We're already looking at mediator's suggestions. |
|||
|
|
|
|
|
#10 | Link |
|
Registered User
Join Date: Feb 2003
Location: Palmcoast of Norway
Posts: 362
|
Thx for a prompt answer.
So if I've gotten this correct, the concept of Smooth Streaming consists of:
1. Smooth Streaming plugin for IIS (now in public beta) to make up the "origin" server
2. Smooth Streaming on the client side, implemented in Silverlight 2.x (released); 3.x coming in a few months with H.264 support
3. Encoding and chunking of streams via Expression Encoder, released in EE 2.x
Or have I misunderstood?
Will Microsoft make some kind of application to chunk already encoded H.264 files, e.g. from x264/mp4box?
br
TEB |
|
|
|
|
|
#11 | Link | ||
|
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,398
|
Quote:
Quote:
Of course, if seamless stream switching isn't required (either it's just a single bitrate, or a little buffering at a switch isn't a problem), existing content could be repurposed just fine. The proxy caching scalability gains are orthogonal to the seamless stream switching in the decoder. There's nothing required from Microsoft to be able to mux those files or generate the manifest, but we are looking at various ways to help the ecosystem author Smooth Streaming compatible content. |
||
|
|
|
|
|
#12 | Link | ||
|
Registered User
Join Date: Feb 2002
Posts: 40
|
Quote:
Quote:
Also the files seem to be crashing Atomic Parsley. Can anyone recommend any patches for it or other tools to analyze these files? |
||
|
|
|
|
|
#13 | Link | |||
|
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,398
|
Quote:
Quote:
Is there a particular scenario you have in mind for that?
Quote:
|
|||
|
|
|
|
|
#15 | Link | |
|
Swallowed in the Sea
Join Date: Oct 2002
Location: Aix-en-Provence, France
Posts: 5,191
|
Quote:
MediaInfo is able to retrieve some info from those files. MP4 Browser from Miravid (Google is your friend) is also able to show the box tree (just rename those files to .mp4). Ex:
|
|
|
|
|
|
|
#16 | Link | |
|
Life's clearer in 4K UHD
Join Date: Jun 2003
Location: Notts, UK
Posts: 12,120
|
Quote:
|
|
|
|
|
|
|
#19 | Link |
|
Registered User
Join Date: Oct 2007
Posts: 2
|
Problems conforming to ISO Media
It seems the stsd id in the 'trex' atom is wrong according to the spec:
aligned(8) class TrackExtendsBox extends FullBox(‘trex’, 0, 0){
unsigned int(32) track_ID;
unsigned int(32) default_sample_description_index;
'stsc': "sample_description_index is an integer that gives the index of the sample entry that describes the samples in this chunk. The index ranges from 1 to the number of sample entries in the Sample Description Box"
In the samples, the stsd id is '0' while it should be '1'.
It also seems that "first_sample_flags" are weird for the 'trun' boxes of the video track: the value seems to be '0x40', which does not refer to the keyframe flag, if that was the intent.
Extradata for the VC-1 stream is contained at the end of the 'stsd' and seems compatible with the format used in .wmv. However, this format is not compliant with SMPTE RP 2025 IIRC (I might be wrong though), and it is _not_ contained in any box, while it should be for the sake of uniformity of ISO Media.
I did not check the validity of the 'ctts' values yet, I will try to soon.
Hope this helps. |
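The two checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three trailing 'trex' fields (default sample duration, size and flags) and the sample flags bit layout are written from my reading of ISO/IEC 14496-12 rather than from anything quoted in this thread, the function names are hypothetical, and the '0x40' value is treated as the full 32-bit flags word, which may not be how it was read off the file.

```python
import struct


def parse_trex_payload(payload):
    """Parse the bytes that follow the 8-byte header of a 'trex' box.

    Assumed layout: FullBox version/flags word, then five 32-bit fields.
    """
    (_version_flags, track_id, description_index,
     duration, size, flags) = struct.unpack(">6I", payload[:24])
    return {
        "track_ID": track_id,
        "default_sample_description_index": description_index,
        "default_sample_duration": duration,
        "default_sample_size": size,
        "default_sample_flags": flags,
    }


def check_trex(trex):
    """Return a list of problems found in a parsed 'trex' box."""
    problems = []
    # Sample description indexes are 1-based, so 0 cannot be valid.
    if trex["default_sample_description_index"] < 1:
        problems.append("default_sample_description_index must be >= 1")
    return problems


def decode_sample_flags(flags):
    """Split a 32-bit sample flags word into the fields of interest.

    Assumed bit positions: sample_depends_on in bits 24-25,
    sample_is_non_sync_sample in bit 16, degradation priority in bits 0-15.
    """
    return {
        "sample_depends_on": (flags >> 24) & 0x3,
        "sample_is_non_sync_sample": bool((flags >> 16) & 0x1),
        "sample_degradation_priority": flags & 0xFFFF,
    }


# Synthetic payload mimicking the reported problem (description index 0).
SUSPECT_TREX = parse_trex_payload(struct.pack(">6I", 0, 1, 0, 0, 0, 0))
SUSPECT_FLAGS = decode_sample_flags(0x40)
```

Under these assumptions, a flags word of 0x40 only sets bits inside the degradation priority field and leaves the dependency and sync bits at zero, which would fit the observation that it does not look like a deliberate keyframe marker.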
|
|
|
|
|
#20 | Link |
|
Registered User
Join Date: Oct 2005
Posts: 2
|
The smooth streaming files are not compliant
Indeed, it seems that the MP4 files created by the Expression Encoder 2 (SP1) are not compliant.
First, the 'avc1' major brand used is obviously not correct. But the biggest problem that I'm seeing is that the SampleEntry (that makes up the sample description in the 'stsd' box) is bogus. The spec (ISO 14496-12, section 11.3) says that when making a derived format (which this is), one must extend the relevant SampleEntry data structure by adding one or more boxes to it (not just data fields). So that means that for audio tracks, the SampleEntry in the 'stsd' container must be an AudioSampleEntry data structure followed by one or more boxes. For video tracks, a VisualSampleEntry followed by one or more boxes. For instance, the AVC spec extends the VisualSampleEntry in this way by adding an 'avcC' box. But the format used by Microsoft violates this rule. If we look at the audio sample entry for example, we'll see that it does indeed start with the same fields as AudioSampleEntry (where the format is 'owma'), but instead of being followed by a box, it is followed by 'raw' fields. Here are the relevant bytes:
Code:
00001c0: 0000 0000 0000 0000 0050 7374 7364 0000 .........Pstsd..
00001d0: 0000 0000 0001 0000 0040 6f77 6d61 0000 .........@owma..
00001e0: 0000 0000 0001 0000 0000 0000 0000 0002 ................
00001f0: 0010 0000 0000 ac44 0000 6101 0200 44ac .......D..a...D.
Why is this a problem? First, it breaks the spec. That's pretty sad. It will crash some code and generally make things painful for the ones who want to be strict.
But more importantly, it makes this entry incompatible with parts of the specification that assume that the SampleEntry is extended in the normal way (with boxes). Specifically, when it comes to encryption: when encrypting MP4 data, the sample description is replaced with a 'wrapper', with the type 'enca' or 'encv' (14496-12, section 8.12), and an 'sinf' box is added at the end, containing an 'frma' box to indicate the original format, and other boxes with the crypto details. For this to work, the parser MUST be able to read the 'sinf' box. The parser can't know the original format until it has read the 'sinf' box and found the 'frma' box inside it. So the parser must assume that the data is laid out in the standard way, which means the AudioSampleEntry or VisualSampleEntry, followed by boxes. If there's anything else after the AudioSampleEntry or VisualSampleEntry that's not a box, that parser will be completely confused.
Is it too late to fix this? |
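To make the point about the hex dump concrete, here is a small Python sketch that reads the AudioSampleEntry fields out of those 64 bytes and then tests whether the bytes that follow could be the header of a child box. The 36-byte length of AudioSampleEntry (8-byte box header plus 28 bytes of fixed fields) is my own count from the ISO base media layout, and the function name and dictionary keys are made up for illustration. The dump is truncated, so only the first few trailing bytes are available.

```python
import struct

# Bytes transcribed from the hex dump above (file offsets 0x1c0 to 0x1ff).
DUMP = bytes.fromhex(
    "0000 0000 0000 0000 0050 7374 7364 0000"
    "0000 0000 0001 0000 0040 6f77 6d61 0000"
    "0000 0000 0001 0000 0000 0000 0000 0002"
    "0010 0000 0000 ac44 0000 6101 0200 44ac"
)

# Assumed fixed length of AudioSampleEntry, box header included.
AUDIO_SAMPLE_ENTRY_SIZE = 36


def inspect_audio_entry(buf):
    """Parse the first audio sample entry in 'stsd' and test its tail."""
    stsd = buf.find(b"stsd") - 4        # a box starts at its size field
    entry = stsd + 16                   # skip size, type, version/flags, entry_count
    entry_size, fmt = struct.unpack_from(">I4s", buf, entry)
    channels, sample_bits = struct.unpack_from(">HH", buf, entry + 24)
    # The sample rate is stored as 16.16 fixed point; keep the integer part.
    sample_rate = struct.unpack_from(">I", buf, entry + 32)[0] >> 16

    tail = entry + AUDIO_SAMPLE_ENTRY_SIZE
    room = entry_size - AUDIO_SAMPLE_ENTRY_SIZE
    # If a child box followed, its first four bytes would be a size that
    # is at least 8 and no larger than the room left inside the entry.
    candidate_size = struct.unpack_from(">I", buf, tail)[0]
    return {
        "entry_size": entry_size,
        "format": fmt,
        "channels": channels,
        "sample_bits": sample_bits,
        "sample_rate": sample_rate,
        "candidate_child_size": candidate_size,
        "tail_is_box": 8 <= candidate_size <= room,
        # The same four bytes read as two little-endian 16-bit words.
        "tail_le_words": struct.unpack_from("<HH", buf, tail),
    }


INFO = inspect_audio_entry(DUMP)
```

Read as a box header, the trailing bytes would claim a size far larger than the 28 bytes left in the entry, so a strict parser cannot treat them as a box. Read as little-endian words they give 0x0161 and 2, which looks to me like the start of a WAVEFORMATEX-style structure, though that interpretation is a guess and not something stated above.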
|
|
|
| Tags |
| fragmented mpeg-4, smooth streaming |
|
|