
Smooth Streaming and *.ismv fragmented MPEG-4


benwaggoner
27th February 2009, 05:44
So, as some of you may have heard, we (Microsoft) have launched the beta of Smooth Streaming this week. It's an adaptive streaming technology combining IIS and Silverlight. The core file format is fragmented MPEG-4, where each "chunk" of video is transmitted as a moof fragment starting with a Closed GOP, via a single HTTP request. A chunk will typically be 2-4 seconds long. Audio can be either muxed into the same chunk, or be provided in a parallel series of chunks to enable multilanguage audio or what have you.

The file format used is straight-up ISO fragmented MPEG-4, using XML and SMIL manifests to indicate which bitrates are in the file set and where the fragments in those files are. We're not trying to make up a new file format here; just take advantage of existing technologies in a novel way.
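To make the "moof fragment" layout concrete, here is a minimal sketch of walking the top-level boxes of an ISO base media file. It is not taken from any Microsoft tool: the helper names (`iter_boxes`, `make_box`) and the synthetic sample bytes are assumptions for illustration, and the demo data only imitates the general shape of a fragmented file (header boxes followed by moof/mdat pairs).

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each box found in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing container
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated box: stop instead of guessing
        yield box_type.decode("latin-1"), pos, size
        pos += size


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size field (demo helper only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic stand-in for a fragmented file: ftyp and moov, then two fragments.
sample = (make_box(b"ftyp", b"isom" + b"\x00" * 4) + make_box(b"moov")
          + make_box(b"moof") + make_box(b"mdat", b"\x00" * 16)
          + make_box(b"moof") + make_box(b"mdat", b"\x00" * 16))
layout = [name for name, _, _ in iter_boxes(sample)]
print(layout)  # ['ftyp', 'moov', 'moof', 'mdat', 'moof', 'mdat']
```

Reading a real file into `data` and running the same loop should list one moof/mdat pair per fragment, which is a quick way to see how a file was chunked.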

Lots of other details to be had, including this roundup of links:
http://on10.net/blogs/benwagg/Beta-Release-of-Smooth-Streaming/

More importantly, zambelli had a great post showing how we're using the file format:
http://alexzambelli.com/blog/2009/02/10/smooth-streaming-architecture/

And we now have some sample files up that I encoded, including the media files and manifest files:
http://on10.net/blogs/benwagg/Big-Buck-Bunny-Smooth-Streaming-sample-for-download/

(if you're curious how it was encoded)
http://on10.net/blogs/benwagg/Behind-the-Scenes-at-SmoothHDcom-Encoding-Big-Buck-Bunny/

The current samples are VC-1 and WMA 10 Pro, but we'll be supporting H.264 and AAC-LC payloads with the next version of Silverlight later this year.

So, if you start seeing these *.ismv files out there, just know that they're
Fragmented MPEG-4
With VC-1 or H.264 for video
WMA 10 Pro or AAC-LC for audio


We've become big believers in fragmented MPEG-4 as a file format for all kinds of usage, and have been getting similar feedback across the industry, so I expect we'll be seeing a lot more fMP4 being used in the future.

I haven't been able to find many players that can demux fMP4 yet, but I expect there will be increasing demand for that feature. While the actual file switching heuristics can be highly complicated and can be highly tuned to the content (they're delivered on the fly as part of the Silverlight application), being able to play back a single bitstream local file should be quite straightforward.

Kurtnoise
27th February 2009, 06:21
Interesting that Microsoft is pushing this...:)



Just a silly question: WMAPro is covered by ISO standards ?

benwaggoner
27th February 2009, 07:27
Interesting that Microsoft is pushing this...:)

Just a silly question: WMAPro is covered by ISO standards ?
The file format is covered by the ISO MPEG-4 standard, but WMA Pro is not. H.264 and AAC-LC would be part of the MPEG-4 standard. VC-1 is a SMPTE standard.

ASF was a great format for bit-pumping - the MIPS-per-Mbps ratio was much better than other formats of its era. However, it simply wasn't well set up to allow a byterange to encapsulate a fragment that could potentially be a single closed GOP. Since server hardware is so much more powerful these days, plus proxy caching so dramatically reduces the load on the origin server, going for a somewhat more complex server-side parsing methodology made good sense.

The MPEG-4 fragmented format was able to do everything we needed, so it was simplest to just use that rather than make up something new. Plus it's a good excuse to say "moof" in public :).

Although I'm sure the more wizened members of the board are having Clarus the Dogcow flashbacks...

mediator
27th February 2009, 11:08
It would surely be great for the adoption in splitters if you could register your WMA extension to the MP4-container at MP4RA.

Furthermore: apparently the VC-1 tracks in the .ismv files are not compliant with SMPTE RP 2025-2007, which I thought would describe the one and only (although so far not supported by anybody(?)) official way to put VC-1 into MP4. Can you comment?

benwaggoner
27th February 2009, 15:31
It would surely be great for the adoption in splitters if you could register your WMA extension to the MP4-container at MP4RA.
Good point. I'll see what info is available.

Furthermore: apparently the VC-1 tracks in the .ismv files are not compliant with SMPTE RP 2025-2007, which I thought would describe the one and only (although so far not supported by anybody(?)) official way to put VC-1 into MP4. Can you comment?
What's missing in that document for VC-1 in fMP4? I wasn't directly involved in the muxer implementation myself, but my understanding is that it was done following that spec.

mediator
27th February 2009, 16:16
the fourcc in the sample-description has to be "vc-1", also I don't see the "dvc1"-Box. If you closely compare file and spec, you should see what I mean.

benwaggoner
27th February 2009, 22:21
the fourcc in the sample-description has to be "vc-1", also I don't see the "dvc1"-Box. If you closely compare file and spec, you should see what I mean.
I've got the team looking into this now. Thanks for the heads up.

TEB
4th March 2009, 13:51
I like this approach, but one question comes to mind:
Is the "chunking" done per GOP? Or can one have multiple GOPs per chunk? Or does one have really long GOPs stretching over, let's say, 5 secs?

Another thing:
Will the Smooth Streaming plugin for IIS be open-sourced at some point? Like for Apache? How does this work on normal proxy servers like squid and such? What about edge proxies running in reverse proxy mode?

Sorry for all the questions ;)
Best regards
TEB

benwaggoner
4th March 2009, 17:09
I like this approach, but one question comes to mind:
Is the "chunking" done per GOP? Or can one have multiple GOPs per chunk? Or does one have really long GOPs stretching over, let's say, 5 secs?
Yeah, the key thing to get seamless stream switching is for each chunk to start and end with a Closed GOP. But there's no reason that you couldn't have multiple GOPs per chunk, or an Open GOP in the middle.

The key thing is that Silverlight is using the MediaStreamSource API to pass off a single bitstream to the decoders. So whenever there's a change in the bitrate or frame size or whatever, it needs a new sequence header for that change. So, really the requirement is that each chunk start with a sequence header.
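As a rough illustration of the chunking rule described above, the sketch below groups GOPs into chunks so that every chunk starts and ends on a closed GOP while still allowing several GOPs, including open ones, in the middle. The `Gop` type, the `split_into_chunks` name, and the 2-second minimum are assumptions made for the example, not part of any encoder's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gop:
    duration: float  # seconds
    closed: bool     # True if no frame references a previous GOP


def split_into_chunks(gops: List[Gop], min_duration: float = 2.0) -> List[List[Gop]]:
    """Cut only between two closed GOPs, once the chunk is long enough."""
    if gops and not gops[0].closed:
        raise ValueError("stream must begin with a closed GOP")
    chunks: List[List[Gop]] = []
    current: List[Gop] = []
    elapsed = 0.0
    for gop in gops:
        can_cut = (bool(current) and gop.closed and current[-1].closed
                   and elapsed >= min_duration)
        if can_cut:
            chunks.append(current)
            current, elapsed = [], 0.0
        current.append(gop)
        elapsed += gop.duration
    if current:
        chunks.append(current)  # the tail may be shorter than min_duration
    return chunks


# Closed, open, closed, closed, open, closed: one-second GOPs.
stream = [Gop(1.0, c) for c in (True, False, True, True, False, True)]
chunks = split_into_chunks(stream)
print([len(c) for c in chunks])  # [3, 3]
```

Each resulting chunk here is closed/open/closed, which matches the "open GOP in the middle" case.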

Another thing:
Will the Smooth Streaming plugin for IIS be open-sourced at some point? Like for Apache? How does this work on normal proxy servers like squid and such? What about edge proxies running in reverse proxy mode?
Since each chunk has a unique URL, every client requests a given chunk using that same URL, and each chunk is small enough to fit in a proxy cache, yes, they'll get cached automatically by the existing proxy cache infrastructure, including squid. As zambelli said in one of his blog posts, "Smooth Streaming adapts streaming to the web instead of trying to adapt the web to streaming." That includes proxy caches inside a CDN, at corporate firewalls, at ISPs, et cetera. This was a core design goal of the technology, so that availability of content scales with popularity and we don't get stuck with "waiting rooms" or other kinds of audience/bitrate caps.
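A toy model of why unique, client-independent chunk URLs cache well. Everything here is illustrative: the URL template in `chunk_url`, the host name, and the `CachingProxy` class are assumptions for the sketch and are not meant to describe how IIS or squid are actually implemented.

```python
from typing import Callable, Dict


def chunk_url(base: str, bitrate: int, start_time: int) -> str:
    """Hypothetical URL template: the same chunk always maps to the same URL."""
    return f"{base}/QualityLevels({bitrate})/Fragments(video={start_time})"


class CachingProxy:
    """Minimal stand-in for an HTTP proxy cache keyed on the request URL."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_hits = 0

    def get(self, url: str) -> bytes:
        if url not in self._cache:
            self.origin_hits += 1  # only a cache miss reaches the origin
            self._cache[url] = self._fetch(url)
        return self._cache[url]


def fake_origin(url: str) -> bytes:
    return url.encode("ascii")  # placeholder for the chunk payload


proxy = CachingProxy(fake_origin)
base = "http://example.com/bunny.ism"
# Two different clients asking for the same chunk produce the same URL.
proxy.get(chunk_url(base, 1500000, 20000000))
proxy.get(chunk_url(base, 1500000, 20000000))
# A different bitrate for the same time range is a different cache entry.
proxy.get(chunk_url(base, 800000, 20000000))
print(proxy.origin_hits)  # 2
```

The more viewers request the same chunk through the same cache, the smaller the share of requests that ever reach the origin.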

Only the origin server needs to run IIS.

Sorry for all the questions ;)
Best regards
TEB
Please, keep them coming! This is a beta, so we're still working on the technology itself and how we explain it.

We're already looking at mediator's suggestions.

TEB
4th March 2009, 20:58
Thx for the prompt answer ;)

So if I've gotten this correct, the concept of Smooth Streaming consists of:

1. Smoothstreaming plugin for IIS (now in public beta) to make up the "origin" server
2. Smoothstreaming on the clientside implemented in Silverlight 2.x, released, 3.x coming in a few months with h.264 support
3. Encoding and chunking of streams via Expression Encoder, released in EE 2.x

Or have I misunderstood? ;)

Will Microsoft make some kind of application to chunk already encoded h.264 files eg x264/mp4box?

br TEB

benwaggoner
4th March 2009, 21:18
Thx for the prompt answer ;)

So if I've gotten this correct, the concept of Smooth Streaming consists of:

1. Smoothstreaming plugin for IIS (now in public beta) to make up the "origin" server
2. Smoothstreaming on the clientside implemented in Silverlight 2.x, released, 3.x coming in a few months with h.264 support
3. Encoding and chunking of streams via Expression Encoder, released in EE 2.x
That's a good description of the current set of products, yes. For #3, since we're using standard file formats we expect many other tools will come out that support authoring the *.ismv files at least.

Will Microsoft make some kind of application to chunk already encoded h.264 files eg x264/mp4box?
The trick with existing content is that each bitrate would need to start a new IDR at the same frame across bitrates, which little existing content does. But I imagine it's a feasible tweak for x264 to coordinate across the multiple encodes to ensure sequence header and I-frame alignment.
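The alignment requirement can be checked mechanically. The sketch below, under the assumption that you already have the IDR frame numbers for each encode (how you extract them is out of scope here), computes the frames where every bitrate has an IDR and tests whether a proposed set of chunk boundaries is usable for seamless switching. The function names and the sample numbers are made up for illustration.

```python
from typing import Dict, Iterable, List


def common_switch_points(idr_frames: Dict[int, List[int]]) -> List[int]:
    """Frame numbers at which every bitrate starts a new IDR."""
    frame_sets = [set(frames) for frames in idr_frames.values()]
    if not frame_sets:
        return []
    return sorted(set.intersection(*frame_sets))


def boundaries_are_switchable(idr_frames: Dict[int, List[int]],
                              boundaries: Iterable[int]) -> bool:
    """True if every chunk boundary falls on an IDR in all bitrates."""
    usable = set(common_switch_points(idr_frames))
    return all(b in usable for b in boundaries)


# Encodes coordinated to place IDRs on a shared grid (extra IDRs are fine).
coordinated = {
    3000000: [0, 48, 96, 144],
    1500000: [0, 48, 96, 144],
    500000: [0, 24, 48, 72, 96, 120, 144],
}
# Encodes made independently, e.g. with scene-change keyframe placement.
independent = {
    3000000: [0, 48, 96, 144],
    1500000: [0, 50, 96, 150],
}

print(common_switch_points(coordinated))   # [0, 48, 96, 144]
print(common_switch_points(independent))   # [0, 96]
print(boundaries_are_switchable(independent, [0, 48, 96, 144]))  # False
```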

Of course, if seamless stream switching isn't required (either it's just a single bitrate, or a little buffering at a switch isn't a problem) existing content could be repurposed just fine. The proxy caching scalability gains are orthogonal to the seamless stream switching in the decoder.

There's nothing required from Microsoft to be able to mux those files or generate the manifest, but we are looking at various ways to help the ecosystem author Smooth Streaming compatible content.

Shapierian
10th March 2009, 22:00
The current samples are VC-1 and WMA 10 Pro, but we'll be supporting H.264 and AAC-LC payloads with the next version of Silverlight later this year.


Will H.264/AAC samples be available before the next version of Silverlight is released into the wild?


So, if you start seeing these *.ismv files out there, just know that they're
Fragmented MPEG-4
With VC-1 or H.264 for video
WMA 10 Pro or AAC-LC for audio



Will VC-1 with AAC-LC or H.264 with WMA 10 Pro be valid configurations?


Also the files seem to be crashing Atomic Parsley. Can anyone recommend any patches for it or other tools to analyze these files?

benwaggoner
11th March 2009, 00:14
Will H.264/AAC samples be available before the next version of Silverlight is released into the wild?
Yes, I hope so. That said, it's easy enough to do that we haven't actually made any yet. I'll plan on creating a parallel set of files using H.264 and AAC when I can.

Will VC-1 with AAC-LC or H.264 with WMA 10 Pro be valid configurations?
We're not doing anything to keep it from working, but it's not something on our test plan either.

Is there a particular scenario you have in mind for that?

Also the files seem to be crashing Atomic Parsley. Can anyone recommend any patches for it or other tools to analyze these files?
MP4box has had some luck. But those moof fragments simply aren't used that broadly, yet. I hope to see that change; fMP4 is really a great format for all kinds of things.

mediator
11th March 2009, 09:59
Is there some news about compliance with SMPTE RP 2025-2007?

Kurtnoise
11th March 2009, 13:28
Also the files seem to be crashing Atomic Parsley. Can anyone recommend any patches for it or other tools to analyze these files?
what do you mean by "analyze" ?


MediaInfo is able to retrieve some info from those files.

MP4 Browser from Miravid (google is your friend ;)) is also able to get the tree (Just rename those files to mp4).

Ex:

http://uppix.net/8/2/9/1ad043666b04aae8fde17efb30255.png (http://uppix.net/8/2/9/1ad043666b04aae8fde17efb30255.html)

SeeMoreDigital
11th March 2009, 23:03
MP4 Browser from Miravid (google is your friend ;)) is also able to get the tree (Just rename those files to mp4).
And MP4Muxer v0.9.3 offers in-depth stream analysis too ;)

Kurtnoise
14th March 2009, 11:31
Just to mention that the ftyp major_brand name is also wrong (or misused) for the provided samples. These are not AVC streams...:)

benwaggoner
15th March 2009, 21:34
Just to mention that the ftyp major_brand name is also wrong (or misused) for the provided samples. These are not AVC streams...:)
Thanks for catching that. We'll look at rationalizing major_brand in a future update.

bcoudurier
23rd March 2009, 17:09
It seems the stsd id in 'trex' atom is wrong according to specs:

aligned(8) class TrackExtendsBox extends FullBox(‘trex’, 0, 0){
unsigned int(32) track_ID;
unsigned int(32) default_sample_description_index;

'stsc':
"sample_description_index is an integer that gives the index of the sample entry that describes the
samples in this chunk. The index ranges from 1 to the number of sample entries in the Sample
Description Box"

In the samples, stsd id is '0' while it should be '1'.
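A small sketch of the check being described: parse a 'trex' box and verify that default_sample_description_index is in the 1-based range the spec text quoted above requires. The three fields after the two quoted ones (default_sample_duration, default_sample_size, default_sample_flags) come from ISO/IEC 14496-12 rather than from this post, and the helper names and test bytes are assumptions for the example.

```python
import struct
from typing import NamedTuple


class Trex(NamedTuple):
    track_id: int
    default_sample_description_index: int
    default_sample_duration: int
    default_sample_size: int
    default_sample_flags: int


def parse_trex(box: bytes) -> Trex:
    """Parse a complete 32-byte 'trex' box (header, version/flags, 5 fields)."""
    if len(box) < 32:
        raise ValueError("truncated 'trex' box")
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"trex" or size != 32:
        raise ValueError("not a 'trex' box")
    # Offset 12 skips the 8-byte box header and the 4-byte version/flags word.
    return Trex(*struct.unpack_from(">5I", box, 12))


def index_is_valid(trex: Trex, sample_entry_count: int) -> bool:
    """Sample description indices run from 1 to the number of entries."""
    return 1 <= trex.default_sample_description_index <= sample_entry_count


def make_trex(track_id: int, description_index: int) -> bytes:
    """Demo helper: build a 'trex' box with zeroed defaults."""
    return struct.pack(">I4sI5I", 32, b"trex", 0,
                       track_id, description_index, 0, 0, 0)


as_reported = parse_trex(make_trex(1, 0))   # index 0, as seen in the samples
as_expected = parse_trex(make_trex(1, 1))   # index 1, as the spec requires
print(index_is_valid(as_reported, 1), index_is_valid(as_expected, 1))  # False True
```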

It seems also that "first_sample_flags" are weird for the 'trun' boxes for the video track, value seems to be '0x40' which does not refer to keyframe flag, if it was the intent.
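For reference, a sketch of how the 32-bit sample flags word used by 'trex', 'tfhd' and 'trun' breaks down, following the field layout in ISO/IEC 14496-12 (the non-sync bit is called sample_is_difference_sample in older editions). Treating the reported 0x40 as the whole 32-bit value is an assumption; the post does not say how the value was displayed.

```python
from typing import NamedTuple


class SampleFlags(NamedTuple):
    sample_depends_on: int       # 1 = depends on others, 2 = does not (I-picture)
    sample_is_depended_on: int
    sample_has_redundancy: int
    sample_padding_value: int
    is_non_sync: bool            # set for samples that are not sync samples
    degradation_priority: int


def decode_sample_flags(flags: int) -> SampleFlags:
    """Split a 32-bit sample flags word into its fields."""
    return SampleFlags(
        sample_depends_on=(flags >> 24) & 0x3,
        sample_is_depended_on=(flags >> 22) & 0x3,
        sample_has_redundancy=(flags >> 20) & 0x3,
        sample_padding_value=(flags >> 17) & 0x7,
        is_non_sync=bool((flags >> 16) & 0x1),
        degradation_priority=flags & 0xFFFF,
    )


explicit_keyframe = decode_sample_flags(0x02000000)
explicit_non_key = decode_sample_flags(0x01010000)
reported = decode_sample_flags(0x00000040)
print(reported)
```

Under that reading, 0x40 sets only the degradation priority: it neither marks the sample as non-sync nor states that it is independent of other samples, whereas 0x02000000 would say so explicitly.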

Extradata for the VC-1 stream is contained at the end of the 'stsd', and seems compatible with the format used in .wmv, however this format is not compliant with SMPTE RP 2025 IIRC (I might be wrong though), and is _not_ contained in any box, while it should be for the sake of uniformity of ISO Media.

I did not check the validity of the 'ctts' values yet, I will try to soon.

Hope this helps.

bok
18th May 2009, 06:18
Indeed, it seems that the MP4 files created by the Expression Encoder 2 (SP1) are not compliant.
First, the 'avc1' major brand used is obviously not correct.
But the biggest problem that I'm seeing is that the SampleEntry (that makes up the sample description in the 'stsd' box) is bogus. The spec (ISO 14496-12, section 11.3) says that when making a derived format (which this is), one must extend the relevant SampleEntry data structure by adding one or more boxes to it (not just data fields). So that means that for audio tracks, the SampleEntry in the 'stsd' container must be an AudioSampleEntry data structure followed by one or more boxes. For video tracks, a VisualSampleEntry followed by one or more boxes. For instance, the AVC spec extends the VisualSampleEntry in this way by adding an 'avcC' box.
But the format used by Microsoft violates this rule. If we look at the audio sample entry for example, we'll see that it does indeed start with the same fields as AudioSampleEntry (where the format is 'owma'), but instead of being followed by a box, it is followed by 'raw' fields.
Here are the relevant bytes:

00001c0: 0000 0000 0000 0000 0050 7374 7364 0000 .........Pstsd..
00001d0: 0000 0000 0001 0000 0040 6f77 6d61 0000 .........@owma..
00001e0: 0000 0000 0001 0000 0000 0000 0000 0002 ................
00001f0: 0010 0000 0000 ac44 0000 6101 0200 44ac .......D..a...D.

The AudioSampleEntry data are the first 28 bytes after 'owma', ending with 'ac440000' (which is the 44.1kHz sampling rate). What follows (61020200...) is obviously NOT a box.
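The reading of the dump above can be reproduced with a few lines of parsing. This sketch uses only the bytes shown in the hex dump, starting at the sample entry's size field (file offset 0x1d6); the variable names and the `looks_like_box` helper are assumptions for the example. The box test is deliberately minimal: a child box's size must be at least 8 and must fit in what is left of the entry.

```python
import struct

# Bytes copied from the hex dump, from offset 0x1d6 to the end of the dump.
DUMP = bytes.fromhex(
    "00000040 6f776d61 0000"
    "00000000 0001 00000000 00000000 0002"
    "0010 0000 0000 ac440000 6101 0200 44ac"
)

entry_size, entry_format = struct.unpack_from(">I4s", DUMP, 0)

# AudioSampleEntry: 6 reserved bytes, data_reference_index, 8 reserved bytes,
# channelcount, samplesize, pre_defined, reserved, samplerate (16.16 fixed).
(data_reference_index, channel_count, sample_size,
 _pre_defined, _reserved, sample_rate_fixed) = struct.unpack_from(
    ">6xH8xHHHHI", DUMP, 8)
sample_rate = sample_rate_fixed >> 16

TRAILING_OFFSET = 8 + 28                 # box header + AudioSampleEntry fields
remaining = entry_size - TRAILING_OFFSET  # bytes left for child boxes


def looks_like_box(data: bytes, offset: int, available: int) -> bool:
    """A child box must declare a size of at least 8 that fits in the parent."""
    declared = struct.unpack_from(">I", data, offset)[0]
    return 8 <= declared <= available


print(entry_format, entry_size, channel_count, sample_size, sample_rate)
print(looks_like_box(DUMP, TRAILING_OFFSET, remaining))  # False
```

The four bytes after the sample rate decode to a "size" of 0x61010200, far larger than the 28 bytes left in the 64-byte entry, so they cannot be the start of a box.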

Why is this a problem?
First, it breaks the spec. That's pretty sad. It will crash some code and generally make things painful for the ones who want to be strict.
But more importantly, it makes this entry incompatible with parts of the specification that assume that SampleEntry structures are extended in the normal way (with boxes). Specifically, when it comes to encryption: when encrypting MP4 data, the sample description is replaced with a 'wrapper', with the type 'enca' or 'encv' (14496-12, section 8.12), and an 'sinf' box is added at the end, containing an 'frma' box to indicate the original format, and other boxes with the crypto details. For this to work, the parser MUST be able to read the 'sinf' box. The parser can't know the original format until it has read the 'sinf' box and found the 'frma' box inside it. So the parser must assume that the data is laid out in the standard way, which means the AudioSampleEntry or VisualSampleEntry, followed by boxes. If there's anything else after the AudioSampleEntry or VisualSampleEntry that's not a box, that parser will be completely confused.
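The failure mode can be shown with a short sketch of what a generic parser does with an 'enca' entry: skip the fixed AudioSampleEntry fields, walk the child boxes, and read 'frma' inside 'sinf'. The helper names, the hypothetical 'cfg0' codec configuration box, and the synthetic entries are all assumptions for illustration; only the raw bytes 61 01 02 00 44 ac are taken from the dump above.

```python
import struct
from typing import Optional, Tuple

AUDIO_ENTRY_FIXED = 8 + 28  # box header + AudioSampleEntry fields


def find_child(data: bytes, start: int, end: int,
               wanted: bytes) -> Optional[Tuple[int, int]]:
    """Return the payload range of the first child box of type `wanted`."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        if size < 8 or pos + size > end:
            return None  # not a box: a strict parser has to give up here
        if box_type == wanted:
            return pos + 8, pos + size
        pos += size
    return None


def original_format(entry: bytes) -> Optional[bytes]:
    """Recover the pre-encryption format from sinf/frma, if reachable."""
    size = struct.unpack_from(">I", entry, 0)[0]
    end = min(size, len(entry))
    sinf = find_child(entry, AUDIO_ENTRY_FIXED, end, b"sinf")
    if sinf is None:
        return None
    frma = find_child(entry, sinf[0], sinf[1], b"frma")
    if frma is None or frma[1] - frma[0] < 4:
        return None
    return entry[frma[0]:frma[0] + 4]


def box(box_type: bytes, payload: bytes = b"") -> bytes:
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


fixed_fields = bytes(28)                      # zeroed AudioSampleEntry fields
sinf = box(b"sinf", box(b"frma", b"owma"))
# Codec data wrapped in a (hypothetical) 'cfg0' box: the walk reaches 'sinf'.
boxed = box(b"enca", fixed_fields + box(b"cfg0", bytes(18)) + sinf)
# Codec data appended as raw fields: the walk fails before reaching 'sinf'.
raw = box(b"enca", fixed_fields + bytes.fromhex("6101020044ac0000") + sinf)

print(original_format(boxed))  # b'owma'
print(original_format(raw))    # None
```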

Is it too late to fix this?

benwaggoner
18th May 2009, 06:51
@bok,

Thanks for the very detailed feedback! I know some of the issues you list are being addressed, but I'm not sure about a few others. I've passed on your post to the team working on the spec. There's been quite a bit of evolution in it since last fall, and we've worked with a large number of other companies involved in MPEG-4 as well. It's our intent to make the Smooth Streaming file format 100% ISO MPEG-4 compatible.

I'll make sure to upload some updated samples, including using H.264 and AAC, once we've got all the changes implemented in the muxer.

mediator
19th May 2009, 11:00
...also hoping that the previously reported SMPTE compliance issue of the VC-1 tracks is not entirely forgotten ;)

benwaggoner
19th May 2009, 15:00
...also hoping that the previously reported SMPTE compliance issue of the VC-1 tracks is not entirely forgotten ;)
I believe that was addressed some time ago.

I'll get new samples up according to the revised spec as soon as all the changes are added to the muxer.