How to demux VC1 elementary stream from WMV file? [Archive] - Page 3

zambelli

5th February 2007, 09:58

To expand a little more on my previous post... As you're reverse engineering your way through this, be careful not to base all your testing of VC-1 streams exclusively on the WMV decoder. You should probably double check against a true VC-1 reference decoder or at least a 3rd party decoder (ffmpeg?). I mention this because it's important to understand that the WMV decoder is still first and foremost a WMV decoder. We did not rewrite it from scratch based on the VC-1 spec for the v11 release. It was designed with WMV9 within ASF in mind. Although certain VC-1 AP features such as range expansion and coded resolution were added to the decoder to ensure VC-1 compliancy, the decoder still largely relies on ASF metadata for things like interlaced signaling and aspect ratio signaling. I'm not sure it even supports things like 3:2 pulldown at all. So keep in mind that even though WMV9 is a VC-1 implementation, the WMV9 decoder is a WMV9 decoder, not a reference VC-1 decoder.

crypto

6th February 2007, 08:21

The good news: the ES streams output by the latest ES writer filter are accepted without problems in authoring.

But the muxed EVOB shows some stuttering and pixelation every few seconds. I suspect an error in my HD-DVD profile encoding settings; I am still trying some variations.

If someone knows an approved profile, let me know.

stegre

7th February 2007, 08:03

I must interrupt you here before you continue down this path... Be careful not to confuse "display size/coded size" with Pixel Aspect Ratios - they're not the same concepts..

Thank you - I was indeed assuming "Display Size" / "Coded Size" = "PAR", when I wrote that, and you're absolutely right, the phrases are being used quite differently here. Actually, the whole subject - how WMV uses the registry parameters you mentioned, what the various fields in the VC-1 spec mean, how WMV is using them in its embedded "short form" VC-1 headers, which parameter is applied to which first, etc., etc - is quite complex.

I think I have it all sorted out now; I should write it up as some kind of FAQ (if only for myself, before I forget!), so I can finish the project that will create proper VC-1 ES headers from the WMV file.

WMV is really doing some weird stuff, actually. For example, once you do specify those parameters you mentioned, the WMV's embedded VC-1 headers suddenly become "long form", i.e. the previously excluded VC-1 "Display Extension" is suddenly included. I think they do that so they have a way to recover the original file's coded size, i.e. the "size it would have been encoded at if the registry overrides were absent". They apparently store that value in the VC-1 field called "Display Size". But it really isn't "Display Size", because they apply any PAR to that value afterwards, resulting in the real display size. And for its part, the PAR value itself is always left out of these VC-1 headers, even though there's a spot for it right there, probably because WMV already stores it in at least one or two other places.
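A minimal sketch of that last step, assuming stegre's reading is right: the Display Extension's DISP_HORIZ_SIZE / DISP_VERT_SIZE fields (each coded as size minus 1) hold the pre-override size, and a PAR stored elsewhere in the ASF is applied afterwards. Stretching only the width, rather than the height, is my own assumption, and `real_display_size` is a hypothetical helper name.

```python
from fractions import Fraction
from typing import Tuple


def real_display_size(disp_horiz_size: int, disp_vert_size: int,
                      par: Fraction) -> Tuple[int, int]:
    """Turn the VC-1 "display size" fields into the size actually shown.

    DISP_HORIZ_SIZE and DISP_VERT_SIZE are coded as (size - 1). The pixel
    aspect ratio is applied afterwards; this sketch stretches the width only.
    """
    width = disp_horiz_size + 1
    height = disp_vert_size + 1
    return round(width * par), height


# 1440x1080 stored with a 4:3 PAR displays as 1920x1080.
print(real_display_size(1439, 1079, Fraction(4, 3)))  # (1920, 1080)
```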

So if that sounds complicated, it is, but I think I can get all the values and do everything I need to for a proper conversion under all circumstances.

...it's important to understand that the WMV decoder is still first and foremost a WMV decoder. We did not rewrite it from scratch based on the VC-1 spec for the v11 release. It was designed with WMV9 within ASF in mind. Although certain VC-1 AP features such as range expansion and coded resolution were added to the decoder to ensure VC-1 compliancy, the decoder still largely relies on ASF metadata for things like interlaced signaling and aspect ratio signaling....

Yes, that statement seems totally consistent with stuff I was just mentioning above...

... as you're reverse engineering your way through this, be careful not to base all your testing of VC-1 streams exclusively on the WMV decoder. You should probably double check against a true VC-1 reference decoder or at least a 3rd party decoder (ffmpeg?)...

Yes, I have successfully compiled and am testing streams against the official VC-1 reference decoder.

I'm not "testing any streams against the WMV decoder" at all. While this thread has discussed various "projects", the one I'm discussing now is essentially "reverse engineering" the WMV VC-1 (WVC1) file format: taking the streams produced by the encoder and seeing if I can convert them to standard VC-1 ES streams acceptable to the reference decoder and other apps that want the ES format. So I'm creating files with the MS encoder, testing with the reference decoder, and not really using the MS decoder for much at all right now, except to quickly check that my encodes look reasonable before I start analyzing them.

stegre

16th February 2007, 06:26

OK here's the latest. Rather than further enhance my existing DirectShow VC-1 ES writer filter (http://forum.doom9.org/showthread.php?p=947760#post947760) that works with GraphEdit, I'm just dispensing with DirectShow altogether.

The new converter will be a command line util:

ASF2VC1 <input.wmv> <output.vc1>

where the input is a WVC1 WMV file and the output will be a VC-1 elementary stream. The screen output will give a technical log of the details of the conversion. It'll be small, fast, and have no "pre-requisites" - as long as your PC can read and write files it will work. The source will be a self-contained project and not make any use of DXSDK or anything else. It's 100% written from scratch.

I was already manually parsing all these fields out of the ASF container anyway; I figured, why not just go ahead and parse the frame data itself as well and be done with it. Parsing ASF files is difficult, but it's already largely functional; I'll probably finish it over the weekend.

Here's a summary of how it works:

It gets the existing "short" VC-1 Sequence and Entry Point headers from the end of the ASF_Video_Media object. A table driven bit parser with variable bit length and conditional capabilities automatically runs thru that bitstream and fills a table with whatever data is available from it.
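A minimal sketch of such a table-driven bit parser, applied to the 21 header bytes that appear in the ASF2VC1 log later in the thread. The (name, bit width, condition) table layout and all function names are my own, not stegre's code; field names and widths follow the Entry Point header syntax of SMPTE 421M as I understand it. It assumes HRD_PARAM_FLAG = 0 in the sequence header (so there are no HRD_FULLNESS bytes) and that the payload contains no 0x03 emulation-prevention bytes, which holds for these particular headers.

```python
START_CODE = b"\x00\x00\x01"

# Entry point header fields, in bitstream order: (name, width in bits,
# condition on the fields read so far, or None if always present).
# Assumes HRD_PARAM_FLAG = 0 in the sequence header, so no HRD_FULLNESS bytes.
ENTRY_POINT_TABLE = [
    ("BROKEN_LINK", 1, None),
    ("CLOSED_ENTRY", 1, None),
    ("PANSCAN_FLAG", 1, None),
    ("REFDIST_FLAG", 1, None),
    ("LOOPFILTER", 1, None),
    ("FASTUVMC", 1, None),
    ("EXTENDED_MV", 1, None),
    ("DQUANT", 2, None),
    ("VSTRANSFORM", 1, None),
    ("OVERLAP", 1, None),
    ("QUANTIZER", 2, None),
    ("CODED_SIZE_FLAG", 1, None),
    ("CODED_WIDTH", 12, lambda f: f["CODED_SIZE_FLAG"] == 1),
    ("CODED_HEIGHT", 12, lambda f: f["CODED_SIZE_FLAG"] == 1),
    ("EXTENDED_DMV", 1, lambda f: f["EXTENDED_MV"] == 1),
    ("RANGE_MAPY_FLAG", 1, None),
    ("RANGE_MAPY", 3, lambda f: f["RANGE_MAPY_FLAG"] == 1),
    ("RANGE_MAPUV_FLAG", 1, None),
    ("RANGE_MAPUV", 3, lambda f: f["RANGE_MAPUV_FLAG"] == 1),
]


def split_start_codes(data: bytes) -> dict:
    """Map each start-code suffix (0x0F sequence, 0x0E entry point) to the
    bytes that follow it, up to the next start code."""
    units = {}
    i = data.find(START_CODE)
    while i != -1 and i + 3 < len(data):
        nxt = data.find(START_CODE, i + 4)
        units[data[i + 3]] = data[i + 4:nxt if nxt != -1 else len(data)]
        i = nxt
    return units


class BitReader:
    """Reads MSB-first bit fields from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, width: int) -> int:
        value = 0
        for _ in range(width):
            bit = (self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse(table, payload: bytes) -> dict:
    """Walk the table, reading only the fields whose condition holds."""
    reader = BitReader(payload)
    fields = {}
    for name, width, cond in table:
        if cond is None or cond(fields):
            fields[name] = reader.read(width)
    return fields


hdrs = bytes.fromhex("00 00 01 0f d3 fe 24 a1 87 88 80"
                     " 00 00 01 0e 10 44 92 86 1c 80")
ep = parse(ENTRY_POINT_TABLE, split_start_codes(hdrs)[0x0E])
print(ep["CODED_WIDTH"], ep["CODED_HEIGHT"])  # 586 391, i.e. 1174x784
```

Conditional fields simply never appear in the result when their flag is off (here EXTENDED_DMV, RANGE_MAPY, RANGE_MAPUV), which is what the "--" entries in the log represent.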

It then gets framerate from the ASF_Extended_Stream_Properties_Object, PAR from the ASF_Metadata_Object and/or ASF "Payload Extension Systems", and perhaps other data from other places. It uses that info to fill in any missing data in the table, and then runs the bit parser in reverse to generate new, longer and more informative Sequence and Entry Point headers.
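Running the table "in reverse" can be sketched as a bit writer driven by a Sequence header table: each field is written only if its condition holds, then the stop bit ('1') is appended and the result zero-padded to a byte boundary. The table follows the SMPTE 421M Advanced Profile sequence header syntax as I understand it; the helper names are hypothetical, and HRD_PARAM() is deliberately not handled. With the values from the ASF2VC1 log later in the thread, the short header reproduces the original bytes and the filled-in header reproduces the modified sequence-header bytes (the log's extra 00 before the entry point start code is additional zero padding not produced here).

```python
class BitWriter:
    """Accumulates MSB-first bit fields and emits bytes."""

    def __init__(self):
        self.bits = []

    def write(self, value: int, width: int) -> None:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        self.bits += [(value >> i) & 1 for i in range(width - 1, -1, -1)]

    def finish(self) -> bytes:
        """Append the '1' stop bit, zero-pad to a byte boundary, pack."""
        bits = self.bits + [1]
        bits += [0] * (-len(bits) % 8)
        return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                     for i in range(0, len(bits), 8))


# Conditions for the optional display-extension fields.
def disp(f): return f["DISPLAY_EXT"] == 1
def aspect(f): return disp(f) and f["ASPECT_RATIO_FLAG"] == 1
def aspect_custom(f): return aspect(f) and f["ASPECT_RATIO"] == 15
def frate(f): return disp(f) and f["FRAMERATE_FLAG"] == 1
def frate_table(f): return frate(f) and f["FRAMERATEIND"] == 0
def frate_exp(f): return frate(f) and f["FRAMERATEIND"] == 1
def color(f): return disp(f) and f["COLOR_FORMAT_FLAG"] == 1


SEQUENCE_TABLE = [
    ("PROFILE", 2, None), ("LEVEL", 3, None), ("COLORDIFF_FORMAT", 2, None),
    ("FRMRTQ_POSTPROC", 3, None), ("BITRTQ_POSTPROC", 5, None),
    ("POSTPROCFLAG", 1, None),
    ("MAX_CODED_WIDTH", 12, None), ("MAX_CODED_HEIGHT", 12, None),
    ("PULLDOWN", 1, None), ("INTERLACE", 1, None), ("TFCNTRFLAG", 1, None),
    ("FINTERPFLAG", 1, None), ("RESERVED", 1, None), ("PSF", 1, None),
    ("DISPLAY_EXT", 1, None),
    ("DISP_HORIZ_SIZE", 14, disp), ("DISP_VERT_SIZE", 14, disp),
    ("ASPECT_RATIO_FLAG", 1, disp), ("ASPECT_RATIO", 4, aspect),
    ("ASPECT_HORIZ_SIZE", 8, aspect_custom),
    ("ASPECT_VERT_SIZE", 8, aspect_custom),
    ("FRAMERATE_FLAG", 1, disp), ("FRAMERATEIND", 1, frate),
    ("FRAMERATENR", 8, frate_table), ("FRAMERATEDR", 4, frate_table),
    ("FRAMERATEEXP", 16, frate_exp),
    ("COLOR_FORMAT_FLAG", 1, disp),
    ("COLOR_PRIM", 8, color), ("TRANSFER_CHAR", 8, color),
    ("MATRIX_COEF", 8, color),
    ("HRD_PARAM_FLAG", 1, None),  # HRD_PARAM() itself is not covered here
]


def build(table, fields: dict) -> bytes:
    """Run the table in reverse: write every field whose condition holds."""
    if fields.get("HRD_PARAM_FLAG"):
        raise NotImplementedError("HRD_PARAM() is not handled in this sketch")
    writer = BitWriter()
    for name, width, cond in table:
        if cond is None or cond(fields):
            writer.write(fields[name], width)
    return writer.finish()


# Short header as found in the ASF ...
short_hdr = dict(PROFILE=3, LEVEL=2, COLORDIFF_FORMAT=1, FRMRTQ_POSTPROC=7,
                 BITRTQ_POSTPROC=31, POSTPROCFLAG=0, MAX_CODED_WIDTH=586,
                 MAX_CODED_HEIGHT=391, PULLDOWN=1, INTERLACE=0, TFCNTRFLAG=0,
                 FINTERPFLAG=0, RESERVED=1, PSF=0, DISPLAY_EXT=0,
                 HRD_PARAM_FLAG=0)
# ... with the display extension filled in: 1174x784, 30000/1001 fps.
long_hdr = dict(short_hdr, DISPLAY_EXT=1, DISP_HORIZ_SIZE=1173,
                DISP_VERT_SIZE=783, ASPECT_RATIO_FLAG=0, FRAMERATE_FLAG=1,
                FRAMERATEIND=0, FRAMERATENR=3, FRAMERATEDR=2,
                COLOR_FORMAT_FLAG=0)
print(build(SEQUENCE_TABLE, long_hdr).hex(" "))
# d3 fe 24 a1 87 8a 24 a8 61 e8 0c 88
```

Keeping the parser and writer both table-driven means a new optional field only has to be described once per direction, which is presumably why the approach scales to "fill in missing data, then regenerate".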

After that, it actually demuxes the entire ASF video stream, prefixing each frame with a VC-1 Frame Start Code and each keyframe with the "new and improved" Sequence and Entry Point headers generated above.
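The demux step described above can be sketched as follows: every frame gets the VC-1 frame start code (00 00 01 0D), and every keyframe is preceded by the regenerated Sequence and Entry Point headers. This assumes, as the description implies, that the ASF payloads carry bare frames without their own start codes. The `add_end_of_sequence` flag (appending the 00 00 01 0A end-of-sequence code) is a hypothetical option not in ASF2VC1 v1.00; it relates to the Scenarist "Number of SequenceEndCode is 0" warning reported later in the thread.

```python
from typing import Iterable, Tuple

FRAME_START_CODE = b"\x00\x00\x01\x0d"
END_OF_SEQUENCE_CODE = b"\x00\x00\x01\x0a"


def build_elementary_stream(frames: Iterable[Tuple[bool, bytes]],
                            seq_and_entry_hdrs: bytes,
                            add_end_of_sequence: bool = False) -> bytes:
    """Concatenate demuxed frames into a VC-1 Advanced Profile ES.

    frames: (is_keyframe, payload) pairs in decode order, where each payload
    is assumed to be a bare frame without its own start code.
    seq_and_entry_hdrs: Sequence + Entry Point headers, start codes included.
    """
    out = bytearray()
    for is_keyframe, payload in frames:
        if is_keyframe:
            out += seq_and_entry_hdrs  # repeat headers before every keyframe
        out += FRAME_START_CODE
        out += payload
    if add_end_of_sequence:  # hypothetical option, not in ASF2VC1 v1.00
        out += END_OF_SEQUENCE_CODE
    return bytes(out)
```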

The screen output will show the original input headers from the ASF and the modified output headers in the same readable format as my VC1_Info util (http://forum.doom9.org/showthread.php?p=947434#post947434), along with stats and other info. In fact, it will probably replace that util as well, because I'll set it up with a command line switch, or for that matter if only one (either the *.WMV or the *.VC1) file is specified rather than both it will default to just an info dump instead of a demux.
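The "two files means demux, one file means info dump" behavior could look like the sketch below. Detecting the input type from its first bytes (the on-disk ASF_Header_Object GUID versus a VC-1 sequence start code) is my own assumption; stegre only describes the choice in terms of how many files are given, and it also assumes a VC-1 ES file begins with a sequence header.

```python
import sys
from typing import List

# ASF_Header_Object GUID 75B22630-668E-11CF-A6D9-00AA0062CE6C as stored on disk.
ASF_HEADER_GUID = bytes.fromhex("3026b2758e66cf11a6d900aa0062ce6c")
VC1_SEQUENCE_START = b"\x00\x00\x01\x0f"


def sniff(head: bytes) -> str:
    """Guess the input type from its first bytes."""
    if head.startswith(ASF_HEADER_GUID):
        return "asf"
    if head.startswith(VC1_SEQUENCE_START):
        return "vc1"
    return "unknown"


def choose_mode(args: List[str], head: bytes) -> str:
    """Two file names: demux. One file name: info dump for that file."""
    kind = sniff(head)
    if len(args) == 2 and kind == "asf":
        return "demux"
    if len(args) == 1 and kind != "unknown":
        return "info-" + kind
    return "usage"


if __name__ == "__main__":
    argv = sys.argv[1:]
    head = b""
    if argv:
        with open(argv[0], "rb") as f:
            head = f.read(16)
    print(choose_mode(argv, head))
```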

And no, I have no plans to make it work in reverse to create a WMV/ASF from a VC-1 ES (I think I read here that someone else is working on such a thing, though?). Anyway, I have to get back to work on GSpot, which is way "behind schedule" because of this "distraction". On the plus side, I'll be putting all this code (the informational parts) into the next GSpot, so it will support VC-1 ES streams and .WMV containers in general, a capability it hasn't had until now. Anyway, that's OT for here....

oberon

1st March 2007, 22:13

When I load this .prx into ProCoder, it just crashes when I hit Convert:

<profile version="589824"
storageformat="1"
name="[WVC1] 1080p30cbr HDDVD"
description="[WVC1] Advanced HD-DVD, 1920 x 1080 Pixel, 29.97 FPS">
<streamconfig majortype="{73646976-0000-0010-8000-00AA00389B71}"
streamnumber="1"
streamname="Video Stream"
inputname="Video409"
bitrate="14000000"
bufferwindow="1000"
reliabletransport="0"
decodercomplexity="AU"
rfc1766langid="en-us"
>
<videomediaprops maxkeyframespacing="5000000"
quality="100"/>
<wmmediatype subtype="{31435657-0000-0010-8000-00AA00389B71}"
bfixedsizesamples="0"
btemporalcompression="1"
lsamplesize="0">
<videoinfoheader dwbitrate="14000000"
dwbiterrorrate="0"
avgtimeperframe="333667">
<rcsource left="0"
top="0"
right="1920"
bottom="1080"/>
<rctarget left="0"
top="0"
right="1920"
bottom="1080"/>
<bitmapinfoheader biwidth="1920"
biheight="1080"
biplanes="1"
bibitcount="24"
bicompression="WVC1"
bisizeimage="0"
bixpelspermeter="0"
biypelspermeter="0"
biclrused="0"
biclrimportant="0"/>
</videoinfoheader>
</wmmediatype>
</streamconfig>
</profile>
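For reference, the numbers in this profile can be decoded with a short script. This is only a reading aid and says nothing about why ProCoder crashes; it assumes the usual Windows Media conventions that avgtimeperframe and maxkeyframespacing are in 100-ns units and bufferwindow is in milliseconds. `summarize_prx` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

HNS_PER_SECOND = 10_000_000  # avgtimeperframe / maxkeyframespacing unit: 100 ns


def summarize_prx(xml_text: str) -> dict:
    """Pull the main video settings out of a Windows Media .prx profile."""
    root = ET.fromstring(xml_text)
    stream = root.find("streamconfig")
    props = stream.find("videomediaprops")
    vih = stream.find("wmmediatype/videoinfoheader")
    bmi = vih.find("bitmapinfoheader")
    return {
        "fourcc": bmi.get("bicompression"),
        "size": (int(bmi.get("biwidth")), int(bmi.get("biheight"))),
        "fps": HNS_PER_SECOND / int(vih.get("avgtimeperframe")),
        "bitrate_bps": int(stream.get("bitrate")),
        "buffer_window_ms": int(stream.get("bufferwindow")),
        "max_keyframe_spacing_s":
            int(props.get("maxkeyframespacing")) / HNS_PER_SECOND,
    }
```

Applied to the profile above, this gives WVC1 at 1920x1080, about 29.97 fps (333667 x 100 ns per frame), 14 Mbps, a 1000 ms buffer window and a maximum keyframe spacing of 0.5 s.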

zambelli

2nd March 2007, 03:09

OK here's the latest. Rather than further enhance my existing DirectShow VC-1 ES writer filter (http://forum.doom9.org/showthread.php?p=947760#post947760) that works with GraphEdit, I'm just dispensing with DirectShow altogether.
Hi Stegre,

I've tested the existing DShow implementation a little further. I think there is still some work to be done - I don't think the output is 100% correct yet.

We have an internal ES conversion utility here at MS that I tested against.

Here's the output info from your conversion:

Sequence Header:
Profile: Advanced
Level: 2
ChromaFormat: 1
FrameRate: 22.00 fps
Bitrate: 608 kbps
MaxFrameWidth: 640
MaxFrameHeight: 272
PostProcInfoPresent: 0
BroadcastFlags: 1
InterlacedSource: 0
TemporalFrmCntr: 0
SeqFrameInterpolation: 0
SMPTE Reserved: 1
PsF: 0
Disp extension Flag: 1
Horizontal: 640
Vertical: 272
bAspect Ratio Flag: 0
Frame Rate Flag: 1
Frame Rate index: 0
iFrame Rate nr: 1
iFrame Rate dr: 2
Color Format Flag: 0
Hardware Flag: 0

1st Entry Point Header:
BrokenLink: 0
ClosedEntryPoint: 0
PanScanPresent: 0
RefDistPresent: 0
LoopFilter: 0
UVHpelBilinear: 0
ExtendedMvMode: 0
DQuantCodingOn: 0
XformSwitch: 0
SequenceOverlap: 0
ExplicitSeqQuantizer: 0
ExplicitFrameQuantizer: 0
NewCodedSizeFlag: 0
FrmWidthSrc: 0
FrmHeightSrc: 0
RangeRedY_Flag: 0
RangeRedY: 0
RangeRedUV_Flag: 0
RangeRedUV: 0
Hardware Flag: 0

Here's the output info from my conversion:

Sequence Header:
Profile: Advanced
Level: 2
ChromaFormat: 1
FrameRate: 22.00 fps
Bitrate: 608 kbps
MaxFrameWidth: 640
MaxFrameHeight: 272
PostProcInfoPresent: 0
BroadcastFlags: 1
InterlacedSource: 0
TemporalFrmCntr: 0
SeqFrameInterpolation: 0
SMPTE Reserved: 1
PsF: 0
Disp extension Flag: 0
Hardware Flag: 0

1st Entry Point Header:
BrokenLink: 0
ClosedEntryPoint: 0
PanScanPresent: 0
RefDistPresent: 1
LoopFilter: 1
UVHpelBilinear: 0
ExtendedMvMode: 0
DQuantCodingOn: 1
XformSwitch: 1
SequenceOverlap: 1
ExplicitSeqQuantizer: 0
ExplicitFrameQuantizer: 0
NewCodedSizeFlag: 1
FrmWidthSrc: 640
FrmHeightSrc: 272
RangeRedY_Flag: 0
RangeRedY: 0
RangeRedUV_Flag: 0
RangeRedUV: 0
Hardware Flag: 0

As you can see, there are differences. When I play back the ES output from your conversion, I get a strange flicker in the decoder - not sure why.

crypto

3rd March 2007, 11:46

This thread has been suspiciously calm for a while now. I think Steve and most of us have given up on VC-1 due to little or no support from the WM faction. It seems AVC is the better choice for homebrew stuff.

stegre

4th March 2007, 06:52

No, not at all! I actually had to stop for a while to update GSpot, and then there were a few more delays. But now, for your testing edification, I cordially present:

* ASF2VC1.exe v1.00 * (http://www.ftyps.com/unrelated/ASF2VC1.zip)

This is a small (64KB exe; 31KB zip), self-contained, extremely fast & trivial to use command-line-app that will demux a WVC1 WMV file to a VC-1 ES file. It was written from scratch, doesn't depend on or require anything else to use, and has no "system requirements" to speak of (you could run it on an old Win98 box if you want). The console output is set to be quite verbose now, so it tells you "what's going in, how it's being modified, and what's coming out" - complete with a compact display of frame layout and a summary of all frame types written. The output from a sample test run is shown in the code box at the end of this post.

Zambelli - I'm going to take a closer look at what you've posted above, concerning my previous DirectShow filter writer and the differences in those two info dumps. But I'd like to use this app as my new starting point. If there are still issues with "flicker" or anything else, what would be great is if I could somehow get a sample VC-1 ES file that doesn't do that, and if possible the source that it was created from.

In fact, if you're now able to convert WMV to VC-1 ES using MS internal utilities, the holy grail of sample files for me would be a WMV and the corresponding VC-1 that does work well, in a situation where mine does not. That way I could run my converter myself on the sample WMV and examine how the two resulting VC-1 files differ.

If instead you're now creating the VC-1s and WMVs separately, but from the same original sample in some third format, it'd be great if I could get any or all of those as well, or even just any VC-1 sample that "works well". Basically I'll take anything ;) Feel free to PM me if it's not cool to publicly post the company's internal test files. crypto (or anyone else): same deal. If you get a chance, see how the files this util creates compare to "professional" stuff; any feedback of any sort will be helpful and appreciated.

But back to the app itself. It's version 1.00, I just finished it, and there isn't even a "readme" file yet. So there may be bugs, though I have done limited testing as mentioned above. The current version also has no command line switches. I'm thinking of adding some; the first thing that comes to mind (besides a less [or more?] "verbose" option) would be a way to add to, or even override, various header values, e.g. set a PAR or frame rate override right from the command line.
C:> asf2vc1

ASF2VC1 v1.00 (c)2007 Steven G Greenberg
Demuxes WMV WVC1 to VC-1 Elementary Stream. Advanced Profile Only.

Primary Usage:
ASF2VC1<input.wmv> <output.vc1> Convert ASF/WMV WVC1 input to VC-1 ES
Other:
ASF2VC1 <input.wmv> Info only about WVC1 file <input.wmv>
ASF2VC1 <input.vc1> Info only about VC-1 ES file <input.vc1>

C:> asf2vc1 test.wmv test.vc1

ASF2VC1 v1.00 (c)2007 Steven G Greenberg
Demuxes WMV WVC1 to VC-1 Elementary Stream. Advanced Profile Only.

Found ASF_Header_Object

-- Found ASF_File_Properties_Object
-- Found ASF_Header_Extension_Object
-- -- Found ASF_Metadata_Object; processing 4 objects
-- -- -- Metadata for stream 1: IsVBR = FALSE
-- -- -- Metadata for stream 1: DeviceConformanceTemplate = L2
-- -- -- Metadata for stream 2: IsVBR = FALSE
-- -- -- Metadata for stream 2: DeviceConformanceTemplate = AP@L2
-- -- Found ASF_Extended_Stream_Properties_Object for ASF stream 1
-- -- -- avg object period: 97720 uSecs (10.233 / sec)
-- -- -- max object size: 2731 bytes
-- -- Found ASF_Extended_Stream_Properties_Object for ASF stream 2
-- -- -- avg object period: 33367 uSecs (29.970 / sec)
-- -- -- max object size: 95038 bytes
-- Found ASF_Stream_Properties_Object for ASF stream 1
-- -- (Non-video stream; no further processing)
-- Found ASF_Stream_Properties_Object for ASF stream 2
-- -- Identified this stream as the first video stream.
-- -- FourCC: WVC1
-- -- Stream correctly marked as VC-1 Advanced Profile
-- -- Now looking for Sequence and Entry Point Bytes...
-- -- Found Sequence and Entry Point Bytes; now copying...

Successfully retrieved 21 bytes of VC-1 headers from file test.wmv.
Bytes as retrieved [concatenation of Sequence and Entry Point Headers]:
00 00 01 0f d3 fe 24 a1 87 88 80 00 00 01 0e 10 44 92 86 1c 80

Interpretation of above VC-1 headers:

---- Sequence Header ----
PROFILE: 3
LEVEL: 2 (AP3@L2)
COLORDIFF_FORMAT: 1 (=4:2:0)
FRMRTQ_POSTPROC: 7
BITRTQ_POSTPROC: 31
POSTPROCFLAG: 0
MAX_CODED_WIDTH: 586 (=1174)
MAX_CODED_HEIGHT: 391 (=784)
PULLDOWN: 1
INTERLACE: 0
TFCNTRFLAG: 0
FINTERPFLAG: 0
RESERVED: 1
PSF: 0
DISPLAY_EXT: 0
-DISP_HORIZ_SIZE: --
-DISP_VERT_SIZE: --
-ASPECT_RATIO_FLAG: --
--ASPECT_RATIO: --
---ASPECT_HORIZ_SIZE: --
---ASPECT_VERT_SIZE: --
-FRAMERATE_FLAG: --
--FRAMERATEIND: --
--FRAMERATENR: --
--FRAMERATEDR: --
--FRAMERATEEXP: --
-COLOR_FORMAT_FLAG: --
--COLOR_PRIM: --
--TRANSFER_CHAR: --
--MATRIX_COEF: --
HRD_PARAM_FLAG: 0
-HRD_NUM_LEAKY_BUCKETS: --
-BIT_RATE_EXPONENT: --
-BUFFER_SIZE_EXPONENT: --

---- Entry Point Header ----
BROKEN_LINK: 0
CLOSED_ENTRY: 0
PANSCAN_FLAG: 0
REFDIST_FLAG: 1
LOOPFILTER: 0
FASTUVMC: 0
EXTENDED_MV: 0
DQUANT: 0
VSTRANSFORM: 1
OVERLAP: 0
QUANTIZER: 0
CODED_SIZE_FLAG: 1
-CODED_WIDTH: 586 (=1174)
-CODED_HEIGHT: 391 (=784)
EXTENDED_DMV: --
RANGE_MAPY_FLAG: 0
-RANGE_MAPY: --
RANGE_MAPUV_FLAG: 0
-RANGE_MAPUV: --

Generating modified headers...
-- Inserting Frame Period: 33367 uSecs (29.970 FPS)...
-- Note: PAR not specified in ASF input file- shall remain so in VC-1 output.

Header bytes to be used in output [concatenated Seq and Entry Point Hdrs]:
00 00 01 0f d3 fe 24 a1 87 8a 24 a8 61 e8 0c 88 00 00 00 01 0e 10 44 92
86 1c 80 00

Interpretation of above VC-1 headers:

---- Sequence Header ----
PROFILE: 3
LEVEL: 2 (AP3@L2)
COLORDIFF_FORMAT: 1 (=4:2:0)
FRMRTQ_POSTPROC: 7
BITRTQ_POSTPROC: 31
POSTPROCFLAG: 0
MAX_CODED_WIDTH: 586 (=1174)
MAX_CODED_HEIGHT: 391 (=784)
PULLDOWN: 1
INTERLACE: 0
TFCNTRFLAG: 0
FINTERPFLAG: 0
RESERVED: 1
PSF: 0
DISPLAY_EXT: 1
-DISP_HORIZ_SIZE: 1173 (=1174)
-DISP_VERT_SIZE: 783 (=784)
-ASPECT_RATIO_FLAG: 0
--ASPECT_RATIO: --
---ASPECT_HORIZ_SIZE: --
---ASPECT_VERT_SIZE: --
-FRAMERATE_FLAG: 1
--FRAMERATEIND: 0
--FRAMERATENR: 3 (=30000/1001, =29.970 FPS)
--FRAMERATEDR: 2 (=30000/1001, =29.970 FPS)
--FRAMERATEEXP: --
-COLOR_FORMAT_FLAG: 0
--COLOR_PRIM: --
--TRANSFER_CHAR: --
--MATRIX_COEF: --
HRD_PARAM_FLAG: 0
-HRD_NUM_LEAKY_BUCKETS: --
-BIT_RATE_EXPONENT: --
-BUFFER_SIZE_EXPONENT: --

---- Entry Point Header ----
BROKEN_LINK: 0
CLOSED_ENTRY: 0
PANSCAN_FLAG: 0
REFDIST_FLAG: 1
LOOPFILTER: 0
FASTUVMC: 0
EXTENDED_MV: 0
DQUANT: 0
VSTRANSFORM: 1
OVERLAP: 0
QUANTIZER: 0
CODED_SIZE_FLAG: 1
-CODED_WIDTH: 586 (=1174)
-CODED_HEIGHT: 391 (=784)
EXTENDED_DMV: --
RANGE_MAPY_FLAG: 0
-RANGE_MAPY: --
RANGE_MAPUV_FLAG: 0
-RANGE_MAPUV: --

Found ASF_Data_Object
-- len = 6840050 bytes; contains 855 data packets
Starting ASF demux / VC-1 write...

K............K..................................................
.....................K...................................K......
................................................................
.K.........................................K....................
...................................................K....

Done. Summary of frames written:

K [I] (Intra-Coded -"Keyframes") : 7
. [P] (Predictive) : 305
, [B] (Bidirectionally Predictive): 0
- [BI] (Intra-Coded B-Frames) : 0
d [D] (Skipped -"Duplicated") : 0

Total frames written : 312
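The FRAMERATENR = 3 / FRAMERATEDR = 2 values in the generated header above follow from the 33367 uSec average frame period. A minimal sketch of that mapping, assuming the SMPTE 421M code tables as I understand them (NR codes 1-7 = 24, 25, 30, 50, 60, 48, 72; DR codes 1 = 1000, 2 = 1001) and falling back to FRAMERATEEXP, which codes (FRAMERATEEXP + 1) / 32 fps, when no table entry is close. The 0.1% tolerance and the function name are my own choices.

```python
from typing import Dict

# Frame-rate code tables for the sequence header display extension.
FRAMERATE_NR = {1: 24, 2: 25, 3: 30, 4: 50, 5: 60, 6: 48, 7: 72}
FRAMERATE_DR = {1: 1000, 2: 1001}


def frame_rate_fields(period_us: float,
                      tolerance: float = 0.001) -> Dict[str, int]:
    """Pick VC-1 frame-rate fields for an average frame period in microseconds."""
    fps = 1_000_000 / period_us
    best = None  # (relative error, NR code, DR code)
    for nr, num in FRAMERATE_NR.items():
        for dr, den in FRAMERATE_DR.items():
            err = abs(num * 1000 / den - fps) / fps
            if best is None or err < best[0]:
                best = (err, nr, dr)
    if best[0] <= tolerance:
        return {"FRAMERATEIND": 0, "FRAMERATENR": best[1], "FRAMERATEDR": best[2]}
    # Otherwise code the rate explicitly: fps = (FRAMERATEEXP + 1) / 32.
    return {"FRAMERATEIND": 1, "FRAMERATEEXP": round(fps * 32) - 1}


print(frame_rate_fields(33367))
# {'FRAMERATEIND': 0, 'FRAMERATENR': 3, 'FRAMERATEDR': 2}
```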

zambelli

5th March 2007, 03:40

No, not at all! I actually had to stop for a while to update GSpot, and then there were a few more delays. But now, for your testing edification, I cordially present:

* ASF2VC1.exe v1.00 * (http://www.ftyps.com/unrelated/ASF2VC1.zip)
Cool! I'll try it out this week and compare the output to our own test tool's output. I'll contact you by PM if I spot any differences. I ought to be able to provide you WMV and ES samples if necessary.

Sagittaire

6th March 2007, 14:18

Well, it seems to work very well...

But unfortunately it's completely useless for demuxing streams from WMEncoder (for example), simply because WMEncoder doesn't produce strictly compliant VC-1 streams (no HRD parameters, for example).

stegre

6th March 2007, 20:19

Well, actually, the streams I'm producing from the ASF are in fact "strictly compliant". HRD parameters, for example, are optional.
http://www.ftyps.com/unrelated/spec.png
Now that doesn't necessarily mean they aren't important, which is exactly why I'm asking for samples that "work well" in a professional environment or are produced by professional equipment, so I can examine the differences.

It's interesting you mention HRD, though. Although I don't have the samples I'm asking for above, I do have at least one "professional" VC-1 (posted earlier by crypto), and one of the only significant differences I noticed is that his does include a single HRD "leaky bucket" value on each sequence header.

Now if that's the whole big difference between the converted ES streams working well and not, I don't see any reason why I couldn't compute that value myself and add it to the sequence headers. ASF already has headers with initial and average/max "leaky bucket" info; they had that even before VC-1 was around. In fact, I wouldn't be surprised if VC-1 actually adopted the term "leaky bucket" from Microsoft. But I don't know that the ASF file format keeps any "running totals" on "what's in the bucket", which is what I think the VC-1 HRD params are; with the info available, though, I imagine I could just compute that myself, on the fly, and add it into the headers too.
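"Computing what's in the bucket" can be sketched with a generic constant-rate leaky-bucket model: bits arrive at the stream bitrate, the decoder removes each frame instantaneously once per frame interval, and we look for the smallest initial buffer fullness that never underflows, rejecting it if the buffer would then overflow. This is a simplified textbook model, not the exact HRD arithmetic of SMPTE 421M, and it does not produce the coded HRD_PARAM / HRD_FULLNESS field values; the function name and parameters are hypothetical.

```python
from typing import Optional, Sequence


def min_initial_fullness(frame_bits: Sequence[int], bitrate: float,
                         fps: float, buffer_bits: float) -> Optional[float]:
    """Smallest initial decoder-buffer fullness (bits) that avoids underflow
    in a constant-rate leaky bucket, or None if that level would overflow."""
    per_frame = bitrate / fps  # bits arriving between two frame removals
    consumed = 0.0             # bits removed by the decoder so far
    required = 0.0
    for i, size in enumerate(frame_bits):
        arrived = i * per_frame
        # Just before frame i is removed the buffer holds
        # F + arrived - consumed, which must be at least `size`.
        required = max(required, size + consumed - arrived)
        consumed += size
    # With F = required, the level just before each removal is the peak.
    consumed = 0.0
    for i, size in enumerate(frame_bits):
        if required + i * per_frame - consumed > buffer_bits:
            return None
        consumed += size
    return required


# A large keyframe followed by small P-frames needs a pre-filled buffer.
print(min_initial_fullness([400, 50, 50, 50], 3000, 30, 1000))  # 400.0
```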

Sagittaire

7th March 2007, 08:38

Well, actually, the streams I'm producing from the ASF are in fact "strictly compliant". HRD parameters, for example, are optional.

In fact, not for HD-DVD or BD. You must have HRD flags for Scenarist, for example (and certainly other tools).

But be careful: the problem doesn't come from your demuxer but from the encoder. Actually, the best available free encoder, WME9, doesn't produce streams with HRD flags.

Anyway, I think it will be possible to generate the HRD flags or the pulldown flags even if the original stream doesn't have them.

stegre

7th March 2007, 22:43

OK, thank you for that info, I didn't know that about HDDVD/BD & Scenarist requirements; I am going to start looking into "counting bits" and see if I can inject HRD info into the converted stream.

I was also thinking about the pulldown issue. I should be able to make it produce, for example, a VC-1 ES that plays at 30 FPS / 60 fields/s from a 24 FPS WMV just by changing flags, but that would involve modifying flags in some other (picture/frame/field) headers I'm not currently modifying, so I have to study that first; it'd probably be the next "phase" of the project if I decide to add that too.
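The "just changing flags" idea rests on the classic 3:2 cadence: four 24p frames are shown as 3, 2, 3 and 2 fields, giving 10 fields per 4 frames, i.e. 60 fields/s. The sketch below generates per-frame top-field-first / repeat-first-field flags and expands them to check that field parity alternates. It models the generic cadence only; how VC-1 picture headers actually carry pulldown information (and which syntax elements apply to progressive versus interlaced sequences) should be checked in SMPTE 421M. Names are hypothetical.

```python
from typing import List, Tuple

# Classic 3:2 cadence over four film frames A B C D, as (TFF, RFF):
#   A: top, bottom, top      -> TFF=1, RFF=1 (3 fields)
#   B: bottom, top           -> TFF=0, RFF=0 (2 fields)
#   C: bottom, top, bottom   -> TFF=0, RFF=1 (3 fields)
#   D: top, bottom           -> TFF=1, RFF=0 (2 fields)
CADENCE = [(1, 1), (0, 0), (0, 1), (1, 0)]


def pulldown_flags(num_frames: int) -> List[Tuple[int, int]]:
    """(TFF, RFF) for each 24p frame, repeating the 3:2 cadence."""
    return [CADENCE[i % 4] for i in range(num_frames)]


def field_sequence(flags: List[Tuple[int, int]]) -> str:
    """Expand flags into the displayed field order ('T'op / 'B'ottom)."""
    out = []
    for tff, rff in flags:
        first, second = ("T", "B") if tff else ("B", "T")
        out += [first, second] + ([first] if rff else [])
    return "".join(out)


print(field_sequence(pulldown_flags(4)))        # TBTBTBTBTB
print(len(field_sequence(pulldown_flags(24))))  # 60 fields per 24 frames
```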

crypto

8th March 2007, 00:24

You must have HRD flags with scenarist for example (and certainely other)

I doubt it. In fact I have authored a HD-DVD, which plays perfectly and was encoded with wmvmuxer and converted to ES using stegre's ES writer.

@stegre
The CLI version stopped at 1 GB with the following error, repeated over and over:

Warning: ASF ReplicatedLen of '1' not handled yet; ignoring.

The first 1GB were ok and are accepted by Scenarist. The ES writer does not have that limit.

stegre

8th March 2007, 00:40

Ahh, OK, thanks for the info. Reading, absorbing and understanding the ASF spec is not simple (though it's pretty well written as these things go). But I remember it kept saying "unless ReplicatedLen is '1', in which case refer to sec <nnn>..." and I was too lazy to go through that whole part, just hoping it wouldn't happen ;) And, in fact, I never did see it, but then again I never got involved with files that big; it must have something to do with that. Guess I'll have to do some light reading tonight about "replicated length = 1" ;)
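For what it's worth, my reading of the ASF spec is that a Replicated Data Length of 1 marks a "compressed payload": the Media Object Offset field then carries the presentation time of the first sub-payload, the single replicated-data byte is a presentation time delta, and the payload data is a run of sub-payloads, each preceded by a one-byte length and each holding a whole media object. A minimal sketch under that assumption (verify against the spec before relying on it; the function name is hypothetical):

```python
from typing import List, Tuple


def split_compressed_payload(payload: bytes, presentation_time: int,
                             time_delta: int) -> List[Tuple[int, bytes]]:
    """Split an ASF compressed payload (Replicated Data Length == 1) into
    (presentation time, media object) pairs."""
    objects = []
    pos = 0
    t = presentation_time  # taken from the Media Object Offset field
    while pos < len(payload):
        size = payload[pos]  # one-byte sub-payload length
        pos += 1
        if pos + size > len(payload):
            raise ValueError("sub-payload runs past the end of the payload")
        objects.append((t, payload[pos:pos + size]))
        pos += size
        t += time_delta  # the single replicated-data byte
    return objects


print(split_compressed_payload(bytes([3]) + b"abc" + bytes([2]) + b"de", 1000, 33))
# [(1000, b'abc'), (1033, b'de')]
```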

crypto

8th March 2007, 08:20

OK, great job so far. Needless to say, I am very impressed to see all this happening without a single call into the WMFSDK11 libs. This is really cool stuff.

ACrowley

18th April 2007, 12:15

@stegre

Thx for your Tools

EDIT: now it's clear.

Eastermeyer

24th April 2007, 17:23

I got this Error in Scenarist
"Error O:\test.vc1
Error : Stream contains a GOP with a number of fields = 514; it should not exceed 72.
Warning : Number of SequenceEndCode is 0."

What's up?

And what is the right frame rate to encode at for 720p?
23.976, 29.970, 59.940, or 50.000?

crypto

24th April 2007, 18:52

What encoding settings did you use? You can control the GOP length with the buffer size and the key frame interval.

For the 720p, what's the source's fps?

Eastermeyer

25th April 2007, 16:15

I solved the problem by setting the keyframe interval to 0.25 sec.

15 / 59.940 = ~0.25
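The arithmetic behind that fix, as a sketch: at 59.94 fps a 0.25 s keyframe interval gives about 15 frames per GOP. How Scenarist counts "fields" for progressive 720p is not stated in the thread; the `fields_per_frame=2` default below is an assumption, and the 72-field limit is taken from the error message above. All function names are hypothetical.

```python
def frames_per_gop(keyframe_interval_s: float, fps: float) -> int:
    """Frames between keyframes for a given keyframe interval."""
    return round(keyframe_interval_s * fps)


def gop_fields(keyframe_interval_s: float, fps: float,
               fields_per_frame: int = 2) -> int:
    """GOP length in fields, assuming each frame counts as `fields_per_frame`."""
    return frames_per_gop(keyframe_interval_s, fps) * fields_per_frame


def max_keyframe_interval(max_fields: int, fps: float,
                          fields_per_frame: int = 2) -> float:
    """Longest keyframe interval (seconds) that stays within `max_fields`."""
    return max_fields / (fields_per_frame * fps)


print(frames_per_gop(0.25, 59.94))  # 15
print(gop_fields(0.25, 59.94))      # 30, within the 72-field limit
```

Under the same two-fields-per-frame assumption, `max_keyframe_interval(72, 59.94)` is about 0.6 s, so the chosen 0.25 s leaves plenty of margin.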

My source is 23.976 fps, converted to 59.940 via ChangeFPS in AviSynth.

What bothers me is that the VC-1 file looks much worse than an MPEG-2 file at exactly the same settings?!?
I will upload a sample later...

Is there something weird with my settings?
http://i18.tinypic.com/2e6fzlw.png

Eastermeyer

25th April 2007, 16:53

Here is the sample :
http://files.to/get/434483/33283/vc1_against_mpeg2.rar

Eastermeyer

27th April 2007, 15:21

Hmm ?

Strongling

1st May 2007, 03:47

Hi,

@ Eastermeyer
Here is the sample :

Holy crap dude! I don't know which tool/encoder you were using and with what settings but I just ran it through the asf2vc1.exe tool by stegre and I noticed that it is all Intra frames!

The encoder/encoding tool didn't put in any P/B frames, whereas the MPEG-2 sample attached to your same message had the more familiar IBBPBBPBB... structure. Obviously much of the compression was lost by using only I-frames, and then the encoder, constrained by the bitrate you asked for, had no choice but to raise the quantizer very high, resulting in the very bad quality we see in the WMV version.

On another note to the VC-1 guys: another strange thing I am sure you must have noticed, if you downloaded his samples, is that the WMV version compressed really badly in another way: if you compress it again with a simple RAR, the size goes from 26 MB to just 10 MB! Try it yourself and see. What's going on here? That's very odd to me, given that the output of any half-decent compression scheme, applied properly, should not compress further with other methods.

There is something very wrong with this VC-1 encoded stream even though it is probably fully compliant. I wish I had a VC-1 analyzer to investigate further.

@stegre:

First of all, a very nice tool and utility for experimenting with VC-1 so thanks a lot for that. Good Work.

Now I have a question: I read the readme.txt file in one of the earlier DS filters that you released to demux WMV, including the explanation of how only the Advanced profile is self-contained and the others need a container to be useful. Very useful introduction, by the way, so thanks again.

Now, in this light, my question is whether there is any use for the simple/main profiles at all? I mean all the samples etc. I have been able to see from different sources are all advanced profile, not one of them simple or main. Are there any tools/players/encoders that work with these two profiles at all?

zambelli

1st May 2007, 18:46

On another note to the VC-1 guys: another strange thing I am sure you must have noticed, if you downloaded his samples, is that the WMV version compressed really badly in another way: if you compress it again with a simple RAR, the size goes from 26 MB to just 10 MB! Try it yourself and see. What's going on here? That's very odd to me, given that the output of any half-decent compression scheme, applied properly, should not compress further with other methods.
I haven't downloaded it and looked at it, but like you said, the encoding settings seem a little dubious. Clearly a properly encoded WMV with at least 0.5 second GOPs and good ratecontrol would not be inflated like that. :)

Now I have a question: I read the readme.txt file in one of the earlier DS filters that you released to demux WMV, including the explanation of how only the Advanced profile is self-contained and the others need a container to be useful. Very useful introduction, by the way, so thanks again.
I believe this was done for backwards compatibility with WMV9. Making SP and MP self-contained too would've required changes to the bitstream, which would in turn have rendered all existing WMV9 SP and MP content noncompliant with the VC-1 standard.

Now, in this light, my question is whether there is any use for the simple/main profiles at all? I mean all the samples etc. I have been able to see from different sources are all advanced profile, not one of them simple or main. Are there any tools/players/encoders that work with these two profiles at all?
I think only the HD-DVD and Blu-ray specifications require VC-1 Advanced Profile, precisely because of its ability to be delivered without a container.
Simple and Main are still very useful (e.g. for mobile device playback), but they can only be delivered in a container such as ASF, AVI, MP4, etc. Any standalone CE device that is certified for WMV9 playback supports Simple and Main VC-1 decoding, though "WMV9" certification implies ASF file support specifically.

First of all, a very nice tool and utility for experimenting with VC-1 so thanks a lot for that. Good Work.
Stegre, I second that and I offer my deep apology for dozing off on this thread. I remember I offered to help you out with testing your tool to ensure it matches the results of our Microsoft internal VC-1 ES --> ASF tool - and then work got busy and the thread kinda died out.

If you've got time to refine this tool further, I'd be happy to work with you to ensure the tool produces valid VC-1 ES output.

Strongling

1st May 2007, 20:59

Basically, being an encoder guy, I have never had to worry about the container and other such stuff; I am quite used to dealing with just the raw bitstreams. However, I noticed that with VC-1 it is not entirely possible to avoid the container issue, especially for the Main and Simple profiles.

So I was going through the specs looking for some answers about all this container business and here's how I think it works. Please feel free to correct me wherever I am wrong.

In chapter 3, fig. 5 of the specs, it is stated that the coverage of the specs is as in the following figure:

|                                                            |
|<--------------- coverage of the VC-1 specs --------------->|
|                                                            |
|  compliant                      ____________               |
|  compressed bitstream          |            |  compliant   |
-------------------------------->|            |  YUV 4:2:0   |
|                                |            |  output      |
|                                |  decoding  |------------------->
|  compliant                     |  process   |              |
|  decoder initialization        |            |              |
|  metadata                      |            |              |
-------------------------------->|            |              |
|                                |____________|              |
|                                                            |
So the specs not only cover the format of the encoded bitstream but also define the decoder initialization metadata in Annex J. They just leave out the mechanism for getting it to the decoder. The Advanced profile is the exception, in the sense that it doesn't really need this metadata: it is self-contained and gets all the required parameters from the encoded bitstream alone, namely from the sequence header and entry point header syntax elements, which are all that is needed to decompress the stream.

The problem with the other two profiles is that they have no sequence header or entry point header, so vital information lives only in the decoder initialization metadata, and without it a decoder can't even interpret the bitstream properly. For example, which of the many quantization modes to use is specified in the metadata for the Simple and Main profiles. So without knowing the value of the QUANTIZER syntax element, which is part of the decoder initialization metadata, it is not even possible to inverse quantize the coefficients.

Now, I can see in Annex J that it is really not left up to everyone to define the format of this metadata; it is already strictly defined in the specs, so there is no choice but to follow it. Then in Annex L, it is specified that:

"This annex defines a concatenation of the decoder initialization metadata and compressed frame data so as to embed the information necessary for the decoding process into a common minimal serial bitstream that can be used in defining file or streaming format encodings of the elementary bitstream."

Which is basically telling us that if we concatenate the metadata struct with the compressed bitstream, we get a self-contained unit that can be decoded without any further container. It works for all profiles, including Simple and Main. The annex then recommends doing the concatenation by simply prepending the "Sequence Layer Data Structure" to the stream and a "Frame Layer Data Structure" to each frame's compressed bitstream. For example:

[Sequence Layer Data Struct][Frame Layer Data Struct][Compressed Bitstream for Frame #0][Frame Layer Data Struct][Compressed Bitstream for Frame #1] ...

So the whole thing becomes a kind of simple container itself.
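One way to picture this Annex L layout is the "RCV" format mentioned later in the thread. The sketch below follows my reading of how FFmpeg's vc1test (RCV) muxer writes it, not a verbatim copy of the spec tables, so treat the details as assumptions: a 36-byte sequence layer structure (24-bit NUMFRAMES plus a constant 0xC5 byte, the 4-byte STRUCT_C from the decoder initialization metadata, STRUCT_A with height and width, STRUCT_B with level/HRD/framerate) and an 8-byte frame layer structure per frame (24-bit frame size with a key-frame bit, then a timestamp, assumed here to be in milliseconds), all little-endian. The function names are hypothetical.

```python
import struct


def sequence_layer(num_frames: int, struct_c: bytes, width: int, height: int,
                   framerate: int = 0xFFFFFFFF) -> bytes:
    """Build the 36-byte Annex L sequence layer structure (sketch)."""
    if len(struct_c) != 4:
        raise ValueError("STRUCT_C must be exactly 4 bytes")
    out = bytearray()
    # NUMFRAMES (24 bits) followed by the constant byte 0xC5.
    out += struct.pack("<I", (num_frames & 0xFFFFFF) | (0xC5 << 24))
    out += struct.pack("<I", 4)               # size of STRUCT_C
    out += struct_c                           # STRUCT_C, copied as-is
    out += struct.pack("<II", height, width)  # STRUCT_A: VERT_SIZE, HORIZ_SIZE
    out += struct.pack("<I", 0x0C)            # size of STRUCT_B
    # STRUCT_B: HRD_BUFFER (24 bits) + LEVEL/CBR/RES1 byte (0x80 mirrors what
    # FFmpeg writes; an assumption), then HRD_RATE, then FRAMERATE.
    out += struct.pack("<I", 0x80 << 24)
    out += struct.pack("<I", 0)               # HRD_RATE
    out += struct.pack("<I", framerate)       # 0xFFFFFFFF = variable/unknown
    return bytes(out)


def frame_layer(frame_size: int, is_key: bool, timestamp: int) -> bytes:
    """Build the 8-byte Annex L frame layer structure (sketch)."""
    word = (frame_size & 0xFFFFFF) | (0x80000000 if is_key else 0)
    return struct.pack("<II", word, timestamp)


def build_rcv(struct_c: bytes, width: int, height: int, frames) -> bytes:
    """frames: iterable of (payload_bytes, is_key, timestamp_ms) tuples."""
    frames = list(frames)
    parts = [sequence_layer(len(frames), struct_c, width, height)]
    for payload, is_key, ts in frames:
        parts.append(frame_layer(len(payload), is_key, ts))
        parts.append(payload)
    return b"".join(parts)
```

This is only the "simple container" idea in code: every frame payload is left untouched and the metadata travels in front of it, which is why a reader still has to parse these structures before handing data to a decoder.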

Assuming I got all the above correctly, my questions are:

1. If an encoder generates the bitstream as described in the figure above, will tools recognize it? i.e. if I started making Simple/Main profile bitstreams like that, would they be useful? I ask because I was going to suggest that Stegre modify his tool to support Simple/Main profile in this way too.

@zambelli:

I believe this was done for backwards compatibility with WMV9. Making SP and MP self-contained too would've required changes to the bitstream, which would in turn have rendered all existing WMV9 SP and MP content non-compliant with the VC-1 standard.

2. Does the above mentioned format break any of the existing WMV9 SP and MP content as you mentioned?

@zambelli:

Simple and Main are still very useful (i.e. mobile device playback), but they can only be delivered in a container such as ASF, AVI, MP4, etc.

3. This whole Annex L thing seems to go against what you are saying, i.e. simple/main raw bitstreams cannot live without a container of some kind. Please help me understand that.

Strongling

9th May 2007, 03:12

Hello guys,

So, no takers for the above questions?

stegre

9th May 2007, 06:04

I've been away from the forum a bit; I'll take some time to catch up with this thread shortly, in a day or so...

@strongling: thx! I think I have a minor update of that util I never posted, too, so let me track down where I left off on that as well...

zambelli

10th May 2007, 11:17

Assuming I got all the above correctly, my questions are:

1. If an encoder generates the bitstream as described in the figure above, will tools recognize it? i.e. if I started making Simple/Main profile bitstreams like that, would they be useful? I ask because I was going to suggest that Stegre modify his tool to support Simple/Main profile in this way too.
Well, that depends on the tools' implementations. A standard is only a standard if everybody adheres to it. At this moment VC-1 tools are fairly rare. HD-DVD and Blu-ray authoring tools only use Advanced Profile anyway, so they don't need to worry about SP and MP.

2. Does the above mentioned format break any of the existing WMV9 SP and MP content as you mentioned?
If you're only adding metadata and aren't changing the actual video bitstream - then I would imagine it doesn't break existing content. But the question is - which tools will know how to read such a format? As you said, the moment you start adding simple metadata to the bitstream, you've essentially created a simple container which means it then needs to be parsed before decoding.

But IIRC, Simple and Main video bitstreams DO contain metadata flags. The decoder MUST know whether certain features like Loopfilter and Extended MV Range are enabled (different between SP and MP) before it starts decoding the bitstream. Unlike AP, I think SP and MP only contain one sequence header per stream.

This whole Annex L thing seems to go against what you are saying, i.e. simple/main raw bitstreams cannot live without a container of some kind. Please help me understand that.
It's not that they can't live without containers - but they lack information that would make them self-sufficient. However, they DO contain important decoding metadata.
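To make the point about SP/MP metadata flags concrete: for Simple and Main profile, flags such as LOOPFILTER, EXTENDED_MV and QUANTIZER live in the 4-byte STRUCT_C (the Simple/Main sequence header from the decoder initialization metadata), which a WMV3 file carries as codec-specific data (an assumption worth verifying against the ASF spec). The sketch below assumes the field order FFmpeg's VC-1 decoder uses when reading a Simple/Main sequence header, read MSB-first; the table and function names are hypothetical.

```python
# Field names and bit widths, MSB first (assumed order; see lead-in).
STRUCT_C_FIELDS = [
    ("PROFILE", 2), ("RES_SM", 2), ("FRMRTQ_POSTPROC", 3),
    ("BITRTQ_POSTPROC", 5), ("LOOPFILTER", 1), ("RES_X8", 1),
    ("MULTIRES", 1), ("RES_FASTTX", 1), ("FASTUVMC", 1),
    ("EXTENDED_MV", 1), ("DQUANT", 2), ("VSTRANSFORM", 1),
    ("RES_TRANSTAB", 1), ("OVERLAP", 1), ("SYNCMARKER", 1),
    ("RANGERED", 1), ("MAXBFRAMES", 3), ("QUANTIZER", 2),
    ("FINTERPFLAG", 1), ("RES_RTM_FLAG", 1),
]


def parse_struct_c(data: bytes) -> dict:
    """Decode the 4-byte Simple/Main STRUCT_C into named fields."""
    if len(data) != 4:
        raise ValueError("STRUCT_C is exactly 4 bytes")
    value = int.from_bytes(data, "big")  # bitstream order, MSB first
    fields = {}
    remaining = 32
    for name, width in STRUCT_C_FIELDS:
        remaining -= width
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    return fields
```

A decoder that never sees these 32 bits cannot know, for instance, which QUANTIZER mode or how many B-frames (MAXBFRAMES) to expect, which is why SP/MP streams need something (a container, or an Annex L style wrapper) to deliver them.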

stegre

14th May 2007, 02:39

As for Strongling's format questions, I have perhaps a couple of answers and many of the same questions myself; I do want to get into that, but that'll be a separate post. This one is about Eastermeyer's file:
Hi, @ Eastermeyer
...
The encoder/encoding tool didn't put in any P/B frames, whereas the MPEG-2 sample attached to your same message had the more familiar IBBPBBPBB... structure. Obviously much of the compression was lost due to using I frames only, and then the encoder, constrained by the bitrate you asked for, had no choice but to raise the quantizer very high, resulting in the very bad quality that we see in the WMV version....

That's exactly correct; it's all keyframes, as can be seen from the output of ASF2VC1, as Strongling mentioned (this is v1.1 of the conversion app btw, with a few updates; I'll post it in a day or two).
ASF2VC1 v1.1 (c)2007 Steven G Greenberg
Demuxes WMV WVC1 to VC-1 Elementary Stream. Advanced Profile Only.

Found ASF_Header_Object

-- Found ASF_File_Properties_Object
-- Found ASF_Header_Extension_Object
-- -- Found ASF_Metadata_Object; processing 2 objects
-- -- -- Metadata for stream 2: IsVBR = TRUE
-- -- -- Metadata for stream 2: DeviceConformanceTemplate = AP@L3
-- -- Found ASF_Extended_Stream_Properties_Object for ASF stream 2
-- -- -- avg object period: 16683 uSecs (59.941 / sec)
-- -- -- max object size: 30298 bytes
-- -- -- leaky bucket size: 1720 bytes
-- -- -- leaky bucket init fullness: 0 bytes
-- -- -- leaky bucket bitrate: 8500 KB/s
-- -- -- payload extension systems # 1:
-- -- -- -- Data size: 2 Bytes of extra info: 0
-- Found ASF_Stream_Properties_Object for ASF stream 2
-- Stream type: ASF_Video_Media
-- -- FourCC: WVC1
-- -- Stream correctly marked as VC-1 Advanced Profile
-- -- Now looking for Sequence and Entry Point Bytes...
-- -- Found Sequence and Entry Point Bytes; now copying...
Video stream with highest bitrate is stream 2 (bitrate = 8500 Kb/s)

Successfully retrieved 21 bytes of VC-1 headers from video stream 2.
Bytes as retrieved [concatenation of Sequence and Entry Point Headers]:
00 00 01 0f db fe 27 f1 67 88 80 00 00 01 0e 10 44 9f c5 9c 80

Interpretation of above VC-1 headers:

---- Sequence Header ----
PROFILE: 3
LEVEL: 3 (AP3@L3)
COLORDIFF_FORMAT: 1 (=4:2:0)
FRMRTQ_POSTPROC: 7
BITRTQ_POSTPROC: 31
POSTPROCFLAG: 0
MAX_CODED_WIDTH: 639 (=1280)
MAX_CODED_HEIGHT: 359 (=720)
PULLDOWN: 1
INTERLACE: 0
TFCNTRFLAG: 0
FINTERPFLAG: 0
RESERVED: 1
PSF: 0
DISPLAY_EXT: 0
-DISP_HORIZ_SIZE: --
-DISP_VERT_SIZE: --
-ASPECT_RATIO_FLAG: --
--ASPECT_RATIO: --
---ASPECT_HORIZ_SIZE: --
---ASPECT_VERT_SIZE: --
-FRAMERATE_FLAG: --
--FRAMERATEIND: --
--FRAMERATENR: --
--FRAMERATEDR: --
--FRAMERATEEXP: --
-COLOR_FORMAT_FLAG: --
--COLOR_PRIM: --
--TRANSFER_CHAR: --
--MATRIX_COEF: --
HRD_PARAM_FLAG: 0
-HRD_NUM_LEAKY_BUCKETS: --
-BIT_RATE_EXPONENT: --
-BUFFER_SIZE_EXPONENT: --

---- Entry Point Header ----
BROKEN_LINK: 0
CLOSED_ENTRY: 0
PANSCAN_FLAG: 0
REFDIST_FLAG: 1
LOOPFILTER: 0
FASTUVMC: 0
EXTENDED_MV: 0
DQUANT: 0
VSTRANSFORM: 1
OVERLAP: 0
QUANTIZER: 0
CODED_SIZE_FLAG: 1
-CODED_WIDTH: 639 (=1280)
-CODED_HEIGHT: 359 (=720)
EXTENDED_DMV: --
RANGE_MAPY_FLAG: 0
-RANGE_MAPY: --
RANGE_MAPUV_FLAG: 0
-RANGE_MAPUV: --

Generating modified headers...
-- Inserting Frame Period: 16683 uSecs (59.941 FPS)...
-- Note: PAR not specified in ASF input file- shall remain so in VC-1 output.

Header bytes to be used in output [concatenated Seq and Entry Point Hdrs]:
00 00 01 0f db fe 27 f1 67 8a 27 f8 59 e8 14 88 00 00 00 01 0e 10 44 9f
c5 9c 80 00

Interpretation of above VC-1 headers:

---- Sequence Header ----
PROFILE: 3
LEVEL: 3 (AP3@L3)
COLORDIFF_FORMAT: 1 (=4:2:0)
FRMRTQ_POSTPROC: 7
BITRTQ_POSTPROC: 31
POSTPROCFLAG: 0
MAX_CODED_WIDTH: 639 (=1280)
MAX_CODED_HEIGHT: 359 (=720)
PULLDOWN: 1
INTERLACE: 0
TFCNTRFLAG: 0
FINTERPFLAG: 0
RESERVED: 1
PSF: 0
DISPLAY_EXT: 1
-DISP_HORIZ_SIZE: 1279 (=1280)
-DISP_VERT_SIZE: 719 (=720)
-ASPECT_RATIO_FLAG: 0
--ASPECT_RATIO: --
---ASPECT_HORIZ_SIZE: --
---ASPECT_VERT_SIZE: --
-FRAMERATE_FLAG: 1
--FRAMERATEIND: 0
--FRAMERATENR: 5 (=60000/1001, =59.940 FPS)
--FRAMERATEDR: 2 (=60000/1001, =59.940 FPS)
--FRAMERATEEXP: --
-COLOR_FORMAT_FLAG: 0
--COLOR_PRIM: --
--TRANSFER_CHAR: --
--MATRIX_COEF: --
HRD_PARAM_FLAG: 0
-HRD_NUM_LEAKY_BUCKETS: --
-BIT_RATE_EXPONENT: --
-BUFFER_SIZE_EXPONENT: --

---- Entry Point Header ----
BROKEN_LINK: 0
CLOSED_ENTRY: 0
PANSCAN_FLAG: 0
REFDIST_FLAG: 1
LOOPFILTER: 0
FASTUVMC: 0
EXTENDED_MV: 0
DQUANT: 0
VSTRANSFORM: 1
OVERLAP: 0
QUANTIZER: 0
CODED_SIZE_FLAG: 1
-CODED_WIDTH: 639 (=1280)
-CODED_HEIGHT: 359 (=720)
EXTENDED_DMV: --
RANGE_MAPY_FLAG: 0
-RANGE_MAPY: --
RANGE_MAPUV_FLAG: 0
-RANGE_MAPUV: --

Found ASF_Data_Object
-- len = 26768050 bytes; contains 1673 data packets
Starting ASF demux / VC-1 write...

KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
KKKKKKKKKKKKKKKKKKKKKKKKKKKKK

Done. Summary of frames written:

K [I] (Intra-Coded -"Keyframes") : 1501
. [P] (Predictive) : 0
, [B] (Bidirectionally Predictive): 0
- [BI] (Intra-Coded B-Frames) : 0
d [D] (Skipped -"Duplicated") : 0

Total frames written : 1501
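The "Interpretation of above VC-1 headers" sections can be reproduced with a short bit reader. A sketch, assuming the Advanced Profile sequence header field order exactly as it appears in the log (with the 00 00 01 0F start code stripped before parsing) and the FRAMERATENR/FRAMERATEDR value tables from the VC-1 spec; it skips HRD parameters, and the class and function names are hypothetical.

```python
class BitReader:
    """Reads MSB-first bit fields from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


FRAMERATE_NR = {1: 24000, 2: 25000, 3: 30000, 4: 50000, 5: 60000, 6: 48000, 7: 72000}
FRAMERATE_DR = {1: 1000, 2: 1001}


def parse_ap_sequence_header(payload: bytes) -> dict:
    """Parse an Advanced Profile sequence header (bytes after 00 00 01 0F)."""
    r = BitReader(payload)
    h = {}
    for name, width in [("PROFILE", 2), ("LEVEL", 3), ("COLORDIFF_FORMAT", 2),
                        ("FRMRTQ_POSTPROC", 3), ("BITRTQ_POSTPROC", 5),
                        ("POSTPROCFLAG", 1), ("MAX_CODED_WIDTH", 12),
                        ("MAX_CODED_HEIGHT", 12), ("PULLDOWN", 1),
                        ("INTERLACE", 1), ("TFCNTRFLAG", 1), ("FINTERPFLAG", 1),
                        ("RESERVED", 1), ("PSF", 1), ("DISPLAY_EXT", 1)]:
        h[name] = r.read(width)
    # Coded sizes are stored as (size / 2) - 1, e.g. 639 -> 1280.
    h["width"] = (h["MAX_CODED_WIDTH"] + 1) * 2
    h["height"] = (h["MAX_CODED_HEIGHT"] + 1) * 2
    if h["DISPLAY_EXT"]:
        h["DISP_HORIZ_SIZE"] = r.read(14)  # stored as size - 1
        h["DISP_VERT_SIZE"] = r.read(14)
        if r.read(1):                      # ASPECT_RATIO_FLAG
            if r.read(4) == 15:            # explicit aspect ratio follows
                r.read(16)                 # ASPECT_HORIZ_SIZE, ASPECT_VERT_SIZE
        if r.read(1):                      # FRAMERATE_FLAG
            if r.read(1):                  # FRAMERATEIND = 1: FRAMERATEEXP form
                h["FRAMERATEEXP"] = r.read(16)
            else:
                nr, dr = r.read(8), r.read(4)
                h["fps"] = FRAMERATE_NR[nr] / FRAMERATE_DR[dr]
        if r.read(1):                      # COLOR_FORMAT_FLAG
            r.read(24)                     # COLOR_PRIM, TRANSFER_CHAR, MATRIX_COEF
    h["HRD_PARAM_FLAG"] = r.read(1)
    return h
```

Run on the original header bytes (db fe 27 f1 67 88 80) it yields PROFILE 3, LEVEL 3, 1280 x 720 and DISPLAY_EXT 0; on the modified bytes it additionally yields display size 1279 x 719 and a frame rate of 60000/1001, matching the tool's interpretation.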
Anyway, obviously "all keyframes" is highly inefficient, and the problem is further exacerbated by a number of side issues. Let's take this one step at a time.

1. Why did it use all I frames? I don't think the MS VC-1 encoders accept a value of less than one second for minimum keyframe distance - look what happens if you try to enter your "0.25" value into WME 9 directly:
http://www.ftyps.com/unrelated/screen.png
I think Nic's WMEnc should issue an error for it, but it apparently accepts it; no doubt it becomes "0" by the time it's fed to the actual codec, and the codec, reasonably enough, interprets "0 distance between keyframes" as "nothing but keyframes".

Scenarist's message aside, it's my understanding that VC-1 is supposed to work more like MPEG-4: whereas MPEG 1/2 use short GOP structure "patterns" like the IBBPBBPBB pattern Strongling mentions above, the newer encoders seem to have dispensed with that idea in favor of using I-frames (aka "intra" frames aka keyframes) at scene changes, unless some predefined max number of frames or seconds elapses prior to that (whichever comes first). The latter condition is to ensure reasonable random access capability. But that number would typically be more like a few hundred frames or perhaps several seconds, not 4 per second.

2. Why does it look so bad? First, I don't know if it was intentional as I'm unfamiliar with Blu-ray /Scenarist / HD requirements & capabilities, but the file was encoded at a full ~60 frames (not fields) per second. The same is true for the MPEG sample, but the WMV suffers more (more on that later); for now let's just do the math for the wmv.

The specified bitrate is 8500kbps, which is 1038 KB/s; divide that by the 60 keyframes/sec and you find it has to encode each 1280 x 720 picture individually with only about 17KB per picture. It actually didn't do such a bad job:

http://www.ftyps.com/unrelated/comp542b.png

Above are excerpts of frame 542, a keyframe in both file samples. The top row is how the MPEG file encoded it, using the 48.3KB it allocated to that frame. It wasn't limited to the 17KB average because it "saved up" bits during the 14 out of every 15 frames that weren't "I" frames (the specific GOP pattern used was IBBPBBPBBPBBPBB except at scene changes).

The second row shows how VC-1 did it with the 17KB or so (less overhead) it could use on average; it actually used 14.2KB for this particular frame. So it looks worse than the MPEG, though not half bad considering it's about 30% of the size the MPEG used for the same frame.
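The per-frame budget above is plain arithmetic. A small sketch reproducing it, assuming 1 KB = 1024 bytes (which is what makes 8500 kbps come out to about 1038 KB/s) and the exact 60000/1001 frame rate from the sequence header; the function name is hypothetical.

```python
def per_frame_budget_kib(bitrate_kbps: float, fps: float) -> float:
    """Average KiB available per frame when every frame is a keyframe."""
    bytes_per_second = bitrate_kbps * 1000 / 8  # kbps -> bytes/s
    return bytes_per_second / 1024 / fps        # bytes/s -> KiB per frame


kib_per_second = 8500 * 1000 / 8 / 1024               # about 1037.6
budget = per_frame_budget_kib(8500, 60000 / 1001)     # about 17.3 per frame
vc1_share = 14.2 / 48.3                               # about 0.29 of the MPEG frame
```

The 14.2KB / 48.3KB ratio is where the "about 30%" figure for frame 542 comes from.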

This is a bit of an offshoot, but for comparison, JPEG is completely unable to encode that 1280 x 720 picture in 14.2KB. Even with the JPEG quality setting turned very low, you get the very poor result shown in the third row, and that is at 34.5KB (anything smaller than that was completely unreasonable). So VC-1's "single image compression" far exceeds the capabilities of JPEG; not a total surprise, since it is a far more modern algorithm. But, interestingly, it is not as good as JPEG-2K. JPEG-2K could, and did, make the image shown in the fourth row using the same 14.2KB the VC-1 had to work with, and clearly outperformed it.

Which is interesting: if you did need a movie file that was all keyframes (for editing purposes or whatever), "motion JPEG-2000" would apparently outperform VC-1. Which makes me wonder why VC-1 doesn't use the JPEG-2000 algorithm for keyframes and the existing algorithm for everything else. Maybe it would be too slow, or there are other issues. But presumably such a codec should outperform VC-1. Perhaps I should create & copyright/patent such a hybrid ;)

3. There's a third big problem with both of Eastermeyer's encodes, the MPEG and the VC-1, but it's quite unrelated to anything discussed above and this post has gotten quite long, so I'm going to write up some comments on that and post them separately, maybe tomorrow.

zambelli

14th May 2007, 08:05

That's exactly correct; it's all keyframes, as can be seen from the output of ASF2VC1, as Strongling mentioned (this is v1.1 of the conversion app btw, with a few updates; I'll post it in a day or two).
My offer to help you test this still stands. Let me know.

1. Why did it use all I frames? I don't think the MS VC-1 encoders accept a value of less than one second for minimum keyframe distance - look what happens if you try to enter your "0.25" value into WME 9 directly:
I think Nic's WMEnc should issue an error for it, but it apparently accepts it; no doubt it becomes "0" by the time it's fed to the actual codec, and the codec, reasonably enough, interprets "0 distance between keyframes" as "nothing but keyframes".
Actually, this is only a limitation in the WME9 UI. The limitation does not exist in the WME9 SDK, nor in the WMV9 codec DMO. If you use wmcmd.vbs (http://www.citizeninsomniac.com/WMV), you can specify floating point key distances < 1.0 seconds.

You are correct, though, that specifying key distance = 0 means "all I frames".

Scenarist's message aside, it's my understanding that VC-1 is supposed to work more like MPEG-4: whereas MPEG 1/2 use short GOP structure "patterns" like the IBBPBBPBB pattern Strongling mentions above, the newer encoders seem to have dispensed with that idea in favor of using I-frames (aka "intra" frames aka keyframes) at scene changes, unless some predefined max number of frames or seconds elapses prior to that (whichever comes first).
That's not a codec spec difference - that's an implementation difference. An MPEG-2 encoder could act just the same, and some do. Many MPEG-2 encoders allow you to both insert I frames at scene changes as well as use open GOPs - knowing very well the latter isn't good for editing.

The latter condition is to ensure reasonable random access capability.
Actually, that's not the main reason. I frames at scene changes are more efficient. If you were to encode a scene change frame as a P or B frame, you'd end up coding 100% error residuals because all your motion vectors from the previous frame would be wrong. That would also mean that all the frames that followed would be equally inefficient. Coding the scene change frame as an I frame ensures more efficient encoding. Easy editing and chapter breaks are just nice perks.
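A toy sketch of the keyframe policy discussed here: start an I frame at a detected scene change, or when a maximum key distance is reached. The per-frame difference score, the threshold and the parameter names are all hypothetical, real encoders use their own scene-change metrics, and B frames are ignored for simplicity.

```python
def place_keyframes(diffs, scene_threshold=0.5, max_key_distance=300):
    """Return 'I' or 'P' for each frame.

    diffs[i] is a hypothetical 0..1 score of how different frame i is from
    frame i-1 (diffs[0] is ignored; the first frame is always intra).
    """
    types = []
    since_key = 0  # frames elapsed since the last I frame
    for i, d in enumerate(diffs):
        if i == 0 or d >= scene_threshold or since_key >= max_key_distance:
            types.append("I")  # scene change, start of stream, or max distance hit
            since_key = 0
        else:
            types.append("P")
        since_key += 1
    return types
```

With max_key_distance=0 every frame becomes an I frame, which matches the "key distance = 0 means all I frames" behaviour described above.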

2. Why does it look so bad? First, I don't know if it was intentional as I'm unfamiliar with Blu-ray /Scenarist / HD requirements & capabilities, but the file was encoded at a full ~60 frames (not fields) per second. The same is true for the MPEG sample, but the WMV suffers more (more on that later); for now let's just do the math for the wmv.
Based on your sequence header information, I also notice that the encoder didn't use any advanced VC-1 features, not even in-loop filtering. That encode could probably be done with much higher quality.

Which is interesting - if you did need a movie file that was all keyframes (for editing purposes or whatever), "motion JPEG-2000" would apparently outperform VC-1. Which makes me wonder why VC-1 doesn't use the JPEG-2000 algorithm for keyframes and use the existing algorithm for everything else.
:)
You can't just mix and match codecs like that! JPEG-2000 is patented, just like most other codecs. There'd be some serious licensing and legal work involved in something like that.

foxyshadis

14th May 2007, 13:23

J2K is a wavelet codec, and multiscale wavelets are designed to do much better at the bottom of the quality scale, but are only marginally better or worse as you raise it, and their usefulness in delta frames is very marginal over DCT. More importantly, they're painfully slow, although as far as I know there may be no fully optimized wavelet transforms in software currently. Codecs have to balance speed and compression, which is part of why VC-1 chose not to use arithmetic encoding (CABAC). End-user formats like VC-1 and H.264 aren't really designed for I-frame-only efficiency anyway, since that makes no sense for their intended use. MJPEG2K is a great capture format if you have the hardware for it, though.

Optional multiscale wavelet I-frames is still a pretty cool idea for low bitrate video that I hope gets exploited someday, especially with more advanced directional wavelet research progressing. Much of the academic research isn't actually patented, as far as I know, though many of the underlying techniques probably are (as in video coding).

stegre

16th May 2007, 07:26

@zambelli, foxyshadis: Thx for the offer, Zambelli, and thanks to both of you for the informative posts, much of which I didn't know (esp. the JPEG2K/wavelet stuff). BTW, I was really only kidding about the "patent" thing; I wasn't quite setting up bank accounts yet ;)

And I still have to get back to my significant third point about Eastermeyer's files (including an explanation of Strongling's WinRar compression observation on Eastermeyer's VC-1 file), and to Strongling's other posts, but for now I only have time for this one comment:

Actually, that's not the main reason. I frames at scene changes are more efficient. If you were to encode a scene change frame as a P or B frame, you'd end up coding 100% error residuals because all your motion vectors from the previous frame would be wrong. That would also mean that all the frames that followed would be equally inefficient. Coding the scene change frame as an I frame ensures more efficient encoding... That's not a codec spec difference - that's an implementation difference. An MPEG-2 encoder could act just the same, and some do. Many MPEG-2 encoders allow you to both insert I frames at scene changes

I am aware that "I" frames are also added for coding efficiency; what I was saying is that in MPEG-4, if, for example, there was no significant "scene change" for a long time (and hence nothing to be gained in regard to what you're describing), encoders will still add one anyway at some maximum interval for random accessibility. And while your point that MPEG-1/2 don't enforce any limits may be true, a [standard resolution] DVD-compliant MPEG-2 file does: I believe 18 frames or 15 frames max for NTSC / PAL DVDs respectively.

GSpot's semi-obscure "VGS" function (which I do plan to further enhance) shows this clearly. The diagram below is the actual frame layout that TMPGEnc used for Eastermeyer's MPEG-2 file: a fairly regular, fixed GOP pattern of length 15 (though I suppose it could have used 18). The places where the pattern breaks (e.g. the ones marked "4" and "13") are places where TMPGEnc no doubt detected a "scene change" and decided to "break the pattern" for exactly the reason you mention, coding efficiency, not random accessibility. (It also used a "closed GOP" when breaking the pattern, whereas the others are "open GOPs"; that is probably for "editability", but that's a technicality beyond the scope of my point here.)

http://www.ftypes.com/unrelated/vgs_mpeg2.png

By contrast, note the DivX MPEG-4 layout in this section of a "professionally encoded" movie trailer from the DivX people themselves. There are no patterns or obvious limitations at all; presumably this "layout" was chosen for maximal coding efficiency. Most of the "I" frames are there because the encoder decided that was the more efficient route to go. BUT: note the gap I labeled 300. That one is for random accessibility: there is no scene change there, but the DivX settings they used apparently specified a "max I frame" distance of 300. These two diagrams clearly show a categorical difference in encoding styles, and that was my point about "modern encoders" and the trend difference with regard to "I" frames.

http://www.ftyps.com/unrelated/vgs_mpeg4.png

Anyway, I'll try to get more directly back on topic as soon as I have a chance to post again; my time is a bit constrained...

stegre

17th May 2007, 07:33

Anyway, really quickly, my last and probably most important point is that Eastermeyer's files are 2.5 times as large (or "2.5 times worse", for the same file size) as they "should" be - in addition to anything else mentioned above - for a very simple reason:

Apparently, the media was originally 24FPS material and the frame rate was increased to a full 60 FPS. Both the VC-1 and MPEG encoders achieved this by identically duplicating the original frames in a 3,2,3,2... pulldown pattern. This isn't even your "regular 3:2 pulldown" - this involves twice that inefficiency because the final file is 60 full progressive frames (not 30 frames / 60 fields) per second.

I don't know what the requirements were; certainly if this file is just for playing on a PC, then the extra frames should simply be removed & the framerate changed to 24FPS - the resulting file would be 40% the size (or 2.5 times "better") and play "temporally smoother" and require only 40% of the CPU usage to boot.
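The 24-to-60 frame duplication described here can be written as a one-line mapping from output frame index to source frame index. A sketch, assuming plain whole-frame repetition (no field-based telecine); the function names are hypothetical. This naive mapping gives a 3,2,3,2... cadence; the exact phase in a real file may differ, but it always averages 60/24 = 2.5 output frames per source frame.

```python
def source_frame_for_output(i: int, src_fps: int = 24, out_fps: int = 60) -> int:
    """Index of the source frame shown at output frame i (simple frame repeat)."""
    return i * src_fps // out_fps


def repeat_counts(n_out: int, src_fps: int = 24, out_fps: int = 60):
    """How many times each source frame is repeated in the first n_out outputs."""
    counts = {}
    for i in range(n_out):
        s = source_frame_for_output(i, src_fps, out_fps)
        counts[s] = counts.get(s, 0) + 1
    return [counts[k] for k in sorted(counts)]
```

Keeping only the 24 unique frames per second is what gives the "40% the size" figure: 24 / 60 = 0.4.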

If it's for a Blu-ray disc, I'm not familiar with the requirements; I assume they have a mechanism similar to standard DVDs which allows the hardware to take care of 24p to 60i (or 24p to 60p?) conversion. In any event, if 60 FPS is required for whatever reason, both VC-1 and MPEG (and even AVI, as a container) have "duplicate frame" constructs which could/should have been used.

In the VC-1 case, the encoder was probably "prohibited" from using any such mechanism because the settings ended up requiring "all keyframes", as previously discussed. I don't know what the MPEG encoding setup was, but TMPGEnc was apparently either "unaware" of the duplicates, or prevented from using MPEG's so-called "not coded" bit - perhaps also due to its settings.

What was surprising to me was that even if TMPGEnc was running in a "dumb" or "restricted" mode like that, you'd think that many or most of the interframes would be tiny, since they're encoding "zero difference" from the previous reference frame. Surprisingly, that did not seem to be the case, which is an item of some curiosity.

But back to the VC-1 - it consists of a huge amount of precisely repetitive data, and that's the reason behind Strongling's observation that WinRar was able to compress that file 2:1 without much sweat.

Below are the details of the first 100 frames of the raw VC-1 data from Eastermeyer's VC-1 file. As can be seen, it's a highly repetitive series of data blocks, identical right down to the bit level, as the CRCs I've included show. As noted at the bottom, simply removing this redundancy by any of the methods above would result in a 2.5X improvement in quality/size ratio.
VC-1 (60FPS) Frame#, type, size & CRC Orig 24FPS Frame#
======================================= =================
0: [K], 18600 bytes, CRC32 = 7622-ACFB 0
1: [K], 18600 bytes, CRC32 = 7622-ACFB .
2: [K], 18679 bytes, CRC32 = 5140-624E 1
3: [K], 18679 bytes, CRC32 = 5140-624E .
4: [K], 18676 bytes, CRC32 = 4BDA-DBC6 2
5: [K], 18676 bytes, CRC32 = 4BDA-DBC6 .
6: [K], 18676 bytes, CRC32 = 4BDA-DBC6 .
7: [K], 18627 bytes, CRC32 = C6AE-BE0B 3
8: [K], 18627 bytes, CRC32 = C6AE-BE0B .
9: [K], 18627 bytes, CRC32 = 00E4-9C5E 4
10: [K], 18627 bytes, CRC32 = 00E4-9C5E .
11: [K], 18627 bytes, CRC32 = 00E4-9C5E .
12: [K], 18544 bytes, CRC32 = 9164-B266 5
13: [K], 18544 bytes, CRC32 = 9164-B266 .
14: [K], 18417 bytes, CRC32 = A547-CE26 6
15: [K], 18417 bytes, CRC32 = A547-CE26 .
16: [K], 18417 bytes, CRC32 = A547-CE26 .
17: [K], 18548 bytes, CRC32 = 0263-DE72 7
18: [K], 18548 bytes, CRC32 = 0263-DE72 .
19: [K], 18656 bytes, CRC32 = F94C-9602 8
20: [K], 18656 bytes, CRC32 = F94C-9602 .
21: [K], 18656 bytes, CRC32 = F94C-9602 .
22: [K], 18642 bytes, CRC32 = 4BFD-8279 9
23: [K], 18642 bytes, CRC32 = 4BFD-8279 .
24: [K], 18788 bytes, CRC32 = C5F8-D0EC 10
25: [K], 18788 bytes, CRC32 = C5F8-D0EC .
26: [K], 18788 bytes, CRC32 = C5F8-D0EC .
27: [K], 18819 bytes, CRC32 = 9A3E-EA1E 11
28: [K], 18819 bytes, CRC32 = 9A3E-EA1E .
29: [K], 18816 bytes, CRC32 = AFD0-29C0 12
30: [K], 18816 bytes, CRC32 = AFD0-29C0 .
31: [K], 18816 bytes, CRC32 = AFD0-29C0 .
32: [K], 18929 bytes, CRC32 = D044-FB7D 13
33: [K], 18929 bytes, CRC32 = D044-FB7D .
34: [K], 18933 bytes, CRC32 = 6BAB-4ABE 14
35: [K], 18933 bytes, CRC32 = 6BAB-4ABE .
36: [K], 18933 bytes, CRC32 = 6BAB-4ABE .
37: [K], 18944 bytes, CRC32 = F7E7-77C3 15
38: [K], 18944 bytes, CRC32 = F7E7-77C3 .
39: [K], 19011 bytes, CRC32 = 34E6-07B6 16
40: [K], 19011 bytes, CRC32 = 34E6-07B6 .
41: [K], 19011 bytes, CRC32 = 34E6-07B6 .
42: [K], 19038 bytes, CRC32 = 93F5-86B2 17
43: [K], 19038 bytes, CRC32 = 93F5-86B2 .
44: [K], 18994 bytes, CRC32 = E3D4-93C7 18
45: [K], 18994 bytes, CRC32 = E3D4-93C7 .
46: [K], 18994 bytes, CRC32 = E3D4-93C7 .
47: [K], 18990 bytes, CRC32 = 3B74-B4DF 19
48: [K], 18990 bytes, CRC32 = 3B74-B4DF .
49: [K], 18994 bytes, CRC32 = 5366-6A36 20
50: [K], 18994 bytes, CRC32 = 5366-6A36 .
. . . . .
. . . . .
=========== ===========
26677761 total, incl dups 10761886 total w/o dups

Total VC-1 frame data is 2.48 (~2.5) times what it "should be" (60/24 = 2.5)
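The redundancy check in the table can be sketched in a few lines: CRC32 each frame payload (as in the table) and drop consecutive repeats. This assumes the frames have already been demuxed into separate byte strings; the function name is hypothetical, and because a CRC match alone does not prove two frames are identical, the sketch also compares the bytes.

```python
import zlib


def collapse_duplicates(frames):
    """Drop frames whose bytes repeat the previous frame exactly.

    Returns (kept_frames, total_bytes_with_dups, total_bytes_without_dups).
    """
    kept = []
    prev_crc, prev = None, None
    for payload in frames:
        crc = zlib.crc32(payload)
        # CRC32 is the quick check; the byte comparison rules out collisions.
        if crc == prev_crc and payload == prev:
            continue
        kept.append(payload)
        prev_crc, prev = crc, payload
    total = sum(len(f) for f in frames)
    unique = sum(len(f) for f in kept)
    return kept, total, unique
```

Applied to the totals above, 26677761 / 10761886 is about 2.48, close to the 60/24 = 2.5 you would expect from the duplication pattern.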

Strongling

17th May 2007, 14:42

Ah, so the mystery is solved, thanks stegre!

What was surprising to me was that even if TMPGEnc was running in a "dumb" or "restricted" mode like that, you'd think that many or most of the interframes would be tiny, since they're encoding "zero difference" from the previous reference frame. Surprisingly, that did not seem to be the case, which is an item of some curiosity.

What interframes? Didn't we just all agree that there were no interframes, that it's all intra? Anyway, I don't know much about TMPGEnc, so maybe I misunderstood and you were talking about something else.

Any chance of getting the source for the wonderful VC-1 analyzer/tool that you wrote soon?

stegre

17th May 2007, 15:01

His VC-1 sample was 100% intraframes; the MPEG was not.

In fact, if you scroll up a bit to the upper of the two colorful box diagrams above, you can see the exact way TMPGEnc laid out the frames (the bottom pic is an arbitrary example file, but the top one shows the first 900 frames of Eastermeyer's actual MPEG file).

Yes, I do plan to release the source for the util; I'll try to neaten it up over the weekend (I should have some time then) and post the latest source and a compiled binary.

stegre

28th May 2007, 02:49

ASF2VC1 v1.2 (http://www.ftyps.com/unrelated/asf2vc1/)

"A small free command line utility which quickly and losslessly converts a Windows Media WMV9 Advanced Profile (i.e. a WVC1 "*.wmv" file) into non-Microsoft-specific VC-1 "encapsulated elementary" bitstream file (i.e. a "*.vc1" file)."

(no source code yet; see webpage for more info)

Feedback appreciated, especially regarding the potential use of the app for Blu-ray or HD DVD authoring. Post here or email.

Thanks to Zambelli for all his help in getting this application to this point in its development.

- Steve

akupenguin

28th May 2007, 07:19

Which is interesting: if you did need a movie file that was all keyframes (for editing purposes or whatever), "motion JPEG-2000" would apparently outperform VC-1. Which makes me wonder why VC-1 doesn't use the JPEG-2000 algorithm for keyframes and the existing algorithm for everything else. Maybe it would be too slow, or there are other issues. But presumably such a codec should outperform VC-1. Perhaps I should create & copyright/patent such a hybrid

Because it would massively increase implementation complexity. As-is, there's essentially no code specific to I-frames, they're just frames containing only I-blocks. But even if you skip all the scalability features of JPEG-2000, you're still adding a wavelet and a whole new entropy coder.
Maybe speed is also an issue. I don't know if JPEG-2000 is inherently slow, or if all implementations I've seen are just inefficient.

Strongling

28th May 2007, 14:29

@stegre

I read the details about ASF2VC1 on the website you mentioned. One question remains: how does the tool deal with simple/main profile files? You can't just add the entrypoint and sequence headers to them, right?

Were any of the 9 files from Microsoft, that you mentioned, simple/main profile?

Also are those files freely available for anyone to download? If yes, can you please give the link.

Thanks

zambelli

1st June 2007, 11:37

I read the details about ASF2VC1 on the website you mentioned. One question remains: how does the tool deal with simple/main profile files? You can't just add the entrypoint and sequence headers to them, right?
Simple and Main Profiles don't have enough information in them to be self-contained, therefore it's not possible to make elementary streams out of them. They essentially require a container, even just a simple one.

Were any of the 9 files from Microsoft, that you mentioned, simple/main profile?
Nope, I only shared out Advanced Profile encodes with Steve.

Also are those files freely available for anyone to download? If yes, can you please give the link.
I think that could be arranged, though there's little mystery left about them. They were generated as WMVs using WME9 and the free WMV9 codec, then converted to .vc1 ES using internal Microsoft test tools. Now that Steve has finished v1.2 of ASF2VC1, you can pretty much do the same thing at home. :)

foxyshadis

2nd June 2007, 07:59

Because it would massively increase implementation complexity. As-is, there's essentially no code specific to I-frames, they're just frames containing only I-blocks. But even if you skip all the scalability features of JPEG-2000, you're still adding a wavelet and a whole new entropy coder.
Maybe speed is also an issue. I don't know if JPEG-2000 is inherently slow, or if all implementations I've seen are just inefficient.

The entropy coder is just an AC (arithmetic coder); I'm sure CABAC would be roughly equivalent, and it could be compatible with CAVLC.

The rest is just 4 or 5 steps of inverse wavelet transform: add the layer differences, bilinear upsize by 2x, repeat. I wouldn't be surprised if it were possible to get good things out of a multiscale DCT (and a multitap filter, perhaps the same one used for qpel), although I'm sure decoding complexity would skyrocket then.
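The "upsize by 2x, add the layer difference, repeat" loop described above can be shown with a toy 1-D multiscale coder. This is a Laplacian-pyramid-style illustration chosen for brevity. It is not JPEG-2000's discrete wavelet transform, which uses lifting filters instead. The sketch uses integer arithmetic so that reconstruction is exact, and it leaves out quantization and entropy coding, which in a real codec would operate on the residuals. All function names are hypothetical.

```python
def downsample(x):
    # Halve the resolution by averaging neighbouring pairs (len(x) must be even).
    return [(x[i] + x[i + 1]) // 2 for i in range(0, len(x), 2)]


def upsample(y):
    # Double the resolution by linear interpolation (1-D analogue of bilinear);
    # the last sample is repeated at the right edge.
    out = []
    for i, v in enumerate(y):
        nxt = y[i + 1] if i + 1 < len(y) else v
        out += [v, (v + nxt) // 2]
    return out


def build_pyramid(x, levels):
    # Encoder side: keep only the coarsest signal plus one residual per level.
    residuals = []
    cur = list(x)
    for _ in range(levels):
        low = downsample(cur)
        residuals.append([a - b for a, b in zip(cur, upsample(low))])
        cur = low
    return cur, residuals


def reconstruct(base, residuals):
    # Decoder side: upsample by 2x, add that level's difference, repeat.
    cur = list(base)
    for res in reversed(residuals):
        cur = [u + d for u, d in zip(upsample(cur), res)]
    return cur
```

For a 16-sample input and 4 levels, `build_pyramid` leaves a single base sample plus residuals of length 16, 8, 4 and 2, and `reconstruct` returns the original samples.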

I'm just musing because I-frames are still as much a bane to low-rate encoding now as when MPEG-1 was introduced. Ah well.

DanielCardenas

7th November 2007, 00:01

I'm having trouble demuxing: http://download.microsoft.com/download/e/a/d/eadb9b42-728b-42b0-bfdf-b472fa2a2464/Step_into_Liquid_1080.exe
Which I retrieved from: http://www.microsoft.com/windows/windowsmedia/musicandvideo/hdvideo/contentshowcase.aspx

I believe it is because the techniques listed in this thread work for Advanced Profile, but not for Main Profile. Is that correct?
Is there a solution for Main Profile? Is it to encapsulate the stream in an RCV file?

Thanks,
Daniel

crypto

7th November 2007, 08:19

Exactly: only the Advanced Profile defines the headers that are necessary to build elementary streams outside of a container.

You are also right about assuming RCV for the Main Profile, but I am not aware of any tools for that.
See: What's the difference between VC-1 RCV with VC-1 elementary stream? (http://forum.doom9.org/showthread.php?t=131117)
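Main Profile frames have no start codes. Because of that, something like RCV has to supply frame boundaries with an explicit size field in front of each frame. The sketch below packs and walks such per-frame records. The layout it uses is an assumption based on my understanding of RCV and should be checked against SMPTE 421M Annex L or a working muxer. That assumed layout is a little-endian 32-bit word with the frame size in the low 24 bits and a key-frame flag in bit 31, followed by a 32-bit timestamp. The file header carrying the STRUCT_A/B/C sequence-layer data is omitted. `pack_frame` and `unpack_frames` are hypothetical names.

```python
import struct

# ASSUMED per-frame record layout (verify before relying on it):
#   uint32 LE: frame size in the low 24 bits, key-frame flag in bit 31
#   uint32 LE: timestamp
KEY_FLAG = 0x80000000
SIZE_MASK = 0x00FFFFFF


def pack_frame(frame: bytes, timestamp: int, keyframe: bool) -> bytes:
    if len(frame) > SIZE_MASK:
        raise ValueError("frame too large for a 24-bit size field")
    size_field = len(frame) | (KEY_FLAG if keyframe else 0)
    return struct.pack("<II", size_field, timestamp) + frame


def unpack_frames(data: bytes):
    # The size prefix is the only thing marking frame boundaries here,
    # since Main Profile frame data carries no start codes.
    frames, pos = [], 0
    while pos + 8 <= len(data):
        size_field, timestamp = struct.unpack_from("<II", data, pos)
        pos += 8
        size = size_field & SIZE_MASK
        if pos + size > len(data):
            raise ValueError("truncated frame record")
        frames.append((timestamp, bool(size_field & KEY_FLAG), data[pos:pos + size]))
        pos += size
    return frames
```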

stegre

3rd December 2007, 06:34

I've just released the complete source code for ASF2VC1, along with additional technical documentation.

http://www.ftyps.com/unrelated/asf2vc1/

crypto

4th December 2007, 00:05

Thank you Steve. That's great news.

Conspicuous57

5th February 2008, 11:05

I have a file with a .wmv extension that is encoded with VC-1.

I tried to demux the audio with VDM (VirtualDubMod) 1.5.10.1, and it didn't work :?:

First I wrote an AviSynth script. It looks like:
DirectShowSource("E:\sub\iguana-subg.1080p.5.1.wmv")

Then I opened it with VDM, went to the Streams tab, selected the audio and pressed Demux.

That is all I did for the audio. I named the output file "test" with no extension, and it gave me a 4.37 GB file without any extension that I wasn't able to play.

Can anyone give me a proper solution? How can I demux the AC3 audio?

PleXuS

10th February 2008, 06:24

I've just released the complete source code for ASF2VC1, along with additional technical documentation.

http://www.ftyps.com/unrelated/asf2vc1/

Do you maybe know what this means?

Warning: ASF ReplicatedLen of '1' not handled yet; ignoring.

I get this with my VC-1 WMV file :|

best regards,
PleXuS

stegre

12th March 2008, 05:38

Sorry about the delayed response, but I've been very busy. Meanwhile I've gotten at least three emails regarding this exact question. My [admittedly somewhat lame ;)] response:

I do specifically remember putting in that message. There was some field in the ASF/WMV spec which I needed to use, and the spec did explain exactly what it was, but kept saying "unless the value is '1', in which case the meaning is something totally different" (at least that's the gist of what it said, as I recall, in a very simplified form). But for its part, the spec did go on to describe what "1" meant too.
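For reference, here is my reading of the ASF specification's "compressed payload" case. Treat it as an assumption and check it against the spec's payload-parsing section. When Replicated Data Length is 1, the field that normally holds the media object offset carries the presentation time instead, and the single replicated-data byte is a presentation-time delta. The payload data is then a run of sub-payloads, each preceded by a one-byte length and each holding a complete media object. The sketch below only shows the splitting step, not parsing of the surrounding packet headers. `split_compressed_payload` is a hypothetical name.

```python
def split_compressed_payload(payload: bytes, presentation_time: int, time_delta: int):
    """Split an ASF compressed payload (Replicated Data Length == 1).

    ASSUMED layout: repeated [1-byte sub-payload length][sub-payload bytes].
    presentation_time comes from the field normally used as the media object
    offset; time_delta is the single replicated-data byte.
    """
    objects, pos, t = [], 0, presentation_time
    while pos < len(payload):
        size = payload[pos]
        pos += 1
        if pos + size > len(payload):
            raise ValueError("sub-payload runs past end of payload")
        objects.append((t, payload[pos:pos + size]))
        pos += size
        t += time_delta  # each following media object is time_delta later
    return objects
```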

I just remember that at the time it was tough work poring thru the highly technical VC-1 spec (fortunately I didn't need to understand all 493 pages of it!) And although it's far more common (to say the least), and has a far, far shorter spec, a thorough understanding of the ASF/WMV "container" isn't totally trivial either - and I needed that knowledge in order to demux the WMV to create the VC-1.

So I was pretty "OD'd" on specs by the time I got ASF2VC1 working to my satisfaction, and I took the "lazy man's way out" by ignoring the "ReplicatedLen = 1" issue, because none of the files I tested ever came up with it. So I figured it was some oddball thing no one used. Though, in my defense, I should mention that I did at least take the time to put in the message you're seeing, in case I ever encountered a value of "1" for the field.

But apparently it's not as "oddball" as I thought, so I'll read up on that section of the spec and see if I can make appropriate modifications to ASF2VC1, hopefully this weekend. It'll probably end up being something much simpler than I'm making it out to be, anyway.

-Steve

Isochroma

12th March 2008, 07:55

Perchance, what does ReplicatedLen = 1 mean in the spec?

stegre

12th March 2008, 13:07

I'm going to re-read it tonight. I just remember it had some "special" meaning compared to other values.

stegre

18th March 2008, 06:16

I didn't forget about this; it is indeed a "special case" in the spec, complex enough that I'd need a sample file to update my program reliably. And it took me until now (well, I wasn't exactly working on it full time ;)) to get such a sample. Now I can start on the actual fix, and maybe add a few other things too (e.g. PAR and frame rate command-line override switches). It'll probably be another week before I actually finish the update, though.