Help with VC-1 transform


Strongling
28th April 2007, 06:50
Hello,


I am experimenting with the DCT-like integer transform used in VC-1.

However, I see that when I take any sequence through a forward-transform inverse-transform cycle, the final output doesn't match with the initial input.

The differences are slight and the picture is quite clear, but still shouldn't it be exactly the same thing?

I think the transform is advertised as "lossless", right?

If it is supposed to be lossless, then I have (obviously) made a mistake somewhere. I am attaching the source code. If somebody with experience could please look at it, it would help me a lot.


Thanks and regards

Sulik
28th April 2007, 08:15
The VC1 transform is NOT lossless - where did you get that idea?

akupenguin
28th April 2007, 15:50
While DCT is mathematically reversible, any finite-precision implementation of it won't be. Even H.264, which has a lossless mode, has to disable the DCT to get there.
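To make the point concrete, here is a minimal sketch, assuming an 8-point orthonormal floating-point DCT and random 8-bit samples (the helper names and the test data are made up for illustration). Without rounding the coefficients, the round trip is exact after the final rounding; once the coefficients are rounded to integers, as they must be before they are coded, some blocks no longer come back identical.

```python
import math
import random

N = 8


def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


T = dct_matrix(N)


def forward(x):
    return [sum(T[k][j] * x[j] for j in range(N)) for k in range(N)]


def inverse(c):
    # The matrix is orthonormal, so the inverse is its transpose.
    return [sum(T[k][j] * c[k] for k in range(N)) for j in range(N)]


def round_trip(x, round_coeffs):
    c = forward(x)
    if round_coeffs:
        c = [round(v) for v in c]  # finite-precision coefficients
    return [round(v) for v in inverse(c)]


random.seed(0)
blocks = [[random.randrange(256) for _ in range(N)] for _ in range(1000)]
exact = sum(round_trip(b, False) == b for b in blocks)
lossy = sum(round_trip(b, True) != b for b in blocks)
print("exact without coefficient rounding:", exact)
print("mismatching blocks with coefficient rounding:", lossy)
```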

Strongling
28th April 2007, 23:04
Thanks for the help guys!

@Sulik
The VC1 transform is NOT lossless - where did you get that idea?

I really thought that the reason all these "integer" transforms are appearing is that they are losslessly reversible. Otherwise what is the point?

I used to think that all these integer implementations have worse energy compaction capabilities than the plain old DCT. Then there has to be a reason to choose any of them over DCT. The reason, again as I used to think, was better reversibility. By the way, I totally made up the term "better reversibility", I hope you understand what I mean. So my thinking was that at the cost of slightly less compression, VC-1 gives a lossless implementation in integer arithmetic.

That's why I started working on it, I want to have an arrangement to test the VC-1 transform against the DCT, but as I said before, I am afraid that I might have made a mistake in my VC-1 transform implementation. I guess I can verify the inverse VC-1 transform, trouble is, the forward VC-1 implementation is nowhere to be seen. Is there an open-source implementation available somewhere?

After some initial experiments, I can tell you that in fact the VC-1 transform is giving better energy compaction than the DCT, however, the cost is distortion. i.e. the DCT-IDCT cycle is relatively more lossless than the VCT-IVCT cycle. Allow me to call the VC-1 transform the VCT.

Testing on the old Foreman QCIF sequence, here are some numbers:

DCT:
sad_src: 5577205, sad_dct: 947716, sad_io: 3238

VCT:
sad_src: 5577205, sad_vct: 827430, sad_io: 15488

The first number, sad_src, is the total sad (i.e. representing energy) of the source frame. The second number, sad_dct or sad_vct, is the total sad of the forward-transformed frame. The third number, sad_io, is the sad between the initial source frame and the frame after the forward-inverse transform cycle. I have attached the results for the entire 400-frame sequence; the numbers above are for frame number 1 only.
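For readers who want to reproduce these figures, the three measurements could be computed roughly as follows. This is only a sketch: the function names `sum_abs` and `sad` and the toy four-sample "frame" are invented for illustration and are not taken from the attached source.

```python
def sum_abs(frame):
    """Sum of absolute sample values: the 'energy' figure (sad_src, sad_dct, sad_vct)."""
    return sum(abs(v) for v in frame)


def sad(a, b):
    """Sum of absolute differences between two equally sized frames (sad_io)."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of samples")
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy data standing in for a frame, its coefficients and the round-trip output.
src = [100, 102, 98, 101]
coeffs = [200, -1, 2, 0]             # hypothetical forward-transform output
reconstructed = [100, 101, 98, 102]  # hypothetical inverse-transform output

sad_src = sum_abs(src)               # 401
sad_coeffs = sum_abs(coeffs)         # 203
sad_io = sad(src, reconstructed)     # 2
print(sad_src, sad_coeffs, sad_io)
```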

So I can see that VCT has less energy in the transformed frame but the final distortion is about 5 times greater than the DCT. That is what prompted me initially to check my VCT. I think that is just too much distortion for just the transform cycle. By the way, the FDCT used was Skal's integer implementation and the IDCT used was the popular Chen-Wang integer implementation.


@akupenguin:
While DCT is mathematically reversible, any finite-precision implementation of it won't be. Even H.264, which has a lossless mode, has to disable the DCT to get there.

That's a very good and precise answer. However, do you mean that if I use QP=4 with an h.264 codec, I don't get lossless results? I am now guessing: NOT, but just for my information, why not? Doesn't QP=4 make the quantization factor equal to 1, which should give lossless results?


And now I am almost afraid to ask this but is there *ANY* transform available that is truly reversible? I can think of maybe Hadamard, however, it has really poor energy compaction performance I think so it is not very interesting with respect to compression.


Anyways, thanks again guys, it helps.

Sulik
29th April 2007, 00:46
The primary benefit of the integer transform is that its output is strictly identical between all implementations - unlike the DCT used for MPEG-1/2/4, where each implementation's precision can lead to different rounding (though a strict integer implementation of the DCT could be chosen as an integer transform).

It's pretty much required for features like in-loop deblocking and/or spatial intra prediction, otherwise the rounding differences between encoder/decoder could lead to significant error accumulation (amplifying rounding differences of iDCT output).

Note that the forward transform is not part of the standard.

akupenguin
29th April 2007, 12:51
That's a very good and precise answer. However, do you mean that if I use QP=4 with an h.264 codec, I don't get lossless results? I am now guessing: NOT, but just for my information, why not? Doesn't QP=4 make the quantization factor equal to 1, which should give lossless results?
QP=4 isn't a quantization factor of 1, it's 1.59 (the step size doubles every 6 QP, so 2^(4/6) ≈ 1.59).
QP=0 (in baseline/main profile) is a quantization factor of 1. But that doesn't mean no quantization, that means a change of +/- 1 in the quantized coefficient level corresponds to a change of +/- 1 in the decoded pixels. If you actually didn't divide the DCT coefficients by anything, then the coded coefficients would be 16 times (64 times for 8x8 transform) bigger than the pixel values. e.g. the DC coefficient is a sum of all 16 pixels.
Or to explain the same source of distortion in another way: The HCT isn't normalized, and normalization is rolled into the quantization phase. QP=0 (when it's not the special lossless mode) is not quantized insofar as the number of output levels is the same as the number of input levels, but there's still rounding involved. The output of an exact normalized HCT would contain fractional numbers, and they have to be rounded to integers for writing to the bitstream.
Plus, the iHCT contains integer right-shifts instead of infinite-precision division, so that also discards some information.
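The scaling described above can be checked with a small sketch. It assumes the commonly cited unnormalized 4x4 forward core matrix for H.264 and applies it with no scaling or quantization; the helper names are made up for illustration.

```python
# Forward 4x4 core transform matrix commonly given for H.264 (unnormalized).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core(block):
    """Y = CF * X * CF^T, with no normalization applied."""
    return matmul(matmul(CF, block), transpose(CF))


flat = [[10] * 4 for _ in range(4)]
coeffs = forward_core(flat)
dc = coeffs[0][0]
print(dc)  # 160: the sum of all 16 samples, i.e. 16 times the pixel value

# Squared row norms are unequal, which is why normalization has to be
# folded into the quantization step.
row_norms_sq = [sum(v * v for v in row) for row in CF]
print(row_norms_sq)  # [4, 10, 4, 10]
```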

Now, all that doesn't mean that it's not possible to pick a sequence of coefficients whose iHCT is identical to the input. (I don't know if that's possible.) It just means that fHCT->renormalize->iHCT isn't the identity transform. There are other methods of generating transformed coefficients, they're just very computationally expensive. e.g. libavcodec's quantization noise shaping (which is supposed to be a psychovisual algorithm, but it also improves PSNR because it's more exact than the fDCT.)

And now I am almost afraid to ask this but is there *ANY* transform available that is truly reversible? I can think of maybe Hadamard, however, it has really poor energy compaction performance I think so it is not very interesting with respect to compression.

Hadamard is reversible, and yes it's a poor choice for compression.

Wavelets can be reversible. If they're implemented with FIR or matrix multiply, then they're not reversible for the same reasons as DCT. But wavelets can also be implemented with lifting, i.e. a sequence of: predict a coefficient from the values of other coefficients, and subtract the prediction from the coefficient value. Then the decoder simply reverses it (add the prediction instead of subtract), and it'll perfectly match the input regardless of any integer approximations. This can also stay normalized at every step, thus rolling the required rounding into the "integer approximation" and not violating reversibility.
Staying normalized isn't necessarily desirable for the purpose of lossy compression (it rounds more than just renormalizing once at the end), but does allow lossless compression by just omitting the quantization. Hence Snow's 5/3 wavelet is reversible but its 9/7 isn't.
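The lifting idea can be sketched in a few lines. This is a 5/3-style integer lifting step on an even-length 1-D signal; the clamped boundary handling and the rounding offsets are assumptions chosen for brevity, not Snow's actual code. Because the inverse recomputes exactly the same integer predictions and undoes them in reverse order, the round trip is exact despite the shifts.

```python
import random


def fwd53(x):
    """One level of 5/3-style integer lifting. len(x) must be even."""
    s = x[0::2]  # even samples become the low-pass band
    d = x[1::2]  # odd samples become the high-pass band
    n = len(d)
    # Predict: subtract the floor-average of the neighbouring even samples.
    d = [d[i] - ((s[i] + s[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update: add a rounded quarter of the neighbouring high-pass values.
    s = [s[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    return s, d


def inv53(s, d):
    """Exact inverse: undo the update, then undo the predict."""
    n = len(d)
    s = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    d = [d[i] + ((s[i] + s[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2] = s
    x[1::2] = d
    return x


random.seed(1)
signal = [random.randrange(-255, 256) for _ in range(16)]
low, high = fwd53(signal)
restored = inv53(low, high)
print(restored == signal)  # True
```

The same structure works with any integer prediction function, since only the subtract/add pairing matters for reversibility.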

Strongling
29th April 2007, 17:02
@Sulik :

The primary benefit of the integer transform is that its output is strictly identical between all implementations - unlike the DCT used for MPEG-1/2/4, where each implementation's precision can lead to different rounding (though a strict integer implementation of the DCT could be chosen as an integer transform).

That's my point exactly, i.e. the last part you said: they could have just made one of the IDCT integer implementations the standard, e.g. Chen-Wang, LLM etc. Why go for a completely different transform if all that was required was a standard integer-exact implementation? There is something else about the VC-1 transform and I would like to know that reason.

@akupenguin:

Thanks for the detailed explanation about the h264 transform/quantization. I was going to explore that transform too but because of the added Hadamard DC step when 16x16 intra prediction is used, I decided to look at it later.

However, one question I have to ask now is: how is the lossless h264 mode implemented? Clearly it is not in the way I thought it would be. I remember there was a bypass transform mode called IPCM or something like that. Is it such that the pixel data is just transmitted without transform to the entropy coding engine? What about for inter predicted blocks (i.e. signed pixel data) does the same thing work there as well?


My concern with the transform being lossless/reversible is not only in the interests of making a lossless compression codec. I believe that in the whole scheme of things it is not the job of the transform to distort the data (and thereby introduce losses). That part should be the responsibility of the quantization step as much as possible. This makes analysis of compression performance as well as other things (e.g. designing a bitrate controller) easier.

By the way, what do you guys think about comparing transforms like I am doing? Did you consider some of the results I posted? I know that just summing up the entire frame to see how many bits it will produce is rather crude as it depends a lot on the entropy scheme used, however, it does seem to be a good indicator.

Thanks guys for spending the time to straighten all this transform business out for me.

The most important thing that would help me move on would be some pointer to some open-source VC-1 implementation. I just need to be sure that the transform I implemented is correct.

akupenguin
29th April 2007, 17:50
Why go for a completely different transform if all that was required was a standard integer-exact implementation? There is something else about the VC-1 transform and I would like to know that reason.
I would have said speed. But I just tested it and my SSE2 implementation of 8x8 Chen iDCT is 206 cycles while my SSE2 8x8 iHCT is 205 cycles. So only the fDCT differs in speed.

Maybe they planned for CPUs where multiplication takes more than Core2's 1 cycle. (iHCT has no multiplies.) That probably is the rationale, since H.264 also made a big deal about their multiply-free CABAC, while it turns out to be slower than a multiply-full arithcoder on recent CPUs.

However, one question I have to ask now is: how is the lossless h264 mode implemented? Clearly it is not in the way I thought it would be. I remember there was a bypass transform mode called IPCM or something like that. Is it such that the pixel data is just transmitted without transform to the entropy coding engine? What about for inter predicted blocks (i.e. signed pixel data) does the same thing work there as well?
The normal lossless mode is: take the residual after motion compensation or intra prediction, and send it straight to the entropy coder. It's still zigzagged as if it were DCT coefficients, and the entropy coder still thinks the later coefficients are smaller than the earlier ones, so H.264's lossless mode isn't so great at compression ratio.
IPCM mode is no compression at all. Just write the pixels to the bitstream with no prediction and no entropy coding. The only purpose of IPCM is so that you can limit bitrate in the cases where the lossless compression would make the compressed stream bigger than the input (pigeonhole...).

By the way, what do you guys think about comparing transforms like I am doing? Did you consider some of the results I posted?

QCIF is 25344 pixels. If sad is only 15488 (is that luma-only?) then 61% of the pixels had their LSB flipped. That doesn't sound so bad for rounding. I would have guessed 50%.

The most important thing that would help me move on would be some pointer to some open-source VC-1 implementation. I just need to be sure that the transform I implemented is correct.
ffmpeg (http://svn.mplayerhq.hu/ffmpeg/trunk/libavcodec/vc1dsp.c?revision=HEAD&content-type=text%2Fplain)

Sulik
29th April 2007, 21:09
One benefit of the VC1 inverse transform is that it was designed so it could be implemented with 16-bit multiply-adds.
Also, it was obviously designed for maximum efficiency for integer SIMD implementation since its output is transposed.

A typical vector idct implementation would consist of:
column_idct
transpose
column_idct
transpose

The last step is not needed in VC1, since the forward transform already has a transposed output.

akupenguin
29th April 2007, 21:22
Any separable transform, including the popular approximations of DCT, can be implemented as
column_idct
transpose
column_idct

The standard doesn't know or care whether your codec stores the coefficients transposed in memory. e.g. H.264 has no such provision, but x264 stores the coefficients transposed anyway. It just has to transpose the quantization matrix and the zigzag scan to match.
And my above benchmark of iDCT vs iHCT includes that optimization on both parties.
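The three-step form can be verified numerically. The sketch below uses an arbitrary small integer matrix as the 1-D transform (any separable transform would do; the matrix, the test block and the helper names are assumptions for illustration) and confirms that column pass, transpose, column pass yields the full 2-D transform with its output transposed.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def column_pass(t, x):
    """Apply the 1-D transform t to every column of x (this is just t * x)."""
    return matmul(t, x)


# Arbitrary integer 1-D transform and a non-symmetric test block.
T = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]
X = [
    [5, 1, 0, 2],
    [3, 7, 4, 1],
    [0, 2, 9, 6],
    [8, 3, 1, 4],
]

full_2d = matmul(matmul(T, X), transpose(T))              # T * X * T^T
two_pass = column_pass(T, transpose(column_pass(T, X)))   # no final transpose

print(two_pass == transpose(full_2d))  # True: same coefficients, stored transposed
```

Whether the final transpose is skipped is therefore a storage decision for the codec, provided the scan order and quantization matrix are transposed to match.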

Strongling
29th April 2007, 21:38
Thanks akupenguin for the insights about h264 lossless compression.

I would also agree that the reason for preferring the VC-1 transform over the plain old DCT is that it can be implemented for high speed. I also agree that avoiding multiplies is not a big deal now. But we have to remember that these things were in research a couple of years before they got released, so maybe it was significant back then.

QCIF is 25344 pixels. If sad is only 15488 (is that luma-only?) then 61% of the pixels had their LSB flipped. That doesn't sound so bad for rounding. I would have guessed 50%.

Actually the results are for the whole frame, luma + chroma, so that's 25344*3/2 = 38016 pixels and the result is even lower than you estimated, at around 40%.

And by the way, by the same token aren't you impressed with the regular DCT with around 8% for the same sequence?
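The percentages quoted in this exchange follow directly from the frame-1 numbers. A quick check, assuming 4:2:0 QCIF (176x144 luma) and treating each unit of sad as one sample off by one:

```python
luma_samples = 176 * 144              # 25344
total_samples = luma_samples * 3 // 2  # 38016 with 4:2:0 chroma

sad_io_vct = 15488
sad_io_dct = 3238

vct_luma_only = sad_io_vct / luma_samples  # akupenguin's luma-only guess
vct_full = sad_io_vct / total_samples      # whole frame
dct_full = sad_io_dct / total_samples      # whole frame

print(f"{vct_luma_only:.1%} {vct_full:.1%} {dct_full:.1%}")  # 61.1% 40.7% 8.5%
```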

Another way to look at it is as in the attached figure. The first one, top left is the difference between original and DCT-IDCTed amplified 4x, second one top right is the same thing amplified 40x. And similarly the bottom 2 are for the VCT-IVCT. What troubles me is that I can almost make out the picture in the VCT-IVCT case and that's without any kind of quantization!

ffmpeg

Yeah, I looked at that. As far as I could tell, there is no VC-1 encoder in there, hence no forward transform, just inverse. That's why in one of my earlier posts I said I may be able to verify the inverse transform. But right now, I have no way to know if I did the forward transform correctly. Can you confirm that there really isn't any VC-1 encoder in ffmpeg? Maybe I missed it in all the huge source code.

akupenguin
29th April 2007, 22:27
There isn't any VC-1 encoder in ffmpeg. But as Sulik said, the forward transform isn't and can't be part of the standard. Maybe the fVCT you find in an encoder is more precise than yours, and maybe it isn't.

The most obvious way to improve the precision of your fVCT is to use round-to-nearest instead of truncate when renormalizing. That reduces sad by a factor of 3.
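The difference between the two renormalization styles is easy to see in isolation. The sketch below assumes renormalization by a right shift of 3 bits (the shift amount and function names are illustrative, not the actual VC-1 scaling) and sums the error of each method over a range of inputs. It does not reproduce the factor of 3 quoted above, which refers to the sad after the full transform cycle.

```python
SHIFT = 3  # renormalize by 2**3 = 8 (illustrative value)


def renorm_truncate(v, shift=SHIFT):
    # Arithmetic right shift: rounds toward minus infinity.
    return v >> shift


def renorm_round(v, shift=SHIFT):
    # Add half of the divisor before shifting: round to nearest.
    return (v + (1 << (shift - 1))) >> shift


def total_error(renorm, values, shift=SHIFT):
    """Sum of |v - renorm(v) * 2**shift|, i.e. error in units of 1/2**shift."""
    return sum(abs(v - (renorm(v, shift) << shift)) for v in values)


values = range(-64, 64)
err_trunc = total_error(renorm_truncate, values)
err_round = total_error(renorm_round, values)
print(err_trunc, err_round)  # 448 256
```

Truncation is also biased (every error has the same sign), whereas round-to-nearest errors are nearly balanced around zero.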

Strongling
30th April 2007, 01:32
There isn't any VC-1 encoder in ffmpeg.

Thanks for the verification.


But as Sulik said, the forward transform isn't and can't be part of the standard. Maybe the fVCT you find in an encoder is more precise than yours, and maybe it isn't.

I agree. However, I am not really looking to match my implementation with anyone else's; I just wanted to compare the performance to see if it is within the same margins. That would sort of validate that what I have done is not grossly incorrect.

As things are, I see that there really is no VC-1 encoder (or project) available except for Microsoft's WMV implementation, which is obviously not open source.

And why is that, by the way, just wondering? Is it pure lack of interest in the video coding community or is there some other reason that there is no VC-1 encoder openly available to the public?


The most obvious way to improve the precision of your fVCT is to use round-to-nearest instead of truncate when renormalizing. That reduces sad by a factor of 3.

Good/interesting idea! I will try that and if I find something interesting, I will share.

Thanks again for your time and ideas.

akupenguin
30th April 2007, 01:51
Is it pure lack of interest in the video coding community or is there some other reason that there is no VC-1 encoder openly available to the public?
All the competent multimedia programmers know that H.264 is better than VC-1. ;) We still need a decoder because some companies might publish movies in VC-1, but with an encoder you have a choice, so there's no reason not to pick the better format.
Or maybe it's because VC-1 is backed by Microsoft, and who would want to implement a Microsoft format when there are alternatives?
Or maybe just because H.264 was standardized first.
Or maybe because H.264 has an open source reference encoder, slow though it is.

Strongling
30th April 2007, 02:33
All the competent multimedia programmers know that H.264 is better than VC-1.

LOL!:D Should have known better than to ask the x264 author for that opinion! Just kidding buddy, please don't take offense, have seen a lot of h264 vs VC-1 wars lately. And precisely for that reason and the fear of going off topic, I will not comment further. Let's keep this thread technical only.

Just adding that I totally agree with most of the points you mentioned above, especially the last couple, and I am a huge h264 fan myself.

The thing that makes h264 golden (for me) is that wonderful trace feature of the reference decoder, which kind of makes the $10000/person/year licensed analyzer tools redundant. Just the graphical interface is missing. Maybe someday someone will take the decoder trace output and make a GUI for it; that would make h264 truly immortal!

zambelli
30th April 2007, 10:36
All the competent multimedia programmers know that H.264 is better than VC-1. ;) We still need a decoder because some companies might publish movies in VC-1, but with an encoder you have a choice, so there's no reason not to pick the better format.
Now now... Let's leave the obvious author's bias out of this. :)

Which one is better or worse depends largely on the implementation. Having two well designed competing codecs would be very beneficial for the quality of both codecs. Innovation in one encoder would drive the innovation in the other, don't you think? Ostracizing an alternative codec standard hardly seems like a way to improve the other.

Or maybe it's because VC-1 is backed by Microsoft, and who would want to implement a Microsoft format when there are alternatives?
Backed by Microsoft and owned by Microsoft are not the same thing. It's not a Microsoft format anymore - that was exactly the point of SMPTE standardization. It's a standard like any other.
Or maybe just because H.264 was standardized first.
Agreed, H.264 does have the advantage of a head start.
Or maybe because H.264 has an open source reference encoder, slow though it is.
Well, nothing is stopping development of an open source VC-1 encoder. Certainly not Microsoft, anyhow.

@ Strongling:
Speaking of reference encoders... I was under the impression that a reference VC-1 encoder was available through SMPTE. Do you have access to SMPTE materials?

Strongling
30th April 2007, 15:48
Speaking of reference encoders... I was under the impression that a reference VC-1 encoder was available through SMPTE. Do you have access to SMPTE materials?

You know what, I thought that would be the case too. However, as I see it, my company just bought the specs some time ago and all I see is the pdf of the specs, no code!

I would appreciate it if someone could verify that there is a reference encoder provided by SMPTE (or anyone else) when you buy the specs.

I went to the SMPTE website and tried to verify that by trying to purchase the 421 spec, but the details don't say whether reference software is included or not.

Manao
30th April 2007, 21:56
Which one is better or worse depends largely on the implementation
I wonder who's biased :p Almost everything VC1 can do, h264 can too. The reverse, however, is far from true.

Having two well designed competing codecs would be very beneficial for the quality of both codecs
It would be even better if both were following the same standard. That way, comparison would be easier (then one wouldn't say that one is better because the standard is inherently better).

It's not a Microsoft format anymore - that was exactly the point of SMPTE standardization. It's a standard like any other.
Still, the largest beneficiary of more VC1 content out there is Microsoft, mostly because VC1 and Microsoft are so tightly associated. Note, I'm not saying that is a bad thing.

zambelli
30th April 2007, 22:54
I would appreciate it if someone could verify that there is a reference encoder provided by SMPTE (or anyone else) when you buy the specs.
I'm double checking on the reference encoder availability. In the meantime, if you're an IEEE member (or someone at your company can get access for you), this IEEE paper on VC-1 transforms should be very helpful:

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1530057

This paper describes the construction of computationally efficient transforms which can significantly reduce the complexity of a video decoder without loss in compression efficiency. In particular, these inverse transforms can be implemented using purely 16-bit arithmetic with similar rate-distortion performance to a 32-bit or floating point transform. To allow for variable block size coding, four 2D transforms are considered: one each for 8x8, 8x4, 4x8 and 4x4 blocks respectively. The design criteria for a pure 16 bit arithmetic implementation of the inverse transform are introduced, and used to derive the only useful set of transforms under these conditions. These transforms have been used in the WMV9/VC-1 codec to achieve significant reduction in computational complexity.

zambelli
1st May 2007, 04:01
I got word on the reference encoder. It is currently only available to SMPTE committee members, but efforts are in progress to make it available to the public through the SMPTE store later this summer. I don't have exact dates for when that will happen, though. I hope that IEEE paper will be of help in the meantime.

Strongling
1st May 2007, 04:05
Thanks for the info and confirmation on the reference encoder. I knew there was something fishy going on.

As for the link, I am not a member myself but like you said, I will check it out with someone from work tomorrow.


Thanks a lot for all the help again!

Cheers:thanks:

zambelli
1st May 2007, 20:05
I have a few more notes to pass on from our resident video codec architect regarding the relationship between the H.264 and VC-1 transform.

1) H.264 and VC-1 both have more than one transform in them.

1a) H.264 has 4x4 and 8x8 transforms. The 8x8 transform was not in the original three (Baseline, Main, and Extended) profiles. H.264 also has some less well-known "special" modes, such as a two-stage hierarchical 16x16 transform (used only for luma unless the chroma format is 4:4:4) built out of 16 4x4 transforms and an extra 4x4 transform of DC components, and a two-stage hierarchical 8x8 (for 4:2:0) or 8x16 (for 4:2:2) transform (only used for chroma) built out of 4x4 transforms and an extra 2x2 or 2x4 transform of DC components, plus a couple of special "transform-bypass" modes. Another special mode is simple "PCM", in which the actual values of the samples are simply sent directly (which obviously achieves no compression but allows lossless representation of particular small areas). Another is a way of sending a block of residual differences (relative to an intra or inter picture prediction block) directly, and yet another is a way of losslessly coding an intra-picture series of samples in a "DPCM" fashion. The original profiles of H.264 have only the 4x4 and PCM modes. The "High" family of profiles added the 8x8 transform to that mix. And finally the new High 4:4:4 Predictive profile (not yet implemented by anyone since it is so new) also adds the other lossless modes.

1b) VC-1 has 4x4, 8x8, 4x8, and 8x4 transforms.

2) The ordinary 4x4 and 8x8 transforms are not "lossless" (i.e. exactly invertible for all input data) in either case (H.264 or VC-1). This is nothing new. None of the previous standards had invertible transforms in them either (when you take into account that quantization was always being applied and that rounding error tolerances and other distortion-introducing "mismatch control" distortions existed for decoders of the older standards).

3) The primary motivating factors for using a transform design as found in H.264 or VC-1 rather than something like an infinite-precision DCT design (as found in H.261 or MPEG-1) are

3a) to reduce computational requirements (at least for implementation on some architectures), and

3b) to specify the inverse transform in a way that ensures that all decoders will produce exactly the same decoded sample values.

4) Whether for H.264 or VC-1, the reason that the transform design is not "lossless" is to prevent a need for excessive dynamic range in the decoder processing. Since video is ordinarily not coded losslessly with such compression technology, exact invertibility is not ordinarily considered a requirement. It would be possible to use a "lossless"/exactly-invertible transform, but doing that would be likely to increase the decoder processing requirements or have other problems (such as undesirable compression artifacts or elimination of desirable parallel-processing characteristics).

5) There is no significant difference between the compression capability of the H.264 and VC-1 transforms of each corresponding block size. There is also no significant difference between the compression capability of these and that of a conventional DCT design with the same block size (e.g., as in H.261 or MPEG-1). Compression capability can be roughly assessed by computation of "transform coding gain" (TCG), which is a well-known theoretical measure of the effectiveness of such transforms. Computing a TCG is pretty straightforward for a design like a conventional DCT. But it gets a little trickier for hierarchical transforms, transforms with unequal norms, overlapped transforms, or bi-orthogonal transforms.
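For the straightforward case mentioned in point 5, a transform coding gain can be computed along these lines. This is a sketch under stated assumptions: an orthonormal 8-point DCT, a first-order autoregressive (AR(1)) source model with correlation 0.95, and the usual definition of TCG as the ratio of the arithmetic to the geometric mean of the coefficient variances. None of these choices come from the notes above, and the formula does not apply as written to transforms with unequal norms.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def identity_matrix(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def coding_gain_db(t, rho):
    """TCG in dB of orthonormal transform t for an AR(1) source with correlation rho."""
    n = len(t)
    # Autocorrelation matrix of a unit-variance AR(1) process.
    r = [[rho ** abs(i - j) for j in range(n)] for i in range(n)]
    # Variance of coefficient k is the k-th diagonal entry of T * R * T^T.
    variances = [sum(row[i] * r[i][j] * row[j]
                     for i in range(n) for j in range(n)) for row in t]
    arithmetic_mean = sum(variances) / n
    geometric_mean = math.exp(sum(math.log(v) for v in variances) / n)
    return 10 * math.log10(arithmetic_mean / geometric_mean)


gain_dct = coding_gain_db(dct_matrix(8), 0.95)
gain_none = coding_gain_db(identity_matrix(8), 0.95)
print(f"DCT: {gain_dct:.2f} dB, no transform: {gain_none:.2f} dB")
```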


He also recommends the IEEE paper mentioned earlier - so I think that's your best ticket for fully understanding the forward integer transform implemented in VC-1.

Strongling
1st May 2007, 21:42
@zambelli:

Thanks for such valuable information.

I am still looking for someone who could give me access to the IEEE library. However, I have also decided to move on for now and look at the quantization. Maybe I will get back to the transform at a later date. In the absence of some other implementation, mine is good enough for now.

@everyone:

I found all the information that everyone here shared very very useful so a big :thanks: to everyone.