Spectral Sub-band Separation in H.26L...
SirDavidGuy
29th May 2003, 01:45
... aka The "Integer DCT" transform.
Why the committee chose it is a mystery to me, for two main reasons.
1. It is clearly worse than the DCT in terms of its ability to separate the different frequencies; numerous tests have shown that. The argument which says that it allows more accurate reconstruction doesn't fly. Even at a low quant, the roundoff will almost certainly make any difference very low.
2. It doesn't allow DCT-domain sub-pixel motion estimation. Since this approach is faster (and may possibly present better quality), in a system like H.26L, where even 8th-pel precision is allowed, this is a must.
Any reasons as to why?
bergi
29th May 2003, 01:53
The argument which says that it allows more accurate reconstruction doesn't fly
Another argument:
Integer instructions are faster than floats.
SirDavidGuy
29th May 2003, 02:28
Originally posted by bergi
Another argument:
Integer instructions are faster than floats.
But this is addressed in point 2.
Sirber
29th May 2003, 04:35
Geez... this is high-tech stuff... :D
A little question from a H264 noob: Is H.26L = H.264?
sysKin
29th May 2003, 11:32
Originally posted by SirDavidGuy
1. It is clearly worse than the DCT in terms of its ability to separate the different frequencies; numerous tests have shown that.
Are you sure about it? I've read that the difference between this transform and the DCT is up to 1% of any coefficient. I might be wrong though. Also, I don't see why separating frequencies is important.
The argument which says that it allows more accurate reconstruction doesn't fly. Even at a low quant, the roundoff will almost certainly make any difference very low.
I disagree. DCT problems are there, and they are very ugly. Do you remember XviD qpel smearing? It's horrible. Even if we can solve it on PCs by using the same DCT for all projects, I fear that _all_ qpel videos will not be decodable on stand-alone players.
2. It doesn't allow DCT-domain sub-pixel motion estimation. Since this approach is faster (and may possibly present better quality), in a system like H.26L, where even 8th-pel precision is allowed, this is a must.
Can you explain? You can always do a DCT if you want to, and you won't use the data during encoding anyway (I doubt 4x4 transforms will do any good for ME)... so I don't see the point.
Again, it might be that I don't understand and am wrong ;) Please explain, I will be doing "Kludge" and I'm planning on using integer transform :) (just note: I _will_ need 100% perfect reconstruction, some stuff will be predicted from the picture, prediction will determine VLCs used so must be the same on both ends)
Regards,
Radek
Selur
29th May 2003, 11:44
@sirber:
The H.264/MPEG-4 AVC standard has had several names over the course of its development. It was initially known as ITU-T H.26L and is now formally becoming Part 10 of the ISO/IEC MPEG-4 standard identified as ISO/IEC 14496-10 AVC.
source: VideoLocus Introduces World's First Real-Time H.264/MPEG-4 AVC Standard Definition Video Encoder (http://videosystems.com/ar/video_videolocus_introduces_worlds_2/)
SirDavidGuy
29th May 2003, 12:56
I've read that the difference between this transform and DCT is up to 1% of any coefficient.
I got different figures; around 7-8%.
I don't see why separating frequencies is important.
The more into the frequency domain you rotate it, the less noticeable (in the spatial domain) the quantization step will be. To a much smaller extent, sort of like the difference between quantization in the frequency (DCT) domain and the spatial domain (direct quantization of the data).
I disagree. DCT problems are there, and they are very ugly. Do you remember XviD qpel smearing? It's horrible. Even if we can solve it on PCs by using the same DCT for all projects, I fear that _all_ qpel videos will not be decodable on stand-alone players.
Then the "little" difference shouldn't mean much, should it? ;)
Can you explain? You always can do DCT if you want to, and you won't use the data during encoding anyway (I doubt 4x4 transforms will do any good for ME)... so I don't see the point.
But then it wastes time doing the DCT; which would presumably be done with an 8x8.
sysKin
29th May 2003, 13:35
Originally posted by SirDavidGuy
The more into the frequency domain you rotate it, the less noticeable (in the spatial domain) the quantization step will be. To a much smaller extent, sort of like the difference between quantization in the frequency (DCT) domain and the spatial domain (direct quantization of the data).
OK I get it now, thanks :)
Do you happen to know how much I can lose when changing 8x8 transform (from dct to this integer)?
Then the "little" difference shouldn't mean much, should it? ;)
Well, you can't watch the video. Dunno if it's a little difference ;) DCT errors propagate. It wasn't a problem in mpeg1/2 because the propagation was short. In xvid, even with many b-frames, it just looks horrible...
Radek
The integer transform is quite a bit different from the DCT, so I wouldn't be surprised by the 7% number ... but being different from the DCT doesn't say much about the compression. There are integer transforms (http://www.mcl.iis.u-tokyo.ac.jp/~komatsu/BUNKEN/icip2001.pdf) which approximate the DCT very well, but they take a lot more multipliers and adders.
Does anyone have any numbers on coding loss, either measured with constant PSNR or with constant size, for non intra frames? (Not interested in some measure of decorrelation/energy-compaction/whatever, only cold hard bits after entropy coding count) I seriously doubt it is significant.
Transform domain MC doesn't seem to make much sense with such small blocks and long interpolation filters. As for ME ... encoder complexity was always a secondary concern.
Personally I still think the most appropriate way of coding the DFD is VQ.
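To put a number on "different from the DCT", one can compare each integer matrix, row-normalised, against the orthonormal 4x4 DCT basis. The sketch below is only an illustration: the function names are made up for this example, the 13/17/7 matrix is the one quoted later in this thread, and the 1/2-coefficient matrix is assumed here as the small-coefficient 4x4 core matrix for comparison. It measures per-entry basis error, not coding loss, so it does not settle the 1% versus 7-8% figures quoted above.

```python
import math

def dct_matrix(n=4):
    """Orthonormal DCT-II basis, one basis vector per row."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def normalise_rows(m):
    """Scale each row to unit length so it is comparable with the DCT basis."""
    out = []
    for row in m:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row])
    return out

def max_relative_error(int_matrix):
    """Largest |integer basis - DCT basis| / |DCT basis| over all entries."""
    ref = dct_matrix(len(int_matrix))
    approx = normalise_rows(int_matrix)
    return max(abs(a - r) / abs(r)
               for arow, rrow in zip(approx, ref)
               for a, r in zip(arow, rrow))

# Matrix quoted later in this thread (13/17/7 coefficients).
M_13_17_7 = [[13, 13, 13, 13], [17, 7, -7, -17],
             [13, -13, -13, 13], [7, -17, 17, -7]]
# Small-coefficient core matrix (1/2 coefficients); an assumption for comparison.
M_1_2 = [[1, 1, 1, 1], [2, 1, -1, -2],
         [1, -1, -1, 1], [1, -2, 2, -1]]

err_13_17_7 = max_relative_error(M_13_17_7)
err_1_2 = max_relative_error(M_1_2)
print(f"13/17/7 matrix, worst entry: {err_13_17_7:.2%}")
print(f"1/2 matrix, worst entry:     {err_1_2:.2%}")
```

A large per-entry difference does not by itself imply a large bitrate penalty, which is the point made in the post above; only a measurement after entropy coding would show that.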
SirDavidGuy
29th May 2003, 20:15
Why does Q-Pel amplify the DCT error? I don't know much about it, technically. Is the interpolation technique defined in the standard, the stream, or chosen at encode time?
Tommy Carrot
29th May 2003, 20:24
Originally posted by SirDavidGuy
Why does Q-Pel amplify the DCT error? I don't know much about it, technically. Is the interpolation technique defined in the standard, the stream, or chosen at encode time?
I don't know, but every mpeg4 codec has this issue, so it's the standard's fault. Even halfpel does this, just to a lesser extent. So the integer transforms definitely help here.
bobololo
1st June 2003, 03:38
Originally posted by Tommy Carrot
I don't know, but every mpeg4 codec has this issue, so it's the standard's fault. Even halfpel does this, just to a lesser extent. So the integer transforms definitely help here.
The main qpel issue with different codecs is primarily related to the definition of the qpel interpolation specified in the ISO/IEC standard. The first specification was very confusing and was completely revised recently (in a draft corrigendum from WG11, not yet published). As a result, different codecs have their own implementations that follow the standard more or less closely and aren't 100% interoperable.
Concerning the DCT mismatch issue, it is more visible in MPEG-4 than in MPEG-1/2 because the usual GOP is much longer (~300 frames is common) and because most encoders don't intra-code a macroblock after 132 consecutive predictive codings, as the standard requires in order to limit the accumulation of DCT mismatch errors.
This mismatch problem is probably one of the motivations that led to the choice of an integer transform.
-- bobololo.
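The forced intra refresh described above can be pictured as a per-macroblock counter. This is only a rough sketch: `choose_modes` and its input are hypothetical, and it simplifies the rule to "force intra once 132 consecutive predicted codings have accumulated", which may not match the exact counting conditions in the standard.

```python
REFRESH_LIMIT = 132  # limit quoted in the post above

def choose_modes(wants_intra, limit=REFRESH_LIMIT):
    """Return 'intra' or 'inter' per frame for one macroblock position.

    wants_intra: iterable of booleans, the encoder's own mode decision for
                 each frame (a hypothetical input for this sketch).
    """
    predicted_run = 0  # consecutive predictive codings since the last intra
    modes = []
    for intra in wants_intra:
        if intra or predicted_run >= limit:
            # Intra coding does not depend on the reference frame, so any
            # accumulated IDCT mismatch drift is cut off here.
            modes.append("intra")
            predicted_run = 0
        else:
            modes.append("inter")
            predicted_run += 1
    return modes

# A 300-frame GOP in which the encoder never asks for intra on its own.
modes = choose_modes([False] * 300)
forced = [i for i, m in enumerate(modes) if m == "intra"]
print("forced intra refresh at frames:", forced)
```

An encoder that skips this refresh lets the mismatch accumulate over the whole GOP, which is the situation the post describes.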
shlezman
1st June 2003, 09:20
Originally posted by SirDavidGuy
It's not as good as the "real" DCT, but it's not as bad as you make it sound. The integer transform is an approximation to the 4x4 DCT. It has three clear advantages which compensate for the fact that it isn't as good as the DCT.
1. It's easy to implement and takes a lot of computational load off the CPU.
2. It's possible to implement it on 16-bit processors (which makes the semiconductor industry very happy, I guess). And of course it fixes the compliance problem of the "old" DCT.
3. I guess someone will collect a bunch of money from the royalties (politics is an issue as well).
Originally posted by SirDavidGuy
2. It doesn't allow DCT-domain sub-pixel motion estimation
There are many advantages to the method you specified but it's not a consideration when forming a standard.
The standard is supposed to define the best methods to code the data and construct a compliant stream. The algorithms to use are up to you.
bergi
1st June 2003, 15:53
In this (http://www.bergos.org/files/transforms_dct.zip) zip file there is also a program (not really good, some calculation errors, higher multipliers mean less rounding error...) that can calculate a matrix for an integer DCT. I've made some tests, and the 4x4 matrix of H.26L is the best I've seen; no other matrix is more accurate than this one. OK, there is some error if you want a lossless forward-inverse transform. I don't know much about float transforms, but I think they have equal or more rounding error than this integer DCT.
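For what it's worth, a forward-inverse round trip can be made exactly lossless in pure integer arithmetic when nothing is quantized. The sketch below assumes the 13/17/7 matrix quoted further down this thread; its rows are orthogonal and each has squared norm 676, so dividing by 676*676 after the 2-D inverse leaves no remainder. Helper names and the sample block are illustrative only, and a real codec would fold this scaling into quantization rather than divide explicitly.

```python
A = [[13, 13, 13, 13],
     [17, 7, -7, -17],
     [13, -13, -13, 13],
     [7, -17, 17, -7]]

ROW_NORM_SQ = 676  # 4 * 13^2 == 2 * (17^2 + 7^2)

def matmul(p, q):
    """Plain integer matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def forward(block):
    """2-D forward transform Y = A * X * A^T, integers only."""
    return matmul(matmul(A, block), transpose(A))

def inverse(coeffs):
    """2-D inverse X = A^T * Y * A / 676^2.

    The division is exact as long as Y came straight from forward(),
    i.e. no quantization was applied in between.
    """
    scaled = matmul(matmul(transpose(A), coeffs), A)
    return [[v // (ROW_NORM_SQ * ROW_NORM_SQ) for v in row] for row in scaled]

# Arbitrary 4x4 block of pixel values, just for the round trip.
block = [[52, 55, 61, 66],
         [70, 61, 64, 73],
         [63, 59, 55, 90],
         [67, 61, 68, 104]]

restored = inverse(forward(block))
print("bit-exact round trip:", restored == block)
```

Because every step is integer, any two implementations produce identical results, which is the property a float DCT cannot guarantee.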
shlezman
1st June 2003, 16:32
@bergi
I took a brief look at your tests, and I think you might be wrong in constructing the matrices. You see, the inverse transform is not the transpose of the forward transform but the inverse matrix. Anyway, the math isn't that hard; I calculated the inverse transform for:
13 13 13 13
17 7 -7 -17
13 -13 -13 13
7 -17 17 -7
and it's :
20 26 20 11
20 11 -20 -26
20 -11 -20 26
20 -26 20 -11
with 10 bits accuracy.
You might want to correct the code and run it with the right matrices. You can use MATLAB to calculate the inverse transforms.
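The same calculation can be reproduced without MATLAB. The sketch below inverts the forward matrix with exact fractions and then scales by 1024, on the assumption that "10 bits accuracy" means a scale factor of 2^10; the `invert` helper is written for this example only. For this particular matrix the rows are orthogonal with equal norm, so the exact inverse happens to be the transpose divided by 676, but the code does not rely on that.

```python
from fractions import Fraction

def invert(matrix):
    """Gauss-Jordan inversion over exact fractions (no rounding error)."""
    n = len(matrix)
    # Augment the matrix with the identity: [M | I].
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

FORWARD = [[13, 13, 13, 13],
           [17, 7, -7, -17],
           [13, -13, -13, 13],
           [7, -17, 17, -7]]

exact_inverse = invert(FORWARD)
# Assumed meaning of "10 bits accuracy": scale by 2**10 and round to integers.
inverse_10bit = [[round(v * 1024) for v in row] for row in exact_inverse]

for row in inverse_10bit:
    print(row)
```

The rounded result matches the inverse matrix quoted in the post above.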
bergi
1st June 2003, 16:46
@shlezman
I don't think so. The 4x4 matrix calculated with my program is the same as the H.26L matrix. And matrices calculated with my program were used in my other program (for transforming a BMP) included in the zip file, and I didn't have any (big) noticeable errors.
shlezman
1st June 2003, 17:48
The Integer-Transform of H.264 is a rough approximation to the 4x4 DCT.
The matrix that you use is a much better approximation to that transform, so the result of a forward transform followed by an inverse transform MUST be 99.9% error free.
Test that with no quantization at all (quantization of 1), then calculate the PSNR:
sq2 = sum of the squared error (pixelwise)
psnr = 10*log10(255*255*image_width*image_height/sq2)
If it's less than 60 dB then the calculation is wrong; otherwise your calculations are correct and I'm wrong.
:eek:
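The PSNR check above can be written out as a short function. This is a minimal sketch assuming 8-bit samples (peak value 255) and images stored as equally sized lists of rows; the function and argument names are made up for the example.

```python
import math

def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized 2-D lists of pixel values."""
    height, width = len(original), len(original[0])
    # sq2: sum of the squared pixelwise error, as in the post above.
    sq2 = sum((o - d) ** 2
              for orow, drow in zip(original, decoded)
              for o, d in zip(orow, drow))
    if sq2 == 0:
        return math.inf  # identical images: no error at all
    return 10 * math.log10(peak * peak * width * height / sq2)

# Tiny example: a 2x2 image with a single pixel off by one.
reference = [[100, 100], [100, 100]]
roundtrip = [[101, 100], [100, 100]]
value = psnr(reference, roundtrip)
print(f"PSNR: {value:.2f} dB")
```

With the 60 dB threshold suggested above, a forward-inverse pair that is set up correctly and run without quantization should land above it.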
bergi
3rd June 2003, 22:27
@shlezman
OK, you are right, but the difference wasn't so high. I don't remember the numbers for sure, I think 77,??? for both; your matrix was a little bit better. Also, I think I've found the error: I calculated the inverse transform for the float forward transform, and your matrix is the inverse transform for the (rounded) integer transform, right?
I'm going to update the program in the next few days, but first I want to change some things:
- clean up the code (it should be easy for everybody to understand)
- recalculate all the inverse matrices, because the 8x8, 16x16 and 32x32 ones were calculated the same way (can anybody give me a source on how to calculate the inverse transform this way?)
- perhaps add some wavelet block matrices (test only, I don't think wavelets will look good on small blocks)
shlezman
4th June 2003, 10:10
The easiest way is to use MATLAB or Octave to invert the matrices and quantize them. It can be VERY useful for testing all these transforms, including wavelets.