PDA

View Full Version : DC / AC Frequency Questions


Defiler
23rd September 2002, 18:43
First, please forgive me if this has been answered elsewhere.
I've just spent two hours searching for the answer, but I'm crippled by the fact that you cannot search for two-letter acronyms.
I've found some useful information on the web, particularly in documents describing JPEG compression.

I understand how the process of quantization works, and I understand that the quantizer matrix is traversed in a zigzag pattern from top left to bottom right, thus making the top left values correspond to the quantizers applied to "low frequency" blocks, and the bottom right values correspond to the quantizers applied to "high frequency" blocks.

This, however, leaves me with a few unanswered questions:

1. What do AC and DC stand for? I cannot find ANY documents that describe what these acronyms mean. I know only that the block containing the lowest "spatial frequency" is designated DC, and the other 63 blocks are designated AC.

2. If the video is operated on in 8x8 blocks, this means that each value in the quantizer matrix is applied to only one pixel in a block? Is this a stupid question? I don't quite understand how you can know anything about a single pixel, other than its position, how "bright" it is, and how much that luma value differs from the same location in adjacent frames.

3. In the context of video compression, what does "frequency" actually refer to? I have read that "spatial frequency" (the domain to which the whole AC/DC discussion applies) is converted from the temporal domain by the DCT process, by I don't have a good handle on what this actually means. Can someone describe spatial frequency and temporal frequency in greater detail?

Thank you for taking the time to answer me.

-h
23rd September 2002, 19:15
1. What do AC and DC stand for? I cannot find ANY documents that describe what these acronyms mean. I know only that the block containing the lowest "spatial frequency" is designated DC, and the other 63 blocks are designated AC.

I honestly don't know what they stand for. All you really have to remember is that the DC value represents the average of the block, and AC values describe progressively finer details. Here (http://www.cs.sfu.ca/CourseCentral/365/li/material/notes/Chap4/Chap4.2/DCT_basis.gif) is an image which shows how each DCT coefficient describes the block of pixels - note that the larger the stored value for a coefficient, the more the block of output pixels will resemble that coefficient's signature pattern. When enough of those patterns are laid on top of each other, they begin to resemble the original input.

Or as SirDavidGuy may tell you, just think in 64-dimensional space :)

2. If the video is operated on in 8x8 blocks, this means that each value in the quantizer matrix is applied to only one pixel in a block? Is this a stupid question? I don't quite understand how you can know anything about a single pixel, other than its position, how "bright" it is, and how much that luma value differs from the same location in adjacent frames.

Nope, applied to one DCT coefficient, not one pixel. If a coefficient is set to anything but 0, it will affect *every* pixel value. Such is the DCT's strength in lossy compression - you can discard or brutalise several coefficients, but those (low frequency ones) which remain may still be able to describe most of the pixels with some degree of accuracy.

3. In the context of video compression, what does "frequency" actually refer to? I have read that "spatial frequency" (the domain to which the whole AC/DC discussion applies) is converted from the temporal domain by the DCT process, by I don't have a good handle on what this actually means. Can someone describe spatial frequency and temporal frequency in greater detail?

Well, instead of storing individual pixels, we decide to store a set of frequencies which have been supplied by a mathematical cosine function of the input. It's the same concept as a spectrogram you might see in an audio program - the input is just a bunch of bytes that make up the points on a sound wave, but through frequency analysis (the Fourier transform in that case) we can get a nice pretty picture of the frequencies contained in that sound wave.

I'm not very good at explaining it, because I've never studied the maths behind the DCT.

-h

Defiler
23rd September 2002, 19:27
OK.. So the codec performs some type of frequency analysis on each 8x8 pixel block, the details of which it seems neither of us are prepared to discuss. :D
After it has done this, it chooses one of the 64 DCT coefficients to use while performing the DCT on the 8x8 block of pixels? ..or is it that it always uses all 64 coefficients on each block, and applies one of those 64 to each "frequency component" exposed through something like a Fourier transformation?

If the second option is correct, what is so special about 8x8 blocks? Why not pass the whole frame through this process, or 4x4 blocks, or 16x16, etc, etc?
Thanks again.

-h
23rd September 2002, 20:00
After it has done this, it chooses one of the 64 DCT coefficients to use while performing the DCT on the 8x8 block of pixels? ..or is it that it always uses all 64 coefficients on each block, and applies one of those 64 to each "frequency component" exposed through something like a Fourier transformation?

Nope. The final DCT coefficients are just the results of calling an equation with a given input.

Have a look at this (http://www.fh-friedberg.de/fachbereiche/e2/telekom-labor/zinke/mk/mpeg2beg/applicat.jpg) image. Along the top you have the values 0 through 7, which are 8 pixels in a row (Sample Array). By using each pixel value in a cosine equation, the answer to those equations can be summated to represent how strongly that cosine equation (frequency) represents the 8 input pixels. Once all 8 pixels have been sent to 8 differently-weighted cosine equations, you perform the same thing on the next row of 8 input pixels. Once all 8 rows have been done, you will have 64 values, but they only represent horizontal information whereas we need vertical as well. So, those 8 rows of DCT coefficients are broken into 8 columns, and their values are again sent to the cosine equation and the output summated to give a result which represents both horizontal and vertical properties of the original block.

.. what is so special about 8x8 blocks? Why not pass the whole frame through this process, or 4x4 blocks, or 16x16, etc, etc?
Thanks again.

It was a good tradeoff between the inefficiency of smaller blocks and more noticable artifacts of larger blocks from memory. I read a study a long time ago about it but couldn't find anything when I looked now.

-h

Defiler
23rd September 2002, 20:08
In the image you just linked, the weights are values internal to the DCT process, and have nothing to do with the values in the quantizer matrix, correct? If so, what determines the weight assigned to any given pixel in the array?
Is there a site out there that can show me the whole process, start to finish, as performed on a sample 8x8 block? As in, something that ends up with an equation that has all the variables filled in?

-h
23rd September 2002, 20:36
In the image you just linked, the weights are values internal to the DCT process, and have nothing to do with the values in the quantizer matrix, correct? If so, what determines the weight assigned to any given pixel in the array?

Correct - quantization has nothing to do with the DCT. Quantization just divides numbers to make them smaller, and it just so happens that in MPEG those numbers come from a DCT.

Is there a site out there that can show me the whole process, start to finish, as performed on a sample 8x8 block? As in, something that ends up with an equation that has all the variables filled in?

There is a java applet here (http://www-mm.informatik.uni-mannheim.de/veranstaltungen/animation/multimedia/2d_dct/) that you might be interested in, however it requires the Java SDK which is a 5 MB download. It lets you tinker with individual coefficients and see the output. This (http://www-mm.informatik.uni-mannheim.de/veranstaltungen/animation/multimedia/1d_dct_und_dft/) applet does the same, but only on the 8-pixels described previously (1-dimensional).

This (http://www.cs.cf.ac.uk/Dave/Multimedia/node231.html) page gives a description of the maths involved, but I can't find anywhere (apart from XviD's fdct.c source file) that would let you step through the equations one by one.

-h

MfA
23rd September 2002, 20:37
AC&DC have become terms upon themselves, AFAICS they are just derived from good old electrical AC&DC (alternating current and direct current) but obviously have changed slightly in meaning. Outside of the context of power supplies DC usually refers to the constant component and AC to the alternating component.

Nope, applied to one DCT coefficient, not one pixel. If a coefficient is set to anything but 0, it will affect *every* pixel value. Such is the DCT's strength in lossy compression

Such is its strength, such is also its weakness :)

-h
23rd September 2002, 21:02
AC&DC have become terms upon themselves, AFAICS they are just derived from good old electrical AC&DC (alternating current and direct current) but obviously have changed slightly in meaning. Outside of the context of power supplies DC usually refers to the constant component and AC to the alternating component.

As you have IEEE material at your fingertips, perhaps you could check the source ;)

N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete Cosine Transform,"
IEEE Transactions on Computers, Vol. C-23, No. 1, January 1974, pp. 90-93

-h

Defiler
23rd September 2002, 21:43
Thanks for the links, -h. I think I'm starting to understand this mess.
Is it useful to think of the DCT as a process that converts intensity into frequency, or is that a mental dead-end?

-h
23rd September 2002, 22:09
Is it useful to think of the DCT as a process that converts intensity into frequency, or is that a mental dead-end?

The DCT converts an 8x8 block of pixels into 64 values which are the amplitudes of a discrete cosine equation at 64 different frequencies. So yes.

I'm certainly no expert on the DCT though, don't take anything I say as gospel. I've just never had to know (in detail) how it does its job, just what the end result is.

-h

MfA
23rd September 2002, 22:44
I shouldnt have been so wishy washy, let me rephrase that ... AC&DC mean alternating and constant components (I was just concerned about the .0001% chance of the terms not coming from the electrical current definitions.) It is not just DCT, it is common terminology.

Gazza
24th September 2002, 05:17
So if I get it right, AC & DC come from F[0,0] where DC is the average level or intensity within the 8x8 matrix and AC is the spread or range of differences (frequencies) about this mean. By choosing different values of u & v (F[u,v]) can you describe the particular value within a 8x8 array with respect to the F[0,0] value?

If yes, does this mean that you only need to store the start value (AC & DC at F[0,0]) and the differences related to this start point for each pixel?

In this calculation, do you step the 8x8 block along pixel by pixel and re-calculate each time or do you jump to the next 8x8 block?

Or I could be totally off beam.

-h
24th September 2002, 05:43
So if I get it right, AC & DC come from F[0,0] where DC is the average level or intensity within the 8x8 matrix and AC is the spread or range of differences (frequencies) about this mean.

What is F[]? The pixels or frequencies?

By choosing different values of u & v (F[u,v]) can you describe the particular value within a 8x8 array with respect to the F[0,0] value?

If F[] is a frequency array, the value at u,v cannot be described with respect to F[0,0] (the DC coefficient) as they are independent of each other. DC is the average of all pixels. AC values are weights of 63 different frequencies which are independent of the mean.

If yes, does this mean that you only need to store the start value (AC & DC at F[0,0]) and the differences related to this start point for each pixel?

I don't know what you mean.

In this calculation, do you step the 8x8 block along pixel by pixel and re-calculate each time or do you jump to the next 8x8 block?

Still not sure what you're getting at..

-h

Gazza
24th September 2002, 09:49
Originally posted by -h
So if I get it right, AC & DC come from F[0,0] where DC is the average level or intensity within the 8x8 matrix and AC is the spread or range of differences (frequencies) about this mean.

What is F[]? The pixels or frequencies?



Apologies for not being clear (typing too fast with limited time...). In www.cs.cf.ac.uk/Dave/..... he states that F[0,0] defines the DC and AC components. I'm wondering that F[0,0] only defines the DC value and the next calculations (F[1,0], F[1,1],.....) produce the AC components? The maths escapes me (been a long time) but does F[0,0] mean the average for all frequencies in the 8x8 array?


By choosing different values of u & v (F[u,v]) can you describe the particular value within a 8x8 array with respect to the F[0,0] value?

If F[] is a frequency array, the value at u,v cannot be described with respect to F[0,0] (the DC coefficient) as they are independent of each other. DC is the average of all pixels. AC values are weights of 63 different frequencies which are independent of the mean.


I'm not sure that they are totally independant because you could probably determine other things like statistical spread, standard deviation, etc and use those to fine tune ..... Or am I talking rubbish here.


If yes, does this mean that you only need to store the start value (AC & DC at F[0,0]) and the differences related to this start point for each pixel?

I don't know what you mean.


Rather than storing the values for each calculation do you get even more efficiency by just storing the difference to the DC values?


Assuming there is a link between the mean and each calculation then could you just store the difference (smaller store size?)

In this calculation, do you step the 8x8 block along pixel by pixel and re-calculate each time or do you jump to the next 8x8 block?

Still not sure what you're getting at..




For each 8x8 block, I assume that you end up performing 64 calculations. Once done then do you hop over to the next 8x8 bloxk of pixels or do you just move the 8x8 block over by one pixel and start again?



Apologies to all if I'm totally off with my reasoning here.

Gazza

-h
24th September 2002, 10:29
Apologies for not being clear (typing too fast with limited time...). In www.cs.cf.ac.uk/Dave/..... he states that F[0,0] defines the DC and AC components. I'm wondering that F[0,0] only defines the DC value and the next calculations (F[1,0], F[1,1],.....) produce the AC components? The maths escapes me (been a long time) but does F[0,0] mean the average for all frequencies in the 8x8 array?

Oh OK, F[0,0] is the DC coefficient as it will equal the mean of both row and column calculations. First we create frequences from 8 rows of 8 pixels, split those frequencies into columns, then create frequency analyses of those columns. The end result is the same as a 2-dimensional transform, only a tad simpler to implement.

I'm not sure that they are totally independant because you could probably determine other things like statistical spread, standard deviation, etc and use those to fine tune ..... Or am I talking rubbish here.

Typically not. Even if the mean is especially high or low, you can still require erratic AC values if a single pixel is at the opposite end of the intensity scale.

Rather than storing the values for each calculation do you get even more efficiency by just storing the difference to the DC values?

Nope, there's just no real correlation between them to exploit. This is why in MPEG-4, (some of) the AC values are predicted from the AC values of neighbouring blocks.

For each 8x8 block, I assume that you end up performing 64 calculations. Once done then do you hop over to the next 8x8 bloxk of pixels or do you just move the 8x8 block over by one pixel and start again?

There are many ways to cheat. However if you merrily followed the equations without simplification, you would have to perform 8 calculations per desired DCT coefficient. Thus the first step of performing the DCT on rows would require 64 instructions per row, or 512 instructions for all 8 of them. Then you perform the DCT again based on columns, which will be another 512 instructions.

Once those 1024 instructions were done, you would just start work on the next 8x8 block of pixel input.

-h

Gazza
24th September 2002, 13:45
-h
Thanks for that. I will probably, if time permits, have a closer look at the maths behind DCT.... Or maybe just accept the outputs, as you say.

Gazza