Meaning of frequency regarding video


thewebchat
1st October 2009, 19:03
Many times, in posts on this forum, I have seen mention of "high frequency information," "low frequency information," "lowpassing," and "frequency domains" when talking about the composition of video frames. My understanding is that these terms are related to the Fourier transform, but I do not understand what the Fourier transform is or how it relates to videos. Now, as I've mentioned in other posts, my understanding of mathematics is rather poor, so I could not understand the Wikipedia articles on the Fourier transform and its applications. When I took a class on basic differential equations at university, I learned about Fourier sums, which could be used to approximate functions as sums of trigonometric functions, and I was told that a Fourier sum of infinite terms becomes a Fourier transform. However, I must admit that I understood neither the usefulness nor the general properties of Fourier sums/transforms at the time, and I still do not. So, my question is, what do the terms that I mentioned in my first sentence have to do with video processing?

LoRd_MuldeR
1st October 2009, 19:16
You should take a look at the JPEG article in Wikipedia as a starting point:
http://de.wikipedia.org/wiki/JPEG

The idea is very similar to video compression:
Each frame is cut into blocks of 8x8 pixels and then each block containing 64 pixels is represented as a linear combination of those 64 "reference" blocks:

http://upload.wikimedia.org/wikipedia/commons/thumb/2/23/Dctjpeg.png/292px-Dctjpeg.png

So after the transform we don't have 64 pixel values per block, but 64 coefficients. Each of those coefficients corresponds to one of the reference blocks in the picture above.
The transform itself is lossless: With the coefficients the original pixel values can be reconstructed by summing up the reference blocks, each weighted with its coefficient.
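To make the "linear combination of reference blocks" idea concrete, here is a minimal sketch in plain Python. It assumes the orthonormal form of the 8x8 DCT-II, which is one common convention and not necessarily what any particular codec uses. The names `dct_basis`, `forward_dct` and `inverse_dct` and the test block are made up for illustration.

```python
import math

N = 8

def dct_basis(u, v):
    """The 8x8 "reference" block for horizontal frequency u and vertical frequency v."""
    def c(k):
        return math.sqrt((1 if k == 0 else 2) / N)
    return [[c(u) * c(v)
             * math.cos((2 * x + 1) * u * math.pi / (2 * N))
             * math.cos((2 * y + 1) * v * math.pi / (2 * N))
             for x in range(N)] for y in range(N)]

def forward_dct(block):
    """One coefficient per reference block: the inner product of block and basis."""
    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            basis = dct_basis(u, v)
            coeffs[v][u] = sum(block[y][x] * basis[y][x]
                               for y in range(N) for x in range(N))
    return coeffs

def inverse_dct(coeffs):
    """Sum up the reference blocks, each weighted with its coefficient."""
    block = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            basis = dct_basis(u, v)
            for y in range(N):
                for x in range(N):
                    block[y][x] += coeffs[v][u] * basis[y][x]
    return block

# Hypothetical test block: a smooth diagonal gradient.
pixels = [[16 * x + 8 * y for x in range(N)] for y in range(N)]
coeffs = forward_dct(pixels)
restored = inverse_dct(coeffs)
max_error = max(abs(restored[y][x] - pixels[y][x])
                for y in range(N) for x in range(N))
print(max_error)  # only floating-point rounding residue remains
```

The round trip recovers the pixels up to floating-point rounding, which is what "lossless" means here. Real encoders use fast integer approximations instead of this brute-force loop.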

Now about frequencies: The coefficients in the upper left corner are considered "low" frequency, while the ones on the right and on the bottom are considered "high" frequencies.
By looking at the reference blocks that correspond to those coefficients (again the picture above), it should be quite obvious why that is.

In the further compression process the coefficients are quantized (divided and rounded). Hence some data (or "frequencies") are lost. In fact most coefficients are zeroed out.
Finally the coefficients are entropy coded (e.g. with Huffman coding). And since there are many zeros, this works pretty well...

thewebchat
1st October 2009, 19:25
But in this case, what are we referring to as the frequency of the image? Why is it a sum of those specific 64 blocks? Is that a natural consequence of the algorithm or did someone decide that?

LoRd_MuldeR
1st October 2009, 19:33
But in this case, what are we referring to as the frequency of the image?

The frequency of course depends on the original pixel values. After the transform of an 8x8 pixel block, we get one coefficient for each reference block (or "frequency").

Each coefficient represents the amplitude of the corresponding frequency. Or in other words: Each coefficient represents the weight of the corresponding reference block.

It shouldn't be too hard to imagine that if we zero out the coefficients in the lower right corner, we will only lose the "high" frequencies (that is: the "fine" image details).
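A rough sketch of that "lowpassing" idea, simplified to a single row of 8 pixels instead of a full 8x8 block. The row is deliberately built as a smooth part plus a fine ripple, both chosen by hand for illustration, so that it is easy to see what zeroing the upper half of the coefficients removes. The orthonormal 1D DCT-II is assumed, and all names are hypothetical.

```python
import math

N = 8

def scale(k):
    return math.sqrt((1 if k == 0 else 2) / N)

def cos_term(x, k):
    return math.cos((2 * x + 1) * k * math.pi / (2 * N))

def dct_1d(values):
    return [scale(k) * sum(values[x] * cos_term(x, k) for x in range(N))
            for k in range(N)]

def idct_1d(coeffs):
    return [sum(scale(k) * coeffs[k] * cos_term(x, k) for k in range(N))
            for x in range(N)]

# A slowly varying part (frequency 1) plus a fine ripple (frequency 7).
smooth = [100 + 20 * cos_term(x, 1) for x in range(N)]
fine = [5 * cos_term(x, 7) for x in range(N)]
row = [s + f for s, f in zip(smooth, fine)]

coeffs = dct_1d(row)

# "Lowpass": keep the four lowest frequencies, zero the four highest.
lowpassed = idct_1d(coeffs[:4] + [0.0] * 4)

# Only the fine ripple is gone; the smooth part survives untouched.
max_diff = max(abs(a - b) for a, b in zip(lowpassed, smooth))
print(max_diff)  # floating-point rounding residue only
```

With real image data the split is never this clean, but the principle is the same: the discarded coefficients carry the rapid pixel-to-pixel variation.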

Why is it a sum of those specific 64 blocks? Is that a natural consequence of the algorithm or did someone decide that?

Those blocks originate from the Discrete Cosine Transform (DCT). For the mathematical details, see the Wiki article:
http://en.wikipedia.org/wiki/Discrete_cosine_transform

It can be proven that each 8x8 pixel block can be represented as a linear combination of those 64 reference blocks.
So the DCT-transformed data can be transformed back to the original data. That's what the IDCT (Inverse Discrete Cosine Transform) does.

thewebchat
1st October 2009, 19:44
Edit: I see.

In this case, why do we know that higher frequency blocks can be rounded out and eliminated? How do these blocks correspond to picture structures? You mentioned that higher frequencies contain the "fine details" of the image, but what property of the DCT leads to this?

I am trying to read the article on DCT, but it keeps talking about other mathematical concepts, and the articles on those concepts talk about them in terms of even more concepts!

LoRd_MuldeR
1st October 2009, 20:29
In this case, why do we know that higher frequency blocks can be rounded out and eliminated? How do these blocks correspond to picture structures? You mentioned that higher frequencies contain the "fine details" of the image, but what property of the DCT leads to this?

Well, I can't explain it any better than with the picture above:

While the coefficient ("reference" block) in the upper left corner represents a completely "flat" block, the frequencies/details get higher/finer when moving to the right, to the bottom or both.

And how the coefficients are rounded is controlled by the quantization matrix. It usually causes most coefficients in the lower/right area to be rounded to "0" after quantization...


Example:

http://upload.wikimedia.org/math/b/d/2/bd2e111c655d5776790db8ac46a0e8de.png


I am trying to read the article on DCT, but it keeps talking about other mathematical concepts, and the articles on those concepts talk about them in terms of even more concepts!

IMHO the article about JPEG compression gives a better idea on how DCT works and why this is done in image/video compression.

It gives you the basic idea of the concept, instead of explaining the mathematical foundation in detail...

thewebchat
1st October 2009, 21:17
I see. From what I understand, the crucial "lossy" step in image/video coding is then when the DCT coefficients are divided by the specified matrix (and I suppose the divided numbers are multiplied back prior to iDCT)? In that case, how do the quantization strength factors apply to the process? Do we multiply our divisor matrix by the strength factor, and thus higher factors result in more numbers being rounded to 0?

As for the "mathematical foundation" of the DCT, I at least want to understand why the cosine function lets us break up the image into parts like this, but I suspect I'll have to go back to school and take more math to find out.

LoRd_MuldeR
1st October 2009, 21:29
I see. From what I understand, the crucial "lossy" step in image/video coding is then when the DCT coefficients are divided by the specified matrix (and I suppose the divided numbers are multiplied back prior to iDCT)?

Yes, only the quantization step is lossy. The transform itself is lossless. The entropy coding is lossless too.

In that case, how do the quantization strength factors apply to the process? Do we multiply our divisor matrix by the strength factor, and thus higher factors result in more numbers being rounded to 0?

Yes, I think that's basically how it works.
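As a hedged illustration of that idea (not the exact scheme of any particular codec), the sketch below divides one row of made-up coefficients by a row of divisors scaled by a strength factor, rounds, and multiplies back. The coefficient values and the `quantize`/`dequantize` helpers are hypothetical; the divisors are only meant as typical-looking values that grow towards the higher frequencies.

```python
def quantize(coeffs, divisors, strength):
    """Divide each coefficient by its scaled divisor and round to an integer."""
    return [round(c / (d * strength)) for c, d in zip(coeffs, divisors)]

def dequantize(levels, divisors, strength):
    """Multiply back before the inverse transform; the rounding loss stays."""
    return [level * d * strength for level, d in zip(levels, divisors)]

# Hypothetical coefficients for one row, lowest frequency first.
coeffs = [670.0, -120.5, 30.2, -14.8, 6.1, -3.9, 2.2, -0.7]
# Illustrative divisors: larger for higher frequencies.
divisors = [16, 11, 10, 16, 24, 40, 51, 61]

levels_mild = quantize(coeffs, divisors, 1)
levels_strong = quantize(coeffs, divisors, 4)

print(levels_mild)    # [42, -11, 3, -1, 0, 0, 0, 0]
print(levels_strong)  # [10, -3, 1, 0, 0, 0, 0, 0]
print(dequantize(levels_strong, divisors, 4))  # [640, -132, 40, 0, 0, 0, 0, 0]
```

The higher strength factor leaves one more zero in this row and also makes the surviving values coarser: 670 comes back as 640 after dequantization.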

As for the "mathematical foundation" of the DCT, I at least want to understand why the cosine function lets us break up the image into parts like this, but I suspect I'll have to go back to school and take more math to find out.

The DCT is just a different way of representing (or "looking at") the image data. The data is transformed into the "frequency domain".

This doesn't change the data, but it's a representation of the data that is more suitable for compression...

thewebchat
1st October 2009, 21:44
Right, but I was wondering about why the cosine function in particular lets us look at the information in this way. Anyway, it seems I've understood most (?) of what I asked, so I'll leave it at that.

LoRd_MuldeR
1st October 2009, 21:55
Right, but I was wondering about why the cosine function in particular lets us look at the information in this way.

Then you better get reading :D
http://www.wisnet.seecs.edu.pk/publications/tech_reports/DCT_TR802.pdf

thewebchat
2nd October 2009, 02:00
Thank you. I spent the better portion of an hour reading that document. However, three terms that were recurring throughout the text confused me. They are the "compression of energy," "decorrelation," and "probabilistic mass function."

Now, the only kind of energy I know of is the F*d kind. As for the "probabilistic mass function," I neither know what this is nor what mass has to do with images. For "decorrelation," I sort of understand what they are talking about, but what specifically does it mean when you "decorrelate" a pixel? What do these new "decorrelated" values do?

Dark Shikari
2nd October 2009, 02:12
Thank you. I spent the better portion of an hour reading that document. However, three terms that were recurring throughout the text confused me. They are the "compression of energy," "decorrelation," and "probabilistic mass function."

Now, the only kind of energy I know of is the F*d kind. As for the "probabilistic mass function," I neither know what this is nor what mass has to do with images. For "decorrelation," I sort of understand what they are talking about, but what specifically does it mean when you "decorrelate" a pixel? What do these new "decorrelated" values do?

I've never heard the term "probabilistic mass function".

Decorrelation is removing the correlation from data (and thus making it easier to compress).

A simple example:

5 5 5 5 6 6 6 6

This is really the sum of two basis functions:

(1 1 1 1 1 1 1 1)*5.5
(-1 -1 -1 -1 1 1 1 1)*0.5

This transformation has decorrelated our 8 input values into two frequency coefficients.
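The arithmetic can be checked with a few lines of Python. This is only a sketch: the two basis functions here are a flat one and a step, not the cosine shapes the DCT uses, and the helper names `dot` and `weight` are invented for the example. Because the two basis functions are orthogonal (their dot product is 0), each weight is simply the projection of the input onto that basis function.

```python
values = [5, 5, 5, 5, 6, 6, 6, 6]
flat = [1] * 8                 # (1 1 1 1 1 1 1 1)
step = [-1] * 4 + [1] * 4      # (-1 -1 -1 -1 1 1 1 1)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weight(signal, basis):
    """Projection of the signal onto one basis function."""
    return dot(signal, basis) / dot(basis, basis)

flat_weight = weight(values, flat)   # 44 / 8
step_weight = weight(values, step)   # 4 / 8
rebuilt = [flat_weight * f + step_weight * s for f, s in zip(flat, step)]

print(flat_weight, step_weight)  # 5.5 0.5
print(rebuilt)                   # [5.0, 5.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0]
```

Eight correlated numbers have become two weights, and the other six weights of a full 8-function basis would all be zero for this input.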

thewebchat
2nd October 2009, 02:25
Ah, I see. Although, I think you meant 5.5 maybe?

Dark Shikari
2nd October 2009, 02:28
Ah, I see. Although, I think you meant 5.5 maybe?

Yes, typo fixed.

LoRd_MuldeR
2nd October 2009, 03:02
Thank you. I spent the better portion of an hour reading that document. However, three terms that were recurring throughout the text confused me. They are the "compression of energy," "decorrelation," and "probabilistic mass function."

I think what they mean by "energy compaction" is what you can see in the sample images on page 10/11.

In the "original" images (left) the data (or "brightness intensities") is distributed all over the image. But in the DCT-transformed image (right) the data/coefficients are "compacted" in a small area.

That's the same effect used in image compression. After the transform of an 8x8 pixel block, most of the 64 coefficients are very small, except those in the upper left corner.

Hence those near-zero coefficients can be rounded off to zero without losing much information...
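One way to put a number on this, as a sketch only: take "energy" to mean the sum of squared values, transform a smooth block with an orthonormal 8x8 DCT-II, and see how much of the total sits in the four upper-left coefficients. The gradient block and the names `dct_1d`, `dct_2d` and `energy` are assumptions made for the example, and a block with lots of fine texture would compact far less well.

```python
import math

N = 8

def dct_1d(values):
    out = []
    for k in range(N):
        scale = math.sqrt((1 if k == 0 else 2) / N)
        out.append(scale * sum(values[x] * math.cos((2 * x + 1) * k * math.pi / (2 * N))
                               for x in range(N)))
    return out

def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d([rows[y][u] for y in range(N)]) for u in range(N)]
    return [[cols[u][v] for u in range(N)] for v in range(N)]  # coeffs[v][u]

def energy(values):
    return sum(value * value for value in values)

# Hypothetical smooth block: a diagonal brightness gradient.
pixels = [[16 * x + 8 * y for x in range(N)] for y in range(N)]
coeffs = dct_2d(pixels)

pixel_energy = energy(p for row in pixels for p in row)
coeff_energy = energy(c for row in coeffs for c in row)
corner_energy = energy(coeffs[v][u] for v in range(2) for u in range(2))

print(pixel_energy, coeff_energy)    # equal up to rounding: the energy is preserved
print(corner_energy / coeff_energy)  # above 0.99 for this smooth block
```

So the transform does not reduce the energy, it only moves it: here 4 of the 64 coefficients carry nearly all of it, and the remaining 60 are the ones that quantization can round away cheaply.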