View Full Version : Gray Scale Encoding


koliva
6th March 2009, 09:36
Hi all,

I have asked another version of this question in a different place, but I think it is also appropriate for this topic.

I have 8-bit grayscale images. I can successfully import these files into VirtualDubMod via an Avisynth script. I select the ffdshow video codec and then the H.264 encoder. When I look at the properties of the resulting video, it says 24 bit. Why? Shouldn't it be 8 bit? :confused:

Dark Shikari
6th March 2009, 09:39
1. Your "properties" is wrong. x264 only supports YV12, which is 12-bit.

2. Don't use VirtualDubMod for any purpose whatsoever. It's old, deprecated, and mostly broken.

3. Don't use x264VFW. If you want to use x264, use it properly, through Avisynth or one of many GUIs.

4. x264 doesn't support grayscale (4:0:0) encoding because almost no decoders support it--and because the number of bits it saves is truly microscopic; on the order of 0.01%, so small an amount that it wasn't even worth committing the code despite the fact that I already wrote it.

koliva
6th March 2009, 09:50
So, I can't use H.264 for my grayscale input image sequences. I didn't understand why the number of bits is microscopic. Do you have any suggestions for how I can compress my grayscale input images?

Thank you for your replies.:thanks:

Dark Shikari
6th March 2009, 09:51
So, I can't use H.264 for my grayscale input image sequences.

Sure you can--it'll just add dummy chroma channels. There is absolutely nothing wrong with this.

I didn't understand why the number of bits is microscopic.

Because in an arithmetic coder, it takes almost no bits to say, thousands of times in a row, "there is no chroma here."

koliva
6th March 2009, 10:02
OK. I give up on H.264 for my problem. I understand that H.264 mainly uses the chroma component to compress video. So, I should choose an encoder which doesn't look at the chroma component. Do you have any suggestions?

Dark Shikari
6th March 2009, 10:05
I can't make any meaningful sense out of this post.

Again, there's nothing wrong with x264 for encoding grayscale images. I don't understand what you're trying to say.

koliva
6th March 2009, 10:08
I will try to explain my situation in more detail.
My images are uncompressed BMP files. We have a projector that we built ourselves, and I have to send the R, G, B images separately to the projector. So, I am storing the R, G, B images. But I have a huge amount of data, so somehow I have to compress them; I don't need to send uncompressed R, G, B images. My BMP images are 24 bit and, after the splitting process, I have R, G, B images with 8-bit depth. In the video, I need to put them one after the other: R, G, B for the first frame, R, G, B for the second frame, and so on. I hope I have explained my problem. Thanks.

Dust Signs
6th March 2009, 10:38
So why don't you encode it in full and separate the R, G and B channels for sending (after decoding and converting the YUV raw data to RGB)? Wouldn't that be easier for you than encoding three separate files?

Dust Signs

koliva
6th March 2009, 10:45
Yes, this is another idea, but I really do need to save the R, G, B videos separately. I will use another way to send these videos to the projector. So I need to compress these components one by one, separately.

Dust Signs
6th March 2009, 10:52
You have to decode them anyways before sending, so why do you want to encode them separately? Are you really sure what you are doing?

Dust Signs

koliva
6th March 2009, 11:01
:) Yes, I am quite sure what I am doing. But I think we are not on the same wavelength. OK, I will explain in more detail again.

Our projector has 3 LCOS panels, one each for R, G, B. When the R image is on its panel, the red light is on and the R image appears on the screen at the same time. The same holds for G and B. Therefore, before sending the video, I need to split it into its R, G, B components and send each one separately to the corresponding panel. At that point, I need to encode them. Encoding will take place on the PC, but decoding will take place in the processor of the panel.

Dust Signs
6th March 2009, 11:20
Oh, I see. And the decoding process which takes place at each panel can handle H.264 streams in real time?

Dust Signs

koliva
6th March 2009, 11:27
That is not the case for now. Maybe later. There is 2 Gbit of RAM on board. I think we can use buffering. So, do you have any idea about how to compress 8-bit data? :rolleyes:

Dark Shikari
6th March 2009, 11:29
What is wrong with doing what I proposed--adding dummy chroma channels (like everyone else does)? It isn't as if it costs any bits, or even a remotely significant amount of decoding time.

LoRd_MuldeR
6th March 2009, 11:31
Still I don't see the problem. You can compress each of the three components (R, G, B) separately with x264.

You will end up with three independent H.264 streams in the 4:2:0 colorspace, only that the "chroma" channels of each stream are unused (and hence take almost zero bits).

koliva
6th March 2009, 11:41
Yes, I can use H.264 to encode my R, G, B image sequences, but each of my R images is just 8 bit. After encoding, my video is 24 bit. I don't need anything beyond 8 bit and I don't want to use it. OK, I can encode my images with H.264 but, on the other hand, file size is also quite important. If the values of these 16 bits are almost 0, I should somehow be able to set the video to 8 bit. Am I wrong?

LoRd_MuldeR
6th March 2009, 11:51
x264 uses the YV12 colorspace (aka "4:2:0"). That is 12 Bits/Pixel, because of Chroma-Subsampling:
There's one 8-Bit Luma value (Y) for each pixel. Additionally there are two 8-Bit Chroma values (Cb and Cr) for each 2x2 block of pixels.
Hence each 2x2 block of pixels takes 4*8 + 2*8 = 48 Bits. So that is 48/4 = 12 Bits per pixel, on average.

In your case there would only be Luma data (Y-values); the Chroma data (Cb- and Cr-values) will simply be empty and take ~zero bits after entropy coding ;)
So in the encoded bitstream the "dummy" Chroma channels don't take any noteworthy space, and after decoding you can simply discard them...

Some more info:
http://en.wikipedia.org/wiki/Chroma_subsampling#Sampling_systems_and_ratios
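
The same arithmetic extends to the other sampling layouts described on the linked page. A minimal Python sketch, assuming 8-bit samples and counting samples per 2x2 pixel block (the `LAYOUTS` table and the `bits_per_pixel` helper are illustrative names, not part of x264 or Avisynth):

```python
# Samples inside one 2x2 pixel block: (luma samples, samples per chroma plane).
LAYOUTS = {
    "4:4:4": (4, 4),  # full-resolution chroma
    "4:2:2": (4, 2),  # chroma halved horizontally
    "4:2:0": (4, 1),  # chroma halved in both directions (YV12)
    "4:0:0": (4, 0),  # luma only
}


def bits_per_pixel(layout, bit_depth=8):
    """Average bits per pixel for an uncompressed frame in the given layout."""
    luma, chroma = LAYOUTS[layout]
    block_bits = (luma + 2 * chroma) * bit_depth  # Y plus Cb plus Cr
    return block_bits / 4  # a 2x2 block covers 4 pixels


for name in LAYOUTS:
    print(name, bits_per_pixel(name))
```

For "4:2:0" this reproduces the 12 bits per pixel worked out above, and "4:0:0" gives the 8 bits per pixel of a pure grayscale image.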

koliva
6th March 2009, 12:04
I am not sure that I understand you correctly. Please correct me if I am wrong.

After encoding my grayscale image sequence, my resulting video indeed has only Y information. The chroma channels are zero (because there is no chroma information) and, after entropy coding, these bits are gone. If so, why do I see "24 bit" when I look at the properties of the resulting video?
:confused:

Dark Shikari
6th March 2009, 12:10
Because the program you are using to view the properties is bugged. The actual format is 12 bits per pixel.

LoRd_MuldeR
6th March 2009, 12:10
Before your video is encoded, the data will be converted to the YV12 format, because that's the only format accepted by x264.
Now, if your source video is grayscale, then only the Y-values (Luma) will be significant, all the Cr- and Cb-values (Chroma) will simply be zeroed.
After encoding the "dummy" Chroma values are not gone, but they are compressed very efficiently, so they take almost no bits!
Finally after decoding you will get YV12 data again, 12 bits/pixel. Throw away the unneeded Chroma channels and all that remains is the 8 bit/pixel Luma data.
If you get 24-Bit, then your decoder/player is probably converting the output from YV12 to RGB24 for some reason...
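
The round trip described above can be sketched in a few lines of Python. This only illustrates the buffer layout, not what x264 or a decoder does internally. It assumes planar YV12 (Y plane, then V, then U), even frame dimensions, and dummy chroma filled with the neutral value 128; `gray_to_yv12` and `luma_from_yv12` are hypothetical helper names.

```python
def gray_to_yv12(gray, width, height):
    """Pack an 8-bit grayscale plane into a planar YV12 buffer (Y, V, U)."""
    if len(gray) != width * height:
        raise ValueError("gray plane does not match the frame size")
    if width % 2 or height % 2:
        raise ValueError("YV12 needs even dimensions")
    chroma_size = (width // 2) * (height // 2)
    neutral = bytes([128]) * chroma_size  # dummy chroma, no color information
    return bytes(gray) + neutral + neutral


def luma_from_yv12(frame, width, height):
    """Throw away the chroma planes and keep the 8-bit luma plane."""
    return frame[: width * height]


width, height = 4, 2
gray = bytes(range(width * height))          # tiny made-up test image
frame = gray_to_yv12(gray, width, height)

print(len(frame) * 8 / (width * height))     # average bits per pixel
print(luma_from_yv12(frame, width, height) == gray)
```

The packed frame averages 12 bits per pixel, and slicing off the first `width * height` bytes recovers the original 8-bit grayscale data unchanged.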

Dark Shikari
6th March 2009, 12:24
Now, if your source video is grayscale, then only the Y-values (Luma) will be significant, all the Cr- and Cb-values (Chroma) will simply be zeroed.

128'd, to be exact ;)

koliva
6th March 2009, 13:55
Ok. Now everything is totally clear except,

128'd, to be exact ;)

What do you mean?

Another question: I made a video from my color images. It is 76.372 MB. Then I added the GreyScale() function at the end of the same Avisynth script to make it grayscale. The grayscale video is 74.810 MB. Why is there only a small difference? I thought I would get 1/3 of the size of the color video, but I did not. Why?

By the way, thank you for all your replies and your patience :rolleyes:

kemuri-_9
6th March 2009, 14:24
128 is the proper value for neutral (colorless) chroma channels, not 0.
This is what the chroma is/should be set to for proper YV12 greyscale.
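
To see why 128 rather than 0 is the neutral value, here is a small Python sketch that converts a single YCbCr sample to RGB. It assumes the full-range BT.601 (JPEG-style) equations; video normally uses limited range, but the neutral chroma value is 128 in both cases. `ycbcr_to_rgb` is an illustrative helper, not an Avisynth function.

```python
def clamp(value):
    """Round and clip to the 8-bit range."""
    return max(0, min(255, round(value)))


def ycbcr_to_rgb(y, cb, cr):
    """Full-range BT.601 YCbCr to RGB; chroma is centered on 128."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


print(ycbcr_to_rgb(128, 128, 128))  # chroma at 128: a neutral mid-gray
print(ycbcr_to_rgb(128, 0, 0))      # chroma at 0: strongly green, not gray
```

With chroma at 128 the three RGB components come out equal, which is what "grayscale" means; with chroma at 0 the same luma value decodes to a saturated green.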

LoRd_MuldeR
6th March 2009, 17:53
What do you mean?

He means that for "grayscale" video all the chroma values (Cb and Cr) will be set to 128, not to 0.

Anyway, a long sequence of the very same value (may it be 0's or may it be 128's) will be compressed to almost zero bits after entropy encoding.
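
This is easy to try out. The sketch below uses Python's `zlib` purely as a stand-in for an entropy coder (H.264 itself uses CAVLC or CABAC, and it codes prediction residuals rather than raw samples), and the 640x480 frame size is a made-up example:

```python
import os
import zlib

# One chroma plane of a hypothetical 640x480 4:2:0 frame (320x240 samples).
PLANE_SIZE = 320 * 240

planes = {
    "all 128": bytes([128]) * PLANE_SIZE,
    "all 0": bytes(PLANE_SIZE),
    "random noise": os.urandom(PLANE_SIZE),
}

# zlib is only an illustration here; it is not the coder used by H.264.
compressed_sizes = {
    name: len(zlib.compress(data, 9)) for name, data in planes.items()
}

for name, size in compressed_sizes.items():
    print(f"{name}: {PLANE_SIZE} bytes -> {size} bytes")
```

Both constant planes shrink to a tiny fraction of their raw size regardless of whether the constant is 0 or 128, while the random plane does not shrink at all.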

As an another question, I made a video using my colorful images. It has 76.372 mb. And then I added GreyScale() function at the end of the same avisynth script to do it grayscale. Now grayscale video has 74.810 mb. Why there is a little difference? I thought that I would get 1/3 size of colorful video but not. Why?

The reason is the Chroma-Subsampling of the YV12 (4:2:0) format: There are four Luminance values for each 2x2 pixel-block (one Y value for each pixel), but only two Chroma values (one Cb and one Cr value for the whole 2x2 block). So even if you discard all the Chroma data, you discard only 1/3 of the data. Furthermore it seems that most of the "information" (entropy) was contained in the Luma channel. Therefore applying GreyScale() and removing the Chroma channels (in fact they are not removed, but simply filled with 128's) didn't save you 33% of the compressed size...


http://upload.wikimedia.org/wikipedia/commons/thumb/5/56/Chroma_subsampling_ratios.svg/850px-Chroma_subsampling_ratios.svg.png

koliva
6th March 2009, 21:19
Please correct me if I am wrong.

We both agree that a grayscale image has no Chroma channels. Let's say I have an 800x800 image.

With Chroma channels:
[(800x800)x8]+[(400x400)x8]+[(400x400)x8] = 7,680,000 bit

For a grayscale image, the calculation is the same, but because all of the Chroma values are the same, the size would be

[(800x800)x8] + T ~ 5,120,000 bit

T is the very small amount of data needed for the Chroma channels after entropy coding. In my example, the Chroma channels of my color images didn't carry much information. Therefore, at the end of the encoding I got 76.372 MB. Then I converted the images to grayscale, but the Chroma cost stayed nearly the same. Hence, I got 74.810 MB. If I had chosen an image sequence with more Chroma information, I would have gotten, for example, 110 MB instead of 76.372 MB. Moreover, if I had encoded this sequence with H.264, I would have gotten, for example, 74 MB, which would be nearly the same as 74.810 MB. Is this explanation correct?
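
The raw (uncompressed) sizes for an 800x800 frame can be checked with a short Python sketch. It assumes 8-bit samples and 4:2:0 subsampling, so each chroma plane is 400x400; `yv12_plane_bits` is an illustrative helper name.

```python
def yv12_plane_bits(width, height, bit_depth=8):
    """Raw size in bits of the Y, U and V planes of one 4:2:0 frame."""
    luma = width * height * bit_depth
    chroma = (width // 2) * (height // 2) * bit_depth  # per chroma plane
    return luma, chroma, chroma


y_bits, u_bits, v_bits = yv12_plane_bits(800, 800)

print("luma only:  ", y_bits)
print("with chroma:", y_bits + u_bits + v_bits)
```

Luma alone is 5,120,000 bits and the full 4:2:0 frame is 7,680,000 bits. These are sizes before compression; after encoding, the size depends on the picture content rather than on these raw counts.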

LoRd_MuldeR
6th March 2009, 21:38
We both agree that a grayscale image has no Chroma channels. Let's say I have an 800x800 image

No, the "grayscale" video still has the Chroma channels, but the Chroma channels don't contain any information (ALL values simply are set to "128").

I don't really understand what your example is supposed to say. But you have to remember that the encoder doesn't work on individual pixels. Inside the encoder, "bits per pixel" doesn't matter much. In fact each pixel block will be transformed to the frequency domain. Then the further processing happens on the resulting coefficients. So if you have a "grayscale" video, then all pixels in a Chroma block will have a value of 128. Or in other words: The Chroma blocks are completely flat. After transformation all coefficients of the block, except for the upper left one, will be "0". These coefficients can be compressed extremely well. So after entropy coding the Chroma channels will take almost no bits - they are negligible for the total size of your encode. That's it...

akupenguin
6th March 2009, 23:37
After transformation all coefficients of the block, except for the upper left one, will be "0".
The upper-left coef will be 0 too, due to intra or inter prediction.
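
Both statements can be checked with a small Python sketch of the 4x4 forward core transform used by H.264 (the integer matrix below). This is a simplification under stated assumptions: it leaves out scaling, quantization and the extra transform H.264 applies to chroma DC coefficients, and it assumes the prediction for a flat 128 block is itself a flat 128 block.

```python
# H.264 4x4 forward core transform matrix (integer approximation of a DCT).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(block):
    """Compute CF * block * CF^T."""
    return matmul(matmul(CF, block), transpose(CF))


flat = [[128] * 4 for _ in range(4)]
coeffs = core_transform(flat)  # transform of the raw flat block

# What is actually transformed is the residual: block minus prediction.
prediction = [[128] * 4 for _ in range(4)]  # assumed perfect prediction
residual = [[flat[i][j] - prediction[i][j] for j in range(4)] for i in range(4)]
residual_coeffs = core_transform(residual)

print(coeffs[0])           # first row: only the upper-left (DC) value is non-zero
print(residual_coeffs[0])  # first row after prediction: everything is zero
```

Transforming the raw flat block leaves a single non-zero value in the upper-left corner; once the prediction is subtracted first, that one disappears as well.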

ajp_anton
6th March 2009, 23:44
Original grayscale:
[luma 8 bit] = 8 bit
YV12 grayscale:
[luma 8 bit] + [chroma 4 bit] = 12 bit

In both, the luma channel contains the grayscale video, and in YV12, the chroma box contains only a huge amount of 128's.
Now, after you encode this, you get

Original grayscale:
[luma x bit] = x bit
YV12 grayscale:
[luma x bit] + [chroma ~0 bit] = ~x bit

where x is smaller than 8 (depends on how much it was compressed). In YV12, chroma is almost 0 because (example:) all it needs to say is "here is only a huge amount of 128's", which covers the whole video of thousands of frames. So when encoded, both versions are almost the same size (~0.01% difference from Dark Shikari's first reply).

When decoded, the luma channels are 8 bit again, and in YV12 the chroma is 4 bit again.

koliva
7th March 2009, 13:26
I did an experiment and, as usual, I have some questions about it.

I prepared a video which has 729 color images. Then I encoded it using H.264.
The resulting AVI file size is 19,913 KB.

I converted all these input images to Y, U, V channels and saved them separately. So, I had 3 videos, each with 729 images (Y, U, V channels).
I encoded these 3 videos using the same encoding method.
The resulting AVI file size for the Y image sequence is 19,565 KB.
The resulting AVI file size for the U image sequence is 400 KB.
The resulting AVI file size for the V image sequence is 3,221 KB.

So, when I encode these channels separately, I get about 3,273 KB of extra file size (23,186 KB in total instead of 19,913 KB).
My question might be a little bit weird and unnecessary, but I wonder about it. Why did I get this extra size? What changed when I encoded them separately? I guessed that it is because of entropy coding, but I still want to know exactly.

And my second question: I just split the BMP file into YUV channels and saved them (Y, U, V separately) using Matlab. When I import, for example, the U channel images into VirtualDub, does the H.264 encoder use this information as U channel information or just as gray level information?

Thanks in advance.

Dark Shikari
7th March 2009, 13:38
Because:

1) Chroma is coded differently from luma in H.264.

2) There is a high correlation between chroma and luma with regards to motion, etc--coding them separately duplicates this data. Coding the same stream three times means coding data like motion vectors three times as well.

koliva
7th March 2009, 14:05
So, you mean that there is one motion vector for both Luma and Chroma channels. Is that correct?

If so, this extra space is mainly due to the motion vectors.

LoRd_MuldeR
8th March 2009, 02:04
I think the "extra" size also may be explained with Chroma subsampling. When you encode the "colorful" images (I assume the source is RGB, not subsampled), then they will be converted to YV12 before encoding. So the resolution of both Chroma channels will be reduced to 1/2 in both dimensions, while the full Luma resolution is kept. Now when you separate the channels before encoding, you will get three videos: One containing the Luma information, two containing the Chroma information. In fact you have just generated three "grayscale" videos from one "colorful" source video. If these three separate videos are then converted to YV12, each video will contain all information in its Luma channel (at full resolution!), while the two Chroma channels of each video are unused (filled with "dummy" data). In other words: The Chroma information of your original "colorful" images is now stored in the Luma(!) channel of the corresponding separate video file. Hence it is not subsampled any more, but encoded at full resolution...
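
A rough sample count makes the effect concrete. The Python sketch below assumes, as the explanation above does, that the U and V images were exported at full resolution, and it uses a made-up 640x480 frame size. It counts only samples that carry picture information, not the dummy chroma planes of the three separate streams.

```python
def samples_combined_420(width, height):
    """Information-carrying samples in one ordinary 4:2:0 frame."""
    return width * height + 2 * (width // 2) * (height // 2)


def samples_three_gray_streams(width, height):
    """Y, U and V each stored at full resolution as the luma of its own stream."""
    return 3 * width * height


width, height = 640, 480
combined = samples_combined_420(width, height)
separate = samples_three_gray_streams(width, height)

print(combined, separate, separate / combined)
```

Under these assumptions the three separate streams carry twice as many meaningful samples per frame as the single 4:2:0 encode. How much of that shows up in the file size depends on the content, and the duplicated motion data mentioned earlier in the thread adds to it.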

Sagekilla
8th March 2009, 06:31
@LoRd_MuldeR: Yep, it's a bit more involved than that, I think. But you are right that he's more or less encoding -way- more information than necessary. If we had proper support for 4:4:4, this wouldn't be an issue. But since we don't, we have to make do with 4:2:0, or resort to some ridiculous RGB -> 3 x Y type encoding.

koliva
8th March 2009, 11:17
I have changed my experiment a little bit and I got weird results. I don't know the reason; maybe you can help me clarify it.

In my previous experiment, I saved the Y, U, V channels separately and, hence, I had 3 videos. Then you said that I got the extra size because my chroma information was kept in the Luma channel when I saved the U and V channel videos. That is OK.

Now I have saved the Y, U, V channels separately again, but with a difference. When I saved the Luma information, I saved 128 for all Chroma channels in order not to keep Chroma information in the Luma channel. I did the same thing for the Chroma channels as well: when I save the U channel, I save 128 for the Y and V information.

At the end of the process I had 3 videos, each carrying the information of only one channel. Then I encoded them using H.264. Here are my results:

Video which has only Y channel information => 3,224 KB
Video which has only U channel information => 3,196 KB
Video which has only V channel information => 369 KB

However, my original video was 19,913 KB. Am I doing something wrong? I bet I did :confused:
I expected the video that has only Y information to be larger.

LoRd_MuldeR
8th March 2009, 20:13
I don't really understand what you are trying to do now. Why not simply accept what has been explained already?

koliva
8th March 2009, 20:33
I do accept all of them. I am just trying to twist its arm a little bit :devil:

akupenguin
9th March 2009, 05:11
Now I have saved the Y, U, V channels separately again, but with a difference. When I saved the Luma information, I saved 128 for all Chroma channels in order not to keep Chroma information in the Luma channel. I did the same thing for the Chroma channels as well: when I save the U channel, I save 128 for the Y and V information.
This will be inefficient for reasons different from before: Parts of x264's motion estimation algorithms look at luma only (because usually chroma motion is well correlated with luma, and it's faster this way.) With --no-chroma-me this is everything; but even with chroma-me, fullpel is luma only.

koliva
9th March 2009, 18:46
This question is a bit off topic. As far as I know, the maximum block size in H.264 is 16x16. If I have an image sequence in which the difference between two frames is larger than one block, the H.264 algorithm will not be so efficient. So, we can say that H.264 is highly efficient when adjacent frames do not differ from each other by more than 16 pixels.

Is this correct?

Sharktooth
9th March 2009, 18:55
nonsense.
h.264 implements variable block sizes. so if the difference between 2 frames is bigger than the maximum block size, it means the difference spans more than one block. it's the same for EVERY block-based encoder.
i can't see what you are aiming at.

koliva
9th March 2009, 19:03
In my video, there is a ball which moves only horizontally. Nothing more. I deleted some intra frames, but the file size is now larger than I expected. I thought that it was because of the difference between adjacent frames. For example, before deleting, the ball moved 10 pixels per frame. After deleting, it moved 20 pixels.

Sharktooth
9th March 2009, 19:10
bigger MVs

koliva
9th March 2009, 19:12
I thought that an MV has the same size for all frames. So I need to read up on MVs. :thanks:

Sharktooth
9th March 2009, 19:14
the more motion the more bits will be needed for MVs.
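
As a rough illustration, H.264 in CAVLC mode writes each motion vector difference as a signed Exp-Golomb code, whose length grows with the magnitude of the value. The Python sketch below is idealized: it assumes quarter-pixel units and treats the whole displacement as the coded difference, whereas a real encoder codes the difference from a predicted vector, and CABAC (the other entropy coding mode) works differently. `signed_exp_golomb_bits` is an illustrative helper name.

```python
def signed_exp_golomb_bits(value):
    """Length in bits of the signed Exp-Golomb code se(v) for an integer."""
    # Map the signed value to an unsigned code number: 0, 1, -1, 2, -2, ...
    code_num = 2 * value - 1 if value > 0 else -2 * value
    # An unsigned Exp-Golomb code takes 2 * floor(log2(code_num + 1)) + 1 bits.
    return 2 * (code_num + 1).bit_length() - 1


QUARTER_PEL = 4  # motion vectors are stored in quarter-pixel units

for pixels in (0, 10, 20):
    bits = signed_exp_golomb_bits(pixels * QUARTER_PEL)
    print(f"{pixels} px -> {bits} bits")
```

Under these assumptions a 10-pixel displacement costs 13 bits and a 20-pixel one 15 bits per vector component, compared with a single bit for no motion.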

_DW_
10th March 2009, 17:57
Result avi filesize is 19,913 kb.

Small clarification here. I didn't think it was a good idea to stuff an H.264 stream into AVI. I know you can do it, I just didn't think you should. Like sticking a fork in a light socket....

Sharktooth
11th March 2009, 14:07
yeah... sort of... by doing that, what you get are only troubles...

TDC.net
20th November 2010, 15:19
Just another question about grayscale encoding:
it's overall 12 bits/pixel, but only 8 bits luma / pixel?
so encoding a set of 12 bit grayscale images (dicoms) as mpeg4 video will cut information down to 8 bit grayscale?
Or is there a way to preserve 12 bit grayscale?
Thanks for your help!

nm
20th November 2010, 17:00
Just another question about grayscale encoding:
it's overall 12 bits/pixel, but only 8 bits luma / pixel?
Yes, when using 8-bit YUV 4:2:0.

so encoding a set of 12 bit grayscale images (dicoms) as mpeg4 video will cut information down to 8 bit grayscale?
Or is there a way to preserve 12 bit grayscale?

You could encode at 10 bits with x264, but there aren't many decoders that support such H.264 streams. Mainconcept does, I think.
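
What is lost when 12-bit samples go into an 8-bit or 10-bit encode can be counted directly. The Python sketch below assumes the depth reduction is a plain right shift (no rounding or dithering); `reduce_depth` and `distinct_levels` are illustrative helper names.

```python
def reduce_depth(sample, src_bits, dst_bits):
    """Drop the least significant bits of a sample (plain truncation)."""
    return sample >> (src_bits - dst_bits)


def distinct_levels(src_bits, dst_bits):
    """How many different output values survive the depth reduction."""
    return len({reduce_depth(v, src_bits, dst_bits)
                for v in range(1 << src_bits)})


for dst_bits in (12, 10, 8):
    print(f"12-bit -> {dst_bits}-bit: {distinct_levels(12, dst_bits)} gray levels")
```

Of the 4096 gray levels in a 12-bit image, 1024 remain at 10 bits and 256 at 8 bits, so a 10-bit encode still merges every four neighboring levels into one.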

TDC.net
21st November 2010, 13:17
:thanks: