Compression of black bars in AVC, does it take a lot of space?


tijgert
19th October 2009, 00:07
I've been experimenting with different encode settings. The original goal was to see for myself the quality difference (if any) at medium bitrates. By mistake I made one version of 'The Patriot' (2:55 hours) with the black bars cropped (17 GB) and one version without cropping (19 GB).

I'm having a hell of a lot of trouble actually seeing any difference between the two encodes, but that *may* be because the moving part of the movie gets roughly the same bitrate in both while the black bars eat up those 2 extra gigs. I don't really know.

What I'm actually asking you folks here is about the compressibility of the black bars: can anyone give me a reasonable conclusion on how cropping black bars affects movie quality when encoding at the same bitrate? (Do they compress down to near nothing or are they hungry for bits?)

Boolsheet
19th October 2009, 00:24
I know someone asked a similar question not too long ago. *pokes search* Ah here:
http://forum.doom9.org/showthread.php?t=149403

Mh, if I remember correctly there must be another thread with the same question...

tijgert
19th October 2009, 00:36
The other topic actually implies a better quality movie WITH the black bars during a one pass encode.
But I was doing a 2 pass encode with Ripbot (I know, hardly professional but I'm just trying to learn from experience) and so I'm not sure how that translates.

I have no interest in any compliance with resolution restrictions by the way.

Sagekilla
19th October 2009, 00:48
If you're using crf mode, the fact that the encode with black bars came out larger is normal and expected. Black bars = very flat, easy to encode. Because of how much space they take up in the frame, they decrease the overall complexity of the scene and x264 lowers the quantizer used for the frames. That's why you get a larger file size.

Otherwise, if you encoded using identical bitrates (using 2-pass, for example), there should be little to no difference since black bars consume VERY few bits.

LoRd_MuldeR
19th October 2009, 00:53
What actually consumes bits would be the border between the black area and the actual content. If the border is perfectly sharp and aligned to mod16, the overhead for the black area will be negligible.

Dark Shikari
19th October 2009, 00:53
You should always crop black bars if possible. They don't take up much space, but there's no reason to keep them (it wastes encoding and decoding time) and it slightly lowers the efficiency of motion compensation. It's worse if the black bars don't end on mod16 boundaries.

tijgert
19th October 2009, 01:03
Yeah well I intended to crop the bars, I just forgot.
Since I encoded both attempts at different bitrates I also can't say that those 2 extra Gigs went to the black bars.
It'd be kinda much anyway.

This little foul-up just made me curious as to the effect of leaving the black bars in (with very crisp borders by the way)

benwaggoner
19th October 2009, 10:32
Black bars from noisy sources are also a lot worse from both a compression efficiency and quality perspective. Even a little bit of noise in there can require a lot of bits to avoid blocking on LCD displays. Cropping them out thus saves some noise reduction tweaking and processing time as well.

Getting letterboxing to be mod16 is really important for quality with MPEG-2, since it doesn't have any good way of dealing with a really sharp edge mid-block.

shon3i
19th October 2009, 10:57
Is it better to crop and add new black bars if needed (Blu-ray) than to keep the original ones?

Dark Shikari
19th October 2009, 10:59
Is it better to crop and add new black bars if needed (Blu-ray) than to keep the original ones?
Yes, and to add the new ones such that the borders of the black bars are mod16.

Sharc
19th October 2009, 11:11
When Blu-ray compliance is required it is good practice to crop the black borders so as to leave a mod16-compliant active picture, and to add new, noise-free black borders using AddBorders(...).

Edit:
DS was faster ....
But question: If the added black borders are not mod16, wouldn't x264 just pad it anyway to be mod16 compliant?

Dark Shikari
19th October 2009, 11:18
But question: If the added black borders are not mod16, wouldn't x264 just pad it anyway to be mod16 compliant?
I meant that the edges of the borders were mod16, not the image itself.
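
To make the "edges of the borders are mod16" point concrete, here is a minimal sketch of how one might pick the crop and border sizes for a letterboxed frame. The function name and the centring rule are my own assumptions, not something any tool in this thread does for you.

```python
def mod16_letterbox(frame_height, active_height, mb=16):
    """Return (crop_rows, top_border, bottom_border) for a letterboxed frame.

    Hypothetical helper: both edges between black bar and picture end up on
    a macroblock (16-row) boundary.
    """
    # Crop the active picture down to a whole number of macroblock rows.
    cropped = (active_height // mb) * mb
    crop_rows = active_height - cropped
    total_border = frame_height - cropped
    # Top border is a multiple of 16, as close to centred as possible.
    top = round(total_border / 2 / mb) * mb
    # The bottom bar takes whatever is left; its upper edge is top + cropped,
    # which is a multiple of 16 because both terms are.
    bottom = total_border - top
    return crop_rows, top, bottom


crop_rows, top, bottom = mod16_letterbox(1080, 800)
print(crop_rows, top, bottom)
```

For a 1920x800 picture inside 1080 rows this gives uneven bars (144 above, 136 below) instead of 140/140, so the picture starts and ends on a macroblock row. In AviSynth the values would go into something like AddBorders(0, top, 0, bottom) after the Crop call.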

Shevach
19th October 2009, 13:03
What actually consumes bits would be the border between the black area and the actual content. If the border is perfectly sharp and aligned to mod16, the overhead for the black area will be negligible.

I agree with the above quote. Indeed, if a black border is not aligned to a 16-pixel boundary, then the sharp vertical edge may produce high-frequency components after the DCT.

I would like to clarify some further points related to CABAC encoding. Note that they are relevant only if the stream is encoded in CABAC mode.
The entropy models of an active video area and of a black area are usually very different. Consequently, when an encoder reaches the black area with entropy models tuned for the active video, "penalty" (extra) bits are generated.
In information theory there is a specific term for this, the Kullback–Leibler divergence, which measures the expected number of extra bits required to code samples from a source with actual distribution P when using a code based on the distribution Q.

Moreover, within the black area CABAC starts to adapt itself to the actual entropy of the black area, and when CABAC returns to the active video area it again wastes extra bits until it re-tunes to the entropy of the active video.
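
As a rough illustration of the Kullback–Leibler penalty for a single binary flag, the sketch below computes the expected extra bits per symbol when the data follows Bernoulli(p) but the coder's model assumes Bernoulli(q). The probabilities used are made-up examples, not measurements from any encoder.

```python
import math


def kl_bits(p, q):
    """Expected extra bits per binary symbol: data ~ Bernoulli(p), model ~ Bernoulli(q)."""
    total = 0.0
    for actual, assumed in ((p, q), (1 - p, 1 - q)):
        if actual > 0:  # a zero-probability outcome contributes nothing
            total += actual * math.log2(actual / assumed)
    return total


# Hypothetical numbers: in the black bar 98% of MBs are skipped, but the model
# was tuned on active video where only 30% were skipped.
penalty = kl_bits(0.98, 0.30)
print(f"{penalty:.2f} extra bits per flag until the model adapts")
```

The penalty is zero when the model matches the source and grows quickly as the two distributions move apart, which is the situation at the boundary between picture and bar.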

G_M_C
19th October 2009, 13:25
Just theoretical:

If you need the black bars to get to a BD-spec resolution, is it not possible to encode the cropped image and add the black bars back by adding a slice above and a slice below the real image? (So in fact making slices useful in those cases?)

How it can be done is another matter. But theoretically speaking, should this not work?

Shevach
19th October 2009, 14:05
G_M_C,
I assume your comment relates to my explanation.

Indeed, we can encode the top and bottom margins as separate slices. After at most 64 MBs CABAC should tune itself to the actual entropy of the black area (since the context models are initialized according to cabac_init_idc and SliceQP and start far from the actual entropy). In that case extra bits are generated only for the first 64 MBs.
As for the right and left margins, 'slicing' is not a good idea.

Generally speaking, anyone can conduct the following experiment:
1) Take two "different" video excerpts, say with the same resolution of 480x720 and the same number of pictures. Here "different" means with different entropy (e.g. one stream can be from a cartoon, the other from live video).

2) Encode these two video streams in CABAC mode with constant QP (to exclude the impact of rate control).
Suppose that the first stream gives X bits and the other Y bits.

3) Then compose a new video by concatenating the raw data of the first excerpt and the second one. The resulting video has a resolution of 480x1440 (where the left part is from the first stream and the right part is from the second one).

4) Encode the concatenated video excerpt in CABAC mode with the same QP.
The new stream is expected to give many more bits than (X + Y), because the entropy changes across each row and CABAC can't stabilize.

akupenguin
19th October 2009, 20:03
Slices can only end at macroblock boundaries, which is precisely the condition on which borders aren't too horribly inefficient anyway.
The number of bits wasted in waiting for cabac to readapt from coding "all skips" to coding real content, is less than the number of bits spent on a slice header. Because the border will be all skips, so it doesn't need to code any other type of token, so it doesn't contaminate any other part of the cabac model.
A new slice might possibly improve mv prediction along the top edge (which would otherwise use a constant 0 prediction due to the two neighbors with mv=0), and it would allow use of DC prediction for intra blocks there (which would otherwise average half of its neighbors from the black part). This doesn't do jack for the bottom border, since motionless skip blocks don't rely on neighbors even that much.

Then compose a new video by concatenating raw data of the first excerpt and the second one.
That mixes a change in entropy coding with a change in motion interpolation. mvs that used to point off the left or right edge of the frame (and thus get extrapolated samples), now point into the other video (and thus get much worse values).

Shevach
20th October 2009, 10:09
That mixes a change in entropy coding with a change in motion interpolation. mvs that used to point off the left or right edge of the frame (and thus get extrapolated samples), now point into the other video (and thus get much worse values).

The question is how to extract the impact of the entropy change alone from this mixture.

Let's consider a P-frame with width 64x16 pixels where all MBs in each even row are non-skip while all MBs in each odd row are skip.
In that case, for the mb_skip_flag syntax element only one context model, number 12, is ever selected, since the left MB is always skip/non-skip and the top MB is always non-skip/skip.

When the encoder finishes an even row, context model #12 contains LPS=1 (least probable symbol) with probability 0.01875 (the minimal probability in CABAC).
Now the encoder starts to process the next MB row (an odd row), where the only syntax elements to code are mb_skip_flag and end_of_slice. The number of bits generated for mb_skip_flag of the first MB is 6-7 bits. After 64 MBs mb_skip_flag is coded with ~1 bit. Thus ~2-3 extra bits are produced on average for coding each mb_skip_flag.

The above example shows how a change of entropy affects the total output size.

akupenguin
20th October 2009, 16:38
Let's consider a P-frame with width 64x16 pixels
64x16 MBs?

The number of bits are generated for mb_skip_flag of the first MB is 6-7 bits. After 64 MBs mb_skip_flag is coded with ~1 bit. Thus ~2-3 extra bits are produced in average for coding of mb_skip_flags.
The number of bits generated for mb_skip_flag of the first MB is 5.7 bits. After 64 MBs, mb_skip_flag is coded with 0.027 bits. The total cost of all 64 flags is 46.2 bits, which is an overhead of 44.4 compared to the 1.8 bits it would cost to code 64 flags optimally predicted.
If you code each row as its own slice, then the slice header + NAL encapsulation costs about 80 bits (or more if you use mmco or wpred). And a new slice doesn't init the cabac state to "perfect prediction", it inits to some QP-dependent but otherwise constant value, which for the sake of simplicity I'll assume to be the 50% context. Coding 64 flags starting from there costs 17.1 bits. Total overhead 95 bits.
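
The arithmetic above can be approximated with a small simulation. This is only a sketch under a simplifying assumption: it models the context as a continuous exponential-decay probability estimator rather than the real 64-state CABAC transition tables, so the totals land near, not exactly on, the 46.2 and 17.1 bits quoted in the post.

```python
import math

P_MIN = 0.01875                    # smallest LPS probability in CABAC
ALPHA = (P_MIN / 0.5) ** (1 / 63)  # decay factor of the idealized estimator


def cost_of_skips(p_skip, n=64):
    """Bits spent on n consecutive skip flags, starting from estimate p_skip."""
    costs = []
    for _ in range(n):
        costs.append(-math.log2(p_skip))
        # Move the estimate towards "skip" after seeing one, clamped to CABAC's range.
        p_skip = min(1 - P_MIN, ALPHA * p_skip + (1 - ALPHA))
    return costs


mistuned = cost_of_skips(P_MIN)       # context left over from an all-non-skip row
fresh = cost_of_skips(0.5)            # simplified "new slice" state, as assumed in the post
optimal = 64 * -math.log2(1 - P_MIN)  # every flag coded at the best possible estimate

print(f"first flag: {mistuned[0]:.2f} bits")
print(f"mistuned total: {sum(mistuned):.1f}, fresh total: {sum(fresh):.1f}, optimal: {optimal:.1f}")
```

Adding the roughly 80 bits of slice header to the "fresh" total and comparing against the "mistuned" total reproduces the conclusion: the re-adaptation penalty is smaller than the cost of starting a new slice.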

Shevach
21st October 2009, 10:21
64x16 MBs?

The number of bits generated for mb_skip_flag of the first MB is 5.7 bits.

I see you are familiar with information theory.
Indeed -log2(0.01875) = ~5.7 bits. This is an approximate value, since CABAC is a Q-coder, i.e. the multiplications used to determine the intervals are approximated. Therefore the exact number of bits that CABAC produces for the very first mb_skip_flag is believed to be between 5 and 6 bits. So I was wrong in my estimate of 6-7 bits.
Anyway, your reasoning sounds reasonable, i.e. slice headers can only make the situation worse. Frankly speaking, I was not a proponent of slice headers. I wanted to show that a sharp change of entropy within a picture might produce extra bits.

There is the issue of how to initialize the context models at the start of a slice so that they come as close as possible to the actual entropy of the data within the slice. Two parameters, cabac_init_idc and sliceQP, affect the initial setting of the context models.
One can select the best sliceQP and cabac_init_idc to get the closest approximation to the actual entropy. In that case the number of extra bits generated by CABAC during adaptation is minimal.
For example, let's assume that sliceQP=1 and cabac_init_idc=1 give the best approximation; then the first MB should contain a qp_delta in order to signal the correct QP value to the decoder (because sliceQP has been chosen only for CABAC initialization).
On the other hand, the possible initial settings of the context models are not uniformly dispersed. Therefore for some sources the initial CABAC settings are far from the actual entropy. Unfortunately H.264 does not support custom initialization of context models, so on many video sources CABAC generates a lot of extra bits at the start of slices until it adapts itself.

Dark Shikari
21st October 2009, 10:49
I see you are familiar with Theory of Information.
Perhaps you should look again at who the person you're responding to is ;)

There is an issue how to init context models at the start of a slice in order to maximally approaches to the actual entropy of data within the slice? Two parameters cabac_init_idc and sliceQP affects on initial setting of context models.
Both of which, from my testing, are totally useless.

Shevach
21st October 2009, 11:36
Perhaps you should look again at who the person you're responding to is
I responded to akupenguin.


Both of which, from my testing, are totally useless.

I conjecture that in your tests the slices contain a large number of MBs. In that case the choice of cabac_init_idc is useless, since the number of bits you can save by an optimal choice of cabac_init_idc/sliceQP is negligible against the total slice size.

But in situations where each slice contains, say, 50-100 MBs, optimal initialization of the context models might give a significant bit saving.

Dark Shikari
21st October 2009, 11:40
I responded to akupenguin.
My point stands ;)

I conjecture that in your tests slices contains large number of MBs. Therefore the choice of cabac_init_idc is useless since the number of bits you can save by optimal choice of cabac_init_idc/sliceQP is negligible against total slice size.

But in situations where each slice contains say 50-100 MBs the optimal initiation of context models might give a significant bit-saving.
Adaptive cabac_init_idc was tried years ago and was useless: 0 is practically always the best in all situations.

And I did most of my tests regarding slice QP using CIF videos.

Edit: I take that back. Re-tested it now and got some slightly better results:

517.74 -> 517.45 kbps on soccer CIF
33.68 -> 33.61 kbps on akiyo QCIF

(by optimizing frame QP so that the average of the frame's delta quants is 0, or at least rounds to zero)

Measurable, but hardly worth spending that much time on.

Edit again: maybe not. Foreman CIF:

438.63 -> 439.58 kbps
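
A minimal sketch of the frame-QP tweak described in the first edit above, reading "delta quant" as each macroblock's QP offset from the frame QP (that reading is an assumption, and the QP values are invented for illustration):

```python
def pick_frame_qp(mb_qps):
    """Frame QP chosen so the per-MB offsets from it average out to (about) zero."""
    return round(sum(mb_qps) / len(mb_qps))


mb_qps = [26, 26, 27, 28, 28, 31]  # hypothetical per-macroblock QPs
frame_qp = pick_frame_qp(mb_qps)
deltas = [qp - frame_qp for qp in mb_qps]
print(frame_qp, deltas)
```

Since the frame QP also feeds the CABAC context initialization, shifting it this way changes the starting state of every context, which would explain why the result can go either way depending on the clip.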

Shevach
21st October 2009, 13:18
Adaptive cabac_init_idc was tried years ago and was useless

Dark Shikari,

It is hard to believe that Gary Sullivan and Detlev Marpe (one of the H.264 authors) could adopt the cabac_init_idc syntax element without a rigorous check that the feature is beneficial.
I also ran some experiments with cabac_init_idc, and they revealed bit savings.

Unfortunately I can't find any research on this issue (i.e. the optimal choice of cabac_init_idc) in IEEE papers.
I'll ask Gary Sullivan; maybe there are relevant documents in the JVT archives.

Dark Shikari
21st October 2009, 13:33
Dark Shikari,

It is hard to believe that Gary Sullivan and Detlev Marpe (one of H.264 authors) could adopt the syntax element of cabac_init_idc without rigorous check that this feature is beneficial.
Why's that? H.264 is absolutely dripping with completely useless features.

Shevach
21st October 2009, 13:56
H.264 is absolutely dripping with completely useless features.

So, you are welcome to list useless features of H.264

Dark Shikari
21st October 2009, 14:02
So, you are welcome to list useless features of H.264
Everything in extended profile (which almost nobody has ever used and was only included in the spec for political reasons--this category alone could fill a page)
Everything in Baseline that isn't in Main (same as above)
Constrained intra prediction
The ridiculously unnecessary amount of flexibility with regard to poc type and frame nums (in particular, poc type 1, the golomb-coded poc method, which is totally useless). But the number of LSBs that the spec allows you to code is also stupid since the maximum reorder_frames is 16.
direct-8x8-inference=0 is incredibly pointless and its mere existence complicates decoders. I have never managed to get more than 0.001 dB gain using it, which is why I removed it from x264.
constraint_set2_flag is completely pointless.
Half the SEI messages are completely useless, or at a minimum ridiculously overcomplicated for their intended task (especially all the repeat_count syntax elements that specify how often something will be changed...)
cabac_init_idc (of course)

This is not even listing badly designed features or features which could be made vastly simpler: this is simply a list of things which could be completely excised from the spec and absolutely nothing of value would be lost. If you want me to start listing redundancies in the bitstream and so forth the list could get a lot longer...

Shevach
21st October 2009, 14:17
Dark Shikari,

Why don't you ask the H.264 designers (e.g. Gary Sullivan) under what circumstances, for example, direct-8x8-inference=0 gives a gain (if ever)?
I propose opening a new thread, "useless features in H.264".
Perhaps someone can point to particular circumstances in which, for example, constrained_intra_pred mode = 1 is beneficial.

G_M_C
21st October 2009, 20:54
Sorry for all this. I started this discussion about slices a couple of posts ago, not because of the technical side of it but because I was wondering about its practical use. The technical side of it, and especially the level of discussion now, is way over my head (and I mean WAY over my head, something like Mount Everest is over my head here in the Netherlands, which is in fact below sea level :P).

The reason I asked is that I remembered the tsMuxeR thread, where Roman tried using slices to achieve "uncropping" of cropped encodes. The idea being that you can "uncrop" the encoded video by rewriting the raw video stream, and by adding (modifying) slices above and underneath the encoded video get to Blu-ray specified resolutions. I thought the idea of using slices this way was "nifty".

But Roman's efforts did not succeed, and reading the technical info above gives me the idea that it would never have worked, even now that x264 supports BD-compliant slicing.

Just to let you gurus know why I came to this question/discussion: a simple, useful (read: hopeful) idea.

Dark Shikari
21st October 2009, 22:03
The reason I asked is that I remembered the tsMuxeR thread, where Roman tried using slices to achieve "uncropping" of cropped encodes. The idea being that you can "uncrop" the encoded video by rewriting the raw video stream, and by adding (modifying) slices above and underneath the encoded video get to Blu-ray specified resolutions. I thought the idea of using slices this way was "nifty".

But Roman's efforts did not succeed, and reading the technical info above gives me the idea that it would never have worked, even now that x264 supports BD-compliant slicing.
No, it absolutely would have worked. Of course, the other Blu-ray restrictions are likely to be more of an issue.

akupenguin
21st October 2009, 22:26
No, it absolutely would have worked.
What do you do with mvs that used to point off the frame?

Dark Shikari
21st October 2009, 22:44
What do you do with mvs that used to point off the frame?
... oops. I keep forgetting that AVC doesn't have a way to disable that, and even the relevant SEIs only say that MVs won't point off the frame, not that edge emulation is changed.
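
A toy sketch of why off-frame motion vectors break the "uncropping" trick. It assumes the usual behaviour that reference samples outside the picture are obtained by replicating the nearest edge sample; the tiny frames and sample values are invented for illustration.

```python
def ref_sample(frame, x, y):
    """Fetch a reference sample, replicating edge pixels for off-frame coordinates."""
    height, width = len(frame), len(frame[0])
    return frame[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]


# Tiny "cropped" reference frame: two rows of bright content.
cropped = [[200, 201], [202, 203]]
# The same content "uncropped" by adding one black row (luma 16) above and below.
uncropped = [[16, 16]] + cropped + [[16, 16]]

# A sample in the top content row, predicted with a motion vector pointing one row up.
before = ref_sample(cropped, 0, 0 - 1)    # off-frame: the edge row is replicated
after = ref_sample(uncropped, 0, 1 - 1)   # same vector now lands inside the black bar
print(before, after)
```

The encoder chose that vector because the replicated edge was a good predictor; once the bars are spliced in, the decoder reads black instead and the residual no longer matches, so the picture is corrupted unless the affected blocks are re-encoded.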

G_M_C
22nd October 2009, 08:16
What do you do with mvs that used to point off the frame?

Yup that's exactly what happened in the trial-and-error tool Roman wrote.

Shevach
22nd October 2009, 10:18
Dark Shikari,
I'll ask Gary Sullivan, maybe in JVT archives there are relevant documents.

Gary Sullivan's answer on the usefulness of cabac_init_idc is:

The usefulness of the feature is closely related to slice size. When slices are large, the benefit of custom initialization will be negligible, because the coder will adapt reasonably quickly on its own. When slices are small, there will be a benefit, although typically a relatively small one. The basic idea is just to help minimize the penalty of using small slice sizes. I believe the original proposal was JVT-D020. It was proposed again in JVT-E154 with a small refinement in JVT-F039.

At that time, we may have thought that small slices would be used more commonly than they seem to be today. In MPEG-2 video, there's at least one slice per row of macroblocks. My impression is that more recent common practice has become to use relatively few slices (sometimes just one slice per picture).

Note that the method of selecting the initialization value is outside the scope of the standard. The method used by the proposer was very simple (which was to use the value that corresponds best with the result of the table adaptation for the preceding slice). I believe the benefit reported by using that method with standard-definition video with two macroblock rows per slice was about an average of 2%. That may not seem like so much, but we wanted CABAC to retain a substantial compression benefit over CAVLC even when small slices were used, and with smaller slices such as one slice per macroblock row (or a more exhaustive encoder selection of the initialization value) the benefit would presumably be greater.
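
The selection method described in the answer ("use the value that corresponds best with the result of the table adaptation for the preceding slice") could look roughly like the sketch below. Everything concrete here is hypothetical: the real initialization tables are defined per context as a function of slice QP, the probabilities are invented, and using expected code length as the "corresponds best" measure is my own choice.

```python
import math


def expected_bits(adapted, init):
    """Expected bits per bin, summed over contexts, when data follows `adapted` but coding starts from `init`."""
    total = 0.0
    for p, q in zip(adapted, init):
        total += -p * math.log2(q) - (1 - p) * math.log2(1 - q)
    return total


def pick_cabac_init_idc(adapted, candidates):
    """Return the candidate whose initial probabilities best match the adapted ones."""
    return min(candidates, key=lambda idc: expected_bits(adapted, candidates[idc]))


# Hypothetical initial P(bin=1) for three contexts under each cabac_init_idc value.
candidates = {0: [0.5, 0.4, 0.7], 1: [0.8, 0.2, 0.6], 2: [0.3, 0.6, 0.9]}
# Hypothetical probabilities the contexts had adapted to at the end of the preceding slice.
adapted = [0.85, 0.25, 0.55]

best = pick_cabac_init_idc(adapted, candidates)
print("cabac_init_idc =", best)
```

With only three candidates the search is trivial, which fits the remark that the benefit is small unless slices are short enough for the starting state to matter.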

G_M_C
24th October 2009, 10:14
I just want to add that I would really have liked a feature like that. I even think that it would have been very useful, and I also have the feeling that it would have been used a lot.

I can even imagine that some manufacturers might have used it in their suites. You could give the camera more non-standard resolution options and offer a "fast setting" in your software suites (no transcoding required, only rewriting of the stream and adding slices).