Updated Rule of ^3/4 for H.264 high profile?

benwaggoner · 8th May 2013, 17:33

So, a rule of thumb in compression has long been that changing frame size while maintaining similar subjective quality doesn't entail a linear increase/decrease in bitrate.

The classic rule of thumb has been that bitrate should scale with the ratio of pixel areas raised to the 3/4 power. Thus going from 640x480 at 1 Mbps to 1280x960 would entail 4^0.75 = 2.83 Mbps. Conversely, going down to 320x240 would allow a reduction to 0.25^0.75 = 0.35 Mbps.
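As a quick check of the arithmetic above, here is a minimal Python sketch of the power rule. The function name `scaled_bitrate` and its parameters are illustrative assumptions, not anything from an encoder or from the original post; the default exponent is simply the 3/4 value under discussion.

```python
def scaled_bitrate(base_bitrate, base_size, new_size, exponent=0.75):
    """Scale a bitrate by the pixel-area ratio raised to `exponent`.

    base_size and new_size are (width, height) tuples. The 0.75 default
    is the classic rule-of-thumb value, not a measured constant.
    """
    base_w, base_h = base_size
    new_w, new_h = new_size
    area_ratio = (new_w * new_h) / (base_w * base_h)
    return base_bitrate * area_ratio ** exponent


if __name__ == "__main__":
    # 640x480 at 1 Mbps scaled up to 1280x960 (4x the pixels)
    print(round(scaled_bitrate(1.0, (640, 480), (1280, 960)), 2))  # 2.83
    # 640x480 at 1 Mbps scaled down to 320x240 (1/4 the pixels)
    print(round(scaled_bitrate(1.0, (640, 480), (320, 240)), 2))   # 0.35
```

Passing `exponent=0.71` instead gives the VC-1 figure mentioned later in the thread; at 4x the pixels the two exponents differ by only about 0.15 Mbps per starting Mbps.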

This originally seems to have been calibrated around MPEG-4 pt 2, however. Zambelli did some extensive testing a few years ago and discovered 0.71 was a better value for VC-1. I imagine that in-loop deblocking would tend to reduce the value.

Thus, I expect that the exponent would be even lower for H.264. I was wondering if anyone has come up with their own rule of thumb, experimentally or otherwise. I expect that High Profile is probably lower than Main, which is lower than Baseline.

I expect the exponent would be even lower for HEVC with its huge block sizes and other features to very efficiently encode areas of lower detail.
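For anyone wanting to derive their own exponent experimentally, the rule can be inverted: given two encodes judged to be of similar subjective quality at different frame sizes, the implied exponent is the log of the bitrate ratio over the log of the pixel-area ratio. The sketch below assumes exactly that model; `fitted_exponent` is a hypothetical helper and the sample numbers are made up for illustration, not measurements.

```python
import math


def fitted_exponent(bitrate_a, pixels_a, bitrate_b, pixels_b):
    """Exponent e such that bitrate_b / bitrate_a == (pixels_b / pixels_a) ** e.

    Assumes the two encodes were judged to have similar subjective quality.
    """
    return math.log(bitrate_b / bitrate_a) / math.log(pixels_b / pixels_a)


if __name__ == "__main__":
    # Hypothetical pair: 640x480 at 1.0 Mbps and 1280x960 at 2.68 Mbps
    e = fitted_exponent(1.0, 640 * 480, 2.68, 1280 * 960)
    print(round(e, 2))  # 0.71
```

A single pair of encodes gives only one noisy estimate; fitting across several frame sizes and clips would be the more defensible approach.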

akupenguin · 8th May 2013, 23:49

Do you think the difference between .75 and .71 reflects a difference between MPEG4part2 and VC-1? Or just that the rule of thumb never was that precise?

benwaggoner · 9th May 2013, 01:21

Quote:

Originally Posted by akupenguin

Do you think the difference between .75 and .71 reflects a difference between MPEG4part2 and VC-1? Or just that the rule of thumb never was that precise?

Yeah, that small difference isn't going to matter.

But more broadly, I do think more advanced codecs will need fewer bits per pixel as frame size goes up.

Hmmm. There's probably a similar heuristic for frame rate. It'll be lower since increasing frame rate reduces the motion between frames.

Sharc · 9th May 2013, 10:00

Quote:

Originally Posted by benwaggoner

So, a rule of thumb in compression has long been that changing frame size while maintaining similar subjective quality doesn't entail a linear increase/decrease in bitrate.

The classic rule of thumb has been that bitrate should scale with the ratio of pixel areas raised to the 3/4 power. Thus going from 640x480 at 1 Mbps to 1280x960 would entail 4^0.75 = 2.83 Mbps. Conversely, going down to 320x240 would allow a reduction to 0.25^0.75 = 0.35 Mbps.

If I understand this correctly, this rule for "similar subjective quality" applies only when watching the encoded video at its native resolution.
A different question would be where the crossover resolution for similar subjective quality lies, given a fixed bitrate and a fixed display size (e.g. a TV screen).

Manao · 9th May 2013, 11:26

Quote:

But more broadly, I do think more advanced codecs will need fewer bits per pixel as frame size goes up.

That's particularly true of HEVC, which really shines (against H264) at large resolutions thanks to 64x64 "macroblocks" and better spatial prediction - both for intra and motion vectors.

Since mpeg4p2 doesn't really have any intra prediction, and only a very basic motion prediction, I think most of the explanation for the rule of thumb (instead of the expected linear relationship) lies in the interaction between the 8x8 DCT transform, details & quantization. H264 has good intra prediction, better motion prediction (especially skip), and an adaptive entropy coder, so it can handle large resolutions a lot better. "Sadly", I think low resolution coding was improved even more (4x4 DCT, 4x4 partitions), so the rule may not have changed that much. HEVC somewhat doesn't care about small resolutions (if I'm not mistaken, no more 4x4 inter partitions), and really improved the coding of large ones. There's no doubt in my mind that the exponent will be lower.

Quote:

There's probably a similar heuristic for frame rate.

Yes, and it depends on the codec too. Without hierarchical bframes (which h264 and hevc support), you'll be taking a large bitrate penalty by increasing framerate. Hierarchical bframes are a must because doubling the framerate can be seen as adding a hierarchical layer of non-reference, highly quantized bframes that will be smaller than all the other frames. As for the content itself, inter frame motion gets smaller, but motion blur disappears, and sharper frames are harder to code.

benwaggoner · 10th May 2013, 20:03

Quote:

Originally Posted by Manao

Without hierarchical bframes (h264, hevc), you'll be taking a large bitrate penalty by increasing framerate. Hierarchical bframes are a must because doubling the framerate can be seen as adding a hierarchical layer of non-reference, highly quantized bframes that will be smaller than all the other frames. As for the content itself, inter frame motion gets smaller, but motion blur disappears, and sharper frames are harder to code.

The comparison I was thinking of was the same source reducing the frame size or frame rate, so the motion blur would be constant in the comparison.

For example, comparing the extra bits to encode a 60p source at a full 60p instead of just 30p. If the frame rate exponent was 0.5 (arbitrary choice), that would mean that doubling frame rate would require an increase of 41% in bitrate.
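The 41% figure follows directly from applying the same power rule to the frame rate ratio. A minimal sketch, where `bitrate_increase_pct` is a hypothetical helper and 0.5 is the arbitrary exponent chosen above, not a measured value:

```python
def bitrate_increase_pct(rate_ratio, exponent):
    """Percent bitrate increase when frame rate is multiplied by rate_ratio,
    assuming bitrate scales as rate_ratio ** exponent."""
    return (rate_ratio ** exponent - 1.0) * 100.0


if __name__ == "__main__":
    # 30p -> 60p with the arbitrary exponent of 0.5
    print(round(bitrate_increase_pct(60 / 30, 0.5), 1))  # 41.4
```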

Comparing content shot at 24p with 1/48th of a second shutter versus 48p shot with 1/72nd shutter would require a higher exponent.

The hierarchical B-frame example is interesting, but the differences of higher frame rate wouldn't be just that. For one, we could have maxed out the hierarchy possible at 30p already (I think x264 still only does one reference and one non-reference layer of B-frames). Second, the higher frame rate means that there are twice as many frames to pick between for reference frames, so we'd be able to pick better reference frames some of the time.

We'd probably get longer B-frame chains on average as well, since there will be less new visual information per frame.

Manao · 10th May 2013, 23:08

Quote:

The comparison I was thinking of was the same source reducing the frame size or frame rate, so the motion blur would be constant in the comparison.

Ah, but motion blur must be taken into account. When you reduce frame size, you low-pass filter to avoid aliasing. It's only fair that if you reduce frame rate, you add motion blur, to avoid jerkiness - i.e. temporal aliasing. That said, I agree it's easier to ignore motion blur to compare things that are comparable.

Quote:

For one, we could have maxed out the hierarchy possible at 30p already (I think x264 still only does one reference and one non-reference layer of B-frames)

Yeah, but that's x264. H264 levels for 720p (3.1 & 3.2) set the maximum DPB size to 5 frames, which allows a fully fledged pyramidal structure with 15 Bframes, which is quite a lot already. Iirc, HEVC gives a 6-frames DPB, i.e 31 Bframes with a full pyramid. And even if you've filled out the pyramidal structure, you can still add non reference bframes. But if that happens, those added frames won't be smaller than all the other frames. However, they'll be as small as the smallest B frames already present in the low frame rate video - so quite small already.

I've made a test encoding with x264 + 3B hierarchical, and average bitrate for all bframes ended at half the overall bitrate. With non reference bframes roughly twice as small as reference ones, non reference bframes have an average bitrate of 3/8th of the overall. That roughly means doubling the framerate by adding non reference bframes would increase the bitrate by 37.5%. Adding a layer to the pyramid would probably reduce the increase to 30%.
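The 3/8 figure can be reproduced with a little algebra. The sketch below rests on two assumptions that are not spelled out in the post: that "3B hierarchical" means each run of three B-frames has one reference B-frame and two non-reference ones, and that "average bitrate" means average frame size relative to the mean size of all frames. The helper name `nonref_b_share` is hypothetical.

```python
from fractions import Fraction


def nonref_b_share(avg_b_share, ref_to_nonref_ratio, ref_count, nonref_count):
    """Size of a non-reference B-frame relative to the mean frame size.

    Solves (ref_count * ratio * x + nonref_count * x) / total == avg_b_share
    for x, where x is the non-reference B-frame size.
    """
    total = ref_count + nonref_count
    return avg_b_share * total / (ref_count * ref_to_nonref_ratio + nonref_count)


if __name__ == "__main__":
    # B-frames average half the mean frame size; reference B-frames are
    # twice the size of non-reference ones; 1 reference + 2 non-reference.
    x = nonref_b_share(Fraction(1, 2), 2, ref_count=1, nonref_count=2)
    print(x)  # 3/8
    # Doubling the frame count with frames of that size adds x times the
    # original bitrate.
    print(float(x) * 100)  # 37.5
```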

Quote:

Second, the higher frame rate means that there are twice as many frames to pick between for reference frames, so we'd be able to pick better reference frames some of the time.

That would assume you're adding reference frames. If you do, then what you say is correct. But you'll reduce bitrate further if you add non reference frames instead.

benwaggoner · 13th May 2013, 18:50

Quote:

Originally Posted by Manao

Ah, but motion blur must be taken into account. When you reduce frame size, you low-pass filter to avoid aliasing. It's only fair that if you reduce frame rate, you add motion blur, to avoid jerkiness - i.e. temporal aliasing. That said, I agree it's easier to ignore motion blur to compare things that are comparable.

Synthesizing motion blur is one model, but is very rarely done in practice. It's quite computationally expensive, among other things. It might be "fair" but it isn't how things are done today.

For example, The Hobbit was shot at 48p with 1/72nd exposure time. That's the average of the 1/48th you'd normally have with 24p and the 1/96th you'd expect with 48p.

I think the 48p version would have looked a lot better with a 1/96th shutter, and that was probably one of the reasons it looked so weird to so many customers.

Quote:

Yeah, but that's x264. H264 levels for 720p (3.1 & 3.2) set the maximum DPB size to 5 frames, which allows a fully fledged pyramidal structure with 15 Bframes, which is quite a lot already. Iirc, HEVC gives a 6-frames DPB, i.e 31 Bframes with a full pyramid. And even if you've filled out the pyramidal structure, you can still add non reference bframes. But if that happens, those added frames won't be smaller than all the other frames. However, they'll be as small as the smallest B frames already present in the low frame rate video - so quite small already.

Yes, I agree with all that.

Quote:

I've made a test encoding with x264 + 3B hierarchical, and average bitrate for all bframes ended at half the overall bitrate. With non reference bframes roughly twice as small as reference ones, non reference bframes have an average bitrate of 3/8th of the overall. That roughly means doubling the framerate by adding non reference bframes would increase the bitrate by 37.5%. Adding a layer to the pyramid would probably reduce the increase to 30%.

Fair enough. Of course, those gains are only possible if the lower frame rate encode maxed out at 7-8 B-frames. Which is certainly plausible. But low-motion content might have already used 15 B-frames at the original frame rate. I'd still expect significant per-frame bitrate savings in that case.

Quote:

That would assume you're adding reference frames. If you do, then what you say is correct. But you'll reduce bitrate further if you add non reference frames instead.

You can't add to the max number of reference frames, but having more frames to choose between would allow for somewhat better matches. In LFR, choosing between frames 1, 3, and 5 could exclude a better reference frame at 2, 4 or 6.

ChiDragon · 14th May 2013, 06:39

Before release people here were claiming that The Hobbit would be shot at 1/48 despite the 48 fps. Didn't know it turned out otherwise.

benwaggoner · 14th May 2013, 20:32

Quote:

Originally Posted by ChiDragon

Before release people here were claiming that The Hobbit would be shot at 1/48 despite the 48 fps. Didn't know it turned out otherwise.

I was wrong; it's actually 1/64th.

filmmakermagazine.com/60811-the-hobbit-arrives

Hyral · 18th March 2014, 19:02

Hello, I am developing my Master's dissertation on video quality for adaptive systems. I am very interested in exactly how the ^3/4 principle was developed in the first place and might be able to update it to H.264/x264 at 1080p. I understand this refers to subjective quality. Was this calibration done by a single individual's subjective perception, or were MOS tests conducted, or even a combination of these with objective metrics such as SSIM?
