Variable resolution codec? [Archive]

otfuttr

4th September 2023, 20:09

Codecs have evolved to be able to vary the bitrate, and frame rate... why not also vary the resolution?

Netflix has been doing this for a while now with convex hull encoding: https://netflixtechblog.com/per-title-encode-optimization-7e99442b62a2 They encode each video normally at multiple resolutions, and their custom solution then switches between those encodes, choosing the optimal resolution at a given bitrate. It's not built into the codec itself, though. On the web, you can use HLS or DASH, but afaik, there is no codec or container that has native support for this kind of behavior. Considering the benefits have been known for a while now, isn't it time to start adopting this into codecs?

I know AV1 has a primitive form of this called super resolution, but it is limited to 2x downsampling and it's not really enabled by default like a standard feature. I'm talking about fully arbitrary scaling, from 240p all the way up to 4K, plus everything in between, as a core feature. A new paradigm for codecs, putting the days of picking a resolution in the past. You just choose a quality level, and the encoder figures out the best resolution+bitrate pair on the convex hull, not just for the video, but for each scene and even individual frames.
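
For readers unfamiliar with the term, here is a minimal sketch of what "picking a point on the convex hull" can look like. It is not Netflix's implementation: the trial numbers in TRIALS are made up for illustration, the names (Encode, convex_hull, cheapest_for_quality) are hypothetical, and the quality score is assumed to be any metric where higher is better.

```python
from typing import NamedTuple, Optional


class Encode(NamedTuple):
    height: int      # vertical resolution of this trial encode
    kbps: float      # measured average bitrate
    quality: float   # any score where higher is better (VMAF, PSNR, ...)


# Hypothetical trial encodes of one title; the numbers are invented.
TRIALS = [
    Encode(480, 500, 60), Encode(480, 1000, 70), Encode(480, 2000, 73),
    Encode(720, 1000, 66), Encode(720, 2000, 75), Encode(720, 4000, 86),
    Encode(1080, 2000, 72), Encode(1080, 4000, 92), Encode(1080, 8000, 96),
]


def pareto_front(encodes: list[Encode]) -> list[Encode]:
    """Drop encodes that another encode matches or beats on bitrate and quality."""
    front: list[Encode] = []
    best = float("-inf")
    # Ascending bitrate; at equal bitrate the higher quality comes first.
    for e in sorted(encodes, key=lambda e: (e.kbps, -e.quality)):
        if e.quality > best:
            front.append(e)
            best = e.quality
    return front


def convex_hull(encodes: list[Encode]) -> list[Encode]:
    """Upper convex hull of the (bitrate, quality) points, by ascending bitrate."""
    hull: list[Encode] = []
    for p in pareto_front(encodes):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # cross >= 0 means b lies on or below the line from a to p.
            cross = ((b.kbps - a.kbps) * (p.quality - a.quality)
                     - (b.quality - a.quality) * (p.kbps - a.kbps))
            if cross < 0:
                break
            hull.pop()
        hull.append(p)
    return hull


def cheapest_for_quality(hull: list[Encode], target: float) -> Optional[Encode]:
    """Lowest-bitrate hull point that reaches the target quality, if any."""
    for e in hull:  # hull is sorted by ascending bitrate
        if e.quality >= target:
            return e
    return None


if __name__ == "__main__":
    hull = convex_hull(TRIALS)
    for e in hull:
        print(f"{e.height}p @ {e.kbps} kbps -> quality {e.quality}")
    print("pick for quality 90:", cheapest_for_quality(hull, 90))
```

A per-scene or per-frame version of the idea would run the same selection on each segment instead of on the whole title; the cost is the number of trial encodes needed to populate the points.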

Has anyone experimented with something like this? If not, why? Are gains minimal/not worth the effort? Too computationally demanding? Are there patent issues? Thoughts?

nevcairiel

4th September 2023, 22:25

The benefit from such an idea is not quite as large as it may seem, because you can just drop some detail or encode with larger blocks, and most of the benefit from reducing the resolution disappears. It's not like modern codecs literally encode each and every pixel on its own.

rwill

5th September 2023, 04:52

This is spatial scalability, more or less. Most video standards since MPEG-2 have supported such a thing.

It somehow never took off for MPEG-2, AVC and HEVC. Maybe because the coding efficiency losses are too high, or because there was no demand from the market. We will see how it goes with VVC.

birdie

5th September 2023, 09:09

Almost all online video delivery websites already do that. You don't need a new codec for that, any existing one will work. Depending on the bandwidth you're getting the best resolution for it.

While this is fine for such websites since they have massive storage and money, this will not work for most end users who have limited storage and it doesn't make a lot of sense either: CRF and target bitrate already take care of that.

benwaggoner

5th September 2023, 18:55

Almost all online video delivery websites already do that. You don't need a new codec for that, any existing one will work. Depending on the bandwidth you're getting the best resolution for it.
Because sometimes you can do 1080p anime at 1 Mbps, but 8 Mbps isn't enough for an intense action scene. The idea is that varying frame size will allow perceptual quality to be more constant at a given bitrate.

While this is fine for such websites since they have massive storage and money, this will not work for most end users who have limited storage and it doesn't make a lot of sense either: CRF and target bitrate already take care of that.
The same idea can be helpful for small sites as well. If the limitation is a particular file size/bitrate, figuring out which frame size gives the best bang for the bit can help out a lot. Particularly if there is a lot of variation in content complexity.

otfuttr

6th September 2023, 02:07

you can just drop some detail or encode with larger blocks

Good point. You can kind of think of blocks as intra-frame variable resolution, so increasing block size seems equivalent to reducing resolution, but you'd have to increase block size a ton. E.g. going from 480p to 2160p would require a 4.5x increase in block size (per dimension) to achieve equivalent coverage of the frame. Might cause performance issues.
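
The 4.5x figure can be checked with a few lines of arithmetic. This is only a sketch: it assumes the same aspect ratio at both resolutions, treats "coverage" as the fraction of the frame height a block spans, and uses the commonly cited maximum block sizes (16x16 macroblocks for H.264, 64x64 CTUs for HEVC, 128x128 for AV1 and VVC). The function names are hypothetical.

```python
from fractions import Fraction

# Largest coding block edge per codec, in luma samples.
MAX_BLOCK = {"H.264": 16, "HEVC": 64, "AV1": 128, "VVC": 128}


def equivalent_block(block: int, low_height: int, high_height: int) -> Fraction:
    """Block edge needed at high_height to cover the same share of the frame
    that a block of edge `block` covers at low_height."""
    return block * Fraction(high_height, low_height)


def fits(codec: str, block: int, low_height: int, high_height: int) -> bool:
    """True if the equivalent block edge does not exceed the codec's maximum."""
    return equivalent_block(block, low_height, high_height) <= MAX_BLOCK[codec]


if __name__ == "__main__":
    scale = Fraction(2160, 480)
    print("linear scale:", float(scale))        # per-dimension factor
    print("area scale:", float(scale * scale))  # factor in pixels per block
    for codec in MAX_BLOCK:
        for block in (16, 64):
            edge = float(equivalent_block(block, 480, 2160))
            print(f"{codec}: {block}x{block} at 480p ~ {edge:g}x{edge:g} at 2160p,"
                  f" fits: {fits(codec, block, 480, 2160)}")
```

So a 16x16 block at 480p corresponds to roughly 72x72 at 2160p, which AV1 or VVC could cover with one block, while the equivalent of a 64x64 block would be 288x288 and exceeds every listed maximum.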

Selur

6th September 2023, 13:14

why not also vary the resolution?
Don't vpx and av1 encoders support spatial resampling? (At least vpxenc and aomenc should support this.)
Spatial resampling involves scaling the image down to a smaller size in the encoder (as an alternative method for reducing the number of bits per frame to increasing the quantizer) and then scaling it back up in the decoder. Note that frames can be dropped at any time but the encoder can only change its spatial re-sampling ratio on a key frame.
see for example: https://www.webmproject.org/docs/encoder-parameters/

benwaggoner

6th September 2023, 20:11

Don't vpx and av1 encoders support spatial resampling? (At least the ones vpxenc and aomenc should support this,..)

see for example: https://www.webmproject.org/docs/encoder-parameters/
Yeah. It allows resolution adaptability at the stream level instead of at the player heuristics level, at the cost of some flexibility.

Basically, if there's a really hard sequence that would look bad at the current bitrate and resolution, scaling down acts as a sort of "emergency blur" to quadruple bits per pixel, and so push QPs way down.
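
The "quadruple bits per pixel" arithmetic, as a rough sketch. It only models average bits per pixel at a fixed bitrate; how far QP actually drops depends on the encoder and the content, which this does not try to predict. The function names and the example bitrate are hypothetical.

```python
def bits_per_pixel(kbps: float, width: int, height: int, fps: float) -> float:
    """Average bits available per coded pixel at a given bitrate."""
    return kbps * 1000.0 / (width * height * fps)


def bpp_gain(scale: float) -> float:
    """Factor by which bits per pixel grow when each dimension is scaled by
    `scale` (for example 0.5 for half width and half height)."""
    return 1.0 / (scale * scale)


if __name__ == "__main__":
    full = bits_per_pixel(3000, 1920, 1080, 24)
    half = bits_per_pixel(3000, 960, 540, 24)
    print(f"1080p: {full:.4f} bpp, 540p: {half:.4f} bpp, ratio {half / full:.2f}")
    print(f"75% per dimension: {bpp_gain(0.75):.2f}x bits per pixel")
```

Halving both dimensions gives 4x the bits per pixel; a milder 75% scale per dimension gives about 1.78x.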

This could be very powerful combined with Film Grain Synthesis, as the grain itself would still be rendered at full resolution. There are plenty of 35mm films, particularly Super35, which are essentially 720p actual detail with a 4K film grain layer on top anyway ;).

Alas, real-world FGS implementations have shipped with bugs that have prevented broad use of FGS with AV1. Perhaps with AV2, or as a generalized FGS filter that could be triggered by metadata for any codec.

Blue_MiSfit

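6th September 2023, 23:32

This is making me think of LCEVC - where you'd often encode at 1/4 or 1/2 resolution to keep the QP low, and then let their enhancement layer reconstruct high frequencies during upscaling. HE-AAC's SBR for video, basically ;)
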
benwaggoner

7th September 2023, 18:29

This is making me think of LCEVC - where you'd often encode at 1/4 or 1/2 resolution to keep the QP low, and then let their enhancement layer reconstruct high frequencies during upscaling. HE-AAC's SBR for video, basically ;)
Yeah, it's the same sort of thing, another kind of out-of-loop post processing.

There could be some value in having the scaling be in-loop, so a frame could be predicted from a scaled frame of a different resolution.

otfuttr

8th September 2023, 01:36

Don't vpx and av1 encoders support spatial resampling? (At least the ones vpxenc and aomenc should support this,..)

see for example: https://www.webmproject.org/docs/encoder-parameters/

I was aware of AV1 support, but I was not aware vpx supported this. Is it fully arbitrary scaling, or only a limited range? I can't find any details, or even a mention of it, in the spec here: https://storage.googleapis.com/downloads.webmproject.org/docs/vp9/vp9-bitstream-specification-v0.6-20160331-draft.pdf

AV1 is limited to 2x downsampling, but there are a lot of steps in between, allowing integer ratios from 8/9 down to 8/16. Source: https://gitlab.com/AOMediaCodec/SVT-AV1/-/blob/master/Docs/Appendix-Super-Resolution.md
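
To make the 8/9 to 8/16 range concrete, here is a small sketch that lists the coded width for each allowed denominator. The rounding follows my reading of the AV1 specification (numerator fixed at 8, round to nearest); treat that and the helper names as assumptions. Also note that, as far as I understand it, AV1 super resolution scales the width only, so the height stays unchanged.

```python
SUPERRES_NUM = 8         # fixed numerator of the scaling ratio
SUPERRES_DENOM_MIN = 9   # 8/9, the mildest downscale
SUPERRES_DENOM_MAX = 16  # 8/16, i.e. 2x downscale


def superres_coded_width(upscaled_width: int, denom: int) -> int:
    """Coded (downscaled) width for a given output width and denominator."""
    if not SUPERRES_DENOM_MIN <= denom <= SUPERRES_DENOM_MAX:
        raise ValueError("denominator must be in 9..16")
    # Integer division with rounding to nearest.
    return (upscaled_width * SUPERRES_NUM + denom // 2) // denom


def superres_table(upscaled_width: int) -> list[tuple[int, int]]:
    """(denominator, coded width) for every allowed denominator."""
    return [(d, superres_coded_width(upscaled_width, d))
            for d in range(SUPERRES_DENOM_MIN, SUPERRES_DENOM_MAX + 1)]


if __name__ == "__main__":
    for denom, width in superres_table(1920):
        print(f"8/{denom}: coded width {width} (output width 1920)")
```

That gives eight discrete steps for a 1920-wide frame, from 1707 down to 960 pixels, rather than fully arbitrary scaling.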

otfuttr

8th September 2023, 02:19

There could be some value in having the scaling be in-loop, so a frame could be predicted from a scaled frame of a different resolution.

Yea, I was thinking of exactly something like this.

For example, in H.264 you normally have B-frames encoded at much higher QP than I-frames. For sufficiently high QP, you may be removing most of the high frequency detail anyways, so you are effectively decreasing the resolution. But say you effectively decrease resolution by 2x: since max macroblock size is 16x16, you're effectively limiting yourself to 8x8 macroblocks, and so you enter a sub-optimal region on the convex hull.

So what if, instead of increasing QP a lot, you reference an I-frame encoded at full resolution and encode the B-frame at 75% resolution with a lower QP? If you find the optimal point on the convex hull for all B-frames, could there be significant efficiency gains?

benwaggoner

8th September 2023, 05:59

Yea, I was thinking of exactly something like this.

For example, in H.264 you normally have B-frames encoded at much higher QP than I-frames. For sufficiently high QP, you may be removing most of the high frequency detail anyways, so you are effectively decreasing the resolution. But say you effectively decrease resolution by 2x: since max macroblock size is 16x16, you're effectively limiting yourself to 8x8 macroblocks, and so you enter a sub-optimal region on the convex hull.

So what if, instead of increasing QP a lot, you reference an I-frame encoded at full resolution and encode the B-frame at 75% resolution with a lower QP? If you find the optimal point on the convex hull for all B-frames, could there be significant efficiency gains?
Yeah, something like that could be interesting.

It would have been a slam-dunk feature for something like MPEG-2 or VC-1, which had really steep quality drop-offs as QP got too high.

However, the trend in codecs is better and better in-loop prediction for error concealment. In most cases, HEVC and AV1 aren't going to look much worse at 1080p than at a lower resolution at the same bitrate. Since high QPs don't produce sharp block edges, but get kinda soft, it's not all that different from what lowering resolution gives you.

VVC does even better in this regard, with much more natural looking motion edges with high QP predicted TUs. So I think for the most part we're solving the problem in other ways.

The key thing for a streaming service is to know that you can use higher resolutions at a given bitrate for some classes of content. Going smaller isn't nearly as valuable.

Wiabol

12th September 2023, 08:02

https://bitmovin.com/vvc-open-gop-resolution-switching/

benwaggoner

12th September 2023, 19:04

Awesome! I'll make sure to ask more about this in my meeting with Bitmovin at IBC.

otfuttr

14th September 2023, 22:36

https://bitmovin.com/vvc-open-gop-resolution-switching/

Interesting. Didn't realize VVC already had this. Will need to look into the details some more.