View Full Version : CRF use for content aware VOD streaming?


excellentswordfight
22nd April 2020, 17:08
Hi,

So there are a lot of discussions about content-aware/per-title streaming for VOD nowadays, where different methods are used to select which bitrate is appropriate for streaming.

Old, but yeah, basically this:
https://netflixtechblog.com/per-title-encode-optimization-7e99442b62a2

But it seems like a lot of streaming services (I guess at least most of the smaller local ones) still just use fixed bitrates for the ladder, because I guess they don't have the resources to implement complex workflows.

So I've been wondering, why isn't CRF used (or is it)? Can't it be used together with VBV restrictions as a crude way of doing this?

I'm a bit out of my realm of expertise here, but let's say you have a 1 Mbps connection target: why not set a CRF value that lands close to 500 Kbps average bitrate for your average-complexity material, then set --vbv-maxrate 750 --vbv-bufsize 1000? Am I missing something, or shouldn't this be a fairly good (or at least easy) way of doing it compared to fixed bitrates?

So a ladder would look something like this:

540p24 ~500Kbps (1Mbps safe) --crf 26 --vbv-maxrate 750 --vbv-bufsize 1000

720p24 ~1250Kbps (2Mbps safe) --crf 24 --vbv-maxrate 1750 --vbv-bufsize 2500

1080p24 ~3Mbps (5Mbps safe) --crf 22 --vbv-maxrate 4500 --vbv-bufsize 6000

1080p24 ~6Mbps (10Mbps safe) --crf 19 --vbv-maxrate 9000 --vbv-bufsize 12000
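
A minimal sketch of how the ladder above could be turned into x264 rate-control options. The flags (--crf, --vbv-maxrate, --vbv-bufsize) and the numbers are the ones from the post; the rung labels "1080p24-mid" and "1080p24-high" and the helper names are made up here for illustration. Scaling to each resolution is left out.

```python
# Ladder from the post: (label, CRF, VBV maxrate in kbps, VBV buffer size in kbit).
# The "-mid"/"-high" suffixes are hypothetical labels to tell the two 1080p rungs apart.
LADDER = [
    ("540p24", 26, 750, 1000),
    ("720p24", 24, 1750, 2500),
    ("1080p24-mid", 22, 4500, 6000),
    ("1080p24-high", 19, 9000, 12000),
]


def rate_control_args(crf, maxrate_kbps, bufsize_kbit):
    """Return the x264 CLI rate-control options for one capped-CRF rung."""
    return [
        "--crf", str(crf),
        "--vbv-maxrate", str(maxrate_kbps),
        "--vbv-bufsize", str(bufsize_kbit),
    ]


def buffer_seconds(maxrate_kbps, bufsize_kbit):
    """Seconds of video at the peak rate that fit in the VBV buffer."""
    return bufsize_kbit / maxrate_kbps


for label, crf, maxrate, bufsize in LADDER:
    args = " ".join(rate_control_args(crf, maxrate, bufsize))
    secs = buffer_seconds(maxrate, bufsize)
    print(f"{label}: {args}  (buffer holds ~{secs:.2f} s at peak)")
```

Every rung ends up with a buffer of roughly 1.3 to 1.4 seconds at the peak rate, so the cap behaves consistently across the ladder while CRF decides how much of that headroom a given title actually uses.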

So, my question is basically: is this a good or a poor way of doing it? If good, is this already in widespread use, and if not, why?

Blue_MiSfit
22nd April 2020, 19:13
This is a reasonable way of doing things. Some providers do exactly this.

Beamr did an interesting review of the landscape and found several content adaptation methods:

http://media2.beamrvideo.com/pdf/Beamr_Content_Adaptive_Tech_Guide.pdf

Beamr's tech is cool. They do frame-by-frame optimization in a loop with the encoder: they calculate a lot of metrics for an encoded frame, then re-encode the frame with fewer and fewer bits until they reach the lowest data rate that still maintains the metric score. It's more complex than that, but it's a tight integration. I've looked at their stuff and it's quite good. It's not cheap though :)

Netflix does extensive content optimization (the per scene convex hull approach).

benwaggoner
22nd April 2020, 19:39
Yeah, using CRF can definitely be a useful tool in this regard. One problem is that some older player heuristics struggle with the variable bitrate, especially if they don't make an internal distinction between peak bitrate and actual bitrate. So you'll see this sort of technique used more by companies that either focus only on web or recent devices, or that are able to deliver their own player with custom heuristics. If one is trying to work with the built-in DASH client in older living room devices, things get fraught quickly.

benwaggoner
22nd April 2020, 20:15
Looks like someone at Amazon patented some related ideas :sly:. http://pat2pdf.org/patents/pat9712860.pdf

Blue_MiSfit
22nd April 2020, 20:25
Makes sense. Smart TVs in general (especially older ones) are _ATROCIOUS_ platforms to develop for, especially when you have to live in the sandbox environment they give app developers. Netflix always gets special permission to run their own binary. Nobody else gets that level of integration, so it limits what's possible.

benwaggoner
24th April 2020, 00:01
I wouldn't say nobody else. I'd expect it is available to the big five premium content services. They build players in a lot of different ways targeting different levels of abstraction.

Blue_MiSfit
24th April 2020, 00:25
That's true. I guess, having worked for one of the big studios and still running into these limitations, I assumed that would be the case for other platforms as well ;)

benwaggoner
24th April 2020, 16:36
Other than Disney+, the big studios' official platforms don't have a significant market share. Here's a recent breakdown: https://www.mediapost.com/publications/article/349306/nielsen-streamings-share-of-tv-time-leaps-to-23.html?utm_source=factoftheday1

That's because only Disney has core brands customers are familiar with (Disney, Marvel, Pixar). People just don't know which shows and movies would be on Peacock versus CBS All Access.

Building and supporting premium-experience video players for all the devices people want to watch on (so many generations of OS software running on so many different SoCs!), plus the content for those (different codecs, packaging, and DRM systems), takes a 1000+ person engineering org by my estimate.

TEB
26th April 2020, 22:27
We are actually doing this as we speak.
We don't need to go as far as Netflix/YouTube are going, so we went the CRF + CAP way with a fixed ladder. It gives quite good results, as we are only allowing newer devices on native players.

benwaggoner
27th April 2020, 23:20
By CAP do you mean capped VBV?

It is a delicious luxury to only have to worry about newer devices! Of course, the longer a service lasts, the more today's bright and shiny will eventually become tomorrow's legacy deadweight holding you back :sly:.

TEB
28th April 2020, 09:39
X264:
-crf 20 -maxrate 15M -bufsize 15M for the top profile

TEB
28th April 2020, 09:48
So for us, we (ISP and Operator) finally got to the point that we had the opportunity to move from a classic static CBR ladder to something better.

We had to decide on how far we wanted to take this into the constant quality domain and we ended up with the conclusion that we wanted to do something between what we have today and what Netflix is doing!

We did a lot of tests of 2-pass VBR vs "free" CRF vs capped CRF vs CBR, on x264, x265 and commercial encoders like Titan File.

We ended up with a static ladder based on CRF with a bitrate CAP, which gives us a lot of what Netflix is doing, but without going all the way with regard to scene-based encoding, title tuning, multiple pre-passes to find the sweet spot, etc.

To give an example:
Our top HQ 1080p profile/rung is basically CRF19 @ 15 Mbps CAP with a fallback of CRF22 @ 8 Mbps CAP, etc.

benwaggoner
28th April 2020, 18:56
X264:
-crf 20 -maxrate 15M -bufsize 15M for the top profile
Oh, if that was the only hard part! But what about the rare cases where the first NAL unit is larger than the following, which is allowed by the spec, but not by some SoC decoders? Or the many, many permutations of DRM encryption that can be fussy with some devices. Or Qualcomm DRM carveouts where you need to allocate on boot enough memory for the maximum resolution & max reference frames. Note that it's the worst case of both; supporting 320x180 with 6 reference frames and 3840x2160 with 3 reference frames requires 3840x2160x6 memory carveout.
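
To make the "worst case of both" point concrete, here is a small sketch of the arithmetic using the two streams from the example above. The function names are invented for illustration, and the result is counted in pixels of frame storage rather than bytes, since bytes per pixel depends on bit depth and chroma format, which the post does not specify.

```python
# (width, height, reference frames) for the two streams in the example.
STREAMS = [
    (320, 180, 6),
    (3840, 2160, 3),
]


def carveout_pixels(streams):
    """Worst-case frame storage: the largest frame area times the largest
    reference count, each taken independently across all streams."""
    max_area = max(w * h for w, h, _ in streams)
    max_refs = max(refs for _, _, refs in streams)
    return max_area * max_refs


def largest_real_need_pixels(streams):
    """What the most demanding single stream would actually use."""
    return max(w * h * refs for w, h, refs in streams)


print(carveout_pixels(STREAMS))           # 49766400, i.e. 3840x2160x6
print(largest_real_need_pixels(STREAMS))  # 24883200, i.e. 3840x2160x3
```

In this example the carveout is twice what the most demanding stream needs, purely because a tiny 320x180 rung was allowed 6 reference frames.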

And then there are all the heuristics that treat a 15 Mbps peak rate as if every fragment of that stream index really were 15 Mbps, so the player won't switch up to it even when that top stream is actually running at 3 Mbps and there's 12 Mbps of bandwidth.
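
A toy sketch of that heuristic problem, not a model of any real player. The top rung uses the numbers from this post (15 Mbps peak, 3 Mbps actual, 12 Mbps of bandwidth); the fallback rung's 8 Mbps cap comes from the ladder described earlier in the thread, while its 2 Mbps average is a made-up value. Safety margins and buffer state are ignored.

```python
# Each rung: declared peak (the VBV cap) and measured average bitrate, in kbps.
RUNGS = [
    {"name": "fallback", "peak_kbps": 8000, "avg_kbps": 2000},  # average is hypothetical
    {"name": "top", "peak_kbps": 15000, "avg_kbps": 3000},      # figures from the post
]


def pick_rung(rungs, bandwidth_kbps, use_peak):
    """Return the name of the highest rung whose rate fits the bandwidth.

    use_peak=True mimics a heuristic that only looks at the declared peak;
    use_peak=False mimics one that looks at the actual (average) bitrate.
    """
    key = "peak_kbps" if use_peak else "avg_kbps"
    chosen = None
    for rung in sorted(rungs, key=lambda r: r["peak_kbps"]):
        if rung[key] <= bandwidth_kbps:
            chosen = rung["name"]
    return chosen


print(pick_rung(RUNGS, 12000, use_peak=True))   # fallback
print(pick_rung(RUNGS, 12000, use_peak=False))  # top
```

With 12 Mbps available, the peak-only heuristic stays on the fallback rung even though the top rung would fit four times over at its actual bitrate.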

benwaggoner
28th April 2020, 18:58
So for us, we (ISP and Operator) finally got to the point that we had the opportunity to move from a classic static CBR ladder to something better.

We had to decide on how far we wanted to take this into the constant quality domain and we ended up with the conclusion that we wanted to do something between what we have today and what Netflix is doing!
What's the stuff Netflix is doing that you aren't?

We ended up with a static ladder based on CRF with a bitrate CAP, which gives us a lot of what Netflix is doing, but without going all the way with regard to scene-based encoding, title tuning, multiple pre-passes to find the sweet spot, etc.

To give an example:
Our top HQ 1080p profile/rung is basically CRF19 @ 15 Mbps CAP with a fallback of CRF22 @ 8 Mbps CAP, etc.
Do you have documentation on what Netflix is doing with Capped VBR (CVBR)?

Note that CRF itself IS scene-based encoding. There are other flavors of it, but the most basic is not to spend bits where they aren't needed.

excellentswordfight
29th April 2020, 10:06
Oh, if that was the only hard part! But what about the rare cases where the first NAL unit is larger than the following, which is allowed by the spec, but not by some SoC decoders? Or the many, many permutations of DRM encryption that can be fussy with some devices. Or Qualcomm DRM carveouts where you need to allocate on boot enough memory for the maximum resolution & max reference frames. Note that it's the worst case of both; supporting 320x180 with 6 reference frames and 3840x2160 with 3 reference frames requires 3840x2160x6 memory carveout.

I don't really follow how this can be an issue. Isn't the decoder validated for levels and profiles? If a decoder is specified to handle Main10, Main tier, level 4, how could it not handle MaxDpbSize and VBV limits within the spec? As long as the buffer is no higher than the maximum maxrate of that level/tier, and it can be fed data at that rate, shouldn't it be all good? Or is this specifically an issue for some DRM solutions?

And then there are all the heuristics that treat a 15 Mbps peak rate as if every fragment of that stream index really were 15 Mbps, so the player won't switch up to it even when that top stream is actually running at 3 Mbps and there's 12 Mbps of bandwidth.

Mind expanding on this? Why would it use the peak rate as the basis for selection this way? Shouldn't the decision be made on buffer status and chunk sizes? Would that really be an issue for ABR with MPEG-DASH and HLS?

Blue_MiSfit
29th April 2020, 20:03
In my experience, the DASH clients on older devices are insane. If those devices are important enough, then you need to make playback work well on them. It kind of doesn't matter what the problem really is.

This can ALSO be an issue with poorly implemented DRM, btw. I've definitely seen certain popular settop boxes blow up when fed streams that were 100% valid within the constraints of the Main10 profile @ level 5 (both vbv limits and MaxDpbSize). Reducing the reference frame count was the only solution. This was acknowledged as a firmware bug but I don't know if it's ever been fixed ;)

TEB
30th April 2020, 09:37
What's the stuff Netflix is doing that you aren't?
Do you have documentation on what Netflix is doing with Capped VBR (CVBR)?
Note that CRF itself IS scene-based encoding. There are other flavors of it, but the most basic is not to spend bits where they aren't needed.

I'm merely pointing at what I have read in their excellent blogs, plus general knowledge of this market (I may be totally wrong in my assumptions, but that's why we are all here: to learn ;)):
1. Using some kind of ML process to detect and separate the scenes.
2. Encoding each scene separately with different encoding parameters (sweet spot), maybe different GOP lengths also?
3. Using loops of CRF tests against a VMAF threshold to find the baseline for encoding with static/capped VBR.
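
As a rough illustration of point 3 only, here is one way a "loop of CRF tests against a quality threshold" could be structured. This is a guess at the general shape, not a description of Netflix's pipeline. The `measure_quality` callback is hypothetical: in practice it would encode at the given CRF and score the result with VMAF, whereas here a synthetic stand-in keeps the sketch runnable.

```python
def highest_crf_meeting_target(measure_quality, target, crf_min=10, crf_max=40):
    """Binary search for the largest integer CRF whose score still meets the target.

    Assumes the score never increases as CRF increases.
    Returns None if even crf_min misses the target.
    """
    best = None
    lo, hi = crf_min, crf_max
    while lo <= hi:
        mid = (lo + hi) // 2
        if measure_quality(mid) >= target:
            best = mid       # good enough, try spending fewer bits
            lo = mid + 1
        else:
            hi = mid - 1     # too lossy, back off to a lower CRF
    return best


def toy_quality(crf):
    """Synthetic stand-in for an encode-and-measure step. NOT real VMAF."""
    return 100.0 - 1.5 * (crf - 10)


print(highest_crf_meeting_target(toy_quality, target=93.0))  # 14
```

The CRF found this way could then serve as the per-title baseline, with the VBV cap on top of it as in the ladder discussed earlier in the thread.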

TEB
30th April 2020, 09:42
In my experience, the DASH clients on older devices are insane. If those devices are important enough, then you need to make playback work well on them. It kind of doesn't matter what the problem really is.

This can ALSO be an issue with poorly implemented DRM, btw. I've definitely seen certain popular settop boxes blow up when fed streams that were 100% valid within the constraints of the Main10 profile @ level 5 (both vbv limits and MaxDpbSize). Reducing the reference frame count was the only solution. This was acknowledged as a firmware bug but I don't know if it's ever been fixed ;)

I hear ya ;) Luckily we have a higher minimum device requirement.
That said, the major issue we HAVE seen is actually Widevine L1 in combination with 1080p50 for linear sports broadcasts.
A lot of embedded devices running AOSP/ATV can't properly do this without dropping frames in the TEE environment.
Apple iOS/Apple eats most of our content with little complaint as long as we have iOS 10+ as the minimum OS level requirement.

TEB
30th April 2020, 09:43
Oh, if that was the only hard part! But what about the rare cases where the first NAL unit is larger than the following, which is allowed by the spec, but not by some SoC decoders? Or the many, many permutations of DRM encryption that can be fussy with some devices. Or Qualcomm DRM carveouts where you need to allocate on boot enough memory for the maximum resolution & max reference frames. Note that it's the worst case of both; supporting 320x180 with 6 reference frames and 3840x2160 with 3 reference frames requires 3840x2160x6 memory carveout.

And then there are all the heuristics that treat a 15 Mbps peak rate as if every fragment of that stream index really were 15 Mbps, so the player won't switch up to it even when that top stream is actually running at 3 Mbps and there's 12 Mbps of bandwidth.

Whoa!! That's a lot of corner cases I have never seen/observed.
We don't go past 1080p50 today and have the relatively modern devices mentioned in other replies.

Blue_MiSfit
1st May 2020, 00:10
I hear ya ;) Luckily we have a higher minimum device requirement.
That said, the major issue we HAVE seen is actually Widevine L1 in combination with 1080p50 for linear sports broadcasts.
A lot of embedded devices running AOSP/ATV can't properly do this without dropping frames in the TEE environment.
Apple iOS/Apple eats most of our content with little complaint as long as we have iOS 10+ as the minimum OS level requirement.

Yeah, throughput for AES-CTR mode becomes a major issue on cost-constrained devices. Apple was right to support only AES-CBCS since it's so much faster. Incidentally, they were right for the wrong reasons :D Their devices are beefy enough to do CTR at high speeds! We've seen on certain cheap crap Smart TVs that support both formats that switching from CTR to CBCS almost doubled throughput.

Anyway, we're drifting off-topic here a bit, even if this is still relevant in the context of the risks associated with not using more traditional CBR / heavily capped VBR rate control modes for streaming :)

benwaggoner
1st May 2020, 05:18
I don't really follow how this can be an issue. Isn't the decoder validated for levels and profiles? If a decoder is specified to handle Main10, Main tier, level 4, how could it not handle MaxDpbSize and VBV limits within the spec? As long as the buffer is no higher than the maximum maxrate of that level/tier, and it can be fed data at that rate, shouldn't it be all good? Or is this specifically an issue for some DRM solutions?
The thing is that the first NAL unit is allowed to be bigger than the following NAL units, per spec. Most decoders do this right, but there are a couple of common SoCs found in living room decoders that require ALL NAL units to be no bigger than the after-the-first size. Of course, the first NAL unit is almost always still under that size. But every now and then one isn't. That took at least six months to root-cause and get a fix rolled out...

Or some older devices that do progressive download, but only support the legacy 32-bit data fields in .mp4 files. Today everything uses the 64-bit extension, and we don't really think about it. But at even a bitrate of 2.5 Mbps, after about four hours the 32-bit offsets run out of room and some older players fail. The secret? Turn off B-frames for long titles! That raises the limit to about 12 hours. Why? I can't even remember anymore.
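
The "about four hours" figure can be checked with a quick calculation. This sketch assumes the 32-bit fields are unsigned byte offsets into a single file and ignores audio and container overhead; it does not attempt to explain the B-frame effect.

```python
def hours_until_offset_overflow(bitrate_bps, offset_bits=32):
    """Hours of content at a given bitrate before byte offsets exceed the field size."""
    max_bytes = 2 ** offset_bits      # 4 GiB for unsigned 32-bit offsets
    bytes_per_second = bitrate_bps / 8
    return max_bytes / bytes_per_second / 3600


print(round(hours_until_offset_overflow(2_500_000), 2))  # 3.82
```

At 2.5 Mbps the file reaches 4 GiB after roughly 3.8 hours, which matches the limit quoted above.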

I wish we could say all decoders are fully spec compliant, but there are all kinds of little things that don't work right. Or even if it is compliant, when doing DRM there are finicky ways things need to be structured to get adequate performance.

Mind expanding on this? Why would it use the peak rate as the basis for selection this way? Shouldn't the decision be made on buffer status and chunk sizes? Would that really be an issue for ABR with MPEG-DASH and HLS?
At scale, a one-in-a-million problem where a fragment causes the player to crash will happen 0.1% of the time each hour of playback. That's many thousands of customers suffering from player crashes a day.
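
A back-of-the-envelope check of those numbers. The fragment duration and the daily viewing hours are assumptions picked for illustration (neither is stated in the thread), and fragments are treated as failing independently.

```python
def crash_probability_per_hour(p_fragment, fragment_seconds):
    """Chance of at least one bad fragment in an hour of playback."""
    fragments_per_hour = 3600 / fragment_seconds
    return 1 - (1 - p_fragment) ** fragments_per_hour


p_hour = crash_probability_per_hour(1e-6, fragment_seconds=4)  # 4 s fragments: assumption
print(f"{p_hour:.4%} per hour")

daily_hours = 10_000_000  # hypothetical daily viewing hours for a large service
print(f"~{daily_hours * p_hour:.0f} crashes per day")
```

With 4-second fragments there are 900 fragments per hour, so a one-in-a-million fragment fault comes out at roughly 0.09% per hour, and at the assumed ten million viewing hours a day that is on the order of nine thousand crashes.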

Someday I'll tell the story about how 1% of all bandwidth in North America was spent trying to keep my ex-wife happy...

foxyshadis
1st May 2020, 05:59
It's funny to remember how it was almost 20 years ago, when we were all trying to figure out which hardware players were least buggy and picky about playing back DivX/Xvid, just so we could watch it in the living room too. Or how to structure our offering to them. Some things never change.

Someday I'll tell the story about how 1% of all bandwidth in North America was spent trying to keep my ex-wife happy...

:p:cool: