Alliance for Open Media codecs [Archive] - Page 29

MoSal

30th January 2019, 12:29

In particular I worry that VMAF is insufficiently sensitive to temporal shifts in video quality. A VMAF of 70, 65, 60, 60, 65, 60, 60 might come out as a nice "VMAF=65.3" but be annoying to watch. Frame strobing was a weakness of libvpx.

That's not really a problem. Per-frame data is available (I wrote this (https://github.com/MoSal/vmaf-plot) mostly in a couple of hours). It's even available for multiple metrics, which is nice.

The real problem is that VMAF is not that good. It, for example, spectacularly fails with samples that greatly benefit from AQ (yes, I know you already hinted at this).

jonatans

30th January 2019, 13:57

Thanks for posting an update Jonatan. Your efforts on this, and in the MC-IF are really appreciated.

Thank you Tom. And thanks for providing additional context to this interesting and complicated matter.

I think I read that Franhaufer sold their HEVC patents to General Electric. If true, you should remove Franhaufer from your diagram.

This is correct. But my understanding is that Fraunhofer did not sell all their HEVC patents. They are listed as licensor in HEVC Advance. In the latest patent list from HEVC Advance there are two Fraunhofer patents listed: https://www.hevcadvance.com/pdfnew/HEVC-Patent-List-January-2019.pdf

Beelzebubu

30th January 2019, 17:56

In particular I worry that VMAF is insufficiently sensitive to temporal shifts in video quality. A VMAF of 70, 65, 60, 60, 65, 60, 60 might come out as a nice "VMAF=65.3" but be annoying to watch. Frame strobing was a weakness of libvpx.

First and foremost: yes! It's great to see some technical & independent thinking of how good VMAF really is.

Netflix uses "hVMAF" as official notation in their charts. "h" means "harmonic", which means it uses harmonic (https://en.wikipedia.org/wiki/Harmonic_mean) means, which bias towards the least favourable. So in your example, the harmonic mean would be 62.66, whereas the average would be 62.86. Neither of these is 65.3. So I'm personally not as concerned about the averaging mechanism aspect of your concern. (In the CLI, use --pool harmonic_mean or something similar, depending on which exact tool you use.) On the other hand, I don't believe that VMAF uses temporal consistency in the reconstruction (the "motion" component is calculated from the source), so that particular concern ("frame throbbing" - i.e. keyframe pulsing or grain/textured-background tearing) I agree with.
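For anyone who wants to check those two numbers, here is a minimal Python sketch using only the standard library. The score list is the example from the quoted post; the variable names are mine and not taken from any VMAF tool.

```python
from statistics import fmean, harmonic_mean

# Per-frame VMAF scores from the example in the quoted post.
scores = [70, 65, 60, 60, 65, 60, 60]

arithmetic = fmean(scores)        # plain average of the frames
harmonic = harmonic_mean(scores)  # gives more weight to the low frames

print(f"arithmetic mean: {arithmetic:.2f}")
print(f"harmonic mean:   {harmonic:.2f}")
```

The gap between the two is small here because the scores do not vary much; the harmonic mean only pulls clearly away from the average when some frames score far lower than the rest.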

Actually, I have to hedge a little here, since I'm not 100% sure VIF (another VMAF component) has a temporal component to it. I don't think it does but I'm not 100% sure.

Since we're on the subject, here's some more of my personal concerns about VMAF:
* it's luma-only;
* AQ (x264/x265) or SAO (x265) appear to have a negative impact on the VMAF score, which is inconsistent with the reported visual results. I do have more detailed thoughts on this, but let's leave that for some other time;
* the actual MOS/VMAF correlation depends very strongly on the viewing environment and therefore on the model file used, but most people simply use the default model without knowing what viewing environment it represents.

Just to be clear, I'm not trying to talk badly about VMAF, I think it's a great tool, it's better than the alternatives and it's fantastic that they opensourced the library as well as the models so that we can learn and understand how it works and constructively critique it. Hopefully, over time, that will make it even better, which should be the ultimate goal.

Separately, I also do agree with you that in the end, we should probably make a distinction between codecs optimized using VMAF vs. those that were not. This isn't an excuse to suck at writing encoders or to not use VMAF when writing encoders, but at the end of the day, we have to acknowledge that, as with any metric, we're assuming a perfect correlation between our metric-of-the-day and the visual experience (or MOS score). That correlation will in practice always be imperfect, and therefore tuning towards/using that metric needs to be done with care and with visual confirmation (otherwise cue up the incoming VMAF artifacts - I wonder what they will look like?).

MoSal

30th January 2019, 21:57

@Beelzebubu

What's really funny, Netflix will not be using VMAF on their published AOM content as is. Why? Because of film grain synthesis ;)

benwaggoner

31st January 2019, 06:00

@Beelzebubu

What's really funny, Netflix will not be using VMAF on their published AOM content as is. Why? Because of film grain synthesis ;)
Well, if VMAF is used on the reconstructed video, it should be as good as VMAF is at dealing with film grain.

If VMAF is bad at dealing with film grain, they need to address that.

The nice thing about VMAF is that it's really a machine learning framework. They can keep on adding new clips and kinds of encodings and training it to rate those. The big expense is getting the subjective ratings to use as ground-truth data. But VMAF itself can always be as good as the ground truth data from subjective testing.

TD-Linux

31st January 2019, 06:38

The nice thing about VMAF is that it's really a machine learning framework. They can keep on adding new clips and kinds of encodings and training it to rate those. The big expense is getting the subjective ratings to use as ground-truth data. But VMAF itself can always be as good as the ground truth data from subjective testing.

The inputs to VMAF itself are the outputs of a bunch of simpler metrics. In that way, the machine-learned part is sort of a "meta-metric". That also means that if the input metrics all respond poorly to film grain, no amount of machine learning is going to be able to make sense of it. I think more work on the input metrics will be needed before VMAF can be used to make film grain decisions.

I don't know what Netflix currently does, but if I were them I would filter the grain from the video, run the VMAF-targeting dynamic optimizer to produce the rate controlled stream, and then add the noise parameters back as a final step.

LigH

31st January 2019, 08:55

Franhaufer

Fraunhofer

Frau = woman
Hof = yard

TomV

31st January 2019, 19:27

HEVC Advance standard essential patent owned by GE challenged as likely invalid (https://www.unifiedpatents.com/news/2019/1/31/tx7k046x6yvail8s3jz4v8tes3x4m7)

benwaggoner

31st January 2019, 20:33

The inputs to VMAF itself are the outputs of a bunch of simpler metrics. In that way, the machine-learned part is sort of a "meta-metric". That also means that if the input metrics all respond poorly to film grain, no amount of machine learning is going to be able to make sense of it. I think more work on the input metrics will be needed before VMAF can be used to make film grain decisions.

I don't know what Netflix currently does, but if I were them I would filter the grain from the video, run the VMAF-targeting dynamic optimizer to produce the rate controlled stream, and then add the noise parameters back as a final step.
Good point on the underlying metrics. In particular I think the temporal metric was quite weak. They changed it for the most recent VMAF, but I'm not confident it'll catch all common kinds of visible temporal distortions. Two frames can look equally "good" but switching between them can be terribly jarring. Open GOP and RADL exist in large part to smooth inter-GOP transitions. And that still requires some cleverness around GOP boundaries to do well.

Mr_Khyron

1st February 2019, 20:13

https://github.com/OpenVisualCloud/SVT-AV1

Welcome to the GitHub repo for the SVT-AV1 encoder! To see a list of feature request and view what is planned for the SVT-AV1 encoder, visit our Trello page: http://bit.ly/SVT-AV1 Help us grow the community by subscribing to our SVT-AV1 mailing list
:cool:
Hardware

The SVT-AV1 Encoder library supports the x86 architecture

CPU Requirements

In order to achieve the performance targeted by the SVT-AV1 Encoder, the specific CPU model listed above would need to be used when running the encoder. Otherwise, the encoder runs on any 5th Generation Intel® Core™ processor, (Intel® Xeon® CPUs, E5-v4 or newer).

RAM Requirements

In order to run the highest resolution supported by the SVT-AV1 Encoder, at least 48GB of RAM is required to run a 4k 10bit stream multi-threading on a 112 logical core system. The SVT-AV1 Encoder application will display an error if the system does not have enough RAM to support this. The following table shows the minimum amount of RAM required for some standard resolutions of 10bit video per stream:
Resolution    Minimum Footprint (GB)
4k            48
1080p         16
720p          8
480p          4

Selur

1st February 2019, 20:17

so an Intel only encoder?

nevcairiel

1st February 2019, 20:20

so an Intel only encoder?

It should be able to run on any AVX2 CPU.

But the entire series of SVT encoders (SVT-HEVC is also a thing) is designed specifically for a use-case of running them on powerful datacenter systems with loads of memory and CPU cores.

benwaggoner

1st February 2019, 20:47

https://github.com/OpenVisualCloud/SVT-AV1

:cool:
Wow, that's a LOT of RAM for 4K. But if it's somewhat proportional to number of cores, no biggie. Any 112 logical core system is going to have >> 48 GiB RAM. The biggest c5 instance today is 72 logical threads and 144 GiB RAM.

I don't think there's ever been an encoder that could usefully use anything like 112 cores except via GOP-level parallelism. But hey, it's Intel.

I've not been able to find much detailed documentation about the SVT HEVC or AV1 projects. Do they mean "Scalable Video" ala SVC and SHVC with enhancement layers, mainly used in videoconferencing? Or scalable in the sense of scaling with hardware?

Leveraging the new low-level encoder SDK from Intel offers some interesting potential for very fast initial estimates for encoding, leaving the CPU to focus more on refinement. There isn't an AV1 encoder in the current Intel CPUs, obviously, but perhaps some VP9 functionality added in Kaby/Coffee Lake can be leveraged. Certainly things like weighted prediction and coarse motion vectors could be reused to some degree. SVT HEVC has a full 8-bit HEVC encoder implementation to leverage in Skylake-S+ and 10-bit in Kaby/Coffee.

Unfortunately there aren't any Xeon processors with VP9 encoding yet. The best available is the 8/16 core i9-9900K. I don't see any public roadmap for when AV1 might be added. Ice Lake? I see that has an all new HEVC encoder at least. Although given tape-out schedules and how recent the AV1 bitstream was finalized, a full fixed-function implementation might not be there before Tiger Lake. (all just personal speculation fueled by Wikipedia).

I am very curious to see what comes out of the next generation of GPU-assisted software-defined encoding. Having it all on-die instead avoids the PCI bus latency challenges of past GPU+CPU implementations.

nevcairiel

1st February 2019, 20:50

TomV

2nd February 2019, 00:26

I've not been able to find much detailed documentation about the SVT HEVC or AV1 projects. Do they mean "Scalable Video" ala SVC and SHVC with enhancement layers, mainly used in videoconferencing? Or scalable in the sense of scaling with hardware?
No. Intel bought eBrisk (they already owned a good chunk of eBrisk, thanks to the Altera acquisition, because Altera had invested in eBrisk), and then open sourced their HEVC encoder. Then they started focusing on AV1, and now they've open sourced that encoder. The HEVC encoder is fast, but the video quality is not competitive. I'm not sure if it can beat x264 under equal conditions. It certainly can't beat x265 or Beamr 5 under any conditions.

TomV

2nd February 2019, 00:26

Hardware. It's not producing "scalable video".
Not hardware. Software.

nevcairiel

2nd February 2019, 00:43

Not hardware. Software.

You should read the context of the question that answer was to. ;-)

To make sure it's not lost again, let me paraphrase: :p
Q: Scalable Video, or Scaling with Hardware?
A: Hardware.

TomV

2nd February 2019, 02:47

You should read the context of the question that answer was to. ;-)

To make sure it's not lost again, let me paraphrase: :p
Q: Scalable Video, or Scaling with Hardware?
A: Hardware.
OK... I see. Exactly what they were thinking when they used this acronym, which, as Ben points out, is confusingly similar to Scalable Video Coding and Scalable HEVC Video Coding, I don't know. Nothing to see here... move along.

hajj_3

3rd February 2019, 17:47

Intel SVT-AV1 benchmarks: https://twitter.com/fg118942/status/1092045469981671424

soresu

3rd February 2019, 20:54

Is it just me or is rav1e actually pulling out ahead of VP9 at some bitrates on the graph? If so, that's a nice milestone for rav1e, given the timeframe.

benwaggoner

4th February 2019, 19:41

Intel SVT-AV1 benchmarks: https://twitter.com/fg118942/status/1092045469981671424
Is there more documentation on what's actually being tested and graphed here? Based on the parameters, it doesn't appear to be controlled for encoding speed. And it's odd to use --tune ssim for x264/x265 when scoring with VMAF, which is a better objective metric than SSIM.

I wish tests would provide the actual per-frame VMAF scores instead of just a mean. For real-world duration stuff, variability of quality can hurt subjective quality in a way that VMAF itself won't capture. Keyframe strobing on one frame every 5 seconds doesn't drag down the mean much, but it can be a very annoying artifact viewers can clap along to.
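A rough arithmetic sketch of why the mean hides this, in Python. The frame rate, keyframe interval and scores are made-up numbers for illustration, not measurements from any encode.

```python
from statistics import fmean

FPS = 60                  # assumed frame rate
KEYFRAME_INTERVAL_S = 5   # assumed: one keyframe every 5 seconds
frames_per_gop = FPS * KEYFRAME_INTERVAL_S  # 300 frames

# Hypothetical per-frame scores for one GOP.
steady = [90.0] * frames_per_gop                     # no strobing
strobing = [60.0] + [90.0] * (frames_per_gop - 1)    # one bad keyframe

mean_steady = fmean(steady)
mean_strobing = fmean(strobing)

print(f"mean without strobing: {mean_steady:.2f}")
print(f"mean with strobing:    {mean_strobing:.2f}")
print(f"worst frame:           {min(strobing):.2f}")
```

With these assumed numbers, a 30-point dip on one frame in 300 moves the mean by only a tenth of a point, while the minimum shows the dip in full.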

Nintendo Maniac 64

4th February 2019, 20:31

I wish tests would provide the actual per-frame VMAF scores instead of just a mean.
I presume you mean (pun not intended) in a manner similar to the frame-time graphs used in GPU performance benchmarks, or at least also showing a 1% low? Those came about for a similar reason: showing only the average frame rate hides uneven frame delivery, which matters much more to game playability.

The only thing is that such graphs would tend to be limited to having a single bitrate or quality setting since the bottom axis in such a situation would be time rather than bitrate.

...which might very well be why people don't do it - because they want to show a single graph with various differing bitrates rather than a really detailed graph but only at a single bitrate or quality setting.

EDIT: A 1% low graph would at least let you do this, but it would still also require making a second graph (unless you don't even care about the mean at all, in which case you could just graph a 1% low and call it a day).

nevcairiel

4th February 2019, 20:37

We get SSIM graphs with per-frame curves, so it's not that much of a "new" idea to also do that for VMAF or the like.

benwaggoner

5th February 2019, 00:56

I presume you mean (pun not intended) in a manner similar to frame-time graphs used in GPU performance benchmarks, or at least just also showing a 1% low? For a similar reason, they came about since showing the average frame rate hides any uneven frame delivery which is much more important to game playability.

The only thing is that such graphs would tend to be limited to having a single bitrate or quality setting since the bottom axis in such a situation would be time rather than bitrate.

...which might very well be why people don't do it - because they want to show a single graph with various differing bitrates rather than a really detailed graph but only at a single bitrate or quality setting.

EDIT: A 1% low graph would at least let you do this, but it would still also require making a second graph (unless you don't even care about the mean at all, in which case you could just graph a 1% low and call it a day).
Having a harmonic mean of the worst 0.1%, 1%, 10% would be quite useful, yes.

But the actual VMAF output is just per-frame scores, so anyone publishing a mean VMAF already has the data. Even if they don't want to plot the data, they could still make the log files available for download.
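As a sketch of what such a pooled "worst N%" figure could look like, assuming the per-frame scores have already been read from a log into a plain Python list (parsing is not shown, since the log format depends on the tool). The function name worst_fraction_hmean and the sample scores are hypothetical.

```python
import math
from statistics import harmonic_mean

def worst_fraction_hmean(scores, fraction):
    """Harmonic mean of the lowest `fraction` of per-frame scores.

    `fraction` is in (0, 1], e.g. 0.01 for the worst 1% of frames.
    At least one frame is always included.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    count = max(1, math.ceil(len(scores) * fraction))
    return harmonic_mean(sorted(scores)[:count])

# Hypothetical clip: 1000 frames, 10 of which dip to 60.
sample = [60.0] * 10 + [90.0] * 990

for fraction in (0.001, 0.01, 0.1, 1.0):
    pooled = worst_fraction_hmean(sample, fraction)
    print(f"worst {fraction:.1%}: {pooled:.2f}")
```

Reporting the whole-clip figure next to one or two of these tail figures would fit in a single extra column of a results table, without needing a per-frame plot for every bitrate.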

fg118942

5th February 2019, 03:24

Having a harmonic mean of the worst 0.1%, 1%, 10% would be quite useful, yes.

But the actual VMAF output is just per-frame scores, so anyone publishing a mean VMAF already has the data. Even if they don't want to plot the data, they could still make the log files available for download.

Log files and encoded videos are here.
https://www.dropbox.com/s/nbnlsicvslptt2c/vidyo1_720p_60fps.7z?dl=0

I am encoding it with tune ssim because I followed the instructions in this article.
https://www.streamingmedia.com/Articles/Editorial/Featured-Articles/AV1-A-First-Look-127133.aspx

I may not be able to answer difficult questions as I am not good at English.

TomV

5th February 2019, 16:57

Intel SVT-AV1 benchmarks: https://twitter.com/fg118942/status/1092045469981671424
x264 and x265 preset slower is not the right preset to use versus aomenc --cpu-used = 0 and SVT-AV1 enc-mode 0. This test should compare with x264, x265 --preset placebo. Better yet, forget objective metrics. Just show us the video, so we can judge for ourselves the bit rates that produce matching subjective quality.

kanaka

6th February 2019, 10:03

My AVIF toolkit: https://mega.nz/#!5oQE2Sob!STZHdk4ob4ptHknMvNcB4JxbCt9xdu3WUKkg7iyh2EM

I tested avif format with this photo
https://personal.sron.nl/~pault/images/colourvisiontest_small.png
The AVIF file was different from the source... (the text wasn't readable), so I removed
--color-primaries=bt709 --transfer-characteristics=bt709 --matrix-coefficients=bt709
and the result was OK. Why did you put in this color profile?

I'm thinking about converting my 12-bit raw photos to AVIF. Is it possible? What pix_format should I use?

fg118942

6th February 2019, 11:46

x264 and x265 preset slower is not the right preset to use versus aomenc --cpu-used = 0 and SVT-AV1 enc-mode 0. This test should compare with x264, x265 --preset placebo. Better yet, forget objective metrics. Just show us the video, so we can judge for ourselves the bit rates that produce matching subjective quality.

I thought that the point was right so I added placebo data.
https://i.imgur.com/V1WH6GJ.png
Also, the video encoded with SVT-AV1 has been uploaded here.
https://www.dropbox.com/s/nbnlsicvslptt2c/vidyo1_720p_60fps.7z?dl=0

LigH

6th February 2019, 18:27

New uploads: (MSYS2; MinGW32: GCC 7.4.0 / MinGW64: GCC 8.2.1)

AOM v1.0.0-1299-g54eabb5c8 (https://www.mediafire.com/file/q1x1a8akjgtqu8c/aom_v1.0.0-1299-g54eabb5c8.7z)

rav1e 0.1.0 (2cec0f9 / 2019-02-06) (https://www.mediafire.com/file/w1o3g5wdye8w8o5/rav1e_0.1.0_2019-02-06_2cec0f9.7z)

dav1d 0.1.1 (caca572 / 2019-02-06) (https://www.mediafire.com/file/aym9cb9e1ct5vi7/dav1d_0.1.1_2019-02-06_caca572.7z)

benwaggoner

6th February 2019, 21:05

Log files and encoded videos are here.
https://www.dropbox.com/s/nbnlsicvslptt2c/vidyo1_720p_60fps.7z?dl=0

I am encoding it with tune ssim because I followed the instructions in this article.
https://www.streamingmedia.com/Articles/Editorial/Featured-Articles/AV1-A-First-Look-127133.aspx

I may not be able to answer difficult questions as I am not good at English.
Thank you!

benwaggoner

6th February 2019, 21:07

I tested avif format with this photo
https://personal.sron.nl/~pault/images/colourvisiontest_small.png
The AVIF file was different from the source... (the text wasn't readable), so I removed
--color-primaries=bt709 --transfer-characteristics=bt709 --matrix-coefficients=bt709
and the result was OK. Why did you put in this color profile?

I'm thinking about converting my 12-bit raw photos to AVIF. Is it possible? What pix_format should I use?
709==sRGB, so I am surprised it made a difference. Perhaps a 0-255 versus 16-235 luma range conversion? Making text unreadable would be a weird result, though.
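To illustrate the range question: a minimal Python sketch of 8-bit luma scaling between limited range (16-235) and full range (0-255). The function names are mine; this covers luma only, not chroma, and is not taken from the toolkit under discussion.

```python
def limited_to_full_luma(y):
    """Map 8-bit limited-range luma (16-235) to full range (0-255), clipping."""
    scaled = round((y - 16) * 255 / 219)
    return min(255, max(0, scaled))

def full_to_limited_luma(y):
    """Map 8-bit full-range luma (0-255) to limited range (16-235)."""
    return round(16 + y * 219 / 255)

# Full-range samples wrongly treated as limited range: everything at or
# below 16 collapses to black and everything at or above 235 to white.
for value in (0, 8, 16, 128, 235, 245, 255):
    print(value, "->", limited_to_full_luma(value))
```

A range mix-up like this crushes shadows and highlights or washes the picture out, but it leaves hues alone, which fits the remark that unreadable text would be a weird result for a pure range error.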

benwaggoner

6th February 2019, 21:09

x264 and x265 preset slower is not the right preset to use versus aomenc --cpu-used = 0 and SVT-AV1 enc-mode 0. This test should compare with x264, x265 --preset placebo. Better yet, forget objective metrics. Just show us the video, so we can judge for ourselves the bit rates that produce matching subjective quality.
If we are comparing against very slow encoders, I recommend adding --tskip to x265 as well. That can help efficiency with text, cel animation, and other content with synthetically sharp edges.

kanaka

7th February 2019, 08:48

709==sRGB, so I am surprised it made a difference. Perhaps a 0-255 versus 16-235 luma range conversion? Making text unreadable would be a weird result, though.

check yourself http://screenshotcomparison.com/comparison/129605

TD-Linux

7th February 2019, 11:07

check yourself http://screenshotcomparison.com/comparison/129605

Aha, looks like 601 vs 709 matrix. Although JPEG is normally sRGB primaries, it uses what is basically a full-range 601 matrix. So if your sources are JPEG, a 601 matrix makes the most sense.
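A sketch of what a matrix mismatch does, in Python. It uses the standard Kr/Kb luma coefficients for BT.601 (0.299, 0.114) and BT.709 (0.2126, 0.0722) on full-range, normalized values; the function names and the test colour are my own choices for illustration.

```python
# (Kr, Kb) luma coefficients; Kg is 1 - Kr - Kb.
KR_KB = {"bt601": (0.299, 0.114), "bt709": (0.2126, 0.0722)}

def rgb_to_ycbcr(rgb, matrix):
    """Full-range R'G'B' in [0, 1] -> Y' in [0, 1], Cb/Cr in [-0.5, 0.5]."""
    kr, kb = KR_KB[matrix]
    r, g, b = rgb
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr

def ycbcr_to_rgb(ycbcr, matrix):
    """Inverse of rgb_to_ycbcr for the same matrix name."""
    kr, kb = KR_KB[matrix]
    y, cb, cr = ycbcr
    r = y + 2 * (1 - kr) * cr
    b = y + 2 * (1 - kb) * cb
    g = (y - kr * r - kb * b) / (1 - kr - kb)
    return r, g, b

source = (0.8, 0.2, 0.2)  # a saturated red, chosen arbitrarily
encoded = rgb_to_ycbcr(source, "bt709")

matched = ycbcr_to_rgb(encoded, "bt709")      # decoder agrees with encoder
mismatched = ycbcr_to_rgb(encoded, "bt601")   # decoder assumes the other matrix

print("matched:   ", [round(c, 3) for c in matched])
print("mismatched:", [round(c, 3) for c in mismatched])
```

Unlike a range error, the mismatch shifts saturated colours against each other, which is the kind of change that could make a colour-vision test image hard to read.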

kanaka

7th February 2019, 11:23

Aha, looks like 601 vs 709 matrix. Although JPEG is normally sRGB primaries, it uses what is basically a full-range 601 matrix. So if your sources are JPEG, a 601 matrix makes the most sense.

Source is png (https://personal.sron.nl/~pault/images/colourvisiontest_small.png)
and these are the commands from encode.ST.sh

./bins/ffmpeg -r 1 -y -hide_banner -loglevel fatal -i "$1" -vf scale=out_color_matrix=bt709:flags=lanczos+accurate_rnd+bitexact+full_chroma_int+full_chroma_inp,format=yuv420p10le -strict -1 "temp/orig_$filename.y4m"
./bins/aomenc --threads=4 -v --cpu-used=4 --end-usage=q --cq-level=$quality --sharpness=7 --bit-depth=10 --full-still-picture-hdr --color-primaries=bt709 --transfer-characteristics=bt709 --matrix-coefficients=bt709 --ivf -o "encoded/$filename.ivf" "temp/orig_$filename.y4m"

//edit: I had an older version of the toolkit. The new toolkit uses yuv420p and works OK.

benwaggoner

7th February 2019, 19:57

Aha, looks like 601 vs 709 matrix. Although JPEG is normally sRGB primaries, it uses what is basically a full-range 601 matrix. So if your sources are JPEG, a 601 matrix makes the most sense.
sRGB uses 709, which itself is the average of the 601 PAL (EBU 3213) and NTSC (SMPTE C) primaries. As an industry, we should probably stop talking about "601 primaries" since there are actually two different ones, unless we use it as shorthand for "the primaries used by the original SD video format."

709 was the compromise for HD to make it "international" - as the average of the two, if 601 gets treated as 709 or vice versa, that minimizes the worst-case error compared to 601 PAL <> 601 NTSC.

https://en.wikipedia.org/wiki/Rec._709#Primary_chromaticities

As a parochial American, I thought 709 was dumb when it came out, but I have since gained the wisdom to appreciate its simple brilliance.

soresu

7th February 2019, 22:44

benwaggoner

8th February 2019, 21:45

benwaggoner you just mentioned AV2 several times in the EVC thread on February 1st. Do you know where current work on AV2 is being committed to if it is public yet? The googlesource.com git site leaves something to be desired as far as usability and search is concerned.
I've heard from some people that they are doing some initial work on it, and the hope is that it will be a relatively quick turnaround.

One potential wrinkle to the VPx and AVx codecs is that they know in advance when essential patents are going to expire, so tools could be designed in advance and only deployed when IP is cleared up. So there could be stuff that was too early for AV1 that could be reused. That's just my own personal speculation, though. But that could speed some things up.

People looking at AV2 have also been a lot more optimistic about it than AV1, which didn't get enough attention to HW decoder design optimization, or getting tools to work together orthogonally. One comment I heard is that one tool might be sharpening a pixel while another is smoothing it.

2020 should be an interesting year in the codec space, with AV2, VVC, and EVC all potentially being far enough along to evaluate, and H.264, HEVC, and AV1 still competing for current deployments. After UHD, HDR, HFR, and object-based audio all launching in 2014-2016, it's been a little dull around new media technologies. So I'm pretty amped by all the exciting fun 2020-2022 is going to be for codecs! It'll be an interesting mix of technical, business, and legal factors, and I really don't have a guess yet about what the codec world will look like in five years*!

And audio stuff is heating up with xHE-AAC, AC-4 with Atmos, and MPEG-H all going mainstream.

* Well, I bet we'll still be using MP4 as a container format.

soresu

8th February 2019, 22:47

Is there any word on Daala techniques like PVQ and Activity Masking, and also ANS going into AV2?

Though from the direction of VVC and Google's own encoding research priorities, I could see a more than healthy dose of machine learning put to use in AV2 as well. ML seems to be affording some very significant complexity/efficiency gains in the area of path tracing, and I'm sure all of the involved AOM parties would cheer improvements in encoding complexity.

IgorC

10th February 2019, 18:10

2020 should be an interesting year in the codec space, with AV2, VVC, and EVC all potentially being far enough along to evaluate, and H.264, HEVC, and AV1 still competing for current deployments. After UHD, HDR, HFR, and object-based audio all launching in 2014-2016, it's been a little dull around new media technologies. So I'm pretty amped by all the exciting fun 2020-2022 is going to be for codecs! It'll be an interesting mix of technical, business, and legal factors, and I really don't have a guess yet about what the codec world will look like in five years*!

This statement goes beyond healthy optimism.
The video codec market is cooling down. There are a few reasons for that: royalty-free formats are starting to gain share, and there are external factors such as the big improvement in network bandwidth, especially in recent years.

And audio stuff is heating up with xHE-AAC, AC-4 with Atmos, and MPEG-H all going mainstream.

:(
I don't know where you get this information from, but this is not what has been happening with audio codecs lately.
xHE-AAC has nothing to do with the mainstream. It's a low-bitrate codec, and companies adopt it only where bandwidth is very scarce. xHE-AAC/AC4 has no advantage over AAC (a 22-year-old format) at 96-128+ kbps. Audio formats are mature at this point.

AC3 patents expired in 2017, while LC-AAC's will expire during 2019-2020. It will be impossible to force a company to use a new codec when AC3 and LC-AAC are available with expired patents. The giant streaming platforms, Netflix and YouTube, use AAC and Opus. They don't plan to use any new audio codecs in the near future.

Plus, there is not a single developer team working on an xHE-AAC, MPEG-H or AC4 audio codec. And xHE-AAC isn't actually a new format. It has been a standard since 2012. Where is its development? Its adoption?

Blue_MiSfit

10th February 2019, 23:21

Ultra low bitrate is highly desirable for a few specific use cases for companies delivering video:

1) Countries with extremely poor (~2G, to maybe 3G at best) cellular connectivity. Delivering even good quality SD video is totally acceptable here. Total bit budget is often like 200 - 300 Kbps though, so you really do need to use the lowest bitrate audio you can possibly use. 96 Kbps for stereo AAC is not feasible. Opus is great here, but it doesn't have universal support, so more development into other formats is absolutely welcome.

2) Download / offline playback. Imagine you're at the airport about to board a flight. You forgot to download something to watch on your phone / tablet during the flight! You want to be able to download a movie or a couple of episodes of a series as quickly as possible, probably over crowded WiFi or cellular connectivity. See above.
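To put the first use case in numbers, a small Python sketch. The 96 kbps audio rate and the 200-300 kbps totals are the figures from the post; the function name is hypothetical.

```python
AUDIO_KBPS = 96  # stereo AAC rate mentioned in the post

def audio_share(total_kbps, audio_kbps=AUDIO_KBPS):
    """Fraction of the total bit budget consumed by the audio track."""
    return audio_kbps / total_kbps

for total_kbps in (200, 300):
    video_kbps = total_kbps - AUDIO_KBPS
    print(f"{total_kbps} kbps total: audio takes {audio_share(total_kbps):.0%}, "
          f"leaving {video_kbps} kbps for video")
```

At those budgets the audio track alone would take roughly a third to a half of the stream, which is the argument for a lower-bitrate audio codec in that scenario.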

soresu

11th February 2019, 03:14

IgorC,

I don't know about xHE-AAC, but I heard that UK FreeSAT chose AC-4 as the audio format for its next evolution. Link here (https://dolbyac4.com/uk/).
Other platforms supporting it are shown, as well as multiple hardware partners (Broadcom, Cadence, HiSilicon, Mediatek, MStar Semiconductor, Novatek and Realtek).

IgorC

11th February 2019, 04:14

1) Countries with extremely poor (~2G, to maybe 3G at best) cellular connectivity. Delivering even good quality SD video is totally acceptable here. Total bit budget is often like 200 - 300 Kbps though, so you really do need to use the lowest bitrate audio you can possibly use. 96 Kbps for stereo AAC is not feasible.

This is not true.
Look at the report https://opensignal.com/reports-data/global/data-2018-11/state_of_wifi_vs_mobile_OpenSignal_201811.pdf

The modest mobile and/or fixed connections are about 2-3 Mbps (in Algeria). That is far from your 200-300 kbps.

People generally have the wrong idea that every country in Africa, Asia and Latin America (where I actually live) has a very bad internet connection.
Here in Latin America I get 10+ Mbps on 4G/LTE+/4G+.
And Indians are mad about their "slow" 4G connection. It's "just" 6 Mbps! https://www.indiatimes.com/technology/news/india-has-over-86-percent-4g-availability-but-the-worst-data-speed-in-the-world-at-6-07-mbps-340086.html

https://ispspeedindex.netflix.com/country/india/

Do You still think "96 Kbps for stereo AAC is not feasible" and "India is so 2G", right?

Ultra low bitrate is highly desirable for a few specific use cases for companies delivering video:

Yes, corner cases. Not mainstream as Ben claims.

Opus is great here, but it doesn't have universal support, so more development into other formats is absolutely welcome.

Opus is used by YouTube and by an endless number of VoIP and telephony clients, including Cisco corporate solutions like Webex, Skype etc. And it is supported by a large number of platforms, including Android and iOS https://caniuse.com/#search=opus

So are you suggesting using something "better" like xHE-AAC, which doesn't even have a single available encoder? Oh, nice. That will do.

2) Download / offline playback.
What is wrong with current VP9, H.264, H.265, Opus and HE/AAC codecs?
xHE-AAC isn't any better than HE/AAC, Opus at 96 kbps, which is already low bitrate. https://www.ietf.org/lib/dt/documents/LIAISON/file1298.doc

IgorC,

I don't know about xHE-AAC, but I heard that UK FreeSAT chose AC-4 as the audio format for its next evolution. Link here (https://dolbyac4.com/uk/).
Other platforms supporting it are shown, as well as multiple hardware partners (Broadcom, Cadence, HiSilicon, Mediatek, MStar Semiconductor, Novatek and Realtek).
Great. Both xHE-AAC and AC4 have similar quality, as they have the same or similar compression tools. So AC4 has an advantage too, but only at low bitrates. It makes sense to use it where bandwidth is expensive, like DRM digital radio, but I wouldn't expect to see it on internet platforms like Netflix, YouTube (Opus, AAC), Spotify (AAC 128-256k, Vorbis 96/160/320k), Apple Music (AAC 256k), Tidal (96-256 kbps AAC and lossless FLAC) etc.

soresu

11th February 2019, 04:45

TomV

11th February 2019, 07:30

This is not true.
Look at the report https://opensignal.com/reports-data/global/data-2018-11/state_of_wifi_vs_mobile_OpenSignal_201811.pdf

Igor, you're arguing with 2 technical professionals who are key members of their respective Tier 1 companies... Amazon and Disney. They have access to much better insights on end-user bandwidth and client device capabilities than you or I. These companies will license proprietary codecs like Dolby AC-4 or xHE-AAC if and when that makes sense. Software audio decoding is certainly feasible on most devices, especially at very low bit rates (when audio is also likely mixed to one channel).

I'm glad you have decent bandwidth in Latin America. In many developing areas of the world, bandwidth is still scarce and expensive. And even if mobile networks have been upgraded, that end-customer that a video streaming service is trying to take care of may still have an older device.

iwod

11th February 2019, 07:44

As someone who comes from a village in northern England, I can tell you that it only just got upgraded to VDSL from the 3 mbps ADSL 2 it had been at for 5-8 years.

Thankfully it is only 1.5-2 miles from the closest exchange, but many rural communities are much further out than that and still lack the FTTC/VDSL upgrades that have existed near the exchanges for over half a decade (therefore stuck with ultra low ADSL data rates). Expensive 4G mobile broadband data is sadly a bad option if you plan to consume any significant amount of video per month.

All this adds up to the fact that low/ultra low bitrate video is far from corner case, even in first world countries - mainly because rural areas being lower population density are treated like third world countries by BT/Open Reach.

It wouldn't surprise me to find out that many rural places in Europe and the US suffer from a similarly slow uptake of landline fibre-based broadband technologies.

I know this is slightly off topic, but I can assure you that, comparatively speaking, BT isn't doing such a bad job in rural areas. They are actively investing in G.fast and VDSL 35b, and they were one of the earliest implementers of ADSL2+ (that is, up to 5000 m from the exchange). In the future, once 5G matures, setting up gigabit wireless networks using microwave as the backbone should be way cheaper than laying fibre. So I am optimistic about rural broadband.

But yes, ultra-low bitrate (sub-1 Mbps) is still required in many places, especially if you are doing video, which hogs a lot of the capacity. There is a huge capacity difference between constantly holding on to a 1 Mbps data stream and doing a 6 Mbps speed test once in a while.
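A back-of-the-envelope Python sketch of that difference. The two-hour stream and ten-second speed test durations are assumptions chosen for illustration.

```python
def megabytes_transferred(rate_mbps, seconds):
    """Data volume in megabytes for a rate in megabits/s held for `seconds`."""
    return rate_mbps * seconds / 8  # 8 bits per byte

stream_mb = megabytes_transferred(1, 2 * 60 * 60)  # sustained 1 Mbps for 2 hours
speed_test_mb = megabytes_transferred(6, 10)       # 6 Mbps burst for 10 seconds

print(f"2 h stream at 1 Mbps:     {stream_mb:.1f} MB")
print(f"10 s speed test at 6 Mbps: {speed_test_mb:.1f} MB")
```

Under these assumptions the slow but sustained stream moves over a hundred times more data than the fast burst, so a good speed test result says little about whether the network can carry video for every user at once.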

So hopefully, as both network technologies and compression improve, the long tail of the world's population can all enjoy online streaming video within the next decade. I just hope future codecs focus more on sub-2-4 Mbps bitrates,