Alliance for Open Media codecs [Archive] - Page 39

benwaggoner

31st October 2019, 19:46

Unfortunately not fast enough, I bought an Amazon Fire Stick for my dad in 2017 assuming that the advertised HEVC support meant up to 10 bit, only to find out to my horror that most of the HEVC encoded videos I have did not work on it.

At least the newer FS 4K I bought this year has 10 bit capability, still the experience was somewhat disheartening considering the sheer amount of 10 bit content available at the time I bought the first Fire Stick, 5 years after HEVC was standardised.
Any HDR-capable Fire Stick/TV supports 10-bit decode. The feature did come to FireTV first, though, a couple of generations back.

benwaggoner

31st October 2019, 19:53

Lots of Deep, Neural and ML related stuff discussed at the AOM event it seems - I think we can guess the main direction of AV(x) codecs in the future.
Do you mean the encoders or the bitstream definition itself?

I get nervous about actually tuning bitstream features based on ML, because we still lack metrics that correlate well with subjective quality. VMAF is the least bad ever, but it is SDR only, and the whole question of how individual frame metrics get aggregated into good metrics for interframe encoding remains barely examined. A mean of individual frame values isn't that useful for a clip that is more than a few seconds of a single shot.

We've seen that AV1 shows better VMAF to MOS ratios than other codecs, which could be a result of this sort of curve fitting to one metric. It's generally true that the more a metric gets used, the lower its subjective correlation becomes, as encoders get increasingly tuned to the metric instead of to subjective ratings.
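To illustrate the aggregation point, here is a minimal sketch of how different pooling methods treat the same per-frame scores. The scores are invented for illustration (they are not VMAF output from any real clip), and `pool_scores` is a hypothetical helper, not part of any metric tool.

```python
import statistics

def pool_scores(frame_scores):
    """Pool per-frame quality scores (e.g. on a 0-100 scale) in several ways."""
    ordered = sorted(frame_scores)
    # Index of the 5th-percentile frame, using the nearest-rank method.
    rank = max(0, -(-5 * len(ordered) // 100) - 1)
    return {
        "mean": statistics.fmean(frame_scores),
        "harmonic_mean": statistics.harmonic_mean(frame_scores),
        "p5": ordered[rank],
        "min": ordered[0],
    }

# Hypothetical clip: 95 good frames and 5 visibly bad ones.
scores = [90.0] * 95 + [30.0] * 5
pooled = pool_scores(scores)
for name, value in pooled.items():
    print(f"{name}: {value:.1f}")
```

The arithmetic mean barely moves when a handful of frames collapse, while the harmonic mean and the low percentile react more strongly. Which pooling tracks subjective ratings best is exactly the open question raised above.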

soresu

31st October 2019, 21:21

Do you mean the encoders or the bitstream definition itself?

I get nervous about actually tuning bitstream features based on ML, because we still lack metrics that correlate well with subjective quality. VMAF is the least bad ever, but it is SDR only, and the whole question of how individual frame metrics get aggregated into good metrics for interframe encoding remains barely examined. A mean of individual frame values isn't that useful for a clip that is more than a few seconds of a single shot.

We've seen that AV1 shows better VMAF to MOS ratios than other codecs, which could be a result of this sort of curve fitting to one metric. It's generally true that the more a metric gets used, the lower its subjective correlation becomes, as encoders get increasingly tuned to the metric instead of to subjective ratings.

I've only skimmed 2-3 of the presentation slide decks, but there is certainly an appreciation for bitstream compatibility where the ML models are concerned.

Though Google at least have been knocking on this particular door for at least a few years now, I figure they would not still be at it if they thought it could not be harnessed in a standardised way for a codec bitstream.

Just to be clear, my own level of understanding of all of this is fairly amateur compared to experts on here - I mostly have an avid interest in codecs and more recently ML too (due to various ML optimisations in CG rendering and production fields).

soresu

1st November 2019, 19:59

Found an interesting slide deck called "Adaptive Optimal Linear Estimators for Enhanced Motion Compensated Prediction".

It goes over several things, though I'm not sure if they are alternative solutions or potentially additive improvements.

Assuming they are additive, it discusses at least an average 11.5% BD rate improvement over baseline (presumably AV1 is the baseline).

Link here (https://aomedia.org/wp-content/uploads/2019/10/KenRose_UCSB.pdf).

NikosD

2nd November 2019, 11:00

TLDR:
Win7 64bits, i7-4770k, 3.40GHz (stock), improvement between 0.2.1 and 0.5.1 using only SSSE3 accelerated routines, single thread:
Chimera: 33.2%
Dua Lipa: 34.9%

Included are some AVX2 tests too, because yes.
Ok, so you replied to different issues than those I questioned with my results, but still your results are interesting.
It seems that single-threaded performance has increased for SSSE3, but the gain of AVX2 over SSSE3 is tiny.
Still, both SSSE3 and AVX2 in single-threaded mode are 30-something % faster for 0.5.1 vs 0.2.1.

BUT as I have already stated in my results, the CPU utilization during real-world multi-threaded decoding eats ALL of the single-threaded gain in the case of the Dua Lipa clip, and most of it for the other clips, for both the SSSE3 and AVX2 versions.

In other words, for real-world multi-threaded decoding the net of gain and loss between 0.2.1 and 0.5.1 is dead zero for Dua Lipa and very small for the other clips.

Sorry, but I can't call this progress after seven months if the overall multi-threaded decoding performance gain is zero.

NikosD

2nd November 2019, 11:49

FFMpeg says it's going to use 4 frame threads and 3 tile threads to decode the files, so I'll be using those numbers.

Chimera: 34%

Dua Lipa: 29.7%
Ok, now your results are even more interesting.
Where can I find the executables of the 0.2.1 and 0.5.1 versions to run them on my systems?
I have used the LAV filters versions posted above.

NikosD

2nd November 2019, 12:02

FFMpeg says it's going to use 4 frame threads and 3 tile threads to decode the files, so I'll be using those numbers.

Chimera: 34%

Dua Lipa: 29.7%
Also, I have to say that you are using only half of your threading power, i.e. only 50% logical CPU utilization, as you have an 8-thread CPU and you are using only 4 threads.
I think you have to test it again with at least 8 threads in order to use hyperthreading and all of your CPU's processing power.

SmilingWolf

2nd November 2019, 14:45

Is there any particular reason DXVA Checker grays out the CPU usage line when I try to do the benchmarks? Is there some option I need to set?

NikosD

2nd November 2019, 15:06

Is there any particular reason DXVA Checker grays out the CPU usage line when I try to do the benchmarks? Is there some option I need to set?
You only have to run the benchmark (I run it 3 times and take the average) and at the end of every run you will see the CPU usage.
It also shows min/avg/max values, even for CPU usage.
Just leave it to finish.

SmilingWolf

2nd November 2019, 15:50

Yeah I tried doing that and it didn't work, see screenshots

https://i.ibb.co/sV30XGJ/Screenshot-1.png (https://ibb.co/sV30XGJ) https://i.ibb.co/2gPh0H6/Screenshot-2.png (https://ibb.co/2gPh0H6)

NikosD

2nd November 2019, 16:06

Yeah I tried doing that and it didn't work, see screenshots

https://i.ibb.co/sV30XGJ/Screenshot-1.png (https://ibb.co/sV30XGJ) https://i.ibb.co/2gPh0H6/Screenshot-2.png (https://ibb.co/2gPh0H6)
Ok, you have to go to LAV filters settings and choose SW decoding.
It seems to me that you are using HW decoding.

SmilingWolf

2nd November 2019, 16:24

I did wonder if that was the case, but when I open LAVFilters' config panel this is what I see:
https://i.ibb.co/RcM5bCC/Screenshot-1.png (https://ibb.co/RcM5bCC)

Fresh installation, nothing touched

NikosD

2nd November 2019, 16:59

Fresh installation, nothing touched
Latest DXVA Checker v4.2.1 and Connect to Renderer selected?
Also, when you select the AV1 file, does LAV say unsupported inside DXVA Checker?

SmilingWolf

2nd November 2019, 17:19

Latest DXVA Checker v4.2.1, system equipped with a GTX1080 (440.97).

It says Unsupported, yes.

This version does not seem to have a "Connect to Renderer" option anywhere.
https://i.ibb.co/2N34LRy/Screenshot-1.png (https://ibb.co/2N34LRy)

SmilingWolf

3rd November 2019, 07:26

Ok, but SmilingWolf and you, have tested different things than me.
Firstly, he posted single threaded performance difference and I posted multi-threaded performance difference

Conveniently forgetting about my two posts dedicated to multi threaded performance aren't we?
http://forum.doom9.org/showthread.php?p=1889274#post1889274
http://forum.doom9.org/showthread.php?p=1889289#post1889289

There isn't a DXVA Checker report yet, afternoon spent trying to make it work notwithstanding, but as I said, CPU utilization goes between 70% and 90% with the two sequences used.

littleD

3rd November 2019, 08:21

Some older versions of DXVA Checker show CPU usage. But I did a short test, and DXVA Checker's result was around 88% utilization while the system monitor was showing 100%. Maybe that's why the authors of the program turned off the feature temporarily, because of inconsistent results?

SmilingWolf

3rd November 2019, 08:50

I'd love to keep arguing, but I have to agree that it won't do the board any favors, so let's let bygones be bygones.

I have already removed all sorts of config files, fresh installations, even reboots etc.
Have you run your own benches on this (4.2.1) version, or on an older one? If the latter happens to be the case, what exact version, so I can download it from the VideoHelp archive?

I don't get what you mean by "no internal commands". They are cmdline applications, just use the same cmdlines I used. I was inside an MSYS shell just so that I could use the "time" command, but I suppose PowerShell on Win10 has got something similar.
Word of advice, they only digest pure IVF files, so at least the Dua Lipa video will have to be freed of its container using "ffmpeg -i Dua_Lipa.mp4 -c:v copy Dua_Lipa.ivf"
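For anyone without an MSYS shell, a minimal sketch of timing a command line from Python instead of using "time". The helper name `time_command` and the placeholder command are assumptions for illustration; substitute the actual decoder command line being benchmarked. Unlike "time", this reports wall-clock time only, not user/sys CPU time.

```python
import subprocess
import sys
import time

def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output so console printing does not skew the timing.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings

# Placeholder command so the sketch runs anywhere; replace it with the
# decoder command line you want to measure.
placeholder = [sys.executable, "-c", "pass"]
timings = time_command(placeholder)
print(f"average: {sum(timings) / len(timings):.3f} s over {len(timings)} runs")
```

Averaging several runs, as done with DXVA Checker earlier in the thread, smooths out disk-cache and scheduler noise.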

NikosD

3rd November 2019, 09:45

I'm not a developer, so environments like MSYS, Visual Studio etc. are not usually installed on my system.
I have compiled a few apps from time to time, even my own code decades ago (!), but I'm not going to set up MSYS now.
I'll give PowerShell a try of course, as I use it from time to time for my job (although I still do a lot using cmd)
But that IVF thing is another obstacle.
Regarding DXVA Checker I used v4.2.1 which of course has everything, as I told you before.
Min/avg/max for FPS and CPU utilization.
Your main problem is that you see things like Video Engine and GPU utilization and you shouldn't.
You need a cleaner OS.
Tomorrow I'll try setting LAV to single-thread mode and run the same tests with Skylake at work.
If nobody here in this forum can confirm or reject my multi-thread results using such familiar tools as LAV filters and DXVA Checker, I'll try to reproduce your single-thread results.

P.S
DXVA Checker is a sophisticated and accurate tool and the CPU utilization refers to itself only, as a process, not general CPU utilization during its running.

nevcairiel

5th November 2019, 09:40

Comparisons between LAV 0.74.1 and later nightly versions are flawed since the threading strategy changed in FFmpeg, which resulted in 0.74.1 using more frame threads than the later nightlies, making 0.74.1 artificially faster. As such, all your results are invalidated.
This is why you should use as little software as possible to do benchmarking (i.e. go as close to the core as possible), as you never know what changes might interfere with your conclusions.

I've also once again changed the thread distribution in 0.74.1-30 from last night, and while it's going to use more threads again now, similar to the old logic, it's not going to be identical to 0.74.1 in all cases (because I added more tile threads on high core-count CPUs).

Mr_Khyron

7th November 2019, 23:56

AOMedia Research Symposium 2019 Videos
https://www.youtube.com/playlist?list=PL97T7zfqOOF3YKvniyywewtWKpxXky8iI

utack

8th November 2019, 14:49

Lesson learnt from WebP

There are only two I can think of

despite being technically more advanced you can still lose to a decades old legacy format when your encoder is terrible
it does not matter that your format is worse than the legacy competition, if you claim that it is better often enough others will start parroting it and adopt it

Seriously, the only area where it might be a tiny bit better is ultra-high compression, where it does not start falling apart as badly as JPEG. For any sane (mid to high) image quality range, the vast array of JPEG encoders are doing a significantly better job of retaining detail.

dapperdan

8th November 2019, 20:04

WebP had some other benefits over JPEG outside of compressing photographic images.

JPEG XL seems like WebP's successor in this regard. It's targeted at lots of pain points that would make it a good choice to replace JPEG (and PNG and GIF) on the web and in the browser even if it didn't beat JPEG on compression, though it claims that as well. And maybe the JPEG name will help, though that doesn't seem to have benefitted anyone but the original JPEG.

Not sure there's room for AVIF and JPEG XL but maybe they have subtly different niches.

dapperdan

8th November 2019, 20:09

soresu

9th November 2019, 01:38

Ronald's slide showing 4 AV1 encoders all scaling well seems like an improvement from his slide at BIG Apple Video where only SVT seemed to be managing that, with Eve just behind and Rav1e and libaom trailing.

Not sure if it's a direct comparison to the earlier slide, but if it is then things should be a lot better for AV1 when cores are available.

An interesting point Ronald made implies that AV1 has an intrinsic parallel scaling limitation due to an oversight during the encoder development (12:20 in the video), something to do with superblock boundaries.

Hopefully a lesson learned for AV2 efforts going forward.

marcomsousa

9th November 2019, 15:49

Rav1e release 0.1.0
First official release, published during the Video Dev Days 2019 in Tokyo.

Features

Intra and inter frames
64x64 superblocks
4x4 to 64x64 RDO-selected square and 2:1/1:2 rectangular blocks
DC, H, V, Paeth, smooth, and a subset of directional prediction modes
DCT, (FLIP-)ADST and identity transforms (up to 64x64, 16x16 and 32x32 respectively)
8-, 10- and 12-bit depth color
4:2:0 (full support), 4:2:2 and 4:4:4 (limited) chroma sampling
11 speed settings (0-10)
Near real-time encoding at high speed levels
Rate control (single-pass and two-pass)
Temporal RDO
Scene cut detection
CLI tool and C API

https://github.com/xiph/rav1e/releases/tag/0.1.0

mzso

9th November 2019, 16:41

Hi!
Are there any AV1 videos on youtube besides the beta playlist? So far I haven't found any.

marcomsousa

9th November 2019, 16:49

Hi!
Are there any AV1 videos on youtube besides the beta playlist? So far I haven't found any.

Almost all top videos, but only in low resolutions.

mzso

9th November 2019, 19:20

Almost all top videos, but only in low resolutions.

Thanks. Well, I guess I won't be coming across many then. I don't watch stuff like that, and it looks like a few million views are far from enough. A 4+ billion Ed Sheeran song had it up to 2160p, but a 2+ billion Taylor Swift song only has it up to 720p. I managed to find some Wired videos with a couple million views that have AV1, though.

It seems like Firefox's (well, Waterfox's to be accurate) AV1 decoding is quite poor. MPV's (after upgrading) and LAV's seem to be a lot better, no hangs or stutter.

marcomsousa

10th November 2019, 09:29

HandBrake 1.3.0 Released

* Added support for reading AV1 via libdav1d

https://github.com/HandBrake/HandBrake/releases

dapperdan

10th November 2019, 16:25

A 4+ billion Ed Sheeran song had it up to 2160p, but a 2+ billion Taylor Swift song only has it up to 720p. I managed to find some Wired videos with a couple million views that have AV1, though.

A YouTube engineer made a comment about people using YouTube as a radio station, but their licence requiring video, so ultra low bitrate AV1 being a kind of workaround for contractual obligations.

So it's possible that YouTube is measuring the bitrate/resolution of these views and prioritising the ones that are viewed at high quality, not just the ones that are viewed a lot. Just a guess though.

Beelzebubu

11th November 2019, 07:58

An interesting point Ronald made implies that AV1 has an intrinsic parallel scaling limitation due to an oversight during the encoder development (12:20 in the video), something to do with superblock boundaries.

The superblock boundary one, needed to get superblock-row multithreading (an encoder-side-only version of partitions in vp8 or wavefront in hevc), is quite minor, because it is easily worked around by just ignoring the superblock edge's correctness and sacrificing your search's accuracy a tiny little bit. The frame multi-threading one is a bigger deal.
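As a rough illustration of what row-level (wavefront-style) parallelism buys, here is a toy model. It assumes every block costs one time step and depends on its left neighbour and on the block above and to the right, which is the usual wavefront pattern; it is not a description of how any AV1 encoder actually schedules superblocks.

```python
def wavefront_finish_times(rows, cols):
    """Earliest finish step of each block when every block takes one step.

    Block (r, c) waits for its left neighbour (r, c-1) and for the
    top-right neighbour (r-1, c+1), the usual wavefront dependency.
    """
    finish = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = finish[r][c - 1] if c > 0 else 0
            # The last column has no top-right block, so it uses the block directly above.
            above = finish[r - 1][min(c + 1, cols - 1)] if r > 0 else 0
            finish[r][c] = max(left, above) + 1
    return finish

rows, cols = 4, 8  # hypothetical frame of 4 x 8 superblocks
finish = wavefront_finish_times(rows, cols)
serial_steps = rows * cols
parallel_steps = finish[-1][-1]
print(f"serial: {serial_steps} steps, wavefront: {parallel_steps} steps")
```

In this model each row can start two steps after the row above it, so the total is cols + 2 * (rows - 1) steps with enough threads. Ignoring the edge dependency, as described above, removes even that stagger at a small cost in search accuracy.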

benwaggoner

11th November 2019, 22:50

A YouTube engineer made a comment about people using YouTube as a radio station, but their licence requiring video, so ultra low bitrate AV1 being a kind of workaround for contractual obligations.

So it's possible that YouTube is measuring the bitrate/resolution of these views and prioritising the ones that are viewed at high quality, not just the ones that are viewed a lot. Just a guess though.
A lot of the "TV for radio" clips have super static backgrounds or just scrolling lyrics. So they should encode down super small with more advanced codecs and long GOPs.

Adonisds

18th November 2019, 23:09

When do you think Stadia using AV1 will be available?

marcomsousa

19th November 2019, 00:00

When do you think Stadia using AV1 will be available?
Stadia uses cheap client CPUs, so AV1 will come next year for high-end CPUs, and the following year for cheap devices. So 2 to 3 years for sure.

Marco Sousa

soresu

19th November 2019, 08:25

Stadia uses cheap client CPUs, so AV1 will come next year for high-end CPUs, and the following year for cheap devices. So 2 to 3 years for sure.

Marco Sousa

There's nothing to suggest AV1 will be in either Samsung or Qualcomm's flagship SoC next year, Qualcomm still has yet to even join AOM when last I checked.

On the other hand the likes of Amlogic have plans (https://www.cnx-software.com/2019/10/20/amlogic-s905x4-s908x-s805x2-av1-1080p-4k-8k-media-processors/) to have AV1 in lower end chips for 2020, so you appear to be wrong on both counts.

benwaggoner

19th November 2019, 18:18

Stadia uses cheap client CPUs, so AV1 will come next year for high-end CPUs, and the following year for cheap devices. So 2 to 3 years for sure.
On the decode side, maybe in a few years. But does Stadia have a path to a good 4Kp60 encoder with sufficient efficiency and cost?

There is a lot of untapped potential in using how the game is rendered to drive encoder optimization that could help here (the game knows your motion vectors!). Not sure if anyone's researching that with AV1 currently.

benwaggoner

19th November 2019, 18:55

There's nothing to suggest AV1 will be in either Samsung or Qualcomm's flagship SoC next year, Qualcomm still has yet to even join AOM when last I checked.

On the other hand the likes of Amlogic have plans (https://www.cnx-software.com/2019/10/20/amlogic-s905x4-s908x-s805x2-av1-1080p-4k-8k-media-processors/) to have AV1 in lower end chips for 2020, so you appear to be wrong on both counts.
So far, all the announced AV1 chips have been for living room devices, and nothing for mobile, correct?

Living room has a lot more breathing room for power consumption, thermal management, and thus process. The mobile chips are where every extra transistor costs the most.

Of course, having a working HW decoder at all is a huge milestone, even if it might take a bit for those to migrate into mobile SoCs.

huhn

20th November 2019, 03:47

i wouldn't wait for them to announce a hardware decoder. they may just add them, or even ship them already, without a word and without any way to access them.

as an example, the "first" nvidia card with an HEVC main10 decoder was the 960. this card has a VP9 profile 0 decoder too, but it was not possible to access it for about a year or so.

the 960 was released on 22.01.2015
VP9 was finalised 17.06.2013
that's just 19 months
i dare to say that AV1 has far more attraction than VP9

AV1 is 19 months old, so if someone really wanted to they could have added it already, and it wouldn't be the first time they didn't make it public. that may sound odd at first, but if it can't be used anyway it may just be better this way.
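The gap between the two dates given above is easy to check. A minimal sketch, using only the dates from the post (the helper name `whole_months_between` is made up for illustration), puts it at 19 whole months:

```python
from datetime import date

def whole_months_between(start, end):
    """Number of complete calendar months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not complete yet
    return months

vp9_finalised = date(2013, 6, 17)   # date given in the post
gtx960_release = date(2015, 1, 22)  # date given in the post
gap = whole_months_between(vp9_finalised, gtx960_release)
print(gap)  # 19
```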

There is a lot of untapped potential in using how the game is rendered to drive encoder optimization that could help here (the game knows your motion vectors!). Not sure if anyone's researching that with AV1 currently.
using multiple frames for encoding is a very bad thing in this case, because latency is very important in this task. there is a reason the currently released stadia is at best a very poor joke they want money for.

this only counts for lookahead, which should be completely avoided if possible.
and how would you even access this information from the game in the first place?

soresu

20th November 2019, 13:16

So far, all the announced AV1 chips have been for living room devices, and nothing for mobile, correct?

He was talking Stadia which uses a Chromecast in its founder bundle - that is a living room/wall powered device.

huhn

20th November 2019, 15:26

stadia "works" with pretty much every device. you don't need a chromecast; a phone can do it, and so can a web browser on the PC. there is missing support for iOS and such, but whatever.

hajj_3

20th November 2019, 16:24

New RAV1E build out with big performance improvements when using tiling: https://github.com/xiph/rav1e/releases/tag/20191120

huhn

20th November 2019, 21:35

i don't even see any use case for stadia and AV1 right now.
even if the end device is able to decode it, you still have to encode it in real time with low latency, which is just not possible at 4K60 UHD.
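To put the 4K60 requirement in numbers, a back-of-the-envelope sketch. It only computes the raw pixel rate and the per-frame time budget for 3840x2160 at 60 fps; it says nothing about what any particular encoder can actually reach.

```python
WIDTH, HEIGHT, FPS = 3840, 2160, 60  # 4K UHD at 60 frames per second

pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FPS
frame_budget_ms = 1000 / FPS  # time available to encode one frame

print(f"{pixels_per_frame:,} pixels per frame")
print(f"{pixels_per_second:,} pixels per second")
print(f"{frame_budget_ms:.2f} ms to encode each frame")
```

That is roughly half a billion pixels per second, with under 17 ms per frame and no lookahead to lean on if latency is to stay low.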

Mr_Khyron

22nd November 2019, 17:20

Visionular AV1 encoder AURORA demo video
http://35.185.250.137:8080/#
https://i.imgur.com/T3N47lb.png

LigH

23rd November 2019, 19:10

Isn't it "Tears of Steel"? :o

Adonisds

23rd November 2019, 19:41

i wouldn't wait for them to announce a hardware decoder. they may just add them, or even ship them already, without a word and without any way to access them.

as an example, the "first" nvidia card with an HEVC main10 decoder was the 960. this card has a VP9 profile 0 decoder too, but it was not possible to access it for about a year or so.

the 960 was released on 22.01.2015
VP9 was finalised 17.06.2013
that's just 19 months
i dare to say that AV1 has far more attraction than VP9

AV1 is 19 months old, so if someone really wanted to they could have added it already, and it wouldn't be the first time they didn't make it public. that may sound odd at first, but if it can't be used anyway it may just be better this way.

using multiple frames for encoding is a very bad thing in this case, because latency is very important in this task. there is a reason the currently released stadia is at best a very poor joke they want money for.

this only counts for lookahead, which should be completely avoided if possible.
and how would you even access this information from the game in the first place?

So will the GPUs releasing next year from Nvidia and AMD probably decode AV1?

nevcairiel

23rd November 2019, 20:10

So will the GPUs releasing next year from Nvidia and AMD probably decode AV1?

AMD has historically been very slow adopting new codecs, while NVIDIA has usually pushed it decently fast. But any speculation is just that.

huhn

23rd November 2019, 20:53

AV1 has a major problem i forgot.
they changed the bit stream about 10 months ago. i don't have details on whether this affects decoding, but as far as i understand it did back then.

maybe AMD cares more now since they are now entering the mobile market. taking recent development crashing with hardware decoding and the lying with polaris into account it will be intel or nvidia first.

it's just speculation anyway.

Beelzebubu

23rd November 2019, 23:47

AV1 has a major problem i forgot.
they changed the bit stream about 10 months ago. i don't have details on whether this affects decoding

It does not. The errata-1 is exactly that, an errata to clarify some bitstream (mostly level) constraints. Actual decoding is not affected, it is just intended to simplify worst-case for hardware design.

Actual non-cosmetic changes that I am aware of in the errata1:

#226 (https://github.com/AOMediaCodec/av1-spec/commit/619a830b27cf8195479de51f2d2fc1598c1e2d99)
#227 (https://github.com/AOMediaCodec/av1-spec/commit/f1509dd6f7382e0a9b07dcb535cf9d021b229865)
#228 (https://github.com/AOMediaCodec/av1-spec/commit/11843e6361ddf25b67c651612646aac441d1880d)