dav1d accelerated AV1 decoder

NikosD
12th February 2020, 14:06
Netflix has started streaming AV1, initially in its Android mobile app (for members who enable "Save Data"), using a dav1d build optimized for 10-bit.

Interesting.

-- Netflix --

Today we are excited to announce that Netflix has started streaming AV1 to our Android mobile app. AV1 is a high performance, royalty-free video codec that provides 20% improved compression efficiency over our VP9† encodes. AV1 is made possible by the wide-ranging industry commitment of expertise and intellectual property within the Alliance for Open Media (AOMedia), of which Netflix is a founding member.

Our support for AV1 represents Netflix’s continued investment in delivering the most efficient and highest quality video streams. For our mobile environment, AV1 follows on our work with VP9, which we released as part of our mobile encodes in 2016 and further optimized with shot-based encodes in 2018.

While our goal is to roll out AV1 on all of our platforms, we see a good fit for AV1’s compression efficiency in the mobile space where cellular networks can be unreliable, and our members have limited data plans. Selected titles are now available to stream in AV1 for customers who wish to reduce their cellular data usage by enabling the “Save Data” feature.

Our AV1 support on Android leverages the open-source dav1d decoder built by the VideoLAN, VLC, and FFmpeg communities and sponsored by the Alliance for Open Media. Here we have optimized dav1d so that it can play Netflix content, which is 10-bit color. In the spirit of making AV1 widely available, we are sponsoring an open-source effort to optimize 10-bit performance further and make these gains available to all.

As codec performance improves over time, we plan to expand our AV1 usage to more use cases and are now also working with device and chipset partners to extend this into hardware.

Nintendo Maniac 64
14th February 2020, 23:39
Phoronix has 16-core vs 32-core vs 48-core vs 64-core vs 64-core+SMT scaling comparisons for dav1d v0.5.0 and SVT-AV1 v0.8 on Windows 10 Pro, Windows 10 Enterprise, and Clear Linux (all on a Threadripper 3990X):
https://www.phoronix.com/scan.php?page=article&item=3990x-windows-linux&num=3


And regarding their choice of Linux distro:
https://www.phoronix.com/scan.php?page=article&item=3990x-clear-linux&num=1

One of the interesting takeaways from my pre-launch briefing with AMD on the Ryzen Threadripper 3990X was AMD representatives actually recommending Clear Linux for use on this 64-core / 128-thread HEDT processor and the platform to which they've found the best performance. Yet, Clear Linux is an Intel open-source project.

The Clear Linux recommendation for the Threadripper 3990X was hardly a surprise to me given my experience with the platform, just a bit surprising AMD representatives acknowledging the Intel open-source software creation during a briefing. We've been benchmarking Clear Linux for years and were the ones to initially shine the public spotlight on its impressive performance capabilities -- that includes for AMD platforms too with numerous tests on different platforms we've performed the past few years.

mzso
17th February 2020, 20:45
Is Dav1d likely to get significantly faster?
I tried it a bit in Firefox Nightly and there's a sizable gap to VP9 decoding in CPU usage (2x or more). I tried with 4k videos so the CPU usage would be more obvious.

Beelzebubu
17th February 2020, 21:59
Is Dav1d likely to get significantly faster?

Yes.

I tried it a bit in Firefox Nightly and there's a sizable gap to VP9 decoding in CPU usage (2x or more). I tried with 4k videos so the CPU usage would be more obvious.

Could you elaborate on what sort of system (CPU chipset etc.), and where you got the content from? In particular, it'd be interesting to know the respective bitrates for the AV1 & VP9 files/streams, but knowing the encoder settings might also be somewhat useful.

Playback speed correlates a lot with bitrate. The 30% numbers that we've shown at conferences and in blogs are for same-quality encodes, where VP9 has a higher bitrate than AV1. If the files are same-bitrate, the performance difference goes up. On easy content, the postfilters also require a higher % of runtime (compared to e.g. inverse transform or predictors), and since AV1 has more postfilters, the difference will grow on low-complexity content and will be smaller on high-complexity content. The 30% was also without film grain (since we assume the GPU will do that for free), but no browser does that correctly yet.
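To illustrate why the AV1/VP9 gap depends on bitrate and content, here is a toy cost model: a per-pixel term for bitrate-independent work (postfilters such as deblocking, CDEF and loop restoration) plus a per-bit term for work that grows with bitrate (e.g. entropy decoding). Every coefficient below is a made-up placeholder, not a measurement of dav1d or libvpx; the sketch only shows the direction of the effect described above.

```python
# Toy decode-cost model: cost = pixels * per_pixel + bits * per_bit.
# All coefficients are hypothetical placeholders chosen for illustration only.
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecCost:
    per_pixel: float  # bitrate-independent work, e.g. postfilters
    per_bit: float    # work that grows with bitrate, e.g. entropy decoding


VP9 = CodecCost(per_pixel=1.0, per_bit=1.0)  # hypothetical baseline
AV1 = CodecCost(per_pixel=2.0, per_bit=1.3)  # hypothetical: more postfilters


def decode_cost(codec: CodecCost, pixels: float, bits: float) -> float:
    return pixels * codec.per_pixel + bits * codec.per_bit


def av1_vs_vp9(pixels: float, vp9_bits: float, av1_bits: float) -> float:
    """Ratio of AV1 to VP9 decode cost for the same frame area."""
    return decode_cost(AV1, pixels, av1_bits) / decode_cost(VP9, pixels, vp9_bits)


if __name__ == "__main__":
    # Same bitrate: easy (low-bitrate) content vs. complex (high-bitrate) content.
    print(f"easy, same bitrate:    {av1_vs_vp9(1.0, 0.2, 0.2):.2f}x")
    print(f"complex, same bitrate: {av1_vs_vp9(1.0, 2.0, 2.0):.2f}x")
    # Same quality: assume (hypothetically) AV1 needs 20% fewer bits than VP9.
    print(f"complex, same quality: {av1_vs_vp9(1.0, 2.0, 1.6):.2f}x")
```

With these placeholder numbers the ratio is highest on easy content and drops once AV1 gets the lower bitrate a same-quality comparison allows. Real ratios depend on the content, the encoder settings and how much SIMD each decoder has.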

benwaggoner
17th February 2020, 22:23
Is Dav1d likely to get significantly faster?
I tried it a bit in Firefox Nightly and there's a sizable gap to VP9 decoding in CPU usage (2x or more). I tried with 4k videos so the CPU usage would be more obvious.

It'll get somewhat faster. But given equal levels of optimization, VP9 is going to decode a lot faster than AV1 because AV1 is a lot more complex. This is how it always is. About every decade we get a new bitstream that offers an eventual ~50% reduction in bitrate for about a 100% increase in decoder complexity, as long as you spend ~10x more on encoding to take advantage of all the new features.

Codec development is all about turning Moore's law improvements into better compression efficiency. There are all kinds of features that could have been used in AV1/HEVC/VVC that offer small improvements for bigger complexity requirements. Each generation it's a trade-off over what a reasonable complexity cost is, and people design the most capable bitstream format within that reasonable decoder complexity envelope.
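The rule of thumb above compounds across generations. This sketch simply applies the approximate per-generation factors from the post (about half the bitrate, about twice the decoder complexity, about ten times the encoding effort); they are rough estimates, not measured values, and real codec generations deviate from them.

```python
# Compound a rough per-generation rule of thumb over several codec generations.
# The default factors restate the approximate figures from the post above.
def after_generations(n: int,
                      bitrate_factor: float = 0.5,
                      decode_factor: float = 2.0,
                      encode_factor: float = 10.0) -> tuple[float, float, float]:
    """Return (relative bitrate, decoder complexity, encoder effort) after n generations."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return bitrate_factor ** n, decode_factor ** n, encode_factor ** n


for n in range(4):
    bitrate, decode, encode = after_generations(n)
    print(f"{n} generation(s): {bitrate:.3f}x bitrate, "
          f"{decode:.0f}x decode, {encode:.0f}x encode")
```

Two generations on, that is roughly a quarter of the bitrate for about four times the decode work, which is the gap that faster hardware has to absorb.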

mzso
1st March 2020, 17:45
What browser/version, and on what platform/OS?



I think Youtube is known to do significantly higher bitrates for AV1 than for VP9, so that could be part of why...

I was testing dav1d in Firefox Nightly, on Windows 8.1.

hajj_3
6th March 2020, 01:03
dav1d 0.6.0 'Gyrfalcon'

0.6.0 brings major improvements in 10/12bit decoding on ARMv8 CPUs, up to 2.5 times faster than 0.5.2. It also brings new AVX-512, AVX2 and SSSE3 optimizations and improves the existing optimizations on all platforms. Finally, it also fixes some decoder mismatches and minor crashes.

soresu
11th March 2020, 00:15
Yeah, fast ARM 10-bit will make HDR feasible on 2020 mobile devices. For user generated content at least; premium studio content will still require HW DRM.

It'd be nice to have some benchmarks with details beyond "up to 2.5x faster". Is that only in some edge cases, or is a ~2x speedup a practical expectation?

The bottom post on this gitlab issue has some benchmarks done by an AOM community member.

Link here (https://code.videolan.org/videolan/dav1d/issues/15).

Mr_Khyron
11th March 2020, 02:49
Dav1d 0.6 AV1 Video Decoder benchmark
https://phoronix.com/scan.php?page=news_item&px=Dav1d-0.6-AV1-Benchmarks

benwaggoner
3rd April 2020, 17:51
I want to test AV1 and some other new encoders, and have been using media-autobuild-suite to try to build an FFmpeg with it. It keeps failing while trying to install cargo-c as part of rav1e. I've deleted the folder. I've removed rav1e from the build list, but it still always tries to compile it and then fails. It's been happening for a couple of weeks now, and has persisted despite several rav1e updates in that time period.

Any suggestions?

dav1d git .................................................. [Up-to-date]
Running git clone for rav1e...
┌ rav1e git .......................................... [Recently updated]
├ Running submodule...
├ Running install-cargo-c...
Likely error (tail of the failed operation logfile):


error: aborting due to previous error

error: failed to compile `cargo-c v0.6.2`, intermediate artifacts can be found at `C:\Users\benwagg\AppData\Local\Temp\cargo-installyZ6Pas`

Caused by:
could not compile `cargo-c`.

To learn more, run the command again with --verbose.
install-cargo-c failed. Check C:/Users/benwagg/Desktop/media-autobuild_suite-master/build/rav1e-git/ab-suite.install-cargo-c.log
This is required for other packages, so this script will exit.
Creating diagnostics file...

All relevant logs have been anonymously uploaded to https://0x0.st/iuKh.zip
Copy and paste [logs.zip](https://0x0.st/iuKh.zip) in the GitHub issue.
Make sure the suite is up-to-date before reporting an issue. It might've been fixed already.
Try running the build again at a later time.

hajj_3
9th April 2020, 22:01
The windows 10 av1 extension decoder now uses the dav1d decoder! An update was just released on the microsoft store.

hajj_3
9th April 2020, 23:29
VLC 3.0.9.2 is out, the first update in 8 months. I'm pretty sure it includes the latest dav1d decoder.

Pat357
10th April 2020, 00:30
The windows 10 av1 extension decoder now uses the dav1d decoder! An update was just released on the microsoft store.

Is this an extra download, or does Win10 come with this AV1 extension decoder?

If it is an extra download, is there a URL?

hydra3333
10th April 2020, 07:45
The windows 10 av1 extension decoder now uses the dav1d decoder! An update was just released on the microsoft store.

Thanks !!

hajj_3
7th May 2020, 09:43
MPC-BE v1.5.5 (build 5274) beta has been released, the first non-nightly release in 5 months. It includes dav1d git-0.6.0-80-g114e8f0 and ffmpeg git-n4.3-dev-2815-gda44bbefaa.

LigH
13th May 2020, 07:47
New builds in MABS will delay a bit: meson has some issues with GCC 10.1 and posix_memalign (https://code.videolan.org/videolan/dav1d/-/issues/337) as reported for dav1d. MSYS2 update pending.

Sagittaire
16th May 2020, 10:23
New builds in MABS will delay a bit: meson has some issues with GCC 10.1 and posix_memalign (https://code.videolan.org/videolan/dav1d/-/issues/337) as reported for dav1d. MSYS2 update pending.

Is your build the latest implementation for VPx?

Spyros
20th May 2020, 18:01
dav1d 0.7.0 'Frigatebird' the fast and lean AV1 decoder (https://code.videolan.org/videolan/dav1d/-/releases/0.7.0)


This is a major update of dav1d, the fast and lean AV1 decoder, codename 'Frigatebird'.

This release improves, once again, the speed on all platforms.

A rewrite of refmv brought an important speed boost on x86 while reducing RAM usage. It should improve the speed on every platform.

A large number of assembly optimizations went into ARM64 for 8/10/12-bit, plus a few for x86, notably for film grain, and AVX-512 optimizations for CDEF.

And from the changelog (https://code.videolan.org/videolan/dav1d/-/blob/master/NEWS):


Changes for 0.7.0 'Frigatebird':
------------------------------

0.7.0 is a major release for dav1d:
- Faster refmv implementation gaining up to 12% speed while using 25% less RAM (single thread)
- 10b/12b ARM64 optimizations are mostly complete:
  - ipred (paeth, smooth, dc, pal, filter, cfl)
  - itxfm (only 10b)
- AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
- AVX2 for cfl4:4:4
- AVX-512 CDEF filter
- ARM64 8b improvements for cfl_ac and itxfm
- ARM64 implementation for emu_edge in 8b/10b/12b
- ARM32 implementation for emu_edge in 8b
- Improvements on the dav1dplay utility player to support 10 bit,
  non-4:2:0 pixel formats and film grain on the GPU

unlord
20th May 2020, 18:57
Timed with the dav1d 0.7.0 release, I just ran a multi-threaded performance comparison of libgav1 and dav1d on the 8-bit and 10-bit Chimera encodes (which are at roughly equivalent rate):

https://docs.google.com/spreadsheets/d/19byTEMMVuyOpqqF59eT1mwAi-W1Fhhtcqj1_4js9jSo

benwaggoner
21st May 2020, 22:37
dav1d 0.7.0 'Frigatebird' the fast and lean AV1 decoder (https://code.videolan.org/videolan/dav1d/-/releases/0.7.0)

And from the changelog (https://code.videolan.org/videolan/dav1d/-/blob/master/NEWS):
Exciting progress! And one of the first practical uses of AVX-512 in the video world (x265 supports it, but with current Intel thermal throttling, it rarely turns into any real world perf improvement).

I'd be curious to see the perf delta between an AVX-512 and an AVX2-only build.

Beelzebubu
22nd May 2020, 00:48
Exciting progress! And one of the first practical uses of AVX-512 in the video world (x265 supports it, but with current Intel thermal throttling, it rarely turns into any real world perf improvement).

I'd be curious to see the perf delta between an AVX-512 and an AVX2-only build.

It targets different CPUs: x265 targets Skylake, dav1d targets Ice Lake. I should clarify that this is because, as @benwaggoner already pointed out, we don't expect AVX-512 ever to be faster than AVX2 on Skylake. Performance (AVX-512 vs. AVX2 on Ice Lake) is slightly faster multi-threaded and slightly slower single-threaded; more detailed notes will follow when it's more complete. Because we don't have consistently faster results yet, AVX-512 is currently disabled by default, and you need to specify --cpumask=avx512icl to enable it.
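The opt-in behaviour described above can be sketched as generic runtime dispatch: detect which instruction sets the CPU has, intersect that with a mask, and pick the fastest implementation that survives. The flag names, bit layout, named masks and default mask here are all hypothetical; they only mirror the behaviour in the post (AVX-512 off unless explicitly requested) and are not dav1d's actual internals.

```python
from enum import IntFlag


class CpuFlag(IntFlag):  # hypothetical flag layout
    SSE2 = 1 << 0
    SSSE3 = 1 << 1
    SSE41 = 1 << 2
    AVX2 = 1 << 3
    AVX512ICL = 1 << 4  # stands for the Ice Lake-era AVX-512 feature set


# Hypothetical named masks: each level allows itself and everything below it.
NAMED_MASKS = {
    "none": CpuFlag(0),
    "sse2": CpuFlag.SSE2,
    "ssse3": CpuFlag.SSE2 | CpuFlag.SSSE3,
    "sse41": CpuFlag.SSE2 | CpuFlag.SSSE3 | CpuFlag.SSE41,
    "avx2": CpuFlag.SSE2 | CpuFlag.SSSE3 | CpuFlag.SSE41 | CpuFlag.AVX2,
}
NAMED_MASKS["avx512icl"] = NAMED_MASKS["avx2"] | CpuFlag.AVX512ICL

# Default mirrors the post: AVX-512 stays off unless explicitly requested.
DEFAULT_MASK = NAMED_MASKS["avx2"]

# Implementations from fastest to slowest; "c" is the portable fallback.
PREFERENCE = [
    (CpuFlag.AVX512ICL, "avx512icl"),
    (CpuFlag.AVX2, "avx2"),
    (CpuFlag.SSSE3, "ssse3"),
]


def pick_impl(detected: CpuFlag, mask: CpuFlag = DEFAULT_MASK) -> str:
    """Return the fastest implementation allowed by both the CPU and the mask."""
    allowed = detected & mask
    for flag, name in PREFERENCE:
        if flag in allowed:
            return name
    return "c"
```

For example, a CPU reporting every flag gets `"avx2"` by default and `"avx512icl"` only with `pick_impl(flags, NAMED_MASKS["avx512icl"])`. Gating the new path behind an explicit mask lets a project ship code that is not yet consistently faster without changing default behaviour.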

hajj_3
22nd May 2020, 08:00
http://www.jbkempf.com/blog/post/2020/dav1d-0.7.0-mobile-focus

hajj_3
27th November 2020, 22:13
DAV1D v0.8.0 changelog:

- Improve the performance by using a picture buffer pool; the improvements can reach 10% in some cases on Windows.
- Support for Apple ARM Silicon
- ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
- ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg, put/prep 8tap/bilin, wiener and CDEF filters
- ARM64 optimizations for cfl_ac 444 for all bitdepths
- x86 optimizations for MC 8-tap, mc_scaled in AVX2
- x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3

soresu
29th November 2020, 05:40
Great strides for ARM32 playback in dav1d 0.8, which will benefit most streaming devices on the market, since their RAM capacity limits them to ARM32 OSes. At this point the ARM64 code path is almost completely optimised, save for film grain.

Still outstanding though is 10+ bpc SIMD code for AVX2 and SSSE3, which seems a bit silly now after Netflix started churning out 10 bit HDR AV1 content earlier in the year.

benwaggoner
2nd December 2020, 02:51
Great strides for ARM32 playback in dav1d 0.8, which will benefit most streaming devices on the market, since their RAM capacity limits them to ARM32 OSes. At this point the ARM64 code path is almost completely optimised, save for film grain.

Still outstanding though is 10+ bpc SIMD code for AVX2 and SSSE3, which seems a bit silly now after Netflix started churning out 10 bit HDR AV1 content earlier in the year.
Are they using it for PC/Mac playback, though? Netflix is going to have some HW DRM requirements from their content licensors.

soresu
3rd December 2020, 11:21
The Snapdragon 888 will be able to decode 4K 8-bit 30fps AV1 in software just fine, though battery life will be hugely affected.

There's no reason that 10-bit 4K should be a problem at this point, given the level of asm/SIMD optimisation for NEON that dav1d has already achieved for ARM64, which covers basically everything but film grain and one other feature of the codec.

Even ARM32 is closing the gap to the ARM64 now.

There is also the GPU work in the Xbox One decoder which may well be translatable to GLES/Vulkan and Android* - though it should probably work fine for Windows on ARM with little changes, given it is UWP and DX12 code.

*There is already some GPU decoding work done in dav1d and the AOM Xbox One decoder branch may provide some pointers/reference for future work filling in the remaining gaps - though I doubt that it will dramatically affect decoding speed, probably more battery life, but that is what they would be after anyways for mobile use cases.

soresu
3rd December 2020, 19:23
Are they using it for PC/Mac playback, though? Netflix is going to have some HW DRM requirements from their content licensors.

True, though production problems of this year aside they have been steadily transitioning to their own content above anything else on the platform.

AMD and nVidia are also starting to push AV1 decoding hardware into the PC platform with RX 6000 and RTX 3000.

I'm not sure if those ASIC-in-GPU solutions will be compatible with DRM requirements, but it's certainly a start. There are also the new Intel SoCs coming that will decode and even encode AV1 in hardware.

benwaggoner
3rd December 2020, 23:53
Good thread!

I'm curious if anyone has deep dived on the decoder performance benefits of using more tiles, relative to the number of available threads to decode on.
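As a back-of-the-envelope starting point, here is a tile-only model: a frame's tiles are spread across worker threads in rounds, and threads with no tile left sit idle. It assumes equal-cost tiles and ignores frame-level threading (which dav1d also uses), so it only bounds the tile-parallel part; the function names are illustrative.

```python
import math


def tile_rounds(tiles: int, threads: int) -> int:
    """Rounds needed when each thread decodes one equal-cost tile per round."""
    if tiles < 1 or threads < 1:
        raise ValueError("tiles and threads must be positive")
    return math.ceil(tiles / threads)


def tile_utilization(tiles: int, threads: int) -> float:
    """Fraction of thread-time spent decoding tiles (1.0 means no idle threads)."""
    return tiles / (tile_rounds(tiles, threads) * threads)


for tiles in (1, 2, 4, 8, 9, 16):
    print(f"{tiles:2d} tiles on 8 threads: {tile_rounds(tiles, 8)} round(s), "
          f"{tile_utilization(tiles, 8):.0%} busy")
```

In this model, tile counts that are a multiple of the thread count leave no thread idle, while a count just above it (9 tiles on 8 threads) wastes almost half of the second round. Real tiles differ in cost, so actual measurements are still what answers the question.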

benwaggoner
3rd December 2020, 23:57
True, though production problems of this year aside they have been steadily transitioning to their own content above anything else on the platform.
Yeah. Although they still need to support full studio-approved DRM if they have any licensed content. I suppose they could use AV1 only on a subset of the library.

AMD and nVidia are also starting to push AV1 decoding hardware into the PC platform with RX 6000 and RTX 3000.
Also Intel Xe/11th Gen Core! Which is the only thing that seems to be shipping in material volume so far.

I'm not sure if those ASIC-in-GPU solutions will be compatible with DRM requirements, but it's certainly a start. There are also the new Intel SoCs coming that will decode and even encode AV1 in hardware.
All the GPU solutions are absolutely supposed to support HW DRM playback. Windows has had good hooks for leveraging that for many years now.

I've not heard of anyone actually doing adaptive bitrate AV1 with DRM targeting actual consumer HW decode yet, though. Lots of talk about it, but I've not been able to find a single demo package or web site anywhere.

dapperdan
9th December 2020, 18:29
Yeah. Although they still need to support full studio-approved DRM if they have any licensed content. I suppose they could use AV1 only on a subset of the library.


I think this is a standard part of their rollout. I seem to recall VP9 being used first for downloadable content, which was often their in-house stuff.

Blue_MiSfit
9th December 2020, 20:47
Downloadable content is often SD to reduce size for product reasons:

"oops I'm at the airport gate and forgot I want some shows to watch! I want them ASAP! Also my connection is currently congested WiFi / LTE"

DRM requirements for SD are quite lax (software DRM is usually fine), so new codecs with software decoders are often okay there :)

hajj_3
9th December 2020, 22:48
MPC-BE 1.5.6.5797 beta has been released, which includes dav1d 0.8.0. It also adds lots of other AV1-related things: https://sourceforge.net/p/mpcbe/code/HEAD/tree/trunk/docs/Changelog.txt

benwaggoner
10th December 2020, 01:58
for dav1d 0.8.0, what's the per-thread decoder perf difference for 8-bit and 10-bit content these days?

sneaker_ger
10th December 2020, 12:56
On SSSE3 like factor 5 or something? There still isn't optimization for 10 bit AVX2 nor 10 bit SSSE3.

benwaggoner
10th December 2020, 18:04
On SSSE3 like factor 5 or something? There still isn't optimization for 10 bit AVX2 nor 10 bit SSSE3.
I hope that optimization gets done and is effective. With AV1 we finally have a codec with mandatory 10-bit support, but it's too slow to be practical on the platforms where AV1 is most advantageous. The big 2021 win with AV1 would be Chrome/Firefox as it's an upgrade from H.264. Without 10-bit, there's no HDR, and no option to look better than H.264, just lower bitrate.

soresu
19th December 2020, 07:58
I hope that optimization gets done and is effective. With AV1 we finally have a codec with mandatory 10-bit support, but it's too slow to be practical on the platforms where AV1 is most advantageous. The big 2021 win with AV1 would be Chrome/Firefox as it's an upgrade from H.264. Without 10-bit, there's no HDR, and no option to look better than H.264, just lower bitrate.

Unfortunately unless a private entity sponsors x86 10 bpc SIMD asm as Netflix did for ARM64 NEON I would not expect it to happen soon as it is a pretty low priority on the dav1d roadmap - below even AVX512 8 bpc work, which is not even close to as far along as AVX2 or SSSE3 for completeness right now.

With any luck Netflix might take pity on us desktop users and get someone to do the x86 10 bpc work after their current hire has finished doing the ARM32 10 bpc optimisations which are coming along pretty well at the moment.

sneaker_ger
19th December 2020, 09:01
Unfortunately unless a private entity sponsors x86 10 bpc SIMD asm
They supposedly do have that sponsorship already (https://www.reddit.com/r/AV1/comments/k2azsb/dav1d_080_is_out/gdvf690/) but as you say it's not the highest priority.

It's good 10bpc is mandatory in the most basic profile but they should have taken it one step further and just got rid of 8bpc coding altogether.

Jamaika
19th December 2020, 12:56
Does anyone know how to set the config parameters for dav1d 0.8.0-7424f8e to build it with GCC on Windows for 16-bit with SSE2?
The 8-bit decoder works, but the 16-bit one doesn't. Maybe the memory size needs to be increased?
WINVER=0x0602, _WIN32_WINNT=0x0602, ARCH_X86_64=1, CONFIG_16BPC, BITDEPTH=16, HAVE_ALIGNED_MALLOC

nevcairiel
20th December 2020, 11:19
Does anyone know how to set the config parameters for dav1d 0.8.0-7424f8e to build it with GCC on Windows for 16-bit with SSE2?
The 8-bit decoder works, but the 16-bit one doesn't. Maybe the memory size needs to be increased?
WINVER=0x0602, _WIN32_WINNT=0x0602, ARCH_X86_64=1, CONFIG_16BPC, BITDEPTH=16, HAVE_ALIGNED_MALLOC

AV1 does not support 16-bit; it's only 8, 10 and 12. Hence a decoder cannot support anything that the codec does not provide.

If you build it with full default settings, dav1d will support all these bitdepths natively out of the box. You can turn off 8 bit or 10/12 bit, but unless you are very much size constrained, there is no reason to do so.

Jamaika
20th December 2020, 13:14
AV1 does not support 16-bit; it's only 8, 10 and 12. Hence a decoder cannot support anything that the codec does not provide.

If you build it with full default settings, dav1d will support all these bitdepths natively out of the box. You can turn off 8 bit or 10/12 bit, but unless you are very much size constrained, there is no reason to do so.
dav1d has no setting to select 10-bit or 12-bit on its own, only one covering 10/12/16-bit together. I understand that in practice dav1d doesn't support 10/12-bit.

nevcairiel
20th December 2020, 17:41
I understand that in practice dav1d doesn't support 10/12-bit.

You are wrong. dav1d supports the entire AV1 spec fully.

Jamaika
20th December 2020, 19:52
You are wrong. dav1d supports the entire AV1 spec fully.
Well then, how do I set the config? The BITDEPTH value can't be 10 or 12.

nevcairiel
20th December 2020, 19:56
Just build dav1d with default config, no changes, and it'll support everything.
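The confusion about BITDEPTH likely comes from how dav1d organises its bit-depth-specific code. As far as I know, the templated code is compiled once with BITDEPTH=8 and once with BITDEPTH=16, and the 16bpc build stores both 10-bit and 12-bit pixels in 16-bit samples while passing the real maximum sample value at runtime. That would explain why BITDEPTH is never 10 or 12 and why the meson `bitdepths` option only offers `8` and `16`. The sketch below models that mapping; `layout_for` and `PixelLayout` are illustrative names, not part of dav1d's API.

```python
from typing import NamedTuple


class PixelLayout(NamedTuple):
    template: str      # which compiled template handles this depth
    sample_bytes: int  # storage size of one sample
    max_value: int     # largest legal sample value, known at runtime


def layout_for(bit_depth: int) -> PixelLayout:
    """Map an AV1 bit depth to a dav1d-style 8bpc/16bpc template (illustrative)."""
    if bit_depth == 8:
        return PixelLayout("8bpc", 1, (1 << 8) - 1)
    if bit_depth in (10, 12):
        # 10- and 12-bit share the 16bpc template; only max_value differs.
        return PixelLayout("16bpc", 2, (1 << bit_depth) - 1)
    # AV1 only defines 8, 10 and 12 bits per sample, so nothing else is valid.
    raise ValueError(f"AV1 has no {bit_depth}-bit mode")


for depth in (8, 10, 12):
    print(depth, layout_for(depth))
```

Under this model, building without the 16bpc template would drop 10-bit and 12-bit support together, which fits the advice to keep the default configuration.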

Blue_MiSfit
20th December 2020, 22:13
Unfortunately unless a private entity sponsors x86 10 bpc SIMD asm as Netflix did for ARM64 NEON I would not expect it to happen soon as it is a pretty low priority on the dav1d roadmap - below even AVX512 8 bpc work, which is not even close to as far along as AVX2 or SSSE3 for completeness right now.


I'm honestly pretty surprised there's so little interest. Why is this? Is it just that the real hope for short term AV1 adoption via software decode is wide/shallow engagement content like social / user generated?

I suppose in this world it matters more to have comprehensive optimization for 8 bit content across even very old hardware. Seems odd to prioritize AVX512 for 8 bit over something like AVX2 for 10 bit tho :/

Of course all Hollywood content is mastered in 10+ bit (even SDR) and is frequently delivered in 10-bit HEVC today. If I were building an AV1 encoding pipeline, it sure would be ideal if I could just encode 10-bit and not worry about 8-bit at all.

nevcairiel
21st December 2020, 00:21
If I had to make a guess, I would say that desktop PC media consumption is not a priority for Netflix, Amazon etc, and a large part of their traffic is mobile or straight to TV or streaming devices, while YouTube sees a larger share on PC consumption.

foxyshadis
21st December 2020, 00:54
People watching on a PC/laptop generally won't click away just because the fans spin up, either, they'll just turn it up if they aren't already wearing headphones. Big difference from stuttering.

I wonder if SVT-AV1's assembly routines could be adapted; I don't know nearly enough about dav1d to know if Rust or the way data is set up would be a blocker, but at least the licensing is taken care of since the move.

sneaker_ger
21st December 2020, 08:26
Rust? I think you have dav1d confused with Rav1e. Both SVT-AV1 and dav1d are written in C (+assembly/intrinsics).

foxyshadis
21st December 2020, 09:59
Rust? I think you have dav1d confused with Rav1e. Both SVT-AV1 and dav1d are written in C (+assembly/intrinsics).

Brain fart, yes.

hajj_3
21st December 2020, 10:44
I'm honestly pretty surprised there's so little interest. Why is this? Is it just that the real hope for short term AV1 adoption via software decode is wide/shallow engagement content like social / user generated?

I suppose in this world it matters more to have comprehensive optimization for 8 bit content across even very old hardware. Seems odd to prioritize AVX512 for 8 bit over something like AVX2 for 10 bit tho :/

Of course all Hollywood content is mastered in 10+ bit (even SDR) and is frequently delivered in 10-bit HEVC today. If I were building an AV1 encoding pipeline, it sure would be ideal if I could just encode 10-bit and not worry about 8-bit at all.

Few people have HDR monitors or 10bit monitors. I'm sure that 10bit support will be good at some point next year.

Blue_MiSfit
21st December 2020, 22:52
Few people have HDR monitors or 10bit monitors. I'm sure that 10bit support will be good at some point next year.

To the contrary, 4k TVs are widely popular, and they all have 10 bit HDR & WCG panels.

I'll agree that the vast majority of PC / Mac systems do not have HDR / 10 bit displays ;)