Doom9's Forum > Video Encoding > VP9 and AV1

Old 23rd March 2022, 18:41   #141  |  Link
lvqcl
Registered User
 
Join Date: Aug 2015
Posts: 293
8-bit video: SSE4.1 vs AVX2 vs AVX-512 (on 8C/16T Rocket Lake) - https://code.videolan.org/videolan/d..._requests/1301

Last edited by lvqcl; 23rd March 2022 at 19:46.
Old 23rd March 2022, 18:56   #142  |  Link
Beelzebubu
Registered User
 
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
Quote:
Originally Posted by benwaggoner
Do we know how much speedup AVX512 provided? We've not seen it to be particularly useful in encoder performance, so it'd be interesting if it helps more on the decode side.
dav1d uses the icelake subset (AWS: m6i/c6i, or: F, CD, VL, DQ, BW, IFMA, VBMI, VBMI2, VPOPCNTDQ, BITALG, VNNI, VPCLMULQDQ, GFNI, VAES), not skylake subset (AWS: m5*/c5*, or: F, CD, VL, DQ, BW). Icelake's performance of AVX512 instructions is in general much better than Skylake's, but the wider instruction subset also allows for certain additional code optimizations.

Extreme example of the latter: 8-bit film grain is more than 3x as fast with AVX512 compared to AVX2.
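
To make the two subsets concrete, here is a minimal sketch that classifies a set of AVX-512 extension names against the lists given above. The extension names are copied from this post; the helper names (`avx512_tier`, `missing_for_icelake`) are hypothetical, and this is not how dav1d itself detects CPU features (that is done in the library's own CPUID-based code).

```python
# Extension names are taken verbatim from the post above.
# Helper names are hypothetical and exist only for illustration.
SKYLAKE_SUBSET = frozenset({"F", "CD", "VL", "DQ", "BW"})
ICELAKE_SUBSET = SKYLAKE_SUBSET | frozenset({
    "IFMA", "VBMI", "VBMI2", "VPOPCNTDQ", "BITALG",
    "VNNI", "VPCLMULQDQ", "GFNI", "VAES",
})


def avx512_tier(cpu_features):
    """Classify a collection of extension names as 'icelake', 'skylake' or 'none'."""
    features = {name.upper() for name in cpu_features}
    if ICELAKE_SUBSET <= features:
        return "icelake"
    if SKYLAKE_SUBSET <= features:
        return "skylake"
    return "none"


def missing_for_icelake(cpu_features):
    """Return the sorted extension names still needed for the Ice Lake subset."""
    features = {name.upper() for name in cpu_features}
    return sorted(ICELAKE_SUBSET - features)
```

A CPU reporting only F, CD, VL, DQ and BW lands in the "skylake" tier, so under the description above it would not take the AVX-512 code paths that target the Ice Lake subset.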

Last edited by Beelzebubu; 23rd March 2022 at 19:04.
Old 23rd March 2022, 20:05   #143  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,752
Wow, those are some very impressive speedups with AVX512! The new instructions are making at least as much of a difference as the "AVX2, but 2x wider" instructions.

Of course, Icelake CPUs don't have that much market share yet, but these kinds of speedups are quite promising in the long term for software decoding.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 24th March 2022, 12:21   #144  |  Link
Beelzebubu
Registered User
 
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
Quote:
Originally Posted by benwaggoner
Of course, Icelake CPUs don't have that much market share yet, but these kinds of speedups are quite promising in the long term for software decoding.
... and software encoding!
Old 15th February 2023, 19:53   #145  |  Link
Spyros
Registered User
 
Join Date: Jun 2019
Posts: 16
dav1d 1.1.0 'Arctic Peregrine Falcon'

dav1d 1.1.0 was released yesterday. (Tag)

Quote:
Changes for 1.1.0 'Arctic Peregrine Falcon':
-------------------------------------------

1.1.0 is an important release of dav1d, fixing numerous bugs, and adding SIMD
  • New function dav1d_get_frame_delay to query the decoder frame delay
  • Numerous fixes for strict conformity to the specs and samples
  • NEON and AVX-512 misc fixes and improvements
  • Partial AVX2 12bpc transform implementations
  • AVX-512 high bit-depth cdef_filter, loopfilter, itx
  • NEON z1/z3 optimization for 8bpc
  • SSSE3 z1 optimization for 8bpc

"From VideoLAN with love"
Old 3rd May 2023, 07:43   #146  |  Link
hajj_3
Registered User
 
Join Date: Mar 2004
Posts: 1,120
Changes for 1.2.0 'Arctic Peregrine Falcon':
-------------------------------------------

- Improvements on attachments of props and T.35 entries on output pictures
- NEON z1/z3 high bit-depth optimizations and improvements for 8bpc
- SSSE3 z2/z3 8bpc and SSSE3 z1/z3 high bit-depth optimizations
- refmvs.save_tmvs optimizations in SSSE3/AVX2/AVX-512
- AVX-512 optimizations for high bit-depth itx (16x64, 32x64, 64x16, 64x32, 64x64)
- AVX2 optimizations for 12bpc for 16x32, 32x16, 32x32 itx
Old 4th June 2023, 21:57   #147  |  Link
hajj_3
Registered User
 
Join Date: Mar 2004
Posts: 1,120
Changes for 1.2.1 'Arctic Peregrine Falcon':
-------------------------------------------

- Fix a threading race on task_thread.init_done
- NEON z2 8bpc and high bit-depth optimizations
- SSSE3 z2 high bit-depth optimizations
- Fix a desynced luma/chroma planes issue with Film Grain
- Reduce memory consumption
- Improve dav1d_parse_sequence_header() speed
- OBU: Improve header parsing and fix potential overflows
- OBU: Improve ITU-T T.35 parsing speed
- Misc buildsystems, CI and headers fixes
Old 5th October 2023, 20:54   #148  |  Link
Barough
Registered User
 
Join Date: Feb 2007
Location: Sweden
Posts: 480
Changes for 1.3.0 'Tundra Peregrine Falcon (Calidus)':
------------------------------------------------------

1.3.0 is a medium release of dav1d, focusing on new APIs and memory usage reduction.

- Reduce memory usage in numerous places
- ABI break in Dav1dSequenceHeader, Dav1dFrameHeader, Dav1dContentLightLevel structures
- new API function to check the API version: dav1d_version_api()
- Rewrite of the SGR functions for ARM64 to be faster
- NEON implementation of save_tmvs for ARM32 and ARM64
- x86 palette DSP for pal_idx_finish function
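
For anyone wanting to act on the new version check: as far as I understand the public header, dav1d_version_api() returns a single unsigned value laid out as 0x00XXYYZZ (major, minor, patch). Treat that layout as an assumption and verify it against dav1d.h before relying on it. The sketch below only shows the bit arithmetic on such a value; it does not call the library, and the function names are hypothetical.

```python
# Assumption: the packed layout is 0x00XXYYZZ (XX = major, YY = minor, ZZ = patch).
# These helpers are hypothetical and do not call libdav1d.
def unpack_api_version(packed):
    """Split a packed 0x00XXYYZZ value into a (major, minor, patch) tuple."""
    if not 0 <= packed <= 0xFFFFFF:
        raise ValueError("expected a value that fits in 0x00XXYYZZ")
    return ((packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF)


def pack_api_version(major, minor, patch):
    """Inverse of unpack_api_version; each field must fit in one byte."""
    for field in (major, minor, patch):
        if not 0 <= field <= 0xFF:
            raise ValueError("each version field must be in 0..255")
    return (major << 16) | (minor << 8) | patch
```

Given the ABI break listed above, an application could compare the major field reported at runtime with the one it was built against and refuse to continue on a mismatch.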
__________________
Do NOT re-post any of my Mediafire links. Download & re-host the content(s) if you want to share it somewhere else.
Old 5th October 2023, 21:02   #149  |  Link
Barough
Registered User
 
Join Date: Feb 2007
Location: Sweden
Posts: 480
dav1d v1.3.0-3-g47107e3
Built on October 05, 2023, GCC 13.2.0

https://code.videolan.org/videolan/dav1d

DL :
dav1d v1.3.0-3-g47107e3

Last edited by Barough; 5th October 2023 at 21:35.
Old 14th February 2024, 17:21   #150  |  Link
hajj_3
Registered User
 
Join Date: Mar 2004
Posts: 1,120
Changes for 1.4.0 'Road Runner':
------------------------------------------------------

1.4.0 is a medium release of dav1d, focusing on new architecture support and optimizations

- AVX-512 optimizations for z1, z2, z3 in 8bit and high-bit depth
- New architecture supported: loongarch
- Loongarch optimizations for 8bit
- New architecture supported: RISC-V
- RISC-V optimizations for itx
- Misc improvements in threading and in reducing binary size
- Fix potential integer overflow with extremely large frame sizes
Old 14th February 2024, 20:19   #151  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,752
RISC-V is interesting. It's starting to go into a lot of embedded things. License free (unlike ARM) and a very elegant architecture.
Old 9th March 2024, 17:41   #152  |  Link
hajj_3
Registered User
 
Join Date: Mar 2004
Posts: 1,120
v1.4.1 'Road Runner':
--------------------------------

- Optimizations for 6tap filters for NEON (ARM)
- More RISC-V optimizations for itx (4x8, 8x4, 4x16, 16x4, 8x16, 16x8)
- Reduction of binary size on ARM64, ARM32 and RISC-V
- Fix out-of-bounds read in 8bpc SSE2/SSSE3 wiener_filter
- Msac optimizations