Doom9's Forum > Video Encoding > VP9 and AV1

23rd March 2022, 18:41 #141
lvqcl
Registered User
 
Join Date: Aug 2015
Posts: 278
8-bit video: SSE4.1 vs AVX2 vs AVX-512 (on 8C/16T Rocket Lake) - https://code.videolan.org/videolan/d..._requests/1301

Last edited by lvqcl; 23rd March 2022 at 19:46.
23rd March 2022, 18:56 #142
Beelzebubu
Registered User
 
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
Quote:
Originally Posted by benwaggoner
Do we know how much speedup AVX512 provided? We've not seen it to be particularly useful in encoder performance, so it'd be interesting if it helps more on the decode side.
dav1d uses the Ice Lake subset of AVX-512 (AWS: m6i/c6i; extensions F, CD, VL, DQ, BW, IFMA, VBMI, VBMI2, VPOPCNTDQ, BITALG, VNNI, VPCLMULQDQ, GFNI, VAES), not the Skylake subset (AWS: m5*/c5*; extensions F, CD, VL, DQ, BW). Ice Lake's AVX-512 performance is in general much better than Skylake's, and the wider instruction subset also allows certain additional code optimizations.

Extreme example of the latter: 8-bit film grain is more than 3x as fast with AVX512 as with AVX2.
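To make the difference between the two AVX-512 subsets concrete, here is a minimal Python sketch that classifies a CPU from the feature flags Linux reports in /proc/cpuinfo. It is an illustration only, not dav1d's own detection (dav1d queries CPUID itself). The helper names are hypothetical, reading /proc/cpuinfo assumes Linux on x86, and the flag spellings are the kernel's, several of which use underscores (e.g. avx512_vbmi2).

```python
# Sketch: classify AVX-512 support into the "skylake" or "icelake" subset.
# Assumption: Linux x86 /proc/cpuinfo flag names; this is NOT dav1d's code.
from pathlib import Path

# Skylake subset: F, CD, VL, DQ, BW
SKYLAKE_FLAGS = frozenset({
    "avx512f", "avx512cd", "avx512vl", "avx512dq", "avx512bw",
})

# Ice Lake subset adds IFMA, VBMI, VBMI2, VPOPCNTDQ, BITALG, VNNI,
# VPCLMULQDQ, GFNI and VAES (kernel spellings, some with underscores).
ICELAKE_FLAGS = SKYLAKE_FLAGS | frozenset({
    "avx512ifma", "avx512vbmi", "avx512_vbmi2", "avx512_vpopcntdq",
    "avx512_bitalg", "avx512_vnni", "vpclmulqdq", "gfni", "vaes",
})


def avx512_subset(flags):
    """Return "icelake", "skylake", or None for a collection of flag names."""
    flags = set(flags)
    if ICELAKE_FLAGS <= flags:
        return "icelake"
    if SKYLAKE_FLAGS <= flags:
        return "skylake"
    return None


def read_cpuinfo_flags(path="/proc/cpuinfo"):
    """Return the flag set of the first CPU listed (empty if no flags line)."""
    for line in Path(path).read_text().splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


if __name__ == "__main__":
    try:
        print(avx512_subset(read_cpuinfo_flags()))
    except OSError:
        print("could not read /proc/cpuinfo")
```

A CPU that has only some of the Ice Lake extensions on top of the base five is reported as "skylake" here, since code built for the Ice Lake subset would need all of them.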

Last edited by Beelzebubu; 23rd March 2022 at 19:04.
23rd March 2022, 20:05 #143
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,578
Wow, those are some very impressive speedups with AVX512! The new instructions are making at least as much of a difference as the "AVX2, but 2x wider" instructions.

Of course, Icelake CPUs don't have that much market share yet, but these kinds of speedups are quite promising in the long term for software decoding.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
24th March 2022, 12:21 #144
Beelzebubu
Registered User
 
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
Quote:
Originally Posted by benwaggoner
Of course, Icelake CPUs don't have that much market share yet, but these kinds of speedups are quite promising in the long term for software decoding.
... and software encoding!
15th February 2023, 19:53 #145
Spyros
Registered User
 
Join Date: Jun 2019
Posts: 16
dav1d 1.1.0 'Arctic Peregrine Falcon'

dav1d 1.1.0 was released yesterday.

Quote:
Changes for 1.1.0 'Arctic Peregrine Falcon':
-------------------------------------------

1.1.0 is an important release of dav1d, fixing numerous bugs, and adding SIMD
  • New function dav1d_get_frame_delay to query the decoder frame delay
  • Numerous fixes for strict conformity to the specs and samples
  • NEON and AVX-512 misc fixes and improvements
  • Partial AVX2 12bpc transform implementations
  • AVX-512 high bit-depth cdef_filter, loopfilter, itx
  • NEON z1/z3 optimization for 8bpc
  • SSSE3 z1 optimization for 8bpc

"From VideoLAN with love"
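The z1/z2/z3 items in these changelogs refer to the directional intra predictors, grouped by which neighbouring edge the prediction angle reads. Below is a small sketch of that grouping, assuming the AV1 spec's base angles and 3-degree angle step, and assuming dav1d's z1/z2/z3 names map to angles below 90, between 90 and 180, and above 180 degrees (exactly 90 and 180 being plain vertical and horizontal prediction). The function names are hypothetical.

```python
# Sketch: map an AV1 directional intra mode + angle delta to its "zone".
# Base angles and ANGLE_STEP follow the AV1 spec; the z1/z2/z3 labels are
# an assumed correspondence with dav1d's ipred naming, not dav1d code.
BASE_ANGLE = {
    "D45_PRED": 45, "D67_PRED": 67, "V_PRED": 90, "D113_PRED": 113,
    "D135_PRED": 135, "D157_PRED": 157, "H_PRED": 180, "D203_PRED": 203,
}
ANGLE_STEP = 3
MAX_ANGLE_DELTA = 3


def pred_angle(mode, delta):
    """Final prediction angle in degrees for a directional mode."""
    if not -MAX_ANGLE_DELTA <= delta <= MAX_ANGLE_DELTA:
        raise ValueError("angle delta out of range")
    return BASE_ANGLE[mode] + delta * ANGLE_STEP


def zone(angle):
    """Which edge(s) of already-decoded pixels the predictor reads."""
    if angle < 90:
        return "z1"          # top (above) edge only
    if angle == 90:
        return "vertical"
    if angle < 180:
        return "z2"          # both top and left edges
    if angle == 180:
        return "horizontal"
    return "z3"              # left edge only
```

Because z2 needs both edges and z1/z3 need only one, z2 is usually the hardest of the three to vectorize, which fits the changelogs tackling z1/z3 first.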
3rd May 2023, 07:43 #146
hajj_3
Registered User
 
Join Date: Mar 2004
Posts: 1,081
Changes for 1.2.0 'Arctic Peregrine Falcon':
-------------------------------------------

- Improvements on attachments of props and T.35 entries on output pictures
- NEON z1/z3 high bit-depth optimizations and improvements for 8bpc
- SSSE3 z2/z3 8bpc and SSSE3 z1/z3 high bit-depth optimizations
- refmvs.save_tmvs optimizations in SSSE3/AVX2/AVX-512
- AVX-512 optimizations for high bit-depth itx (16x64, 32x64, 64x16, 64x32, 64x64)
- AVX2 optimizations for 12bpc for 16x32, 32x16, 32x32 itx
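The T.35 entries refer to ITU-T T.35 metadata carried in AV1 metadata OBUs (HDR10+ dynamic metadata is one common user). Per the AV1 spec, the block starts with a one-byte itu_t_t35_country_code, followed by an itu_t_t35_country_code_extension_byte only when that code is 0xFF, and then opaque payload bytes. A minimal sketch of that layout follows; the T35 class and parse_t35 function are hypothetical names, and the OBU's trailing bits are assumed to have been stripped already.

```python
# Sketch: split an AV1 ITU-T T.35 metadata body into its fields.
# Assumes `body` starts at itu_t_t35_country_code with trailing bits removed.
from dataclasses import dataclass


@dataclass
class T35:
    country_code: int
    country_code_extension_byte: int  # only meaningful if country_code == 0xFF
    payload: bytes


def parse_t35(body: bytes) -> T35:
    if not body:
        raise ValueError("empty T.35 metadata")
    country_code = body[0]
    extension = 0
    pos = 1
    if country_code == 0xFF:
        # 0xFF means "see extension byte" per ITU-T T.35
        if len(body) < 2:
            raise ValueError("missing country_code_extension_byte")
        extension = body[1]
        pos = 2
    return T35(country_code, extension, bytes(body[pos:]))
```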
4th June 2023, 21:57 #147
hajj_3
Registered User
 
Join Date: Mar 2004
Posts: 1,081
Changes for 1.2.1 'Arctic Peregrine Falcon':
-------------------------------------------

- Fix a threading race on task_thread.init_done
- NEON z2 8bpc and high bit-depth optimizations
- SSSE3 z2 high bit-depth optimizations
- Fix a desynced luma/chroma planes issue with Film Grain
- Reduce memory consumption
- Improve dav1d_parse_sequence_header() speed
- OBU: Improve header parsing and fix potential overflows
- OBU: Improve ITU-T T.35 parsing speed
- Misc buildsystems, CI and headers fixes
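For context on the OBU header items in 1.2.1: per the AV1 spec, every OBU starts with a one-byte header (forbidden bit, 4-bit obu_type, extension flag, has-size flag, reserved bit), then an optional extension byte carrying temporal_id and spatial_id, then an optional leb128-coded obu_size that the spec limits to 8 bytes and a value of at most 2^32 - 1. The following minimal Python sketch of that layout is not dav1d's parser; parse_obu_header and read_leb128 are hypothetical names. It rejects truncated input and out-of-range sizes.

```python
# Sketch of AV1 OBU header + leb128 parsing, following the AV1 spec layout.
# Not dav1d code; field names in the returned dict are illustrative.

def read_leb128(data: bytes, pos: int):
    """Decode leb128() starting at data[pos]; return (value, new_pos)."""
    value = 0
    for i in range(8):                 # the spec reads at most 8 bytes
        if pos >= len(data):
            raise ValueError("truncated leb128")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:            # continuation bit clear: done
            break
    else:
        raise ValueError("leb128 continuation bit set on 8th byte")
    if value > 0xFFFFFFFF:
        raise ValueError("leb128 value exceeds 2**32 - 1")
    return value, pos


def parse_obu_header(data: bytes) -> dict:
    """Parse one OBU header; `data` is assumed to hold exactly one OBU."""
    if not data:
        raise ValueError("empty OBU")
    b = data[0]
    if b & 0x80:
        raise ValueError("obu_forbidden_bit set")
    hdr = {
        "obu_type": (b >> 3) & 0xF,
        "has_extension": bool(b & 0x04),
        "has_size_field": bool(b & 0x02),
        "temporal_id": 0,
        "spatial_id": 0,
    }
    pos = 1
    if hdr["has_extension"]:
        if len(data) < 2:
            raise ValueError("truncated extension header")
        ext = data[1]
        hdr["temporal_id"] = ext >> 5
        hdr["spatial_id"] = (ext >> 3) & 0x3
        pos = 2
    if hdr["has_size_field"]:
        hdr["obu_size"], pos = read_leb128(data, pos)
    else:
        hdr["obu_size"] = len(data) - pos  # size implied by the container
    hdr["header_len"] = pos
    return hdr
```

For example, the two bytes 0x12 0x00 parse as a temporal delimiter (obu_type 2) with obu_size 0.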
All times are GMT +1.