23rd November 2022, 07:32   #1
BlueSwordM
Registered User
Join Date: Dec 2021
Location: Canada
Posts: 16
The rav1e development thread: working on rav1e in a different, more forward manner

Hello everyone. I know I don't post much on doom9, but no worries: from now on, I will be posting and, most importantly, interacting more often.

The main reason I decided to write this post is that I've been (slowly) working on rav1e, as well as contributing to related projects like ssimulacra2_rs (a Rust version of the ssimulacra2_C psy image metric), and learning a lot from early September up to now.

One of the things I've learned is how complex video encoder development truly is. You need a solid programming and mathematics background to actually implement the ideas you want to integrate into the encoder, and good scientific analysis, together with a solid understanding of how the human visual system (HVS) works, is critical to getting good results in terms of speed, coding efficiency and visual performance.

Another important factor is how much feedback loops matter to the entire development process: that is why metrics exist, right? They act as a feedback loop that tells you how encoder development should be driven, and in which direction. Whether objective or subjective, video encoder improvement depends on how varied and how good those feedback loops are. If they are poor, you will get suboptimal results or, worst of all, the right answers to the wrong questions about how to improve the encoder.

Something that is often ignored in tightly knit video encoder development groups is how helpful user-driven/community-driven feedback can truly be. Working in a tightly knit group often results in shared opinions becoming the norm, so a critically examined consensus is never truly reached. In that sense, you start developing tunnel vision in regards to what needs to become better after you've picked the low hanging fruit.

How does that relate to the AV1 video encoder development scene? Well, most of the actual video encoder development in aomenc, SVT-AV1 and other tools is done somewhat privately by AOM members, who take little account of opinions from outside entities or disregard them entirely. rav1e is a lot better in this regard, but because of a lack of funding, its development has slowed down somewhat.

The lack of open discussion on something like a public forum means most of the discussion happens on slightly closed-off platforms (the #daala IRC channel) or even proprietary ones (the AV1 Discord server). Both are suboptimal places to discuss development and news, since they are hidden from public view and most information travels by word of mouth. That keeps it out of sight of a large portion of readers, developers, enthusiasts, encoders, and even search engines, making for low visibility.

As such, I've decided to lead (somewhat independently lmao) a different, much more open approach, which I believe will be the strongest one for AV1 encoder development. On public platforms like Reddit, the doom9 forums, and a few others, I'll start taking feedback and ideas on which features to add and improve in rav1e, respond to general questions, involve more industry folks in the subject, and have people from different standard toolset teams chip in for more general improvements.

Most importantly, video encoder feedback can and will be taken more seriously: every complaint or criticism that properly explains what is bad and what can be changed is invaluable advice.

I believe this change in the development feedback loop is necessary, as it complements the beginning of widespread use of much more heavily psy-driven development models (psy metrics in particular) and a more varied understanding in that regard. In that sense, having many outside eyes and hands in a video encoder project is critical to its success, as they can revive and kick-start a program that was thought to never be able to improve on its implementation weaknesses.

With all of that in mind, what do I believe rav1e's end goal is? In a somewhat generic manner, it is to be the AV1 encoder equivalent of x264, but only from a general usage point of view.

In the context of a video encoder, it is more nuanced than that:
  • The first point is to take the good tools of the AV1 standard that rav1e currently lacks and bring them into rav1e, taking a calculated approach to implementing them more effectively than the other encoders do, to get all the upsides with almost none of the downsides.

  • The second one is to have an even stronger psycho-visual mode decision model than what is currently used in rav1e, which would allow us to get the best visual performance from very low bitrates all the way to excellent visually lossless performance at high bitrates. In theory, and even in practice (if it can be implemented optimally), this would allow rav1e to greatly surpass x264, relatively speaking: one of the reasons x264's high-end performance is great is that its mode decision is heavily limited in a way that allows its psycho-visual modeling (psy-rd and other tools) to perform well. The downside is that these same mode decision limitations restrict how well it can perform at low bitrates.

    Implementing such a model so that it works reliably, without throwing obscene amounts of external compute at it or restricting mode decision choices, is actually feasible in rav1e, since the infrastructure is well built and well thought out in that regard.

    Another benefit is that it would eliminate most of the cargo culting often seen with open encoders that are difficult and tiring to tweak, and that attract a lot of hive-mind-like usage of specific settings for ??? reasons.
  • Finally, higher encoder speeds on the faster presets and better threading on rav1e's part are necessary to get rav1e used by just about everyone, so that is a given.
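
To make the mode decision point a bit more concrete, here is a toy sketch of how a psy term can tip a rate-distortion choice. It is not rav1e's or x264's actual code: the function names, the 1-D "blocks", the mean-absolute-deviation energy measure, and the lambda and psy_strength values are all assumptions made up for illustration. The cost is the usual J = D + lambda * R, plus a penalty when the reconstruction's texture energy differs from the source's.

```python
def sse(a, b):
    """Sum of squared errors between two equally sized blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ac_energy(block):
    """Crude texture measure: total absolute deviation from the block mean."""
    mean = sum(block) / len(block)
    return sum(abs(p - mean) for p in block)


def rd_cost(src, recon, bits, lam, psy_strength=0.0):
    """J = D + psy penalty + lambda * R (hypothetical formulation)."""
    distortion = sse(src, recon)
    psy_penalty = psy_strength * abs(ac_energy(src) - ac_energy(recon))
    return distortion + psy_penalty + lam * bits


def pick_mode(src, candidates, lam, psy_strength=0.0):
    """candidates is a list of (name, reconstruction, bits) tuples."""
    best = min(candidates, key=lambda c: rd_cost(src, c[1], c[2], lam, psy_strength))
    return best[0]


# Grainy source, one cheap flat candidate and one costly textured candidate.
src = [10, 20, 10, 20, 10, 20, 10, 20]
candidates = [
    ("flat", [15] * 8, 4),
    ("detailed", [11, 19, 11, 19, 11, 19, 11, 19], 30),
]

print(pick_mode(src, candidates, lam=10))                    # plain SSE-based RDO
print(pick_mode(src, candidates, lam=10, psy_strength=3.0))  # with the psy penalty
```

With psy_strength at 0 the cheap flat candidate wins; with the penalty enabled, the candidate that keeps the texture wins despite costing more bits.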

Thank you all for reading up to this point. All of you, and the people who've helped me get here, are why I've decided to attempt such a large mindset change over the last few years. A few of them come to mind: the rav1e and av1an developers, dav1d, some ffmpeg devs, AV1 Discord server members, and the JPEG-XL folks; sadly, there are just too many good people to mention them all.

Now that I've said everything on my mind, I'd like to start the conversation with one question:
what do you suggest we do to improve rav1e and the way we do development, and what suggestions do you have in general?

I'd love to hear from all of you guys and gals around the world with different perspectives and views.
If any of you would like to submit concrete improvements backed up with comments and, ideally, with numbers, please do so.
This is the kind of feedback that is most appreciated.


Thank you all, and have a good day.

All of the above text and opinions are my own, shared only by some other people, and do not represent the thoughts of rav1e's development team as a whole.

Last edited by BlueSwordM; 26th November 2022 at 07:48.
24th November 2022, 06:18   #2
BlueSwordM
Registered User
Join Date: Dec 2021
Location: Canada
Posts: 16
One simple way to improve rav1e at all quality levels is to simply improve deblocking: while the strength selection algorithm already works very well, the distortion metric it uses is SSE, which is not influenced in any way by the more advanced psy RDO metrics in rav1e.

In my testing, that means it blurs blocks more than necessary in anything grainy, and unlike aomenc (unless I can't read the deblock.rs source properly), it doesn't include a bias to counteract that, even to a small degree.

In the very short term, it should be relatively easy to lower deblocking strength somewhat to counteract this, and in the mid term, to implement frequency-weighted SSE to make the deblocking filter application a lot more effective.
The weighting could be determined either through the non-flatness metric already in use, or by using tx_domain_dist in an interesting way to map out the block frequencies.

In the long term, it would even be possible to integrate the current psycho-visual RDO metric into it. That would require some new piping and tuning, but it would allow reuse of data that is already used in activity masking.
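
For anyone who wants to play with the frequency-weighted SSE idea, here is a small toy sketch. It is not taken from deblock.rs: the 1-D signals, the `smooth` stand-in for a deblocking filter, the candidate strengths and the weight table are all placeholders I made up. The only mechanism it shows is measuring the error in a transform domain (an orthonormal DCT-II here) and weighting each frequency before summing. With all weights at 1 it reduces to plain SSE, and which weights are perceptually right is exactly the open tuning question.

```python
import math


def dct_ortho(x):
    """Orthonormal 1-D DCT-II, so the sum of squares is preserved."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, v in enumerate(x)))
    return out


def weighted_sse(src, recon, weights):
    """SSE of the error measured per frequency, with one weight per coefficient."""
    err = [s - r for s, r in zip(src, recon)]
    return sum(w * c * c for w, c in zip(weights, dct_ortho(err)))


def smooth(signal, strength):
    """Toy stand-in for a deblocking filter: blend interior samples toward
    the average of their two neighbours; strength is in [0, 1]."""
    out = list(signal)
    for i in range(1, len(signal) - 1):
        avg = (signal[i - 1] + signal[i + 1]) / 2.0
        out[i] = (1.0 - strength) * signal[i] + strength * avg
    return out


def pick_strength(src, recon, strengths, weights):
    """Return the candidate strength with the lowest weighted distortion."""
    return min(strengths, key=lambda s: weighted_sse(src, smooth(recon, s), weights))


src = [10, 14, 9, 15, 10, 14, 9, 15]        # grainy source
recon = [11, 11, 11, 11, 14, 14, 14, 14]    # blocky reconstruction with an edge
strengths = [0.0, 0.25, 0.5, 0.75, 1.0]
flat = [1.0] * 8                            # identical to plain SSE
placeholder = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]  # made-up weights

print(pick_strength(src, recon, strengths, flat))
print(pick_strength(src, recon, strengths, placeholder))
```

Because the transform is orthonormal, the flat-weight case is a handy sanity check against the existing SSE path before experimenting with any real weighting.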

Last edited by BlueSwordM; 24th November 2022 at 06:23. Reason: Better formatting and clarification
26th November 2022, 01:27   #3
benwaggoner
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,318
QFT: "In that sense, you start developing tunnel vision in regards to what needs to become better after you've picked the low hanging fruit."

SO true. A very insightful post.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
26th November 2022, 07:33   #4
soresu
Registered User
Join Date: May 2005
Location: Swansea, Wales, UK
Posts: 196
Let it begin. 'Tis I, "Kenny / Mountain Nomad" on the Discord.
1st December 2022, 06:53   #5
BlueSwordM
Registered User
Join Date: Dec 2021
Location: Canada
Posts: 16
One other interesting thing I've noticed is that in most encoders, the more advanced RDO decisions aren't available to stages like quantization, motion estimation, and temporal RDO.
I can obviously understand not having more complex RDO for ME, as SSD/SATD ME RD decisions are already demanding enough as is, so adding a more complex metric would be very computationally expensive.
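
As a quick refresher on the two ME metrics mentioned here, below is a minimal sketch of SSD and a 4x4 Hadamard-based SATD. It is a generic textbook version rather than any encoder's implementation: the helper names are mine, and no normalisation is applied (real encoders scale the SATD result in different ways).

```python
# 4x4 Hadamard matrix (symmetric, entries are +1 / -1).
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def residual(src, pred):
    """Per-pixel difference between a source block and its prediction."""
    return [[s - p for s, p in zip(rs, rp)] for rs, rp in zip(src, pred)]


def ssd(src, pred):
    """Sum of squared differences, computed in the pixel domain."""
    return sum(v * v for row in residual(src, pred) for v in row)


def satd(src, pred):
    """Sum of absolute values of the Hadamard-transformed residual (unnormalised)."""
    d = residual(src, pred)
    t = matmul(matmul(H4, d), transpose(H4))
    return sum(abs(v) for row in t for v in row)


pred = [[0] * 4 for _ in range(4)]
dc_offset = [[1] * 4 for _ in range(4)]    # every pixel off by 1
spike = [[0] * 4 for _ in range(4)]
spike[0][0] = 4                            # one pixel off by 4

print(ssd(dc_offset, pred), satd(dc_offset, pred))
print(ssd(spike, pred), satd(spike, pred))
```

Both residuals have the same SSD, but the spike spreads across every Hadamard coefficient and gets a much higher SATD, which is why SATD behaves more like a rough proxy for coding cost. Each SATD already needs a transform per candidate block, so anything heavier multiplies quickly across a motion search.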

However, x264 and x265 stand out in particular for not having their psy-rd implementations working in their TPL-RDO implementations. rav1e actually does have it available. In the encoder, when you disable the coding tools that currently tend to blur even at higher bitrates (deblocking and SGR; CDEF is a bit different), disabling TPL-RDO doesn't actually improve image quality that much, and tends to make it worse, even at the high end of content and bitrates (Foodmarket).

This is an interesting observation that I hadn't really noticed before.
1st December 2022, 20:16   #6
Blue_MiSfit
Derek Prestegard IRL
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,957
Quote:
Implementing such a model so that it works reliably, without throwing obscene amounts of external compute at it or restricting mode decision choices, is actually feasible in rav1e, since the infrastructure is well built and well thought out in that regard.
Coming from the standpoint of someone not familiar with the codebase, but with an appreciation of what this implies --- AWESOME
__________________
These are all my personal statements, not those of my employer :)
2nd December 2022, 07:12   #7
BlueSwordM
Registered User
Join Date: Dec 2021
Location: Canada
Posts: 16
Thank you for the compliment, Blue_MiSfit. And yeah, the aomenc and SVT-AV1 codebases make it hard for someone like me to implement more psy-driven RDO across all encoding processes.

In other news, I think I've figured out how to implement psy-driven RDO quantization: in-block psy-RDO and psy trellis/hybrid deadzone quant.
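
For readers who haven't met the term, here is what a plain deadzone quantizer looks like in its simplest generic form. This is not the design hinted at above and not rav1e code: the function names, the step size and the rounding offsets are illustrative assumptions. The only point is that a rounding offset below 0.5 widens the zero bin, so small coefficients get dropped more readily; a psy-driven variant would presumably vary that offset per block or per coefficient, which is my guess rather than a description of the planned work.

```python
import math


def deadzone_quantize(coeff, step, rounding=0.5):
    """Quantize one transform coefficient.

    rounding=0.5 is ordinary round-to-nearest; smaller values widen the
    dead zone around zero and bias levels toward smaller magnitudes.
    """
    level = math.floor(abs(coeff) / step + rounding)
    return int(math.copysign(level, coeff)) if level else 0


def dequantize(level, step):
    """Reconstruct the coefficient value from its quantized level."""
    return level * step


for c in (6, -17, 3):
    print(c, deadzone_quantize(c, 10), deadzone_quantize(c, 10, rounding=0.25))
```

Lower offsets save bits by zeroing more coefficients, at the price of throwing away exactly the low-amplitude detail that grain and fine texture live in, which is why tying the offset to a psy measure is attractive.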

All times are GMT +1.