The rav1e development thread: working on rav1e in a different more forward manner [Archive]

BlueSwordM

23rd November 2022, 07:32

Hello everyone. I know I don't post much on doom9, but no worries: from now on, I will be posting and most importantly, interacting more often :thanks:

The main reason I decided to write this post is that I've been (slowly) working on rav1e as well as contributing to other related projects like ssimulacra2_rs (a Rust version of the ssimulacra2_C psy image metric), and learning a lot of stuff from early September all the way to now.

One of these things is how complex video encoder development truly is. You need a solid programming and mathematics background just to implement the ideas you want to integrate into the encoder, and good scientific analysis, backed by a solid understanding of how the human visual system (HVS) works, is critical to getting good results in terms of speed, coding efficiency and visual performance.

Another important factor is how much feedback loops matter to the entire development process: that is why metrics exist, right? They act as a feedback loop that steers how encoder development should be driven, and where. Whether objective or subjective, video encoder improvement depends on how varied and good those feedback loops are. If they are poor, you will get suboptimal results or, worst of all, the right answers to the wrong questions about how to improve the encoder in question.

Something that is often ignored in tightly knit video encoder development groups is how helpful user-driven/community-driven feedback can truly be, as working in a tightly knit group often results in shared opinions becoming the norm, and critical consensus is never truly attained. In that sense, you start developing tunnel vision in regards to what needs to become better after you've picked the low hanging fruit.

How does that relate to the AV1 video encoder development scene? Well, most of the actual encoder development in aomenc, SVT-AV1 and other tools is done somewhat privately by AOM members, who take little account of opinions from outside entities or disregard them entirely. rav1e is a lot better in this regard, but because of a lack of funding, its development has slowed down somewhat.

A lack of open public discussion on something like a forum means most of the discussion happens on slightly closed-off platforms (the #daala IRC channel) or even proprietary ones (the AV1 Discord server). Both are suboptimal places to discuss development and news about this kind of stuff, since they are closed off from public view and most information moves by word of mouth. That keeps it out of sight of a large portion of viewers, readers, developers, enthusiasts, encoders, and even search engines, making for low visibility.

As such, I've decided to lead (somewhat independently lmao) a different, much more open approach that I believe will be the strongest for AV1 encoder development: on public forum platforms like Reddit, the doom9 forums, and a few others, I'll start taking feedback and ideas on what features to add and improve in rav1e, respond to general questions, involve more industry folks in the subject, and have people from different standard toolset teams chip in for more general improvements.

Most importantly, video encoder feedback can and will be taken more seriously, as every complaint or criticism with proper explanations of what is bad and can be changed is invaluable advice.

I believe this change in the development feedback loop is necessary, as it fully complements the beginning of the widespread use of much more heavily psy-driven development models and a more varied understanding in that regard. In that sense, having many outside eyes and hands in a video encoder project is critical to its success, as they can revive and kick-start a program that was thought to never be able to improve on its implementation weaknesses.

With all of that in mind, what do I believe rav1e's end goal is? In a somewhat generic manner, it is to be the AV1 encoder equivalent of x264, but only from a general usage point of view.

In the context of a video encoder, it is more nuanced than that:

The first point is to take the good tools of the AV1 standard that rav1e currently lacks and bring them into rav1e, taking a calculated approach to implement them more effectively than the other encoders, to get all the upsides with almost none of the downsides.

The second one is to have an even stronger psycho-visual mode decision model than what is currently used in rav1e, which would allow us to get the best visual performance at very low bitrates all the way to excellent visually lossless performance at high bitrates. In theory and even in practice (if it can be implemented optimally), this would allow rav1e to greatly surpass x264, relatively speaking, since one of the reasons x264's high-end performance is great is that its mode decision is heavily limited in a way that allows its psycho-visual modeling (psy-rd and other tools) to perform well. The downside is that at low bitrates, these mode decision limitations restrict how well the encoder can perform.

Implementing such a model so that it works reliably, without throwing obscene amounts of external compute at it or restricting mode decision choices, is actually feasible in rav1e, since the infrastructure is well built and well thought out in that regard.

Another benefit is that it would eliminate most of the cargo culting often seen with open encoders, which become difficult and tiring to tweak and are often subject to hive-mind-like usage of specific settings for ??? reasons.

Finally, higher encoder speeds on faster presets and better threading on rav1e's part are necessary to get rav1e used by just about everyone, so that is a given.

Thank you all for reading up to this point. All of you and the people who've helped me to get up to this point are why I've decided to attempt such a large mindset change over the last few years. A few of them come to mind: rav1e and av1an developers, dav1d, some ffmpeg devs, AV1 Discord server members, and JPEG-XL folks; there are just too many good people to mention them all sadly.

Now that I've said everything in my mind up to this point, I'd like to start the conversation by saying one thing:
what do you suggest we can do to improve rav1e, the way we do development, and what suggestions do you have in general?

I'd love to hear from all of you guys and gals around the world with different perspectives and views.
If any of you would like to submit concrete improvements backed up with comments and, optimally, with numbers, please do so.
This is the kind of feedback that is the most appreciated.

Thank you all, and have a good day.

All of the above text and opinions are my own, only shared by some other people, and do not represent the thoughts of rav1e's development team as a whole.

BlueSwordM

24th November 2022, 06:18

One simple way to improve rav1e at all quality levels is to simply improve deblocking: while the strength selection algorithm is already close to optimal, the distortion metric it uses is SSE, which is not influenced in any way by the more advanced psy RDO metrics in rav1e.

That means it blurs blocks to a larger degree than necessary in my testing in anything grainy, and unlike aomenc (unless I can't read the deblock.rs source properly), it doesn't actually include a bias to counteract that to a small degree.

It should be relatively easy to lower deblocking strength somewhat to counteract this in the very short term, and in the mid term, implement frequency weighted SSE to make the deblocking filter application a lot more effective.
This could either be determined through the non-flatness metric used, or by using tx_domain_dist in an interesting way to map out the block frequencies.

In the long term, it would even be possible to integrate the current psycho-visual RDO metric into it. That would require some new piping and tuning, but it would allow reuse of data that is already used in activity masking.
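
For readers who want to see what "frequency weighted SSE" could mean in practice, here is a minimal sketch next to plain SSE. It is not rav1e's code: the DCT helpers, the linear weight ramp and the `hf_weight` parameter are all assumptions made up for this illustration. The idea is only that error energy at higher frequencies (the kind produced when a filter blurs grain away) counts for more than the same amount of error energy near DC.

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2-D DCT of a square block; result is indexed [v][u]."""
    n = len(block)
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[y][u] for y in range(n)]) for u in range(n)]
    return [[cols[u][v] for u in range(n)] for v in range(n)]

def sse(src, rec):
    """Plain sum of squared errors between two blocks."""
    return sum((a - b) ** 2 for rs, rr in zip(src, rec) for a, b in zip(rs, rr))

def freq_weighted_sse(src, rec, hf_weight=2.0):
    """SSE computed in the DCT domain with a hypothetical weight ramp:
    1.0 at DC, rising linearly to hf_weight at the highest frequency."""
    n = len(src)
    err = [[a - b for a, b in zip(rs, rr)] for rs, rr in zip(src, rec)]
    coeffs = dct_2d(err)
    total = 0.0
    for v in range(n):
        for u in range(n):
            w = 1.0 + (hf_weight - 1.0) * (u + v) / (2 * (n - 1))
            total += w * coeffs[v][u] ** 2
    return total
```

Because the DCT here is orthonormal, `hf_weight=1.0` gives back plain SSE, so the weight is the only thing that changes the decision. A strength search that minimises this metric instead of SSE would be pushed away from strengths that remove fine texture.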

benwaggoner

26th November 2022, 01:27

QFT: "In that sense, you start developing tunnel vision in regards to what needs to become better after you've picked the low hanging fruit."

SO true. A very insightful post.

soresu

26th November 2022, 07:33

Let it begin, tis I "Kenny / Mountain Nomad" on the Discord.

BlueSwordM

1st December 2022, 06:53

One other interesting thing I've noticed is how in most encoders, the more advanced RDO decisions aren't available to stuff like quantization, motion estimation, and temporal-RDO.
I can obviously understand not having more complex RDO for ME, as SSD/SATD ME RD decisions are already demanding enough as is, so adding a more complex metric would be very computationally complex.
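
For context on the two metrics mentioned above, here is a minimal sketch of SAD and a 4x4 SATD. It is a generic textbook version, not code from any of the encoders discussed; the function names are made up for the example, and the SATD is left unnormalised (real encoders usually scale the result).

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def hadamard4(v):
    """Unnormalised 4-point Hadamard transform using only adds and subtracts."""
    a0, a1 = v[0] + v[2], v[1] + v[3]
    a2, a3 = v[0] - v[2], v[1] - v[3]
    return [a0 + a1, a0 - a1, a2 + a3, a2 - a3]

def satd4x4(a, b):
    """Sum of absolute Hadamard-transformed differences for 4x4 blocks."""
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    rows = [hadamard4(r) for r in diff]
    cols = [hadamard4([rows[y][x] for y in range(4)]) for x in range(4)]
    return sum(abs(c) for col in cols for c in col)
```

SATD costs one small transform per block on top of the subtraction, which is why it is already considered demanding for motion estimation, where the metric is evaluated for many candidate positions.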

However, x264 and x265 stand out in particular for not having their psy-rd implementations working in their TPL-RDO implementations. rav1e actually does have it available. In the encoder, when you disable the coding tools that currently tend to blur even at higher bitrates (deblocking and SGR; CDEF is a bit different), disabling TPL-RDO doesn't actually improve image quality much, and tends to make it worse, even on high-end content at high bitrates (Foodmarket).

This is actually an interesting observation that I didn't notice much before.

Blue_MiSfit

1st December 2022, 20:16

Implementing such a model that works reliably without throwing obscene amounts of external compute or restricting mode decision choices in rav1e is actually feasible since the infrastructure is well made in that regard, well-built and greatly thought out.

Coming from the standpoint of someone not familiar with the codebase, but with an appreciation of what this implies --- AWESOME :)

BlueSwordM

2nd December 2022, 07:12

Thank you for the compliment, Blue_MiSfit. And yeah, aomenc and SVT-AV1's codebases make it hard for someone like me to implement more psy-driven RDO across all encoding processes.

In other news, I think I've figured out how to implement psy-driven RDO quantization: in-block psy-RDO and psy trellis/hybrid deadzone quant.
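
The post does not describe the scheme itself, so the sketch below only shows the plain building block being referred to: a scalar deadzone quantizer, where a rounding offset below 0.5 widens the zero bin. The function names and the `rounding` parameter are assumptions for illustration. A psy-driven variant would presumably choose that offset per block or per coefficient, but that part is not shown here.

```python
def quantize_deadzone(coeff, step, rounding=0.5):
    """Quantize one transform coefficient.

    rounding=0.5 is ordinary round-to-nearest; smaller values widen the
    dead zone, so more small coefficients are quantized to zero.
    """
    level = int(abs(coeff) / step + rounding)
    return -level if coeff < 0 else level

def dequantize(level, step):
    """Reconstruct the coefficient value from its quantized level."""
    return level * step
```

With `step=10`, a coefficient of 7 survives as level 1 under round-to-nearest but is zeroed with `rounding=0.2`. Dropping it saves bits, yet it may also be exactly the kind of small high-frequency coefficient that carries grain, which is the trade-off a psycho-visual decision would have to weigh.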

benwaggoner

7th December 2022, 20:35

In other news, I think I've figured out how to implement psy-driven RDO quantization: in-block psy-RDO and psy trellis/hybrid deadzone quant.
Those sound very promising!

BlueSwordM

13th December 2022, 19:37

Indeed.
Anyway, the main problem with specific video encoder pipelines is that they rely on low complexity metrics to make them feasible to run without HW acceleration and make HW acceleration not costly.

For quantization processes and motion estimation (as well as TPL-RDO, depending on how complex its implementation is), fast metrics are a necessity.

That means only SAD, and to a lesser extent SATD, can be used even when RDO for those processes is active, which results in suboptimal and inconsistent psycho-visual targeting.

Using frequency weighted metrics in the traditional way by using a DCT transform to get the frequency information is fine compute wise, but it's too slow for processes that are repeated a lot like quantization.

I've finally found a way to fix this: I'll be using a specific filter to get the frequency information out of the blocks without using a transform, allowing us to get all the benefits of this without the compute costs!

This will allow me to do stuff in a somewhat psycho-visually weighted manner while preserving the speed of a very simple metric like SAD, getting us nice gains even at high speed presets and even speedups for those presets!

For those interested in how I discovered this, I present to you a glorious Daala paper, where the Xiph folks managed to do something very smart:
https://people.xiph.org/~tterribe/daala/daala-icip2017.pdf
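
The post does not say which filter is meant, so the following is only a stand-in to make the general idea concrete: a small high-pass kernel applied to the error block gives a rough measure of high-frequency error using nothing but adds and subtracts, and that measure is mixed into a SAD-style cost. The kernel, the edge clamping and the `hf_weight` value are all assumptions for this sketch, not the method from the post or the paper.

```python
def highpass(block):
    """Sum of horizontal and vertical [-1, 2, -1] responses, edges clamped."""
    h, w = len(block), len(block[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            c = block[y][x]
            left = block[y][max(x - 1, 0)]
            right = block[y][min(x + 1, w - 1)]
            up = block[max(y - 1, 0)][x]
            down = block[min(y + 1, h - 1)][x]
            out[y][x] = 4 * c - left - right - up - down
    return out

def weighted_sad(src, rec, hf_weight=0.25):
    """SAD plus a hypothetical penalty on the high-pass-filtered error."""
    diff = [[a - b for a, b in zip(rs, rr)] for rs, rr in zip(src, rec)]
    base = sum(abs(d) for row in diff for d in row)
    hf = sum(abs(v) for row in highpass(diff) for v in row)
    return base + hf_weight * hf
```

A uniform brightness offset passes through with no extra penalty, while an error with the same SAD that alternates from pixel to pixel is charged more, all without running a DCT on the block.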

wswartzendruber

21st December 2022, 18:16

Greetings. I am a video encoding layperson. I do not currently possess the math skills required to implement what I want to see. I have implemented a FFT but barely know what a MDCT is.

From a user's perspective, I would much like to see AV1's capability for representing noise better exploited. I am, of course, referring to film grain synthesis. At present, SVT-AV1 seems to model in a somewhat okay manner, but on playback, the reproduced grain level is far below what it originally was. This is true when going to the max setting. In other words, SVT-AV1 *seems* to identify grain correctly, but doesn't set the intensity high enough. Another option is that libdav1d is decoding incorrectly.

Either way, really good noise emulation would be invaluable for greatly reducing the bitrate of noisy movies (looking at you, Zack Snyder) while having similar visual quality.

damian101

18th March 2023, 20:59

Both aomenc and SVT-AV1 currently do not adapt noise synthesis to the encode, but merely to a denoised version of the source, with the option to encode from that denoised version (default for both encoders to my knowledge). And even that, at least for aomenc, is not done very well at all, tending to add much more noise than was removed in the denoising step. Which probably is primarily because the metric used to determine noise strength, whatever it is, was never meant to measure the thing most relevant here, visual energy. Furthermore, it creates grain with ugly patterns at higher strengths, maybe because of the artifacts denoising produces at some point, which would again point to an unsuited metric used for noise strength estimation. So, yeah, the current implementations are pretty bad.

benwaggoner

12th October 2023, 18:18

Has much happened with rav1e recently? With the quality complaints about SVT-AV1 and speed complaints about aomenc, it seems rav1e could be filling a valuable niche.

quietvoid

12th October 2023, 19:57

Not much activity over at rav1e. It takes some particular expertise to develop on encoders and it seems most people who worked on it moved on to other things.

damian101

25th October 2023, 19:49

Has much happened with rav1e recently? With the quality complaints about SVT-AV1 and speed complaints about aomenc, it seems rav1e could be filling a valuable niche.

I think the quality complaints about SVT-AV1 should be a thing of the past now, because in my testing in the higher quality range, SVT-AV1 is now straight up superior over aomenc with equivalent settings, especially with the new ssim tune. Preset 3 is what I used with svt-av1, which is faster than aomenc cpu-used 3, slower than cpu-used 4.

Rav1e sadly still lacks a lot of crucial features, so I don't see it becoming competitive in the usual quality range anytime soon...

benwaggoner

26th October 2023, 00:23

I think the quality complaints about SVT-AV1 should be a thing of the past now, because in my testing in the higher quality range, SVT-AV1 is now straight up superior over aomenc with equivalent settings, especially with the new ssim tune. Preset 3 is what I used with svt-av1, which is faster than aomenc cpu-used 3, slower than cpu-used 4.
Good to know! This is with subjective comparisons?

Did you look at FGS?

Rav1e sadly still lacks a lot of crucial features, so I don't see it becoming competitive in the usual quality range anytime soon...
It seems to have lost most momentum. SVT-AV1 really has the same goal and is officially supported by AOM, so it's not that surprising.

Boulder

26th October 2023, 06:22

I think the quality complaints about SVT-AV1 should be a thing of the past now, because in my testing in the higher quality range, SVT-AV1 is now straight up superior over aomenc with equivalent settings, especially with the new ssim tune. Preset 3 is what I used with svt-av1, which is faster than aomenc cpu-used 3, slower than cpu-used 4.

Can you post both aomenc and svt-av1 command lines for testing and comparison? It would be interesting to see how the encoders differ in detail retention.

It would be nice if quietvoid's rebase of the FGS table patch made it to mainline, that is one big difference between the two encoders.

jethro

26th October 2023, 21:34

Can you post both aomenc and svt-av1 command lines for testing and comparison? It would be interesting to see how the encoders differ in detail retention.

It would be nice if quietvoid's rebase of the FGS table patch made it to mainline, that is one big difference between the two encoders.

FWIW, in my test it was:
SVT>RAV1E>>AOM
AOM too blurry, jumping (on-off) details between frames.

benwaggoner

27th October 2023, 00:02

FWIW, in my test it was:
SVT>RAV1E>>AOM
AOM too blurry, jumping (on-off) details between frames.
When/with what versions was that?

jethro

27th October 2023, 18:44

When/with what versions was that?

Late July

Svt[info]: -------------------------------------------
Svt[info]: SVT [version]: SVT-AV1 Encoder Lib v1.6.0-2-gba18204e
Svt[info]: SVT [build] : GCC 13.1.0 64 bit
Svt[info]: LIB Build date: Jul 17 2023 16:42:40

AOMedia Project AV1 Encoder 3.6.0-355-g96158b090 (default)

rav1e 0.6.1 (p20230711-1-g2447fcc) (release)

Boulder

28th October 2023, 15:03

FWIW, in my test it was:
SVT>RAV1E>>AOM
AOM too blurry, jumping (on-off) details between frames.

I highly recommend the lavish mod for aomenc. It provides better quality than vanilla aomenc when using tune-content=psy and a couple of other parameters set properly.