BlueSwordM
23rd November 2022, 07:32
Hello everyone. I know I don't post much on doom9, but no worries: from now on, I will be posting and, most importantly, interacting more often :thanks:
The main reason I decided to write this post is that I've been (slowly) working on rav1e as well as contributing to other related projects like ssimulacra2_rs (a Rust version of the ssimulacra2_C psy image metric), and learning a lot of stuff from early September all the way to now.
One of these things is how complex video encoder development truly is. A solid programming and mathematics background is very important to actually being able to implement the ideas you want to integrate into the encoder, but good scientific analysis and a solid understanding of how the human visual system (HVS) works are just as critical to getting the best possible results in terms of speed, coding efficiency and visual performance.
Another important factor is how important feedback loops are to the entire development process: that is why metrics exist, right? They act as a feedback loop that tells you how encoder development should be driven, and where it should be driven. Be it objective or subjective, video encoder improvement is driven by how varied and good those feedback loops are. If they are poor, you will get suboptimal results or, worst of all, you'll get the right answers to the wrong questions on how to improve the video encoder in question.
Something that is often ignored in tightly knit video encoder development groups is how helpful user-driven/community-driven feedback can truly be, as working in a tightly knit group often results in shared opinions becoming the norm, and critical consensus is never truly attained. In that sense, you start developing tunnel vision with regard to what needs to become better after you've picked the low-hanging fruit.
How does that relate to the AV1 video encoder development scene? Well, most of the actual video encoder development in aomenc, SVT-AV1 and other tools is done somewhat privately by AOM members, who take little account of opinions from outside entities or disregard them entirely. rav1e is a lot better in this regard, but because of a lack of funding, its development has slowed down somewhat.
A lack of public, open discussion on something like a forum means most of the discussion is happening on slightly closed-off platforms (the #daala IRC channel) or even proprietary ones (the AV1 Discord server). Both are suboptimal places to discuss development and news about this kind of stuff: they are closed off from public view, and most information moves by word of mouth. That keeps it out of sight of a large portion of viewers, readers, developers, enthusiasts, encoders, and even search engines, making for low visibility.
As such, I've decided to lead (somewhat independently lmao) a different, much more open approach, which I believe will be the strongest one for AV1 encoder development: on public forum platforms like Reddit, the doom9 forums, and a few others, I'll start taking feedback and ideas on what features to add and improve in rav1e, respond to general questions, involve more industry folks in the subject, and have people from different standard toolset teams chip in for more general improvements.
Most importantly, video encoder feedback can and will be taken more seriously, as every complaint or criticism with proper explanations of what is bad and can be changed is invaluable advice.
I believe this change in the development feedback loop is necessary, as it fully complements the beginning of the widespread usage of much more heavily psy-driven (psy metric based) development models and a more varied understanding in that regard. In that sense, having many outside eyes and hands in a video encoder project is critical to its success, as it can revive and kick off a program that was thought to never be able to improve on its implementation weaknesses.
With all of that in mind, what do I believe rav1e's end goal is? In a somewhat generic manner, it is to be the AV1 encoder equivalent of x264, but only from a general usage point of view.
In the context of a video encoder, it is more nuanced than that:
The first point is to take good tools of the AV1 standard that rav1e currently lacks and shove them inside of rav1e, taking a calculated approach to implementing them more effectively than the other encoders do, to get all the upsides with almost none of the downsides.
The second one is to have an even stronger psycho-visual mode decision model than what is currently used in rav1e, which would allow us to get the best visual performance from very low bitrates all the way to excellent visually lossless performance at high bitrates. In theory and even in practice (if it can get optimally implemented), this would allow rav1e to greatly surpass x264, relatively speaking, since one of the reasons x264's high-end performance is great is specifically that mode decision is heavily limited in a way that allows its psycho-visual modeling (psy-rd and other tools) to perform well. The main downside is that at low bitrates, these same mode decision limitations cap how well you can perform at these low quality levels.
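To make the idea of a psycho-visual term in mode decision a bit more concrete, here is a minimal toy sketch in Python. It is not how rav1e or x264 actually implement anything: the names (Candidate, ac_energy, rd_cost, pick_mode), the 4-pixel "block" and the numbers are all made up for illustration, and the psy term is only loosely inspired by the general psy-rd idea of penalizing a loss of detail energy on top of plain squared error.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical coding mode candidate for one block (illustrative only)."""
    name: str     # made-up mode label
    recon: list   # reconstructed pixels this mode would produce
    bits: float   # estimated rate in bits for this mode


def sse(src, recon):
    """Plain sum of squared errors between source and reconstruction."""
    return sum((s - r) ** 2 for s, r in zip(src, recon))


def ac_energy(block):
    """Crude 'detail' measure: sum of absolute deviations from the block mean."""
    mean = sum(block) / len(block)
    return sum(abs(p - mean) for p in block)


def rd_cost(src, cand, lam, psy_strength=0.0):
    """Cost = SSE + psy_strength * |detail lost or gained| + lambda * bits."""
    psy_penalty = abs(ac_energy(src) - ac_energy(cand.recon))
    return sse(src, cand.recon) + psy_strength * psy_penalty + lam * cand.bits


def pick_mode(src, candidates, lam, psy_strength=0.0):
    """Return the candidate with the lowest cost."""
    return min(candidates, key=lambda c: rd_cost(src, c, lam, psy_strength))


# Toy textured source block and two made-up candidates:
# a cheap one that flattens the texture, and a costly one that keeps it.
src = [10.0, 20.0, 10.0, 20.0]
smooth = Candidate("smooth", [15.0, 15.0, 15.0, 15.0], bits=2.0)
textured = Candidate("textured", [11.0, 19.0, 11.0, 19.0], bits=20.0)

plain_choice = pick_mode(src, [smooth, textured], lam=10.0)
psy_choice = pick_mode(src, [smooth, textured], lam=10.0, psy_strength=10.0)
print(plain_choice.name, psy_choice.name)
```

With these toy numbers, the plain cost prefers the cheap flattened candidate, while the psy-weighted cost prefers the one that retains the texture. The point is only that the weighting changes which mode wins, not that these particular weights mean anything.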
Implementing such a model so that it works reliably, without throwing obscene amounts of external compute at it or restricting mode decision choices, is actually feasible in rav1e, since the infrastructure is well built and well thought out in that regard.
Another benefit is that it would eliminate most of the cargo culting that is often seen with a lot of open encoders, which become difficult and tiring to tweak and are often full of hivemind-like usage of specific settings for ??? reasons.
Finally, higher encoder speeds on faster presets and better threading on rav1e's part are necessary to get rav1e used by just about everyone, so that is a given.
Thank you all for reading up to this point. All of you and the people who've helped me get here are why I've decided to attempt such a large mindset change over the last few years. A few of them come to mind: rav1e and av1an developers, dav1d, some ffmpeg devs, AV1 Discord server members, and JPEG-XL folks; sadly, there are just too many good people to mention them all.
Now that I've said everything on my mind up to this point, I'd like to start the conversation by asking one thing:
what do you suggest we can do to improve rav1e, the way we do development, and what suggestions do you have in general?
I'd love to hear from all of you guys and gals around the world with different perspectives and views.
If any of you would like to submit concrete improvements backed up with comments and, optimally, with numbers, please do so.
This is the kind of feedback that is the most appreciated.
Thank you all, and have a good day.
All of the above text and opinions are my own, only shared by some other people, and do not represent the thoughts of rav1e's development team as a whole.