Old 14th September 2023, 13:22   #1  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
Parallel encoding to speed up AV1 encoders or x265

I recently started looking at AV1 as a codec option, as my new media player (Homatics Box R 4K Plus) supports AV1 decoding in hardware. I've experimented a little bit with the 'lavish' mod of aomenc (based on aomenc-psy) and noticed that aomenc is quite slow because it makes little use of multithreading.

There is a great tool for splitting the encode into chunks for parallel encoding called av1an, but it works only with files that can be opened with VapourSynth. As I use Avisynth for processing the videos, this was a problem.

I had wanted to test ChatGPT's ability to create usable code for a while, so this was a good exercise. My finding was that it can do some basic stuff, but anything more complex requires human work. Nevertheless, even I was able to hack together a working PoC in a couple of evenings, and ChatGPT definitely helped there. You just need to be very patient and know when to write some part on your own. PyCharm's debugging tools became quite useful this time.

If you are interested, I put the result here: https://github.com/Boulder08/chunknorris. Feel free to make any changes needed; currently it's very basic and, for example, does not have any built-in scene change detection (I'm planning on adding one). I've used StainlessS's excellent tool for creating a QP file, as it was the quickest option to use in this exercise. You can find the tool here: https://forum.doom9.org/showthread.php?t=171624; there should be a more recent version than the one in the first post somewhere in the thread. It can create a .qp.txt file for use in x264/x265, and that file works with this script as is.
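To illustrate the chunking idea, here is a minimal Python sketch. It is not taken from chunknorris. It assumes the .qp.txt file follows the usual x264/x265 qpfile layout of one "frame number, frame type" pair per line, and the names read_qpfile and make_chunks as well as the min_length parameter are made up for this example.

```python
from typing import List, Tuple


def read_qpfile(text: str) -> List[int]:
    """Return the sorted frame numbers found in qpfile-style text ("123 I" per line)."""
    frames = set()
    for line in text.splitlines():
        parts = line.split()
        # Assumption: the first field of each line is the frame number.
        if parts and parts[0].isdigit():
            frames.add(int(parts[0]))
    return sorted(frames)


def make_chunks(scene_starts: List[int], total_frames: int,
                min_length: int = 24) -> List[Tuple[int, int]]:
    """Turn scene-change frames into inclusive (first, last) chunk ranges.

    A scene change closer than min_length frames to the previous cut is
    skipped, so very short scenes are merged into the preceding chunk.
    """
    cuts = [0]
    for frame in scene_starts:
        if 0 < frame < total_frames and frame - cuts[-1] >= min_length:
            cuts.append(frame)
    # Merge a too-short final chunk into the one before it.
    if len(cuts) > 1 and total_frames - cuts[-1] < min_length:
        cuts.pop()
    bounds = cuts + [total_frames]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(len(bounds) - 1)]
```

Each resulting range can then be handed to a separate encoder instance and the outputs joined afterwards.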

I'm probably going to do some minor development as my time allows but feel free to tell me if there are any problems or things that might be useful to have.
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...

Last edited by Boulder; 10th March 2024 at 19:47.
Old 14th September 2023, 18:14   #2  |  Link
Jamaika
Registered User
 
Join Date: Jul 2015
Posts: 701
libavm
* In encoding and decoding, AV1 allows an input image frame to be partitioned into separate vertical tile columns, which can be encoded or decoded independently. This enables easy implementation of parallel encoding and decoding. The parameter for this control describes the number of tile columns (in log2 units), which has a valid range of [0, 6]:
0 = 1 tile column
1 = 2 tile columns
2 = 4 tile columns
.....
n = 2**n tile columns
* By default, the value is 0, i.e. a single tile column for the entire image.
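The log2 mapping quoted above can be written out as a small Python sketch. The function name tile_columns is hypothetical; only the range [0, 6] and the 2**n relationship come from the quoted text.

```python
def tile_columns(log2_value: int) -> int:
    """Number of tile columns for a log2-style setting in the range [0, 6]."""
    if not 0 <= log2_value <= 6:
        raise ValueError("the tile column setting must be between 0 and 6")
    return 2 ** log2_value
```

So a setting of 1 asks for two tile columns and the default of 0 keeps the frame as a single column.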
Old 14th September 2023, 18:42   #3  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
Quote:
Originally Posted by Jamaika View Post
libavm
* In encoding and decoding, AV1 allows an input image frame to be partitioned into separate vertical tile columns, which can be encoded or decoded independently. This enables easy implementation of parallel encoding and decoding. The parameter for this control describes the number of tile columns (in log2 units), which has a valid range of [0, 6]:
0 = 1 tile column
1 = 2 tile columns
2 = 4 tile columns
.....
n = 2**n tile columns
* By default, the value is 0, i.e. a single tile column for the entire image.
Yes, there are tiles but they don't come without a price.
Old 14th September 2023, 19:54   #4  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,768
Quote:
Originally Posted by Boulder View Post
Yes, there are tiles but they don't come without a price.
Although chunked encoding can have its own challenges, particularly as chunks get smaller. Like maintaining VBV compliance across chunk boundaries, and potentially adding extra intra frames at chunk boundaries.

Aren't some amount of tiles required for decoder compliance at higher resolutions?
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 15th September 2023, 07:57   #5  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
Quote:
Originally Posted by benwaggoner View Post
Although chunked encoding can have its own challenges, particularly as chunks get smaller. Like maintaining VBV compliance across chunk boundaries, and potentially adding extra intra frames at chunk boundaries.
My use cases fortunately don't have problems with VBV, as the results are for my personal use only and the bitrates are definitely low enough to ignore any VBV-related things. But yes, there are also caveats with this method.

Quote:
Aren't some amount of tiles required for decoder compliance at higher resolutions?
That's a good question, I could not find anything based on a quick search. Then again, with higher resolutions, having more tiles (I'd expect that there's always one tile anyway) would not have the efficiency penalty that splitting a 720p or 1080p source in multiple tiles might see.
Old 18th September 2023, 08:21   #6  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
I've done some more work on this, now you can use ffmpeg for detecting the scene changes as the first phase of the script. I'll also add an option to use a different script for the detection (so you can either just load the source, or do some downscaling to speed things up).
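For readers who want to try ffmpeg-based detection by hand, here is a rough Python sketch. It is not the code from the script; it assumes ffmpeg's select filter with the scene score and the showinfo filter, whose log lines carry a pts_time field. The function names and the 0.3 threshold are placeholders.

```python
import re
from typing import List

# showinfo prints one log line per selected frame, including "pts_time:<seconds>".
PTS_TIME = re.compile(r"pts_time:\s*([0-9]+(?:\.[0-9]+)?)")


def ffmpeg_scene_command(source: str, threshold: float = 0.3) -> List[str]:
    """Build an ffmpeg command that logs frames whose scene score exceeds threshold."""
    filters = f"select='gt(scene,{threshold})',showinfo"
    return ["ffmpeg", "-hide_banner", "-i", source,
            "-vf", filters, "-an", "-f", "null", "-"]


def scene_frames(showinfo_log: str, fps: float) -> List[int]:
    """Convert the pts_time values in showinfo output to frame numbers."""
    frames = []
    for line in showinfo_log.splitlines():
        if "showinfo" not in line:
            continue
        match = PTS_TIME.search(line)
        if match:
            frames.append(round(float(match.group(1)) * fps))
    return frames
```

The command list can be passed to subprocess.run with stderr captured, since that is where ffmpeg writes the showinfo lines. Converting timestamps to frame numbers this way assumes a constant frame rate.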

You can also apply a film grain table on the fly -- Film Grain Synthesis seems to be the thing about AV1 so it's very important in my opinion. I need to come up with a practical solution of creating the table based on the source so that could also be done in the same processing chain without too much manual interaction.
Old 21st September 2023, 15:54   #7  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
Another update, now you can create the Film Grain Synthesis grain table (semi-)automatically.

The next thing to do is to implement a better method for automatic scene change detection, ffmpeg is quite flaky with non-stable sources.
Old 22nd September 2023, 22:56   #8  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,768
Also, SVT-AV1 isn't really all that great an encoder, particularly for Film Grain Synthesis. I don't know of a great free AV1 encoder, but commercial ones are getting quite good. But aomenc and SVT-AV1 aren't likely to outperform x265 much for quality @ perf. The Visionular Aurora encoder is the one I've played with the most, and I'm pleased by the compression efficiency it can deliver in reasonable encoding times.
Old 23rd September 2023, 08:43   #9  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
I think the aomenc-lavish fork is not that bad; it has some psy tunings and psy-related fixes compared to the original. Of course, AV1 encoders, like other modern encoders, tend to be biased towards minimizing bitrate, so they require a bit of tuning to keep details. In general, all encoder-side denoising must be disabled and the AQ mode set to 0. I don't understand what has been screwed up with AQ, but it just doesn't work. But the big thing is FGS in my opinion; it can make a huge difference.

Aomenc without any sort of chunked encoding like av1an or this script is definitely a waste of time compared to x265.
Old 23rd September 2023, 09:01   #10  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
Most of the big stuff is done now, I just added the capability to use PySceneDetect for scene change detection.
Old 23rd September 2023, 11:50   #11  |  Link
Selur
Registered User
 
Selur's Avatar
 
Join Date: Oct 2001
Location: Germany
Posts: 7,275
Out of curiosity: When using Avisynth anyway, why not use SCXvid ?
__________________
Hybrid here in the forum, homepage
Old 23rd September 2023, 15:15   #12  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
Quote:
Originally Posted by Selur View Post
Out of curiosity: When using Avisynth anyway, why not use SCXvid ?
Probably because I didn't know about it.

Maybe I'll implement it as an option as well, shouldn't be too hard now that things are as functions.
Old 24th September 2023, 16:55   #13  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
SCXviD added now as well.

Last edited by Boulder; 25th September 2023 at 19:07.
Old 28th September 2023, 12:22   #14  |  Link
Beelzebubu
Registered User
 
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
Quote:
Originally Posted by benwaggoner View Post
Aren't some amount of tiles required for decoder compliance at higher resolutions?
Yes, but that only comes in above 4K. Max tile area is 4096 * 2304, so 4K can in theory still be a single tile.
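As a back-of-the-envelope check of that limit, here is a small Python sketch. The maximum tile area is the figure from the post; the 4096-pixel maximum tile width is an additional assumption based on my reading of the AV1 specification, and min_tiles is a hypothetical name. It only gives a lower bound, since a real tile layout also has to align to superblocks.

```python
MAX_TILE_AREA = 4096 * 2304  # figure quoted in the post above
MAX_TILE_WIDTH = 4096        # assumption: per-tile width limit from the AV1 spec


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return (a + b - 1) // b


def min_tiles(width: int, height: int) -> int:
    """Lower bound on the number of tiles a frame of this size needs."""
    by_area = ceil_div(width * height, MAX_TILE_AREA)
    by_width = ceil_div(width, MAX_TILE_WIDTH)
    return max(by_area, by_width)
```

With these numbers a 3840x2160 frame can stay a single tile, while a 7680x4320 frame needs at least four.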
Old 28th September 2023, 13:55   #15  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
I added some tile parameters to the presets if anyone wants to use them. The 1080p preset comes with --tile-columns=1, and the presets meant for resolutions above that come with --tile-columns=1 --tile-rows=1. I didn't notice these causing any significant drop in compression efficiency, and they should help with decoding. The actual encoding speed might not be affected that much; I've been running some tests with 1080p content, and four parallel encodes with 16 threads already saturate the CPU quite nicely (5950X 16c/32t).
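Running a fixed number of encodes side by side can be sketched in Python as follows. This is an illustration under my own assumptions and not the script's actual scheduler; run_one, run_parallel and the default of four workers are placeholder choices, and the commands are whatever encoder invocations you build per chunk.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_one(command: Sequence[str]) -> int:
    """Run one encoder command, discard its console output, return the exit code."""
    result = subprocess.run(command,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode


def run_parallel(commands: List[Sequence[str]], workers: int = 4) -> List[int]:
    """Run the commands with at most `workers` active at once.

    Threads are enough here because each worker only waits for an external
    process. The returned exit codes are in the same order as the commands.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))
```

For a dry run without an encoder installed, something like run_parallel([[sys.executable, "-c", "pass"]] * 6) exercises the scheduling.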

Now I need to do some more work on the film grain synthesis part to make it work properly, then all the basic stuff is there.
Old 28th September 2023, 20:05   #16  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
The film grain table creation was reworked and based on my tests, it works as expected. Now I suggest looking up a short, representative frame range (100-200 frames will do) without scene changes for analysis. The script will get the longest section it finds from the generated table and use it for the final encode. The grain table file format is somehow a bit odd, but this approach seems to work fine. Running a diff on the complete source is definitely unnecessary, you just need to find a good scene to get the data from.
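The "pick the longest section" step could look roughly like the Python sketch below. It is a guess at the idea, not the script's code. It assumes the text layout I have seen in aomenc-style grain tables: a first header line, then sections that each begin with a line of the form "E start end ..." followed by indented parameter lines. Stretching the chosen section over the whole clip (start 0, a very large end value) is also my assumption about how a single section gets applied to a full encode.

```python
from typing import List, Tuple

Section = Tuple[int, int, List[str]]  # (start time, end time, lines of the section)


def split_sections(table_text: str) -> Tuple[str, List[Section]]:
    """Split a grain table into its header line and its "E ..." sections."""
    lines = table_text.splitlines()
    header = lines[0]
    sections: List[Section] = []
    for line in lines[1:]:
        if line.startswith("E "):
            fields = line.split()
            sections.append((int(fields[1]), int(fields[2]), [line]))
        elif sections and line.strip():
            sections[-1][2].append(line)
    return header, sections


def longest_section_table(table_text: str) -> str:
    """Return a table holding only the longest section, stretched over all time."""
    header, sections = split_sections(table_text)
    _, _, lines = max(sections, key=lambda s: s[1] - s[0])
    fields = lines[0].split()
    fields[1], fields[2] = "0", str(2 ** 63 - 1)  # assumed "whole clip" range
    return "\n".join([header, " ".join(fields)] + lines[1:]) + "\n"
```

The parameter lines are copied through untouched, so the sketch does not need to understand their contents.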

All the big stuff should be done now.
Old 4th October 2023, 18:51   #17  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,768
Be warned that some early SoCs don't properly implement FGS, so you'll definitely want to test it on your intended hardware before making a lot of content using it.

And yeah, tiles>1 can help parallelism, but if you're already saturated, it won't matter. Potentially you could turn down frame threading with more tiles, which might provide better quality @ perf. Frame threading is HARD to do without some quality degradation, in any codec.
Old 6th October 2023, 08:25   #18  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
I think at least aomenc already has frame threading disabled by default, it only does row-based multithreading unless you change the parameters.

While developing the tool and testing AV1, I've found it to be a very capable replacement for HEVC. Without film grain synthesis, it does smooth the image way too much to my liking, but with FGS on top, the results are very good indeed. Much better than x265, and no issues with red colors, which tend to be a problem with x265 for some reason, and FGS is basically free of charge when it comes to bitrate. Aomenc is only slightly slower than x265 when you utilize these chunked encoding methods; it would be too slow when running just a single encode. You just need to make sure you are using the lavish mod, and there is a very recent patch with luma bias parameters which improve the handling of dark areas a lot. The AV1 Discord channel has pre-built binaries available.

I've also found out that it is usually a better idea to use a luma-only grain table. A table with full chroma information tends to start oversaturating the image and the chroma noise itself can be more distracting than the more grain like appearance of luma noise. I'll probably do some development on this to apply Tweak in the grain table creation part, that way the user can choose the amount of chroma to include in the result.
Old 6th October 2023, 22:56   #19  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,768

Quote:
Originally Posted by Boulder View Post
I think at least aomenc already has frame threading disabled by default, it only does row-based multithreading unless you change the parameters.
Yeah, frame threading is hard, and can cause unpredictable problems in a few frames here and there that require somewhat complex evaluation of per-frame metrics to predict. I always try to use a single frame thread with x265 as well.

Quote:
While developing the tool and testing AV1, I've found it to be a very capable replacement for HEVC. Without film grain synthesis, it does smooth the image way too much to my liking, but with FGS on top, the results are very good indeed.
And my impression of the "smoothing" is that it is mainly a limitation of AQ modes in existing encoders, not intrinsic to AV1 as a technology. Commercial proprietary AV1 encoders are doing a lot better in that regard.

Quote:
Much better than x265, and no issues with red colors which tend to be a problem with x265 for some reason
Are you doing HDR? x265 really benefits from setting --hdr10-opt and lowering the chroma QPs by at least one.

Quote:
and FGS is basically free of charge when it comes to bitrate.
If you have a good implementation on the encoding side and the decoding side. I'm not aware of anyone using FGS at scale by default. The SVT-AV1 implementation in particular can result in some really distracting grain patterns instead of a true random one.

The whole "remove grain while parameterizing it" domain has been an active R&D area for a few decades now, and it's hard to perfect across the many edge cases in real-world content. ML is certainly helping a lot, quickly, but it's not a solved problem by any means.

But the biggest challenge is inconsistent support in early HW implementations.

That said, grain and noise is the biggest bit suck in encoding, and doesn't get much better as we improve efficiency of encoding the actual signal. Good grain removal and parametrization followed by good FGS would help cut bitrate of grainy titles by >50% in some cases. And since it is entirely out-of-loop of the AV1 decoder itself, the metadata and synthesis could support any codec. Getting a "good enough" end-to-end FGS chain would be revolutionary.

Quote:
Aomenc is only slightly slower than x265 when you utilize these chunked encoding methods, it would be too slow when running just a single encode. You just need to make sure you are using the lavish mod, and there is a very recent patch with luma bias parameters which improve the handling of dark areas a lot. The AV1 Discord channel has pre-built binaries available.
Does aomenc have a lapped rate control model like x265 for split-and-stitch? That allows for frames before and after the chunk to be analyzed and discarded to get a more accurate VBV state to reduce quality fluctuations at chunk boundaries.

Quote:
I've also found out that it is usually a better idea to use a luma-only grain table. A table with full chroma information tends to start oversaturating the image and the chroma noise itself can be more distracting than the more grain like appearance of luma noise. I'll probably do some development on this to apply Tweak in the grain table creation part, that way the user can choose the amount of chroma to include in the result.
Good tip. While color film does have some chromatic grain, it's generally not super saturated. Is there a way to reduce grain saturation without eliminating it outright?

Last edited by benwaggoner; 6th October 2023 at 22:58. Reason: quote fixing
Old 7th October 2023, 09:18   #20  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Finland
Posts: 5,729
Quote:
Originally Posted by benwaggoner View Post
And my impression of the "smoothing" is that it is mainly a limitation of AQ modes in existing encoders, not intrinsic to AV1 as a technology. Commercial proprietary AV1 encoders are doing a lot better in that regard.
With aomenc, it's at least currently recommended to switch the regular AQ-mode off as it just blurs the image even more. --deltaq-mode (1 for SDR, 5 for HDR) and --enable-chroma-deltaq=1 are what I recommend.
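Put together as command-line arguments, those recommendations might look like the Python sketch below. The helper name aomenc_quality_flags is hypothetical, and spelling "AQ-mode off" as --aq-mode=0 is my assumption; the other two flags and their values are the ones named above.

```python
from typing import List


def aomenc_quality_flags(hdr: bool) -> List[str]:
    """Flags for the recommendation above: AQ off, delta-q mode chosen by SDR/HDR."""
    deltaq_mode = 5 if hdr else 1  # 1 for SDR, 5 for HDR, as stated in the post
    return [
        "--aq-mode=0",               # assumption: the flag that switches regular AQ off
        f"--deltaq-mode={deltaq_mode}",
        "--enable-chroma-deltaq=1",
    ]
```

The returned list can be appended to the rest of an aomenc command line. Whether mode 5 is available may depend on the build you are using.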

Quote:
Are you doing HDR? x265 really benefits from setting --hdr10-opt and lowering the chroma QPs by at least one.
Both SDR and HDR. This is an old issue with x265, it just doesn't handle the chroma planes that well. --cbqpoffs -3 --crqpoffs -6 are my go-to settings in x265 but actually the offsets don't fix the issue. I've just left them there since the increase in bitrate is rather small and I think x264 uses -3 for both by default so I've kind of followed that path.

--hdr10-opt only adjusts QP per average luma level, but yes, it's always enabled with HDR sources.

Quote:
If you have a good implementation on the encoding side and the decoding side. I'm not aware of anyone using FGS at scale by default. The SVT-AV1 implementation in particular can result in some really distracting grain patterns instead of a true random one.
I haven't noticed any ill effects from using the grain table produced by grav1synth. Then again, I'm using a fork of aomenc, which is the reference encoder. I'm not sure whether the encoder does anything with the table when you feed it as a parameter, or whether it just muxes it into the container.

Quote:
The whole "remove grain while parameterizing it" domain has been an active R&D area for a few decades now, and it's hard to perfect across the many edge cases in real-world content. ML is certainly helping a lot, quickly, but it's not a solved problem by any means.

But the biggest challenge is inconsistent support in early HW implementations.
That's why I'm only thinking about this in a "selfish" way and creating solutions which work on my devices. Well, the box I have (and its derivatives like the Dune or Nokia) is quite a common one because of its capability of outputting DoVi in Kodi, so maybe someone else will benefit from it too. And playback on the PC works fine.

Quote:
That said, grain and noise is the biggest bit suck in encoding, and doesn't get much better as we improve efficiency of encoding the actual signal. Good grain removal and parametrization followed by good FGS would help cut bitrate of grainy titles by >50% in some cases. And since it is entirely out-of-loop of the AV1 decoder itself, the metadata and synthesis could support any codec. Getting a "good enough" end-to-end FGS chain would be revolutionary.
I think the grav1synth approach is quite smart (and simple) but requires some manual labour. You just need to diff the final encoding result against the original one and it comes up with a grain table approximating the difference. The grain table file has start and end timestamps for each set of grain so in theory, you could also adjust it in case there are sections which are full on CGI and some with a lot of film grain.

I've just taken a shortcut and ask the user to either provide a table or analyze a short range of the source to create a table with just one section of grain. I've also collected some tables and put them on GitHub for starters and will add more along the way. After all, it's the appearance of the grain that matters, not the title.

Quote:
Does aomenc have a lapped rate control model like x265 for split-and-stitch? That allows for frames before and after the chunk to be analyzed and discarded to get a more accurate VBV state to reduce quality fluctuations at chunk boundaries.
I think not. At least I don't see any advanced things like these there.
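To make the lapped idea from the quote concrete, here is the frame arithmetic as a Python sketch. It only computes which extra frames would be fed around a chunk and how many to discard at each end; it says nothing about whether a given encoder can actually discard them. The names and the 48-frame overlap are placeholders of mine.

```python
from typing import NamedTuple


class LappedChunk(NamedTuple):
    feed_first: int  # first frame fed to the encoder
    feed_last: int   # last frame fed to the encoder
    skip_head: int   # fed frames before the chunk, analyzed then discarded
    skip_tail: int   # fed frames after the chunk, analyzed then discarded


def lapped_chunk(first: int, last: int, total_frames: int,
                 overlap: int = 48) -> LappedChunk:
    """Extend an inclusive chunk range by `overlap` frames on both sides,
    clamped to the clip, and report how many frames to drop at each end."""
    feed_first = max(0, first - overlap)
    feed_last = min(total_frames - 1, last + overlap)
    return LappedChunk(feed_first, feed_last, first - feed_first, feed_last - last)
```

The first and last chunks of a clip simply get no overlap on their outer side.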

Quote:
Good tip. While color film does have some chromatic grain, it's generally not super saturated. Is there a way to reduce grain saturation without eliminating it outright?
The only way is to either alter the grain table (it's just a text file), or alter the video while running the analysis. So if you feed the source without chroma information to the analysis, there will be no chroma grain in the table. That's why I was thinking of allowing the user to choose a saturation value between 0-1 to either tone down the chroma noise substantially or to remove it completely.
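What such a saturation value would do to the chroma planes before the analysis can be shown with a few lines of Python. This is only an illustration on 8-bit samples with 128 as the neutral value; in an Avisynth script the scaling itself would be done by a filter such as Tweak, and scale_chroma is a made-up name.

```python
from typing import List


def scale_chroma(samples: List[int], saturation: float, neutral: int = 128) -> List[int]:
    """Pull 8-bit chroma samples toward the neutral value.

    saturation = 1.0 leaves the samples unchanged, 0.0 removes all chroma
    variation (and with it any chroma grain the analysis could pick up).
    """
    if not 0.0 <= saturation <= 1.0:
        raise ValueError("saturation must be between 0 and 1")
    scaled = (round(neutral + (s - neutral) * saturation) for s in samples)
    return [min(255, max(0, value)) for value in scaled]
```

A value of 0.5 halves the distance of every chroma sample from neutral, so chroma noise fed to the analysis is halved in amplitude as well.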

Here's a short sample of how AV1 with FGS (@ q 12) and HEVC (@ CRF 18) compare in a rather bright, static scene. The bitrate difference between the two is huge, but for the entire movie, the AV1 encode is around 7.5% bigger. I didn't re-encode the credits yet to lower the bitrate; they are way too big in the AV1 encode. I'm also using settings which give quite a lot of bits to dark areas compared to the default. In general, HEVC is just much blurrier since FGS adds to sharpness.

AV1: https://drive.google.com/file/d/1cB1...usp=drive_link
HEVC: https://drive.google.com/file/d/1mXQ...usp=drive_link
Lossless: https://drive.google.com/file/d/1qOa...usp=drive_link

EDIT: got around to re-encoding the credits (a little less than 40000 frames), and now the complete encode is a little smaller than the HEVC counterpart. Very happy with this, but I need to continue testing different sources.

Last edited by Boulder; 7th October 2023 at 14:08.