Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
#1 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,673
|
Parallel encoding to speed up aomenc
I recently started looking at AV1 as a codec option, as my new media player (Homatics Box R 4K Plus) supports AV1 decoding in hardware. I've experimented a little with the 'lavish' mod of aomenc (based on aomenc-psy) and noticed that aomenc is quite slow since it doesn't make much use of multithreading.
There is a great tool called av1an for splitting the encode into chunks for parallel encoding, but it only works with files that can be opened with VapourSynth. As I use Avisynth for processing the videos, this was a problem. I had wanted to test ChatGPT's ability to create usable code for a while, so this was a good exercise. My finding was that it can do some basic stuff, but anything more complex requires human work. Nevertheless, even I was able to hack together a working PoC in a couple of evenings, and ChatGPT definitely helped there. You just need to be very patient and know when to write some part on your own. PyCharm's debugging tools became quite useful this time.
If you are interested, I put the result here: https://github.com/Boulder08/chunknorris. Feel free to make any changes needed; currently it's very basic and, for example, does not have any built-in scene change detection (I'm planning on adding one). I've used StainlessS's excellent tool for creating a QP file, so it was the quickest option to use in this exercise. You can find the tool here: https://forum.doom9.org/showthread.php?t=171624; there should be a more recent version than the one in the first post somewhere in the thread. It can create a .qp.txt file for use in x264/x265, and that file works with this script as is.
I'm probably going to do some minor development as my time allows, but feel free to tell me if there are any problems or things that might be useful to have.
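To illustrate the idea of chunking from a QP file, here is a minimal Python sketch. It is not taken from chunknorris; the function names and the `min_length` parameter are made up for this example, and it assumes the usual x264/x265 qpfile layout of one `<frame number> <frame type>` pair per line.

```python
def parse_qpfile(text):
    """Return the sorted frame numbers that a qpfile marks as I or K frames."""
    frames = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] in ("I", "K"):
            frames.add(int(parts[0]))
    return sorted(frames)


def make_chunks(scene_starts, total_frames, min_length=64):
    """Turn scene start frames into inclusive (first, last) chunk ranges.

    A cut is dropped when the chunk before it would be shorter than
    min_length, and a too-short final chunk is merged into the previous one.
    """
    starts = [0]
    for frame in scene_starts:
        if 0 < frame < total_frames and frame - starts[-1] >= min_length:
            starts.append(frame)
    if len(starts) > 1 and total_frames - starts[-1] < min_length:
        starts.pop()
    ends = [start - 1 for start in starts[1:]] + [total_frames - 1]
    return list(zip(starts, ends))


if __name__ == "__main__":
    qp_text = "0 I\n120 I\n150 K\n400 I\n"
    for first, last in make_chunks(parse_qpfile(qp_text), total_frames=450):
        print(f"chunk: frames {first}-{last}")
```

Each range can then be handed to a separate encoder instance, and the resulting files joined afterwards.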
__________________
And if the band you're in starts playing different tunes I'll see you on the dark side of the Moon... |
#2 | Link |
Registered User
Join Date: Jul 2015
Posts: 682
|
libavm
* In encoding and decoding, AV1 allows an input image frame be partitioned into separate vertical tile columns, which can be encoded or decoded independently. This enables easy implementation of parallel encoding and decoding. The parameter for this control describes the number of tile columns (in log2 units), which has a valid range of [0, 6]:
0 = 1 tile column
1 = 2 tile columns
2 = 4 tile columns
.....
n = 2**n tile columns
* By default, the value is 0, i.e. one single column tile for entire image.
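As a quick illustration of the log2 units in the quoted text, here is a minimal Python sketch. The function name is made up for this example and is not part of any encoder API.

```python
def tile_columns_from_log2(log2_value):
    """Convert the log2 parameter value into the actual number of tile columns."""
    if not 0 <= log2_value <= 6:
        raise ValueError("tile column parameter must be in the range [0, 6]")
    return 2 ** log2_value


if __name__ == "__main__":
    for value in range(7):
        print(f"{value} = {tile_columns_from_log2(value)} tile column(s)")
```

So a parameter value of 1 means two tile columns, not one.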
#3 | Link | |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,673
|
Quote:
#4 | Link |
Moderator
![]() Join Date: Jan 2006
Location: Portland, OR
Posts: 4,653
|
Chunked encoding can have its own challenges, though, particularly as chunks get smaller, such as maintaining VBV compliance across chunk boundaries and potentially adding extra intra frames at chunk boundaries.
Isn't some number of tiles required for decoder compliance at higher resolutions?
#5 | Link | ||
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,673
|
Quote:
Quote:
#6 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,673
|
I've done some more work on this; you can now use ffmpeg to detect the scene changes as the first phase of the script. I'll also add an option to use a different script for the detection (so you can either just load the source or do some downscaling to speed things up).
You can also apply a film grain table on the fly. Film Grain Synthesis seems to be the big thing about AV1, so it's very important in my opinion. I need to come up with a practical way of creating the table based on the source, so that it could also be done in the same processing chain without too much manual interaction.
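For readers wondering what ffmpeg-based scene change detection can look like, here is a rough Python sketch. It is not the code from the script: the function names, the 0.3 threshold and the constant frame rate are all assumptions for this example. It relies on ffmpeg's `select` filter with the `scene` score and on the `showinfo` filter, which logs a `pts_time` value for every frame that passes the selection.

```python
import re
import subprocess

PTS_TIME = re.compile(r"pts_time:\s*([0-9.]+)")


def build_scene_detect_command(source, threshold=0.3):
    """Build an ffmpeg command that logs only frames whose scene score exceeds threshold."""
    return [
        "ffmpeg", "-hide_banner", "-i", source,
        "-vf", f"select='gt(scene,{threshold})',showinfo",
        "-an", "-f", "null", "-",
    ]


def parse_scene_times(stderr_text):
    """Extract the pts_time values from showinfo log lines."""
    times = []
    for line in stderr_text.splitlines():
        if "Parsed_showinfo" in line:
            match = PTS_TIME.search(line)
            if match:
                times.append(float(match.group(1)))
    return times


def times_to_frames(times, fps):
    """Convert timestamps to frame numbers, assuming a constant frame rate."""
    return [round(t * fps) for t in times]


def detect_scene_changes(source, fps, threshold=0.3):
    """Run ffmpeg on the source and return the detected scene change frame numbers."""
    result = subprocess.run(
        build_scene_detect_command(source, threshold),
        stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True, check=True,
    )
    return times_to_frames(parse_scene_times(result.stderr), fps)
```

The threshold usually needs tuning per source, and the timestamp-to-frame conversion is only reliable for constant frame rate material.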
#7 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,673
|
Another update: you can now create the Film Grain Synthesis grain table (semi-)automatically.
The next thing to do is to implement a better method for automatic scene change detection, as ffmpeg is quite flaky with non-stable sources.
#8 | Link |
Moderator
![]() Join Date: Jan 2006
Location: Portland, OR
Posts: 4,653
|
Also, SVT-AV1 isn't really all that great an encoder, particularly for Film Grain Synthesis. I don't know of a great free AV1 encoder, but commercial ones are getting quite good. But aomenc and SVT-AV1 aren't likely to outperform x265 much for quality @ perf. The Visionular Aurora encoder is the one I've played with the most, and I'm pleased by the compression efficiency it can deliver in reasonable encoding times.
#9 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,673
|
I think the aomenc-lavish fork is not that bad; it has some psy tunings and psy-related fixes compared to the original. Of course, AV1 encoders, like other modern encoders, tend to be biased towards minimizing bitrate, so they require a bit of tuning to keep details. In general, all encoder-related denoising must be disabled and AQ mode set to 0. I don't understand what has been screwed up with AQ, but it just doesn't work. But the big thing is FGS in my opinion; it can make a huge difference.
Aomenc without any sort of chunked encoding like av1an or this script is definitely a waste of time compared to x265.
#10 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,673
|
Most of the big stuff is done now, I just added the capability to use PySceneDetect for scene change detection.
#12 | Link | |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,673
|
Quote:
Maybe I'll implement it as an option as well; it shouldn't be too hard now that things are split into functions.
#13 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,673
|
SCXviD added now as well.
Last edited by Boulder; 25th September 2023 at 19:07.
#15 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,673
|
I added some tile parameters to the presets if anyone wants to use them. The 1080p preset comes with --tile-columns=1, and the presets meant for resolutions above that come with --tile-columns=1 --tile-rows=1. I didn't notice these causing any significant drop in compression efficiency, and they should help with decoding. The actual encoding speed might not be affected that much; I've been running some tests with 1080p content, and four parallel encodes with 16 threads already saturate the CPU quite nicely (5950X 16c/32t).
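For anyone curious how a fixed number of parallel encodes can be kept running, here is a minimal Python sketch using only the standard library. It is not the script's actual code; the function names and the `max_parallel` default of four (taken from the test above) are assumptions for this example, and the commands are whatever encoder command lines you would otherwise run by hand.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_chunk(command):
    """Run one encoder command line and return its exit code."""
    return subprocess.run(command).returncode


def encode_chunks(commands, max_parallel=4):
    """Run the commands with at most max_parallel of them active at a time.

    Threads are enough here because the real work happens in the child
    processes. The exit codes are returned in the same order as the commands.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_chunk, commands))
```

A non-zero exit code in the returned list tells you which chunk needs to be re-encoded.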
Now I need to do some more work on the film grain synthesis part to make it work properly, then all the basic stuff is there.
#16 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,673
|
The film grain table creation was reworked, and based on my tests it works as expected. I now suggest picking a short, representative frame range (100-200 frames will do) without scene changes for analysis. The script takes the longest section it finds in the generated table and uses it for the final encode. The grain table file format is somewhat odd, but this approach seems to work fine. Running a diff on the complete source is definitely unnecessary; you just need to find a good scene to get the data from.
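The "longest section" idea could be sketched roughly like this in Python. This is not the script's code, and it rests on assumptions: that the table is a text file starting with a `filmgrn1` header, that each section begins with a line of the form `E <start> <end> ...` followed by indented parameter lines, and that stretching the chosen section to cover the whole timeline (start 0, end at the largest signed 64-bit value) is acceptable for the final encode. All function names are made up for this example.

```python
INT64_MAX = 9223372036854775807


def split_sections(table_text):
    """Split a grain table into its header line and a list of sections.

    Each section is a list of lines starting with its 'E ...' line.
    """
    lines = table_text.splitlines()
    header, sections = lines[0], []
    for line in lines[1:]:
        if line.startswith("E "):
            sections.append([line])
        elif sections and line.strip():
            sections[-1].append(line)
    return header, sections


def section_length(section):
    """Return end time minus start time, read from the section's 'E' line."""
    fields = section[0].split()
    return int(fields[2]) - int(fields[1])


def longest_section_table(table_text, end_time=INT64_MAX):
    """Return a new table that contains only the longest section.

    The section is stretched so that it starts at 0 and ends at end_time.
    """
    header, sections = split_sections(table_text)
    if not sections:
        raise ValueError("no sections found in grain table")
    best = max(sections, key=section_length)
    fields = best[0].split()
    fields[1], fields[2] = "0", str(end_time)
    return "\n".join([header, " ".join(fields)] + best[1:]) + "\n"
```

The parameter lines of the chosen section are copied through untouched, so the sketch does not need to understand them.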
All the big stuff should be done now.
#17 | Link |
Moderator
![]() Join Date: Jan 2006
Location: Portland, OR
Posts: 4,653
|
Be warned that some early SoCs don't properly implement FGS, so you'll definitely want to test it on your intended hardware before making a lot of content using it.
And yeah, tiles>1 can help parallelism, but if you're already saturated, it won't matter. Potentially you could turn down frame threading with more tiles, which might provide better quality @ perf. Frame threading is HARD to do without some quality degradation, in any codec.
#18 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,673
|
I think at least aomenc already has frame threading disabled by default; it only does row-based multithreading unless you change the parameters.
While developing the tool and testing AV1, I've found it to be a very capable replacement for HEVC. Without film grain synthesis, it smooths the image way too much for my liking, but with FGS on top, the results are very good indeed: much better than x265, with no issues with red colors, which tend to be a problem with x265 for some reason. FGS is also basically free of charge when it comes to bitrate. Aomenc is only slightly slower than x265 when you use these chunked encoding methods; it would be too slow when running just a single encode. You just need to make sure you are using the lavish mod, and there is a very recent patch with luma bias parameters which improves the handling of dark areas a lot. The AV1 Discord channel has pre-built binaries available.
I've also found that it is usually a better idea to use a luma-only grain table. A table with full chroma information tends to start oversaturating the image, and the chroma noise itself can be more distracting than the more grain-like appearance of luma noise. I'll probably do some development on this to apply Tweak in the grain table creation part; that way the user can choose the amount of chroma to include in the result.
#19 | Link | ||||||
Moderator
![]() Join Date: Jan 2006
Location: Portland, OR
Posts: 4,653
|
Quote:
Quote:
Quote:
Quote:
The whole "remove grain while parameterizing it" domain has been an active R&D area for a few decades now, and it's hard to perfect across the many edge cases in real-world content. ML is certainly helping a lot, quickly, but it's not a solved problem by any means. But the biggest challenge is inconsistent support in early HW implementations.
That said, grain and noise are the biggest bit suck in encoding, and that doesn't get much better as we improve the efficiency of encoding the actual signal. Good grain removal and parameterization followed by good FGS would help cut the bitrate of grainy titles by >50% in some cases. And since it is entirely out-of-loop of the AV1 decoder itself, the metadata and synthesis could support any codec. Getting a "good enough" end-to-end FGS chain would be revolutionary.
Quote:
Quote:
Last edited by benwaggoner; 6th October 2023 at 22:58. Reason: quote fixing |
#20 | Link | |||||||
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,673
|
Quote:
Quote:
--hdr10-opt only adjusts QP per average luma level, but yes, it's always enabled with HDR sources.
Quote:
Quote:
Quote:
I've just taken a shortcut and ask the user to either provide a table or analyze a short range of the source to create a table with just one section of grain. I've also collected some tables and put them on GitHub for starters and will add more along the way. After all, it's not the title that matters but the appearance of the grain.
Quote:
Quote:
Here's a short sample of how AV1 with FGS (@ q 12) and HEVC (@ CRF 18) compare in a rather bright, static scene. The bitrate difference between the two is huge, but for the entire movie, the AV1 encode is around 7.5% bigger. I haven't re-encoded the credits yet to lower the bitrate; they are way too big in the AV1 encode. I'm also using settings which give quite a lot of bits to dark areas compared to the defaults. In general, HEVC is just much blurrier since FGS adds to sharpness.
AV1: https://drive.google.com/file/d/1cB1...usp=drive_link
HEVC: https://drive.google.com/file/d/1mXQ...usp=drive_link
Lossless: https://drive.google.com/file/d/1qOa...usp=drive_link
EDIT: got around to re-encoding the credits (a little less than 40000 frames
Last edited by Boulder; 7th October 2023 at 14:08.