Old 3rd March 2022, 01:34   #1  |  Link
BlueSwordM
Registered User
 
 
Join Date: Dec 2021
Posts: 9
Encoder tuning Part 4: A 2nd generation guide to aomenc-av1, shooting for the stars!

So, this is a follow-up to the 2nd part guide regarding aomenc-av1, which can be found here:

https://old.reddit.com/r/AV1/comment...cav1libaomav1/

While that guide is still fine for the most part at first glance,
I've learned a lot regarding the pseudo-reference AV1 encoder, its options, its intricacies, and best of all, its shortcomings.

It now means I understand a lot more about the options themselves: what they do, how to take advantage of them, and when to actually use them.
I also know how to get around their downsides through some clever options, and even through a custom WIP build that addresses aomenc-av1's greatest weakness:
a surprising lack of deep psycho-visual optimizations (the all-intra mode has a nice number of them, but video coding has barely any).

Before I begin, I have to add that this is not comprehensive documentation. A simple Reddit forum post is far too small for such a massive endeavour, so a separate post will be done, with an entry on a dedicated Wiki of some sort, to explain what each and every option does in detail, including speed-features and their explanations.

Now, to get on to the main subject of the post itself: the 2nd generation tuning guide for aomenc-av1!

Encoder speed preset

The encoder preset itself:
Code:
--cpu-used=X
For VOD purposes, this ranges from 0 (abominably slow) to 6 (decently fast) in the good preset.
For realtime purposes like streaming, the RT presets range from 5 to 10, with 5 being the slowest RT preset and 10 being the fastest.

For reference, the default is 0. Not exactly optimal...

My general recommendation for choosing what preset to utilize is based on speed, usability and quality.
In that context, all realtime presets are off the table until aomenc gets frame-threading merged into the mainline build, due to their low single-instance speed/quality ratio; in that sense, you are better off using SVT-AV1 right now.

Otherwise, my general recommendation is in the middle: CPU-2 being the lowest preset I'd recommend actually using, CPU-3 being a good middle ground in general since it keeps most of the juicy features on.

CPU-4 is good for those wanting faster encoding than CPU-3 while not losing much. CPU-5 is where tradeoffs start getting a bit more severe, since more pruning kicks in and features get disabled (particularly loop restoration filtering).
CPU-6 is the fastest I'd go utilizing aomenc. Any faster today, and going with SVT-AV1 is a better tradeoff.

General recommendations: `--cpu-used=2` for slow encoding, `--cpu-used=3` as the middle ground, and `--cpu-used=5` as the fast option.

Keyframe refresh intervals

Code:
--kf-max-dist=240 --kf-min-dist=12
--kf-max-dist dictates the maximum distance between statically placed keyframes (as in, keyframes not placed by the scene-detection algorithms).
For seeking purposes in most content, the standard recommendation is 10 seconds' worth of frames, with 300 frames usually being the upper limit to keep good seeking performance.

So, my recommendations would be 240 frames for 24FPS, 250 frames for 25FPS, and 300 frames for 30FPS and above.
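
The rule of thumb above (10 seconds of frames, capped at 300) is easy to compute for any frame rate. A minimal sketch in Python; the function name and defaults are made up for illustration and only restate the numbers from this post:

```python
def kf_max_dist(fps, seconds=10.0, cap=300):
    """Frames contained in `seconds` of video, capped at `cap` frames."""
    return min(round(fps * seconds), cap)


for fps in (24, 25, 30, 60):
    # Prints a ready-to-paste flag for each frame rate.
    print(f"{fps} FPS -> --kf-max-dist={kf_max_dist(fps)}")
```

Fractional rates such as 23.976 simply round to the nearest whole frame count.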

As for kf-min-dist, it is the minimum number of frames that must pass before another keyframe can be placed. This is mainly a safeguard in case scene detection fails to insert intra-refreshes, or fails to detect flashes and places unnecessary keyframes all over the place.

Threading options

Code:
--threads=cpu-threads --sb-size=64
for <=1080p content.
Code:
--threads=cpu-threads --sb-size=64 --tile-columns=1
for even higher encoder side threading and some decoder side tile threading.

Code:
--threads=cpu-threads --sb-size=64 --tile-columns=2 --tile-rows=1
if you need best threading for decoding purposes, particularly at higher resolutions.

Code:
--threads=cpu-threads --tile-columns=2 --tile-rows=1
for >1080p resolutions

Code:
--threads=2 --sb-size=64
+ thread pinning if you use chunked encoding to give yourself better thread scaling.

Now, threading in aomenc. What an interesting subject.
Aomenc has access to these threading parameters:

- Row threading
- Tile threading
- Smaller task threading
- Frame-threading (experimental, so will not be tackled in this guide)

The AV1 standard has 2 superblock sizes: 64x64 and 128x128, with the larger one allowing for larger partitions at higher resolutions. Not very useful at standard HD resolutions (<=1080p), but it does exist for a good reason.

In aomenc, the default behavior is to dynamically choose between 64x64 and 128x128 superblocks. This is good, as very large static SBs and partitions might prove detrimental to speed and perceptual quality to a small extent. Another side effect of using larger SBs is that row threading gets less effective.

To balance it out, tile threading can be used, but as I've tested personally, the penalty for using static 64x64 SBs is lower than even adding just one additional tile column. So if you worry about encoder-side threading, force the encoder to use 64x64 SBs before adding tiles.

The main reason to add tiles would be to boost random access performance for the decoder, as frame threads are much higher latency than tile threads. Adding tiles boosts seeking performance.

Finally, tiles still follow the power of 2 rules. Therefore, `--tile-columns=1` = 2¹ = 2 tile columns.
The total number of tiles is dictated by: # of tile columns * # of tile rows = total number of tiles.
Thus, --tile-columns=2 --tile-rows=1 = 2² columns x 2¹ rows = 4x2 tiles = 8 tiles.
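
Since both flags are log2 values, the tile arithmetic above can be checked with a couple of bit shifts. A small sketch in Python (the helper name is hypothetical, not anything from aomenc):

```python
def total_tiles(tile_columns, tile_rows=0):
    """Total tile count for log2-style --tile-columns / --tile-rows values."""
    columns = 1 << tile_columns  # 2 ** tile_columns
    rows = 1 << tile_rows        # 2 ** tile_rows
    return columns * rows


print(total_tiles(1))     # --tile-columns=1
print(total_tiles(2, 1))  # --tile-columns=2 --tile-rows=1
```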

Rate control

Code:
--end-usage=q --cq-level=24
In aomenc, you have access to multiple rate control options.

The Q rate control mode is basically a modulated quantizer depending on spatial adaptive quantization, temporal-rdo, spatio-temporal AQ(deltaq-mode=1,2) and motion in general. Basically, its closest equivalent is CRF, so use it if you target maximum quality encodes without a bitrate limit.

CQ is Constrained Quality, meaning it's similar to Q, except it can't go as high in terms of quality because the quality is constrained by a bitrate limit. This is not recommended unless you have very specific requirements.

VBR and CBR are Variable and Constant Bitrate respectively. Unless you have a very recent aomenc build with the bitrate accuracy compiler flag enabled, I wouldn't recommend using them if you're trying to target a certain ratio of quality-bitrate.

As for cq-level, it is basically how you choose your base quality level/modulated quantizer. 24 is usually a good target for encoding at a decent quality. 20 is usually a good target for higher quality encoding, and 18 is where high quality encoding starts. 30 is where the threshold for low-mid quality starts and where aomenc-av1 really starts to pull away in front in quality/bitrate vs other encoders.

35-40 is where Youtube quality can be achieved without using more exotic settings. Anything higher is where the low quality threshold starts.

Note that these guidelines are all for 8-bit SDR live-action/animation sources. Very high motion and high contrast sources like video games have different requirements entirely, and that's not even mentioning native 10-bit HDR sources with larger color gamuts; for video games, I usually recommend upping the Q level by 10-15 above the usual recommendations to achieve similar bitrates compared to easier content. As for HDR sources, keep reading.

Bit-depth and chroma subsampling

Code:
--bit-depth=10
and whatever the source chroma subsampling is.

Internally, the encoder can process video in either 8-bit or 16-bit precision.
As for the coded output, the AV1 standard allows these bit-depths: 8-bit, 10-bit, and 12-bit.

I **always** recommend encoding in **10-bit**, particularly if your source is 4:2:0 YCbCr chroma subsampled limited range, even from an 8-bit source. So, most video sources currently found on the Internet.

Not only does encoding in 10-bit allow the encoder to process everything in 16-bit buffers(getting higher coding efficiency due to considerably less truncating/rounding off), but the much higher color depth allowed by 10-bit coding and output allows for a more perceptually efficient output, **particularly in darker shades where differences are more easily noticeable by the human eye and where dithering is more prominent.**

Also, since 8-bit YCbCr <> 8-bit RGB conversion is not lossless, unlike other transforms like YCoCg and XYB, 10-bit YCbCr allows for lossless RGB conversion to your screen.

As for other high bit-depth sources, keeping the same bit-depth is what is most optimal, especially if you value general HW decoder compatibility.

The same thing applies with chroma subsampling: unless you must support widespread HW decoders, keep the same chroma subsampling parameters as the source.

Encoding passes and lookahead

Code:
--lag-in-frames=48
(--passes=2 in aomenc is default, so no need to specify it).

2-pass was extremely important in vpxenc-vp9, as not only was it the only way for the encoder to utilize scene-detection, but it also allowed for the placement of alternate reference frames. Not doing that seriously cripples the encoder in what it can do. It also disables other stuff, but this also applies to aomenc-av1, so let's move on to the AV1 encoder again.

In aomenc-av1, 2-pass allows for these things in particular:
- More advanced scene detection when the lookahead buffer is high enough.
- Partition recoding: the encoder itself can decide whether or not to redo partition selection based on the preset on other conditions, resulting in better partition selection.
- Better auto-alt-ref placement through the encoded stream.

It also does some more advanced things, so I'd advise keeping it on if you can.

So yeah, always use 2-pass if you can. Luckily, it's set by default in the standalone encoder, so you don't need to do anything if you utilize a utility like nmkoder or av1an.

As for lookahead, it is controlled through a parameter that's called --lag-in-frames.

More lookahead in the form of lag-in-frames in aomenc gives you

- Better rate control.

- Better temporal-rdo.

- Better frame-placement.

- Generally more effective motion preservation due to a combination of previous and other factors.

In default aomenc, the range of lag-in-frames is 0-48, with the default being 35.
I always recommend setting it to 48, as it increases efficiency nicely without any significant penalties other than higher memory consumption.

Another effect of lag-in-frames is the kind of scene detection the encoder decides to choose.

0-18: No scene-detection.

19-32: Scene detection mode 1 is active(due to limited future frame prediction)

33 and higher: Scene detection mode 2 is active, since the large number of future references lets the encoder gather more information and use the highest level of scene detection present in aomenc.
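
The thresholds above can be written down as a small lookup. A sketch in Python; the function name is made up for illustration, and the boundaries are just the ones listed in this post, not something read out of the aomenc source:

```python
def scene_detection_mode(lag_in_frames):
    """Map --lag-in-frames to the scene detection mode described above."""
    if not 0 <= lag_in_frames <= 48:
        raise ValueError("lag-in-frames must be within 0-48 in default aomenc")
    if lag_in_frames <= 18:
        return 0  # no scene detection
    if lag_in_frames <= 32:
        return 1  # limited future frame prediction
    return 2      # highest level of scene detection


for lag in (0, 19, 35, 48):
    print(f"--lag-in-frames={lag} -> scene detection mode {scene_detection_mode(lag)}")
```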

Temporal filtering

Code:
--arnr-strength=2 --arnr-maxframes=3
for medium fidelity live-action.

Code:
--arnr-strength=1 --arnr-maxframes=3
for higher fidelity live-action. This will keep the temporal filtering on at low strength unless it decides it doesn't need it.

Code:
--arnr-strength=0
for animation.

Contrary to what I and many others believed, the arnr-maxframes=X parameter sadly does not affect the maximum number of alternate reference frames in the encoder's search space.

So, the settings written above affect temporal filtering, and nothing else. Interestingly enough, temporal filtering isn't exclusive to AV1 encoders: it can be found in other encoders for other standards and can even be found in some HW encoders, but that's a discussion for another day.

That means `--arnr-strength=X` affects the strength of the filtering itself.
Higher = stronger = fewer details/artifacts pass through at the same quantizer.

I am of the philosophy that less is more, and if you want more filtering, you want to use external filtering, which has way more dials to turn to tweak the output. However, the filtering within the encoder is simple, decently effective, and decently tied to the encoding process (which can cause some problems, however...): it lowers the filtering strength if your chosen quantizer is low enough. Of course, the adjustment itself isn't very large (1), so I prefer setting it lower myself.

As for arnr-maxframes, the trick is pretty simple: a lower number of frames gets you higher visual consistency, as with all spatio-temporal filtering, while a bigger filtering window gets you potentially higher quality filtering at the cost of a higher chance of temporal artifacts. I prefer a low number of frames to be used for temporal filtering for a more consistent look.

Animation is low variance by default, so there is no need to have temporal filtering on at all.

Spatial and spatio-temporal adaptive quantization

Code:
--aq-mode=1 --deltaq-mode=1
for low-mid fidelity encoding.

Code:
--aq-mode=1 --deltaq-mode=0
for higher fidelity and grainy encoding.

Code:
--aq-mode=1 --deltaq-mode=0 --enable-tpl-model=0
if you want the most stable grain possible, not the best one. You can also disable adaptive quantization for even more stable quantizer utilization, but at this time with default aomenc, I do not recommend doing that.

At very low bitrates, you can disable adaptive quantization entirely.

In aomenc, you have access to 3 spatial aq-modes:
  • aq-mode=1 is a variance based aq-mode, giving more bits to low variance blocks within SBs.
  • aq-mode=2 is a complexity based aq-mode, setting an AC bias(IE, high frequency varied pattern) to give more bits where high frequency detail is located.
  • aq-mode=3 is based on cyclic refresh AQ, giving more bits to moving spots within a mostly very static frame, such as in a video conference.

I pretty much always recommend aq-mode=1, since encoders are usually not very good at giving bits to low variance spots, and aomenc is no exception to that (in fact, I'd argue it's not very good at it in the 1st place). It would be nice if aq-mode=1 also had an AC bias like in x264/x265's aq-modes, but that's a topic for another day.

As for the spatio-temporal deltaq-mode=X options(1/2, 3/4 are meant for AVIF/all-intra currently), they do some things rather interestingly.

deltaq-mode=1 is spatio-temporal adaptive quantization based on objective metrics, working in tandem with temporal RDO (tpl-model) to get nice coding gains by deciding costs between inter and intra coding modes alongside temporal optimizations. Works well at low-mid bitrates, but at higher fidelity levels and especially grainy stuff, it can be a detriment to fidelity.

deltaq-mode=2 is supposed to be the perceptual version of this, but not only does it not work well currently, it also comes with a large speed penalty even at CPU-2/3, so I do not recommend using it at all as of March 2022.

Sorry if this is not the full post, but there are character limits on Doom9, and since I didn't post much until today, I need to wait for the mods' approval to post the 2nd part.

Last edited by BlueSwordM; 5th March 2022 at 06:38. Reason: Mistakes
Old 3rd March 2022, 01:35   #2  |  Link
BlueSwordM
Registered User
 
 
Join Date: Dec 2021
Posts: 9
Sharpness

Code:
--sharpness=0
for low fidelity encoding.

Code:
--sharpness=1
for anything approaching high fidelity. Don’t bother setting it higher in the mainline aomenc build, the aomenc devs ruined it in June of 2021.

Before June 2021, the sharpness parameter affected how End of Block(EoB) optimizations were done and how high the RD multiplier offset was set at(every sharpness uptick added +0.1 to the RD multiplier), which forced the encoder to utilize sharper transforms, leading to more of the original sharpness being kept, higher detail retention and most importantly, better clarity in high motion segments.

After June 2021, the aomenc devs decided to F everything up: while trying to make good changes, and mostly succeeding, they removed the RD multiplier offset entirely. That made `--sharpness=1` equal to `--sharpness=2-5`, making higher values practically useless under our noses, before some of us noticed and decided to change that BS behaviour in my aom-av1-psy fork.

Grain synthesis

Code:
--enable-dnl-denoising=0 --denoise-noise-level=5
if you use aomenc by itself

Code:
--film-grain-table=photon-noise-isoXXX.tbl
if you use the photon noise tool for the grain synthesis application.

Code:
--photon-noise=X
as an av1an parameter if you use av1an. X is the ISO level in hundreds: 1 = ISO 100, N = ISO N*100.

Since the grain synth guide is still valid, I’ll just copy paste it from my 3rd generation guide:

For --denoise-noise-level=XX (crappy name, I know), a higher number dictates a larger amount of noise. The default mode of operation (--enable-dnl-denoising=1) denoises the input in the 1st pass, after which the denoised stream is passed on to the encoder to do the rest of the job.

It does an ok job at grain synthesis, but because of the denoising pass, not only does the 1st pass become agonizingly slow, practically doubling the already lengthened encoding process, but it also gives a lower quality output than would be expected. That is why a new option in the form of giving the user control to disable that pesky denoising was added in 2020, being --enable-dnl-denoising=0.

This bypasses the denoiser entirely, restoring the normal 1st pass speed, making the normal encoding process a bit faster, and giving a higher quality output. In live-action content, it does quite well, which is why I always recommend enabling it for that kind of content. Of course, the grain synth process in aomenc is still not threaded, so it can still cause some problems, as it is a latency bottleneck.

For photon noise, I’d rather link directly to my still valid old guide since this post is getting long as is:
https://old.reddit.com/r/AV1/comment...is_tables_for/

Rate distortion tuning

Code:
--tune=psnr
This argument dictates what metric the encoder uses for rate distortion tuning. RT presets don’t use that at all. It also only affects RD calculations, nothing else in the encoder, which is why even the butteraugli RD tune can’t magically fix everything in the encoder. It certainly helps a lot, but it’s still not enough to turn it into x264.

The SSIM RD tune is indeed superior since it performs additional psy block distortion optimizations to distribute bitrate more evenly towards what we deem as higher quality. I recommend it somewhat for live-action, but I will repeat myself: do not use it for animation :P

The VMAF tunes are all bad except for `--tune=vmaf_without_preprocessing`, but it’s quite slow, so I wouldn’t use it.

The butteraugli tune is the best, but it currently only works in 8-bit and on Linux builds, so I’m not even going to mention it.

There is also another tune that is pretty decent and works on all OSes, but I will reserve that for another time.

Decoding optimizations

Code:
--enable-cdef=0 --enable-restoration=0
CDEF is a very smart very effective deringing filter, so keep it on unless you really think you don't need it for fidelity or for decoding purposes as mentioned down below.

Restoration filters are filters that aomenc can use to get back some detail lost by the encoding process, such as Wiener restoration filtering and self-guided restoration filtering. These are normally quite useful, and at higher bitrates, they usually back off in terms of strength quite nicely.

However, they can be decoding bottlenecks at high resolutions, so disabling them is a good idea there. I personally recommend disabling restoration filtering first, and if really needed, you can disable CDEF filtering completely as well. You could also disable the loop filtering, but doing that honestly is never a good idea unless you want your stream to look like x264 ultrafast.

Note: Starting at CPU-5, restoration filtering is disabled entirely, which is one of the main reasons CPU-5 is a decent bit faster vs CPU-4.

Miscellaneous arguments

--tune-content=default --- Leave this to the default tune unless you encode pure screen content(screen sharing or Peppa the Pig types of animation). For gaming, just leave the encoder to decide.

--enable-qm=1 --- This enables quantization matrices for aomenc. I have 0 idea why it’s not enabled by default, as it provides free psy and coding gains. Always leave it on no matter what. There are no penalties for enabling it. For reference, the default min-qm table is 5, and the default max-qm table is 9, which is a good choice of constants.

Smaller QM table = steeper quantization matrix(bigger differences between each step)
Bigger QM table = flatter quantization matrix(smaller differences between each step)

--quant-b-adapt=0/1 --- This parameter, unlike what I said in the previous guide, does not enable a special adaptive quantization flag. Instead, it enables further block optimizations for “trellis” optimization adaptively. Enabling it does increase efficiency, but it can decrease fidelity in some cases, but the fact that it’s not consistently doing so means it’s not bad for high fidelity. On or off doesn’t matter too much unless you’re at low bitrates, where enabling it does consistently help.

--enable-fwd-kf=1 --- This parameter enables bi-directional keyframes and open-GOP. Always leave it on since there aren’t any significant encoding or decoding penalties with it on. Even with the nature of chunked encoding causing bi-directional KFs to be much rarer, it still allows for open-GOP at the mini-GOP level to give a decent efficiency uplift.

--enable-chroma-deltaq=0 --- To those reading the previous guide, this might seem rather strange. Why would I recommend a parameter in the past that I’m not recommending anymore? Well, it’s because this parameter takes away chroma bits: specifically, it increases the Q by 2 for chroma channels. I thought it was the opposite for a long time. Why? It was meant for 4:4:4 sources and was never tweaked beyond that. It is actually very good for 4:4:4 sources where chroma resolution is plenty. For 4:2:0 sources where chroma data is scarce, utilizing such a parameter in default aomenc starves the chroma channels even more, creating even more distracting color artifacts. For that reason alone, I would not use it for video sources where 4:2:0 is the most prevalent chroma subsampling factor.

That might change in the near future, but that is currently not the case sadly.

--enable-keyframe-filtering=0/1/2
Use KF=2 if you can use av1an/nmkoder/aomenc-by-gop with MKVToolnix/MKVMerge to merge the clips, as it is the most efficient.

--enable-keyframe-filtering=1 --arnr-strength=1 if you want to avoid the dreaded KF=1 low probability random BS artifacts (unless you use the aom-av1-psy build, which manages to fix them in a smart way), and KF=0 if you want to avoid all of that at a significant efficiency penalty.

--profile=0/1/2 --- profile 0 for 8-bit/10-bit 4:2:0, profile 1 for 8-bit/10-bit 4:4:4, profile 2 for 4:2:2 and for anything 12-bit.
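
If you script your encodes, the profile can be picked automatically from the source format. A minimal sketch in Python, following the AV1 profile definitions (Main, High, Professional); the function name is hypothetical and this is an illustration, not something aomenc ships:

```python
def av1_profile(bit_depth, chroma):
    """Pick the --profile value for a given bit depth and chroma subsampling."""
    if bit_depth not in (8, 10, 12):
        raise ValueError("AV1 only allows 8, 10 or 12 bits")
    if chroma not in ("4:0:0", "4:2:0", "4:2:2", "4:4:4"):
        raise ValueError(f"unknown chroma subsampling: {chroma}")
    if bit_depth == 12 or chroma == "4:2:2":
        return 2  # Professional: all 12-bit formats and all 4:2:2
    if chroma == "4:4:4":
        return 1  # High: 8/10-bit 4:4:4
    return 0      # Main: 8/10-bit 4:2:0 and monochrome


print(av1_profile(10, "4:2:0"))  # typical 10-bit encode from this guide
```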

HDR encoding and metadata
Code:
 --deltaq-mode=5 --color-primaries=bt2020 --transfer-characteristics=smpte2084 --matrix-coefficients=bt2020ncl
These are the usual arguments for 10-bit HDR BT2020 sources, as it is the most common way to get HDR.
--deltaq-mode=5 is a deltaq mode that adjusts the luma and chroma quantizer in blocks according to the block luma average in HDR, as defined in T-REC-H.Sup15.

Sorry for the much bigger walls of text, but I’ve amassed an immense amount of knowledge and experience ever since I’ve written the 1st aomenc-av1 guide, and as such, I had to be much more thorough in my writing, while also correcting my previous rather naive mistakes caused by my lack of knowledge in the encoder and the standard itself. I’m actually surprised no one tried to correct me until a few months ago, which is when I started to write the 2nd generation aomenc-av1 guide.

Important note: These parameters are all meant for the mainline aomenc build. My current aom-av1-psy build is an entirely different monster that deserves its own separate post since half of the post would be a rant.

Now, for the pièce de résistance: the settings you’ve been waiting for all along!


Settings for standalone aomenc that I use with default aomenc(mostly for chunked encoding in av1an/nmkoder with thread pinning and aomenc-by-gop) at 1080p:
Code:
--threads=2 --cpu-used=3 --end-usage=q --cq-level=24 --enable-fwd-kf=1 --aq-mode=1 --lag-in-frames=48 --bit-depth=10 --kf-max-dist=240 --kf-min-dist=12 --enable-qm=1 --sb-size=64 --enable-keyframe-filtering=2 --arnr-strength=2 --arnr-maxframes=3 --sharpness=1 --enable-dnl-denoising=0 --denoise-noise-level=5
Higher fidelity using aomenc in chunked encoding at 1080p:
Code:
--threads=2 --cpu-used=3 --end-usage=q --cq-level=18 --enable-fwd-kf=1 --aq-mode=1 --lag-in-frames=48 --bit-depth=10 --kf-max-dist=240 --kf-min-dist=12 --enable-qm=1 --sb-size=64 --enable-keyframe-filtering=2 --arnr-strength=1 --arnr-maxframes=3 --deltaq-mode=0 --sharpness=1 --enable-dnl-denoising=0 --denoise-noise-level=5

Highest fidelity

Code:
--threads=2 --cpu-used=3 --end-usage=q --cq-level=16 --enable-fwd-kf=1 --aq-mode=1 --lag-in-frames=48 --bit-depth=10 --kf-max-dist=240 --kf-min-dist=12 --enable-qm=1 --sb-size=64 --enable-keyframe-filtering=2 --arnr-strength=1 --arnr-maxframes=3 --enable-restoration=0 --deltaq-mode=0 --sharpness=1 --enable-dnl-denoising=0 --denoise-noise-level=5
If you want to probe the stream with ffmpeg until the ffmpeg folks fix the KF=2 behavior:
Code:
--threads=2 --cpu-used=3 --end-usage=q --cq-level=18 --enable-fwd-kf=1 --aq-mode=1 --lag-in-frames=48 --bit-depth=10 --kf-max-dist=240 --kf-min-dist=12 --enable-qm=1 --sb-size=64 --arnr-strength=1 --arnr-maxframes=3 --deltaq-mode=0 --sharpness=1 --enable-dnl-denoising=0 --denoise-noise-level=5
If you’re using chunked encoding and lack enough RAM for more workers, you can increase the threads parameter to --threads=4.

If you’re encoding at higher resolutions, you can up that to 8 threads, discard grain synthesis if you like since you’re using higher bitrates, and up the parameter `--tile-columns` to `--tile-columns=1` and at 4k, `--tile-columns=2 --tile-rows=1` to gain maximum decoding performance.

For 2D animation, just setting `--arnr-strength` to `--arnr-strength=0` is your best bet.
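
Forum software and word processors love to turn `--` into an en-dash, which aomenc will not parse. One way around that is to keep the settings in a script and generate the flag string from there. A rough sketch in Python; the helper name is made up, and the values are simply the 1080p settings from this post:

```python
def aomenc_flags(settings):
    """Render a dict of option -> value as a list of --option=value strings."""
    flags = []
    for name, value in settings.items():
        flag = f"--{name}={value}"
        # Guard against en-dashes or other non-ASCII debris from copy-pasting.
        if not flag.isascii():
            raise ValueError(f"non-ASCII character in {flag!r}")
        flags.append(flag)
    return flags


settings = {
    "threads": 2, "cpu-used": 3, "end-usage": "q", "cq-level": 24,
    "enable-fwd-kf": 1, "aq-mode": 1, "lag-in-frames": 48, "bit-depth": 10,
    "kf-max-dist": 240, "kf-min-dist": 12, "enable-qm": 1, "sb-size": 64,
    "enable-keyframe-filtering": 2, "arnr-strength": 2, "arnr-maxframes": 3,
    "sharpness": 1, "enable-dnl-denoising": 0, "denoise-noise-level": 5,
}
print(" ".join(aomenc_flags(settings)))
```

For animation, you would only change `"arnr-strength"` to 0 in the dict and regenerate the string.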

If you like to encode using ffmpeg, here are some base parameters you can play with(use 2-pass ffmpeg please if you want the most optimal encoding with aomenc; for simple encoding, just use SVT-AV1):
Code:
ffmpeg -i input.mkv -c:v libaom-av1 -cpu-used 3 -threads 8 -crf 18 -arnr-max-frames 3 -arnr-strength 1 -aq-mode 1 -lag-in-frames 48 -tile_columns 1 -aom-params sb-size=64:enable-qm=1:enable-dnl-denoising=0:deltaq-mode=0:denoise-noise-level=5 -g 240 -keyint_min 12 -pix_fmt yuv420p10le -c:a copy output.mkv

If you have any additional questions or any corrections/clarification you would like for me to add in, please leave them below.
Criticisms welcome.

Last edited by BlueSwordM; 3rd March 2022 at 18:10. Reason: Clarification
Old 3rd March 2022, 06:51   #3  |  Link
Blue_MiSfit
Derek Prestegard IRL
 
 
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,926
This is great info, thanks for sharing! AOM is a bit obtuse and there's not a lot of great documentation.
__________________
These are all my personal statements, not those of my employer :)
Old 3rd March 2022, 15:14   #4  |  Link
rbauer
Registered User
 
Join Date: Sep 2010
Posts: 29
Quote:
Originally Posted by BlueSwordM View Post
Decoding optimizations

Code:
--enable-cdef=0 --enable-restoration=0
CDEF is a very smart very effective deringing filter, so keep it on unless you really need it.
You mean "so keep it off", right?


Many thanks

Last edited by rbauer; 3rd March 2022 at 15:41.
Old 3rd March 2022, 18:03   #5  |  Link
BlueSwordM
Registered User
 
 
Join Date: Dec 2021
Posts: 9
@rbauer, you are correct, but not in the sense that I was trying to say it :P
Old 4th March 2022, 06:42   #6  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,116
Great article! Very useful to the community.

I was bedeviled by one missing verb, though:
Quote:
for video games, I usually recommend the Q level by 10-15 above the usual recommendations to achieve similar bitrates compared to easier content.
Raising? Lowering?
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 4th March 2022, 16:15   #7  |  Link
rbauer
Registered User
 
Join Date: Sep 2010
Posts: 29
Quote:
Originally Posted by BlueSwordM View Post
@rbauer, you are correct, but not in the sense that I was trying to say it :P
I think I didn't understand correctly (yep, evidently my english sucks): the recommended value for
"--enable-cdef" is on (--enable-cdef=1) or off (--enable-cdef=0)?

Thanks
Old 5th March 2022, 06:39   #8  |  Link
BlueSwordM
Registered User
 
 
Join Date: Dec 2021
Posts: 9
@benwaggoner, the verb that was missing was raising the Q/CRF.

@rbauer, the recommended setting is keeping it on, AKA --enable-cdef=1

Anyway, that's all from me now.
I have to finish an article for a website, and my 5th post about aom-av1-psy.
Old 16th March 2022, 22:29   #9  |  Link
rbauer
Registered User
 
Join Date: Sep 2010
Posts: 29
Quote:
Originally Posted by BlueSwordM View Post
If you like to encode using ffmpeg, here are some base parameters you can play with(use 2-pass ffmpeg please if you want the most optimal encoding with aomenc; for simple encoding, just use SVT-AV1):
Code:
ffmpeg -i input.mkv -c:v libaom-av1 -cpu-used 3 -threads 8 -crf 18 -arnr-max-frames 3 -arnr-strength 1 -aq-mode 1 -denoise-noise-level=5 -lag-in-frames 48 -tile_columns 1 -aom-params sb-size=64:enable-qm=1:enable-dnl-denoising=0:deltaq-mode=0 g 240 -keyint_min 12 -pix_fmt yuv420p10le -c:a copy
Unfortunately "denoise-noise-level=5" doesn't work.


I'm testing the latest ffmpeg (64bit; Latest Auto-Build, 2022-03-16, with libaom-av1 3.3.0. Win10-64bit): trying to transcode an mp4 file (x264 video and aac audio).
I want to transcode the video stream and just copy the audio stream to a av1/mkv file.

The original video stream is 15 fps (power point slides, screen sharing, remote college lessons, etc.).

Command line:
Code:
>ffmpeg -i Test.mp4 -c:v libaom-av1 -pix_fmt yuv420p10le -cpu-used 3 -threads 6
-crf 41 -arnr-max-frames 3 -arnr-strength 1 -aq-mode 1 -denoise-noise-level=5
-lag-in-frames 48 -tile_columns 1 -aom-params
sb-size=64:enable-qm=1:enable-dnl-denoising=0:deltaq-mode=0:quant-b-adapt=1:
enable-keyframe-filtering=1:sharpness=1 -g 150 -keyint_min 12 -c:a copy Test.mkv
It doesn't work: "Unrecognized option 'denoise-noise-level=5'. Error splitting the argument list: Option not found".



This works:
Code:
>ffmpeg -i Test.mp4 -c:v libaom-av1 -pix_fmt yuv420p10le -cpu-used 3 -threads 6
-crf 41 -arnr-max-frames 3 -arnr-strength 1 -aq-mode 1 -lag-in-frames 48
-tile_columns 1 -aom-params sb-size=64:enable-qm=1:enable-dnl-denoising=0:
deltaq-mode=0:denoise-noise-level=5:quant-b-adapt=1:
enable-keyframe-filtering=1:sharpness=1 -g 150 -keyint_min 12 -c:a copy Test.mkv

Unfortunately "enable-fwd-kf=1" doesn't work both in the ffmpeg cl (-enable-fwd-kf=1) and in the -aom-params (:enable-fwd-kf=1): "Unrecognized option 'enable-fwd-kf=1'. Error splitting the argument list: Option not found".



I should probably use ffmpeg just to pass the file content (rawvideo or similar, I suppose) to your aomenc variant (aom-av1-psy). Could you kindly suggest a good combination of ffmpeg and aom-av1-psy for my use case?


Many Thanks
Old 28th May 2022, 04:25   #10  |  Link
BuccoBruce
Registered User
 
Join Date: Apr 2022
Posts: 14
Curious what you think! I get the feeling I'm just thinking through something you've already considered?

Quote:
Originally Posted by BlueSwordM
For photon noise, I’d rather link directly to my still valid old guide since this post is getting long as is:
https://old.reddit.com/r/AV1/comment...is_tables_for/
Quote:
Who knows, I might even write a TPDF grain synthesis noise tool to make tables that are more accurate for 2D animated sources :P
I've been thinking about this since I first read your guides. And it looks like now I might finally have my answer. And you might be far closer to making that kind of grain synthesis tool than you think!

Quote:
Originally Posted by BuccoBruce
Is there any way to "trick" aomenc, or preferably aom-av1-psy, into utilizing grain synthesis by analyzing a grainy, untouched source but actually encoding video that was de-grained or de-noised externally?
Quote:
Originally Posted by Beelzebubu
Check examples/noise_model, it can create a grain table which represents the difference between a "noisy" and "denoised" source, and the resulting grain table can be input into aomenc using the --film-grain-table argument.
Quote:
Originally Posted by BuccoBruce
Thank you for pointing me in the right direction!

"/examples/noise_model.c"? Oh God, it's code!

All joking aside, looking at the comments in the code (wow, this is really well documented...) this looks like exactly what I described! If I'm reading CMakeLists.txt correctly, I just need to make sure to add -DENABLE_EXAMPLES=1 to my cmake command: first cmake .. -G "Visual Studio 16 2019" -DENABLE_EXAMPLES=1 and then cmake --build . from the build directory. Easy enough, I'll have a noise_model.exe waiting for me in an "examples" subfolder in my build directory, right?
Unless the resulting AV1 encoded with such a grain table is absolutely massive in comparison, all that would be missing from a more practical tool (beyond the obvious optimization - I don't expect something I found in the "examples" folder to run quickly) would be to port or write something like msg7086's AVISynth input modification to x265 (also in DJ Atom's x265 fork). I'm pretty sure the Yuuki branch is where it's easiest to see how they first added input functionality. I think Patman also has an x264 fork with AVS/VPY input support.

I tried looking through how AOMEnc handles input, and it does already have extra..."bits" for reading IVF, OBU, and WEBM, but I imagine it's only aomdec.exe that uses them. I'd try to tackle it myself if I knew even a "hello world" level of C/C++, but alas, even VapourSynth intimidates me. If it's not a batch or shell script, I probably can't figure out how it works.

As to why it's the only thing keeping what's already there from being practical...unless AOMEnc's grain synthesis function/table implementation while encoding is lacking - there's no easy way to pipe two different videos that I'm aware of. FFASTrans seems really fascinating but I don't think it can do it either.

Piping one video? Easy. ffmpeg, or any of the avs2xyz pipes out there. in | out.

Two or more videos? Can they be AVIs? Rename to .avi if it's something that can use VFW. Or if ffmpeg is in there somewhere, compile it with AVISynth input support. Else, use something like AVFS...

Two or more videos as .yuv? I heard you liked filling up hard drives!

1,382,400 Y/U/V samples in a 1280x720 4:2:0 frame x 10 bits per sample = 13,824,000 bits/frame...nearly 14 Mbit per frame. x 23.976 fps / 8 bits per byte, blah blah blah, 41,430,528 bytes/sec. Oh. 40 MB/s isn't going to kill a mechanical hard drive, but it will fill it up pretty quickly. 23 GB to store a 10 minute 720p uncompressed "intermediate". 88.9 MB/s for 1080p. Uncompressed YUV is massive. And you better have multiple drives available for this, unless you're considering gobbling up all of those SSD write cycles for a few hundred gigabytes of raw video with sweet sweet random access time...
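
The arithmetic above can be sketched in a few lines of Python. The function names are hypothetical, and the calculation assumes, as the post does, that each sample takes exactly 10 bits on disk. Formats such as yuv420p10le store each 10-bit sample in 16 bits, so real intermediates in that format would be larger still.

```python
def yuv420_samples_per_frame(width, height):
    """Luma samples plus two half-width, half-height chroma planes (4:2:0)."""
    return width * height + 2 * (width // 2) * (height // 2)


def raw_bytes_per_second(width, height, fps, bits_per_sample=10):
    """Data rate of uncompressed 4:2:0 video.

    Assumes samples are packed at exactly bits_per_sample bits each,
    matching the figures in the post.
    """
    bits_per_frame = yuv420_samples_per_frame(width, height) * bits_per_sample
    return bits_per_frame * fps / 8


if __name__ == "__main__":
    for name, (w, h) in {"720p": (1280, 720), "1080p": (1920, 1080)}.items():
        rate = raw_bytes_per_second(w, h, 23.976)
        ten_minutes = rate * 600
        print(f"{name}: {rate / 2**20:.1f} MiB/s, "
              f"{ten_minutes / 2**30:.1f} GiB per 10 minutes")
```

Read in binary units (MiB and GiB), this lines up with the roughly 40 MB/s, 23 GB, and 88.9 MB/s figures quoted above.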

So it already exists. It's just extremely impractical for anything but professional use.

Quote:
Originally Posted by BuccoBruce
Needing two .yuv inputs is definitely going to limit what I can do with this. Using ffmpeg itself to pipe .yuv is easy enough, but even then I'd still need to hog ~60 GB up (at least for what I want to test out) somewhere for keeping the degrained output around as raw .yuv. Normally if I needed to access filtered output twice I'd just use FFV1 or something, and it would save time to filter once and then re-use that same output for grain analysis and encoding, but that doesn't help me much. I'd still need a way to pipe FFV1 -> YUV to the grain analysis alongside source -> YUV.

Now I can see why those x264/x265 mods that can read AviSynth input directly exist.
I want to try it out on something anyways once it's done compiling. I can't remember if there's an exception here for short sample clips of copyrighted material, so unless there are Looney Tunes shorts on any of the Blu-Rays I own that are already in the public domain, which I doubt, then I might have to try (and share) on something else. I don't know if any test sample clips have grain similar to what I'd be testing.
Old 29th May 2022, 04:03   #11  |  Link
BlueSwordM
Registered User
 
Join Date: Dec 2021
Posts: 9
@BuccoBruce, yeah it would work, but the main problem with aomenc's current grain synth implementation (and SVT-AV1's to a lesser extent) is that it's not very good.

That is mostly due to a lack of dynamic strength control with the normal video toolset and a suboptimal default random seed for noise generation (although that can be fixed on the code side funnily enough, so not a problem).

aomenc does have a grain synth estimation tool within the all-intra tools, but it was a bit buggy with non-all-intra sources the last time I tried it.

Last edited by BlueSwordM; 29th May 2022 at 05:32. Reason: Clarification and corrections
Old 29th May 2022, 05:08   #12  |  Link
BuccoBruce
Registered User
 
Join Date: Apr 2022
Posts: 14
Quote:
Originally Posted by BlueSwordM
@BuccoBruce, yeah it would work, but the main problem with aomenc's current grain synth implementation (and SVT-AV1's to a lesser extent) is that it's not very good.

aomenc does have a grain synth estimation tool within the all-intra tools, but it's a bit buggy when trying to get it to work for non all-intra sources.
Quote:
Originally Posted by BuccoBruce
It's kinda fugly but it's a lot better than I was expecting. Hideous if you seek by frame but pretty passable if you're just watching it. I had to use two .yuv intermediates to make it work.
All times are GMT +1.