Alliance for Open Media codecs [Archive] - Page 28

NikosD

25th December 2018, 08:46

So, AVX2 is missing only 4:4:4 and SVC but SSSE3 is missing everything (almost) for 8bit.

Thank you!

utack

25th December 2018, 19:43

Did a quick test for some typical "sent by phone video". Shot on a phone, 30s, medium resolution and bitrate
x264 crf 26 and "placebo" to get a bitrate estimate for medium-poor quality, libaom cpu-used=3 in 2pass mode to match the bitrate and compare
Turns out for this medium resolution (720p), and with a lot of "high frequency" motion (water waves and grass) x264 is still extremely competitive, and imho even beats libaom here in 1/3 screenshots
http://screenshotcomparison.com/comparison/126513

hajj_3

29th December 2018, 11:28

It looks like some new SSSE3 optimisations for Dav1d have been submitted: https://code.videolan.org/videolan/dav1d/commit/9ea56386dee2706d94f3c2dac1720bcf4961aaba

hajj_3

30th December 2018, 12:04

what AV1 decoder does the latest MPC-BE x64 use (v1.5.3 4246 beta)? When playing a 720p 25fps AV1 video it uses up to 61% cpu on my kabylake i3-7100u, using vlc player 3.0.5 (which uses dav1d) it uses up to 35% playing the same video.

v0lt

30th December 2018, 20:12

@hajj_3
MPC-BE used libaom git-v1.0.0-748-g8048e8c0b.
https://sourceforge.net/p/mpcbe/code/HEAD/tree/trunk/lib64/

marcomsousa

3rd January 2019, 09:16

@hajj_3
MPC-BE used libaom git-v1.0.0-748-g8048e8c0b.
https://sourceforge.net/p/mpcbe/code/HEAD/tree/trunk/lib64/

Just updated to libaom git-v1.0.0-1116-g00c80e6b5 (3 hours ago); the next build will include it.

benwaggoner

4th January 2019, 20:45

Did a quick test for some typical "sent by phone video". Shot on a phone, 30s, medium resolution and bitrate
x264 crf 26 and "placebo" to get a bitrate estimate for medium-poor quality, libaom cpu-used=3 in 2pass mode to match the bitrate and compare
Turns out for this medium resolution (720p), and with a lot of "high frequency" motion (water waves and grass) x264 is still extremely competitive, and imho even beats libaom here
Water waves and grass are really hard to encode, and classic per-frame PSNR or SAD style optimizations don't yield good results. There's a lot of psychovisual tuning to keep the motion looking natural without block-based basis patterns leaking in. And a lot of rate control to keep a part of a frame with that content looking good without sucking all the bits away from the rest of the frame and making it look bad.

That's the kind of stuff that comes from a mature encoder with lots of psychovisual tweaks. Which defines x264 in spades, and which x265 inherited a lot of. The real-world performance of those encoders has more to do with the foundational legacy of loving obsessive attention from quality @ bitrate obsessed video pirates than any particular underlying bitstream features. I bet an x262 could have outperformed any MPEG-2 encoder for anime DVDs, for example.

utack

4th January 2019, 23:13

Water waves and grass are really hard to encode, and classic per-frame PSNR or SAD style optimizations don't yield good results. There's a lot of psychovisual tuning to keep the motion looking natural without block-based basis patterns leaking in. And a lot of rate control to keep a part of a frame with that content looking good without sucking all the bits away from the rest of the frame and making it look bad.

That's the kind of stuff that comes from a mature encoder with lots of psychovisual tweaks. Which defines x264 in spades, and which x265 inherited a lot of. The real-world performance of those encoders has more to do with the foundational legacy of loving obsessive attention from quality @ bitrate obsessed video pirates than any particular underlying bitstream features. I bet an x262 could have outperformed any MPEG-2 encoder for anime DVDs, for example.

Thanks for your insight.
Would you attribute this mostly to excellent psychovisual tuning or are video streams of small dimensions with a lot of motion and 4x4 blocks areas where AV1 might always be much better even in a theoretical best case scenario?

benwaggoner

5th January 2019, 21:10

Thanks for your insight.
Would you attribute this mostly to excellent psychovisual tuning or are video streams of small dimensions with a lot of motion and 4x4 blocks areas where AV1 might always be much better even in a theoretical best case scenario?
H.264 and HEVC both have 4x4 blocks as well, so that feature alone isn’t going to be make-or-break.

As for comparing formats, the codec specs are like what you have in your fridge. The encoder is like the cook. A great cook can make simple ingredients into something wonderful, and a terrible cook can make a disaster out of the best ingredients. A great cook with a wide variety of great ingredients is what gives the optimal results.

In comparing codecs, all we really can compare is the dishes that come out of the kitchen, though. Is a meal great or bad due to the cook or the ingredients? It’s hard to say and involves a lot of educated guesses and speculation.

For example, x264 with --preset placebo --tune film is probably going to produce better quality @ bitrate with typical content than libaom at its absolute fastest settings. It’s really quality @ perf @ bitrate, and that’s controlled by encoder optimization even more than the bitstream format. Stuff like AVX2 optimization will produce better quality within a given bitrate @ perf, because more options get tried and tools get used. And that’s with absolutely no change to psychovisual tuning or bitstream. It’s just the same results, faster.

Of course even that can be impacted by bitstream details. The bigger block sizes of HEVC mean that AVX2 and AVX512 offer bigger gains than with x264. Even choice of processor can change relative quality @ perf @ bitrate, as different encoders make better or worse use of lots of cores or more advanced SIMD.

We can only really know how “good” HEVC or AV1 or VVC are based on the best available encoder for a given use case. And that can be hard to predict. Certainly cable MPEG-2 is a lot more efficient today than anyone predicted or could demonstrate when MPEG-2’s spec was finished.

hajj_3

7th January 2019, 13:08

http://www.streamingmedia.com/Articles/News/Online-Video-News/Unified-Patents-Challenges-Velos-Media-Patent-128870.aspx

LigH

7th January 2019, 13:18

I was afraid to click an uncommented URL ... but did anyway.

This article is about HEVC licensing. Only marginally related to AOM.

TomV

16th January 2019, 01:42

Hey AV1 experts... can you share your command line for your highest subjective quality encodes? In other words, what's your recommended settings for the equivalent of --preset veryslow?

Tommy Carrot

16th January 2019, 12:57

The preset equivalent for aomenc is --cpu-used. Contrary to what would be logical, it has nothing to do with threading. --cpu-used=0 is the equivalent of preset placebo, while 8 is fastest (more precisely the least slow) preset. I would not go over 6, after that the quality regression is really noticeable.
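As a rough illustration of where --cpu-used sits in a command line, here is a minimal Python sketch that only builds the two argument lists for a two-pass encode. The flags are the ones shown in the aomenc help output later in this thread; the function name, file names and default bitrate are hypothetical, and nothing is executed.

```python
import os
import shlex


def aomenc_two_pass(src, dst, stats, cpu_used=1, bitrate_kbps=800):
    """Return the argument lists for pass 1 and pass 2 of a two-pass VBR encode.

    cpu_used is the speed/quality preset (0 = slowest, 8 = least slow);
    it has nothing to do with threading.
    """
    if not 0 <= cpu_used <= 8:
        raise ValueError("--cpu-used must be in 0..8")
    common = [
        "aomenc",
        "--passes=2",
        f"--fpf={stats}",            # first-pass statistics file
        f"--cpu-used={cpu_used}",
        "--end-usage=vbr",
        f"--target-bitrate={bitrate_kbps}",
    ]
    # Pass 1 only gathers statistics, so its output is discarded.
    first = common + ["--pass=1", "-o", os.devnull, src]
    second = common + ["--pass=2", "-o", dst, src]
    return first, second


# Hypothetical file names, for illustration only.
for args in aomenc_two_pass("input.y4m", "output.ivf", "input.stats", cpu_used=1):
    print(shlex.join(args))
```

The printed lines could be pasted into a shell or handed to subprocess.run; whether the other defaults suit a given clip is a separate question.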

utack

16th January 2019, 13:24

I would not go over 6, after that the quality regression is really noticeable.

What kind of SSIM difference is noticeable?
Speed 1, 5 and 8 (https://www.arewecompressedyet.com/?job=av1_sp1_010819%402019-01-11T15%3A41%3A56.697Z&job=av1_sp5_011019%402019-01-11T20%3A49%3A41.789Z&job=av1_sp8_011019%402019-01-11T19%3A26%3A13.348Z)

Tommy Carrot

16th January 2019, 16:11

I'm talking about visual quality. --cpu-used 7 and 8 are looking significantly worse than 6, while the encoding speed doesn't improve that much.

benwaggoner

16th January 2019, 18:51

Hey AV1 experts... can you share your command line for your highest subjective quality encodes? In other words, what's your recommended settings for the equivalent of --preset veryslow?
Probably more like --preset veryslow --tune film (or grain, or animation). The psychovisual tuning that’s implicit in x264/5’s RF model, and explicit in --tune and other parameters, is an important component.

benwaggoner

16th January 2019, 18:55

What kind of SSIM difference is noticeable?
Speed 1, 5 and 8 (https://www.arewecompressedyet.com/?job=av1_sp1_010819%402019-01-11T15%3A41%3A56.697Z&job=av1_sp5_011019%402019-01-11T20%3A49%3A41.789Z&job=av1_sp8_011019%402019-01-11T19%3A26%3A13.348Z)
Mean of per frame SSIM is not a great metric, honestly. The mean of any per-frame metric isn’t going to catch variability of quality, and any spatial-only metric will be bad at catching frame strobing or other visual discontinuities.

The latest VMAF is the least-bad metric available, by a good margin. But even it isn’t very useful at comparing between high quality encodes and some kinds of artifacts.

Encoding history is littered with encoders with promising PSNR and SSIM scores that just didn’t look very good.
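To make the point about per-frame means concrete, here is a small Python sketch. The scores are made up for illustration (they are not measurements of any encoder), and the helper name is hypothetical. Two sequences share the same mean, yet one is steady and the other alternates between good and bad frames.

```python
from statistics import mean


def summarize(scores):
    """Mean, worst frame, and largest frame-to-frame jump of per-frame scores."""
    jumps = [abs(b - a) for a, b in zip(scores, scores[1:])]
    return {"mean": mean(scores), "worst": min(scores), "max_jump": max(jumps)}


# Hypothetical per-frame scores on a 0-100 scale.
steady = [80, 80, 80, 80, 80, 80]
strobing = [95, 65, 95, 65, 95, 65]

print(summarize(steady))
print(summarize(strobing))
```

Both sequences average 80, so a mean-only report cannot tell them apart; the worst-frame value and the frame-to-frame jump are the numbers that separate them.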

TomV

17th January 2019, 17:14

The preset equivalent for aomenc is --cpu-used. Contrary to what would be logical, it has nothing to do with threading. --cpu-used=0 is the equivalent of preset placebo, while 8 is fastest (more precisely the least slow) preset. I would not go over 6, after that the quality regression is really noticeable.

Yes, I'm aware of that. I've been using --cpu-used=1 Thanks.

TomV

17th January 2019, 17:20

Probably more like --preset veryslow --tune film (or grain, or animation). The psychovisual tuning that’s implicit in x264/5’s RF model, and explicit in --tune and other parameters, is an important component.

--preset is not an option (I get an error message)
There is a --tune option (psnr, ssim, cdef-dist, daala-dist), and a --tune-content option (default, screen)

Here's the help file. It seems like every option that would improve visual quality is on by default.

C:\Test>aomenc --help
Usage: aomenc <options> -o dst_filename src_filename

Options:
--help Show usage options and exit
-c <arg>, --cfg=<arg> Config file to use
-D, --debug Debug mode (makes output deterministic)
-o <arg>, --output=<arg> Output filename
--codec=<arg> Codec to use
-p <arg>, --passes=<arg> Number of passes (1/2)
--pass=<arg> Pass to execute (1/2)
--fpf=<arg> First pass statistics file name
--limit=<arg> Stop encoding after n input frames
--skip=<arg> Skip the first n input frames
--good Use Good Quality Deadline
-q, --quiet Do not print encode progress
-v, --verbose Show encoder parameters
--psnr Show PSNR in status line
--webm Output WebM (default when WebM IO is enabled)
--ivf Output IVF
--obu Output OBU
--q-hist=<arg> Show quantizer histogram (n-buckets)
--rate-hist=<arg> Show rate histogram (n-buckets)
--disable-warnings Disable warnings about potentially incorrect encode settings.
-y, --disable-warning-prompt Display warnings, but do not prompt user to continue.
--test-decode=<arg> Test encode/decode mismatch
off, fatal, warn

Encoder Global Options:
--yv12 Input file is YV12
--i420 Input file is I420 (default)
--i422 Input file is I422
--i444 Input file is I444
-u <arg>, --usage=<arg> Usage profile number to use
-t <arg>, --threads=<arg> Max number of threads to use
--profile=<arg> Bitstream profile number to use
-w <arg>, --width=<arg> Frame width
-h <arg>, --height=<arg> Frame height
--forced_max_frame_width Maximum frame width value to force
--forced_max_frame_height Maximum frame height value to force
--stereo-mode=<arg> Stereo 3D video format
mono, left-right, bottom-top, top-bottom, right-left
--timebase=<arg> Output timestamp precision (fractional seconds)
--fps=<arg> Stream frame rate (rate/scale)
--global-error-resilient=< Enable global error resiliency features
-b <arg>, --bit-depth=<arg> Bit depth for codec (8 for version <=1, 10 or 12 for version 2)
8, 10, 12
--lag-in-frames=<arg> Max number of frames to lag
--large-scale-tile=<arg> Large scale tile coding (0: off (default), 1: on)
--monochrome Monochrome video (no chroma planes)
--full-still-picture-hdr Use full header for still picture

Rate Control Options:
--drop-frame=<arg> Temporal resampling threshold (buf %)
--resize-mode=<arg> Frame resize mode
--resize-denominator=<arg> Frame resize denominator
--resize-kf-denominator=<a Frame resize keyframe denominator
--superres-mode=<arg> Frame super-resolution mode
--superres-denominator=<ar Frame super-resolution denominator
--superres-kf-denominator= Frame super-resolution keyframe denominator
--superres-qthresh=<arg> Frame super-resolution qindex threshold
--superres-kf-qthresh=<arg Frame super-resolution keyframe qindex threshold
--end-usage=<arg> Rate control mode
vbr, cbr, cq, q
--target-bitrate=<arg> Bitrate (kbps)
--min-q=<arg> Minimum (best) quantizer
--max-q=<arg> Maximum (worst) quantizer
--undershoot-pct=<arg> Datarate undershoot (min) target (%)
--overshoot-pct=<arg> Datarate overshoot (max) target (%)
--buf-sz=<arg> Client buffer size (ms)
--buf-initial-sz=<arg> Client initial buffer size (ms)
--buf-optimal-sz=<arg> Client optimal buffer size (ms)

Twopass Rate Control Options:
--bias-pct=<arg> CBR/VBR bias (0=CBR, 100=VBR)
--minsection-pct=<arg> GOP min bitrate (% of target)
--maxsection-pct=<arg> GOP max bitrate (% of target)

Keyframe Placement Options:
--enable-fwd-kf=<arg> Enable forward reference keyframes
--kf-min-dist=<arg> Minimum keyframe interval (frames)
--kf-max-dist=<arg> Maximum keyframe interval (frames)
--disable-kf Disable keyframe placement

AV1 Specific Options:
--cpu-used=<arg> CPU Used (0..8)
--auto-alt-ref=<arg> Enable automatic alt reference frames
--sharpness=<arg> Loop filter sharpness (0..7)
--static-thresh=<arg> Motion detection threshold
--row-mt=<arg> Enable row based multi-threading (0: off, 1: on (default))
--tile-columns=<arg> Number of tile columns to use, log2
--tile-rows=<arg> Number of tile rows to use, log2
--enable-tpl-model=<arg> RDO modulation based on frame temporal dependency
--arnr-maxframes=<arg> AltRef max frames (0..15)
--arnr-strength=<arg> AltRef filter strength (0..6)
--tune=<arg> Distortion metric tuned with
psnr, ssim, cdef-dist, daala-dist
--cq-level=<arg> Constant/Constrained Quality level
--max-intra-rate=<arg> Max I-frame bitrate (pct)
--max-inter-rate=<arg> Max P-frame bitrate (pct)
--gf-cbr-boost=<arg> Boost for Golden Frame in CBR mode (pct)
--lossless=<arg> Lossless mode (0: false (default), 1: true)
--enable-cdef=<arg> Enable the constrained directional enhancement filter (0: false, 1: true (default))
--enable-restoration=<arg> Enable the loop restoration filter (0: false, 1: true (default))
--enable-rect-partitions=< Enable rectangular partitions (0: false, 1: true (default))
--enable-dual-filter=<arg> Enable dual filter (0: false, 1: true (default))
--enable-intra-edge-filter Enable intra edge filtering (0: false, 1: true (default))
--enable-order-hint=<arg> Enable order hint (0: false, 1: true (default))
--enable-tx64=<arg> Enable 64-pt transform (0: false, 1: true (default))
--enable-dist-wtd-comp=<ar Enable distance-weighted compound (0: false, 1: true (default))
--enable-masked-comp=<arg> Enable masked (wedge/diff-wtd) compound (0: false, 1: true (default))
--enable-interintra-comp=< Enable interintra compound (0: false, 1: true (default))
--enable-smooth-interintra Enable smooth interintra mode (0: false, 1: true (default))
--enable-diff-wtd-comp=<ar Enable difference-weighted compound (0: false, 1: true (default))
--enable-interinter-wedge= Enable interinter wedge compound (0: false, 1: true (default))
--enable-interintra-wedge= Enable interintra wedge compound (0: false, 1: true (default))
--enable-global-motion=<ar Enable global motion (0: false, 1: true (default))
--enable-warped-motion=<ar Enable local warped motion (0: false, 1: true (default))
--enable-filter-intra=<arg Enable filter intra prediction mode (0: false, 1: true (default))
--enable-smooth-intra=<arg Enable smooth intra prediction modes (0: false, 1: true (default))
--enable-paeth-intra=<arg> Enable Paeth intra prediction mode (0: false, 1: true (default))
--enable-cfl-intra=<arg> Enable chroma from luma intra prediction mode (0: false, 1: true (default))
--enable-obmc=<arg> Enable OBMC (0: false, 1: true (default))
--enable-palette=<arg> Enable palette prediction mode (0: false, 1: true (default))
--enable-intrabc=<arg> Enable intra block copy prediction mode (0: false, 1: true (default))
--enable-angle-delta=<arg> Enable intra angle delta (0: false, 1: true (default))
--disable-trellis-quant=<a Disable trellis optimization of quantized coefficients (0: false (default) 1: true)
--enable-qm=<arg> Enable quantisation matrices (0: false (default), 1: true)
--qm-min=<arg> Min quant matrix flatness (0..15), default is 8
--qm-max=<arg> Max quant matrix flatness (0..15), default is 15
--reduced-tx-type-set=<arg Use reduced set of transform types
--frame-parallel=<arg> Enable frame parallel decodability features (0: false (default), 1: true)
--error-resilient=<arg> Enable error resilient features (0: false (default), 1: true)
--aq-mode=<arg> Adaptive quantization mode (0: off (default), 1: variance 2: complexity, 3: cyclic refresh)
--deltaq-mode=<arg> Delta qindex mode (0: off (default), 1: deltaq 2: deltaq + deltalf)
--frame-boost=<arg> Enable frame periodic boost (0: off (default), 1: on)
--noise-sensitivity=<arg> Noise sensitivity (frames to blur)
--tune-content=<arg> Tune content type
default, screen
--cdf-update-mode=<arg> CDF update mode for entropy coding (0: no CDF update; 1: update CDF on all frames(default); 2: selectively update CDF on some frames
--color-primaries=<arg> Color primaries (CICP) of input content:
bt709, unspecified, bt601, bt470m, bt470bg, smpte240, film, bt2020, xyz, smpte431, smpte432, ebu3213
--transfer-characteristics Transfer characteristics (CICP) of input content:
unspecified, bt709, bt470m, bt470bg, bt601, smpte240, lin, log100, log100sq10, iec61966, bt1361, srgb, bt2020-10bit, bt2020-12bit, smpte2084, hlg, smpte428
--matrix-coefficients=<arg Matrix coefficients (CICP) of input content:
identity, bt709, unspecified, fcc73, bt470bg, bt601, smpte240, ycgco, bt2020ncl, bt2020cl, smpte2085, chromncl, chromcl, ictcp
--chroma-sample-position=< The chroma sample position when chroma 4:2:0 is signaled:
unknown, vertical, colocated
--min-gf-interval=<arg> min gf/arf frame interval (default 0, indicating in-built behavior)
--max-gf-interval=<arg> max gf/arf frame interval (default 0, indicating in-built behavior)
--gf-max-pyr-height=<arg> maximum height for GF group pyramid structure (1 to 4 (default))
--sb-size=<arg> Superblock size to use
dynamic, 64, 128
--num-tile-groups=<arg> Maximum number of tile groups, default is 1
--mtu-size=<arg> MTU size for a tile group, default is 0 (no MTU targeting), overrides maximum number of tile groups
--timing-info=<arg> Signal timing info in the bitstream (model unly works for no hidden frames, no super-res yet):
unspecified, constant, model
--film-grain-test=<arg> Film grain test vectors (0: none (default), 1: test-1 2: test-2, ... 16: test-16)
--film-grain-table=<arg> Path to file containing film grain parameters
--denoise-noise-level=<arg Amount of noise (from 0 = don't denoise, to 50)
--denoise-block-size=<arg> Denoise block size (default = 32)
--enable-ref-frame-mvs=<ar Enable temporal mv prediction (default is 1)
-b <arg>, --bit-depth=<arg> Bit depth for codec (8 for version <=1, 10 or 12 for version 2)
8, 10, 12
--input-bit-depth=<arg> Bit depth of input
--input-chroma-subsampling chroma subsampling x value.
--input-chroma-subsampling chroma subsampling y value.
--sframe-dist=<arg> S-Frame interval (frames)
--sframe-mode=<arg> S-Frame insertion mode (1..2)
--annexb=<arg> Save as Annex-B

Stream timebase (--timebase):
The desired precision of timestamps in the output, expressed
in fractional seconds. Default is 1/1000.

Included encoders:

av1 - AOMedia Project AV1 Encoder 0.1.0-11038-g437d957f8 (default)

Use --codec to switch to a non-default encoder.

benwaggoner

17th January 2019, 19:43

--preset is not an option (I get an error message)
There is a --tune option (psnr, ssim, cdef-dist, daala-dist), and a --tune-content option (default, screen)
Sorry, I wasn't clear I was offering x264/5 syntax for what you want an AV1 equivalent to. Which does not (yet?) exist in libaom.

Libaom has very little psychovisual optimization compared to the x26? codecs, or psychovisual tuning options. Libaom is more like a speed-optimized reference encoder right now. Production AV1 encoders will need to have much more psychovisual tuning.

I'm kinda surprised no one has started making a xAV1 based on x264 like x265 started. Although so many fundamental structures derived from the VPx series would make it a lot harder to start. HEVC was enough of a H.264 subset that getting an x265 that did SOMETHING wasn't THAT hard. AV1-as-an-ecosystem has a serious disadvantage in not having a well-tuned, production-grade, open-source VPx encoder to start from.

Libvpx never got the obsessive focus from thousands of quality/efficiency/speed obsessed video pirates that was the foundation of what x264 is today.

Just look at the H.264, or even the HEVC, forums here, and all the posts over many years. That's the kind of community focus that makes for a great encoder.

Look at the "Posts" column:

https://forum.doom9.org/forumdisplay.php?f=17

TomV

17th January 2019, 21:25

Sorry, I wasn't clear I was offering x264/5 syntax for what you want an AV1 equivalent to. Which does not (yet?) exist in libaom.

Oh. That was my original question... "what's your recommended settings for the equivalent of --preset veryslow?" Of course I'm intimately familiar with x265 syntax. I defined a fair amount of it. :)

zub35

17th January 2019, 23:22

benwaggoner If the codec will always be so slow to encode on x86 CPUs, then this encoder will be exclusively for large companies that are capable of acquiring a HW encoder.
AV1 is a coin with two sides:
1. free and high quality 2. the need to purchase special HW encoders.
Therefore, what is the point of the community developing it for quality improvement?
AV1 in its current form (x86) is simply a way of advertising and popularization, nothing more.
p.s. AV1 - "free" (need buy HW) codec for only youtube...

hajj_3

18th January 2019, 00:31

benwaggoner If the codec will always be so slow to encode on x86 CPUs, then this encoder will be exclusively for large companies that are capable of acquiring a HW encoder.
AV1 is a coin with two sides:
1. free and high quality 2. the need to purchase special HW encoders.
Therefore, what is the point of the community developing it for quality improvement?
AV1 in its current form (x86) is simply a way of advertising and popularization, nothing more.
p.s. AV1 - "free" (need buy HW) codec for only youtube...

rav1e is a reasonably fast encoder, it can do several fps and will get faster with time.

Also youtube won't be the only ones adopting this. Facebook, bbc iplayer, netflix etc and many others will be adopting this.

foxyshadis

19th January 2019, 10:18

benwaggoner If the codec will always be so slow to encode on x86 CPUs, then this encoder will be exclusively for large companies that are capable of acquiring a HW encoder.
It's not completely set in stone, but I really believe H.264/AVC might be the last codec easily encodable in pure x86/x64. Intel and AMD have agreed on some extensions to make H.265/HEVC and AV1 not suck quite as much, but the obvious direction is in GPU or fixed-function encoding.

hajj_3

19th January 2019, 11:01

Intel and AMD have agreed on some extensions to make H.265/HEVC and AV1 not suck quite as much

source?

alex1399

20th January 2019, 03:20

I think that is the reason H.264/AVC ditched some complicated features whether they can be implemented by hardware decoder or not. In some extensions, heavy taxing on cpu loading from software encoding perspective is quite possible.

Gravitator

20th January 2019, 08:23

How do I beat a pulsating noise?
I need to save/delete it across the entire sample (using the AOM settings).
> Encoded sample (https://files.videohelp.com/u/227452/AOM%20test%20sp6.mkv)
> Original sample (https://files.videohelp.com/u/227452/SW2.mkv)
aomenc --passes=2 --pass=1 --target-bitrate=800 --end-usage=vbr --fpf="PATH TO THE .stats FILE" --profile=0 --cpu-used=6 --min-q=0 --max-q=63 --bias-pct=70 --minsection-pct=15 --maxsection-pct=10000 --lag-in-frames=25 --drop-frame=0 --undershoot-pct=0 --overshoot-pct=0 --buf-sz=6 --buf-initial-sz=4 --buf-optimal-sz=5 --drop-frame=0 --kf-min-dist=0 --kf-max-dist=250 --auto-alt-ref=1 --arnr-maxframes=7 --arnr-strength=5 --noise-sensitivity=0 --sharpness=0 --static-thresh=0 --tune-content=default --tile-columns=0 --tile-rows=0 --aq-mode=0 --min-gf-interval=0 --max-gf-interval=0 --threads=2 --width=1920 --height=816 --i420 --input-bit-depth=10 --bit-depth=10 --row-mt=0 --cdf-update-mode=1 -o NUL -

aomenc --passes=2 --pass=2 --target-bitrate=800 --end-usage=vbr --fpf="PATH TO THE .stats FILE" --profile=0 --cpu-used=6 --min-q=0 --max-q=63 --bias-pct=70 --minsection-pct=15 --maxsection-pct=10000 --lag-in-frames=25 --drop-frame=0 --undershoot-pct=0 --overshoot-pct=0 --buf-sz=6 --buf-initial-sz=4 --buf-optimal-sz=5 --drop-frame=0 --kf-min-dist=0 --kf-max-dist=250 --auto-alt-ref=1 --arnr-maxframes=7 --arnr-strength=5 --noise-sensitivity=0 --sharpness=0 --static-thresh=0 --tune-content=default --tile-columns=0 --tile-rows=0 --aq-mode=0 --min-gf-interval=0 --max-gf-interval=0 --threads=2 --width=1920 --height=816 --i420 --input-bit-depth=10 --bit-depth=10 --row-mt=0 --cdf-update-mode=1 -o OUTPUTFILE -

benwaggoner

21st January 2019, 19:26

It's not completely set in stone, but I really believe H.264/AVC might be the last codec easily encodable in pure x86/x64. Intel and AMD have agreed on some extensions to make H.265/HEVC and AV1 not suck quite as much, but the obvious direction is in GPU or fixed-function encoding.

HEVC is certainly encodable on x64, although 32-bit becomes impractical at high quality at very high frame sizes. The trend for even live encoding has been towards highly multithreaded CPU encoding. Fixed-function ASIC style hardware is too inflexible when there are SO many options for how to encode every block, with lots of psychovisual tuning to be done. The more complex codecs get, the more high quality encoders are on CPU. And arithmetic entropy coding really benefits from peak single-thread performance. I think there is hope for hybrid CPU/GPU/ASIC/FPGA models, but I don’t see professional quality encoding moving away from heavy CPU use anytime soon.

I don’t think there is anything intrinsically hard about AV1 for software encoding. If anything SW encoding could be easier than HW encoding due to parallelization limitations. The bigger issue is that VPx hasn’t had a truly competitive quality @ perf encoder in YEARS. And a reference encoder isn’t a great starting point, which libaom sort of is. If one wanted to build a quality @ perf optimized encoder from scratch, especially for low latency live encoding, I’d start by just implementing the mandatory features of a bitstream and then add features incrementally, seeing what their quality @ perf is. Starting with an encoder that HAS to use ALL codec features like a reference encoder does can be harder than going from the ground up.

benwaggoner

21st January 2019, 19:30

rav1e is a reasonably fast encoder, it can do several fps and will get faster with time.

Also youtube won't be the only ones adopting this. Facebook, bbc iplayer, netflix etc and many others will be adopting this.

Do we have data on quality @ perf @ bitrate?

It’s pretty easy to make a fast encoder. But making one that is fast and produces competitive quality at a given bitrate is a lot harder.

AV1 is new enough that I don’t expect encoders to give us a clear sense of what the potential quality @ perf for the bitstream is yet. There is a lot of quality and perf optimization to be done before they are in the ballpark to compare.

utack

24th January 2019, 23:13

A new Speech on AV1 by Tim Terriberry
https://www.youtube.com/watch?v=qubPzBcYCTw

Nintendo Maniac 64

25th January 2019, 02:22

A new Speech on AV1 by Tim Terriberry
https://www.youtube.com/watch?v=qubPzBcYCTw

Kind of amusing that, for something about new fancy-pants video codecs, the video itself has telecine judder (it's been telecine'd from 25fps to 30fps).
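For anyone wondering where that judder comes from, here is a minimal Python sketch. It assumes the conversion is done by plain frame repetition, which is the simplest model and not necessarily how that particular video was processed; the function name is hypothetical. It maps each 30fps output frame back to the 25fps source frame it would show.

```python
def source_frame(out_index, src_fps=25, out_fps=30):
    """Index of the source frame displayed at a given output frame index."""
    return out_index * src_fps // out_fps


# The first 12 output frames cover 10 source frames.
pattern = [source_frame(i) for i in range(12)]
print(pattern)  # [0, 0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9]
```

Under this model one source frame in every five is shown twice, so motion stalls briefly five times per second, which is what reads as judder.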

TomV

25th January 2019, 08:07

A new Speech on AV1 by Tim Terriberry
https://www.youtube.com/watch?v=qubPzBcYCTw

Tim is a smart engineer, but engineers typically aren't well equipped to present legal opinions / advice (from 2 min to 9 min). Nobody pays for patents twice. If you license from both MPEG LA and HEVC Advance, the companies that are in both only get paid once. The patent chart Tim is using is out of date and inaccurate. Fraunhofer sold their patents to GE, which is why GE has HEVC patents. Canon licenses their patents through MPEG LA. Velos Media has told hundreds of companies what they charge, and they've signed many companies to their license program.

The truth is that every competitive device that supports video now supports HEVC in hardware. Billions of devices, with billions more sold each year. Most every TV, smartphone, tablet or connected set-top box (including Google Chromecast Ultra). If the patent situation were really untenable, Apple, Samsung, LG, Sony, Amazon, Google, GoPro, Roku and hundreds of other device OEMs wouldn't be incorporating HEVC in their devices. And we wouldn't be watching 4K HDR HEVC movies from Netflix, Amazon, Hulu, Vudu, and Apple. If you want another perspective, I gave a talk at the SF Video meetup on the topic... https://www.youtube.com/watch?v=vgE8-4rcXl0

The Alliance for Open Media isn't the only group working to deal with the difficulties of licensing patents for industry standards. MPEG is dealing with it. The Media Coding Industry Forum is dealing with it. And outside firms like Unified Patents, with their Video Codec Zone (specifically focused on HEVC) are dealing with it. It's an ongoing challenge (both for HEVC and for new standards in development), but it's being dealt with.

On the other hand, I'm blazing along at less than 1 frame per minute of 1080P with aomenc cpu-used=1 on a fast Core i7-7820X (with hyperthreading disabled, for the fastest possible single-threaded performance). The resulting videos are roughly on par with my HEVC encodes at identical bit rates (sometimes better, but very often worse). They're clean, but soft and lacking detail.

TD-Linux

26th January 2019, 22:10

Do we have data on quality @ perf @ bitrate

Here's a link to AWCY as shown in Tim's presentation:

https://beta.arewecompressedyet.com/?job=x264-veryslow%402018-11-04T00%3A40%3A25.690Z&job=vp9_Sept-06-19%402018-09-06T14%3A41%3A38.675Z&job=master-s1-high-latency-525f981376bd

tl;dr it is better than x264 at every bitrate, but still worse than libvpx VP9. It is also currently about 10x slower than x264, which is blazing fast compared to libaom but still has a lot of room for improvement.

benwaggoner

26th January 2019, 22:56

AWCY doesn’t include any metrics that are well demonstrated to be able to finely discriminate between quality of different encoders and codecs. VMAF is the least-bad we’ve ever had, but can still be off quite a bit for individual clips, especially if they use codec features or psychovisual optimizations that weren’t included in its test clips. For example, VMAF is bad at rating the effectiveness of low-luma adaptive quant, I speculate because they didn’t include any clips that used different ways to do that in their testing. VMAF is a very impressive effort, but it is not magic. Like all machine learning systems, it tries to predict what a human would answer given complex input, based on a large set of example inputs and answers. But if it doesn’t have human input for some kinds of inputs, the validity of its predicted ratings for those inputs is unpredictable at best.

Also, the value of mean or even harmonic mean of per-frame scores is limited for clips much more than 10 seconds. A movie encoded in CBR and a VBR encode at the same ABR might end up with the same mean score per frame, but the VBR would be strongly preferred by viewers as it offers consistent quality, with the worst sections being a lot better than the worst in a CBR encode.

Comparing psychovisual optimizations, rate control, and significantly different encoder tools requires subjective double-blind testing before any confidence in objective measures’ applicability.

Net-net: you can’t know how good video looks without real people looking at it when techniques are used that weren’t incorporated in an objective metric. If we see a high correlation between MOS and VMAF for a new technique/codec, then we can start trusting that metric.

utack

27th January 2019, 15:30

Does anyone know what the deal with Qualcomm is?
Is the YouTube rollout and Netflix talk pushing them towards making a hardware decoder, or do they have some interests in MPEG doing well and will try to delay AV1 support in phones for a while?

Djfe

28th January 2019, 18:25

A rendered 8k video on YouTube with lots of flickering (epilepsy warning), details like rain drops on a helmet etc.
https://youtu.be/fOWsamMv_v4
Maybe good for comparing av1 to vp9 on YouTube (up to 480p currently)
Once YouTube gets better encoders anyways

(obvious already: more blurred but definitely less blocky and less obvious artifacts)
1:30min into the video is probably the best place to compare (and the hardest part for their av1 implementation so far at that bitrate)

benwaggoner

28th January 2019, 20:37

A rendered 8k video on YouTube with lots of flickering (epilepsy warning), details like rain drops on a helmet etc.
https://youtu.be/fOWsamMv_v4
Maybe good for comparing av1 to vp9 on YouTube (up to 480p currently)
Once YouTube gets better encoders anyways

(obvious already: more blurred but definitely less blocky and less obvious artifacts)
1:30min into the video is probably the best place to compare (and the hardest part for their av1 implementation so far at that bitrate)
Are you getting an AV1 encode at 480p and below somehow? It shows as VP9 for me at every bitrate (using Chrome).

That is a very interesting clip from a compression perspective. It'll really stress weighted prediction (all those strobes) and adaptive quantization (intense variation in frequency distribution). Tons of value from intraframe prediction.

The VP9 encode is not doing well; at 8K scaled down to my 4K monitor there's lots of blocking and banding issues on the guy. I don't have an immediate intuition for how much is limitations in VP9 versus libvpx. That's a kind of content not in the standard libraries of clips encoders get tuned against.

I would expect libaom to do pretty well against it at a slow preset, as libaom is doing a pretty broad mode search with its myriad tools. So it might find lots of oddball methods that work well with this clip. Probably a big gap between slower and faster modes.

Nintendo Maniac 64

28th January 2019, 20:51

Are you getting an AV1 encode at 480p and below somehow? It shows as VP9 for me at every bitrate (using Chrome).

Don't you have to opt into using AV1 on YouTube?

Anyway, I can definitely confirm via youtube-dl that AV1 (listed as av01) encodes do in fact exist for that video at 480p resolution and lower:

https://imgoat.com/uploads/aa1883c641/190367.png

benwaggoner

28th January 2019, 20:57

Don't you have to opt into using AV1 on YouTube?

Anyway, I can definitely confirm via youtube-dl that AV1 (listed as av01) encodes do in fact exist for that video at 480p resolution and lower:

Yes, it can be set here: https://www.youtube.com/testtube

Now I need to figure out how to do side-by-side in different codecs. Worth comparing to the x264 encodes as well.

benwaggoner

28th January 2019, 21:51

Does anyone know what the deal with Qualcomm is?
Is the YouTube rollout and Netflix talk pushing them towards making a hardware decoder, or do they have some interests in MPEG doing well and will try to delay AV1 support in phones for a while?
I don’t know anything about Qualcomm specifically, but it can take quite a while to go from final spec to design to tape-out to samples to full-scale fab to products launching with a new SoC.

It was being generally discussed in the industry that AV1’s bitstream finalization delays caused chipmakers to miss the 2019 product design window. A HW accelerated decoder is a lot more flexible, but a fixed-function decoder needs to be RIGHT. Small product flaws can wind up impacting the entire industry for years. And the combination of video decode and DRM is complex with very high functional requirements.

And I’ve heard that implementing AV1 in hardware is more complex than anticipated, due to relatively low parallelism opportunities and how many discrete tools can get applied to any given pixel. Getting a decoder running on a low-power chip is quite different than with 1-2 very fast x64 threads. Say what you will about the MPEG process, but it is good at constraining decode complexity for software and hardware.

Nintendo Maniac 64

28th January 2019, 23:39

Now I need to figure out how to do side-by-side in different codecs.

Download each individual video stream via youtube-dl and then play them back in their own video player program window?

alex1399

29th January 2019, 18:28

libavfilter could be utilized to perform some sort of [0:v]crop=in_w/2:in_h:0:0[VL];[1:v]crop=in_w/2:in_h:in_w/2:0[VR];[VL][VR]hstack stuff
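A rough sketch of how that filtergraph could be wired into an ffmpeg command line, written in Python only to assemble the argument list. The file names are placeholders, and the `[out]` label and `-map`/`-an` options are my additions rather than part of the suggestion above. It assumes both inputs have the same resolution, since hstack needs matching heights.

```python
import shlex

def side_by_side_cmd(left_path, right_path, out_path):
    """Build an ffmpeg argv: left half of input 0 next to right half of input 1."""
    filtergraph = (
        "[0:v]crop=in_w/2:in_h:0:0[VL];"       # left half of the first video
        "[1:v]crop=in_w/2:in_h:in_w/2:0[VR];"  # right half of the second video
        "[VL][VR]hstack[out]"                  # join the two halves horizontally
    )
    return [
        "ffmpeg",
        "-i", left_path,
        "-i", right_path,
        "-filter_complex", filtergraph,
        "-map", "[out]",  # keep only the stacked video stream
        "-an",            # drop audio, it is not needed for a visual comparison
        out_path,
    ]

# Placeholder file names; substitute the streams you actually downloaded.
cmd = side_by_side_cmd("vp9_480p.webm", "av1_480p.mp4", "compare.mkv")
print(shlex.join(cmd))
```

Keep in mind the stacked output gets re-encoded, so a very high quality or lossless output setting is advisable to avoid adding artifacts of its own to the comparison.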

mandarinka

29th January 2019, 23:39

AWCY doesn’t include any metrics that are well demonstrated to be able to finely discriminate between quality of different encoders and codecs. VMAF is the least-bad we’ve ever had, but can still be off quite a bit for individual clips, especially if they use codec features or psychovisual optimizations that weren’t included in its test clips.

Isn't there also the possibility that there's a sort of implicit "training" for this metric included in one codec/encoder and not the other?

I don't know whether VMAF was used in some x265 tuning, but given the age of all the significant parts of the x264 codebase, I am fairly sure that there have been no attempts to do this.

Meanwhile VMAF was IIRC used during development of Daala and AV1 and maybe AOMenc/Rav1e? In that case, there could be some inherent bias in the metric towards those codecs that would then add some imaginary advantage above their real compression quality into the numbers, when measured by VMAF. Simply because their output was implicitly tuned to get better VMAF, because VMAF was used to test new tools/analysis/RDO and so on.

benwaggoner

30th January 2019, 00:20

Isn't there also the possibility that there's a sort of implicit "training" for this metric included in one codec/encoder and not the other?
It is an inevitability. Generally the utility of a metric goes down once it is codified, because people start optimizing for that metric instead of the subjective quality that metric approximates. So the correlation of the metric with subjective ratings becomes weaker, as metric-specific tricks get implemented.

I don't know whether VMAF was used in some x265 tuning, but given the age of all the significant parts of the x264 codebase, I am fairly sure that there have been no attempts to do this.
x265 was around long before VMAF, and a VMAF useful for UHD has only been around a few months. Libaom seems to have been getting a lot more tuning-by-VMAF.

Meanwhile VMAF was IIRC used during development of Daala and AV1 and maybe AOMenc/Rav1e? In that case, there could be some inherent bias in the metric towards those codecs that would then add some imaginary advantage above their real compression quality into the numbers, when measured by VMAF. Simply because their output was implicitly tuned to get better VMAF, because VMAF was used to test new tools/analysis/RDO and so on.
VMAF wasn't around for most/all of Daala work. AV1 is really the first bitstream to have its practical implementations start in the VMAF era. This is one reason I'm suspicious of VMAF scores for AV1. Good analysis.

In particular I worry that VMAF is insufficiently sensitive to temporal shifts in video quality. A VMAF of 70, 65, 60, 60, 65, 60, 60 might come out as a nice "VMAF=62.9" but be annoying to watch. Frame strobing was a weakness of libvpx.
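A quick Python check of that example, as a sketch only. The seven values average to about 62.9, and the mean absolute frame-to-frame change used here is a hypothetical stand-in for "how much the quality jumps around", not anything VMAF itself reports. Reordering the same scores leaves the mean untouched while the fluctuation changes.

```python
from statistics import mean

# The per-frame scores from the example above.
scores = [70, 65, 60, 60, 65, 60, 60]

def mean_abs_delta(values):
    """Average absolute change between consecutive frames."""
    return mean(abs(b - a) for a, b in zip(values, values[1:]))

average = mean(scores)
fluctuation = mean_abs_delta(scores)

# Same scores sorted from best to worst: a steady decline instead of pulsing.
steady = sorted(scores, reverse=True)

print(round(average, 1), round(fluctuation, 2))
print(round(mean(steady), 1), round(mean_abs_delta(steady), 2))
```

A plain per-frame average cannot tell these two orderings apart, which is the temporal blind spot being described.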

TD-Linux

30th January 2019, 01:00

The patent chart Tim is using is out of date and inaccurate. Fraunhofer sold their patents to GE, which is why GE has HEVC patents. Canon licenses their patents through MPEG LA.

The chart is the one used in Leonardo's blog (http://blog.chiariglione.org/a-crisis-the-causes-and-a-solution/), though I think it's originally from streamingmedia.com. I've attached an updated version for future presentations.

https://people.xiph.org/~tdaede/HEVC.png

jonatans

30th January 2019, 02:14

I originally created the figure and presented it for the first time during Streaming Tech Sweden 2017. There have been some changes since then.

Here is an updated figure:

https://www.divideon.com/images/HevcPatentHolders190130.png

Please note that the figure is only based on public information available from ISO/IEC/ITU and from the patent pools. Please also note that not all of the MPEG LA patent holders are shown in the figure.

hajj_3

30th January 2019, 02:32

I originally created the figure and presented it for the first time during Streaming Tech Sweden 2017. There have been some changes since then.

Here is an updated figure:

https://www.divideon.com/images/HevcPatentHolders190130.png

Please note that the figure is only based on public information available from ISO/IEC/ITU and from the patent pools. Please also note that not all of the MPEG LA patent holders are shown in the figure.

I think I read that Fraunhofer sold their HEVC patents to General Electric; if true, you should remove Fraunhofer from your diagram.

TomV

30th January 2019, 02:51

The chart is the one used in Leonardo's blog (http://blog.chiariglione.org/a-crisis-the-causes-and-a-solution/), though I think it's originally from streamingmedia.com. I've attached an updated version for future presentations.

https://people.xiph.org/~tdaede/HEVC.png
It's originally from Jonatan Samuelsson, a.k.a. jonatans (https://forum.doom9.org/member.php?u=227153)

Even with the updates, the main problem with this chart is that it's a bit misleading, for two reasons. First, I think most people in the video industry assume that AVC patent licensing is and was perfectly clean and simple... that all necessary standard-essential patents were licensable in the MPEG LA patent pool. That's not true. Nokia, Qualcomm, Broadcom, Blackberry, Texas Instruments, MIT all hold standard-essential AVC patents outside the MPEG LA pool (although Qualcomm messed up and a judge ruled they can't assert them for AVC). Multiple legal battles have been fought over AVC patents, including some pretty big cases... Microsoft v Motorola, and Apple v Nokia. Today, everyone can agree that the patent licensing situation for AVC is much better than it is for HEVC. But it didn't start out that way, and it took some time for the situation to settle. Second, there are quite a few more patent holders in some of these HEVC pools than shown in this chart.

In his talk, Tim mentioned that patents are issued that may not be valid (https://youtu.be/qubPzBcYCTw?t=498), and then said "and you could go around and try to invalidate them all, but they're really expensive to do that, and there's a lot of them". Well, if you're a multi-billion dollar company (Apple, Samsung, Google, Amazon, etc.), you have a lot of lawyers, and that's what they're paid to do. If you're being asked to pay tens or hundreds of millions of dollars a year in patent license fees, you have all the motivation in the world to spend whatever it takes on legal fees to right-size the problem. When multiple multi-billion dollar companies have this issue, collectively there is a lot of motivation. It turns out that when challenged in court, most patents don't hold up. They can be invalidated for many reasons... prior art, unpatentable claims, obviousness, the invention was anticipated, etc. This type of effort is being undertaken by Unified Patents (as a service to many large tech companies), and there is a relatively new law called the America Invents Act that provides a faster, less expensive way to get rid of bad patents, called an Inter Partes Review (IPR). Unified already filed an IPR against Velos Media, and you can expect more such filings under their Video Codec domain. But you don't even have to invalidate patents in order not to pay a fortune.

Again, keep in mind that no one is asking for patent license fees for content distribution (streaming, etc.), except for UHD Blu-ray discs (a small per-disc fee to HEVC Advance). Only hardware device manufacturers need to license HEVC patents, and they are dealing with that issue and they continue to support HEVC in every device they make that supports video. For video services, HEVC is free. Now that the majority of active end-user devices support HEVC, it makes a lot of financial sense for video services to make their VOD catalog, or the majority of their live channels, available in both AVC and HEVC (not just 4K and HDR content... all content). The bandwidth savings and customer experience improvement far outweigh the additional cost of encoding and CDN storage.

TomV

30th January 2019, 02:58

I think I read that Fraunhofer sold their HEVC patents to General Electric; if true, you should remove Fraunhofer from your diagram.

That's true. I don't think it was ever announced, but I can assure you that I got confirmation from a very reliable source.

TomV

30th January 2019, 03:02

I originally created the figure and presented it for the first time during Streaming Tech Sweden 2017. There have been some changes since then.

Here is an updated figure:

https://www.divideon.com/images/HevcPatentHolders190130.png

Please note that the figure is only based on public information available from ISO/IEC/ITU and from the patent pools. Please also note that not all of the MPEG LA patent holders are shown in the figure.
Hey... we were both responding at the same time (cross posting). Thanks for posting an update Jonatan. Your efforts on this, and in the MC-IF are really appreciated.