Old 2nd November 2018, 14:19   #1181  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,752
New uploads: (MSYS2; MinGW32: GCC 7.3.0 / MinGW64: GCC 8.2.0)

AOM v1.0.0-864-g351711076
now with TPL model (RDO modulation based on frame temporal dependency) and block based denoiser

rav1e 0.1.0 (7492fc5 / 2018-11-01)

dav1d 0.0.1 (287ba91 / 2018-11-02)
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
Old 2nd November 2018, 18:10   #1182  |  Link
Mjpeg
Registered User
 
Join Date: Jun 2018
Posts: 7
16x FTW

Report from a 1 day meeting on future codecs, h264 through VVC.

http://www.streamingmedia.com/Articl...P9-128213.aspx

The most interesting quote is from a YouTube encoding engineer, who says that AV1 encoding time is down to 16x slower than VP9, so that's a nice performance trend (no doubt giving up some % of quality). What's important (as benwaggoner always says) is what quality@perf tradeoffs are available.
Old 2nd November 2018, 20:39   #1183  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
Quote:
Originally Posted by Mjpeg View Post
Report from a 1 day meeting on future codecs, h264 through VVC.

http://www.streamingmedia.com/Articl...P9-128213.aspx

The most interesting quote is from a YouTube encoding engineer, who says that AV1 encoding time is down to 16x slower than VP9, so that's a nice performance trend (no doubt giving up some % of quality). What's important (as benwaggoner always says) is what quality@perf tradeoffs are available.
Yeah, YouTube is sort of a special case for encoding. Given how much content they get and the average views/upload, economically they’re going to spend fewer MIPS/pixel than for premium content. And they do their stuff (last I heard) single-threaded on unused-at-that-moment Google servers, à la a spot instance. Which is why libvpx never got a lot of multithreading, nor was the VPx series of bitstreams vetted for parallelizability.

Flip side is no one expects spectacular quality from YouTube. It’s free. So even though video game captures always look terrible (lots of high frequency sharp edges...), that’s what people are used to and so they don’t really think about it any more. So YouTube can experiment a lot with how to make good-enough quality fast, which is a different direction from a lot of other folks, and a very useful one in encoder development.

It can be a good starting point for live encoders, although YouTube can handle some content taking 3x longer to encode than other content due to complexity.

HEVC and AV1 have great tools for making text and video game footage a LOT better. But they are also pretty expensive to add to normal mode detection, so good heuristics for figuring out when to use them are important.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 2nd November 2018, 22:26   #1184  |  Link
Mjpeg
Registered User
 
Join Date: Jun 2018
Posts: 7
Live encoding

I'm all for AV1 encoders getting faster, but of course on an absolute scale it's pretty slow still, spending a lot of CPU for all that efficiency.

I always figured live encoding would be quite a wait, although the official framing of AV1 always mentions that live streaming/chat was an important use case they had thought about.

So anyway, I found a talk by Nathan Egge of Mozilla.

Most of the slides look familiar, but there's a claim about live encoding I had not seen before.

https://people.xiph.org/~negge/AVIF2018.pdf

page 55:

rav1e Live Encoding
Shown at IBC in Sept 2018
● 640x480 @ 30 fps
● Single tile / thread
● Simplified feature set

I guess one sign of AV1 progress will be when someone posts a link here to an AV1 webcam.
Old 3rd November 2018, 11:01   #1185  |  Link
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
32/64bits binaries (GCC 8.2):
av1-1.0.0-877-ge5761e020: https://mega.nz/#!1sgl3QrB!x6F6SfLzz...vWZ-IMSzZukIWs

A long standing multithreading bug has been fixed tonight, so here's a new build
Cc @utack
Old 3rd November 2018, 16:11   #1186  |  Link
v0lt
Registered User
 
Join Date: Dec 2008
Posts: 1,959
Quote:
Originally Posted by SmilingWolf View Post
A long standing multithreading bug has been fixed tonight, so here's a new build
I still don’t see aomenc.exe using more than one thread.
Old 3rd November 2018, 17:44   #1187  |  Link
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
It won't by default. Tile columns, rows and threads count are all set to 0.
You have to give at least --tile-columns=1 (and/or --tile-rows=1) --threads=2 to have it use multithreading.
Old 3rd November 2018, 18:12   #1188  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,752
32-bit GCC 8.2? Does it exist for Windows? MSYS2 has not yet solved the internal compiler errors, I believe.
Old 3rd November 2018, 18:24   #1189  |  Link
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
I'm cross-compiling from a Linux VM. It made it easier to switch between compiler versions back when I was investigating the optimization-related bug; then the bug was worked around and the environment stuck.
Silver lining: I'm not stuck with an outdated compiler and I don't have to manage my own MSYS2 package.
Old 3rd November 2018, 18:35   #1190  |  Link
Clare
Registered User
 
Join Date: Apr 2016
Posts: 61
Quote:
Originally Posted by SmilingWolf View Post
It won't by default. Tile columns, rows and threads count are all set to 0.
You have to give at least --tile-columns=1 (and/or --tile-rows=1) --threads=2 to have it use multithreading.
Just use --row-mt=1 instead of tiles, it maxes out all my threads.
Old 3rd November 2018, 18:57   #1191  |  Link
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
What does your command line look like? So far I've been unable to make row-mt work myself.

My tries so far:
This one generates an invalid bitstream:
../../bin8/aomenc --frame-parallel=0 --tile-columns=2 --tile-rows=2 --row-mt=1 --threads=4 --auto-alt-ref=1 --cpu-used=4 --tune=psnr --passes=2 --end-usage=q --cq-level=40 --test-decode=fatal -o test.av1.cq40.webm orig.i420.y4m
Pass 1/2 frame 480/481 92352B 1539b/f 36899b/s 17849 ms (26.89 fps)
Pass 2/2 frame 19/0 0B 17871 ms 1.06 fps [ETA unknown] 2423FFailed to decode frame 2 in stream 0: Corrupt frame detected
Failed to decode tile data

This one works but uses only 12-13% (one core) of my 4c/8t CPU:
../../bin8/aomenc --frame-parallel=0 --tile-columns=2 --tile-rows=2 --row-mt=1 --auto-alt-ref=1 --cpu-used=4 --tune=psnr --passes=2 --end-usage=q --cq-level=40 --test-decode=fatal -o test.av1.cq40.webm orig.i420.y4m
Pass 1/2 frame 480/481 92352B 1539b/f 36899b/s 18130 ms (26.47 fps)
Pass 2/2 frame 16/0 0B 18152 ms 52.88 fpm [ETA unknown]

Last edited by SmilingWolf; 3rd November 2018 at 19:06.
Old 3rd November 2018, 19:30   #1192  |  Link
v0lt
Registered User
 
Join Date: Dec 2008
Posts: 1,959
Quote:
Originally Posted by SmilingWolf View Post
It won't by default. Tile columns, rows and threads count are all set to 0.
You have to give at least --tile-columns=1 (and/or --tile-rows=1) --threads=2 to have it use multithreading.
I always ask for 4 threads, but it has never worked.
aomenc --codec=av1 --cq-level=20 --threads=4

Added:
I do not understand what the columns and rows are in this context. If the codec divides a frame into identical independent cells, it is unclear how it can compress effectively in multi-threaded mode.

Last edited by v0lt; 3rd November 2018 at 19:38.
Old 3rd November 2018, 22:20   #1193  |  Link
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
Tile columns [image]:

The frame is divided into N (10 in my case) columns of equal width.
Each column is independent, so each thread can work on its own tile.
Using tile columns (and/or rows) makes decoding faster too, because each tile can be decoded by a separate thread, making playback much smoother.

Your command line doesn't show any multithreading because you only have a single big tile, which is being worked on by thread 0, leaving threads 1, 2 and 3 with nothing to do.
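
To make the idle-thread point concrete, here is a minimal Python sketch of tile-level work distribution. It is not libaom's actual scheduler: `column_bounds` and `busy_workers` are hypothetical helpers, the 1280-pixel width is just an example value, real tiles are aligned to superblocks rather than split at arbitrary pixels, and row-based multithreading (--row-mt) is ignored.

```python
def column_bounds(frame_width, num_columns):
    """Split a frame width into num_columns nearly equal [start, end) pixel ranges."""
    edges = [frame_width * i // num_columns for i in range(num_columns + 1)]
    return list(zip(edges[:-1], edges[1:]))


def busy_workers(num_tiles, num_threads):
    """If each thread handles one tile at a time, only min(tiles, threads) have work."""
    return min(num_tiles, num_threads)


# A single big tile keeps only one of four threads busy.
print(busy_workers(num_tiles=1, num_threads=4))
# Ten tile columns give all four threads something to do.
print(busy_workers(num_tiles=10, num_threads=4))
# First three of ten equal columns on a 1280-pixel-wide frame (assumed width).
print(column_bounds(1280, 10)[:3])
```

With a single tile the first call returns 1, which is the situation described above for `--threads=4` without any tile options.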

Last edited by SmilingWolf; 3rd November 2018 at 22:28.
Old 4th November 2018, 06:54   #1194  |  Link
v0lt
Registered User
 
Join Date: Dec 2008
Posts: 1,959
@SmilingWolf
In this case, the video stream produced in multi-threaded mode can be worse than in single-threaded mode (at the same bitrate, of course), because the motion prediction algorithms will not be able to work as effectively.

Added:
I also noticed that some files are decoded by 2 threads, while others are always decoded in single-threaded mode. This is strange, and it is not clear how this will affect hardware decoding.

Last edited by v0lt; 4th November 2018 at 07:24.
Old 4th November 2018, 11:04   #1195  |  Link
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
Quote:
Originally Posted by v0lt View Post
@SmilingWolf
In this case, the video stream received in the multi-thread mode can be worse than the one-thread mode (with the same bitrate of course). Because motion prediction algorithms will not be able to work effectively.
Indeed, tile columns affect compression efficiency. However, they are also the only way to get threaded decoding, and they were the only way to get smooth 1080p decoding back when I began testing (long before I registered on doom9). Well, at least that was the case before dav1d, which seems to use a couple of different parallelization techniques.
I'll run a couple of simple test encodes and decodes and report back with some numbers.

Quote:
Originally Posted by v0lt View Post
Added:
I also noticed that some files are decoded by 2 threads, while others are always in single-threaded mode.
Do you have any samples? Files coming from YouTube perhaps? I'd like to inspect them. I know at least some of the first videos they put online in the AV1 test playlist used a single tile column, which forced single-threaded decoding in anything using libaom (e.g. Firefox, Chrome, FFmpeg).
Old 4th November 2018, 12:13   #1196  |  Link
v0lt
Registered User
 
Join Date: Dec 2008
Posts: 1,959
@SmilingWolf
As far as I remember, streams obtained using rav1e v1.0.116 were decoded in single-threaded mode. But samples from elecard.com loaded at least 2 cores.
Now it is difficult for me to recheck it, because the decoder in the player works faster than before.
Old 4th November 2018, 12:20   #1197  |  Link
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
Alright, rav1e doesn't support tiles yet, so every frame is a single big column.
I'll inspect the Elecard samples ASAP.

Meanwhile my encodes are finishing up, so I'll post size, quality and decoding time differences when they're done.

Last edited by SmilingWolf; 4th November 2018 at 12:25.
Old 4th November 2018, 18:23   #1198  |  Link
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
The clip used is the F.Y.C one I described some pages ago

aomenc/aomdec: 1.0.0-877-ge5761e020
dav1d: 0.0.1 e0c3186

Quality and sizes:
Code:
Sizes:
test.av1.cq20.tc0.ivf: 5956739
test.av1.cq20.tc2.ivf: 6001827 +0.75%
test.av1.cq20.tc6.ivf: 6091937 +2.22%

PSNR-HVS-M:
test.av1.cq20.tc0.ivf: 43.192
test.av1.cq20.tc2.ivf: 43.1736 -0.04%
test.av1.cq20.tc6.ivf: 43.1489 -0.10%

MS-SSIM:
test.av1.cq20.tc0.ivf: 26.5095
test.av1.cq20.tc2.ivf: 26.4895 -0.07%
test.av1.cq20.tc6.ivf: 26.467  -0.15%
Decoding:
Code:
# aomdec --threads=8 --progress -o /dev/null test.av1.cq20.tc0.ivf
480 decoded frames in 4660361 us (103.00 fps)

# aomdec --threads=8 --progress -o /dev/null test.av1.cq20.tc2.ivf
480 decoded frames in 3365067 us (142.64 fps) +27,79%

# aomdec --threads=8 --progress -o /dev/null test.av1.cq20.tc6.ivf
480 decoded frames in 3267103 us (146.92 fps) +29,89%

# time dav1d -i test.av1.cq20.tc0.ivf -o /dev/null --muxer yuv4mpeg2 -q --framethreads 8 --tilethreads 4
480 decoded frames in 1997 ms    (240,36 fps)

# time dav1d -i test.av1.cq20.tc2.ivf -o /dev/null --muxer yuv4mpeg2 -q --framethreads 8 --tilethreads 4
480 decoded frames in 1747 ms    (274,75 fps) +12,51%

# time dav1d -i test.av1.cq20.tc6.ivf -o /dev/null --muxer yuv4mpeg2 -q --framethreads 8 --tilethreads 4
480 decoded frames in 1763 ms    (272,26 fps) +11,71%

TC 0 means one single column (whole frame).
TC 2 generates 4 columns.
TC 6 generates 10 columns.
I write that they "generate" N columns because there is an upper limit to how many columns fit in a given horizontal resolution. TC 6 implies a theoretical maximum of 2^6 = 64 columns, and I use it as a catch-all to generate as many columns as possible for my clips.
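
The 1 / 4 / 10 column counts can be reproduced with a short Python sketch of what I understand to be the uniform tile spacing rule in the AV1 bitstream specification. The assumptions are loud ones: a 1280-pixel-wide frame (720p is mentioned in this post, the exact width is not), 128x128 superblocks, and no minimum-column constraint from the maximum tile width. `uniform_tile_columns` is a hypothetical helper name, not an aomenc or libaom function.

```python
def tile_log2(blk_size, target):
    """Smallest k such that (blk_size << k) >= target."""
    k = 0
    while (blk_size << k) < target:
        k += 1
    return k


def uniform_tile_columns(frame_width, tc_log2, sb_size=128, max_tile_cols=64):
    """Tile columns actually produced for a requested log2 column count."""
    sb_cols = -(-frame_width // sb_size)                 # superblock columns, rounded up
    max_log2 = tile_log2(1, min(sb_cols, max_tile_cols))
    log2_cols = min(tc_log2, max_log2)                   # request is capped by frame width
    tile_width_sb = (sb_cols + (1 << log2_cols) - 1) >> log2_cols
    return -(-sb_cols // tile_width_sb)                  # last column may be narrower


# Requested --tile-columns values from the test above, on an assumed 1280-wide frame.
for tc in (0, 2, 6):
    print(f"TC {tc}: {uniform_tile_columns(1280, tc)} column(s)")
```

Under these assumptions a 1280-wide frame has only 10 superblock columns, which is why a request for up to 64 columns ends up as 10.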

So the takeaways from all this:
  • for negligible quality and size differences you can get up to 30% faster decoding on 720p. I'd expect it to be even more noticeable at higher resolutions;
  • dav1d is now faster than libaom;
  • interestingly, dav1d gave slightly better results with fewer columns on this particular clip. This might warrant more thorough investigation in the future.
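
A note on reading the decoding percentages: they appear to be reductions in decode time relative to the single-column run, not increases in fps. The Python sketch below recomputes both views from the aomdec figures posted above; `time_saved_pct` and `speedup_pct` are hypothetical helper names used only for illustration.

```python
def time_saved_pct(baseline, other):
    """Percentage of decode time saved relative to the baseline run."""
    return (baseline - other) / baseline * 100.0


def speedup_pct(baseline_fps, other_fps):
    """Percentage increase in frames per second relative to the baseline run."""
    return (other_fps / baseline_fps - 1.0) * 100.0


# aomdec results copied from the decoding log above (microseconds and fps).
aomdec_us = {"tc0": 4660361, "tc2": 3365067, "tc6": 3267103}
aomdec_fps = {"tc0": 103.00, "tc2": 142.64, "tc6": 146.92}

for name in ("tc2", "tc6"):
    saved = time_saved_pct(aomdec_us["tc0"], aomdec_us[name])
    faster = speedup_pct(aomdec_fps["tc0"], aomdec_fps[name])
    print(f"{name}: {saved:.2f}% less decode time, {faster:.2f}% more fps")
```

Measured this way, the aomdec runs with tile columns take roughly 28% and 30% less time, which corresponds to roughly 38% and 43% higher frame rates.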

Also, RE: Elecard clips:
they use tile columns (5 for the 720p clips, 10 for the HD clips), so that's why the player used more than one core on those
Old 6th November 2018, 19:15   #1199  |  Link
Mr_Khyron
Member
 
Join Date: Nov 2002
Posts: 203
Quote:
ffmpeg -hide_banner -t 60 -c:v libdav1d -threads 16 -tilethreads 4 -i Stream2_AV1_4K_22.7mbps.webm -benchmark -f null -
[libdav1d @ 000001ec27984180] libdav1d bd747b1
Input #0, matroska,webm, from 'Stream2_AV1_4K_22.7mbps.webm':
Metadata:
encoder : libwebm-0.2.1.0
Duration: 00:02:24.12, start: 0.000000, bitrate: 22728 kb/s
Stream #0:0(eng): Video: av1 (Main), yuv420p(tv), 3840x2160, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 1k tbn, 1k tbc (default)
[libdav1d @ 000001ec27a66b80] libdav1d bd747b1
Stream mapping:
Stream #0:0 -> #0:0 (av1 (libdav1d) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf58.22.100
Stream #0:0(eng): Video: wrapped_avframe, yuv420p, 3840x2160 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc (default)
Metadata:
encoder : Lavc58.39.100 wrapped_avframe
frame= 1500 fps= 77 q=-0.0 Lsize=N/A time=00:01:00.00 bitrate=N/A speed=3.09x
video:785kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=228.359s stime=38.609s rtime=19.687s
bench: maxrss=2776212kB
I tried benchmarking with ffmpeg 4.2:
from 16 fps with libaom to 77 fps with dav1d.
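
For anyone reading the `bench:` lines, here is a small Python sketch of the arithmetic, assuming utime/stime are user/system CPU seconds and rtime is wall-clock seconds. The numbers are copied from the log above; the variable names are just for illustration.

```python
# Values copied from the ffmpeg -benchmark output above.
frames = 1500
utime = 228.359   # user CPU seconds
stime = 38.609    # system CPU seconds
rtime = 19.687    # wall-clock seconds

fps = frames / rtime                      # average decode rate over the whole run
cores_busy = (utime + stime) / rtime      # average number of cores kept busy
speedup = 77 / 16                         # dav1d vs libaom, from the fps figures quoted

print(f"average decode rate: {fps:.2f} fps")
print(f"average CPU usage:   {cores_busy:.2f} cores")
print(f"dav1d vs libaom:     {speedup:.2f}x")
```

The whole-run average comes out slightly below the 77 fps shown in the progress line, and the CPU-time ratio suggests that most of the 16 requested threads were busy on average.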
Old 9th November 2018, 19:02   #1200  |  Link
Clare
Registered User
 
Join Date: Apr 2016
Posts: 61
Quote:
Originally Posted by SmilingWolf View Post
What does your command line look like? So far I've been unable to make row-mt work myself

My tries so far:
this one generates invalid bitstream:
../../bin8/aomenc --frame-parallel=0 --tile-columns=2 --tile-rows=2 --row-mt=1 --threads=4 --auto-alt-ref=1 --cpu-used=4 --tune=psnr --passes=2 --end-usage=q --cq-level=40 --test-decode=fatal -o test.av1.cq40.webm orig.i420.y4m
Pass 1/2 frame 480/481 92352B 1539b/f 36899b/s 17849 ms (26.89 fps)
Pass 2/2 frame 19/0 0B 17871 ms 1.06 fps [ETA unknown] 2423FFailed to decode frame 2 in stream 0: Corrupt frame detected
Failed to decode tile data

This one works but uses only 12-13% (one core) of my 4c/8t CPU:
../../bin8/aomenc --frame-parallel=0 --tile-columns=2 --tile-rows=2 --row-mt=1 --auto-alt-ref=1 --cpu-used=4 --tune=psnr --passes=2 --end-usage=q --cq-level=40 --test-decode=fatal -o test.av1.cq40.webm orig.i420.y4m
Pass 1/2 frame 480/481 92352B 1539b/f 36899b/s 18130 ms (26.47 fps)
Pass 2/2 frame 16/0 0B 18152 ms 52.88 fpm [ETA unknown]
aomenc -v --threads=8 --cpu-used=4 --row-mt=1 --lag-in-frames=25 --auto-alt-ref=1 --passes=2 --pass=2 --bit-depth=10 --input-bit-depth=10 --end-usage=q --cq-level=28 -o Chimera_DCI4k2398p_HDR_P3PQ.ivf Chimera_DCI4k2398p_HDR_P3PQ.y4m
All times are GMT +1.