Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.


Doom9's Forum > Video Encoding > VP9 and AV1

12th November 2018, 05:22   #1221
Mystery Keeper
Beyond Kawaii
Join Date: Feb 2008
Location: Russia
Posts: 724
Quote:
Originally Posted by Selur
Which would require 2-pass encoding and a fixed GOP structure (in regard to the GOP sizes); iirc the 2nd pass should normally be able to override the GOP to achieve VBV limits (not totally sure).
I'm totally fine with that. I usually use 2pass anyway. And, of course, I meant I wish they had it as an option.
__________________
...desu!
12th November 2018, 12:47   #1222
LigH
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,746
@ Nintendo Maniac 64:

Even AMD Athlon64/Phenom (K8-K10 arch.) support some SSE3, but x264/x265 do not use it; they consider the implementation on those CPUs "too slow", I believe.
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
12th November 2018, 23:16   #1223
marcomsousa
Registered User
 
Join Date: Jul 2018
Posts: 80
SSE3-optimised av1_nn_predict


https://aomedia.googlesource.com/aom...6f313f27b1c501

Quote:
I have developed a SIMD-optimised neural network implementation using
SSE3. I have also added functional equivalence tests between this and
the original implementation. I added aom_clear_system_state() to a few
places where FPU operations are used after av1_nn_predict.

Speed-ups over the original C implementation for various network shapes:
10x64x16: 1.72x
12x12x1: 2.72x
12x24x1: 2.35x
12x32x1: 3.34x
18x24x4: 0.94x
18x32x4: 0.93x
4x16x1: 2.01x
8x16x1: 1.89x
8x16x4: 2.02x
8x24x1: 2.77x
8x32x1: 2.98x
8x64x1: 3.76x
9x32x3: 1.08x
4x8x4: 1.66x

A few awkwardly-shaped networks are slightly slower: these could be
padded to more convenient sizes to use the SIMD kernels.

I also wrote an AVX/AVX2 implementation but on these relatively small
networks it was barely faster than the SSE3 code.
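To make the "pad awkward shapes to SIMD-friendly sizes" remark concrete, here is a minimal Python sketch of a small fully connected network, assuming a plain dense layer (one weight row per output, ReLU on hidden layers, linear output) and a hypothetical 4-lane vector width. Zero-padding the inputs and weights to a multiple of 4 lets the inner loop run in fixed 4-wide steps with one partial sum per lane, which is what a SIMD kernel does, without changing the result. This is not libaom's av1_nn_predict code; all names here are illustrative, and reading shapes like 10x64x16 as inputs x hidden units x outputs is an assumption.

```python
LANES = 4  # hypothetical SIMD width (e.g. four 32-bit floats per SSE register)


def pad_to_lanes(values, lanes=LANES):
    """Zero-pad a list so its length is a multiple of `lanes`."""
    remainder = len(values) % lanes
    return values + [0.0] * ((lanes - remainder) % lanes)


def dense_naive(inputs, weights, biases, relu=True):
    """Reference dense layer: out[j] = bias[j] + sum_i inputs[i] * weights[j][i]."""
    out = []
    for row, bias in zip(weights, biases):
        acc = bias + sum(x * w for x, w in zip(inputs, row))
        out.append(max(0.0, acc) if relu else acc)
    return out


def dense_lanes(inputs, weights, biases, relu=True, lanes=LANES):
    """Same layer, accumulating `lanes` partial sums per step like a SIMD kernel."""
    x = pad_to_lanes(inputs, lanes)
    out = []
    for row, bias in zip(weights, biases):
        w = pad_to_lanes(row, lanes)  # padded weights are zero, so they add nothing
        partial = [0.0] * lanes       # one accumulator per lane
        for base in range(0, len(x), lanes):
            for lane in range(lanes):
                partial[lane] += x[base + lane] * w[base + lane]
        acc = bias + sum(partial)     # horizontal add of the lanes
        out.append(max(0.0, acc) if relu else acc)
    return out


def predict(inputs, layers):
    """Run hidden layers with ReLU and a linear output layer."""
    activ = inputs
    for i, (weights, biases) in enumerate(layers):
        activ = dense_lanes(activ, weights, biases, relu=(i < len(layers) - 1))
    return activ


if __name__ == "__main__":
    # 3 inputs -> 2 hidden units -> 1 output
    layers = [([[0.5, -1.0, 0.25], [1.0, 1.0, 1.0]], [0.0, -2.0]),
              ([[2.0, 0.5]], [1.0])]
    print(predict([1.0, 2.0, 3.0], layers))  # [3.0]
```

Padding wastes a few multiplies on zeros, which is why it only pays off when the alternative is falling back to scalar code for the leftover elements.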
__________________
AV1 win64 VS2019 builds
Last build here
12th November 2018, 23:23   #1224
Nintendo Maniac 64
Registered User
Join Date: Nov 2009
Location: Northeast Ohio
Posts: 447
Quote:
Originally Posted by LigH
Even AMD Athlon64/Phenom (K8-K10 arch.) support some SSE3
...but this is exactly what I alluded to?

Athlon 64 CPUs were available on socket 754, 939, and AM2; 754 and 939 used DDR1 memory while AM2 used DDR2, and all AM2 CPUs support SSE3.

(there are some socket 754 and 939 CPUs that support SSE3, though it's kind of hit and miss).

For reference, Phenom requires at least DDR2.
13th November 2018, 08:50   #1225
LigH
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,746
Sorry, I don't know the socket numbers, so we looked at the same topic from different angles.
13th November 2018, 19:26   #1226
v0lt
Registered User
 
Join Date: Dec 2008
Posts: 1,954
Quote:
Originally Posted by Wolfberry
I ran the same test as above and get 16/38/46 fps.
What is the CPU you use for testing?
It might be related to the AVX2 code used in dav1d.
Intel i5-3570k (SSE4.1, SSE4.2, AVX), Windows 7 Sp1 x64.
15th November 2018, 00:29   #1227
SmilingWolf
I am maddo saientisto!
Join Date: Aug 2018
Posts: 95
Status report!
Previous edition: http://forum.doom9.org/showthread.ph...49#post1852449
Any paragraph I don't repeat here can be assumed to be the same as in the aforementioned post.

First of all: graphs (images not preserved): chosen metric (Y axis) vs. bits per pixel (X axis), for 720p and 1080p.
BD rates for 720p:
Code:
x264 -> rav1e (yeah you read that right!)
        RATE (%)  DSNR (dB)
 MSSSIM -0.736889 0.0375593
PSNRHVS -5.5274   0.375081

rav1e -> x265
        RATE (%) DSNR (dB)
 MSSSIM -26.5291 1.29942
PSNRHVS -27.1134 1.70509

x265 -> libaom
        RATE (%) DSNR (dB)
 MSSSIM -18.9088 0.7852
PSNRHVS -15.3123 0.761791
BD rates for 1080p:
Code:
x264 -> rav1e (yeah you read that right again!)
        RATE (%) DSNR (dB)
 MSSSIM -4.92009 0.235151
PSNRHVS -7.23088 0.473125

rav1e -> x265
        RATE (%) DSNR (dB)
 MSSSIM -26.7063 1.16103
PSNRHVS -28.0007 1.53902

x265 -> libaom
        RATE (%) DSNR (dB)
 MSSSIM -26.486  0.938124
PSNRHVS -21.7431 0.905916
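For readers new to these tables: BD rate is the average bitrate difference between two rate-quality curves over the quality range they share (negative = fewer bits for the same quality), and DSNR is the average quality difference over the bitrate range they share. Below is a minimal sketch of the classic Bjøntegaard approach (cubic least-squares fit of log10 bitrate against quality, averaged over the overlap), written in plain Python so it needs no libraries. The tool that produced the numbers above may use a different interpolation, so this is not expected to reproduce them exactly; function names are hypothetical.

```python
import math


def _polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients c[i] for x**i."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y for the Vandermonde matrix A.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(ata[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aty[r] - rest) / ata[r][r]
    return coeffs


def _average_gap(x_ref, y_ref, x_test, y_test):
    """Mean of (cubic fit of test - cubic fit of ref) over the overlapping x range."""
    lo = max(min(x_ref), min(x_test))
    hi = min(max(x_ref), max(x_test))
    if hi <= lo:
        raise ValueError("curves do not overlap")
    # Map the overlap to [-1, 1] to keep the fit well conditioned;
    # the mean of a polynomial over an interval is unchanged by this rescaling.
    mid, half = (lo + hi) / 2, (hi - lo) / 2

    def scale(xs):
        return [(x - mid) / half for x in xs]

    p_ref = _polyfit(scale(x_ref), y_ref, 3)
    p_test = _polyfit(scale(x_test), y_test, 3)
    # Integral of t**i over [-1, 1] is (1 - (-1)**(i + 1)) / (i + 1).
    integral = sum((t - r) * (1 - (-1) ** (i + 1)) / (i + 1)
                   for i, (t, r) in enumerate(zip(p_test, p_ref)))
    return integral / 2  # divide by the interval length (2) to get the mean


def bd_rate(rates_ref, q_ref, rates_test, q_test):
    """Average bitrate change of test vs. ref at equal quality, in percent."""
    log_ref = [math.log10(r) for r in rates_ref]
    log_test = [math.log10(r) for r in rates_test]
    return (10 ** _average_gap(q_ref, log_ref, q_test, log_test) - 1) * 100


def bd_quality(rates_ref, q_ref, rates_test, q_test):
    """Average quality change (e.g. dB, the DSNR column) of test vs. ref at equal bitrate."""
    log_ref = [math.log10(r) for r in rates_ref]
    log_test = [math.log10(r) for r in rates_test]
    return _average_gap(log_ref, q_ref, log_test, q_test)


if __name__ == "__main__":
    rates = [1000, 2000, 4000, 8000]
    psnr = [32.0, 35.0, 38.0, 41.0]
    print(bd_rate(rates, psnr, [r / 2 for r in rates], psnr))  # about -50.0
```

Halving every bitrate at unchanged quality gives roughly -50%, which matches the sign convention used in the report: negative is an improvement.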
Encoders:
x264 157-2935-545de2f
x265 2.9-4-471726d3a046
rav1e 0.1.0-702-ab4d23e2
libaom 1.0.0-908-g3a607f7b0

Cmdlines:
x264 --preset veryslow --tune ssim --crf 16 -o test.x264.crf16.264 orig.i420.y4m
x265 --preset veryslow --tune ssim --crf 16 -o test.x265.crf16.hevc orig.i420.y4m
rav1e --low_latency false -o test.rav1e.cq80.ivf --quantizer 80 -s 2 --tune psnr orig.i420.y4m
aomenc --frame-parallel=0 --tile-columns=3 --auto-alt-ref=1 --cpu-used=4 --tune=psnr --passes=2 --threads=2 --end-usage=q --cq-level=20 --test-decode=fatal -o test.av1.cq20.webm orig.i420.y4m
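Each cmdline above produces one point on an RD curve, and the BD rate calculation needs several. A small sketch that builds the four argument lists for a sweep, reusing the flags exactly as posted and only swapping the quality value and output name. The sweep values in `SWEEP` are hypothetical placeholders, not the values used in the report, and the flags are the 2018 ones shown above (later encoder versions may differ). The lists are only printed here, not executed.

```python
import shlex


def build_commands(crf, rav1e_q, aom_cq, src="orig.i420.y4m"):
    """Return the four encoder argument lists for one RD point."""
    return {
        "x264": ["x264", "--preset", "veryslow", "--tune", "ssim", "--crf", str(crf),
                 "-o", f"test.x264.crf{crf}.264", src],
        "x265": ["x265", "--preset", "veryslow", "--tune", "ssim", "--crf", str(crf),
                 "-o", f"test.x265.crf{crf}.hevc", src],
        "rav1e": ["rav1e", "--low_latency", "false", "-o", f"test.rav1e.cq{rav1e_q}.ivf",
                  "--quantizer", str(rav1e_q), "-s", "2", "--tune", "psnr", src],
        "aomenc": ["aomenc", "--frame-parallel=0", "--tile-columns=3", "--auto-alt-ref=1",
                   "--cpu-used=4", "--tune=psnr", "--passes=2", "--threads=2",
                   "--end-usage=q", f"--cq-level={aom_cq}", "--test-decode=fatal",
                   "-o", f"test.av1.cq{aom_cq}.webm", src],
    }


# Hypothetical sweep points: (x264/x265 CRF, rav1e quantizer, aomenc cq-level).
SWEEP = [(16, 80, 20), (22, 120, 30), (28, 160, 40), (34, 200, 50)]

if __name__ == "__main__":
    for point in SWEEP:
        for name, argv in build_commands(*point).items():
            print(" ".join(shlex.quote(a) for a in argv))
```

Keeping the arguments as lists (rather than one string) makes it easy to hand them to `subprocess.run` later without shell quoting issues.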

Notes:
So as you can see, the rav1e and aomenc cmdlines have been slightly adjusted to take advantage of the bugfixes and updates of the last few months.
In particular, Frank Bossen gave rav1e the ability to build a B-pyramid, which almost single-handedly secured rav1e's advantage over x264.
A word of warning on this last point: it's still a bit of a mixed bag. In very flat, static scenes like PresageFlowerWalk, x264 still rules by quite a margin, while rav1e takes the crown in clips like F.Y.C and PresageFlowerFight:
Code:
F.Y.C, x264 -> rav1e:
        RATE (%) DSNR (dB)
 MSSSIM -18.451  1.01281
PSNRHVS -25.7463 2.03419

PresageFlowerFight, x264 -> rav1e:
        RATE (%) DSNR (dB)
 MSSSIM -31.4953 1.80761
PSNRHVS -31.0827 2.27546

PresageFlowerWalk, x264 -> rav1e:
        RATE (%) DSNR (dB)
 MSSSIM 66.2264 -1.70084
PSNRHVS 70.8208 -2.28853
(as always, a negative BD rate means improvement, positive means regression)
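As background on the B-pyramid credited above: instead of coding frames in display order, a hierarchical structure codes the last frame of a mini-GOP first and then recursively fills in the midpoints, so every in-between frame can reference one past and one future frame. A minimal sketch of the generic dyadic ordering, assuming a power-of-two mini-GOP with frame 0 already coded; `pyramid_order` is a hypothetical helper that illustrates the idea, not rav1e's actual frame-type or reference selection logic.

```python
def pyramid_order(gop_size):
    """Coding order for a dyadic hierarchical mini-GOP (hypothetical helper).

    Frames are numbered in display order; frame 0 is the previously coded anchor.
    Returns (frame, level, references) tuples in coding order.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    order = [(gop_size, 0, (0,))]  # the next anchor, predicted from frame 0 only

    def split(lo, hi, level):
        # Code the midpoint between two already-coded frames, then recurse.
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append((mid, level, (lo, hi)))  # one past and one future reference
        split(lo, mid, level + 1)
        split(mid, hi, level + 1)

    split(0, gop_size, 1)
    return order


if __name__ == "__main__":
    for frame, level, refs in pyramid_order(8):
        print(f"code frame {frame} (level {level}) using refs {refs}")
```

For a mini-GOP of 8 this codes frames 8, 4, 2, 1, 3, 6, 5, 7. Frames coded early (low level numbers) serve as references for the later ones, which is why encoders typically give them more bits than the top-level frames that nothing references.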

Considerations about times with libaom:
I'm using my desktop PC to run all the encodes. It is also my main study/work PC, so the timings can be quite far off. Plus, I run multiple encodes in parallel, which skews them further.
HOWEVER, between annoying bugs and everything else, the first report cost me nearly a week (this includes having to re-run some encodes because sh*t happened) JUST for the libaom encodes.
Taking advantage of the recent bugfixes and improvements, I have been able to rework my workflow and bring that down to only a couple of days, WITHOUT touching the --cpu-used parameter and without encoding overnight.
All in all, I am pretty satisfied.

This concludes my (bi-monthly?) report.
As always, I'm open to any kind of feedback to improve my comparisons and my encodes.

Last edited by SmilingWolf; 15th November 2018 at 00:34.
16th November 2018, 19:53   #1228
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,738
So, what's everyone's favorite AV1 decoder app on Windows? Chrome doesn't appear to be converting from video to PC range correctly (blacks are washed out, contrast is low, etcetera). Is there a nightly of something that does AV1 correctly for an apples-to-apples comparison?
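The washed-out look has a simple numeric cause: 8-bit "video" (limited/TV) range puts black at 16 and white at 235, while a PC display expects 0 and 255, so skipping the expansion shows black as dark grey. A minimal sketch of the luma part of that mapping, assuming 8-bit content; chroma uses a different nominal range (16-240), and real players do this in the renderer or on the GPU, so this only illustrates the arithmetic.

```python
def limited_to_full_luma(y):
    """Expand 8-bit limited-range ("video"/TV) luma, nominally 16-235, to full ("PC") 0-255."""
    full = (y - 16) * 255 / 219
    return max(0, min(255, round(full)))  # clamp values outside the nominal range


if __name__ == "__main__":
    for y in (16, 126, 235):
        print(y, "->", limited_to_full_luma(y))
    # 16 -> 0, 126 -> 128, 235 -> 255
```

Without the expansion, the display would show those same pixels as 16, 126 and 235 out of 255, i.e. lifted blacks and dimmed whites, which is the low-contrast picture described above.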
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
16th November 2018, 22:45   #1229
SmilingWolf
I am maddo saientisto!
Join Date: Aug 2018
Posts: 95
Quote:
Originally Posted by benwaggoner
So, what's everyone's favorite AV1 decoder app on Windows? Chrome doesn't appear to be converting from video to PC range correctly (blacks are washed out, contrast is low, etcetera). Is there a nightly of something that does AV1 correctly for an apples-to-apples comparison?
VLC 3.0.5 (nightly). I fixed my nVidia settings just today because I had that same problem while playing back the ToS fragment I use for the tests. It plays back correctly now.
Alternatively, ffplay for quick stuff when I already have a bunch of command prompts open in the right path.

Last edited by SmilingWolf; 17th November 2018 at 12:30.
16th November 2018, 23:27   #1230
LigH
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,746
I use MPC-HC almost exclusively, which uses LAV Filters through a direct API. It was able to play AV1 clips from the YouTube beta playlist and some tiny encodes of my own (I don't have powerful CPUs available). So my experience is limited so far, but it appears to work.
18th November 2018, 12:40   #1231
SmilingWolf
I am maddo saientisto!
Join Date: Aug 2018
Posts: 95
32/64-bit binaries (GCC 9.0):
av1-1.0.0-941-gd2a592e1c: https://mega.nz/#!F5Am2KyK!9aQ6_7mM2...6_OaZahvKCHPWQ
19th November 2018, 10:51   #1232
mandarinka
Registered User
Join Date: Jan 2007
Posts: 729
Quote:
Originally Posted by Mystery Keeper
I wish aomenc/vpxenc had GOP-level parallelism, where each thread encodes one GOP and the results are then stitched together. That would make use of all CPU power without compromising quality/compression.
You could get the same results by manually splitting the source into X parts and encoding them separately, all at once. I'm not sure how well libvpx/libaom cope with that. It works great with x264 and x265 (using raw output, at least).
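A sketch of the manual split, assuming a fixed GOP length so that every chunk starts on a keyframe boundary (the fixed GOP structure mentioned earlier in the thread). `chunk_ranges` is a hypothetical helper: each (start, end) pair would be handed to a separate encoder instance as a start frame and a frame count, and the raw outputs concatenated afterwards. It does nothing about rate-control state across the boundaries.

```python
def chunk_ranges(total_frames, gop_length, parts):
    """Split [0, total_frames) into up to `parts` GOP-aligned (start, end) ranges."""
    if total_frames <= 0 or gop_length <= 0 or parts <= 0:
        raise ValueError("arguments must be positive")
    n_gops = -(-total_frames // gop_length)  # ceiling division
    parts = min(parts, n_gops)               # never create empty chunks
    base, extra = divmod(n_gops, parts)
    ranges, start_gop = [], 0
    for p in range(parts):
        count = base + (1 if p < extra else 0)  # spread leftover GOPs over the first chunks
        end_gop = start_gop + count
        ranges.append((start_gop * gop_length, min(end_gop * gop_length, total_frames)))
        start_gop = end_gop
    return ranges


if __name__ == "__main__":
    for start, end in chunk_ranges(1000, 240, 3):
        print(f"start frame {start}, frame count {end - start}")
```

Splitting by whole GOPs rather than by equal frame counts keeps every cut on a keyframe, at the cost of slightly unequal chunk lengths.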
19th November 2018, 10:58   #1233
mandarinka
Registered User
Join Date: Jan 2007
Posts: 729
Quote:
Originally Posted by LigH
@ Nintendo Maniac 64:

Even AMD Athlon64/Phenom (K8-K10 arch.) support some SSE3, but x264/x265 do not use it; they consider the implementation on those CPUs "too slow", I believe.
SSE3 is not particularly useful for multimedia; it's just a few instructions introduced with the Prescott P4 and the 90nm Venice K8.

You probably mean SSSE3 (SSS instead of SS), aka "Supplemental SSE3", which is a confusing and dumb name. It probably should have been SSE4 but got renamed for marketing reasons. Or SSE3 was not supposed to be SSE3 originally.

SSSE3 is very useful for encoding and decoding, but it only comes with Core 2 chips, and with Bobcat/Bulldozer and later cores from AMD. K10 and K8 stop at the not-so-important SSE3.
(Note that x265 actually needs SSSE3 + SSE4 to be useful; you are barred from most of the assembly optimizations if you only have SSSE3, like with 65nm Core 2s or pre-Sandy Bridge Pentiums/Celerons.)

Last edited by mandarinka; 19th November 2018 at 11:01.
19th November 2018, 13:09   #1234
LigH
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,746
Thanks, mandarinka, that explains a bit. I meant SSE3 of 2004 (a.k.a. "Prescott New Instructions" PNI, according to Wikipedia), originally. SSSE3 of 2006 did not arrive in AMD CPUs before the "Cat" (Fusion APU) and "Heavy Equipment" series, so Athlon64/Phenom are clearly out of business.
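On Linux you can check which of these extensions a given CPU reports by reading the flags line in /proc/cpuinfo; note that the kernel still lists SSE3 under its original name, "pni" (Prescott New Instructions). A small sketch, Linux-specific: the flag strings are the kernel's names, while the helper names and the selection of extensions are just for illustration.

```python
# Linux /proc/cpuinfo flag names; SSE3 is reported as "pni".
SIMD_FLAGS = [
    ("SSE2", "sse2"),
    ("SSE3", "pni"),
    ("SSSE3", "ssse3"),
    ("SSE4.1", "sse4_1"),
    ("SSE4.2", "sse4_2"),
    ("AVX", "avx"),
    ("AVX2", "avx2"),
]


def simd_levels(flags_line):
    """Return the SIMD extensions named in a cpuinfo 'flags' line."""
    flags = set(flags_line.split())
    return [name for name, flag in SIMD_FLAGS if flag in flags]


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the first 'flags' line from cpuinfo, or '' if it is unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line.split(":", 1)[1]
    except OSError:
        pass
    return ""


if __name__ == "__main__":
    print(simd_levels(read_cpu_flags()))
```

A K8/K10-era CPU would show SSE2 and SSE3 but no SSSE3, while a Core 2 would add SSSE3, matching the split described above.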
20th November 2018, 01:08   #1235
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,738
Quote:
Originally Posted by mandarinka
You could get the same results by manually splitting the source into X parts and encoding them separately, all at once. I'm not sure how well libvpx/libaom cope with that. It works great with x264 and x265 (using raw output, at least).
Naïve split-and-stitch risks violating VBV at the stitch boundaries and/or reducing quality at those boundaries in order to ensure VBV compliance.

Not that VBV is being used in any AV1 testing I've seen so far.
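To illustrate the stitch-boundary problem, here is a simplified leaky-bucket model in arbitrary units: a fixed amount of data arrives per frame interval, capped at the buffer size, and each frame must be fully in the buffer when it is decoded. Real VBV/HRD models are more detailed; `vbv_underflow` and the frame sizes below are made up for the example. Each chunk passes on its own because its encoder assumed a full buffer at the start, but after concatenation the second chunk's big first frame arrives when the buffer is only half full.

```python
def vbv_underflow(frame_sizes, bufsize, rate_per_frame, init_fullness=1.0):
    """Return the index of the first frame that underflows a simplified VBV, or None.

    The buffer starts init_fullness * bufsize full; each frame removes its size,
    then one frame interval of data (rate_per_frame) arrives, capped at bufsize.
    """
    fullness = init_fullness * bufsize
    for i, size in enumerate(frame_sizes):
        if size > fullness:
            return i  # this frame has not fully arrived by its decode time
        fullness = min(bufsize, fullness - size + rate_per_frame)
    return None


if __name__ == "__main__":
    BUF, RATE = 1000, 100
    chunk_a = [100, 100, 100, 100, 600]  # ends with a big frame, buffer left half empty
    chunk_b = [900, 100, 100]            # starts with a big keyframe, assumes a full buffer
    print(vbv_underflow(chunk_a, BUF, RATE))            # None
    print(vbv_underflow(chunk_b, BUF, RATE))            # None
    print(vbv_underflow(chunk_a + chunk_b, BUF, RATE))  # 5
```

Fixing this requires either telling each chunk's encoder the buffer state it will inherit, or shrinking the frames near the boundary, which is the quality cost mentioned above.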
20th November 2018, 04:13   #1236
utack
Registered User
 
Join Date: Apr 2018
Posts: 63
So I am not entirely sure what the stats file from the first pass includes.
When using pure "q" mode for constant quality, is there a benefit to doing a first pass, or does the first pass only determine how to distribute bitrate when a target bitrate and VBR are specified?
20th November 2018, 16:37   #1237
marcomsousa
Registered User
 
Join Date: Jul 2018
Posts: 80
Building Modern Web Media Experiences: AV1 (Chrome Dev Summit 2018)
https://youtu.be/iTC3mfe0DwE?t=612
Last edited by marcomsousa; 20th November 2018 at 19:16.
21st November 2018, 11:24   #1238
uneedme
Registered User
 
Join Date: Sep 2007
Posts: 41
Hi all

Still, is there anywhere I can find detailed explanations of the parameters and their valid argument ranges?


forgive my poor wording...

high-end spree means nothing...
22nd November 2018, 00:09   #1239
utack
Registered User
 
Join Date: Apr 2018
Posts: 63
dav1d is doing well
http://www.jbkempf.com/blog/post/201...-first-release
22nd November 2018, 11:48   #1240
Wolfberry
Helenium(Easter)
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 99
64-bit GCC 8.2.0 binaries: av1-1.0.0-962-1468e60d7

Quote:
AVX2 ver of highbd dr predictions Z1,Z3
performance increase 1.22x-20x depending on input params
__________________
Monochrome Anomaly
All times are GMT +1.