Alliance for Open Media codecs

easyfab
26th November 2018, 22:19
For me (16-thread CPU), without tiles, row-mt alone is more than 2x slower (it only uses 4-6 threads, CPU usage 10-20%): less than 20 fpm.
With row-mt + tiles it uses all threads (CPU usage 50-60%) and gives 34 fpm on the first 1500 frames of ToS. I'm OK with losing a bit of quality with tiles to gain 2x speed.

SmilingWolf
26th November 2018, 22:22
We share the same opinion. The tradeoffs are vastly in favour (http://forum.doom9.org/showthread.php?p=1856939#post1856939) of tiling

julius666
28th November 2018, 00:30
My try with the first 1500 frames @1000kb/s with row-mt + tiles

My 2-pass command line:

7z.exe x "ToS_1920x800_xdither.7z" -so | aomenc.exe --cpu-used=4 --row-mt=1 --threads=16 --tile-columns=4 --tile-rows=2 --kf-max-dist=250 --bias-pct=75 --webm -o tos.webm --target-bitrate=1000 --codec=av1 --passes=2 --pass=2 --fpf=fpf --limit=1500 -
Pass 2/2 frame 1500/1481 8290530B 2633147 ms 34.18 fpm [ETA unknown]
Pass 2/2 frame 1500/1500 8314991B 44346b/f 1064304b/s 2656175 ms (0.56 fps)
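In aomenc, --tile-columns and --tile-rows take log2 values, so the command above requests 2^4 = 16 tile columns and 2^2 = 4 tile rows. Below is a rough sketch of how the AV1 specification's uniform tile spacing turns that request into an actual tile grid for a 1920x800 frame. Assumptions: uniform spacing only, a superblock size of 64 or 128 (the encoder picks it), and only the maximum-tile-count clamp is modeled (the max tile width/area limits don't come into play at this resolution).

```python
import math

MAX_TILE_COLS = 64  # AV1 limits on the number of tile columns/rows
MAX_TILE_ROWS = 64


def tile_log2(blk_size: int, target: int) -> int:
    """Smallest k such that (blk_size << k) >= target, as in the AV1 spec."""
    k = 0
    while (blk_size << k) < target:
        k += 1
    return k


def uniform_tile_count(sb_count: int, log2_tiles: int, max_tiles: int) -> int:
    """Number of tiles that uniform spacing produces along one axis."""
    # Clamp the requested log2 value to what this many superblocks can hold.
    log2_tiles = min(log2_tiles, tile_log2(1, min(sb_count, max_tiles)))
    tile_size_sb = (sb_count + (1 << log2_tiles) - 1) >> log2_tiles
    # Tiles start every tile_size_sb superblocks; the last one may be smaller.
    return math.ceil(sb_count / tile_size_sb)


def tile_grid(width: int, height: int, log2_cols: int, log2_rows: int, sb_size: int = 64):
    """Return (tile_columns, tile_rows) for a frame of the given size."""
    sb_cols = math.ceil(width / sb_size)
    sb_rows = math.ceil(height / sb_size)
    cols = uniform_tile_count(sb_cols, log2_cols, MAX_TILE_COLS)
    rows = uniform_tile_count(sb_rows, log2_rows, MAX_TILE_ROWS)
    return cols, rows


for sb in (64, 128):
    cols, rows = tile_grid(1920, 800, log2_cols=4, log2_rows=2, sb_size=sb)
    print(f"{sb}x{sb} superblocks: {cols} x {rows} = {cols * rows} tiles")
```

Both superblock sizes come out at 15 x 4 = 60 tiles: a 1920-pixel-wide frame doesn't have enough superblock columns to fill 16 evenly spaced tile columns.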

And the result file : https://www.sendspace.com/file/dcf6ii

It seems to decode fine for me.

Wow, this looks incredibly good! And it plays smoothly on my old Thinkpad X230, which is actually the worst (AV1-capable) HW I have access to atm.
This is almost unbelievable given the codec's infancy.

Nintendo Maniac 64
28th November 2018, 10:34
it plays smoothly on my old Thinkpad X230

Are you playing this back with dav1d or libaom?

julius666
29th November 2018, 22:10
Are you playing this back with dav1d or libaom?

I tried it with the latest mpv on Arch Linux. It uses libaom by default:


[vd] Codec list:
[vd] libaom-av1 (av1) - libaom AV1
[vd] Opening decoder libaom-av1
[vd] No hardware decoding requested.
[vd] Using software decoding.
[vd] Detected 4 logical cores.
[vd] Requesting 5 threads for decoding.
[ffmpeg/video] libaom-av1: 1.0.0
[vd] Selected codec: libaom-av1 (libaom AV1)

mandarinka
6th December 2018, 19:57
I don't see anybody talking about it here yet, but it seems Google or some other members decided to "unfreeze" the bitstream; by the look of it, there will be an incompatible AV1.1.0 revision.
Supposedly it's mostly driven by requests from hardware implementers who wanted to restrict the format a bit in some places.

It looks like hardware might mostly implement only AV1.1.0, not AV1.0? (This is a bit speculative; maybe there will be exceptions.)

Current AV1.0 decoders will support the more restricted AV1.1.0 streams, but new (hardware) decoders targeting AV1.1.0 won't be compatible with AV1.0 video. Rather messy. IMHO they should have just delayed the codec finalization and done it properly the first time around instead of this, but I guess politics prevailed.

There's remarkable silence about it, but I guess it's not the best news so they don't want to brag about it too much until it's final (not sure if the decision is set in stone already).

utack
6th December 2018, 20:33
Supposedly it's mostly driven by requests of hardware implementers which wanted to restrict the format a bit in some places.

Even worse, it seems to be about tiles.
These should never have been implemented in a new codec in the first place; they are just a lazy workaround because libaom sucks at frame-parallel encoding and decoding.
rav1e and dav1d won't need them.

SmilingWolf
6th December 2018, 21:34
Ronald Bultje commented on frame parallelism being a bad thing for VP9 (https://blogs.gnome.org/rbultje/2014/02/22/the-worlds-fastest-vp9-decoder-ffvp9/), so it's not much of a surprise that it was turned off by default in AV1/libaom.
Tiles help with decoding, do I have to re-link my tests every time this stuff is brought up?

The proposed changes to tile width management only had to be set in stone; all I know is that they have been the de facto standard in the libaom codebase since at least this summer, when I first began encoding 1080p clips.
EDIT: looks like I was thinking about something else in these regards, nevermind

And since we're talking about this, why not at least link to the source of the news?
https://www.reddit.com/r/AV1/comments/a1038a/av11_is_launching_soon_and_will_include_breaking/

nevcairiel
6th December 2018, 22:31
Tiles help with decoding, do I have to re-link my tests every time this stuff is brought up?

That may be true, but you can make the same argument for a lot of things. That alone does not necessarily justify a feature that's generally rather annoying and generally considered a remnant of the past.

In any case, at this point AV1.0 bitstreams will just eventually vanish, and AV1.1 will be the actual standard people use. Outside of tech demos, AV1 is still generally unused.
And knowing this change is coming, no one is going to adopt it until this is cleared up and "final" again.

mandarinka
8th December 2018, 01:08
I can't tell how correct it is, but this was an interesting read: https://codecs.multimedia.cx/2018/12/why-i-am-sceptical-about-av1/
Author is a former libav/ffmpeg developer if you don't remember his name.

nevcairiel
8th December 2018, 11:13
He can ramble about hardware influence all he wants, but if you don't get the hardware people onboard, your codec is DOA anyway.

utack
8th December 2018, 15:34
I can't tell how correct it is, but this was an interesting read: https://codecs.multimedia.cx/2018/12/why-i-am-sceptical-about-av1/
Author is a former libav/ffmpeg developer if you don't remember his name.

The fact that the Daala people, who provided most of the new ideas, started their own rav1e encoder from scratch supports this blog post.
I also don't get why hardware seems to play such a big role. VP9 was not made with hardware in mind, and Qualcomm, Samsung, Nvidia, AMD, Intel, as well as some random Chinese SoC vendors still managed to make hardware decoders. So hardware designers can't be scared off that easily, it seems?

Mjpeg
8th December 2018, 17:11
He can ramble about hardware influence all he wants, but if you don't get the hardware people onboard, your codec is DOA anyway.

I agree. I'm just a lurker, but the HEVC licensing debacle opened a window for a few years, so AV1 needed to jump in quickly to have a chance, which led to the not-so-radical design that annoys him. I think what he misses is that if AV1 succeeds, then we'll get AV2, AV3, etc.

It's super clear to me that hardware is crucial because of playback. A codec cannot succeed if "OMG Youtube Is Killing My Battery" is a big reddit thread!

SmilingWolf
8th December 2018, 17:39
The fact that the Daala people, who provided most of the new ideas, started their own rav1e encoder from scratch supports this blog post.
rav1e was started to provide a minimal and fast encoder as early as possible, it was not some kind of statement.
There are even some slides about why working on the existing libvpx-then-branched-libaom codebase made things hard in some xiph.org user folder, I'll edit if I can find them again.
It mainly had to do with libaom being big, old and full of experiments anyway
---
Alright found a couple right off the bat:
https://people.xiph.org/~tdaede/rav1e_vdd_2017.pdf
Started as a reimplementation of AV1 in order to find bitstream and specification bugs
● Could do an encoder or decoder:
– Decoder (especially fuzzed) more useful to find mismatches
– Encoder doesn’t need all features implemented to work
● Algorithmic improvements over libaom
https://people.xiph.org/~tdaede/rav1e_vdd_2018.pdf
Background on libaom:
Derived from libvpx codebase
● Reference implementation, “sort of usable”
● Much encoder behavior is inherited from previous VPx codecs
– multiple frame passes
– weird rate control

As for hardware, we can speculate how Google had to drag some vendors on board with who knows what promises or deals as a last-ditch effort to promote VP9. It was not designed with hardware in mind, hardware support came very late, and whoa, look at the number of people willing to make VP9 encodes outside of Google/YouTube! /s

One would think they learnt from that experience. Again, we can only speculate, but that would provide a good explanation for the veto power of hardware vendors this time around.

That may be true, but you can make the same argument for a lot of things. That alone does not necessarily justify a feature that's generally rather annoying and generally considered a remnant of the past.
But it's a matter of fact that we're stuck with it. Considering it's not that hard to understand how to use it from an encoder (the person, not the software) POV and the low overhead, discouraging its use will only have people creating streams that can't be reproduced on a very large amount of systems. Next thing you know, plenty of people will be screaming bloody murder because their 16 core system stutters on a 1080p clip with mild to heavy motion and the CPU is still underused.

What are the modern alternatives to tiling? I remember reading somewhere that WPP (wavefront parallel processing, as used in HEVC) is patented, so what would be the next best alternative?

alex1399
9th December 2018, 11:37
Oh no, YouTube has coupled high-definition resolutions with 60fps in almost every video. It uses a trivial "motion interpolation" that simply duplicates previous frames to convert the native frame rate to 60fps. What a jerk. That's why it works so hard on some low-end PCs with high-speed Internet.

Nintendo Maniac 64
9th December 2018, 22:23
It seems that at some point YouTube actually implemented AV1 "in the wild". Much like how VP9 was rolled out, it seems that the larger the view count then the more likely you'll get an AV1 encode...but not entirely - this is most obvious on Linus Tech Tips where this LTT video with ~1.6M views (https://www.youtube.com/watch?v=U5dJn7V_4Bk) has AV1 but this other LTT video posted just 1 day before with ~2.8M views (https://www.youtube.com/watch?v=VWHlPH23P-w) does not...

Unless YouTube flipped an AV1 switch on December 3rd specifically, only for newly posted videos, making the first-linked video qualify while the second-linked video did not?




Anyway, I just finished a bunch of CPU decoding performance testing for one of SmilingWolf's personal videos that he encoded in AV1. While the absolute performance numbers are relatively meaningless, since they largely depend on things like bitrate, framerate, and resolution, the relative decoding performance numbers should still be of interest.

The way I measured included (but was not limited to) playing a video clip at 17fps and then finding the lowest clockrate required for a given CPU architecture and thread configuration, with a 45nm Core 2 Duo @ 3.5GHz as the baseline. I also tested performance scaling at various clockrates within a single architecture, and save for Wolfdale (possibly due to its lack of an integrated memory controller), the scaling was for all intents and purposes identical between Nehalem and Haswell (that is, the percentage of extra clockrate needed to play back 24fps vs 17fps was darned-near exactly the same).

All of this was tested with MPC-HC 1.8.3 x64 (LAV Filters 0.73, libaom), and only on CPUs that support SSE4.1, as CPUs lacking this instruction set, such as the Phenom II and the 65nm Conroe-based Core 2 Duo (even though the latter supports SSSE3 and not just SSE3), would have needed a 10+ GHz overclock (I'm not kidding).

And to clarify, the percentage below is simply how much faster a given CPU should be if it had the same clockrate as the baseline 2c/2t Wolfdale (which itself was clocked at 3.5GHz).


100% - Wolfdale 2c/2t
119% - Nehalem 2c/2t
130% - Wolfdale 4c/4t
138% - Nehalem 2c/4t
167% - Haswell 2c/2t
175% - Nehalem 4c/4t
175% - Nehalem 4c/8t (not a typo)

And from some of the other tests I did, I was able to extrapolate the performance of Haswell CPUs configured at 2c/4t, 4c/4t, and 4c/8t (again, relative to a 2c/2t Wolfdale) as I do not have access to Haswell CPUs with thread configurations greater than 2c/2t:

199% - Haswell 2c/4t
241% - Haswell 4c/4t
241% - Haswell 4c/8t (not a typo)
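Since each percentage is (my reading of the description above) the baseline clock divided by the lowest clock a CPU needed for the same 17fps, the figures can be turned back into approximate required clocks. A small sketch, assuming decode speed scales linearly with clock within one architecture; the names are illustrative only.

```python
BASELINE_CLOCK_GHZ = 3.5  # 2c/2t Wolfdale baseline from the post

RELATIVE_PERF = {
    "Wolfdale 2c/2t": 100,
    "Nehalem 2c/2t": 119,
    "Wolfdale 4c/4t": 130,
    "Nehalem 2c/4t": 138,
    "Haswell 2c/2t": 167,
    "Nehalem 4c/4t": 175,
    "Nehalem 4c/8t": 175,
    "Haswell 2c/4t": 199,  # extrapolated in the post
    "Haswell 4c/4t": 241,  # extrapolated in the post
    "Haswell 4c/8t": 241,  # extrapolated in the post
}


def required_clock_ghz(relative_percent: float, baseline_ghz: float = BASELINE_CLOCK_GHZ) -> float:
    """Clock needed to match the baseline's playback speed, assuming
    decode speed scales linearly with clock within one architecture."""
    return baseline_ghz * 100.0 / relative_percent


for name, pct in RELATIVE_PERF.items():
    print(f"{name:16} {pct:4}%  ~{required_clock_ghz(pct):.2f} GHz")

# Hypothetical: Wolfdale scaling from 2c/2t to 4c/4t the way Nehalem does.
nehalem_scaling = RELATIVE_PERF["Nehalem 4c/4t"] / RELATIVE_PERF["Nehalem 2c/2t"]
print(f"Wolfdale 4c/4t with Nehalem-like scaling: ~{required_clock_ghz(100 * nehalem_scaling):.1f} GHz")
```

This gives about 2.94 GHz for Nehalem 2c/2t and about 2.69 GHz for Wolfdale 4c/4t, versus roughly 2.4 GHz if Wolfdale scaled like Nehalem.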


For those that don't know their CPU architectures...
Wolfdale = second generation desktop Core 2 Duo/Quad, 45nm die-shrink
Nehalem = 45nm, first generation of Intel CPUs that use the Core i5/i7 branding (i3 didn't come along until the 32nm Westmere die-shrink...which was still considered "1st gen" oddly enough)
Haswell = 22nm, fourth generation of Intel CPUs that use the Core i3/i5/i7 branding

Zebulon84
10th December 2018, 00:18
It seems that at some point YouTube actually implemented AV1 "in the wild". Much like how VP9 was rolled out, it seems that the larger the view count then the more likely you'll get an AV1 encode...but not entirely - this is most obvious on Linus Tech Tips where this LTT video with ~1.6M views (https://www.youtube.com/watch?v=U5dJn7V_4Bk) has AV1 but this other LTT video posted just 1 day before with ~2.8M views (https://www.youtube.com/watch?v=VWHlPH23P-w) does not...

Unless YouTube flipped an AV1 switch on December 3rd specifically, only for newly posted videos, making the first-linked video qualify while the second-linked video did not?
The one with more views and no AV1 is also longer (18 min vs 11 min), so it may still be encoding, or length may be taken into account when choosing which videos are worth spending hours of encoding time on for a few AV1 views.

Thanks for the benchmarks; it proves it's possible to do software decoding of AV1 on desktop, but it's quite heavy, will be hard on laptop batteries, and is probably too much for any smartphone. Do you plan to add ...lake, Ryzen or dav1d?

Nintendo Maniac 64
10th December 2018, 01:03
it proves it's possible to do software decoding of AV1 on desktop, but it's quite heavy, will be hard on laptop batteries, and is probably too much for any smartphone

Remember that this depends heavily on video resolution, framerate, and bitrate.

In particular, I had manually slowed the video I was using down to 17fps because that was the highest framerate my Nehalem Xeon X3470 could handle when configured as 2c/2t with turbo disabled (2.93GHz), which is why I did not provide any absolute performance numbers and focused on relative performance.



Do you plan to add ...lake, Ryzen or dav1d?

I don't have any such CPUs, and I've no idea how to benchmark dav1d since coding and such is totally not my specialty (my expertise is much more in hardware).

However, Zen-based CPUs should have per-GHz performance similar to Haswell while Sky/Kaby/Coffee lake will have slightly better per-GHz performance than Haswell, so for the most part you can just use the Haswell relative performance numbers (not the clockrate!) as a reference for those architectures.

hajj_3
11th December 2018, 17:11
dav1d v0.1 has been released: http://www.jbkempf.com/blog/post/2018/First-release-of-dav1d

sneaker_ger
11th December 2018, 17:52
And, we've been experimenting with shaders, notably for the Film Grain feature.
shaders = GPU?

v0lt
11th December 2018, 17:52
dav1d v0.1 has been released: http://www.jbkempf.com/blog/post/2018/First-release-of-dav1d
Good news.
I will wait for an ffmpeg build with both the libaom and libdav1d libraries. I want to compare their speed under the same conditions.

SmilingWolf
11th December 2018, 20:07
64-bit, GCC 8.2:
ffmpeg 4.2-92673-g876ed08b0d: https://mega.nz/#!QxpinIyQ!HBtUEzFObdc5RDFEc3UrzOdaRo8QxNxABGMlPcvgSWA
- libaom 1.0.0-1024-g5b8f393fe
- libdav1d 0.1.0 c0501f1

sneaker_ger
11th December 2018, 20:48
Thx. Seems dav1d is still easily 40% slower on an i5-2500K (AVX but no AVX2) compared to libaom. Only ~50% CPU utilization on both.

nevcairiel
11th December 2018, 21:14
SSE* code is still actively being worked on and is coming in right now, so it's getting faster day by day on those systems. AVX1 doesn't help a codec like this much, since AVX instructions are primarily floating point, and only AVX2 adds the required integer instructions.

SmilingWolf
11th December 2018, 21:25
To follow the progress of SSSE3 implementation: https://code.videolan.org/videolan/dav1d/issues/216
Same thing for NEON: https://code.videolan.org/videolan/dav1d/issues/215

An article on dav1d 0.1.0 by the same guy who's been doing most of the benchmarks that appeared in the official blogposts: https://medium.com/@ewoutterhoeven/dav1d-0-1-0-release-the-first-benchmarks-5404360e44e3

v0lt
12th December 2018, 04:03
SSE* code is still actively being worked on and is coming in right now, so it's getting faster day by day on those systems. AVX1 doesn't help a codec like this much, since AVX instructions are primarily floating point, and only AVX2 adds the required integer instructions.
They claim that dav1d is always faster than libaom, and that there are problems only in single-threaded mode. That claim falls apart. :mad:

On modern desktop, dav1d is very fast, compared to other decoders:
Pentium G5600 not modern?
But, since the previous blogpost, we've added more assembly for desktop, and we've merged some assembly for ARMv8, and for older machines (SSSE3).

We're now as fast as libaom, in single-thread, on ARMv8, and faster with more threads.
My tests:
ffmpeg -t 10 -c:v libaom-av1 -i Stream2_AV1_4K_22.7mbps.webm -benchmark -f null -
ffmpeg -t 10 -c:v libdav1d -i Stream2_AV1_4K_22.7mbps.webm -benchmark -f null -
ffmpeg -t 10 -c:v libdav1d -threads 4 -tilethreads 4 -i Stream2_AV1_4K_22.7mbps.webm -benchmark -f null -
Result:
libaom-av1 - 14 fps
libdav1d - max 7.1 fps
libdav1d -threads 4 -tilethreads 4 - max 9.6 fps
I got the exact same result a month ago (https://forum.doom9.org/showthread.php?p=1857317#post1857317).
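The benchmark runs in this thread differ only in the decoder and its thread options, so a thread-count sweep can be scripted. A minimal sketch that builds the same command lines; the helper names are hypothetical, and -tilethreads is the libdav1d wrapper option used in the posts (it may differ in other ffmpeg builds). Nothing is executed unless you call run_benchmark yourself.

```python
import shlex
import subprocess


def benchmark_cmd(decoder, infile, threads=None, tilethreads=None, seconds=10):
    """Build an ffmpeg decode-only benchmark command like the ones above.

    Options placed before -i apply to the input, i.e. to the decoder.
    The null muxer discards the decoded frames, so only decoding is timed.
    """
    cmd = ["ffmpeg", "-t", str(seconds), "-c:v", decoder]
    if threads is not None:
        cmd += ["-threads", str(threads)]
    if tilethreads is not None:
        # libdav1d wrapper option from the Dec 2018 builds used in this thread;
        # check `ffmpeg -h decoder=libdav1d` on your own build.
        cmd += ["-tilethreads", str(tilethreads)]
    cmd += ["-i", infile, "-benchmark", "-f", "null", "-"]
    return cmd


def run_benchmark(cmd):
    """Run one command and return ffmpeg's log output (ffmpeg logs to stderr)."""
    return subprocess.run(cmd, capture_output=True, text=True).stderr


SAMPLE = "Stream2_AV1_4K_22.7mbps.webm"
CONFIGS = [
    ("libaom-av1", None, None),
    ("libdav1d", None, None),
    ("libdav1d", 4, 4),
    ("libdav1d", 8, 1),  # the combination suggested by MoSal
]

for decoder, threads, tilethreads in CONFIGS:
    print(shlex.join(benchmark_cmd(decoder, SAMPLE, threads, tilethreads)))
```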

Wolfberry
12th December 2018, 04:45
This commit (https://code.videolan.org/videolan/dav1d/commit/02312cae6c45a58d1b275ad80eb6c41271415c3b) may help.

Not sure if it is CLI only or can be used in ffmpeg.

Nintendo Maniac 64
12th December 2018, 05:04
Pentium G5600 not modern?

Unfortunately, many people do not realize that Intel Pentiums do not support AVX at all.

At least going forward there's now an AMD alternative in the form of the Athlon 200GE, which does support AVX2 (in addition to having a better iGPU and actually sane prices in light of Intel's 14nm shortage), but that processor was only released a couple of months ago.

MoSal
12th December 2018, 05:48
libaom-av1 - 14 fps
libdav1d - max 7.1 fps
libdav1d -threads 4 -tilethreads 4 - max 9.6 fps
I got the exact same result a month ago (https://forum.doom9.org/showthread.php?p=1857317#post1857317).

Can you try -threads 8 -tilethreads 1?

marcomsousa
12th December 2018, 10:55
ffmpeg-4.2-92681-0e833f6
- libaom 1.0.0-1028-78e6b2c
- libdav1d 0.1.0 73067e5

ffmpeg -t 10 -c:v libaom-av1 -i Stream2_AV1_4K_22.7mbps.webm -benchmark -f null -
ffmpeg -t 10 -c:v libdav1d -i Stream2_AV1_4K_22.7mbps.webm -benchmark -f null -
ffmpeg -t 10 -c:v libdav1d -threads 4 -tilethreads 4 -i Stream2_AV1_4K_22.7mbps.webm -benchmark -f null -

Result:
libaom-av1 - 21 fps 0.780x speed
libdav1d - 41 fps 1.65x speed
libdav1d -threads 4 -tilethreads 4 - 58 fps 2.31x speed

CPU: Intel i7 8550U (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, EM64T, AES, AVX, AVX2, FMA3)

Gravitator
12th December 2018, 12:02
ffmpeg-4.2-92396-g55e021f39b (https://forum.doom9.org/showthread.php?p=1857294#post1857294)
- libaom 1.0.0-902-g03d8ebedc
- libdav1d 58fc516
ffmpeg -hide_banner -t 10 -c:v libaom-av1 -i 1.mp4 -benchmark -f null - (43 fps)
ffmpeg -hide_banner -t 10 -c:v libdav1d -i 1.mp4 -benchmark -f null - (52 fps)
ffmpeg -hide_banner -t 10 -c:v libdav1d -threads 1 -tilethreads 2 -i 1.mp4 -benchmark -f null - (61 fps)
ffmpeg -hide_banner -t 10 -c:v libdav1d -threads 2 -tilethreads 2 -i 1.mp4 -benchmark -f null - (65 fps)

ffmpeg-4.2-92681-0e833f6 (https://forum.doom9.org/showthread.php?p=1859780#post1859780)
- libaom 1.0.0-1028-78e6b2c
- libdav1d 0.1.0 73067e5
ffmpeg -hide_banner -t 10 -c:v libaom-av1 -i 1.mp4 -benchmark -f null - (45 fps)
ffmpeg -hide_banner -t 10 -c:v libdav1d -i 1.mp4 -benchmark -f null - (51 fps)
ffmpeg -hide_banner -t 10 -c:v libdav1d -threads 1 -tilethreads 2 -i 1.mp4 -benchmark -f null - (58 fps)
ffmpeg -hide_banner -t 10 -c:v libdav1d -threads 2 -tilethreads 2 -i 1.mp4 -benchmark -f null - (63 fps)

nevcairiel
12th December 2018, 12:50
They claim that dav1d is always faster than libaom. They say that there are problems only in single-threaded mode. This lie breaks. :mad:

Actually, it says that it will soon be faster than other decoders on all platforms. "Soon", not now.

If you don't have AVX2, the decoder is still being bottlenecked quite heavily, and also won't thread quite as nicely because reference frames take too long to decode, for example. The SSSE3 work is still at early stages - if you look at the ticket linked above, only a small part of assembly has been covered in SSSE3 yet.

NikosD
12th December 2018, 14:39
An SSSE3 code base is fundamental because SSSE3 is the first instruction set supported by every Core 2 Duo and newer CPU (but not the Pentium 4), and it's also very useful for decoding (at least for previous codecs like H.264/H.265).

But I don't know if they want to go back to even older instruction sets and CPUs like SSE2.

We'll see.

sneaker_ger
12th December 2018, 15:15
Shame for AMD users. K10 (like Phenom II) and similar don't have SSSE3. Produced up to 2012. Of course it will be years until AV1 is de-facto required (if ever) so by then...

SmilingWolf
12th December 2018, 15:24
But I don't know if they want to go back to even older instruction sets and CPUs like SSE2.

https://code.videolan.org/videolan/dav1d/issues/207#note_24056

nevcairiel
12th December 2018, 15:30
Shame for AMD users. K10 (like Phenom II) and similar don't have SSSE3. Produced up to 2012. Of course it will be years until AV1 is de-facto required (if ever) so by then...

The market share of non-SSSE3 desktop CPUs is so small that no one is really going to bother with it, particularly because many of those CPUs are often going to be too slow for any real use anyway.
And in all honesty, if you bought a K10 in 2012 or anywhere near that, you just did it wrong, even in the low-end market.

Intel introduced SSSE3 all the way back in 2006, after all. It's hardly "new", even in 2012.

Ultimately it's up to the developers how they want to spend their time, but as mentioned in the ticket linked above, pure SSE2 is often a lot more painful to write than code using the SSSE3 enhancements.

NikosD
12th December 2018, 15:38
https://code.videolan.org/videolan/dav1d/issues/207#note_24056 Thank you.

So, SSSE3 is the minimum.

Little pity for AMD CPUs.

sneaker_ger
12th December 2018, 15:39
Steam HW Survey says 3% don't have SSSE3, only 0.01% don't have SSE3.

https://store.steampowered.com/hwsurvey

nevcairiel
12th December 2018, 15:43
SSE3 (without the third S) is mostly useless for video. It's primarily floating-point.
For video, which needs integer instructions, there are only a few meaningful steps (everything left out, like SSE3 and AVX1, is mostly floating point or otherwise unrelated):

- MMX
- SSE2
- SSSE3
- SSE4.1
- AVX2
- AVX512

Obviously no one cares about MMX anymore. SSE4.1 is only useful in special cases. And AVX512 is not yet rolled out, or perhaps even understood, widely enough; maybe in a few years.
So, by and large, that leaves SSE2, SSSE3, and AVX2. The difference between SSE2 and SSSE3 is not gigantic, same 128-bit registers after all; SSSE3 only adds a bunch of new instructions, but some of those are really useful and make code much simpler and easier to write.
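Decoders pick their assembly at runtime from what the CPU reports and fall back to plain C when nothing matches. A rough sketch of that selection over the integer tiers listed above; the function names, the IMPLEMENTED tuple, and the abbreviated flag sets are illustrative assumptions, not dav1d's actual code. Flag names follow Linux /proc/cpuinfo spellings.

```python
# Integer SIMD levels that matter for video, newest first (SSE3 and AVX1 are
# skipped because they mostly add floating-point instructions).
INTEGER_SIMD_TIERS = ("avx512bw", "avx2", "sse4_1", "ssse3", "sse2", "mmx")


def flags_from_cpuinfo(text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def pick_simd_path(cpu_flags, implemented):
    """Pick the best tier that the CPU supports AND the decoder has assembly for.

    If nothing matches, the decoder runs its plain C code, the slow fallback.
    """
    for tier in INTEGER_SIMD_TIERS:
        if tier in cpu_flags and tier in implemented:
            return tier
    return "c"


# Hypothetical decoder with AVX2 assembly plus some early SSSE3 work.
IMPLEMENTED = ("avx2", "ssse3")

# Abbreviated, illustrative flag sets (not complete /proc/cpuinfo dumps).
cpus = {
    "i5-3570K (Ivy Bridge)": flags_from_cpuinfo("flags\t\t: fpu mmx sse2 ssse3 sse4_1 sse4_2 avx"),
    "Pentium G5600 (no AVX)": {"mmx", "sse2", "ssse3", "sse4_1", "sse4_2"},
    "i7-8550U": {"mmx", "sse2", "ssse3", "sse4_1", "sse4_2", "avx", "avx2", "fma"},
    "Phenom II (K10, no SSSE3)": {"mmx", "sse2", "sse4a"},
}

for name, flags in cpus.items():
    print(f"{name}: {pick_simd_path(flags, IMPLEMENTED)}")
```

With these inputs only the AVX2 machine gets the AVX2 path; the AVX-only and no-AVX Intel chips land on the (still partial) SSSE3 path, and the K10 falls back to C.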

clsid
12th December 2018, 16:33
The optimizations in Dav1d are currently mostly for 8-bit only. So for 10-bit libaom may still be faster.

Development pace in Dav1d is pretty high, so we will have a fast decoder long before there is actual widespread AV1 content (beyond the current demo files and a few Youtube videos).

v0lt
12th December 2018, 16:34
Can you try -threads 8 -tilethreads 1?
I tested again. i5-3570K:
libaom-av1 - max 14 fps
libdav1d - max 7.2 fps
libdav1d -threads 4 -tilethreads 4 - max 9.7 fps
libdav1d -threads 8 -tilethreads 1 - max 10 fps

Actually, it says that it will soon be faster than other decoders on all platforms. "Soon", not now.
I carefully read their "press releases". I did not see them write anything about slow speed without AVX2, but they know about it perfectly well. This is the second time this has happened. Are the "press releases" written for sponsors?
I'm waiting for dav1d to be faster on my processor. I want to see truthful information, not PR.

SmilingWolf
12th December 2018, 16:53
I carefully read their "press releases". I did not see them write anything about slow speed without AVX2, but they know about it perfectly well. This is the second time this has happened. Are the "press releases" written for sponsors?
I'm waiting for dav1d to be faster on my processor. I want to see truthful information, not PR.

No you didn't.
The blogpost links twice to this previous one for detailed perf reports: http://www.jbkempf.com/blog/post/2018/dav1d-toward-the-first-release
Today, dav1d is very fast on AVX2 processors, which should cover a bit more than 50% of the CPUs used on the desktop. We wrote 95% of the code needed for AVX2, but there is still a bit more achievable.

We're readying the SSE and the ARM optimizations, to do the same. They will be very fast too, in the next weeks.
It's clearly stated that dav1d is the fastest on AVX2.
Then the same post you claim to have read very carefully states that work on SSSE3 has only just begun.
Since the Pentium G5600 only supports extensions up to SSE4.2, it's clear you'll have to wait some more.

Spare the rage and read some more

v0lt
12th December 2018, 17:18
No you didn't.
The blogpost links twice to this previous one for detailed perf reports: http://www.jbkempf.com/blog/post/2018/dav1d-toward-the-first-release
Please quote the text where it says that dav1d will run slower without AVX2.
I could only find this information in the beta-testing discussion.

SmilingWolf
12th December 2018, 17:29
Please quote the text where it says that dav1d will run slower without AVX2.
I could only find this information in the beta-testing discussion.

A certain extension provides a speedup. It follows, w/o said extension things will be slower.
You have the wunderbar vector extensions: you have the speedup these provide.
You can't use the vector extensions: you're going to run on C code, which is gonna be slower. Which is the reason these multimedia extensions exist in the first place.
Doesn't really take a degree to understand.
I got it, everyone around here got it, it seems you're the only one left out. Wonder where the problem lies?

easyfab
12th December 2018, 18:06
And if you want the latest info on SIMD, you should look at:

AVX2 https://code.videolan.org/videolan/dav1d/issues/78
SSSE3 https://code.videolan.org/videolan/dav1d/issues/216
ARM / NEON https://code.videolan.org/videolan/dav1d/issues/215

As you can see, AVX2 is pretty much done, but only a few functions are done for the others. And that's only for 8-bit, if I'm correct.

Nintendo Maniac 64
12th December 2018, 19:40
many of those CPUs are often going to be too slow for any real use anyway.
And in all honesty, if you bought a K10 in 2012 or anywhere near that, you just did it wrong, even in the low-end market.

Keep in mind that even the Llano 1st gen APUs lacked SSSE3 due to their K10-derived CPU architecture.


As someone with both a Phenom II x4 and a Core 2 Quad (actually a Phenom II x2 unlocked to x4 and a quad Wolfdale Xeon), I find that the latter has pretty sub-par multicore scaling in video workloads - yes it's faster than a Core 2 Duo, but not quite at the level that you'd expect as I showed in my post two pages back (https://forum.doom9.org/showthread.php?p=1859536#post1859536) (if Wolfdale had the same scaling from 2c/2t to 4c/4t as Nehalem, then 4c/4t Wolfdale would've only needed ~2.4GHz, not 2.7GHz)

This then commonly results in the Phenom actually performing similar to if not better than the Core 2 Quad on a per-GHz basis assuming the tested code isn't heavily relying on SSSE3 or SSE4.1 (as is obviously the case currently with AV1 decoding), and the Phenom not only tended to have higher stock clocks but even came in 6 core variants as well.


Similarly, I've also previously documented that the Phenom II is faster than Core 2 Quad clock-for-clock in SVP video interpolation (http://www.svp-team.com/forum/viewtopic.php?pid=68658) (which is a task that loves "moar cores!" and SMT threads).

benwaggoner
12th December 2018, 20:06
And if you want the latest info on SIMD, you should look at:

AVX2 https://code.videolan.org/videolan/dav1d/issues/78
SSSE3 https://code.videolan.org/videolan/dav1d/issues/216
ARM / NEON https://code.videolan.org/videolan/dav1d/issues/215

As you can see, AVX2 is pretty much done, but only a few functions are done for the others. And that's only for 8-bit, if I'm correct.
Do we have numbers for the installed base of AVX2 capable PCs? They've been in all new mainstream systems for several years now. I'd guess it's >50% already.

Nintendo Maniac 64
12th December 2018, 21:22
Do we have numbers for the installed base of AVX2-capable PCs? They've been in all new mainstream systems for several years now.

I realize I sound like a broken record at this point, but the newest Pentiums and Celerons still do not support AVX, and this even applies to the models that use the full-fat Sky/Kaby/Coffeelake cores (though with smaller cache size) such as the ever-popular 2c/4t Pentium G4560 and its direct successor the G5400.

(and again, going forward the Athlon 200GE is a wiser choice of CPU, but that's only been on the market for a couple months now)

mzso
13th December 2018, 12:08
Hi!

On the decoder side, are dav1d and libaom the only two options? I see Firefox has a dav1d option, which doesn't work too well because it freezes on the Bitmovin demo. (I guess the other one is libaom.) The default decoder now plays the video completely smoothly on my computer.

PS:
By the way, can I download these streams?
The player is pretty trashy, the quality always resets and doesn't want to change unless I seek.

mzso
13th December 2018, 12:22
Even worse, it seems to be about tiles.
These should never have been implemented in a new codec in the first place; they are just a lazy workaround because libaom sucks at frame-parallel encoding and decoding.
rav1e and dav1d won't need them.

Why shouldn't we like tiled encoding?