Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

Old 12th October 2019, 13:49   #1  |  Link
hajj_3
Registered User
 
Join Date: Mar 2004
Posts: 1,120
dav1d accelerated AV1 decoder

dav1d 0.5.0 'Asiatic Cheetah'

https://code.videolan.org/videolan/dav1d/-/releases

The fast and small AV1 decoder, codename 'Asiatic Cheetah'. It supports all the AV1 features and all bitdepths.

0.5.0 brings large speed improvements on SSSE3 CPUs (up to 40%), further speed improvements on AVX2 (4-7%), ARM64 (up to 10%) and ARM32. It introduces some VSX, SSE2 and SSE4 optimizations.
0.5.0 fixes some minor issues, can export ITU T.35 metadata and improves the player example.

Last edited by foxyshadis; 3rd December 2020 at 12:34. Reason: add general releases link
Old 29th October 2019, 01:23   #2  |  Link
Beelzebubu
Registered User
 
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
Quote:
Originally Posted by NikosD View Post
TBH, I remembered ffvp9 to be one of the best optimized decoders ever and I thought it was due to AVX2 and not SSSE3 optimizations.
There's a reasonable amount, but it's sort of the inverse of dav1d: we really did go all out in dav1d, doing everything-and-the-kitchen-sink in AVX2, and then we did SSSE3 later, doing most of it, but not quite everything. For ffvp9, it was the other way around: we did everything-and-more in SSSE3, and then did a couple of things (some MC, some inverse transforms) in AVX2, but the smaller inverse transforms and MC, as well as the loopfilters and most intra predictors, were never done. So it's fairly incomplete.

Quote:
Originally Posted by NikosD View Post
it seems that all decoders are doomed in the SSEx vs AVX2 battle.
That's a little negative. But yes, you won't get a 2x (or even 1.5x) speedup. 1.2x is nothing bad, though. And this is straight Haswell; newer architectures (Zen 2, Skylake) will get more, as will encoders.
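
To put a rough number on why a 2x gain is out of reach, here is a minimal Amdahl's-law sketch. It assumes, purely for illustration, that the AVX2 kernels run twice as fast as their SSSE3 counterparts; that 2x figure and the function names are assumptions, not dav1d measurements.

```python
def overall_speedup(p: float, s: float) -> float:
    """Amdahl's law: p is the fraction of runtime that gets faster, s its speedup."""
    return 1.0 / ((1.0 - p) + p / s)


def accelerated_fraction(overall: float, s: float) -> float:
    """Inverse of Amdahl's law: the fraction p that yields a given overall speedup."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)


# Assumption (illustrative only): the vectorized kernels become 2x faster with AVX2.
p = accelerated_fraction(overall=1.2, s=2.0)
print(f"fraction of runtime that would have to speed up 2x: {p:.3f}")
print(f"check: overall speedup = {overall_speedup(p, 2.0):.2f}x")
# With that fraction, even infinitely fast kernels are capped at 1 / (1 - p).
print(f"upper bound for that fraction: {1.0 / (1.0 - p):.2f}x")
```

Under that assumption, an overall 1.2x corresponds to roughly a third of the runtime being accelerated; the rest (entropy decoding, threading overhead, memory traffic) does not get faster with wider SIMD.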
Old 29th October 2019, 13:28   #3  |  Link
excellentswordfight
Lost my old account :(
 
Join Date: Jul 2017
Posts: 322
Quote:
Originally Posted by nevcairiel View Post
dav1d's AVX2 is fine. If you want to properly compare SSSE3 vs AVX2, then you need to look at single-threaded benchmarks. Multi-threading is often limited in scaling, where such differences can "hide".
But you should also not expect twice the performance from AVX2, since once you optimize everything possible with SSSE3/AVX2, the remaining parts that cannot be optimized so easily will impact the performance the most.
What dav1d version does the stable 0.74.1 LAV filter use?

Tried to play that 2160p60 sample from Netflix with 0.74.1 and MPC-BE on an i7-7500U: 4 threads at 100% load at 3.2GHz, it could barely open the file, and after 30s it started playing at 2-10fps (downscaled to 1080p). Not that I was expecting any smooth playback, but is this "normal" performance? HEVC 10-bit software decoding is about 3x faster on the same setup.

Last edited by excellentswordfight; 29th October 2019 at 14:26.
Old 29th October 2019, 14:30   #4  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,344
Quote:
Originally Posted by excellentswordfight View Post
What dav1d version does the stable 0.74.1 LAV filter use?
0.2.1, the newest available at the time. You can use a nightly version, which comes with 0.5.1, the newest available right now.
That won't necessarily guarantee that 2160p60 will play on a mobile U-series CPU, but it has the best chances.

Just be careful not to pick the 10-bit variant of the Netflix Chimera video. 10-bit is not optimized at all yet, and it's not representative of real-world content yet. YouTube, for example, only delivers 8-bit AV1 so far.
And since there is no 8-bit 2160p variant of Chimera, that's your answer.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 29th October 2019 at 14:35.
Old 29th October 2019, 20:40   #5  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
I'm about to start a few benchmarks using various versions of dAV1d regarding SSSE3 and AVX2 progress on Core2Duo, Haswell, Skylake and Coffee Lake Refresh CPUs.

Is there a link to 1080p and 4K 8-bit AV1 sample videos for testing?
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
Old 30th October 2019, 11:04   #6  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
dAV1d benchmarks: 0.5.1 vs 0.2.1 on Core2Duo, Haswell, Skylake, Coffee Lake R

OK, here we are.

Test Systems:
Skylake Core i5 6500 (TDP 65W) - Win 10 v1809 (17763.805) - 8GB DDR4-2133 MHz (1 DIMM - Single Channel)
All-core-turbo 3.3GHz

Haswell Core i3 4170 (TDP 54W) - Win 10 v1903 (18362.449) - 16GB DDR3-1600 MHz (Dual Channel 2x8GB)
Fixed 3.7GHz clock

Coffee Lake Refresh Core i3 9100F (TDP 65W) - Win 10 v1903 (18362.449) - 16GB DDR4-2400 MHz (Dual Channel 2x8GB)
All-core-turbo 4.0GHz

Merom Core2Duo T7600 (TDP 34W) - Win 10 v1809 (17763.805) - 4GB DDR2-667 MHz (Dual Channel Interleaved 2x2GB)
Fixed 2.33GHz clock


SW Tools:
DXVA Checker v4.2.1

LAV filters 0.74.1 (dAV1d 0.2.1 - 12/03/2019)
LAV filters 0.74.1-29 (dAV1d 0.5.1 - 26/10/2019)

During the whole benchmarking procedure, the Core i5 6500 never dropped its turbo clock of 3.3 GHz speed and Core i3 9100F never dropped its turbo clock of 4.0 GHz speed either.

Core i5 6500:
Max TDP for 1080p ~33W
Max TDP for 4K ~36W

Core i3 4170
Max TDP for 4K ~35W

Core i3 9100F
Max TDP for 4K ~54W

Core2Duo T7600
No tool can read Power Consumption


All video samples below are 8bit.

Chimera 1080p24fps sample is from Netflix
Dua Lipa 1080p25fps sample is from Youtube
Holi Festival 4K25fps sample is from Elecard (thanks @HolyWu)
Summer Nature 4K25fps sample is from Elecard

The numbers below are frames per second (FPS), given as minimum/average/maximum.

1080p

Chimera ~6.6Mbps

Core i5 6500 86/134/290 CPU 87% -0.5.1
Core i5 6500 77/127/273 CPU 91% -0.2.1

Core2Duo T7600 10/19/94 CPU 72% -0.5.1
Core2Duo T7600 8/17/100 CPU 87% -0.2.1


Dua Lipa ~2.2Mbps

Core i5 6500 120/186/251 CPU 87% -0.5.1
Core i5 6500 112/186/255 CPU 91% -0.2.1

Core2Duo T7600 7/18/70 CPU 65% -0.5.1
Core2Duo T7600 7/18/69 CPU 84% -0.2.1



4K

Holi Festival ~14Mbps

Core i5 6500 34/43/61 CPU 94% -0.5.1
Core i5 6500 30/40/60 CPU 95% -0.2.1


Summer Nature ~23Mbps

Core i3 9100F 45/60/82 CPU 91% -0.5.1

Core i5 6500 32/43/57 CPU 93% -0.5.1
Core i5 6500 26/37/50 CPU 91% -0.2.1

Core i3 4170 21/30/46 CPU 92% -0.5.1
Core i3 4170 16/27/41 CPU 90% -0.2.1
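
For reference, the average-FPS figures above convert into relative gains as sketched below. The numbers are copied by hand from the results (the middle value of min/avg/max); the dictionary and function names are only illustrative.

```python
# (clip, CPU) -> (average FPS with dav1d 0.2.1, average FPS with dav1d 0.5.1),
# copied from the min/avg/max results listed above.
avg_fps = {
    ("Chimera 1080p", "Core i5 6500"): (127, 134),
    ("Chimera 1080p", "Core2Duo T7600"): (17, 19),
    ("Dua Lipa 1080p", "Core i5 6500"): (186, 186),
    ("Dua Lipa 1080p", "Core2Duo T7600"): (18, 18),
    ("Holi Festival 4K", "Core i5 6500"): (40, 43),
    ("Summer Nature 4K", "Core i5 6500"): (37, 43),
    ("Summer Nature 4K", "Core i3 4170"): (27, 30),
}


def gain_percent(old_fps: float, new_fps: float) -> float:
    """Relative throughput gain of the newer version, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0


gains = {key: gain_percent(old, new) for key, (old, new) in avg_fps.items()}
for (clip, cpu), gain in gains.items():
    print(f"{clip:17} {cpu:15} {gain:5.1f}%")
```

Because the table reports whole frames per second, the low Core2Duo values (17 to 19 fps) carry a large rounding error, so their percentages are only approximate.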



Comments:

0) Sorry guys... dAV1d 0.5.1 has a serious CPU utilization problem on my laptop Core2Duo, essentially wiping out any gain from the SSSE3 optimizations.
Dua Lipa has 0% gain over 0.2.1 and Chimera has only ~11% on average.
The situation is a disaster for the SSSE3 optimizations.

1) Seven months after the 0.2.1 release, I would say that the dAV1d team was certainly not busy doing AVX2 optimizations.
It looks like 0.5.1 is only 0%-8% faster than 0.2.1 on Skylake, apart from the last 4K clip, which gets a nice 16% gain.

2) The Skylake Core i5 6500 is certainly not capable of decoding anything more than 4K30fps AV1 at up to ~20Mbps without dropping frames, even with the latest version.

3) The Coffee Lake R Core i3 9100F is closer to 4K60fps, but its minimum frame rate is still well below 60fps.

4) 0.5.1 dropped CPU utilization a little on Skylake (but enormously on Core2Duo) compared to 0.2.1, eating into some of the performance gains of the latest version on Skylake.

The only time CPU utilization increased compared to 0.2.1, the gain was a good 16%.

5) The Core i3 9100F vs Core i5 6500 results show that CFL-R is ~15% faster than its clock advantage alone would explain (60 vs 43 fps at 4.0 vs 3.3GHz), probably due to the much faster memory configuration.
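
The ~15% figure in point 5 comes from dividing the FPS ratio by the clock ratio. A small sketch of that arithmetic, using the Summer Nature 0.5.1 averages and the all-core turbo clocks listed above (variable names are illustrative):

```python
# Summer Nature 4K, dav1d 0.5.1: average FPS and all-core turbo clock from this post.
cfl_fps, cfl_ghz = 60, 4.0  # Core i3 9100F (Coffee Lake Refresh)
skl_fps, skl_ghz = 43, 3.3  # Core i5 6500 (Skylake)

fps_ratio = cfl_fps / skl_fps      # raw throughput advantage
clock_ratio = cfl_ghz / skl_ghz    # part explained by clock speed alone
per_clock = fps_ratio / clock_ratio

print(f"FPS ratio:   {fps_ratio:.3f}")
print(f"clock ratio: {clock_ratio:.3f}")
print(f"advantage beyond clock speed: {(per_clock - 1) * 100:.1f}%")
```

This only separates out clock speed; it cannot tell memory configuration apart from any other difference between the two systems.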

Overall, the results comparing 0.2.1 vs 0.5.1 were bad for both SIMD instruction sets, SSSE3 and AVX2.

Over the last 7 months I see no progress according to my benchmarks, and I really wonder where all those huge gains reported by the dAV1d team for 0.5.1 vs 0.2.1 came from.

Is there a difference when using a desktop Core2Duo?

Really looking forward to your tests and feedback.

Last edited by NikosD; 31st October 2019 at 04:30.
Old 2nd November 2019, 06:20   #7  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
@Beelzebubu
@nevcairiel

Guys, I posted a huge benchmark report regarding dAV1d decoder progress between 0.2.1 vs 0.5.1 versions, meaning for the last seven months and I see no replies or reactions from you since.

Can you confirm or reject my findings with yours, showing different things ?

I have seen a lot of huge numbers regarding dAV1d progress from the dAV1d team in the official release notes - which I couldn't confirm - but in here you are very quiet.

Waiting for your feedback!
Old 3rd November 2019, 01:04   #8  |  Link
Beelzebubu
Registered User
 
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
Quote:
Originally Posted by NikosD View Post
@Beelzebubu
@nevcairiel

Guys, I posted a huge benchmark report regarding dAV1d decoder progress between 0.2.1 vs 0.5.1 versions, meaning for the last seven months and I see no replies or reactions from you since.

Can you confirm or reject my findings with yours, showing different things ?

I have seen a lot of huge numbers regarding dAV1d progress from the dAV1d team in the official release notes - which I couldn't confirm - but in here you are very quiet.

Waiting for your feedback!
Just to add to SmilingWolf's comments, I agree you and I have diverging results, and I've been discussing with various people what could be the cause. I don't immediately have a solution or explanation, but I haven't forgotten about it either.

To be clear, we don't just do command-line interface tests. We test this in end-user applications such as VLC and Chrome/Firefox also, and we see the same performance improvements there that we also see in "dav1d" the commandline tool.
Old 3rd November 2019, 02:59   #9  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by Beelzebubu View Post
... I don't immediately have a solution or explanation, but I haven't forgotten about it either.

To be clear, we don't just do command-line interface tests. We test this in end-user applications such as VLC and Chrome/Firefox also, and we see the same performance improvements there that we also see in "dav1d" the commandline tool.
OK, but you and SmilingWolf have tested different things than I did.
Firstly, he posted single-threaded performance differences and I posted multi-threaded ones, besides the obvious difference in implementation.
VLC is a popular media player, no doubt about it, but here we mostly prefer other players (MPC-HC / MPC-BE / MPV.NET etc).
I don't think there is any other way to find out what is going on than to reproduce the tests yourself.
Is it possible to test the two versions of LAV's implementation I posted above?
Also, are the huge performance gains posted in the various dAV1d release notes for single-threaded or multi-threaded performance?
Thanks!
Old 3rd November 2019, 08:18   #10  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by SmilingWolf View Post
Conveniently forgetting about my two posts dedicated to multi threaded performance aren't we?
Conveniently forgetting about my word "firstly", as you initially posted single-threaded performance only, while I was asking you to confirm or reject my multi-threaded results, which I had posted first.
After my comment you posted multi-threaded results, not using the same tools and with a different threading setup.
Anyway, the point here is to understand what's going on, not to play with words or intentions once again.
You could try deleting the DXVA Checker config file, then uninstalling and reinstalling everything.
I'm still waiting for an answer on whether the publicly reported gains between dAV1d versions refer to single-threaded or multi-threaded performance.
BTW, how do you benchmark dAV1d with the two executables you posted here?
There is no built-in benchmark command in them.
Old 3rd November 2019, 18:38   #11  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by SmilingWolf View Post
There isn't a DXVA Checker report yet, afternoon spent trying to make it work notwithstanding, but as I said, CPU utilization goes between 70% and 90% with the two sequences used.
The main issue with dAV1d's progress between 0.2.1 and 0.5.1 is not CPU utilization.
The drop in CPU utilization on Skylake was only 2%, although on Core2Duo the drop was huge.
The main issue with dAV1d is the loss of any single-thread gain in real-world multi-threaded decoding, for whatever internal reason.
In the end, the end user doesn't know and doesn't care why the Dua Lipa video decodes at exactly the same speed with both dAV1d 0.2.1 and 0.5.1 on two different CPU architectures and instruction sets (Skylake using AVX2 / Core2Duo using SSSE3).
It is we who are still searching for why this is happening and under what circumstances.
Old 2nd November 2019, 10:10   #12  |  Link
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
TLDR:
Win7 64-bit, i7-4770K, 3.40GHz (stock), reduction in decode time from 0.2.1 to 0.5.1 using only SSSE3-accelerated routines, single thread:
Chimera: 33.2%
Dua Lipa: 34.9%

Included are some AVX2 tests too, because yes.

Code:
# time ./dav1d-0.2.1.exe -q -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf --muxer yuv4mpeg2 --framethreads 1 --tilethreads 1 --cpumask ssse3 -o /dev/null

real    5m27,012s
user    0m0,000s
sys     0m0,000s

# time ./dav1d-0.5.1.exe -q -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf --muxer yuv4mpeg2 --framethreads 1 --tilethreads 1 --cpumask ssse3 -o /dev/null

real    3m38,449s
user    0m0,000s
sys     0m0,000s

# time ./dav1d-0.5.1.exe -q -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf --muxer yuv4mpeg2 --framethreads 1 --tilethreads 1 --cpumask avx2 -o /dev/null

real    3m5,282s
user    0m0,000s
sys     0m0,000s

# time ./dav1d-0.2.1.exe -q -i Dua_Lipa.ivf --muxer yuv4mpeg2 --framethreads 1 --tilethreads 1 --cpumask ssse3 -o /dev/null

real    2m42,726s
user    0m0,000s
sys     0m0,000s

# time ./dav1d-0.5.1.exe -q -i Dua_Lipa.ivf --muxer yuv4mpeg2 --framethreads 1 --tilethreads 1 --cpumask ssse3 -o /dev/null

real    1m45,987s
user    0m0,000s
sys     0m0,000s

# time ./dav1d-0.5.1.exe -q -i Dua_Lipa.ivf --muxer yuv4mpeg2 --framethreads 1 --tilethreads 1 --cpumask avx2 -o /dev/null

real    1m22,243s
user    0m0,000s
sys     0m0,000s
Overall, the results comparing 0.2.1 vs 0.5.1 were good for both SIMD instruction sets, SSSE3 and AVX2.

Over the last 7 months I see progress according to my benchmarks, and I really don't have to wonder where all those huge gains reported by the dAV1d team for 0.5.1 vs 0.2.1 came from.
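
A note on the percentages in the TLDR: they are reductions in wall-clock time, which is not the same number as a gain in FPS. The sketch below parses the `real` values from the listing above (decimal comma included) and prints both; the helper names are illustrative.

```python
import re


def to_seconds(real: str) -> float:
    """Convert bash `time` output such as '5m27,012s' (decimal comma) to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[,.](\d+)s", real)
    if match is None:
        raise ValueError(f"unexpected time format: {real!r}")
    minutes, secs, frac = match.groups()
    return int(minutes) * 60 + float(f"{secs}.{frac}")


def reduction_and_speedup(old: str, new: str) -> tuple[float, float]:
    """Return (time reduction in percent, throughput ratio old/new)."""
    t_old, t_new = to_seconds(old), to_seconds(new)
    return (1.0 - t_new / t_old) * 100.0, t_old / t_new


# Single-thread SSSE3 runs from the listing above: (dav1d 0.2.1, dav1d 0.5.1).
runs = {
    "Chimera": ("5m27,012s", "3m38,449s"),
    "Dua Lipa": ("2m42,726s", "1m45,987s"),
}
stats = {clip: reduction_and_speedup(old, new) for clip, (old, new) in runs.items()}
for clip, (reduction, speedup) in stats.items():
    print(f"{clip}: {reduction:.1f}% less time, {speedup:.2f}x throughput")
```

So a decode that takes about a third less time corresponds to roughly 1.5x the frames per second, which is the figure to compare against FPS-based results.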

Last edited by SmilingWolf; 2nd November 2019 at 10:14.
Old 2nd November 2019, 11:31   #13  |  Link
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
FFmpeg says it's going to use 4 frame threads and 3 tile threads to decode the files, so I'll be using those numbers.

Chimera: 34%
Dua Lipa: 29.7%

Code:
# time ./dav1d-0.2.1.exe -q -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf --muxer yuv4mpeg2 --framethreads 4 --tilethreads 3 --cpumask ssse3 -o /dev/null

real    1m40,657s
user    0m0,000s
sys     0m0,000s

# time ./dav1d-0.5.1.exe -q -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf --muxer yuv4mpeg2 --framethreads 4 --tilethreads 3 --cpumask ssse3 -o /dev/null

real    1m6,398s
user    0m0,000s
sys     0m0,000s

# time ./dav1d-0.5.1.exe -q -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf --muxer yuv4mpeg2 --framethreads 4 --tilethreads 3 --cpumask avx2 -o /dev/null

real    0m54,087s
user    0m0,000s
sys     0m0,000s

# time ./dav1d-0.2.1.exe -q -i Dua_Lipa.ivf --muxer yuv4mpeg2 --framethreads 4 --tilethreads 3 --cpumask ssse3 -o /dev/null

real    0m46,972s
user    0m0,000s
sys     0m0,000s

# time ./dav1d-0.5.1.exe -q -i Dua_Lipa.ivf --muxer yuv4mpeg2 --framethreads 4 --tilethreads 3 --cpumask ssse3 -o /dev/null

real    0m33,041s
user    0m0,000s
sys     0m0,000s

# time ./dav1d-0.5.1.exe -q -i Dua_Lipa.ivf --muxer yuv4mpeg2 --framethreads 4 --tilethreads 3 --cpumask avx2 -o /dev/null

real    0m25,912s
user    0m0,000s
sys     0m0,000s
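
Combining these timings with the single-thread ones from the earlier listing gives the threading scale factor per build. A small sketch, with the `real` times converted to seconds by hand (names are illustrative):

```python
# (clip, build) -> (single-thread seconds, seconds with 4 frame / 3 tile threads),
# taken from the `real` values in the two listings (e.g. 5m27,012s = 327.012 s).
timings = {
    ("Chimera", "0.2.1 ssse3"): (327.012, 100.657),
    ("Chimera", "0.5.1 ssse3"): (218.449, 66.398),
    ("Chimera", "0.5.1 avx2"): (185.282, 54.087),
    ("Dua Lipa", "0.2.1 ssse3"): (162.726, 46.972),
    ("Dua Lipa", "0.5.1 ssse3"): (105.987, 33.041),
    ("Dua Lipa", "0.5.1 avx2"): (82.243, 25.912),
}

# Scale factor: how many times faster the multi-threaded run is than the single-threaded one.
scaling = {key: single / multi for key, (single, multi) in timings.items()}
for (clip, build), factor in scaling.items():
    print(f"{clip:9} {build:12} {factor:.2f}x")
```

In these command-line runs the factor stays in roughly the same range for 0.2.1 and 0.5.1, which suggests the single-thread gain carries over to multi-threaded decoding here; it says nothing about how a player integrates the library.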
Old 2nd November 2019, 12:11   #14  |  Link
SmilingWolf
I am maddo saientisto!
 
SmilingWolf's Avatar
 
Join Date: Aug 2018
Posts: 95
The tool is from the dav1d project, found here: https://code.videolan.org/videolan/d...e/master/tools

You can either compile it yourself (MABS can do that) or use my copy: https://mega.nz/#!op5gGSTD!JPyhq1IqJ...g8SyR2HbSTs7gk

Finding the best frame/tile threads numbers is a bit tricky. Fiddling can improve performance, and I made some quick tests that brought Dua Lipa down to 29 seconds on 0.5.1+SSSE3 using --framethreads 6 --tilethreads 2, but I did not want to post them because in the end almost no media player is going to let you stray from FFmpeg, and therefore LAVFilters, defaults.
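
For anyone who wants to repeat the fiddling, a sweep over thread counts could be scripted roughly as below. This is only a sketch: it uses just the flags that appear in the commands in this thread (dav1d 0.2.1/0.5.1), other versions may not accept them, and `/dev/null` assumes a Unix-like shell environment such as the one used here. The function names are hypothetical.

```python
import itertools
import subprocess
import time


def build_cmd(exe, clip, frame_threads, tile_threads, cpumask=None):
    """Assemble a dav1d command line from the flags shown in this thread."""
    cmd = [exe, "-q", "-i", clip, "--muxer", "yuv4mpeg2",
           "--framethreads", str(frame_threads),
           "--tilethreads", str(tile_threads)]
    if cpumask is not None:
        cmd += ["--cpumask", cpumask]
    return cmd + ["-o", "/dev/null"]


def time_run(cmd):
    """Wall-clock seconds for one decode; raises if the decoder exits with an error."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def sweep(exe, clip, frame_counts, tile_counts, cpumask=None):
    """Time every frame/tile thread combination and return them fastest first."""
    results = {}
    for ft, tt in itertools.product(frame_counts, tile_counts):
        results[(ft, tt)] = time_run(build_cmd(exe, clip, ft, tt, cpumask))
    return sorted(results.items(), key=lambda item: item[1])
```

Example call: `sweep("./dav1d-0.5.1.exe", "Dua_Lipa.ivf", [2, 4, 6, 8], [1, 2, 3], cpumask="ssse3")`. A single run per combination is noisy, so repeating each run and taking the best or median time would be more reliable.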

Last edited by SmilingWolf; 2nd November 2019 at 12:13.
Old 2nd November 2019, 12:33   #15  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by SmilingWolf View Post
The tool is from the dav1d project, found here: https://code.videolan.org/videolan/d...e/master/tools

You can either compile it yourself (MABS can do that) or use my copy: https://mega.nz/#!op5gGSTD!JPyhq1IqJ...g8SyR2HbSTs7gk

Finding the best frame/tile threads numbers is a bit tricky. Fiddling can improve performance, and I made some quick tests that brought Dua Lipa down to 29 seconds on 0.5.1+SSSE3 using --framethreads 6 --tilethreads 2, but I did not want to post them because in the end almost no media player is going to let you stray from FFmpeg, and therefore LAVFilters, defaults.
Then Houston, we have a problem.
Because we have seriously contradictory results between the multi-threaded performance of ffmpeg and LAV filters regarding dAV1d, according to your tests and mine.

It could be your builds vs nevcairiel's builds, it could be the setup of LAV vs ffmpeg for dAV1d, or the hyperthreading nature of the 4770K.

If you don't want to raise the thread count in order to reach 100% CPU utilization, you could disable hyperthreading in the BIOS and run the tests again with 4 threads.

Also, you could run a LAV filters benchmark using GraphStudioNext or DXVA Checker on your Core i7 as is, with hyperthreading on, and see how that goes.
Old 2nd November 2019, 12:59   #16  |  Link
SmilingWolf
I am maddo saientisto!
 
SmilingWolf's Avatar
 
Join Date: Aug 2018
Posts: 95
I can't use DXVA to check performance since the dav1d library inside the FFmpeg library inside the LAVfilters library would default to using AVX2, unless you know of a way to target a specific instruction set from within DXVA, or perhaps using an environment variable. OTOH, if you just want me to check how dav1d multithreading works in different versions of LAVFilters/ffmpeg, I can test that. But it won't be a 0.2.1 vs 0.5.1 SSSE3 benchmark anymore.

I'm not too sure why you think I'm not using all cores of my CPU. With the settings above, 4 frame 3 tile threads, I get peaks of 80% CPU usage, and an eyeballed average of around 70%.

Anyway, again just because, here is my best 0.5.1+SSSE3 Dua Lipa result so far:
Code:
# time ./dav1d-0.5.1.exe -q -i Dua_Lipa.ivf --muxer yuv4mpeg2 --framethreads 6 --tilethreads 3 --cpumask ssse3 -o /dev/null

real    0m28,737s
user    0m0,000s
sys     0m0,000s
Peaks at 92% CPU, hovers at around 85% average.

Last edited by SmilingWolf; 2nd November 2019 at 13:09.
Old 2nd November 2019, 14:00   #17  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by SmilingWolf View Post
I can't use DXVA to check performance since the dav1d library inside the FFmpeg library inside the LAVfilters library would default to using AVX2, unless you know of a way to target a specific instruction set from within DXVA, or perhaps using an environment variable. OTOH, if you just want me to check how dav1d multithreading works in different versions of LAVFilters/ffmpeg, I can test that. But it won't be a 0.2.1 vs 0.5.1 SSSE3 benchmark anymore.
I couldn't find a way to test specific instruction sets either, that's why I used 0.2.1 vs 0.5.1 on different hardware.
You could check SSSE3 using a Core2Duo or an AMD processor, and AVX2 on anything from Haswell onwards or a Ryzen 3000.
If you only have the i7 4770K, just check AVX2.

Quote:
Originally Posted by SmilingWolf View Post
I'm not too sure why you think I'm not using all cores of my CPU. With the settings above, 4 frame 3 tile threads, I get peaks of 80% CPU usage, and an eyeballed average of around 70%.
Because your CPU is capable of 8 threads.
Waiting for your AVX2 LAV filters results for 0.2.1 vs 0.5.1, preferably in the form of DXVA Checker min/avg/max and the average CPU utilization reported by DXVA Checker.
Old 3rd November 2019, 19:27   #18  |  Link
SmilingWolf
I am maddo saientisto!
 
SmilingWolf's Avatar
 
Join Date: Aug 2018
Posts: 95
Oh but you seemed so worried about how much dav1d was using all my cores just one day ago.

But here, have a Chimera run:
Code:
LAVFilters 0.74.1-29:
CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
GPU: NVIDIA GeForce GTX 1080
Decoder: LAV Video Decoder
Decoder Device: -
Frames: 8929
FPS: 170,234 [103-349]
CPU Usage: -
GPU Usage: 0 [0-1] %
GPU Video Engine Usage: 0 [0-0] %

LAVFilters 0.74.1:
CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
GPU: NVIDIA GeForce GTX 1080
Decoder: LAV Video Decoder
Decoder Device: -
Frames: 8929
FPS: 139,201 [77-306]
CPU Usage: -
GPU Usage: 0 [0-1] %
GPU Video Engine Usage: 0 [0-0] %
And Dua Lipa:
Code:
LAVFilters 0.74.1-29:
CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
GPU: NVIDIA GeForce GTX 1080
Decoder: LAV Video Decoder
Decoder Device: -
Frames: 5615
FPS: 260,815 [183-335]
CPU Usage: -
GPU Usage: 0 [0-1] %
GPU Video Engine Usage: 0 [0-0] %

LAVFilters 0.74.1:
CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
GPU: NVIDIA GeForce GTX 1080
Decoder: LAV Video Decoder
Decoder Device: -
Frames: 5615
FPS: 248,936 [137-328]
CPU Usage: -
GPU Usage: 0 [0-0] %
GPU Video Engine Usage: 0 [0-0] %
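
To compare the two LAV Filters builds, the FPS lines of the reports above can be reduced to a relative gain. The sketch assumes the comma in values like `170,234` is a decimal separator (average FPS) and that the bracketed range is min-max; the function name is illustrative.

```python
import re


def parse_fps(line: str) -> tuple[float, int, int]:
    """Parse 'FPS: 170,234 [103-349]' into (average, minimum, maximum)."""
    match = re.fullmatch(r"FPS:\s*(\d+)[,.](\d+)\s*\[(\d+)-(\d+)\]", line.strip())
    if match is None:
        raise ValueError(f"unexpected FPS line: {line!r}")
    whole, frac, low, high = match.groups()
    return float(f"{whole}.{frac}"), int(low), int(high)


# (LAV Filters 0.74.1 with dav1d 0.2.1, LAV Filters 0.74.1-29 with dav1d 0.5.1)
reports = {
    "Chimera": ("FPS: 139,201 [77-306]", "FPS: 170,234 [103-349]"),
    "Dua Lipa": ("FPS: 248,936 [137-328]", "FPS: 260,815 [183-335]"),
}

gain = {}
for clip, (old_line, new_line) in reports.items():
    old_avg = parse_fps(old_line)[0]
    new_avg = parse_fps(new_line)[0]
    gain[clip] = (new_avg / old_avg - 1.0) * 100.0
    print(f"{clip}: {old_avg:.1f} -> {new_avg:.1f} fps ({gain[clip]:+.1f}%)")
```

Note that these runs use AVX2 (the LAV Filters default on this CPU), so they are not directly comparable to the SSSE3-only command-line timings.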

Last edited by SmilingWolf; 3rd November 2019 at 20:41.
Old 4th November 2019, 08:55   #19  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by SmilingWolf View Post
Oh but you seemed so worried about how much dav1d was using all my cores just one day ago.
My worries were, and still are, the same:
the very low gain in real-world multi-threaded performance between versions 0.2.1 and 0.5.1 of the dAV1d decoder, as measured by me using the above systems and tools, compared to the gains advertised and publicly reported by the dAV1d team for the SSSE3 and AVX2 optimizations.
All my other comments express my struggle to explain that huge difference by any means.
Your results fully confirm mine regarding the Dua Lipa video, but there is a small light at the end of the tunnel regarding Chimera (regardless of the name of the video).
I think @nevcairiel could explain this better and test LAV filters' dAV1d implementation.
Old 4th November 2019, 13:01   #20  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
@nevcairiel
@Beelzebubu
@SmilingWolf

A few more interesting notes regarding LAV filters.

LAV filters v0.74.1 allows you to set Thread = 1, but it actually shows around 50% CPU utilization, which means 2 cores = 2 threads for AV1 (using dAV1d).

But for all the other codecs, it uses only 1 thread as it should, based on the selection.

LAV filters v0.74.1-29 doesn't even allow you to set Thread = 1, because if you set it to 1, the decoder is not enumerated in DXVA Checker when trying to decode AV1 files, while it can be used for all the other codecs with only 1 thread.

So, there is definitely something different about the dAV1d integration in LAV filters compared to all the other codecs.

In LAV filters 0.74.1-29, setting Thread = 4 gives exactly the same performance as Auto on my Core i5 6500.