LAV Filters - DirectShow Media Splitter and Decoders [Archive] - Page 468

lvqcl

13th March 2019, 18:43

Here are the results of my small test of decoder performance:

Test app: graphstudio / graphstudio64
Test codecs: LAV Filters 0.73.1-36 and MPC-BE standalone filters 1.5.3.4462.
Test CPU: Intel Core2 Q9300 (quad core, up to SSE4.1, no AVX)
Test video: Gangnam Style 720p from Youtube
Results:

x86 LAVFilters 0.73.1-36: 76.3 FPS
x86 MPC-BE filters 1.5.3.4462: 82.7 FPS

x64 LAVFilters 0.73.1-36: 84.8 FPS
x64 MPC-BE filters 1.5.3.4462: 91.6 FPS

CPU usage for LAV Filters is also less than for MPC-BE: 72% vs 82%.

It seems that LAV Filters doesn't use as many threads to decode AV1 as MPC-BE. This results in slightly less CPU usage and slightly lower decoding performance.

nevcairiel

13th March 2019, 20:43

LAV uses as many threads as your CPU has threads, while MPC-BE uses more. Using more is beneficial in benchmarking, but not in playback, as overtaxing the CPU might starve the renderer, which is why I changed LAV to no longer do that.
You can adjust the thread count yourself if you feel like testing. MPC-BE uses threads * 1.5 (i.e. 50% more).
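
The thread-count rule described above can be sketched as follows. This is only an illustration, not code from LAV Filters or MPC-BE: the function names are hypothetical, and truncating the result for odd thread counts is an assumption.

```python
import os


def lav_style_threads(hw_threads: int) -> int:
    """One decoder thread per hardware thread, as described for LAV."""
    return max(1, hw_threads)


def mpcbe_style_threads(hw_threads: int) -> int:
    """Hardware threads * 1.5, as described for MPC-BE (rounding is assumed)."""
    return max(1, int(hw_threads * 1.5))


if __name__ == "__main__":
    # os.cpu_count() reports logical processors; it may return None.
    n = os.cpu_count() or 1
    print(f"hardware threads:   {n}")
    print(f"LAV-style count:    {lav_style_threads(n)}")
    print(f"MPC-BE-style count: {mpcbe_style_threads(n)}")
```

For a CPU with 4 hardware threads this gives 4 and 6 decoder threads respectively.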

lvqcl

13th March 2019, 21:37

Indeed, I set the number of threads to 6 and now I can get ~94 FPS from LAV Filters.

Pat357

15th March 2019, 00:20

I have some trouble playing an AV1 encoded file with MPC-HC or MPC-BE on my SkylakeX i9-7940X system.
In the MadVR OSD, I see the decoder queue dropping from 16 to 0-1 and obviously the playback is very jerky. Just unwatchable!

- File (300MB) : https://mega.nz/#!1Bo0EIRT!es4Zu0K3cKS9DIt9xmFS_t-5La0texjPChfQri8VTEM

- Cut a small piece with only 500 frames (46 MB) : https://mega.nz/#!tZ5kgaKJ!IQqBnuiIOPwicuRQJCNv9LhWerEAg003ymoOjHjyNw0

This is what MediaInfo reports about this file:

General
Complete name : D:\film\UHD-HD\Stream3_AV1_4K_13.9mbps.webm
Format : WebM
Format version : Version 4
File size : 300 MiB
Duration : 3 min 1 s
Overall bit rate : 13.9 Mb/s
Writing application : aomenc 1.0.0
Writing library : libwebm-0.2.1.0

Video
ID : 1
Format : AV1
Format/Info : AOMedia Video 1
Format profile : Main@L5.0
Codec ID : V_AV1
Duration : 3 min 1 s
Bit rate : 13.3 Mb/s
Width : 3 840 pixels
Height : 2 160 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 25.000 FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Bits/(Pixel*Frame) : 0.064
Stream size : 288 MiB (96%)
Language : English
Default : Yes
Forced : No
Color range : Limited

I did some tests by compiling ffmpeg with libaom and libdav1d and the results are amazing : libdav1d is almost 4x faster than libaom on decoding this file !

Maybe it's time to replace libaom with libdav1d in LAVFilters?
Nevcairiel, what's your opinion about this? Is this feasible without too much work?

nevcairiel

15th March 2019, 00:22

You should really read the last 4 posts just directly above yours.

Pat357

15th March 2019, 01:09

You should really read the last 4 posts just directly above yours.

Damn, you're really fast!
I ask for something today and you already had it ready yesterday!

I noticed a problem with the 32bit version on my system : the player closes immediately upon opening the above mentioned file.

Faulting application name: mpc-be.exe, version: 1.5.3.4455, time stamp: 0x5c7f3025
Faulting module name: avformat-lav-58.dll, version: 58.26.101.0, time stamp: 0x00000000
Exception code: 0xc0000005
Fault offset: 0x000031f8
Faulting process id: 0x3034
Faulting application start time: 0x01d4dac9349ddd69
Faulting application path: D:\programs\MPC-BE.1.5.0.1997.x86\mpc-be.exe
Faulting module path: d:\programs\LAV Filters\x86\avformat-lav-58.dll
Report Id: 8d7e6434-5d59-4595-9710-5a297dc965b7
Faulting package full name:
Faulting package-relative application ID:

I had the same problem with the 32bit version from Videolan cmdline AV1 decoder dav1d.exe : as soon as I used more than 6 frame-threads, I got a message that it could not allocate memory to initialise.

As soon as I patched the header of dav1d.exe to be large-address-aware (-Wl,--large-address-aware), I was able to raise the frame threads to 16 (it used 1900 MB then).
My feeling tells me this is the same problem.

My system i9-7940X has 14C/28T.
I guess only the systems with a lot of cores are affected.

Could you please test my sample from above on a system with a lot of cores with a 32bit player ?

nevcairiel

15th March 2019, 01:13

I said 4 posts. :p Here, have a link to help you out https://forum.doom9.org/showthread.php?p=1868640#post1868640
And here, have another link: https://files.1f0.de/lavf/nightly/

sneaker_ger

15th March 2019, 07:53

dav1d developers are aware that it uses more memory than necessary.
So 1 more link:
https://code.videolan.org/videolan/dav1d/issues/257

nevcairiel

15th March 2019, 10:06

with a 32bit player ?

You should switch to 64-bit, there is no getting around that 32-bit is going to severely limit you eventually, both in performance and available memory.
There is nothing I can do to fix that. As sneaker_ger said, dav1d uses a bit too much memory on 8-bit and/or 4:2:0/4:2:2 samples, but even if that is fixed, it only slightly moves the resource limit, never mind the significantly slower decoding on 32-bit (since the majority of the AVX2 code is 64-bit only).

When I try decoding such a 4K AV1 stream on my 7900X on 32-bit, all I get is a black screen, because it runs out of memory instantly unless the thread count is lowered. It's not supposed to crash, but I guess there are still some unchecked allocations left somewhere.

Some numbers to further demonstrate how futile 32-bit is. I set LAV to 10 threads on my 10-core CPU, since that seems to allow it to safely run on 32-bit, and benchmarked the full 4K clip you posted:

32-bit 10 threads: 63 FPS
64-bit 10 threads: 130 FPS
64-bit auto threads: 145 FPS

So by moving from 32-bit to 64-bit, you can more than double the FPS these files decode at, and that's not going to change much. Even once the memory usage is reduced and you can run more threads on 32-bit, all the fast AVX2 code is going to remain 64-bit only, which will keep 64-bit dav1d almost twice as fast if AVX2 is supported.

In any case, whatever reason someone has to stick to 32-bit, it should really be re-evaluated.
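
The speedups implied by the figures above can be worked out with a few lines of arithmetic. This is just a sketch using the three FPS values quoted in this post; the variable and function names are illustrative.

```python
# FPS figures quoted in the post above (full 4K AV1 clip, 10-core CPU).
results = {
    "32-bit, 10 threads": 63.0,
    "64-bit, 10 threads": 130.0,
    "64-bit, auto threads": 145.0,
}


def speedup(baseline_fps: float, fps: float) -> float:
    """Ratio of a measured frame rate to the baseline frame rate."""
    return fps / baseline_fps


baseline = results["32-bit, 10 threads"]
for label, fps in results.items():
    print(f"{label}: {fps:.0f} FPS ({speedup(baseline, fps):.2f}x)")
```

Relative to the 32-bit run, the 64-bit runs come out at roughly 2.06x (same thread count) and 2.30x (auto threads).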

Grimsdyke

15th March 2019, 10:24

@ nev. Would it maybe be an idea to add a (basic) profile capability?
I am asking because on my system I can use DXVA2-copyback on everything up to 1080p but unfortunately I do need D3D11-native for UHD !!

Sorry for 'bumping' this thread, but after three months I thought I could ask again, because yesterday evening I compared render times while watching the Italian "SUSPIRIA" BD.
D3D11 native: ~ 28.8 msec
DXVA2 copyback: ~ 23.8 msec
I think that is a huge difference and I really could use these 5 msec to further increase performance in MadVR !! I am of course aware that I could always change this every time depending on the content, but I think software should do these tasks for users.
I understand that a profile system might be way too much work, but it would be great to have a basic setting so that LAV automatically chooses the decoder !!
Something like:
SD = DXVA2 copyback
HD = DXVA2 copyback
4K = D3D11 native

You could also make it optional so that users with enough GPU horsepower don't even have to bother. Best wishes
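
The rule requested above could look something like the following. This is a purely hypothetical sketch of the idea: it is not an existing LAV Filters option, the function name is made up, and reading "up to 1080p" as "at most 1920x1080" is an assumption.

```python
def pick_decoder_mode(width: int, height: int) -> str:
    """Pick a hardware decoding mode from the video resolution (hypothetical rule)."""
    # SD and HD: anything that fits in 1920x1080 (assumed cut-off).
    if width <= 1920 and height <= 1080:
        return "DXVA2 copy-back"
    # Everything larger, e.g. 3840x2160 UHD.
    return "D3D11 native"


if __name__ == "__main__":
    for w, h in [(720, 576), (1920, 1080), (3840, 2160)]:
        print(f"{w}x{h}: {pick_decoder_mode(w, h)}")
```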

nevcairiel

15th March 2019, 10:29

I'm sorry, but I don't plan such functionality. D3D11 native should always be the most efficient. Note that render times are not really a good measurement, since they depend on the power state of the GPU, which can vary a lot (and if you use a more efficient decoder, maybe it drops down a state).

lvqcl

15th March 2019, 18:56

Damn, you're really fast!
As soon as I patched the header of dav1d.exe to be large-address-aware (-Wl,--large-address-aware), I was able to raise the frame threads to 16 (it used 1900 MB then).
My feeling tells me this is the same problem.

IIRC support for more than 2GB of RAM is a property of the program, not of the DLL files that are loaded by the program. And MPC-BE does have the LargeAddressAware flag (although I wonder why the LAV files don't have it).

nevcairiel

15th March 2019, 19:19

LargeAddressAware has absolutely no impact on DLLs, it's always controlled by the calling application.
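
Since the flag lives in the header of the executable, it can be checked by reading that header. The sketch below is one way to do that for a 32-bit player's .exe, assuming the standard PE layout (the `e_lfanew` pointer at offset 0x3C, and the 2-byte Characteristics field at the end of the 20-byte COFF header). The function name is hypothetical, and reading only the first 4096 bytes assumes the PE header sits within that range.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field


def is_large_address_aware(image: bytes) -> bool:
    """Return True if a PE image has the LARGE_ADDRESS_AWARE flag set."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("not a PE image (missing MZ header)")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image (missing PE signature)")
    # The COFF header follows the 4-byte signature; Characteristics is its
    # last field and starts 18 bytes into the header.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            header = f.read(4096)  # assumed to be enough to cover the PE header
        print(path, is_large_address_aware(header))
```

Run it against the player executable (for example mpc-be.exe) rather than the decoder DLLs, since only the flag of the calling application matters.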

sneaker_ger

15th March 2019, 20:21

Some numbers to further demonstrate how futile 32-bit is. I set LAV to 10 threads on my 10-core CPU, since that seems to allow it to safely run on 32-bit, and benchmarked the full 4K clip you posted

32-bit 10 threads: 63 FPS
64-bit 10 threads: 130 FPS
64-bit auto threads: 145 FPS

So by moving from 32-bit to 64-bit, you can more than double the FPS these files decode at, and that's not going to change much. Even once the memory usage is reduced and you can run more threads on 32-bit, all the fast AVX2 code is going to remain 64-bit only, which will keep 64-bit dav1d almost twice as fast if AVX2 is supported.
For the record: SSSE3 difference seems to be only about 15%.

DMU

16th March 2019, 10:47

NikosD

16th March 2019, 11:10

I can not activate h/w decoding vp9 on AMD Vega GPU in LAV.

Screen 1 (https://drive.google.com/open?id=1_uR37H_nZJy0XFOd27iLgc90DCaV0lfP)

In MS Edge all ok.

Screen 2 (https://drive.google.com/open?id=1ouqs-lmFXuaIBtz0jboHkvLmctJ5QoHy)

Edge uses a different mode for HW acceleration than LAV and any other decoder out there.

It's extremely optimized for DXVA using MFT.

For LAV, use DXVA2 native, NOT DXVA2 copy-back, to see the difference.

Still, Edge is even more optimized.

nevcairiel

16th March 2019, 11:43

As far as I can tell, Vega doesn't support VP9 through DXVA2. I don't own such hardware, so I cannot test, but there is no hardware specific logic in LAV, so if it doesn't work, then the driver doesn't actually expose that mode.

el Filou

16th March 2019, 13:18

Amazing speedup from dav1d in the new nightly: on a 6.5 Mbps file I went from 22 to 39 avg fps on my old Core 2 Duo. Still not enough to watch a movie unfortunately, as some scenes where the bitrate goes very high still cause frame drops.
64-bit only brought a 2 fps improvement, I guess because a CPU that old doesn't have the optimized instruction sets that are used. :(

nevcairiel

16th March 2019, 15:44

LAV Filters 0.74

LAV Splitter
- Changed: Using GnuTLS for HTTPS and other TLS protocols, improving performance and compatibility with a lot of web streaming services (ie. YouTube Live Streams through youtube-dl, and more)
- Fixed: Keyframes in MP4 files were being reported with a slightly offset timestamp, resulting in slow keyframe seeking
- Fixed: Subtitles that stretch over chapter boundaries could be lost in Ordered Chapter MKV files
- Fixed: Fonts embedded in MKVs without a proper mimetype were not being imported (now it checks the file extensions for .ttf/.otf as well)

LAV Video
- NEW: Initial support for parsing HDR10+ (SMPTE ST 2094-40) metadata, and passing it to the video renderer
- NEW: Using the dav1d AV1 decoder for significantly improved AV1 decoding performance
- Changed: Re-enabled experimental hardware acceleration for H.264 MVC 3D decoding on Intel GPUs, disabled by default
- Changed: Updated Intel MediaSDK dispatchers to the latest Media SDK, fixing compatibility with newer runtimes in the Intel DCH drivers
- Changed: Improved support for additional UtVideo subtypes

LAV Audio
- Changed: Added an option to disable the PCM fallback when bitstreaming is requested
- Fixed: Further improvements to TrueHD Bitstreaming, resolving glitching on more new titles (particularly seamless branching titles)
- Fixed: Automatic fallback from bitstreaming to PCM could crash in some situations

Download: Installer (both x86/x64) (https://files.1f0.de/lavf/LAVFilters-0.74.exe) -- Zips: 32-bit (https://files.1f0.de/lavf/LAVFilters-0.74.zip) & 64-bit (https://files.1f0.de/lavf/LAVFilters-0.74-x64.zip)

Not much to say that isn't already apparent from the Change Log above. A lot of collected bugfixes, further improvements to TrueHD bitstreaming, which should to the best of my knowledge finally bring it to a fully spec compliant level, dav1d for fast AV1 decoding, and much more.

A quick note on HDR10+ support - without renderer support, this does nothing. It's just additional metadata, and does not impact the decoding of the video. It should feel much like an ordinary HDR10 movie if the metadata is being ignored, and what exactly a renderer does with that metadata is out of my hands.

As always, please report issues, specifically regressions, in as much detail as possible with a sample file if applicable.

Have fun!

Sebastiii

16th March 2019, 19:58

Thanks :) you rock

hubblec4

16th March 2019, 20:01

Thank you for improving Matroska support.

Manni

16th March 2019, 20:10

Thanks a lot for your work, much appreciated, as always :)

jmone

16th March 2019, 23:20

Thanks. Quick test on MC25 was all good.

Carpo

17th March 2019, 12:38

Hi @nevcairiel

Thanks for the new release. I have a question that I wonder if you or others could help me with: I often see people saying to use method X over Y (DXVA2 (native/copy) over D3D11), so I am never sure which to use. I have two systems:

i7-7700K with a Nvidia 1080

and a

i7 4600K with a Nvidia 970

out of all the options, which would you suggest?

Thanks

Edit: I should have added that I have tried both and haven't noticed any difference.

nevcairiel

17th March 2019, 13:23

As a general purpose answer when using madVR, DXVA2 Copy-Back is generally the best, since it's fully compatible with every feature in all renderers, is fully bit-exact, is reasonably fast (typically faster than D3D11 Copy-Back for complex reasons), and has no feature limitations.

The only reason to use something else is performance, which mostly applies to more low-end systems, where a "Native" mode is required to play 4K 10-bit content, for example, in which case D3D11 Native would be the best option, since it incurs no quality penalty. Unfortunately madVR does not support all its features in D3D11 Native mode, so it's hard to recommend it as a general option for everyone.

If you use EVR, then DXVA2 Native is usually the best option to use.

Just to mention all modes:
QuickSync or CUVID should generally not be used any longer, in favor of DXVA2-CopyBack.

Carpo

17th March 2019, 14:53

Thanks Nev

bitterman

17th March 2019, 17:44

Thanks.

Sunspark

18th March 2019, 21:59

Thanks, nevcairiel, for all the great work you do.

nevcairiel

19th March 2019, 09:43

LAV Filters 0.74.1

LAV Video
- Fixed: VP9 video could produce wrong timestamps, resulting in a black screen or other playback disruptions
- Fixed: Decoding VP9 from a non-keyframe (ie. after a seek, or badly cut file) would not always recover properly once a keyframe was encountered

Download: Installer (both x86/x64) (https://files.1f0.de/lavf/LAVFilters-0.74.1.exe) -- Zips: 32-bit (https://files.1f0.de/lavf/LAVFilters-0.74.1.zip) & 64-bit (https://files.1f0.de/lavf/LAVFilters-0.74.1-x64.zip)

If you haven't seen the 0.74 release notes yet, you can find them here (https://forum.doom9.org/showthread.php?p=1869029#post1869029)

Just a few quick fixes for VP9 decoding on top of 0.74.

As always, please report issues, specifically regressions, in as much detail as possible with a sample file if applicable.

Have fun!

mzso

19th March 2019, 17:59

LAV Filters 0.74

- Changed: Improved support for additional UtVideo subtypes

Hi!

I guess UT Video T2 RGB video is not among what ffmpeg supports, because LAV only produces a black screen.

nevcairiel

19th March 2019, 18:38

I don't have any experience with UtVideo. I just pass it to ffmpeg. I suggest trying the ffmpeg command line to see if it supports such videos.

mzso

19th March 2019, 19:26

I don't have any experience with UtVideo. I just pass it to ffmpeg. I suggest trying the ffmpeg command line to see if it supports such videos.

Well, it gave an error for the same file: "Error while decoding stream #0:0: Invalid data found when processing input" repeated many times.

GCRaistlin

20th March 2019, 23:18

LAV Video seems to report the active hardware accelerator incorrectly.
The test system has dual graphics: AMD Radeon HD 6620G (integrated) and AMD Radeon HD 6750M (discrete). I renamed mpc-hc64.exe to ~mpc-hc64.exe (otherwise the video driver doesn't allow using the discrete graphics for MPC-HC), set it to use the discrete graphics in the Radeon Additional Settings applet, and launched it. LAV Video decoder (0.74.1, tried internal 0.74.0, too) says: "Active Hardware Accelerator AMD Radeon 6620G" (while Switchable Graphics Application Monitor reports that ~mpc-hc64.exe is using the "High Performance" (discrete) GPU). If I set MPC-HC64 to use the integrated graphics, LAV Video decoder reports the same. That is, switching between adapters doesn't affect what LAV Video is reporting.
Then I installed FurMark. By default, it uses the discrete graphics, shows 14 fps with it and reports: "AMD Radeon 6600M and 6700M Series". If it is forced to use the integrated graphics it shows 2 fps and reports: "AMD Radeon 6620G". That is, the GPU really has been switched and FurMark correctly reports which GPU is currently being used.

nevcairiel

21st March 2019, 02:03

huhn

21st March 2019, 02:40

You are not supposed to rename anything to use a different GPU; there are other ways that should work.

Like Display -> Graphics settings -> Browse -> add mpc-hc, and now you can tell Windows to use a different GPU for that program.

clsid

21st March 2019, 02:46

To clarify: the setting that huhn refers to is found in the Windows settings app, and exists since Windows 10 1803.

GCRaistlin

21st March 2019, 08:22

With which hardware decoder mode is that?
Generally LAV will ask the D3D device it uses for decoding during startup about its name, and then store that name somewhere. If it's being switched out behind its back, then it won't catch that (and since it's cosmetic only, also not really care).

It's with DXVA2 (copy-back). The D3D device definitely wasn't being switched out behind LAV's back, as I first set MPC-HC to use the discrete graphics and only then launched MPC-HC. As for cosmetic only, I don't completely agree with you, as this info (if it is correct) could help to diagnose performance issues.

the setting that huhn refers to ... exists since Windows 10 1803.
The OS being used is Windows 8.1 x64.

nevcairiel

21st March 2019, 11:51

It's with DXVA2 (copy-back). The D3D device definitely wasn't being switched out behind LAV's back, as I first set MPC-HC to use the discrete graphics and only then launched MPC-HC. As for cosmetic only, I don't completely agree with you, as this info (if it is correct) could help to diagnose performance issues.

I didn't mean that you changed it, but possibly Windows or the driver does. I literally create the device and right after ask about its name. So either it's really using the integrated graphics for decoding, or it's lying to me somewhere.

I don't suppose you have any way to check which device is actually doing the decoding, based on hardware usage etc, instead of just trusting the tool that says it should be the dedicated?

kolak

21st March 2019, 14:32

Is there any reason why the LAV decoder can't read MOV headers for color flagging (including the new tags for master display information: https://patchwork.ffmpeg.org/patch/6409/) and pass it to madVR (like it does for h265)?
This would allow playing e.g. ProRes, DNxHR etc. HDR files on an HDR TV.

nevcairiel

21st March 2019, 16:30

Of course, the reason is that the decoder doesn't get access to the container file. It can only read the video stream itself, anything else would require manual handling of some sort.

GCRaistlin

21st March 2019, 17:33

nevcairiel
Switching the GPU for an app is, for some unknown reason, often a hack on AMD systems, with all that renaming of exe files. So I was hoping to get further proof from LAV that everything is working the right way. The fact is that I can't see any difference in madVR's performance whether MPC-HC is set to use the integrated GPU or the discrete GPU. The conclusion is that either the 6620G and 6750M have equal capabilities for madVR's video processing, or MPC-HC ignores the GPU setting and always uses the integrated graphics, though the driver reports the reverse.

el Filou

21st March 2019, 17:57

Configure DXVA Checker the same way as MPC-HC, then do a decode benchmark while monitoring your integrated and discrete GPU. If the iGPU is used for decoding you should see some GPU load on it even in pure decode mode.

GCRaistlin

21st March 2019, 20:59

Should I look at GPU Engine Usage? How do I do a decode benchmark?

el Filou

21st March 2019, 21:55

Install DXVA Checker: http://bluesky23.yukishigure.com/en/DXVAChecker.html
go to DS/MF Decoder ==> Check Decoders ==> Select media file ==> click the arrow next to LAV Decoder ==> Decode Performance ==> DXVA2.
For usage monitoring you can use GPU-Z: https://www.techpowerup.com/gpuz/

GCRaistlin

22nd March 2019, 10:35

Is it normal that I have "Unsupported" under "LAV Video Decoder" on DS/MF Decoder tab?

kolak

22nd March 2019, 11:38

Of course, the reason is that the decoder doesn't get access to the container file. It can only read the video stream itself, anything else would require manual handling of some sort.

Ok, so this info has to be inside the stream headers, not the container.
In the case of ProRes and DNxHR it is (except for master display info). They have private header flagging inside as well.

nevcairiel

22nd March 2019, 11:40

Is it normal that I have "Unsupported" under "LAV Video Decoder" on DS/MF Decoder tab?

Yes. I'm not sure what it tries to read there, but it works regardless.

nevcairiel

22nd March 2019, 11:41

Ok, so this info has to be inside the stream headers, not the container.
In the case of ProRes and DNxHR it is (except for master display info). They have private header flagging inside as well.

If you can provide example files, I could take a look. In theory there already are some mechanisms to transport this data, since I added it for HDR WebM Files, which only have it in the container data as well, but it needs to be implemented for every container specifically, unfortunately.

kolak

22nd March 2019, 14:08

In all cases you have the same container flagging (ProRes also has correct flagging in its private frame headers, but DNxHR/Cineform do not). Best to rely on MOV flagging. In this case: the colr, clli and mdcv tags. Those fully describe HDR content.
DNxHR also stores range info in the ACLR tag. ProRes has no range info, as it should pretty much always be limited range (although you can make a full range file as well). Cineform has no known range flagging either.

Link:

https://drive.google.com/open?id=1DmtLfyNJbMR8UHMI0U3Pni2cSGvTQ1hq

Values for the different standards (HDR10, HLG etc.) are the same as for h265. MediaInfo has support for all those tags.
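
For anyone who wants to check a file for these tags by hand, here is a rough sketch of how colr, clli and mdcv could be located in a MOV file. It is not how LAV or ffmpeg does it. It assumes the usual moov > trak > mdia > minf > stbl > stsd nesting and an 86-byte fixed part at the start of each video sample entry, it does not tell video tracks from audio tracks, and all function names are hypothetical.

```python
import struct
from typing import Optional

# Boxes that only contain other boxes on the way down to the sample description.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}
HDR_TAGS = {b"colr", b"clli", b"mdcv"}
VISUAL_ENTRY_HEADER = 86  # box header (8) + fixed video sample entry fields (78)


def iter_boxes(data: bytes, start: int, end: int):
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit size follows the type
            if pos + 16 > end:
                return
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # box extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            return  # malformed or truncated: stop instead of guessing
        yield kind, pos + header, pos + size
        pos += size


def find_hdr_tags(data: bytes, start: int = 0, end: Optional[int] = None) -> dict:
    """Return {tag: payload bytes} for colr/clli/mdcv found in sample entries."""
    end = len(data) if end is None else end
    found = {}
    for kind, body, box_end in iter_boxes(data, start, end):
        if kind in CONTAINERS:
            found.update(find_hdr_tags(data, body, box_end))
        elif kind == b"stsd":
            # Skip version/flags (4) and entry count (4), then walk the entries.
            for _codec, entry_body, entry_end in iter_boxes(data, body + 8, box_end):
                ext_start = entry_body - 8 + VISUAL_ENTRY_HEADER
                for tag, p, e in iter_boxes(data, ext_start, entry_end):
                    if tag in HDR_TAGS:
                        found[tag.decode()] = data[p:e]
    return found
```

Calling `find_hdr_tags(open(path, "rb").read())` returns the raw payload of each tag that is present. Decoding the payloads is left out here, and reading the whole file into memory is only reasonable for small samples.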