LAV CUVID Decoder - High Quality Hardware decoding for NVIDIA

VincAlastor
18th May 2011, 08:26
Just to see if it's being used? No.
You can always check with something like Process Explorer and see which modules it has loaded.

Or just check your GPU usage with GPU-Z; if it's 0, it's not being used.

Oh great, then I will install Microsoft's Process Explorer. Is it normal that GraphEdit and GraphStudio load the DTV-DVD decoder every time?

nevcairiel
18th May 2011, 08:36
If you let GraphStudio construct the graph on its own, it will prefer the filters marked as "preferred" in the registry, which is the MS decoder.
You can manually add filters to the graph and test like that, though.
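For readers wondering why the MS decoder wins: Intelligent Connect tries candidate decoders roughly in order of merit, and on Windows 7 a per-format "preferred" decoder entry in the registry takes precedence over that. The sketch below is a toy model of that ordering under those assumptions, not DirectShow's actual algorithm; the Candidate struct, connection_order() and the merit values in the example are all hypothetical.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of a decoder that can accept a media type.
struct Candidate {
    std::string name;
    unsigned merit;  // DirectShow uses values such as 0x800000 (MERIT_PREFERRED)
};

// Toy model of the connection order: the decoder named in the per-format
// "preferred" setting is tried first, then the rest by descending merit.
// std::stable_sort keeps the input order for candidates with equal merit.
std::vector<Candidate> connection_order(std::vector<Candidate> candidates,
                                        const std::optional<std::string>& preferred) {
    std::stable_sort(candidates.begin(), candidates.end(),
                     [&preferred](const Candidate& a, const Candidate& b) {
                         const bool pa = preferred && a.name == *preferred;
                         const bool pb = preferred && b.name == *preferred;
                         if (pa != pb) return pa;   // preferred decoder first
                         return a.merit > b.merit;  // then higher merit first
                     });
    return candidates;
}
```

With the preferred entry pointing at the Microsoft decoder, it comes first even when another decoder has equal merit; adding a filter to the graph by hand sidesteps this ordering entirely.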

nevcairiel
18th May 2011, 13:25
The source of LAV CUVID is now available in my Git Repository (http://git.1f0.de/gitweb?p=lavcuvid.git;a=summary), if anyone is interested.

SamuriHL
18th May 2011, 13:33
Sweet! Thanks for that. :)

SamuriHL
18th May 2011, 13:40
Uhhh, stupid question time. Can this only be built on a machine that actually has an nVidia card? :( There's no way to install the CUDA API without having the nVidia hardware? Or am I missing something? I'd like to build it on my laptop, which has an AMD card, but...

nevcairiel
18th May 2011, 13:41
No idea if you can install the CUDA SDK without the appropriate hardware. As you know, I only have NVIDIA. :p

SamuriHL
18th May 2011, 13:43
Is it just the cudatoolkit that I need?

nevcairiel
18th May 2011, 13:45
http://developer.nvidia.com/cuda-toolkit-32-downloads

This is the one I build against.
"CUDA Toolkit" 32-bit from the long list of things down there, yes.

SamuriHL
18th May 2011, 13:48
Yeah, OK, I got that installed. It's just not finding cuda.h when I build. How does the solution look for the CUDA API? As in, where is it looking for it? I have it in:

C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v3.2

nevcairiel
18th May 2011, 13:58
I forget; I may have added it to the global (system) path, or to the source path in Visual Studio, somewhere in the settings.

CruNcher
18th May 2011, 14:06
The source of LAV CUVID is now available in my Git Repository (http://git.1f0.de/gitweb?p=lavcuvid.git;a=summary), if anyone is interested.

:)

Thanks Mindbomb, I appreciate the response. I'll try the drivers CruNcher linked to; I have already tried the other 2 suggestions.

I am also curious why it will use versions 1 and 3 of the decoder, but none of the others...

You could try something hacky ;) Copy nvcuvid.dll from your current driver installation, install the new driver, and put the old nvcuvid.dll in the folder you registered LAV CUVID from ;)
This way you may still be able to use the optimized 3D part without compromising your hardware video decoding :)
Though if the CUDA versions differ between the driver and the old DLL, it could fail.

SamuriHL
18th May 2011, 14:15
I forget; I may have added it to the global (system) path, or to the source path in Visual Studio, somewhere in the settings.

Yeah, I'll figure it out. CUDA adds a bunch of variables to the system environment:

CUDA_INC_PATH
CUDA_LIB_PATH

Still trying to figure out how to add them.
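A common way to make cuda.h visible is to reference the toolkit's environment variable in the project rather than a hard-coded path, e.g. adding $(CUDA_INC_PATH) to the C/C++ Additional Include Directories in Visual Studio (MSBuild exposes environment variables as properties). The helper below is a hypothetical sanity check that resolves cuda.h the same way; only the variable name CUDA_INC_PATH comes from the toolkit install described above.

```cpp
#include <cstdlib>
#include <filesystem>
#include <optional>

namespace fs = std::filesystem;

// Hypothetical helper: resolve <inc_path>/cuda.h from an include-directory value.
// Returns the full path if the header exists there, std::nullopt otherwise.
std::optional<fs::path> find_cuda_header(const char* inc_path) {
    if (inc_path == nullptr || *inc_path == '\0') return std::nullopt;  // variable not set
    fs::path header = fs::path(inc_path) / "cuda.h";
    std::error_code ec;  // report "not found" instead of throwing on bad paths
    if (fs::is_regular_file(header, ec)) return header;
    return std::nullopt;
}

// Convenience wrapper that reads the variable the toolkit installer sets.
std::optional<fs::path> find_cuda_header_from_env() {
    return find_cuda_header(std::getenv("CUDA_INC_PATH"));
}
```

If find_cuda_header_from_env() comes back empty, either the variable is missing from the environment the build runs in or it points somewhere without cuda.h.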

SamuriHL
18th May 2011, 14:24
2>Build succeeded.
2>
2>Time Elapsed 00:00:06.56
========== Rebuild All: 2 succeeded, 0 failed, 0 skipped ==========

WOO HOO! Thanks, Nev!

Sebastiii
18th May 2011, 16:31
The source of LAV CUVID is now available in my Git Repository (http://git.1f0.de/gitweb?p=lavcuvid.git;a=summary), if anyone is interested.

Nice :) thanks.

hawkson
21st May 2011, 10:15
After installing, just load up your DirectShow player of choice and start playing a movie. Depending on your choice of player, you might need to configure the LAV CUVID Decoder as a preferred filter for the formats you want it to decode.


I want to use MPC-HC. How can I verify I am using LAV CUVID? I have installed the 64-bit version of the CUDA Toolkit.

I also installed CoreAVC. Should I uninstall CoreAVC now that I have LAV CUVID?

nevcairiel
21st May 2011, 10:16
LAV CUVID does not currently work on 64-bit.
Anyhow, in MPC-HC just right-click the movie and navigate to Filters; if LAV CUVID is active, it should be in the list of filters.

Selur
21st May 2011, 11:05
Due to a limitation in the CUDA SDK 3.2, currently only 32-bit is available
Any news on a 64-bit version of the filter?

nevcairiel
21st May 2011, 11:10
64-bit is overrated.

I'm not even working on this, so no news.

Selur
21st May 2011, 12:57
Okay, thanks for letting us know. :)

Virtual_ManPL
21st May 2011, 13:26
64-bit is overrated.
Wrong.
99% of calculations are just data movement from register A to register B.
32-bit (x86) has 8 general-purpose registers and 64-bit (x86-64) has 16.
So in a perfect world, performance should be at least 50-75% better (100% in dream mode), just like with SLI on GPUs.
Also don't forget that new CPUs are now capable of 128-bit or even 256-bit operations through SIMD extensions :p

I'm not even working on this, so no news.
But when nVidia releases official 64-bit support in SDK 4.0, are you planning to do the same with this decoder? :p

nevcairiel
21st May 2011, 13:28
So in a perfect world, performance should be at least 50-75% better (100% in dream mode), just like with SLI on GPUs.


Like I said, overrated. You produce some artificial stats without any real-world impact and think it's magically better.
In the real world, performance is usually even lower than with an x86 build, because data types are bigger (addresses are 64-bit, default integer types might suddenly be 64-bit depending on the compiler, etc.).
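To illustrate the "data types are bigger" point: every pointer grows from 4 to 8 bytes in a 64-bit build, so pointer-heavy structures take more memory and fewer of them fit in a cache line. A minimal sketch with a hypothetical linked-list node; the sizes in the comments are typical for MSVC and GCC, not guaranteed by the language.

```cpp
#include <cstddef>

// Hypothetical node of a pointer-linked structure.
struct Node {
    Node* next;  // 4 bytes on x86, 8 bytes on x64
    int value;   // 4 bytes on both (int stays 32-bit with MSVC and common Linux compilers)
};

// Typically 8 bytes on x86 and 16 bytes on x64: the x64 layout gets 4 bytes of
// padding after `value` so that `next` stays 8-byte aligned in arrays.
constexpr std::size_t node_size = sizeof(Node);

// How many nodes fit in one 64-byte cache line: typically 8 on x86, 4 on x64.
constexpr std::size_t nodes_per_cache_line = 64 / node_size;
```

Twice the bytes per node means twice the memory traffic for the same data, which can eat into whatever the extra registers gain.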

Moving data around between registers is not what takes up the time in an algorithm. More registers surely help, but they're not the key to performance. They only avoid a cache access now and then, and L1 caches are insanely fast.
The actual calculations on the data in the registers, that's what takes time.

If doubling the registers doubled performance, why didn't they add many more when designing the x64 architecture back in the day?

Oh and btw, the SIMD extensions can be used without x64.
I actually have some optimizations using MMX, which would stop working on x64. :p
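On the MMX point: MSVC does not support the MMX intrinsics (__m64) or inline assembly when targeting x64, while SSE2 intrinsics work in both x86 and x64 builds, so MMX routines usually have to be rewritten rather than just recompiled. A minimal sketch of that kind of rewrite, using a hypothetical row-averaging routine (not taken from LAV CUVID) with an SSE2 path and a scalar fallback that produce identical results.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// Hypothetical routine: rounded average of two rows, dst[i] = (a[i] + b[i] + 1) / 2.
void average_rows(const std::uint8_t* a, const std::uint8_t* b,
                  std::uint8_t* dst, std::size_t n) {
    std::size_t i = 0;
#ifdef HAVE_SSE2
    // 16 bytes per iteration; _mm_avg_epu8 rounds up exactly like the scalar path.
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_avg_epu8(va, vb));
    }
#endif
    // Scalar tail, or the whole row when SSE2 is unavailable.
    for (; i < n; ++i)
        dst[i] = static_cast<std::uint8_t>((a[i] + b[i] + 1) / 2);
}
```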



But when nVidia releases official 64-bit support in SDK 4.0, are you planning to do the same with this decoder? :p
Probably not.
Unless there are real advantages (x64 is not one of them), I won't be upgrading the SDK, as every new SDK requires a newer driver, and I would like to keep the requirements low.
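The driver-requirement argument boils down to a version comparison that every user's setup must pass. A small sketch, assuming NVIDIA's usual "major.minor" format with a two-digit minor (e.g. "266.58"); the function names are hypothetical, and the minimum versions used in the note below are placeholders, not NVIDIA's published requirements for any SDK.

```cpp
#include <cctype>
#include <string>
#include <utility>

// Parse "major.minor" (e.g. "266.58"); returns {-1, -1} if malformed.
// The length limit keeps both parts small enough for std::stoi.
std::pair<int, int> parse_driver_version(const std::string& s) {
    if (s.size() > 9) return {-1, -1};
    const std::size_t dot = s.find('.');
    if (dot == std::string::npos || dot == 0 || dot + 1 == s.size()) return {-1, -1};
    for (std::size_t k = 0; k < s.size(); ++k)
        if (k != dot && !std::isdigit(static_cast<unsigned char>(s[k]))) return {-1, -1};
    return {std::stoi(s.substr(0, dot)), std::stoi(s.substr(dot + 1))};
}

// True if `installed` is at least `required`; malformed strings never pass.
bool driver_meets(const std::string& installed, const std::string& required) {
    const auto have = parse_driver_version(installed);
    const auto need = parse_driver_version(required);
    if (have.first < 0 || need.first < 0) return false;
    return have >= need;  // std::pair compares major first, then minor
}
```

With a placeholder minimum of "260.00", driver_meets("266.58", "260.00") passes; raise the minimum to "270.00" and every user who has not updated is locked out, which is exactly the cost described above.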

Maybe one day, if there is MVC decoding support in an SDK, I might upgrade again.

Adding support for x64 is more than just compiling it differently. As I outlined above, my MMX code wouldn't work, and a new SDK brings all sorts of other things to consider.
Also, I just don't think it's worth it. I specifically targeted this decoder at madVR, and that's only 32-bit as well, so there you go! With other renderers, you are free to use a DXVA decoder, which offers the same quality.

hawkson
21st May 2011, 16:49
I downloaded and installed LAV CUVID according to the readme. How can I tell if it's working? How do I select it in MPC-HC?

CruNcher
21st May 2011, 17:00
@nev
Fix : MPEG2 DXVA Decoder can decode video only with 4:2:0 chroma format. ;)
Hehe, the free ones are first again (after MainConcept). I wonder when CyberLink and the others will react, though they have no 4:2:2 support at all yet :D

BTW, it should be rather easy to detect this: the DXVA decoders get it from the splitter, which changes its subtype from the default MPEG-2 one, and on that basis they cancel DXVA and switch to software decoding. LAV Splitter supports this, but so far LAV CUVID doesn't seem to detect it. So, the way the others use this subtype change to disable DXVA, I guess you could refuse the MPEG-2 decoder connection and fall back to the rest of the DirectShow chain.
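CruNcher proposes keying off the splitter's subtype change. An alternative, sketched here under the assumption that the decoder sees the MPEG-2 sequence headers (for instance in the connection's format block), is to read chroma_format from the sequence_extension and refuse the connection for anything other than 4:2:0, so the graph falls back to another decoder. The function names and the accept/refuse policy are hypothetical; the bit layout follows the MPEG-2 video specification (ISO/IEC 13818-2).

```cpp
#include <cstddef>
#include <cstdint>

// chroma_format values from the MPEG-2 sequence_extension (0 is reserved).
enum class ChromaFormat { Unknown = 0, YUV420 = 1, YUV422 = 2, YUV444 = 3 };

// Scan an MPEG-2 elementary-stream buffer for the sequence_extension
// (start code 00 00 01 B5 with extension_start_code_identifier == 1)
// and return its chroma_format field.
ChromaFormat mpeg2_chroma_format(const std::uint8_t* buf, std::size_t len) {
    for (std::size_t i = 0; i + 5 < len; ++i) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01 &&
            buf[i + 3] == 0xB5 && (buf[i + 4] >> 4) == 0x1) {
            // Byte i+5 holds, MSB first: low 4 bits of profile_and_level_indication,
            // progressive_sequence (1 bit), chroma_format (2 bits), 1 more bit.
            const int cf = (buf[i + 5] >> 1) & 0x3;
            return cf == 0 ? ChromaFormat::Unknown : static_cast<ChromaFormat>(cf);
        }
    }
    return ChromaFormat::Unknown;
}

// Hypothetical policy: only 4:2:0 goes to the hardware decoder.
bool accept_for_hardware(ChromaFormat cf) { return cf == ChromaFormat::YUV420; }
```

A typical 4:2:0 Main Profile stream carries the extension bytes 00 00 01 B5 14 8A; with 8C instead of 8A, chroma_format becomes 2 (4:2:2) and the connection would be refused.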

Virtual_ManPL
21st May 2011, 20:04
@ nevcairiel

No artificial stats; just look at archivers, encoders, CAD software or even some decoders and browsers, the speed boost looks impressive :cool:
but your code needs to be "well-written"

Why not more registers? Dunno, maybe some patent problems. Don't forget that Intel wants to stop AMD from producing x86 CPUs if AMD won't "share" the x86-64 specification.

When you use MMX, don't forget that it will be unusable on Tegras (ARM architecture). In the near future these low-power, high-performance chips will probably dominate the mobile field. Even Windows 8 will run on them. nVidia has also started Project Denver, which will create a high-performance 64-bit ARM processor (compatible with 32-bit) for mobiles, PCs, servers or even supercomputers.

Change is coming, so it's a bad idea to limit your software to 32-bit only because of madVR :p

nevcairiel
21st May 2011, 20:06
LAV CUVID is a DirectShow filter, for Windows, on x86. Why would I care about any other architecture like ARM?
Also, I don't care about other applications. CAD, for example, will use massive amounts of RAM, so of course x64 is preferable there. But that's the only solid advantage it has.

You don't make any sense, none whatsoever. I'll start ignoring you now.
I actually know what I'm talking about; I get the impression that you're just a 64-bit fanboi.

There will be *NO* x64 support in the foreseeable future.
I don't care about any arguments you make up. All *solid* statistics for any DirectShow components I have seen usually point to stuff even running *slower*.
LAV CUVID decodes in hardware; it does not benefit from x64 *at all*, because it does not use the CPU much. Heck, it would probably be slower on x64 because my MMX code won't work there.

The source is now freely available, you're free to make it work on x64 for your own pleasure. My time is too valuable for that.

andyvt
21st May 2011, 20:08
No artificial stats; just look at archivers, encoders, CAD software or even some decoders and browsers, the speed boost looks impressive :cool:
but your code needs to be "well-written"



Can you point to a benchmark (or series of) that demonstrates this? Are you a software developer?

madshi
21st May 2011, 21:30
I think we have another case here of a non-developer thinking he knows more than the developers do.

andyvt
21st May 2011, 21:33
I think we have another case here of a non-developer thinking he knows more than the developers do.

He must be a Project Manager :)

madshi
21st May 2011, 21:43
Haha! Yeah, probably... :D

noee
21st May 2011, 22:09
He must be a Project Manager :)

Hey now, some of us actually have a background to support our unreasonable demands. Some of us don't. (http://en.wikipedia.org/wiki/The_Peter_Principle)

;)

Virtual_ManPL
21st May 2011, 22:24
As if developers are never closed-minded about tech news and always know best...

If you're looking for tests, just go to some hardware site and look for CPU tests... or run them on your PC; that will be the most reliable



@ nevcairiel - I'm not a 64-bit fanboy, but I may look like one here amid all the adoration
I was only thinking it would be a good idea to port this decoder to the ARM architecture when Windows 8 comes out and nVidia also joins the "CPU" war. It could be a big hit.
No offense meant; I respect your hard work and the fact that you don't waste your time on things that aren't your priority.

nevcairiel
21st May 2011, 22:27
If anyone knows how code works on different CPUs, it's developers. Who else would? Developers, and the engineers who build the CPUs. You don't strike me as either.

You don't even know if ARM Windows 8 will still have DirectShow. It's a clean break there, and no legacy x86 filters will work anyway, so it would make sense to drop DirectShow in the process.
The Tegra SoCs also don't support CUDA/CUVID, they have their own interface to the video decoder. You don't just "port" stuff to that, you have to rewrite it.

It's pretty clear that you're no developer, and that you also have no insight into the technology involved.
You're free to state your opinion of course, but your tech facts are just false.

Virtual_ManPL
21st May 2011, 22:41
I have stated many times that I'm not a software developer, so you can laugh at me, guys.
But thanks to those who correct me about it.

Windows 8 won't have DirectShow. It will be completely replaced by Media Foundation, based on what I read on M$ TechNet and some other sites.



EDIT:
If we're joking, have some more :p

http://www.businessballs.com/images/treeswing/treeswing-copy-softwarebook.jpg

nevcairiel
21st May 2011, 22:51
Here is a more modern version. :p
http://alonsorobles.com/wp-content/uploads/2011/01/tree_swing_development_requirements.jpg

We have that hanging on a wall somewhere in the office. :p

andyvt
21st May 2011, 23:03
If you're looking for tests, just go to some hardware site and look for CPU tests... or run them on your PC; that will be the most reliable


Simple test on my dev PC


using System;

class Program
{
    static void Main(string[] args)
    {
        // Wall-clock timing of a trivial counting loop
        DateTime startTime = DateTime.Now;
        int i = 0;

        for (int yo = 0; yo < Int32.MaxValue; yo++)
            i++;

        TimeSpan tsTake = DateTime.Now - startTime;

        // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit one
        Console.WriteLine("{2} Iterations Took:{0} Platform: {1}",
            tsTake, (IntPtr.Size == 8 ? "x64" : "x86"), Int32.MaxValue);
    }
}


2147483647 Iterations Took:00:00:04.2972458 Platform: x86

2147483647 Iterations Took:00:00:05.2703014 Platform: x64
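A C++ version of the same measurement needs a little more care: an optimizing compiler may delete a loop whose result is never used, so the counter here is volatile, and std::chrono::steady_clock is used because, unlike DateTime.Now, it is monotonic. A sketch only, with a hypothetical function name; like the test above, it times one trivial loop and says nothing general about x86 versus x64.

```cpp
#include <chrono>
#include <cstdint>

// Count to n and return the elapsed time in milliseconds.
// `volatile` stops the compiler from folding the loop into a constant.
double time_counting_loop(std::int32_t n, std::int32_t* out_count) {
    const auto start = std::chrono::steady_clock::now();
    volatile std::int32_t i = 0;
    for (std::int32_t yo = 0; yo < n; ++yo)
        i = i + 1;
    const auto stop = std::chrono::steady_clock::now();
    *out_count = i;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// 32 or 64 depending on how this translation unit was compiled
// (the counterpart of IntPtr.Size in the C# test).
constexpr int pointer_bits = static_cast<int>(sizeof(void*) * 8);
```

Building it once as x86 and once as x64 and calling time_counting_loop(INT32_MAX, &count) in each reproduces the comparison.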

andyvt
21st May 2011, 23:10
Windows 8 won't have DirectShow. It will be completely replaced by Media Foundation, based on what I read on M$ TechNet and some other sites.


When MS removes VFW you might be able to make that argument.

Virtual_ManPL
21st May 2011, 23:49
CPUIDMP
                           Phenom 3.0 GHz   Phenom 3.0 GHz
                           32 bit           64 bit

Separate Tests
  32 bit SSE MFLOPS        11981            12013
  32 bit Integer MIPS      9015             8279

Two Threads
  32 bit SSE MFLOPS        12012            12042
  32 bit Integer MIPS      9027             8265

Four Threads
  32 bit SSE MFLOPS        12005            11996
  32 bit Integer MIPS      9029             8265
  32 bit SSE MFLOPS        11991            12004
  32 bit Integer MIPS      8270             9030


WhetsMP
             MFLOPS Vax    MWIPS  MFLOPS MFLOPS MFLOPS Cos    Exp    Fixpt  If     Equal
             Gmean  MIPS          1      2      3      MOPS   MOPS   MOPS   MOPS   MOPS

Phenom 3.0 GHz, 32 bit compilation
Total        1675   25786  6353   1818   1780   1453   138    95     4986   5385   5109
Thread 1                          900    870    721    69     47     2512   2898   3995
Thread 2                          918    910    732    68     48     2473   2487   1113

Phenom 3.0 GHz, 64 bit compilation
Total        1486   34935  6892   1808   1451   1252   199    93     4964   5804   11837
Thread 1                          900    724    751    100    46     2482   2893   10625
Thread 2                          908    727    501    99     47     2483   2912   1211

BusMP
Speed in MBytes/second, Phenom 3 GHz, DDR3

              32 Bit                  64 Bit
CPUs          Inc64B  RdAll   SSE2    Inc64B  RdAll   SSE2
L1 6KB
1             14041   13617   23847   23804   24340   23800
2             18754   25133   47214   22114   29916   47175
%             134     185     198     93      123     198

L2 96KB
1             1496    12878   23879   2986    21594   23822
2             2974    25520   47516   5676    27000   47542
%             199     198     199     190     125     200

L3
1             841     10107   13311   1499    11256   11967
2             1492    18736   25895   2788    15690   22640
%             177     185     195     186     139     189

RAM 128MB
1             454     5212    7289    897     5792    7372
2             760     8959    12146   1477    8734    12124
%             167     172     167     165     151     164
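The "%" rows appear to be two-CPU throughput as a percentage of one-CPU throughput, rounded to the nearest integer: for L1 Inc64B at 32 bit, 18754 / 14041 ≈ 1.336, listed as 134. That reading is inferred from the numbers, not from the benchmark's documentation; the small hypothetical helper below reproduces several of the listed values.

```cpp
#include <cmath>

// Scaling of a two-CPU run relative to a one-CPU run, in percent,
// rounded to the nearest integer (200 = perfect doubling).
int scaling_percent(double one_cpu, double two_cpu) {
    return static_cast<int>(std::lround(100.0 * two_cpu / one_cpu));
}
```

Values under 100, such as 93 for the 64-bit L1 Inc64B test, mean the second CPU made that test slower rather than faster.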



Now run some programs like I said above instead of synthetic "benchmarks".
Play with AutoCAD 2012 or Blender (even cold-start loading is faster), compress something with an archiver, encode/convert some audio or video stream, or even recompile e.g. a new kernel in Linux

andyvt
22nd May 2011, 00:36
Now run some programs like I said above instead of synthetic "benchmarks".
Play with AutoCAD 2012 or Blender (even cold-start loading is faster), compress something with an archiver, encode/convert some audio or video stream, or even recompile e.g. a new kernel in Linux

I don't think anyone said that x86 is always faster than x64. There are certainly scenarios where x64 is quite a bit faster (mostly limited to applications that need to work with large amounts of memory or that can take advantage of x64 only CPU instructions).

Your assertion (http://forum.doom9.org/showpost.php?p=1502625&postcount=620), or at least how I read it, was that x64 is always faster and that I just needed to test it on my PC (http://forum.doom9.org/showpost.php?p=1502745&postcount=631), which simply isn't true (http://forum.doom9.org/showpost.php?p=1502757&postcount=635).

madshi
22nd May 2011, 09:21
I have stated many times that I'm not a software developer, so you can laugh at me, guys.
We do, thanks.

Virtual_ManPL
22nd May 2011, 11:15
@ andyvt - you missed "perfect world" and "dream mode"; I didn't state anywhere that 64-bit is always faster

@ madshi - same here, when I hear 32>64

OK, end of off-topic... looks like the army defending the 32-bit base is too strong here... ;)

nevcairiel
22nd May 2011, 11:36
In the long run, 64-bit will be better, and it's the future. But in the here and now, there is just no advantage in using it, and the time needed to make everything work on 64-bit is better spent on other things.

We aren't strictly defending 32-bit, we are just pointing out the obvious flaws and errors in your arguments for 64-bit.

Didée
22nd May 2011, 16:17
Looking at x264 as a practical example, I think the rule-of-thumb number is that x264_64 is roughly ~15% faster, isn't it?

15% faster is a worthwhile improvement. But it isn't exactly earth-shattering, either. If in the context of a (much less complex) decoder there were a 10% speed gain, well... nice to have, but hardly critical.


@ Virtual_ManPL:
if you're in need of a x64-GPU-decoder so urgently - nobody forbids you to write one! Go ahead! :)

SamuriHL
22nd May 2011, 16:28
64 bit makes sense in my world....the world of video editing. It eats memory like there's no tomorrow. Besides, your theoretical 15% gain comes from _CPU_ decoding, *NOT* GPU decoding as Nev has already mentioned. I'd be willing to bet that Nev is quite correct that GPU decoding isn't going to benefit at all from 64 bit. And if the decoder isn't benefiting from 64 bit, the rest of your architecture doesn't make sense, either. A renderer isn't going to benefit from it. The player in theory might, but, without a decoder or renderer, that's wasted. So, the reality is you really need the entire architecture to move to 64 bit, and today, there's not really a benefit to doing that. On the contrary, if we ask madshi to make a 64 bit madVR, and Nev to make a 64 bit decoder, then they're taking time away from other useful features that could be done. madshi has always said the minute someone comes to him with a CLEAR and CONCISE reason to move to 64 bit that isn't based upon speculation and opinion, that he'd at least consider the time and effort. Until then, good luck with that. So argue away on this all you want. The fact is you have the job of convincing the developers to change their minds. And to do that is going to take actual *FACT* with solid evidence.

roozhou
22nd May 2011, 17:13
Looking at x264 as a practical example, I think the rule-of-thumb number is that x264_64 is roughly ~15% faster, isn't it?

Where did you get that 15%?

AJ73
23rd May 2011, 16:19
To say a few words about x64 support: in my tests with the CUDA video encoder, I get significantly higher performance with the x64 build than with the x86 build (on an x64 system; the x86 build on an x86 system is just as fast as the x64 build on an x64 system) with the same code (250 to 350 fps for a 720x576 video).

Besides performance (at least on the encoder side, though as far as I have seen, a lot of CUDA code is faster as native x64), there is another argument for an x64 build: the x86 path on an x64 system is completely separated from the x64 path. All codecs (VfW as well as DirectShow, MFT or ActiveX elements) have to be present twice. x86 software cannot see the x64 components (at the process level).
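The separation between the two paths also shows up when registering a filter: on 64-bit Windows, System32 holds the 64-bit regsvr32.exe and SysWOW64 the 32-bit one, and the two register into separate views of the registry, so a 32-bit player only sees filters registered by the 32-bit tool. A small sketch encoding that rule; the Bitness enum, the function name and the default C:\Windows location are assumptions for illustration.

```cpp
#include <string>

enum class Bitness { x86, x64 };

// Which regsvr32.exe registers a COM/DirectShow DLL of the given bitness.
// Returns an empty string for the impossible case (64-bit DLL on 32-bit Windows).
std::string regsvr32_for(Bitness dll, Bitness os,
                         const std::string& windir = "C:\\Windows") {
    if (os == Bitness::x86 && dll == Bitness::x64) return "";
    if (os == Bitness::x64 && dll == Bitness::x86)
        return windir + "\\SysWOW64\\regsvr32.exe";  // 32-bit tool on 64-bit Windows
    return windir + "\\System32\\regsvr32.exe";      // native tool
}
```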

I had no problem compiling an x64 version from the Git repository. The only thing I had to do was deactivate the SSE YV12 conversion (having no or slow YV12 conversion is no drawback for me).
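For context on that YV12 conversion: CUVID returns decoded frames as NV12 (a full-resolution Y plane followed by one plane of interleaved U/V pairs at quarter resolution), while YV12 keeps V and U in two separate quarter-size planes, V first. A minimal scalar sketch of the conversion, assuming even dimensions and tightly packed planes (stride equal to width); nv12_to_yv12 is a hypothetical name, and this stands in for the kind of routine AJ73 disabled rather than reproducing LAV CUVID's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a tightly packed NV12 frame to YV12. Returns an empty vector if the
// dimensions are odd or the input is too small.
std::vector<std::uint8_t> nv12_to_yv12(const std::vector<std::uint8_t>& nv12,
                                       std::size_t width, std::size_t height) {
    const std::size_t luma = width * height;
    const std::size_t chroma = luma / 4;  // samples per U or V plane
    if (width % 2 != 0 || height % 2 != 0 || nv12.size() < luma + 2 * chroma) return {};

    std::vector<std::uint8_t> yv12(luma + 2 * chroma);
    std::copy_n(nv12.begin(), luma, yv12.begin());  // Y plane is identical

    const std::uint8_t* uv = nv12.data() + luma;  // U0 V0 U1 V1 ...
    std::uint8_t* v_plane = yv12.data() + luma;   // YV12 stores V first
    std::uint8_t* u_plane = v_plane + chroma;     // then U
    for (std::size_t i = 0; i < chroma; ++i) {
        u_plane[i] = uv[2 * i];
        v_plane[i] = uv[2 * i + 1];
    }
    return yv12;
}
```

The loop is pure deinterleaving, which is why it vectorizes well with SSE and why a scalar fallback is slower but otherwise harmless.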

nevcairiel
23rd May 2011, 16:52
Speed enhancements in *encoding* are totally unrelated to decoding. LAV CUVID also doesn't use CUDA; it just sends the video frame to the hardware, which then does its mojo. I doubt it's *any* faster on x64, as the decoding hardware is the bottleneck, not my wrapper around it.

Like I said earlier, if you desperately want x64, the source is available; get an SDK with x64 compatibility and go nuts.
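The bottleneck argument is essentially Amdahl's law: if only a fraction p of the per-frame time is spent in CPU code that an x64 build speeds up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal sketch; the fractions in the note below are hypothetical and only show how little even a ~15% CPU-side gain buys when the decoding hardware dominates.

```cpp
// Amdahl's law: overall speedup when a fraction p of the work
// is accelerated by a factor s and the rest is unchanged.
double amdahl_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}
```

With a hypothetical p = 0.05 and s = 1.15, amdahl_speedup returns about 1.0066, i.e. under 1% overall; only with p = 1.0 (pure CPU work) does the full 15% appear.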

HeadlessCow
23rd May 2011, 18:32
There's an x64 version of DGIndexNV (Donald Graft's CUVID decoder app/AviSynth filter), so you can check it yourself if you have a license for it, or I will check it when I get home, and then we can end this sidetrack into x64 performance.

BeNooL
24th May 2011, 18:36
So I have now moved to Windows 7 x64 and gave this filter another try... unfortunately I still can't get it to work.

I'm using all 32-bit DirectShow components (MPC, ffdshow, etc.); I installed the VC++ 2010 redistributable libraries, registered the filter (with regsvr32 from SysWOW64), and forced CUVID as preferred in MPC-HC's External Filters... and it is still ffdshow that gets loaded.

Any idea what I could be missing?

Hardware is an 8600 GTS, so VP2 with Feature Set A according to the wiki. The 266.58 NVIDIA drivers are installed (and CUDA works fine with BOINC, for example).

nevcairiel
24th May 2011, 18:56
Didn't you have the NVIDIA card only as a secondary card, with no screen connected?

That's probably causing it. :p
I'll add some debug output at some point to maybe get an idea of what's going on. It won't be that soon, however; I have other things to work on right now.

BeNooL
24th May 2011, 19:02
Yes, that's it: the main card is an ATI and has the screens connected to it.