
View Full Version : LAV CUVID Decoder - High Quality Hardware decoding for NVIDIA



Andy o
12th September 2011, 23:04
OK, so I put back in my GTX 460, but I have the same problem as Thuan, after using CUVID and closing MPC-HC, my card stays in the P8 power state until I reboot the PC. Before playing the video, it returns promptly to P12, after, it doesn't. This is with 280.26 drivers on Win 7-64. Are all you guys using those drivers?

thuan
13th September 2011, 03:21
Maybe it is the driver as I'm also on that driver. At work now so I can't try any other driver version now.

Andy o
13th September 2011, 03:30
Tried 280.36 as well, same thing.

Andy o
13th September 2011, 07:05
OK, I enabled Nvidia Inspector's Multi Display Power Saver for MPC-HC (set for P8) and the bug went away, now it goes back to P12. (It says *P12 though, don't know what the asterisk is about, but clocks and voltage are P12 indeed).

EDIT: The asterisk seems to indicate that the P state is locked, so in this case it's because it's being forced by NV inspector, not because the driver went back to P12 as it should.

CruNcher
13th September 2011, 09:29
Andy o, could you try 280.47 (it should be a little better, though not perfect) and the 275 driver? The 280.26 WHQL can sometimes stay in one mode for a very long time (as if it's stuck); it should work better on XP though.

Latest info about the P-state improvement: most probably a driver with more conservative switching behavior (though most probably not as fast, i.e. as low-latency at starting a switch, as the current 280.36 WHQL when it works ;) )
Though I can understand if Nvidia has to go back to higher latency to stay compatible with every low-price card (PMU) on the market again. At least it shows which cards are of high build quality and which are not (if yours isn't overclocked, or is pre-overclocked, and it still resets with a TMD in 3D games) ;)


Got an email from nVidia admitting fault with the 280.26 driver, said they are aware of what is causing the problems and that the next driver branch is 285.

I am guessing it's the same people who worked on the 275 set which aren't too bad.

Andy o
13th September 2011, 09:39
I could not find a newer one than 280.36, but thanks for letting me know about .47. 280.26 didn't just get stuck for a "very long time" though. It was hours, I went to bed and came back and it was still there. Also, it seems the GTX460 is using more power than I remember, when I compared it back then with the 5770, even at idle.

nevcairiel
13th September 2011, 09:41
Tried 280.36 as well, same thing.

They only make very minor changes within the same release branch. If you want to test another driver, test 270 or 275.

It's known that they changed P-state handling in 280. It's also been confirmed by NVIDIA that the handling will return to a more conservative mode in 285, which should see its first beta within 1-2 weeks.

PS:
280.36 and 280.47 are OpenGL 4.2 developer drivers, they typically don't have any other changes except the new OpenGL support.

CruNcher
13th September 2011, 09:55
Though the crashes are for sure not Nvidia's fault. Being stuck in some P-state might be, but the crashes (due to the faster switching) surely are not: they just hit an area where low-quality non-reference (or heavily overclocked) cards can't survive anymore, and now Nvidia is backing off because of the massive (OC gamer) feedback :)
I have no crash issues with my MSI 460 GTX Hawk, and I saw no other 460 Hawk user (non-overclocked) reporting problems :)

nevcairiel
13th September 2011, 10:14
Yeah, it's really only the factory OC cards that are having those issues.

Andy o
13th September 2011, 10:48
Thanks, I think I'll wait for 285, NV Inspector is working fine right now for it. Another thing I found was that now the driver is outputting 24-bit to my display, when months before when I was using it, it used to output 36-bit. I don't think it makes a difference though.

Alexey1975
13th September 2011, 15:05
... I found was that now the driver is outputting 24-bit to my display, when months before when I was using it, it used to output 36-bit.

Sorry, but HOW did you "found" it?

CruNcher
13th September 2011, 16:10
Not only that, come look at this: http://developer.nvidia.com/nvidia-gpudirect%E2%84%A2-video

I'm thinking about a scenario like sending LAV CUVID decoded frames directly into Quicksync, for example :D

nevcairiel
13th September 2011, 16:14
GPUDirect is old news. It's only for Quadro/Tesla, and is only useful if you need to transfer data between GPUs, not between GPU<->CPU.

CruNcher
13th September 2011, 16:21
GPUDirect for Video is new :) http://blogs.nvidia.com/2011/09/check-out-nvidia-gpudirect-for-video-and-more-at-ibc-2011/ and yes, my example is like that ;) GPU (NVidia) -> Intel (Quicksync) with low CPU overhead; of course the overhead for Quicksync on the CPU would stay :)
Especially in situations where the Intel GPU can't keep up with decoding anymore when using PP, this could come in handy :)

nevcairiel
13th September 2011, 16:28
It's not a feature for consumer use, it's an API for other hardware developers to interface with the Quadro better. It will never be usable for what you seem to want to use it for. Not to mention that I doubt that you have a Quadro.
And it's still old news, it was announced quite a while ago.

nevcairiel
13th September 2011, 16:32
OK, so I put back in my GTX 460, but I have the same problem as Thuan, after using CUVID and closing MPC-HC, my card stays in the P8 power state until I reboot the PC. Before playing the video, it returns promptly to P12, after, it doesn't. This is with 280.26 drivers on Win 7-64. Are all you guys using those drivers?

I found out why that's happening. It's madVR's fault.
The latest madVR version has a bug that hangs onto a reference to the video decoder, which causes the video decoder to never be completely destructed. In LAV CUVID's case that means it'll hang on to the CUDA interfaces, which apparently causes it to keep the clocks up.

I'm preparing a new version with this behavior changed, by releasing all resources earlier in the process. This should eliminate the symptoms you're seeing, as well as fix a bug when you play too many videos (all resources are eaten up eventually)
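
To illustrate the idea (not the actual LAV CUVID code), here is a minimal Python sketch of a COM-style reference-counted decoder. All names (`CudaSession`, `Decoder`, `stop_playback`, `session_leaked`) are hypothetical. It shows why a reference that is never given back keeps the hardware resources alive, and how freeing them when playback stops sidesteps that.

```python
class CudaSession:
    """Hypothetical stand-in for the CUDA/CUVID resources a decoder holds."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class Decoder:
    """Hypothetical decoder filter with manual, COM-style reference counting."""

    def __init__(self):
        self.refs = 1                  # the creator owns the first reference
        self.session = CudaSession()

    def add_ref(self):
        self.refs += 1

    def release(self):
        self.refs -= 1
        if self.refs == 0:
            # Old behavior: resources are only freed on the final release.
            self.session.close()

    def stop_playback(self):
        # New behavior: free the hardware resources as soon as playback stops,
        # even if somebody still holds a reference to the object itself.
        self.session.close()


def session_leaked(release_early):
    """Return True if the CUDA session is still open after the player is done."""
    decoder = Decoder()
    decoder.add_ref()                  # the renderer takes a reference and never returns it
    if release_early:
        decoder.stop_playback()
    decoder.release()                  # the player gives up its own reference
    return not decoder.session.closed


print(session_leaked(release_early=False))
print(session_leaked(release_early=True))
```

With the old behavior the session stays open because the count never reaches zero; with the early release it is closed regardless of who still holds the object.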

Damn you madshi, this bug is getting annoying. :)

nevcairiel
13th September 2011, 16:42
LAV CUVID Decoder 0.13

0.13 - 2011/09/13
- Improved resource release behavior
- Improved VC-1 in EVO with pulldown flags

Download: Installer (32/64-bit, CUDA 4.0+) (http://files.1f0.de/cuvid/LAVCUVID-0.13.exe) - 32-bit (CUDA 4.0+) (http://files.1f0.de/cuvid/LAVCUVID-0.13.zip) - 64-bit (CUDA 4.0+) (http://files.1f0.de/cuvid/LAVCUVID-0.13-x64.zip) -- 32-bit (Older CUDA) (http://files.1f0.de/cuvid/LAVCUVID-0.13-LegacyCUDA.zip)

The resource fix solves a problem with madVR 0.74, which caused LAV CUVID to not properly release all resources, keeping the clocks of the card up, and eventually running out of resources, causing the decoder to not work anymore.
Note that this is a madVR bug, and may have other side-effects as well.

Anyhow, have fun with these.

Xaurus
13th September 2011, 18:37
Thanks nev!

When I open a "simple" 720p file (~4 mbit/s bitrate) with CUVID 0.13, I notice that the performance sticks at the highest level (p0?), ie. full 3D force.

The video engine load is about ~13% and the GPU load is ~3%.
Is it the decoder or the graphics driver that controls the level needed?

It has run for 5 minutes now and still full force.

PS: This is with the new 285.27 drivers and I have set the application preference "Power Management Mode" to Adaptive.

PPS: I am not really worried about power consumption, this is more a scientific question as to whether it's supposed to be like this or not. :)

nevcairiel
13th September 2011, 18:50
CUDA seems to always force P0 mode for some reason.
I recommend using NVIDIA Inspector's Multi Display Power Saver to force a lower P state. I just set it to limit it to P8 when running mpc-hc - but there are some more extensive rules you could employ to make the selection smarter.

Andy o
13th September 2011, 19:28
Thanks for the fix, NEv.

Sorry, but HOW did you "found" it?

My display (Pioneer KRP-500M) tells you the bit depth of the signal. FWIW, my 5770 is 30-bit, but again, I don't think higher-than 24 matters, at least not yet.

nevcairiel
13th September 2011, 19:51
Consumer cards are limited to 8-bit output, partly artificially limited by BIOS/driver. They want you to buy Quadro/FireGL cards for professional usage of high bit-depth signals.
They may actually send another signal over the wire, but the data it uses is based on 8-bit RGB data.

Personally, i think actual 10-bit output is overrated for video. Internal processing should of course be kept at a high enough level, but once you reach RGB, dithering it to 8bit gives you already a pretty good picture, and unless you move way up close, you won't see the dithering noise anyway.
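
A rough illustration of the dithering point, as a minimal Python sketch. It assumes a simple 1-D error-diffusion ditherer, which is only a stand-in and not what any particular renderer or driver does; the function names are made up for the example. Plain truncation throws the two extra bits away, while dithering keeps them in the local average.

```python
def truncate_to_8bit(samples10):
    """Drop the two low bits of each 10-bit sample."""
    return [s >> 2 for s in samples10]


def dither_to_8bit(samples10):
    """1-D error diffusion: carry each rounding error into the next sample."""
    out = []
    err = 0.0
    for s in samples10:
        target = s / 4.0 + err                  # ideal 8-bit value plus carried error
        q = min(255, max(0, int(target + 0.5)))  # round to the nearest 8-bit code
        err = target - q                         # remember what was lost
        out.append(q)
    return out


# A flat 10-bit patch at code 513, which sits between 8-bit codes 128 and 129.
patch = [513] * 16
print(sum(truncate_to_8bit(patch)) / len(patch))
print(sum(dither_to_8bit(patch)) / len(patch))
```

The dithered patch alternates between neighbouring 8-bit codes so that its average matches the 10-bit value; the price is a little noise, which is the part you only notice from very close up.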
Files may move to 10bit for some additional precision, and of course the added compression efficiency (who would've thought that storing more bits is actually more efficient?), but the whole display chain? I don't see it happening any time soon really, even though HDMI in theory supports it.

Xaurus
13th September 2011, 21:26
With a 1080i 29.97 file, would it be better to display it using interlaced resolution?
How would I go about this with CUVID, just set it to weave? Or simply use another decoder with no deinterlacing capabilities?

The reason I ask is that the plasma seems to lose something when I set it to 59p, ie. the settings are mostly greyed out for some reason and 1:1 pixel mapping is lost, like it thinks the format is some sort of non-digital or something.
If I set it to 59i the plasma will keep the settings including 1:1 pixel mapping but then I can't really run the file with a deinterlacer, can I?

All frequencies below and including 50 Hz are all good, but anything above (progressive) seems to lose the settings and my colors, black level etc. etc. are all messed up and I can't adjust it.

nevcairiel
13th September 2011, 21:30
All decoders output interlaced video as weave, it's very rare that fields are delivered separately.

Interlaced output from pcs is nothing I would ever recommend.

TheShadowRunner
13th September 2011, 22:51
Nev, thanks for the new build, but Legacy 0.13 cannot be registered (having to use nv266.58/lavcuvid.dll 266.58 because of this retarded bug (http://forums.nvidia.com/index.php?showtopic=197425&st=0)): "procedure specified could not be found."
After checking, both non-legacy and legacy LAVCUVID.ax are identical (same MD5)..
Please repack ;)

Andy o
14th September 2011, 00:03
With 29i output, the whole PC output is usually re-interlaced after being deinterlaced by the decoder or renderer, in my experience. I think an exception may be Media Center and probably other players that use EVR. With madVR it definitely is re-interlaced.


Interlaced output from pcs is nothing I would ever recommend.
Is there a reason? I've been using 29i on my PC along with 23p and 50p for a couple of years now. I don't use 59p except for games.

On the same subject, would it be possible for CUVID to have an option to output 23/24p for IVTC'd video? Then I could follow your advice and not output 29i :)

robpdotcom
14th September 2011, 01:33
On the same subject, would it be possible for CUVID to have an option to output 23/24p for IVTC'd video? Then I could follow your advice and not output 29i :)

Isn't that what it does when you select "film" in the configuration screen?

Andy o
14th September 2011, 02:12
No, it outputs 30p as it says. My guess is that it does IVTC, but then duplicates every 4th frame. I have to either apply a decimation script after that, or let my PC re-interlace to 30i and let my display do auto IVTC and display at 72Hz (that's the main reason why I'm outputting 29i).

Andy o
14th September 2011, 02:27
Hmm apparently NV has no way to disable deinterlacing e.g. in Media Center (which has to be done through the driver control panel), so it might be that with NV content is always deinterlaced before being re-interlaced for final output.

nevcairiel
14th September 2011, 06:02
Nev, thanks for the new build, but Legacy 0.13 cannot be registered (having to use nv266.58/lavcuvid.dll 266.58 because of this retarded bug (http://forums.nvidia.com/index.php?showtopic=197425&st=0)): "procedure specified could not be found."
After checking, both non-legacy and legacy LAVCUVID.ax are identical (same MD5)..
Please repack ;)

That's odd.
Anyhow, I repacked the build and replaced the file, should hopefully work now. I checked the dll with depends, and it doesn't use any of the v2 functions this time around.

nevcairiel
14th September 2011, 06:59
No, it outputs 30p as it says. My guess is that it does IVTC, but then duplicates every 4th frame. I have to either apply a decimation script after that, or let my PC re-interlace to 30i and let my display do auto IVTC and display at 72Hz (that's the main reason why I'm outputting 29i).

It doesn't "duplicate" frames. The IVTC process works like that, it results in 30 progressive frames. You need to apply a decimation step afterwards - however figuring out which frame is the duplicate is non-trivial.
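
The arithmetic behind the 30 frames can be sketched in a few lines of Python. This is an idealised model with made-up helper names (`telecine`, `field_match`): field parity is ignored, and "field matching" simply takes the film frame that supplied each top field, which is far simpler than what a real decoder does.

```python
def telecine(film_frames):
    """2:3 pulldown: each film frame contributes 2 or 3 fields, alternating."""
    fields = []
    for i, frame in enumerate(film_frames):
        fields.extend([frame] * (2 if i % 2 == 0 else 3))
    # Pair consecutive fields into interlaced video frames (top, bottom).
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


def field_match(video_frames):
    """Idealised matching: rebuild each frame from the source of its top field."""
    return [top for top, _bottom in video_frames]


film = list(range(24))             # one second of film, frames numbered 0..23
video = telecine(film)             # one second of telecined video
matched = field_match(video)       # progressive again, but still 30 per second

print(video[:5])
print(len(video), len(matched), len(set(matched)))
```

Four film frames A, B, C, D become the five video frames AA, BB, BC, CD, DD. After matching there are still five frames per four film frames, so every cycle of five carries one repeat, i.e. 6 of the 30 frames per second.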

Andy o
14th September 2011, 07:10
There are 24 original (reconstructed) frames in that 30fps, so 6 must be duplicates, no?

Xaurus
14th September 2011, 10:02
Okay, I am getting tired of trying to make a 1080i 29.97 BD rip playback without judder when there is panning. Fairly terrible judder, I might add.

I thought my setup was OK with interlaced content but this is fairly high bitrate stuff.

Display: 29.97012 Hz
Framerate: 29.970

0 dropped frames
0 delayed frames

LAV CUVID set to film mode. Adaptive deinterlacing. High quality DXVA processing. YV12 output. Auto field order.


ID : 1 (0x1)
Complete name : xxxxxxxxxxxxxxx_1080i.ts
Format : MPEG-TS
File size : 11.4 GiB
Duration : 56mn 37s
Overall bit rate : 28.9 Mbps
Maximum Overall bit rate : 35.5 Mbps

Video
ID : 4113 (0x1011)
Menu ID : 1 (0x1)
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L4.1
Format settings, CABAC : Yes
Format settings, ReFrames : 2 frames
Codec ID : 27
Duration : 56mn 36s
Bit rate mode : Variable
Bit rate : 25.9 Mbps
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate : 29.970 fps
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Interlaced
Scan order : Top Field First
Bits/(Pixel*Frame) : 0.416
Stream size : 10.2 GiB (90%)

nevcairiel
14th September 2011, 10:46
It's probably not interlaced, but telecined - which means you would see the 3:2 judder.

Does the judder persist if you play it with a software decoder and EVR?
Which BD disc is it?

TheShadowRunner
14th September 2011, 11:39
Nev, thanks a lot, that fixed it.
I'm also happy to report that the workaround for madVR's "bug that hangs onto a reference to the video decoder" is working here. (previously ZP was sometimes giving me graph creation errors when opening a new video while one was already playing, no more with 0.13 ^^)

e-t172
14th September 2011, 14:36
It doesn't "duplicate" frames. The IVTC process works like that, it results in 30 progressive frames. You need to apply a decimation step afterwards - however figuring out which frame is the duplicate is non-trivial.

It is trivial. What's not trivial is the IVTC itself: figuring out the position of the telecine pattern inside the current batch of frames. Once you know where the pattern is, you can IVTC the first frame of the pattern and throw away the next one.

nevcairiel
14th September 2011, 14:40
It is trivial. What's not trivial is the IVTC itself: figuring out the position of the telecine pattern inside the current batch of frames. Once you know where the pattern is, you can IVTC the first frame of the pattern and throw away the next one.

Since the NVIDIA decoder already performs the IVTC itself, all that remains is "finding" the duplicate frame and removing it - which is the same as finding the telecine pattern. It's not static, the pattern may change over time, you need motion in the first place to "lock" on it, and all the fun involved!
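
A minimal Python sketch of such a decimation step, assuming frames are plain lists of pixel values and using a sum of absolute differences as the similarity measure. The names `frame_diff` and `decimate` are hypothetical, and this is not how any existing filter is implemented. It re-evaluates every cycle of five, so a shifted pattern is still handled, but it also shows the weakness: in a static scene all differences are near zero and the choice is arbitrary.

```python
def frame_diff(a, b):
    """Sum of absolute pixel differences between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b))


def decimate(frames, cycle=5):
    """Drop the frame most similar to its predecessor in every cycle."""
    out = []
    for start in range(0, len(frames), cycle):
        group = frames[start:start + cycle]
        if len(group) < cycle:
            out.extend(group)          # incomplete tail: keep as-is
            break
        diffs = []
        for i, frame in enumerate(group):
            index = start + i
            if index == 0:
                diffs.append(float("inf"))   # very first frame has no predecessor
            else:
                diffs.append(frame_diff(frame, frames[index - 1]))
        drop = diffs.index(min(diffs))       # most likely the duplicate
        out.extend(f for i, f in enumerate(group) if i != drop)
    return out


def flat(value):
    """Tiny 4-pixel test frame filled with one value."""
    return [value] * 4


# First cycle repeats its 2nd frame, second cycle repeats its 1st (pattern shift).
clip = [flat(v) for v in (0, 10, 10, 20, 30, 40, 40, 50, 60, 70)]
print([f[0] for f in decimate(clip)])
```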

An alternative would be to let NVIDIA weave it and manually do the full IVTC process.

Anyhow, I will probably not work on IVTC in LAV CUVID, I'd rather do it in LAV Video, and eventually merge the two.

Xaurus
14th September 2011, 16:48
Its probably not interlaced, but telecined - which means you would see the 3:2 judder.

Does the judder persist if you play it with a software decoder and EVR?
Which BD disc is it?
Hi nev,

The judder persists with the combo of ffdshow video + EVR. It's the same frequency of the judder as with CUVID + madvr.

It is this disc:
http://www.blu-ray.com/movies/The-Pillars-of-the-Earth-Blu-ray/14546/

nevcairiel
14th September 2011, 16:51
Since that looks like TV content pressed onto Blu-ray, it's quite likely that it's a telecined and not an interlaced disc. A full IVTC process is not supported by any H264 decoder yet that I know of. The closest you can get is using ffdshow with the experimental IVTC patches from e-t172.

Xaurus
14th September 2011, 17:38
Since that looks like TV content pressed onto Blu-ray, its quite likely that its a telecined and not an interlaced disc. A full IVTC process is not supported by any H264 decoder yet that i know of. The closest you can get is using ffdshow with the experimental IVTC patches from e-t172
Thanks, nev. What does the "use Inverse Telecine" option in "adjust video image settings" in the Nvidia control panel really do, if it's not helping in cases like this?

I guess one would have to simply use the Blu Ray player to watch these kinds of shows then. I assume they have a built in IVTC function or something? I've never seen a 1080i Blu-Ray disc until this one.

I've read up on telecine on wikipedia:

It is also possible, but more difficult, to perform reverse telecine without prior knowledge of where each field of video lies in the 2-3 pulldown pattern. This is the task faced by most consumer equipment such as line doublers and personal video recorders. Ideally, only a single field needs to be identified, the rest following the pattern in lock-step. However, the 2-3 pulldown pattern does not necessarily remain consistent throughout an entire program. Edits performed on film material after it undergoes 2-3 pulldown can introduce “jumps” in the pattern if care is not taken to preserve the original frame sequence (this often happens during the editing of television shows and commercials in NTSC format). Most reverse telecine algorithms attempt to follow the 2-3 pattern using image analysis techniques, e.g. by searching for repeated fields.

Algorithms that perform 2-3 pulldown removal also usually perform the task of deinterlacing. It is possible to algorithmically determine whether video contains a 2-3 pulldown pattern or not, and selectively do either reverse telecine (in the case of film-sourced video) or deinterlacing (in the case of native video sources).

Not sure I got any wiser though, because this is heavy stuff.

nevcairiel
14th September 2011, 17:47
The option controls whether the driver performs IVTC. But it only searches for the sequence and restores the proper frames - however it does not remove the duplicate frame - which is why you end up with 30 frames per second, and not 24.

Andy o
14th September 2011, 18:51
Ah, I'm decimating without any alternative versions of ffdshow, with the TIVTC script (http://forum.doom9.org/showthread.php?t=82264). Took me a while to find the right script (or at least the one that didn't confuse the hell out of me with its settings).

Xaurus, see if this works. Set CUVID to YV12 and 30p output (film). Install avisynth, put the TIVTC dll in the plugins folder, and do this in the avisynth section of ffdshow's raw filter:
http://photos.smugmug.com/photos/i-Ps5BZMx/0/O/i-Ps5BZMx.png

My main problems with AMD 5770 was getting IVTC done in the first place, decimating with this was trivial. I tried IVTC'ing with the DScaler decoder, and it worked most of the time, but with glitches. Hardware IVTC is working flawlessly.

nevcairiel
14th September 2011, 18:53
I thought someone said that ffdshow's AviSynth didn't support changing the framerate properly.

Andy o
14th September 2011, 19:14
When I use that, ReClock reports 23.976, if that's what you mean. madVR doesn't drop any frames then. Just double checked with h.264 interlaced mt2s from the Dolby bluray and it's working great.

Might be a slight problem if you're using madVR's refresh rate switch, but just adding a "23p" or "24p" to the file name fixes it.

BeNooL
15th September 2011, 10:54
Interlaced output from pcs is nothing I would ever recommend.
I'd add that it depends on what the PC is connected to.

If it's a computer LCD screen then yes, you're better off deinterlacing in the PC.

In my case, my PC is connected to a DVDO Duo scaler and I want it to do the deinterlacing. I'm just trying to output a signal that is as close as possible to the source.

nevcairiel
15th September 2011, 11:01
Even then i would not recommend it.
Internally, a PC will always handle progressive signals, which means it'll weave the image. If it performs any image processing on this weaved image, it will break the original fields and make perfect deinterlacing much harder.
In addition to that, information like Top Field First is lost as well. PCs convert typically to RGB, which means chroma expansion on a still interlaced image, most likely without knowing that it was weaved from interlaced...
After that, it has to take the weaved image and try to interlace it again ... IMHO, it's a perfect formula for chaos. :)

PCs are not meant for interlaced signals, never were and never will be.
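
The field-mixing problem can be shown with a toy Python sketch. Frames are lists of rows, and a 3-tap vertical average stands in for any progressive-style processing (scaling, chroma upsampling) applied to a weaved image; the function names are made up for the example.

```python
def weave(top, bottom):
    """Interleave two fields into one frame: top rows on even lines."""
    frame = []
    for t, b in zip(top, bottom):
        frame.append(t)
        frame.append(b)
    return frame


def split_fields(frame):
    """Undo weave: even lines are the top field, odd lines the bottom field."""
    return frame[0::2], frame[1::2]


def vertical_blur(frame):
    """3-tap vertical average with clamped edges, unaware of interlacing."""
    last = len(frame) - 1
    out = []
    for y, row in enumerate(frame):
        above = frame[max(y - 1, 0)]
        below = frame[min(y + 1, last)]
        out.append([(a + 2 * c + b) / 4 for a, c, b in zip(above, row, below)])
    return out


# Two fields from different moments in time: a bright one and a dark one.
top = [[100], [100], [100]]
bottom = [[0], [0], [0]]

frame = weave(top, bottom)
print(split_fields(frame) == (top, bottom))       # untouched weave is lossless
print(split_fields(vertical_blur(frame))[0])      # top field after processing
```

Weaving alone is reversible, but once any vertical filter has touched the weaved frame, each "field" pulled back out contains a blend of two moments in time, which is what makes clean deinterlacing downstream so much harder.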

nm
15th September 2011, 12:04
Even then i would not recommend it.
In addition to that, information like Top Field First is lost as well.

Maintaining proper field order within the video rendering process is the biggest issue. The player/renderer gets no information about the currently displayed field (whether it's top or bottom) and a VBlank interrupt is received for each field. Therefore the standard way of drawing the entire weaved frame at once doesn't work. Any missed VBlank or framedrop may reverse field order causing the "terrible judder" issue that Xaurus described.

This field sync problem could be avoided by only drawing one field per VBlank, always keeping the two most recent fields in the framebuffer. This needs to be specially implemented in a renderer though, and I don't know any freely available software solution that supports it currently.
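
A toy Python model of both approaches, with loud assumptions: fields are identified only by their capture time (frame n holds fields 2n and 2n+1, top field first), the display scans top lines on even VBlanks and bottom lines on odd ones, and a "slip" means the renderer misses exactly one VBlank. The function names are hypothetical and nothing here talks to a real driver.

```python
def weaved_frames(num_frames, slip_at=None):
    """Capture times of the fields shown when whole weaved frames are drawn.

    slip_at (>= 1) makes the renderer miss one VBlank before that frame.
    """
    shown, vblank = [], 0
    for n in range(num_frames):
        if slip_at is not None and n == slip_at:
            # The previous frame stays up for one extra field period.
            shown.append(2 * (n - 1) + vblank % 2)
            vblank += 1
        for _ in range(2):
            # The display picks the lines matching its own parity.
            shown.append(2 * n + vblank % 2)
            vblank += 1
    return shown


def one_field_per_vblank(num_vblanks, phase):
    """Capture times shown when one new field is written per VBlank.

    The framebuffer starts out holding fields 0 and 1. At VBlank v the
    renderer has just written field v into the lines of its own parity.
    phase=1 means the display scans the opposite parity (out of step).
    """
    shown = []
    for v in range(1, num_vblanks):
        display_parity = (v + phase) % 2
        shown.append(v if v % 2 == display_parity else v - 1)
    return shown


def in_order(times):
    """True if the displayed fields never go backwards in time."""
    return all(a <= b for a, b in zip(times, times[1:]))


print(weaved_frames(4))
print(weaved_frames(4, slip_at=2))
print(one_field_per_vblank(6, phase=0), one_field_per_vblank(6, phase=1))
```

In this model a single slip leaves the weaved-frame renderer showing every field pair in reversed order from then on, while the field-per-VBlank approach is at worst one field late but never out of order.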

Discussion: http://forum.xbmc.org/showthread.php?t=81834

italospain
15th September 2011, 12:50
Maintaining proper field order within the video rendering process is the biggest issue. The player/renderer gets no information about the currently displayed field (whether it's top or bottom) and a VBlank interrupt is received for each field. Therefore the standard way of drawing the entire weaved frame at once doesn't work. Any missed VBlank or framedrop may reverse field order causing the "terrible judder" issue that Xaurus described.

This field sync problem could be avoided by only drawing one field per VBlank, always keeping the two most recent fields in the framebuffer. This needs to be specially implemented in a renderer though, and I don't know any freely available software solution that supports it currently.

Discussion: http://forum.xbmc.org/showthread.php?t=81834

Hi nm here is link to a linux solution

short english explanation:

http://lists.x.org/archives/xorg/2008-September/038296.html

original german thread

http://www.vdr-portal.de/board17-developer/board25-patches/78480-patch-rgb-pal-ueber-vga-mit-variabler-framerate/

It offers a driver patch to synchronize interlaced frames for an interlaced resolution display so there is no need to deinterlace at all.

I would love to see a Windows Solution

BeNooL
15th September 2011, 14:33
Even then i would not recommend it.
Internally, a PC will always handle progressive signals, which means it'll weave the image. If it performs any image processing on this weaved image, it will break the original fields and make perfect deinterlacing much harder.
In addition to that, information like Top Field First is lost as well. PCs convert typically to RGB, which means chroma expansion on a still interlaced image, most likely without knowing that it was weaved from interlaced...
After that, it has to take the weaved image and try to interlace it again ... IMHO, its a perfect formula for chaos. :)

PCs are not meant for interlaced signals, never were and never will be.

What happens in the case of an interlaced IPTV recording being played with madVR (with its internal decoding) and outputting straight to 1080i over HDMI?

Visually the result is über smooth and has excellent picture quality. Way better than using YAdif or CUVID (with an 8600GTS that does horrible edge interpolation, producing jiggling borders)

nm
15th September 2011, 15:06
Hi nm here is link to a linux solution

short english explanation:

http://lists.x.org/archives/xorg/2008-September/038296.html

Yep, vga-sync-fields works by slightly altering display refresh rate to avoid dropping any fields or frames for sync purposes during playback. This is especially useful with live TV playback where a broadcast signal is the timing source and it can't be slowed down or sped up. Unfortunately the patch only works with certain old Radeons and some Intel GPUs, and both the display driver and the player software need to be tweaked.

On Windows, you could try using ReClock to better maintain the field order when playing back files or recordings, but you'll still need to restart playback every now and then to fix the field order.

What happens in the case of an interlaced IPTV recording being played with madVR (with its internal decoding) and outputting straight to 1080i over HDMI?

Visually the result is über smooth and has excellent picture quality. Way better than using YAdif or CUVID (with a 8600GTS that does horrible edge interpolation production jiggling borders)

Hmm. If the Nvidia Windows driver only notifies about VBlank after one of the two fields, it might be possible to get stable field order with ordinary software. When I last played with these things, that wasn't the case with any of the cards I tried.

ney2x
16th September 2011, 14:29
@nev
I will be buying a used z68 motherboard and i7 2600k cpu next week. From what I had read with the new generations of chipsets and processors, they have built-in GPU. I have GTX 560 Ti GPU. My question is, where will I connect my display then? And what is the best decoder for me, LAV CUVID or LAV Video? Thanks in advance.