madVR - high quality video renderer (GPU assisted) - Page 44

iSunrise · 4th May 2009, 19:40

@madshi:
I've just tested v0.9 thoroughly and I must say I'm quite impressed with the lower GPU rendering times, comparing the numbers both with _and_ without "update textures". CPU usage also seems to be lower, which is a great step forward. This is on a very fast Core i7 and a GTX260-216 with clocks above a GTX285's, using ZoomPlayer with Luma set to Spline36 and Chroma set to SoftCubic100.

However, and not quite to my liking, v0.9 made one thing very obvious to me, which is good in a way, since you can hopefully reproduce and fix it.

Now, being more specific:

There is at least one very noticeable problem if you use CoreAVC with CUDA decoding, which will partly or completely go away if you either uncheck CUDA-decoding or use ffdshow as a decoder (ffmpeg-mt selected).

Here's a small list of things I came across. Each decoder is listed separately so you can check them step by step. I've reproduced each step a dozen times just to be sure.

Movie sample (official trailer):
wolverine-tlra_h1080p.mov

Now the decoders (madVR v0.9 as renderer):

CoreAVC with CUDA-decoding:
1) Avrg gpu rendering time is noticeably higher (*1), and max gpu rendering time/update textures goes through the roof (3-8 times higher, depending on madVR settings *1)
2) Display estimate 3 very often resets to 0.00000Hz and stays there for several seconds while the movie is playing or paused and Display will show [1s]

CoreAVC without CUDA-decoding:
1) Max gpu rendering time/update textures looks fine
2) Display estimate 3 often resets to 0.00000Hz and stays there for several seconds while the movie is playing or paused and Display will show [1s]

ffdshow:
1) Max gpu rendering time/update textures looks fine
2) Display estimate 3 occasionally resets to 0.00000Hz and Display will show [1s]

*1: Compared to CoreAVC without CUDA-decoding and ffdshow

It looks like 1) is a result of both decoder and renderer using the GPU extensively and/or it's related to your new video->GPU uploading method. The higher avrg rendering times and the drastically higher max rendering times just don't make any sense to me. If I choose higher settings (like Lanczos8 on both Luma and Chroma), my max gpu rendering times are sometimes higher than the movie's frame interval. If I'm only using software/CPU decoding, the max gpu rendering times/updating textures are 1/5 of that, so they never reach the frame interval, regardless of the settings I choose.

I hope you can look into this. Thanks.

Finally, here are the 3 shots (coreavc+cuda/coreavc-nocuda/ffmpeg-mt):
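For reference, the frame interval iSunrise compares against is simply 1000 / fps milliseconds. A minimal sketch, assuming the trailer runs at 23.976 fps (24000/1001, which the post does not state) and that the OSD rendering times are in milliseconds; the function names are illustrative, not madVR's:

```python
from typing import List, Tuple


def frame_interval_ms(fps: float) -> float:
    """Time available to render one frame, in milliseconds."""
    return 1000.0 / fps


def render_budget_report(render_times_ms: List[float], fps: float) -> Tuple[float, float, int]:
    """Return (average, maximum, number of frames over the frame interval)."""
    interval = frame_interval_ms(fps)
    average = sum(render_times_ms) / len(render_times_ms)
    worst = max(render_times_ms)
    over_budget = sum(1 for t in render_times_ms if t > interval)
    return average, worst, over_budget


print(round(frame_interval_ms(24000 / 1001), 2))  # 41.71
```

A single spike past the interval inflates the maximum but barely moves the average, which is why the two numbers can tell quite different stories.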

kostik · 4th May 2009, 20:27

What are the best chroma and luma resampling settings for h264, to get the best quality?

6233638 · 4th May 2009, 21:23

Quote:

Originally Posted by kostik

what are the best chroma and luma resampling settings? for h264?
to get the best quality?

I found Mitchell-Netravali to be the best compromise between sharpness and ringing for luma, and SoftCubic 50 to be best for chroma. (100 blurs it too much)

Egh · 5th May 2009, 03:37

OK, now more comparisons of scaling algorithms, this time for anime content.

I have this rather good R2 DVD, so the video is unaltered in any way apart from decoding and rendering by mVR.

The scaling specified is for luma only; chroma is SoftCubic50 in every screenshot.

Bilinear:

C-R:

Lanc4:

Lanc8:

Splin64:

I haven't taken bicubics and softcubics as they were rather soft anyway.

In order to compare them, you need to download these PNGs and open them in a viewer so that they are shown in exactly the same position on screen (I use ACDSee and scroll with the mouse wheel, so I can quickly switch between the pictures).

I really find it hard to describe the difference between the bilinear and C-R methods, so for the purposes of upscaling C-R is bad, imo. I haven't saved Mitchell, unfortunately (and it's hard to get mVR to exactly the same frame ;P). It wasn't bad, but somewhere between bilinear and lanc, still too soft for such a task.

My personal choice is Spline64, as I've been using splines for quite a while, even with ffdshow resize. Interesting that the difference between lanc8 and spl64 is very subtle, although the methods differ considerably (?). Edges are a bit softer with spl64, but overall sharpness seems good for all of them. Basically, normal unaltered DVD anime content (read: DVD content, not video damaged by 95% of the encoders in the wild ;P) can be quite watchable when upscaled to 720p with either lanc8 or spl64. Of course this only holds for large regions with high-contrast edges; smaller details like text look blurry either way (as expected, that's what we need HD for).

6233638 · 5th May 2009, 04:32

From that comparison, I would say:

Bilinear is too soft, and suffers from aliasing.
Lanczos8 introduces too many artefacts.
Spline64 is very similar to Lanczos4 but with marginally less ringing.
Spline64 is sharper than Catmull-Rom, at the expense of introducing more ringing into the picture.

Personally, from those examples, I would choose Catmull-Rom, as I'm quite averse to ringing. I'd like to see how Mitchell-Netravali compares, as it seemed to be the best in the testing I did, but that was filmed content rather than animation. It would also be good to see an unscaled image to get an idea of how sharp those lines should be.
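Bilinear, Catmull-Rom and Mitchell-Netravali can be compared directly through their 1-D kernels. A minimal sketch using the standard Mitchell-Netravali (B, C) cubic family, where Catmull-Rom is B=0, C=0.5 and Mitchell-Netravali is B=C=1/3; that madVR uses exactly these parameters is an assumption:

```python
def bc_cubic(x: float, b: float, c: float) -> float:
    """Mitchell-Netravali (B, C) cubic kernel, nonzero for |x| < 2."""
    ax = abs(x)
    if ax < 1.0:
        return ((12 - 9 * b - 6 * c) * ax**3 + (-18 + 12 * b + 6 * c) * ax**2 + (6 - 2 * b)) / 6
    if ax < 2.0:
        return ((-b - 6 * c) * ax**3 + (6 * b + 30 * c) * ax**2
                + (-12 * b - 48 * c) * ax + (8 * b + 24 * c)) / 6
    return 0.0


def catmull_rom(x: float) -> float:
    return bc_cubic(x, 0.0, 0.5)


def mitchell(x: float) -> float:
    return bc_cubic(x, 1 / 3, 1 / 3)


def bilinear(x: float) -> float:
    """Triangle kernel: no negative lobe, hence no ringing, but soft."""
    return max(0.0, 1.0 - abs(x))


# Depth of the negative lobe, which is what shows up as a dark halo (ringing) at edges
print(round(catmull_rom(1.5), 4), round(mitchell(1.5), 4))  # -0.0625 -0.0347
```

Catmull-Rom passes exactly through the source samples (weight 1 at x=0) and has the deeper negative lobe; Mitchell's weight at x=0 is below 1, so it trades a little sharpness for less ringing.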

Thunderbolt8 · 5th May 2009, 10:11

madshi, where is the focus when downscaling (with default options): on keeping the colours as close to the original as possible, or (also) on sharpness? I'm asking because I'm wondering whether I now need a different sharpness level in ffdshow than I used with the Haali renderer when I scale down from 1080p to 720p. I have the feeling that madVR already displays the picture a little sharper than Haali; is this correct?

nlnl · 5th May 2009, 10:41

Quote:

Originally Posted by 6233638

I found Mitchell-Netravali to be the best compromise between sharpness and ringing for luma, and SoftCubic 50 to be best for chroma. (100 blurs it too much)

And what were your input source (resolution) and display resolution? Did you resize up or down?

6233638 · 5th May 2009, 11:09

Quote:

Originally Posted by nlnl

And what were your input source (resolution) and display resolution? Did you resize up or down?

I was using Blu-ray at double size just for testing (as Blu-ray tends to have better quality video than DVD, with fewer artefacts).

madshi · 5th May 2009, 11:28

Quote:

Originally Posted by Hypernova

Here is the same frame (from 0.8, will install 0.9 right after). [...] Added Haali renderer shots. Now it's hard to see the difference, but it's still there.

From what I can see, both EVR and Haali are slightly sharper than your madVR screenshot. Might that make a difference in banding, too? Which madVR upscaling filter are you using? You may want to try a sharper one (e.g. Lanczos) to see whether that increases banding or not.

Quote:

Originally Posted by TinTime

With all other renderers (and software decoding) it connects using YUY2. Perhaps it's a HD YV12 connection that causes the problem?

That's quite possible. After all, the MPC HC decoders also worked fine with YUY2 and had a bug when forced to output YV12.
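For reference, the two connection formats differ in layout: YUY2 is packed 4:2:2 (chroma halved horizontally only), while YV12 is planar 4:2:0, stored as a full-size Y plane followed by V and then U planes at half width and half height. A minimal sketch, assuming 8-bit samples, even dimensions and a chroma stride of half the luma stride; the function names are illustrative:

```python
from typing import Dict, Optional, Tuple


def yuy2_frame_bytes(width: int, height: int) -> int:
    """Packed 4:2:2: each pixel pair is stored as Y0 U Y1 V, i.e. 2 bytes per pixel."""
    return width * height * 2


def yv12_layout(width: int, height: int, stride: Optional[int] = None) -> Dict[str, Tuple[int, int]]:
    """(offset, size) in bytes of each plane of an 8-bit YV12 buffer.

    YV12 order is Y, then V (Cr), then U (Cb); each chroma plane is
    half width and half height, so it holds a quarter of the luma samples.
    """
    if stride is None:
        stride = width  # no row padding
    y_size = stride * height
    chroma_stride = stride // 2
    chroma_size = chroma_stride * (height // 2)
    return {
        "Y": (0, y_size),
        "V": (y_size, chroma_size),
        "U": (y_size + chroma_size, chroma_size),
    }


print(yv12_layout(1920, 1080))  # {'Y': (0, 2073600), 'V': (2073600, 518400), 'U': (2592000, 518400)}
print(yuy2_frame_bytes(1920, 1080))  # 4147200
```

If a decoder pads the picture (a larger stride, or for example a coded height of 1088 instead of 1080), the V and U offsets move, so the producer and the consumer of the buffer must agree on these values.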

Quote:

Originally Posted by nlnl

Does uploading textures mean updating textures?

So for MadVR 0.7 and 0.9:
avr gpu 0.9 - upd text 0.9 <> (?) avr gpu 0.7 ?

Yes, upload=update. Not sure what you mean with your second question.

Quote:

Originally Posted by tetsuo55

Bugreport: Settings are not saved when madVR is run from a protected directory.

[...]

To work better in the future, madVR should be limited-user aware and should ask for admin rights when saving settings.

This is a problematic situation. Asking for admin rights in order to save settings would be a bad user experience IMHO. Of course I could do what Microsoft suggests, namely store the settings in the user profile directory tree. But actually I hate this logic because it scatters files all over the hard disk. So I'm not really sure how to handle this.

Any opinions/suggestions?

Quote:

Originally Posted by iSunrise

CoreAVC with CUDA-decoding:
1) Avrg gpu rendering time is noticeably higher (*1), and max gpu rendering time/update textures goes through the roof (3-8 times higher, depending on madVR settings *1)

Interesting numbers, thanks. It is probably to be expected that average rendering numbers are a bit higher when CUDA is used, because obviously with both CUDA + madVR active the GPU will be busier than with just one of them active. I don't think I can do anything about that.

The max gpu rendering times look really bad, but as I've already said multiple times, in the long run the max gpu rendering times are not very important.

Quote:

Originally Posted by 6233638

I found Mitchell-Netravali to be the best compromise between sharpness and ringing for luma, and SoftCubic 50 to be best for chroma. (100 blurs it too much)

Quote:

Originally Posted by Egh

I really find it hard to describe the difference between the bilinear and C-R methods, so for the purposes of upscaling C-R is bad, imo. I haven't saved Mitchell, unfortunately (and it's hard to get mVR to exactly the same frame ;P). It wasn't bad, but somewhere between bilinear and lanc, still too soft for such a task.

My personal choice is Spline64

That goes to show that luma scaling algorithms are really a matter of taste. Most of the algorithms have some advantages and some disadvantages. Personally, I like Lanczos4 for its sharpness, but (obviously) I dislike the ringing. I like Mitchell, but actually I like SoftCubic50 even more, which I find very similar to Mitchell, but with less aliasing. So sometimes I'm using Lanczos4 and sometimes SoftCubic50.

For chroma I'm using SoftCubic100. Interesting that both of you guys prefer SoftCubic50 for chroma.

Quote:

Originally Posted by Egh

Interesting that the difference between lanc8 and spl64 is very subtle, although the methods differ considerably (?)

Lanczos and Spline produce very similar results, I think. However, you should compare with the same number of taps. Lanczos3 and Spline36 use 3 taps. Lanczos4 and Spline64 use 4 taps. Lanczos8 uses 8 taps. Personally, I can see some differences between 3 and 4 taps. But I don't really see much of a difference between 4 and 8 taps. Except that 8 taps rings quite a bit more...
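What "taps" means in practice: for an output pixel that falls between source pixels, an N-tap Lanczos filter weights N source pixels on each side, i.e. 2N in one dimension (the Spline16/36/64 names count the 2-D footprint: 4x4, 6x6 and 8x8 samples). A minimal 1-D sketch, assuming the usual windowed-sinc definition; the helper names are illustrative:

```python
import math
from typing import List


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    px = math.pi * x
    return math.sin(px) / px


def lanczos(x: float, taps: int) -> float:
    """Lanczos kernel with `taps` lobes on each side of the centre."""
    if abs(x) >= taps:
        return 0.0
    return sinc(x) * sinc(x / taps)


def lanczos_weights(frac: float, taps: int) -> List[float]:
    """Normalized weights for an output sample lying `frac` (0 <= frac < 1) past a source pixel.

    Source pixels at offsets -taps+1 .. taps contribute, so 2 * taps weights are returned.
    """
    raw = [lanczos(frac - k, taps) for k in range(-taps + 1, taps + 1)]
    total = sum(raw)
    return [w / total for w in raw]


for taps in (3, 4, 8):
    w = lanczos_weights(0.5, taps)
    # number of source pixels used, and the most negative weight (the ringing lobe)
    print(taps, len(w), round(min(w), 4))
```

The negative weights are what produce ringing; more taps add more (smaller) lobes further from the edge.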

Quote:

Originally Posted by Thunderbolt8

where is the focus when downscaling (with default options): on keeping the colours as close to the original as possible, or (also) on sharpness? I'm asking because I'm wondering whether I now need a different sharpness level in ffdshow than I used with the Haali renderer when I scale down from 1080p to 720p. I have the feeling that madVR already displays the picture a little sharper than Haali; is this correct?

madVR does not tamper with the colors, regardless of which downscaling filter you're using. madVR also does not apply any sharpness algorithm. I don't know if madVR is sharper than Haali, that probably also depends on which downscaling filter you're using. If you find madVR generally sharper than Haali, then Haali probably does something wrong.

nijiko · 5th May 2009, 14:43

>>madshi

Excuse me, madshi. Do you know about the problem with the NVidia Video Decoder on HD clips?

mark0077 · 5th May 2009, 14:58

madshi, regarding saving renderer settings, could you save to the registry as another option?

Egh · 5th May 2009, 15:01

Quote:

Originally Posted by Madshi

That goes to show that luma scaling algorithms are really a matter of taste. Most of the algorithms have some advantages and some disadvantages. Personally, I like Lanczos4 for its sharpness, but (obviously) I dislike the ringing. I like Mitchell, but actually I like SoftCubic50 even more, which I find very similar to Mitchell, but with less aliasing. So sometimes I'm using Lanczos4 and sometimes SoftCubic50.

For chroma I'm using SoftCubic100. Interesting that both of you guys prefer SoftCubic50 for chroma.

I haven't tested chroma rescalers yet. However, I don't think it is really important, as chroma originally has only a quarter of the information of luma. Besides, anime specifics such as gradients and solid regions also need to be taken into account. So I basically left it at SoftCubic50. Is it sharper than 100?

As for the splines, even though they are different, they are still remarkably similar. It would be interesting if somebody analyzed these pictures and posted enlarged crops showing the difference in ringing. I don't really see any ringing worth mentioning with spline64.

And is it possible to implement spline256, since there is lanc8, which should be its equivalent?

madshi · 5th May 2009, 15:15

Quote:

Originally Posted by nijiko

Do you know about the problem with the NVidia Video Decoder on HD clips?

I guess it's caused by a bug in the NVidia decoder due to being forced to output YV12, so I don't plan to further look into this right now.

Quote:

Originally Posted by mark0077

regarding saving renderer settings, could you save to the registry as another option?

Well, I could try to save to the ini first, and if that fails, store the settings in HKEY_CURRENT_USER. That should work...
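The fallback madshi describes (write the .ini first, and only if that fails use HKEY_CURRENT_USER) could be sketched as below. This is an illustration in Python, not madVR code: the section name, the registry key `Software\madVR-example` and the function names are hypothetical, and the registry writer relies on Python's Windows-only `winreg` module, so it is passed in as a parameter:

```python
import configparser
from typing import Callable, Dict

Settings = Dict[str, str]


def save_to_ini(path: str, settings: Settings) -> None:
    """Write settings to an .ini file; raises OSError if the location is not writable."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the original capitalization of setting names
    parser["settings"] = settings
    with open(path, "w", encoding="utf-8") as f:
        parser.write(f)


def save_to_hkcu(settings: Settings) -> None:
    """Windows only: store settings as string values under a hypothetical HKCU key."""
    import winreg
    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, r"Software\madVR-example") as key:
        for name, value in settings.items():
            winreg.SetValueEx(key, name, 0, winreg.REG_SZ, value)


def save_settings(settings: Settings, ini_path: str,
                  fallback: Callable[[Settings], None] = save_to_hkcu) -> str:
    """Try the .ini next to the filter first; use the per-user fallback if that fails."""
    try:
        save_to_ini(ini_path, settings)
        return "ini"
    except OSError:  # e.g. PermissionError in a protected Program Files directory
        fallback(settings)
        return "fallback"
```

Keeping the .ini as the first choice preserves the "everything next to the filter" behaviour for users who can write there, while limited users still get their settings saved.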

Quote:

Originally Posted by Egh

I haven't tested chroma rescalers yet. However, I don't think it is really important, as chroma originally has only a quarter of the information of luma. Besides, anime specifics such as gradients and solid regions also need to be taken into account. So I basically left it at SoftCubic50. Is it sharper than 100?

SoftCubic100 is *very* soft, but I like that best for chroma. Most of the time there's no difference visible. But if there is, to me SoftCubic100 looks better because it gets rid of any jaggies which SoftCubic50 doesn't always do. So SoftCubic100 is madVR's default setting for chroma resampling.

Quote:

Originally Posted by Egh

It would be interesting if somebody analyzed these pictures and posted enlarged crops showing the difference in ringing. I don't really see any ringing worth mentioning with spline64.

Look at the upper part of the nose (that one dark stroke between the eyes). There's a dark shadow left and right of that stroke.

Quote:

Originally Posted by Egh

And is it possible to implement spline256, since there is lanc8, which should be its equivalent?

No, that's not possible because I don't know the correct formula for spline256. For Lanczos the basic formula is always the same, regardless of how many taps you use. For spline the formula's coefficients are different, depending on the number of taps.
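The difference madshi describes is easy to see when the kernels are written out: Lanczos is one formula with the tap count as a parameter, while each spline size is its own piecewise cubic with its own coefficient table. A sketch, assuming the coefficients commonly published for Spline36 and Spline64 (as used by AviSynth's Spline36Resize/Spline64Resize); this is not madVR's source:

```python
import math
from typing import List, Tuple

# One row per unit interval of |x|: value = ((a*t + b)*t + c)*t + d, with t = |x| - row
Coeffs = List[Tuple[float, float, float, float]]

SPLINE36: Coeffs = [
    (13 / 11, -453 / 209, -3 / 209, 1.0),
    (-6 / 11, 270 / 209, -156 / 209, 0.0),
    (1 / 11, -45 / 209, 26 / 209, 0.0),
]

SPLINE64: Coeffs = [
    (49 / 41, -6387 / 2911, -3 / 2911, 1.0),
    (-24 / 41, 4032 / 2911, -2328 / 2911, 0.0),
    (6 / 41, -1008 / 2911, 582 / 2911, 0.0),
    (-1 / 41, 168 / 2911, -97 / 2911, 0.0),
]


def spline(x: float, table: Coeffs) -> float:
    """Piecewise-cubic spline kernel; the table length is the number of taps per side."""
    ax = abs(x)
    row = int(ax)
    if row >= len(table):
        return 0.0
    a, b, c, d = table[row]
    t = ax - row
    return ((a * t + b) * t + c) * t + d


def sinc(x: float) -> float:
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def lanczos(x: float, taps: int) -> float:
    """Same formula for any tap count: only the window width changes."""
    return sinc(x) * sinc(x / taps) if abs(x) < taps else 0.0


for x in (0.5, 1.5, 2.5):
    print(x, round(spline(x, SPLINE36), 4), round(lanczos(x, 3), 4))
```

A Spline256 would need an eight-row table of its own, and those coefficients are exactly the missing piece.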

Casshern · 5th May 2009, 15:52

Quote:

Originally Posted by madshi

SoftCubic100 is *very* soft, but I like that best for chroma. Most of the time there's no difference visible. But if there is, to me SoftCubic100 looks better because it gets rid of any jaggies which SoftCubic50 doesn't always do. So SoftCubic100 is madVR's default setting for chroma resampling.

Instead of upscaling luma and chroma separately, it's much better to use luma information to upscale chroma. This would use the higher-res luma information to more accurately reconstruct the low-res chroma information. The easiest way would be to use a delta in luma as weights for interpolating the chroma. As it is an obvious idea, there has probably already been some work done on it.
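One well-known way to act on this suggestion is joint (cross) bilateral upsampling: interpolate chroma as usual, but down-weight neighbouring chroma samples whose co-sited luma differs strongly from the luma of the output pixel. A minimal 1-D sketch for illustration only; the chroma siting (chroma sample i co-sited with luma sample 2*i), the normalized value ranges and the `sigma` parameter are assumptions, and this is not what madVR does:

```python
import math
from typing import List


def guided_chroma_upsample_1d(luma: List[float], chroma: List[float], sigma: float = 0.1) -> List[float]:
    """Upsample one row of half-resolution chroma to the full luma width.

    Each output mixes the two nearest chroma samples, weighted by distance AND by
    how similar the output pixel's luma is to the luma at each chroma sample.
    """
    out = []
    last = len(chroma) - 1
    for x, y in enumerate(luma):
        pos = x / 2  # position of this pixel on the chroma grid
        i0 = min(int(pos), last)
        i1 = min(i0 + 1, last)
        frac = min(max(pos - i0, 0.0), 1.0)
        weights = []
        for i, spatial in ((i0, 1.0 - frac), (i1, frac)):
            dy = y - luma[min(2 * i, len(luma) - 1)]  # luma difference to the chroma site
            weights.append(spatial * math.exp(-(dy * dy) / (2 * sigma * sigma)))
        total = weights[0] + weights[1]
        if total == 0.0:  # both weights underflowed: fall back to the nearest sample
            out.append(chroma[i0])
        else:
            out.append((weights[0] * chroma[i0] + weights[1] * chroma[i1]) / total)
    return out


print(guided_chroma_upsample_1d([0.5, 0.5, 0.5, 0.5], [0.0, 1.0]))  # [0.0, 0.5, 1.0, 1.0]
```

With uniform luma this reduces to plain linear interpolation; across a luma edge the output takes its chroma almost entirely from the sample on the same side of the edge. Note that it uses luma only to decide which chroma samples to trust; it does not scale saturation with brightness.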

nlnl · 5th May 2009, 16:07

Quote:

Originally Posted by nlnl

Nvidia IGP 9400 +512m, Intel 8300 2.83, Vista 32, AERO, ffdshow mt, MPC-HC 1065
madVR 0.7 reports:

luma - Lanczos3
chroma - softcubic100

1) Input - 1920x1080/23.976, video refresh rate (Nvidia control panel) - 23Hz.
Output - NO RESIZE, monitor resolution - 1920x1080

madVR 0.7 OSD says

avrg GPU rendering time - 25 (stable)

2) Input - 1920x1080/23.976, video refresh rate (Nvidia control panel) - 23Hz.
Output - RESIZE 1920x1080 --> 132,0,1766,919

madVR 0.7 OSD says:

avrg GPU rendering time - 45

CPU load
OSD OFF - 65-72%

And now for madVR 0.9:
luma - Lanczos3, chroma - softcubic100
1) NO RESIZE
avrg GPU rendering time - 26.5 - 6.5 (updating textures time) = 20 (stable)
-20% against 0.7

2) RESIZE
avrg GPU rendering time - 45.5 - 7.5 = 38
-15.5% against 0.7

CPU load
OSD OFF - 60%
-14% against 0.7

3) Scaling up 576 (PAL DVD) --> 1080
luma - Lanczos4, chroma - softcubic100
avrg GPU rendering time - 29.3
My IGP Nvidia 9400 can upscale PAL DVD to 1080p with sharp Lanczos4! Thank you madshi!

madshi · 5th May 2009, 16:13

Quote:

Originally Posted by Casshern

Instead of upscaling luma and chroma separately, it's much better to use luma information to upscale chroma.

You say that as if it was a fact. Have you seen this done? And have you seen that it's actually "much better"?

Quote:

Originally Posted by Casshern

This would use the higher-res luma information to more accurately reconstruct the low-res chroma information. The easiest way would be to use a delta in luma as weights for interpolating the chroma.

Doing it that way would mean making brighter pixels more saturated and darker pixels less saturated, right? I don't really see how that would be "more accurate". But I'm not an expert in this area. Is there any "scientific" reason for making brighter pixels more saturated than darker pixels?

As far as I can see, luma and chroma are independent and the luma value does not have any direct influence on chroma. So I don't really see how luma can help upsample chroma better. But then we're talking about gamma-corrected Y'CbCr and not linear-light YCbCr. And IIRC I've been told that there is a bit of luminance in Cb and Cr, too. Argh, this is complicated.

@yesgrey, your opinion?

Quote:

Originally Posted by Casshern

As it is an obvious idea, there has probably already been some work done on it.

There is an article which handles border cases where chroma is spread to neighbor pixels if luma is too dark or too bright to hold the upsampled chroma. I've yet to look into implementing a similar algorithm. But the article only handles such corner cases and does not *generally* reshuffle the chroma. Actually, the author of that article told me that a friend of his suggested using luma to form chroma better, but he was not convinced by his friend's efforts...
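To make "luma too dark or too bright to hold the chroma" concrete: it is a Y'CbCr triple whose R'G'B' equivalent falls outside [0, 1], so some channel would clip. A minimal sketch, assuming BT.709 coefficients and normalized full-range values (Y' in [0, 1], Cb/Cr in [-0.5, 0.5]), i.e. ignoring the 16-235 studio range; the article's actual redistribution algorithm is not described in the post, so this only covers the detection step:

```python
from typing import Tuple

# BT.709 luma coefficients (assumption: HD content)
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB


def ycbcr_to_rgb(y: float, cb: float, cr: float) -> Tuple[float, float, float]:
    """Convert normalized Y'CbCr to R'G'B'."""
    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG  # solve Y' = KR*R' + KG*G' + KB*B' for G'
    return r, g, b


def chroma_fits(y: float, cb: float, cr: float, eps: float = 1e-9) -> bool:
    """True if this luma can carry this chroma without any R'G'B' channel clipping."""
    return all(-eps <= ch <= 1.0 + eps for ch in ycbcr_to_rgb(y, cb, cr))


print(chroma_fits(0.5, 0.0, 0.1))   # True
print(chroma_fits(0.05, 0.0, 0.3))  # False: too dark, G' would be negative
print(chroma_fits(0.95, 0.0, 0.3))  # False: too bright, R' would exceed 1
```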

leeperry · 5th May 2009, 16:21

Quote:

Originally Posted by Thunderbolt8

I have the feeling that madVR already displays the picture a little sharper than Haali; is this correct?

Yes, I agree, mostly because the chroma is blurrier than with ffdshow, I think (in SoftCubic50 at least), so the luma looks cleaner (which is even more true considering it's processed in 16bit).
Is there a way to get very blurry chroma from the ffdshow AviSynth filter? I can't use either mVR or rgb3dlut() at this point, as the colors are not identical to realtime dddc() on my setup.

cyberbeing · 5th May 2009, 18:34

Quote:

Originally Posted by madshi

I guess it's caused by a bug in the NVidia decoder due to being forced to output YV12, so I don't plan to further look into this right now.

I don't believe this is the issue, as it was just baseless speculation.

When outputting YUY2 from the NVIDIA decoder to ffdshow, which then outputs YV12 to madVR, the misaligned luma and chroma problem still happens. This only happens with madVR. Other renderers are fine. This suggests it doesn't have anything to do with the colorspace being output by the NVIDIA decoder.

madshi · 5th May 2009, 18:48

Quote:

Originally Posted by cyberbeing

I don't believe this is the issue, as it was just baseless speculation.

When outputting YUY2 from the NVIDIA decoder to ffdshow, which then outputs YV12 to madVR, the misaligned luma and chroma problem still happens. This only happens with madVR. Other renderers are fine. This suggests it doesn't have anything to do with the colorspace being output by the NVIDIA decoder.

Ok. Is the NVidia decoder freeware and does it work on ATI cards, too?

cyberbeing · 5th May 2009, 19:01

Quote:

Originally Posted by madshi

Ok. Is the NVidia decoder freeware and does it work on ATI cards, too?

It's not freeware, but there is a 30-day trial here: http://www.nvidia.com/object/dvd_dec...223-trial.html

It seems I remember hearing of people using it on ATI cards before, but not owning an ATI card myself currently, I can't confirm. Since you would be using it in software mode with madVR, I really don't see why it wouldn't work.
