4th May 2009, 19:40 | #861 | Link |
Registered User
Join Date: Dec 2008
Posts: 496
|
@madshi:
I've just tested v0.9 thoroughly and I must say I'm quite impressed with the lower GPU usage times, comparing the numbers with _and_ without "update textures". CPU usage seems to be lower, too, which is a great step forward. This is on a very fast Core i7 and a GTX260-216 with >GTX285 clocks, and ZoomPlayer with Luma set to Spline36 and Chroma set to SoftCubic100.

However, v0.9 also made one thing very obvious to me. It is not to my liking, but it is good in a way, since you can hopefully reproduce and fix it. To be more specific: there is at least one very noticeable problem if you use CoreAVC with CUDA decoding, which partly or completely goes away if you either uncheck CUDA decoding or use ffdshow as a decoder (ffmpeg-mt selected).

Here's a small list of things that I came across. Each decoder is mentioned so you can check them step by step. I've reproduced each step a dozen times just to be sure.

Movie sample (official trailer): wolverine-tlra_h1080p.mov

Now the decoders (madVR v0.9 as renderer):

CoreAVC with CUDA decoding:
1) Average GPU rendering time is noticeably higher (*1), and max GPU rendering time/update textures goes through the roof (3-8 times higher, depending on the madVR settings *1)
2) Display estimate 3 very often resets to 0.00000Hz and stays there for several seconds while the movie is playing or paused, and Display will show [1s]

CoreAVC without CUDA decoding:
1) Max GPU rendering time/update textures looks fine
2) Display estimate 3 often resets to 0.00000Hz and stays there for several seconds while the movie is playing or paused, and Display will show [1s]

ffdshow:
1) Max GPU rendering time/update textures looks fine
2) Display estimate 3 occasionally resets to 0.00000Hz and Display will show [1s]

*1: Compared to CoreAVC without CUDA decoding and ffdshow

It looks like 1) is a result of both (decoder and renderer) using the GPU extensively and/or it's related to your new video->GPU uploading method. The higher average rendering times and the drastically higher max rendering times just don't make any sense to me. If I choose higher settings (like Lanczos8 on both Luma and Chroma), my max GPU rendering times are sometimes higher than the movie frame interval. If I'm only using software/CPU decoding, the max GPU rendering times/updating textures are 1/5 of that, so they will never reach the frame interval, regardless of the settings I choose.

I hope you can look into this. Thanks.

Finally, here are the 3 shots (coreavc+cuda/coreavc-nocuda/ffmpeg-mt):
Last edited by iSunrise; 4th May 2009 at 19:56. |
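For readers following the frame-interval argument in the post above, here is a minimal sketch of the arithmetic. The frame rate of 24000/1001 fps is an assumption (the post does not state it), the render times are hypothetical placeholders rather than measured values, and the function names are made up for illustration.

```python
def frame_interval_ms(fps):
    """Time available per movie frame, in milliseconds."""
    return 1000.0 / fps


def fits_in_frame(render_ms, fps):
    """True if a single frame's GPU render time stays below the frame interval."""
    return render_ms < frame_interval_ms(fps)


# Assumed frame rate for a film trailer; not stated in the post.
FPS = 24000 / 1001

# Hypothetical numbers, only to illustrate the "1/5 of that" relationship.
max_render_with_cuda_ms = 45.0
max_render_software_ms = max_render_with_cuda_ms / 5

print(round(frame_interval_ms(FPS), 2))
print(fits_in_frame(max_render_with_cuda_ms, FPS))
print(fits_in_frame(max_render_software_ms, FPS))
```

A max render time above the interval means that particular frame could not be rendered within one frame period, which is why the reported spikes with CUDA decoding stand out.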
5th May 2009, 03:37 | #864 | Link |
Registered User
Join Date: Jun 2005
Posts: 630
|
OK, now more comparison of scaling algorithms, this time for anime content.
I have this rather good R2 DVD, so the video is unaltered in any way apart from decoding and rendering by mVR. The scaling specified is only for luma; chroma is SoftCubic50 for each screenshot.

Bilinear:
C-R:
Lanc4:
Lanc8:
Spline64:

I haven't taken bicubics and softcubics, as they were rather soft anyway. In order to compare, you need to download these PNGs and view them in a viewer that shows them in exactly the same position on screen (I use ACDSee and scroll with the mouse wheel, so I can quickly switch between the pictures).

I find it really hard to describe the difference between the bilinear and C-R methods, so for the purposes of upscaling C-R is bad, imo. I haven't saved Mitchell, unfortunately (and it is hard to point mVR to exactly the same frame ;P). It wasn't bad, but somewhere in between bilinear and lanc, still too soft for such a task.

My personal choice is Spline64, as I've been using splines for quite a while, even with ffdshow resize. It is interesting that the difference between lanc8 and spl64 is very subtle, although the methods differ considerably (?). Edges are a bit softer with spl64, though overall sharpness seems good for either of them.

Basically, normal unaltered DVD anime content (read: DVD content, not the video damaged by 95% of encoders in the wild ;P) can be quite watchable when upscaled to 720p with either lanc8 or spl64. Of course this is only valid for large regions with contrasty edges; smaller details like text etc. look blurry either way (as expected, that's what we need HD for).
Last edited by Egh; 5th May 2009 at 03:42. |
5th May 2009, 04:32 | #865 | Link |
Registered User
Join Date: Apr 2009
Posts: 1,019
|
From that comparison, I would say:
Bilinear is too soft, and suffers from aliasing.
Lanczos8 introduces too many artefacts.
Spline64 is very similar to Lanczos4 but with marginally less ringing.
Spline64 is sharper than Catmull-Rom, at the expense of introducing more ringing into the picture.

Personally, from those examples, I would choose Catmull-Rom as I'm quite averse to ringing. I'd like to see how Mitchell-Netravali compares, as it seemed to be the best in the testing I did, but that was filmed content rather than animated. It would also be good to see an unscaled image to get an idea of how sharp those lines should be. |
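As a rough illustration of why Catmull-Rom and Mitchell-Netravali behave differently with respect to ringing, here is a minimal sketch of the two-parameter (B, C) cubic family that both filters belong to. Catmull-Rom corresponds to B=0, C=0.5 and Mitchell-Netravali to B=C=1/3. The function name `bc_cubic` is made up for illustration, and whether madVR uses exactly these parameter values is an assumption.

```python
def bc_cubic(x, b, c):
    """Weight of the (B, C) cubic filter at distance x (in source pixels)."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9 * b - 6 * c) * x**3
                + (-18 + 12 * b + 6 * c) * x**2
                + (6 - 2 * b)) / 6.0
    if x < 2.0:
        return ((-b - 6 * c) * x**3
                + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6.0
    return 0.0  # the kernel only covers two pixels on each side


CATMULL_ROM = (0.0, 0.5)
MITCHELL = (1.0 / 3.0, 1.0 / 3.0)

# The negative lobe between 1 and 2 pixels is what produces ringing;
# a deeper lobe gives a sharper result but stronger halos around edges.
for name, (b, c) in (("Catmull-Rom", CATMULL_ROM), ("Mitchell", MITCHELL)):
    print(name, bc_cubic(0.0, b, c), bc_cubic(1.5, b, c))
```

In this sketch the Catmull-Rom lobe at 1.5 pixels is deeper than Mitchell-Netravali's, and Mitchell's centre weight is below 1, which matches the impression in the thread that Mitchell looks softer.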
5th May 2009, 10:11 | #866 | Link |
Registered User
Join Date: Sep 2006
Posts: 2,197
|
madshi, where is the focus when downscaling (with default options): on keeping the colours as close to the original as possible, or (also) on sharpness? I'm asking because I'm wondering whether I now need to apply a different sharpness level in ffdshow than with the Haali renderer when I scale down from 1080p to 720p. I have the feeling that madVR already displays the picture a little sharper than Haali does. Is this correct?
|
5th May 2009, 11:28 | #869 | Link | |||||||||
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
Quote:
Quote:
Quote:
Any opinions/suggestions?
Quote:
The max gpu rendering times look really bad, but as I've already said multiple times, in the long run the max gpu rendering times are not very important.
Quote:
Quote:
For chroma I'm using SoftCubic100. Interesting that both of you guys prefer SoftCubic50 for chroma.
Quote:
Quote:
|
|||||||||
5th May 2009, 15:01 | #872 | Link | |
Registered User
Join Date: Jun 2005
Posts: 630
|
Quote:
As for the splines, even though they are different, they are still remarkably similar. It would be interesting if somebody analyzed these pictures and posted enlarged crops showing the difference in ringing. I don't really see any worth mentioning in Spline64. And is it possible to implement spline256, since there is lanc8, which should be equivalent? |
|
5th May 2009, 15:15 | #873 | Link | |||
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
I guess it's caused by a bug in the NVidia decoder due to being forced to output YV12, so I don't plan to further look into this right now.
Quote:
Quote:
Quote:
No, that's not possible because I don't know the correct formula for spline256. For Lanczos the basic formula is always the same, regardless of how many taps you use. For spline the formula's coefficients are different, depending on the number of taps. |
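To illustrate the point above that the Lanczos formula stays the same for any size, here is a minimal sketch of the standard Lanczos kernel with its support as a parameter. The parameter `a` is the kernel half-width in source pixels; how madVR's "taps" setting maps onto `a` is an assumption, and the function names are made up for illustration. No comparable single formula is shown for splines, since, as stated above, their coefficients change with the size.

```python
import math


def lanczos_kernel(x, a):
    """Lanczos weight at distance x: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_weights(frac, a):
    """Normalized weights for the 2*a source samples around a target position.

    frac is the fractional offset (0 <= frac < 1) of the target position
    from the source sample just to its left.
    """
    raw = [lanczos_kernel(frac - k, a) for k in range(-a + 1, a + 1)]
    total = sum(raw)
    return [w / total for w in raw]


# The same two functions serve every size; only `a` changes.
for a in (3, 4, 8):
    print(a, len(lanczos_weights(0.5, a)))
```

Changing the size only changes how many neighbouring samples contribute, which is why adding another Lanczos variant needs no new math.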
|||
5th May 2009, 15:52 | #874 | Link | |
Registered User
Join Date: Apr 2007
Posts: 220
|
Quote:
|
|
5th May 2009, 16:07 | #875 | Link | |
Registered User
Join Date: Aug 2008
Posts: 176
|
Quote:
luma - Lanczos3, chroma - SoftCubic100
1) NO RESIZE
avrg GPU rendering time - 26.5-6.5 (updating textures time) = 20 (stable), -20% against 0.7
2) RESIZE
avrg GPU rendering time - 45.5-7.5 = 38, -15.5% against 0.7
CPU load, OSD OFF - 60%, -14% against 0.7
3) Scaling up 576 (PAL DVD) --> 1080
luma - Lanczos4, chroma - SoftCubic100
avrg GPU rendering time - 29.3
My IGP Nvidia 9400 can upscale PAL DVD to 1080p with sharp Lanczos4! Thank you madshi!
Last edited by nlnl; 5th May 2009 at 16:16. |
|
5th May 2009, 16:13 | #876 | Link | ||
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
Quote:
As far as I can see, luma and chroma are independent and the luma value does not have any direct influence on chroma. So I don't really see how luma can help upsampling chroma better. But then we're talking about gamma corrected Y'CbCr and not linear light YCbCr. And IIRC I've been told that there is a bit of luminance in Cb and Cr, too. Argh, this is complicated. @yesgrey, your opinion? There is an article which handles border cases where chroma is spread to neighbor pixels, if luma is too dark or too bright to hold the upsampled chroma. I've yet to look into implementing a similar algorithm. But the article only handles such corner cases and does not *generally* reshuffle the chroma. Actually the author of that article told me that a friend of his suggested to use luma to form chroma better, but he was not convinced of his friend's efforts... |
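As a small illustration of the remark above that gamma-corrected Y'CbCr leaves "a bit of luminance" in Cb and Cr, here is a minimal sketch comparing luma computed from gamma-encoded R'G'B' with true luminance computed in linear light. The BT.709 coefficients and the pure power-law gamma of 2.2 are assumptions chosen for simplicity (real transfer functions differ slightly), and the function names are made up for illustration.

```python
KR, KB = 0.2126, 0.0722   # BT.709 luma coefficients (assumed for this sketch)
KG = 1.0 - KR - KB
GAMMA = 2.2               # simplified pure power-law transfer (assumption)


def luma(rp, gp, bp):
    """Y' as stored in video: weighted sum of gamma-encoded R'G'B'."""
    return KR * rp + KG * gp + KB * bp


def chroma(rp, gp, bp):
    """Cb and Cr as scaled differences between B' or R' and Y'."""
    y = luma(rp, gp, bp)
    return (bp - y) / (2 * (1 - KB)), (rp - y) / (2 * (1 - KR))


def encoded_luminance(rp, gp, bp):
    """True relative luminance, summed in linear light, then gamma-encoded."""
    linear = KR * rp**GAMMA + KG * gp**GAMMA + KB * bp**GAMMA
    return linear ** (1 / GAMMA)


for name, rgb in (("gray", (0.5, 0.5, 0.5)), ("blue", (0.0, 0.0, 1.0))):
    print(name, round(luma(*rgb), 4), round(encoded_luminance(*rgb), 4), chroma(*rgb))
```

For a gray pixel the two values agree, but for saturated blue Y' is only 0.0722 while the gamma-encoded luminance is roughly 0.30 under these assumptions. The remaining brightness is effectively carried by the chroma channels, so blurring or resampling chroma can change perceived brightness on saturated edges.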
||
5th May 2009, 16:21 | #877 | Link | |
Kid for Today
Join Date: Aug 2004
Posts: 3,477
|
Quote:
Is there a way to get very blurry chroma from the ffdshow AviSynth filter? I can't use either mVR/rgb3dlut() at this point, as the colors are not identical to realtime dddc() on my setup. |
|
5th May 2009, 18:34 | #878 | Link | |
Broadband Junkie
Join Date: Oct 2005
Posts: 1,859
|
Quote:
When outputting YUY2 from the NVIDIA decoder to FFDshow which is outputting YV12 to madVR, that misaligned luma and chroma problem still happens. This only happens with madVR. Other renderers are fine. This suggests it doesn't have anything to do with the colorspace being output by the NVIDIA decoder. |
|
5th May 2009, 18:48 | #879 | Link | |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
|
|
5th May 2009, 19:01 | #880 | Link | |
Broadband Junkie
Join Date: Oct 2005
Posts: 1,859
|
Quote:
I seem to remember hearing of people using it on ATI cards before, but since I don't currently own an ATI card myself, I can't confirm. Since you would be using it in software mode with madVR, I really don't see why it wouldn't work. |
|