Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
29th April 2009, 06:40 | #661 | Link | |
Registered User
Join Date: Dec 2001
Location: Israel
Posts: 34
|
Quote:
My background is in professional video post-processing. In this field correctness is not as important as other factors like sharpness, noise level, detail, and implementation cost versus achieved quality. The latter is a key factor in algorithm selection. Converting to linear RGB and back is not expensive in hardware, but performing Lanczos4 on RGB is much more expensive than performing Lanczos4 on Y and bicubic/Lanczos2 on UV. Madshi uses a softer kernel on UV with good results - this isn't possible in RGB. Many video processing algorithms work better, or are much easier to implement, in YUV than in RGB. Post-scaling algorithms like sharpening are a good example. Sharpening filters are not "correct", but they look nice. We want to enhance the video beyond its original digital quality (the output of the decoder). |
|
29th April 2009, 07:11 | #662 | Link | |
Registered User
Join Date: Aug 2008
Posts: 176
|
Quote:
Sorry, I do not have an opinion of my own, but have a look at posts #90 (Mr.D) and #102 (dmunsil) in this thread: http://archive2.avsforum.com/avs-vb/...2&page=4&pp=30 |
|
29th April 2009, 07:18 | #663 | Link | |
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
41 seems to be *much* better on your hardware. Maybe it's less smooth for you with the current madVR build, but with a future madVR build it will be much better than 40.
However, your average GPU rendering times are generally too high! You won't be able to get really perfectly smooth playback with these rendering times, regardless of which shader compiler version you're using. You will have to either disable some quality options, use different resampling filters, or upgrade your hardware. E.g. the Radeon 4770 was released yesterday and looks great to me: 50W under full load with performance almost identical to the 4850. Can't get much better than that for HTPCs, I think... Quote:
My plan is to (at least optionally) offer this processing chain:

(1) Upsample chroma to 4:4:4 using a soft kernel.
(2) Convert Y'CbCr -> R'G'B' -> RGB -> YCbCr using shader math (floating point with 32bit per channel), without clipping in between, so the conversion should be nearly lossless. The resulting linear light YCbCr data will be stored as 16bit integer.
(3) Optionally adjust brightness, contrast and saturation (in linear light).
(4) Scale to the final output resolution (in YCbCr linear light).
(5) Optionally do some further post-processing (in YCbCr linear light).
(6) Convert linear light YCbCr back to 16bit R'G'B' using a precalculated 3dlut, which will also do gamut & gamma correction.
(7) Dither down to 8bit R'G'B'.

I think a graphics card like the Radeon 4770 should be able to do all this with ease, at least with 24p content (60p content might be too much of a burden). Does that sound to you like the optimal way to go for image quality? Or do you have a suggestion on how to make things even better? E.g. I'm not sure whether I should scale in linear light RGB or in linear light YCbCr. Probably it doesn't matter much, I'd guess? I'd prefer to scale linear light YCbCr because I have some scaling algorithm tweaks in mind which might show fewer artifacts when scaling in linear light YCbCr. Also I'm wondering whether 16bit integer is enough for linear light YCbCr, or whether I should use 32bit floating point buffers instead. I don't care right now how much burden it is on the hardware, because graphics cards are getting faster at an almost ridiculous pace. So what may be too much of a burden today might be easily doable in a year or two. So I want to implement the best quality that is possible, even if it overloads current budget hardware. (But of course there will be options to trade quality for performance...) |
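To make the chain above concrete, here is a minimal NumPy sketch of steps (2) and (7), stopping at linear RGB for brevity. The full-range BT.709 matrix, the pure power-law gamma of 2.2, and the random dither are illustrative assumptions only; madVR's actual shader code is not public and differs in detail (studio levels, better dithering, etc.).

```python
import numpy as np

# Full-range BT.709 Y'CbCr -> R'G'B' matrix (illustrative assumption)
YCBCR_TO_RGB = np.array([
    [1.0,  0.0,      1.5748],
    [1.0, -0.18733, -0.46813],
    [1.0,  1.8556,   0.0],
])

def to_linear_rgb(ycbcr):
    """Step (2): Y'CbCr -> R'G'B' -> linear RGB in float, no clipping between.
    ycbcr: (..., 3) with Y' in [0,1] and Cb/Cr in [-0.5, 0.5]."""
    rgb_prime = ycbcr @ YCBCR_TO_RGB.T           # shader-style matrix multiply
    return np.clip(rgb_prime, 0.0, 1.0) ** 2.2   # power-law degamma (approx.)

def dither_to_8bit(rgb16):
    """Step (7): 16bit -> 8bit with random dither instead of plain rounding."""
    rng = np.random.default_rng(0)
    noise = rng.uniform(-0.5, 0.5, rgb16.shape)
    return np.clip(np.round(rgb16 / 257.0 + noise), 0, 255).astype(np.uint8)
```

Adding noise before quantizing is what decorrelates the rounding error from the image and hides banding, at the cost of a little grain.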
|
29th April 2009, 07:44 | #664 | Link | ||
And so it begins...
Join Date: Nov 2005
Location: Hannover, Germany
Posts: 64
|
@madshi
I love your no-compromise thinking! I think it is the right way. Quote:
Quote:
Last edited by FoLLgoTT; 29th April 2009 at 08:05. |
||
29th April 2009, 07:51 | #665 | Link |
Registered User
Join Date: Aug 2008
Posts: 176
|
madshi
Please have a look at the ATI Avivo Display Engine architecture: http://ati.amd.com/technology/Avivo/...Whitepaper.pdf. Maybe that will help you to develop the best processing chain. By the way, they claim to use spatial and temporal 10bit -> 8,6bit dithering and 10x6-tap scaling filters. Last edited by nlnl; 29th April 2009 at 08:30. |
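For reference, spatial + temporal dithering of the kind the whitepaper mentions can be sketched like this — a hedged illustration using a 2x2 Bayer threshold matrix whose pattern is shifted each frame; ATI's actual hardware algorithm is not public and will differ.

```python
import numpy as np

# 2x2 ordered-dither thresholds in [0, 1)
BAYER2 = np.array([[0.0, 2.0], [3.0, 1.0]]) / 4.0

def dither_10_to_8(frame10, frame_index):
    """Spatially ordered + temporally rotated 10bit -> 8bit dither.
    frame10: 2D array of 10bit code values (0..1023)."""
    h, w = frame10.shape
    # tile the 2x2 threshold pattern over the whole frame
    t = np.tile(BAYER2, (h // 2 + 1, w // 2 + 1))[:h, :w]
    # shift the pattern each frame so the error averages out over time
    t = np.roll(t, frame_index % 2, axis=1)
    # add threshold, truncate to 8 bits, clamp to 255
    return np.minimum((frame10 / 4.0 + t).astype(np.uint16), 255).astype(np.uint8)
```

A pixel whose 10bit value falls between two 8bit codes flips between them in a fixed spatial pattern, so the local average preserves the extra 2 bits of precision.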
29th April 2009, 09:56 | #668 | Link | |||
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
(1) We want to do gamut & gamma correction, not only gamma correction. For doing gamut correction we're using a very big 3dlut. Now if we already use a 3dlut for gamut correction, gamma correction can be done through the very same 3dlut "for free". So why should madVR limit itself to 10bit ATI gamma correction if it can do 16bit gamma correction on its own without any additional performance cost?

(2) I don't trust the graphics card manufacturers to get things right. If you look at the chroma upsampling screenshots, you'll see what terrible quality my ATI card gives me. Now there are reports that some users are getting much better chroma upsampling quality from their ATI cards, but only with specific renderers and specific video formats. Anyway, it's a mess. Also, I can't get VMR9 or EVR to output video levels without getting BTB and WTW clipped. Or I'm having trouble getting VMR9 to provide PC levels. You see, getting the standard renderers to do what we want is extremely hard. And the fault lies with ATI. Don't know about NVidia. But why the heck should we rely on graphics manufacturers to get things right, if they've proven again and again that they don't know what the heck they are doing in the HTPC area?

(3) The purpose of madVR is to provide the best possible quality. On any graphics card. With any source. Without worrying about having to tweak decoder and renderer settings all the time. And this is only possible if I avoid any and all video-related processing offered by the graphics card manufacturers. Because they simply are not reliable. Sometimes they work, sometimes not. It can change with driver revisions. It can change with the OS, with the graphics card model, with the video format etc. madVR doesn't have any of these problems right now, so why should I even consider going back to depending on graphics card manufacturers getting things right? IMO that would be a major step back. Quote:
BTW, if I used their taps calculation, madVR's Lanczos8 scaling filter would use 16x16 taps instead of 10x6. And madVR uses 16bit -> 8bit dithering instead of 10bit -> 8bit. So in both cases madVR "wins". The problem is that the number of taps alone is not the decisive factor for image quality... Quote:
Ok, I don't know which exact resampling filter HQV is using. Maybe it's not Lanczos; maybe they are using something else which has less ringing at higher tap counts. But still, I don't really see much benefit in using more than 8x8 = 64 taps. The increase in sharpness is minimal; the increase in ringing is big. BTW, Don Munsil and Stacey Spears prefer Catmull-Rom scaling over Lanczos scaling because of the Lanczos ringing artifacts. Now Catmull-Rom uses only 2 taps (or 4x4 = 16 taps according to HQV math). Go figure... I don't really agree with their choice of Catmull-Rom, though. IMHO the number of scaling taps is somewhat similar to the megapixel race with digicams. J6P likes big numbers. Marketing likes big numbers. But the reality is different. The best image quality does not always come with the highest number. Sometimes a higher number can even be bad. Last edited by madshi; 29th April 2009 at 09:58. |
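For anyone wanting to experiment with the trade-off madshi describes, here is a sketch of the two kernels being compared (Lanczos with a lobes, i.e. 2a taps per axis when separable, and Catmull-Rom with support radius 2). The negative lobes are what cause ringing, and their total mass grows with the lobe count — the rough measurement at the end illustrates that, nothing more.

```python
import numpy as np

def lanczos(x, a):
    """Lanczos kernel with 'a' lobes (2*a taps per axis when separable)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

def catmull_rom(x):
    """Catmull-Rom cubic, support radius 2 (the B=0, C=0.5 Mitchell case)."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x < 1, 1.5 * x**3 - 2.5 * x**2 + 1,
           np.where(x < 2, -0.5 * x**3 + 2.5 * x**2 - 4 * x + 2, 0.0))

# rough measure of ringing potential: total mass of the negative lobes
xs = np.linspace(-8, 8, 16001)
dx = xs[1] - xs[0]
for a in (2, 3, 4, 8):
    w = lanczos(xs, a)
    print(f"Lanczos{a}: negative-lobe mass = {-w[w < 0].sum() * dx:.4f}")
```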
|||
29th April 2009, 10:38 | #669 | Link | |
And so it begins...
Join Date: Nov 2005
Location: Hannover, Germany
Posts: 64
|
@madshi
OK, I got your points and I agree with you. Quote:
Anyway, this special case is no problem at all. I can still use VideoEqualizer "behind" madVR and deactivate 3D LUT processing and dithering. Last edited by FoLLgoTT; 29th April 2009 at 10:48. |
|
29th April 2009, 10:48 | #670 | Link |
MPC-HC Project Manager
Join Date: Mar 2007
Posts: 2,317
|
They claim to be using a modified version of the Teranex scaler, which according to them is a military-grade scaler.
The Teranex scaler seems to be used to upscale movies filmed with pre-HD cameras to HD for release on Blu-ray. The technology has even won an Emmy. The demos on their site look very impressive, but that's the marketing guys talking, of course; reviews on the net seem positive though. It doesn't really matter, because it's closed source and very expensive. |
29th April 2009, 11:07 | #671 | Link | |||
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
Quote:
Quote:
I'm not saying that HQV quality is bad, not at all. But just because their marketing department goes fancy, you shouldn't believe that it isn't possible for us to compete with them quality-wise... |
|||
29th April 2009, 11:08 | #672 | Link | |
Registered User
Join Date: Sep 2004
Posts: 1,295
|
Quote:
IMHO 16bit integer is not enough for linear processing. 32bit FP should be the bare minimum. I have tried running cr3dlut only in 32bit FP and it was not lossless. When I say lossless, I mean simply removing the gamma and reapplying it would yield different results. Not by much, only by 1 in some of the components, but there was a difference. Only with 64bit FP did I achieve a completely lossless degamma/engamma. Also, if you're looking for the ultimate solution, the final step should also be performed using shader math, because the 3dlut can only have 8bit per component on the input side, and if you have 32bit FP you will lose some precision when you interpolate linearly for the 3dlut indexing. Last edited by yesgrey; 29th April 2009 at 11:12. |
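yesgrey's lossless test is easy to reproduce in software. The sketch below measures the degamma/engamma round-trip error at different precisions, assuming a pure power-law gamma — an assumption worth flagging, since cr3dlut's actual chain also includes gamut matrices, so its float32 error can be larger than what this isolated test shows.

```python
import numpy as np

def degamma(v, gamma=2.2):
    return v ** gamma              # gamma-encoded -> linear (power law assumed)

def engamma(v, gamma=2.2):
    return v ** (1.0 / gamma)      # linear -> gamma-encoded

def roundtrip_error(bits, dtype):
    """Max difference, in code values, after degamma + engamma at 'dtype'."""
    n = 2 ** bits
    codes = np.arange(n, dtype=np.float64) / (n - 1)
    back = engamma(degamma(codes.astype(dtype))).astype(np.float64)
    return int(np.max(np.abs(np.round(back * (n - 1)) - np.arange(n))))

for bits in (8, 16):
    for dtype in (np.float32, np.float64):
        print(f"{bits}bit codes, {dtype.__name__}: "
              f"max error = {roundtrip_error(bits, dtype)}")
```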
|
29th April 2009, 11:17 | #673 | Link |
MPC-HC Project Manager
Join Date: Mar 2007
Posts: 2,317
|
Yes, I agree.
The end results should be compared, right? Do you think madVR will eventually rival/beat the image quality of those $5000 players? (Assuming DVI/HDMI; video cards often cannot match the high-quality DACs those standalone players use for analogue signals.) I think you are going in the right direction with the "scientific facts only please, no snake oil" approach. |
29th April 2009, 11:45 | #674 | Link | ||
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
Quote:
That would be ideal, but it will be difficult to get clean screenshots of HQV processing! So comparisons might have to be done in real-time playback, and that means it might be somewhat subjective... I have no idea, but I sure hope so. Actually, I expect to beat them in some ways (e.g. doing linear light scaling etc). HTPCs might lose in deinterlacing quality, though. |
||
29th April 2009, 12:19 | #675 | Link | |
Registered User
Join Date: Sep 2004
Posts: 1,295
|
Quote:
You don't need to use the 3dlut for the gamut and gamma correction. The gamma correction is easy: it is just using a custom gamma function when converting from RGB->R'G'B'. The gamut correction is also easy to perform: if you use the linearized Bradford transform, it is just a simple matrix multiplication, and the error from using the linearized vs the full Bradford transform is not significant; at most it is a +/-1 difference in some component values... and since it affects blue the most, it would not be very easy to see the differences. You can always add using the full Bradford transform as an option for more powerful GPUs, of course. The 3DLUT is a very good solution, but it's limited by its size. We could never use more than 8 or 9 bits on the input side, and when you consider it so important to create a processing chain with full 16bit precision, having one component limiting it to 8 or 9 bits does not seem very logical... I will run some tests to see how much time the current code I have needs for processing a few pixels. This should give you a rough idea of how it would perform if ported to the GPU... |
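yesgrey's point that gamut correction reduces to a 3x3 multiply in linear light can be illustrated as follows. The sketch derives RGB->XYZ matrices from xy chromaticity coordinates and builds an SMPTE-C -> Rec.709 conversion; since both gamuts share the D65 white point here, no Bradford adaptation term is needed (it would enter as one extra matrix if the white points differed).

```python
import numpy as np

def rgb_to_xyz(primaries, white):
    """Matrix taking linear RGB -> XYZ, from xy chromaticities."""
    m = np.array([[x, y, 1.0 - x - y] for x, y in primaries]).T
    w = np.array([white[0] / white[1], 1.0,
                  (1.0 - white[0] - white[1]) / white[1]])
    return m * np.linalg.solve(m, w)   # scale columns so RGB=1,1,1 maps to white

REC709  = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
SMPTE_C = [(0.630, 0.340), (0.310, 0.595), (0.155, 0.070)]
D65     = (0.3127, 0.3290)

# linear-light gamut conversion SMPTE-C -> Rec.709: one 3x3 multiply per pixel
M = np.linalg.inv(rgb_to_xyz(REC709, D65)) @ rgb_to_xyz(SMPTE_C, D65)
print(M)
```

Because white is shared, M maps (1, 1, 1) back to (1, 1, 1) exactly, which is a handy sanity check on the derivation.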
|
29th April 2009, 13:03 | #676 | Link |
Kid for Today
Join Date: Aug 2004
Posts: 3,477
|
oh yes, 64bit FP HLSL gamut conversion w/ the existing PS script would be great! no loading time, perfect colors!
just two text files SD.txt/HD.txt that you can switch w/ a batch file depending on the gamut/display, w/ automatic 601/709 decoding depending on the resolution, and we'd be good to go. Last edited by leeperry; 29th April 2009 at 13:16. |
29th April 2009, 14:14 | #677 | Link |
Registered User
Join Date: Sep 2004
Posts: 1,295
|
Here are some benchmarking results using the cr3dlut code:
Time for processing 2 107 392 pixels using an Intel E2160@2700MHz (2 cores) (1920x1080 = 2 073 600 pixels):

YCbCr->RGB, 64bit FP: < 1ms
YCbCr->RGB, gamut & gamma correction, CA 1, 32bit FP, 1 thread: 813ms
YCbCr->RGB, gamut & gamma correction, CA 1, 32bit FP, 2 threads: 421ms
YCbCr->RGB, gamut & gamma correction, CA 1, 64bit FP, 1 thread: 688ms
YCbCr->RGB, gamut & gamma correction, CA 1, 64bit FP, 2 threads: 359ms
YCbCr->RGB, gamut & gamma correction, CA 2, 64bit FP, 1 thread: 1656ms
YCbCr->RGB, gamut & gamma correction, CA 2, 64bit FP, 2 threads: 968ms

Now the question is: how can we translate these CPU processing times into GPU processing times? I've measured the YCbCr->RGB conversion to serve as a reference, because it's already implemented by madshi. This code is parallel by nature, so with more processing units the processing time should decrease proportionally. Considering that GPUs have lots of processing units, maybe cr3dlut can be fully ported to shader code and used in real time with no significant penalty... |
29th April 2009, 14:23 | #678 | Link | ||
Registered Developer
Join Date: Sep 2006
Posts: 9,140
|
Quote:
Quote:
No, it would not. A really good CMS needs different corrections for different stimulus levels. And that is not possible with a simple PS script. |
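madshi's point — that a good CMS needs a different correction at every stimulus level — is exactly what a 3dlut provides, since each lattice point stores an arbitrary output triple. Here is a hedged sketch of the trilinear lookup such a LUT needs at playback time (illustrative code, not madVR's actual shader):

```python
import numpy as np

def apply_3dlut(rgb, lut):
    """Trilinear 3D-LUT lookup.
    rgb: (..., 3) floats in [0, 1]; lut: (N, N, N, 3) table indexed [r, g, b]."""
    n = lut.shape[0]
    pos = np.clip(rgb, 0.0, 1.0) * (n - 1)
    lo = np.minimum(pos.astype(int), n - 2)        # lower lattice corner
    f = pos - lo                                   # fractional part per axis
    r0, g0, b0 = lo[..., 0], lo[..., 1], lo[..., 2]
    fr, fg, fb = f[..., 0:1], f[..., 1:2], f[..., 2:3]
    # blend the 8 surrounding lattice points, axis by axis
    c00 = lut[r0, g0, b0] * (1 - fr) + lut[r0 + 1, g0, b0] * fr
    c10 = lut[r0, g0 + 1, b0] * (1 - fr) + lut[r0 + 1, g0 + 1, b0] * fr
    c01 = lut[r0, g0, b0 + 1] * (1 - fr) + lut[r0 + 1, g0, b0 + 1] * fr
    c11 = lut[r0, g0 + 1, b0 + 1] * (1 - fr) + lut[r0 + 1, g0 + 1, b0 + 1] * fr
    c0 = c00 * (1 - fg) + c10 * fg
    c1 = c01 * (1 - fg) + c11 * fg
    return c0 * (1 - fb) + c1 * fb
```

The lattice resolution N is the size limit yesgrey mentions: the table grows as N^3, which is why more than 8 or 9 bits per input component is impractical.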
||
29th April 2009, 14:43 | #679 | Link | |
Hi-Fi Fans
Join Date: Dec 2008
Posts: 222
|
Quote:
So I use it for testing. At the moment, sadly, I have no plans to buy new h/w... |
|
29th April 2009, 14:53 | #680 | Link |
Registered User
Join Date: Mar 2006
Posts: 1,538
|
I don't want to call this a bug, but I do want to see if others are experiencing the same thing.

MPC-HC Build 1.2.1079.0
madVR v0.8
ffdshow tryouts, svn 2918
Windows XP SP3
NVIDIA Quadro NVS 135M

When playing a 1280x720 video on my 1280x1024 LCD monitor I noticed a delay when switching the video to full screen. When the video switches to full screen, the video is stretched for approximately 5 seconds and then adjusted to the correct AR. I don't experience this with VMR or EVR CP. I'm assuming this is due to the video card, but I just wanted to bring it up. Thanks. |
Tags |
direct compute, dithering, error diffusion, madvr, ngu, nnedi3, quality, renderer, scaling, uhd upscaling, upsampling |