Old 24th August 2011, 06:08   #5041  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,361
Quote:
Originally Posted by pirlouy View Post
Just out of curiosity, since it's not the first time you've said it: why do you prefer this decoder? Is it only because of CPU usage, or is there another advantage?
It's present on every system and can decode basically every stream.
It just works.

I use LAV CUVID, but if I didn't have an NVIDIA card, I would probably use the MS decoder.

Quote:
Originally Posted by pirlouy View Post
@nevcairiel: can you post at the end of this thread when there's an update, please? Or an RSS feed or whatever? I missed 0.33, for example.
I always post a release announcement here in this thread.
http://forum.doom9.org/showthread.ph...84#post1521184
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 24th August 2011 at 07:23.
Old 24th August 2011, 07:19   #5042  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by nevcairiel View Post
It's present on every system and can decode basically every stream.
It just works.
Agreed. And it's faster than both the libav/ffmpeg and the Intel Media SDK software VC-1 decoders.
Old 24th August 2011, 09:34   #5043  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,361
Some news on the pixel format converter optimizations:

I noticed last night that I don't need to write converters for big-endian formats, hooray. ffmpeg always outputs in native endianness, and on x86 that's always little-endian. That saved a lot of work and made me happy.
Last night, I also wrote a new converter for YUV420 9/10-bit -> YV12/NV12. It's again notably faster than the old one, especially for NV12 output.
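
For anyone unfamiliar with the layouts: YV12 keeps three separate planes, while NV12 keeps the Y plane followed by a single plane of interleaved U,V byte pairs. A rough Python sketch of the 10-bit -> NV12 case follows. It is only an illustration, not LAV Video's actual code; the function name is made up, and dropping the two low bits by plain truncation (no dithering) is a simplifying assumption.

```python
def yuv420p10_to_nv12(y, u, v):
    """Convert planar 10-bit 4:2:0 samples to NV12 byte planes.

    y, u, v are flat lists of 10-bit integers (0..1023); u and v must have
    the same length. Precision is reduced by dropping the two low bits,
    i.e. plain truncation without dithering (a simplification).
    """
    if len(u) != len(v):
        raise ValueError("U and V planes must have the same size")
    luma = bytes(sample >> 2 for sample in y)
    # NV12 stores chroma as one plane of interleaved U,V byte pairs.
    chroma = bytearray()
    for cb, cr in zip(u, v):
        chroma.append(cb >> 2)
        chroma.append(cr >> 2)
    return luma, bytes(chroma)


luma, chroma = yuv420p10_to_nv12([0, 512, 1023, 64], [512], [256])
```

For YV12 output the same shift would apply, but U and V would stay in separate planes (V first) instead of being interleaved.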

Next up are converters for YUV420/422 9/10-bit -> P010/P210, and YUV422 -> YUY2. When those are done, all common unscaled YUV conversions will have an optimized converter.

On the weekend, I will tackle chroma upscaling: YUV420 -> YUV422 -> YUV444 -> RGB (with entry and exit points at every step in the chain).
All upscaling will be done in 16-bit integer (RGB conversion with 32-bit intermediate results), and dithered back down to the desired output bit depth at the end (currently always 8-bit; 10-bit output is only supported for untouched chroma).
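
To make the "dithered back down" step concrete, here is a minimal Python sketch of ordered dithering from a wider intermediate to 8-bit. The 4x4 Bayer matrix, the function name and the `in_bits` parameter are all assumptions chosen for illustration; the post does not say which dither pattern will be used.

```python
# 4x4 Bayer threshold matrix with values 0..15 (an assumed pattern).
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def dither_to_8bit(value, x, y, in_bits=16):
    """Reduce an in_bits-wide sample at pixel (x, y) to 8 bits."""
    shift = in_bits - 8
    if shift == 0:
        return value
    # Scale the 0..15 threshold to the range of the bits being dropped.
    threshold = (BAYER_4X4[y % 4][x % 4] << shift) >> 4
    # Add the position-dependent threshold, drop the low bits, clamp.
    return min((value + threshold) >> shift, 255)


# A flat 16-bit level of 128.5 (0x8080) over one 4x4 tile.
tile = [dither_to_8bit(0x8080, x, y) for y in range(4) for x in range(4)]
```

Half of the tile comes out as 128 and half as 129, so the average over the tile preserves the 128.5 level that plain truncation would lose.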

I've been pondering something, though. Is there any real (noticeable) quality loss when dithering the YUV to 8-bit before the RGB conversion, or should I make the extra effort and build the RGB converter to actually work with 10-bit input pixels?
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 24th August 2011 at 10:35.
Old 24th August 2011, 11:51   #5044  |  Link
pirlouy
_
 
Join Date: May 2008
Location: France
Posts: 692
Quote:
Originally Posted by nevcairiel View Post
It's present on every system and can decode basically every stream.
It just works.
Yes, but the EVR renderer works too. It displays all files, and that doesn't mean it's the best one.

Quote:
I use LAV CUVID, but if I didn't have an NVIDIA card, I would probably use the MS decoder.
People say GPU drivers cause various problems. And you said the advantage of LAV CUVID is deinterlacing, so if you only have VC-1 from Blu-ray, LAV CUVID is not necessarily the best solution.

Quote:
Originally Posted by madshi View Post
And it's faster than both the libav/ffmpeg and the Intel Media SDK software VC-1 decoders.
In my case, there's not much difference (both at 20% CPU for Harry Potter 2). The Intel one from madVR uses more CPU (30% for HP2).

Since LAV Splitter is based on ffmpeg, and the MS decoder brings no advantages for me (except support for interlaced files, which I don't use), I prefer to stay with the ffmpeg decoder (LAV seems to use a bit less CPU than the madVR one). :-)

Quote:
I always post a release announcement here in this thread.
http://forum.doom9.org/showthread.ph...84#post1521184
Sorry. :/
Old 24th August 2011, 12:20   #5045  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,361
Quote:
Originally Posted by pirlouy View Post
Yes, but the EVR renderer works too. It displays all files, and that doesn't mean it's the best one.
How a file is to be decoded is 100% specified; there are no differences in quality between decoders, so the comparison is kinda off.

Quote:
Originally Posted by pirlouy View Post
People say GPU drivers cause various problems. And you said the advantage of LAV CUVID is deinterlacing, so if you only have VC-1 from Blu-ray, LAV CUVID is not necessarily the best solution.
Blu-rays also contain interlaced VC-1, especially documentaries and some concert discs. I just don't want to switch decoders when watching one of those.

Besides, I've never had any real trouble with NVIDIA drivers. Most of the complaining I hear comes from ATI/AMD users, but I have no first-hand experience either way.

Quote:
Originally Posted by pirlouy View Post
Since LAV Splitter is based on ffmpeg, and the MS decoder brings no advantages for me (except support for interlaced files, which I don't use), I prefer to stay with the ffmpeg decoder (LAV seems to use a bit less CPU than the madVR one). :-)
It's a format designed by MS, so why would their decoder be bad?
It's faster (less CPU), and it supports more types of movies.

Anyway, in the end it's your choice which decoder to use, and if you never watch interlaced VC-1, then the ffmpeg decoder will probably be fine as well. I just don't think it has any advantages over the MS decoder, which is why it's off by default now.
If interlaced support is added to ffmpeg (some day), I'll probably enable it again.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 24th August 2011 at 12:26.
Old 24th August 2011, 12:34   #5046  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
@pirlouy, I'm not really sure what you're trying to say. We've given you two good reasons why we think the MS VC-1 decoder is the better default decoder: (1) Speed. (2) Interlaced decoding capability. So you're personally not interested in either of these advantages. OK, but what is your point? You want us to use the ffmpeg VC-1 decoder as default, although it's slower and less capable? That doesn't make any sense.
Old 24th August 2011, 12:42   #5047  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by nevcairiel View Post
Some news on the pixel format converter optimizations:

I noticed last night that I don't need to write converters for big-endian formats, hooray. ffmpeg always outputs in native endianness, and on x86 that's always little-endian. That saved a lot of work and made me happy.
Last night, I also wrote a new converter for YUV420 9/10-bit -> YV12/NV12. It's again notably faster than the old one, especially for NV12 output.

Next up are converters for YUV420/422 9/10-bit -> P010/P210, and YUV422 -> YUY2. When those are done, all common unscaled YUV conversions will have an optimized converter.

On the weekend, I will tackle chroma upscaling: YUV420 -> YUV422 -> YUV444 -> RGB (with entry and exit points at every step in the chain).
All upscaling will be done in 16-bit integer (RGB conversion with 32-bit intermediate results), and dithered back down to the desired output bit depth at the end (currently always 8-bit; 10-bit output is only supported for untouched chroma).
Sounds good to me! Will the chroma upscaling also support 10bit input?

Quote:
Originally Posted by nevcairiel View Post
I've been pondering something, though. Is there any real (noticeable) quality loss when dithering the YUV to 8-bit before the RGB conversion, or should I make the extra effort and build the RGB converter to actually work with 10-bit input pixels?
You'd have to implement dithering twice this way. I've no idea what happens to ordered dithering if you apply it twice. With random dithering (as used by madVR) applying dithering twice increases the noise floor.
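
The noise-floor effect of dithering twice can be checked with a small simulation. This Python sketch is only a toy model under stated assumptions: a single flat 10-bit level, a plain multiply by 1.164 standing in for the RGB conversion arithmetic, and uniform random dither. It is not madVR's or LAV's actual processing.

```python
import random


def random_dither(value, rng):
    """Round a non-negative fractional value using random dithering."""
    return int(value + rng.random())  # int() floors non-negative values


def variance(samples):
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def compare_noise(level_10bit=403, gain=1.164, trials=20000, seed=1):
    """Return (variance with one dither pass, variance with two passes)."""
    rng = random.Random(seed)
    once, twice = [], []
    for _ in range(trials):
        # Single pass: keep full precision, dither only the final result.
        once.append(random_dither(level_10bit / 4 * gain, rng))
        # Double pass: dither to 8-bit first, process, then dither again.
        y8 = random_dither(level_10bit / 4, rng)
        twice.append(random_dither(y8 * gain, rng))
    return variance(once), variance(twice)


var_once, var_twice = compare_noise()
```

In this model the second pass adds its own rounding noise on top of the amplified noise from the first pass, so `var_twice` comes out clearly larger than `var_once`.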
Old 24th August 2011, 12:51   #5048  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,361
Quote:
Originally Posted by madshi View Post
Sounds good to me! Will the chroma upscaling also support 10bit input?
That's easy enough, because 10-bit is stored in 16-bit internally anyway, and I need to do the processing in 16-bit. I need 4 bits of headroom for the upscaling, so anything up to 12-bit can be processed natively with the algorithm I have in mind.
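
The headroom arithmetic can be spelled out in a few lines of Python. The assumption here (not stated in the post) is that the 4 spare bits are needed because the interpolation weights sum to 16 before the final shift; the helper names are made up for this sketch.

```python
def max_native_bits(container_bits=16, headroom_bits=4):
    """Largest sample depth that still leaves headroom_bits free."""
    return container_bits - headroom_bits


def fits_without_overflow(sample_bits, weight_sum, container_bits=16):
    """True if the largest weighted sum fits in an unsigned container."""
    max_sample = (1 << sample_bits) - 1
    return max_sample * weight_sum < (1 << container_bits)


print(max_native_bits())               # 12
print(fits_without_overflow(12, 16))   # True: 4095 * 16 = 65520
print(fits_without_overflow(13, 16))   # False: 8191 * 16 = 131056
```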

Quote:
Originally Posted by madshi View Post
You'd have to implement dithering twice this way. I've no idea what happens to ordered dithering if you apply it twice. With random dithering (as used by madVR) applying dithering twice increases the noise floor.
Hm, yeah. I guess it won't be that much more work to actually support 10-bit input, for similar reasons as above. I just have to keep track of the number of bits that are actually valid, so I can apply the proper dithering and shifting at the end.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 24th August 2011, 13:06   #5049  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Maybe you can upconvert everything to 12bit first and then write routines only for that bitdepth? Then for the final step you'd need downconversion routines to 8bit, 9bit and 10bit, I guess.

Edit: Wait, this is all for RGB output, right? In that case you only need 8bit output, I guess, because there are no 9bit or 10bit RGB output FOURCCs. At least I don't know any. You could optionally output 16bit RGB, though.

Last edited by madshi; 24th August 2011 at 13:09.
Old 24th August 2011, 13:14   #5050  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,361
Yeah, I thought about that. It adds one shift operation to every pixel, but I guess the performance difference using SSE2 is minimal, and it simplifies the code a bit (the alternative would be to use a template and let the compiler generate three versions of the function, for 8, 9 and 10-bit each, without much special code).

While I have your attention, just to confirm my logic is alright:
Looking at the MPEG-2 chroma position, for the 4:2:0 -> 4:2:2 conversion I would use a simple 75:25 interpolation. For the second step, the 4:2:2 -> 4:4:4, I would just use a 50:50 interpolation. It's not really a very fancy algorithm, but it's similar to what ffdshow uses, and the quality seems alright. Or am I missing something?
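
A minimal Python sketch of the second step (4:2:2 -> 4:4:4) for one chroma row, assuming MPEG-2 style siting where each chroma sample is horizontally co-sited with an even luma column. Under that assumption the even output samples are plain copies and only the odd ones need the 50:50 average. The function name and the edge handling (repeating the last sample) are choices made for this illustration.

```python
def upsample_422_to_444_row(chroma):
    """Double the width of one chroma row (left co-sited chroma assumed)."""
    out = []
    for i, c in enumerate(chroma):
        # Even output sample sits on an existing chroma sample: copy it.
        out.append(c)
        # Odd output sample lies halfway to the next one: 50:50 average.
        nxt = chroma[i + 1] if i + 1 < len(chroma) else c  # repeat at edge
        out.append((c + nxt + 1) >> 1)
    return out


row = upsample_422_to_444_row([100, 200, 50])
```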

Quote:
Originally Posted by madshi View Post
Edit: Wait, this is all for RGB output, right? In that case you only need 8bit output, I guess, because there are no 9bit or 10bit RGB output FOURCCs. At least I don't know any. You could optionally output 16bit RGB, though.
Technically, yes. Although some renderers (Haali) only accept YUY2 (or RGB) input, so at least the 4:2:0 -> 4:2:2 upsampling would be used there as well, but still limited to 8-bit.
Any conversions are really only for renderers or post-processing filters that don't know any better, and those will probably never understand 10-bit.

Edit:
Technically, one could use D3DFMT_A2R10G10B10 as a FourCC; MSDN claims it's similar in usage, but I guess nothing supports it anyway.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 24th August 2011 at 13:29.
Old 24th August 2011, 13:39   #5051  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by nevcairiel View Post
While i have your attention, just to confirm my logic is alright.
Looking at the MPEG chroma position, for the 4:2:0 -> 4:2:2 conversion I would use a simple 75:25 interpolation. For the second step, the 4:2:2 -> 4:4:4, I would just use a 50:50 interpolation. It's not really a very fancy algorithm, but it's similar to what ffdshow uses, and the quality seems alright. Or am I missing something?
Hmmmm... I've never upscaled chroma like that, so I'm not 100% sure right now. But I guess it's ok that way.

Quote:
Originally Posted by nevcairiel View Post
Technically, yes. Although some renderers (Haali) only accept YUY2 (or RGB) input, so at least the 4:2:0 -> 4:2:2 upsampling would be used there as well, but still limited to 8-bit.
Any conversions are really only for renderers or post-processing filters that don't know any better, and those will probably never understand 10-bit.
Yeah, you're probably right.

Quote:
Originally Posted by nevcairiel View Post
Technically, one could use D3DFMT_A2R10G10B10 as a FourCC
True! Forgot about that...
Old 24th August 2011, 13:43   #5052  |  Link
Superb
Registered User
 
Join Date: Feb 2010
Posts: 364
Two genius guys talking. I feel like I'm watching The Big Bang Theory. Bazinga!
Old 24th August 2011, 13:54   #5053  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,361
Quote:
Originally Posted by madshi View Post
Hmmmm... I've never upscaled chroma like that, so I'm not 100% sure right now. But I guess it's ok that way.
Well, I suppose this might not be the absolute perfect quality, and some complicated cubic interpolation might yield better results, but this will be fast, and people never complained about ffdshow's interpolator. (It actually uses 75:25 for the horizontal expansion as well, which seems quite odd to me and would result in a chroma phase shift, assuming MPEG-2 siting. Maybe ffdshow was designed against MPEG-1 chroma?)

Quote:
Originally Posted by Superb View Post
Two genius guys talking. I feel like I'm watching The Big Bang Theory. Bazinga!
I hope that show is returning soon.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 24th August 2011 at 14:00.
Old 24th August 2011, 16:12   #5054  |  Link
Mercury_22
Registered User
 
Join Date: Dec 2007
Posts: 1,138
@Nev, any chance that after you finish these optimizations you'll take a look at DXVA?
Since people have started comparing energy consumption and CPU/GPU utilization, I got curious and made a quick comparison between the LAVVideo, MPC-HC internal and Microsoft DTV-DVD H264 decoders (on a 720p, 3285 Kbps, AC3 MKV file) using "GPU Observer – Sidebar Gadget" and "Intel Core Series – Sidebar Gadget". The results (although not so accurate) speak for themselves:
x86, EVR
Decoder    CPU       GPU    CPU %  GPU %  CPU MHz  GPU MHz
M_DTV-DVD  40.734    112.5  3      15     1357.8   750
MPC-HC     82.318    97.5   5      13     1646.36  750
LAV        272.1285  36     15     12     1814.19  300

x86, EVR-CP
Decoder    CPU       GPU    CPU %  GPU %  CPU MHz  GPU MHz
M_DTV-DVD  95.9874   187.5  6      25     1599.79  750
MPC-HC     82.119    202.5  5      27     1642.38  750
LAV        320.8368  172.5  16     23     2005.23  750

x64, EVR
Decoder    CPU       GPU    CPU %  GPU %  CPU MHz  GPU MHz
M_DTV-DVD  41.4897   112.5  3      15     1382.99  750
MPC-HC     42.1959   105    3      14     1406.53  750
LAV        148.9959  46.2   9      28     1655.51  165

x64, EVR-CP
Decoder    CPU       GPU    CPU %  GPU %  CPU MHz  GPU MHz
M_DTV-DVD  106.209   202.5  6      27     1770.15  750
MPC-HC     94.7352   202.5  6      27     1578.92  750
LAV        321.4125  195    15     26     2142.75  750
I used MPC-HC 1.5.3.3697 x86 & x64 at defaults + LAVAudio + LAVSplitter & EVR (and another test with EVR-CP + 10-bit in & out + FFPP).
Before starting the tests I restarted my system and disabled Superfetch; no other service or program was started or closed during the tests.

So any thoughts ?
__________________
Intel UHD Graphics 750; Win 10 22H2

Last edited by Mercury_22; 24th August 2011 at 16:18.
Old 24th August 2011, 16:33   #5055  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,361
The GPU hardware decoder does not show up in the GPU usage %, so the test is really pointless.
Of course the software decoder will use more CPU than the DXVA decoders. What was this supposed to show?

In the normal EVR case with LAV Video, the GPU even went into a lower power state, probably saving more energy than DXVA would.

Anyway, there are a lot of DXVA decoders around, and since the decoding is done on the GPU, there really wouldn't be any performance or quality difference, so ... just use those decoders?

Something to consider:
Intel's next generation (Ivy Bridge) will actually reduce power consumption quite significantly while increasing performance at the same time (Tri-Gate transistors). So just maybe, those CPUs will actually be far more efficient than a big GPU?
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 24th August 2011 at 16:49.
Old 24th August 2011, 17:24   #5056  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by nevcairiel View Post
Well, I suppose this might not be the absolute perfect quality, and some complicated cubic interpolation might yield better results, but this will be fast, and people never complained about ffdshow's interpolator. (It actually uses 75:25 for the horizontal expansion as well, which seems quite odd to me and would result in a chroma phase shift, assuming MPEG-2 siting. Maybe ffdshow was designed against MPEG-1 chroma?)
I think the 75:25 and 50:50 interpolation should be similar to bilinear interpolation, which should have a good quality/speed tradeoff for chroma upsampling. However, I agree with you that using 75:25 in both directions is not correct. It should be 75:25, 25:75, 75:25, 25:75 in one direction and 100, 50:50, 100, 50:50 in the other direction.
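
The alternating 75:25 / 25:75 pattern can be sketched for one chroma column like this. It is a minimal Python illustration, assuming progressive 4:2:0 where each chroma row sits vertically halfway between two luma rows, so every output row is three times closer to one source row than to its neighbour. The function name and the edge handling (repeating the border row) are assumptions of this sketch.

```python
def upsample_420_to_422_column(chroma):
    """Double the height of one chroma column (progressive 4:2:0 assumed)."""
    out = []
    last = len(chroma) - 1
    for k, c in enumerate(chroma):
        above = chroma[k - 1] if k > 0 else c     # repeat at top edge
        below = chroma[k + 1] if k < last else c  # repeat at bottom edge
        # Upper output row: 75% current row, 25% the row above.
        out.append((3 * c + above + 2) >> 2)
        # Lower output row: 75% current row, 25% the row below.
        out.append((3 * c + below + 2) >> 2)
    return out


column = upsample_420_to_422_column([100, 200])
```

Read in source order, the weights on consecutive row pairs alternate between 25:75 and 75:25, which is the pattern described above.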

One alternative would be a tent filter, using weights of 1, 2, 3, 4, 3, 2, 1 for neighboring pixels. The weights sum up to 16, so it should perform well, too, but of course it'd be noticeably slower than bilinear upsampling because you need to average many more values together.
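
A small Python sketch of such a 7-tap tent kernel, applied as a plain weighted average along one row. How the taps would be mapped onto the upsampled chroma grid is not specified in the post, so this only shows the kernel itself; the function name, the rounding offset and the edge clamping are assumptions.

```python
TENT_WEIGHTS = (1, 2, 3, 4, 3, 2, 1)  # sums to 16, so normalising is a 4-bit shift


def tent_filter(samples):
    """Apply the 7-tap tent filter to a row, clamping indices at the edges."""
    last = len(samples) - 1
    out = []
    for i in range(len(samples)):
        acc = 8  # rounding offset (half of 16)
        for tap, weight in enumerate(TENT_WEIGHTS):
            j = min(max(i + tap - 3, 0), last)
            acc += weight * samples[j]
        out.append(acc >> 4)
    return out


impulse = tent_filter([0, 0, 0, 16, 0, 0, 0])
```

Seven multiply-adds per output sample, compared with two for the bilinear case, is where the extra cost comes from.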
Old 24th August 2011, 17:40   #5057  |  Link
Mercury_22
Registered User
 
Join Date: Dec 2007
Posts: 1,138
Quote:
Originally Posted by nevcairiel View Post
The GPU hardware decoder does not show up in the GPU usage %, so the test is really pointless...
Oops, I didn't know that. Then what does GPU usage % show? GPU - UVD - AVP?
__________________
Intel UHD Graphics 750; Win 10 22H2

Last edited by Mercury_22; 24th August 2011 at 17:43.
Old 24th August 2011, 17:50   #5058  |  Link
pirlouy
_
 
Join Date: May 2008
Location: France
Posts: 692
Quote:
Originally Posted by madshi View Post
@pirlouy, I'm not really sure what you're trying to say. We've given you two good reasons why we think the MS VC-1 decoder is the better default decoder: (1) Speed. (2) Interlaced decoding capability. So you're personally not interested in either of these advantages. OK, but what is your point? You want us to use the ffmpeg VC-1 decoder as default, although it's slower and less capable? That doesn't make any sense.
My point has always been to understand why you prefer the MS decoder.
I'm sure you understand these things better than I do, and of course I'm not criticizing your choice. But you said the MS decoder uses less CPU, which is not what I've noticed in my case.
And yes, since ffmpeg works well for a lot of things (splitter, decoders), I prefer to rely on it, especially if the only difference (for me) is support for interlaced files. I also like the fact that it's an active project. It's the same kind of reasoning I apply when using LAV Splitter instead of Haali Splitter, for example.

But it's ok, I've understood your points.
Old 24th August 2011, 17:54   #5059  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,361
Quote:
Originally Posted by Mercury_22 View Post
Oops, I didn't know that. Then what does GPU usage % show? GPU - UVD - AVP?
Just raw 3D GPU usage. On AMD, you cannot really view the usage of the decoder. On NVIDIA, GPU-Z shows the video decoder usage.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 24th August 2011, 18:01   #5060  |  Link
madshi
Registered Developer
 
Join Date: Sep 2006
Posts: 9,140
Quote:
Originally Posted by pirlouy View Post
you said MS decoder uses less CPU, which is not what I've noticed in my case.
On my PC the MS decoder is significantly faster than ffmpeg. I can't get Blu-Rays to play smoothly with ffmpeg on my (rather old/slow) PC, while the MS decoder plays most Blu-Rays just fine for me.