Old 16th January 2012, 11:00   #561  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by nevcairiel View Post
The old LAV CUVID supported MPEG4-ASP, and LAV Video 0.45 will regain that ability. I may also add it to the DXVA2 decoder for AMD/ATI if I ever feel really bored (it's not implemented in ffmpeg yet, so more work than just flipping a switch)
Good to know.
For Nvidia HW, why isn't it possible to accelerate MPEG4-ASP in DXVA (direct or frame copy)?
Why do you have to use LAV CUVID?

Quote:
Originally Posted by nevcairiel View Post
@VC-1:
It's possible to intercept the calls from the MSDK into DXVA and check what it's doing differently (I used that a lot over the last few days to figure out VC-1 interlaced DXVA2 with other DXVA2 decoders), but since we have the MSDK - why bother? :d
Direct is always faster and more efficient in terms of power consumption - a requirement mainly for laptops.
Especially with Intel's MSDK implementation and 60fps VC-1 clips, there is the same Turbo CPU frequency problem as with 60fps H.264 files, even in normal playback mode.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
NikosD is offline   Reply With Quote
Old 16th January 2012, 11:10   #562  |  Link
betaking
Fantasy Codecs writer
 
betaking's Avatar
 
Join Date: Nov 2007
Location: Yang Zhou,Jiang Su,China
Posts: 392
Quote:
Originally Posted by egur View Post
OK then. Everyone wants deinterlacing + film detection, so I'll start with that. Scaling and other post-processing features will follow.
I won't implement a custom DI myself; I'll use the one supplied by the driver. Personally I've tested it to be better than Nvidia/AMD, but your mileage may vary.
@Betaking
The MSDK doesn't support MPEG4-ASP as far as I know. I don't know about driver support. Does DXVA support this format (any GPU)?
Thanks for the reply.
betaking is offline   Reply With Quote
Old 16th January 2012, 11:13   #563  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,336
Quote:
Originally Posted by NikosD View Post
For Nvidia HW why isn't it possible to accelerate MPEG4-ASP in DXVA ? (Direct or Frame copy)
Why do you have to use LAV CUVID ?
Who says you have to?
DXVA is also possible.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
nevcairiel is offline   Reply With Quote
Old 16th January 2012, 11:17   #564  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Nobody has done it, yet.
Including you.
NikosD is offline   Reply With Quote
Old 16th January 2012, 11:19   #565  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,336
And I already said why I didn't do it yet: ffmpeg doesn't support MPEG4 DXVA2 yet, and implementing that is quite a bit of work for very little benefit. It is, however, planned for some time in the future.

Also, my DXVA2 decoder is only a few days old; I rather focused on issues like VC-1 interlaced DXVA, which required you to use a commercial DXVA decoder up to now. :d

Last edited by nevcairiel; 16th January 2012 at 11:22.
nevcairiel is offline   Reply With Quote
Old 16th January 2012, 11:23   #566  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
So, is PotPlayer's (which is based on FFmpeg) UVD3 support of MPEG4-ASP VLD just their own work?

Or did FFmpeg implement MPEG4-ASP only for ATI's HW?

BTW, the DivX codec itself has DXVA MPEG4-ASP support only for UVD3.

Update:
PotPlayer is free and has been able to accelerate DXVA VC-1 interlaced for at least a year already!
NikosD is offline   Reply With Quote
Old 16th January 2012, 11:26   #567  |  Link
betaking
Fantasy Codecs writer
 
betaking's Avatar
 
Join Date: Nov 2007
Location: Yang Zhou,Jiang Su,China
Posts: 392
Quote:
Originally Posted by NikosD View Post
So, is PotPlayer's (which is based on FFmpeg) UVD3 support just their own work?

Or did FFmpeg implement MPEG4-ASP only for ATI's HW?

BTW, the DivX codec itself has DXVA MPEG4-ASP support only for UVD3.
I don't have UVD3 hardware to test it, but the ArcSoft video codec also supports DXVA MPEG4-ASP only on UVD3!
betaking is offline   Reply With Quote
Old 16th January 2012, 11:33   #568  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,336
Quote:
Originally Posted by NikosD View Post
So, is PotPlayer's (which is based on FFmpeg) UVD3 support of MPEG4-ASP VLD just their own work?

Or did FFmpeg implement MPEG4-ASP only for ATI's HW?
It's not implemented at all; if they added it, it's their own - and they "forgot" to contribute it back to ffmpeg, as the license mandates.

Quote:
Originally Posted by NikosD View Post
PotPlayer is free and has been able to accelerate DXVA VC-1 interlaced for at least a year already!
A decoder limited to one player is not useful to many people.

It's just one example of why their attitude is not really productive. They take open source code, then add their own features, and claim to have more features than everyone else.
If you base your work on open source, it's mandatory to also contribute any changes back to the project, or at least make the changes available to the public. It's not only a "nice thing to do", it's also required by copyright law!

Last edited by nevcairiel; 16th January 2012 at 11:42.
nevcairiel is offline   Reply With Quote
Old 16th January 2012, 11:40   #569  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
So, they look like the "bad guys" of the open source community.
They take things from others, but give nothing back.

I don't know if that's true - I have heard it from others too - but I do know that they have the most complete video player out there, especially regarding DXVA video codec support for all HW (ATI, Nvidia, Intel).

I consider PotPlayer and LAV Filters among the best "new" free multimedia software (MPC-HC and FFmpeg are the "grandfathers").

Last edited by NikosD; 16th January 2012 at 11:44.
NikosD is offline   Reply With Quote
Old 16th January 2012, 14:41   #570  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by NikosD View Post
Thanks for the reply.

So, when you propose to "output native DXVA surfaces", doesn't that involve you in DXVA itself?
You could output native DXVA surfaces through the MSDK?
...
The MSDK outputs Direct3D9 surfaces (which I allocate, BTW). This type of resource is used by DXVA for video frames. The MSDK, according to its documentation, is an abstraction layer on top of DXVA2. BTW, the overhead of the MSDK is extremely low. I add overhead in my decoder (frame copying) so I can impersonate a SW decoder, with the added benefits.

Quote:
Originally Posted by RBG View Post
egur
Hello Eric.

Can you explain a little bit more about scaling and what you mean by that? AFAIK video scaling (chroma/luma upscaling, downscaling) is something that is usually done at the renderer level, and you are developing a decoder. Also, how good is Intel's scaling compared to madVR?
Scaling
Assumption: an image is a discrete representation of the continuous world. This means that pixels (samples) are integrals over an area in the real world, similar to audio samples, which represent an integral over time.

Scaling, AKA resampling, can be described as converting the discrete samples to a continuous signal and getting the value of (actually integrating) the signal at new positions. If you create more sample points from the continuous signal, you up-scale the image (more pixels); if you sample fewer points, you're performing down-scaling.

Signal processing theory describes the process (some signal processing knowledge required):
* Create a continuous signal from the discrete samples. This is done by adding zeros between the samples; the continuous signal is all zeros with spikes where the samples were.
* Low-pass the signal (weaken or eliminate high frequencies).
* Sample the values of the low-passed signal at the new positions.

Actual resampling implementations (nearest neighbor, bi-linear, bi-cubic, Lanczos) do just that. Instead of integrating a signal which is mostly zero, you can simply sample the low-pass function and multiply the sampled values with the original pixels.
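The steps above collapse, in practice, into sampling the low-pass kernel directly at the distances between each output position and the nearby input samples. A minimal pure-Python sketch (function names and the center-aligned coordinate convention are illustrative choices; the triangle kernel shown reproduces bilinear interpolation in 1-D):

```python
import math

def triangle(x):
    # Triangle (tent) kernel; sampling it reproduces linear interpolation.
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0

def resample_1d(samples, new_len, kernel=triangle, support=1.0):
    """Resample a 1-D signal by sampling the low-pass kernel at the
    distance between each output position and nearby input samples."""
    scale = len(samples) / new_len
    out = []
    for i in range(new_len):
        # Position of output sample i in input coordinates (center-aligned).
        pos = (i + 0.5) * scale - 0.5
        lo = math.floor(pos - support)
        hi = math.ceil(pos + support)
        acc = wsum = 0.0
        for j in range(lo, hi + 1):
            w = kernel(pos - j)
            if w != 0.0:
                # Clamp to the image border (edge replication).
                acc += w * samples[min(max(j, 0), len(samples) - 1)]
                wsum += w
        out.append(acc / wsum)
    return out
```

For example, `resample_1d([0.0, 10.0], 4)` up-scales two samples to four by weighting each pair of neighbors with the tent kernel, and passing the original positions through unchanged.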

When down-scaling, the low-pass function must be designed so it removes high frequencies that cannot be represented in the output image. Every discrete signal has a Nyquist frequency, which is half its sampling rate (one for horizontal and one for vertical).

From a signal processing point of view, the perfect low-pass filter is a sinc (sin(x)/x). A sinc will clip all high frequencies and retain the amplitude (strength) of the low frequencies.

For performance reasons, the number of samples used to derive a new pixel value is limited. This is called the sampling window width.
For down-scaling the sinc would be perfect, if all the pixels in the input image were used to create each and every output pixel.

For up-scaling things are not so easy. Using a sinc, or a trimmed and modified version of it (Lanczos), will result in ripples (ringing) near edges. This is unpleasing to the eye (false edges and mosquito noise).

A variety of sampling functions exist; they are always a compromise between performance, sharpness and artifacts:
* Lanczos is the sharpest, but exhibits strong edge artifacts. The more taps used (larger sampling window), the sharper the output, with more artifacts.
* Bi-cubic - fewer artifacts, less sharp.
* Bi-linear - not sharp, geometric artifacts.
* Nearest neighbor - sharp, heavy geometric artifacts.
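For reference, the Lanczos kernel in the list above is just a truncated, windowed sinc; a small illustrative sketch (the parameterization by `a` is the standard textbook form, not any particular product's implementation):

```python
import math

def lanczos(x, a=2):
    """Lanczos-a kernel: a sinc low-pass windowed by a wider sinc and
    truncated to |x| < a. 2*a input taps contribute to each output
    sample, so a=4 corresponds to an 8-tap Lanczos4."""
    if x == 0.0:
        return 1.0          # peak of the sinc
    if abs(x) >= a:
        return 0.0          # truncated outside the window
    px = math.pi * x
    # sinc(x) * sinc(x/a), written out: a*sin(pi x)*sin(pi x / a) / (pi x)^2
    return a * math.sin(px) * math.sin(px / a) / (px * px)
```

The kernel is 1 at the source sample, (near) zero at every other integer offset, and oscillates in between - those negative lobes are what produce both the extra sharpness and the ringing near edges.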

There are some sampling algorithms that work a little differently. They guess the value of missing samples by some kind of heuristics or statistics (e.g. the NEDI algorithm). They are computationally very heavy, and the results are not worth the effort.

SandyBridge's advanced video scaler has a different approach: a context adaptive scaler.
It uses a Lanczos4 scaler (8 taps) to create very sharp images. To avoid most of the artifacts, it performs an analysis of the area and blends between the sharp scaler and a smooth scaler, depending on whether the analysis decided the target pixel is prone to artifacts.

Context adaptive scaling is not a new idea, but this implementation's quality and performance are probably among the best.
Some companies perform context adaptive scaling using a different paradigm - use a soft scaler like bi-cubic and apply a post-processing sharpness filter on edges that were very strong in the source image.
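The blend step of a context adaptive scaler can be sketched in a few lines. This is an illustrative toy, not Intel's actual analyzer (which is described above as a trade secret); the contrast-based score and its threshold are arbitrary assumptions:

```python
def adaptive_blend(sharp, smooth, score):
    """Per-sample blend of a sharp and a smooth scaler output.
    score[i] in [0, 1]: 0 = safe area (keep the sharp result),
    1 = artifact-prone area (fall back to the smooth result)."""
    return [sh * (1.0 - s) + sm * s
            for sh, sm, s in zip(sharp, smooth, score)]

def artifact_score(samples, i, threshold=64.0):
    """Toy analysis step: local contrast around sample i, clamped to
    [0, 1]. High contrast (a hard edge) means ringing is likely, so
    the blend leans toward the smooth scaler there."""
    lo = max(i - 1, 0)
    hi = min(i + 1, len(samples) - 1)
    contrast = max(samples[lo:hi + 1]) - min(samples[lo:hi + 1])
    return min(contrast / threshold, 1.0)
```

The point of the design is that the expensive sharp path runs everywhere, and the analysis only decides, per pixel, how much of it is allowed to show through.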

BTW, these tricks are used only for upscaling. For downscaling, the optimal filter is Lanczos for a given sampling window size.

Regarding luma and chroma scaling: luma is the grey-level part of the image (called Y) and chroma is the color information (called UV or CbCr). In the YUV color space, with which most videos are encoded, the UV color components are usually at a lower resolution and thus not fully aligned with the luma (Y) component. A scaling algorithm must make sure that chroma scaling produces a pleasing result. Most of the time, chroma values are resampled using a softer scaler (a bi-cubic variant).
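As a concrete example of that resolution mismatch, the plane geometry of common 4:2:0 video (a trivial sketch; the function name is illustrative):

```python
def yuv420_plane_dims(width, height):
    """Plane dimensions for 4:2:0 video: chroma (U, V) is sampled at
    half the luma resolution both horizontally and vertically, so each
    chroma plane holds a quarter of the samples of the luma (Y) plane.
    Odd dimensions round up, as decoders pad to full chroma samples."""
    cw, ch = (width + 1) // 2, (height + 1) // 2
    return {"Y": (width, height), "U": (cw, ch), "V": (cw, ch)}
```

So for 1080p the scaler is really working on one 1920x1080 plane and two 960x540 planes, each of which must be resampled to the target size with its own kernel.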

madVR currently implements a wide variety of scaling algorithms, all of them known textbook algorithms, and allows selecting different algorithms for Y and UV scaling so users can get the results they like best.
Since there's usually a trade-off between sharpness and various artifacts, some users will sacrifice one for the other.
__________________
Eric Gur,
Processor Application Engineer for Overclocking and CPU technologies
Intel QuickSync Decoder author
Intel Corp.

Last edited by egur; 16th January 2012 at 16:01.
egur is offline   Reply With Quote
Old 16th January 2012, 16:07   #571  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by egur View Post
The MSDK outputs Direct3D9 surfaces (which I allocate, BTW). This type of resource is used by DXVA for video frames. The MSDK, according to its documentation, is an abstraction layer on top of DXVA2. BTW, the overhead of the MSDK is extremely low. I add overhead in my decoder (frame copying) so I can impersonate a SW decoder, with the added benefits.
It's clear now.

But then, I think it is extremely easy to implement a "direct" DXVA decoder through the MSDK, just by sending the decoded frame directly to the EVR renderer.

So, I think the "direct" - through MSDK - DXVA decoder can be implemented earlier and more easily than the video scaling features.
NikosD is offline   Reply With Quote
Old 16th January 2012, 16:16   #572  |  Link
CruNcher
Registered User
 
CruNcher's Avatar
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Quote:
Originally Posted by egur View Post
OK then. Everyone wants deinterlacing + film detection, so I'll start with that. Scaling and other post-processing features will follow.
I won't implement a custom DI myself; I'll use the one supplied by the driver. Personally I've tested it to be better than Nvidia/AMD, but your mileage may vary.

@CruNcher
The scaling algorithm in SandyBridge is superior to both Nvidia and AMD in both upscaling and downscaling. Since I invented the algorithm for the video scaler, I have deep knowledge of the matter. Like many other parts of the video engine, it's implemented as an ASIC and has very high performance.

@NikosD
I have relatively little knowledge of DXVA and what the driver supports or not. Frankly, I don't want to deep-dive into the matter; I'd rather have a tooth pulled out.
That's why I use the Media SDK: it simplifies HW video decode/processing significantly, and also adds encoding as a bonus.
This means that dealing with DXVA is handled by the Media SDK developers and not me.
Some features may be possible when using native DXVA that don't exist in the MSDK, but I can live with that.
Anyway, complaints about the driver (or feature requests) should be posted in the driver forum.

@Betaking
The MSDK doesn't support MPEG4-ASP as far as I know. I don't know about driver support. Does DXVA support this format (any GPU)?
I fully believe you, though that wasn't quite the question on the scaling - it was more: how does it compare to NNEDI3?

Quote:
There are some sampling algorithms that work a little differently. They can guess the value of missing samples by some kind of heuristics or statistics (e.g. NEDI algorithm). They are computationally very heavy and the results are not worth the effort.
Yes, extremely slow - and that's the question: how does yours compare in performance/quality, being implemented in hardware and even adaptive?

Quote:
SandyBridge's adavnced video scaler has a different approach. A context adaptive scaler.
It will use a Lanczos4 scaler (8 taps) in order to create very sharp images. In order to avoid (most) of the artifacts, it will perform an analysis of the area and blend between the sharp scaler and a smooth scaler depending if the analysis thought the target pixel is prone to artifacts.
It sounds good on paper (did you ever release a paper?), and surely Intel wouldn't have bought it if it hadn't looked valuable to them.
__________________
all my compares are riddles so please try to decipher them yourselves :)

It is about Time

Join the Revolution NOW before it is to Late !

http://forum.doom9.org/showthread.php?t=168004

Last edited by CruNcher; 16th January 2012 at 16:44.
CruNcher is offline   Reply With Quote
Old 16th January 2012, 16:32   #573  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,336
Quote:
Originally Posted by NikosD View Post
But then, I think it is extremely easy to implement a "direct" DXVA decoder through the MSDK, just by sending the decoded frame directly to the EVR renderer
It's not that easy; there are a number of annoying factors to deal with. Personally, I think the copy-back solution is easier, which is why I started with it.
nevcairiel is offline   Reply With Quote
Old 16th January 2012, 16:51   #574  |  Link
RBG
Registered User
 
Join Date: Oct 2011
Posts: 108
egur

Thanks for your reply, I appreciate it a lot. I want to clear something up: will SB scaling work on hybrid systems, and what are the conditions for it? For example, there is no real display connected to my Intel HD graphics; I made a fake one, like you suggested here.
RBG is offline   Reply With Quote
Old 16th January 2012, 17:05   #575  |  Link
CruNcher
Registered User
 
CruNcher's Avatar
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
@Eric
As this is one of your professions, I would like to advise you that we also have Robidoux, Madshi, Tritical and others on Doom9 who research this field over in the Avisynth area (I know you are fully occupied with the decoder, but maybe as soon as you get to the scaling implementation you could say hello)

http://forum.doom9.org/showthread.php?t=160038
http://forum.doom9.org/showthread.php?t=145358
http://forum.doom9.org/showthread.php?t=160610
http://forum.doom9.org/showthread.php?t=154143

Quote:
Context adaptive scaling is not a new idea but this implementation's quality and performance are probably one the best.
Some companies perform context adaptive scaling using a different paradigm - use a soft scaler like bi-cubic and perform post processing sharpness filter on edges that were very strong in the source image.
This is also what I currently prefer in real time and use in my Avisynth framework via GPU shaders, though not with bi-cubic.
I would really love to see your implementation's results, especially the speed, being a native ASIC and not a shader - though copy-back will hit that again.
But yeah, I voted for deinterlacing + IVTC, and those are more important for now.

Last edited by CruNcher; 16th January 2012 at 19:19.
CruNcher is offline   Reply With Quote
Old 16th January 2012, 17:08   #576  |  Link
STaRGaZeR
4:2:0 hater
 
Join Date: Apr 2008
Posts: 1,302
Good post right there, egur.
__________________
Specs, GTX970 - PLS 1440p@96Hz
Quote:
Originally Posted by Manao View Post
That way, you have xxxx[p|i]yyy, where xxxx is the vertical resolution, yyy is the temporal resolution, and 'i' says the image has been irremediably destroyed.
STaRGaZeR is offline   Reply With Quote
Old 16th January 2012, 20:42   #577  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
@CruNcher
SandyBridge's Advanced Video Scaler (AVS) is a programmable fixed-function scaler (ASIC) used either by the EVR or by renderers from CyberLink, ArcSoft and maybe other companies.
I've confirmed that the Media SDK uses the AVS for scaling. Older GPUs had simpler scalers.

I didn't release a paper/patent since the actual implementation (the analysis part) is a trade secret. But again, context adaptive scaling (or context adaptive algorithms in general) is not new.

The performance of the AVS varies with GPU clock speed, but it can handle several 1080p60 streams simultaneously.

The best way to test upscaling is by scaling DVD resolution to 1080p (720p to 1080p is a small scale factor). Downscaling can be checked by shrinking the player/renderer window while playing test patterns - look for aliasing.
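A classic test pattern for exactly this check is a zone plate: a radial chirp whose spatial frequency rises toward the edges. Downscaling it without an adequate low-pass filter shows aliasing as false low-frequency rings (moire). A small sketch (the frequency scaling constant is an arbitrary assumption):

```python
import math

def zone_plate(size):
    """Square zone-plate pattern, values in [0, 1]. Spatial frequency
    grows with the squared distance from the center, so the outer area
    approaches Nyquist and exposes a scaler's aliasing behavior."""
    half = size / 2.0
    img = []
    for y in range(size):
        row = []
        for x in range(size):
            r2 = (x - half) ** 2 + (y - half) ** 2
            # cos of a quadratically increasing phase = radial chirp
            row.append(0.5 + 0.5 * math.cos(math.pi * r2 / size))
        img.append(row)
    return img
```

Rendering this at full size and then shrinking the player window makes any cheap point-sampling downscaler produce obvious ghost rings away from the center.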

@RBG
EVR will use the video processing features (DI, scaling, etc.) available on the GPU connected to the screen showing the video. So with hybrid setups, you get what AMD/Nvidia gives you.
egur is offline   Reply With Quote
Old 16th January 2012, 21:28   #578  |  Link
RBG
Registered User
 
Join Date: Oct 2011
Posts: 108
Quote:
Originally Posted by egur View Post
@RBG
EVR will use the video processing features (DI, scaling, etc.) available on the GPU connected to the screen showing the video. So with hybrid setups, you get what AMD/Nvidia gives you.
Ah... That's sad.

Now I don't understand what you meant by writing:

What should be the next big feature?
* HW Video processing: deinterlacing, film detection (3:2, 2:2 pulldowns, etc), noise reduction, sharpness, scaling, etc.


Internal HW deinterlacing, edge enhancement - all of that works in LAV Video (CUVID) - and when I saw "scaling" in your list, I thought you were going to implement it somehow in the decoder itself; that is why I asked you about it.
RBG is offline   Reply With Quote
Old 16th January 2012, 22:02   #579  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by RBG View Post
Ah... That's sad.

Now I don't understand what you meant by writing:

What should be the next big feature?
* HW Video processing: deinterlacing, film detection (3:2, 2:2 pulldowns, etc), noise reduction, sharpness, scaling, etc.


Internal HW deinterlacing, edge enhancement - all of that works in LAV Video (CUVID) - and when I saw "scaling" in your list, I thought you were going to implement it somehow in the decoder itself; that is why I asked you about it.
I do plan to implement it internally, and it will work on Intel HW regardless of the renderer. That's the beauty of the QS decoder's design: it cares very little about the renderer.
The decoder is (traditionally) not responsible for scaling; the renderer is. But I can expose the feature anyway, like ffdshow does (scale to a fixed resolution).
egur is offline   Reply With Quote
Old 16th January 2012, 23:40   #580  |  Link
RBG
Registered User
 
Join Date: Oct 2011
Posts: 108
Quote:
Originally Posted by egur View Post
I do plan to implement it internally, and it will work on Intel HW regardless of the renderer. That's the beauty of the QS decoder's design: it cares very little about the renderer.
The decoder is (traditionally) not responsible for scaling; the renderer is. But I can expose the feature anyway, like ffdshow does (scale to a fixed resolution).
Yes, scaling is done by the renderer, I know that, but IMO it brings some inconveniences, especially on hybrid systems. That means if I want a high quality picture, I should either stick with madVR, which is obviously not very stable, or use the SB HW scaler, which is limited to vanilla EVR and needs a physical display connection. In this situation internal HQ HW scaling could be a real option. It would be totally awesome if you implemented this feature someday. Also, I wonder if it is possible to make internal HW scaling work dynamically, like it does with the EVR - scale to the actual window size?
RBG is offline   Reply With Quote