Old 2nd April 2011, 18:28   #41  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
It seems that UVD 3 has the same performance as UVD 2.2, or worse in minimum framerate.

I have to say the results, in terms of raw performance of UVD 3, are a little disappointing.

Thanks renq.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
Old 2nd April 2011, 18:54   #42  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
ATI/AMD never claimed that performance is better, just that the feature set was enhanced; the same goes for Nvidia's side.

It seems the 4 Girls stream performs practically the same on UVD 2.2 (11) PowerDVD DXVA 55/57/79) as on Nvidia VP2: it doesn't reach the full 60 fps, so no big difference. Though I reach only 50 fps in real playback, without benching with DXVAChecker, and that is on XP with VMR7 windowed. Let's see what DXVAChecker says.
__________________
all my compares are riddles so please try to decipher them yourselves :)

It is about Time

Join the Revolution NOW before it is to Late !

http://forum.doom9.org/showthread.php?t=168004

Last edited by CruNcher; 2nd April 2011 at 19:17.
Old 2nd April 2011, 19:18   #43  |  Link
pirlouy
_
 
Join Date: May 2008
Location: France
Posts: 692
Thanks NikosD and CruNcher for the enlightenment. I admit I don't know everything about DXVA, that's why my first post was a question. :-)

I know it's a bit of a pity not to use this module (UVD in my case). I don't know whether the module itself or the driver has a problem, but in my case it's safer to use the CPU (no dropped frames).

Are there any OpenCL video decoders on the market? It may be a young technology/language, but it could be better than CUDA (for general users, not necessarily Nvidia users), since it's not reserved for one company...
Old 2nd April 2011, 19:27   #44  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
Quote:
Originally Posted by pirlouy View Post
Are there any OpenCL video decoders on the market? It may be a young technology/language, but it could be better than CUDA (for general users, not necessarily Nvidia users), since it's not reserved for one company...
That's a common misconception. A CUDA video decoder does not use CUDA to decode. CUDA offers a special API to access the video decoder: you just "use" CUDA to access that API, the decoder is not written in CUDA. That's why I named my decoder CUVID (the name of the API), and try to avoid using the term CUDA in general when talking about it.

Because of this, OpenCL does not qualify. The "common" API on Windows is DXVA. Sadly it does come with limitations, but because it's mostly meant for playback, I don't see the vendors investing in a common API anytime soon.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 2nd April 2011, 19:29   #45  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
The situation looks like the one with MLAA.

ATI introduced MLAA as a Radeon 6xxx series feature only, but later added it to the 5xxx series, too.

When ATI introduced UVD 2.2 they said it had all the features that UVD 3 has: full MPEG2 acceleration, MPEG4 ASP support (DivX, Xvid).

I'm pretty sure that UVD 3 & UVD 2.2 are the same hardware.

They decided to expose the "extra" features of both versions on UVD 3 only.

Please, ATI fanboys, don't respond to that guess. It's just an evil thought of mine.

But I don't think they will act the same way as with MLAA.

Last edited by NikosD; 2nd April 2011 at 19:31.
Old 2nd April 2011, 19:39   #46  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by nevcairiel View Post
That's a common misconception. A CUDA video decoder does not use CUDA to decode. CUDA offers a special API to access the video decoder: you just "use" CUDA to access that API, the decoder is not written in CUDA. That's why I named my decoder CUVID (the name of the API), and try to avoid using the term CUDA in general when talking about it.

Because of this, OpenCL does not qualify. The "common" API on Windows is DXVA. Sadly it does come with limitations, but because it's mostly meant for playback, I don't see the vendors investing in a common API anytime soon.
This is not exactly right.

The latest OpenCL 1.1 specification has added direct calls to DXVA, like CUDA.
I'm not a developer, so I don't know the exact differences between CUDA and OpenCL access to DXVA, but I'm pretty sure that OpenCL will eventually catch up with CUDA in DXVA access, if it hasn't done so already.

But I'm not aware of any "OpenCL" decoder.
Old 2nd April 2011, 20:11   #47  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
Quote:
Originally Posted by NikosD View Post
This is not exactly right.
Oh, but it is.

Quote:
Originally Posted by NikosD View Post
The latest OpenCL 1.1 specification has added direct calls to DXVA, like CUDA.
It does not.

OpenCL 1.1 spec: http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf

Go find me any reference to video decoding.

ATI is trying to be smart and invented an API called "OpenVideo Decode", but it's ATI-only and not "Open", and it's still in its very early days. (It's also not related to OpenCL, although it has interop with OpenCL like CUVID has with CUDA.)

Last edited by nevcairiel; 2nd April 2011 at 20:13.
Old 2nd April 2011, 20:20   #48  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
VP2 WIN XP SP3 Forceware 270.51 9800 GT

4 Girls

DXVA:

Renderer: Video Mixing Renderer 7
Decoder: CyberLink Video Decoder
Decoder Device: ModeH264_VLD_NoFGT
Processor Device: -
Time: 03:47.269
Average FPS: 53,012
Min/Max FPS: Min: 50 Max: 58

Renderer: Video Mixing Renderer 9
Decoder: CyberLink Video Decoder
Decoder Device: ModeH264_VLD_NoFGT
Processor Device: -
Time: 03:47.192
Average FPS: 53,030
Min/Max FPS: Min: 50 Max: 58

Renderer: Video Mixing Renderer 7
Decoder: CoreAVC Video Decoder
Decoder Device: ModeH264_VLD_NoFGT
Processor Device: -
Time: 03:53.328
Average FPS: 51,635
Min/Max FPS: Min: 47 Max: 56

Renderer: Video Mixing Renderer 9
Decoder: CoreAVC Video Decoder
Decoder Device: ModeH264_VLD_NoFGT
Processor Device: -
Time: 03:52.397
Average FPS: 51,842
Min/Max FPS: Min: 48 Max: 56

Nvcuvid via Dshow Surprise Surprise (not really)

Renderer: Video Mixing Renderer 7
Decoder: CoreAVC Video Decoder
Decoder Device: -
Processor Device: -
Time: 04:02.820
Average FPS: 49,617
Min/Max FPS: Min: 44 Max: 55

Renderer: Video Mixing Renderer 9
Decoder: CoreAVC Video Decoder
Decoder Device: -
Processor Device: -
Time: 04:02.846
Average FPS: 49,612
Min/Max FPS: Min: 44 Max: 54

Renderer: Video Mixing Renderer 7
Decoder: LAV CUVID Decoder
Decoder Device: -
Processor Device: -
Time: 04:24.156
Average FPS: 45,609
Min/Max FPS: Min: 43 Max: 52

Renderer: Video Mixing Renderer 9
Decoder: LAV CUVID Decoder
Decoder Device: -
Processor Device: -
Time: 04:23.916
Average FPS: 45,651
Min/Max FPS: Min: 43 Max: 53

Renderer: Video Mixing Renderer 7
Decoder: CUDA Video Decoder
Decoder Device: -
Processor Device: -
Time: 04:41.867
Average FPS: 42,733
Min/Max FPS: Min: 40 Max: 46

Renderer: Video Mixing Renderer 9
Decoder: CUDA Video Decoder
Decoder Device: -
Processor Device: -
Time: 04:41.806
Average FPS: 42,742
Min/Max FPS: Min: 40 Max: 46

CPU:

Renderer: Video Mixing Renderer 7
Decoder: CoreAVC Video Decoder
Decoder Device: -
Processor Device: -
Time: 00:56.136
Average FPS: 214,622
Min/Max FPS: Min: 200 Max: 234

Renderer: Video Mixing Renderer 9
Decoder: CoreAVC Video Decoder
Decoder Device: -
Processor Device: -
Time: 00:55.837
Average FPS: 215,771
Min/Max FPS: Min: 201 Max: 248
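
The Time and Average FPS fields in the results above are tied together: multiplying them gives the number of frames that were decoded. A minimal Python sketch of that check, assuming the comma in the FPS values is a decimal separator and that the 4 Girls stream targets 60 fps as mentioned earlier in the thread; the function names are made up for illustration:

```python
def parse_time(text):
    """Convert a 'MM:SS.mmm' benchmark time into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def parse_fps(text):
    """Convert a comma-decimal value such as '53,012' into a float."""
    return float(text.replace(",", "."))


def frame_count(time_text, fps_text):
    """Estimate the decoded frame count as duration * average fps."""
    return round(parse_time(time_text) * parse_fps(fps_text))


def realtime_ratio(fps_text, target_fps=60.0):
    """Fraction of the target frame rate reached (1.0 means real time)."""
    return parse_fps(fps_text) / target_fps


if __name__ == "__main__":
    # (time, average fps) pairs copied from two of the VMR7 results above
    runs = {
        "CyberLink DXVA": ("03:47.269", "53,012"),
        "CoreAVC CPU": ("00:56.136", "214,622"),
    }
    for name, (t, fps) in runs.items():
        print(name, frame_count(t, fps), f"{realtime_ratio(fps):.2f}x")
```

Both sample runs come out at about 12048 frames, and the CyberLink DXVA run reaches roughly 0.88 of real time.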

Last edited by CruNcher; 2nd April 2011 at 20:35.
Old 2nd April 2011, 20:23   #49  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
I meant this:

http://developer.amd.com/gpu/amdapps...s/default.aspx

"Support for UVD video hardware component through OpenCL"

I thought it was an OpenCL 1.1 feature, not ATI-only.

I'm pretty sure that if ATI did it, Khronos will also do it in a future spec.

What do you think?

Last edited by NikosD; 2nd April 2011 at 20:31.
Old 2nd April 2011, 20:52   #50  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
I don't think it'll be standardized in OpenCL.
Old 2nd April 2011, 21:05   #51  |  Link
pirlouy
_
 
Join Date: May 2008
Location: France
Posts: 692
Of course I don't have nevcairiel's knowledge, but to me OpenCL is better than CUDA because, even if you seem to disagree, it seems more open than CUDA.
And if people start to develop applications for CUDA, it's the beginning of dependence on one company.

Imagine if there were a C for Intel and a C for AMD: it would be tedious for everyone.

From what I've read, OpenCL can use DXVA (with its limitations), but it seems it's up to the GPU companies to allow direct connections between OpenCL and their H.264/VC1 etc. modules. And from NikosD's link, it seems that's now the case with AMD.

But indeed, all this is young and probably has a lot of bugs. In a few years, though, I hope there won't be many applications "optimized for Nvidia"....
Old 3rd April 2011, 10:34   #52  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
The only reason for Nvidia never to implement an "OpenCL" video decoder solution is CUDA.
If Nvidia had the choice, they would prefer not to implement anything in OpenCL. CUDA is their exclusive API, which they try to spread everywhere and to everyone in order to sell more cards.

Eventually we will see an "OpenCL" video decoder, but for ATI only.
Old 3rd April 2011, 11:33   #53  |  Link
pirlouy
_
 
Join Date: May 2008
Location: France
Posts: 692
I think they support OpenCL, but through CUDA (OpenCL -> CUDA -> hardware). So it should work, but not as efficiently as with a direct "connection". But at least they try...
Old 4th April 2011, 04:50   #54  |  Link
thuan
Registered User
 
Join Date: Sep 2005
Location: Vietnam, HCM City
Posts: 262
I run everything on my home system through a UPS, and according to its software my power usage with accelerated DXVA playback and with ffdshow plus EVR-CP is not that different, hovering around 123 W and 125 W respectively. This, and the problems I have with accelerated video playback, are more than enough for me to abandon it.
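
To put those two readings in perspective, here is a back-of-the-envelope calculation in Python. The 123 W and 125 W figures are the ones quoted above; the two-hour film length is purely an assumption for illustration, and the function names are hypothetical.

```python
def energy_saved_wh(software_watts, accelerated_watts, hours):
    """Energy difference in watt-hours between two playback modes."""
    return (software_watts - accelerated_watts) * hours


def relative_saving(software_watts, accelerated_watts):
    """Saving as a fraction of the software-decoding power draw."""
    return (software_watts - accelerated_watts) / software_watts


if __name__ == "__main__":
    dxva_w, ffdshow_w = 123.0, 125.0  # whole-system readings from the UPS software
    film_hours = 2.0                  # assumed film length, not from the post
    print(f"{energy_saved_wh(ffdshow_w, dxva_w, film_hours):.1f} Wh saved per film")
    print(f"{relative_saving(ffdshow_w, dxva_w):.1%} lower power draw")
```

On these numbers the saving is 4 Wh per film, or 1.6% of the whole-system draw.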
__________________
Home PC: Core i5-2400, 8GB RAM, nVidia GTX560Ti, Windows 7 64bit SP1.
Work PC: Intel Xeon X3220 (Core 2 Q6600), 4GB RAM, Intel G45, Windows 7 64bit SP1.

Last edited by thuan; 4th April 2011 at 04:58.
Old 4th April 2011, 11:01   #55  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Hmm, using FFmpeg-MT, CoreAVC or DiAVC? And yes, it's a valid question if you only look at playback power consumption, especially as CPUs become more and more energy efficient. Nobody has yet measured how much energy, for example, CoreAVC needs for playback on SB with the integrated GPU not doing anything else, versus the GPU doing the decode. In this non-discrete case, with the logic combined on one die, it's almost clear that GPU decoding would save a lot of energy even compared to a highly efficient decoder like CoreAVC. (It could look completely different for a discrete system, where you have base idle consumption + decoder DSP consumption + the consumption of the card's other components while decoding.)
For other system configurations it would lose efficiency, depending on several factors. Still, a desktop system with a discrete GPU is mostly also used for gaming, so the GPU is active either way, including logic you may not need, and there is nothing wrong with using it to save some power in certain tasks or just to offload the CPU. And yes, decoding alone can be a very small power saving depending on the system, but if you combine more and more tasks smartly and efficiently you can save a lot.

Last edited by CruNcher; 4th April 2011 at 11:22.
Old 4th April 2011, 11:08   #56  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
Of course, as CPUs become more powerful and efficient, the difference will keep getting smaller. However, a lot of people have fairly weak CPUs in their dedicated HTPCs, which might just manage to decode in software, but only at nearly full load, which adds noise and heat.
Or people want to do post-processing (OK, which does not work with DXVA, but does work with, for example, CUVID-based decoders), and need the CPU for that.

There will always be valid use-cases for accelerated decoding, just not for everyone.
Old 4th April 2011, 11:44   #57  |  Link
thuan
Registered User
 
Join Date: Sep 2005
Location: Vietnam, HCM City
Posts: 262
Agreed, and I was using ffdshow, so ffmpeg-mt. The test video was an H.264 encode of the last part of Planet Earth ep 5, where there are a lot of locusts.
Old 4th April 2011, 12:11   #58  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Quote:
Originally Posted by nevcairiel View Post
Of course, as CPUs become more powerful and efficient, the difference will keep getting smaller. However, a lot of people have fairly weak CPUs in their dedicated HTPCs, which might just manage to decode in software, but only at nearly full load, which adds noise and heat.
Or people want to do post-processing (OK, which does not work with DXVA, but does work with, for example, CUVID-based decoders), and need the CPU for that.

There will always be valid use-cases for accelerated decoding, just not for everyone.
Not entirely true: Roozhou is working on DXVA frame-grabbing stuff which could also be used for post-processing.
True, the first tests don't look as efficient as Nvcuvid, on XP at least (see Roozhou's test: http://forum.doom9.org/showthread.php?t=160371), but test results and real-life results are sometimes nowhere near each other. See my performance issues with LAV CUVID and Haali/madVR: I can't yet explain why I lose so much efficiency compared to VMR on my system configuration, where in theory LAV CUVID + madVR should perform about the same as when rendering to VMR (LAV CUVID on VMR9 renderless also shows problems here). Somewhere there is a bottleneck. It might be in my system configuration, or in how LAV CUVID, CoreAVC CUDA and CUDA Video Decoder provide the decoded frames, I don't know yet. Another explanation could be high-resolution timing.

I mean, if you see this:

[Screenshot: LAV CUVID + MadVR]

[Screenshot: LAV CUVID + VMR9 renderless]

and this, you would ask yourself the same questions:

[Screenshot: Cyberlink DXVA + VMR9 renderless]


Last edited by CruNcher; 4th April 2011 at 13:03.
Old 4th April 2011, 12:26   #59  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
ATI cards especially are *really slow* at downloading the contents of a D3D texture (where the DXVA-decoded image ends up), so it will cut into performance. (This shows especially on XP; the new driver model used by Vista/7 seems to improve it somewhat. But then XP is really getting old, and personally I don't care about any statistics on it.)
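
A rough idea of how much data such a copy-back moves can be worked out in Python. The 1920x1080 resolution and the NV12 surface format (12 bits per pixel) are assumptions for illustration, not something stated in the thread; 60 fps is the rate the 4 Girls stream is said to target. The function names are hypothetical.

```python
def nv12_frame_bytes(width, height):
    """Size of one NV12 frame: full-size luma plane plus half-size interleaved chroma."""
    return width * height * 3 // 2


def readback_bytes_per_second(width, height, fps):
    """Bytes per second needed to copy every decoded frame back to system memory."""
    return nv12_frame_bytes(width, height) * fps


if __name__ == "__main__":
    # Assumed 1080p NV12 surfaces at 60 fps
    rate = readback_bytes_per_second(1920, 1080, 60)
    print(f"{rate / 1e6:.1f} MB/s")
```

Under those assumptions the raw payload is about 187 MB/s, before any driver or synchronization overhead is counted.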

Last edited by nevcairiel; 4th April 2011 at 12:28.
Old 4th April 2011, 16:44   #60  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Could somebody benchmark this on Vista/7 with Forceware 270.51?

DXVA + VMR9 Renderless + Complex Sharpening 2

The 4 Girls stream can be found in the first post.

Don't expect that stream to play back in real time: according to the benchmarks here (also for Vista/7) and my own, neither Nvidia nor ATI hardware can do that (or some overhead doesn't allow them to; could someone with VDPAU on Linux test?). Intel HD 2000/3000 results on Vista/7 are still needed, though, and results from someone with a Broadcom Crystal or any other hardware-based Windows decoder setup would also be great.

Last edited by CruNcher; 4th April 2011 at 18:11.