Doom9's Forum > Video Encoding > New and alternative video codecs
16th October 2011, 21:58 #181
egur
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by TPoise
I'm using MPC, 64-bit edition v1.5.2.3456
Any particular reason to use 64-bit?
Can you try the 32-bit version? My decoder will be a little faster in 32-bit, as I've optimized the copy function in ASM. The 64-bit build uses intrinsic functions, but the compiler isn't 100% efficient with them.
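Background on the ASM-versus-intrinsics remark: 64-bit MSVC does not support inline assembly, so x64 code has to use separate .asm files or compiler intrinsics. The sketch below shows what an intrinsics-based SSE2 copy loop looks like. It is not the decoder's actual copy function; the name `copy_sse2`, the 64-byte unrolling and the requirement that both buffers be 16-byte aligned are assumptions for illustration. It falls back to `memcpy` for the tail and on targets without SSE2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Hypothetical helper: copy n bytes; dst and src must be 16-byte aligned. */
void copy_sse2(void *dst, const void *src, size_t n)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;
#ifdef HAVE_SSE2
    /* 64 bytes per iteration through four 128-bit XMM registers. */
    while (n >= 64) {
        __m128i x0 = _mm_load_si128((const __m128i *)(s + 0));
        __m128i x1 = _mm_load_si128((const __m128i *)(s + 16));
        __m128i x2 = _mm_load_si128((const __m128i *)(s + 32));
        __m128i x3 = _mm_load_si128((const __m128i *)(s + 48));
        _mm_store_si128((__m128i *)(d + 0), x0);
        _mm_store_si128((__m128i *)(d + 16), x1);
        _mm_store_si128((__m128i *)(d + 32), x2);
        _mm_store_si128((__m128i *)(d + 48), x3);
        s += 64;
        d += 64;
        n -= 64;
    }
#endif
    /* Tail bytes (or the whole buffer when SSE2 is unavailable). */
    memcpy(d, s, n);
}
```

With intrinsics, register allocation and instruction scheduling are left to the compiler, which is where a hand-tuned ASM loop can still come out ahead.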
__________________
Eric Gur,
Processor Application Engineer for Overclocking and CPU technologies
Intel QuickSync Decoder author
Intel Corp.
17th October 2011, 17:59 #182
Boltron
Registered User
Join Date: May 2011
Posts: 94
What performance monitor utility are you using that shows Summary, CPU, Memory, GPU and also the GPU Engine History?
17th October 2011, 20:35 #183
LoRd_MuldeR
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 12,811
ProcessExplorer?

And, if you want a more detailed analysis on how many CPU cycles have been spent in each function, you could have a look at Code Analyst:
http://developer.amd.com/tools/CodeA...s/default.aspx

(Although it is an AMD tool, it works just as well on Intel CPUs. Just make sure you use it with a Debug build if you want function names!)
__________________
There was of course no way of knowing whether you were being watched at any given moment.
How often, or on what system, the Thought Police plugged in on any individual wire was guesswork.



Last edited by LoRd_MuldeR; 17th October 2011 at 20:38.
17th October 2011, 21:27 #184
Boltron
Registered User
Join Date: May 2011
Posts: 94
Wow, ProcessExplorer sure looks different from the last time I used it. This is so cool. Thx!
18th October 2011, 01:22 #185
TPoise
Registered User
Join Date: Feb 2005
Posts: 22
Quote:
Originally Posted by egur
Any particular reason to use 64-bit?
Can you try the 32-bit version? My decoder will be a little faster in 32-bit, as I've optimized the copy function in ASM. The 64-bit build uses intrinsic functions, but the compiler isn't 100% efficient with them.
I tried the 32-bit version and get the same video corruption.
19th October 2011, 13:48 #186
egur
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Multi-GPU setup

I managed to solve the multi-GPU problem without cables. You'll need v0.16 or newer to make this work.
1) Set up another (fake) screen: right-click the desktop -> Screen resolution.
2) Click the Detect button. Unconnected screens will appear.
3) Extend the desktop to a VGA connection on the Intel GPU (screen 2 in the image).
4) Drag the 2nd screen to the corner of the primary screen so the mouse boundaries of the primary screen remain (almost) the same.
5) Click OK/Apply. A reboot is recommended.

6) Open your favorite player and select madVR or another GPU-demanding renderer to test the setup. You can test further by selecting EVR as the renderer, opening the control panel for your AMD/Nvidia GPU and overriding the color settings (e.g. kill the saturation).
Here's a working setup

Last edited by egur; 22nd December 2011 at 09:13.
19th October 2011, 14:08 #187
CruNcher
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,950
Awesome, I just love NT6: now we can mix input and output like crazy without needing any third-party solutions. Great work, egur!
I wonder, though: is DXVA also working, or does the decoder need to support it specifically?
And what happens if you open a DXVA session, and where does it get rendered?
__________________
all my compares are riddles so please try to decipher them yourselves :)

It is about Time

Join the Revolution NOW before it is too late!

http://forum.doom9.org/showthread.php?t=168004

Last edited by CruNcher; 19th October 2011 at 14:13.
19th October 2011, 14:53 #188
nevcairiel
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 8,954
Quote:
Originally Posted by CruNcher
I wonder, though: is DXVA also working, or does the decoder need to support it specifically?
And what happens if you open a DXVA session, and where does it get rendered?
You can't easily transfer GPU textures between devices, so if you use DXVA, it needs to be rendered on the same device that decoded it.

Besides, if you already use DXVA, why not use the DXVA of your primary video card?
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
19th October 2011, 16:24 #189
egur
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
New version released 0.16

New and improved version. The zip files contain an installer and documentation; please read them.

Download version 0.16 alpha:
32 bit http://www.multiupload.com/Z4PX2UFGB4
64 bit http://www.multiupload.com/QH5ZZXINCQ
Source code http://www.multiupload.com/06IZWGH4T0

Revision highlights:
v0.16:
* Support for multi-GPU setups. The decoder can now run on different HW than the renderer, even without connecting the Intel GPU to a screen. See Multi GPU below for details.
* This version will be the first version on SourceForge.
* Updated to ffdshow build 3996.
* Some fixes to the timestamp code. Now supporting streams with no frame rate.
* Fixed several aspect ratio issues.
* Very initial support for DVD playback. Menus are not displayed correctly yet (WIP). Not recommended except for testing purposes.
* Changed the mechanism for handling flush & seek events. The code is faster and more robust, a critical step for playing DVDs.
* Added a new callback for FFDShow’s internal decoders – EndFlush. This is needed for DVD playback. Other decoders do not need to implement it.
* Enhanced FFDShow’s code with a faster memcpy function (SSE2 based). This replaces calling memcpy. The original source code would use ffmpeg to do it, but it crashes on NV12 images.
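To make the NV12 remark concrete: an NV12 frame is a full-resolution Y (luma) plane followed by a half-height plane of interleaved U/V bytes, and a locked GPU surface usually has a row pitch wider than the visible width. The sketch below copies such a surface row by row into a tightly packed buffer. It is not ffdshow's code; the function names and the assumption that the UV plane starts immediately after `height` pitched luma rows are illustrative, and plain `memcpy` is used per row for portability where an SSE2 row copy could be substituted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed for a tightly packed NV12 frame (even height assumed). */
size_t nv12_packed_size(size_t width, size_t height)
{
    return width * height + width * (height / 2);
}

/* Hypothetical helper: copy a pitched NV12 surface into a packed buffer.
 * Assumes the UV plane begins right after `height` rows of luma. */
void nv12_copy_pitched(uint8_t *dst, const uint8_t *src,
                       size_t width, size_t height, size_t src_pitch)
{
    const uint8_t *src_uv = src + src_pitch * height;
    uint8_t *dst_uv = dst + width * height;

    /* Luma: one byte per pixel, `height` rows. */
    for (size_t y = 0; y < height; y++)
        memcpy(dst + y * width, src + y * src_pitch, width);

    /* Chroma: interleaved U,V pairs at half vertical resolution, so each
     * row is still `width` bytes wide but there are only height/2 rows. */
    for (size_t y = 0; y < height / 2; y++)
        memcpy(dst_uv + y * width, src_uv + y * src_pitch, width);
}
```

For a 1920x1080 frame, `nv12_packed_size` gives 3,110,400 bytes (12 bits per pixel).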

Last edited by egur; 19th October 2011 at 16:31.
19th October 2011, 16:45 #190
egur
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by CruNcher
Awesome, I just love NT6: now we can mix input and output like crazy without needing any third-party solutions. Great work, egur!
I wonder, though: is DXVA also working, or does the decoder need to support it specifically?
And what happens if you open a DXVA session, and where does it get rendered?
DXVA connections will not cross HW boundaries. Maybe there's a tricky way to do it, but I doubt it's worth the trouble.

My decoder is mostly aimed at low power, but it was a nice problem to solve. I'm not aware of similar solutions.

Since I copy the frames from the GPU to the CPU very quickly, it makes sense to use it with your favorite SW setup. The pipeline is File->CPU->GPU1->CPU->GPU2->Screen.
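A rough way to size the extra hop in that pipeline: each decoded frame is read back from GPU1 into system memory and then uploaded to GPU2, so the copy volume is about two full NV12 frames per displayed frame. The sketch below does that arithmetic; the 1080p60 figures in the test are an assumed example, not measurements from this thread, and the function names are illustrative.

```c
#include <stdint.h>

/* NV12 stores 12 bits per pixel: a full Y plane plus a half-size UV plane. */
uint64_t nv12_frame_bytes(uint64_t width, uint64_t height)
{
    return width * height * 3 / 2;
}

/* Bytes per second moved by `hops` full-frame copies at a frame rate
 * given as the rational fps_num/fps_den (e.g. 60000/1001 for 59.94). */
uint64_t copy_bytes_per_second(uint64_t width, uint64_t height,
                               uint64_t fps_num, uint64_t fps_den,
                               unsigned hops)
{
    return nv12_frame_bytes(width, height) * fps_num / fps_den * hops;
}
```

For 1080p at 60 fps that is 3,110,400 bytes per frame and 373,248,000 bytes per second across the two copies, which is why the speed of the GPU-to-CPU copy matters for this setup.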

This opens up a way for fast HW decoding with super strong programmable video processing on a discrete GPU.

I wish Windows 7 were easier to use in terms of utilizing the various HW resources.
19th October 2011, 17:24 #191
Atak_Snajpera
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 6,228
QuickSync in official ffdshow r4000 would be epic.
19th October 2011, 17:45 #192
ajp_anton
Registered User
Join Date: Aug 2006
Location: Stockholm/Helsinki
Posts: 748
Quote:
Originally Posted by egur
Now it's clear. I'll add a checkbox option to ffdshow's codec page to decline a connection in such cases. The default behavior will be to fall back to libavcodec or another internal decoder.
Does this have any relevance to other formats (VC1, MPEG2)?
What happened?
19th October 2011, 20:33 #193
egur
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by ajp_anton
What happened?
This part wasn't ready for this release. Currently it falls back to libavcodec if the platform can't support QuickSync, or for unsupported H.264 formats.
If something else is unsupported, ffdshow will decline the connection.
I'll fix this for the next release, hopefully after I integrate it into the main ffdshow trunk on SourceForge.
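The current connection policy described above can be summarized as a small decision function. The enum and function names are hypothetical and are not ffdshow's actual code; the function only encodes the three outcomes stated in the post: fall back to libavcodec when the platform lacks QuickSync or the H.264 format is unsupported, and decline the connection when anything else is unsupported.

```c
#include <stdbool.h>

typedef enum { CODEC_H264, CODEC_VC1, CODEC_MPEG2 } codec_t;
typedef enum { USE_QUICKSYNC, USE_LIBAVCODEC, DECLINE_CONNECTION } decision_t;

/* Hypothetical policy mirroring the v0.16 behaviour described above. */
decision_t choose_decoder(bool platform_has_quicksync, codec_t codec,
                          bool format_supported)
{
    if (!platform_has_quicksync)
        return USE_LIBAVCODEC;      /* no QuickSync HW: software path */
    if (format_supported)
        return USE_QUICKSYNC;
    if (codec == CODEC_H264)
        return USE_LIBAVCODEC;      /* unsupported H.264 variant */
    return DECLINE_CONNECTION;      /* e.g. unsupported VC-1 or MPEG-2 */
}
```

The planned checkbox would change the last branch: falling back by default, and declining only when the user asks for it.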
20th October 2011, 16:30 #194
pulbitz
Registered User
Join Date: Sep 2011
Posts: 22
I'm sorry. I don't speak English very well.

Sample files with audio/video out of sync (using Gabest Splitter):

(2011.09.28) Hyun Young 조현영 _A_ @ Gachon University Festival Celebration Fancam(720p_H.264-AAC).mp4
http://o-o.preferred.fra02s05.v5.lsc...89491d982d9386

(2011.10.06) Hyun Young 조현영 _Mach_ @ Gyeonggi University of S&T Festival Fancam(720p_H.264-AAC).mp4
http://o-o.preferred.fra02s05.v7.lsc...30d6979f680c4c

QuickSync = 30.303fps
libavcodec = 29.97xfps

Please improve your timestamp code further.
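One plausible reading of these numbers (a guess, not a diagnosis from this thread): DirectShow timestamps are `REFERENCE_TIME` values in 100-ns units, and 10,000,000 / 330,000 = 30.303, so a decoder that used a flat 33 ms frame duration instead of the exact NTSC duration of 1001/30000 s (about 333,667 units) would report exactly 30.303 fps and stamp each frame progressively earlier than its true time. The sketch below computes that drift; all function names are illustrative.

```c
#include <stdint.h>

#define UNITS_PER_SECOND 10000000LL  /* REFERENCE_TIME: 100-ns units */

/* Start time of frame n for a rational frame rate fps_num/fps_den. */
int64_t frame_start(int64_t n, int64_t fps_num, int64_t fps_den)
{
    return n * UNITS_PER_SECOND * fps_den / fps_num;
}

/* Frame rate implied by a fixed per-frame duration. */
double fps_from_duration(int64_t duration)
{
    return (double)UNITS_PER_SECOND / (double)duration;
}

/* Accumulated drift (100-ns units) after n frames when a fixed duration
 * is used instead of the exact rational frame rate. */
int64_t drift_after(int64_t n, int64_t fixed_duration,
                    int64_t fps_num, int64_t fps_den)
{
    return frame_start(n, fps_num, fps_den) - n * fixed_duration;
}
```

`drift_after(1798, 330000, 30000, 1001)` returns 6,592,666 units, i.e. about 0.66 s after roughly one minute (1798 frames) of 29.97 fps video, which would be very noticeable as A/V desync.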

Last edited by pulbitz; 20th October 2011 at 19:25.
20th October 2011, 21:11 #195
JanWillem32
Registered User
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,084
Quote:
Originally Posted by egur
DXVA connections will not cross HW boundaries. Maybe there's a tricky way to do it, but I doubt it's worth the trouble.

My decoder is mostly aimed at low power, but it was a nice problem to solve. I'm not aware of similar solutions.

Since I copy the frames from the GPU to the CPU very quickly, it makes sense to use it with your favorite SW setup. The pipeline is File->CPU->GPU1->CPU->GPU2->Screen.

This opens up a way for fast HW decoding with super strong programmable video processing on a discrete GPU.

I wish Windows 7 were easier to use in terms of utilizing the various HW resources.
DMA access to GPU memory has been around since forever. (http://en.wikipedia.org/wiki/Direct_memory_access for those that are interested.)
Allocating a buffer explicitly in video memory has always been possible. Proper memory resource management is even a key feature of any graphics rendering engine.
Sharing resources through the DirectX API is relatively new: http://msdn.microsoft.com/en-us/libr...=vs.85%29.aspx and http://msdn.microsoft.com/en-us/libr...=vs.85%29.aspx . The usual DXVA helper device for EVR uses a shared handle system to give the main rendering device access to DXVA output surfaces. The extra device runs mostly asynchronously from the main device.
File->CPU->GPU1->GPU2->Screen is completely allowed, but I don't know what would be faster, a render target on GPU1's memory or on GPU2's memory. Making GPU1 render to system memory or doing an extra copy operation from video memory to system memory will most certainly slow things down.
It's actually not the copy operation itself that's an issue. It's usually the wait for the lock operation. Scheduled transfers without locking surfaces/textures in video memory are a lot more efficient.
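The point about lock waits can be illustrated with a common latency-hiding pattern, a ring of staging buffers: schedule the GPU-to-system-memory transfer for the newest frame, but lock and read only a buffer that was scheduled a couple of frames earlier, by which time its transfer has most likely completed. This generic technique is not described in detail in the thread; the struct, the function names and `RING_SIZE` of 3 are hypothetical, and the Direct3D calls that would schedule the copy and lock the surface are deliberately left out, so only the slot bookkeeping is shown.

```c
#include <stdbool.h>

#define RING_SIZE 3  /* assumed number of staging buffers */

/* Hypothetical bookkeeping for a ring of staging buffers. */
typedef struct {
    long submitted;  /* number of transfers scheduled so far */
} staging_ring;

/* Slot that receives the transfer for the frame being submitted now. */
int ring_submit(staging_ring *r)
{
    int slot = (int)(r->submitted % RING_SIZE);
    r->submitted++;
    return slot;
}

/* Slot scheduled RING_SIZE - 1 frames ago, now safe to lock and read,
 * or -1 while the ring is still filling up. */
int ring_ready_slot(const staging_ring *r)
{
    long lag = RING_SIZE - 1;
    if (r->submitted <= lag)
        return -1;
    long frame = r->submitted - 1 - lag;  /* oldest unread frame */
    return (int)(frame % RING_SIZE);
}
```

Calling `ring_submit` and then `ring_ready_slot` once per frame never reads a slot that is being written. The cost is RING_SIZE - 1 frames of extra latency, and the last frames have to be drained at the end of the stream.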
__________________
development folder, containing MPC-HC experimental tester builds, pixel shaders and more: http://www.mediafire.com/?xwsoo403c53hv
20th October 2011, 21:54 #196
CruNcher
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,950
I got Cyberlink HAM working on Intel. It's basically nothing other than renderless DXVA (not bound to the renderer), which Potplayer's DXVA decoder also makes use of.
The big question is: do we really need APIs from every vendor on NT6 if Microsoft integrated the possibility of using renderless DXVA from the beginning, and why integrate each vendor's own API if one for all exists (in terms of interoperability)?

DXVA Renderless (supports everyone)
AMD OpenVideo (supports AMD)
Intel MediaSDK (supports Intel)
Nvidia Nvcuvid (supports Nvidia)

Is there really such a big performance difference that would justify implementing each vendor's own API for the specific hardware case (or is there even a performance loss from wrapping from A to B)?
Last edited by CruNcher; 20th October 2011 at 22:33.
20th October 2011, 22:24 #197
egur
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by pulbitz
I'm sorry. I don't speak English very well.

Sample files with audio/video out of sync (using Gabest Splitter):

(2011.09.28) Hyun Young 조현영 _A_ @ Gachon University Festival Celebration Fancam(720p_H.264-AAC).mp4
http://o-o.preferred.fra02s05.v5.lsc...89491d982d9386

(2011.10.06) Hyun Young 조현영 _Mach_ @ Gyeonggi University of S&T Festival Fancam(720p_H.264-AAC).mp4
http://o-o.preferred.fra02s05.v7.lsc...30d6979f680c4c

QuickSync = 30.303fps
libavcodec = 29.97xfps

Please improve your timestamp code further.
I know I need to improve the timestamps. I fixed a few things in v0.16, but there's still more work to do...

The links you've posted are not working - "access denied" for both. You can share very quickly on http://www.multiupload.com
20th October 2011, 22:42 #198
egur
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by CruNcher
DXVA Renderless (supports everyone)
AMD OpenVideo (supports AMD)
Intel MediaSDK (supports Intel)
Nvidia Nvcuvid (supports Nvidia)

Is there really such a big performance difference that would justify implementing each vendor's own API, for the specific hardware case?
There's a difference in features (mostly related to video processing), and DXVA is very complex and not high-level enough.
Hopefully this chaos will converge to a single API at some point: a user-friendly API that abstracts enough details while remaining high-performing.
Performance is a very important issue in the mobile world - battery life. Every minute of video playback is worth a lot of R&D, validation and enabling resources.
The architecture "war" with ARM (starting with Windows 8) will probably help push HW acceleration forward on many fronts, so that small devices can compete with ARM-based SoCs.
Nvidia plays both sides of the fence in this war (GPU for x86 platforms as well as ARM CPU maker) so one can expect them to fork out a cross platform API for HW acceleration. This would probably be the best kind of API - abstract the HW completely - no need to be a DirectX expert to do complex stuff.
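The hoped-for single API can be pictured as a function-pointer table that each vendor backend fills in, with the player taking the first backend that reports itself usable. Everything below (the struct, the backend names, the stub probes) is hypothetical; it does not describe DXVA, OpenVideo, Media SDK or NVCUVID and only shows the shape such an abstraction could take.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vendor-neutral decoder interface. */
typedef struct {
    const char *name;
    bool (*probe)(void);                          /* usable on this machine? */
    int  (*decode)(const unsigned char *data, size_t size);
} decoder_backend;

/* Illustrative stubs; real backends would wrap a vendor SDK. */
static bool probe_no(void)  { return false; }
static bool probe_yes(void) { return true; }
static int  decode_stub(const unsigned char *data, size_t size)
{
    (void)data;
    return (int)size;  /* pretend every byte was consumed */
}

/* Backends in order of preference. */
static const decoder_backend backends[] = {
    { "vendor-specific", probe_no,  decode_stub },
    { "generic-hw",      probe_yes, decode_stub },
    { "software",        probe_yes, decode_stub },
};

/* Return the first backend whose probe succeeds, or NULL if none does. */
const decoder_backend *pick_backend(void)
{
    for (size_t i = 0; i < sizeof backends / sizeof backends[0]; i++)
        if (backends[i].probe())
            return &backends[i];
    return NULL;
}
```

The caller only ever sees `decoder_backend`, so vendor details stay behind the table; the trade-off egur mentions is that features a vendor exposes beyond the common interface are hidden.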
20th October 2011, 22:49 #199
egur
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by JanWillem32
DMA access to GPU memory has been around since forever. (http://en.wikipedia.org/wiki/Direct_memory_access for those that are interested.)
Allocating a buffer explicitly in video memory has always been possible. Proper memory resource management is even a key feature of any graphics rendering engine.
Sharing resources through the DirectX API is relatively new: http://msdn.microsoft.com/en-us/libr...=vs.85%29.aspx and http://msdn.microsoft.com/en-us/libr...=vs.85%29.aspx . The usual DXVA helper device for EVR uses a shared handle system to give the main rendering device access to DXVA output surfaces. The extra device runs mostly asynchronously from the main device.
File->CPU->GPU1->GPU2->Screen is completely allowed, but I don't know what would be faster, a render target on GPU1's memory or on GPU2's memory. Making GPU1 render to system memory or doing an extra copy operation from video memory to system memory will most certainly slow things down.
It's actually not the copy operation itself that's an issue. It's usually the wait for the lock operation. Scheduled transfers without locking surfaces/textures in video memory are a lot more efficient.
I haven't seen anything like this - two DXVA devices from different GPUs passing surfaces from one to the other?

I can take your word for it but it's probably extremely complicated to accomplish.

In the Intel GPU, I don't think there's any DMA going on when copying surfaces back and forth to the CPU. It's the same memory sitting on the same memory controller. A special SSE4 instruction was introduced in Penryn to address the complex mapping and solve the speed issues.
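The Penryn instruction referred to is presumably MOVNTDQA (SSE4.1), available in C as `_mm_stream_load_si128`; Intel has documented it for fast reads from write-combining (USWC) memory such as mapped GPU surfaces, where ordinary loads are very slow. Below is a sketch of a copy that uses it when the CPU supports SSE4.1 and falls back to `memcpy` otherwise. The function names, the GCC/Clang `target` attribute and `__builtin_cpu_supports` are choices for this sketch (MSVC would need different code), and both pointers are assumed to be 16-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <smmintrin.h>
#define HAVE_STREAM_LOAD 1

/* MOVNTDQA-based copy; compiled for SSE4.1 only within this function. */
__attribute__((target("sse4.1")))
static void copy_stream_load(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n >= 64) {
        /* Four streaming loads cover one 64-byte line of USWC memory. */
        __m128i x0 = _mm_stream_load_si128((__m128i *)(src + 0));
        __m128i x1 = _mm_stream_load_si128((__m128i *)(src + 16));
        __m128i x2 = _mm_stream_load_si128((__m128i *)(src + 32));
        __m128i x3 = _mm_stream_load_si128((__m128i *)(src + 48));
        _mm_store_si128((__m128i *)(dst + 0), x0);
        _mm_store_si128((__m128i *)(dst + 16), x1);
        _mm_store_si128((__m128i *)(dst + 32), x2);
        _mm_store_si128((__m128i *)(dst + 48), x3);
        src += 64;
        dst += 64;
        n -= 64;
    }
    memcpy(dst, src, n);  /* tail bytes */
}
#endif

/* Hypothetical entry point: copy n bytes out of (possibly USWC) memory. */
void copy_from_gpu(void *dst, const void *src, size_t n)
{
#ifdef HAVE_STREAM_LOAD
    if (__builtin_cpu_supports("sse4.1")) {
        copy_stream_load((uint8_t *)dst, (const uint8_t *)src, n);
        return;
    }
#endif
    memcpy(dst, src, n);
}
```

On ordinary cached memory MOVNTDQA behaves like a normal load, so the same routine is safe to call on any buffer; the speedup only shows up when reading from write-combining memory.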
21st October 2011, 01:02 #200
CruNcher
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,950
I just recorded my first 3D gaming session with my low-latency H.264 QuickSync encoder framework. It runs rather smoothly in the 3D engine (id Tech 5), at least playable. Entirely on GT1 (playing + recording).
Last edited by CruNcher; 21st October 2011 at 03:29.
Tags
ffdshow, h264, intel, mpeg2, quicksync, vc1, zoom player
