Old 4th September 2011, 22:26   #1  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing

Updated June 22nd 2013
Hi,
My name is Eric Gur and I've taken upon myself a side project at my Intel position to make the Intel SandyBridge (or newer) hardware accelerated video decoding technology freely accessible to everyone.
The project name is Intel QuickSync Decoder.

To do so, I decided to embed the Intel QuickSync technology introduced in SandyBridge into the widely popular FFDShow video decoder filter.
Nowadays, the Intel QuickSync Decoder is officially integrated in FFDShow, LAV Video Decoder and PotPlayer.

Main features
* HW decode using Intel's high performance QuickSync engine.
* Decodes H264, MPEG2, VC-1, WMV9. DVD playback not supported.
* HW deinterlacing: auto or forced, with half-rate or full-rate (50/60p) output.
* HW denoise and detail filters.
* Soft 3:2 pulldown on marked streams.
* Supports variable frame rate streams.
* Supports headless iGPU (Intel GPU disconnected from display) on Windows 8 and newer.
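
For readers unfamiliar with soft 3:2 pulldown, here is a minimal sketch of the general idea, not the decoder's actual code. It assumes MPEG-2 style per-frame flags (top_field_first, repeat_first_field); the class and function names are hypothetical and chosen only for illustration. Four progressive film frames expand into ten fields, which is how 23.976 fps film ends up at 59.94 fields per second.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class CodedFrame:
    """A progressive coded frame carrying pulldown flags (hypothetical type)."""
    name: str
    top_field_first: bool
    repeat_first_field: bool


def expand_fields(frames):
    """Expand flagged frames into the field sequence a display would receive."""
    fields = []
    for f in frames:
        first, second = ("T", "B") if f.top_field_first else ("B", "T")
        fields.append(f.name + first)
        fields.append(f.name + second)
        if f.repeat_first_field:
            # The repeated field is what stretches 2 fields into 3.
            fields.append(f.name + first)
    return fields


# Classic 3:2 cadence: frames alternate between 3 fields and 2 fields.
film = [
    CodedFrame("A", top_field_first=True, repeat_first_field=True),
    CodedFrame("B", top_field_first=False, repeat_first_field=False),
    CodedFrame("C", top_field_first=False, repeat_first_field=True),
    CodedFrame("D", top_field_first=True, repeat_first_field=False),
]

fields = expand_fields(film)

# 4 film frames become 10 fields, so the field rate is 10/4 of the film rate.
film_rate = Fraction(24000, 1001)
field_rate = film_rate * Fraction(len(fields), len(film))
```

A decoder that honors these flags can either emit the repeated fields or keep the original progressive frames and adjust time stamps; which of the two this decoder does is not stated here.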

If your system meets the requirements, I'd appreciate stability feedback on video content of assorted quality and sources.
To report a bug or request a feature, please post in this thread.

If something is broken, please read the known issues section below first, then provide me with a detailed report including:
1. Hardware (CPU, GPUs)
2. Software (OS, driver version, player, splitter, etc.)
3. Access to the offending content. Share via your favorite file share sites. Limit content to <100MB.

Requirements:
1. SandyBridge (2nd Generation Core i3/i5/i7/Celeron/Pentium) or newer. Older platforms will not work and there are no plans to support them.
2. Latest Intel graphics drivers. The Intel GPU must be the primary GPU, be driving an extended display, or be used through Lucid Virtu.
3. Windows 7 (32/64) or newer OS. Should work in Vista but I can't test this.

Known Issues:
* Jumpy playback or heavy corruption on many clips is the result of drivers obtained from Windows Update. Download drivers from your OEM website or directly from Intel's download center.
* Some versions of Lucid Virtu will cause video playback in a 64-bit player to display frames out of order.
* Frame rate or aspect ratio is wrong: Haali Media Splitter is sending corrupt time stamps or aspect ratio. LAV splitter is recommended.
* After a seek in a TS file, corruption is seen for a few frames. This is a known LAV splitter issue.
* Resolutions greater than 1080p aren't supported in SandyBridge.

Installation:
1. An ffdshow installer is supplied.
2. Open FFDShow configuration dialog and select 'Intel Quicksync' from the codec page for the desired formats (H264/VC1/MPEG2).

Version 0.45 is out with the following changes:
* Bugfix - frames were sometimes treated as interlaced.
* Bugfix - time stamps are passed 'as is' when TS manipulation is off.
* Bugfix - time stamps handling was causing A/V delay.
* Changed: AnnexB type packets (AVC in TS files) are not pre-processed and are sent to the HW decoder directly. This may break a broken clip or two but saves many others.
* Sync with MSDK 2014 files.
* FFDShow: r4531
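
As background on the AnnexB change above: in the H.264 Annex B byte-stream format, NAL units are delimited by start codes (00 00 01, often preceded by an extra zero byte) rather than by length prefixes. The sketch below only illustrates that framing; it is not the pre-processing code that was removed, and the function name is hypothetical.

```python
START_CODE = b"\x00\x00\x01"


def split_annexb(data: bytes):
    """Return the NAL units found in an Annex B byte stream (illustrative only)."""
    nals = []
    i = data.find(START_CODE)
    while i != -1:
        start = i + len(START_CODE)
        nxt = data.find(START_CODE, start)
        end = len(data) if nxt == -1 else nxt
        # Trailing zero bytes belong to the next (4-byte) start code or to
        # stream padding; a NAL unit itself never ends with a zero byte.
        nal = data[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        i = nxt
    return nals


# Made-up payload bytes: three NAL units with mixed 3- and 4-byte start codes.
stream = (
    b"\x00\x00\x00\x01\x67\x42\x00\x1e"
    b"\x00\x00\x01\x68\xce"
    b"\x00\x00\x00\x01\x65\x88\x84"
)
nal_units = split_annexb(stream)
# The low 5 bits of the first byte are the H.264 nal_unit_type.
nal_types = [n[0] & 0x1F for n in nal_units]
```

Here the three types decode as 7 (SPS), 8 (PPS) and 5 (IDR slice).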

Downloads
* For the latest cutting-edge FFDShow builds, download my builds from the Intel QuickSync Decoder SourceForge home page
* FFDShow-tryout site
* LAV Splitter builds
__________________
Eric Gur,
Processor Application Engineer for Overclocking and CPU technologies
Intel QuickSync Decoder author
Intel Corp.

Last edited by egur; 28th June 2014 at 14:14.
Old 4th September 2011, 22:32   #2  |  Link
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,901
Welcome to the forum, Eric! And thanks for your contribution. I haven't got a SandyBridge but I'm sure you will get a lot of testers here.
Old 5th September 2011, 00:45   #3  |  Link
Eliminateur
Registered User
 
Join Date: Jan 2010
Posts: 75
egur, this is very good to know, i have some questions:
1) Does SB have specific problems with DXVA interfaces so that it needs specific QuickSync support? It's known to crash MPC-HC and ffdshow DXVA as well.
2) What about the Pentium Gxxx series?, since they don't have quicksync...
Old 5th September 2011, 07:38   #4  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by Eliminateur View Post
egur, this is very good to know, i have some questions:
1) Does SB have specific problems with DXVA interfaces so that it needs specific QuickSync support? It's known to crash MPC-HC and ffdshow DXVA as well.
2) What about the Pentium Gxxx series?, since they don't have quicksync...
1) I'm not aware of any specific DXVA issues. The QuickSync implementation is done with DXVA and further abstracted by the Intel Media SDK, which I've used to create this FFDShow version. Using the DXVA interface directly isn't trivial and needs quite a few workarounds; the Media SDK takes care of some of them. I had trouble myself with FFDShow-DXVA using both Intel graphics and an AMD Radeon 6950. Currently I've managed to play dozens of HD (and non-HD) movies well, but I don't think the SW is at production level. I haven't tested with MPC-HC yet but I will.
2) Regarding the Pentium brand, I don't know. If someone has it, please let me know.
Old 5th September 2011, 08:47   #5  |  Link
kirakami
Registered User
 
Join Date: Aug 2011
Posts: 84
What is Sandy Bridge?
Will an Intel Pentium 4 CPU built in 2001 be supported?
& Geforce 4 mx
Old 5th September 2011, 09:00   #6  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by kirakami View Post
What is Sandy Bridge?
Will an Intel Pentium 4 CPU built in 2001 be supported?
& Geforce 4 mx
SandyBridge is the codename for Intel's latest generation CPU. Also called "2nd Generation i3/i5/i7 Core Processor".
SandyBridge has 2-4 cores, an integrated GPU, an integrated memory controller and an integrated PCIe controller.
Pentium 4 doesn't have the HW needed and will definitely not work. My build of FFDshow might work on Core 2 Duo/Quad and i3/i5/i7 if and only if there's an Intel integrated GPU (can be found in many laptops and low end desktops). This wasn't tested though.
It will not work on AMD processors either as they do not have compatible HW.

My build should work on future processors with Intel graphics such as IvyBridge and Haswell.
Old 5th September 2011, 10:28   #7  |  Link
namaiki
Registered User
 
Join Date: Sep 2009
Location: Sydney, Australia
Posts: 1,073
Quote:
Originally Posted by egur View Post
My build of FFDshow might work on Core 2 Duo/Quad and i3/i5/i7 if and only if there's an Intel integrated GPU (can be found in many laptops and low end desktops). This wasn't tested though.
Unfortunately doesn't seem to work on my i5 with Intel HD graphics (Arrandale).
Tested on Windows 7 (64-bit) in MPC-HC (32-bit).

Last edited by namaiki; 5th September 2011 at 10:42.
Old 5th September 2011, 11:59   #8  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Nice work Eric, though I guess it won't do any better than Intel's own decoder sample in IMSDK 3?
At least for MPEG-2 it seems questionable whether the hassle with different setups is worth it. From my measurements it saves somewhere around 1W on my Core i5-2400 compared to ffmpeg's decoder, and you still have all the hassle with MPEG-2 Studio 4:2:2 switching, as the Intel decoder, like Nvidia's, is not capable of doing this with DXVA.

Of course it looks totally different for H.264: that is where the biggest saving is compared to the world's best-performing software decoders. But again, once we get to 10-bit 4:2:2, 4:4:4 or lossless, everything falls apart.
I'm also not sure about VC-1; at least WMV3 doesn't seem to perform much better on QuickSync than libavcodec's decoder on the CPU.

http://forum.doom9.org/showthread.ph...85#post1523685

a follow up on that terminating overhead further

http://forum.doom9.org/showthread.ph...92#post1523692


though it's cool that you (Intel) now also want to optimize based on samples like Nvidia did in the early days

The first thing you should look at is this sample: http://forum.doom9.org/showthread.ph...93#post1523293

I tried a lot but can't get it stable with EVR and Intel's decoder. It doesn't matter which splitter is used: the tree pan doesn't get smooth with hardware decoding, and Microsoft's DTV-Decoder is a no-go as well. The only solution for this sample is the LAV-based framework on EVR, where it becomes perfectly smooth and perfectly telecined.

Then there is the issue with my sample.ts (also telecined, though H.264) on EVR Custom. I'm not so sure this is Intel's fault: software decoding works fine, but hardware decoding fails with EVR Custom. See a video of this issue at http://mirror05.x264.nl/CruNcher/mpc-hc/ (btw made with QuickSync) <- Fixed with FFDShow for QuickSync

Intel Driver is = 8.15.10.2476 (Windows 7 64 bit)

Im trying your decoder now with all this

PS: You should mention that it's 32 Bit in your post

Superb news: my sample.ts (H.264, EVR Custom) issue is history with this, perfectly telecined at 23.976.

Perfect, awesome: it doesn't allow an MPEG-2 Studio Profile connection and so falls back like it should.

This is the most awesome decoder for QuickSync currently (except that the overhead of not being DXVA2 native is huge, depending on the stream; see here after bugs http://forum.doom9.org/showthread.ph...06#post1523906)

Though the correct telecine to 24.30 (evil_tree MPEG-2 1080i 29.970 sample) is problematic with it on EVR as well: it seems to do 0.30 fps too much (interlace flags off).



Really tricky

this is what it should look like in the end (works only on EVR normal)



Otherwise you won't get the tree pan smooth.

Default Telecine works perfect even on EVR Custom



It also likes to crash with several *.ts files in combination with LAV Splitter (the crashy ones work fine with the internal MPC-HC TS splitter) http://forum.doom9.org/showthread.php?t=156191 Yep, it crashes a lot with LAV Splitter.

No Vsync no Exclusive mode nothing just Aero and Quicksync (again you can nicely see the jitter the Stats and Graph Rendering causes current EVR Custom OSD overhead)

__________________
all my compares are riddles so please try to decipher them yourselves :)

It is about Time

Join the Revolution NOW before it is to Late !

http://forum.doom9.org/showthread.php?t=168004

Last edited by CruNcher; 5th September 2011 at 18:48.
Old 5th September 2011, 17:11   #9  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Major issues with VC-1 in *.ts: either sync problems or rendering issues (different VC-1 interlace encoding mixed modes)

Sync Issues:



Rendering Issues: (This Problem Nvidia fixed ages ago )




It also crashes for both with LAV Splitter; I had to switch to the MPC-HC internal splitter.

Incorrect Telecine again

Lav-Splitter->Lav-Audio->FFdshow quicksync : (Incorrect)



MPC-HC Internal->Lav-Audio->FFdshow quicksync : (Correct)





Though I'm slowly starting to wonder whether this is DXVA2 hardware playback at all, because MPC-HC doesn't show any DXVA2 information. (Or is it something like Nvidia's NVCUVID API, an Intel API of its own? Even then the overhead would be heavy just for playback purposes.) I get much, much lower CPU utilization with Microsoft's DTV-Decoder (DXVA2) on H.264 streams (let's see, 4 girls is coming).


Yeah, that overhead is really heavy on this small HD2000 compared to Microsoft's DXVA2.

ffdshow-quicksync overhead:




Native DXVA2 is still the way to go (imho we just need a better optimized playback framework for Quicksync and not only for it )



Though will be really interesting to compare vs Nvcuvid overhead

Quote:
Known Issues:
1. Higher CPU usage on low bitrate clips.
No comment
Last edited by CruNcher; 5th September 2011 at 19:11.
Old 5th September 2011, 20:50   #10  |  Link
Blight
Software Developer
 
Join Date: Oct 2001
Location: Israel
Posts: 1,005
The major issue here is the overhead the driver adds for memory copies.

John Carmack (ID Software) wrote about it in this interview.
Quote:
The topic of the GPU hardware race came up early in our talk and the response Carmack gave us was pretty interesting. Stating “I don’t worry about the GPU hardware at all, I worry about the drivers” seemed to be a reiterated point. This became very apparent to id Software while developing RAGE where even though the PC had truly an order of magnitude more horsepower than the consoles, it struggled to keep up with the “minimum latency”, get feedback here, update data there, etc and do it all to maintain a 60 Hertz frame rate. DirectX 11 and multi-threaded drivers might have helped things but he still claims that they are far from the solution he envisions: direct surfacing of the memory system. The process of updating a textures on the PC is on the order of “tens of thousands of times slower” than on the Xbox 360 and PS3. AMD did implement a “multi-texture” update specifically for id Tech 5 which should help, but from the interview you can tell that Carmack really does want more done on this topic.

One interesting side effect of this talk – Intel’s integrated graphics actually has impressed Carmack quite a bit and the shared memory address space could potentially fix much of this issue. AMD’s Fusion architecture, seen in the Llano APU and upcoming Trinity design, would also fit into the same mold here. He calls it “almost a forgone conclusion” that eventually this type of architecture is going to be the dominant force. You might remember our discussion of this topic with Josh’s analysis of AMD’s Fusion System Architecture – it would appear that AMD has a potential ally on its side if they are paying attention.
The same situation applies here too. Basically, the Intel GPU driver provides virtual GPU memory that in reality resides in the system ram.
But... you can't get direct access to that memory. The way the driver provides access to this memory is 1000's of percent slower than if the driver were able to point to the real memory address and let you just copy the image directly.
__________________
Yaron Gur
Zoom Player . Lead Developer

Last edited by Blight; 5th September 2011 at 20:52.
Old 5th September 2011, 20:57   #11  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,336
The main problem here is actually copying stuff back from the GPU memory to the CPU/system memory, which only NVIDIA seems to have really managed to optimize properly for CUDA. It's not a task a game needs, which is why AMD never really cared to invest in it (and is therefore really slow at it). Intel doesn't seem to get that much performance on GPU -> CPU copies either.

It's probably true that drivers are holding back the true potential of the current and next gen hardware.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 5th September 2011 at 21:02.
Old 5th September 2011, 21:21   #12  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
CruNcher:
First, thanks a lot for your analysis. That's the best way to get my little SW running properly...

I'd like to explain what I did in FFDShow.
I used the Intel Media SDK v3 beta 3 DirectShow filters sample code. I stripped most of it, fixed several bugs, cleaned it up, did some refactoring, added some inline documentation and created a DLL that exports an interface.
My code doesn't use any secret APIs or secret driver GUIDs and doesn't contain any algorithms. It's quite simple and not very big.
Intel's Media SDK uses DXVA1/2 to communicate with the driver/HW (that's what I've heard anyway). What it does is somewhat abstract the horrible DXVA API making this task easier (but not easy!) and use less code.
The (relatively) high CPU usage is caused by one thing - memory copying from the GPU to system memory. I'll try to reduce this by trying to do VPP (DXVA/MSDK video post processing) to a system memory buffer. Hopefully the driver will do the copy faster than memcpy().
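
To put a rough number on that copy cost, here is a back-of-the-envelope sketch. It assumes the decoded frames are NV12 (12 bits per pixel) and ignores any pitch padding or alignment the driver may add; neither the pixel format nor the helper names below come from this post, they are assumptions for illustration.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Size of one NV12 frame: a full-resolution luma plane plus a
    half-size interleaved chroma plane (assumed format, no padding)."""
    return width * height * 3 // 2


def copy_bandwidth_mb_per_s(width: int, height: int, fps: float) -> float:
    """Bytes per second that must be copied to system memory, in MB/s (10^6 bytes)."""
    return nv12_frame_bytes(width, height) * fps / 1e6


# One 1080p frame is 3,110,400 bytes under these assumptions.
frame_1080p = nv12_frame_bytes(1920, 1080)

# Progressive 30 fps vs. full-rate deinterlaced 60 fps output.
bw_30 = copy_bandwidth_mb_per_s(1920, 1080, 30)
bw_60 = copy_bandwidth_mb_per_s(1920, 1080, 60)
```

The raw byte count is modest next to system memory bandwidth, which is consistent with the point that the cost lies in how slowly the CPU can read GPU surfaces rather than in the amount of data.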

My idea with FFDShow is to have a one-stop decoder that's low on power and high on quality. I want to abstract the HW acceleration and hopefully not lose too much because of the above frame copying.

I used a profiler to check where the CPU spends its time, and most of the time is spent copying the frame to system memory. A large chunk (25-50%) goes into the renderer's code somewhere. No clue as to why.

Just using DXVA to decode isn't trivial, as different splitters behave differently and give different data, and maybe the HW decoders aren't following the various specs to the letter. Microsoft's documentation isn't clear enough on how to write things properly. Theoretically they could have created a DXVA decoder themselves, but they didn't. Same goes for Intel/AMD/Nvidia.

My own CPU usage analysis shows that on low/medium bitrates, libavcodec uses less CPU than my implementation, but when bitrates are high (I have only one 26Mbps clip) the CPU usage stays about the same in my decoder and rises in libavcodec.

BTW, if someone knows how to copy a frame from the GPU quickly, I'd like to know. Since there's no PCIe traffic going on, a solution is bound to be found.
Old 5th September 2011, 21:27   #13  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,336
Quote:
Originally Posted by egur View Post
Microsoft's documentation isn't clear enough on how to write things properly. Theoretically they could have created a DXVA decoder themselves, but they didn't.
Oh, but they did. For H264 it's called Microsoft DTV-DVD Video Decoder, and ships with Vista/7.
They also have one for VC-1, the WMVideo Decoder DMO, but for some reason this one only uses DXVA in WMP, it must be locked down somehow.

Of course their decoders are "pure" DXVA, which means they don't copy stuff back from the GPU, it remains in there until it is displayed - avoiding the memcpy problem.
Old 5th September 2011, 22:20   #14  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by nevcairiel View Post
...

It's probably true that drivers are holding back the true potential of the current and next gen hardware.
If that was done on purpose, then they (Intel/AMD/NVidia) could sell a premium part for more money that doesn't have this limitation and call it a feature. Most likely it's a low-priority issue that no one wants to spend resources on (HW or SW).

The reason for the slowness, as far as I've heard (aside from PCIe latency and BW), is that the GPU stores surfaces differently than the CPU. A GPU in many cases needs to work on blocks or tiles (e.g. 8x8, 16x16, etc.), and if those pixels are sequential in physical memory then they are read/written much faster and provide higher cache hit rates as well as efficient cache prefetching. So when a CPU tries to read several bytes at a time (the inner loop of memcpy), there are a lot of address translations and the memory controller needs to set up the DDR again and again for different pages.
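
The effect can be sketched with a toy address calculation. This is only an illustration under stated assumptions: a hypothetical tile of 128x32 bytes (chosen so one tile fills one 4 KiB page), a 2048-byte pitch, and one byte per pixel. Real GPU tiling layouts differ and none of these numbers or function names come from a driver.

```python
from functools import partial

PAGE = 4096  # bytes per memory page


def linear_offset(x, y, pitch):
    """Row-major layout: a whole scanline is contiguous in memory."""
    return y * pitch + x


def tiled_offset(x, y, pitch, tile_w, tile_h):
    """Tiled layout: each tile_w x tile_h block is contiguous in memory."""
    tiles_per_row = pitch // tile_w
    tile_index = (y // tile_h) * tiles_per_row + (x // tile_w)
    within_tile = (y % tile_h) * tile_w + (x % tile_w)
    return tile_index * tile_w * tile_h + within_tile


def pages_touched_by_row(offset_fn, y, width):
    """Count distinct pages a CPU touches while reading one scanline."""
    return len({offset_fn(x, y) // PAGE for x in range(width)})


linear = partial(linear_offset, pitch=2048)
tiled = partial(tiled_offset, pitch=2048, tile_w=128, tile_h=32)

# Reading one 1920-byte scanline left to right, as a naive memcpy would:
linear_pages = pages_touched_by_row(linear, y=0, width=1920)
tiled_pages = pages_touched_by_row(tiled, y=0, width=1920)
```

In this toy setup the scanline sits in a single page when stored linearly but is spread over 15 pages when tiled, so a row-by-row CPU copy keeps hopping between pages while a block-oriented GPU access stays inside one.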
Old 6th September 2011, 00:00   #15  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Quote:
The major issue here is the overhead the driver adds for memory copies.

Also, when we talk about GPU/CPU efficiency we have to look at the OS itself and its current driver architecture. WDDM 1.1 is just the start of this process; the next Windows is going to bring the next step, until some day we reach WDDM 2.0.
We already had a similar discussion on Beyond3D, and nobody really wants to go back to coding the GPU directly in assembler style anymore, so yeah, it's up to Microsoft and the vendors to improve this.

Quote:
The (relatively) high CPU usage is caused by one thing - memory copying from the GPU to system memory. I'll try to reduce this by trying to do VPP (DXVA/MSDK video post processing) to a system memory buffer. Hopefully the driver will do the copy faster than memcpy().


Quote:
My idea with FFDShow is to have a one-stop decoder that's low on power and high on quality. I want to abstract the HW acceleration and hopefully not lose too much because of the above frame copying.
Nvidia was very successful with this

Quote:
Just using DXVA to decode isn't trivial, as different splitters behave differently and give different data, and maybe the HW decoders aren't following the various specs to the letter. Microsoft's documentation isn't clear enough on how to write things properly. Theoretically they could have created a DXVA decoder themselves, but they didn't. Same goes for Intel/AMD/Nvidia.
Yeah, true. Many ISVs know that and some do better than others in this regard. More open and better documented APIs like NVCUVID, Open Video and the Intel Media SDK are great and will hopefully make this easier for devs. LAV CUVID and ffdshow-quicksync are nice examples, though Nvidia is still in the lead here and both AMD and Intel came late to the game.
It also makes it much easier to adapt to a new renderer that doesn't support DXVA and to use its full capabilities without being limited.
Last edited by CruNcher; 6th September 2011 at 00:31.
Old 6th September 2011, 16:30   #16  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
CruNcher:
Regarding the "evil trees" clip. I get very strange results from different splitters. The LAV splitter reports 59.94 fps while haali and the Gabest MPEG splitter report 29.97.
All splitters produce a cadence of P B T P B .... (progressive, bottom first, top first) and all of them start past the zero time stamp (something like 4 missing frames). I'll dig into this to make sure I behave properly on all of them.

I need a VC1 clip that crashes like you reported; currently I don't have any crashing content. Also, what source filters are used for VC1 (.wmv)? The WM ASF Reader freezes too much (regardless of decoder).
I've fixed the seeking issue and now seeks are instantaneous without artifacts.
I also fixed MPEG2 sequence header initialization, which should fix the seek corruption.

I'll release a new build in a day or two.
Old 6th September 2011, 16:47   #17  |  Link
Superb
Registered User
 
Join Date: Feb 2010
Posts: 364
Not trying to get you down or anything, but why integrate it into ffdshow while LAV Video & LAV CUVID Decoder are the new rising stars around the neighborhood?
Nev (the developer) said he's planning to integrate the two one day (which makes sense; like CoreAVC), and I think it would be wonderful if he'll have a patch available adding SB acceleration as well. It will make it the best video decoder hands down.
What I'm trying to say: think ahead. forward. ffdshow is slowly fading w/ each step LAV Filters take.
I believe the day where codec packs use LAV Filters (instead of Haali & ffdshow) is not that far away.

Or maybe I'm the only one who has noticed it?

Last edited by Superb; 6th September 2011 at 16:50.
Old 6th September 2011, 17:11   #18  |  Link
pandy
Registered User
 
Join Date: Mar 2006
Posts: 1,049
Quote:
Originally Posted by egur View Post
BTW, if someone knows how to copy a frame from the GPU quickly, I'd like to know. Since there's no PCIe traffic going on, a solution is bound to be found.
AFAIR from the old PCI days (PCIe seems to be only an extension of PCI), reading from a PCI device to memory was much slower than having the PCI device write to memory. If there is a chance to make the PCIe device the transaction initiator and order it to write to system memory, that should IMHO be faster than reading from the device.
Old 6th September 2011, 21:26   #19  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by Superb View Post
Not trying to get you down or anything, but why integrate it into ffdshow while LAV Video & LAV CUVID Decoder are the new rising stars around the neighborhood?
Nev (the developer) said he's planning to integrate the two one day (which makes sense; like CoreAVC), and I think it would be wonderful if he'll have a patch available adding SB acceleration as well. It will make it the best video decoder hands down.
What I'm trying to say: think ahead. forward. ffdshow is slowly fading w/ each step LAV Filters take.
I believe the day where codec packs use LAV Filters (instead of Haali & ffdshow) is not that far away.

Or maybe I'm the only one who has noticed it?
My work touches very little of FFDShow's own code. I created a separate DLL that doesn't link with FFDShow or any of its components. FFDShow works very well (as far as I've noticed, anyway) and it was a good starting point for me as it doesn't change all the time (actually it does change, but with very short merge times on my part). Porting to LAV should be easy, but one thing at a time.
Old 6th September 2011, 21:38   #20  |  Link
Superb
Registered User
 
Join Date: Feb 2010
Posts: 364
That's great news. Btw, you might wanna look at VLC's git repository... They use DXVA2 acceleration and copy the frames back too. (under modules\codec\avcodec\dxva2.c)

Last edited by Superb; 6th September 2011 at 21:51.
Tags
ffdshow, h264, intel, mpeg2, quicksync, vc1, zoom player
