Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing
NikosD
8th October 2012, 08:15
Eric hi.
I've noticed that it's been over two months since your latest software update.
Have you finished the pre/post-processing features of the QS software?
I think you mentioned some driver problems with the QS hardware features: they were not exposed in a way you could use, or something like that.
What about the direct, native DXVA2 decoding approach (through Intel's MSDK, of course) along with FFDShow and/or LAV filters?
Is there any progress on that?
TIA,
Nikos
egur
8th October 2012, 09:11
I'll release a small bugfix, probably today to handle the new driver (15.28.xx.xx) issues.
As for native DXVA, it requires cooperation from LAV filters and it's not high on Nev's priority list. I can't do this for ffdshow - it requires a significant change to ffdshow if done properly, and it would break ffdshow otherwise.
So I'll probably drop DXVA for now and focus on DX11 support or add MJPG which should be supported by Haswell as seen in Anandtech. DX11 is probably more important as it lifts many of the fullscreen and disconnected GPU issues.
Tacio
8th October 2012, 09:52
DX11 is probably more important as it lifts many of the fullscreen and disconnected GPU issues.
So, it's only for Ivy Bridge and later CPUs?
nevcairiel
8th October 2012, 10:10
As for native DXVA, it requires cooperation from LAV filters and it's not high on Nev's priority list.
I just fail to see the big advantage it would offer. Sure, slightly lower CPU usage, but considering even the Ultrabook SNB/IVB CPUs can work with the current QuickSync implementation just fine (even in lowest power modes), i fail to see the need. Development time is limited, sadly, and the list of things to do is long.
If Eric wants to implement it, i'll probably find the time to expose this feature to the users (somehow), but i won't push for it.
wanezhiling
8th October 2012, 10:17
So, it's only for Ivy Bridge and later CPUs?
SNB doesn't support DX11..:o
nevcairiel
8th October 2012, 11:20
Technically, the DX11 API can also be used on hardware which does not support DX11, as long as the driver is OK with that. You just can't use any of the new hardware features that were added with DX11.
ajp_anton
8th October 2012, 16:26
I just fail to see the big advantage it would offer. Sure, slightly lower CPU usage, but considering even the Ultrabook SNB/IVB CPUs can work with the current QuickSync implementation just fine (even in lowest power modes), i fail to see the need. Development time is limited, sadly, and the list of things to do is long.
If Eric wants to implement it, i'll probably find the time to expose this feature to the users (somehow), but i won't push for it.
How much would battery life improve for those long travel journeys without power?
CiNcH
8th October 2012, 16:38
DX11 support
But this will still work together with a DX9 Renderer resp. Custom Presenter, right?
How much would battery life improve for those long travel journeys without power?
I have measured a difference of 1-2W between 'QuickSync' and 'DXVA2 native' with 'Open Hardware Monitor' when playing back 1080i live streams (H.264 10-15mbps).
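For a rough sense of what a 1-2W difference means for battery life, runtime can be estimated as battery capacity divided by average draw. The sketch below is plain arithmetic under that simplifying assumption; the function names are made up for illustration, and the 48 Wh battery and 12 W baseline draw in the note after it are hypothetical values, not measurements from this thread.

```cpp
#include <cassert>

// Estimated runtime in hours for a battery of `capacity_wh` watt-hours
// drained at a constant `draw_w` watts. Real draw varies over time,
// so treat the result as a rough estimate only.
inline double battery_hours(double capacity_wh, double draw_w) {
    assert(capacity_wh > 0.0 && draw_w > 0.0);
    return capacity_wh / draw_w;
}

// Extra runtime gained when the average draw falls by `saved_w` watts.
inline double hours_gained(double capacity_wh, double draw_w, double saved_w) {
    return battery_hours(capacity_wh, draw_w - saved_w)
         - battery_hours(capacity_wh, draw_w);
}
```

With the hypothetical 48 Wh battery and 12 W baseline, hours_gained(48.0, 12.0, 2.0) works out to 4.8 - 4.0 = 0.8 hours. The gain depends heavily on the baseline draw, so a laptop that idles lower benefits proportionally more.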
egur
8th October 2012, 20:26
I don't have any details on the DX11 usage by the Intel Media SDK, I need to look at the docs and sample code first and see if it actually solves any problems. I also don't know if SandyBridge will work or not. It might since DX11 hardware isn't used, only driver interface.
As for native DXVA output, this is only good for battery life.
I also want to do scaling (probably starting with downscaling) so 4K video won't take so many resources when the display has a lower resolution.
ajp_anton
8th October 2012, 23:58
As for native DXVA output, this is only good for battery life.
You make it sound like it's not a good thing =).
egur
9th October 2012, 08:17
You make it sound like it's not a good thing =).
It's a good thing, but maybe not the lowest hanging fruit. I'll definitely do it, but don't know when.
egur
9th October 2012, 20:52
Version 0.39 beta is out with the following changes:
* Workaround for various issues when drivers from the 15.28 or 15.31 (beta) family are used.
* Better handling of timestamp discontinuities - usually due to broken streams.
* Default multithreading option is now set to "Multithreaded copy".
* FFDShow: r4488
Note to Windows 7/8 users with drivers 15.28.xx.xx or 15.31.xx.xx:
Do not enable full multithreading support in FFDshow or player will abort playback at some point in many clips.
Select "Multithreaded copy" for good performance and rock solid stability.
If/when the drivers get fixed, I'll post the working version number.
Downloads
* For the latest cutting edge FFDShow builds, download my builds from the Intel QuickSync Decoder SourceForge home page (http://sourceforge.net/projects/qsdecoder/)
* FFDShow-tryout site (http://ffdshow-tryout.sourceforge.net/download.php)
* LAV Splitter builds (http://forum.doom9.org/showthread.php?t=156191)
whiplash000
11th October 2012, 03:34
Hi, I decided to try this thing out and I can't seem to get it to work.
Here's a test video: http://vimeo.com/27770499
Select download and choose the 1080p mov version.
I have a 2600k and an Asrock Z68 Extreme4 motherboard. I have two monitors, one hooked up to an AMD gpu (default monitor), and one to the internal graphics. I tried playing a video (1080p, 60fps H.264) and it plays fine on my main monitor, but when I drag the WMP window to the other monitor, WMP simply crashes. Soluto says it has something to do with the intel graphics driver. I don't really think the acceleration is being used anyway, because I tried fiddling with the sharpness and noise removal and the video didn't seem to be affected. I'm using libav's splitter.
I then tried putting wmplayer.exe on Lucid Virtu's list (forcing it to use internal graphics), and when I tried to play files then, I got an error: "Windows Media Player cannot play the file." Same on both monitors.
I'm running driver version 8.15.10.2827 on Windows 7 x64. Is that the newest available?
Also, is the 1080p limitation set in stone or can it be changed? Because I like to record video footage at 1920x1200.
ryrynz
11th October 2012, 06:20
I'm running driver version 8.15.10.2827 on Windows 7 x64. Is that the newest available?
9.17.10.2849 beta (http://drivers.softpedia.com/get/GRAPHICS-BOARD/INTEL/Intel-HD-Graphics-Test-Driver-15285642849-for-Windows-7-x64-Windows-8-x64.shtml) is the newest I've seen.
egur
11th October 2012, 08:41
The EVR renderer used by WMP does use HW acceleration, so moving a video window between GPUs may cause problems. Maybe you should report this to the graphics driver support forum. I can't help.
hajj_3
11th October 2012, 11:36
9.17.10.2849 beta (http://drivers.softpedia.com/get/GRAPHICS-BOARD/INTEL/Intel-HD-Graphics-Test-Driver-15285642849-for-Windows-7-x64-Windows-8-x64.shtml) is the newest I've seen.
v2857 is the latest beta: http://downloadmirror.intel.com/21840/a08/win64_15286t.zip
egur
11th October 2012, 13:04
Note that when using the 15.28 drivers, you must use the latest version (just released). LAV users can copy the QS DLL into the LAV installation directory.
nevcairiel
11th October 2012, 13:12
The latest test build i posted yesterday in the LAV thread also has the latest version of the QS decoder.
A final release will most likely be out within the next week or so.
whiplash000
11th October 2012, 18:55
Okay, so I got the v2857 drivers and installed them. Now, when I move the WMP window to my second monitor as it's playing, it doesn't crash anymore, but I see no video. The audio still plays, but the video is black save for a grey bar across the top of the video. I'm using the latest Oct. 10th build of the decoder.
Video still refuses to play if I force wmplayer.exe to use the integrated graphics with Lucid Virtu.
wanezhiling
13th October 2012, 10:42
http://forum.cyberlink.com/forum/posts/list/22488.page
• Supports hardware acceleration for playback of 4K video files on 3rd generation Intel Core i5 CPUs and above.
Hi Eric, doesn't the IVB i3 family support 4K?
nevcairiel
13th October 2012, 10:54
Only in crappy PowerDVD. :p
egur
13th October 2012, 20:00
I don't remember seeing that i3 is SKUed differently with respect to QuickSync. The Pentium/Celeron SKUs don't have encoding but that's about it.
RBG
15th October 2012, 09:33
Hello, Eric.
I can't make this sample (http://www.mediafire.com/?2g9inv0ph8s8zes) run with quick sync acceleration on ffdshow decoder, though hardware decoding works just fine on LAVQS and PotPlayerQS decoders.
FFdshow v4488.
Intel driver 8.15.10.2761.
Windows 7 x64
32 GB RAM (I've had to limit the Intel GPU memory to 256 MB in my BIOS to solve a compatibility problem. It seems that Intel HD Graphics hasn't been properly tested with large amounts of RAM, or it is just a Gigabyte motherboard issue. Either way, when Intel HD Graphics (i7 2600) is active with 32 GB of RAM (except when its memory is limited to 256 MB), the operating system boot becomes awfully slow, about 7 minutes instead of 1:30 with the iGPU turned off, and overall OS behavior is somewhat sluggish...)
egur
15th October 2012, 10:44
I'll test this clip at home.
I've never heard of this sluggishness issue, I'll try to find out.
BTW, my main home PC is an i7-2600 with 8GB RAM + Intel SSD + Win7 x64 Ultimate and it boots in 15-20 seconds.
Do you have any special drivers installed? Hypervisor? Memory controller is not configured well in BIOS? Latest BIOS?
RBG
15th October 2012, 11:47
egur
My home system is really nothing special, no hypervisor, just an ordinary h67 chipset motherboard (Gigabyte h67a-ud3h-b3, latest bios) with a hybrid gpu setup, dGPU - GeForce 470 (main) and iGPU - i7-2600 (only for Quick Sync support), and WD Black 2TB HDD as a system drive, 32 GB RAM (kingston 1333 9-9-9-24). When the iGPU is disabled or is limited to 256 MB everything runs just fine; when it is enabled with other options I get the problems described above. I googled a little bit and found similar cases. Also I measured memory performance in AIDA64 with the iGPU enabled and disabled on 32 GB (everything is fine on 16 GB) and there was a significant memory bandwidth drop: about 30% in memory copy operations, about 37% in memory write operations, and about 21% in memory read operations.
egur
15th October 2012, 12:17
Sent you a PM with details.
CiNcH
15th October 2012, 12:20
Hi Eric, doesn't the IVB i3 family support 4K?
My i3 IVB decodes 4K H.264 just fine..
egur
15th October 2012, 19:16
Hello, Eric.
I can't make this sample (http://www.mediafire.com/?2g9inv0ph8s8zes) run with quick sync acceleration on ffdshow decoder, though hardware decoding works just fine on LAVQS and PotPlayerQS decoders.
FFdshow v4488.
Works fine here.
Before I try older drivers, which splitter was used?
I assume you used 32 bit.
RBG
15th October 2012, 19:44
egur
I used Haali splitter 1.11.288.0. Yes, x86 version of ffdshow.
And I guess I've found the culprit.
LAV Splitter Source
http://i3.imageban.ru/out/2012/10/15/28efb9bab7a38f65fdc806913a7d3682.png
Haali Splitter.
http://i2.imageban.ru/out/2012/10/15/8875cb10449b53c2734f2a2215271d65.png
Same sample, same ffdshow settings.
I wonder why ffdshow selects libavcodec instead of QS when chained with Haali?
whiplash000
15th October 2012, 20:29
Hey, just wanted to say I finally figured out my issues. First off, I'm running the latest beta drivers you guys linked to for my iGPU. Apparently, these drivers make Lucid crash, but since I already had monitor #2 hooked into my motherboard's output, it isn't really needed. Also, I failed to mention that I'm using Shark007's Windows 7 Codec pack, sorry. So anyway, I got rid of that, reinstalled the QS FFDshow filter, and installed the LAV splitter linked in the OP. Also, I had to run this tweaker (http://codecguide.com/windows7_preferred_filter_tweaker.htm) in order to actually get FFDshow to work with Windows Media Player. I had to go to "Tweaks" and disable everything Microsoft on my system. Once I did that, WMP was actually decoding my 60fps videos at 60fps. Wow. And the issue of moving my WMP window between two monitors was fixed, too!
Good work Eric!
egur
15th October 2012, 20:57
@whiplash000 - I'm glad you managed to fix things. Too bad video playback/setup is so complicated...
@RBG - Haali causes a lot of issues on many clips so I stopped using it. Different splitters supply different media samples to the decoder. I guess my compatibility check failed.
I'll check if Haali started using a new fourcc or something.
I'll report back when I know the problem.
whiplash000
15th October 2012, 21:00
If I had just one wish, it'd be that you would also do the encoder (at least H.264) at some point in the future. The gaming community would be forever indebted to you for unlocking a hardware encoder built into their CPU, since current solutions involve using at least some portion of the CPU for video encoding, which eats into in-game FPS.
...or maybe you could help me pull it off? I'm well-versed in C, familiar with C++, not so familiar with the inner workings of VFW or FFDshow.
egur
15th October 2012, 22:21
@whiplash000 - making the HW decoder work under full screen exclusive mode is probably very hard if not impossible since new HW devices can't be created.
If you're familiar with C++ then you can try using the Media SDK to run the HW encoder. I can help, but I don't have the time to start another project like this.
@RBG - I root caused the problem with your clip. Haali sends an incomplete media sample - the fields that denote the H264 profile and level are both zero, which are not valid values. This is a Haali bug since the H264 sequence header (which ffdshow parses well) reports these values properly.
I didn't implement H264 sequence header parsing yet so my checker function fails on this clip.
My suggestion is to drop Haali and switch to LAV splitter. In fact the only benefit to Haali is faster and cleaner seeks in TS files. The downsides to Haali are bad support for aspect ratios and terrible time stamps (in TS files), which will cause my decoder to detect a wrong frame rate and cause libavcodec to produce jittery video.
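A rough sketch of the kind of check described above, written only to illustrate why zeroed fields make it fail. The `StreamInfo` struct, the `check_h264` function, the list of accepted profiles and the level limit are all assumptions for this example, not the decoder's actual code.

```cpp
#include <cstdint>

// Hypothetical container for the fields a splitter reports in the media sample.
struct StreamInfo {
    std::uint32_t profile; // H.264 profile_idc, e.g. 100 for High
    std::uint32_t level;   // H.264 level_idc, e.g. 41 for level 4.1
};

enum class CheckResult { Accept, Reject, NeedSequenceHeader };

// Profiles assumed to be decodable in hardware for this sketch:
// Baseline (66), Main (77) and High (100).
inline bool is_supported_profile(std::uint32_t profile) {
    return profile == 66 || profile == 77 || profile == 100;
}

inline CheckResult check_h264(const StreamInfo& info) {
    // Zero is not a valid profile or level: the splitter left the fields
    // unset, so the real values would have to come from the sequence header.
    if (info.profile == 0 || info.level == 0)
        return CheckResult::NeedSequenceHeader;
    // Level 5.1 (level_idc 51) is an assumed upper limit for this sketch.
    if (!is_supported_profile(info.profile) || info.level > 51)
        return CheckResult::Reject;
    return CheckResult::Accept;
}
```

A checker that treats zero the same as any other unsupported value would reject the clip outright, which matches the fallback to libavcodec seen with Haali. Returning a separate "need sequence header" result leaves room for parsing the header later.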
RBG
16th October 2012, 09:59
egur
In fact the only benefit to Haali is faster and cleaner seeks in TS files.
Haali also has mkv segment linking support which is very important for me. Anyway, thanks.
dna108
16th October 2012, 22:52
http://www.mediafire.com/?xobl7pcuw8fbq27
Hardware
i7 3770k
Intel HD4000
AMD HD6850
Software
Win7 x64
MPC-HC (32+64)
Lav splitter
Driver 8.15.10.2761
QS ffdshow (2012-07-30)
Watch the concrete brick work on the floor with deinterlacing on, then with 'Full rate DI' on. Looks much better with full DI.
I don't think it's a motion issue; probably more frames gives the algorithm more to work with.
egur
17th October 2012, 13:12
The DI works the same in either case, in the half rate case, half of the output frames are dropped. I'll take a look later today.
Edit
I watched the clip, the smoother 60fps looks indeed much better (smoother), strange.
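As a minimal sketch of the half rate case described above, the following keeps every second frame of a full-rate deinterlacer's output. Which frame of each pair the decoder actually drops is not stated in the thread, so keeping the first one is an assumption, and `half_rate` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Keeps every second frame of a full-rate deinterlacer's output,
// starting with the first one. The frames themselves are untouched,
// so each kept frame is identical to its full-rate counterpart.
template <typename Frame>
std::vector<Frame> half_rate(const std::vector<Frame>& full_rate) {
    std::vector<Frame> out;
    out.reserve((full_rate.size() + 1) / 2);
    for (std::size_t i = 0; i < full_rate.size(); i += 2)
        out.push_back(full_rate[i]);
    return out;
}
```

If the pipeline really works this way, any visible difference between the two modes would come from the frame rate rather than from the per-frame deinterlacing quality.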
dna108
19th October 2012, 01:08
The DI works the same in either case, in the half rate case, half of the output frames are dropped. I'll take a look later today.
Edit
I watched the clip, the smoother 60fps looks indeed much better (smoother), strange.
I ran it through Avisynth with SelectEven for original fps, and it still looked good.
Maybe it's just that clip, as i have not noticed it on anything else; but I don't really have any test (problem) clips.
kwlee
20th October 2012, 07:47
Here is an idea for gpu_memcpy() in QuickSyncUtil.cpp:
there is no need to allocate the extra 2K of memory anymore.
We can use memcpy() for the area within the 2K offset and, after that,
use the SSE4.1 copy. Everything would be done inside gpu_memcpy().
egur
20th October 2012, 08:51
Here is an idea for gpu_memcpy() in QuickSyncUtil.cpp:
there is no need to allocate the extra 2K of memory anymore.
We can use memcpy() for the area within the 2K offset and, after that,
use the SSE4.1 copy. Everything would be done inside gpu_memcpy().
I added 4K to the size of the output buffer so I can create an offset of 2K between source (GPU) and destination (system) addresses.
This trick ensures that address compares between source and destination are performed faster. Internally the CPU performs a check whether there's an overlap between source and destination. It doesn't compare the whole address at once (slow); it checks the 11 LSB of the address first.
No extra bytes are copied. The output buffer is slightly larger (4K). Note that the function itself doesn't use this extra size, it just informs the reader about proper/fast usage.
The 2K address offset is optional, it's not required to work. If the 11LSB of both addresses are identical between source and destination, the copy will be slower.
Maybe it's time I should write a white paper on this (and add AVX2 implementation).
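The offset trick can be sketched as follows. This is not the code from QuickSyncUtil.cpp: `pick_destination` is a hypothetical helper, and the sketch assumes the partial address compare covers the offset within a 4K page (bits 0 to 11), so that a 2K distance makes the two addresses differ in bit 11. The post counts the bits as "11 LSB", which may simply be a different way of counting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t kPage   = 4096; // extra bytes added to the buffer
constexpr std::uintptr_t kOffset = 2048; // desired distance within a page

// Given a destination buffer that was allocated with 4K of slack,
// returns a pointer in [buffer, buffer + 4096) whose offset within a
// 4K page is exactly 2K away from the source's offset.
inline std::uint8_t* pick_destination(std::uint8_t* buffer, const void* src) {
    assert(buffer != nullptr && src != nullptr);
    const std::uintptr_t s = reinterpret_cast<std::uintptr_t>(src);
    const std::uintptr_t b = reinterpret_cast<std::uintptr_t>(buffer);
    const std::uintptr_t want = (s + kOffset) & (kPage - 1); // target page offset
    const std::uintptr_t have = b & (kPage - 1);             // current page offset
    const std::uintptr_t skip = (want - have) & (kPage - 1); // 0..4095 bytes
    return buffer + skip;
}
```

The caller would allocate frame_size + 4096 bytes, pass the start of that allocation as `buffer`, and copy the frame to the returned pointer. No extra bytes are copied; the slack only gives room to slide the start address.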
kwlee
20th October 2012, 09:46
Ya, mostly agree.
I just think it's possible the decoded buffer is from renderer, original method would need a temporary buffer, then copy back to render buffer.
with the idea I proposed, use the render buffer for decoder directly, that can save one copy time of frame data
egur
20th October 2012, 10:06
Copying directly to the render buffer will break ffdshow and probably LAV video decoder design. They might perform some kind of operation on the output buffer (color space conversion, image processing, etc).
crotecun
21st October 2012, 16:23
I read in an earlier post that Sandy Bridge has its own different approach to scaling video.
SandyBridge's advanced video scaler has a different approach. A context adaptive scaler.
It will use a Lanczos4 scaler (8 taps) in order to create very sharp images. In order to avoid (most) of the artifacts, it will perform an analysis of the area and blend between the sharp scaler and a smooth scaler depending if the analysis thought the target pixel is prone to artifacts.
Context adaptive scaling is not a new idea but this implementation's quality and performance are probably one of the best.
Some companies perform context adaptive scaling using a different paradigm - use a soft scaler like bi-cubic and perform post processing sharpness filter on edges that were very strong in the source image.
Does this also work on an Ivy Bridge laptop running a Linux distro? Around here Dell has Core i3 laptops with Ubuntu pre-installed, I'd like to know if such a setup would also benefit from the scaling technology described in the above quote.
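The blending idea in the quoted description can be sketched in a few lines. This is only an illustration of the general technique, not Intel's implementation: the standard Lanczos kernel formula is used for the sharp scaler's weights, and `artifact_risk` is a hypothetical stand-in for whatever the hardware's local analysis produces.

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Lanczos kernel with `a` lobes: sinc(x) * sinc(x / a) for |x| < a.
// a = 4 spans 8 source samples per output sample (8 taps).
inline double lanczos(double x, double a = 4.0) {
    assert(a > 0.0);
    if (x == 0.0) return 1.0;
    if (std::fabs(x) >= a) return 0.0;
    const double px = kPi * x;
    return a * std::sin(px) * std::sin(px / a) / (px * px);
}

// Blends a sharp and a smooth scaler result for one pixel.
// `artifact_risk` in [0, 1]: 0 keeps the sharp result,
// 1 falls back entirely to the smooth one. Values outside are clamped.
inline double blend(double sharp, double smooth, double artifact_risk) {
    const double w = std::fmin(1.0, std::fmax(0.0, artifact_risk));
    return w * smooth + (1.0 - w) * sharp;
}
```

The alternative paradigm mentioned in the quote (soft scaler plus edge sharpening) would instead start from the smooth result and add detail back, rather than starting sharp and backing off where ringing is likely.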
egur
21st October 2012, 21:48
I don't know what Linux drivers do and what options they support.
aufkrawall
24th October 2012, 08:31
Hello egur,
may I ask if you're still working on DXVA frames output? :)
egur
24th October 2012, 11:18
Hello egur,
may I ask if you're still working on DXVA frames output? :)
I'm not working on it.
markanini
27th October 2012, 10:10
Chroma interpolation is still poor in Flash Player (2500k, v2761): http://i.imgur.com/l3abl.png
Look at the reporter's left chin
egur
27th October 2012, 11:44
Not the right place to post flash playback issues. Nothing I can do about it. Sorry.
wanezhiling
27th October 2012, 13:01
http://downloadmirror.intel.com/22020/eng/releasenotes_gfx_2867-64.pdf
http://i.imgur.com/qp58t.png
Hi Eric, does 2867 support 4K display? IVB i3 can't?
PS: LAV QS can't support 4096 x 2048 decoding.
egur
27th October 2012, 14:02
I can check that at my IVB at work tomorrow.
NikosD
27th October 2012, 14:31
I tried the Win 7 32-bit v2867 drivers on a Core i5 / HD2000 hoping for 4K H.264 support, but of course there isn't any.
The 3 MFT transcoders "Intel Hardware H.264/MPEG-2/VC-1 Decoder MFT" are not working for decoding (playback), but I don't remember them. Were they always there, Eric?