View Full Version : Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing


nevcairiel
4th March 2012, 18:17
Didn't work with it before, what are the dependencies?
OpenMP is super easy too and it ships with the compiler.

No dependencies, include ppl.h and get cracking. It's included in VS2010 (not in earlier versions).
http://msdn.microsoft.com/en-us/library/dd492418.aspx


My copy function is written using intrinsics and it works very fast in debug builds.
The copy instructions are not affected because they copy memory anyway (so the debugger knows what's going on in the register), but all processing instructions are.
My code to convert YUV -> RGB is incredibly slow in debug builds <.<
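
For readers unfamiliar with what such a conversion does per pixel, here is a minimal scalar sketch. It assumes limited-range BT.601 video and the widely used 8-bit fixed-point coefficients; it is not the intrinsics code mentioned above, and the names (`Rgb`, `yuv_to_rgb_bt601`, `convert_row`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

// Clamp an intermediate value to the 0..255 range of an 8-bit channel.
inline std::uint8_t clamp_u8(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Convert one limited-range BT.601 sample (Y in 16..235, U/V centered on 128).
// Fixed-point form of: R = 1.164*(Y-16) + 1.596*(V-128), and so on.
inline Rgb yuv_to_rgb_bt601(std::uint8_t y, std::uint8_t u, std::uint8_t v) {
    const int c = 298 * (static_cast<int>(y) - 16) + 128;  // luma term plus rounding
    const int d = static_cast<int>(u) - 128;
    const int e = static_cast<int>(v) - 128;
    Rgb out;
    out.r = clamp_u8((c + 409 * e) / 256);
    out.g = clamp_u8((c - 100 * d - 208 * e) / 256);
    out.b = clamp_u8((c + 516 * d) / 256);
    return out;
}

// Convert one row in which each U/V sample is shared by two horizontal
// pixels (as in 4:2:0 and 4:2:2 layouts). `width` must be even.
inline void convert_row(const std::uint8_t* y, const std::uint8_t* u,
                        const std::uint8_t* v, Rgb* out, std::size_t width) {
    assert(width % 2 == 0);
    for (std::size_t x = 0; x < width; ++x) {
        out[x] = yuv_to_rgb_bt601(y[x], u[x / 2], v[x / 2]);
    }
}
```

Every pixel goes through several multiplies, adds and a clamp, which is the kind of arithmetic an unoptimized debug build handles poorly compared to a plain memory copy.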

egur
4th March 2012, 19:21
No dependencies, include ppl.h and get cracking. It's included in VS2010 (not in earlier versions).
http://msdn.microsoft.com/en-us/library/dd492418.aspx


Works nice for my copy function - ~same performance as my threadpool with 1/100 of the code :)
Checked with Dependency Walker - no new dlls are linked. QS decoder dll is twice as big though.

nevcairiel
4th March 2012, 20:07
Works nice for my copy function - ~same performance as my threadpool with 1/100 of the code :)
Checked with Dependency Walker - no new dlls are linked. QS decoder dll is twice as big though.

All the code required is pulled in through templates and whatnot, so yeah, it ends up in your file.
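
As a rough illustration of why a frame copy maps so easily onto a parallel loop, the sketch below splits a buffer into disjoint chunks and copies each one independently. The helper names are hypothetical, and the driver loop is deliberately sequential and portable; with PPL, that loop is the part that `concurrency::parallel_for` from ppl.h would run in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Half-open byte range [first, second) inside a buffer.
using Range = std::pair<std::size_t, std::size_t>;

// Split `size` bytes into at most `parts` contiguous, non-overlapping ranges.
std::vector<Range> split_ranges(std::size_t size, std::size_t parts) {
    assert(parts > 0);
    const std::size_t chunk = (size + parts - 1) / parts;  // ceiling division
    std::vector<Range> ranges;
    for (std::size_t begin = 0; begin < size; begin += chunk) {
        ranges.emplace_back(begin, std::min(size, begin + chunk));
    }
    return ranges;
}

// Copy one range. Each call touches a disjoint part of dst, so calls for
// different ranges could run on different threads without synchronization.
void copy_range(std::uint8_t* dst, const std::uint8_t* src, const Range& r) {
    std::memcpy(dst + r.first, src + r.first, r.second - r.first);
}

// Sequential driver (assumption: kept portable on purpose). A parallel
// runtime would replace only this loop and leave the helpers unchanged.
void chunked_copy(std::uint8_t* dst, const std::uint8_t* src,
                  std::size_t size, std::size_t parts) {
    const std::vector<Range> ranges = split_ranges(size, parts);
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        copy_range(dst, src, ranges[i]);
    }
}
```

Because the ranges never overlap, no locking is needed, which is why a hand-written thread pool can be swapped for a single parallel loop.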

Esperado
4th March 2012, 20:50
Do you mean that in the codecs tab you selected the QS decoder and in the info tab you saw libavcodec? What player/splitter do you use? Does it happen with LAV video decoder?
Yes to your first question.
For the second, I'm using DVBViewer (a very good digital terrestrial TV program). You can choose all the elements you want to use on the DirectX configuration page.
If I choose another decoder (CoreAVC, CyberLink, or ATI), it works as expected.
If I choose ffdshow (your version), ffdshow is used and everything seems to work. The only problem is that I had configured Quick Sync for the codec in ffdshow, yet (thanks to DirectShow Intelligent Connect, or to ffdshow?) right-clicking the icon or opening the Info & CPU tab shows libavcodec as the decoder in use instead.

It says " input FOURCC:H264, Decoder: Libavcodec h264, Output color space: YV12 "
I tried both the EVR and VMR9 renderers: same issue.

Following one of your messages, I configured the BIOS to reserve 500 MB of RAM for the graphics card, and I can indeed see 3553 MB of available physical memory (instead of 4 GB) in the Windows 7 Task Manager, and 543 MB as hardware reserved.

CharlieCL
4th March 2012, 21:14
Yes to your first question.

If I choose ffdshow (your version), ffdshow is used and everything seems to work. The only problem is that I had configured Quick Sync for the codec in ffdshow, yet (thanks to DirectShow Intelligent Connect, or to ffdshow?) right-clicking the icon or opening the Info & CPU tab shows libavcodec as the decoder in use instead.

It says " input FOURCC:H264, Decoder: Libavcodec h264, Output color space: YV12 "
I tried both the EVR and VMR9 renderers: same issue.

Following one of your messages, I configured the BIOS to reserve 500 MB of RAM for the graphics card, and I can indeed see 3553 MB of physical memory (instead of 4 GB) in the Windows 7 Task Manager.

This may be the reason I cannot use the QS codec. I also use DirectShow Intelligent Connect. I added test code that displays the video in gray when the input format is YV12. The WMV video was displayed in gray, but H.264 and MP4 video was still displayed in color. I am sure HW acceleration was applied in some players, but I am not sure QS was used.

CruNcher
4th March 2012, 21:28
@ Egur
Still a lot of decoding issues with WMV3/9 MP, though I guess you and the MSDK guys know that: http://img859.imageshack.us/img859/9984/intelwmv3decodeissues.png ;)

egur
4th March 2012, 21:36
Esperado,
Can you try LAV video decoder instead of ffdshow - it will show you if QS is enabled on your system.

I don't have live TV streams to check and very little test material for fourcc::H264 (most h264 content is AVC1). Make sure you are using the ffdshow latest version, fourcc::H264 wasn't supported a few versions ago.

The problem could also be in the stream - if it's an H.264 10-bit or 4:2:2 profile, then HW acceleration doesn't support it (the same is true on Nvidia/AMD).

egur
4th March 2012, 21:37
@ Egur
Still a lot of decoding issues with WMV3/9 MP, though I guess you and the MSDK guys know that: http://img859.imageshack.us/img859/9984/intelwmv3decodeissues.png ;)

Yes, known issue. Thanks.

BTW, SPP deblocking or probably any other deblocking algorithm within ffdshow relies on having the quantization parameter for each macro block.
I can't supply those parameters, so the deblocking algorithms will smooth the image a lot and effectively become useless.

Esperado
5th March 2012, 00:21
I don't have live TV streams to check and very little test material for fourcc::H264 (most h264 content is AVC1). Make sure you are using the ffdshow latest version, fourcc::H264 wasn't supported a few versions ago.

Thanks a lot for your care, Egur.
Would it help if I put some .ts files recorded from TNT HD online for you? If yes, just tell me how long you want them.

Yes, Intel Quick Sync seems to be activated, as it now works with ffdshow on non-HD programs (since I reserved memory for the graphics card?).
But it is not smooth (disappointing), and it still does not work with H264.
With libavcodec the instability is 1 ms; with Quick Sync it is 27-32 ms, varying continuously.
The ffdshow version is tryouts rev4322, Feb 13 2012 21:44:36 (MSVC 2010), and it says H264 is implemented.

If you need any extra info, feel free to ask me (and tell me how to get it).

Esperado
5th March 2012, 04:38
Installed LAV video decoder. It says "Intel Quick Sync active". And wow!
Works out of the box, even with H264 HD channels. The best deinterlacing in all my (numerous) tests.
Added the ffdshow raw video filter for a little XSharpen, and gosh! The best image of all, software and hardware.
I also have an ATI Radeon R6850.
A perfectly constant 50 fps with the EVR renderer, <1 ms of instability.
And 0 ms of instability with VMR9 (whose texture I prefer).

Only a little disappointed by CPU usage: it eats as much CPU as ffdshow's libavcodec.
I'm still here, Egur, if you need some files, and thanks so much for your help and fantastic work, together with Nevcairiel!

egur
5th March 2012, 08:13
Installed LAV video decoder. It says "Intel Quick Sync active". And wow!
Works out of the box, even with H264 HD channels. The best deinterlacing in all my (numerous) tests.
Added the ffdshow raw video filter for a little XSharpen, and gosh! The best image of all, software and hardware.
I also have an ATI Radeon R6850.
A perfectly constant 50 fps with the EVR renderer, <1 ms of instability.
And 0 ms of instability with VMR9 (whose texture I prefer).

Only a little disappointed by CPU usage: it eats as much CPU as ffdshow's libavcodec.
I'm still here, Egur, if you need some files, and thanks so much for your help and fantastic work, together with Nevcairiel!

Good that it works. Your system is QS enabled but has some other issues I can't identify.
Both ffdshow and QS use the same QS dll but LAV sets the timestamps better. The latter is affected by the splitter in many cases and LAV does a better job than what I can do within my code (my code is not aware of the DirectShow environment as it's not a DS filter). LAV also has a very nice SW fallback mechanism in case a specific profile isn't supported in HW.
CPU utilization levels are affected by bitrate - high bitrate will make QS shine, lower bitrates will make libavcodec shine.

In live TV playback, LAV is superior to ffdshow with respect to QS usage as it works in lower latency. This has to do with timestamps generation as stated above.

Update
The latest (http://sourceforge.net/projects/qsdecoder/files/ffdshow_builds/) ffdshow I've built fixes most of the live TV issues. You should try it too.

CruNcher
5th March 2012, 08:59
That's why I say a non-copy-back version, like in the reference decoder, would rock: less overhead, lower latency :)

egur
5th March 2012, 09:35
That's why I say a non-copy-back version, like in the reference decoder, would rock: less overhead, lower latency :)

No image processing, no subtitles, must use EVR/VMR, and the gain is mostly visible in benchmarks, not in real playback of contemporary streams (i.e. most of the stuff we watch today). One can use the MS decoder or a bunch of other DXVA decoders for that.

nevcairiel
5th March 2012, 09:39
To be fair, subtitles do work in many players, just not with ffdshow's subtitle renderer or DirectVobSub.

CruNcher
5th March 2012, 09:46
If everyone looks at what dukey did and builds on it, subtitles would be no problem :)
And the most important image processing can still be done with shaders, though I dunno what for: the quality of Intel's EVR chain is excellent :)

nevcairiel
5th March 2012, 10:53
If everyone looks at what dukey did and builds on it, subtitles would be no problem :)

His subtitle renderer only supports IA44 colors, which is a 4-bit paletted format. For many subtitles, that is by far not enough.
In theory it should work with AYUV, but many GPUs don't support that.

easyfab
5th March 2012, 11:46
new driver for Intel HD Graphics : 15.26.3
Is there something new for Quicksync ?

egur
5th March 2012, 15:51
new driver for Intel HD Graphics : 15.26.3
Is there something new for Quicksync ?

Yes, HW WMV9 playback and probably MVC (not supported in my code) as well but this driver isn't ready for the public yet so I recommend not using it. I recommend sticking with the official SandyBridge drivers for now (15.22 family).

hajj_3
5th March 2012, 15:59
When will intel be announcing whether or not x264 encoding support will be added to ivy bridge or as a software update to sandy/ivy bridge? Hoping they will add support.

andyvt
5th March 2012, 16:18
When will intel be announcing whether or not x264 encoding support will be added to ivy bridge or as a software update to sandy/ivy bridge? Hoping they will add support.

QS offers h.264 support already on SNB.

Esperado
5th March 2012, 16:32
LAV also has a very nice SW fallback mechanism in case a specific profile isn't supported in HW.
CPU utilization levels are affected by bitrate - high bitrate will make QS shine, lower bitrates will make libavcodec shine.
How can I be sure QS is using Intel *hardware* in LAV?
Well, about CPU usage, to give you an idea (i5-2500K overclocked to around 4.5 GHz): I use ~50% CPU watching 4 HD streams at the same time, around 7/11 Mbit/s each, in four different windows. Smoothly.

Last question: do we have to reserve memory in the BIOS for Intel graphics? And if so, what would be a good size with 4 GB of RAM, to reserve as little as possible?

DragonQ
5th March 2012, 17:07
If you're running MPC-HC, go to View -> Filters -> LAV Video and you should see that it says "Active" next to the QuickSync decoder. If it doesn't, then it's fallen back to software mode.

hajj_3
5th March 2012, 21:54
QS offers h.264 support already on SNB.

I'm on about x264 specifically.

andyvt
5th March 2012, 21:56
I'm on about x264 specifically.

You mean the FOSS SW h.264 encoder?

hajj_3
6th March 2012, 02:16
I realise x264 is an H.264 encoder, but I'm pretty sure there is no way to hardware-encode x264 with Intel chips, only Intel's own H.264 codec.

andyvt
6th March 2012, 02:28
I realise x264 is an H.264 encoder, but I'm pretty sure there is no way to hardware-encode x264 with Intel chips, only Intel's own H.264 codec.

x264 is a component that encodes h.264. There should be little practical difference b/w its output and the output of Intel's QS h.264 HW encoder. The thing you're asking for already exists.

egur
6th March 2012, 09:29
QS offers h.264 support already on SNB.

I think he meant patching the x264 encoder with QS HW acceleration.
I don't know if such a project is in the works.

egur
6th March 2012, 09:34
How can I be sure QS is using Intel *hardware* in LAV?
Well, about CPU usage, to give you an idea (i5-2500K overclocked to around 4.5 GHz): I use ~50% CPU watching 4 HD streams at the same time, around 7/11 Mbit/s each, in four different windows. Smoothly.

Last question: do we have to reserve memory in the BIOS for Intel graphics? And if so, what would be a good size with 4 GB of RAM, to reserve as little as possible?

I didn't change the defaults (I think it was 400MB). More RAM for the GPU means more streams can be played simultaneously. You'll get better performance if you OC the RAM. CPU overclocking and GPU overclocking made little change in my (very limited) performance tests compared to RAM OC.

andyvt
6th March 2012, 13:09
I think he meant patching the x264 encoder with QS HW acceleration.
I don't know if such a project is in the works.

That would be a bizarre project :)

Esperado
6th March 2012, 20:57
I didn't change the defaults (I think it was 400MB). More RAM for the GPU means more streams can be played simultaneously. You'll get better performance if you OC the RAM. CPU overclocking and GPU overclocking made little change in my (very limited) performance tests compared to RAM OC.

Overclocking is for other software, like CAD or photo programs. It also speeds up Windows 7 in general, including boot time.
Well, I don't understand this memory stuff. I reduced it to 150 MB and don't see any difference. It looks like the memory is requested dynamically by my TV card software.

[edit] I discovered that memory info for the graphics card is available in the Intel graphics configuration panel (Ctrl+Alt+F12), under "Option and help -> information center".
By default, my BIOS reserves only 64 MB for the graphics card. The BIOS setting only modifies the minimum graphics memory value, and it seems the GPU is able to dynamically allocate as much as it needs. While playing 3 TV windows, I can read that the graphics use only around 130 MB. It does not seem to change with the number of open TV windows.

So I set only 160 MB for the graphics. The result shows: "Minimum graphics memory: 160 MB, maximum graphics memory: 1760 MB, graphics memory in use: 127 MB." Everything seems OK and I don't lose precious RAM that way.

Esperado
7th March 2012, 03:09
With some MPEG-2 channels in VMR9 I saw strange behavior: the image geometry is not linear, and the left and right sides look zoomed.
No problem with H264.
So, I'm back with EVR :-(
NB: this problem never occurred with SW decoders.
I prefer VMR9 because it gives more detail in images (especially on the skin of faces).

NikosD
7th March 2012, 07:59
Ivy Bridge QuickSync (Intel HD 4000) is about 40% faster in transcoding than Sandy Bridge QS (Intel HD 3000).

The article says its advantage comes from the decoder's performance (and drivers).

http://www.anandtech.com/show/5626/ivy-bridge-preview-core-i7-3770k/17

egur
7th March 2012, 09:30
With some MPEG-2 channels in VMR9 I saw strange behavior: the image geometry is not linear, and the left and right sides look zoomed.
No problem with H264.
So, I'm back with EVR :-(
NB: this problem never occurred with SW decoders.
I prefer VMR9 because it gives more detail in images (especially on the skin of faces).

Non linear scaling is an option you can turn on/off from the graphics control panel. The player might enable this feature overriding the driver defaults.

egur
7th March 2012, 10:01
Intel's Graphics Performance Analyzers 2012 R1 (http://software.intel.com/en-us/articles/vcsource-tools-intel-gpa/) has been released.

Esperado
7th March 2012, 11:57
Non linear scaling is an option you can turn on/off from the graphics control panel.

Oh, thanks, Egur. It was enabled by default in the graphics panel, but at 0%. Works perfectly now.
BTW: I gave wrong info earlier (the graphics panel does not update fast enough). On my system, it appears that the used memory is around 110 MB, plus around 200 MB for each full-frame TV instance.

As the graphics card can use the RAM it needs dynamically, what is the point of reserving more than the minimum for it in the BIOS?

My god, all those DirectShow things are so confusing and complicated....

egur
7th March 2012, 12:06
Oh, thanks, Egur. It was enabled by default in the graphics panel, but at 0%. Works perfectly now.
BTW: I gave wrong info earlier (the graphics panel does not update fast enough). On my system, it appears that the used memory is around 110 MB, plus around 200 MB for each full-frame TV instance.

As the graphics card can use the RAM it needs dynamically, what is the point of reserving more than the minimum for it in the BIOS?

My god, all those DirectShow things are so confusing and complicated....

I don't know why you changed the BIOS defaults; you should leave them as is. You should not lower the minimum values. Actual values are dynamic, which is usually the best option.

Atak_Snajpera
7th March 2012, 12:21
does intel decoder support avc high 10bit profile?

egur
7th March 2012, 12:28
does intel decoder support avc high 10bit profile?

Only 4:2:0 8 bit for all codecs.
BTW, are you aware of any GPU that supports 10bit and/or 4:2:2/4:4:4 profiles?
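
The "4:2:0, 8 bit only" rule, combined with the SW fallback mentioned earlier in the thread, boils down to a simple check before picking a decoder path. The sketch below is only an illustration of that decision; the types and function names are hypothetical and do not come from the QS decoder or LAV sources.

```cpp
#include <cassert>

enum class ChromaFormat { Yuv420, Yuv422, Yuv444 };
enum class DecoderPath { Hardware, Software };

// Minimal stream description (hypothetical; a real decoder would read
// these values from the bitstream headers).
struct StreamInfo {
    int bit_depth;        // bits per sample, e.g. 8 or 10
    ChromaFormat chroma;  // chroma subsampling of the stream
};

// True only for the case stated above: 8-bit samples with 4:2:0 chroma.
inline bool hw_decode_supported(const StreamInfo& s) {
    assert(s.bit_depth > 0);  // sanity check on caller-supplied metadata
    return s.bit_depth == 8 && s.chroma == ChromaFormat::Yuv420;
}

// Pick the hardware path when possible, otherwise fall back to software.
inline DecoderPath choose_decoder(const StreamInfo& s) {
    return hw_decode_supported(s) ? DecoderPath::Hardware
                                  : DecoderPath::Software;
}
```

A High 10 stream (`bit_depth == 10`) or a 4:2:2 stream would take the software path under this rule.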

Esperado
7th March 2012, 12:59
I don't know why you changed the BIOS defaults.
I had read somewhere in this thread that, to be able to activate, Quick Sync decoding needs >128 MB of reserved RAM. That's why. But it turns out this is not true.

You should not lower the minimum values
Once again, by default, it was 64 MB in the BIOS. And I'm now at 160 MB.

Atak_Snajpera
7th March 2012, 14:46
BTW, are you aware of any GPU that supports 10bit and/or 4:2:2/4:4:4 profiles?
It's a shame that even modern GFX cards do not fully support AVC.

Esperado
7th March 2012, 15:32
As a photographer, I care a lot about image quality. Using Quick Sync in the LAV video filter, the VMR9 renderer, sharpening in the Intel graphics panel and a little XSharpen added in post-processing with the ffdshow raw filter, I get a fantastic result.

With good HD programs, images are so sharp, and the perspective, relief and perception of depth of field are astonishing, without the unnatural "video game" feeling I used to get with other decoders with enhanced sharpening.

Deinterlacing is perfect and only limited by the latency of my screen. Stability problems are forgotten.

Even with an uninteresting program, it is a physical pleasure to look at the image quality. Equal to, if not better than, my best post-processed digital photos.

Egur and nevcairiel (and Intel), thank you so much for your "state of the art" perfect work. I am very appreciative and grateful.

egur
7th March 2012, 21:31
Great!
You might consider switching to EVR for better quality.

Esperado
7th March 2012, 23:01
Great!
You might consider switching to EVR for better quality.

I do not agree at all, for two reasons (on my PC):
1- Frequency is perfectly stable with VMR9 (0 ms instability). With EVR it constantly varies around the 50 Hz point (49.98 to 50.02 Hz, 1 ms instability).
2- EVR seems to smooth the colors; skin looks repainted, with less detail, and I don't like it.
On a TV logo, for example, there is a slight color gradient in the background. I can see the MPEG compression steps in VMR9 (like in GIF images), while with EVR it is smoothed and looks monochromatic or monotone. Less sharp, too. Like too much noise reduction in photo post-production.
I chose VMR9 because images look better on my system, with more detail. Chosen with my eyes, with no doubt, while my original belief was that EVR was better.
The only area where I find EVR better is CPU usage.

Oh, I have a question. With my RAM overclocked to 828 MHz 7-8-7-20 and my CPU to 4550 MHz, is any benefit to be expected from overclocking the Intel GPU from 850 to 1100 MHz, just to watch live or recorded TNT programs (I don't play video games)?

CharlieCL
7th March 2012, 23:28
Ivy Bridge QuickSync (Intel HD 4000) is about 40% faster in transcoding than Sandy Bridge QS (Intel HD 3000).

The article says its advantage comes from the decoder's performance (and drivers).

http://www.anandtech.com/show/5626/ivy-bridge-preview-core-i7-3770k/17

In my testing with the latest GPA, the Sandy Bridge architecture may be the main reason for its lower performance compared to Ivy Bridge. In my testing I found the GPU EUs 50% stalled. The wider ring bus of Ivy Bridge may give it higher performance.

egur
8th March 2012, 09:11
I do not agree at all, for two reasons (on my PC):
1- Frequency is perfectly stable with VMR9 (0 ms instability). With EVR it constantly varies around the 50 Hz point (49.98 to 50.02 Hz, 1 ms instability).
2- EVR seems to smooth the colors; skin looks repainted, with less detail, and I don't like it.
On a TV logo, for example, there is a slight color gradient in the background. I can see the MPEG compression steps in VMR9 (like in GIF images), while with EVR it is smoothed and looks monochromatic or monotone. Less sharp, too. Like too much noise reduction in photo post-production.
I chose VMR9 because images look better on my system, with more detail. Chosen with my eyes, with no doubt, while my original belief was that EVR was better.
The only area where I find EVR better is CPU usage.

Oh, I have a question. With my RAM overclocked to 828 MHz 7-8-7-20 and my CPU to 4550 MHz, is any benefit to be expected from overclocking the Intel GPU from 850 to 1100 MHz, just to watch live or recorded TNT programs (I don't play video games)?

Skin tone correction can be turned off (or weakened) in the control panel. So can noise reduction, sharpening, auto contrast, etc.

Overclocking will not get you anything (anything good) if your use case is playback. It will help in transcoding and benchmarks. Your system is already very fast for playback.

egur
8th March 2012, 09:12
In my testing with the latest GPA, the Sandy Bridge architecture may be the main reason for its lower performance compared to Ivy Bridge. In my testing I found the GPU EUs 50% stalled. The wider ring bus of Ivy Bridge may give it higher performance.

Wider ring bus? Where did you hear that?

mbcd
8th March 2012, 12:49
Hi Guys,

I could use some help getting it to run.

System: I7 2nd-Edition, Windows 7, Lucid Virtu 1.2.105.17711, Intel HD 8.15.10.2622 Driver

I downloaded ffdshow from egur and installed it. But on the Codecs page I can't choose Intel QuickSync, only "libavcodec" or "disabled".

So it seems that something is going wrong.

My primary video card is an ATI, because my motherboard does not have an onboard graphics output port.

Virtu says that it is running, so any idea where to look for a fix?

CharlieCL
8th March 2012, 17:21
Wider ring bus? Where did you hear that?

It was openly published. Here is the URL:

http://www.anandtech.com/Gallery/Album/1375#6

The ring bus is 256-bits.

nevcairiel
8th March 2012, 17:23
The ring bus is 256-bits.

Sandy Bridge also has a 256-bit ring bus.

egur
8th March 2012, 18:07
Sandy Bridge also has a 256-bit ring bus.

Correct.
IvyBridge adds some micro architectural performance enhancements with respect to SSE/AVX. It should be worthwhile to write an AVX copy back function.
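
To make the idea concrete, here is a minimal sketch of what a 256-bit copy loop could look like. It assumes the build enables AVX (for example `-mavx` on GCC/Clang or `/arch:AVX` on MSVC) and otherwise falls back to `memcpy`; the function name `avx_copy` is hypothetical. It uses plain unaligned loads and stores only, so it does not show the streaming-load handling that a real copy-back from video memory would likely need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Copy `size` bytes from src to dst, 32 bytes at a time when AVX is enabled
// at compile time. The buffers must not overlap.
void avx_copy(void* dst, const void* src, std::size_t size) {
    assert(size == 0 || (dst != nullptr && src != nullptr));
    std::uint8_t* d = static_cast<std::uint8_t*>(dst);
    const std::uint8_t* s = static_cast<const std::uint8_t*>(src);
#if defined(__AVX__)
    while (size >= 32) {
        // Unaligned 256-bit load and store; no alignment is assumed.
        const __m256i v =
            _mm256_loadu_si256(reinterpret_cast<const __m256i*>(s));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(d), v);
        s += 32;
        d += 32;
        size -= 32;
    }
#endif
    // Remaining tail bytes, or the whole buffer when AVX is not enabled.
    if (size > 0) {
        std::memcpy(d, s, size);
    }
}
```

Whether this beats an SSE version would have to be measured on the target hardware; the sketch only shows the shape of the loop.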