Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing
NikosD
5th January 2012, 21:43
@Cruncher
@Egur
I can't see how any DXVA decoder, or DXVA through the Intel Media decoder, could play a video file with a resolution beyond 1920 x 1080 in DXVA mode when DXVA Checker reports no decoder devices capable of more than 1920 x 1080.
If the Intel drivers don't install decoder devices capable of more than 1080, you will never manage to run DXVA beyond 1080, I think.
If you run DXVA Checker to list the decoder devices for Intel HD Graphics, you will see what I'm talking about.
BTW did you check http://xhmikosr.1f0.de/index.php?folder=c2FtcGxlcy8yMTYwcA==
It has a lot of 4K samples and good download speed.
egur
5th January 2012, 22:18
Hi egur :D
I'm in a lot of trouble:
I've downloaded the latest ffdshow build compiled by you: QuickSync is offered during installation, but not afterwards (on the normal ffdshow Video Codecs page). MediaPortal plays the file, but the ffdshow icon says it's using libavcodec (CPU usage is high!)... and in ffdshow video I can't configure QuickSync anymore :mad:
Is it a bug, or is my system not supported?
Strange, as my hardware is an ASRock Core 100HT:
http://www.asrock.com/nettop/overview.asp?Model=Core%20100HT#Specifications
Shouldn't it be supported?
If not, why does the installer mislead me?
The official ffdshow has QuickSync selectable, but in the end it uses libavcodec...
Do I have to install the famous QuickSync.dll by hand (?)? There's none on my system :p
:thanks: for your support!
I've just downloaded and installed the latest SVN build (4225) and it works fine. The DLL name is IntelQuickSyncDecoder.dll BTW.
It looks like your system is based on an older CPU (Nehalem or Westmere) - the 1st generation i3/i5/i7. Although these systems should work, they don't work as well as SandyBridge. Are you sure the Intel driver is enabled?
CruNcher
5th January 2012, 22:20
Can you share on multiupload, download breaks all the time.
Did you try it from both servers?
@Cruncher
@Egur
I can't see how any DXVA decoder, or DXVA through the Intel Media decoder, could play a video file with a resolution beyond 1920 x 1080 in DXVA mode when DXVA Checker reports no decoder devices capable of more than 1920 x 1080.
If the Intel drivers don't install decoder devices capable of more than 1080, you will never manage to run DXVA beyond 1080, I think.
If you run DXVA Checker to list the decoder devices for Intel HD Graphics, you will see what I'm talking about.
BTW did you check http://xhmikosr.1f0.de/index.php?folder=c2FtcGxlcy8yMTYwcA==
It has a lot of 4K samples and good download speed.
The problem is that it might not support even high-bitrate QFHD, and DXVA can't change that; it has nothing to do with DXVA but with the decoder ASIC.
So it seems pretty useless to enable it when people would only get bad performance from it. In the end, customer support would be ringing all day and night, for the ISVs as well, just to get the same answer: we never advertised it, it's not capable of it, have a good day ;) .
It might be enough to raise the frequency of the current ASIC; I guess they will have done that in Ivy Bridge (with all the improvements a little more heat is no problem, they would cancel it out with the tri-gate transistors). The QuickSync encoder's performance partly relies on the GPU frequency (EUs, shaders), but I doubt the decoder shares that, though it's easy to test :)
NikosD
5th January 2012, 22:39
@Cruncher
Have you seen my thread with DXVA hardware comparisons ?
http://forum.doom9.org/showthread.php?t=163110
As you can see, the QS decoder is clearly faster than VP5, which Nvidia says is capable of 4K.
So if Nvidia says that VP5 is capable of 4K, what does Intel have to say about QS decoder performance and 4K?
Also the QS decoder has the same frequency as the GPU.
CruNcher
5th January 2012, 22:50
@Cruncher
Have you seen my thread with DXVA hardware comparisons ?
http://forum.doom9.org/showthread.php?t=163110
As you can see, the QS decoder is clearly faster than VP5, which Nvidia says is capable of 4K.
So if Nvidia says that VP5 is capable of 4K, what does Intel have to say about QS decoder performance and 4K?
Also the QS decoder has the same frequency as the GPU.
Hmm, yep, that would be twice the frequency VP4 runs at by default, and together with being on-die it could explain the performance. Has anyone ever benchmarked with the GPU overclocked? If the ASIC and GPU clocks are really tied together, you should see an improvement in decoding performance, though as I understand it they are separate from each other, the same as with Nvidia's ASIC and most probably AMD's too.
Anyway, you are right, it's a little strange, and I would also expect those QFHD samples to play without issues. The GPU seems overloaded, though: I didn't look at the stats yet, but my mouse starts to lag and the whole Aero latency explodes, which is a clear indication of the GPU being at its limit. That could also mean the GPU and decoder really do depend on each other, and in the case of the memory copy they do, so it's hard to say which of those two possibilities is the case here with QFHD playback. You could say the memory copy is causing the stress and spikes; seeing that the whole Aero desktop is under pressure, it very well could be the case, and with DXVA it might run flawlessly.
What I could try is to take some load off the GPU shaders and see how that works out, using MadVR at its lowest quality config or using overlay directly :)
Yep, the CPU also gets to its limit (fluctuates heavily) when rendering into Null, so most probably it is the memory copy overload that is causing the problems, and with DXVA it should indeed play flawlessly :)
nevcairiel
6th January 2012, 00:10
Eric, I debugged the aspect ratio problem a bit, and I found why it's not working.
Apparently, the structure you get back from GetVideoParams does not update; however, the structure that's associated with the surface itself does contain the new data.
So, in CQuickSync::ProcessDecodedFrame, if you look at pSurface->Info, all the needed information is there, you just have to plug it into your output frame.
Hope you can address that soon'ish. :)
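A minimal sketch of the idea, in case it helps anyone following along. FrameInfo is only a stand-in that mirrors the cropping and aspect-ratio fields of the Media SDK's mfxFrameInfo (CropW, CropH, AspectRatioW, AspectRatioH); DisplayAspect, Gcd and ComputeDisplayAspect are hypothetical names, not the actual QuickSync decoder code. It assumes the per-surface info carries the pixel aspect ratio and that an unset ratio means square pixels.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the relevant fields of mfxFrameInfo (assumption: only these
// four fields are needed to work out the display aspect ratio).
struct FrameInfo {
    uint16_t CropW, CropH;                // visible picture size in pixels
    uint16_t AspectRatioW, AspectRatioH;  // pixel (sample) aspect ratio
};

struct DisplayAspect {
    uint32_t x, y;
};

// Greatest common divisor, used to reduce the ratio.
inline uint32_t Gcd(uint32_t a, uint32_t b) {
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Derive the display aspect ratio from the info attached to the decoded
// surface rather than from the stream-level parameters, which may be stale.
inline DisplayAspect ComputeDisplayAspect(const FrameInfo& info) {
    assert(info.CropW != 0 && info.CropH != 0);
    // Assumption: an unset pixel aspect ratio means square pixels.
    uint32_t parW = info.AspectRatioW ? info.AspectRatioW : 1;
    uint32_t parH = info.AspectRatioH ? info.AspectRatioH : 1;
    uint32_t x = info.CropW * parW;
    uint32_t y = info.CropH * parH;
    uint32_t g = Gcd(x, y);
    return DisplayAspect{x / g, y / g};
}
```

For example, a 1440 x 1080 surface with a 4:3 pixel aspect ratio reduces to 16:9, which is what the output frame would then carry.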
egur
6th January 2012, 13:09
Eric, I debugged the aspect ratio problem a bit, and I found why it's not working.
Apparently, the structure you get back from GetVideoParams does not update; however, the structure that's associated with the surface itself does contain the new data.
So, in CQuickSync::ProcessDecodedFrame, if you look at pSurface->Info, all the needed information is there, you just have to plug it into your output frame.
Hope you can address that soon'ish. :)
:thanks: for finding the workaround. I'll test it. I'm not sure this is the intended way of doing things, but as a temporary solution I'll implement the fix. I'm not very happy with how multithreading is done in my code and I already found a bug. I'll release a new version in a few days.
Update
I've committed the last fixes (rev16) including the AR fix - nev's workaround works great, thanks again.
Did very basic testing so I'm not releasing a build just yet. Only after the MT rework.
NikosD
6th January 2012, 16:45
My second visit to the Core i5-2400 system, with a more careful look at GPU frequency, power consumption and performance.
During playback with PotPlayer DXVA and PotPlayer QS, the GPU is always at its default speed of 850 MHz, the CPU at 1.6 GHz, and the total consumption of the CPU package is ~12 - 15 W, with QS at about 7 - 8 W.
To find the QS power consumption, I subtract the IA cores and GT cores from the whole package.
But for 60fps clips, during playback with QuickSync only (not DXVA), the total consumption goes up to 25 - 28 W because, although the GPU remains at its default speed of 850 MHz, the CPU goes up to 3.2 GHz (Turbo mode).
QS consumption still remains at 7 - 8 W.
During benchmarking I had a real surprise.
Most of the time during benchmarking of 60fps clips with QS FFDShow, the GPU remained at 850 MHz; it didn't go to 1100 MHz.
During pure DXVA benchmarking the GPU (QS) always goes to maximum speed of 1100 MHz, just as QS FFDshow goes up to 1100 MHz during benchmarking of non 60fps clips.
That explains the "poor" performance of QS FFDShow vs QS DXVA in 60fps clips. It never pushes QS to maximum speed.
The power consumption went very high, up to 43 W for the whole CPU package, because during benchmarking the CPU and GPU work at Turbo mode (3.2 GHz) and 1100 MHz.
QS FFDshow had 7-9 W more power consumption than QS DXVA.
One last thing...
During QS FFDShow VC-1 benchmarking, the GPU (QS) was at its default speed of 850 MHz and the QS utilization was ONLY 35% !! Yet it decoded 200 fps on the most difficult VC-1 clip I have found.
So if someone could use the VC-1 decoder at 1100 MHz and 97-98% utilization, they could easily exceed 700 fps!
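The arithmetic behind that estimate can be written down as a small sketch. It assumes decode throughput scales linearly with both the QS clock and the QS utilization, which is an optimistic assumption (memory bandwidth and the frame copy may get in the way); ExtrapolateFps is a hypothetical helper name.

```cpp
#include <cassert>

// Linear extrapolation of decode throughput to a different clock and
// utilization. Assumption: fps scales proportionally with both factors.
inline double ExtrapolateFps(double measuredFps,
                             double measuredMHz, double measuredUtil,
                             double targetMHz, double targetUtil) {
    assert(measuredMHz > 0.0 && measuredUtil > 0.0);
    return measuredFps * (targetMHz / measuredMHz) * (targetUtil / measuredUtil);
}
```

With the figures above, ExtrapolateFps(200, 850, 0.35, 1100, 0.97) comes out at roughly 717 fps, so the "more than 700 fps" claim holds under the linear-scaling assumption.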
nevcairiel
6th January 2012, 17:33
Update
I've committed the last fixes (rev16) including the AR fix - nev's workaround works great, thanks again.
Did very basic testing so I'm not releasing a build just yet. Only after the MT rework.
Tried it, and seems to be working great.
I disabled MT for the time being, and still going with r16. :)
egur
6th January 2012, 20:17
Tried it, and seems to be working great.
I disabled MT for the time being, and still going with r16. :)
Very well, maybe you can put a setup option to disable/enable MT for stress testing or benchmarks.
nevcairiel
6th January 2012, 20:41
I got a report of a rather odd bug..
Sample: http://www.multiupload.com/96E38TN3HI
This file hangs the player after a few seconds when multi-threading is disabled. Funnily enough, it works with multi-threading on, though not completely. If you seek a lot, it can hang the player too.
The "special" thing about this file appears to be that it has 16 ref-frames; however, I'm not sure that's the real cause of the problem.
I'll try to debug a bit as well, but if you get a chance to look at it, that would be great. :)
Edit:
First debug results:
It seems to hang in CQuickSync::ProcessDecodedFrame in the loop that waits for a free frame, but never gets one (the while (m_FreeFramesPool.Empty()) loop)
I increased the capacity in CQuickSync::InitDecoder from 4 to 20, and the hang seems to be gone in initial tests. I tried 8 before, and the hang was still there, so I went overboard with 20 to be sure. :p
I'm not sure what the fixed-size pool is good for anyway. If it cannot get one, just create a new one? Maybe I'm missing something. :) At least 4 seems to be not enough for some streams like that.
nevcairiel
6th January 2012, 21:33
Another problem/bug
When decoding Live TV with DVBViewer, it frequently happens that after a channel change you don't get an image for 20-30 seconds or so, then it suddenly starts working.
In the period when it's not working, I'm spammed with these errors:
QSDcoder: Decode MFX_ERR_NOT_ENOUGH_BUFFER
QSDcoder: Error - ran out of work buffers!
Again, i'll try to look into it. :)
Edit:
I think I know why. Apparently you don't init the decoder until OnSeek is called, right? However, the source filter in DVBViewer doesn't call NewSegment at the start of playback (and thus no OnSeek is triggered); it may take a while for this to happen (for some reason), and as such, it's not initialized in time.
I tried manually calling OnSeek during my own init, and then it works. I would suggest to check if you init'ed in ::Decode(), and if it didn't happen yet, perform these steps there? As i understand, the delay is only there to allow for setting the D3D device manager, so when you start decoding something, you should have that setup already.
egur
6th January 2012, 21:50
I got a report of a rather odd bug..
Sample: http://www.multiupload.com/96E38TN3HI
This file hangs the player after a few seconds when multi-threading is disabled. Funnily enough, it works with multi-threading on, though not completely. If you seek a lot, it can hang the player too.
The "special" thing about this file appears to be that it has 16 ref-frames; however, I'm not sure that's the real cause of the problem.
I'll try to debug a bit as well, but if you get a chance to look at it, that would be great. :)
Edit:
First debug results:
It seems to hang in CQuickSync::ProcessDecodedFrame in the loop that waits for a free frame, but never gets one (the while (m_FreeFramesPool.Empty()) loop)
I increased the capacity in CQuickSync::InitDecoder from 4 to 20, and the hang seems to be gone in initial tests. I tried 8 before, and the hang was still there, so I went overboard with 20 to be sure. :p
I'm not sure what the fixed-size pool is good for anyway. If it cannot get one, just create a new one? Maybe I'm missing something. :) At least 4 seems to be not enough for some streams like that.
This is a special clip - a specific call to Decode caused more than 4 frames to be output. I had overlooked this case.
The fix is to add a call to DeliverSurface after ProcessDecodedFrame:
ProcessDecodedFrame(pSurface);
DeliverSurface(true);
The reason to use 4 is that using more wastes a lot of memory and doesn't gain any speed (with the current implementation anyway). There's a race over who works faster - the HW decoder or the frame copy. If the decoder is super fast (low bitrate) then an unlimited queue will eat up all the memory. If the decoder is very slow compared to the frame copy, a large queue will not help.
This will probably change a few times before I'm done with it.
Thanks for finding another bug.
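To illustrate why the extra DeliverSurface call matters, here is a simplified, single-threaded model of a fixed-size frame pool. FramePipeline and DecodeBurst are hypothetical names and this is not the decoder's real code; in particular, the model's DeliverSurface takes no argument and simply returns every pending buffer to the pool, and the place where the real loop would wait forever is modeled as a failed call.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical model of a decoder with a fixed number of output buffers.
class FramePipeline {
public:
    explicit FramePipeline(std::size_t capacity) : m_free(capacity) {}

    // Takes a free buffer for a decoded frame. Returns false where the real
    // code would wait forever because nothing hands buffers back.
    bool ProcessDecodedFrame(int frameId) {
        if (m_free == 0) return false;
        --m_free;
        m_pending.push_back(frameId);
        return true;
    }

    // Sends pending frames downstream and returns their buffers to the pool.
    void DeliverSurface() {
        while (!m_pending.empty()) {
            m_delivered.push_back(m_pending.front());
            m_pending.pop_front();
            ++m_free;
        }
    }

    const std::vector<int>& Delivered() const { return m_delivered; }

private:
    std::size_t m_free;            // buffers currently available
    std::deque<int> m_pending;     // frames holding a buffer, not yet delivered
    std::vector<int> m_delivered;  // frames handed downstream, in order
};

// Models one Decode call that outputs `frames` pictures. With
// deliverEachFrame set, buffers are recycled after every frame (the fix);
// otherwise they are only recycled once the whole call has finished.
inline bool DecodeBurst(FramePipeline& p, int frames, bool deliverEachFrame) {
    for (int i = 0; i < frames; ++i) {
        if (!p.ProcessDecodedFrame(i)) return false;  // the hang
        if (deliverEachFrame) p.DeliverSurface();
    }
    p.DeliverSurface();
    return true;
}
```

With a pool of 4, a burst of 6 frames gets stuck on the fifth frame unless buffers are delivered in between, which matches the observation that raising the capacity only hides the problem for clips with smaller bursts.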
egur
6th January 2012, 22:02
Another problem/bug
When decoding Live TV with DVBViewer, it frequently happens that after a channel change you don't get an image for 20-30 seconds or so, then it suddenly starts working.
In the period when it's not working, I'm spammed with these errors:
QSDcoder: Decode MFX_ERR_NOT_ENOUGH_BUFFER
QSDcoder: Error - ran out of work buffers!
Again, i'll try to look into it. :)
Edit:
I think I know why. Apparently you don't init the decoder until OnSeek is called, right? However, the source filter in DVBViewer doesn't call NewSegment at the start of playback (and thus no OnSeek is triggered); it may take a while for this to happen (for some reason), and as such, it's not initialized in time.
I tried manually calling OnSeek during my own init, and then it works. I would suggest to check if you init'ed in ::Decode(), and if it didn't happen yet, perform these steps there? As i understand, the delay is only there to allow for setting the D3D device manager, so when you start decoding something, you should have that setup already.
OK, i think I know the problem. I don't have a live TV capture setup and I'm not familiar with DVBViewer.
The decoder is initialized late to handle full screen exclusive mode where the decoder must have an external D3D device manager.
Long story short, because of DVD playback issues (DVD doesn't send NewSegment either) the Decode function looks at the internal variable m_bNeedToFlush. This variable is set to true on either NewSegment (OnSeek) or BeginFlush. It means that the decoder has received a flush (discard frames) event and the flushing has not completed. This is a must due to the asynchronous nature of the BeginFlush/EndFlush function calls (Nev, you know this but I'm trying to draw a complete picture).
Since DVBViewer sends neither NewSegment nor BeginFlush, the decoder Init function should set m_bNeedToFlush to true. In normal playback, OnSeek is called anyway so no performance is lost at all.
In InitDecoder:
MSDK_TRACE("QSDcoder: InitDecoder\n");
CQsAutoLock cObjectLock(&m_csLock);
m_bNeedToFlush = true;
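A toy model of the flag handling described above, for readers who want to see the control flow in one place. DecoderModel and its members are hypothetical; what the real Decode does when it sees the flag is not shown in this thread, so the model just counts a flush and clears the flag.

```cpp
#include <cassert>

// Hypothetical model: any event that invalidates queued frames sets a flag,
// and Decode() acts on the flag before handling new input.
class DecoderModel {
public:
    void InitDecoder() {
        m_initialized = true;
        m_needToFlush = true;  // the suggested fix: start in "needs flush" state
    }

    void OnSeek() { m_needToFlush = true; }      // NewSegment
    void BeginFlush() { m_needToFlush = true; }  // asynchronous flush request

    // Returns true if this call had to flush before decoding.
    bool Decode() {
        assert(m_initialized);
        bool flushed = false;
        if (m_needToFlush) {
            Flush();
            flushed = true;
        }
        ++m_decodedFrames;
        return flushed;
    }

    int FlushCount() const { return m_flushes; }
    int DecodedFrames() const { return m_decodedFrames; }

private:
    void Flush() {
        ++m_flushes;
        m_needToFlush = false;
    }

    bool m_initialized = false;
    bool m_needToFlush = false;
    int m_flushes = 0;
    int m_decodedFrames = 0;
};
```

In this model a source that never sends NewSegment or BeginFlush still gets exactly one flush on the first Decode call, and a normal player that seeks right after init pays nothing extra because the flag is already set.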
nevcairiel
6th January 2012, 22:16
Since DVBViewer sends neither NewSegment nor BeginFlush, the decoder Init function should set m_bNeedToFlush to true. In normal playback, OnSeek is called anyway so no performance is lost at all.
It can also happen if a dynamic format change occurs.
It's possible for a DirectShow decoder to receive a new media type and perform a complete format change without any flush on the graph.
This change would also take care of that, good.
PS:
Are you committing those changes, or do I have to ship a modified version to my testers? :D
Also, doesn't multi-threading also need a fix for the first issue? Well, i guess when you're reworking the whole MT anyway, you can handle it somehow. :)
egur
7th January 2012, 00:22
It can also happen if a dynamic format change occurs.
It's possible for a DirectShow decoder to receive a new media type and perform a complete format change without any flush on the graph.
This change would also take care of that, good.
PS:
Are you committing those changes, or do I have to ship a modified version to my testers? :D
Also, doesn't multi-threading also need a fix for the first issue? Well, i guess when you're reworking the whole MT anyway, you can handle it somehow. :)
If it's the same kind of codec then there's no problem changing the media type during playback (should be). If a new codec is used then you must destroy the decoder and reinitialize - call Flush() before destruction.
I suggest for now to disable MT for testing.
I've made good progress on MT and will probably release tomorrow.
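The media type rule above (same codec keeps the decoder, new codec means flush, destroy and reinitialize) could be sketched like this. DecoderHost is a hypothetical wrapper and the codec names are plain strings for illustration only; the real filter works with DirectShow media types.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical host that owns one decoder instance at a time.
class DecoderHost {
public:
    explicit DecoderHost(std::string codec) : m_codec(std::move(codec)) {
        assert(!m_codec.empty());
    }

    // Returns true if the decoder had to be flushed and recreated.
    bool OnMediaTypeChange(const std::string& newCodec) {
        if (newCodec == m_codec) {
            return false;  // same codec: keep the running decoder
        }
        ++m_flushes;    // stands in for calling Flush() on the old decoder
        m_codec = newCodec;
        ++m_instances;  // stands in for destroying and reinitializing it
        return true;
    }

    int Instances() const { return m_instances; }
    int Flushes() const { return m_flushes; }

private:
    std::string m_codec;
    int m_instances = 1;
    int m_flushes = 0;
};
```

A resolution or aspect ratio change within the same codec leaves the instance count untouched, while a switch from one codec to another flushes once and creates a second instance.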
nevcairiel
7th January 2012, 00:53
Sounds good.
With all the recent fixes, its working quite nicely now. :)
El Topo
7th January 2012, 11:59
Thanx for the work guys.
What renderer should I use with the QS decoder? Normally I use EVR-Custom, but the performance seems to be significantly lower than with vanilla EVR.
I have a SB Pentium G630T and a Celeron G530, if you want me to test specific things, let me know.
Nev, good work with the integration. Can't figure out any further problems with the fix1-version from yesterday.
egur
7th January 2012, 15:06
Thanx for the work guys.
What renderer should I use with the QS decoder? Normally I use EVR-Custom, but the performance seems to be significantly lower than with vanilla EVR.
I have a SB Pentium G630T and a Celeron G530, if you want me to test specific things, let me know.
Nev, good work with the integration. Can't figure out any further problems with the fix1-version from yesterday.
With SandyBridge, your best bets are either EVR (standard) or MadVR. EVR has better performance and more image processing algorithms. MadVR has several advanced features and is more configurable.
El Topo
7th January 2012, 17:42
Good to know... I always thought EVR Custom was more advanced than EVR standard.
I use DVBViewer and XBMC most of the time, so no Madvr for me at the moment....
nevcairiel
7th January 2012, 17:44
It's more advanced, but it also needs more performance for those advanced features.
fano
7th January 2012, 18:49
I've just downloaded and installed the latest SVN build (4225) and it works fine. The DLL name is IntelQuickSyncDecoder.dll BTW.
It looks like your system is based on an older CPU (Nehalem or Westmere) - the 1st generation i3/i5/i7.
Thanks egur for your prompt response... to start with, I've now installed an official ffdshow, rev4192 to be precise (I've formatted to restart clean!), so I suppose it's normal that there is no QuickSync DLL (but in the H264 decoder I have QuickSync selectable BUT not working, as in your fork... is this a bug?) :p
So here's my CPU-Z report (it says it's an ARRANDALE CPU!):
http://www.4shared.com/archive/EY6wLrGj/AsRcok_Core100.html
And the GPU-Z report:
http://gpuz.techpowerup.com/12/01/06/7fz.png
Although these systems should work, they don't work as well as SandyBridge. Are you sure the Intel driver is enabled?
Ahrrr sad news... it might not work? I hope you'll make it work!
Yes, I've installed the GPU driver, if that's what you're talking about: in the Intel GPU information I see it's version 6.14.10.5387...
I suppose it's the latest...
Is there something else to install that I've missed?
Thanks for all your help... it's appreciated :D
nevcairiel
7th January 2012, 18:52
Thats Windows XP, eh?
The Intel QuickSync tech only works on Vista or 7, no XP support.
egur
7th January 2012, 19:48
Good to know... I always thought EVR Custom was more advanced than EVR standard.
I use DVBViewer and XBMC most of the time, so no Madvr for me at the moment....
EVR is special in the sense that its output is different between Intel, Nvidia and AMD. It is also different between GPU families and generations.
Many people think EVR uses simple bi-linear interpolation for scaling. For old cards, they are right. For the new cards, that's very wrong.
SandyBridge uses a very advanced scaler, so it's best to test for yourself. The power/performance of the GPU is best utilized by the EVR. You should check the quality by applying a large scale factor to a clip (e.g. 4x) as well as a very small scale factor (e.g. 1/4x).
rica
7th January 2012, 19:51
AFAIK, egur's utility wouldn't work on Arrandale/Clarkdale even if it was on Vista or Seven?
egur
7th January 2012, 19:52
Thats Windows XP, eh?
The Intel QuickSync tech only works on Vista or 7, no XP support.
Correct, XP not supported (need DXVA2 to work). Sorry.
egur
7th January 2012, 19:55
AFAIK, egur's utility wouldn't work on Arrandale/Clarkdale even if it was on Vista or Seven?
Works on Penryn (or newer) with Intel GPU on Vista (or newer).
I only test Windows 7 (and newer) and SandyBridge (and newer).
clsid
7th January 2012, 19:57
I've installed a official ffdshow: rev4192
That is an old version and does not yet include quicksync. Latest build available on SourceForge is 4225.
rica
7th January 2012, 20:00
Sorry egur, I've missed your latest update. :o
I will test it with my iGPU; i3 540+H55 on Seven 32.
fano
7th January 2012, 21:09
Correct, XP not supported (need DXVA2 to work). Sorry.
So I've lost... in the END :mad:
I will NEVER install Vista or 7.. I hate them!
No chance of making it work in XP, too? Pretty please with chocolate on top :D
In the end Windows XP is the best Microsoft OS ever, right ;) ?
nevcairiel
7th January 2012, 21:15
No chance of making it work in XP, too? Pretty please with chocolate on top :D
No chance. It's impossible, the XP driver does not allow it.
The only chance would be if Intel starts supporting this in the XP driver, but the chance of that is slim to none.
In the end Windows XP is the best Microsoft OS ever, right ;) ?
It most certainly is not. :)
PS:
Eric, how is the progress on the next version coming? :)
I'm pondering releasing soon or waiting.
egur
7th January 2012, 21:37
Eric, how is the progress on the next version coming? :)
I'm pondering releasing soon or waiting.
Rev17 is out with all the fixes and the new MT code.
Atak_Snajpera
7th January 2012, 22:06
So I've lost... in the END :mad:
I will NEVER install Vista or 7.. I hate them!
No chance of making it work in XP, too? Pretty please with chocolate on top :D
In the end Windows XP is the best Microsoft OS ever, right ;) ?
You are acting like a child, saying that you hate Vista and 7. Does your wonderful ancient XP have a GPU-accelerated interface, TRIM for SSDs, DX11, self-repairing capability, overall higher stability, EVR ... and so on?
rica
7th January 2012, 22:12
I will test it with my iGPU; i3 540+H55 on Seven 32.
Sorry, but I can't find the DXVA option in ffdshow?
ffdshow rev. 4225 + Intel HD Graphics: ver.8.15.10.2559 on Seven 32.
Clarkdale i3 540+H55.
Thanks!
EDIT: OK, fixed, works like a charm. :thanks:
Chain: Lav Splitter > ffdshow video DXVA > EVR CP.
egur
7th January 2012, 22:17
Eric, how is the progress on the next version coming? :)
I'm pondering releasing soon or waiting.
Update: use rev18 - it has the correct version number. Other than that, it is the same as rev17.
egur
7th January 2012, 22:24
Version 0.22 beta is out with the following changes:
* Much better multi-threading code (many fixes from v0.21).
* Fixed dynamic aspect ratio change during playback.
* FFDShow rev4227
Download from SourceForge home page (http://sourceforge.net/projects/qsdecoder/)
nevcairiel
7th January 2012, 22:35
LAV Filters 0.44 now also "officially" features the 0.22 decoder.
Multi-threading is still off for the time being, until i can do proper testing.
egur
7th January 2012, 23:03
LAV Filters 0.44 now also "officially" features the 0.22 decoder.
Multi-threading is still off for the time being, until i can do proper testing.
That's great news :cool:
rica
7th January 2012, 23:19
LAV Filters 0.44 now also "officially" features the 0.22 decoder.
Multi-threading is still off for the time being, until i can do proper testing.
I gave it a go with Clarkdale. Even though I can see the "Intel QuickSync" option under "Hardware Acceleration", it says it is "not available".
And it is really "not available".
http://img843.imageshack.us/img843/7549/nev.th.png (http://imageshack.us/photo/my-images/843/nev.png/)
EDIT: I haven't tried with ffdshow yet, since sourceforge has collapsed for the time being.
EDIT: Here is the test file for you. (It is working with ffdshow 4225, btw.):
http://www.mediafire.com/?x2jk7irhns6506d
DragonQ
7th January 2012, 23:35
Yep, not working on Arrandale either (already said this in LAV thread :)).
nevcairiel
8th January 2012, 01:37
EDIT: Here is the test file for you. (It is working with ffdshow 4225, btw.):
http://www.mediafire.com/?x2jk7irhns6506d
Not sure what that file is meant to show..? Decodes just fine. :)
Anyhow, you're saying that QuickSync in ffdshow works on your Clarkdale, but with LAV it doesn't?
Did you check the CPU usage to confirm that its really using hardware decoding?
rica
8th January 2012, 01:41
Not sure what that file is meant to show..? Decodes just fine. :)
Anyhow, you're saying that QuickSync in ffdshow works on your Clarkdale, but with LAV it doesn't?
Did you check the CPU usage to confirm that its really using hardware decoding?
Sure I did. I will add the screen caps if you have enough time to wait.
EDIT: Here, they are:
http://img683.imageshack.us/img683/2816/004vb.th.png (http://imageshack.us/photo/my-images/683/004vb.png/)
http://img543.imageshack.us/img543/1026/005cq.th.png (http://imageshack.us/photo/my-images/543/005cq.png/)
nevcairiel
8th January 2012, 01:52
I can probably patch up a debug version that shows why QS fails, maybe it sheds some light on things.
Edit:
http://files.1f0.de/lavf/LAVVideo-0.44-debug.zip
Throw that on top of 0.44, and a log file should appear on your desktop. Maybe there is something interesting in there....
Paste the log on pastebin or something, don't want to wait for attachment approval. ;)
nevcairiel
8th January 2012, 02:11
EDIT: Here, they are:
http://img683.imageshack.us/img683/2816/004vb.th.png (http://imageshack.us/photo/my-images/683/004vb.png/)
http://img543.imageshack.us/img543/1026/005cq.th.png (http://imageshack.us/photo/my-images/543/005cq.png/)
Thats DXVA, not QuickSync
In any case, a log file would be useful.
rica
8th January 2012, 02:21
OK, tomorrow/or today.
Thanks!
CruNcher
8th January 2012, 04:12
EVR is special in the sense that its output is different between Intel, Nvidia and AMD. It is also different between GPU families and generations.
Many people think EVR uses simple bi-linear interpolation for scaling. For old cards, they are right. For the new cards, that's very wrong.
SandyBridge uses a very advanced scaler, so it's best to test for yourself. The power/performance of the GPU is best utilized by the EVR. You should check the quality by applying a large scale factor to a clip (e.g. 4x) as well as a very small scale factor (e.g. 1/4x).
The biggest issue is still subtitling, though, which at least in MPC-HC is still dependent on EVR Custom.
So you always have one issue, either no subtitling or all the deinterlacing pain. There are only a few renderers that can do everything (DXVA + deinterlacing + subtitling), all of them custom DirectX renderers. MadVR could be another one once it supports DXVA + custom shader code :). Though I wouldn't agree with "The power/performance of the GPU is best utilized by the EVR."; these days it can only be fully utilized by a custom renderer that uses the same backend as a game engine does ;)
NikosD
8th January 2012, 08:47
Version 0.22 beta is out with the following changes:
* Much better multi-threading code (many fixes from v0.21).
* Fixed dynamic aspect ratio change during playback.
* FFDShow rev4227
LAV Filters 0.44 now also "officially" features the 0.22 decoder.
Multi-threading is still off for the time being, until i can do proper testing.
I have done no tests with FFDShow 0.22 or LAV Filters 0.44, but I think that using multi-threaded code for the work that has to be done by the IA cores may provide a solution for everything.
I mean that multi-threaded code should:
1) Increase the throughput required by 60fps clips
2) Feed better the QS decoding engine and
3) Push QS to maximum speed (frequency)
After all these, the decoding performance of 60fps clips should definitely increase.
About power consumption, the CPU frequency will go down from the Turbo Mode of single threaded code during playback of 60fps clips and probably power consumption will go down too.
During benchmarking or during playback of future difficult clips at 120fps the power consumption will increase again, I think.
Looking forward to test your next optimized multi-threaded versions in real tests.
egur
8th January 2012, 09:02
I have done no tests with FFDShow 0.22 or LAV Filters 0.44, but I think that using multi-threaded code for the work that has to be done by the IA cores may provide a solution for everything.
I mean that multi-threaded code should:
1) Increase the throughput required by 60fps clips
2) Feed better the QS decoding engine and
3) Push QS to maximum speed (frequency)
After all these, the decoding performance of 60fps clips should definitely increase.
About power consumption, the CPU frequency will go down from the Turbo Mode of single threaded code during playback of 60fps clips and probably power consumption will go down too.
During benchmarking or during playback of future difficult clips at 120fps the power consumption will increase again, I think.
Looking forward to test your next optimized multi-threaded versions in real tests.
MT's purposes are the following:
1) Reduce decode thread latency - the decode thread will do just the HW decode and delivery of decoded images down the pipeline. A worker thread will do the rest - the most time-consuming tasks are the frame copy and locking the D3D9 surfaces. This allows more CPU work (video processing) to be performed after decode.
2) Increase performance by adding parallelism - since the HW decode and the frame copying work in parallel, the HW decoder is better utilized allowing more FPS.
The MT work is not done. I believe I can achieve better performance than v0.22. v0.22 is much more stable than 0.21.
CruNcher
8th January 2012, 09:24
@ Eric
you always post that sf.net URL with a " at the end ;)
BTW: Nev does the QFHD fallback to LAV (CPU). It's better not to do that for ffdshow-quicksync, though, also in terms of having a comparison point, as Nev has no option in LAV Video to disable this restriction.
NikosD
8th January 2012, 09:35
MT's purposes are the following:
1) Reduce decode thread latency - the decode thread will do just the HW decode and delivery of decoded images down the pipeline. A worker thread will do the rest - the most time-consuming tasks are the frame copy and locking the D3D9 surfaces. This allows more CPU work (video processing) to be performed after decode.
2) Increase performance by adding parallelism - since the HW decode and the frame copying work in parallel, the HW decoder is better utilized allowing more FPS.
The MT work is not done. I believe I can achieve better performance than v0.22. v0.22 is much more stable than 0.21.
So, the MT code involves IA cores, GPU cores, MFX engine or all of them ?
Could you give percentages for each component using MT code?