Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing
CiNcH
3rd April 2013, 20:16
It does NOT fix "DXVA Best" Deinterlacing with XBMC/ffmpeg. I am experiencing a still image when selecting this option. "DXVA BOB" works but nobody actually wants to use that nowadays...
foxyshadis
3rd April 2013, 22:43
Quick OT post regarding this.
The GUI has been Windows 8ified, which I'm okay with but you gotta do it right and some of the graphics and design choices are pretty bad. The right click menu has been expanded on the left more to fit Intel's logo which doesn't look good, also why keep the white background around the logo? The Tiles have images that are clearly stretched and well, don't get me started on those info ? circles...
I feel like the clock has been turned back couple of decades, the previous UI was a lot more visually appealing, also my WEI scores didn't go up! Disappointed. :P
Yikes, you weren't kidding.
https://dl.dropbox.com/u/54412753/context.png
https://dl.dropbox.com/u/54412753/intelmain.png
https://dl.dropbox.com/u/54412753/intel.png
Back to our regularly scheduled topic, sorry.
foxyshadis
3rd April 2013, 22:47
From the above driver..
Does this mean what i think it does? Using QuickSync even if the Intel GPU has no attached screen?
Anything else wouldn't make any sense, because it worked before, right?
Too bad its only Win8, i won't get to test that anytime soon.
No, that means QuickSync now works with Optimus. Before, you had to either disable Optimus, or at least disable it for specific applications, to get QuickSync working. I'm not sure if it automatically disables it when using quick sync, or if it just works even if the discrete GPU is running (awesome), either way it saves you the trouble.
andyvt
4th April 2013, 01:53
Does this mean what i think it does? Using QuickSync even if the Intel GPU has no attached screen?
Anything else wouldn't make any sense, because it worked before, right?
Yes, although it should already be possible if the transcoding app enables DX11. Maybe the new driver enables it without explicit action by the transcoding app?
egur
5th April 2013, 16:56
I got an error with QuickSync.
My MPC-HC crashes when 4K content is played with LAV Filters QuickSync.
DXVA works fine, and DXVA CP does not work with 4K, but it falls back to software decoding, so I'm pretty sure the problem is in the QuickSync decoder.
I'm using an i7 3770K.
My aim is to run 4K videos through QuickSync on my display, which is connected to my AMD graphics card, which can't handle 4K at all.
1080p works fine with QuickSync.
I think I need the error log, but it is not in the MPC-HC folder...
Specify driver details and if it crashes on specific clip.
Hi. Is there a list/table of CPU models that in fact support QuickSync? I'm defining a new spec for large-scale CCTV workstations which are rack-mounted, and I can only seem to find Xeon E5-26xx series CPUs, and I can't figure out whether they support QuickSync.
br TE
The E5 CPUs do not have a GPU, so no Quicksync. All desktop, laptop and E3 Xeons have GPUs.
nevcairiel
5th April 2013, 17:00
E3 Xeons have GPUs.
Actually, not all E3 Xeons have GPUs, there are models with and without.
http://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors#.22Sandy_Bridge.22_.2832_nm.29
huhn
5th April 2013, 17:05
Specify driver details and if it crashes on specific clip.
the new ivy bridge driver fixed the issue
it happens with win64_152812 15.28.12.2932 and was fixed with Win64_15313 Intel 9.18.10.3071(15.31) from 3.04.2013
sample: http://xhmikosr.1f0.de/samples/2160p/DucksTakeOff/
huhn
5th April 2013, 17:08
Actually, not all E3 Xeons have GPUs, there are models with and without.
http://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors#.22Sandy_Bridge.22_.2832_nm.29
None of the "P" desktop CPUs have a GPU, and neither does the 2550K: http://ark.intel.com/products/65647/
egur
5th April 2013, 17:24
QuickSync support when using a dGPU is done via the DirectX 11 API. DX11 allows enumerating headless GPUs along with DXVA. I haven't dug too deep into this yet. Parts of this feature don't exist in Windows 7. I don't have a Win8 system to play with, but this will change when I return from my vacation. Not having an IVB or HSW at home doesn't help either.
BTW, mobile CPUs are usually available in rPGA sockets but such boards are rare. LGA socket based Processors are cheaper and easier to manufacture.
The HTPC market is not important enough for board or pc makers, because mobile processors are the optimal solution for HTPCs. In my view anyway.
HeadlessCow
5th April 2013, 19:00
QuickSync support when using a dGPU is done via the DirectX 11 API. DX11 allows enumerating headless GPUs along with DXVA. I haven't dug too deep into this yet. Parts of this feature don't exist in Windows 7.
Does enough of it exist in Win 7 that you think you'll be able to get it working?
I'm putting together a new system and using Quicksync to hardware decode while I use x264 to software encode will, presumably, give me a nice speed boost just like using DGIndexNV gives me now. I've got the choice of Win7 or Win8 and I was leaning towards Win7, but this (combined with Hyper-V) could be the feature that pushes me towards Win8 instead.
If you're not sure (or guess wrong now) it's not a huge deal, but I figured I might as well check to see if you had a decent guess :)
foxyshadis
6th April 2013, 01:47
Parts of this feature don't exist in Windows 7.
Does it work with the new Win7 platform update that brings partial D3D 11.1 to Win7?
egur
6th April 2013, 04:37
Does it work with the new Win7 platform update that brings partial D3D 11.1 to Win7?
I haven't received any info on the above update yet. Do you have a link?
Does enough of it exist in Win 7 that you think you'll be able to get it working?
You can have headless support with dx9 today as shown in this thread multiple times. Search for "hybrid GPU" in this thread.
Win8 should allow headless support without any hacks. Win8 adds new DX API functions that allow this to work.
GTPVHD
6th April 2013, 06:48
http://blogs.msdn.com/b/chuckw/archive/2012/11/14/directx-11-1-and-windows-7.aspx
http://support.microsoft.com/kb/2670838
Unfortunately KB2670838 is known to be rather buggy and breaks a lot of other software. Still waiting for MS to release a newer KB2670838 V2 update with their bugs fixed.
Actually, not all E3 Xeons have GPUs, there are models with and without.
http://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors#.22Sandy_Bridge.22_.2832_nm.29
So in other words, there's no possibility of getting a dual-CPU solution with QuickSync support?
nevcairiel
7th April 2013, 19:30
So in other words, there's no possibility of getting a dual-CPU solution with QuickSync support?
That's right.
NikosD
9th April 2013, 21:13
Well I did manage to overclock Core i5-2400 GPU from 1100 MHz to 1450 MHz - 32% raise.
Also I managed to overclock memory from 1333MHz to 1373MHz with an increase to CPU from 3.1GHz to 3.2GHz.
After all these I ran dxva benchmarks of clips 2, 3 and 8 of my collection.
I used latest drivers and DXVA Checker 2.9.1 along with LAV 0.55.3 Native
Clips 2 and 3 didn't reach the expected performance, although of course they did improve.
I analyzed the benchmarking procedure with Intel GPA 2013 R1 and found that those 2 clips didn't fully utilize the MFX engine.
Also, the combination of the latest drivers, DXVA Checker and LAV Video was slower than the setup used for the original benchmarks on my first page of benchmarks.
But clip 8 was a different story.
The performance went from 118fps to 159fps. Big improvement.
Intel GPA said that utilization of MFX engine was 99% for clip 8.
Now Eric with my overclocked GPU, I'm ready for 4K :)
NikosD
12th April 2013, 20:01
Eric,
by changing media settings to application settings in latest Intel drivers, I got a huge improvement in benchmarks using DXVAChecker.
Is there a reason that Intel drivers have other default values that lower performance so much ?
Also, without doing anything special, just playing with a BIOS setting that overclocks the GPU without altering memory, multiplier, BCLK, CPU clock or voltage, and with the standard air cooler, I managed to push the GPU clock of the Core i5-2400 from 1100MHz to 2100MHz!
The result was a huge benchmark boost for clip 8 from 118fps to 220fps !
I'm really curious for the explanation of Intel, why on earth did they put max clock so low - only 1100 MHz.
4K for SandyBridge will be more than easy with such GPU clocks.
There is no excuse for not delivering 4K drivers for Sandy ;)
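As a rough check of the numbers reported in this thread for clip 8 (118fps at 1100MHz, 159fps at 1450MHz, 220fps at 2100MHz), the relative gains can be worked out as in the sketch below. The helper name `percent_gain` is only illustrative, and the 1450MHz run also had slightly higher memory and CPU clocks, so the comparison is not purely about the GPU clock.

```python
def percent_gain(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures reported in this thread for clip 8 on a Core i5-2400.
# (label, GPU clock in MHz, clip 8 benchmark fps)
runs = [
    ("stock", 1100, 118),
    ("first overclock", 1450, 159),
    ("BIOS overclock", 2100, 220),
]

_, base_clock, base_fps = runs[0]
for label, clock, fps in runs[1:]:
    clock_gain = percent_gain(base_clock, clock)
    fps_gain = percent_gain(base_fps, fps)
    print(f"{label}: clock +{clock_gain:.1f}%, fps +{fps_gain:.1f}%")
```

For this one clip the frame rate rises roughly in step with the GPU clock, which fits the reported 99% MFX utilization. It says nothing about other clips or other chips.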
nevcairiel
12th April 2013, 20:59
4k is not only about speed, you also need bigger buffers to store the frames.
But you can stop dreaming, it will never happen.
NikosD
12th April 2013, 21:42
Eric has never said that there are hardware limitations for 4K, not even performance.
Maybe you think you know more than him about Intel's hardware.
Are you already in payroll or are you close to it ?
egur
12th April 2013, 22:05
Eric,
by changing media settings to application settings in latest Intel drivers, I got a huge improvement in benchmarks using DXVAChecker.
Is there a reason that Intel drivers have other default values that lower performance so much ?
Any HW algorithm that involves reading/writing from/to memory will decrease performance. Setting various options to application settings will effectively turn them off if the application (or my decoder DLL) does not request them.
I'm really curious for the explanation of Intel, why on earth did they put max clock so low - only 1100 MHz.
Max clock is dependent on several factors:
* Quality of silicon - each silicon die from the same wafer will behave differently. You are lucky to get a very good one :). The variance is quite large.
* Motherboard quality is also important. A good board will deliver a cleaner clock and a stable power supply to the processor. Cheap boards do not. That's why OC boards are much more expensive.
* Silicon aging - as the transistors age, they'll need more voltage to operate. Using a high voltage (OC) will quicken their aging. Your GPU will not work at 2100MHz forever. It might become unstable within months.
* Guard bands - Intel has been very conservative with its guard bands. That's why these processors (almost) never die. When you OC, you remove those guard bands and risk instability.
* Heat - high voltage + high clock means more heat. The baseline cooling solutions may not be able to cool the processor. A hot processor will throttle its clocks and turn into a 386 (performance-wise). Heat also increases silicon aging.
The power budget for the processor is distributed between the CPU and GPU, giving too much to the GPU (via high clock) will cripple performance for most applications.
When you OC, you take some of the power decisions into your own hands (e.g. ignore TDP) and get better results for your specific apps.
BTW, actual voltage to the GPU is controlled by the processor HW, I'm sure a high voltage was used. Not all boards can supply this current in a stable fashion.
4K for SandyBridge will be more than easy with such GPU clocks.
There is no excuse for not delivering 4K drivers for Sandy ;)
This is a HW limitation that drivers can't fix. This is the official answer I got when I asked the same question.
FYI, SNB was designed long before 4K was discussed by anyone. Using large line buffers is expensive (many millions of $) and should be justified. 4K was not a relevant use case. IVB was late enough to fix this.
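To give a feel for why 4K is about buffer size and not just speed, here is a small sketch comparing 1080p and 2160p. It assumes 8-bit 4:2:0 frames stored as NV12 (1.5 bytes per pixel) and a 3840x2160 "4K" frame. Both are assumptions for illustration only: the actual internal line buffer sizes of SNB or IVB are not public and are not modelled here.

```python
def nv12_frame_bytes(width, height):
    """Bytes in one 8-bit 4:2:0 (NV12) frame: a full-size luma plane
    plus an interleaved chroma plane at half the luma size."""
    return width * height * 3 // 2


# Assumed frame sizes; "4K" is taken to mean 3840x2160 here.
resolutions = {"1080p": (1920, 1080), "2160p": (3840, 2160)}

for name, (width, height) in resolutions.items():
    mib = nv12_frame_bytes(width, height) / 2**20
    print(f"{name}: line width {width} px, {mib:.1f} MiB per frame")
```

Each line is twice as wide and each frame is four times as large, so anything in the pipeline sized for a 1920-pixel line has to grow, regardless of clock speed.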
nevcairiel
12th April 2013, 22:13
Eric has never said that there are hardware limitations for 4K
This is a HW limitation that drivers can't fix. This is the official answer I got when I asked the same question.
While he said that before as well, happy now? :p
NikosD
13th April 2013, 07:46
Any HW algorithm that involves reading/writing from/to memory will decrease performance. Setting various options to application settings will effectively turn them off if the application (or my decoder DLL) do not request them.
As a matter of fact, last time I installed ATI drivers I think they turn on by default some video filters and settings I always turn off.
It seems that I have to do the same with Intel Media settings, too.
Max clock is dependent on several factors:
* Quality of silicon - each silicon die from the same wafer will behave differently. You are lucky to get a very good one :). The variance is quite large.
* Motherboard quality is also important. A good board will deliver a cleaner clock and a stable power supply to the processor. Cheap boards do not. That's why OC boards are much more expensive.
* Silicon aging - as the transistors age, they'll need more voltage to operate. Using a high voltage (OC) will quicken their aging. Your GPU will not work at 2100MHz forever. It might become unstable within months.
* Guard bands - Intel has been very conservative with its guard bands. That's why these processors (almost) never die. When you OC, you remove those guard bands and risk instability.
* Heat - high voltage + high clock means more heat. The baseline cooling solutions may not be able to cool the processor. A hot processor will throttle its clocks and turn into a 386 (performance-wise). Heat also increases silicon aging.
All of the above are very true and are the main reasons I don't overclock my CPUs and GPUs.
But I didn't say anything about overclocking.
I was talking about clocking.
If you put the clock too low, everything above it is overclocking :p
The default max value (1100MHz) seems a very low and conservative clock for such CPU/GPU quality.
That's why many SandyBridge implementations like Core i7 go up to 1350 MHz.
I don't think that putting SandyBridge GPU to 1500MHz will degrade transistor quality.
2100MHz is too high, but 1100MHz is too low.
The power budget for the processor is distributed between the CPU and GPU, giving too much to the GPU (via high clock) will cripple performance for most applications.
When you OC, you take some of the power decisions into your own hands (e.g. ignore TDP) and get better results for your specific apps.
BTW, actual voltage to the GPU is controlled by the processor HW, I'm sure a high voltage was used. Not all boards can supply this current in a stable fashion.
Nice info. I'll take a look at voltage and at pure CPU and CPU/GPU performance with high GPU clocks.
This is a HW limitation that drivers can't fix. This is the official answer I got when I asked the same question.
FYI, SNB was designed long before 4K was discussed by anyone. Using large line buffers is expensive (many millions of $) and should be justified. 4K was not a relevant use case. IVB was late enough to fix this.
OK. It's the official answer.
Just one question.
Wouldn't segmenting 4K into 1080p pieces and pushing the smaller segments through the line buffers of SNB do the job?
Slower than native 4K, but doable.
egur
13th April 2013, 08:19
NikosD, you make claims about clocks with very little knowledge about the domain. This thread is not the place for a deep analysis on Intel's specs. The clock values are derived after a long period of testing.
I have nothing more to add on the matter.
As for 4K, to my understanding, your suggestion is not possible.
NikosD
13th April 2013, 08:36
Egur, I have never said I'm an expert on hardware like Intel engineers.
I only say what I see.
Out of respect for your position at Intel and for this thread, I don't want to add anything more either.
pankov
13th April 2013, 09:21
nevcairiel, Eric,
is there any chance for implementing the hardware (QuickSync) deinterlacer that is available through the DLL in LAV Video Decoder the same way as it's done for NVidia (CUVID)?
I'm asking about this because I recently had to change my CPU to a non-K variant, which meant downgrading from HD4000 to HD2500, and now it's very tough to handle 1080i50/60 content with madVR even at minimal settings. I'm not sure this will make it possible, but I've tried 1080p60 content and it works, so I have my hopes up.
Btw
does anybody know if this hardware deinterlacing will be better (as in faster) than the DXVA one that is currently used in madVR?
nevcairiel
13th April 2013, 13:39
It's the same hardware deinterlacer, so speed should be the same.
I can put it on my list to test it again for the next major LAV version and re-enable it if it works properly.
pankov
13th April 2013, 13:43
Thanks,
I'll be glad if you do so
NikosD
14th April 2013, 21:25
* Heat - high voltage + high clock means more heat. The baseline cooling solutions may not be able to cool the processor. A hot processor will throttle its clocks and turn into a 386 (performance-wise). Heat also increases silicon aging.
The power budget for the processor is distributed between the CPU and GPU, giving too much to the GPU (via high clock) will cripple performance for most applications.
When you OC, you take some of the power decisions into your own hands (e.g. ignore TDP) and get better results for your specific apps.
BTW, actual voltage to the GPU is controlled by the processor HW, I'm sure a high voltage was used. Not all boards can supply this current in a stable fashion.
I did some tests with the Girls clip, which I found to be the most power-consuming one, probably because of its duration and difficulty. I used LAV 0.56.1 native.
GPU@1100MHz max
Vcore 1,02V
VGPU 1,22V
Temp: 46 C
GPU power consumption: 7,8W
CPU+GPU power consumption: 24,2W
GPU Load: 84%
GPU@2000MHz max
Vcore 1,12V
VGPU 1,26V
Temp: 57 C
GPU power consumption: 13,8W
CPU+GPU power consumption: 39,1W
GPU Load: 72%
It seems that GPU Voltage is not affected, just a slight increase of 0,04V.
CPU Vcore goes higher 0,10V.
But the power consumption is almost double, just like performance and the temperature of course is higher.
GPU load drops, due to higher performance of EUs.
Doesn't look too bad to me.
Next time I'll try to find a benchmark or software that uses the 4 CPU cores and the GPU at the same time.
I don't play games, so it's not easy for me to find one.
If someone has a proposition, I would try it.
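The two measurements above can be compared directly with a short sketch like the one below. The dictionary keys and the `ratio` / `delta` helpers are illustrative names only. The values are the ones listed in this post, for a single chip and a single clip.

```python
# Readings from this post (Girls clip, LAV 0.56.1 native), one Core i5 sample.
stock = {"gpu_w": 7.8, "total_w": 24.2, "temp_c": 46, "vgpu": 1.22, "vcore": 1.02}
oc = {"gpu_w": 13.8, "total_w": 39.1, "temp_c": 57, "vgpu": 1.26, "vcore": 1.12}


def ratio(key):
    """Overclocked reading divided by the stock reading."""
    return oc[key] / stock[key]


def delta(key):
    """Overclocked reading minus the stock reading."""
    return oc[key] - stock[key]


print(f"GPU power:     x{ratio('gpu_w'):.2f}")
print(f"CPU+GPU power: x{ratio('total_w'):.2f}")
print(f"Temperature:   +{delta('temp_c')} C")
print(f"VGPU:          +{delta('vgpu'):.2f} V")
print(f"Vcore:         +{delta('vcore'):.2f} V")
```

GPU power comes out at about 1.8 times the stock value and package power at about 1.6 times, so "almost double" is a fair summary for the GPU alone.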
egur
15th April 2013, 08:37
Let me interpret your results.
The HW decoder/VPP (most likely) doesn't max out the GPU power budget. The EUs (GPU cores) do. At 2GHz they will consume a lot of power, too much power, and will cross the TDP mark.
Video playback, even if it uses 100% of the CPU doesn't activate enough CPU compute resources and doesn't reach TDP at 100% utilization.
Also, drawing conclusions from a single processor is quite meaningless, like I said, there's a large variance between units and you probably have a very good one.
While your findings are outside my scope, I find them interesting. They are useful for QuickSync-based media servers & digital signage systems. A small niche, but a valid one nonetheless.
NikosD
15th April 2013, 08:58
Every video benchmark pushes EUs a lot, not 100%, along with HW decoder (MFX engine), because it's not a normal video playback which tries to keep realtime framerate, it's a benchmark that wants to go as fast as it can.
Moreover, most of the clips don't push even MFX engine to 100%, you need a high bandwidth clip like Birds (8th clip) to stress MFX engine to 100%.
I had the same thought yesterday and ran a GPU (EU-only) benchmark, a D3D10 test that put the GPU load at 100% using the EUs only. It showed slightly higher power consumption and voltage than the video benchmark.
I didn't keep the results, because I was only interested in video benchmarks.
I'll run the D3D10 test again and look at the results.
I'm preparing a CPU, CPU/GPU, CPU/VPU benchmark and monitoring session.
itsonlyjustincase
17th April 2013, 11:33
Hi,
Thanks to the new Intel Ivy Bridge GPU drivers, we can now take advantage of QuickSync on Win8 even if the dGPU is the primary one in use. Are we going to have an update of the QuickSync Decoder and LAV Video to support this?
I have a video program that is not compatible with the Intel GPU. On my laptop (Asus UX32VD) I have to launch it forcing the dGPU, but I think I would get better performance using QuickSync over CUVID (Nvidia GeForce 620M). What do you think about it?
egur
18th April 2013, 09:01
The feature you refer to requires Win8 + IvyBridge/Haswell. A combination I don't have at home. It requires some code changes (use D3D11 API instead of D3D9) and I'm not sure if it will work in full screen exclusive mode.
I didn't understand what you meant in your question.
NikosD
18th April 2013, 13:25
* Heat - high voltage + high clock means more heat. The baseline cooling solutions may not be able to cool the processor. A hot processor will throttle its clocks and turn into a 386 (performance-wise). Heat also increases silicon aging.
The power budget for the processor is distributed between the CPU and GPU, giving too much to the GPU (via high clock) will cripple performance for most applications.
When you OC, you take some of the power decisions into your own hands (e.g. ignore TDP) and get better results for your specific apps.
BTW, actual voltage to the GPU is controlled by the processor HW, I'm sure a high voltage was used. Not all boards can supply this current in a stable fashion.
I did some long tests with 2 different systems.
One with a Core i5 (SNB) and the one in my signature, with a discrete CPU/GPU combination.
I ran a CPU-only application (a flops benchmark), a GPU (EU) only application (a DX10 benchmark with ~2% CPU usage) and a VPU-only (QuickSync + EU) benchmark with ~3% CPU usage.
Of course I ran combinational benchmarks CPU/GPU and CPU/VPU.
For the SNB system, I ran them with normal GPU clock and overclocked clock (2000 MHz max)
Nothing unreasonable happened.
The CPU never throttled back to lower frequencies due to temperature or power limitations, even with the overclocked GPU to 2GHz during CPU/ GPU test, where the power consumption went more than 82W for the whole chip.
GPU Voltage never crossed 1,27V and GPU power consumption never crossed 16W, during the most rough tests.
The behavior of on-die CPU/ GPU and discrete CPU/ GPU was the same, regarding performance loss due to the combinations of CPU/GPU and CPU/VPU benchmarks.
Conclusion:
If you overclock or let's say clock higher the SNB GPU you will get reasonable results:
1) Higher performance (can go more than two times up for GPU)
2) Higher temperatures (only 3 degrees Celsius - from 67 to 70)
3) Higher consumption of GPU (from 8W to 16W)
The CPU performance will not be affected at all, because of higher GPU clocking.
itsonlyjustincase
19th April 2013, 08:53
The feature you refer to requires Win8 + IvyBridge/Haswell. A combination I don't have at home. It requires some code changes (use D3D11 API instead of D3D9) and I'm not sure if it will work in full screen exclusive mode.
I didn't understand what you meant in your question.
Sorry, it's because I'm a French speaker.
I saw that Intel recently upgraded the Ivy Bridge drivers so that QuickSync can now be used even if the Intel GPU isn't used by the application. It is for systems with a discrete GPU.
http://www.necacom.net/index.php?option=com_content&view=article&id=6961:intel-hd-graphics-drivers-153133071&catid=68:intel&Itemid=86
I'm asking because I'm a DJ who uses a video mix program that is not compatible with the Intel GPU. So on my laptop (Asus UX32VD) I have to set the Nvidia control panel so that the Nvidia GPU is used when I launch my video mix software (Serato Video). The thing is, from what I've seen on the net, the QuickSync of the HD 4000 is very powerful, perhaps more so than CUDA on my GeForce 620M discrete GPU. So perhaps, thanks to the new Intel drivers and a version of your decoder that supports them, I could use the Intel QuickSync decoder even if I launched Serato Video on the Nvidia GPU.
itsonlyjustincase
19th April 2013, 08:58
From the above driver..
Does this mean what i think it does? Using QuickSync even if the Intel GPU has no attached screen?
Anything else wouldn't make any sense, because it worked before, right?
Too bad its only Win8, i won't get to test that anytime soon.
Exactly !!
Too bad you can't test on win8 :(
egur
21st April 2013, 12:05
I've installed Win8 on my home PC and will start development this week for the headless decode/vpp feature. I'll also upgrade one of my dev platforms at work (more complicated).
The Media SDK support team said that SandyBridge supports this feature (Windows 8 + 15.28 driver or newer). I'll report progress as I advance.
No clue whether this will work in full screen exclusive or not.
If developers have any insight on DX11.1 video capabilities, please share.
Superb
21st April 2013, 13:47
That's great news! :)
itsonlyjustincase
21st April 2013, 20:29
Super news :) thank you
TEB
24th April 2013, 08:36
Hi guys. I'm still a bit confused about which CPUs support QuickSync.
So: on the Intel Denlow server platform ("Haswell E3")
I can see that some of the CPUs have a "GT2" GPU unit.
When I google this I get both "HD Graphics 4600" and "HD Graphics P3000" for the GT2 name. Not sure which one is correct. One CPU I'm considering is the E3-1245 v3.
Any idea which GPU is the correct one? And is the performance the same as the consumer versions of QuickSync?
http://wccftech.com/intel-denlow-server-platform-2013-detailed-added-compatibility-broadwell-server-chips/
egur
24th April 2013, 08:45
Performance of the Xeon-E3 line should be about the same as the consumer models (of the same generation).
The E3's may be tweaked slightly differently from the consumer models but these changes have little impact on video HW acceleration.
TEB
24th April 2013, 08:59
Hi. It seems that when googling for v3, the Intel ARK page for v1 comes up, hence P3000. In other words = wrong ;)
Haswell is definitely HD Graphics 4600 = GT2 for the Xeon family
egur
24th April 2013, 09:09
ark.intel.com is not updated with Haswell models yet. Haswell hasn't launched yet...
TEB
24th April 2013, 11:12
Hmm.. I see that my knowledge of QuickSync is somewhat lacking. Care to explain a few things for QuickSync newbs like me? ;)
1. Is QuickSync designed more for encoding than for decoding? Or is it just different parts of the HW? Or is QuickSync some general-purpose DSP that can be used for more than video, like audio or crypto?
2. How does QuickSync deal with parallel requests, whether encoding or decoding? Does it scale well with 2-4-8 threads that all call the QuickSync functions, or is it a "one at a time" principle?
br TE
NikosD
24th April 2013, 11:38
QuickSync is an ASIC inside GPU and GPU is on the same die with CPU. GPU shares some resources to QuickSync (EUs)
It's not general purpose, it's for video transcoding, which means both decoding and encoding.
It can decode at the same time at least 4 different 1080p H.264 streams in realtime.
Eric can add more...
egur
24th April 2013, 13:47
QuickSync is the brand name of the HW-accelerated decode, video post-processing (VPP) and encode engine within the Intel GPU.
Most of it is implemented as an ASIC (fixed-function HW), which gives it superior performance and power efficiency.
My decoder only uses the decode and (optional) video processing capabilities of QuickSync and works on platforms that are NOT QuickSync ready (Pentium and Celeron class processors).
QuickSync can be utilized via DXVA2 calls (no encode in DXVA2) or via the Intel Media SDK. The latter is used in this project since it simplifies the work.
Multiple streams can be decoded/encoded at the same time. They don't actually work in parallel, but the HW is fast enough to decode 4-8 streams in real time. The actual number depends on stream complexity as well as memory speed.
Overclocking the iGPU, as shown by NikosD, along with very fast RAM can significantly improve performance.
It's also possible to have the display on a different GPU (AMD/Nvidia) freeing the iGPU so it has more resources to decode.
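Since the streams are time-sliced and not truly parallel, a first-order estimate of how many real-time streams fit is the benchmark throughput divided by the per-stream frame rate. The sketch below uses the clip 8 figures reported earlier in this thread and a hypothetical 25fps per stream. The function name, the 25fps rate and the assumption that every stream is as demanding as clip 8 are all illustrative. It ignores VPP, memory bandwidth and switching overhead.

```python
import math


def realtime_streams(benchmark_fps, stream_fps):
    """Whole number of streams that fit if the decoder's total throughput
    (benchmark_fps) is shared evenly between streams of stream_fps each."""
    return math.floor(benchmark_fps / stream_fps)


STREAM_FPS = 25  # hypothetical per-stream frame rate

# Clip 8 benchmark results reported earlier in the thread (Core i5-2400).
for label, fps in [("GPU at 1100MHz", 118), ("GPU at 2100MHz", 220)]:
    print(f"{label}: about {realtime_streams(fps, STREAM_FPS)} streams")
```

Real streams of lower complexity would leave more headroom, which is why the actual number has to be measured on the target system.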
TEB
24th April 2013, 13:53
QuickSync is the brand name of the HW accelerated decode, video post processing (VPP) and encode engine within the Intel GPU.
Most of it is implemented as an ASIC (fixed-function HW), which gives it superior performance and power efficiency.
My decoder only uses the decode and (optional) video processing capabilities of QuickSync and works on platforms that are NOT QuickSync ready (Pentium and Celeron class processors).
QuickSync can be utilized via DXVA2 calls (no encode in DXVA2) or via the Intel Media SDK. The latter is used in this project since it simplifies the work.
Multiple streams can be decoded/encoded at the same time. They don't actually work in parallel, but the HW is fast enough to decode 4-8 streams in real time. The actual number depends on stream complexity as well as memory speed.
Overclocking the iGPU, as shown by NikosD, along with very fast RAM can significantly improve performance.
It's also possible to have the display on a different GPU (AMD/Nvidia) freeing the iGPU so it has more resources to decode.
But is there a QuickSync ASIC per core, or one for the whole die?
I was thinking about testing this on the new Dell T1700 with an E3-1245 v3 CPU + an Nvidia graphics card.
Then I would have 4 cores + 4 HT cores + an Nvidia GPU.
Could it then do e.g. 12 low-complexity HD streams in parallel (still CCTV ;))?
br TE
egur
24th April 2013, 14:10
The amount of HW replication is a micro architectural detail which is not exposed.
I can't predict performance on such a system. Sorry. You need to make sure that your application runs on the iGPU not Nvidia's GPU.
TEB
24th April 2013, 14:39
The amount of HW replication is a micro architectural detail which is not exposed.
I can't predict performance on such a system. Sorry. You need to make sure that your application runs on the iGPU not Nvidia's GPU.
OK, I was merely pointing at "It's also possible to have the display on a different GPU (AMD/Nvidia) freeing the iGPU so it has more resources to decode." for the Nvidia graphics card.
I guess a standard GPU-accelerated video renderer is the best approach.
Nice article on QuickSync quality vs. other HW- and SW-assisted implementations of H.264:
http://www.hardware.fr/focus/67/encodage-h-264-retour-nvenc-quicksync.html
GTPVHD
24th April 2013, 17:16
http://www.scribd.com/doc/137419114/Introduction-to-AVX2-optimizations-in-x264
Eric, please add AVX2 optimizations to the decoder where applicable.
nevcairiel
24th April 2013, 17:18
Eric, please add AVX2 optimizations to the decoder where applicable.
The decoding is done in hardware, how would this be optimized? :p
The only candidate would be the memory copy function, and I doubt that would actually get much faster.