Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing
Stereodude
6th February 2014, 17:00
SSE/AVX is like a 256/512-bit DSP running at 3GHz inside the CPU. The ASIC decoder may run at only 1GHz. I cannot see why a SW decoder cannot be faster. Maybe the SSE/AVX architecture is poor, or maybe the software implementation is poor.
Anyway, I am disappointed with the performance of SSE/AVX.
I can't believe you just asked why specialized hardware is faster than general purpose hardware doing things in software. :scared:
CharlieCL
6th February 2014, 20:50
I can't believe you just asked why specialized hardware is faster than general purpose hardware doing things in software. :scared:
Good question. In the mobile phone case (ARM), a software JPEG codec running on the DSP is faster than the hardware codec.
CharlieCL
6th February 2014, 20:55
I'll give you a high level example how things work. Let's say the HW needs to perform inverse DCT of 8x8 blocks on many blocks. Blocks are located sequentially in memory (for simplicity of example).
....
An IDCT can be done in one clock? At least I see the performance limitation of ASIC codec on discrete GPU card because of the data transfer through PCIe.
NikosD
7th February 2014, 09:11
I can't believe you just asked why specialized hardware is faster than general purpose hardware doing things in software. :scared:
Good question. In the mobile phone case (ARM), a software JPEG codec running on the DSP is faster than the hardware codec.
In principle, in theory, in general, ASIC (HW) should and could be faster than DSP or CPU (SW).
But in reality, in real world implementations, if we see HW performance from the first fixed-function HW in 2006 like UVD from ATI and VP2 from Nvidia up to 2011, before QuickSync from Intel, then HW was not faster than SW.
Almost every quad-core CPU SW decoder is faster than any AMD UVDx or Nvidia VPx fixed-function HW.
So in the real world, we can say the opposite is true for HW vs SW performance.
SW is faster than HW.
Except QuickSync ASIC of course, which is faster than any Quad core CPU.
You can check my signature for further performance details in the Excel spreadsheet, between SW and HW generations.
egur
7th February 2014, 09:16
An IDCT can be done in one clock? At least I see the performance limitation of ASIC codec on discrete GPU card because of the data transfer through PCIe.
Think of the HW as a pipeline with several stages. Stage 1 is the input and stage K is the output. On every clock, new data is fed to the inputs. After K clocks, the first output is ready. After K+1 clocks, the 2nd output is ready, etc.
So after N+K-1 clock ticks (roughly N+K), N tasks are ready.
If you have a GPU with massive parallelism and a high clock frequency, it might be possible to do the same task faster but take much more power.
Power difference can be >1000x, depending on how the ASIC and surrounding HW blocks are implemented.
A hybrid ASIC + GPU-cores style implementation, as done in the iGPU, is a compromise approach.
A full ASIC video pipeline as done in TVs is much more efficient (power) but also can't be upgraded so it's not used in the PC world.
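The pipeline timing described in this post can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming an idealized K-stage pipeline that accepts one new input per clock and never stalls; the function names and the depth of 10 stages are made up for the example and do not describe any real decoder.

```python
def pipeline_clocks(n_tasks: int, k_stages: int) -> int:
    """Clocks until the last of n_tasks leaves an idealized k-stage pipeline.

    Assumes one new input per clock and no stalls: the first output appears
    after k clocks, and every later output follows one clock apart.
    """
    if n_tasks <= 0:
        return 0
    return k_stages + (n_tasks - 1)


def unpipelined_clocks(n_tasks: int, k_stages: int) -> int:
    """Clocks if each task had to finish all k stages before the next starts."""
    return n_tasks * k_stages


if __name__ == "__main__":
    k = 10  # hypothetical pipeline depth, chosen only for illustration
    for n in (1, 2, 1000):
        print(n, pipeline_clocks(n, k), unpipelined_clocks(n, k))
```

For large N the cost per task approaches one clock, which is why a deep fixed-function pipeline can keep up even at a modest clock frequency.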
Stereodude
7th February 2014, 14:06
In principle, in theory, in general, ASIC (HW) should and could be faster than DSP or CPU (SW).
But in reality, in real world implementations, if we see HW performance from the first fixed-function HW in 2006 like UVD from ATI and VP2 from Nvidia up to 2011, before QuickSync from Intel, then HW was not faster than SW.
Almost every quad-core CPU SW decoder is faster than any AMD UVDx or Nvidia VPx fixed-function HW.
So in the real world, we can say the opposite is true for HW vs SW performance.
SW is faster than HW.
Except QuickSync ASIC of course, which is faster than any Quad core CPU.
You can check my signature for further performance details in the Excel spreadsheet, between SW and HW generations.
It seems like you're intentionally missing the point. The HW decoders you're complaining about being slower than SW only need to run real time for playback. So they weren't designed to run at 10x real time or faster as that would be a waste of silicon and/or power. Further, even though SW decoders might be faster in terms of raw frame rate (if you're only decoding) you saturate the CPU. I still get faster x264 encoding using veryslow when I use DGDecNV for decoding than if I were to use FFMpegSource with software decoding because I'm not wasting CPU resources decoding the video. QuickSync is a very different animal because it wasn't designed only to provide HW decode for playback, but to do rapid decode/encode.
NikosD
7th February 2014, 14:39
It seems like you're intentionally missing the point.
It seems you're unintentionally missing the point.
UVD, UVD+, UVD2.0, UVD2.2, UVD3 etc are not fast enough to decode 1080p60fps H.264 files, like for example AVCHD files from camcorders.
It is a very well known limitation.
And of course they can't decode 4K H.264 at all.
VPx except VP5 can't decode huge bitrate 1080p H.264 files, they can't even reach 24fps.
Of course they can't decode 4K H.264 either.
Finally, VP5 can go up to 27-29 fps max on 4K H.264 files, not enough for flawless 4K@30 fps, and of course it is far away from the 4K@60 fps or 4K@120 fps that QuickSync can do.
I hope it is clearer now that ALL the HW mentioned before has trouble decoding this or that file in realtime.
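The realtime argument above comes down to comparing a decoder's sustained frame rate with the frame rate of the content. A minimal sketch of that check, using the 27-29 fps VP5 figures quoted in the post; the function names are invented for this example:

```python
def frame_budget_ms(content_fps: float) -> float:
    """Time available to decode one frame for realtime playback, in ms."""
    return 1000.0 / content_fps


def keeps_up(decode_fps: float, content_fps: float) -> bool:
    """True if the decoder's sustained rate is at least the content rate."""
    return decode_fps >= content_fps


if __name__ == "__main__":
    # 27 and 29 fps are the ends of the VP5 range quoted in the post;
    # the targets are the 4K frame rates discussed in the thread.
    for decode_fps in (27, 29):
        for target in (24, 30, 60):
            print(decode_fps, target, keeps_up(decode_fps, target),
                  round(frame_budget_ms(target), 1))
```

A decoder that tops out at 29 fps passes for 24 fps content but misses the 33.3 ms per-frame budget of 30 fps content, which is the shortfall the post describes.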
CharlieCL
7th February 2014, 15:51
Think of the HW as a pipeline with several stages. Stage 1 is the input and stage K is the output. On every clock, new data is fed to the inputs. After K clocks, the first output is ready. After K+1 clocks, the 2nd output is ready, etc.
So after N+K-1 clock ticks (roughly N+K), N tasks are ready.
If you have a GPU with massive parallelism and a high clock frequency, it might be possible to do the same task faster but take much more power.
Power difference can be >1000x, depending on how the ASIC and surrounding HW blocks are implemented.
A hybrid ASIC + GPU-cores style implementation, as done in the iGPU, is a compromise approach.
A full ASIC video pipeline as done in TVs is much more efficient (power) but also can't be upgraded so it's not used in the PC world.
So on average the pipeline produces one output per clock. A SW codec can be built as this kind of pipeline using threads on multiple cores, but performance may be poor. When the CPU is running at 100% usage everything will be slow, and power usage will be high.
Stereodude
7th February 2014, 17:02
It seems you're unintentionally missing the point.
UVD, UVD+, UVD2.0, UVD2.2, UVD3 etc are not fast enough to decode 1080p60fps H.264 files, like for example AVCHD files from camcorders.
It is a very well known limitation.
And of course they can't decode 4K H.264 at all.
VPx except VP5 can't decode huge bitrate 1080p H.264 files, they can't even reach 24fps.
Of course they can't decode 4K H.264 either.
Finally, VP5 can go up to 27-29 fps max on 4K H.264 files, not enough for flawless 4K@30 fps, and of course it is far away from the 4K@60 fps or 4K@120 fps that QuickSync can do.
I hope it is clearer now that ALL the HW mentioned before has trouble decoding this or that file in realtime.
So you're complaining that HW decoders don't work outside the use cases they were designed to handle. No surprise there... I'm well aware that the QuickSync engine is quite a bit more capable and faster than the hardware decode engines found on video cards. It was designed with a different objective in mind and the performance bears that out. It shouldn't be surprising that a well designed hardware decoding engine that's intended to accelerate decoding & encoding is much faster than software routines, or older hardware that's strictly targeted for real time playback of certain content.
NikosD
7th February 2014, 18:14
So you're complaining that HW decoders don't work outside the use cases they were designed to handle.
I'm not complaining about anything.
I just gave you proof that SW decoders are faster than HW decoders in all cases besides QuickSync, and that HW decoders can't reliably keep up with anything but clips inside BluRay specs.
egur
8th February 2014, 00:06
Most (all?) of Nvidia's and AMD's decoding HW solutions were based on heavily using their GPU cores and less on fixed function HW. Those GPUs had much faster (and wider) memory using much more power and still couldn't compete in speed.
It all depends on the implementation details. How much is fixed function and how much is programmable.
Stereodude
8th February 2014, 00:55
Most (all?) of Nvidia's and AMD's decoding HW solutions were based on heavily using their GPU cores and less on fixed function HW. Those GPUs had much faster (and wider) memory using much more power and still couldn't compete in speed.
It all depends on the implementation details. How much is fixed function and how much is programmable.
I'm pretty sure Nvidia's HW decode does not rely on the 2d/3d pixel engine. All variants with a given VPx engine had the same performance regardless of cost or 3D capabilities of the card.
NikosD
8th February 2014, 07:52
Most (all?) of Nvidia's and AMD's decoding HW solutions were based on heavily using their GPU cores and less on fixed function HW. Those GPUs had much faster (and wider) memory using much more power and still couldn't compete in speed.
It all depends on the implementation details. How much is fixed function and how much is programmable.
Actually no.
I didn't choose the year 2006 by accident.
That was the year of Blu-ray, when most CPUs couldn't keep up with the resolution, H.264 complexity and huge bitrate of Blu-ray movies, so Nvidia first with VP2 and then ATI with UVD decided to use fixed-function HW for the whole pipeline (VLD decoding): H.264 in the VP2 case, and H.264 & VC-1 for UVD.
Before VP2 and UVD in 2006, it is true that both companies were using programmable GPU resources for partial acceleration of MPEG-2, WMV and H.264, with no VLD decoding.
Actually even UVD and UVD+ were using shaders for the MPEG-2 codec, but not for H.264 and VC-1, for which they used fixed-function HW.
Blu-ray brought us fixed-function HW and VLD decoding for H.264 & VC-1.
egur
10th February 2014, 04:04
As far as I know they didn't use ASIC for all the decoding components, otherwise why are they slower? They ran at about the same clock frequency, had much faster dedicated memory, didn't share the L3 cache with the CPU and had tons of thermal headroom.
Copying the compressed bitstream over PCIe isn't slow so I don't see any reason except:
1) They don't have ASIC for everything.
2) The drivers are highly inefficient (less likely).
Stereodude
10th February 2014, 05:15
As far as I know they didn't use ASIC for all the decoding components, otherwise why are they slower? They ran at about the same clock frequency, had much faster dedicated memory, didn't share the L3 cache with the CPU and had tons of thermal headroom.
Copying the compressed bitstream over PCIe isn't slow so I don't see any reason except:
1) They don't have ASIC for everything.
2) The drivers are highly inefficient (less likely).
Because they only need to achieve realtime playback of the most demanding profile and resolution they were after given the use case they were supporting. Why waste die area or power making it run 10x faster than you need?
wanezhiling
10th February 2014, 05:21
Because fast decoding also makes for fast seeking.
NikosD
10th February 2014, 08:28
As far as I know they didn't use ASIC for all the decoding components, otherwise why are they slower? They ran at about the same clock frequency, had much faster dedicated memory, didn't share the L3 cache with the CPU and had tons of thermal headroom.
Copying the compressed bitstream over PCIe isn't slow so I don't see any reason except:
1) They don't have ASIC for everything.
2) The drivers are highly inefficient (less likely).
Nvidia, and especially ATI, preferred to stick to the BluRay specs and decided to go only as far as realtime performance.
So the maximum performance was a dual stream of 1080p30 fps H.264 for PIP.
Only for VP5 did Nvidia decide to build something beyond the BluRay specs and support 4K@24 fps, the maximum HDMI v1.4 output.
About clocks and die area.
The clock of UVD2.2 is 400 MHz, while Intel GT1 goes up to 1100 MHz.
The die area of the first UVD fixed-function decoder was so big that ATI was forced to remove it from the fastest card, the HD 2900, because the 2900 had the largest number of GPU cores and didn't have room for UVD!
The same thing happened with Nvidia and VP5, which appeared for the first time in the Fermi architecture (Series 500), but only the GT520 got it because it had the fewest GPU cores of all the 500 series cards.
About Intel.
Intel didn't make a video decoder, but a video transcoder.
The purpose was not just to "play movies", but to build the fastest transcoder, which means a faster decoder and encoder at the same time.
Intel decided to go far beyond real time performance, because they wanted to decode as fast as possible and to encode as fast as possible at the same time, in order to minimize the transcoding time.
If you see the renderless speed of QuickSync in my Excel, you will be astonished by the performance advantage over the other HW and SW decoders.
That's why they didn't stick to Bluray specs and "accidentally" built the fastest video decoder of all, QuickSync, which is good for us who just decode movies and various other clips.
Stereodude
10th February 2014, 14:53
About Intel.
Intel didn't make a video decoder, but a video transcoder.
The purpose was not just to "play movies", but to build the fastest transcoder, which means a faster decoder and encoder at the same time.
Intel decided to go far beyond real time performance, because they wanted to decode as fast as possible and to encode as fast as possible at the same time, in order to minimize the transcoding time.
If you see the renderless speed of QuickSync in my Excel, you will be astonished by the performance advantage over the other HW and SW decoders.
That's why they didn't stick to Bluray specs and "accidentally" built the fastest video decoder of all, QuickSync, which is good for us who just decode movies and various other clips.
The irony of you "explaining" QuickSync to an Intel employee who has been busy working with it firsthand isn't lost on me, nor is your use of the exact same argument I used on why CharlieCL shouldn't be surprised that the QS HW "decoder" is so much faster than software (that you criticized me for).
NikosD
10th February 2014, 15:12
The irony of you "explaining" QuickSync to an Intel employee who has been busy working with it firsthand isn't lost on me, nor is your use of the exact same argument I used on why CharlieCL shouldn't be surprised that the QS HW "decoder" is so much faster than software (that you criticized me for).
I'm not explaining things to anyone specifically.
I take the opportunity to explain issues in general and from my perspective.
Eric is an Intel employee but he is not obliged to know everything about HW decoding of ATI and Nvidia.
So your post is misleading: you quoted only the Intel part of my post, when it is more than obvious that the explanation about Intel was not written in isolation but in contrast with ATI and Nvidia practice on this specific matter.
About CharlieCL and QS HW, I explained in detail why he should be surprised by the performance of the QS decoder, judging by the previous approaches from ATI and Nvidia and by the performance of HW decoders like UVDx and VPx that were common practice for years, and Eric wrote, briefly but accurately, why he shouldn't be surprised when the approach is different.
I can't understand what your problem with me is, besides some points I commented on in your posts to clarify the issues, which obviously made you nervous and aggressive.
Take it easy...
CharlieCL
10th February 2014, 20:26
Both NVIDIA and AMD now use hardware codecs in their graphics chips and APUs. They have found that the GPU is not suitable for codecs.
The current hardware codec is in the wrong position: on the GPU side, behind a driver. It should sit on the CPU side, like the FPU. The codec should be executed in the CPU, not in the GPU.
AVX2 is 512-bit so it can execute 16 32-bit multiplications at once. Some DSPs are 1024-bit. An improved SW algorithm should not be far behind HW.
At present a SW codec cannot play 4K video, but a HW codec can do it easily on x86. There may be something wrong in the software implementation.
nevcairiel
10th February 2014, 20:36
AVX2 is 512-bit
Wrong, AVX2 is 256-bit.
You also need data which is suited for such processing. Not every algorithm can be made magically faster with AVX, some processing just doesn't benefit from SIMD operations.
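The lane arithmetic behind this correction is easy to check. A small sketch, assuming 32-bit elements as in the quoted post; the helper name is made up for the example:

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of element_bits that fit in one SIMD register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide register width")
    return register_bits // element_bits


if __name__ == "__main__":
    # Register widths: SSE is 128-bit, AVX/AVX2 is 256-bit, AVX-512 is 512-bit.
    for name, bits in (("SSE", 128), ("AVX2", 256), ("AVX-512", 512)):
        print(name, simd_lanes(bits, 32), "x 32-bit lanes")
```

So a 256-bit AVX2 register holds 8 lanes of 32 bits; the figure of 16 lanes would need a 512-bit register, which is AVX-512 rather than AVX2.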
At present a SW codec cannot play 4K video, but a HW codec can do it easily on x86. There may be something wrong in the software implementation.
There is nothing wrong in the software, that's just how it is. The decoding complexity increases quite a lot from 1080p to 4K.
There are 4 times the pixels to handle, and a higher bitrate to process. Higher bitrates cause the bitstream parsers to need much more time; the entropy decoding (CABAC) in H264 especially is rather slow and cannot be implemented with SIMD instructions (SSE/AVX), only with "normal" instructions. Bitrate plays a huge role in the speed of a software decoder (unlike hardware decoders, where the impact from bitrate is much smaller).
H264 was primarily designed for HD, not UHD. You notice this in the UHD decoding performance, it doesn't scale properly. It's not a software problem as such, but a limitation of the format.
Even an HEVC decoder in its early stages (meaning not very optimized yet) can be as fast or even faster than a highly optimized H264 decoder on UHD content, just because the format is more suited for it. Bitrate is lower, processing algorithms are better suited and more efficient at higher resolutions.
PS:
All of this applies to software decoders, HW implementations scale quite differently.
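The "4 times the pixels" figure can be verified with a quick calculation. This sketch assumes that "4K" here means 3840x2160 and "1080p" means 1920x1080; the thread does not state the exact frame sizes.

```python
def pixels(width: int, height: int) -> int:
    """Pixels in one frame of the given size."""
    return width * height


HD = (1920, 1080)
UHD = (3840, 2160)  # assumption: "4K" means 3840x2160, not 4096x2160

if __name__ == "__main__":
    ratio = pixels(*UHD) / pixels(*HD)
    print(pixels(*HD), pixels(*UHD), ratio)
```

Both dimensions double, so the per-frame pixel count quadruples before any increase in bitrate or frame rate is taken into account.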
NikosD
11th February 2014, 12:28
H264 was primarily designed for HD, not UHD. You notice this in the UHD decoding performance, it doesn't scale properly. It's not a software problem as such, but a limitation of the format.
The various levels of H.264 allow the format to be as flexible as you want.
You can encode in various resolutions from very low to very high and the same goes for bitrate.
Regarding scaling to multi-core/ multi-thread decoding, I disagree with you based on my SW decoding tests.
Most of the time the performance almost doubles from single core to dual core, and doubles again from dual core to quad core.
You can check it out in my Excel.
I would say that H.264 SW decoding is one of the most suitable processes to show off multi-core/multi-thread/hyperthreading efficiency.
H.265 is just a small evolution of H.264 and not a revolution as H.264 was.
H.265 can achieve the same picture quality at half the bitrate of H.264 and go up to 8K, but this is something that H.264 could possibly handle in the future with an appropriate extension (the resolution, not the bitrate).
Even MPEG1 managed to go to HD!
So, the "problem" is not in the algorithm or the implementation, the "problem" is the efficiency and speed of an ASIC vs CPU.
The order-of-magnitude gap that Eric described in his post between the complexity of the ASIC implementation and the programmable HW can't be beaten by a faster clock or SIMD architecture unless you push them to unrealistic values, speaking theoretically.
nevcairiel
11th February 2014, 12:33
You don't understand. I never said H.264 doesn't scale properly to multi-cores. I said H.264 decoding performance does not scale properly for UHD content, because the format is not optimized for such image sizes and high bitrates. It just gets extremely slow!
Of course you can create such files, but they will just decode very slowly in software decoders. HEVC on the other hand is optimized for UHD content and above, and it's much easier to write a decoder for HEVC which can decode 4K movies faster than an H264 decoder could.
H264 was designed at a time when HD was the major target. You CAN create files in 4K and maybe even 8K in H.264, but decoding those files is just going to be extra slow.
NikosD
11th February 2014, 12:43
I understood what you wrote, but I think you didn't understand me.
During my SW decoding tests, there was no difference between H.264 HD decoding and H.264 UHD decoding.
The algorithm and the ffmpeg SW decoder, scale almost linearly, almost perfectly!
The algorithm and the decoder don't have strange preferences to check if the content is HD or UHD and to behave accordingly!
If it scales to HD it scales to UHD too.
It's just the extra information, pixels, bandwidth due to higher resolution that make things harder for decoders (SW or HW) to keep up with UHD.
HEVC on the other hand is optimized for UHD content and above, and it's much easier to write a decoder for HEVC which can decode 4K movies faster than an H264 decoder could.
H264 was designed at a time when HD was the major target. You CAN create files in 4K and maybe even 8K in H.264, but decoding those files is just going to be extra slow.
You are DEFINITELY WRONG about that.
H.265 decoding is much more difficult than H.264 decoding and you need a quad core CPU to decode 1080p content.
For 4K H.265 you can forget any CPU that is released today, if you increase the bitrate or fps a little.
Again, check the SW performance of the H.265 decoder in my Excel to see the figures for 1080p H.265 decoding.
nevcairiel
11th February 2014, 13:22
There is no H.265 decoder out there today which is already properly optimized like the H.264 decoders are. Once they are optimized, they'll likely be twice as fast as today, or even more.
I am a developer and I deal with software decoders every day. I know what I'm talking about.
For 1080p, H.264 will remain faster. For 4K, HEVC will be faster.
In any case, I've learned a long time ago that you are unwilling to accept any other truth than what you believe it to be, so I won't bother to try to argue.
hajj_3
11th February 2014, 13:32
egur, do you know whether intel's current apu's will be able to hardware decode h265 and vp9 using programmable portions of the hardware as i'd rather not drain the battery on laptops playing h265/vp9 with cpu decoding.
NikosD
11th February 2014, 13:32
@nev
I don't doubt your programming skills, I just see the facts.
I did some testing a few weeks ago and I posted my results to an Excel.
I change my mind often when I see real numbers, but I have this bad habit of testing things myself and not just believing others without testing.
You know that too.
And I have to say that most of the time - not always of course - I'm right, and I'm not saying this to boast.
About 4K H.265, we'll see...
I just haven't seen anywhere something that tells me H.265 is easier to decode than H.264, even for 4K.
nevcairiel
11th February 2014, 13:35
egur, do you know whether intel's current apu's will be able to hardware decode h265 and vp9 using programmable portions of the hardware as i'd rather not drain the battery on laptops playing h265/vp9 with cpu decoding.
That's unlikely to be useful.
You'll have to wait for a new generation of CPUs/GPUs to get hardware H265 decoders.
Luckily, there isn't any real content using it yet either. Not to mention that all encoders for it still suck and don't even manage to beat H.264 videos. :)
CharlieCL
11th February 2014, 15:57
...
Higher bitrates cause the bitstream parsers to need much more time, especially the entropy decoding (CABAC) in H264 is rather slow and cannot be implemented with SIMD instructions (SSE/AVX), but only "normal" instructions. Bitrate plays a huge role in the speed of a software decoder (unlike hardware decoders, where the impact from bitrate is much smaller).
H264 was primarily designed for HD, not UHD. You notice this in the UHD decoding performance, it doesn't scale properly. It's not a software problem as such, but a limitation of the format.
Even an HEVC decoder in its early stages (meaning not very optimized yet) can be as fast or even faster than a highly optimized H264 decoder on UHD content, just because the format is more suited for it. Bitrate is lower, processing algorithms are better suited and more efficient at higher resolutions.
PS:
All of this applies to software decoders, HW implementations scale quite differently.
The entropy decoding (CABAC) in H264 may need to be optimized in assembly.
HEVC at half the bitrate of H264 has the same video quality? That will be good for SW decoders.
When CPU usage rises, the performance of the CPU may decrease. That may be another problem for SW decoders.
nevcairiel
11th February 2014, 16:09
The entropy decoding (CABAC) in H264 may need to be optimized in assembly.
It already is.
But you're welcome to contribute improvements!
CharlieCL
11th February 2014, 16:12
Most of the time the performance almost doubles from single core to dual core, and doubles again from dual core to quad core.
You can check it out in my Excel.
I would say that H.264 SW decoding is one of the most suitable processes to show off multi-core/multi-thread/hyperthreading efficiency.
What software decoder can double the performance from dual core to quad core?
NikosD
11th February 2014, 16:25
I always use LAV Video in my tests, aka ffmpeg.
NikosD
13th February 2014, 12:16
As far as I know they didn't use ASIC for all the decoding components, otherwise why are they slower? They ran at about the same clock frequency, had much faster dedicated memory, didn't share the L3 cache with the CPU and had tons of thermal headroom.
Copying the compressed bitstream over PCIe isn't slow so I don't see any reason except:
1) They don't have ASIC for everything.
2) The drivers are highly inefficient (less likely).
The first Intel chipset capable of full HW acceleration of all three BluRay codecs (MPEG2, H.264, VC-1) was G45 (GMA X4500HD) and the mobile version GM45 (GMA 4500MHD).
It was also the last with the GPU in the motherboard chipset.
Intel's first (and last) on-package GPU, in the Clarkdale/Arrandale SoC, was an upgrade to G45/GM45, supporting dual-stream full HW acceleration of the BluRay codecs.
Do you know if those two full HW accelerators (G45, Clarkdale) had an ASIC for everything?
Did they use any programmable logic (CPU/GPU) for HW acceleration?
Because I think their speed was on par with VPx/UVDx in the years 2008-2010.
GTPVHD
13th February 2014, 12:53
Maxwell introduces even faster H.264 encoding and decoding with improved NVENC (which is used, for instance, in ShadowPlay).
Looks like Nvidia Maxwell has a new generation VP6(?) decoder, maybe fast & powerful enough to decode 4K60P(?)/120P(?) now instead of just 4K24P in Kepler VP5.
NikosD
13th February 2014, 13:08
We'll find out later this month, but while it says improved NVENC, it doesn't say anything about a new decoding engine.
Probably they don't like the unofficial name "VPx".
Faster H.264 decoding could be a new engine like "VP6" or an improved VP5.x
VP5 is faster than 4K@24fps.
It's closer to ~30fps, like 27-29fps.
And what about other features regarding codecs, resolution or VPP functions?
I don't think Maxwell supports H.265 (surely they would say so in bold letters), but I think that Intel's Broadwell will not support H.265 either.
I bet Nvidia with Maxwell will catch 4K@60 fps, but no more.
It's not that easy to catch 4K@120fps like Intel did.
We'll see...
NikosD
24th February 2014, 22:43
Intel surely listens to users and is making progress in driver quality, but:
1) They don't change their mind about pre-configured VPP functions in drivers - they activate a lot of them by default
2) The new drivers crash DXVA Checker v3.0
3) They disabled DXVA VC-1 above 1080p - fortunately the QS decoder still handles 4K VC-1
4) The release notes say that my Pentium G3420 has Intel QuickSync enabled but not Intel Clear Video HD!
5) There are still OpenGL problems with MadShaders v.0.3.0
6) GPA still can't be used for media metrics under Win 8.1 and it crashes the OS
mhourousha
26th February 2014, 06:38
HDMI doesn't output sound after I installed 10.18.10.3412; I have to uninstall the audio driver :( My CPU is an i3-3225.
Procrastinating
27th February 2014, 13:16
Would anyone know what the lowest-end Haswell CPU is, which can take advantage of Quicksync Encoding features? It would be interesting to see how cheap of a recording-ready processor could be found nowadays.
egur
27th February 2014, 14:41
Would anyone know what the lowest-end Haswell CPU is, which can take advantage of Quicksync Encoding features? It would be interesting to see how cheap of a recording-ready processor could be found nowadays.
That would be the i3-4130 (http://ark.intel.com/products/77480/Intel-Core-i3-4130-Processor-3M-Cache-3_40-GHz) according to ark.intel.com
kalehrl
27th February 2014, 14:52
Would anyone know what the lowest-end Haswell CPU is, which can take advantage of Quicksync Encoding features? It would be interesting to see how cheap of a recording-ready processor could be found nowadays.
Have a look here:
http://forum.doom9.org/showthread.php?p=1670447#post1670447
wanezhiling
27th February 2014, 15:33
That would be the i3-4130 (http://ark.intel.com/products/77480/Intel-Core-i3-4130-Processor-3M-Cache-3_40-GHz) according to ark.intel.com
http://forum.doom9.org/showthread.php?p=1670539#post1670539
GTPVHD
27th February 2014, 22:04
http://www.phoronix.com/scan.php?page=news_item&px=MTYxNzE
http://anzwix.com/a/VA-API/AddTheSeparatedMediaEncodingdecodingFilesForBDW
"On the video acceleration side, there's lots of changes between Haswell and Broadwell."
Interesting. I wish Intel would release Broadwell instead of constantly delaying it; the need for more efficient 14nm CPUs is great.
NikosD
28th February 2014, 12:16
Would anyone know what the lowest-end Haswell CPU is, which can take advantage of Quicksync Encoding features? It would be interesting to see how cheap of a recording-ready processor could be found nowadays.
You have to look for the cheapest, among these:
•Intel Pentium Processor 3558U/3561Y/G3220/G3220T/G3320TE/G3420/G3420T/G3430 with Intel HD Graphics
•Intel Celeron Processor 2957U/2961Y/2981U/G1820/G1820T/G1820TE/G1830 with Intel HD Graphics
kalehrl
8th March 2014, 20:58
Is there an encoding tool which makes use of Intel Quick Sync hardware deinterlacing?
andyvt
8th March 2014, 21:23
Is there an encoding tool which makes use of Intel Quick Sync hardware deinterlacing?
QSTranscode (http://sourceforge.net/projects/qstranscode/) does. By default it will use MSDK VPP to DI and output the rendered frame rate.
kalehrl
9th March 2014, 09:53
I tried it and it really works on my cheap Celeron G1820 processor.
The encoding is extremely fast but the quality is bad even with the highest quality preset.
I would prefer to use MeGUI and the software x264 encoder, and do just the deinterlacing with a tool which supports hardware Quick Sync deinterlacing.
I currently use QTGMC but it is very slow.
NikosD
9th March 2014, 10:01
Have you tried HandBrake with QuickSync support (https://trac.handbrake.fr/milestone/QuickSync%20Beta)?
egur
9th March 2014, 10:11
You can try using LAV/ffdshow in your AVS script via directshow source. I didn't try it but I've heard it works. Both DS filters have HW deinterlacing and film mode detection (HW IVTC).
kalehrl
9th March 2014, 11:05
Have you tried HandBrake with QuickSync support (https://trac.handbrake.fr/milestone/QuickSync%20Beta)?
I've just tried it and I can encode my video with qsv but the problem is, the video is sped up and it is half the length of the original. I followed this advice:
Simply select the "H.264 (Intel QSV)" encoder option from the "Video Encoder" dropdown menu on the "Video" tab.
For hardware-accelerated deinterlacing, use deinterlace custom with "qsv" (without quotes). This will only work when QSV is also used for video encoding.
The problem seems to be not being able to select field order in handbrake and also not dumping half the fields. The encoded video is 50fps progressive.
@egur
Could you please tell me how to transform this script into the one using LAV/ffdshow decoder with HW deinterlacing?
LoadPlugin("D:\Programs\MeGUI\tools\ffms\ffms2.dll")
FFVideoSource("D:\Dreambox\movie\test.mkv", fpsnum=25, fpsden=1, threads=1)
Load_Stdcall_Plugin("D:\Programs\MeGUI\tools\avisynth_plugin\yadif.dll")
Yadif(order=0)
crop(2, 74, -2, -74)
LoadPlugin("D:\Programs\MeGUI\tools\avisynth_plugin\UnDot.dll")
Undot() # Minimal Noise
Spline36Resize(704,394) # Spline36 (Neutral)
I tried this way:
DirectShowSource("D:\Dreambox\movie\test.mkv")
crop(2, 74, -2, -74)
LoadPlugin("D:\Programs\MeGUI\tools\avisynth_plugin\UnDot.dll")
Undot() # Minimal Noise
Spline36Resize(704,394) # Spline36 (Neutral)
And it seems to work and LAV filter icon appears in the tray so I guess LAV is used now instead of FFVideoSource but the question is how to use Intel HW deinterlacing instead of Yadif?
NikosD
9th March 2014, 11:40
Sorry, I can't help you with that. I'm extremely new to QuickSync encoding, since the last 3412 driver update :)
I'm sure that you can post your questions on their user support pages/forum or ask the developers of HandBrake directly.