View Full Version : Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing
NikosD
15th December 2013, 09:26
In only 2 years support ?
That's a pity...
They should have put it along with Haswell drivers or Ivy.
I don't know when exactly they added VC1_VLD2010 support.
zcream
16th December 2013, 14:10
Is 10-bit 4:2:2 YUY2 still unsupported within Quicksync ?
From the Intel Media SDK forums -
http://software.intel.com/en-us/forums/topic/328273
Media SDK supports YUV format NV12 (this is 4:2:0) natively. The SDK also supports 2 additional YUV formats: YV12 (this is 4:2:0) and YUY2 (this is 4:2:2). To use format YV12 or YUY2 a developer must first use VPP to convert format to NV12 before feeding the input frames to encoder.
- It supports the same formats as DXVA, so only 8-bit 4:2:0 (MPEG2, H264 and VC1)
- H264 High profile, no 10-bit or 4:2:2/4:4:4
- See above, only 8-bit. Output is always NV12.
egur
16th December 2013, 14:27
VPP can be used to convert from YUY2/YV12 to NV12, but that's it. There is no 10-bit support and no 4:2:2 encoding or decoding, and my guess is that VPP operations occur after conversion to NV12.
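For anyone wondering what the YV12 to NV12 step amounts to: both are 8-bit 4:2:0 formats and differ only in how the chroma is stored. YV12 keeps separate V and U planes after the luma plane, while NV12 keeps one plane of interleaved U/V pairs. The following is only a minimal sketch of that repacking on the CPU, not how the Media SDK's VPP does it. The function name is made up for illustration, and it assumes tightly packed frames with no row padding (real surfaces usually have a pitch).

```python
def yv12_to_nv12(frame: bytes, width: int, height: int) -> bytes:
    """Repack a planar YV12 frame (Y, V, U planes) as NV12 (Y plane + interleaved UV).

    Hypothetical helper for illustration; assumes no stride/pitch padding.
    """
    if width % 2 or height % 2:
        raise ValueError("4:2:0 formats need even width and height")

    y_size = width * height
    c_size = (width // 2) * (height // 2)  # each chroma plane is quarter size
    if len(frame) != y_size + 2 * c_size:
        raise ValueError("frame size does not match width/height")

    y = frame[:y_size]
    v = frame[y_size:y_size + c_size]      # YV12 stores V before U
    u = frame[y_size + c_size:]

    uv = bytearray(2 * c_size)
    uv[0::2] = u                           # NV12 stores U first in each pair
    uv[1::2] = v
    return y + bytes(uv)


# Tiny 4x2 frame: 8 luma samples, 2 V samples, 2 U samples.
sample = bytes(range(8)) + b"\x10\x11" + b"\x20\x21"
nv12 = yv12_to_nv12(sample, 4, 2)
```

The luma plane is untouched, so a conversion like this costs one pass over the chroma data, a quarter of the frame's samples per plane.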
andyvt
17th December 2013, 21:43
I am going to test this as soon as I get my laptop.
ATM, ffmpeg can record from USB cameras.
https://trac.ffmpeg.org/wiki/DirectShow
If I can use the ffmpeg cli and pass the ffmpeg raw stream to QSTranscode it should be fine.
I am very curious to see if there are any noticeable differences in Cineform 4:2:2 vs h.264 4:2:0 at a high enough bitrate.
I updated QSTranscode (http://sourceforge.net/projects/qstranscode/files/qstranscode1022.zip/download) so that it will allow you to use DS sources. Pass the devices to -i. If you need specific configuration for the devices (framesize, pixelfmt, etc) pass that using -ex.
zcream
18th December 2013, 08:32
Cool. I'm going to buy a laptop with hd4000 or higher and then test it.
In Haswell, there is a MJPEG decoder but no encoder. Is your MJPEG encoding then done in software ?
andyvt
18th December 2013, 11:13
Cool. I'm going to buy a laptop with hd4000 or higher and then test it.
In Haswell, there is a MJPEG decoder but no encoder. Is your MJPEG encoding then done in software ?
All video encoding is done by the MSDK; IIRC, MJPEG encoding is SW-based on HSW.
NikosD
22nd December 2013, 23:34
Eric,
is it possible to accelerate in QS3 HW this clip ?
bbb_sunflower_2160p (http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_2160p_30fps_stereo_abl.mp4)
It is a 3D clip (top-bottom): 3840x2160 per eye, so 3840x4320 in total.
egur
23rd December 2013, 08:29
Eric,
is it possible to accelerate in QS3 HW this clip ?
bbb_sunflower_2160p (http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_2160p_30fps_stereo_abl.mp4)
It is a 3D clip (top-bottom): 3840x2160 per eye, so 3840x4320 in total.
I don't think so. I can try and report back.
ryrynz
2nd January 2014, 02:53
I have visual corruption at the bottom of the screen 9 seconds in with this sample (http://www.sendspace.com/file/h20f5q), happens on my 2600K with 15.28.20.64.3347 drivers on Windows 8.1 x64, but not with my 3570K using 15.33.8.64.3345.
I had this video freeze after this occurred a few times (audio playback was uninterrupted, with no response from MadVR using Ctrl-J), yet MPC-BE was fully operational.
Not sure if that's related but I hope so as I usually have no issues.
egur
3rd January 2014, 20:01
I have visual corruption at the bottom of the screen 9 seconds in with this sample (http://www.sendspace.com/file/h20f5q), happens on my 2600K with 15.28.20.64.3347 drivers on Windows 8.1 x64, but not with my 3570K using 15.33.8.64.3345.
I had this video freeze after this occurred a few times (audio playback was uninterrupted, with no response from MadVR using Ctrl-J), yet MPC-BE was fully operational.
Not sure if that's related but I hope so as I usually have no issues.
I'll take a look.
On SandyBridge I got the corruption around 8s but no freezes. I'll try other systems after the weekend.
ryrynz
4th January 2014, 05:58
Glad to see it duplicated.
The freeze occurred about an hour or so into the video this sample was taken from. If the error gets fixed I can retest the video, thanks.
egur
5th January 2014, 22:34
Here's a nice article in Anandtech (http://www.anandtech.com/show/7566/intels-haswell-nuc-d54250wyk-ucff-pc-review) on the new Haswell NUC.
Racer
7th January 2014, 20:08
Has anybody tried using this FFDShow decoder with QuickSync support for frame-serving to x264? Is this actually possible? Then decoding could be done in hardware and encoding in software.
egur
8th January 2014, 08:22
QuickSync can be used in FFDShow everywhere. If not, let me know.
The benefits are visible when high bitrate source clips are used or if you want to enable QS's video processing (e.g. deinterlacing, noise reduction).
TEB
19th January 2014, 11:04
Hi. This question has probably been asked before, but does anyone know if Broadwell's Quicksync implementation will support H.265?
br TE
wanezhiling
19th January 2014, 14:47
I hope that Broadwell supports H.264 SVC, H.265, VP9. :D
btw Broadwell will not be used on desktop?
GTPVHD
19th January 2014, 22:45
http://www.cpu-world.com/news_2013/2013112001_Broadwell-K_socket_1150_CPUs_to_feature_GT3_graphics.html
There will be Socket 1150 Broadwell desktop chips.
skryabin
24th January 2014, 02:07
Does the fake monitor trick work anymore in 8.1?
When I click detect nothing happens, no vga monitors detection, so I can't extend the desktop and use quicksync.
It worked fine on windows 8.
EDIT: maybe I got it working again, reinstalled the drivers.
egur
25th January 2014, 22:00
The fake screen trick still works for me.
Deihmos
26th January 2014, 05:01
Is there any difference between ffdshow supplied by the developer and the tryouts version?
ryrynz
26th January 2014, 05:21
Is there any difference between ffdshow supplied by the developer and the tryouts version?
No, grab whichever version is the latest or suits your needs.
egur
26th January 2014, 08:10
My builds are taken from ffdshow-tryouts source code so they are the same. Since my last build, there wasn't anything new to add so I didn't release anything...
Near Broadwell launch I'll add some new stuff or before if I can find a workaround for broken clips.
Stereodude
1st February 2014, 14:04
Eric, can you take a look at why the Intel QuickSync HW decoder gives choppy video playback with this sample .vob file (http://www.sendspace.com/file/b1pxf5)? FWIW, this is not some odd corner case I found. I've found that the QuickSync HW decoder can't smoothly decode a lot of MPEG-2 DVD content (played directly from the disc with the MS DVD navigator or from an extracted .vob file using the LAV splitter).
Here are the relevant details:
madVR tells me the renderer isn't dropping any frames so the stutter has to be in the decoder.
I'm using a i7-4770k with the latest 10.18.10.3345 drivers
An i5-4300U with slightly older 9.18.10.3324 drivers does the same thing
Windows 7 x64 on both systems
MPC-HC 1.7.1 or 1.7.2 w/ internal LAV filters
DXVA2 (native), DXVA2 (copy-back), CUVID, and SW decoding in LAV work fine
Extracting the audio and video streams from the .vob and putting them in a .mkv will play back fine decoded with QuickSync
nevcairiel says the decoder is misbehaving, not LAV
egur
1st February 2014, 20:00
I've managed to reproduce the bug. It didn't happen in my 3:2 soft telecine content...
As a quick fix, use either LAV video decoder or disable the time stamp correction within ffdshow config->Intel QuickSync
I'll create a version after some more testing.
nevcairiel
1st February 2014, 20:09
It doesn't work properly in LAV either when the QS decoder is used, although it has a different bug.
Stereodude
1st February 2014, 20:22
I've managed to reproduce the bug. It didn't happen in my 3:2 soft telecine content...
As a quick fix, use either LAV video decoder or disable the time stamp correction within ffdshow config->Intel QuickSync
I'll create a version after some more testing.
Thanks! I guess I should have clarified that I'm using the LAV video decoder with QuickSync, not ffdshow.
egur
2nd February 2014, 11:41
I'll test with LAV too before posting a fix.
Deihmos
2nd February 2014, 22:26
Is it possible to disable auto loading of external subtitles in ffdshow? The ffdshow development is now dead and no one responds.
egur
3rd February 2014, 12:59
Is it possible to disable auto loading of external subtitles in ffdshow? The ffdshow development is now dead and no one responds.
Not using ffdshow for subs, but there's an option in config->subtitles: "Subtitle files". Doesn't unchecking this remove auto loading? Also, the player may force ffdshow to load the subs.
RealSnoopyDog
3rd February 2014, 13:09
Since I updated my PCs from Windows 7 to Windows 8.1, I get blurred and choppy playback of VC1-encoded full HD content with the QuickSync decoder in the LAV filters. When I use software decoding or DXVA2, the playback is good again.
I have already tested several Intel drivers: the "newest" one that Windows 8.1 installs automatically (newer than the last official release on the Intel homepage), the latest official one from Intel and some older ones.
I have CPUs with HD4000 graphics atm.
Is this a known issue with the Windows 8.1 drivers or am I the only one getting this?
wanezhiling
3rd February 2014, 13:35
Post a sample file; that explains everything.
RealSnoopyDog
3rd February 2014, 15:08
Happens with every full HD VC1 encoded material that i have.
egur
4th February 2014, 09:16
@Stereodude - can you upload the file again? The link is not working.
Stereodude
4th February 2014, 13:31
@Stereodude - can you upload the file again? The link is not working.
I deleted the file because I figured you had saved it and I didn't need it online any longer. Here it is again (http://www.sendspace.com/file/jquzwx).
Edit: Let me know when you've downloaded it because I plan to delete it again.
egur
4th February 2014, 16:02
My fix seems to work for ffdshow but not well for LAV. I'll do this offline with nevcairiel.
Using SW deinterlacing, everything is blurry and jumpy - looks like the DI is not handling 3:2 well or maybe the odd/even flags are wrong. When using HW DI (in my decoder), the frames are sharp but playback has tons of jitter - non smooth motion. The latter suggests time stamp issues.
Why this is happening:
In ffdshow, when the QS decoder reports a film sequence, it will override the time stamps and use a 1/23.976 second difference between frames, and also clear the frame flags for the renderer. The original time stamps of 3:2 clips are always for 29.97fps.
Also in ffdshow, the default behavior of QS decoder is to modify the time stamps.
Stereodude
4th February 2014, 19:29
I assume the sample I provided is soft telecined. Doesn't that mean the QS decoder should apply a telecine to the content instead of passing progressive frames? Is it valid for a decoder to ignore the flagging and pass progressive frames?
Personally, I'd rather get progressive 24p output instead of telecined output that then gets IVTC'd by either the video card drivers or madVR's routines, as the latter is just a waste of GPU/CPU resources: it pointlessly telecines the content and then IVTCs it. However, I don't have a clue whether that's a valid action for an MPEG-2 decoder to take, or whether QS has a way to negotiate skipping the soft telecine with the application / filter using it.
egur
4th February 2014, 19:36
QS decoder has 2 modes:
1) Time stamp manipulation on (default for ffdshow, configurable): soft 3:2 flags are used and output is progressive frames at 23.976 (3:2).
2) TS manipulation is off (hardcoded in LAV), decoder outputs frames as interlaced but they are marked to have repeated fields. LAV decoder should do the soft telecine and change the timestamps to the new frame rate.
This behavior is something I fixed in ffdshow itself more than a year ago.
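To make the difference between the two modes concrete, here is a rough sketch of the time stamp arithmetic only. It is not ffdshow, LAV or Media SDK code, and the function names are invented for the example. It assumes the usual soft telecine pattern, where frames flagged with repeat_first_field are shown for 3 fields and the others for 2, at 59.94 fields per second.

```python
from fractions import Fraction

FIELD = Fraction(1001, 60000)       # one field at 59.94 fields/s
FILM_FRAME = Fraction(1001, 24000)  # one frame at 23.976 fps


def film_timestamps(n_frames, start=Fraction(0)):
    """Mode 1 (sketch): ignore repeat flags, space frames 1/23.976 s apart."""
    return [start + i * FILM_FRAME for i in range(n_frames)]


def flagged_timestamps(rff_flags, start=Fraction(0)):
    """Mode 2 (sketch): keep the flags; a frame with repeat_first_field
    set lasts 3 fields, any other frame lasts 2 fields."""
    stamps, t = [], start
    for rff in rff_flags:
        stamps.append(t)
        t += FIELD * (3 if rff else 2)
    return stamps


# Classic 3:2 cadence: the flag alternates on and off.
flags = [True, False, True, False, True]
mode1 = film_timestamps(len(flags))
mode2 = flagged_timestamps(flags)
```

Both lists agree at every fourth frame, because 3+2+3+2 fields take exactly as long as four film frames. In between, the flagged time stamps alternate between long and short gaps, which is the information a renderer can use (or that looks like jitter if the flags and time stamps disagree).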
Stereodude
4th February 2014, 20:02
Mode 1 sounds more desirable to me. I wonder why does LAV not allow it...
nevcairiel
4th February 2014, 20:22
No, 1) is not more desirable. It may sound like that, but it's really not. You lose information, which the video renderer can use to make better decisions on how to handle the video.
Besides, the goal is to have all decoders in LAV behave similarly, so QuickSync doesn't get any special modes that muck with data. LAV is not ffdshow.
The problem LAV has is that the QS decoder does not behave sanely on this stream, it does not pass-through the timestamps 1:1 like it should. The input stream barely has any timestamps, and the QS decoder "invents" new 30 fps timestamps, which are clearly wrong - despite me telling it not to do anything with the time!
Stereodude
4th February 2014, 21:04
Oh... Well, I'll let you two fight it out. :)
egur
4th February 2014, 21:08
No fighting, we're all on the same page :).
Could be my bug. Need more checks.
nevcairiel
4th February 2014, 21:50
Might also be in Intel's code, who knows. All I know is that I give it certain timestamps, and get others back out. :)
CharlieCL
5th February 2014, 00:40
When the Quick Sync hardware decoder is called, I guess many driver instructions have to be executed and memory bandwidth is consumed, which will slow down the QS decoder. If a software decoder used SSE4 or AVX properly, could it be faster than the hardware QS decoder on high-end Core i processors?
I guess that the current software decoders are not well implemented.
Deihmos
5th February 2014, 05:03
Not using ffdshow for subs, but there's an option in config->subtitles: "Subtitle files". Doesn't unchecking this remove auto loading? Also, the player may force ffdshow to load the subs.
That disables it completely. I was hoping for an option to not load them if the audio language is English.
egur
5th February 2014, 08:19
When the Quick Sync hardware decoder is called, I guess many driver instructions have to be executed and memory bandwidth is consumed, which will slow down the QS decoder. If a software decoder used SSE4 or AVX properly, could it be faster than the hardware QS decoder on high-end Core i processors?
The HW decoder is significantly faster than all SW decoders. When copying frames back to system memory, performance is lost. On low-bitrate clips SW will end up being faster, but on medium to high bitrates HW will be faster. FYI, SSE4 isn't an improvement to SSE1/2/3, it's an additional set of instructions that use the same SSE registers.
That disables it completely. I was hoping for an option to not load them if the audio language is English.
I think this sort of logic should be part of the player, not the decoder. Just my 2 cents. You can always delete/rename the subtitle file before seeing the video. That's what I do.
egur
5th February 2014, 16:04
@Stereodude:
I've found a very low level bug that caused this. Very strange it didn't surface before. A few more tests and I'll release ffdshow.
LAV filters got the update too, so the next LAV release should be good.
If you have more issues, don't be shy. My test material is finite...
Stereodude
6th February 2014, 00:43
@Stereodude:
I've found a very low level bug that caused this. Very strange it didn't surface before. A few more tests and I'll release ffdshow.
LAV filters got the update too, so the next LAV release should be good.
If you have more issues, don't be shy. My test material is finite...
Thanks for looking into it.
CharlieCL
6th February 2014, 04:55
The HW decoder is significantly faster than all SW decoders. When copying frames back to system memory, performance is lost. On low-bitrate clips SW will end up being faster, but on medium to high bitrates HW will be faster. FYI, SSE4 isn't an improvement to SSE1/2/3, it's an additional set of instructions that use the same SSE registers.
SSE/AVX is like a 256/512-bit DSP running at 3GHz inside the CPU. The ASIC decoder may run at only 1GHz. I cannot see why a SW decoder cannot be faster. Maybe the SSE/AVX architecture is poor, or maybe the software implementation is poor.
Anyway, I am disappointed with the performance of SSE/AVX.
egur
6th February 2014, 10:31
SSE/AVX is like a 256/512-bit DSP running at 3GHz inside the CPU. The ASIC decoder may run at only 1GHz. I cannot see why a SW decoder cannot be faster. Maybe the SSE/AVX architecture is poor, or maybe the software implementation is poor.
Anyway, I am disappointed with the performance of SSE/AVX.
I'll give you a high-level example of how things work. Let's say the HW needs to perform an inverse DCT of 8x8 blocks on many blocks. Blocks are located sequentially in memory (for simplicity of the example).
ASIC (dedicated HW circuit a.k.a fixed function):
* Entire IDCT algorithm is implemented as a circuit. Input is a matrix/vector, and so is the output.
* Every adder, multiplier and other logic in the circuit is not shared. e.g. if the algorithm has 64 multiplications, 64 different multipliers will exist in the circuit.
* The circuit can read new inputs at every clock.
* The circuit can output new data at every clock.
* The circuit has a fixed delay D (clock ticks between input and output). D is usually very small. For IDCT it's <10.
* Work time for performing IDCT on N DCT blocks is N+D, not N*D.
* If higher performance is needed, the circuit can be duplicated. This will give 2 IDCTs per clock. The cost is silicon area ($$$).
* The circuit can do just one algorithm/function.
SSE/AVX/DSP:
* CPU decodes instructions and/or fetches input data every clock.
* Logic resources are shared between stages of the algorithm. e.g. if the algorithm has 64 multiplications, there will be a small number of (vector) multipliers.
* Blocks are processed one at a time, each with a delay of D2, which is usually larger than the delay of the ASIC implementation.
* Time for performing N tasks is N*D2, much larger than N+D.
* HW is much more generic and can perform millions of functions.
I hope this is clearer.
Why use ASIC? high performance for known and unchanging algorithms.
Why use DSP? Flexibility, ability to fix bugs or optimize after production, sometimes cheaper HW.
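The N+D versus N*D2 point can be put into numbers with a toy model. This is only a back-of-the-envelope sketch of the simplified picture above, not a measurement of any real decoder. The 1GHz and 3GHz clocks are the figures guessed earlier in the thread, D=10 is the upper bound mentioned for IDCT, and D2=40 cycles per block is purely an assumed value for illustration.

```python
def pipelined_cycles(n_blocks: int, latency: int) -> int:
    """Fixed-function pipeline: a new block enters every clock and the
    first result appears after `latency` clocks, so total is N + D."""
    return n_blocks + latency


def sequential_cycles(n_blocks: int, cycles_per_block: int) -> int:
    """Shared-resource (SSE/AVX/DSP) model: one block at a time,
    each taking `cycles_per_block` clocks, so total is N * D2."""
    return n_blocks * cycles_per_block


N = 1000          # number of 8x8 blocks
D = 10            # ASIC pipeline latency in clocks (upper bound from the post)
D2 = 40           # assumed clocks per block on the CPU, illustration only

asic_cycles = pipelined_cycles(N, D)       # 1010
cpu_cycles = sequential_cycles(N, D2)      # 40000

asic_seconds = asic_cycles / 1e9           # ASIC assumed to run at 1 GHz
cpu_seconds = cpu_cycles / 3e9             # CPU assumed to run at 3 GHz

print(f"ASIC: {asic_cycles} clocks, CPU: {cpu_cycles} clocks")
print(f"CPU/ASIC time ratio: {cpu_seconds / asic_seconds:.1f}x")
```

Under these assumptions the 3x clock advantage of the CPU does not make up for needing roughly 40x more clocks. Real software decoders overlap work across blocks and cores, so treat the ratio as a feel for the argument, not a benchmark.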
NikosD
6th February 2014, 13:59
Nice post Eric.
Very clear, short and accurate.
I would only add power efficiency as a further reason for implementing and using something in HW (ASIC) rather than in SW (CPU).