
View Full Version : Intel QuickSync Decoder - HW accelerated FFDShow decoder with video processing



egur
13th June 2012, 11:40
I have two samples which cause problems during playback: http://netload.in/dateiRtEF2R4qzi/Interlaced samples.zip.htm. The framerate indicator shows twice the real framerate and there's horrible stuttering during playback. I have to disable deinterlacing in QuickSync and use software deinterlacing from FFDShow; then the movies play perfectly.

OK, got the files, I'll investigate the issue later.
By stuttering, do you mean audio, video or both?
BTW, frame rate can double if the clip is marked as interlaced and full rate is selected.
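
A rough illustration of the frame-rate doubling described above. This is only a sketch, not the decoder's actual logic; the function name and parameters are invented for the example. It assumes that an interlaced frame carries two fields and that full-rate deinterlacing outputs one progressive frame per field.

```python
def output_frame_rate(source_fps: float, interlaced: bool, full_rate: bool) -> float:
    """Frame rate the renderer sees after (optional) deinterlacing."""
    if interlaced and full_rate:
        # One output frame per field, two fields per source frame.
        return source_fps * 2
    # Progressive content, or half-rate deinterlacing: rate is unchanged.
    return source_fps


print(output_frame_rate(25.0, interlaced=True, full_rate=True))   # full rate
print(output_frame_rate(25.0, interlaced=True, full_rate=False))  # half rate
```

So a clip that is flagged as interlaced would show a doubled rate in the renderer's statistics even when nothing is wrong.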

mecedo
13th June 2012, 14:47
OK, got the files, I'll investigate the issue later.
By stuttering, do you mean audio, video or both?
BTW, frame rate can double if the clip is marked as interlaced and full rate is selected.

MadVR reports many dropped video frames.

egur
13th June 2012, 15:15
MadVR reports many dropped video frames.

MadVR will probably choke for this high frame rate. The iGPU resources are also drained by the fact that it has to do deinterlacing.
This might not be a bug - it looks like the iGPU is stressed too much.
Does this happen with EVR?
Try lowering the MadVR algorithm quality (e.g. to bi-linear), does this help?

BTW, if you want the full potential of MadVR, you'll need a mainstream discrete GPU.

andyvt
13th June 2012, 15:20
BTW, if you want the full potential of MadVR, you'll need a mainstream discrete GPU.

Ignoring the flexibility to select the VPP algorithm, do you think madVR offers a benefit over the VPP provided by IVB?

egur
13th June 2012, 15:40
Ignoring the flexibility to select the VPP algorithm, do you think madVR offers a benefit over the VPP provided by IVB?

VPP is more than scaling. In some use cases MadVR alone will do a better job.

The benefit of doing DI within the decoder is that it enables various post processing functions to work, including adding subtitles.

MadVR, if operating on another GPU, can give the best quality by utilizing its special features (exclusive full screen, sharp luma scaler combined with a soft chroma scaler).

We're not at the point where the iGPU can run MadVR at its full capabilities.

andyvt
13th June 2012, 15:52
VPP is more than scaling. In some use cases MadVR alone will do a better job.


Recent versions of madVR do more than scale. I don't think it offers the same capabilities as MSDK, but without getting into too much detail I thought "VPP" was the best term :)


The benefit of doing DI within the decoder is that it enables various post processing functions to work, including adding subtitles.


This is only a benefit when the renderer doesn't provide proper* support right?

* or more specifically the application doesn't allow for enough flexibility in its support for subtitles/cc when it orchestrates playback via DirectShow


MadVR, if operating on another GPU, can give the best quality by utilizing its special features (exclusive full screen, sharp luma scaler combined with a soft chroma scaler).

We're not at the point where the iGPU can run MadVR at its full capabilities.

So "under the right conditions" to the original question?

egur
13th June 2012, 16:11
Under the right conditions, MadVR can provide better quality than MSDK VPP.
Under different conditions, it might stall.
If you care about subtitles or you're unhappy (for any reason) with the way MadVR handles DI, then MSDK VPP is a good option.

Given the low power budget the iGPU has, together with the fact that it costs nothing (to the user), you can't seriously expect it to beat a 150W GPU on every turf...
Future iGPUs will get stronger and stronger; at some point MadVR will be able to run smoothly on them using its (today's) highest settings. When will this happen? Don't know.

andyvt
13th June 2012, 16:19
Given the low power budget the iGPU has, together with the fact that it costs nothing (to the user), you can't seriously expect it to beat a 150W GPU on every turf...


TBC, I don't expect it to. My interest is around defining the decision points which could lead someone to select a dGPU over IVB in a HTPC.

egur
13th June 2012, 18:45
TBC, I don't expect it to. My interest is around defining the decision points which could lead someone to select a dGPU over IVB in a HTPC.

In my opinion, the decision points should be:
dGPU:
+ Use MadVR - may not work well in 4K@60fps even if screen resolution is smaller.
+ Proper HDMI output levels
- $$$
- More power
- More noise unless a fanless model is used.
- DXVA not 100% stable on all models.
- Hybrid GPU setups do not always work.
- Fan will make a lot of noise after a year or two.
- Large form factor due to dGPU height/length
- Poor HW DI quality (may improve). Little progress on this front in the last 5 years (AMD and Nvidia).

iGPU (IvyBridge):
+ It's free :)
+ Quiet
+ very little power -> especially with mobile or ULV parts.
+ Small form factor possible. Wife not angry about PC in the living room :)
* Use EVR (4K) or EVR-CP (1080p)
* Requires fast memory (1600MHz or better) - not so bad - good for CPU too. Not expensive.
* Can always buy dGPU later if not satisfied (unless case is too small).
- In combination with some TVs, incorrect HDMI output levels are used. Long standing driver bug.
- Should opt for GT2 models - stronger iGPU (16 EUs vs. 6 EUs for GT1), can raise the price.
- Driver availability/updates policy is not clear once HW becomes "old" (2 years).
- No HW deblocking/deringing for older formats.

For those who want the absolute best quality and do not care for DI, they should buy a silent dGPU, an i7 processor and do everything in SW.

My personal setup is an i7-2600K, no dGPU.

nevcairiel
14th June 2012, 07:04
I don't agree with some of your points there.

- HW DI on NVIDIA and AMD cards is good, i don't think Intel is any better. In fact, i think you're the first to ever claim that.
- DXVA/MediaSDK is equally unstable on Intel, very much depending on driver version and whatnot. Heck, Intel doesn't even bother supporting the "standard" DXVA, limiting you to applications that have special support for Intel's differences (Media SDK or otherwise). From my experience, NVIDIA has had the most stable experience, if you ignore the recent MPEG4-ASP breakage (a format not available on Intel)
- NVIDIA and AMD support actual 4k Output (future proofing)

Personally, I have an AMD 7750 passive card now, and use it in a hybrid setup with the Ivy Bridge CPU/GPU, and it works just beautifully.

------------
On another note, i got a report about some issues with QS DI, maybe you can have a look:
http://forum.doom9.org/showthread.php?p=1578205#post1578205

vivan
14th June 2012, 08:57
Heck Intel doesn't even bother supporting the "standard" DXVA, limiting you to applications that have special support for Intel's differences (Media SDK or otherwise).

And Flash is not such an application, so with h/w video decoding it produces artifacts on files with a large number of reference frames. Also, EVR on Intel GPUs has some issues with chroma upsampling (nearest-neighbor interpolation). You can use madVR... but not in Flash player.

egur
14th June 2012, 14:25
And Flash is not such an application, so with h/w video decoding it produces artifacts on files with a large number of reference frames. Also, EVR on Intel GPUs has some issues with chroma upsampling (nearest-neighbor interpolation). You can use madVR... but not in Flash player.

If you provide a flash sample (or link) that shows artifacts, that would be nice.

As for chroma upsampling, are you sure this happens on SandyBridge/IvyBridge?

Nev, as for DI, on my DI test suite, AMD fails miserably. I haven't had an Nvidia card for a while, but the last one I had was a 9800GTX and it too wasn't very good.

My AMD card, a Radeon HD7950, seems to throw away every other frame (output 30p), at least when the fields are very different. The resulting image is less sharp (and has less detail) on AMD than on Intel.

nevcairiel
14th June 2012, 15:11
My AMD card, a Radeon HD7950, seems to throw away every other frame (output 30p), at least when the fields are very different. The resulting image is less sharp (and has less detail) on AMD than on Intel.

Sounds like some misconfiguration or something else broken; it can produce a fluid 60p image from interlaced videos.
It's a common thing to be tested in HTPC-focused reviews, including the typical (artificial) Cheese Slice test as well as HQV 2.0, and AMD usually scored slightly above NVIDIA (mostly because of better cadence detection), and both above Intel.

egur
14th June 2012, 15:19
Sounds like some misconfiguration or something else broken; it can produce a fluid 60p image from interlaced videos.
It's a common thing to be tested in HTPC-focused reviews, including the typical (artificial) Cheese Slice test as well as HQV 2.0, and AMD usually scored slightly above NVIDIA (mostly because of better cadence detection), and both above Intel.

Ask any video processing expert and you'll get the same answer - HQV is not a serious test.
It was meant to prove the superiority of HQV video processors. That's why it puts a high score on those rare cadences and doesn't stress deinterlacing too much. Scaling is also only partially checked and carries very little weight (<10% of the overall score).
Digital noise reduction (deblock/dering/mosquito) isn't scored either.

andyvt
14th June 2012, 15:26
Ask any video processing expert and you'll get the same answer - HQV is not a serious test.
It was meant to prove the superiority of HQV video processors. That's why it puts a high score on those rare cadences and doesn't stress deinterlacing too much.

It's also quite likely that the video sequence used for cadence detection contains embedded moiré and the scoring guide specifically calls for scoring based on the presence of moiré in the stands:


No moiré pattern observed in stands within less than 1/2 second 5
No moiré pattern observed in stands within less than 1 second 3
Moiré pattern observed in stands intermittently or constantly through the clip 0


Which raises some questions about the suitability/quality control around the samples in the test.

mecedo
14th June 2012, 17:22
Does this happen with EVR?
Try lowering the MadVR algorithm quality (e.g. to bi-linear), does this help?

Yes. Renderer has no impact.

egur
14th June 2012, 21:07
I've found a few issues with the DI, but a fix is not clear ATM, I suggest that people stop using it until a fix is ready.

nevcairiel
14th June 2012, 21:16
Ask any video processing expert and you'll get the same answer - HQV is not a serious test.
You missed the whole point of my post.
I didn't say HQV is a good test, just that reviews use it and such catastrophic DI results would've been noticed. That's all.

andyvt
14th June 2012, 21:34
You missed the whole point of my post.
I didn't say HQV is a good test, just that reviews use it and such catastrophic DI results would've been noticed. That's all.

The thing is that HQV doesn't really test the quality of DI, just that the VP is able to do it.

Most reviewers just look at the bars in the middle (i.e. is the cadence detected), and not at the stands; so I don't think that claim (that they would have noticed) is well founded. If you look at the stands, there's no way (or at least not with any of the GPUs or CE devices I've tested) that any device could get more than 0.

nevcairiel
14th June 2012, 21:38
Most reviewers also use the cheese slice test when trying to determine the DI quality, and any good reviews also use other interlaced material. HQV was just an example, vent your hate for that test somewhere else, jeez. :P

andyvt
14th June 2012, 21:42
HQV was just an example, vent your hate for that test somewhere else, jeez. :P

I don't hate it, I just don't think it's valid in the context that most people place it.

egur
14th June 2012, 22:03
You missed the whole point of my post.
I didn't say HQV is a good test, just that reviews use it and such catastrophic DI results would've been noticed. That's all.

I used test DVDs from several TV manufacturers as well as other video processors, unfortunately I can't share this material.

The problem is that HQV is used so much by the press that it forces video processors to optimize for it, at the expense of real world clips...

vivan
14th June 2012, 22:23
If you provide a flash sample (or link) that shows artifacts, that would be nice.
http://uppod.ru/vfvqawryip

As for chroma upsampling, are you sure this happens on SandyBridge/IvyBridge?
Yes (http://2.firepic.org/2/images/2012-06/14/0qa1al7hpd7b.png). You can check this video (http://amvnews.ru/index.php?lang=english&go=Files&in=view&id=1656) (you can switch lang to english), or youtube (http://www.youtube.com/watch?v=hPZjKjwuaAs)/vimeo (http://vimeo.com/15327875).

egur
14th June 2012, 22:48
http://uppod.ru/vfvqawryip

I see what you mean. How can I download the file?


Yes (http://2.firepic.org/2/images/2012-06/14/0qa1al7hpd7b.png). You can check this video (http://amvnews.ru/index.php?lang=english&go=Files&in=view&id=1656) (you can switch lang to english), or youtube (http://www.youtube.com/watch?v=hPZjKjwuaAs)/vimeo (http://vimeo.com/15327875).

Downloaded the clip, played (with EVR) at 100% size and did a print screen. Definitely not nearest neighbor (SandyBridge).
The capture image (your link) does show NN interpolation, how was it made? Which GPU?
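
For readers unfamiliar with the term, here is a minimal sketch of the difference between nearest-neighbor (NN) and linear chroma upsampling on a single row of samples. It is a simplified illustration only: it ignores chroma siting and vertical upsampling, and it is not how EVR or the Intel driver is implemented.

```python
def upsample_nearest(chroma_row):
    """Double the width by repeating every sample (blocky colour edges)."""
    out = []
    for c in chroma_row:
        out.extend([c, c])
    return out


def upsample_linear(chroma_row):
    """Double the width; each inserted sample is the average of its neighbours."""
    out = []
    last = len(chroma_row) - 1
    for i, c in enumerate(chroma_row):
        nxt = chroma_row[min(i + 1, last)]  # clamp at the right edge
        out.extend([c, (c + nxt) // 2])
    return out


row = [16, 240, 16, 240]          # alternating chroma values, worst case
print(upsample_nearest(row))      # hard steps between samples
print(upsample_linear(row))       # intermediate values soften the steps
```

With NN the colour edges stay hard and blocky, which is most visible on saturated edges at 100% zoom, matching the kind of screenshot discussed here.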

vivan
15th June 2012, 00:24
http://amvnews.ru/index.php?go=Files&file=down&id=2656

EVR-CP in MPC-HC. GPU is Intel® HD Graphics 3000 integrated into i5-2410M. With discrete GPU (nVidia GT540M) I don't have such issue (however it's impossible to use QS Decoder with rendering on dGPU =/).
Also happens in flash, I'll post screenshots tomorrow (it's 3 am here :D).

UPD:
http://2.firepic.org/2/images/2012-06/14/3vp6bof90eyr.png
http://2.firepic.org/2/images/2012-06/14/7nrqt4bwpga3.png
http://2.firepic.org/2/images/2012-06/14/8q3xpyohj5dl.png
Latest stable MPC-HC (from sc) & intel HD drivers (2696)

Another video with artifacts: http://akross.ru/index.cgi?act=video;id=2810;l=e ("Preview (Low-Q)" is streaming through player).
Yes, it has 15 reference frames... But 270p resolution.

egur
15th June 2012, 11:37
http://amvnews.ru/index.php?go=Files&file=down&id=2656
Link doesn't work.
EVR-CP in MPC-HC. GPU is Intel® HD Graphics 3000 integrated into i5-2410M. With discrete GPU (nVidia GT540M) I don't have such issue (however it's impossible to use QS Decoder with rendering on dGPU =/).
Also happens in flash, I'll post screenshots tomorrow (it's 3 am here :D).

UPD:
http://2.firepic.org/2/images/2012-06/14/3vp6bof90eyr.png
http://2.firepic.org/2/images/2012-06/14/7nrqt4bwpga3.png
http://2.firepic.org/2/images/2012-06/14/8q3xpyohj5dl.png
Latest stable MPC-HC (from sc) & intel HD drivers (2696)

Strange. I have newer drivers, so maybe this was fixed, but I don't ever recall this being an issue.
What are the scaler settings in EVR-CP?

Another video with artifacts: http://akross.ru/index.cgi?act=video;id=2810;l=e ("Preview (Low-Q)" is streaming through player).
Yes, it has 15 reference frames... But 270p resolution.
This clip makes me dizzy...
Playing the file in ZoomPlayer looks fine (1st minute).
It seems to have cut scenes that are intentionally garbage (white noise). Tell me where there's a problem.

ipanema
15th June 2012, 12:46
The notes on the qsdecoder project page say "Started as an internal decoder within FFDShow. Can be easily ported to other DirectShow decoders."

Has anyone actually implemented a standalone DirectShow filter based on qsdecoder?

It might be something I'd look at, but I'm not familiar with ffdshow internals or how qsdecoder interfaces with ffdshow. Presumably I'd need to convert the ffdshow input and output mechanism to the DirectShow base classes' way of working. Is there any documentation that summarises how ffdshow passes samples/frames to (in this case) qsdecoder, and maybe how it handles IQualityControl::Notify quality control messages?

egur
15th June 2012, 12:52
UPD:
http://2.firepic.org/2/images/2012-06/14/3vp6bof90eyr.png
http://2.firepic.org/2/images/2012-06/14/7nrqt4bwpga3.png
http://2.firepic.org/2/images/2012-06/14/8q3xpyohj5dl.png
Latest stable MPC-HC (from sc) & intel HD drivers (2696)

Digging a little deeper revealed that ZoomPlayer has a small aspect ratio error resulting in the scaler being used.
When the scaler is used, the effect is significantly smaller. That's why I didn't pick this up yet. In full screen viewing this is hardly noticeable (not an excuse - don't get me wrong).
I know that the HW is designed to do something much better...

Anyway, I'll ask around.

egur
15th June 2012, 13:35
The notes on the qsdecoder project page say "Started as an internal decoder within FFDShow. Can be easily ported to other DirectShow decoders."

Has anyone actually implemented a standalone DirectShow filter based on qsdecoder?

It might be something I'd look at, but I'm not familiar with ffdshow internals or how qsdecoder interfaces with ffdshow. Presumably I'd need to convert the ffdshow input and output mechanism to the DirectShow base classes' way of working. Is there any documentation that summarises how ffdshow passes samples/frames to (in this case) qsdecoder, and maybe how it handles IQualityControl::Notify quality control messages?

Yes, LAV video decoder integrated QS decoder a while back. nevcairiel (LAV filters author) reported a 2 day effort.
Potplayer also integrated it, without any support from me.
If you're looking at FFDshow's code, find all references to the TvideoCodecQuickSync class (under codecs). That's the proxy class for the QS decoder. It's very simple.
If you want to go further, drop me a PM.

egur
15th June 2012, 15:26
Version 0.36 beta is out with the following changes:
* Bugfixes - DI was stalling or crashing.
* FFDShow rev4464

Downloads
* For the latest cutting edge FFDShow builds, download my builds from the Intel QuickSync Decoder SourceForge home page (http://sourceforge.net/projects/qsdecoder/)
* FFDShow-tryout site (http://ffdshow-tryout.sourceforge.net/download.php)
* LAV Splitter builds (http://forum.doom9.org/showthread.php?t=156191)

vivan
15th June 2012, 15:35
Link doesn't work.
Strange, it should redirect to http://46.4.15.147:8080/Video/Full003/02656.Cenit-Jagdmeister.HD.60fps.amvnews.ru.mp4 (link, that I used in that player).
Or http://amvnews.ru/index.php?go=Files&in=view&id=2656 - link under the player (that is 147.51 Mb 1280x720@59.94fps)

Strange. I have newer drivers, so maybe this was fixed, but I don't ever recall this being an issue.
I have had this problem from the beginning (~1 year).
What are the scaler settings in EVR-CP?
Bicubic A=-1.00 (PS 2.0)

This clip makes me dizzy...
Playing the file in ZoomPlayer looks fine (1st minute).
It seems to have cut scenes that are intentionally garbage (white noise). Tell me where there's a problem.
That's how it looks in web player: http://dl.dropbox.com/u/16254258/Desktop%2015-06-2012%2018-16-16.mp4
Same in MPC-HC with LAV DXVA-copyback decoder (or Native DXVA + EVR-CP).

egur
15th June 2012, 15:57
@vivan
I also noticed the bad chroma upsampling - look up a few posts.
Downloaded the file. I'll take a look.
It plays fine offline (ffdshow-quicksync), online via flash player is pretty bad like you said.
The previous sample behaves the same, offline is great, online is bad.

I'll see if I can find out who supports Adobe.

vivan
15th June 2012, 16:27
With your decoder anything is fine ;)
Issue is with applications that use DXVA (like flash player).

When playing offline you can use any player you want with any renderer, so I use QS decoder + madVR - and it's perfect. But when it comes to online - you don't have any choice... So issues appear with chroma upsampling in EVR (on most videos it's hardly visible, due to scaling and poor quality) and with heavy video in DXVA (though that affects only a couple of sites that allow streaming of uploaded video without any reencoding - even 1080p@60fps and Hi10p...).

egur
15th June 2012, 21:58
With your decoder anything is fine ;)
Issue is with applications that use DXVA (like flash player).

When playing offline you can use any player you want with any renderer, so I use QS decoder + madVR - and it's perfect. But when it comes to online - you don't have any choice... So issues appear with chroma upsampling in EVR (on most videos it's hardly visible, due to scaling and poor quality) and with heavy video in DXVA (though that affects only a couple of sites that allow streaming of uploaded video without any reencoding - even 1080p@60fps and Hi10p...).

Good to know I have a happy customer :)
As for flash player - usually the best approach is to find a way to notify Adobe of this issue. It could either be their bug, or they can ask for support directly from Intel (in this example) and they'll get it.

It's a little like having car-engine troubles, you don't contact the engine manufacturer, you contact the car dealership. I know it's not the same, but gets the job done.

CruNcher
16th June 2012, 08:53
@Egur
Decodinerror.ts is fixed (the 1-frame corruption) in combination with QuickSync via current LAV Video with driver 2761, though it still fails with Microsoft's DTV decoder.

Ahh wait, I see LAV Video falls back to avcodec on that video hehe. I wonder what the reason for that decision is in the end (the detection reason for the fallback) ;)

I see ffdshow QuickSync now also falls back to libavcodec, so egur, what is the reason for that? What can't be fixed here on the QuickSync DSP (driver) side anymore (or are you still working on a Media SDK fix)? :)


Im now on a Dual Card config and can switch between Nvidia (640 GTX) and Intel (HD 2000) results, though i still do this manually by changing inputs but it works nicely without Virtu (just had to get used to the Black Screen on Boot @ first, when on IGD and the discrete card first being initialized on the NT 6 layer)

Whats very interesting is the Kernel Driver Latency difference between Intel and Nvidia i wonder how much the path plays a role in Nvidias driver and how much the manipulation layer in terms of the overall latency. I guess Intels Driver is also perfect to compare being so much younger (not so bloated yet) :)

Also i wonder what is gonna happen if i call the Nvidia Cuda layer + Renderer on the Discrete Card let the video playback and then switch (same for other applications using the DSP or GPU or both before the switch) ;)

In those regards im preparing a big compare of my multitasking test http://forum.doom9.org/showthread.php?t=164555 for Win 7 and Win 8 Intel and Nvidia WDDM 1.1 and 1.2 :) im thinking of adding Ray Tracing additionally as overhead (OpenCL,CUDA,Native CPU) which should really push down the response most probably WDDM 1.1 (getting to its scheduling boarders) will collapse here 1.2 should be able to cope with it (full supported Hardware even better) :)

The Winner of it is then gonna go vs http://haiku-os.org/ though with some different (more evened fair,non GPU accellerated) setup which im still working on http://blip.tv/linuxconfau/haiku-4747185 .

wanezhiling
16th June 2012, 11:24
NEW FEATURES IN THIS DRIVER VERSION 2761

SUPPORT ADDED FOR OPENGL VERSION 3.3 ON INTEL HD GRAPHICS 4000 / 2500
RECENT GAME RELEASES ENABLED IN DRIVER VERSION 2761



ISSUES FIXED IN DRIVER VERSION 2761

VIDEO PLAYBACK
• Sound can now be heard from the external monitor after selecting the monitor as the playback device in the Windows Sound control panel.
• Resolved issue where sound may not be heard after waking the computer from Sleep state.
• Resolved issue where flickering and noise may be seen in video transcoded to MP4 format.

DISPLAY
• When using three displays with the Intel HD Graphics 4000/2500, resolved an issue where sometimes the external monitor may not be detected after pressing the Ctrl+Alt+F11 keys several times to turn panel scaling on/off.
• When using three displays with the Intel HD Graphics 4000/2500, resolved issue where flickering and noise may be seen after closing and re-opening the laptop’s lid.
• When using three displays with the Intel HD Graphics 4000/2500, resolved issue where the refresh rates listed in Windows and in the Intel Graphics and Media Control Panel may not match.
• Resolved an issue in the Intel Graphics and Media Control Panel where the rotation setting’s picture did not change after selecting a different rotation setting.
• Resolved issue where the DisplayPort display cannot be used in Extended Desktop or Clone configuration after a fresh installation of Windows.
• Resolved issue where after connection to the computer, an analog monitor could not be used in Extended Desktop or Clone configuration.
• When using three displays with the Intel HD Graphics 4000/2500, resolved a flickering issue and STOP error with code 0x116 that may be seen after waking the computer from Sleep state.
• Resolved an issue where the laptop’s display may be pale after connecting the power cable and waking the laptop from Sleep state.

egur
16th June 2012, 13:21
I was asked to share evil trees (http://www.mediafire.com/?vap13dtvlcbb4nh), a clip originally shared by CruNcher (link's dead now).
This clip is an MPEG2 NTSC HDTV clip which alternates very violently between soft and hard telecine.
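
Some background on the terms, with a small sketch. With hard telecine the pulled-down fields are actually encoded in the stream as interlaced video; with soft telecine the stream stores the progressive film frames and uses repeat-field flags so that the decoder performs the pulldown. Either way, the field cadence that reaches the deinterlacer looks roughly like the pattern below. The function name is made up for this example, and field parity (top/bottom) is deliberately ignored.

```python
def pulldown_32(film_frames):
    """Expand film frames into interlaced video frames using a 3:2 cadence.

    Each film frame is shown for 3 fields, then the next for 2 fields, and so on.
    The fields are then paired up into video frames.
    """
    fields = []
    for i, frame in enumerate(film_frames):
        repeat = 3 if i % 2 == 0 else 2
        fields.extend([frame] * repeat)
    # Pair consecutive fields into video frames (parity ignored for simplicity).
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


# 4 film frames become 5 video frames; two of them mix fields of different film frames.
print(pulldown_32(["A", "B", "C", "D"]))
```

A clip that keeps switching between the two encodings forces the decoder and deinterlacer to re-detect this cadence again and again, which is presumably what makes it a good stress test.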

CruNcher, welcome back.
The decodingerror clip (Lady Gaga) indeed fails to pass the initial checks, I'm on it.
Update:
Good catch CruNcher :)
Seems to be a regression that occurred in driver 2752 and continues through 2778 (in MSDK DLL). It refuses to decode the H264 header. H264 header seems fine according to ffdshow's H264 SPS parser.
I'll report this bug.

cybersans
16th June 2012, 13:52
Some audio codecs (so far I have only found AC3) cause stuttering sound. I need to disable the AC3 decoder so that it plays as uncompressed PCM and the sound does not stutter.

guys.
Per my previous posts, I still experience stuttering with many avi/mp4/mkv videos which have AC3 audio, plus a buzzing sound when dragging the seek bar. I need to convert all AC3 audio to AAC or another format and remux the video to watch it without stuttering sound, or disable the AC3 decoder and play with uncompressed PCM input. Using either libavcodec or liba52 does not make any difference.

CruNcher
16th June 2012, 15:56
Current Intel Driver Win 7 Aero DPC Latency Result 2761 (had to optimize a bit on the Realtek Adapter Network config, ASUS Mainboard)
This is with a full On Demand Security Stack (including several Kernel security layer) ;):

http://www.ld-host.de/uploads/images/3eb00df37561d7ed406060644f77ee2a.jpg


Fully accelerated:

http://www.ld-host.de/uploads/images/3ca5b2cafae5027dd9fcb3bf8c3988e2.jpg

ipanema
16th June 2012, 16:06
Yes, LAV video decoder integrated QS decoder a while back. nevcairiel (LAV filters author) reported a 2 day effort.
Potplayer also integrated it, without any support from me.
If you're looking at FFDshow's code, find all references to the TvideoCodecQuickSync class (under codecs). That's the proxy class for the QS decoder. It's very simple.
If you want to go further, drop me a PM.

Thanks Eric. I'll take a look at the LAV project as well.

CharlieCL
16th June 2012, 19:12
I suppose 64-bit should speed things up a lot vs 32-bit for codecs because of the doubled bus width. However, there is little or no improvement. The launch time may be a little bit faster. Are the current 64-bit codecs optimized for 64-bit?

egur
16th June 2012, 19:38
I suppose 64-bit should speed things up a lot vs 32-bit for codecs because of the doubled bus width. However, there is little or no improvement. The launch time may be a little bit faster. Are the current 64-bit codecs optimized for 64-bit?

You do understand that QS is a HW solution...
The bulk of CPU cycles is spent copying the frames from GPU memory to system memory. I use SSE4.1 instructions for that and the extra 8 xmm registers do not contribute anything so it's the same code for 32/64. The rest of the code is usually ~1% of the CPU cycles I use.
Other parts may behave differently though (ffdshow, splitter, etc).

64 bit codec is only useful for Windows Media Center on Win7 64. Otherwise, there's no real reason to use it.
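
To put the GPU-to-system-memory copy into numbers, here is a back-of-the-envelope sketch. It assumes the decoded surfaces are NV12 (12 bits per pixel), which is an assumption of this example rather than something stated above, and it ignores surface pitch and padding. The function names are invented for illustration.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """NV12: full-resolution luma plane plus interleaved half-resolution chroma."""
    return width * height * 3 // 2


def copy_rate_mb_per_s(width: int, height: int, fps: float) -> float:
    """Bytes copied per second, expressed in MB/s (10**6 bytes)."""
    return nv12_frame_bytes(width, height) * fps / 1e6


print(nv12_frame_bytes(1920, 1080))        # bytes per 1080p frame
print(copy_rate_mb_per_s(1920, 1080, 60))  # MB/s at 60 frames per second
```

The copy is a plain bulk memory transfer, so it is bound by memory throughput rather than by the number of general purpose registers, which fits the point that a 64-bit build gains nothing here.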

CruNcher
16th June 2012, 20:11
Almost perfect. Only the Firefox windowed plugin and JavaScript threading and performance are still not where they could be: too many frame drops, way too many :(

http://www.ld-host.de/uploads/images/6ece3524b5a1f1567518c255bc5205ec.jpg


Though Intel Drivers handle it perfect :)



Nvidia 301.42 WHQL result:

http://www.ld-host.de/uploads/images/660c78aa5bfeb794712bcdbfbd93db90.jpg

http://www.ld-host.de/uploads/images/a58b463bd79e18f74dc86f108d907607.jpg


Not that bad :)

Only Firefox is still a big downer here in its overall DirectX performance and stability. People should be aware that all the crazy state changes when you use Firefox are due to its still not optimal D2D implementation and are not Nvidia's fault in general; the Firefox team is working heavily on improving it :(
With Chrome/Chromium you wouldn't have seen any Frame Drops (on both) :(


The Workflow i used above for doing the switching is a little awkward but it works nicely on NT 6 no interferences i made sure to disable the Secondized Card and it's driver stack after the switch to get as clean measurements as possible as well :)

CharlieCL
17th June 2012, 17:26
You do understand that QS is a HW solution...
The bulk of CPU cycles is spent copying the frames from GPU memory to system memory. I use SSE4.1 instructions for that and the extra 8 xmm registers do not contribute anything so it's the same code for 32/64. The rest of the code is usually ~1% of the CPU cycles I use.
Other parts may behave differently though (ffdshow, splitter, etc).

64 bit codec is only useful for Windows Media Center on Win7 64. Otherwise, there's no real reason to use it.

Interesting. When you feed motion picture data to the QS hardware accelerator, 64-bit should be twice as fast as 32-bit. Can SSE4 be used? I guess 4K video may show a big difference. In that case CPU usage may be 30%. It may be necessary to use a 4-channel Ivy Bridge-E to show the pros of 64-bit.

egur
17th June 2012, 18:29
Interesting. When you feed motion picture data to the QS hardware accelerator, 64-bit should be twice as fast as 32-bit. Can SSE4 be used? I guess 4K video may show a big difference. In that case CPU usage may be 30%. It may be necessary to use a 4-channel Ivy Bridge-E to show the pros of 64-bit.

QuickSync is an ASIC (Application Specific Integrated Circuit - fixed HW) as opposed to programmable HW (CPU or shader core).
An ASIC is much faster and uses much less power than programmable alternatives, but such a circuit can only perform a single function (QS is a collection of such circuits).
That circuit doesn't even belong to the CPU (it's part of the GPU); it doesn't know any x86/x64 instructions, so its performance has nothing to do with the CPU's mode of operation.
An ASIC is useful for very high performance at low power on a known algorithm that doesn't change (like H264 decoding). The downside is that it can't adapt to new algorithms - like H265.
That's how a modest GPU could give 5x the performance of a 200W high end GPU.
Tablets and smartphones have conceptually similar HW.

nevcairiel
17th June 2012, 18:58
FWIW, even in a software decoder, 64-bit doesn't mean double speed. Very, very few algorithms can take that much advantage of more and bigger registers, if there is even one. You can maybe get 20-30% performance increases on highly optimized algorithms (that's about how much faster x264 gets during encoding, iirc), but most of the time in decoding, it will be about the same speed because all the important code paths are already running in optimized SIMD code, which is the same in 32 or 64-bit.

Heck, if you think 64-bit is twice as fast as 32-bit, you don't understand the whole concept. :p
The main difference is that you get 64-bit registers and more registers in general. This can make your algorithm much faster, but that's assuming you actually deal with 64-bit numbers, and a lot of them at the same time.

egur
17th June 2012, 19:49
Like Nev said, x64 gives you 16 general purpose registers and 16 SSE/AVX registers. The compiler has more headroom to play with the registers and doesn't need to store them back to memory/stack so quickly.
Function calls are also faster since the first 4 arguments are passed via registers and not the stack (but still have space reserved on the stack).
Legacy floating point (x87) is out and SSE scalar floating point is used instead, which is much faster in most cases.
Large arguments are passed by reference.
Memory allocations are 16 byte aligned to help performance.
So all these enhancements can add up to better performance.

CruNcher
18th June 2012, 17:26
Yup, especially if you have massive amounts of data to crunch, no matter whether binary or ASCII, audio, pictures, video or databases. Though expecting 2x more performance only because of 32-bit vs 64-bit is funny, and I wonder where that comes from. I never heard any advertisement or marketing saying it brings double the performance when the move was initiated by Intel/AMD ;)

CharlieCL
19th June 2012, 04:00
Like Nev said, x64 gives you 16 general purpose registers and 16 SSE/AVX registers. The compiler has more headroom to play with the registers and doesn't need to store them back to memory/stack so quickly.
Function calls are also faster since the first 4 arguments are passed via registers and not the stack (but still have space reserved on the stack).
Legacy floating point (x87) is out and SSE scalar floating point is used instead, which is much faster in most cases.
Large arguments are passed by reference.
Memory allocations are 16 byte aligned to help performance.
So all these enhancements can add up to better performance.

One kind of program that may be faster in 64-bit is a conversion like YV12toYUY2 or YV12toRGB. Since we can get the Y, U, V values from 3 addresses in 64-bit (2x 32-bit), that may reduce a lot of cache misses.
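
For reference, a minimal sketch of what a YV12 to YUY2 conversion does (presumably what "YV12toYUV2" refers to). It is plain, unoptimized Python meant only to show the memory access pattern: every output pixel pair reads from three separate planes. Real converters use SIMD and usually interpolate chroma; this one simply reuses each chroma row for two luma rows. The function name is invented for the example.

```python
def yv12_to_yuy2(data: bytes, width: int, height: int) -> bytes:
    """Convert planar YV12 (Y plane, then V, then U; 4:2:0)
    to packed YUY2 (Y0 U Y1 V per pixel pair; 4:2:2)."""
    assert width % 2 == 0 and height % 2 == 0
    y_size = width * height
    c_width = width // 2
    c_size = c_width * (height // 2)

    y_plane = data[:y_size]
    v_plane = data[y_size:y_size + c_size]          # V comes first in YV12
    u_plane = data[y_size + c_size:y_size + 2 * c_size]

    out = bytearray()
    for row in range(height):
        c_row = (row // 2) * c_width                # one chroma row serves two luma rows
        for x in range(0, width, 2):
            y0 = y_plane[row * width + x]
            y1 = y_plane[row * width + x + 1]
            u = u_plane[c_row + x // 2]
            v = v_plane[c_row + x // 2]
            out += bytes((y0, u, y1, v))
    return bytes(out)


# 2x2 frame: four luma samples, one V sample, one U sample.
frame = bytes([1, 2, 3, 4, 200, 100])
print(list(yv12_to_yuy2(frame, 2, 2)))
```

The three planes are read sequentially, so the access pattern is already cache friendly; whether wider pointers help at all is not something this sketch can show.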

CharlieCL
19th June 2012, 04:13
FWIW, even in a software decoder, 64-bit doesn't mean double speed. Very, very few algorithms can take that much advantage of more and bigger registers, if there is even one. You can maybe get 20-30% performance increases on highly optimized algorithms (that's about how much faster x264 gets during encoding, iirc), but most of the time in decoding, it will be about the same speed because all the important code paths are already running in optimized SIMD code, which is the same in 32 or 64-bit.

Heck, if you think 64-bit is twice as fast as 32-bit, you don't understand the whole concept. :p
The main difference is that you get 64-bit registers and more registers in general. This can make your algorithm much faster, but that's assuming you actually deal with 64-bit numbers, and a lot of them at the same time.

In the case of a 12G blu-ray video, with 64-bit Windows, 16GB DRAM and a 128GB SSD, a 64-bit source/decoder using memory-mapped file access could be better (50%??). Many 32-bit algorithms are designed around small memory, so there may be limitations.
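
A small sketch of the memory-mapping idea, under stated assumptions: the helper names are invented, the temporary file is only a tiny stand-in for a large video file, and nothing here supports or refutes the "50%" guess. The one hard fact it illustrates is that a 12 GB file cannot be mapped in one piece into a 32-bit address space, while a 64-bit process can map it whole.

```python
import mmap
import os
import tempfile


def fits_in_address_space(file_bytes: int, pointer_bits: int) -> bool:
    """Upper bound only: real processes get less than the full address space."""
    return file_bytes < 2 ** pointer_bits


def read_window(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` through a mapping of the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]


print(fits_in_address_space(12 * 10**9, 32))  # 12 GB vs 32-bit pointers
print(fits_in_address_space(12 * 10**9, 64))  # 12 GB vs 64-bit pointers

# Tiny stand-in file for the demonstration.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789" * 100)
    path = tmp.name
print(read_window(path, 995, 5))
os.remove(path)
```

A 32-bit reader would have to map and unmap smaller windows of the file instead, which is extra bookkeeping but not necessarily a large speed difference.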