8th January 2012, 01:37 | #441 | Link | |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
|
Quote:
Anyhow, you're saying that QuickSync in ffdshow works on your Clarkdale, but with LAV it doesn't? Did you check the CPU usage to confirm that it's really using hardware decoding?
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
|
8th January 2012, 01:41 | #442 | Link | |
Registered User
Join Date: Mar 2008
Posts: 2,021
|
Quote:
EDIT: Here, they are: _ _ _ _ Last edited by rica; 8th January 2012 at 02:06. |
|
8th January 2012, 01:52 | #443 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
|
I can probably patch up a debug version that shows why QS fails, maybe it sheds some light on things.
Edit: http://files.1f0.de/lavf/LAVVideo-0.44-debug.zip Throw that on top of 0.44, and a log file should appear on your desktop. Maybe there is something interesting in there.... Paste the log on pastebin or something, don't want to wait for attachment approval.
Last edited by nevcairiel; 8th January 2012 at 02:02. |
8th January 2012, 02:11 | #444 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
|
That's DXVA, not QuickSync.
In any case, a log file would be useful.
8th January 2012, 04:12 | #446 | Link | |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,926
|
Quote:
The biggest issue is still subtitling, though, which at least in MPC-HC still depends on EVR Custom. So you always have one issue: either no subtitling or all the deinterlacing pain. Only a few renderers can do everything (DXVA + deinterlacing + subtitling), and all of them are custom DirectX renderers. MadVR could be another one once it supports DXVA + custom shader code.
Though I wouldn't agree with this: "The power/performance of the GPU is best utilized by the EVR." These days it can only be fully utilized by a custom renderer that uses the same backend as a game engine does.
__________________
all my compares are riddles so please try to decipher them yourselves :) It is about Time Join the Revolution NOW before it is to Late ! http://forum.doom9.org/showthread.php?t=168004 Last edited by CruNcher; 8th January 2012 at 06:32. |
|
8th January 2012, 08:47 | #447 | Link | ||
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
Quote:
I mean that multi-threaded code should:
1) Increase the throughput required by 60fps clips
2) Feed the QS decoding engine better
3) Push QS to maximum speed (frequency)
After all this, the decoding performance of 60fps clips should definitely increase.
About power consumption: the CPU frequency will drop from the Turbo Mode used by the single-threaded code during playback of 60fps clips, and power consumption will probably go down too. During benchmarking, or during playback of future difficult clips at 120fps, the power consumption will increase again, I think.
Looking forward to testing your next optimized multi-threaded versions in real tests.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
||
8th January 2012, 09:02 | #448 | Link | |
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
|
Quote:
1) Reduce decode thread latency - the decode thread will do just the HW decode and deliver the decoded images down the pipeline. A worker thread will do the rest - the most time-consuming tasks are the frame copy and locking the D3D9 surfaces. This allows more CPU work (video processing) to be performed after decode.
2) Increase performance by adding parallelism - since the HW decode and the frame copying work in parallel, the HW decoder is better utilized, allowing more FPS.
The MT work is not done. I believe I can achieve better performance than v0.22. v0.22 is much more stable than 0.21.
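The split between a decode thread and a worker thread can be sketched roughly as below. This is only an illustration of the hand-off pattern, not the decoder's actual code: `Frame`, `FrameQueue` and `run_pipeline` are hypothetical names, the "decode" step is simulated by filling a buffer, and no Media SDK or D3D9 calls are involved.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoded surface; not a Media SDK or D3D9 type.
struct Frame {
    int index;
    std::vector<std::uint8_t> pixels;
};

// Minimal blocking queue that hands frames from the decode thread to the worker.
class FrameQueue {
public:
    void push(Frame f) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(f));
        }
        cv_.notify_one();
    }
    // Signals that no more frames will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until a frame is available; returns false once closed and drained.
    bool pop(Frame& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Frame> q_;
    bool closed_ = false;
};

// Simulated decode loop on the calling thread, frame copy on a worker thread.
// Returns the copied frames in delivery order.
std::vector<Frame> run_pipeline(int frame_count, std::size_t frame_bytes) {
    assert(frame_count >= 0);
    FrameQueue queue;
    std::vector<Frame> delivered;

    // Worker thread: the expensive part (copying each frame into a new buffer).
    std::thread worker([&] {
        Frame f{};
        while (queue.pop(f)) {
            Frame copy{f.index, std::vector<std::uint8_t>(f.pixels.size())};
            std::copy(f.pixels.begin(), f.pixels.end(), copy.pixels.begin());
            delivered.push_back(std::move(copy));
        }
    });

    // "Decode" thread (here: the caller): produce a frame and hand it off at once,
    // so it never waits for the copy of the previous frame to finish.
    for (int i = 0; i < frame_count; ++i) {
        queue.push(Frame{i, std::vector<std::uint8_t>(frame_bytes,
                                                      static_cast<std::uint8_t>(i))});
    }
    queue.close();
    worker.join();
    return delivered;
}
```

Calling `run_pipeline(8, 1024)` returns eight copied frames in the order they were "decoded". A real decoder would presumably bound the queue depth so the decode side cannot run arbitrarily far ahead of the copy.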
__________________
Eric Gur, Processor Application Engineer for Overclocking and CPU technologies Intel QuickSync Decoder author Intel Corp. |
|
8th January 2012, 09:24 | #449 | Link |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,926
|
@ Eric
You always post that sf.net URL with a " at the end.
BTW: Nev does the QFHD fallback to LAV (CPU). It's better not to do that for ffdshow-quicksync, though, also in terms of having a comparison point, as Nev has no option in LAV Video to disable this restriction.
Last edited by CruNcher; 8th January 2012 at 09:55. |
8th January 2012, 09:35 | #450 | Link | |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
Could you give percentages for each component using MT code?
|
8th January 2012, 10:42 | #451 | Link | |
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
|
Quote:
The QS decoder calls API functions of the Intel Media SDK to utilize the MFX engine. The Media SDK abstracts the communication with the HW much better than DXVA. The GPU cores (EUs) may be involved in some internal operations (I don't know too much about this), but the bulk of the work is done by the MFX engine. GPU parallelism as well as EU usage is abstracted by the MSDK and may change from generation to generation (or even between driver versions).
|
8th January 2012, 10:48 | #452 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
|
Even the most difficult clips you could find decode at 120fps, and typical Blu-rays decode at 300+ fps; I think 60fps clips are fine.
You over-estimate what multi-threading means. During normal playback there will be nearly zero difference; you only see it when benchmarking, so it's really not all that great.
8th January 2012, 11:04 | #453 | Link |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,926
|
when you use quicksync decoding for encoding it can be
Eric, do you know why the new driver has been removed? http://webcache.googleusercontent.co...wnldID%3D20676 I had no issues with it.
Last edited by CruNcher; 8th January 2012 at 11:14. |
8th January 2012, 11:18 | #454 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
@Egur
Thanks for the info. The misunderstanding arose from the following. You used the term "HW decode thread", which usually means decoding that does not happen on the CPU IA cores. CPU decoding is usually referred to as software decoding. So by using the term "HW decode thread" you actually mean the work that has to be done on the CPU to prepare the data for HW decoding in the MFX engine. No actual decoding happens on the CPU.
@Nevcairiel and Egur
I expect MT code to drop the CPU frequency during playback of 60fps clips from Turbo Mode to a much lower frequency, by spreading the load across more cores. Have you seen for yourselves that even in normal playback of 60fps clips the CPU goes into Turbo Mode, increasing power consumption? Not in pure DXVA mode and not with other clips below 60fps. Only with the FFDShow QS decoder and only at 60fps and above.
Last edited by NikosD; 8th January 2012 at 11:24. |
8th January 2012, 11:23 | #455 | Link |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,926
|
NikosD, don't you understand that the frame copy GPU->CPU is putting pressure on the CPU?
Last edited by CruNcher; 8th January 2012 at 11:26. |
8th January 2012, 11:27 | #456 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Read my previous post again please.
8th January 2012, 11:39 | #457 | Link |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,926
|
Over here it's not always at full frequency (on the balanced power profile); it fluctuates as expected due to the frame copy. Playback is fine, though: with 4 girls and 5 birds it is absolutely smooth, 2K is the same, and low-bitrate 4K as well. Problems get heavy with high-bitrate QFHD.
And yeah, MT might be able to distribute the load more evenly so that the frequency fluctuates less. Though also keep in mind that frequency isn't really costing that much power at all; the voltage increase is the main factor here. You will not get DXVA power consumption from ffdshow-quicksync; it will always be lower with DXVA, solely due to the frame copy.
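For reference, the voltage point matches the usual first-order model of CMOS dynamic power. This is the textbook approximation, not a measurement from this thread, and it ignores leakage:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f
```

Here \alpha is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. Power grows linearly with f but quadratically with V, so a frequency step that also needs a higher voltage costs more than the frequency increase alone would suggest.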
Last edited by CruNcher; 8th January 2012 at 12:01. |
8th January 2012, 11:52 | #458 | Link | |
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
|
Quote:
Regarding Turbo, in my tests Turbo wasn't active for the entire duration of playback. If there's a lot of compute/CPU work to be done, the most efficient way to do it is in bursts and not by spreading the workload across time. Idle time after a burst allows power management to kick in.
If you're worried that the GPU will lose its power budget to the CPU and thus work at a lower frequency, you may or may not be right. The algorithm for deciding this is not exposed to the public.
Anyway, this is only the start, not the end, of MT.
@CruNcher: 10x for correcting the typo in the web site link.
|
8th January 2012, 12:07 | #459 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
@Egur
Is it possible for the frame copy to be executed in parallel on more than one core? Or is it a strictly serial process? Using more cores for the frame copy alone could help us compare the behavior of the whole CPU package in different situations during playback with what we have now with the serial frame copy.
8th January 2012, 12:51 | #460 | Link | |
QuickSync Decoder author
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
|
Quote:
In the Core2Duo days, I did a few benchmarks as I wrote an application that did a lot of memory copying. My results were:
* Writing memcpy using SSE2 was 2x faster than the standard library version (VS2005).
* Using 2 threads gave an almost 2x performance boost. Using more than 2 didn't change anything.
The benchmarks were for large buffers (usually > 1M). So my copy function was ~4x faster than the standard memcpy.
Today, using an SSE2 copy doesn't change all that much; either VS2010 has a better memcpy or the CPU uArch implements the simpler memcpy better.
Regarding threads, I need to test this. I can assume that using 2 threads will help. This is next on my list. I'll make a programmable solution that allows scaling beyond 2 threads. I'll post the results in this thread.
BTW, parallelizing memcpy is super trivial. Probably the easiest task to make parallel.
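A minimal sketch of what a thread-count-parameterized copy could look like, assuming plain `std::thread` and `std::memcpy`. `parallel_memcpy` is a hypothetical name, not the function used in the decoder, and no SSE2 intrinsics are used here. The buffer is split into contiguous, non-overlapping slices, one per thread:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Copies `size` bytes from `src` to `dst` using `num_threads` threads.
// Assumes both pointers are valid and the two buffers do not overlap.
void parallel_memcpy(void* dst, const void* src, std::size_t size,
                     unsigned num_threads) {
    assert(num_threads > 0);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);

    // Every helper thread gets one slice of this many bytes.
    const std::size_t chunk = size / num_threads;
    std::vector<std::thread> helpers;
    helpers.reserve(num_threads - 1);

    // Threads 0..n-2 each copy one full slice.
    for (unsigned t = 0; t + 1 < num_threads; ++t) {
        const std::size_t offset = t * chunk;
        helpers.emplace_back([=] { std::memcpy(d + offset, s + offset, chunk); });
    }

    // The calling thread copies the last slice plus any remainder bytes.
    const std::size_t last = (num_threads - 1) * chunk;
    std::memcpy(d + last, s + last, size - last);

    for (auto& h : helpers) h.join();
}
```

Usage would be `parallel_memcpy(dst, src, bytes, 2)` for the two-thread case described above. Spawning threads on every call is costly, so for per-frame copies a persistent worker pool would likely be needed before any speedup shows up.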
|
Tags |
ffdshow, h264, intel, mpeg2, quicksync, vc1, zoom player |