2nd July 2010, 12:34 | #1
Derek Prestegard IRL
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,989
Single-threaded H.264 decoding performance
[EDIT] This was all updated quite a bit. The interesting graph I came up with a few hours later has been added to this OP to grab eyes.
Consider my thoughts in this OP somewhat misguided... tl;dr: ffmpeg-mt is faster (for these sources, anyway) than CoreAVC or DivX H.264!

Hey folks,

So we all know that CoreAVC is a great place to start if you're looking for FAST H.264 decoding on Windows. That's all well and good. However, I recently realized I'd assumed it would be very fast in all scenarios. As my numbers below show, this is not the case!

All tests were performed on a 2.4 GHz Intel Q6600 quad-core CPU running Windows 7 x64. Benchmarks were made using avs2avi, with DSS2 providing DirectShow interaction. All sources had identical content and were 1080p24. The "fake CoreAVC single-threaded" figure is simply the CoreAVC result divided by four (the Q6600's core count).

Source 1 - ~50 Mbps x264 encode, "fastdecode" tune (no CABAC, B-frames, or deblocking)
CoreAVC - 103.7 fps
FFMS2 - 31.5 fps
Fake CoreAVC "single-threaded" ~ 25.93 fps

Source 2 - ~50 Mbps x264 encode, no CABAC or B-frames, but with deblocking
CoreAVC - 86.9 fps
FFMS2 - 27.6 fps
Fake CoreAVC "single-threaded" ~ 21.73 fps

Source 3 - ~50 Mbps x264 encode, with CABAC, B-frames, and deblocking
CoreAVC - 56.6 fps
FFMS2 - 13.7 fps
Fake CoreAVC "single-threaded" ~ 14.2 fps

My conclusion? In typical scenarios for this community, CoreAVC or similar high-performance multi-threaded decoders are the way to go. However, in MY CASE, things are a little different. I usually work with H.264 sources that have no B-frames and use CAVLC instead of CABAC. Both of these sacrifices are made to facilitate real-time capture, which is a huge ugly animal all on its own!

Given my typical sources, and the fact that I'm doing a LOT of transcoding at once (anywhere from 3-6 1080p encodes on one system, depending on how many cores I have), single-threaded FFMS2 seems like it will actually give me better throughput, provided it never bottlenecks x264! Sure, the numbers are always higher with CoreAVC, but they come with it eating up almost 100% of my CPU! That's fine for playback, but for high-volume transcoding I want the most efficient decoder for my type of files, correct?

I THINK I'm seeing this correctly. What do you guys think? The best possible solution is to do all the decoding on a couple of GPUs, but that's another story.

Derek
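The "fake single-threaded" CoreAVC numbers above are just the quad-core result split evenly across four threads. A minimal Python sketch of that arithmetic, assuming perfectly linear thread scaling (an assumption, not something CoreAVC guarantees); the dictionary and function names are illustrative only:

```python
"""Naive per-thread estimate from a multi-threaded decode benchmark."""

CORES = 4  # Intel Q6600

# avs2avi + DSS2 results from the first post, in fps.
COREAVC = {"Source 1": 103.7, "Source 2": 86.9, "Source 3": 56.6}
FFMS2_SINGLE = {"Source 1": 31.5, "Source 2": 27.6, "Source 3": 13.7}


def naive_per_thread(fps: float, threads: int) -> float:
    """Split multi-threaded fps evenly across threads (assumes linear scaling)."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return fps / threads


if __name__ == "__main__":
    for source, fps in COREAVC.items():
        estimate = naive_per_thread(fps, CORES)
        print(f"{source}: CoreAVC ~{estimate:.2f} fps/thread, "
              f"FFMS2 {FFMS2_SINGLE[source]:.1f} fps (1 thread)")
```

Note that on Source 3 the estimate (about 14.2 fps) edges out FFMS2's 13.7 fps, so the single-threaded advantage only shows up on Sources 1 and 2.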
__________________
These are all my personal statements, not those of my employer :)
Last edited by Blue_MiSfit; 2nd July 2010 at 17:53.
2nd July 2010, 12:59 | #2
Derek Prestegard IRL
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,989
Some more numbers, this time with timeCodec.exe. They're higher overall, but I assume that's because it avoids AviSynth etc. The advantage is that I can reliably restrict CPU affinity this way.

Also, since FFMS2 isn't available in this framework, I'm just using libavcodec from ffdshow-tryouts rev 3463 (May 29, 2010).

Source 1 (--tune fastdecode, zerolatency)
CoreAVC (4 threads) - 126.3 fps
libavcodec (1 thread) - 35.9 fps
CoreAVC (1 thread) - 34 fps

Source 2 (--no-cabac, --tune zerolatency)
CoreAVC (4 threads) - 101.4 fps
libavcodec (1 thread) - 30.4 fps
CoreAVC (1 thread) - 27.7 fps

Source 3 (all on)
CoreAVC (4 threads) - 62.5 fps
libavcodec (1 thread) - 17.6 fps
CoreAVC (1 thread) - 16.5 fps

Interesting. The trend is basically the same, though the differences here are less pronounced (single-threaded libavcodec leads single-threaded CoreAVC by under 10% in all three cases). Interestingly, good old MPEG-2 at 80 Mbps with a typical GOP structure decodes at ~44 fps single-threaded with FFMS2, which is almost 2x faster than DGDecode.

Derek
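To put a number on "less pronounced", here is a small Python sketch computing how much faster single-threaded libavcodec is than single-threaded CoreAVC in the timeCodec.exe runs. The names are illustrative; the percentages are simple ratios of the fps figures above, nothing more:

```python
"""Relative single-threaded speed: libavcodec vs CoreAVC (timeCodec.exe runs)."""

# (libavcodec fps, CoreAVC fps), both restricted to one thread.
SINGLE_THREAD = {
    "Source 1": (35.9, 34.0),
    "Source 2": (30.4, 27.7),
    "Source 3": (17.6, 16.5),
}


def percent_faster(a: float, b: float) -> float:
    """Return how much faster rate a is than rate b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


if __name__ == "__main__":
    for source, (lavc, coreavc) in SINGLE_THREAD.items():
        print(f"{source}: libavcodec +{percent_faster(lavc, coreavc):.1f}%")
```

This works out to roughly 5.6%, 9.7%, and 6.7% for Sources 1 to 3.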
2nd July 2010, 13:15 | #3
*****
Join Date: Feb 2005
Posts: 5,647
ffdshow can also decode multi-threaded if you select ffmpeg-mt as the H.264 decoder.
__________________
MPC-HC 2.2.1
2nd July 2010, 13:58 | #5
Derek Prestegard IRL
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,989
Here are more results, this time using ffmpeg-mt (with the CoreAVC results included for reference):

Source 1 (--tune fastdecode, zerolatency)
ffmpeg-mt (1 thread) - 35.5 fps
ffmpeg-mt (4 threads) - 130.4 fps
CoreAVC (1 thread) - 34 fps
CoreAVC (4 threads) - 126.3 fps

Source 2 (--no-cabac, --tune zerolatency)
ffmpeg-mt (1 thread) - 28 fps
ffmpeg-mt (4 threads) - 110.4 fps
CoreAVC (1 thread) - 27.7 fps
CoreAVC (4 threads) - 101.4 fps

Source 3 (all on)
ffmpeg-mt (1 thread) - 17.3 fps
ffmpeg-mt (4 threads) - 66.2 fps
CoreAVC (1 thread) - 16.5 fps
CoreAVC (4 threads) - 62.5 fps

WOW. I was NOT expecting that! ffmpeg-mt is faster than CoreAVC in all cases for my usual files! It looks like the ffmpeg devs have been working hard. I'll make a pretty little graph to illustrate, and maybe put in DivX H.264 as well. I don't have DiAVC and don't feel like spending $10 right now...

Derek
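With both 1-thread and 4-thread numbers in hand, thread-scaling efficiency can be computed directly. A Python sketch (names are illustrative; efficiency here just means speedup divided by thread count). Both decoders land at roughly 92 to 99 percent of linear scaling, which is also why the first post's divide-by-four estimate undercounts: 126.3 / 4 is about 31.6 fps, while CoreAVC actually measured 34 fps on one thread.

```python
"""Thread-scaling efficiency of ffmpeg-mt and CoreAVC on the Q6600."""

THREADS = 4

# (1-thread fps, 4-thread fps) from the timeCodec.exe runs.
RUNS = {
    ("ffmpeg-mt", "Source 1"): (35.5, 130.4),
    ("ffmpeg-mt", "Source 2"): (28.0, 110.4),
    ("ffmpeg-mt", "Source 3"): (17.3, 66.2),
    ("CoreAVC", "Source 1"): (34.0, 126.3),
    ("CoreAVC", "Source 2"): (27.7, 101.4),
    ("CoreAVC", "Source 3"): (16.5, 62.5),
}


def scaling(single_fps: float, multi_fps: float, threads: int) -> tuple[float, float]:
    """Return (speedup, parallel efficiency) of a multi-threaded run over one thread."""
    speedup = multi_fps / single_fps
    return speedup, speedup / threads


if __name__ == "__main__":
    for (decoder, source), (one, many) in RUNS.items():
        speedup, efficiency = scaling(one, many, THREADS)
        print(f"{decoder:9s} {source}: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```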
Last edited by Blue_MiSfit; 2nd July 2010 at 14:18.
3rd July 2010, 03:12 | #15
Derek Prestegard IRL
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,989
Fair enough.
Still, I'm trying to determine the optimal way to decode high-bitrate H.264 for high-volume transcoding. In my mind, this means the decoder that gives the best per-thread performance. Is that logical? This seems like a valid test, no? As I said, MPEG-2 decodes very quickly as well.
Derek
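One way to make "best per-thread performance" concrete is the CPU cost of each decoded frame, in core-milliseconds. A rough Python sketch using the Source 3 timeCodec.exe numbers, assuming every decoder thread stays fully busy for the whole run (if threads sit idle, this overstates the multi-threaded cost) and ignoring x264's own CPU load and memory-bandwidth contention. The function and table names are hypothetical:

```python
"""CPU cost per decoded frame, for comparing decoders under heavy parallel load."""


def core_ms_per_frame(fps: float, threads: int) -> float:
    """Core-milliseconds of CPU time per frame, assuming all threads are fully busy."""
    return threads * 1000.0 / fps


# Source 3 (CABAC, B-frames, deblocking), timeCodec.exe results: (fps, threads).
CANDIDATES = {
    "ffmpeg-mt, 1 thread": (17.3, 1),
    "ffmpeg-mt, 4 threads": (66.2, 4),
    "CoreAVC, 1 thread": (16.5, 1),
    "CoreAVC, 4 threads": (62.5, 4),
}

if __name__ == "__main__":
    ranked = sorted(CANDIDATES.items(), key=lambda kv: core_ms_per_frame(*kv[1]))
    for name, (fps, threads) in ranked:
        print(f"{name}: {core_ms_per_frame(fps, threads):.1f} core-ms/frame")
```

Under these assumptions the lowest number wins: single-threaded ffmpeg-mt spends about 57.8 core-ms per frame versus 64.0 for 4-thread CoreAVC, which supports the per-thread reasoning for many concurrent jobs.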
Last edited by Blue_MiSfit; 3rd July 2010 at 03:16.
3rd July 2010, 11:12 | #16
4:2:0 hater
Join Date: Apr 2008
Posts: 1,302
If you feel like it, it would be really nice to have a comparison chart with more samples and a complete description of the settings used. That way we'd see where each decoder shines or fails.
3rd July 2010, 18:27 | #18
Derek Prestegard IRL
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,989
Well, for these (unusual) test cases, anyway. I haven't tested standard Blu-ray streams or typical 4-15 Mbps backup-type streams.
Derek
4th July 2010, 11:16 | #19
Registered User
Join Date: Dec 2003
Location: Denmark
Posts: 122
@Dark Shikari: You wrote on your blog about a new deblocking optimization you made to x264. Could the same optimization, made for an encoder's deblocking filter, be applied to a decoder's deblocking filter, like the one in FFmpeg?
Has it already been applied? Is that why the new FFmpeg 0.6 release decodes H.264 faster?
4th July 2010, 19:53 | #20
x264 developer
Join Date: Sep 2005
Posts: 8,666
Quote:
I haven't done it because the second stage of the process ("fix things up if the neighbors are different for deblocking than for normal decoding") is rather hard when you have to consider all possible options (PAFF, MBAFF, etc.).