karl_lillevold
15th February 2005, 23:19
I posted these RealVideo decoder performance numbers on AMD64 to the helix-producer-dev mailing list, but the topic may be of interest to the many AMD owners in this forum as well. The main problem with the 64-bit compiler from Microsoft is that it does not support inline assembly code, so it is not possible to compile any of the RealVideo MMX/SSE/SSE2 optimized code, even though the AMD64 processor supports these instruction sets.
===
There have been a few questions about releasing a 64-bit Producer for performance reasons. We had assumed, but not actually measured, that without a 64-bit compiler that supports inline assembly code, a native 64-bit compile would give no performance increase, but rather a loss in performance relative to the 32-bit assembly-optimized code.
The encoder gains from MMX/SSE/SSE2 optimizations are many times those of the decoder, so it is enough to verify this assumption for the decoder.
After a couple of fixes required for the decoder to run properly, here are some interesting numbers: decode times per frame, as measured on an AMD64 3400+ (2.2 GHz) with the latest beta of Windows x64:
c32 = ANSI C compiled as 32-bit code
c64 = ANSI C compiled as 64-bit code
a32 = C++/MMX/SSE/SSE2 optimized code compiled as 32-bit code
a64 is not possible, since the MS compiler for x64 does not support inline assembly.
High-complexity scene, 300 frames, 704x576 ("crew")
Very high bitrate 7 Mbps:
c32: 26.7 ms
c64: 23.0 ms (13.9% [c32])
a32: 19.5 ms (26.9% [c32] | 15.2% [c64])
Low bitrate 1.7 Mbps:
c32: 16.6 ms
c64: 15.5 ms (6.6% [c32])
a32: 11.1 ms (33.3% [c32] | 28.4% [c64])
Each % speedup is the reduction in decode time relative to the build named in [brackets].
Speedup from native 64-bit vs 32-bit: 6.6 - 13.9% (C vs C)
The ~30% speedup from MMX/SSE/SSE2 in the decoder matches previous measurements (a32 vs c32 / 32bit vs 32bit).
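For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the percentages from the per-frame times listed above. The function and variable names are my own, and it assumes "speedup" means the reduction in decode time relative to the bracketed build.

```python
# Per-frame decode times in milliseconds, copied from the figures above.
TIMES_MS = {
    "7 Mbps": {"c32": 26.7, "c64": 23.0, "a32": 19.5},
    "1.7 Mbps": {"c32": 16.6, "c64": 15.5, "a32": 11.1},
}


def reduction_pct(baseline_ms, measured_ms):
    """Percent reduction in decode time relative to the baseline build."""
    return 100.0 * (baseline_ms - measured_ms) / baseline_ms


def summarize(times):
    """Return {(build, baseline): percent reduction} for one clip."""
    return {
        ("c64", "c32"): reduction_pct(times["c32"], times["c64"]),
        ("a32", "c32"): reduction_pct(times["c32"], times["a32"]),
        ("a32", "c64"): reduction_pct(times["c64"], times["a32"]),
    }


if __name__ == "__main__":
    for label, times in TIMES_MS.items():
        print(label)
        for (build, base), pct in summarize(times).items():
            print(f"  {build} vs {base}: {pct:.1f}%")
```

Recomputed from the rounded times, a32 vs c32 comes out at 27.0% and 33.1% rather than the 26.9% and 33.3% listed; the listed figures were presumably derived from unrounded timings. The other four values match.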
Since the MMX/SSE/SSE2 optimized encoder (in Producer) runs about 3-5 times faster than the C version of the encoder, there is no reason to consider a 64-bit release for performance reasons.
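To put a rough number on that conclusion, here is a back-of-the-envelope sketch. It assumes (and this is only an assumption, nothing was measured for the encoder) that a plain-C 64-bit encoder would gain the same 6.6 - 13.9% over plain-C 32-bit that the decoder did. The function name is hypothetical.

```python
def c64_vs_a32_slowdown(asm_speedup, c64_time_reduction):
    """Estimate how many times slower a plain-C 64-bit build would be
    than the 32-bit assembly-optimized build.

    asm_speedup:        a32 is this many times faster than c32 (3 to 5 for the encoder)
    c64_time_reduction: fraction of c32 time saved by c64 (ASSUMED equal to the
                        decoder figures, 0.066 to 0.139)
    """
    # With t_c32 as the reference time:
    #   t_c64 = t_c32 * (1 - c64_time_reduction)
    #   t_a32 = t_c32 / asm_speedup
    # so t_c64 / t_a32 = asm_speedup * (1 - c64_time_reduction).
    return asm_speedup * (1.0 - c64_time_reduction)


if __name__ == "__main__":
    best_case = c64_vs_a32_slowdown(3.0, 0.139)   # smallest asm gain, largest 64-bit gain
    worst_case = c64_vs_a32_slowdown(5.0, 0.066)  # largest asm gain, smallest 64-bit gain
    print(f"c64 encoder estimated {best_case:.1f}x to {worst_case:.1f}x slower than a32")
```

Under those assumptions, a native 64-bit C encoder would still take roughly 2.6 to 4.7 times as long as the current 32-bit optimized one.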
(Note that Windows x64 does not allow 32-bit applications to load 64-bit DLLs, or vice versa, so a codec-only release for 64-bit is also not possible.)
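If you want to see which side of that boundary a given DLL is on, the target architecture is recorded in the Machine field of its PE header. The sketch below reads that field and compares it with the bitness of the host process. The helper names are my own, and `fake_pe` builds a synthetic header purely for illustration; it is not a loadable DLL.

```python
import struct

# Machine field values from the PE/COFF file header.
IMAGE_FILE_MACHINE_I386 = 0x014C   # 32-bit x86
IMAGE_FILE_MACHINE_AMD64 = 0x8664  # x64 (AMD64)


def pe_bits(image: bytes) -> int:
    """Return 32 or 64 for an x86/x64 PE image, given its leading bytes."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # The 4-byte field at offset 0x3C holds the offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The 2-byte Machine field immediately follows the signature.
    (machine,) = struct.unpack_from("<H", image, pe_offset + 4)
    if machine == IMAGE_FILE_MACHINE_I386:
        return 32
    if machine == IMAGE_FILE_MACHINE_AMD64:
        return 64
    raise ValueError(f"unhandled machine type 0x{machine:04x}")


def process_bits() -> int:
    """Pointer width of the running process, in bits."""
    return struct.calcsize("P") * 8


def can_load(dll_image: bytes, host_bits: int) -> bool:
    """A process can only load DLLs built for its own bitness."""
    return pe_bits(dll_image) == host_bits


def fake_pe(machine: int) -> bytes:
    """Hypothetical helper: a minimal synthetic PE header for demonstration."""
    header = bytearray(0x40)
    header[:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)  # PE signature follows at 0x40
    return bytes(header) + b"PE\0\0" + struct.pack("<H", machine)


if __name__ == "__main__":
    dll64 = fake_pe(IMAGE_FILE_MACHINE_AMD64)
    print("64-bit DLL in 32-bit app:", can_load(dll64, 32))
    print("64-bit DLL in 64-bit app:", can_load(dll64, 64))
    print("this process is", process_bits(), "bit")
```

For a real file, read the first few kilobytes in binary mode and pass them to `pe_bits`.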
===