15th May 2008, 06:57   #17
Dark Shikari
x264 developer
 
 
Join Date: Sep 2005
Posts: 8,666
Hmm, it appears there's still a lot of room for improvement here. As far as I can tell, DivX's decoder contains no SSE3, no SSSE3, and no SSE4.

SSSE3 is useful for palignr (luma MC) and pmaddubsw (chroma MC), for example.
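To make the pmaddubsw point concrete, here is a rough scalar model in C of what one 16-bit lane of that instruction computes, and how a horizontal bilinear chroma filter maps onto it. This is only a sketch: the names maddubs_lane and chroma_mc_h are made up for illustration, the 1/8-pel weights (8 - dx, dx) with rounding by 4 are an assumed filter shape, and real code would use the instruction itself on 8 pixel pairs at once rather than a C loop.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of ONE 16-bit lane of pmaddubsw (hypothetical helper name):
 * two unsigned bytes are multiplied by two signed bytes, the products are
 * added, and the sum is saturated to a signed 16-bit value. */
static int16_t maddubs_lane(uint8_t u0, int8_t s0, uint8_t u1, int8_t s1)
{
    int32_t sum = (int32_t)u0 * s0 + (int32_t)u1 * s1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Horizontal pass of an assumed 1/8-pel bilinear chroma filter:
 *   dst[x] = ((8 - dx) * src[x] + dx * src[x + 1] + 4) >> 3
 * Each output pixel is exactly one pmaddubsw lane: the pixel pair goes in
 * as the unsigned bytes, the weight pair as the signed bytes.
 * Reads src[0..width], so the caller must provide width + 1 pixels. */
static void chroma_mc_h(uint8_t *dst, const uint8_t *src, int width, int dx)
{
    assert(dx >= 0 && dx <= 8);
    for (int x = 0; x < width; x++) {
        int16_t acc = maddubs_lane(src[x], (int8_t)(8 - dx),
                                   src[x + 1], (int8_t)dx);
        dst[x] = (uint8_t)((acc + 4) >> 3);
    }
}
```

For instance, filtering the nine pixels 0, 8, 16, ..., 64 with dx = 4 gives 4, 12, 20, ..., 60. The multiply and the add of each neighbouring pair collapse into a single instruction, which is why pmaddubsw fits chroma MC so well.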

There's also some general serious fail in the code, such as emms being put after some assembly functions. That wastes clocks, since floating-point code is never or almost never used in a decoder (emms should be put before float functions, not after asm functions). There's also the fact that I'm seeing the frame pointer being used; in other words, the code was compiled without -fomit-frame-pointer or its <insert compiler name here> equivalent.

I also get the feeling from reading some extremely bad assembly in here that this was built using an autovectorizing compiler of some sort. For example, an 8x7 VSAD (for adaptive deinterlacing, I assume) that keeps its sum in a GPR and repeatedly adds to it from MMX registers (WTF?!). That must be slowing down the function by at least a factor of 3 or 4.
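For contrast, this is roughly what I'd expect a sane VSAD to look like: keep the running sum in a SIMD register and move it to a GPR exactly once, at the end. It's a sketch under assumptions, not DivX's code. I'm reading "8x7" as 8 columns by 7 vertical row pairs (so 8 rows of input), the function names are invented, and I'm using SSE2 intrinsics (psadbw via _mm_sad_epu8) with a plain C fallback when SSE2 isn't available.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Plain C reference: sum of |p[y][x] - p[y+1][x]| over 8 columns and
 * 7 row pairs (assumed meaning of "8x7"). Reads 8 rows of 8 pixels. */
static int vsad_8x7_ref(const uint8_t *pix, int stride)
{
    int sum = 0;
    for (int y = 0; y < 7; y++)
        for (int x = 0; x < 8; x++)
            sum += abs(pix[y * stride + x] - pix[(y + 1) * stride + x]);
    return sum;
}

/* Same result, but the accumulator lives in a vector register for the
 * whole loop; there is a single vector-to-GPR move at the very end. */
static int vsad_8x7(const uint8_t *pix, int stride)
{
    assert(stride >= 8);
#if defined(__SSE2__)
    __m128i acc  = _mm_setzero_si128();
    /* Load 8 pixels into the low half; the high half is zeroed. */
    __m128i prev = _mm_loadl_epi64((const __m128i *)pix);
    for (int y = 1; y <= 7; y++) {
        __m128i cur = _mm_loadl_epi64((const __m128i *)(pix + y * stride));
        /* psadbw: sum of absolute byte differences, result in the low
         * 64-bit lane (the high lane is 0 because both high halves are 0). */
        acc  = _mm_add_epi32(acc, _mm_sad_epu8(prev, cur));
        prev = cur;
    }
    return _mm_cvtsi128_si32(acc); /* the only move to a GPR */
#else
    return vsad_8x7_ref(pix, stride);
#endif
}
```

The worst-case total is 7 * 8 * 255 = 14280, so the 32-bit add on the low lane can't overflow. Moving each partial sum out to a GPR inside the loop, as the DivX code seems to do, adds a register-file crossing to every iteration for no benefit.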

Last edited by Dark Shikari; 15th May 2008 at 07:26.