6th June 2009, 15:39   #15
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by SEt
Dark Shikari, I know not everything is optimally written, but I thought it better to release a working version now than a super-optimized one never. I know that the horizontal blur is made for palignr and will look into this when I have time, but I have no idea how to save kittens, or why loading the correct line gives a 1.4x speed drop for the whole function, including those awful pinsrw that should be much more time consuming than just an unaligned load from an additional memory location.
"Should be much more time consuming?"

Does that imply you tested it, and found it to be faster?

If it's faster, I'm inclined to blame cacheline splits: unaligned loads that straddle a cacheline boundary are very slow on pre-Nehalem Intel chips. Test on an AMD chip or Nehalem and watch the penalties melt away.
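To make the cacheline-split suspicion concrete, here is a rough sketch in C that counts how many loads in a row would cross a cacheline boundary. It is not taken from SEt's code: the helper names `splits_cacheline` and `count_split_loads`, the 64-byte line size, and the 16-byte load width are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cacheline size in bytes (hypothetical constant for this sketch). */
#define CACHELINE_SIZE 64

/* Returns 1 if a load of `width` bytes starting at `addr` crosses a
 * cacheline boundary, 0 otherwise. Hypothetical helper. */
static int splits_cacheline(uintptr_t addr, size_t width)
{
    assert(width > 0 && width <= CACHELINE_SIZE);
    size_t offset = (size_t)(addr % CACHELINE_SIZE);
    return offset + width > CACHELINE_SIZE;
}

/* Counts the split loads among `n` loads of `width` bytes, the first at
 * `start` and each subsequent one `step` bytes further. Hypothetical helper. */
static size_t count_split_loads(uintptr_t start, size_t width,
                                size_t step, size_t n)
{
    size_t splits = 0;
    for (size_t i = 0; i < n; i++)
        splits += (size_t)splits_cacheline(start + i * step, width);
    return splits;
}
```

With these assumptions, walking a row in 16-byte steps from an aligned address never splits, while the same walk shifted by one byte splits on one load in four. If the "correct line" load sits at such a shifted address, that would be consistent with a slowdown that shows up on older Intel chips and shrinks on AMD or Nehalem, though only a benchmark can confirm it.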