View Single Post
Old 6th June 2009, 19:37   #13  |  Link
Dark Shikari
x264 developer
 
Dark Shikari's Avatar
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by SEt View Post
I'm already on Nehalem and when i change
Code:
movq	xmm7, qword ptr [edi+pitch*0-1]
movq	xmm4, qword ptr [edi+pitch*0+1]
movq	xmm1, qword ptr [edi+pitch*0]
movq	xmm2, qword ptr [edi+pitch*2]
to
Code:
movq	xmm7, qword ptr [edi+pitch*1-1]
movq	xmm4, qword ptr [edi+pitch*1+1]
movq	xmm1, qword ptr [edi+pitch*0]
movq	xmm2, qword ptr [edi+pitch*2]
i see 1.4x slowdown in profiler for the whole function.
I was referring to the pinsrw with regard to the cacheline split.

If you're getting such a large slowdown merely by changing that, you should try to figure out why. Performance counters might be useful for analyzing that.
Dark Shikari is offline   Reply With Quote