Old 6th June 2009, 22:20   #15  |  Link
SEt
Registered User
 
Join Date: Aug 2007
Posts: 374
I played with the performance counters for some time, and here is what I found (I removed all pinsrw instructions for the tests, as they don't change the situation):
The global slowdown is produced by
Code:
movq	xmm7, qword ptr [edi+pitch*1-1]
but not by
Code:
movq	xmm4, qword ptr [edi+pitch*1+1]
It results in a spike of L1D.REPL and huge spikes of L1D.M_REPL, L1D.M_EVICT and L1D.M_SNOOP_EVICT in that area (ILD_STALL.ANY spikes too, but I don't think that is interesting).
I tried changing the only writing instruction here from movq to movdq2q followed by movntq, but that changed nothing.


Fizick, I think the situation is similar to MVTools 1-2. It's fully compatible in terms of available functionality, and the effective ranges of the parameters are supersets of the original ones. I know I should probably change the name to aWarpSharp2, but that looks kind of strange next to aSobel, aBlur and aWarp. In truth it's more like a beta release to me, because of the already mentioned wrong offsets in Warp and the saturated multiplication by 6 at the end of Sobel, which I don't like at all.
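
For readers unfamiliar with the term, here is a minimal C sketch of what a saturated multiplication by 6 does to a single value. It assumes unsigned 8-bit samples clamped at 255; that range and the helper name sat_mul6_u8 are illustrative assumptions, not taken from the filter's source, and the real code works on packed SIMD registers rather than scalars.

```c
#include <stdint.h>

/* Hypothetical helper (name and 8-bit range are assumptions for
   illustration): multiply by 6 in a wider type, then clamp to 255
   instead of letting the result wrap around. */
static uint8_t sat_mul6_u8(uint8_t x)
{
    unsigned wide = (unsigned)x * 6u;            /* at most 255 * 6 = 1530 */
    return (uint8_t)(wide > 255u ? 255u : wide); /* saturate at the top */
}
```

Under these assumptions every input above 42 maps to 255, so all stronger edge responses collapse to the same value.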