7th June 2009, 18:17   #21
SEt
Problem solved. Thanks to Dark Shikari for pushing me to actually look at the performance counters, and to IanB for the idea of what could help.
Moving this load earlier
Code:
mov	al, byte ptr [edi+pitch*1-8]
changed nothing, but it gave me an idea that worked: I moved the problem loads before the store of the previous iteration. After a few more optimizations, the new code runs as fast as the old one, and sometimes even a bit faster. I will post it later, once the other things are done.
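The post doesn't include the reworked loop, so the following is only a C-level sketch of the reordering described above, not SEt's code. It assumes a hypothetical kernel that averages each byte of the current row with the byte one pitch below it (loosely mirroring the [edi+pitch*1] addressing) and writes the result back in place. The names avg_rows_simple and avg_rows_pipelined are made up for illustration. The second version issues the loads for iteration i+1 before the store of iteration i, which is only legal when that store never writes a byte the next loads read. A C compiler may reorder or vectorize either loop on its own, so the sketch shows the ordering idea, not the generated instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: average each byte of the current row with the byte
   one pitch further on (the next row) and store the result in place. */

/* Straightforward order: every iteration loads, computes, then stores. */
void avg_rows_simple(uint8_t *row, ptrdiff_t pitch, size_t n)
{
    const uint8_t *below = row + pitch;   /* next row, like [edi+pitch*1] */
    for (size_t i = 0; i < n; i++) {
        unsigned a = row[i];
        unsigned b = below[i];
        row[i] = (uint8_t)((a + b + 1) >> 1);   /* rounded average */
    }
}

/* Reordered version: the loads for iteration i+1 are issued before the
   store of iteration i. Only valid if that store never writes a byte the
   next loads read; pitch >= n is enough to guarantee that here. */
void avg_rows_pipelined(uint8_t *row, ptrdiff_t pitch, size_t n)
{
    const uint8_t *below = row + pitch;
    if (n == 0)
        return;
    unsigned a = row[0], b = below[0];    /* prologue: first loads */
    for (size_t i = 0; i + 1 < n; i++) {
        unsigned next_a = row[i + 1];     /* loads for the next iteration... */
        unsigned next_b = below[i + 1];
        row[i] = (uint8_t)((a + b + 1) >> 1);   /* ...then this iteration's store */
        a = next_a;
        b = next_b;
    }
    row[n - 1] = (uint8_t)((a + b + 1) >> 1);   /* epilogue: last store */
}
```

With two 4-byte rows stored back to back (pitch = 4, n = 4), both functions leave the first row holding the same rounded averages and the second row untouched.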

IanB, I started using pitch*? instead of registers because the included source had no register holding pitch*1.