Wow - that's a big percentwise change, alx gets.
Nic's version might be faster because of ICL. trbarry's version
could be slower because it has inline assembler within a rather CPU-intensive section. MSVC tends to disable optimizations in a block of C code if it contains inline assembler. Calling a separate function that contains the assembler doesn't have this impact.
@Nic: Look for an off-by-one bug.
(sorry Tom - couldn't help myself)