28th December 2007, 06:44   #121
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
Okay, the equivalent C code helps a lot. Perhaps it should be in your source as comments.

Have you got this
Code:
movntq [ebx+edx], mm0
in the block you are testing? And you have an AMD beastie. Try testing with a normal movq instead.

Do not mix cached and non-temporal memory accesses, especially writes, and make absolutely sure that they are 64-bit (8-byte) aligned.
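A minimal C sketch of the two store kinds, assuming GCC or Clang on x86-64 and the `xmmintrin.h` intrinsics `_mm_stream_pi` (which emits movntq), `_mm_sfence` and `_mm_empty`. The `copy_u8_chunks` helper and its `use_nt` switch are hypothetical, there only so the same loop can be timed once with movntq and once with ordinary cached stores. It takes the streaming path only when the destination is 8-byte aligned and the length is a whole number of 8-byte chunks, so no cached writes get mixed into the streamed buffer. On other targets it always uses plain stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && defined(__x86_64__)
#include <xmmintrin.h> /* _mm_stream_pi (movntq), _mm_sfence; pulls in mmintrin.h */
#define HAVE_MOVNTQ 1
#else
#define HAVE_MOVNTQ 0
#endif

/* Hypothetical helper: copies n bytes in 8-byte chunks.
   use_nt != 0 requests non-temporal (movntq) stores, 0 requests plain
   cached stores. Returns 1 if the non-temporal path was actually used. */
int copy_u8_chunks(uint8_t *dst, const uint8_t *src, size_t n, int use_nt)
{
    size_t i = 0;
#if HAVE_MOVNTQ
    /* Stream only if EVERY store can be a movntq: aligned dst and no
       partial tail, so cached and non-temporal writes are never mixed. */
    if (use_nt && (uintptr_t)dst % 8 == 0 && n % 8 == 0) {
        for (i = 0; i < n; i += 8) {
            __m64 v;
            memcpy(&v, src + i, 8);              /* 8-byte load */
            _mm_stream_pi((__m64 *)(dst + i), v); /* movntq */
        }
        _mm_sfence(); /* order the weakly-ordered streaming stores */
        _mm_empty();  /* emms: leave MMX state before any x87 code */
        return 1;
    }
#else
    (void)use_nt;
#endif
    /* Ordinary cached stores (the "normal movq" case). */
    for (; i + 8 <= n; i += 8)
        memcpy(dst + i, src + i, 8);
    for (; i < n; i++)
        dst[i] = src[i];
    return 0;
}
```

Timing the same buffer with `use_nt` set to 0 and then 1 isolates the effect of the store kind from everything else in the loop.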

Also, as a baseline, perhaps you should test the pure C version.
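A sketch of such a baseline, assuming the routine under test fits the hypothetical `kernel_fn` signature below. Purely as a stand-in operation, it computes a rounding byte average `(a + b + 1) >> 1` (what pavgb does). Checking the asm version's output against the C version catches correctness bugs before any timing is trusted. `clock()` only gives coarse CPU time, so use enough repetitions for the numbers to mean something.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical kernel signature; substitute the routine being tuned. */
typedef void (*kernel_fn)(uint8_t *dst, const uint8_t *a,
                          const uint8_t *b, size_t n);

/* Plain C baseline: rounding byte average, (a + b + 1) >> 1. */
void avg_u8_c(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Returns 1 if fn writes exactly the same bytes as the C baseline.
   out and ref are caller-provided scratch buffers of n bytes. */
int matches_baseline(kernel_fn fn, uint8_t *out, uint8_t *ref,
                     const uint8_t *a, const uint8_t *b, size_t n)
{
    avg_u8_c(ref, a, b, n);
    fn(out, a, b, n);
    return memcmp(out, ref, n) == 0;
}

/* Runs fn reps times and returns elapsed CPU seconds (coarse). */
double time_kernel(kernel_fn fn, uint8_t *dst, const uint8_t *a,
                   const uint8_t *b, size_t n, int reps)
{
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        fn(dst, a, b, n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Run `time_kernel(avg_u8_c, ...)` first; any asm version that cannot beat that figure, or fails `matches_baseline`, is not worth keeping.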

------------------------

Also, a possible hint for your end-around (tail) code: as long as you have at least 8 bytes in total to process, just do a single unaligned movq at the end, i.e. offset that movq so it finishes exactly at the end of the buffer and process the few overlapping bytes twice.
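A portable C sketch of the overlapping-tail trick, with an 8-byte `memcpy` standing in for an unaligned movq and the same hypothetical rounding average as the per-chunk operation (the function names are illustrative, not from the original code). It is only safe when processing a byte twice gives the same result, e.g. when `dst` does not overlap the inputs; an in-place update would apply the operation twice to the overlapped bytes. Buffers shorter than 8 bytes fall back to a scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-chunk kernel: rounding byte average (like pavgb) on
   8 bytes at once. Per byte, (a | b) - ((a ^ b) >> 1) == (a + b + 1) >> 1;
   the 0xFE mask stops bits shifting across byte boundaries. */
static uint64_t avg8(uint64_t a, uint64_t b)
{
    return (a | b) - (((a ^ b) & 0xFEFEFEFEFEFEFEFEULL) >> 1);
}

/* Unaligned 8-byte load/store: the portable stand-in for movq. */
static uint64_t load8(const uint8_t *p) { uint64_t v; memcpy(&v, p, 8); return v; }
static void store8(uint8_t *p, uint64_t v) { memcpy(p, &v, 8); }

/* Out-of-place only: dst must not overlap a or b. */
void avg_u8_overlap_tail(uint8_t *dst, const uint8_t *a,
                         const uint8_t *b, size_t n)
{
    size_t i = 0;
    if (n < 8) { /* too short for even one movq */
        for (; i < n; i++)
            dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
        return;
    }
    for (; i + 8 <= n; i += 8)
        store8(dst + i, avg8(load8(a + i), load8(b + i)));
    if (i < n) {
        /* One final chunk ending exactly at the end of the buffer. It
           overlaps up to 7 bytes already written, which simply get
           rewritten with the same values. */
        size_t last = n - 8;
        store8(dst + last, avg8(load8(a + last), load8(b + last)));
    }
}
```

The payoff is that the tail costs one extra chunk instead of a byte-at-a-time loop, and nothing is ever written past `dst + n`.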