Thread: MVTools
28th June 2008, 08:03   #799
TSchniede
Registered User
 
Join Date: Aug 2006
Posts: 77
So this time I improved the internal 2xY SAD function (avoided one push & pop and replaced half of the segment-register reads with regular ones; 8% faster).
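For readers new to the term: a SAD function sums the absolute differences between the pixels of a source block and a candidate reference block. "2xY" is read here as a block two pixels wide with variable height Y. The routine in the release is hand-written MMX assembly; what follows is only a plain C reference sketch of what such a function computes. It assumes 8-bit samples and a separate pitch (bytes between rows) for each buffer, and the name `sad_2xY` is hypothetical, not the MVTools identifier.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference for a 2-pixel-wide SAD of `height` rows.
   src and ref each have their own pitch, so every row needs its own reads. */
unsigned sad_2xY(const uint8_t *src, int src_pitch,
                 const uint8_t *ref, int ref_pitch, int height)
{
    assert(height >= 0);
    unsigned sum = 0;
    for (int y = 0; y < height; y++) {
        sum += (unsigned)abs(src[0] - ref[0]); /* left pixel of the row  */
        sum += (unsigned)abs(src[1] - ref[1]); /* right pixel of the row */
        src += src_pitch;                      /* step to the next row   */
        ref += ref_pitch;
    }
    return sum;
}
```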
After a good idea, I added an optimized version that avoids most reg->MMX moves and exploits a fact that is new in 1.9.3.2: the aligned source block buffer is contiguous (pitch = blockwidth), so only one read is needed for it. This gave another 8% gain in MVDegrain3 with block=8 on YUY2; the gain with YV12 is smaller.
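The contiguous-buffer observation can be illustrated in C. Assuming, as stated above, that the aligned source copy has pitch equal to the block width (2), four source rows occupy eight consecutive bytes and can be fetched with one 8-byte access, while the reference block, which keeps an arbitrary pitch, still needs one access per row. This is a hypothetical scalar sketch (`sad_2xY_packed_src` is an invented name, and handling four rows per step is an assumption), not the MMX code from the release.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: src_packed holds the source block with pitch == 2,
   so rows y..y+3 are the 8 consecutive bytes starting at src_packed + 2*y. */
unsigned sad_2xY_packed_src(const uint8_t *src_packed,
                            const uint8_t *ref, int ref_pitch, int height)
{
    assert(height >= 0 && height % 4 == 0); /* sketch processes 4 rows per step */
    unsigned sum = 0;
    for (int y = 0; y < height; y += 4) {
        uint8_t s[8], r[8];
        /* One 8-byte copy covers four source rows; a compiler can emit a
           single 64-bit load for it. */
        memcpy(s, src_packed + 2 * y, 8);
        /* The reference block is strided: one 2-byte copy per row. */
        for (int k = 0; k < 4; k++)
            memcpy(r + 2 * k, ref + (y + k) * ref_pitch, 2);
        for (int i = 0; i < 8; i++)
            sum += (unsigned)abs(s[i] - r[i]);
    }
    return sum;
}
```

The scalar loop only illustrates the memory-access pattern; it does not reproduce the instruction-level savings of the assembly version.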
Since the second version is clearly faster, it is now the only one used. The source contains both, though.

You can get it here.

So I suppose that should solve the 4xY block issue.
__________________
GA-P35-DS3R, Core2Quad Q9300@3GHz, 4.0GB/800 MHz DDR2, 2x250GB SATA HD, Geforce 6800