Hehe, hoist by his own petard.
I have in the past tried both VS 2005 and VS 2008 and found that I got a performance hit of something like 20% compared to
VS TK2003 (TK3), and I think even VS6 was faster (no alterations to the standard optimization config in 2008, VS6 or TK3).
Doing timing comparisons between the standard Avisynth AverageLuma and RT_AverageLuma, I was getting something like 4200FPS (STD) and 5000FPS (RT) in
VS6; some of that would be down to the fact that it would locate RT (external plugin) quicker in the name lookup table.
On TK3, it was something like 7083FPS (RT) on my lowly dual core Core Duo 2.4GHz.
So the actual timings: 7083FPS for RT_AverageLuma built with TK3, against 4200FPS for STD in VS6.
TK3 seems to compile this type of loop quite efficiently (more efficiently than VS6). Some processors can handle the --x>=0 test and the
loop branch in a single instruction, eg the Motorola M68K family, although x would need to be a 16 bit int there.
Code:
for(x=width; --x>=0;) {
....
}
rather than this
Code:
for(x=0; x < width; x++) {
....
}
or this (better than prev on some compilers/processors;
EDIT: no real difference in either VS6 or TK3)
Code:
for(x=0; x < width; ++x) {
....
}
On a tight loop, you can get a significant performance boost simply by using the --x>=0 form.
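To make the comparison concrete, here is a minimal sketch of both loop forms summing one row of 8 bit samples. The function names sum_row_down and sum_row_up are made up for illustration and are not taken from RT_Stats or Avisynth; any speed difference will depend on the compiler and processor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from RT_Stats or Avisynth): sums one row of
   8 bit samples using the count-down form. x must be a SIGNED int. */
uint64_t sum_row_down(const uint8_t *row, int width)
{
    uint64_t sum = 0;
    int x;
    assert(width >= 0);
    for (x = width; --x >= 0;) {
        sum += row[x];          /* visits width-1 down to 0 */
    }
    return sum;
}

/* Same result using the conventional count-up form. */
uint64_t sum_row_up(const uint8_t *row, int width)
{
    uint64_t sum = 0;
    int x;
    assert(width >= 0);
    for (x = 0; x < width; ++x) {
        sum += row[x];          /* visits 0 up to width-1 */
    }
    return sum;
}
```

Note that x has to be signed here: with an unsigned x, --x>=0 is always true and the loop never ends. The count-down form also walks the row backwards, which is fine for a sum but matters if the loop body depends on order.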
I also did timing tests in Avisynth+. There was not much difference in speed between A+ and RT for eg the YPlaneMin equivalent and other non-AverageLuma functions, but
A+ blew my socks off with 30,000FPS for AverageLuma, using VS 2008 (I think) intrinsics.
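I have not looked at how Avisynth+ actually does it, so the following is only a guess at the kind of thing intrinsics make possible: a sketch that sums 16 bytes per step with the SSE2 _mm_sad_epu8 intrinsic, falling back to a plain loop where SSE2 is not available. The names sum_bytes and average_luma are hypothetical, and the buffer is assumed to be one contiguous run of n samples (no pitch handling).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Hypothetical helper, NOT the Avisynth+ source: sums n bytes. */
uint64_t sum_bytes(const uint8_t *p, size_t n)
{
    uint64_t sum = 0;
    size_t i = 0;
#ifdef HAVE_SSE2
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = zero;
    uint64_t lanes[2];
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
        /* SAD against zero yields the sum of each group of 8 bytes
           in the two 64 bit lanes */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));
    }
    _mm_storeu_si128((__m128i *)lanes, acc);
    sum = lanes[0] + lanes[1];
#endif
    for (; i < n; ++i) {
        sum += p[i];            /* scalar tail, or whole buffer without SSE2 */
    }
    return sum;
}

/* Average of n samples; n must be non-zero. */
double average_luma(const uint8_t *p, size_t n)
{
    assert(n > 0);
    return (double)sum_bytes(p, n) / (double)n;
}
```

Processing 16 samples per iteration instead of one is the sort of change that could account for a jump of that size, far more than any loop-counter trick.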