Old 15th January 2017, 16:08   #27  |  Link
Myrsloik
Professional Code Monkey
 
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,555
Quote:
Originally Posted by feisty2
r2 runs at 476.65fps at 1920x1080
r3 runs at 489.65fps at 1920x1080

makes no sense!!!
shouldn't AVX and FMA be at least 8x faster than x87???

I think there's probably something wrong with my compiler (VS2017), can you please compile the source code with your compiler and do a test as well?

to switch avx/fma back to c++

Line 225: FixFadesPrepare_AVX(); -> FixFadesPrepare();

Line 231: FixFadesMode0_AVX_FMA(); -> FixFadesMode0();

Line 234: FixFadesMode1_AVX_FMA(); -> FixFadesMode1();

Line 237: FixFadesMode2_AVX_FMA(); -> FixFadesMode2();
YOU ARE NOT USING X87!!! You compiled it as x64 code, and the x64 ABI (more or less) requires the compiler to use SSE2 instructions for floating-point math, at the very least the scalar float versions. It's even possible that it managed to auto-vectorize about half of this code, since most of it is just mindless read and sum. Look at the generated code instead of asking us about what you, YOURSELF, told the compiler to do.
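
A minimal sketch of how to check this yourself, not taken from the plugin source: `float_math_target()` and `scale_plane()` are hypothetical names made up for illustration. The first reports which floating-point instruction set the build targets, using the compilers' predefined macros; the second is a stand-in for the kind of simple per-sample loop a compiler can often vectorize on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: reports the floating-point instruction set this
// translation unit is compiled for, based on predefined compiler macros.
// On x64 targets the answer is at least SSE2, never x87.
inline std::string float_math_target() {
#if defined(__AVX__)
    return "AVX";            // e.g. MSVC /arch:AVX or GCC/Clang -mavx
#elif defined(_M_X64) || defined(__x86_64__) || defined(__SSE2__) || \
      (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";           // baseline for every x64 build
#elif defined(_M_IX86) || defined(__i386__)
    return "x87 or SSE1";    // only possible in 32-bit x86 builds
#else
    return "non-x86";
#endif
}

// Hypothetical stand-in for a "mindless" per-sample loop. Each iteration is
// independent of the others, so an optimizing compiler is free to process
// several samples per instruction without any intrinsics.
inline void scale_plane(const std::vector<float>& src,
                        std::vector<float>& dst,
                        float gain) {
    assert(src.size() == dst.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = src[i] * gain;
}
```

To see what the compiler actually emitted, ask it for an assembly listing (`/FAs` with MSVC, `-S` with GCC or Clang) and look at the loop body: scalar SSE shows up as instructions like `mulss`, vectorized code as `mulps` or `vmulps`. Note that a plain float sum is a different case from the loop above: compilers typically will not vectorize a floating-point reduction unless relaxed floating-point math (`/fp:fast`, `-ffast-math`) is enabled, because that changes the order of the additions.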

Your assumption wouldn't hold even for x87 vs AVX. For simple algorithms you run into memory bandwidth limitations long before you see the glory of SSE (and it's even rarer for AVX to matter). Modern CPUs are just too good.
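
A rough back-of-the-envelope sketch of the bandwidth argument. The sample format is an assumption (one plane of 32-bit float samples), as is the number of passes over the frame; `required_bandwidth_gbs()` is a hypothetical helper, not something from the plugin.

```cpp
#include <cstddef>

// Hypothetical helper: memory traffic in GB/s (10^9 bytes per second) needed
// to sustain a given frame rate, assuming each pass touches every sample of
// one plane exactly once.
inline double required_bandwidth_gbs(std::size_t width,
                                     std::size_t height,
                                     std::size_t bytes_per_sample,
                                     double passes_per_frame,
                                     double fps) {
    const double bytes_per_pass =
        static_cast<double>(width) * static_cast<double>(height) *
        static_cast<double>(bytes_per_sample);
    return bytes_per_pass * passes_per_frame * fps / 1e9;
}

// Assumed example: 1920x1080, one float plane, a single pass, at the quoted
// 476.65 fps.
inline double quoted_case_gbs() {
    return required_bandwidth_gbs(1920, 1080, sizeof(float), 1.0, 476.65);
}
```

Under those assumptions a single pass at the quoted frame rate already moves a bit under 4 GB/s, and every extra read or write pass over the frame adds the same amount again. Whether that is what limits this particular filter depends on the machine and on how many passes the code really makes, so treat it as an order-of-magnitude check rather than a measurement.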
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet