After watching Handmade Hero, I wanted to port this function to AVX to understand how optimization works [I don't think I'll do an SSE version].
I changed all the variables from double to float so that a whole stride can fit into a 256-bit register; I hope this won't change the behaviour of the function too much.
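To put numbers on the double-to-float trade-off, here is a small sketch in plain C (no intrinsics, so it builds anywhere). The helper names `lanes_per_register` and `max_abs_rounding_error` are made up for illustration and are not part of the filter's code; the first counts how many elements fit in one 256-bit register, the second measures how much precision a set of values loses when narrowed to float.

```c
#include <stddef.h>

/* Width of one AVX (YMM) register in bits. */
#define AVX_REGISTER_BITS 256

/* Hypothetical helper: number of elements of elem_size bytes
   that fit in a single 256-bit register. */
static size_t lanes_per_register(size_t elem_size)
{
    return (AVX_REGISTER_BITS / 8) / elem_size;
}

/* Hypothetical helper: largest absolute error introduced by
   narrowing each double in values[] to float and widening it back. */
static double max_abs_rounding_error(const double *values, size_t count)
{
    double worst = 0.0;
    for (size_t i = 0; i < count; ++i) {
        float narrowed = (float)values[i];          /* what the float port would store */
        double err = (double)narrowed - values[i];  /* rounding error of that store */
        if (err < 0.0)
            err = -err;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

With the usual 4-byte float and 8-byte double, `lanes_per_register(sizeof(float))` gives 8 and `lanes_per_register(sizeof(double))` gives 4, so the float version processes twice as many values per instruction. Running `max_abs_rounding_error` over the function's actual coefficients would give a rough idea of how far the float output can drift from the double one, although it does not account for error accumulated across operations.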
Right now I've almost finished the cdct function.
Edit: did some tests on the fillfactors function; it went down from 134 cycles fully unrolled to 16. Too bad it's only called once XD
Last edited by MonoS; 31st May 2015 at 18:37.