31st May 2015, 16:15   #5
MonoS
Registered User
 
Join Date: Aug 2012
Posts: 203
After watching Handmade Hero I wanted to port this function to AVX, to understand how optimization works [I don't think I'll do an SSE version].

I changed all the variables from double to float so that a whole stride can fit into a 256-bit register. I hope this won't change the behaviour of the function too much.
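To make the double-to-float trade-off concrete, here is a minimal scalar sketch in plain C (no intrinsics, so it builds anywhere). It is not the code from this port: `lanes_per_256bit`, `dot8_double`, `dot8_float` and `float_vs_double_error` are hypothetical names, and the 8-element dot product with made-up coefficients is only an assumed stand-in for one row of work. It shows that a 256-bit register holds 8 floats but only 4 doubles, and measures how far a float result drifts from the double one.

```c
#include <assert.h>
#include <stddef.h>

/* How many elements of the given size fit in one 256-bit (32-byte) register. */
static size_t lanes_per_256bit(size_t elem_size_bytes) {
    assert(elem_size_bytes > 0 && 32 % elem_size_bytes == 0);
    return 32 / elem_size_bytes;
}

/* Hypothetical stand-in for one row of work: an 8-element dot product in double. */
static double dot8_double(const double *a, const double *b) {
    double sum = 0.0;
    for (int i = 0; i < 8; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* The same dot product with every value and the accumulator narrowed to float. */
static float dot8_float(const float *a, const float *b) {
    float sum = 0.0f;
    for (int i = 0; i < 8; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Absolute difference between the float and double results on sample data. */
static double float_vs_double_error(void) {
    double ad[8], bd[8];
    float af[8], bf[8];
    for (int i = 0; i < 8; ++i) {
        ad[i] = 1.0 / (double)(i + 3);         /* made-up coefficients, not exact in float */
        bd[i] = (double)((i * 37 + 11) % 256); /* made-up pixel-like values in 0..255 */
        af[i] = (float)ad[i];
        bf[i] = (float)bd[i];
    }
    double diff = dot8_double(ad, bd) - (double)dot8_float(af, bf);
    return diff < 0.0 ? -diff : diff;
}
```

On this sample data the difference stays far below one pixel level, but the only way to know whether that holds for the real function is to compare the float and double versions on actual frames.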

Right now I've almost finished the cdct function.

Edit: did some tests on the fillfactors function. It went down from 134 cycles fully unrolled to 16; too bad it's only called once XD

Last edited by MonoS; 31st May 2015 at 18:37.