21st November 2021, 01:10   #9
TomArrow
Ah I see, interesting. I agree, it would only make sense if it's faster.

Meanwhile I have been taking some baby steps and looking into how auto-vectorization works. It turns out the compiler can print vectorization diagnostics, and they showed me that it wasn't vectorizing anything at all.

So I tuned the code a bit, and now it does vectorize. In addition, I've created AVX2 and AVX512 builds; the old one was merely AVX (because that's what my CPU supports).
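
For anyone wanting to try the same thing, here is a rough sketch of the kind of loop compilers tend to vectorize, plus diagnostic flags that I believe exist for the common compilers. It is not the plugin's actual code: the function name `apply_matrix_row`, the planar float layout and the 3x3 matrix argument are all made up for illustration, and which compiler applies is an assumption.

```cpp
// Hypothetical sketch, NOT the plugin's real code.
// Diagnostic flags (assumption: pick the line matching your compiler):
//   GCC:   g++ -O3 -mavx2 -fopt-info-vec -fopt-info-vec-missed file.cpp
//   Clang: clang++ -O3 -mavx2 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize file.cpp
//   MSVC:  cl /O2 /arch:AVX2 /Qvec-report:2 file.cpp
#include <cstddef>

// Applies a row-major 3x3 matrix m to one row of planar R, G, B floats.
// __restrict is a widely supported compiler extension that promises the
// buffers do not overlap, which removes a common obstacle to vectorization.
void apply_matrix_row(const float* __restrict r,
                      const float* __restrict g,
                      const float* __restrict b,
                      float* __restrict outR,
                      float* __restrict outG,
                      float* __restrict outB,
                      const float* m,
                      std::size_t width)
{
    // Copy the coefficients into locals so the compiler does not have to
    // assume that writing to the outputs could change them.
    const float m0 = m[0], m1 = m[1], m2 = m[2];
    const float m3 = m[3], m4 = m[4], m5 = m[5];
    const float m6 = m[6], m7 = m[7], m8 = m[8];

    // Simple counted loop, no branches, unit-stride access.
    for (std::size_t x = 0; x < width; ++x) {
        outR[x] = m0 * r[x] + m1 * g[x] + m2 * b[x];
        outG[x] = m3 * r[x] + m4 * g[x] + m5 * b[x];
        outB[x] = m6 * r[x] + m7 * g[x] + m8 * b[x];
    }
}
```

With the diagnostics enabled, the compiler should report whether the loop was vectorized and, if not, usually why.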

Would you do another one of your benchmarks with the updated version?
https://github.com/TomArrow/ColorMat...ses/tag/v0.1.1

Just pick whatever instruction set your CPU can handle. The AVX512 build was made with a newer compiler version, which in my tests seemed to produce slightly slower code, so if you try the AVX512 one, maybe also try the AVX2 one. For AVX and AVX2 I also included a second version called "OMPtest", which has OpenMP parallelization of the outer loop (rows); the other one does not.
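
To illustrate what parallelizing the outer loop over rows means, here is a minimal sketch. It is only an example, not the code in the release: `scale_image`, the float buffers and the `gain` parameter are hypothetical stand-ins for the real per-pixel work.

```cpp
// Hypothetical sketch, NOT the plugin's real code.
// Build with OpenMP enabled, e.g. -fopenmp (GCC/Clang) or /openmp (MSVC).
// Without that switch the pragma is ignored and the loop runs serially.
#include <cstddef>

// Multiplies every pixel of a width x height float image by gain.
void scale_image(const float* src, float* dst, int width, int height, float gain)
{
    // Rows are independent, so each thread can take a share of them.
    // The loop index is a signed int because older OpenMP levels (such as
    // the 2.0 level implemented by MSVC's /openmp) expect a signed index.
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        const float* in = src + static_cast<std::size_t>(y) * width;
        float* out = dst + static_cast<std::size_t>(y) * width;

        // The inner loop stays a plain loop, so it can still be auto-vectorized.
        for (int x = 0; x < width; ++x) {
            out[x] = in[x] * gain;
        }
    }
}
```

Whether the threads pay off depends on the image size, since for small frames the cost of starting and syncing threads can outweigh the gain.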

To me it subjectively feels snappier now, but maybe I'm imagining it. With the OpenMP version I really can't tell, and I don't know how to do benchmarks.