8th July 2018, 20:55   #11
jackoneill
 
I figured out what's wrong when radius is 4 or 5. The code calculates a sum which has to fit in 14 bits, so it can't be more than 16383. With radius 4 the window is 9 x 9 pixels, so the sum can reach 9 * 9 * 255 = 20655, which is too large to fit.
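
To make the arithmetic concrete, here is a small sketch that prints the largest possible sum for each radius next to the 14-bit limit. It assumes a square window of (2 * radius + 1) pixels per side and 8-bit pixels, as the 9 * 9 * 255 figure suggests; the function and constant names are made up for illustration and are not taken from the plugin.

```python
# Largest possible window sum per radius, compared with the 14-bit limit.
# Assumption: square window, (2 * radius + 1) pixels per side, 8-bit pixels.
# Names here are hypothetical; they do not come from the plugin's source.
LIMIT_14_BITS = (1 << 14) - 1  # 16383


def max_window_sum(radius, max_pixel=255):
    """Sum of a window in which every pixel has the maximum value."""
    side = 2 * radius + 1
    return side * side * max_pixel


for radius in range(1, 8):
    total = max_window_sum(radius)
    status = "fits" if total <= LIMIT_14_BITS else "too large"
    print(f"radius {radius}: {total} ({status})")
```

Under these assumptions radius 3 is the last one that stays below 16383.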

This is easily fixed by using the pmulhuw instruction (introduced in SSE) instead of the pmulhw instruction (introduced in MMX).
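
For anyone unfamiliar with the two instructions, the following sketch emulates a single 16-bit lane of each. Both keep the high 16 bits of a 16 x 16 bit product; pmulhw treats the operands as signed, pmulhuw as unsigned. The multiplier 0x4000 and the input values are arbitrary numbers picked for illustration; how the plugin actually feeds these instructions is not shown in this post.

```python
# Emulation of one 16-bit lane of pmulhw (signed) and pmulhuw (unsigned).
# The operand values used below are arbitrary, chosen only for illustration.

def to_signed16(x):
    """Reinterpret the low 16 bits of x as a signed 16-bit integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def pmulhw(a, b):
    """High 16 bits of the signed 16 x 16 -> 32 bit product."""
    product = to_signed16(a) * to_signed16(b)
    return (product >> 16) & 0xFFFF


def pmulhuw(a, b):
    """High 16 bits of the unsigned 16 x 16 -> 32 bit product."""
    product = (a & 0xFFFF) * (b & 0xFFFF)
    return product >> 16


multiplier = 0x4000          # hypothetical: 0.25 in 0.16 fixed point
small = 12495                # top bit clear: both instructions agree
large = 43095                # top bit set: pmulhw sees a negative number

print(pmulhw(small, multiplier), pmulhuw(small, multiplier))
print(pmulhw(large, multiplier), pmulhuw(large, multiplier))
```

The two only differ once an operand has its top bit set, which is why the unsigned variant leaves more room for large sums.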

https://github.com/dubhater/vapoursy...eleases/tag/v2

Code:
   * Fix bad output with radius 4 and 5 (especially 5).
   * Allow radius 6 and 7.
   * Better precision in the calculations.