Made a few (easy) speedups - it should be a couple of times faster now.
Updated source.
- Use float instead of double. Floats have enough precision here and are much faster. This can be changed back by editing the typedef in ColorMatrix.h.
- Simpler algorithms.
- Use the internal limiter for the output as well, instead of a very slow if-then.
- Better rounding (adding 0.5 before the float-to-int conversion, so values round to nearest instead of being truncated).
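
For anyone curious what the float typedef, limiting and rounding changes look like together, here is a rough sketch. It is not the actual plugin code: the names `coeff_t` and `to_pixel` are made up for illustration, and since the internal limiter isn't shown here, `std::min`/`std::max` stand in for it.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical alias; the real typedef lives in ColorMatrix.h and may be named differently.
// Switching this one line back to double restores the old precision.
typedef float coeff_t;

// Limit a computed value to the 8-bit range, then round to nearest.
// A plain float-to-int cast truncates, so 0.5 is added first.
// Limiting before rounding keeps the value non-negative, where this trick is valid.
inline std::uint8_t to_pixel(coeff_t v) {
    // Stand-in for the internal limiter (assumption, not the plugin's code).
    coeff_t limited = std::min<coeff_t>(std::max<coeff_t>(v, 0.0f), 255.0f);
    return static_cast<std::uint8_t>(static_cast<int>(limited + 0.5f));
}
```

With this, 127.6 becomes 128 rather than 127, and out-of-range values such as -5 or 300 end up at 0 and 255.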
No SSE/MMX this time, though.