Quote:
Originally Posted by HolyWu
Released r3.
Added SIMD operations. The provided binary package contains two DLLs, one for SSE2 and one for AVX2. In rough tests on my E3-1231 v3, the SSE2 version is about 1x~2x% faster than the C version, while the AVX2 version is only about 5% faster than the SSE2 version.
Interesting... I've never seen anyone actually use Agner Fog's vector classes before.
The speed results are about what I'd expect. Yadifmod is a fairly simple filter, so memory bandwidth ends up being the limiting factor. You usually don't gain much by going beyond SSE2 unless there's a particular instruction you need.
I strongly disagree with your packaging, though. The best code path should be chosen automatically at runtime, and there's no excuse for shipping multiple DLLs for this kind of code. If you need hints on how to detect CPU features, you can borrow or study the cpu* files from libav. I looked at the documentation for the vector classes but couldn't see an easy way to use multiple vector sizes in the same project, so you may have to use a messy trick with multiple source files to make it work.
Personally, I wouldn't have bothered releasing an AVX2 version at all, given the very small gains.