Thread: Yadifmod
16th August 2014, 09:33   #7
Myrsloik
Professional Code Monkey
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,548
Quote:
Originally Posted by HolyWu
Released r3.

Added SIMD operations. The provided binary package contains two DLLs, one for SSE2 and one for AVX2. In a rough test on my E3-1231 v3, the SSE2 version is about 1x~2x% faster than the C version, while the AVX2 version is only about 5% faster than the SSE2 version.

Interesting... I've never seen anyone actually use Agner Fog's vector classes before.

The speed observations are normal. Yadifmod is a fairly simple filter, so in the end memory bandwidth becomes the limiting factor. You usually don't gain that much from going beyond SSE2 unless there's a particular instruction you need.
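To make the bandwidth argument concrete, here is a toy roofline-style model in C: a frame takes at least as long as its arithmetic and at least as long as moving its pixels through memory, so once SIMD has pushed the compute time below the memory time, wider vectors buy nothing. The function name and all numbers are hypothetical illustrations, not measurements of Yadifmod.

```c
/* Toy model (hypothetical): per-frame time is the larger of compute time
   and memory time. Assumes loads and arithmetic overlap perfectly, which
   real code never quite achieves. */
static double frame_time_ms(double scalar_compute_ms, double simd_speedup,
                            double memory_ms) {
    /* SIMD only shrinks the arithmetic part... */
    double compute_ms = scalar_compute_ms / simd_speedup;
    /* ...but the slower of the two resources sets the pace. */
    return compute_ms > memory_ms ? compute_ms : memory_ms;
}
```

With made-up inputs of 5 ms of scalar work and 4 ms of memory traffic per frame, `frame_time_ms(5.0, 1.0, 4.0)` gives 5 ms, while both a 4x (SSE2-like) and an 8x (AVX2-like) compute speedup give 4 ms: the first step gains 25%, the second nothing. In practice the overlap is imperfect, which may be why a small gain like the ~5% reported above still shows up.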

I strongly disagree with your packaging, though. The best code path should be chosen automatically. There's no excuse for shipping multiple DLLs for this kind of code. If you need hints on how to detect CPU features, you can simply borrow or study the cpu* files from libav. I looked at the documentation for the vector classes, but I couldn't see an easy way to use multiple vector sizes at once in the same project. You may have to use a messy trick with multiple files to make it work.
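A minimal sketch of picking the code path at runtime from a single binary. It assumes GCC or Clang on x86 and uses the compiler's `__builtin_cpu_supports` rather than libav's cpu* files; `avg_fn`, the `avg_*` kernels and `select_avg` are hypothetical stand-ins, not Yadifmod code. Here the per-ISA versions are just the same loop compiled with `target` attributes. As far as I understand, Agner Fog's vector classes choose their instruction set from compiler flags, so with them each version would instead live in its own source file built with its own `-msse2` or `-mavx2` flag (the multiple-files trick), with only the dispatcher shared.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel signature: rounded average of two rows of 8-bit pixels. */
typedef void (*avg_fn)(const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t n);

/* Portable C fallback, always available. */
static void avg_c(const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1

/* Same loop compiled for SSE2, so the compiler may auto-vectorize it.
   A real plugin would put hand-written SIMD code here instead. */
__attribute__((target("sse2")))
static void avg_sse2(const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Same loop again, compiled for AVX2. */
__attribute__((target("avx2")))
static void avg_avx2(const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}
#endif

/* Choose the best available path once, e.g. when the filter is created.
   *name reports which path was picked. */
static avg_fn select_avg(const char **name) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init(); /* harmless if the CPU model is already initialized */
    if (__builtin_cpu_supports("avx2")) {
        *name = "avx2";
        return avg_avx2;
    }
    if (__builtin_cpu_supports("sse2")) {
        *name = "sse2";
        return avg_sse2;
    }
#endif
    *name = "c";
    return avg_c;
}
```

Calling `select_avg` once at filter creation and storing the pointer means the per-frame code only pays for an indirect call, and users get one DLL that runs everywhere.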

Personally I wouldn't have bothered to release an AVX2 version at all considering the very small gains.
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet