Doom9's Forum - View Single Post - Internaly multi-threaded resampling functions

TheFluff · 13th August 2016, 23:16

Stephen R. Savage posted this earlier but since he loves deleting his own posts I'll repeat what I remember of it:

There's no evidence that slice-based threading is any faster than frame-based threading for a convolution filter like a resizer. Frame-based threading of the VapourSynth internal resizers scaled very close to linearly in Stephen's tests (23.8x speedup on 24 cores compared to one core). He had some argument that slice-based threading has no cache advantage because there's no shared data between lines, or something? I don't remember. But anyway, internal multithreading like this is likely pointless, at least for resizers. Then again, I'm pretty sure avs-mt's frame-based multithreading design is bad, but I don't really have the evidence to back that up.
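To make the two terms concrete, here is a minimal sketch of how the work gets split in each scheme. It is plain Python and not how VapourSynth or any AviSynth plugin is actually implemented; `convolve_row`, `filter_rows`, `frame_based` and `slice_based` are hypothetical names made up for illustration, and the "resizer" is reduced to a same-size horizontal convolution. Because of the GIL this will not show real speedups; it only shows which unit of work goes to which thread, and that both schemes give the same output.

```python
from concurrent.futures import ThreadPoolExecutor


def convolve_row(row, kernel):
    """Convolve one row with a 1-D kernel, clamping at the edges."""
    r = len(kernel) // 2
    n = len(row)
    return [
        sum(k * row[min(max(x + i - r, 0), n - 1)] for i, k in enumerate(kernel))
        for x in range(n)
    ]


def filter_rows(frame, kernel, start, stop):
    """Filter rows [start, stop) of a frame; each row is independent."""
    return [convolve_row(row, kernel) for row in frame[start:stop]]


def frame_based(frames, kernel, workers):
    """One task per frame: each thread processes a whole frame."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(filter_rows, f, kernel, 0, len(f)) for f in frames]
        return [fut.result() for fut in futures]


def slice_based(frames, kernel, workers):
    """One frame at a time: each thread processes a horizontal slice of it."""
    out = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for frame in frames:
            h = len(frame)
            bounds = [(h * i // workers, h * (i + 1) // workers) for i in range(workers)]
            futures = [pool.submit(filter_rows, frame, kernel, a, b) for a, b in bounds]
            # Stitch the slices back together in order.
            out.append([row for fut in futures for row in fut.result()])
    return out
```

Note that the slice-based version has to synchronize at the end of every frame, while the frame-based version only waits once at the end. No row needs another thread's output in either scheme.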

Quote:

Originally Posted by jpsdr

You may not have this point of view, but for those who share it, you can use my multi-threaded version.

I realize that "optimizing" things based on guesswork, hearsay and fundamental misunderstandings of the underlying technology is a very doom9 thing to do (remember that guy who wrote 3000 lines of asm to try to optimize memcpy even though optimizing memcpy does absolutely nothing in the real world?), but holy shit, seriously. Dude. If you optimize something, you'd better benchmark it to prove that it is faster than the thing you wanted to improve on. One algorithm being faster than another isn't an opinion or a point of view. Don't try to rice shit without benchmarks.
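For what it's worth, the bare minimum of "benchmark it" looks something like the sketch below. `best_time` and `speedup` are hypothetical helper names, not part of any existing tool, and taking the fastest of several wall-clock runs is just one common way to reduce noise. A real comparison of resizers would also need realistic frame sizes, a full filter chain and a range of thread counts.

```python
import time


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best


def speedup(baseline, candidate, repeats=5):
    """Ratio of best times; a value above 1 means `candidate` was faster."""
    return best_time(baseline, repeats) / best_time(candidate, repeats)
```

Usage would be along the lines of `speedup(run_old_resizer, run_new_resizer)`, where both callables (hypothetical here) process the same clip with the same settings. If the number isn't clearly above 1, the "optimization" isn't one.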