vapoursynth.dll built with ICC16 for Windows


HolyWu
16th April 2016, 10:49
Because the commonly used morphological filters in the core functions don't have SIMD code yet, we can only rely on the compiler's auto-vectorization. In my tests, ICC16 does a better job of it than MSVC2015 here. The only change needed is adding the restrict keyword to the essential pointers.

The script used for benchmarking:

import vapoursynth as vs
import havsfunc as haf
core = vs.get_core()
core.max_cache_size = 3072

clip = core.lsmas.LibavSMASHSource(r'an 1920x1080 video.mp4')

# test1
#clip = core.std.Maximum(clip[0:1000]).std.Inflate().std.Minimum().std.Deflate()

# test2
#clip = haf.QTGMC(clip[0:500], Preset='Fast', TFF=True)

clip.set_output()


With the MSVC build I get 64.63 fps for test1 and 9.47 fps for test2. With the ICC build I get 131.08 fps for test1 and 12.33 fps for test2. (E3-1231 v3 @ 3.40 GHz)
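To put those numbers in relative terms, here is a small sketch that computes the speedup of the ICC build over the MSVC build from the fps figures above. The helper name `speedup` is just something made up for illustration, not part of VapourSynth.

```python
def speedup(baseline_fps, new_fps):
    """Return how many times faster new_fps is compared to baseline_fps."""
    return new_fps / baseline_fps


# fps figures quoted in the post: (MSVC build, ICC build)
results = {
    "test1": (64.63, 131.08),
    "test2": (9.47, 12.33),
}

for name, (msvc_fps, icc_fps) in results.items():
    ratio = speedup(msvc_fps, icc_fps)
    # (ratio - 1) * 100 is the percentage gain in fps over the baseline
    print(f"{name}: {ratio:.2f}x ({(ratio - 1) * 100:.0f}% faster)")
```

That works out to roughly 2.03x for test1 (pure morphological filters) and about 1.30x for test2 (QTGMC), which fits the idea that the gain is concentrated in the auto-vectorized core filters.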

VapourSynth-r1754-ICC16.7z (http://www.mediafire.com/download/cl4a02g10xpi77b/VapourSynth-r1754-ICC16.7z)

In addition to the usual "C:\Program Files (x86)\VapourSynth\core<32/64>" folder, I think you need to overwrite the dll in "C:\Program Files (x86)\Python35-32\Lib\site-packages\vapoursynth" (for 32-bit) or "C:\Program Files\Python35\Lib\site-packages\vapoursynth" (for 64-bit) as well.
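If you are not sure where the Python-side copy lives on your machine, something like the following sketch can find it. It only uses the standard `importlib.util.find_spec`; the helper name `package_dir` is hypothetical, and the assumption is that the dll to overwrite sits in or next to whatever location Python reports for the `vapoursynth` module.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory a module/package is loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # module is not installed for this interpreter
    if spec.has_location and spec.origin:
        # origin is the file that gets loaded (e.g. __init__.py or a .pyd)
        return Path(spec.origin).parent
    if spec.submodule_search_locations:
        # namespace-style packages have no origin, only search locations
        return Path(list(spec.submodule_search_locations)[0])
    return None


if __name__ == "__main__":
    location = package_dir("vapoursynth")
    if location is None:
        print("vapoursynth is not importable from this Python")
    else:
        print("vapoursynth is loaded from:", location)
```

Run it with the same Python (32-bit or 64-bit) that you use for your scripts, since each installation has its own site-packages.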

aegisofrime
16th April 2016, 13:07
This is great! With a real-world example (QTGMC encoding with x265), I'm getting 14.53 fps with the stock dll, but 16.25 fps with your dll. Awesome!

asarian
16th April 2016, 19:29
Did a long real-world test with it (a 7-hour job): 5.94 fps vs. 5.17 fps. May not look like much, but that's about a 15% increase in fps (or 13% less encoding time). :)

Also, I was using QTGMC with KNLMeansCL (so the relative gain would probably have been even bigger if no GPU had been involved at all).