eedi3


HolyWu
31st August 2017, 18:55
https://github.com/HomeOfVapourSynthEvolution/VapourSynth-EEDI3/

There are two filters in this plugin, one using pure CPU and the other using OpenCL. Unfortunately, one of the heavy loops can't be executed in parallel on the GPU (or I'm too dumb to find a way to conquer it) and still needs to run on the CPU, so the OpenCL version may not necessarily run faster than the pure CPU version. Anyway, here is a rough test on my E3-1230v3 and (poor) GTX 660.

AVS EEDI3 v0.9.2.1 x64
eedi3(opt=1, pure c): 1.15 fps
eedi3(opt=2, sse2): 3.81 fps

VS EEDI3 r2 x64
EEDI3(opt=1, pure c): 1.22 fps
EEDI3(opt=2, sse2): 4.19 fps
EEDI3(opt=3, sse4.1): 4.89 fps
EEDI3(opt=4, avx): 6.18 fps
EEDI3(opt=5, avx512): unavailable
EEDI3CL(opt=1, pure c): 4.88 fps
EEDI3CL(opt=2, sse2): 7.17 fps

Sample video (https://www.mediafire.com/file/hw0mcamkynsrj7j/test_1080i.mp4) used for benchmarking.
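
To put the figures above in relative terms, here is a small Python sketch that divides each VS EEDI3 r2 result by the pure C result. Only the fps values come from the post; the dictionary labels and the helper name are made up for illustration.

```python
# fps values copied from the VS EEDI3 r2 x64 results above.
# The labels and the helper name are illustrative, not plugin identifiers.
R2_FPS = {
    "EEDI3 opt=1 (pure c)": 1.22,
    "EEDI3 opt=2 (sse2)": 4.19,
    "EEDI3 opt=3 (sse4.1)": 4.89,
    "EEDI3 opt=4 (avx)": 6.18,
    "EEDI3CL opt=1 (pure c)": 4.88,
    "EEDI3CL opt=2 (sse2)": 7.17,
}


def relative_speed(results, baseline_key):
    """Return each result divided by the baseline result."""
    base = results[baseline_key]
    return {name: fps / base for name, fps in results.items()}


if __name__ == "__main__":
    ratios = relative_speed(R2_FPS, "EEDI3 opt=1 (pure c)")
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.2f}x")
```

On these numbers the fastest OpenCL result is only about 1.16x the fastest CPU result (7.17 / 6.18), which fits the caveat that the OpenCL version is not necessarily much faster.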

feisty2
31st August 2017, 19:03
With floating point support, great. The one that comes with vsinstaller can rest in peace now.

DJATOM
4th October 2017, 15:19
Debian 8, gcc 4.9, and I can't build it there: https://pastebin.com/km3g2hKS. If the compiler is too old (most likely), which gcc version should I use?

HolyWu
4th October 2017, 16:02
I guess the GCC 5.x series could do the job.

HolyWu
13th October 2017, 09:03
Update r2.


EEDI3: Add avx and avx512 instruction sets. Remove avx2 instruction set.
EEDI3CL: Remove sse4.1 and avx2 instruction sets.
Minor speed improvement.


The benchmark in the first post is revised.

edcrfv94
15th October 2017, 18:54
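import vapoursynth as vs

# threads=0 lets VapourSynth pick the thread count automatically
core = vs.get_core(threads=0)
core.max_cache_size = 32000  # frame cache limit, in MB

# Synthetic 1080p source, so only the filter itself is measured
c = core.std.BlankClip(width=1920, height=1080, format=vs.YUV420P8, length=2000)
c = core.eedi3m.EEDI3(c, field=0, opt=4)  # opt=4 selects the AVX code path in r2
c.set_output()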


AMD Ryzen Threadripper 1950X, stock

VS EEDI3 r1 x64
AVX2: 33.57fps

VS EEDI3 r2 x64
pure c: 6.14fps
sse2: 21.87fps
sse4.1: 27.58fps
AVX: 35fps

HolyWu
19th October 2017, 06:10
AVX-512 code seems to work.

It looks like not all the code paths are optimized for AVX-512, as it spent some time at the AVX turbo frequency.

Thanks. Could you please try again with https://www.mediafire.com/file/tb7ioard7aaiig0/EEDI3-test.7z and see whether it makes any difference?

DJATOM
26th October 2017, 14:05
It would be nice if eedi3cl had a dw parameter, just like the one implemented in nnedi3cl. Then I could get rid of the Transpose calls in my script when the CL versions are used (I'm using "sclip=nnedi3 clip").

KingLir
24th November 2017, 22:53
I am getting the following when building on macOS (latest 10.13.1 with the latest eedi3 from GitHub). Any ideas? I would appreciate any help.

https://pastebin.com/T1TJha0D

Are_
25th November 2017, 00:00
Your clang is too old; you need clang 5 or greater. If there isn't anything newer, wait for it or use an older release without AVX512 support.

KingLir
25th November 2017, 00:10
Are you sure? It's the latest Xcode and it says (also in the log I posted above) that it's clang 900.0.38.

$ clang --version
Apple LLVM version 9.0.0 (clang-900.0.38)
Target: x86_64-apple-darwin17.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Are_
25th November 2017, 00:24
I use Linux and I can replicate that error only with clang-4.0.1; with clang-5.0.0 it builds OK.
Apple's clang will probably be based on vanilla clang-5.x.x once they change that "clang-900" version string to something higher.
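
The confusion above comes from Apple's version numbers not lining up with upstream clang's. A minimal Python sketch of telling the two apart from the first line of `clang --version` follows. The function names are hypothetical, and the minimum of 5 is just the figure mentioned in this thread. It deliberately refuses to guess which upstream release an Apple build corresponds to.

```python
import re


def parse_clang_version(version_output):
    """Return (vendor, major) from the first line of `clang --version`.

    vendor is "apple" for Apple's toolchain and "upstream" for vanilla clang
    (including distro-prefixed builds). Returns None if nothing matches.
    """
    stripped = version_output.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    apple = re.match(r"Apple (?:LLVM|clang) version (\d+)\.", first_line)
    if apple:
        return ("apple", int(apple.group(1)))
    upstream = re.search(r"clang version (\d+)\.", first_line)
    if upstream:
        return ("upstream", int(upstream.group(1)))
    return None


def meets_upstream_minimum(version_output, minimum_major=5):
    """True/False for upstream clang; None when the version string alone
    cannot answer the question (Apple builds use their own numbering)."""
    parsed = parse_clang_version(version_output)
    if parsed is None or parsed[0] == "apple":
        return None
    return parsed[1] >= minimum_major
```

Fed the output posted above, `meets_upstream_minimum` returns None rather than True, since "9.0.0" there is Apple's number and not upstream's.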

KingLir
25th November 2017, 00:53
I see. Does anyone know how I can set it up on macOS?
I have brew working and everything already installed.
I can't figure out what command or edit I need for the makefile / configure...

HolyWu
25th November 2017, 06:04
Try again with the latest commit.

KingLir
25th November 2017, 09:23
Thanks! All the previous errors seem to be fixed. Now I'm getting a new one:

$ make
CXXLD libeedi3m.la
ld: library not found for -lOpenCL
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [libeedi3m.la] Error 1

HolyWu
25th November 2017, 19:24
The error is quite obvious. The linker can't find the OpenCL library. Are you sure it's installed? Or maybe you need to pass additional flags to tell the linker where it's located?

KingLir
25th November 2017, 19:50
Oh, I googled it and found that replacing "-lOpenCL" with "-framework OpenCL" should work, and indeed it did.
Thanks again!
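
For anyone scripting the build, the fix above boils down to choosing the OpenCL linker arguments by platform. This is a minimal Python sketch with a hypothetical helper name. It is not part of the plugin's build system, and falling back to `-lOpenCL` everywhere except macOS is an assumption.

```python
import platform


def opencl_link_flags(system=None):
    """Return linker arguments for OpenCL on the given OS.

    `system` takes the values platform.system() returns, e.g. "Darwin" or
    "Linux". When omitted, the current platform is used.
    """
    if system is None:
        system = platform.system()
    if system == "Darwin":
        # macOS ships OpenCL as a framework, not as libOpenCL
        return ["-framework", "OpenCL"]
    # Assumed default for other UNIX-like systems
    return ["-lOpenCL"]


if __name__ == "__main__":
    print(" ".join(opencl_link_flags()))
```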

HolyWu
23rd December 2017, 14:17
Update r3.


EEDI3: Fix incorrect data type in AVX512 code path for 32 bit depth.
EEDI3CL: Fix decimal-point character issue in different locales.
EEDI3CL: Use snprintf to convert floating point to string for a more precise value.
EEDI3CL: Store the compiled binary for reuse in the offline cache located in $HOME/.boost_compute on UNIX-like systems and in %APPDATA%/boost_compute on Windows.
EEDI3CL: Minor performance improvement by explicitly specifying the work-group size.
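
The decimal-point item is easiest to see with an example: if a float is turned into text with a locale-aware formatter, a locale with a decimal comma gives something like 0,2, which is not a valid C or OpenCL literal. The plugin is C++ and the changelog does not say how it handles this internally, so the Python sketch below only illustrates the general idea. The function name and the choice of 9 significant digits (enough to round-trip a 32-bit float) are assumptions.

```python
import math


def float_to_c_literal(value):
    """Format a finite float as a single-precision C/OpenCL literal.

    Python's format() with the 'g' type does not consult the process locale,
    so the decimal separator is always '.'.
    """
    if not math.isfinite(value):
        raise ValueError("only finite values can be written as literals")
    text = format(value, ".9g")
    if "." not in text and "e" not in text:
        text += ".0"  # "20" -> "20.0" so that the 'f' suffix is valid
    return text + "f"


if __name__ == "__main__":
    for sample in (0.2, 20.0, 1e-10):
        print(float_to_c_literal(sample))
```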

Myrsloik
11th July 2018, 00:23
Would you consider renaming the dll to eedi3m for the next release? Having the same filename creates even more confusion. No hurry.

HolyWu
12th July 2018, 03:06
Since there probably won't be another release for a long time unless a bug is found, I just re-uploaded the archive to the GitHub release with only the filename changed.

jackoneill
4th December 2018, 11:27
Hi!

Once upon a time, cretindesalpes added the mclip parameter:


- Added an optional mask to process only the specified parts.
It’s helpful when EEDI3 is used as an anti-aliasing processor on cartoon-like materials.
Additional 1.2×–2× speedup, depending on the source.


Why doesn't your version have it? Was it very annoying to implement? The reason I'm interested is that I'm porting the xaa antialiasing script, which uses eedi3 with a mask (among other filters).

DJATOM
4th December 2018, 16:03
Yeah, +1 for mask option.

HolyWu
11th December 2018, 15:03
Update r4.


EEDI3: Add parameter mclip. Note that code paths of SSE4.1 and above could be slower than SSE2 when mclip is used.



Was it very annoying to implement?

Yes, it was damn annoying. :devil:
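
The idea behind a processing mask can be shown without VapourSynth. In the toy Python sketch below, an expensive function runs only where the mask is non-zero and a cheap fallback is used elsewhere, which is where the speedup quoted earlier in the thread would come from. Everything here is hypothetical (the function name, the list-of-rows frame layout, the two callbacks), and it says nothing about how EEDI3 implements mclip internally.

```python
def process_with_mask(frame, mask, expensive, cheap):
    """Apply `expensive` where the mask is non-zero and `cheap` elsewhere.

    frame and mask are equally sized lists of rows. Returns the new frame
    and the number of pixels that took the expensive path.
    """
    same_rows = len(frame) == len(mask)
    if not same_rows or any(len(r) != len(m) for r, m in zip(frame, mask)):
        raise ValueError("frame and mask must have the same dimensions")
    out = []
    expensive_calls = 0
    for row, mask_row in zip(frame, mask):
        new_row = []
        for pixel, selected in zip(row, mask_row):
            if selected:
                new_row.append(expensive(pixel))
                expensive_calls += 1
            else:
                new_row.append(cheap(pixel))
        out.append(new_row)
    return out, expensive_calls


if __name__ == "__main__":
    frame = [[10, 20], [30, 40]]
    mask = [[0, 255], [0, 0]]
    result, calls = process_with_mask(frame, mask, lambda p: p * 2, lambda p: p)
    print(result, calls)
```

With a mask that selects only edges, most pixels take the cheap path, so the saving depends on how sparse the mask is.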

jackoneill
12th December 2018, 12:40
Thank you!

edcrfv94
12th December 2018, 13:51
Thanks.

Maybe NNEDI3 needs it too; then you could use your own mask in place of the pscrn mask (e.g. NNEDI3(mclip=mask, pscrn=0)).