SangNom for VapourSynth


james1201
5th August 2016, 14:47
Source: https://bitbucket.org/James1201/vapoursynth-sangnom/overview

Binary: https://bitbucket.org/James1201/vapoursynth-sangnom/downloads

Hello, I rewrote SangNom for VapourSynth.
The original code is from the AviSynth SangNom2 plugin written by tp7.

This filter supports 8 to 16 bit integer and 32 bit float input, while the AviSynth SangNom2 only supports 8 bit integer input.

See the Source page for more information.


r40
remove param algo and the new algorithm, always use the original one now.
add new param dh.
fix a bug where keep bottom field doesn't work.
add support for specifying aa values for different planes by using array.
r37
r36
fix a bug where copyField didn't copy whole lines in 16 and 32 bit formats.
fix a bug where the last block was computed twice. luckily it didn't affect the output.
check that the video height is an even number.
add SSE2 code for float type input.
add configure scripts for linux users.

littlepox
5th August 2016, 17:30
https://github.com/HomeOfVapourSynthEvolution/VapourSynth-SangNomMod

Have you compared your version with the existing one?

james1201
5th August 2016, 17:45
Sure, the version you posted is the one I was using before.

Note that the default for 'aa' differs between them.

old one: aa=48, aac=0 by default

mine: aa=48 for all planes by default

feisty2
5th August 2016, 17:46
Well, any algorithmic differences?

feisty2
5th August 2016, 17:53
Hah, I don't really use sangnom and just compared the docs a few secs ago, I assume the main difference is high bitdepth support?

james1201
5th August 2016, 17:59
For algo=0 (the default), the output is exactly the same as the old one.

For algo=1, the average values are computed more accurately, but it's much slower.

I found that algo=1 might be the correct one after I reversed the assembly code back into plain C code...
perhaps I'm wrong?

However, algo=1 isn't necessary: it's too slow and may not produce a much better result IMO.

So just use the default, algo=0, which is the same as the old AviSynth SangNom2.

Tormaid
5th August 2016, 23:51
High bit-depth support is VERY welcome. Thank you!

littlepox
6th August 2016, 05:17
I ran some tests on my laptop, with:
Core i5 2410M (2.3GHz 2C4T)
8GB DDR3 DC
Windows 7 64bit
VS R32 64bit

Test Script:

import vapoursynth as vs
import sys
import havsfunc as haf
import mvsfunc as mvf

core = vs.get_core(accept_lowercase=True,threads=6)

core.max_cache_size = 8000

a = "00000.m2ts"
src = core.lsmas.LWLibavSource(a,threads=1).std.Trim(0,100)
src16 = core.fmtc.resample(src,3840,2160)
src8 = core.fmtc.bitdepth(src16,bits=8)

res = core.sangnom.SangNomMod(src8,aac=48)

res.set_output()

The result is:
http://img.2222.moe/images/2016/08/06/logo.png

littlepox
6th August 2016, 05:29
Conclusion:

1. For algo=0, the new filter (SangNom) and the old one (SangNomMod(aac=48)) produce identical output; the images are bit-for-bit the same.
2. For algo=0 and 8-bit input/output, the new filter is slightly faster than the old one. I only ran the test 3 times, so the numbers may be inaccurate, but at least the new one shouldn't be slower.
3. Using algo=1 is a bit slower, as expected.
4. Using 16-bit input/output is slower, as expected.

5. SangNom is primarily used for edge anti-aliasing. In some cases you want to filter flat areas separately (denoising/debanding/deblocking) and then mask-merge the edge and non-edge components.
In that case the precision of SangNom is insignificant, since only the edge part is kept afterwards. You may still want to run it inside a 16-bit filter chain, like: bitdepth(bits=8).sangnom().bitdepth(bits=16)
I've tested this usage and found it still faster than native 16-bit processing.
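
The 8-bit round trip from point 5 could be sketched roughly like this. It is only a sketch under assumptions: the helper name is made up, `src16` is the 16-bit clip to anti-alias, `flat16` is the separately filtered 16-bit clip, and `edge_mask16` is an edge mask in the same format as both. The `core` object is passed in so the function itself does not depend on how the script was set up.

```python
def sangnom_8bit_roundtrip(core, src16, flat16, edge_mask16, aa=48):
    """Hypothetical helper: run SangNom in 8 bit inside a 16-bit chain."""
    # Drop to 8 bit only for the anti-aliasing step.
    src8 = core.fmtc.bitdepth(src16, bits=8)
    aa8 = core.sangnom.SangNom(src8, aa=aa)
    # Go back to 16 bit so the clip matches the rest of the chain.
    aa16 = core.fmtc.bitdepth(aa8, bits=16)
    # Where the mask is set (edges) take the anti-aliased clip,
    # everywhere else keep the separately filtered one.
    return core.std.MaskedMerge(flat16, aa16, edge_mask16)
```

Since only the masked edge pixels of the anti-aliased clip survive the merge, the precision lost in the 8-bit step should matter little, which is the point made above.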


Overall, great job on this new implementation. I'm going to switch to the new one.

james1201
6th August 2016, 11:14
Thanks for the testing.

Your conclusion is right. The new one is slightly faster than the old one in 8 bit.
In my testing it's about 81 fps vs 76 fps (~6% faster).
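
For anyone comparing their own numbers, the percentage here is just the relative fps gain over the old plug-in. A tiny sketch (the function name is made up for illustration):

```python
def speedup_percent(old_fps, new_fps):
    """Relative speed gain of new_fps over old_fps, in percent."""
    return (new_fps - old_fps) / old_fps * 100.0

# The 8-bit figures quoted above: 76 fps (old) vs 81 fps (new).
print(round(speedup_percent(76, 81), 1))  # 6.6
```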

Are_
6th August 2016, 14:26
Really nice job there james.
In my benches I got ~57 fps for the old plug-in and ~97 fps for yours (YUV420P8 3840x2160), that's a 70% speed improvement!
And having the option for higher bit depths is really nice. :)

(Anyway, I don't really understand why I'm seeing such a big speed improvement and other people aren't)

Edit:
import vapoursynth as vs

core = vs.get_core()

clip = core.lsmas.LWLibavSource('1920x1080YUV420P8.mkv')

clip = core.resize.Bicubic(clip, 1920*2, 1080*2)

# clip = core.sangnom.SangNom(clip, order=0, aa=48, algo=0, planes=0)
clip = core.sangnom.SangNomMod(clip, order=0, aa=48, aac=0)

clip.set_output()

littlepox
6th August 2016, 15:01
For the old SangNomMod, you need to set aac=-1 to disable chroma AA completely; setting it to zero does NOT mean doing nothing.

Are_
6th August 2016, 15:07
Oops, you're right, thanks for pointing it out. Now I get ~89 fps for the old plug-in, so that's an ~8% speed improvement.

james1201
6th August 2016, 15:09
NOTE!!


clip = core.sangnom.SangNomMod(clip, order=0, aa=48, aac=0)
is equivalent to

clip = core.sangnom.SangNom(clip, order=0, aa=48, algo=0, planes=0).sangnom.SangNom(order=0, aa=0, algo=0, planes=[1, 2])


In the old SangNomMod,
aac=0 still does interpolation, but uses a threshold of 0 for chroma.

In the new SangNom,
if you only specify planes=0, the chroma is just copied without any modification.
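
Putting the aac rules from this thread together, a wrapper that roughly mimics SangNomMod(aa, aac) with the new filter might look like the sketch below. The helper name is made up, treating every negative aac like aac=-1 is my assumption, and it only relies on the order, aa and planes parameters used in the calls above.

```python
def sangnom_like_mod(core, clip, order, aa, aac):
    """Hypothetical wrapper: approximate SangNomMod(aa, aac) with SangNom."""
    # Luma is always processed with the aa strength.
    out = core.sangnom.SangNom(clip, order=order, aa=aa, planes=[0])
    if aac < 0:
        # aac=-1 disabled chroma AA in the old plug-in. Leaving chroma out
        # of planes makes the new filter copy it through untouched.
        return out
    # aac >= 0 (including 0) still interpolates chroma, so run a second
    # pass on the chroma planes with aac as the strength.
    return core.sangnom.SangNom(out, order=order, aa=aac, planes=[1, 2])
```

With aac=0 this produces the same two chained calls as the equivalence shown above, and with aac=-1 it reduces to the planes=0 call from the benchmark script.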

Myrsloik
6th August 2016, 15:42
I looked at the code and I'm wondering why the SSE path isn't used for float. Did you forget that, or is it untested/not working?

I can also make a patch for you that changes the code to SSE2 instructions only. It shouldn't make a real performance difference, but it would work for more people.

You should also think about the poor people who want to process video on their ARM CPUs and make it possible to compile for other CPUs...

james1201
6th August 2016, 17:03
There seems to be no performance gain from using the SSE code for float,
so I just used the C version for it.



Any patches are welcome. Thanks!

Myrsloik
6th August 2016, 17:11
I'm lazy so here's the whole file (https://dl.dropboxusercontent.com/u/73468194/sangnom_c.cpp) with changes. Grab whatever you want from it.

Highlights:
Only needs sse2 now, no measurable performance loss
Fixes vs2015 compilation (which somehow produces a faster binary than your gcc one)
Puts all the sse code under #ifdef VS_TARGET_CPU_X86 so a plain version can be compiled on other platforms
Fixes some conversion warnings

I tried to get the float sse to compile but it seems it's missing a float version of processBuffers_org_sse. I can only test the new version after commenting out stuff?

james1201
6th August 2016, 18:00
Thanks for the patch, I have pushed it into the repo.

PS:
My VS2015 broke after I updated it to the new version yesterday,
so I can only provide the binary built by gcc for now...

Selur
6th August 2016, 19:09
Just wondering: has anyone tried modifying havsfunc.py->santiag to use libsangnom instead of vs_sangnommod?

james1201
29th August 2016, 08:29
Updated the binary to r36.

Some minor bugs fixed.


r36
fix a bug where copyField didn't copy whole lines in 16 and 32 bit formats.
fix a bug where the last block was computed twice. luckily it didn't affect the output.
check that the video height is an even number.
add SSE2 code for float type input.
add configure scripts for linux users.

littlepox
30th August 2016, 12:11
Feature request:

1. Support separate aa settings per plane, like aa=[48,24] or aa=[48,24,36].
2. Support dh mode (double height): treat the whole picture as a top field or bottom field, then interpolate it.

Both can currently be done with some extra work; I'd just like to be able to simplify our code a lot.

Myrsloik
30th August 2016, 12:14
You'll get the same speed just by doing something like this:
SangNom(SangNom(clip, aa=24, planes=[1, 2]), aa=48, planes=0)

Since it only processes one plane at a time anyway and passing the other planes through is free.

littlepox
30th August 2016, 12:17
I'm currently writing something like this:

1. core.sangnom.SangNom(clip, aa=48, planes=0).sangnom.SangNom(aa=24, planes=[1, 2])

2. core.resize.Point(clip, w, h*2).sangnom.SangNom().fmtc.resample(sy=[-0.5,-1,-1]) # resample to fix the center shift.
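
Spelled out step by step, the double-height workaround in line 2 might look like the sketch below. The function name is made up, the sy offsets are taken as posted rather than re-derived, and it assumes the resize, sangnom and fmtc plugins are all loaded.

```python
def double_height_aa(core, clip, aa=48):
    """Hypothetical helper: emulate a double-height mode with existing filters."""
    # Point-resize to double height so every source line appears twice.
    doubled = core.resize.Point(clip, clip.width, clip.height * 2)
    # SangNom keeps one field and re-interpolates the other one.
    interpolated = core.sangnom.SangNom(doubled, aa=aa)
    # Shift the planes vertically to undo the center shift from the doubling.
    return core.fmtc.resample(interpolated, sy=[-0.5, -1, -1])
```

The dh parameter listed in the r40 changelog in this thread should make this workaround unnecessary on builds that have it.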

james1201
31st August 2016, 11:27
OK, I've finished this.
I also fixed the order=2 (keep bottom field) problem I found last night...
I hadn't tested it properly before.


r40
remove param algo and the new algo, always use the original one now.
add new param dh.
fix a bug where keep bottom field doesn't work.
add support for specifying aa values for different planes by using array.
r37

littlepox
31st August 2016, 12:37
Tons of thanks.;)

jackoneill
27th February 2019, 20:36
Hello.

There is a bug in the way you handle the order parameter: https://bitbucket.org/James1201/vapoursynth-sangnom/src/5a00bb64258d3d061ad4044caaf5447d42368b3f/src/sangnom.cpp?at=default&fileviewer=file-view-default#sangnom.cpp-1734

According to the documentation, the default value for order is 1 (keep top field). Thus you should have

// Read "order"; err is set when the caller did not pass it.
d->order = int64ToIntS(vsapi->propGetInt(in, "order", 0, &err));
if (err)
    d->order = SNOT_SFR_KT; // documented default: keep top field


As for the part where you examine the _FieldBased property, that belongs in the sangnomGetFrame function, because the property is only found attached to frames (http://www.vapoursynth.com/doc/api/vapoursynth.h.html#getframepropsro). And of course you only need to examine that property when order is SNOT_DFR (0).


Another thing I noticed is that you copy any unprocessed planes using memcpy. You could avoid this copying if you used the newVideoFrame2 (http://www.vapoursynth.com/doc/api/vapoursynth.h.html#newvideoframe2) function, because VapourSynth has a copy-on-write system. Here's how you can use this function to let unprocessed planes pass through without copying them:

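// nullptr: allocate a fresh plane that the filter will write to.
// src: reuse that plane from src without copying it.
const VSFrameRef *plane_src[3] = {
    d->process[0] ? nullptr : src,
    d->process[1] ? nullptr : src,
    d->process[2] ? nullptr : src
};

// Plane i of dst comes from plane planes[i] of plane_src[i].
int planes[3] = { 0, 1, 2 };

VSFrameRef *dst = vsapi->newVideoFrame2(d->vi->format, d->vi->width, d->vi->height, plane_src, planes, src, core);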

ChaosKing
27th February 2019, 21:58
His last login was in 2016 ... it would be better to open an issue on bitbucket I guess.

jackoneill
27th February 2019, 22:30
But I don't want to make an account. :(

ChaosKing
27th February 2019, 22:41
I think you can create them without an account on bitbucket. Just solve the captcha :D

jackoneill
14th March 2019, 19:02
There is a bug in the way you handle the order parameter: https://bitbucket.org/James1201/vapoursynth-sangnom/src/5a00bb64258d3d061ad4044caaf5447d42368b3f/src/sangnom.cpp?at=default&fileviewer=file-view-default#sangnom.cpp-1734


Okay, I think I fixed it: https://github.com/dubhater/vapoursynth-sangnom/releases/tag/r41


r41
Fix order=0 (double rate output). It was always keeping the top field
instead of alternating between top field and bottom field.
Add Meson build system.
r40

sl1pkn07
8th September 2020, 16:28
All your Mercurial repos are gone.

greetings