5th August 2016, 14:47 | #1 | Link |
Registered User
Join Date: Feb 2013
Posts: 9
|
SangNom for VapourSynth
Source: https://bitbucket.org/James1201/vapo...ngnom/overview
Binary: https://bitbucket.org/James1201/vapo...gnom/downloads
Hello, I rewrote SangNom for VapourSynth. The original code is from the AVISynth SangNom2 written by tp7. This filter supports 8 to 16 bit integer and 32 bit float input, while the AVISynth SangNom2 only supports 8 bit integer input. See the Source page for more information.
Code:
r40
    remove param algo and the new algorithm; always use the original one now.
    add new param dh.
    fix a bug where keeping the bottom field didn't work.
    add support for specifying aa values for different planes by using an array.
r37
r36
    fix a bug where copyField didn't copy the whole lines in 16 and 32 bit formats.
    fix a bug that computed the last block twice. luckily it didn't affect the output.
    check that the video height is an even number.
    add SSE2 code for float type input.
    add configure scripts for linux users.
Last edited by james1201; 31st August 2016 at 11:28. |
5th August 2016, 17:30 | #2 | Link |
Registered User
Join Date: Nov 2012
Posts: 218
|
https://github.com/HomeOfVapourSynth...nth-SangNomMod
Have you compared your version with the existing one? |
5th August 2016, 17:45 | #3 | Link | |
Registered User
Join Date: Feb 2013
Posts: 9
|
Quote:
Note that the default for the 'aa' parameter differs between them.
old one: aa=48, aac=0 by default
mine: aa=48 for all planes by default |
|
5th August 2016, 17:59 | #6 | Link |
Registered User
Join Date: Feb 2013
Posts: 9
|
For algo=0 (the default), the output is exactly the same as the old one.
For algo=1, the average values are computed more accurately, but it is much slower. I found that algo=1 might be the correct one after I reversed the assembly code back into plain C code... perhaps I'm wrong? However, it's not necessary to use algo=1: it's too slow and may not produce a much better result IMO. So just use the default, algo=0, which is the same as the old AVISynth SangNom2. |
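The thread does not spell out how the two modes compute their averages, so the following is only a generic illustration, not the filter's code. It assumes the fast path chains rounded-up pairwise averages (the way SIMD byte-average instructions behave) while a more accurate path sums first and rounds once; the function names are hypothetical.

```python
def pavg(a: int, b: int) -> int:
    """Rounded-up average of two unsigned ints (how SSE2 pavgb/pavgw round)."""
    return (a + b + 1) >> 1


def chained_avg4(a: int, b: int, c: int, d: int) -> int:
    """Average four values by chaining pairwise averages: rounds twice."""
    return pavg(pavg(a, b), pavg(c, d))


def exact_avg4(a: int, b: int, c: int, d: int) -> int:
    """Average four values with a single rounding step at the end."""
    return (a + b + c + d + 2) >> 2


if __name__ == "__main__":
    samples = (0, 0, 0, 1)
    # The chained form accumulates rounding error; the exact form does not.
    print("chained:", chained_avg4(*samples), "exact:", exact_avg4(*samples))
```

The difference is at most a code value or so per pixel, which fits the remark above that the slower mode may not produce a visibly better result.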
6th August 2016, 05:17 | #8 | Link |
Registered User
Join Date: Nov 2012
Posts: 218
|
I ran some tests on my laptop, with:
Core i5 2410M (2.3GHz 2C4T)
8GB DDR3 DC
Windows 7 64bit
VS R32 64bit
Test Script:
import vapoursynth as vs
import sys
import havsfunc as haf
import mvsfunc as mvf
core = vs.get_core(accept_lowercase=True,threads=6)
core.max_cache_size = 8000
a = "00000.m2ts"
src = core.lsmas.LWLibavSource(a,threads=1).std.Trim(0,100)
src16 = core.fmtc.resample(src,3840,2160)
src8 = core.fmtc.bitdepth(src16,bits=8)
res = core.sangnom.SangNomMod(src8,aac=48)
res.set_output()
The result is: |
6th August 2016, 05:29 | #9 | Link |
Registered User
Join Date: Nov 2012
Posts: 218
|
Conclusion:
1. For algo=0, the new filter (SangNom) and the old one (SangNomMod(aac=48)) produced identical output; the images they output are bit-identical.
2. For algo=0 and 8-bit input/output, the new filter is slightly faster than the old one. Of course I only tested 3 times, so this is likely to be inaccurate. Anyway, at least the new one shouldn't be slower.
3. Using algo=1 is a bit slower, as expected.
4. Using 16-bit input/output is slower, as expected.
5. SangNom is primarily used for edge anti-aliasing, and in some cases you want to filter flat areas separately (denoise/debanding/deblocking) and then mask-merge the edge and non-edge components. In that case the precision of SangNom is insignificant, since only the edge part is kept afterwards. However, you would still want to run the rest in a 16-bit filter chain, like: bitdepth(bits=8).sangnom().bitdepth(bits=16). I've tested this use and found it still faster than native 16-bit processing.
Overall, great job on this new implementation. I'm going to switch to the new one. Last edited by littlepox; 7th August 2016 at 10:12. |
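The workflow in point 5 could be sketched roughly as below. This is a minimal sketch under assumptions, not a tested recipe: the function name, the `edge_mask16` argument (a 16-bit mask that is bright on edges, built however you prefer) and the default `aa=48` are placeholders; `core` is passed in so that the function itself does not depend on a particular VapourSynth setup.

```python
def aa_edges_in_16bit_chain(core, clip16, edge_mask16, aa=48):
    """Run SangNom in 8 bit and keep its result only where the mask marks edges.

    clip16      -- 16-bit clip, already filtered in flat areas (assumption)
    edge_mask16 -- 16-bit edge mask with the same format as clip16 (assumption)
    """
    # Drop to 8 bit only for the anti-aliasing step.
    clip8 = core.fmtc.bitdepth(clip16, bits=8)
    aa8 = core.sangnom.SangNom(clip8, aa=aa)
    # Return to 16 bit so the result can be merged into the 16-bit chain.
    aa16 = core.fmtc.bitdepth(aa8, bits=16)
    # Flat areas come from clip16 at full precision, edges from the AA'd clip.
    return core.std.MaskedMerge(clip16, aa16, edge_mask16)
```

Because only masked edge pixels come from the 8-bit branch, the flat areas keep whatever precision the 16-bit chain produced.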
6th August 2016, 14:26 | #11 | Link |
Registered User
Join Date: Jun 2012
Location: Ibiza, Spain
Posts: 321
|
Really nice job there james.
In my benches I got ~57 fps for the old plug-in and ~97 fps for yours (YUV420P8 3840x2160), that's a 70% speed improvement! And having the option for higher bit depths is really nice. (Anyway, I don't really understand why I'm seeing such a great speed improvement and other people don't.)
Edit:
Code:
import vapoursynth as vs
core = vs.get_core()
clip = core.lsmas.LWLibavSource('1920x1080YUV420P8.mkv')
clip = core.resize.Bicubic(clip, 1920*2, 1080*2)
# clip = core.sangnom.SangNom(clip, order=0, aa=48, algo=0, planes=0)
clip = core.sangnom.SangNomMod(clip, order=0, aa=48, aac=0)
clip.set_output()
Last edited by Are_; 6th August 2016 at 14:29. |
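For reference, the 70% figure follows directly from the two approximate frame rates quoted in this post. A small sketch of the arithmetic (the helper name is made up for illustration):

```python
def speedup_percent(old_fps: float, new_fps: float) -> float:
    """Relative throughput gain of new_fps over old_fps, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0


if __name__ == "__main__":
    # ~57 fps (old plug-in) vs ~97 fps (new filter), as reported above.
    print(f"{speedup_percent(57, 97):.0f}% faster")
```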
6th August 2016, 15:09 | #14 | Link | |
Registered User
Join Date: Feb 2013
Posts: 9
|
Quote:
Code:
clip = core.sangnom.SangNomMod(clip, order=0, aa=48, aac=0)
Code:
clip = core.sangnom.SangNom(clip, order=0, aa=48, algo=0, planes=0).sangnom.SangNom(order=0, aa=0, algo=0, planes=[1, 2])
aac=0 still does interpolation, but uses a threshold of 0 for chroma. In the new SangNom, if you only specify planes=0, the chroma planes are just copied without any modification. |
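The r40 changelog in the first post says aa can be given per plane as an array and that algo was removed, so the two chained calls above might collapse into one call on r40 or later. The sketch below is an assumption about how that array is written (one value per plane, in Y, U, V order, with all planes processed by default); the function name is hypothetical, so check the Source page before relying on it.

```python
def sangnommod_like_defaults(core, clip, luma_aa=48):
    """Approximate SangNomMod(aa=48, aac=0) with a single r40+ SangNom call.

    ASSUMPTION: aa accepts a list with one threshold per plane (Y, U, V)
    and every plane is processed when 'planes' is left at its default.
    """
    return core.sangnom.SangNom(clip, order=0, aa=[luma_aa, 0, 0])
```

Passing `core` in keeps the function usable with whichever core object a script already created.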
|
6th August 2016, 15:42 | #15 | Link |
Professional Code Monkey
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,548
|
I looked at the code and I'm wondering why the SSE path isn't used for float. Did you forget that, or is it untested/not working?
I can also make a patch for you that changes the code to SSE2 instructions only. It shouldn't make a real performance difference, but it would work for more people. You should also think about the poor people who want to process video on their ARM CPUs and make it possible to compile for other CPUs...
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet |
6th August 2016, 17:03 | #16 | Link | ||
Registered User
Join Date: Feb 2013
Posts: 9
|
Quote:
So I just used the C version of the code for it.
Quote:
|
||
6th August 2016, 17:11 | #17 | Link |
Professional Code Monkey
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,548
|
I'm lazy so here's the whole file with changes. Grab whatever you want from it.
Highlights:
Only needs SSE2 now, no measurable performance loss
Fixes VS2015 compilation (which somehow produces a faster binary than your gcc one)
Puts all the SSE code under #ifdef VS_TARGET_CPU_X86 so a plain version can be compiled on other platforms
Fixes some conversion warnings
I tried to get the float SSE to compile but it seems it's missing a float version of processBuffers_org_sse. I can only test the new version after commenting out stuff?
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet Last edited by Myrsloik; 6th August 2016 at 17:55. Reason: Add question |
6th August 2016, 18:00 | #18 | Link | |
Registered User
Join Date: Feb 2013
Posts: 9
|
Quote:
PS: my VS2015 broke after I updated it to the new version yesterday, so I can only provide the binary built by gcc for now... |
|
29th August 2016, 08:29 | #20 | Link |
Registered User
Join Date: Feb 2013
Posts: 9
|
Updated the binary to r36.
Some minor bugs fixed.
Code:
r36
    fix a bug where copyField didn't copy the whole lines in 16 and 32 bit formats.
    fix a bug that computed the last block twice. luckily it didn't affect the output.
    check that the video height is an even number.
    add SSE2 code for float type input.
    add configure scripts for linux users.
|