Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

5th August 2016, 14:47   #1
james1201
Registered User
Join Date: Feb 2013
Posts: 9
SangNom for VapourSynth

Source: https://bitbucket.org/James1201/vapo...ngnom/overview

Binary: https://bitbucket.org/James1201/vapo...gnom/downloads

Hello, I rewrote SangNom for VapourSynth.
The original code is from the AviSynth SangNom2 written by tp7.

This filter supports 8 to 16-bit integer and 32-bit float input, while the AviSynth SangNom2 only supports 8-bit integer input.

See the Source page for more information.

Code:
r40
    remove param algo and the new algorithm; always use the original one now.
    add new param dh.
    fix a bug where keeping the bottom field didn't work.
    add support for specifying aa values per plane by passing an array.
r37
r36
    fix a bug where copyField didn't copy whole lines in 16-bit and 32-bit formats.
    fix a bug that computed the last block twice. Luckily it didn't affect the output.
    check that the video height is an even number.
    add SSE2 code for float type input.
    add configure scripts for Linux users.
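The "keep bottom field" fix and the even-height check above both relate to how SangNom works on fields: it keeps one field (every other line) and rebuilds the lines of the other. The pure-Python sketch below only illustrates that structure and is not SangNom's algorithm. Plain vertical averaging stands in for its edge-directed interpolation, `rebuild_from_field` and `keep_top` are hypothetical names rather than the plugin's parameters, frames are assumed to be lists of integer rows, and the reason given for rejecting odd heights is an assumption.

```python
def rebuild_from_field(frame, keep_top=True):
    """Keep one field of `frame` (a list of equal-length rows) and rebuild the
    other field's lines from the rounded average of the kept lines around them.

    Toy stand-in only: SangNom uses edge-directed interpolation, not this."""
    height = len(frame)
    if height % 2 != 0:
        # Like r36, reject odd heights (assumed reason: both fields need
        # the same number of lines).
        raise ValueError("frame height must be even")
    out = [list(row) for row in frame]
    first_missing = 1 if keep_top else 0  # first line of the discarded field
    for y in range(first_missing, height, 2):
        # Both neighbours belong to the kept field; mirror at the frame edges.
        above = y - 1 if y > 0 else y + 1
        below = y + 1 if y + 1 < height else y - 1
        out[y] = [(a + b + 1) // 2 for a, b in zip(frame[above], frame[below])]
    return out
```

With `keep_top=False`, the even lines are rebuilt from the odd (bottom-field) lines instead.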

Last edited by james1201; 31st August 2016 at 11:28.
5th August 2016, 17:30   #2
littlepox
Registered User
Join Date: Nov 2012
Posts: 218
https://github.com/HomeOfVapourSynth...nth-SangNomMod

Have you compared your version with the existing one?
5th August 2016, 17:45   #3
james1201
Registered User
Join Date: Feb 2013
Posts: 9
Quote:
Originally Posted by littlepox
https://github.com/HomeOfVapourSynth...nth-SangNomMod

Have you compared your version with the existing one?
Sure, the version you posted is the original one I used before.

Note that the default 'aa' parameter differs between them.

Old one: aa=48, aac=0 by default

Mine: aa=48 for all planes by default
5th August 2016, 17:46   #4
feisty2
I'm Siri
Join Date: Oct 2012
Location: void
Posts: 2,633
Well, any algorithmic differences?
5th August 2016, 17:53   #5
feisty2
I'm Siri
Join Date: Oct 2012
Location: void
Posts: 2,633
Hah, I don't really use SangNom and only compared the docs a few seconds ago. I assume the main difference is high bit depth support?
5th August 2016, 17:59   #6
james1201
Registered User
Join Date: Feb 2013
Posts: 9
Quote:
Originally Posted by feisty2
Well, any algorithmic differences?
For algo=0, it's exactly the same as the old one.
(This is the default.)

For algo=1, it computes the average values more accurately, but is much slower.

After converting the assembly code back into plain C code, I found that algo=1 might be the correct one...
Perhaps I'm wrong?

However, it's not necessary to use algo=1: it's too slow and, IMO, may not produce a much better result.

So just use the default, algo=0, which is the same as the old AviSynth SangNom2.
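The thread does not say exactly how algo=1 computes its averages differently. As a generic, hypothetical illustration (not SangNom's code) of how "more accurate" averaging can change output: SIMD code often chains pairwise rounded averages, such as SSE2's `_mm_avg_epu8`, which computes (a + b + 1) >> 1, and the result can differ from a single exactly rounded average.

```python
def avg_round(a, b):
    """Pairwise average rounded up, like SSE2's _mm_avg_epu8: (a + b + 1) >> 1."""
    return (a + b + 1) >> 1


def avg4_chained(a, b, c, d):
    """Four-value average from nested pairwise rounded averages (fast SIMD style)."""
    return avg_round(avg_round(a, b), avg_round(c, d))


def avg4_exact(a, b, c, d):
    """Four-value average with a single rounding step (more accurate)."""
    return (a + b + c + d + 2) >> 2
```

For example, with inputs 0, 0, 0, 1 the chained version returns 1 while the exact one returns 0, because the intermediate rounding up is carried into the final result.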
5th August 2016, 23:51   #7
Tormaid
Registered User
Join Date: Dec 2011
Posts: 24
High bit depth support is VERY welcome. Thank you!
6th August 2016, 05:17   #8
littlepox
Registered User
Join Date: Nov 2012
Posts: 218
I did some tests on my laptop, with:
Core i5 2410M (2.3GHz 2C4T)
8GB DDR3 DC
Windows 7 64bit
VS R32 64bit

Test Script:

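import vapoursynth as vs
import sys
import havsfunc as haf
import mvsfunc as mvf

core = vs.get_core(accept_lowercase=True,threads=6)

core.max_cache_size = 8000

a = "00000.m2ts"
# Source, trimmed to frames 0-100 for the benchmark
src = core.lsmas.LWLibavSource(a,threads=1).std.Trim(0,100)
# Upscale to 3840x2160 (fmtc.resample outputs 16-bit)
src16 = core.fmtc.resample(src,3840,2160)
# Convert back to 8-bit for the 8-bit test
src8 = core.fmtc.bitdepth(src16,bits=8)

# Old plugin; aac=48 matches the new filter's default of aa=48 on all planes
res = core.sangnom.SangNomMod(src8,aac=48)

res.set_output()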

The result is:
6th August 2016, 05:29   #9
littlepox
Registered User
Join Date: Nov 2012
Posts: 218
Conclusion:

1. For algo=0, the new filter (SangNom) and the old one (SangNomMod(aac=48)) produce identical output; the images are bit-wise identical.
2. For algo=0 and 8-bit input/output, the new filter is slightly faster than the old one. Of course, I only tested 3 times, so this may be inaccurate. Anyway, at least the new one shouldn't be slower.
3. Using algo=1 is a bit slower, as expected.
4. Using 16-bit input/output is slower, as expected.

5. SangNom is primarily used for edge anti-aliasing. In some cases you want to filter flat areas separately (denoising, debanding, deblocking) and then mask-merge the edge and non-edge components.
In that case, the precision of SangNom is insignificant, since only the edge part is kept later. However, you still want to do this within a 16-bit filter chain, like: bitdepth(bits=8).sangnom().bitdepth(bits=16)
I've tested this approach and found it still faster than native 16-bit processing.
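Point 5 can be sketched without VapourSynth to show why the 8-bit round trip is harmless there. Assumptions, all illustrative: planes are flat lists of integers, 16-bit/8-bit conversion is a plain rounding shift with limited-range scaling by 256 (real filters such as fmtc.bitdepth may dither instead), the edge mask is 8-bit with 255 meaning "edge", and `aa_8bit` is a hypothetical stand-in for SangNom. Outside the mask, the output keeps the full 16-bit source, so only edge pixels carry 8-bit precision.

```python
def to_8bit(plane16):
    """16-bit -> 8-bit by rounding (plain shift; real filters may dither instead)."""
    return [min((v + 128) >> 8, 255) for v in plane16]


def to_16bit(plane8):
    """8-bit -> 16-bit by shifting left 8 bits (limited-range style scaling)."""
    return [v << 8 for v in plane8]


def mask_merge(base, overlay, mask, mask_max=255):
    """Per-pixel blend: mask 0 keeps `base`, mask `mask_max` keeps `overlay`."""
    return [(b * (mask_max - m) + o * m + mask_max // 2) // mask_max
            for b, o, m in zip(base, overlay, mask)]


def aa_edges_via_8bit(src16, edge_mask, aa_8bit):
    """bitdepth(8) -> anti-alias -> bitdepth(16), then keep the result on edges only.

    `aa_8bit` is a hypothetical stand-in for SangNom operating on an 8-bit plane."""
    aa16 = to_16bit(aa_8bit(to_8bit(src16)))
    return mask_merge(src16, aa16, edge_mask)
```

With an identity `aa_8bit`, non-edge pixels come back bit-exact while edge pixels are quantized to multiples of 256.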


Overall, great job for this new implementation. I'm going to switch to the new one.

Last edited by littlepox; 7th August 2016 at 10:12.
6th August 2016, 11:14   #10
james1201
Registered User
Join Date: Feb 2013
Posts: 9
Thanks for testing.

Your conclusion is right. The new one is slightly faster than the old one in 8-bit.
In my testing it's about 81 fps vs 76 fps (~6% faster).
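The percentages quoted in this thread follow from simple frame-rate ratios. A small sketch, assuming "X% faster" means new_fps / old_fps - 1; the median helper is a suggestion prompted by littlepox's note that three runs is a small sample, and the run times you feed it would be your own measurements.

```python
import statistics


def speedup_percent(old_fps, new_fps):
    """How much faster the new filter is, relative to the old one, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0


def median_fps(frame_count, run_seconds):
    """Median frame rate over several timed runs of the same script."""
    return statistics.median(frame_count / s for s in run_seconds)


print(f"{speedup_percent(76, 81):.1f}%")  # prints 6.6%
```

The same formula gives about 70% for the ~57 vs ~97 fps figures reported in the next post.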
6th August 2016, 14:26   #11
Are_
Registered User
Join Date: Jun 2012
Location: Ibiza, Spain
Posts: 321
Really nice job there, james.
In my benchmarks I got ~57 fps for the old plugin and ~97 fps for yours (YUV420P8 3840x2160), a 70% speed improvement!
And having the option for higher bit depths is really nice.

(Still, I don't really understand why I'm seeing such a large speed improvement when other people aren't.)

Edit:
Code:
import vapoursynth as vs

core = vs.get_core()

clip = core.lsmas.LWLibavSource('1920x1080YUV420P8.mkv')

clip = core.resize.Bicubic(clip, 1920*2, 1080*2)

# clip = core.sangnom.SangNom(clip, order=0, aa=48, algo=0, planes=0)
clip = core.sangnom.SangNomMod(clip, order=0, aa=48, aac=0)

clip.set_output()

Last edited by Are_; 6th August 2016 at 14:29.
6th August 2016, 15:01   #12
littlepox
Registered User
Join Date: Nov 2012
Posts: 218
For the old SangNomMod, you need to set aac=-1 to disable chroma AA completely; setting it to zero does NOT mean doing nothing.

Last edited by littlepox; 7th August 2016 at 08:15.
6th August 2016, 15:07   #13
Are_
Registered User
Join Date: Jun 2012
Location: Ibiza, Spain
Posts: 321
Oops, you're right, thanks for pointing it out. Now I get ~89 fps, and that's a ~8% speed improvement.
6th August 2016, 15:09   #14
james1201
Registered User
Join Date: Feb 2013
Posts: 9
Quote:
Originally Posted by Are_
clip = core.sangnom.SangNomMod(clip, order=0, aa=48, aac=0)
NOTE!!

Code:
clip = core.sangnom.SangNomMod(clip, order=0, aa=48, aac=0)
is equivalent to
Code:
clip = core.sangnom.SangNom(clip, order=0, aa=48, algo=0, planes=0).sangnom.SangNom(order=0, aa=0, algo=0, planes=[1, 2])
In the old SangNomMod, aac=0 still does interpolation but uses a threshold of 0 for chroma.

In the new SangNom, if you only specify planes=0, the chroma planes are just copied without any modification.
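The equivalence above can be expressed as a small translation table from old SangNomMod arguments to chained new SangNom calls. This is a sketch under stated assumptions: `sangnommod_to_sangnom` and `apply_calls` are hypothetical helpers, any negative aac is treated like aac=-1 (chroma left untouched), and algo is omitted because r40 removed it and algo=0 was already the default before that. In a real script the filter passed in would be core.sangnom.SangNom.

```python
def sangnommod_to_sangnom(order, aa, aac):
    """Translate SangNomMod(order, aa, aac) into keyword-argument sets for
    chained calls to the new SangNom, following the equivalence above.

    aac >= 0 -> chroma is still interpolated, using threshold aac.
    aac < 0 (e.g. -1) -> chroma AA is disabled, so chroma is left untouched."""
    calls = [{"order": order, "aa": aa, "planes": [0]}]  # luma pass
    if aac >= 0:
        calls.append({"order": order, "aa": aac, "planes": [1, 2]})  # chroma pass
    return calls


def apply_calls(clip, calls, sangnom):
    """Chain the calls. In a real script, `sangnom` would be core.sangnom.SangNom."""
    for kwargs in calls:
        clip = sangnom(clip, **kwargs)
    return clip
```

For example, `apply_calls(clip, sangnommod_to_sangnom(0, 48, -1), core.sangnom.SangNom)` runs only the luma pass, which matches littlepox's corrected benchmark setting.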
6th August 2016, 15:42   #15
Myrsloik
Professional Code Monkey
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,548
I looked at the code and I'm wondering why the SSE path isn't used for float. Did you forget it, or is it untested/not working?

I can also make a patch for you that changes the code to use SSE2 instructions only. It shouldn't make a real performance difference, but it will work for more people.

You should also think about the poor people who want to process video on their ARM CPUs, and make it possible to compile for other CPUs...
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
6th August 2016, 17:03   #16
james1201
Registered User
Join Date: Feb 2013
Posts: 9
Quote:
Originally Posted by Myrsloik
I looked at the code and I'm wondering why the SSE path isn't used for float. Did you forget it, or is it untested/not working?
There seems to be no performance gain from using the SSE code for float,
so I just used the C version for it.


Quote:
Originally Posted by Myrsloik
I can also make a patch for you that changes the code to use SSE2 instructions only. It shouldn't make a real performance difference, but it will work for more people.

You should also think about the poor people who want to process video on their ARM CPUs, and make it possible to compile for other CPUs...
Any patches are welcome.
6th August 2016, 17:11   #17
Myrsloik
Professional Code Monkey
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,548
I'm lazy so here's the whole file with changes. Grab whatever you want from it.

Highlights:
Only needs SSE2 now, no measurable performance loss
Fixes VS2015 compilation (which somehow produces a faster binary than your GCC one)
Puts all the SSE code under #ifdef VS_TARGET_CPU_X86 so a plain version can be compiled on other platforms
Fixes some conversion warnings

I tried to get the float SSE code to compile, but it seems to be missing a float version of processBuffers_org_sse. Can I only test the new version after commenting stuff out?

Last edited by Myrsloik; 6th August 2016 at 17:55. Reason: Add question
6th August 2016, 18:00   #18
james1201
Registered User
Join Date: Feb 2013
Posts: 9
Quote:
Originally Posted by Myrsloik
I'm lazy so here's the whole file with changes. Grab whatever you want from it.
Thanks for the patch, I have pushed it into the repo.

PS:
My VS2015 broke after I updated it to the new version yesterday,
so I can only provide a binary built with GCC for now...
6th August 2016, 19:09   #19
Selur
Registered User
Join Date: Oct 2001
Location: Germany
Posts: 7,259
Just wondering: has anyone tried modifying santiag in havsfunc.py to use libsangnom instead of vs_sangnommod?
__________________
Hybrid here in the forum, homepage
29th August 2016, 08:29   #20
james1201
Registered User
Join Date: Feb 2013
Posts: 9
Updated the binary to r36.

Some minor bugs fixed.

Code:
r36
    fix a bug where copyField didn't copy whole lines in 16-bit and 32-bit formats.
    fix a bug that computed the last block twice. Luckily it didn't affect the output.
    check that the video height is an even number.
    add SSE2 code for float type input.
    add configure scripts for Linux users.
All times are GMT +1.