Old 22nd March 2018, 09:56   #61  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,314
Thanks. Now it's fast!

But now it seems the VS integration could probably be better (compiler optimization?). It's surely not an MT issue, since even the single-threaded case is a bit slower.
This assumes my two scripts are comparable; please tell me if that's not the case.

(Win10, i7-7700)
Avs+ r2636
Avsmeter64: 264 fps (was: 41fps with r1c)
Avsmeter64: (thread=1, no prefetch): 90 fps

VapourSynth r43
VSEdit r18 Benchmark: 217 fps
VSEdit r18 Benchmark: (threads=1): 83 fps

VirtualDub2 r41462 (Direct Stream Copy, switched off input/output preview):
avs+: 172 fps (37fps with r1c)
vs r43: 150 fps
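
To put the figures above in relative terms, here is a small sketch that simply divides the reported fps values. The dictionary labels are made up for illustration; the numbers are the ones from this post. It works out to Avs+ being roughly 8 to 22 percent faster, depending on the test.

```python
# fps figures copied from the post above; the labels are illustrative only.
results = {
    "meter, 4 threads": {"avs": 264.0, "vs": 217.0},
    "meter, 1 thread": {"avs": 90.0, "vs": 83.0},
    "VirtualDub2": {"avs": 172.0, "vs": 150.0},
}


def ratio(avs_fps, vs_fps):
    """Return how many times faster the Avs+ run was than the VS run."""
    return avs_fps / vs_fps


ratios = {name: ratio(r["avs"], r["vs"]) for name, r in results.items()}

for name, value in ratios.items():
    print(f"{name}: Avs+ is {(value - 1) * 100:.1f}% faster")

# r1c -> r1d speedup on the Avs+ side (AVSMeter64 and VirtualDub2 figures).
r1d_speedup_meter = 264.0 / 41.0
r1d_speedup_vdub = 172.0 / 37.0
```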

Passing a YUV420P16 clip to Virtualdub2 runs through a P016 conversion, in both VapourSynth and Avisynth+, and VDub2 possibly makes further copies/conversions, which add extra overhead.
I was using this script to find the slowness before your r1d patch:
Code:
BlankClip(10000,1920,1080,"YUV420P8")
z_ConvertFormat(resample_filter="bicubic", pixel_type="RGBPS",colorspace_op="709:709:709:l=>rgb:linear:2020:l", dither_type="none") # r1c: slooooow, r1d: faaast
z_ConvertFormat(resample_filter="bicubic", pixel_type="YUV420P16",colorspace_op="rgb:linear:2020:l=>709:709:709:l",dither_type="ordered")
prefetch(4)
Code:
import vapoursynth as vs
core = vs.get_core(threads=4)
clip = core.std.BlankClip(width=1920, height=1080, length=10000, color=[0, 128, 128], format=vs.YUV420P8)
clip = core.resize.Bicubic(clip=clip, format=vs.RGBS, matrix_s="rgb", transfer_s="linear", primaries_s="2020", matrix_in_s="709", transfer_in_s="709", primaries_in_s="709", dither_type="none")
clip = core.resize.Bicubic(clip=clip, format=vs.YUV420P16, matrix_in_s="rgb", transfer_in_s="linear", primaries_in_s="2020", matrix_s="709", transfer_s="709", primaries_s="709", dither_type="ordered")
clip.set_output()

Last edited by pinterf; 22nd March 2018 at 16:33. Reason: scream->stream
Old 22nd March 2018, 10:04   #62  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,316
What is the difference between
Code:
z_ConvertFormat(pixel_type="RGBPS",colorspace_op="2020ncl:st2084:2020:l=>rgb:linear:2020:l", dither_type="none")
and the BT.2020 convert implemented in avs+ ?

Is it related to the formulas on pages 4 and 5 of Rec. BT.2100?
Or is it something found only in SMPTE 2084?
__________________
My github.
Old 22nd March 2018, 11:36   #63  |  Link
TheFluff
Excessively jovial fellow
 
Join Date: Jun 2004
Location: rude
Posts: 1,100
Quote:
Originally Posted by jpsdr View Post
What is the difference between
Code:
z_ConvertFormat(pixel_type="RGBPS",colorspace_op="2020ncl:st2084:2020:l=>rgb:linear:2020:l", dither_type="none")
and the BT.2020 convert implemented in avs+ ?

Is it related to the formulas on pages 4 and 5 of Rec. BT.2100?
Or is it something found only in SMPTE 2084?
I dunno, does Avs+ even have linear RGB builtin? As far as I know the builtin Avs+ BT2020 conversion doesn't touch the gamma at all. The conversion you quoted is SMPTE 2084 to linear, and the zimg ST2084 transfer functions are in zimg/colorspace/gamma.cpp line 228 and onwards.

zimg does support HLG too, but under the name ARIB B67, see zimg/colorspace/gamma.cpp line 154 and on.
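
For reference, the ST 2084 (PQ) curve discussed above can be sketched in a few lines. This is written from the formulas and constants in the standard, not copied from zimg's gamma.cpp, and zimg may scale the linear side differently. The function names are my own; 1.0 on the linear side is assumed to mean 10000 cd/m^2.

```python
import math

# SMPTE ST 2084 (PQ) constants as defined in the standard / BT.2100.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_eotf(e):
    """Map a PQ-coded value in [0, 1] to linear light in [0, 1].

    Assumption: 1.0 on the output side stands for 10000 cd/m^2.
    """
    p = e ** (1 / M2)
    num = max(p - C1, 0.0)
    den = C2 - C3 * p
    return (num / den) ** (1 / M1)


def pq_inverse_eotf(l):
    """Map linear light in [0, 1] back to a PQ-coded value in [0, 1]."""
    y = l ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2
```

The two functions are inverses of each other over the coded range, so a round trip such as pq_inverse_eotf(pq_eotf(0.5)) should come back to 0.5 up to floating-point error.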

Last edited by TheFluff; 22nd March 2018 at 11:40.
Old 22nd March 2018, 13:37   #64  |  Link
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
Quote:
Originally Posted by Stephen R. Savage View Post
I updated the first post with a new build.
Thank you!
Old 22nd March 2018, 13:41   #65  |  Link
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
Quote:
Originally Posted by pinterf View Post
Passing a YUV420P16 clip to Virtualdub2 runs through a P016 conversion
I thought those two were the same thing. DGSource() delivers CS_YUV420P16. Why would there need to be any conversion?
Old 22nd March 2018, 13:46   #66  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,314
Quote:
Originally Posted by videoh View Post
I thought those two were the same thing. DGSource() delivers CS_YUV420P16. Why would there need to be any conversion?
P016 is a semi-packed format which is needed for passing a YUV420P16 clip.
https://msdn.microsoft.com/en-us/lib...(v=vs.85).aspx
Similarly, Y416 is used for passing a YUV444P16 clip through the VfW interface that VirtualDub uses.
Old 22nd March 2018, 13:57   #67  |  Link
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
I see, thank you. If there are spurious conversions inside Avisynth+ or Windows due to VfW maybe that is another reason to prefer Vapoursynth and native Vapoursynth filters. It seems insane to me to convert to an intermediate semi-packed format.

And thank you for your test results. Hopefully, Myrsloik will comment on your performance numbers.

Last edited by videoh; 22nd March 2018 at 14:32.
Old 22nd March 2018, 14:21   #68  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,314
As with 8-bit formats, 10-16 bit formats are also negotiated through the VfW interface here. For YUV420P16 the fourcc code P016 is reported; this is what VirtualDub2 understands. This is a semi-packed format: the Y "plane" is followed immediately by UVUVUVUV... data, so that the chroma pitch is the same as that of the Y data. For YUV422P10, P210 is used, but you have two more choices: the V210 and Y3[10][10] fourcc codes. These other options can be set through Avisynth+ OPT_xxx variables, which should appear at the beginning of the script.
EDIT: this conversion is done once, in the output stage, because VirtualDub2 works through VfW. VapourSynth uses the same interface and method.
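
The semi-packed layout described above can be illustrated with a small pure-Python sketch. This is not the Avisynth+ or VapourSynth output code; pack_p016 is a hypothetical helper, and it assumes tightly packed rows (pitch equal to the width in samples), which a real VfW buffer does not guarantee.

```python
from array import array


def pack_p016(y, u, v, width, height):
    """Pack planar 4:2:0 16-bit planes into a P016-style buffer.

    y holds width*height samples; u and v each hold
    (width//2)*(height//2) samples. Rows are assumed tightly packed.
    """
    assert width % 2 == 0 and height % 2 == 0
    cw, ch = width // 2, height // 2
    assert len(y) == width * height
    assert len(u) == len(v) == cw * ch

    out = array("H", y)        # the Y "plane" comes first, unchanged
    for row in range(ch):
        for col in range(cw):
            i = row * cw + col
            out.append(u[i])   # chroma follows, interleaved as U, V, U, V, ...
            out.append(v[i])
    return out
```

Each interleaved chroma row holds width samples (width/2 U plus width/2 V), which is why the pitch matches that of the Y rows.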

Last edited by pinterf; 22nd March 2018 at 14:24.
Old 22nd March 2018, 14:45   #69  |  Link
TheFluff
Excessively jovial fellow
 
Join Date: Jun 2004
Location: rude
Posts: 1,100
Yeah, for VfW you need the packing unfortunately. VS also uses P016 for YUV420P16 when outputting through the VfW interface. If you're outputting through e.g. vspipe though, you can output whatever format you want.

Last edited by TheFluff; 22nd March 2018 at 15:13.
Old 22nd March 2018, 14:47   #70  |  Link
Myrsloik
Professional Code Monkey
 
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,555
Quote:
Originally Posted by TheFluff View Post
Yeah, for VfW you need the packing unfortunately. VS also uses P016 for YUV420P16 when outputting through the VfW interface. If you're outputting through e.g. vspipe though, you can output whatever format you want.
You shouldn't be using vfw, it's <current year>. It mostly exists for convenience, previews and backward compatibility. You're sacrificing a lot of threading by using vfw, do the right thing and use vspipe. Or avspipe or whatever...
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
Old 22nd March 2018, 15:14   #71  |  Link
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
Questions for Myrsloik:

1) I have made a version of DGDecodeNV that supports both native Avisynth and native Vapoursynth. Will we have to remove DGSource from your avscompat layer, or will Vapoursynth see the native version and use it even if the avscompat layer still has DGSource given as a source filter?

2) Any comment on pinterf's benchmarking showing Vapoursynth a bit slower?

Last edited by videoh; 22nd March 2018 at 15:19.
Old 22nd March 2018, 15:28   #72  |  Link
TheFluff
Excessively jovial fellow
 
Join Date: Jun 2004
Location: rude
Posts: 1,100
It doesn't need to be removed, you can have (and use) both. VS plugins get their own namespaces so the same function name can exist in several different plugins that are loaded at the same time. That is, you can have both core.avs.DGSource() and core.dg.DGSource() in the same script, no problem. The AVS compat stuff is only used when loading plugins with core.avs.LoadPlugin.

You can't have two functions with the same name in the same plugin namespace though, so IIRC if you load an Avisynth plugin that uses overloaded functions (exports the same function more than once but with different argument signatures) in VS, the compat layer will de-conflict by renaming the functions, so you get e.g. func(), func_2(), func_3() etc.
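
The renaming scheme described above (which was given from memory, so treat it as approximate) can be modelled with a toy sketch. This is not the compat layer's actual code; deconflict is a hypothetical helper that only shows the func(), func_2(), func_3() pattern.

```python
def deconflict(exported_names):
    """Give repeated function names a numeric suffix: f, f_2, f_3, ..."""
    seen = {}
    result = []
    for name in exported_names:
        seen[name] = seen.get(name, 0) + 1
        count = seen[name]
        # The first export keeps its name; later ones get _2, _3, ...
        result.append(name if count == 1 else f"{name}_{count}")
    return result


renamed = deconflict(["func", "func", "other", "func"])
print(renamed)
```

The sketch does not handle a plugin that itself exports a name like func_2; how the real layer deals with that case is not covered here.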

Last edited by TheFluff; 22nd March 2018 at 15:34.
Old 22nd March 2018, 15:56   #73  |  Link
Myrsloik
Professional Code Monkey
 
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,555
Quote:
Originally Posted by videoh View Post
Question for Myrsloik: I have made a version of DGDecodeNV that supports both native Avisynth and native Vapoursynth. Will we have to remove DGSource from your avscompat layer, or will Vapoursynth see the native version and use it even if the avscompat layer still has DGSource given as a source filter?
Don't forget to mark the filter as fmUnordered and have the flag nfMakeLinear set. Will you export frametype information as well?
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
Old 22nd March 2018, 16:11   #74  |  Link
poisondeathray
Registered User
 
Join Date: Sep 2007
Posts: 5,377
Quote:
Originally Posted by videoh View Post

2) Any comment on pinterf's benchmarking showing Vapoursynth a bit slower?
One difference might be blankclip. It's much faster in avisynth: like 100x faster, with 0% cpu usage and no overhead. You can set the length to 1000000 and it will finish instantaneously with blankclip only.
Old 22nd March 2018, 16:14   #75  |  Link
Myrsloik
Professional Code Monkey
 
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,555
Quote:
Originally Posted by poisondeathray View Post
One difference might be blankclip. It's much faster in avisynth: like 100x faster, with 0% cpu usage and no overhead. You can set the length to 1000000 and it will finish instantaneously with blankclip only.
Add keep=1 as an argument to vs blankclip to speed it up a lot. If you don't it creates a new frame every getframe call to avoid extreme memory bloat in some corner cases. Most people never notice the difference anyway until the benchmarks come out...
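
The trade-off behind keep=1 can be shown with a toy model. This is not VapourSynth code; BlankSource is a made-up stand-in that only counts allocations. In the VapourSynth script earlier in the thread, the corresponding change would be adding keep=1 to the core.std.BlankClip(...) call.

```python
class BlankSource:
    """Toy stand-in for a blank-clip source filter (not VapourSynth code)."""

    def __init__(self, width, height, keep=False):
        self.size = width * height
        self.keep = keep
        self.allocations = 0
        self._cached = None

    def _new_frame(self):
        self.allocations += 1
        return bytearray(self.size)

    def get_frame(self, n):
        if not self.keep:
            return self._new_frame()   # fresh frame on every request
        if self._cached is None:
            self._cached = self._new_frame()
        return self._cached            # same frame handed out each time


fresh = BlankSource(16, 16)
kept = BlankSource(16, 16, keep=True)
for n in range(100):
    fresh.get_frame(n)
    kept.get_frame(n)
print(fresh.allocations, kept.allocations)
```

With keep enabled the single frame stays alive for the lifetime of the source, which is the memory cost being traded for speed.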
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
Old 22nd March 2018, 16:21   #76  |  Link
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
How do I "mark the filter as fmUnordered and have the flag nfMakeLinear set"?

Thank you.

Last edited by videoh; 22nd March 2018 at 16:30.
Old 22nd March 2018, 16:26   #77  |  Link
Myrsloik
Professional Code Monkey
 
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,555
Quote:
Originally Posted by videoh View Post
How do I "mark the filter as fmUnordered and have the flag nfMakeLinear set"?

Thank you.
You'll have a line with createfilter that looks something like this:
Code:
vsapi->createFilter(in, out, "DGSomething", init, getframe, free, fmUnordered, nfMakeLinear, data, core);
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
Old 22nd March 2018, 16:27   #78  |  Link
poisondeathray
Registered User
 
Join Date: Sep 2007
Posts: 5,377
Quote:
Originally Posted by Myrsloik View Post
Add keep=1 as an argument to vs blankclip to speed it up a lot. If you don't it creates a new frame every getframe call to avoid extreme memory bloat in some corner cases. Most people never notice the difference anyway until the benchmarks come out...
Yes, I tried that (I actually read the docs this time). You still don't get "ludicrous" speed.

I think it might have to do with avsmeter64. If you send avs to a "real" encoding application (ffmpeg, vdub, x264, etc.) you don't get those "instant" speeds when outputting to null.

This is why I suggested looking at other testing methodologies earlier: there are discrepancies between the methods and numbers, sometimes large.

I like ffmpeg since it accepts both avs and vspipe. There is still a discrepancy with blankclip between avs and vpy. Avs is about 20% faster.
Old 22nd March 2018, 16:30   #79  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,314
Quote:
Originally Posted by Myrsloik View Post
Add keep=1 as an argument to vs blankclip to speed it up a lot. If you don't it creates a new frame every getframe call to avoid extreme memory bloat in some corner cases. Most people never notice the difference anyway until the benchmarks come out...
Thanks, in this case keep=1 has practically no effect.
Old 22nd March 2018, 16:31   #80  |  Link
Myrsloik
Professional Code Monkey
 
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,555
Quote:
Originally Posted by poisondeathray View Post
Yes, I tried that (I actually read the docs this time). You still don't get "ludicrous" speed.

I think it might have to do with avsmeter64. If you send avs to a "real" encoding application (ffmpeg, vdub, x264, etc.) you don't get those "instant" speeds when outputting to null.

This is why I suggested looking at other testing methodologies earlier: there are discrepancies between the methods and numbers, sometimes large.

I like ffmpeg since it accepts both avs and vspipe. There is still a discrepancy with blankclip between avs and vpy. Avs is about 20% faster.
I sure hope you benchmarked this way:
Code:
vspipe script.vpy .
And at ludicrous speeds you basically end up benchmarking very irrelevant things, like the number of calls used for writing to stdout and nothing else.
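
If you want to wrap the command above in a repeatable measurement, a crude sketch could look like this. measure_fps is a hypothetical helper, not part of vspipe; it times the whole process, so startup and script loading are included and the result will read lower than a per-frame timer would.

```python
import subprocess
import time


def measure_fps(command, frames):
    """Run `command` to completion and return frames / wall-clock seconds."""
    start = time.perf_counter()
    # Discard stdout so that writing frames is not part of the measurement.
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    return frames / elapsed
```

With the 10000-frame test scripts from earlier in the thread, the call would be measure_fps(["vspipe", "script.vpy", "."], 10000), assuming vspipe is on the PATH.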
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet