z.lib resizers for AviSynth+ - Page 4

pinterf · 22nd March 2018, 09:56

Thanks. Now it's fast!

But now it seems the VS integration could probably be better (compiler optimization?). It's surely not an MT issue, since even the single-threaded case is a bit slower.
This assumes my two scripts are comparable; please tell me if they are not.

(Win10, i7-7700)
Avs+ r2636
Avsmeter64: 264 fps (was: 41fps with r1c)
Avsmeter64: (thread=1, no prefetch): 90 fps

VapourSynth r43
VSEdit r18 Benchmark: 217 fps
VSEdit r18 Benchmark: (threads=1): 83 fps

VirtualDub2 r41462 (Direct Stream Copy, switched off input/output preview):
avs+: 172 fps (37fps with r1c)
vs r43: 150 fps
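
To put the figures above side by side, here is a small sketch that only does arithmetic on the reported frame rates. The helper name `relative_gain` is made up for illustration, and the numbers are copied from the post, not re-measured.

```python
# Frame rates as reported in the post (frames per second): (Avs+, VapourSynth).
results = {
    "multithreaded (AVSMeter64 vs VSEdit)": (264, 217),
    "single-threaded (AVSMeter64 vs VSEdit)": (90, 83),
    "VirtualDub2 via VfW": (172, 150),
}


def relative_gain(avs_fps, vs_fps):
    """Return how much faster the Avs+ figure is, as a percentage of the VS figure."""
    return (avs_fps / vs_fps - 1.0) * 100.0


for label, (avs_fps, vs_fps) in results.items():
    print(f"{label}: Avs+ is {relative_gain(avs_fps, vs_fps):.1f}% faster")
```

With these inputs the gap comes out at roughly 22% multithreaded, 8% single-threaded and 15% through VirtualDub2.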

Passing a YUV420P16 clip to Virtualdub2 runs through a P016 conversion in both VapourSynth and Avisynth+, and VDub2 possibly makes further copies/conversions, which add overhead.
I was using this script to find the slowness before your r1d patch:

Code:

BlankClip(10000,1920,1080,"YUV420P8")
z_ConvertFormat(resample_filter="bicubic", pixel_type="RGBPS",colorspace_op="709:709:709:l=>rgb:linear:2020:l", dither_type="none") # r1c: slooooow, r1d: faaast
z_ConvertFormat(resample_filter="bicubic", pixel_type="YUV420P16",colorspace_op="rgb:linear:2020:l=>709:709:709:l",dither_type="ordered")
prefetch(4)

Code:

import vapoursynth as vs
core = vs.get_core(threads=4)
clip = core.std.BlankClip(width=1920, height=1080, length=10000, color=[0, 128, 128], format=vs.YUV420P8)
clip = core.resize.Bicubic(clip=clip, format=vs.RGBS, matrix_s="rgb", transfer_s="linear", primaries_s="2020", matrix_in_s="709", transfer_in_s="709", primaries_in_s="709", dither_type="none")
clip = core.resize.Bicubic(clip=clip, format=vs.YUV420P16, matrix_in_s="rgb", transfer_in_s="linear", primaries_in_s="2020", matrix_s="709", transfer_s="709", primaries_s="709", dither_type="ordered")
clip.set_output()
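
One way to check whether the two scripts are comparable is to split the `colorspace_op` string into its fields and line them up with the VapourSynth keyword arguments. The sketch below is a hypothetical helper, not part of either plugin; it assumes the string has the form `matrix:transfer:primaries:range=>matrix:transfer:primaries:range` and that `l` and `f` are the short forms for limited and full range.

```python
def colorspace_op_to_resize_kwargs(colorspace_op):
    """Map a 'src=>dst' colorspace_op string to names styled like the
    keyword arguments of core.resize.* (matrix_in_s, transfer_s, ...)."""
    source, target = (side.strip().split(":") for side in colorspace_op.split("=>"))
    range_names = {"l": "limited", "f": "full"}  # assumed short forms
    kwargs = {}
    for suffix, fields in (("_in_s", source), ("_s", target)):
        matrix, transfer, primaries, value_range = fields
        kwargs["matrix" + suffix] = matrix
        kwargs["transfer" + suffix] = transfer
        kwargs["primaries" + suffix] = primaries
        kwargs["range" + suffix] = range_names.get(value_range, value_range)
    return kwargs


print(colorspace_op_to_resize_kwargs("709:709:709:l=>rgb:linear:2020:l"))
```

The VapourSynth script above passes the same matrix, transfer and primaries values but no range arguments, so the range handling is the one field that would need a closer look before calling the two scripts identical.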

jpsdr · 22nd March 2018, 10:04

What is the difference between

Code:

z_ConvertFormat(pixel_type="RGBPS",colorspace_op="2020ncl:st2084:2020:l=>rgb:linear:2020:l", dither_type="none")

and the BT.2020 convert implemented in avs+ ?

Is it related to the formula in Rec. BT.2100, pages 4 and 5?
Or is it something found only in SMPTE 2084?

TheFluff · 22nd March 2018, 11:36

Quote:

Originally Posted by jpsdr

What is the difference between

Code:

z_ConvertFormat(pixel_type="RGBPS",colorspace_op="2020ncl:st2084:2020:l=>rgb:linear:2020:l", dither_type="none")

and the BT.2020 convert implemented in avs+ ?

Is it related to the formula in Rec. BT.2100, pages 4 and 5?
Or is it something found only in SMPTE 2084?

I dunno, does Avs+ even have linear RGB builtin? As far as I know the builtin Avs+ BT2020 conversion doesn't touch the gamma at all. The conversion you quoted is SMPTE 2084 to linear, and the zimg ST2084 transfer functions are in zimg/colorspace/gamma.cpp line 228 and onwards.

zimg does support HLG too, but under the name ARIB B67, see zimg/colorspace/gamma.cpp line 154 and on.
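
For reference, the ST 2084 (PQ) transfer function itself is short enough to sketch. The constants below are the ones published in SMPTE ST 2084 and repeated in Rec. BT.2100; the function names are made up for this example, and this is not zimg's code. In particular, zimg may apply its own luminance scaling, which this sketch does not try to reproduce.

```python
# PQ constants from SMPTE ST 2084 / Rec. BT.2100.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def st2084_eotf(signal):
    """Map a non-linear PQ signal in [0, 1] to linear light in [0, 1],
    where 1.0 stands for the 10000 cd/m^2 peak."""
    p = signal ** (1.0 / M2)
    return (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1.0 / M1)


def st2084_inverse_eotf(linear):
    """Map linear light in [0, 1] (1.0 = 10000 cd/m^2) back to a PQ signal."""
    p = linear ** M1
    return ((C1 + C2 * p) / (1.0 + C3 * p)) ** M2


if __name__ == "__main__":
    # 100 cd/m^2 is 0.01 on this scale.
    print(st2084_inverse_eotf(0.01))
```

A matrix-only BT.2020 conversion would skip this step entirely, which matches the point above that the built-in conversion does not touch the gamma.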

videoh · 22nd March 2018, 13:37

Quote:

Originally Posted by Stephen R. Savage

I updated the first post with a new build.

Thank you!

videoh · 22nd March 2018, 13:41

Quote:

Originally Posted by pinterf

Passing a YUV420P16 clip to Virtualdub2 runs through a P016 conversion

I thought those two were the same thing. DGSource() delivers CS_YUV420P16. Why would there need to be any conversion?

pinterf · 22nd March 2018, 13:46

Quote:

Originally Posted by videoh

I thought those two were the same thing. DGSource() delivers CS_YUV420P16. Why would there need to be any conversion?

P016 is a semi-packed format which is needed for passing a YUV420P16 clip.
https://msdn.microsoft.com/en-us/lib...(v=vs.85).aspx
Similarly, Y416 is used for passing a YUV444P16 clip through the VfW interface that VirtualDub uses.

videoh · 22nd March 2018, 13:57

I see, thank you. If there are spurious conversions inside Avisynth+ or Windows due to VfW maybe that is another reason to prefer Vapoursynth and native Vapoursynth filters. It seems insane to me to convert to an intermediate semi-packed format.

And thank you for your test results. Hopefully, Myrsloik will comment on your performance numbers.

pinterf · 22nd March 2018, 14:21

As at 8 bits, 10-16 bit formats are also negotiated through the VfW interface here. For YUV420P16 the fourcc code P016 is reported, which is what VirtualDub2 understands. This format is semi-packed: the Y "plane" is followed immediately by UVUVUVUV... data, so that the pitch is the same as that of the Y data. For YUV422P10, P210 is used, but you have two more choices: the V210 and Y3[10][10] fourcc codes. These other options can be set through Avisynth+ OPT_xxx variables, which should appear at the beginning of the script.
EDIT: this conversion is done once, at the output stage, because VirtualDub2 works through VfW. The same interface and method are used by VapourSynth.
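
The layout described above can be sketched in a few lines. This is only an illustration on plain Python lists with a hypothetical function name `pack_p016`; real buffers also involve byte order, stride and alignment, which are left out here.

```python
def pack_p016(y, u, v, width, height):
    """Pack planar 4:2:0 16-bit samples into a P016-style layout.

    y is a list of `height` rows of `width` samples; u and v are lists of
    `height // 2` rows of `width // 2` samples. The result is a list of rows,
    each `width` samples long: all Y rows first, then rows of interleaved U, V.
    """
    if width % 2 or height % 2:
        raise ValueError("4:2:0 needs even width and height")
    rows = [list(row) for row in y]
    for u_row, v_row in zip(u, v):
        interleaved = []
        for u_sample, v_sample in zip(u_row, v_row):
            interleaved.extend((u_sample, v_sample))
        rows.append(interleaved)
    return rows


if __name__ == "__main__":
    y = [[1, 2, 3, 4], [5, 6, 7, 8]]
    u = [[10, 11]]
    v = [[20, 21]]
    for row in pack_p016(y, u, v, width=4, height=2):
        print(row)
```

Because each interleaved UV row holds `width // 2` pairs, it is exactly as long as a Y row, which is why the pitch stays the same.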

TheFluff · 22nd March 2018, 14:45

Yeah, for VfW you need the packing unfortunately. VS also uses P016 for YUV420P16 when outputting through the VfW interface. If you're outputting through e.g. vspipe though, you can output whatever format you want.

Myrsloik · 22nd March 2018, 14:47

Quote:

Originally Posted by TheFluff

Yeah, for VfW you need the packing unfortunately. VS also uses P016 for YUV420P16 when outputting through the VfW interface. If you're outputting through e.g. vspipe though, you can output whatever format you want.

You shouldn't be using vfw, it's <current year>. It mostly exists for convenience, previews and backward compatibility. You're sacrificing a lot of threading by using vfw, do the right thing and use vspipe. Or avspipe or whatever...

videoh · 22nd March 2018, 15:14

Questions for Myrsloik:

1) I have made a version of DGDecodeNV that supports both native Avisynth and native Vapoursynth. Will we have to remove DGSource from your avscompat layer, or will Vapoursynth see the native version and use it even if the avscompat layer still has DGSource given as a source filter?

2) Any comment on pinterf's benchmarking showing Vapoursynth a bit slower?

TheFluff · 22nd March 2018, 15:28

It doesn't need to be removed, you can have (and use) both. VS plugins get their own namespaces so the same function name can exist in several different plugins that are loaded at the same time. That is, you can have both core.avs.DGSource() and core.dg.DGSource() in the same script, no problem. The AVS compat stuff is only used when loading plugins with core.avs.LoadPlugin.

You can't have two functions with the same name in the same plugin namespace though, so IIRC if you load an Avisynth plugin that uses overloaded functions (exports the same function more than once but with different argument signatures) in VS, the compat layer will de-conflict by renaming the functions, so you get e.g. func(), func_2(), func_3() etc.
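
The renaming pattern described above (func(), func_2(), func_3()) can be modelled like this. It is a plain-Python illustration of the naming scheme only, with a made-up function name, and it is not taken from the VapourSynth compat layer.

```python
def deconflict(names):
    """Give repeated names a numeric suffix: func, func_2, func_3, ..."""
    seen = {}
    result = []
    for name in names:
        seen[name] = seen.get(name, 0) + 1
        count = seen[name]
        result.append(name if count == 1 else f"{name}_{count}")
    return result


print(deconflict(["func", "func", "func", "other"]))
```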

Myrsloik · 22nd March 2018, 15:56

Quote:

Originally Posted by videoh

Question for Myrsloik: I have made a version of DGDecodeNV that supports both native Avisynth and native Vapoursynth. Will we have to remove DGSource from your avscompat layer, or will Vapoursynth see the native version and use it even if the avscompat layer still has DGSource given as a source filter?

Don't forget to mark the filter as fmUnordered and have the flag nfMakeLinear set. Will you export frametype information as well?

poisondeathray · 22nd March 2018, 16:11

Quote:

Originally Posted by videoh

2) Any comment on pinterf's benchmarking showing Vapoursynth a bit slower?

One difference might be BlankClip. It's much faster in Avisynth: like 100x faster with 0% CPU usage, no overhead. You can set the length to 1000000 and it will finish instantaneously with BlankClip only.

Myrsloik · 22nd March 2018, 16:14

Quote:

Originally Posted by poisondeathray

One difference might be BlankClip. It's much faster in Avisynth: like 100x faster with 0% CPU usage, no overhead. You can set the length to 1000000 and it will finish instantaneously with BlankClip only.

Add keep=1 as an argument to vs blankclip to speed it up a lot. If you don't it creates a new frame every getframe call to avoid extreme memory bloat in some corner cases. Most people never notice the difference anyway until the benchmarks come out...
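
The trade-off behind `keep` can be shown with a toy model. The class below is hypothetical and is not how VapourSynth implements BlankClip; it only counts allocations to show why reusing one frame is cheaper than creating a new one on every request.

```python
class ToyBlankClip:
    """Toy source that either allocates per request or reuses one frame."""

    def __init__(self, width, height, keep=False):
        self.width = width
        self.height = height
        self.keep = keep
        self.allocations = 0
        self._cached = None

    def _new_frame(self):
        self.allocations += 1
        return bytearray(self.width * self.height)

    def get_frame(self, n):
        if not self.keep:
            # A fresh buffer for every request, nothing held between calls.
            return self._new_frame()
        if self._cached is None:
            self._cached = self._new_frame()
        # The same buffer is handed out again for every frame number.
        return self._cached


if __name__ == "__main__":
    for keep in (False, True):
        clip = ToyBlankClip(64, 36, keep=keep)
        for n in range(100):
            clip.get_frame(n)
        print(f"keep={keep}: {clip.allocations} allocations")
```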

videoh · 22nd March 2018, 16:21

How do I "mark the filter as fmUnordered and have the flag nfMakeLinear set"?

Thank you.

Myrsloik · 22nd March 2018, 16:26

Quote:

Originally Posted by videoh

How do I "mark the filter as fmUnordered and have the flag nfMakeLinear set"?

Thank you.

You'll have a line with createfilter that looks something like this:

Code:

vsapi->createFilter(in, out, "DGSomething", init, getframe, free, fmUnordered, nfMakeLinear, data, core);

poisondeathray · 22nd March 2018, 16:27

Quote:

Originally Posted by Myrsloik

Add keep=1 as an argument to vs blankclip to speed it up a lot. If you don't it creates a new frame every getframe call to avoid extreme memory bloat in some corner cases. Most people never notice the difference anyway until the benchmarks come out...

Yes, I tried that (I actually read the docs this time). You still don't get "ludicrous" speed.

I think it might have to do with avsmeter64. If you send avs to a "real" encoding application (ffmpeg, vdub, x264, etc.), you don't get those "instant" speeds when outputting null.

This is why I suggested looking at other testing methodologies earlier: there are discrepancies between the methods and numbers, sometimes large.

I like ffmpeg since it accepts both avs and vspipe. There is still a discrepancy with blankclip between avs and vpy: Avs is about 20% faster.

pinterf · 22nd March 2018, 16:30

Quote:

Originally Posted by Myrsloik

Add keep=1 as an argument to vs blankclip to speed it up a lot. If you don't it creates a new frame every getframe call to avoid extreme memory bloat in some corner cases. Most people never notice the difference anyway until the benchmarks come out...

Thanks, in this case keep=1 has practically no effect.

Myrsloik · 22nd March 2018, 16:31

Quote:

Originally Posted by poisondeathray

Yes, I tried that (I actually read the docs this time). You still don't get "ludicrous" speed.

I think it might have to do with avsmeter64. If you send avs to a "real" encoding application (ffmpeg, vdub, x264, etc.), you don't get those "instant" speeds when outputting null.

This is why I suggested looking at other testing methodologies earlier: there are discrepancies between the methods and numbers, sometimes large.

I like ffmpeg since it accepts both avs and vspipe. There is still a discrepancy with blankclip between avs and vpy: Avs is about 20% faster.

I sure hope you benchmarked this way:

Code:

vspipe script.vpy .

And at ludicrous speeds you basically end up benchmarking very irrelevant things, like the number of calls used for writing to stdout and nothing else.
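
A simple way to time any command-line run the same way for both tools is a wall-clock wrapper like the one below. It is a sketch with a made-up function name; the stand-in command only starts a Python interpreter, so it runs anywhere, and would be swapped for the vspipe command line quoted above. The frame count has to be supplied by hand (10000 in the test scripts earlier in the thread).

```python
import subprocess
import sys
import time


def measure_fps(command, frame_count):
    """Run `command` to completion and return frame_count / elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    return frame_count / elapsed


if __name__ == "__main__":
    # Stand-in command; replace with ["vspipe", "script.vpy", "."] to time a script.
    print(measure_fps([sys.executable, "-c", "pass"], frame_count=10000))
```

Note that the elapsed time includes process start-up, so for a script that finishes almost instantly the result mostly reflects that overhead, which is the same caveat raised above.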
