Avisynth+ [Archive] - Page 75

pinterf

1st November 2017, 08:52

Continuing the discussion from here (https://forum.doom9.org/showthread.php?p=1823329#post1823329), let's talk about those nasty memory offsets again.

In Avs+ all these size_t's are (32-bit signed) int instead. In Avisynth it doesn't really matter in practice because a frame is always allocated as one big chunk of memory, and nobody really has any use for more than 2^31 bytes of vfb. If you ever wanted to stop doing this though and use one individual pointer per plane like VS does, this won't fly because you can't reliably stuff a 64-bit pointer in a 32-bit integer, and it's generally bad coding practice to not use size_t (or ptrdiff_t) for these things.

Way back in the day ultim claimed that he didn't want to follow IanB's lead and switch to size_t because it'd break a number of completely irrelevant plugins (that would have been trivial to recompile). I didn't realize it at the time, but I'm pretty sure that in practice, he was wrong even on the technical aspect. If you look at these functions above, the only one that conceivably might end up getting called by plugins in reality is VideoFrame::GetOffset(), and I'm 99% sure that its return value just gets passed in a register and zero-extended, so as long as you keep passing the same old less-than-2^31 offsets, everything will keep working just fine (but most plugins just use GetRead/WritePtr). New VideoFrames and VFB's are only constructed by env->NewVideoFrame, so changing the signature of the VideoFrame constructor shouldn't be an issue.

Thanks, then it's worth a try.

Myrsloik

1st November 2017, 11:06

Since we're on the subject of changes. Why was the return type of GetCPUFlags() changed from long to int? That one makes absolutely no sense to me.

pinterf

6th November 2017, 12:38

Since we're on the subject of changes. Why was the return type of GetCPUFlags() changed from long to int? That one makes absolutely no sense to me.
Another piece of history:
"Standardize some type usage to prevent confusion of devs with GCC background."
https://github.com/AviSynth/AviSynthPlus/commit/a6ced5b4b16e666d09b1d24c16269c2a15028712#diff-987740f99567909b32dd60d203ea9504

pinterf

6th November 2017, 12:42

r2508 RGB->Y8 [incorrect]
r2508 RGB->YV24->Y8 [correct]
2.6MT RGB->Y8 [correct]

Thanks for the report, fixed. Rec601, Rec709 (limited range) RGB->Y conversion was affected. Regression since r2266 (a lot of high bit depth work at that time)

pinterf

6th November 2017, 13:26

Continuing the discussion from here (https://forum.doom9.org/showthread.php?p=1823329#post1823329), let's talk about those nasty memory offsets again.

In Avs+ all these size_t's are (32-bit signed) int instead. In Avisynth it doesn't really matter in practice because a frame is always allocated as one big chunk of memory, and nobody really has any use for more than 2^31 bytes of vfb. If you ever wanted to stop doing this though and use one individual pointer per plane like VS does, this won't fly because you can't reliably stuff a 64-bit pointer in a 32-bit integer, and it's generally bad coding practice to not use size_t (or ptrdiff_t) for these things.

Way back in the day ultim claimed that he didn't want to follow IanB's lead and switch to size_t because it'd break a number of completely irrelevant plugins (that would have been trivial to recompile). I didn't realize it at the time, but I'm pretty sure that in practice, he was wrong even on the technical aspect. If you look at these functions above, the only one that conceivably might end up getting called by plugins in reality is VideoFrame::GetOffset(), and I'm 99% sure that its return value just gets passed in a register and zero-extended, so as long as you keep passing the same old less-than-2^31 offsets, everything will keep working just fine (but most plugins just use GetRead/WritePtr). New VideoFrames and VFB's are only constructed by env->NewVideoFrame, so changing the signature of the VideoFrame constructor shouldn't be an issue.

Thanks, then it's worth a try.

Turned out that programs using C interface could be broken.

Current x264 is one such program: it fails because it reads zero pitches (strides).

x264 [error]: Input picture width (640) is greater than stride (0)
x264 [error]: x264_encoder_encode failed

Having a look at the x264 source code, it contains a hybrid avisynth_c.h (I guess it is a mix of earlier headers from the "classic" avs line with new high bit depth stuff from current avs+ inserted) in which the function avs_get_read_ptr_p (http://git.videolan.org/?p=x264.git;a=blob;f=extras/avisynth_c.h;h=81598790b1217e839eac01dffffd9ca540381f41;hb=HEAD#l491) is inlined and directly accesses the fields of the AVS_VideoFrame struct (http://git.videolan.org/?p=x264.git;a=blob;f=extras/avisynth_c.h;h=81598790b1217e839eac01dffffd9ca540381f41;hb=HEAD#l480).
It fails because when we change the type of offset-like variables from int (32 bits) to size_t (64 bits on x64), the positions of fields in AVS_VideoFrame are shifted and a "baked" reference to the "pitch" now becomes the higher 32 bits of "offset", which is usually zero.
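
To make the field shift concrete, here is a minimal sketch in Python using ctypes. The two structs are hypothetical, cut-down stand-ins for AVS_VideoFrame (three fields only, names borrowed from the header), a fixed 64-bit type stands in for size_t on x64, and a little-endian layout is assumed. It illustrates the mechanism only and is not the real header.

```python
import ctypes


class FrameInt(ctypes.LittleEndianStructure):
    """Hypothetical cut-down layout with int offsets (what an old header bakes in)."""
    _fields_ = [
        ("offset", ctypes.c_int32),
        ("pitch", ctypes.c_int32),
        ("row_size", ctypes.c_int32),
    ]


class FrameSizeT(ctypes.LittleEndianStructure):
    """Same fields, but offset widened to 64 bits (size_t on x64)."""
    _fields_ = [
        ("offset", ctypes.c_uint64),
        ("pitch", ctypes.c_int32),
        ("row_size", ctypes.c_int32),
    ]


def read_with_old_layout(frame):
    """Reinterpret the bytes of a new-layout frame using the old field positions."""
    raw = bytes(frame)[: ctypes.sizeof(FrameInt)]
    return FrameInt.from_buffer_copy(raw)


# A frame as the size_t build would fill it in.
new_frame = FrameSizeT(offset=0, pitch=640, row_size=640)
# What code compiled against the int layout would see.
stale_view = read_with_old_layout(new_frame)

print("pitch offset, int layout:   ", FrameInt.pitch.offset)
print("pitch offset, size_t layout:", FrameSizeT.pitch.offset)
print("pitch seen through old layout:", stale_view.pitch)
```

In this toy layout the old reader picks up the upper half of the widened offset where it expects pitch, so it sees 0, which is consistent with the "stride (0)" error above.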

And that's only for x264. Anyway, when I make a test build, I will provide this "offsets are size_t instead of int" version as a separate test version.

If the inlined function in x264 could be changed, it may work with both the current (int) and future (size_t) versions of avisynth. Maybe it was inlined on purpose, because older avisynth versions did not support avs_get_read_ptr_p through the C interface (I did not have time to dig into its history)?

To tell the truth, I had a look at the current avisynth_c.h from the avs+ project, and although this specific call is O.K. in the current version, there are two other problematic functions, namely avs_get_row_size and avs_get_height (the ones without a "plane" parameter, so they apply to plane 0, PLANAR_Y).

These functions are still directly accessing the AVS_VideoFrame contrary to the big warning "DO NOT USE THIS STRUCTURE DIRECTLY", so c interface header in avs+ is still inconsistent, at least for AVS_VideoFrame.

TheFluff

6th November 2017, 14:10

Ah, I hadn't considered that. Tricky.

Shirtfull

7th November 2017, 02:06

Thanks for the report, fixed. Rec601, Rec709 (limited range) RGB->Y conversion was affected. Regression since r2266 (a lot of high bit depth work at that time)

Rgb24 to yv12 seems incorrect as well, Frame served Progressive/interlaced testcube RGB24 from Virtualdub/VirtualDub_FilterMod to AvsPmod.

Rgb>yv12

https://s1.postimg.org/3m4qzr6lij/Test_avi000000a.jpg (http://postimg.org/image/3m4qzr6lij/)

Rgb>yv16

https://s1.postimg.org/5vnrj8rrez/Test_avi000000b.jpg (http://postimg.org/image/5vnrj8rrez/)

pinterf

7th November 2017, 08:45

Rgb24 to yv12 seems incorrect as well, Frame served Progressive/interlaced testcube RGB24 from Virtualdub/VirtualDub_FilterMod to AvsPmod.

Rgb>yv12

Rgb>yv16

I can see something like a rotation blur on the yv12 sample. How did you achieve that exactly?

Shirtfull

7th November 2017, 12:10

SetFilterMTMode("AVISource", x) #2or3
AVISource("Path to frameserve file.avi").ConvertToYv12() #12 or 16
bob()
Prefetch (4)

Only difference between pics was the conversion.

Take out bob, they both look the same.

pinterf

7th November 2017, 13:09

SetFilterMTMode("AVISource", x) #2or3
AVISource("Path to frameserve file.avi").ConvertToYv12() #12 or 16
bob()
Prefetch (4)

Only difference between pics was the conversion.

Take out bob, they both look the same.
Specify interlaced=true in ConvertToYV12.
RGB->YV12 conversion is a two-phase conversion. First comes RGB->YV24, then YV24->YV12. This latter needs the hint that the clip is an interlaced one.
(remark: parameter "interlaced" is used only in conversions where yv12 is involved either as source or target)
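
A rough sketch of why the YV24->YV12 step needs the hint, written in Python for illustration. The function names are hypothetical and plain two-line averaging is used as a stand-in for the real chroma filters, so this only shows the principle: progressive subsampling mixes lines from both fields, while interlaced subsampling averages lines within each field.

```python
def subsample_progressive(rows):
    """Halve vertical chroma resolution by averaging adjacent lines (0+1, 2+3, ...)."""
    if len(rows) % 2:
        raise ValueError("row count must be even")
    return [(rows[i] + rows[i + 1]) // 2 for i in range(0, len(rows), 2)]


def subsample_interlaced(rows):
    """Halve vertical chroma resolution per field (lines 0+2 and 1+3 of each group)."""
    if len(rows) % 4:
        raise ValueError("row count must be a multiple of 4")
    out = []
    for base in range(0, len(rows), 4):
        top_field = (rows[base] + rows[base + 2]) // 2
        bottom_field = (rows[base + 1] + rows[base + 3]) // 2
        out += [top_field, bottom_field]
    return out


# One chroma value per line: top field is 100, bottom field is 0 (moving content).
combed = [100, 0, 100, 0, 100, 0, 100, 0]
print(subsample_progressive(combed))
print(subsample_interlaced(combed))
```

With the progressive variant both fields end up carrying the same blended chroma, which is the kind of cross-field smearing that only becomes visible once bob() separates the fields again.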

Shirtfull

7th November 2017, 13:39

Thanks, that fixed it.

wonkey_monkey

7th November 2017, 16:27

SetFilterMTMode("AVISource", x) #2or3
AVISource("Path to frameserve file.avi").ConvertToYv12() #12 or 16
bob()
Prefetch (4)

Only difference between pics was the conversion.

Take out bob, they both look the same.

Anyone know why it looks like some kind of rotation with the bob() in there? Seems weird.

shekh

7th November 2017, 17:05

Anyone know why it looks like some kind of rotation with the bob() in there? Seems weird.

I think VD does not have a way to request interlaced conversion to YV12.
Looks better:
deinterlace (unfold)
convert format (yv12)
deinterlace (fold)

Shirtfull

8th November 2017, 16:50

The other way round, using VDFiltermod to generate cube and then frame-serving to avisynth. I recall it can only serve RGB.
https://s1.postimg.org/43ibe01o97/Create_test_video.jpg (https://postimg.org/image/43ibe01o97/)

johnmeyer

8th November 2017, 22:45

It appears that Prefetch cannot be used in a script which uses Return.

Here is my test. The following script runs at the same speed with, or without, the Prefetch statement. It is clear that multi-threading is NOT being used.

LoadPlugin("E:\Documents\My Videos\AVISynth\AVISynth Plugins\plugins\Film Restoration\Script_and_Plugins\RemoveGrainSSE2.dll")
source=AVISource("E:\fs.avi").killaudio().ConvertToYV12()
output=MDegrain2i2(source,8,4,400,0)
return output
Prefetch(5)

If I move Prefetch to the line before the "return output" statement, the script throws an error, "Invalid arguments to function 'Prefetch' ".

If I re-code so the script ends with the implied "last," as shown below, I get a 4x speedup (i.e., multi-threading is working), and don't get an error message. In other words, this works. However, it is a PITA to have to re-write scripts to avoid the return statement because I often want to do comparisons between the initial and final states of a denoising script, so it is useful to have a final "output" variable.

Is there a way to get Prefetch to work with Return, or have I just found the only way?

LoadPlugin("E:\Documents\My Videos\AVISynth\AVISynth Plugins\plugins\Film Restoration\Script_and_Plugins\RemoveGrainSSE2.dll")
source=AVISource("E:\fs.avi").killaudio().ConvertToYV12()
MDegrain2i2(source,8,4,400,0)
Prefetch(5)

poisondeathray

9th November 2017, 00:19

Is there a way to get Prefetch to work with Return, or have I just found the only way?

Don't use "return" , just call the variable directly

eg.

source = whateversource()
A = source().filter1()
B = source().filter2()

#source
A
#B
prefetch(5)

If you wanted "B" , comment out "A" and uncomment out B. If you wanted "source" - same idea, uncomment out source, comment out A and B

johnmeyer

9th November 2017, 00:28

Don't use "return" , just call the variable directly

eg.

source = whateversource()
A = source().filter1()
B = source().filter2()

#source
A
#B
prefetch(5)

If you wanted "B" , comment out "A" and uncomment out B. If you wanted "source" - same idea, uncomment out source, comment out A and B

I just learned something: I thought I had to use "return." I need to re-read the AVISynth doc. Thanks!

TheFluff

9th November 2017, 00:30

return output.prefetch(5)

may or may not work

LigH

9th November 2017, 00:44

AviSynth (plus as well as legacy) internally uses a clip variable "last" where an explicit assignment is omitted (and an implicit "return last" where any return was omitted). Explicitly, this would act like:

source = whateversource()
A = source().filter1()
B = source().filter2()

#source
last = A
#last = B
last.prefetch(5)
return last

AviSynth will probably even detect that B is never used, therefore never execute "B = source().filter2()".

StainlessS

9th November 2017, 00:52

I nearly suggested same as Fluffy, but as I've never used MT/Avs+, thought it might be daft.

Perhaps docs could be updated to reflect that Prefetch takes a [EDIT: compulsory] clip arg (if indeed it does), docs as given on Avisynth.org/Avs+ below.

Enabling MT

The other difference is how you actually enable multithreading. Calling SetFilterMTMode() is not enough, it sets the MT mode, but the MT mode only has an effect if MT is enabled at all. Note this means you can safely include/import/autoload your SetFilterMTMode() calls in even single-threaded scripts, and they will not be messed up. Uhm, onto the point: You enable MT by placing a single call to Prefetch(X) at the *end* of your script, where X is the number of threads to use.
Example

# This line causes all filters that don't have an MT mode explicitly use mode 2 by default.
# Mode 2 is a relatively safe choice as long as you don't know most of your calls to be either mode 1 or 3.
# Compared with mode 1, mode 2 trades memory for MT-safety, but only a select few filters will work with mode 1.
SetFilterMTMode("DEFAULT_MT_MODE", 2)
or
SetFilterMTMode("DEFAULT_MT_MODE", MT_MULTI_INSTANCE)

# FFVideoSource(), like most of all source filters, needs MT mode 3.
# Note: starting with AviSynth+ r2069, it will now automatically recognize source filters.
# If it sees a source filter which has no MT-mode specified at all, it will automatically use
# mode 3 instead of the default MT mode.
SetFilterMTMode("FFVideoSource", 3)
or
SetFilterMTMode("FFVideoSource", MT_SERIALIZED)

# Now comes your script as usual
FFVideoSource(...)
Trim(...)
QTGMC(...)
...

# Enable MT!
Prefetch(4)

LigH

9th November 2017, 01:00

But important, when following StainlessS' form: Do not use "return" before "Prefetch". In case you do, the "return" statement is "the end of the script", not the last character.

StainlessS

9th November 2017, 01:05

But important, when following StainlessS' form: Do not use "return" before "Prefetch". In case you do, the "return" statement is "the end of the script", not the last character.

Are you saying that

return Last.Prefetch(4)

is equivalent to [EDIT: of course it dont work]

return
Last.Prefetch(4)

???

EDIT: OK, I think I get what you mean, similar to John's earlier not working script.

Use eg

return output.Prefetch(4)
OR
output.Prefetch(4)
at the end of script (where no further parsing will occur on subsequent lines).

LigH

9th November 2017, 01:19

No, it would be equivalent to:

last = last.Prefetch(4)
return last

With "before", I mean "at least one line above, one linebreak in between"; "return" in the same line, just in front of the Prefetch result, would work.

StainlessS

9th November 2017, 01:22

OK, but both of my last two above code blocks would work, yes ?

EDIT: or that below will NOT work as planned.
return output.Prefetch(4)

EDIT: OK, thank you sir.

LigH

9th November 2017, 08:49

Using a variable "output" only works if you actually assigned the currently filtered clip to this clip variable; it is not an existing global symbol (but "last" is).

pinterf

9th November 2017, 09:22

A short question.
Lately moved to Visual Studio 2017.
Now Avisynth+ is built with v141_xp toolset instead of v140_xp (VS 2015).

I have put it to an xp virtual machine, which did not have Microsoft Visual C++ Redistributable for Visual Studio 2017 (https://www.visualstudio.com/downloads/).
I expected that it would crash with a nice exception but it was running instead. What function call should I use that would crash this vs2017-built dll with no new redistributables?

Groucho2004

9th November 2017, 09:46

What function call should I use that would crash this vs2017-built dll with no new redistributables?
No idea. Have you read through this (https://docs.microsoft.com/en-us/cpp/what-s-new-for-visual-cpp-in-visual-studio) (paragraph "Standard Library improvements")? You might also want to look at this (https://docs.microsoft.com/en-us/cpp/porting/overview-of-potential-upgrade-issues-visual-cpp).

pinterf

9th November 2017, 10:44

No idea. Have you read through this (https://docs.microsoft.com/en-us/cpp/what-s-new-for-visual-cpp-in-visual-studio) (paragraph "Standard Library improvements")? You might also want to look at this (https://docs.microsoft.com/en-us/cpp/porting/overview-of-potential-upgrade-issues-visual-cpp).
Thanks. I reached here:
C++ Binary Compatibility between Visual Studio 2015 and Visual Studio 2017 (https://docs.microsoft.com/en-us/cpp/porting/binary-compat-2015-2017)
It says that in our case they are compatible, because the major number of the toolsets (v140, v141) is 14.
(Does it mean that for a freshly installed system, it's enough to have the VS2017 redistributables for our old dlls that would need VS2015 redist in the past?)

Myrsloik

9th November 2017, 10:50

Thanks. I reached here:
C++ Binary Compatibility between Visual Studio 2015 and Visual Studio 2017 (https://docs.microsoft.com/en-us/cpp/porting/binary-compat-2015-2017)
It says that in our case they are compatible, because the major number of the toolsets (v140, v141) is 14.
(Does it mean that for a freshly installed system, it's enough to have the VS2017 redistributables for our old dlls that would need VS2015 redist in the past?)

Yes, if you look at installed programs the 2017 runtime actually replaces the 2015 one

real.finder

13th November 2017, 12:33

since there are many plugin dlls that were never ported to x64, including the closed source plugins, is there some way to make the 32 bit ones work in 64 bit processes? I noticed this http://www.dllwrapper.com/ but couldn't build a wrapped dll successfully, and even if I did, it would work for one day only (you need to buy it)

LigH

13th November 2017, 12:53

I guess it might be possible in a separate 32-bit sub process (with a "bridge" EXE in between), but that would probably be quite inefficient, possibly requiring frame data to be piped (like the avs2yuv or avs4x26x bridges do).

Could you name any "priceless" plugins you know some users can't live without?

real.finder

13th November 2017, 13:06

I guess it might be possible in a separate 32-bit sub process (with a "bridge" EXE in between), but that would probably be quite inefficient, possibly requiring frame data to be piped (like the avs2yuv or avs4x26x bridges do).

Could you name any "priceless" plugins you know some users can't live without?

I think that dllwrapper uses a bridge EXE for that

about the "priceless" plugins, it depends on the source you work with, but there are many, mostly used in avsi script functions

TheFluff

13th November 2017, 13:50

People keep saying that but they never cite any actual examples. If you actually say what you want maybe someone will actually modernize it!

LigH

13th November 2017, 15:15

Somehow I think about "glorified anime miracle scripts" right now...

real.finder

13th November 2017, 15:31

ok then, like removedirt, deen, TBilateral, removegrainT, LGhost

edit: and AVSInpaint (for FillBorders() in Stabilization Tools Pack by Dogway)

Yanak

13th November 2017, 20:07

ok then, like removedirt, deen, TBilateral, removegrainT, LGhost

edit: and AVSInpaint (for FillBorders() in Stabilization Tools Pack by Dogway)
AvsInpaint.dll used also with InpaintFunc.avs which is the logo remover i get best results with, many Vdub Plugins not natively supported in x64 too, hopefully MP_Pipeline exists and can run some parts of a .avs in win32 mode + some good souls managed to port some other very nice plugins on x64 recently.

This said i gave up on many x86 plug-ins that looked interesting on paper for some specific tasks i had to do and needed a solution for but did not bother with long processing and headache of not being able to pass some variables from one process mode to the other and in the end found other ways to do what i needed using other tools outside avisynth or simply gave up on those projects.

Being able to run natively and more smoothly some x86 stuff like MP_Pipeline allows it will be a dream, but i keep this as a dream :)

real.finder

13th November 2017, 20:36

AvsInpaint.dll used also with InpaintFunc.avs which is the logo remover i get best results with, many Vdub Plugins not natively supported in x64 too, hopefully MP_Pipeline exists and can run some parts of a .avs in win32 mode + some good souls managed to port some other very nice plugins on x64 recently.

This said i gave up on many x86 plug-ins that looked interesting on paper for some specific tasks i had to do and needed a solution for but did not bother with long processing and headache of not being able to pass some variables from one process mode to the other and in the end found other ways to do what i needed using other tools outside avisynth or simply gave up on those projects.

Being able to run natively and more smoothly some x86 stuff like MP_Pipeline allows it will be a dream, but i keep this as a dream :)

mpp is good but it's complicated for some people, has some limit like you can't use """ """ and you can't use some avs+ Features in it (https://github.com/SAPikachu/MP_Pipeline/issues/1), no audio support, and has some overhead ofc

and aside from the mpp downsides, if there is one plugin lacking an x64 port in some function then you have to use x86 just for it!

pinterf

14th November 2017, 16:19

All good things must come to an end: now I stopped optimizing Expr so have fun with this release.

This version - along with fixing some annoying bugs - features the Expr filter, which was ported from the Vapoursynth project. Although it was ported in one day, tweaking it further took a _lot_ more time and, of course, was good entertainment.

Download Avisynth+ r2542 (20171114) (https://github.com/pinterf/AviSynthPlus/releases/tag/r2542-MT)

Questions, testing are welcome.

Please read the "readme.txt" for details about Expr, until the documentation appears in the avisynth webpage.

# Avisynth+ r2542 (20171114)

## Fixes
- Fix: RGB (full scale) conversion: 10-16 bits to 8 bits rounding issue; pic got darker in repeated 16<->8 bit conversion chain
- Fix: ConvertToY: remove unnecessary clamp for Planar RGB 32 bit float
- Fix: RGB ConvertToY when rec601, rec709 (limited range) matrix. Regression since r2266

## modification, additions
- Add: Expr filter
- Add: Levels: 32 bit float format support
- Optimized: Faster RGB (full scale) 10-16 bits to 8 bits conversion when dithering
- Other: Default frame alignment is 64 bytes (was: 32 bytes). (independently of AVX512 support)
- Built with Visual Studio 2017, v141_xp toolset
- some fixes in avisynth_c.h (C interface header file)
- experimental x64 build with size_t frame offsets for testing more properly written C interfaces

edit: quick info about Expr (from readme):
Expr filter

Syntax ("c+s+[format]s[optAvx2]b[optSingleMode]b[optSSE2]b")
clip Expr(clip c[,clip c2, ...], string expr [, string expr2[, string expr3[, string expr4]]] [, string format]
[, bool optSSE2][, bool optAVX2][, bool optSingleMode])

Clip and Expr parameters are unnamed
'format' overrides the output video format
'optSSE2' to disable simd optimizations (use C code)
'optAVX2' to disable AVX2 optimizations (use SSE2 code)
'optSingleMode' default false, to generate simd instructions for one XMM/YMM wide data instead of two. Experimental.
One simd cycle processes 8 pixels (SSE2) or 16 pixels (AVX2) at a time by using two XMM/YMM registers as working set.
Very-very complex expressions would use too many XMM/YMM registers which are then "swapped" to memory slots, that can be slow.
Using optSingleMode = true may result in using less registers with no need for swapping them to memory slots.

Expr accepts 1 to 26 clips as inputs and up to four expression strings, an optional video format overrider, and some debug parameters.
Output video format is inherited from the first clip, when no format override.
All clips have to match their dimensions and plane subsamplings.

Expressions are evaluated on each plane, Y, U, V (and A) or R, G, B (,A).
When an expression string is not specified, the previous expression is used for that plane. Except for plane A (alpha) which is copied by default.
When an expression is an empty string ("") then the relevant plane will be copied (if the output clip bit depth is similar).
When an expression is a single clip reference letter ("x") and the source/target bit depth is similar, then the relevant plane will be copied.
When an expression is constant, then the relevant plane will be filled with an optimized memory fill method.
Expressions are written in Reverse Polish Notation (RPN).
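
The per-plane defaulting rules above can be sketched as follows, in Python for illustration. The function name and the "copy" marker are hypothetical, and the single-clip-letter and constant-fill shortcuts are left out. This is only a reading of the readme text, not the filter's actual code.

```python
def resolve_plane_actions(exprs, planes):
    """Map each plane name to an expression string or to the marker 'copy'."""
    actions = {}
    previous = None
    for index, plane in enumerate(planes):
        if index < len(exprs):
            # An expression was given explicitly for this plane.
            expr = exprs[index].strip()
            previous = expr
        elif plane == "A":
            # Alpha never inherits the previous expression; it is copied.
            actions[plane] = "copy"
            continue
        else:
            # No expression given: reuse the previous one.
            expr = previous
        # An empty expression means the plane is copied.
        actions[plane] = expr if expr else "copy"
    return actions


print(resolve_plane_actions(["x 2 *"], ["Y", "U", "V", "A"]))
print(resolve_plane_actions(["x y +", "", ""], ["Y", "U", "V"]))
```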

Expressions use 32 bit float precision internally

For 8..16 bit formats output is rounded and clamped from the internal 32 bit float representation to valid 8, 10, ... 16 bits range.
32 bit float output is not clamped at all.
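
As an illustration of the store step described above, a small Python sketch. The function name is hypothetical and round-half-up is an assumption; the real SIMD code may round differently on exact .5 values.

```python
import math


def store_pixel(value, bits):
    """Convert an internal float result to the output sample for the given bit depth."""
    if bits == 32:
        # 32 bit float output is passed through without clamping.
        return value
    max_value = (1 << bits) - 1
    rounded = math.floor(value + 0.5)  # assumed rounding mode
    return max(0, min(max_value, rounded))


print(store_pixel(300.7, 8))
print(store_pixel(-4.2, 10))
print(store_pixel(511.6, 10))
```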

- Clips: letters x, y, z, a, ... w. x is the first clip parameter, y is the second one, etc.
- Math: * / + -
- Math constant: pi
- Functions: min, max, sqrt, abs, neg, exp, log, pow ^ (synonyms: "pow" and "^")
- Logical: > < = >= <= and or xor not == & | != (synonyms: "==" and "=", "&" and "and", "|" and "or")
- Ternary operator: ?
- Duplicate stack: dup, dupN (dup1, dup2, ...)
- Swap stack elements: swap, swapN (swap1, swap2, ...)
- Scale by bit shift: scaleb (operand is treated as being a number in 8 bit range unless i8..i16 or f32 is specified)
- Scale by full scale stretch: scalef (operand is treated as being a number in 8 bit range unless i8..i16 or f32 is specified)
- Bit-depth aware constants
ymin, ymax (ymin_a .. ymin_z for individual clips) - the usual luma limits (16..235 or scaled equivalents)
cmin, cmax (cmin_a .. cmin_z) - chroma limits (16..240 or scaled equivalents)
range_half (range_half_a .. range_half_z) - half of the range, (128 or scaled equivalents)
range_size, range_half, range_max (range_size_a .. range_size_z , etc..)
- Keywords for modifying base bit depth for scaleb and scalef: i8, i10, i12, i14, i16, f32
- Spatial input variables in expr syntax:
sx, sy (absolute x and y coordinates, 0 to width-1 and 0 to height-1)
sxr, syr (relative x and y coordinates, from 0 to 1.0)
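
For the two autoscale helpers in the list above, a hedged Python sketch of the arithmetic as the readme describes it (bit shift versus full scale stretch). The Python signatures are made up for illustration, only integer target depths are covered, and the f32 case is not modelled.

```python
def scaleb(value, target_bits, base_bits=8):
    """Scale by bit shift: multiply or divide by a power of two."""
    return value * 2.0 ** (target_bits - base_bits)


def scalef(value, target_bits, base_bits=8):
    """Scale by full scale stretch: multiply by target_max / base_max."""
    base_max = (1 << base_bits) - 1
    target_max = (1 << target_bits) - 1
    return value * target_max / base_max


# Limited-range luma floor (16 in 8 bits) moved to 10 bits by shifting.
print(scaleb(16, 10))
# Full-range white (255 in 8 bits) stretched to the 10-bit maximum.
print(scalef(255, 10))
```

The difference matters at the top of the range: shifting 255 to 10 bits gives 1020, while stretching gives 1023.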

Additions and differences to VS r39 version:
------------------------------
(similar features to the masktools mt_lut family syntax)

Aliases:
introduced "^", "==", "&", "|"
New operator: != (not equal)
Built-in constants
ymin, ymax (ymin_a .. ymin_z for individual clips) - the usual luma limits (16..235 or scaled equivalents)
cmin, cmax (cmin_a .. cmin_z) - chroma limits (16..240 or scaled equivalents)
range_half (range_half_a .. range_half_z) - half of the range, (128 or scaled equivalents)
range_size, range_half, range_max (range_size_a .. range_size_z , etc..)
Autoscale helper functions (operand is treated as being a number in 8 bit range unless i8..i16 or f32 is specified)
scaleb (scale by bit shift - mul or div by 2, 4, 6, 8...)
scalef (scale by stretch full scale - mul or div by source_max/target_max)
Keywords for modifying base bit depth for scaleb and scalef
: i8, i10, i12, i14, i16, f32
Built-in math constant
pi
Alpha plane handling. When no separate expression is supplied for alpha, plane is copied instead of reusing last expression parameter.
Proper clamping when storing 10, 12 or 14 bit outputs
(Faster storing of results for 8 and 10-16 bit outputs, fixed in VS r40)
16 pixels/cycle instead of 8 when avx2, with fallback to 8-pixel case on the right edge. Thus no need for 64 byte alignment for 32 bit float.
(Load zeros for nonvisible pixels, when simd block size goes beyond image width, to prevent garbage input for simd calculation)

Optimizations for pow: x^0.5 is sqrt, ^2, ^3, ^4 is done by faster and more precise multiplication
Spatial input variables in expr syntax:
sx, sy (absolute x and y coordinates, 0 to width-1 and 0 to height-1)
sxr, syr (relative x and y coordinates, from 0 to 1.0)
Optimize: recognize constant plane expression: use fast memset instead of generic simd process. Approx. 3-4x (32 bits) to 10-12x (8 bits) speedup
Optimize: Recognize single clip letter in expression: use fast plane copy (BitBlt)
(e.g. for 8-16 bits: instead of load-convert_to_float-clamp-convert_to_int-store). Approx. 1.4x (32 bits), 3x (16 bits), 8-9x (8 bits) speedup
Optimize: do not call GetFrame for input clips that are not referenced or plane-copied
Recognize constant expression: use fast memset instead of generic simd process. Approx. 3-4x (32 bits) to 10-12x (8 bits) speedup
Example: Expr(clip,"128","128","128")

Differences from masktools 2.2.10
--------------------------------
Up to 26 clips are allowed (x,y,z,a,b,...w). Masktools handles only up to 4 clips with its mt_lut, mt_lutxy, mt_lutxyz, mt_lutxyza

- Clips with different bit depths are allowed
- Works with 32 bit floats instead of 64 bit double internally
- Less functions (e.g. no bit shifts)
- No float clamping and float-to-8bit-and-back load/store autoscale magic (yet)
- Logical 'false' is 0 instead of -1
- The ymin, ymax, etc built-in constants can have a _X suffix, where X is the corresponding clip designator letter. E.g. cmax_z, range_half_x
- mt_lutspa-like functionality is available through "sx", "sy", "sxr", "syr"
- No y= u= v= parameters with negative values for filling plane with constant value, constant expressions are changed into optimized "fill" mode

Sample:

Average three clips:
c = Expr(clip1, clip2, clip3, "x y + z + 3 /")

using spatial feature:
c = Expr(clip1, clip2, clip3, "sxr syr 1 sxr - 1 syr - * * * 4096 scaleb *", "", "")
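
For readers new to RPN, a tiny single-pixel evaluator in Python shows how a string like the averaging sample is processed. It is a toy model covering only a handful of operators, the function name is hypothetical, and the "greater than 0 means true" rule for the ternary operator is an assumption taken from the VapourSynth Expr convention. The real filter compiles the expression to SIMD code instead of interpreting it.

```python
import math
import operator

BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": math.pow,
    "pow": math.pow,
    "min": min,
    "max": max,
    # Comparisons yield 1.0 for true and 0.0 for false.
    ">": lambda a, b: float(a > b),
    "<": lambda a, b: float(a < b),
    "==": lambda a, b: float(a == b),
    "=": lambda a, b: float(a == b),
}
UNARY_OPS = {"abs": abs, "sqrt": math.sqrt, "neg": operator.neg}


def eval_rpn(expr, **pixels):
    """Evaluate an RPN expression for one pixel; pixels maps clip letters to values."""
    stack = []
    for token in expr.split():
        if token in pixels:
            stack.append(float(pixels[token]))
        elif token in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[token](left, right))
        elif token in UNARY_OPS:
            stack.append(UNARY_OPS[token](stack.pop()))
        elif token == "?":
            if_false = stack.pop()
            if_true = stack.pop()
            condition = stack.pop()
            stack.append(if_true if condition > 0 else if_false)
        else:
            stack.append(float(token))  # numeric constant
    if len(stack) != 1:
        raise ValueError("expression must leave exactly one value on the stack")
    return stack[0]


print(eval_rpn("x y + z + 3 /", x=30, y=60, z=90))
print(eval_rpn("x y > x y ?", x=5, y=9))
```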

Myrsloik

14th November 2017, 16:59

About how much faster is avx2 vs sse2 on a modern cpu in your expr version?

real.finder

14th November 2017, 17:05

thanks pinterf

- Clips with different bit depths are allowed

some friend that use vs said that float <- -> int is broken in vs expr, did you note that and fix it?

pinterf

14th November 2017, 17:14

About how much faster is avx2 vs sse2 on a modern cpu in your expr version?

I had to do that in blind mode, I have no AVX2, only through SDE emulator. I could test it only two days ago on a 2 yr old i5 notebook and the results show that it was worth implementing.
Other speed tests are welcome, that's why there are optXXX parameters.

results in fps
avx2: set it only in Expr through optAvx2 parameter
bits   i5 SSE2 32-bit   i5 SSE2 64-bit   i5 AVX2 32-bit   i5 AVX2 64-bit
8      17.00            19.30            24.63            28.98
16     15.69            17.59            20.38            23.26
32     12.70            13.59            16.03            17.14

The script was something like this (deleted my debug experimental commented out lines)
lsmashvideosource("13HoursCUT.mp4", format="YUV444P8")
Spline64Resize(486,240) #resize, result is a multistacked image
src=last
# expr
c8 = CalcTest(src,8, False)
c16 = CalcTest(src,16, False)
c32 = CalcTest(src,32, False)

# lutxy
c8e = CalcTest(src,8, True)
c16e = CalcTest(src,16, True)
c32e = CalcTest(src,32, True)

res8=Diff(c8,src)
res16=Diff(c16,src)
res32=Diff(c32,src)

res8e=Diff(c8e,src)
res16e=Diff(c16e,src)
res32e=Diff(c32e,src)

col1=StackVertical(c8,c16.convertbits(8),c32.convertbits(8))
col2=StackVertical(res8, res16, res32)
col3=StackVertical(c8e,c16e.convertbits(8),c32e.convertbits(8))
col4=StackVertical(res8e, res16e, res32e)
StackHorizontal(col1, col2, col3, col4)

#used only c8, c16 or c32 output for speed test from the clips above.
# change parameters. e.g. optSSE2=true, optSingleMode=false, optAvx2=false
c8

Function Diff(clip src1, clip src2)
{
return Subtract(src1.ConvertBits(8),src2.ConvertBits(8)).Levels(120, 1, 255-120, 0, 255, coring=false)
}

Function CalcTest(clip src, int bits, bool lut)
{
src
convertbits(bits)
tmp=last
method=Blur(1)

szrp=16
spwr=4
str=100/100.0
sdmplo=4
sdmphi=48
expr_pow = "x y == x x x y - abs "+string(Szrp) +" scaleb / 1 "+string(Spwr)+" / ^ "+string(Szrp) +" scaleb * "+string(str)+" * x y - 2 ^ x y - 2 ^ "
\+string(SdmpLo)+" scaleb scaleb + / * x y - x y - abs / * 1 "+string(SdmpHi)+" scaleb 0 == 0 x y - abs "+string(SdmpHi)+" scaleb / 4 ^ ? + / + ?"

ret=lut ? mt_lutxy(tmp,method, yexpr=expr_pow, U=1,V=1 ) : Expr(tmp,method,expr_pow,"","", optSSE2=true, optSingleMode=false, optAvx2=false)
return ret
}

pinterf

14th November 2017, 17:15

thanks pinterf

some friend that use vs said that float <- -> int is broken in vs expr, did you note that and fix it?

Yes, fixed and noted.

ryrynz

14th November 2017, 22:04

Any idea when your commits are going to be merged officially? You're the only one spearheading code in right now.

wonkey_monkey

15th November 2017, 10:03

sxr, syr (relative x and y coordinates, from 0 to 1.0)

Aren't those normalised, rather than relative?

pinterf

15th November 2017, 10:25

Aren't those normalised, rather than relative?
Yes, normalized, but I took the terminology from here:
http://avisynth.nl/index.php/MaskTools2/mt_lutspa
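
To make the "normalized" reading concrete, here is a small Python sketch of how a pixel position could map to the 0 to 1.0 range. The function name and the division by (size - 1) are assumptions for illustration; the thread only states that sxr and syr run from 0 to 1.0.

```python
def normalized(pos: int, size: int) -> float:
    """Map a pixel index in [0, size-1] to a coordinate in [0.0, 1.0].

    Assumption: the last pixel maps exactly to 1.0, i.e. the divisor
    is (size - 1). A single-pixel dimension is mapped to 0.0.
    """
    if size <= 1:
        return 0.0
    return pos / (size - 1)


if __name__ == "__main__":
    width = 486  # width used by the test script earlier in the thread
    for sx in (0, 243, 485):
        print(sx, normalized(sx, width))
```

Under this assumption the value depends only on the position relative to the frame size, so the same expression gives the same spatial pattern at any resolution.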

edcrfv94

15th November 2017, 12:29

I changed mt_lutxy to Expr, and now the limit function no longer works.

Function kf_limit_dif8_expr(clip filtered, clip original, bool "smooth", float "thr", float "elast", float "darkthr", int "Y", int "U", int "V")
{
sCSP = filtered.kf_GetCSP()
IsY8 = sCSP == "Y8"

smooth = Default(smooth, True )
thr = Default(thr, 1.0 )
elast = Default(elast, smooth ? 3.0 : 255./thr)
darkthr = Default(darkthr,thr )
Y = Default(Y, 3 )
U = Default(U, 3 )
V = Default(V, 3 )

Y = min(Y, 4)
U = min(U, 4)
V = min(V, 4)

thr = max(min( thr, 255.0), 0.0)
darkthr = max(min(darkthr, 255.0), 0.0)
elast = max(elast, 1.0)
mode = thr == 0 && darkthr == 0 ? 4 : thr == 255 && darkthr == 255 ? 2 : 3
smooth = elast==1 ? False : smooth

diffstr = " x y - "
elaststr = " "+string(elast)+" "

thrstr = diffstr+" 0 > "+string(darkthr)+" scalef "+string(thr)+" scalef ? "
alphastr = elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstr+" * / ? "
betastr = thrstr+elaststr+" * "
sexpr = smooth ? alphastr+diffstr+" * "+betastr+diffstr+" abs - * y + "
\ : thrstr+diffstr+diffstr+" abs / * y + "
expr = diffstr+" abs "+thrstr+" <= x "+diffstr+" abs "+betastr+" >= y "+sexpr+" ? ? "

thrstrc = " "+string(thr)+" scalef "
alphastrc= elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstrc+" * / ? "
betastrc = thrstrc+elaststr+" * "
sexprc = smooth ? alphastrc+diffstr+" * "+betastrc+diffstr+" abs - * y + "
\ : thrstrc+diffstr+diffstr+" abs / * y + "
exprc = diffstr+" abs "+thrstrc+" <= x "+diffstr+" abs "+betastrc+" >= y "+sexprc+" ? ? "

# diff = filtered - original
# alpha = 1 / (thr * (elast - 1))
# beta = elast * thr
# When smooth=True :
# output = abs(diff) <= thr ? filtered : \
# abs(diff) >= beta ? original : \
# original + alpha * diff * (beta - abs(diff))
# When smooth=False :
# output = abs(diff) <= thr ? filtered : \
# abs(diff) >= beta ? original : \
# original + thr * (diff / abs(diff))

expry = (Y == 3) ? expr : ""
expru = (U == 3) ? exprc : ""
exprv = (V == 3) ? exprc : ""

return (mode == 4) ? original
\ : (mode == 2) ? filtered
\ : IsY8 ? Expr(filtered, original, expry, optSSE2=true, optSingleMode=false, optAvx2=true)
\ : Expr(filtered, original, expry, expru, exprv, optSSE2=true, optSingleMode=false, optAvx2=true)
}

Function kf_limit_dif8_mt(clip filtered, clip original, bool "smooth", float "thr", float "elast", float "darkthr", int "Y", int "U", int "V")
{

smooth = Default(smooth, True )
thr = Default(thr, 1.0 )
elast = Default(elast, smooth ? 3.0 : 255./thr)
darkthr = Default(darkthr,thr )
Y = Default(Y, 3 )
U = Default(U, 3 )
V = Default(V, 3 )

Y = min(Y, 4)
U = min(U, 4)
V = min(V, 4)

thr = max(min( thr, 255.0), 0.0)
darkthr = max(min(darkthr, 255.0), 0.0)
elast = max(elast, 1.0)
mode = thr == 0 && darkthr == 0 ? 4 : thr == 255 && darkthr == 255 ? 2 : 3
smooth = elast==1 ? False : smooth

diffstr = " x y - "
elaststr = " "+string(elast)+" "

thrstr = diffstr+" 0 > "+string(darkthr)+" scalef "+string(thr)+" scalef ? "
alphastr = elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstr+" * / ? "
betastr = thrstr+elaststr+" * "
sexpr = smooth ? alphastr+diffstr+" * "+betastr+diffstr+" abs - * y + "
\ : thrstr+diffstr+diffstr+" abs / * y + "
expr = diffstr+" abs "+thrstr+" <= x "+diffstr+" abs "+betastr+" >= y "+sexpr+" ? ? "

thrstrc = " "+string(thr)+" scalef "
alphastrc= elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstrc+" * / ? "
betastrc = thrstrc+elaststr+" * "
sexprc = smooth ? alphastrc+diffstr+" * "+betastrc+diffstr+" abs - * y + "
\ : thrstrc+diffstr+diffstr+" abs / * y + "
exprc = diffstr+" abs "+thrstrc+" <= x "+diffstr+" abs "+betastrc+" >= y "+sexprc+" ? ? "

# diff = filtered - original
# alpha = 1 / (thr * (elast - 1))
# beta = elast * thr
# When smooth=True :
# output = abs(diff) <= thr ? filtered : \
# abs(diff) >= beta ? original : \
# original + alpha * diff * (beta - abs(diff))
# When smooth=False :
# output = abs(diff) <= thr ? filtered : \
# abs(diff) >= beta ? original : \
# original + thr * (diff / abs(diff))

return mode == 4 ? original
\ : mode == 2 ? filtered
\ : mt_lutxy(filtered, original, yExpr=expr, uExpr=exprc, vExpr=exprc, Y=Y, U=U, V=V)
}
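
For anyone comparing the two code paths, the following Python sketch rebuilds the smooth=True luma expression the same way the script does and evaluates it with a tiny RPN interpreter, next to a direct version of the formula from the comments. Everything here is an assumption for illustration: the helper names are made up, only the operators used by this one expression are supported, "scalef" is modelled as a full-scale stretch from 8 bits (times (2^bits - 1) / 255), and a condition counts as true when it is greater than zero. It is not the Expr or masktools implementation.

```python
import operator

# Binary operators needed by the limit expression only.
BINARY = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "<=": lambda a, b: float(a <= b),
    ">=": lambda a, b: float(a >= b),
    ">": lambda a, b: float(a > b),
}


def eval_rpn(expr, x, y, bits=8):
    """Evaluate the RPN string for one pixel pair (x, y).

    All branches are evaluated eagerly, so a division by zero in an
    untaken branch would still raise here.
    """
    peak = (1 << bits) - 1
    stack = []
    for tok in expr.split():
        if tok == "x":
            stack.append(float(x))
        elif tok == "y":
            stack.append(float(y))
        elif tok == "abs":
            stack.append(abs(stack.pop()))
        elif tok == "scalef":  # assumed: full-scale stretch from 8 bits
            stack.append(stack.pop() * peak / 255.0)
        elif tok == "?":  # "cond a b ?" means cond ? a : b
            b = stack.pop()
            a = stack.pop()
            cond = stack.pop()
            stack.append(a if cond > 0 else b)
        elif tok in BINARY:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(BINARY[tok](lhs, rhs))
        else:
            stack.append(float(tok))  # numeric literal
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


def build_limit_expr(thr, elast, darkthr):
    """Mirror the string concatenation of the script (smooth=True, luma)."""
    diff = " x y - "
    e = f" {elast} "
    thr_s = f"{diff} 0 > {darkthr} scalef {thr} scalef ? "
    alpha = f"{e} 1 <= 0 1 {e} 1 - {thr_s} * / ? "
    beta = f"{thr_s}{e} * "
    smooth = f"{alpha}{diff} * {beta}{diff} abs - * y + "
    return f"{diff} abs {thr_s} <= x {diff} abs {beta} >= y {smooth} ? ? "


def limit_reference(x, y, thr, elast, darkthr, bits=8):
    """Direct form of the smooth=True formula from the script comments."""
    scale = ((1 << bits) - 1) / 255.0
    diff = x - y
    t = (darkthr if diff > 0 else thr) * scale
    beta = t * elast
    if abs(diff) <= t:
        return float(x)  # filtered
    if abs(diff) >= beta:
        return float(y)  # original
    alpha = 1.0 / ((elast - 1) * t)
    return y + alpha * diff * (beta - abs(diff))


if __name__ == "__main__":
    expr = build_limit_expr(thr=1, elast=42, darkthr=1)
    for x, y in [(1100, 1000), (3000, 1000), (20000, 1000)]:
        print(x, y, eval_rpn(expr, x, y, 16), limit_reference(x, y, 1, 42, 1, 16))
```

If the stretch assumption holds, both columns should agree for 16-bit input, which gives a rough yardstick for checking how far a real Expr or mt_lutxy result deviates.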

pinterf

15th November 2017, 12:48

I changed mt_lutxy to Expr, and now the limit function no longer works.

Could you specify your calling parameters and what should I look for in the results?

edcrfv94

15th November 2017, 13:05

Could you specify your calling parameters and what should I look for in the results?

The result is very different.

SetMemoryMax(3000)

#colorbars(width=1920, height=1080, pixel_type="yv12").killaudio().assumefps(25, 1)
ImageSource("1.png", end=0).Dither_convert_rgb_to_yuv(lsb=true,output="YV12").DitherPost(mode=6)

ConvertToY8()
trim(0, 5000)
#Limiter()
#InvertNeg()
#VToY()

src = last.ConvertBits(bits=16)

sharp = src.Sharpen(1.0).Sharpen(1.0)

kf_limit_dif8_mt_test(sharp, src, thr=1, elast=42, y=3, u=1, v=1)
#kf_limit_dif8_expr_test(sharp, src, thr=1, elast=42, y=3, u=1, v=1)

ConvertToStacked().DitherPost(mode=6, ampo=1)

Function kf_limit_dif8_expr_test(clip filtered, clip original, bool "smooth", float "thr", float "elast", float "darkthr", int "Y", int "U", int "V")
{
sCSP = filtered.kf_GetCSP()
IsY8 = sCSP == "Y8"

smooth = Default(smooth, True )
thr = Default(thr, 1.0 )
elast = Default(elast, smooth ? 3.0 : 255./thr)
darkthr = Default(darkthr,thr )
Y = Default(Y, 3 )
U = Default(U, 3 )
V = Default(V, 3 )

Y = min(Y, 4)
U = min(U, 4)
V = min(V, 4)

thr = max(min( thr, 255.0), 0.0)
darkthr = max(min(darkthr, 255.0), 0.0)
elast = max(elast, 1.0)
mode = thr == 0 && darkthr == 0 ? 4 : thr == 255 && darkthr == 255 ? 2 : 3
smooth = elast==1 ? False : smooth

diffstr = " x y - "
elaststr = " "+string(elast)+" "

thrstr = diffstr+" 0 > "+string(darkthr)+" scalef "+string(thr)+" scalef ? "
alphastr = elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstr+" * / ? "
betastr = thrstr+elaststr+" * "
sexpr = smooth ? alphastr+diffstr+" * "+betastr+diffstr+" abs - * y + "
\ : thrstr+diffstr+diffstr+" abs / * y + "
expr = diffstr+" abs "+thrstr+" <= x "+diffstr+" abs "+betastr+" >= y "+sexpr+" ? ? "

thrstrc = " "+string(thr)+" scalef "
alphastrc= elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstrc+" * / ? "
betastrc = thrstrc+elaststr+" * "
sexprc = smooth ? alphastrc+diffstr+" * "+betastrc+diffstr+" abs - * y + "
\ : thrstrc+diffstr+diffstr+" abs / * y + "
exprc = diffstr+" abs "+thrstrc+" <= x "+diffstr+" abs "+betastrc+" >= y "+sexprc+" ? ? "

# diff = filtered - original
# alpha = 1 / (thr * (elast - 1))
# beta = elast * thr
# When smooth=True :
# output = abs(diff) <= thr ? filtered : \
# abs(diff) >= beta ? original : \
# original + alpha * diff * (beta - abs(diff))
# When smooth=False :
# output = abs(diff) <= thr ? filtered : \
# abs(diff) >= beta ? original : \
# original + thr * (diff / abs(diff))

expry = (Y == 3) ? expr : ""
expru = (U == 3) ? exprc : ""
exprv = (V == 3) ? exprc : ""

return (mode == 4) ? original
\ : (mode == 2) ? filtered
\ : IsY8 ? Expr(filtered, original, expry, optSSE2=true, optSingleMode=false, optAvx2=true)
\ : Expr(filtered, original, expry, expru, exprv, optSSE2=true, optSingleMode=false, optAvx2=true)
}

Function kf_limit_dif8_mt_test(clip filtered, clip original, bool "smooth", float "thr", float "elast", float "darkthr", int "Y", int "U", int "V")
{

smooth = Default(smooth, True )
thr = Default(thr, 1.0 )
elast = Default(elast, smooth ? 3.0 : 255./thr)
darkthr = Default(darkthr,thr )
Y = Default(Y, 3 )
U = Default(U, 3 )
V = Default(V, 3 )

Y = min(Y, 4)
U = min(U, 4)
V = min(V, 4)

thr = max(min( thr, 255.0), 0.0)
darkthr = max(min(darkthr, 255.0), 0.0)
elast = max(elast, 1.0)
mode = thr == 0 && darkthr == 0 ? 4 : thr == 255 && darkthr == 255 ? 2 : 3
smooth = elast==1 ? False : smooth

diffstr = " x y - "
elaststr = " "+string(elast)+" "

thrstr = diffstr+" 0 > "+string(darkthr)+" scalef "+string(thr)+" scalef ? "
alphastr = elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstr+" * / ? "
betastr = thrstr+elaststr+" * "
sexpr = smooth ? alphastr+diffstr+" * "+betastr+diffstr+" abs - * y + "
\ : thrstr+diffstr+diffstr+" abs / * y + "
expr = diffstr+" abs "+thrstr+" <= x "+diffstr+" abs "+betastr+" >= y "+sexpr+" ? ? "

thrstrc = " "+string(thr)+" scalef "
alphastrc= elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstrc+" * / ? "
betastrc = thrstrc+elaststr+" * "
sexprc = smooth ? alphastrc+diffstr+" * "+betastrc+diffstr+" abs - * y + "
\ : thrstrc+diffstr+diffstr+" abs / * y + "
exprc = diffstr+" abs "+thrstrc+" <= x "+diffstr+" abs "+betastrc+" >= y "+sexprc+" ? ? "

# diff = filtered - original
# alpha = 1 / (thr * (elast - 1))
# beta = elast * thr
# When smooth=True :
# output = abs(diff) <= thr ? filtered : \
# abs(diff) >= beta ? original : \
# original + alpha * diff * (beta - abs(diff))
# When smooth=False :
# output = abs(diff) <= thr ? filtered : \
# abs(diff) >= beta ? original : \
# original + thr * (diff / abs(diff))

return mode == 4 ? original
\ : mode == 2 ? filtered
\ : mt_lutxy(filtered, original, yExpr=expr, uExpr=exprc, vExpr=exprc, Y=Y, U=U, V=V)
}

Function kf_GetCSP(clip c)
{
try {
csp = c.kf_GetCSP_avsPlus()
} catch (error_msg) {
csp = c.kf_GetCSP_avs()
}
return csp
}

Function kf_GetCSP_avs(clip c)
{
return c.IsPlanar ? c.IsYV12 ? "YV12" :
\ c.IsYV16 ? "YV16" :
\ c.IsYV24 ? "YV24" : c.kf_GetCSP_Y8_YV411() :
\ c.IsYUY2 ? "YUY2" :
\ c.IsRGB32 ? "RGB32" :
\ c.IsRGB24 ? "RGB24" : "Unknown"

Function kf_GetCSP_Y8_YV411(clip c) {
try {
c.UtoY
csp = "YV411"
} catch (error_msg) {
csp = "Y8"
}
return csp
}
}

Function kf_GetCSP_avsPlus(clip c)
{
return c.Is420 ? "YV12" :
\ c.IsY ? "Y8" :
\ c.Is422 ? "YV16" :
\ c.Is444 ? "YV24" :
\ c.IsYUVA ? "YUVA" :
\ c.IsYV411 ? "YV411" :
\ c.IsYUY2 ? "YUY2" :
\ c.IsRGB32 ? "RGB32" :
\ c.IsRGB24 ? "RGB24" :
\ c.IsPackedRGB ? "RGB32/RGB24" :
\ c.IsPlanarRGB ? "RGB48" :
\ c.IsPlanarRGBA ? "RGB64" : "Unknown"
}

pinterf

15th November 2017, 13:39

The result is very different.
...code...

Thanks, this is a scalef bug at 10-16 bits; scaleb was OK.
I'll wait a bit before making a new version with the fix.
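
As background on why only one of the two keywords was affected, here is a minimal Python model of how the two scaling modes are generally described: scaleb scales an 8-bit constant by bit shift, scalef by full-scale stretch. The function names mirror the keywords, but the bodies are an assumption based on that description for integer formats of 8 to 16 bits, not the Avisynth+ source, and they say nothing about what the bug itself was.

```python
def scaleb(value, bits):
    """Scale an 8-bit constant to `bits` by bit shift (multiply by 2^(bits-8))."""
    if not 8 <= bits <= 16:
        raise ValueError("this sketch only models 8..16-bit integer formats")
    return value * (1 << (bits - 8))


def scalef(value, bits):
    """Scale an 8-bit constant to `bits` by full-scale stretch (255 -> peak)."""
    if not 8 <= bits <= 16:
        raise ValueError("this sketch only models 8..16-bit integer formats")
    return value * ((1 << bits) - 1) / 255.0


if __name__ == "__main__":
    # thr=1 from the test call, at the 16-bit depth used in the script
    print(scaleb(1, 16), scalef(1, 16))
```

At 8 bits both modes leave the constant unchanged, so a script that only ever ran on 8-bit clips would not show any difference between them.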

pinterf

15th November 2017, 17:32

New build with a hotfix

Download Avisynth+ r2544 (https://github.com/pinterf/AviSynthPlus/releases/tag/r2544-MT)

20171115 r2544
- Expr: fix "scalef" for 10-16 bits
- Expr optimization: eliminate ^1 +0 -0 *1 /1
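
The optimization entry can be pictured as a peephole pass over the RPN tokens: a constant that is the neutral element of the operator right after it can be dropped together with that operator. The Python sketch below is only an illustration of that idea with hypothetical names; the actual Avisynth+ code may work quite differently.

```python
# Operator -> value of the right-hand operand that makes it a no-op.
NEUTRAL = {"^": 1.0, "*": 1.0, "/": 1.0, "+": 0.0, "-": 0.0}


def _as_number(tok):
    """Return the token as a float, or None if it is not a numeric literal."""
    try:
        return float(tok)
    except ValueError:
        return None


def strip_identity_ops(expr):
    """Remove 'A 1 ^', 'A 0 +', 'A 0 -', 'A 1 *' and 'A 1 /' patterns.

    In RPN the token directly before a binary operator is its right-hand
    operand when that token is a literal, so 'A 1 *' can shrink to 'A'.
    """
    out = []
    for tok in expr.split():
        neutral = NEUTRAL.get(tok)
        if neutral is not None and len(out) >= 2 and _as_number(out[-1]) == neutral:
            out.pop()  # drop the neutral constant and skip the operator
            continue
        out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    print(strip_identity_ops("x 1 ^ 0 + y 1 * -"))
```

Note that only the right-hand operand is checked, so "0 x +" is left alone, and "x 0 *" is not touched either because zero is not neutral for multiplication.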