Avisynth+ [Archive] - Page 111

StainlessS

18th April 2020, 13:32

Thanks, seems that there are too many filters for my memory capacity :)

Suggest a 16KB RAM pack upgrade and a lump of Blu Tack to prevent RAM pack wobble.

dREV

24th April 2020, 19:32

Hi, I don't know if this is an AviSynth question or an HEVC one. Is it possible to write a script that does both chroma shift and chroma upscale (I think the latter is what it does) for 4:2:0 encoding in HEVC, similar to the script below?

This is the script I'm currently using:

dither_convert_8_to_16()
# filter(s)
s16 = last
DitherPost()
# 8 bit filter(s)
dither_convert_8_to_16()
s16.Dither_limit_dif16 (last,)
ly = GradFun3mod(resizer="DebilinearM", lsb_in=true, lsb=true)
lc = nnedi3_resize16(1280*2, 720*2,lsb_in=true,lsb=true,kernel_d="Spline36",kernel_u="Spline36",src_top=0.0,src_left=0.50,nlsb=false)
lu = lc.UtoY()
lv = lc.VtoY()
YtoUV(lu,lv,ly)
Dither_out()

Here's another resize script I also use.

Y = ConvertToY8().dither_resize16(1280,720,kernel="Spline36") # src top and left
U = UToY8().dither_resize16(1280,720,kernel="blackmanminlobe",src_top=0.0,src_left=0.25)
V = VToY8().dither_resize16(1280,720,kernel="blackmanminlobe",src_top=0.0,src_left=0.25)
YToUV(U, V, Y)

I don't understand what the code above is doing during the resize (ly, lc, lu, yuv); I just like the end result. If someone could explain it, or link to an AviSynth wiki page so I can understand it better, I'd appreciate that. I doubt I'd be able to come up with a script myself.

I get no issues with H.264 4:2:0, and of course none with HEVC 4:4:4.

Just in case it's asked, I am using the correct settings for HEVC: --input-depth 16 --profile main10 --input-csp i420, with AviSynth+ 3.5.0. I am also using a possibly out-of-date avs4x26x (https://astrataro.wordpress.com/2014/08/28/avs4x26x-0-10-0/), if that helps. :D

real.finder

24th April 2020, 21:06

Seems you missed this post: https://forum.doom9.org/showpost.php?p=1906222&postcount=405 :)

Also this: https://forum.doom9.org/showthread.php?p=1906465#post1906465

real.finder

24th April 2020, 22:09

I found something when I tried adding HBD support to YLevels by Didée:

function Ylevels(clip clp,
\ float "input_low", float "gamma", float "input_high",
\ float "output_low", float "output_high", bool "show_function")
{
sisavs26 = !(VersionNumber() < 2.60)
input_low = Default(input_low, 0)
gamma = Default(gamma, 1.0)
input_high = Default(input_high, 255)
output_low = Default(output_low, 0)
output_high = Default(output_high, 255)
show_function = Default(show_function, false)

wicked = sisavs26 ? "x " +string(input_low)+ " scalef - " +string(input_high)+ " scalef " +string(input_low)+ " scalef - / 1 " +string(gamma)+
\ " / ^ " +string(output_high)+ " scalef " +string(output_low)+ " scalef - * " +string(output_low)+ " scalef +"
\ : "x " +string(input_low)+ " - " +string(input_high)+ " " +string(input_low)+ " - / 1 " +string(gamma)+
\ " / ^ " +string(output_high)+ " " +string(output_low)+ " - * " +string(output_low)+ " +"

return( show_function ? clp.subtitle(wicked) : sisavs26 ? clp.mt_lut(Yexpr = wicked, use_expr=2, U=2,V=2) : clp.mt_lut(Yexpr = wicked, U=2,V=2) )
}

It's OK in 16 bit, and in float it's OK as long as I don't set gamma to 1.2:

convertbits(32)
Ylevels(40, 1.2, 255, 0, 255, false)
Limiter(0,1,-0.5,0.5)
convertbits(8)

I also tried scale_input="float", but it still gives white artifacts in dark parts of the frame, both with and without use_expr=2.

Even if I scale the expression with "x 255 *" at the start and "255 /" at the end, it's still the same. Are there certain limits when the input is a float clip, or is it a bug?

pinterf

25th April 2020, 06:08

What does the original formula that should be implemented with Expr look like?

real.finder

25th April 2020, 08:44

output = ( (input - input_low) / (input_high - input_low) )^(1 / gamma) * (output_high - output_low) + output_low
as from http://avisynth.nl/index.php/Levels
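For checking numbers outside AviSynth, here is that formula transcribed literally into C++ (the name levels_formula is just for illustration). Note that it has no clamp: an input below input_low gives a negative base for the power, which is exactly the case that misbehaves in the float clip.

```cpp
#include <cassert>
#include <cmath>

// Levels formula exactly as quoted, with no clamping anywhere:
// output = ((input - input_low) / (input_high - input_low))^(1 / gamma)
//          * (output_high - output_low) + output_low
double levels_formula(double input, double input_low, double gamma,
                      double input_high, double output_low, double output_high) {
    // p is 0..1 only while input lies inside [input_low, input_high]
    const double p = (input - input_low) / (input_high - input_low);
    return std::pow(p, 1.0 / gamma) * (output_high - output_low) + output_low;
}
```

With gamma = 2, an input of 63.75 (a quarter of the range) maps to 127.5, and with the thread's test values (input_low=40, gamma=1.2) an input of 20 produces NaN.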

StainlessS

25th April 2020, 11:37

Does mt_Infix() no longer work? [Win7]

Colorbars(Pixel_type="YV12").ConvertToYV24.ConvertBits(32)
YLevels()
convertToRGB32

# Inside YLevels
RT_DebugF("Wicked = '%s'",wicked)
infix = Mt_Infix(wicked)
RT_DebugF("Infix = '%s'",Infix)

00000315 0.14433271 [4556] RT_DebugF: Wicked = 'x 0 scalef - 255 scalef 0 scalef - / 1 1.000000 / ^ 255 scalef 0 scalef - * 0 scalef +'
00000316 0.14450951 [4556] RT_DebugF: Infix = '(0+)'

EDIT: masktools2_x86(NOT_XP)_2.2.18.dll

EDIT: This works, so it's not totally broken [error in the wicked RPN? (I don't understand that new scalef stuff)]

Colorbars
RPN="x y - abs"
IFX=mt_infix(RPN)
RT_DebugF("RPN = '%s'\nIFX = '%s'",RPN,IFX)
Subtitle("RPN = '"+RPN+"'\n"+"IFX = '"+IFX+"'",lsp=0,font="Courier New")

__END__

00000962 0.17500676 [3320] RT_DebugF: RPN = 'x y - abs'
00000963 0.17506003 [3320] RT_DebugF: IFX = 'abs((x-y))'
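For reference, the kind of conversion mt_infix performs can be sketched as a small stack machine: operands are pushed, a binary operator pops two entries and pushes "(a op b)", and a function like abs pops one and pushes "abs(a)". This is a hypothetical C++ sketch, not the masktools source, and it only knows the handful of tokens used in this thread; it does reproduce the infix strings logged here for the plain RPN expressions. A converter that does not recognise an operator such as scalef would treat it as an operand and end up with leftover stack items, which is one plausible way to get garbled output (a guess, not verified against 2.2.18).

```cpp
#include <cassert>
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Hypothetical RPN -> infix converter (illustration only, not masktools code).
std::string rpn_to_infix(const std::string& rpn) {
    std::istringstream in(rpn);
    std::stack<std::string> st;
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/" || tok == "^") {
            if (st.size() < 2) throw std::runtime_error("stack underflow at " + tok);
            const std::string b = st.top(); st.pop();  // right operand is on top
            const std::string a = st.top(); st.pop();
            st.push("(" + a + tok + b + ")");
        } else if (tok == "abs" || tok == "sqrt") {
            if (st.empty()) throw std::runtime_error("stack underflow at " + tok);
            const std::string a = st.top(); st.pop();
            st.push(tok + "(" + a + ")");
        } else {
            st.push(tok);  // anything else is treated as an operand (variable or number)
        }
    }
    if (st.size() != 1) throw std::runtime_error("malformed RPN: leftover stack items");
    return st.top();
}
```

For example, rpn_to_infix("x y - abs") yields "abs((x-y))", matching the RT_DebugF log above.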

pinterf

25th April 2020, 12:31

1.) Masktools is at 2.2.21 now.
2.) I don't remember whether scalef is handled in infix.
3.) Probably a power of 1 is optimized to do nothing. But 1.00001 should give a very similar result. Does it work?
I can't check, I'm not near my PC.

StainlessS

25th April 2020, 12:47

1.) Yeah, it's in my inbox :(
2.) Seems not in 2.2.18.

This looks right after doing the scalef substitution manually [only for the defaults]:

Colorbars
RPN="x 0.0 - 1.0 0.0 - / 1 1.0 / ^ 1.0 0.0 - * 0.0 +"
IFX=mt_infix(RPN)
RT_DebugF("RPN = '%s'\nIFX = '%s'",RPN,IFX)
Subtitle("RPN = '"+RPN+"'\n"+"IFX = '"+IFX+"'",lsp=0,font="Courier New")

__END__

00006458 0.14370735 [3704] RT_DebugF: RPN = 'x 0.0 - 1.0 0.0 - / 1 1.0 / ^ 1.0 0.0 - * 0.0 +'
00006459 0.14376387 [3704] RT_DebugF: IFX = '(((((x-0.0)/(1.0-0.0))^(1/1.0))*(1.0-0.0))+0.0)'

#output = ( (input - input_low) / (input_high - input_low) )^(1 / gamma) * (output_high - output_low) + output_low

I'll see about installing 2.2.21.

pinterf

25th April 2020, 13:03

(deleted :))

StainlessS

25th April 2020, 13:23

(deleted )
Yeah, probably cursing me for not using 2.2.21, sorry :(

Seems OK (Infix handles scalef in 2.2.21; not optimized).

SCRIPT DELETED: See post #5519.

pinterf

25th April 2020, 13:40

Regarding the expression: don't apply gamma to negative numbers. Clamp before it with " 0 max ".

StainlessS

25th April 2020, 13:48

EDIT: Oops, did not see P's post.

SCRIPT DELETED: See post #5519. [EDIT: Not because of above P post, P was answering RF]

pinterf

25th April 2020, 16:04

Checked the code in Levels, there is a proper guard:

if(use_gamma)
p = (float)pow((double)clamp(p, 0.0f, 1.0f), gamma);
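A minimal C++ sketch of why that guard matters (function names are illustrative): std::pow with a negative base and a non-integer exponent returns NaN. With gamma = 1.0 the exponent is exactly 1, an integer, so a negative value passes through unchanged; that matches the float case being OK until gamma was set to 1.2.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Without the guard: pow() of a negative base with a non-integer exponent is NaN.
double apply_gamma_unguarded(double p, double gamma) {
    return std::pow(p, 1.0 / gamma);
}

// With the guard used in Levels: clamp p to [0, 1] before pow().
double apply_gamma_guarded(double p, double gamma) {
    return std::pow(std::min(std::max(p, 0.0), 1.0), 1.0 / gamma);
}
```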

StainlessS

25th April 2020, 16:37

SCRIPT DELETED: See post #5519.

real.finder

25th April 2020, 17:11

The question is: why is there no problem at 8-16 bits?

pinterf

25th April 2020, 17:12

Masktools and Expr are fully float inside, with proper rounding on conversions.

pinterf

25th April 2020, 17:15

the question is why there are no problem when 8-16 bits?

Substitute e.g. x=20 and do the same computation as Expr does, for both 8 bit and float (20/255.).
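Doing that substitution with the numbers from the earlier test (input_low=40, input_high=255) in a small C++ sketch: the normalized value comes out identical, -20/215 ≈ -0.093, for both the 8-bit and the float scaling, and pow() of it is NaN in both cases. So the power step itself behaves the same at every bit depth. My assumption, not confirmed in the thread, is that the difference lies in what happens to the NaN afterwards: converted to an integer pixel versus stored as-is in a float clip. The function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Normalized input position for an 8-bit clip: constants used as-is.
double normalized_8bit(double x, double input_low, double input_high) {
    return (x - input_low) / (input_high - input_low);
}

// Same step for a float clip: pixel and scalef'd constants live on a 0..1 scale (v / 255).
double normalized_float(double x8, double input_low, double input_high) {
    const double x = x8 / 255.0, lo = input_low / 255.0, hi = input_high / 255.0;
    return (x - lo) / (hi - lo);
}
```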

StainlessS

25th April 2020, 18:47

As in pinterf's next post below:
Just before gamma, clamp to 0-1 regardless of the original bit depth. That's why we normalize the range before applying the power function.

Just remove this line to put the bug back [it appears in two places]:

\ + "0 max 1 min " [* P = min(max(P , 0.0), 1.0) *]

Function Ylevels(clip clp,
\ float "input_low", float "gamma", float "input_high",
\ float "output_low", float "output_high", bool "show_function")
{
input_low = Default(input_low, 0)
gamma = Default(gamma, 1.0)
input_high = Default(input_high, 255)
output_low = Default(output_low, 0)
output_high = Default(output_high, 255)
show_function = Default(show_function, false)
#
invGam = 1.0 / min(max(0.1,Gamma),10.0) # Sane range, avoid div by zero
divisor = (input_high==input_low) ? 1 : input_high-input_low # Avoid divide by zero
try { bpc = clp.BitsPerComponent } catch(msg) { bpc=8 } # Use scalef only if bpc > 8 (ie Avs+)
#
wicked = (bpc>8) [* Use scalef only if > 8 bit ie Avs+ *]
\ ? "x " +string(input_low)+" scalef - "+string(divisor)+" scalef / " [* P = (input - input_low) / divisor *]
\ + "0 max 1 min " [* P = min(max(P , 0.0), 1.0) *]
\ + string(invGam) + " ^ " [* P = pow(P, 1.0/Gamma) *]
\ + string(output_high)+" scalef " +string(output_low)+" scalef - * "+string(output_low)+" scalef +" [* P = P * (output_high - output_low) + output_low *]
\ : "x " +string(input_low)+" - "+string(divisor)+" / "
\ + "0 max 1 min "
\ + string(invGam) + " ^ "
\ + string(output_high)+" " +string(output_low)+" - * "+string(output_low)+" +"
# Uncomment the next line to explicitly round & clamp to output range, although Masktools will do this anyway. 32 bit float is NOT rounded. Kept here as documentation.
# wicked = wicked + ((bpc==32) ? " 0 max 255 scalef min" : (bpc>8) ? " round 0 max 255 scalef min" : " round 0 max 255 min") [* P = min(max(int(P+0.5) , 0.0), 255.0) *]
return( show_function ? clp.subtitle(wicked) : (bpc>8) ? clp.mt_lut(Yexpr = wicked, use_expr=2, U=2,V=2) : clp.mt_lut(Yexpr = wicked, U=2,V=2) )
}
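To see the "0 max 1 min" clamp at work, here is a tiny RPN evaluator in C++ (a hypothetical helper supporting only x, numbers and + - * / ^ max min; no scalef). The test strings are what the 8-bit branch produces for input_low=40, gamma=1.2 and the other defaults, written with fewer decimals than AviSynth's String() prints: at x=20 the clamped version gives 0, while the unclamped one gives NaN.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Evaluate an RPN expression for one pixel value x (illustration only).
double eval_rpn(const std::string& expr, double x) {
    std::istringstream in(expr);
    std::stack<double> st;
    std::string tok;
    while (in >> tok) {
        if (tok == "x") { st.push(x); continue; }
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/" ||
            tok == "^" || tok == "max" || tok == "min") {
            if (st.size() < 2) throw std::runtime_error("stack underflow at " + tok);
            const double b = st.top(); st.pop();  // right operand
            const double a = st.top(); st.pop();  // left operand
            double r;
            if (tok == "+") r = a + b;
            else if (tok == "-") r = a - b;
            else if (tok == "*") r = a * b;
            else if (tok == "/") r = a / b;
            else if (tok == "^") r = std::pow(a, b);  // "P invGam ^" = pow(P, invGam)
            else if (tok == "max") r = std::max(a, b);
            else r = std::min(a, b);
            st.push(r);
            continue;
        }
        st.push(std::stod(tok));  // anything else must be a number (stod throws otherwise)
    }
    if (st.size() != 1) throw std::runtime_error("malformed expression");
    return st.top();
}
```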

From Levels Docs:-
This is one of those filters for which it would really be nice to have a GUI.
Since I can't offer a GUI (at least not in AviSynth's current form), I decided I could at least make this filter compatible with VirtualDub's
when the clip is RGB. In that case you should be able to take the numbers from VirtualDub's Levels dialog and pass them as parameters
to the Levels filter and get the same results. However, the input and output parameters can be larger than 255.
EDIT: And also less than 0 too.

Here, specifying both input and output values outside of the colorspace range:

Ylevels(0-32, 1.0, 255+32, 0-32, 255+32, false)

https://i.postimg.cc/q7T19z5r/Untitled-00.jpg

Also, you should not artificially limit the input or output args of the YLevels function.
[EDIT: The original v2.58 (and beta 2.60) avs Levels used PixelClip on RGB output; this was an (in some cases failed) attempt to clamp output RGB and was a bug. A simple min/max clamp fixed it.]

Client script for an image similar to the one above:

SHOW=False
Avisource("D:\Parade.avi")
#Colorbars(width=1920,height=1080,pixel_type="YV12")
#convertbits(16)
convertbits(32)
Ylevels(-32, 1.0, 255+32, 0-32, 255+32, SHOW)
convertbits(8)

Code originating from Levels, AutoLevels and others.

gamma = min(max(gamma,0.1f),10.0f); // ssS: Added sane range limiting (& avoid div by zero on gamma)
gamma = 1/gamma;
int divisor = (in_max == in_min) ? 1 : (in_max - in_min); // avoid zero divide

if (vi.IsYUV()) {
for (int i=0; i<256; ++i) {
float p;
if (coring)
p = ((i-16)*(255.0f/219.0f) - in_min) / divisor;
else
p = float(i - in_min) / divisor; // range 0.0 -> 1.0 of input range

p = pow(min(max(p, 0.0f), 1.0f), gamma); // gamma
p = p * (out_max - out_min) + out_min; // output range with out_min offset
int pp;

if (coring)
pp = int(p*(219.0f/255.0f)+16.5f);
else
pp = int(p+0.5f); // round to nearest luma level

map[i] = min(max(pp, (coring) ? 16 : 0), (coring) ? 235 : 255);

int q = ((i-128) * (out_max-out_min) + (divisor>>1)) / divisor + 128;
mapchroma[i] = min(max(q, (coring) ? 16 : 0), (coring) ? 240 : 255);
}
} else if (vi.IsRGB()) {
for (int i=0; i<256; ++i) {
float p = float(i - in_min) / divisor;
p = pow(min(max(p, 0.0f), 1.0f), gamma);
p = p * (out_max - out_min) + out_min;
int z=int(p+0.5f);
map[i] = (z < 0) ? 0 : (z>255) ? 255 : z; // # EDIT: Was originally bugged PixelClip clamping
}
}
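A self-contained C++ version of the YUV luma branch above, so the LUT can be built and inspected outside AviSynth. It is a sketch: it uses double instead of float, omits the chroma map, and build_luma_map is a made-up name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Build the 8-bit luma map the way the quoted Levels code does for YUV input.
std::vector<int> build_luma_map(int in_min, double gamma, int in_max,
                                int out_min, int out_max, bool coring) {
    gamma = std::min(std::max(gamma, 0.1), 10.0);  // sane range, avoids 1/0
    const double inv_gamma = 1.0 / gamma;
    const int divisor = (in_max == in_min) ? 1 : (in_max - in_min);  // avoid zero divide
    std::vector<int> map(256);
    for (int i = 0; i < 256; ++i) {
        double p = coring ? ((i - 16) * (255.0 / 219.0) - in_min) / divisor
                          : double(i - in_min) / divisor;  // 0..1 of input range
        p = std::pow(std::min(std::max(p, 0.0), 1.0), inv_gamma);  // guarded gamma
        p = p * (out_max - out_min) + out_min;  // output range with out_min offset
        const int pp = coring ? static_cast<int>(p * (219.0 / 255.0) + 16.5)
                              : static_cast<int>(p + 0.5);  // round to nearest level
        const int lo = coring ? 16 : 0, hi = coring ? 235 : 255;
        map[i] = std::min(std::max(pp, lo), hi);
    }
    return map;
}
```

With the defaults it should be the identity map; with coring it maps into 16..235, and with input_low=40 everything at or below 40 lands on 0.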

pinterf

25th April 2020, 20:39

Just before gamma, clamp to 0-1 regardless of the original bit depth. That's why we normalize the range before applying the power function.

StainlessS

26th April 2020, 13:02

Cleaned up and removed some earlier scripts.
YLevels script in post #5519 updated; added the original Levels source code: https://forum.doom9.org/showthread.php?p=1909075#post1909075
Update: use scalef only if BitsPerComponent > 8 [and also avs+].
Made the final clamp to colorspace range optional: uncomment a line to use an explicit clamp. Masktools clamps anyway so it's not necessary; I suggest leaving it in situ [commented out] as documentation.
EDIT: The original Levels source code and the YLevels script do not include any dithering [I think it was added to Levels in avs 2.60 Std final]. [Maybe something for RF to think about :)]

real.finder

26th April 2020, 15:23

Thank you both, pinterf and StainlessS.

I will add these changes as soon as possible.

Dithering in 2.60 Std? That would mean using lsb or some hack, so it's unlikely, unless you made it a DLL :) Here are all the functions: https://github.com/realfinder/AVS-Stuff/blob/Community/avs%202.5%20and%20up/YLevels_mt.avsi

edit: 255 scalef can be replaced with range_max
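Why the two are interchangeable, as a C++ sketch. This assumes my reading of the AviSynth+ Expr documentation: scalef stretches an 8-bit constant to the clip's full range (v * range_max / 255), while scaleb would use a bit shift instead; range_max is 2^bits - 1 for integer formats and 1.0 for 32-bit float. Under that assumption, 255 scalef equals range_max at every bit depth. The helper names mirror the Expr keywords but are hypothetical C++ functions.

```cpp
#include <cassert>

// Assumed semantics: range_max is (2^bits - 1) for integer clips, 1.0 for 32-bit float.
double range_max(int bits) {
    return bits == 32 ? 1.0 : static_cast<double>((1 << bits) - 1);
}

// Assumed semantics of "v scalef": full-range stretch of an 8-bit value to the clip's range.
double scalef(double v, int bits) {
    return v * range_max(bits) / 255.0;
}
```

As a side effect, scalef(40, 16) is 10280, i.e. 40 * 257, the usual 8-to-16-bit full-range factor.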

StainlessS

26th April 2020, 16:11

Did not know if some clever stuff could be done for dithering using Expr (I've never really looked at Expr).
255 scalef can be replaced with range_max
Very good to know, thanks. [It's your (adopted) baby, mod it as you suggested.]
If you need assistance with the other Didée funcs, let me know [but not just now, bit busy].
EDIT:
I had thought about splitting the string into several parts (wick1, wick2, etc.),
so it could be line-wrapped for Subtitle, with the parts appended together for the actual processing. Maybe you could add it.

real.finder

26th April 2020, 18:02

Expr is only for avs+, and dithering needs nearby pixels, which mt_lut can't use unless a dither parameter is added to it.

The only thing that can be done for now is to add the stacked hack for avs 2.6 :devil:
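A minimal 1-D error-diffusion sketch in C++ shows the point about neighbours: each output depends on the rounding error carried over from the previous pixel, so it cannot be expressed as a per-value lookup table like mt_lut. Real dithers (Floyd-Steinberg, ordered, etc.) differ; dither_row is only an illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Quantize a row of high-precision values (in 8-bit units) to 8 bit,
// carrying each pixel's rounding error into the next pixel.
std::vector<int> dither_row(const std::vector<double>& row) {
    std::vector<int> out;
    out.reserve(row.size());
    double err = 0.0;
    for (double v : row) {
        const double want = v + err;  // value plus error from the left neighbour
        const int q = std::min(std::max(static_cast<int>(std::lround(want)), 0), 255);
        err = want - q;               // pass the remainder on
        out.push_back(q);
    }
    return out;
}
```

For a flat row at 100.5 (in 8-bit units), plain rounding would give 101 everywhere, while the diffused version alternates 101/100 and keeps the average at 100.5.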

real.finder

26th April 2020, 18:04

Speaking of Expr():

pinterf, it seems you forgot about this: https://forum.doom9.org/showthread.php?p=1899122#post1899122 :)

dREV

26th April 2020, 19:34

seems you missed this post https://forum.doom9.org/showpost.php?p=1906222&postcount=405 :)

also this https://forum.doom9.org/showthread.php?p=1906465#post1906465

It's not that I missed it; it's more that I haven't kept up with your thread. I made a post in your thread about your script: https://forum.doom9.org/showthread.php?p=1909256#post1909256 It doesn't work out for 4:2:0 in HEVC.

real.finder

26th April 2020, 21:10

You used the input dimensions as native, but anyway I fixed it. Keep in mind it's 4:4:4 output; if you don't need 4:4:4, use the resizer only.

Also, I forgot to mention that it works in HBD if you want.

pinterf

26th April 2020, 21:15

speaking of expr()

pinterf, seems you forget about this https://forum.doom9.org/showthread.php?p=1899122#post1899122 :)
I won't do that. We already have AVX2, and the FMA (multiply and add in a single step) optimizations were already backported.
It has a more futureproof coding style with even more in-Expr optimizations.
But I think it would offer no real speed benefit at the moment. And since sekrit-twc is thinking about replacing the whole engine, let's wait and see what magic he can do. It won't be a one-week job.

pinterf

26th April 2020, 21:24

@real.finder, what about a mode 25 test instead? Check the RgTools site. This took a week, including my earlier attempts at it. I do hope there are no other unavoidable modes.

real.finder

26th April 2020, 21:34

Thanks, and why not?

ColorBars(width=640, height=480, pixel_type="yv12")
SoftSharpen (http://web.archive.org/web/20160608111758/http://leon1789.perso.sfr.fr/avisynth/SoftSharpen-8.8.zip)

https://i.postimg.cc/NMZPHvg7/Untitled.png

SoftSharpen needs mode 27; I will test mode 25 later :)

pinterf

26th April 2020, 21:57

Prove that it cannot be replaced. Just because someone put mode 27 there twenty years ago (OK, I'm probably exaggerating, but I hope you understand what I'm trying to say in general) doesn't mean it has to stay. Experiment with new things; maybe the replacements are even better.

real.finder

26th April 2020, 22:23

I don’t know, I don’t have enough knowledge and experience

26 = medianblur. Based off mode 17, but preserves corners, but not thin lines.
27 = medianblur. Same as mode 26 but preserves thin lines.

I don't know how 26 preserves corners, and 27 uses 26 as a base.

And wouldn't using 17 with a mask (if that can be done) to get 26, then 26 with another mask to get 27, make it slower?

tormento

26th April 2020, 23:04

Check rgtools site.
I saw you just released RGTools 0.99.

In the release note, you specify support for SSE2, SSE4.1 and AVX2.

I still have Sandy Bridge, which supports AVX but not AVX2. Could you include this specific optimization in your releases too?

Thanks! :o

qyot27

27th April 2020, 00:55

Ask for a hand, get an ARM instead.
https://i.imgur.com/OCNAx9K.jpg

real.finder

27th April 2020, 01:08

I don't use an ARM PC but this is cool

It should work even on RISC-V, right?

qyot27

27th April 2020, 01:28

It'd need the correct CPU defines in config.h so it could build, but otherwise, maybe? The actual thing you're looking at there is the ability to build a plain C version (with the right toggle in avs/config.h so that it can build on ARM), with the Intel intrinsics disabled. So hypothetically, in the state it's in, if it can be built, it should be able to run so long as it's under an OS it can already handle. But it'll be slow until intrinsics/asm for those platforms get added. And probably due to the reliance on some of the cpuid stuff, Prefetch will segfault, so right now it's single threaded and no asm assist on non-x86.
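A minimal C++ sketch of the "intrinsics or plain C" switch described here. It keys off the compiler macro __SSE2__ (defined by GCC and Clang when SSE2 is enabled, which is always the case on x86-64; MSVC does not define it) rather than the actual toggle in avs/config.h, and avg_rows is a hypothetical example function. On ARM, or any target without SSE2, only the plain C loop is compiled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Rounding average (a + b + 1) >> 1 of two byte rows.
void avg_rows(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* dst, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__)
    // Intrinsics path: 16 pixels per step; PAVGB rounds up exactly like the C formula.
    for (; i + 16 <= n; i += 16) {
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_avg_epu8(va, vb));
    }
#endif
    // Plain C path: the whole row without SSE2, or just the leftover tail with it.
    for (; i < n; ++i)
        dst[i] = static_cast<std::uint8_t>((a[i] + b[i] + 1) >> 1);
}
```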

StainlessS

27th April 2020, 03:09

Impressive, Raspberry Pi yum yum.

pinterf

27th April 2020, 05:50

I saw you just released RGTools 0.99.

In the release note, you specify support for SSE2, SSE4.1 and AVX2.

I still have Sandy Bridge, that supports AVX but not AVX2. Can you include this specific optimization too into your releases?

Thanks! :o
Unfortunately not.
AVX is not for integer data.

Reel.Deel

27th April 2020, 08:15

Hi everyone, I just want to say thanks to all those responsible for the continued work on AviSynth+. There are now over 100 64-bit plugins available to use, with only a handful still missing.

http://avisynth.nl/index.php/Category:Plugins_x64

:thanks:

tormento

27th April 2020, 08:29

Unfortunately not. AVX is not for integer data.
Understood. And why not the complete SSE set, i.e. up to SSE4.2, instead of only SSE4.1?

tormento

27th April 2020, 08:30

I don't use an ARM PC but this is cool
Picture Apple is thinking about switching to ARM...

tormento

27th April 2020, 08:31

Ask for a hand, get an ARM instead.
Cool.

Any benchmarks for us? Let's say a simple denoise script with x264. :D

MeteorRain

27th April 2020, 11:49

Understood. And why not the complete set of SSE, i.e. 4.2, but 4.1 only?

Since SIMD is probably a popular dev topic, I suggest you take a look at what each instruction set provides.

For example, integer operations usually arrive after floating point ones: SSE = float and SSE2+ = int, AVX = float and AVX2 = int.

Then, because SSE appeared in the early days, more and more instructions were added later as they were found useful. For example, SSSE3 added horizontal additions. SSE4.1 is pretty sweet: it added lots of useful tools like extending data width (8 bit to 16 bit, which is useful for high bit depth), min and max on unsigned data (clamping pixel channels), and floating point rounding (used when converting internal floats to integers). As you can see, lots of these are naturally used by lots of filters.

SSE4.2, however, is irrelevant to this kind of computing. It has instructions for strings (such as comparing strings or getting the length of a string), and for CRCs. It's close to useless for filters.

That's why most filters use up to SSE4.1. Some limit themselves to SSE/SSE2, because some SSE4.1 operations can usually be done in SSE/SSE2 at relatively low cost. Which ones to support is at the author's discretion.
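As an example of the "done in SSE2 at relatively low cost" point: SSE4.1 added unsigned 16-bit min/max (_mm_min_epu16/_mm_max_epu16), which is exactly the clamp a high bit depth filter needs. With only SSE2, the same result follows from saturating arithmetic: min(a,b) = a - subs(a,b) and max(a,b) = b + subs(a,b). The C++ sketch below (clamp_u16x8 is a made-up name) clamps 8 pixels to the 10-bit limited range, with a scalar fallback when SSE2 is unavailable.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Clamp 8 unsigned 16-bit pixels to [lo, hi] without SSE4.1.
void clamp_u16x8(const std::uint16_t* src, std::uint16_t* dst, std::uint16_t lo, std::uint16_t hi) {
#if defined(__SSE2__)
    const __m128i vlo = _mm_set1_epi16(static_cast<short>(lo));
    const __m128i vhi = _mm_set1_epi16(static_cast<short>(hi));
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    v = _mm_sub_epi16(v, _mm_subs_epu16(v, vhi));     // min(v, hi) = v - sat(v - hi)
    v = _mm_adds_epu16(vlo, _mm_subs_epu16(v, vlo));  // max(v, lo) = lo + sat(v - lo)
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), v);
#else
    for (int i = 0; i < 8; ++i) {  // scalar fallback with the same result
        std::uint16_t v = src[i];
        v = v > hi ? hi : v;
        v = v < lo ? lo : v;
        dst[i] = v;
    }
#endif
}
```

Values above 32767 (40000, 65535) are handled correctly here, which a signed 16-bit min (_mm_min_epi16, available in SSE2) would get wrong.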

tormento

27th April 2020, 11:52

Very interesting explanation!

Thank you!!!

StainlessS

27th April 2020, 13:36

Understood. And why not the complete set of SSE, i.e. 4.2, but 4.1 only?

Maybe there is not much use for it.
SSE4.2
SSE4.2 added STTNI (String and Text New Instructions),[10] several new instructions that perform character searches and comparison on two operands of 16 bytes at a time. These were designed (among other things) to speed up the parsing of XML documents.[11] It also added a CRC32 instruction to compute cyclic redundancy checks as used in certain data transfer protocols. These instructions were first implemented in the Nehalem-based Intel Core i7 product line and complete the SSE4 instruction set. Support is indicated via the CPUID.01H:ECX.SSE42[Bit 20] flag.
https://en.wikipedia.org/wiki/SSE4#SSE4.2

EDIT: OOPS sorry, I was looking at the last post on the previous page of the thread and thought it was the last post in the thread. [already answered by MeteorRain] :(

tormento

27th April 2020, 14:13

Kindness is never enough. Thank you too!

pinterf

27th April 2020, 14:23

Anyway, non-SSE4.1 processors are so rare that months pass before someone reports a bug of that kind (executing an SSE4.1 instruction instead of SSE2): most recently in VapourSynth Expr, but such a bug existed in an old mvtools2 as well, and probably once in AviSynth+.

real.finder

27th April 2020, 18:17

@real.finder, what about a mode25 test instead? Check rgtools site. This took one week's time including my earlier attempts on it, I do hope there are no other unavoidable modes.

Did some tests; mode 25 seems to work OK in x64 and HBD.

:thanks:

real.finder

27th April 2020, 18:20

Cool.

Any benchmark for us? Let's say simple denoise script with x264. :D

You'll be lucky if you get 1 frame per minute :p since it's ARM and there's only C (no asm).

pinterf

27th April 2020, 18:30

did some tests, mode 25 seems work ok in x64 and HBD

:thanks:
And behind the scenes, modes 26-28 are ready, from C up to AVX2. I decided to do the Repair side as well, because these things are ugly when left half-finished.