Originally posted by Myrsloik:
> About how much faster is AVX2 vs SSE2 on a modern CPU in your Expr version?
I had to write it blind: I have no AVX2-capable CPU, so I could only run it through the SDE emulator. Two days ago I was finally able to test it on a two-year-old i5 notebook, and the results show it was worth implementing: in these tests AVX2 is about 26% to 50% faster than SSE2.
Further speed tests are welcome; that's what the optXXX parameters are for.
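Results in fps, by bit depth of the processed clip, for 32-bit and 64-bit builds. AVX2 was enabled only for Expr, through its optAvx2 parameter.

```
bit depth | i5 SSE2 32-bit | i5 SSE2 64-bit | i5 AVX2 32-bit | i5 AVX2 64-bit
----------+----------------+----------------+----------------+---------------
        8 |          17.00 |          19.30 |          24.63 |          28.98
       16 |          15.69 |          17.59 |          20.38 |          23.26
       32 |          12.70 |          13.59 |          16.03 |          17.14
```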
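For the "how much faster" question, the ratios can be read straight off the fps table. A quick Python calculation (Python is only used as a calculator here; the dictionary layout is illustrative):

```python
# Speedup of AVX2 over SSE2, computed from the fps table above.
# Keys: bit depth of the processed clip; values: (32-bit build fps, 64-bit build fps).
sse2 = {8: (17.00, 19.30), 16: (15.69, 17.59), 32: (12.70, 13.59)}
avx2 = {8: (24.63, 28.98), 16: (20.38, 23.26), 32: (16.03, 17.14)}


def speedups(base, fast):
    """Return {bits: (ratio_32bit, ratio_64bit)} with ratio = fast_fps / base_fps."""
    return {
        bits: tuple(f / b for b, f in zip(base[bits], fast[bits]))
        for bits in base
    }


ratios = speedups(sse2, avx2)
for bits, (r32, r64) in sorted(ratios.items()):
    print(f"{bits:2d}-bit clip: AVX2/SSE2 = {r32:.2f}x (32-bit build), {r64:.2f}x (64-bit build)")
# Prints:
#  8-bit clip: AVX2/SSE2 = 1.45x (32-bit build), 1.50x (64-bit build)
# 16-bit clip: AVX2/SSE2 = 1.30x (32-bit build), 1.32x (64-bit build)
# 32-bit clip: AVX2/SSE2 = 1.26x (32-bit build), 1.26x (64-bit build)
```

The gain is largest for 8-bit clips and shrinks toward float processing.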
The test script was roughly the following (I removed my commented-out debug and experimental lines):
```
LSMASHVideoSource("13HoursCUT.mp4", format="YUV444P8")
Spline64Resize(486, 240) # downsize: the final output stacks 4 x 3 copies of the clip
src = last

# Expr versions
c8  = CalcTest(src, 8,  false)
c16 = CalcTest(src, 16, false)
c32 = CalcTest(src, 32, false)
# mt_lutxy versions (for comparison)
c8e  = CalcTest(src, 8,  true)
c16e = CalcTest(src, 16, true)
c32e = CalcTest(src, 32, true)

# amplified differences against the source
res8   = Diff(c8,   src)
res16  = Diff(c16,  src)
res32  = Diff(c32,  src)
res8e  = Diff(c8e,  src)
res16e = Diff(c16e, src)
res32e = Diff(c32e, src)

# columns: Expr result | Expr diff | mt_lutxy result | mt_lutxy diff
col1 = StackVertical(c8,  c16.ConvertBits(8),  c32.ConvertBits(8))
col2 = StackVertical(res8,  res16,  res32)
col3 = StackVertical(c8e, c16e.ConvertBits(8), c32e.ConvertBits(8))
col4 = StackVertical(res8e, res16e, res32e)
StackHorizontal(col1, col2, col3, col4)

# For the speed tests only c8, c16 or c32 was returned (this overrides the stack above).
# Change the Expr parameters in CalcTest to select the code path, e.g. optSSE2=true, optSingleMode=false, optAvx2=false
c8

Function Diff(clip src1, clip src2)
{
    # Levels(120, 1, 135, 0, 255) maps 120..135 linearly onto 0..255, amplifying small differences 17x
    return Subtract(src1.ConvertBits(8), src2.ConvertBits(8)).Levels(120, 1, 255-120, 0, 255, coring=false)
}

Function CalcTest(clip src, int bits, bool lut)
{
    src
    ConvertBits(bits)
    tmp    = last     # x: the source at the test bit depth
    method = Blur(1)  # y: its blurred version
    # Szrp, SdmpLo and SdmpHi are given on the 8-bit scale; scaleb adapts them to the current bit depth
    Szrp   = 16
    Spwr   = 4
    str    = 100/100.0
    SdmpLo = 4
    SdmpHi = 48
    # SdmpLo is added to the squared difference (x-y)^2, which is why it gets scaleb twice
    expr_pow = "x y == x x x y - abs "+string(Szrp)+" scaleb / 1 "+string(Spwr)+" / ^ "+string(Szrp)+" scaleb * "+string(str)+" * x y - 2 ^ x y - 2 ^ "
    \+string(SdmpLo)+" scaleb scaleb + / * x y - x y - abs / * 1 "+string(SdmpHi)+" scaleb 0 == 0 x y - abs "+string(SdmpHi)+" scaleb / 4 ^ ? + / + ?"
    ret = lut ? mt_lutxy(tmp, method, yexpr=expr_pow, U=1, V=1) : Expr(tmp, method, expr_pow, "", "", optSSE2=true, optSingleMode=false, optAvx2=false)
    return ret
}
```
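To make the RPN string easier to follow, here is a plain Python transcription of what `expr_pow` computes for a single pixel pair. It is a sketch that assumes an 8-bit clip, where `scaleb` leaves the constants unchanged; `x` is the source pixel and `y` the `Blur(1)` pixel. The function and parameter names are hypothetical, not part of Expr or masktools, and the result is returned unrounded, before the filter turns it back into a pixel value.

```python
import math


def sharpen_pixel(x, y, szrp=16.0, spwr=4.0, strength=1.0, sdmplo=4.0, sdmphi=48.0):
    """Evaluate expr_pow for one pixel on the 8-bit scale (scaleb is a no-op there).

    x: source pixel, y: blurred pixel. Defaults mirror the script's settings.
    """
    if x == y:                      # "x y == x ... ?": unchanged pixels pass through
        return float(x)
    d = x - y
    ad = abs(d)
    # nonlinear amount: (|d| / Szrp)^(1 / Spwr) * Szrp * str
    amount = (ad / szrp) ** (1.0 / spwr) * szrp * strength
    # low damping: d^2 / (d^2 + SdmpLo) fades the effect for very small differences
    amount *= d * d / (d * d + sdmplo)
    # direction: d / |d|, i.e. push away from the blurred value
    amount *= math.copysign(1.0, d)
    # high damping: divide by 1 + (|d| / SdmpHi)^4, skipped when SdmpHi == 0
    hi = 0.0 if sdmphi == 0 else (ad / sdmphi) ** 4
    return x + amount / (1.0 + hi)


print(sharpen_pixel(130, 128))  # about 134.76
print(sharpen_pixel(126, 128))  # about 121.24
```

With the script's defaults, a pixel at 130 next to a blurred value of 128 is pushed up by about 4.76, and a pixel at 126 is pushed down by the same amount.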