Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.
27th September 2021, 09:23  #422  Link 
Registered User
Join Date: Jan 2014
Posts: 2,308

'exp' is a valid Expr function and it uses SIMD, so it's probably worth using.
In your replacement code above, power with small integer exponents like 1, 2, 3 and 4 is luckily optimized internally into mul (and dup), but for larger exponent values the result is calculated using a^b = exp(b*ln(a)), which needs much more computation.
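To make the two code paths concrete, here is a minimal Python sketch (not AviSynth code, and not the actual Expr implementation). The function names `pow_small_int` and `pow_general` are made up for illustration; the only thing taken from the post is the identity a^b = exp(b*ln(a)), which holds for a > 0.

```python
import math

def pow_small_int(a: float, n: int) -> float:
    """Power by repeated multiplication: the cheap path for small integer exponents."""
    result = 1.0
    for _ in range(n):
        result *= a
    return result

def pow_general(a: float, b: float) -> float:
    """General power via a^b = exp(b * ln(a)); only valid for a > 0."""
    return math.exp(b * math.log(a))

a = 0.75
for n in (1, 2, 3, 4):
    # Both paths agree up to floating-point rounding; the second one costs a log and an exp.
    print(n, pow_small_int(a, n), pow_general(a, n))
```

The multiply path needs n-1 multiplications, while the general path needs one logarithm and one exponential per pixel regardless of the exponent, which is why keeping exponents small and integer pays off.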
27th September 2021, 13:55  #423  Link 
Registered User
Join Date: Nov 2009
Posts: 2,351

I didn't want to make the snippet more complex than it is so I used "n ^"; in reality I'm using variables to reuse operations.
I tested with a Taylor series and it gave a great performance improvement over 'exp', but it might not be valid for high steepness. I crafted a graph to see what was happening; I still might be doing something wrong. https://www.desmos.com/calculator/guetsfy9ww Guess I can join another polynomial but it makes things more complex.
EDIT: just tested and yes, 'exp' is as fast. I had the notion that it wasn't because, when coding ex_bilateral(), removing 'exp' gave a huge speed boost.
@tormento: thanks. I think cos, sin and tan are the easiest; I already adapted 'interpolation' mode in ex_blend(). The problem comes when 'x' uses derivatives and other complex functions like atan. If I'm not wrong atan(x) = tan(y) = sin(y) / cos(y)
__________________
i74790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread Last edited by Dogway; 27th September 2021 at 14:17. 
27th September 2021, 15:16  #424  Link 
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,503

Nope, the arctan function is the inverse of the tangent function: it returns the angle whose tangent is a given number.
There is a Taylor series & here for it too. Look here also. Explicit algorithm is present too.
__________________
@turment on Telegram Last edited by tormento; 27th September 2021 at 15:32. 
27th September 2021, 17:21  #425  Link 
Registered User
Join Date: Nov 2009
Posts: 2,351

Thanks tormento, I managed to build a piecewise function for atan(x). The Taylor series didn't converge between 0.8 and 1.65, so I built a polynomial for that section. https://www.desmos.com/calculator/bb392gsvnu
I tested it in AviSynth and it works fine; now I will try to optimize and benchmark it, and see if I can reduce it on a case by case basis. Here's the code and bench (400% speed increase):
Code:
Expr(last,Format(" x 255 / atan 255 *"),"") # 90
Code:
# x - x^3/3 + x^5/5 - x^7/7 + x^9/9 - x^11/11
e8 = "X dup dup * X2@ X * X3@ 0.333333 * - X2 X3 * X5@ 0.200001 * + X5 X2 * X7@ 0.142857143 * - X7 X2 * 0.111111111 * + X7 X2 dup * * 0.0909090909 * -" # up to 0.8
e16 = " X2 -0.245982 * X 1.00976 * + 0.021622 +" # up to 1.65
# pi/2 - 1/x + 1/(3x^3) - 1/(5x^5) + 1/(7x^7)
els = "pi 0.5 * 1 X / - 1 X3 3 * / + 1 X5 5 * / - 1 X7 7 * / + " # from 1.65 onwards
atan = "X@ 0.8 <= "+e8+" X 1.65 >= "+els+" "+e16+" ? ?"
# atan = "X@ 0.8 <= "+e8+" "+e16+" ?" # for atan([0-1])
Expr(last,Format("x range_max / "+atan+" range_max *"),"") # 413
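For anyone who wants to check the three branches numerically outside AviSynth, here is a rough Python sketch. It assumes x >= 0 and reads the branches as the Maclaurin series up to x^11/11, the quadratic with the coefficients quoted above, and the 1/x expansion for large x; the function name `atan_piecewise` and the sampling range 0..10 are my own choices.

```python
import math

def atan_piecewise(x: float) -> float:
    """Piecewise atan approximation for x >= 0, mirroring the three branches above."""
    x2 = x * x
    if x <= 0.8:
        # Maclaurin series x - x^3/3 + x^5/5 - x^7/7 + x^9/9 - x^11/11 in Horner form
        return x * (1 - x2 * (1/3 - x2 * (1/5 - x2 * (1/7 - x2 * (1/9 - x2 / 11)))))
    if x < 1.65:
        # Fitted quadratic for the middle section
        return -0.245982 * x2 + 1.00976 * x + 0.021622
    inv = 1.0 / x
    inv2 = inv * inv
    # pi/2 - 1/x + 1/(3x^3) - 1/(5x^5) + 1/(7x^7)
    return math.pi / 2 - inv * (1 - inv2 * (1/3 - inv2 * (1/5 - inv2 / 7)))

# Largest absolute deviation from math.atan on a grid over [0, 10]
samples = [i / 1000 for i in range(10001)]
max_err = max(abs(atan_piecewise(x) - math.atan(x)) for x in samples)
print("max abs error on [0, 10]:", max_err)
```

The largest deviation sits near the 1.65 seam, where the quadratic is furthest from the true curve, so that is the place to look if banding ever shows up.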
__________________
i74790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread Last edited by Dogway; 27th September 2021 at 17:51. 
27th September 2021, 18:59  #426  Link 
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,503

You are most welcome. AFAIK you used a Maclaurin series (expanded around x=0) and not a Taylor series (expanded around an arbitrary point), that's why the fit degrades as x moves away from 0.
__________________
@turment on Telegram Last edited by tormento; 27th September 2021 at 19:05. 
27th September 2021, 22:18  #427  Link 
Registered User
Join Date: Nov 2009
Posts: 2,351

Yes I know, I'm currently working on a cos(x) expansion around x = pi/2, since some functions need cosines of arguments as high as pi.
Here's the Taylor series of cos(x) around x = pi/2; it stays accurate between 0 and pi.
Code:
cosTP = " pi 0.500001 * - X@ -0.00000367321 swap - X dup * X2@ 0.0000018366 * + X2 X * X3@ 0.166666666 * + X2 dup * 0.00000015305 * - X2 X3 * 0.008333333 * -"
Expr(last,Format("x range_max / pi * "+cosTP+" range_max *"),"") # 390
#Expr(last,Format("x range_max / pi * cos range_max *"),"") # 70
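As a sanity check of the idea, here is a small Python sketch comparing a degree-5 expansion centred on pi/2 with a degree-6 Maclaurin polynomial over [0, pi]. It is an illustration only: it uses the exact pi/2, so the tiny even-order terms in the Expr string above vanish, and the function names are hypothetical.

```python
import math

def cos_taylor_half_pi(x: float) -> float:
    """Degree-5 Taylor polynomial of cos(x) expanded around a = pi/2."""
    u = x - math.pi / 2
    u2 = u * u
    # cos(pi/2 + u) = -sin(u), truncated to -u + u^3/6 - u^5/120
    return -u * (1 - u2 * (1/6 - u2 / 120))

def cos_maclaurin6(x: float) -> float:
    """Degree-6 Maclaurin polynomial of cos(x), i.e. expanded around 0."""
    x2 = x * x
    return 1 - x2 * (1/2 - x2 * (1/24 - x2 / 720))

samples = [math.pi * i / 10000 for i in range(10001)]
err_centred = max(abs(cos_taylor_half_pi(x) - math.cos(x)) for x in samples)
err_maclaurin = max(abs(cos_maclaurin6(x) - math.cos(x)) for x in samples)
print("centred on pi/2:", err_centred)
print("centred on 0:   ", err_maclaurin)
```

Centring the expansion halves the distance to the worst-case point (pi/2 instead of pi), which is why the lower-degree polynomial ends up with the smaller error over the whole interval.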
__________________
i74790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread 
27th September 2021, 23:25  #428  Link 
Formerly davidh*****
Join Date: Jan 2004
Posts: 2,485

If pi/2 < x < pi, can't you just subtract x from pi and then take the negative of the result?
Or have I misunderstood... you say you're working on cos(x) when x = pi/2, but that's just zero every time...
Possibly helpful reading: http://gruntthepeon.free.fr/ssemath/sse_mathfun.h Last edited by wonkey_monkey; 27th September 2021 at 23:31. 
27th September 2021, 23:36  #429  Link 
Registered User
Join Date: Nov 2009
Posts: 2,351

It's not the cosine of pi/2 but a cosine approximation expanded around pi/2, so near x = pi it gives more accurate results than a Taylor series designed around x=0.
Here is the desmos graph (check around x=pi)
__________________
i74790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread 
27th September 2021, 23:42  #430  Link 
Formerly davidh*****
Join Date: Jan 2004
Posts: 2,485

I see, so as per my previous comment: you could design your calculation around 0 < x' < pi/2, reducing x to this range appropriately first. It might be faster for the same accuracy (or more accurate for the same speed).
The purple one needs 5 powers of x, the green one only needs 3. You would basically be taking the first part of the green line (up to pi/2) and rotating it around its endpoint to extend it to pi. https://www.desmos.com/calculator/bktakxsm7u
Edit: there is a slight discontinuity at pi/2 but you can remove that by nudging the coefficients. Last edited by wonkey_monkey; 27th September 2021 at 23:56. 
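A rough Python sketch of that range reduction, assuming the "green" curve is the plain degree-6 Maclaurin polynomial (three powers of x: x^2, x^4, x^6). The function names are hypothetical; the identity used is cos(x) = -cos(pi - x).

```python
import math

def cos_poly6(x: float) -> float:
    """Degree-6 Maclaurin polynomial, intended for 0 <= x <= pi/2 only."""
    x2 = x * x
    return 1 - x2 * (1/2 - x2 * (1/24 - x2 / 720))

def cos_reduced(x: float) -> float:
    """cos on [0, pi]: evaluate the polynomial on [0, pi/2] and mirror the rest."""
    if x <= math.pi / 2:
        return cos_poly6(x)
    # cos(x) = -cos(pi - x) maps (pi/2, pi] back onto [0, pi/2)
    return -cos_poly6(math.pi - x)

samples = [math.pi * i / 10000 for i in range(10001)]
max_err = max(abs(cos_reduced(x) - math.cos(x)) for x in samples)
# Size of the step where the two halves meet
jump = abs(cos_reduced(math.pi / 2 + 1e-9) - cos_reduced(math.pi / 2))
print("max abs error:", max_err)
print("jump at pi/2: ", jump)
```

The step at pi/2 exists because the truncated polynomial is not exactly zero there, so the mirrored half starts from the opposite sign of that small residue.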
28th September 2021, 00:04  #431  Link 
Registered User
Join Date: Nov 2009
Posts: 2,351

Yes, makes total sense: inverting the function and making it piecewise. I will bench speed and quality in case the discontinuity is visible.
__________________
i74790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread 
28th September 2021, 00:10  #433  Link 
Formerly davidh*****
Join Date: Jan 2004
Posts: 2,485

Using 0.0013293 (this is just a rough approximation, not a calculated value) as the x^6 coefficient should all but remove the discontinuity. The maximum error is about 0.0000924. Adding an x^8 term can make the max error almost 100x smaller.
I'll try and work on best coefficients tomorrow, if I have time. Last edited by wonkey_monkey; 28th September 2021 at 00:15. 
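One way to get such a coefficient without eyeballing it, sketched in Python under the assumption that the x^2 and x^4 Taylor coefficients are kept and only the x^6 one is solved for so the polynomial is exactly zero at pi/2. The names `c6` and `cos_nudged` are made up for this example.

```python
import math

h = math.pi / 2

# Solve 1 - h^2/2 + h^4/24 - c6*h^6 = 0 for c6
c6 = (1 - h**2 / 2 + h**4 / 24) / h**6

def cos_nudged(x: float) -> float:
    """Degree-6 polynomial with Taylor x^2 and x^4 terms and the solved x^6 coefficient."""
    return 1 - x**2 / 2 + x**4 / 24 - c6 * x**6

samples = [h * i / 10000 for i in range(10001)]
max_err = max(abs(cos_nudged(x) - math.cos(x)) for x in samples)
print("x^6 coefficient:", c6)
print("max abs error on [0, pi/2]:", max_err)
```

Pinning the value at pi/2 removes the step after mirroring, at the cost of a small bump in the middle of the interval; a proper minimax fit would spread the error more evenly.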
28th September 2021, 00:27  #434  Link 
Registered User
Join Date: Nov 2009
Posts: 2,351

I tested and it is 2% faster (it was already pretty fast), from 420 to 430fps. Quality wise I think it's better because it doesn't touch the range extremes, which are always sensitive, and the discontinuity is not noticeable. But if you can find a better coefficient that would be great.
There's another approximation noted by tormento, the Bhaskara I approx., but it's not as good as the sixth-degree polynomial.
Code:
(pi^2 - 4x^2) / (pi^2 + x^2)
EDIT: yep, coefficient 0.001329 is almost a match, much better.
tormento: check wonkey_monkey's link above. He includes all three approximations; the one from my post is the purple one.
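For a quick comparison, here is a Python sketch of the Bhaskara I cosine formula next to the plain degree-6 Taylor polynomial on [0, pi/2] (the formula is only meant for |x| <= pi/2). The function names are hypothetical and the Taylor polynomial stands in for "the sixth-degree polynomial"; the tuned coefficients from the thread would do better still.

```python
import math

def cos_bhaskara(x: float) -> float:
    """Bhaskara I rational approximation of cos(x), for -pi/2 <= x <= pi/2."""
    p2 = math.pi ** 2
    x2 = x * x
    return (p2 - 4 * x2) / (p2 + x2)

def cos_taylor6(x: float) -> float:
    """Degree-6 Taylor polynomial of cos(x) around 0."""
    x2 = x * x
    return 1 - x2 * (1/2 - x2 * (1/24 - x2 / 720))

samples = [(math.pi / 2) * i / 10000 for i in range(10001)]
err_bhaskara = max(abs(cos_bhaskara(x) - math.cos(x)) for x in samples)
err_taylor6 = max(abs(cos_taylor6(x) - math.cos(x)) for x in samples)
print("Bhaskara I:", err_bhaskara)
print("Taylor x^6:", err_taylor6)
```

The rational form is exact at 0, pi/3 and pi/2 but needs a division, which is another point against it when the polynomial is both cheaper and closer.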
__________________
i74790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread Last edited by Dogway; 28th September 2021 at 01:16. 
28th September 2021, 02:39  #435  Link  
Grumpy Old Man.
Join Date: Jul 2019
Location: Out There....
Posts: 692

"Real" testing,,
Hi Dogway (and the rest),
I asked a question back here : https://forum.doom9.org/showthread.p...24#post1953024 That didn't get answered. Quote:
__________________
Not poorly done, just doin' it my way !!! Live every day like it's your last, because one day, it will be !! (M$B) PD Builds, etc 

28th September 2021, 03:51  #436  Link  
Registered User
Join Date: Jan 2018
Posts: 2,133

Quote:
https://forum.doom9.org/showthread.p...31#post1953031 

28th September 2021, 07:22  #437  Link  
Grumpy Old Man.
Join Date: Jul 2019
Location: Out There....
Posts: 692

Quote:
I don't consider that an answer...it's a very confusing comment, and I still can't get the latest builds to work. And I do remember encoding DVDs at a very slow pace, my fave tool was DVD2SVCD, and I had a very powerful dual Athlon MP2600 system to churn thru it.
__________________
Not poorly done, just doin' it my way !!! Live every day like it's your last, because one day, it will be !! (M$B) PD Builds, etc 

28th September 2021, 15:16  #438  Link 
Formerly davidh*****
Join Date: Jan 2004
Posts: 2,485

Best coefficients I've found so far:
Up to x^6:
Code:
1 - 0.5x^2 + 0.041574811029363x^4 - 0.001292112506266x^6
Max error: 0.000019976279586 (43x better than truncated Taylor series, 4.6x better than modifying only last term to avoid discontinuity)
Up to x^8:
Code:
1 - 0.5x^2 + 0.041666666666667x^4 - 0.001387723061268x^6 + 0.000023661684925x^8
Max error: 0.000000330438621 (72x better than truncated Taylor series, 6x better than modifying only last term to avoid discontinuity)
Last edited by wonkey_monkey; 28th September 2021 at 15:25. 
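These figures are easy to reproduce. Below is a small Python sketch that evaluates both polynomials against math.cos on [0, pi/2]; the coefficients are copied from the post, while the function names and the grid resolution are my own assumptions.

```python
import math

def cos_fit6(x: float) -> float:
    """Degree-6 polynomial with the fitted coefficients quoted above."""
    x2 = x * x
    return 1 - x2 * (0.5 - x2 * (0.041574811029363 - x2 * 0.001292112506266))

def cos_fit8(x: float) -> float:
    """Degree-8 polynomial with the fitted coefficients quoted above."""
    x2 = x * x
    return 1 - x2 * (0.5 - x2 * (0.041666666666667
                                 - x2 * (0.001387723061268 - x2 * 0.000023661684925)))

samples = [(math.pi / 2) * i / 100000 for i in range(100001)]
err6 = max(abs(cos_fit6(x) - math.cos(x)) for x in samples)
err8 = max(abs(cos_fit8(x) - math.cos(x)) for x in samples)
print("max abs error, x^6 fit:", err6)
print("max abs error, x^8 fit:", err8)
```

Both polynomials evaluate to (almost exactly) zero at pi/2, so mirroring them onto (pi/2, pi] leaves no visible step.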
28th September 2021, 15:38  #439  Link 
Registered User
Join Date: Nov 2009
Posts: 2,351

Thanks a lot. Will keep the first one for performance reasons.
EDIT: By the way, I also managed to create a Taylor (Maclaurin) series for exp(x). I know 'exp' is accelerated in Expr, but it was the main cause of the ex_bilateral() drop in performance ('exp' is called many times), so I decided to give it a go. It works very well with as little as a 5th-degree polynomial, even in PC levels: ex_bilateral(1) went from 115fps to 167fps, so it's faster than vsTBilateral(). In ex_contrast() it isn't worth it, as 'exp' is only called once and the range of action is larger (from -8 to +8 in x).
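A minimal Python sketch of a 5th-degree Maclaurin polynomial for exp(x), to show why the input range matters. The post does not say which range ex_bilateral() feeds into 'exp', so the interval [-1, 0] used here is purely an assumption, as is the function name.

```python
import math

def exp_maclaurin5(x: float) -> float:
    """1 + x + x^2/2 + x^3/6 + x^4/24 + x^5/120, written in Horner form."""
    return 1 + x * (1 + x * (1/2 + x * (1/6 + x * (1/24 + x / 120))))

# Assumed working range: small negative arguments
samples = [-i / 1000 for i in range(1001)]
err_small = max(abs(exp_maclaurin5(x) - math.exp(x)) for x in samples)
print("max abs error on [-1, 0]:", err_small)

# Far outside that range the truncated series is useless
print("x = -8:", exp_maclaurin5(-8.0), "vs", math.exp(-8.0))
```

Close to zero the polynomial is a cheap stand-in, but at x = -8 it is not even positive, which matches the remark that a range of -8 to +8 is too wide for this trick.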
__________________
i74790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread Last edited by Dogway; 28th September 2021 at 16:07. 
28th September 2021, 22:15  #440  Link 
Registered User
Join Date: Nov 2009
Posts: 2,351

Access violation, not sure if my fault or a bug in avs+:
Code:
a=FlipHorizontal()
#~ ex_blend(a,"interpolation",1,0.7)
cosTS = "X dup * X2@ 0.500001 * 1 swap - X2 dup * X4@ 0.041574811029363 * + X4 X2 * 0.001292112506266 * -" # 0.00129 to fix discontinuity at pi/2
cosT = "X@ pi 0.500001 * <= "+cosTS+" dup pi swap - -1 * ? "
Expr(last,a,"x ymin - ymax ymin - / pi * "+cosT+" 0.250001 * 0.500001 swap - y ymin - ymax ymin - / pi * "+cosT+" 0.250001 * - ymax ymin - *" ,"")
This works though:
Code:
cosTS = "X dup * X2@ 0.500001 * 1 swap - X2 dup * X4@ 0.041574811029363 * + X4 X2 * 0.001292112506266 * - dup"
cosT = "X@ pi 0.500001 * <= "+cosTS+" pi swap - -1 * ? "
__________________
i74790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread Last edited by Dogway; 28th September 2021 at 22:21. 
Tags 
avisynth, dogway, filters, hbd, packs 

