Doom9's Forum > Capturing and Editing Video > Avisynth Usage

11th July 2021, 18:07   #101
Dogway
I use sorting networks. Right now I'm porting VC Mohan's Adaptive Median from its source file (mostly done); the concept is simple but costly in terms of performance. In any case, I'm doing this for reference, so I can have all the algorithms, and what they look like, in one place.
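For readers new to the idea: a sorting network is a fixed sequence of compare-exchange steps. Each step (i, j) puts the smaller of two values in slot i and the larger in slot j, and the sequence never depends on the data, which is why it maps onto straight-line min/max code in Expr. The Python sketch below is illustrative only. It uses odd-even transposition sort because that network is easy to get right (n rounds always sort n values), not because it is the network used in this thread; the ones discussed here are much shorter.

```python
def odd_even_transposition(n):
    """Comparator list (i, j) for odd-even transposition sort on n inputs.

    n rounds of neighbour compare-exchanges always sort correctly. The network
    is far from minimal; it is used here only because it is easy to get right.
    """
    pairs = []
    for rnd in range(n):
        for i in range(rnd % 2, n - 1, 2):
            pairs.append((i, i + 1))
    return pairs


def apply_network(values, pairs):
    """Run the comparators in order: slot i receives the min, slot j the max."""
    v = list(values)
    for i, j in pairs:
        v[i], v[j] = min(v[i], v[j]), max(v[i], v[j])
    return v


def median_via_network(values):
    """Median of an odd-length list from a fixed sequence of min/max steps."""
    n = len(values)
    return apply_network(values, odd_even_transposition(n))[n // 2]


window = [12, 200, 7, 55, 90, 31, 64, 18, 143]   # a 3x3 neighbourhood
print(median_via_network(window))                 # 55
```

The same structure works for any window size; what changes between the networks in this thread is how few comparators they need and how the values are shuffled on Expr's stack.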
__________________
i7-4790K@Stock::GTX 1070 :: AviSynth+ filters and mods on GitHub + Discussion thread
11th July 2021, 18:17   #102
wonkey_monkey
Hmm. Neat but mind-boggling (the sorting networks, I mean).
__________________
My AviSynth filters / I'm the Doctor

Last edited by wonkey_monkey; 11th July 2021 at 18:54.
11th July 2021, 21:22   #103
wonkey_monkey
Fully optimised undot2: 40% faster!

Code:
x[-1,1]  x[1,1]   dup1 dup1 min swap2 max
x[0,1]   x[-1,0]  dup1 dup1 min swap2 max
x[1,0]   x[0,-1]  dup1 dup1 min swap2 max
x[-1,-1] x[1,-1]  dup1 dup1 min swap2 max

swap7 swap1 swap3 dup1 dup1 min swap2 max
swap5 swap1 swap3 dup1 dup1 min swap2 max
swap6 swap1 swap2 dup1 dup1 min swap2 max
swap4 swap1 swap7 dup1 dup1 min swap2 max
swap3 swap1 swap2                     max
swap6             dup1 dup1 min swap2 max
swap4 swap1 swap5 dup1 dup1 min swap2 max
swap3 swap1 swap2           min
swap4             dup1 dup1 min swap2 max
swap3 swap1 swap2 dup1 dup1 min swap2 max
swap5 swap1 swap3           min
swap2 swap1 swap3                     max
swap2                       min
swap2                                 max

x swap2 swap1
clip
This now makes it nearly 18x faster than the equivalent non-SIMD C++ plugin on my computer.
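To follow what these lines do, it helps to model Expr's stack in a few lines of Python. This is a sketch, not Expr itself, and it assumes the semantics the code above relies on: dupN copies the element N positions below the top (dup is dup0), swapN exchanges the top with the element N positions below it (swap is swap1), and "x lo hi clip" clamps x to [lo, hi]. With it, "a b dup1 dup1 min swap2 max" can be seen to leave min(a, b) below max(a, b), the compare-exchange that most lines above end with.

```python
def run_rpn(program, pixels):
    """Evaluate a small subset of Expr-style RPN.

    `pixels` maps tokens such as "x" or "x[-1,1]" to values. Assumed semantics:
    dupN copies the element N below the top (dup == dup0), swapN exchanges the
    top with the element N below it (swap == swap1), and "x lo hi clip" clamps
    x to [lo, hi].
    """
    stack = []
    for tok in program.split():
        if tok in pixels:
            stack.append(pixels[tok])
        elif tok.startswith("dup"):
            stack.append(stack[-1 - int(tok[3:] or 0)])
        elif tok.startswith("swap"):
            n = int(tok[4:] or 1)
            stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]
        elif tok in ("min", "max"):
            b, a = stack.pop(), stack.pop()          # b was on top
            stack.append(min(a, b) if tok == "min" else max(a, b))
        elif tok == "clip":
            hi, lo, x = stack.pop(), stack.pop(), stack.pop()
            stack.append(min(max(x, lo), hi))
        else:
            raise ValueError(f"unsupported token: {tok}")
    return stack


# The comparator idiom leaves min(a, b) below max(a, b).
print(run_rpn("a b dup1 dup1 min swap2 max", {"a": 9, "b": 4}))          # [4, 9]

# The final step: with lo and hi on the stack, "x swap2 swap1" reorders
# them to "x lo hi" so that clip clamps x.
print(run_rpn("lo hi x swap2 swap1 clip", {"lo": 3, "hi": 8, "x": 10}))  # [8]
```

Feeding the whole undot2 program through run_rpn with random neighbourhoods is a quick way to confirm that it clamps x to the 2nd-smallest and 2nd-largest of its 8 neighbours.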

Last edited by wonkey_monkey; 11th July 2021 at 21:36.
12th July 2021, 13:05   #104
Dogway
I tested the new undot2 and got a performance increase of roughly 8%, from 317 fps (old dup1 dup1 syntax) to 343 fps. Very nice!

I'm trying to understand the new syntax. I think I've got it, but I have some issues. Why do you only compute max in the first swap3 line? Is this to save a comparison? Next you do a swap6, so you end up with A and F on the stack (I've named the elements to keep track) and do a min/max, when it should be A and C.

Stack state after swap3 line:

C F
E G
H D
A B

As an exercise, I'm doing the simple undot1:
Code:
"x[-1,1] A^ x[0,1] B^ x[1,1] C^ x[-1,0] D^ x[1,0] E^ x[-1,-1] F^ x[0,-1] G^ x[1,-1] H^ "             
"x[0,0] A B min C min D min E min F min G min H min A B max C max D max E max F max G max H max clip"
And this was my attempt, which doesn't match one-to-one (and is slower):
Code:
x[0,0] x[-1,1] x[0,1] x[1,1] x[-1,0] x[1,0] x[-1,-1] x[0,-1] x[1,-1]
swap7 swap1 swap6 dup1 dup1 min swap2 max
swap5             dup swap2 min swap5 max
swap1 swap3       dup swap2 max swap4 min
                       dup1 min swap3 max
                       dup1 max swap2 min
swap2                  dup1 max swap2 min
swap2                  dup1 max swap2 min swap clip
EDIT: Yikes, I botched the dup. I've edited the code above, but it still doesn't match.

Last edited by Dogway; 12th July 2021 at 14:03.
12th July 2021, 18:17   #105
wonkey_monkey
Quote:
Originally Posted by Dogway
I'm trying to understand the new syntax, I think I got it but got some issues. I wonder why in the first swap3 line you only compute max? Is this to save a comparison check?
Yes. The result of a min here would never be used, so it's preferable not to compute it (otherwise it would need to be popped at the end).

Quote:
Following you do a swap6 so you end with A F (I name them to keep track) on the stack and do a min, max when it should be A C.
Because there's no min calculation, A ends up being removed from the stack entirely. The max changes the stack (listed bottom to top) from "C F E G H D A B" to "C F E G H D B"; then, on the next line, swap6 turns this into "B F E G H D C", ready for C and D to be compared (both min and max, since both results are required).
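The label trace in this reply can be replayed mechanically. In the Python sketch below, lists are ordered bottom to top as in the reply, and the helper names are made up for illustration. A max-only step pops two labels and pushes one (keeping the top label, following the naming in the reply), so A disappears; swap6 then brings C to the top with D just below it.

```python
def swap_n(stack, n):
    """Expr-style swapN on a list ordered bottom -> top: exchange the top
    element with the element n places below it."""
    s = list(stack)
    s[-1], s[-1 - n] = s[-1 - n], s[-1]
    return s


def max_only(stack):
    """Pop two labels, push one. Following the reply's naming, the result keeps
    the top label; since no min is pushed, the lower label disappears."""
    return stack[:-2] + [stack[-1]]


s = "C F E G H D A B".split()
s = max_only(s)
print(" ".join(s))      # C F E G H D B
s = swap_n(s, 6)
print(" ".join(s))      # B F E G H D C  (C and D are now the top two)
```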
12th July 2021, 21:06   #106
wonkey_monkey
Can you give this tool a try?

https://horman.net/expr_sort.php

Select a network from the dropdown, then choose which sorted elements you want (e.g. for undot2, elements 1 and 6, counting from 0), select the order in which the selected elements should be left on the stack, then hit Submit.

Copy the contents of the left box and replace each [#] with an unsorted element (e.g. a pixel reference).
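Picking only some sorted elements (for example positions 1 and 6 for undot2) lets a tool skip work. Walking the network backwards, a comparator whose outputs are never read can be dropped, and one where only the low or only the high output is read can become a single min or max, which is the "result of a min here would never be used" case discussed above. The Python sketch below is a guess at that bookkeeping, not the code behind expr_sort.php; it uses a simple odd-even transposition network, and the helper names are hypothetical.

```python
def odd_even_transposition(n):
    """A simple, always-correct (but long) sorting network on n inputs."""
    pairs = []
    for rnd in range(n):
        for i in range(rnd % 2, n - 1, 2):
            pairs.append((i, i + 1))
    return pairs


def prune(pairs, wanted):
    """Keep only the comparator work needed for the sorted positions in `wanted`.

    Walks the network backwards, tracking which wires are still read. A kept
    comparator is tagged 'both', 'min' (only its low output is read later) or
    'max' (only its high output is read later).
    """
    needed = set(wanted)
    kept = []
    for i, j in reversed(pairs):
        lo_used, hi_used = i in needed, j in needed
        if not (lo_used or hi_used):
            continue                      # no later step reads either output
        kind = "both" if lo_used and hi_used else ("min" if lo_used else "max")
        kept.append((i, j, kind))
        needed |= {i, j}                  # both inputs feed this comparator
    kept.reverse()
    return kept


def run_pruned(values, kept):
    """Apply the pruned network; unread slots may hold stale values."""
    v = list(values)
    for i, j, kind in kept:
        lo, hi = min(v[i], v[j]), max(v[i], v[j])
        if kind != "max":
            v[i] = lo
        if kind != "min":
            v[j] = hi
    return v


net = odd_even_transposition(8)
kept = prune(net, wanted={1, 6})          # undot2: 2nd smallest and 2nd largest
print(len(net), len(kept), sum(k != "both" for _, _, k in kept))
```

Only the wanted positions of the result are meaningful; the others may hold stale values, just as the min-only and max-only lines in the Expr code leave nothing behind for elements that are never used again.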

Last edited by wonkey_monkey; 12th July 2021 at 21:13.
13th July 2021, 09:06   #107
Dogway
Wow, awesome! It saves a lot of time, especially for 25 inputs. The problem is that, except for small numbers of inputs, the result is slower! I think there's more to it than meets the eye. Some reasoning follows.

I think there are some unwritten rules for optimizing Expr performance. I'll try to list the ones that come to mind (in each example, the form on the left of ">" is the preferred one):
  1. Multiplication is faster than division: "x 0.5 *" > "x 2 /"
  2. For better precision, calculate reciprocals within Expr; constants are stored directly, so they are reused: "x 1 3 / *" > "x 0.333333333 *"
  3. Ternaries are very slow; replace them with min/max where possible: "x y max" > "x y > x y ?"
  4. Dragging elements along deeper in the stack lowers performance: "x 0.5 - x *" > "x x 0.5 - *"
  5. dupN gets slower as N increases.
  6. More?
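Rules like these are about speed, so before swapping one form for another it's worth confirming that the rewrite computes exactly the same value. The evaluator below is a Python sketch rather than Expr: it works in double precision and assumes Expr's ternary "cond a b ?" yields a when cond > 0. It checks rules 1, 3 and 4 on a few inputs; all three rewrites are exact in IEEE floating point (halving is exact, max is a selection, and multiplication is commutative).

```python
import operator

# Binary operators of a tiny RPN evaluator (a model of Expr, not Expr itself).
BINARY = {
    "+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv,
    "min": min, "max": max,
    ">": lambda a, b: 1.0 if a > b else 0.0,
}


def rpn(expr, **inputs):
    stack = []
    for tok in expr.split():
        if tok in inputs:
            stack.append(float(inputs[tok]))
        elif tok in BINARY:
            b, a = stack.pop(), stack.pop()       # b was on top
            stack.append(BINARY[tok](a, b))
        elif tok == "?":                          # "cond a b ?" -> a if cond > 0 else b (assumed)
            b, a, cond = stack.pop(), stack.pop(), stack.pop()
            stack.append(a if cond > 0 else b)
        else:
            stack.append(float(tok))              # numeric constant
    assert len(stack) == 1, stack
    return stack[0]


RULES = {
    1: ("x 0.5 *", "x 2 /"),
    3: ("x y max", "x y > x y ?"),
    4: ("x 0.5 - x *", "x x 0.5 - *"),
}

for rule, (preferred, other) in RULES.items():
    same = all(rpn(preferred, x=x, y=y) == rpn(other, x=x, y=y)
               for x, y in [(0, 0), (3, 7), (200, 16), (255, 254.5)])
    print(rule, same)
```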

I need to set these in stone to decide which optimization route to take with ExTools.
I think the problem with 25 inputs is rule 4. Swapping might be a free operation, but performance suffers because Expr prefers operating continuously on the top of the stack. You can see this clearly with the undot1 example from my post above: simply moving "x" to the correct stack position increases performance by 10%. Whether you write "x" or "x[0,0]" makes no difference at all.
Code:
x[0,-1] x[1,-1]
x[1,0] x[-1,-1]
x[1,1] x[-1,0]
x[-1,1] x[0,1]  dup1  dup1 min swap2 max
swap3           dup1 swap2 min swap3 max
swap1           dup1 swap2 max swap2 min
swap3           dup1 swap2 max swap3 min
swap1           dup1 swap2 min swap2 max
swap2           dup1 swap2 min swap2 max
swap2           dup1 swap2 min swap2 max
x swap2 swap1 clip
How much dragging elements along deeper in the stack affects performance is not yet clear. For example:
Code:
old                           new
"x {th} > 255 scaleb x ?" > "x dup {th} > 255 scaleb swap1 swap2 ?"

old = 471, 473, 479, 474, 465, 482, 481: mean 475, median 474, lowest 4 runs sum to 1883 (mean 470.75)
new = 471, 469, 486, 481, 464, 468, 485: mean 474.86, median 471, lowest 4 runs sum to 1872 (mean 468)
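The summary figures can be reproduced from the seven runs. The sketch below reads the "50%p" figure from the original post as the mean of the four lowest runs; that is an inference from the numbers (1883 / 4 = 470.75 and 1872 / 4 = 468), not something the post states.

```python
from statistics import mean, median

runs = {
    "old": [471, 473, 479, 474, 465, 482, 481],
    "new": [471, 469, 486, 481, 464, 468, 485],
}

for name, results in runs.items():
    lowest_four = sorted(results)[:4]
    print(f"{name}: mean {mean(results):.2f}  median {median(results)}  "
          f"lowest-4 sum {sum(lowest_four)} (mean {mean(lowest_four):.2f})")
```

With only seven runs and a spread of about 20 between the best and worst result, a difference of one or two in the mean is well within run-to-run noise, which fits the "not yet clear" conclusion.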

Another example, with each expression stacked 3 times to amplify the difference:
Code:
sat=-0.2
Expr("", Format("x A^ 1 {sat} - A * {sat} range_max A - * + ")) # 297 298 297
Expr("", Format("1 {sat} - x dup swap2 * range_max swap1 swap2 - {sat} * +") ) # 294 299 293
As you can see, the benefit of avoiding variables is not clear-cut.
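Both expressions compute (1 - sat) * x + sat * (range_max - x); only the data movement differs (a variable in the first, dup/swap in the second). The Python sketch below checks that they agree. It assumes A^ stores the top of the stack in A and pops it, a bare A pushes it back, and dupN/swapN count from the top as elsewhere in the thread; range_max = 255 (8-bit) and sat = -0.2 are substituted by hand with an f-string, standing in for Format().

```python
import math


def rpn(expr, **inputs):
    """Tiny model of Expr: inputs, A^ (store+pop), A (load), dupN, swapN, + - *."""
    stack, variables = [], {}
    for tok in expr.split():
        if tok in inputs:
            stack.append(float(inputs[tok]))
        elif len(tok) == 2 and tok[0].isupper() and tok[1] == "^":
            variables[tok[0]] = stack.pop()              # A^ : store and pop
        elif len(tok) == 1 and tok.isupper():
            stack.append(variables[tok])                 # A  : load
        elif tok.startswith("dup"):
            stack.append(stack[-1 - int(tok[3:] or 0)])
        elif tok.startswith("swap"):
            n = int(tok[4:] or 1)
            stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]
        elif tok in ("+", "-", "*"):
            b, a = stack.pop(), stack.pop()              # b was on top
            stack.append(a + b if tok == "+" else a - b if tok == "-" else a * b)
        else:
            stack.append(float(tok))                     # numeric constant
    assert len(stack) == 1, stack
    return stack[0]


sat, range_max = -0.2, 255
expr_vars = f"x A^ 1 {sat} - A * {sat} range_max A - * +"
expr_stack = f"1 {sat} - x dup swap2 * range_max swap1 swap2 - {sat} * +"

for x in (0, 16, 128, 235, 255):
    a = rpn(expr_vars, x=x, range_max=range_max)
    b = rpn(expr_stack, x=x, range_max=range_max)
    print(x, a, b, math.isclose(a, b))
```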

So my suggestion for median5 (or long sorting networks) is to bring stack elements to the front from time to time (after 1 or 2 layers).
13th July 2021, 10:08   #108
wonkey_monkey
Quote:
The problem is except for low inputs it's slower!
Slower than what?
13th July 2021, 19:34   #109
Dogway
Slower than this.

I tried this, but it didn't help; it needs some manual work.
Code:
mode == "median5"? "x[-2,2]  x[-1,2]      dup1 dup1 min swap2 max
                    x[0,2]   x[1,2]       dup1 dup1 min swap2 max
                    x[2,2]   x[-2,1]      dup1 dup1 min swap2 max
                    x[-1,1]  x[0,1]       dup1 dup1 min swap2 max
                    x[1,1]   x[2,1]       dup1 dup1 min swap2 max
                    x[-2,0]  x[-1,0]      dup1 dup1 min swap2 max
                    x[0,0]   x[1,0]       dup1 dup1 min swap2 max
                    x[2,0]   x[-2,-1]     dup1 dup1 min swap2 max
                    x[-1,-1] x[0,-1]      dup1 dup1 min swap2 max
                    x[1,-1]  x[2,-1]      dup1 dup1 min swap2 max
                    x[-2,-2] x[-1,-2]     dup1 dup1 min swap2 max
                    x[0,-2]  x[1,-2]      dup1 dup1 min swap2 max
                    swap23 swap1  swap19  dup1 dup1 min swap2 max
                    swap21 swap1  swap3   dup1 dup1 min swap2 max
                    swap22 swap1  swap18  dup1 dup1 min swap2 max
                    swap17 swap1  swap7   dup1 dup1 min swap2 max
                    swap15 swap1  swap19  dup1 dup1 min swap2 max
                    swap13 swap1  swap9   dup1 dup1 min swap2 max
                    swap11 swap1  swap5   dup1 dup1 min swap2 max
                    swap20 swap1  swap2   dup1 dup1 min swap2 max
                    swap10 swap1  swap4   dup1 dup1 min swap2 max
                    swap8  swap1  swap12  dup1 dup1 min swap2 max
                    swap6  swap1  swap16  dup1 dup1 min swap2 max
                    swap14 swap1  swap23  dup1 dup1 min swap2 max
                    swap3  swap1  swap19  dup1 dup1 min swap2 max
                    swap18 swap1  swap2   dup1 dup1 min swap2 max
                   
                    N@ P@ D@ O@ U@ E@ S@ L@ M@ R@ V@ J@ K@ X@ F@ Q@ C@ T@ G@ I@ Y@ A@ B@ H@
                   
                    (...)

Last edited by Dogway; 13th July 2021 at 19:37.
13th July 2021, 20:33   #110
wonkey_monkey
I wonder if Expr's built-in spilling of values, when they don't fit in the available registers, is slower than saving them to variables for some reason.

There's a fault somewhere in yours, though: I checked it with random pixel values and it didn't give the correct median.
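A random-value check like the one described is easy to script. The Python sketch below (the function names are hypothetical) compares any candidate median-of-25 function against sorted(window)[12]. Random tests can still miss rare orderings. For comparator networks, the 0-1 principle offers an exhaustive alternative: a network that selects the right element for every input made only of 0s and 1s does so for all inputs, which for 25 inputs means 2^25 (about 33.5 million) cases.

```python
import random


def check_median_candidate(candidate, trials=500, seed=0):
    """Feed random 5x5 windows (25 values, 0..255) to `candidate` and compare
    with the true median. Returns the first failing window, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        window = [rng.randrange(256) for _ in range(25)]
        if candidate(window) != sorted(window)[12]:
            return window
    return None


def reference_median(window):
    """Slow but correct: odd-even transposition sort (25 rounds of min/max)."""
    v = list(window)
    for rnd in range(len(v)):
        for i in range(rnd % 2, len(v) - 1, 2):
            v[i], v[i + 1] = min(v[i], v[i + 1]), max(v[i], v[i + 1])
    return v[len(v) // 2]


print(check_median_candidate(reference_median))              # None
print(check_median_candidate(lambda w: w[12]) is not None)   # True: the centre pixel is not the median
```

To test an Expr string this way, the candidate would wrap a small RPN evaluator that maps x[dx,dy] to the window; the harness itself does not care how the median is computed.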
13th July 2021, 20:52   #111
Dogway
Yes, I know. I revised it several times but couldn't find the culprit. The same goes for undot1, as mentioned above. It might not be my fault, who knows.

I think it might also be a matter of dragging stack elements along behind, as explained above with simple expressions. The following is also very slow; I guess popping them off into variables fixes the issue:
Code:
x[-2,2]  x[-1,2]      dup1 dup1 min swap2 max
x[0,2]   x[1,2]       dup1 dup1 min swap2 max
x[2,2]   x[-2,1]      dup1 dup1 min swap2 max
x[-1,1]  x[0,1]       dup1 dup1 min swap2 max
x[1,1]   x[2,1]       dup1 dup1 min swap2 max
x[-2,0]  x[-1,0]      dup1 dup1 min swap2 max
x[0,0]   x[1,0]       dup1 dup1 min swap2 max
x[2,0]   x[-2,-1]     dup1 dup1 min swap2 max
x[-1,-1] x[0,-1]      dup1 dup1 min swap2 max
x[1,-1]  x[2,-1]      dup1 dup1 min swap2 max
x[-2,-2] x[-1,-2]     dup1 dup1 min swap2 max
x[0,-2]  x[1,-2]      dup1 dup1 min swap2 max

N^ P^ D^ O^ U^ E^ S^ L^ M^ R^ V^ J^ K^ X^ F^ Q^ C^ T^ G^ I^ Y^ A^ B^ H^
M
EDIT: Above 12 stack elements it starts to bog down.

Last edited by Dogway; 13th July 2021 at 20:55.
13th July 2021, 20:57   #112
wonkey_monkey
Mine (median5) is faster if you set optSingleMode = true; yours doesn't benefit from it.

Unfortunately knowing whether or not to switch it on is probably a black art and may not even be the same from computer to computer. Are you on x86 or x64?

Edit: actually thinking about it, if it's faster on one computer it should be faster on any computer with the same architecture (x86 or x64). It's to do with how many registers are available. Yours gained some speed by keeping more registers free, but lost some due to storing and loading variables. I'm still a little surprised at how close they are though.

When I eventually write my new RPN compiler I plan on including automatic profiling so it should always make the correct choices of such features.

Last edited by wonkey_monkey; 13th July 2021 at 21:14.
13th July 2021, 23:02   #113
Dogway
That one is actually outdated. I think this might be faster (if we ignore the output difference):
Code:
x[-2,2] A^ x[-1,2] B^ x[0,2] C^ x[1,2] D^ x[2,2] E^ x[-2,1] F^ x[-1,1] G^ x[0,1] H^ x[1,1] I^ x[2,1] J^ x[-2,0] K^ x[-1,0] L^ x[0,0] M^
x[1,0] N^ x[2,0] O^ x[-2,-1] P^ x[-1,-1] Q^ x[0,-1] R^ x[1,-1] S^ x[2,-1] T^ x[-2,-2] U^ x[-1,-2] V^ x[0,-2] X^ x[1,-2] Y^ x[2,-2] Z^
A C dup1 dup1 min AA^ max CC^ B I dup1 dup1 min BB^ max II^ D S dup1 dup1 min DD^ max SS^ E R dup1 dup1 min EE^ max RR^ F U dup1 dup1 min FF^ max UU^ G T dup1 dup1 min GG^ max TT^ H J dup1 dup1 min HH^ max JJ^ K L dup1 dup1 min KK^ max LL^ M N dup1 dup1 min MM^ max NN^ O Q dup1 dup1 min OO^ max QQ^ P X dup1 dup1 min PP^ max XX^ V Y dup1 dup1 min VV^ max YY^
AA DD dup1 dup1 min A^ max D^ BB PP dup1 dup1 min B^ max P^ CC SS dup1 dup1 min C^ max S^ EE MM dup1 dup1 min E^ max M^ FF VV dup1 dup1 min F^ max V^ GG KK dup1 dup1 min G^ max K^ HH OO dup1 dup1 min H^ max O^ II XX dup1 dup1 min I^ max X^ JJ QQ dup1 dup1 min J^ max Q^ LL TT dup1 dup1 min L^ max T^ NN RR dup1 dup1 min N^ max R^ UU YY dup1 dup1 min U^ max Y^
A E dup1 dup1 min AA^ max EE^ B H dup1 dup1 min BB^ max HH^ C N dup1 dup1 min CC^ max NN^ D M dup1 dup1 min DD^ max MM^ F G dup1 dup1 min FF^ max GG^ I O dup1 dup1 min II^ max OO^ J P dup1 dup1 min JJ^ max PP^ K V dup1 dup1 min KK^ max VV^ L U dup1 dup1 min LL^ max UU^ Q X dup1 dup1 min QQ^ max XX^ R S dup1 dup1 min RR^ max SS^ T Y dup1 dup1 min TT^ max YY^
AA FF dup1 dup1 min A^ max F^ CC LL dup1 dup1 min C^ max L^ DD GG dup1 dup1 min D^ max G^ EE KK dup1 dup1 min E^ max K^ HH QQ dup1 dup1 min H^ max Q^ II JJ dup1 dup1 min I^ max J^ MM VV dup1 dup1 min M^ max V^ NN TT dup1 dup1 min N^ max T^ OO PP dup1 dup1 min O^ max P^ RR UU dup1 dup1 min R^ max U^ SS YY dup1 dup1 min S^ max Y^
C H dup1 dup1 min CC^ max HH^ G J dup1 dup1 min GG^ max JJ^ I LL dup1 dup1 min II^ max L^ O Z dup1 dup1 min OO^ max ZZ^ S V dup1 dup1 min SS^ max VV^
D II dup1 dup1 min DD^ max I^ HH K dup1 dup1 min H^ max KK^ LL M dup1 dup1 min L^ max MM^ N OO dup1 dup1 min NN^ max O^ P VV dup1 dup1 min PP^ max V^ SS U dup1 dup1 min S^ max UU^ X ZZ dup1 dup1 min XX^ max Z^
E NN dup1 dup1 min EE^ max N^ KK Q dup1 dup1 min K^ max QQ^ L PP dup1 dup1 min LL^ max P^ S Z dup1 dup1 min SS^ max ZZ^ T XX dup1 dup1 min TT^ max X^
BB EE dup1 dup1 min B^ max E^ I LL dup1 dup1 min II^ max L^ JJ TT dup1 dup1 min J^ max T^ N R dup1 dup1 min NN^ max RR^ O SS dup1 dup1 min OO^ max S^ QQ UU dup1 dup1 min Q^ max U^ Y ZZ dup1 dup1 min YY^ max Z^
A B dup1 dup1 min AA^ max BB^ E F dup1 dup1 min EE^ max FF^ GG NN dup1 dup1 min G^ max N^ J OO dup1 dup1 min JJ^ max O^ K RR dup1 dup1 min KK^ max R^ MM Q dup1 dup1 min M^ max QQ^ S T dup1 dup1 min SS^ max TT^ U V dup1 dup1 min UU^ max VV^ X YY dup1 dup1 min XX^ max Y^
CC G dup1 dup1 min C^ max GG^ DD EE dup1 dup1 min D^ max E^ FF N dup1 dup1 min F^ max NN^ H JJ dup1 dup1 min HH^ max J^ M SS dup1 dup1 min MM^ max S^ P R dup1 dup1 min PP^ max RR^ QQ TT dup1 dup1 min Q^ max T^ UU XX dup1 dup1 min U^ max X^ VV Y dup1 dup1 min V^ max YY^
BB C dup1 dup1 min B^ max CC^ F II dup1 dup1 min FF^ max I^ GG HH dup1 dup1 min G^ max H^ J KK dup1 dup1 min JJ^ max K^ L NN dup1 dup1 min LL^ max N^ O PP dup1 dup1 min OO^ max P^ RR U dup1 dup1 min R^ max UU^ V X dup1 dup1 min VV^ max XX^
B D dup1 dup1 min BB^ max DD^ CC E dup1 dup1 min C^ max EE^ FF G dup1 dup1 min F^ max GG^ H LL dup1 dup1 min HH^ max L^ I JJ dup1 dup1 min II^ max J^ K N dup1 dup1 min KK^ max NN^ MM OO dup1 dup1 min M^ max O^ P Q min PP^ R S min RR^
J M max MM^ KK L max LL^ NN O min N^ PP RR min P^ LL MM max M^ N P min NN^ M NN min MM^
MM
I don't mind doing manual profiling this time; it's what I've been doing for 2 months anyway, haha.
14th July 2021, 00:08   #114
wonkey_monkey
Yup, faster for me too. What did you change? The ordering of swaps within each layer?
14th July 2021, 04:42   #115
kedautinh12
I think the output needs to stay the same, because simply replacing some functions affects the results of whole scripts. For example, you replaced blur(.6) with RemoveGrain(1) in Framerateconverter.avsi only because it's faster, but MysteryX said it creates artifacts when the mask is larger.

Last edited by kedautinh12; 14th July 2021 at 05:42.
14th July 2021, 10:47   #116
Dogway
I just removed unnecessary comparisons in the last 2 layers. Anyway, I'll revise it once more, this time with regex, so I don't make any mistakes.
Today will mainly be profiling and optimizing; if I'm lucky, I can release this evening.

@kedautinh12: Yes, these are only drafts. I still haven't updated the MIX/EX mods to the latest ExTools. It turns out that removegrain(12) (or blur) is not a true Gaussian but a weighted mean; blur(0.6) was implemented recently as a bare convolution. I also want to optimize their Expr code with the latest tricks, but most likely I'll leave it untouched, as shown here.

EDIT: By the way, the fixed undot was:
Code:
x[0,-1] x[1,-1]
x[1,0] x[-1,-1]
x[1,1] x[-1,0]
x[-1,1] x[0,1]  dup1  dup1 min swap2 max
swap3           dup  swap2 min swap3 max
swap1           dup  swap2 max swap2 min
swap3           dup  swap2 max swap3 min
swap1           dup  swap2 min swap2 max
swap2           dup  swap2 min swap2 max
swap2           dup  swap2 min swap2 max
x swap2 swap1 clip
It's still slower than the old chain of mins and maxes, so I'm posting it here just for reference. I also fixed median5 (an LL^ variable was mistyped) and weightedp. Overall, good optimizations going forward.
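Running this program through a small Python model of the Expr stack supports its correctness. The model is a sketch that assumes dupN/swapN count from the top of the stack (dup is dup0, swap is swap1) and that "x lo hi clip" clamps x. On random 3x3 neighbourhoods the program matches the plain undot1 definition: clamp x to the min and max of its 8 neighbours. It says nothing about speed.

```python
import random

UNDOT1 = """
x[0,-1] x[1,-1]
x[1,0] x[-1,-1]
x[1,1] x[-1,0]
x[-1,1] x[0,1]  dup1  dup1 min swap2 max
swap3           dup  swap2 min swap3 max
swap1           dup  swap2 max swap2 min
swap3           dup  swap2 max swap3 min
swap1           dup  swap2 min swap2 max
swap2           dup  swap2 min swap2 max
swap2           dup  swap2 min swap2 max
x swap2 swap1 clip
"""

OFFSETS = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def run_rpn(program, window):
    """`window` maps (dx, dy) -> value; a bare 'x' means x[0,0]."""
    stack = []
    for tok in program.split():
        if tok == "x":
            stack.append(window[(0, 0)])
        elif tok.startswith("x["):
            dx, dy = map(int, tok[2:-1].split(","))
            stack.append(window[(dx, dy)])
        elif tok.startswith("dup"):
            stack.append(stack[-1 - int(tok[3:] or 0)])
        elif tok.startswith("swap"):
            n = int(tok[4:] or 1)
            stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]
        elif tok in ("min", "max"):
            b, a = stack.pop(), stack.pop()
            stack.append(min(a, b) if tok == "min" else max(a, b))
        elif tok == "clip":
            hi, lo, x = stack.pop(), stack.pop(), stack.pop()
            stack.append(min(max(x, lo), hi))
        else:
            raise ValueError(f"unsupported token: {tok}")
    return stack


def undot1_reference(window):
    neighbours = [v for k, v in window.items() if k != (0, 0)]
    return min(max(window[(0, 0)], min(neighbours)), max(neighbours))


rng = random.Random(42)
mismatches = 0
for _ in range(1000):
    w = {off: rng.randrange(256) for off in OFFSETS}
    if run_rpn(UNDOT1, w) != [undot1_reference(w)]:
        mismatches += 1
print("mismatches:", mismatches)
```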

Last edited by Dogway; 14th July 2021 at 15:15.
14th July 2021, 20:18   #117
wonkey_monkey
Can you check if this is faster for median5? It was for me. It uses the second 25-item network instead of the first, which is apparently more Expr-friendly:

Code:
Expr(optsinglemode = true, "
x[-2,2]  x[-1,2]      dup1 dup1 min swap2 max
x[0,2]   x[1,2]       dup1 dup1 min swap2 max
x[2,2]   x[-2,1]      dup1 dup1 min swap2 max
x[-1,1]  x[0,1]       dup1 dup1 min swap2 max
x[1,1]   x[2,1]       dup1 dup1 min swap2 max
x[-2,0]  x[-1,0]      dup1 dup1 min swap2 max
x[0,0]   x[1,0]       dup1 dup1 min swap2 max
x[2,0]   x[-2,-1]     dup1 dup1 min swap2 max
x[-1,-1] x[0,-1]      dup1 dup1 min swap2 max
x[1,-1]  x[2,-1]      dup1 dup1 min swap2 max
x[-2,-2] x[-1,-2]     dup1 dup1 min swap2 max
x[0,-2]  x[1,-2]      dup1 dup1 min swap2 max
swap23 swap1  swap19  dup1 dup1 min swap2 max
swap21 swap1  swap3   dup1 dup1 min swap2 max
swap22 swap1  swap18  dup1 dup1 min swap2 max
swap17 swap1  swap7   dup1 dup1 min swap2 max
swap15 swap1  swap19  dup1 dup1 min swap2 max
swap13 swap1  swap9   dup1 dup1 min swap2 max
swap11 swap1  swap5   dup1 dup1 min swap2 max
swap20 swap1  swap2   dup1 dup1 min swap2 max
swap10 swap1  swap4   dup1 dup1 min swap2 max
swap8  swap1  swap12  dup1 dup1 min swap2 max
swap6  swap1  swap16  dup1 dup1 min swap2 max
swap14 swap1  swap23  dup1 dup1 min swap2 max
swap3  swap1  swap19  dup1 dup1 min swap2 max
swap18 swap1  swap2   dup1 dup1 min swap2 max
swap7  swap1  swap23  dup1 dup1 min swap2 max
swap21 swap1  swap15  dup1 dup1 min swap2 max
swap9  swap1  swap5   dup1 dup1 min swap2 max
swap4  swap1  swap20  dup1 dup1 min swap2 max
swap12 swap1  swap22  dup1 dup1 min swap2 max
swap11 swap1  swap13  dup1 dup1 min swap2 max
swap16 swap1  swap19  dup1 dup1 min swap2 max
swap8  swap1  swap10  dup1 dup1 min swap2 max
swap14 swap1  swap17  dup1 dup1 min swap2 max
swap6  swap1  swap3   dup1 dup1 min swap2 max
swap2  swap1  swap20                      max
swap14 swap1  swap9   dup1 dup1 min swap2 max
swap4  swap1  swap3   dup1 dup1 min swap2 max
swap17 swap1  swap18  dup1 dup1 min swap2 max
swap21 swap1  swap12  dup1 dup1 min swap2 max
swap8  swap1  swap15  dup1 dup1 min swap2 max
swap20 swap1  swap19  dup1 dup1 min swap2 max
swap11 swap1  swap10  dup1 dup1 min swap2 max
swap2  swap1  swap7   dup1 dup1 min swap2 max
swap5  swap1  swap9   dup1 dup1 min swap2 max
swap13 x[2,-2]        dup1 dup1 min swap2 max
swap7  swap1  swap20  dup1 dup1 min swap2 max
swap16 swap1  swap22  dup1 dup1 min swap2 max
swap9  swap1  swap11  dup1 dup1 min swap2 max
swap5  swap1  swap20  dup1 dup1 min swap2 max
swap2  swap1  swap21  dup1 dup1 min swap2 max
swap12 swap1  swap6   dup1 dup1 min swap2 max
swap14 swap1  swap7             min
swap22 swap1  swap20  dup1 dup1 min swap2 max
swap18 swap1  swap10                      max
swap11 swap1  swap18  dup1 dup1 min swap2 max
swap13 swap1  swap20  dup1 dup1 min swap2 max
swap16 swap1  swap3   dup1 dup1 min swap2 max
swap7  swap1  swap18  dup1 dup1 min swap2 max
swap6  swap1  swap4   dup1 dup1 min swap2 max
swap15 swap1  swap19  dup1 dup1 min swap2 max
swap2  swap1  swap3                       max
swap15 swap1  swap7   dup1 dup1 min swap2 max
swap16 swap1  swap18  dup1 dup1 min swap2 max
swap5  swap1  swap9   dup1 dup1 min swap2 max
swap2  swap1  swap4   dup1 dup1 min swap2 max
swap14 swap1  swap20  dup1 dup1 min swap2 max
swap11 swap1  swap7   dup1 dup1 min swap2 max
swap8  swap1  swap15                      max
swap16 swap1  swap4   dup1 dup1 min swap2 max
swap11 swap1  swap8   dup1 dup1 min swap2 max
swap2  swap1  swap15  dup1 dup1 min swap2 max
swap12 swap1  swap19  dup1 dup1 min swap2 max
swap3  swap1  swap14  dup1 dup1 min swap2 max
swap6  swap1  swap13  dup1 dup1 min swap2 max
swap7  swap1  swap10            min
swap17 swap1  swap14                      max
swap14 swap1  swap6   dup1 dup1 min swap2 max
swap15 swap1  swap12  dup1 dup1 min swap2 max
swap7  swap1  swap17  dup1 dup1 min swap2 max
swap3  swap1  swap9   dup1 dup1 min swap2 max
swap11 swap1  swap8   dup1 dup1 min swap2 max
swap6  swap1  swap10  dup1 dup1 min swap2 max
swap5  swap1  swap16            min
swap3                           min
swap10 swap1  swap7                       max
swap12 swap1  swap6   dup1 dup1 min swap2 max
swap14 swap1  swap11                      max
swap9  swap1  swap3   dup1 dup1 min swap2 max
swap4  swap1  swap12  dup1 dup1 min swap2 max
swap7  swap1  swap3             min
swap4  swap1  swap7             min
swap8  swap1  swap7                       max
swap8  swap1  swap9                       max
swap9  swap1  swap8   dup1 dup1 min swap2 max
swap5  swap1  swap2   dup1 dup1 min swap2 max
swap8  swap1  swap4             min
swap2  swap1  swap5             min
swap5  swap1  swap7                       max
swap3                                     max
swap5                           min
swap3  swap1  swap2             min
swap3                                     max
swap2                           min
                                min
")

Last edited by wonkey_monkey; 14th July 2021 at 20:22.
14th July 2021, 20:40   #118
Dogway
It runs at 74 fps, more or less like the one in my post above, but I went and optimized it further, and now it runs at 87 fps.

I've already profiled all the modes, and they should be running at optimal speeds; I'm not sure it's possible to squeeze more out of them. I don't think I'll change anything in ex_edge; it already runs pretty fast.

I tried the dup/swap approach in some functions like ex_merge, ex_makeadddiff and so on, but I don't think it improves performance; if anything, it feels slightly slower (minimal, but still). For ex_binarize's smooth mode it was a tie. I'm not sure what to think, but I left it updated with the new syntax. Tomorrow I'll profile GradePack (ex_contrast and ex_levels).

By the way, for anyone reading: I've rebased the MIX/EX mods on the latest ExTools.
14th July 2021, 20:56   #119
wonkey_monkey
Try this. I got 25% more out of it than my previous one (note that it does NOT use OptSingleMode):

Code:
Expr("
x[-2,-2] x[-2,-1]     dup1 dup1 min A^ max C^
x[-2,0] x[-2,1]       dup1 dup1 min B^ max I^
x[-2,2] x[-1,-2]      dup1 dup1 min D^ max S^
x[-1,-1] x[-1,0]      dup1 dup1 min E^ max R^
x[-1,1] x[-1,2]       dup1 dup1 min F^ max U^
x[0,-2] x[0,-1]       dup1 dup1 min G^ max T^
x[0,0] x[0,1]         dup1 dup1 min H^ max J^
x[0,2] x[1,-2]        dup1 dup1 min K^ max L^
x[1,-1] x[1,0]        dup1 dup1 min M^ max N^
x[1,1] x[1,2]         dup1 dup1 min O^ max Q^
x[2,-2] x[2,-1]       dup1 dup1 min P^ max W^
x[2,0] x[2,1]         dup1 dup1 min V^ max X^
A D                   dup1 dup1 min A^ max D^
B P                   dup1 dup1 min B^ max P^
C S                   dup1 dup1 min C^ max S^
E M                   dup1 dup1 min E^ max M^
F V                   dup1 dup1 min F^ max V^
G K                   dup1 dup1 min G^ max K^
H O                   dup1 dup1 min H^ max O^
I W                   dup1 dup1 min I^ max W^
J Q                   dup1 dup1 min J^ max Q^
L T                   dup1 dup1 min L^ max T^
N R                   dup1 dup1 min N^ max R^
U X                   dup1 dup1 min U^ max X^
A E                   dup1 dup1 min A^ max E^
B H                   dup1 dup1 min B^ max H^
C N                   dup1 dup1 min C^ max N^
D M                   dup1 dup1 min D^ max M^
F G                   dup1 dup1 min F^ max G^
I O                   dup1 dup1 min I^ max O^
J P                   dup1 dup1 min J^ max P^
K V                   dup1 dup1 min K^ max V^
L U                   dup1 dup1 min L^ max U^
Q W                   dup1 dup1 min Q^ max W^
R S                   dup1 dup1 min R^ max S^
T X                   dup1 dup1 min T^ max X^
A F                                    max F^
C L                   dup1 dup1 min C^ max L^
D G                   dup1 dup1 min D^ max G^
E K                   dup1 dup1 min E^ max K^
H Q                   dup1 dup1 min H^ max Q^
I J                   dup1 dup1 min I^ max J^
M V                   dup1 dup1 min M^ max V^
N T                   dup1 dup1 min N^ max T^
O P                   dup1 dup1 min O^ max P^
R U                   dup1 dup1 min R^ max U^
S X                   dup1 dup1 min S^ max X^
C H                   dup1 dup1 min C^ max H^
G J                   dup1 dup1 min G^ max J^
I L                   dup1 dup1 min I^ max L^
O x[2,2]              dup1 dup1 min O^ max Y^
S V                   dup1 dup1 min S^ max V^
D I                                    max I^
H K                   dup1 dup1 min H^ max K^
L M                   dup1 dup1 min L^ max M^
N O                   dup1 dup1 min N^ max O^
P V                   dup1 dup1 min P^ max V^
S U                   dup1 dup1 min S^ max U^
W Y                   dup1 dup1 min W^ max Y^
E N                   dup1 dup1 min E^ max N^
K Q                   dup1 dup1 min K^ max Q^
L P                   dup1 dup1 min L^ max P^
S Y                   dup1 dup1 min S^ max Y^
T W                   dup1 dup1 min T^ max W^
B E                                    max E^
I L                   dup1 dup1 min I^ max L^
J T                   dup1 dup1 min J^ max T^
N R                   dup1 dup1 min N^ max R^
O S                   dup1 dup1 min O^ max S^
Q U                   dup1 dup1 min Q^ max U^
X Y                             min X^
E F                                    max F^
G N                   dup1 dup1 min G^ max N^
J O                   dup1 dup1 min J^ max O^
K R                   dup1 dup1 min K^ max R^
M Q                   dup1 dup1 min M^ max Q^
S T                   dup1 dup1 min S^ max T^
U V                             min U^
W X                             min W^
C G                                    max G^
F N                   dup1 dup1 min F^ max N^
H J                   dup1 dup1 min H^ max J^
M S                   dup1 dup1 min M^ max S^
P R                   dup1 dup1 min P^ max R^
Q T                             min Q^
U W                             min U^
F I                                    max I^
G H                                    max H^
J K                   dup1 dup1 min J^ max K^
L N                   dup1 dup1 min L^ max N^
O P                   dup1 dup1 min O^ max P^
R U                             min R^
H L                                    max L^
I J                                    max J^
K N                   dup1 dup1 min K^ max N^
M O                   dup1 dup1 min M^ max O^
P Q                             min P^
R S                             min R^
J M                                    max M^
K L                                    max L^
N O                             min N^
P R                             min P^
L M                                    max
N P                             min
                                min
")
I think there's still some work to be done. I'll update my online tool at some point; it now inserts pixel references automatically (when it can) and also has a variable-ised output option.
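The "variable-ised" style above is regular enough to generate: each comparator becomes "P Q dup1 dup1 min P^ max Q^", which leaves the stack empty and stores the min in P and the max in Q. The Python sketch below shows the idea. It is not wonkey_monkey's tool, it assumes A^ stores the top of the stack and pops it, and it uses a toy odd-even transposition network over the names A to E rather than the 25-input network above.

```python
def odd_even_transposition(n):
    """A simple, always-correct (but long) sorting network on n inputs."""
    pairs = []
    for rnd in range(n):
        for i in range(rnd % 2, n - 1, 2):
            pairs.append((i, i + 1))
    return pairs


def variable_style(pairs, names):
    """One Expr line per comparator: 'P Q dup1 dup1 min P^ max Q^'
    stores min(P, Q) in P and max(P, Q) in Q and leaves the stack empty."""
    return "\n".join(f"{names[i]} {names[j]} dup1 dup1 min {names[i]}^ max {names[j]}^"
                     for i, j in pairs)


def run_vars(program, variables):
    """Tiny evaluator for that style (assumes 'P^' means store and pop)."""
    env, stack = dict(variables), []
    for tok in program.split():
        if tok.endswith("^"):
            env[tok[:-1]] = stack.pop()
        elif tok.startswith("dup"):
            stack.append(stack[-1 - int(tok[3:] or 0)])
        elif tok in ("min", "max"):
            b, a = stack.pop(), stack.pop()
            stack.append(min(a, b) if tok == "min" else max(a, b))
        else:
            stack.append(env[tok])
    assert not stack, "each comparator line should leave the stack empty"
    return env


names = "ABCDE"
code = variable_style(odd_even_transposition(5), names)
print(code.splitlines()[0])                    # A B dup1 dup1 min A^ max B^
env = run_vars(code, dict(zip(names, [40, 10, 30, 50, 20])))
print([env[n] for n in names])                 # [10, 20, 30, 40, 50]
```

Because every line starts and ends with an empty stack, this style never drags elements along deeper in the stack; the trade-off, as discussed above, is the cost of storing and loading variables.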

Last edited by wonkey_monkey; 14th July 2021 at 21:07.
14th July 2021, 21:08   #120
wonkey_monkey
Made a couple of optimisations to the above post. I won't edit it further to avoid confusion.
Tags
avisynth, dogway, filters, hbd, packs

All times are GMT +1.