11th July 2021, 18:07  #101  Link 
Registered User
Join Date: Nov 2009
Posts: 2,339

I use sorting networks. Right now I'm porting Adaptive Median by VC Mohan (it's more or less done), going by the source file. The concept is easy but costly in terms of performance. In any case I'm doing this for the literature, so I can have all the algorithms and how they look in one place.
__________________
[i7-4790K@Stock::GTX 1070] AviSynth+ filters and mods on GitHub + Discussion thread
11th July 2021, 21:22  #103  Link 
Formerly davidh*****
Join Date: Jan 2004
Posts: 2,478

Fully optimised undot2 - 40% faster!
Code:
x[1,1] x[1,1] dup1 dup1 min swap2 max x[0,1] x[1,0] dup1 dup1 min swap2 max x[1,0] x[0,1] dup1 dup1 min swap2 max x[1,1] x[1,1] dup1 dup1 min swap2 max swap7 swap1 swap3 dup1 dup1 min swap2 max swap5 swap1 swap3 dup1 dup1 min swap2 max swap6 swap1 swap2 dup1 dup1 min swap2 max swap4 swap1 swap7 dup1 dup1 min swap2 max swap3 swap1 swap2 max swap6 dup1 dup1 min swap2 max swap4 swap1 swap5 dup1 dup1 min swap2 max swap3 swap1 swap2 min swap4 dup1 dup1 min swap2 max swap3 swap1 swap2 dup1 dup1 min swap2 max swap5 swap1 swap3 min swap2 swap1 swap3 max swap2 min swap2 max x swap2 swap1 clip
Last edited by wonkey_monkey; 11th July 2021 at 21:36.
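For readers following the stack juggling: the recurring `dup1 dup1 min swap2 max` run is a compare-exchange that replaces the top two stack values with their minimum and maximum, with the maximum left on top. The Python sketch below models only `dupN`, `swapN`, `min` and `max`, on the assumption that `dupN` copies the element N places below the top and `swapN` exchanges the top with the element N places below it. The `run` helper is hypothetical and is not part of Expr.

```python
import re

def run(program, stack=None):
    """Evaluate a tiny subset of Expr-style RPN on a Python list (top = last item)."""
    stack = list(stack or [])
    for tok in program.split():
        m = re.fullmatch(r"(dup|swap)(\d*)", tok)
        if m:
            # Bare "dup" means dup0, bare "swap" means swap1 (assumed defaults).
            n = int(m.group(2) or (0 if m.group(1) == "dup" else 1))
            if m.group(1) == "dup":
                stack.append(stack[-1 - n])      # copy the element n below the top
            else:
                stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]  # exchange with top
        elif tok in ("min", "max"):
            b, a = stack.pop(), stack.pop()
            stack.append(min(a, b) if tok == "min" else max(a, b))
        else:
            stack.append(float(tok))             # anything else is a numeric literal
    return stack

result = run("7 3 dup1 dup1 min swap2 max")
print(result)  # [3.0, 7.0]
```

Chaining such compare-exchanges in the order given by a sorting network, with `swapN` bringing the next pair to the top, is what the long expression above does.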
12th July 2021, 13:05  #104  Link 
Registered User
Join Date: Nov 2009
Posts: 2,339

I tested undot2 and got about an 8% performance increase, from 317fps (old dup1 dup1 syntax) to 343fps. VERY nice!
I'm trying to understand the new syntax. I think I've got it, but I have some issues. Why do you only compute max in the first swap3 line? Is this to save a comparison? Next you do a swap6, so you end up with A F on the stack (I name the elements to keep track) and take min, max of those, when it should be A C.
Stack state after the swap3 line: C F E G H D A B
As an exercise I'm doing the simple undot1:
Code:
"x[1,1] A^ x[0,1] B^ x[1,1] C^ x[1,0] D^ x[1,0] E^ x[1,1] F^ x[0,1] G^ x[1,1] H^ "
"x[0,0] A B min C min D min E min F min G min H min A B max C max D max E max F max G max H max clip"
Code:
x[0,0] x[1,1] x[0,1] x[1,1] x[1,0] x[1,0] x[1,1] x[0,1] x[1,1] swap7 swap1 swap6 dup1 dup1 min swap2 max swap5 dup swap2 min swap5 max swap1 swap3 dup swap2 max swap4 min dup1 min swap3 max dup1 max swap2 min swap2 dup1 max swap2 min swap2 dup1 max swap2 min swap clip
Last edited by Dogway; 12th July 2021 at 14:03.
12th July 2021, 18:17  #105  Link  
Formerly davidh*****
Join Date: Jan 2004
Posts: 2,478

Quote:
Quote:


12th July 2021, 21:06  #106  Link 
Formerly davidh*****
Join Date: Jan 2004
Posts: 2,478

Can you give this tool a try?
https://horman.net/expr_sort.php
Select a network from the dropdown, then choose which sorted elements you want (e.g. for undot2, elements 1 and 6), select an ordering for the selected elements to be left on the stack, then hit submit. Copy the contents of the left box and replace each [#] with an unsorted element (e.g. a pixel reference).
Last edited by wonkey_monkey; 12th July 2021 at 21:13.
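As a reference for what a generated undot2 expression should return, here is a plain Python model of the selection described above: sort the eight neighbours, take elements 1 and 6, and clamp the centre pixel between them. It assumes the element numbers are 0-based (so second lowest and second highest) and that undot2 clamps the centre to that range. The function name and sample values are made up for illustration; this is a sketch for checking outputs, not how Expr evaluates the network.

```python
def undot_rank(center, neighbours, lo_rank=1, hi_rank=6):
    """Clamp `center` between two order statistics of its 8 neighbours (0-based ranks)."""
    if len(neighbours) != 8:
        raise ValueError("expected the 8 neighbours of a 3x3 window")
    ordered = sorted(neighbours)
    lo, hi = ordered[lo_rank], ordered[hi_rank]
    return max(lo, min(center, hi))

neigh = [12, 200, 14, 13, 15, 11, 16, 10]
print(undot_rank(255, neigh))        # bright outlier centre is pulled down to rank 6
print(undot_rank(14, neigh))         # centre already inside the range is untouched
print(undot_rank(255, neigh, 0, 7))  # ranks 0 and 7 give the plain min/max clamp (undot1 style)
```

Comparing the Expr output against a model like this on a handful of windows is a quick way to catch a wrong element choice or ordering.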
13th July 2021, 09:06  #107  Link 
Registered User
Join Date: Nov 2009
Posts: 2,339

Wow, so awesome! It saves a lot of time, especially for 25 inputs. The problem is that, except for low input counts, it's slower! I do think there's more to it than meets the eye. Some reasoning follows.
I think there are some unwritten rules for optimizing Expr performance. I will try to list those that come to mind.
I need to set those in stone to decide which optimization route to take with ExTools. I think the problem with 25 inputs is number 4. Swapping might be a free operation, but performance suffers because Expr likes continuous operations on the stack. You can see this clearly with the example in my post above: simply moving "x" to the correct stack position increases performance by 10%. Writing "x" or "x[0,0]" makes no difference at all.
Code:
x[0,1] x[1,1] x[1,0] x[1,1] x[1,1] x[1,0] x[1,1] x[0,1] dup1 dup1 min swap2 max swap3 dup1 swap2 min swap3 max swap1 dup1 swap2 max swap2 min swap3 dup1 swap2 max swap3 min swap1 dup1 swap2 min swap2 max swap2 dup1 swap2 min swap2 max swap2 dup1 swap2 min swap2 max x swap2 swap1 clip Code:
old new
"x {th} > 255 scaleb x ?" > "x dup {th} > 255 scaleb swap1 swap2 ?"
old = 471 + 473 + 479 + 474 + 465 + 482 + 481 = 475 mean = 474 median = 1883 (470.75) 50%p
new = 471 + 469 + 486 + 481 + 464 + 468 + 485 = 474.85 mean = 471 median = 1872 (468) 50%p
Another example, each one stacked 3 times to augment the difference.
Code:
sat=0.2
Expr("", Format("x A^ 1 {sat} - A * {sat} range_max A - * + ")) # 297 298 297
Expr("", Format("1 {sat} - x dup swap2 * range_max swap1 swap2 - {sat} * +") ) # 294 299 293
So my suggestion for median5 (or long sorting networks) is to bring stack elements to the front from time to time (after 1 or 2 layers).
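To put the old/new timings quoted above in context, the fps samples can be summarised with Python's standard `statistics` module. This is only a sketch over the seven runs listed in the post; the variable names are illustrative.

```python
from statistics import mean, median, stdev

old = [471, 473, 479, 474, 465, 482, 481]   # fps, original expression
new = [471, 469, 486, 481, 464, 468, 485]   # fps, dup/swap rewrite

for name, runs in (("old", old), ("new", new)):
    print(f"{name}: mean={mean(runs):.2f} median={median(runs)} stdev={stdev(runs):.2f}")

# Difference of means, to be read against the run-to-run spread printed above.
delta = mean(new) - mean(old)
print(f"delta={delta:.2f} fps")
```

On these samples the difference between the means is smaller than the standard deviation of either set, so more runs would be needed before calling a winner for this particular pair.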
13th July 2021, 19:34  #109  Link 
Registered User
Join Date: Nov 2009
Posts: 2,339

Slower than this.
I tried this, but it didn't help; it needs some manual work.
Code:
mode == "median5"? "x[2,2] x[1,2] dup1 dup1 min swap2 max x[0,2] x[1,2] dup1 dup1 min swap2 max x[2,2] x[2,1] dup1 dup1 min swap2 max x[1,1] x[0,1] dup1 dup1 min swap2 max x[1,1] x[2,1] dup1 dup1 min swap2 max x[2,0] x[1,0] dup1 dup1 min swap2 max x[0,0] x[1,0] dup1 dup1 min swap2 max x[2,0] x[2,1] dup1 dup1 min swap2 max x[1,1] x[0,1] dup1 dup1 min swap2 max x[1,1] x[2,1] dup1 dup1 min swap2 max x[2,2] x[1,2] dup1 dup1 min swap2 max x[0,2] x[1,2] dup1 dup1 min swap2 max swap23 swap1 swap19 dup1 dup1 min swap2 max swap21 swap1 swap3 dup1 dup1 min swap2 max swap22 swap1 swap18 dup1 dup1 min swap2 max swap17 swap1 swap7 dup1 dup1 min swap2 max swap15 swap1 swap19 dup1 dup1 min swap2 max swap13 swap1 swap9 dup1 dup1 min swap2 max swap11 swap1 swap5 dup1 dup1 min swap2 max swap20 swap1 swap2 dup1 dup1 min swap2 max swap10 swap1 swap4 dup1 dup1 min swap2 max swap8 swap1 swap12 dup1 dup1 min swap2 max swap6 swap1 swap16 dup1 dup1 min swap2 max swap14 swap1 swap23 dup1 dup1 min swap2 max swap3 swap1 swap19 dup1 dup1 min swap2 max swap18 swap1 swap2 dup1 dup1 min swap2 max N@ P@ D@ O@ U@ E@ S@ L@ M@ R@ V@ J@ K@ X@ F@ Q@ C@ T@ G@ I@ Y@ A@ B@ H@ (...)
Last edited by Dogway; 13th July 2021 at 19:37.
13th July 2021, 20:33  #110  Link 
Formerly davidh*****
Join Date: Jan 2004
Posts: 2,478

I wonder if Expr's built-in spilling of values, when they don't fit on the stack, is slower than saving them to variables for some reason.
There is a fault somewhere in yours, though: I checked it with random pixel values and it didn't give the correct median.
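A check of the kind described here can be sketched in Python by running the comparator list of a network on test inputs and comparing against `sorted()`. The example uses the well-known 5-comparator network for 4 inputs as a stand-in, since the 25-input network itself is not reproduced here; `apply_network` and `first_failure` are hypothetical helper names.

```python
import itertools
import random

def apply_network(values, comparators):
    """Run compare-exchange pairs (i, j) in order so that v[i] <= v[j] afterwards."""
    v = list(values)
    for i, j in comparators:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def first_failure(comparators, n, trials=1000, seed=0):
    """Return an input the network fails to sort, or None if none was found."""
    # Exhaustive 0/1 inputs: by the zero-one principle these suffice for a full sort.
    for bits in itertools.product((0, 1), repeat=n):
        if apply_network(bits, comparators) != sorted(bits):
            return list(bits)
    # Random 8-bit "pixels" as a second, more readable check.
    rng = random.Random(seed)
    for _ in range(trials):
        px = [rng.randrange(256) for _ in range(n)]
        if apply_network(px, comparators) != sorted(px):
            return px
    return None

good = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]   # standard 5-comparator network for 4 inputs
bad = good[:-1]                                    # drop the last comparator on purpose

print(first_failure(good, 4))  # None
print(first_failure(bad, 4))   # a counterexample input
```

For a pruned median network that no longer sorts every wire, the same harness applies if only the output position that should hold the median is compared against `sorted(px)[n // 2]`.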
13th July 2021, 20:52  #111  Link 
Registered User
Join Date: Nov 2009
Posts: 2,339

Yes, I know. I revised it several times but didn't find the culprit. The same goes for undot1, as stated above. It might not be my fault, who knows.
I think it might also be a matter of dragging stack elements behind, as explained above with simple expressions. The following is also very slow; I guess popping them out fixes the issue:
Code:
x[2,2] x[1,2] dup1 dup1 min swap2 max x[0,2] x[1,2] dup1 dup1 min swap2 max x[2,2] x[2,1] dup1 dup1 min swap2 max x[1,1] x[0,1] dup1 dup1 min swap2 max x[1,1] x[2,1] dup1 dup1 min swap2 max x[2,0] x[1,0] dup1 dup1 min swap2 max x[0,0] x[1,0] dup1 dup1 min swap2 max x[2,0] x[2,1] dup1 dup1 min swap2 max x[1,1] x[0,1] dup1 dup1 min swap2 max x[1,1] x[2,1] dup1 dup1 min swap2 max x[2,2] x[1,2] dup1 dup1 min swap2 max x[0,2] x[1,2] dup1 dup1 min swap2 max N^ P^ D^ O^ U^ E^ S^ L^ M^ R^ V^ J^ K^ X^ F^ Q^ C^ T^ G^ I^ Y^ A^ B^ H^ M
Last edited by Dogway; 13th July 2021 at 20:55.
13th July 2021, 20:57  #112  Link 
Formerly davidh*****
Join Date: Jan 2004
Posts: 2,478

Mine (median5) is faster if you set optSingleMode = true. Yours does not benefit.
Unfortunately, knowing whether or not to switch it on is probably a black art and may not even be the same from computer to computer. Are you on x86 or x64?
Edit: actually, thinking about it, if it's faster on one computer it should be faster on any computer with the same architecture (x86 or x64). It's to do with how many registers are available. Yours gained some speed by keeping more registers free, but lost some due to storing and loading variables. I'm still a little surprised at how close they are, though. When I eventually write my new RPN compiler I plan on including automatic profiling so it should always make the correct choices for such features.
Last edited by wonkey_monkey; 13th July 2021 at 21:14.
13th July 2021, 23:02  #113  Link 
Registered User
Join Date: Nov 2009
Posts: 2,339

That one is actually outdated. I think this might be faster (if we ignore the output difference):
Code:
x[2,2] A^ x[1,2] B^ x[0,2] C^ x[1,2] D^ x[2,2] E^ x[2,1] F^ x[1,1] G^ x[0,1] H^ x[1,1] I^ x[2,1] J^ x[2,0] K^ x[1,0] L^ x[0,0] M^ x[1,0] N^ x[2,0] O^ x[2,1] P^ x[1,1] Q^ x[0,1] R^ x[1,1] S^ x[2,1] T^ x[2,2] U^ x[1,2] V^ x[0,2] X^ x[1,2] Y^ x[2,2] Z^ A C dup1 dup1 min AA^ max CC^ B I dup1 dup1 min BB^ max II^ D S dup1 dup1 min DD^ max SS^ E R dup1 dup1 min EE^ max RR^ F U dup1 dup1 min FF^ max UU^ G T dup1 dup1 min GG^ max TT^ H J dup1 dup1 min HH^ max JJ^ K L dup1 dup1 min KK^ max LL^ M N dup1 dup1 min MM^ max NN^ O Q dup1 dup1 min OO^ max QQ^ P X dup1 dup1 min PP^ max XX^ V Y dup1 dup1 min VV^ max YY^ AA DD dup1 dup1 min A^ max D^ BB PP dup1 dup1 min B^ max P^ CC SS dup1 dup1 min C^ max S^ EE MM dup1 dup1 min E^ max M^ FF VV dup1 dup1 min F^ max V^ GG KK dup1 dup1 min G^ max K^ HH OO dup1 dup1 min H^ max O^ II XX dup1 dup1 min I^ max X^ JJ QQ dup1 dup1 min J^ max Q^ LL TT dup1 dup1 min L^ max T^ NN RR dup1 dup1 min N^ max R^ UU YY dup1 dup1 min U^ max Y^ A E dup1 dup1 min AA^ max EE^ B H dup1 dup1 min BB^ max HH^ C N dup1 dup1 min CC^ max NN^ D M dup1 dup1 min DD^ max MM^ F G dup1 dup1 min FF^ max GG^ I O dup1 dup1 min II^ max OO^ J P dup1 dup1 min JJ^ max PP^ K V dup1 dup1 min KK^ max VV^ L U dup1 dup1 min LL^ max UU^ Q X dup1 dup1 min QQ^ max XX^ R S dup1 dup1 min RR^ max SS^ T Y dup1 dup1 min TT^ max YY^ AA FF dup1 dup1 min A^ max F^ CC LL dup1 dup1 min C^ max L^ DD GG dup1 dup1 min D^ max G^ EE KK dup1 dup1 min E^ max K^ HH QQ dup1 dup1 min H^ max Q^ II JJ dup1 dup1 min I^ max J^ MM VV dup1 dup1 min M^ max V^ NN TT dup1 dup1 min N^ max T^ OO PP dup1 dup1 min O^ max P^ RR UU dup1 dup1 min R^ max U^ SS YY dup1 dup1 min S^ max Y^ C H dup1 dup1 min CC^ max HH^ G J dup1 dup1 min GG^ max JJ^ I LL dup1 dup1 min II^ max L^ O Z dup1 dup1 min OO^ max ZZ^ S V dup1 dup1 min SS^ max VV^ D II dup1 dup1 min DD^ max I^ HH K dup1 dup1 min H^ max KK^ LL M dup1 dup1 min L^ max MM^ N OO dup1 dup1 min NN^ max O^ P VV dup1 dup1 min PP^ max V^ SS U dup1 dup1 min S^ max UU^ X ZZ dup1 
dup1 min XX^ max Z^ E NN dup1 dup1 min EE^ max N^ KK Q dup1 dup1 min K^ max QQ^ L PP dup1 dup1 min LL^ max P^ S Z dup1 dup1 min SS^ max ZZ^ T XX dup1 dup1 min TT^ max X^ BB EE dup1 dup1 min B^ max E^ I LL dup1 dup1 min II^ max L^ JJ TT dup1 dup1 min J^ max T^ N R dup1 dup1 min NN^ max RR^ O SS dup1 dup1 min OO^ max S^ QQ UU dup1 dup1 min Q^ max U^ Y ZZ dup1 dup1 min YY^ max Z^ A B dup1 dup1 min AA^ max BB^ E F dup1 dup1 min EE^ max FF^ GG NN dup1 dup1 min G^ max N^ J OO dup1 dup1 min JJ^ max O^ K RR dup1 dup1 min KK^ max R^ MM Q dup1 dup1 min M^ max QQ^ S T dup1 dup1 min SS^ max TT^ U V dup1 dup1 min UU^ max VV^ X YY dup1 dup1 min XX^ max Y^ CC G dup1 dup1 min C^ max GG^ DD EE dup1 dup1 min D^ max E^ FF N dup1 dup1 min F^ max NN^ H JJ dup1 dup1 min HH^ max J^ M SS dup1 dup1 min MM^ max S^ P R dup1 dup1 min PP^ max RR^ QQ TT dup1 dup1 min Q^ max T^ UU XX dup1 dup1 min U^ max X^ VV Y dup1 dup1 min V^ max YY^ BB C dup1 dup1 min B^ max CC^ F II dup1 dup1 min FF^ max I^ GG HH dup1 dup1 min G^ max H^ J KK dup1 dup1 min JJ^ max K^ L NN dup1 dup1 min LL^ max N^ O PP dup1 dup1 min OO^ max P^ RR U dup1 dup1 min R^ max UU^ V X dup1 dup1 min VV^ max XX^ B D dup1 dup1 min BB^ max DD^ CC E dup1 dup1 min C^ max EE^ FF G dup1 dup1 min F^ max GG^ H LL dup1 dup1 min HH^ max L^ I JJ dup1 dup1 min II^ max J^ K N dup1 dup1 min KK^ max NN^ MM OO dup1 dup1 min M^ max O^ P Q min PP^ R S min RR^ J M max MM^ KK L max LL^ NN O min N^ PP RR min P^ LL MM max M^ N P min NN^ M NN min MM^ MM
14th July 2021, 04:42  #115  Link 
Registered User
Join Date: Jan 2018
Posts: 2,085

I think the output needs to stay the same, because it will affect whole scripts when you just replace some functions. For example, you replaced blur(.6) with RemoveGrain(1) in Framerateconverter.avsi only because it's faster, but MysteryX said it produces artifacts when the mask is larger.
Last edited by kedautinh12; 14th July 2021 at 05:42. 
14th July 2021, 10:47  #116  Link 
Registered User
Join Date: Nov 2009
Posts: 2,339

I just removed unnecessary checks for the last 2 layers. Anyway, I will revise it one more time, this time with regex so I don't make any mistakes.
Today will mainly be profiling and optimizing; if I'm lucky I can release this evening.
@kedautinh12: Yes, these are only drafts. I still haven't updated the MIX/EX mods to the latest ExTools. It turned out that removegrain(12) or blur is not a true Gaussian but a weighted mean. blur(0.6) was implemented recently as a bare convolution. I also want to optimize Expr with the latest tricks, but will most likely leave it untouched, as shown here.
EDIT: By the way, undot was:
Code:
x[0,1] x[1,1] x[1,0] x[1,1] x[1,1] x[1,0] x[1,1] x[0,1] dup1 dup1 min swap2 max swap3 dup swap2 min swap3 max swap1 dup swap2 max swap2 min swap3 dup swap2 max swap3 min swap1 dup swap2 min swap2 max swap2 dup swap2 min swap2 max swap2 dup swap2 min swap2 max x swap2 swap1 clip
Last edited by Dogway; 14th July 2021 at 15:15.
14th July 2021, 20:18  #117  Link 
Formerly davidh*****
Join Date: Jan 2004
Posts: 2,478

Can you check if this is faster for median5? It was for me. It uses the second 25-item network instead of the first, which is apparently more Expr-friendly:
Code:
Expr(optsinglemode = true, " x[2,2] x[1,2] dup1 dup1 min swap2 max x[0,2] x[1,2] dup1 dup1 min swap2 max x[2,2] x[2,1] dup1 dup1 min swap2 max x[1,1] x[0,1] dup1 dup1 min swap2 max x[1,1] x[2,1] dup1 dup1 min swap2 max x[2,0] x[1,0] dup1 dup1 min swap2 max x[0,0] x[1,0] dup1 dup1 min swap2 max x[2,0] x[2,1] dup1 dup1 min swap2 max x[1,1] x[0,1] dup1 dup1 min swap2 max x[1,1] x[2,1] dup1 dup1 min swap2 max x[2,2] x[1,2] dup1 dup1 min swap2 max x[0,2] x[1,2] dup1 dup1 min swap2 max swap23 swap1 swap19 dup1 dup1 min swap2 max swap21 swap1 swap3 dup1 dup1 min swap2 max swap22 swap1 swap18 dup1 dup1 min swap2 max swap17 swap1 swap7 dup1 dup1 min swap2 max swap15 swap1 swap19 dup1 dup1 min swap2 max swap13 swap1 swap9 dup1 dup1 min swap2 max swap11 swap1 swap5 dup1 dup1 min swap2 max swap20 swap1 swap2 dup1 dup1 min swap2 max swap10 swap1 swap4 dup1 dup1 min swap2 max swap8 swap1 swap12 dup1 dup1 min swap2 max swap6 swap1 swap16 dup1 dup1 min swap2 max swap14 swap1 swap23 dup1 dup1 min swap2 max swap3 swap1 swap19 dup1 dup1 min swap2 max swap18 swap1 swap2 dup1 dup1 min swap2 max swap7 swap1 swap23 dup1 dup1 min swap2 max swap21 swap1 swap15 dup1 dup1 min swap2 max swap9 swap1 swap5 dup1 dup1 min swap2 max swap4 swap1 swap20 dup1 dup1 min swap2 max swap12 swap1 swap22 dup1 dup1 min swap2 max swap11 swap1 swap13 dup1 dup1 min swap2 max swap16 swap1 swap19 dup1 dup1 min swap2 max swap8 swap1 swap10 dup1 dup1 min swap2 max swap14 swap1 swap17 dup1 dup1 min swap2 max swap6 swap1 swap3 dup1 dup1 min swap2 max swap2 swap1 swap20 max swap14 swap1 swap9 dup1 dup1 min swap2 max swap4 swap1 swap3 dup1 dup1 min swap2 max swap17 swap1 swap18 dup1 dup1 min swap2 max swap21 swap1 swap12 dup1 dup1 min swap2 max swap8 swap1 swap15 dup1 dup1 min swap2 max swap20 swap1 swap19 dup1 dup1 min swap2 max swap11 swap1 swap10 dup1 dup1 min swap2 max swap2 swap1 swap7 dup1 dup1 min swap2 max swap5 swap1 swap9 dup1 dup1 min swap2 max swap13 x[2,2] dup1 dup1 min swap2 max swap7 swap1 swap20 dup1 
dup1 min swap2 max swap16 swap1 swap22 dup1 dup1 min swap2 max swap9 swap1 swap11 dup1 dup1 min swap2 max swap5 swap1 swap20 dup1 dup1 min swap2 max swap2 swap1 swap21 dup1 dup1 min swap2 max swap12 swap1 swap6 dup1 dup1 min swap2 max swap14 swap1 swap7 min swap22 swap1 swap20 dup1 dup1 min swap2 max swap18 swap1 swap10 max swap11 swap1 swap18 dup1 dup1 min swap2 max swap13 swap1 swap20 dup1 dup1 min swap2 max swap16 swap1 swap3 dup1 dup1 min swap2 max swap7 swap1 swap18 dup1 dup1 min swap2 max swap6 swap1 swap4 dup1 dup1 min swap2 max swap15 swap1 swap19 dup1 dup1 min swap2 max swap2 swap1 swap3 max swap15 swap1 swap7 dup1 dup1 min swap2 max swap16 swap1 swap18 dup1 dup1 min swap2 max swap5 swap1 swap9 dup1 dup1 min swap2 max swap2 swap1 swap4 dup1 dup1 min swap2 max swap14 swap1 swap20 dup1 dup1 min swap2 max swap11 swap1 swap7 dup1 dup1 min swap2 max swap8 swap1 swap15 max swap16 swap1 swap4 dup1 dup1 min swap2 max swap11 swap1 swap8 dup1 dup1 min swap2 max swap2 swap1 swap15 dup1 dup1 min swap2 max swap12 swap1 swap19 dup1 dup1 min swap2 max swap3 swap1 swap14 dup1 dup1 min swap2 max swap6 swap1 swap13 dup1 dup1 min swap2 max swap7 swap1 swap10 min swap17 swap1 swap14 max swap14 swap1 swap6 dup1 dup1 min swap2 max swap15 swap1 swap12 dup1 dup1 min swap2 max swap7 swap1 swap17 dup1 dup1 min swap2 max swap3 swap1 swap9 dup1 dup1 min swap2 max swap11 swap1 swap8 dup1 dup1 min swap2 max swap6 swap1 swap10 dup1 dup1 min swap2 max swap5 swap1 swap16 min swap3 min swap10 swap1 swap7 max swap12 swap1 swap6 dup1 dup1 min swap2 max swap14 swap1 swap11 max swap9 swap1 swap3 dup1 dup1 min swap2 max swap4 swap1 swap12 dup1 dup1 min swap2 max swap7 swap1 swap3 min swap4 swap1 swap7 min swap8 swap1 swap7 max swap8 swap1 swap9 max swap9 swap1 swap8 dup1 dup1 min swap2 max swap5 swap1 swap2 dup1 dup1 min swap2 max swap8 swap1 swap4 min swap2 swap1 swap5 min swap5 swap1 swap7 max swap3 max swap5 min swap3 swap1 swap2 min swap3 max swap2 min min ") Last edited by wonkey_monkey; 
14th July 2021 at 20:22. 
14th July 2021, 20:40  #118  Link 
Registered User
Join Date: Nov 2009
Posts: 2,339

It runs at 74fps, more or less like my post above, but I went and optimized it further and now it runs at 87fps.
I have already profiled all the modes and they should be running at optimal speeds; I'm not sure it's possible to squeeze more out of it. For ex_edge I don't think I will be doing anything, as they run pretty fast already. I tried some dup/swap rewrites in functions like ex_merge, ex_makeadddiff and so on, but I don't think they improve performance; if anything it feels slightly slower (minimal, but still). For ex_binarize smooth mode it was a tie. I'm not sure what to think, but I left it updated with the new syntax. Tomorrow I will profile GradePack (ex_contrast and ex_levels). By the way, for anyone reading: I rebased the MIX/EX mods onto the latest ExTools.
14th July 2021, 20:56  #119  Link 
Formerly davidh*****
Join Date: Jan 2004
Posts: 2,478

Try this. I got 25% more out of it than my previous one (note that it does NOT use OptSingleMode):
Code:
Expr(" x[2,2] x[2,1] dup1 dup1 min A^ max C^ x[2,0] x[2,1] dup1 dup1 min B^ max I^ x[2,2] x[1,2] dup1 dup1 min D^ max S^ x[1,1] x[1,0] dup1 dup1 min E^ max R^ x[1,1] x[1,2] dup1 dup1 min F^ max U^ x[0,2] x[0,1] dup1 dup1 min G^ max T^ x[0,0] x[0,1] dup1 dup1 min H^ max J^ x[0,2] x[1,2] dup1 dup1 min K^ max L^ x[1,1] x[1,0] dup1 dup1 min M^ max N^ x[1,1] x[1,2] dup1 dup1 min O^ max Q^ x[2,2] x[2,1] dup1 dup1 min P^ max W^ x[2,0] x[2,1] dup1 dup1 min V^ max X^ A D dup1 dup1 min A^ max D^ B P dup1 dup1 min B^ max P^ C S dup1 dup1 min C^ max S^ E M dup1 dup1 min E^ max M^ F V dup1 dup1 min F^ max V^ G K dup1 dup1 min G^ max K^ H O dup1 dup1 min H^ max O^ I W dup1 dup1 min I^ max W^ J Q dup1 dup1 min J^ max Q^ L T dup1 dup1 min L^ max T^ N R dup1 dup1 min N^ max R^ U X dup1 dup1 min U^ max X^ A E dup1 dup1 min A^ max E^ B H dup1 dup1 min B^ max H^ C N dup1 dup1 min C^ max N^ D M dup1 dup1 min D^ max M^ F G dup1 dup1 min F^ max G^ I O dup1 dup1 min I^ max O^ J P dup1 dup1 min J^ max P^ K V dup1 dup1 min K^ max V^ L U dup1 dup1 min L^ max U^ Q W dup1 dup1 min Q^ max W^ R S dup1 dup1 min R^ max S^ T X dup1 dup1 min T^ max X^ A F max F^ C L dup1 dup1 min C^ max L^ D G dup1 dup1 min D^ max G^ E K dup1 dup1 min E^ max K^ H Q dup1 dup1 min H^ max Q^ I J dup1 dup1 min I^ max J^ M V dup1 dup1 min M^ max V^ N T dup1 dup1 min N^ max T^ O P dup1 dup1 min O^ max P^ R U dup1 dup1 min R^ max U^ S X dup1 dup1 min S^ max X^ C H dup1 dup1 min C^ max H^ G J dup1 dup1 min G^ max J^ I L dup1 dup1 min I^ max L^ O x[2,2] dup1 dup1 min O^ max Y^ S V dup1 dup1 min S^ max V^ D I max I^ H K dup1 dup1 min H^ max K^ L M dup1 dup1 min L^ max M^ N O dup1 dup1 min N^ max O^ P V dup1 dup1 min P^ max V^ S U dup1 dup1 min S^ max U^ W Y dup1 dup1 min W^ max Y^ E N dup1 dup1 min E^ max N^ K Q dup1 dup1 min K^ max Q^ L P dup1 dup1 min L^ max P^ S Y dup1 dup1 min S^ max Y^ T W dup1 dup1 min T^ max W^ B E max E^ I L dup1 dup1 min I^ max L^ J T dup1 dup1 min J^ max T^ N R dup1 dup1 min N^ max R^ O S dup1 dup1 
min O^ max S^ Q U dup1 dup1 min Q^ max U^ X Y min X^ E F max F^ G N dup1 dup1 min G^ max N^ J O dup1 dup1 min J^ max O^ K R dup1 dup1 min K^ max R^ M Q dup1 dup1 min M^ max Q^ S T dup1 dup1 min S^ max T^ U V min U^ W X min W^ C G max G^ F N dup1 dup1 min F^ max N^ H J dup1 dup1 min H^ max J^ M S dup1 dup1 min M^ max S^ P R dup1 dup1 min P^ max R^ Q T min Q^ U W min U^ F I max I^ G H max H^ J K dup1 dup1 min J^ max K^ L N dup1 dup1 min L^ max N^ O P dup1 dup1 min O^ max P^ R U min R^ H L max L^ I J max J^ K N dup1 dup1 min K^ max N^ M O dup1 dup1 min M^ max O^ P Q min P^ R S min R^ J M max M^ K L max L^ N O min N^ P R min P^ L M max N P min min ") Last edited by wonkey_monkey; 14th July 2021 at 21:07. 
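The variable-per-wire style used in this last expression is regular enough to generate from a comparator list. The Python sketch below is one hypothetical way to do that, not the generator behind the tool linked earlier: `network_to_expr` is an invented name, wires are named A to Z, and every comparator is emitted in full as `lo hi dup1 dup1 min lo^ max hi^`.

```python
import string

def network_to_expr(inputs, comparators):
    """Emit an Expr-style RPN string that sorts `inputs` using one variable per wire."""
    if len(inputs) > len(string.ascii_uppercase):
        raise ValueError("this sketch only names up to 26 wires")
    names = string.ascii_uppercase[:len(inputs)]
    # Load each input and store it in its wire variable.
    parts = [f"{src} {name}^" for src, name in zip(inputs, names)]
    for i, j in comparators:
        lo, hi = names[i], names[j]
        # Push both wires, then write min back to the low wire and max to the high wire.
        parts.append(f"{lo} {hi} dup1 dup1 min {lo}^ max {hi}^")
    return " ".join(parts)

# Median of three clips x, y, z: sort the three wires, then leave the middle one on the stack.
expr = network_to_expr(["x", "y", "z"], [(0, 1), (1, 2), (0, 1)]) + " B"
print(expr)
```

The hand-tuned expression above goes further: it loads pixels straight into the first layer of comparators and drops whichever half of a comparator is never read again (the `A F max F^` runs), which this sketch does not attempt.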