16th February 2010, 10:53 | #1 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
SEt's Avisynth 2.5.8 MT compiled for *X86_64*, Latest Build 4/16/2010
Edit: Quicklinks Updated 4/26/2010
Featured Release 4/16/2010
+ Resize artifacting fixed
+ Horizontal resize code rewritten to use SSE registers
+ Worth noting: frequently used functions (TemporalSoften, Merge, etc.) have been tweaked for a decent speed gain
+ Bug found and fixed in the memory copy routine, again
+ Universal binary: no need to distinguish between AMD and Intel builds any more
+ Optimized BitBlt memory copy routine
+ Started implementing SSE3/SSE4-specific instructions, used when a supporting processor is detected
+ Removed most code paths intended to support CPUs without MMX/iSSE
+ Resize functions reworked to take advantage of the extra registers available when the processor is in 64bit mode
Avisynth64 binary and installer built on 4/16/2010
Many of the plugins have been optimized and recompiled; please get the latest and greatest versions with this release.

Service update on 3/19/2010
+ Minor fixes to allow better usage of MT modes
+ Code tweaks for small performance increases all around
+ Fixed resize bug
Use this build for Intel processors
Use this build for AMD processors

Version from 3/15/2010: 64 bit Avisynth 2.5.8 w/multithreading

------------------------------------------------
Plugins (Alphabetically, for now)
------------------------------------------------
New on 3/21/2010: AddGrainC x64
Built on 3/14/2010: AutoCrop x64
New on 3/21/2010: aWarpSharp x64. This version is based on SEt's original rewrite found in this thread.
Built on 3/20/2010: ColorMatrix x64
New on 3/13/2010: DFTTest x64 --> needs the included libfftw3f-3.dll to be in your system32 directory
Built on 3/19/2010: DGDecode 1.5.8 x64. Note: it is missing some IDCT modes; I will get them back ASAP.
Built on 4/10/2010: EEDI2 x64
+ Vectorized main loop
+ Further restructured the main loop to minimize branching; the speed increase depends on the processor
FieldHint x64
Built on 3/12/2010: FFT3DFilter --> needs the included libfftw3f-3.dll to be in your system32 directory
Built on 3/15/2010: FFT3DGPU x64. Note: the HLSL (shader program) file is edited from the original to adhere to pixel shader 3.0 syntax rules. Please make sure to place the correct file in the same directory as the 64bit plugin.
New on 3/13/2010: kemuri-_9's FFMS2 (The Fabulous FM Source 2). Big thanks to kemuri-_9 for the build.
Built on 3/29/2010: GradFun2DB x64
Built on 4/08/2010: hqdn3d x64
Built on 3/14/2010: LeakKernelDeint x64
Built on 3/12/2010: MaskTools 2.0a41
+ Now straight from Manao's source
Built on 3/31/2010: MVTools2 x64
+ Continued conversion and updating of assembly functions
+ Removal of some code intended to support processors without MMX/iSSE
+ Converted frequently used assembly functions to SSE2 instead of MMX/iSSE
+ Updated to the latest shared function library from x264
+ Healthy 20%+ speed increase over the x86 version in most cases
New on 3/14/2010: TDeint x64. This is basically the same as squid80's build, the main differences being a newer avisynth.h and a newer compiler.
TelecideHints x64
Built on 3/13/2010: TIVTC x64
New on 3/20/2010: TNLMeans_x64 v1.0.3
New on 3/20/2010: TTempSmooth x64 v0.9.4
RemoveGrain x64
Repair x64
VerticalCleaner x64
Visit Squid80's homepage for more x64 plugins.

Benchmarking Suggestions
Here is a 64bit avs2avi for benchmarking. You can run it against the original. To simply run the script through Avisynth, execute the following at a command prompt:
Code:
avs2avi64.exe <path:\script.avs> -o n

The same should be done with avs2avi.exe. This takes two factors out of the speed equation: 64bit vs 32bit compressors, and hard disk write speed. The final fps reports from both runs allow a fairly apples-to-apples comparison of the two builds.

Source Code
For those who are interested: the source is now hosted over at Google Code, and I'll keep it as up to date as I can. The source is constantly in flux. This wiki page has all of the plugins linked as well. The source to any of the plugins I have personally modified is available upon request; please message me if interested.
Last edited by JoshyD; 27th April 2010 at 03:05. |
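To boil the two final fps reports down to one number, a percentage change is the usual summary. A minimal C++ sketch follows; the function name percent_change is hypothetical, and the inputs are meant to be whatever fps values avs2avi.exe and avs2avi64.exe print for the same script.

```cpp
#include <cassert>

// Relative throughput change between two benchmark runs of the same script,
// e.g. the final fps printed by avs2avi.exe (32-bit) and avs2avi64.exe (64-bit).
// Returns a percentage: positive means the 64-bit run was faster.
double percent_change(double fps_x86, double fps_x64) {
    assert(fps_x86 > 0.0);  // a zero or negative baseline makes no sense
    return (fps_x64 - fps_x86) / fps_x86 * 100.0;
}
```

For example, 20 fps on the 32-bit run and 30 fps on the 64-bit run works out to a 50% increase, while a "20%+" gain means the 64-bit run reached at least 1.2 times the 32-bit fps.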
16th February 2010, 20:06 | #3 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
Hmm, it appears that even when you link all your libs statically, Intel's compiler still links the OpenMP libs dynamically. I'll get a rebuild up ASAP. From the ICC forums:
"The 11.0 Windows compilers (both C++ and Fortran) have decoupled /MT from having any effect on which OpenMP runtime (static or dynamic) gets linked. In fact, all of /MT[d], /MD[d], and /ML[d] (latter VS2003 only) now only effect which MS C runtime is linked. We made this change because we want dynamic to be the default when linking the OpenMP RTL. The use of static OpenMP libraries is not recommended, because they might cause multiple libraries to be linked in an application. The condition is not supported and could lead to unpredictable results. It can also cause Thread Checker false positives and other problems with the Intel Threading tools. If you want to link against the static OpenMP RTL, you must add /Qopenmp-link:static, which is a new switch for 11.0. So to produce a purely static executable, compile/link with /MT /Qopenmp-link:static Patrick Kennedy Intel Compiler Lab"
I didn't build the project with OpenMP libs enabled, but I did allow the compiler to auto-parallelize the loops it could. Perhaps this is the issue. Last edited by JoshyD; 16th February 2010 at 20:23. |
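To make the linking issue concrete, here is a minimal C++ sketch of the kind of loop that ends up depending on the OpenMP runtime. The function invert_plane is hypothetical. Built with ICC's /Qopenmp (or GCC's -fopenmp), the pragma splits the rows across threads; per the quote above, ICC 11.0 then links the OpenMP runtime dynamically unless /Qopenmp-link:static is added. My understanding is that ICC's auto-parallelizer (/Qparallel) calls into the same runtime, which would explain the dependency, but treat that as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-row filter: inverts every pixel of an 8-bit plane.
// With OpenMP enabled, the rows are divided among threads and the binary
// depends on the OpenMP runtime library. Without OpenMP, the pragma is
// ignored and the loop simply runs serially with the same result.
void invert_plane(std::vector<std::uint8_t>& plane, int width, int height) {
    assert(plane.size() == static_cast<std::size_t>(width) * height);
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        std::uint8_t* row = plane.data() + static_cast<std::size_t>(y) * width;
        for (int x = 0; x < width; ++x)
            row[x] = static_cast<std::uint8_t>(255 - row[x]);
    }
}
```

Because each row is independent, the threads never write to the same memory, which is what makes this loop safe to parallelize at all.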
17th February 2010, 00:48 | #4 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
There is support for SetMTMode and its functions, but not MT("command") as of now. The whole MT function is contained in an entirely separate DLL, loaded from your plugins dir automatically or manually at the beginning of the script.
The RAR does contain a copy of DirectShowSource for use with AviSynth. Any program that can open .avs files AND is already a 64 bit executable should do the trick. VirtualDub64 comes to mind, as well as Media Player Classic Home Cinema 64. 64 bit builds of x264 that specifically have the ability to open .avs files should work as well. I have personally tested VirtualDub64; it has played back all of the little test cases I've had a chance to run without any complaint. For some 64bit plugins that MAY work with this DLL, please check out Squid80's prior work. He was the guy who ported Avisynth 2.5.5 to x64 a while back, without the nicety of a compiler that supports inline asm. Other than that, the top two plugins that I want to see working in 64bit are MVTools2 and FFT3DFilter. I get a lot of mileage out of those two projects. If there's enough interest, I'm considering going back and optimizing all the ASM routines to take advantage of architectural changes that have occurred over the past 5 or so years. Some of these assembly routines are long in the tooth and could (I think) be better tuned for newer processors. Anyhow, thanks for the post, and keep checking back for updates. |
17th February 2010, 00:54 | #5 | Link |
Registered User
Join Date: Nov 2009
Posts: 327
|
What I meant was, could you modify your avisynth64 release to use a separate directory for autoloading 64-bit plugins, to avoid name conflicts and similar issues? As an aside, having mvtools2 + MaskTools2 would allow a lot of script functions to work automatically.
Last edited by Stephen R. Savage; 17th February 2010 at 00:56. |
17th February 2010, 01:33 | #6 | Link | |
Registered User
Join Date: Feb 2010
Posts: 84
|
Quote:
|
|
17th February 2010, 03:50 | #7 | Link | |
Registered User
Join Date: Nov 2009
Posts: 327
|
Update: I copied avisynth.dll and devil.dll to system32 on a Windows 2008 R2 setup, after which I imported avisynth.reg. I created a test script with the code "BlankClip().ConvertToYV12()". It does not load in either VirtualDub64 or x264. Both crash upon exit. VirtualDub64 gives an error message "AVI Import Filter error: (Unknown) 80040154".
Quote:
Last edited by Stephen R. Savage; 17th February 2010 at 04:02. |
|
17th February 2010, 05:02 | #8 | Link | |
Registered User
Join Date: Feb 2010
Posts: 84
|
Quote:
I'm not sure which snapshot of the binary you may have grabbed, but along the way I realized I had the compile flags wrong. I was only generating code for a Core i3/i5/i7. This caused my Penryn laptop to die when trying to load a clip. With the latest file up there, my Penryn runs it without a problem. Just to be doubly safe, try this build of the Avisynth binary; it only has the SSE2 code path enabled. If that doesn't work, we'll come up with a solution. I'm a bit hazy on how Windows associates registry keys with filter types; if anyone else has some pointers, that'd be great too.
|
17th February 2010, 05:16 | #9 | Link |
Registered User
Join Date: Nov 2009
Posts: 327
|
The copy you just linked to depends on libiomp5md.dll, so I can't run it. Incidentally, my Merom CPU supports flags up to SSSE3, but not SSE4.1 like the higher-end Penryns. Perhaps you could use runtime CPU detection instead of requiring a specific instruction set compatibility?
Edit: Success! The build in the topic post of this thread now works. I guess it must have been silently updated.
Edit2: DirectShowSource and Spline36Resize work. However, scripts in my Avisynth32 plugins directory don't seem to be autoimported.
Edit3: TDeint64 built by Squid80 works fine. RemoveGrain64 by Kassandro does not (unknown exception).
Edit4: EEDI2_64 by Squid80 does not work. I believe this is because it is also statically linked to OpenMP, which, according to a Google result on Intel's webpage, can cause conflicts.
Edit5: ConvertToRGB32 does not work. It always errors with the message "Rec.709 and PC Levels support require MMX and horizontal width a multiple of 4" regardless of settings/input.
Last edited by Stephen R. Savage; 17th February 2010 at 05:49. |
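Runtime CPU detection, as suggested above, usually means querying CPUID once and choosing a function pointer. Below is a minimal C++ sketch under stated assumptions: the kernel (a sum of absolute differences) and all function names are hypothetical, and the "SSE4.1" variant is only a stand-in that reuses the scalar loop. The feature bit itself is real: CPUID leaf 1 reports SSE4.1 in ECX bit 19.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define CPUID_MSVC 1
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define CPUID_GNU 1
#endif

// True if CPUID leaf 1 reports SSE4.1 support (ECX bit 19).
static bool cpu_has_sse41() {
#if defined(CPUID_MSVC)
    int info[4];
    __cpuid(info, 1);
    return (info[2] & (1 << 19)) != 0;
#elif defined(CPUID_GNU)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & (1u << 19)) != 0;
#else
    return false;  // unknown compiler/architecture: take the safe path
#endif
}

// Signature shared by every variant of the (hypothetical) kernel.
using SadFn = unsigned (*)(const std::uint8_t*, const std::uint8_t*, std::size_t);

// Baseline path that runs on any CPU: sum of absolute differences.
static unsigned sad_generic(const std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    unsigned sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
    return sum;
}

// Hypothetical stand-in for an SSE4.1-tuned kernel. A real build would put
// SSE4.1 code here, compiled for that target; this sketch reuses the scalar
// loop so that it runs everywhere.
static unsigned sad_sse41(const std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    return sad_generic(a, b, n);
}

// Decide once which path this machine gets.
SadFn select_sad() {
    return cpu_has_sse41() ? sad_sse41 : sad_generic;
}
```

Calling select_sad() once at plugin load and storing the pointer keeps the CPUID check out of the per-frame path. A Merom like the one mentioned above (SSSE3 but no SSE4.1) would simply get the generic path instead of crashing on an unsupported instruction.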
17th February 2010, 06:10 | #10 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
EDIT:
You were right about EEDI2 using OpenMP and being dynamically linked to the libraries. I just recompiled the source with the OpenMP libraries statically linked: EEDI2 64bit Multithreaded
Also, where did you run across RemoveGrain64? I'd like a copy (and the source if possible) so I can figure out what exception it's throwing. The only thing I can say for sure is that the RGB conversion error is hard-coded in there for now; there was a decent amount of assembly involved in converting that routine, and I just plain didn't feel like it. I'll get back around to it; it's on my to-do list as well. I'll start looking into those other two plugins. I'm working on MVTools at the moment . . . that would be a huge win. The other issue with the code paths was that I was basically telling ICC to target just my host CPU. When I tried to get it going on any other computer, I realized my mistake. The current DLL for download has code paths for all EM64T-supporting Intel processors. Once again, I don't know whether ICC cripples AMD chips or not . . . Last edited by JoshyD; 17th February 2010 at 06:58. |
17th February 2010, 20:51 | #11 | Link |
typo lover
Join Date: May 2009
Posts: 595
|
Hi, JoshyD
Thanks for the static build version. It seems to work without trouble so far. I did some benchmarks; the results are here. I think a faster 64bit decoder is necessary for me... Last edited by Chikuzen; 2nd March 2010 at 13:39. |
17th February 2010, 21:22 | #12 | Link |
Registered User
Join Date: Nov 2009
Posts: 327
|
Kassandro posted his RemoveGrain64 on his own forum:
http://videoprocessing.11.forumer.co...2cd20f1c9425d0
Edit: Your rebuilt EEDI2() causes artifacts in chroma.
Edit2: Seems to be a pitch error. Doing a TurnLeft/TurnRight works around it.
Last edited by Stephen R. Savage; 17th February 2010 at 21:35. |
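Edit2 above blames a pitch error. In AviSynth each plane has its own pitch (stride), which can be larger than the plane's visible row size because of alignment padding, so rows must be addressed as base + y * pitch; code that assumes pitch == width tends to break on the smaller chroma planes first. The sketch below is a hypothetical standalone helper that shows the correct addressing. Its parameter order mirrors AviSynth's IScriptEnvironment::BitBlt (dstp, dst_pitch, srcp, src_pitch, row_size, height), but it is not that function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies `height` rows of `row_size` bytes between two planes whose rows
// start `src_pitch` / `dst_pitch` bytes apart. The padding bytes between
// row_size and pitch are left untouched in the destination.
void copy_plane(std::uint8_t* dst, int dst_pitch,
                const std::uint8_t* src, int src_pitch,
                int row_size, int height) {
    // This sketch only handles top-down planes (positive pitches).
    assert(row_size >= 0 && row_size <= dst_pitch && row_size <= src_pitch);
    for (int y = 0; y < height; ++y) {
        std::memcpy(dst, src, static_cast<std::size_t>(row_size));
        dst += dst_pitch;  // the next row starts one pitch later, not one row_size later
        src += src_pitch;
    }
}
```

With a source pitch of 8 and a visible width of 3, stepping by the width instead of the pitch would read padding bytes as picture data from the second row onward, which shows up as garbage rather than a crash.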
18th February 2010, 21:35 | #13 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
Thanks for the benchmarks! I don't think we're going to see any speed increases with 64bit code unless the assembly is reworked to take advantage of the extra registers and fewer memory accesses.
Any resizers you run through there are essentially just using their old 32 bit counterparts. The size of the pointers changes, but the register usage at the CPU level is still the same, because it was specified explicitly. I'm going to keep working to see if I can eke out some extra performance; mainly, I want to see if I can give some of the more demanding plugins a speed boost. I'm not sure what's wrong with EEDI2; the source was straight compiled again. |
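As a rough illustration of the register point, here is a C++ sketch that uses SSE2 intrinsics instead of inline assembly (which 64-bit MSVC does not support at all). Written this way, the compiler does the register allocation and can use all 16 XMM registers available in 64-bit mode, rather than the 8 available in 32-bit mode that hand-written routines were built around. The function average_rows and the HAVE_SSE2 macro are hypothetical; a rounded 50/50 average is used only because it maps onto a single SSE2 instruction, _mm_avg_epu8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// Averages two rows of 8-bit pixels with rounding, (a + b + 1) / 2.
// The SSE2 path handles 16 pixels per instruction in XMM registers;
// the scalar loop finishes whatever is left (or everything, without SSE2).
void average_rows(std::uint8_t* dst, const std::uint8_t* a,
                  const std::uint8_t* b, std::size_t n) {
    assert((dst != nullptr && a != nullptr && b != nullptr) || n == 0);
    std::size_t i = 0;
#ifdef HAVE_SSE2
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_avg_epu8(va, vb));
    }
#endif
    for (; i < n; ++i)
        dst[i] = static_cast<std::uint8_t>((a[i] + b[i] + 1) >> 1);
}
```

Since every x86-64 CPU has SSE2, a 64-bit build can take this path unconditionally; runtime detection only matters for newer extensions such as SSE4.1.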
19th February 2010, 01:05 | #14 | Link |
Registered User
Join Date: Nov 2009
Posts: 327
|
Perhaps the bug was always in the source code. Nevertheless, if you have experience in Avisynth plugin development, perhaps you could squash it for us? Please?
EEDI2 would normally not be anywhere near the top of my priorities, but without nnedi2, it's pretty much a necessary step to get TempGaussMC working on avs64 (the other steps being removegrain64, mvtools2_64, and masktools2_64). |
19th February 2010, 12:06 | #15 | Link | |
Registered User
Join Date: Feb 2010
Posts: 84
|
Quote:
Try this out and let me know if it produces consistent results. I DO have a somewhat working MVTools2. As far as I have tested it, the "important" functions are working. A TON of the ASM has been re-coded to follow the x64 C++ function calling convention. I did it by hand, which means there's probably a decent chance you'll crash it. I also had to update the parts borrowed from other projects (xvid, x264, fftw), so those are a little "rough" at the moment. My test cases mainly focused on motion vector generation and the degraining functions. Perhaps someone else will be able to fault it in other places, giving me a chance to find and fix the bugs. Here's the link to MVTools2 x64. Personally, I see a significant performance increase (from ~20fps x86 to ~30fps x64, using multithreading in both cases) when just writing out a raw stream. Try it out, and let me know where the problems are. This is a little sample of what I've been using to mess around with the parameters. You can get it to go through a surprisingly large number of code paths just by varying the inputs to the different degrain functions.
Code:
#MVTools x64
SetMTMode(2,4) #could be more, my system has four logical threads, but in certain instances more increase my encoding fps
LoadPlugin("D:\Development\mvtools2.dll")
AviSource("D:\testfile.avi")
ConvertToYV12(interlaced=true)

function MDegrain2i(clip source, int "overlap", int "dct", int "blksize", int "pel", int "search", int "searchparam")
{
    overlap = default(overlap,0) # overlap value (0 to 4 for blksize=8)
    dct = default(dct,0) # use dct=1 for clip with light flicker
    blksize = default(blksize, 8)
    pel = default(pel, 2)
    search = default(search, 4)
    searchparam = default(searchparam, 3)
    fields = source.SeparateFields() # separate by fields
    super = fields.MSuper(chroma=true, pel=pel)
    backward_vec2 = super.MAnalyse(blksize=blksize, isb = true, delta = 2, overlap=overlap, dct=dct, truemotion=true, temporal=true, pelsearch=pel, search=search, searchparam=searchparam)
    forward_vec2 = super.MAnalyse(blksize=blksize, isb = false, delta = 2, overlap=overlap, dct=dct, truemotion=true, temporal=true, pelsearch=pel, search=search, searchparam=searchparam)
    backward_vec4 = super.MAnalyse(blksize=blksize, isb = true, delta = 4, overlap=overlap, dct=dct, truemotion=true, temporal=true, pelsearch=pel, search=search, searchparam=searchparam)
    forward_vec4 = super.MAnalyse(blksize=blksize, isb = false, delta = 4, overlap=overlap, dct=dct, truemotion=true, temporal=true, pelsearch=pel, search=search, searchparam=searchparam)
    fields.MDegrain2(super, backward_vec2,forward_vec2,backward_vec4,forward_vec4, thSAD=500, thSCD1=500, thSCD2=130, plane=4)
    Weave()
}

return MDegrain2i(last, overlap=4, blksize=8, pel=2)
Last edited by JoshyD; 20th February 2010 at 01:19. |
|
20th February 2010, 17:08 | #17 | Link |
Registered User
Join Date: Nov 2009
Posts: 327
|
RemoveGrain is not very useful without its brother Repair, but the version you linked works. Also, what version did you compile? The EEDI2 build works and is consistent, though no longer multithreaded. Will all filters with internal multithreading be incompatible with this version of Avisynth?
Edit: Strange request, but could you build TelecideHints and FieldHint as well? They're pretty useful for anyone who uses Yatta, even if Yatta itself isn't 64-bit, and should be completely free of assembly code.
Edit2: Hmm, perhaps I should make a checklist of things that'd be cool in Avs64.
Edit3: VSFilter64 available here.
Last edited by Stephen R. Savage; 20th February 2010 at 18:18. |
21st February 2010, 19:17 | #18 | Link |
AviSynth plugger
Join Date: Nov 2003
Location: Russia
Posts: 2,183
|
JoshyD, you do big work!
I am interested in seeing your TON of ASM in the mvtools_64 source code.
__________________
My Avisynth plugins are now at http://avisynth.org.ru and mirror at http://avisynth.nl/users/fizick I usually do not provide a technical support in private messages. |
22nd February 2010, 01:23 | #19 | Link | |
Potentate
Join Date: Mar 2003
Posts: 219
|
Quote:
Fizick, you do know that he is Jeremy Duncan in another persona, right? From the user control panel, Jeremy's last activity was on Feb 5. JoshyD joins on Feb 5... Hmmmmm |
|