16th February 2010, 10:53 | #1 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
SEt's Avisynth 2.5.8 MT compiled for *X86_64*, Latest Build 4/16/2010
Edit: Quicklinks Updated 4/26/2010
Featured Release 4/16/2010
+ Resize artifacting fixed
+ Horizontal resize code re-written to use SSE registers
+ Worth noting: often-used functions (TemporalSoften, Merge, etc.) have been tweaked for a decent speed gain
+ Bug found and fixed in the memory copy routine, again
+ Universal binary, no longer need to distinguish between AMD and Intel builds
+ Optimized BitBlt memory copy routine
+ Started implementation of SSE3/4 specific instructions when a supported processor is detected
+ Removed most code paths intended to support CPUs without mmx/iSSE
+ Resize functions reworked to take advantage of the extra registers available when the processor is in 64bit mode

Avisynth64 binary and installer built on 4/16/2010
Many of the plugins have been optimized and recompiled, please get the latest and greatest versions with this release.

Service update on 3/19/2010
+ Minor fixes to allow better usage of MT modes
+ Tweaks to code for small performance increases all around
+ Fixed resize bug
Use this build for Intel processors
Use this build for AMD processors

Version from 3/15/2010: 64 bit Avisynth 2.5.8 w/multithreading

------------------------------------------------
Plugins (Alphabetically, for now)
------------------------------------------------
New on 3/21/2010: AddGrainC x64
Built on 3/14/2010: AutoCrop x64
New on 3/21/2010: aWarpSharp x64
This version is based on SEt's original rewrite found in this thread
Built on 3/20/2010: Color Matrix x64
New on 3/13/2010: DFTTest x64 (needs the included libfftw3f-3.dll to be in your system32 directory)
Built on 3/19/2010: DgDecode 1.5.8 x64
Note: Is missing some IDCT modes, will get them back ASAP
Built on 4/10/2010: EEDI2 x64
+ Vectorized main loop
+ Further restructured main loop to minimize branching, processor dependent speed increase
FieldHints x64
Built on 3/12/2010: FFT3DFilter (needs the included libfftw3f-3.dll to be in your system32 directory)
Built on 3/15/2010: FFT3DGPU x64
Note: The hlsl (shader program) file is edited from the original to adhere to pixel shader 3.0 syntax rules. Please make sure to place the correct file in the same directory as the 64bit plugin.
New on 3/13/2010: kemuri-_9's FFMS2 (The Fabulous FM Source 2)
Big thanks to kemuri-_9 for the build
Built on 3/29/2010: GradFun2DB x64
Built on 4/08/2010: hqdn3d x64
Built on 3/14/2010: LeakKernelDeint x64
Built on 3/12/2010: MaskTools 2.0a41
+ Now straight from the source, Manao
Built on 3/31/2010: MVTools2 x64
+ Continued conversion and updating of assembly functions
+ Removal of some code intended to support processors without mmx/iSSE
+ Converted often-used assembly functions to SSE2 instead of mmx/iSSE
+ Updated to latest shared function library from x264
+ Healthy 20%+ speed increase over x86 version in most cases
New on 3/14/2010: TDeint x64
This is basically the same as squid80's build, main differences being newer avisynth.h and newer compiler
TelecideHints x64
Built on 3/13/2010: TIVTC x64
New on 3/20/2010: TNLMeans_x64 v1.0.3
New on 3/20/2010: TTempSmooth x64 v0.9.4
RemoveGrain x64
Repair x64
VerticalCleaner x64
Visit Squid80's homepage for more x64 plugins

Benchmarking Suggestions
Here is a 64bit avs2avi for benchmarking. You can run it against the original. To simply run the script through Avisynth, execute the following at a command prompt:
Code:
avs2avi64.exe <path:\script.avs> -o n
The same should be done with avs2avi.exe. This takes two factors out of the speed equation: 64bit vs 32bit compressors and hard disk write speed. The final fps report from both runs will allow a fairly apples-to-apples comparison of the two builds.

Source Code
For those who are interested: the source is now hosted over at Google Code, and I'll keep it as up to date as I can. The source is constantly in flux. This wiki page has all of the plugins linked as well. The source to any of the plugins I have personally modified is available upon request. Please message me if interested.
Last edited by JoshyD; 27th April 2010 at 03:05. |
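For anyone who wants to turn the two final fps reports from the benchmarking suggestion above into a single number, here is a minimal sketch. The function name is made up for illustration, and the 20 and 30 fps inputs are placeholders rather than measurements; plug in whatever avs2avi.exe and avs2avi64.exe report for the same script.

```python
def relative_speedup(fps_32bit: float, fps_64bit: float) -> float:
    """Return the 64-bit build's speed change over the 32-bit build, in percent.

    Both inputs are the final fps figures reported by the two avs2avi runs
    on the same script. A negative result means the 64-bit run was slower.
    """
    if fps_32bit <= 0:
        raise ValueError("fps_32bit must be positive")
    return (fps_64bit - fps_32bit) / fps_32bit * 100.0


if __name__ == "__main__":
    # Placeholder figures, not measurements: substitute your own fps reports.
    change = relative_speedup(20.0, 30.0)
    print(f"64-bit build vs 32-bit build: {change:+.1f}%")
```

Running both builds on the same script with the null output option keeps the comparison about Avisynth itself rather than the compressor or the disk.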
16th February 2010, 20:06 | #3 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
Hmm, it appears that even when linking all your libs statically, Intel's compiler still links the OpenMP libs dynamically. I'll get a rebuild up ASAP. From the ICC forums:
"The 11.0 Windows compilers (both C++ and Fortran) have decoupled /MT from having any effect on which OpenMP runtime (static or dynamic) gets linked. In fact, all of /MT[d], /MD[d], and /ML[d] (latter VS2003 only) now only effect which MS C runtime is linked. We made this change because we want dynamic to be the default when linking the OpenMP RTL.
The use of static OpenMP libraries is not recommended, because they might cause multiple libraries to be linked in an application. The condition is not supported and could lead to unpredictable results. It can also cause Thread Checker false positives and other problems with the Intel Threading tools.
If you want to link against the static OpenMP RTL, you must add /Qopenmp-link:static, which is a new switch for 11.0. So to produce a purely static executable, compile/link with /MT /Qopenmp-link:static
Patrick Kennedy
Intel Compiler Lab"
I didn't build the project with OpenMP libs enabled, but I did allow the compiler to auto-parallelize any loops it could. Perhaps this is the issue.
Last edited by JoshyD; 16th February 2010 at 20:23. |
17th February 2010, 00:48 | #4 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
There is support for SetMTMode and its functions, but not MT("command") as of now. The whole MT function is contained in an entirely separate DLL, loaded from your plugins dir automatically or manually at the beginning of the script.
The RAR does contain a copy of DirectShowSource for use with Avisynth. Any program that can open avs files AND is already a 64 bit executable should do the trick. VDub64 comes to mind, as well as Media Player Classic Home Cinema 64. 64 bit builds of x264 that specifically have the ability to open avs files should work as well. I have personally tested VirtualDub 64, and it has played back all of the little test cases I've had a chance to run without any complaint.
For some 64bit plugins that MAY work with this DLL, please check out Squid80's prior work. He was the guy who ported Avisynth 2.5.5 to x64 a while back, without the nicety of having a compiler that supports inline asm. Other than that, the top two plugins that I want to see work with 64bit are MVTools2 and FFT3DFilter. I get a lot of mileage out of those two projects.
If there's enough interest generated, I'm considering going back and optimizing all the ASM routines to take advantage of architectural changes that have occurred over the past 5 or so years. Some of these assembly routines are long in the tooth, and could be better tuned for newer processors (I think). Anyhow, thanks for the post, and keep checking back for updates. |
17th February 2010, 00:54 | #5 | Link |
Registered User
Join Date: Nov 2009
Posts: 327
|
What I meant was, could you modify your avisynth64 release to use a separate directory for autoloading 64-bit plugins, to avoid name conflicts and similar issues? As an aside, having mvtools2 + MaskTools2 would allow a lot of script functions to work automatically.
Last edited by Stephen R. Savage; 17th February 2010 at 00:56. |
17th February 2010, 01:33 | #6 | Link | |
Registered User
Join Date: Feb 2010
Posts: 84
|
Quote:
|
|
17th February 2010, 20:51 | #7 | Link |
typo lover
Join Date: May 2009
Posts: 595
|
Hi, JoshyD
Thx for the static build version. It seems to work at present without trouble. I did some benchmarks. The results are here. I think that a faster 64bit decoder is necessary for me...
Last edited by Chikuzen; 2nd March 2010 at 13:39. |
17th February 2010, 21:22 | #8 | Link |
Registered User
Join Date: Nov 2009
Posts: 327
|
Kassandro posted his RemoveGrain64 on his own forum:
http://videoprocessing.11.forumer.co...2cd20f1c9425d0
Edit: Your rebuilt EEDI2() causes artifacts in chroma.
Edit2: Seems to be a pitch error. Doing a TurnLeft/TurnRight works around it.
Last edited by Stephen R. Savage; 17th February 2010 at 21:35. |
18th February 2010, 21:35 | #9 | Link |
Registered User
Join Date: Feb 2010
Posts: 84
|
Thanks for the benchmarks! I don't think we're going to see any speed increases with 64bit code unless the assembly is re-worked to take advantage of more registers / less memory access.
Any resizers you run through there are essentially just using their old 32 bit counterparts. The size of the pointers changes, but the register usage at the CPU level is still the same, because it was specified explicitly. I'm going to continue work to see if I can eke out some extra performance. I mainly want to see if I can get some of the more demanding plugins a speed boost. I'm not sure what's wrong with EEDI2; the source was straight compiled again. |
19th February 2010, 01:05 | #10 | Link |
Registered User
Join Date: Nov 2009
Posts: 327
|
Perhaps there was always the bug in the source code. Nevertheless, if you have experience in Avisynth plugin development, perhaps you could squash it for us? Please?
EEDI2 would normally not be anywhere near the top of my priorities, but without nnedi2, it's pretty much a necessary step to get TempGaussMC working on avs64 (the other steps being removegrain64, mvtools2_64, and masktools2_64). |
19th February 2010, 12:06 | #11 | Link | |
Registered User
Join Date: Feb 2010
Posts: 84
|
Quote:
Try this out and let me know if it produces consistent results.
I DO have a somewhat working MVTools2. Insofar as I have tested it, the "important" functions are working. A TON of the ASM has been re-coded to adhere to the function calling specifications set forth by x64 C++. I did it by hand, meaning there's probably a decent chance you'll crash it. I also had to update the parts borrowed from other projects (xvid, x264, fftw), so those are a little "rough" at the moment. My test cases mainly focused on motion vector generation and the degraining functions. Perhaps someone else will be able to fault it in other places, allowing me a chance to find and fix the bugs.
Here's the link to MVTools2 x64
Personally, I see a significant performance increase (from ~20fps x86 to ~30fps x64, when using multithreading in both cases) when just writing out a raw stream. Try it out, and let me know where the problems are.
This is a little sample of what I've been using to mess around with the parameters. You can get it to go through a surprisingly large number of code paths just by varying the inputs to different degrain functions.
Code:
#MVTools x64
SetMTMode(2,4) #could be more, my system has four logical threads, but in certain instances more increase my encoding fps
LoadPlugin("D:\Development\mvtools2.dll")
AviSource("D:\testfile.avi")
ConvertToYV12(interlaced=true)

function MDegrain2i(clip source, int "overlap", int "dct", int "blksize", int "pel", int "search", int "searchparam")
{
    overlap = default(overlap,0) # overlap value (0 to 4 for blksize=8)
    dct = default(dct,0) # use dct=1 for clip with light flicker
    blksize = default(blksize, 8)
    pel = default(pel, 2)
    search = default(search, 4)
    searchparam = default(searchparam, 3)
    fields = source.SeparateFields() # separate by fields
    super = fields.MSuper(chroma=true, pel=pel)
    backward_vec2 = super.MAnalyse(blksize=blksize, isb = true, delta = 2, overlap=overlap, dct=dct, truemotion=true, temporal=true, pelsearch=pel, search=search, searchparam=searchparam)
    forward_vec2 = super.MAnalyse(blksize=blksize, isb = false, delta = 2, overlap=overlap, dct=dct, truemotion=true, temporal=true, pelsearch=pel, search=search, searchparam=searchparam)
    backward_vec4 = super.MAnalyse(blksize=blksize, isb = true, delta = 4, overlap=overlap, dct=dct, truemotion=true, temporal=true, pelsearch=pel, search=search, searchparam=searchparam)
    forward_vec4 = super.MAnalyse(blksize=blksize, isb = false, delta = 4, overlap=overlap, dct=dct, truemotion=true, temporal=true, pelsearch=pel, search=search, searchparam=searchparam)
    fields.MDegrain2(super, backward_vec2,forward_vec2,backward_vec4,forward_vec4, thSAD=500, thSCD1=500, thSCD2=130, plane=4)
    Weave()
}

return MDegrain2i(last, overlap=4, blksize=8, pel=2)
Last edited by JoshyD; 20th February 2010 at 01:19. |
|
21st February 2010, 19:17 | #12 | Link |
AviSynth plugger
Join Date: Nov 2003
Location: Russia
Posts: 2,183
|
JoshyD, you do big work!
I am interested to see your TON of ASM mvtools_64 source code lines.
__________________
My Avisynth plugins are now at http://avisynth.org.ru and mirror at http://avisynth.nl/users/fizick I usually do not provide a technical support in private messages. |
22nd February 2010, 01:23 | #13 | Link | |
Potentate
Join Date: Mar 2003
Posts: 219
|
Quote:
Fizick, you do know that he is Jeremy Duncan in another persona, right? From the user control panel, Jeremy's last activity was on Feb 5. JoshyD joins on Feb 5... Hmmmmm |
|
22nd February 2010, 10:20 | #16 | Link | |
Registered User
Join Date: Feb 2010
Posts: 84
|
Quote:
Besides, Jeremy Duncan could barely compile the original 32bit source . . . |
|
22nd February 2010, 20:06 | #17 | Link |
AviSynth plugger
Join Date: Nov 2003
Location: Russia
Posts: 2,183
|
Strange personality discussion.... I am simply waiting for the source code and project files of MVTools2_x64 (GPL).
Is the 32bit and 64bit asm code generalized?
22nd February 2010, 21:09 | #18 | Link | |
Registered User
Join Date: Feb 2010
Posts: 84
|
Quote:
Unfortunately, they're not generalized as a whole. I did the inline assembler generally using just #ifdef's and then grabbed the latest x264 asm functions that the project uses, which are generalized. The actual asm for functions contained in files like bilinear.asm just got overwritten. It's trivial to swap the file for the original and re-compile. It's just not exactly elegant. The functions should have been generalized, but I was working fast, without putting a ton of thought into the process. I've been learning as I go, and it seems I always find an example of a sleeker solution after working through the first ugly one that popped into my head.
The main differences are in function calling and register usage. You can get by with a lot less push/popping in 64bit land. Stack allocation between function calls changes as well. As a rule, all arguments are aligned at 8 byte boundaries. Arguments that are passed to the functions via registers still get shadow space on the stack, so your 5th integer argument will be at [rsp+40], if you didn't push any registers on the stack in the first place.
I wanted to ask some specific questions about the filtering and code copying functions. Was there a reason that they're often limited to the mmx registers? Things like the horizontal Wiener filter are bothering me, because depending on your byte window, you're going to get different results from it, or so it would seem. Right now, it has a 4 byte "window" to filter around. My understanding of a Wiener filter is that it's adaptive, so changing its discrete window would change the filter's output altogether. It's possible to look at 8 byte chunks using the XMM registers (unpacking to words for arithmetic (128bits total), then repacking), but I'm unsure of the effect on overall image quality. Thoughts?
Finally, a lot of the mmx functions don't take advantage of the fact that we have a ton of XMM registers floating around that can also be used in mmx arithmetic. XMM0-XMM5 are all volatile across calls, which could prevent some mmx registers from being shuffled around, etc.
When writing assembler, I'm not sure how the CPU's register files are architected to interact with each other, as in, whether there's a penalty associated with transferring a qword from an XMM register to an MMX reg, and vice versa. I'm actually a VLSI designer (very large scale integration) by education, so thinking on the machine level is interesting and thought provoking. I don't know enough about the design of the x86 cores of late to generalize the performance impact of various code paths. Is there any way, other than running a battery of tests, to analyze the clock cycles it takes for an instruction to retire?
I'm going to search around for the answers, but I thought I'd ask anyhow. Sometimes that's the fastest and most concise way to find the info you're after.
Last edited by JoshyD; 22nd February 2010 at 21:45. |
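To make the [rsp+40] arithmetic from the post above concrete, here is a minimal sketch of where integer arguments sit relative to rsp. It assumes the Microsoft x64 calling convention with offsets taken at function entry (return address at [rsp], then four 8-byte shadow slots). The function names and the pushed_regs parameter are made up for illustration and are not taken from the MVTools2 source.

```python
SLOT = 8           # every argument occupies one 8-byte stack slot
RETURN_ADDR = 8    # CALL pushes the return address, which sits at [rsp] on entry
REGISTER_ARGS = 4  # first four integer args travel in rcx, rdx, r8, r9


def passed_in_register(arg_index: int) -> bool:
    """True if the 1-based integer argument arrives in a register."""
    return 1 <= arg_index <= REGISTER_ARGS


def stack_arg_offset(arg_index: int, pushed_regs: int = 0) -> int:
    """Byte offset from rsp of the stack slot for a 1-based integer argument.

    For arguments 1-4 this is the shadow (home) slot reserved by the caller;
    for argument 5 and up it is where the value actually lives.
    pushed_regs is how many 8-byte registers the callee has pushed since
    entry, each of which moves rsp down by one slot.
    """
    if arg_index < 1:
        raise ValueError("arg_index is 1-based")
    if pushed_regs < 0:
        raise ValueError("pushed_regs cannot be negative")
    return RETURN_ADDR + SLOT * (arg_index - 1) + SLOT * pushed_regs


if __name__ == "__main__":
    for n in range(1, 7):
        where = "register + shadow slot" if passed_in_register(n) else "stack"
        print(f"arg {n}: [rsp+{stack_arg_offset(n)}] ({where})")
```

Each push in the prologue shifts every offset by 8, which is why hand-converted 32bit routines that push a different number of registers need their argument offsets rechecked one by one.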
|
1st March 2010, 22:40 | #19 | Link |
Registered User
Join Date: May 2008
Posts: 1,840
|
I see directshowsource source in the sources package but any chance of including directshowsource.dll in the binary package?
I've never been able to get squid80's directshowsource.dll working; it keeps giving "no function" errors. Also, if you ever plan on an x64 build of 2.60, I look forward to it, considering a resize-only script is about 10% faster with 2.60 than with 2.58 and about 15% faster than 2.58 MT. |
1st March 2010, 23:06 | #20 | Link |
Registered User
Join Date: Nov 2009
Posts: 327
|
DSS was included in one of the earlier copies of the Avs64 build posted here. It seems JoshyD has forgotten to include it in the latest one, so I have uploaded it here: http://www.sendspace.com/file/tm7pcf
|
|
|