Old 16th February 2010, 10:53   #1  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
SEt's Avisynth 2.5.8 MT compiled for *X86_64*, Latest Build 4/16/2010

Edit: Quicklinks Updated 4/26/2010

Featured Release 4/16/2010
+ Resize artifacting fixed
+ Horizontal resize code re-written to use SSE registers
+ Worth noting: often-used functions (TemporalSoften, Merge, etc.) have been tweaked for a decent speed gain
+ Bug found and fixed in the memory copy routine, again
+ Universal binary, no longer need to distinguish between AMD and Intel builds
+ Optimized BitBlt memory copy routine
+ Started implementation of SSE3/4 specific instructions when a supported processor is detected
+ Removed most code paths intended to support CPUs without mmx/iSSE
+ Resize functions reworked to take advantage of extra registers available when processor is in 64bit mode
Avisynth64 binary and installer built on 4/16/2010

Many of the plugins have been optimized and recompiled, please get the latest and greatest versions with this release.

Service update on 3/19/2010
+ Minor fixes to allow better usage of MT modes
+ Tweaks to code for small performance increases all around
+ Fixed resize bug
Use this build for Intel processors
Use this build for AMD processors

Version from 3/15/2010:
64 bit Avisynth 2.5.8 w/multithreading

------------------------------------------------
Plugins (Alphabetically, for now)
------------------------------------------------

New on 3/21/2010
AddGrainC x64

Built on 3/14/2010
AutoCrop x64

New on 3/21/2010
aWarpSharp x64
This version is based on SEt's original rewrite found in this thread

Built on 3/20/2010
Color Matrix x64

New on 3/13/2010
DFTTest x64-->needs the included libfftw3f-3.dll to be in your system32 directory

Built on 3/19/2010
DgDecode 1.5.8 x64
Note: Is missing some IDCT modes, will get them back ASAP

Built on 4/10/2010
EEDI2 x64
+ Vectorized main loop
+ Further restructured main loop to minimize branching, processor dependent speed increase

FieldHints x64

Built on 3/12/2010
FFT3DFilter-->needs the included libfftw3f-3.dll to be in your system32 directory

Built on 3/15/2010
FFT3DGPU x64
note:The hlsl (shader program) file is edited from the original to adhere to pixel shader 3.0 syntax rules. Please make sure to place the correct file in the same directory as the 64bit plugin.

New on 3/13/2010
kemuri-_9's FFMS2 (The Fabulous FM Source 2)
Big thanks to kemuri-_9 for the build

Built on 3/29/2010
GradFun2DB x64

Built on 4/08/2010
hqdn3d x64

Built on 3/14/2010
LeakKernelDeint x64

Built on 3/12/2010
MaskTools 2.0a41
+ Now straight from the source, Manao

Built on 3/31/2010
MVTools2 x64
+ Continued conversion and updating of assembly functions
+ Removal of some code intended to support processors without mmx/iSSE
+ Converted often used assembly functions to SSE2 instead of mmx/iSSE
+ Updated to latest shared function library from x264
+ Healthy 20%+ speed increase over x86 version in most cases

New on 3/14/2010
TDeint x64
This is basically the same as squid80's build, main differences being newer avisynth.h and newer compiler

TelecideHints x64

Built on 3/13/2010:
TIVTC x64


New on 3/20/2010
TNLMeans_x64 v1.0.3

New on 3/20/2010
TTempSmooth x64 v0.9.4

RemoveGrain x64

Repair x64

VerticalCleaner x64

Visit Squid80's homepage for more x64 plugins

Benchmarking Suggestions

Here is a 64bit avs2avi for benchmarking.
You can run it against the original

To simply run the script through Avisynth, execute the following at a command prompt:
Code:
avs2avi64.exe <path:\script.avs> -o n
A dialog box will pop up asking what to do for compression. Choose no recompression (or whatever similar option your OS gives you) and the script will run without saving any output.

The same should be done with avs2avi.exe.

This will take two factors out of the speed equation: 64bit vs 32bit compressors and hard disk write speed. The final fps report from both runs will allow a fairly apples to apples comparison of the two builds.

Source Code
For those who are interested: the source is now hosted over at Google Code. I'll keep it as up to date as I can.
The source is constantly in flux.

This wiki page has all of the plugins linked as well.

The source to any of the plugins I have personally modified is available upon request. Please message me if interested.

Last edited by JoshyD; 27th April 2010 at 03:05.
Old 16th February 2010, 15:30   #2  |  Link
Chikuzen
typo lover
 
Join Date: May 2009
Posts: 595
Hi, JoshyD

I tried to test it with my Q9450 and 64-bit Windows 7, but some DLLs seem to be missing.
Where can I get libiomp5md.dll?
Old 16th February 2010, 20:06   #3  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Hmm, it appears that even when linking all your libs statically, Intel's compiler still links the OpenMP libs dynamically. I'll get a rebuild up ASAP. From the ICC forums:

"The 11.0 Windows compilers (both C++ and Fortran) have decoupled /MT from having any effect on which OpenMP runtime (static or dynamic) gets linked. In fact, all of /MT[d], /MD[d], and /ML[d] (latter VS2003 only) now only effect which MS C runtime is linked.
We made this change because we want dynamic to be the default when linking the OpenMP RTL. The use of static OpenMP libraries is not recommended, because they might cause multiple libraries to be linked in an application. The condition is not supported and could lead to unpredictable results. It can also cause Thread Checker false positives and other problems with the Intel Threading tools.
If you want to link against the static OpenMP RTL, you must add /Qopenmp-link:static, which is a new switch for 11.0. So to produce a purely static executable, compile/link with /MT /Qopenmp-link:static
Patrick Kennedy
Intel Compiler Lab"

I didn't build the project with OpenMP libs enabled, but I did allow the compiler to auto-parallelize any loops it could. Perhaps this is the issue.

Last edited by JoshyD; 16th February 2010 at 20:23.
Old 17th February 2010, 00:48   #4  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
There is support for SetMTMode and its functions, but not MT("command") as of now. The whole MT function is contained in an entirely separate DLL, loaded from your plugins dir automatically or manually at the beginning of the script.

The RAR does contain a copy of DirectShowSource for use with Avisynth. Any program that can open avs files AND is already a 64 bit executable should do the trick. VDub64 comes to mind, as well as Media Player Classic Home Cinema 64. 64 bit builds of x264 that specifically have the ability to open avs files should work as well.

I have personally tested Virtual Dub 64, it has played back all of my little test cases I've had a chance to run without any complaint.

For some 64bit plugins that MAY work with this DLL please check out Squid80's prior work. He was the guy who ported Avisynth 2.5.5 to x64 a while back, without the nicety of having a compiler that supports inline asm.

Other than that, the top two plugins that I want to see work with 64bit are MVTools2 and FFT3DFilter. I get a lot of mileage out of those two projects.

If there's enough interest generated, I'm considering going back and optimizing all the ASM routines to take advantage of architectural changes that have occurred over the past 5 or so years. Some of these assembly routines are long in the tooth, and could be better tuned for newer processors (I think).

Anyhow thanks for the post, and keep checking back for updates.
Old 17th February 2010, 00:54   #5  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
What I meant was, could you modify your avisynth64 release to use a separate directory for autoloading 64-bit plugins to avoid name-conflicts and similar issues? On an aside, having mvtools2 + MaskTools2 would allow a lot of script functions to work automatically.

Last edited by Stephen R. Savage; 17th February 2010 at 00:56.
Old 17th February 2010, 01:33   #6  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Quote:
Originally Posted by Stephen R. Savage View Post
What I meant was, could you modify your avisynth64 release to use a separate directory for autoloading 64-bit plugins to avoid name-conflicts and similar issues? On an aside, having mvtools2 + MaskTools2 would allow a lot of script functions to work automatically.
Sure, that's pretty easy. I can throw that one on the to-do list. I'm just a bit more focused on making sure others can run it at the moment.
Old 17th February 2010, 20:51   #7  |  Link
Chikuzen
typo lover
 
Join Date: May 2009
Posts: 595
Hi, JoshyD
Thanks for the static build. It seems to work without trouble so far.

I did some benchmarks. The results are here.

I think I need a faster 64-bit decoder...

Last edited by Chikuzen; 2nd March 2010 at 13:39.
Old 17th February 2010, 21:22   #8  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
Kassandro posted his RemoveGrain64 on his own forum:

http://videoprocessing.11.forumer.co...2cd20f1c9425d0

Edit: Your rebuilt EEDI2() causes artifacts in chroma.
Edit2: Seems to be a pitch error. Doing a TurnLeft/TurnRight works around it.
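For readers unfamiliar with the term: a "pitch error" usually means code walked from one row to the next using the visible row width instead of the buffer's pitch (the byte distance between row starts, which can include alignment padding). The following is only a minimal C++ sketch of that distinction. It is not taken from the EEDI2 source, and `copy_plane` and its parameters are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from EEDI2 or Avisynth): copy one 8-bit plane.
// Each row holds `row_bytes` visible bytes, but consecutive rows start
// `pitch` bytes apart, and pitch may exceed row_bytes because of padding.
void copy_plane(std::uint8_t* dst, int dst_pitch,
                const std::uint8_t* src, int src_pitch,
                int row_bytes, int height)
{
    assert(row_bytes <= src_pitch && row_bytes <= dst_pitch);
    for (int y = 0; y < height; ++y) {
        // Correct: step each buffer by its own pitch.
        // Stepping by row_bytes instead would treat padding as pixels
        // and skew every row after the first.
        std::memcpy(dst + static_cast<std::ptrdiff_t>(y) * dst_pitch,
                    src + static_cast<std::ptrdiff_t>(y) * src_pitch,
                    static_cast<std::size_t>(row_bytes));
    }
}
```

A bug of this kind only shows up when pitch and row width differ, which could explain why it appears on some planes (here, chroma) and not others. Whether that is what happened in EEDI2 is not established at this point in the thread.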

Last edited by Stephen R. Savage; 17th February 2010 at 21:35.
Old 18th February 2010, 21:35   #9  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Thanks for the benchmarks! I don't think we're going to see any speed increases with 64bit code unless the assembly is re-worked to take advantage of more registers / less memory access.

Any resizers you run through there are essentially just using their old 32 bit counterparts. The size of the pointers changes, but the register usage at the CPU level is still the same, because it was specified explicitly. I'm going to continue work to see if I can eke out some extra performance. I mainly want to see if I can get some of the more demanding plugins a speed boost.


I'm not sure what's wrong with EEDI2, the source was straight compiled again.
Old 19th February 2010, 01:05   #10  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
Perhaps there was always the bug in the source code. Nevertheless, if you have experience in Avisynth plugin development, perhaps you could squash it for us? Please?

EEDI2 would normally not be anywhere near the top of my priorities, but without nnedi2, it's pretty much a necessary step to get TempGaussMC working on avs64 (the other steps being removegrain64, mvtools2_64, and masktools2_64).
Old 19th February 2010, 12:06   #11  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Quote:
Originally Posted by Stephen R. Savage View Post
Perhaps there was always the bug in the source code. Nevertheless, if you have experience in Avisynth plugin development, perhaps you could squash it for us? Please?

EEDI2 would normally not be anywhere near the top of my priorities, but without nnedi2, it's pretty much a necessary step to get TempGaussMC working on avs64 (the other steps being removegrain64, mvtools2_64, and masktools2_64).
Edit: EEDI2 bug squashed, I think
Try this out and let me know if it produces consistent results


I DO have a somewhat working MVTools2. Insofar as I have tested it, the "important" functions are working. A TON of the ASM has been re-coded to adhere to the function calling conventions of x64 C++. I did it by hand, meaning there's probably a decent chance you'll crash it.

I also had to update the parts borrowed from other projects (xvid, x264, fftw), so those are a little "rough" at the moment.

My test cases mainly focused around motion vector generation and the degraining functions. Perhaps someone else will be able to fault it in other places, allowing me a chance to find and fix the bugs.

Here's the link to MVTools2 x64

Personally, I see a significant performance increase (from ~20fps x86 to ~30fps x64, when using multi threading in both cases) when just writing out a raw stream. Try it out, and let me know where the problems are.

This is a little sample of what I've been using to mess around with the parameters. You can get it to go through a surprisingly large number of code paths just by varying the inputs to different degrain functions.

Code:
#MVTools x64
SetMTMode(2,4) #could be more, my system has four logical threads, but in certain instances more increase my encoding fps
LoadPlugin("D:\Development\mvtools2.dll")
AviSource("D:\testfile.avi")
ConvertToYV12(interlaced=true)


function MDegrain2i(clip source, int "overlap", int "dct", int "blksize", int "pel", int "search", int "searchparam")
{
	overlap = default(overlap,0) # overlap value (0 to 4 for blksize=8)
	dct = default(dct,0) # use dct=1 for clip with light flicker
	blksize = default(blksize, 8)
	pel = default(pel, 2)
	search = default(search, 4)
	searchparam = default(searchparam, 3)
	
	fields = source.SeparateFields() # separate by fields
	super = fields.MSuper(chroma=true, pel=pel)
	
	backward_vec2 = super.MAnalyse(blksize=blksize, isb = true, delta = 2, overlap=overlap, dct=dct, truemotion=true, temporal=true, pelsearch=pel, search=search, searchparam=searchparam)
	forward_vec2 = super.MAnalyse(blksize=blksize, isb = false, delta = 2, overlap=overlap, dct=dct, truemotion=true, temporal=true, pelsearch=pel, search=search, searchparam=searchparam)
	
	backward_vec4 = super.MAnalyse(blksize=blksize, isb = true, delta = 4, overlap=overlap, dct=dct, truemotion=true, temporal=true, pelsearch=pel, search=search, searchparam=searchparam)
	forward_vec4 = super.MAnalyse(blksize=blksize, isb = false, delta = 4, overlap=overlap, dct=dct, truemotion=true, temporal=true, pelsearch=pel, search=search, searchparam=searchparam)
	
	fields.MDegrain2(super, backward_vec2,forward_vec2,backward_vec4,forward_vec4, thSAD=500, thSCD1=500, thSCD2=130, plane=4)
	Weave()
}

return MDegrain2i(last, overlap=4, blksize=8, pel=2)
Enjoy, and let me know any problems

Last edited by JoshyD; 20th February 2010 at 01:19.
Old 21st February 2010, 19:17   #12  |  Link
Fizick
AviSynth plugger
 
Join Date: Nov 2003
Location: Russia
Posts: 2,183
JoshyD, you do big work!
I am interested in seeing your TON of the ASM mvtools_64 source code lines
__________________
My Avisynth plugins are now at http://avisynth.org.ru and mirror at http://avisynth.nl/users/fizick
I usually do not provide a technical support in private messages.
Old 22nd February 2010, 01:23   #13  |  Link
tedkunich
Potentate
 
Join Date: Mar 2003
Posts: 219
Quote:
Originally Posted by Fizick View Post
JoshyD, you do big work!
I am interested in seeing your TON of the ASM mvtools_64 source code lines

Fizick, you do know that he is Jeremy Duncan in another persona, right?

From the user control panel, Jeremy's last activity was on Feb 5. JoshyD joins on Feb 5... Hmmmmm
Old 22nd February 2010, 02:35   #14  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
I'm fairly certain that Jeremy Duncan would not be able to write any significant amount of code, much less port a large software project.
Old 22nd February 2010, 06:19   #15  |  Link
tedkunich
Potentate
 
Join Date: Mar 2003
Posts: 219
Quote:
Originally Posted by Stephen R. Savage View Post
I'm fairly certain that Jeremy Duncan would not be able to write any significant amount of code, much less port a large software project.
Does seem a stretch, but it is extremely suspicious that this guy comes here out of the blue at the same time our resident 64bit pest disappears...
Old 22nd February 2010, 10:20   #16  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Quote:
Originally Posted by tedkunich View Post
Does seem a stretch, but it is extremely suspicious that this guy comes here out of the blue at the same time our resident 64bit pest disappears...
I don't mean to be a pest about 64 bit binaries, I was just curious what improvements would be seen, if any. It was a nice chance to learn assembly, re-learn signal filtering, and all that good stuff.

Besides, Jeremy Duncan could barely compile the original 32bit source . . .
Old 22nd February 2010, 20:06   #17  |  Link
Fizick
AviSynth plugger
 
Join Date: Nov 2003
Location: Russia
Posts: 2,183
Strange personality discussion.... I am simply waiting for the source code and project files of MVTools2_x64 (GPL).
Is the 32bit and 64bit asm code generalized?
Old 22nd February 2010, 21:09   #18  |  Link
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Quote:
Originally Posted by Fizick View Post
Strange personality discussion.... I am simply waiting for the source code and project files of MVTools2_x64 (GPL).
Is the 32bit and 64bit asm code generalized?
The source is finally up, sorry for the delay

Unfortunately, they're not generalized as a whole. I did the inline assembler generally using just #ifdef's and then grabbed the latest x264 asm functions that the project uses, which are generalized. The actual asm for functions contained in files like bilinear.asm just got overwritten. It's trivial to swap the file for the original, and re-compile. It's just not exactly elegant.

The functions should have been generalized, but I was just working fast, kind of without putting a ton of thought into the process. I've been learning as I go, and it seems I always find an example of a sleeker solution after working through the first ugly one that popped into my head.

The main differences are in function calling, and register usage. You can get by with a lot less push/popping in 64bit land. Stack allocation between function calls changes as well. As a rule, all arguments are aligned at 8 byte boundaries. Arguments that are passed to the functions via registers still get shadow space on the stack, so your 5th integer argument will be at [rsp+40], if you didn't push any registers on the stack in the first place.
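The [rsp+40] figure above follows from the Microsoft x64 calling convention: at function entry the 8-byte return address sits at [rsp], the four 8-byte shadow (home) slots for RCX, RDX, R8 and R9 follow it, and stack-passed arguments come next. A small C++ sketch of that arithmetic, with `arg_offset_at_entry` being a hypothetical helper name used only for illustration:

```cpp
#include <cassert>

// Hypothetical helper: byte offset from RSP of the n-th (1-based) integer or
// pointer argument under the Microsoft x64 calling convention, measured after
// `pushes` 8-byte register pushes have been done inside the callee.
// For n <= 4 the value itself arrives in a register (RCX, RDX, R8, R9);
// the offset returned is that argument's shadow (home) slot.
int arg_offset_at_entry(int n, int pushes = 0)
{
    assert(n >= 1 && pushes >= 0);
    const int slot = 8;             // every argument slot is 8 bytes
    const int return_address = 8;   // pushed by the CALL instruction
    return return_address + (n - 1) * slot + pushes * slot;
}
```

For example, `arg_offset_at_entry(5)` gives 40 and `arg_offset_at_entry(6)` gives 48. After two register pushes in the prologue, the fifth argument moves to [rsp+56], which is the kind of detail that is easy to get wrong when converting 32-bit assembly by hand.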

I wanted to ask some specific questions about the filtering and code copying functions. Was there a reason that they're often limited to the mmx registers? Things like the Horizontal Wiener filter are bothering me, because depending on your byte window, you're going to get different results from it, or so it would seem. Right now, it has a 4 byte "window" to filter around. My understanding of a Wiener filter is that it's adaptive, so changing its discrete window would change the filter's output altogether.

It's possible to look at 8 byte chunks using the XMM registers (unpacking to words for arithmetic (128bits total), repacking) but I'm unsure on the effect on overall image quality. Thoughts?
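The unpack / arithmetic / repack pattern described above can be sketched with SSE2 intrinsics. This is not the MVTools2 Wiener filter. It is a made-up 3:1 weighted blend of two rows, chosen only because the intermediate sum overflows 8 bits and therefore needs the widening step. The function name `blend8_3to1` and the `HAVE_SSE2` macro are hypothetical, and a scalar fallback is included for non-x86 builds.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// Hypothetical example: out[i] = (3*a[i] + b[i] + 2) >> 2 for 8 pixels.
void blend8_3to1(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* out)
{
#ifdef HAVE_SSE2
    const __m128i zero = _mm_setzero_si128();
    // Load 8 bytes into the low half of an XMM register.
    __m128i va = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(b));
    // Unpack: interleave with zero to get eight 16-bit words (128 bits).
    va = _mm_unpacklo_epi8(va, zero);
    vb = _mm_unpacklo_epi8(vb, zero);
    // 16-bit arithmetic: 3*a + b + 2, then shift right by 2.
    __m128i sum = _mm_mullo_epi16(va, _mm_set1_epi16(3));
    sum = _mm_add_epi16(sum, vb);
    sum = _mm_add_epi16(sum, _mm_set1_epi16(2));
    sum = _mm_srli_epi16(sum, 2);
    // Repack to bytes with unsigned saturation and store the low 8 bytes.
    _mm_storel_epi64(reinterpret_cast<__m128i*>(out), _mm_packus_epi16(sum, zero));
#else
    // Scalar fallback with the same rounding.
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<std::uint8_t>((3 * a[i] + b[i] + 2) >> 2);
#endif
}
```

Processing 8 pixels per iteration this way does not by itself change a filter's output compared with 4 pixels per iteration in MMX registers, provided the per-pixel taps and rounding stay the same. Only a change to the kernel itself would.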

Finally, a lot of the mmx functions don't take advantage of the fact that we have a ton of XMM registers floating around that can also be used in mmx arithmetic. XMM0-XMM5 are all volatile across calls which could prevent some mmx registers from being shuffled around, etc.

When writing assembler, I'm not sure how the CPU's register files are architected to interact with each other; for example, whether there's a penalty associated with transferring a qword from an XMM register to an MMX reg, and vice versa.

I'm actually a VLSI designer (very large scale integration) by education, so thinking on the machine level is interesting and thought provoking. I don't know enough about the design of the x86 cores of late to generalize performance impact of various code paths. Is there any way other than running a battery of tests to analyze the clock cycles it takes for an instruction to retire?

I'm going to search around for the answers, but I thought I'd ask anyhow. Sometimes that's the fastest and most concise way to find the info you're after.

Last edited by JoshyD; 22nd February 2010 at 21:45.
Old 1st March 2010, 22:40   #19  |  Link
turbojet
Registered User
 
Join Date: May 2008
Posts: 1,840
I see directshowsource source in the sources package but any chance of including directshowsource.dll in the binary package?

I've never been able to get squid80's directshowsource.dll working, keeps giving no function errors.

Also, if you ever plan on an x64 build of 2.60, I look forward to it, considering a resize-only script is about 10% faster with 2.60 than with 2.58 and about 15% faster than with 2.58 MT.
Old 1st March 2010, 23:06   #20  |  Link
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
DSS was included in one of the earlier copies of the Avs64 build posted here. It seems JoshyD has forgotten to include it in the latest one, so I have uploaded it here: http://www.sendspace.com/file/tm7pcf