27th February 2010, 18:46   #45
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Quote:
Originally Posted by Stephen R. Savage
I can confirm that TGMC/EEDI2 works now.

@aegis: You want avs64, eedi2_64, mvtools2_64, mt_masktools64, removegrain64, and repair64.

Things left to do:
1) Port changes to avs 2.6
2) Implement RGB code
3) Port more plugins
4) Clean up code
@aegis
Stephen's spot on about what to get. The installation directions for the 64-bit DLL are pretty straightforward and are in the readme that accompanies the Avisynth64 archive. If you run into any headaches, come back and we (myself and whoever else cares to pitch in; I bet Stephen wouldn't mind) will be as helpful as possible.

@Stephen
Your to-do list is quite well put. I wanted to ask whether you saw any speed improvement under the 64-bit environment. I went from encoding at ~10 fps on a 720x480 interlaced DV source to ~20 fps. That feels like too big a leap, and I may have screwed something up along the line. If your numbers are at all similar, I can rest a little easier.




Code cleanup is a big concern . . . it's kind of a mess in there, especially the hack and slash that was done to get the inline assembler in Avisynth to compile. I would really like to rip the assembly out into standalone ASM files and optimize it specifically for x64 to truly take advantage of the extra registers all around. I think there's some extra performance to eke out if that can be done. None of the current inline ASM uses the extra registers at our disposal. In addition, a lot of it uses the MMX register set when we could easily use the SSE registers, which on x64 are twice as large and twice as numerous (8 64-bit MMX registers vs. 16 128-bit XMM (SSE) registers).
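
To make the register-width point concrete, here is a minimal sketch of a saturating add over 8-bit pixels. It uses SSE2 compiler intrinsics rather than the standalone ASM files described above, and the function name `add_saturate_u8` is made up for illustration; none of this is taken from the Avisynth source. Each XMM operation handles 16 pixels, where an MMX register would hold 8.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical helper (not Avisynth code): dst[i] = min(a[i] + b[i], 255).
void add_saturate_u8(const std::uint8_t* a, const std::uint8_t* b,
                     std::uint8_t* dst, std::size_t n) {
    std::size_t i = 0;
#ifdef HAVE_SSE2
    // One 128-bit XMM register holds 16 pixels per operation.
    for (; i + 16 <= n; i += 16) {
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        // Unsigned saturating add on all 16 bytes at once.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    // Scalar tail, and the whole job on targets without SSE2.
    for (; i < n; ++i) {
        const unsigned sum = static_cast<unsigned>(a[i]) + static_cast<unsigned>(b[i]);
        dst[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
}
```

With intrinsics the compiler does the register allocation, so whether the extra x64 registers actually get used depends on the compiler and the surrounding code; hand-written ASM gives direct control over that.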

Some of the existing code can be dropped directly into what's in Avisynth 2.6, while other portions of the project have gone to Softwire for dynamic assembly generation. That's just a pain to deal with. I would rather have seen the assembly generated via macros and such, as Masktools did. That means generating every function needed before runtime, which may lead to larger code, but it removes the dependence on Softwire, which is no longer officially maintained.
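
As a rough illustration of the "generate every function before runtime" idea, here is a small sketch in plain C++ rather than assembly. The macro `DEFINE_PLANE_OP` and the function names are invented for this example and are not from Masktools or Avisynth. Every variant is stamped out at compile time, and the runtime only picks among functions that already exist in the binary.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical macro: emits one complete plane-processing function per
// operation at compile time, instead of assembling code at runtime.
#define DEFINE_PLANE_OP(name, expr)                                   \
    void name(const std::uint8_t* a, const std::uint8_t* b,           \
              std::uint8_t* dst, std::size_t n) {                     \
        for (std::size_t i = 0; i < n; ++i) {                         \
            const int x = a[i];                                       \
            const int y = b[i];                                       \
            dst[i] = static_cast<std::uint8_t>(expr);                 \
        }                                                             \
    }

DEFINE_PLANE_OP(plane_min, x < y ? x : y)
DEFINE_PLANE_OP(plane_max, x > y ? x : y)
DEFINE_PLANE_OP(plane_avg, (x + y + 1) / 2)

typedef void (*PlaneOp)(const std::uint8_t*, const std::uint8_t*,
                        std::uint8_t*, std::size_t);

// Runtime dispatch: choose a pre-built variant; nothing is generated here.
PlaneOp select_op(int mode) {
    switch (mode) {
        case 0:  return plane_min;
        case 1:  return plane_max;
        default: return plane_avg;
    }
}
```

The trade-off is the one described above: each extra variant adds to the binary size, but there is no runtime code generator to depend on.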

Seriously, I'm really impressed with the way the Masktools authors implemented their assembly routines. A TON of macros obscure a lot of the function generation, but with some architecture-specific aliasing of registers/variables, the codebase can be compiled for both x86 and x64 without too much of a headache. That's almost the state I have it in now; with another hour of coding, the source would compile for either architecture. It's also compiler agnostic for the most part. There were some pieces of C++ that ICC and GCC don't agree with, which I coded around in my release (most compilers like their function templates outside of *.cpp source files). If any Masktools authors care, I can give you the source I made and point you to where the stricter compilers complain.
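
The architecture-specific aliasing can be pictured with a sketch like the one below. It is a C++ analogy, not the Masktools macros themselves; `ARCH_NAME`, `ARCH_XMM_REGS`, `arch_int`, and `next_row` are all hypothetical names. The point is that code written against the aliases compiles unchanged for either target.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-architecture aliases, selected by predefined compiler macros.
#if defined(_M_X64) || defined(__x86_64__)
  #define ARCH_NAME "x64"
  #define ARCH_XMM_REGS 16   // xmm0 to xmm15
#elif defined(_M_IX86) || defined(__i386__)
  #define ARCH_NAME "x86"
  #define ARCH_XMM_REGS 8    // xmm0 to xmm7
#else
  #define ARCH_NAME "other"
  #define ARCH_XMM_REGS 0    // not an x86-family target
#endif

// Pointer-sized integer: 32 bits on x86, 64 bits on x64.
typedef std::intptr_t arch_int;

// Step to the next row of a plane. Using a pointer-sized pitch keeps the
// address arithmetic correct on both targets, including negative pitches.
inline const std::uint8_t* next_row(const std::uint8_t* row, arch_int pitch) {
    return row + pitch;
}
```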

The RGB code should be back in the main DLL without too much work. I think I've almost got it taken care of, and hopefully we'll see it back by the end of the week. It just wasn't at the top of my priority list; I wanted to get TempGaussMC up and going before anything else.

The script produces awesome results for restoring my old home movies, but it was painfully slow compared to a quick noise removal / deinterlace with some of the other plugins (namely a quick run through FFT3DFilter and MVDegrain coupled with yadifmod/nnedi2). The x64 results appear visually identical, so I don't think any steps are accidentally skipped in the x64 code path. I'm racking my brain for differences other than architecture, but can't think of any as of yet.

That being said, working with the same samples, I can literally encode almost twice as fast in a pure 64-bit environment. This makes re-encoding 24 hours of raw DV video (thank you, HDDs in the TB+ range) seem like a practical batch job.
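
For a sense of scale, here is a back-of-the-envelope sketch of the batch time. It assumes the footage is NTSC DV at 30000/1001 fps (suggested by the 720x480 frame size) and that the ~10 fps and ~20 fps figures count source frames; both are assumptions, and `encode_hours` is a made-up helper.

```cpp
// Hypothetical helper: wall-clock hours needed to encode a source of the
// given length, assuming the encode rate is measured in source frames/second.
double encode_hours(double source_hours, double source_fps, double encode_fps) {
    const double total_frames = source_hours * 3600.0 * source_fps;
    return total_frames / encode_fps / 3600.0;
}
```

Under those assumptions, `encode_hours(24.0, 30000.0 / 1001.0, 10.0)` comes to roughly 72 hours and `encode_hours(24.0, 30000.0 / 1001.0, 20.0)` to roughly 36 hours, so the doubling cuts about a day and a half off the job.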

I've grabbed the 2.6 branch from CVS periodically, and it's not stable for me in x86 land as of yet. The code compiles and starts to serve the video stream, but output just drops to zero a few hundred frames into an AVI encode. When the project matures to the point that the authors feel comfortable calling it a beta release, I want to go to work. A lot of the code can just be "dropped in", while I'd like to create a more elegant solution to replace the Softwire-generated assembly. This would make the whole package a bit more future proof. We're going to see x64 stick around for a while, so it makes sense to start using the features it offers. I personally have 4 x64 machines floating around, while clinging to an old x86 P4 (Northwood, 2.6 GHz) that I use to mess around with various Linux distros.

Final note: Both GCC and ICC support an iterative cycle of compile --> runtime profiling of the code --> profile-based optimization. None of that has been done for any of these releases. When it's all a bit more stable, I'd like to allow some time for this. The guys at doom10 recommend this approach when doing your own x264 compiles.
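
With GCC, that cycle looks roughly like the sketch below. The `-fprofile-generate` and `-fprofile-use` flags are GCC's; the file names, the program names, and the `count_above` function are hypothetical stand-ins, and this is not how any of the released builds were made.

```cpp
// Profile-guided build in three steps (GCC; file and program names are hypothetical):
//
//   1. g++ -O2 -fprofile-generate filter.cpp main.cpp -o filter_train
//   2. ./filter_train sample_clip      (a representative run writes the profile data)
//   3. g++ -O2 -fprofile-use filter.cpp main.cpp -o filter_final
//
// The recorded profile tells the compiler how often loops run and which way
// branches like the one below usually go, so it can optimize for the common case.
#include <cstddef>
#include <cstdint>

// Hypothetical hot loop: count the pixels brighter than a threshold.
std::size_t count_above(const std::uint8_t* px, std::size_t n,
                        std::uint8_t threshold) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (px[i] > threshold) {
            ++hits;
        }
    }
    return hits;
}
```

The training run in step 2 matters: the profile is only as representative as the clip and script used to produce it.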