Doom9's Forum > Capturing and Editing Video > Avisynth Development

2nd March 2010, 19:14   #61
turbojet
Registered User
 
Join Date: May 2008
Posts: 1,840
I was able to get DirectShowSource working. LoadPlugin was pointing to a 32-bit DLL.

Also, I made a batch installer that checks for an x64 OS, checks whether Avisynth x86 is installed, copies the files to the Avisynth plugins directory, and adds the registry entries. It's kind of a rough installer, but it only takes one click and gets the job done. Feel free to edit it at will and/or include it with the binaries if you want.

It's OK to use the Avisynth x86 plugins directory for x64 DLL autoloading, but the x64 DLLs need to be named differently from the x86 DLLs so that the x86 DLLs can still load.

Last edited by turbojet; 9th March 2010 at 23:00.
3rd March 2010, 00:50   #62
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
@JoshyD: Thank you. I have updated the benchmarks to reflect the more accurate avs2avi methodology. avs2avi allows the accurate measurement of even very fast filters.

@turbojet: Good job figuring out the autoload thing. I thought it would be something like that.
3rd March 2010, 02:45   #63
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
@Turbojet: Thanks for the nifty little script; I'll go ahead and stick it in the archive with future updates of Avisynth64.

@Stephen: How many threads were you allowing during those benchmarks? I'm guessing you just ran it through the normal cache, but I'm double-checking.

Also, are there any other little pieces of code that you'd like to see working with the project? I took a look at TIVTC, and it's a bit of a beast to convert. Are there any other day-to-day plugins that would be beneficial?

I'm thinking I might optimize some of those hot spots where a single thread gets "stuck". I think there are some speed gains hidden in removing the core program's reliance on the MMX register set. All 64-bit processors have 16 XMM registers, 6 of which (XMM0-XMM5) are volatile across function calls under the Windows x64 calling convention, meaning we can do whatever we want with them without saving them first.

The BitBlt functions move data in 64-bit chunks when they could be moving it in 128-bit chunks.

I at least want to get the memory copy functions using the XMM registers instead of the MMX ones.
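The 128-bit copy idea can be sketched in C++ with SSE2 intrinsics. This is a hypothetical `bitblt_xmm`, not the project's code; it only borrows the argument order of Avisynth's `BitBlt(dstp, dst_pitch, srcp, src_pitch, row_size, height)`. It uses unaligned loads/stores so it assumes nothing about 16-byte alignment, and the `memcpy` fallback exists only so the sketch also builds on non-x86 targets (SSE2 is always present on x86_64).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics (always available on x86_64)
#define HAVE_SSE2 1
#endif

// Hypothetical plane copy: copies `height` rows of `row_size` bytes between
// two buffers that each have their own pitch (bytes from one row to the next).
void bitblt_xmm(std::uint8_t* dstp, int dst_pitch, const std::uint8_t* srcp,
                int src_pitch, int row_size, int height) {
  assert(row_size >= 0 && height >= 0);
  for (int y = 0; y < height; ++y) {
    int x = 0;
#ifdef HAVE_SSE2
    // 16 bytes (one XMM register) per iteration instead of 8 (one MMX register).
    // Unaligned load/store, so neither pointer needs 16-byte alignment.
    for (; x + 16 <= row_size; x += 16) {
      __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(srcp + x));
      _mm_storeu_si128(reinterpret_cast<__m128i*>(dstp + x), v);
    }
#endif
    // Remaining tail bytes (or the whole row when SSE2 isn't available).
    std::memcpy(dstp + x, srcp + x, static_cast<std::size_t>(row_size - x));
    srcp += src_pitch;
    dstp += dst_pitch;
  }
}
```

Bytes between `row_size` and the pitch (the row padding) are never touched, which matches how a plane copy should treat the padding.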
3rd March 2010, 04:09   #64
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
All tests were strictly single-threaded. Neither MT() nor SetMTMode() were used, and only single-threaded filters were used.

If you can't get TIVTC, then perhaps:

Stuff I use frequently (descending order of priority):
  • DGDecode
  • GradFun2DB
  • dfttest
  • AddGrainC
  • ColorMatrix
  • FFmpegSource2
  • aWarpSharp (SEt version)
  • Vinverse (DLL version, but has working script equivalent)

Stuff I have but don't really use much (unsorted):
  • Average
  • BiFrost
  • ChromaShift
  • DeGrainMedian
  • Gavino's Runtime and Script extensions
  • RawSource
  • SSIQ (YV12 mod)
  • TComb
  • TTempSmooth

Stuff I use that is closed source:
  • nnedi/nnedi2
  • Checkmate (needs to be reimplemented anyway)
  • DSS2 (dead)

Last edited by Stephen R. Savage; 3rd March 2010 at 06:07.
5th March 2010, 23:45   #65
Adub
Fighting spam with a fish
 
Join Date: Sep 2005
Posts: 2,699
Is it possible to run 32bit plugins with 64 bit Avisynth? Or will all plugins require a rewrite to work correctly with 64bit Avisynth?
__________________
FAQs:Bond's AVC/H.264 FAQ
Site:Adubvideo
6th March 2010, 14:27   #66
kemuri-_9
Compiling Encoder
 
Join Date: Jan 2007
Posts: 1,348
Quote:
Originally Posted by Adub
Is it possible to run 32bit plugins with 64 bit Avisynth? Or will all plugins require a rewrite to work correctly with 64bit Avisynth?
The operating system doesn't allow x86 and x86_64 code to be mixed: a 64-bit process can't load a 32-bit DLL.
If the plugin uses inline asm, it'll need to be rewritten, since MSVC doesn't support inline assembly for x64 targets
(and most plugins use inline asm instead of a separate assembler like yasm/nasm).

If it only uses C/C++, it can (generally) just be recompiled for x86_64
(this depends on how many assumptions the code makes, and whether it uses code that breaks when the size of some basic types changes).
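A small illustration of the kind of type-size assumption meant here, as a C++11 sketch with hypothetical helper names: on x86_64, `int` stays 32-bit (so does `long` on Windows) while pointers grow to 64-bit, so pointer math belongs in `std::uintptr_t` / `std::ptrdiff_t` rather than `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// uintptr_t is defined to be able to hold a pointer; int (32-bit on both Win64
// and x86_64 Linux) and long (32-bit on Win64) are not.
static_assert(sizeof(std::uintptr_t) >= sizeof(void*), "uintptr_t must hold a pointer");

// Distance of p past the previous 16-byte boundary, computed through uintptr_t
// instead of casting the pointer to int.
inline unsigned misalignment16(const void* p) {
  return static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(p) & 15u);
}

// Byte offset of row y in a plane. Multiplying in ptrdiff_t (64-bit on x86_64)
// keeps offsets past 2 GiB from overflowing the way an int product would.
inline std::ptrdiff_t row_offset(int pitch, int y) {
  assert(y >= 0);
  return static_cast<std::ptrdiff_t>(pitch) * y;
}
```

Code written as `(int)ptr & 15` or `pitch * y` in plain `int` is exactly the kind of thing that compiles fine for x86 and then warns, fails to compile, or overflows on x86_64.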

If you want Avisynth x86 and x86_64 to work together, the available option is to use TCPDeliver to deliver frames between them over socket connections.
__________________
custom x264 builds & patches | F@H | My Specs

Last edited by kemuri-_9; 6th March 2010 at 14:33.
6th March 2010, 19:36   #67
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
By the way, I'm reporting that the version of FFT3DFilter in the first post of this thread does not work. It fails with "could not load plugin fft3dfilter.dll". I have tried with fftw3.dll in system32, in the application directory, and in another global/path directory.

Also, instead of making FFT3DFilter link against fftw3.dll, why not use the default name libfftw3f-3.dll, so that the DLL can be shared with dfttest (if you ever port it)?
6th March 2010, 20:54   #68
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Quote:
Originally Posted by Stephen R. Savage
Also, instead of making FFT3DFilter link against fftw3.dll, why not use the default name libfftw3f-3.dll, so that the DLL can be shared with dfttest (if you ever port it).
Because I was being lazy... I'll change the library name back to the original. I wonder whether I included the wrong version of fftw3.dll in the RAR. I can't check that right now, but if you have a free second, could you grab the original package here and rename the DLL?

I had it up and running on my system... I'll get back to my development machine and check it out.

Also, in other fun news, I've ported the vertical resizers to use SSE registers instead of MMX ones and moved them out of inline assembly. I'm almost done moving the default inline-assembly horizontal routines into their own assembly functions.

You'd be surprised how much speed you can gain by just not using inline assembler and going with straight up assembly instead.

After that's all done, I think the slowest internal Avisynth function will be BitBlt, and then it's on to the plugins for more optimizations.
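For reference, here is a plain (non-SIMD) C++ sketch of what a vertical FIR resizer computes for one 8-bit plane. The pattern layout, the 14-bit fixed-point coefficients and the name `resize_v_plane` are assumptions for illustration, not Avisynth's exact internal format; an SSE version would compute the inner `x` loop for many pixels at once in XMM registers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference for a vertical FIR resize of one 8-bit plane.
// Assumed pattern layout: pattern[0] = fir_filter_size, then for each output row
// its first source row followed by fir_filter_size coefficients in 14-bit fixed
// point (each row's coefficients summing to 1 << 14).
void resize_v_plane(const std::uint8_t* srcp, int src_pitch, std::uint8_t* dstp,
                    int dst_pitch, int row_size, int dst_height,
                    const std::vector<int>& pattern) {
  assert(!pattern.empty());
  const int fir_filter_size = pattern[0];
  assert(pattern.size() >= 1 + static_cast<std::size_t>(dst_height) * (1 + fir_filter_size));
  const int* cur = pattern.data() + 1;
  for (int y = 0; y < dst_height; ++y) {
    const int start = *cur++;  // first source row feeding this output row
    const int* coeff = cur;    // fir_filter_size taps for this row
    cur += fir_filter_size;
    for (int x = 0; x < row_size; ++x) {
      int sum = 0;
      for (int k = 0; k < fir_filter_size; ++k)
        sum += srcp[(start + k) * src_pitch + x] * coeff[k];
      // Round, drop the 14 fractional bits (arithmetic shift on mainstream
      // compilers), then clamp negative lobes and overshoot to 0..255.
      sum = (sum + (1 << 13)) >> 14;
      dstp[y * dst_pitch + x] = static_cast<std::uint8_t>(std::min(std::max(sum, 0), 255));
    }
  }
}
```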
6th March 2010, 20:56   #69
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Quote:
Originally Posted by Adub
Is it possible to run 32bit plugins with 64 bit Avisynth? Or will all plugins require a rewrite to work correctly with 64bit Avisynth?
The response you got earlier is quite correct, but I was wondering which plugin in particular you were interested in?
6th March 2010, 21:38   #70
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
JoshyD, the MD5 of my fftw3.dll matches the one from the main page (Win64 version, of course).

Also, I believe the resizer functions were optimized for SSE2 in the 2.6 branch. Hopefully your optimizations won't be for nothing and can be generalized to the CVS version.
7th March 2010, 14:52   #71
squid_80
Registered User
 
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
Softwire was alive a few years ago, when 64-bit support was added: http://cvs.gna.org/cvsweb/softwire/?cvsroot=softwire

For benchmarking, that is what the "play" command of avsutil is for.
7th March 2010, 19:33   #72
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
@Stephen

The optimizations shouldn't all be lost. The only difference is that mine aren't dynamically generated like they are in the new branch. Currently, I'm implementing them with function pointers to different resizers depending on the FIR filter size, which depends on the filter. For example, Lanczos4Resize has an FIR filter size of 8. It's possible to generate larger FIR filters, but those are rare. I'm still kicking around ideas on how best to handle these cases.

The trade-off here is that the compiled code is larger, but since it's generated at compile time, there's no one-time cost of generating functions during filter instantiation. Will it really make a difference? I'm not really sure right now.

To give an idea of the difference in execution, I do something along these lines:

Code:
switch (plane) 
		{
			case 2:  // take V plane
				cur = resampling_patternUV;
				fir_filter_size = *cur++;
				src_pitch = src->GetPitch(PLANAR_V);
				dst_pitch = dst->GetPitch(PLANAR_V);
				xloops = ((src->GetRowSize(PLANAR_V_ALIGNED)+15) / 16)*16;  // Round to multiple of 16
				dstp = dst->GetWritePtr(PLANAR_V);
				srcp = src->GetReadPtr(PLANAR_V);
				y = dst->GetHeight(PLANAR_V);
				yOfs2 = this->yOfsUV;
				(((INT_PTR)srcp&15) || (src_pitch &15)) ? ua_proc_uvplane(srcp, dstp, src_pitch, dst_pitch, y, xloops, yOfs2, cur)
								:a_proc_uvplane(srcp, dstp, src_pitch, dst_pitch, y, xloops, yOfs2, cur);
			break;

			case 1: // U Plane
				cur = resampling_patternUV;
				fir_filter_size = *cur++;
				dstp = dst->GetWritePtr(PLANAR_U);
				srcp = src->GetReadPtr(PLANAR_U);
				y = dst->GetHeight(PLANAR_U);
				src_pitch = src->GetPitch(PLANAR_U);
				dst_pitch = dst->GetPitch(PLANAR_U);
				xloops = ((src->GetRowSize(PLANAR_U_ALIGNED)+15) / 16)*16;  // Round to multiple of 16
				yOfs2 = this->yOfsUV;
				plane--; // skip case 0
				(((INT_PTR)srcp&15) || (src_pitch &15)) ? ua_proc_uvplane(srcp, dstp, src_pitch, dst_pitch, y, xloops, yOfs2, cur)
								:a_proc_uvplane(srcp, dstp, src_pitch, dst_pitch, y, xloops, yOfs2, cur);
			break;
			
			case 3: // Y plane for planar: no break, falls through to the shared Y path (as in the 2.6 branch)
			
			case 0: // Default for interleaved
				(((INT_PTR)srcp&15) || (src_pitch &15)) ? ua_proc_yplane(srcp, dstp, src_pitch, dst_pitch, y, xloops, yOfs2, cur)
								:a_proc_yplane(srcp, dstp, src_pitch, dst_pitch, y, xloops, yOfs2, cur);
			break;

			default:
				
			break;
		}
The 2.6 branch does something along these lines:

Code:
switch (plane) {
      case 2:  // take V plane
        src_pitch = src->GetPitch(PLANAR_V);
        dst_pitch = dst->GetPitch(PLANAR_V);
        dstp = dst->GetWritePtr(PLANAR_V);
        srcp = src->GetReadPtr(PLANAR_V);
        y = dst->GetHeight(PLANAR_V);
        yOfs2 = this->yOfsUV;
        (((int)srcp&15) || (src_pitch &15)) ? assemblerUV.Call() : assemblerUV_aligned.Call();
        break;
      case 1: // U Plane
        dstp = dst->GetWritePtr(PLANAR_U);
        srcp = src->GetReadPtr(PLANAR_U);
        y = dst->GetHeight(PLANAR_U);
        src_pitch = src->GetPitch(PLANAR_U);
        dst_pitch = dst->GetPitch(PLANAR_U);
        yOfs2 = this->yOfsUV;
        plane--; // skip case 0
        (((int)srcp&15) || (src_pitch &15)) ? assemblerUV.Call() : assemblerUV_aligned.Call();
        break;
      case 3: // Y plane for planar
      case 0: // Default for interleaved
        (((int)srcp&15) || (src_pitch &15)) ? assemblerY.Call() : assemblerY_aligned.Call();
        break;
    }
The real difference is in how the functions are called; the underlying code is very similar. As far as I can tell, assemblerY.Call() and ua_proc_yplane(...vars...) execute very similar code.
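The approach described in this post (one resizer per FIR filter size, generated at compile time and chosen through a function pointer) could look roughly like this in C++. It's a hypothetical sketch: the tap counts handled, the kernel signature and the names are illustrative only, and a real resizer would run SIMD loops rather than computing one sample per call.

```cpp
#include <cassert>
#include <cstdint>

// Every kernel shares one signature so it can be stored in a function pointer.
using FirKernel = int (*)(const std::uint8_t* src, int src_pitch, const int* coeff, int taps);

// One instantiation per tap count, generated at compile time; because Taps is a
// constant, the compiler is free to fully unroll the loop.
template <int Taps>
int fir_fixed(const std::uint8_t* src, int src_pitch, const int* coeff, int /*taps*/) {
  int sum = 0;
  for (int k = 0; k < Taps; ++k) sum += src[k * src_pitch] * coeff[k];
  return sum;
}

// Generic fallback for the rare, larger filter sizes; the tap count is a runtime value.
int fir_generic(const std::uint8_t* src, int src_pitch, const int* coeff, int taps) {
  assert(taps > 0);
  int sum = 0;
  for (int k = 0; k < taps; ++k) sum += src[k * src_pitch] * coeff[k];
  return sum;
}

// Chosen once when the filter is constructed, not per pixel or per frame.
FirKernel select_fir_kernel(int fir_filter_size) {
  switch (fir_filter_size) {
    case 2: return fir_fixed<2>;
    case 4: return fir_fixed<4>;
    case 6: return fir_fixed<6>;
    case 8: return fir_fixed<8>;  // e.g. the Lanczos4Resize size mentioned in this post
    default: return fir_generic;
  }
}
```

Compared with code generated at runtime, nothing is assembled when the filter is instantiated; the price is one compiled copy of the kernel per supported size.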

I checked my FFT3DFilter, and it still works. I think I may have been building it oddly: I noticed some Intel-compiler-specific include directories in my configuration, so I removed them and rebuilt. Would you humor me and try this build?

@squid_80
Softwire does indeed have 64-bit support, but it's apparently lacking. I've noticed that the main team of Avisynth guys has been updating the 32-bit Softwire as they go. I guess I just feel more comfortable writing hard-coded assembly than dealing with Softwire, which will probably come back to bite me later; we'll see.

I can see myself breaking down and starting to implement the same things with Softwire64; it wouldn't be much of a stretch.
7th March 2010, 19:57   #73
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
Oh, figured it out:

Quote:
Activation context generation failed for "FFT3DFILTER.DLL". Dependent Assembly Microsoft.VC90.CRT,processorArchitecture="amd64",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.21022.8" could not be found. Please use sxstrace.exe for detailed diagnosis.
There is also a dependency on libmmd.dll and ieshims.dll.

Edit: It loads and works fine after I installed the MSVC 2008 SP1 runtimes and a googled copy of libmmd.dll. Hopefully you can resolve the dependencies on your end as well (static linking?).

Last edited by Stephen R. Savage; 7th March 2010 at 20:13.
7th March 2010, 20:28   #74
Adub
Fighting spam with a fish
 
Join Date: Sep 2005
Posts: 2,699
Quote:
Originally Posted by JoshyD
The response you got earlier is quite correct, but I was wondering which plugin in particular you were interested in?
Well, I have been considering installing Windows 7 x64 for some time, as the website I run (Adubvideo) gives a number of tutorials, and on some of these my readers have been having issues when running BD Rebuilder on Windows 7.

Since I'm a bleeding-edge kind of guy, I thought, let's go 64-bit! But I wanted to be sure that I could still get the most out of my current encoding chains.

I currently don't have a particular filter in mind, but then I haven't gone through all of my plugins to check which ones are compatible and which aren't. Since more and more 64-bit support is being added, I think I'll go ahead and make the switch in the next few weeks.

I'll be back with plugin suggestions soon after that, I'm sure!

And thank you so much for your hard work! It is greatly appreciated right now with the big switch to 64 bit operating systems and processors.
__________________
FAQs:Bond's AVC/H.264 FAQ
Site:Adubvideo
8th March 2010, 01:18   #75
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
Yeah, static linking would be pretty helpful; somehow I missed that one when setting my build options. It's fixed now, so I think we're good to go.
9th March 2010, 21:07   #76
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
New build is up on the main page, please download it and let me know if I've missed any oddities in my test cases.

Changes are listed on the front page; there are some nice speed gains to be had in the main DLL. Running single-threaded TempGaussMC beta2 through avs2avi with null output, using an old home movie recorded in DVSD (720x480, 29.97fps, interlaced, BFF) as the source:

Avisynth32: 5.05fps
Avisynth64: 6.09fps

Hooray, a whole extra frame every second! It may not seem like much, but for a long encode (each run took ~40 mins, give or take depending on the version used), an extra frame every second can shave a significant amount off your total encode time.

Add some good ol' SetMTMode(2) into the mix, and you've got a larger gap (same test):

Avisynth32: 13.79fps
Avisynth64: 17.66fps

Still not setting any speed records, but with a slow script (that produces great results) I'll take my speed gains, however minor they may be. It's script dependent, but I'd say speed increases are in the 15% to 20% range on average.
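The "extra frame every second" can be put in numbers: processing time is frames / fps, so assuming a run of about 40 minutes at the 32-bit speed of 5.05 fps (roughly 12,120 frames), the same frames at 6.09 fps take about 33 minutes, roughly 17% (about 7 minutes) less. A minimal C++ sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cassert>

// Relative speed in percent: new throughput divided by old throughput.
double relative_speed_pct(double new_fps, double old_fps) {
  assert(old_fps > 0.0);
  return 100.0 * new_fps / old_fps;
}

// Wall-clock minutes needed to process `frames` frames at a steady `fps`.
double encode_minutes(double frames, double fps) {
  assert(fps > 0.0);
  return frames / fps / 60.0;
}
```

The same `relative_speed_pct` gives about 120.6% for the single-threaded figures and about 128% for the SetMTMode(2) ones (17.66 vs 13.79 fps).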

Last edited by JoshyD; 9th March 2010 at 21:18.
9th March 2010, 22:27   #77
Stephen R. Savage
Registered User
 
Join Date: Nov 2009
Posts: 327
Quick tests:

Decode 640x480 Xvid (MS MPEG-4 Decoder, 500 frames)
32-bit: 113.27 fps
64-bit: 118.20 fps
Relative Speed: 104.4%

Spline36Resize 16x Enlarge (500 frames)
32-bit: 18.06 fps
64-bit: 31.61 fps
Relative Speed: 175.0%

TempGaussMC beta 2 (EEDI2, 100 fields)
32-bit: 3.42 fps
64-bit: 3.72 fps
Relative Speed: 108.9%

MDeGrain3 (200 frames)
32-bit: 5.96 fps
64-bit: 6.54 fps
Relative Speed: 109.7%

AAA (100 frames)
32-bit: 2.17 fps
64-bit: 2.20 fps (MTMode = 1)
Relative Speed: 101.3%

All tests were run three times and averaged. I am not seeing the performance increases in TGMC that you are, JoshyD. However, resize performance is definitely up! The caching bug related to AAA is still not fixed. I suspect this bug is sapping performance out of other scripted filters (read: TGMC) as well. Note that the SetMTMode hack is suboptimal, as it costs performance when the caching bug is not in play (e.g. a simple source+resize script).

Last edited by Stephen R. Savage; 9th March 2010 at 22:29.
9th March 2010, 22:59   #78
turbojet
Registered User
 
Join Date: May 2008
Posts: 1,840
Thanks for the new version; however, it crashes on an Athlon II. Here are the Windows error details:

Code:
Problem signature:
  Problem Event Name:	APPCRASH
  Application Name:	x264_x64.exe
  Application Version:	0.0.0.0
  Application Timestamp:	4b8c1206
  Fault Module Name:	avisynth.DLL
  Fault Module Version:	2.5.8.5
  Fault Module Timestamp:	4b9681a6
  Exception Code:	c0000005
  Exception Offset:	0000000000005528
  OS Version:	6.1.7600.2.0.0.256.1
  Locale ID:	1033
  Additional Information 1:	9eab
  Additional Information 2:	9eabb149e34b0e02564736c484278831
  Additional Information 3:	6bcd
  Additional Information 4:	6bcdaf28393e1989487185c90748dcec
I noticed the new DLL is 4 MB, while the older one that works is 800 KB.

Also, since all the 64-bit filters I could find are named identically to their 32-bit counterparts, I changed the install script to use a plugins64 directory. You can grab it here.

Another thing: as far as I know, there's only one Haali Media Splitter build that's x64 and handles VC-1. It's very difficult to find, but I uploaded it here. It's up to you whether to add it to the original post, in case people aren't able to use HMS x64 or report issues with VC-1 (from the latest official release).

Lastly, about filters: I use TIVTC heavily, so it's unfortunate that it's not easy to convert. The other ones I use every once in a while are:
Decomb v4 (more effective IVTC than v5 but nowhere near TIVTC; squid's build is v5 only)
LeakKernelDeint (fast, simple, sharp deinterlacer)
RePAL (very effective for handling PAL sources that were blended for NTSC DVDs)

Last edited by turbojet; 9th March 2010 at 23:10.
10th March 2010, 00:14   #79
JoshyD
Registered User
 
Join Date: Feb 2010
Posts: 84
@Stephen: Let TGMC (and all the filters, really) have a few thousand frames to work with; 100 frames don't give the caching mechanisms in Avisynth (or the computer as a whole) time to fill up and get ready to go. All that data has to be pulled closer and closer to the processor before any computationally intensive algorithm can really shine. If the needed data isn't close at hand, you're going to be memory-latency limited rather than compute limited. I've been running tests with TGMC and my own personal builds to collect usage data in the program. That being said, using a larger sample can sometimes accentuate the differences between the two versions. For example, a 2002-frame sample shows slightly larger differences than just letting it run through the first few hundred frames.

Avisynth 32: 5.54fps
Avisynth 64: 6.60fps
Relative increase: 119%
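Besides using a longer sample, another common way to keep cache warm-up out of a benchmark is to process some frames untimed before starting the clock. A hypothetical C++ harness sketch; `process_frame` stands in for requesting frame n from the clip, and the warm-up and measured counts are up to the tester:

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical benchmark harness: runs `warmup` frames untimed so the frame cache
// and CPU caches can fill, then times the next `measured` frames and returns fps.
double measure_fps(const std::function<void(int)>& process_frame, int warmup, int measured) {
  assert(warmup >= 0 && measured > 0);
  int frame = 0;
  for (; frame < warmup; ++frame) process_frame(frame);  // not timed
  const auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < measured; ++i, ++frame) process_frame(frame);
  const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - t0;
  return elapsed.count() > 0.0 ? measured / elapsed.count() : 0.0;
}
```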

The caching bug is annoying; can you post the exact script and file for AAA that you're working with?

@turbojet: That's a memory access violation (exception code c0000005). Can you post the script that you were running when it occurred? A short sample of the clip you were using would be useful as well; I need to get the pitch, height, width, etc. from it. I haven't coded any instructions that are incompatible with your processor; however, I may very well have mucked up my aligned memory accesses. I think it probably occurred in the horizontal resize function, but I can't be certain.

The DLL linked here is ridiculous in size because of the compiler that generated it and the options I allowed. Intel's C++ compiler will generate a specific code path for every Intel processor from the P4 onward. At runtime, the code queries your processor with CPUID, and if you're lucky enough to have VendorID = 'GenuineIntel', you'll get a special version of the code optimized for your particular processor and its idiosyncrasies. That extra code for Intel processors, in addition to some statically linked OpenMP code, balloons the size of the DLL.

Don't worry about AMD processors, though: the base code path is for any processor that has SSE3 or newer. While Intel's compiler won't auto-vectorize using SSE4 instructions for AMD processors, it will still give them the benefit of data operations that can be performed with SSE3 and older SIMD instruction sets.
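The vendor check described here can be illustrated with a sketch of CPUID-based dispatch. It reads the vendor string with GCC/Clang's `<cpuid.h>` (`__get_cpuid`) or MSVC's `__cpuid`; `pick_code_path` and its path names are hypothetical and only mimic the behavior described in this post, not the Intel compiler's actual dispatcher. In real code `has_sse3` would come from CPUID leaf 1, ECX bit 0.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   // GCC/Clang
#elif defined(_M_X64) || defined(_M_IX86)
#include <intrin.h>  // MSVC
#endif

// Reads the 12-character CPUID vendor string (e.g. "GenuineIntel", "AuthenticAMD").
// Returns an empty string where CPUID isn't available.
std::string cpu_vendor() {
  unsigned int regs[4] = {0, 0, 0, 0};  // EAX, EBX, ECX, EDX from leaf 0
#if defined(__x86_64__) || defined(__i386__)
  if (!__get_cpuid(0, &regs[0], &regs[1], &regs[2], &regs[3])) return std::string();
#elif defined(_M_X64) || defined(_M_IX86)
  int r[4];
  __cpuid(r, 0);
  for (int i = 0; i < 4; ++i) regs[i] = static_cast<unsigned int>(r[i]);
#else
  return std::string();
#endif
  // The vendor string is stored in EBX, EDX, ECX order.
  char vendor[13] = {0};
  std::memcpy(vendor, &regs[1], 4);
  std::memcpy(vendor + 4, &regs[3], 4);
  std::memcpy(vendor + 8, &regs[2], 4);
  return std::string(vendor);
}

// Hypothetical path selection mimicking the behavior described above.
const char* pick_code_path(const std::string& vendor, bool has_sse3) {
  assert(vendor.empty() || vendor.size() == 12);  // CPUID vendor strings are 12 chars
  if (!has_sse3) return "unsupported";            // the base path assumes SSE3
  if (vendor == "GenuineIntel") return "intel-tuned";
  return "generic-sse3";
}
```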

Last edited by JoshyD; 10th March 2010 at 00:24.
10th March 2010, 00:51   #80
turbojet
Registered User
 
Join Date: May 2008
Posts: 1,840
Source is a 1920x1080 M2TS; the script is: DirectShowSource().AssumeFPS(24000,1001).LanczosResize(1280,720)

It's definitely an issue with resizing: if I don't resize, it works without crashing. I tried BilinearResize, PointResize, and BicubicResize, and all of them crashed.

All times are GMT +1.