SEt's Avisynth 2.5.8 MT compiled for *X86_64*, Latest Build 4/16/2010 - Page 5

JoshyD · 10th March 2010, 01:35

@turbojet

Would you try the latest release on the first page to see if that cleared up any of the problems?

If that doesn't, would you try the following two procedures:
1. Only resize on the vertical axis
2. Only resize on the horizontal axis

Hopefully, only one of those will crash the program, if at all.

I'm guessing DirectShowSource is giving you a yv12 stream?

turbojet · 10th March 2010, 02:11

With same 1920x1080 source
LanczosResize(1280,720) - crash
LanczosResize(1920,720) - crash
LanczosResize(1280,1080) - works

yes yv12

JoshyD · 10th March 2010, 03:43

@turbojet
I can't seem to recreate the crash. I specifically wrote the vertical resize functions to check for memory alignment before executing. I hope this isn't processor specific, that would be a bummer. Checking for instruction support, your Athlon II should have all the goods to do the resize correctly. Any chance you could snip a few frames (100?) of the source and post it somewhere so I can investigate further?
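For readers following along, the kind of alignment check mentioned above might look roughly like the sketch below. It is only an illustration, not the code from this build: `is_aligned_address`, `is_aligned` and `can_use_aligned_path` are hypothetical names, and the 16-byte figure assumes aligned 128-bit loads.

```cpp
#include <cstddef>
#include <cstdint>

// True when addr is a multiple of alignment (alignment must be a power of two).
constexpr bool is_aligned_address(std::uintptr_t addr, std::size_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// Pointer wrapper; not constexpr because reinterpret_cast is not allowed there.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return is_aligned_address(reinterpret_cast<std::uintptr_t>(p), alignment);
}

// Hypothetical guard: aligned 128-bit loads are only safe on every row when
// both the first row and the pitch (bytes per row) are multiples of 16.
inline bool can_use_aligned_path(const unsigned char* src, int pitch) {
    return is_aligned(src, 16) && (pitch % 16) == 0;
}
```

A resizer written this way would take the aligned SIMD path only when the guard returns true and use unaligned loads or a scalar loop otherwise.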

@Stephen
That cache bug is annoying . . . but interesting. It seems that it also exists in SEt's 32bit build of avisynth 2.5.8 as well, perhaps something got all strange when the MT mode was hacked to be supported?

turbojet · 10th March 2010, 06:31

Every source crashes instantly with horizontal resize, so I don't think a source would help, and I'm afraid it's a CPU instruction issue. One thing that might help: x264 doesn't use SSE3 on this CPU. This is what it uses: MMX2 SSE2Fast FastShuffle SSEMisalign LZCNT. Also, a few months ago I was testing icc x264 builds and found some that worked and some that didn't. I believe the ones that didn't work were compiled with -QaxSSE3, but I'm not 100% sure on that.

kemuri-_9 · 10th March 2010, 14:16

Quote:

Originally Posted by turbojet

Every source crashes instantly with horizontal resize, so I don't think a source would help, and I'm afraid it's a CPU instruction issue. One thing that might help: x264 doesn't use SSE3 on this CPU. This is what it uses: MMX2 SSE2Fast FastShuffle SSEMisalign LZCNT. Also, a few months ago I was testing icc x264 builds and found some that worked and some that didn't. I believe the ones that didn't work were compiled with -QaxSSE3, but I'm not 100% sure on that.

what cpu do you have again (codename preferred)?

SSE3 is barely used in x264, the majority of the SSE3 related asm actually uses SSSE3.
that being said, x264 does have some SSE3 asm but it only uses these for CPUs that flag Cacheline64, which is only on intel processors.

tl;dr
AMD processors never see SSE3 getting used in x264 even if they have it.
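The gating described above could be sketched as follows. This is not x264's actual code: the flag names and bit values are invented for illustration, and the only thing taken from the post is the rule that SSE3 routines are chosen only when the CPU is also flagged as Cacheline64.

```cpp
#include <cstdint>

// Hypothetical capability bits; NOT x264's real flag names or values.
enum CpuFlag : std::uint32_t {
    FLAG_SSE2         = 1u << 0,
    FLAG_SSE3         = 1u << 1,
    FLAG_SSSE3        = 1u << 2,
    FLAG_CACHELINE_64 = 1u << 3
};

// SSE3 routines are selected only when SSE3 and the cacheline flag are both
// set, so a CPU that reports SSE3 without the flag stays on other code paths.
constexpr bool use_sse3_routines(std::uint32_t flags) {
    return (flags & FLAG_SSE3) != 0 && (flags & FLAG_CACHELINE_64) != 0;
}
```

Under that rule a CPU reporting SSE2 and SSE3 but not the cacheline flag never runs the SSE3 routines, which matches the tl;dr above.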

Stephen R. Savage · 10th March 2010, 17:11

Quote:

Originally Posted by JoshyD

@turbojet
I can't seem to recreate the crash. I specifically wrote the vertical resize functions to check for memory alignment before executing. I hope this isn't processor specific, that would be a bummer. Checking for instruction support, your Athlon II should have all the goods to do the resize correctly. Any chance you could snip a few frames (100?) of the source and post it somewhere so I can investigate further?

@Stephen
That cache bug is annoying . . . but interesting. It seems that it also exists in SEt's 32bit build of avisynth 2.5.8 as well, perhaps something got all strange when the MT mode was hacked to be supported?

Here is AAA.avsi in full:

Code:

function AAA(clip input, int "type", bool "mask", bool "chroma")
{
	ox = width(input)
	oy = height(input)

	type = default(type, 1)
	mask = default(mask, true)
	chroma = default(chroma, false)

	gscale = chroma ? input : Greyscale(input)

	aa = type >= 2 ? nnedi2_rpow2(gscale, rfactor=2) :
		\ type == 1 ? TurnRight(gscale).EEDI2(field=1).TurnLeft().EEDI2(field=1) :
		\ PointResize(gscale, ox * 2, oy * 2).TurnRight().SangNom().TurnLeft().SangNom()

	edge = mt_didee(aa).Spline36Resize(ox, oy, -0.5, -0.5, 2 * ox, 2 * oy)
	ds = Spline36Resize(aa, ox, oy, -0.5, -0.5, 2 * ox, 2 * oy)
	maskmerge = mask ? mt_merge(input, ds, edge, U=1, V=1) : ds

	return chroma ? MergeChroma(maskmerge, ds) : MergeChroma(maskmerge, input)
}

function mt_didee(clip input)
{
	mask = mt_logic(mt_edge(input, "5 10 5 0 0 0 -5 -10 -5 4", 0, 255, 0, 255),
		\ mt_edge(input, "5 0 -5 10 0 -10 5 0 -5 4", 0, 255, 0, 255), "max").Greyscale().
		\ Levels(0, 0.8, 128, 0, 255, false)
	return mask
}

I just call it as

Quote:

DirectShowSource("source.avi")
AAA()

adding "chroma=true" makes the cache bug even worse. I don't use MT myself, so this cache bug is quite a bummer. Hope you can find it, even if it's not a bug in your own code. Perhaps it will even turn out to be an intractable design flaw (argh).

Also, many older AMDs do not support SSE3, but only up to SSE2 (slowly).

Wilbert · 10th March 2010, 17:32

Quote:

Also, many older AMDs do not support SSE3, but only up to SSE2 (slowly).

My old Athlon XP doesn't support SSE2, but only iSSE (or perhaps SSE dunno).

Stephen R. Savage · 10th March 2010, 17:48

According to Wikipedia, the oldest AMD64 CPU (Opteron, 130nm) supports SSE2. However, turbojet's Athlon II appears to support SSE3, so who knows...

JoshyD · 10th March 2010, 18:32

@Stephen
You're quite correct, which is why I was confused. I don't think his processor agrees with loading the values of some of the arithmetic functions straight from memory. For now, I'm going to band-aid the code paths to use MMX if a non-compatible CPU turns up. I had totally forgotten that the Athlon64s only supported SSE2. I was happily thinking that much of the feature checking before function execution was going to go away because x64 processors generally have the latest and greatest when it comes to SIMD instructions. Looks like that's not the case. I've got an old Athlon64 I rarely use, so I guess I'll be turning it on to run test vectors before any future code gets loose.

It's weird because he can use the horizontal resize functions, which make heavy use of 128bit memory transfers to set up their workspaces. The movdqa and movdqu instructions are part of SSE2, and those are most certainly used when resizing YV12 or YUY2 horizontally.

The cache issue remains open, and I'm able to perfectly recreate it using any source. Looking intensively over a diff with the core Avisynth files hasn't turned up anything of interest yet. This test case seems to be the only one that really highlights the problem. Running other filters, internal and external, with and without a SetMTMode command doesn't give such a stark contrast in performance. I'm wondering a) why this filter combination is a showstopper and b) what script environment variables aren't set unless you allow a MT mode to be set. The largest difference I could find is:

Code:

if ((env->GetMTMode(false) > 0) && (env->GetMTMode(false) < 5)) {
  filter_graph = new CacheMT1(new Distributor(filter_graph, env), env);
}
else {
  filter_graph = Cache::Create_Cache(AVSValue(filter_graph), 0, env).AsClip();
}

This occurs on script instantiation. I'll keep looking. On a sidenote, building and using the current 2.6 allows MT mode to be set, but the performance is abysmal. All non-MT functionality is perfect though.

@kemuri-_9
Thanks for the heads up on the x264 methodology of choosing instructions based on cacheline size. This makes me wonder if turbojet's CPU will return "true" when asked if it supports SSE3. If that's the case, then I'll have to add in code (yank it from the x264 cpuid functions) that checks for cache line size, and goes about its business appropriately.

Looking at the 2.6 CVS, it dynamically assembles almost identical code to what I have written for the vertical resizers. It does so based upon a check for SSE3 and SSSE3. If it's the case that Athlon II's and their brethren say they have SSE3, but fail when executing these SSE ops, the main code branch may hit a similar snag.

@turbojet:
Would you mind running CPU-z and telling me what feature flags your CPU reports? If it says SSE3, then dang, more code to write.
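As a rough sketch of what such a feature check looks at: CPUID leaf 1 returns the feature bits in ECX and EDX, and the bit positions below are the documented ones. The struct and function names are made up for this example. The register values themselves would come from something like `__cpuid` (MSVC) or `__get_cpuid` (GCC/Clang), which is left out here so the decoding can be shown on its own.

```cpp
#include <cstdint>

// Feature bits reported by CPUID leaf 1 (EAX = 1).
constexpr std::uint32_t EDX_MMX   = 1u << 23;
constexpr std::uint32_t EDX_SSE   = 1u << 25;
constexpr std::uint32_t EDX_SSE2  = 1u << 26;
constexpr std::uint32_t ECX_SSE3  = 1u << 0;
constexpr std::uint32_t ECX_SSSE3 = 1u << 9;

// Hypothetical container for the flags a resizer might care about.
struct CpuFeatures {
    bool mmx;
    bool sse;
    bool sse2;
    bool sse3;
    bool ssse3;
};

// Decode the ECX/EDX values returned by CPUID leaf 1.
constexpr CpuFeatures decode_leaf1(std::uint32_t ecx, std::uint32_t edx) {
    return CpuFeatures{
        (edx & EDX_MMX) != 0,
        (edx & EDX_SSE) != 0,
        (edx & EDX_SSE2) != 0,
        (ecx & ECX_SSE3) != 0,
        (ecx & ECX_SSSE3) != 0
    };
}
```

Note that SSE3 and SSSE3 are separate bits, so a CPU can report the first without the second.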

EDIT:

Caching bug squashed. Turns out, EEDI2 identifies itself as never wanting a cache, so when your script asks for a frame generated by EEDI2, it goes all the way back and generates it again.

Code:

  if (h_policy == CACHE_NOTHING) { // don't want a cache. Typically filters that only ever seek forward.
    __asm mov rbx,rbx  // Hack! prevent compiler from trusting ebx contents across call
    return childGetFrame(n, env);
  }

That childGetFrame executes more times than needed when your AAA script is run. This is the cause of the huge slowdown. Can anyone think of an instance where we would universally never want to save the previously generated frames? Check the main page for an update, this also *tries* to address the problems with older processors, but I don't think it's all the way yet.
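To make the effect concrete, here is a toy model of the behaviour described above. It is not Avisynth's cache: `CACHE_ALLOWED`, `kRequests` and `count_child_renders` are invented for the example, and the request pattern is only a guess at the kind of repeated access a script like AAA produces. It simply counts how often the upstream filter has to render a frame.

```cpp
#include <cstddef>

// CACHE_NOTHING mirrors the policy named above; CACHE_ALLOWED is a made-up
// stand-in for "any policy that lets frames be stored".
enum CachePolicy { CACHE_NOTHING, CACHE_ALLOWED };

constexpr int kMaxFrames = 16;

// A request pattern that asks for some frames more than once.
constexpr int kRequests[] = {0, 1, 0, 1, 2, 1};

// Returns how many times the upstream filter would have to render a frame.
template <std::size_t N>
constexpr int count_child_renders(const int (&requests)[N], CachePolicy policy) {
    bool cached[kMaxFrames] = {};
    int renders = 0;
    for (std::size_t i = 0; i < N; ++i) {
        const int n = requests[i];
        const bool cacheable =
            policy != CACHE_NOTHING && n >= 0 && n < kMaxFrames;
        if (cacheable && cached[n]) continue;  // cache hit, no work upstream
        ++renders;                             // stands in for childGetFrame(n, env)
        if (cacheable) cached[n] = true;       // remember the frame for next time
    }
    return renders;
}
```

With the sample pattern, the no-cache policy renders all six requests, while allowing the cache renders only the three distinct frames.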

turbojet · 10th March 2010, 23:15

It's an athlon II 620. CPU Instructions are: MMX(+) 3DNow(+) SSE(1,2,3,4A) x86-x64, AMD-V which I'm pretty sure is identical to Phenom II. At least linux /proc/sys/cpu has identical flags.

Avisynth 2.6 Alpha 2 works fine here, and if a dropback is needed, wouldn't it be better to drop back to SSE(2) instead? IIRC that's as far as the 2.6 branch goes, and it's a significant speedup.
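A tiered fallback along the lines suggested above could be sketched like this. The enum, the function name and the `sse3_known_good` switch are all hypothetical; nothing here is taken from the 2.5.8 or 2.6 sources.

```cpp
// Hypothetical code paths, ordered from most to least capable.
enum class ResizePath { MMX, SSE2, SSE3 };

// Pick the best path the CPU supports. sse3_known_good is an assumed
// override for CPUs that report SSE3 but misbehave on the SSE3 routines.
constexpr ResizePath pick_vertical_resize_path(bool has_sse2, bool has_sse3,
                                               bool sse3_known_good) {
    return (has_sse3 && sse3_known_good) ? ResizePath::SSE3
         : has_sse2                      ? ResizePath::SSE2
                                         : ResizePath::MMX;
}
```

The idea is that a CPU which trips on the SSE3 path drops one tier to SSE2 and only lands on the MMX register path when SSE2 is missing too.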

Stephen R. Savage · 11th March 2010, 00:04

Updated results for AAA (200 frames):

32-bit: 1.85 fps
64-bit: 1.90 fps
Relative Performance: 102.7%

A bit disappointing, since I recalled EEDI2 being 10% faster and masktools2 similarly faster. How did you fix the caching bug? Did you enforce a cache for all filters? How does the original Avisynth code handle this?

Also, here's to hoping that you manage to get TIVTC ported one day, as it's one of the Greatest Filters Of All Time™.

JoshyD · 11th March 2010, 03:45

@Stephen
The quick fix was to just *allow* a filter to be cached. EEDI2 was requesting not to be cached at all for some reason or another. Before, if the policy was not to cache a filter, it would go to the filter's get frame function and return that, and no cache related code would be executed. By disallowing the insta-return and allowing it to also check / insert the frame into the cache, AAA starts finding the frames EEDI2 previously generated, and uses those, rather than making EEDI2 do the work all over again.

I'm not sure what the difference is between SEt's build and the standard build that creates the slowdown. The MTModes are hacked into the main program, so there are some incongruities in the code. I think the internal caching mechanism is being polished little by little for 2.6, so this will be cleared up in the end. For now, the fix uses a little extra memory in certain corner cases. In general, a cache isn't a bad idea, regardless of whether or not the filter was written to only seek forward. You can write a script to access the source in any pattern you like, really. If the accesses repeat, the script will be faster.

Also, a comment on the speed differences in AAA, over 3 runs of 250 frames, I have:
32-bit: 3.28fps
64-bit: 3.69fps
Relative performance: 112.5%

Did you grab the latest build of EEDI2? It used to use the same code for memory copying as Avisynth, which I wrote to work on processors with 128bit registers, instead of 64. The memory copy alone adds a little speed bump.

TIVTC is a beast because of the hodge podge of code it consists of. It also intermingles inline asm with compiler intrinsics as a means of using the XMM registers in some cases. It can get a little ugly. If I can figure out what's causing turbojet's crash, I'll re-examine TIVTC.

@turbojet:
The latest grab of the 2.6CVS has vertical resize code that checks for SSE3 support and then uses the same combination of instructions I do. I wish I had a similar machine to test on, so I could trace during run time. When I said fall back to MMX, I meant the mmx registers. They're half the size of the SSE registers, but technically, iSSE instructions use them. I'd drop back to that code in the case of an incompatible processor. The conundrum is why is your SSE1-4A supporting processor balking at the code? I also wonder what x264 is using to red flag your CPU to restrict it to SSE2, perhaps the vendor ID?

Do the older versions of the DLL let you resize? I hadn't changed ANY of the resize code at that point. I guess that's a good sanity check point. Try this one or the fail safe is this build of the source. If that doesn't work, there's something deeper behind the problem.

Has anyone been able to get this to run correctly on AMD CPUs?

kemuri-_9 · 11th March 2010, 04:48

Quote:

Originally Posted by JoshyD

I also wonder what x264 is using to red flag your CPU to restrict it to SSE2, perhaps the vendor ID?

look at common/cpu.c and common/x86/cpu-a.asm

osgZach · 11th March 2010, 15:43

Does anyone have 32 vs 64 bit performance numbers for TempGaussMC_beta2 ? (assuming it runs.. Stephen mentioned TGMC but not which version he's using, I don't think)

Stephen R. Savage · 11th March 2010, 18:36

@JoshyD: I'm using the copy of EEDI2 from the first page, which is dated to 2/19/2010. If there is another version, I am not aware of it.

osgZach · 11th March 2010, 19:19

I was referring more to the AVS(i) script itself. But I'm guessing we're talking about the same thing really.

Either way. I average about 2fps give or take some fractional ups and downs. Takes about 4h:45m to do a 22m:48s clip

Core 2 Duo @ 3.2ghz

What kind of fps do you see on your machine?

And while I have your attention.. any way to get it entered either into the Fieldhint(blah..()) or replace that line entirely, during Yatta AVS generation ?

Sucks to have to manually swap it out on 50 different project files on average

But I suppose nothing a find/replace script can't fix. Still learning bits and pieces about YATTA, but since writing my Find30fps tool it's been so much less trouble.

JoshyD · 11th March 2010, 19:59

TempGaussMC beta 2 is what Stephen and I were running. Single threaded, it enjoys a healthy ~20% speed increase for me; multiple threads speed up the process by a larger margin. Here are some sample numbers for a 4000 frame run of a dvsd (720x480, 29.97 fps, interlaced bff) source through the script using avs2avi, no compression, outputting to null:

32bit Avisynth: 5.06fps
64bit Avisynth: 6.07fps
Relative speed: 119.96%

Threading the script with SetMTMode(2,8):
32bit Avisynth: 14.47fps
64bit Avisynth: 18.05fps
Relative speed: 124.74%
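For anyone checking the arithmetic, the relative speed is just the 64-bit frame rate divided by the 32-bit frame rate, expressed as a percentage. A minimal sketch (the function name is made up for this example):

```cpp
// Relative speed of the 64-bit build as a percentage of the 32-bit build.
constexpr double relative_speed_percent(double fps32, double fps64) {
    return fps64 / fps32 * 100.0;
}
```

Feeding in 5.06 and 6.07 fps gives about 119.96%, and 14.47 and 18.05 fps gives about 124.74%, matching the figures above.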

Tests were run with a Core i5-750 (4 cores, no HT) @ 3.71GHz. The eight thread creation request keeps all 4 cores constantly churning at 100%. Requesting fewer threads reduces total CPU utilization, with 4 threads giving ~50% total usage. Your performance will vary based upon your system setup, obviously.

The versions of Avisynth were both based on SEt's 2.5.8 build with multithreading enabled, for as much of an apples to apples comparison as possible.

Generally, it seems safe to assume ~15-20% performance increase, dependent upon which plugins you want to use.

Vertical Sharpen was ported specifically because of its use in TGMC beta 2. The same goes for RemoveGrain and Repair. MaskTools2 and MVTools2 were ported because they're so darned useful. EEDI2 was ported to fill the void of a good 64bit deinterlacer.

The 32 bit versions of these plugins were compiled by myself, with some tweaks here and there as I went. I've been "rolling my own" versions of these for a while now, the 64bit port was an extension of this hobby.

osgZach · 11th March 2010, 20:10

Thanks for your response.

Those are certainly some nice numbers. I wasn't even aware you could run it multi-threaded either. Frankly the setup process for MT related stuff scared me away; I was afraid I would boink something

Right now I exclusively use TempGaussMC_beta2, the only other filter being TDecimate, and any other stuff YATTA deems necessary in the generated AVS. I am encoding to HuffYV12 and then filtering later. So hey maybe I'll get something like 5 or 6 fps ? LOL

Perhaps I will go back to the initial post and see if I can follow the steps to do all of this..

Although as far as MT goes.. I only have 2 cores, so I wouldn't expect huge numbers like you got (but really impressive), but I think the x64 single threaded boost might be pretty big as well.

Is there a chance this will all be available via a one-click installer at some point?

Adub · 11th March 2010, 20:26

So, are we currently not able to use TIVTC and most of Tritical's plugins with the 64 bit version of Avisynth? Or is that just because they haven't been converted yet?

If so, I request that we convert as many of Tritical's plugins as possible. Specifically TIVTC and Colormatrix, as I think that those are some of the most often used plugins.

osgZach · 11th March 2010, 20:34

@ Adub

Quote:

Originally Posted by JoshyD

@Stephen

TIVTC is a beast because of the hodge podge of code it consists of. It also intermingles inline asm with compiler intrinsics as a means of using the XMM registers in some cases. It can get a little ugly. If I can figure out what's causing turbojet's crash, I'll re-examine TIVTC.

Hopefully something will happen in the future though. This is a great project, I've been waiting for a long time.. So it is great to see the progress we have already.

I wish I had the skills to contribute.. I barely know what little Python I use as it is...
