I think a key problem is the lack of tools for benchmarking AVS64. I use avs2avi for standard AVS, but there is no equivalent for AVS64. For example, if you benchmark through x264, the results are skewed by the advantage 64-bit x264 has over the 32-bit build. If you normalize by feeding the 32-bit Avisynth through avs2yuv into the same 64-bit x264, you still have to factor in the speedup the 64-bit path gets from cutting out the piping overhead.
That said, here are some benchmarks.
All results were taken using 64-bit x264 at q=51, preset=ultrafast. 32-bit AVS was fed in through avs2yuv, which should have a negligible performance cost when single-threaded.
Edit: Performance benchmarks have been redone using avs2avi.
TempGaussMC/EEDI2
32-bit: 2.83 fps
64-bit: 3.01 fps
MDeGrain3
32-bit: 6.20 fps
64-bit: 6.90 fps
AAA (mt_masktools, EEDI2)
32-bit: 1.92 fps
64-bit: 0.63 fps (SetMTMode: 1.94 fps)
Didee's Edge Mask
32-bit: 80.16 fps
64-bit: 89.02 fps
EEDI2 Resize2x
32-bit: 5.59 fps
64-bit: 5.67 fps
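To make the numbers above easier to compare, here is a small sketch that turns each pair of fps figures into a percentage change from 32-bit to 64-bit. The fps values are copied from the list above; the names results, percent_change and changes are made up for illustration and are not part of any benchmarking tool.

```python
# Relative 64-bit vs 32-bit throughput, using the fps figures quoted above.
# Each entry is (32-bit fps, 64-bit fps). The dictionary keys and helper
# names are illustrative only.
results = {
    "TempGaussMC/EEDI2": (2.83, 3.01),
    "MDeGrain3": (6.20, 6.90),
    "AAA (mt_masktools, EEDI2)": (1.92, 0.63),
    "AAA with SetMTMode": (1.92, 1.94),
    "Didee's Edge Mask": (80.16, 89.02),
    "EEDI2 Resize2x": (5.59, 5.67),
}


def percent_change(fps32, fps64):
    """Percentage change in fps going from the 32-bit to the 64-bit build."""
    return (fps64 / fps32 - 1.0) * 100.0


changes = {name: percent_change(a, b) for name, (a, b) in results.items()}

for name, pct in changes.items():
    print(f"{name:28s} {pct:+6.1f}%")
```

Read this way, every script except plain AAA gains somewhere between about 1% and 11% on 64-bit, while plain AAA loses roughly two thirds of its speed.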
None of these cases produced bit-exact output between the 32-bit and 64-bit builds. I am also a bit confused as to why AAA() comes out so much slower on 64-bit when all of its components are faster.
Edit: I have traced the AAA slowdown to the following code fragment:
Code:
input = DirectShowSource("640x480p30.xvid.avi")
ox = width(input)
oy = height(input)
aa = TurnRight(input).EEDI2(field=1).TurnLeft().EEDI2(field=1)
edge = mt_logic(mt_edge(aa, "5 10 5 0 0 0 -5 -10 -5 4", 0, 255, 0, 255),
\ mt_edge(aa, "5 0 -5 10 0 -10 5 0 -5 4", 0, 255, 0, 255), "max").Greyscale().
\ Levels(0, 0.8, 128, 0, 255, false).Spline36Resize(ox, oy, -0.5, -0.5, 2 * ox, 2 * oy)
ds = Spline36Resize(aa, ox, oy, -0.5, -0.5, 2 * ox, 2 * oy)
maskmerge = mt_merge(input, ds, edge, U=1, V=1)
MergeChroma(maskmerge, ds)
I suspect it is a cache-related bug, because none of the individual pieces is slower when run on its own.