In a word, yep. The main things that make it so much slower are:
1.) by default decimate only checks luma while tdecimate checks both luma and chroma (set quality = 3 in decimate to sample both)
2.) tdecimate uses overlapping blocks while decimate doesn't
3.) tdecimate can use user defined block sizes in metric calculations (the blockx and blocky parameters) while decimate uses fixed 32x32 blocks
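To give a rough feel for point 2, here is a small Python sketch of a per-block difference metric computed with and without overlapping blocks. It is only an illustration, not the actual code of either filter: the function names, the 4x4 block size, the half-block overlap step, and the plain sum of absolute differences are all assumptions made for the example.

```python
def block_positions(length, block, step):
    """Start offsets of blocks of size `block` along one axis, advancing by `step`."""
    return list(range(0, max(length - block, 0) + 1, step))


def max_block_sad(prev, curr, block=4, overlap=False):
    """Return (largest per-block sum of absolute differences, blocks evaluated).

    Assumption for this sketch: overlapping blocks advance by half a block.
    """
    height, width = len(prev), len(prev[0])
    step = block // 2 if overlap else block
    best, count = 0, 0
    for y0 in block_positions(height, block, step):
        for x0 in block_positions(width, block, step):
            sad = sum(
                abs(prev[y][x] - curr[y][x])
                for y in range(y0, min(y0 + block, height))
                for x in range(x0, min(x0 + block, width))
            )
            best = max(best, sad)
            count += 1
    return best, count


# Two 8x8 "luma planes" that differ only in a 4x4 patch straddling the
# boundaries of the non-overlapping block grid.
prev = [[0] * 8 for _ in range(8)]
curr = [row[:] for row in prev]
for y in range(2, 6):
    for x in range(2, 6):
        curr[y][x] = 10

print(max_block_sad(prev, curr, overlap=False))  # (40, 4)
print(max_block_sad(prev, curr, overlap=True))   # (160, 9)
```

In this toy case the overlapping pass looks at 9 blocks instead of 4, and it is the only one that sees the whole changed patch inside a single block. That is the trade-off: more work per frame in exchange for not missing motion that falls across block edges. A real filter can share partial sums between neighbouring blocks, so the extra cost need not scale as directly as it does in this naive version.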
I am surprised it was that much of a drop-off, though it would make more of a difference with larger frame sizes. I could try adding a quality setting like decimate has (to do subsampling), or a metrics mode that does not use overlapping blocks, or both. Then again, with those options tdecimate would be pretty much the same as decimate for most comparable 1-in-N non-hybrid cases.