I looked at the clip, and to me it looks like plain 12fps animation. After field matching there is a repeating pattern of 3 frames (1 new + 2 dups) followed by 2 frames (1 new + 1 dup), and tdecimate with mode=1 handles it fine. The clip did have some irregular rff flagging, so using d2v="" in tfm with flags = 0, 1 or 2 led to incorrect decimation in a couple of cycles, where mode=1 without the rff info picked the correct frames to decimate. I went ahead and added some extra logic to prevent mode=1 from using the d2v dup info in such cases.
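As a rough check of the arithmetic behind the "12fps" reading, here is a small Python sketch. It assumes the source runs at the NTSC rate of 30000/1001 fps, which is my assumption and not something stated above. The names `PATTERN` and `decimated_rate` are made up for illustration, and this is not how tdecimate works internally; it only counts the new frames in one 3+2 cycle.

```python
from fractions import Fraction

# One cycle of the pattern seen after field matching:
# 3 frames (1 new + 2 dups) followed by 2 frames (1 new + 1 dup).
PATTERN = ["new", "dup", "dup", "new", "dup"]


def decimated_rate(pattern, source_fps):
    """Return the frame rate left once every dup in the cycle is dropped."""
    unique = sum(1 for frame in pattern if frame == "new")
    return source_fps * Fraction(unique, len(pattern))


# Assumed source rate (NTSC); not given in the post.
ntsc = Fraction(30000, 1001)
rate = decimated_rate(PATTERN, ntsc)
print(rate, float(rate))
```

Two new frames out of every five gives 2/5 of the source rate, so under the NTSC assumption the result is 12000/1001 fps, i.e. nominally 12fps.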
Here is the result (xvid const quant 2) after using the following script:
mpeg2source("C:\gits.d2v")
tfm(d2v="C:\gits.d2v")
tdecimate(mode=1)
gits clip