Thread: MPEG2Dec3 v1.10
Old 13th May 2003, 14:20   #24  |  Link
trbarry
Quote:
@trbarry: Add_Block was just one that it mentioned (that I remembered). It's important and may not be able to be improved. The SSE iDCT is there, but not the other SSE2 code. Marc FD removed/ifdef'd it because it was unstable (?), I think, or at least not producing accurate results. If I had an SSE2 computer for development I'd test through each bit and find the bits that worked (i.e. created the same output as the SSEMMX parts). But I don't at present (you got any free time?)
Nic -

As Int21h pointed out, the SSE2 code (except for the iDCT) made only marginal improvements, though for some reason it made more of a difference on my machine than on his. But some of it was very sensitive to compiler optimization and would crash with some settings and combinations of inlining. That's probably why Marc FD had to turn it off. There is a whole series of timing tests written up for the similar DVD2AVI code somewhere in that huge DVD2AVI SourceForge thread in the DVD2AVI section.

I've been planning for a while to add a couple more, simpler assembler optimizations but haven't quite gotten to them. These would need only a P3 (or less), not a P4.

First, the iDCT code should probably be called via a function pointer, as in Xvid (or my DctFilter), avoiding all the extra logic in AddBlock. Do you want to do this? (The SSE2 prefetch call is unneeded.)
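
A rough sketch of the pointer idea, not the actual Xvid or DctFilter code: pick the iDCT once at startup from the CPU flags, then call through the pointer in the block loop so AddBlock no longer has to branch on CPU type. The flag names, the stub routines, and select_idct are all made up for illustration.

```c
#include <stddef.h>

/* Hypothetical CPU feature flags; real code would fill these from CPUID. */
#define CPU_MMX  0x1
#define CPU_SSE  0x2   /* "SSEMMX": integer SSE on P3-class CPUs */

/* One 8x8 block of 64 coefficients, transformed in place. */
typedef void (*idct_fn)(short *block);

/* Stand-ins for the real iDCT routines. They only tag the block so the
   example can show which routine was selected. */
static void idct_c(short *block)      { block[0] = 1; }
static void idct_mmx(short *block)    { block[0] = 2; }
static void idct_ssemmx(short *block) { block[0] = 3; }

/* Set once at startup, then called directly from the block decode loop. */
static idct_fn idct_ptr = idct_c;

static void select_idct(unsigned cpu_flags)
{
    if (cpu_flags & CPU_SSE)
        idct_ptr = idct_ssemmx;
    else if (cpu_flags & CPU_MMX)
        idct_ptr = idct_mmx;
    else
        idct_ptr = idct_c;
}
```

The decode loop would then just do idct_ptr(block) for each block, with the CPU test paid once instead of per block.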

Second, the assembler now in AddBlock can easily be optimized a bit.
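
For reference, a plain C version of what the add path is assumed to do here: a saturated add of the iDCT residual onto the prediction already sitting in the frame. This is a sketch for checking any asm against, not the code in MPEG2Dec3, and add_block_c and clamp_u8 are illustrative names.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 pixel range. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Add an 8x8 block of iDCT residuals to the prediction already in dst.
   residual is 64 contiguous values; stride is the dst pitch in bytes. */
static void add_block_c(const int16_t *residual, uint8_t *dst, int stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = clamp_u8(dst[x] + residual[x]);
        residual += 8;
        dst += stride;
    }
}
```

An MMX version would typically do the same thing with a widening unpack, a signed 16-bit add, and an unsigned saturating pack, several pixels per instruction.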

And third, and more important, would probably be asm-optimizing the dequant functions. This wouldn't be hard, and -h nags us about it from time to time.
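
As a baseline for that, here is a simplified scalar sketch of MPEG-2 style non-intra inverse quantization. It assumes the usual (2*QF + sign) * W * quantiser_scale / 32 form with saturation to 12 bits and leaves out mismatch control, so treat it as something to compare asm output against rather than as the decoder's actual routine. The function and parameter names are hypothetical.

```c
#include <stdint.h>

/* Saturate to the 12-bit signed coefficient range. */
static int16_t sat12(int v)
{
    return (int16_t)(v < -2048 ? -2048 : (v > 2047 ? 2047 : v));
}

/* In-place non-intra dequant of one 8x8 block.
   coef:   64 quantized coefficients
   wmat:   64-entry weighting matrix
   qscale: quantiser scale for the macroblock
   Mismatch control is intentionally omitted in this sketch. */
static void dequant_nonintra_c(int16_t *coef, const uint8_t *wmat, int qscale)
{
    for (int i = 0; i < 64; i++) {
        int q = coef[i];
        if (q == 0)
            continue;                      /* zero stays zero */
        int sign = q < 0 ? -1 : 1;
        /* C integer division truncates toward zero. */
        int v = ((2 * q + sign) * wmat[i] * qscale) / 32;
        coef[i] = sat12(v);
    }
}
```

The loop has no dependencies between coefficients, which is why it should map onto MMX multiplies without much trouble.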

I'll try to get to those.

I haven't checked yet whether there are unneeded data copies in the YV12 path, but maybe you can take care of those if you can find them. It certainly seems YV12 should be able to copy data fewer times, since (IIRC, for MPEG2DEC2) the YUY2 path first did a pass to planar 4:2:2 and then a conversion to YUY2. It seems at least one and maybe both of those should be unneeded.
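
To make the second of those passes concrete, this is roughly what packing planar 4:2:2 into YUY2 involves for one row. It is a sketch under the assumption of an even width and the standard Y0 U Y1 V byte order, not the MPEG2DEC2 code, and pack_row_yuy2 is an illustrative name. A YV12 output can hand the planes over as they are and skip this interleave entirely.

```c
#include <stdint.h>

/* Pack one row of planar 4:2:2 into YUY2 (Y0 U0 Y1 V0 Y2 U1 Y3 V1 ...).
   width is the luma width in pixels and must be even; u and v each
   hold width/2 samples; dst must hold width*2 bytes. */
static void pack_row_yuy2(const uint8_t *y, const uint8_t *u,
                          const uint8_t *v, uint8_t *dst, int width)
{
    for (int x = 0; x < width; x += 2) {
        dst[0] = y[x];
        dst[1] = u[x / 2];
        dst[2] = y[x + 1];
        dst[3] = v[x / 2];
        dst += 4;
    }
}
```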

- Tom

Last edited by trbarry; 13th May 2003 at 14:24.