13th December 2008, 11:55
#16
Registered User
Join Date: Nov 2005
Posts: 497
Quote:
Originally Posted by Dark Shikari
The only "SSE" I see in this program is unbelievably bad compiler-generated assembly...
Code:
1000290d: 66 0f 6f c8 movdqa xmm1,xmm0
10002911: 66 0f 74 c8 pcmpeqb xmm1,xmm0
Code:
10004c00: f3 0f 7e 02 movq xmm0,[edx]
10004c04: f3 0f 7e 08 movq xmm1,[eax]
10004c08: 66 0f e0 c1 pavgb xmm0,xmm1
10004c0c: 49 dec ecx
10004c0d: 66 0f d6 00 movq [eax],xmm0
10004c11: 03 d6 add edx,esi
10004c13: 03 c6 add eax,esi
10004c15: 85 c9 test ecx,ecx
10004c17: 75 e7 jne 0x10004c00
Code:
10008e71: 66 0f 60 c0 punpcklbw xmm0,xmm0
10008e75: 66 0f 6f d0 movdqa xmm2,xmm0
10008e82: 66 0f 60 d0 punpcklbw xmm2,xmm0
10008e8a: 66 0f 70 c2 00 pshufd xmm0,xmm2,0x0
(Well, OK, so I saw an iDCT that didn't look too awful, but that's about it...)
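Two of the quoted fragments are standard SSE2 idioms. `movdqa xmm1,xmm0` followed by `pcmpeqb xmm1,xmm0` compares a register with a copy of itself. Every byte compares equal, so the result is an all-ones mask, and a single `pcmpeqb xmm1,xmm1` would give the same mask in one instruction. The `punpcklbw` ... `pshufd xmm0,xmm2,0x0` fragment looks like part of a byte broadcast (some instructions between the quoted addresses are not shown). The sketch below writes both idioms out explicitly with SSE2 intrinsics. The function names are hypothetical, and it uses `punpcklwd` (`_mm_unpacklo_epi16`) for the second widening step where the quoted code used a second `punpcklbw`; either works. Which instructions a particular compiler actually emits for these intrinsics is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* All-ones mask: each byte compares equal to itself, so PCMPEQB sets every bit.
   Ideally this is one "pcmpeqb xmm,xmm"; the quoted code copied the register first. */
__m128i all_ones_mask(void)
{
    __m128i z = _mm_setzero_si128();
    return _mm_cmpeq_epi8(z, z);
}

/* Byte broadcast with plain SSE2 (no SSSE3 pshufb):
   1 byte -> 2 bytes -> 4 bytes -> all 16 bytes. */
__m128i broadcast_byte_sse2(uint8_t v)
{
    __m128i x = _mm_cvtsi32_si128(v); /* low byte = v, all other bytes zero */
    x = _mm_unpacklo_epi8(x, x);      /* punpcklbw: bytes 0..1 = v */
    x = _mm_unpacklo_epi16(x, x);     /* punpcklwd: bytes 0..3 = v */
    return _mm_shuffle_epi32(x, 0);   /* pshufd ...,0x0: copy dword 0 to every lane */
}

/* Checks both helpers; returns 1 on success. Usage: assert(idioms_selftest()); */
int idioms_selftest(void)
{
    uint8_t out[16], ref[16];
    for (int v = 0; v < 256; v++) {
        _mm_storeu_si128((__m128i *)out, broadcast_byte_sse2((uint8_t)v));
        memset(ref, v, sizeof ref);
        if (memcmp(out, ref, sizeof out) != 0)
            return 0;
    }
    _mm_storeu_si128((__m128i *)out, all_ones_mask());
    for (int i = 0; i < 16; i++)
        if (out[i] != 0xFF)
            return 0;
    return 1;
}
```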
Yes, you are right. I wrote the motion compensation, IDCT and deblocking filter with intrinsic functions (http://msdn.microsoft.com/en-us/libr...3a(VS.80).aspx) and compiled them with VC 2008 (Visual C++ 2008). The generated code turned out to be very poor, so in the future I will rewrite these routines in pure assembly.
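The second quoted loop loads 8 bytes from `[edx]` and 8 bytes from `[eax]`, averages them with `pavgb`, stores the result back to `[eax]`, and advances both pointers by the same stride (`esi`) for `ecx` rows. Per byte, `pavgb` computes the rounded average `(a + b + 1) >> 1`, the kind of step often used in motion compensation when two predictions are averaged. Below is a minimal sketch of that loop written with SSE2 intrinsics, next to a scalar reference. The function names, the 8-pixel width, and the shared stride are assumptions read off the disassembly, not the author's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Scalar reference: per-byte rounded average, which is what PAVGB computes. */
void avg8_rows_ref(uint8_t *dst, const uint8_t *src, ptrdiff_t stride, int rows)
{
    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (uint8_t)((dst[x] + src[x] + 1) >> 1);
        dst += stride;
        src += stride;
    }
}

/* SSE2 version: one 8-byte row per iteration, mirroring the quoted loop. */
void avg8_rows_sse2(uint8_t *dst, const uint8_t *src, ptrdiff_t stride, int rows)
{
    for (int y = 0; y < rows; y++) {
        __m128i s = _mm_loadl_epi64((const __m128i *)src); /* movq xmm0,[edx] */
        __m128i d = _mm_loadl_epi64((const __m128i *)dst); /* movq xmm1,[eax] */
        _mm_storel_epi64((__m128i *)dst, _mm_avg_epu8(s, d)); /* pavgb; movq [eax],xmm0 */
        dst += stride; /* add eax,esi */
        src += stride; /* add edx,esi */
    }
}

/* Runs both versions on an 8x4 block inside a 16-byte-stride buffer;
   returns 1 if they agree. Usage: assert(avg8_rows_selftest()); */
int avg8_rows_selftest(void)
{
    enum { STRIDE = 16, ROWS = 4 };
    uint8_t src[STRIDE * ROWS], a[STRIDE * ROWS], b[STRIDE * ROWS];
    for (int i = 0; i < STRIDE * ROWS; i++) {
        src[i] = (uint8_t)(i * 37 + 11);
        a[i] = b[i] = (uint8_t)(255 - i * 13);
    }
    avg8_rows_ref(a, src, STRIDE, ROWS);
    avg8_rows_sse2(b, src, STRIDE, ROWS);
    return memcmp(a, b, sizeof a) == 0;
}
```

In the quoted loop, the `test ecx,ecx` right after `dec ecx` is redundant: `dec` already sets the zero flag that `jne` reads.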