13th December 2008, 11:55   #16
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 497
Quote:
Originally Posted by Dark Shikari
The only "SSE" I see in this program is unbelievably bad compiler-generated assembly...

Code:
1000290d:       66 0f 6f c8             movdqa xmm1,xmm0
10002911:       66 0f 74 c8             pcmpeqb xmm1,xmm0
Code:
10004c00:       f3 0f 7e 02             movq   xmm0,[edx]
10004c04:       f3 0f 7e 08             movq   xmm1,[eax]
10004c08:       66 0f e0 c1             pavgb  xmm0,xmm1
10004c0c:       49                      dec    ecx
10004c0d:       66 0f d6 00             movq   [eax],xmm0
10004c11:       03 d6                   add    edx,esi
10004c13:       03 c6                   add    eax,esi
10004c15:       85 c9                   test   ecx,ecx
10004c17:       75 e7                   jne    0x10004c00
Code:
10008e71:       66 0f 60 c0             punpcklbw xmm0,xmm0
10008e75:       66 0f 6f d0             movdqa xmm2,xmm0
10008e82:       66 0f 60 d0             punpcklbw xmm2,xmm0
10008e8a:       66 0f 70 c2 00          pshufd xmm0,xmm2,0x0
(Well, OK, so I saw an iDCT that didn't look too awful, but that's about it...)
Yes, you are right. I coded the motion compensation, IDCT and deblocking filter with the intrinsic functions (http://msdn.microsoft.com/en-us/libr...3a(VS.80).aspx) and compiled them with VC 2008. I have found the generated code to be very bad, and I will write these parts in pure assembly in future.
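For anyone wondering what kind of intrinsic source leads to a loop like the movq/pavgb one quoted above, here is a minimal sketch in C. It is only a guess at the shape of such code, not the actual source of this decoder: the names avg_8xh and avg_8xh_scalar, the parameter list and the 8-pixel block width are all hypothetical. The intrinsics themselves (_mm_loadl_epi64, _mm_avg_epu8, _mm_storel_epi64) are the standard SSE2 ones from emmintrin.h, and the scalar version is there to show the rounding that PAVGB performs.

```c
#include <stddef.h>
#include <stdint.h>

/* Use SSE2 when the compiler targets it; otherwise fall back to scalar. */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Scalar reference (hypothetical name): rounded average of dst and src,
 * (a + b + 1) >> 1, which is the rounding PAVGB applies to each byte. */
static void avg_8xh_scalar(uint8_t *dst, const uint8_t *src,
                           ptrdiff_t stride, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (uint8_t)((dst[x] + src[x] + 1) >> 1);
        dst += stride;
        src += stride;
    }
}

/* Intrinsic version (hypothetical name): averages an 8-pixel-wide block
 * into dst, one row per iteration. Both pointers advance by the same
 * stride, as in the quoted loop. */
static void avg_8xh(uint8_t *dst, const uint8_t *src,
                    ptrdiff_t stride, int height)
{
#if HAVE_SSE2
    for (int y = 0; y < height; y++) {
        __m128i a = _mm_loadl_epi64((const __m128i *)src); /* movq load  */
        __m128i b = _mm_loadl_epi64((const __m128i *)dst); /* movq load  */
        _mm_storel_epi64((__m128i *)dst, _mm_avg_epu8(a, b)); /* pavgb + movq store */
        src += stride;
        dst += stride;
    }
#else
    avg_8xh_scalar(dst, src, stride, height);
#endif
}
```

A loop like this uses only the low 8 bytes of each XMM register and handles a single row per iteration, so a hand-written version would probably process more data per pass and drop the redundant test after dec. How much that matters for this decoder is something only profiling can tell.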