15th May 2008, 07:52   #22
Dark Shikari
x264 developer
Quote:
Originally Posted by sparky
This is correct. To my knowledge, CoreAVC does not contain any SSE3, either. (My knowledge could be outdated) We will add support of these instruction sets eventually.
Yup, I don't think it does. I have seen use of lddqu in Elecard though, and FFDshow of course uses SSSE3 for luma/chroma MC.
Quote:
Originally Posted by sparky
The decoder is part of a bigger source tree, there was no effort to "prune" dead code for this release. You could be seeing some assembly that is not used by the H.264 decoder. For example, ASP encoder does use floating point. That should cover most instances of 'emms'. But you have a good point.
Ah, that would explain why I found an MPEG-4 iDCT in there!

Here's the code I found:

Code:
1008962e:       55                      push   %ebp
1008962f:       89 e5                   mov    %esp,%ebp
10089631:       81 ec 08 00 00 00       sub    $0x8,%esp
10089637:       89 7d f8                mov    %edi,0xfffffff8(%ebp)
1008963a:       31 c9                   xor    %ecx,%ecx
1008963c:       8b 45 08                mov    0x8(%ebp),%eax
1008963f:       8b 7d 0c                mov    0xc(%ebp),%edi
10089642:       01 f8                   add    %edi,%eax
10089644:       0f 6f 00                movq   (%eax),%mm0
10089647:       0f 6f 0c 38             movq   (%eax,%edi,1),%mm1
1008964b:       0f f6 c1                psadbw %mm1,%mm0
1008964e:       0f 7e c2                movd   %mm0,%edx
10089651:       01 d1                   add    %edx,%ecx
10089653:       8d 04 78                lea    (%eax,%edi,2),%eax
10089656:       0f 6f 00                movq   (%eax),%mm0
10089659:       0f f6 c8                psadbw %mm0,%mm1
1008965c:       0f 7e ca                movd   %mm1,%edx
1008965f:       01 d1                   add    %edx,%ecx
10089661:       01 f8                   add    %edi,%eax
10089663:       0f 6f 08                movq   (%eax),%mm1
10089666:       0f f6 c1                psadbw %mm1,%mm0
10089669:       0f 7e c2                movd   %mm0,%edx
1008966c:       01 d1                   add    %edx,%ecx
1008966e:       0f 6f 04 38             movq   (%eax,%edi,1),%mm0
10089672:       0f 6f 0c 78             movq   (%eax,%edi,2),%mm1
10089676:       0f f6 c1                psadbw %mm1,%mm0
10089679:       0f 7e c2                movd   %mm0,%edx
1008967c:       01 d1                   add    %edx,%ecx
1008967e:       8d 04 78                lea    (%eax,%edi,2),%eax
10089681:       0f 6f 04 38             movq   (%eax,%edi,1),%mm0
10089685:       0f f6 c8                psadbw %mm0,%mm1
10089688:       0f 7e ca                movd   %mm1,%edx
1008968b:       01 d1                   add    %edx,%ecx
1008968d:       0f 6f 0c 78             movq   (%eax,%edi,2),%mm1
10089691:       0f f6 c1                psadbw %mm1,%mm0
10089694:       0f 7e c2                movd   %mm0,%edx
10089697:       01 d1                   add    %edx,%ecx
10089699:       31 c0                   xor    %eax,%eax
1008969b:       8b 55 10                mov    0x10(%ebp),%edx
1008969e:       d1 e2                   shl    %edx
100896a0:       39 d1                   cmp    %edx,%ecx
100896a2:       0f 9e c0                setle  %al
100896a5:       0f 77                   emms
100896a7:       8b 7d f8                mov    0xfffffff8(%ebp),%edi
100896aa:       89 ec                   mov    %ebp,%esp
100896ac:       5d                      pop    %ebp
100896ad:       c3                      ret
Akupenguin simplified this to the following (using x264 nasm syntax):
Code:
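cglobal vsad, 2,3          ; 2 args in r0 (presumably pixel pointer) and r1 (stride), 3 registers used
lea    r2,  [r1*3]         ; r2 = 3*stride
movq   mm0, [r0]           ; rows 0..3
movq   mm1, [r0+r1]
movq   mm2, [r0+r1*2]
movq   mm3, [r0+r2]
lea    r0,  [r0+r1*4]      ; advance to row 4
movq   mm4, [r0]           ; rows 4..6
movq   mm5, [r0+r1]
movq   mm6, [r0+r1*2]
psadbw mm0, mm1            ; SAD of each adjacent row pair
psadbw mm1, mm2
psadbw mm2, mm3
psadbw mm3, mm4
psadbw mm4, mm5
psadbw mm5, mm6
paddd  mm0, mm1            ; sum the six partial SADs
paddd  mm2, mm3
paddd  mm4, mm5
paddd  mm0, mm2
mov    r2,  r2m            ; third argument, loaded from the stack
paddd  mm0, mm4
shl    r2                  ; third argument * 2
xor    eax, eax
movd   r1,  mm0            ; r1 = total SAD
cmp    r1,  r2
setle  al                  ; return total <= 2 * third argument
ret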
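For anyone who doesn't read MMX, here is a rough scalar C sketch of what the simplified version above appears to compute: the sum of absolute differences between each pair of vertically adjacent 8-pixel rows across 7 rows, compared against twice the third argument. The names vsad_ref, block, stride and threshold are my own guesses, not anything from the binary, and this mirrors the simplified listing rather than the disassembly, which starts one row later and pairs the rows slightly differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of the simplified vsad routine.
 * block:     pointer to the first of 7 rows, 8 pixels each (assumed layout)
 * stride:    distance in bytes between rows
 * threshold: third argument; the asm doubles it before comparing
 * Returns 1 if the vertical SAD is <= 2*threshold, else 0. */
int vsad_ref(const uint8_t *block, ptrdiff_t stride, int threshold)
{
    assert(block != NULL);

    int sum = 0;
    for (int y = 0; y < 6; y++) {            /* 6 adjacent row pairs */
        const uint8_t *a = block + y * stride;
        const uint8_t *b = a + stride;
        for (int x = 0; x < 8; x++)          /* psadbw covers 8 bytes */
            sum += abs(a[x] - b[x]);
    }
    return sum <= 2 * threshold;             /* shl + cmp + setle */
}
```

Something like this is handy as a reference to test the asm against, nothing more.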
I'm also noticing some other interesting stuff: you chose to put dequant in the iHCT process rather than in the entropy decoding process.

(This would be a whole lot easier if we could get an unstripped debug build, but like that'll ever happen...)

Last edited by Dark Shikari; 15th May 2008 at 07:57.