x264 developer
Quote:
Originally Posted by sparky
This is correct. To my knowledge, CoreAVC does not contain any SSE3, either. (My knowledge could be outdated) We will add support of these instruction sets eventually.
|
Yup, I don't think it does. I have seen lddqu (an SSE3 instruction) used in Elecard's decoder though, and ffdshow of course uses SSSE3 for luma/chroma MC.
Quote:
Originally Posted by sparky
The decoder is part of a bigger source tree, there was no effort to "prune" dead code for this release. You could be seeing some assembly that is not used by the H.264 decoder. For example, ASP encoder does use floating point. That should cover most instances of 'emms'. But you have a good point.
|
Ah, that would explain why I found an MPEG-4 iDCT in there!
Here's the code I found:
Code:
1008962e: 55 push %ebp
1008962f: 89 e5 mov %esp,%ebp
10089631: 81 ec 08 00 00 00 sub $0x8,%esp
10089637: 89 7d f8 mov %edi,0xfffffff8(%ebp)
1008963a: 31 c9 xor %ecx,%ecx
1008963c: 8b 45 08 mov 0x8(%ebp),%eax
1008963f: 8b 7d 0c mov 0xc(%ebp),%edi
10089642: 01 f8 add %edi,%eax
10089644: 0f 6f 00 movq (%eax),%mm0
10089647: 0f 6f 0c 38 movq (%eax,%edi,1),%mm1
1008964b: 0f f6 c1 psadbw %mm1,%mm0
1008964e: 0f 7e c2 movd %mm0,%edx
10089651: 01 d1 add %edx,%ecx
10089653: 8d 04 78 lea (%eax,%edi,2),%eax
10089656: 0f 6f 00 movq (%eax),%mm0
10089659: 0f f6 c8 psadbw %mm0,%mm1
1008965c: 0f 7e ca movd %mm1,%edx
1008965f: 01 d1 add %edx,%ecx
10089661: 01 f8 add %edi,%eax
10089663: 0f 6f 08 movq (%eax),%mm1
10089666: 0f f6 c1 psadbw %mm1,%mm0
10089669: 0f 7e c2 movd %mm0,%edx
1008966c: 01 d1 add %edx,%ecx
1008966e: 0f 6f 04 38 movq (%eax,%edi,1),%mm0
10089672: 0f 6f 0c 78 movq (%eax,%edi,2),%mm1
10089676: 0f f6 c1 psadbw %mm1,%mm0
10089679: 0f 7e c2 movd %mm0,%edx
1008967c: 01 d1 add %edx,%ecx
1008967e: 8d 04 78 lea (%eax,%edi,2),%eax
10089681: 0f 6f 04 38 movq (%eax,%edi,1),%mm0
10089685: 0f f6 c8 psadbw %mm0,%mm1
10089688: 0f 7e ca movd %mm1,%edx
1008968b: 01 d1 add %edx,%ecx
1008968d: 0f 6f 0c 78 movq (%eax,%edi,2),%mm1
10089691: 0f f6 c1 psadbw %mm1,%mm0
10089694: 0f 7e c2 movd %mm0,%edx
10089697: 01 d1 add %edx,%ecx
10089699: 31 c0 xor %eax,%eax
1008969b: 8b 55 10 mov 0x10(%ebp),%edx
1008969e: d1 e2 shl %edx
100896a0: 39 d1 cmp %edx,%ecx
100896a2: 0f 9e c0 setle %al
100896a5: 0f 77 emms
100896a7: 8b 7d f8 mov 0xfffffff8(%ebp),%edi
100896aa: 89 ec mov %ebp,%esp
100896ac: 5d pop %ebp
100896ad: c3 ret
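Akupenguin simplified this to the following (using x264 nasm syntax). It does all the loads up front and keeps the partial SADs in MMX registers, so only one movd is needed at the end: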
Code:
cglobal vsad, 2,3
lea r2, [r1*3]
movq mm0, [r0]
movq mm1, [r0+r1]
movq mm2, [r0+r1*2]
movq mm3, [r0+r2]
lea r0, [r0+r1*4]
movq mm4, [r0]
movq mm5, [r0+r1]
movq mm6, [r0+r1*2]
psadbw mm0, mm1
psadbw mm1, mm2
psadbw mm2, mm3
psadbw mm3, mm4
psadbw mm4, mm5
psadbw mm5, mm6
paddd mm0, mm1
paddd mm2, mm3
paddd mm4, mm5
paddd mm0, mm2
mov r2, r2m
paddd mm0, mm4
shl r2
xor eax, eax
movd r1, mm0
cmp r1, r2
setle al
ret
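For anyone who doesn't read MMX, here is a rough scalar C sketch of what the simplified routine above appears to compute: the sum of absolute differences between each pair of vertically adjacent 8-pixel rows over 7 rows, compared against twice the third argument. The names vsad8_ref, row_sad8 and thresh are made up for illustration, and treating the third argument as a threshold is my reading of the asm, not something taken from the decoder's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* SAD of two 8-pixel rows: the scalar equivalent of one psadbw on MMX registers. */
static unsigned row_sad8(const uint8_t *a, const uint8_t *b)
{
    unsigned sum = 0;
    for (int x = 0; x < 8; x++)
        sum += (unsigned)abs(a[x] - b[x]);
    return sum;
}

/* Hypothetical reference for the nasm listing: rows 0..6, six adjacent-row SADs.
 * Returns 1 if the total is <= 2*thresh (the shl/cmp/setle at the end), else 0. */
static int vsad8_ref(const uint8_t *src, ptrdiff_t stride, int thresh)
{
    assert(src != NULL);
    unsigned sum = 0;
    for (int y = 0; y < 6; y++)
        sum += row_sad8(src + y * stride, src + (y + 1) * stride);
    return (int)sum <= 2 * thresh;  /* max sum is 6*8*255, so the cast is safe */
}
```

A flat block gives a sum of 0 and so passes for any non-negative threshold, while alternating black and white rows give 6*8*255 = 12240.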
I'm also noticing some other interesting stuff--you chose to put dequant as part of the iHCT process instead of as part of the entropy decoding process.
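To make the dequant point concrete, here is a minimal C sketch of the two placements. Everything in it is hypothetical (coef_t, the function names, the flat per-position scale table); real H.264 scaling also depends on QP, which is left out here, and none of this is taken from the decoder being discussed.

```c
#include <assert.h>
#include <stdint.h>

#define NCOEF 16  /* one 4x4 block */

/* Hypothetical entropy-decoder output: sparse (position, level) pairs. */
typedef struct { uint8_t pos; int16_t level; } coef_t;

/* Placement A: scale each level as it is decoded, touching only nonzero coefficients. */
static void decode_with_dequant(const coef_t *c, int n,
                                const int16_t scale[NCOEF], int16_t blk[NCOEF])
{
    for (int i = 0; i < NCOEF; i++)
        blk[i] = 0;
    for (int i = 0; i < n; i++) {
        assert(c[i].pos < NCOEF);
        blk[c[i].pos] = (int16_t)(c[i].level * scale[c[i].pos]);
    }
}

/* Placement B, step 1: the entropy decoder stores raw levels only. */
static void decode_raw(const coef_t *c, int n, int16_t blk[NCOEF])
{
    for (int i = 0; i < NCOEF; i++)
        blk[i] = 0;
    for (int i = 0; i < n; i++) {
        assert(c[i].pos < NCOEF);
        blk[c[i].pos] = c[i].level;
    }
}

/* Placement B, step 2: scale the whole block as the first stage of the inverse transform. */
static void dequant_before_transform(int16_t blk[NCOEF], const int16_t scale[NCOEF])
{
    for (int i = 0; i < NCOEF; i++)
        blk[i] = (int16_t)(blk[i] * scale[i]);
}
```

Both paths produce the same block. The difference is that A does one multiply per nonzero coefficient, while B walks all 16 positions but is easy to vectorize together with the transform.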
(This would be a whole lot easier if we could get an unstripped debug build, but like that'll ever happen...)
Last edited by Dark Shikari; 15th May 2008 at 07:57.