xmmintrin.h
vcmohan
18th January 2006, 04:16
I need the header file xmmintrin.h, or an equivalent, to get __m128 in VC++ 6. Is a compatible version available? I searched with Google but got confused. If this post belongs somewhere else, please excuse me. I need this for an Avisynth plugin.
tritical
18th January 2006, 05:20
xmmintrin.h should be in the platform sdk (Windows Server 2003 SP1 SDK). It is in my install at least. Not sure if it is in older releases of the platform sdk or not. I think it is also in the processor pack for vc6 sp5, but don't know for sure (never used vc6).
IanB
18th January 2006, 13:19
The compiler makes a truly horrible mess when using the xmm intrinsics: it continuously loads and unloads the xmm registers to memory between every intrinsic call and wastes all your good SSE2 speed gains. Don't waste your time, just write straight __asm or fastwire. :D
squid_80
18th January 2006, 21:55
The compiler makes a truly horrible mess when using the xmm intrinsics: it continuously loads and unloads the xmm registers to memory between every intrinsic call and wastes all your good SSE2 speed gains. Don't waste your time, just write straight __asm or fastwire. :D
Or download VS 2005 Express for free, it handles them a lot better.
vcmohan
19th January 2006, 04:29
I am trying to use some existing code in my plugin. The SIMD part of that code includes this file. The code is pure C and uses __m128 and a few _mm_ instructions. My problem is how to create buffers of 4 x float data that the code will accept. Maybe I can use a void * or a struct of 4 elements and assume they are 16-byte aligned. As I am not that proficient in this, I will be grateful for any advice. Thanks for the information.
squid_80
19th January 2006, 08:56
The __m128 types are just 16-byte arrays, aligned by 16 in memory. In fact (from xmmintrin.h):
typedef union __declspec(intrin_type) __declspec(align(16)) __m128 {
float m128_f32[4];
unsigned __int64 m128_u64[2];
__int8 m128_i8[16];
__int16 m128_i16[8];
__int32 m128_i32[4];
__int64 m128_i64[2];
unsigned __int8 m128_u8[16];
unsigned __int16 m128_u16[8];
unsigned __int32 m128_u32[4];
} __m128;
So you should be able to cast an array of four floats to __m128 as long as it's declared 16-byte aligned. Unless I'm missing something.
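A minimal sketch of the aligned-buffer idea, assuming a compiler that ships xmmintrin.h. The ALIGN16 macro and the function names add_floats_sse and aligned_add_example are made up for illustration and are not part of the header. Rather than casting the float pointer to __m128 directly, the sketch goes through _mm_load_ps and _mm_store_ps, which require the same 16-byte alignment.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps */

/* Hypothetical helper macro: request 16-byte alignment for a variable. */
#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 __attribute__((aligned(16)))
#endif

/* Adds two float buffers four elements at a time.
   Assumptions: n is a multiple of 4 and all three pointers are 16-byte aligned. */
void add_floats_sse(const float *a, const float *b, float *out, int n)
{
    int i;
    for (i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);   /* aligned load of 4 floats */
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(out + i, _mm_add_ps(va, vb));  /* aligned store of 4 sums */
    }
}

/* Usage with stack buffers declared 16-byte aligned; returns the last sum. */
float aligned_add_example(void)
{
    ALIGN16 float a[4]   = { 1.0f, 2.0f, 3.0f, 4.0f };
    ALIGN16 float b[4]   = { 10.0f, 20.0f, 30.0f, 40.0f };
    ALIGN16 float out[4];
    add_floats_sse(a, b, out, 4);
    return out[3];
}
```

If the alignment of a buffer cannot be guaranteed, _mm_loadu_ps and _mm_storeu_ps accept unaligned pointers, usually at some cost in speed.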
IanB
20th January 2006, 01:41
Or download VS 2005 Express for free, it handles them a lot better.
Better is still not BEST by a very long way. Have a read of Avery Lee's blogs on the subject, they will curl your nose hairs :sly:
squid_80
20th January 2006, 05:16
What's so bad about this? (http://www.virtualdub.org/blog/pivot/entry.php?id=46#body)
Here's the routine using SSE2 intrinsics that I used to punish the compiler last time I wrote about this problem:
#include <emmintrin.h>
unsigned premultiply_alpha(unsigned px) {
    __m128i px8 = _mm_cvtsi32_si128(px);
    __m128i px16 = _mm_unpacklo_epi8(px8, _mm_setzero_si128());
    __m128i alpha = _mm_shufflelo_epi16(px16, 0xff);
    __m128i result16 = _mm_srli_epi16(_mm_mullo_epi16(px16, alpha), 8);
    return _mm_cvtsi128_si32(_mm_packus_epi16(result16, result16));
}
<snipped vs 2003 stuff>
Here's what Visual Studio .NET 2005 generates for this function:
pxor xmm1, xmm1
movd xmm0, ecx
punpcklbw xmm0, xmm1
pshuflw xmm1, xmm0, 255
pmullw xmm0, xmm1
psrlw xmm0, 8
packuswb xmm0, xmm0
movd eax, xmm0
ret
That's actually not too bad. Fairly good, even.
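For anyone following along who wants to check what the intrinsics routine computes, here is a rough scalar sketch placed next to it. It assumes a 32-bit unsigned with alpha in the top byte; the name premultiply_alpha_scalar is made up for illustration and does not come from the blog post. Each byte, including alpha itself, is multiplied by alpha and shifted right by 8.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* The intrinsics routine quoted above, repeated so the two can be compared. */
unsigned premultiply_alpha(unsigned px)
{
    __m128i px8 = _mm_cvtsi32_si128(px);                         /* pixel into low 32 bits */
    __m128i px16 = _mm_unpacklo_epi8(px8, _mm_setzero_si128());  /* widen bytes to 16-bit words */
    __m128i alpha = _mm_shufflelo_epi16(px16, 0xff);             /* broadcast word 3 (alpha) */
    __m128i result16 = _mm_srli_epi16(_mm_mullo_epi16(px16, alpha), 8);
    return _mm_cvtsi128_si32(_mm_packus_epi16(result16, result16));
}

/* Scalar sketch of the same arithmetic: (channel * alpha) >> 8 for each byte. */
unsigned premultiply_alpha_scalar(unsigned px)
{
    unsigned alpha = px >> 24;
    unsigned result = 0;
    int shift;
    for (shift = 0; shift < 32; shift += 8) {
        unsigned channel = (px >> shift) & 0xff;
        result |= ((channel * alpha) >> 8) << shift;
    }
    return result;
}
```

The scalar loop is only there as a reference for the arithmetic; it says nothing about what code either compiler generates.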