SSE Question


cogman
5th December 2009, 23:20
Hey all, I'm new to SSE, so go easy on me :). Here is what I'm trying to mimic:

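/* Store the index of every element of data[] equal to pattern in found[]
   and return the number of matches. found[] needs room for dataSize ints. */
int findInts(int* found, int* data, int dataSize, int pattern)
{
    int numFound = 0;
    for (int i = 0; i < dataSize; ++i)
    {
        if (data[i] == pattern)
        {
            found[numFound++] = i;   /* record the matching index */
        }
    }
    return numFound;
}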

So basically, I'm trying to search through a random array of integers and find the locations where the integers match a certain value (actually, I'm going to use it to search for specific colors).

I'm thinking I could use something like pcmpeqd to get a mask where things match. However, how do I go from that mask to the specific matching i values, especially if the pattern is 0?

If there is another way to do this, or if not using SSE at all is better, I'd be glad to hear it. (The computer I'm doing this on only supports SSE2, so no fancy SSE3/4 please :))
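One way to get from the pcmpeqd mask to indices, sketched here with SSE2 intrinsics from emmintrin.h rather than raw asm: movmskps (_mm_movemask_ps) packs the top bit of each of the four dword lanes into a 4-bit integer, so each set bit marks a matching element. Because the mask records equality rather than the data value, a pattern of 0 works like any other. The name findIntsSSE2 is hypothetical; the sketch assumes found[] has room for dataSize entries and uses unaligned loads so data need not be 16-byte aligned.

```c
#include <emmintrin.h> /* SSE2 intrinsics (also pulls in SSE's xmmintrin.h) */

/* Hypothetical SSE2 version of findInts(): same inputs, same result. */
int findIntsSSE2(int *found, const int *data, int dataSize, int pattern)
{
    const __m128i pat = _mm_set1_epi32(pattern); /* pattern in all 4 dwords */
    int numFound = 0;
    int i = 0;

    for (; i + 4 <= dataSize; i += 4)
    {
        /* Unaligned load of four ints (movdqu). */
        __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
        /* pcmpeqd: 0xFFFFFFFF in each lane that equals pattern, 0 elsewhere. */
        __m128i eq = _mm_cmpeq_epi32(v, pat);
        /* movmskps: bit k of bits is set when lane k matched. */
        int bits = _mm_movemask_ps(_mm_castsi128_ps(eq));

        if (bits) /* skip the per-lane work when nothing in this group matched */
        {
            for (int lane = 0; lane < 4; ++lane)
                if (bits & (1 << lane))
                    found[numFound++] = i + lane;
        }
    }

    /* Scalar tail for the last dataSize % 4 elements. */
    for (; i < dataSize; ++i)
        if (data[i] == pattern)
            found[numFound++] = i;

    return numFound;
}
```

The lane loop only runs for groups of four that contain a match, so when matches are rare the cost is mostly the compare and movmskps. If matches are dense, a 16-entry lookup table indexed by the 4-bit mask is one possible alternative.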

Sulik
6th December 2009, 01:33
Use pcmpeqd, followed by psadbw against zero (a horizontal sum), then a psubd; you'll end up with count*4.

hank315
6th December 2009, 15:38
A good compiler can automatically generate SSE2 code for this (auto-vectorization), so just look at the generated assembler code.
Mine does something like this:

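movdqa xmm1, XMMWORD PTR [pattern]   ; xmm1 = pattern, replicated in each dword
pxor xmm2, xmm2                      ; xmm2 = four running counts, zeroed
xor eax, eax                         ; eax = byte offset into data
movdqa xmm0, XMMWORD PTR [one]       ; xmm0 = 1 in each dword
$B1$11: movdqa xmm3, XMMWORD PTR data[eax]   ; load four ints
pcmpeqd xmm3, xmm1                   ; 0FFFFFFFFh where equal, 0 elsewhere
pand xmm3, xmm0                      ; turn each match into 1
paddd xmm2, xmm3                     ; add to the per-lane counts
add eax, 16                          ; advance by four ints
cmp eax, 256                         ; 256 bytes = 64 ints
jb $B1$11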

cogman
7th December 2009, 15:12
What is in the variable one? It seems to be key to making this work correctly. Is it ..01..001..001..001? My compiler (GCC) isn't that smart; it doesn't generate any SSE code from that.

Also, that isn't semantically the same. AFAICT, it keeps a count of the number of pattern matches found, but it doesn't give the locations of the matches, which makes it worthless for what I need.

Sulik
7th December 2009, 20:43
There is no need for the "and 1" operation. Since pcmpeqd leaves -1 (all bits set) in every matching dword, just replace the add with a sub:

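movdqa xmm1, XMMWORD PTR [pattern]   ; pattern, replicated in each dword
pxor xmm2, xmm2                      ; four running counts = 0
xor eax, eax                         ; byte offset into data
$B1$11:
movdqa xmm3, XMMWORD PTR data[eax]   ; load four ints
pcmpeqd xmm3, xmm1                   ; -1 where equal, 0 elsewhere
psubd xmm2, xmm3                     ; count - (-1) = count + 1 per match
add eax, 16                          ; advance by four ints
cmp eax, 256                         ; 256 bytes = 64 ints
jb $B1$11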
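A sketch of Sulik's psubd counting loop in SSE2 intrinsics, filling in the horizontal sum that the asm excerpt leaves out and handling lengths that are not a multiple of four. countIntsSSE2 is a hypothetical name; it uses unaligned loads where the asm uses movdqa, so data need not be 16-byte aligned. Like the asm, it only counts matches and does not report their locations.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: number of elements of data[] equal to pattern. */
int countIntsSSE2(const int *data, int dataSize, int pattern)
{
    const __m128i pat = _mm_set1_epi32(pattern); /* pattern in all 4 dwords */
    __m128i acc = _mm_setzero_si128();           /* pxor: four lane counts = 0 */
    int i = 0;

    for (; i + 4 <= dataSize; i += 4)
    {
        __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
        __m128i eq = _mm_cmpeq_epi32(v, pat); /* pcmpeqd: -1 where equal, else 0 */
        acc = _mm_sub_epi32(acc, eq);         /* psubd: acc - (-1) == acc + 1 */
    }

    /* Horizontal sum: spill the four lane counts and add them. */
    int lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    int count = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar tail for the last dataSize % 4 elements. */
    for (; i < dataSize; ++i)
        if (data[i] == pattern)
            ++count;

    return count;
}
```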

hank315
8th December 2009, 02:14
Seems Sulik really examined the asm code :). I just wrote some code and compiled it (Intel Fortran compiler); I didn't seriously look at the generated asm code. It was just an example of how a compiler could handle such source code.

In many cases hand-written assembler is more optimized; the major drawback is that it takes a lot of time...

And yes, [one] = 00000001h 00000001h 00000001h 00000001h (four dwords, each equal to 1).

roozhou
11th December 2009, 11:40
A faster way to get [one], without loading it from memory, is:

pcmpeqd xmm0, xmm0
psrld xmm0, 31
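The same trick expressed with SSE2 intrinsics, as a sketch; allOnesDwords is a hypothetical helper name. Comparing a register with itself sets every bit, and a logical right shift of each dword by 31 leaves exactly 1. A compiler is free to fold this back into a constant load (or to emit pcmpeqd/psrld itself for _mm_set1_epi32(1)), so check the generated asm if the exact instructions matter.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: returns 1 in each of the four dwords. */
__m128i allOnesDwords(void)
{
    __m128i x = _mm_setzero_si128();
    __m128i all = _mm_cmpeq_epi32(x, x); /* pcmpeqd x, x: every dword = 0xFFFFFFFF */
    return _mm_srli_epi32(all, 31);      /* psrld 31: every dword = 1 */
}
```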