sh0dan
20th August 2009, 10:45
I need a fast way to convert float values to 16-bit unsigned integers. Currently I use this:
minps xmm0, [65535.f] // Saturate to 65535; [65535.f] holds four copies of 65535.0f (Latency: 3 on Core 2, Throughput: 1)
cvtps2dq xmm0, xmm0 // Convert to dwords (Latency: 3 on Core 2, Throughput: 1)
movdqa xmm1, xmm0 // Copy (L:1 T:0.5)
pcmpgtd xmm1, [zeroes] // xmm1 = all ones in each dword where xmm1 > 0, else 0 (L:1 T:1)
pand xmm1, xmm0 // Negative values become 0; result as dwords in xmm1 (L:1 T:1)
(then shuffle the low word of each dword together, so the four 16-bit results end up in the lower 64 bits of xmm1)
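For reference, here is a sketch of the same sequence in C with SSE2 intrinsics, one intrinsic per instruction above (the movdqa copy is left to the compiler). The function name `float4_to_u16_sse2` is hypothetical, the final compaction (pshuflw, pshufhw, pshufd) is only one assumed way to do the shuffle step, and the default MXCSR rounding mode (round to nearest even) is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: converts 4 floats to 4 uint16_t with the same steps
   as the assembly (minps, cvtps2dq, pcmpgtd, pand), then gathers the low
   word of each dword into the low 64 bits and stores them. */
static void float4_to_u16_sse2(const float *in, uint16_t *out)
{
    const __m128 max_val = _mm_set1_ps(65535.0f);
    __m128 x = _mm_loadu_ps(in);

    x = _mm_min_ps(x, max_val);                  /* minps: clamp the top end */
    __m128i d = _mm_cvtps_epi32(x);              /* cvtps2dq: round to int32 */
    __m128i pos = _mm_cmpgt_epi32(d, _mm_setzero_si128()); /* pcmpgtd: all ones where d > 0 */
    d = _mm_and_si128(d, pos);                   /* pand: negatives become 0 */

    /* Words are w0..w7; the wanted results are w0, w2, w4, w6. */
    d = _mm_shufflelo_epi16(d, _MM_SHUFFLE(3, 2, 2, 0)); /* low half:  w0 w2 w2 w3 */
    d = _mm_shufflehi_epi16(d, _MM_SHUFFLE(3, 2, 2, 0)); /* high half: w4 w6 w6 w7 */
    d = _mm_shuffle_epi32(d, _MM_SHUFFLE(3, 1, 2, 0));   /* dwords 0 and 2 to the bottom */
    _mm_storel_epi64((__m128i *)out, d);         /* movq: store w0 w2 w4 w6 */
}
```

Per four results this costs the five instructions above plus three shuffles, which is the part worth eliminating for throughput.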
An SSE4.1 implementation is much simpler, since "packusdw" saturates signed dwords to unsigned words, which removes everything but the actual conversion.
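A sketch of the SSE4.1 path, assuming GCC or Clang: the `target("sse4.1")` attribute lets one function use `_mm_packus_epi32` (packusdw) without building the whole file with `-msse4.1`, and the caller is assumed to check CPU support first. The name `float4_to_u16_sse41` is hypothetical. The `minps` clamp is kept on purpose: `cvtps2dq` turns inputs of 2^31 and above (and +inf, NaN) into 0x80000000, which packusdw would saturate to 0 rather than 65535.

```c
#include <assert.h>
#include <stdint.h>
#include <smmintrin.h> /* SSE4.1 intrinsics (also pulls in SSE2) */

/* Hypothetical helper: 4 floats -> 4 uint16_t using packusdw.
   Requires an SSE4.1-capable CPU; GCC/Clang-specific target attribute. */
static __attribute__((target("sse4.1")))
void float4_to_u16_sse41(const float *in, uint16_t *out)
{
    __m128 x = _mm_loadu_ps(in);
    /* Still clamp: huge inputs would otherwise become 0x80000000 -> 0. */
    x = _mm_min_ps(x, _mm_set1_ps(65535.0f));
    __m128i d = _mm_cvtps_epi32(x);          /* cvtps2dq */
    /* packusdw: signed dwords -> unsigned words with saturation, so
       negative values become 0 without pcmpgtd/pand or shuffles. */
    __m128i w = _mm_packus_epi32(d, d);
    _mm_storel_epi64((__m128i *)out, w);     /* store the 4 low words */
}
```

Since packusdw takes two source registers, passing two different vectors instead of `d, d` yields eight results per instruction.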
Does anyone have a more efficient way to do this with SSE2? I don't care much about latency, but throughput is important.
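One possible SSE2 alternative, sketched here under the assumption that the input length is a multiple of 8, avoids both the compare/mask and the shuffles: clamp to [0, 65535] in float, convert, subtract 32768 so every value fits a signed word, narrow two vectors at once with `packssdw`, and add the 32768 back with an xor of 0x8000. By instruction count that is about 10 ALU operations per 8 results, versus roughly 16 for the sequence above (assuming a three-instruction final shuffle). This has not been benchmarked, and the name `floats_to_u16_sse2` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: converts n floats (n a multiple of 8) to uint16_t
   using a bias of 32768 so that the signed pack (packssdw) can narrow
   unsigned 16-bit values. Assumes the default round-to-nearest mode. */
static void floats_to_u16_sse2(const float *in, uint16_t *out, size_t n)
{
    assert(n % 8 == 0);
    const __m128 lo = _mm_setzero_ps();
    const __m128 hi = _mm_set1_ps(65535.0f);
    const __m128i bias = _mm_set1_epi32(32768);
    const __m128i flip = _mm_set1_epi16(-32768); /* 0x8000 in every word */

    for (size_t i = 0; i < n; i += 8) {
        /* minps first, so a NaN input becomes 65535 as with the original
           code; maxps then clamps negatives (and -inf) to 0. */
        __m128 a = _mm_max_ps(_mm_min_ps(_mm_loadu_ps(in + i), hi), lo);
        __m128 b = _mm_max_ps(_mm_min_ps(_mm_loadu_ps(in + i + 4), hi), lo);
        /* cvtps2dq, then shift [0, 65535] down to [-32768, 32767] */
        __m128i da = _mm_sub_epi32(_mm_cvtps_epi32(a), bias);
        __m128i db = _mm_sub_epi32(_mm_cvtps_epi32(b), bias);
        /* packssdw: 8 words from two vectors; no saturation happens
           because every value is already in signed 16-bit range */
        __m128i w = _mm_packs_epi32(da, db);
        /* xor 0x8000 adds 32768 back modulo 2^16 */
        _mm_storeu_si128((__m128i *)(out + i), _mm_xor_si128(w, flip));
    }
}
```

If the inputs are known never to go below about -2^31 + 32768, the two `maxps` can be dropped: negative values then land below -32768 after the subtraction, and packssdw saturates them to -32768, which the xor turns into 0.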