Old 4th April 2006, 08:07   #1  |  Link
MeteorRain
紺野木綿季
 
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 498
[help]SSE optimized function runs slower

I'd like to calculate the distance between two colors in RGB color space. I wrote two functions: one uses sqrt, the other uses SSE2 code. In my 2000-frame test, the former takes 39 sec while the "optimized" code takes 48 sec. In theory the SSE code should run much faster than the ordinary code, so I suspect there is a bottleneck somewhere in my code. Please help me find the problem. TIA.

my code:
Code:
if(sse)
	diff += _sse2_dist((float)(tr - r), (float)(tg - g), (float)(tb - b));
else
	diff += sqrt((float)(tr - r) * (tr - r) + (tg - g) * (tg - g) + (tb - b) * (tb - b));

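float _sse2_dist(float a, float b, float c)
{
	__m128 x, s, r;
	_MM_ALIGN16 float flo[4] = {0.0};	// 16-byte aligned scratch; flo[3] stays 0
	flo[0] = a;
	flo[1] = b;
	flo[2] = c;
	x = _mm_load_ps(flo);	// x = [a, b, c, 0]
	s = _mm_mul_ps(x, x);	// s = [a*a, b*b, c*c, 0]
	r = _mm_add_ss(s, _mm_movehl_ps(s, s));	// lane 0: a*a + c*c
	r = _mm_add_ss(r, _mm_shuffle_ps(r, r, 1));	// lane 0: a*a + c*c + b*b
	r = _mm_sqrt_ps(r);	// square root of all 4 lanes; only lane 0 is used
	_mm_store_ss(flo, r);	// write lane 0 back to memory
	return flo[0];
}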
regards,
MeteorRain
Old 4th April 2006, 09:02   #2  |  Link
Sulik
Registered User
 
Join Date: Jan 2002
Location: San Jose, CA
Posts: 215
In this case, most of your time will be spent in the square root and the call overhead (not counting the store-to-load forwarding stalls caused by storing three 32-bit values and then immediately loading them back as one packed value). The overhead of the SSE setup kills any benefit over regular floating point.

The whole point of SIMD is to process data in parallel. You'd be better off re-arranging your data into three separate R, G, and B planes, so that you can process 4 pixels simultaneously without having to work around the lack of horizontal operations.
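A minimal sketch of the planar idea, not code from this thread: `split_planes` and `dist4` are hypothetical names, the R,G,B byte order of the packed input is an assumption, and the target color is assumed to be the same for all four pixels.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Split n packed pixels (3 bytes each; R,G,B byte order is an assumption)
   into three float planes. */
static void split_planes(const unsigned char *rgb, size_t n,
                         float *r, float *g, float *b)
{
    for (size_t i = 0; i < n; ++i) {
        r[i] = (float)rgb[3 * i + 0];
        g[i] = (float)rgb[3 * i + 1];
        b[i] = (float)rgb[3 * i + 2];
    }
}

/* Distances from 4 consecutive pixels to one target color (tr, tg, tb). */
static void dist4(const float *r, const float *g, const float *b,
                  float tr, float tg, float tb, float *out)
{
    /* Each register holds the same channel of 4 different pixels. */
    __m128 dr = _mm_sub_ps(_mm_set1_ps(tr), _mm_loadu_ps(r));
    __m128 dg = _mm_sub_ps(_mm_set1_ps(tg), _mm_loadu_ps(g));
    __m128 db = _mm_sub_ps(_mm_set1_ps(tb), _mm_loadu_ps(b));
    __m128 sq = _mm_add_ps(_mm_mul_ps(dr, dr),
                _mm_add_ps(_mm_mul_ps(dg, dg), _mm_mul_ps(db, db)));
    _mm_storeu_ps(out, _mm_sqrt_ps(sq));  /* 4 square roots at once */
}
```

Because every lane belongs to a different pixel, the sum of the three squares is a plain vertical add and no shuffles are needed. Whether the cost of splitting the planes pays off depends on how often each pixel is reused, which the thread does not say.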
Old 4th April 2006, 09:33   #3  |  Link
MeteorRain
紺野木綿季
 
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 498
Thanks, I got it.
I'll try to re-arrange it.
Old 4th April 2006, 15:21   #4  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,171
Step 1: move the "if (sse)" test outside your outermost loop.

Step 2: as Sulik says, try to process entire rows of data, i.e. pass in 2 pointers and a count, and have your SSE code pick up 4 pixels at a time and pipeline the algorithm.
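The two steps above could be sketched roughly as follows. Everything here is illustrative rather than taken from the thread: `row_t`, `row_dist_scalar`, `row_dist_sse` and `frame_dist` are hypothetical names, and the rows are assumed to be stored as separate float planes.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical planar row: one float per channel per pixel. */
typedef struct { const float *r, *g, *b; } row_t;
typedef float (*row_dist_fn)(const row_t *p, const row_t *q, size_t n);

/* Distance for one pixel. _mm_sqrt_ss stands in for sqrtf() so that the
   sketch does not depend on libm. */
static float pixel_dist(const row_t *p, const row_t *q, size_t i)
{
    float dr = p->r[i] - q->r[i];
    float dg = p->g[i] - q->g[i];
    float db = p->b[i] - q->b[i];
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(dr * dr + dg * dg + db * db)));
}

/* One pixel per iteration. */
static float row_dist_scalar(const row_t *p, const row_t *q, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += pixel_dist(p, q, i);
    return sum;
}

/* Squared difference of 4 consecutive floats from two planes. */
static __m128 sqdiff4(const float *x, const float *y)
{
    __m128 d = _mm_sub_ps(_mm_loadu_ps(x), _mm_loadu_ps(y));
    return _mm_mul_ps(d, d);
}

/* Four pixels per iteration; leftover pixels are handled one at a time. */
static float row_dist_sse(const row_t *p, const row_t *q, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    float lanes[4], sum;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 sq = _mm_add_ps(sqdiff4(p->r + i, q->r + i),
                    _mm_add_ps(sqdiff4(p->g + i, q->g + i),
                               sqdiff4(p->b + i, q->b + i)));
        acc = _mm_add_ps(acc, _mm_sqrt_ps(sq));
    }
    _mm_storeu_ps(lanes, acc);  /* one horizontal sum per row, not per pixel */
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i)
        sum += pixel_dist(p, q, i);
    return sum;
}

/* The "if (sse)" decision is made once, before the row loop. */
static float frame_dist(const row_t *p, const row_t *q,
                        size_t rows, size_t n, int sse)
{
    row_dist_fn fn = sse ? row_dist_sse : row_dist_scalar;
    float diff = 0.0f;
    for (size_t y = 0; y < rows; ++y)
        diff += fn(&p[y], &q[y], n);
    return diff;
}
```

The two paths add the per-pixel distances in a different order, so on real data their totals may differ in the last few bits.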

Think very hard about whether you can express your algorithm without doing the SQRTs. If you are just doing comparisons, then the sum of squared values is just as good as its square root, i.e. for non-negative A and B, if A > B then A*A > B*B.
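As an illustration of comparing without square roots, here is a small sketch that picks the closest color from a list using squared distances only. The names `rgb_t`, `dist_sq` and `nearest`, and the palette-search use case itself, are assumptions for the example; the thread does not say what the distances are used for.

```c
#include <stddef.h>

/* Hypothetical 8-bit RGB pixel. */
typedef struct { unsigned char r, g, b; } rgb_t;

/* Squared Euclidean distance; at most 3 * 255 * 255 = 195075, fits in int. */
static int dist_sq(rgb_t p, rgb_t q)
{
    int dr = p.r - q.r, dg = p.g - q.g, db = p.b - q.b;
    return dr * dr + dg * dg + db * db;
}

/* Index of the palette entry closest to target (n must be at least 1).
   Ordering by squared distance gives the same winner as ordering by
   distance, because sqrt is monotonic on non-negative values. */
static size_t nearest(rgb_t target, const rgb_t *palette, size_t n)
{
    size_t best = 0;
    int best_d = dist_sq(target, palette[0]);
    for (size_t i = 1; i < n; ++i) {
        int d = dist_sq(target, palette[i]);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}
```

This shortcut holds for comparing individual distances. A running total such as `diff += sqrt(...)` is a different matter: distances 3 and 3 sum to 6, more than 0 and 5, but their squares sum to 18, less than 25, so a total of squared distances can rank two frames differently from a total of distances.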

Also, when using intrinsics, always ask the compiler for an ASM listing. You will probably find the compiler is loading and storing the XMM register between intrinsic calls, which of course defeats the purpose of using SSE instructions. The latest compiler is much better but can still do some very stupid things. Shuffling your code slightly may help unconfuse the compiler. To get the ultimate speed you may need to use assembler.
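One possible reshuffle of the single-pixel function from post #1, offered as a sketch rather than a guaranteed fix: it avoids the explicit float array, so the compiler has no memory round trip that it is forced to keep. `dist3_regs` is a hypothetical name, and whether the values really stay in XMM registers still depends on the compiler, so the listing (for example `/FA` with MSVC or `-S` with GCC) is worth checking.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Same arithmetic as the original function, without the scratch array. */
static float dist3_regs(float a, float b, float c)
{
    __m128 x = _mm_set_ps(0.0f, c, b, a);           /* lanes: [a, b, c, 0] */
    __m128 s = _mm_mul_ps(x, x);                    /* [a*a, b*b, c*c, 0] */
    __m128 r = _mm_add_ss(s, _mm_movehl_ps(s, s));  /* lane 0: a*a + c*c */
    r = _mm_add_ss(r, _mm_shuffle_ps(r, r, 1));     /* lane 0: + b*b */
    return _mm_cvtss_f32(_mm_sqrt_ss(r));           /* sqrt of lane 0 only */
}
```

This still computes one distance per call, so the call and square root costs mentioned earlier in the thread remain; it only removes the explicit store and reload.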
Old 5th April 2006, 01:11   #5  |  Link
MeteorRain
紺野木綿季
 
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 498
I rewrote almost all of the code: I deleted the point matrix from my old code, now read the points directly from the source data, and re-arranged the algorithm. The results are the same as before, but the speed has gone up noticeably!
Thanks to everyone above!

===============
Yeah, it cost me several hours to debug. :|

Last edited by MeteorRain; 5th April 2006 at 01:14.