View Full Version : Performance difference between two code fragments?
lol_123
2nd November 2007, 17:44
I am new to MFC coding. This afternoon, I found out a quite strange thing. 1. runs to 1.5 fps, 2. runs up to 590000 fps. Code as below
1.
int i,j,k;
for (i=0;i<2000;i++)
for (j=0; j<2000;j++)
{ if (j>=0) k=k+1;
}
2.
int i,j,k,g;
for (i=0;i<2000;i++)
for (j=0; j<2000;j++)
{ if (j>=0)
{g=k+1;
k=g;
}
Does any one know what makes 1 and 2 have so huge difference
on performance?
neuron2
2nd November 2007, 18:02
Your second example is syntactically incorrect. If you don't post the exact code, we can't help you.
And what do you mean by "runs to 1.5 fps"? There is no frame processing in this code.
Lastly, read the forum rules and do not make lame thread titles. I have fixed yours.
Dark Shikari
2nd November 2007, 18:15
Sounds like in one of them, the compiler is optimizing out the loop, and in the other it isn't.
JohnnyMalaria
3rd November 2007, 03:45
Are you using debug or release mode?
In debug mode, the compiler adds a lot of extra stuff for checking and doesn't make use of efficient processor instructions.
For example, k = k + 1 can be done with a single CPU instruction. g = k + 1 followed by k = g needs more instructions and in debug mode it needs a lot more. If you debug your code and set breakpoints at the k = k + 1 and g = k + 1, you can look at the compiled instructions (disassembly) - you'll probably see quite a difference.
If you are using release mode then a whole different set of reasons can apply.
BTW - if you are using debug mode you will get unexpected results if you don't initialize your variables to zero. In debug mode, your variables are set to 0xCDCDCDCD (deliberately).
audiohominis
3rd November 2007, 09:55
In the second example there are two open brackets and one close bracket; is that normal?
Taktaal
3rd November 2007, 17:39
No optimization would ever account for a 300000x difference in run speeds. The only thing I can think of is that you maybe ran the first code with a conditional breakpoint. Having one of these in a loop with a lot of repetitions will really screw up the execution times.
lol_123
3rd November 2007, 17:59
i'm using VC2005, at release mode, it seems to me that it takes really long time to excute "k=k+1" in the " if " loop. I guess that
it might be caused by memory allocation.
Does any one have clue?
clsid
3rd November 2007, 18:11
Memory allocation should not be the problem. The space for k only needs to be allocated once.
You can remove the "if (j>=0)" code. It's always true.
lol_123
3rd November 2007, 18:30
Clsid, that's reason why i got confused. you may try it by yourself under vc2005. it makes huge different if you put "if (j>=0)" there although it is always ture.
JohnnyMalaria
3rd November 2007, 18:49
How are you timing the code?
Are you testing the loops in isolation (i.e., no other code in the function)?
The release build of this:
MyTestFunction()
{
int i, j, k, g;
for (i = 0; i < 2000; i++)
for (j = 0; j < 2000; j++)
if (j >= 0)
{
g = k + 1;
k = g;
}
return 0;
};
generates NO CODE AT ALL except for the return (xor eax, eax)).
But this:
MyTestFunction()
{
int i, j, k, g;
for (i = 0; i < 2000; i++)
for (j = 0; j < 2000; j++)
if (j >= 0)
{
g = k + 1;
k = g;
}
MessageBox(0, 0, 0, k);
return 0;
};
does. (It doesn't matter what the extra code is as long as it calls another function that could potentially change/depend on the variable).
Compile the debug version and all the code is generated.
To convince yourself, enable ASM listing generation and open the ASM files after compilation.
BTW, when I use QueryPerformanceCounter to time the loops (that actually have the code as well), the k = k + 1 version is about 5% faster than the other.
If you have one of your loops in a function that generates no assembly code and the other in one that does, that could explain your differences. Without seeing the full context of your code, it's impossible to say.
Manao
3rd November 2007, 18:53
There's absolutely no difference in speed between both code under msvc 2005. I think you're mishandling MSVC.
Taktaal
3rd November 2007, 19:21
Clsid, that's reason why i got confused. you may try it by yourself under vc2005. it makes huge different if you put "if (j>=0)" there although it is always ture.
An if statement that's (almost) always true will slow down the code because jump prediction in modern CPUs will usually always guess on a jump which is what happens if a comparison is false.
It is a doozie in unoptimized code, but the issue has been taken into consideration by compiler optimization for at least 5 years. In code like Java and .NET the JIT will even invert jumps at run time when it notices that it's causing the CPU to guess wrongly.
Again, there is no way a simple optimization is going to cause a 300000x slow down, period.
clsid
3rd November 2007, 19:39
Maybe lol_123 can explain how he is measuring the execution time. That is where things go wrong.
Dark Shikari
3rd November 2007, 20:08
No optimization would ever account for a 300000x difference in run speeds.Sure it would, its called optimizing out the loop so it doesn't run to begin with.
lol_123
4th November 2007, 04:11
Thanks for all of you replies. Now i am working on a runtime 2D scaler optimization. Currently, the 2D scaler only runs to 0.1 fps when it zooms up 320x240 to 640x480. As i said earlier, i am new to MFC , before I only did the code for embeded system, like DSP or silicon. During the optimization, i found out that it takes very long time to assign the value at certain case, for example, in "if" loop, k=k+1 takes long time, but g=k+1, k=g,
takes no time.
By the way, i am not a native english speaker, my english may not be understandable.
the following is my code
LONGLONG t1,t2;
LONGLONG persecond;
QueryPerformanceFrequency((LARGE_INTEGER *)&persecond);
QueryPerformanceCounter((LARGE_INTEGER *)&t1);
int n = 0;
for(int i = 0; i <20000; i++)
{
for(int j = 0; j<20000; j++)
{
if(j >= 0)
{
n += 1;
}
}
}
QueryPerformanceCounter((LARGE_INTEGER *)&t2);
double err=(double)(t2-t1)/(double)persecond;
CString serr;
serr.Format("%lf",1/err);
AfxMessageBox(serr);
neuron2
4th November 2007, 04:17
You talk about variables g and k, but they don't appear in your code, and instead n appears. You have to be more rigorous if you want proper help.
Sulik
4th November 2007, 07:06
A good compiler would simply optimize away the two loops and replace the entire thing by n=20000*20000.
Adding/removing useless instructions might prevent or enable the compiler to perform this optimization in a way that does not necessarily seem obvious to you.
Benchmarking useless loop constructs is useless in itself. If you really want to find out what's happening in your real code, look at the disassembly.
akupenguin
8th November 2007, 04:24
An if statement that's (almost) always true will slow down the code because jump prediction in modern CPUs will usually always guess on a jump which is what happens if a comparison is false.
Modern CPUs differ in what they predict the first time a branch is executed (K8 predicts not-taken, P4 depends on branch direction (loops taken, forward branches not), Core2 has no special first time behavior and just uses whatever garbage is in the uninitialized branch history).* But thereafter as long as the branch instruction remains in cache they keep track of how often it's taken or not, and predict based on past behaviour. That said, a branch that's always taken can still cause a little speed loss on some architectures, because even without any mispredictions a jump causes an extra fetch.
* Software Optimization Guide for AMD64 Processors (http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF), page 253.
Intel 64 and IA-32 Architectures Optimization Reference Manual ("http://www.intel.com/design/processor/manuals/248966.pdf), page 3.10
vBulletin® v3.8.5, Copyright ©2000-2012, Jelsoft Enterprises Ltd.