The documentation for the iPhone’s VFP processor says that all floating-point instructions take 1 cycle, apart from divide and square root, which take 15 cycles.
This leads to a very obvious optimisation: cache the results of your divide operations.
In other words, if you have code like this:
float fl = 1.0007f;
for (i = 0; i < 16; i++)
    matrix.Set(i, matrix[i]/fl);
you can increase performance by doing the divide outside the loop:
float fl = 1.0f/1.0007f;
for (i = 0; i < 16; i++)
    matrix.Set(i, matrix[i]*fl);
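As a minimal, self-contained sketch of the same idea, here are the two versions side by side on a plain float array. The names `scale_div` and `scale_recip` are made up for illustration and are not part of any iPhone API. One thing worth knowing: multiplying by a reciprocal can differ from a true divide in the last bit, which is why compilers generally won't make this change for you unless you enable relaxed floating-point options.

```c
#include <stddef.h>

/* Divide every element by d: one divide per element. */
void scale_div(float *v, size_t n, float d)
{
    for (size_t i = 0; i < n; i++)
        v[i] = v[i] / d;
}

/* Same scaling, with a single divide hoisted out of the loop. */
void scale_recip(float *v, size_t n, float d)
{
    const float inv = 1.0f / d;   /* the only divide */
    for (size_t i = 0; i < n; i++)
        v[i] = v[i] * inv;        /* one multiply per element */
}
```

Going by the cycle counts quoted above, the second version swaps sixteen 15-cycle divides for one divide plus sixteen 1-cycle multiplies on a 4x4 matrix.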
I measured this and found that it does indeed make a big improvement.
Also, I thought it would be interesting to try out the old Quake fast inverse square root trick:
float InvSqrt(float x){
    float xhalf = 0.5f * x;
    int i = *(int*)&x;           // store floating-point bits in integer
    i = 0x5f3759d5 - (i >> 1);   // initial guess for Newton's method
    x = *(float*)&i;             // convert new bits into float
    x = x*(1.5f - xhalf*x*x);    // one round of Newton's method
    return x;
}
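One caveat with the listing above: reading a float through an `int*` breaks C's strict aliasing rules, so an optimising compiler is allowed to miscompile it. Here is a sketch of the same routine that copies the bits with `memcpy` instead. The name `InvSqrtSafe` is hypothetical, and the code assumes `float` is a 32-bit IEEE 754 value and that `x` is positive and finite.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, finite x (assumes 32-bit IEEE float). */
float InvSqrtSafe(float x)
{
    float xhalf = 0.5f * x;
    uint32_t i;
    memcpy(&i, &x, sizeof i);           /* copy the float's bits into an integer */
    i = 0x5f3759d5u - (i >> 1);         /* same magic constant as the listing above */
    memcpy(&x, &i, sizeof x);           /* copy the bits back into a float */
    return x * (1.5f - xhalf * x * x);  /* one round of Newton's method */
}
```

Compilers normally turn a fixed-size `memcpy` like this into a plain register move, so it shouldn't cost anything, though I haven't checked the generated code on the iPhone.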
This code works on the iPhone, but the results are less accurate than the standard library's sqrt:
float fl = 2.0f;
result = 1.f / sqrt ( fl ); // gives 0.707106769
result = InvSqrt(fl); // gives 0.706930041
The actual value is 0.70710678, so the approximation is off by about 0.025%.
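If that error is too large, the usual fix is another round of Newton's method: each round roughly squares the relative error, at the cost of a few more multiplies. Here is a sketch with a caller-chosen number of rounds. `InvSqrtRounds` is a hypothetical name, the bits are copied with `memcpy` rather than pointer casts, and I haven't timed this variant on the iPhone.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, finite x, refining the bit-trick
   guess with the requested number of Newton rounds. */
float InvSqrtRounds(float x, int rounds)
{
    const float xhalf = 0.5f * x;
    uint32_t i;
    memcpy(&i, &x, sizeof i);        /* float bits -> integer */
    i = 0x5f3759d5u - (i >> 1);      /* initial guess */
    memcpy(&x, &i, sizeof x);        /* integer bits -> float */
    for (int r = 0; r < rounds; r++)
        x = x * (1.5f - xhalf * x * x);
    return x;
}
```

With `rounds` set to 1 this is the same calculation as InvSqrt. With 2 rounds the result for 2.0f should be within about 1e-6 of the true value.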
I measured the performance and found that it’s only slightly faster.