C++ for Games Programming: essay piece


Upload: andrew-francis-oliver

Post on 11-Sep-2015


DESCRIPTION

My second best piece in my Games Programming Portfolio for my Masters in Information Technology from Swinburne University of Technology 2011-12. Fore, after inheriting a bequest, went back to University to finish my education as a Software Engineer ...

TRANSCRIPT

Some Comments on Optimisation and Readability

As a self-declared expert in optimisation, I really cannot resist commenting on this code from the guest lecture:

    float xhalf = 0.5f * x;
    int i = *(int *) &x;
    i = 0x5f3759df - (i >> 1);
    x = *(float *) &i;
    x = x * (1.5f - xhalf*x*x);
    return x;

During the guest lecture I did not know what to make of the above. It is supposed to perform some calculation deep inside some innermost loop. It is unreadable. What surprises me is that the example is so badly chosen, as this code is far from optimal whatever it is supposed to be calculating. After all, it uses float arithmetic, which will be very slow compared with integer arithmetic on most Pentiums.

My memory is that one of the guest lecturers (I've forgotten who) said that it calculates something involving exponentiation of some kind that is done thousands of times.

It is well known that Pentium floating point code is slow. Doing the calculations using only integers will almost always give a faster result, but the code will be much more obscure.

Let's assume we want to calculate the square root of a 32 bit float using integer arithmetic only. The first thing to do is to mask off the sign bit, just to be safe, knowing the number must be positive. (Refer to "Assembly Language Programming for the Intel 80XXX Family", pp. 614-666, William B. Giles, pub. MacMillan 1991.)

Start with:

    union { float f; int x; } u;      /* store the float in u.f first */
    int xpos = 0x7fffffff & u.x;      /* clear the sign bit */

Then one wants to extract the 8 bit exponent and the 23 bit significand and separate them.

    int xsignificand = 0x007fffff & xpos;
    int xexponent = 0x7f800000 & xpos;

Then add the first binary bit of the mantissa back:

    int xmantissa = xsignificand | 0x00800000;

But first adjust the exponent for denormalisation:

    xexponent += 0x0b8000000;
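The field extraction above can be sketched as a small self-contained function. This is only an illustration under stated assumptions: the struct FloatFields and the function split_float are hypothetical names that do not appear in the lecture or in Giles, the float is assumed to be a 32 bit IEEE 754 single, and std::memcpy stands in for the union because it is the portable way to read a float's bits in C++.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical struct: the three raw fields of a 32 bit IEEE 754 float.
struct FloatFields {
    std::uint32_t sign;         // 0 for positive, 1 for negative
    std::uint32_t exponent;     // biased exponent, 0..255
    std::uint32_t significand;  // the 23 stored fraction bits
};

// Hypothetical helper: split a float into its fields with integer masks.
FloatFields split_float(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32 bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy the bit pattern, no conversion

    FloatFields out;
    out.sign = bits >> 31;
    out.exponent = (bits & 0x7f800000u) >> 23;
    out.significand = bits & 0x007fffffu;
    return out;
}
```

For a normal (non-zero, non-denormal) input, the full 24 bit mantissa is then significand | 0x00800000, as in the text.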

We need to deal with two cases, when the exponent is odd and when the exponent is even, in order to denormalise the mantissa correctly for the integer approach to square roots ...

    if (xexponent & 0x00800000) {
        xmantissa <<= 1;
    }

The proposed result:

    int xbuilditup = 0;

Use a table of precalculated integers:

    unsigned int topbitarray[32] = {
        1, 2, 4, 8, 16, 32, 64, 128,
        0x00000100, 0x00000200, 0x00000400, 0x00000800,
        0x00001000, 0x00002000, 0x00004000, 0x00008000,
        0x00010000, 0x00020000, 0x00040000, 0x00080000,
        0x00100000, 0x00200000, 0x00400000, 0x00800000,
        0x01000000, 0x02000000, 0x04000000, 0x08000000,
        0x10000000, 0x20000000, 0x40000000, 0x80000000
    };
    int j;
    for (j = 31; j >= 0; j--) {
        if (xmantissa & topbitarray[j]) {
            break;    /* j is now the index of the highest set bit */
        }
    }
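The search for the highest set bit can also be written without the table. A minimal sketch, assuming a shift is acceptable in place of the lookup; the name highest_set_bit is hypothetical and the return value of -1 for a zero input is my own convention, not part of the method above.

```cpp
#include <cstdint>

// Hypothetical helper: index (0..31) of the highest set bit of v,
// or -1 if v is zero. Mirrors the table-driven loop, using a shift instead.
int highest_set_bit(std::uint32_t v) {
    for (int j = 31; j >= 0; --j) {
        if (v & (std::uint32_t{1} << j)) {
            return j;
        }
    }
    return -1;
}
```

With the implied leading bit restored, a normalised 24 bit mantissa always gives 23 here, or 24 after the doubling for an odd exponent.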

Then, if j is odd, select the even bit position below it:

    k = j & 0xfffffffe;
    xbuilditup += topbitarray[k / 2];
    xmantissa -= topbitarray[k];

Loop, i.e. repeat this process, and one should churn out an approximate value of the square root of x, but in denormalised xexponent and xmantissa parts. If one needed them back together one could create a float out of them, but in any innermost loop the separated bits would be more use anyway for additive comparisons.
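For comparison, here is a sketch of the conventional bit-by-bit integer square root. It is not the exact loop outlined above: it is the standard textbook method, swapped in because it is known to give the exact integer part of the root, and the name isqrt32 is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: floor(sqrt(n)) by the classic bit-by-bit method,
// using only integer shifts, adds, subtracts and compares.
std::uint32_t isqrt32(std::uint32_t n) {
    std::uint32_t result = 0;
    std::uint32_t bit = std::uint32_t{1} << 30;  // highest even bit position

    // Start at the highest power of four that does not exceed n.
    while (bit > n) {
        bit >>= 2;
    }

    while (bit != 0) {
        if (n >= result + bit) {
            n -= result + bit;              // this root bit is set
            result = (result >> 1) + bit;
        } else {
            result >>= 1;                   // this root bit is clear
        }
        bit >>= 2;
    }
    return result;
}
```

Each pass settles one bit of the root, so a 24 bit mantissa yields a 12 bit root unless it is first shifted left by an even number of places to gain precision.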

Note that sometimes the same j will be found several times while we deduct away a bit of xmantissa, so the search has to retest from the same j rather than skip past it. There is no need to start at j = 31 every time, however: if we know that the top seven bits are clear anyway, we could start at j = 24 to get things moving, and retest only from the last j that found anything.

One could write a test program if required to test the above method.

Even faster, we could precompute the integer square of every integer from 0 to 255, and use a moving radix analysis of the top eight bits of xmantissa, with binary search, to compute the square root digits in eight bit chunks rather than one bit chunks. Remember, though, that with integer square root routines one sometimes has to deduct something more than once to get it right, unlike long division!

This code should really be tested before being relied upon!
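The table of squares and the binary search over it might look like the following. This is a sketch of one eight bit step only, assuming the value being examined has already been isolated; the names make_square_table and root_digit are hypothetical, and the moving radix bookkeeping around the step is not shown.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical helper: squares of 0..255, computed once.
std::array<std::uint32_t, 256> make_square_table() {
    std::array<std::uint32_t, 256> table{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        table[i] = i * i;
    }
    return table;
}

// Hypothetical helper: the largest d in 0..255 with d * d <= v,
// found by binary search over the table rather than by arithmetic.
std::uint32_t root_digit(std::uint32_t v) {
    static const std::array<std::uint32_t, 256> squares = make_square_table();
    // upper_bound returns the first square greater than v; the entry
    // before it is the answer. squares[0] == 0, so an entry always exists.
    auto it = std::upper_bound(squares.begin(), squares.end(), v);
    return static_cast<std::uint32_t>(it - squares.begin()) - 1;
}
```

The binary search costs at most eight comparisons per chunk, against eight passes of the one bit loop.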

Further Discussion

See Michael Abrash's "Zen of Code Optimization" [2], another book worth reading if one really needed a fast square root algorithm and found the Pentium/8087/80287/80387 instruction fsqrt too slow (perhaps because it calculates to a 64 bit significand internally, using the 80 bit internal FPU registers, and then rounds to 23 bits).

Anyway, according to William Giles [1] p843, the 80387 speeded up fsqrt a lot, and it takes only about the time of four fmul instructions; to quote, "Its speed is remarkable". So why hack with integer arithmetic? Just get your global precomputation right for the overall calculations you are doing, and avoid calling fsqrt so much.

However, from the above routine one might, on reflection, wonder how useful a 12 bit approximation to the square root of a 32 bit float with 23 bit precision really is, even if it's rocket fast!
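As one illustration of avoiding the square root call altogether, a distance test can compare squared quantities instead. This is my own example rather than anything from the lecture or from Giles, and the name within_radius is hypothetical.

```cpp
// Hypothetical helper: true if the point (px, py) lies within `radius`
// of the centre (cx, cy). Both sides of the comparison are squared,
// so no square root is ever taken.
bool within_radius(float px, float py, float cx, float cy, float radius) {
    const float dx = px - cx;
    const float dy = py - cy;
    return dx * dx + dy * dy <= radius * radius;
}
```

The comparison is valid because both the distance and the radius are non-negative, so squaring preserves their order.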

1. William B. Giles, Assembly Language Programming for the Intel 80XXX Family, pub. MacMillan 1991.
2. Michael Abrash, Zen of Code Optimization, pub. Coriolis Group Books 1994.