1 coms 161 introduction to computing title: numeric processing date: november 08, 2004 lecture...

1

COMS 161 Introduction to Computing

Title: Numeric Processing

Date: November 08, 2004

Lecture Number: 30

2

Announcements

3

Review

• Real numbers
  – Representation
  – Limitations

4

Outline

• Real numbers
  – Representation
  – Limitations

5

IEEE Standard 754

• Provides two floating point types
  – Single
    • 24 bits of significand precision
  – Double
    • 53 bits of significand precision

6

Single Precision

• IEEE standard 754
  – Floating point number representation
  – 32-bit

s eeeeeeee fffffff ffffffffffffffff

  – s: (1) sign bit
    • 0 means positive, 1 means negative

Bit 31: s | Bits 30 to 23: exponent | Bits 22 to 0: significand

7

Single Precision

s eeeeeeee fffffff ffffffffffffffff

  – e: (8) exponent bits [-126 … 127]
    • A bias of 127 is added to the exponent
  – f: (24) fractional part [23 bits + 1 implied bit]
    • Normalize the fractional part
    • 1 will always be on the left side of the binary point
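The field layout above can be checked by pulling the three fields out of a stored value. The following is a minimal sketch, assuming Python's standard `struct` module is used to obtain the 32-bit pattern; the helper name `decode_single` is hypothetical and not part of the lecture.

```python
import struct

def decode_single(x):
    """Split a value, stored as an IEEE 754 single, into its three bit fields."""
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # bit 31
    biased_exp = (bits >> 23) & 0xFF   # bits 30 to 23 (bias of 127 still included)
    fraction = bits & 0x7FFFFF         # bits 22 to 0 (the implied leading 1 is not stored)
    return sign, biased_exp, fraction

# 6.5 is 110.1 in binary, which normalizes to 1.101 x 2^2.
sign, biased_exp, fraction = decode_single(6.5)
print(sign, biased_exp - 127, format(fraction, "023b"))
```

Subtracting 127 from the stored exponent recovers the true exponent, and the printed fraction omits the leading 1 because that bit is implied rather than stored.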

8

Special Single Cases

• Two zeros
  – Signed zero
  – e = 0, f = 0 (exponent and fractional bits are all 0)
  – (-1)^s x 0.0
    • 0000 0000 0000 0000 0000 0000 0000 0000
      – 0x0000 0000 (+0)
    • 1000 0000 0000 0000 0000 0000 0000 0000
      – 0x8000 0000 (-0)
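The two zero patterns can be observed directly. This is a small sketch, assuming Python's `struct` module for the bit pattern; `single_bits` is a hypothetical helper name.

```python
import math
import struct

def single_bits(x):
    """Return the 32-bit pattern used to store x as an IEEE 754 single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

print(f"0x{single_bits(0.0):08x}")    # pattern for +0
print(f"0x{single_bits(-0.0):08x}")   # pattern for -0: only the sign bit is set
print(0.0 == -0.0)                    # the two zeros compare as equal
print(math.copysign(1.0, -0.0))       # copysign exposes the sign bit of -0
```

Although the two patterns differ in bit 31, an ordinary comparison treats them as the same value, so the sign has to be inspected explicitly when it matters.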

9


• Positive infinity
  – +INF
  – s = 0, e = 255, f = 0 (fractional bits are all 0)
    • 0111 1111 1000 0000 0000 0000 0000 0000
    • 0x7f80 0000

• Negative infinity
  – -INF
  – s = 1, e = 255, f = 0 (fractional bits are all 0)
    • 1111 1111 1000 0000 0000 0000 0000 0000
    • 0xff80 0000

10


• Not-A-Number (NaN)
  – s = 0 | 1, e = 255, f != 0 (at least one fractional bit is NOT 0)
  – There are many representations for NaN
  – Here is one example
    • 0111 1111 1100 0000 0000 0000 0000 0000
    • 0x7fc0 0000
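The rules for the special single cases depend only on the exponent and fraction fields, so they can be written as a short classifier. This is an illustrative sketch; the function name `classify_single` and the catch-all label "other finite value" are assumptions, not terms from the lecture.

```python
def classify_single(bits):
    """Classify a 32-bit pattern using only its sign, exponent and fraction fields."""
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if biased_exp == 255:
        if fraction != 0:
            return "NaN"                      # e = 255, f != 0
        return "-INF" if sign else "+INF"     # e = 255, f = 0
    if biased_exp == 0 and fraction == 0:
        return "-0" if sign else "+0"         # e = 0, f = 0
    return "other finite value"

for pattern in (0x00000000, 0x80000000, 0x7F800000, 0xFF800000, 0x7FC00000, 0x3F800000):
    print(f"0x{pattern:08x}", classify_single(pattern))
```

Any pattern with e = 255 and a nonzero fraction is reported as NaN, which is why there are many NaN representations.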

11


• Maximum single number
  – 0111 1111 0111 1111 1111 1111 1111 1111
  – 0x7f7f ffff
  – 3.40282347 x 10^38

• Minimum positive (normalized) single number
  – 0000 0000 1000 0000 0000 0000 0000 0000
  – 0x0080 0000
  – 1.17549435 x 10^-38

• To represent larger numbers
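The limits of the single format can be recovered from the two bit patterns on this slide. The sketch below assumes Python's `struct` module; `single_from_bits` is a hypothetical helper. The closed-form expressions in the comments follow from the field definitions (largest significand with exponent 127, smallest normalized significand with exponent -126).

```python
import struct

def single_from_bits(bits):
    """Interpret a 32-bit pattern as an IEEE 754 single."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

largest = single_from_bits(0x7F7FFFFF)          # e = 254, all fraction bits set
smallest_normal = single_from_bits(0x00800000)  # e = 1, all fraction bits clear

print(largest)
print(smallest_normal)

# Same values from the field definitions: 1.111...1 x 2^127 and 1.0 x 2^-126.
print(largest == (2 - 2**-23) * 2.0**127)
print(smallest_normal == 2.0**-126)
```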

12

Double Precision

• IEEE standard 754
  – Floating point number representation
  – 64-bit

s eeeeeeeeeee ffffffffffffffffffff ffffffffffffffffffffffffffffffff

  – s: (1) sign bit
    • 0 means positive, 1 means negative

Bit 63: s | Bits 62 to 52: exponent | Bits 51 to 32: significand (upper part)

Bits 31 to 0: significand (lower part)

13

Double Precision

s eeeeeeeeeee ffffffffffffffffffff ffffffffffffffffffffffffffffffff

  – e: (11) exponent bits [-1022 … 1023]
    • A bias of 1023 is added to the exponent
  – f: (53) fractional part [52 bits + 1 implied bit]
    • Normalize the fractional part
    • 1 will always be on the left side of the binary point
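The double layout can be decoded the same way as the single layout, with wider fields. A minimal sketch, assuming Python's `struct` module; `decode_double` is a hypothetical helper name.

```python
import struct

def decode_double(x):
    """Split a value, stored as an IEEE 754 double, into its three bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # bit 63
    biased_exp = (bits >> 52) & 0x7FF   # bits 62 to 52 (bias of 1023 still included)
    fraction = bits & ((1 << 52) - 1)   # bits 51 to 0 (implied leading 1 not stored)
    return sign, biased_exp, fraction

# 24.125 is 11000.001 in binary, which normalizes to 1.1000001 x 2^4.
sign, biased_exp, fraction = decode_double(-24.125)
print(sign, biased_exp - 1023, format(fraction, "052b"))
```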

14

Real (Decimal) Number Storage

• Double precision floating point numbers

– s: (1) sign bit

– e: (11) exponent bits [-1022 … 1023]

– f: (53) fractional part [52 bits + 1 implied bit]

Byte:  0        1        2        3
       seeeeeee eeeeffff ffffffff ffffffff

Byte:  4        5        6        7
       ffffffff ffffffff ffffffff ffffffff
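The byte layout can be printed for a concrete value. This sketch assumes Python's `struct` module and asks for big-endian order (`">d"`) so that byte 0 carries the sign bit, matching the diagram; many machines store the same bytes in the reverse (little-endian) order in memory.

```python
import struct

value = 1.0
stored = struct.pack(">d", value)   # 8 bytes, most significant byte first

for index, byte in enumerate(stored):
    print(f"Byte {index}: {byte:08b}")
```

For 1.0 the true exponent is 0, so the stored exponent is the bias 1023 (011 1111 1111), which spans the low 7 bits of byte 0 and the high 4 bits of byte 1; every fraction bit is 0.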

15

Special Double Cases

• Two zeros
  – Signed zero
  – e = 0, f = 0 (exponent and fractional bits are all 0)
  – (-1)^s x 0.0
  – 64 bits
    • 0000 0000 0000 0000 0000 0000 0000 … 0000
      – 0x0000 0000 0000 0000 (+0)
    • 1000 0000 0000 0000 0000 0000 0000 … 0000
      – 0x8000 0000 0000 0000 (-0)

16


• Positive infinity
  – +INF
  – s = 0, e = 2047, f = 0 (fractional bits are all 0)
    • 0111 1111 1111 0000 0000 0000 0000 … 0000
    • 0x7ff0 0000 0000 0000

• Negative infinity
  – -INF
  – s = 1, e = 2047, f = 0 (fractional bits are all 0)
    • 1111 1111 1111 0000 0000 0000 0000 … 0000
    • 0xfff0 0000 0000 0000

17


• Not-A-Number (NaN)
  – s = 0 | 1, e = 2047, f != 0 (at least one fractional bit is NOT 0)
  – There are many representations for NaN
  – Here is one example
    • 0111 1111 1111 1000 0000 0000 0000 … 0000
    • 0x7ff8 0000 0000 0000
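The special double patterns can be turned back into values to see how they behave. A minimal sketch, assuming Python's `struct` module; `double_from_bits` is a hypothetical helper name.

```python
import math
import struct

def double_from_bits(bits):
    """Interpret a 64-bit pattern as an IEEE 754 double."""
    (value,) = struct.unpack(">d", struct.pack(">Q", bits))
    return value

pos_inf = double_from_bits(0x7FF0000000000000)
neg_inf = double_from_bits(0xFFF0000000000000)
nan = double_from_bits(0x7FF8000000000000)
neg_zero = double_from_bits(0x8000000000000000)

print(pos_inf, neg_inf, nan, neg_zero)
print(math.isinf(pos_inf), math.isinf(neg_inf), math.isnan(nan))
print(nan == nan)   # NaN compares unequal to everything, including itself
```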

18


• Maximum double number
  – 0111 1111 1110 1111 1111 1111 1111 … 1111
  – 0x7fef ffff ffff ffff
  – 1.7976931348623157 x 10^308

• Minimum positive (normalized) double number
  – 0000 0000 0001 0000 0000 0000 0000 … 0000
  – 0x0010 0000 0000 0000
  – 2.2250738585072014 x 10^-308
  – Don’t forget about the implied 1 bit!!
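The two double limits can be compared against what a language runtime reports. This sketch assumes a Python implementation whose `float` is an IEEE 754 double (the usual case), so that `sys.float_info.max` and `sys.float_info.min` correspond to the two patterns on this slide; `double_from_bits` is a hypothetical helper name.

```python
import struct
import sys

def double_from_bits(bits):
    """Interpret a 64-bit pattern as an IEEE 754 double."""
    (value,) = struct.unpack(">d", struct.pack(">Q", bits))
    return value

largest = double_from_bits(0x7FEFFFFFFFFFFFFF)          # e = 2046, all fraction bits set
smallest_normal = double_from_bits(0x0010000000000000)  # e = 1, all fraction bits clear

print(largest, largest == sys.float_info.max)
print(smallest_normal, smallest_normal == sys.float_info.min)

# The implied 1 bit makes the smallest normalized value exactly 1.0 x 2^-1022.
print(smallest_normal == 2.0**-1022)
```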

19

Decimal to Float Conversion

• Show -24.125₁₀ in IEEE single precision format
  – First, save sign (negative so 1) and convert to binary…
    • 24.125₁₀ = 11000.001₂ x 2^0
  – Normalize…
    • = 1.1000001₂ x 2^4
  – Strip 1 off the mantissa and extend to form significand
    • = .10000010000000000000000
  – Bias the exponent…
    • Exp + Bias = 4 + 127 = 131 = 10000011₂
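The manual steps above (save the sign, normalize, strip the leading 1, bias the exponent) can be sketched in code. This is an illustration only: the function name `single_fields` is hypothetical, `math.frexp` is used as a shortcut for the normalization step, and the sketch assumes a nonzero, finite, normalized input whose fraction fits in 23 bits without rounding.

```python
import math

def single_fields(x):
    """Follow the manual steps: sign, normalize, strip the leading 1, bias."""
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    # frexp gives abs(x) == m * 2**e with m in [0.5, 1); rescale to 1.f x 2**(e - 1).
    mantissa, exponent = math.frexp(abs(x))
    mantissa, exponent = mantissa * 2, exponent - 1
    # Drop the implied 1 and keep 23 fraction bits (truncates; no rounding logic here).
    fraction = int((mantissa - 1) * 2**23)
    return sign, exponent + 127, fraction

sign, biased_exp, fraction = single_fields(-24.125)
print(sign, format(biased_exp, "08b"), format(fraction, "023b"))
```

For -24.125 the three printed fields are the sign bit 1, the biased exponent 10000011 and the 23-bit significand from the worked example.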

20

Real (Decimal) Number Storage

• Bit:   31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

• Value:  1  1  0  0  0  0  0  1  1  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

• Hex value: 0xC1C10000
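The stored word can be assembled from the three fields and checked against what a library produces for the same number. A minimal sketch, assuming Python's `struct` module; the field values are the ones derived for -24.125.

```python
import struct

# Fields for -24.125: sign, biased exponent (4 + 127), 23-bit significand.
sign = 1
biased_exp = 0b10000011
fraction = 0b10000010000000000000000

# Shift each field into its position: bit 31, bits 30 to 23, bits 22 to 0.
word = (sign << 31) | (biased_exp << 23) | fraction
print(f"0x{word:08X}")

# Cross-check against the bytes produced when -24.125 is packed as a single.
packed = struct.pack(">f", -24.125)
print(packed.hex())
print(word == int.from_bytes(packed, "big"))
```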

