Datorteknik FloatingPoint bild 1
Floating point
Number system corresponding to the decimal notation
Example: $1.837 \times 10^{4}$
significand: 1.837, exponent: 4
A great number of corresponding binary standards exist.
There is one common standard:
IEEE 754-1985 (IEC 559)
Datorteknik FloatingPoint bild 2
IEEE 754-1985
Number representation
Single precision (32 bits)
sign: 1 bit
exponent: 8 bits
fraction: 23 bits
Double precision (64 bits)
sign: 1 bit
exponent: 11 bits
fraction: 52 bits
Single-extended and double-extended formats exist inside the floating-point hardware
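A minimal sketch of how the field widths above map onto actual bit patterns. The helper names `fields32` and `fields64` are hypothetical, and the sketch assumes the platform's `struct` formats `f` and `d` are IEEE 754 single and double precision (true on common hardware).

```python
import struct

def fields32(x):
    """Split x, stored as a single-precision number, into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction

def fields64(x):
    """Split x, stored as a double-precision number, into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                     # 1 bit
    exponent = (bits >> 52) & 0x7FF       # 11 bits
    fraction = bits & ((1 << 52) - 1)     # 52 bits
    return sign, exponent, fraction
```

For example, `fields32(-1.5)` gives sign 1, exponent 127 and a fraction whose only set bit is the most significant one.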
Datorteknik FloatingPoint bild 3
IEEE 754-1985
Single precision:
S (1 bit) | E (8 bits) | M (23 bits)
S: sign
E: exponent, excess-127 binary integer
M: mantissa; sign + magnitude, normalized binary significand with hidden integer bit: 1.M
The actual exponent is e = E - 127
$N = (-1)^{S} \cdot 2^{E-127} \cdot (1.M)$, for $0 < E < 255$
Examples:
0 = 0 00000000 0 . . . 0
-1.5 = 1 01111111 10 . . . 0
Magnitude of numbers that can be represented is in the range:
$2^{-126} \cdot (1.0)$ to $2^{127} \cdot (2 - 2^{-23})$
which is approximately:
$1.18 \times 10^{-38}$ to $3.40 \times 10^{38}$
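The single-precision value formula and its range limits can be checked with exact rational arithmetic. This is a sketch only: `decode32` is a hypothetical helper, and it deliberately handles just the normalized case 0 < E < 255.

```python
from fractions import Fraction

def decode32(s, e, m):
    """Exact value of a normalized single: (-1)^S * 2^(E-127) * (1.M)."""
    if not 0 < e < 255:
        raise ValueError("this sketch only handles normalized numbers")
    significand = 1 + Fraction(m, 2**23)      # hidden bit plus 23-bit fraction
    return (-1) ** s * significand * Fraction(2) ** (e - 127)

smallest = decode32(0, 1, 0)                  # 2^-126 * 1.0
largest = decode32(0, 254, 2**23 - 1)         # 2^127 * (2 - 2^-23)
minus_one_and_a_half = decode32(1, 127, 1 << 22)
```

Converting `smallest` and `largest` with `float()` shows the approximate decimal limits of the format.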
Datorteknik FloatingPoint bild 4
IEEE 754-1985
Fraction part: 23 / 52 bits; $0 \le x < 1$
Significand: 1 + fraction part
The “1” is not stored; “hidden bit”
Corresponds to about 7 and 16 decimal digits, respectively
Exponent: 127 / 1023 added to the exponent; “biased exponent”
Corresponds to a range of about $10^{-39}$ to $10^{39}$ and $10^{-308}$ to $10^{308}$, respectively
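The decimal-digit figures follow from the significand width: each bit is worth log10(2), about 0.301 decimal digits. A small sketch, with the hypothetical helpers `decimal_digits` and `bias`; the bias formula 2^(k-1) - 1 is the usual one for a k-bit exponent field and reproduces 127 and 1023.

```python
import math

def decimal_digits(fraction_bits):
    """Decimal digits carried by a significand of fraction_bits plus the hidden bit."""
    return (fraction_bits + 1) * math.log10(2)

def bias(exponent_bits):
    """Bias added to the actual exponent for a field of exponent_bits bits."""
    return 2 ** (exponent_bits - 1) - 1

single_digits = decimal_digits(23)   # roughly 7.2
double_digits = decimal_digits(52)   # roughly 15.95
```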
Datorteknik FloatingPoint bild 5
IEEE 754-1985
Special features:
Correct rounding of “halfway” result (to even number)
Includes special values:
NaN Not a number
∞ Infinity
-∞ - Infinity
Uses denormal numbers to represent numbers less than $2^{E_{min}}$
Rounds to nearest by default; three other rounding modes exist.
Sophisticated exception handling
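These special features can be observed from Python, assuming (as on common platforms) that its `float` is an IEEE 754 double. The dictionary name `checks` is just for illustration.

```python
import math
import sys

inf = float("inf")
nan = float("nan")

checks = {
    # NaN compares unequal to everything, including itself
    "nan is not equal to itself": nan != nan,
    # overflow in multiplication produces infinity
    "overflow gives infinity": sys.float_info.max * 2 == inf,
    # an invalid operation produces NaN
    "inf - inf is NaN": math.isnan(inf - inf),
    # denormals fill the gap between 0 and the smallest normalized number
    "denormal below smallest normal": 0 < 5e-324 < sys.float_info.min,
    # 2^53 + 1 is halfway between two doubles; the tie goes to the even significand
    "halfway rounds to even (down)": float(2**53 + 1) == float(2**53),
    "halfway rounds to even (up)": float(2**53 + 3) == float(2**53 + 4),
}
```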
Datorteknik FloatingPoint bild 6
Multiplication
$(s_1 \cdot 2^{e_1}) \cdot (s_2 \cdot 2^{e_2}) = (s_1 \cdot s_2) \cdot 2^{e_1 + e_2}$
so, multiply significands and add exponents
Problem: the significand is coded in signed magnitude; use unsigned multiplication and take care of the sign separately
Round the 2n-bit significand product to an n-bit significand
Compute new exponent with respect to bias
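The sign, exponent and significand handling for multiplication can be sketched as follows. This is an illustration, not a full implementation: operands are hypothetical `(sign, biased_exponent, significand)` triples for single precision, the significand is an integer that includes the hidden bit, special values are ignored, and the 2n-bit product is left unrounded.

```python
BIAS = 127   # single-precision exponent bias
N = 24       # significand width in bits, hidden bit included

def multiply_unrounded(a, b):
    """Multiply two (sign, biased_exponent, significand) triples without rounding."""
    s1, e1, m1 = a
    s2, e2, m2 = b
    sign = s1 ^ s2                 # signs are handled separately from the magnitudes
    exponent = e1 + e2 - BIAS      # both inputs carry the bias, so remove one copy
    product = m1 * m2              # unsigned multiplication, exact 2n-bit result
    return sign, exponent, product

def value(sign, exponent, product):
    """Numeric value of the unrounded result; the product has 2*(N-1) fraction bits."""
    return (-1) ** sign * product * 2.0 ** (exponent - BIAS - 2 * (N - 1))

one_and_a_half = (0, 127, 0xC00000)   # 1.5 = 1.1 (binary) * 2^0
minus_two = (1, 128, 0x800000)        # -2.0 = -1.0 * 2^1
result = multiply_unrounded(one_and_a_half, minus_two)
```

Here `value(*result)` evaluates to -3.0.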
Datorteknik FloatingPoint bild 7
Rounding
1. Multiply the two significands to get the 2n-bit product, held in the register pair P, A (shown for n = 6):
P: x0 x1 x2 x3 x4 x5    A: g r s s s s
g is the guard bit and r is the round bit; the four s bits are OR:ed together to form the “sticky bit”
Case 1: x0 = 0, shift needed:
P: x1 x2 x3 x4 x5 g    A: r s s s s
Case 2: x0 = 1, increment exponent, set r = g and s = (r OR s):
P: x0 x1 x2 x3 x4 x5    A: r s s s s s
Datorteknik FloatingPoint bild 8
Rounding
2: For both cases:
if r = 0, P is the correctly rounded product.
if r = 1 and s = 1, then P + 1 is the correctly rounded product
if r = 1 and s = 0, (the “halfway case”), then
P is the correctly rounded product if x5 (or g) is 0
P+1 is the correctly rounded product if x5 (or g) is 1
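The two cases and the rounding rule can be put together in a short sketch. `round_product` is a hypothetical helper; it assumes both input significands are normalized n-bit integers with n at least 2, so the product has its leading 1 in one of the two top bit positions. It also handles the carry that occurs when rounding up overflows the n-bit result.

```python
def round_product(product, n):
    """Round an exact 2n-bit significand product to n bits (nearest, ties to even).

    Returns (P, exponent_increment).
    """
    exponent_increment = 0
    if product >> (2 * n - 1):      # case 2: x0 = 1, no shift, exponent + 1
        shift = n
        exponent_increment = 1
    else:                           # case 1: x0 = 0, shift left by one position
        shift = n - 1
    p = product >> shift                                    # n-bit result P
    r = (product >> (shift - 1)) & 1                        # round bit
    s = int((product & ((1 << (shift - 1)) - 1)) != 0)      # sticky: OR of the rest
    if r and (s or (p & 1)):        # above halfway, or halfway with odd P
        p += 1
        if p >> n:                  # rounding carried out of the n-bit significand
            p >>= 1
            exponent_increment += 1
    return p, exponent_increment
```

With n = 6, `round_product(36 * 36, 6)` is a halfway case with an even P and stays at 40, while `round_product(36 * 44, 6)` is a halfway case with an odd P and rounds up to 50.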
Datorteknik FloatingPoint bild 9
Add / Sub
$(s_1 \cdot 2^{e_1}) + (s_2 \cdot 2^{e_2}) = (s_3 \cdot 2^{e_3})$
1: Shift summands so they have the same exponent
(e.g. if e2 < e1: shift s2 right and increment e2 until e1 = e2)
2: Add significands
3: Normalize the number (shift s3 left and decrement e3 until MSB = 1)
4: Round s3 correctly (under the common assumption that more than 23 / 52 bits are used internally for addition)
Subtraction uses the same method
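A sketch of the four add/sub steps, under simplifying assumptions that are not in the slides: values are written as s * 2^e with s a signed integer of n-bit magnitude, special values and exponent limits are ignored, and alignment is kept exact by scaling the larger operand left rather than shifting the smaller one right. Python's unbounded integers allow this; hardware uses a few extra bits for the same purpose. The names `fp_add` and `split` are hypothetical.

```python
import math

def fp_add(e1, s1, e2, s2, n):
    """Add s1*2^e1 + s2*2^e2 and return (e3, s3) rounded to an n-bit magnitude."""
    # 1: align the operands on the smaller exponent (exact, no bits are lost)
    if e2 > e1:
        e1, s1, e2, s2 = e2, s2, e1, s1
    total = (s1 << (e1 - e2)) + s2      # 2: add significands (signed, so sub works too)
    if total == 0:
        return 0, 0
    sign = -1 if total < 0 else 1
    mag, e3 = abs(total), e2
    excess = mag.bit_length() - n
    if excess <= 0:                     # 3: normalize by shifting left
        return e3 + excess, sign * (mag << -excess)
    half = 1 << (excess - 1)            # 4: round to nearest, ties to even
    rem = mag & ((1 << excess) - 1)
    mag >>= excess
    e3 += excess
    if rem > half or (rem == half and mag & 1):
        mag += 1
        if mag.bit_length() > n:        # rounding carried out of n bits
            mag >>= 1
            e3 += 1
    return e3, sign * mag

def split(x, n=53):
    """Decompose a nonzero Python float into (e, s) with x == s * 2^e."""
    m, e = math.frexp(x)
    return e - n, int(m * 2 ** n)
```

As a usage example, `math.ldexp(s3, e3)` for `e3, s3 = fp_add(*split(0.1), *split(0.2), 53)` reproduces Python's own `0.1 + 0.2` on platforms where `float` is an IEEE 754 double.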
Datorteknik FloatingPoint bild 10
Division
$(s_1 \cdot 2^{e_1}) / (s_2 \cdot 2^{e_2}) = (s_1 / s_2) \cdot 2^{e_1 - e_2}$
so, divide significands and subtract exponents
Problem: the significand is coded in signed magnitude; use unsigned division (different algorithms exist) and take care of the sign separately
Round n + 2 (guard and round) bits significand to n bits significand
Compute new exponent with respect to bias
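Division can be sketched along the same lines. The code below is an illustration under stated assumptions: operands are hypothetical `(sign, biased_exponent, significand)` triples for single precision with integer significands that include the hidden bit, special values and division by zero are ignored, and plain integer division stands in for whichever hardware division algorithm is used.

```python
BIAS = 127   # single-precision exponent bias

def fp_div(a, b, n=24):
    """Divide two (sign, biased_exponent, significand) triples, rounding to n bits."""
    s1, e1, m1 = a
    s2, e2, m2 = b
    sign = s1 ^ s2                    # sign handled separately
    exponent = e1 - e2 + BIAS         # the biases cancel, so add one back
    if m1 < m2:                       # quotient would be below 1: pre-normalize
        m1 <<= 1
        exponent -= 1
    # n + 2 quotient bits: the n result bits plus two extra (guard and round)
    q, rem = divmod(m1 << (n + 1), m2)
    p = q >> 2
    first_extra = (q >> 1) & 1                  # decides whether we are at or above halfway
    rest = (q & 1) | int(rem != 0)              # anything nonzero below that bit
    if first_extra and (rest or (p & 1)):       # round to nearest, ties to even
        p += 1
        if p >> n:                              # rounding carried out of n bits
            p >>= 1
            exponent += 1
    return sign, exponent, p

one = (0, 127, 0x800000)
three = (0, 128, 0xC00000)            # 3.0 = 1.5 * 2^1
third = fp_div(one, three)
```

The result `third` has biased exponent 125 and significand 0xAAAAAB, which matches the single-precision encoding of 1/3.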