math1070 2 error and computer arithmetic

Upload: phongiswindy

Post on 02-Apr-2018

231 views

Category:

Documents


1 download

TRANSCRIPT

  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    1/60

  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    2/60

    > 2. Error and Computer Arithmetic

    Computers use binary arithmetic, representing each number as abinary number: a finite sum of integer powers of 2.

    Some numbers can be represented exactly, but others, such as110 ,

    1100 ,

    11000 , . . ., cannot.

    For example,2.125 = 21 + 23

    has an exact representation in binary (base 2), but

    3.1 21 + 20 + 24 + 25 + 28 +

    does not.And, of course, there are the transcendental numbers like that haveno finite representation in either decimal or binary number system.

    2. Error and Computer Arithmetic Math 1070

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    3/60

    > 2. Error and Computer Arithmetic

    Computers use 2 formats for numbers.

    Fixed-point numbers are used to store integers.Typically, each number is stored in a computer word of 32 binary

    digits (bits) with values of 0 and 1. at most 232 differentnumbers can be stored.If we allow for negative numbers, we can represent integers in therange 231 x 231 1, since there are 232 such numbers.Since 231 2.1 109, the range for fixed-point numbers is toolimited for scientific computing. =

    always get an integer answer.the numbers that we can store are equally spaced.very limited range of numbers.

    Therefore they are used mostly for indices and counters.

    An alternative to fixed-point, floating-point numbersapproximate real numbers.

    the numbers that we can store are NOT equally spaced.wide range of variably-spaced numbers that can be representeedexactly.

    2. Error and Computer Arithmetic Math 1070

    http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    4/60

    > 2. Error and Computer Arithmetic > 2.1 Floating-point numbers

    Numbers must be stores and used for arithmetic operations. Storing:

    1 integer format

    2 floating-point format

    Definition (decimal Floating-point representation)

    Let consider x = 0 written in decimal system.Then it can be written uniquely as

    x =

    x

    10e (4.1)

    where

    = +1 or 1 is the signe is an integer, the exponent

    1 x < 10, the significand or mantissaExample ( 124.62 = (1.2462

    x

    ) 102 )

    = +1, the exponent e = 2, the significand x = 1.2462

    2. Error and Computer Arithmetic Math 1070

    2 E d C A i h i 2 1 Fl i i b

    http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    5/60

    > 2. Error and Computer Arithmetic > 2.1 Floating-point numbers

    The decimal floating-point representation of x R is given in (4.1),with limitations on the

    1 number of digits in mantissa x

    2 size of e

    Example

    Suppose we limit

    1

    number of digits in x to 42 99 e 99

    We say that a computer with such a representation has a four-digit decimalfloating point arithmetic.This implies that we cannot store accurately more than the first four digits of

    a number; and even the fourth digit may be changed by rounding.

    What is the next smallest number bigger than 1? What is the next smallestnumber bigger than 100? What are the errors and relative errors?

    What is the smallest positive number?

    2. Error and Computer Arithmetic Math 1070

    > 2 E d C t A ith ti > 2 1 Fl ti i t b

    http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    6/60

    > 2. Error and Computer Arithmetic > 2.1 Floating-point numbers

    Definition (Floating-point representation of a binary number x)

    Let consider x written in binary format. Analogous to (4.1)

    x = x 2e (4.2)

    where

    = +1 or

    1 is the sign

    e is an integer, the exponent

    x is a binary fraction satisfying

    (1)2 x < (10)2 (in decimal:1 x < 2)

    For example, ifx = (11011.0111)2

    then = +1, e = 4 = (100)2 and x = (1.10110111)2

    2. Error and Computer Arithmetic Math 1070

    > 2 Error and Computer Arithmetic > 2 1 Floating point numbers

    http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    7/60

    > 2. Error and Computer Arithmetic > 2.1 Floating-point numbers

    The floating-point representation of a binary number x is given by (4.2) witha restriction on

    1 number of digits in x: the precision of the binary floating-pointrepresentation of x

    2 size of eThe IEEE floating-point arithmetic standard is the format for floating pointnumbers used in almost all computers.

    the IEEE single precision floating-point representation of x has

    a precision of 24 binary digits,and the exponent e is limited by 126 e 127:

    x = (1.a1a2 . . . a23) 2ewhere, in binary

    (1111110)2

    e

    (1111111)2

    the IEEE double precision floating-point representation of x has

    a precision of 53 binary digits,and the exponent e is limited by 1022 e 1023:

    x = (1.a1a2 . . . a52) 2e2. Error and Computer Arithmetic Math 1070

    > 2 Error and Computer Arithmetic > 2 1 Floating point numbers

    http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    8/60

    > 2. Error and Computer Arithmetic > 2.1 Floating-point numbers

    the IEEE single precision floating-point representation of xhas a precision of 24 binary digits,and the exponent e is limited by 126 e 127:

    x = (1.a1a2 . . . a23) 2estored on 4 bytes (32 bits)

    b1

    b2b3 b9

    E=e+127b10b11 b32

    xthe IEEE double precision floating-point representation of xhas a precision of 53 binary digits,and the exponent e is limited by 1022 e 1023:

    x =

    (1.a1

    a2

    . . . a52

    )

    2e

    stored on 8 bytes (64 bits)

    b1

    b2b3 b12

    E=e+1023

    b13b14 b64

    x

    2. Error and Computer Arithmetic Math 1070

    > 2 Error and Computer Arithmetic > 2 1 1 Accuracy of floating-point representation

    http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    9/60

    > 2. Error and Computer Arithmetic > 2.1.1 Accuracy of floating-point representation

    Epsilon machine

    How accurate can a number be stored in the floating point representation?How can this be measured?

    (1) Machine epsilon

    Machine epsilon

    For any format, the machine epsilon is the difference between 1 and the nextlarger number that can be stored in that format.

    In single precision IEEE, the next larger binary number is

    1.0000000000000000000000 1a23

    (1 + 224 cannot be stored exactly)

    Then the machine epsilon in single precision IEEE format is

    223

    = 1.19 107i.e., we can store approximately 7 decimal digits of a number x in decimal format.

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.1.1 Accuracy of floating-point representation

    http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    10/60

    > 2. Error and Computer Arithmetic > 2.1.1 Accuracy of floating point representation

    Then the machine epsilon in double precision IEEE format is

    252

    = 2.22 1016

    IEEE double-precision format can be used to store approximately 16decimal digits of a number x in decimal format.MATLAB: machine epsilon is available as the constant eps.

    2. Error and Computer Arithmetic Math 1070

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    11/60

    > 2. Error and Computer Arithmetic > 2.1.2 Rounding and chopping

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    12/60

    p g pp g

    Let say that a number x has a significand

    x = 1.a1a2 . . . an1anan+1

    but the floating-point representation may contain only n binary digits.Then x must be shortened when stored.

    Definition

    We denote the machine floating-point version of x by f l(x).1 truncate or chop x to n binary digits, ignoring the remaining

    digits2 round x to n binary digits, based on the size of the part of x

    following digit n:

    (a) if digit n + 1 is 0, chop x to n digits(b) if digit n + 1 is 1, chop x to n digits and add 1 to the last digit of

    the result

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.1.2 Rounding and chopping

    http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    13/60

    It can be shown that

    f l(x) = x (1 + ) (4.3)where is a small number depending of x.

    (a) If chopping is used

    2n+1 0 (4.4)(b) If rounding is used

    2n

    2n

    (4.5)Characteristics of chopping:

    1 the worst possible error is twice as large as when rounding is used2 the sign of the error x f l(x) is the same as the sign of x

    The worst of the two: no possibility of cancellation of errors.Characteristics of rounding:1 the worst possible error is only half as large as when chopping is used2 More important: the error x f l(x) is negative for only half the

    cases, which leads to better error propagation behavior2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.1.2 Rounding and chopping

    http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    14/60

    For single precision IEEE floating-point rounding arithmetic (there aren = 24 digits in the significand):

    1 chopping (rounding towards zero):

    223 0 (4.6)2 standard rounding:

    224

    224 (4.7)

    For double precision IEEE floating-point rounding arithmetic:

    1 chopping:

    252

    0 (4.8)

    2 rounding:

    253 253 (4.9)

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic >Conseq. for programming of floating-point arithm.

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    15/60

    Numbers that have finite decimal expressions may have infinite binaryexpansions. For example

    (0.1)10 = (0.000110011001100110011 . . .)2

    Hence (0.1)10 cannot be represented exactly in binary floating-pointarithmetic.Possible problems:

    Run into infinite loops

    pay attention to the language used:

    Fortran and C have both single and double precision, specify

    double precision constants correctlyMATLAB does all computations in double precision

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2 Errors: Definitions, Sources and Examples

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    16/60

    Definition

    The error in a computed quantity is defined as

    Error(xA) = xT - xA

    where xT =true value, xA=approximate value. This is called alsoabsolute error.

    The relative error Rel(xA) is a measure off error related to thesize of the true value

    Rel(xA) = errortrue value =xTxAxT

    For example, for the approximation

    =22

    7we have x

    T= = 3.14159265 . . . and x

    A= 22

    7= 3.1428571

    Error

    22

    7

    = 22

    7

    = 0.00126

    Rel

    22

    7

    =

    227

    = 0.000422. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2 Errors: Definitions, Sources and Examples

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    17/60

    The notion of relative error is a more intrinsic error measure.

    1 The exact distance between 2 cities: x1T = 100km and the

    measured distance is x1

    A = 99km

    Error

    x1T

    = x1T x1A = 1km

    Rel

    x1T

    =

    Error

    x1T

    x1T= 0.01 = 1%

    2 The exact distance between 2 cities: x2T = 2km and the measureddistance is x2A = 1km

    Error x2T = x

    2T

    x2A = 1km

    Rel

    x2T

    =Error x2T

    x2T= 0.5 = 50%

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2 Errors: Definitions, Sources and Examples

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    18/60

    Definition (significant digits)

    The number of significant digits in xA is number of its leading digits

    that are correct relative to the corresponding digits in the true value xT

    More precisely, if xA, xT are written in decimal form; compute the error

    xT = a1 a2 .a3 am am+1 am+2

    |xT

    xA

    |= 0 0 .0

    0 bm+1 bm+2

    If the error is 5 units in the (m + 1)th digit of xT, countingrightward from the first nonzero digit, then we say that xA has, atleast, m significant digits of accuracy relative to xT.

    Example

    1 xA = 0.222, xT =29 :

    xT = 0. 2 2 2 2 2 2|xT xA| = 0. 0 0 0 2 2 2 3

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2 Errors: Definitions, Sources and Examples

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    19/60

    Example

    1 xA = 23.496, xT = 23.494:

    xT = 2 3. 4 9 6|xT xA| = 0 0. 0 0 1 9 9 4

    2 xA = 0.02138, xT = 0.02144:

    xT = 0. 0 2 1 4 4|xT xA| = 0. 0 0 0 0 6 2

    3 xA =227 = 3.1428571 . . . , xT = = 3.14159265 . . . :

    xT = 3. 1 4 1 5 9 2 6 5|xT xA| = 0. 0 0 1 2 6 4 4 8 3

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.1 Sources of error

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    20/60

    Errors in a scientific-mathematical-computational problem:1 Original errors

    (E1) Modeling errors(E2) Blunders and mistakes(E3) Physical measurement errors(E4) Machine representation and arithmetic errors

    (E5) Mathematical approximation errors2 Consequences of errors

    (F1) Loss-of-significance errors(F2) Noise in function evaluation(F3) Underflow and overflow erros

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.1 Sources of error

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    21/60

    (E1) Modeling errors: the mathematical equations are used to

    represent physical reality - mathematical model.

    Malthusian growth model (it can be accurate for some stages ofgrowth of a population, with unlimited resources)

    N(t) = N0ekt

    , N0, k 0where N(t) = population at time t. For large t the modeloverestimates the actual population:

    accurately models the growth of US population for

    1790 t 1860, with k = 0.02975, N0 = 3, 929, 000 e1790k

    but considerably overestimates the actual population for 1870

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.1 Sources of error

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    22/60

    (E2) Blunders and mistakes: mostly programming errors

    test by using cases where you know the solution

    break into small subprograms that can be tested separately

    (E2) Physical measurement errors. For example, the speed of light invacuum

    c = (2.997925 + )

    1010cm/sec,||

    0.000003

    Due to the error in data, the calculations will contain the effectof this observational error. Numerical analysis cannot remove theerror in the data, but it can

    look at its propagated effect in a calculation and

    suggest the best form for a calculation that will minimize thepropagated effect of errors in data

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.1 Sources of error

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    23/60

    (E4) Machine representation and arithmetics errors. For exampleerrors from rounding and chopping.

    they are inevitable when using floating-point arithmetic

    they form the main source of errors with some problems (solvingsystems of linear equations). We will look at the effect ofrounding errors for some summation procedures

    (E4) Mathematical approximation errors: major forms of error that wewill look at.

    For example, when evaluating the integral

    I =

    10

    ex2

    dx,

    since there is no antiderivative for ex2

    , I cannot be evaluatedexplicitly. Instead, we approximate it with a quantity that can becomputed.

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.1 Sources of error

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    24/60

    Using the Taylor approximation

    ex2 1 x2 + x

    4

    2! x

    6

    3!+

    x8

    4!

    we can easily evaluate

    I 10

    1 x2 + x

    4

    2! x

    6

    3!+

    x8

    4!

    dx

    with the truncation error being evaluated by (3.10)

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.2 Loss-of-significance errors

    http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    25/60

    Loss of significant digits

    Example

    Consider the evaluation off(x) = x[

    x + 1 x]

    for an increasing sequence of values of x.

    x0 Computed f(x) True f(x)1 0.414210 0.414214

    10 1.54340 1.54347100 4.99000 4.98756

    1000 15.8000 15.8074

    10, 000 50.0000 49.9988100, 000 100.000 158.113

    Table: results of using a 6-digit decimal calculator

    As x increases, there are fewer digits of accuracy in the computed value f(x)

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.2 Loss-of-significance errors

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    26/60

    What happens?For x = 100:

    100 = 10.0000

    exact

    ,

    101 = 10.04999 rounded

    where

    101 is correctly rounded to 6 significant digits of accuracy.

    x + 1

    x =

    101

    100 = 0.0499000

    while the true value should be 0.0498756.The calculation has a loss-of-significance error. Three digits ofaccuracy in

    x + 1 =

    101 were canceled by subtraction of the

    corresponding digits in x =

    100.The loss of accuracy was a by-product of

    the form of f(x) and

    the finite precision 6-digit decimal arithmetic being used

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.2 Loss-of-significance errors

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    27/60

    For this particular f, there is a simple way to reformulate it and avoidthe loss-of-significance error:

    f(x) =x

    x + 1 x

    which on a 6 digit decimal calculator will imply

    f(100) = 4.98756

    the correct answer to six digits.

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.2 Loss-of-significance errors

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    28/60

    Example

    Consider the evaluation off(x) =

    1 cos(x)x2

    for a sequence approaching 0.

    x0 Computed f(x) True f(x)

    0.1 0.4995834700 0.49958347220.01 0.4999960000 0.49999583330.001 0.5000000000 0.49999995830.0001 0.5000000000 0.4999999996

    0.00001 0.0 0.5000000000Table: results of using a 10-digit decimal calculator

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.2 Loss-of-significance errors

    Look at the calculation when x = 0 01:

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    29/60

    Look at the calculation when x = 0.01:

    cos(0.01) = 0.9999500004 (= 0.999950000416665)

    has nine significant digits of accuracy, being off in the tenth digit bytwo units. Next

    1 cos(0.01) = 0.0000499996 (= 4.999958333495869e 05)which has only five significant digits, with four digits being lost in the

    subtraction.To avoid the loss of significant digits, due to the subtraction of nearlyequal quantities, we use the Taylor approximation (3.7) for cos(x)about x = 0:

    cos(x) = 1 x22!

    + x44!

    x66!

    + R6(x)

    R6(x) =x8

    8!cos(), unknown number between 0 and x.

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.2 Loss-of-significance errors

    http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    30/60

    Hence

    f(x) =1

    x2 1 1 x2

    2!

    +x4

    4! x6

    6!

    + R6(x)=

    1

    2! x

    2

    4!+

    x4

    6! x

    6

    8!cos()

    giving f(0) = 12 . For

    |x

    | 0.1

    x6

    8!cos()

    106

    8!

    = 2.5 1011

    Therefore, with this accuracy

    f(x) 12!

    x2

    4!+

    x4

    6!, |x| < 0.1

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.2 Loss-of-significance errors

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    31/60

    Remark

    When two nearly equal quantities are subtracted, leading significantdigits will be lost.

    In the previous two examples, this was easy to recognize and we foundways to avoid the loss of significance.More often, the loss of significance is subtle and difficult to detect, asin calculating sums (for example, in approximating a function f(x) bya Taylor polynomial). If the value of the sum is relatively smallcompared to the terms being summed, then there are probably some

    significant digits of accuracy being lost in the summation process.

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.2 Loss-of-significance errors

    Example

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    32/60

    p

    Consider using the Taylor series approximation for ex to evaluate e5:

    e5 = 1 +(5)

    1!+

    (5)22!

    +(5)3

    3!+

    (5)44!

    + . . .

    Degree Term Sum Degree Term Sum0 1.000 1.000 13 -0.1960 -0.042301 -5.000 -4.000 14 0.7001E-1 0.027712 12.50 8.500 15 -0.2334E-1 0.004370

    3 -20.83 -12.33 16 0.7293E-2 0.011664 26.04 13.71 17 -0.2145E-2 0.0095185 -26.04 -12.33 18 0.5958E-3 0.010116 21.70 9.370 19 -0.1568E-3 0.0099577 -15.50 -6.130 20 0.3920E-4 0.0099968 9.688 3.558 21 -0.9333E-5 0.009987

    9 -5.382 -1.824 22 0.2121E-5 0.00998910 2.691 0.8670 23 -0.4611 E-6 0.00998911 -1.223 -0.3560 24 0.9607 E-7 0.00998912 0.5097 0.1537 25 -0.1921 E-7 0.009989

    Table. Calculation of e5 = 0.006738 using four-digit decimal arithmetic

    2. Error and Computer Arithmetic Math 1070

  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    33/60

    > 2. Error and Computer Arithmetic > 2.2.3 Noise in function evaluation

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    34/60

    Consider evaluating a continuous function f for all x

    [a, b]. Thegraph is a continuous curve.When evaluating f on a computer using floating-point arithmetic (withrounding or chopping), the errors from arithmetic operations cause thegraph to cease being a continuous curve.

    Let look at

    f(x) = (x 1)3 = 1 + x(3 + x(3 + x))

    on [0, 2].

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.3 Noise in function evaluation

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    35/60

    Figure: f(x) = x3

    3x2

    + 3x 1

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.3 Noise in function evaluation

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    36/60

    Figure: Detailed graph of f(x) = x3 3x2 + 3x 1 near x = 1Here is a plot of the computed values of f(x) for 81 evenly spaced

    values of x [0.99998, 1.00002].A rootfinding program might consider f(x) t have a very large numberof solutions near 1 based on the many sign changes!!!

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.4 Underflow and overflow errors

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    37/60

    From the definition of floating-point number, there are upper and

    lower limits for the magnitudes of the numbers that can be expressedin a floating-point form.Attempts to create numbers

    that are too small underflow errors: the default option is toset the number to zero and proceed

    that are too large overflow errors: generally fatal errors onmost computers. With the IEEE floating-point format, overflowerrors can be carried along as having a value of or N aN,depending on the context. Usually, an overflow error is an

    indication of a more significant problem or error in the program.

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.4 Underflow and overflow errors

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    38/60

    Example (underflow errors)

    Consider evaluating f(x) = x10 for x near 0.

    With the IEEE single precision arithmetic, the smallest nonzeropositive number expressible in normalized floating point arithmetic is

    m = 2126

    = 1.18

    1038

    So f(x) is set to zero if

    x10 < m

    |x|

    < 10

    m

    = 1.61

    104

    0.000161 < x < 0.000161

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.2.4 Underflow and overflow errors

    S i i ibl li i h fl b j

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    39/60

    Sometimes is possible to eliminate the overflow error by justreformulating the expression being evaluated.

    Example (overflow errors)

    Consider evaluating z =

    x2 + y2.

    If x or y is very large, then x2 + y2 might create an overflow error,even though z might be within the floating-point range of the machine.

    z =

    |x|

    1 +yx

    2, 0 |y| |x|

    |y|

    1 +xy

    2, 0 |x| |y|

    In both cases, the argument of 1 + w2 has |w| 1, which will notcause any overflow error (except when z is too large to be expressed inthe floating-point format used).

    2. Error and Computer Arithmetic Math 1070

  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    40/60

    > 2. Error and Computer Arithmetic > 2.3 Propagation of Error

    Example

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    41/60

    Let xA = 3.14 and yA = 2.651 be correctly rounded from xT and yT,to the number of digits shown. Then

    |xA xT| 0.005, |yA yT| 0.00053.135 xT 3.145 2.6505 yT 2.6515 (4.10)

    For addition ( : +

    )

    xA + yA = 5.791 (4.11)

    The true value, from (4.10)

    5.7855=3.135+2.6505 xT+yT 3.145+2.6515=5.7965 (4.12)To bound E, subtract (4.11) from (4.12) to get

    0.0055 (xT + yT) (xA + yA) 0.0055

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.3 Propagation of Error

    Example

    http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    42/60

    For division ( : )

    xAyA

    = 3.142.651

    = 1.184459 (4.13)

    The true value, from (4.10),

    3.135

    2.6515 xT

    yT 3.145

    2.6505

    dividing and rounding to 7 digits:

    1.182350 xTyT

    1.186569

    The bound on E

    0.002109 xTyT

    1.184459 0.002110

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.3 Propagation of Error

    Propagated error in multiplication

    http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    43/60

    The relative error in xAyA compared to xTyT is

    Rel(xAyA) = xTyT xAyAxTyT

    If xT = xA + , yT = yA + , then

    Rel(xAyA) =xTyT

    xAyA

    xTyT =xTyT

    (xT

    )(yT

    )

    xTyT

    =xT + yT

    xTyT=

    xT+

    yT

    xT

    yT

    = Rel(xA) + Rel(yA)

    Rel(xA)Rel(yA)

    If Rel(xA) 1 and Rel(yA) 1, then

    Rel(xAyA) Rel(xA) + Rel(yA)

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.3 Propagation of Error

    Propagated error in division

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    44/60

    The relative error in xAyA

    compared to xTyT

    is

    Rel xAyA =xTyT

    xAyAxT

    yT

    If xT = xA + , yT = yA + , then

    Rel xA

    yA=

    xTyT

    xAyA

    xT

    yT

    =xTyA xAyT

    xTyA

    =xT(yT ) (xT )yT

    xT(yT )=

    xT + yTxT(yT ) =

    xT

    yT

    1 xT

    =Rel(xA) Rel(yA)

    1

    Rel(yA)

    If Rel(yA) 1, thenRel

    xAyA

    Rel(xA) Rel(yA)

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.3 Propagation of Error

    Propagated error in addition and subtraction

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    45/60

    (xT yT) (xA yA) = (xT xA) (yT yA) = Error(xA yA) = Error(xA) Error(yA)

    Misleading: we can have much larger Rel(xA yA) for small values ofRel(xA) and Rel(yA) (this source of error is connected closely toloss-of-significance errors).

    Example

    1

    xT = xA = 13

    yT = 168, yA = 12.961Rel(xA) = 0,Rel(yA) = 0.0000371Error(xA yA) = 0.0004814Rel(xA yA) = 0.0125

    2

    xT = xA = 3.1416

    yT =227 , yA = 3.1429

    xT xA = 7.35 106 Rel(xA) = 2.34 106,(yT yA) = 4.29 105 Rel(yA) = 1.36 105(xTyT)(xAyA) = 0.0012645 (.0013) = 3.55 105Rel(xA

    yA)

    =

    0.28

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.3 Propagation of Error

    Total calculation error

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    46/60

    With floating-point arithmetic on a computer, xAyA involves anadditional rounding or chopping error, as in (4.13). Hence the total

    error in computing xAyA (the complete operation in computer,involving the propagated error plus the rounding or chopping error):

    xTyT xAyA = (xTyT xAyA)

    propagated error

    +(xAyA xAyA

    error in computing xAyA

    )

    When using IEEE arithmetic with the basic arithmetic operations

    xAyA = f l(xAyA)(4.3)

    = (1 + )(xAyA)

    where as in (4.4)-(4.5), we get

    xAyA xAyA = (xAyA)xAyAxAyAxAyA

    = Hence the process of rounding or chopping introduces a relatively smallnew error into xAyA as compared with xAyA.

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.3.1 Propagated error in function evaluation

    http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    47/60

    Evaluate f(x)

    C1[a, b] at the approximate value xA instead of xT.

    Using the mean value theorem

    f(xT)f(xA) = f(c)(xTxA), c between xT and xA f(xT)(xT xA) f(xA)(xT xA) (4.14)

    and

    Rel(f(xA)) f(xT)

    f(xT)(xT xA) = f

    (xT)

    f(xT)xTRel(xA) (4.15)

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.3.1 Propagated error in function evaluation

    Example: ideal gas law

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    48/60

    P V = nRT

    with R a constant for all gases:

    R = 8.3143 + , || 0.0012Let evaluate T assuming P = V = n = 1 : T = 1

    R,

    i.e., evaluate f(x) = 1x

    at x = R. Let

    xT = R, xA = 8.3143,

    |xT

    xA

    | 0.0012

    For the error

    E =1

    R 1

    8.3143we have

    |E| = |f(xT) f(xA)|(4.14)

    |f

    (xA)||xT xA| 1

    x2A 0.0012

    = 0.000144

    Hence , the uncertainty in R relatively small error in the computed1

    R

    =1

    8.3143

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.3.1 Propagated error in function evaluation

    Example: evaluate bx, b > 0

    http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    49/60

    Since f(x) = (ln b)bx, by (4.15)

    bxT bxA (ln b)bxT(xT xA)Rel(bxA) (ln b)xT

    K=conditioning number

    Rel(xA)

    If the conditioning number K

    1 is large in size

    Rel(bxA) Rel(xA)

    For example, ifRel(xA) = 10

    7, K = 104

    thenRel(bxA)

    = 103

    independent of how bxA is actually computed.

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.4 Summation

    Let denote the sumn

    http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    50/60

    S = a1 + a2 + + an =j=1

    aj ,

    where each aj is a floating-point number. It takes (n 1) additions, each ofwhich will probably involve a rounding or chopping error:S2 = f l(a1 + a2)

    (4.3)= (a1 + a2)(1 + 2)

    S3 = f l(a3 + S2)(4.3)

    = (a3 + S2)(1 + 3)

    S4 = f l(a4 + S3)

    (4.3)

    = (a4 + S3)(1 + 4)...

    Sn = f l(an + Sn1)(4.3)

    = (an + Sn1)(1 + n)

    with Sn the computed version of S.

    Each j satisfies the bounds (4.6)- (4.9), assuming the IEEE arithmetic isused.

    SSna1(2+. . .+n)a2(2+. . .+n)a3(3+. . .+n)a4(4 + . . . + n) ann (4.16)

    2. Error and Computer Arithmetic Math 1070

    http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    51/60

    > 2. Error and Computer Arithmetic > 2.4 Summation

    n True SL Error LS Error10 2 929 2 929 0 001 2 927 0 002

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    52/60

    10 2.929 2.929 0.001 2.927 0.00225 3.816 3.813 0.003 3.806 0.01050 4.499 4.491 0.008 4.479 0.020

    100 5.187 5.170 0.017 5.142 0.045200 5.878 5.841 0.037 5.786 0.092500 6.793 6.692 0.101 6.569 0.224

    1000 7.486 7.284 0.202 7.069 0.417

    Table: Calculating S on a machine using chopping

    n True SL Error LS Error10 2.929 2.929 0 2.929 025 3.816 3.816 0 3.817 -0.00150 4.499 4.500 -0.001 4.498 0.001

    100 5.187 5.187 0 5.187 0200 5.878 5.878 0 5.876 0.002500 6.793 6.794 0.001 6.783 0.010

    1000 7.486 7.486 0 7.449 0.037

    Table: Calculating S on a machine using rounding2. Error and Computer Arithmetic Math 1070

  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    53/60

    > 2. Error and Computer Arithmetic > 2.4.1 Rounding versus chopping

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    54/60

    (2) For chopping with the four-digit decimal machine, (4.18) isreplaced by

    0.001 j 0

    and all errors have one sign.

    Again, the errors will be random in this interval, with the average

    0.0005 and (4.17) will likely be

    a1 (n 1) (0.0005)

    hence T is proportional to n, which increases more rapidly than

    n.

    Thus the error (4.17) and (4.16) will grow more rapidly when choppingis used rather than rounding.

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.4.1 Rounding versus chopping

    Example (difference between rounding and chopping on summation)

    C id

    http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    55/60

    Consider

    S =n

    j=11

    jin single precision accuracy of six decimal digits.

    Errors occur both in calculation of 1j

    and in the summation process.

    n True Rounding error Chopping Error10 2.92896825 -1.76E-7 3.01E-7

    50 4.49920534 7.00E-7 3.56E-6

    100 5.18737752 -4.12E-7 6.26E-6

    500 6.79282343 -1.32E-6 3.59E-5

    1000 7.48547086 8.88E-8 7.35E-5

    Table: Calculation of S: rounding versus choppingThe true values were calculated using double precision arithmetic toevaluate S; all sums were performed from the smallest to the largest.

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.4.2 A loop error

    Suppose we wish to calculate:

    jh (4 19)

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    56/60

    x = a + jh, (4.19)

    for j = 0, 1, 2, . . . , n, for given h > 0, which is used later to evaluate afunction f(x).

    Question

    Should we compute x using (4.19) or by using

    x = x + h (4.20)

    in the loop, having initially set x = a before beginning the loop?

    Remark: These are mathematically equivalent ways to compute x, but

    they are usually not computationally equivalent.The difficulty arises when h does not have a finite binary expansionthat can be stored in the given floating-point significand; for example

    h = 0.1

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.4.2 A loop error

    The computation (4 19)

    http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    57/60

    The computation (4.19)x = a + jh

    will involve two arithmetic operations, hence only two chopping orrounding errors, for each value of x.In contrast, the repeated use of (4.20)

    x = x + hwill involve a succession of j additions, for the x of (4.19). As x

    increases in size, the use of (4.20) involves a larger number of roundingor chopping errors, leading to a different quantity than in (4.19).Usually (4.19) is the preferred way to evaluate x

    Example

    Compute x in the two ways, with a = 0, h = 0.1 and verify with thetrue value computed using a double precision computation.

    2. Error and Computer Arithmetic Math 1070

    > 2. Error and Computer Arithmetic > 2.4.2 A loop error

    http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    58/60

    j x Error using (4.19) Error using (4.20)

    10 1 1.49E-8 -1.04 E-7

    20 2 2.98E-8 -2.09E-730 3 4.47E-8 7.60 E-7

    40 4 5.96E-8 1.73 E-6

    50 5 7.45E-8 2.46 E-6

    60 6 8.94E-8 3.43 E-670 7 1.04E-7 4.40 E-6

    80 8 1.19E-7 5.36 E-6

    90 9 1.34E-7 2.04 E-6

    100 10 1.49E-7 -1.76 E-6

    Table: Evaluation of x = j h, h = 0.1

    2. Error and Computer Arithmetic Math 1070

    http://-/?-http://-/?-http://-/?-http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    59/60

    > 2. Error and Computer Arithmetic > 2.4.3 Calculation of inner products

    http://-/?-
  • 7/27/2019 MATH1070 2 Error and Computer Arithmetic

    60/60

    (2) Using double precision:1 convert each aj and bj to double precision by extending their

    significands with zeros2 multiply in double precision3 sum in double precision4 round to single precision to obtain the calculated value of S

    For machines with IEEE arithmetic, this is a simple and rapid

    procedure to obtain more accurate inner products in singleprecision.There is no increase in storage space for the arraysA = [a1, a2, . . . , an] and B.The accuracy is improved since there is only one single precision

    rounding error, regardless of the size n.

    2. Error and Computer Arithmetic Math 1070