Risky Measures of Risk: Error Analysis of Numerical Differentiation


Dr. Harvey J. Stein
Head, Quantitative Finance R&D
Bloomberg L.P.
22 June 2005

Revision: 1.13


    1. Overview

Issues in numerical differentiation:

  • Roundoff error
  • Convexity error
  • Cancellation error
  • Correlated errors

Methods to improve accuracy:

  • h control
  • Smoothing techniques
  • Computation specific approaches


    2. Derivatives as finite difference

To compute the derivative of a function f, one must compute

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$

Although one could try to take this limit numerically, this is a lot of work. More commonly, one chooses a small value of h = h_0 and tries to verify that the approximation

$$f'(x) \approx f_r(x) = \frac{f(x + h_0) - f(x)}{h_0}$$

is sufficiently close to the desired derivative.

But, once we're not taking the limit, we run into questions, such as:

  • What value of h_0?
  • Why f_r, and not f_l(x) = (f(x) - f(x - h_0))/h_0 or f_c(x) = (f(x + h_0) - f(x - h_0))/(2 h_0)? (See the sketch below.)
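To make the three approximations concrete, here is a minimal Python sketch (mine, not from the original slides; the function names are my own):

```python
import numpy as np

def d_right(f, x, h):
    # One sided (right) difference: (f(x + h) - f(x)) / h
    return (f(x + h) - f(x)) / h

def d_left(f, x, h):
    # One sided (left) difference: (f(x) - f(x - h)) / h
    return (f(x) - f(x - h)) / h

def d_central(f, x, h):
    # Centered difference: (f(x + h) - f(x - h)) / (2 h)
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Example: differentiate sin at 0; the exact answer is cos(0) = 1.
print(d_right(np.sin, 0.0, 1e-6), d_central(np.sin, 0.0, 1e-6))
```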


    3. Investigating behavior as a function of h0

Naively, one might think that since we want the limit as h goes to zero, small is better.

However, graphing the computed derivatives as functions of the step size indicates otherwise.


[Figure: Normal CDF with central difference derivatives for h = 10^-12 and h = 10^-13, plotted for x from -3 to 3.]

As can be seen in the above graph, h_0 = 10^-13 exhibits some noise (which, I'm afraid, is barely visible on this slide).


Using h_0 = 10^-14 gives visible noise:

[Figure: Normal CDF and central difference derivative with stepsize 10^-14, showing visible noise.]


As h_0 is decreased, the approximation gets worse and worse, with h_0 = 10^-17 giving complete nonsense - a derivative approximation that's mostly zero:

[Figure: Normal CDF and central difference derivatives with stepsizes 10^-15, 10^-16, and 10^-17.]


    4. Error analysis 101 - machine precision

Why does the derivative look so bad for h_0 = 10^-17? Because then h_0 is disappearing into the resolution of our computer arithmetic:

[Figure: Cumulative normal over the range 1 - 10^-15 to 1 + 10^-15, showing that at this magnification the CDF is a step function.]

Doubles by the IEEE standard are 64 bits long with a 53 bit mantissa. For any given exponent, one can only represent 2^52 different positive values (one bit is the sign bit). 2^52 ≈ 10^16, so we only get to use at most 16 decimal digits to represent a mantissa. In particular, our computers think that 1 + 10^-17 = 1. This discreteness affects the input value, the output value, and all intermediate computations.

Inspecting the above graph, we see that a stepsize of 10^-16 for x values around 1 will give either zero or a ridiculously high value for the derivative. 10^-15 will also give noisy derivatives, depending on where the actual x + h_0 and x - h_0 lie on the step function that constitutes the normal CDF at this level of magnification.

There's another effect that's commonly discussed: the fact that (x + h) - (x - h) ≠ 2h because of roundoff error. This means that the denominator of our approximation shouldn't really be h. There are two ways to fix this. One is to use a power of 2 for h instead of a power of 10. The other is to set h = (x + h) - x. The latter requires a little bit of effort to prevent an optimizing compiler from reducing it to h = h.

But, I've only observed minor impact from such adjustments. For clarity, I'll continue using powers of 10 instead of powers of 2 here.
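These effects are easy to see directly. A small Python check (my illustration, not from the slides; in an interpreted setting no compiler optimization gets in the way of the h = (x + h) - x trick):

```python
import numpy as np

# 1 + 10^-17 is indistinguishable from 1 in 64 bit IEEE arithmetic.
print(1.0 + 1e-17 == 1.0)        # True
print(np.finfo(float).eps)       # ~2.2e-16, the spacing of doubles near 1

# The shift actually applied to x is (x + h) - x, not h itself.
x, h = 1.0, 1e-10
h_eff = (x + h) - x              # differs from h in the low bits
print(h, h_eff, h == h_eff)
```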


    5. Error analysis 102A - The right h

But, is this the whole story? Since we're differentiating the normal CDF, we can compare it to the analytic solution. Comparing our original calculation with h_0 = 10^-12, we see that the derivative computed wasn't nearly as accurate as we had thought. In fact, h_0 = 10^-10 gives much smaller errors:

[Figure: normal - delta f errors for h = 10^-12, 10^-11, and 10^-10, on a scale of about ±10^-4.]


Let's see how large we need to make h_0.

[Figure: normal - delta f errors for h = 10^-10, 10^-9, and 10^-8, on a scale of about ±10^-6.]

Clearly, 10^-8 works much better than 10^-10.


Continuing along, we see that 10^-6 works much better than 10^-8:

[Figure: normal - delta f errors for h = 10^-8, 10^-7, and 10^-6, on a scale of about ±10^-8.]


Finally, we see that 10^-5 gives the best results, with 10^-4 being smooth, but giving a high bias.

[Figure: normal - delta f errors for h = 10^-6, 10^-5, and 10^-4, on a scale of a few times 10^-10.]


To prove I'm not cheating, let's do the same with f(x) = x^3:

[Figure: Error in the difference derivative of x^3 with h = 10^-14, plotted for x from -6 to 6.]


The best h_0 for f(x) = x^3 again ends up being around 10^-5. Coincidence?

[Figure: Errors in the difference derivative of x^3 with h = 10^-6, 10^-5, and 10^-4, on a scale of about 10^-8.]


Let's look more closely at the error graph using h_0 = 10^-5:

[Figure: normal - delta f error for h = 10^-5, on a scale of about ±1.5x10^-11.]

It appears to consist of noise plus some periodic error. The noise is from cancellation error, and the periodic component is from convexity error.
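The whole sweep is easy to reproduce. A sketch in Python (mine, not from the slides), using scipy's normal CDF and its density as the reference:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 3, 601)
for h in [1e-12, 1e-10, 1e-8, 1e-6, 1e-5, 1e-4]:
    central = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
    err = np.max(np.abs(central - norm.pdf(x)))
    print(f"h = {h:.0e}   max error = {err:.2e}")
# The maximum error bottoms out near h = 10^-5, as in the graphs above.
```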


    6. Convexity error

$$f(x + h) = f(x) + f'(x)\,h + \frac{f''(x)}{2!}\,h^2 + \frac{f'''(x)}{3!}\,h^3 + \cdots$$

$$f(x + h) - f(x - h) = 2 f'(x)\,h + 2 f'''(x)\,\frac{h^3}{3!} + \cdots$$

$$\frac{f(x + h) - f(x - h)}{2h} = f'(x) + \frac{f'''(x)}{3!}\,h^2 + \cdots$$

When h is large enough, the (f'''/3!) h^2 term contributes to the error. This error tends to zero as h tends to zero.

This is one of the reasons why a centered derivative is favored over a one sided derivative: the f'' h error term drops out, so the convexity error is smaller. Note also that the one sided derivative is the same as the two sided derivative computed at half the stepsize and shifted by half the stepsize, so in some sense the error in the one sided derivative is that it's estimating the derivative at the wrong x value.


    7. Cancellation error

Consider f(x + h) - f(x - h). Suppose our computation of each value has 3 significant digits of accuracy. How many digits of accuracy are in the difference?

      f(x + h)            = .335 + noise
    - (f(x - h)           = .231 + noise)
      f(x + h) - f(x - h) = .104 + noise

However,

      f(x + h)            = .335 + noise
    - (f(x - h)           = .331 + noise)
      f(x + h) - f(x - h) = .004 + noise


In the first case, the difference yielded 3 significant digits. In the second case, the high order digits cancelled, leaving only 1 significant digit in the difference. This is cancellation error.

Clearly, cancellation error increases as h decreases. For h sufficiently small, f(x + h) = f(x - h) and the relative error becomes infinite.

In general, we can use relative errors to encode the number of significant digits in our computations. Let

$$\tilde{f}(x) = f(x) + \epsilon(x) f(x)$$

where \tilde{f}(x) is what we actually get when computing f(x). Here \epsilon(x) is a random quantity that quantifies the relative error in calculating f(x). If we're accurate to machine precision, then |\epsilon| ≈ 2^-53. If our calculation only yields 5 decimal digits of accuracy, then |\epsilon| ≈ 10^-5/2.


$$\tilde{f}(x + h) - \tilde{f}(x - h) = f(x + h) - f(x - h) + \epsilon(x + h) f(x + h) - \epsilon(x - h) f(x - h) \approx f(x + h) - f(x - h) + \epsilon(x) f(x),$$

assuming h is small enough that f(x + h) and f(x - h) are about the same magnitude, that \epsilon(x + h) and \epsilon(x - h) are independent noise, and ignoring the fact that summing them might cause the loss of one additional significant bit. This analysis can be done more carefully, but this will get us into the right ballpark.

This quantifies the problem of cancellation error. The absolute error in calculating f is roughly \epsilon f. For small h, the difference f(x + h) - f(x - h) is small, leaving the \epsilon f error to dominate the calculation.

The best value of h will balance the cancellation error and the convexity error:

$$\frac{\tilde{f}(x + h) - \tilde{f}(x - h)}{2h} \approx f'(x) + \frac{f'''(x)}{3!}\,h^2 + \frac{\epsilon(x) f(x)}{2h}.$$


To minimize the error we must minimize (f'''(x)/3!) h^2 + \epsilon(x) f(x)/(2h):

$$\frac{d}{dh}\left(\frac{f'''(x)}{3!}\,h^2 + \frac{\epsilon(x) f(x)}{2h}\right) = \frac{f''' h}{3} - \frac{\epsilon f}{2 h^2} = \frac{2 f''' h^3 - 3 \epsilon f}{6 h^2} = 0$$

implies 2 f''' h^3 - 3 \epsilon f = 0, or h = (3 \epsilon f / (2 f'''))^{1/3}.

If our calculations are exact except for roundoff error, and f and f''' are around the same order of magnitude, then \epsilon ≈ 2^-53 (about 16 decimal digits of accuracy), which gives an optimal h of about 7x10^-6. This ties out with our above empirical work, and indicates that both functions are being computed with around full accuracy.


For x^2, we know the 3rd derivative is zero. This means that the only error is from cancellation, which means that larger h should automatically be better. Graphs confirm this theory:

[Figure: Errors in the difference derivative of x^2 with h = 10^-2, 10^-1, and 10^0, on a scale of about ±2.5x10^-13.]

Note that in finance our error is often on the order of a penny on a 100 value, a relative error of about 10^-5, which requires h_0 ≈ 0.03 - a rather large value!
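The balance point is a one-line computation. A sketch (mine), dropping order-one factors and taking f and f''' to be comparable so the f/f''' ratio drops out:

```python
# Order-of-magnitude optimal step: h ~ (3 * eps) ** (1/3),
# where eps is the relative error of the function values.
eps_machine = 2.0 ** -53
print((3 * eps_machine) ** (1 / 3))   # ~7e-6, matching the sweep above

eps_pricing = 1e-5                    # penny-level relative pricing noise
print((3 * eps_pricing) ** (1 / 3))   # ~0.031, i.e. h0 of about 0.03
```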


    8. Error analysis 102B - Flying without a reference

More commonly we're faced with calculating derivatives without being able to verify against a known analytic formula. If we don't know the derivative, and we don't know how much error we have in our function, how does one pick h_0?

One way is to inspect higher order derivatives. If our first derivative is jumping around, then the second derivative with the same step size will be visibly noisy.

We'll use

$$f''(x) \approx \frac{f(x + h_0) + f(x - h_0) - 2 f(x)}{h_0^2}$$

and

$$f'''(x) \approx \frac{f(x + 2h_0) - 2 f(x + h_0) + 2 f(x - h_0) - f(x - 2h_0)}{2 h_0^3},$$

so that the second and third derivative approximations sample f with the same spacing as the first derivative calculation.
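A sketch (mine) of the two stencils, applied to the normal CDF for the noise check that follows:

```python
import numpy as np
from scipy.stats import norm

def d2(f, x, h):
    # Second difference: (f(x+h) + f(x-h) - 2 f(x)) / h^2
    return (f(x + h) + f(x - h) - 2 * f(x)) / h**2

def d3(f, x, h):
    # Third difference: (f(x+2h) - 2 f(x+h) + 2 f(x-h) - f(x-2h)) / (2 h^3)
    return (f(x + 2*h) - 2*f(x + h) + 2*f(x - h) - f(x - 2*h)) / (2 * h**3)

x = np.linspace(-3, 3, 61)
# For the standard normal, f''(x) = -x * pdf(x); compare noise levels:
for h in [1e-7, 1e-6, 1e-4]:
    err = np.max(np.abs(d2(norm.cdf, x, h) - (-x) * norm.pdf(x)))
    print(f"h = {h:.0e}   max f'' error = {err:.2e}")
print(np.max(np.abs(d3(norm.cdf, x, 1e-5))))  # noisy at h = 1e-5
```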



Graphing f'' and f''' as functions of step size shows that the second derivative is visibly poor for h_0 = 10^-7 and that the third derivative is visibly poor for h_0 = 10^-5:

[Figure: Normal CDF; derivative with h = 10^-5; 2nd derivative with h = 10^-7 and 10^-6; 3rd derivative with h = 10^-5 and 10^-4.]

This at least indicates that something between 10^-6 and 10^-4 is called for when computing f'. Can we make this more precise? Maybe, but we haven't tried.

One might consider trying to integrate the derivative to see how close it comes back to the original function. Unfortunately this doesn't help, because the sum is effectively a telescoping series. If Δx = 2h and X = {x_1, x_1 + Δx, x_1 + 2Δx, ..., x_2}, then

$$\sum_{x \in X} \frac{f(x + h) - f(x - h)}{2h}\,\Delta x = f(x_2 + h) - f(x_1 - h).$$

In other words, the errors in the derivative from one point to the next cancel each other. This is clear because if a given point is too high, then the derivative to the left will be too large while on the right it will be too small.



    9. Error analysis 201 - Correlated errors

This is where most error analysis ends up, but is far from the whole story. One of the key assumptions in the above analysis is that the error is random and uncorrelated from one x value to another. This is rarely the case.

Consider finite difference and lattice (aka tree) approaches to option valuation. In these, the pricing function is a weighted average of the payoff sampled at various points. The weights change slightly as a function of the underlying, but the actual payoffs used change substantially as the strike passes a sample point. This makes the pricing function calculation roughly a piecewise linear approximation of the actual function. In the case of a European option on a stock under Black-Scholes and using a binomial lattice, it's exactly a piecewise linear approximation. In the case of an option on a bond, it's closer to piecewise exponential.

To prove this, let S_ij be the stock value at node j at time t_i. With starting value S_0, volatility σ, maturity time T, N steps, Δt = T/N, and risk free rate r, a typical binomial lattice uses future stock values of

$$S_{ij} = S_0 u^j d^{i - j},$$



where u = e^{σ√Δt} and d = 1/u.



The value of an option of strike K is then

$$C = e^{-rT} \sum_{j=0}^{N} \max\left(S_0 (pu)^j (qd)^{N-j} - K,\, 0\right) = e^{-rT} \sum_{j=j(S_0)}^{N} \left(S_0 (pu)^j (qd)^{N-j} - K\right),$$

where p = (e^{rΔt} - d)/(u - d), q = 1 - p, and j(S_0) is the minimum j such that S_Nj > K. Then

$$\frac{dC}{dS_0} = e^{-rT} \sum_{j=j(S_0)}^{N} (pu)^j (qd)^{N-j}.$$

The derivative is a step function, only changing value when j(S_0) changes.
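To see the staircase concretely, here is a minimal CRR binomial pricer in Python (my sketch; unlike the shorthand above, it carries the binomial coefficients explicitly), with a small-bump central difference delta:

```python
import numpy as np
from scipy.special import comb

def crr_call(S0, K=100.0, T=1.0, sigma=0.3, r=0.03, N=12):
    # European call on a Cox-Ross-Rubinstein binomial lattice.
    dt = T / N
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp(r * dt) - d) / (u - d)
    j = np.arange(N + 1)
    weights = comb(N, j) * p**j * (1 - p)**(N - j)
    payoff = np.maximum(S0 * u**j * d**(N - j) - K, 0.0)
    return np.exp(-r * T) * np.dot(weights, payoff)

# Small-bump central difference delta: constant between lattice nodes,
# jumping when the strike crosses a terminal node.
h = 0.01
for S0 in [90.0, 95.0, 100.0, 105.0]:
    print(S0, (crr_call(S0 + h) - crr_call(S0 - h)) / (2 * h))
```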



This is rarely noticed when graphing the function, just as the errors in the derivative calculations weren't noticed in our initial graphs:

[Figure: Option value vs. initial stock value; 1 yr option, 30% vol, 3% risk free rate, 12 step lattice.]



    But it becomes clearly evident when we inspect the difference derivative:

[Figure: dC/dS vs. initial stock value; 1 yr option, 30% vol, 3% risk free rate, h = .01; 12 steps vs. 120 steps.]

A 12 step lattice gives us large piecewise linear sections. A 120 step lattice, while increasing computation by a factor of 100, only decreases the sizes of the steps by about a factor of 3.



Why only a factor of three? Because u = e^{σ√Δt}. Decreasing the time step by a factor of 10 only decreases log u = σ√Δt by a factor of √10 ≈ 3, which only gives about 3 times the level density.

When the approximation is piecewise linear, and the stepsize is much smaller than the support of the linear segments, the first derivative is poor. In computing the second derivative, the sample endpoints almost always land on the same segment, making the estimate of the second derivative zero almost everywhere.



[Figure: ddC, 1 yr opt, 30% vol, 3% risk free rate, h = .01; 12 step vs. 120 step lattice.]


    10. Smoothing

When the calculation is a black box, we can't get inside to use the internals in the calculation. In this case, how can one compute a good derivative?

One trick is to use a large h. We suffer convexity error, but it's being swamped by the error from the piecewise linearity of the function. Picking h around 1 to 2x the support of the linear segments will do it.


    11. H adjustment

Here we can see that with a 12 step lattice, we need to compute the derivative with h_0 ≈ 17.

[Figure: ddC, 1 yr opt, 30% vol, 3% risk free rate, 12 step lattice; h = .01, 1, 10, 17.]


The second derivative is also helped by using a larger stepsize, but still isn't especially good:

[Figure: 2nd deriv of 1 yr opt, 30% vol, 3% risk free rate, 12 step lattice; h = 1, 10, 17.]


More commonly, people would use 120 levels for a 1 year stock option, but even this requires a large value of h_0:

[Figure: dC/dS of 1 yr opt, 30% vol, 3% risk free rate, 120 step lattice; h = .01, 1, 5.]


    Second derivative:

[Figure: 2nd deriv of 1 yr opt, 30% vol, 3% risk free rate, 120 step lattice; h = 5.]


For fun, let's take a look at what happens with 1200 levels, which is over 3 levels/day:

[Figure: dC/dS, 30% vol, 3% risk free rate, 1200 step lattice; h = .01, 2.]

As you can see, we still have fairly large piecewise linear sections. We need to make h_0 around 2 to get reasonable derivative estimates.


    Second derivative:

[Figure: 2nd deriv, 30% vol, 3% risk free rate, 1200 step lattice; h = 2.]


Why did an h_0 of 17 for 12 levels, 5 for 120 levels and 2 for 1200 levels work reasonably well?

As mentioned before, the stepsize needed is roughly the lattice spacing. This is approximately 2 S_0 σ√Δt, which is 17 for 12 steps/year, 5.5 for 120 steps/year, and 1.7 for 1200 steps/year.

Even for a dense lattice of 1200 levels, a much larger stepsize is required than is commonly recognized.
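The rule of thumb is a quick computation; a sketch (mine) reproducing the three figures for S_0 = 100 and 30% vol:

```python
import math

S0, sigma, T = 100.0, 0.3, 1.0
for N in [12, 120, 1200]:
    dt = T / N
    h0 = 2 * S0 * sigma * math.sqrt(dt)   # roughly the lattice spacing
    print(N, round(h0, 1))                # 17.3, 5.5, 1.7
```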

In fact, it's common to use monthly steps in a binomial lattice for long dated bonds, and a bump size of 10bp for modified duration and key rate duration with a one sided derivative.

Let's take a look at the behavior of this. We'll use a trinomial lattice, which gives better results than a binomial lattice.


First, consider the error in computing the change in a callable bond as a function of the step size, using a centered derivative.

[Figure: Centered derivatives of the callable bond vs. curve shift; dC/dS with 10bp, 25bp, and 50bp bumps.]

    A 25bp shift is bumpy, but looks fairly close to what it should be.


Compare this to a one sided derivative, which is what's commonly used:

[Figure: One sided derivatives vs. curve shift; dC/dS 25bp centered, and 10bp, 25bp, 50bp one sided.]

The stepping in the one sided derivative for a given h_0 is the same as that for the centered derivative at h_0/2, but the convexity error is much worse.


Next, consider the key rate sensitivities. The bond isn't sensitive to the 3mo rate, so the 1st key rate sensitivity should be zero.

[Figure: Key rate duration k1 call sensitivity vs. level; 10bp, 25bp, 50bp, one sided and centered; values cluster around -8x10^-5.]

It ends up being close to zero, but noisy, pretty similar for all step sizes, and seemingly unaffected by whether we use a centered derivative or a one sided derivative.


The second key rate sensitivities suffer from the piecewise nature of the calculation, both the centered ones as well as the one sided ones.

[Figure: Key rate duration k2 call sensitivity vs. level; 10bp, 25bp, 50bp, one sided and centered.]

    The other key rates look similar.


Comparing the sum of the one sided key rates to the 25bp centered derivatives, we see that the sum suffers both from the piecewise nature as well as the convexity, and suffers worse than the one sided full sensitivity.

[Figure: Sum of key rates, one sided, vs. curve shift; dC/dS 25bp, sum kr 10bp, 25bp, 50bp.]


The centered key rates are better, with the 25bp step size landing fairly close to the 25bp derivative.

[Figure: Sum of key rates, centered, vs. curve shift; dC/dS 25bp, sum kr 10bp, sum kr 25bp.]


Comparing the 50bp centered key rates to the 50bp difference derivative, we see that the two are close, but significantly different. This is because the key rates interact with the piecewise nature differently than the full curve shift.

[Figure: Sum of key rates, centered, vs. curve shift; dC/dS 50bp, sum kr 50bp.]


    12. Filtering

A more sophisticated approach is to smooth our pricing function. Essentially, we'd like to filter out the high frequencies that come from the corners where the slope changes, leaving only the lower frequency data arising from the changing function values.

This amounts to computing the Fourier transform of the price function, multiplying by a function that decays to zero (to dampen out the high frequency noise), and transforming back, or

$$\mathrm{Smooth}\, f = \mathcal{F}^{-1}(\mathcal{F}(f) \cdot D)$$

where D is our damping function (or smoothing kernel), and \mathcal{F} is the Fourier transform.


Since \mathcal{F}(f * g) = \mathcal{F}(f)\,\mathcal{F}(g) (where * is the convolution operator),

$$\mathrm{Smooth}\, f = \mathcal{F}^{-1}(\mathcal{F}(f) \cdot D) = \mathcal{F}^{-1}\left(\mathcal{F}(f) \cdot \mathcal{F}(\mathcal{F}^{-1}(D))\right) = \mathcal{F}^{-1}\left(\mathcal{F}(f * \mathcal{F}^{-1}(D))\right) = f * \mathcal{F}^{-1}(D).$$

So, smoothing a function is the same as computing its convolution with the inverse transform of the smoothing kernel. Since (f * g)' = f' * g = f * g', smoothing the derivative can be done by convolving with the derivative of the inverse transform of the smoothing kernel. Finally, since the Fourier transform of a Gaussian PDF is a Gaussian (up to scaling), we can smooth by integrating against a Gaussian and its derivatives.

All that's left is to integrate a function times a Gaussian, which is best done by Gaussian quadrature.

The derivative of the Gaussian-smoothed function is

$$\frac{d}{dx_0}\left[\frac{1}{\sqrt{2\pi}\,\sigma} \int f(x_0 - x)\, e^{-\frac{x^2}{2\sigma^2}}\, dx\right] = -\frac{1}{\sqrt{2\pi}\,\sigma^3} \int f(x_0 - x)\, x\, e^{-\frac{x^2}{2\sigma^2}}\, dx = -\frac{\sqrt{2}}{\sqrt{\pi}\,\sigma} \int f(x_0 - \sqrt{2}\,\sigma x)\, x\, e^{-x^2}\, dx \approx -\frac{\sqrt{2}}{\sqrt{\pi}\,\sigma} \sum_i w_i\, f(x_0 - \sqrt{2}\,\sigma x_i)\, x_i$$

where x_i are the Gaussian quadrature points and w_i are the associated weights.

The theory sounds beautiful, and looks like exactly what we need, but it doesn't live up to its promise in practice. Although I've used this method in the past, and it has applications in signal processing, I've been unable to make it perform better than a two point difference derivative. It seems that it works better in the random noise case than on piecewise linear functions.
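A sketch of the quadrature version (mine; numpy's hermgauss supplies the Gauss-Hermite points for the weight e^{-x^2}):

```python
import numpy as np
from scipy.stats import norm

def smoothed_derivative(f, x0, sig, npts=5):
    # Derivative of f convolved with a Gaussian of width sig,
    # via Gauss-Hermite quadrature.
    xi, wi = np.polynomial.hermite.hermgauss(npts)
    vals = f(x0 - np.sqrt(2.0) * sig * xi)
    return -np.sqrt(2.0 / np.pi) / sig * np.sum(wi * vals * xi)

# Sanity check on the normal CDF: the result is close to the density
# (smoothing a Gaussian CDF just fattens it to variance 1 + sig^2).
print(smoothed_derivative(norm.cdf, 0.5, sig=0.1), norm.pdf(0.5))
```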


    Using 5 points on a 120 level Black-Scholes lattice yields:

[Figure: FFT vs centered difference - 1st derivative; formula h=10^-5, centered h=5, Gauss 3pt h=5.]

The 5 point FFT method yields similar results to the 2 point difference derivative. Both look good by inspection.


But of course, the best way to check is to compare to a good reference. In this case, we'll compare the error relative to a difference derivative computed on the formula using a step size of 10^-5.

[Figure: FFT vs centered difference - 1st derivative errors; centered err, FFT err sig=3, FFT err sig=2.]

    The FFT method is hard pressed to do better than a well chosen step size.


It's hard to make either method produce both a smooth and accurate second derivative:

[Figure: FFT vs centered difference - 2nd derivative; formula h=10^-4, centered h=15, FFT h=6.]

Both look equally poor, with the FFT method requiring twice the computational effort.


    Error graphs confirm what we saw in the previous graph:

[Figure: FFT vs centered difference - 2nd derivative errors; centered h=15, FFT err sig=6.]


    13. Complex arithmetic


Another approach makes use of complex analysis. If f is a real valued function of one real variable, and can be extended to a complex analytic function, then

$$f(x + ih) = f(x) + f'(x)\,ih - \frac{f''(x)\,h^2}{2} - \frac{f^{(3)}(x)\,i h^3}{3!} + \frac{f^{(4)}(x)\,h^4}{4!} + \cdots$$

so

$$\frac{\Im(f(x + ih))}{h} = f'(x) - \frac{f^{(3)}(x)\,h^2}{3!} + \cdots$$

This has the same convexity error as the centered derivative, but doesn't directly suffer from cancellation error, allowing one to reduce h to lower convexity error without increasing cancellation error.
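In Python, complex arithmetic makes this a one-liner (a sketch, mine; it applies only to functions that remain analytic when fed complex arguments):

```python
import numpy as np

def complex_step(f, x, h=1e-20):
    # Complex-step derivative: Im(f(x + i h)) / h.
    # No subtraction of nearby values, hence no cancellation error,
    # so h can be made absurdly small.
    return np.imag(f(x + 1j * h)) / h

print(complex_step(np.sin, 1.0))   # matches cos(1) to machine precision
print(np.cos(1.0))
```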

While this approach can be useful in analytic methods, difficulties are encountered when trying to apply it in finance. It doesn't correct for correlated errors: when the function is piecewise linear, it just does a very good job of returning the slope of the linear sections, yielding a step function for the derivative. It's also not as straightforward as it looks. One can't just change all references of double to complex, because numerical code in finance makes heavy use of inequalities, as in

$$\max(S - K,\, 0),$$

which are meaningless on the complex plane. They need to be replaced by something else. One source recommends comparing the real parts, but this prevents the function from being analytic, thus breaking the above Taylor series analysis. Finally, our analytic formulas in finance typically involve cumulative normal distributions. While there is a unique continuation to the complex plane, computing it is more involved than just calculating erf(x/sqrt(2))/2 + 1/2. One would need to develop fast and accurate numerical methods for the calculation of a complex cumulative normal before this method is useful in such a context.

This method is commonly compared to a one sided derivative because both require one additional function evaluation. But evaluating a function at a complex point can triple the computational effort. One complex addition is over double the effort of a real addition, in that it requires two real additions and works with more memory. One complex multiplication requires four real multiplications plus two real additions, and thus is over four times as expensive as a real multiplication.

    13. Complex arithmetic 58

  • 7/25/2019 Risky Measures of Risk: Error Analysis of Numerical Differentiation

    58/80

A centered derivative is more comparable in computational effort, in which case both methods have the same convergence properties as h_0 tends to zero. The only difference is in the cancellation error.

It's easy to see why this doesn't help for Black-Scholes binomial lattices. Recalling that the lattice computation for the value of a call option is

$$C(S_0) = e^{-rT} \sum_{j=j(S_0)}^{N} \left(S_0 (pu)^j (qd)^{N-j} - K\right),$$

we see that

$$\frac{\Im(C(S_0 + ih))}{h} = e^{-rT} \sum_{j=j(S_0)}^{N} (pu)^j (qd)^{N-j}.$$

Up to roundoff error, the complex method gives the same results as the centered difference.

    13. Complex arithmetic 59

  • 7/25/2019 Risky Measures of Risk: Error Analysis of Numerical Differentiation

    59/80

Another complex technique is to exploit the Cauchy integral formula, which states that

$$f^{(n)}(z_0) = \frac{n!}{2\pi i} \oint_\gamma \frac{f(z)}{(z - z_0)^{n+1}}\, dz$$

where γ is a counterclockwise loop enclosing z_0.

One can then compute the above integral numerically. Bruno Dupire and Arun Verma have looked into this method a little, deriving formulas for getting 4th order accuracy using 4 points for the first 4 derivatives.


    14. Algorithm specific approaches


There are additional approaches that can be taken if one can modify the internals of the numerical calculations.


    15. Using internal lattice spacing


In finite difference approaches, one can often read extra information from the lattice itself. In a simple Black-Scholes lattice, one can start the lattice two levels early. This gives the option value as the middle value after the second step. The values at the other two nodes can be used for the up and down values.

One reference for this method is a 1994 article by Pelsser and Vorst, where they call it a well known alternative to the difference derivative.

Pelsser and Vorst compute the derivative as Δf/Δx, which introduces convexity error by doing a difference derivative around the wrong point. Here we avoid this by using another numerical technique: fitting all three points (the up, the down and the center) to a quadratic and reading the derivatives from there.

This latter technique could also be used in general when three points are available, and should reduce convexity error, but I haven't tested it. (A sketch of the fit appears below.)

Shifting the lattice often gives the best derivatives that can be gotten from a lattice.
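The quadratic fit itself is a few lines. A sketch (mine; the node and option values below are made-up placeholders):

```python
import numpy as np

def quad_fit_derivs(S_nodes, V_nodes):
    # Fit a quadratic through the (down, center, up) lattice nodes and
    # read off the first and second derivatives at the center node.
    a, b, c = np.polyfit(S_nodes, V_nodes, 2)   # V ~ a S^2 + b S + c
    S0 = S_nodes[1]
    return 2 * a * S0 + b, 2 * a

# Hypothetical node stock values (down, center, up) and option values:
S = np.array([94.3, 100.0, 106.1])
V = np.array([8.1, 11.0, 14.5])
print(quad_fit_derivs(S, V))
```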


Unfortunately, the approach can't always be applied. In interest rate lattices, the values at the other nodes don't always correspond to a shift of the yield curve. In normal short rate models they do, but in log normal models they don't. In the latter case, to apply this approach, one would have to either adjust the derivative or settle for differentiating with respect to a different sort of curve move.


Nonetheless, where this method applies, it works quite well. We'll compare it to the best of fixed h_0 selection. Consider Black-Scholes again with a 1 year option, 30% vol, 3% risk free rate, computed using a 120 step binomial lattice. Again, the first derivatives are visually fine:

[Figure: Centered difference vs lattice shift - 1st derivative; formula h=10^-5, centered h=5, lattice shift.]


But the differences to a reference show that the shifted lattice approach is far smoother and more accurate:

[Figure: Centered difference vs lattice shift - 1st derivative errors; centered err, lattice shift.]


The results on the second derivative are more pronounced. The fixed h selection is visibly poor, while the shifted lattice still looks quite good:

[Figure: Centered difference vs lattice shift - 2nd derivative; formula h=10^-4, centered h=15, lattice shift.]


    Checking the differences to a reference shows how much better:

[Figure: Centered difference vs lattice shift - 2nd derivative errors; centered h=15, FFT err sig=6, lattice shift.]


Looking at the errors for the lattice shift by itself, we can see the errors in the second derivative calculation are around 10^-4, which is about a 1.5% error in the second derivative.

[Figure: Lattice shift 2nd derivative error, on a scale of about ±10^-4.]


Surprisingly, this method yields reasonable results even with a monthly lattice. Here's the first derivative:

[Figure: Centered difference vs lattice shift - 1st derivative, monthly lattice; formula h=10^-5, lattice shift.]


Here's the second derivative:

[Figure: Centered difference vs lattice shift - 2nd derivative, monthly lattice; formula h=10^-4, lattice shift.]


The shifted lattice performs well because it samples the price function at exactly the right points. At option expiration, on the up tree there's exactly one more node in the money and one less out of the money, and the rest get exactly the same value.

Recall that the call option price is

$$C(S_0) = e^{-rT} \sum_{j=j(S_0)}^{N} \left(S_0 (pu)^j (qd)^{N-j} - K\right).$$


Whereas we used a quadratic approximation, for simplicity, just consider a one sided derivative. Its value is

$$e^{-rT} \left[ \frac{u\,(pu)^{j(S_0)-1}(qd)^{N-j(S_0)+1} - K/S_0}{u - 1} + \sum_{j=j(S_0)}^{N} (pu)^j (qd)^{N-j} \right].$$

It's exactly the centered difference derivative for a small shift, plus a correction term that's a linear function of 1/S_0. It's the correction term varying as a function of S_0 while j(S_0) remains fixed that makes up the appropriate correction for the derivative calculation.

It must be noted that despite the above graphs, optimizing h_0 and the shifted lattice method are actually the same numerically. Picking h_0 gives poorer results in the above tests predominantly because we're not picking a different h_0 for each underlying. Fixing it for the entire computation is why it's not behaving nearly as well.

Setting h = S_0 u - S_0 would make the difference derivative quite close to the shifted lattice value. Using separate up and down shifts instead of doing a centered derivative would make them identical. But this is the same value at almost three times the computational effort.


    16. Differentiation under the integral sign


In Monte Carlo calculations, one computes an integral via random sampling of the payoff. Pricing errors in Monte Carlo based calculations are typically much larger than in other methods, making shifting methods particularly poor.

One approach (advocated by Vladimir Piterbarg) is to exploit the fact that the integral and derivative commute: integrate the derivative of the payoff function instead of differentiating the integral of the payoff. This approach may lead to staircasing, but even so, it's still better than the random noise observed in attempting a direct finite difference calculation. (A sketch follows below.)

Another approach (advocated by Fournie, Lasry, Lebuchoux, Lions and Touzi, as well as by Benhamou) applies Malliavin calculus in an effort to reduce the error in computing the expectation of the derivative. Here, instead of computing d/dS_0 E[X(S_0)], we find a random variable π (a Malliavin weight) such that d/dS_0 E[X(S_0)] = E[π X(S_0)], which again allows computing the derivative directly by Monte Carlo instead of taking the difference of two Monte Carlo price calculations.
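Here is a sketch (mine) of the first idea for Black-Scholes: the pathwise delta of a European call averages the derivative of the discounted payoff along each path rather than bumping the price:

```python
import numpy as np

def mc_delta_pathwise(S0, K=100.0, T=1.0, sigma=0.3, r=0.03,
                      n=100_000, seed=0):
    # Pathwise Monte Carlo delta:
    # d/dS0 E[e^{-rT} max(S_T - K, 0)] = E[e^{-rT} 1{S_T > K} S_T / S0].
    z = np.random.default_rng(seed).standard_normal(n)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    return np.exp(-r * T) * np.mean((ST > K) * ST / S0)

print(mc_delta_pathwise(100.0))   # close to the Black-Scholes delta of ~0.60
```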

    Differentiation under the integral can also be used when valuing options via FFT.


The derivative can be computed by taking the FFT of the derivative of the characteristic function.


    17. Analytic techniques


There's a large literature on working out various greeks analytically, which I haven't reviewed. Because of the pricing PDE, there are relations that can be exploited to avoid the need to compute all the greeks: some can be gotten from others. Symmetries, and in general behavior under specific transformations, can be exploited as well. Papers by Peter Carr as well as by Oliver Reiss and Uwe Wystup are good places to get started.


    18. Summary


  • Approximating the derivative by a difference magnifies the error of the original function.

  • Small step sizes give huge errors due to cancellation error. Large step sizes give huge errors due to convexity error. Balancing convexity error and cancellation error requires unexpectedly large step sizes - as large as 10^-5 when calculations are accurate to machine precision.

  • It's hard to judge accuracy without an accurate reference, but one can try to make do by graphing higher order derivatives with small stepsize.

  • Finite difference methods produce piecewise linear (or exponential) functions, which require extra care. Large step sizes are needed to produce reasonable results. We observed the need for step sizes of 17 for a 12 level binomial lattice, and 25-50bp for a 12 level trinomial lattice. Hedges in practice could be way off.

  • Fixing this by increasing lattice density is computationally infeasible, because level spacing is proportional to √Δt.


  • Beware of key rate durations. They're especially inaccurate.

  • Beware of one sided derivatives. They're more sensitive to piecewise linear functions and more sensitive to convexity: the worst of both worlds.

  • Other methods appear in the literature, but don't always help.

  • One simple method that does help is using the points in the lattice for the up and down values, extending the lattice back in time if necessary to get those points.


    19. References


What Every Computer Scientist Should Know About Floating-Point Arithmetic, David Goldberg, Computing Surveys, March 1991. http://docs.sun.com/source/806-3568/ncg_goldberg.html

Numerical Recipes in C/C++/Fortran, William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery.

The Binomial Model and the Greeks, Antoon Pelsser and Ton Vorst, Journal of Derivatives, Spring 1994.

The Complex-Step Derivative Approximation, Sensitivity Analysis Workshop, Livermore, August 2001. http://mdolab.utias.utoronto.ca/documents/livermore2001.pdf

The connection between the complex-step derivative approximation and algorithmic differentiation, J. R. R. A. Martins, P. Sturdza, J. J. Alonso, AIAA Paper 2001-0921, Jan. 2001.

Using Complex Variables to Estimate Derivatives of Real Functions, William Squire and George Trapp, SIAM Review, Vol. 40, No. 1, March 1998.

Risk Sensitivities of Bermuda Swaptions, Vladimir Piterbarg, Bank of America Working Paper, November 1, 2002.

Applications of Malliavin calculus to Monte Carlo methods in finance, Eric Fournie, Jean-Michel Lasry, Jerome Lebuchoux, Pierre-Louis Lions, Nizar Touzi, Finance and Stochastics, Vol. 3, No. 4, August 1999.

Applications of Malliavin calculus to Monte Carlo methods in finance, II, Eric Fournie, Jean-Michel Lasry, Jerome Lebuchoux, Pierre-Louis Lions, Nizar Touzi, Finance and Stochastics, Vol. 5, No. 2, April 2001.

Smart Monte Carlo: Various tricks using Malliavin calculus, E. Benhamou, Quantitative Finance, Volume 2, Number 5, 2002.

Optimal Malliavin Weighting Function for the Computation of the Greeks, E. Benhamou, Mathematical Finance, Volume 13, Issue 1, 2003.

Deriving Derivatives of Derivative Securities, Peter Carr, Journal of Computational Finance, Vol. 4, No. 2, Winter 2000.

Computing Option Price Sensitivities Using Homogeneity and Other Tricks, Oliver Reiss and Uwe Wystup, The Journal of Derivatives, Winter 2001.