Risky Measures of Risk: Error Analysis of Numerical Differentiation


Dr. Harvey J. Stein
Head, Quantitative Finance R&D
Bloomberg L.P.
22 June 2005

Revision: 1.13


    1. Overview

Issues in numerical differentiation:

  • Roundoff error
  • Convexity error
  • Cancellation error
  • Correlated errors

Methods to improve accuracy:

  • h control
  • Smoothing techniques
  • Computation specific approaches


    2. Derivatives as finite difference

To compute the derivative of a function f, one must compute

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$

Although one could try to take this limit numerically, this is a lot of work. More commonly, one chooses a small value of h = h_0 and tries to verify that the approximation

$$f'(x) \approx f_r(x) = \frac{f(x + h_0) - f(x)}{h_0}$$

is sufficiently close to the desired derivative.

But, once we're not taking the limit, we run into questions, such as:

  • What value of h_0?
  • Why f_r, and not f_l(x) = (f(x) - f(x - h_0))/h_0 or f_c(x) = (f(x + h_0) - f(x - h_0))/(2 h_0)? (See the sketch below.)
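To make the three approximations concrete, here is a minimal Python sketch (mine, not from the original slides; the function names are my own):

```python
import numpy as np

def d_right(f, x, h):
    # One sided (right) difference: (f(x + h) - f(x)) / h
    return (f(x + h) - f(x)) / h

def d_left(f, x, h):
    # One sided (left) difference: (f(x) - f(x - h)) / h
    return (f(x) - f(x - h)) / h

def d_central(f, x, h):
    # Centered difference: (f(x + h) - f(x - h)) / (2 h)
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Example: differentiate sin at 0; the exact answer is cos(0) = 1.
print(d_right(np.sin, 0.0, 1e-6), d_central(np.sin, 0.0, 1e-6))
```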


    3. Investigating behavior as a function of h0

Naively, one might think that since we want the limit as h goes to zero, small is better.

However, graphing the computed derivatives as functions of the step size indicates otherwise.


[Figure: Normal CDF with central difference derivatives for h = 10^-12 and h = 10^-13, plotted for x from -3 to 3.]

As can be seen in the above graph, h_0 = 10^-13 exhibits some noise (which, I'm afraid, is barely visible on this slide).


Using h_0 = 10^-14 gives visible noise:

[Figure: Normal CDF and central difference derivative with stepsize 10^-14, showing visible noise.]


As h_0 is decreased, the approximation gets worse and worse, with h_0 = 10^-17 giving complete nonsense - a derivative approximation that's mostly zero:

[Figure: Normal CDF and central difference derivatives with stepsizes 10^-15, 10^-16, and 10^-17.]


    4. Error analysis 101 - machine precision

Why does the derivative look so bad for h_0 = 10^-17? Because then h_0 is disappearing into the resolution of our computer arithmetic:

[Figure: Cumulative normal over the range 1 - 10^-15 to 1 + 10^-15, showing that at this magnification the CDF is a step function.]

Doubles by the IEEE standard are 64 bits long with a 53 bit mantissa. For any given exponent, one can only represent 2^52 different positive values (one bit is the sign bit). 2^52 ≈ 10^16, so we only get to use at most 16 decimal digits to represent a mantissa. In particular, our computers think that 1 + 10^-17 = 1. This discreteness affects the input value, the output value, and all intermediate computations.

Inspecting the above graph, we see that a stepsize of 10^-16 for x values around 1 will give either zero or a ridiculously high value for the derivative. 10^-15 will also give noisy derivatives, depending on where the actual x + h_0 and x - h_0 lie on the step function that constitutes the normal CDF at this level of magnification.

There's another effect that's commonly discussed: the fact that (x + h) - (x - h) ≠ 2h because of roundoff error. This means that the denominator of our approximation shouldn't really be h. There are two ways to fix this. One is to use a power of 2 for h instead of a power of 10. The other is to set h = (x + h) - x. The latter requires a little bit of effort to prevent an optimizing compiler from reducing it to h = h.

But, I've only observed minor impact from such adjustments. For clarity, I'll continue using powers of 10 instead of powers of 2 here.
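These effects are easy to see directly. A small Python check (my illustration, not from the slides; in an interpreted setting no compiler optimization gets in the way of the h = (x + h) - x trick):

```python
import numpy as np

# 1 + 10^-17 is indistinguishable from 1 in 64 bit IEEE arithmetic.
print(1.0 + 1e-17 == 1.0)        # True
print(np.finfo(float).eps)       # ~2.2e-16, the spacing of doubles near 1

# The shift actually applied to x is (x + h) - x, not h itself.
x, h = 1.0, 1e-10
h_eff = (x + h) - x              # differs from h in the low bits
print(h, h_eff, h == h_eff)
```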


    5. Error analysis 102A - The right h

But, is this the whole story? Since we're differentiating the normal CDF, we can compare it to the analytic solution. Comparing our original calculation with h_0 = 10^-12, we see that the derivative computed wasn't nearly as accurate as we had thought. In fact, h_0 = 10^-10 gives much smaller errors:

[Figure: normal - delta f errors for h = 10^-12, 10^-11, and 10^-10, on a scale of about ±10^-4.]


Let's see how large we need to make h_0.

[Figure: normal - delta f errors for h = 10^-10, 10^-9, and 10^-8, on a scale of about ±10^-6.]

Clearly, 10^-8 works much better than 10^-10.


Continuing along, we see that 10^-6 works much better than 10^-8:

[Figure: normal - delta f errors for h = 10^-8, 10^-7, and 10^-6, on a scale of about ±10^-8.]


Finally, we see that 10^-5 gives the best results, with 10^-4 being smooth, but giving a high bias.

[Figure: normal - delta f errors for h = 10^-6, 10^-5, and 10^-4, on a scale of a few times 10^-10.]


To prove I'm not cheating, let's do the same with f(x) = x^3:

[Figure: Error in the difference derivative of x^3 with h = 10^-14, plotted for x from -6 to 6.]


The best h_0 for f(x) = x^3 again ends up being around 10^-5. Coincidence?

[Figure: Errors in the difference derivative of x^3 with h = 10^-6, 10^-5, and 10^-4, on a scale of about 10^-8.]


Let's look more closely at the error graph using h_0 = 10^-5:

[Figure: normal - delta f error for h = 10^-5, on a scale of about ±1.5x10^-11.]

It appears to consist of noise plus some periodic error. The noise is from cancellation error, and the periodic component is from convexity error.
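The whole sweep is easy to reproduce. A sketch in Python (mine, not from the slides), using scipy's normal CDF and its density as the reference:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 3, 601)
for h in [1e-12, 1e-10, 1e-8, 1e-6, 1e-5, 1e-4]:
    central = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
    err = np.max(np.abs(central - norm.pdf(x)))
    print(f"h = {h:.0e}   max error = {err:.2e}")
# The maximum error bottoms out near h = 10^-5, as in the graphs above.
```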


    6. Convexity error

$$f(x + h) = f(x) + f'(x)\,h + \frac{f''(x)}{2!}\,h^2 + \frac{f'''(x)}{3!}\,h^3 + \cdots$$

$$f(x + h) - f(x - h) = 2 f'(x)\,h + 2 f'''(x)\,\frac{h^3}{3!} + \cdots$$

$$\frac{f(x + h) - f(x - h)}{2h} = f'(x) + \frac{f'''(x)}{3!}\,h^2 + \cdots$$

When h is large enough, the (f'''/3!) h^2 term contributes to the error. This error tends to zero as h tends to zero.

This is one of the reasons why a centered derivative is favored over a one sided derivative: the f'' h error term drops out, so the convexity error is smaller. Note also that the one sided derivative is the same as the two sided derivative computed at half the stepsize and shifted by half the stepsize, so in some sense the error in the one sided derivative is that it's estimating the derivative at the wrong x value.


    7. Cancellation error

Consider f(x + h) - f(x - h). Suppose our computation of each value has 3 significant digits of accuracy. How many digits of accuracy are in the difference?

      f(x + h)            = .335 + noise
    - (f(x - h)           = .231 + noise)
      f(x + h) - f(x - h) = .104 + noise

However,

      f(x + h)            = .335 + noise
    - (f(x - h)           = .331 + noise)
      f(x + h) - f(x - h) = .004 + noise


In the first case, the difference yielded 3 significant digits. In the second case, the high order digits cancelled, leaving only 1 significant digit in the difference. This is cancellation error.

Clearly, cancellation error increases as h decreases. For h sufficiently small, f(x + h) = f(x - h) and the relative error becomes infinite.

In general, we can use relative errors to encode the number of significant digits in our computations. Let

$$\tilde{f}(x) = f(x) + \epsilon(x) f(x)$$

where \tilde{f}(x) is what we actually get when computing f(x). Here \epsilon(x) is a random quantity that quantifies the relative error in calculating f(x). If we're accurate to machine precision, then |\epsilon| ≈ 2^-53. If our calculation only yields 5 decimal digits of accuracy, then |\epsilon| ≈ 10^-5/2.


$$\tilde{f}(x + h) - \tilde{f}(x - h) = f(x + h) - f(x - h) + \epsilon(x + h) f(x + h) - \epsilon(x - h) f(x - h) \approx f(x + h) - f(x - h) + \epsilon(x) f(x),$$

assuming h is small enough that f(x + h) and f(x - h) are about the same magnitude, that \epsilon(x + h) and \epsilon(x - h) are independent noise, and ignoring the fact that summing them might cause the loss of one additional significant bit. This analysis can be done more carefully, but this will get us into the right ballpark.

This quantifies the problem of cancellation error. The absolute error in calculating f is roughly \epsilon f. For small h, the difference f(x + h) - f(x - h) is small, leaving the \epsilon f error to dominate the calculation.

The best value of h will balance the cancellation error and the convexity error:

$$\frac{\tilde{f}(x + h) - \tilde{f}(x - h)}{2h} \approx f'(x) + \frac{f'''(x)}{3!}\,h^2 + \frac{\epsilon(x) f(x)}{2h}.$$


To minimize the error we must minimize (f'''(x)/3!) h^2 + \epsilon(x) f(x)/(2h):

$$\frac{d}{dh}\left(\frac{f'''(x)}{3!}\,h^2 + \frac{\epsilon(x) f(x)}{2h}\right) = \frac{f''' h}{3} - \frac{\epsilon f}{2 h^2} = \frac{2 f''' h^3 - 3 \epsilon f}{6 h^2} = 0$$

implies 2 f''' h^3 - 3 \epsilon f = 0, or h = (3 \epsilon f / (2 f'''))^{1/3}.

If our calculations are exact except for roundoff error, and f and f''' are around the same order of magnitude, then \epsilon ≈ 2^-53 (about 16 decimal digits of accuracy), which gives an optimal h of about 7x10^-6. This ties out with our above empirical work, and indicates that both functions are being computed with around full accuracy.


For x^2, we know the 3rd derivative is zero. This means that the only error is from cancellation, which means that larger h should automatically be better. Graphs confirm this theory:

[Figure: Errors in the difference derivative of x^2 with h = 10^-2, 10^-1, and 10^0, on a scale of about ±2.5x10^-13.]

Note that in finance our error is often on the order of a penny on a 100 value, a relative error of about 10^-5, which requires h_0 ≈ 0.03 - a rather large value!
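The balance point is a one-line computation. A sketch (mine), dropping order-one factors and taking f and f''' to be comparable so the f/f''' ratio drops out:

```python
# Order-of-magnitude optimal step: h ~ (3 * eps) ** (1/3),
# where eps is the relative error of the function values.
eps_machine = 2.0 ** -53
print((3 * eps_machine) ** (1 / 3))   # ~7e-6, matching the sweep above

eps_pricing = 1e-5                    # penny-level relative pricing noise
print((3 * eps_pricing) ** (1 / 3))   # ~0.031, i.e. h0 of about 0.03
```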


    8. Error analysis 102B - Flying without a reference

More commonly we're faced with calculating derivatives without being able to verify against a known analytic formula. If we don't know the derivative, and we don't know how much error we have in our function, how does one pick h_0?

One way is to inspect higher order derivatives. If our first derivative is jumping around, then the second derivative with the same step size will be visibly noisy.

We'll use

$$f''(x) \approx \frac{f(x + h_0) + f(x - h_0) - 2 f(x)}{h_0^2}$$

and

$$f'''(x) \approx \frac{f(x + 2h_0) - 2 f(x + h_0) + 2 f(x - h_0) - f(x - 2h_0)}{2 h_0^3},$$

so that the second and third derivative approximations sample f with the same spacing as the first derivative calculation.
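A sketch (mine) of the two stencils, applied to the normal CDF for the noise check that follows:

```python
import numpy as np
from scipy.stats import norm

def d2(f, x, h):
    # Second difference: (f(x+h) + f(x-h) - 2 f(x)) / h^2
    return (f(x + h) + f(x - h) - 2 * f(x)) / h**2

def d3(f, x, h):
    # Third difference: (f(x+2h) - 2 f(x+h) + 2 f(x-h) - f(x-2h)) / (2 h^3)
    return (f(x + 2*h) - 2*f(x + h) + 2*f(x - h) - f(x - 2*h)) / (2 * h**3)

x = np.linspace(-3, 3, 61)
# For the standard normal, f''(x) = -x * pdf(x); compare noise levels:
for h in [1e-7, 1e-6, 1e-4]:
    err = np.max(np.abs(d2(norm.cdf, x, h) - (-x) * norm.pdf(x)))
    print(f"h = {h:.0e}   max f'' error = {err:.2e}")
print(np.max(np.abs(d3(norm.cdf, x, 1e-5))))  # noisy at h = 1e-5
```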



Graphing f'' and f''' as functions of step size shows that the second derivative is visibly poor for h_0 = 10^-7 and that the third derivative is visibly poor for h_0 = 10^-5:

[Figure: Normal CDF; derivative with h = 10^-5; 2nd derivative with h = 10^-7 and 10^-6; 3rd derivative with h = 10^-5 and 10^-4.]

This at least indicates that something between 10^-6 and 10^-4 is called for when computing f'. Can we make this more precise? Maybe, but we haven't tried.

One might consider trying to integrate the derivative to see how close it comes back to the original function. Unfortunately this doesn't help, because the sum is effectively a telescoping series. If Δx = 2h and X = {x_1, x_1 + Δx, x_1 + 2Δx, ..., x_2}, then

$$\sum_{x \in X} \frac{f(x + h) - f(x - h)}{2h}\,\Delta x = f(x_2 + h) - f(x_1 - h).$$

In other words, the errors in the derivative from one point to the next cancel each other. This is clear because if a given point is too high, then the derivative to the left will be too large while on the right it will be too small.



    9. Error analysis 201 - Correlated errors

This is where most error analysis ends up, but is far from the whole story. One of the key assumptions in the above analysis is that the error is random and uncorrelated from one x value to another. This is rarely the case.

Consider finite difference and lattice (aka tree) approaches to option valuation. In these, the pricing function is a weighted average of the payoff sampled at various points. The weights change slightly as a function of the underlying, but the actual payoffs used change substantially as the strike passes a sample point. This makes the pricing function calculation roughly a piecewise linear approximation of the actual function. In the case of a European option on a stock under Black-Scholes and using a binomial lattice, it's exactly a piecewise linear approximation. In the case of an option on a bond, it's closer to piecewise exponential.

To prove this, let S_ij be the stock value at node j at time t_i. With starting value S_0, volatility σ, maturity time T, N steps, Δt = T/N, and risk free rate r, a typical binomial lattice uses future stock values of

$$S_{ij} = S_0 u^j d^{i - j},$$



where u = e^{σ√Δt} and d = 1/u.



The value of an option of strike K is then

$$C = e^{-rT} \sum_{j=0}^{N} \max\left(S_0 (pu)^j (qd)^{N-j} - K,\, 0\right) = e^{-rT} \sum_{j=j(S_0)}^{N} \left(S_0 (pu)^j (qd)^{N-j} - K\right),$$

where p = (e^{rΔt} - d)/(u - d), q = 1 - p, and j(S_0) is the minimum j such that S_Nj > K. Then

$$\frac{dC}{dS_0} = e^{-rT} \sum_{j=j(S_0)}^{N} (pu)^j (qd)^{N-j}.$$

The derivative is a step function, only changing value when j(S_0) changes.
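To see the staircase concretely, here is a minimal CRR binomial pricer in Python (my sketch; unlike the shorthand above, it carries the binomial coefficients explicitly), with a small-bump central difference delta:

```python
import numpy as np
from scipy.special import comb

def crr_call(S0, K=100.0, T=1.0, sigma=0.3, r=0.03, N=12):
    # European call on a Cox-Ross-Rubinstein binomial lattice.
    dt = T / N
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp(r * dt) - d) / (u - d)
    j = np.arange(N + 1)
    weights = comb(N, j) * p**j * (1 - p)**(N - j)
    payoff = np.maximum(S0 * u**j * d**(N - j) - K, 0.0)
    return np.exp(-r * T) * np.dot(weights, payoff)

# Small-bump central difference delta: constant between lattice nodes,
# jumping when the strike crosses a terminal node.
h = 0.01
for S0 in [90.0, 95.0, 100.0, 105.0]:
    print(S0, (crr_call(S0 + h) - crr_call(S0 - h)) / (2 * h))
```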



This is rarely noticed when graphing the function, just as the errors in the derivative calculations weren't noticed in our initial graphs:

[Figure: Option value vs. initial stock value; 1 yr option, 30% vol, 3% risk free rate, 12 step lattice.]



    But it becomes clearly evident when we inspect the difference derivative:

[Figure: dC/dS vs. initial stock value; 1 yr option, 30% vol, 3% risk free rate, h = .01; 12 steps vs. 120 steps.]

A 12 step lattice gives us large piecewise linear sections. A 120 step lattice, while increasing computation by a factor of 100, only decreases the sizes of the steps by about a factor of 3.



Why only a factor of three? Because u = e^{σ√Δt}. Decreasing the time step by a factor of 10 only decreases log u = σ√Δt by a factor of √10 ≈ 3, which only gives about 3 times the level density.

When the approximation is piecewise linear, and the stepsize is much smaller than the support of the linear segments, the first derivative is poor. In computing the second derivative, the sample endpoints almost always land on the same segment, making the estimate of the second derivative zero almost everywhere.



[Figure: ddC, 1 yr opt, 30% vol, 3% risk free rate, h = .01; 12 step vs. 120 step lattice.]


    10. Smoothing

When the calculation is a black box, we can't get inside to use the internals in the calculation. In this case, how can one compute a good derivative?

One trick is to use a large h. We suffer convexity error, but it's being swamped by the error from the piecewise linearity of the function. Picking h around 1 to 2x the support of the linear segments will do it.


    11. H adjustment

Here we can see that with a 12 step lattice, we need to compute the derivative with h_0 ≈ 17.

[Figure: ddC, 1 yr opt, 30% vol, 3% risk free rate, 12 step lattice; h = .01, 1, 10, 17.]


The second derivative is also helped by using a larger stepsize, but still isn't especially good:

[Figure: 2nd deriv of 1 yr opt, 30% vol, 3% risk free rate, 12 step lattice; h = 1, 10, 17.]


More commonly, people would use 120 levels for a 1 year stock option, but even this requires a large value of h_0:

[Figure: dC/dS of 1 yr opt, 30% vol, 3% risk free rate, 120 step lattice; h = .01, 1, 5.]


    Second derivative:

[Figure: 2nd deriv of 1 yr opt, 30% vol, 3% risk free rate, 120 step lattice; h = 5.]


For fun, let's take a look at what happens with 1200 levels, which is over 3 levels/day:

[Figure: dC/dS, 30% vol, 3% risk free rate, 1200 step lattice; h = .01, 2.]

As you can see, we still have fairly large piecewise linear sections. We need to make h_0 around 2 to get reasonable derivative estimates.


    Second derivative:

[Figure: 2nd deriv, 30% vol, 3% risk free rate, 1200 step lattice; h = 2.]


Why did an h_0 of 17 for 12 levels, 5 for 120 levels and 2 for 1200 levels work reasonably well?

As mentioned before, the stepsize needed is roughly the lattice spacing. This is approximately 2 S_0 σ√Δt, which is 17 for 12 steps/year, 5.5 for 120 steps/year, and 1.7 for 1200 steps/year.

Even for a dense lattice of 1200 levels, a much larger stepsize is required than is commonly recognized.
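The rule of thumb is a quick computation; a sketch (mine) reproducing the three figures for S_0 = 100 and 30% vol:

```python
import math

S0, sigma, T = 100.0, 0.3, 1.0
for N in [12, 120, 1200]:
    dt = T / N
    h0 = 2 * S0 * sigma * math.sqrt(dt)   # roughly the lattice spacing
    print(N, round(h0, 1))                # 17.3, 5.5, 1.7
```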

In fact, it's common to use monthly steps in a binomial lattice for long dated bonds, and a bump size of 10bp for modified duration and key rate duration with a one sided derivative.

Let's take a look at the behavior of this. We'll use a trinomial lattice, which gives better results than a binomial lattice.


First, consider the error in computing the change in a callable bond as a function of the step size, using a centered derivative.

[Figure: Centered derivatives of the callable bond vs. curve shift; dC/dS with 10bp, 25bp, and 50bp bumps.]

    A 25bp shift is bumpy, but looks fairly close to what it should be.


Compare this to a one sided derivative, which is what's commonly used:

[Figure: One sided derivatives vs. curve shift; dC/dS 25bp centered, and 10bp, 25bp, 50bp one sided.]

The stepping in the one sided derivative for a given h_0 is the same as that for the centered derivative at h_0/2, but the convexity error is much worse.


Next, consider the key rate sensitivities. The bond isn't sensitive to the 3mo rate, so the 1st key rate sensitivity should be zero.

[Figure: Key rate duration k1 call sensitivity vs. level; 10bp, 25bp, 50bp, one sided and centered; values cluster around -8x10^-5.]

It ends up being close to zero, but noisy, pretty similar for all step sizes, and seemingly unaffected by whether we use a centered derivative or a one sided derivative.


The second key rate sensitivities suffer from the piecewise nature of the calculation, both the centered ones as well as the one sided ones.

[Figure: Key rate duration k2 call sensitivity vs. level; 10bp, 25bp, 50bp, one sided and centered.]

    The other key rates look similar.


Comparing the sum of the one sided key rates to the 25bp centered derivatives, we see that the sum suffers both from the piecewise nature as well as the convexity, and suffers worse than the one sided full sensitivity.

[Figure: Sum of key rates, one sided, vs. curve shift; dC/dS 25bp, sum kr 10bp, 25bp, 50bp.]


The centered key rates are better, with the 25bp step size landing fairly close to the 25bp derivative.

[Figure: Sum of key rates, centered, vs. curve shift; dC/dS 25bp, sum kr 10bp, sum kr 25bp.]


Comparing the 50bp centered key rates to the 50bp difference derivative, we see that the two are close, but significantly different. This is because the key rates interact with the piecewise nature differently than the full curve shift.

[Figure: Sum of key rates, centered, vs. curve shift; dC/dS 50bp, sum kr 50bp.]


    12. Filtering

A more sophisticated approach is to smooth our pricing function. Essentially, we'd like to filter out the high frequencies that come from the corners where the slope changes, leaving only the lower frequency data arising from the changing function values.

This amounts to computing the Fourier transform of the price function, multiplying by a function that decays to zero (to dampen out the high frequency noise), and transforming back, or

$$\mathrm{Smooth}\, f = \mathcal{F}^{-1}(\mathcal{F}(f) \cdot D)$$

where D is our damping function (or smoothing kernel), and \mathcal{F} is the Fourier transform.


Since \mathcal{F}(f * g) = \mathcal{F}(f)\,\mathcal{F}(g) (where * is the convolution operator),

$$\mathrm{Smooth}\, f = \mathcal{F}^{-1}(\mathcal{F}(f) \cdot D) = \mathcal{F}^{-1}\left(\mathcal{F}(f) \cdot \mathcal{F}(\mathcal{F}^{-1}(D))\right) = \mathcal{F}^{-1}\left(\mathcal{F}(f * \mathcal{F}^{-1}(D))\right) = f * \mathcal{F}^{-1}(D).$$

So, smoothing a function is the same as computing its convolution with the inverse transform of the smoothing kernel. Since (f * g)' = f' * g = f * g', smoothing the derivative can be done by convolving with the derivative of the inverse transform of the smoothing kernel. Finally, since the Fourier transform of a Gaussian PDF is a Gaussian (up to scaling), we can smooth by integrating against a Gaussian and its derivatives.

All that's left is to integrate a function times a Gaussian, which is best done by Gaussian quadrature.

The derivative of the Gaussian-smoothed function is

$$\frac{d}{dx_0}\left[\frac{1}{\sqrt{2\pi}\,\sigma} \int f(x_0 - x)\, e^{-\frac{x^2}{2\sigma^2}}\, dx\right] = -\frac{1}{\sqrt{2\pi}\,\sigma^3} \int f(x_0 - x)\, x\, e^{-\frac{x^2}{2\sigma^2}}\, dx = -\frac{\sqrt{2}}{\sqrt{\pi}\,\sigma} \int f(x_0 - \sqrt{2}\,\sigma x)\, x\, e^{-x^2}\, dx \approx -\frac{\sqrt{2}}{\sqrt{\pi}\,\sigma} \sum_i w_i\, f(x_0 - \sqrt{2}\,\sigma x_i)\, x_i$$

where x_i are the Gaussian quadrature points and w_i are the associated weights.

The theory sounds beautiful, and looks like exactly what we need, but it doesn't live up to its promise in practice. Although I've used this method in the past, and it has applications in signal processing, I've been unable to make it perform better than a two point difference derivative. It seems that it works better in the random noise case than on piecewise linear functions.
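A sketch of the quadrature version (mine; numpy's hermgauss supplies the Gauss-Hermite points for the weight e^{-x^2}):

```python
import numpy as np
from scipy.stats import norm

def smoothed_derivative(f, x0, sig, npts=5):
    # Derivative of f convolved with a Gaussian of width sig,
    # via Gauss-Hermite quadrature.
    xi, wi = np.polynomial.hermite.hermgauss(npts)
    vals = f(x0 - np.sqrt(2.0) * sig * xi)
    return -np.sqrt(2.0 / np.pi) / sig * np.sum(wi * vals * xi)

# Sanity check on the normal CDF: the result is close to the density
# (smoothing a Gaussian CDF just fattens it to variance 1 + sig^2).
print(smoothed_derivative(norm.cdf, 0.5, sig=0.1), norm.pdf(0.5))
```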


    Using 5 points on a 120 level Black-Scholes lattice yields:

[Figure: FFT vs centered difference - 1st derivative; formula h=10^-5, centered h=5, Gauss 3pt h=5.]

The 5 point FFT method yields similar results to the 2 point difference derivative. Both look good by inspection.


But of course, the best way to check is to compare to a good reference. In this case, we'll compare the error relative to a difference derivative computed on the formula using a step size of 10^-5.

[Figure: FFT vs centered difference - 1st derivative errors; centered err, FFT err sig=3, FFT err sig=2.]

    The FFT method is hard pressed to do better than a well chosen step size.


It's hard to make either method produce both a smooth and accurate second derivative:

[Figure: FFT vs centered difference - 2nd derivative; formula h=10^-4, centered h=15, FFT h=6.]

Both look equally poor, with the FFT method requiring twice the computational effort.


    Error graphs confirm what we saw in the previous graph:

[Figure: FFT vs centered difference - 2nd derivative errors; centered h=15, FFT err sig=6.]


    13. Complex arithmetic


Another approach makes use of complex analysis. If f is a real valued function of one real variable, and can be extended to a complex analytic function, then

$$f(x + ih) = f(x) + f'(x)\,ih - \frac{f''(x)\,h^2}{2} - \frac{f^{(3)}(x)\,i h^3}{3!} + \frac{f^{(4)}(x)\,h^4}{4!} + \cdots$$

so

$$\frac{\Im(f(x + ih))}{h} = f'(x) - \frac{f^{(3)}(x)\,h^2}{3!} + \cdots$$

This has the same convexity error as the centered derivative, but doesn't directly suffer from cancellation error, allowing one to reduce h to lower convexity error without increasing cancellation error.
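In Python, complex arithmetic makes this a one-liner (a sketch, mine; it applies only to functions that remain analytic when fed complex arguments):

```python
import numpy as np

def complex_step(f, x, h=1e-20):
    # Complex-step derivative: Im(f(x + i h)) / h.
    # No subtraction of nearby values, hence no cancellation error,
    # so h can be made absurdly small.
    return np.imag(f(x + 1j * h)) / h

print(complex_step(np.sin, 1.0))   # matches cos(1) to machine precision
print(np.cos(1.0))
```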

While this approach can be useful in analytic methods, difficulties are encountered when trying to apply it in finance. It doesn't correct for correlated errors: when the function is piecewise linear, it just does a very good job of returning the slope of the linear sections, yielding a step function for the derivative. It's also not as straightforward as it looks. One can't just change all references of double to complex, because numerical code in finance makes heavy use of inequalities, as in

$$\max(S - K,\, 0),$$

which are meaningless on the complex plane. They need to be replaced by something else. One source recommends comparing the real parts, but this prevents the function from being analytic, thus breaking the above Taylor series analysis. Finally, our analytic formulas in finance typically involve cumulative normal distributions. While there is a unique continuation to the complex plane, computing it is more involved than just calculating erf(x/sqrt(2))/2 + 1/2. One would need to develop fast and accurate numerical methods for the calculation of a complex cumulative normal before this method is useful in such a context.

This method is commonly compared to a one sided derivative because both require one additional function evaluation. But evaluating a function at a complex point can triple the computational effort. One complex addition is over double the effort of a real addition, in that it requires two real additions and works with more memory. One complex multiplication requires four real multiplications plus two real additions, and thus is over four times as expensive as a real multiplication.

    13. Complex arithmetic 58

  • 7/25/2019 Risky Measures of Risk: Error Analysis of Numerical Differentiation

    58/80

A centered derivative is more comparable in computational effort, in which case both methods have the same convergence properties as h_0 tends to zero. The only difference is in the cancellation error.

It's easy to see why this doesn't help for Black-Scholes binomial lattices. Recalling that the lattice computation for the value of a call option is

$$C(S_0) = e^{-rT} \sum_{j=j(S_0)}^{N} \left(S_0 (pu)^j (qd)^{N-j} - K\right),$$

we see that

$$\frac{\Im(C(S_0 + ih))}{h} = e^{-rT} \sum_{j=j(S_0)}^{N} (pu)^j (qd)^{N-j}.$$

Up to roundoff error, the complex method gives the same results as the centered difference.

    13. Complex arithmetic 59

  • 7/25/2019 Risky Measures of Risk: Error Analysis of Numerical Differentiation

    59/80

Another complex technique is to exploit the Cauchy integral formula, which states that

$$f^{(n)}(z_0) = \frac{n!}{2\pi i} \oint_\gamma \frac{f(z)}{(z - z_0)^{n+1}}\, dz$$

where γ is a counterclockwise loop enclosing z_0.

One can then compute the above integral numerically. Bruno Dupire and Arun Verma have looked into this method a little, deriving formulas for getting 4th order accuracy using 4 points for the first 4 derivatives.


    14. Algorithm specific approaches


There are additional approaches that can be taken if one can modify the internals of the numerical calculations.


    15. Using internal lattice spacing


In finite difference approaches, one can often read extra information from the lattice itself. In a simple Black-Scholes lattice, one can start the lattice two levels early. This gives the option value as the middle value after the second step. The values at the other two nodes can be used for the up and down values.

One reference for this method is a 1994 article by Pelsser and Vorst, where they call it a well known alternative to the difference derivative.

Pelsser and Vorst compute the derivative as Δf/Δx, which introduces convexity error by doing a difference derivative around the wrong point. Here we avoid this by using another numerical technique: fitting all three points (the up, the down and the center) to a quadratic and reading the derivatives from there.

This latter technique could also be used in general when three points are available, and should reduce convexity error, but I haven't tested it. (A sketch of the fit appears below.)

Shifting the lattice often gives the best derivatives that can be gotten from a lattice.
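The quadratic fit itself is a few lines. A sketch (mine; the node and option values below are made-up placeholders):

```python
import numpy as np

def quad_fit_derivs(S_nodes, V_nodes):
    # Fit a quadratic through the (down, center, up) lattice nodes and
    # read off the first and second derivatives at the center node.
    a, b, c = np.polyfit(S_nodes, V_nodes, 2)   # V ~ a S^2 + b S + c
    S0 = S_nodes[1]
    return 2 * a * S0 + b, 2 * a

# Hypothetical node stock values (down, center, up) and option values:
S = np.array([94.3, 100.0, 106.1])
V = np.array([8.1, 11.0, 14.5])
print(quad_fit_derivs(S, V))
```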


Unfortunately, the approach can't always be applied. In interest rate lattices, the values at the other nodes don't always correspond to a shift of the yield curve. In normal short rate models they do, but in log normal models they don't. In the latter case, to apply this approach, one would have to either adjust the derivative or settle for differentiating with respect to a different sort of curve move.


Nonetheless, where this method applies, it works quite well. We'll compare it to the best of fixed h_0 selection. Consider Black-Scholes again with a 1 year option, 30% vol, 3% risk free rate, computed using a 120 step binomial lattice. Again, the first derivatives are visually fine:

[Figure: Centered difference vs lattice shift - 1st derivative; formula h=10^-5, centered h=5, lattice shift.]


But the differences to a reference show that the shifted lattice approach is far smoother and more accurate:

[Figure: Centered difference vs lattice shift - 1st derivative errors; centered err, lattice shift.]


The results on the second derivative are more pronounced. The fixed h selection is visibly poor, while the shifted lattice still looks quite good:

[Figure: Centered difference vs lattice shift - 2nd derivative; formula h=10^-4, centered h=15, lattice shift.]


    Checking the differences to a reference shows how much better:

[Figure: Centered difference vs lattice shift - 2nd derivative errors; centered h=15, FFT err sig=6, lattice shift.]


Looking at the errors for the lattice shift by itself, we can see the errors in the second derivative calculation are around 10^-4, which is about a 1.5% error in the second derivative.

[Figure: Lattice shift 2nd derivative error, on a scale of about ±10^-4.]


Surprisingly, this method yields reasonable results even with a monthly lattice. Here's the first derivative:

[Figure: Centered difference vs lattice shift - 1st derivative, monthly lattice; formula h=10^-5, lattice shift.]


Here's the second derivative:

[Figure: Centered difference vs lattice shift - 2nd derivative, monthly lattice; formula h=10^-4, lattice shift.]


The shifted lattice performs well because it samples the price function at exactly the right points. At option expiration, on the up tree there's exactly one more node in the money and one less out of the money, and the rest get exactly the same value.

Recall that the call option price is

$$C(S_0) = e^{-rT} \sum_{j=j(S_0)}^{N} \left(S_0 (pu)^j (qd)^{N-j} - K\right).$$


Whereas we used a quadratic approximation, for simplicity, just consider a one sided derivative. Its value is

$$e^{-rT} \left[ \frac{u\,(pu)^{j(S_0)-1}(qd)^{N-j(S_0)+1} - K/S_0}{u - 1} + \sum_{j=j(S_0)}^{N} (pu)^j (qd)^{N-j} \right].$$

It's exactly the centered difference derivative for a small shift, plus a correction term that's a linear function of 1/S_0. It's the correction term varying as a function of S_0 while j(S_0) remains fixed that makes up the appropriate correction for the derivative calculation.

It must be noted that despite the above graphs, optimizing h_0 and the shifted lattice method are actually the same numerically. Picking h_0 gives poorer results in the above tests predominantly because we're not picking a different h_0 for each underlying. Fixing it for the entire computation is why it's not behaving nearly as well.

Setting h = S_0 u - S_0 would make the difference derivative quite close to the shifted lattice value. Using separate up and down shifts instead of doing a centered derivative would make them identical. But this is the same value at almost three times the computational effort.


    16. Differentiation under the integral sign


In Monte Carlo calculations, one computes an integral via random sampling of the payoff. Pricing errors in Monte Carlo based calculations are typically much larger than in other methods, making shifting methods particularly poor.

One approach (advocated by Vladimir Piterbarg) is to exploit the fact that the integral and derivative commute: integrate the derivative of the payoff function instead of differentiating the integral of the payoff. This approach may lead to staircasing, but even so, it's still better than the random noise observed in attempting a direct finite difference calculation. (A sketch follows below.)

Another approach (advocated by Fournie, Lasry, Lebuchoux, Lions and Touzi, as well as by Benhamou) applies Malliavin calculus in an effort to reduce the error in computing the expectation of the derivative. Here, instead of computing d/dS_0 E[X(S_0)], we find a random variable π (a Malliavin weight) such that d/dS_0 E[X(S_0)] = E[π X(S_0)], which again allows computing the derivative directly by Monte Carlo instead of taking the difference of two Monte Carlo price calculations.
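Here is a sketch (mine) of the first idea for Black-Scholes: the pathwise delta of a European call averages the derivative of the discounted payoff along each path rather than bumping the price:

```python
import numpy as np

def mc_delta_pathwise(S0, K=100.0, T=1.0, sigma=0.3, r=0.03,
                      n=100_000, seed=0):
    # Pathwise Monte Carlo delta:
    # d/dS0 E[e^{-rT} max(S_T - K, 0)] = E[e^{-rT} 1{S_T > K} S_T / S0].
    z = np.random.default_rng(seed).standard_normal(n)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    return np.exp(-r * T) * np.mean((ST > K) * ST / S0)

print(mc_delta_pathwise(100.0))   # close to the Black-Scholes delta of ~0.60
```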

    Differentiation under the integral can also be used when valuing options via FFT.


The derivative can be computed by taking the FFT of the derivative of the characteristic function.


    17. Analytic techniques


There's a large literature on working out various greeks analytically, which I haven't reviewed. Because of the pricing PDE, there are relations that can be exploited to avoid the need to compute all the greeks: some can be gotten from others. Symmetries, and in general behavior under specific transformations, can be exploited as well. Papers by Peter Carr as well as by Oliver Reiss and Uwe Wystup are good places to get started.


    18. Summary


  • Approximating the derivative by a difference magnifies the error of the original function.

  • Small step sizes give huge errors due to cancellation error. Large step sizes give huge errors due to convexity error. Balancing convexity error and cancellation error requires unexpectedly large step sizes - as large as 10^-5 when calculations are accurate to machine precision.

  • It's hard to judge accuracy without an accurate reference, but one can try to make do by graphing higher order derivatives with small stepsize.

  • Finite difference methods produce piecewise linear (or exponential) functions, which require extra care. Large step sizes are needed to produce reasonable results. We observed the need for step sizes of 17 for a 12 level binomial lattice, and 25-50bp for a 12 level trinomial lattice. Hedges in practice could be way off.

  • Fixing this by increasing lattice density is computationally infeasible, because level spacing is proportional to √Δt.


  • Beware of key rate durations. They're especially inaccurate.

  • Beware of one sided derivatives. They're more sensitive to piecewise linear functions and more sensitive to convexity: the worst of both worlds.

  • Other methods appear in the literature, but don't always help.

  • One simple method that does help is using the points in the lattice for the up and down values, extending the lattice back in time if necessary to get those points.


    19. References


What Every Computer Scientist Should Know About Floating-Point Arithmetic, David Goldberg, Computing Surveys, March 1991. http://docs.sun.com/source/806-3568/ncg_goldberg.html

Numerical Recipes in C/C++/Fortran, William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery.

The Binomial Model and the Greeks, Antoon Pelsser and Ton Vorst, Journal of Derivatives, Spring 1994.

The Complex-Step Derivative Approximation, Sensitivity Analysis Workshop, Livermore, August 2001. http://mdolab.utias.utoronto.ca/documents/livermore2001.pdf

The connection between the complex-step derivative approximation and algorithmic differentiation, J. R. R. A. Martins, P. Sturdza, J. J. Alonso, AIAA Paper 2001-0921, Jan. 2001.

Using Complex Variables to Estimate Derivatives of Real Functions, William Squire and George Trapp, SIAM Review, Vol. 40, No. 1, March 1998.

Risk Sensitivities of Bermuda Swaptions, Vladimir Piterbarg, Bank of America Working Paper, November 1, 2002.

Applications of Malliavin calculus to Monte Carlo methods in finance, Eric Fournie, Jean-Michel Lasry, Jerome Lebuchoux, Pierre-Louis Lions, Nizar Touzi, Finance and Stochastics, Vol. 3, No. 4, August 1999.

Applications of Malliavin calculus to Monte Carlo methods in finance, II, Eric Fournie, Jean-Michel Lasry, Jerome Lebuchoux, Pierre-Louis Lions, Nizar Touzi, Finance and Stochastics, Vol. 5, No. 2, April 2001.

Smart Monte Carlo: Various tricks using Malliavin calculus, E. Benhamou, Quantitative Finance, Volume 2, Number 5, 2002.

Optimal Malliavin Weighting Function for the Computation of the Greeks, E. Benhamou, Mathematical Finance, Volume 13, Issue 1, 2003.

Deriving Derivatives of Derivative Securities, Peter Carr, Journal of Computational Finance, Vol. 4, No. 2, Winter 2000.

Computing Option Price Sensitivities Using Homogeneity and Other Tricks, Oliver Reiss and Uwe Wystup, The Journal of Derivatives, Winter 2001.