
NUMERICAL METHODS

ANDREA BONITO AND JOSEPH E. PASCIAK

Contents

1. Lecture 1: Preliminaries
2. Lecture 2
   2.1. Review of Some Linear Algebra
   2.2. Polynomial Interpolation (see Chapt. 6)
3. Lecture 3
   3.1. Newton form in matlab
   3.2. Lagrange Form of the Interpolating Polynomial
4. Lecture 4
5. Lecture 5
   5.1. Chebyshev Polynomials
6. Lecture 6
   6.1. Piecewise Polynomial Interpolation
7. Lecture 7
   A general uniquely solvable interpolation problem
   The general Hermite interpolation problem
   Piecewise Hermite interpolation
8. Lecture 8
   Trigonometric Interpolation
   The DFT - Discrete Fourier Transform
9. Lecture 9
10. Lecture 10
    Extension of Trigonometric interpolation
    Numerical Differentiation
    Error estimates
11. Lecture 11
    11.1. Differentiation via Polynomial Interpolation
12. Lecture 12
    Numerical Integration
13. Lecture 13
    Composite Schemes
    Composite Quadrature
    13.1. Simpson's Composite Quadrature Rule
14. Lecture 14
    Gaussian Quadrature
    Generalization: Weighted Gaussian Quadrature
15. Lecture 15
    Rodrigues Formula for Legendre Polynomials
    Rodrigues Formula for Chebyshev Polynomials
16. Lecture 16
17. Lecture 17
    Nonlinear Equations
    Bisection method
18. Lecture 18
    Fixed Point Iteration or Picard Iteration
19. Lecture 19
    Newton's Method
20. Lecture 20
    Ordinary Differential Equations (ODEs)
21. Lecture 21
    Numerical Ordinary Differential Equations (ODEs)
    Finite Difference Approximations
22. Lecture 22
    Runge-Kutta Methods
23. Lecture 23
    23.1. Multistep Methods
24. Lecture 24
    Backwards Differences
    Appendix of Lecture 24
25. Lecture 25
    25.1. Errors in ODE schemes
26. Lecture 26
    Stiff ODE's (systems)
27. Lecture 27
28. Lecture 28

1. Lecture 1: Preliminaries

Definition 1.1 (Limits). Let c ∈ R and f : R → R. We say that the limit of f when x tends to c exists and is equal to L if

(1) f is well defined in a neighborhood of c (but not necessarily at c);
(2) given ε > 0, there exists δ > 0 such that |f(x) − L| < ε when x ∈ (c − δ, c + δ) \ {c}.

In this case, we write

lim_{x→c} f(x) = L.

Definition 1.2 (Continuity). A function f : R → R is continuous at c ∈ R if the limit of f at c exists and

lim_{x→c} f(x) = f(c).

Definition 1.3 (C(a, b)). We say that f ∈ C(a, b) if f : R → R is continuous for each x ∈ (a, b).

Definition 1.4 (C[a, b]). We say that f ∈ C[a, b] if f ∈ C(a, b) and

lim_{x→a+} f(x) = f(a) and lim_{x→b−} f(x) = f(b).


Definition 1.5 (Differentiability). f : R → R is said to be differentiable at c ∈ R if f is defined in a neighborhood of c and

lim_{x→c} (f(x) − f(c))/(x − c)

exists. In that case, we write

f′(c) = lim_{x→c} (f(x) − f(c))/(x − c).

Result 1.1 (Differentiability and Continuity). If f′ exists at c, then f is continuous at c.

Definition 1.6 (C1(a, b)). We say that f ∈ C1(a, b) if f′(x) exists for each x ∈ (a, b).

Definition 1.7 (C1[a, b]). We say that f ∈ C1[a, b] if f ∈ C1(a, b), f ∈ C[a, b] and f′ ∈ C[a, b].

Example 1.1 (Continuity and Differentiability). Two examples:

(1) f(x) = 1/x is in C(0, 1) but not in C[0, 1];
(2) f(x) = x^{1/2} is in C[0, 1] but not in C1[0, 1].

Theorem 1.1 (Intermediate Values). If f ∈ C[a, b], then f takes all the values between f(a) and f(b).

Theorem 1.2 (Mean Value). If f ∈ C1[a, b], then there exists c ∈ (a, b) such that

f′(c) = (f(b) − f(a))/(b − a).

Theorem 1.3 (Rolle's). If f ∈ C1[a, b] and f(a) = f(b), then there exists c ∈ (a, b) with f′(c) = 0.

Rolle's theorem is illustrated in Figure 1.

Figure 1. Illustration of Rolle's theorem applied to a curve y = f(x) on [a, b]. In this case, there are two possible c in the interval (a, b), labeled c1 and c2.

Definition 1.8 (Cm[a, b]). We say that f ∈ Cm[a, b] if f, f′, ..., f^{(m)} ∈ C[a, b].

Theorem 1.4 (Taylor). Assume f ∈ Cn[a, b] and f^{(n+1)} exists and is in C(a, b). Then for c ∈ (a, b) and x ∈ [a, b]

f(x) = Σ_{j=0}^{n} (f^{(j)}(c)/j!) (x − c)^j + E_{n+1}(x).


The error term E_{n+1}(x) is given by

(1) E_{n+1}(x) = (1/(n+1)!) f^{(n+1)}(ξ)(x − c)^{n+1}

for some ξ between x and c; or

(2) E_{n+1}(x) = (1/n!) ∫_c^x f^{(n+1)}(t)(x − t)^n dt.

Taylor's Theorem provides a numerical approximation

Σ_{j=0}^{n} (f^{(j)}(c)/j!) (x − c)^j

of the function f together with an error bound E_{n+1}(x).

Example 1.2 (Approximation using Taylor's Theorem). We use the 3 term Maclaurin series (Taylor series with c = 0) to approximate cosh(x) for x ∈ [−1, 1] and bound the error. To do this, we compute cosh′(x) = sinh(x), cosh′′(x) = cosh(x) and cosh′′′(x) = sinh(x). This leads to the approximation

cosh(x) ≈ cosh(0) + sinh(0)x + (1/2)cosh(0)x^2 = 1 + x^2/2.

To bound the error |cosh(x) − (1 + x^2/2)| = |E_3(x)| we resort to the expression of E_3 given by (1), which reads in this case

E_3(x) = (1/3!) sinh(ξ) x^3

for some ξ ∈ (−1, 1). Since |sinh(ξ)| ≤ sinh(|ξ|) and sinh(t) is increasing for positive t, we deduce that

|sinh(ξ)| ≤ sinh(1) = (e − e^{−1})/2.

As a consequence, we obtain the error bound

|E_3(x)| ≤ (e − e^{−1})/12.
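The bound can be checked numerically. The sketch below (in Python rather than the course's Matlab; the sampling grid is an arbitrary choice) samples the actual error on [−1, 1] and compares it to (e − e^{−1})/12:

```python
import math

# 3-term Maclaurin approximation of cosh and the error bound from Example 1.2
approx = lambda x: 1 + x**2 / 2
bound = (math.e - 1 / math.e) / 12      # = sinh(1)/6 ≈ 0.196

# sample the true error on [-1, 1]
xs = [-1 + 2 * k / 400 for k in range(401)]
err = max(abs(math.cosh(x) - approx(x)) for x in xs)

print(err, bound)  # the observed error (≈ 0.043, attained at x = ±1) stays below the bound
```

The bound overestimates the true error by roughly a factor of 4 here, because sinh(ξ)x^3 is only worst-case when both |ξ| and |x| are at their extremes.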

Example 1.3 (Effect of the Length of the Approximation Interval). Let us consider the quadratic (3 term) Maclaurin series approximating cos(x) on [−π, π]:

cos(x) ≈ 1 − (1/2)x^2.

The corresponding error term reads

E_3(x) = (1/6) sin(ξ) x^3

for some ξ ∈ (−π, π). As a consequence, we get

|E_3(x)| ≤ π^3/6 ≈ 5.17.

This indicates that polynomial approximations may not be very good over large intervals.
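The deterioration is visible in the actual error as well, not only in the bound. A short check (Python sketch, sampling grid arbitrary):

```python
import math

# quadratic Maclaurin approximation of cos on [-pi, pi] (Example 1.3)
approx = lambda x: 1 - x**2 / 2
bound = math.pi**3 / 6                  # ≈ 5.17

xs = [-math.pi + 2 * math.pi * k / 400 for k in range(401)]
err = max(abs(math.cos(x) - approx(x)) for x in xs)

print(err, bound)  # err ≈ 2.93 at x = ±pi: already a large error, though below the bound
```

The true error π^2/2 − 2 ≈ 2.93 at the endpoints is genuinely large: the quadratic approximation is useless near ±π, in contrast with the cosh example on [−1, 1].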

Exercise 1.1 (Effect of the Interval). Consider the same problem as in Example 1.3but on (i) [−1/2, 1/2] and (ii) [−1/10, 1/10].


2. Lecture 2

2.1. Review of Some Linear Algebra. A system of m linear equations with n unknowns x1, ..., xn is of the form

a11 x1 + a12 x2 + ... + a1n xn = b1
a21 x1 + a22 x2 + ... + a2n xn = b2
...
am1 x1 + am2 x2 + ... + amn xn = bm,

and can be rewritten in matrix-vector form

Ax = b,

where

x = (x1, ..., xn)^T, b = (b1, ..., bm)^T

and A is the m × n matrix

A = ( a11 a12 ... a1n
      a21 a22 ... a2n
      ...
      am1 am2 ... amn ).

We will often deal with square systems, i.e. m = n. For a square system:

Theorem 2.1 (Invertibility). Let A be an n × n matrix. The following are equivalent:

• det(A) ≠ 0;
• A is invertible, i.e. there exists an n × n matrix B satisfying BA = AB = I (I denotes the identity matrix);
• Range(A) = R^n;
• Ker(A) = {x ∈ R^n : Ax = 0} = {0};
• Rank(A) = n;
• For any b ∈ R^n, the system Ax = b has a unique solution x ∈ R^n.

2.2. Polynomial Interpolation (see Chapt. 6).

Definition 2.1 (Polynomial Space). We define Pk to be the collection of all polynomials of degree at most k, i.e.

Pk = span{1, x, ..., x^k}.

In particular dim(Pk) = k + 1.

Interpolation problem: Given a function f and interpolation points x1, ..., xm, find p ∈ Pk such that

p(xi) = f(xi), i = 1, ..., m.

Some remarks are in order.

Remark 2.1 (Distinct Interpolation Points). The xi's should be distinct, otherwise the equations are redundant.


Remark 2.2 (Number of Interpolation Points). Pk has dimension k + 1 so we need at least m = k + 1 snapshots.

Remark 2.3 (Approximation). The interpolating polynomial p(x) provides an approximation to f.

Theorem 2.2 (Interpolation). Let x0, ..., xn be distinct real numbers. Then for arbitrary snapshots y0, ..., yn there exists a unique p ∈ Pn satisfying

p(xi) = yi, i = 0, ..., n.

Proof. Let p(x) = a0 + a1x + ... + an x^n be such that p(xj) = yj for j = 0, ..., n. The latter conditions can be rewritten

a0 + a1 xj + a2 xj^2 + ... + an xj^n = yj, j = 0, ..., n.

This is a linear system of n + 1 equations and n + 1 unknowns a0, ..., an. This system corresponds to

Ba = y,

where

y = (y0, ..., yn)^T, a = (a0, ..., an)^T

and B is the (n + 1) × (n + 1) matrix whose row j is

(1, xj, xj^2, ..., xj^n), j = 0, ..., n.

According to Theorem 2.1, we shall show that B is invertible by checking that Ker(B) = {0}. Suppose that Bc = 0 for some c = (c0, c1, ..., cn)^T. Consider q(x) = c0 + c1x + ... + cn x^n. Then (Bc)j = 0 is equivalent to

c0 + c1 xj + c2 xj^2 + ... + cn xj^n = 0,

i.e.

q(xj) = 0.

The only polynomial of degree n with n + 1 distinct roots is the zero polynomial. Thus c0 = c1 = ... = cn = 0, i.e. c = 0 and Ker(B) = {0}.

Remark 2.4 (Function Interpolation). This implies that there exists a unique polynomial p ∈ Pn interpolating f at x0, ..., xn.

The following method constructs recursively the polynomial p ∈ Pn satisfying p(xi) = yi for i = 0, ..., n.

Newton Form: Given x0, ..., xn distinct and y0, ..., yn.


(1) If n = 0 then P0 = span{1} and p0 ∈ P0 satisfying p0(x0) = y0 is the constant polynomial p0(x) = y0.
(2) Suppose that pk ∈ Pk has been constructed satisfying pk(xi) = yi, i = 0, ..., k. We look for p_{k+1} ∈ P_{k+1} of the form

p_{k+1}(x) = p_k(x) + c_{k+1}(x − x0)(x − x1)·...·(x − xk),

where the added product belongs to P_{k+1} and c_{k+1} is to be determined. Note that

p_{k+1}(xi) = p_k(xi) = yi, i = 0, ..., k.

Therefore, we determine c_{k+1} by imposing the last condition

y_{k+1} = p_{k+1}(x_{k+1}) = p_k(x_{k+1}) + c_{k+1}(x_{k+1} − x0)·...·(x_{k+1} − xk),

i.e.

c_{k+1} = (y_{k+1} − p_k(x_{k+1})) / ((x_{k+1} − x0)·...·(x_{k+1} − xk)).

Example 2.1 (Newton form). Find p ∈ P3 interpolating

p(i) = 1/i, i = 1, 2, 3, 4.

The initialization of the algorithm consists in setting p0(x) = 1. We then look for p1(x) = 1 + c1(x − 1) using the second interpolation point 2:

1/2 = p1(2) = 1 + c1(2 − 1), c1 = −1/2 =⇒ p1(x) = 1 − (1/2)(x − 1).

Similarly for the third interpolation point: p2(x) = p1(x) + c2(x − 1)(x − 2) such that

1/3 = p2(3) = 1 − (1/2)(2) + c2(2)(1), c2 = 1/6 =⇒ p2(x) = 1 − (1/2)(x − 1) + (1/6)(x − 1)(x − 2).

Finally, using the last interpolation point: p3(x) = p2(x) + c3(x − 1)(x − 2)(x − 3),

1/4 = p3(4) = 1 − 3/2 + (1/6)(3 · 2) + c3(3 · 2 · 1), c3 = −1/24,

which leads to the desired polynomial

p3(x) = 1 − (1/2)(x − 1) + (1/6)(x − 1)(x − 2) − (1/24)(x − 1)(x − 2)(x − 3).

You can plot this polynomial in matlab (note that the last term carries the coefficient −1/24):

x = .5:0.05:4;
y = 1 - (x-1)/2 + times(x-1,x-2)/6 - times(x-1,times(x-2,x-3))/24;
plot(x,y,x,rdivide(1,x));


3. Lecture 3

3.1. Newton form in matlab. Recall that the Newton form algorithm proceeds recursively.

(1) p0(x) = y0;
(2) p_{k+1}(x) = p_k(x) + c_{k+1}(x − x0)·...·(x − xk), where

c_{k+1} = (y_{k+1} − p_k(x_{k+1})) / ((x_{k+1} − x0)·...·(x_{k+1} − xk)).

We will need three matlab functions.

function c=CGEN(n,x,y)

which generates c0, ..., cn from x0, ..., xn and y0, ..., yn,

function Px=EVAL(n,c,x,y0,t)

which computes pn(t) given c = (c0, ..., cn), x = (x0, ..., xn), y0 = y0 and the evaluation point t, and

function PLOTINTP(x0,xf,NP,c,x,n,y0,FN),

which plots p(x) and the interpolated function, where x0, xf are the plot limits, NP is the number of points used for plotting, c, x, n are as above and FN is the analytic function interpolated.

Notice that matlab array indices start at 1! Therefore, we rewrite the Newton form algorithm with index starting at 1.

Newton Form with Shifted Index: Given x1, ..., xn+1 distinct and y1, ..., yn+1, find pn+1 ∈ Pn satisfying

p_{n+1}(xj) = yj, j = 1, ..., n + 1.

(1) p1(x) = y1;
(2) p_{k+1}(x) = p_k(x) + c_k(x − x1)·...·(x − xk), where

c_k = (y_{k+1} − p_k(x_{k+1})) / ((x_{k+1} − x1)·...·(x_{k+1} − xk)).

We now provide the matlab code for the CGEN routine

function c=CGEN(n,x,y)
for j=1:n % compute c(j)
    % compute the denominator of c(j)
    PROD=1.0;
    for I=1:j
        PROD = PROD*(x(j+1)-x(I));
    end
    % compute pj(x(j+1))
    if (j==1)
        pj=y(1);
    else
        pj=EVAL(j-1,c,x,y(1),x(j+1));
    end
    c(j) = (y(j+1)-pj)/PROD;
end
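For readers without Matlab, the same algorithm can be transcribed to Python (0-based indexing; the names cgen and eval_newton are ours, mirroring CGEN and EVAL):

```python
def eval_newton(c, x, y0, t):
    # evaluate p(t) = y0 + sum_k c[k] * (t - x[0]) * ... * (t - x[k])
    p, prod = y0, 1.0
    for k, ck in enumerate(c):
        prod *= t - x[k]
        p += ck * prod
    return p

def cgen(x, y):
    # Newton coefficients: c[j] = (y[j+1] - p_j(x[j+1])) / prod_{i<=j} (x[j+1] - x[i])
    c = []
    for j in range(len(x) - 1):
        prod = 1.0
        for i in range(j + 1):
            prod *= x[j + 1] - x[i]
        pj = eval_newton(c, x, y[0], x[j + 1])
        c.append((y[j + 1] - pj) / prod)
    return c

# Example 2.1: p(i) = 1/i at i = 1, 2, 3, 4
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1/2, 1/3, 1/4]
c = cgen(xs, ys)
print(c)  # -1/2, 1/6, -1/24, matching the hand computation
```

Running it on the data of Example 2.1 reproduces c1 = −1/2, c2 = 1/6, c3 = −1/24.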


Links to this code and the PLOTINTP code are given on ecampus. These are matlab m-file function codes and need to be in the subdirectory where you will run matlab. There is one file per function.

Problem 3.1. Write and debug EVAL.m, a matlab m-file code for the function EVAL above.

Problem 3.2. For n = 2, 3, 4, 5, 6 define x0 = 1, x1, x2, ..., xn = 4 with x0, ..., xn uniformly spaced on [1, 4]. Using the above routines, for each n, compute c for the polynomial in Pn interpolating e^x and plot on [0, 5] using NP = 200. Report the L-infinity error computed by PLOTINTP using the call

PLOTINTP(0,5,200,c,x,y0,n,inline('exp(x)'));

Hand in plots for n = 2 and n = 6.

Problem 3.3. Repeat the above problem using the interpolation interval [0, π] but instead interpolating sin(x). Plot on [0, π] with 200 points.

3.2. Lagrange Form of the Interpolating Polynomial. We consider again the problem of finding p ∈ Pn satisfying

p(xi) = yi, i = 0, ..., n.

We saw that there is a unique lj ∈ Pn satisfying (fix j ∈ {0, 1, ..., n})

lj(xi) = δij,

where

δij := 1 if i = j, 0 otherwise,

is the Kronecker delta. In fact,

l_j(x) = ∏_{i=0, i≠j}^{n} (x − x_i)/(x_j − x_i).

These Lagrange polynomials allow us to solve the interpolation problem easily. Indeed, the interpolant is given by

p(x) = Σ_{i=0}^{n} y_i l_i(x)

(check it!).
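The formula is direct to implement. A small sketch (Python, with our own helper names) evaluating the basis functions and the resulting interpolant:

```python
def lagrange_basis(nodes, j, x):
    # l_j(x) = prod_{i != j} (x - x_i) / (x_j - x_i)
    out = 1.0
    for i, xi in enumerate(nodes):
        if i != j:
            out *= (x - xi) / (nodes[j] - xi)
    return out

def lagrange_interp(nodes, ys, x):
    # p(x) = sum_i y_i l_i(x)
    return sum(yi * lagrange_basis(nodes, i, x) for i, yi in enumerate(ys))

# data of Examples 3.1/3.2 below: nodes 1, 3/2, 2 with values 1, 2, -1
nodes, ys = [1.0, 1.5, 2.0], [1.0, 2.0, -1.0]
vals = [lagrange_interp(nodes, ys, t) for t in nodes]
print(vals)  # reproduces the data at the nodes
```

Evaluating at the nodes returns the prescribed values exactly, since l_j(x_i) = δ_ij holds by construction.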

Example 3.1 (Lagrange Polynomials). Let x0 = 1, x1 = 3/2 and x2 = 2. Compute li(x) for i = 0, 1, 2. They are given by

l0(x) = (x − 3/2)(x − 2) / ((1 − 3/2)(1 − 2)) = 2(x − 3/2)(x − 2),

l1(x) = (x − 1)(x − 2) / ((3/2 − 1)(3/2 − 2)) = −4(x − 1)(x − 2),

l2(x) = (x − 1)(x − 3/2) / ((2 − 1)(2 − 3/2)) = 2(x − 1)(x − 3/2).


Example 3.2 (Lagrange Interpolation). Use l0, l1 and l2 from the previous example to find p ∈ P2 satisfying

p(1) = 1, p(3/2) = 2, p(2) = −1.

The desired polynomial is directly given by

p(x) = 1·l0(x) + 2·l1(x) − 1·l2(x) = 2(x − 3/2)(x − 2) − 8(x − 1)(x − 2) − 2(x − 1)(x − 3/2).


4. Lecture 4

The Lagrange form can be used to deduce properties of polynomial interpolation. We recall that C[a, b] denotes the set of continuous functions defined on [a, b].

The space C[a, b] is a linear (vector) space, with vector operations given by

(f + g)(x) = f(x) + g(x), x ∈ [a, b]

and

(αf)(x) = αf(x), x ∈ [a, b]

(for all f, g ∈ C[a, b] and α ∈ R). The set Pk is also a linear space (check it!). In addition, for {x0, ..., xk} ⊂ [a, b] define

L : C[a, b]→ Pk

by Lf to be the polynomial in Pk interpolating f at xi, i = 0, ..., k.

Lemma 4.1 (Property of L). The transformation L is a linear transformation, i.e.

L(αf + βg) = αL(f) + βL(g)

for all α, β ∈ R and f, g ∈ C[a, b].

Proof. Let {li}_{i=0}^{k} be the Lagrange polynomials associated with x0, ..., xk. Then

(Lf)(x) = Σ_{i=0}^{k} f(xi) li(x) and (Lg)(x) = Σ_{i=0}^{k} g(xi) li(x).

This implies

L(αf + βg)(x) = Σ_{i=0}^{k} (αf(xi) + βg(xi)) li(x) = α Σ_{i=0}^{k} f(xi) li(x) + β Σ_{i=0}^{k} g(xi) li(x) = α(Lf)(x) + β(Lg)(x).

Lemma 4.2. Let {li}_{i=0}^{n} be the Lagrange polynomials corresponding to the nodes x0, x1, ..., xn. Then every p ∈ Pn reads

p(x) = Σ_{i=0}^{n} p(xi) li(x).

Proof. The polynomial q ∈ Pn given by

q(x) = Σ_{i=0}^{n} p(xi) li(x)

interpolates p(x) at x0, ..., xn. Since p trivially interpolates itself, the uniqueness of the interpolant implies that q = p.

Remark 4.1 (Polynomial Integration). Using the above representation lemma, we directly deduce an integration formula working simultaneously for all polynomials of degree n:

∫_a^b p(x) dx = ∫_a^b Σ_{i=0}^{n} p(xi) li(x) dx = Σ_{i=0}^{n} p(xi) Ai,

where Ai := ∫_a^b li(x) dx.
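As an illustration (a Python sketch with our own helper names), one can compute the weights Ai by expanding each Lagrange basis polynomial into coefficients and integrating it exactly; on [0, 1] with nodes 0, 1/2, 1 this recovers the familiar Simpson weights 1/6, 2/3, 1/6:

```python
def poly_mul(p, q):
    # multiply two polynomials given as coefficient lists (low degree first)
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def lagrange_coeffs(nodes, j):
    # coefficients of l_j(x) = prod_{i != j} (x - x_i)/(x_j - x_i)
    c = [1.0]
    for i, xi in enumerate(nodes):
        if i != j:
            c = poly_mul(c, [-xi / (nodes[j] - xi), 1.0 / (nodes[j] - xi)])
    return c

def integrate_poly(c, a, b):
    # exact integral of a polynomial over [a, b]
    return sum(ck * (b**(k + 1) - a**(k + 1)) / (k + 1) for k, ck in enumerate(c))

nodes = [0.0, 0.5, 1.0]
A = [integrate_poly(lagrange_coeffs(nodes, j), 0.0, 1.0) for j in range(len(nodes))]
print(A)  # the Simpson-type weights 1/6, 2/3, 1/6
```

This is precisely the mechanism behind the quadrature rules developed in the later lectures on numerical integration.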


The next theorem is central to derive approximation properties of the interpolant.

Theorem 4.1 (Interpolation Estimates). Assume that f ∈ C^{n+1}[a, b] and p ∈ Pn interpolates f at distinct points {x0, ..., xn} ⊂ [a, b]. To each x ∈ [a, b], there is a ξx ∈ (a, b) such that

f(x) − p(x) = (1/(n+1)!) f^{(n+1)}(ξx) ∏_{i=0}^{n} (x − xi).

Proof. If x = xi for some i = 0, ..., n then the assertion is trivial. Therefore, we assume that x ≠ xi. Set

w(t) = (t − x0)(t − x1)·...·(t − xn)

and

(3) φ(t) = f(t) − p(t) − λ w(t),

with

λ = (f(x) − p(x))/w(x).

Note that φ ∈ C^{n+1}[a, b], φ(xi) = 0, i = 0, ..., n, and φ(x) = 0. Hence, φ has n + 2 distinct roots. Rolle's theorem implies that φ′ has at least n + 1 distinct roots. Repeating the argument: φ′′ has at least n roots, ..., φ^{(n+1)} has at least 1 root, denoted ξx ∈ (a, b). Differentiating (3) n + 1 times, we get

0 = φ^{(n+1)}(ξx) = f^{(n+1)}(ξx) − p^{(n+1)}(ξx) − λ w^{(n+1)}(ξx) = f^{(n+1)}(ξx) − λ(n+1)!,

since p^{(n+1)} = 0 (p has degree at most n) and w^{(n+1)} = (n+1)!. In view of the definition of λ, this simplifies to

f(x) − p(x) = (1/(n+1)!) f^{(n+1)}(ξx) ∏_{i=0}^{n} (x − xi),

which is the desired result.

Let us now see if we can optimize: choose (x0, ..., xn) distinct such that

max_{x∈[a,b]} |(x − x0)·...·(x − xn)| ≤ max_{x∈[a,b]} |(x − y0)·...·(x − yn)|

for any y0, ..., yn distinct. Such points would make the right hand side of the error formula the smallest possible (in magnitude). We need to choose x0, ..., xn so that

q(x) = (x − x0)·...·(x − xn)

has minimal absolute value on [a, b].

Suppose that [a, b] = [−1, 1]. If q(x) has all distinct real roots (reorder the roots such that x0 < x1 < ... < xn), then q(x) is monotone (increasing or decreasing) for x > xn and x < x0. One might guess that all of the roots should be in [−1, 1] to achieve the minimal absolute value.

To gain more insight, consider n = 0. In that case,

max_{x∈[−1,1]} |x − x0| = 1 + x0 if x0 ≥ 0, 1 − x0 if x0 ≤ 0,

and the minimum occurs at x0 = 0.


Next consider the quadratic case. Intuitively, the two points should be symmetric, i.e. x1 = −x0, and so

q(x) = (x − x0)(x + x0) = x^2 − x0^2.

Hence,

max_{x∈[−1,1]} |(x − x0)(x + x0)| = max{|q(0)|, |q(1)|} = max{x0^2, 1 − x0^2}.

The optimum choice satisfies x0^2 = 1 − x0^2, i.e. 1 = 2x0^2 or x0 = √(1/2). With this choice, |q(0)| = 1/2 = 1 − x0^2.

In the two examples, q has n + 2 extreme points with equal magnitude and oscillating signs. We need a polynomial Tn(x) with n zeros in [−1, 1] and n + 1 extrema (with oscillating signs). More later.


5. Lecture 5

The cosine function has lots of zeros and alternating extrema but unfortunately, it is not a polynomial.

5.1. Chebyshev Polynomials. Define for x ∈ [−1, 1] and integer n ≥ 0

Tn(x) := cos(n cos^{−1}(x)).

Recall that

cos^{−1} : [−1, 1] → [0, π]

so that

n cos^{−1} : [−1, 1] → [0, nπ].

As a consequence, Tn(x) has n + 1 extrema oscillating between ±1. Moreover, Tn(x) has n zeros. Less obvious, Tn(x) is actually a polynomial. Indeed,

T0(x) = cos(0) = 1, T1(x) = cos(cos^{−1}(x)) = x.

Also,

T_{n+1}(x) = cos((n + 1)θ), θ = cos^{−1}(x),

and so

T_{n+1}(x) = cos(θ) cos(nθ) − sin(θ) sin(nθ).

Similarly

T_{n−1}(x) = cos(θ) cos(nθ) + sin(θ) sin(nθ)

and therefore

T_{n+1}(x) + T_{n−1}(x) = 2 cos(θ) cos(nθ) = 2x Tn(x).

We just proved the recurrence formula

T_{n+1}(x) = 2x Tn(x) − T_{n−1}(x),

which implies

(1) Tn(x) is a polynomial of degree n;
(2) the leading term of Tn(x) is 2^{n−1} x^n for n ≥ 1;
(3) for odd n, Tn(x) is an odd function (contains only x^j with j odd), while for even n, Tn(x) is even.

The roots of Tn are related to the zeros of cos(nθ), θ ∈ [0, π] (θ = cos^{−1}(x)). Since cos(y) = 0 for y = π(j + 1/2), the roots xj of Tn(x) are such that

n cos^{−1}(xj) = π(j + 1/2),

or

xj = cos((π/n)(j + 1/2)), j = 0, 1, ..., n − 1.

Example 5.1 (n = 1). When n = 1, T1(x) = x and x0 = 0.

Example 5.2 (n = 2). When n = 2, T2(x) = 2x^2 − 1 and x0 = −1/√2, x1 = 1/√2.

Example 5.3 (n = 3). When n = 3, T3(x) = 4x^3 − 3x and x0 = −√(3/4), x1 = 0, x2 = √(3/4).
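Both the recurrence and the root formula can be verified in a few lines (Python sketch; the function name cheb is ours):

```python
import math

def cheb(n, x):
    # evaluate T_n(x) via the recurrence T_{n+1} = 2x T_n - T_{n-1}
    t_prev, t = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

# T_2(x) = 2x^2 - 1, matching Example 5.2
print(cheb(2, 0.3), 2 * 0.3**2 - 1)

# the roots x_j = cos((pi/n)(j + 1/2)) annihilate T_n; check n = 3
roots3 = [math.cos(math.pi / 3 * (j + 0.5)) for j in range(3)]
print([cheb(3, r) for r in roots3])  # all ≈ 0
```

The recurrence is also the numerically stable way to evaluate Tn in practice, avoiding the cos/arccos round trip near x = ±1.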


The following theorem provides a sharp estimate on the interpolation error (see Theorem 4.1).

Theorem 5.1 (Interpolation Error). Given n, let x0, ..., xn be the roots of T_{n+1}(x), i.e.

xj = cos((π/(n + 1))(j + 1/2)), j = 0, ..., n.

Let y0, ..., yn be any n + 1 real numbers; then

m_x := 2^{−n} = max_{x∈[−1,1]} |∏_{i=0}^{n} (x − xi)| ≤ max_{x∈[−1,1]} |∏_{i=0}^{n} (x − yi)| =: m_y.

Proof. We proceed by contradiction. Suppose the theorem does not hold. Then, there exist y0, ..., yn with m_y < m_x. Set

P(x) := ∏_{i=0}^{n} (x − xi) = 2^{−n} T_{n+1}(x),

where we used the fact that the xi are the roots of T_{n+1}(x). Also, we set

Q(x) := ∏_{i=0}^{n} (x − yi) and R(x) := P(x) − Q(x).

Now, m_x = 2^{−n} since T_{n+1}(x) oscillates between −1 and 1. Moreover, m_y < m_x implies that R(x) has the same sign as P(x) at each extremum of P(x) (which coincide with those of T_{n+1}(x)).

There are n + 2 extrema with oscillating signs. Therefore R(x) has oscillating signs at the extrema of T_{n+1}(x). The intermediate value theorem implies that there is a root of R(x) between each pair (of oscillating signs). In turn, n + 2 extrema implies that R(x) has n + 1 roots. Note however, that both P(x) and Q(x) are monic so their difference, i.e., R(x), is a polynomial of degree at most n. The only polynomial of degree at most n with n + 1 distinct roots is the zero polynomial, which implies that P(x) = Q(x). This contradicts m_y < m_x.

Example 5.4 (Error Theorem). Let x0, ..., xn be distinct in [0, π] and p ∈ Pn interpolate sin(x) at x0, ..., xn. Derive a bound for the error |sin(x) − p(x)| for x ∈ [0, π]. According to Theorem 4.1, we have

|sin(x) − p(x)| = (|f^{(n+1)}(ξx)|/(n+1)!) |∏_{i=0}^{n} (x − xi)|.

Without information on the distribution of the xi,

|∏_{i=0}^{n} (x − xi)| = ∏_{i=0}^{n} |x − xi| ≤ π^{n+1},

and |f^{(n+1)}(ξx)| ≤ 1 since

|f^{(n+1)}(ξx)| = |cos(ξx)| when n + 1 is odd, |sin(ξx)| when n + 1 is even.

As a consequence, we obtain

|sin(x) − p(x)| ≤ π^{n+1}/(n+1)! → 0 when n → ∞.
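The factorial decay can be observed numerically. The sketch below (Python; it builds the Newton coefficients through divided differences, an equivalent formulation of the recursion of Lecture 2) interpolates sin at six equally spaced nodes on [0, π], which is one admissible choice of distinct nodes, and checks the error against π^{n+1}/(n+1)!:

```python
import math

def divided_differences(xs, ys):
    # Newton coefficients via the divided-difference table, computed in place
    c = list(ys)
    for k in range(1, len(xs)):
        for i in range(len(xs) - 1, k - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - k])
    return c

def newton_eval(xs, c, t):
    # Horner-like evaluation of the Newton form
    p = c[-1]
    for i in range(len(c) - 2, -1, -1):
        p = p * (t - xs[i]) + c[i]
    return p

n = 5
xs = [math.pi * i / n for i in range(n + 1)]
c = divided_differences(xs, [math.sin(x) for x in xs])
err = max(abs(math.sin(t) - newton_eval(xs, c, t))
          for t in [math.pi * k / 400 for k in range(401)])
bound = math.pi ** (n + 1) / math.factorial(n + 1)
print(err, bound)  # the observed error sits well below the crude bound
```

The crude bound π^{n+1}/(n+1)! is very pessimistic here since it ignores the actual node distribution, but it already guarantees convergence as n grows.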


6. Lecture 6

The following example shows the limitation of higher order interpolation methods.

Example 6.1 (No Convergence). Let p ∈ Pn interpolate f(x) = x^{−α} (α defined below) on the interval [1/2, 2] using n + 1 distinct nodes {xi} in [1/2, 2]. By the error Theorem 4.1,

f(x) − p(x) = (f^{(n+1)}(ξx)/(n+1)!) ∏_{i=0}^{n} (x − xi),

so

|f(x) − p(x)| ≤ (max_{ξ∈[1/2,2]} |f^{(n+1)}(ξ)|/(n+1)!) ∏_{i=0}^{n} |x − xi|.

Without further information on the xi, we can only conclude

|x − xi| ≤ 3/2.

Also

f′(x) = −α x^{−(α+1)},
f′′(x) = α(α + 1) x^{−(α+2)},
...
f^{(n+1)}(x) = (−1)^{n+1} α(α + 1)·...·(α + n) x^{−(α+n+1)}.

Suppose α ≤ 1, then

|f^{(n+1)}(ξ)|/(n+1)! ≤ (α(α + 1)·...·(α + n)/(1·2·...·(n + 1))) |ξ|^{−(α+n+1)} ≤ |ξ|^{−(α+n+1)}.

In addition, |ξ|^{−(α+n+1)} is a decreasing function of ξ so that

max_{ξ∈[1/2,2]} |ξ|^{−(α+n+1)} = 2^{α+n+1}.

Gathering the above estimates, we obtain the bound

|f(x) − p(x)| ≤ 2^{α+n+1}(3/2)^{n+1} = 2^α · 3^{n+1}.

Therefore, no convergence can be guaranteed (unless the interpolation points are chosen strategically).

Piecewise polynomial interpolation is a way to circumvent this issue.

6.1. Piecewise Polynomial Interpolation. Let I = [a, b] be the interval where we want to construct an interpolant and N > 0 an integer. Set zi := a + (b − a)(i/N), which constitutes a uniform partition of [a, b] with spacing h = (b − a)/N, i.e.

a = z0 < z1 < ... < zN = b

and zi − zi−1 = h. On each subinterval Ii = (zi−1, zi), we use low order approximation.


Case 1: P0 interpolation. Choose a point z̄i ∈ Ii and let fh(x) be defined on each subinterval by

fh(x)|_{Ii} = f(z̄i).

This is a polynomial interpolation in P0 on each subinterval, i.e. fh is piecewise constant. Using the error Theorem 4.1 on each subinterval Ii:

|f(x) − fh(x)| = |f′(ξx)||x − z̄i| ≤ sup_{ξ∈Ii} |f′(ξ)| h

for every x ∈ Ii. As a consequence, the piecewise interpolation converges as h → 0, i.e. as the number of intervals increases to infinity (N → ∞).

Example 6.2 (x^{−α}). We return to the example above. In this case, a = 1/2, b = 2 and zi = 1/2 + (3/2)(i/N), i = 0, ..., N, with h = 3/(2N). For x ∈ Ii, we have

|f(x) − fh(x)| ≤ α h sup_{ξ∈Ii} ξ^{−α−1} ≤ α h 2^{α+1}.

This implies that

max_{x∈[1/2,2]} |f(x) − fh(x)| ≤ α h 2^{α+1}.

Notice that the interpolant fh(x) of f(x) is not continuous. We sometimes write fh(x) ∈ C^{−1}(1/2, 2).

Case 2: Continuous P1 interpolation. We construct fh, continuous on the entire interval and in P1 on each subinterval. To do this, we use the endpoints of Ii as the interpolation nodes, i.e. xi = zi and

fh(x) = f(x_{i−1}) (xi − x)/h + f(xi) (x − x_{i−1})/h

for x ∈ Ii. Notice that in particular fh(x_{i−1}) = f(x_{i−1}) and fh(xi) = f(xi), which implies the resulting piecewise polynomial fh is continuous: fh(xi) = f(xi) whether computed from Ii or from Ii+1, i = 1, ..., N − 1 (so we do not need to worry about which of Ii and Ii+1 defines fh(xi)).

Remark 6.1 (Discontinuous piecewise linears). Continuity was forced by choosing the endpoints as interpolation nodes. If instead you used interior nodes, you would end up with a discontinuous approximation (in general).

On each Ii we use again the error representation provided by Theorem 4.1:

|f(x) − fh(x)| ≤ (|f^{(2)}(ξx)|/2) |(x − x_{i−1})(x − xi)|, x ∈ Ii.

Now |(x − x_{i−1})(x − xi)| = (xi − x)(x − x_{i−1}), which achieves its maximum at x = (xi + x_{i−1})/2 with a value of h^2/4. Therefore, for x ∈ Ii

|f(x) − fh(x)| ≤ sup_{ξ∈Ii} (|f^{(2)}(ξ)|/8) h^2

and for x ∈ I

|f(x) − fh(x)| ≤ sup_{ξ∈I} (|f^{(2)}(ξ)|/8) h^2.

Notice that again the error is converging but this time the rate is quadratic (instead of linear).


Example 6.3 (x^{−α}). For the above example, this is

|f(x) − fh(x)| ≤ α(α + 1) (h^2/8) max_{ξ∈[1/2,2]} ξ^{−α−2} ≤ α(α + 1) 2^{α−1} h^2.

The following theorem gathers both cases discussed.

Theorem 6.1. Let f be defined on [a, b] with f ∈ C1[a, b]. Set zi := a + ih, i = 0, ..., N, with h = (b − a)/N. If fh is the piecewise constant approximation of f then

|fh(x) − f(x)| ≤ ‖f′‖_{L∞(a,b)} h.

Moreover, if f ∈ C2[a, b] and fh is the continuous, piecewise linear approximation of f then

|fh(x) − f(x)| ≤ ‖f′′‖_{L∞(a,b)} h^2/8.

Here ‖v‖_{L∞(a,b)} := max_{ξ∈[a,b]} |v(ξ)|.
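The quadratic rate is easy to observe numerically. A sketch (Python; f = e^x on [0, 1] is our arbitrary test function, and the per-subinterval sampling is a crude stand-in for the true maximum):

```python
import math

def pw_linear_max_err(f, a, b, N, samples=20):
    # max |f - f_h| for the continuous piecewise linear interpolant on N subintervals
    h = (b - a) / N
    err = 0.0
    for i in range(N):
        x0, x1 = a + i * h, a + (i + 1) * h
        f0, f1 = f(x0), f(x1)
        for k in range(samples + 1):
            x = x0 + (x1 - x0) * k / samples
            fh = f0 * (x1 - x) / h + f1 * (x - x0) / h  # the Case 2 formula
            err = max(err, abs(f(x) - fh))
    return err

e8 = pw_linear_max_err(math.exp, 0.0, 1.0, 8)
e16 = pw_linear_max_err(math.exp, 0.0, 1.0, 16)
print(e8 / e16)                         # ≈ 4: halving h divides the error by about 4
print(e8 <= math.e * (1 / 8)**2 / 8)    # the bound ||f''|| h^2 / 8 from Theorem 6.1 holds
```

Doubling N (halving h) cuts the error by a factor of about 4, as the h^2 estimate of Theorem 6.1 predicts.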


7. Lecture 7

In the previous lecture, we discussed piecewise interpolation with polynomials in Pk, k ≥ 1. These approximations are globally continuous if the endpoints of each interval are interpolation points.

We are now contemplating the possibility of constructing a globally C1 interpolant.

Example 7.1 (Globally P2 and C1 interpolant). Find p ∈ P2 satisfying

(4) p(−1) = a, p′(0) = b, p(1) = c

for given a, b, c ∈ R.

Let p(x) = α + βx + γx^2 for α, β and γ ∈ R. The above 3 equations lead to the linear system

A ( α, β, γ )^T = ( a, b, c )^T, where A = ( 1 −1 1
                                            0  1 0
                                            1  1 1 ).

Note that, expanding along the second row, det(A) = 1 · det( 1 1 ; 1 1 ) = 0 and so A is singular. This means that depending on b, there are either infinitely many solutions or no solutions. To give an example, if p(x) solves (4), so does p(x) + η(x^2 − 1) for any η ∈ R. Also there are no solutions to (4) when a = 0, b = 1 and c = 0 because otherwise both (x − 1) and (x + 1) would have to divide p(x), i.e. p(x) would have to be a multiple of (x^2 − 1) and so p′(0) = 0!
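The singularity claim is quick to confirm numerically (Python, cofactor expansion along the first row):

```python
# the 3x3 system matrix from Example 7.1
A = [[1, -1, 1],
     [0,  1, 0],
     [1,  1, 1]]

def det3(m):
    # cofactor expansion along the first row
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print(det3(A))  # 0: the matrix is singular, as claimed
```

The zero determinant reflects the fact that x^2 − 1 satisfies all three homogeneous conditions, so the kernel of A is nontrivial.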

Exercise 7.1. Consider the interpolation problem: find p ∈ P3 satisfying

p(−1) = y0, p(0) = y1, p′(0) = y2, p(1) = y3.

Show that the above problem always has a unique solution.
Hint: Look for a solution p(x) = α + βx + γx^2 + δx^3, derive the matrix system for the unknowns α, β, γ and δ and show that its determinant is non-zero.

A general uniquely solvable interpolation problem. Assume x0, x1, ..., xn are distinct. Find p ∈ PN satisfying for j = 0, 1, ..., mi and i = 0, 1, ..., n

p^{(j)}(xi) = y_{i,j},

where the y_{i,j} are given. The number of equations is

Σ_{i=0}^{n} (mi + 1),

so we must take N = Σ_{i=0}^{n} (mi + 1) − 1.

Theorem 7.1 (Globally Smooth Interpolation). The above interpolation problemhas a unique solution given any set yi,j

n;mji=0;j=0

Notice that if you include a derivative of order j at xi, you must also includep(l)(xi), l = 0, 1, ..., j!.


The general Hermite interpolation problem. Given $x_0, \dots, x_n$ distinct and $f \in C^1[x_0, x_n]$, find $p \in P_{2n+1}$ satisfying
\[ p(x_i) = f(x_i), \quad p'(x_i) = f'(x_i), \quad i = 0, \dots, n. \]
According to the previous theorem, this interpolation problem has a unique solution.

Theorem 7.2 (Error in Hermite Interpolation). Suppose that $f \in C^{2n+2}[a,b]$ and that $\{x_i\} \subset [a,b]$. Let $p$ be the Hermite interpolant. Then for $x \in [a,b]$ there is a $\xi_x \in (a,b)$ satisfying
\[ f(x) - p(x) = \frac{f^{(2n+2)}(\xi_x)}{(2n+2)!} \prod_{i=0}^{n} (x - x_i)^2. \]

Proof. The formula is valid when $x = x_i$, $i = 0, \dots, n$. Therefore, we assume that $x \notin \{x_0, \dots, x_n\}$.
Set $w(t) = \prod_{i=0}^{n} (t - x_i)^2$ and
\[ (5) \quad \varphi(t) = f(t) - p(t) - \lambda w(t), \]
where
\[ \lambda = \frac{f(x) - p(x)}{w(x)}. \]
Note that $\varphi'(x_i) = 0$ since $f'(x_i) = p'(x_i)$ and $w'(x_i) = 0$. Now $\varphi(x_i) = 0$ for $i = 0, 1, \dots, n$ and $\varphi(x) = 0$ from the definition of $\lambda$. Rolle's theorem implies that $\varphi'$ has at least $n+1$ additional zeros which are not in $\{x_0, \dots, x_n\}$. Therefore, $\varphi'$ has at least $2n+2$ distinct zeros. Applying Rolle's theorem again, to these zeros, implies that $\varphi''$ has at least $2n+1$ zeros. Repeating this process, we find that $\varphi^{(2n+2)}$ has at least one zero, i.e. $\varphi^{(2n+2)}(\xi_x) = 0$ for some $\xi_x \in (a,b)$.
Differentiating (5) $2n+2$ times and evaluating at $\xi_x$ gives
\[ 0 = \varphi^{(2n+2)}(\xi_x) = f^{(2n+2)}(\xi_x) - \lambda (2n+2)! \]
and the claim follows by simple algebra.

Example 7.2 (Hermite, n = 0). Find $p \in P_1$ satisfying
\[ p(x_0) = f(x_0), \quad p'(x_0) = f'(x_0). \]
The solution is
\[ p(x) = f(x_0) + f'(x_0)(x - x_0), \]
i.e. the two-term Taylor polynomial.

Example 7.3 (Cubic Hermite, n = 1). Find $p \in P_3$ satisfying
\[ p(x_i) = f(x_i), \quad p'(x_i) = f'(x_i), \quad i = 0, 1. \]
We use the Newton form of the solution. From the first example,
\[ p_1(x) = f(x_0) + f'(x_0)(x - x_0) \]
solves $p_1(x_0) = f(x_0)$ and $p_1'(x_0) = f'(x_0)$. Look for
\[ p_2(x) = p_1(x) + c_2 (x - x_0)^2 \]
satisfying $p_2(x_1) = f(x_1)$, i.e.
\[ c_2 = \frac{f(x_1) - p_1(x_1)}{(x_1 - x_0)^2} = \frac{f(x_1) - f(x_0) - f'(x_0)(x_1 - x_0)}{(x_1 - x_0)^2}. \]


Then we look for
\[ p_3(x) = p_2(x) + c_3 (x - x_0)^2 (x - x_1) \]
satisfying $p_3'(x_1) = f'(x_1)$.

Exercise 7.2. Derive an expression for $p_3(x)$ in terms of
\[ x_0,\ x_1,\ f(x_0),\ f'(x_0),\ f(x_1),\ f'(x_1). \]

Piecewise Hermite interpolation. Let $a = x_0 < x_1 < \dots < x_N = b$ and consider the piecewise cubic Hermite approximation $f_h$, i.e.
\[ f_h(x) = p_i(x) \quad \text{on } [x_{i-1}, x_i], \]
with $p_i \in P_3$ solving
\[ p_i(x_{i-1}) = f(x_{i-1}), \quad p_i'(x_{i-1}) = f'(x_{i-1}), \qquad p_i(x_i) = f(x_i), \quad p_i'(x_i) = f'(x_i), \]
where $h = \max_{i=1,\dots,N}(x_i - x_{i-1})$. Note that $f_h \in C^1[a,b]$.
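The Newton-form construction of Example 7.3 translates directly into code. A Python sketch (the helper name is ours) building the cubic Hermite interpolant on one subinterval and checking the four interpolation conditions for $f = \sin$:

```python
import math

def hermite_cubic(x0, x1, f0, df0, f1, df1):
    """Cubic Hermite interpolant on [x0, x1], built in Newton form as in Example 7.3."""
    h = x1 - x0
    c2 = (f1 - f0 - df0 * h) / h**2          # enforces p2(x1) = f1
    dp2_x1 = df0 + 2 * c2 * h                # p2'(x1)
    c3 = (df1 - dp2_x1) / h**2               # since p3'(x1) = p2'(x1) + c3*h^2
    return lambda x: (f0 + df0 * (x - x0) + c2 * (x - x0)**2
                      + c3 * (x - x0)**2 * (x - x1))

p = hermite_cubic(0.0, 1.0, math.sin(0.0), math.cos(0.0), math.sin(1.0), math.cos(1.0))
print(abs(p(1.0) - math.sin(1.0)) < 1e-12)  # True: interpolation condition at x1
```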

Exercise 7.3. Assume that $f \in C^4[a,b]$. Use the previous theorem to prove a fourth-order (in $h$) error bound for $|f_h(x) - f(x)|$.


8. Lecture 8

Trigonometric Interpolation. We now recall some results on Fourier series.
We set $\Omega = [0, 2\pi]$, $\psi_j(x) = e^{ijx}$ for $x \in \Omega$, $i = \sqrt{-1}$ and $j \in \mathbb{Z}$.
We recall Euler's formula: for $\theta \in \mathbb{R}$,
\[ e^{i\theta} = \cos(\theta) + i \sin(\theta); \]
see Figure 2 for an illustration.

Figure 2. Illustration of Euler's formula in the complex plane: the point $z = e^{i\theta}$ has real part $\cos(\theta)$ and imaginary part $\sin(\theta)$.

Using Euler's formula we find that
\[ \psi_j(x) = \cos(jx) + i \sin(jx), \quad j \in \mathbb{Z}. \]
Define the space
\[ L^2(\Omega) := \left\{ f : [0, 2\pi] \to \mathbb{C} : \int_0^{2\pi} |f(x)|^2\,dx < \infty \right\} \]
with norm
\[ \|f\|_{L^2(\Omega)} := \left( \int_0^{2\pi} |f(x)|^2\,dx \right)^{1/2}. \]

Remark 8.1 ($L^2(\Omega)$).
(1) $L^2(\Omega)$ is an (infinite dimensional) vector space of functions on $[0, 2\pi]$ with scalar field $\mathbb{C}$.
(2) A norm $\|\cdot\|$ on a vector space $V$ over $\mathbb{C}$ satisfies
(a) $\|v\| \ge 0$ for all $v \in V$, and $\|v\| = 0$ only if $v = 0$;
(b) $\|\alpha v\| = |\alpha|\,\|v\|$ for all $\alpha \in \mathbb{C}$ and $v \in V$;
(c) $\|v + w\| \le \|v\| + \|w\|$ for $v, w \in V$ (triangle inequality).
(3) Norms on a vector space $V$ give a notion of distance between elements of $V$: the distance between two elements $v, w \in V$ is $\|v - w\|$.

Definition 8.1 (Convergence of Series in a Normed Vector Space). Let $V$ be a vector space and $\|\cdot\|$ a norm on $V$. Assume $\{v_j\} \subset V$ and $v \in V$. Then $\sum_{j=1}^{\infty} v_j$ converges to $v$ in $\|\cdot\|$ if the sequence of partial sums $S_l := \sum_{j=1}^{l} v_j$ converges to $v$. This means that given $\varepsilon > 0$, there exists $N = N(\varepsilon)$ satisfying
\[ \|v - S_l\| \le \varepsilon \quad \text{when } l > N. \]

Theorem 8.1 (Fourier Series). For $f \in L^2(\Omega)$, the series
\[ \sum_{j=-\infty}^{+\infty} c_j \psi_j(x), \quad \text{with} \quad c_j = \frac{1}{2\pi} \int_0^{2\pi} f(x)\, e^{-ijx}\,dx \in \mathbb{C}, \]
converges to $f$ (in this case we set $S_l(x) := \sum_{j=-l}^{l} c_j \psi_j(x)$). In addition,
\[ \|f\|_{L^2(\Omega)}^2 = 2\pi \sum_{j \in \mathbb{Z}} |c_j|^2. \]

Remark 8.2. Some remarks are in order.
(1) For $f \in L^2(\Omega)$, the series $\sum_{j=-\infty}^{\infty} |c_j|^2$ converges.
(2) Note that the functions $\psi_j(x)$ are periodic, i.e.
\[ \lim_{x \to 0^+} \psi_j(x) = \lim_{x \to 2\pi^-} \psi_j(x) \quad \text{and} \quad \lim_{x \to 0^+} \psi_j'(x) = \lim_{x \to 2\pi^-} \psi_j'(x), \]
and so on for all derivatives.
(3) Depending on the smoothness of $f$, i.e. if $f \in C^n[0, 2\pi]$ and $f, f', f^{(2)}, \dots, f^{(n)}$ are periodic, then the series
\[ \sum_{j \in \mathbb{Z}} |c_j|^2 j^{2n} < \infty. \]
The set of such functions is denoted $H^n$ and we set
\[ \|f\|_{H^n} := \left( \sum_{j \in \mathbb{Z}} |c_j|^2 (1 + |j|)^{2n} \right)^{1/2}. \]

Theorem 8.2 (Spectral approximation). Suppose that $f \in H^n$. Then the truncated series
\[ S_N := \sum_{j=-N}^{N} c_j \psi_j(x) \]
satisfies
\[ \|f - S_N\|_{L^2(\Omega)}^2 = 2\pi \sum_{|j| > N} |c_j|^2 \le \frac{2\pi}{(N+1)^{2n}} \|f\|_{H^n}^2. \]
Proof. We have
\[ \|f - S_N\|_{L^2(\Omega)}^2 = 2\pi \sum_{|j| > N} |c_j|^2 = 2\pi \sum_{|j| > N} \frac{|c_j|^2 j^{2n}}{j^{2n}}. \]
Now for $|j| > N$, $\frac{1}{j^{2n}} \le \frac{1}{(N+1)^{2n}}$, and the claims follow.


Remark 8.3. Some remarks are in order.
(1) The rate of convergence of the truncated series is better for smoother $f$.
(2) To compute the coefficients $c_j$, you need to compute integrals with complex integrands. We provide an alternative next.

Trigonometric interpolation. Given an integer $N \ge 0$, set
\[ h = \frac{2\pi}{2N+1} \quad \text{and} \quad x_j = jh, \quad j = 0, \dots, 2N, \]
and
\[ V_{2N+1} := \operatorname{span}\{\psi_j,\ j = -N, \dots, N\} = \left\{ \sum_{|j| \le N} d_j \psi_j,\ d_j \in \mathbb{C} \right\}. \]
The trigonometric interpolation problem reads: find $f_{2N+1} \in V_{2N+1}$ satisfying
\[ f_{2N+1}(x_j) = f(x_j), \quad j = 0, 1, \dots, 2N. \]
Note that $\dim(V_{2N+1}) = 2N+1$ ($f_{2N+1}$ involves $2N+1$ coefficients) and there are $2N+1$ equations. Maybe there is a unique solution!

The DFT - Discrete Fourier Transform. Define $E(j) := e^{\frac{2\pi i j}{2N+1}}$ for $j \in \mathbb{Z}$ and note that
\[ E(j + (2N+1)) = e^{\frac{2\pi i j}{2N+1}}\, e^{\frac{2\pi i (2N+1)}{2N+1}} = E(j) \underbrace{e^{2\pi i}}_{=1} = E(j). \]
This shows that $E(j)$ is periodic with period $2N+1$. Also,
\[ E(j)^{2N+1} = e^{\frac{2\pi i j (2N+1)}{2N+1}} = e^{2\pi i j} = 1. \]

In fact, $E(j)$, $j = 0, 1, \dots, 2N$, are the $2N+1$ roots of $x^{2N+1} - 1 = 0$; see Figure 3.

Figure 3. Illustration of the DFT roots of unity: (left) $N = 1$, where $E(0) = 1$, $E(1) = e^{2\pi i/3}$, $E(2) = e^{4\pi i/3}$, $E(3) = E(0) = 1$; (right) $N = 2$, where $E(0) = 1$, $E(1) = e^{2\pi i/5}$, $E(2) = e^{4\pi i/5}$, $E(3) = e^{6\pi i/5}$, $E(4) = e^{8\pi i/5}$, $E(5) = E(0) = 1$.

For $d \in \mathbb{C}^{2N+1}$ we define $\mathrm{DFT}_\pm(d) \in \mathbb{C}^{2N+1}$ by
\[ \mathrm{DFT}_\pm(d)(j) = \sum_{m=0}^{2N} d_m E(\pm jm). \]


We now return to the interpolation problem. If
\[ f_{2N+1} = \sum_{|j| \le N} c_j \psi_j \in V_{2N+1} \]
satisfies
\[ f_{2N+1}(x_l) = f(x_l), \quad l = 0, 1, \dots, 2N, \]
then
\[ f(x_l) = f_{2N+1}(x_l) = \sum_{j=-N}^{N} c_j e^{ijx_l} = \sum_{j=-N}^{N} c_j e^{\frac{2\pi i jl}{2N+1}} = \sum_{j=-N}^{N} c_j E(jl) = \sum_{j=0}^{2N} d_j E(jl) = \mathrm{DFT}_+(d)(l), \]
where
\[ d_j = \begin{cases} c_j & \text{if } 0 \le j \le N, \\ c_{j-(2N+1)} & \text{if } N < j \le 2N. \end{cases} \]
This means that
\[ F := \begin{pmatrix} f(x_0) \\ f(x_1) \\ \vdots \\ f(x_{2N}) \end{pmatrix} = \mathrm{DFT}_+(d). \]

The next theorem guarantees that the interpolation problem (finding the coefficients $c_j$ or $d_j$) can be solved for any data, and as this is a square linear system, unique solvability follows.

Theorem 8.3 (Inverse DFT).
\[ (\mathrm{DFT}_+)^{-1} = \frac{1}{2N+1} \mathrm{DFT}_- , \]
so that
\[ d = \frac{1}{2N+1} \mathrm{DFT}_-(F). \]
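The inversion identity can be checked numerically. A quick Python sketch (the function name is ours) implementing $\mathrm{DFT}_\pm$ directly from the definition and verifying that $\mathrm{DFT}_- \circ \mathrm{DFT}_+ = (2N+1)\,\mathrm{Id}$:

```python
import cmath

def dft(d, sign):
    """DFT_pm(d)(j) = sum_m d[m] E(sign*j*m), with E(k) = exp(2*pi*i*k/M), M = 2N+1."""
    M = len(d)
    return [sum(dm * cmath.exp(sign * 2j * cmath.pi * j * m / M)
                for m, dm in enumerate(d)) for j in range(M)]

d = [1.0, 2.0, -0.5, 0.25, 3.0]        # 2N+1 = 5, i.e. N = 2
back = dft(dft(d, +1), -1)             # DFT_minus composed with DFT_plus
print(all(abs(b / len(d) - a) < 1e-12 for a, b in zip(d, back)))  # True
```

Dividing by $2N+1$ recovers the original coefficients, exactly as the theorem states.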


9. Lecture 9

Proof of Theorem 8.3. We begin by proving the identity
\[ (\mathrm{DFT}_+)^{-1} = \frac{1}{2N+1} \mathrm{DFT}_- . \]
For $c \in \mathbb{C}^{2N+1}$, we have
\[ \mathrm{DFT}_+(c)(j) = \sum_{l=0}^{2N} c_l E(jl), \]
where $E(jl) = e^{\frac{2\pi i jl}{2N+1}}$. Then,
\[ \mathrm{DFT}_-(\mathrm{DFT}_+(c))(m) = \sum_{j=0}^{2N} \mathrm{DFT}_+(c)(j)\, E(-jm) = \sum_{j=0}^{2N} \left( \sum_{l=0}^{2N} c_l E(lj) \right) E(-jm) = \sum_{j=0}^{2N} \sum_{l=0}^{2N} c_l E((l-m)j) = \sum_{l=0}^{2N} c_l \left( \sum_{j=0}^{2N} E((l-m)j) \right). \]
Now, if $l = m$,
\[ \sum_{j=0}^{2N} E((l-m)j) = \sum_{j=0}^{2N} 1 = 2N+1. \]
If $l \ne m$,
\[ E((l-m)j) = e^{\frac{2\pi i (l-m) j}{2N+1}} = \xi^j, \quad \text{with} \quad \xi := e^{\frac{2\pi i (l-m)}{2N+1}}. \]
Therefore, we have
\[ \sum_{j=0}^{2N} E((l-m)j) = 1 + \xi + \dots + \xi^{2N} = \frac{1 - \xi^{2N+1}}{1 - \xi} = \frac{1 - e^{2\pi i (l-m)}}{1 - \xi} = \frac{1 - 1}{1 - \xi} = 0, \]
so that
\[ \mathrm{DFT}_-(\mathrm{DFT}_+(c))(m) = (2N+1)\, c_m, \]
and the desired relation follows.

Remark 9.1 (Trigonometric interpolation using 2N points). Recall the interpolation problem: find $f_{2N+1}(x) = \sum_{j=-N}^{N} c_j \psi_j(x)$ such that
\[ f_{2N+1}(x_j) = f(x_j), \quad j = 0, 1, \dots, 2N, \]
where $\psi_j(x) = e^{ijx}$.
The discrete Fourier transforms using $2N$ points read
\[ \mathrm{DFT}_\pm : \mathbb{C}^{2N} \to \mathbb{C}^{2N}, \qquad \mathrm{DFT}_\pm(c)(j) = \sum_{l=0}^{2N-1} c_l E(\pm lj), \quad j = 0, \dots, 2N-1, \]
where $E(m) = e^{\frac{2\pi i m}{2N}}$.
Hence, if we set $x_j = jh$, with $h = \frac{2\pi}{2N} = \frac{\pi}{N}$, then the trigonometric interpolation problem has solution
\[ f_{2N}(x) = \sum_{j=-N+1}^{N} c_j \psi_j(x), \]
where
\[ c_j = \begin{cases} d_j & \text{for } j = 0, \dots, N, \\ d_{j+2N} & \text{for } j = -N+1, \dots, -1, \end{cases} \]
and
\[ d = \frac{1}{2N} \mathrm{DFT}_-(F), \quad \text{with } F_j = f(x_j),\ j = 0, 1, \dots, 2N-1. \]

Remark 9.2 (Computational Cost of the DFT). We recall that
\[ \mathrm{DFT}_\pm(c)(j) = \sum_{l=0}^{2N-1} c_l E(\pm lj). \]
Each value $j$ requires $2N$ (complex) multiplications and $2N-1$ additions, for a total of $O(N)$ flops (floating point operations). This entails an overall cost of $O(N^2)$ to compute all the output values.

Remark 9.3 (The Fast "Discrete" Fourier Transform - FFT). If $2N$ is highly factorable, the FFT algorithm will compute the DFT in $O(N \log N)$ operations.

The discussion in the audio file now regards Homework 5 (available from ecampus).


10. Lecture 10

Extension of Trigonometric interpolation. On $[0, \pi]$ use
\[ V_{N-1} = \operatorname{span}\{\sin(jx) : j = 1, 2, \dots, N-1\}, \]
so that the interpolation problem reads: find $S_N \in V_{N-1}$ satisfying
\[ S_N(x_j) = f(x_j), \quad j = 1, \dots, N-1, \]
with $x_j = \frac{\pi}{N} j = hj$, where $h := \frac{\pi}{N}$.

Remark 10.1 (Existence and Uniqueness). The above interpolation problem has a unique solution. Moreover, computing the coefficients $c_j$ in the interpolant
\[ S_N(x) = \sum_{j=1}^{N-1} c_j \sin(jx) \]
can be done using FFTs of size $2N$.

Remark 10.2 (Spectral Convergence). This sine approximation also exhibits spectral convergence, as in Theorem 8.2.
On $[0, \pi]$ one can also use
\[ V_{N+1} = \operatorname{span}\{\cos(jx) : j = 0, 1, \dots, N\}. \]
In this case, $x_j = hj$ for $j = 0, \dots, N$ and the interpolation problem consists in finding $C_{N+1} \in V_{N+1}$ such that
\[ C_{N+1}(x_j) = f(x_j), \quad j = 0, \dots, N. \]
As in the sine case, similar remarks hold.

Numerical Differentiation

We start with the simplest example: use the difference quotient from the definition of the derivative, i.e.
\[ (6) \quad f'(x) \approx \frac{f(x+h) - f(x)}{h}, \quad h > 0. \]
This is called a forward difference. Similarly,
\[ (7) \quad f'(x) \approx \frac{f(x) - f(x-h)}{h}, \quad h > 0, \]
is called a backward difference.

Error estimates. The Taylor expansion around $x$ reads
\[ f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(\xi) \]
for some $\xi \in (x, x+h)$. Whence,
\[ f'(x) = \frac{f(x+h) - f(x)}{h} - \underbrace{\frac{h}{2} f''(\xi)}_{O(h)}. \]
This implies that the error due to the approximation (6) is $O(h)$, i.e. first order!
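The first-order rate is easy to observe in practice. A Python sketch (names ours): dividing $h$ by 10 should divide the forward-difference error by about 10.

```python
import math

def forward_diff(f, x, h):
    """Forward difference (6): f'(x) ~ (f(x+h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

x = 1.0
e1 = abs(forward_diff(math.exp, x, 1e-2) - math.exp(x))
e2 = abs(forward_diff(math.exp, x, 1e-3) - math.exp(x))
print(e1 / e2)  # close to 10, consistent with O(h)
```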


The same property holds for (7), since
\[ f(x-h) = f(x) - h f'(x) + \frac{h^2}{2} f''(\xi) \]
for some $\xi \in (x-h, x)$, and so
\[ f'(x) = \frac{f(x) - f(x-h)}{h} + \underbrace{\frac{h}{2} f''(\xi)}_{O(h)}. \]

Example 10.1 (Method of Undetermined Coefficients). We propose to find an approximation of the derivative of highest possible order of the form
\[ f'(x) \approx a f(x) + b f(x-h) + c f(x-2h). \]
To do this, we use the method of undetermined coefficients. We write the Taylor series around $x$ and expand $f(x-h)$ and $f(x-2h)$:
\[ f(x) = f(x), \]
\[ f(x-h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - \frac{h^3}{6} f'''(x) + \dots, \]
\[ f(x-2h) = f(x) - 2h f'(x) + 2h^2 f''(x) - \frac{8h^3}{6} f'''(x) + \dots \]
Now, use these expressions to expand $a f(x) + b f(x-h) + c f(x-2h)$ and regroup terms with the same power of $h$:
\[ a f(x) + b f(x-h) + c f(x-2h) = (a+b+c) f(x) - (b+2c) h f'(x) + \frac{h^2}{2}(b+4c) f''(x) - \frac{h^3}{6}(b+8c) f'''(x) + \dots \]
We want this to equal $f'(x) + O(h^r)$ for $r$ as large as possible. To do this, we constrain the coefficients to satisfy
\[ a + b + c = 0, \qquad -(b+2c)h = 1, \qquad b + 4c = 0. \]
This has a unique solution. Indeed, it is equivalent to (upon multiplying the second constraint by $-1/h$)
\[ \underbrace{\begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \\ 0 & 1 & 4 \end{pmatrix}}_{=:A} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 0 \\ -1/h \\ 0 \end{pmatrix}. \]
We compute $\det(A) = 1 \cdot \det\begin{pmatrix} 1 & 2 \\ 1 & 4 \end{pmatrix} = 2 \ne 0$. This implies that the system has a unique solution, and we cannot get rid of higher order terms. The solution is given by $a = \frac{3}{2h}$, $b = -\frac{2}{h}$ and $c = \frac{1}{2h}$, so that the desired approximation is
\[ f'(x) \approx \frac{\frac{3}{2} f(x) - 2 f(x-h) + \frac{1}{2} f(x-2h)}{h}. \]
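The resulting one-sided formula can be tested numerically. A Python sketch (names ours): dividing $h$ by 10 should divide the error by about 100, confirming second order.

```python
import math

def d1_backward3(f, x, h):
    """One-sided formula from Example 10.1:
    f'(x) ~ (3 f(x) - 4 f(x-h) + f(x-2h)) / (2h)."""
    return (3 * f(x) - 4 * f(x - h) + f(x - 2 * h)) / (2 * h)

x = 1.0
e1 = abs(d1_backward3(math.sin, x, 1e-2) - math.cos(x))
e2 = abs(d1_backward3(math.sin, x, 1e-3) - math.cos(x))
print(e1 / e2)  # close to 100, consistent with O(h^2)
```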


To derive an expression for the error, we use Taylor series with remainder (at the lowest remaining term), i.e.
\[ f(x) = f(x), \]
\[ f(x-h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - \frac{h^3}{6} f'''(\xi_1), \quad \xi_1 \in (x-h, x), \]
\[ f(x-2h) = f(x) - 2h f'(x) + 2h^2 f''(x) - \frac{8h^3}{6} f'''(\xi_2), \quad \xi_2 \in (x-2h, x). \]
Then, we compute
\[ \frac{\frac{3}{2} f(x) - 2 f(x-h) + \frac{1}{2} f(x-2h)}{h} = f'(x) - \frac{h^2}{6}\left( -2 f'''(\xi_1) + 4 f'''(\xi_2) \right) = f'(x) + O(h^2). \]
This is the highest order method of this form, and its order is 2.

Example 10.2 (Second Derivatives). We use the method of undetermined coefficients to get a difference approximation to
\[ -f''(x) \approx a f(x-h) + b f(x) + c f(x+h) \]
of highest possible order. We have
\[ f(x) = f(x), \]
\[ f(x \pm h) = f(x) \pm h f'(x) + \frac{h^2}{2} f''(x) \pm \frac{h^3}{6} f'''(x) + \frac{h^4}{24} f^{(4)}(x) \pm \dots \]
Hence,
\[ a f(x-h) + b f(x) + c f(x+h) = (a+b+c) f(x) + h(c-a) f'(x) + \frac{h^2}{2}(c+a) f''(x) + \frac{h^3}{6}(c-a) f'''(x) + \frac{h^4}{24}(c+a) f^{(4)}(x) + \dots \]
To have this expression agree with $-f''(x)$ to highest order, we set
\[ a + b + c = 0, \qquad c - a = 0, \qquad \frac{h^2}{2}(c+a) = -1. \]
This is probably as far as we can go and leads to the system
\[ \underbrace{\begin{pmatrix} 1 & 1 & 1 \\ -1 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}}_{=:A} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ -2/h^2 \end{pmatrix}. \]
We compute $\det(A) = -1 \cdot \det\begin{pmatrix} -1 & 1 \\ 1 & 1 \end{pmatrix} = 2 \ne 0$. The unique solution is given by $a = -\frac{1}{h^2}$, $b = \frac{2}{h^2}$ and $c = -\frac{1}{h^2}$, so that the desired approximation is
\[ -f''(x) \approx \frac{-f(x-h) + 2 f(x) - f(x+h)}{h^2}. \]
Note that $c = a$ implies that the leading error term is the one involving $f^{(4)}$. The Taylor series with remainder (at that term) are
\[ f(x \pm h) = f(x) \pm h f'(x) + \frac{h^2}{2} f''(x) \pm \frac{h^3}{6} f'''(x) + \frac{h^4}{24} f^{(4)}(\xi_\pm), \]


where $\xi_+ \in (x, x+h)$ and $\xi_- \in (x-h, x)$. Thus,
\[ \frac{-f(x-h) + 2 f(x) - f(x+h)}{h^2} = -f''(x) - \frac{h^2}{24}\left( f^{(4)}(\xi_+) + f^{(4)}(\xi_-) \right). \]
The approximation is second order but requires $f \in C^4[x-h, x+h]$.
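The central second-difference formula can be checked the same way. A Python sketch (names ours): dividing $h$ by 5 should divide the error by about 25.

```python
import math

def d2_central(f, x, h):
    """Central formula from Example 10.2:
    f''(x) ~ (f(x-h) - 2 f(x) + f(x+h)) / h^2."""
    return (f(x - h) - 2 * f(x) + f(x + h)) / h**2

x = 0.5
# f = cos, so f''(x) = -cos(x)
e1 = abs(d2_central(math.cos, x, 1e-2) + math.cos(x))
e2 = abs(d2_central(math.cos, x, 2e-3) + math.cos(x))
print(e1 / e2)  # close to 25, consistent with O(h^2)
```

Taking $h$ much smaller eventually makes round-off (the $1/h^2$ amplification of cancellation error) dominate; that is a practical caveat of all difference quotients.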


11. Lecture 11

11.1. Differentiation via Polynomial Interpolation. Suppose we want a differentiation formula of the form
\[ f'(x) \approx \sum_{i=0}^{n} \alpha_i f(x_i), \]
with $x_0, \dots, x_n$ distinct in $[a,b]$. Let $p \in P_n$ be the polynomial interpolating $f$ at $x_0, \dots, x_n$, i.e.
\[ p(x) = \sum_{i=0}^{n} l_i(x) f(x_i), \]
where the $l_i(x)$ are the Lagrange polynomials (see Section 3.2). Then $p'(x) = \sum_{i=0}^{n} l_i'(x) f(x_i)$ should approximate $f'(x)$. This is indeed the case at $x = x_i$, $i = 0, \dots, n$, but it is less clear what happens when $x$ is not an interpolation point.

Theorem 11.1. Assume that $f \in C^{n+1}[a,b]$, $x_0, \dots, x_n$ are distinct in $[a,b]$ and $p \in P_n$ interpolates $f$ at $x_0, \dots, x_n$. Then there exists $\xi_j \in [a,b]$ such that
\[ f'(x_j) - p'(x_j) = \frac{f^{(n+1)}(\xi_j)}{(n+1)!} \prod_{l \ne j} (x_j - x_l). \]

Proof. For $x \notin \{x_0, \dots, x_n\}$, define
\[ \Theta(x) := \frac{(f(x) - p(x))\,(n+1)!}{\prod_{i=0}^{n}(x - x_i)}. \]
Note that Theorem 4.1 guarantees that
\[ f(x) - p(x) = \frac{f^{(n+1)}(\xi_x)}{(n+1)!} \prod_{i=0}^{n} (x - x_i), \]
so
\[ \Theta(x) = f^{(n+1)}(\xi_x). \]
Since we do not know what happens to $\xi_x$ when $x \to x_j$, we cannot use this representation further, but it shows that
\[ \Theta(x) \in \operatorname{Range}(f^{(n+1)}), \quad \text{for } x \in [a,b]. \]
Instead, we evaluate
\[ \lim_{x \to x_j} \Theta(x) = \lim_{x \to x_j} \frac{(f(x) - p(x))(n+1)!}{\prod_{i=0}^{n}(x - x_i)}. \]
This is a $0/0$ type of indeterminacy, so we can use L'Hospital's rule:
\[ \lim_{x \to x_j} \Theta(x) = \lim_{x \to x_j} \frac{(f'(x) - p'(x))(n+1)!}{\left( \prod_{i=0}^{n}(x - x_i) \right)'}. \]
We now compute the denominator:
\[ \left( \prod_{i=0}^{n} (x - x_i) \right)' = \sum_{i=0}^{n} \prod_{l \ne i} (x - x_l). \]


Each term of the above sum has a factor $(x - x_j)$ except the $i = j$ term, and so the only one that does not vanish as $x \to x_j$ is the $j$th term. This implies that
\[ \lim_{x \to x_j} \left( \prod_{i=0}^{n} (x - x_i) \right)' = \prod_{l \ne j} (x_j - x_l), \]
and thus
\[ \lim_{x \to x_j} \Theta(x) = \frac{(f'(x_j) - p'(x_j))(n+1)!}{\prod_{l \ne j}(x_j - x_l)}. \]
Now, $f^{(n+1)}$ is continuous on $[a,b]$ by assumption, and we have already seen that
\[ \Theta(x) \in \operatorname{Range}(f^{(n+1)}), \quad \text{for } x \in [a,b]. \]
In particular,
\[ \lim_{x \to x_j} \Theta(x) \in \operatorname{Range}(f^{(n+1)}), \]
i.e. there exists $\xi_j \in [a,b]$ with
\[ \lim_{x \to x_j} \Theta(x) = f^{(n+1)}(\xi_j). \]
Thus
\[ f^{(n+1)}(\xi_j) = \frac{(f'(x_j) - p'(x_j))(n+1)!}{\prod_{l \ne j}(x_j - x_l)}, \]
which is the desired estimate after simple algebraic manipulations.

Remark 11.1 (Continuity of $\xi_x$). Within the proof of the above theorem, we actually showed that the function
\[ x \mapsto f^{(n+1)}(\xi_x) \]
is continuous on $[a,b]$, provided $f \in C^{n+1}[a,b]$. This fact will be used later.

Example 11.1 (3-point differentiation scheme). Use polynomial interpolation to derive an approximation to the derivative of the form
\[ f'(x) \approx a f(x) + b f(x-h) + c f(x-2h). \]
We first compute the Lagrange basis for the interpolation points $x$, $x-h$, $x-2h$ (we use $t$ for the variable, for a fixed $x$):
\[ l_0(t) = \frac{(t-(x-h))(t-(x-2h))}{(x-(x-h))(x-(x-2h))} = \frac{t^2 - (2x-3h)t + (x-h)(x-2h)}{2h^2}; \]
\[ l_1(t) = \frac{(t-x)(t-(x-2h))}{((x-h)-x)((x-h)-(x-2h))} = \frac{t^2 - (2x-2h)t + x(x-2h)}{-h^2}; \]
\[ l_2(t) = \frac{(t-x)(t-(x-h))}{((x-2h)-x)((x-2h)-(x-h))} = \frac{t^2 - (2x-h)t + x(x-h)}{2h^2}. \]
The associated interpolant is
\[ p(t) = l_0(t) f(x) + l_1(t) f(x-h) + l_2(t) f(x-2h), \]
so that
\[ f'(x) \approx l_0'(x) f(x) + l_1'(x) f(x-h) + l_2'(x) f(x-2h). \]


This means
\[ f'(x) \approx \frac{2x - (2x-3h)}{2h^2} f(x) + \frac{2x - (2x-2h)}{-h^2} f(x-h) + \frac{2x - (2x-h)}{2h^2} f(x-2h) = \frac{3}{2h} f(x) - \frac{2}{h} f(x-h) + \frac{1}{2h} f(x-2h). \]
The error term is
\[ f'(x) - \left( \frac{3}{2h} f(x) - \frac{2}{h} f(x-h) + \frac{1}{2h} f(x-2h) \right) = \frac{f'''(\xi_0)}{6} (x-(x-h))(x-(x-2h)) = \frac{f'''(\xi_0)}{3} h^2. \]


12. Lecture 12

Numerical Integration

We use polynomial interpolation techniques to derive numerical integration schemes approximating
\[ I(f) = \int_\alpha^\beta f(x)\,dx, \]
for $\alpha < \beta$. Let $\{x_0, \dots, x_n\} \subset [a,b]$ be distinct, where $a < b$ are such that $[\alpha, \beta] \subseteq [a,b]$. Let $p \in P_n$ interpolate $f$ at $x_0, \dots, x_n$. We propose to approximate $I(f)$ by
\[ Q(f) = \int_\alpha^\beta p(x)\,dx. \]
Using the Lagrange polynomials $\{l_i(x)\}_{i=0}^n$ associated with the interpolation points $\{x_i\}_{i=0}^n$, we write
\[ p(x) = \sum_{i=0}^{n} f(x_i)\, l_i(x), \]
so that
\[ Q(f) = \int_\alpha^\beta \left( \sum_{i=0}^{n} f(x_i)\, l_i(x) \right) dx = \sum_{i=0}^{n} f(x_i) \int_\alpha^\beta l_i(x)\,dx = \sum_{i=0}^{n} w_i f(x_i), \]
where we defined
\[ w_i := \int_\alpha^\beta l_i(x)\,dx. \]
This leads to the following definition of a quadrature.

Definition 12.1 (Quadrature). An integral approximation of the form
\[ I(f) \approx Q(f) = \sum_{i=0}^{n} w_i f(x_i) \]
is called a quadrature. The real numbers $w_i$ are the weights and the $x_i$ are the nodes.

Example 12.1 (Rectangle quadrature). Let $x_0 \in [a,b]$. Find the quadrature approximating
\[ I(f) = \int_a^b f(x)\,dx \]
based on polynomial interpolation using $x_0$ and $P_0$. In this case, $p(x) = f(x_0) l_0(x) = f(x_0)$ and
\[ Q(f) = \int_a^b f(x_0)\,dx = (b-a) f(x_0), \]
which is the area of the shaded region in Figure 4.


Figure 4. Rectangle quadrature. The approximation $Q(f)$ corresponds to the area of the shaded region.

Example 12.2 (Trapezoidal quadrature). Consider $p \in P_1$ interpolating $f$ at $x_0 = a$ and $x_1 = b$ to approximate
\[ I(f) = \int_a^b f(x)\,dx. \]
In that case,
\[ p(x) = l_0(x) f(a) + l_1(x) f(b) = \frac{b-x}{b-a} f(a) + \frac{x-a}{b-a} f(b), \]
so that
\[ Q(f) = \int_a^b p(x)\,dx = \frac{f(a)}{b-a} \int_a^b (b-x)\,dx + \frac{f(b)}{b-a} \int_a^b (x-a)\,dx. \]
Both integrals equal $\frac{1}{2}(b-a)^2$, so
\[ Q(f) = \frac{b-a}{2} \left( f(a) + f(b) \right). \]
See Figure 5 for an illustration.

Figure 5. Trapezoidal quadrature. The approximation $Q(f)$ corresponds to the area of the shaded region.

Example 12.3 (3-Point Quadrature). Consider the interpolation nodes $x$, $x-h$, $x-2h$ for some $h > 0$ and $x \in \mathbb{R}$, and use a quadratic polynomial interpolant to derive a quadrature scheme approximating
\[ I(f) = \int_{x-h}^{x} f(t)\,dt. \]
Note that this will use interpolation points outside the integration region. In Example 11.1, we have already computed the corresponding Lagrange polynomials $l_0$, $l_1$ and $l_2$. To derive such a quadrature scheme, we need to compute
\[ w_i = \int_{x-h}^{x} l_i(t)\,dt, \]
which seems like a lot of work... perhaps there is a better way (later).

We now discuss how well $Q(f)$ approximates $I(f)$.

Theorem 12.1 (Interpolation error). Let $f \in C^{n+1}[a,b]$, $x_0, \dots, x_n$ distinct in $[a,b]$ and $a \le \alpha < \beta \le b$. If $p \in P_n$ interpolates $f$ at $x_i$, $i = 0, \dots, n$, and $Q(f) = \sum_{i=0}^{n} w_i f(x_i)$, with $w_i = \int_\alpha^\beta l_i(x)\,dx$, then we have
\[ I(f) - Q(f) = \frac{1}{(n+1)!} \int_\alpha^\beta f^{(n+1)}(\xi_x) \prod_{i=0}^{n} (x - x_i)\,dx, \]
where for every $x \in [\alpha, \beta]$, $\xi_x \in [a,b]$. Moreover, if $\prod_{i=0}^{n}(x - x_i)$ does not change sign on $[a,b]$, then there exists $\xi \in [a,b]$ with
\[ I(f) - Q(f) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \int_\alpha^\beta \prod_{i=0}^{n} (x - x_i)\,dx. \]

Proof. In view of the interpolation error provided by Theorem 4.1, for $x \in (\alpha, \beta)$ there exists $\xi_x \in (a,b)$ such that
\[ I(f) - Q(f) = \int_\alpha^\beta (f(x) - p(x))\,dx = \frac{1}{(n+1)!} \int_\alpha^\beta f^{(n+1)}(\xi_x) \prod_{i=0}^{n} (x - x_i)\,dx. \]
This is the first claim. To continue further, it suffices to note that $f^{(n+1)}(\xi_x)$ is continuous in $x$ (see Remark 11.1) and invoke the mean value theorem for integrals to write
\[ \frac{1}{(n+1)!} \int_\alpha^\beta f^{(n+1)}(\xi_x) \prod_{i=0}^{n} (x - x_i)\,dx = \frac{f^{(n+1)}(\xi)}{(n+1)!} \int_\alpha^\beta \prod_{i=0}^{n} (x - x_i)\,dx, \]
for some $\xi \in (a,b)$. This implies the second claim.

Example 12.4 (Error for the Trapezoidal quadrature). For some $\xi \in (a,b)$, the above theorem guarantees that
\[ \int_a^b f(x)\,dx - \frac{b-a}{2}\left( f(a) + f(b) \right) = \frac{f''(\xi)}{2} \int_a^b (x-a)(x-b)\,dx, \]
using the fact that $(x-a)(x-b)$ does not change sign. Hence, computing the integral (with the substitution $y = x-a$) leads to
\[ \int_a^b f(x)\,dx - \frac{b-a}{2}\left( f(a) + f(b) \right) = \frac{f''(\xi)}{2} \int_0^{b-a} y\,(y-(b-a))\,dy = -\frac{f''(\xi)}{12}(b-a)^3. \]
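The sign and size of this error can be checked concretely. A Python sketch (names ours) for $f = \cos$ on $[0, 1/2]$: since $f'' = -\cos < 0$ there, the error $-f''(\xi)(b-a)^3/12$ is positive and bounded by $(b-a)^3/12$.

```python
import math

def trapezoid(f, a, b):
    """Trapezoidal rule Q(f) = (b-a)/2 * (f(a) + f(b))."""
    return (b - a) / 2 * (f(a) + f(b))

a, b = 0.0, 0.5
exact = math.sin(b) - math.sin(a)            # integral of cos
err = exact - trapezoid(math.cos, a, b)
bound = (b - a)**3 / 12                      # since |f''| = |cos| <= 1
print(0 < err < bound)  # True
```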

Example 12.5 (Simpson quadrature). We want to approximate
\[ I(f) = \int_{-1}^{1} f(x)\,dx \]

using a polynomial of degree 2 interpolating $f$ at $x_0 = -1$, $x_1 = 0$ and $x_2 = 1$. We first compute the Lagrange polynomials:
\[ l_0(x) = \frac{(x-0)(x-1)}{(-1-0)(-1-1)} = \frac{x^2 - x}{2}, \qquad l_1(x) = \frac{(x+1)(x-1)}{(0+1)(0-1)} = 1 - x^2, \qquad l_2(x) = \frac{(x-0)(x+1)}{(1-0)(1+1)} = \frac{x^2 + x}{2}. \]
Therefore,
\[ w_0 = \int_{-1}^{1} l_0(x)\,dx = \int_{-1}^{1} \frac{1}{2} x^2\,dx = \frac{1}{3}, \qquad w_1 = \int_{-1}^{1} l_1(x)\,dx = \int_{-1}^{1} (1 - x^2)\,dx = \frac{4}{3}, \qquad w_2 = \int_{-1}^{1} l_2(x)\,dx = \int_{-1}^{1} \frac{1}{2} x^2\,dx = \frac{1}{3}, \]
and the quadrature rule reads
\[ I(f) \approx Q(f) = \frac{1}{3} f(-1) + \frac{4}{3} f(0) + \frac{1}{3} f(1). \]
However,
\[ \prod_{i=0}^{2} (x - x_i) = (x+1)\, x\, (x-1) = (x^2 - 1)\, x \]
changes sign on $[-1, 1]$, so we cannot get a formula involving $f'''(\xi)$ for some (fixed) $\xi \in (-1, 1)$.

We now discuss a possibly simpler way to compute the weights $w_i$. It relies on the observation that when $f \in P_n$ is a polynomial of degree $n$, its interpolant $p \in P_n$ is $f$ itself (since interpolants are unique). This means
\[ I(f) = \int_a^b f(x)\,dx = \int_a^b p(x)\,dx = Q(f) \]

for all $f \in P_n$. In other words, the quadrature is exact for $p \in P_n$. In particular, this means that
\[ J_0 := \int_a^b 1\,dx = \sum_{i=0}^{n} w_i \cdot 1 = Q(1), \]
\[ J_1 := \int_a^b x\,dx = \sum_{i=0}^{n} w_i x_i = Q(x), \]
\[ \vdots \]
\[ J_n := \int_a^b x^n\,dx = \sum_{i=0}^{n} w_i x_i^n = Q(x^n). \]


Hence, the weights $\{w_i\}_{i=0}^{n}$ satisfy the linear system
\[ A^t w := \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ x_0 & x_1 & x_2 & \cdots & x_n \\ x_0^2 & x_1^2 & x_2^2 & \cdots & x_n^2 \\ \vdots & & & & \vdots \\ x_0^n & x_1^n & x_2^n & \cdots & x_n^n \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{pmatrix} = \begin{pmatrix} J_0 \\ J_1 \\ \vdots \\ J_n \end{pmatrix}. \]
Recall that you get the linear system
\[ A c := \begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \vdots & & & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{pmatrix} = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{pmatrix} \]
when solving the interpolation problem $p(x_i) = y_i$, where
\[ p(x) = c_0 + c_1 x + \dots + c_n x^n. \]
We already know that $A$ is non-singular, and it follows that $A^t$ is also non-singular. As a consequence, we obtain:

Remark 12.1 (Uniqueness of Weights). The weights $w_i$ making the quadrature $Q$ exact on $P_n$ are uniquely determined by the exactness conditions
\[ Q(x^j) = \sum_{i=0}^{n} w_i x_i^j = \int_a^b x^j\,dx, \quad j = 0, \dots, n. \]

Example 12.6 (Simpson's quadrature).
\[ Q(f) = w_0 f(-1) + w_1 f(0) + w_2 f(1) \approx \int_{-1}^{1} f(x)\,dx. \]
The exactness conditions (for $P_2$) are
\[ 2 = \int_{-1}^{1} 1\,dx = w_0(1) + w_1(1) + w_2(1), \]
\[ 0 = \int_{-1}^{1} x\,dx = w_0(-1) + w_1(0) + w_2(1), \]
\[ \frac{2}{3} = \int_{-1}^{1} x^2\,dx = w_0(-1)^2 + w_1(0)^2 + w_2(1)^2, \]
i.e.
\[ w_0 = w_2 = \frac{1}{3} \quad \text{and} \quad w_1 = \frac{4}{3}, \]
as previously obtained.
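The moment-matching system $A^t w = J$ can be solved mechanically. A Python sketch (function name and elimination routine are ours), using exact rational arithmetic so the weights come out as fractions:

```python
from fractions import Fraction as F

def weights_by_exactness(nodes, a, b):
    """Solve the Vandermonde-transpose system A^t w = J for quadrature weights
    exact on P_n, where J_j = integral of x^j over [a, b]."""
    n = len(nodes)
    A = [[F(x)**j for x in nodes] for j in range(n)]             # row j: node powers
    J = [(F(b)**(j + 1) - F(a)**(j + 1)) / (j + 1) for j in range(n)]
    # Gauss-Jordan elimination with exact fractions
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        J[c], J[p] = J[p], J[c]
        for r in range(n):
            if r != c and A[r][c] != 0:
                m = A[r][c] / A[c][c]
                A[r] = [x - m * y for x, y in zip(A[r], A[c])]
                J[r] -= m * J[c]
    return [J[r] / A[r][r] for r in range(n)]

print([str(w) for w in weights_by_exactness([-1, 0, 1], -1, 1)])  # ['1/3', '4/3', '1/3']
```

With nodes $\{0, 1\}$ on $[0,1]$ the same routine returns the trapezoidal weights $1/2, 1/2$.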

Remark 12.2 (Higher order exactness for Simpson). Note that, in addition to being exact for any polynomial of degree 2, Simpson's quadrature rule also satisfies
\[ 0 = \int_{-1}^{1} x^3\,dx = \frac{1}{3}(-1)^3 + 0 + \frac{1}{3}(1)^3, \]
but
\[ \frac{2}{5} = \int_{-1}^{1} x^4\,dx \ne \frac{1}{3}(-1)^4 + 0 + \frac{1}{3}(1)^4 = \frac{2}{3}. \]
Hence, Simpson's quadrature rule is exact on $P_3$ but not on $P_4$.


13. Lecture 13

Last lecture introduced Simpson's rule
\[ I(f) := \int_{-1}^{1} f(x)\,dx \approx \frac{1}{3} f(-1) + \frac{4}{3} f(0) + \frac{1}{3} f(1) =: Q(f). \]
We found that $Q$ is exact for cubics. Consider instead the quadrature scheme based on the nodes $\{x_0, x_1, x_2, x_3\} := \{-1, 0, 1/2, 1\}$, i.e.
\[ I(f) \approx Q(f) = \sum_{i=0}^{3} w_i f(x_i). \]
We also saw during the last lecture that there exists a unique set of weights $w_i$, $i = 0, 1, 2, 3$, making $Q$ exact for cubics (see Remark 12.1). Note that Simpson's scheme can be interpreted as
\[ Q(f) = \frac{1}{3} f(-1) + \frac{4}{3} f(0) + 0 \cdot f(1/2) + \frac{1}{3} f(1) \]
and hence is this unique scheme. Applying the error formula provided by Theorem 12.1 gives
\[ I(f) - Q(f) = \frac{1}{4!} \int_{-1}^{1} f^{(4)}(\xi_x) \prod_{i=0}^{3} (x - x_i)\,dx. \]
The quantity on the right side of the above relation is usually quite large. To get quadratures that accurately approximate integrals, we need composite schemes.

Composite Schemes. Suppose you have a scheme
\[ Q(f) = \sum_{i=0}^{n} w_i f(x_i) \approx \int_a^b f(x)\,dx = I(f), \]
exact on $P_n$, where $x_0, \dots, x_n$ are distinct in $[a,b]$. We want to deduce from it a scheme on $[\alpha, \beta]$. Let $\lambda$ be the linear mapping taking $[a,b]$ onto $[\alpha, \beta]$, i.e.
\[ \lambda(x) = \alpha + \frac{x-a}{b-a}(\beta - \alpha) \]
($\lambda(a) = \alpha$ and $\lambda(b) = \beta$). As we saw in an early homework, if $p \in P_n$, then so is $q(x) = p(\lambda(x))$. Also,
\[ \lambda^{-1}(t) = a + \frac{t-\alpha}{\beta-\alpha}(b-a) \]
is a linear mapping from $[\alpha, \beta]$ to $[a,b]$. Now, for $q \in P_n$,
\[ I(q) := \int_\alpha^\beta q(t)\,dt = \frac{\beta-\alpha}{b-a} \int_a^b q(\lambda(x))\,dx, \]
where we have used the change of variable $t = \lambda(x)$, so that $\frac{dx}{dt} = \frac{b-a}{\beta-\alpha}$ and $dt = \frac{\beta-\alpha}{b-a}\,dx$. Since the composition $q \circ \lambda$ is in $P_n$ and the quadrature scheme is exact on $P_n$,
\[ I(q) = \frac{\beta-\alpha}{b-a} \sum_{i=0}^{n} w_i\, q(\lambda(x_i)). \]
We set
\[ (8) \quad \bar{w}_i = \frac{\beta-\alpha}{b-a}\, w_i \quad \text{and} \quad \bar{x}_i = \lambda(x_i) \]


to deduce that
\[ \int_\alpha^\beta q(t)\,dt = \sum_{i=0}^{n} \bar{w}_i\, q(\bar{x}_i) =: \bar{Q}(q). \]
In conclusion, given a scheme
\[ I(f) = \int_a^b f(x)\,dx \approx Q(f) = \sum_{i=0}^{n} w_i f(x_i) \]
which is exact on $P_n$, we get a translated scheme
\[ \bar{I}(f) = \int_\alpha^\beta f(t)\,dt \approx \bar{Q}(f) = \sum_{i=0}^{n} \bar{w}_i f(\bar{x}_i) \]
which is also exact on $P_n$, using the notation (8).

Remark 13.1 (Property of the translated scheme). The map $\lambda$ is a linear map of $[a,b]$ onto $[\alpha, \beta]$, so it maps points in a proportional way: $a \to \alpha$ and $b \to \beta$ imply $(a+b)/2 \to (\alpha+\beta)/2$. More generally, for any $t \in [0,1]$,
\[ [a,b] \ni t a + (1-t) b \ \longrightarrow\ t\alpha + (1-t)\beta \in [\alpha, \beta]. \]

Composite Quadrature. We want to approximate
\[ I(f) = \int_a^b f(x)\,dx. \]
Introduce $N+1$ distinct points
\[ a = x_0 < x_1 < \dots < x_N = b \]
and set $h = \max_{i=1,\dots,N}(x_i - x_{i-1})$. We split the integral over $[a,b]$ into $N$ pieces,
\[ I(f) = \sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} f(x)\,dx, \]
and use a base (translated) quadrature scheme over each sub-interval.

13.1. Simpson's Composite Quadrature Rule. If we use Simpson's rule
\[ \int_{-1}^{1} g(t)\,dt \approx \frac{1}{3} g(-1) + \frac{4}{3} g(0) + \frac{1}{3} g(1) \]
to approximate
\[ \int_{x_{i-1}}^{x_i} f(x)\,dx, \]
we have
\[ \int_{x_{i-1}}^{x_i} f(x)\,dx \approx \sum_{j=0}^{2} \bar{w}_j f(\bar{x}_j), \]
where
\[ \bar{w}_0 = \bar{w}_2 = \frac{x_i - x_{i-1}}{2} \cdot \frac{1}{3}, \qquad \bar{w}_1 = \frac{x_i - x_{i-1}}{2} \cdot \frac{4}{3}, \]
and the nodes are moved proportionally:
\[ -1 \to x_{i-1}, \qquad 1 \to x_i \qquad \text{and} \qquad 0 \to \frac{x_{i-1} + x_i}{2}. \]


This implies

∫_{x_{i−1}}^{x_i} f(x) dx ≈ ((x_i − x_{i−1})/6) (f(x_{i−1}) + 4 f(x_{i−1/2}) + f(x_i)),

where

x_{i−1/2} := (x_{i−1} + x_i)/2.

Gathering the approximations over all subintervals, we arrive at the composite Simpson's rule approximation

∫_a^b f(x) dx ≈ Σ_{i=1}^N ((x_i − x_{i−1})/6) (f(x_{i−1}) + 4 f(x_{i−1/2}) + f(x_i)) =: Σ_{i=1}^N Q̄_i(f).

Regarding the integration error, Simpson's rule may be viewed as the rule based on the four nodes −1, 0, 1/2, 1, which is exact for cubics. Since 1/2 = (3/4)(1) + (1/4)(−1), the translated rule is the rule on

x_{i−1}, (1/2)(x_{i−1} + x_i), (3/4) x_i + (1/4) x_{i−1}, x_i =: x̄_0, x̄_1, x̄_2, x̄_3,

which is exact on cubics. The error formula (Theorem 12.1) gives

∫_{x_{i−1}}^{x_i} f(x) dx − Q̄_i(f) = (1/24) ∫_{x_{i−1}}^{x_i} f^{(4)}(ξ_x) Π_{j=0}^3 (x − x̄_j) dx,

provided that f ∈ C^4[a, b]. Let ‖f^{(4)}‖_∞ = max_{t∈[a,b]} |f^{(4)}(t)|; then

|∫_{x_{i−1}}^{x_i} f(x) dx − Q̄_i(f)| ≤ (1/24) ‖f^{(4)}‖_∞ h^4 ∫_{x_{i−1}}^{x_i} dx

(since |x − x̄_j| ≤ h as x, x̄_j ∈ [x_{i−1}, x_i]). Therefore, we obtain that

|∫_{x_{i−1}}^{x_i} f(x) dx − Q̄_i(f)| ≤ (1/24) ‖f^{(4)}‖_∞ h^4 (x_i − x_{i−1}).

Summing over all subintervals gives an estimate for the error:

|∫_a^b f(x) dx − Σ_{i=1}^N Q̄_i(f)| ≤ Σ_{i=1}^N |∫_{x_{i−1}}^{x_i} f(x) dx − Q̄_i(f)| ≤ ((b − a)/24) ‖f^{(4)}‖_∞ h^4.

In general, a quadrature rule which is exact on P_n, translated to an interval of size h, has a local accuracy of h^{n+2} and usually results in a global accuracy of h^{n+1} when used in a composite quadrature.
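The composite construction can be sketched in a few lines of Python (`composite_simpson` is an illustrative helper, not code from the notes). Per the estimate above, halving h should reduce the error by roughly 2^4 = 16:

```python
import math

def composite_simpson(f, a, b, N):
    """Composite Simpson's rule on N equal subintervals of [a, b]."""
    h = (b - a) / N
    total = 0.0
    for i in range(N):
        x0, x1 = a + i * h, a + (i + 1) * h
        xm = 0.5 * (x0 + x1)                     # the node x_{i-1/2}
        total += (x1 - x0) / 6.0 * (f(x0) + 4.0 * f(xm) + f(x1))
    return total

exact = 1.0 - math.cos(1.0)                      # integral of sin over [0, 1]
e1 = abs(composite_simpson(math.sin, 0.0, 1.0, 4) - exact)
e2 = abs(composite_simpson(math.sin, 0.0, 1.0, 8) - exact)
print(e1 / e2)  # close to 16, the predicted h^4 rate
```

The observed error ratio under mesh halving is the standard numerical check of the h^4 rate.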


14. Lecture 14

Gaussian Quadrature. We noted last lecture that the order of a quadrature is determined by exactness on P_n. It is natural to optimize the order by allowing the nodes to move.

We start with examples.

Example 14.1 (One point).

I(f) = ∫_a^b f(x) dx ≈ (b − a) f(x_0).

Note that the weight (b − a) makes the quadrature exact on constants. For the quadrature to be exact on linears, we need that

(b^2 − a^2)/2 = ∫_a^b x dx = (b − a) x_0,

or

x_0 = (b + a)/2,

which implies that

Q(f) = (b − a) f((b + a)/2).

This is called the mid-point rule. It is exact for linears but not quadratics since

(b^3 − a^3)/3 = ∫_a^b x^2 dx ≠ (b − a) ((a + b)/2)^2.
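A quick numerical confirmation of the last two claims (a Python sketch; the helper name `midpoint` is ours, not from the notes):

```python
def midpoint(f, a, b):
    """The mid-point rule Q(f) = (b - a) f((a + b)/2)."""
    return (b - a) * f(0.5 * (a + b))

a, b = 0.0, 2.0
# exact on linears: the integral of x over [0, 2] is 2
err_lin = abs(midpoint(lambda x: x, a, b) - (b**2 - a**2) / 2)
# not exact on quadratics: the integral of x^2 over [0, 2] is 8/3, while Q gives 2
err_quad = abs(midpoint(lambda x: x * x, a, b) - (b**3 - a**3) / 3)
print(err_lin, err_quad)
```

The quadratic error is exactly |8/3 − 2| = 2/3, consistent with the inequality above.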

Example 14.2 (Two-point formula). The two-point formula has 4 unknowns (2 weights and 2 interpolation points). We show that we can determine these unknowns so that the quadrature is exact on cubics. For this, we impose the following 4 exactness conditions (see Remark 12.1):

2 = ∫_{−1}^1 dx = w_1 + w_2
0 = ∫_{−1}^1 x dx = w_1 x_1 + w_2 x_2
2/3 = ∫_{−1}^1 x^2 dx = w_1 x_1^2 + w_2 x_2^2
0 = ∫_{−1}^1 x^3 dx = w_1 x_1^3 + w_2 x_2^3.

From the second condition, we deduce that w_1 x_1 = −w_2 x_2. Substituting this into the fourth condition yields

0 = w_2 x_2 (x_1^2 − x_2^2).

Note that w_2 ≠ 0, for otherwise we would have a one-point rule, which cannot be exact for cubics, and x_2 ≠ 0, for otherwise the second condition would force x_1 = x_2 = 0 and the interpolation points would not be distinct. Therefore, we must have x_1^2 = x_2^2, i.e.

x_1 = −x_2


(again, we want distinct interpolation points). Using the second constraint again, this implies that (w_2 − w_1) x_2 = 0, or

w_1 = w_2.

Now the second and fourth constraints hold. The first condition requires

w_1 + w_2 = 2 ⟹ w_1 = w_2 = 1.

Finally, the third condition implies

2/3 = 2 w_1 x_1^2 or x_1 = ±√(1/3).

The scheme is therefore

∫_{−1}^1 f(x) dx ≈ f(−1/√3) + f(1/√3) =: Q(f)

and is exact on cubics. It is not exact on quartics since

2/5 = ∫_{−1}^1 x^4 dx ≠ Q(x^4) = 1/9 + 1/9.
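The exactness claims can be checked directly in Python (a sketch; `gauss2` is our name for the two-point rule just derived):

```python
import math

def gauss2(f):
    """Two-point Gauss rule on [-1, 1]: Q(f) = f(-1/sqrt(3)) + f(1/sqrt(3))."""
    x1 = 1.0 / math.sqrt(3.0)
    return f(-x1) + f(x1)

# exact on cubics: the integrals of 1, x, x^2, x^3 over [-1, 1] are 2, 0, 2/3, 0
errs = [abs(gauss2(lambda x, k=k: x**k) - m)
        for k, m in [(0, 2.0), (1, 0.0), (2, 2/3), (3, 0.0)]]
# fails on quartics: Q(x^4) = 2/9 while the integral is 2/5
gap = abs(gauss2(lambda x: x**4) - 2/5)
print(max(errs), gap)
```

The quartic gap is exactly 2/5 − 2/9 = 8/45, matching the computation above.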

Example 14.3 (Three points). Can we make a three-point quadrature rule exact on P_5? Here the unknowns are {w_i, x_i}_{i=0}^2. Assume the scheme is symmetric about the origin, i.e.

Q(f) = w_1 f(−x_1) + w_0 f(0) + w_1 f(x_1).

Notice that the symmetry implies that all odd degree conditions hold:

0 = ∫_{−1}^1 x^{2j+1} dx = −w_1 x_1^{2j+1} + w_0 · 0 + w_1 x_1^{2j+1} = Q(x^{2j+1}), j ≥ 0.

We now check the even degree conditions:

2 = ∫_{−1}^1 1 dx ?= 2 w_1 + w_0
2/3 = ∫_{−1}^1 x^2 dx ?= 2 w_1 x_1^2
2/5 = ∫_{−1}^1 x^4 dx ?= 2 w_1 x_1^4.

Divide the third relation by the second to get

3/5 = x_1^2 ⟹ x_1 = √(3/5).

From the second relation we compute w_1:

1/3 = w_1 (3/5) ⟹ w_1 = 5/9.

This in the first constraint implies that

10/9 + w_0 = 2 ⟹ w_0 = 8/9,

and the scheme reads

Q(f) = (5/9) f(−√(3/5)) + (8/9) f(0) + (5/9) f(√(3/5)).

This scheme is exact on P_5 but not on P_6.


Definition 14.1 (Gaussian Quadrature). A quadrature involving n + 1 points x_0, ..., x_n which is exact on P_{2n+1} is called a Gaussian quadrature.

Generalization: Weighted Gaussian Quadrature. Given a non-negative weight function w(x), vanishing only on a discrete set of points, we want to derive Gaussian quadrature schemes such that

∫_a^b w(x) f(x) dx ≈ Σ_{i=0}^n w_i f(x_i).

Notice that the assumption on the weight function implies that when [α, β] ⊆ [a, b] with α < β, then

∫_α^β w(x) dx > 0.

We define

〈f, g〉_w := ∫_a^b w(x) f(x) g(x) dx.

The mapping 〈·, ·〉_w provides an inner product on C[a, b], i.e.

(1) 〈·, ·〉_w is bilinear, i.e.

〈αf + βg, h〉_w = α〈f, h〉_w + β〈g, h〉_w

and

〈h, αf + βg〉_w = α〈h, f〉_w + β〈h, g〉_w,

for f, g, h ∈ C[a, b] and α, β ∈ R.

(2) 〈·, ·〉_w is symmetric, i.e.

〈f, g〉_w = 〈g, f〉_w, f, g ∈ C[a, b].

(3) 〈·, ·〉_w is positive definite, i.e.

〈f, f〉_w ≥ 0, f ∈ C[a, b],

and equals 0 only if f is the zero function, i.e. f(x) = 0 for all x ∈ [a, b].

The above three properties imply that

‖f‖_w := (〈f, f〉_w)^{1/2}

is a norm on C[a, b] and

|〈f, g〉_w| ≤ ‖f‖_w ‖g‖_w (Cauchy–Schwarz inequality).

Definition 14.2 (w-orthogonality). We say that f is w-orthogonal to P_k if

〈f, p〉_w = 0 for all p ∈ P_k.

Theorem 14.1 (Gaussian Quadrature). Suppose there is a nonzero q_{k+1} ∈ P_{k+1} which is w-orthogonal to P_k. If q_{k+1} has k + 1 distinct roots x_0, ..., x_k, then the quadrature based on the nodes x_0, ..., x_k approximating

I(f) = ∫_a^b w(x) f(x) dx

which is exact on P_k is in fact exact on P_{2k+1}, i.e. it is a Gaussian quadrature.


Note that the quadrature Q being exact on P_k (or P_{2k+1}) means that

I(p) = ∫_a^b w(x) p(x) dx = Q(p)

for every p ∈ P_k (or P_{2k+1}). As in the case w(x) = 1, the exactness conditions uniquely determine the quadrature weights.

We postpone the proof of this theorem until later. For now, we make the following remark.

Remark 14.1 (Leading coefficient of a w-orthogonal polynomial). If q_{k+1} ∈ P_{k+1} is w-orthogonal to P_k and nonzero, then

q_{k+1}(x) = a_{k+1} x^{k+1} + a_k x^k + ... + a_0

with a_{k+1} ≠ 0. Indeed, if a_{k+1} = 0 then q_{k+1} ∈ P_k and

0 = 〈q_{k+1}, q_{k+1}〉_w = ∫_a^b w(x) q_{k+1}^2(x) dx,

which implies that q_{k+1} = 0 and contradicts our assumption. Moreover, since we are only interested in the roots of q_{k+1}, we may assume that q_{k+1} is monic, i.e. a_{k+1} = 1 and

q_{k+1}(x) = x^{k+1} + a_k x^k + ... + a_0.

Example 14.4 (k = 1 and w(x) = 1). Find a monic q ∈ P_2, q(x) = x^2 + αx + β, which is w-orthogonal to P_1 with

〈f, g〉_w = ∫_{−1}^1 f(x) g(x) dx.

We are looking for α and β such that

0 = 〈q, 1〉_w = ∫_{−1}^1 (x^2 + αx + β) dx = 2/3 + α · 0 + 2β,

i.e. 2β = −2/3 or β = −1/3. In addition, we want

0 = 〈q, x〉_w = ∫_{−1}^1 (x^3 + αx^2 + βx) dx = (2/3) α,

and so α = 0. This implies that the desired polynomial is

q(x) = x^2 − 1/3,

which has two roots, namely

±1/√3.

These are the quadrature nodes derived in Example 14.2.

Example 14.5 (k = 2 and w(x) = 1). Find q ∈ P_3,

q(x) = x^3 + αx^2 + βx + γ,


which is w-orthogonal to P_2 with w(x) = 1. The desired polynomial must satisfy the following 3 constraints:

(2/3) α + 2γ = 〈q, 1〉_w = 0
2/5 + (2/3) β = 〈q, x〉_w = 0
(2/5) α + (2/3) γ = 〈q, x^2〉_w = 0.

The first and last constraints hold only if

A (α, γ)^T = (0, 0)^T, where A := ( 2/3  2 ; 2/5  2/3 ).

Note that det(A) = 4/9 − 4/5 ≠ 0, so the only solution is α = γ = 0. From the second constraint, we find that

1/5 + β/3 = 0 ⟹ β = −3/5.

The desired polynomial reads

q(x) = x^3 − (3/5) x = (x^2 − 3/5) x

and has roots −√(3/5), 0, √(3/5).


15. Lecture 15

We start with the proof of the quadrature theorem (Theorem 14.1).

Proof of Theorem 14.1. Let x_0, ..., x_k be the distinct roots of q := q_{k+1} and let Q be the associated quadrature scheme exact on P_k. Recall that q is nonzero, belongs to P_{k+1}, and is w-orthogonal to P_k. Let p ∈ P_{2k+1} and factor (using polynomial division with remainder)

p = qs + r,

with r, s ∈ P_k. Then,

I(p) = ∫_a^b w(x) p(x) dx = ∫_a^b w(x) q(x) s(x) dx + ∫_a^b w(x) r(x) dx = ∫_a^b w(x) r(x) dx,

where the first term vanishes because q is w-orthogonal to P_k and s ∈ P_k. Now, since Q is exact on P_k,

I(p) = Σ_{j=0}^k w_j r(x_j).

Moreover, the nodes x_0, ..., x_k are the roots of q, so that, computing further,

I(p) = Σ_{j=0}^k w_j r(x_j) = Σ_{j=0}^k w_j ( q(x_j) s(x_j) + r(x_j) ) = Σ_{j=0}^k w_j p(x_j) = Q(p),

using q(x_j) = 0. This proves the quadrature is exact on P_{2k+1}.

Lemma 15.1 (Roots of a w-orthogonal polynomial). If q ∈ P_{k+1} is nonzero and is w-orthogonal to P_k, then all of the roots of q are distinct and lie in (a, b).

Proof. We will see in the next lemma (Lemma 15.2) that q has real coefficients. If q does not have any root in (a, b), then q > 0 or q < 0 on (a, b), and so

∫_a^b w(x) q(x) dx > 0 or ∫_a^b w(x) q(x) dx < 0,

either way contradicting the w-orthogonality of q to P_0 ⊂ P_k.

Suppose now that q has 1 ≤ l < k + 1 roots in (a, b), denoted y_1, y_2, ..., y_l (repeated according to their multiplicity), and set

r(x) = Π_{y_j root of odd multiplicity} (x − y_j).

As the polynomial q changes sign across a root of odd multiplicity (as does r(x)), the product q(x) r(x) has constant sign except at those y_j of odd multiplicity. This implies that

∫_a^b w(x) q(x) r(x) dx ≠ 0.

As r ∈ P_k, this contradicts the w-orthogonality to P_k. As a consequence, q must have k + 1 roots in (a, b), and since deg q = k + 1 they must be distinct.


Lemma 15.2 (Real coefficients). There is a unique monic real polynomial q ∈ P_{k+1} which is w-orthogonal to P_k.

Proof. Let q = x^{k+1} + α_k x^k + ... + α_0. Then q is w-orthogonal to P_k if and only if

〈q, x^j〉_w = 0, j = 0, ..., k,

i.e.

〈x^{k+1}, x^j〉_w + Σ_{l=0}^k α_l 〈x^l, x^j〉_w = 0, j = 0, ..., k.

This is equivalent to

A (α_0, ..., α_k)^T = F,

where the coefficients of the matrix A are given by

A_{j,l} = 〈x^l, x^j〉_w, j, l = 0, ..., k,

and

F_j = −〈x^{k+1}, x^j〉_w, j = 0, ..., k.

Suppose that Aβ = 0 for some β ∈ R^{k+1}. Set

r(x) = β_k x^k + β_{k−1} x^{k−1} + ... + β_0.

The jth equation of Aβ = 0 is

0 = Σ_{l=0}^k A_{j,l} β_l = Σ_{l=0}^k 〈x^l, x^j〉_w β_l = Σ_{l=0}^k 〈β_l x^l, x^j〉_w = 〈r(x), x^j〉_w, j = 0, 1, ..., k,

i.e. r(x) is w-orthogonal to P_k. As r ∈ P_k,

0 = 〈r(x), r(x)〉_w ⟹ r(x) = 0, i.e. β = 0.

This proves that A is nonsingular. As A is real valued, so is A^{−1}. (For instance, the inverse can be computed by row reducing (A : I) → (I : A^{−1}).)

Every weighted quadrature problem gives rise to a sequence of orthogonal polynomials. The sequence satisfies a 3 term recurrence.

Start with p_0 = 1 ∈ P_0 (nothing to be orthogonal to). Then p_1 ∈ P_1 must be orthogonal to 1. If p_1(x) = x + α, then α must satisfy

0 = 〈x + α, 1〉_w, or α = −〈x, 1〉_w / 〈1, 1〉_w.

Suppose we have computed p_{j−1} and p_j. Write

p_{j+1} = (x + α) p_j + β p_{j−1}.

Then for θ ∈ P_{j−2},

〈p_{j+1}, θ〉_w = 〈(x + α) p_j, θ〉_w + β 〈p_{j−1}, θ〉_w = 〈p_j, (x + α) θ〉_w = 0,

since (x + α)θ ∈ P_{j−1} and p_j, p_{j−1} are orthogonal to P_{j−1}, P_{j−2}, respectively. We also need

0 = 〈p_{j+1}, p_{j−1}〉_w = 〈(x + α) p_j, p_{j−1}〉_w + β 〈p_{j−1}, p_{j−1}〉_w.

The α term goes away (as 〈p_j, p_{j−1}〉_w = 0), so

β = −〈x p_j, p_{j−1}〉_w / 〈p_{j−1}, p_{j−1}〉_w.


Also,

0 = 〈p_{j+1}, p_j〉_w = 〈(x + α) p_j, p_j〉_w + β 〈p_{j−1}, p_j〉_w, with 〈p_{j−1}, p_j〉_w = 0,

and so we find

α = −〈x p_j, p_j〉_w / 〈p_j, p_j〉_w.

The values of α and β determine p_{j+1}. Note that orthogonal polynomials always satisfy 3 term recurrence relations!
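The recurrence can be run in exact rational arithmetic. Below is a Python sketch (helper names are ours) for w(x) = 1 on [−1, 1], where 〈x^i, x^j〉_w = 2/(i + j + 1) when i + j is even and 0 otherwise; it reproduces q(x) = x^3 − (3/5)x from Example 14.5.

```python
from fractions import Fraction

def inner(p, q):
    """<p, q>_w on [-1, 1] with w = 1, for coefficient lists (lowest degree first)."""
    s = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:            # odd monomials integrate to 0
                s += a * b * Fraction(2, i + j + 1)
    return s

def shift(p):                               # multiply a polynomial by x
    return [Fraction(0)] + list(p)

def axpy(c, p, q):                          # c*p + q, padded to a common length
    n = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (n - len(p))
    q = list(q) + [Fraction(0)] * (n - len(q))
    return [c * a + b for a, b in zip(p, q)]

# three-term recurrence: p_{j+1} = (x + alpha) p_j + beta p_{j-1}
p_prev, p = [Fraction(1)], [Fraction(0), Fraction(1)]   # p_0 = 1, p_1 = x
for _ in range(2):
    alpha = -inner(shift(p), p) / inner(p, p)
    beta = -inner(shift(p), p_prev) / inner(p_prev, p_prev)
    p_prev, p = p, axpy(beta, p_prev, axpy(alpha, p, shift(p)))

print(p)  # coefficients of x^3 - (3/5)x, lowest degree first
```

Along the way the loop produces p_2 = x^2 − 1/3 (Example 14.4), so the sketch doubles as a check of both examples.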

Rodrigues Formula for Legendre Polynomials.

Example 15.1 (w(x) = 1, a = −1, b = 1). Consider the approximation of I(f) = ∫_{−1}^1 f(x) dx. Then

p_n(x) = (1/((2n)(2n − 1) · ... · (n + 1))) d^n/dx^n [(x^2 − 1)^n].

To see this, we check that it is a monic polynomial of degree n and w-orthogonal to P_{n−1}. We leave the first part as an exercise (Exercise 15.1). Now, if p ∈ P_{n−1}, integration by parts gives

∫_{−1}^1 d^n/dx^n [(x^2 − 1)^n] p(x) dx = d^{n−1}/dx^{n−1} [(x^2 − 1)^n] p(x) |_{−1}^1 − ∫_{−1}^1 d^{n−1}/dx^{n−1} [(x^2 − 1)^n] p′(x) dx,

where the boundary term vanishes.

Repeating this by moving all derivatives over to p, we arrive at

∫_{−1}^1 d^n/dx^n [(x^2 − 1)^n] p(x) dx = (−1)^n ∫_{−1}^1 (x^2 − 1)^n p^{(n)}(x) dx = 0

because p^{(n)} = 0 for p ∈ P_{n−1}.

Definition 15.1 (Legendre Polynomials). The polynomial

P_n(x) = (1/(2^n n!)) d^n/dx^n [(x^2 − 1)^n]

is called the Legendre polynomial (a different normalization of p_n above) and satisfies

(n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) − n P_{n−1}(x).

Exercise 15.1. Show that

p_n(x) = (1/((2n)(2n − 1) · ... · (n + 1))) d^n/dx^n [(x^2 − 1)^n]

is a polynomial of degree n and is monic (i.e. the leading coefficient is 1).

Rodrigues Formula for Chebyshev Polynomials.

Example 15.2 (Chebyshev polynomials). We recall that the Chebyshev polynomials are given by

T_n(x) = cos(n cos^{−1}(x)).

Note that for n ≠ j,

∫_0^π cos(nθ) cos(jθ) dθ = 0.

We leave the above claim as an exercise.


Set θ = cos^{−1}(x); then x = cos(θ) and

dx = −sin(θ) dθ = −√(1 − cos^2(θ)) dθ = −√(1 − x^2) dθ.

Using the orthogonality above,

0 = ∫_0^π cos(nθ) cos(jθ) dθ = ∫_{−1}^1 cos(n cos^{−1}(x)) cos(j cos^{−1}(x)) dx/√(1 − x^2) = ∫_{−1}^1 (1/√(1 − x^2)) T_n(x) T_j(x) dx,

i.e. the Chebyshev polynomials T_n are orthogonal polynomials on (−1, 1) with weight w(x) = 1/√(1 − x^2), and they satisfy the recurrence

T_{n+1}(x) = 2x T_n(x) − T_{n−1}(x)

as we have seen already in Section 5.1.
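Both the orthogonality (via the substitution θ = cos^{−1}(x), which removes the singular weight) and the recurrence are easy to confirm numerically; here is a Python sketch (`cheb_inner` is our name for the θ-form of the inner product):

```python
import math

def cheb_inner(n, j, m=20000):
    """Midpoint-rule approximation of <T_n, T_j>_w = integral of
    cos(n*theta) cos(j*theta) over [0, pi]."""
    h = math.pi / m
    return h * sum(math.cos(n * (k + 0.5) * h) * math.cos(j * (k + 0.5) * h)
                   for k in range(m))

# orthogonality for n != j; for n = j >= 1 the integral is pi/2
print(abs(cheb_inner(3, 5)), abs(cheb_inner(4, 4) - math.pi / 2))

# the recurrence T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x) at a sample point
T = lambda n, x: math.cos(n * math.acos(x))
print(abs(T(4, 0.3) - (2 * 0.3 * T(3, 0.3) - T(2, 0.3))))
```

All three printed quantities should be vanishingly small (up to quadrature and rounding error).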


16. Lecture 16

We saw last lecture that the Chebyshev polynomials are orthogonal with respect to the inner product

〈f, g〉_w = ∫_{−1}^1 (1/√(1 − x^2)) f(x) g(x) dx, with w(x) := 1/√(1 − x^2),

i.e. T_{n+1} satisfies T_{n+1} ∈ P_{n+1} and

〈T_{n+1}, T_j〉_w = 0, 0 ≤ j ≤ n.

As {T_j}_{j=0}^n is a basis for P_n, T_{n+1} is w-orthogonal to P_n.

The Rodrigues formula for the Chebyshev polynomials reads (up to a normalization constant, see below)

T̃_n(x) = (1/w(x)) d^n/dx^n ( w(x) (1 − x^2)^n ) = (1 − x^2)^{1/2} d^n/dx^n ( (1 − x^2)^{n−1/2} ),

using the definition of the weight w(x) = (1 − x^2)^{−1/2}. Note the Leibniz rule

d^n/dx^n (fg) = Σ_{j=0}^n (n choose j) f^{(j)} g^{(n−j)}

(you can prove this by induction). Therefore,

T̃_n(x) = (1 − x^2)^{1/2} d^n/dx^n ( (1 − x)^{n−1/2} (1 + x)^{n−1/2} )
= (1 − x^2)^{1/2} Σ_{j=0}^n (n choose j) c_{j,n} (1 − x)^{n−j−1/2} (1 + x)^{j−1/2}
= Σ_{j=0}^n (n choose j) c_{j,n} (1 − x)^{n−j} (1 + x)^j

for suitable constants c_{j,n}. This proves that T̃_n ∈ P_n. We now check that it is w-orthogonal to P_{n−1}. We use integration by parts again:

I := ∫_{−1}^1 (1 − x^2)^{−1/2} T̃_n(x) p(x) dx = ∫_{−1}^1 d^n/dx^n ( (1 − x^2)^{n−1/2} ) p(x) dx

= p(x) d^{n−1}/dx^{n−1} ( (1 − x)^{n−1/2} (1 + x)^{n−1/2} ) |_{x=−1}^{x=1} − ∫_{−1}^1 d^{n−1}/dx^{n−1} ( (1 − x^2)^{n−1/2} ) p′(x) dx.

By the Leibniz rule,

d^{n−1}/dx^{n−1} ( (1 − x)^{n−1/2} (1 + x)^{n−1/2} ) = Σ_{j=0}^{n−1} (n−1 choose j) c_{n−1,j} (1 − x)^{n−j−1/2} (1 + x)^{j+1/2},

and all terms evaluated at either x = −1 or x = 1 are zero because they carry positive powers of (1 − x) and (1 + x); hence the boundary term vanishes. Repeating the argument gives

I = (−1)^n ∫_{−1}^1 (1 − x^2)^{−1/2} (1 − x^2)^n p^{(n)}(x) dx = 0

for p ∈ P_{n−1}. Since T_n and T̃_n differ at most by a normalization constant, T_n is also a polynomial of degree n, w-orthogonal to P_{n−1}.


17. Lecture 17

Nonlinear Equations

Essentially, the only way one can solve nonlinear equations is by iteration. The quadratic formula enables one to compute the roots of p(x) = 0 when p ∈ P_2. Formulas for the roots of p ∈ P_3 and p ∈ P_4 as expressions involving radicals were derived around 1545. The case of quintics was studied unsuccessfully for almost 300 years until Abel (1824) proved that no such formula exists. As we shall see, the roots of quintics (and other nonlinear equations) can nevertheless be computed by iteration!

Example 17.1 (Eigenvalues). Let A be an n × n matrix with n ≥ 5. We propose to compute the eigenvalues of A. The eigenvalues of A are the roots of the characteristic polynomial, i.e.

p(λ) = det(A − λI).

The characteristic polynomial p has degree n and leading term (−1)^n λ^n. By Abel's result, this problem can in general only be solved by iteration when n ≥ 5.

Bisection method. Suppose f is continuous on [a, b] and satisfies f(a) f(b) < 0 (f(a) and f(b) do not have the same sign). The intermediate value theorem tells us there is at least one root x∗ ∈ (a, b) with f(x∗) = 0. The bisection method is used to find such a root.

Bisection Algorithm (mathematical)
Set a_0 = a and b_0 = b
For i = 0, 1, 2, ...
    Set a_{i+1/2} = (a_i + b_i)/2
    If f(a_{i+1/2}) = 0
        STOP, a_{i+1/2} is the desired root
    Else If f(a_{i+1/2}) f(a_i) > 0
        Set a_{i+1} = a_{i+1/2} and b_{i+1} = b_i
    Else
        Set a_{i+1} = a_i and b_{i+1} = a_{i+1/2}
End For

It is obvious that after j steps, either we have found a root or there is a root in (a_j, b_j). In the latter case the root x∗ is at most half of the interval length away from the midpoint, i.e.

x∗ = (a_j + b_j)/2 + ε,

where

|ε| ≤ (b_j − a_j)/2 = (b − a)/2^{j+1}.

We now describe the matlab version of the mathematical bisection algorithmusing minimal memory usage.

%%% R is the root approximation after N steps
%%% A, B are real numbers
%%% F is a continuous function
%%% F(A)F(B) <= 0
function R = BISECTION(A, B, N, F)

%% Preliminaries: A or B is a root, or F(A)F(B) > 0
SA = sign(F(A));
if (SA == 0)
    R = A;
    return;
end

SB = sign(F(B));
if (SB == 0)
    R = B;
    return;
end

if (SA == SB)
    fprintf('Input error to bisection\n');
    R = NaN(1); % return not a number
    return;
end

%% all the preliminaries are done
for I = 1:N
    AV = (A+B)/2;   % midpoint of the current interval
    FAV = F(AV);
    S = sign(FAV);
    if (S == 0)
        R = AV;     % exact root found
        return;
    end
    if (S == SA)
        A = AV;     % the root lies in (AV, B)
    else
        B = AV;     % the root lies in (A, AV)
    end
end
R = (A+B)/2;        % approximation after N bisections
end

This algorithm will only find the real roots of a real valued function f : R → R.

Example 17.2 (Cubic). Let f(x) = x^3 + x = x(x^2 + 1). The roots of f are 0, i, −i, and bisection will only find the real root x = 0. To find the complex roots, we need to treat f as a complex valued function. We define F : R^2 ≃ C → C ≃ R^2 by

F(R, I) ∼ f(R + iI) = (R + iI)^3 + (R + iI) = R^3 + 3iR^2 I − 3RI^2 − iI^3 + (R + iI)
= (R^3 − 3RI^2 + R) + (3R^2 I − I^3 + I) i ∼ (R^3 − 3RI^2 + R, 3R^2 I − I^3 + I)


for real numbers R, I. Finding the complex roots then corresponds to solving the system

F(R, I) := (R^3 − 3RI^2 + R, 3R^2 I − I^3 + I) = (0, 0).

The previous example illustrates the need for iterative techniques for systems (but it is far from the only reason).

Let F : R^n → R^n be a vector valued function on R^n. We want to find roots x∗ ∈ R^n satisfying

F(x∗) = 0 ∈ R^n.

This gives n equations, and x∗_1, ..., x∗_n are the n unknowns.

Example 17.3 (n = 2).

F(x_1, x_2) := (x_1^2 + 4 x_2^2 − 1, 4 x_1^2 + x_2^2 − 1) = (0, 0).

Refer to Figure 6 for an illustration of the situation.

Figure 6. There are 4 roots of the system x_1^2 + 4x_2^2 − 1 = 0 and 4x_1^2 + x_2^2 − 1 = 0.

Example 17.4 (n = 1). Consider

f(x) = sin(x) = 0.

The roots are x = jπ, j ∈ Z = {..., −2, −1, 0, 1, 2, ...}.

Definition 17.1 (Fixed Point Equation). A fixed point equation is one of the form

x = G(x),

with x ∈ R^n and G : R^n → R^n.


A solution to a fixed point equation is x∗ ∈ R^n satisfying

x∗ = G(x∗).

We can turn F(x) = 0 into a fixed point problem, i.e.

x = x − F(x), i.e. G(x) := x − F(x).

This means that x∗ solves F(x∗) = 0 if and only if x∗ = G(x∗). Conversely, fixed point problems can be turned into F(x) = 0 by setting F(x) = x − G(x). We could also have used

x = x − B F(x) =: G(x)

for any nonsingular n × n matrix B. We will see the importance of the matrix B soon.


18. Lecture 18

We considered in the last lecture the fixed point formulation: find x∗ ∈ R^n satisfying

x∗ = G(x∗),

where G : R^n → R^n. We now discuss an algorithm for approximating such an x∗.

Fixed Point Iteration or Picard Iteration. Start with an initial iterate (guess) x^0 ∈ R^n. Then, for i = 0, 1, 2, ..., set

x^{i+1} = G(x^i).

We now want to understand when x^i → x∗.

Definition 18.1 (Lipschitz). Let Ω ⊂ R^n. The vector-valued function G : Ω → R^n is called Lipschitz continuous if there is an M ≥ 0 with

‖G(x) − G(y)‖_∞ ≤ M ‖x − y‖_∞

for all x, y ∈ Ω. Here, for w ∈ R^n,

‖w‖_∞ := max_{i=1,...,n} |w_i|.

Definition 18.2 (Contraction Mapping). Let Ω ⊂ R^n. The vector-valued function G : Ω → Ω is a contraction mapping if G is Lipschitz continuous with constant M = ρ < 1.

Theorem 18.1 (Contraction Mapping Theorem). Let Ω be a closed subset of R^n and G be a contraction mapping of Ω into Ω with constant ρ. Then there is a unique fixed point x∗ ∈ Ω (satisfying x∗ = G(x∗)), and the Picard iteration {x^j}_{j=0}^∞, starting with any x^0 ∈ Ω, converges to x∗ and satisfies

‖x^j − x∗‖_∞ ≤ (ρ^j/(1 − ρ)) ‖x^0 − x∗‖_∞.

This is often called linear convergence. We postpone the proof until later.

To understand the contraction mapping hypothesis, we consider the equation F(x) = 0, where F : R → R, and assume that it has a solution x∗ ∈ R, i.e. F(x∗) = 0. Furthermore, we assume that F ∈ C^2 in a neighborhood B_{δ_1}(x∗) := [x∗ − δ_1, x∗ + δ_1] and that F′(x∗) ≠ 0. Notice that the latter condition guarantees that there is a neighborhood B_{δ_2}(x∗) (δ_2 ≤ δ_1) such that

|F′(ξ)| ≥ (1/2) |F′(x∗)|

for every ξ ∈ B_{δ_2}(x∗). This implies that

|F′(ξ)|^{−1} ≤ 2 |F′(x∗)|^{−1}, ξ ∈ B_{δ_2}(x∗).

Now, for δ ≤ δ_2 (to be determined), we pick w ∈ B_δ(x∗) and set

(9) G(x) = x − (F′(w))^{−1} F(x).

Clearly x∗ is a fixed point of G, and for x, y ∈ B_δ(x∗),

G(x) − G(y) = G′(y)(x − y) + (G″(ξ)/2)(x − y)^2,

for some ξ between x and y. Therefore,

|G(x) − G(y)| ≤ |G′(y)| |x − y| + (|G″(ξ)|/2) (x − y)^2.


However,

G′(y) = 1 − (F′(w))^{−1} F′(y) = (F′(w) − F′(y)) (F′(w))^{−1} = F″(ξ_1) (w − y) (F′(w))^{−1}

for some ξ_1 between w and y. Now, by the C^2 assumption, there exists a constant M such that |F″(θ)| ≤ M for every θ ∈ B_{δ_1}(x∗). Thus,

|G′(y)| ≤ 4M |F′(x∗)|^{−1} δ,

so

(10) |G(x) − G(y)| ≤ (4M |F′(x∗)|^{−1} δ + Mδ) |x − y|,

using the fact that |w − y| ≤ 2δ and |x − y| ≤ 2δ. Given 0 < ρ < 1, we choose δ so that

4M |F′(x∗)|^{−1} δ + Mδ < ρ.

This implies that G is a contraction mapping, and so the Picard iterates

x^{i+1} = x^i − (F′(w))^{−1} F(x^i)

converge to x∗ provided |x^0 − x∗| ≤ δ and |w − x∗| ≤ δ. The next theorem generalizes this argument to R^n.
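This frozen-slope iteration is easy to try in the scalar case. A Python sketch (our helper name `chord_iteration`; the example F, w and x0 are illustrative choices, not from the notes):

```python
import math

def chord_iteration(F, dF, w, x0, steps):
    """Picard iterates x_{i+1} = x_i - F(x_i)/F'(w), with the slope frozen at w."""
    slope = dF(w)
    x = x0
    for _ in range(steps):
        x = x - F(x) / slope
    return x

# F(x) = x^2 - 2 has the root sqrt(2); freeze the slope near the root
root = chord_iteration(lambda x: x * x - 2.0, lambda x: 2.0 * x,
                       w=1.5, x0=1.5, steps=40)
print(abs(root - math.sqrt(2.0)))  # small: linear convergence with rate rho < 1
```

Each step reduces the error by roughly the constant factor ρ, the linear convergence asserted by Theorem 18.1.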

Theorem 18.2 (Secant Algorithm). Assume F : R^n → R^n and F(x∗) = 0 for some x∗ ∈ R^n with

(1) F ∈ C^2 in a neighborhood of x∗;
(2) DF(x∗), the derivative matrix at x∗ given by

(DF(x∗))_{ij} = ∂F_i(x∗)/∂x_j,

nonsingular.

Then there is a δ > 0 such that if w ∈ B_δ(x∗) and x^0 ∈ B_δ(x∗), the iteration

x^{i+1} = x^i − (DF(w))^{−1} F(x^i)

converges to x∗.

The proof is as in the case of scalar functions discussed before the theorem, but more complicated due to the matrix notation. The major obstacle in applying this algorithm is getting close enough to the root to come up with w (then we can always take x^0 = w).


19. Lecture 19

We start with the proof of the contraction mapping theorem (Theorem 18.1).

Proof of Theorem 18.1. Let j ≤ k and l ≥ 0, and let x^{j+1} = G(x^j) with x^0 ∈ Ω ⊂ R^n (closed) and G a contraction (Lipschitz constant ρ < 1) on Ω. Note that

x^{k+l+1} − x^k = (x^{k+l+1} − x^{k+l}) + (x^{k+l} − x^{k+l−1}) + ... + (x^{k+1} − x^k).

Now,

x^{m+1} − x^m = G(x^m) − G(x^{m−1}),

and so

‖x^{m+1} − x^m‖_∞ = ‖G(x^m) − G(x^{m−1})‖_∞ ≤ ρ ‖x^m − x^{m−1}‖_∞.

Repeating,

‖x^{m+k+1} − x^{m+k}‖_∞ ≤ ρ^k ‖x^{m+1} − x^m‖_∞,

and thus

‖x^{k+l+1} − x^k‖_∞ ≤ ‖x^{k+l+1} − x^{k+l}‖_∞ + ‖x^{k+l} − x^{k+l−1}‖_∞ + ... + ‖x^{k+1} − x^k‖_∞
≤ (ρ^l + ρ^{l−1} + ... + 1) ‖x^{k+1} − x^k‖_∞
≤ (ρ^l + ρ^{l−1} + ... + 1) ρ^k ‖x^1 − x^0‖_∞
≤ ρ^{k−j} (ρ^j/(1 − ρ)) ‖x^1 − x^0‖_∞ ≤ (ρ^j/(1 − ρ)) ‖x^1 − x^0‖_∞,

since ρ^{k−j} ≤ 1.

This means that if m, l > j,

‖x^m − x^l‖_∞ ≤ (ρ^j/(1 − ρ)) ‖x^1 − x^0‖_∞.

The quantity on the right side can be made as small as we want by taking j large. This implies that the sequence {x^j} is a Cauchy sequence and so converges to some x∗ ∈ Ω. (Recall that R^n is complete, and Ω closed implies that Ω is complete.) Moreover, using x^j = G(x^{j−1}),

‖x∗ − G(x∗)‖_∞ ≤ ‖x∗ − x^j‖_∞ + ‖G(x^{j−1}) − G(x∗)‖_∞ ≤ ‖x∗ − x^j‖_∞ + ρ ‖x∗ − x^{j−1}‖_∞.

As x^j converges to x∗, the quantity on the right can be made as small as desired by taking j large, i.e.

x∗ = G(x∗).

This shows that every Picard iteration converges to a fixed point. The fixed point is unique. Indeed, if x∗_1 = G(x∗_1) is another fixed point, then

‖x∗_1 − x∗‖_∞ = ‖G(x∗_1) − G(x∗)‖_∞ ≤ ρ ‖x∗_1 − x∗‖_∞,

i.e.

(1 − ρ) ‖x∗_1 − x∗‖_∞ ≤ 0,

and therefore ‖x∗_1 − x∗‖_∞ = 0, i.e. x∗_1 = x∗.


Newton's Method. We start with a motivation. We look for a zero of F, where F : R^n → R^n. Assume F ∈ C^2 and that there is a zero x∗ of F such that DF(x∗) is nonsingular. As in the previous lecture, there is a neighborhood B_δ(x∗) such that DF(x) is nonsingular when x ∈ B_δ(x∗). Assume that x^j ∈ B_δ(x∗). The Taylor theorem for vector fields guarantees that

F(x) = F(x^j) + DF(x^j)(x − x^j) + ε(x),

where ‖ε(x)‖ = O(‖x − x^j‖^2). Ignoring the error term and recalling that we want F(x∗) = 0, we choose x^{j+1} by

0 = F(x^j) + DF(x^j)(x^{j+1} − x^j), i.e. x^{j+1} = x^j − DF(x^j)^{−1} F(x^j).

This is Newton's iterative method. Note that this iteration is of the form

x^{j+1} = G_j(x^j), G_j(x) = x − DF(x^j)^{−1} F(x),

so it is not quite a fixed point iteration because G_j changes at each step!

The mathematical analysis of this iterative scheme follows that provided before Theorem 18.2. For simplicity, we again consider the scalar case. The main difference is that w in (9) is replaced by x^j. As in the previous analysis, for

(1) x∗ satisfying F(x∗) = 0;
(2) F′(x∗) ≠ 0;
(3) F ∈ C^2 in a neighborhood of x∗,

we assume that x^j ∈ B_δ(x∗), with δ ≤ δ_2 to be determined. Now,

x∗ − x^{j+1} = G_j(x∗) − G_j(x^j),

so that

|x∗ − x^{j+1}| ≤ |G_j(x∗) − G_j(x^j)|.

Since x^j ∈ B_δ(x∗), we have, as in (10),

|G_j(x∗) − G_j(x^j)| ≤ (4M |F′(x∗)|^{−1} δ + Mδ) |x∗ − x^j|.

As long as |x∗ − x^j| ≤ δ_2, we may take δ = |x∗ − x^j| to conclude

|x∗ − x^{j+1}| ≤ M (4 |F′(x∗)|^{−1} + 1) |x∗ − x^j|^2.

This is called quadratic convergence. Once Newton's iterates get close enough to the solution that quadratic convergence kicks in, the convergence is extremely fast.
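The "extremely fast" claim is easy to observe: the number of correct digits roughly doubles per step. A scalar Python sketch (helper name and example F are ours):

```python
import math

def newton_errors(F, dF, x0, root, steps):
    """Newton's method x_{j+1} = x_j - F(x_j)/F'(x_j), recording |x_j - root|."""
    x, errs = x0, []
    for _ in range(steps):
        x = x - F(x) / dF(x)
        errs.append(abs(x - root))
    return errs

# F(x) = x^2 - 2 with root sqrt(2): the error is roughly squared at each step
errs = newton_errors(lambda x: x * x - 2.0, lambda x: 2.0 * x,
                     x0=1.5, root=math.sqrt(2.0), steps=4)
print(errs)  # each entry is on the order of the square of the previous one
```

Compare with the chord iteration sketched after Theorem 18.1, which only gains a fixed factor per step.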

For a geometric interpretation of Newton's method, we refer to Figure 7.

Example 19.1 (Newton). Consider the function f(x) = x^3 e^{−x^2}, which has only one root (at x = 0); see Figure 8. We compute

f′(x) = (3x^2 − 2x^4) e^{−x^2},

so

f′(x) > 0 for |x| < √(3/2) and f′(x) < 0 for |x| > √(3/2).

The geometric interpretation of Newton's method implies that

(1) if x^0 > √(3/2), Newton's method diverges to ∞;


Figure 7. Geometric interpretation of Newton's method: x^{j+1} is the zero of the tangent line F(x^j) + (x − x^j) F′(x^j) to F at x^j.

Figure 8. f(x) = x^3 e^{−x^2}.

(2) if x^0 < −√(3/2), Newton's method diverges to −∞;
(3) it diverges in a neighborhood of both −√(3/2) and √(3/2);
(4) it converges to 0 otherwise.


20. Lecture 20

Ordinary Differential Equations (ODEs)

Look for x(t) ∈ C^1(a, b) such that

x′(t) = f(t, x(t)), t ∈ (a, b); x(t_0) = x_0,

for some t_0 ∈ (a, b), x_0 ∈ R and a function f : (a, b) × R → R. We ask the following questions:

(1) Is there a function x(t) satisfying the ODE, and is it unique?
(2) How can solutions be approximated?

Example 20.1 (Non-Existence). Consider the following ODE

x′ = x tan(t), x(0) = 1.

We check that x(t) = sec(t) = 1/cos(t) is a solution. Indeed,

x′ = sin(t)/cos^2(t) = (1/cos(t)) tan(t) = x tan(t),

and x(0) = 1/cos(0) = 1. The function x(t) = 1/cos(t) is pictured in Figure 9.

Figure 9. Function x(t) = 1/cos(t).

This solution is only continuously defined on (−π/2, π/2) and is unique on that interval. The blow-up of the solution at t = ±π/2 is caused by the blow-up of tan(t) at t = ±π/2 (see later).

Example 20.2 (Non-Existence 2). Consider now the ODE

x′ = 1 + x^2, x(0) = 0.

In this case we find the solution by separating the variables and taking anti-derivatives:

∫ dx/(1 + x^2) = ∫ 1 dt + C.

This implies that

arctan(x) = t + C


or

x = tan(t+ C).

Using the initial condition, we find that C = 0, i.e.

x(t) = tan(t).

Figure 10 depicts this function. Although the ODE looks harmless, the solutionx(t) = tan(t) is continuously defined (and unique) on (−π/2, π/2) and blows up atthe endpoints.

Figure 10. Function x(t) = tan(t).

Example 20.3 (Non-Uniqueness). Consider now the ODE

x′ = x^{2/3}, x(0) = 0.

Clearly x(t) = 0 is a solution. Alternatively,

∫ dx/x^{2/3} = ∫ 1 dt + C.

This implies that

3x^{1/3} = t + C

or

x(t) = ((t + C)/3)³.

Using the initial condition, we find that C = 0, i.e.

x(t) = (t/3)³.

We check that

x′(t) = (3/3)(t/3)² = ((t/3)⁶)^{1/3} = (x²)^{1/3} = x^{2/3}.

Hence, we found 2 solutions (non-uniqueness). There are, in fact, infinitely many solutions.


Systems of ODEs are treated similarly. We are given F(t, x) : (a, b) × R^n → R^n and write

x(t) = (x_1(t), x_2(t), ..., x_n(t))ᵗ, x′(t) = (x′_1(t), x′_2(t), ..., x′_n(t))ᵗ.

We look for solutions x_i(t) ∈ C¹(a, b) for i = 1, ..., n satisfying

(11) x′(t) = F(t, x), t ∈ (a, b); x(t_0) = x_0,

where x_0 ∈ R^n and t_0 ∈ (a, b) are given.

Definition 20.1 (Uniform Lipschitz). Assume that F is continuous on [a, b] × R^n. Then F satisfies a uniform Lipschitz condition if

|F(t, x_1) − F(t, x_2)| ≤ L|x_1 − x_2|

for all t ∈ [a, b] and x_1, x_2 ∈ R^n.

Theorem 20.1 (Existence and Uniqueness). If F satisfies a uniform Lipschitz condition, the system of ODEs (11) has a unique solution on [a, b].

Proof. (Sketch using Picard Iteration). We start by noting that integrating the differential equation gives

x(t) − x_0 = ∫_{t_0}^{t} x′(s) ds = ∫_{t_0}^{t} F(s, x(s)) ds,

i.e.

(12) x(t) = x_0 + ∫_{t_0}^{t} F(s, x(s)) ds.

This is a fixed point equation for the vector-valued function x(t) on B_δ(t_0) = (t_0 − δ, t_0 + δ) for some δ > 0. Hence, we define a sequence of functions x_j : B_δ(t_0) → R^n by

x_0(t) = x_0, x_{j+1}(t) = x_0 + ∫_{t_0}^{t} F(s, x_j(s)) ds.

Now, the proof follows the contraction mapping theorem (Theorem 18.1). For a vector-valued function u(t) defined on B_δ(t_0), we set

‖u‖_∞ = max_{t ∈ B_δ(t_0)} ‖u(t)‖_∞.

As

x_{j+1}(t) = x_0 + ∫_{t_0}^{t} F(s, x_j(s)) ds,

we have that

‖x_{j+1}(t) − x_j(t)‖_∞ = ‖∫_{t_0}^{t} (F(s, x_j(s)) − F(s, x_{j−1}(s))) ds‖_∞
≤ |t − t_0| max_{s between t_0 and t} ‖F(s, x_j(s)) − F(s, x_{j−1}(s))‖_∞
≤ δL max_{s between t_0 and t} ‖x_j(s) − x_{j−1}(s)‖_∞.

c© 65

Page 66: NUMERICAL METHODSpasciak/classes/609/lectures.pdf · NUMERICAL METHODS ANDREA BONITO AND JOSEPH E. PASCIAK Rodrigues Formula for Chebyshev Polynomials 51 16. Lecture 16 53 17. Lecture

NUMERICAL METHODS ANDREA BONITO AND JOSEPH E. PASCIAK

Thus, ‖x_{j+1} − x_j‖_∞ ≤ δL‖x_j − x_{j−1}‖_∞. This sequence converges in ‖·‖_∞ if ρ := δL < 1, which is guaranteed upon choosing δ sufficiently small. We denote x* := lim_{j→∞} x_j. One then shows that each x_j is continuous and ‖x_j‖_∞ ≤ C‖x_0‖_∞. As x_j converges uniformly on B_δ(t_0), its limit x* is continuous and is the unique fixed point of (12). Differentiating the fixed point equation (12) shows that

(x*)′ = F(t, x*).
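The Picard iteration in this proof can also be carried out numerically. The sketch below (Python, an illustration that is not part of the notes) applies the iteration to the scalar ODE x′ = x, x(0) = 1 on a small interval, approximating each integral with the composite trapezoidal rule; the iterates converge toward e^t.

```python
# Picard iteration x_{j+1}(t) = x0 + int_{t0}^t F(s, x_j(s)) ds for x' = x, x(0) = 1.
# Integrals are approximated on a fixed grid with the composite trapezoidal rule.
import math

def picard(F, x0, t0, delta, n_grid=200, n_iter=30):
    """Return grid points and the final Picard iterate on [t0, t0 + delta]."""
    h = delta / n_grid
    ts = [t0 + i * h for i in range(n_grid + 1)]
    x = [x0] * (n_grid + 1)          # x_0(t) = x0 (constant first iterate)
    for _ in range(n_iter):
        xn = [x0]
        acc = 0.0
        for i in range(n_grid):      # cumulative trapezoidal integral of F(s, x(s))
            acc += 0.5 * h * (F(ts[i], x[i]) + F(ts[i + 1], x[i + 1]))
            xn.append(x0 + acc)
        x = xn
    return ts, x

ts, x = picard(lambda t, y: y, x0=1.0, t0=0.0, delta=0.5)
print(abs(x[-1] - math.exp(0.5)))   # small: the iterates approach the exact solution e^t
```

The interval length delta here plays the role of δ in the proof: for δL < 1 the iteration contracts and converges quickly.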


21. Lecture 21

Numerical Ordinary Differential Equations (ODEs)

We discuss two techniques to approximate the ODE

(d/dt)x(t) = f(t, x(t)).

(1) Replace (d/dt)x(t) by a finite difference approximation.
(2) Integrate:

x(t_{k+1}) − x(t_k) = ∫_{t_k}^{t_{k+1}} (d/dt)x(t) dt = ∫_{t_k}^{t_{k+1}} f(t, x(t)) dt

or

x(t_{k+1}) = x(t_k) + ∫_{t_k}^{t_{k+1}} f(t, x(t)) dt

and replace the integral by a quadrature.

Finite Difference Approximations. We saw that for t ∈ [t_k, t_{k+1}]

(d/dt)x(t) ≈ (x(t_{k+1}) − x(t_k))/h_k,

where h_k = t_{k+1} − t_k. Using this in the ODE at t = t_k yields the Forward Euler or forward difference relation

(x(t_{k+1}) − x(t_k))/h_k = f(t_k, x(t_k)) + O(h_k).

This leads to the ODE scheme

(x_{k+1} − x_k)/h_k = f(t_k, x_k),

where x_k ≈ x(t_k).

Similarly, using t = t_{k+1} in the ODE yields the Backward Euler or backward difference relation

(x(t_{k+1}) − x(t_k))/h_k = f(t_{k+1}, x(t_{k+1})) + O(h_k).

In turn, the ODE scheme becomes

(x_{k+1} − x_k)/h_k = f(t_{k+1}, x_{k+1}),

where again x_k ≈ x(t_k).

Remark 21.1 (Explicit Schemes). Given x_0, Forward Euler determines x_k iteratively according to the relation

x_{k+1} = x_k + h_k f(t_k, x_k).

Given x_k, the computation of the right side of the above relation only involves addition, multiplication and the evaluation of f(t_k, x_k). This is called an explicit method.
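The explicit update is a few lines of code. A minimal Python sketch (illustrative, not from the notes), applied to x′ = x, x(0) = 1 on [0, 1], where the exact value at t = 1 is e:

```python
# Forward Euler: x_{k+1} = x_k + h f(t_k, x_k) on a uniform grid.
import math

def forward_euler(f, x0, t0, T, n):
    """Approximate x' = f(t, x), x(t0) = x0 with n uniform steps up to T."""
    h = (T - t0) / n
    t, x = t0, x0
    for _ in range(n):
        x = x + h * f(t, x)
        t = t + h
    return x

approx = forward_euler(lambda t, x: x, x0=1.0, t0=0.0, T=1.0, n=1000)
print(abs(approx - math.e))  # O(h) error, here about 1.4e-3
```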


Remark 21.2 (Implicit Schemes). Given x_0, Backward Euler determines x_k iteratively according to the relation

x_{k+1} − h_k f(t_{k+1}, x_{k+1}) = x_k.

Given x_k, the problem of determining x_{k+1} is generally nonlinear. It can be written as a fixed point problem

x = x_k + h_k f(t_{k+1}, x)

and one could use Picard or Newton's method. ODE schemes that require the solution of (nonlinear) equations are called implicit. For certain types of problems, implicit methods have better stability properties.

Remark 21.3 (Systems of ODEs). We always derive ODE methods for a single ODE. As we shall see, schemes for systems of ODEs follow immediately from schemes for the single variable ODE.

ODE schemes can also be derived using a “Taylor series technique” as illustrated in the following examples.

Example 21.1 (Forward Euler). Set h_j = t_{j+1} − t_j so that

x(t_{j+1}) = x(t_j) + h_j x′(t_j) + O(h_j^2).

Replace x(t_j) by an approximation x_j, use x′(t_j) = f(t_j, x(t_j)) ≈ f(t_j, x_j), and throw away the O(h_j^2) term:

x_{j+1} = x_j + h_j f(t_j, x_j).

This is forward Euler again.

Example 21.2 (Second Order).

x(t_{j+1}) = x(t_j) + h_j x′(t_j) + (h_j^2/2) x′′(t_j) + O(h_j^3).

As before,

x′(t_j) = f(t_j, x(t_j)),

but also

x′′(t) = (d/dt)x′(t) = (d/dt)f(t, x(t)) = ∂_t f(t, x(t)) + ∂_y f(t, x(t)) x′(t).

Throw away the O(h_j^3) term and replace x(t_j) by x_j to arrive at

x_{j+1} = x_j + h_j f(t_j, x_j) + (h_j^2/2)(f_t(t_j, x_j) + f_y(t_j, x_j) f(t_j, x_j)).
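A sketch of this second order Taylor method in Python (illustrative, not from the notes); note that the user must supply f_t and f_y, which the lecture later identifies as the drawback of these methods. For x′ = x we have f_t = 0 and f_y = 1.

```python
# Second order Taylor method:
# x_{j+1} = x_j + h f + (h^2/2) (f_t + f_y f), all evaluated at (t_j, x_j).
import math

def taylor2(f, ft, fy, x0, t0, T, n):
    h = (T - t0) / n
    t, x = t0, x0
    for _ in range(n):
        fv = f(t, x)
        x = x + h * fv + 0.5 * h * h * (ft(t, x) + fy(t, x) * fv)
        t = t + h
    return x

# x' = x: f = x, f_t = 0, f_y = 1; exact value at t = 1 is e.
approx = taylor2(lambda t, x: x, lambda t, x: 0.0, lambda t, x: 1.0,
                 x0=1.0, t0=0.0, T=1.0, n=100)
print(abs(approx - math.e))  # O(h^2) error
```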

Example 21.3 (Third Order).

x(t_{j+1}) = x(t_j) + h_j x′(t_j) + (h_j^2/2) x′′(t_j) + (h_j^3/6) x′′′(t_j) + O(h_j^4).

Here, we also use

x′′′(t) = (d/dt)x′′(t) = (d/dt)(f_t + f_y f) = f_{tt} + f_{ty} f + f(f_{yt} + f_{yy} f) + f_y(f_t + f_y f)

to deduce the scheme

x_{j+1} = x_j + h_j f(t_j, x_j) + (h_j^2/2)(f_t + f_y f) + (h_j^3/6)(f_{tt} + 2f_{ty} f + f^2 f_{yy} + f_y f_t + f_y^2 f).

All the f's and derivatives are evaluated at (t_j, x_j).


The methods of Examples 21.2 and 21.3 are not used in practice. The reason is that one can derive higher order methods which only require the user to provide a routine for f(t, x) (not its derivatives).

However, we shall use these methods to derive higher order methods only involving f(t, x) for various values of t and x.


22. Lecture 22

Runge-Kutta Methods. These methods are based on the Taylor series technique.

Example 22.1 (Second Order). From Example 21.2:

(13) x(t_{k+1}) = x(t_k) + h_k f(t_k, x(t_k)) + (h_k^2/2)(f_t(t_k, x(t_k)) + f_y(t_k, x(t_k)) f(t_k, x(t_k))) + O(h_k^3).

We look for a method of the form

x(t_{k+1}) = x(t_k) + ω_1 h_k f(t_k, x(t_k)) + ω_2 h_k f(t_k + αh_k, x(t_k) + αh_k f(t_k, x(t_k))) + O(h_k^3)

for some parameters ω_1, ω_2, α. In what follows, f, f_t, f_y and ∇f = (f_t, f_y)ᵗ are evaluated at (t_k, x(t_k)). By Taylor's theorem,

f(t_k + αh_k, x(t_k) + αh_k f) = f + ∇f · (αh_k, αh_k f) + O(h_k^2) = f + αh_k(f_t + f_y f) + O(h_k^2).

Putting it together gives

x(t_{k+1}) = x(t_k) + h_k(ω_1 + ω_2)f + αh_k^2 ω_2 (f_t + f_y f) + O(h_k^3).

Comparing this with (13), we see that we need to set

ω_1 + ω_2 = 1, ω_2 α = 1/2

(3 unknowns, 2 equations, multiple solutions). The numerical method reads

x_{k+1} = x_k + ω_1 h_k f(t_k, x_k) + ω_2 h_k f(t_k + αh_k, x_k + αh_k f(t_k, x_k)).

Example 22.2 (Heun's Method). We set ω_1 = ω_2 = 1/2 and α = 1 to arrive at

x_{k+1} = x_k + (h_k/2)(F_1 + F_2),

where

F_1 = f(t_k, x_k),
F_2 = f(t_k + h_k, x_k + h_k F_1).

Example 22.3 (Modified Euler). We set ω_1 = 0, ω_2 = 1 and α = 1/2 to arrive at

x_{k+1} = x_k + h_k F_2,

where

F_1 = f(t_k, x_k),
F_2 = f(t_k + h_k/2, x_k + (h_k/2) F_1).
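Both two-stage schemes fit one template. A minimal Python sketch (illustration, not from the notes), with Heun (ω_1 = ω_2 = 1/2, α = 1) and modified Euler (ω_1 = 0, ω_2 = 1, α = 1/2) as special cases, tested on x′ = x with exact value e at t = 1:

```python
# Two-stage explicit RK step:
# x_{k+1} = x_k + h*(w1*F1 + w2*F2), F1 = f(t_k, x_k), F2 = f(t_k + a*h, x_k + a*h*F1).
import math

def rk2(f, x0, t0, T, n, w1, w2, a):
    h = (T - t0) / n
    t, x = t0, x0
    for _ in range(n):
        F1 = f(t, x)
        F2 = f(t + a * h, x + a * h * F1)
        x = x + h * (w1 * F1 + w2 * F2)
        t = t + h
    return x

heun = rk2(lambda t, x: x, 1.0, 0.0, 1.0, 100, w1=0.5, w2=0.5, a=1.0)
mod_euler = rk2(lambda t, x: x, 1.0, 0.0, 1.0, 100, w1=0.0, w2=1.0, a=0.5)
print(abs(heun - math.e), abs(mod_euler - math.e))  # both errors are O(h^2)
```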

Remark 22.1 (Heun’s method). Apply Heun’s method to

x′(t) = f(t)

to get

x(tk+1) = x(tk) +hk2

(f(tk) + f(tk + hk)) +O(h3k).

Note that∫ tk+1

tk

x′(t) dt = x(tk+1)− x(tk) =hk2

(f(tk) + f(tk + hk)) +O(h3k).


Thus

∫_{t_k}^{t_{k+1}} f(t) dt = ∫_{t_k}^{t_{k+1}} x′(t) dt ≈ (h_k/2)(f(t_k) + f(t_k + h_k)),

which is the Trapezoidal rule. The latter is locally 3rd order and globally 2nd order. Thus it is natural to define the order of Heun's method to be 2.

Exercise 22.1 (Backward and Forward Euler). Use a similar reasoning to show that the Backward and Forward Euler methods are first order.

Remark 22.2 (Modified Euler). The modified Euler method applied to

x′(t) = f(t)

gives

x(t_{k+1}) = x(t_k) + h_k f(t_k + h_k/2) + O(h_k^3).

This is the midpoint quadrature, which is also globally second order, so the (global) order of modified Euler is 2.

Heun’s methods and modified Euler methods are called (explicit) Runge-Kuttamethods. High order Runge-Kutta methods are tedious to derive but we have beenextensively studied. One such method, which is widely used, is the Runge-Kutta4th order method:

xk+1 = xk +hk6

(F1 + 2F2 + 2F3 + F4) ,

where

F1 = f(tk, xk),

F2 = f(tk +hk2, xk +

hk2F1),

F3 = f(tk +hk2, xk +

hk2F2),

F4 = f(tk + hk, xk + hkF3).

This method is 4th order.
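The classical four-stage step translates directly into code; a Python sketch (illustration, not from the notes), again tested on x′ = x:

```python
# Classical 4th order Runge-Kutta for x' = f(t, x), uniform step h.
import math

def rk4(f, x0, t0, T, n):
    h = (T - t0) / n
    t, x = t0, x0
    for _ in range(n):
        F1 = f(t, x)
        F2 = f(t + h / 2, x + h / 2 * F1)
        F3 = f(t + h / 2, x + h / 2 * F2)
        F4 = f(t + h, x + h * F3)
        x = x + h / 6 * (F1 + 2 * F2 + 2 * F3 + F4)
        t = t + h
    return x

approx = rk4(lambda t, x: x, 1.0, 0.0, 1.0, 50)
print(abs(approx - math.e))  # O(h^4): tiny error already with 50 steps
```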

Remark 22.3 (Simpson). Applying the 4th order Runge-Kutta method to

x′(t) = f(t)

gives

∫_{t_k}^{t_{k+1}} f(t) dt = ∫_{t_k}^{t_{k+1}} x′(t) dt ≈ x_{k+1} − x_k = (h_k/6)(f(t_k) + 4f(t_k + h_k/2) + f(t_k + h_k)).

This is Simpson's Rule, which is locally 5th order, globally 4th order.


23. Lecture 23

23.1. Multistep Methods. We start with the integral representation

x(t_{k+1}) = x(t_k) + ∫_{t_k}^{t_{k+1}} f(t, x(t)) dt

and apply quadrature to the integral. Note that one only has approximations for x(t) at t_j, j = 0, ..., k, so we have to use t_0, ..., t_k and possibly t_{k+1} as quadrature nodes.

The Adams-Bashforth schemes use the m + 1 nodes t_k, ..., t_{k−m}:

x(t_{k+1}) = x(t_k) + ∑_{j=0}^{m} w_j f(t_{k−j}, x(t_{k−j})) + O(h^{m+2}),

where the weights w_j are taken so that

I(g) := ∫_{t_k}^{t_{k+1}} g(t) dt = ∑_{j=0}^{m} w_j g(t_{k−j}) =: I_m(g),

i.e. the quadrature is exact on P_m. Note that the Adams-Bashforth methods

x_{k+1} = x_k + ∑_{j=0}^{m} w_j f(t_{k−j}, x_{k−j})

are explicit and globally of order m + 1.

The Adams-Moulton schemes use instead the m + 1 nodes t_{k+1}, ..., t_{k−m+1}:

x(t_{k+1}) = x(t_k) + ∑_{j=−1}^{m−1} w_j f(t_{k−j}, x(t_{k−j})) + O(h^{m+2}).

The Adams-Moulton methods

x_{k+1} = x_k + ∑_{j=−1}^{m−1} w_j f(t_{k−j}, x_{k−j})

are implicit, since f(t_{k+1}, x_{k+1}) appears, and globally of order m + 1.

In general, the ODE schemes are

x_{k+1} = x_k + ∑_{j=σ}^{m+σ} w_j f(t_{k−j}, x_{k−j})

with σ = 0 for Adams-Bashforth and σ = −1 for Adams-Moulton.

Remark 23.1 (Non-Uniform Time-steps). If one only has

t0 < t1 < t2 < ...

(i.e. no structure), then new quadrature weights need to be computed at each timestep. Also, the scheme cannot be analyzed.

Remark 23.2 (Starting Values). When σ = 0, one needs the starting values

x_0, x_1, ..., x_m

to compute x_{m+1}, x_{m+2}, .... Besides the initial value x_0, the values x_1, ..., x_m need to be computed by some other method.


When σ = −1, one needs the starting values

x_0, x_1, ..., x_{m−1}

to compute x_m, x_{m+1}, .... The values x_1, ..., x_{m−1} need to be computed by some other method.

Remark 23.3 (More General). The methods can be made still more general, e.g.

∑_{i=0}^{m} α_i x_{j−i} = h ∑_{i=0}^{m} β_i f(t_{j−i}, x_{j−i})

with α_0 = 1. In this case, the method is explicit if β_0 = 0; otherwise, it is implicit.

From here onward, we assume a uniform time step, i.e.

t_j = t_0 + jh

for some fixed h > 0. We return to Adams-Bashforth and compute the weights for the quadrature problem

I(g) := ∫_{m}^{m+1} g(x) dx ≈ ∑_{j=0}^{m} w_j g(m − j) =: I_m(g)

so that I_m is exact on P_m. Now for t_j = t_0 + jh, we get the quadrature

I(g) := ∫_{t_k}^{t_{k+1}} g(t) dt ≈ h ∑_{j=0}^{m} w_j g(t_{k−j}) =: I_m(g)

by translating and rescaling I_m(g).

Example 23.1 (3rd order). We set m = 2:

I(g) = ∫_{2}^{3} g(x) dx ≈ ∑_{j=0}^{2} w_j g(2 − j) =: I_2(g).

To make it exact for P_2, we require

I(1) = ∫_{2}^{3} 1 dx = 1 = w_0 + w_1 + w_2 = I_2(1),

I(x) = ∫_{2}^{3} x dx = x²/2 |_{2}^{3} = 5/2 = 2w_0 + w_1 = I_2(x),

I(x²) = ∫_{2}^{3} x² dx = x³/3 |_{2}^{3} = 19/3 = 4w_0 + w_1 = I_2(x²).

The last two conditions imply

2w_0 = 19/3 − 5/2 = 23/6

and so

w_0 = 23/12.

Using this in the second constraint yields

23/6 + w_1 = 5/2


and so

w_1 = 5/2 − 23/6 = −8/6 = −4/3.

It remains to use the first constraint to get

w_2 = 1 − 23/12 + 4/3 = (12 − 23 + 16)/12 = 5/12.

The resulting scheme reads

x_{k+1} = x_k + (h/12)(23f(t_k, x_k) − 16f(t_{k−1}, x_{k−1}) + 5f(t_{k−2}, x_{k−2})).
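The undetermined-coefficients computation can be checked mechanically with exact rational arithmetic; a Python sketch (illustration, not from the notes) recovers w_0 = 23/12, w_1 = −4/3, w_2 = 5/12:

```python
from fractions import Fraction as Fr

# Exactness conditions on [2, 3] with nodes 2, 1, 0 (monomials 1, x, x^2):
# w0 + w1 + w2 = 1,  2 w0 + w1 = 5/2,  4 w0 + w1 = 19/3.
w0 = (Fr(19, 3) - Fr(5, 2)) / 2          # subtract the 2nd condition from the 3rd
w1 = Fr(5, 2) - 2 * w0
w2 = 1 - w0 - w1
print(w0, w1, w2)  # 23/12 -4/3 5/12
```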


24. Lecture 24

Similarly as for Adams-Bashforth, we now derive the Adams-Moulton schemes (for equally spaced nodes, i.e. t_i = t_0 + ih). We solve the quadrature problem

I(g) := ∫_{m−1}^{m} g(s) ds ≈ ∑_{j=0}^{m} w_j g(m − j) =: I_m(g),

which is exact on P_m. The resulting ODE scheme is

x_k = x_{k−1} + h ∑_{j=0}^{m} w_j f(t_{k−j}, x_{k−j}).

Exercise 24.1 (Undetermined Coefficients). Compute the coefficients {w_j}_{j=0}^{2} for the third order method using undetermined coefficients.

Backwards Differences. Consider approximating the derivative

f′(x) ≈ α_0 f(x) + α_1 f(x − h) + ... + α_m f(x − mh).

Recall that to find such an approximation, we first find the polynomial p_m ∈ P_m interpolating

(14) p_m(t) = f(t)

for t = x, x − h, ..., x − mh, and we then set

f′(x) ≈ p′_m(x).

Applying the Newton form to solve the interpolation problem (14), we get

p_m(t) = ∑_{i=0}^{m} c_i q_i(t),

where

q_0(t) ≡ 1, q_j(t) = ∏_{l=0}^{j−1} (t − x + lh).

As we shall see in the Appendix at the end of this lecture (the audio refers to the wrong pages),

(15) c_i = (1/i!) D_h^i f(x).

Here

D_h^0 f(x) = f(x),
D_h^1 f(x) = (f(x) − f(x − h))/h,
D_h^{j+1} f(x) = D_h^1[D_h^j f(x)].

Example 24.1 (D_h^2).

D_h^2 f(x) = D_h^1[(f(x) − f(x − h))/h]
= ((f(x) − f(x − h))/h − (f(x − h) − f(x − 2h))/h)/h
= (f(x) − 2f(x − h) + f(x − 2h))/h².


Example 24.2 (D_h^3).

D_h^3 f(x) = (f(x) − 3f(x − h) + 3f(x − 2h) − f(x − 3h))/h³.

Notice that the coefficients 1, −3, 3, −1 in front of the function values are the binomial coefficients.

Returning to the Newton’s form for pm, we find that

pm(t) = f(x0) +D1hf(x)(t− x) +

1

2!D2hf(x)(t− x)(t− x+ h)

+1

3!D3hf(x)(t− x)(t− x+ h)(t− x+ 2h)

+ ...+1

m!Dmh f(x)(t− x)(t− x+ h) · ... · (t− x+ (m− 1)h).

Differentiating the above expression and taking t = x, we arrive at

p′m(x) = 0 +D1hf(x) +

1

2!D2hf(x)h

+1

3!D3hf(x)h · 2h

+ ...+1

m!Dmh f(x)h(2h) · ... · (m− 1)h,

i.e.

p′m(x) =

m∑i=1

hi−1

iDihf(x)

and so

f ′(x) ≈m∑i=1

hi−1

iDihf(x).

The corresponding ODE schemes are

m∑i=1

hi−1

iDihxi = f(tm, xm).

They are all implicit.

Example 24.3 (1st order method).

f′(x) ≈ (f(x) − f(x − h))/h.

The corresponding ODE scheme is the Backward Euler scheme

(x_j − x_{j−1})/h = f(t_j, x_j).

Example 24.4 (2nd order method).

f′(x) ≈ (f(x) − f(x − h))/h + (f(x) − 2f(x − h) + f(x − 2h))/(2h)
= ((3/2)f(x) − 2f(x − h) + (1/2)f(x − 2h))/h.

The corresponding ODE scheme is

((3/2)x_j − 2x_{j−1} + (1/2)x_{j−2})/h = f(t_j, x_j).


Example 24.5 (3rd order method).

f′(x) ≈ ((3/2)f(x) − 2f(x − h) + (1/2)f(x − 2h))/h + (f(x) − 3f(x − h) + 3f(x − 2h) − f(x − 3h))/(3h)
= (1/(6h))(11f(x) − 18f(x − h) + 9f(x − 2h) − 2f(x − 3h)).

The corresponding ODE scheme is

(11x_j − 18x_{j−1} + 9x_{j−2} − 2x_{j−3})/(6h) = f(t_j, x_j).
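For a linear test equation the implicit step can be solved in closed form. A Python sketch (illustration, not from the notes) of the 2nd order backward-difference scheme applied to x′ = λx, started with the exact values x_0 and x_1:

```python
# 2nd order backward-difference scheme for x' = lam*x:
# (3/2 x_j - 2 x_{j-1} + 1/2 x_{j-2}) / h = lam * x_j,
# solved explicitly for x_j since the ODE is linear.
import math

def bdf2_linear(lam, x0, T, n):
    h = T / n
    xm2, xm1 = x0, x0 * math.exp(lam * h)   # exact starting values x_0, x_1
    for _ in range(n - 1):
        xm2, xm1 = xm1, (2 * xm1 - 0.5 * xm2) / (1.5 - h * lam)
    return xm1

approx = bdf2_linear(lam=-1.0, x0=1.0, T=1.0, n=200)
print(abs(approx - math.exp(-1.0)))  # O(h^2) error
```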

Appendix of Lecture 24. We now return to the justification of (15) and refer to Section 6.2 of the book. In the latter, the coefficients are

c_i = f[x, x − h, ..., x − ih]

in the divided difference notation. We want to show that

f[x, x − h, ..., x − ih] = (1/i!) D_h^i f(x).

To prove this, we proceed by induction. Clearly, we have

c_0 = f[x] := f(x).

Suppose that

f[x, x − h, ..., x − lh] = (1/l!) D_h^l f(x).

This implies that

f[x − h, x − 2h, ..., x − (l + 1)h] = (1/l!) D_h^l f(x − h).

We now invoke Theorem 1 of Section 6.2, which guarantees that

f[x_0, x_1, ..., x_n] = (f[x_1, ..., x_n] − f[x_0, ..., x_{n−1}])/(x_n − x_0).

Therefore,

f[x, x − h, ..., x − (l + 1)h] = (f[x − h, ..., x − (l + 1)h] − f[x, ..., x − lh])/(x − (l + 1)h − x)
= (1/l!)(D_h^l f(x) − D_h^l f(x − h))/((l + 1)h)
= (1/(l + 1)!) D_h^{l+1} f(x).

This ends the induction proof.


25. Lecture 25

25.1. Errors in ODE schemes. Stability and local truncation error are two central notions when studying numerical methods for ODEs. We first provide an intuitive description.

(1) Local truncation error: the numerical schemes we have seen were obtained by dropping error terms. The local truncation error characterizes the error made at each step.

(2) Stability: if you make an error computing x_j approximating x(t_j), that error propagates through all remaining steps (this is unavoidable). A stable scheme prevents this error from growing.

We will only consider the Adams-Bashforth (AB) and Adams-Moulton (AM) schemes for the analysis.

Regarding stability, the error made at any time propagates but should not grow too much. A necessary condition for stability, which we will take as the definition of stability in this class, is the following.

Definition 25.1 (Stability). We say that a scheme is stable if, when applied to

x′(t) = 0, x(t_0) = ε,

the approximate solution remains bounded at any fixed time independently of the mesh size h.

In the case t_0 = 0 and h = 1/m, x_m^h ≈ x(1), and stability holds at time t = 1 if

|x_m^h| ≤ C as h = 1/m → 0.

Example 25.1 (AB/AM). The AB or AM schemes applied to the above ODE read

x_{k+1}^h = x_k^h, x_0^h = ε ⟹ x_k^h = ε

and are therefore stable.

We now discuss the local truncation error and start with a definition.

Definition 25.2 (Local truncation error). The local truncation error is the error made when one substitutes the exact solution into the numerical ODE scheme.

For example,

(x(t_{k+1}) − x(t_k))/h − ∑_{j=σ}^{m+σ} w_j f(t_{k−j}, x(t_{k−j})) =: ρ_h(k + 1)

is the local truncation error for the numerical scheme

(x_{k+1} − x_k)/h = ∑_{j=σ}^{m+σ} w_j f(t_{k−j}, x_{k−j}).

Note that ρ_h(k + 1) = O(h^{m+1}) for AB/AM.

The error between x(t_k) and x_k given by the AB scheme is described in the following theorem.

Theorem 25.1 (Error Estimate for Adams-Bashforth). Assume that f satisfies a uniform Lipschitz condition, i.e.

|f(t, x) − f(t, y)| ≤ M|x − y|, x, y ∈ R, t ∈ [0, T].


Let h > 0, t_k := kh and x_0, x_1, ... be the sequence of (m + 1)-point Adams-Bashforth approximations. Set ρ_h := max_{m<k≤T/h} |ρ_h(k)|, w := max_{j=0,...,m} |w_j| and ε_k := max_{j=0,...,k} |x(t_j) − x_j|. Then for m < k ≤ T/h, there holds

ε_k ≤ e^{δt_k}[ε_m + ρ_h/δ],

where δ := (m + 1)wM.

Proof. We start by noting that the Lipschitz assumption on f guarantees a unique solution of the ODE for t ≥ t_0 = 0. The AB approximate solutions are given by

x_{k+1} = x_k + h ∑_{j=0}^{m} w_j f(t_{k−j}, x_{k−j}).

Subtracting this from the local truncation relation

(x(t_{k+1}) − x(t_k))/h − ∑_{j=0}^{m} w_j f(t_{k−j}, x(t_{k−j})) =: ρ_h(k + 1)

and setting e_k := x(t_k) − x_k gives

e_{k+1} = e_k + h ∑_{j=0}^{m} w_j [f(t_{k−j}, x(t_{k−j})) − f(t_{k−j}, x_{k−j})] + hρ_h(k + 1).

Now apply the Lipschitz condition and the triangle inequality:

|e_{k+1}| ≤ |e_k| + h ∑_{j=0}^{m} |w_j||f(t_{k−j}, x(t_{k−j})) − f(t_{k−j}, x_{k−j})| + h|ρ_h(k + 1)|
≤ |e_k| + hM ∑_{j=0}^{m} |w_j||x(t_{k−j}) − x_{k−j}| + h|ρ_h(k + 1)|
≤ |e_k| + hMw ∑_{j=0}^{m} |e_{k−j}| + h|ρ_h(k + 1)|.

Using the definition of ρ_h, ε_k and δ, we get

|e_{k+1}| ≤ |e_k| + δhε_k + hρ_h,

i.e.

ε_{k+1} ≤ (1 + δh)ε_k + hρ_h.

Repeated application of the above estimate gives

ε_k ≤ (1 + δh)ε_{k−1} + hρ_h
≤ (1 + δh)²ε_{k−2} + hρ_h(1 + (1 + δh))
...
≤ (1 + δh)^{k−m}ε_m + hρ_h ∑_{j=0}^{k−m−1} (1 + δh)^j.

Now k ≤ T/h, so

(1 + δh)^{k−m} ≤ (e^{δh})^{k−m} ≤ e^{δkh} ≤ e^{δT}.


Also,

∑_{j=0}^{k−m−1} (1 + δh)^j = ((1 + δh)^{k−m} − 1)/((1 + δh) − 1) ≤ e^{δT}/(δh).

Thus,

ε_k ≤ e^{δT}(ε_m + ρ_h/δ),

which is the desired result.
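The first order global accuracy predicted for the one-step AB scheme (forward Euler) can be observed numerically: halving h should roughly halve the error at a fixed final time. A Python sketch (illustration only):

```python
# Empirical convergence order of forward Euler on x' = x, x(0) = 1, at T = 1.
import math

def euler_error(n):
    h, x = 1.0 / n, 1.0
    for _ in range(n):
        x += h * x
    return abs(x - math.e)

e1, e2 = euler_error(100), euler_error(200)
print(e1 / e2)  # close to 2, consistent with first order convergence
```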

Before providing a similar result for the AM scheme, we first show that the scheme is well defined. Recall that AM is implicit and x_{k+1} is the solution (if it exists) to

x_{k+1} = x_k + h ∑_{j=−1}^{m−1} w_j f(t_{k−j}, x_{k−j}).

Lemma 25.1 (AM - Well defined). Assume that f satisfies a uniform Lipschitz condition with constant M, and let x_0, ..., x_k be given. Then the (k + 1)st iterate of the (m + 1)-point AM scheme is well defined provided h < 1/(|w_{−1}|M).

Proof. We consider the fixed point iteration for x_{k+1}, namely

y_0 = x_k,
y_{l+1} = x_k + hw_{−1}f(t_{k+1}, y_l) + h ∑_{j=0}^{m−1} w_j f(t_{k−j}, x_{k−j}) =: G(y_l),

i.e. y_{l+1} = G(y_l). Note that

G(y) − G(z) = hw_{−1}[f(t_{k+1}, y) − f(t_{k+1}, z)],

so that

|G(y) − G(z)| ≤ h|w_{−1}|M|y − z|.

This means that under the assumption h|w_{−1}|M < 1, G is a contraction mapping, so the sequence y_l converges to the unique fixed point x_{k+1}.


26. Lecture 26

In the previous lecture, we provided an error analysis for the Adams-Bashforth schemes (see Theorem 25.1) and showed that the Adams-Moulton schemes are well defined provided that h ≤ h_0 (Lemma 25.1). Here h_0 depends on |w_{−1}| and the uniform Lipschitz constant M of f. To analyze Adams-Moulton, we need the following lemma.

Lemma 26.1. Let α, β, h > 0 with αh ≤ 1/2. Then

(1 − αh)^{−1} ≤ 1 + 2αh,

(1 − αh)^{−1}(1 + βh) ≤ e^{γh}, where γ := β + 2α,

and

∑_{j=0}^{n−1} e^{jγh} ≤ e^{nγh}/(γh).

Proof. The assumption αh ≤ 1/2 implies −αh ≥ −1/2, so 1 − αh ≥ 1/2, i.e.

(1 − αh)^{−1} ≤ 2.

Thus

(1 − αh)^{−1} = 1 + αh + (αh)² + (αh)³ + ... = 1 + αh + α²h²(1 + αh + (αh)² + ...)
= 1 + αh + α²h²(1 − αh)^{−1} = 1 + αh + αh·(αh(1 − αh)^{−1}) ≤ 1 + 2αh,

since αh(1 − αh)^{−1} ≤ 1. Also,

1 + βh ≤ 1 + βh + (βh)²/2! + ... = e^{βh},

so

(1 − αh)^{−1}(1 + βh) ≤ e^{2αh}e^{βh} = e^{γh}.

Finally, the geometric series relation gives

∑_{j=0}^{n−1} e^{jγh} = ∑_{j=0}^{n−1} (e^{γh})^j = (e^{nγh} − 1)/(e^{γh} − 1) ≤ e^{nγh}/(γh),

where we used e^{γh} − 1 ≥ γh.

We can now provide the error analysis for Adams-Moulton.

Theorem 26.1 (Error Estimate for Adams-Moulton). Assume that f satisfies a uniform Lipschitz condition, i.e.

|f(t, x) − f(t, y)| ≤ M|x − y|, x, y ∈ R, t ∈ [0, T].

Let h > 0, t_k := kh and x_0, x_1, ... be the sequence of (m + 1)-point Adams-Moulton approximations. Set ρ_h := max_{m−1<k≤T/h} |ρ_h(k)|, w := max_{j=0,...,m−1} |w_j| and ε_k := max_{j=0,...,k} |x(t_j) − x_j|. Assume that h ≤ h_0 = 1/(2|w_{−1}|M). Then for m ≤ k ≤ T/h, there holds

ε_k ≤ e^{δt_k}[ε_{m−1} + 2ρ_h/δ],

where δ := M(mw + 2|w_{−1}|).


Proof. The AM scheme reads

x_{k+1} − hw_{−1}f(t_{k+1}, x_{k+1}) = x_k + h ∑_{j=0}^{m−1} w_j f(t_{k−j}, x_{k−j}),

so that the local truncation error satisfies

x(t_{k+1}) − hw_{−1}f(t_{k+1}, x(t_{k+1})) = x(t_k) + h ∑_{j=0}^{m−1} w_j f(t_{k−j}, x(t_{k−j})) + hρ_h(k + 1).

Subtracting the two equations, taking absolute values on both sides and proceeding as in the proof of Theorem 25.1, we get

|e_{k+1} − hw_{−1}[f(t_{k+1}, x(t_{k+1})) − f(t_{k+1}, x_{k+1})]| ≤ |e_k| + h ∑_{j=0}^{m−1} |w_j|M|e_{k−j}| + hρ_h ≤ ε_k(1 + mMwh) + hρ_h,

where e_k := x(t_k) − x_k. Also,

|e_{k+1} − hw_{−1}[f(t_{k+1}, x(t_{k+1})) − f(t_{k+1}, x_{k+1})]|
≥ |e_{k+1}| − h|w_{−1}||f(t_{k+1}, x(t_{k+1})) − f(t_{k+1}, x_{k+1})|
≥ |e_{k+1}| − h|w_{−1}|M|e_{k+1}|.

Combining the above two estimates we get

ε_{k+1}(1 − h|w_{−1}|M) ≤ ε_k(1 + mMwh) + hρ_h

or

ε_{k+1} ≤ (1 − h|w_{−1}|M)^{−1}(1 + mMwh)ε_k + hρ_h(1 − h|w_{−1}|M)^{−1}.

Lemma 26.1 gives (since h|w_{−1}|M ≤ 1/2)

ε_{k+1} ≤ e^{γh}ε_k + 2hρ_h,

where

γ = δ = M(mw + 2|w_{−1}|).

Applying this estimate repeatedly (as in Theorem 25.1) gives

ε_k ≤ e^{(k−m+1)δh}ε_{m−1} + 2hρ_h ∑_{j=0}^{k−m} e^{jδh} ≤ e^{δT}(ε_{m−1} + 2ρ_h/δ),

where we used Lemma 26.1 for the second inequality. This ends the proof.

Stiff ODE’s (systems)

We start with an example.

Example 26.1 (Backward Euler). Consider x(t) ∈ R² satisfying

x′(t) = −Ax(t), A := [1 0; 0 M], x(0) = (1, 1)ᵗ.

Here M is large and positive. These equations decouple:

x′_1(t) = −x_1, x′_2(t) = −Mx_2.


The solution is

x_1(t) = e^{−t}, x_2(t) = e^{−Mt},

i.e.

x(t) = (e^{−t}, e^{−Mt})ᵗ.

Now consider the Backward Euler approximation (fixed step size h) applied to the system:

(1/h)(x_{j+1} − x_j) = −Ax_{j+1}.

(Warning: here the subindices in x_j denote the approximation at t_j = jh + t_0, not the components of x.) Thus

(I + hA)x_{j+1} = x_j

or

x_{j+1} = (I + hA)^{−1}x_j = [(1 + h)^{−1} 0; 0 (1 + Mh)^{−1}] x_j.

This leads to

x_j = [(1 + h)^{−j} 0; 0 (1 + Mh)^{−j}] (1, 1)ᵗ.

The approximate solutions remain bounded for any j and any h > 0. By the way, (1 + α)^{−j} ≈ e^{−αj}, and so (1 + Mh)^{−j} ≈ e^{−Mt_j}.

Example 26.2 (Forward Euler). Consider again the system

x′(t) = −Ax(t), A := [1 0; 0 M], x(0) = (1, 1)ᵗ.

The Forward Euler scheme reads

(1/h)(x_{j+1} − x_j) = −Ax_j

or

x_{j+1} = x_j − hAx_j = (I − hA)x_j = [(1 − h) 0; 0 (1 − Mh)] x_j,

so

x_j = [(1 − h)^j 0; 0 (1 − Mh)^j] (1, 1)ᵗ.

In this case, the solutions remain bounded only if

|1 − Mh| ≤ 1, i.e. Mh ≤ 2.

When Mh > 2, the solutions (at least the second component) blow up exponentially (unstable).
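A Python sketch (illustration, not from the notes) contrasting the two schemes on the decoupled stiff component x′_2 = −Mx_2, x_2(0) = 1: Backward Euler stays bounded for any h, while Forward Euler blows up once Mh > 2.

```python
# Stiff scalar component x' = -M x, x(0) = 1, with M large.
M, h, nsteps = 100.0, 0.1, 50          # here M*h = 10 > 2

be = fe = 1.0
for _ in range(nsteps):
    be = be / (1.0 + h * M)            # Backward Euler: x_{j+1} = x_j / (1 + hM)
    fe = (1.0 - h * M) * fe            # Forward Euler:  x_{j+1} = (1 - hM) x_j

print(abs(be), abs(fe))  # BE stays tiny and bounded; FE grows like 9^j since |1 - hM| = 9
```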

Remark 26.1. In the previous examples x_2(t) = e^{−Mt} becomes small quickly, so a successful scheme should approximate the first component accurately while keeping the second small.


27. Lecture 27

We start this lecture with another example of a stiff system.

Example 27.1 (Parabolic problem). We consider the parabolic partial differential boundary value problem: find u : [0, 1] × [0, T] → R such that

u_t(x, t) − u_xx(x, t) = 0, x ∈ (0, 1), t ∈ (0, T],
u(0, t) = u(1, t) = 0, t ∈ [0, T],
u(x, 0) = u_0(x), x ∈ (0, 1),

where u_0 is a given initial condition.

A finite difference approximation to this system can be constructed by introducing a grid x_j = jh, h = 1/N, and an approximation

w(x_i, t) ≈ u(x_i, t), i = 0, ..., N, t ∈ [0, T].

Using the finite difference approximation to −u_xx,

−u_xx(x_i, t) = (2u(x_i, t) − u(x_{i−1}, t) − u(x_{i+1}, t))/h² + O(h²),

we replace u(x_i, t) by w(x_i, t) and drop the O(h²) term to arrive at

w_t(x_i, t) + (2w(x_i, t) − w(x_{i−1}, t) − w(x_{i+1}, t))/h² = 0

together with w(x_i, 0) = u_0(x_i) and w(x_0, t) = w(x_N, t) = 0. Now we introduce the vector v(t) defined componentwise by

v_j(t) = w(x_j, t), j = 1, ..., N − 1,

which therefore satisfies

v′ = −(1/h²)Av,

where A is the (N − 1) × (N − 1) tridiagonal matrix with 2 on the diagonal and −1 on the sub- and super-diagonal:

A = tridiag(−1, 2, −1).

The eigenvectors ψ_j, j = 1, ..., N − 1, of A are known and given by

(ψ_j)_k = sin(πjk/N), k = 1, ..., N − 1.

The associated eigenvalues are

λ_j = 2 − 2cos(πj/N).
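These eigenpair formulas can be verified directly in a few lines of Python (illustration, pure stdlib, not from the notes):

```python
# Verify A psi_j = lambda_j psi_j for the (N-1)x(N-1) matrix A = tridiag(-1, 2, -1),
# with (psi_j)_k = sin(pi*j*k/N) and lambda_j = 2 - 2*cos(pi*j/N).
import math

N = 8

def check(j):
    psi = [math.sin(math.pi * j * k / N) for k in range(1, N)]
    lam = 2 - 2 * math.cos(math.pi * j / N)
    # apply A: (A psi)_k = -psi_{k-1} + 2 psi_k - psi_{k+1} (zero outside the range)
    err = 0.0
    for k in range(N - 1):
        left = psi[k - 1] if k > 0 else 0.0
        right = psi[k + 1] if k < N - 2 else 0.0
        err = max(err, abs(-left + 2 * psi[k] - right - lam * psi[k]))
    return err

print(max(check(j) for j in range(1, N)))  # at rounding level: the formulas are exact
```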

This means that

M^{−1}AM = D,

where the jth column of M is ψ_j and D is the diagonal matrix with entries D_jj = λ_j. Multiplying the ODE system by M^{−1}, we find

M^{−1}v′(t) = −(1/h²)M^{−1}AMM^{−1}v


or, using the notation ṽ(t) = M^{−1}v(t),

ṽ′ = −(1/h²)Dṽ,

i.e.

ṽ′_j = −(1/h²)λ_j ṽ_j.

The solution is

ṽ_j(t) = e^{−(λ_j/h²)t} ṽ_j(0).

Note that

λ_1 = 2 − 2cos(π/N) ≈ 2 − 2(1 − π²/(2N²)) = π²/N² = π²h².

Hence λ_1/h² ≈ π². Similarly,

λ_{N−1} = 2 − 2cos((N − 1)π/N) ≈ 4,

so

λ_{N−1}/h² ≈ 4/h² = 4N².

Therefore, we realize that this is a stiff ODE, but one does not know the components without knowing the eigenvectors. For more general problems, the eigenvectors are unknown but the eigenvalues behave the same way.

To characterize/understand the stability of stiff ODEs, we consider the single variable equation for given λ ∈ C:

u′ = λu, t > 0,

supplemented with the initial condition u(0) = u_0. Its solution is

u(t) = e^{λt}u_0 = e^{(Re(λ)+iIm(λ))t}u_0 = e^{Re(λ)t}(cos(Im(λ)t) + i sin(Im(λ)t))u_0,

using Euler's formula. Hence,

|u(t)| = |u_0||e^{λt}| = e^{Re(λ)t}|u_0|,

which shows that the solution remains bounded as t gets large if and only if Re(λ) ≤ 0. This leads to the following definition.

Definition 27.1 (A-stable). An ODE method is A-stable if, when applied to the ODE

u′(t) = λu(t), t > 0; u(0) = u_0

with fixed timestep h, the resulting approximation remains bounded for all h and all λ with Re(λ) ≤ 0.

A-stable methods are good schemes for stiff ODEs.

Example 27.2 (Backward Euler). The Backward Euler scheme applied to u′ = λu, with u(0) = v given, reads

(u_j − u_{j−1})/h = λu_j, u_0 = v,

or

u_j = u_{j−1} + hλu_j,

i.e.

u_j = (1 − hλ)^{−1}u_{j−1} = (1 − hλ)^{−j}u_0.


Figure 11. Absolute convergence region for Backward Euler: the disk |1 − η| < 1 with boundary η = 1 − e^{iθ}.

Hence,

|u_j| = |1 − hλ|^{−j}|u_0|,

but

|1 − hλ|² = |1 − hRe(λ) − ihIm(λ)|² = (1 − hRe(λ))² + h²Im(λ)² ≥ (1 − hRe(λ))²,

and if Re(λ) ≤ 0, then 1 − hRe(λ) ≥ 1. This implies that

(1 − hRe(λ))² ≥ 1

or

1/|1 − hλ|^j ≤ 1/|1 − hλ| ≤ 1.

In turn, we obtain

|u_j| ≤ |u_0| = |v|,

which shows that the approximation remains bounded uniformly in h and therefore that Backward Euler is A-stable.
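This computation is easy to probe numerically with complex λ; a Python sketch (illustration only) checks |u_j| ≤ |u_0| for Backward Euler across sample values of λ with Re(λ) ≤ 0 and several step sizes.

```python
# Backward Euler on u' = lam*u gives u_j = (1 - h*lam)^(-j) * u0.
# For Re(lam) <= 0 the modulus never exceeds |u0|, whatever h > 0 is.
lams = [-1.0, -10.0 + 5.0j, -0.1 - 3.0j, 2.0j]   # Re <= 0 (the last is purely imaginary)
ok = True
for lam in lams:
    for h in (0.01, 0.1, 1.0, 10.0):
        u, u0 = 1.0 + 0.0j, 1.0
        for _ in range(50):
            u = u / (1.0 - h * lam)
        ok = ok and abs(u) <= abs(u0) + 1e-12
print(ok)  # True: the iterates stay bounded for every sampled h and lam
```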

The previous example also indicates that the Backward Euler scheme is stableif and only if

1

|1− hλ|≤ 1

or|1− hλ| ≥ 1.

Note that 1− λh = 1− η (define η = λh) is a continuous function of η. Moreover,

|1− η| = 1 =⇒ 1− η = eiθ for some θ ∈ [0, 2π],

i.e.η = 1− eiθ.

Thus the scheme diverges if and only if

η ∈ {z ∈ C : |1 − z| < 1} =: B_1(1).

This region is depicted in Figure 11.

Definition 27.2 (Absolute Stability). The region of absolute stability of a scheme is the set of η = hλ such that the approximate solutions remain bounded.


[Figure 12. Absolute convergence region for Forward Euler (dashed region): the disk {η : |1 + η| ≤ 1} centered at −1.]

Example 27.3 (Backward Euler). The region of absolute stability for Backward Euler is

C \ B_1(1) = {z ∈ C : |1 − z| ≥ 1}.

Example 27.4 (Forward Euler). The Forward Euler scheme applied to u′ = λu with u(0) = u_0 is

u_j = (1 + hλ)u_{j−1}

or

u_j = (1 + hλ)^j u_0.

The solutions remain bounded if and only if

|1 + hλ| ≤ 1.

Note that |1 + hλ| = 1 if and only if

1 + hλ = e^{iθ}, θ ∈ [0, 2π].

If η := hλ, then 1 + η = e^{iθ}, i.e. η = e^{iθ} − 1; see Figure 12. Forward Euler is not A-stable.
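The growth factor makes this example concrete: the iterates are bounded exactly when |1 + η| ≤ 1. A short sketch with illustrative sample values of η (note that the second sample has Re(η) ≤ 0 and still grows, which is exactly why Forward Euler fails to be A-stable):

```python
# Forward Euler iterates: u_j = (1 + eta)^j * u0, with eta = h*lam.
# Bounded iff |1 + eta| <= 1 (sample values chosen for illustration).
u0, steps = 1.0, 200

eta_stable = -1.5            # |1 + eta| = 0.5 <= 1 : iterates decay
eta_unstable = -0.01 + 0.5j  # Re(eta) <= 0 but |1 + eta| > 1 : iterates grow

u_s = (1.0 + eta_stable) ** steps * u0
u_u = (1.0 + eta_unstable) ** steps * u0
print(abs(u_s) <= abs(u0), abs(u_u) > abs(u0))  # True True
```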

Remark 27.1 (Absolute and A-stability). A scheme is A-stable if and only if the region of absolute stability contains

{z ∈ C : Re(z) ≤ 0}.
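The remark can be checked numerically on a sample of left-half-plane points, using the growth factors of the two schemes from the examples above, namely 1/(1 − η) for Backward Euler and 1 + η for Forward Euler (the sample points below are chosen arbitrarily for illustration):

```python
# Sample points with Re(z) <= 0 and test whether each scheme's region of
# absolute stability contains them.
samples = [complex(re, im) for re in (0.0, -0.5, -2.0, -10.0)
                           for im in (0.0, 1.0, 3.0)]

# Backward Euler: |1/(1 - z)| <= 1, i.e. |1 - z| >= 1
be_contains_all = all(abs(1.0 / (1.0 - z)) <= 1.0 for z in samples)
# Forward Euler: |1 + z| <= 1
fe_contains_all = all(abs(1.0 + z) <= 1.0 for z in samples)

print(be_contains_all, fe_contains_all)  # True False
```

Every sampled left-half-plane point lies in the Backward Euler region, while points such as z = −10 fall outside the Forward Euler disk, matching the conclusion that Backward Euler is A-stable and Forward Euler is not.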


28. Lecture 28

This programming exercise is the last assignment of this course and concludes MATH 609d (there will not be a final)!

This was the first time that typed notes were provided, and several typos need to be fixed. If you found any, please let me know. I have also realized that they did not strictly match the audio files, but this will also be fixed next time.

In any event, the combination of notes and audio files enabled me to give a distance course that was pretty close to the level of a course that is done “in house”. I hope that you all enjoyed the course, and I hope many of you will consider MATH 610d in the Spring.
