
Interpolation & Polynomial Approximation

• The next problem we want to investigate is the problem of finding a simple function such as a polynomial, trigonometric function or rational function which approximates either a discrete set of (n + 1) data points

(x0, f0), (x1, f1), · · · , (xn, fn) or a more complicated function f(x).

• We saw that we can approximate a set of discrete data by a linear least squares approach. In that case we assume that the data behaves in a certain way, such as linearly or quadratically. Now we take a different approach where we find a function which agrees with the data at each of the given points, i.e., it interpolates the data.

• The problem we consider initially is to find an interpolating polynomial for a set of discrete data (xi, fi).

• The data can come from two main sources. First, suppose we have a continuous function f(x); we select (n + 1) distinct points xi ∈ [a, b] and then let fi = f(xi). In this way we are using interpolation as a means to approximate a continuous function by a simpler function.

• Secondly, the data (xi, fi) may come from observations, e.g., from measurements. In this case we do not know a continuous function f(x) but rather only its value at (n + 1) points.

• If the data is observed then this is the fixed data we must work with. However, if the data arises from evaluating a known function f(x) at (n + 1) points, then we can choose the points uniformly or by some other sampling.

Polynomial interpolation of discrete data: Given n + 1 distinct points

(x0, f0), (x1, f1), · · · , (xn, fn)

find a polynomial of degree ≤ n, say pn, that interpolates the given data, i.e., such that

pn(xi) = fi, for i = 0, 1, 2, . . . , n

The polynomial pn(x) is called the interpolant of the data {fi}, i = 0, 1, . . . , n. The points {xi}, i = 0, 1, . . . , n are called the interpolation points.

• This means that the graph of the polynomial passes through our n + 1 distinct data points.

• Let Pn denote the space of all polynomials of degree n or less. Then any polynomial in Pn can be written in terms of the monomial basis for Pn, namely 1, x, x^2, . . . , x^n, as

a0 + a1x + a2x^2 + · · · + an−1x^{n−1} + anx^n

• Note that we have (n + 1) coefficients to determine and (n + 1) interpolation conditions to satisfy.

• Why do we choose Pn to be the space of all polynomials with degree ≤ n and not just those of degree exactly n?

• In the first figure we have two data points so we find the linear polynomial which passes through these two points.

• In the second figure we have three data points so we find the quadratic polynomial which passes through these three points.

• In the third and fourth figures we have four and five points so we find a cubic and a quartic, respectively.

The interpolation conditions pn(xi) = fi, i = 0, . . . , n can be gathered together into the linear system of algebraic equations given by

\[
\begin{pmatrix}
1 & x_0 & x_0^2 & x_0^3 & \cdots & x_0^{n-1} & x_0^n \\
1 & x_1 & x_1^2 & x_1^3 & \cdots & x_1^{n-1} & x_1^n \\
1 & x_2 & x_2^2 & x_2^3 & \cdots & x_2^{n-1} & x_2^n \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & x_{n-1} & x_{n-1}^2 & x_{n-1}^3 & \cdots & x_{n-1}^{n-1} & x_{n-1}^n \\
1 & x_n & x_n^2 & x_n^3 & \cdots & x_n^{n-1} & x_n^n
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{n-1} \\ a_n \end{pmatrix}
=
\begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ \vdots \\ f_{n-1} \\ f_n \end{pmatrix}
\tag{1}
\]

The (n + 1) × (n + 1) coefficient matrix of this linear system is referred to as the interpolation matrix or, more precisely, the interpolation matrix corresponding to the monomial basis {x^j}, j = 0, . . . , n. In this case, the interpolation matrix is known as the Vandermonde matrix.

It is not difficult to show that the Vandermonde determinant, i.e., the determinant of the matrix in (1), is given by

∏_{0≤k<j≤n} (xj − xk)

so that this determinant is nonzero because the given points are distinct, i.e., xj ≠ xk for j ≠ k.

Example For n = 2, we have the coefficient matrix

\[
\begin{pmatrix} 1 & x_0 & x_0^2 \\ 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \end{pmatrix}
\]

whose determinant is given by

\[
\det\begin{pmatrix} 1 & x_0 & x_0^2 \\ 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \end{pmatrix}
= (x_1 x_2^2 - x_2 x_1^2) - (x_0 x_2^2 - x_2 x_0^2) + (x_0 x_1^2 - x_1 x_0^2)
\]
\[
= (x_1 - x_0)(x_2 - x_0)(x_2 - x_1) = \prod_{0 \le k < j \le 2} (x_j - x_k).
\]

Note that because the points are distinct, this determinant is not equal to zero. □

Because the determinant of the coefficient matrix of the linear system (1) is non-zero, that matrix is non-singular, so there exists a unique set of numbers a0, a1, a2, . . . , an−1, an that solves the linear system. But these numbers are the coefficients of the interpolant pn(x), so we have proved the following result.

Theorem. Existence and uniqueness of the polynomial interpolant. Given the data pairs {xj, fj}, j = 0, . . . , n, where the points {xj} are distinct, there exists a unique polynomial pn(x) ∈ Pn that satisfies the interpolation conditions pn(xi) = fi, i = 0, 1, 2, . . . , n.

It seems that we are done: given the data pairs {xj, fj}, j = 0, . . . , n, we can construct the interpolating polynomial of degree less than or equal to n by first solving the linear system (1) for the coefficients {aj}, j = 0, . . . , n and then substituting them into the expression for pn. There are two reasons why we are not done. First, the Vandermonde matrix is extremely ill conditioned; in general, the condition number grows exponentially as n increases. The second reason is that there are ways to construct the interpolant without having to solve a matrix problem.
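For small n, solving (1) directly is still a useful sanity check. The sketch below (an illustration added here, not part of the original notes) uses NumPy to build the Vandermonde matrix for the quadratic data (1, 10), (2, 19), (−1, −14) used in a later example, and it recovers the monomial coefficients of p2 = −x^2 + 12x − 1.

```python
import numpy as np

# Data for the quadratic example: the interpolant should be p2 = -1 + 12x - x^2.
x = np.array([1.0, 2.0, -1.0])
f = np.array([10.0, 19.0, -14.0])

# Vandermonde matrix with columns 1, x, x^2 (the monomial basis), as in (1).
V = np.vander(x, increasing=True)

# Solve V a = f for the monomial coefficients a0, a1, a2.
a = np.linalg.solve(V, f)
print(a)  # approximately [-1. 12. -1.]
```

For large n one would not proceed this way: `np.linalg.cond(V)` grows rapidly with n, which is exactly the ill-conditioning noted above.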

What do we want to accomplish?

1. First, we want to determine an efficient way to determine the polynomial, i.e., determine the coefficients a0, a1, · · · , an.

2. Once we have determined the coefficients, we want to determine how to efficiently evaluate the polynomial at a point.

3. If we decide to add points to our data set, then we don't want to start our calculation from scratch. Since the polynomial is unique we choose to evaluate it any way that works and is efficient.

4. We want to decide if this is a good approach to interpolating a set of data or a function. If not, we want to look at alternative ways to interpolate.

5. If the data is obtained by evaluating a function at points, is there an "optimal" way to choose these points?

Some Simple Examples

Example As a first example, find the linear polynomial p1 which passes through the two points (1, −5), (5, 3).

Clearly if p1 passes through these two points we have

p1(1) = −5, p1(5) = 3

So if p1 = a0 + a1x we must satisfy

a0 + a1(1) = −5, a0 + 5a1 = 3

Solving these two equations we get a0 = −7 and a1 = 2 so we have the polynomial p1 = 2x − 7.

Another approach to find p1 is to write it as the sum of two linear polynomials ℓi(x), i = 1, 2 with the special properties

ℓ1(x) = 1 if x = x1, 0 if x = x2;    ℓ2(x) = 0 if x = x1, 1 if x = x2.

If we determine ℓi(x), i = 1, 2 then the linear polynomial which interpolates the two data points is just

p1(x) = −5ℓ1(x) + 3ℓ2(x)

because p1(x1) = −5 and p1(x2) = 3.

Now we can immediately write down these two linear polynomials

ℓ1(x) = (x − x2)/(x1 − x2) = (x − 5)/(−4),    ℓ2(x) = (x − x1)/(x2 − x1) = (x − 1)/4

Thus the interpolating polynomial is

p1(x) = −5 · (x − 5)/(−4) + 3 · (x − 1)/4 = (5/4)(x − 5) + (3/4)(x − 1) = 2x − 7

Of course we got the same polynomial as before because it is unique.

Example As a second example, find the quadratic that passes through the 3 points (1, 10), (2, 19), (−1, −14).

Clearly if p2 = a0 + a1x + a2x^2 passes through these three points we have

p2(1) = 10, p2(2) = 19, p2(−1) = −14

or equivalently we must satisfy

a0 + a1 + a2 = 10,    a0 + 2a1 + 4a2 = 19,    a0 − a1 + a2 = −14

Solving these three equations we get the polynomial p2 = −x^2 + 12x − 1.

Alternately, we can write p2(x) in terms of three quadratic polynomials ℓi(x), i = 1, 2, 3 where

p2(x) = 10ℓ1(x) + 19ℓ2(x) − 14ℓ3(x)

with

ℓ1(x) = 1 at x = x1 and 0 at x = x2, x3;    ℓ2(x) = 1 at x = x2 and 0 at x = x1, x3;    ℓ3(x) = 1 at x = x3 and 0 at x = x1, x2.

Clearly

ℓ1(x) = (x − 2)(x + 1) / [(1 − 2)(1 + 1)],    ℓ2(x) = (x − 1)(x + 1) / [(2 − 1)(2 + 1)],    ℓ3(x) = (x − 1)(x − 2) / [(−1 − 1)(−1 − 2)]

• As you can see, if we proceed in the manner where we directly impose the interpolation conditions, then to find a polynomial of degree n that passes through the given n + 1 points we have to solve n + 1 linear equations for the n + 1 unknowns a0, a1, . . . , an.

• However, if we take the approach of writing pn as the sum of (n + 1) polynomials of degree n then we avoid solving the linear system.

• There are basically two main efficient approaches for determining the interpolating polynomial. Each has advantages in certain circumstances.

• Since we are guaranteed that we will get the same polynomial using different approaches (since the polynomial is unique) it is just a question of which is a more efficient implementation.

Lagrange Form of the Interpolating Polynomial

• We want to generalize the procedure of the previous examples which avoids solving a linear system to find the interpolating polynomial. This approach is called the Lagrange form of the interpolating polynomial.

• Assume we have (n + 1) data points (xi, fi), i = 0, 1, 2, . . . , n where the xi are distinct. Let ℓi(x) be the degree-n polynomial which satisfies the "delta conditions"

ℓi(x) = 1 if x = xi and 0 if x = xj for j ≠ i, or equivalently ℓi(xj) = δij.

If we can write down each of these polynomials then the interpolating polynomial is just

pn(x) = ∑_{i=0}^{n} fi ℓi(x).

• How do we form ℓi(x)? Since it must be zero at each xj, j ≠ i, the numerator is just the product of the factors (x − xj), i.e.,

(x − x0)(x − x1) · · · (x − xi−1)(x − xi+1) · · · (x − xn)

and because it must be one at x = xi the denominator is just the numerator evaluated at xi, i.e.,

(xi − x0)(xi − x1) · · · (xi − xi−1)(xi − xi+1) · · · (xi − xn)

Given n + 1 distinct points

(x0, f0), (x1, f1), · · · , (xn, fn),

the Lagrange form of the interpolating polynomial is

pn(x) = ∑_{i=0}^{n} fi ℓi(x),

where

ℓi(x) = [(x − x0)(x − x1) · · · (x − xi−1)(x − xi+1) · · · (x − xn)] / [(xi − x0)(xi − x1) · · · (xi − xi−1)(xi − xi+1) · · · (xi − xn)]

• Each ℓi(x) is a polynomial of degree n and satisfies ℓi(xj) = δij.

• The space Pn has dimension (n + 1). The polynomials ℓi(x) belong to Pn; there are (n + 1) of them and they are linearly independent, so they form another basis for Pn. They are called the Lagrange basis.

• The interpolation conditions

fj = ∑_{i=0}^{n} ci ℓi(xj) imply fj = cj

which gives the form

pn(x) = ∑_{i=0}^{n} fi ℓi(x).

Using the Lagrange form of the interpolating polynomial doesn't require solving a linear system, so instead of the coefficient matrix being the ill-conditioned Vandermonde matrix it is the perfectly conditioned identity matrix!

• The uniqueness of the Lagrange interpolating polynomial can be proved without resorting to the Vandermonde system (1) or, for that matter, resorting to any specific method for constructing that polynomial.

Suppose we have two polynomials p(x) ∈ Pn and p̂(x) ∈ Pn, both of which satisfy the interpolating conditions; i.e., we have p(xj) = p̂(xj) = fj for j = 0, 1, . . . , n.

Then p(x) − p̂(x) is also a polynomial of degree less than or equal to n and we have that p(xj) − p̂(xj) = 0 for j = 0, 1, . . . , n.

Thus, p(x) − p̂(x) is a polynomial of degree less than or equal to n that vanishes at (n + 1) distinct points; such a polynomial must vanish for all x, i.e., p(x) − p̂(x) = 0 for all x so that p(x) = p̂(x).

The polynomials

ℓi(x) = [(x − x0) · · · (x − xi−1)(x − xi+1) · · · (x − xn)] / [(xi − x0) · · · (xi − xi−1)(xi − xi+1) · · · (xi − xn)]

are sometimes written as

ℓj(x) = W(x) / [(x − xj) W′(xj)] for j = 0, . . . , n,

where

W(x) = ∏_{k=0}^{n} (x − xk).

Example. Evaluate the interpolating polynomial for the data

(0, 4), (1, 5), (−2, −10), (4, 128)

at x = 2 and x = 3. In practice we will typically evaluate an interpolating polynomial at points instead of writing out the polynomial; e.g., when we plot the polynomial we simply evaluate it at many points and connect the points to get the curve.

Because we have 4 points we know that we will have a cubic (or lower degree) polynomial. The polynomial evaluated at x = 2 is

p3(2) = 4ℓ1(2) + 5ℓ2(2) − 10ℓ3(2) + 128ℓ4(2).

The values of the Lagrange basis polynomials at x = 2 can be found by evaluating the numerator of each ℓi at x = 2 and then dividing by the denominator of ℓi

ℓ1(x) = (x − 1)(x + 2)(x − 4) / [(0 − 1)(0 + 2)(0 − 4)] ⇒ ℓ1(2) = (1 · 4 · (−2)) / 8 = −1

ℓ2(x) = (x − 0)(x + 2)(x − 4) / [(1 − 0)(1 + 2)(1 − 4)] ⇒ ℓ2(2) = (2 · 4 · (−2)) / (−9) = 16/9

ℓ3(x) = (x − 0)(x − 1)(x − 4) / [(−2 − 0)(−2 − 1)(−2 − 4)] ⇒ ℓ3(2) = (2 · 1 · (−2)) / (−36) = 1/9

ℓ4(x) = (x − 0)(x − 1)(x + 2) / [(4 − 0)(4 − 1)(4 + 2)] ⇒ ℓ4(2) = (2 · 1 · 4) / 72 = 8/72 = 1/9

Therefore

p3(2) = 4(−1) + 5(16/9) − 10(1/9) + 128(1/9) = (−36 + 80 − 10 + 128)/9 = 18

Now to evaluate p3(3) note that the denominators of the ℓi remain the same, as do the coefficients 4, 5, −10, 128 in the expansion; the only thing that changes is the numerator of each ℓi because we are evaluating it at a different x-value.

ℓ1(3) = (2 · 5 · (−1)) / 8 = −5/4    ℓ2(3) = (3 · 5 · (−1)) / (−9) = 5/3

ℓ3(3) = (3 · 2 · (−1)) / (−36) = 1/6    ℓ4(3) = (3 · 2 · 5) / 72 = 5/12

Then p3(3) is given by

p3(3) = 4(−5/4) + 5(5/3) − 10(1/6) + 128(5/12) = −5 + 25/3 − 5/3 + 160/3 = 55
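The evaluation just carried out can be sketched in a few lines of code; this is a straightforward (not optimized) implementation added here for illustration, and it reproduces the two values computed above.

```python
def lagrange_eval(xs, fs, t):
    """Evaluate the Lagrange interpolant of the data (xs[i], fs[i]) at t."""
    total = 0.0
    for i in range(len(xs)):
        # Build ell_i(t): the product of (t - xs[j]) / (xs[i] - xs[j]), j != i.
        li = 1.0
        for j in range(len(xs)):
            if j != i:
                li *= (t - xs[j]) / (xs[i] - xs[j])
        total += fs[i] * li
    return total

xs = [0.0, 1.0, -2.0, 4.0]
fs = [4.0, 5.0, -10.0, 128.0]
print(lagrange_eval(xs, fs, 2.0))  # approximately 18.0
print(lagrange_eval(xs, fs, 3.0))  # approximately 55.0
```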

For homework you are going to investigate an efficient way to implement the Lagrange interpolating polynomial assuming you just want to evaluate it at given points.

Newton Form of the Interpolating Polynomial

• This approach to determining the interpolating polynomial uses divided differences and allows adding data points without starting the calculation over.

• Recall that when we wrote the Lagrange form of, say, the third degree interpolating polynomial we wrote it in terms of sums of cubic polynomials. The Newton form takes a different approach.

• The Newton form of the line passing through (x0, f0), (x1, f1) is

p1(x) = a0 + a1(x − x0)

for appropriate constants a0, a1.

• The Newton form of the parabola passing through (x0, f0), (x1, f1), (x2, f2) is

p2(x) = a0 + a1(x − x0) + a2(x − x0)(x − x1)

for appropriate constants a0, a1, a2.

• The general form of the Newton polynomial passing through (xi, fi), i = 0, 1, . . . , n is

pn(x) = a0 + a1(x − x0) + a2(x − x0)(x − x1) + · · · + an(x − x0)(x − x1) · · · (x − xn−1)

for appropriate constants a0, a1, . . . , an.

• Note first that when we evaluate pn(x0) all terms in the expansion except the first disappear, so that

f0 = pn(x0) = a0.

• If we evaluate pn(x1) then all terms starting at the a2 term disappear to get

f1 = pn(x1) = f0 + a1(x1 − x0) ⇒ a1 = (f1 − f0)/(x1 − x0).

• If we evaluate pn(x2) then all terms starting at the a3 term disappear to get

f2 = pn(x2) = f0 + [(f1 − f0)/(x1 − x0)](x2 − x0) + a2(x2 − x0)(x2 − x1)

⇒ a2 = [f2 − f0 − ((f1 − f0)/(x1 − x0))(x2 − x0)] / [(x2 − x0)(x2 − x1)]

⇒ a2 = [(f2 − f1)/(x2 − x1) − (f1 − f0)/(x1 − x0)] / (x2 − x0)

• The first divided difference of f with respect to xi, xi+1 is

f[xi, xi+1] = (f(xi+1) − f(xi)) / (xi+1 − xi)

so that a1 = f[x0, x1], the first divided difference of f with respect to x0, x1.

• The second divided difference of f with respect to xi, xi+1, xi+2 is

f[xi, xi+1, xi+2] = (f[xi+1, xi+2] − f[xi, xi+1]) / (xi+2 − xi)

so that a2 = f[x0, x1, x2], the second divided difference of f with respect to x0, x1, x2.

• In general, ak is the kth divided difference of f with respect to x0, x1, x2, . . . , xk.

• The following example illustrates the use of this approach and the ease with which we can add additional data points.

Example Compute the coefficients for the Newton interpolating polynomial at the points (0, 4), (1, 5), (3, 15) and then evaluate the polynomial at x = 2. Then add the data point (4, 40) and repeat the calculation.

We make a divided difference table

xi   f[xi]   f[xi−1, xi]              f[xi−2, xi−1, xi]
0    4       -                        -
1    5       (5 − 4)/(1 − 0) = 1      -
3    15      (15 − 5)/(3 − 1) = 5     (5 − 1)/(3 − 0) = 4/3

Thus the interpolating polynomial p2(x) is

p2(x) = f(x0) + f[x0, x1](x − x0) + f[x0, x1, x2](x − x0)(x − x1)
      = 4 + 1(x − 0) + (4/3)(x − 0)(x − 1)

⇒ p2(2) = 4 + 2 + (4/3)(2)(1) = 26/3

To add another data point we simply add another row and column to our table.

xi   f[xi]   f[xi−1, xi]               f[xi−2, xi−1, xi]        f[xi−3, xi−2, xi−1, xi]
0    4       -                         -                        -
1    5       (5 − 4)/(1 − 0) = 1       -                        -
3    15      (15 − 5)/(3 − 1) = 5      (5 − 1)/(3 − 0) = 4/3    -
4    40      (40 − 15)/(4 − 3) = 25    (25 − 5)/(4 − 1) = 20/3  (20/3 − 4/3)/(4 − 0) = 16/12 = 4/3

Thus the interpolating polynomial p3(x) is

p3(x) = f(x0) + f[x0, x1](x − x0) + f[x0, x1, x2](x − x0)(x − x1) + f[x0, x1, x2, x3](x − x0)(x − x1)(x − x2)
      = 4 + 1(x − 0) + (4/3)(x − 0)(x − 1) + (4/3)(x − 0)(x − 1)(x − 3)

p3(2) = 4 + 1(2) + (4/3)(2)(1) + (4/3)(2)(1)(−1) ⇒ p3(2) = 6 + 8/3 − 8/3 = 6

Note that to add a data point we used all of our previous calculations. If we use the Lagrange form of the interpolating polynomial, then when we add a data point we have to start over with the calculations.
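The divided difference table can be built programmatically; the sketch below (an illustrative implementation added here, not from the original notes) computes the Newton coefficients in place and evaluates the Newton form with nested multiplication, reproducing p2(2) = 26/3 and, after appending (4, 40), p3(2) = 6.

```python
def divided_differences(xs, fs):
    """Return the Newton coefficients a_k = f[x0, ..., xk]."""
    coeffs = list(fs)
    n = len(xs)
    for k in range(1, n):
        # Work bottom-up so entry i-1 still holds the previous-order difference.
        for i in range(n - 1, k - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - k])
    return coeffs

def newton_eval(xs, coeffs, t):
    """Evaluate the Newton form by nested (Horner-like) multiplication."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (t - xs[k]) + coeffs[k]
    return result

xs, fs = [0.0, 1.0, 3.0], [4.0, 5.0, 15.0]
a = divided_differences(xs, fs)      # [4, 1, 4/3], as in the table
print(newton_eval(xs, a, 2.0))       # approximately 26/3

# Adding the point (4, 40) only adds one new coefficient; the old ones persist.
xs2, fs2 = xs + [4.0], fs + [40.0]
a2 = divided_differences(xs2, fs2)   # [4, 1, 4/3, 4/3]
print(newton_eval(xs2, a2, 2.0))     # approximately 6.0
```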

Measuring Errors

We have used phrases such as the "difference between two functions" or the "error in an approximation of a function." We want to quantify these notions, i.e., say precisely what we mean by these phrases and, specifically, how we are going to measure such differences and errors.

In our previous error calculations we had a vector of errors at points. Here f(x) and p(x) are functions defined everywhere on some interval [a, b], so instead of using a vector norm to measure the error we want an error which measures how well we are approximating f(x) over the entire interval.

In our context, we are given a function f(x) and we have constructed a simpler function (the interpolating polynomial) p(x) that is hopefully "close" to f(x); we want to quantify what it means to be "close," or equivalently, quantify what it means for the difference or error f(x) − p(x) to be "small."

There are many ways to measure the closeness of two functions; we use two such ways in these notes.

Continuous norms

Let g(x) be a square integrable function defined on the interval [a, b]. We can measure its size using the L2(a, b) or root-mean-square norm

‖g(x)‖_{L2(a,b)} = ( ∫_a^b |g(x)|^2 dx )^{1/2}.

Example Let g(x) = sin πx on [0, 1]; calculate the L2(0, 1) norm of g(x).

‖g(x)‖²_{L2(0,1)} = ∫_0^1 |sin(πx)|^2 dx = (1/2) ∫_0^1 (1 − cos(2πx)) dx

= [x/2 − sin(2πx)/(4π)] evaluated from x = 0 to x = 1 = 1/2

⇒ ‖g(x)‖_{L2(0,1)} = √2/2 ≈ 0.707

Let g(x) be an integrable function defined on the interval [a, b]. We can measure its size using the L∞[a, b] or uniform or maximum or pointwise norm

‖g(x)‖_{L∞[a,b]} = max_{x∈[a,b]} |g(x)|.

Example Let g(x) = sin πx on [0, 1]; calculate the L∞[0, 1] norm of g(x).

‖sin πx‖_{L∞[0,1]} = max_{x∈[0,1]} |sin πx| = 1
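Both norms are easy to approximate numerically on a grid; the snippet below (an added sketch using a simple trapezoid sum, not part of the original notes) checks the two results above for g(x) = sin πx on [0, 1].

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100001)
g = np.sin(np.pi * x)
dx = x[1] - x[0]

# Trapezoid-rule approximation of the L2(0,1) norm: sqrt of the integral of g^2.
l2 = np.sqrt(dx * (np.sum(g**2) - 0.5 * (g[0]**2 + g[-1]**2)))

# Grid approximation of the L-infinity norm: max of |g| at the sample points.
linf = np.max(np.abs(g))

print(l2)    # approximately 0.7071 = sqrt(2)/2
print(linf)  # 1.0
```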

When we looked at vector norms, we saw that all vector norms are equivalent. This is because they are norms on the finite dimensional space (i.e., one with a finite basis) ℝⁿ. This is true for all finite dimensional spaces.

However, here we are dealing with infinite dimensional spaces.

What is the relationship between the norms in L2 and in L∞?

For a continuous function g(x),

‖g‖²_{L2} = ∫_a^b |g(x)|^2 dx ≤ ∫_a^b [max_{x∈[a,b]} |g(x)|]^2 dx

= [max_{x∈[a,b]} |g(x)|]^2 ∫_a^b dx = (b − a) (max_{x∈[a,b]} |g(x)|)^2 = (b − a) ‖g‖²_{L∞}

so that ‖g‖_{L2(a,b)} ≤ (b − a)^{1/2} ‖g‖_{L∞[a,b]}.

Now if we let g(x) = f(x) − p(x), this bound implies that so long as the interval [a, b] is finite (i.e., we have −∞ < a < b < ∞) small errors as measured in the L∞[a, b] norm imply that the error is also small if it is measured in the L2(a, b) norm.

The converse, however, is not true, i.e., close approximation in the L2(a, b) norm does not imply close approximation in the L∞[a, b] norm.

Example This example illustrates that close approximation in the L2(a, b) norm does not imply close approximation in the L∞[a, b] norm.

Consider the function f(x) = 1 on the interval [0, 1] and the continuous function fε(x) as depicted in the figure, where 0 < ε < 1. We have that

‖f(x) − fε(x)‖_{L2(0,1)} = ε^{1/2} and ‖f(x) − fε(x)‖_{L∞[0,1]} = 1.

If ε ≪ 1, then ‖f(x) − fε(x)‖_{L2(0,1)} is small but ‖f(x) − fε(x)‖_{L∞[0,1]} = 1, i.e., a small L2(0, 1) norm does not imply a small L∞[0, 1] norm.

[Figure: the functions f(x) = 1 and fε(x) on [0, 1].]

A small L∞[a, b] norm implies that the function is small at every point in the interval [a, b] whereas a small L2(a, b) norm only implies that the function is small in some averaged sense.

In this example, we see that fε(x) is a good approximation to the function f(x) = 1 if the error is measured in the L2(0, 1) norm, but is a bad approximation if the error is measured in the L∞[0, 1] norm.

This is why it is very important to specify how one chooses to measure error and how one interprets the smallness or lack thereof of errors.

To economize notation, from now on we will abbreviate ‖ · ‖_{L∞[a,b]} by ‖ · ‖_∞ and ‖ · ‖_{L2(a,b)} by ‖ · ‖_2 whenever the interval involved is not ambiguous.

Discrete norms

As we have seen, sometimes data is given in discrete form, i.e., only the pairs {xj, fj}, j = 1, . . . , m are known. In such cases, it makes no sense to consider the L2(a, b) or L∞[a, b] norms. Instead, we consider the data {fj} as a discrete function¹ for which one uses discrete norms such as our standard vector norms; e.g.,

|||{fj}|||_2 = ( (1/m) ∑_{j=1}^{m} |fj|^2 )^{1/2}  or  |||{fj}|||_∞ = max_{j=1,...,m} |fj|.

Why have we included the factor 1/m when we defined the Euclidean error norm?

¹A discrete function is one that is defined at a discrete set of points; the more familiar functions we deal with are defined for all x in some interval [a, b], i.e., they are defined at an infinite number of points.

The Errors Incurred in Lagrange Interpolation

• Consider the case for which the data {fj}, j = 0, . . . , n is determined by evaluating a given function f(x) at the interpolation points {xj}, j = 0, . . . , n, i.e., we have fj = f(xj). In this case we say that "pn(x) interpolates the function f(x) at the points x0, x1, . . . , xn."

• We turn to the question of how close the Lagrange interpolating polynomial is to the function f(x), where we measure closeness using the L∞[a, b] (or uniform) norm. The answer is provided by the following theorem. Note that the theorem holds, at least in the case of infinite precision arithmetic, for any form of the Lagrange interpolating polynomial, i.e., for any basis one chooses for Pn.

Theorem. Error estimate for Lagrange interpolation. Suppose f(x) has (n + 1) continuous derivatives for all x ∈ [a, b] and pn(x) denotes the Lagrange interpolating polynomial of degree less than or equal to n that interpolates f(x) at the points {xj}, j = 0, . . . , n in [a, b]. Let W(x) = ∏_{j=0}^{n} (x − xj). Then,

‖f − pn‖_∞ ≤ (1/(n + 1)!) ‖f^{(n+1)}‖_∞ ‖W‖_∞,    (2)

where f^{(n+1)}(x) denotes the (n + 1)-st derivative of f(x). □

We refer to inequalities such as (2) as "error estimates." It is important to note that (2) only provides an upper bound for the error, i.e., all it says is that the error ‖f − pn‖_∞ cannot be any larger than the right-hand side. For a specific choice of f(x), the error can be a lot smaller than the right-hand side. On the other hand, there do exist functions f(x) for which (2) holds with equality. Estimates for which the latter can be said are referred to as "sharp error estimates."

We see that the error estimate (2) depends on the interpolation points {xj}, j = 0, . . . , n only through the term ‖W‖_∞. Note that ‖W‖_∞ ≤ (b − a)^{n+1} because each factor in its definition has absolute value less than or equal to (b − a).

It is then natural to ask: can we select the interpolation points {xj} in such a way that ‖W‖_∞ is minimized? The answer to this question is given by the following classic theorem.

Theorem. Chebyshev Points. Let

W(x) = ∏_{j=0}^{n} (x − xj).

Then ‖W‖_∞ is minimized on [−1, 1] by the Chebyshev points given by

xj = cos( (2j + 1)π / (2(n + 1)) ) for j = 0, . . . , n.    (3)

For the Chebyshev points, we have that ‖W‖_∞ ≤ 2^{−n}. □

The Chebyshev points are often defined (see Wikipedia) in the equivalent way

xk = cos( (2k − 1)π / (2m) ), k = 1, . . . , m

Example Find the Chebyshev points on [−1, 1] for n = 3.

x0 = cos(π/8) = 0.92388
x1 = cos(3π/8) = 0.38268
x2 = cos(5π/8) = −0.38268
x3 = cos(7π/8) = −0.92388

The analogous results for the general case of Chebyshev points on the interval [a, b] are easily deduced by mapping from the interval [−1, 1]. The choice of points that minimizes ‖W‖_∞ is then given by

xj = (b + a)/2 + ((b − a)/2) cos( (2j + 1)π / (2(n + 1)) ) for j = 0, . . . , n

in which case ‖W‖_∞ ≤ 2^{−2n−1}(b − a)^{n+1}.

Example. Find the Chebyshev points on [1, 3] for n = 3.

x0 = 2 + (2/2) cos(π/8) = 2.92388
x1 = 2 + (2/2) cos(3π/8) = 2.38268
x2 = 2 + (2/2) cos(5π/8) = 1.61732
x3 = 2 + (2/2) cos(7π/8) = 1.07612
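Both the [−1, 1] and the mapped [a, b] Chebyshev points fit in a one-formula helper; the sketch below (added here for illustration) reproduces the two examples above.

```python
import numpy as np

def chebyshev_points(n, a=-1.0, b=1.0):
    """Chebyshev points x_j, j = 0, ..., n, for [-1, 1] mapped onto [a, b]."""
    j = np.arange(n + 1)
    t = np.cos((2 * j + 1) * np.pi / (2 * (n + 1)))  # formula (3) on [-1, 1]
    return (b + a) / 2 + (b - a) / 2 * t

print(chebyshev_points(3))        # approximately [0.92388, 0.38268, -0.38268, -0.92388]
print(chebyshev_points(3, 1, 3))  # approximately [2.92388, 2.38268, 1.61732, 1.07612]
```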

We now consider some additional ramifications of the error estimate (2).

If we fix the degree n of the polynomial in (2) and consider small intervals [a, b], the Lagrange interpolating polynomial provides quite a good approximation.

Example Consider the two interpolation points x0 = a and x1 = b. If we interpolate f(x) by a linear polynomial p1(x) at the points x0 and x1, we obtain the error estimate

‖f − p1‖_∞ ≤ (h^2/8) ‖f″‖_∞,

where h = (b − a).

Consider the three interpolation points x0 = a, x1 = (a + b)/2, and x2 = b. If we interpolate f(x) by a quadratic polynomial p2(x) at the points x0, x1, and x2, we obtain the error estimate

‖f − p2‖_∞ ≤ h^3 ‖f‴‖_∞.

Clearly, p1(x) and p2(x) are good approximations (in the uniform norm) to f(x) as h → 0, i.e., for (b − a) small. □
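The h^2/8 bound for linear interpolation can be checked numerically; the sketch below (added here for illustration, with f(x) = sin x chosen arbitrarily so that ‖f″‖_∞ ≤ 1) compares the measured maximum error on [0, h] with the bound.

```python
import numpy as np

f = np.sin  # |f''| = |sin| <= 1 on any interval

for h in (1.0, 0.5, 0.25):
    a, b = 0.0, h
    t = np.linspace(a, b, 2001)
    p1 = f(a) + (f(b) - f(a)) / (b - a) * (t - a)  # linear interpolant at a, b
    err = np.max(np.abs(f(t) - p1))
    bound = h**2 / 8
    print(h, err, bound)
    # err stays below the bound and shrinks by roughly 4x when h is halved
```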

The situation changes drastically when we consider the approximating properties of Lagrange interpolating polynomials when we keep [a, b] fixed and let n, the degree of the interpolating polynomial, increase.

Note that when we increase n we need more interpolation points. Thus, we consider a sequence, indexed by n, of interpolation point sets {xj}, j = 0, . . . , n for n = 0, 1, 2, . . . and the corresponding sequence of Lagrange interpolating polynomials pn(x) ∈ Pn.

Question: Is it true that pn(x) → f(x) as n → ∞ for all x ∈ [a, b]? This seems plausible in view of the estimate (2), at least for² C∞[a, b] functions f(x).

²C^r[a, b] denotes the space of functions having r continuous derivatives, so that C∞[a, b] denotes the space of functions that are infinitely differentiable.

Runge’s Example

This example demonstrates that it is NOT true that pn(x) → f(x) as n → ∞ for all x ∈ [a, b] when using equally spaced points. In fact,

‖f − pn‖_∞

becomes arbitrarily large as n → ∞.

Consider the function

f(x) = 1/(1 + 25x^2), −1 ≤ x ≤ 1,

which we want to interpolate (using only function values) with an increasing number of points. We interpolate with evenly spaced points, i.e., with a higher and higher degree polynomial. The black curve represents the function.

[Figures: Lagrange interpolants of Runge's function on [−1, 1] at increasing numbers of equally spaced points; the oscillations near the endpoints grow as the degree increases.]
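Runge's phenomenon is easy to reproduce numerically. The sketch below (added here for illustration) interpolates f(x) = 1/(1 + 25x^2) at equally spaced points and measures the maximum error on a fine grid; the error grows as n increases instead of shrinking.

```python
import numpy as np

def lagrange_eval(xs, fs, t):
    """Evaluate the Lagrange interpolant at every point of the array t."""
    p = np.zeros_like(t)
    for i in range(len(xs)):
        li = np.ones_like(t)
        for j in range(len(xs)):
            if j != i:
                li *= (t - xs[j]) / (xs[i] - xs[j])
        p += fs[i] * li
    return p

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)
t = np.linspace(-1.0, 1.0, 2001)

errors = []
for n in (4, 8, 16):
    xs = np.linspace(-1.0, 1.0, n + 1)  # equally spaced interpolation points
    err = np.max(np.abs(f(t) - lagrange_eval(xs, f(xs), t)))
    errors.append(err)
    print(n, err)  # the max error grows with n
```

Swapping the equally spaced points for the Chebyshev points defined above makes the same loop converge, which is the content of the theorem that follows.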

One may suspect that it is the equal spacing of the interpolation points that causes these problems. To some extent this is true. For example, we have the following result.

Theorem. Convergence of the Interpolating Polynomial using Chebyshev Points. Let f(x) have two continuous derivatives on [a, b] and let pn(x) denote its Lagrange interpolating polynomial of degree less than or equal to n with respect to the Chebyshev points (3). Then pn(x) converges to f(x) uniformly, i.e., ‖f − pn‖_∞ → 0 as n → ∞; in fact, we have that

‖f − pn‖_∞ ≤ C (1/√n)

as n → ∞, where C is independent of n. □

Nevertheless, in the general case of a continuous function f(x), it can be shown that no matter how the interpolation points {xj}, j = 0, . . . , n are chosen, there exists a continuous function f(x) such that ‖f − pn‖_∞ → ∞ as n → ∞.

Since we know that it is not advisable to approximate a function by a high degree interpolating polynomial, how should we interpolate a function to get better accuracy? The answer is piecewise interpolation.

Piecewise Polynomial Interpolation

• We have seen that Lagrange interpolating polynomials can provide a good approximation for a fixed degree if (b − a) is small.

• In general, the Lagrange interpolating polynomials are not so effective if [a, b] is fixed and the degree n increases.

• In general, high degree polynomial interpolation should be avoided.

• What can we do instead?

• We use something called piecewise interpolation.

• When a graphing program plots a function like we did in the previous examples it connects closely spaced points with a straight line. If the points are close enough, then the result looks like a curve to our eye. This is called piecewise linear interpolation.

Examples of Piecewise Functions

Piecewise Constant Function

[Figure: a piecewise constant function on [0, 3].]

Continuous Piecewise Linear Function

[Figure: a continuous piecewise linear function on [0, 3].]

A continuous piecewise linear function is linear on each subinterval and, because it is continuous, the linear function on the interval [xi−1, xi] and the linear function defined on [xi, xi+1] match up at xi.

Below we see some examples of a piecewise linear interpolant to sin x^2 on (−2, 2). In each plot we are using uniform subintervals and doubling the number of subintervals from one plot to the next.


Below we see some examples of piecewise linear interpolation for Runge's example. Note that this form of interpolation does not have the "wiggles" we encountered when using a higher degree polynomial.


Piecewise Linear Interpolation

• Assume that we are given (x0, f0), (x1, f1), (x2, f2), . . . , (xn, fn). We want to construct the piecewise linear interpolant.


• We divide our domain using the points xi, i = 0, 1, . . . , n.

• The linear interpolant on the interval [xi, xi+1] is just

Li(x) = ai + bi(x − xi)

where

ai = fi and fi+1 = fi + bi(xi+1 − xi) ⇒ bi = (fi+1 − fi)/(xi+1 − xi)

Alternately we could write it in Lagrange form as

Li(x) = fi (x − xi+1)/(xi − xi+1) + fi+1 (x − xi)/(xi+1 − xi)

• Thus our piecewise linear interpolant is given by

L(x) = L0(x) if x0 ≤ x ≤ x1;  L1(x) if x1 ≤ x ≤ x2;  . . . ;  Li(x) if xi ≤ x ≤ xi+1;  . . . ;  Ln−1(x) if xn−1 ≤ x ≤ xn.

• Note that L(x) is continuous but not differentiable everywhere.

Example Determine the piecewise linear interpolant to sin x on [0, π] found by using 2 equal subintervals; then evaluate it at the points π/3 and 3π/5 and compute the error at these points.

• Our intervals are [0, π/2] and [π/2, π].

• On the first interval the coefficients are

a0 = sin 0 = 0,  b0 = (sin(π/2) − sin 0)/(π/2 − 0) = 2/π

so we have L0(x) = 0 + (2/π)(x − 0) = 2x/π.

• On the second interval the coefficients are

a1 = sin(π/2) = 1,  b1 = (sin π − sin(π/2))/(π − π/2) = (0 − 1)/(π/2) = −2/π

which gives L1(x) = 1 − (2/π)(x − π/2).

• The point π/3 is in the first interval so

L0(π/3) = 0 + (2/π)(π/3 − 0) = 2/3

The actual value of sin(π/3) is 0.866025 so our error is approximately 0.2. The error is fairly large because the length of our subinterval, π/2 ≈ 1.57, is large.

• Now the point 3π/5 is in the second interval so

L1(3π/5) = 1 − (2/π)(3π/5 − π/2) = 4/5

The actual value of sin(3π/5) is 0.951057 so our error is approximately 0.151.

• Of course we can also do piecewise quadratic or cubic interpolation. Usually we don't go higher than this.

• Remember that piecewise interpolation is almost always preferable to using a higher order polynomial.
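NumPy's np.interp evaluates exactly this continuous piecewise linear interpolant; the sketch below (added here for illustration) redoes the sin x example with two subintervals on [0, π].

```python
import numpy as np

xs = np.array([0.0, np.pi / 2, np.pi])  # breakpoints of the two subintervals
fs = np.sin(xs)

# np.interp evaluates the continuous piecewise linear interpolant L(x).
L_pi3 = np.interp(np.pi / 3, xs, fs)       # 2/3, as computed above
L_3pi5 = np.interp(3 * np.pi / 5, xs, fs)  # 4/5

print(L_pi3, abs(np.sin(np.pi / 3) - L_pi3))        # approximately 0.667, error 0.199
print(L_3pi5, abs(np.sin(3 * np.pi / 5) - L_3pi5))  # approximately 0.8, error 0.151
```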

Formal Definitions

Given −∞ < a < b < ∞, an integer n ≥ 1, and points {xj}, j = 0, . . . , n such that a = x0 < x1 < . . . < xn−1 < xn = b, let

∆h = {[xj−1, xj]}, j = 1, . . . , n

denote the set of subintervals that define a partition of [a, b] into n subintervals and let

h = max_{j=1,2,...,n} (xj − xj−1),

i.e., h denotes the largest length of any of the subintervals in ∆h. We then make the following definition.

Definition. Continuous piecewise-linear polynomials. Given the partition ∆h of [a, b], the set L∆h of all continuous piecewise-linear polynomials on [a, b] with respect to ∆h is the set

L∆h = { φ(x) : φ(x) is continuous on the interval [a, b] and is a linear polynomial on each subinterval [xj−1, xj] in ∆h }. □

Basis for Piecewise Linear Interpolation

Now consider the functions

\phi_0(x) = \begin{cases}
\dfrac{x_1 - x}{x_1 - x_0} & x_0 \le x \le x_1 \\
0 & \text{otherwise}
\end{cases}

\phi_i(x) = \begin{cases}
\dfrac{x - x_{i-1}}{x_i - x_{i-1}} & x_{i-1} \le x \le x_i \\
\dfrac{x_{i+1} - x}{x_{i+1} - x_i} & x_i \le x \le x_{i+1} \\
0 & \text{otherwise}
\end{cases}
\qquad \text{for } i = 1, \ldots, n-1

\phi_n(x) = \begin{cases}
\dfrac{x - x_{n-1}}{x_n - x_{n-1}} & x_{n-1} \le x \le x_n \\
0 & \text{otherwise.}
\end{cases}

For the functions \{\phi_j(x)\}_{j=0}^n, we have the following.

• For j = 0, \ldots, n, each \phi_j(x) \in L_{\Delta_h}, i.e., each \phi_j(x) is a piecewise-linear function with respect to the partition \Delta_h of the interval [a, b].

• The set \{\phi_j(x)\}_{j=0}^n is a basis for L_{\Delta_h}, i.e., it is linearly independent and spans L_{\Delta_h}.

• L_{\Delta_h} has dimension n + 1.

• The basis \{\phi_j(x)\}_{j=0}^n has the delta property

\phi_j(x_k) = \delta_{jk} \quad \text{for } j, k = 0, \ldots, n. \qquad (4)

• For j = 0 and j = n, \phi_j(x) has support³ over one subinterval; for j = 1, \ldots, n-1, \phi_j(x) has support over two subintervals. Specifically, we have that

\phi_j(x) = 0 \quad \text{for } x \in [x_{k-1}, x_k] \text{ whenever } k \ne j \text{ and } k \ne j + 1. \qquad (5)

The functions \{\phi_j(x)\}_{j=0}^n are commonly referred to as the hat functions; the figure provides an illustration of the piecewise-linear basis functions \{\phi_j(x)\}_{j=0}^4 for n = 4.

³The support of a function is the set of x for which the function is not zero.

[Figure: the hat functions \phi_0(x), \ldots, \phi_n(x) on the partition x_0 = a < x_1 < \cdots < x_n = b.]
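The hat functions and their delta property (4) are easy to verify computationally. A minimal Python sketch (our own helper name, any partition):

```python
def hat(j, nodes, x):
    """Hat function phi_j on the given partition, evaluated at scalar x."""
    n = len(nodes) - 1
    if j > 0 and nodes[j - 1] <= x <= nodes[j]:      # rising piece
        return (x - nodes[j - 1]) / (nodes[j] - nodes[j - 1])
    if j < n and nodes[j] <= x <= nodes[j + 1]:      # falling piece
        return (nodes[j + 1] - x) / (nodes[j + 1] - nodes[j])
    return 0.0

nodes = [0.0, 0.5, 1.25, 2.0, 3.0]  # a nonuniform partition
# Delta property (4): phi_j(x_k) = 1 if j == k, else 0.
for j in range(len(nodes)):
    print([hat(j, nodes, xk) for xk in nodes])
```

Printing the matrix [\phi_j(x_k)] gives the identity, exactly the delta property.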

Given f(x) continuous on the interval [a, b], consider the function q_n(x) \in L_{\Delta_h} defined by

q_n(x) = \sum_{j=0}^n f(x_j)\phi_j(x) = f(x_0)\phi_0(x) + f(x_1)\phi_1(x) + \cdots + f(x_{n-1})\phi_{n-1}(x) + f(x_n)\phi_n(x). \qquad (6)

Evaluating this function at x_k, we have that

q_n(x_k) = \sum_{j=0}^n f(x_j)\phi_j(x_k) = \sum_{j=0}^n f(x_j)\delta_{jk} = f(x_k) \quad \text{for } k = 0, \ldots, n,

i.e., the continuous piecewise-linear function q_n(x) interpolates f(x) at the points \{x_k\}_{k=0}^n. Thus, we refer to q_n(x) given by (6) as the Lagrange continuous piecewise-linear interpolant of f(x) with respect to the partition \Delta_h of the interval [a, b]. This interpolant exists (in fact, we have just shown that by constructing it) and it is easy to show that it is also unique.

Because of (5), i.e., because each \phi_j(x) has support over at most two subintervals, (6) collapses to just two terms in each subinterval, i.e., we have that

q_n(x) = f(x_{j-1})\phi_{j-1}(x) + f(x_j)\phi_j(x) \quad \text{for } x \in [x_{j-1}, x_j], \; j = 1, \ldots, n. \qquad (7)

Thus, the value of q_n(x) for x \in [x_{j-1}, x_j] is determined using only the two basis functions \phi_{j-1}(x) and \phi_j(x) corresponding to the endpoints of that interval; for this reason, piecewise polynomial interpolants are referred to as being local approximations. On the other hand, because all the Lagrange fundamental polynomials \ell_j(x) are non-zero over the whole interval [a, b], all terms in the sum are needed whenever one evaluates the Lagrange interpolant p_n^L(x) at any point x \in [a, b]. For this reason, the Lagrange interpolant is referred to as being a global approximation. Likewise, because the basis \{\ell_j(x)\} consists of globally supported functions, i.e., functions that are non-zero over all of [a, b], that basis is referred to as a global basis, whereas, because the basis \{\phi_j(x)\} consists of locally supported functions, i.e., functions that are non-zero only over the subintervals that have x_j as an endpoint, that basis is referred to as a local basis.

Example Write the piecewise linear interpolant of f(x) = \sin(\frac{\pi}{2}x) on the interval [1, 2] using the points x_0 = 1, x_1 = \frac{4}{3}, x_2 = \frac{5}{3}, and x_3 = 2 and the local basis given by the "hat functions".

We have n = 3 and the data

j      | 0 | 1          | 2   | 3
x_j    | 1 | 4/3        | 5/3 | 2
f(x_j) | 1 | \sqrt{3}/2 | 1/2 | 0          (8)

so that

\phi_0(x) = \begin{cases}
\dfrac{\frac{4}{3} - x}{\frac{4}{3} - 1} = 4 - 3x & 1 \le x \le \frac{4}{3} \\
0 & \text{otherwise}
\end{cases}

\phi_1(x) = \begin{cases}
\dfrac{x - 1}{\frac{4}{3} - 1} = 3x - 3 & 1 \le x \le \frac{4}{3} \\
\dfrac{\frac{5}{3} - x}{\frac{5}{3} - \frac{4}{3}} = 5 - 3x & \frac{4}{3} \le x \le \frac{5}{3} \\
0 & \text{otherwise}
\end{cases}

\phi_2(x) = \begin{cases}
\dfrac{x - \frac{4}{3}}{\frac{5}{3} - \frac{4}{3}} = 3x - 4 & \frac{4}{3} \le x \le \frac{5}{3} \\
\dfrac{2 - x}{2 - \frac{5}{3}} = 6 - 3x & \frac{5}{3} \le x \le 2 \\
0 & \text{otherwise}
\end{cases}

\phi_3(x) = \begin{cases}
\dfrac{x - \frac{5}{3}}{2 - \frac{5}{3}} = 3x - 5 & \frac{5}{3} \le x \le 2 \\
0 & \text{otherwise}
\end{cases}

so that

q_3(x) = \begin{cases}
f(1)\phi_0(x) + f(\frac{4}{3})\phi_1(x) = 1\,(4 - 3x) + \frac{\sqrt{3}}{2}(3x - 3) & 1 \le x \le \frac{4}{3} \\
f(\frac{4}{3})\phi_1(x) + f(\frac{5}{3})\phi_2(x) = \frac{\sqrt{3}}{2}(5 - 3x) + \frac{1}{2}(3x - 4) & \frac{4}{3} \le x \le \frac{5}{3} \\
f(\frac{5}{3})\phi_2(x) + f(2)\phi_3(x) = \frac{1}{2}(6 - 3x) + 0\,(3x - 5) & \frac{5}{3} \le x \le 2.
\end{cases}

One can easily verify that the interpolation conditions \{q_3(x_j) = f(x_j)\}_{j=0}^3 are satisfied. \square
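The verification can also be done by machine. A small Python sketch (the function name is ours) transcribing the three pieces of q_3 and checking the interpolation conditions:

```python
import math

def q3(x):
    """Piecewise linear interpolant of sin(pi*x/2) from the worked example."""
    r3 = math.sqrt(3) / 2
    if 1 <= x <= 4/3:
        return 1 * (4 - 3 * x) + r3 * (3 * x - 3)
    if 4/3 <= x <= 5/3:
        return r3 * (5 - 3 * x) + 0.5 * (3 * x - 4)
    if 5/3 <= x <= 2:
        return 0.5 * (6 - 3 * x) + 0 * (3 * x - 5)
    raise ValueError("x outside [1, 2]")

# Interpolation conditions q3(x_j) = f(x_j) at the four nodes:
for xj in (1, 4/3, 5/3, 2):
    print(xj, q3(xj), math.sin(math.pi * xj / 2))
```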

Error in the Piecewise-Linear Interpolant

We now derive an estimate for the error in the piecewise-linear interpolant, in particular, an estimate for \|f(x) - q_n(x)\|_{L^\infty[a,b]}.

On each subinterval [x_{j-1}, x_j], q_n(x) is a linear polynomial such that q_n(x_{j-1}) = f(x_{j-1}) and q_n(x_j) = f(x_j), so that q_n(x), restricted to the interval [x_{j-1}, x_j], is the Lagrange linear interpolation polynomial for f(x) over that interval.

Thus, on each subinterval we can apply the error estimate (2) to conclude that

\|f(x) - q_n(x)\|_{L^\infty[x_{j-1},x_j]} \le \frac{1}{2}\,\|f''(x)\|_{L^\infty[x_{j-1},x_j]}\,\|(x - x_{j-1})(x - x_j)\|_{L^\infty[x_{j-1},x_j]},

where we have used W(x) = (x - x_{j-1})(x - x_j).

Now

\|(x - x_{j-1})(x - x_j)\|_{L^\infty[x_{j-1},x_j]} = \frac{(x_j - x_{j-1})^2}{4} \le \frac{1}{4} h^2,

where we have used calculus to find the maximum value of the function (x - x_{j-1})(x_j - x) over the interval [x_{j-1}, x_j] (just like in a previous example) and also used that, by the definition of h, (x_j - x_{j-1})^2 \le h^2. Combining the last two results, we have that

\|f(x) - q_n(x)\|_{L^\infty[x_{j-1},x_j]} \le \frac{1}{8} h^2 \|f''(x)\|_{L^\infty[x_{j-1},x_j]} \quad \text{for } j = 1, \ldots, n.

Clearly,

\|f''(x)\|_{L^\infty[x_{j-1},x_j]} = \max_{x \in [x_{j-1},x_j]} |f''(x)| \le \max_{x \in [a,b]} |f''(x)| = \|f''(x)\|_{L^\infty[a,b]}.

Combining the last two results, we have

\|f(x) - q_n(x)\|_{L^\infty[x_{j-1},x_j]} \le \frac{1}{8} h^2 \|f''(x)\|_{L^\infty[a,b]} \quad \text{for } j = 1, \ldots, n.

However, we have that

\|f(x) - q_n(x)\|_{L^\infty[a,b]} = \max_{x \in [a,b]} |f(x) - q_n(x)| = \max_{j=1,\ldots,n} \; \max_{x \in [x_{j-1},x_j]} |f(x) - q_n(x)| = \max_{j=1,\ldots,n} \|f(x) - q_n(x)\|_{L^\infty[x_{j-1},x_j]}

so that combining the last two results, we have

\|f(x) - q_n(x)\|_{L^\infty[a,b]} \le \max_{j=1,\ldots,n} \left( \frac{1}{8} h^2 \|f''(x)\|_{L^\infty[a,b]} \right) = \frac{1}{8} h^2 \|f''(x)\|_{L^\infty[a,b]}.

We have thus proved the following result.

Theorem. Error estimate for Lagrange continuous piecewise-linear polynomial interpolation. Let f(x) denote a given function that is twice continuously differentiable over the interval [a, b] and let q_n(x) denote the Lagrange continuous piecewise-linear polynomial that interpolates f(x) at the points a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b. Then,

\|f(x) - q_n(x)\|_{L^\infty[a,b]} \le \frac{1}{8} h^2 \|f''(x)\|_{L^\infty[a,b]}, \qquad (9)

where h = \max_{j=1,\ldots,n} (x_j - x_{j-1}). \square
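The O(h²) rate in the theorem can be observed numerically: doubling n (halving h) should cut the maximum error by a factor of about 4. A Python sketch, assuming f(x) = sin x on [0, π] so that ‖f″‖ = 1 (the helper name is ours):

```python
import math
import numpy as np

def max_interp_error(f, a, b, n, samples=2000):
    """Sampled max |f - q_n| on [a, b] for the piecewise linear
    interpolant on n uniform subintervals."""
    nodes = np.linspace(a, b, n + 1)
    xs = np.linspace(a, b, samples)
    return np.max(np.abs(f(xs) - np.interp(xs, nodes, f(nodes))))

# Each doubling of n should reduce the error by roughly a factor of 4,
# and every error should respect the bound h^2/8 * ||f''|| = h^2/8.
for n in (10, 20, 40, 80):
    e = max_interp_error(np.sin, 0.0, math.pi, n)
    print(f"n = {n:3d}  max error = {e:.2e}  bound = {(math.pi/n)**2/8:.2e}")
```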

If we increase the number of interpolation points n + 1 in such a way that h becomes smaller and smaller, i.e., the length of the longest subinterval [x_{j-1}, x_j] becomes smaller, then, so long as f(x) has two continuous derivatives, we are guaranteed that the error becomes smaller and smaller. In fact, given any ε > 0, we can guarantee that \|f(x) - q_n(x)\|_{L^\infty[a,b]} < ε by choosing n large enough.

A special case is given by uniform partitions of [a, b], i.e., we have that x_j = a + j\frac{b-a}{n} for j = 0, 1, \ldots, n. In this case all the subintervals have the same length h = \frac{b-a}{n} and (9) becomes

\|f(x) - q_n(x)\|_{L^\infty[a,b]} \le \frac{1}{8} \|f''(x)\|_{L^\infty[a,b]} \frac{(b-a)^2}{n^2}. \qquad (10)

In this case it is obvious that we can make \|f(x) - q_n(x)\|_{L^\infty[a,b]} as small as we want by choosing n large enough.

Example Consider the function f(x) = \frac{1}{1+x^2} on the interval [−5, 5], for which, if one uses a uniform partition of the interval [a, b], the error for the global Lagrange polynomial interpolant becomes arbitrarily large as we increase the degree of the polynomial. A tedious but easy calculation (involving finding the maximum of |f''(x)| over the interval [−5, 5]) yields that \|f''(x)\|_{L^\infty[-5,5]} = 2. Then, because b − a = 10, we have from (10) that

\left| \frac{1}{1+x^2} - q_n(x) \right| \le \left\| \frac{1}{1+x^2} - q_n(x) \right\|_{L^\infty[-5,5]} \le \left(\frac{1}{8}\right)(2)(10^2)\frac{1}{n^2} = \frac{25}{n^2} \quad \text{for all } x \in [-5, 5].

Thus, we can guarantee that \left|\frac{1}{1+x^2} - q_n(x)\right| < ε for all x \in [−5, 5] by choosing \frac{25}{n^2} < ε, i.e., choosing n to be the smallest integer larger than \frac{5}{\sqrt{ε}}. For example, we can guarantee that the error is no larger than 0.01 by choosing n = 50. \square
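The bound 25/n² for this example is easy to confirm numerically with a Python sketch (again using numpy's piecewise linear `interp`):

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + x**2)   # the Runge-type function above
n = 50
nodes = np.linspace(-5.0, 5.0, n + 1)
xs = np.linspace(-5.0, 5.0, 5001)

# Sampled maximum error of the piecewise linear interpolant.
err = np.max(np.abs(f(xs) - np.interp(xs, nodes, f(nodes))))
print(f"max error = {err:.2e}, bound 25/n^2 = {25 / n**2:.2e}")
```

In contrast to the global Lagrange interpolant, the piecewise linear error stays comfortably under the theoretical bound.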

Piecewise Cubic Hermite Interpolation

• When we use piecewise linear, piecewise quadratic, piecewise cubic, etc. interpolation, raising the degree does not make the interpolating polynomial any smoother at the nodes.

• For example, for piecewise quadratic interpolation we add another interpolation point in each interval [x_i, x_{i+1}], typically the midpoint.

• If we want a smoother interpolating polynomial we can add additional interpolation conditions at the interior nodes. For example, we can require that not only is the function continuous but additionally the first derivatives are continuous.

• When we use derivatives as interpolation conditions, the interpolating polynomial is called a Hermite interpolating polynomial.

• In the homework you will explore such a polynomial.
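The homework asks you to build such a polynomial by hand; purely as a sketch of the idea (in Python rather than Matlab), scipy's `CubicHermiteSpline` constructs a piecewise cubic that matches both values and first derivatives at the nodes:

```python
import numpy as np
from scipy.interpolate import CubicHermiteSpline

# Interpolate f(x) = sin(x), matching both f and f' at the nodes.
x = np.linspace(0.0, np.pi, 5)
H = CubicHermiteSpline(x, np.sin(x), np.cos(x))

# The interpolant reproduces values and first derivatives at every node.
print(np.allclose(H(x), np.sin(x)))      # values match
print(np.allclose(H(x, 1), np.cos(x)))   # derivatives match (nu=1)
```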

Cubic Spline Interpolation

• Linear interpolation is the simplest form of a spline, followed by quadratic interpolation and then cubic interpolation.

• It is generally assumed that cubic is the optimal degree for splines.

• So we divide our interval [a, b] into subintervals [x_{i-1}, x_i] and on each subinterval represent the function by a cubic polynomial.

• Assume we have points x_i, i = 0, 1, \ldots, n, so we have n subintervals. On each subinterval the cubic has four coefficients to determine, so there are 4n degrees of freedom.

• On each of the n intervals [x_{i-1}, x_i] the cubic must interpolate the data at each endpoint. This guarantees continuity at the interior nodes and gives a total of 2n constraints.

• So now we have 4n degrees of freedom and only 2n constraints, so we add additional conditions for smoothness at the n − 1 interior nodes. Imposing continuity of the first derivatives at the interior nodes adds only n − 1 conditions, so we can actually require continuity of both the first and second derivatives at the interior nodes, bringing the total up to 2n + 2(n − 1) = 4n − 2 constraints.

• How do we impose the remaining two constraints? Different conditions can be used. A common one is that the second derivative of the cubic is zero at the two endpoints; this choice is called the natural cubic spline.

• In the lab, you will use Matlab’s cubic spline interpolant to fit some data sets.

• If we determined a basis (similar to the hat functions) for cubic splines, then we would find that the support of the basis functions was no longer two subintervals but rather four subintervals.
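The lab uses Matlab's cubic spline interpolant; as a rough Python analogue, scipy's `CubicSpline` with `bc_type='natural'` imposes exactly the zero second-derivative endpoint conditions described above. A sketch with made-up sample data:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.linspace(0.0, 2.0, 6)
y = np.exp(-x) * np.sin(3 * x)             # some sample data (our choice)
s = CubicSpline(x, y, bc_type='natural')   # s'' = 0 at both endpoints

print(np.allclose(s(x), y))                # interpolates the data
print(abs(s(x[0], 2)), abs(s(x[-1], 2)))   # endpoint second derivatives ~ 0
```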

• Here is the cubic spline basis function on the interval [−2, 2] where the subintervals have length one. To get the spline basis function on [x_{i-2}, x_{i+2}] we use the transformation

\phi_i(x) = \begin{cases}
\phi\left(\dfrac{x}{h} - i\right) & x_{i-2} \le x \le x_{i+2} \\
0 & \text{elsewhere}
\end{cases}

where

\phi(x) = \begin{cases}
\frac{1}{4}(x + 2)^3 & -2 \le x \le -1, \\
\frac{1}{4}\left(1 + 3(1 + x) + 3(1 + x)^2 - 3(1 + x)^3\right) & -1 \le x \le 0, \\
\frac{1}{4}\left(1 + 3(1 - x) + 3(1 - x)^2 - 3(1 - x)^3\right) & 0 \le x \le 1, \\
\frac{1}{4}(2 - x)^3 & 1 \le x \le 2.
\end{cases}

[Figure: the cubic spline basis function \phi(x) on [−2, 2]; it rises from 0 at x = ±2 to its maximum value 1 at x = 0.]
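Transcribing the piecewise formula for φ(x) into Python makes it easy to check the values at the knots (0, 1/4, 1, 1/4, 0) and that the pieces join continuously:

```python
def phi(x):
    """The cubic spline basis function on [-2, 2] from the notes."""
    if -2 <= x <= -1:
        return 0.25 * (x + 2) ** 3
    if -1 <= x <= 0:
        t = 1 + x
        return 0.25 * (1 + 3 * t + 3 * t**2 - 3 * t**3)
    if 0 <= x <= 1:
        t = 1 - x
        return 0.25 * (1 + 3 * t + 3 * t**2 - 3 * t**3)
    if 1 <= x <= 2:
        return 0.25 * (2 - x) ** 3
    return 0.0

# Values at the knots -2, -1, 0, 1, 2:
print([phi(v) for v in (-2, -1, 0, 1, 2)])  # [0.0, 0.25, 1.0, 0.25, 0.0]
```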