
Physics 129a
Calculus of Variations
071113 Frank Porter
Revision 081120

1 Introduction

Many problems in physics have to do with extrema. When the problem involves finding a function that satisfies some extremum criterion, we may attack it with various methods under the rubric of the “calculus of variations”. The basic approach is analogous to that of finding the extremum of a function in ordinary calculus.

2 The Brachistochrone Problem

Historically and pedagogically, the prototype problem introducing the calculus of variations is the “brachistochrone”, from the Greek for “shortest time”. We suppose that a particle of mass m moves along some curve under the influence of gravity. We’ll assume motion in two dimensions here, and that the particle moves, starting at rest, from fixed point a to fixed point b. We could imagine that the particle is a bead that moves along a rigid wire without friction [Fig. 1(a)]. The question is: what is the shape of the wire for which the time to get from a to b is minimized?

First, it seems that such a path must exist: the two outer paths in Fig. 1(b) presumably bracket the correct path, or at least can be made to bracket the path. For example, the upper path can be adjusted to take an arbitrarily long time by making the first part more and more horizontal. The lower path can also be adjusted to take an arbitrarily long time by making the dip deeper and deeper. The straight-line path from a to b must take a shorter time than both of these alternatives, though it may not be the shortest.

It is also readily observed that the optimal path must be single-valued in x; see Fig. 1(c). A path that wiggles back and forth in x can be shortened in time simply by dropping a vertical path through the wiggles. Thus, we can describe the path C as a function y(x).

Figure 1: The Brachistochrone Problem: (a) Illustration of the problem; (b) Schematic to argue that a shortest-time path must exist; (c) Schematic to argue that we needn’t worry about paths folding back on themselves.

We’ll choose a coordinate system with the origin at point a and the y axis directed downward (Fig. 1). We choose the zero of potential energy so that it is given by:

V(y) = −mgy.

The kinetic energy is

T(y) = −V(y) = (1/2) m v^2,

for zero total energy. Thus, the speed of the particle is

v(y) = √(2gy).

An element of distance traversed is:

ds = √((dx)^2 + (dy)^2) = √(1 + (dy/dx)^2) dx.

Thus, the element of time to traverse ds is:

dt = ds/v = √((1 + (dy/dx)^2) / (2gy)) dx,

and the total time of descent is:

T = ∫_0^{x_b} √((1 + (dy/dx)^2) / (2gy)) dx.

Different functions y(x) will typically yield different values for T; we call T a “functional” of y. Our problem is to find the minimum of this functional with respect to possible functions y. Note that y must be continuous; it would require an infinite speed to generate a discontinuity. Also, the acceleration must exist, and hence the second derivative d^2y/dx^2. We’ll proceed to formulate this problem as an example of a more general class of problems in “variational calculus”.
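As a quick numerical illustration (a sketch, not part of the original notes; the endpoint (x_b, y_b) = (1, 1), g = 9.8, and the use of SciPy are arbitrary choices), we can evaluate the time functional for candidate paths by quadrature:

    # Evaluate the descent-time functional T[y] by numerical quadrature.
    import numpy as np
    from scipy.integrate import quad

    g, xb = 9.8, 1.0   # illustrative values; y is measured downward

    def descent_time(y, yprime):
        # T[y] = integral from 0 to xb of sqrt((1 + y'^2) / (2 g y)) dx
        integrand = lambda x: np.sqrt((1.0 + yprime(x)**2) / (2.0 * g * y(x)))
        T, _ = quad(integrand, 0.0, xb)  # the 1/sqrt(x) endpoint singularity is integrable
        return T

    # Straight line y = x and parabola y = x(2 - x), both from (0,0) to (1,1):
    print(descent_time(lambda x: x, lambda x: 1.0))                    # ~0.64 s
    print(descent_time(lambda x: x*(2.0 - x), lambda x: 2.0 - 2.0*x))  # ~0.60 s

The parabola, which drops more steeply at the start, beats the straight line; the optimal path found below does better still.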

Consider all functions, y(x), with fixed values at the two endpoints: y(x_0) = y_0 and y(x_1) = y_1. We wish to find that y(x) which gives an extremum for the integral:

I(y) = ∫_{x_0}^{x_1} F(y, y′, x) dx,

where F(y, y′, x) is some given function of its arguments. We’ll assume “good behavior” as needed.

In ordinary calculus, when we want to find the extrema of a function f(x, y, ...), we proceed as follows: start with some candidate point (x_0, y_0, ...), and compute the total differential, df, with respect to arbitrary infinitesimal changes in the variables, (dx, dy, ...):

df = (∂f/∂x)|_{x_0, y_0, ...} dx + (∂f/∂y)|_{x_0, y_0, ...} dy + ...

Now, df must vanish at an extremum, independent of which direction we choose with our infinitesimal (dx, dy, ...). If (x_0, y_0, ...) are the coordinates of an extremal point, then

(∂f/∂x)|_{x_0, y_0, ...} = (∂f/∂y)|_{x_0, y_0, ...} = ... = 0.

Solving these equations thus gives the coordinates of an extremum point.
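For example (an illustration, not from the notes; the function f below is arbitrary), this recipe is easily carried out symbolically:

    # Ordinary-calculus extremum: set all partial derivatives to zero.
    import sympy as sp

    x, y = sp.symbols('x y')
    f = (x - 1)**2 + 2*(y + 2)**2   # an arbitrary sample function
    print(sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y]))   # {x: 1, y: -2}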

Finding the extremum of a functional in variational calculus follows the same basic approach. Instead of a point (x_0, y_0, ...), we consider a candidate function y(x) = Y(x). This candidate must satisfy our specified behavior at the endpoints:

Y(x_0) = y_0,
Y(x_1) = y_1.    (1)

We consider a small change in this function by adding some multiple of another function, h(x):

Y(x) → Y(x) + εh(x).

Figure 2: Variation on function Y by function εh.

To maintain the endpoint condition, we must have h(x_0) = h(x_1) = 0. The notation δY is often used for εh(x).

A change in functional form of Y(x) yields a change in the integral I. The integrand changes at each point x according to changes in y and y′:

y(x) = Y(x) + εh(x),
y′(x) = Y′(x) + εh′(x).    (2)

To first order in ε, the new value of F is:

F(Y + εh, Y′ + εh′, x) ≈ F(Y, Y′, x) + (∂F/∂y)|_{y=Y, y′=Y′} εh(x) + (∂F/∂y′)|_{y=Y, y′=Y′} εh′(x).    (3)

We’ll use “δI” to denote the change in I due to this change in functional form:

δI = ∫_{x_0}^{x_1} F(Y + εh, Y′ + εh′, x) dx − ∫_{x_0}^{x_1} F(Y, Y′, x) dx
   ≈ ε ∫_{x_0}^{x_1} [(∂F/∂y)|_{y=Y, y′=Y′} h + (∂F/∂y′)|_{y=Y, y′=Y′} h′] dx.    (4)

We may apply integration by parts to the second term:

∫_{x_0}^{x_1} (∂F/∂y′) h′ dx = −∫_{x_0}^{x_1} h (d/dx)(∂F/∂y′) dx,    (5)

where we have used h(x_0) = h(x_1) = 0. Thus,

δI = ε ∫_{x_0}^{x_1} [∂F/∂y − (d/dx)(∂F/∂y′)]_{y=Y, y′=Y′} h dx.    (6)

When I is at a minimum, δI must vanish, since, if δI > 0 for some ε, then changing the sign of ε gives δI < 0, corresponding to a smaller value of I. A similar argument applies for δI < 0; hence δI = 0 at a minimum. This must be true for arbitrary h and ε small but finite. It seems that a necessary condition for I to be extremal is:

[∂F/∂y − (d/dx)(∂F/∂y′)]_{y=Y, y′=Y′} = 0.    (7)

This follows from the fundamental theorem:

Theorem: If f(x) is continuous in [x_0, x_1] and

∫_{x_0}^{x_1} f(x) h(x) dx = 0    (8)

for every continuously differentiable h(x) in [x_0, x_1], where h(x_0) = h(x_1) = 0, then f(x) = 0 for x ∈ [x_0, x_1].

Proof: Imagine that f(χ) > 0 for some x_0 < χ < x_1. Since f is continuous, there exists ε > 0 such that f(x) > 0 for all x ∈ (χ − ε, χ + ε). Let

h(x) = { (x − χ + ε)^2 (x − χ − ε)^2,   χ − ε ≤ x ≤ χ + ε,
       { 0,                             otherwise.    (9)

Note that h(x) is continuously differentiable in [x_0, x_1] and vanishes at x_0 and x_1. We have that

∫_{x_0}^{x_1} f(x) h(x) dx = ∫_{χ−ε}^{χ+ε} f(x) (x − χ + ε)^2 (x − χ − ε)^2 dx    (10)
                           > 0,    (11)

since f(x) is larger than zero everywhere in this interval. This contradicts the hypothesis, so f(x) cannot be larger than zero anywhere in the interval. The parallel argument follows for f(x) < 0.

This theorem then permits the assertion that

[∂F/∂y − (d/dx)(∂F/∂y′)]_{y=Y, y′=Y′} = 0    (12)

whenever y = Y is such that I is an extremum, at least if the left-hand side is continuous. We call this expression the “Lagrangian derivative” of F(y, y′, x) with respect to y(x), and denote it by δF/δy.

The extremum condition, relabeling Y → y, is then:

δF/δy ≡ ∂F/∂y − (d/dx)(∂F/∂y′) = 0.    (13)

This is called the Euler-Lagrange equation. Note that δI = 0 is a necessary condition for I to be an extremum, but not sufficient. By definition, the Euler-Lagrange equation determines points for which I is “stationary”. Further consideration is required to establish whether I is an extremum or not.
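As an aside (not part of the original notes), the Euler-Lagrange equation for a given integrand can be ground out by computer algebra. A minimal sketch using sympy’s euler_equations, applied to the brachistochrone integrand used below:

    # Derive the Euler-Lagrange equation for F = sqrt((1 + y'^2)/y).
    import sympy as sp
    from sympy.calculus.euler import euler_equations

    x = sp.symbols('x')
    y = sp.Function('y')
    F = sp.sqrt((1 + y(x).diff(x)**2) / y(x))

    # Returns [Eq(..., 0)], i.e., condition (13) for this particular F.
    print(euler_equations(F, y(x), x))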

We may write the Euler-Lagrange equation in another form. Let

F_a(y, y′, x) ≡ ∂F/∂y′.    (14)

Then

(d/dx)(∂F/∂y′) = dF_a/dx = ∂F_a/∂x + (∂F_a/∂y) y′ + (∂F_a/∂y′) y″    (15)
               = ∂^2F/∂x∂y′ + (∂^2F/∂y∂y′) y′ + (∂^2F/∂y′^2) y″.    (16)

Hence the Euler-Lagrange equation may be written:

(∂^2F/∂y′^2) y″ + (∂^2F/∂y∂y′) y′ + ∂^2F/∂x∂y′ − ∂F/∂y = 0.    (17)

Let us now apply this to the brachistochrone problem, finding the extremum of:

√(2g) T = ∫_0^{x_b} √((1 + y′^2)/y) dx.    (18)

That is:

F(y, y′, x) = √((1 + y′^2)/y).    (19)

Notice that, in this case, F has no explicit dependence on x, and we can take a short-cut. Starting with the Euler-Lagrange equation, if F has no explicit x-dependence we find:

0 = [∂F/∂y − (d/dx)(∂F/∂y′)] y′    (20)
  = (∂F/∂y) y′ − y′ (d/dx)(∂F/∂y′)    (21)
  = dF/dx − (∂F/∂y′) y″ − y′ (d/dx)(∂F/∂y′)    (22)
  = (d/dx)(F − y′(∂F/∂y′)).    (23)

Hence,

F − y′(∂F/∂y′) = constant = C.    (24)

In this case,

y′(∂F/∂y′) = y′^2 / √(y(1 + y′^2)).    (25)

Thus,

√((1 + y′^2)/y) − y′^2/√(y(1 + y′^2)) = C,    (26)

or

y(1 + y′^2) = 1/C^2 ≡ A.    (27)

Solving for x, we find

x = ∫ √(y/(A − y)) dy.    (28)

We may perform this integration with the trigonometric substitution y = (A/2)(1 − cos θ) = A sin^2(θ/2). Then,

x = ∫ √(sin^2(θ/2) / (1 − sin^2(θ/2))) A sin(θ/2) cos(θ/2) dθ    (29)
  = A ∫ sin^2(θ/2) dθ    (30)
  = (A/2)(θ − sin θ) + B.    (31)

We determine the integration constant B by letting θ = 0 at y = 0. We chose our coordinates so that x_a = y_a = 0, and thus B = 0. Constant A is determined by requiring that the curve pass through (x_b, y_b):

x_b = (A/2)(θ_b − sin θ_b),    (32)
y_b = (A/2)(1 − cos θ_b).    (33)

This pair of equations determines A and θ_b. The brachistochrone is given parametrically by:

x = (A/2)(θ − sin θ),    (34)
y = (A/2)(1 − cos θ).    (35)
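Equations (32) and (33) generally require numerical solution. Their ratio, x_b/y_b = (θ_b − sin θ_b)/(1 − cos θ_b), increases monotonically from 0 to ∞ on (0, 2π), so a one-dimensional root-finder suffices. A sketch assuming SciPy, with an arbitrary example endpoint:

    # Solve (32)-(33) for theta_b and A, given the endpoint (xb, yb).
    import numpy as np
    from scipy.optimize import brentq

    xb, yb = 1.0, 1.0   # example endpoint (y measured downward)

    ratio = lambda th: (th - np.sin(th)) / (1.0 - np.cos(th))   # equals xb/yb
    theta_b = brentq(lambda th: ratio(th) - xb/yb, 1e-9, 2.0*np.pi - 1e-9)
    A = 2.0 * yb / (1.0 - np.cos(theta_b))
    print(theta_b, A)   # ~2.412 and ~1.146 for this endpoint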

In classical mechanics, Hamilton’s principle for conservative systems, that the action is stationary, gives the familiar Euler-Lagrange equations of classical mechanics. For a system with generalized coordinates q_1, q_2, ..., q_n, the action is

S = ∫_{t_0}^{t} L({q_i}, {\dot{q}_i}, t′) dt′,    (36)

where L is the Lagrangian. Requiring S to be stationary yields:

(d/dt)(∂L/∂\dot{q}_i) − ∂L/∂q_i = 0,  i = 1, 2, ..., n.    (37)

3 Relation to the Sturm-Liouville Problem

Suppose we have the Sturm-Liouville operator:

L = (d/dx) p(x) (d/dx) − q(x),    (38)

with p(x) ≥ 0, q(x) ≥ 0, and x ∈ (0, U). We are interested in solving the inhomogeneous equation Lf = g, where g is a given function.

Consider the functional

J = ∫_0^U (p f′^2 + q f^2 + 2gf) dx.    (39)

The Euler-Lagrange equation for J to be an extremum is:

∂F/∂f − (d/dx)(∂F/∂f′) = 0,    (40)

where F = p f′^2 + q f^2 + 2gf. We have

∂F/∂f = 2qf + 2g,    (41)
(d/dx)(∂F/∂f′) = 2p′f′ + 2pf″.    (42)

Substituting into the Euler-Lagrange equation gives

(d/dx)[p(x) (d/dx) f(x)] − q(x) f(x) = g(x).    (43)

This is the Sturm-Liouville equation! That is, the Sturm-Liouville differential equation is just the Euler-Lagrange equation for the functional J.
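This equivalence is easy to check numerically (a sketch, not from the notes; the choices p = q = 1, U = 1, and g below are illustrative). Minimizing a discretized J directly recovers the solution of the differential equation:

    # Minimize the discretized J[f] = integral of (p f'^2 + q f^2 + 2 g f) dx,
    # with p = q = 1 on (0,1), f(0) = f(1) = 0, and g = -(pi^2 + 1) sin(pi x),
    # chosen so that the exact solution of (p f')' - q f = g is sin(pi x).
    import numpy as np
    from scipy.optimize import minimize

    N = 100
    x = np.linspace(0.0, 1.0, N + 1)
    dx = x[1] - x[0]
    g = -(np.pi**2 + 1.0) * np.sin(np.pi * x)

    def J(f_interior):
        f = np.concatenate(([0.0], f_interior, [0.0]))  # boundary conditions
        fp = np.diff(f) / dx                            # finite-difference f'
        return np.sum(fp**2) * dx + np.sum(f**2 + 2.0 * g * f) * dx

    res = minimize(J, np.zeros(N - 1), method='L-BFGS-B')
    print(np.max(np.abs(res.x - np.sin(np.pi * x[1:-1]))))  # small discretization error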

We have the following theorem:

Theorem: The solution to

(d/dx)[p(x) (d/dx) f(x)] − q(x) f(x) = g(x),    (44)

where p(x) > 0, q(x) ≥ 0, with boundary conditions f(0) = a and f(U) = b, exists and is unique.

Proof: First, suppose there exist two solutions, f_1 and f_2. Then d = f_1 − f_2 must satisfy the homogeneous equation:

(d/dx)[p(x) (d/dx) d(x)] − q(x) d(x) = 0,    (45)

with homogeneous boundary conditions d(0) = d(U) = 0. Now multiply Equation 45 by d(x) and integrate:

∫_0^U d(x) (d/dx)(p(x) (d/dx) d(x)) dx − ∫_0^U q(x) d(x)^2 dx = 0.

Integrating the first term by parts, using d(0) = d(U) = 0,

∫_0^U d(x) (d/dx)(p(x) (d/dx) d(x)) dx = d(x) p(x) (dd(x)/dx) |_0^U − ∫_0^U (dd(x)/dx)^2 p(x) dx = −∫_0^U p d′^2 dx.    (46)

Thus,

∫_0^U (p d′^2 + q(x) d(x)^2) dx = 0.    (47)

Since p d′^2 ≥ 0 and q d^2 ≥ 0, we must thus have p d′^2 = 0 and q d^2 = 0 in order for the integral to vanish. Since p > 0 and p d′^2 = 0, it must be true that d′ = 0; that is, d is a constant. But d(0) = 0, therefore d(x) = 0. The solution, if it exists, is unique.

The issue for existence is the boundary conditions. We presume that a solution to the differential equation exists for some boundary conditions, and must show that a solution exists for the given boundary condition. From elementary calculus we know that two linearly independent solutions to the homogeneous differential equation exist. Let h_1(x) be a non-trivial solution to the homogeneous differential equation with h_1(0) = 0. This must be possible because we can take a suitable linear combination of our two solutions. Because the solution to the inhomogeneous equation is unique, it must be true that h_1(U) ≠ 0. Likewise, let h_2(x) be a solution to the homogeneous equation with h_2(U) = 0 (and therefore h_2(0) ≠ 0). Suppose f_0(x) is a solution to the inhomogeneous equation satisfying some boundary condition. Form the function:

f(x) = f_0(x) + k_1 h_1(x) + k_2 h_2(x).    (48)

We adjust constants k_1 and k_2 in order to satisfy the desired boundary conditions:

a = f_0(0) + k_2 h_2(0),    (49)
b = f_0(U) + k_1 h_1(U).    (50)

That is,

k_1 = (b − f_0(U)) / h_1(U),    (51)
k_2 = (a − f_0(0)) / h_2(0).    (52)

We have demonstrated existence of a solution.

This discussion leads us to the variational calculus theorem:

Theorem: For continuously differentiable functions in (0, U) satisfying f(0) = a and f(U) = b, the functional

J = ∫_0^U (p f′^2 + q f^2 + 2gf) dx,    (53)

with p(x) > 0 and q(x) ≥ 0, attains its minimum if and only if f(x) is the solution of the corresponding Sturm-Liouville equation.

Proof: Let s(x) be the unique solution to the Sturm-Liouville equation satisfying the given boundary conditions. Let f(x) be any other continuously differentiable function satisfying the boundary conditions. Then d(x) ≡ f(x) − s(x) is continuously differentiable and d(0) = d(U) = 0. Solving for f, squaring, and doing the same for the derivative, yields

f^2 = d^2 + s^2 + 2sd,    (54)
f′^2 = d′^2 + s′^2 + 2s′d′.    (55)

Let

ΔJ ≡ J(f) − J(s)    (56)
   = ∫_0^U (p f′^2 + q f^2 + 2gf − p s′^2 − q s^2 − 2gs) dx    (57)
   = ∫_0^U [p (d′^2 + 2s′d′) + q (d^2 + 2ds) + 2gd] dx    (58)
   = 2 ∫_0^U (p d′s′ + q ds + gd) dx + ∫_0^U (p d′^2 + q d^2) dx.    (59)

But

∫_0^U (p d′s′ + q ds + gd) dx = d p s′ |_0^U + ∫_0^U [−d(x) (d/dx)(p s′) + q ds + gd] dx
  = ∫_0^U d(x) [−(d/dx)(p s′) + q s + g] dx   (since d(0) = d(U) = 0)
  = 0,   since the integrand vanishes by the differential equation.    (60)

Thus, we have that

ΔJ = ∫_0^U (p d′^2 + q d^2) dx ≥ 0.    (61)

In other words, f does no better than s; hence s corresponds to a minimum. Furthermore, if ΔJ = 0, then d = 0, since p > 0 implies d′ must be zero, and therefore d is constant, but we know d(0) = 0, hence d = 0. Thus, f = s at the minimum.

4 The Rayleigh-Ritz Method

Consider the Sturm-Liouville problem:

(d/dx)[p(x) (d/dx) f(x)] − q(x) f(x) = g(x),    (62)

with p > 0, q ≥ 0, and specified boundary conditions. For simplicity here, let’s assume f(0) = f(U) = 0. Imagine expanding the solution in some set of complete functions, {β_n(x)} (not necessarily eigenfunctions):

f(x) = Σ_{n=1}^{∞} A_n β_n(x).

We have just shown that our problem is equivalent to minimizing

J = ∫_0^U (p f′^2 + q f^2 + 2gf) dx.    (63)

Substitute in our expansion, noting that

p f′^2 = Σ_m Σ_n A_m A_n p(x) β′_m(x) β′_n(x).    (64)

Let

C_{mn} ≡ ∫_0^U p β′_m β′_n dx,    (65)
B_{mn} ≡ ∫_0^U q β_m β_n dx,    (66)
G_n ≡ ∫_0^U g β_n dx.    (67)

Assume that we can interchange the sum and integral, obtaining, for example,

∫_0^U p f′^2 dx = Σ_m Σ_n C_{mn} A_m A_n.    (68)

Then

J = Σ_m Σ_n (C_{mn} + B_{mn}) A_m A_n + 2 Σ_n G_n A_n.    (69)

Let D_{mn} ≡ C_{mn} + B_{mn} = D_{nm}. The D_{mn} and G_n are known, at least in principle. We wish to solve for the expansion coefficients {A_n}. To accomplish this, use the condition that J is a minimum, that is,

∂J/∂A_n = 0, ∀n.    (70)

Thus,

0 = ∂J/∂A_n = 2 Σ_{m=1}^{∞} D_{nm} A_m + 2 G_n,  i.e.,  Σ_{m=1}^{∞} D_{nm} A_m + G_n = 0,  n = 1, 2, ...    (71)

This is an infinite system of coupled inhomogeneous equations. If D_{nm} is diagonal, the solution is simple:

A_n = −G_n / D_{nn}.    (72)

The reader is encouraged to demonstrate that this occurs if the β_n are the eigenfunctions of the Sturm-Liouville operator.

It may be too difficult to solve the eigenvalue problem. In this case, we can look for an approximate solution via the “Rayleigh-Ritz” approach: choose some finite number of linearly independent functions {α_1(x), α_2(x), ..., α_N(x)}. In order to find a function

f(x) = Σ_{n=1}^{N} A_n α_n(x)    (73)

that closely approximates the solution, we find the values for A_n that minimize

J(f) = Σ_{n,m=1}^{N} D_{nm} A_m A_n + 2 Σ_{n=1}^{N} G_n A_n,    (74)

where now

D_{nm} ≡ ∫_0^U (p α′_n α′_m + q α_n α_m) dx,    (75)
G_n ≡ ∫_0^U g α_n dx.    (76)

The minimum of J(f) is at:

Σ_{m=1}^{N} D_{nm} A_m + G_n = 0,  n = 1, 2, ..., N.    (77)

In this method, it is important to make a good guess for the set of functions {α_n}.
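As an illustration (a sketch, not from the notes; the equation and basis below are arbitrary choices), take p = 1, q = 0, g = −1 on (0, 1) with f(0) = f(1) = 0, so that the exact solution is f = x(1 − x)/2, and use the basis α_n = x^n(1 − x):

    # Rayleigh-Ritz solution of f'' = -1 on (0,1) with f(0) = f(1) = 0.
    import sympy as sp

    x = sp.symbols('x')
    N = 3
    alpha = [x**n * (1 - x) for n in range(1, N + 1)]  # each vanishes at 0 and 1
    g = -1

    # D_nm and G_n of Eqs. (75)-(76), with p = 1 and q = 0:
    D = sp.Matrix(N, N, lambda n, m: sp.integrate(
            sp.diff(alpha[n], x) * sp.diff(alpha[m], x), (x, 0, 1)))
    G = sp.Matrix(N, 1, lambda n, _: sp.integrate(g * alpha[n], (x, 0, 1)))

    A = D.solve(-G)                          # Eq. (77): sum_m D_nm A_m + G_n = 0
    f = sp.expand(sum(A[n] * alpha[n] for n in range(N)))
    print(f, sp.simplify(f - x*(1 - x)/2))   # difference is 0: exact here

Here the basis happens to contain the exact answer, so the method reproduces it exactly; in general it returns the minimizer of J within the span of the α_n.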

It may be remarked that the Rayleigh-Ritz method is similar in spirit to, but different from, the variational method we typically introduce in quantum mechanics, for example when attempting to compute the ground state energy of the helium atom. In that case, we adjust parameters in a non-linear function, while in the Rayleigh-Ritz method we adjust the linear coefficients in an expansion.

5 Adding Constraints

As in ordinary extremum problems, constraints introduce correlations, now in the possible variations of the function at different points. As with the ordinary problem, we may employ the method of Lagrange multipliers to impose the constraints.

We consider the case of the “isoperimetric problem”: to find the stationary points of the functional

J = ∫_a^b F(f, f′, x) dx,    (78)

in variations δf vanishing at x = a, b, with the constraint that

C ≡ ∫_a^b G(f, f′, x) dx    (79)

is constant under variations. We have the following theorem:

Theorem: (Euler) The function f that solves this problem also makes the functional I = J + λC stationary for some λ, as long as δC/δf ≠ 0 (i.e., f does not satisfy the Euler-Lagrange equation for C).

Proof: (partial) We make stationary the integral:

I = J + λC = ∫_a^b (F + λG) dx.    (80)

That is, f must satisfy

∂F/∂f − (d/dx)(∂F/∂f′) + λ (∂G/∂f − (d/dx)(∂G/∂f′)) = 0.    (81)

Multiply by the variation δf(x) and integrate:

∫_a^b (∂F/∂f − (d/dx)(∂F/∂f′)) δf(x) dx + λ ∫_a^b (∂G/∂f − (d/dx)(∂G/∂f′)) δf(x) dx = 0.    (82)

Here, δf(x) is arbitrary. However, only those variations that keep C invariant are allowed (e.g., take the partial derivative with respect to λ and require it to be zero):

δC = ∫_a^b (∂G/∂f − (d/dx)(∂G/∂f′)) δf(x) dx = 0.    (83)

5.1 Example: Catenary

A heavy chain is suspended from endpoints at (x_1, y_1) and (x_2, y_2). What curve describes its equilibrium position, under a uniform gravitational field?

The solution must minimize the potential energy:

V = g ∫_1^2 y dm    (84)
  = ρg ∫_1^2 y ds    (85)
  = ρg ∫_{x_1}^{x_2} y √(1 + y′^2) dx,    (86)

where ρ is the linear density of the chain, and the distance element along the chain is ds = √(1 + y′^2) dx.

We wish to minimize V, under the constraint that the length of the chain is L, a constant. We have

L = ∫_1^2 ds = ∫_{x_1}^{x_2} √(1 + y′^2) dx.    (87)

To solve, let (we multiply L by ρg and then divide ρg out of the problem)

F(y, y′, x) = y√(1 + y′^2) + λ√(1 + y′^2),    (88)

and solve the Euler-Lagrange equation for F. Notice that F does not depend explicitly on x, so we again use our short-cut:

F − y′(∂F/∂y′) = constant = C.    (89)

Thus,

C = F − y′(∂F/∂y′)    (90)
  = (y + λ)(√(1 + y′^2) − y′^2/√(1 + y′^2))    (91)
  = (y + λ)/√(1 + y′^2).    (92)

Some manipulation yields

dy/√((y + λ)^2 − C^2) = dx/C.    (93)

With the substitution y + λ = C cosh θ, we obtain θ = (x + k)/C, where k is an integration constant, and thus

y + λ = C cosh((x + k)/C).    (94)

There are three unknown constants to determine in this expression: C, k, and λ. We have three equations to use for this:

y_1 + λ = C cosh((x_1 + k)/C),    (95)
y_2 + λ = C cosh((x_2 + k)/C),    (96)
L = ∫_{x_1}^{x_2} √(1 + y′^2) dx.    (97)
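These equations generally call for numerics. Since y′ = sinh((x + k)/C), the length integral (97) evaluates to C[sinh((x_2 + k)/C) − sinh((x_1 + k)/C)], and a standard root-finder can then solve the system. A sketch assuming SciPy, with arbitrary endpoints and chain length:

    # Solve (95)-(97) for the catenary constants C, k, and lambda.
    import numpy as np
    from scipy.optimize import fsolve

    x1, y1 = 0.0, 0.0   # illustrative endpoints...
    x2, y2 = 2.0, 0.0
    Lchain = 3.0        # ...and chain length (must exceed the chord length)

    def residuals(v):
        C, k, lam = v
        return [C*np.cosh((x1 + k)/C) - (y1 + lam),                      # (95)
                C*np.cosh((x2 + k)/C) - (y2 + lam),                      # (96)
                C*(np.sinh((x2 + k)/C) - np.sinh((x1 + k)/C)) - Lchain]  # (97)

    C, k, lam = fsolve(residuals, [1.0, -1.0, 1.0])  # initial guess matters
    print(C, k, lam)  # for this symmetric case k = -(x1 + x2)/2 = -1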

6 Eigenvalue Problems

We may treat the eigenvalue problem as a variational problem. As an example, consider again the Sturm-Liouville eigenvalue equation:

(d/dx)[p(x) (df(x)/dx)] − q(x) f(x) = −λ w(x) f(x),    (98)

with boundary conditions f(0) = f(U) = 0. This is of the form

Lf = −λwf.    (99)

Earlier, we found the desired functional to make stationary was, for Lf = 0,

I = ∫_0^U (p f′^2 + q f^2) dx.    (100)

We modify this to the eigenvalue problem with q → q − λw, obtaining

I = ∫_0^U (p f′^2 + q f^2 − λ w f^2) dx,    (101)

whose Euler-Lagrange equation gives the desired Sturm-Liouville equation. Note that λ is an unknown parameter; we want to determine it.

It is natural to regard the eigenvalue problem as a variational problem with constraints. Thus, we wish to vary f(x) so that

J = ∫_0^U (p f′^2 + q f^2) dx    (102)

is stationary, with the constraint

C = ∫_0^U w f^2 dx = constant.    (103)

Notice here that we may take C = 1, corresponding to eigenfunctions f normalized with respect to the weight w.

Let’s attempt to find approximate solutions using the Rayleigh-Ritz method. Expand

f(x) = Σ_{n=1}^{∞} A_n u_n(x),    (104)

where u_n(0) = u_n(U) = 0. The u_n are some set of expansion functions, not the eigenfunctions; if they were the eigenfunctions, the problem would already be solved! Substitute this into I, giving

I = Σ_{m=1}^{∞} Σ_{n=1}^{∞} (C_{mn} − λ D_{mn}) A_m A_n,    (105)

where

C_{mn} ≡ ∫_0^U (p u′_m u′_n + q u_m u_n) dx,    (106)
D_{mn} ≡ ∫_0^U w u_m u_n dx.    (107)

Requiring I to be stationary,

∂I/∂A_m = 0,  m = 1, 2, ...,    (108)

yields the infinite set of coupled homogeneous equations:

Σ_{n=1}^{∞} (C_{mn} − λ D_{mn}) A_n = 0,  m = 1, 2, ...    (109)

This is perhaps no simpler to solve than the original differential equation. However, we may make approximate solutions for f(x) by selecting a finite set of linearly independent functions α_1, ..., α_N and letting

f(x) = Σ_{n=1}^{N} A_n α_n(x).    (110)

Solve for the “best” approximation of this form by finding those {A_n} that satisfy

Σ_{n=1}^{N} (C_{mn} − λ D_{mn}) A_n = 0,  m = 1, 2, ..., N,    (111)

where

C_{mn} ≡ ∫_0^U (p α′_m α′_n + q α_m α_n) dx,    (112)
D_{mn} ≡ ∫_0^U w α_m α_n dx.    (113)

This looks like N equations in the N + 1 unknowns λ, {A_n}, but the overall normalization of the A_n’s is arbitrary. Hence there are enough equations in principle, and we obtain

λ = (Σ_{m,n=1}^{N} C_{mn} A_m A_n) / (Σ_{m,n=1}^{N} D_{mn} A_m A_n).    (114)

Notice the similarity of Eqn. 114 with

λ = (∫_0^U (p f′^2 + q f^2) dx) / (∫_0^U w f^2 dx) = J(f)/C(f).    (115)

This follows since I = 0 for f a solution to the Sturm-Liouville equation:

I = ∫_0^U (p f′^2 + q f^2 − λ w f^2) dx
  = p f f′ |_0^U + ∫_0^U [−f (d/dx)(p f′) + q f^2 − λ w f^2] dx
  = 0 + ∫_0^U (−q f^2 + λ w f^2 + q f^2 − λ w f^2) dx
  = 0,    (116)

where we have used both the boundary condition f(0) = f(U) = 0 and the Sturm-Liouville equation (d/dx)(p f′) = qf − λwf to obtain the third line. Also,

λ = J(f)/C(f),    (117)

since, for example,

J(f) = ∫_0^U (p f′^2 + q f^2) dx
     = ∫_0^U (p Σ_{m,n} A_n A_m α′_m α′_n + q Σ_{m,n} A_n A_m α_m α_n) dx
     = Σ_{m,n} C_{mn} A_n A_m.    (118)

That is, if the approximation f is “close” to an eigenfunction, then the resulting λ should be “close” to an eigenvalue.

Let’s try an example: find the lowest eigenvalue of f″ = −λf, with boundary conditions f(±1) = 0. We of course readily see that the first eigenfunction is cos(πx/2) with λ_1 = π^2/4, but let’s try our method to see how we do. For simplicity, we’ll try a Rayleigh-Ritz approximation with only one term in the sum.

As we noted earlier, it is a good idea to pick the functions with some care. In this case, we know that the lowest eigenfunction won’t wiggle much, and a good guess is that it will be symmetric with no zeros in the interval (−1, 1). Such a function, which satisfies the boundary conditions, is:

f(x) = A(1 − x^2),    (119)

and we’ll try it. In the Sturm-Liouville form, we have p(x) = 1, q(x) = 0, and w(x) = 1. With N = 1, we have α_1 = α = 1 − x^2, and

C ≡ C_11 = ∫_{-1}^{1} (p α′^2 + q α^2) dx    (120)
         = ∫_{-1}^{1} 4x^2 dx = 8/3.    (121)

Also,

D ≡ D_11 = ∫_{-1}^{1} w α^2 dx = ∫_{-1}^{1} (1 − x^2)^2 dx = 16/15.    (122)

The equation

Σ_{n=1}^{N} (C_{mn} − λ D_{mn}) A_n = 0,  m = 1, 2, ..., N,    (123)

becomes

(C − λD) A = 0.    (124)

If A ≠ 0, then

λ = C/D = 5/2.    (125)

We are within 2% of the actual lowest eigenvalue of λ_1 = π^2/4 ≈ 2.467. Of course this rather good result is partly due to our good fortune at picking a close approximation to the actual eigenfunction, as may be seen in Fig. 3.

Figure 3: Rayleigh-Ritz eigenvalue estimation example, comparing the exact solution cos(πx/2) with the guessed approximation 1 − x^2.
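The arithmetic above is easily checked by machine (a sketch, not part of the notes); with more than one basis function, the same ingredients C_{mn} and D_{mn} would instead feed a generalized matrix eigenvalue problem:

    # One-term Rayleigh-Ritz estimate for f'' = -lambda f, f(+-1) = 0,
    # with trial function alpha = 1 - x^2 (p = 1, q = 0, w = 1).
    import sympy as sp

    x = sp.symbols('x')
    alpha = 1 - x**2
    C = sp.integrate(sp.diff(alpha, x)**2, (x, -1, 1))  # 8/3, Eqs. (120)-(121)
    D = sp.integrate(alpha**2, (x, -1, 1))              # 16/15, Eq. (122)
    print(C/D, float(C/D), float(sp.pi**2/4))           # 5/2 = 2.5 vs 2.4674...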

7 Extending to Multiple Dimensions

It is possible to generalize our variational problem to multiple independent variables, e.g.,

I(u) = ∫∫_D F(u, ∂u/∂x, ∂u/∂y, x, y) dx dy,    (126)

where u = u(x, y), and bounded region D has u(x, y) specified on its boundary S. We wish to find u such that I is stationary with respect to variation of u.

We proceed along the same lines as before, letting

u(x, y) → u(x, y) + εh(x, y),    (127)

where h(x, y)|_S = 0. Look for stationary I: dI/dε|_{ε=0} = 0. Let

u_x ≡ ∂u/∂x,  u_y ≡ ∂u/∂y,  h_x ≡ ∂h/∂x,  etc.    (128)

Then

dI/dε = ∫∫_D ((∂F/∂u) h + (∂F/∂u_x) h_x + (∂F/∂u_y) h_y) dx dy.    (129)

We want to “integrate by parts” the last two terms, in analogy with the single-variable case. Recall Green’s theorem:

∮_S (P dx + Q dy) = ∫∫_D (∂Q/∂x − ∂P/∂y) dx dy,    (130)

and let

P = −h (∂F/∂u_y),  Q = h (∂F/∂u_x).    (131)

With some algebra, we find that

dI/dε = ∮_S h ((∂F/∂u_x) dy − (∂F/∂u_y) dx) + ∫∫_D h [∂F/∂u − (D/Dx)(∂F/∂u_x) − (D/Dy)(∂F/∂u_y)] dx dy,    (132)

where

Df/Dx ≡ ∂f/∂x + (∂f/∂u)(∂u/∂x) + (∂f/∂u_x)(∂^2u/∂x^2) + (∂f/∂u_y)(∂^2u/∂x∂y)    (133)

is the “total partial derivative” with respect to x.

The boundary integral over S is zero, since h vanishes on S. The remaining double integral over D must be zero for arbitrary functions h, and hence,

∂F/∂u − (D/Dx)(∂F/∂u_x) − (D/Dy)(∂F/∂u_y) = 0.    (134)

This result is once again called the Euler-Lagrange equation.
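As a quick check (a sketch, not from the notes; the integrand is an arbitrary example), sympy’s euler_equations also handles several independent variables; the Dirichlet integrand below yields Laplace’s equation, as (134) predicts:

    # Two-dimensional Euler-Lagrange equation for F = (u_x^2 + u_y^2)/2.
    import sympy as sp
    from sympy.calculus.euler import euler_equations

    x, y = sp.symbols('x y')
    u = sp.Function('u')
    F = (u(x, y).diff(x)**2 + u(x, y).diff(y)**2) / 2

    print(euler_equations(F, u(x, y), [x, y]))  # [Eq(-u_xx - u_yy, 0)]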

8 Exercises

1. Suppose you have a string of length L. Pin one end at (x, y) = (0, 0) and the other end at (x, y) = (b, 0). Form the string into a curve such that the area between the string and the x axis is maximal. Assume that b and L are fixed, with L > b. What is the curve formed by the string?

2. We considered the application of the Rayleigh-Ritz method to finding approximate eigenvalues satisfying

y″ = −λy,    (135)

with boundary conditions y(−1) = y(1) = 0. Repeat the method, now with two functions:

α_1(x) = 1 − x^2,    (136)
α_2(x) = x^2(1 − x^2).    (137)

You should get estimates for two eigenvalues. Compare with the exact eigenvalues, including a discussion of which eigenvalues you have managed to approximate and why. If the eigenvalues you obtain are not the two lowest, suggest another function you might have used to get the lowest two.

3. The Bessel differential equation is

d^2y/dx^2 + (1/x)(dy/dx) + (k^2 − m^2/x^2) y = 0.    (138)

A solution is y(x) = J_m(kx), the mth-order Bessel function. Assume a boundary condition y(1) = 0. That is, k is a root of J_m(x). Use the Rayleigh-Ritz method to estimate the first non-zero root of J_3(x). I suggest you try to do this with one test function, rather than a sum of multiple functions. But you must choose the function with some care. In particular, note that J_3 has a third-order root at x = 0. You should compare your result with the actual value of 6.379. If you get within, say, 15% of this, declare victory.
