TRANSCRIPT
Math 1080: Numerical Linear Algebra
Chapter 5: Solving Ax = b by Optimization
M. M. Sussman, [email protected]
Office Hours: MW 1:45PM-2:45PM, Thack 622
March 2015
1 / 66
Some definitions
- A matrix A is "SPD" (Symmetric Positive Definite) if
  1. A = A^T
  2. x · Ax = ⟨x, Ax⟩ = x^T A x > 0 for x ≠ 0
- A is "negative definite" if −A is SPD
- A is "skew-symmetric" if A^T = −A
- A is "indefinite" if it is symmetric and neither SPD nor negative definite.
- Two symmetric matrices satisfy A > B if (A − B) is SPD
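These properties are easy to check numerically. A minimal MATLAB/Octave sketch (the test matrix is an arbitrary example, not from the slides): a symmetric A is SPD exactly when chol succeeds, or equivalently when all of its eigenvalues are positive.

  A = [4 1; 1 3];             % arbitrary symmetric test matrix (assumption)
  issym = isequal(A, A');     % A = A^T ?
  [~, p] = chol(A);           % p == 0 exactly when A is positive definite
  lams = eig(A);              % all eigenvalues positive is an equivalent test
  isSPD = issym && (p == 0) && all(lams > 0);
  fprintf('symmetric %d, chol flag %d, SPD %d\n', issym, p, isSPD);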
2 / 66
Important facts
- All eigenvalues of a symmetric matrix are real
- A symmetric matrix has an orthogonal basis of eigenvectors
- A symmetric matrix A is positive definite if and only if all λ(A) > 0.
3 / 66
A-inner product
If A is SPD then

  ⟨x, y⟩_A = x^T A y

is a weighted inner product on R^n, the A-inner product, and

  ‖x‖_A = √⟨x, x⟩_A = √(x^T A x)

is a weighted norm, the A-norm.
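As a concrete illustration, both quantities are one-liners in MATLAB/Octave (a small sketch; the SPD matrix and vectors are arbitrary examples, not from the slides):

  A = [4 1; 1 3];                  % arbitrary SPD matrix (assumption)
  x = [1; 2];  y = [3; -1];
  ipA  = x' * A * y;               % <x, y>_A = x^T A y
  nrmA = sqrt(x' * A * x);         % ||x||_A = sqrt(x^T A x)
  disp([ipA, nrmA])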
4 / 66
Proof
Bilinear:

  ⟨u + v, y⟩_A = (u + v)^T A y = u^T A y + v^T A y = ⟨u, y⟩_A + ⟨v, y⟩_A

and

  ⟨αu, y⟩_A = (αu)^T A y = α (u^T A y) = α ⟨u, y⟩_A.

Positive:

  ⟨x, x⟩_A = x^T A x > 0 for x ≠ 0, since A is SPD.

Symmetric:

  ⟨x, y⟩_A = x^T A y = (x^T A y)^T = y^T A^T x = y^T A x = ⟨y, x⟩_A.

Hence ⟨x, x⟩_A induces a norm ‖x‖_A = √⟨x, x⟩_A.
5 / 66
The A-norm is a weighted 2-norm
The norm induced by a 2 × 2 SPD matrix A is just a weighted 2-norm.

Proof:
- Let (λ1, φ1) and (λ2, φ2) be the two eigenvalue/eigenvector pairs of A, with φ1, φ2 orthonormal
- Any vector can be written v = α1 φ1 + α2 φ2
- ‖v‖²_A = (α1 φ1 + α2 φ2)^T A (α1 φ1 + α2 φ2)
- ‖v‖²_A = (α1 φ1 + α2 φ2)^T (α1 λ1 φ1 + α2 λ2 φ2)
- ‖v‖²_A = α1² λ1 + α2² λ2
6 / 66
Homework
Text, Exercises 258, 259
7 / 66
Topics
The connection to optimization
Descent Methods
Application to Stationary Iterative Methods
Parameter selection
The Steepest Descent Method
8 / 66
Finding a solution = finding a minimum
For A SPD, the unique solution of
Ax = b
is also the unique minimizer of the functional
  J(x) = (1/2)⟨x, Ax⟩ − ⟨x, b⟩ = (1/2) x^T A x − x^T b

Proof: take the partial derivatives of J and set them to zero. Using A = A^T,

  ∂J/∂xi = (1/2)((Ax)i + (A^T x)i) − bi = (Ax)i − bi = 0,

i.e. Ax = b at the minimizer.
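The claim is easy to check numerically; here is a small MATLAB/Octave sketch (the SPD matrix and right-hand side are arbitrary test data, not from the slides) confirming that x = A\b gives a smaller J than nearby points:

  A = [3 1; 1 2];  b = [1; 1];          % arbitrary SPD test data (assumption)
  J = @(x) 0.5 * x' * A * x - x' * b;   % the quadratic functional
  xstar = A \ b;                        % solution of Ax = b
  for k = 1:5
      y = xstar + 0.1 * randn(2, 1);    % random nearby point
      assert(J(y) > J(xstar));          % J is larger away from the minimizer
  end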
9 / 66
Notation
x = A^{-1} b is the argument that minimizes J(y):

  x = arg min_{y ∈ R^N} J(y)
10 / 66
Theorem 260
Let A be SPD. The solution of Ax = b is the unique minimizer of J(x).
- For any y ≠ 0,

    J(x + y) = J(x) + (1/2) y^T A y > J(x)

- And

    ‖x − y‖²_A = 2 (J(y) − J(x)).
11 / 66
Proof of first claim
J(x + y) = (1/2)(x + y)^T A (x + y) − (x + y)^T b
         = (1/2) x^T A x + y^T A x + (1/2) y^T A y − x^T b − y^T b
         = ((1/2) x^T A x − x^T b) + (y^T A x − y^T b) + (1/2) y^T A y
         = J(x) + y^T (Ax − b) + (1/2) y^T A y
         = J(x) + (1/2) y^T A y > J(x).

A is SPD, so y ≠ 0 ⟹ y^T A y > 0.
12 / 66
Proof of second claim
LHS:

‖x − y‖²_A = (x − y)^T A (x − y)
           = x^T A x − y^T A x − x^T A y + y^T A y     (use Ax = b and x^T A y = y^T A x)
           = y^T A y − 2 y^T b + x^T b

RHS:

2 (J(y) − J(x)) = y^T A y − 2 y^T b − x^T A x + 2 x^T b     (use Ax = b)
                = y^T A y − 2 y^T b − x^T (Ax − b) + x^T b
                = y^T A y − 2 y^T b + x^T b
13 / 66
The 2 × 2 case

- 2 × 2 system Ax = b with

    A = [ a  c ]
        [ c  d ].

- Eigenvalues: (a − λ)(d − λ) − c² = 0, so

    λ = ((a + d) ± √((a + d)² − 4(ad − c²))) / 2

- Positive definite (λ > 0) ⟹ (a + d) > 0 and c² − ad < 0
14 / 66
The 2 × 2 case cont'd

- Write x = [x1, x2]^T. Then

    J(x1, x2) = (1/2) [x1, x2] [a c; c d] [x1; x2] − [x1, x2] [b1; b2]
              = (1/2)(a x1² + 2c x1 x2 + d x2²) − (b1 x1 + b2 x2).

- The surface z = J(x1, x2) is a paraboloid opening up if and only if a > 0, d > 0, and c² − ad < 0.
- Hessian matrix is SPD (Theorem 260)

[Figure: the surface z = J(x1, x2)]
15 / 66
Paraboloid opening upward (again)
- Consider b1 = b2 = 0: x1, x2 are components of the error
- J(x1, x2) = (1/2)(a x1² + 2c x1 x2 + d x2²)
- Set x1 = (r/√a) cos θ and x2 = (r/√d) sin θ:

    J = (1/2)( r² cos² θ + r² sin² θ + 2 r² (c/√(ad)) sin θ cos θ )
      = (r²/2) ( 1 + (c/√(ad)) sin 2θ )

- J > 0 for all θ if and only if |c|/√(ad) < 1, i.e. c² < ad.
16 / 66
Topics
The connection to optimization
Descent Methods
Application to Stationary Iterative Methods
Parameter selection
The Steepest Descent Method
17 / 66
General Descent Method, Algorithm 263

Given
1. Ax = b
2. a quadratic functional J(·) that is minimized at x = A^{-1} b
3. a maximum number of iterations itmax
4. an initial guess x^0:

Compute r^0 = b − A x^0
for n=1:itmax
   (∗)  Choose a direction vector d^n
   (∗∗) Find α = α_n by solving the 1d minimization problem:
            α_n = arg min_α J(x^n + α d^n)
   x^{n+1} = x^n + α_n d^n
   r^{n+1} = b − A x^{n+1}
   if converged, exit, end
end
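A direct MATLAB/Octave transcription of Algorithm 263 might look like the sketch below; the direction-choosing function handle pick_d and the relative-residual stopping test are illustrative assumptions, and the step-length formula is the one derived a few slides later.

  function x = descent(A, b, x0, itmax, tol, pick_d)
      % Generic descent method for SPD A (a sketch of Algorithm 263).
      x = x0;
      r = b - A*x;
      for n = 1:itmax
          d = pick_d(r, n);                 % (*) choose a direction vector
          alpha = (d' * r) / (d' * A * d);  % (**) minimize J along d
          x = x + alpha * d;                % update iterate
          r = b - A*x;                      % update residual
          if norm(r) <= tol * norm(b)       % assumed convergence test
              return
          end
      end
  end

For steepest descent one would pass pick_d = @(r, n) r; for Gauss-Seidel-like descent, a handle that cycles through the unit vectors.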
18 / 66
Steepest descent
- Functional: J(x) = (1/2) x^T A x − x^T b
- Descent direction: d^n = −∇J(x^n)
19 / 66
−∇J = r when J(x) = (1/2) x^T A x − x^T b

- J = (1/2) x^T A x − x^T b
- ∂J/∂xi = (1/2)(Ax)i + (1/2)(x^T A)i − bi
- But (1/2)(x^T A)i = (1/2)(A^T x)i = (1/2)(Ax)i, since A = A^T
- So ∂J/∂xi = (Ax)i − bi = −ri
- −∇J = r
20 / 66
Formula for α
Proof of following is a homework problem:
For arbitrary d^n,

    α_n = (d^n)^T r^n / ((d^n)^T A d^n)

Hint: set d/dα (J(x + α d)) = 0 and solve.
21 / 66
Homework
Text, 264.
22 / 66
Notation
- Use the ⟨·, ·⟩ notation
- α for steepest descent becomes

    α_n = ⟨d^n, r^n⟩ / ⟨d^n, A d^n⟩ = ⟨d^n, r^n⟩ / ⟨d^n, d^n⟩_A.
23 / 66
Descent demonstration
descentdemo.m
24 / 66
Different descent methods: descent direction dn
- Steepest descent direction: d^n = −∇J(x^n)
- Random directions: d^n = a randomly chosen vector
- Gauss-Seidel-like descent: d^n cycles through the standard basis of unit vectors e1, e2, ..., eN and repeats if necessary.
- Conjugate directions: d^n cycles through an A-orthogonal set of vectors.
25 / 66
Different descent methods: choice of functional J
- If A is SPD the most common choice is

    J(x) = (1/2) x^T A x − x^T b

- Minimum residual methods: for general A,

    J(x) = (1/2) ⟨b − Ax, b − Ax⟩.

- Various combinations, such as residuals plus updates:

    J(x) = (1/2) ⟨b − Ax, b − Ax⟩ + (1/2) ⟨x − x^n, x − x^n⟩
26 / 66
Homework
Text, 265, 268
27 / 66
Topics
The connection to optimization
Descent Methods
Application to Stationary Iterative Methods
Parameter selection
The Steepest Descent Method
28 / 66
Stationary Iterative Methods
- A = M − N
- Iterative method: M x^{n+1} = b + N x^n
- Equivalently: M(x^{n+1} − x^n) = b − A x^n
29 / 66
Examples
- FOR: M = ρI, N = ρI − A
- Jacobi: M = diag(A)
- Gauss-Seidel: M = D + L (lower triangular part of A).
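For a concrete matrix these splittings are one-liners in MATLAB/Octave; a small sketch (the matrix, right-hand side, and ρ are arbitrary examples, not from the slides), where tril(A) is the lower triangle of A including the diagonal:

  A = [4 -1 0; -1 4 -1; 0 -1 4];   % arbitrary SPD test matrix (assumption)
  b = ones(3, 1);  rho = 4;        % arbitrary data and FOR parameter (assumption)
  M_for = rho * eye(3);     N_for = M_for - A;   % FOR
  M_jac = diag(diag(A));    N_jac = M_jac - A;   % Jacobi
  M_gs  = tril(A);          N_gs  = M_gs  - A;   % Gauss-Seidel: M = D + L
  x = zeros(3, 1);
  x = M_gs \ (b + N_gs * x);       % one step of M x_{n+1} = b + N x_n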
30 / 66
Householder lemma (269)
Let A be SPD.
Let x^n be given by M(x^{n+1} − x^n) = b − A x^n.
Let e^n = x − x^n. Then

  ⟨e^n, A e^n⟩ − ⟨e^{n+1}, A e^{n+1}⟩ = ⟨(x^{n+1} − x^n), P(x^{n+1} − x^n)⟩

where P = M + M^T − A.
31 / 66
Convergence of FOR, Jacobi, GS and SOR
Corollary 270: For A SPD, P SPD ⟹ convergence is monotonic in the A-norm:

  ‖e^n‖_A > ‖e^{n+1}‖_A > ... −→ 0 as n → ∞.

The Householder lemma in this case gives

  ⟨e^n, A e^n⟩ − ⟨e^{n+1}, A e^{n+1}⟩ = ⟨(x^{n+1} − x^n), P(x^{n+1} − x^n)⟩, so

  ⟨e^n, A e^n⟩ > ⟨e^{n+1}, A e^{n+1}⟩, or ‖e^{n+1}‖_A < ‖e^n‖_A.
32 / 66
Proof
- ‖e^n‖_A ≥ 0 and monotone decreasing, so it has a limit
- Cauchy criterion: (⟨e^n, A e^n⟩ − ⟨e^{n+1}, A e^{n+1}⟩) → 0
- Householder gives ‖x^{n+1} − x^n‖_P → 0
- So that M(x^{n+1} − x^n) = b − A x^n → 0, i.e. the residuals tend to zero and x^n → A^{-1} b = x
33 / 66
Monotone convergence
1. FOR converges monotonically in ‖·‖_A if

   P = M + M^T − A = 2ρI − A > 0, i.e. if ρ > (1/2) λmax(A).

2. Jacobi converges monotonically in the A-norm if diag(A) > (1/2) A.
3. Gauss-Seidel converges monotonically in the A-norm for SPD A in all cases.
4. SOR converges monotonically in the A-norm if 0 < ω < 2.
34 / 66
Proofs
1. For Jacobi, M = diag(A), so

   P = M + M^T − A = 2 diag(A) − A > 0 if diag(A) > (1/2) A.

2. For GS, A = D + L + L^T (since A is SPD), so

   P = M + M^T − A = (D + L) + (D + L)^T − A
     = D + L + D + L^T − (D + L + L^T) = D > 0.

3. For SOR,

   P = M + M^T − A = M + M^T − (M − N) = M^T + N
     = ω^{-1} D + L^T + ((1 − ω)/ω) D − U = ((2 − ω)/ω) D > 0   (since U = L^T),

   for 0 < ω < 2.
4. Convergence of FOR in the A-norm is left as an exercise.
35 / 66
Homework
Text, Exercises 275, 278. In Exercise 278, you are to:
1. Give details of the proof that P > 0 implies convergence of FOR.
2. Prove the Householder lemma.
   Hint: expand the right side using P = M + M^T − A and the identities x^{n+1} − x^n = e^n − e^{n+1} and ⟨y, Wz⟩ = ⟨W^T y, z⟩.
36 / 66
Topics
The connection to optimization
Descent Methods
Application to Stationary Iterative Methods
Parameter selection
The Steepest Descent Method
37 / 66
Picking ρ for FOR
- FOR: ρ_optimal = (λmax + λmin)/2
- This gives the fastest overall convergence for fixed ρ
- Hard to calculate.
- What about ρ_n = "best possible for this step"?
38 / 66
Variable ρ
Given x^0 and a maximum number of iterations, itmax:

for n=0:itmax
   Compute r^n = b − A x^n
   Compute ρ_n via a few auxiliary calculations
   x^{n+1} = x^n + (ρ_n)^{-1} r^n
   if converged, exit, end
end
39 / 66
FOR again
Residuals r^n = b − A x^n for FOR satisfy

   r^{n+1} = r^n − ρ^{-1} A r^n.

Proof:

   x^{n+1} = x^n + ρ^{-1} r^n.

Multiply by "−A" and add b to both sides:

   b − A x^{n+1} = b − A x^n − ρ^{-1} A r^n.
40 / 66
Homework
Text, Exercise 281. Easy! Preparation for the following work.
41 / 66
Option 1: Residual Minimization

Pick ρ_n to minimize ‖r^{n+1}‖²_2:

  ‖r^{n+1}‖²_2 = ‖r^n − ρ^{-1} A r^n‖²_2 = ⟨r^n − ρ^{-1} A r^n, r^n − ρ^{-1} A r^n⟩.

Since r^n is fixed, this is a simple function of ρ:

  J(ρ) = ⟨r^n − ρ^{-1} A r^n, r^n − ρ^{-1} A r^n⟩
       = ⟨r^n, r^n⟩ − 2 ⟨r^n, A r^n⟩ ρ^{-1} + ⟨A r^n, A r^n⟩ ρ^{-2}.

Taking J′(ρ) = 0 and solving for ρ = ρ_optimal gives

  ρ_optimal = ⟨A r^n, A r^n⟩ / ⟨r^n, A r^n⟩.

The cost of using this optimal value at each step: two extra dot products per step.
42 / 66
Option 2: J minimization

Minimize J(x^{n+1}):

  φ(ρ) = J(x^n + ρ^{-1} r^n)
       = (1/2) ⟨x^n + ρ^{-1} r^n, A(x^n + ρ^{-1} r^n)⟩ − ⟨x^n + ρ^{-1} r^n, b⟩

Expanding, setting φ′(ρ) = 0 and solving, as before, gives

  ρ_optimal = ⟨r^n, A r^n⟩ / ⟨r^n, r^n⟩.

- Option 2 requires SPD A
- Faster than Option 1
43 / 66
Algorithm for Option 2
Given x^1, the matrix A, and a maximum number of iterations, itmax:

r^1 = b − A x^1
for n=1:itmax
   ρ_n = ⟨A r^n, r^n⟩ / ⟨r^n, r^n⟩
   x^{n+1} = x^n + (1/ρ_n) r^n
   if satisfied, exit, end
   r^{n+1} = b − A x^{n+1}
end
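A compact MATLAB/Octave version of this loop might be the following sketch; the relative-residual stopping test is an assumption, and the commented-out line shows the Option 1 choice of ρ_n for comparison.

  function x = for_adaptive(A, b, x, itmax, tol)
      % FOR with the parameter rho_n chosen anew at every step (Option 2).
      for n = 1:itmax
          r = b - A*x;
          if norm(r) <= tol * norm(b), return, end   % assumed convergence test
          Ar  = A * r;
          rho = (r' * Ar) / (r' * r);      % Option 2: minimizes J(x^{n+1}); needs SPD A
          % rho = (Ar' * Ar) / (r' * Ar);  % Option 1: minimizes ||r^{n+1}||_2
          x = x + r / rho;
      end
  end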
44 / 66
Homework
Text, Exercise 283. Implement and test Options 1 and 2.
45 / 66
The “normal” equations
- A e = r, so ‖r‖²_2 = ‖A e‖²_2 = ‖e‖²_{A^T A}
- A^T A is SPD, so minimizing ‖e‖²_{A^T A} amounts to solving

    (A^T A) x = A^T b.

- Both the bandwidth and the condition number of A^T A are much larger than those of A, but A^T A is SPD.
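The squaring of the condition number (Theorem 284, next slide) is easy to observe numerically. A quick MATLAB/Octave check with an arbitrary SPD test matrix (the 1D Laplacian used here is an assumption, not from the slides):

  n = 10;
  A = 2*eye(n) - diag(ones(n-1,1), 1) - diag(ones(n-1,1), -1);   % SPD tridiagonal test matrix
  fprintf('cond(A) = %g, cond(A''*A) = %g, cond(A)^2 = %g\n', ...
          cond(A), cond(A'*A), cond(A)^2);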
46 / 66
Condition number of normal equations (284)
Let A be N × N and invertible. Then A^T A is SPD.
If A is SPD then λ(A^T A) = λ(A)² and

  cond_2(A^T A) = [cond_2(A)]².

Proof
- Symmetry: (A^T A)^T = A^T (A^T)^T = A^T A
- Positivity: ⟨x, A^T A x⟩ = ⟨Ax, Ax⟩ = ‖Ax‖² > 0 for x ≠ 0, since A is invertible
- Condition number: for A SPD,

    cond_2(A^T A) = cond_2(A²) = λmax(A²)/λmin(A²) = (λmax(A)/λmin(A))² = (cond_2(A))².

- Squaring the condition number greatly increases the number of iterations
47 / 66
Topics
The connection to optimization
Descent Methods
Application to Stationary Iterative Methods
Parameter selection
The Steepest Descent Method
48 / 66
Steepest descent
- Reduce J as much as possible in the gradient direction
- Uses the Option 2 parameter choice
49 / 66
Steepest descent algorithm
Given
- SPD A
- x^0, r^0 = b − A x^0
- maximum number of iterations itmax

for n=0:itmax
   r^n = b − A x^n
   α_n = ⟨r^n, r^n⟩ / ⟨r^n, A r^n⟩
   x^{n+1} = x^n + α_n r^n
   if converged, exit, end
end
αn was called 1/ρn earlier.
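In MATLAB/Octave the loop is only a few lines; a sketch (the relative-residual stopping test is an assumption):

  function [x, n] = steepest_descent(A, b, x, itmax, tol)
      % Steepest descent for SPD A: direction d^n = r^n, step alpha_n = <r,r>/<r,Ar>.
      for n = 0:itmax
          r = b - A*x;
          if norm(r) <= tol * norm(b), return, end   % assumed convergence test
          alpha = (r' * r) / (r' * (A * r));
          x = x + alpha * r;
      end
  end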
50 / 66
Example: It is still slow!
- N = 2, x = (x1, x2)^T.
- A = [2 0; 0 50], b = [2; 0]
- Then

  J(x) = (1/2) [x1, x2] [2 0; 0 50] [x1; x2] − [x1, x2] [2; 0]
       = (1/2)(2 x1² + 50 x2²) − 2 x1
       = x1² − 2 x1 + 25 x2² + 1 − 1
       = (x1 − 1)²/1² + x2²/(1/5)² − 1     (level sets are ellipses).
51 / 66
Steepest descent illustration
x^0 = (11, 1)
x^4 = (6.37, 0.54)
limit = (1, 0)
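Running four steps of steepest descent on this example (a quick MATLAB/Octave check, reusing the update from the algorithm above) reproduces the iterate quoted on this slide:

  A = [2 0; 0 50];  b = [2; 0];
  x = [11; 1];                               % x^0 from the slide
  for n = 1:4
      r = b - A*x;
      x = x + ((r' * r) / (r' * A * r)) * r; % one steepest descent step
  end
  disp(x')                                   % approximately (6.37, 0.54)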
52 / 66
SD is no faster than FOR (287)
Let A be SPD and κ = λmax(A)/λmin(A). The steepest descent method converges to the solution of Ax = b for any x^0. The error x − x^n satisfies

  ‖x − x^n‖_A ≤ ((κ − 1)/(κ + 1))^n ‖x − x^0‖_A

and

  J(x^n) − J(x) ≤ ((κ − 1)/(κ + 1))^n (J(x^0) − J(x)).
But no eigenvalue estimates are required!
53 / 66
Proof
- ⟨x − x^n, A(x − x^n)⟩ = ⟨x, b⟩ + 2 J(x^n)
- Minimizing J is the same as minimizing ‖x − x^n‖_A
- Steepest descent reduces J maximally
- FOR reduces ‖x − x^n_FOR‖_A with a known bound
- Doing one iteration of each, starting from the same x^{n−1}:

  ‖x − x^n‖_A ≤ ‖x − x^n_FOR‖_A ≤ ((κ − 1)/(κ + 1)) ‖x − x^{n−1}‖_A
54 / 66
Theorem 288: inequality is sharp!
- Ax = b
- If x^0 = x − (λ2 φ1 ± λ1 φ2), where φ1 and φ2 are the eigenvectors of λ1 = λmin(A) and λ2 = λmax(A) respectively,
- then the convergence rate is precisely

  (λ2 − λ1)/(λ2 + λ1) = (κ − 1)/(κ + 1)
55 / 66
Proof
x^0 = x − (λ2 φ1 ± λ1 φ2)
e^0 = λ2 φ1 ± λ1 φ2
r^0 = A e^0 = λ1 λ2 φ1 ± λ1 λ2 φ2
A r^0 = λ1² λ2 φ1 ± λ1 λ2² φ2

α_0 = ⟨r^0, r^0⟩ / ⟨r^0, A r^0⟩
    = (λ1² λ2² + λ1² λ2²) / (λ1³ λ2² + λ1² λ2³)
    = 2 / (λ1 + λ2)

x^1 = x^0 + α_0 r^0
    = x − (λ2 − (2/(λ1 + λ2)) λ1 λ2) φ1 − (±λ1 ∓ (2/(λ1 + λ2)) λ1 λ2) φ2
    = x − λ2 ((λ2 − λ1)/(λ2 + λ1)) φ1 − λ1 ((±λ1 ∓ λ2)/(λ1 + λ2)) φ2
    = x − ((λ2 − λ1)/(λ2 + λ1)) (λ2 φ1 ∓ λ1 φ2)

So e^1 = ((λ2 − λ1)/(λ2 + λ1)) (λ2 φ1 ∓ λ1 φ2) has the same form as e^0 with the sign flipped, and the error is reduced by exactly the factor (λ2 − λ1)/(λ2 + λ1) at every step.
56 / 66
Homework
Text, Exercise 291
57 / 66
Exercise 290: Convection-Diffusion Equation
−ε Δu + c · ∇u = f   inside Ω = [0,1] × [0,1]
c = [1; 0]
u = g   on ∂Ω

- The convection term is discretized as

    c · ∇u = ∂u/∂x ≈ (u(i+1, j) − u(i−1, j)) / (2h)
58 / 66
Comments about CDE
- c is like a wind, blowing the solution to the right.
- Interesting behavior for ε < ‖c‖
- Pe = "cell Péclet number"
- Discretization becomes unstable as Pe becomes large.
59 / 66
Modify jacobi2d.m
- Add in the convection term.
  - Ends up with a factor of h in the numerator
- Debug with f = 1, g = x − y.
  - Exact solution is u = g
  - Also the exact discrete solution!
  - Diffusion is debugged already
- Debug with N = 5
  - Can see the solution on the screen
  - h = 0.2, so you can recognize solution values on sight
- Start from the exact solution: must get the answer in 1 iteration.
- Start from exact, run many iterations: must stay exact
- Start from zero: do you get the exact solution?
  - If not, program in the exact solution and print the norm of the error at each iteration.
  - Double-check the original code!
60 / 66
“Real” problem
- N = 50, f = x + y, g = 0
- Jacobi, steepest descent, residual minimization
- ε = 1.0, 0.1, 0.01, 0.001
61 / 66
Solution, ε = 1.0
62 / 66
Solution, ε = 0.1
63 / 66
Solution, ε = 0.01
64 / 66
Solution, ε = 0.001
65 / 66
Iterative methods
Numbers of iterations

  ε        Jacobi   Steepest descent   Residual minimization
  1.0        3318        3246                 3281
  0.1       17161        1652                 1690
  0.01      13235         130                  113
  0.001     15735    diverges                 1663
66 / 66