
Lecture 2: Mathematical preliminaries
Numerical Linear Algebra and Vector Calculus

Dr. Abebe Geletu

Ilmenau University of Technology
Department of Simulation and Optimal Processes (SOP)

Winter Semester 2011/12


TU Ilmenau

2.1. Numerical Linear Algebra - review

2.1.1. Vectors

• $x = (x_1, \ldots, x_n)^\top$ is a (column) vector with n components; Matlab: length(x) returns the number of components of x.
• The transpose of x is the row vector $x^\top = (x_1, \ldots, x_n)$; Matlab: x'.

Norms of a vector:
• Euclidean norm: $\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}$; Matlab>> norm(x,2) or norm(x). A vector x is a unit vector if $\|x\|_2 = 1$.
• Maximum norm: $\|x\|_\infty = \max_{1 \le k \le n} |x_k|$; Matlab>> norm(x,inf).


Operations with vectors

• The result of multiplying a vector x by a scalar α is the vector $\alpha x = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n)$; Matlab>> alpha*x.

For two vectors x and y of equal length:
• The sum of x and y: $x + y = (x_1 + y_1, \ldots, x_n + y_n)$; Matlab>> x+y.
• The scalar product of x and y, denoted by $\langle x, y \rangle$ or $x \cdot y$ or $x^\top y$, is the scalar $x^\top y = x_1 y_1 + \ldots + x_n y_n$; Matlab>> x'*y. Note that $x^\top x = \|x\|_2^2$.
• The componentwise product of x and y is the vector $(x_1 y_1, \ldots, x_n y_n)$; Matlab>> x.*y.
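The vector operations and norms listed so far can be tried out in a few lines. This is only a small sketch; the vectors x and y and all variable names are chosen here for illustration and do not come from the lecture.

```matlab
% Two column vectors of equal length (illustrative values)
x = [3; 4];
y = [1; 2];

len_x   = length(x);     % number of components: 2
nrm2    = norm(x);       % Euclidean norm: sqrt(3^2 + 4^2) = 5
nrm_inf = norm(x, inf);  % maximum norm: max(|3|, |4|) = 4

s  = x' * y;             % scalar product: 3*1 + 4*2 = 11
p  = x .* y;             % componentwise product: [3; 8]
sq = x' * x;             % squared Euclidean norm: 25

u = x / norm(x);         % unit vector in the direction of x
```

Dividing a non-zero vector by its Euclidean norm, as in the last line, is the normalization step that the Gram-Schmidt algorithm uses.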


Linear Combination, Linear Independence and Basis

• A vector x is a linear combination of vectors $v_1, v_2, \ldots, v_m$ in $\mathbb{R}^n$ if there are scalars $\alpha_1, \alpha_2, \ldots, \alpha_m$ such that $x = \alpha_1 v_1 + \ldots + \alpha_m v_m$.
• Vectors $v_1, v_2, \ldots, v_m$ in $\mathbb{R}^n$ are linearly dependent if there are scalars $\alpha_1, \alpha_2, \ldots, \alpha_m$, not all of them equal to zero, such that

$$\alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_m v_m = 0. \qquad (1)$$

If equation (1) holds true only for $\alpha_1 = \alpha_2 = \ldots = \alpha_m = 0$, then these vectors are said to be linearly independent.
⇒ The standard unit vectors $e_i^\top = (0, \ldots, 1, \ldots, 0)$, $i = 1, \ldots, n$ (with the 1 in position i), are linearly independent.
• A set of linearly independent vectors $\{v_1, v_2, \ldots, v_m\}$ is a basis of $\mathbb{R}^n$ if any vector x in $\mathbb{R}^n$ can be written as a linear combination of these vectors, that is, $x = \alpha_1 v_1 + \ldots + \alpha_m v_m$. This is possible only if m = n.
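Linear independence can be checked numerically by stacking the vectors as columns of a matrix and comparing its rank with the number of columns. The following sketch uses made-up vectors; the names V, W, indep, dep and alpha are illustrative only.

```matlab
% Three vectors in R^3 (illustrative values)
v1 = [1; 0; 0];
v2 = [1; 1; 0];
v3 = [1; 1; 1];

V = [v1, v2, v3];
indep = (rank(V) == size(V, 2));   % true: the columns are linearly independent

% Replace v3 by a linear combination of v1 and v2
w3 = v1 + 2*v2;
W  = [v1, v2, w3];
dep = (rank(W) < size(W, 2));      % true: the columns are linearly dependent

% Since {v1, v2, v3} is a basis of R^3, any x has unique coefficients alpha
x = [2; 3; 4];
alpha = V \ x;                     % solves V*alpha = x, here alpha = [-1; -1; 4]
```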


Orthogonal vectors, Orthonormal vectors and Orthogonalization

• x and y are orthogonal if $x^\top y = 0$, written $x \perp y$.
⇒ Pairwise orthogonal non-zero vectors are linearly independent.
• A set of vectors $\{v_1, v_2, \ldots, v_m\}$ in $\mathbb{R}^n$ is orthonormal if

$$v_i \perp v_j,\ i \ne j, \quad \text{and} \quad \|v_i\|_2 = 1,\ i = 1, \ldots, m.$$

⇒ The standard unit vectors $e_i^\top = (0, \ldots, 1, \ldots, 0)$, $i = 1, \ldots, n$, are orthonormal and hence linearly independent.


The Gram-Schmidt orthonormalization algorithm

Given a set $\{v_1, \ldots, v_m\}$ of linearly independent vectors, construct a set of orthonormal vectors $\{q_1, \ldots, q_m\}$.

Algorithm 1: The Modified Gram-Schmidt Algorithm
 1: $r_{11} \leftarrow \|v_1\|$;
 2: $q_1 \leftarrow v_1 / r_{11}$;
 3: for j = 2 : m do
 4:   $q_j \leftarrow v_j$;
 5:   for i = 1 : (j − 1) do
 6:     $r_{ij} \leftarrow q_i^\top q_j$;
 7:     $q_j \leftarrow q_j - r_{ij} q_i$;
 8:   end for
 9:   $r_{jj} \leftarrow \|q_j\|$;
10:   $q_j \leftarrow q_j / r_{jj}$;
11: end for


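A Matlab implementation

function Q=ModGrammSchmidt(V)
% Modified Gram-Schmidt: orthonormalize the columns of V.
% V is n-by-m with linearly independent columns; Q is n-by-m.
[n,m]=size(V);
R=zeros(m,m);            % upper-triangular coefficients r_ij
Q=zeros(n,m);
R(1,1)=norm(V(:,1));
Q(:,1)=V(:,1)/R(1,1);
for j=2:m                % loop over the m vectors (columns of V)
    Q(:,j)=V(:,j);
    for i=1:(j-1)
        % remove the component along q_i from the current column
        R(i,j)=Q(:,i)'*Q(:,j);
        Q(:,j)=Q(:,j)-R(i,j)*Q(:,i);
    end
    R(j,j)=norm(Q(:,j));
    Q(:,j)=Q(:,j)/R(j,j);
end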
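As a cross-check for any orthonormalization routine, the built-in economy-size QR factorization also returns a matrix with orthonormal columns spanning the same space. It is not a Gram-Schmidt implementation, so individual columns may differ in sign from a Gram-Schmidt result. The test matrix V below is an assumption made for illustration.

```matlab
% 4-by-3 matrix with linearly independent columns (illustrative values)
V = [1 1 0;
     1 0 1;
     0 1 1;
     1 1 1];

[Q, R] = qr(V, 0);                 % economy-size QR: Q is 4-by-3, R is 3-by-3

orth_err  = norm(Q' * Q - eye(3)); % close to 0: the columns of Q are orthonormal
recon_err = norm(Q * R - V);       % close to 0: V = Q*R
is_upper  = isequal(R, triu(R));   % true: R is upper triangular
```

The same two residuals, `norm(Q'*Q - eye(m))` and `norm(Q*R - V)`, are a reasonable way to test a hand-written Gram-Schmidt function.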

2.1.2. Matrices

$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}$$

is a matrix with m rows and n columns. A shorter notation: $A = (a_{ij})$.
• The transpose $A^\top$ is a matrix with n rows and m columns; Matlab>> A'.
• The rank of A is the number of linearly independent rows (equivalently, columns) of A; Matlab>> rank(A).

Properties:
• $\operatorname{rank}(A) \le \min\{m, n\}$;
• if rank(A) = m, then A has full row rank;
• if rank(A) = n, then A has full column rank;
• if $y \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$ are non-zero, then the matrix $A = y x^\top$ has rank(A) = 1.


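Matrix norms

Let $A \in \mathbb{R}^{m \times n}$; i.e. A is an m by n matrix. Then
• Frobenius norm of A:

$$\|A\|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2};$$

Matlab: norm(A,'fro').

• Maximum norm of A:

$$\|A\|_\infty = \max_{1 \le i \le m,\ 1 \le j \le n} |a_{ij}|;$$

Matlab: max(abs(A(:))). Note that norm(A,inf) returns the maximum absolute row sum of A, which is a different norm.

• Induced norm of A:

$$\|A\|_2 = \max_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2};$$

Matlab: norm(A,2) or norm(A). When no subscript is given, $\|\cdot\|$ denotes the 2-norm.

• For any matrix A, the following holds true: $\|A\|_2 \le \|A\|_F \le \sqrt{mn}\, \|A\|_\infty$.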
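The different matrix norms can be compared on a small example. This is a sketch with an arbitrarily chosen matrix. It also shows that `norm(A,inf)` in Matlab is the maximum absolute row sum, so the largest entry in absolute value has to be computed separately.

```matlab
A = [1 2; 3 4];                    % illustrative matrix
[m, n] = size(A);

nF   = norm(A, 'fro');             % Frobenius norm: sqrt(1 + 4 + 9 + 16) = sqrt(30)
n2   = norm(A);                    % induced 2-norm (largest singular value)
nMax = max(abs(A(:)));             % largest entry in absolute value: 4
nRow = norm(A, inf);               % maximum absolute row sum: |3| + |4| = 7

% The induced 2-norm never exceeds the Frobenius norm, and the Frobenius
% norm never exceeds sqrt(m*n) times the largest entry in absolute value.
holds = (n2 <= nF) && (nF <= sqrt(m*n) * nMax);
```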


Operations with matrices

• The sum of two matrices $A = (a_{ij})$ and $B = (b_{ij})$ of equal size: $A + B = (a_{ij} + b_{ij})$; Matlab>> C=A+B.
• The componentwise product of two matrices $A = (a_{ij})$ and $B = (b_{ij})$ of equal size: $A .\!* B = (a_{ij} b_{ij})$; Matlab>> C=A.*B.
• The product of a matrix $A = (a_{ij})$ of size m × n and a matrix $B = (b_{jk})$ of size n × p is a matrix $C = (c_{ik})$ of size m × p, C = AB:

Algorithm 2: Matrix Multiplication
1: for i = 1 : m do
2:   for k = 1 : p do
3:     $c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}$;
4:   end for
5: end for

Matlab>> C=A*B.
• The product of a matrix A of size m × n and a vector x of length n is a vector b of length m; Matlab>> b=A*x.
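Algorithm 2 can be written out with explicit loops and compared against the built-in product. The matrices below are illustrative. In practice `A*B` is preferable, since the loop version only makes the index pattern visible.

```matlab
A = [1 2 3; 4 5 6];        % m-by-n with m = 2, n = 3
B = [1 0; 0 1; 2 2];       % n-by-p with p = 2

[m, n] = size(A);
p = size(B, 2);

C = zeros(m, p);
for i = 1:m
    for k = 1:p
        for j = 1:n
            % accumulate the sum over j of a_ij * b_jk
            C(i,k) = C(i,k) + A(i,j) * B(j,k);
        end
    end
end

same = isequal(C, A*B);    % true: both give [7 8; 16 17]
```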


Square matrices and some properties

• An m by n matrix A is a square matrix if m = n.
• For a square matrix A, $d = (a_{11}, a_{22}, \ldots, a_{nn})$ is the vector of diagonal elements; Matlab>> d=diag(A).
• The n by n matrix with all its diagonal elements equal to 1 and the rest of the elements equal to 0 is the identity matrix $I_n$; Matlab>> I=eye(n). Note that $\|I_n\|_2 = \|I_n\|_\infty = 1$, while $\|I_n\|_F = \sqrt{n}$.
• An n × n matrix A is invertible if there is a matrix B such that $AB = BA = I_n$. B is called the inverse of A, written $B = A^{-1}$; Matlab>> B=inv(A).
• If an n × n matrix A is invertible, then
  - the column vectors of A (and likewise the row vectors of A) are linearly independent;
  - Ax = 0 iff x = 0;
  - rank(A) = n;
  - $\det(A) \ne 0$; Matlab>> det(A).


Eigenvalues and eigenvectors

A (real or complex) number λ is an eigenvalue of a square matrix A if

$$Av = \lambda v$$

for some non-zero vector v. In this case v is called an eigenvector of A. An eigenvalue can be a real or a complex number, even when A is real; Matlab>> [V,D]=eig(A).
• One use of eigenvalues: stability analysis of linear and nonlinear dynamic systems.
• λ is an eigenvalue of an n × n square matrix A iff $\det(\lambda I - A) = 0$.
• $p(\lambda) = \det(\lambda I - A)$ is called the characteristic polynomial of A. It has degree n, and the eigenvalues of A are its roots.

Example: If $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, then $p(\lambda) = \det(\lambda I - A) = \lambda^2 - 5\lambda - 2$.
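The example can be reproduced numerically. The sketch below uses `poly` to obtain the coefficients of the characteristic polynomial and checks that its roots agree with the output of `eig`. The variable names are illustrative.

```matlab
A = [1 2; 3 4];

p = poly(A);               % coefficients of det(lambda*I - A), approx. [1 -5 -2]
[V, D] = eig(A);           % columns of V: eigenvectors, diag(D): eigenvalues
lambda = diag(D);

res = norm(A*V - V*D);     % close to 0, since A*v = lambda*v for each pair

% The eigenvalues are the roots of the characteristic polynomial
root_err = norm(sort(lambda) - sort(roots(p)));
```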


Spectrum and spectral radius

• For a square matrix A, the set

$$\sigma(A) = \{\lambda \mid \lambda \text{ is an eigenvalue of } A\}$$

is called the spectrum of the matrix A.
• The number

$$\rho(A) = \max_{\lambda \in \sigma(A)} |\lambda|$$

is called the spectral radius of A.

Example: If $A = \begin{pmatrix} -4 & 0 \\ 0 & 3 \end{pmatrix}$, then $\sigma(A) = \{-4, 3\}$ and $\rho(A) = 4$.

• For $A \in \mathbb{R}^{n \times n}$, it follows that $\|A\|_2 = \sqrt{\rho(AA^\top)}$.
• Convergence of iterative algorithms for the solution of a system of equations Ax = b depends on the spectral radius of the iteration matrix.


Symmetric, Semi-definite, Orthogonal Matrices

• A square matrix A is symmetric if $A = A^\top$.
⇒ All eigenvalues of a symmetric matrix are real numbers.
• An n × n symmetric matrix A is positive semi-definite if $x^\top A x \ge 0$ for all $x \in \mathbb{R}^n$.
⇒ All eigenvalues of a positive semi-definite matrix are non-negative real numbers.
⇒ For any matrix B, the matrix $A = BB^\top$ is symmetric and positive semi-definite.
• An n × n symmetric matrix A is positive definite if $x^\top A x > 0$ for all $x \in \mathbb{R}^n$, $x \ne 0$.
⇒ All eigenvalues of a positive definite matrix are positive real numbers.
⇒ A positive definite matrix is invertible.
• A square matrix Q is orthogonal if $QQ^\top = I$.
⇒ An orthogonal matrix Q is invertible, $Q^{-1} = Q^\top$ and $\|Q\|_2 = 1$.


Singular Value Decomposition (SVD)

• Let A be an m × n matrix with rank(A) = r. Then A can be expressed as

$$A = U \Sigma V^\top,$$

where U is an m × m and V is an n × n orthogonal matrix, and Σ is an m × n diagonal matrix of the form

$$\Sigma = \begin{pmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{m \times n},$$

where $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r > 0$. The numbers $\sigma_1, \sigma_2, \ldots, \sigma_r$ are called the singular values of A; Matlab>> [U,S,V] = svd(A).


SVD...Image Compression

In the SVD for A, let $U = [u_1, u_2, \ldots, u_m] \in \mathbb{R}^{m \times m}$ and $V = [v_1, v_2, \ldots, v_n] \in \mathbb{R}^{n \times n}$, so that

$$A = [u_1, u_2, \ldots, u_m] \, \underbrace{\begin{pmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & 0 \\ 0 & 0 \end{pmatrix}}_{=\Sigma} \, \begin{pmatrix} v_1^\top \\ v_2^\top \\ \vdots \\ v_n^\top \end{pmatrix}.$$

Then $A = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \ldots + \sigma_r u_r v_r^\top$.


SVD...Image Compression ...

• The gray-scale values of a digital image are represented by a matrix A.
• The sum $A = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \ldots + \sigma_r u_r v_r^\top$, where $u_i$ and $v_i$ are the i-th columns of U and V, is a weighted sum with the singular values as decreasing weights, since $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r$.
• Thus dropping some of the terms with smaller weights (singular values) does not significantly affect image quality but saves storage space ⇒ image compression. (More on this in the Tutorials!)

Note that for a symmetric n × n matrix A:
• The singular values $\sigma_1, \sigma_2, \ldots, \sigma_r$ are the absolute values of the non-zero eigenvalues of A.
• U and V can be chosen so that the columns of U are eigenvectors of A and each column of V equals the corresponding column of U up to sign.
• In addition, if A is positive semi-definite, then U = V can be chosen and the singular values coincide with the non-zero eigenvalues of A.
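The compression idea can be sketched by keeping only the first k terms of the sum. Here `magic(6)` merely stands in for a matrix of gray-scale values, and k = 2 is an arbitrary choice. Neither comes from the lecture.

```matlab
A = magic(6);                        % stand-in for an image matrix (assumption)
[U, S, V] = svd(A);
sigma = diag(S);                     % singular values in decreasing order

k  = 2;                              % number of terms kept (assumption)
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';   % sum of the first k rank-one terms

err2 = norm(A - Ak);                 % 2-norm error, equal to sigma(k+1)

% Storage: k singular values plus k columns each of U and V,
% compared with all entries of A
stored_k    = k * (size(A,1) + size(A,2) + 1);   % 26 numbers
stored_full = numel(A);                          % 36 numbers
```

The error of the truncated sum in the 2-norm is exactly the first dropped singular value, which is why terms with small singular values can be discarded with little visible effect.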


Condition Number, well/ill-conditioned matrices

• For a regular (or nonsingular or invertible) matrix $A \in \mathbb{R}^{n \times n}$, the number

$$\kappa(A) = \|A\| \, \|A^{-1}\|$$

is called the condition number of A; Matlab>> cond(A).
• If A is a nonsingular matrix with singular values $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_n$, then, in the 2-norm,

$$\kappa(A) = \frac{\sigma_1}{\sigma_n} = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}.$$

• A matrix A is well-conditioned if κ(A) is not too large; otherwise, it is ill-conditioned.

Example: For $A = \begin{pmatrix} 1 & 1.00000001 \\ 2 & 2 \end{pmatrix}$, $\kappa(A) \approx 5.0\mathrm{e}{+}8$.


Condition Number, well/ill-conditioned matrices ...

Example: Impact of an ill-conditioned matrix. To solve the equation Ax = b, let

$$A = \begin{pmatrix} 1000 & 999 \\ 999 & 998 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 1999 \\ 1997 \end{pmatrix}.$$

Condition number of A: κ(A) = 3.992e+06.

The solution of Ax = b is $x = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.

Now suppose b has a small change (perturbation, noise) given as

$$\tilde{b} = b + \Delta b = \begin{pmatrix} 1999 \\ 1997 \end{pmatrix} + \begin{pmatrix} -0.01 \\ 0.01 \end{pmatrix}.$$

The solution of $Ax = \tilde{b}$ will be $\tilde{x} = \begin{pmatrix} 20.97 \\ -18.99 \end{pmatrix}$.

• A small inaccuracy in the problem data may lead to a result totally different from the actual one.
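The example can be reproduced directly. The sketch below also compares the relative change in the solution with the relative change in the data; the ratio of the two is bounded by the condition number. The variable names are illustrative.

```matlab
A  = [1000 999; 999 998];
b  = [1999; 1997];
db = [-0.01; 0.01];                 % perturbation of the right-hand side

x  = A \ b;                         % approx. [1; 1]
xt = A \ (b + db);                  % approx. [20.97; -18.99]

kappa = cond(A);                    % approx. 3.992e+06

rel_in  = norm(db) / norm(b);       % relative change in the data
rel_out = norm(xt - x) / norm(x);   % relative change in the solution
amplification = rel_out / rel_in;   % cannot exceed kappa
```

For this particular perturbation the amplification comes very close to κ(A), so the example is nearly a worst case.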


2.1.3. Solution Methods for Systems of Linear Equations

Consider a system of algebraic equations

$$Ax = b,$$

where A is a matrix of size m × n and b is a vector of length m. Algorithms to solve Ax = b depend on:
• Type of matrix A: e.g. square or non-square (i.e. Ax = b is either over- or under-determined).
• Properties of A: symmetric, nonsymmetric, positive definite, regular (invertible), well-conditioned, ill-conditioned, etc.
• Structure of matrix A: dense matrix, sparse matrix (many more zero than non-zero entries), banded matrix, block-structured matrix, etc.
• Size of the matrix A: small to medium-sized matrix, very large matrix with a complicated structure, etc.


Solution Methods for Systems of Linear Equations...

• Algorithms need to exploit the properties and structure of the matrix A.
• Specific algorithms are frequently preferred for specific applications.

Do not do this:

$$x = A^{-1} b\,!$$

unless $A^{-1}$ is already available or given to you for free. Otherwise:
• $A^{-1}$ is usually quite expensive to compute;
• $A^{-1}$ may not be available.


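Solution Methods for Systems of Linear Equations... (Simpler Instances)

(a) A is a diagonal matrix:

$$\begin{pmatrix} a_{11} & 0 & \ldots & 0 \\ 0 & a_{22} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.$$

If all $a_{kk} \ne 0$, then

$$x_k = \frac{b_k}{a_{kk}}, \quad k = 1, \ldots, n.$$

If $a_{kk} = 0$ for some index k, then the system has no unique solution: it has no solution if $b_k \ne 0$ for such an index k, and infinitely many solutions otherwise.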


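... Simpler Instances

(b) A is an upper-triangular matrix:

$$\begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ & a_{22} & \ldots & a_{2n} \\ & & \ddots & \vdots \\ & & & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.$$

Solution by backward substitution:

$$x_n = \frac{b_n}{a_{nn}}; \quad x_{n-1} = \frac{b_{n-1} - a_{n-1,n} x_n}{a_{n-1,n-1}}; \quad \ldots$$

$$x_k = \frac{b_k - \sum_{i=k+1}^{n} a_{ki} x_i}{a_{kk}}, \quad k = n-1, n-2, \ldots, 1.$$

If $a_{kk} = 0$ for some index k, then A is singular and the system has no unique solution.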
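The backward substitution formula translates into a short loop that runs from the last unknown to the first. The system below is made up for illustration, and the variable s (the already known part of row k) is a name introduced here.

```matlab
% Upper-triangular system U*x = b (illustrative values)
U = [2 1 -1;
     0 3  2;
     0 0  4];
b = [1; 8; 4];

n = length(b);
x = zeros(n, 1);
for k = n:-1:1
    if U(k,k) == 0
        error('Zero diagonal entry: no unique solution.');
    end
    s = U(k, k+1:n) * x(k+1:n);    % sum over i > k of a_ki * x_i (0 for k = n)
    x(k) = (b(k) - s) / U(k,k);
end
% x is now [0; 2; 1]
```

Forward substitution for a lower-triangular matrix is the mirror image: the loop runs from k = 1 to n and uses the entries to the left of the diagonal.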

...Simpler Instances

• If A is a lower-triangular matrix, use forward substitution.

(c) A is an orthogonal matrix; then $AA^\top = A^\top A = I_n$. Thus

$$Ax = b \;\Rightarrow\; A^\top A x = A^\top b \;\Rightarrow\; I_n x = A^\top b.$$

Hence, the solution of Ax = b is given by

$$x = A^\top b.$$

• In practical applications, A may have none of the above simpler structures.



Solution methods for systems of linear equations...

In general, there are two classes of algorithms:

I. Direct Methods

II. Iterative Methods

I. Direct Methods

factorize A as a product of matrices with simpler structures(diagonal, triangular, orthogonal matrices, etc.).


... Direct Methods

Known matrix factorization methods:

Method      Factorization              Type of A
LU          $A = LU$                   symmetric or nonsymmetric
Cholesky    $A = LL^\top$              symmetric positive definite
LDL^T       $A = LDL^\top$             symmetric indefinite
QR          $A = QR$                   $A \in \mathbb{R}^{m \times n}$, $m \ge n$, rank(A) = n
SVD         $A = U\Sigma V^\top$       $A \in \mathbb{R}^{m \times n}$

• In LU, Cholesky and LDL^T: L is lower triangular, U is upper triangular and D is diagonal.
• In QR: Q is orthogonal and R is upper triangular; frequently used in least-squares problems.
• In SVD: U is an m × m and V is an n × n orthogonal matrix, and Σ is m × n with $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_p)$, $p = \min\{m, n\}$, and $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_p \ge 0$.


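... Direct Methods

Example: Solution through LU factorization

$$A = LU,$$

where L is lower triangular and U is upper triangular:

$$L = \begin{pmatrix} * & & & \\ * & * & & \\ \vdots & \vdots & \ddots & \\ * & * & \ldots & * \end{pmatrix}, \quad U = \begin{pmatrix} * & * & \ldots & * \\ & * & \ldots & * \\ & & \ddots & \vdots \\ & & & * \end{pmatrix}.$$

Then

$$Ax = b \implies (LU)x = b \implies L(\underbrace{Ux}_{=y}) = b. \qquad (2)$$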


... Direct Methods ...

Algorithm 3: Solution of Ax = b through LU factorization
1: Set y = Ux;
2: Use forward substitution to solve for y from Ly = b;
3: Use back substitution to solve for x from Ux = y.

Algorithm 4: Solution of Ax = b through QR factorization
1: Put A = QR so that QRx = b;
2: Multiply both sides of QRx = b by $Q^\top$ to obtain $Rx = Q^\top b$;
3: Use back substitution to solve for x from Rx = d with $d = Q^\top b$.
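Algorithms 3 and 4 can be sketched with the built-in factorizations. The system below is made up for illustration. Note that the sketch uses the three-output form of `lu`, which returns a permutation matrix P with P*A = L*U, so the right-hand side has to be permuted as well. The triangular systems are solved with the backslash operator rather than with hand-written substitution loops.

```matlab
A = [4  3  0;
     3  4 -1;
     0 -1  4];
b = [24; 30; -24];                 % exact solution is [3; 4; -5]

% Algorithm 3: LU factorization with row pivoting, P*A = L*U
[L, U, P] = lu(A);
y    = L \ (P * b);                % forward substitution: L*y = P*b
x_lu = U \ y;                      % back substitution:    U*x = y

% Algorithm 4: QR factorization, A = Q*R
[Q, R] = qr(A);
d    = Q' * b;                     % multiply by Q' to obtain R*x = Q'*b
x_qr = R \ d;                      % back substitution:    R*x = d
```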


... Direct Methods

Advantages:
• high accuracy of solutions
• suitable for small to medium-scale systems of equations
• matrix partitioning techniques can be applied
• easy to parallelize

Disadvantages:
• computationally expensive
• inefficient for large and sparse systems
• may cause fill-in when A is a sparse matrix

Matlab matrix factorization functions: [L,U]=lu(A), R=chol(A) (upper triangular R with A = R'*R; chol(A,'lower') returns the lower-triangular factor L), [Q,R]=qr(A), [U,S,V]=svd(A), [L,D]=ldl(A) (available from MATLAB Version 7.3 (R2006b)).


II. Iterative Methods

Algorithm 5: Principle of Iterative Algorithms
1: Step 0: Select an initial iterate $x^{(0)}$;
2: Step k: Determine $x^{(k+1)}$ from $x^{(k)}$, k = 0, 1, 2, ...;
3: Stop: If the termination criterion is satisfied.

Commonly used termination criterion: given a tolerance ε > 0,

Relative residual norm: $\dfrac{\|b - Ax^{(k)}\|}{\|b\|} \le \varepsilon$.

Two groups of iterative methods:
(A) Stationary Iterative Methods - Matrix Splitting Methods.
(B) Dynamic Iterative Methods - Krylov-Subspace Methods.


A. Stationary Iterative Algorithms...

Also known as matrix splitting methods or fixed-point iterative methods.

Algorithm 6: Basic Algorithm
1: Step 0: Start from $x^{(0)}$;
2: Step k: $x^{(k+1)} = Bx^{(k)} + d$, k = 0, 1, 2, ...;
3: Stop: If the termination criterion is satisfied.

• Well-known algorithms: Jacobi, Gauss-Seidel, SOR (Successive Over-Relaxation), etc.

Jacobi, Gauss-Seidel, SOR Methods

Given a square matrix A, split A as

$$A = D + L + U.$$


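Stationary Iterative Algorithms...

$$L = \begin{pmatrix} 0 & & & & \\ a_{21} & 0 & & & \\ a_{31} & a_{32} & 0 & & \\ \vdots & \vdots & \ddots & \ddots & \\ a_{n1} & a_{n2} & \ldots & a_{n,n-1} & 0 \end{pmatrix}, \quad U = \begin{pmatrix} 0 & a_{12} & \ldots & a_{1n} \\ & 0 & \ddots & \vdots \\ & & \ddots & a_{n-1,n} \\ & & & 0 \end{pmatrix}$$

and

$$D = \begin{pmatrix} a_{11} & 0 & \ldots & 0 \\ 0 & a_{22} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & a_{nn} \end{pmatrix}.$$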


Stationary Iterative Algorithms ...

• Jacobi method: $x^{(k+1)} = Bx^{(k)} + d$, where $B = -D^{-1}(L + U)$, $d = D^{-1} b$.
• Gauss-Seidel: $x^{(k+1)} = Bx^{(k)} + d$, where $B = -(D + L)^{-1} U$, $d = (D + L)^{-1} b$.
• SOR: $x^{(k+1)} = Bx^{(k)} + d$, with $B = (D + \omega L)^{-1} [(1 - \omega) D - \omega U]$, $d = \omega (D + \omega L)^{-1} b$, where ω > 0 is the relaxation factor (convergence tuning factor).

SOR:
• Good values of ω satisfy 0 < ω < 2.
• 0 < ω < 1: under-relaxation.
• 1 < ω < 2: over-relaxation.
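The three iterations differ only in the iteration matrix B and the vector d, so they can be run side by side. The test system, the fixed number of 50 iterations and the value ω = 1.1 are assumptions made for this sketch. A practical code would stop on the relative residual instead of a fixed iteration count.

```matlab
A = [ 4 -1  0;
     -1  4 -1;
      0 -1  4];                    % strictly diagonally dominant and SPD
b = [2; 4; 10];                    % exact solution is [1; 2; 3]

D = diag(diag(A));                 % diagonal part
L = tril(A, -1);                   % strictly lower-triangular part
U = triu(A,  1);                   % strictly upper-triangular part
omega = 1.1;                       % relaxation factor (assumed value)

B_j   = -(D \ (L + U));            d_j   = D \ b;              % Jacobi
B_gs  = -((D + L) \ U);            d_gs  = (D + L) \ b;        % Gauss-Seidel
B_sor = (D + omega*L) \ ((1 - omega)*D - omega*U);             % SOR
d_sor = omega * ((D + omega*L) \ b);

x_j = zeros(3, 1);  x_gs = x_j;  x_sor = x_j;   % common start point
for k = 1:50
    x_j   = B_j   * x_j   + d_j;
    x_gs  = B_gs  * x_gs  + d_gs;
    x_sor = B_sor * x_sor + d_sor;
end

rel_res = norm(b - A*x_gs) / norm(b);   % relative residual of Gauss-Seidel
```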


Stationary Iterative Algorithms ... Convergence properties

• Convergence from any start point $x^{(0)}$ iff $\rho(B) < 1$, where B is the iteration matrix.
• A is strictly diagonally dominant ⟹ Jacobi and Gauss-Seidel converge from any start point $x^{(0)}$.
• A symmetric positive definite ⟹ Gauss-Seidel and SOR (0 < ω < 2) converge from any start point $x^{(0)}$.

N.B.: For an ill-conditioned matrix A, ρ(B) is typically close to 1 or larger, so convergence is slow or fails.

Advantages:
• Global convergence whenever $\rho(B) < 1$, e.g. for SPD A with Gauss-Seidel or SOR.
• With a suitably chosen ω ∈ (0, 2), SOR usually converges faster than the other two.
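Whether a stationary iteration converges can be checked in advance by computing the spectral radius of its iteration matrix. The sketch below does this for the Jacobi and Gauss-Seidel matrices of a diagonally dominant example and for a second matrix on which Jacobi diverges. Both matrices are illustrative assumptions.

```matlab
% Strictly diagonally dominant example
A = [ 4 -1  0;
     -1  4 -1;
      0 -1  4];
D = diag(diag(A));  L = tril(A, -1);  U = triu(A, 1);

rho_j  = max(abs(eig(-(D \ (L + U)))));   % Jacobi:       sqrt(2)/4 < 1
rho_gs = max(abs(eig(-((D + L) \ U))));   % Gauss-Seidel: 1/8 < 1

% Example without diagonal dominance: the Jacobi iteration diverges
A2 = [1 2; 3 1];
D2 = diag(diag(A2));
rho_j2 = max(abs(eig(-(D2 \ (A2 - D2)))));   % sqrt(6) > 1
```

For this tridiagonal example the Gauss-Seidel spectral radius is the square of the Jacobi one, so Gauss-Seidel needs roughly half as many iterations for the same accuracy.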


Stationary Iterative Algorithms - Limitations

Disadvantages:
• Applicable mainly to small problems or to problems with a well-conditioned or strictly diagonally dominant matrix A.
• Matrix B is not (dynamically) adapted to the current iterate.
• May not converge if A is ill-conditioned.

Serious issues:
• What if A is ill-conditioned, i.e. ρ(A) ≫ 1 or cond(A) ≫ 1?
• What if A is not symmetric?
• What if A is not definite?
• What if A is not a square matrix?
• What if A is a very large and sparse matrix?
• How to exploit the structure of matrix A? - sparsity, band structures, block structures, etc.


B. Krylov Subspace Methods - Iterative Methods for Sparse Linear Systems

Basic principles:
• Step 0: Start from $x^{(0)}$ and compute the initial residual $r^{(0)} = b - Ax^{(0)}$.
• Step k: Determine $x^{(k)} \in x^{(0)} + K_k(A, r^{(0)})$, k = 1, 2, ..., where

$$K_k = K_k(A, r^{(0)}) = \operatorname{span}\{ r^{(0)}, Ar^{(0)}, A^2 r^{(0)}, \ldots, A^{k-1} r^{(0)} \}$$

is known as the Krylov subspace of dimension (at most) k. NB: $K_k \subset K_{k+1}$.
• Stop: If the termination criterion is satisfied.

Efficient (dynamic) iterative solvers:
• Conjugate Gradient method (CG) - for SPD matrix A
• Bi-Conjugate Gradient Stabilized method (BiCGStab) - for non-symmetric A
• Generalized Minimal Residual method (GMRES) - for general matrix A


B. Krylov Subspace Methods - CG

Invented by Hestenes & Stiefel (1952).
Standard assumption: A is n × n and SPD, $b \in \mathbb{R}^n$.

The classical CG method:
Facts:
• $x^*$ is a solution of Ax = b if and only if $x^*$ is a minimizer of the quadratic function

$$\phi(x) = \frac{1}{2} x^\top A x - b^\top x.$$

• Given $x^{(0)}$ and non-zero vectors $d^{(0)}, d^{(1)}, \ldots, d^{(n-1)}$ in $\mathbb{R}^n$ with the property

$$(d^{(i)})^\top A\, d^{(j)} = 0, \quad i, j = 0, 1, \ldots, n-1,\ i \ne j, \qquad (*)$$

the iteration

$$x^{(k+1)} = x^{(k)} + t_k d^{(k)}, \quad k = 0, 1, 2, \ldots \qquad (**)$$

terminates after at most n steps with $x^{(n)}$ as the minimum point $x^*$ of φ(x), where $t_k$ is the optimal point of the one-dimensional optimization problem

$$\min_{t \in \mathbb{R}} \phi(x^{(k)} + t d^{(k)}).$$

In particular,

$$t_k = -\frac{(g^{(k)})^\top d^{(k)}}{(d^{(k)})^\top A\, d^{(k)}} \quad \text{with } g^{(k)} := Ax^{(k)} - b,\ k = 0, 1, \ldots, n-1.$$

• The vectors in (*) are said to be A-conjugate or A-orthogonal.
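The conjugate-direction iteration with the optimal step length $t_k$ can be sketched as follows. The slides do not state how the directions are generated, so the update $d^{(k+1)} = -g^{(k+1)} + \beta_k d^{(k)}$ with $\beta_k = \|g^{(k+1)}\|^2 / \|g^{(k)}\|^2$ used here is the standard CG choice, added as an assumption. The test system and the matrix Dirs that stores the directions are also illustrative.

```matlab
A = [4 1 0;
     1 3 1;
     0 1 2];                       % symmetric positive definite (illustrative)
b = [1; 2; 3];
n = length(b);

x = zeros(n, 1);                   % start point x^(0)
g = A*x - b;                       % gradient g^(0) = A*x^(0) - b
d = -g;                            % first direction: steepest descent
Dirs = zeros(n, n);                % stores the directions for checking

for k = 1:n
    if norm(g) < 1e-14, break; end % already converged
    Dirs(:,k) = d;
    Ad = A*d;
    t  = -(g'*d) / (d'*Ad);        % optimal step length t_k
    x  = x + t*d;
    g_new = g + t*Ad;              % equals A*x - b for the updated x
    beta  = (g_new'*g_new) / (g'*g);   % standard CG coefficient (assumption)
    d = -g_new + beta*d;           % next A-conjugate direction
    g = g_new;
end

res   = norm(A*x - b);             % close to 0 after at most n steps
conj1 = Dirs(:,1)' * A * Dirs(:,2);    % close to 0: A-conjugacy
```

Matlab also provides a built-in solver, `pcg(A,b)`, which implements the (preconditioned) conjugate gradient method for SPD matrices.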


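B. Krylov Subspace Methods ... CG (contd)

CG as a Krylov-subspace method:
For the CG algorithm above, with residuals $r^{(k)} = b - Ax^{(k)} = -g^{(k)}$, the following holds:

$$\begin{aligned} \operatorname{span}\{ d^{(0)}, d^{(1)}, \ldots, d^{(k-1)} \} &= \operatorname{span}\{ g^{(0)}, g^{(1)}, \ldots, g^{(k-1)} \} \\ &= \operatorname{span}\{ r^{(0)}, r^{(1)}, \ldots, r^{(k-1)} \} \\ &= \operatorname{span}\{ r^{(0)}, Ar^{(0)}, \ldots, A^{k-1} r^{(0)} \} \\ &= K_k(A, r^{(0)}), \quad k = 1, 2, \ldots \end{aligned}$$

From the iteration $x^{(k+1)} = x^{(k)} + t_k d^{(k)}$ it follows that

$$x^{(k+1)} = x^{(0)} + \sum_{i=0}^{k} t_i d^{(i)} \;\Rightarrow\; x^{(k+1)} \in x^{(0)} + \operatorname{span}\{ d^{(0)}, d^{(1)}, \ldots, d^{(k)} \} = x^{(0)} + K_{k+1}.$$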



B. Krylov Subspace Methods ... CG as Krylov(contd)

How to determine the vectors d (0), d (1), . . . , d (k−1) at each step k .


Comparison of Direct and Iterative Methods

Iterative (indirect) methods, compared with direct methods:

Advantages
• usable for large-scale and sparse linear systems
• applicable to systems with matrices of arbitrary structure
• require less computer memory

Disadvantages
• efficiency depends on the type of problem
• difficult to parallelize
• often require pre-conditioning for convergence

