
Chapter 1

Linear Algebra Review

It is assumed that you have had a course in linear algebra, and are familiar with matrix multiplication, eigenvectors, etc. I will review some of these terms here, but quite rapidly.

1.1 Vector Spaces

The standard object in linear algebra is a vector space.

Definition 1.1. A vector space V over a field F (the scalars) is a set of vectors with two operations: vector addition

vector + vector = vector,

which makes V into an Abelian group, and scalar multiplication

scalar · vector = vector,

with properties

α · (βv) = (αβ) · v,

α(v + w) = αv + αw,

(α + β)v = αv + βv.

A subspace is a subset of V which is closed under addition and scalar multiplication.

In this course, we will only consider R (real numbers) and C (complex numbers) as scalars. I will state the definitions for the complex case, if it makes a difference, but all of the examples and homework problems will be real (except possibly in the section on eigenvalues).

There are two standard examples of finite-dimensional vector spaces.


1.1.1 The Geometric View: Arrows

A vector is an arrow, given by a direction and a length. You can move it around to any place you want. Examples are forces in physics, or velocities.

1.1.2 The Analytic View: Columns of Numbers

The vector space is V = R^n or C^n.

\[
v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \qquad
v + w = \begin{pmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{pmatrix}, \qquad
\alpha v = \begin{pmatrix} \alpha v_1 \\ \alpha v_2 \\ \vdots \\ \alpha v_n \end{pmatrix}.
\]

Note: Vectors are columns of numbers, not rows.

In this course, we basically only use the analytic view, but the geometric view is occasionally useful to get an intuitive understanding of what some theorem or algorithm means.
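
In Matlab-like code, the analytic view is immediate. A small sketch; the particular numbers are arbitrary:

v = [1; 2; 3];      % a column vector in R^3
w = [4; 5; 6];
disp(v + w)         % entrywise addition: (5; 7; 9)
disp(2.5 * v)       % scalar multiplication: (2.5; 5; 7.5)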

1.1.3 Linear Independence and Bases

Let {v_i} be a collection of vectors, and {α_i} some scalars. A linear combination of the vectors is
\[
\sum_i \alpha_i v_i.
\]
The set of all linear combinations of {v_i} is called the span. The span is always a subspace.

The {v_i} are called linearly dependent if there is a set of coefficients {α_i}, not all zero, for which
\[
\sum_i \alpha_i v_i = 0.
\]
Otherwise, they are linearly independent.

A basis of V is a collection {v_i} so that every w ∈ V can be written uniquely as a linear combination.
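
Numerically, one way to test linear independence (a sketch using Matlab's built-in rank) is to put the vectors into the columns of a matrix; they are linearly independent exactly when the rank equals the number of columns:

A = [1 0 1; 0 1 1; 0 0 0];    % three vectors in R^3, stored as columns
if rank(A) == size(A, 2)
    disp('columns are linearly independent')
else
    disp('columns are linearly dependent')    % true here: col 3 = col 1 + col 2
end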


Theorem 1.2.
(a) Every vector space has a basis (usually infinitely many of them).
(b) Every basis has the same number of elements. This is called the dimension of the space.

Take an arbitrary n-dimensional vector space V over F. Pick a basis {e_1, e_2, . . . , e_n}. Then we can equate
\[
w = \sum_i \alpha_i e_i \iff w = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix},
\qquad V \iff F^n.
\]
Thus, every finite-dimensional vector space over F is isomorphic to F^n. We will frequently use the notation

\[
e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad
e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad \dots
\]

for the standard basis vectors.

Sideline: As a generalization of F^n, the space of infinite sequences
\[
\{\alpha_1, \alpha_2, \alpha_3, \dots\}
\]
is also a vector space, with dimension infinity.

Even more generally, the space of functions on an interval [a, b] is a vector space. In finite dimensions, we have subscripts 1, . . . , n. In the case of sequences, we have subscripts 1, 2, . . .. In the case of functions, you can think of the x in f(x) as a subscript.

Linear algebra for infinite-dimensional spaces is called functional analysis, and is its own topic.

1.2 Linear Maps

A homomorphism L between vector spaces needs to preserve the two vector space operations. This means

L(v + w) = L(v) + L(w),

L(αv) = αL(v),


or equivalently
\[
L(\alpha v + \beta w) = \alpha L(v) + \beta L(w).
\]

Such a mapping is called a linear map.

Assume L is a linear map from V to W.

Definition 1.3.
R(L) = range of L = {Lv : v ∈ V} ⊂ W,
N(L) = nullspace of L = kernel of L = {v ∈ V : Lv = 0} ⊂ V.

Both of these are subspaces. The dimension of R(L) is called the rank of L.

Theorem 1.4.
dim(R(L)) + dim(N(L)) = dim(V).

Note: From the definitions, a linear map L from V to W is one-to-one if and only if N(L) = {0}. It is onto if and only if R(L) = W.

By the theorem and a dimension count, a mapping L from V into itself is one-to-one if and only if it is onto.
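
A quick numerical illustration of Theorem 1.4 (a sketch; rank and null are built into Matlab):

L = [1 2 3; 4 5 6];       % a map from R^3 to R^2
r = rank(L);              % dim R(L), here 2
k = size(null(L), 2);     % dim N(L): number of basis vectors of the nullspace, here 1
disp(r + k)               % 3 = dim of the domain, as the theorem says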

1.3 Matrices

So far, L is just a mapping between vector spaces. It maps vectors to other vectors. Now we want to introduce coordinates.

We pick a basis {e_i}, i = 1, . . . , n in V, and another basis {f_j}, j = 1, . . . , m in W.

If v = \sum_i v_i e_i is an arbitrary vector in V (expressed in terms of the basis {e_i}), then by linearity,
\[
w = Lv = \sum_i v_i L e_i.
\]
We can express Le_i in the basis of W:
\[
L e_i = \sum_j \ell_{ji} f_j.
\]

We collect all these numbers in a matrix L. Then L · v = w:
\[
\begin{pmatrix}
\ell_{11} & \ell_{12} & \cdots & \ell_{1n} \\
\ell_{21} & \ell_{22} & \cdots & \ell_{2n} \\
\vdots & & & \vdots \\
\ell_{m1} & \ell_{m2} & \cdots & \ell_{mn}
\end{pmatrix}
\begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}
=
\begin{pmatrix} w_1 \\ \vdots \\ w_m \end{pmatrix}.
\]

So, the columns of the matrix represent the images of the basis vectors.


Note the numbering in L: the first subscript always refers to the row, the second to the column. Entry ℓ_{35} is in row 3, column 5. The map L goes from F^n to F^m, and is of size m × n (which is the opposite order).

There are two ways to think about matrix times vector multiplication.

The first one is the dot product interpretation: the jth entry in w is the dot product between the jth row of L and v:
\[
w_j = (\ell_{j1}, \ell_{j2}, \dots, \ell_{jn})
\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}
= \ell_{j1} v_1 + \ell_{j2} v_2 + \cdots + \ell_{jn} v_n.
\]

The second one is the linear combination interpretation: w is the linear combination of the columns of L, with coefficients from v. This is the way we derived it above.

When you program this on a computer, (matrix times vector) corresponds to a double loop. The loop can be executed in either order, corresponding to the two interpretations. Depending on the computer architecture, one way may be faster than the other.

In Matlab-like code:

% dot product interpretation
w = zeros(m, 1);
for i = 1:m
    for j = 1:n
        w(i) = w(i) + L(i,j)*v(j);
    end
end

% linear combination interpretation
w = zeros(m, 1);
for j = 1:n
    for i = 1:m
        w(i) = w(i) + L(i,j)*v(j);
    end
end
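
In practice, of course, one just writes w = L*v and lets the built-in routine pick the loop order. A small self-contained check (with hypothetical sizes) that the loops agree with the built-in product:

m = 3; n = 4;
L = rand(m, n); v = rand(n, 1);
w = zeros(m, 1);
for j = 1:n               % linear combination order
    for i = 1:m
        w(i) = w(i) + L(i,j)*v(j);
    end
end
disp(norm(w - L*v))       % essentially zero (round-off only)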

Remark: Technically, I should distinguish between the mapping L and the matrix L, but I won't. Just keep in mind that the matrix depends on the choice of basis, the mapping does not.

1.3.1 Matrix Multiplication

Suppose you have a linear map L from V (dimension n) to W (dimension m), and M from W to X (dimension p). We can then consider the combined map N = M ◦ L (backwards again).


You can verify that for the matrices,
\[
N = M \cdot L, \qquad (p \times n) = (p \times m) \cdot (m \times n),
\]
\[
n_{ij} = \sum_k m_{ik} \ell_{kj}.
\]
In words: the entry n_{ij} in the product is the dot product of row i in M with column j in L.

Note: The middle dimension has to match, and gets “canceled”. The size of the result comes from the outer numbers.

Likewise in the sum, the middle index gets “canceled”.

On a computer, (matrix times matrix) corresponds to a triple loop, which can be executed in 6 orders, corresponding to 3 different viewpoints. (Each one shows up twice, depending on the order in which the product matrix gets filled in.)

The first two are the dot product and linear combination interpretations from above. The third one is the sum of rank one matrices interpretation.

If w^* is a row vector (size 1 × n), and v is a column vector (size n × 1), then w^*v is a scalar (1 × 1 matrix), and vw^* is a matrix of rank 1:
\[
vw^* = \begin{pmatrix}
v_1 w_1 & v_1 w_2 & \cdots & v_1 w_n \\
v_2 w_1 & v_2 w_2 & \cdots & v_2 w_n \\
\vdots & & & \vdots \\
v_n w_1 & v_n w_2 & \cdots & v_n w_n
\end{pmatrix}.
\]
You can think of matrix multiplication ML as the sum of the rank one matrices produced from the products between the ith column of M and the ith row of L.
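
The rank one viewpoint is easy to write down directly. A sketch with hypothetical sizes; M(:,k) is column k of M, and L(k,:) is row k of L:

p = 3; m = 4; n = 2;
M = rand(p, m); L = rand(m, n);
N = zeros(p, n);
for k = 1:m
    N = N + M(:,k) * L(k,:);   % add the k-th rank one matrix (column times row)
end
disp(norm(N - M*L))            % essentially zero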

Matrix multiplication is not commutative in general: LM ≠ ML. Unless the matrices are square, the two products are not even both defined, or not the same size.

However, matrix multiplication is associative and distributive:

(LM)N = L(MN),

L(M + N) = LM + LN,

(L + M)N = LN + MN.


One more observation about matrix multiplication: If N = ML, then the first column of N depends only on the first column of L, not any of the other numbers in L. The second column of N depends only on the second column in L, and so on.

Likewise, the first row in N depends only on the first row in M, and so on.

This means that if you want to solve a system of matrix equations

AX = B,

you can treat this as a sequence of matrix-vector problems:
\[
A x_1 = b_1, \quad \dots, \quad A x_m = b_m,
\]
where b_i, x_i are the columns of B, X.
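
As a sketch (using Matlab's backslash solver column by column; in practice A\B factors A once and handles all the columns at the same time):

n = 3; m = 4;
A = rand(n); B = rand(n, m);
X = zeros(n, m);
for i = 1:m
    X(:,i) = A \ B(:,i);   % solve A x_i = b_i for each column
end
disp(norm(A*X - B))        % essentially zero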

1.3.2 Basis Change

The mapping L is independent of the choice of bases in V, W, but the matrix L depends on the bases. How does the matrix change when you change the bases? We will just consider this for the case where V = W.

Suppose you have the standard basis {e_i}, and a new basis {f_i}. Let F be the matrix with columns f_i.

Let x be an arbitrary vector. In the original basis, it is expressed with coefficients v_i, in the new basis with coefficients w_i.

\[
x = \sum_i v_i e_i = \sum_i w_i f_i = Fw.
\]
(Use the interpretation of the matrix-vector product Fw as a linear combination of the columns of F.)

So, v = Fw or w = F^{-1}v. To get from the original representation v in basis {e_i} to the new representation w in basis {f_i} you have to multiply by F^{-1}.

Now assume we have a linear map from V to V. In basis {e_i} it is represented by a square matrix L. In the original basis, consider

by a square matrix L. In the original basis, consider

y = Lx.

Convert to the new basis:
\[
F^{-1} y = \left( F^{-1} L F \right) \left( F^{-1} x \right).
\]
Thus, the mapping L is represented in the new basis as F^{-1}LF.

The matrices L and F^{-1}LF are called conjugates of each other, or similar matrices. Similar matrices can be considered as the same mapping represented in two different bases.

Properties of a matrix that are geometric, such as the determinant or eigenvalues, are preserved under conjugation. Other properties that are analytic are not preserved, such as the special shapes of matrices listed below.
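
A quick numerical check of the invariance (a sketch; the determinant and the trace, which is the sum of the eigenvalues, should survive a random basis change):

n = 4;
L = rand(n);
F = rand(n);                       % invertible with probability 1
Lnew = F \ (L * F);                % F^{-1} L F: same map, new basis
disp(abs(det(L) - det(Lnew)))      % essentially zero
disp(abs(trace(L) - trace(Lnew)))  % essentially zero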


1.3.3 Special Matrices

Let F^{m×n} be the set of matrices of size m × n. This becomes a vector space over F of dimension mn, with entry-by-entry addition and scalar multiplication. The unit element for addition is the zero matrix:

\[
O = \begin{pmatrix}
0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & & & \vdots \\
0 & 0 & \cdots & 0
\end{pmatrix}.
\]

In F^{n×n}, the identity matrix
\[
I = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & 1
\end{pmatrix}
\]
satisfies I · L = L · I = L.

For a given L, the inverse matrix L^{-1} satisfies

L · L^{-1} = L^{-1} · L = I.

The inverse matrix may or may not exist.

Example: Take
\[
L = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in C^{2 \times 2}.
\]
The inverse is
\[
L^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix},
\]
which can be verified by multiplying. The inverse exists if and only if ad − bc ≠ 0.
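
A sketch verifying the formula on one arbitrary example:

a = 2; b = 1; c = 5; d = 3;            % ad - bc = 1, so L is invertible
L = [a b; c d];
Linv = (1/(a*d - b*c)) * [d -b; -c a];
disp(L * Linv)                         % the 2 x 2 identity
disp(Linv * L)                         % the 2 x 2 identity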

The term ad − bc in the example is the determinant of the matrix. There are formulas for the determinant of a larger matrix, but we won't need them. Here is what the determinant means.

The unit basis vectors e_i form a square (in dimension 2) or cube (in dimension 3) of area or volume 1.

After the mapping, the images form a parallelogram or parallelepiped. The absolute value of the determinant is the area (or volume) of that. The sign of the determinant has to do with orientation. This is the reason the Jacobian determinant shows up in multidimensional change of variable in integrals.

If the determinant is zero, that means that the square or cube gets flattened into something lower-dimensional, and there is no inverse.


Theorem 1.5. Properties of the determinant:

det(I) = 1

det(AB) = det(A) · det(B)

det(A^{-1}) = 1/det(A)

det(triangular matrix) = product of diagonal terms

Properties of the inverse:

A^{-1} exists if and only if det(A) ≠ 0

A^{-1} is unique

A · A^{-1} = A^{-1} · A = I

(AB)^{-1} = B^{-1}A^{-1}

1.4 Eigenvalues

Assume A is a square matrix, of size n × n. A nonzero vector v is called an eigenvector to eigenvalue λ, if

Av = λv ⇔ (A − λI)v = 0.

It is obvious that if v is an eigenvector, so is any multiple of v. More generally, linear combinations of eigenvectors to the same eigenvalue are again eigenvectors, so it makes more sense to think of the eigenspace

E(λ) = N(A − λI) = {v : (A − λI)v = 0}.

The dimension of E(λ) is called the geometric multiplicity of λ, written γ(λ).

From the definition, a nonzero v exists for a given λ if and only if det(A − λI) = 0. It turns out that this determinant is a polynomial of degree n in λ. This is called the characteristic polynomial χ(λ).

By the fundamental theorem of algebra, χ(λ) has precisely n roots, possibly complex, possibly multiple. The multiplicity of λ as a root of χ(λ) is called the algebraic multiplicity of λ, written α(λ).

Example: The n × n identity matrix has χ(λ) = (1 − λ)^n, and every vector is an eigenvector to eigenvalue 1. Algebraic and geometric multiplicity are both n.

Example: The matrix
\[
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
\]
has χ(λ) = (1 − λ)^2, so λ = 1 has algebraic multiplicity 2. The only eigenvector is (1, 0)^T, so the geometric multiplicity is 1.
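
The same example in Matlab-like code (a sketch; null computes a basis of the nullspace):

A = [1 1; 0 1];
lambda = 1;
gamma = size(null(A - lambda*eye(2)), 2);   % geometric multiplicity: 1
disp(gamma)                                 % algebraic multiplicity is 2, from chi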


We always have 1 ≤ γ(λ) ≤ α(λ). If γ(λ) = α(λ) for all λ, then A is called diagonalizable. In this case, there is a basis of eigenvectors in which A becomes a diagonal matrix:

V^{-1}AV = Λ ⇐⇒ A = V Λ V^{-1}.
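
In Matlab, eig returns the eigenvector matrix V and the diagonal Λ directly. A sketch, using a random symmetric matrix (which is always diagonalizable):

A = rand(4); A = A + A';    % symmetric, hence diagonalizable
[V, D] = eig(A);            % columns of V are eigenvectors, D is diagonal
disp(norm(A - V*D/V))       % essentially zero: A = V D V^{-1}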

Remark: In theoretical linear algebra courses, you spend a lot of time investigating what happens to matrices which are not diagonalizable. There is something called the Jordan Normal Form that leads to lots of interesting mathematics.

For numerical purposes, this is completely irrelevant. We simply assume that every matrix is diagonalizable, which is actually pretty close to the truth: for a matrix with randomly chosen entries, the probability that it is not diagonalizable is 0.

Some more properties of eigenvalues:

• The product of all eigenvalues (with appropriate algebraic multiplicity) is the determinant.

• If A is triangular, the eigenvalues are the numbers on the diagonal.

• If H is Hermitian, the eigenvalues are real, and H is diagonalizable.

• If Q is unitary, the eigenvalues are on the complex unit circle: |λ_i| = 1, and Q is diagonalizable.

1.5 The Inner Product

In addition to addition and scalar multiplication, many vector spaces also have an inner product: vector times vector = scalar. The standard inner product on C^n is
\[
\langle v, w \rangle = \sum_i v_i \overline{w_i},
\]
where the bar denotes complex conjugation. For R^n, just ignore the bar.

Any inner product produces a norm by
\[
\|v\| = \sqrt{\langle v, v \rangle} = \sqrt{\sum_i |v_i|^2}.
\]

A norm in general is a measure for the length or size of a vector. This one is called the Euclidean norm, and corresponds to the geometrical length of the vector.

If v, w are real, then

〈v, w〉 = ‖v‖ · ‖w‖ · cos θ,

where θ is the angle between v and w. In particular, v and w are orthogonal if 〈v, w〉 = 0.
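
A small sketch of the angle formula for real vectors:

v = [3; 0]; w = [1; 1];
costheta = (v' * w) / (norm(v) * norm(w));   % <v,w> / (||v|| ||w||)
disp(acos(costheta))                         % pi/4, the angle between v and w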


For complex vectors you cannot talk about angles, but the definition of orthogonality is the same.

Once you have an inner product, you can define the adjoint A^* of a linear map A by

〈Av, w〉 = 〈v, A^*w〉.

For matrices, it turns out that A^* = A^T (“A transpose”) if A is real, or A^* = A^H (“A Hermitian transpose”) if A is complex. If A has size m × n, then A^* has size n × m, and
\[
(A^T)_{ij} = A_{ji}, \qquad (A^H)_{ij} = \overline{A_{ji}}.
\]

I frequently use a notation like v = (1, 2, 3, 4)^T. That means that v is a column vector, but typesetting it like that would take up too much space on the page, so I write it as the transpose of a row vector.

Theorem 1.6. The adjoint has the following properties:

(AB)^* = B^*A^*

(A^*)^{-1} = (A^{-1})^*

det(A^*) = \overline{det(A)}
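
A sketch checking the defining property numerically. In Matlab, A' is the Hermitian transpose; the anonymous function implements the inner product defined above (conjugation on the second factor):

ip = @(v, w) sum(v .* conj(w));      % <v,w> on C^n
A = rand(3) + 1i*rand(3);
v = rand(3,1) + 1i*rand(3,1);
w = rand(3,1) + 1i*rand(3,1);
disp(abs(ip(A*v, w) - ip(v, A'*w)))  % essentially zero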

1.6 Special Matrix Types

1.6.1 Matrices With Certain Patterns of Zeros

Some matrices have certain patterns of zeros that make them easy to work with. These patterns do not usually stay the same under a basis transformation. In fact, many algorithms are based on finding a basis that makes a general matrix take such a shape.

The notation ∗ just means any entry that does not have to be zero (but it could be).

Diagonal

\[
\begin{pmatrix}
* & 0 & \cdots & \cdots & 0 \\
0 & * & \ddots & & \vdots \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
\vdots & & \ddots & * & 0 \\
0 & \cdots & \cdots & 0 & *
\end{pmatrix}
\]


Tridiagonal

\[
\begin{pmatrix}
* & * & 0 & \cdots & \cdots & \cdots & 0 \\
* & * & * & 0 & \cdots & \cdots & 0 \\
0 & * & * & * & 0 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & * & * & * & 0 \\
0 & \cdots & \cdots & 0 & * & * & * \\
0 & \cdots & \cdots & \cdots & 0 & * & *
\end{pmatrix}
\]

More generally, a banded matrix has several bands of numbers near the diagonal.

Triangular (upper or lower)

\[
\begin{pmatrix}
* & \cdots & \cdots & * \\
0 & * & & \vdots \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & *
\end{pmatrix}
\]

Hessenberg (upper or lower). This is triangular, with one more band along the diagonal.

\[
\begin{pmatrix}
* & \cdots & \cdots & \cdots & * \\
* & \ddots & & & \vdots \\
0 & \ddots & \ddots & & \vdots \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & * & *
\end{pmatrix}
\]

Hessenberg matrices are used for eigenvalue problems. It turns out that you cannot transform a general matrix into triangular form with identical eigenvalues in a finite number of steps, but you can get it into Hessenberg form. Then, you start an iterative method to wipe out the extra band.

Note that a symmetric Hessenberg matrix is tridiagonal.

All of these matrix types can also be defined for non-square matrices. You just start at the top left and follow the diagonal, until you run into an edge. For example, matrices of the shapes
\[
\begin{pmatrix}
* & * & * & * \\
0 & * & * & * \\
0 & 0 & * & *
\end{pmatrix}, \qquad
\begin{pmatrix}
* & * \\
0 & * \\
0 & 0
\end{pmatrix}
\]
both count as upper triangular.


1.6.2 Hermitian and Unitary Matrices

Here are two other special kinds of matrices that come up a lot in numerical analysis.

A matrix which is equal to its adjoint is called self-adjoint in general, but that word is usually only used in the infinite-dimensional setting. For real matrices, it is called symmetric, for complex matrices it is called Hermitian.

For solving Ax = b for a symmetric or Hermitian matrix, we only need half the storage space, and half the number of calculations. For eigenvalue problems of real symmetric matrices, we know that the eigenvalues and eigenvectors are real, so we don't need complex arithmetic. Also, the Hessenberg form becomes tridiagonal, which speeds up the algorithm significantly.

A square matrix whose columns are orthonormal (mutually orthogonal, with norm 1) is called orthogonal in the real case, and unitary in the complex case. Orthogonal matrices are very popular in computation for two reasons:

• U^{-1} = U^*, so we have the inverse handy if we need it

• These matrices are extremely numerically stable (very little round-off error).

Since all eigenvalues of a unitary matrix are on the unit circle, the determinant also has absolute value 1. In the real case, the determinant is either 1 or (−1). Determinant 1 means the matrix represents a rotation of the coordinate system. Determinant (−1) means it is a rotation plus a reflection.

There are two special examples of orthogonal matrices that are worth mentioning.

A permutation matrix has one 1 in each row and each column; the rest are 0. This corresponds to a permutation of the basis vectors.

\[
\begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0
\end{pmatrix}
\]

If A is any matrix, then AP is A with its columns in a different order, and PA is A with its rows in a different order.

The general 2 × 2 orthogonal matrix is
\[
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \ (\det = 1),
\qquad \text{or} \qquad
\begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix} \ (\det = -1).
\]

This is a rotation by angle θ (and maybe a reflection).
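
A sketch confirming the two defining properties for one (hypothetical) angle:

theta = pi/6;
Q = [cos(theta) -sin(theta); sin(theta) cos(theta)];
disp(norm(Q'*Q - eye(2)))   % essentially zero: Q^{-1} = Q^T
disp(det(Q))                % 1, so Q is a pure rotation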

1.7 Block Matrices

You can partition matrices into blocks:


\[
\begin{pmatrix}
1 & 2 & 3 & 4 \\
5 & 6 & 7 & 8 \\
9 & 10 & 11 & 12
\end{pmatrix}
=
\begin{pmatrix}
A & B & C \\
D & E & F
\end{pmatrix}.
\]

As long as the dimensions all fit together, you can add/multiply block matrices just like regular matrices:

\[
\begin{pmatrix}
A & B & C \\
D & E & F
\end{pmatrix}
\begin{pmatrix} G \\ H \\ I \end{pmatrix}
=
\begin{pmatrix}
AG + BH + CI \\
DG + EH + FI
\end{pmatrix}.
\]

This is used extensively in parallel computing, for splitting the calculations among multiple processors.
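
A sketch of block multiplication on the 3 × 4 example above (the block names match the text):

M = [1 2 3 4; 5 6 7 8; 9 10 11 12];
A = M(1:2, 1:2); B = M(1:2, 3); C = M(1:2, 4);   % top row of blocks
D = M(3, 1:2);   E = M(3, 3);   F = M(3, 4);     % bottom row of blocks
x = [1; 2; 3; 4];
G = x(1:2); H = x(3); I = x(4);                  % matching block sizes
y = [A*G + B*H + C*I; D*G + E*H + F*I];          % the block formula
disp(norm(y - M*x))                              % essentially zero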