
LINEAR ALGEBRA REVIEW

SCOTT ROME

CONTENTS

1. Preliminaries
1.1. "The Invertible Matrix Theorem"
1.2. Computing the Rank
1.3. Other Definitions and Results
2. Determinants
2.1. Elementary Row Matrices (Row Operations) and Effects on Determinants
3. Eigenvalues
3.1. Basics
3.2. Eigenvalues after Addition
3.3. Relation to the Determinant and Trace
3.4. Tips on finding eigenvectors quickly
3.5. Questions to answer
4. Characteristic and Minimal Polynomial
4.1. Cayley-Hamilton Theorem
4.2. Using the Characteristic Polynomial to find Inverses and Powers
4.3. More On The Minimal Polynomial
4.4. Finding the Characteristic Polynomial using Principal Minors
4.5. Eigenvalues depend continuously on the entries of a matrix
4.6. AB and BA have the same characteristic polynomial
4.7. Lagrange Interpolating Polynomial
5. Similarity, Diagonalization and Commuting
5.1. Facts About Similarity
5.2. Diagonalizable
5.3. Simultaneously Diagonalizable and Commuting Families
6. Unitary Matrices
6.1. Definition and Main Theorem
6.2. Determining Unitary Equivalence
7. Schur's Unitary Triangularization Theorem
8. Hermitian and Normal Matrices
8.1. Normal Matrices
8.2. Definition and Characterizations of Hermitian matrices
8.3. Extra Spectral Theorem for Hermitian Matrices
8.4. Other Facts About Hermitian Matrices
9. The Jordan Canonical Form
9.1. Finding The Jordan Form
9.2. Remarks on Finding the Number of Jordan Blocks
9.3. Regarding the Characteristic and Minimal Polynomial of a Matrix A
9.4. Finding the S matrix
9.5. Solutions to Systems of Differential Equations
9.6. Companion Matrices and Finding Roots to Polynomials
9.7. The Real Jordan Canonical Form
10. QR And Applications
10.1. Gram-Schmidt
10.2. QR Factorization
10.3. QR Algorithm for Finding Eigenvalues
11. Rayleigh-Ritz Theorem
12. Courant Fischer Theorem
13. Norms and Such
13.1. Vector Norms
13.2. Matrix Norms
13.3. Theorems, Definitions, and Sums
13.4. Gelfand's Formula for the Spectral Radius
14. Weyl's Inequalities and Corollaries
15. Interlacing Eigenvalues
16. Gersgorin Disc Theorem
16.1. Eigenvalue "Trapping"
16.2. Strict Diagonal Dominance
17. Positive (Semi) Definite Matrices
17.1. Characterizations
17.2. Finding the square roots of positive semidefinite matrices
17.3. Schur's Product Theorem and Similar Things
18. Singular Value Decomposition and Polar Form
18.1. Polar Form
18.2. Misc. Results
18.3. Singular Value Decomposition
18.4. Computing the Singular Value Decomposition
18.5. Notes on Nonsquare matrices
19. Positive Matrices and Perron Frobenius
19.1. Nonnegative matrices
20. Loose Ends
20.1. Cauchy-Schwarz inequality
20.2. Random Facts
20.3. Groups of Triangular Matrices
20.4. Block Diagonal Matrices
20.5. Identities involving the Trace
21. Examples

The following material is based in part on Matrix Analysis by Horn and Johnson and the lectures of Dr. Hugo Woerdeman. No originality is claimed or implied. The following document is made available as a reference for Drexel University's Matrix Analysis qualifier.


1. PRELIMINARIES

1.1. "The Invertible Matrix Theorem". Let A ∈ Mn. The following are equivalent:
(a) A is nonsingular
(b) A^{-1} exists, i.e. A is invertible
(c) rank A = n
(d) The rows of A are linearly independent
(e) The columns of A are linearly independent
(f) det A ≠ 0
(g) The dimension of the range of A is n
(h) The dimension of the null space of A is 0
(i) Ax = b is consistent for each b
(j) If Ax = b is consistent, then the solution is unique
(k) Ax = b has a unique solution for each b
(l) The only solution to Ax = 0 is x = 0
(m) 0 is not an eigenvalue of A

1.2. Computing the Rank. When computing the rank of a matrix by row reducing, note that the rank is invariant under the following operations:

• permuting rows
• permuting columns
• adding a multiple of a row to another row
• multiplying a row by a nonzero scalar

I include this because I occasionally forget about being able to permute columns and rows. This is helpful for determining the Jordan Form.

The rank of a matrix can be deduced whenever a matrix is in any echelon (triangular-ish) form by counting the pivots.

Relation to Eigenvalues In this light, for a diagonalizable matrix (in particular a Hermitian one), the rank is also equal to the number of nonzero eigenvalues counting multiplicity. (For non-diagonalizable matrices the rank can exceed this count; see the Examples section.)
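A quick numpy check of both halves of this statement (the matrices below are my own illustrative choices, not from the notes):

import numpy as np

A = np.diag([2., 3., 0.])                 # diagonalizable, one zero eigenvalue
print(np.linalg.matrix_rank(A))           # 2
print(np.sum(~np.isclose(np.linalg.eigvals(A), 0)))   # 2 nonzero eigenvalues

N = np.array([[0., 1.],
              [0., 0.]])                  # nilpotent: rank 1, but no nonzero eigenvalues
print(np.linalg.matrix_rank(N))           # 1
print(np.sum(~np.isclose(np.linalg.eigvals(N), 0)))   # 0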

1.3. Other Definitions and Results.
• A is invertible if there exists a matrix B such that AB = BA = I. This matrix is unique and we say B = A^{-1}.
• The standard scalar product for x, y ∈ Cn is <x, y> = y*x.
• A set of mutually orthogonal nonzero vectors {vi}, i.e. <vk, vj> = 0 for j ≠ k, is linearly independent. (To prove it, take the scalar product of c1v1 + ... + cnvn = 0 with each vector.)

2. DETERMINANTS

2.1. Elementary Row Matrices (Row Operations) and Effects on Determinants. We will begin with a few examples to build intuition and give a method for remembering the rules using 2x2 matrices. To convince yourself of what each matrix E does, multiply it by the identity on the right (EI = E) and read the row operation off of the rows of E.

• The permutation matrix \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} corresponds to the elementary row operation of switching the first row with the second. Notice this can be deduced from the first row containing a 1 in the second column (i.e. this says: switch the first row with the second row). The determinant of this matrix is −1.


• The matrix \begin{bmatrix} c & 0 \\ 0 & 1 \end{bmatrix} multiplies the first row of a matrix by c. The determinant of this matrix is c.
• The matrix \begin{bmatrix} 1 & 0 \\ d & 1 \end{bmatrix} adds d · (first row) to the second row. The determinant of this matrix is 1.
Now let us explore their effects on determinants by multiplying them by a matrix A. We have the following rules:

det( \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} A ) = −det(A)
det( \begin{bmatrix} c & 0 \\ 0 & 1 \end{bmatrix} A ) = c · det(A)
det( \begin{bmatrix} 1 & 0 \\ d & 1 \end{bmatrix} A ) = det(A)

So from this we have the following rules; I will state each rule for the 2x2 case for convenience.
Theorem (Row Operations and the Determinant)
(1) Interchange of two rows:
\begin{vmatrix} a & b \\ c & d \end{vmatrix} = − \begin{vmatrix} c & d \\ a & b \end{vmatrix}
(2) Multiplication of a row by a nonzero scalar k:
\begin{vmatrix} a & b \\ c & d \end{vmatrix} = \frac{1}{k} \begin{vmatrix} ka & kb \\ c & d \end{vmatrix}
(3) Addition of a scalar multiple of one row to another row:
\begin{vmatrix} a & b \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ c + ka & d + kb \end{vmatrix}
(4) Since det A = det A^T, you may also do column operations and switch columns (with a − sign).
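A quick numerical sanity check of these rules (my own sketch; the matrix A is arbitrary):

import numpy as np

A = np.array([[2., 5.],
              [7., 3.]])
P = np.array([[0., 1.], [1., 0.]])      # swap the two rows
S = np.array([[4., 0.], [0., 1.]])      # scale first row by c = 4
E = np.array([[1., 0.], [6., 1.]])      # add 6*(row 1) to row 2

print(np.isclose(np.linalg.det(P @ A), -np.linalg.det(A)))      # True
print(np.isclose(np.linalg.det(S @ A), 4 * np.linalg.det(A)))   # True
print(np.isclose(np.linalg.det(E @ A), np.linalg.det(A)))       # True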

3. EIGENVALUES

3.1. Basics. There will be some repeats here from later sections to drive the point home.
• The eigenvalues of a triangular or diagonal matrix are the diagonal entries.
• Every n-by-n square matrix A has n eigenvalues counting multiplicity, and at least one nonzero eigenvector.
• The eigenvectors of an n-by-n matrix A corresponding to n distinct eigenvalues are linearly independent.
Proof. Proceed by induction on the size of the matrix A. The 1-by-1 case is trivial. Assume the statement is true for the n case. If c1v1 + ... + c_{n+1}v_{n+1} = 0, apply A − λ_{n+1}I to both sides and use the induction hypothesis to find c1 = ... = cn = 0. Plug this back into the original equation, noting that v_{n+1} is nonzero, to prove the result.
• Eigenvectors of a Hermitian matrix A belonging to distinct eigenvalues are orthogonal (see the Hermitian section).


• AB and BA have the same characteristic polynomial and eigenvalues. (If A is nonsingular, they are similar via S = A. If not: for λ ∈ σ(AB) with ABx = λx, we get BA(Bx) = λ(Bx), so λ ∈ σ(BA) whenever Bx ≠ 0; if Bx = 0 then λ = 0, and 0 ∈ σ(BA) as well since det(BA) = det(AB) = 0. The same argument shows every eigenvalue of BA is an eigenvalue of AB.)
• If A is invertible, then AB ∼ BA; if A is singular, they need not be similar. (Example below.)

3.2. Eigenvalues after Addition. Let A and B be in Mn.
• If λ ∈ σ(A) with eigenvector x, then for a constant k, (k + λ) ∈ σ(kI + A) with the same eigenvector x.
• Lemma For any matrix A, for all sufficiently small ε > 0, the matrix A + εI is invertible. The proof involves noting that A + εI = UTU^* + εUU^* = U(T + εI)U^* (Schur form), whose diagonal entries are all nonzero once ε avoids the finitely many values −λ with λ ∈ σ(A).
• If A and B commute, σ(A + B) ⊆ σ(A) + σ(B). (That is, if σ(A) = {α1, ..., αn} and σ(B) = {β1, ..., βn}, there exists a permutation i1, ..., in of the indices 1, ..., n so that the eigenvalues of A + B are α1 + β_{i1}, ..., αn + β_{in}.)

3.3. Relation to the Determinant and Trace. Assume A ∈ Mn and σ(A) = {λ1, ..., λn}. We have the following identities:
tr A = ∑_{i=1}^{n} λi
det A = ∏_{i=1}^{n} λi

3.4. Tips on finding eigenvectors quickly.
• When solving for eigenvectors, you are solving a system of linear equations. Because interchanging equations makes no difference to the solution, neither does interchanging rows of the matrix when reducing to echelon form to solve for eigenvectors!
• If an n-by-n matrix is diagonalizable with orthogonal eigenvectors (e.g. it is normal) and you have already found n − 1 eigenvectors, find the last one by choosing a vector orthogonal to the others. Make sure to check Ax = λx to confirm you chose it correctly.
• Complex eigenvalues of real matrices come in conjugate pairs. Their eigenvectors do also: A\bar{x} = \overline{Ax} = \overline{λx} = \bar{λ}\bar{x}.

3.5. Questions to answer.
• It is clear that if λ ∈ σ(A) then \bar{λ} ∈ σ(A^*), but do A and A^* have the same eigenvectors? Proof of the first claim: 0 = \overline{det(A − λI)} = det((A − λI)^*) = det(A^* − \bar{λ}I). Now let x be an eigenvector associated with λ for A. Then Ax = λx, and taking adjoints gives x^*A^* = \bar{λ}x^*; so x is a left eigenvector of A^* for \bar{λ}, but it need not be a (right) eigenvector of A^* unless A is normal.

4. CHARACTERISTIC AND MINIMAL POLYNOMIAL

Note: the characteristic polynomial always has the same degree as the dimension of a square matrix. A useful definition to use is
pA(t) = det(A − tI)
4.1. Cayley-Hamilton Theorem. Let pA(t) be the characteristic polynomial of A ∈ Mn. Then pA(A) = 0.


4.2. Using the Characteristic Polynomial to find Inverses and Powers.
• (Powers Example) Let A = \begin{bmatrix} 3 & 1 \\ −2 & 0 \end{bmatrix}. Then pA(t) = t^2 − 3t + 2. In particular pA(A) = A^2 − 3A + 2I = 0 ⟹ A^2 = 3A − 2I. Using this we can find A^3 = A·A^2 and so on and so forth.
Corollary to Cayley-Hamilton If A ∈ Mn is nonsingular, then there is a polynomial q(t) (whose coefficients depend on A) of degree at most n − 1 such that A^{-1} = q(A).
Proof. If A is invertible, we can write pA(t) = t^n + a_{n−1}t^{n−1} + ... + a_0 with a_0 ≠ 0. Then pA(A) = A^n + a_{n−1}A^{n−1} + ... + a_1A + a_0I = 0 ⟹ I = −\frac{1}{a_0} A(A^{n−1} + a_{n−1}A^{n−2} + ... + a_1I) ⟹ A^{-1} = q(A).
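A quick numpy verification of this example (my own check, using the 2x2 matrix above): since pA(t) = t^2 − 3t + 2, rearranging pA(A) = 0 gives A^{-1} = (3I − A)/2.

import numpy as np

A = np.array([[3., 1.],
              [-2., 0.]])
I = np.eye(2)

# Cayley-Hamilton: A^2 - 3A + 2I = 0
print(np.allclose(A @ A - 3*A + 2*I, 0))            # True
# Rearranging: A(3I - A)/2 = I, so A^{-1} = (3I - A)/2
print(np.allclose(np.linalg.inv(A), (3*I - A) / 2))  # True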

4.3. More On The Minimal Polynomial. For this section, assume A ∈ Mn, and let qA(t) and pA(t) denote the minimal and characteristic polynomials of A respectively. For more specific information, view the segment in the Jordan Canonical Form section.
• qA(t) is the unique monic polynomial of least degree such that qA(A) = 0.
• The degree of qA(t) is at most n.
• If p(t) is any polynomial such that p(A) = 0, then the minimal polynomial qA(t) divides p(t).
• Similar matrices have the same minimal polynomial (this makes sense: they have the same Jordan form).
• For A ∈ Mn the minimal polynomial qA(t) divides the characteristic polynomial pA(t). Also, qA(λ) = 0 if and only if λ is an eigenvalue of A, so every root of pA(t) is a root of qA(t). This was basically already stated, but I added it for emphasis.
• Each of the following is a necessary and sufficient condition for A to be diagonalizable:
(a) qA(t) is a product of distinct linear factors
(b) Every root of the minimal polynomial has multiplicity 1
(c) For all t such that qA(t) = 0, q′A(t) ≠ 0.

4.4. Finding the Characteristic Polynomial using Principal Minors. Before proceeding we must have some definitions:
• Definition For a square matrix A ∈ Mn, let α ⊆ {1, 2, . . . , n} with |α| = k. Then the submatrix A(α) is a k-by-k principal submatrix of A. This matrix is formed by taking the entries of A that lie in positions (αi, αj). That is, a k-by-k principal submatrix of A is one lying in the same set of k rows and k columns.
• Example Let A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} and α = {1, 3}. Then a 2-by-2 principal submatrix of A is A(α) = \begin{bmatrix} 1 & 3 \\ 7 & 9 \end{bmatrix}.
• Definition The determinant of a principal submatrix is called a principal minor.
• There are \binom{n}{k} k-by-k principal minors of A = [aij].
• We denote the sum of the k-by-k principal minors of A as Ek(A).
• Example E1(A) = ∑_{i=1}^{n} aii = tr A, and En(A) = det A.


Now we are able to present the important formula:
Theorem The characteristic polynomial of an n-by-n matrix A can be given as
pA(t) = t^n − E1(A)t^{n−1} + E2(A)t^{n−2} − · · · ± En(A)
Examples:
• Let A ∈ M2. Then pA(t) = t^2 − (tr A)t + det A.
• Consider
A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
then
pA(t) = t^3 − (a + e + i)t^2 + \left( \begin{vmatrix} a & b \\ d & e \end{vmatrix} + \begin{vmatrix} a & c \\ g & i \end{vmatrix} + \begin{vmatrix} e & f \\ h & i \end{vmatrix} \right) t − \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}
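A small numpy sketch (my own check; the 3x3 matrix is arbitrary) comparing the principal-minor coefficients with numpy's characteristic polynomial:

import numpy as np
from itertools import combinations

def E(A, k):
    # sum of all k-by-k principal minors of A
    n = A.shape[0]
    return sum(np.linalg.det(A[np.ix_(a, a)]) for a in combinations(range(n), k))

A = np.array([[2., 1., 0.],
              [1., 3., 4.],
              [5., 0., 1.]])

# coefficients of t^3 - E1 t^2 + E2 t - E3
coeffs = [1.0, -E(A, 1), E(A, 2), -E(A, 3)]
print(np.allclose(coeffs, np.poly(A)))   # True: np.poly(A) gives det(tI - A)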

4.5. Eigenvalues depend continuously on the entries of a matrix. Facts: The following facts combine to yield the title of the section. Since this is mostly based on ideas involving the characteristic polynomial, I placed it in this section. The following only holds for square matrices.

• The zeros of a polynomial depend continuously on its coefficients.
• The coefficients of the characteristic polynomial are continuous functions of the entries of the matrix.
• The zeros of the characteristic polynomial are the eigenvalues.
This is discussed more thoroughly in Appendix D of Horn and Johnson. The moral is: sufficiently small changes in the entries of A will cause small changes in the coefficients of pA(t), which will result in small changes in the eigenvalues.

4.6. AB and BA have the same characteristic polynomial. AB and BA are similar in the case that either A or B is invertible, and then the result is clear. Otherwise: consider Aε = A + εI, which is invertible for all sufficiently small ε > 0. Then AεB ∼ BAε. Letting ε → 0, similarity may fail, but the characteristic polynomials will still be equal since the characteristic polynomials depend continuously on the parameter ε. This follows from the above subsection's discussion.

4.7. Lagrange Interpolating Polynomial. (0.9.11) add in later

5. SIMILARITY, DIAGONALIZATION AND COMMUTING

A nice way to think about similarity is the following: two matrices in Mn are similar if they represent the same linear transformation T : Cn → Cn in (possibly) two different bases. Therefore similarity may be thought of as studying properties which are intrinsic to a linear transformation itself, or properties that are common to all of its various basis representations.

5.1. Facts About Similarity.
• Similarity is an equivalence relation.
• If two matrices are similar, they have the same characteristic polynomial, which implies they have the same eigenvalues counting multiplicity.
• The converse of the previous statement is not true unless the matrices are normal. (Consider the zero matrix and A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.)


• The only matrix similar to the zero matrix is itself.
• Rank is a similarity invariant.
• A necessary but not sufficient condition for two matrices to be similar is that their determinants are equal. That is because the determinant is the product of the eigenvalues. Another check is whether the traces are equal.
• If two matrices are similar, then they have the same Jordan Canonical Form by transitivity.
• From the above statement, it is clear two similar matrices have the same minimal polynomial.
• Two normal matrices are similar if and only if they are unitarily equivalent. (Problem 2.5.31)
• Theorem Similar matrices have the same characteristic polynomial.
Proof: If A and B are similar, then A = SBS^{-1} for some invertible S. Then
pA(t) = det(A − tI) = det(SBS^{-1} − tSS^{-1}) = det(S) det(B − tI) det(S^{-1}) = det(B − tI) = pB(t).

5.2. Diagonalizable.
• To start easy: diagonal matrices always commute.
• A matrix is diagonalizable if and only if it is similar to a diagonal matrix.
• An n x n matrix A is diagonalizable if and only if it has n linearly independent eigenvectors.
• An aside: if A is 2x2 and dim Ker(A − λI) = 2, then that eigenvalue necessarily has 2 linearly independent eigenvectors, and its Jordan blocks are all 1x1. The Jordan form is helpful for this. From this we see:
• If a matrix has n distinct eigenvalues, then it is diagonalizable.
• If C is block diagonal, then C is diagonalizable if and only if each block on the diagonal is diagonalizable.
• For an additional characterization by minimal polynomials, check that section!

5.3. Simultaneously Diagonalizable and Commuting Families. The main takeaway of this section should be: if two diagonalizable matrices commute, they can be simultaneously diagonalized by a single matrix, and commuting matrices can always be simultaneously unitarily triangularized.
• For two diagonalizable matrices A and B: A and B commute if and only if they are simultaneously diagonalizable (see the numerical sketch after this list). Stronger Theorem Let F ⊆ Mn be a commuting family. Then there exists a unitary U such that for all A ∈ F, U^*AU is upper triangular. This says that matrices need only commute to be simultaneously unitarily triangularized!
• If M is a commuting family of normal matrices, then M is simultaneously unitarily diagonalizable. This is essentially the same as the above statement.
• If F is a commuting family, then there is a vector x which is an eigenvector for every A ∈ F.
• Commuting families of diagonalizable matrices are simultaneously diagonalizable families.
• If A and B commute, from Schur's triangularization theorem and a property above, we have that σ(A + B) ⊆ σ(A) + σ(B).
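A minimal numpy illustration (my own example) of two commuting diagonalizable matrices being diagonalized by one and the same similarity:

import numpy as np

S = np.array([[1., 1.],
              [0., 1.]])                  # any invertible matrix
A = S @ np.diag([2., 3.]) @ np.linalg.inv(S)
B = S @ np.diag([5., 7.]) @ np.linalg.inv(S)

print(np.allclose(A @ B, B @ A))          # True: they commute
Sinv = np.linalg.inv(S)
print(np.allclose(Sinv @ A @ S, np.diag([2., 3.])))   # True: S diagonalizes A
print(np.allclose(Sinv @ B @ S, np.diag([5., 7.])))   # True: the same S diagonalizes B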

6. UNITARY MATRICES

6.1. Definition and Main Theorem.
• Definition A matrix U ∈ Mn is unitary provided U^*U = I.
• Theorem The following are equivalent:
(a) U is unitary
(b) U is nonsingular and U^{-1} = U^*
(c) UU^* = I
(d) U^* is unitary
(e) The columns of U form an orthonormal set (an orthonormal basis of Cn)
(f) The rows of U form an orthonormal set; and
(g) For all x ∈ Cn, the Euclidean length of y ≡ Ux is the same as that of x, that is, y^*y = x^*x, or again ‖Ux‖ = ‖x‖.
• A real unitary matrix is called an orthogonal matrix.
• |det U| = 1
• If λ ∈ σ(U), then |λ| = 1.
• A sequence of unitary matrices that converges, converges to a unitary matrix.

6.2. Determining Unitary Equivalence.
• If A = [aij] and B = [bij] are unitarily equivalent, then ∑_{i,j=1}^{n} |bij|^2 = ∑_{i,j=1}^{n} |aij|^2.
• (Pearcy's Theorem) Two matrices A, B ∈ Mn are unitarily equivalent if and only if tr(w(A, A^*)) = tr(w(B, B^*)) for every word w(s, t) of degree at most 2n^2.
– Note an example of a word w(s, t) in the letters s and t of degree 3 is sts.
– In general this is used to show two matrices are not unitarily equivalent. To use it, you find a word for which the traces differ, for example tr(A^*AA^*) ≠ tr(B^*BB^*) for the given A and B.

7. SCHUR'S UNITARY TRIANGULARIZATION THEOREM

Theorem Given A ∈ Mn with eigenvalues λ1, . . . , λn in any prescribed order, there is a unitary matrix U such that
U^*AU = T = [tij]
where T is upper triangular, with diagonal entries tii = λi for i = 1, . . . , n. In words, every square matrix A is unitarily equivalent to an upper triangular matrix whose diagonal entries are the eigenvalues of A in any prescribed order. Furthermore, if A ∈ Mn(R) and if all the eigenvalues of A are real, then U may be chosen to be real and orthogonal.
Therefore, neither T nor U is unique!
One way to interpret Schur's theorem is that it says that every matrix is "almost" diagonalizable, which is made precise in the following way:
Theorem (2.4.6) Let A = [aij] ∈ Mn. For every ε > 0, there exists a matrix A(ε) = [a(ε)ij] that has n distinct eigenvalues (and is therefore diagonalizable) such that
∑_{i,j=1}^{n} |aij − a(ε)ij|^2 < ε.

8. HERMITIAN AND NORMAL MATRICES

8.1. Normal Matrices.
• A matrix is normal if A^*A = AA^*.
• A ∈ Mn is normal if and only if every matrix unitarily equivalent to A is normal.
• A real, normal matrix A ∈ M2(R) is either symmetric (A = A^*), or the sum of a scalar matrix and some skew-symmetric matrix (i.e. A = kI + B for k ∈ R and B = −B^*).
• Unitary, Hermitian, and skew-Hermitian matrices are normal.


• For a normal matrix A, Ax = 0 if and only if A^*x = 0. So A and A^* have the same kernel. (Use the fact below to prove this.)
• For a normal matrix A, ‖Ax‖^2 = x^*A^*Ax = x^*AA^*x = ‖A^*x‖^2.
• For λ ∈ C, A + λI is normal (when A is).
• A normal triangular matrix T is diagonal. (Equate the entries of TT^* and T^*T.)
• If Ax = λx for nonzero x, then A^*x = \bar{λ}x. This follows from ‖(A − λI)x‖ = ‖(A − λI)^*x‖, since (A − λI) is normal.
Spectral Theorem If A = [aij] ∈ Mn has eigenvalues λ1, ..., λn, the following are equivalent:
(a) A is normal
(b) A is unitarily diagonalizable (A is unitarily equivalent to a diagonal matrix)
(c) ∑_{i,j=1}^{n} |aij|^2 = ∑_{i=1}^{n} |λi|^2 = tr(A^*A)
(d) There is an orthonormal set of n eigenvectors of A.

8.2. Definition and Characterizations of Hermitian matrices.
• A matrix A ∈ Mn is Hermitian if A = A^*. It is skew-Hermitian if A = −A^*.
• A + A^*, AA^*, and A^*A are always Hermitian.
• If A is Hermitian, A^k is Hermitian for all k ∈ N, and if A is nonsingular, A^{-1} is Hermitian.
• If A, B are Hermitian, aA + bB is Hermitian for all real scalars a, b.
• A − A^* is skew-Hermitian.
• If A, B are skew-Hermitian, so is aA + bB for real scalars a, b.
• If A is Hermitian, iA is skew-Hermitian. If A is skew-Hermitian, iA is Hermitian.
• Any matrix A may be written A = \frac{1}{2}(A + A^*) + \frac{1}{2}(A − A^*), where we have split A into a Hermitian and a skew-Hermitian part.
• If A is Hermitian, the diagonal entries of A are real.
Theorem A is Hermitian if and only if at least one of the following holds:
• x^*Ax is real for all x ∈ Cn
• A is normal and all the eigenvalues of A are real
• S^*AS is Hermitian for all S ∈ Mn.
If one of them holds, A is Hermitian and the others hold as well; if A is Hermitian, all three hold. The point of the theorem is that if you can show one of these holds, then you may conclude A is Hermitian. For example, if you can show A is unitarily diagonalizable by using Schur's theorem and finding that the off-diagonal entries are 0, then A is normal; and if you can then show each diagonal entry is real, then A is Hermitian.
Hermitian matrices are necessarily normal, and so all the theorems about normal matrices hold for them.

8.3. Extra Spectral Theorem for Hermitian Matrices. If A is Hermitian, then
(1) All eigenvalues of A are real, and
(2) A is unitarily diagonalizable.
(3) If A ∈ Mn(R) is symmetric, then A is real orthogonally diagonalizable.

8.4. Other Facts About Hermitian Matrices.
• For all x ∈ Cn the n x n matrix xx^* is Hermitian. The only eigenvalues of this matrix are 0 and ‖x‖_2^2 = x^*x. Proof: choosing a vector p orthogonal to x, xx^*p = x(x^*p) = x·0 = 0·p. Also notice xx^*x = x(x^*x) = x‖x‖_2^2 = ‖x‖_2^2 x. If there were another eigenvalue, say λ ≠ 0, then for some y, xx^*y = λy = (x^*y)x. So y is in the span of x and hence λ = ‖x‖_2^2.
• If A is Hermitian, the rank of A is equal to the number of nonzero eigenvalues of A.
• Every principal submatrix of a Hermitian matrix is Hermitian.
• If A is Hermitian and x^*Ax ≥ 0 for all x ∈ Cn, then all the eigenvalues of A are nonnegative. If in addition tr A = 0, then A = 0.
• You may find the minimal polynomial by computing the Jordan form. A quicker version is to find, for each λ, the smallest k such that rank(A − λI)^k = rank(A − λI)^{k+1}; then construct the polynomial from this.
• Every matrix A is uniquely determined by its Hermitian form x^*Ax. That is, if for matrices A, B we have x^*Ax = x^*Bx for all x ∈ Cn, then A = B. (Exercise 4.1.6)
• Eigenvalues of skew-Hermitian matrices are purely imaginary. The square of a skew-Hermitian matrix has real nonpositive eigenvalues. (Exercise 4.1.10)

9. THE JORDAN CANONICAL FORM

9.1. Finding The Jordan Form.
Step 1) Find the eigenvalues with multiplicity. This is equivalent to factoring the characteristic polynomial.
Recall: given the characteristic polynomial, one may read off the eigenvalues and their multiplicities.
Example: pA(λ) = (λ − 1)^3(λ + 2)^5. The eigenvalues of A are 1 and −2 with algebraic multiplicity 3 and 5 respectively.
Step 2) For each eigenvalue λ, determine the number and sizes of the Jordan blocks using the relation:
dim ker[(A − λI)^k] = ∑_{j=1}^{k} (# of Jordan blocks of order ≥ j)
= (# of Jordan blocks associated with λ) + (# of Jordan blocks of size ≥ 2) + ... + (# of Jordan blocks of size ≥ k)
Step 3) Mention that the forms you found are the only possible ones up to permutation of the blocks.

9.2. Remarks on Finding the Number of Jordan Blocks.
• The geometric multiplicity (the number of linearly independent eigenvectors for the eigenvalue, also characterized as the dimension of the eigenspace) determines the total number of Jordan blocks for that eigenvalue:
dim ker(A − λI) = # of Jordan blocks of order ≥ 1 = total # of Jordan blocks
• The orders of the Jordan blocks of an eigenvalue λ must sum to the algebraic multiplicity of λ.
• # of Jordan blocks of order ≥ m = dim ker[(A − λI)^m] − dim ker[(A − λI)^{m−1}]
• The order of the largest Jordan block of A corresponding to an eigenvalue λ (called the index of λ) is the smallest value of k ∈ N such that rank(A − λI)^k = rank(A − λI)^{k+1}.
Proof. If rank(A − λI)^k = rank(A − λI)^{k+1} then
0 = dim ker[(A − λI)^{k+1}] − dim ker[(A − λI)^k] = # of Jordan blocks of order ≥ k + 1.
Since k is the smallest value making this quantity zero, we also have
0 < dim ker[(A − λI)^k] − dim ker[(A − λI)^{k−1}] = # of Jordan blocks of order ≥ k. □


• For a Jordan block of order k associated with λ (denoted Jk(λ)), k is also the smallest number such that (Jk(λ) − λI)^k = 0, i.e. dim ker[(Jk(λ) − λI)^k] = k.
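The kernel-dimension bookkeeping above is easy to check numerically. A small numpy sketch (my own example: one J3(0) block and one J1(0) block, so two blocks total and index 3):

import numpy as np

# Jordan matrix with blocks J3(0) and J1(0) for the eigenvalue 0
J = np.zeros((4, 4))
J[0, 1] = J[1, 2] = 1.0

lam, n = 0.0, J.shape[0]
dims = [n - np.linalg.matrix_rank(np.linalg.matrix_power(J - lam*np.eye(n), k))
        for k in range(1, n + 1)]        # dim ker (J - lam I)^k for k = 1..n
print(dims)                              # [2, 3, 4, 4]
print(dims[0])                           # total number of blocks: 2
# number of blocks of order >= m, for m = 1..n
print([dims[0]] + [dims[m] - dims[m-1] for m in range(1, n)])   # [2, 1, 1, 0]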

9.3. Regarding the Characteristic and Minimal Polynomial of a Matrix A. Let A ∈ Mn have distinct eigenvalues λ1, ..., λm. Assume the characteristic polynomial is given by pA(t) = ∏_{i=1}^{m} (t − λi)^{si} and the minimal polynomial by qA(t) = ∏_{i=1}^{m} (t − λi)^{ri}.
• The exponent si in the characteristic polynomial (i.e. the algebraic multiplicity) is the sum of the orders of all the Jordan blocks of λi.
• The exponent ri in the minimal polynomial is the order of the largest Jordan block corresponding to λi.

9.4. Finding the S matrix. For the Jordan form we have A = SJS^{-1} for some S. If J is diagonal, then S simply has the eigenvectors of A as its columns, arranged in the prescribed order. If not, notice AS = SJ. Let's look at the 2-by-2 case for S = [s1, s2] and J = \begin{bmatrix} λ & 1 \\ 0 & λ \end{bmatrix}: the equation AS = SJ can be written
[ As1   As2 ] = [ λs1   s1 + λs2 ]
So you can solve for s1 as an eigenvector for λ. To find s2, we solve (A − λI)s2 = s1.

9.5. Solutions to Systems of Differential Equations. Just as e^x := ∑_{k=0}^{∞} x^k/k!, we define e^A = I + A + \frac{1}{2!}A^2 + ..., which converges for all square matrices in C^{n×n}. If two matrices A, B satisfy AB = BA, then e^{A+B} = e^A e^B (and e^A, e^B commute). Using the Jordan form A = SJS^{-1} and writing J = D + T, where D is the diagonal part and T the remaining (nilpotent) part, which commute, we see that
e^{tA} = Se^{tJ}S^{-1} = Se^{t(D+T)}S^{-1} = Se^{tD}e^{tT}S^{-1},
which allows us to solve systems of ODEs:
\vec{x}′(t) = A\vec{x}(t), A ∈ Mn, \vec{x}(t) ∈ Cn, with \vec{x}(0) = \vec{C}.
Then our solution is \vec{x}(t) = e^{tA}\vec{C}. Another problem you may encounter is: solve for x(t) where
x′′′(t) − x′′(t) − 4x′(t) + 4x(t) = 0.
To do so, convert this to a first-order system:
x1(t) = x(t) ⟹ x1′(t) = x2(t)
x2(t) = x′(t) ⟹ x2′(t) = x3(t)
x3(t) = x′′(t) ⟹ x3′(t) = x′′′(t) = x′′(t) + 4x′(t) − 4x(t) = x3(t) + 4x2(t) − 4x1(t)
If we define \vec{x}(t) = (x1(t), x2(t), x3(t))^T we immediately see we can rewrite the system above as
\vec{x}′(t) = \begin{bmatrix} x1′(t) \\ x2′(t) \\ x3′(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ −4 & 4 & 1 \end{bmatrix} \begin{bmatrix} x1(t) \\ x2(t) \\ x3(t) \end{bmatrix} = A\vec{x}(t)
You then solve this ODE for \vec{x}(t), and the first component x1(t) is the solution to the original question. This construction also gives the general framework for Companion Matrices.
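A short scipy/numpy sketch of this recipe (my own check; it assumes scipy is available, and uses the companion system above with an arbitrary initial condition). Since t^3 − t^2 − 4t + 4 = (t − 1)(t − 2)(t + 2), the scalar solutions are combinations of e^t, e^{2t}, e^{−2t}:

import numpy as np
from scipy.linalg import expm

A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [-4., 4., 1.]])
C = np.array([1., 0., 0.])    # x(0) = 1, x'(0) = 0, x''(0) = 0 (arbitrary choice)

t = 0.5
x_t = expm(t * A) @ C         # x(t) = e^{tA} C
print(x_t[0])                 # numerical value of x(0.5)

# sanity check: eigenvalues of A are the roots of t^3 - t^2 - 4t + 4
print(np.sort(np.linalg.eigvals(A)))   # approximately [-2., 1., 2.]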


9.6. Companion Matrices and Finding Roots to Polynomials. Let p(x) = x^n + a_{n−1}x^{n−1} + ... + a_1x + a_0 be a polynomial. The Companion Matrix C of p(x) is defined to be
C = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ −a_0 & −a_1 & −a_2 & \cdots & −a_{n−1} \end{bmatrix}
Notice that this matrix has 1's along the superdiagonal, 0's everywhere else above the bottom row, and across the bottom are the negated coefficients. Now notice
C \begin{bmatrix} 1 \\ x \\ x^2 \\ \vdots \\ x^{n−1} \end{bmatrix} = \begin{bmatrix} x \\ x^2 \\ \vdots \\ x^{n−1} \\ −a_0 − a_1x − ... − a_{n−1}x^{n−1} \end{bmatrix} = \begin{bmatrix} x \\ x^2 \\ \vdots \\ x^{n−1} \\ x^n \end{bmatrix} = x \begin{bmatrix} 1 \\ x \\ x^2 \\ \vdots \\ x^{n−1} \end{bmatrix}
if and only if x is a root of p(x), since then x^n = −a_{n−1}x^{n−1} − ... − a_1x − a_0. But this implies x is also an eigenvalue of C, and we also have an eigenvector. Therefore, the important thing about this matrix is that the eigenvalues of C are the roots of p(x). Performing the QR algorithm on a matrix like this is an efficient way to find roots of polynomials.

9.7. The Real Jordan Canonical Form. For λ = a + bi ∈ C we define
C(a, b) := \begin{bmatrix} a & b \\ −b & a \end{bmatrix}
Theorem If A ∈ Mn(R) then there exists an invertible real matrix S such that
A = S \, diag( J_{n_1}(λ_1), \ldots, J_{n_r}(λ_r), C_{m_1}(a_1, b_1), \ldots, C_{m_k}(a_k, b_k) ) \, S^{-1}
where λ_1, ..., λ_r are the real eigenvalues of A and a_j ± i b_j (with b_j ≠ 0) are its non-real eigenvalues. Furthermore,
C_k(a, b) := \begin{bmatrix} C(a, b) & I_2 & & \\ & \ddots & \ddots & \\ & & C(a, b) & I_2 \\ & & & C(a, b) \end{bmatrix}
is a 2k-by-2k block matrix with k copies of C(a, b) along the diagonal and I_2 along the block superdiagonal. This matrix is similar to (and takes the place of, in the Jordan form) diag(J_k(λ), J_k(\bar{λ})) where λ = a + bi.

10. QR AND APPLICATIONS

10.1. Gram-Schmidt. Let x1, ..., xm ∈ Cn be linearly independent. To find an orthonormal set of vectors {y1, ..., ym} whose span equals that of x1, ..., xm, use Gram-Schmidt.
Algorithm
(1) Let v1 = x1.
(2) v2 = x2 − \frac{<x2, v1>}{<v1, v1>} v1 = x2 − \frac{v1^*x2}{v1^*v1} v1. Notice the second term is the projection of x2 onto the subspace spanned by v1.
(3) vj = xj − ∑_{k=1}^{j−1} \frac{<xj, vk>}{<vk, vk>} vk
(4) Let yi = \frac{vi}{‖vi‖} = \frac{vi}{<vi, vi>^{1/2}}.

10.2. QR Factorization. Theorem If A ∈ Mn,m and n ≥ m, there is a matrix Q ∈ Mn,m with orthonormal columns and an upper triangular matrix R ∈ Mm such that A = QR. If m = n, then Q is unitary; if in addition A is nonsingular, R may be chosen so that all its diagonal entries are positive, and in this event the factors Q, R are both unique. If A is real, then both Q and R may be taken to be real.
Factorization To find the QR factorization: first find Q by performing Gram-Schmidt on the columns of A. Place this new orthonormal set as the columns of Q without changing the order. Then to get R, simply compute R = Q^*A.
NOTE: If the columns of A are linearly dependent, then one of the vectors produced by Gram-Schmidt is the zero vector. Obviously this will not do (the columns of Q would not be orthonormal), so replace that vector with one orthogonal to the others (it should be simple to find).
Ex: A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
Tip Say the k-th column of A has i in each entry. You can rewrite A as A = A′B, where B is diagonal and multiplies the k-th column by i (so A′ has that column divided by i). Then find A′ = Q′R′. Substituting in, we have A = Q′R′B. Letting R = R′B, which is still upper triangular, we have the desired factorization.

10.3. QR Algorithm for Finding Eigenvalues. Let A0 = A be given. Write A0 = Q0R0 and define A1 = R0Q0. Then write A1 = Q1R1 and repeat. In general, factor Ak = QkRk and define Ak+1 = RkQk. Ak will converge (in most cases) to an upper triangular matrix which is unitarily equivalent to A, and so has the same eigenvalues. If there are complex eigenvalues, it will converge to a block triangular matrix under certain conditions.
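A bare-bones numpy sketch of this iteration (my own toy example; no shifts or deflation, which a practical implementation would use):

import numpy as np

def qr_algorithm(A, iters=200):
    Ak = A.astype(float).copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)   # factor A_k = Q_k R_k
        Ak = R @ Q                # A_{k+1} = R_k Q_k, unitarily equivalent to A_k
    return Ak

A = np.array([[3., 1., 0.],
              [1., 2., 1.],
              [0., 1., 1.]])      # symmetric with distinct eigenvalues, so the limit is (nearly) diagonal
print(np.round(np.diag(qr_algorithm(A)), 6))
print(np.round(np.linalg.eigvalsh(A)[::-1], 6))   # should match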

11. RAYLEIGH-RITZ THEOREM

Let A ∈ Mn be Hermitian and let the eigenvalues of A be ordered as λmin = λ1 ≤ λ2 ≤ · · · ≤ λn = λmax. Then
λ1 x^*x ≤ x^*Ax ≤ λn x^*x for all x ∈ Cn,
λmax = λn = \max_{x ≠ 0} \frac{x^*Ax}{x^*x} = \max_{x^*x = 1} x^*Ax,
λmin = λ1 = \min_{x ≠ 0} \frac{x^*Ax}{x^*x} = \min_{x^*x = 1} x^*Ax.

12. COURANT FISCHER THEOREM

Let A ∈ Mn be a Hermitian matrix with eigenvalues λ1 ≤ λ2 ≤ ... ≤ λn and let k be a given integer with 1 ≤ k ≤ n. Then
\min_{w_1, ..., w_{n−k} ∈ Cn} \; \max_{x ≠ 0, \; x ⊥ w_1, ..., w_{n−k}} \frac{x^*Ax}{x^*x} = λ_k
and
\max_{w_1, ..., w_{k−1} ∈ Cn} \; \min_{x ≠ 0, \; x ⊥ w_1, ..., w_{k−1}} \frac{x^*Ax}{x^*x} = λ_k
Remark: If k = 1 or k = n, the theorem reduces to Rayleigh-Ritz. I would like to also put a word on interpreting this result. The min/max statement was so intimidating it took me several months to actually think about it. In the first equality, we first take the max of x^*Ax/x^*x over all x ≠ 0 perpendicular to a particular set of n − k vectors (equivalently, x ranges over a subspace of dimension at least k). Then we take the minimum of the numbers so formed over all possible choices of those n − k vectors.
Corollary The singular values σ1 ≥ σ2 ≥ ... ≥ σq of A ∈ Mm,n, where q = min{m, n}, may for 1 ≤ k ≤ q be given by
σ_k = \min_{w_1, ..., w_{k−1} ∈ Cn} \; \max_{x ≠ 0, \; x ⊥ w_1, ..., w_{k−1}} \frac{‖Ax‖_2}{‖x‖_2}
σ_k = \max_{w_1, ..., w_{n−k} ∈ Cn} \; \min_{x ≠ 0, \; x ⊥ w_1, ..., w_{n−k}} \frac{‖Ax‖_2}{‖x‖_2}

13. NORMS AND SUCH

A norm ‖ · ‖ : V → R satisfies:
(1) ‖x‖ ≥ 0 for all x ∈ V
(2) ‖x‖ = 0 if and only if x = 0
(3) ‖cx‖ = |c|‖x‖
(4) ‖x + y‖ ≤ ‖x‖ + ‖y‖
13.1. Vector Norms.
• ‖x‖2 = (|x1|^2 + ... + |xn|^2)^{1/2}. Note for x ∈ Cn, ‖x‖2 = \sqrt{x^*x}.

13.2. Matrix Norms. A matrix norm ||| · ||| satisfies (1)-(4) of the properties above and in addition submultiplicativity: |||AB||| ≤ |||A||| |||B||| for any matrices A, B. One may induce a matrix norm from a vector norm ‖ · ‖ by defining |||A||| = \max_{‖x‖=1} ‖Ax‖.
• The spectral norm is
|||A|||_2 = \max\{\sqrt{λ} : λ ∈ σ(A^*A)\} = \max\{\sqrt{λ} : λ ∈ σ(AA^*)\} = \max\{σ : σ is a singular value of A\}
and further
|||A|||_2 = \max_{‖x‖_2=1} ‖Ax‖_2 = \max_{‖x‖_2≤1} ‖Ax‖_2 = \max_{x≠0} \frac{‖Ax‖_2}{‖x‖_2} = \max_{‖x‖_2=‖y‖_2=1} |y^*Ax| = \max_{‖x‖_2≤1, ‖y‖_2≤1} |y^*Ax|
• The previous identities can be used to show |||A|||_2 = |||A^*|||_2 for all A ∈ Mn. Additionally, |||A^*A|||_2 = |||AA^*|||_2 = |||A|||_2^2 from properties of matrix norms and the fact that A^*A is Hermitian.
• |||A|||_∞ = \max_{1≤i≤n} ∑_{j=1}^{n} |aij|. This is the maximum absolute row sum.


• |||A|||_1 = \max_{1≤j≤n} ∑_{i=1}^{n} |aij|. This is the maximum absolute column sum.
• A matrix norm that is not an induced norm (not determined by a vector norm) is the Frobenius norm ‖A‖_2 := \sqrt{tr(A^*A)}.
• ‖A‖_∞ := \max_{i,j=1,..,n} |aij| is a norm on the vector space of matrices but NOT a matrix norm.
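A small numpy comparison of these norms on an arbitrary matrix of my own choosing:

import numpy as np

A = np.array([[1., -2., 3.],
              [0.,  4., -1.],
              [2.,  1., 0.]])

spectral = np.linalg.norm(A, 2)                  # largest singular value
print(np.isclose(spectral, np.linalg.svd(A, compute_uv=False)[0]))          # True
print(np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max()))   # max row sum
print(np.isclose(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max()))        # max column sum
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A.T @ A))))     # Frobenius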

13.3. Theorems, Definitions, and Sums.

ρ(A) := max{|λ| : λ ∈ σ(A)}

The following lemmas build to the first theorem:
Property
• |||A^k||| ≤ |||A|||^k
• If |||A^k||| → 0 for some norm, then A^k → 0 entrywise, since all norms on the n²-dimensional space Mn are equivalent and so the limit also holds in the ‖ · ‖_∞ norm.
Lemma If there is a matrix norm ||| · ||| such that |||A||| < 1, then \lim_{k→∞} A^k = 0.
Theorem If A ∈ Mn, then \lim_{m→∞} A^m = 0 if and only if all the eigenvalues of A have modulus < 1. That is, ρ(A) < 1 if and only if \lim_{m→∞} A^m = 0.
This tells us that if A^k → 0, then ρ(A) < 1, so there exists a matrix norm with |||A||| < 1, in which case |||A|||^k → 0.
Theorem If ||| · ||| is a matrix norm, then ρ(A) ≤ |||A|||.
Corollary Since the induced norms ||| · |||_∞ and ||| · |||_1 are matrix norms,
ρ(A) ≤ \min\{ \max_i ∑_{j=1}^{n} |aij|, \; \max_j ∑_{i=1}^{n} |aij| \}
Lemma Let A ∈ Mn and ε > 0. Then there exists a matrix norm ||| · ||| such that |||A||| ≤ ρ(A) + ε.
Theorem If A ∈ Mn, then the series ∑_{k=0}^{∞} a_kA^k converges if there exists a matrix norm ||| · ||| such that the numerical series ∑_{k=0}^{∞} |a_k| |||A|||^k converges, or even if the partial sums of this series are bounded.
Important Theorem If there is a matrix norm such that |||I − A||| < 1, then A is invertible and A^{-1} = ∑_{k=0}^{∞} (I − A)^k.
Theorem If ρ(A) < 1, then I − A is invertible and ∑_{k=0}^{∞} A^k = (I − A)^{-1}. This could also be stated with the condition "if there exists a matrix norm such that |||A||| < 1".

13.4. Gelfand's Formula for the Spectral Radius. If A ∈ Mn and ||| · ||| is a matrix norm, then
ρ(A) = \lim_{k→∞} |||A^k|||^{1/k}
For the proof, consider ρ(A)^k = ρ(A^k) ≤ |||A^k||| for one inequality, and for the other apply the earlier results to the matrix (ρ(A) + ε)^{-1}A, whose spectral radius is less than 1 and whose powers therefore tend to 0.
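A quick numerical illustration of Gelfand's formula (my own example), using the Frobenius norm:

import numpy as np

A = np.array([[0.5, 2.0],
              [0.0, 0.3]])            # non-normal: norms overestimate rho(A) at first
rho = max(abs(np.linalg.eigvals(A)))  # spectral radius = 0.5

for k in (1, 5, 20, 80):
    Ak = np.linalg.matrix_power(A, k)
    print(k, np.linalg.norm(Ak, 'fro') ** (1.0 / k))   # tends to rho = 0.5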

14. WEYL’S INEQUALITIES AND COROLLARIES

This is a consequence of the Courant-Fischer Theorem.
Theorem Let A, B ∈ Mn be Hermitian and let the eigenvalues λi(A), λi(B) and λi(A + B) be arranged in increasing order (λmin = λ1 ≤ ... ≤ λn = λmax). For each k = 1, ..., n we have
λk(A) + λ1(B) ≤ λk(A + B) ≤ λk(A) + λn(B)


Corollaries Many of these corollaries continue to stress intuition about the eigenvalues of Hermitian matrices. For example, the first corollary reflects the fact that positive semidefinite matrices have nonnegative eigenvalues: adding one cannot decrease any eigenvalue.
• Assume B is positive semidefinite; with the above assumptions, λk(A) ≤ λk(A + B).
• For a vector z ∈ Cn, with the eigenvalues of A and A ± zz^* arranged in increasing order:
(a) λk(A ± zz^*) ≤ λk+1(A) ≤ λk+2(A ± zz^*) for k = 1, 2, ..., n − 2
(b) λk(A) ≤ λk+1(A ± zz^*) ≤ λk+2(A) for k = 1, 2, ..., n − 2.
Theorem Let A, B ∈ Mn be Hermitian and suppose that B has rank at most r. Then
(a) λk(A + B) ≤ λk+r(A) ≤ λk+2r(A + B), k = 1, 2, ..., n − 2r
(b) λk(A) ≤ λk+r(A + B) ≤ λk+2r(A), k = 1, 2, ..., n − 2r
(c) If A = UΛU^* with U = [u1, ..., un] unitary and Λ = diag(λ1, ..., λn) arranged in increasing order, and if
B = λn un un^* + λn−1 un−1 un−1^* + ... + λn−r+1 un−r+1 un−r+1^*
then λmax(A − B) = λn−r(A).
Corollary By applying the above theorem repeatedly we get: if B has rank at most r, then
λk−r(A) ≤ λk(A + B) ≤ λk+r(A)
This statement intrinsically requires k − r ≥ 1 and k + r ≤ n.
Theorem For A, B Hermitian,
(a) if 1 ≤ j, k ≤ n and j + k ≥ n + 1, then λj+k−n(A + B) ≤ λj(A) + λk(B)
(b) if j + k ≤ n + 1, then λj(A) + λk(B) ≤ λj+k−1(A + B)

15. INTERLACING EIGENVALUES

Let Â ∈ Mn+1 be Hermitian. Define A ∈ Mn, y ∈ Cn and a ∈ R so that
Â = \begin{bmatrix} A & y \\ y^* & a \end{bmatrix}
where the eigenvalues of Â and A are denoted {λ̂k}, k = 1, ..., n+1, and {λi}, i = 1, ..., n, respectively, arranged in increasing order. Then
λ̂1 ≤ λ1 ≤ λ̂2 ≤ λ2 ≤ · · · ≤ λ̂n ≤ λn ≤ λ̂n+1
Note: This theorem sometimes specifies that A is Hermitian, but that is automatic because every principal submatrix of a Hermitian matrix is Hermitian. Further, notice a ∈ R, because the diagonal entries of Hermitian matrices are real.

16. GERSGORIN DISC THEOREM

This theorem gives us an idea of the location of the eigenvalues of a matrix. Furthermore it allows us to say things about the eigenvalues of a matrix without computing them.
Theorem Let A = [aij] ∈ Mn and let
R′_i := ∑_{j ≠ i} |aij|, 1 ≤ i ≤ n,
denote the deleted absolute row sums of A. Then all the eigenvalues of A are located in the union of n discs
\bigcup_{i=1}^{n} \{ z ∈ C : |z − aii| ≤ R′_i \} ≡ G(A),
in other words, σ(A) ⊆ G(A). Furthermore, if a union of k of these n discs forms a connected region that is disjoint from all the remaining n − k discs, then there are precisely k eigenvalues of A in this region.
Corollary All the eigenvalues of A are also located in the union of n discs
\bigcup_{j=1}^{n} \{ z ∈ C : |z − ajj| ≤ C′_j \} = G(A^T),
where C′_j := ∑_{i ≠ j} |aij| is the deleted absolute column sum. Furthermore, if a union of k of these n discs forms a connected region that is disjoint from all the remaining n − k discs, then there are precisely k eigenvalues of A in this region.
Since similar matrices have the same eigenvalues, you can sometimes find a better estimate on the location of a matrix's eigenvalues by looking at a similar matrix. A convenient choice is a diagonal matrix D = diag(p1, ..., pn) with pi > 0. One can then easily calculate D^{-1}AD = [pj aij / pi] and apply Gersgorin to D^{-1}AD and its transpose to yield:
Corollary All the eigenvalues of A lie in the region

\bigcup_{i=1}^{n} \{ z ∈ C : |z − aii| ≤ \frac{1}{pi} ∑_{j ≠ i} pj |aij| \} = G(D^{-1}AD)
as well as in the region
\bigcup_{j=1}^{n} \{ z ∈ C : |z − ajj| ≤ pj ∑_{i ≠ j} \frac{1}{pi} |aij| \} = G[(D^{-1}AD)^T]
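A small numpy sketch (my own example) computing the Gersgorin row discs and checking that every eigenvalue falls inside their union:

import numpy as np

A = np.array([[ 4.0, 0.5, 0.2],
              [ 0.3, -1.0, 0.4],
              [ 0.1, 0.2,  9.0]])

centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)   # deleted absolute row sums R'_i
eigs = np.linalg.eigvals(A)

# every eigenvalue lies in at least one disc |z - a_ii| <= R'_i
for lam in eigs:
    print(lam, any(abs(lam - c) <= r + 1e-12 for c, r in zip(centers, radii)))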

16.1. Eigenvalue "Trapping".
σ(A) ⊆ \bigcap_{D} G(D^{-1}AD)
where D ranges over positive diagonal matrices. Therefore you can use different D's to attempt to "trap" eigenvalues by taking intersections.

16.2. Strict Diagonal Dominance. Definition: A matrix is strictly diagonally dominant if
|aii| > ∑_{j ≠ i} |aij| = R′_i for all i = 1, . . . , n.
Theorem: If a square matrix A is strictly diagonally dominant then
• A is invertible
• If all the main diagonal entries of A are positive, then all the eigenvalues of A have positive real part
• If A is Hermitian and all main diagonal entries of A are positive, then all the eigenvalues of A are real and positive.


17. POSITIVE (SEMI) DEFINITE MATRICES

In the following section, statements in parentheses are to be read in parallel; i.e. if I write "the grass (sky) is green (blue)", I am saying the grass is green and the sky is blue.
17.1. Characterizations.
• By definition, a positive definite (or semidefinite) matrix is Hermitian; this is proved below by a lemma.
• The trace, determinant, and all principal minors (determinants of principal submatrices) of a positive definite (semidefinite) matrix are positive (nonnegative).
• Lemma If x^*Ax ∈ R for all x ∈ Cn, then A is Hermitian.
Proof. e_i^*Ae_i = aii ∈ R. Next look at (e_i + e_j)^*A(e_i + e_j) = aii + ajj + aij + aji ∈ R ⟹ Im(aij) = −Im(aji). Similarly (ie_i + e_j)^*A(ie_i + e_j) = aii + ajj − i·aij + i·aji ∈ R ⟹ Re(aij) = Re(aji). Therefore aij = \overline{aji}, so A is Hermitian. This lemma in particular shows that both positive semidefinite and positive definite matrices are Hermitian.
• A Hermitian matrix A is positive (semi)definite if and only if all of its eigenvalues are positive (nonnegative).
• The preceding statement proves that all powers A^k of a positive semidefinite matrix are also positive semidefinite.
• A Hermitian matrix is positive definite if and only if all leading principal minors of A are positive: for every leading principal submatrix Ai := A({1, 2, ..., i}), i = 1, ..., n, det Ai > 0.
• If A is nonsingular, then AA^* and A^*A are both positive definite. If A is singular they are positive semidefinite.

17.2. Finding the square roots of positive semidefinite matrices. Theorem Let A ∈ Mn be positive semidefinite and k ≥ 1 be a given integer. Then there exists a unique positive semidefinite Hermitian matrix B such that B^k = A. We also have
(a) BA = AB, and there is a polynomial p(t) such that B = p(A)
(b) rank B = rank A, so B is positive definite if A is
(c) B is real if A is real.
Uniqueness is a key property that is sometimes used in proofs of Polar Form theorems!
Algorithm to Find the kth Root The idea is that A can be unitarily diagonalized, so A = UΛU^* with the eigenvectors as the columns of U. Define Λ^{1/k} = diag(λ1^{1/k}, ..., λn^{1/k}) and then B = UΛ^{1/k}U^*. Therefore B^k = A.
Example Consider the matrix A = \begin{bmatrix} 5 & 3 \\ 3 & 5 \end{bmatrix}. It may be computed that the eigenvalues are 8 and 2, and that x1 = (1, 1)^T ∈ N(A − 8I) and x2 = (1, −1)^T ∈ N(A − 2I). Notice x1 ⊥ x2, so we simply normalize them and place them as the columns of U in the order the eigenvalues appear in our desired Λ. Finally we compute A^{1/2} = UΛ^{1/2}U^*.

17.3. Schur's Product Theorem and Similar Things. Theorem If A, B are positive (semi)definite, with A = (aij) and B = (bij), then the Hadamard product A ◦ B = (aij bij)_{i,j=1}^{n} is positive (semi)definite.
Lemma The sum of a positive definite matrix A and a positive semidefinite matrix B is positive definite.
Proof. For x ≠ 0, x^*(A + B)x = x^*Ax + x^*Bx > 0 since x^*Ax > 0 and x^*Bx ≥ 0.


18. SINGULAR VALUE DECOMPOSITION AND POLAR FORM

18.1. Polar Form. Let A ∈ Mm,n with m ≤ n. Then A may be written
A = PU
where P ∈ Mm is positive semidefinite, rank P = rank A, and U ∈ Mm,n has orthonormal rows (that is, UU^* = I). The matrix P is always uniquely determined as P = (AA^*)^{1/2}, and U is uniquely determined when A has rank m. If A is real then U, P may be taken to be real as well.
An important special case:
Corollary !! If A ∈ Mn then
A = PU
where P = (AA^*)^{1/2} is positive semidefinite and U is unitary. P is uniquely determined; if A is nonsingular, then U is uniquely determined as U = P^{-1}A. If A is real then P, U may be taken to be real.
• Necessarily P is Hermitian in the square case!
• Let A = PU. Then A is normal if and only if PU = UP. The forward direction comes from uniqueness of square roots, and the converse from noticing P is Hermitian and looking at AA^* and A^*A.

18.2. Misc. Results.
• For any square matrix A, A^*A is unitarily similar to AA^*, and therefore they have the same eigenvalues. (One way to see it: apply the trace-of-words criterion for unitary equivalence, noting tr((A^*A)^k) = tr((AA^*)^k) for all k by the cyclic property of the trace.)

18.3. Singular Value Decomposition. The singular value decomposition may be seen as a "substitute" eigenvalue decomposition for matrices which are not normal (or square). For positive semidefinite matrices, the SVD is the eigenvalue decomposition.
Theorem If A ∈ Mm,n has rank k, it may be written in the form A = VΣW^*, where:
• The matrices V ∈ Mm and W ∈ Mn are unitary.
• Σ = [σij] ∈ Mm,n has σij = 0 for i ≠ j and σ11 ≥ σ22 ≥ · · · ≥ σkk > σk+1,k+1 = · · · = σqq = 0, where q = min{m, n}.
• The numbers {σii} ≡ {σi} are the nonnegative square roots of the eigenvalues of AA^*, and hence are uniquely determined. (They are also the nonnegative square roots of the eigenvalues of A^*A.)
• The columns of V are eigenvectors of AA^* and the columns of W are eigenvectors of A^*A (arranged in the same order as the corresponding eigenvalues σi^2). [This follows because each matrix is Hermitian, so each has an orthonormal set of eigenvectors, yielding unitary matrices.]
• If m ≤ n and AA^* has distinct eigenvalues, then V is determined up to a right diagonal factor D = diag(e^{iθ1}, ..., e^{iθm}); that is, if A = V1ΣW1^* = V2ΣW2^*, then V2 = V1D.
• If m < n, then W is never uniquely determined.
• If n = m = k and V is given, then W is uniquely determined.
• If n ≤ m, the uniqueness of V and W is determined by considering A^*.
• If A is real, then V, Σ, and W may be taken to be real.
• tr(AA^*) = ∑ σi^2


18.4. Computing the Singular Value Decomposition. For a nonsingular square matrix A ∈ Mn:
(a) Form the positive definite Hermitian matrix AA^* and compute the unitary diagonalization AA^* = UΛU^* by finding the (positive) eigenvalues {λi} of AA^* and a corresponding set {ui} of orthonormal eigenvectors.
(b) Set Σ = Λ^{1/2} and V = [u1, ..., un].
(c) Set W = A^*VΣ^{-1}, so that A = VΣW^*.
Notes:
• If A were singular, then AA^* would only be positive semidefinite, and Σ would not be invertible. Therefore it would be necessary to compute the eigenvectors of A^*A to find W.
• The eigenvectors must be arranged in the same order as the singular values.
• The singular values of a normal matrix are the absolute values of its eigenvalues. Besides the usual construction, since V, W are not unique, the columns of V can also be taken to be eigenvectors of A. In this case V is determined by the eigenvectors of A, but V ≠ W in general. To find W, notice that each λk can be written λk = |λk|e^{iθk}. Let D = diag(e^{iθ1}, ..., e^{iθn}). Then A = UΛU^* = U|Λ|DU^* = U|Λ|(U\bar{D})^*, so we may take V = U, Σ = |Λ|, W = U\bar{D}.
• If A ∈ Mn is singular, it will have at least one zero singular value. When this occurs, augment V and W as necessary by choosing orthogonal vectors to make them unitary.
• In fact, whenever you come up short, always augment V and W with orthogonal vectors.
• If the matrix is not square or is singular, the process will not be fun.

18.5. Notes on Nonsquare matrices. When computing the singular value decomposition for nonsquare matrices, there are a few things to keep in mind:
• Although only the nonzero eigenvalues contribute singular values, you still use the eigenvectors associated with any zero eigenvalues. AA^* and A^*A have the same nonzero eigenvalues.
• The matrix V should be the same size as AA^*, and similarly W should be the same size as A^*A.

19. POSITIVE MATRICES AND PERRON FROBENIUS

• If |A| ≤ |B| (entrywise) then ‖A‖_2 ≤ ‖B‖_2 (Frobenius norm).
• ‖A‖_2 = ‖ |A| ‖_2 (taking entrywise absolute values does not change the Frobenius norm).
• If |A| ≤ B then ρ(A) ≤ ρ(|A|) ≤ ρ(B).
• For A ≥ 0: if all the row sums of A are equal, ρ(A) = |||A|||_∞; if all the column sums of A are equal, ρ(A) = |||A|||_1.
• Theorem Let A ∈ Mn and suppose A ≥ 0. Then
\min_{1≤i≤n} ∑_{j=1}^{n} aij ≤ ρ(A) ≤ \max_{1≤i≤n} ∑_{j=1}^{n} aij
\min_{1≤j≤n} ∑_{i=1}^{n} aij ≤ ρ(A) ≤ \max_{1≤j≤n} ∑_{i=1}^{n} aij
This says ρ(A) lies between the smallest and largest row sum of A, and likewise for column sums.


• Let A ∈ Mn and A ≥ 0. Then for any positive vector x we have
\min_{1≤i≤n} \frac{1}{xi} ∑_{j=1}^{n} aij xj ≤ ρ(A) ≤ \max_{1≤i≤n} \frac{1}{xi} ∑_{j=1}^{n} aij xj
\min_{1≤j≤n} xj ∑_{i=1}^{n} \frac{aij}{xi} ≤ ρ(A) ≤ \max_{1≤j≤n} xj ∑_{i=1}^{n} \frac{aij}{xi}
Another way to view the first is:
\min_{1≤i≤n} \frac{1}{xi} (Ax)_i ≤ ρ(A) ≤ \max_{1≤i≤n} \frac{1}{xi} (Ax)_i
Theorem (Perron) If A ∈ Mn and A > 0 (i.e. aij > 0 for all i, j) then
(a) ρ(A) > 0
(b) ρ(A) is an eigenvalue of A
(c) There is an x ∈ Cn with x > 0 and Ax = ρ(A)x
(d) ρ(A) is an algebraically (and hence geometrically) simple eigenvalue of A
(e) |λ| < ρ(A) for every eigenvalue λ ≠ ρ(A), that is, ρ(A) is the unique eigenvalue of maximum modulus
(f) [ρ(A)^{-1}A]^m → L as m → ∞, where L ≡ xy^T, Ax = ρ(A)x, A^Ty = ρ(A)y, x > 0, y > 0 and x^Ty = 1.
Lemma If A ≥ 0 (i.e. aij ≥ 0 for all i, j) and all row sums equal c (∑_{k=1}^{n} aik = c for all i), then ρ(A) = c.

19.1. Nonnegative matrices. If A ∈ Mn and A ≥ 0, then ρ(A) is an eigenvalue of A and there is a nonnegative eigenvector x (x ≥ 0, x ≠ 0) associated with it.

20. LOOSE ENDS

20.1. Cauchy-Schwarz inequality. |<x, y>| ≤ <x, x>^{1/2} <y, y>^{1/2}, which can also be written:
|<x, y>| ≤ ‖x‖ · ‖y‖

20.2. Random Facts.
• e_i^*Ae_j = aij
• If T is triangular and the diagonal entries of TT^* are the same as those of T^*T (in order), then T is diagonal. (This is used to prove the spectral theorem for normal matrices.)
• Matrices of the form \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} and \begin{bmatrix} λ & 0 & 0 \\ 0 & λ & 0 \\ 0 & 0 & λ \end{bmatrix} commute! This is sometimes useful for solving ODEs.

20.3. Groups of Triangular Matrices. The set of invertible upper triangular matrices forms a group under matrix multiplication, and similarly for invertible lower triangular matrices. Therefore we have:
• The inverse of an invertible upper triangular matrix is upper triangular
• The inverse of an invertible lower triangular matrix is lower triangular


20.4. Block Diagonal Matrices.
• The trace of a block diagonal matrix is the sum of the traces of the diagonal blocks.
• The determinant of a block diagonal matrix is the product of the determinants of the diagonal blocks. (Easy proof: show the matrix is similar to one with the blocks' Jordan forms along the diagonal.)

20.5. Identities involving the Trace.
• For any matrices C, D (of compatible sizes), tr(CD) = tr(DC). This can be used to deduce that the trace is a similarity invariant.
• tr(A + B) = tr(A) + tr(B)
• ∑_{i,j} |aij|^2 = tr(A^*A)
• If σ(A) = {λ1, ..., λn}, then tr(A^k) = ∑_{i=1}^{n} λi^k for k ∈ N.
• tr(A) = ∑_{i=1}^{n} e_i^*Ae_i

21. EXAMPLES

• Having the same eigenvalues is necessary but NOT sufficient for similarity. Consider \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} and \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}. These both have eigenvalue 0 with multiplicity 2, but they are not similar.
• A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} is not diagonalizable. If it were, it would be similar to the zero matrix, which is untrue since the only matrix similar to the zero matrix is itself. Also, dim Ker(A − 0I) = 1, so there is only one eigenvector (up to scaling) associated with 0, but to be diagonalizable there must be 2 since A is 2x2.
• Consider \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} and the 2x2 identity I. These two matrices commute but are not simultaneously diagonalizable. This is because the first matrix is not diagonalizable.
• Two matrices where AB is diagonalizable but BA is not: A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, B = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}.
• \begin{bmatrix} 3 & 1 \\ −2 & 0 \end{bmatrix} and \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix} are similar but not unitarily equivalent.
• The following two matrices satisfy ∑_{i,j} |aij|^2 = ∑_{i,j} |bij|^2 but are not unitarily equivalent; in fact they are not even similar:
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
• A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} and B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} are real 2-by-2 matrices that are not normal, since A^*A ≠ AA^* (and similarly for B).
• A = \begin{bmatrix} 1 & 1 \\ −1 & 1 \end{bmatrix} is a real 2-by-2 matrix that is normal but not symmetric, skew-symmetric, or orthogonal (real unitary).


• A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} and B = \begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix}. A and B commute, so σ(A + B) ⊆ σ(A) + σ(B). But 1 + 4 = 5 ∈ σ(A) + σ(B) while σ(A + B) = {4, 6}. So σ(A + B) ≠ σ(A) + σ(B) in general.
• A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} and B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}. A and B do not commute, and clearly σ(A) + σ(B) = {0}, but σ(A + B) = {1, −1}. So σ(A + B) ⊄ σ(A) + σ(B).
• For A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, the rank of A (which is 1) is bigger than the number of nonzero eigenvalues (which is 0); this can happen when A is not Hermitian. Furthermore, the geometric multiplicity (1) is strictly smaller than the number of eigenvalues counting multiplicity, and A only has one eigenvector (up to scaling).
• A = \begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix} and B = \begin{bmatrix} i & i \\ i & 1 \end{bmatrix} are both complex symmetric matrices, but one is normal and one is not.
• A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} and A^* = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} do not have the same null space. (For normal matrices they do.)
• A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} is positive semidefinite but not positive definite.
• A = \begin{bmatrix} 0 & 0 \\ 0 & −1 \end{bmatrix} has all nonnegative leading principal minors but is not positive semidefinite. (So the leading-principal-minor test for positive definite matrices does not carry over to positive semidefinite matrices.)
• A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} is not normal. It satisfies ∑_{i,j} |aij|^2 = tr(A^*A) (which holds for every matrix), but not ∑_{i,j} |aij|^2 = ∑_i |λi(A)|^2, which is the characterization associated with normal matrices.
• AB and BA can have the same eigenvalues and characteristic polynomial but not the same minimal polynomial, i.e. different Jordan forms, i.e. not be similar to each other:
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} and B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
• For Hermitian matrices, the rank of A is equal to the number of nonzero eigenvalues. This is not true for non-Hermitian matrices: \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.
• Weyl's inequality (the first one) can fail if A, B are not Hermitian. Consider \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} and \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}.
• Matrices with the same minimal polynomial and characteristic polynomial NEED NOT be similar for matrices of order ≥ 4. Ex: 4x4 nilpotent matrices (zeroes on the diagonal), where one has two Jordan blocks of order 2, and the other has only one (together with two blocks of order 1).


• Idempotent matrices can only have eigenvalues 0 or 1. A 2x2 idempotent matrix that is neither the zero matrix nor the identity is \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.