Linear Algebra
MPhil Preliminary Course 2019–2020
Daniel Wales
Faculty of Economics
University of Cambridge



Contents

1 Introduction

2 Preliminary concepts
2.1 Definitions and notation
2.2 Special matrices
2.3 Operations with vectors and matrices
2.3.1 Addition and scalar multiplication
2.3.2 Vector multiplication
2.3.3 Geometric interpretation of the dot product (∗)
2.3.4 Matrix multiplication
2.4 Matrices as operators
2.4.1 Elementary matrices

3 Systems of linear equations
3.1 Linear combinations
3.2 Linear dependence and rank
3.2.1 Application: complete asset markets

4 Square matrices
4.1 Determinant
4.1.1 Geometry of the determinant (∗)
4.2 Trace
4.3 Inverses
4.3.1 Computing the inverse
4.3.2 Special inverses (∗)
4.3.3 Application: IS-LM
4.4 Eigenvalues and eigenvectors
4.4.1 Application: principal components (∗)

5 Diagonalisation and powers of A
5.1 Difference equations
5.1.1 Application: labour markets
5.1.2 Application: vector autoregression
5.1.3 Application: Fibonacci sequence (∗)
5.2 Symmetric and positive definite matrices

6 Vector and matrix calculus
6.1 The Jacobian and the Hessian
6.1.1 Geometric interpretation of the Jacobian matrix
6.1.2 Geometric interpretation of the Hessian matrix
6.1.3 Application: cost minimisation (∗)


7 References

A Summary of key results
A.1 Vector and matrix differentiation

B Additional Material (All Non-Examinable)
B.1 Preliminary concepts
B.1.1 Hadamard and Kronecker product (∗)
B.2 Square matrices
B.2.1 Determinant by Laplacian expansion (∗)
B.3 Vector and matrix calculus
B.3.1 Derivative with respect to a matrix (∗)


1 Introduction

The purpose of these notes is to provide you with the linear algebra tools required for the MPhil course. The results are presented in a heuristic manner, in that we do not prove them. The idea is that you will be able to apply them comfortably in unseen questions. Sections marked with a (∗) are for your information only and are not examinable for the Precourse exam. This means that you can safely skip them. They are included to make these notes more "complete", and some of you may find them useful during the course.

Email me with any typos, corrections and suggestions at [email protected]. Additional explanations and corrected typos appear in red. These notes are based on the notes and comments of Alexis De Boeck, Nicolas Paez, Donald Robertson and Ekaterina Smetanina. Many thanks go to them.

Before starting a course on linear algebra, it is useful to consider why economists may wish to understand these issues. Broadly, in economics the motivation for understanding concepts in linear algebra may be categorised as follows:

(i) If expressed in matrix form, the mathematical results from linear algebra may be applied directly to economic problems and proofs;

(ii) Modern economists work with vast amounts of data, and matrices are often the clearest and most convenient form for describing problems and their solutions;

(iii) More complicated problems can often be linearised and solved easily to a good degree of approximation.

I hope to illustrate some of these benefits through economic applications of linear algebra in these notes, though the real benefit of an understanding of linear algebra is likely to be more scattered throughout the MPhil course.

There are two primary ways to contemplate topics in linear algebra:

(i) A geometric interpretation takes vectors as arrows in some space (e.g. R²). Each vector is defined by a length and a direction in that space. This lends itself neatly to a graphical representation of concepts.

(ii) Alternatively, the building blocks and concepts in linear algebra may be thought of as lists of numbers, where order in the list matters.

I hope to provide an understanding of both views, presenting both graphical and purely mathematical arguments for the concepts in these notes to build knowledge of both interpretations, and I encourage you to interlink both views.


2 Preliminary concepts

2.1 Definitions and notation

Definition 1. A matrix of order n × p is a rectangular array of numbers with n rows and p columns:

A := [(A)_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1p} \\ a_{21} & a_{22} & \dots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{np} \end{bmatrix}.

A matrix, r, of order 1 × p is called a row vector and a matrix, c, of order n × 1 is called a column vector:

\underset{1 \times p}{r} = \begin{bmatrix} r_1 & r_2 & \dots & r_p \end{bmatrix}, \qquad \underset{n \times 1}{c} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}.

Matrices and vectors (usually in column form) are the primary building blocks for geometrical interpretations in linear algebra. We will work extensively with them throughout.

Example 2.1. The use of matrices in economics is widespread, as they may be used to store vast amounts of information in a coherent, organised manner. One such application is the use of a Leontief input-output matrix to represent how intermediate goods from one industry or sector are used in the production process of another. Each element in the matrix (A)_{ij} represents the amount of goods from sector i which is demanded by sector j. One example for a simple three-sector economy is set out below, which concatenates the production vectors from each industry.

A =
From\to         Agriculture   Manufacturing   Services
Agriculture        0.2            0.05          0.01
Manufacturing      0.3            0.5           0.1
Services           0.3            0.1           0.1

Definition 2. A matrix of order n × p is said to have row dimension n and column dimension p.

Definition 3. The transpose of an n × p matrix A is defined by A′ := [(A)_{ji}]. It is the p × n matrix obtained by interchanging the rows and columns of A.

Example 2.2. When the rows and columns of the matrix A are interchanged we obtain the transpose A′:

\underset{n \times p}{A} = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1p} \\ a_{21} & a_{22} & \dots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{np} \end{bmatrix} \quad \text{implies that} \quad \underset{p \times n}{A'} = \begin{bmatrix} a_{11} & a_{21} & \dots & a_{n1} \\ a_{12} & a_{22} & \dots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1p} & a_{2p} & \dots & a_{np} \end{bmatrix}.
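A brief numpy sketch of the transpose (the matrix used here is illustrative, not from the notes):

```python
import numpy as np

# A 2 x 3 matrix and its transpose: rows and columns interchange.
A = np.array([[1, 2, 3],
              [4, 5, 6]])

At = A.T                 # A', the 3 x 2 transpose
# (A')_ji equals (A)_ij, and transposing twice recovers A.
```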

We will use the following notation: bold italic capital letters, e.g. A and B, denote matrices; bold italic lower-case letters, e.g. a and b, are vectors; and scalars are all in lower case. The sets of natural numbers, integers and real numbers are respectively N, Z and R. The n-dimensional Euclidean space is Rⁿ. Subscripts are used to indicate that we are looking at a particular element of a larger quantity. For example, we may use a_i to denote the ith element of the vector a or a_j to represent the jth column of the matrix A. The ijth element of A is represented by (A)_{ij}. All vectors are column vectors; if we need to work with the rows of a matrix we denote these by using a superscript, e.g. a^i is the ith row of A.

2.2 Special matrices

Below is a list of matrices which will be of particular interest during these lectures. We are running ahead of ourselves, as we have not yet defined any operations on vectors or matrices, but you can return to this section after completing the notes.

Definition 4. A matrix is said to be square if it has the same number of columns as rows, i.e. n = p.

Example 2.3. Examples of square matrices include a scalar, a, a 2 × 2 matrix, B, and a 3 × 3 matrix, C:

\underset{1 \times 1}{a} = 1, \qquad \underset{2 \times 2}{B} = \begin{bmatrix} 1 & 3 \\ 3 & 4 \end{bmatrix}, \qquad \underset{3 \times 3}{C} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.

Definition 5. A matrix A is symmetric if it is equal to its transpose, i.e. A = A′. A scalar will be symmetric by definition.

Example 2.4. In the example above, the matrix B is symmetric, as B = B′, while the matrix C is clearly not, since C ≠ C′.

Exercise 2.1. Given that A is symmetric and B is skew-symmetric (B = −B′), find a, b, c, u, v, w, x, y and z.

A = \begin{bmatrix} 3 & a & -1 \\ 2 & 5 & c \\ b & 8 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} u & 3 & v \\ w & x & y \\ -2 & 6 & z \end{bmatrix}.

Definition 6. A square matrix D is diagonal if it is of the form

D := \mathrm{diag}(d_1, \dots, d_n) = \begin{bmatrix} d_1 & 0 & \dots & 0 \\ 0 & d_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & d_n \end{bmatrix},

where the elements off the main diagonal are all zero. Diagonal matrices are always symmetric.

Example 2.5. The identity matrix is the diagonal n × n matrix with ones as diagonal elements:

I_n := \mathrm{diag}(1, \dots, 1) = \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix}.

Furthermore, pre- or post-multiplying any square matrix, \underset{n \times n}{A}, by the identity matrix I_n will leave that matrix unchanged, such that AI_n = A = I_nA. As the identity matrix, I_n, is a diagonal matrix it is also symmetric.

Definition 7. A square matrix, L, is lower triangular if all its elements above the main diagonal are zero. The transpose of a lower triangular matrix, U = L′, is known as an upper triangular matrix.

L = \begin{bmatrix} a_{11} & 0 & \dots & 0 \\ a_{21} & a_{22} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}, \qquad U = L' = \begin{bmatrix} a_{11} & a_{21} & \dots & a_{n1} \\ 0 & a_{22} & \dots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}.

Triangular matrices are useful in determining the solution to a series of simultaneous linear equations.

Definition 8. A square matrix A is orthogonal if A′A = I_n = AA′. Hence, A⁻¹ = A′.

Definition 9. A square matrix is called idempotent if it is equal to its square: A = A².

Exercise 2.2. Show that the identity matrix of any dimension may be said to be square, symmetric, diagonal, lower triangular, upper triangular, orthogonal and idempotent.



Figure 1: (a) vector addition; (b) vector subtraction.

2.3 Operations with vectors and matrices

We would like to discuss basic operations such as addition, subtraction and multiplication between matrices, and in order to do so we need the notion of conformability. Two matrices are conformable if their dimensions allow the desired operation. Conversely, the operation simply does not make sense for non-conformable matrices.

2.3.1 Addition and scalar multiplication

Vector and matrix addition (and equivalently subtraction) are defined element-wise. Therefore, both vectors and matrices need to be of the same dimension to be conformable. Adding two vectors a and b together amounts to adding the ith element of a to the ith element of b for all i = 1, . . . , n:

a + b = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_n + b_n \end{bmatrix}.

To understand why we define addition and subtraction element-wise we need to turn to the geometric interpretation of these concepts. Figure 1 provides a visual illustration, in R², of vector addition (panel a) and vector subtraction (panel b) for the two vectors:

v = \begin{bmatrix} 4 \\ 2 \end{bmatrix} \quad \text{and} \quad w = \begin{bmatrix} -1 \\ 2 \end{bmatrix}.

The geometric illustration says: first move along the vector v, then move along w. Logically, the resulting point should be the addition of these vectors. Of course, this operation could have been conducted in reverse, starting by moving along the vector w. Seen in this way, we can understand precisely why addition is only conformable for vectors of the same length. For example, it makes little sense to first move along a vector in R², and then one in R³.

Figure 2: Scalar multiplication of a vector.

Vector subtraction works in a similar fashion. We first move along vector v and then backwards along the vector w. The result is shown in Figure 1 (panel b).

Matrix addition is defined equivalently. In order to add two matrices A and B we add the ijth element of A to the ijth element of B for all i = 1, . . . , n and j = 1, . . . , p:

A + B = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1p} \\ a_{21} & a_{22} & \dots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{np} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & \dots & b_{1p} \\ b_{21} & b_{22} & \dots & b_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \dots & b_{np} \end{bmatrix} = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \dots & a_{1p}+b_{1p} \\ a_{21}+b_{21} & a_{22}+b_{22} & \dots & a_{2p}+b_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1}+b_{n1} & a_{n2}+b_{n2} & \dots & a_{np}+b_{np} \end{bmatrix}.

Scalar multiplication of a vector by a constant is also defined element-wise. A representation of this is shown in Figure 2. Geometrically, this shows how one may consider the product 2w as saying: move along the vector w, and then move in the same direction, for the same length, once again.

If we multiply a matrix (or a vector) by a constant c ∈ R, then we multiply each element by that scalar, such that for a matrix A ∈ R^{n×p} and a scalar c ∈ R,

cA = \begin{bmatrix} ca_{11} & ca_{12} & \dots & ca_{1p} \\ ca_{21} & ca_{22} & \dots & ca_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ ca_{n1} & ca_{n2} & \dots & ca_{np} \end{bmatrix}.

We focus exclusively on the properties of matrix scalar multiplication, but these carry over to the vectorcase without loss of generality.

Properties 1 (Matrix addition). For three conformable matrices A, B and C:

(i) A + B = B +A;

(ii) c(A + B) = cA + cB, for any c ∈ R;

(iii) A + (B +C) = (A + B) +C .

Exercise 2.3. Show the three properties above.

Exercise 2.4. Show that (A + B)′ = A′ + B′ for any two conformable matrices A and B.

Example 2.6. An input-output matrix (Example 2.1) may also be adapted to show the use of vector addition. Converting the columns of an input-output matrix to vectors, total (internal) demand, d, for goods produced by each sector of the economy is given as the row sum:

d = \begin{bmatrix} 0.2 \\ 0.3 \\ 0.3 \end{bmatrix} + \begin{bmatrix} 0.05 \\ 0.5 \\ 0.1 \end{bmatrix} + \begin{bmatrix} 0.01 \\ 0.1 \\ 0.1 \end{bmatrix} = \begin{bmatrix} 0.26 \\ 0.9 \\ 0.5 \end{bmatrix}.

The impact of scalar multiplication may also be shown here. Suppose the size of (internal) demand from the manufacturing industry were to halve, as it becomes more productive. The impact on internal demand may be calculated by computing a new vector, f, given as

f = \begin{bmatrix} 0.2 \\ 0.3 \\ 0.3 \end{bmatrix} + 0.5\begin{bmatrix} 0.05 \\ 0.5 \\ 0.1 \end{bmatrix} + \begin{bmatrix} 0.01 \\ 0.1 \\ 0.1 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.3 \\ 0.3 \end{bmatrix} + \begin{bmatrix} 0.025 \\ 0.25 \\ 0.05 \end{bmatrix} + \begin{bmatrix} 0.01 \\ 0.1 \\ 0.1 \end{bmatrix} = \begin{bmatrix} 0.235 \\ 0.65 \\ 0.45 \end{bmatrix}.
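These element-wise operations translate directly into array arithmetic. A small numpy sketch of Example 2.6, using the numbers from the example:

```python
import numpy as np

# Columns of the input-output matrix from Example 2.1.
agri = np.array([0.2, 0.3, 0.3])
manu = np.array([0.05, 0.5, 0.1])
serv = np.array([0.01, 0.1, 0.1])

# Internal demand d is the element-wise sum of the columns.
d = agri + manu + serv

# Halving manufacturing demand is a scalar multiplication.
f = agri + 0.5 * manu + serv
```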


2.3.2 Vector multiplication

First, consider the vector case. When talking about multiplying two vectors, we usually mean the inner product (or dot product). For two vectors a and b the inner product is defined as follows:

a \cdot b := \sum_{i=1}^{n} a_i b_i.

Hence, both vectors need to be of the same dimension in order to be conformable. Often, you will also see the notation a′b or ⟨a, b⟩.

Properties 2 (Inner product). For three conformable vectors a, b and c:

(i) a · b = b · a;

(ii) a · (b + c) = a · b + a · c;

(iii) (c1a) · (c2b) = c1c2(a · b), for any c1, c2 ∈ R.

Exercise 2.5. Show the three properties above.

The notion of an inner product allows us to compute the Euclidean norm of a vector¹

\|a\| = \sqrt{a \cdot a} = \sqrt{\sum_{i=1}^{n} a_i^2},

or the Euclidean distance between two vectors

\|a - b\| = \sqrt{(a - b) \cdot (a - b)} = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}.

Example 2.7. A firm demands a number of inputs, x, in order to produce output. At given market prices, p, we may compute total costs, c, for the firm:

c = p \cdot x = \sum_{i=1}^{n} p_i x_i = p_1 x_1 + \dots + p_n x_n.
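A small numpy sketch of Example 2.7; the prices and quantities below are illustrative, not from the notes:

```python
import numpy as np

# Hypothetical market prices and input quantities for a firm.
p = np.array([3.0, 2.0, 5.0])    # prices
x = np.array([10.0, 4.0, 2.0])   # inputs demanded

# Total cost is the inner (dot) product p . x.
c = p @ x                        # 3*10 + 2*4 + 5*2

# The same machinery gives the Euclidean norm of a vector.
norm_p = np.sqrt(p @ p)
```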

2.3.3 Geometric interpretation of the dot product (∗)

These arguments are taken from Binmore and Davies (2002). Consider any triangle, with sides of length a, b and c, as shown in Figure 3, panel (a). The generalised Pythagorean theorem may be used to give a relationship between the lengths of the sides and one of the angles. In particular, the cosine rule states

¹ This is not the only way to define the norm of a vector, but probably the one you are the most familiar with. If you want to know more, look up "ℓp-norms".



Figure 3: (a) Generalised Pythagorean theorem; (b) Geometric dot product.

that:

c^2 = a^2 + b^2 - 2ab\cos\psi.

Now, instead consider the same vector space we used above, with vectors v = \begin{bmatrix} 1 \\ 2 \end{bmatrix} and w = \begin{bmatrix} 2 \\ 1 \end{bmatrix}. We are able to construct a triangle by moving the position of the composite vector v − w, and considering the angle, θ, between the vectors. This is shown in Figure 3, panel (b). In this case the cosine rule becomes:

\|v - w\|^2 = \|v\|^2 + \|w\|^2 - 2\|v\|\|w\|\cos\theta,

where we restrict 0 ≤ θ ≤ π. Using the definition of the dot product we may expand this expression to show:

\|v - w\|^2 = \langle v - w, v - w \rangle
            = \langle v, v \rangle - \langle w, v \rangle - \langle v, w \rangle + \langle w, w \rangle
            = \|v\|^2 + \|w\|^2 - 2\langle v, w \rangle,

and hence by comparison to the cosine rule above:

\|v\|^2 + \|w\|^2 - 2\|v\|\|w\|\cos\theta = \|v\|^2 + \|w\|^2 - 2\langle v, w \rangle
\quad\Longrightarrow\quad \cos\theta = \frac{\langle v, w \rangle}{\|v\|\|w\|},

for 0 ≤ θ < π.

Heuristically, in R², this means that:


(i) Whenever v · w > 0 the vectors have "similar" directions. The angle between them satisfies 0 < θ < π/2;

(ii) Whenever v · w < 0 the vectors have "dissimilar" directions. The angle between these vectors satisfies π/2 < θ < π;

(iii) Whenever v · w = 0 the vectors are perpendicular, as the angle between them is θ = π/2.
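A short numpy sketch of the angle formula, using the vectors from Figure 3 (panel b):

```python
import numpy as np

# The two vectors from Figure 3 (panel b).
v = np.array([1.0, 2.0])
w = np.array([2.0, 1.0])

# cos(theta) = <v, w> / (||v|| ||w||)
cos_theta = (v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))
theta = np.arccos(cos_theta)
# v . w = 4 > 0, so the directions are "similar": 0 < theta < pi/2.
```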

2.3.4 Matrix multiplication

There are multiple ways to define the notion of multiplication for matrices. Let us first review the "standard" way; the one you are all probably most familiar with. Let A and B be matrices of order n × p and k × m respectively. If p = k then A and B are conformable to matrix multiplication. Let

C = AB := \begin{bmatrix} \sum_{s=1}^{k} a_{1s}b_{s1} & \sum_{s=1}^{k} a_{1s}b_{s2} & \dots & \sum_{s=1}^{k} a_{1s}b_{sm} \\ \sum_{s=1}^{k} a_{2s}b_{s1} & \sum_{s=1}^{k} a_{2s}b_{s2} & \dots & \sum_{s=1}^{k} a_{2s}b_{sm} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{s=1}^{k} a_{ns}b_{s1} & \sum_{s=1}^{k} a_{ns}b_{s2} & \dots & \sum_{s=1}^{k} a_{ns}b_{sm} \end{bmatrix},

where C is an n × m matrix. The ijth element of C is nothing else than the inner product between the ith row of A and the jth column of B:

(C)_{ij} := a^i \cdot b_j := \sum_{s=1}^{k} a_{is}b_{sj}.
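The element-wise definition can be sketched directly in numpy; the helper `matmul` below is illustrative, not part of the notes:

```python
import numpy as np

# (C)_ij is the inner product of the ith row of A and the jth column of B.
def matmul(A, B):
    n, p = A.shape
    k, m = B.shape
    assert p == k, "not conformable for multiplication"
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            C[i, j] = A[i, :] @ B[:, j]   # row i of A times column j of B
    return C

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 1.0]])
# The element-wise definition agrees with numpy's built-in product.
assert np.allclose(matmul(A, B), A @ B)
```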

In general, AB ≠ BA. Indeed, BA is not even defined unless n = m, and in that case it is a k × p matrix. Matrix multiplication does obey the following properties.

Properties 3 (Matrix multiplication). For three conformable matrices A, B and C:

(i) A(B +C) = AB +AC;

(ii) (A + B)C = AC + BC;

(iii) A(BC) = (AB)C = ABC;

(iv) c(AB) = (cA)B = A(cB);

(v) (AB)′ = B′A′.

Property (iii) means that you can multiply BC first or AB first.² To see where this is useful, consider a square matrix A; then A²A = AA². By induction, AᵖA = AAᵖ, such that matrix powers follow the same rule as the real numbers:

A^p = \underbrace{A \cdots A}_{p \text{ factors}}, \qquad A^p A^q = A^{p+q}, \qquad (A^p)^q = A^{pq}.

The ability to break up these multiplications may have computational benefits.

² The proof is a little awkward, but if you are curious, refer to the "hints" in problem 2.4.37 of Strang (2009).

Exercise 2.6.

(i) Prove property (v) above. Using this, or otherwise, prove that (ABC)′ = C′B′A′.

(ii) Under what condition is (AB)′ = A′B′?

(iii) Find an example of two distinct matrices A and B such that AB = BA.

Exercise 2.7. Consider the matrices: A is 3 × 5, B is 5 × 3, C is 5 × 1 and D is 3 × 1. Which of the following matrix operations are allowed?

(i) BA;

(ii) AB;

(iii) ABC;

(iv) DBA;

(v) A(B +C).

Exercise 2.8. Let

A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 4 & 5 & 6 \\ 1 & 2 & 3 \end{bmatrix}.

Compute AB′, BA′, A′B and B′A.

Exercise 2.9. For square matrices A and B, which of the following statements are true, and why?

(i) (A + B)² = (B + A)²;

(ii) (A + B)² = A² + 2AB + B²;

(iii) (A + B)² = A(A + B) + B(A + B);

(iv) (A + B)² = (A + B)(B + A);

(v) (A + B)² = A² + AB + BA + B².

Exercise 2.10. For square matrices A and B, which of the following statements are true, and why?

(i) (A − B)² = (B − A)²;

(ii) (A − B)² = A² − B²;

(iii) (A − B)² = A² − 2AB + B²;

(iv) (A − B)² = A(A − B) − B(A − B);

(v) (A − B)² = A² − AB − BA + B².

There are three other ways to view matrix multiplication that all yield the same result. Let A be an n × p matrix and let B be a p × m matrix.

(1) Matrix A times columns of B: each column of AB is a linear combination of the columns of A, such that:

AB = \begin{bmatrix} Ab_1 & Ab_2 & \cdots & Ab_m \end{bmatrix},

where b_j is the jth column of B.

(2) Rows of A times matrix B: each row of AB is a linear combination of the rows of B, such that:

AB = \begin{bmatrix} a^1 B \\ a^2 B \\ \vdots \\ a^n B \end{bmatrix},

where a^i is the ith row of A.

(3) Block matrix multiplication: each block of C is the sum of block-rows of A times block-columns of B:

AB = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}.

Example 2.8. As an important special case, let the blocks of A be its p columns and the blocks of B be its p rows; then AB is the sum of the columns of A times the rows of B:

AB = \begin{bmatrix} a_1 & \dots & a_p \end{bmatrix} \begin{bmatrix} b^1 \\ \vdots \\ b^p \end{bmatrix} = a_1 b^1 + \dots + a_p b^p,

where a_i is the ith column of A and b^i is the ith row of B. Setting the above example aside, block matrices arise naturally in many circumstances. In particular, matrices with identity matrices as their blocks are frequently encountered. Then, it can be much easier to work with the blocks rather than the entire matrix.
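The columns-times-rows view can be checked numerically; a numpy sketch with illustrative matrices:

```python
import numpy as np

# AB equals the sum over i of (column i of A) times (row i of B),
# each term being an outer product.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
B = np.array([[1.0, 0.0, 2.0],
              [3.0, 1.0, 0.0]])

outer_sum = sum(np.outer(A[:, i], B[i, :]) for i in range(A.shape[1]))
```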

Exercise 2.11. Which rows or columns or matrices do you multiply to find

(i) the third column of AB;

(ii) the first row of AB;

(iii) the entry in row three, column four of AB;

(iv) the entry in row one, column one of CDE.

Exercise 2.12. An econometrician collects data on the number of years of education and the marital status of n different individuals. They organise the data for the ith individual as a 2 × 1 vector x_i and arrange these n 2-dimensional vectors into an n × 2 matrix X. Show that X'X = \sum_{i=1}^{n} x_i x_i'.

Exercise 2.13. The econometrician also collects income data on the same n individuals. Let y be the n × 1 vector with ith element y_i. Show that X'y = \sum_{i=1}^{n} x_i y_i.

Exercise 2.14. Suppose that for every i = 1, . . . , n, the econometrician applies a weight 1/σ_i to the ith observation and arranges these n 2-dimensional observations into a matrix Z of order n × 2, such that the ijth element of Z is (X)_{ij}/σ_i. Let Ω be the diagonal matrix with (Ω)_{ii} = σ_i². Show that

Z'Z = \sum_{i=1}^{n} (x_i/\sigma_i)(x_i/\sigma_i)' = \sum_{i=1}^{n} x_i x_i'/\sigma_i^2 = X'\Omega^{-1}X

and

\sum_{i=1}^{n} (x_i/\sigma_i)\,y_i/\sigma_i = \sum_{i=1}^{n} x_i y_i/\sigma_i^2 = X'\Omega^{-1}y.

Exercise 2.15. Multiply a 3 × 3 matrix A and I3 using columns of A times rows of I3.

Exercise 2.16. Multiply AB below using columns times rows:

AB = \begin{bmatrix} 1 & 0 \\ 2 & 4 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 3 & 3 & 0 \\ 1 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} \begin{bmatrix} 3 & 3 & 0 \end{bmatrix} + \underline{\hspace{2cm}} = \underline{\hspace{2cm}}.

Example 2.9. Returning again to the input-output matrix described earlier (Example 2.1), we may recreate the increase in productivity of the manufacturing sector using a matrix, S. We translate the quantity of each input required into a new production matrix in which manufactured goods require half the number of inputs as previously stated. This translation of the system may be written as

A = \begin{bmatrix} 0.2 & 0.05 & 0.01 \\ 0.3 & 0.5 & 0.1 \\ 0.3 & 0.1 & 0.1 \end{bmatrix}, \qquad S = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 1 \end{bmatrix},

where

B = AS = \begin{bmatrix} 0.2 & 0.025 & 0.01 \\ 0.3 & 0.25 & 0.1 \\ 0.3 & 0.05 & 0.1 \end{bmatrix},

such that the internal demand, given as the row sum, is as calculated above.
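A numpy sketch verifying Example 2.9:

```python
import numpy as np

# Post-multiplying by S = diag(1, 0.5, 1) halves the manufacturing
# column of the input-output matrix.
A = np.array([[0.2, 0.05, 0.01],
              [0.3, 0.5,  0.1 ],
              [0.3, 0.1,  0.1 ]])
S = np.diag([1.0, 0.5, 1.0])

B = A @ S
# Row sums of B reproduce the internal-demand vector f from Example 2.6.
f = B.sum(axis=1)
```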

2.4 Matrices as operators

Another way to view matrices is as linear operators, which can be made to act through matrix multiplication. The linear operator A (a matrix) acts upon an input which is either a vector x or a matrix B, and yields another vector Ax or matrix AB.

Example 2.10. Consider the matrix

A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.

This matrix is called a permutation matrix and it acts on a vector x according to

x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \;\Rightarrow\; b = Ax = \begin{bmatrix} x_2 \\ x_1 \end{bmatrix}.

The linear operator A permutes the elements of x.
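A minimal numpy sketch of the permutation operator in Example 2.10 (the vector x is illustrative):

```python
import numpy as np

# The permutation matrix swaps the two elements of x.
A = np.array([[0, 1],
              [1, 0]])
x = np.array([5, 7])

b = A @ x
```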

Definition 10. A permutation matrix is a matrix obtained by reordering the rows of the identity matrix.

Exercise 2.17. Construct all the permutation matrices of order n = 3. How many permutation matrices of order n = 4 are there?

Example 2.11. In Figure 4, we plot 101 points on a circle with radius 2 and use the matrix

A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}

to map these to 101 points on an ellipse. The matrix A is an operator and B is the 2 × 101 matrix of coordinates on the circle. C = AB produces a 2 × 101 matrix of coordinates on the ellipse.

To understand what happened, we need to consider each of the points on the original circle, and decompose these into their corresponding (x, y)-coordinates. Here the matrix A above tells us that the x-coordinate should be mapped to the vector

x = \begin{bmatrix} 2 \\ 1 \end{bmatrix},

while the y-coordinate should be mapped to the vector

y = \begin{bmatrix} 1 \\ 2 \end{bmatrix}.

A graphical representation of this is shown in Figure 4 (panel c).
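A numpy sketch of the construction in Example 2.11; the parametrisation of the circle below is an assumption, as the notes do not specify how the 101 points were drawn:

```python
import numpy as np

# Map 101 points on a circle of radius 2 to points on an ellipse.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

theta = np.linspace(0, 2 * np.pi, 101)
B = 2 * np.vstack([np.cos(theta), np.sin(theta)])  # 2 x 101 circle points

C = A @ B   # 2 x 101 points on the resulting ellipse

# Each column of C is A acting on the corresponding circle point:
# the point (x, y) maps to x*[2, 1] + y*[1, 2].
```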

Exercise 2.18. Suppose you are given 101 points on the surface of a sphere. What is the dimension of the matrix you would require in order to map those points to 101 points on the surface of a 3-dimensional ellipsoid?

Exercise 2.19. Consider the matrix

X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \\ x_{51} & x_{52} & x_{53} \end{bmatrix}.

Construct a matrix A such that

AX = \begin{bmatrix} x_{21}-x_{11} & x_{22}-x_{12} & x_{23}-x_{13} \\ x_{31}-x_{21} & x_{32}-x_{22} & x_{33}-x_{23} \\ x_{41}-x_{31} & x_{42}-x_{32} & x_{43}-x_{33} \\ x_{51}-x_{41} & x_{52}-x_{42} & x_{53}-x_{43} \end{bmatrix}.



Figure 4: Panel (a) shows 101 randomly drawn points from a circle with radius 2 and the origin as centre; panel (b) shows the points on the circle mapped into an ellipse by the linear transformation A; panel (c) shows how this operation may be interpreted point-by-point.


2.4.1 Elementary matrices

Certain permutations of the identity matrix have particularly simple interpretations as operators. These matrices can be quite useful in statistics for analysing the effects on various statistical procedures of, for instance, measurement error or logarithmic transformations. We will call such matrices elementary matrices.

Definition 11. An elementary matrix, E, is a matrix which differs from the identity matrix by a single row operation. Row operations refer to:

(i) Row interchanging. E_{ij} will denote the identity matrix with rows i and j interchanged;

(ii) Row multiplication. E_i(γ) will denote the identity matrix whose ith row is multiplied by γ ≠ 0;

(iii) Row addition. E_i(γ|j) will denote the identity matrix where γ times row j is added to row i (i ≠ j).

Example 2.12. For n = 3, consider

E_{13} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \qquad E_2(7) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad E_3(5|2) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 5 & 1 \end{bmatrix}.

Elementary matrices satisfy the following.

Properties 4 (Elementary matrices). For an elementary matrix:

(i) E_{ij}' = E_{ij};

(ii) E_i(γ)' = E_i(γ);

(iii) E_i(γ|j)' = E_j(γ|i).

The proof of these results can be found in exercises 6.4 and 6.5 in Abadir and Magnus (2005, p. 134).

It may be possible to decompose a more complicated matrix into the product of elementary matrices.

Example 2.13. Consider the 3 × 3 matrix, A, which initially appears dicult to interpret

A =

2 2 00 1 00 0 1

The matrix A may be decomposed into a series of two steps. First add the rst row to the second and

20

Page 21: Linear Algebra · 2019-09-20 · Before starting a course on linear algebra, it is useful to consider why economists may wish to understand these issues. Broadly, in economics the

then double the first row. As such we have that

A = \begin{pmatrix} 2 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = E_1(2)\, E_1(1|2),

such that A is the product of two elementary matrices.
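This decomposition can be checked numerically. Below is a minimal sketch using NumPy (the array names are mine, not from the notes):

```python
import numpy as np

# E1(2): the identity with row 1 doubled
E1_scale = np.array([[2., 0., 0.],
                     [0., 1., 0.],
                     [0., 0., 1.]])
# E1(1|2): the identity with 1 times row 2 added to row 1
E1_add = np.array([[1., 1., 0.],
                   [0., 1., 0.],
                   [0., 0., 1.]])

A = E1_scale @ E1_add
print(A)  # recovers [[2, 2, 0], [0, 1, 0], [0, 0, 1]]
```

Multiplying the elementary matrices in the stated order reproduces A exactly.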

Exercise 2.20. Decompose the following matrices into the product of a series of elementary matrices and state the operations each matrix represents.

A = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.


3 Systems of linear equations

3.1 Linear combinations

Definition 12 (Linear combination). A linear combination of vectors is the sum of a scalar multiple of each vector.

Example 3.1. A linear combination of p vectors v_1, . . . , v_p may be written as

c_1 v_1 + · · · + c_p v_p,

where c_i ∈ R for i = 1, . . . , p.

In light of example 3.1, adding v and w is equivalent to taking a linear combination cv + dw with c = d = 1. Other special linear combinations are:

(i) 1v + (−1)w is the difference of vectors, as depicted in Figure 1 (b);

(ii) 0v + 0w = 0;

(iii) cv + 0w is a stretch or contraction of v (scalar multiplication).

Exercise 3.1. Let u, v and z be vectors of the same order. Show that:

(i) u + v = v + u;

(ii) (u + v) + z = v + (u + z).

Exercise 3.2. For vectors u and v of the same order and scalars c and d, show that:

(i) (c + d)(u + v) = cv + cu + dv + du;

(ii) (Harder) the zero vector is uniquely determined by the condition that c0 = 0 for all scalars c ∈ R.

We will use these linear combinations to rewrite systems of linear equations, which are of interest in many economic applications. For example, consider a system with three linear equations and three unknowns.

x + 2y + 3z = 6

2x + 5y + 2z = 4

6x − 3y + z = 2


[Figure 5: graphical representation of the three column vectors of A, (1, 2, 6)′, (2, 5, −3)′ and (3, 2, 1)′, together with b = (6, 4, 2)′ = 2 times column 3, plotted in x, y, z-space.]

Notice that we can write this system as a linear combination

\begin{pmatrix} 1 \\ 2 \\ 6 \end{pmatrix} x + \begin{pmatrix} 2 \\ 5 \\ -3 \end{pmatrix} y + \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} z = \begin{pmatrix} 6 \\ 4 \\ 2 \end{pmatrix},

or even more compactly in the form Ax = b, where

A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 5 & 2 \\ 6 & -3 & 1 \end{pmatrix}, \quad x = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 6 \\ 4 \\ 2 \end{pmatrix}.

Hence, Ax is a linear combination of the columns of A weighted by the scalar components of x. We wish to find the x, y and z that solve the above simultaneous equations.

A way to visualise the system is to plot the three equations in x, y, z-space, with the columns of A providing the coordinates. Each equation in the system defines a plane in x, y, z-space. Two planes meet in a line, and three planes meet at a point. The coordinates of this point provide the values of x, y and z that solve our system.

In the exercises below you may assume that a unique solution exists.

Exercise 3.3. Solve the system above.
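Once you have attempted such a system by hand, the answer can be checked numerically. A sketch using NumPy's linear solver (not part of the notes):

```python
import numpy as np

# The coefficient matrix and right-hand side of the system above.
A = np.array([[1., 2., 3.],
              [2., 5., 2.],
              [6., -3., 1.]])
b = np.array([6., 4., 2.])

# Since A is invertible, the solution is unique.
x = np.linalg.solve(A, b)
print(A @ x)  # reproduces b
```

`np.linalg.solve` uses an LU factorisation rather than forming A⁻¹ explicitly, which is both faster and more numerically stable.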

Exercise 3.4. Write the following system in matrix form and draw the corresponding column picture.

23

Page 24: Linear Algebra · 2019-09-20 · Before starting a course on linear algebra, it is useful to consider why economists may wish to understand these issues. Broadly, in economics the

By simple inspection, identify the x vector that solves the system

2x − y = 0

−x + 2y = 3.

Exercise 3.5. Write the following system in matrix form and draw the corresponding column picture. By simple inspection, identify the x vector that solves the system

2x − y = 0

−x + 2y − z = −1

−3y + 4z = 4.

3.2 Linear dependence and rank

Up to now, we have taken for granted that a unique solution to a system of equations exists. We will provide conditions under which the existence of a unique solution for x is guaranteed. The existence and uniqueness of a solution to a system of linear equations may be ascertained using a quantity called the rank of a matrix. Central to the rank of a matrix is the linear (in)dependence of a set of vectors.

Definition 13 (Linear dependence). A set of vectors v_1, . . . , v_p with v_i ∈ R^n is called linearly dependent if there exist scalars c_1, . . . , c_p, not all zero, such that

c1v1 + · · · + cpvp = 0.

Conversely, if no such scalars exist then the set of vectors is called linearly independent.
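In practice, linear independence can be tested by checking whether the matrix formed from the vectors has rank equal to the number of vectors. An illustrative helper (not from the notes; the function name is mine):

```python
import numpy as np

# Vectors are linearly independent exactly when the matrix with those vectors
# as columns has rank equal to the number of vectors.
def is_independent(*vectors):
    V = np.column_stack(vectors)
    return bool(np.linalg.matrix_rank(V) == len(vectors))

u, v, w = np.array([1., 1.]), np.array([0., 2.]), np.array([1., 2.])
print(is_independent(u, v))      # True: two non-parallel vectors in R^2
print(is_independent(u, v, w))   # False: three vectors in R^2 must be dependent
```

This mirrors Definition 13: a non-trivial vanishing combination exists precisely when the columns fail to reach full rank.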

Exercise 3.6. Consider the following sets of vectors u, v and w and state whether they are linearly independent. If they are linearly dependent, write down a set of associated scalars c_1, c_2, c_3 to show this.

(i) u = (1, 1)′, v = (0, 2)′ and w = (1, 2)′;

(ii) u = (0, 0, 1)′, v = (0, 1, 0)′ and w = (1, 0, 0)′;

(iii) u = (1, 2, 3)′, v = (2, 4, 6)′ and w = (1, 0, 0)′;

(iv) u = (1, 4, 5)′, v = (1, 1, 0)′ and w = (3, 6, 8)′;

(v) u = (1, 3, −1)′, v = (3, 7, −7)′ and w = (−1, −1, 5)′;

(vi) u = (1, 2, −1)′, v = (2, 2, −4)′ and w = (1, 2, 3)′.

Exercise 3.7. Let x, y and z be linearly independent vectors of order n. Show that (x + y), (x + z) and (y + z) are also linearly independent.

Exercise 3.8. Let v_1, . . . , v_p be a set of linearly independent vectors. Consider another vector v such that the vectors in v_1, . . . , v_p, v are linearly dependent. Show that v can be expressed as a unique linear combination of v_1, . . . , v_p.

Definition 14. The column rank of a matrix A is the maximum number of linearly independent columns of A. Likewise, the row rank is the maximum number of linearly independent rows.

Theorem 1 (Rank theorem). The column rank and row rank of any matrix A ∈ Rn×p are equal.

Proof. See exercise 4.5 in Abadir and Magnus (2005, p. 77).

From Theorem 1 it readily follows that the concept of rank is unambiguous. Therefore, we will just refer to it as rank, denoted by rk, and drop the reference to "column" and "row" unless absolutely necessary.

Properties 5 (Rank). For an n × p matrix A and a conformable matrix B:

(i) 0 ≤ rk A ≤ min{n, p};

(ii) rk A = rk A′;

(iii) rk A = 0 if and only if A = O_{n×p};

(iv) rk I_n = n;

(v) rk(cA) = rk A, for any c ≠ 0;

(vi) rk(A + B) ≤ rk A + rk B;

(vii) rk(A − B) ≥ |rk A − rk B|;

(viii) rk(AB) ≤ min{rk A, rk B}.

A matrix that has rank min{n, p} is said to be of full rank. Otherwise, we say that the matrix is rank deficient.


Example 3.2.

rk \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} = 1 \quad \text{whilst} \quad rk \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} = 2.

Example 3.3.

(i) rk \begin{pmatrix} 1 & 2 & 4 \\ 2 & 4 & 8 \\ 3 & 6 & 12 \end{pmatrix} = 1, \quad (ii) rk \begin{pmatrix} 1 & 2 & 0 \\ 2 & 4 & 7 \\ 3 & 6 & 2 \end{pmatrix} = 2, \quad (iii) rk \begin{pmatrix} 1 & 2 & 0 \\ 2 & 3 & 7 \\ 3 & 3 & 2 \end{pmatrix} = 3, \quad (iv) rk \begin{pmatrix} 2 & 0 \\ 3 & 7 \\ 3 & 2 \end{pmatrix} = 2.

The matrices (i) and (ii) are rank deficient, whilst (iii) and (iv) are of full rank.
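The ranks quoted in examples 3.2 and 3.3 can be verified numerically; a sketch using NumPy (not part of the notes):

```python
import numpy as np

# matrix_rank counts the number of non-negligible singular values,
# which equals the maximum number of linearly independent columns.
matrices = [
    np.array([[1, 2], [2, 4]]),                    # example 3.2, first matrix
    np.array([[1, 2, 4], [2, 4, 8], [3, 6, 12]]),  # example 3.3 (i)
    np.array([[1, 2, 0], [2, 4, 7], [3, 6, 2]]),   # example 3.3 (ii)
    np.array([[1, 2, 0], [2, 3, 7], [3, 3, 2]]),   # example 3.3 (iii)
]
ranks = [int(np.linalg.matrix_rank(M)) for M in matrices]
print(ranks)  # [1, 1, 2, 3]
```

Note that `matrix_rank` works with a numerical tolerance; for hand-constructed integer matrices like these the answer matches the exact rank.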

Exercise 3.9. Find the rank of the following matrices

(i) A = \begin{pmatrix} 0 & 3 & 0 \\ 2 & 0 & 0 \\ 0 & 0 & 7 \end{pmatrix}, \quad (ii) B = \begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 2 & 4 & 2 \\ 0 & 2 & 2 & 1 \end{pmatrix}, \quad (iii) C = \begin{pmatrix} 2 & -1 & 3 \\ 1 & 0 & 1 \\ 0 & 2 & -1 \\ 1 & 1 & 4 \end{pmatrix}, \quad (iv) D = \begin{pmatrix} 1 & 4 & 0 \\ 0 & 3 & 12 \\ 0 & 2 & 8 \end{pmatrix}.

We will now use the concept of rank to determine whether a system of equations has a solution. First, consider the following example.

Example 3.4. Consider the system of equations Ax = b given by

\begin{pmatrix} 1 & 1 & 2 \\ 2 & 1 & 3 \\ 3 & 1 & 4 \\ 4 & 1 & 5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix}.

Does Ax = b have a solution for every possible b? The answer is no, but there are some vectors b that we can solve for, for instance:

(i) b = (0, 0, 0, 0)′ ⇒ x = (0, 0, 0)′;

(ii) b = (1, 2, 3, 4)′ ⇒ x = (1, 0, 0)′;

(iii) b = (1, 1, 1, 1)′ ⇒ x = (0, 1, 0)′;

(iv) b = (2, 3, 4, 5)′ ⇒ x = (0, 0, 1)′.

Whenever b is a linear combination of the columns of A, the system admits at least one solution. Formally,


this requires that rk(A|b) = rk A, or

rk \begin{pmatrix} 1 & 1 & 2 & b_1 \\ 2 & 1 & 3 & b_2 \\ 3 & 1 & 4 & b_3 \\ 4 & 1 & 5 & b_4 \end{pmatrix} = rk \begin{pmatrix} 1 & 1 & 2 \\ 2 & 1 & 3 \\ 3 & 1 & 4 \\ 4 & 1 & 5 \end{pmatrix}.

The condition for the existence of at least one solution to a system of linear equations is that the rank of the coefficient matrix is equal to the rank of the augmented matrix, A|b, where the augmented matrix is the coefficient matrix with b attached as an extra column.

However, this still does not guarantee the uniqueness of the solution. For a unique solution, the coefficient matrix A must have full column rank; otherwise there are infinitely many solutions.
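These two rank conditions can be encoded directly. An illustrative helper (not from the notes; the function and its return labels are mine):

```python
import numpy as np

# Existence: rk(A|b) = rk(A). Uniqueness: A additionally has full column rank.
def solution_type(A, b):
    rA = np.linalg.matrix_rank(A)
    rAb = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rAb > rA:
        return "no solution"
    return "unique" if rA == A.shape[1] else "infinitely many"

# The 4x3 matrix from example 3.4 (its third column is the sum of the first two).
A = np.array([[1., 1., 2.],
              [2., 1., 3.],
              [3., 1., 4.],
              [4., 1., 5.]])
print(solution_type(A, np.array([1., 2., 3., 4.])))  # infinitely many
print(solution_type(A, np.array([1., 0., 0., 0.])))  # no solution
```

The first b is column 1 of A, so it lies in the column space, but since rk A = 2 < 3 the solution is not unique; the second b lies outside the column space.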

Example 3.5. Consider the previous system, but now look at a particular b vector

\begin{pmatrix} 1 & 1 & 2 \\ 2 & 1 & 3 \\ 3 & 1 & 4 \\ 4 & 1 & 5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.

Does Ax = 0 have a solution? It has many, for example:

(i) x = (1, 1,−1)′

(ii) x = (2, 2,−2)′

In fact, it has infinitely many solutions, all of the form x = (c, c, −c)′ for any c ∈ R.

Example 3.6. Similarly, x = (1, 0, 0)′ solves

\begin{pmatrix} 1 & 1 & 2 \\ 2 & 1 & 3 \\ 3 & 1 & 4 \\ 4 & 1 & 5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix},

but so does any x of the form

x = \begin{pmatrix} 1 + c \\ c \\ -c \end{pmatrix},

where c ∈ R.

The above examples illustrate how a system of equations with a column rank deficient coefficient matrix has infinitely many solutions.


For an n × p matrix A, a p-vector x and an n-vector b, the four possible scenarios are stated here without proof:

(i) rk A = n = p (A is square and invertible): Ax = b has one solution;
(ii) rk A = n and rk A < p (A is short and wide): Ax = b has ∞ solutions;
(iii) rk A < n and rk A = p (A is tall and thin): Ax = b has 0 or 1 solutions;
(iv) rk A < min{n, p} (A is not full rank): Ax = b has 0 or ∞ solutions.

Note that the first three situations, (i), (ii) and (iii), refer to settings where the matrix A is full rank. In cases (ii) and (iii) the matrix A is not square. The final case, (iv), refers to any situation where the matrix A is not full rank.

Exercise 3.10. Let A be an n × p matrix. Suppose you know that there exists an n × 1 vector a such that if a is added as an additional column to A, the rank increases by 1, i.e. rk(A|a) = rk(A) + 1. Show that this implies the rows of A are linearly dependent. Hint: prove by contradiction.

Exercise 3.11. Show that it is not true in general that rk AB = rk BA for two square matrices A and B.

3.2.1 Application: complete asset markets

The concepts of linear dependence and rank are widely used in both finance and economics when contemplating the financial structure of the economy, and in particular the concept of complete asset markets.

Definition 15 (Complete asset markets). Asset markets are said to be complete when the payoff vectors of available assets span all states of nature (i.e. the payoff matrix has rank equal to the number of states).

Example 3.7. Suppose a discrete number, n, of possible states of nature exist in the future. Financial assets each pay out different amounts in each state. Suppose there are p possible financial assets to purchase. This information may be summarised in a returns matrix, R, whose rows correspond to states 1, . . . , n and whose columns correspond to assets 1, . . . , p:

R = \begin{pmatrix} r_{11} & \dots & r_{1p} \\ \vdots & \ddots & \vdots \\ r_{n1} & \dots & r_{np} \end{pmatrix}.

As long as rk R = n, any consumer is able to use a combination of assets to purchase any desired level of consumption across states of nature.
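As an illustration (the payout structure below mirrors the two-asset setting of exercise 3.12, but the consumption target is an assumption of mine, deliberately different from the one asked in the exercise):

```python
import numpy as np

# Returns matrix: rows are states (Rain, Sun), columns are assets.
# Assumed payouts: asset 1 pays 2 in rain and 0 in sun; asset 2 pays 1 always.
R = np.array([[2., 1.],
              [0., 1.]])
print(np.linalg.matrix_rank(R))  # 2 = number of states, so markets are complete

# Any consumption target c across states is attainable via a portfolio q
# solving R q = c (negative entries mean short-selling an asset).
c = np.array([3., 1.])           # assumed target: 3 units in rain, 1 in sun
q = np.linalg.solve(R, c)
print(q)                         # quantities of each asset to hold
```

Because rk R equals the number of states, `np.linalg.solve` finds a portfolio for every target c.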


Exercise 3.12. Suppose two states of nature exist, S ∈ {Rain, Sun}, such that n = 2. Two financial assets exist, one which pays out 2 units in rain and another which pays out 1 unit in any state of the world. By considering the returns matrix, R, argue why financial markets are complete. If a household desires 4 units of consumption in the sunny state, but none when it rains, what are the quantities of each asset they must purchase for their portfolio?


4 Square matrices

From Section 2.2 we know that a square matrix is one whose row and column dimensions are the same. Throughout this section, we will discuss concepts which apply uniquely to such matrices. During your course you will encounter these regularly.

4.1 Determinant

The determinant is a single value associated with a square matrix. It has several uses and here we will focus on just a few, which will be discussed in Sections 4.3 and 4.4. Before we can introduce the determinant, we need to know what a minor of a square matrix is.

Definition 16 (Minor). The i, j-minor, M_{ij}, of an n × n matrix A is the determinant of the (n−1) × (n−1) submatrix created by deleting the ith row and jth column of A.

Example 4.1. The 1, 1-minor of

\begin{pmatrix} a & b \\ c & d \end{pmatrix}

is M_{11} = det[d] = d. Similarly, M_{12} = c, M_{21} = b and M_{22} = a.

Definition 17 (Determinant by cofactor expansion). The determinant of a square n × n matrix A defined by the cofactor expansion is

\det A := \sum_{i=1}^{n} (-1)^{i+j} a_{ij} M_{ij} \equiv \sum_{j=1}^{n} (-1)^{i+j} a_{ij} M_{ij}, \qquad (1)

where the left-hand side of the equivalence is for fixed j and the right-hand side is for fixed i.

The definition above states that we can compute the determinant by expanding in cofactors, (-1)^{i+j} M_{ij}, either along a fixed row (fixed i) or column (fixed j).

Example 4.2. Consider again the 2 × 2 matrix

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.

Use definition 17 by expanding along the first column (fix j = 1):

\det A = (-1)^{1+1} a_{11} M_{11} + (-1)^{2+1} a_{21} M_{21} = ad - bc.


You can try by yourself to expand along any other column or row and you should get the same answer.

Example 4.3. Consider again the 3 × 3 matrix

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.

Expand along the first row (fix i = 1) such that

\det A = (-1)^{1+1} a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} + (-1)^{1+2} a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + (-1)^{1+3} a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31})
= a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}.
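Definition 17 translates directly into a short recursive routine. The sketch below (not from the notes, and deliberately inefficient, costing O(n!)) expands along the first row; it is fine for the small matrices used here:

```python
import numpy as np

# Cofactor expansion along the first row. With 0-based indexing the sign of
# entry (0, j) is (-1)^j, matching (-1)^{1+j} in the notes' 1-based notation.
def det_cofactor(A):
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # delete row 0 and column j to obtain the minor's submatrix
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[1., 2.], [3., 4.]])
print(det_cofactor(A))  # -2.0, i.e. ad - bc
```

For large matrices one would instead use LU factorisation (`np.linalg.det`), which costs only O(n³).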

Applying the following properties will make your life easier when computing determinants.

Properties 6 (Determinant). For two n × n matrices A and B:

(i) det I_n = 1;

(ii) det A′ = det A;

(iii) det A^{−1} = (det A)^{−1};

(iv) det AB = det A det B;

(v) det(cA) = c^n det A, for any c ∈ R;

(vi) for a lower or upper triangular matrix, including a diagonal matrix, A,

\det A = \prod_{i=1}^{n} a_{ii};

(vii) the determinant changes sign when two rows of A are interchanged;

(viii) subtracting a multiple of one row of A from another leaves det A unchanged;

(ix) if A has a row of zeros then det A = 0;

(x) the determinant is a function of each row separately, i.e.

\begin{vmatrix} ca_{11} & ca_{12} & \dots & ca_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix} = c \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix}

and

\begin{vmatrix} a_{11} + a'_{11} & a_{12} + a'_{12} & \dots & a_{1n} + a'_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix} + \begin{vmatrix} a'_{11} & a'_{12} & \dots & a'_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix}.

Remark. By property (ii), all the results above still hold when changing “row” to “column”.

Definition 18. A singular matrix is a square matrix whose determinant is zero.

Exercise 4.1. Evaluate the determinants of the following matrices using the cofactor expansion method, along an appropriate row or column:

A = \begin{pmatrix} 2 & 5 & 1 \\ 1 & 0 & 2 \\ 7 & 1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 7 & 5 & 2 & 3 \\ 2 & 0 & 0 & 0 \\ 11 & 2 & 0 & 0 \\ 23 & 57 & 1 & -1 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 2 & 1 & 0 \\ 3 & 2 & 1 & 0 \\ 0 & 1 & 6 & 5 \\ 0 & 1 & 1 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},

E = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 3 & 2 \\ 0 & 0 & 0 & 2 & 9 & 3 \\ 0 & 0 & 1 & 0 & 7 & 4 \\ 0 & 6 & 9 & 8 & 7 & 5 \\ 1 & 3 & 4 & 2 & 9 & 6 \end{pmatrix}, \quad F = \begin{pmatrix} 3 & t & -2 \\ -1 & 5 & 3 \\ 2 & 1 & 1 \end{pmatrix}.

Exercise 4.2. Using property (vii), prove that if two rows of a matrix A are equal then det A = 0. Hint: proceed by contradiction, supposing that A has two equal rows but det(A) ≠ 0.

Exercise 4.3. Prove property (viii) by using the result from exercise 4.2 and property (x). Hint: use both parts of property (x) to separate the relevant determinant into two components and apply exercise 4.2.

Exercise 4.4. Prove property (ix) by using property (x).

Exercise 4.5. Prove or disprove the statement: det(A + B) = detA + detB.


Exercise 4.6. By considering the matrices

A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 0 \\ 3 & 4 \end{pmatrix}

and the properties of determinants, show that AB = O does not imply that either A or B is the zero matrix, but that it does imply that at least one of them is singular.

4.1.1 Geometry of the determinant (∗)

The determinant of a matrix may look like a completely abstract concept, but it has a clear geometric interpretation. A generalisation of a parallelogram in n-dimensional Euclidean space is called a parallelepiped. It turns out that the volume of a parallelepiped, V, with the vectors v_1, . . . , v_n as edges is

V = |\det(v_1, \dots, v_n)|.

In the case of a two dimensional matrix,

A = \begin{pmatrix} 4 & -1 \\ 2 & 2 \end{pmatrix},

this claim may be verified diagrammatically, as shown in Figure 6. Notice the vectors v = (4, 2)′ and w = (−1, 2)′ are the same vectors as in Figure 1 (a).

Figure 6: The absolute value of the determinant of the matrix with columns v and w is the area of the parallelogram spanned by these vectors.

Similarly, when this matrix is viewed as a basis of R², we observe that the determinant is a useful way to summarise the relative area spanned by the set of basis vectors. This example is therefore shown as


the difference between the standard basis, displayed in Figure 7, and the alternative basis, displayed in Figure 8.

Intuitively, we realise that it will be useful to know how much area in the standard basis translates to area in the alternative. The answer is precisely given by the determinant. Any unit square in the standard basis is mapped into a region of area det(A) = 10 in the alternative.

Figure 7: Standard basis Figure 8: Alternative basis

The alternative basis arises from using the vectors in the columns of matrix A as a basis of R2.
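For the 2 × 2 case the area claim can be checked numerically, since the parallelogram area also has the familiar closed form |v₁w₂ − v₂w₁|. A quick sketch (not part of the notes):

```python
import numpy as np

# |det A| equals the area of the parallelogram spanned by the columns of A.
A = np.array([[4., -1.],
              [2.,  2.]])
v, w = A[:, 0], A[:, 1]

area_det = abs(np.linalg.det(A))
area_formula = abs(v[0] * w[1] - v[1] * w[0])  # 2-D cross-product formula
print(area_det, area_formula)  # both equal 10
```

The two computations agree, matching the area of the parallelogram drawn in Figure 6.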

4.2 Trace

Another value associated with square matrices is the trace. It is defined as follows.

Definition 19. The trace of an n × n matrix A is the sum of the diagonal elements

\operatorname{tr} A = \sum_{i=1}^{n} a_{ii}.

It satises the following properties.

Properties 7. For two conformable matrices A and B:

(i) tr(A + B) = tr A + tr B;

(ii) tr(cA) = c tr A, for any c ∈ R;

(iii) if A ∈ R^{n×p} and B ∈ R^{p×n}, then tr AB = tr BA;

(iv) tr(A′) = tr A;


(v) tr c = c, for c ∈ R.

Properties (i) and (ii) make the trace a linear operator. Furthermore, in property (iii), the matrices A and B need not be square, nor do AB and BA need to be of the same dimension.
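Property (iii) is easy to check numerically on rectangular matrices; a sketch (not part of the notes, random data chosen for illustration):

```python
import numpy as np

# tr(AB) = tr(BA) even though AB is 3x3 while BA is 5x5.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))

print((A @ B).shape, (B @ A).shape)                  # (3, 3) (5, 5)
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True
```

Despite the different dimensions of AB and BA, the sums of their diagonal elements coincide.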

Exercise 4.7. Prove the properties above.

Exercise 4.8. Using property (iii), show that for three conformable matrices we have

tr(ABC) = tr(CAB) = tr(BCA).

Exercise 4.9. Referring to your computations in exercise 2.8 show that

tr(A′B) = tr(BA′) = tr(AB′) = tr(B′A).

4.3 Inverses

In Section 2.4 above, we discussed matrices A that operate on another matrix B to produce a new one, C, through the equation AB = C. Whatever operation is performed by A can be undone by its inverse matrix A^{−1}, provided that such a matrix exists.

Definition 20. A matrix A ∈ R^{n×n} is invertible if there exists a matrix A^{−1} ∈ R^{n×n} such that

A^{-1} A = I_n = A A^{-1}.

Therefore, if AB = C , then

A−1C = A−1AB = IB = B.

Note that for rectangular matrices it cannot be true that the left inverse is equal to the right inverse; the dimensions would not allow it. Rectangular matrices can have generalised inverses, but we do not discuss them here. If you are interested, please look up the "Moore–Penrose inverse".

Exercise 4.10. Show (by multiplying AA^{−1}) that in general if A is a 2 × 2 matrix given by

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

and (ad − cb) ≠ 0, then

A^{-1} = \frac{1}{ad - cb} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix},

noting that (ad − cb) is the determinant of A.

Exercise 4.11. Write down the inverse of the matrices

(i) A = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}, \quad (ii) B = \begin{pmatrix} 5 & -3 \\ 2 & 1 \end{pmatrix}, \quad (iii) C = \begin{pmatrix} 6 & 1 \\ 1 & 2 \end{pmatrix}.


Theorem 2. If an inverse exists, then it is unique.

Proof. Suppose not, so that there exist two distinct inverses B and C of a matrix A. Then

B = BI = B(AC) = (BA)C = IC = C,

a contradiction.

Theorem 3. If there exists a non-null vector x such that Ax = 0, then A is not invertible.

Proof. Suppose not, so that there exists a non-null vector x with Ax = 0 and A is invertible. By definition 20, A^{−1}Ax = x = 0, but x is non-null. Hence, a contradiction.

By Theorem 3, singular matrices take some non-zero vector into zero. There is no A^{−1} which can recover that vector.

Theorem 4. If A is an n × n invertible matrix, then the system of equations

\sum_{j=1}^{n} a_{ij} x_j = b_i \quad (1 \leq i \leq n)

has a unique solution for each choice of b_i.

Exercise 4.12. Use Theorem 2 [uniqueness of an inverse] to prove Theorem 4 [unique solution foreach choice of bi ].

Exercise 4.13. Construct a matrix A that multiplies the vector (3,−1)′ to produce the zero vector(0, 0)′. What do you notice about the matrix A? Compute its determinant.

Theorem 5. A square matrix A of order n is invertible if and only if rk(A) = n.

Proof. See exercise 4.21 in Abadir and Magnus (2005, p. 83).

Corollary 5.1. A square matrix A is invertible if and only if it is non-singular.

Exercise 4.14. Consider the matrix

A = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 4 & 7 \\ 3 & 6 & 2 \end{pmatrix}.


Show, using Theorem 5, that the matrix A is not invertible.

Theorem 6. The product AB has an inverse if and only if A and B are non-singular and of the same dimension.

Exercise 4.15. Suppose that A and B are non-singular matrices of the same order. Show that

(AB)^{-1} = B^{-1} A^{-1},

thereby proving Theorem 6.

The inverse of a matrix A satises the following properties.

Properties 8 (Inverse). For a non-singular matrix A:

(i) (cA)^{−1} = (1/c)A^{−1}, for any c ≠ 0;

(ii) (A′)−1 = (A−1)′;

(iii) (A−1)−1 = A.

Exercise 4.16. Prove the properties above.

4.3.1 Computing the inverse

Definition 20 does not immediately reveal how to compute the inverse. However, you could show that for an arbitrary n × n matrix A, the formula for its inverse is

A^{-1} = \frac{1}{\det A} A^{\#}, \qquad (2)

where A^{\#} is the adjoint matrix of A.

Definition 21. The adjoint matrix of A is the transpose of its cofactor matrix (recall definition 16). Formally,

(A^{\#})_{ij} = (-1)^{i+j} M_{ji}.

We see that (2) is consistent with corollary 5.1.
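Equation (2) can be implemented directly. The helper below is an illustrative sketch (not from the notes) that builds the cofactor matrix element by element; it is inefficient, but mirrors the definitions exactly:

```python
import numpy as np

# Inverse via the adjugate formula A^{-1} = A# / det(A), definition 21:
# the adjoint is the transpose of the cofactor matrix.
def inverse_adjugate(A):
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # the (i, j)-minor: delete row i and column j, take the determinant
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T / np.linalg.det(A)

A = np.array([[1., 1., 2.],
              [2., 1., 2.],
              [0., 1., 1.]])
print(np.allclose(inverse_adjugate(A) @ A, np.eye(3)))  # True
```

The test matrix is the one from exercise 4.17 (i); in practice `np.linalg.inv` should be preferred, as the adjugate route scales very poorly with n.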

Exercise 4.17. Find the inverse of

(i) A = \begin{pmatrix} 1 & 1 & 2 \\ 2 & 1 & 2 \\ 0 & 1 & 1 \end{pmatrix}, \quad (ii) B = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \quad (iii) C = \begin{pmatrix} -2 & 1.6 & -0.4 \\ 1.6 & -0.9 & 0.4 \\ 1.3 & -1.2 & 1.6 \end{pmatrix} \ (\text{harder}).


Exercise 4.18. Show that a matrix with a column (or row) of zeros is not invertible.

4.3.2 Special inverses (∗)

The inverses of elementary matrices (introduced in Section 2.4.1) are particularly simple to compute, and obey the following properties.

Properties 9 (Inverse of elementary matrices). For an elementary matrix:

(i) E_{ij}^{-1} = E_{ij};

(ii) E_i(γ)^{-1} = E_i(γ^{-1});

(iii) E_i(γ|j)^{-1} = E_i(-γ|j).

The proof of these results can be found in exercises 6.4 and 6.5 in Abadir and Magnus (2005, p. 134).

Exercise 4.19. Suppose A is invertible and you exchange its first two rows to obtain B. Explain why the new matrix is invertible. How would you find B^{−1} from A^{−1}?

4.3.3 Application: ISLM

A simple way to demonstrate how matrix inverses may be useful in economics is to consider the standard closed economy ISLM model. Conveniently, the equilibrium of a simplified version of this model is often expressed in linear form as the system:

Y = C + I +G0, (Accounting identity)

C = α + βY , (Keynesian consumption function)

I = γ − δi, (Investment equation)

M0 = εY − ζ i, (Money market equilibrium)

where Y, C, I and i are endogenous variables, G_0 and M_0 are exogenous variables and α, β, γ, δ, ε and ζ are structural parameters of the model. Typically two policy multipliers are of interest: the Keynesian government spending multiplier and the money multiplier. To simplify, we may eliminate consumption, C, and re-write the system as:

\begin{pmatrix} 1-\beta & -1 & 0 \\ 0 & 1 & \delta \\ \varepsilon & 0 & -\zeta \end{pmatrix} \begin{pmatrix} Y \\ I \\ i \end{pmatrix} = \begin{pmatrix} \alpha \\ \gamma \\ 0 \end{pmatrix} + \begin{pmatrix} G_0 \\ 0 \\ M_0 \end{pmatrix}, \quad \text{or alternatively} \quad Ay = a + x,


where y is a vector of the endogenous variables and x is a vector of the exogenous variables. We are able to see immediately how computing an inverse will uncover the multipliers of interest in the form:

y = A−1(a + x).

To do this we proceed with the recipe, and firstly compute the matrix of minors, M, corresponding to each element of the matrix A:

M = \begin{pmatrix} -\zeta & -\delta\varepsilon & -\varepsilon \\ \zeta & -\zeta(1-\beta) & \varepsilon \\ -\delta & \delta(1-\beta) & 1-\beta \end{pmatrix}.

Next, we use cofactors and the transpose to convert this matrix into the adjoint matrix, A^{\#}:

A^{\#} = \begin{pmatrix} -\zeta & -\zeta & -\delta \\ \delta\varepsilon & \zeta(\beta-1) & \delta(\beta-1) \\ -\varepsilon & -\varepsilon & 1-\beta \end{pmatrix}.

Then calculate the determinant of the matrix,

\det A = (\beta - 1)\zeta - \delta\varepsilon,

before finally combining to give the inverse as:

A^{-1} = \frac{1}{\det A} A^{\#} = \frac{1}{(\beta-1)\zeta - \delta\varepsilon} \begin{pmatrix} -\zeta & -\zeta & -\delta \\ \delta\varepsilon & \zeta(\beta-1) & \delta(\beta-1) \\ -\varepsilon & -\varepsilon & 1-\beta \end{pmatrix}.

Hence we may read off the equation for output, Y, in terms of exogenous variables and structural parameters only as:

Y = \frac{\zeta(\alpha + \gamma + G_0)}{\delta\varepsilon + (1-\beta)\zeta} + \frac{\delta M_0}{\delta\varepsilon + (1-\beta)\zeta},

and deduce the government spending and money multipliers as:

\frac{\partial Y}{\partial G_0} = \frac{\zeta}{\delta\varepsilon + (1-\beta)\zeta} \quad \text{and} \quad \frac{\partial Y}{\partial M_0} = \frac{\delta}{\delta\varepsilon + (1-\beta)\zeta}.

4.4 Eigenvalues and eigenvectors

In Section 2.4, we briefly discussed matrices as linear operators and linked matrices to functions, which take a vector x as input and produce as output another vector Ax. In this section we are particularly interested in vectors whose direction is invariant to the linear transformation A. This brings us to the concepts of eigenvalues and eigenvectors.


Definition 22. For a square matrix A, the vector x ∈ R^n \ {0} and the corresponding scalar λ ∈ R are called an eigenvector and eigenvalue if they satisfy

Ax = \lambda x. \qquad (3)

Remark. There is no reason why we should restrict the eigenvalues or eigenvectors to be real. Even for real matrices, the eigenvalues or eigenvectors can be complex; see example 4.6. However, in this course and the MPhil programme you will not (often) come across complex-valued eigenvalues and eigenvectors. Furthermore, in the definition above we have explicitly restricted the eigenvectors not to be the zero vector. Of course, x = 0 always satisfies (3), but we rule out this trivial case.

Definition 22 immediately gives us a way to find the eigenvalues and eigenvectors. Rewrite (3) as

(A - \lambda I)x = 0;

then from Theorem 3 we know that the matrix A − λI needs to be singular, since we have ruled out x = 0. Hence, we can find the eigenvalues by solving

\det(A - \lambda I) = 0. \qquad (4)

We refer to (4) as the characteristic polynomial. This is a polynomial in λ of order n. Hence, from the "fundamental theorem of algebra" there are n eigenvalues, but these are not necessarily distinct. The eigenvectors associated with each eigenvalue can then be found by solving the system in (3). From this we can see that the eigenvectors are unique up to scaling. It is therefore natural to write down the eigenvectors with unit length, ‖x‖ = 1. These are called the normalised eigenvectors.

Example 4.4. Consider the permutation matrix in example 2.10. It is not difficult to spot two vectors x that yield Ax = λx:

x = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \Rightarrow Ax = \begin{pmatrix} 1 \\ 1 \end{pmatrix} = x \quad (\lambda = 1),

and

x = \begin{pmatrix} -1 \\ 1 \end{pmatrix} \Rightarrow Ax = \begin{pmatrix} 1 \\ -1 \end{pmatrix} = -x \quad (\lambda = -1).

Example 4.5. We can gain some intuition by looking at what the eigenvalues and eigenvectors mean graphically. Consider the linear transformation

A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},


[Figure 9: (a) 100 points on a grid in [0, 1]²; (b) the same 100 points transformed by A = (2, 1; 1, 2). The direction of the vectors parallel to (1, −1)′ (green) and (1, 1)′ (orange) remains unchanged, with the latter also being scaled by a factor of 3. The other vectors (blue) change direction after a transformation by A.]

which has eigenvectors

x_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \quad \text{and} \quad x_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix},

associated with the eigenvalues λ_1 = 1 and λ_2 = 3, respectively. It is a good exercise to see if you can find these for yourself. As we can see in Figure 9, the direction of the vectors parallel to x_1 and x_2 remains unchanged after a transformation by A. Furthermore, the vectors parallel to x_2 are scaled by a factor of 3, the value of λ_2, whereas the direction of the vectors not parallel to either of the eigenvectors changes.

Example 4.6 (Complex eigenvalues). Consider the matrix

A = \begin{pmatrix} 6 & -13 \\ 1 & 0 \end{pmatrix},

which has only real entries, but both its eigenvalues are complex. To see this,

\det(A - \lambda I) = \lambda(\lambda - 6) + 13 = \lambda^2 - 6\lambda + 13.

The discriminant of this polynomial is ∆ = −16, thus it has no real solutions. Hence,

\lambda_{1,2} = \frac{6 \pm \sqrt{\Delta}}{2} = 3 \pm 2i.


Exercise 4.20. Find the two eigenvalues of the matrix

B = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}.

Deduce two eigenvectors of B by inspection.

Exercise 4.21. Let A be the permutation matrix in example 4.4, and write the matrix B of exercise 4.20 as B = A + 3I. Algebraically, deduce that the eigenvalues of A are three less than the eigenvalues of B and that the eigenvectors are unchanged.

Exercise 4.22. Let λ be an eigenvalue of A. Show that:

(i) λ² is an eigenvalue of A²;

(ii) λ^{−1} is an eigenvalue of A^{−1} (when A is invertible);

(iii) λ + 1 is an eigenvalue of A + I.

Exercise 4.23. Show that the eigenvalues of idempotent matrices are either 0 or 1.

Exercise 4.24. Let

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.

(i) Show that det(A − λI) = λ² − tr(A)λ + det A;

(ii) Find an expression for the two eigenvalues of A in terms of a, b, c and d. These characteristic roots satisfy

(\lambda - \lambda_1)(\lambda - \lambda_2) = \lambda^2 - \operatorname{tr}(A)\lambda + \det A = 0,

with λ_1 λ_2 = det A and λ_1 + λ_2 = tr A.

4.4.1 Application: principal components (∗)

Suppose that you have a data set with p variables and n observations on each of these variables, and store them in the n × p matrix X. These data points can be plotted in R^p space. A question we can ask is: "In which direction does the data vary the most?" To answer this, consider the empirical variance–covariance matrix X′X/n, where X here denotes the demeaned data (each variable has had its mean subtracted). We want to find the vector which maximises this variance:

\max_{v \in \mathbb{R}^p} v' X' X v \quad \text{subject to} \quad \lVert v \rVert = 1. \qquad (5)

It turns out that the solution to (5) is the eigenvector associated with the largest eigenvalue of X′X. We refer to this as the first principal component. The other p − 1 principal components are the remaining


eigenvectors with the last principal component being the eigenvector associated with the smallesteigenvalue.

Figure 10 plots the two principal components from a simulated data set with 10,000 observations drawn from a random vector

[X1; X2] ∼ N( [1; 2], [1 −0.8; −0.8 1] ).

There is a very large literature on Principal Components Analysis and, for those of you who are interested, a good place to start is James et al. (2013).
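The construction above can be sketched numerically. The following is a minimal numpy illustration (the simulation setup mirrors the one behind Figure 10; the variable names are our own):

```python
import numpy as np

# Simulate the bivariate normal sample used for Figure 10.
rng = np.random.default_rng(0)
n = 10_000
mean = np.array([1.0, 2.0])
cov = np.array([[1.0, -0.8],
                [-0.8, 1.0]])
X = rng.multivariate_normal(mean, cov, size=n)

# Demean, form the empirical variance-covariance matrix and eigendecompose it.
X_tilde = X - X.mean(axis=0)
S = X_tilde.T @ X_tilde / n
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order

pc1 = eigvecs[:, -1]                   # first principal component: largest eigenvalue
pc_last = eigvecs[:, 0]                # last principal component: smallest eigenvalue
```

With a correlation of −0.8, the first principal component lies (up to sign) along the (1, −1) direction, and the population eigenvalues are 1.8 and 0.2.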

Figure 10: The two principal components from a simulated dataset.


5 Diagonalisation and powers of A

In this section we consider one of the many useful applications of eigenvalues and eigenvectors, called the eigendecomposition. Let Q be the matrix whose columns are the eigenvectors of A, and suppose A possesses n linearly independent eigenvectors, so that Q is invertible. By definition 22,

AQ = A[x1 … xn] = [λ1x1 … λnxn] = [x1 … xn] diag(λ1, λ2, …, λn) := QΛ.

Premultiplying by Q⁻¹ gives

Λ = Q⁻¹AQ,

showing that Q⁻¹AQ diagonalises A, in the sense that it produces the diagonal matrix Λ with the eigenvalues λ1, …, λn along its main diagonal. Analogously, postmultiplying by Q⁻¹ yields the eigendecomposition

A = QΛQ⁻¹. (6)

There is a wide variety of matrix decompositions, but this decomposition is powerful enough for our purposes and allows us to prove some interesting properties of square matrices.

Lemma 1. Let A be diagonalisable according to (6). Then

detA = ∏_{i=1}^n λi and trA = ∑_{i=1}^n λi.

Exercise 5.1. Prove lemma 1.

Remark. You could try to prove this directly from det(A − λI), showing that these results hold even for matrices which are not diagonalisable through the eigendecomposition. This is slightly more involved.
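As a quick numerical check of (6) and lemma 1, the following numpy sketch uses the matrix B of exercise 4.20:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

lam, Q = np.linalg.eig(A)                # eigenvalues and matrix of eigenvectors
Lam = np.diag(lam)

A_rebuilt = Q @ Lam @ np.linalg.inv(Q)   # the eigendecomposition A = Q Λ Q^{-1}
```

Here the eigenvalues are 4 and 2, so the product of the eigenvalues is detA = 8 and their sum is trA = 6, as lemma 1 asserts.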


5.1 Difference Equations

Another interesting use for the eigendecomposition is computing powers of A. We know that the kth power of a diagonal matrix D is

Dᵏ = diag(d1ᵏ, …, dnᵏ).

From (6), it is easy to show that

Aᵏ = QΛᵏQ⁻¹. (7)

This can lead to considerable computational improvements when k is large. Moreover, the following result will be useful in the lectures on difference equations.

Theorem 7. Let A be a matrix of order n with n linearly independent eigenvectors. Then Aᵏ converges to the zero matrix if and only if the eigenvalues of A satisfy |λi| < 1 for all i = 1, …, n.

Proof. Both directions follow easily from (7).
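A small numerical illustration of (7) and theorem 7 (a numpy sketch; the matrix is an illustrative choice of our own):

```python
import numpy as np

A = np.array([[0.5, 0.25],
              [0.25, 0.5]])          # eigenvalues 0.75 and 0.25, both inside (-1, 1)

lam, Q = np.linalg.eig(A)
Q_inv = np.linalg.inv(Q)

def power_via_eig(k):
    """Compute A^k as Q Lambda^k Q^{-1}: one diagonalisation serves every power k."""
    return Q @ np.diag(lam ** k) @ Q_inv
```

Because both eigenvalues lie strictly inside the unit interval, `power_via_eig(200)` is numerically indistinguishable from the zero matrix, as theorem 7 predicts.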

Theorem 7 is important for analysing the convergence properties of certain multivariate difference equations of the form

u_{k+1} = Au_k,

and by recursive substitution we have

u_{k+1} = Aᵏ⁺¹u0,

for some initialisation vector u0.

5.1.1 Application: labour markets

In labour economics, the total population (labour force) of an economy may be divided into two states. In any given time period t each person may be employed, e_t, or unemployed, u_t. The movement between the states may also be classified. The probability that a worker moves from employment to unemployment is known as the job separation rate, s, while the probability of movement from unemployment to employment is known as the job finding rate, f. Collectively these labour market flow rates are often referred to as labour market ‘churn’. The dynamic relationship between these variables may therefore be described as

[u_{t+1}; e_{t+1}] = [1−f s; f 1−s] [u_t; e_t],


which we will summarise in the form

u_{t+1} = Au_t.

An interesting policy question is therefore: given a starting composition of the labour force, u0, what will the labour force composition look like in 2 years, u2, in 5 years, u5, and in the ‘long run’? A concrete example will help to answer these questions.³

Example 5.1. Suppose the economy currently has 10 unemployed workers and 90 employed workers. The job separation rate is 0.05 while the job finding rate is 0.5. This implies that the main matrix of interest, A, may be written as

A = [0.5 0.05; 0.5 0.95].

We apply the procedure and diagonalise the matrix as follows. First calculate the eigenvalues and eigenvectors of the matrix:

det(A − λI) = 0,
(0.5 − λ)(0.95 − λ) − 0.025 = 0,
0.45 − 1.45λ + λ² = 0,

which gives λ1 = 1 and λ2 = 0.45, and in turn these imply v1 = [0.1; 1] and v2 = [−1; 1]. We therefore have

A = QΛQ⁻¹ = [0.1 −1; 1 1] [1 0; 0 0.45] [10/11 10/11; −10/11 1/11],

as the diagonalised form of the matrix A. Noting that we are given the initial vector u0 = [10; 90], we may now turn to the economic questions of interest, with the time period set to a year.

u2 = A²u0 = QΛ²Q⁻¹u0 = [0.1 −1; 1 1] [1 0; 0 0.45]² [10/11 10/11; −10/11 1/11] [10; 90] = [9.275; 90.725],

u5 = A⁵u0 = QΛ⁵Q⁻¹u0 = [0.1 −1; 1 1] [1 0; 0 0.45]⁵ [10/11 10/11; −10/11 1/11] [10; 90] = [9.1077; 90.8923].

The behaviour of the system in the “long run” may be seen as we take the limit of the dynamic system as t → ∞ and compute the associated vector u∞. Having diagonalised the matrix, this behaviour is easy to obtain, as the limit of each eigenvalue is taken in turn. As such we have that

³Notice that the matrix being used here, A, is actually known as a Markov matrix, as all entries are non-negative and each column sums to 1.

u∞ = [0.1 −1; 1 1] [1 0; 0 0] [10/11 10/11; −10/11 1/11] [10; 90] = [9.0909; 90.9091],

and we conclude that the initial unemployment rate, of 10%, was above its equilibrium value. The dynamics clearly show a smooth transition back. A graph of this situation is shown in Figure 11. Of course, we could also have been clever in this final example and noted that the “long run” will be characterised by an equation of the form

u∞ = Au∞,

and therefore claimed that the “long run” distribution of workers between states will be the eigenvector associated with the unit eigenvalue.

Figure 11: Transition path for the unemployment rate.

5.1.2 Application: vector autoregression

Vector autoregressions (VARs) are a popular econometric tool, used by academics, policymakers and in business to analyse complex economic phenomena in a time series context. The core concept required to apply these models makes use of the diagonalised form of a matrix. A vector autoregression (VAR) may be written in the form

x_t = Ax_{t−1} + Be_t,

where x_t is a vector of economic variables (data), A is a matrix of coefficients describing the evolution of the economy, e_t represents structural economic shocks of interest, and x0 = 0. A typical question is to ask how the economy responds after a one-time impulse (typically normalised to one standard deviation) that happens in period t = 1. This is known as an impulse response function and can be mapped quickly and efficiently using the equation

x_{t+n} = AⁿBe1 = QΛⁿQ⁻¹Be1,

after first using econometric techniques to estimate the matrices of interest from the data.
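The mechanics can be sketched with numpy; the coefficient matrices below are illustrative inventions, not estimates:

```python
import numpy as np

# Illustrative (not estimated) 2-variable VAR coefficients and shock loading.
A = np.array([[0.7, 0.1],
              [0.2, 0.5]])
B = np.eye(2)
e1 = np.array([1.0, 0.0])            # one-standard-deviation shock in period t = 1

lam, Q = np.linalg.eig(A)
Q_inv = np.linalg.inv(Q)

# Impulse response n periods after the shock: A^n B e1 = Q Lambda^n Q^{-1} B e1.
irf = np.array([np.real(Q @ np.diag(lam ** n) @ Q_inv @ B @ e1)
                for n in range(20)])
```

Because both eigenvalues of this A lie inside the unit circle, the impulse response dies out, consistent with theorem 7.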

5.1.3 Application: Fibonacci sequence (∗)

The following example comes from Strang (2009).

Example 5.2. The first 8 numbers of the Fibonacci sequence are 0, 1, 1, 2, 3, 5, 8, 13, …, where the kth element of the sequence can be deduced from the (k−1)th and (k−2)th elements as F_k = F_{k−1} + F_{k−2}. One useful question to ask is: “How fast are the Fibonacci numbers growing?” The answer to this question lies in the eigenvalues. Let

u_k = [F_{k+1}; F_k].

Then the Fibonacci sequence satisfies

u_{k+1} = Au_k with A = [1 1; 1 0].

We have

det(A − λI) = λ² − λ − 1.

Setting this to zero and solving gives

λ1 = (1 + √5)/2 and λ2 = (1 − √5)/2.

Since the largest eigenvalue is the one controlling the growth, the Fibonacci numbers grow at the rate (1 + √5)/2.
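A numpy sketch of example 5.2, recovering the growth rate and an element of the sequence through the eigendecomposition:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0]])

lam, Q = np.linalg.eig(A)
growth = lam.max()                   # (1 + sqrt(5)) / 2: the golden ratio

# u_k = A^k u_0 with u_0 = (F_1, F_0)' = (1, 0)' gives u_k = (F_{k+1}, F_k)'.
u0 = np.array([1.0, 0.0])
u10 = Q @ np.diag(lam ** 10) @ np.linalg.inv(Q) @ u0
F10 = round(u10[1])                  # the 10th Fibonacci number
```

Floating-point eigendecomposition recovers F10 = 55 exactly after rounding; for very large k, exact integer arithmetic would be needed.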

Exercise 5.2. Assuming you can write the initialisation vector u0 as a linear combination of the eigenvectors, find an expression for the kth element of an arbitrary sequence of the form u_{k+1} = Au_k.

Exercise 5.3. Using the expression you found in exercise 5.2, find the 100th element of the Fibonacci sequence in example 5.2.

5.2 Symmetric and positive definite matrices

We know from Section 2.2 that symmetric matrices are equal to their transpose. Let us first recall what an orthogonal matrix is.


Definition 23. A square matrix Q of order n is orthogonal if for any two columns

qi′qj = 0 if i ≠ j, and qi′qj = 1 if i = j.

Thus, an orthogonal matrix has orthonormal columns.

Then in relation to the eigendecomposition, we have the following result for symmetric matrices.

Theorem 8 (Spectral decomposition). Let A be a symmetric n × n matrix. Then there exists a matrix Q with Q⁻¹ = Q′ (i.e. Q is orthogonal) and a diagonal matrix Λ, with diagonal elements the eigenvalues of A, such that

A = QΛQ′.

We then refer to the eigendecomposition of A as the spectral decomposition.

Proof. See exercise 7.46 in Abadir and Magnus (2005, p. 177).
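Numerical routines such as numpy's `eigh` exploit symmetry and return the orthogonal Q of theorem 8 directly (a small sketch with an arbitrary symmetric matrix of our choosing):

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])     # symmetric

lam, Q = np.linalg.eigh(A)           # eigh is specialised to symmetric matrices

# Q is orthogonal, so Q^{-1} = Q' and A = Q Lambda Q'.
check_orth = Q.T @ Q
A_rebuilt = Q @ np.diag(lam) @ Q.T
```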

Next, we ask: “what does A > O mean?” This question is answered by the definiteness of a matrix, which generalises the notion of positivity and negativity to matrices. Together with the eigendecomposition, definiteness lets us prove many interesting results.

Definition 24 (Definiteness). A square matrix A is called positive definite if for any x ∈ Rⁿ\{0},

x′Ax > 0. (8)

The matrix A is positive semi-definite if (8) holds with weak inequality. The notions of negative and negative semi-definite are similarly defined by reversing the inequalities. A matrix which is neither positive nor negative definite is indefinite.

Definition 24 formalises statements like: matrix A is larger than matrix B, in the sense that A − B is positive (semi)definite. Here, we will solely focus on positive (semi)definiteness for ease of exposition.

Lemma 2. A symmetric matrix is positive definite if and only if all its eigenvalues are positive. Moreover, a symmetric matrix is positive semi-definite if and only if all its eigenvalues are non-negative.

Proof. Left as an exercise.
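Lemma 2 translates into a simple numerical test. A sketch (the helper name and tolerance are our own):

```python
import numpy as np

def definiteness(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues (lemma 2)."""
    lam = np.linalg.eigvalsh(A)      # eigenvalues of a symmetric matrix, ascending
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.all(lam >= -tol):
        return "positive semi-definite"
    if np.all(lam <= tol):
        return "negative semi-definite"
    return "indefinite"
```

For example, the Hessian [8 −2; −2 8] met later in example 6.5 is classified as positive definite, while [1 0; 0 −1] is indefinite.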

Lemma 2 allows us to check for positive definiteness by checking the signs of the eigenvalues. If a symmetric matrix has two eigenvalues with different signs then it is indefinite. Some of the properties below follow directly from this lemma and are left as an exercise.

Properties 10 (Positive definite). For a symmetric positive definite matrix A:

(i) detA > 0, such that it is also non-singular;

(ii) trA > 0;

(iii) if A ≥ O and B ≥ O, then A + B is at least positive semi-definite;

(iv) A > O if and only if A⁻¹ > O.

Exercise 5.4. Show the properties above.

Exercise 5.5. Show that there exists a matrix which is both positive semi-definite and negative semi-definite, but that a matrix cannot be positive and negative definite at the same time.

Exercise 5.6. Let Ω be a symmetric positive definite matrix. Use the spectral decomposition to show that there exists a non-singular matrix L such that Ω = L′L and such that L⁻¹ = (L′)⁻¹. What is L⁻¹?

Exercise 5.7. Find the X that solves

2X² − 3X + In = On,

where X is a real symmetric matrix.


6 Vector and matrix calculus

In this section, we present a brief foray into vector and matrix calculus. Please consult other textbooks if you want to learn more; see e.g. Abadir and Magnus (2005) and Magnus and Neudecker (1999).

Broadly, there are two learning objectives in this section of the course:

(i) ability to compute the Jacobian and Hessian matrices;

(ii) knowledge of what the Jacobian and Hessian matrices represent.

The following notation is maintained throughout: let φ, f and F respectively represent scalar, vector and matrix-valued functions, possibly taking a scalar, vector or matrix as their inputs. For example,

φ(x) = sin(x),   φ(x) = √(x′x),   φ(X) = detX,
f(x) = (x, x − 3)′,   f(x) = Ax,   f(X) = Xa,
F(x) = xA,   F(x) = xx′,   F(X) = X′X.

6.1 The Jacobian and the Hessian

Our first step in using vector and matrix calculus will be to introduce the Jacobian and Hessian matrices, and their computation.

Definition 25 (Jacobian matrix). Let f be an m × 1 vector function with input an n-vector x. Then define the m × n Jacobian matrix as

Df(x) := df(x)/dx′ = [∂fi(x)/∂xj].

This matrix collects all the first-order partial derivatives of the components of f into an m × n matrix. Note that (rather unhelpfully) the literature refers to both the Jacobian matrix and its determinant as the Jacobian.

Definition 26 (Hessian matrix). Let φ(x) be a scalar-valued function with input an n-vector x. Then the Hessian matrix is the n × n matrix of second-order partial derivatives

H(x) :=
[ ∂²φ(x)/∂x1²     ∂²φ(x)/∂x1∂x2   …   ∂²φ(x)/∂x1∂xn
  ∂²φ(x)/∂x2∂x1   ∂²φ(x)/∂x2²     …   ∂²φ(x)/∂x2∂xn
  ⋮               ⋮                ⋱   ⋮
  ∂²φ(x)/∂xn∂x1   ∂²φ(x)/∂xn∂x2   …   ∂²φ(x)/∂xn²  ].

Exercise 6.1. Assume φ(x) = φ(x1, x2) = 2x1²x2³. Find the Jacobian and Hessian matrix of this (scalar) function (which has the vector input x = [x1; x2]).


Exercise 6.2. Assuming the input is an n-vector x, find both the Jacobian and Hessian matrix of the scalar function φ(x) = x′x.

Exercise 6.3. Assume f(x) = [x + cos(y); y + cos(x)] and find the Jacobian of this function.

6.1.1 Geometric interpretation of the Jacobian matrix

We already know that the Jacobian stores all of the partial derivatives of a function when differentiating with respect to a vector. It may be viewed as the ratio of an infinitesimal change in the variables of one coordinate system to another. You will already have encountered the Jacobian matrix when using the change of variable approach to integration.

Example 6.1. Consider the integral ∫₀¹ 6x²√(2x³ + 5) dx. We pick an easy example, so that we already know how to approach the answer.

Using the change of variable approach, we may transform the integral as follows. Let u = 2x³ + 5, so that du = 6x² dx, and the transformed integral is given by ∫₅⁷ u^{1/2} du, where we also remember to change the limits of integration! The integral looks considerably easier.

Notice that to make this substitution we invoke the Jacobian of the problem: in this example, du/dx = 6x².

Example 6.2. We wish to extend the approach in Example 6.1 away from the scalar setting, and will use the function given in Exercise 6.3. The function

f(x) = [f1; f2] = [x + cos(y); y + cos(x)]

may be viewed as a way to map every element in the (x, y)-coordinate system into an alternative (f1, f2)-coordinate system. In the current example we may explicitly perform this change of variables, as shown in Figure 12, which maps a series of points to their position after the function f has been applied.

The interpretation of the Jacobian is then more straightforward to understand. Consider a small change in x, call this dx. After the function f(x) has been applied, this initial change in x results in a change in both f1 and f2 (shown in red in the figure). Similarly, an initially small change in y, call this dy, will result in a small change in both f1 and f2 (shown in blue in the figure). The Jacobian matrix records these changes, for any value of (x, y). We plot these as changes from the point x = [1, −3]′.


Figure 12: A set of points in the (x, y)-coordinate system ((a) Standard Basis), and their equivalent position after the function f has been applied ((b) Non-linear Transformation; (c) Non-linear Transformation, Zoom).


6.1.2 Geometric interpretation of the Hessian matrix

We now turn to the geometric representation of the Hessian matrix. As with the Jacobian, we already know that the Hessian stores all of the second-order partial derivatives of a scalar-valued function and therefore describes the local curvature of the function. Stationary (or turning) points may be classified as (local) minima, (local) maxima or inflection points. By using the information available in the Hessian, we are able to identify each type of stationary point.

Example 6.3. To demonstrate how a Hessian matrix may identify different types of stationary point, we again proceed by starting with a scalar example. Consider the three functions and their associated (scalar) Hessian matrices:

φ1(x) = x²,   H1(x) = 2,
φ2(x) = −x²,  H2(x) = −2,
φ3(x) = x³,   H3(x) = 6x.

Observe that the nature of the single stationary point which arises in all three cases, at x = 0, differs substantially. In the first case (with H(x) > 0) the stationary point is the (global) minimum point, in the second (with H(x) < 0) it is the (global) maximum point, and in the third it is a point of inflection (H(x) evaluated at the stationary point x = 0 gives H(0) = 0). These functions, and their stationary points, are shown in Figure 13.

Figure 13: Three very different scalar functions: φ(x) = x², φ(x) = −x² and φ(x) = x³.

Example 6.4. Extending the analysis of Example 6.3 to additional dimensions is straightforward. We consider the following scalar-valued functions, and their Hessians, which take the vector x = [x, y]′ as an input:

φ1(x) = x² + y²,   H1(x) = [2 0; 0 2],
φ2(x) = −x² − y²,  H2(x) = [−2 0; 0 −2],
φ3(x) = x³ + y³,   H3(x) = [6x 0; 0 6y].

Again we notice that the single stationary point is the same in all three cases, and arises at x = [0; 0]. In the first case this stationary point is the (global) minimum point, in the second it is the (global) maximum point, and in the third it is a point of inflection. These functions, and their stationary points, are shown in Figure 14.

Now, as the Hessian is a matrix, it may not be evaluated as being either positive or negative. However, the concepts introduced in Section 5.2 give us a procedure which we may use to identify related properties of the Hessian matrices, namely:

(i) H1(x) is positive definite. We may show this by examining the eigenvalues of the matrix:

det(H1(x) − λI2) = (2 − λ)²,

which gives λ = 2 (twice). As λ > 0, H1(x) is positive definite, telling us that the function φ1(x) is convex, and we have indeed reached a (global) minimum point;

(ii) H2(x) is negative definite. We may show this by examining the eigenvalues of the matrix:

det(H2(x) − λI2) = (−2 − λ)²,

which gives λ = −2 (twice). As λ < 0, H2(x) is negative definite, telling us that the function φ2(x) is concave, and we have indeed reached a (global) maximum point;

(iii) finally, H3(x) evaluated at the stationary point gives H3(x) = O, with det(H3(x)) = 0. This implies

det(H3(x) − λI2) = λ²,

which gives λ = 0 (twice). As λ = 0, H3(x) is neither positive definite nor negative definite at the stationary point, telling us that the function φ3(x) reaches a point of inflection.

6.1.3 Application: cost minimisation (∗)

We may use the tools of matrix calculus along with the concepts of positive definiteness to analyse the solution to a cost minimisation problem. Production levels, x, are inputs to a firm's cost function,


Figure 14: Functions associated with Example 6.4: (a) φ1(x); (b) φ2(x); (c) φ3(x).


C(x). The firm also faces a linear production constraint E(x). For the solution to the cost-minimisation problem to be well defined, we need to ensure the cost function is convex. It turns out that the function will be convex if it has a positive definite Hessian matrix.

Example 6.5. Assume a firm has a cost function with two arguments, C(x, y), and must minimise this cost function subject to the production requirement E(x, y) ≥ 30, where

C(x, y) = 4x² + 4y² − 2xy − 40x − 140y + 1600,
E(x, y) = x + y.

Before solving the constrained optimisation problem, we first inspect the cost function and solve the unconstrained cost minimisation problem. We may calculate the first derivatives, second derivatives and cross-derivative as

∂C(x, y)/∂x = 8x − 2y − 40,   ∂C(x, y)/∂y = 8y − 2x − 140,
∂²C(x, y)/∂x² = ∂²C(x, y)/∂y² = 8 and ∂²C(x, y)/∂x∂y = −2.

This allows us to find the Jacobian of the problem as

DC(x, y) = [8x − 2y − 40, 8y − 2x − 140].

Stationary points arise when DC(x, y) = 0; in this case we have a single stationary point at x = 10, y = 20. Whether this point represents a maximum, minimum or saddle point is yet to be determined. The cost function will be convex if it is associated with a positive definite Hessian matrix, and this point will then be the global minimum. The derivatives found earlier may be combined to show that the Hessian of the cost function may be written as

H(x, y) = [∂²C/∂x² ∂²C/∂x∂y; ∂²C/∂y∂x ∂²C/∂y²] = [8 −2; −2 8].

By inspection Lemma 2 applies, as the matrix is symmetric, and therefore positive definiteness may be determined from an inspection of the eigenvalues of the matrix. To show that the Hessian matrix is positive definite, we calculate the eigenvalues and show these to be positive:

det(H(x, y) − λI2) = (8 − λ)² − 4 = 0, giving λ = 10 and λ = 6.

The cost function therefore has a unique minimum at its stationary point. In this problem the constraint plays little role, as the global minimum is attainable while respecting the constraint. Figure 15 shows a graphical representation of this minimisation problem.


Figure 15: Cost function and constrained minimisation solution (cost function, constraint and optimal point).


7 References

Abadir, K. M. and J. R. Magnus (2005). Matrix Algebra. In K. M. Abadir, J. R. Magnus, and P. C. B. Phillips (Eds.), Econometric Exercises, Chapter 1. New York: Cambridge University Press.

Binmore, K. and J. Davies (2002). Calculus: Concepts and Methods. Cambridge University Press.

James, G., D. Witten, T. Hastie, and R. Tibshirani (2013). An Introduction to Statistical Learning. New York: Springer (Science and Business Media).

Magnus, J. R. and H. Neudecker (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics (Second ed.). Chichester/New York: John Wiley and Sons.

Strang, G. (2009). Introduction to Linear Algebra (Fourth ed.). Wellesley, Massachusetts: Wellesley-Cambridge Press.


A Summary of key results

For conformable matrices A, B, C and D:

(1) A = [(A)ij] ⇔ A′ = [(A)ji];

(2) cA = [c(A)ij];

(3) (A + B)ij = (A)ij + (B)ij;

(4) A + B = B + A;

(5) c(A + B) = cA + cB, for any c ∈ R;

(6) A + (B + C) = (A + B) + C;

(7) a · b = b · a;

(8) a · (b + c) = a · b + a · c;

(9) (c1a) · (c2b) = c1c2(a · b), for any c1, c2 ∈ R;

(10) (AB)ij = ai · bj;

(11) AB ≠ BA in general;

(12) A(B + C) = AB + AC;

(13) (A + B)C = AC + BC;

(14) A(BC) = (AB)C = ABC;

(15) c(AB) = (cA)B = A(cB);

(16) (AB)′ = B′A′;

(17) the rank of a matrix, rkA, is the maximum number of linearly independent rows or columns of A;

(18) 0 ≤ rkA ≤ min{n, p};

(19) rkA = rkA′;

(20) rkA = 0 if and only if A = O_{n×p};

(21) rk In = n;

(22) rk cA = rkA, for any c ∈ R\{0};

(23) rk(A + B) ≤ rkA + rkB;

(24) rk(A − B) ≥ |rkA − rkB|;

(25) rk(AB) ≤ min{rkA, rkB};

(26) rkA = ∑_{i=1}^n 1{λi ≠ 0}, the number of non-zero eigenvalues, for a diagonalisable matrix A;


(27) det In = 1;

(28) detA′ = detA;

(29) detA⁻¹ = (detA)⁻¹;

(30) det(AB) = detA detB;

(31) det(cA) = cⁿ detA, for any c ∈ R;

(32) for a lower or upper triangular matrix A, including a diagonal matrix,

detA = ∏_{i=1}^n aii;

(33) the determinant changes sign when two rows of A are interchanged;

(34) subtracting a multiple of one row of A from another leaves detA unchanged;

(35) if A has a row of zeros then detA = 0;

(36) the determinant is a function of each row separately: multiplying a single row of A by a scalar c multiplies the determinant by c, and if a single row of A is written as the sum of two row vectors (the other rows held fixed) the determinant is the sum of the two corresponding determinants;

(37) trA = ∑_{i=1}^n aii;

(38) tr(A + B) = trA + trB;

(39) tr(cA) = c trA, for any c ∈ R;

(40) if A ∈ R^{n×p} and B ∈ R^{p×n}, then trAB = trBA;

(41) tr(A′) = trA;

(42) tr c = c, for c ∈ R;


(43) a square matrix A is positive definite if x′Ax > 0 for all x ∈ Rⁿ\{0};

(44) detA = ∏_{i=1}^n λi;

(45) trA = ∑_{i=1}^n λi;

(46) for a symmetric positive definite matrix A:

(i) detA > 0, such that it is also non-singular;

(ii) trA > 0;

(iii) if A ≥ O and B ≥ O, then A + B is at least positive semi-definite;

(iv) A > O if and only if A⁻¹ > O.

A.1 Vector and matrix differentiation

(1) d f′ = (d f)′;

(2) d(c f) = c d f, for any c ∈ R;

(3) d(f ± g) = d f ± dg;

(4) d(tr f) = tr d f;

(5) d(f g) = (d f)g + f (dg);

(6) D Ax = A;

(7) D x′Aa = a′A′;

(8) D a′Ax = a′A;

(9) D x′Ax = x′(A + A′).


B Additional Material (All Non-Examinable)

This section of the appendix collects together additional material which has been taught on the pre-course in previous years. It is meant solely as a reference and does not form part of the course this year.

B.1 Preliminary concepts

B.1.1 Hadamard and Kronecker product (∗)

The way we defined matrix multiplication before is not the only way. Below, we present two more ways to do so.

Definition 27 (Hadamard product). For two n × p matrices A and B, the Hadamard product A ⊙ B is obtained by element-wise multiplication,

(A ⊙ B)ij := (A)ij(B)ij.

The Hadamard product yields a matrix of the same dimensions as the individual factors. By the definition above, it is not defined for two matrices of different dimensions. This particular matrix product satisfies the properties below.

Properties 11 (Hadamard product). For three conformable matrices A, B and C:

(i) A ⊙ B = B ⊙ A;

(ii) A ⊙ (B ⊙ C) = (A ⊙ B) ⊙ C;

(iii) A ⊙ (B + C) = A ⊙ B + A ⊙ C;

(iv) the identity matrix under the Hadamard product, I_H, is a matrix of the same dimension with all entries equal to one,

A ⊙ I_H = A = I_H ⊙ A,

and every matrix without any zero-valued elements has a Hadamard inverse with elements (A_H⁻¹)ij = 1/(A)ij, such that

A ⊙ A_H⁻¹ = I_H = A_H⁻¹ ⊙ A;

(v) rk(A ⊙ B) ≤ rkA rkB.


Example B.1. The Hadamard product between the matrices

A = [18 −5; 13 −10] and B = [−1 −9; 2 −17]

is

A ⊙ B = [−18 45; 26 170].
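In numpy, the Hadamard product is simply the `*` operator on arrays. A sketch checking example B.1 and property (iv):

```python
import numpy as np

A = np.array([[18.0, -5.0],
              [13.0, -10.0]])
B = np.array([[-1.0, -9.0],
              [2.0, -17.0]])

had = A * B                          # element-wise (Hadamard) product

I_H = np.ones_like(A)                # the Hadamard identity: a matrix of ones
A_inv_H = 1.0 / A                    # element-wise inverse (A has no zero entries)
```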

Finally, we introduce the Kronecker product. This product is defined for any two matrices of arbitrary dimensions.

Definition 28 (Kronecker product). The Kronecker product between an n × p matrix A and a k × m matrix B is the nk × pm matrix

A ⊗ B = [a11B … a1pB; ⋮ ⋱ ⋮; an1B … anpB].

The Kronecker product has many interesting applications and properties, some of which are listed below.

Properties 12 (Kronecker product). For four matrices A, B, C and D:

(i) c(A ⊗ B) = (cA) ⊗ B = A ⊗ (cB), for any c ∈ R;

(ii) (A ⊗ B)′ = A′ ⊗ B′;

(iii) (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C);

(iv) (A + B) ⊗ C = A ⊗ C + B ⊗ C;

(v) A ⊗ (B + C) = A ⊗ B + A ⊗ C;

(vi) (A ⊗ B)(C ⊗ D) = AC ⊗ BD;

(vii) rk(A ⊗ B) = rkA rkB;

(viii) for A ∈ R^{n×n} and B ∈ R^{m×m}:

(a) tr(A ⊗ B) = tr(B ⊗ A) = trA trB;

(b) det(A ⊗ B) = det(B ⊗ A) = (detA)ᵐ(detB)ⁿ.

The Kronecker product will be useful for us in Appendix B.3.1.
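numpy's `kron` implements definition 28. A sketch checking the mixed-product rule (vi) and the determinant identity in (viii), with small illustrative matrices of our own:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])           # n x n with n = 2
B = np.array([[0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])      # m x m with m = 3

K = np.kron(A, B)                    # the nm x nm Kronecker product

mixed_lhs = np.kron(A, B) @ np.kron(A, B)   # (A x B)(C x D) with C = A, D = B
mixed_rhs = np.kron(A @ A, B @ B)
```

Here detA = −2 and detB = −2, so det(A ⊗ B) = (−2)³ × (−2)² = −32, matching property (viii)(b).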


B.2 Square matrices

B.2.1 Determinant by Laplacian expansion (∗)

Another way to compute the determinant is given by the Laplacian expansion. In order to define the determinant in this way, we consider the set (1, …, n) and a permutation σ = (j1, …, jn) thereof. Recall that a permutation is any rearrangement of the original set. For a set of size n, there are n! permutations. We call a transposition the interchange of two elements of (1, …, n), and denote the total number of transpositions needed to obtain σ from (1, …, n) by N(σ).

Example B.2. Consider the set (1, 2, 3), which has 3! = 6 permutations (check this). Two of these permutations are σ1 = (1, 3, 2) and σ2 = (3, 1, 2). To get σ1, we interchanged 2 and 3, so we needed only one transposition. Hence, N(σ1) = 1. For σ2, we needed two transpositions:

(1, 2, 3) ↦ (3, 2, 1) ↦ (3, 1, 2),

and thus N(σ2) = 2.

Now, let the map sgn : N → {−1, 1} be

sgn(σ) = −1 if N(σ) is odd, and sgn(σ) = 1 if N(σ) is even.

Example B.3 (Example B.2 continued). Given the above definition, sgn(σ1) = −1 and sgn(σ2) = 1.

Definition 29 (Determinant by Laplacian expansion). The determinant of a square n × n matrix A defined by the Laplacian expansion is

detA := ∑_σ sgn(σ) ∏_{i=1}^n a_{i ji}, (9)

where the sum is taken over all n! permutations, σ = (j1, …, jn), of (1, …, n).

Example B.4. Consider the 2 × 2 matrix

A = [a b; c d].

There are two permutations of (1, 2), i.e. (2, 1) and (1, 2) itself. We have that sgn(1, 2) = 1 and sgn(2, 1) = −1. Plugging this into (9),

detA = 1 × a11a22 + (−1) × a12a21 = ad − bc.


Example B.5. Consider the 3 × 3 matrix

A = [ a11  a12  a13
      a21  a22  a23
      a31  a32  a33 ].

The set (1, 2, 3) has 6 permutations:

(1, 2, 3), (1, 3, 2),
(2, 1, 3), (2, 3, 1),
(3, 1, 2), (3, 2, 1).

The determinant of a 3 × 3 matrix is therefore

det A = sgn(1, 2, 3) a11 a22 a33 + sgn(1, 3, 2) a11 a23 a32 + sgn(2, 1, 3) a12 a21 a33
        + sgn(2, 3, 1) a12 a23 a31 + sgn(3, 1, 2) a13 a21 a32 + sgn(3, 2, 1) a13 a22 a31
      = a11 a22 a33 − a11 a23 a32 − a12 a21 a33 + a12 a23 a31 + a13 a21 a32 − a13 a22 a31.

The Laplacian expansion gives exactly the same result as expanding in cofactors, so these two definitions of the determinant are equivalent.

B.3 Vector and matrix calculus

We have all the tools to introduce vector and matrix calculus. To gain some intuition, recall the definition of the derivative in the one-dimensional case,

lim_{u→0} [φ(x + u) − φ(x)] / u = φ′(x).

Rewriting gives

φ(x + u) = φ(x) + φ′(x)u + r_x(u),   (10)

where r_x(u) is the remainder term, such that r_x(u)/u → 0 as u → 0.

Define the first differential of φ at x (with increment u) as dφ(x; u) = φ′(x)u. Note that we do not require u to be infinitesimally small for the differential to be well defined. As the example below shows, dφ(x; u) is the linear part (in u) of the increment φ(x + u) − φ(x). Setting aside rigorous justification for the double use of the symbol "d", we will write dx for u, hence dφ(x) = φ′(x) dx.

Example B.6. For φ(x) = x^2, we have φ(x + u) − φ(x) = 2xu + u^2. It then follows that dφ(x; u) = 2xu, with r_x(u) = u^2, such that r_x(u)/u = u → 0 as u → 0.
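The behaviour of the remainder in example B.6 can be checked numerically. A minimal sketch in plain Python (the variable names are ours):

```python
phi = lambda x: x ** 2
x = 1.5
for u in (1e-1, 1e-3, 1e-5):
    # r_x(u) = phi(x + u) - phi(x) - dphi(x; u), with dphi(x; u) = 2xu
    r = phi(x + u) - phi(x) - 2 * x * u   # equals u^2 analytically
    # r_x(u)/u = u, which shrinks to zero with u (1e-9 slack for rounding)
    assert abs(r / u) <= u + 1e-9
```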


The expansion in (10) carries over to the vector case,

f(x + u) = f(x) + A(x)u + r_x(u).   (11)

If A(x) depends only on x but not on u, and r_x(u)/‖u‖ → 0 as u → 0, then f is differentiable at x. From this expansion you may see that A(x) is an operator acting on a vector u ∈ R^n. Combining this with the fact that f : R^n → R^m yields an m-vector-valued output, there can be no ambiguity: A(x) needs to be an m × n matrix.

Remark. In this exposition, we have concentrated on vector-valued functions f with vectors as inputs. The results carry over to scalar functions or scalar inputs: in those cases, set m = 1 or n = 1, or both. Hence, differentiating a scalar-valued function with respect to a vector produces a row vector, and differentiating a vector-valued function with respect to a vector produces a matrix. The obvious consequence of this is that the second derivative of a scalar-valued function with respect to a vector is a matrix, e.g. recall the Hessian matrix in definition 26.

The problem with the above expansion is that it does not always provide us with an easy recipe to find A(x). Let the first differential be

df(x; u) = A(x)u,

and take the following identification result as given:

df(x) = A(x) dx ⇔ Df(x) = A(x).   (12)

This result states that if the function is differentiable at x, we can identify the first derivative as the Jacobian matrix from definition 25. Hence, depending on the situation, we can either find A(x) directly or compute Df(x).

Unlike in the one-dimensional case, in the multi-dimensional case it is often easier to work with differentials than with derivatives; this is especially true once we consider matrices further below. Therefore, the following properties are stated in terms of differentials rather than derivatives.

Properties 13 (Vector-valued differentials). Let f and g be two conformable vector functions:

(i) df′ = (df)′;

(ii) d(cf) = c df, for any c ∈ R;

(iii) d(f ± g) = df ± dg;

(iv) d(fg) = (df)g + f(dg).

Example B.7. Consider the scalar-valued function φ(x) = a′x, where a is a vector of coefficients.


Then,

a′(x + u) − a′x = a′u.

In line with (11), set A(x) = a′ and r_x(u) = 0. Hence, d(a′x) = a′ dx. It follows that Dφ(x) = a′, a row vector as expected.

Example B.8. Consider the scalar-valued function φ(x) = x′Ax, where A is a matrix of coefficients. First, notice that with reasoning similar to example B.7, we can show that d(Ax) = A dx. Furthermore, (dx′)Ax is a scalar, so it is equal to its transpose x′A′(dx′)′. Using this and properties (i) and (iv) above,

dφ(x) = (dx′)Ax + x′ d(Ax)
      = x′A′(dx′)′ + x′A dx
      = x′A′ dx + x′A dx
      = x′(A + A′) dx.

Hence, Dφ(x) = x′(A + A′). Of course, if A is symmetric this reduces to Dφ(x) = 2x′A.
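The result Dφ(x) = x′(A + A′) can be verified against a finite-difference approximation of the derivative. A sketch in Python/NumPy (random test data; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

phi = lambda x: x @ A @ x   # the quadratic form x'Ax

# Analytical derivative from Example B.8: Dphi(x) = x'(A + A')
analytic = x @ (A + A.T)

# Central finite differences, one coordinate direction at a time
h = 1e-6
numeric = np.array([(phi(x + h * e) - phi(x - h * e)) / (2 * h)
                    for e in np.eye(n)])

assert np.allclose(analytic, numeric, atol=1e-5)
```

For a quadratic form, the central difference is exact up to rounding error, so the agreement is very tight.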

Next, we illustrate using the derivative directly according to (12), which sets the following examples apart from the previous ones.

Example B.9. Consider two vector-valued functions f(x) = (1, x2, x1^2)′ and g(x) = (x1 x2 exp(x3^2), log x4)′; then the derivatives are

Df = [ 0     0
       0     1
       2x1   0 ],

and

Dg = [ x2 exp(x3^2)   x1 exp(x3^2)   2x1x2x3 exp(x3^2)   0
       0              0              0                   1/x4 ].

Exercise B.1. Consider the scalar function φ(x, y) = sin(x) sin(y). Find all its stationary points and determine whether these are local minima, maxima or saddle points by computing the Hessian matrix. Hint: what does the definiteness of the Hessian matrix tell you about the concavity or convexity of φ?

Example B.10. Consider some given n × 1 vector y, n × m matrix X, and m × 1 vector β. Find the expression for β (in terms of y and X) that satisfies the following condition:

∂/∂β [(y − Xβ)′(y − Xβ)] = 0.


Firstly, let us expand the expression and consider the dimensions of our problem:

(y − Xβ)′(y − Xβ) = (−β′X′ + y′)(y − Xβ) = −β′X′y + β′X′Xβ + y′y − y′Xβ,

where each of the four terms is a 1 × 1 scalar. It thus becomes clear immediately that we are being asked to take the derivative of a scalar with respect to an m × 1 vector. We already know the result will therefore be of size m × 1 (notice that here we differentiate with respect to β, rather than β′ as would be the standard case according to our definition of the Jacobian matrix).

Next, differentiate each term individually to give

∂/∂β [(y − Xβ)′(y − Xβ)] = −X′y + 2X′Xβ − (y′X)′        (each term m × 1)
                         = −2X′y + 2X′Xβ = −2X′(y − Xβ) = 0,

→ β = (X′X)^{-1}X′y.
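The closed-form solution β = (X′X)^{-1}X′y can be checked numerically: it should satisfy the first-order condition and agree with a library least-squares solver. A sketch in Python/NumPy (random test data; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 3
X = rng.standard_normal((n, m))
y = rng.standard_normal(n)

# beta = (X'X)^{-1} X'y, computed by solving the normal equations
beta = np.linalg.solve(X.T @ X, X.T @ y)

# The first-order condition -2X'(y - X beta) = 0 holds at the solution
assert np.allclose(X.T @ (y - X @ beta), 0.0, atol=1e-10)

# Agrees with the library least-squares solver
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta, beta_ls)
```

Solving the normal equations via `np.linalg.solve` avoids forming the explicit inverse, which is the standard numerical practice.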

Exercise B.2. In statistics, the problem of ridge regression is formulated as follows:

β_R(λ) := arg min_{β ∈ R^k} (y − Xβ)′(y − Xβ) + λ‖β‖²,

where λ ∈ R is a parameter that needs to be chosen beforehand. Notice that this is a least-squares problem, but one in which β's which are "too large" are penalised. By computing the first-order conditions, find β_R(λ).

Exercise B.3 (PCA continued). Show that the solution to (5) in Section 4.4.1 is indeed an eigenvector of X′X. Argue that the optimal solution is the eigenvector associated with the largest eigenvalue.

B.3.1 Derivative with respect to a matrix (∗)

So far we have only discussed how to differentiate with respect to a scalar or a vector, but not with respect to a matrix. Before we do so, we must introduce the concept of vectorisation.

Definition 30. The vectorisation of a matrix A, denoted by vec A, reorders the n × p matrix by stacking the columns on top of each other into an np-vector. Formally,

vec A := (a11, . . . , an1, a12, . . . , an2, . . . , a1p, . . . , anp)′.

Example B.11. The vectorisation of the matrix

A = [ a  b
      c  d ]

is

vec A = (a, c, b, d)′.
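In NumPy, vectorisation corresponds to a column-major ("Fortran-order") reshape. A minimal sketch (assuming NumPy; not part of the original notes):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # plays the role of [[a, b], [c, d]]

# vec stacks the columns: order='F' reads the entries column by column
vecA = A.reshape(-1, order='F')
assert np.array_equal(vecA, np.array([1.0, 3.0, 2.0, 4.0]))  # (a, c, b, d)'
```

The default `order='C'` would instead stack the rows, which is the vectorisation of A′ rather than of A.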


Vectorise the matrix-valued function and the matrix with respect to which we are differentiating; then the Jacobian matrix of F at X is identified by

DF(X) = d vec F(X) / d(vec X)′,   (13)

which for F an n × p matrix-valued function and X an m × q matrix yields a matrix of order np × mq.

This is not the only way to represent the matrix derivative. However, Magnus and Neudecker (1999) argue that it is the only natural generalisation of the Jacobian matrix from vectors to matrices. It is possible to go via a definition using partial derivatives, which orders the derivatives as follows:

∂F(X)/∂X = [ ∂f11(X)/∂X  . . .  ∂f1p(X)/∂X
             ⋮                   ⋮
             ∂fn1(X)/∂X  . . .  ∂fnp(X)/∂X ],

with

∂fij(X)/∂X = [ ∂fij(X)/∂x11  . . .  ∂fij(X)/∂x1q
               ⋮                    ⋮
               ∂fij(X)/∂xm1  . . .  ∂fij(X)/∂xmq ].

The matrix ∂F(X)/∂X is of order mn × pq and contains the same mnpq partial derivatives as DF(X), but in a different order. Magnus and Neudecker (1999, p. 195) state that this matrix is not the Jacobian matrix (and thus its determinant is meaningless, which is a problem for applications), nor does this definition provide us with a practical chain rule. In the remainder of the discussion, we will therefore stick with (13).

Below, we present some properties of vectorisation.

Properties 14 (Vectorisation). For three conformable matrices A, B and C:

(i) vec(A + B) = vecA + vecB;

(ii) vec(A ⊙ B) = vec A ⊙ vec B, where ⊙ denotes the elementwise (Hadamard) product;

(iii) vec(ABC) = (C′ ⊗ A) vec B;

(iv) vec(AB) = (B′ ⊗ In) vec A = (Im ⊗ A) vec B, for A with n rows and B with m columns;

(v) vec(dF (X )) = dvec F (X ).
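Properties (iii) and (iv) are easy to confirm numerically. A sketch in Python/NumPy (random conformable matrices; the `vec` helper is ours):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

# vec via a column-major reshape
vec = lambda M: M.reshape(-1, order='F')

# Property (iii): vec(ABC) = (C' kron A) vec B
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# Property (iv): vec(AB) = (B' kron I_n) vec A = (I_m kron A) vec B,
# with n the number of rows of A and m the number of columns of B
n, m = A.shape[0], B.shape[1]
assert np.allclose(vec(A @ B), np.kron(B.T, np.eye(n)) @ vec(A))
assert np.allclose(vec(A @ B), np.kron(np.eye(m), A) @ vec(B))
```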

By vectorising, we can analyse matrix derivatives using exactly the same apparatus already developed above. Let us look at some examples.

Example B.12. Consider the following matrix function F(x) = xx′, i.e. F maps a vector into its outer product. It should be obvious that d(xx′) = (dx)x′ + x(dx)′. Combining this with property (v) we


get

d(vec xx′) = vec(d(xx′))
           = vec[(dx)x′ + x(dx)′];

splitting up the sum by property (i), and rewriting the multiplications within the vectorisation using property (iv), yields

           = vec[(dx)x′] + vec[x(dx)′]
           = (x ⊗ I) vec(dx) + (I ⊗ x) vec((dx)′)
           = [(x ⊗ I) + (I ⊗ x)] dx,

where the final equality follows upon realising that, by definition, vec((dx)′) = vec(dx) = dx. Hence, by (13),

DF(x) = x ⊗ I + I ⊗ x.
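The Jacobian x ⊗ I + I ⊗ x from example B.12 can be verified against finite differences of vec(xx′). A sketch in Python/NumPy (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
x = rng.standard_normal(n)
I = np.eye(n)

vec = lambda M: M.reshape(-1, order='F')
F = lambda x: vec(np.outer(x, x))   # vec(xx'), an n^2-vector

# Analytical Jacobian from Example B.12: x kron I + I kron x (n^2 x n),
# with x treated as an n x 1 column vector inside the Kronecker products
analytic = np.kron(x.reshape(n, 1), I) + np.kron(I, x.reshape(n, 1))

# Central finite differences, one column of the Jacobian per coordinate
h = 1e-6
numeric = np.column_stack([(F(x + h * e) - F(x - h * e)) / (2 * h)
                           for e in I])

assert np.allclose(analytic, numeric, atol=1e-5)
```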

Example B.13. Consider the scalar function φ(X) = a′Xb, where a and b are conformable coefficient vectors. First, see that d(a′Xb) = a′(dX)b. Then, using the properties of vectorisation,

d(vec a′Xb) = vec(d(a′Xb))
            = vec[a′(dX)b]
            = (b′ ⊗ a′) vec(dX)
            = (b′ ⊗ a′) d vec X.

Therefore,

Dφ(X) = b′ ⊗ a′,

which is a row vector, or in partial derivative notation,

∂φ(X)/∂X = ab′.

Example B.14. Let F(A) = A^{-1} be a matrix-valued function. We have O_n = dI_n = d(A^{-1}A), such that

O_n = d(A^{-1}A) = (dA^{-1})A + A^{-1} dA,

and thus

dA^{-1} = −A^{-1}(dA)A^{-1}.


By the properties of vectorisation, it follows that

DF(A) = −(A′)^{-1} ⊗ A^{-1}.

This result is interesting, because we can consider A(θ) to be a matrix function of a scalar parameter θ ∈ R, such that F(θ) = A^{-1}(θ). This is useful in statistics, as (inverses of) covariance matrices often depend on scalar parameters. The result above, combined with a chain rule (Magnus and Neudecker, 1999, Theorem 12, p. 108), gives

DA^{-1}(θ) = −[(A′)^{-1}(θ) ⊗ A^{-1}(θ)] DA(θ).
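The identification DF(A) = −(A′)^{-1} ⊗ A^{-1} can likewise be checked against a finite-difference Jacobian of vec(A^{-1}) with respect to vec(A). A sketch in Python/NumPy (the well-conditioned test matrix is our choice):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well conditioned

vec = lambda M: M.reshape(-1, order='F')
Ainv = np.linalg.inv(A)

# Analytical Jacobian: DF(A) = -(A')^{-1} kron A^{-1}, of order n^2 x n^2
analytic = -np.kron(np.linalg.inv(A.T), Ainv)

# Finite-difference Jacobian: perturb one entry of vec(A) at a time
h = 1e-6
cols = []
for j in range(n * n):
    E = np.zeros(n * n)
    E[j] = h
    E = E.reshape(n, n, order='F')   # matches the vec (column-major) order
    cols.append(vec(np.linalg.inv(A + E) - np.linalg.inv(A - E)) / (2 * h))
numeric = np.column_stack(cols)

assert np.allclose(analytic, numeric, atol=1e-4)
```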

Exercise B.4. Show that for φ(X) = a′XX′a, we have Dφ(X) = 2(a′X) ⊗ a′.

Exercise B.5. Show for a square matrix X, F(X) = X^n and n ∈ N_0, that

DF(X) = ∑_{p=1}^{n} (X′)^{n−p} ⊗ X^{p−1}.
