
Chapter 2 - Introduction to Matrix Algebra∗

Justin Leduc†

These lecture notes are meant to be used by students entering the University of Mannheim Master program in Economics. They constitute the base for a pre-course in mathematics; that is, they summarize elementary concepts with which all of our econ grad students must be familiar. More advanced concepts will be introduced later on in the regular coursework. A thorough knowledge of these basic notions will be assumed in later coursework.

Although the wording is my own, the definitions of concepts and the ways to approach them are strongly inspired by various sources, which are mentioned explicitly in the text or at the end of the chapter.

Justin Leduc

I have slightly restructured and amended these lecture notes provided by my predecessor Justin Leduc in order for them to suit the 2016 course schedule. Any mistakes you might find are most likely my own.

Simona Helmsmueller

∗ This version: 2017.
† Center for Doctoral Studies in Economic and Social Sciences.


Contents

1 Introduction

2 The vector space Mn×m
 2.1 Special matrices
 2.2 Further matrix algebra

3 Matrices and Systems of Linear Equations
 3.1 Algebra of square matrices
 3.2 The Gauß-Jordan algorithm
 3.3 Determinant of a square matrix

4 The vector space Mn×m and linear equations

5 Linear functions and matrices
 5.1 Linear functions
 5.2 Eigenvectors and -values


Preview

What you should take away from this chapter:

1. Introduction

• Matrix algebra is about solving systems of linear equations, and there are three questions we would like to answer.

2. The vector space Mn×m

• Know how matrix addition and scalar multiplication are defined.

• Know special matrices such as square and diagonal matrices.

• Know how to transpose a matrix and how to multiply two matrices.

• Know which rules apply to matrix multiplication and which do not!

3. Matrices and Systems of Linear Equations

• Know how to calculate the inverse matrix.

• Be able to apply the Gauß-Jordan algorithm.

• Know how to calculate the determinant of a square matrix and why it is important in the context of linear systems of equations.

4. The vector space Mn×m and linear equations

• Have a geometric intuition for the rank condition.

5. Linear functions and matrices

• Have at least a vague idea of how the definiteness of a matrix is defined and some sort of geometric intuition for it.

• Know what eigenvectors and -values are and how they can be found.


1 Introduction

Let us start with an exercise. Have a look at the following system of linear equations:

x1 + x2 + x3 = 6

x2 − x3 = 0

Even before trying to solve it, you know that there is more than one solution. This is so because there are two equations, but three variables. In the terminology of the previous chapter, you are looking for a subset of the 3-dimensional vector space R3. The two equations restrict two degrees of freedom, such that there is only one degree of freedom left. That is, you are free to choose one of the variables, say x1, but then x2, x3 are determined by the system of equations.

Now look at this set of linear equations:

x1 + x2 + x3 = 3

2x1 = 6− 2x2 − 2x3

Again, we have two equations but three variables. Following the logic from the last example, we could again say that we are left with one degree of freedom, and that choosing x1 freely will lead to determined values for x2, x3. This is indeed not so.¹

The analysis of systems of linear equations is crucial in economics. It is central to our three most preferred mathematical exercises: statistical analysis, optimization, and equilibrium analysis²! Of course, when the number of equations and unknowns is low (i.e. less than or equal to 3), we know how to solve a system of linear equations quite efficiently, algebraically and geometrically. But whenever the number of equations and unknowns becomes high (e.g. in a regression framework you might easily have hundreds of variables and thousands of equations), things may become quite complicated. There are in general three questions that we are interested in answering in such a context:

1. Does a solution exist?

2. Is the solution unique / how many solutions are there?

3. How can we (or our computer) efficiently arrive at the (set of) solutions?

To answer any of the above questions, let us introduce a concise notation: that of matrices. A matrix is a rectangular array, which is (at least in our context) filled with real numbers, for example:

¹ You can confirm this e.g. by letting x1 = 3.
² And more than you may think for equilibrium analysis, for if it involves nonlinear equations, our best way to proceed is to locally approximate it with a system of linear equations and solve this linear system instead!


A_{4\times 3} = \begin{pmatrix} 35 & 40 & 25 \\ 6 & 25 & 10 \\ 0.2 & 0.7 & 1 \\ 0.6 & 0.1 & 2 \end{pmatrix}

Matrices are often denoted with capital letters with an index indicating the number of rows and columns (which we call the order). In order to abstract from specific cases and study the general rules of matrix operations, we replace the numbers by symbolic letters. Thus, we can generically define a matrix of order n×m as follows:

A_{n\times m} = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,m} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,m} \end{pmatrix}

where the index i, j is then used to denote the element located in the ith row and jth column of the matrix. Notice, again: rows come first, columns second. Finally, it will sometimes be useful to refer to the above general matrix in the following manner:

An×m = (ai,j)i=1,...,n;j=1,...,m

For convenience, the matrix index n×m is very often dropped, i.e. we denote An×m simply by A. Yet, always try to keep in mind the order when working with matrices. It takes some effort at the start, but it will slowly become a very useful habit! Further, it is absolutely necessary, as the algebraic operations to which we turn now are only well defined if orders coincide!
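If you like to experiment on a computer, a minimal Python/NumPy sketch (my tool of choice here, not something the notes rely on) illustrates the order and the rows-first, columns-second convention:

```python
import numpy as np

# A 4x3 matrix: 4 rows, 3 columns -- rows come first, columns second.
A = np.array([[35, 40, 25],
              [6, 25, 10],
              [0.2, 0.7, 1],
              [0.6, 0.1, 2]])

print(A.shape)   # (4, 3): the order n x m
print(A[0, 1])   # 40.0: element a_{1,2}, first row, second column (0-based indexing)
```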

In order to see how this new notation relates to the systems of linear equations above, we link it to the concepts introduced in the last chapter, vector spaces, and consider the matrix as a vector.

2 The vector space Mn×m

As you remember, a vector space consists of a set and two algebraic operations. So, in this section, we define arithmetical operations for matrices.

Definition 2.1. (Equality of Matrices)

Two matrices are said to be equal if and only if:

(i) they have the same dimension

(ii) their corresponding elements are equal

In symbols, if An×m = (ai,j)i=1,...,n;j=1,...,m and Br×s = (bi,j)i=1...r,j=1...s, then


A = B ⇔ ( n = r, m = s, and ∀ i = 1,...,n, j = 1,...,m: ai,j = bi,j ).

Remark 1. So be aware: a 2×2 matrix with all entries equal to 0 is not equal to a 3×3 matrix with all entries equal to 0.

Definition 2.2. (Addition of Matrices)

For two matrices of identical dimension, we define addition of matrices in terms of addition of their corresponding elements. In symbols, if An×m = (ai,j)i=1,...,n;j=1,...,m and Bn×m = (bi,j)i=1,...,n;j=1,...,m, then their sum, denoted by A + B, is

A + B = (ai,j + bi,j)i=1,...,n;j=1,...,m.

Definition 2.3. (Multiplication of a Matrix by a Scalar)

Let λ ∈ R. Then to multiply a matrix by this scalar, we multiply each element of the matrix by this scalar. In symbols, let An×m = (ai,j)i=1,...,n;j=1,...,m be any matrix; then:

λA := (λai,j)i=1,...,n;j=1,...,m
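Both operations act elementwise, which is exactly how NumPy's + and scalar * behave on arrays of matching shape; a small sketch:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
lam = 2.0

print(A + B)    # elementwise sum: [[ 6  8] [10 12]]
print(lam * A)  # scalar multiplication: [[2. 4.] [6. 8.]]
```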

Theorem 2.1. (Vector Space of Matrices)

The set of all n×m matrices, Mn×m, together with the algebraic operations matrix addition and multiplication with a scalar as defined above, defines a vector space.

Proof. The proof is left as an exercise.

2.1 Special matrices

With m · n entries, matrices are very versatile objects. Some matrices show special characteristics, and it is important to define terminology before we proceed with the algebra.

Definition 2.4. (Row and Column Vectors)
Matrices containing only one row (n = 1) are called row vectors, and similarly, matrices containing only one column (m = 1) are called column vectors. In matrix algebra, if not specified otherwise, by vector we conventionally mean a column vector.

Definition 2.5. (Square Matrix)

A matrix of size n×m is said to be a square matrix if and only if n = m.

Definition 2.6. (Diagonal Matrix)

A square matrix is said to be a diagonal matrix if all of its off-diagonal elements are zero.

Definition 2.7. (Upper and Lower Triangular Matrix)

An upper triangular matrix is square and has elements ai,j equal to zero whenever i > j, and a lower triangular matrix is square and has elements ai,j equal to zero whenever i < j. Less formally, in the same way that we call a matrix diagonal if its non-diagonal entries are restricted to be null, we call a matrix upper (lower) triangular if its entries that do not belong to the part on or above (on or below) the diagonal are restricted to be null.


Definition 2.8. (Symmetric Matrix)

A square matrix is said to be a symmetric matrix if and only if ai,j = aj,i ∀ 1 ≤ i, j ≤ n.

Definition 2.9. (Identity Matrices)

The n×n identity matrix, denoted In, is a diagonal matrix with all its diagonal elements equal to 1.

Remark 1. A convenient notation is sometimes the following:

In = (δi,j)n×n

where δi,j corresponds to the ijth element of In and denotes the Kronecker delta:

\delta_{i,j} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}

Definition 2.10. (Zero Matrices)

The n× n zero matrix, denoted 0n×n, is a square matrix with all its elements equal to 0.
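NumPy has ready-made constructors for most of these special matrices; a small sketch:

```python
import numpy as np

I3 = np.eye(3)            # 3x3 identity matrix
D = np.diag([1, 2, 3])    # diagonal matrix with the given diagonal entries
Z = np.zeros((3, 3))      # 3x3 zero matrix

A = np.arange(1, 10).reshape(3, 3)
U = np.triu(A)            # upper triangular: entries with i > j set to zero
L = np.tril(A)            # lower triangular: entries with i < j set to zero
S = A + A.T               # a symmetric matrix built from A
```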

2.2 Further matrix algebra

In addition to the algebraic operations addition and scalar multiplication, we can define further algebraic operations on subsets of Mn×m.

Definition 2.11. (Transpose of a matrix)

The transpose of a matrix is obtained by reflecting all elements of the matrix over its main diagonal. In symbols, let An×m = (ai,j)i=1,...,n;j=1,...,m be any matrix; then its transpose, denoted A′m×n or ATm×n, is such that:

A′m×n := (aj,i)j=1,...,m; i=1,...,n

Remark 1. Note that the transpose of a matrix of order n×m is of order m×n. Informally, you can understand transposition as “constructing a matrix where column (row) i becomes row (column) i”.

Remark 2. With this operation, we could alternatively define symmetric matrices as square matrices with A = A′.

So we are now ready to turn our attention to the product of matrices! Unfortunately, we cannot multiply any two matrices. There are specific requirements on the dimensions of the considered matrices. Namely, the number of columns of the first matrix needs to be equal to the number of rows of the second matrix. If two matrices fulfill this requirement, we say that they are conformable. Because matrix multiplication is often encountered in economics, it is important to keep track of the dimensions of the considered matrices!


Definition 2.12. (Product of Matrices)

For two conformable matrices, their product is defined as the matrix whose ijth element equals the inner product of the ith row of the first matrix and the jth column of the second matrix. In symbols, if An×m = (ai,j)i=1,...,n;j=1,...,m is an n×m matrix and Bm×k = (bi,j)i=1,...,m;j=1,...,k is an m×k matrix, the product of An×m and Bm×k is the n×k matrix Cn×k with characteristic element:

c_{i,j} = \sum_{l=1}^{m} a_{i,l} b_{l,j}

Example 2.1. Let

A_{2\times 2} = \begin{pmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{pmatrix} \quad \text{and} \quad B_{2\times 3} = \begin{pmatrix} b_{1,1} & b_{1,2} & b_{1,3} \\ b_{2,1} & b_{2,2} & b_{2,3} \end{pmatrix}

We wish to compute their product. To clarify things, let us partition our matrices in a convenient way. We wish to perform inner products between the rows of the first matrix and the columns of the second one, so let us group the rows of the first matrix and group the columns of the second one:

A_{2\times 2} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \quad \text{and} \quad B_{2\times 3} = \begin{pmatrix} b_1 & b_2 & b_3 \end{pmatrix}

where

a_1 = \begin{pmatrix} a_{1,1} & a_{1,2} \end{pmatrix}, \quad a_2 = \begin{pmatrix} a_{2,1} & a_{2,2} \end{pmatrix}, \quad b_1 = \begin{pmatrix} b_{1,1} \\ b_{2,1} \end{pmatrix}, \quad b_2 = \begin{pmatrix} b_{1,2} \\ b_{2,2} \end{pmatrix}, \quad b_3 = \begin{pmatrix} b_{1,3} \\ b_{2,3} \end{pmatrix}

The product matrix will be of size 2×3 (number of rows of the first matrix × number of columns of the second matrix), and we know that each element is an inner product! Then:

A_{2\times 2} B_{2\times 3} = \begin{pmatrix} a_1 \cdot b_1 & a_1 \cdot b_2 & a_1 \cdot b_3 \\ a_2 \cdot b_1 & a_2 \cdot b_2 & a_2 \cdot b_3 \end{pmatrix}

i.e.

A_{2\times 2} B_{2\times 3} = \begin{pmatrix} a_{1,1}b_{1,1} + a_{1,2}b_{2,1} & a_{1,1}b_{1,2} + a_{1,2}b_{2,2} & a_{1,1}b_{1,3} + a_{1,2}b_{2,3} \\ a_{2,1}b_{1,1} + a_{2,2}b_{2,1} & a_{2,1}b_{1,2} + a_{2,2}b_{2,2} & a_{2,1}b_{1,3} + a_{2,2}b_{2,3} \end{pmatrix}
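In NumPy the @ operator implements exactly this row-by-column product, and it raises an error when the matrices are not conformable; a small sketch:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])        # 2x2
B = np.array([[1, 0, 2],
              [0, 1, 3]])     # 2x3

C = A @ B                     # conformable: (2x2)(2x3) gives a 2x3 result
print(C)                      # [[ 1  2  8] [ 3  4 18]]

# B @ A would raise ValueError: B has 3 columns but A has only 2 rows.
```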

Despite its complicated look, matrix multiplication does have some desirable properties:

Theorem 2.2. (Associativity and Distributivity of the Product)

The product for matrices is:

(i) Associative: (AB)C = A(BC)

(ii) Distributive over matrix addition: A(B + C) = AB + AC and (A + B)C = AC + BC


Remark 3. Contrary to ordinary multiplication, matrix multiplication is not commutative (in general), i.e. AB ≠ BA in general.

Remark 4. Distributivity of the product over matrix addition implies that the ordinary expansion formula also holds for matrices! That is, if A and B are n×m matrices and C and D are m×k matrices, then (A + B)(C + D) = AC + BC + AD + BD.

Theorem 2.3. (Transposition, sum, and product)

(i) If A and B are n×m matrices, then:

(A + B)′ = A′ + B′

(ii) If A is an n×m matrix and B is an m× k matrix, then:

(AB)′ = B′A′

(iii) If A is a 1×1 matrix, then A is actually a scalar and A′ = A.

The following examples might give a first hint that matrices often provide a convenient and concise notation.

Example 2.2. Let x be any vector of dimension n. Then, the sum of the squares of its elements can be computed as follows:

x' \cdot x = x_1^2 + \cdots + x_n^2.

As you know from the last chapter, this is equal to the dot product of x with itself. It is also equal to the square of the Euclidean norm of x.

Example 2.3. Similarly, defining ι to be a vector of ones of the adequate length, we can just as compactly write down the sum of the elements of any vector. For instance, let ι be of dimension n and x an arbitrary vector of dimension n; then we have:

\iota' \cdot x = x_1 + \cdots + x_n.
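Both identities are easy to check numerically; a small sketch (iota is simply my name for the vector of ones):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
iota = np.ones(3)

print(x @ x)       # 14.0 = 1 + 4 + 9, the sum of squares
print(np.isclose(np.linalg.norm(x) ** 2, x @ x))  # True: squared Euclidean norm
print(iota @ x)    # 6.0 = 1 + 2 + 3, the sum of the elements
```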

3 Matrices and Systems of Linear Equations

To understand the relation between matrices and systems of linear equations, one has to be familiar with the laws of matrix algebra. Indeed, consider a system of linear equations:

(S)

3x1 + 5x2 + x3 = 0

7x1 − 2x2 + 4x3 = 4

−6x1 + 3x2 + 2x3 = 2


Such a system of equations can be translated into a unique matrix equation. Namely, let:

A_{3\times 3} := \begin{pmatrix} 3 & 5 & 1 \\ 7 & -2 & 4 \\ -6 & 3 & 2 \end{pmatrix}, \quad x_{3\times 1} := \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad b_{3\times 1} := \begin{pmatrix} 0 \\ 4 \\ 2 \end{pmatrix}

For clarity, allow me to drop the dimension indexes. But keep them in mind! Computing the product Ax is a good exercise and should convince you that requiring

Ax = b

to hold is equivalent to requiring that the system (S) holds. This relation between matrix equations and linear systems holds true in general. That is, let Am×n be a given m×n matrix, xn×1 an n×1 vector of unknowns, and bm×1 a given m×1 vector; then the matrix equation

Am×n xn×1 = bm×1

describes a system of m linear equations with n unknowns.

Now, let me repeat the questions concerning this system of linear equations, which we raised in the introduction of this chapter:

1. Does a solution exist?

2. Is the solution unique / how many solutions are there?

3. How can we (or our computer) efficiently arrive at the (set of) solutions?

The easiest and most general insight actually comes from matrix algebra, and it is a sufficiency condition. Consider the following system of m linear equations and n unknowns:

Am×n xn×1 = bm×1

Again, let me drop the dimension indexes for clarity. Now, assume A has an inverse (which, as we will see, requires it to be square). Then we have that:

A−1Ax = A−1b

That is:

x = A−1b

In words, if A has an inverse, then, for any b in Rn, the system Ax = b has a unique solution, namely x = A−1b.
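For the system (S) above, this can be checked numerically. np.linalg.solve computes the solution directly, which is numerically preferable to forming A−1 explicitly; a small sketch:

```python
import numpy as np

A = np.array([[3.0, 5.0, 1.0],
              [7.0, -2.0, 4.0],
              [-6.0, 3.0, 2.0]])
b = np.array([0.0, 4.0, 2.0])

x = np.linalg.solve(A, b)              # solves Ax = b without forming the inverse
print(np.allclose(A @ x, b))           # True: x solves the system

x_via_inverse = np.linalg.inv(A) @ b   # the same solution via x = A^{-1} b
print(np.allclose(x, x_via_inverse))   # True
```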

So, first we ask the question: when does A have an inverse?


3.1 Algebra of square matrices

Definition 3.1. (Inverse Matrix)

Let A be an n×n matrix. If there exists a matrix A−1 such that

A−1A = AA−1 = In,

then A is said to be invertible and A−1 is called the inverse matrix of A.

Remark 1. It is also possible to define inverse matrices for a non-square matrix Ak×n. However, it is clear that we then need to distinguish between a right inverse, Rn×k, and a left inverse, Ln×k. A left inverse need not be a right inverse, but if both exist, then they are equal.

The inverse matrix of a square matrix is indeed unique, as the following theorem ascertains:

Theorem 3.1. A square matrix A can have at most one inverse.

Proof. Suppose that B and C are both inverses of A. Then

C = CIn = C(AB) = (CA)B = InB = B.

Some other properties might also come in handy at times:

Theorem 3.2. Let An and Bn be invertible. Then the following holds:

1. (AT)−1 = (A−1)T.

2. AB is invertible and (AB)−1 = B−1A−1.

3. For any scalar λ ≠ 0, λA is invertible and (λA)−1 = (1/λ)A−1.

Proof. Left as exercise.

The existence of an inverse guarantees that the associated set of linear equations has a solution and that this solution is unique. This is summarized in the following theorem, the proof of which has already been given ex ante in the motivation of this section.

Theorem 3.3. If a square matrix An is invertible, then the unique solution to the system of linear equations Anx = b is x = A−1b.

Remark 2. In fact, if the system contains the same number of equations as variables, then the converse also holds: if the system of linear equations has a unique solution, then the associated (square) matrix A is invertible.


This theorem gives a sufficient condition for answering the first two questions. Upon closer inspection, however, it is of little use, because it is not clear under which conditions a matrix is invertible. The rest of this section will deal with this question, first by showing an algorithm with which to calculate the inverse matrix, and second by showing a more usable condition for the existence of the inverse.

3.2 The Gauß-Jordan algorithm

While there is also a theorem which provides a closed form for the inverse matrix, this formula would require a set of new definitions and heavy notation. Since I have never found it useful, I skip it here and focus on a more practical method for calculating inverse matrices, the Gauß-Jordan elimination method. The general idea is very natural, for it stems from the algebraic techniques you already apply to solve linear systems by hand. As you may remember, three elementary operations can be performed on a system of linear equations without changing its solution set. Namely:

(i) interchanging two equations,

(ii) multiplying each element of an equation by a nonzero scalar, and

(iii) adding a nonzero multiple of one equation to another.

To each of these elementary operations one can associate an elementary matrix Ek such that, when one pre-multiplies it with the matrix to be inverted, say A, the resulting matrix is exactly the coefficient matrix associated with the linear system that results from performing the given elementary operation on the initial system. One way to solve the system Ax = b is to perform sufficiently many elementary operations so as to isolate every unknown in exactly one of the linear equations. For instance, we could search for a set of elementary matrices E1, E2, ..., EK−1, EK such that:

EK EK−1 · · · E1 A = I

for then,

EK EK−1 · · · E1 b = EK EK−1 · · · E1 A x = I x = x

This yields the spirit of the Gauß-Jordan elimination method: the sequence of elementary operations which transforms our invertible matrix A into the identity matrix, if performed on the vector b, yields the solution x of the system. Further, note that EK EK−1 · · · E1 is itself a matrix. More precisely, EK EK−1 · · · E1 = A−1, as multiplying it with A yields the identity matrix! This suggests that the same method can be used to invert matrices. And, indeed, it can:

EK EK−1 · · · E1 A = I ⇒ (EK EK−1 · · · E1) I = A−1

In words: the sequence of elementary operations which transforms our invertible matrix A into the identity matrix, if performed on the identity matrix, yields the inverse matrix of A.

In order to alleviate notation, we define an augmented matrix: [An×n | bn] (resp. [An×n | In]). Then, the rules of the game are as follows: use the three elementary operations to obtain the identity matrix on the left-hand side of the augmented matrix. Any operation realized on the left-hand side must be realized on the right-hand side as well. The solution vector (resp. inverse matrix) is then the vector (resp. matrix) appearing on the right-hand side of the augmented matrix.

An example is worth a thousand words, especially in this case! Therefore, let us proceed with an example. Notice the pattern: select a column, get your one and your zeros, move to the next column!

\left(\begin{array}{ccc|ccc} 2 & -1 & 0 & 1 & 0 & 0 \\ -1 & 2 & -1 & 0 & 1 & 0 \\ 0 & -1 & 2 & 0 & 0 & 1 \end{array}\right)

Get the coefficient in position (1,1) equal to one (i.e., multiply the first line by 1/2) and the coefficients in positions (2,1) and (3,1) equal to zero (i.e., add once the new first line to the second line, and do not touch the third line):

\left(\begin{array}{ccc|ccc} 1 & -1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 3/2 & -1 & 1/2 & 1 & 0 \\ 0 & -1 & 2 & 0 & 0 & 1 \end{array}\right)

Get the coefficient in position (2,2) equal to one (i.e., multiply the second line by 2/3) and the coefficients in positions (1,2) and (3,2) equal to zero (i.e., add 1/2 times the new second line to the first line, and one time the new second line to the third line):

\left(\begin{array}{ccc|ccc} 1 & 0 & -1/3 & 2/3 & 1/3 & 0 \\ 0 & 1 & -2/3 & 1/3 & 2/3 & 0 \\ 0 & 0 & 4/3 & 1/3 & 2/3 & 1 \end{array}\right)

Get the coefficient in position (3,3) equal to one (i.e., multiply the third line by 3/4) and the coefficients in positions (1,3) and (2,3) equal to zero (i.e., add 1/3 times the new third line to the first line, and 2/3 times the new third line to the second line):

\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 3/4 & 1/2 & 1/4 \\ 0 & 1 & 0 & 1/2 & 1 & 1/2 \\ 0 & 0 & 1 & 1/4 & 1/2 & 3/4 \end{array}\right)
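The elimination pattern above translates into a few lines of code. The sketch below is a minimal version of my own: it assumes every pivot is nonzero, so it omits the row interchanges a robust implementation would need, and it reproduces the worked example:

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Reduce the augmented matrix [A | I] to [I | A^{-1}].

    Minimal sketch: assumes every pivot is nonzero (no row interchanges).
    """
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])  # augmented matrix [A | I]
    for j in range(n):
        M[j] /= M[j, j]                  # scale row j: pivot (j, j) becomes 1
        for i in range(n):
            if i != j:
                M[i] -= M[i, j] * M[j]   # eliminate: zero out the rest of column j
    return M[:, n:]                      # the right-hand block is now A^{-1}

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
print(gauss_jordan_inverse(A))
# [[0.75 0.5  0.25]
#  [0.5  1.   0.5 ]
#  [0.25 0.5  0.75]]
```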


3.3 Determinant of a square matrix

ONLY SQUARE MATRICES HAVE A DETERMINANT!

The determinant is a function which maps from Mn×n into the real line in an extremely clever way. Amongst other things, it helps you compute the inverse of a matrix whenever it exists; it eases your determination of the definiteness of a matrix; and it also happens to be useful in the computation of the eigenvalues of a matrix (an important concept that I can only briefly introduce in these notes).

Do not expect a general formula for the computation of a determinant, though. There does exist a general formula, but it involves concepts I do not introduce here (such as permutations and their associated parity). Hence, we will here define the determinant recursively, in the sense that we first define it for simple 1×1 matrices, and then express the determinant of an n×n matrix as a function of that of an (n−1)×(n−1) one. This is conceptually more straightforward and sufficient for the use you will make of determinants.

Let A be an n×n matrix. Then, we denote its determinant as follows:

\det A = |A| = \begin{vmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,n} \end{vmatrix}

• If A is a 1×1 matrix (a_{1,1}), then \det A = a_{1,1}.

• If A is a 2×2 matrix \begin{pmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{pmatrix}, then \det A = a_{1,1}a_{2,2} − a_{2,1}a_{1,2}.

• If A is a 3×3 matrix \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{pmatrix}, then

\det A = a_{1,1} \begin{vmatrix} a_{2,2} & a_{2,3} \\ a_{3,2} & a_{3,3} \end{vmatrix} − a_{1,2} \begin{vmatrix} a_{2,1} & a_{2,3} \\ a_{3,1} & a_{3,3} \end{vmatrix} + a_{1,3} \begin{vmatrix} a_{2,1} & a_{2,2} \\ a_{3,1} & a_{3,2} \end{vmatrix}

Notice the pattern here! In fact, this pattern will hold for all square matrices, no matter their dimensions. We can thus consider the determinant of an n×n matrix. Let

A_{n\times n} = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,n} \end{pmatrix}


Then

\det A = \sum_{j=1}^{n} a_{1,j} C_{1,j}

where C_{1,j} := (−1)^{1+j} A_{1,j} is known as the (1, j)th cofactor of A, and A_{1,j}, known as the (1, j)th first minor, is the determinant of the (n−1)×(n−1) matrix formed out of A by deleting the first row and the jth column.
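The recursive definition translates directly into code. The sketch below expands along the first row; it is fine for small matrices but exponentially slow, so in practice one calls np.linalg.det instead:

```python
import numpy as np

def det_cofactor(A):
    """Determinant via cofactor expansion along the first row (recursive sketch)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(A[1:, :], j, axis=1)  # delete first row and column j
        total += (-1) ** j * A[0, j] * det_cofactor(minor)  # (-1)^(1+j), 0-based j
    return total

A = np.array([[3.0, 5.0, 1.0],
              [7.0, -2.0, 4.0],
              [-6.0, 3.0, 2.0]])
print(det_cofactor(A), np.linalg.det(A))  # both -229.0 (up to rounding)
```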

Remark 1. Unfortunately, there is a lot of “naming” in this subsection of the lecture. Determinants of large matrices appear cumbersome to compute, and they are. In practice, you will mostly be asked to compute those of order 2 and 3.

Remark 2. Other easy-to-compute cases are those of a lower triangular, upper triangular or diagonal matrix. There, the determinant is simply the product of the matrix's diagonal entries. If this is not clear to you, you can verify it by calculating a few simple examples.

Remark 3. From the previous remark, it in particular follows that the determinant of the identity matrix equals one: det In = 1.

As practice, you should verify the following theorem for the cases n = 2 and n = 3:

Theorem 3.4. (Determinant of the Product)

For any two n× n matrices A and B we have

det(AB) = det(A)det(B).

Remark 4. A similar equation does in general not hold for the sum, i.e., in general det(A + B) ≠ det(A) + det(B).

Given the recursive definition of the determinant function, its applicability to the problem of solving linear equations is not at all obvious. It is stated in the following theorem:

Theorem 3.5. Let A be a square matrix. Then

A−1 exists ⇔ det(A) ≠ 0.

Proof. “⇒”: If A is invertible, then

1 = det(In) = det(AA−1) = det(A) det(A−1),

hence det(A) ≠ 0.


“⇐” (idea only):
The other direction is not easy to show. But here is the intuition for the simple 2×2 case:

ax + by = z1
cx + dy = z2

One can solve this system by multiplying the first equation by d, the second by −b, and adding the two, which yields:

(ad − cb)x = dz1 − bz2

Alternatively, multiplying the first equation by −c, the second by a, and adding, one gets:

(ad − cb)y = az2 − cz1

Note that, if the quantity (ad − cb) is different from zero, then these equations uniquely determine the values of the unknowns x and y. But this quantity is nothing else than the determinant of the coefficient matrix associated to our system of linear equations!

A_{2\times 2} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

That is, a sufficient condition for a linear system of two unknowns and two equations to have a well-defined solution is that the determinant of the associated coefficient matrix be different from zero. Actually, this intuition can be generalized to any dimension n. In other words, the determinant of the matrix associated to a system of linear equations provides a sufficient condition for ensuring solvability of the system. Namely, let A be the coefficient matrix of a system of linear equations. If det(A) ≠ 0, then the system has a unique solution.

4 The vector space Mn×m and linear equations

While this is a neat and convenient result, I have not yet kept my promise to link linear equations to the concepts of the previous chapter. This was simply not necessary when considering square matrices, i.e. systems of linear equations with the same number of variables and equations. This changes when we look at the general case, where there could be more or fewer equations than variables. What can we then say about the existence and number of solutions, and how do we find them? The following example illustrates how the terminology of vector spaces comes in handy here.

Now, watch out! A crucial geometrical interpretation of systems of linear equations is on its way! Consider the following system of m linear equations and n unknowns:

Am×n xn×1 = bm×1

Partitioning our coefficient matrix into its columns, we get the following, equivalent, equation:

\begin{pmatrix} a^1 & a^2 & \cdots & a^n \end{pmatrix} x_{n\times 1} = b_{m\times 1}

where each column a^i is an m×1 vector. This is equivalent to the following expression:

\sum_{i=1}^{n} x_i a^i = b_{m\times 1}

In words, when solving a system of linear equations, we are looking for the coefficients of a linear combination of the columns of our coefficient matrix that yields the vector b! Let us clarify things with a small example.

Example 4.1. Consider the following system:

x1 + 2x2 = 2
x1 − x2 = 0

Its associated matrix form is the following:

\begin{pmatrix} 1 & 2 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}

And, when looking for a solution to the system, the question we are asking is the following: do there exist x1 and x2 such that

\begin{pmatrix} 2 \\ 0 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 1 \end{pmatrix} + x_2 \begin{pmatrix} 2 \\ -1 \end{pmatrix} ?

Geometrically:

Figure 1: b as a linear combination of the columns

In this case, b can indeed be expressed as a linear combination of the columns of the coefficient matrix A. As an exercise, I suggest you represent graphically other linear combinations of the two columns. It should convince you of the following fact: let b be any vector in R2; then there exists a unique linear combination of the columns of A, with respective coefficients x1 and x2, that yields b. The fundamental reason behind this result is that the column vectors of A are linearly independent. Had they been linearly dependent, their graphical representations would have lain on a single line, and so would any linear combination of these two vectors. Therefore, vectors in R2 but not on that line cannot be expressed as a linear combination of the two columns.

Let us formalize this idea.

Definition 4.1. (Rank of a Matrix)

Let A be an n×m matrix. Then, the rank of A, Rank(A), is defined as the number of linearly independent columns of A. If n = m and Rank(A) = n, then we say that A has full rank.

Remark 1. This is actually only a characterization of the rank of a matrix, and not its original definition. But it is the most common characterization among economists.

A necessary and sufficient condition, then, is the following.

Theorem 4.1. (Rank condition)

If A has full rank³, then for any b in Rn, the system Ax = b has a unique solution.

Let us sum up our results for square matrices:

Theorem 4.2. (Matrices and Linear Systems)

Let A be a square matrix of dimension n. The following statements are equivalent:

(i) det(A) is different from zero.

(ii) A has an inverse.

(iii) A has full rank.

(iv) For all b ∈ Rn, the linear system Ax = b has a unique solution. Namely, x = A−1b.

Now, what about non-square matrices? We defined the rank of a matrix as the number of linearly independent columns of A. Let us go further, using the terminology of the last chapter.

Definition 4.2. (Column Space)
The subset of Rn spanned by the columns of an n×m matrix A = [a1, ..., am] is called the column space of A:

Col(A) = Span(a1, ..., am).

Remember the definition of the dimension of a vector space? It is the maximal number of linearly independent vectors in a set of vectors spanning that vector space. Hence:

Theorem 4.3.

dim Col(A) = rank A

³ Note, again, this requires A to be a square matrix!


As we saw at the beginning of this section, the column space plays an important role in systems of linear equations, as these can be written as finding a linear combination of the columns which equals the vector b. It is clear that it is only possible to find such a linear combination iff b ∈ Col(A). If we want this statement to hold for all b ∈ Rn, then we need the set of column vectors of the matrix to span the whole of Rn; in other words: dim Col(A) (= rank A) = n. The following theorem concludes our discussion of linear equations:

Theorem 4.4. Let A be an n×m matrix. Then:

1. The system of linear equations represented by Ax = b has a solution for a particular b ∈ Rn if and only if b ∈ Col(A).

2. The system of linear equations represented by Ax = b has a solution for every b ∈ Rn if and only if rank A = n.

3. If the system of linear equations represented by Ax = b has a solution for every b ∈ Rn, then:

n = rank A ≤ number of columns of A = m.
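Part 1 can be checked numerically by comparing the rank of A with the rank of the augmented matrix [A | b]: the two agree exactly when b ∈ Col(A). A small sketch, using the system from Example 4.1:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [1.0, -1.0]])
b = np.array([[2.0],
              [0.0]])

rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]))  # augmented matrix [A | b]

print(rank_A)             # 2 = n: the columns of A span all of R^2
print(rank_A == rank_Ab)  # True: b lies in Col(A), so Ax = b has a solution
```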

5 Linear functions and matrices

5.1 Linear functions

The use of matrices goes beyond that of linear equations. In fact, a matrix of dimension m×n can also be thought of as a linear function from Rn to Rm. Let us start by defining what we mean by a “linear function” from Rn to Rm. Similar to the way vector spaces preserve linear combinations of their elements, linearity of a function is also about preservation of linear combinations, but between two vector spaces, namely the domain and the codomain. More formally:

Definition 5.1. (Linear Function)
Let X and Y be two vector spaces. A function f : X → Y is said to be linear if and only if:

f\left(\sum_{i=1}^{n} \lambda_i x_i\right) = \sum_{i=1}^{n} \lambda_i f(x_i) \quad \forall n \in \mathbb{N},\ \lambda_i \in \mathbb{R},\ x_i \in X.

Remark 1. Actually, to check that a function is linear, it suffices to check that (i) both the domain and codomain are vector spaces (i.e. the expressions on the left and right are well-defined) and (ii) that f(λx1 + x2) = λf(x1) + f(x2) ∀ λ ∈ R and x1, x2 ∈ X. As an exercise, you should verify that this is indeed a necessary and sufficient condition for a linear function.


Example 5.1. It is easy to see that for any a ∈ R the function f1 : R → R, x ↦ ax is linear, and that the function f2 : R → R, x ↦ ax² is not. What about x ↦ ax + b?

Example 5.2. An important example is the differentiation operator d/dx, which maps from the vector space of all differentiable functions over R into the vector space of all functions, i.e.,

d/dx : {f : R → R, f is differentiable} → {f : R → R}, f ↦ f′.

You should check that this is indeed a linear function. Often, for easier communication, functions which operate between vector spaces of functions are called operators.

Now, consider an m×n matrix A and a vector x in Rn. The product Ax is an element of Rm, and, therefore, one can associate with the matrix A a function which, to every vector x of Rn, associates the vector Ax in Rm. You should show that this function is linear. In fact, the relation between the two types of objects is stronger than that, as the converse is also true: all linear functions from Rn to Rm are expressible via a matrix of dimension m×n. A proof can be found, for instance, in De la Fuente's chapter 3, theorem 3.5.
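You can convince yourself of this linearity numerically; a small sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))   # represents a linear function from R^2 to R^3
x = rng.normal(size=2)
y = rng.normal(size=2)
lam = 2.5

# For f(x) = Ax, linearity means f(lam*x + y) == lam*f(x) + f(y):
print(np.allclose(A @ (lam * x + y), lam * (A @ x) + A @ y))  # True
```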

Restricting our attention to square matrices of dimension n, geometrically interesting situations may occur, as the input and output vectors lie in the same space, namely Rn. For instance, one could be curious about the angle between the input and output vectors. The next definition labels some particularly remarkable situations⁴.

Definition 5.2. (Definiteness of a Matrix)
A (symmetric) n×n matrix A is:

(i) positive semidefinite if and only if x′Ax ≥ 0 for all x ∈ Rn.

(ii) negative semidefinite if and only if x′Ax ≤ 0 for all x ∈ Rn.

(iii) positive definite if and only if x′Ax > 0 for all x ∈ Rn, x ≠ 0.

(iv) negative definite if and only if x′Ax < 0 for all x ∈ Rn, x ≠ 0.

In words, if, for every vector x ∈ Rn, the n×n matrix A maps x into a vector Ax whose angle with x is not obtuse (resp. not acute), then A is positive (resp. negative) semidefinite. If, for every vector x ≠ 0, the n×n matrix A maps x into a vector Ax (i) that is not orthogonal to x and (ii) whose angle with x is acute (resp. obtuse), then A is positive (resp. negative) definite.

As definiteness of a matrix plays a crucial role in the remainder of our lectures, we shall make sure that we know how to decide on the definiteness of a given matrix. Fortunately, one does not have to try out every vector x in Rn and compare x′Ax to 0. Unfortunately, none of the alternative procedures is simple, and all require the introduction of some more vocabulary. But cope with it; once one has practiced the exercise a few times, it becomes quite clear! After a brief layout of the principal minor method, I introduce in the following subsection one particular method, because the necessary concept of eigenvalues is an interesting phenomenon in its own right. On the downside, in its straightforward form, it is only valid for symmetric matrices!

⁴ Recall your high school physics lectures! If u and v are vectors in a Euclidean space, then u · v = cos(θ)‖u‖‖v‖, where θ denotes the (smallest) angle between u and v. Hence, the dot product of two vectors that form an acute angle is non-negative, and that of two vectors that form an obtuse angle is non-positive!

Definition 5.3. ((Leading) principal minors of a matrix)
Let A be an n×n matrix.

1. Any k×k submatrix of A formed by deleting n−k rows and the corresponding n−k columns is called a kth order principal submatrix of A. The determinant of a kth order principal submatrix of A is called a kth order principal minor of A.

2. The k×k submatrix of A formed by deleting the last n−k rows and columns of A is called the kth order leading principal submatrix of A, denoted by Ak. Its determinant is called the kth order leading principal minor of A, denoted |Ak|.

Theorem 5.1. (Definiteness of a matrix)
Let A be an n×n symmetric matrix. Then,

1. A is positive semidefinite if and only if all its principal minors are non-negative.

2. A is positive definite if and only if all its n leading principal minors are strictly positive.

3. A is negative semidefinite if and only if, for all k, each of its kth order principal minors has the same sign as (−1)^k or is 0. (Put in other words: A is negative semidefinite if and only if every principal minor of odd order is ≤ 0 and every principal minor of even order is ≥ 0.)

4. A is negative definite if and only if, for all k, its kth order leading principal minor has the same sign as (−1)^k.
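A small sketch of the leading-principal-minor test for positive definiteness (the helper name is mine; for the semidefinite cases one would have to check all principal minors, not only the leading ones):

```python
import numpy as np

def leading_principal_minors(A):
    """Return the leading principal minors |A_1|, |A_2|, ..., |A_n|."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])   # symmetric

minors = leading_principal_minors(A)
print(minors)                      # [2.0, 3.0, 4.0] (up to rounding)
print(all(m > 0 for m in minors))  # True: A is positive definite
```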

5.2 Eigenvectors and -values

Eigenvalues stem from another geometrically interesting situation. Let A be a square n×n matrix and consider again the two n-dimensional vectors x and Ax. The case where x and Ax are collinear is sufficiently peculiar to deserve a detailed examination, and eigenvalues and eigenvectors are at the core of it.

Definition 5.4. (Eigenvectors and -values)
Let A be a square n×n matrix. Then a nonzero vector x ∈ Rn is said to be an eigenvector of A if and only if there exists a λ in R such that:

Ax = λx

λ is then called an eigenvalue of A.

Note that the above equation, provided it holds, constitutes a solvable system of linear equations, which may be more explicitly written as follows:

(A − λI)x = Ax − λIx = 0 (S′)

Moreover, the solution to this system is non-trivial (i.e., different from 0) if and only if:

det(A − λI) = 0

This yields a method for finding the eigenvalues of a matrix A. Namely, let

P(λ) := det(A − λI)

This is a polynomial of degree n, called the characteristic polynomial, whose roots, i.e. the λs for which P(λ) = 0, correspond to the eigenvalues of A. Once one has an eigenvalue, plugging it back into the system (S′) and solving for x yields the associated eigenvector.
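For a 2×2 example, the characteristic polynomial is easy to work out by hand, and np.linalg.eig confirms the result; a small sketch:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# By hand: P(lam) = (2 - lam)^2 - 1 = lam^2 - 4*lam + 3, with roots 3 and 1.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # [3. 1.] (the ordering is not guaranteed)

# Each column of `eigenvectors` solves (A - lam*I)x = 0 for its eigenvalue:
for lam, x in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ x, lam * x))   # True, True
```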

Turning our thoughts back to the definiteness of matrices, two facts follow directly from the definition. If a matrix A is positive (resp. negative) semidefinite, then all its eigenvalues must be non-negative (resp. non-positive). If a matrix A is positive (resp. negative) definite, then all its eigenvalues must be positive (resp. negative). Actually, if A is symmetric, the converse is also true. That is, the following result holds:

Theorem 5.2. (Definiteness of a Matrix)
Let A be an n×n symmetric matrix. Then:

(i) A is positive definite if and only if all its n eigenvalues are strictly positive.

(ii) A is negative definite if and only if all its n eigenvalues are strictly negative.

(iii) A is positive semidefinite if and only if all its n eigenvalues are non-negative.

(iv) A is negative semidefinite if and only if all its n eigenvalues are non-positive.

Proof. (only “⇒”, exemplary for (i))
Let A be positive definite and let x be an eigenvector with associated eigenvalue λ. Then,

0 < xTAx = xTλx = λxTx.

Now, because xTx = ||x||², the square of the Euclidean norm of x, we know that it is greater than zero for all non-zero vectors. We conclude λ > 0.


Remark 1. In order to generalize the above to non-symmetric matrices, note that for any square matrix A, the sum of the matrix and its transpose, i.e. A + AT, is symmetric, and for this sum it holds that z′(A + AT)z = 2z′Az. Hence, a (possibly non-symmetric) square matrix is negative / positive (semi)definite if and only if the symmetric matrix A + AT is negative / positive (semi)definite!
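Combining Theorem 5.2 with this remark gives a simple numerical definiteness check for an arbitrary square matrix; a small sketch (the function name is mine):

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Check positive definiteness via the eigenvalues of the symmetric part.

    Sketch: symmetrize first (valid by the remark above), then apply Theorem 5.2.
    """
    S = A + A.T                          # symmetric, and z'Sz = 2 z'Az
    eigenvalues = np.linalg.eigvalsh(S)  # eigvalsh handles symmetric matrices
    return bool(np.all(eigenvalues > tol))

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
print(is_positive_definite(A))    # True, consistent with the minor test above
print(is_positive_definite(-A))   # False: -A is in fact negative definite
```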
