
Chapter 7

Matrices I: linear equations

The term matrix was introduced by James Joseph Sylvester (1814–1897) in 1850, and the first paper on matrix algebra was published by Arthur Cayley (1821–1895) in 1858¹. Matrices were introduced initially as packaging for systems of linear equations, but then came to be investigated in their own right. The main goal of this chapter is to introduce the basics of the arithmetic and algebra of matrices. This chapter and the two that follow form the first steps in the subject known as linear algebra. It is hard to overemphasize the importance of this subject throughout mathematics and its applications.

7.1 Matrix arithmetic

In this section, I shall introduce matrices and three arithmetic operations defined on them. I shall also define an operation called the ‘transpose of a matrix’ that will be important in later work. This section forms the foundation for all that follows.

7.1.1 Basic matrix definitions

A matrix² is a rectangular array of numbers. In this course, the numbers will usually be real numbers but, on occasion, I shall also use complex numbers for variety.

¹ A memoir on matrices, Philosophical Transactions of the Royal Society of London 148 (1858), 17–37. This is well worth reading.

² Plural: matrices.

Example 7.1.1. The following are all matrices:

$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \qquad \begin{pmatrix} 4 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 & -1 \\ 0 & 2 & 4 \\ 1 & 1 & 3 \end{pmatrix}, \qquad \begin{pmatrix} 6 \end{pmatrix}.$$

Usually the array of numbers that comprises a matrix is enclosed in round brackets. Occasionally books use square brackets with the same meaning. Later on, I shall introduce determinants and these are indicated by using straight brackets. In general, the kind of brackets you use is important and is not just a matter of taste.

We usually denote matrices by capital Roman letters: A, B, C, etc. The size of a matrix is m × n if it has m rows and n columns. The entries in a matrix are often called the elements of the matrix and are usually denoted by lower case Roman letters. If A is an m × n matrix, and 1 ≤ i ≤ m and 1 ≤ j ≤ n, then the entry in the ith row and jth column of A is often denoted $(A)_{ij}$. Thus $(A)_{ij}$ means ‘the element in the ith row and jth column of A’.

Examples 7.1.2.

1. Let
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}.$$
Then A is a 2 × 3 matrix. We have that $(A)_{11} = 1$, $(A)_{12} = 2$, $(A)_{13} = 3$, $(A)_{21} = 4$, $(A)_{22} = 5$, $(A)_{23} = 6$.

2. Let
$$B = \begin{pmatrix} 4 \\ 1 \end{pmatrix}.$$
Then B is a 2 × 1 matrix. We have that $(B)_{11} = 4$, $(B)_{21} = 1$.

3. Let
$$C = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 2 & 4 \\ 1 & 1 & 3 \end{pmatrix}.$$
Then C is a 3 × 3 matrix. $(C)_{11} = 1$, $(C)_{12} = 1$, $(C)_{13} = -1$, $(C)_{21} = 0$, $(C)_{22} = 2$, $(C)_{23} = 4$, $(C)_{31} = 1$, $(C)_{32} = 1$, $(C)_{33} = 3$.

4. Let
$$D = \begin{pmatrix} 6 \end{pmatrix}.$$
Then D is a 1 × 1 matrix. We have that $(D)_{11} = 6$.
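The indexing convention above can be checked in a few lines of NumPy (a sketch, assuming NumPy is installed; note that NumPy numbers rows and columns from 0, whereas the text numbers them from 1):

```python
import numpy as np

# The matrices A and B of Examples 7.1.2 as NumPy arrays.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[4],
              [1]])

# .shape reports the size as (rows, columns), i.e. m x n.
print(A.shape)   # (2, 3)
print(B.shape)   # (2, 1)

# NumPy indexes from 0, so the entry written (A)_{ij} in the text
# is A[i - 1, j - 1] in code.
print(A[0, 1])   # (A)_{12} = 2
print(A[1, 2])   # (A)_{23} = 6
```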

Matrices A and B are said to be equal, written A = B, if they have the same size and corresponding entries are equal: that is, $(A)_{ij} = (B)_{ij}$ for all allowable i and j.

Example 7.1.3. Given that
$$\begin{pmatrix} a & 2 & b \\ 4 & 5 & c \end{pmatrix} = \begin{pmatrix} 3 & x & -2 \\ y & z & 0 \end{pmatrix},$$
find a, b, c, x, y, z. This example simply illustrates what it means for two matrices to be equal. By definition a = 3, 2 = x, b = −2, 4 = y, 5 = z and c = 0.

When we want to talk about an arbitrary matrix A we usually denote its elements by $a_{ij}$, where i tells you the row the element lives in and j the column. For example, a typical 2 × 3 matrix A would be written
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}.$$

7.1.2 Addition, subtraction, scalar multiplication and the transpose

We define first the operations that cause us no trouble.

Addition Let A and B be two matrices of the same size. Then their sum A + B is the matrix defined by
$$(A+B)_{ij} = (A)_{ij} + (B)_{ij}.$$
That is, corresponding entries of A and B are added. If A and B are not the same size then their sum is not defined.

Subtraction Let A and B be two matrices of the same size. Then their difference A − B is the matrix defined by
$$(A-B)_{ij} = (A)_{ij} - (B)_{ij}.$$
That is, corresponding entries of A and B are subtracted. If A and B are not the same size then their difference is not defined.


Scalar multiplication In matrix theory, numbers are often called scalars. For us scalars will usually be either real or complex. Let A be any matrix and λ any scalar. Then the matrix λA is defined as follows:
$$(\lambda A)_{ij} = \lambda (A)_{ij}.$$
In other words, every element of A is multiplied by λ.

Transpose of a matrix Let A be an m × n matrix. Then the transpose of A, denoted $A^T$, is the n × m matrix defined by $(A^T)_{ij} = (A)_{ji}$. We therefore interchange rows and columns: the first row of A becomes the first column of $A^T$, the second row of A becomes the second column of $A^T$, and so on.
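All four operations just defined act entrywise or structurally, and NumPy implements each of them directly (a sketch, assuming NumPy; the two matrices are those used in Examples 7.1.4):

```python
import numpy as np

A = np.array([[1, 2, -1],
              [3, -4, 6]])
B = np.array([[2, 1, 3],
              [-5, 2, 1]])

print(A + B)    # entrywise sum: same size required
print(A - B)    # entrywise difference
print(2 * A)    # scalar multiplication: every entry doubled
print(A.T)      # transpose: a 2 x 3 matrix becomes 3 x 2
```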

Examples 7.1.4.

1.
$$\begin{pmatrix} 1 & 2 & -1 \\ 3 & -4 & 6 \end{pmatrix} + \begin{pmatrix} 2 & 1 & 3 \\ -5 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 1+2 & 2+1 & -1+3 \\ 3+(-5) & -4+2 & 6+1 \end{pmatrix}$$
which gives
$$\begin{pmatrix} 3 & 3 & 2 \\ -2 & -2 & 7 \end{pmatrix}.$$

2.
$$\begin{pmatrix} 1 & 2 & -1 \\ 3 & -4 & 6 \end{pmatrix} - \begin{pmatrix} 2 & 1 & 3 \\ -5 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 1-2 & 2-1 & -1-3 \\ 3-(-5) & -4-2 & 6-1 \end{pmatrix}$$
which gives
$$\begin{pmatrix} -1 & 1 & -4 \\ 8 & -6 & 5 \end{pmatrix}.$$

3.
$$\begin{pmatrix} 1 & 1 \\ 2 & 1 \end{pmatrix} - \begin{pmatrix} 3 & 3 & 2 \\ -2 & -2 & 7 \end{pmatrix}$$
is not defined since the matrices have different sizes.

4.
$$2 \begin{pmatrix} 3 & 3 & 2 \\ -2 & -2 & 7 \end{pmatrix} = \begin{pmatrix} 6 & 6 & 4 \\ -4 & -4 & 14 \end{pmatrix}.$$


Examples 7.1.5. The transposes of the following matrices
$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \qquad \begin{pmatrix} 4 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 & -1 \\ 0 & 2 & 4 \\ 1 & 1 & 3 \end{pmatrix}, \qquad \begin{pmatrix} 6 \end{pmatrix}$$
are, respectively,
$$\begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}, \qquad \begin{pmatrix} 4 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 & 1 \\ 1 & 2 & 1 \\ -1 & 4 & 3 \end{pmatrix}, \qquad \begin{pmatrix} 6 \end{pmatrix}.$$

Example 7.1.6. If
$$A = \begin{pmatrix} 1 & -1 & 2 \\ 3 & 0 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & -2 & 3 \\ 2 & 1 & -1 \end{pmatrix}$$
we may calculate 3A + 2B using the above definitions to get
$$\begin{pmatrix} 3 & -7 & 12 \\ 13 & 2 & 1 \end{pmatrix}.$$

7.1.3 Matrix multiplication

This is more complicated than the other operations and, like them, is not always defined. To define this operation it is useful to work with two special classes of matrix. A row matrix or row vector is a matrix with one row (but any number of columns). A column matrix or column vector is a matrix with one column (but any number of rows). Row and column matrices are often denoted by bold lower case Roman letters $\mathbf{a}, \mathbf{b}, \mathbf{c}, \ldots$. The ith element of the row or column matrix $\mathbf{a}$ will be denoted by $a_i$.

Examples 7.1.7. The matrix
$$\begin{pmatrix} 1 & 2 & 3 & 4 \end{pmatrix}$$
is a row matrix whilst
$$\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}$$
is a column matrix.


I shall build up to the definition of matrix multiplication in three stages.

Stage 1. Let $\mathbf{a}$ be a row matrix and $\mathbf{b}$ a column matrix, where
$$\mathbf{a} = \begin{pmatrix} a_1 & a_2 & \ldots & a_m \end{pmatrix} \quad \text{and} \quad \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.$$
Then their product $\mathbf{a}\mathbf{b}$ is defined if, and only if, the number of columns of $\mathbf{a}$ is equal to the number of rows of $\mathbf{b}$, that is m = n, in which case their product is the 1 × 1 matrix
$$\mathbf{a}\mathbf{b} = (a_1 b_1 + a_2 b_2 + \ldots + a_n b_n).$$
The number $a_1 b_1 + a_2 b_2 + \ldots + a_n b_n$ is called the inner product of $\mathbf{a}$ and $\mathbf{b}$ and is denoted by $\mathbf{a} \cdot \mathbf{b}$. Using this notation we have that
$$\mathbf{a}\mathbf{b} = (\mathbf{a} \cdot \mathbf{b}).$$

Example 7.1.8. This odd way of multiplying is actually quite natural. Here’s an example of where it arises in real life. If you buy y items whose unit cost is x then you spend xy. This can be generalized as follows when you buy a number of different kinds of items at different prices. Let $\mathbf{a}$ be the row matrix
$$\begin{pmatrix} 0{\cdot}6 & 1 & 0{\cdot}2 \end{pmatrix}$$
where 0·6 is the price of a bottle of milk, 1 is the price of a loaf of bread, and 0·2 is the price of an egg. Let $\mathbf{b}$ be the column matrix
$$\begin{pmatrix} 2 \\ 3 \\ 10 \end{pmatrix}$$
where 2 is the number of bottles of milk bought, 3 is the number of loaves of bread bought, and 10 is the number of eggs bought. Thus $\mathbf{a}$ is the price row matrix and $\mathbf{b}$ is the quantity column matrix. The total amount spent is therefore
$$0{\cdot}6 \times 2 + 1 \times 3 + 0{\cdot}2 \times 10:$$
namely, the sum over all the commodities bought of the price of each commodity times the number of items of that commodity purchased. This number is precisely the inner product $\mathbf{a} \cdot \mathbf{b}$: namely, 6·20.
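The shopping calculation is exactly what `numpy.dot` computes (a sketch, assuming NumPy; the prices are written with an ordinary decimal point, so the text's 0·6 becomes 0.6):

```python
import numpy as np

prices = np.array([0.6, 1.0, 0.2])   # milk, bread, egg
quantities = np.array([2, 3, 10])    # bottles, loaves, eggs

total = np.dot(prices, quantities)   # the inner product a . b
print(total)                         # 6.2
```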

Stage 2. Let $\mathbf{a}$ be a row matrix as above and let B be a matrix. Thus $\mathbf{a}$ is a 1 × m matrix and B is a p × q matrix. Then their product $\mathbf{a}B$ is defined if, and only if, the number of columns of $\mathbf{a}$ is equal to the number of rows of B. Thus m = p. To calculate the product think of B as consisting of q column matrices $\mathbf{b}_1, \ldots, \mathbf{b}_q$. We calculate the q numbers $\mathbf{a} \cdot \mathbf{b}_1, \ldots, \mathbf{a} \cdot \mathbf{b}_q$ as in stage 1, and the q numbers that result become the entries of $\mathbf{a}B$. Thus $\mathbf{a}B$ is a 1 × q matrix whose jth entry is the number $\mathbf{a} \cdot \mathbf{b}_j$.

Example 7.1.9. Let $\mathbf{a}$ be the cost matrix of our previous example. Let B be the 3 × 5 matrix whose columns tell me the quantity of commodities bought on each of the days of the week Monday to Friday:
$$B = \begin{pmatrix} 2 & 0 & 2 & 0 & 4 \\ 3 & 0 & 4 & 0 & 8 \\ 10 & 0 & 10 & 0 & 20 \end{pmatrix}.$$
Thus on Tuesday and Thursday no purchases were made, whilst on Friday extra commodities were bought in preparation for the weekend. The matrix $\mathbf{a}B$ is a 1 × 5 matrix which tells us how much was spent on each day of the week. Thus
$$\mathbf{a}B = \begin{pmatrix} 0{\cdot}6 & 1 & 0{\cdot}2 \end{pmatrix} \begin{pmatrix} 2 & 0 & 2 & 0 & 4 \\ 3 & 0 & 4 & 0 & 8 \\ 10 & 0 & 10 & 0 & 20 \end{pmatrix}$$
which is equal to
$$\begin{pmatrix} 6{\cdot}2 & 0 & 7{\cdot}2 & 0 & 14{\cdot}4 \end{pmatrix}.$$

Stage 3. Let A be an m × n matrix and let B be a p × q matrix. Their product AB is defined if, and only if, the number of columns of A is equal to the number of rows of B: that is n = p. If this is so then AB is an m × q matrix. To define this product we think of A as consisting of m row matrices $\mathbf{a}_1, \ldots, \mathbf{a}_m$ and we think of B as consisting of q column matrices $\mathbf{b}_1, \ldots, \mathbf{b}_q$.


As in Stage 2 above, we multiply the first row of A into each of the columns of B and this gives us the first row of AB; we then multiply the second row of A into each of the columns of B to get the second row of AB, and so on.

Example 7.1.10. Let B be the 3 × 5 matrix of the previous example whose columns tell me the quantity of commodities bought on each of the days Monday to Friday:
$$B = \begin{pmatrix} 2 & 0 & 2 & 0 & 4 \\ 3 & 0 & 4 & 0 & 8 \\ 10 & 0 & 10 & 0 & 20 \end{pmatrix}.$$
Let A be the 2 × 3 matrix whose first row tells me the cost of the commodities in shop 1 and whose second row tells me the cost of the commodities in shop 2:
$$A = \begin{pmatrix} 0{\cdot}6 & 1 & 0{\cdot}2 \\ 0{\cdot}65 & 1{\cdot}05 & 0{\cdot}30 \end{pmatrix}.$$
The first row of AB tells me how much was spent on each day of the week in shop 1, and the second row of AB tells me how much was spent on each day of the week in shop 2. Thus
$$AB = \begin{pmatrix} 0{\cdot}6 & 1 & 0{\cdot}2 \\ 0{\cdot}65 & 1{\cdot}05 & 0{\cdot}30 \end{pmatrix} \begin{pmatrix} 2 & 0 & 2 & 0 & 4 \\ 3 & 0 & 4 & 0 & 8 \\ 10 & 0 & 10 & 0 & 20 \end{pmatrix}$$
which is equal to
$$\begin{pmatrix} 6{\cdot}2 & 0 & 7{\cdot}2 & 0 & 14{\cdot}4 \\ 7{\cdot}45 & 0 & 8{\cdot}5 & 0 & 17 \end{pmatrix}.$$

Examples 7.1.11.

1.
$$\begin{pmatrix} 1 & -1 & 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \\ 1 \\ -1 \\ 3 \end{pmatrix} = \begin{pmatrix} 0 \end{pmatrix}.$$

2. The product
$$\begin{pmatrix} 1 & -1 & 2 \\ 3 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & -2 & 3 \\ 2 & 1 & -1 \end{pmatrix}$$
doesn’t exist because the number of columns of the first matrix is not equal to the number of rows of the second matrix.


3. The product
$$\begin{pmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{pmatrix} \begin{pmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{pmatrix}$$
exists because the first matrix is 2 × 3 and the second is 3 × 4. Thus the product will be a 2 × 4 matrix and is
$$\begin{pmatrix} 12 & 27 & 30 & 13 \\ 8 & -4 & 26 & 12 \end{pmatrix}.$$

Summary of matrix multiplication

• Let A be an m × n matrix and B a p × q matrix. The product AB is defined if, and only if, n = p, and the result will then be an m × q matrix. In other words:
$$(m \times n)(n \times q) = (m \times q).$$

• $(AB)_{ij}$ is the inner product of the ith row of A and the jth column of B.

• It follows that taking the inner product of the ith row of A with each of the columns of B in turn yields each of the elements of the ith row of AB in turn.

If $\mathbf{a}_i$ are row matrices and $\mathbf{b}_j$ are column matrices then the product of two matrices can be written as follows:
$$\begin{pmatrix} \mathbf{a}_1 \\ \vdots \\ \mathbf{a}_m \end{pmatrix} \begin{pmatrix} \mathbf{b}_1 & \ldots & \mathbf{b}_n \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1 \cdot \mathbf{b}_1 & \ldots & \mathbf{a}_1 \cdot \mathbf{b}_n \\ \vdots & \ddots & \vdots \\ \mathbf{a}_m \cdot \mathbf{b}_1 & \ldots & \mathbf{a}_m \cdot \mathbf{b}_n \end{pmatrix}.$$
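The summary above translates directly into code. The sketch below implements matrix multiplication from the rule that $(AB)_{ij}$ is the inner product of the ith row of A with the jth column of B, using plain Python lists, and checks it against Example 3 of Examples 7.1.11 (the helper name `matmul` is mine, not the text's):

```python
import numpy as np

def matmul(A, B):
    # (AB)_{ij} is the inner product of the ith row of A with the
    # jth column of B; the product is defined only when n = p.
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    assert n == p, "columns of A must equal rows of B"
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(q)]
            for i in range(m)]

A = [[1, 2, 4],
     [2, 6, 0]]
B = [[4, 1, 4, 3],
     [0, -1, 3, 1],
     [2, 7, 5, 2]]

print(matmul(A, B))               # [[12, 27, 30, 13], [8, -4, 26, 12]]
print(np.array(A) @ np.array(B))  # NumPy's @ operator gives the same result
```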

7.1.4 Special matrices

Matrices come in all shapes and sizes, but some of these are important enough to warrant their own terminology. A matrix all of whose elements are zero is called a zero matrix. The m × n zero matrix is denoted $O_{m,n}$ or just O and we let the context determine the size of O. A square matrix is one in which the number of rows is equal to the number of columns. In a square matrix A the elements $(A)_{11}, (A)_{22}, \ldots, (A)_{nn}$ are called the diagonal elements. All the other elements of A are called the off-diagonal elements. A diagonal matrix is a square matrix in which all off-diagonal elements are zero. A scalar matrix is a diagonal matrix in which the diagonal elements are all the same. The n × n identity matrix is the scalar matrix in which all the diagonal elements are the number one. This is denoted by $I_n$ or just I where we allow the context to determine the size of I. Thus scalar matrices are those of the form λI where λ is any scalar. A matrix is real if all its elements are real numbers, and complex if all its elements are complex numbers. A matrix A is said to be symmetric if $A^T = A$. In particular, symmetric matrices are always square.

Examples 7.1.12.

1. The matrix
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}$$
is a 3 × 3 diagonal matrix.

2. The matrix
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
is the 4 × 4 identity matrix.

3. The matrix
$$\begin{pmatrix} 42 & 0 & 0 & 0 & 0 \\ 0 & 42 & 0 & 0 & 0 \\ 0 & 0 & 42 & 0 & 0 \\ 0 & 0 & 0 & 42 & 0 \\ 0 & 0 & 0 & 0 & 42 \end{pmatrix}$$
is a 5 × 5 scalar matrix.


4. The matrix
$$\begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
is a 6 × 5 zero matrix.
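Each of these special matrices has a NumPy constructor (a sketch, assuming NumPy):

```python
import numpy as np

O = np.zeros((6, 5))      # the 6 x 5 zero matrix
I4 = np.eye(4)            # the 4 x 4 identity matrix
D = np.diag([1, 2, 3])    # a 3 x 3 diagonal matrix
S = 42 * np.eye(5)        # a 5 x 5 scalar matrix, i.e. 42 I

print(O.shape)            # (6, 5)
print(D)
print(np.array_equal(S, S.T))   # scalar matrices are symmetric: True
```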

7.1.5 Linear equations

Matrices are extremely useful in helping us to solve systems of linear equations. For the time being, I shall simply show you how matrices provide a convenient notation for writing down such equations.

A system of m linear equations in n unknowns is a list of equations of the following form:

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n &= b_2 \\ &\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \ldots + a_{mn}x_n &= b_m \end{aligned}$$

If we have only a few unknowns then we often use w, x, y, z rather than $x_1, x_2, x_3, x_4$. A solution is a set of values of $x_1, \ldots, x_n$ that satisfy all the equations. The set of all solutions is called the solution set or general solution. The equations above can be conveniently represented using matrices. Let A be the m × n matrix with $(A)_{ij} = a_{ij}$, let $\mathbf{b}$ be the m × 1 matrix with $(\mathbf{b})_i = b_i$, and let $\mathbf{x}$ be the n × 1 matrix with $(\mathbf{x})_j = x_j$. Then the system of linear equations above can be written in the form
$$A\mathbf{x} = \mathbf{b}.$$

The matrix A is called the coefficient matrix. At the moment, we are just using matrices as packaging for the equations.

Example 7.1.13. The following system of linear equations
$$\begin{aligned} 2x + 3y &= 1 \\ x + y &= 2 \end{aligned}$$


may be written in matrix form as follows:
$$\begin{pmatrix} 2 & 3 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.$$
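Methods for solving such systems by hand come later; as a preview, NumPy can solve Example 7.1.13 numerically from exactly this matrix form (a sketch, assuming NumPy):

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [1.0, 1.0]])   # coefficient matrix
b = np.array([1.0, 2.0])     # right-hand side

x = np.linalg.solve(A, b)    # solves Ax = b
print(x)                     # [ 5. -3.], i.e. x = 5, y = -3
```

Substituting back: 2·5 + 3·(−3) = 1 and 5 + (−3) = 2, as required.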

7.1.6 Conics and quadrics

We have dealt with polynomial equations in one unknown and, in this chapter, we shall deal with linear equations in several unknowns. But what about equations in several unknowns where both products and powers of the unknowns can occur? The simplest class of such equations are the conics. These are equations of the form

$$ax^2 + bxy + cy^2 + dx + ey + f = 0$$

where a, b, c, d, e, f are numbers of some kind. These are equations in two variables, and the variables either appear to degree zero, which gives the constant term, directly as linear terms, or as binary products such as xy or x². In general, the roots or zeroes of such equations form curves in the plane such as circles, ellipses and hyperbolas. The term conic arises from the way that they were first defined by the Greeks as those curves that arise when you cut a double cone by means of a plane. These curves are important in astronomy since it can be proved that the orbits of satellites, planets, space-craft etc. always follow conics. The reason for introducing them here is that they can be represented as matrix equations as follows:

$$\mathbf{x}^T A \mathbf{x} + J^T \mathbf{x} + (f) = (0)$$
where
$$A = \begin{pmatrix} a & \tfrac{1}{2}b \\ \tfrac{1}{2}b & c \end{pmatrix}, \qquad \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad J = \begin{pmatrix} d \\ e \end{pmatrix}.$$

This is not just a notational convenience. The fact that the matrix A is symmetric means that powerful ideas from matrix theory, to be developed later in this book, can be brought to bear on studying such conics. If we replace the $\mathbf{x}$ above by the matrix
$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$


and A by a 3 × 3 symmetric matrix and J by a 3 × 1 matrix then we get the matrix equation of a quadric surface. Examples of such surfaces are the surface of a sphere or the surface described by a cooling tower. But even though we are dealing with three rather than two dimensions, the matrix algebra we shall develop applies just as well.
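As a numeric sanity check on the matrix form of a conic (the coefficient values below are arbitrary choices of mine, not from the text):

```python
import numpy as np

a, b, c, d, e, f = 1.0, 4.0, 2.0, -1.0, 3.0, 5.0

A = np.array([[a, b / 2],
              [b / 2, c]])   # the symmetric matrix of the conic
J = np.array([d, e])

for (x, y) in [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5)]:
    v = np.array([x, y])
    matrix_form = v @ A @ v + J @ v + f
    direct_form = a*x**2 + b*x*y + c*y**2 + d*x + e*y + f
    assert abs(matrix_form - direct_form) < 1e-9

print("x^T A x + J^T x + f matches the conic at every test point")
```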

Exercises 7.1

1. Let $A = \begin{pmatrix} 1 & 2 \\ 1 & 0 \\ -1 & 1 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 4 \\ -1 & 1 \\ 0 & 3 \end{pmatrix}$. Find A + B, A − B and −3B.

2. Let $A = \begin{pmatrix} 0 & 4 & 2 \\ -1 & 1 & 3 \\ 2 & 0 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & -3 & 5 \\ 2 & 0 & -4 \\ 3 & 2 & 0 \end{pmatrix}$. Find the matrices AB and BA.

3. Let $A = \begin{pmatrix} 3 & 1 \\ 0 & -1 \end{pmatrix}$, $B = \begin{pmatrix} 0 & 1 \\ -1 & 1 \\ 3 & 1 \end{pmatrix}$ and $C = \begin{pmatrix} 1 & 0 & 3 \\ -1 & 1 & 1 \end{pmatrix}$. Calculate BA, AA and CB. Can any other pairs of these matrices be multiplied? Multiply those which can.

4. Calculate $\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}$.

5. If $A = \begin{pmatrix} 2 & 1 \\ -1 & 0 \\ 2 & 3 \end{pmatrix}$, $B = \begin{pmatrix} 3 & 0 \\ -2 & 1 \end{pmatrix}$ and $C = \begin{pmatrix} -1 & 2 & 3 \\ 4 & 0 & 1 \end{pmatrix}$, calculate both (AB)C and A(BC) and check that you get the same answer.

6. Calculate $\begin{pmatrix} 2 & -1 & 2 \\ 1 & 2 & -4 \\ 3 & -1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$.


7. Calculate $\begin{pmatrix} 2+i & 1+2i \\ i & 3+i \end{pmatrix} \begin{pmatrix} 2i & 2+i \\ 1+i & 1+2i \end{pmatrix}$ where i is the imaginary unit.

8. Calculate $\begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix} \begin{pmatrix} d & 0 & 0 \\ 0 & e & 0 \\ 0 & 0 & f \end{pmatrix}$.

9. Calculate

(a) $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$

(b) $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$

(c) $\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

10. Find the transposes of each of the following matrices:
$$A = \begin{pmatrix} 1 & 2 \\ 1 & 0 \\ -1 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & -3 & 5 \\ 2 & 0 & -4 \\ 3 & 2 & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}.$$

11. This question deals with the following 4 matrices with complex entries and their negatives: I, X, Y, Z where
$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} \quad \text{and} \quad Z = \begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}.$$
Show that the product of any two such matrices is again a matrix of this type by completing the following table for multiplication, where the entry in row A and column B is AB in that order.

          I    X    Y    Z   −I   −X   −Y   −Z
     I
     X
     Y
     Z
    −I
    −X
    −Y
    −Z

7.2 Matrix algebra

In this section, we shall look at algebra where the variables are matrices. This algebra is similar to high-school algebra but also differs significantly in one or two places. For example, if A and B are matrices it is not true in general that AB = BA even if both products are defined. We will learn in this section which rules of high-school algebra apply to matrices and which do not.

7.2.1 Properties of matrix addition

In Chapter 3, I introduced the idea of a binary operation. Matrix addition and multiplication both have two inputs, just as addition and multiplication of real numbers do, but there is an added complication: not all pairs of matrices can be added and not all pairs of matrices can be multiplied. Despite this difference, I shall nevertheless use the same terminology I introduced in Chapter 3, but in this slightly different setting.

(MA1) (A + B) + C = A + (B + C). This is the associative law for matrix addition.

(MA2) A + O = A = O + A. The zero matrix O, the same size as A, is the additive identity for matrices the same size as A.

(MA3) A + (−A) = O = (−A) + A. The matrix −A is the unique additive inverse of A.

(MA4) A + B = B + A. Matrix addition is commutative.

Thus matrix addition has the same properties as the addition of real numbers, apart from the fact that the sum of two matrices is only defined when they have the same size. The role of zero is played by the zero matrix O of the appropriate size.

Example 7.2.1. Calculate
$$2A - 3B + 6I$$
where
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix}.$$
Because we are dealing with matrix addition and scalar multiplication the rules we apply are the same as those in high-school algebra. We get
$$\begin{pmatrix} 8 & 1 \\ 0 & 11 \end{pmatrix}.$$
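A quick check of Example 7.2.1 in NumPy (a sketch, assuming NumPy):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [2, 1]])

# Addition and scalar multiplication obey the usual algebraic rules.
result = 2 * A - 3 * B + 6 * np.eye(2, dtype=int)
print(result)   # [[ 8  1]
                #  [ 0 11]]
```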

7.2.2 Properties of matrix multiplication

(MM1) (AB)C = A(BC). This is the associative law for matrix multiplication.

(MM2) Let A be an m × n matrix. Then $I_m A = A = A I_n$. The matrices $I_m$ and $I_n$ are the left and right multiplicative identities, respectively.

(MM3) A(B + C) = AB + AC and (B + C)A = BA + CA. These are the left and right distributivity laws for matrix multiplication over matrix addition.

Thus matrix multiplication has the same properties as the multiplication of real numbers, apart from the fact that the product is not always defined, with the following three major differences.


Warning 1: matrix multiplication is not commutative.
Consider the matrices
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}.$$
Then AB ≠ BA. One consequence of the fact that matrix multiplication is not commutative is that
$$(A+B)^2 \neq A^2 + 2AB + B^2,$$
in general (see below).

Warning 2: the product of two matrices can be a zero matrix without either matrix being a zero matrix.
Consider the matrices
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} -2 & -6 \\ 1 & 3 \end{pmatrix}.$$
Then AB = O.

Warning 3: cancellation of matrices is not allowed.
Consider the matrices
$$A = \begin{pmatrix} 0 & 2 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} -1 & 1 \\ 1 & 4 \end{pmatrix}.$$
Then A ≠ O and AB = AC but B ≠ C.
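All three warnings can be demonstrated with the matrices above (a sketch, assuming NumPy; `@` is matrix multiplication):

```python
import numpy as np

# Warning 1: multiplication is not commutative.
A1 = np.array([[1, 2], [3, 4]])
B1 = np.array([[1, 1], [-1, 1]])
print(np.array_equal(A1 @ B1, B1 @ A1))   # False

# Warning 2: a product can be zero with neither factor zero.
A2 = np.array([[1, 2], [2, 4]])
B2 = np.array([[-2, -6], [1, 3]])
print(A2 @ B2)                            # the 2 x 2 zero matrix

# Warning 3: cancellation fails: AB = AC does not force B = C.
A3 = np.array([[0, 2], [0, 1]])
B3 = np.array([[2, 3], [1, 4]])
C3 = np.array([[-1, 1], [1, 4]])
print(np.array_equal(A3 @ B3, A3 @ C3))   # True, even though B3 != C3
```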

Example 7.2.2. Calculate
$$\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} -1 & -2 \\ 2 & 1 \end{pmatrix}.$$
However you bracket the matrices to carry out the calculation you should always get
$$\begin{pmatrix} 17 & -2 \\ 3 & 0 \end{pmatrix}.$$


Example 7.2.3. Find the 3 × 2 matrix X that satisfies
$$2 \begin{pmatrix} 1 & -4 \\ -2 & 3 \\ 4 & 0 \end{pmatrix} - 4X + 3 \begin{pmatrix} 2 & 0 \\ 4 & -2 \\ 0 & 8 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}.$$
This can be solved much in the way it would be solved in high-school algebra to yield
$$X = \begin{pmatrix} 2 & -2 \\ 2 & 0 \\ 2 & 6 \end{pmatrix}.$$
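The rearrangement in Example 7.2.3 can be checked numerically (a sketch, assuming NumPy; the names M and N for the two given matrices are mine):

```python
import numpy as np

M = np.array([[1, -4], [-2, 3], [4, 0]])
N = np.array([[2, 0], [4, -2], [0, 8]])

# From 2M - 4X + 3N = O, rearrange exactly as in high-school algebra.
X = (2 * M + 3 * N) / 4
print(X)
# Substitute back to confirm 2M - 4X + 3N really is the zero matrix.
print(2 * M - 4 * X + 3 * N)
```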

Example 7.2.4. If
$$A = \begin{pmatrix} x & y \\ 3 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 4 & 1 \\ 3 & 0 \end{pmatrix}$$
commute, find x and y. Multiplying out the matrices and comparing entries yields x = 5 and y = 1.

Just how different matrix algebra is from high-school algebra is shown by the following example.

Example 7.2.5. Suppose that $X^2 = I$. Then $X^2 - I = O$ and so we may factorize to get $(X - I)(X + I) = O$. But we cannot conclude from this that X = I or X = −I, because we cannot conclude from the fact that the product of two matrices is a zero matrix that one of the matrices must itself be a zero matrix. We have seen that this is false. We therefore cannot deduce that the identity matrix has only two square roots. In fact, it has infinitely many, as we now show. Let
$$A = \begin{pmatrix} a & b \\ c & -a \end{pmatrix}$$
and suppose that $a^2 + bc = 1$. Check that $A^2 = I$. Examples of matrices satisfying these conditions are
$$\begin{pmatrix} \sqrt{1+n^2} & -n \\ n & -\sqrt{1+n^2} \end{pmatrix}$$
where n is any positive integer. Thus the 2 × 2 identity matrix has infinitely many square roots!
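The claim that every member of this family squares to the identity is easy to verify numerically (a sketch, assuming NumPy; the helper name is mine):

```python
import numpy as np

def root_of_identity(n):
    # The matrix (a b; c -a) with a = sqrt(1 + n^2), b = -n, c = n,
    # so that a^2 + bc = (1 + n^2) - n^2 = 1.
    r = np.sqrt(1 + n ** 2)
    return np.array([[r, -n],
                     [n, -r]])

for n in range(1, 6):
    A = root_of_identity(n)
    assert np.allclose(A @ A, np.eye(2))

print("A^2 = I for n = 1, ..., 5: five distinct square roots of I")
```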


7.2.3 Properties of scalar multiplication

(S1) $1A = A$.

(S2) $\lambda(A+B) = \lambda A + \lambda B$.

(S3) $(\lambda\mu)A = \lambda(\mu A)$.

(S4) $(\lambda+\mu)A = \lambda A + \mu A$.

(S5) $(\lambda A)B = A(\lambda B) = \lambda(AB)$.

7.2.4 Properties of the transpose

(T1) $(A^T)^T = A$.

(T2) $(A+B)^T = A^T + B^T$.

(T3) $(\alpha A)^T = \alpha A^T$.

(T4) $(AB)^T = B^T A^T$.

Warning! Notice that the transpose of a product reverses the order of the matrices.
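Property (T4) and the failure of the 'wrong' order are easy to see numerically (a sketch, assuming NumPy; the matrices are arbitrary choices of mine):

```python
import numpy as np

A = np.array([[1, -1, 2],
              [3, 0, 1]])   # 2 x 3
B = np.array([[0, -2],
              [2, 1],
              [1, 4]])      # 3 x 2

print(np.array_equal((A @ B).T, B.T @ A.T))   # (T4): True

# A^T B^T has size 3 x 3, not the 2 x 2 size of (AB)^T,
# so the order of the factors really must be reversed.
print((A.T @ B.T).shape)                      # (3, 3)
print((A @ B).T.shape)                        # (2, 2)
```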

There are some important consequences of the above properties:

• Because matrix addition is associative we can write sums without brackets.

• Because matrix multiplication is associative we can write matrix products without brackets.

• The left and right distributivity laws can be extended to arbitrary finite sums.


7.2.5 Some proofs

In this section, I shall prove that the algebraic properties of matrices stated really do hold. I shan’t prove all of them: just a representative sample. I shall leave you the pleasure of proving the rest. It is important to observe that all the properties of matrix algebra are ultimately proved using the properties of real numbers.

Let A be an m × n matrix whose entry in the ith row and jth column is $a_{ij}$. Let B be an n × p matrix whose entry in the jth row and kth column is $b_{jk}$. By definition $(AB)_{ik}$ is the number equal to the product of the ith row of A times the kth column of B. This is just
$$(AB)_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}.$$

Theorem 7.2.6.

1. (A+B) + C = A+ (B + C).

2. A(BC) = (AB)C.

3. (λ+ µ)A = λA+ µA.

Proof. (1) To show that (A + B) + C = A + (B + C) we have to prove two things. First, the size of (A + B) + C is the same as the size of A + (B + C). Second, elements of (A + B) + C and A + (B + C) in corresponding positions are equal. To add A and B they have to be the same size and the result will be the same size as both of them. Thus C is the same size as A and B. It’s clear that both sides of the equation really are the same size. We now compare corresponding elements:
$$((A+B)+C)_{ij} = (A+B)_{ij} + (C)_{ij} = ((A)_{ij} + (B)_{ij}) + (C)_{ij}.$$
But now we use the associativity of addition of real numbers to get
$$((A)_{ij} + (B)_{ij}) + (C)_{ij} = (A)_{ij} + ((B)_{ij} + (C)_{ij}) = (A)_{ij} + (B+C)_{ij} = (A+(B+C))_{ij},$$

as required.

(2) Let A be an m × n matrix with entries $a_{ij}$, let B be an n × p matrix with entries $b_{jk}$, and let C be a p × q matrix with entries $c_{kl}$. It’s evident that A(BC) and (AB)C have the same size, so it remains to show that corresponding elements are the same. We shall prove that
$$(A(BC))_{il} = ((AB)C)_{il}.$$

By definition
$$(A(BC))_{il} = \sum_{t=1}^{n} a_{it} (BC)_{tl}, \quad \text{and} \quad (BC)_{tl} = \sum_{s=1}^{p} b_{ts} c_{sl}.$$
Thus
$$(A(BC))_{il} = \sum_{t=1}^{n} a_{it} \left( \sum_{s=1}^{p} b_{ts} c_{sl} \right).$$
Using distributivity of multiplication over addition for real numbers this sum is just
$$\sum_{t=1}^{n} \sum_{s=1}^{p} a_{it} b_{ts} c_{sl}.$$
Now change the order in which we add up these real numbers to get
$$\sum_{s=1}^{p} \sum_{t=1}^{n} a_{it} b_{ts} c_{sl}.$$
Now use distributivity again:
$$\sum_{s=1}^{p} \left( \sum_{t=1}^{n} a_{it} b_{ts} \right) c_{sl}.$$
The sum within the brackets is just $(AB)_{is}$ and so the whole sum is
$$\sum_{s=1}^{p} (AB)_{is} c_{sl}$$
which is precisely $((AB)C)_{il}$.

(3) Clearly (λ + µ)A and λA + µA have the same size. We show that corresponding elements are the same:
$$((\lambda+\mu)A)_{ij} = (\lambda+\mu)(A)_{ij} = \lambda(A)_{ij} + \mu(A)_{ij} = (\lambda A)_{ij} + (\mu A)_{ij}$$
which is just $(\lambda A + \mu A)_{ij}$, as required.

Warning! In (4) below, notice how matrices are reversed.

Theorem 7.2.7.

1. $(A^T)^T = A$.

2. $(A+B)^T = A^T + B^T$.

3. $(\alpha A)^T = \alpha A^T$.

4. $(AB)^T = B^T A^T$.

Proof. (1) We have that
$$((A^T)^T)_{ij} = (A^T)_{ji} = (A)_{ij}.$$

(2) We have that
$$((A+B)^T)_{ij} = (A+B)_{ji} = (A)_{ji} + (B)_{ji} = (A^T)_{ij} + (B^T)_{ij}$$
which is just $(A^T + B^T)_{ij}$.

(3) We have that
$$((\alpha A)^T)_{ij} = (\alpha A)_{ji} = \alpha(A)_{ji} = \alpha(A^T)_{ij} = (\alpha A^T)_{ij}.$$

(4) Let A be an m × n matrix and B an n × p matrix. Thus AB is defined and is m × p. Hence $(AB)^T$ is p × m. Now $B^T$ is p × n and $A^T$ is n × m. Thus $B^T A^T$ is defined and is p × m. Hence $(AB)^T$ and $B^T A^T$ have the same size. We now show that corresponding elements are equal. By definition
$$((AB)^T)_{ij} = (AB)_{ji}.$$


This is equal to
$$\sum_{s=1}^{n} (A)_{js}(B)_{si} = \sum_{s=1}^{n} (A^T)_{sj}(B^T)_{is}.$$
But real numbers commute under multiplication and so
$$\sum_{s=1}^{n} (A^T)_{sj}(B^T)_{is} = \sum_{s=1}^{n} (B^T)_{is}(A^T)_{sj} = (B^T A^T)_{ij},$$
as required.

Quantum Mechanics

Quantum mechanics is one of the fundamental theories of physics. At its heart are matrices. We have defined the transpose of a matrix, but for matrices with complex entries there is another, related, operation. Given any complex matrix A we define the matrix $A^\dagger$ to be the one obtained by transposing A and then taking the complex conjugate of all entries. It is therefore the conjugate-transpose of A. A matrix A is called Hermitian if $A^\dagger = A$; when A is a real matrix, we just get back the definition of a symmetric matrix. It turns out that quantum mechanics is based on Hermitian matrices and their generalizations. The fact that matrix multiplication is not commutative is one of the reasons that quantum mechanics is so different from classical mechanics. The theory of quantum computing makes heavy use of Hermitian matrices and their properties.
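The conjugate-transpose is one method call away in NumPy (a sketch, assuming NumPy; the particular Hermitian matrix below is my own example):

```python
import numpy as np

A = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])

A_dagger = A.conj().T           # transpose, then conjugate every entry

print(np.array_equal(A_dagger, A))    # True: A is Hermitian

# For a real matrix, conjugation does nothing, so Hermitian
# reduces to symmetric.
S = np.array([[1.0, 5.0],
              [5.0, 2.0]])
print(np.array_equal(S.conj().T, S))  # True
```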

Exercises 7.2

1. Calculate

\begin{pmatrix} 2 & 0 \\ 7 & -1 \end{pmatrix} + \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} + \begin{pmatrix} 2 & 2 \\ 3 & 3 \end{pmatrix}

2. Calculate

\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \begin{pmatrix} 3 & 2 & 1 \end{pmatrix} \quad and \quad \begin{pmatrix} 1 \\ -1 \\ -4 \end{pmatrix} \begin{pmatrix} 3 & 1 & 5 \end{pmatrix}


3. Calculate

\begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} 5 & 4 \\ 4 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}

4. If

A = \begin{pmatrix} 1 & -1 \\ 1 & 2 \end{pmatrix}

calculate A^2, A^3 and A^4.

5. Let A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} and x = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. Calculate Ax, A^2x, A^3x, A^4x and A^5x. What do you notice?

6. Calculate A^2 where

A = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}

7. Show that

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}

satisfies A^2 − 5A − 2I = O.

8. Let A be the following 3 × 3 matrix

\begin{pmatrix} 2 & 4 & 4 \\ 0 & 1 & -1 \\ 0 & 1 & 3 \end{pmatrix}

Calculate

A^3 − 6A^2 + 12A − 8I

where I is the 3 × 3 identity matrix.

9. Let A = \begin{pmatrix} 3 & 1 & -1 \\ 2 & 2 & -1 \\ 2 & 2 & 0 \end{pmatrix}. Calculate

A^3 − 5A^2 + 8A − 4I

where I is the 3 × 3 identity matrix.


10. If 3X + A = B, find X in terms of A and B.

11. If X + Y = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix} and X − Y = \begin{pmatrix} 2 & 2 \\ 1 & 1 \end{pmatrix} find X and Y.

12. If AB = BA show that A^2B = BA^2.

13. Is it true that AABB = ABAB?

14. Show that (A + B)^2 − (A − B)^2 = 2(AB + BA).

15. Let A and B be n × n matrices. Is it necessarily true that

(A − B)(A + B) = A^2 − B^2?

If so, prove it. If not, find a counterexample.

16. Expand (A + I)^4 carefully.

17. A matrix A is said to be symmetric if A^T = A.

(a) Show that a symmetric matrix must be square.

(b) Show that if A is any matrix then AA^T is defined and symmetric.

(c) Let A and B be symmetric matrices of the same size. Prove that AB is symmetric if and only if AB = BA.

18. An n × n-matrix A is said to be skew-symmetric if A^T = −A.

(a) Show that the diagonal entries of a skew-symmetric matrix are all zero.

(b) If B is any n × n-matrix, show that B + B^T is symmetric and that B − B^T is skew-symmetric.

(c) Deduce that every square matrix can be expressed as the sum of a symmetric matrix and a skew-symmetric matrix.

19. Let A, B and C be square matrices of the same size. Define [A,B] = AB − BA. Calculate

[[A,B], C] + [[B,C], A] + [[C,A], B].


20. Let A be a 2 × 2 matrix such that AB = BA for all 2 × 2 matrices B. Show that

A = \begin{pmatrix} λ & 0 \\ 0 & λ \end{pmatrix}

for some scalar λ.

21. Let A be a 2 × 2 matrix. The trace of A, denoted tr(A), is the sum of the diagonal elements.

(a) Show that tr(A + B) = tr(A) + tr(B); tr(λA) = λ tr(A); tr(AB) = tr(BA).

(b) Let A be a known matrix. Show that the equation AX − XA = I cannot be solved for X.

7.3 Solving systems of linear equations

The goal of this section is to use matrices to help us solve systems of linear equations. We begin by proving some general results on linear equations, and then we describe Gaussian elimination, an algorithm for solving systems of linear equations.

7.3.1 Some theory

A system of m linear equations in n unknowns is a list of equations of the following form

a_{11}x_1 + a_{12}x_2 + ... + a_{1n}x_n = b_1

a_{21}x_1 + a_{22}x_2 + ... + a_{2n}x_n = b_2

...

a_{m1}x_1 + a_{m2}x_2 + ... + a_{mn}x_n = b_m

A solution is a sequence of values of x_1, ..., x_n that satisfy all the equations. The set of all solutions is called the solution set or general solution.

The equations above can be conveniently represented using matrices. Let A be the m × n matrix (A)_{ij} = a_{ij}, let b be the m × 1 matrix (b)_{i1} = b_i, and


let x be the n × 1 matrix (x)_{j1} = x_j. Then the system of linear equations above can be written in the form

Ax = b

If b is a zero matrix, we say that the equations are homogeneous; otherwise they are said to be inhomogeneous.

A system of linear equations that has no solution is said to be inconsistent; otherwise, it is said to be consistent.

We begin with some results that tell us what to expect when solving systems of linear equations.

Proposition 7.3.1. Homogeneous equations Ax = 0 are always consistent, because x = 0 is always a solution. In addition, the sum of any two solutions is again a solution, and the scalar multiple of any solution is again a solution.

Proof. Let Ax = 0 be our homogeneous system of equations. Let a and b be solutions. That is, Aa = 0 and Ab = 0. We now calculate A(a + b). To do this we use the fact that matrix multiplication satisfies the left distributivity law

A(a + b) = Aa + Ab = 0 + 0 = 0.

Now let a be a solution and λ any scalar. Then

A(λa) = λAa = λ0 = 0.
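These closure properties can be checked on any concrete homogeneous system. In the Python sketch below, the matrix A and the solutions a and b are made-up values chosen so that Aa = 0 and Ab = 0:

```python
def apply(A, x):
    # Compute the column matrix Ax.
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

# Made-up homogeneous system: x + y - 2z = 0 and x - y = 0.
A = [[1, 1, -2], [1, -1, 0]]
a = [1, 1, 1]                          # a solution: Aa = 0
b = [3, 3, 3]                          # another solution: Ab = 0
s = [a[i] + b[i] for i in range(3)]    # sum of the two solutions
m = [5 * a[i] for i in range(3)]       # a scalar multiple of a solution

print(apply(A, s), apply(A, m))  # [0, 0] [0, 0]
```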

Proposition 7.3.2. Let

Ax = b

be a consistent system of linear equations. Let p be any one solution. Then every solution of the equation is of the form p + h for some solution h of Ax = 0.

Proof. Let a be any solution to Ax = b and put h = a − p. Then Ah = Aa − Ap = b − b = 0, so h is a solution of Ax = 0 and a = p + h, as required.

Theorem 7.3.3 (Fundamental theorem of linear equations). A system of linear equations Ax = b has either

• No solutions.


• Exactly one solution.

• Infinitely many solutions.

Proof. We prove that if we can find two different solutions we can in fact find infinitely many solutions. Let u and v be two distinct solutions to this equation; then Au = b and Av = b. Consider now the column matrix w = u − v. Then

Aw = A(u − v) = Au − Av = 0

using the distributive law. Thus w is a non-zero column matrix that satisfies the equation Ax = 0. Consider now the column matrices of the form

u + λw

where λ is any real number. Since w is non-zero, this is a set of infinitely many different column matrices. We calculate

A(u + λw) = Au + λAw = b

using the distributive law and properties of scalars. It follows that the infinitely many column matrices u + λw are solutions to the equation Ax = b.
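The proof can be replayed numerically. In the Python sketch below, the system and the two distinct solutions u and v are made-up values; a sample of the solutions u + λw is then checked:

```python
def apply(A, x):
    # Compute the column matrix Ax.
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

# Made-up consistent system x + 2y - 3z = 6 with two distinct solutions.
A, b = [[1, 2, -3]], [6]
u, v = [6, 0, 0], [1, 1, -1]
w = [u[i] - v[i] for i in range(3)]    # w = u - v satisfies Aw = 0

# Each u + lambda*w is again a solution, one for every real lambda.
for lam in [0, 1, -2, 7]:
    x = [u[i] + lam * w[i] for i in range(3)]
    assert apply(A, x) == b
print("u + lambda*w solves the system at every sampled lambda")
```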

7.3.2 Gaussian elimination

In this section, we shall develop an algorithm that takes as input a system of linear equations and produces as output the following: if the system has no solutions, it will tell us so; if it has solutions, it will determine them all. Our method is based on three simple ideas:

1. Certain systems of linear equations have a shape that makes them very easy to solve.

2. Certain operations can be carried out on systems of linear equations which simplify them but do not change the solutions.

3. Everything can be done using matrices.

Here are examples of each of these ideas.


Example 7.3.4. The system of equations

2x + 3y = 1

−y = 3

is very easy to solve. From the second equation we get y = −3. Substituting this value into the first equation gives us x = 5. We can check that this solution is correct by checking that these two values satisfy every equation.

Example 7.3.5. The system of equations

2x + 3y = 1

x + y = 2

can be converted into a system with the same solutions but which is easier to solve. Multiply the second equation by 2. This gives us the new equations

2x + 3y = 1

2x + 2y = 4

which have the same solutions as the original equations. Next, subtract the first equation from the second equation to get

2x + 3y = 1

−y = 3

These equations also have the same solutions as the original equations, but they can now be easily solved.

Example 7.3.6. The system of equations

2x + 3y = 1

x + y = 2

can be written in matrix form as the matrix equation

\begin{pmatrix} 2 & 3 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}


For the purposes of our algorithm, we rewrite this equation in terms of what is called an augmented matrix

\begin{pmatrix} 2 & 3 & 1 \\ 1 & 1 & 2 \end{pmatrix}

The operations carried out in the previous example can be applied directly to the augmented matrix.

\begin{pmatrix} 2 & 3 & 1 \\ 1 & 1 & 2 \end{pmatrix} =⇒ \begin{pmatrix} 2 & 3 & 1 \\ 2 & 2 & 4 \end{pmatrix} =⇒ \begin{pmatrix} 2 & 3 & 1 \\ 0 & -1 & 3 \end{pmatrix}

This augmented matrix can then be converted back into the usual matrix form and solved

2x + 3y = 1

−y = 3

We now formalize the above ideas.

A matrix is called a row echelon matrix, or said to be in row echelon form, if it satisfies the following three conditions:

1. Any zero rows are at the bottom of the matrix.

2. If there are non-zero rows then they begin with the number 1, called the leading 1.

3. In the column beneath a leading 1, the elements are all zero.

The following operations on a matrix are called elementary row operations:

1. Multiply row i by a non-zero scalar λ. We notate this operation by R_i ← λR_i.

2. Interchange rows i and j. We notate this operation by R_i ↔ R_j.

3. Add a multiple λ of row i to another row j. We notate this operation by R_j ← R_j + λR_i.
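In code, each elementary row operation is a one-line transformation of a matrix stored as a list of rows. The following Python sketch (the function names are my own) implements all three and replays the reduction from Example 7.3.6, with rows indexed from 0:

```python
def scale(M, i, lam):
    # R_i <- lam * R_i  (lam must be non-zero)
    M[i] = [lam * x for x in M[i]]

def swap(M, i, j):
    # R_i <-> R_j
    M[i], M[j] = M[j], M[i]

def add_multiple(M, j, lam, i):
    # R_j <- R_j + lam * R_i
    M[j] = [a + lam * b for a, b in zip(M[j], M[i])]

# The augmented matrix of Example 7.3.6.
M = [[2, 3, 1], [1, 1, 2]]
scale(M, 1, 2)               # R2 <- 2 R2
add_multiple(M, 1, -1, 0)    # R2 <- R2 - R1
print(M)  # [[2, 3, 1], [0, -1, 3]]
```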

The following result is not hard to prove.

Proposition 7.3.7. Applying elementary row operations to a system of linear equations does not change its set of solutions.


Given a system of linear equations

Ax = b

the matrix

(A|b)

is called the augmented matrix.

Algorithm 7.3.8. (Gaussian elimination) This is an algorithm for solving systems of linear equations. In outline, the algorithm runs as follows:

1. Given a system of equations

Ax = b

form the augmented matrix

(A|b).

2. By using elementary row operations, convert

(A|b)

into an augmented matrix

(A′|b′)

which is a row echelon matrix.

3. Solve the equations obtained from

(A′|b′)

by back substitution.

Remarks

• The process in step (2) has to be carried out systematically to avoid going around in circles.

• Elementary row operations applied to a set of linear equations do not change the solution set. Thus the solution sets of

Ax = b and A′x = b′

are the same.


• Solving systems of linear equations where the associated augmented matrix is a row echelon matrix is easy and can be accomplished by back substitution.

Here is a more detailed description of step (2) of the algorithm. The input is a matrix B and the output is a matrix B′ which is a row echelon matrix:

1. Locate the leftmost column that does not consist entirely of zeros.

2. Interchange the top row with another row if necessary to bring a non-zero entry to the top of the column found in step 1.

3. If the entry now at the top of the column found in step 1 is a, then multiply the first row by 1/a in order to introduce a leading 1.

4. Add suitable multiples of the top row to the rows below so that all entries below the leading 1 become zeros.

5. Now cover up the top row, and begin again with step 1 applied to the matrix that remains. Continue in this way until the entire matrix is a row echelon matrix.

The important thing to remember is to start at the top and work downwards.

Here is a more detailed description of step (3) of the algorithm. Let A′x = b′ be a system of equations where the augmented matrix is a row echelon matrix and where there is more than one solution. The variables are divided into two groups: those variables corresponding to the columns of A′ containing leading 1's, called leading variables, and the rest, called free variables. We solve for the leading variables in terms of the free variables; the free variables can be assigned arbitrary values independently of each other.
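Steps (2) and (3) can be sketched in a few lines of Python. The code below is my own illustration, not the text's notation: it uses exact fractions to avoid rounding, and the back-substitution step assigns every free variable the value 0, so for a consistent system it returns one particular solution.

```python
from fractions import Fraction

def row_echelon(M):
    # Step (2): reduce the matrix M (a list of rows) to row echelon form,
    # working from the top downwards.
    M = [[Fraction(x) for x in row] for row in M]
    rows, r = len(M), 0
    for c in range(len(M[0])):
        # Locate a non-zero entry in column c, in row r or below.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]      # interchange rows
        M[r] = [x / M[r][c] for x in M[r]]   # introduce a leading 1
        for i in range(r + 1, rows):         # zero the entries below it
            M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

def back_substitute(M):
    # Step (3): read the unknowns off from the bottom row upwards.
    # Free variables are given the value 0.
    n = len(M[0]) - 1
    x = [Fraction(0)] * n
    for row in reversed(M):
        lead = next((j for j in range(n) if row[j] != 0), None)
        if lead is not None:
            x[lead] = row[n] - sum(row[j] * x[j] for j in range(lead + 1, n))
    return x

# The system with unique solution x = -5, y = 3, z = 1 that is worked
# through below.
M = row_echelon([[1, 2, 3, 4], [2, 2, 4, 0], [3, 4, 5, 2]])
print([int(v) for v in back_substitute(M)])  # [-5, 3, 1]
```

No check for inconsistency is made here: a row of the form (0 ... 0 | c) with c ≠ 0 would have to be detected separately.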

Examples 7.3.9.

1. Show that the following system of equations is inconsistent (i.e. has no solutions).

x + 2y − 3z = −1

3x − y + 2z = 7

5x + 3y − 4z = 2


The first step is to write down the augmented matrix of the system. In this case, this is the matrix

\begin{pmatrix} 1 & 2 & -3 & -1 \\ 3 & -1 & 2 & 7 \\ 5 & 3 & -4 & 2 \end{pmatrix}

Carry out the elementary row operations R_2 ← R_2 − 3R_1 and R_3 ← R_3 − 5R_1. This gives us

\begin{pmatrix} 1 & 2 & -3 & -1 \\ 0 & -7 & 11 & 10 \\ 0 & -7 & 11 & 7 \end{pmatrix}

Now carry out the elementary row operation R_3 ← R_3 − R_2 which yields

\begin{pmatrix} 1 & 2 & -3 & -1 \\ 0 & -7 & 11 & 10 \\ 0 & 0 & 0 & -3 \end{pmatrix}

The equation corresponding to the last line of the augmented matrix is 0x + 0y + 0z = −3. Clearly, this equation has no solutions and so the original set of equations has no solutions.

2. Show that the following system of equations has exactly one solution, and check it.

x + 2y + 3z = 4

2x + 2y + 4z = 0

3x + 4y + 5z = 2

We first write down the augmented matrix

\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 2 & 4 & 0 \\ 3 & 4 & 5 & 2 \end{pmatrix}

We then carry out the elementary row operations R_2 ← R_2 − 2R_1 and R_3 ← R_3 − 3R_1 to get

\begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -2 & -2 & -8 \\ 0 & -2 & -4 & -10 \end{pmatrix}


Then carry out the elementary row operations R_2 ← −\frac{1}{2}R_2 and R_3 ← −\frac{1}{2}R_3 that yield

\begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 4 \\ 0 & 1 & 2 & 5 \end{pmatrix}

Finally, carry out the elementary row operation R_3 ← R_3 − R_2

\begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 4 \\ 0 & 0 & 1 & 1 \end{pmatrix}

This is now a row echelon matrix. Write down the corresponding set of equations

x + 2y + 3z = 4

y + z = 4

z = 1

Now solve by back substitution to get x = −5, y = 3 and z = 1. Finally, we check that

\begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 4 \\ 3 & 4 & 5 \end{pmatrix} \begin{pmatrix} -5 \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 0 \\ 2 \end{pmatrix}

3. Show that the following system of equations has infinitely many solutions, and check them.

x + 2y − 3z = 6

2x − y + 4z = 2

4x + 3y − 2z = 14

The augmented matrix for this system is

\begin{pmatrix} 1 & 2 & -3 & 6 \\ 2 & -1 & 4 & 2 \\ 4 & 3 & -2 & 14 \end{pmatrix}


We transform this matrix into an echelon matrix by means of the following elementary row operations: R_2 ← R_2 − 2R_1, R_3 ← R_3 − 4R_1, R_2 ← −\frac{1}{5}R_2, R_3 ← −\frac{1}{5}R_3 and R_3 ← R_3 − R_2. This yields

\begin{pmatrix} 1 & 2 & -3 & 6 \\ 0 & 1 & -2 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}

Because the bottom row consists entirely of zeros, this means that we have only two equations

x + 2y − 3z = 6

y − 2z = 2

By back substitution, both x and y can be expressed in terms of z, and z may take any value we like. We say that z is a free variable. Let z = λ ∈ R. Then the set of solutions can be written in the form

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix} + λ \begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix}

We now check that these solutions work

\begin{pmatrix} 1 & 2 & -3 \\ 2 & -1 & 4 \\ 4 & 3 & -2 \end{pmatrix} \begin{pmatrix} 2 - λ \\ 2 + 2λ \\ λ \end{pmatrix} = \begin{pmatrix} 6 \\ 2 \\ 14 \end{pmatrix}

as required.

Exercises 7.3

1. In each case, determine whether the system of equations is consistent or not. When consistent, find all solutions and show that they work.

(a)

2x + y − z = 1

3x + 3y − z = 2

2x + 4y + 0z = 2


(b)

2x + y − z = 1

3x + 3y − z = 2

2x + 4y + 0z = 3

(c)

2x + y − 2z = 10

3x + 2y + 2z = 1

5x + 4y + 3z = 4

(d)

x + y + z + w = 0

4x + 5y + 3z + 3w = 1

2x + 3y + z + w = 1

5x + 7y + 3z + 3w = 2

7.4 Blankinship’s algorithm

The ideas of this chapter lead to an alternative, and better, procedure for calculating the integers x and y such that gcd(a, b) = xa + yb. It was described by W. A. Blankinship in his paper 'A new version of the Euclidean algorithm', American Mathematical Monthly 70 (1963), 742–745. To explain how it works, let's go back to the basic step of Euclid's algorithm. If a ≥ b then we divide b into a and write

a = bq + r

where 0 ≤ r < b. The key point is that gcd(a, b) = gcd(b, r). We shall now think of (a, b) and (b, r) as column matrices

\begin{pmatrix} a \\ b \end{pmatrix}, \quad \begin{pmatrix} r \\ b \end{pmatrix}.

We want the 2 × 2 matrix that maps \begin{pmatrix} a \\ b \end{pmatrix} to \begin{pmatrix} r \\ b \end{pmatrix}.


This is the matrix

\begin{pmatrix} 1 & -q \\ 0 & 1 \end{pmatrix}.

Thus

\begin{pmatrix} 1 & -q \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} r \\ b \end{pmatrix}.

Finally, we can describe the process by the following matrix operation

\begin{pmatrix} 1 & 0 & a \\ 0 & 1 & b \end{pmatrix} → \begin{pmatrix} 1 & -q & r \\ 0 & 1 & b \end{pmatrix}

by carrying out an elementary row operation. This procedure can be iterated. It will terminate when one of the entries in the right-hand column is 0. The non-zero entry will then be the greatest common divisor of a and b, and the matrix on the left-hand side records how 0 and gcd(a, b) were obtained from a and b, so it provides the same information as the Euclidean algorithm. All of this is best illustrated by means of an example.

Let’s calculate x, y such that gcd(2520, 154) = x2520 + y154. We startwith the matrix (

1 0 25200 1 154

)

If we divide 154 into 2520 it goes 16 times plus a remainder. Thus we subtract 16 times the second row from the first to get

\begin{pmatrix} 1 & -16 & 56 \\ 0 & 1 & 154 \end{pmatrix}

We now repeat the process but, since the larger number, 154, is on the bottom, we have to subtract some multiple of the first row from the second. This time we subtract twice the first row from the second to get

\begin{pmatrix} 1 & -16 & 56 \\ -2 & 33 & 42 \end{pmatrix}

Now repeat this procedure to get

\begin{pmatrix} 3 & -49 & 14 \\ -2 & 33 & 42 \end{pmatrix}


And again

\begin{pmatrix} 3 & -49 & 14 \\ -11 & 180 & 0 \end{pmatrix}

The process now terminates because we have a zero in the rightmost column. The non-zero entry in the rightmost column is gcd(2520, 154). We also know that

\begin{pmatrix} 3 & -49 \\ -11 & 180 \end{pmatrix} \begin{pmatrix} 2520 \\ 154 \end{pmatrix} = \begin{pmatrix} 14 \\ 0 \end{pmatrix}.

Now this matrix equation corresponds to two equations. The bottom one can be verified. The top one says that

14 = 3× 2520− 49× 154

which is both true and solves the extended Euclidean problem!
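The whole procedure is easy to mechanize. The following Python sketch (the function name is my own) carries the 2 × 3 matrix through the same row operations and recovers the worked example above:

```python
def blankinship(a, b):
    # Row-reduce (1 0 a / 0 1 b) until a 0 appears in the right-hand
    # column; the other right-hand entry is then gcd(a, b), and its row
    # holds coefficients x, y with x*a + y*b = gcd(a, b).
    top, bot = [1, 0, a], [0, 1, b]
    while top[2] != 0 and bot[2] != 0:
        if top[2] >= bot[2]:
            q = top[2] // bot[2]
            top = [t - q * s for t, s in zip(top, bot)]
        else:
            q = bot[2] // top[2]
            bot = [s - q * t for s, t in zip(bot, top)]
    x, y, g = top if top[2] != 0 else bot
    return g, x, y

print(blankinship(2520, 154))  # (14, 3, -49): 14 = 3*2520 - 49*154
```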