
Mathematics for Engineers & Scientists MATH1551

Lecture Notes

Contents

1 Linear Algebra

1.0 Introduction
1.1 Gaussian elimination
1.2 Solvability
1.3 Determinants
1.4 Matrices
1.5 Inverse Matrices
1.6 LU Decomposition
1.7 PᵀLU Decomposition
1.8 Rounding Errors and Ill-conditioned Matrices
1.9 Iterative Methods: Jacobi's Method
1.10 Iterative Methods: Gauss-Seidel Method
1.11 Iterative Methods: SOR Method

2 Complex Numbers

2.0 Introduction
2.1 Complex Arithmetic
2.2 Polar Form
2.3 Euler's Formula
2.4 Powers of Complex Numbers
2.5 Solving Polynomial Equations
2.6 Complex Logarithms
2.7 De Moivre's Theorem
2.8 Trigonometric Functions
2.9 Application to Geometry
2.10 Application to AC Circuits

3 Vectors

3.0 Introduction
3.1 Basic Rules
3.2 Coordinates and Bases
3.3 The Scalar Product
3.4 Projections and Orthonormality
3.5 Lines in 3 dimensions
3.6 Planes in 3 dimensions
3.7 The Vector Product
3.8 Distances Between Lines
3.9 The Scalar Triple Product
3.10 Application to Mechanics
3.11 Eigenvalues and Eigenvectors
3.12 Diagonalisation

1 Linear Algebra

1.0 Introduction

In engineering and science, we often need to solve simultaneous equations like

20x − 15z = 220,
25y − 10z = 0,
−15x − 10y + 45z = 0.

This is a system of linear equations, where "linear" means they only involve first powers of variables (so constant multiples of x, y and z but nothing like x² or sin z). By "solve", we mean find values for x, y and z that satisfy all the equations simultaneously. Linear Algebra begins with the study of systems like this. Here is an example of a problem that would give rise to the equations above.

Example. Currents in an electrical network leading to linear equations (circuit diagram omitted).

Kirchhoff's law tells us that the sum of voltage drops is equal to the sum of voltage sources around any loop. Applying it to the three loops shown, along with Ohm's law for each resistor, gives the simultaneous equations

5i₁ + 15(i₁ − i₃) = 220,
10(i₂ − i₃) + 5i₂ + 10i₂ = 0,
20i₃ + 10(i₃ − i₂) + 15(i₃ − i₁) = 0.

Simplifying, we get our original system

20i₁ − 15i₃ = 220,
25i₂ − 10i₃ = 0,
−15i₁ − 10i₂ + 45i₃ = 0.

An electrical engineer faced with these equations might use MATLAB and type

coeffs = [20 0 -15; 0 25 -10; -15 -10 45]
rhs = [220; 0; 0]
linsolve(coeffs, rhs)

This will output the solutions for x, y, z, or equivalently the currents i₁, i₂ and i₃:

ans =
   15.1597
    2.2185
    5.5462

In this topic, we'll learn how the computer found this solution (a method called Gaussian elimination). It's important to understand the basics of what is happening "under the hood" because it doesn't always work as smoothly as this. Consider the equations

x + 2y + 3z = 1,
4x + 5y + 6z = 0,
7x + 8y + 9z = 2.

If we try to solve this system in MATLAB as before, typing

coeffs2 = [1 2 3; 4 5 6; 7 8 9]
rhs2 = [1; 0; 2]
linsolve(coeffs2, rhs2)

we get

Warning: Matrix is close to singular or badly scaled. Results may be inaccurate.
ans = 1.0e+16 *
    0.9458
   -1.8915
    0.9458

The warning message gives us a clue that something is amiss. In fact, not only does this "solution" not satisfy the original equations, no solution exists at all! This issue has come about purely due to rounding error in the computer. We'll see how to determine whether a system of equations has a solution or not, using the determinant.

Another thing that can go wrong is that the calculation with Gaussian elimination is too slow, particularly for large systems of equations. Although we will concentrate on small systems (for practical reasons!), we'll also learn about some alternative iterative methods that come into their own for the large linear systems often encountered in science and engineering. Large systems of equations often arise from spatial problems such as the following.

Example. In studying a steady state temperature in a square with top and left sides held at 100°C and bottom and right at 0°C, a typical numerical approach is to divide into a discrete grid of points. For example, we might take N = 16 points x₁, . . . , x₁₆ on a 4 × 4 grid.

In a steady state, the temperature at an interior point is (approximately) the average of its four neighbours. The values on the edges are fixed at

x₁ = x₅ = x₉ = x₁₃ = x₁₄ = x₁₅ = x₁₆ = 100,
x₂ = x₃ = x₄ = x₈ = x₁₂ = 0,

leaving only four unknowns remaining for the interior values, determined by

−(1/4)x₂ − (1/4)x₅ + x₆ − (1/4)x₇ − (1/4)x₁₀ = 0,
−(1/4)x₃ − (1/4)x₆ + x₇ − (1/4)x₈ − (1/4)x₁₁ = 0,
−(1/4)x₆ − (1/4)x₉ + x₁₀ − (1/4)x₁₁ − (1/4)x₁₄ = 0,
−(1/4)x₇ − (1/4)x₁₀ + x₁₁ − (1/4)x₁₂ − (1/4)x₁₅ = 0.

Increasing the resolution quickly increases the number of equations. For instance, with N = 32² and N = 128² points, MATLAB produces the corresponding numerical solutions.

Increasing the resolution further requires alternative "cleverer" methods, and maybe you can see how further problems arise in three-dimensional calculations.

Let's finish by thinking about what would make a system of three equations easier to solve. The easiest case would be a diagonal system where the equations are decoupled, for example

x = 2,

−y = 1,

2z = 6.

We can just read off the solution here, giving x = 2, y = −1, z = 3.

The next simplest system is one that is triangular in form, for example

x + 2y − z = −3,
y − 3z = −10,
z = 3.

We can solve this system by back substitution: first find z = 3 from the last equation. Then substitute this into the second equation to find y − 9 = −10, giving y = −1. Finally substitute these values of y and z into the first equation to find x − 5 = −3, giving x = 2.
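▶ Back substitution is easy to automate. Here is a minimal MATLAB sketch (not from the original notes; variable names are our own) that solves the triangular system above, with the coefficients in U and the right-hand sides in c:

U = [1 2 -1; 0 1 -3; 0 0 1];   % the triangular system above
c = [-3; -10; 3];
n = length(c);
x = zeros(n, 1);
for i = n:-1:1
    % subtract the already-known terms, then divide by the diagonal entry
    x(i) = (c(i) - U(i, i+1:n) * x(i+1:n)) / U(i, i);
end
x   % gives x = 2, y = -1, z = 3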


1.1 Gaussian elimination

The idea behind Gaussian elimination is to systematically convert a linear system into a more convenient form that can be solved easily.

▶ Sometimes textbooks convert to the triangular form, which can then be solved quickly by back substitution. Other times, they go further and transform the system to a completely decoupled diagonal form where you can just read off the answer. This second way is useful but can take longer (more steps), and numerical software (like MATLAB or numpy/Python) usually does the first, "triangular" way.

So how do we "transform" the system from one form to another? There are three simple operations we can perform that don't change the solution:

(Ri ↔ Rj): Swap two equations. For example,

x + 2y − z = −3
x + y + 2z = 7
2x + 5y − 4z = −13

becomes, after R1 ↔ R2,

x + y + 2z = 7
x + 2y − z = −3
2x + 5y − 4z = −13

(cRi): Multiply an equation by a non-zero constant. For example,

x + 2y − z = −3
x + y + 2z = 7
2x + 5y − 4z = −13

becomes, after 3R1,

3x + 6y − 3z = −9
x + y + 2z = 7
2x + 5y − 4z = −13

(Ri + cRj): Add/subtract a multiple of one equation to another. For example,

x + 2y − z = −3
x + y + 2z = 7
2x + 5y − 4z = −13

becomes, after R3 − 2R1,

x + 2y − z = −3
x + y + 2z = 7
y − 2z = −7

Hopefully you can see these operations don't affect the solutions to the system. They are called elementary row operations (EROs). When performing these operations, it is useful to have concise notation for the system. We simply drop the unnecessary symbols, just keeping the numbers written in a so-called augmented matrix. For example,

x + 2y − z = −3
x + y + 2z = 7
2x + 5y − 4z = −13

is written as

[ 1  2 −1 | −3 ]
[ 1  1  2 |  7 ]
[ 2  5 −4 | −13 ]

Notice that the right-hand sides (the constant terms) of the equations are to the right of the vertical line. So our aim in Gaussian elimination is to transform this into the triangular form looking like

[ 1  ?  ? | ? ]
[ 0  1  ? | ? ]
[ 0  0  1 | ? ]

There is a systematic algorithm, which is easiest to illustrate with examples.

Example. Gaussian elimination for the above 3 × 3 system. We want to obtain zeros below the diagonal and it's convenient to do this one column at a time, starting from the left. First, eliminate x from R2 and R3 by subtracting appropriate multiples of R1:

[ 1  2 −1 | −3 ]
[ 1  1  2 |  7 ]
[ 2  5 −4 | −13 ]

R2 − R1:

[ 1  2 −1 | −3 ]
[ 0 −1  3 | 10 ]
[ 2  5 −4 | −13 ]

R3 − 2R1:

[ 1  2 −1 | −3 ]
[ 0 −1  3 | 10 ]
[ 0  1 −2 | −7 ]

Next we change the coefficient of y in R2 to 1, by multiplying R2 by −1:

−R2:

[ 1  2 −1 | −3 ]
[ 0  1 −3 | −10 ]
[ 0  1 −2 | −7 ]

Finally, we eliminate y from R3 by subtracting an appropriate multiple of R2:

R3 − R2:

[ 1  2 −1 | −3 ]
[ 0  1 −3 | −10 ]
[ 0  0  1 |  3 ]

We have reached the triangular form that we hoped for, with 1's on the diagonal, and we could go on to solve by back substitution.
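▶ The same algorithm is easy to express in code. A minimal MATLAB sketch (our own, not from the original notes) that reduces the augmented matrix above to triangular form; it scales each pivot row to 1 before clearing the entries below it, reaching the same result as the hand calculation. It assumes no zero pivots arise — row swaps are discussed in the next example.

M = [1 2 -1 -3; 1 1 2 7; 2 5 -4 -13];   % the augmented matrix [A | b]
n = size(M, 1);
for j = 1:n
    M(j,:) = M(j,:) / M(j,j);                % scale so the pivot becomes 1
    for i = j+1:n
        M(i,:) = M(i,:) - M(i,j) * M(j,:);   % eliminate below the pivot
    end
end
M   % rows now read: x + 2y - z = -3, y - 3z = -10, z = 3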

Depending on the initial system, it could be that we need to swap rows somewhere.

Example. Use Gaussian elimination to transform the system

[ 0  0  3 | 9 ]
[ 1  2  1 | 0 ]
[ 2  1  0 | 3 ]

to triangular form, and hence find the solution by back substitution. The top-left element is zero, so we can't eliminate x from R2 and R3 by subtracting multiples of R1. Instead, we swap R1 and R2 first:

R1 ↔ R2:

[ 1  2  1 | 0 ]
[ 0  0  3 | 9 ]
[ 2  1  0 | 3 ]

R3 − 2R1:

[ 1  2  1 | 0 ]
[ 0  0  3 | 9 ]
[ 0 −3 −2 | 3 ]

Now we have the same problem again: we can't eliminate y from R3 by subtracting a multiple of R2. So we swap R2 and R3.

R2 ↔ R3:

[ 1  2  1 | 0 ]
[ 0 −3 −2 | 3 ]
[ 0  0  3 | 9 ]

Now we can reach triangular form just by scaling R2 and R3:

−(1/3)R2:

[ 1  2   1  |  0 ]
[ 0  1  2/3 | −1 ]
[ 0  0   3  |  9 ]

(1/3)R3:

[ 1  2   1  |  0 ]
[ 0  1  2/3 | −1 ]
[ 0  0   1  |  3 ]

Back substitution then gives the solution

z = 3,
y = −1 − (2/3)z = −1 − (2/3)(3) = −3,
x = 0 − 2y − z = 0 − 2(−3) − 3 = 3.

To finish, let’s show how transforming a system into a diagonal form works.

Example. Gaussian elimination for the previous example to diagonal form. Once we've reached the triangular form as above, we just have to obtain zeros above the diagonal as well as below. This time we start with the last column, subtracting appropriate multiples of the bottom row:

[ 1  2   1  |  0 ]
[ 0  1  2/3 | −1 ]
[ 0  0   1  |  3 ]

R2 − (2/3)R3:

[ 1  2  1 |  0 ]
[ 0  1  0 | −3 ]
[ 0  0  1 |  3 ]

R1 − R3:

[ 1  2  0 | −3 ]
[ 0  1  0 | −3 ]
[ 0  0  1 |  3 ]

and then eliminate the y from R1 using the middle row:

R1 − 2R2:

[ 1  0  0 |  3 ]
[ 0  1  0 | −3 ]
[ 0  0  1 |  3 ]

We can now read off the solution as the three equations simply say

x = 3,
y = −3,
z = 3.

▶ It's very important to apply EROs to the whole rows of an augmented matrix, not just the left hand side. If you forget about the part on the right side of the vertical bar, it won't work!

▶ In the above examples, we deliberately wrote the specific EROs as part of the calculations. You should always do this as well in assignments and the exam. It tells the reader what you're doing and also helps you in case there's a mistake and you need to go back and fix things.

▶ It's possible to perform more than one operation at once. However, you should be careful doing this, and, if in doubt, only do one ERO at each step. In particular, never apply two operations to the same row in the same step.

▶ Swapping rows to get non-zero elements in the right position is called pivoting, and these key entries (that have to be non-zero) are called pivots. We'll see more about this later.

There are some things that can go wrong with the above method, however, because...


1.2 Solvability

...not all systems of linear equations have a solution, and sometimes they have lots. We can see this happening even in a 2 × 2 system. For example, consider the equations

mx − y = −c,
y = 0,

where m and c are constants. Graphically, these equations correspond to two straight lines, and provided that m ≠ 0, there is a unique solution at the point of intersection x = −c/m, y = 0.

On the other hand, when m = 0, the lines are parallel and two things can happen, depending on c. If c ≠ 0, the lines are distinct, the equations are inconsistent, and there are no solutions. If c = 0, the two lines are the same, the two equations are insufficient to determine both x and y uniquely, and there are infinitely many solutions.

This behaviour is typical of linear systems of any size: for "most" values of the coefficients, there is a unique solution for any right-hand side values. But for certain values of the coefficients, there can be either no solutions or an infinite number, depending on the right-hand side. Let's see how these pathological situations manifest themselves when we try to do Gaussian elimination on some 3 × 3 systems.

Example. Apply Gaussian elimination to the system

x + 2y + 3z = 1,
4x + 5y + 6z = 1,
7x + 8y + 9z = −5.

We write the system in augmented matrix form and proceed as before:

[ 1  2  3 |  1 ]
[ 4  5  6 |  1 ]
[ 7  8  9 | −5 ]

R2 − 4R1:

[ 1  2  3 |  1 ]
[ 0 −3 −6 | −3 ]
[ 7  8  9 | −5 ]

R3 − 7R1:

[ 1  2  3  |  1 ]
[ 0 −3 −6  | −3 ]
[ 0 −6 −12 | −12 ]

−(1/3)R2:

[ 1  2  3  |  1 ]
[ 0  1  2  |  1 ]
[ 0 −6 −12 | −12 ]

R3 + 6R2:

[ 1  2  3 |  1 ]
[ 0  1  2 |  1 ]
[ 0  0  0 | −6 ]

The last equation says that 0 = −6, which is false! This means the system of equations is inconsistent: there are no solutions (like with the distinct parallel lines case).

Example. Apply Gaussian elimination to the system

x + 2y + 3z = 1,
4x + 5y + 6z = 1,
7x + 8y + 9z = 1.

This is the same as the last example but with a different right-hand side in the last equation. Gaussian elimination will involve the same sequence of row operations, since these don't depend on the right-hand side (think about why). Thus we get

[ 1  2  3 | 1 ]
[ 4  5  6 | 1 ]
[ 7  8  9 | 1 ]

R2 − 4R1:

[ 1  2  3 |  1 ]
[ 0 −3 −6 | −3 ]
[ 7  8  9 |  1 ]

R3 − 7R1:

[ 1  2  3  |  1 ]
[ 0 −3 −6  | −3 ]
[ 0 −6 −12 | −6 ]

−(1/3)R2:

[ 1  2  3  |  1 ]
[ 0  1  2  |  1 ]
[ 0 −6 −12 | −6 ]

R3 + 6R2:

[ 1  2  3 | 1 ]
[ 0  1  2 | 1 ]
[ 0  0  0 | 0 ]

Unlike before, the last equation is now consistent; it just says 0 = 0. But there are effectively only two equations left,

x + 2y + 3z = 1,
y + 2z = 1,

so only two variables can be determined, in terms of the third one (like the case where the lines were parallel and c = 0). Choosing z to be the undetermined one, write z = λ where λ is an arbitrary number (called a parameter). Then back substitution gives every solution in terms of the parameter:

z = λ,
y = 1 − 2λ,
x = 1 − 2(1 − 2λ) − 3λ = −1 + λ.

There are infinitely many solutions, and moreover, we have found them all!

▶ We'll see later in the course that the set of solutions to this last example is the parametric equation of a line in 3D space.

▶ Notice that when one of the equations "drops out" like this, it is the same as starting with only two equations in the first place. Mathematicians say that the rank of the system is only 2 rather than 3.

To sum up:

• If you get a row of all zeros except for the right-hand side, there is no solution.

• If you get a row of all zeros, and the number of non-zero rows is less than the number of variables, then the system will have multiple solutions. We can then write the answer in parametric form.
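▶ In MATLAB you can classify a system along exactly these lines with the built-in rank function: the system is inconsistent precisely when the augmented matrix has larger rank than the coefficient matrix. A minimal sketch for the first example above:

A = [1 2 3; 4 5 6; 7 8 9];
b = [1; 1; -5];
if rank(A) < rank([A b])
    disp('no solutions')                 % a row (0 0 0 | non-zero) appears
elseif rank(A) < size(A, 2)
    disp('infinitely many solutions')    % fewer non-zero rows than variables
else
    disp('unique solution')
end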


1.3 Determinants

There is a way to compute ahead of time whether or not a linear system will have a unique solution. For a 2 × 2 system, we can derive it directly for a general system

ax + by = u,
cx + dy = v.

Subtracting b times the second equation from d times the first gives

(ad − bc)x = du − bv.

Similarly, subtracting c times the first equation from a times the second gives

(ad − bc)y = av − cu.

In particular, as long as ad − bc ≠ 0, we find the solution is uniquely determined:

x = (du − bv)/(ad − bc)   and   y = (av − cu)/(ad − bc).

The denominator here is called the determinant, and is written as

| a  b |
| c  d |  =  ad − bc.

So our linear system has a unique solution if the determinant is non-zero. On the other hand, if the determinant is zero, there will be either no solutions or infinitely many solutions (try checking this), just like with the 2 × 2 example in the previous section when m = 0. Notice the determinant depends only on the coefficients a, b, c, d and not on u, v.

The situation is similar for larger systems: there is a special number called the determinant calculated from the coefficients, and the system has a unique solution if and only if its determinant is non-zero. We can evaluate the determinant by "expanding along the top row". For example,

| a  b  c |       | e  f |       | d  f |       | d  e |
| d  e  f |  =  a | h  i |  −  b | g  i |  +  c | g  h |
| g  h  i |

and

| a  b  c  d |
| e  f  g  h |       | f  g  h |       | e  g  h |       | e  f  h |       | e  f  g |
| i  j  k  l |  =  a | j  k  l |  −  b | i  k  l |  +  c | i  j  l |  −  d | i  j  k |
| m  n  o  p |       | n  o  p |       | m  o  p |       | m  n  p |       | m  n  o |

Hopefully you see the general pattern. Each term in the top row is multiplied by the smaller determinant obtained by crossing out the row and column of that term. Then these are added/subtracted in an alternating fashion. It's a recursive process: a 3 × 3 determinant is calculated using three 2 × 2 determinants, a 4 × 4 determinant is calculated using four 3 × 3 determinants, and so on.

▶ It's beyond the scope of this course to derive these big formulas from scratch - the algebra gets nasty, and more complicated mathematical ideas are required. Fortunately, we just need to know how to compute them.
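▶ For the curious, the recursive expansion translates directly into code. Below is a MATLAB sketch of "expanding along the top row" (the function name is our own; real software uses much faster methods, such as the row operations described shortly, since this recursion takes factorially many steps):

function d = toprowdet(A)
    % determinant by cofactor expansion along the top row (illustration only)
    n = size(A, 1);
    if n == 1
        d = A(1,1);
        return
    end
    d = 0;
    for j = 1:n
        minor = A(2:n, [1:j-1, j+1:n]);   % cross out row 1 and column j
        d = d + (-1)^(j+1) * A(1,j) * toprowdet(minor);
    end
end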


Example. Calculate the determinant of the linear system

x + 2y + 3z = 1,
4x + 5y + 6z = 1,
7x + 8y + 9z = −5.

We saw earlier that this system has no solution, so it should be that the determinant vanishes. Using the 3 × 3 formula above, we find indeed that

| 1  2  3 |       | 5  6 |       | 4  6 |       | 4  5 |
| 4  5  6 |  =  1 | 8  9 |  −  2 | 7  9 |  +  3 | 7  8 |
| 7  8  9 |

           =  1(5 × 9 − 6 × 8) − 2(4 × 9 − 6 × 7) + 3(4 × 8 − 5 × 7)

           =  −3 + 12 − 9 = 0.

Notice again that the determinant doesn't depend on the right-hand side of the system. Remember that changing the system to

x + 2y + 3z = 1,
4x + 5y + 6z = 1,
7x + 8y + 9z = 1

gave an infinite number of solutions, but this also corresponds to having zero determinant, because the solution is not unique.

The formulas for 3 × 3, 4 × 4, ... determinants quickly become cumbersome. But often we can simplify the evaluation of larger determinants using some nice properties:

(1) Swapping two rows (i.e. Ri ↔ Rj) changes the sign of the determinant. For example,

| 4  3 |                      | 1  2 |
| 1  2 |  =  8 − 3 = 5   and  | 4  3 |  =  3 − 8 = −5.

(2) Multiplying one row by a constant (i.e. cRi) multiplies the whole determinant by that constant. For example,

| 4  3 |     | 4     3    |                              | 4  3 |
| 2  4 |  =  | 2(1)  2(2) |  =  2(8) − 2(3) = 2(5) = 2 | 1  2 |.

(3) Adding a multiple of one row to another (i.e. Ri + cRj) doesn't change the determinant. For example,

| 4         3        |     | 4  3 |                    | 4  3 |
| 1 + 2(4)  2 + 2(3) |  =  | 9  8 |  =  32 − 27 = 5 =  | 1  2 |.

(4) Switching the rows and columns (called transposition) doesn't change the determinant. For example,

| 4  3 |     | 4  1 |
| 1  2 |  =  | 3  2 |  =  8 − 3 = 5.

Notice that, since transposition doesn't change the determinant, as well as being able to use row operations as above, we can use similarly defined column operations to compute determinants.

Also notice that a determinant with a whole row of zeros or with a whole column of zeros must be zero. Furthermore, a determinant with two equal rows or with two equal columns must be zero.

Example. Evaluate the determinant

| 1  2  3 |
| 4  5  6 |
| 7  8  9 |

using the above properties.

We already calculated this by directly expanding along the top row. Alternatively, we can use row operations to simplify the determinant first as follows (a bit like Gaussian elimination):

| 1  2  3 |            | 1  2  3 |            | 1   2   3  |               | 1   0   0  |
| 4  5  6 | (R2−4R1) = | 0 −3 −6 | (R3−7R1) = | 0  −3  −6  | (transpose) = | 2  −3  −6  |
| 7  8  9 |            | 7  8  9 |            | 0  −6 −12  |               | 3  −6 −12  |

Now expanding along the top row is much easier as two of the little determinants vanish, giving

| 1   0   0  |       | −3   −6 |
| 2  −3  −6  |  =  1 | −6  −12 |  + 0 + 0  =  (−3)(−12) − (−6)(−6) = 36 − 36 = 0.
| 3  −6 −12  |


1.4 Matrices

We've seen that the mathematical behaviour of a system of linear equations really depends on the coefficients of x, y, z, ... and not so much on the constant (right-hand side) terms. These coefficients may be written in a grid inside some brackets - a matrix. For instance, we could consider

    [ 1  2  3 ]
A = [ 4  5  6 ]   and   B = [ 1  2  3 ]
    [ 7  8  9 ]             [ 4  5  6 ]

We say it is an m × n matrix if it has m rows and n columns. So in particular, A above is a 3 × 3 matrix and B is a 2 × 3 matrix. In the special case m = n (as with A), the matrix is said to be square.

You've maybe already met some non-square matrices. Single-column, i.e. m × 1, matrices are called vectors (or more precisely, column vectors). By convention, matrices are written as capital letters, whereas vectors are usually written in bold face in typed notes, such as

u = [ 1 ]               [ 3 ]
    [ 2 ]   and   v  =  [ 4 ]
                        [ 5 ]

It's hard to write in bold script, so in hand-written work one usually puts a little line under the letter, such as u and v, to signify that it is a vector.

There are various operations we can apply to matrices. We can sometimes add two matrices, multiply a matrix by a number or multiply two matrices together. These operations work as follows.

Addition: Two matrices can be added just by adding their corresponding entries. For example, if

A = [ 1  2  3 ]   and   B = [ 0  1  0 ]
    [ 4  5  6 ]             [ 0  2  3 ]

then

A + B = [ 1+0  2+1  3+0 ]  =  [ 1  3  3 ]
        [ 4+0  5+2  6+3 ]     [ 4  7  9 ]

Note that, because of the way we've set this up, the two matrices must have the same number of rows and the same number of columns, otherwise A + B doesn't exist.

Scalar multiplication: A scalar is just a number - for us it will mainly be a real number but it could also be a complex number (see later). Any matrix can be multiplied by a scalar, just by multiplying all entries by it. For example, with

A = [ 1  2  3 ]
    [ 4  5  6 ]

we have

2A = [ 2×1  2×2  2×3 ]  =  [ 2   4   6 ]
     [ 2×4  2×5  2×6 ]     [ 8  10  12 ]

Matrix multiplication: This is the trickiest (and most important) operation. We calculate the entries of the product AB by combining rows of A with columns of B. For example, suppose

A = [ 1  2 ]   and   B = [ 5  6  7 ]
    [ 3  4 ]             [ 8  9  0 ]

Then

AB = [ 1×5 + 2×8   1×6 + 2×9   1×7 + 2×0 ]  =  [ 21  24   7 ]
     [ 3×5 + 4×8   3×6 + 4×9   3×7 + 4×0 ]     [ 47  54  21 ]

Because of the way this is set up, two matrices can be multiplied only when the number of columns of the first equals the number of rows of the second. That means if A is an m × n matrix and B is a p × q matrix, then the product AB only exists when n = p. Furthermore, in that case, AB will be an m × q matrix.

In particular, with

A = [ 1  2 ]   and   B = [ 5  6  7 ]
    [ 3  4 ]             [ 8  9  0 ]

as above, the product AB is a 2 × 3 matrix, whereas the product BA doesn't exist.

Even when AB and BA both exist, it's important to know that they are not always the same (that is, matrix multiplication is not commutative). For example,

[ 1  0 ] [ 0  1 ]  =  [ 0  1 ]      but      [ 0  1 ] [ 1  0 ]  =  [ 1  2 ]
[ 1  2 ] [ 3  4 ]     [ 6  9 ]               [ 3  4 ] [ 1  2 ]     [ 7  8 ]

Other peculiar things can happen that don't occur when just multiplying numbers. For instance, we can multiply two non-zero matrices together and get the zero matrix (which has all entries equal to zero):

[  2   4 ] [  2   4 ]  =  [ 0  0 ]
[ −1  −2 ] [ −1  −2 ]     [ 0  0 ]
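▶ These examples are quick to check in MATLAB, where * is matrix multiplication:

A = [1 0; 1 2];  B = [0 1; 3 4];
A*B        % gives [0 1; 6 9]
B*A        % gives [1 2; 7 8], so AB and BA differ
C = [2 4; -1 -2];
C*C        % gives [0 0; 0 0]: a non-zero matrix squaring to zero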

Some familiar properties that work with ordinary numbers do work with matrices, however. For instance,

(1) Matrix addition is commutative, that is, A + B = B + A.

(2) Matrix addition is associative, that is, A + (B + C) = (A + B) + C.

(3) Matrix multiplication is associative, that is, A(BC) = (AB)C.

(4) Matrix multiplication is distributive, that is, A(B + C) = AB + AC.

These work provided that the matrices have the right sizes for the various sums and products to exist in the first place.

One of the great benefits of the seemingly complicated definition for multiplying matrices is that it gives us another convenient way of writing and working with systems of linear equations. For instance,

x + 2y − z = −3
x + y + 2z = 7
2x + 5y − 4z = −13

i.e.

[ 1  2 −1 | −3 ]
[ 1  1  2 |  7 ]
[ 2  5 −4 | −13 ]

can be written in matrix form Ax = b, once we've defined a matrix and two vectors

A = [ 1  2 −1 ]      x = [ x ]      b = [ −3 ]
    [ 1  1  2 ]          [ y ]          [  7 ]
    [ 2  5 −4 ]          [ z ]          [ −13 ]

To see this, apply the matrix multiplication rule to Ax:

Ax = [ 1  2 −1 ] [ x ]     [ (1)x + (2)y + (−1)z ]     [ −3 ]
     [ 1  1  2 ] [ y ]  =  [ (1)x + (1)y + (2)z  ]  =  [  7 ]  =  b.
     [ 2  5 −4 ] [ z ]     [ (2)x + (5)y + (−4)z ]     [ −13 ]

▶ Another reason for the complicated multiplication is to ensure that if Ax = b and B is another matrix, then (BA)x = Bb, by the associativity of matrix multiplication. This nicely "transforms" one system into another and is part of a long story requiring a longer course in Linear Algebra.

There is a special square n × n matrix called the identity matrix:

I = Iₙ = [ 1  0  ⋯  0 ]
         [ 0  1  ⋯  0 ]
         [ ⋮  ⋮  ⋱  ⋮ ]
         [ 0  0  ⋯  1 ]

(It's common to not write the subscript n when it is implicit from the context.) We will find this matrix useful because AIₙ = IₙA = A for any n × n matrix A.

Transposition: There is another useful operation we can perform on a matrix A of any size. We define the transpose Aᵀ of A to be the matrix whose rows are the columns of A and whose columns are the rows of A. For example,

[ 1  2  3 ]ᵀ     [ 1  4 ]          [ 1 ]ᵀ
[ 4  5  6 ]   =  [ 2  5 ]   and    [ 4 ]   =  [ 1  4  9 ]
                 [ 3  6 ]          [ 9 ]

▶ Having described matrix addition and scalar multiplication, it's not too hard to do matrix subtraction as well: A − B = A + (−1)B. However, dealing with division is a much harder task and we'll look at that in the next section.


1.5 Inverse Matrices

It's important to appreciate that we cannot just divide by a matrix. We can not write

Ax = b ⇐⇒ x = b/A. This is nonsense!

Instead, let's rethink how we divide real numbers - we can multiply by an inverse: if x ≠ 0, then there is a unique number y = x⁻¹ (the inverse of x) and it satisfies yx = xy = 1.

The inverse of an n × n square matrix A is another n × n matrix A⁻¹ such that

A⁻¹A = AA⁻¹ = Iₙ.

Provided that such an A⁻¹ exists, then we can write

Ax = b ⇐⇒ A⁻¹Ax = A⁻¹b ⇐⇒ x = A⁻¹b.

The next question is then: when does A⁻¹ actually exist? First of all, A must be a square matrix, otherwise there is no hope. Furthermore, it can be shown that a square matrix A is invertible if and only if its determinant is non-zero. In other words, when the corresponding system of linear equations has a unique solution. If a matrix has zero determinant, and hence has no inverse, we call it non-invertible (or singular).

Example. Consider a general 2 × 2 matrix

A = [ a  b ]
    [ c  d ]

We can re-write a general 2 × 2 system using this matrix:

ax + by = u        ⇐⇒        [ a  b ] [ x ]  =  [ u ]
cx + dy = v                  [ c  d ] [ y ]     [ v ]

At the beginning of Section 1.3, we saw that if

det(A) = | a  b |  =  ad − bc ≠ 0,
         | c  d |

then

x = (du − bv)/(ad − bc)   and   y = (av − cu)/(ad − bc).

That means

[ x ]  =  1/(ad − bc) [ du − bv  ]  =  1/(ad − bc) [  d  −b ] [ u ]
[ y ]                 [ −cu + av ]                 [ −c   a ] [ v ]

Looking at the 2 × 2 matrices here, if we set

A = [ a  b ]   and   A⁻¹ = 1/(ad − bc) [  d  −b ]
    [ c  d ]                           [ −c   a ]

it is easy to check (and you should check!) that

A⁻¹A = AA⁻¹ = I₂ = [ 1  0 ]
                   [ 0  1 ]
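▶ A quick numerical spot check of this formula in MATLAB (the matrix here is our own example):

A = [4 7; 2 6];                  % det(A) = 4*6 - 7*2 = 10
Ainv = (1/10) * [6 -7; -2 4];    % the formula (1/(ad - bc)) [d -b; -c a]
A * Ainv                         % gives the 2 x 2 identity
Ainv * A                         % so does the product in the other order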


There is a simple adaptation of Gaussian elimination that allows us to find the inverse of a matrix, where it exists.

Example. Find the inverse of the matrix

A = [ 2  5 −4 ]
    [ 1  1  2 ]
    [ 1  2 −1 ]

You can check that the determinant det(A) = 1, so A⁻¹ exists. To find it, we start by writing a "big" augmented matrix with the identity matrix on the right-hand side:

[ 2  5 −4 | 1  0  0 ]
[ 1  1  2 | 0  1  0 ]
[ 1  2 −1 | 0  0  1 ]

We then use row operations to convert the left-hand side into the identity matrix:

R1 ↔ R3:

[ 1  2 −1 | 0  0  1 ]
[ 1  1  2 | 0  1  0 ]
[ 2  5 −4 | 1  0  0 ]

R2 − R1:

[ 1  2 −1 | 0  0  1 ]
[ 0 −1  3 | 0  1 −1 ]
[ 2  5 −4 | 1  0  0 ]

R3 − 2R1:

[ 1  2 −1 | 0  0  1 ]
[ 0 −1  3 | 0  1 −1 ]
[ 0  1 −2 | 1  0 −2 ]

−R2:

[ 1  2 −1 | 0  0  1 ]
[ 0  1 −3 | 0 −1  1 ]
[ 0  1 −2 | 1  0 −2 ]

R3 − R2:

[ 1  2 −1 | 0  0  1 ]
[ 0  1 −3 | 0 −1  1 ]
[ 0  0  1 | 1  1 −3 ]

R2 + 3R3:

[ 1  2 −1 | 0  0  1 ]
[ 0  1  0 | 3  2 −8 ]
[ 0  0  1 | 1  1 −3 ]

R1 + R3:

[ 1  2  0 | 1  1 −2 ]
[ 0  1  0 | 3  2 −8 ]
[ 0  0  1 | 1  1 −3 ]

R1 − 2R2:

[ 1  0  0 | −5 −3 14 ]
[ 0  1  0 |  3  2 −8 ]
[ 0  0  1 |  1  1 −3 ]

Now we've reached the identity matrix on the left, what remains on the right will be A⁻¹:

A⁻¹ = [ −5  −3  14 ]
      [  3   2  −8 ]
      [  1   1  −3 ]

▶ There are various choices we can make for EROs. For instance, you could begin with (1/2)R1 to get a 1 in the top left corner. This would be fine. It would unfortunately introduce fractions from the start (fractions are sometimes unavoidable), but if you take care, you should still get to the same place.

▶ You might also use more EROs than I've done above. One of the more efficient ways is to create zeros in an anti-clockwise fashion around the matrix, below the diagonal from left to right, then above the diagonal from right to left.

▶ Since the likelihood of making a mistake is quite high, it is always a good idea to check that A⁻¹A = I₃. Try it! You could also "cheat" and check using MATLAB, Wolfram Alpha, etc. This is not really cheating, but be aware that in assignments or the exam, you must show your working!


Once we know the inverse, solving a corresponding linear system turns into matrix multiplication, using

Ax = b ⇐⇒ A⁻¹Ax = A⁻¹b ⇐⇒ x = A⁻¹b.

Example. Use the previous example to solve the system

2x + 5y − 4z = −13,
x + y + 2z = 7,
x + 2y − z = −3.

In matrix form, this says Ax = b where

A = [ 2  5 −4 ]      x = [ x ]      b = [ −13 ]
    [ 1  1  2 ]          [ y ]          [   7 ]
    [ 1  2 −1 ]          [ z ]          [  −3 ]

Using the inverse A⁻¹ we found above gives

x = A⁻¹b = [ −5  −3  14 ] [ −13 ]     [  2 ]
           [  3   2  −8 ] [   7 ]  =  [ −1 ]
           [  1   1  −3 ] [  −3 ]     [  3 ]

▶ You should see from this example that solving a system by Gaussian elimination and back substitution (as we did earlier) is quicker than finding the inverse and then multiplying. This is generally the case in solving Ax = b, and for large systems, Gaussian elimination is a lot faster. Also, the Gaussian elimination process works even for non-square matrices. On the other hand, if we want to solve Ax = b many times with the same square A but different b's, the "inverse" way can be better. In the next section, we'll see an alternative, intermediate way.

▶ You may have seen another way of finding inverses using "the Cofactor Method". We won't be covering this. Whilst it is kind of okay for 3 × 3 matrices, it is generally a very slow, impractical method, although it does have some theoretical significance.


1.6 LU Decomposition

Sometimes, we may need to solve multiple sets of simultaneous equations Ax = b with the same square matrix A but different b. In that case, there is a shortcut that avoids having to do Gaussian elimination all over again each time, but isn't quite as much work as finding the inverse. The idea is to factorise A as a product A = LU, where

L = [ 1  0  0  ⋯  0 ]          U = [ ×  ×  ×  ⋯  × ]
    [ ×  1  0  ⋯  0 ]              [ 0  ×  ×  ⋯  × ]
    [ ×  ×  1  ⋯  0 ]    and       [ 0  0  ×  ⋯  × ]
    [ ⋮  ⋮  ⋮  ⋱  ⋮ ]              [ ⋮  ⋮  ⋮  ⋱  ⋮ ]
    [ ×  ×  ×  ⋯  1 ]              [ 0  0  0  ⋯  × ]

We call L a lower triangular matrix and U an upper triangular matrix.

▶ Getting 1’s on the diagonal of L is a convention that makes the decomposition unique.

Once we know L and U , we can quickly solve Ax = b in two steps:

1. Use forward substitution to find u satisfying Lu = b.

2. Use backward substitution to find x satisfying Ux = u.

Then x is the required solution because Ax = LUx = Lu = b.

Example. Given the LU decomposition

[  1  2   3 ]     [  1  0  0 ] [ 1  2  3 ]
[  2  6  10 ]  =  [  2  1  0 ] [ 0  2  4 ]
[ −3  0   6 ]     [ −3  3  1 ] [ 0  0  3 ]
                       L           U

solve the system

x + 2y + 3z = 5,
2x + 6y + 10z = 16,
−3x + 6z = 9.

1. First we solve Lu = b by forward substitution. Writing u = (u, v, w)ᵀ, we have

[  1  0  0 ] [ u ]     [  5 ]           u = 5,
[  2  1  0 ] [ v ]  =  [ 16 ]    =⇒    v = 16 − 2u = 6,
[ −3  3  1 ] [ w ]     [  9 ]           w = 9 + 3u − 3v = 6.

2. Now we solve Ux = u, by back substitution:

[ 1  2  3 ] [ x ]     [ 5 ]           x = 5 − 2y − 3z = 1,
[ 0  2  4 ] [ y ]  =  [ 6 ]    =⇒    y = (6 − 4z)/2 = −1,
[ 0  0  3 ] [ z ]     [ 6 ]           z = 2.

We are left with the problem: how do we find L and U?


The idea is to transform A to U by adding multiples of one row to another, similar to Gaussian elimination. Each of these row operations is actually equivalent to multiplying A on the left by a matrix, called an elementary matrix. For example, applying R2 − 2R1 to A is the same as multiplying

[  1  0  0 ] [  1  2   3 ]     [  1       2       3        ]
[ −2  1  0 ] [  2  6  10 ]  =  [ 2−2(1)  6−2(2)  10−2(3)   ]
[  0  0  1 ] [ −3  0   6 ]     [ −3       0       6        ]
      E            A                       EA

Example. Find a sequence of elementary matrices that reduces

[  1  2   3 ]
[  2  6  10 ]
[ −3  0   6 ]

to triangular form. We follow the Gaussian elimination algorithm (but without normalising the diagonal elements to become 1's and with no right-hand side) and record the sequence of corresponding matrices:

R2 − 2R1:

[  1  2  3 ]     [  1  0  0 ] [  1  2   3 ]
[  0  2  4 ]  =  [ −2  1  0 ] [  2  6  10 ]
[ −3  0  6 ]     [  0  0  1 ] [ −3  0   6 ]

R3 + 3R1:

[ 1  2   3 ]     [ 1  0  0 ] [  1  0  0 ] [  1  2   3 ]
[ 0  2   4 ]  =  [ 0  1  0 ] [ −2  1  0 ] [  2  6  10 ]
[ 0  6  15 ]     [ 3  0  1 ] [  0  0  1 ] [ −3  0   6 ]

R3 − 3R2:

[ 1  2  3 ]     [ 1   0  0 ] [ 1  0  0 ] [  1  0  0 ] [  1  2   3 ]
[ 0  2  4 ]  =  [ 0   1  0 ] [ 0  1  0 ] [ −2  1  0 ] [  2  6  10 ]
[ 0  0  3 ]     [ 0  −3  1 ] [ 3  0  1 ] [  0  0  1 ] [ −3  0   6 ]
     U               G           F            E             A

Notice that we have expressed U as a product U = GFEA. If we multiply one-by-one by the inverses of the elementary matrices G, F and E, we can rearrange this equation to find an expression for L:

GFEA = U  =⇒  FEA = G⁻¹U
          =⇒  EA = F⁻¹G⁻¹U
          =⇒  A = E⁻¹F⁻¹G⁻¹U,   so L = E⁻¹F⁻¹G⁻¹.

The inverse of an elementary matrix is simply given by changing the sign of the off-diagonal element – for example,

[  1  0  0 ]⁻¹     [ 1  0  0 ]          [ 1  0  0 ] [  1  0  0 ]     [ 1  0  0 ]
[ −2  1  0 ]    =  [ 2  1  0 ]    as    [ 2  1  0 ] [ −2  1  0 ]  =  [ 0  1  0 ]
[  0  0  1 ]       [ 0  0  1 ]          [ 0  0  1 ] [  0  0  1 ]     [ 0  0  1 ]

So in the previous example, we have

L = E⁻¹F⁻¹G⁻¹ = [ 1  0  0 ] [  1  0  0 ] [ 1  0  0 ]     [  1  0  0 ] [ 1  0  0 ]     [  1  0  0 ]
                [ 2  1  0 ] [  0  1  0 ] [ 0  1  0 ]  =  [  2  1  0 ] [ 0  1  0 ]  =  [  2  1  0 ]
                [ 0  0  1 ] [ −3  0  1 ] [ 0  3  1 ]     [ −3  0  1 ] [ 0  3  1 ]     [ −3  3  1 ]

The matrix L always contains the (negated) row multipliers from the elimination, as long as you deal with the first column first, then the second column, and so on. So we can construct L directly as we perform the row operations converting A to U, without needing to find the elementary matrices E, F, G, ... explicitly.
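▶ This observation gives a compact algorithm: store each multiplier in L at the moment it is used. A minimal MATLAB sketch (our own; no row swaps, so it assumes the pivots are non-zero):

A = [1 2 3; 2 6 10; -3 0 6];
n = size(A, 1);
L = eye(n);  U = A;
for j = 1:n-1
    for i = j+1:n
        L(i,j) = U(i,j) / U(j,j);            % the multiplier m in R_i - m R_j
        U(i,:) = U(i,:) - L(i,j) * U(j,:);   % eliminate below the diagonal
    end
end
L, U    % here L = [1 0 0; 2 1 0; -3 3 1] and U = [1 2 3; 0 2 4; 0 0 3]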

Example. Find the LU decomposition of

A = [  2   −3   4    1 ]
    [  4   −3  10   −2 ]
    [  6  −15   7   13 ]
    [ −2    9   3  −17 ]

R2 − 2R1:

[  2   −3  4    1 ]          [  1  0  0  0 ]
[  0    3  2   −4 ]     L =  [  2  1  0  0 ]
[  6  −15  7   13 ]          [  ?  ?  1  0 ]
[ −2    9  3  −17 ]          [  ?  ?  ?  1 ]

R3 − 3R1:

[  2  −3   4    1 ]          [  1  0  0  0 ]
[  0   3   2   −4 ]     L =  [  2  1  0  0 ]
[  0  −6  −5   10 ]          [  3  ?  1  0 ]
[ −2   9   3  −17 ]          [  ?  ?  ?  1 ]

R4 + R1:

[ 2  −3   4    1 ]           [  1  0  0  0 ]
[ 0   3   2   −4 ]      L =  [  2  1  0  0 ]
[ 0  −6  −5   10 ]           [  3  ?  1  0 ]
[ 0   6   7  −16 ]           [ −1  ?  ?  1 ]

R3 + 2R2:

[ 2  −3   4    1 ]           [  1   0  0  0 ]
[ 0   3   2   −4 ]      L =  [  2   1  0  0 ]
[ 0   0  −1    2 ]           [  3  −2  1  0 ]
[ 0   6   7  −16 ]           [ −1   ?  ?  1 ]

R4 − 2R2:

[ 2  −3   4   1 ]            [  1   0  0  0 ]
[ 0   3   2  −4 ]       L =  [  2   1  0  0 ]
[ 0   0  −1   2 ]            [  3  −2  1  0 ]
[ 0   0   3  −8 ]            [ −1   2  ?  1 ]

R4 + 3R3:

[ 2  −3   4   1 ]            [  1   0   0  0 ]
[ 0   3   2  −4 ]  = U, L =  [  2   1   0  0 ]
[ 0   0  −1   2 ]            [  3  −2   1  0 ]
[ 0   0   0  −2 ]            [ −1   2  −3  1 ]

You can (and should!) check that LU = A by matrix multiplication. Note the order of all the matrices really matters: A = LU, not UL.


Does every square matrix have an LU decomposition? No – for example, suppose we try to write

[ 0  1 ]     [ 1  0 ] [ b  c ]
[ 3  2 ]  =  [ a  1 ] [ 0  d ]

Then we would have to satisfy both b = 0 and ab = 3, which is impossible. Notice that this matrix has no LU decomposition even though it is invertible (with determinant −3).

In general, an invertible matrix A has an LU decomposition precisely when all of its principal minors are non-zero. These are the determinants of different sizes in the top left corner of A. For the 2 × 2 matrix above, they are:

| 0 | = 0    and    | 0  1 |  =  −3.
                    | 3  2 |

The first principal minor vanishes, so the matrix does not have an LU decomposition.

Example. To decide if the matrix

[ 1   3  2 ]
[ 1   0  1 ]
[ 2  −1  0 ]

has an LU decomposition, we compute the three principal minors:

| 1 | = 1,      | 1  3 |  =  −3,
                | 1  0 |

and

| 1   3  2 |
| 1   0  1 |  =  |  0  1 |  −  3 | 1  1 |  +  2 | 1   0 |  =  1 − 3(−2) + 2(−1) = 5.
| 2  −1  0 |     | −1  0 |       | 2  0 |       | 2  −1 |

All three are non-zero, so the matrix has an LU decomposition (and we can now find it!)

For larger matrices, the principal minor condition is time-consuming to check, and there is another condition which is helpful. A matrix is strictly diagonally dominant if the absolute value of the diagonal element in each row is strictly greater than the sum of the absolute values of the other elements in the row. This condition also guarantees that the LU decomposition exists.

Example. To show that

[ −4   1  2 ]
[  1  −3  1 ]
[ −3   0  6 ]

is strictly diagonally dominant, we just have to check:

• in row 1, we have |−4| > |1| + |2|,

• in row 2, we have |−3| > |1| + |1|,

• in row 3, we have |6| > |−3| + 0.

▶ If a matrix is not strictly diagonally dominant, that does not prevent it from having an LU decomposition – look at the previous example.
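▶ Strict diagonal dominance is also easy to test mechanically. A short check in MATLAB:

A = [-4 1 2; 1 -3 1; -3 0 6];
d = abs(diag(A));               % |diagonal entry| in each row
off = sum(abs(A), 2) - d;       % sum of |other entries| in each row
all(d > off)                    % returns 1 (true): strictly diagonally dominant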


1.7 PᵀLU Decomposition

Recall that in Gaussian elimination we sometimes needed to swap rows for the algorithm to work. If we allow row swaps, then we can make a modified LU decomposition that exists for any non-singular matrix. Instead of writing A = LU, the idea is to write

PA = LU

where P is a permutation matrix that rearranges the rows of A. For example,

R1 ↔ R2:    [ 0  1  0 ] [ 0  0  3 ]     [ 1  2  1 ]
            [ 1  0  0 ] [ 1  2  1 ]  =  [ 0  0  3 ]
            [ 0  0  1 ] [ 2  6  4 ]     [ 2  6  4 ]
                 P           A

Permutation matrices always have a single 1 in every row and in every column, with 0's everywhere else. To find the permutation matrix that swaps rows i and j, swap rows i and j of the identity matrix.

Given A, we need to choose P so that the permuted matrix PA has all of its principal minors non-zero.

Example. Find matrices P, L and U such that PA = LU, where

A = [ 0  0  3 ]
    [ 1  2  1 ]
    [ 2  6  4 ]

We can check that all principal minors are non-zero after swapping R1 ↔ R3, so

P = [ 0  0  1 ]        PA = [ 2  6  4 ]
    [ 0  1  0 ]             [ 1  2  1 ]
    [ 1  0  0 ]             [ 0  0  3 ]

Now we carry out the usual LU decomposition on PA instead:

R2 − (1/2)R1:

[ 2   6   4 ]          [  1   0  0 ]
[ 0  −1  −1 ]     L =  [ 1/2  1  0 ]
[ 0   0   3 ]          [  ?   ?  1 ]

R3 + 0R1:

[ 2   6   4 ]          [  1   0  0 ]
[ 0  −1  −1 ]     L =  [ 1/2  1  0 ]
[ 0   0   3 ]          [  0   ?  1 ]

R3 + 0R2:

[ 2   6   4 ]          [  1   0  0 ]
[ 0  −1  −1 ]     L =  [ 1/2  1  0 ]
[ 0   0   3 ]          [  0   0  1 ]

So we have

P = [ 0  0  1 ]      L = [  1   0  0 ]      U = [ 2   6   4 ]
    [ 0  1  0 ]          [ 1/2  1  0 ]          [ 0  −1  −1 ]
    [ 1  0  0 ]          [  0   0  1 ]          [ 0   0   3 ]

Now permutation matrices are easy to invert: you just take the transpose, P⁻¹ = Pᵀ. For example,

[ 0  1  0  0 ] [ 0  0  1  0 ]     [ 1  0  0  0 ]
[ 0  0  1  0 ] [ 1  0  0  0 ]  =  [ 0  1  0  0 ]
[ 1  0  0  0 ] [ 0  1  0  0 ]     [ 0  0  1  0 ]
[ 0  0  0  1 ] [ 0  0  0  1 ]     [ 0  0  0  1 ]
      P              Pᵀ                 I₄

So once we know PA = LU, we finally get the decomposition A = PᵀLU.

Because we can choose which rows to swap, PᵀLU decompositions are not unique in general. In the previous example, we could equally have applied the permutation matrix

P′ = [ 0  1  0 ]      so      P′A = [ 1  2  1 ]
     [ 0  0  1 ]                    [ 2  6  4 ]
     [ 1  0  0 ]                    [ 0  0  3 ]

Now the LU decomposition of P′A will be different, given by (exercise for the reader!)

L′ = [ 1  0  0 ]      U′ = [ 1  2  1 ]
     [ 2  1  0 ]           [ 0  2  2 ]
     [ 0  0  1 ]           [ 0  0  3 ]

However, we do have the nice fact: every invertible matrix has a PᵀLU decomposition.

▶ Mathematically, the distinct decompositions are equally good. But when working with finite precision arithmetic, some permutations may lead to more accurate results.

▶ Computers use a slightly different algorithm where P is not computed ahead of time, but rather the necessary row swaps are recorded during the elimination.
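▶ MATLAB's built-in lu follows exactly this pattern: [L, U, P] = lu(A) returns matrices with PA = LU, so A = PᵀLU. (Its pivoting strategy may pick a different P from the one we chose by hand.)

A = [0 0 3; 1 2 1; 2 6 4];
[L, U, P] = lu(A);
norm(P*A - L*U)    % essentially zero: P*A equals L*U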


1.8 Rounding Errors and Ill-conditioned Matrices

So far we have assumed that all calculations can be carried out to perfect accuracy. In reality, this is not usually possible, as computers usually hold a finite number of significant figures at each stage in a calculation.

To illustrate what can go wrong, suppose we have a calculator which only works to two significant digits, and we use it to add the numbers

1 + 1 + . . . + 1 (99 times) + 100.

As we successively add on the 1's, our calculator will output 1, 2, 3, . . . , 99. Then when we add on the final 100, it will output 200. This is pretty close to the actual sum 199.

Now suppose we use the same calculator to add the same numbers in a different order:

100 + 1 + 1 + . . . + 1 (99 times).

Our calculator starts at 100, but each time we add 1, it stays at 100 (since 101 rounds back to 100 at two significant digits), and the final output will also be 100. This is not very close to the actual sum 199 and is not even correct to one significant digit. So we see that the order we add numbers matters! The more accurate way was to add them in increasing order of size.

When using Gaussian elimination, we haven't been too concerned about the order in which we applied operations. On a computer with limited accuracy, this can have similarly disastrous consequences.

Example. Solve the following equations working to 3 significant figures at each stage:

0.4x + 99.6y = 100,
75.3x − 45.3y = 30.

Note that the exact solution is easily seen to be x = 1, y = 1. First, let's perform Gaussian elimination on the original equations.

Applying R2 − (75.3/0.4)R1, and working to 3 significant figures, produces

−45.3 − (75.3/0.4) × 99.6 ≈ −45.3 − 188 × 99.6 ≈ −45.3 − 18700 ≈ −18700,

and similarly

30 − (75.3/0.4) × 100 ≈ 30 − 188 × 100 ≈ −18800.

Thus our equations become

0.4x + 99.6y = 100,
−18700y = −18800.

We find y = −18800/−18700 ≈ 1.01, which is almost correct. However, back substitution then gives

x = (100 − 99.6 × 1.01)/0.4 ≈ (100 − 101)/0.4 = −1/0.4 = −2.50,

which is a long way from the actual value x = 1. We kept 3 significant figures throughout, but our answer is incorrect to even one significant figure!

Now let's try Gaussian elimination but with the order of the equations swapped:

75.3x − 45.3y = 30,
0.4x + 99.6y = 100.

Of course the true solution is unchanged. We start with R2 − (0.4/75.3)R1, and working to 3 significant figures, we now obtain

99.6 − (0.4/75.3) × (−45.3) ≈ 99.6 − 0.00531 × (−45.3) ≈ 99.6 + 0.241 ≈ 99.8

and

100 − (0.4/75.3) × 30 ≈ 100 − 0.00531 × 30 ≈ 100 − 0.159 ≈ 99.8.

The equations become

75.3x − 45.3y = 30,
99.8y = 99.8,

giving y = 1.00 and hence

x = (30 + 45.3 × 1.00)/75.3 = 75.3/75.3 = 1.00.

This time we have the answer correct to 3 significant figures.

What went wrong in the first case was that the rounding errors were amplified by dividing by a very small number, 0.4, rather than 75.3 as in the second case. In practical implementations of the Gaussian elimination algorithm, it's better to move the largest (in absolute value) coefficient to the diagonal term in the column, and use that as the pivot. This is known as partial pivoting.

▶ There is a further improvement called full pivoting that we won't discuss, where the variables are reordered as well as the equations, again to avoid division by small numbers.
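▶ You can reproduce the 3-significant-figure experiment above in MATLAB, rounding every intermediate result (this assumes round with the 'significant' option, available in recent MATLAB versions):

r = @(x) round(x, 3, 'significant');   % keep 3 significant figures throughout
m = r(75.3 / 0.4);                     % 188
coeff = r(-45.3 - r(m * 99.6));        % -18700, the new y-coefficient
rhs = r(30 - r(m * 100));              % -18800, the new right-hand side
y = r(rhs / coeff);                    % 1.01
x = r(r(100 - r(99.6 * y)) / 0.4)      % -2.5, far from the true x = 1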


Something else that can go wrong in numerical calculations with matrices is if the coefficient matrix has a very small determinant. These are called ill-conditioned matrices and can make calculations very sensitive to tiny variations.

For example, consider the following equations:

(262/123)x + y = 139/123,
(475/223)x + y = 252/223,

which have solution x = 1, y = −1. If we change the right-hand side slightly to, say,

(262/123)x + y = 140/123,
(475/223)x + y = 253/223,

then the new solution is x = 101, y = −214. A very small change in the right-hand side has led to a very large change in the solutions, and we say that the matrix is ill-conditioned. The reason for the big change is that

A = [ 262/123  1 ]
    [ 475/223  1 ]      =⇒      det(A) = 262/123 − 475/223 = 1/27429.

This is nearly zero, so A is "close" to being non-invertible, which causes problems.

Another way to think about it is if we know that determinants are somehow "scaling factors", e.g. multiplying by a matrix with large determinant "moves" things about a lot. (We will come back to this later in the course.) Now a general and very useful property (which we won't prove) about determinants is that they are multiplicative: if A and B are n × n matrices, then

det(AB) = det(A) det(B).

In particular, since A⁻¹A = Iₙ, we have

det(A⁻¹) det(A) = det(Iₙ) = 1   =⇒   det(A⁻¹) = 1/det(A).

Thus with the above ill-conditioned matrix, when solving Ax = b using x = A⁻¹b, we are multiplying b by a matrix with large determinant det(A⁻¹) = 27429. So a small change in b leads to a large change in x = A⁻¹b.
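▶ MATLAB reports this sensitivity through the condition number (the built-in cond). Solving with the two right-hand sides above shows the effect directly:

A = [262/123 1; 475/223 1];
A \ [139/123; 252/223]    % approximately ( 1,  -1)
A \ [140/123; 253/223]    % approximately (101, -214)
cond(A)                   % huge, warning that A is ill-conditioned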


1.9 Iterative Methods: Jacobi’s Method

Gaussian elimination or LU decomposition give us (up to rounding error) the exact solution of a linear system, but are expensive for large systems. Often an iterative method is a quicker and cheaper way to solve the system to a reasonable degree of accuracy. In an iterative method, we find a sequence of approximations

x(0), x(1), x(2), . . .

getting closer and closer to the exact solution x. The simplest iterative method for linear equations is Jacobi's method, where we rearrange each equation for a different variable and use this to make an iterative formula.

Example. Apply the Jacobi method to

2x − y = 5,
x − 2y = 4,

with initial guess x(0) = 0, y(0) = 0.

The idea is to rearrange the first equation for x and the second for y, so

x = (5 + y)/2,    y = (4 − x)/(−2).

We turn this into an iterative formula by adding superscripts:

x(k+1) = (5 + y(k))/2,    y(k+1) = (4 − x(k))/(−2).

Thus starting with x(0) = 0 and y(0) = 0 gives

x(1) = (5 + y(0))/2 = (5 + 0)/2 = 5/2,    y(1) = (4 − x(0))/(−2) = (4 − 0)/(−2) = −2,

then

x(2) = (5 + y(1))/2 = (5 − 2)/2 = 3/2,    y(2) = (4 − x(1))/(−2) = (4 − 5/2)/(−2) = −3/4.

Continuing in this way we can make a table of values as below, working to 4 significant figures at each stage:

 k   x(k)     y(k)
 0   0        0
 1   2.5     -2
 2   1.5     -0.75
 3   2.125   -1.25
 4   1.875   -0.9375
 5   2.031   -1.062
 6   1.969   -0.9845
 7   2.008   -1.016
 8   1.992   -0.996
 9   2.002   -1.004
10   1.998   -0.999
11   2.000   -1.001
12   2.000   -1.000
13   2.000   -1.000
14   2.000   -1.000

Once there are two successive lines the same, they will remain the same from then on and we can stop. Notice how the sequence of iterates converges towards the actual solution, which we easily check is x = 2, y = −1.

Example. Repeat the previous example but with the equations swapped:

x − 2y = 4,
2x − y = 5.

Now the Jacobi iteration is given by the equations

x(k+1) = (4 + 2y(k))/1,    y(k+1) = (5 − 2x(k))/(−1).

Starting from the same initial guess, we construct the following table of values:

 k   x(k)    y(k)
 0   0       0
 1   4       -5
 2   -6      3
 3   10      -17
 4   -30     15
 5   34      -65
 6   -126    63
 7   130     -257
 8   -510    255
 9   514     -1025
10   -2046   1023
11   2050    -4097
12   -8190   4095

This time they don't seem to converge as they did before...

This illustrates that Jacobi's method for solving Ax = b only works for some matrices. It can be proved that it converges for any starting values when A is strictly diagonally dominant.

▶ Strict diagonal dominance is a sufficient but not necessary condition for Jacobi's method to work.

The method works similarly for matrices of any size.

Example. Apply Jacobi's method to the following equations:

x − 3y + z = −3,
−4x − y + 2z = −7,
−2x − 3y + 7z = −12.

Note that the matrix of coefficients

[  1  −3  1 ]
[ −4  −1  2 ]
[ −2  −3  7 ]

is not strictly diagonally dominant, but we can make it so by swapping the first two equations:

[ −4  −1  2 ]
[  1  −3  1 ]      is s.d.d. as    |−4| > |−1| + |2|,   |−3| > |1| + |1|,   |7| > |−2| + |−3|.
[ −2  −3  7 ]

The equations are then

−4x − y + 2z = −7,
x − 3y + z = −3,
−2x − 3y + 7z = −12,

which we rewrite as

x = (−7 + y − 2z)/(−4),    y = (−3 − x − z)/(−3),    z = (−12 + 2x + 3y)/7.

Jacobi's method then has the iterative formula

x(k+1) = (−7 + y(k) − 2z(k))/(−4),
y(k+1) = (−3 − x(k) − z(k))/(−3),
z(k+1) = (−12 + 2x(k) + 3y(k))/7.

Starting with x(0) = y(0) = z(0) = 0, we get

x(1) = (−7 + y(0) − 2z(0))/(−4) = (−7 + 0 − 2 × 0)/(−4) = 1.75,
y(1) = (−3 − x(0) − z(0))/(−3) = (−3 − 0 − 0)/(−3) = 1.00,
z(1) = (−12 + 2x(0) + 3y(0))/7 = (−12 + 2 × 0 + 3 × 0)/7 = −1.71,

and continuing gives the table

 k   x(k)    y(k)    z(k)
 0   0       0       0
 1   1.75    1.00    -1.71
 2   0.645   1.01    -0.786
 3   1.10    0.95    -1.10
 4   0.962   1.00    -0.993
 5   1.00    0.99    -1.01
 6   0.998   0.997   -1.00
 7   1.00    1.00    -1.00
 8   1.00    1.00    -1.00

You can check that x = 1, y = 1, z = −1 is indeed the solution.


We can write a general formula for the Jacobi method by decomposing the matrix A into its diagonal, lower and upper parts: for a general n × n matrix

A = [ a₁₁  a₁₂  ⋯  a₁ₙ ]
    [ a₂₁  a₂₂  ⋯  a₂ₙ ]
    [  ⋮    ⋮   ⋱   ⋮  ]
    [ aₙ₁  aₙ₂  ⋯  aₙₙ ]

we write A = D − L − U where

D = [ a₁₁   0   ⋯   0  ]
    [  0   a₂₂  ⋯   0  ]
    [  ⋮    ⋮   ⋱   ⋮  ]
    [  0    0   ⋯  aₙₙ ]

and

L = [   0     0    ⋯  0 ]        U = [ 0  −a₁₂  ⋯  −a₁ₙ ]
    [ −a₂₁    0    ⋯  0 ]            [ 0    0   ⋯  −a₂ₙ ]
    [   ⋮     ⋮    ⋱  ⋮ ]            [ ⋮    ⋮   ⋱    ⋮  ]
    [ −aₙ₁  −aₙ₂   ⋯  0 ]            [ 0    0   ⋯    0  ]

▶ Warning: these are not the same as the L and U used in LU decomposition!

Then Jacobi's method comes from rewriting

Ax = (D − L − U)x = b  ⇐⇒  Dx = b + (L + U)x,

so that the iteration step is given by

x(k+1) = D⁻¹(b + (L + U)x(k)).

Note that finding and multiplying by the diagonal matrix inverse D⁻¹ is easy – you just divide the ith equation by aᵢᵢ.
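▶ The matrix form translates almost verbatim into MATLAB. A minimal sketch (our own), using the strictly diagonally dominant system from the example above:

A = [-4 -1 2; 1 -3 1; -2 -3 7];
b = [-7; -3; -12];
D = diag(diag(A));              % diagonal part of A
LplusU = D - A;                 % since A = D - L - U, we have L + U = D - A
x = zeros(3, 1);
for k = 1:25
    x = D \ (b + LplusU * x);   % one Jacobi step
end
x    % converges to (1, 1, -1)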

▶ The Jacobi method is rather slow, in terms of the number of steps needed relative to the size of the matrix, so it is not often used in practice. However, it has seen a resurgence in recent times as it is highly suitable for parallel computation: at each step, each variable can be updated independently, in any order.


1.10 Iterative Methods: Gauss-Seidel Method

We can speed up the convergence of the Jacobi method (for many matrices) by a small modification. In the Gauss-Seidel method, new values are used as soon as they are available. So in the Gauss-Seidel method, the order in which the variables are updated matters (and is sometimes carefully chosen in applications).

Example. Apply the Gauss-Seidel method to

2x − y = 5,
x − 2y = 4,

with initial guess x(0) = 0, y(0) = 0.

As with Jacobi, we rewrite the equations in the form

x = (5 + y)/2,    y = (4 − x)/(−2),

but now we use the newest available estimate for the iteration at each step:

x(k+1) = (5 + y(k))/2,    y(k+1) = (4 − x(k+1))/(−2).

Notice we are using the more recent x(k+1) we just calculated instead of x(k). Now we find

x(1) = (5 + y(0))/2 = (5 + 0)/2 = 5/2,    y(1) = (4 − x(1))/(−2) = (4 − 5/2)/(−2) = −3/4,

and

x(2) = (5 + y(1))/2 = (5 − 3/4)/2 = 17/8,    y(2) = (4 − x(2))/(−2) = (4 − 17/8)/(−2) = −15/16.

The values obtained are given in the following table, alongside the Jacobi table for comparison:

      Jacobi's method    Gauss-Seidel method
 k    x(k)     y(k)      x(k)     y(k)
 0    0        0         0        0
 1    2.5      -2        2.5      -0.75
 2    1.5      -0.75     2.125    -0.9375
 3    2.125    -1.25     2.031    -0.9845
 4    1.875    -0.9375   2.008    -0.996
 5    2.031    -1.062    2.002    -0.999
 6    1.969    -0.9845   2.000    -1.000
 7    2.008    -1.016    2.000    -1.000
 8    1.992    -0.996
 9    2.002    -1.004
10    1.998    -0.999
11    2.000    -1.001
12    2.000    -1.000
13    2.000    -1.000

As you can see, for this example the Gauss-Seidel method converges more quickly.


As for Jacobi's method, it can be proved that the Gauss-Seidel method will converge for any starting values if the matrix is strictly diagonally dominant. But we also have a further useful result: the Gauss-Seidel method will converge for any starting values if the matrix is positive definite.

A positive definite matrix is a matrix that is symmetric (meaning A = Aᵀ) and whose principal minors are all positive. For example, consider the two matrices

A = [ 4  1  2 ]        B = [  2  −1  1 ]
    [ 1  3  1 ]  and       [ −1   5  2 ]
    [ 3  0  6 ]            [  1   2  2 ]

The matrix A cannot be positive definite, because it is not symmetric (even though its principal minors are all positive). On the other hand, B is positive definite, because it is symmetric and

|2| = 2 > 0,    |  2  −1 |  =  9 > 0,    |  2  −1  1 |
                | −1   5 |               | −1   5  2 |  =  2(6) + 1(−4) + 1(−7) = 1 > 0.
                                         |  1   2  2 |

Notice B is positive definite but not strictly diagonally dominant. So not only is the Gauss-Seidel method often quicker, we can also guarantee that it will converge for more matrices than the Jacobi method.

As for Jacobi, we can also write a general formula for the method in terms of the matrix decomposition A = D − L − U. This time we find

x(k+1) = D⁻¹(b + Lx(k+1) + Ux(k)).

Note it does make sense to write it like this. The entries in the left hand side vector x(k+1) are computed in order, one by one, and the Lx(k+1) on the right only includes entries that have already been computed.


1.11 Iterative Methods: SOR Method

There is a further refinement of the Gauss-Seidel method called Successive Over-Relaxation, or SOR for short. Starting with A = D − L − U as before, we rewrite our system Ax = b as

x = D⁻¹(b + Lx + Ux),

then multiply this by some number ω (which we will choose later) to give

ωx = ωD⁻¹(b + Lx + Ux).

Adding (1 − ω)x to both sides gives

x = (1 − ω)x + ωD⁻¹(b + Lx + Ux).

The SOR method is then the iteration

x(k+1) = (1 − ω)x(k) + ωD⁻¹(b + Lx(k+1) + Ux(k)).

Notice ω = 1 would give the Gauss-Seidel method, whereas for a different value of ω the SOR method is a weighted average. The idea behind this is that at each stage of the Gauss-Seidel iteration, the guesses move towards the final answer, but will, depending on the matrix, generally overshoot or generally undershoot. The modification with an appropriate choice of ω can improve this and speed up convergence.

Example. Derive the SOR iteration formula for the system

4x + 3y = 24,
3x + 4y − z = 30,
−y + 4z = −24.

We rewrite the equations as

x = (1 − ω)x + ω(24 − 3y)/4,
y = (1 − ω)y + ω(30 − 3x + z)/4,
z = (1 − ω)z + ω(−24 + y)/4,

and the SOR iteration is

x(k+1) = (1 − ω)x(k) + ω(24 − 3y(k))/4,
y(k+1) = (1 − ω)y(k) + ω(30 − 3x(k+1) + z(k))/4,
z(k+1) = (1 − ω)z(k) + ω(−24 + y(k+1))/4.

Notice the fractions in the brackets are precisely the terms you'd get on the right-hand sides of the Gauss-Seidel iteration (which you obtain by setting ω = 1).


Rather than give a table of results, here is a graph showing the performance of SOR for different values of ω, compared to the exact solution x = 3, y = 4, z = −5 (graph omitted). Trial and error shows that the optimum value is ω ≈ 1.25. In this example, the SOR method with this optimum ω converges approximately twice as quickly as the Gauss-Seidel method (where ω = 1).

▶ Notice that the performance is strongly dependent on the choice of the ω parameter. Choosing the best value of ω is generally a very hard problem. It is possible to calculate the optimum ω for special classes of matrices, but this is beyond the scope of this course. In practice, for general matrices some trial-and-error is used.
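▶ For concreteness, here is a minimal MATLAB sketch of SOR for the system above (our own code); each update forms the Gauss-Seidel value and then takes the weighted average with the old value:

A = [4 3 0; 3 4 -1; 0 -1 4];  b = [24; 30; -24];
w = 1.25;                     % the near-optimal value found by trial and error
x = zeros(3, 1);
for k = 1:30
    for i = 1:3
        rest = [1:i-1, i+1:3];
        gs = (b(i) - A(i, rest) * x(rest)) / A(i, i);   % Gauss-Seidel value
        x(i) = (1 - w) * x(i) + w * gs;                 % relaxed update
    end
end
x    % converges to (3, 4, -5)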


2 Complex Numbers

2.0 Introduction

Complex numbers provide a way of working with equations such as x² = −7 that don't have solutions in the ordinary real numbers we are used to.

Their very special properties can simplify the study of areas such as geometry, particle physics, fluid mechanics, electrical circuits, ... which don't initially appear to need a square root of −1. But why should such a curious thing ever arise?

There are good reasons for introducing new types of “numbers” - to solve equations. It’snatural to start with the set of natural numbers, which we write as

N = {1, 2, 3, ...}

We have no problem adding and multiplying them: given a, b ∈ N, we can find c, d ∈ Nsatisfying

c = a+ b and d = ab.

However, if we want to always find an $x$ satisfying

$$a + x = b,$$

we are led to introduce negative numbers. We thus get the set of integers (whole numbers)

$$\mathbb{Z} = \{..., -2, -1, 0, 1, 2, ...\}.$$

If we further want to always find an $x$ so that

$$bx = a$$

(at least for $b \neq 0$), we are led to the rational numbers $\mathbb{Q}$, i.e. fractions of integers. We can add, subtract, multiply and divide (except by 0) these just fine.

But then what about equations such as $x^2 = 2$?

Example. The number $\sqrt{2}$ is not in $\mathbb{Q}$.

Just for fun, let's actually prove this. Suppose on the contrary that $\sqrt{2}$ is rational. Then there are positive integers $a, b$ with no common integer factors such that

$$\sqrt{2} = \frac{a}{b}, \quad\text{i.e.}\quad a^2 = 2b^2.$$

But this means $a^2$ and hence $a$ is an even number, so $a = 2c$ for some integer $c$. The above now says

$$a^2 = (2c)^2 = 2b^2 \quad\text{and so}\quad b^2 = 2c^2.$$

But now this means $b^2$ and hence $b$ is even. We have shown that both $a$ and $b$ are divisible by 2. But we said that they had no common factor and so this contradiction means $a$ and $b$ can't exist in the first place.


This reasoning ("proof by contradiction") is very common in maths. A small modification of the above shows more generally that a natural number $m$ is either a perfect square or $\sqrt{m}$ is irrational.

Thus, starting with the rationals, we can go on to create irrational numbers such as $\sqrt{2}$, $\sqrt[3]{2} + \sqrt{7}$, ... which are not merely fractions of integers. However, there are numbers we are familiar with which are not even roots of polynomials with rational coefficients. These are the transcendental numbers and include the well-known numbers $\pi$ and $e$.

We can, however, think of $\pi$ as a "limit" of rational numbers, e.g. $3, 3.1, 3.14, 3.141, ...$ which get progressively closer to $\pi$. By similar considerations (that we won't go into), one can construct any of the real numbers $\mathbb{R}$. You can think of this as the set of numbers formed from (possibly infinite) decimal expansions, e.g. $e = 2.7182818284...$.

Unfortunately, there are still some gaps. Some quadratic equations, e.g. $x^2 - 7 = 0$, have two solutions in $\mathbb{R}$ whereas others, e.g. $x^2 + 7 = 0$, have none. We fix this non-uniform behaviour by introducing the complex numbers. Complex numbers are of the form

$$z = x + iy$$

where $x$ and $y$ are real numbers and $i = \sqrt{-1}$ is a special new number satisfying $i^2 = -1$.

We use the symbol $\mathbb{C}$ to mean the set of all complex numbers, just as we use $\mathbb{Z}$ for the integers and $\mathbb{R}$ for the real numbers.

Mathematicians like complex numbers because, just by including $i = \sqrt{-1}$, we obtain the Fundamental Theorem of Algebra. This says that a polynomial equation of any degree

$$z^n + a_{n-1}z^{n-1} + \ldots + a_1z + a_0 = 0,$$

where $a_0, a_1, \ldots, a_{n-1}$ are in $\mathbb{C}$, has exactly $n$ roots in $\mathbb{C}$ (up to multiplicity). In other words, there are unique roots $z_1, z_2, \ldots, z_n$ (not necessarily distinct) in $\mathbb{C}$ such that we can factorize

$$z^n + a_{n-1}z^{n-1} + \ldots + a_1z + a_0 = (z - z_1)(z - z_2) \cdots (z - z_n).$$

Even if the coefficients are real, it could be that the roots are not. But the theorem tells us that all polynomials have the "correct" number of roots if we allow for complex numbers.

Example. Consider the quadratic equation $z^2 + 7 = 0$. This is a polynomial equation with real coefficients, but the roots have the form

$$z = \pm\sqrt{-7} = \pm\sqrt{7}\sqrt{-1} = \pm i\sqrt{7},$$

which are not real. This corresponds to the factorisation

$$z^2 + 7 = (z + i\sqrt{7})(z - i\sqrt{7}).$$

▶ Notation: occasionally electrical engineers use the letter $j$ instead of $i$ because $i$ is often used for currents. But we will use the more common $i$.


2.1 Complex Arithmetic

In this section, we will learn about the basic properties and manipulation of complex numbers, so that you are comfortable using them in other modules.

Given a complex number

$$z = x + iy$$

where $x$ and $y$ are real numbers, we call $x$ the real part and $y$ the imaginary part, and write

$$x = \mathrm{Re}(z), \qquad y = \mathrm{Im}(z).$$

If $x = 0$ we say that $z$ is purely imaginary and if $y = 0$ then $z$ is just a real number.

▶ Warning: $\mathrm{Im}(z)$ is the real number $y$, not the imaginary number $iy$.

Using their real and imaginary parts, complex numbers can be thought of geometrically as "numbers with two coordinates". We can represent them by a position in two-dimensional space, called the complex plane or sometimes an Argand diagram. For example:

[Figure: example points plotted in the complex plane.]

There are some basic rules for manipulating complex numbers.

Addition and Subtraction: We add and subtract two complex numbers by adding their corresponding real and imaginary parts. For example

$$(1 + 2i) + (4 + i) = (1 + 4) + i(2 + 1) = 5 + 3i$$

and

$$(1 + 2i) - (4 + i) = (1 - 4) + i(2 - 1) = -3 + i.$$

Notice that two complex numbers $z = x + iy$ and $w = u + iv$ are equal when

$$z = w \iff z - w = 0 \iff x - u = 0 \text{ and } y - v = 0 \iff x = u \text{ and } y = v.$$

In other words, they're equal precisely when their real and imaginary parts are both equal.


In the complex plane, adding two complex numbers is like adding vectors (which we will come back to in Topic 3). For example, the sum and difference above look as follows:

[Figure: the sum $5 + 3i$ and difference $-3 + i$ drawn as vector additions in the complex plane.]

Complex conjugation: Given $z = x + iy$, we define the complex conjugate of $z$ to be

$$\bar{z} = x - iy.$$

In the complex plane, this is just the reflection of $z$ in the real axis:

[Figure: $z$ and $\bar{z}$ mirrored across the real axis.]

By combining $z$ and $\bar{z}$, we can derive expressions for the real and imaginary parts:

$$z + \bar{z} = (x + iy) + (x - iy) = 2x = 2\,\mathrm{Re}(z) \implies \mathrm{Re}(z) = \frac{z + \bar{z}}{2},$$
$$z - \bar{z} = (x + iy) - (x - iy) = 2iy = 2i\,\mathrm{Im}(z) \implies \mathrm{Im}(z) = \frac{z - \bar{z}}{2i}.$$

Multiplication: We just use the normal rules of algebra, along with the fact that $i^2 = -1$. For example,

$$(2 + 3i)(5 + i) = 2(5) + 2i + 3(5)i + 3i^2 = 10 + 2i + 15i + 3(-1) = 7 + 17i.$$

Also, notice we actually have two square roots of $-1$, since $(-i)^2 = i^2 = -1$ as well.

▶ The geometrical interpretation of multiplication is more complicated and we will return to it later...


Modulus: For an arbitrary complex number $z = x + iy$, we have

$$z\bar{z} = (x + iy)(x - iy) = x^2 + y^2.$$

This is a non-negative real number and is zero precisely when $z = 0$. The modulus (or absolute value) of $z$ is defined by

$$|z| = \sqrt{z\bar{z}} = \sqrt{x^2 + y^2}.$$

In the complex plane, this is the distance between the origin and $z$ (by Pythagoras):

[Figure: $|z|$ as the hypotenuse of the right triangle with legs $x$ and $y$.]

Division: We can remove complex denominators from a fraction by multiplying top and bottom by the complex conjugate of the denominator. For example,

$$\frac{2 + 3i}{5 + i} = \frac{(2 + 3i)(5 - i)}{(5 + i)(5 - i)} = \frac{2(5) - 2i + 3(5)i + 3}{5^2 + 1^2} = \frac{13 + 13i}{26} = \frac{1}{2} + \frac{1}{2}i.$$
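Python has complex numbers built in (the suffix j plays the role of $i$, as in electrical engineering), which makes it easy to check calculations like these. An illustrative aside, not part of the notes:

    z = 2 + 3j
    w = 5 + 1j

    print(z + w)                  # (7+4j)
    print(z * w)                  # (7+17j), matching the multiplication example
    print(z / w)                  # (0.5+0.5j), matching the division example
    print(z.conjugate())          # (2-3j)
    print(abs(z) ** 2, z * z.conjugate())  # 13.0 and (13+0j): |z|^2 = z zbar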

We can see from the definitions that complex conjugation combines nicely with arithmetic operations. In particular, you can check from the definitions that:

$$\overline{z + w} = \bar{z} + \bar{w}, \quad \overline{z - w} = \bar{z} - \bar{w}, \quad \overline{zw} = \bar{z}\,\bar{w}, \quad \overline{\left(\frac{z}{w}\right)} = \frac{\bar{z}}{\bar{w}}.$$

These can make calculations much easier.

Example. Find the complex conjugate of $(2 + 3i)^7$. We have

$$\overline{(2 + 3i)^7} = \left(\overline{2 + 3i}\right)^7 = (2 - 3i)^7.$$

A similar thing happens with the modulus – it works nicely with multiplication and division. Notice that

$$|zw|^2 = (zw)(\overline{zw}) = zw\bar{z}\bar{w} = z\bar{z}\,w\bar{w} = |z|^2|w|^2,$$

and so $|zw| = |z|\,|w|$. Similarly,

$$\left|\frac{z}{w}\right| = \frac{|z|}{|w|}.$$


Example. Find the modulus of $(2 + 3i)^7$. We have

$$|(2 + 3i)^7| = |2 + 3i|^7 = \left(\sqrt{2^2 + 3^2}\right)^7 = 13^{7/2}.$$

Find the modulus of $\dfrac{2 + 3i}{5 + i}$. We have

$$\left|\frac{2 + 3i}{5 + i}\right| = \frac{|2 + 3i|}{|5 + i|} = \frac{\sqrt{2^2 + 3^2}}{\sqrt{5^2 + 1^2}} = \frac{\sqrt{13}}{\sqrt{26}} = \frac{\sqrt{2}}{2}.$$

Notice we found this without using $\dfrac{2 + 3i}{5 + i} = \dfrac{1}{2} + \dfrac{1}{2}i$.

▶ Warning: modulus does not work with addition and subtraction in the same way, and in general

$$|z + w| \neq |z| + |w|.$$

This makes sense because geometrically, $|z|$, $|w|$ and $|z + w|$ are the three sides of a triangle:

[Figure: $z$, $w$ and $z + w$ forming a triangle in the complex plane.]

The triangle inequality says that the length of any side of a triangle can not be greater than the sum of the other two side lengths.

Are there any triangles where one side length actually equals the sum of the other two? The answer is yes, but they are quite special and you might have to think about it to find them! We'll return to the triangle inequality later.


2.2 Polar Form

Polar coordinates lead to an important alternative way to represent complex numbers. Recall that a point with Cartesian coordinates $(x, y)$ has polar coordinates $(r, \theta)$ where

$$x = r\cos\theta \quad\text{and}\quad y = r\sin\theta,$$

so that

$$r = \sqrt{x^2 + y^2} \quad\text{and}\quad \tan\theta = \frac{y}{x}.$$

In the complex plane this looks like

[Figure: $z$ at distance $r$ from the origin, at angle $\theta$ above the real axis.]

In terms of complex numbers, we have

$$z = x + iy = r\cos\theta + ir\sin\theta = r(\cos\theta + i\sin\theta).$$

As we saw earlier, the length

$$r = |z| = \sqrt{x^2 + y^2}$$

is the modulus of $z$. The angle $\theta$ between the real axis and $z$ is called the argument of $z$ and written

$$\theta = \arg(z).$$

Since shifting $\theta$ by a whole multiple of $2\pi$ doesn't change $z$ (it just rotates a whole number of times around the origin) we can always make sure that $-\pi < \theta \leq \pi$. This particular value is called the principal argument and written $\mathrm{Arg}(z)$ with a capital letter. In general, we have $\arg(z) = \mathrm{Arg}(z) + 2n\pi$ for some $n \in \mathbb{Z}$.

▶ We have to be careful when calculating $\theta$ from $x$ and $y$, because it is not simply $\theta = \arctan(y/x)$. The problem is that $\arctan$ gives values between $-\pi/2$ and $\pi/2$, but we want an angle which can be anything between $-\pi$ and $\pi$. We actually have $\theta = \arctan(y/x)$ or $\arctan(y/x) \pm \pi$, depending on which quadrant $z$ lies in. Most programming languages define a special function for this; in MATLAB you type atan2(y,x).
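Python's standard cmath module provides the same machinery (an illustrative aside; cmath.phase uses atan2 internally and returns the principal argument in $(-\pi, \pi]$):

    import cmath
    import math

    z = -math.sqrt(3) - 1j       # the point -sqrt(3) - i from the example below
    print(abs(z))                # modulus: 2.0
    print(cmath.phase(z))        # principal argument: -5*pi/6
    print(cmath.polar(z))        # (r, theta) in one call
    print(cmath.rect(*cmath.polar(z)))  # back to Cartesian form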


Example. Find the moduli and principal arguments of $1 + i$ and $-\sqrt{3} - i$.

(Note: "moduli" is the plural of modulus.)

The moduli are

$$|1 + i| = \sqrt{1^2 + 1^2} = \sqrt{2}$$

and

$$|-\sqrt{3} - i| = \sqrt{(\sqrt{3})^2 + 1^2} = 2.$$

[Figure: $1 + i$ in the first quadrant and $-\sqrt{3} - i$ in the third quadrant.]

From the diagram, we have

$$\mathrm{Arg}(1 + i) = \arctan(1/1) = \pi/4$$

and

$$\mathrm{Arg}(-\sqrt{3} - i) = \arctan(1/\sqrt{3}) - \pi = -\pi + \pi/6 = -5\pi/6.$$

▶ Unless you have had lots of practice, always draw a picture to get the argument correct. I always do!

It should be clear that two complex numbers are equal if and only if they have the same modulus and principal argument.


2.3 Euler’s Formula

A fundamental fact is Euler's formula

$$e^{i\theta} = \cos\theta + i\sin\theta.$$

In particular, setting $\theta = \pi$ gives the famous equation

$$e^{i\pi} = -1.$$

▶ This is often voted the most beautiful equation in all mathematics.

We can give a sketch of why Euler's formula holds using some calculus. Suppose we define a function of a real number $\theta$,

$$f(\theta) = e^{-i\theta}(\cos\theta + i\sin\theta).$$

We will show that this is always equal to 1. Treating $i$ as a regular number, the derivative is

$$f'(\theta) = e^{-i\theta}(-\sin\theta + i\cos\theta) - ie^{-i\theta}(\cos\theta + i\sin\theta) = e^{-i\theta}(-\sin\theta + i\cos\theta - i\cos\theta - i^2\sin\theta) = 0.$$

We have a function whose derivative is always zero and hence must be constant. However, $f(0) = e^0(\cos 0 + i\sin 0) = 1$, so $f(\theta) = 1$ for all $\theta$ and Euler's formula follows.

Any complex number can therefore be written as

$$z = x + iy = re^{i\theta}.$$

With this representation, we can see what multiplication and division do geometrically in the complex plane. With $z = re^{i\theta}$ and $w = se^{i\phi}$, we have

$$zw = rse^{i(\theta + \phi)} \quad\text{and}\quad \frac{z}{w} = \frac{r}{s}e^{i(\theta - \phi)}.$$

So when we multiply two complex numbers, we multiply their moduli and add their arguments. In other words,

$$|zw| = |z|\,|w| = rs \quad\text{and}\quad \arg(zw) = \arg(z) + \arg(w) = \theta + \phi.$$

▶ Generally speaking, Cartesian coordinates $x + iy$ are easy to add but complicated to multiply, whereas polar coordinates $re^{i\theta}$ are easy to multiply but complicated to add. We have to learn to choose whichever is most appropriate for the problem at hand.


2.4 Powers of Complex Numbers

If $z = re^{i\theta}$ and $a \in \mathbb{R}$, we can easily find

$$z^a = r^a(e^{i\theta})^a = r^ae^{ia\theta}.$$

But remember that we can add a multiple of $2\pi$ to $\theta$ without changing $z$. So for any integer $n \in \mathbb{Z}$ we also have

$$z^a = r^ae^{ia\theta + 2an\pi i}.$$

If $a$ is an integer, this makes no difference since $e^{2an\pi i} = 1$. But if $a$ is non-integer, then $z^a$ will take multiple different values for different $n \in \mathbb{Z}$.

Example. Let's find all $z \in \mathbb{C}$ such that $z^2 = 4$. Notice that $z^2 = 4 = 4e^{2n\pi i}$ for any $n \in \mathbb{Z}$, and so

$$z = \left(4e^{2n\pi i}\right)^{\frac{1}{2}} = 4^{\frac{1}{2}}e^{n\pi i} = 2e^{n\pi i}.$$

We have $e^{n\pi i} = 1$ for even $n$ and $e^{n\pi i} = -1$ for odd $n$. So we get precisely the two real roots $z = \pm 2$ (that we should have already guessed).

[Figure: the roots $2$ and $-2$ on the real axis of the complex plane.]

Example. Find all $z \in \mathbb{C}$ such that $z^3 = 1$. (These are the cube roots of 1.)

From the Fundamental Theorem of Algebra this should have three roots, but only one of them is real, namely $z = 1$. The others must be complex! Writing $z^3 = 1 = e^{2n\pi i}$, then

$$z = e^{\frac{2}{3}n\pi i} \quad\text{for } n \in \mathbb{Z}.$$

There appear to be infinitely many solutions $\ldots, e^{-\frac{2}{3}\pi i}, e^0, e^{\frac{2}{3}\pi i}, e^{\frac{4}{3}\pi i}, \ldots$ but these actually repeat every third term, so there are exactly three roots

$$z = 1,\ e^{\frac{2}{3}\pi i},\ e^{-\frac{2}{3}\pi i}, \quad\text{that is,}\quad z = 1,\ \frac{-1 \pm i\sqrt{3}}{2}.$$

[Figure: the roots $1$, $e^{2\pi i/3}$ and $e^{-2\pi i/3}$ on the unit circle, at angles $\pm 2\pi/3$ from the real axis.]


Notice these three cube roots of 1 lie at the corners of an equilateral triangle. We can deal with more general fractional powers in the same way: given $m \in \mathbb{Z}$ and $w \in \mathbb{C}$, solve

$$z^m = w.$$

The key is to work with polar coordinates. Write $z = re^{i\theta}$ and $w = se^{i\phi} = se^{i\phi + 2n\pi i}$. Then

$$z^m = w \implies r^me^{im\theta} = se^{i\phi + 2n\pi i} \quad\text{for any } n \in \mathbb{Z},$$

and equating the modulus and argument on each side gives

$$r^m = s,\quad m\theta = \phi + 2n\pi \implies r = s^{1/m},\quad \theta = \frac{\phi + 2n\pi}{m} \implies z = s^{1/m}e^{i\phi/m + 2n\pi i/m}.$$

Notice that shifting $n$ by a multiple of $m$ doesn't change this, so we do, as expected, get $m$ roots to the equation $z^m = w$, for example by taking $0 \leq n \leq m - 1$. In the complex plane, these are at the vertices of a regular polygon.
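The recipe $z = s^{1/m}e^{i\phi/m + 2n\pi i/m}$ for $n = 0, 1, \ldots, m-1$ translates directly into a few lines of Python (an illustrative sketch, not from the notes):

    import cmath

    def mth_roots(w, m):
        """All m solutions of z**m = w, via the polar-form recipe."""
        s, phi = cmath.polar(w)          # w = s * exp(i*phi)
        return [s ** (1 / m) * cmath.exp(1j * (phi + 2 * cmath.pi * n) / m)
                for n in range(m)]

    # The cube roots of 1: approximately 1, -0.5+0.866i, -0.5-0.866i.
    for z in mth_roots(1, 3):
        print(z, z ** 3)                 # each z**3 comes back as (almost) 1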

We can even raise a complex number to a complex power.

Example. Let's find $i^i$. Again, we start by writing $i$ in polar coordinates. To find the argument draw the diagram:

[Figure: $i$ on the imaginary axis, at angle $\pi/2$.]

We see that $i = e^{\pi i/2} = e^{\pi i/2 + 2n\pi i}$ for any $n \in \mathbb{Z}$. Then

$$i^i = \left(e^{\pi i/2 + 2n\pi i}\right)^i = e^{\pi i^2/2 + 2n\pi i^2} = e^{-\pi/2 - 2n\pi}.$$

So

$$i^i = \ldots,\ e^{3\pi/2},\ e^{-\pi/2},\ e^{-5\pi/2},\ \ldots$$

There are infinitely many solutions, and they are all real! For example, the three particular solutions listed are approximately $111.3$, $0.2079$, $0.00039$.


Here's a more complicated example.

Example. Find all values of $(1 + i)^{2-i}$.

First, write $1 + i$ in the appropriate form.

[Figure: $1 + i$ at distance $\sqrt{2}$ from the origin, at angle $\pi/4$.]

In polar form, we have $1 + i = \sqrt{2}e^{\pi i/4} = \sqrt{2}e^{\pi i/4 + 2n\pi i}$ for any $n \in \mathbb{Z}$.

Also using $\sqrt{2} = e^{\ln\sqrt{2}}$, we can now write

$$1 + i = e^{\ln\sqrt{2} + \pi i/4 + 2n\pi i}$$

for any $n \in \mathbb{Z}$. So

$$(1 + i)^{2-i} = e^{(\ln\sqrt{2} + \pi i/4 + 2n\pi i)(2 - i)} = e^{\ln 2 + \pi/4 + 2n\pi + i(\pi/2 - \ln\sqrt{2}) + 4n\pi i} = 2e^{\pi/4 + 2n\pi}e^{i(\pi/2 - \ln\sqrt{2})},$$

where we have used $e^{4n\pi i} = 1$ to simplify at the end. There are infinitely many values, having a different modulus $2e^{\pi/4 + 2n\pi}$ for each $n \in \mathbb{Z}$ but having the same argument $\frac{1}{2}\pi - \ln\sqrt{2}$. This means they all lie on a line through the origin in the complex plane.

▶ More generally, to find $z^{a+ib}$ where $a, b$ are real, first write $z = re^{i\theta}$ in polar form. Then using $r = e^{\ln r}$, we have

$$z = e^{\ln r + i\theta} = e^{\ln r + i\theta + 2n\pi i} \quad\text{for any } n \in \mathbb{Z}.$$

Hence

$$z^{a+ib} = \left(e^{\ln r + i\theta + 2n\pi i}\right)^{a+ib} = e^{(\ln r + i\theta + 2n\pi i)(a+ib)} = e^{a\ln r - b(\theta + 2n\pi) + i(b\ln r + a\theta + 2an\pi)} = \underbrace{r^ae^{-b(\theta + 2n\pi)}}_{\text{modulus}}\ \overbrace{e^{i(b\ln r + a\theta + 2an\pi)}}^{\text{argument}}.$$

Typically, one gets a different value for each $n \in \mathbb{Z}$.

Don't memorise formulas like this or just substitute numbers into it! Instead, remember the method - write $z$ in the form $e^{\alpha + i\beta}$, then multiply out the powers and simplify.


2.5 Solving Polynomial Equations

Even though the Fundamental Theorem of Algebra tells us that polynomial equations always have complex solutions, we can only find explicit solutions in special cases. For example, we saw how to find all solutions to $z^m = w$ when $m$ is an integer.

Another special case we are familiar with is a quadratic equation

$$az^2 + bz + c = 0 \quad\text{for } a, b, c \in \mathbb{C} \text{ and } a \neq 0.$$

Remember that we solve this by completing the square:

$$az^2 + bz + c = 0 \iff z^2 + \frac{b}{a}z + \frac{c}{a} = 0 \iff \left(z + \frac{b}{2a}\right)^2 + \frac{c}{a} - \frac{b^2}{4a^2} = 0 \iff \left(z + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2} \iff (2az + b)^2 = b^2 - 4ac.$$

We know that $b^2 - 4ac$ has two square roots (even when $a, b, c$ are not real), so we obtain the usual formula

$$z = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

It's also possible to deal with quadratic equations in $z^m$.

Example. Find all solutions to $z^6 + 2z^3 + 2 = 0$.

The Fundamental Theorem of Algebra tells us there should be six solutions. The key thing here is to notice that the polynomial is just a quadratic in $w = z^3$. So there are two solutions for $w$:

$$w^2 + 2w + 2 = 0 \implies w = \frac{-2 \pm \sqrt{2^2 - 4(1)(2)}}{2} = -1 \pm i.$$

For each $w$, we can now find three solutions for $z$.

[Figure: $-1 + i$ at angle $3\pi/4$ and $-1 - i$ at angle $-3\pi/4$ in the complex plane.]


Firstly, converting $z^3 = -1 + i$ to polar coordinates gives

$$z^3 = -1 + i = 2^{\frac{1}{2}}e^{\frac{3}{4}\pi i + 2n\pi i} \quad\text{for } n \in \mathbb{Z} \implies z = 2^{\frac{1}{6}}e^{\frac{1}{4}\pi i + \frac{2}{3}n\pi i}.$$

Taking three consecutive values for $n$ (e.g. $n = 0, \pm 1$) gives $z = 2^{\frac{1}{6}}e^{\frac{1}{4}\pi i},\ 2^{\frac{1}{6}}e^{\frac{11}{12}\pi i},\ 2^{\frac{1}{6}}e^{-\frac{5}{12}\pi i}$.

Similarly, $z^3 = -1 - i$ gives

$$z^3 = -1 - i = 2^{\frac{1}{2}}e^{-\frac{3}{4}\pi i + 2n\pi i} \quad\text{for } n \in \mathbb{Z} \implies z = 2^{\frac{1}{6}}e^{-\frac{1}{4}\pi i + \frac{2}{3}n\pi i},$$

leading to the distinct solutions $z = 2^{\frac{1}{6}}e^{-\frac{1}{4}\pi i},\ 2^{\frac{1}{6}}e^{\frac{5}{12}\pi i},\ 2^{\frac{1}{6}}e^{-\frac{11}{12}\pi i}$.

In summary, there are six solutions $z = 2^{\frac{1}{6}}e^{\pm\frac{1}{4}\pi i},\ 2^{\frac{1}{6}}e^{\pm\frac{5}{12}\pi i},\ 2^{\frac{1}{6}}e^{\pm\frac{11}{12}\pi i}$. These are plotted in the following diagram – notice that the solutions are in complex conjugate pairs:

[Figure: the six roots equally spaced around a circle of radius $2^{1/6}$, symmetric about the real axis.]

▶ In fact, for any polynomial $p(z)$ with real coefficients, the roots are either real or occur in complex conjugate pairs. We already know this for quadratic equations, but you can show using the properties of complex conjugates that, if $p(z) = 0$, then $p(\bar{z}) = 0$ as well.

Another technique we can use to solve polynomial equations is factorisation.

Example. Finding all solutions to $z^3 = 1$ is equivalent to factorising

$$z^3 - 1 = (z - z_1)(z - z_2)(z - z_3),$$

and the roots are then $z_1, z_2, z_3$. It's easily seen that $z_1 = 1$ is a solution, so $(z - 1)$ must be a factor and

$$z^3 - 1 = (z - 1)(az^2 + bz + c)$$

for some coefficients $a, b, c$. Multiplying out gives

$$z^3 - 1 = az^3 + (b - a)z^2 + (c - b)z - c.$$

Equating coefficients on the left and right gives $a = b = c = 1$, so

$$z^3 - 1 = (z - 1)(z^2 + z + 1).$$

The solutions to $z^2 + z + 1 = 0$ give the other two roots

$$z_2, z_3 = \frac{-1 \pm \sqrt{-3}}{2} = -\frac{1}{2} \pm i\frac{\sqrt{3}}{2} = e^{\pm\frac{2}{3}\pi i}.$$


2.6 Complex Logarithms

Suppose we want to solve the equation

$$e^z = 1.$$

For real numbers, we know there is one solution $z = \ln 1 = 0$, but for complex numbers there are more solutions. Writing $z = x + iy$ and $1 = e^{2n\pi i}$ for any $n \in \mathbb{Z}$, we get

$$e^xe^{iy} = e^{2n\pi i} \quad\text{for any } n \in \mathbb{Z}.$$

Equating modulus and argument shows that $e^x = 1$ (meaning $x = 0$) and $y = 2n\pi$. In other words,

$$e^z = 1 \iff z = 2n\pi i \quad\text{for any } n \in \mathbb{Z}.$$

Now suppose we wish to solve a more general equation for any $w \in \mathbb{C}$,

$$e^z = w.$$

Writing $w = se^{i\phi} = e^{\ln s + i\phi}$, we have

$$e^z = e^{\ln s + i\phi} \iff e^{z - \ln s - i\phi} = 1 \iff z - \ln s - i\phi = 2n\pi i \quad\text{for } n \in \mathbb{Z}.$$

So we have infinitely many solutions

$$z = \ln s + i\phi + 2n\pi i = \ln|w| + i\arg w \quad\text{for } n \in \mathbb{Z}.$$

This $z$ is what we mean by the logarithm of the complex number $w$. Notice that it is again multi-valued since $\arg(w)$ takes infinitely many values.

Recall that the principal argument $\mathrm{Arg}(w)$ is the particular angle in the polar form that satisfies $-\pi < \mathrm{Arg}(w) \leq \pi$. Similarly we define the principal value of the logarithm to be

$$\mathrm{Ln}(w) = \ln|w| + i\,\mathrm{Arg}(w).$$

Example. Solve $e^z = -1$ and hence find $\mathrm{Ln}(-1)$.

We have

$$z = \ln|-1| + i\arg(-1) = 0 + \pi i + 2n\pi i = (2n + 1)\pi i,$$

so taking the principal value gives

$$\mathrm{Ln}(-1) = i\pi.$$

Notice this is just a restatement of the fact $e^{\pi i} = -1$.
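As an illustrative aside (not from the notes): Python's cmath.log returns exactly this principal value, and the other values of the logarithm differ from it by multiples of $2\pi i$:

    import cmath

    w = -1
    print(cmath.log(w))         # 3.14159...j, i.e. the principal value Ln(-1) = i*pi

    # The full multi-valued logarithm: Ln(w) + 2*n*pi*i.
    for n in (-1, 0, 1):
        z = cmath.log(w) + 2 * n * cmath.pi * 1j
        print(z, cmath.exp(z))  # exp(z) is (-1+0j) up to rounding, for every n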


2.7 De Moivre’s Theorem

If we replace $\theta$ in Euler's formula

$$e^{i\theta} = \cos\theta + i\sin\theta$$

with $n\theta$ for integer $n$, then

$$e^{in\theta} = \cos(n\theta) + i\sin(n\theta).$$

But clearly $e^{in\theta} = \left(e^{i\theta}\right)^n$ and so we get de Moivre's Theorem:

$$\cos(n\theta) + i\sin(n\theta) = (\cos\theta + i\sin\theta)^n.$$

We can use this to easily prove various trigonometric identities for real $\theta$.

Example. Express $\cos(3\theta)$ and $\sin(3\theta)$ in terms of $\cos\theta$ and $\sin\theta$.

From de Moivre's Theorem,

$$\cos(3\theta) + i\sin(3\theta) = (\cos\theta + i\sin\theta)^3 = \cos^3\theta + 3\cos^2\theta(i\sin\theta) + 3\cos\theta(i\sin\theta)^2 + (i\sin\theta)^3 = \left(\cos^3\theta - 3\cos\theta\sin^2\theta\right) + i\left(3\cos^2\theta\sin\theta - \sin^3\theta\right).$$

Now equating real and imaginary parts gives

$$\cos(3\theta) = \cos^3\theta - 3\cos\theta\sin^2\theta,$$
$$\sin(3\theta) = 3\cos^2\theta\sin\theta - \sin^3\theta.$$

We could go a little further, using $\cos^2\theta + \sin^2\theta = 1$ to see

$$\cos(3\theta) = \cos^3\theta - 3\cos\theta(1 - \cos^2\theta) = 4\cos^3\theta - 3\cos\theta,$$
$$\sin(3\theta) = 3(1 - \sin^2\theta)\sin\theta - \sin^3\theta = 3\sin\theta - 4\sin^3\theta.$$


2.8 Trigonometric Functions

Interchanging $\theta$ with $-\theta$ in Euler's formula gives two equations

$$e^{i\theta} = \cos\theta + i\sin\theta,$$
$$e^{-i\theta} = \cos\theta - i\sin\theta,$$

which may be solved simultaneously to give the important expressions

$$\cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2} \quad\text{and}\quad \sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i}.$$

▶ Notice the similarity to what we learnt about hyperbolic functions in the Induction week Warm-up. If we leave out the $i$'s in the above formulae, we get

$$\cosh\theta = \frac{e^{\theta} + e^{-\theta}}{2} \quad\text{and}\quad \sinh\theta = \frac{e^{\theta} - e^{-\theta}}{2},$$

and in fact we can relate the trigonometric and hyperbolic functions:

$$\cos\theta = \cosh(i\theta) \quad\text{and}\quad \sin\theta = \frac{\sinh(i\theta)}{i}.$$

We can use the exponential expressions for $\cos\theta$ and $\sin\theta$ to easily derive trigonometric identities.

Example. Show that

$$\cos^2\theta = \frac{1 + \cos(2\theta)}{2} \quad\text{and}\quad \sin^2\theta = \frac{1 - \cos(2\theta)}{2}.$$

Just substitute the formulas in and simplify:

$$\cos^2\theta = \left(\frac{e^{i\theta} + e^{-i\theta}}{2}\right)^2 = \frac{e^{2i\theta} + 2 + e^{-2i\theta}}{4} = \frac{1}{2} + \frac{1}{2}\left(\frac{e^{2i\theta} + e^{-2i\theta}}{2}\right) = \frac{1 + \cos(2\theta)}{2}.$$

Similarly,

$$\sin^2\theta = \left(\frac{e^{i\theta} - e^{-i\theta}}{2i}\right)^2 = \frac{e^{2i\theta} - 2 + e^{-2i\theta}}{-4} = \frac{1}{2} - \frac{1}{2}\left(\frac{e^{2i\theta} + e^{-2i\theta}}{2}\right) = \frac{1 - \cos(2\theta)}{2}.$$

The exponential expressions for $\cos\theta$ and $\sin\theta$ can also help with calculus of trigonometric functions.


Example. Find $\dfrac{d}{dx}\cos x$ directly from the exponential formula.

Differentiating gives

$$\frac{d}{dx}\cos x = \frac{ie^{ix} - ie^{-ix}}{2} = i^2\left(\frac{e^{ix} - e^{-ix}}{2i}\right) = -\sin x.$$

Example. Find $\displaystyle\int e^{ax}\cos(bx)\,dx$ and $\displaystyle\int e^{ax}\sin(bx)\,dx$, where $a, b \in \mathbb{R}$ are not both zero.

We could do each of these using integration by parts twice. However, notice that

$$e^{ax}\cos(bx) + ie^{ax}\sin(bx) = e^{ax}e^{ibx} = e^{(a+ib)x},$$

so

$$\int e^{ax}\cos(bx)\,dx + i\int e^{ax}\sin(bx)\,dx = \int e^{(a+ib)x}\,dx = \frac{1}{a + ib}e^{(a+ib)x} + C = \left(\frac{a - ib}{a^2 + b^2}\right)e^{ax}\bigl(\cos(bx) + i\sin(bx)\bigr)$$
$$= \frac{e^{ax}\bigl(a\cos(bx) + b\sin(bx)\bigr)}{a^2 + b^2} + i\,\frac{e^{ax}\bigl(a\sin(bx) - b\cos(bx)\bigr)}{a^2 + b^2} + C.$$

Equating real and imaginary parts gives us both integrals at once.

We can use the exponential expressions to define $\cos z$ and $\sin z$ for arbitrary complex numbers:

$$\cos z = \frac{e^{iz} + e^{-iz}}{2} \quad\text{and}\quad \sin z = \frac{e^{iz} - e^{-iz}}{2i}.$$

This has a number of surprising consequences. For instance, $\sin z$ can take values which are not just real numbers between $-1$ and $1$.

Example. Find all complex numbers $z$ satisfying $\sin z = 2$.

We want to find all $z$ so that

$$\frac{e^{iz} - e^{-iz}}{2i} = 2 \iff e^{2iz} - 4ie^{iz} - 1 = 0.$$

Writing $w = e^{iz}$, this is a quadratic in $w$, which we know how to solve:

$$w^2 - 4iw - 1 = 0 \iff w = \frac{4i \pm \sqrt{(4i)^2 + 4}}{2} = i(2 \pm \sqrt{3}).$$

Now we need to solve $e^{iz} = i(2 \pm \sqrt{3}) = (2 \pm \sqrt{3})e^{\frac{1}{2}\pi i + 2n\pi i}$ for $n \in \mathbb{Z}$.


Writing $z = x + iy$, we have

$$e^{i(x+iy)} = e^{-y}e^{ix} = (2 \pm \sqrt{3})e^{\frac{1}{2}\pi i + 2n\pi i} \iff x = \tfrac{1}{2}\pi + 2n\pi \text{ and } y = -\ln(2 \pm \sqrt{3}) \iff z = \tfrac{1}{2}\pi + 2n\pi - i\ln(2 \pm \sqrt{3}) \quad\text{for } n \in \mathbb{Z}.$$

In the complex plane, these look like two rows of equally spaced points:

[Figure: two horizontal rows of solutions at heights $\mp\ln(2 + \sqrt{3})$, with real parts $\ldots, -7\pi/2, -3\pi/2, \pi/2, 5\pi/2, \ldots$]

We could rewrite these solutions in a slightly nicer way as

$$z = \frac{1}{2}\pi + 2n\pi \pm i\ln(2 + \sqrt{3})$$

since $(2 + \sqrt{3})(2 - \sqrt{3}) = 1$ and so $\ln(2 - \sqrt{3}) = -\ln(2 + \sqrt{3})$.

▶ In general, for complex $w$, you can show that

$$\arcsin w = -i\ln\left(iw \pm \sqrt{1 - w^2}\right).$$

Note that $\ln$ here is the complex logarithm so is multi-valued. This means that $\arcsin$ is also multivalued. We could define a principal value for it in the same way as we did for arguments and logarithms. Notice how similar this formula is to the formula for $\sinh^{-1}(x)$ from the Induction week Warm-up.


2.9 Application to Geometry

We've seen how complex numbers have a natural meaning in two-dimensional space. This connection can help us study problems in two-dimensional geometry.

Recall that, for a point $z$ in the complex plane, the modulus $|z|$ represents the distance from the origin and $\arg z$ represents the angle with the real (horizontal) axis. Also, arithmetic operations correspond to geometric ones. For instance:

• Adding $w = u + iv$ translates by a distance $u$ horizontally and $v$ vertically.

• Multiplying by a real number $s$ scales by a factor $s$.

• Multiplying by $e^{i\phi}$ rotates anticlockwise by an angle $\phi$.

▶ In particular, rotations in 2D are easier to handle compared to using normal coordinates. With complex numbers, it's just multiplication! In fact, three dimensional rotations can be handled using a generalization of the complex numbers called quaternions, but these are beyond the scope of this course.

Example. An equilateral triangle has centre at $(3, 3)$ and a vertex at $(4, 5)$. Find the coordinates of the other vertices and the length of the sides.

In the complex plane, the centre is at $z_0 = 3 + 3i$ and one vertex is at $z_1 = 4 + 5i$. We want to find the other two vertices $z_2, z_3$ as shown in the left-hand picture:

[Figure: left, the triangle $z_1z_2z_3$ with centre $z_0$; right, the translated copy $w_1w_2w_3$ centred at the origin.]

We can find these by rotating $z_1$ around $z_0$ through angles $\pm 2\pi/3$ respectively. Rotating around the origin would be easier, since then we just multiply by $e^{\pm\frac{2}{3}\pi i}$. To rotate around $z_0$, we first translate the picture so that $z_0$ moves to the origin. So set $w_1 = z_1 - z_0$, $w_2 = z_2 - z_0$ and $w_3 = z_3 - z_0$. Now we have the right-hand picture, where

$$w_2 = e^{\frac{2}{3}\pi i}w_1 \implies z_2 - z_0 = e^{\frac{2}{3}\pi i}(z_1 - z_0),$$
$$w_3 = e^{-\frac{2}{3}\pi i}w_1 \implies z_3 - z_0 = e^{-\frac{2}{3}\pi i}(z_1 - z_0).$$


Using $e^{\pm\frac{2}{3}\pi i} = \cos\left(\pm\frac{2}{3}\pi\right) + i\sin\left(\pm\frac{2}{3}\pi\right) = -\frac{1}{2} \pm i\frac{\sqrt{3}}{2}$, we get

$$z_2 = z_0 + e^{\frac{2}{3}\pi i}(z_1 - z_0) = 3 + 3i + \left(-\frac{1}{2} + i\frac{\sqrt{3}}{2}\right)(1 + 2i) = \left(\frac{5}{2} - \sqrt{3}\right) + i\left(2 + \frac{\sqrt{3}}{2}\right),$$

$$z_3 = z_0 + e^{-\frac{2}{3}\pi i}(z_1 - z_0) = 3 + 3i + \left(-\frac{1}{2} - i\frac{\sqrt{3}}{2}\right)(1 + 2i) = \left(\frac{5}{2} + \sqrt{3}\right) + i\left(2 - \frac{\sqrt{3}}{2}\right).$$

The length of the sides is

$$|z_1 - z_2| = \left|4 + 5i - \left(\frac{5}{2} - \sqrt{3}\right) - i\left(2 + \frac{\sqrt{3}}{2}\right)\right| = \left|\left(\frac{3}{2} + \sqrt{3}\right) + i\left(3 - \frac{\sqrt{3}}{2}\right)\right| = \sqrt{\left(\frac{3}{2} + \sqrt{3}\right)^2 + \left(3 - \frac{\sqrt{3}}{2}\right)^2} = \sqrt{15}.$$

▶ Here's a similar, general fact you could try to prove as an exercise: suppose a triangle has vertices at $z_1$, $z_2$ and $z_3$ (taken in anticlockwise order). Show that it is equilateral if and only if

$$z_1 + \omega z_2 + \omega^2 z_3 = 0 \quad\text{where } \omega = e^{\frac{2}{3}\pi i}.$$

Hint: notice it's equilateral if and only if $z_3 - z_2 = \omega(z_2 - z_1)$ and use the fact that $\omega^2 = -\omega - 1$.

Complex numbers also give a convenient way to define various curves and regions. Let's look at a few examples.

Example. Given a complex number $w$ and a real number $R > 0$, sketch the points satisfying $|z - w| = R$.

Notice that $|z - w|$ is the distance between $z$ and $w$. We are thus looking at all points $z$ which are distance $R$ away from $w$. It's just a circle!

[Figure: the circle of radius $R$ centred at $w$, with $z - w$ as a radius vector.]


You can easily check this algebraically: if $z = x + iy$ and $w = u + iv$ then

$$|z - w|^2 = (x - u)^2 + (y - v)^2 = R^2.$$

Example. Sketch the set of points $z$ satisfying $|z - 2| = |z - 3i|$.

The equation says that $z$ is equidistant from the point $2$ and the point $3i$, so it must be the perpendicular bisector line:

[Figure: the perpendicular bisector of the segment joining $2$ and $3i$.]

To see this algebraically, let $z = x + iy$ so

$$|z - 2|^2 = |z - 3i|^2 \implies (x - 2)^2 + y^2 = x^2 + (y - 3)^2 \implies -4x + 4 = -6y + 9 \implies y = \frac{4x + 5}{6}.$$

Example. Sketch the set of points $z$ satisfying $\dfrac{\pi}{6} \leq \arg(z - 1 - i) \leq \dfrac{\pi}{3}$.

This says that the angle between the line from $z$ to $1 + i$ and the real axis is between $\pi/6$ and $\pi/3$. That makes the region a wedge shape between two lines.

[Figure: the wedge with apex at $1 + i$, bounded by the two rays at angles $\pi/6$ and $\pi/3$.]


Algebraically, if $z = x + iy$ then

$$\frac{1}{\sqrt{3}} \leq \tan\arg(z - 1 - i) = \frac{y - 1}{x - 1} \leq \sqrt{3},$$

so

$$\frac{x - 1}{\sqrt{3}} + 1 \leq y \leq \sqrt{3}(x - 1) + 1.$$

Hence $z$ lies in the region between the two lines and with $x \geq 1$.

Complex numbers also give a particularly elegant proof of the triangle inequality: this says that if a triangle has side lengths $a, b, c$, then $a \leq b + c$. The equivalent statement using complex numbers says that for any two complex numbers $z, w$ we have

$$|z + w| \leq |z| + |w|.$$

[Figure: $z$, $w$ and $z + w$ as the three sides of a triangle, with lengths $|z|$, $|w|$ and $|z + w|$.]

To prove this, remember that $|z|^2 = z\bar{z}$, $|w|^2 = w\bar{w}$, and

$$|z + w|^2 = (z + w)(\overline{z + w}) = (z + w)(\bar{z} + \bar{w}) = z\bar{z} + z\bar{w} + \bar{z}w + w\bar{w} = |z|^2 + |w|^2 + z\bar{w} + \bar{z}w.$$

Now let $z\bar{w} = a + ib$ for real numbers $a$ and $b$. Then

$$z\bar{w} + \bar{z}w = z\bar{w} + \overline{z\bar{w}} = 2\,\mathrm{Re}(z\bar{w}) = 2a \leq 2\sqrt{a^2 + 0^2} \leq 2\sqrt{a^2 + b^2} = 2|z\bar{w}|.$$

Combining, we see that

$$|z + w|^2 \leq |z|^2 + |w|^2 + 2|z\bar{w}| = |z|^2 + |w|^2 + 2|z||w| = (|z| + |w|)^2,$$

and taking square roots gives the required inequality.


2.10 Application to AC Circuits

Complex numbers are very good for analysing things that vary in waves. In particular, they help us to analyse AC (alternating current) electrical circuits.

▶ For the purposes of this module, we don't expect you to know anything about AC circuits. However, below we will set up the background to create some formulas involving complex numbers. You would then be expected to know how to manipulate these formulas if they were given to you.

In an AC circuit, the current varies sinusoidally, so in particular, could be of the form

$$I = I_0\cos(\omega t)$$

for (real) constants $I_0$ and $\omega$. In particular, $I_0$ is the amplitude and $\omega/(2\pi)$ is the frequency. In other words, the current completes a whole sinusoidal cycle when $t$ increases by $2\pi/\omega$.

An RLC circuit consists of some combination of three types of component:

The voltage across a resistor is proportional to the current (this is Ohm's Law):

$$V_R = IR = I_0R\cos(\omega t).$$

The voltage across a capacitor accumulates at a rate proportional to the current:

$$V_C = \frac{1}{C}\int I\,dt = \frac{I_0}{\omega C}\sin(\omega t).$$

The voltage across an inductor is proportional to the rate of change of current:

$$V_L = L\frac{dI}{dt} = -\omega LI_0\sin(\omega t).$$

So the voltage across a resistor $V_R$ oscillates in sync with $I$ but $V_C$ and $V_L$ are out of sync with $I$, with phase differences $\pm\pi/(2\omega)$.

Complex numbers allow us to handle these voltages across all three components in a uniform way. The trick is to think of the current $I$ as the real part of a complex current

$$I = I_0e^{i\omega t}.$$


We now express everything as the real parts of complex quantities, using

$$e^{i\omega t} = \cos(\omega t) + i\sin(\omega t).$$

Notice that multiplying by $\pm i = e^{\pm\frac{\pi}{2}i}$ allows us to switch between cosines and sines:

• $\cos(\omega t)$ is the real part of $e^{i\omega t}$,

• $\sin(\omega t) = \cos\left(\omega t - \frac{\pi}{2}\right)$ is the real part of $e^{i(\omega t - \frac{\pi}{2})} = -ie^{i\omega t}$,

• $-\sin(\omega t) = \cos\left(\omega t + \frac{\pi}{2}\right)$ is the real part of $e^{i(\omega t + \frac{\pi}{2})} = ie^{i\omega t}$.

The voltages across our components are hence the real parts of multiples of $I$:

• $V_R$ is the real part of $IR$.

• $V_C$ is the real part of $I\left(\dfrac{-i}{\omega C}\right)$.

• $V_L$ is the real part of $I(i\omega L)$.

Each of these is the real part of a complex voltage $V = IZ$, where $Z$ is called the complex impedance:

$$Z_R = R, \qquad Z_C = \frac{-i}{\omega C}, \qquad Z_L = i\omega L.$$

In a circuit with more than one component, the rules for adding impedances are the same as for adding resistances in a DC circuit. So in a circuit with impedances $Z_1, Z_2, \ldots, Z_n$ in series, the total impedance is

$$Z = Z_1 + Z_2 + \ldots + Z_n.$$

For a circuit with impedances $Z_1, Z_2, \ldots, Z_n$ in parallel, the total impedance satisfies

$$\frac{1}{Z} = \frac{1}{Z_1} + \frac{1}{Z_2} + \ldots + \frac{1}{Z_n}.$$
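These impedance rules are easy to play with numerically. Here is an illustrative Python sketch (the component values R, L, C and the frequency are made-up numbers, not from the notes):

    import cmath

    def series(*zs):
        return sum(zs)

    def parallel(*zs):
        return 1 / sum(1 / z for z in zs)

    R, L, C, omega = 100.0, 0.5, 1e-6, 2000.0   # assumed illustrative values
    Z_R, Z_L, Z_C = R, 1j * omega * L, -1j / (omega * C)

    Z = series(Z_R, Z_L, Z_C)       # series RLC, as in the next example
    print(abs(Z), cmath.phase(Z))   # |Z| and the phase angle phi = arg(Z)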

Example. For the following "series RLC circuit" with current $I = I_0\cos(\omega t)$, find the value of $\omega$ which minimises the magnitude of the total impedance. Also, find the phase difference between the current and voltage in terms of $\omega$.

[Figure: a resistor $R$, inductor $L$ and capacitor $C$ connected in series.]


The total impedance is

$$Z = Z_R + Z_L + Z_C = R + i\omega L - \frac{i}{\omega C} = R + i\left(\omega L - \frac{1}{\omega C}\right).$$

This has modulus

$$|Z| = \sqrt{R^2 + \left(\omega L - \frac{1}{\omega C}\right)^2},$$

and since the two terms inside the square root are non-negative, we clearly want to find $\omega$ so that the second term vanishes:

$$\omega L - \frac{1}{\omega C} = 0 \implies \omega = \frac{1}{\sqrt{LC}}.$$

Furthermore, since the (complex) voltage and (complex) current are related by $V = IZ$, the phase difference is $\phi = \arg Z = \arg(V) - \arg(I)$. With $Z$ as above, this means

$$\tan\phi = \frac{1}{R}\left(\omega L - \frac{1}{\omega C}\right).$$

▶ Since the impedance is minimal for this value of $\omega$, the circuit is "tuned" to allow this particular frequency $\omega/(2\pi)$ through most easily. It is called the resonant frequency of the circuit. Also, $\phi$ is called the phase of the circuit - the voltage is a sinusoidal wave tracking the (sinusoidal) current with a phase-lag of $\phi$.

Here is a more complicated example.

Example. For the following "parallel RC circuit", show that as $\omega$ varies from 0 to $\infty$, the total complex impedance traces out a semi-circle centred at $z = \frac{1}{2}R$ in the complex plane. Find its radius.

[Figure: a resistor $R$ and capacitor $C$ connected in parallel.]

The total impedance $Z$ satisfies

$$\frac{1}{Z} = \frac{1}{R} - \frac{\omega C}{i} \implies Z = \frac{R}{1 + iR\omega C}.$$

To show that it traces out a semi-circle, we compute the real and imaginary parts:

$$Z = \frac{R(1 - iR\omega C)}{1^2 + (R\omega C)^2} = \frac{R}{1 + (R\omega C)^2} - i\,\frac{R^2\omega C}{1 + (R\omega C)^2}.$$


If $Z = X + iY$ lies on the given semicircle, then $(X - \frac{1}{2}R)^2 + Y^2$ should be equal to a constant independent of $\omega$. With some complicated (but mindless) algebra, we find this constant is just $R^2/4$:

$$\left(X - \frac{R}{2}\right)^2 + Y^2 = \left(\frac{R}{1 + (R\omega C)^2} - \frac{R}{2}\right)^2 + \left(\frac{R^2\omega C}{1 + (R\omega C)^2}\right)^2$$
$$= \left(\frac{2R - R(1 + (R\omega C)^2)}{2(1 + (R\omega C)^2)}\right)^2 + \left(\frac{2R^2\omega C}{2(1 + (R\omega C)^2)}\right)^2$$
$$= \frac{R^2}{4}\left[\left(\frac{1 - (R\omega C)^2}{1 + (R\omega C)^2}\right)^2 + \left(\frac{2R\omega C}{1 + (R\omega C)^2}\right)^2\right]$$
$$= \frac{R^2}{4}\left[\frac{1 - 2(R\omega C)^2 + (R\omega C)^4 + 4(R\omega C)^2}{(1 + (R\omega C)^2)^2}\right]$$
$$= \frac{R^2}{4}\left[\frac{(1 + (R\omega C)^2)^2}{(1 + (R\omega C)^2)^2}\right] = \frac{R^2}{4}.$$

So $Z$ lies on a circle of radius $\frac{1}{2}R$ centred at $\left(\frac{1}{2}R, 0\right)$.

In fact, $Z$ lies on the semi-circle below the real axis because

$$Y = \frac{-R^2\omega C}{1 + (R\omega C)^2} \leq 0 \quad\text{for all } \omega \geq 0.$$

Notice that as $\omega \to 0$, the complex impedance $Z$ reduces to $R$, corresponding to a constant direct current (DC) through the resistor. As $\omega \to \infty$ we have $Z \to 0$, so high frequencies pass almost unimpeded.

▶ A diagram showing complex impedance as a function of $\omega$ like this is sometimes called a Nyquist plot.
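An illustrative numerical check of the semicircle (assumed component values, not from the notes; every sampled $\omega$ gives the same constant $R^2/4$):

    R, C = 2.0, 1.0    # assumed illustrative values

    for omega in (0.1, 0.5, 1.0, 5.0, 50.0):
        Z = R / (1 + 1j * R * omega * C)
        X, Y = Z.real, Z.imag
        print((X - R / 2) ** 2 + Y ** 2)   # always R**2/4 = 1.0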


3 Vectors

3.0 Introduction

Vectors are objects with both length and direction. We can think of a vector as an arrow from one place to another.

▶ We will mostly consider them in 2- or 3-dimensional Euclidean space $\mathbb{R}^2$ or $\mathbb{R}^3$, but the concept is much more general.

Given two points $A$ and $B$ (in $\mathbb{R}^2$ or $\mathbb{R}^3$ or ...), the vector $\overrightarrow{AB}$ is the straight line path from $A$ to $B$. Vectors consist of both a direction and a length. We write $\left|\overrightarrow{AB}\right|$ for this length and, if this equals 1, we call it a unit vector.

[Figure: the arrow $\overrightarrow{AB}$ from $A$ to $B$, and a parallel arrow $\overrightarrow{CD}$ of the same length from $C$ to $D$.]

The vector doesn't actually depend on the starting point. You should think of it as "what you have to do to get from $A$ to $B$". So given a parallel path from $C$ to $D$ of the same length, we actually have $\overrightarrow{AB} = \overrightarrow{CD}$.

With the origin at $O$, the vector $\overrightarrow{OA}$ is called the position vector of $A$.

For convenience, if we don't want to refer to two points, we just label vectors by a single letter. In these notes, we'll use boldface letters like $\mathbf{u}$, $\mathbf{v}$. When writing by hand we often use underlined letters, or sometimes arrows.

▶ Many fundamental quantities in science and engineering are most naturally described by vectors. These include positions, velocities and forces in particle mechanics (which we will mention briefly later). In future courses you may learn about vector fields, where there is a vector defined at every point in space. These include quantities such as the velocity field in fluid mechanics, the electric and magnetic fields in electromagnetism, or the Earth's gravitational field.


3.1 Basic Rules

Vector addition. We add vectors "head to tail". Thus we always have

$$\overrightarrow{AB} + \overrightarrow{BC} = \overrightarrow{AC}.$$

[Figure: the triangle $ABC$ illustrating head-to-tail addition.]

With two vectors $\mathbf{u}$ and $\mathbf{v}$, the sum $\mathbf{w} = \mathbf{u} + \mathbf{v}$ is the path to the corner of the parallelogram formed from $\mathbf{u}$ and $\mathbf{v}$:

[Figure: the parallelogram with sides $\mathbf{u}$ and $\mathbf{v}$ and diagonal $\mathbf{w}$.]

Scalar multiplication. We can multiply a vector $\mathbf{u}$ by a scalar $\lambda \in \mathbb{R}$ to give a new vector $\lambda\mathbf{u}$. This is in the same direction as $\mathbf{u}$ for $\lambda > 0$ and in the opposite direction if $\lambda < 0$, and the length is scaled by $|\lambda|$, so

$$|\lambda\mathbf{u}| = |\lambda|\,|\mathbf{u}|.$$

[Figure: $\mathbf{u}$, $2\mathbf{u}$ and $-\mathbf{u}$.]

▶ If $\mathbf{u}$ is parallel to $\mathbf{v}$, then $\mathbf{v} = \lambda\mathbf{u}$ for some $\lambda \neq 0$.

▶ If $\lambda\mathbf{u} = \mu\mathbf{v}$ and $\mathbf{u}$ is not parallel to $\mathbf{v}$, then it must be that $\lambda = \mu = 0$.

Vector addition and scalar multiplication satisfy the following rules (similar to addition and multiplication of real numbers):

Commutativity: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.

Associativity: $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$.

Distributivity: $(\lambda + \mu)\mathbf{u} = \lambda\mathbf{u} + \mu\mathbf{u}$, $\lambda(\mathbf{u} + \mathbf{v}) = \lambda\mathbf{u} + \lambda\mathbf{v}$, $\lambda(\mu\mathbf{u}) = (\lambda\mu)\mathbf{u}$.

We also have a zero vector $\mathbf{0}$. (This doesn't have a direction but we call it a "vector" anyway!) It satisfies $\mathbf{u} + \mathbf{0} = \mathbf{u}$ for every $\mathbf{u}$ and has length 0.


Example. If the position vectors of $A$, $B$ and $C$ are $\mathbf{a}$, $\mathbf{b}$ and $2\mathbf{a} + 3\mathbf{b}$ respectively, what are $\overrightarrow{AB}$, $\overrightarrow{BC}$ and $\overrightarrow{CA}$?

$$\overrightarrow{AB} = \overrightarrow{AO} + \overrightarrow{OB} = \overrightarrow{OB} - \overrightarrow{OA} = \mathbf{b} - \mathbf{a},$$
$$\overrightarrow{BC} = \overrightarrow{OC} - \overrightarrow{OB} = 2\mathbf{a} + 3\mathbf{b} - \mathbf{b} = 2\mathbf{a} + 2\mathbf{b},$$
$$\overrightarrow{CA} = \overrightarrow{OA} - \overrightarrow{OC} = \mathbf{a} - (2\mathbf{a} + 3\mathbf{b}) = -\mathbf{a} - 3\mathbf{b}.$$

[Figure: the points $A$, $B$, $C$ with position vectors $\mathbf{a}$, $\mathbf{b}$, $2\mathbf{a} + 3\mathbf{b}$ from the origin $O$.]

Notice that $\overrightarrow{AB} + \overrightarrow{BC} + \overrightarrow{CA} = \mathbf{0}$. Going from $A$ to $B$ to $C$ to $A$ goes nowhere!

Example. Given a triangle $ABC$, show that the three lines connecting $A$ to the midpoint of $BC$, $B$ to the midpoint of $CA$ and $C$ to the midpoint of $AB$ intersect at a common point. Furthermore, show that the intersection is $2/3$ of the way along each of these lines.

Let the three midpoints be $D$, $E$ and $F$ as shown below, and let $X$ be the point where $BE$ and $CF$ intersect. We want to show that $X$ lies on $AD$.

[Figure: triangle $ABC$ with midpoints $D$, $E$, $F$ of the opposite sides, and the intersection point $X$.]

Let $\mathbf{u} = \overrightarrow{AB}$ and $\mathbf{v} = \overrightarrow{AC}$. Then $\overrightarrow{BC} = \mathbf{v} - \mathbf{u}$ and the three lines have vectors

$$\overrightarrow{AD} = \overrightarrow{AB} + \frac{1}{2}\overrightarrow{BC} = \mathbf{u} + \frac{1}{2}(\mathbf{v} - \mathbf{u}) = \frac{1}{2}\mathbf{u} + \frac{1}{2}\mathbf{v},$$
$$\overrightarrow{BE} = \overrightarrow{BA} + \frac{1}{2}\overrightarrow{AC} = -\mathbf{u} + \frac{1}{2}\mathbf{v},$$
$$\overrightarrow{CF} = \overrightarrow{CA} + \frac{1}{2}\overrightarrow{AB} = -\mathbf{v} + \frac{1}{2}\mathbf{u}.$$


We will now express $\overrightarrow{AX}$ in two ways. Notice $\overrightarrow{BX} = \lambda\overrightarrow{BE}$ and $\overrightarrow{CX} = \mu\overrightarrow{CF}$ for some $\mu, \lambda$. Thus

$$\overrightarrow{AX} = \overrightarrow{AB} + \overrightarrow{BX} = \overrightarrow{AB} + \lambda\overrightarrow{BE} = \mathbf{u} + \lambda\left(-\mathbf{u} + \frac{1}{2}\mathbf{v}\right) = (1 - \lambda)\mathbf{u} + \frac{1}{2}\lambda\mathbf{v}.$$

Similarly,

$$\overrightarrow{AX} = \overrightarrow{AC} + \overrightarrow{CX} = \overrightarrow{AC} + \mu\overrightarrow{CF} = \mathbf{v} + \mu\left(-\mathbf{v} + \frac{1}{2}\mathbf{u}\right) = (1 - \mu)\mathbf{v} + \frac{1}{2}\mu\mathbf{u}.$$

Equating these gives

$$(1 - \lambda)\mathbf{u} + \frac{1}{2}\lambda\mathbf{v} = (1 - \mu)\mathbf{v} + \frac{1}{2}\mu\mathbf{u} \implies \left(1 - \lambda - \frac{1}{2}\mu\right)\mathbf{u} = \left(1 - \mu - \frac{1}{2}\lambda\right)\mathbf{v}.$$

However, $\mathbf{u}$ and $\mathbf{v}$ are not parallel (if they were, the triangle would collapse into a line and the question would be much easier). So the above equation must say $0\mathbf{u} = 0\mathbf{v}$ and

$$1 - \lambda - \tfrac{1}{2}\mu = 0, \quad 1 - \mu - \tfrac{1}{2}\lambda = 0 \implies \lambda = \mu = \frac{2}{3}.$$

In particular, $\overrightarrow{BX} = \frac{2}{3}\overrightarrow{BE}$ and $\overrightarrow{CX} = \frac{2}{3}\overrightarrow{CF}$.

Furthermore, using the above formulas for $\overrightarrow{AX}$ and $\overrightarrow{AD}$, we have

$$\overrightarrow{AX} = \frac{1}{3}\mathbf{u} + \frac{1}{3}\mathbf{v} = \frac{2}{3}\overrightarrow{AD}$$

and $X$ lies $2/3$ of the way along $AD$, $BE$ and $CF$.

▶ The point $X$ is called the centroid of the triangle $ABC$. If we make the triangle out of a thin, uniform density sheet of metal, then $X$ is the centre of mass.


3.2 Coordinates and Bases

We often represent vectors by using coordinates. For example, consider the following vector in $\mathbb{R}^2$:

[Figure: the vector $\mathbf{u}$ from the origin to the point $(3, 2)$.]

It can be represented as $\mathbf{u} = \begin{pmatrix} 3 \\ 2 \end{pmatrix}$, in the same form we used in the Linear Algebra topic.

An alternative notation uses the standard basis vectors $\mathbf{i} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\mathbf{j} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$.

We can write $\mathbf{u} = 3\mathbf{i} + 2\mathbf{j}$ and we call 3 and 2 the coordinates of the vector with respect to the standard basis. By Pythagoras' Theorem the length of this vector is $|\mathbf{u}| = \sqrt{2^2 + 3^2} = \sqrt{13}$.

Adding two vectors just corresponds to adding the coordinates:

$$\begin{pmatrix} 3 \\ 2 \end{pmatrix} + \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \end{pmatrix}$$

or in terms of the standard basis vectors $(3\mathbf{i} + 2\mathbf{j}) + (\mathbf{i} - \mathbf{j}) = 4\mathbf{i} + \mathbf{j}$.

Multiplying by a scalar just multiplies the coordinates:

$$4\begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \times 3 \\ 4 \times 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix}$$

or alternatively, we can write $4(3\mathbf{i} + 2\mathbf{j}) = 12\mathbf{i} + 8\mathbf{j}$.

Similarly, a vector in $\mathbb{R}^3$ can be written

$$\mathbf{u} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = x\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + y\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + z\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}.$$

It has length $|\mathbf{u}| = \sqrt{x^2 + y^2 + z^2}$ and addition/scalar multiplication work in the obvious way.

An $n$-dimensional vector in $n$-dimensional space $\mathbb{R}^n$ is handled in the same way: it has $n$ coordinates with respect to $n$ standard basis vectors (often called $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$, where $\mathbf{e}_i$ has a 1 in the $i$-th coordinate and 0 everywhere else).

▶ Notice that vectors are nothing other than matrices with a single column. But now they have geometric meaning as well.


Sometimes, it's useful to write a vector in terms of other vectors, rather than the standard basis. Given a set of $m$ vectors $S = \{\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_m\}$ in $n$-dimensional space $\mathbb{R}^n$, the span of the vectors, written as $\mathrm{Span}(S)$, is the set of all linear combinations

$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \ldots + c_m\mathbf{v}_m \quad\text{for } c_1, c_2, \ldots, c_m \in \mathbb{R}.$$

We say that the set spans $\mathbb{R}^n$ if every vector in $\mathbb{R}^n$ can be written in this way. It means we can reach every point using the directions of the set $S$ in at least one way.

Example. (1) Consider the vectors $\mathbf{v}_1 = \begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix}$, $\mathbf{v}_2 = \begin{pmatrix} 0 \\ 3 \\ 0 \end{pmatrix}$ in $\mathbb{R}^3$.

We can see directly that $c_1\begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix} + c_2\begin{pmatrix} 0 \\ 3 \\ 0 \end{pmatrix} = \begin{pmatrix} 2c_1 \\ 3c_2 \\ 0 \end{pmatrix}$, which can't equal e.g. $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$.

Hence these vectors don't span $\mathbb{R}^3$. More generally, there's no way that fewer than 3 vectors could span $\mathbb{R}^3$. We can't get everywhere in 3-d space using only two directions.

(2) Consider the vectors $\mathbf{v}_1 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}$, $\mathbf{v}_2 = \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}$, $\mathbf{v}_3 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$ in $\mathbb{R}^3$.

Spanning means we can find $c_1, c_2, c_3$ so that $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 = \mathbf{u}$ for arbitrary $\mathbf{u}$. Notice we can rewrite this as a matrix equation

$$c_1\begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} + c_2\begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} + c_3\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & 2 & 1 \\ 1 & 0 & 2 \\ 2 & 1 & 0 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} u \\ v \\ w \end{pmatrix}.$$

We can check the determinant of the matrix is non-zero, so it is invertible and there is a solution. In other words, this set does span $\mathbb{R}^3$.

(3) Be careful though - a set with at least 3 vectors does not necessarily span $\mathbb{R}^3$.

For instance, $S = \left\{\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ 2 \\ 2 \end{pmatrix}, \begin{pmatrix} 3 \\ 3 \\ 3 \end{pmatrix}, \begin{pmatrix} 4 \\ 4 \\ 4 \end{pmatrix}\right\}$ clearly doesn't.

Another very useful concept is linear independence. A set of $m$ vectors $S = \{\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_m\}$ is called linearly independent if none of the vectors is a linear combination of the others. An alternative way of saying this is that

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \ldots + c_m\mathbf{v}_m = \mathbf{0} \iff c_1 = c_2 = \ldots = c_m = 0.$$

This essentially says there are no redundant directions in the set $S$ and we can reach a particular point in space in at most one way (though possibly not at all):

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \ldots + c_m\mathbf{v}_m = d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \ldots + d_m\mathbf{v}_m$$
$$\iff (c_1 - d_1)\mathbf{v}_1 + (c_2 - d_2)\mathbf{v}_2 + \ldots + (c_m - d_m)\mathbf{v}_m = \mathbf{0}$$
$$\iff c_1 - d_1 = c_2 - d_2 = \ldots = c_m - d_m = 0$$
$$\iff c_1 = d_1,\ c_2 = d_2,\ \ldots,\ c_m = d_m.$$


Example. (1) Consider the vectors $\mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, $\mathbf{v}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$, $\mathbf{v}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$, $\mathbf{v}_4 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$.

We can see directly that $\mathbf{v}_1 + 2\mathbf{v}_2 + 3\mathbf{v}_3 - \mathbf{v}_4 = \mathbf{0}$, so they are not linearly independent.

More generally, there's no way that more than 3 vectors in $\mathbb{R}^3$ can be linearly independent. We'll always be able to write one of them in terms of the others.

(2) Consider the vectors $\mathbf{v}_1 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}$, $\mathbf{v}_2 = \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}$, $\mathbf{v}_3 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$ in $\mathbb{R}^3$.

Linear independence means that if $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 = \mathbf{0}$, then $c_1 = c_2 = c_3 = 0$.

Again, we can rewrite this as a matrix equation

$$c_1\begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} + c_2\begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} + c_3\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & 2 & 1 \\ 1 & 0 & 2 \\ 2 & 1 & 0 \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

The determinant of the matrix is non-zero, so there is a unique solution. But we can see that $c_1 = c_2 = c_3 = 0$ is a solution, so it must be the only one! In other words, this set is linearly independent.

(3) Be careful though - a set with fewer than 3 vectors in $\mathbb{R}^3$ is not necessarily linearly independent.

For instance, $S = \left\{\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ 2 \\ 2 \end{pmatrix}\right\}$ clearly isn't.

So a set of vectors spans $\mathbb{R}^n$ if you can use them to reach everywhere in at least one way, and it is linearly independent if you can only use them to get somewhere in at most one way. A basis is a set which both spans the space and is linearly independent.

Example. The vectors $\mathbf{v}_1 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}$, $\mathbf{v}_2 = \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}$, $\mathbf{v}_3 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$ form a basis of $\mathbb{R}^3$.

We've seen in the previous examples that they span $\mathbb{R}^3$ and are linearly independent.

Now, for a set of $m$ vectors $S = \{\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_m\}$ in $n$-dimensional space, we have

(1) if $m < n$ then it cannot span $\mathbb{R}^n$,

(2) if $m > n$, then it cannot be linearly independent.

In particular, a basis must have $m = n$, i.e. the same number of vectors as the dimension. Furthermore, there is exactly one way to write any given vector $\mathbf{u} \in \mathbb{R}^n$ as

$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \ldots + c_n\mathbf{v}_n.$$

We call $c_1, c_2, \ldots, c_n$ the coordinates of $\mathbf{u}$ with respect to the basis $\{\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_n\}$.


There is a quick way to check if a set $S$ of exactly $n$ vectors in $\mathbb{R}^n$ is a basis. In this case, the following are equivalent:

(1) $S$ is linearly independent,

(2) $S$ spans $\mathbb{R}^n$,

(3) $S$ is a basis of $\mathbb{R}^n$,

(4) the matrix with the vectors of $S$ as columns has non-zero determinant, i.e. is invertible – see the sketch below.
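Condition (4) is the easiest to automate. An illustrative Python sketch using numpy (not part of the notes):

    import numpy as np

    def is_basis(*vectors):
        """Check whether n vectors form a basis of R^n via the determinant test."""
        M = np.column_stack(vectors)          # vectors of S as columns
        return M.shape[0] == M.shape[1] and not np.isclose(np.linalg.det(M), 0)

    # The basis from the example above:
    print(is_basis([0, 1, 2], [2, 0, 1], [1, 2, 0]))   # True (det = 9)
    print(is_basis([1, 1, 1], [2, 2, 2], [3, 3, 3]))   # False (det = 0)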

Example. Show that the vectors $\mathbf{v}_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}$ and $\mathbf{v}_2 = \begin{pmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}$ form a basis for $\mathbb{R}^2$, and find the coordinates of $\mathbf{u} = \begin{pmatrix} 3 \\ 2 \end{pmatrix}$ with respect to this basis.

To show it is a basis, we just need to check the determinant

$$\begin{vmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{vmatrix} = \frac{1}{2} + \frac{1}{2} = 1 \neq 0.$$

To find the coordinates of $\mathbf{u}$ with respect to this new basis, write $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 = \mathbf{u}$, so

$$c_1\begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix} + c_2\begin{pmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix},$$

and solve for the coordinates $c_1$ and $c_2$ (by Gaussian elimination or directly). We have

$$c_1 - c_2 = 3\sqrt{2}, \quad c_1 + c_2 = 2\sqrt{2} \implies c_1 = \frac{5\sqrt{2}}{2}, \quad c_2 = -\frac{\sqrt{2}}{2}.$$

[Figure: $\mathbf{u}$ drawn with respect to both the standard basis $\mathbf{i}, \mathbf{j}$ and the rotated basis $\mathbf{v}_1, \mathbf{v}_2$.]

▶ Changing to a different basis thus corresponds to using a different coordinate system. This can simplify calculations and is useful in many applications in engineering and science.


3.3 The Scalar Product

The scalar product is a way of multiplying two vectors to get a scalar.

Given two vectors in $\mathbb{R}^2$,

$$\mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = u_1\mathbf{i} + u_2\mathbf{j} \quad\text{and}\quad \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = v_1\mathbf{i} + v_2\mathbf{j},$$

their scalar product (also called dot product or inner product) is defined by

$$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2.$$

Similarly for two vectors in $\mathbb{R}^3$, $\mathbf{u} = u_1\mathbf{i} + u_2\mathbf{j} + u_3\mathbf{k}$ and $\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$, their scalar product is $\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + u_3v_3$.

▶ You should be able to guess how to define it for $n$-dimensional vectors – just multiply the corresponding coordinates and add.

The scalar product has the following nice, natural properties:

Commutativity: $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$.

Scalar associativity: $(\lambda\mathbf{u}) \cdot \mathbf{v} = \lambda(\mathbf{u} \cdot \mathbf{v})$.

Distributivity: $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$.

Length: $\mathbf{u} \cdot \mathbf{u} = |\mathbf{u}|^2$.

Perpendicularity: $\mathbf{u} \cdot \mathbf{v} = 0$ if and only if $\mathbf{u}$ and $\mathbf{v}$ are perpendicular.

The first four properties follow readily from the definition. The last property is a special case of the important formula

$$\mathbf{u} \cdot \mathbf{v} = |\mathbf{u}|\,|\mathbf{v}|\cos\theta$$

where $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$. We can prove it using the cosine law: this is a generalisation of Pythagoras's Theorem to non-right-angled triangles.

where θ is the angle between u and v. We can prove it using the cosine law : this is ageneralisation of Pythagoras’s Theorem to non-right-angled triangles.

Suppose a triangle has side lengths a, b, c and angle θ as shown.

b

ca

y z

x

θ


Split it into two smaller right-angled triangles as shown and use Pythagoras's Theorem on each of these (noting $x^2 + y^2 = a^2$ and $y = a\cos\theta$):

$$c^2 = x^2 + z^2 = x^2 + (b - y)^2 = x^2 + y^2 + b^2 - 2by = a^2 + b^2 - 2by = a^2 + b^2 - 2ab\cos\theta.$$

Now consider the triangle formed by two vectors $\mathbf{u}$ and $\mathbf{v}$:

[Figure: the triangle with sides $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{u} - \mathbf{v}$, with angle $\theta$ between $\mathbf{u}$ and $\mathbf{v}$.]

It has side lengths $a = |\mathbf{u}|$, $b = |\mathbf{v}|$ and $c = |\mathbf{u} - \mathbf{v}|$, so using the cosine law, we get

$$|\mathbf{u} - \mathbf{v}|^2 = |\mathbf{u}|^2 + |\mathbf{v}|^2 - 2|\mathbf{u}|\,|\mathbf{v}|\cos\theta.$$

Also, using the properties of the scalar product,

$$|\mathbf{u} - \mathbf{v}|^2 = (\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v}) = \mathbf{u} \cdot \mathbf{u} + \mathbf{v} \cdot \mathbf{v} - \mathbf{u} \cdot \mathbf{v} - \mathbf{v} \cdot \mathbf{u} = |\mathbf{u}|^2 + |\mathbf{v}|^2 - 2\mathbf{u} \cdot \mathbf{v}.$$

Comparing these two formulas gives $\mathbf{u} \cdot \mathbf{v} = |\mathbf{u}|\,|\mathbf{v}|\cos\theta$.

This formula gives us an easy way to calculate angles between vectors.

Example. Find the angle between the vectors

$$\mathbf{u} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \mathbf{i} + \mathbf{j} + \mathbf{k} \quad\text{and}\quad \mathbf{v} = \begin{pmatrix} 2 \\ 0 \\ -3 \end{pmatrix} = 2\mathbf{i} - 3\mathbf{k}.$$

We have

$$\mathbf{u} \cdot \mathbf{v} = 1(2) + 1(0) + 1(-3) = -1,$$
$$|\mathbf{u}| = \sqrt{1^2 + 1^2 + 1^2} = \sqrt{3},$$
$$|\mathbf{v}| = \sqrt{2^2 + 0^2 + (-3)^2} = \sqrt{13},$$

so

$$\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = \frac{-1}{\sqrt{3}\sqrt{13}} \implies \theta = \arccos\left(\frac{-1}{\sqrt{39}}\right).$$


3.4 Projections and Orthonormality

Another way to think about the scalar product is in terms of projections. Given two vectors $\mathbf{a} = \overrightarrow{OA}$ and $\mathbf{b} = \overrightarrow{OB}$, there is a point $C$ on $OB$ with $AC$ perpendicular to $OB$.

[Figure: $\mathbf{a}$ and $\mathbf{b}$ from the origin $O$, with $C$ the foot of the perpendicular from $A$ onto $OB$ and $\mathbf{c} = \overrightarrow{OC}$.]

Then the projection of $\mathbf{a}$ onto $\mathbf{b}$ is the vector $\mathbf{c} = \overrightarrow{OC}$. Its length $|\overrightarrow{OC}|$ is the component of $\mathbf{a}$ in the direction of $\mathbf{b}$, and is given by

$$|\overrightarrow{OC}| = |\mathbf{a}|\cos\theta = |\mathbf{a}|\,\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|} = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|}.$$

The projection $\mathbf{c}$ is a multiple of $\mathbf{b}$, but which multiple? A unit vector in the direction of $\mathbf{b}$ is $\mathbf{b}/|\mathbf{b}|$, so we can write the projection vector as

$$\mathbf{c} = \left(\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|}\right)\frac{\mathbf{b}}{|\mathbf{b}|} = \left(\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|^2}\right)\mathbf{b}.$$

▶ The projection is essentially telling us "how much $\mathbf{a}$ goes in the direction of $\mathbf{b}$".
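The projection formula is one line in numpy; an illustrative sketch (the example vectors are arbitrary choices, not from the notes):

    import numpy as np

    def project(a, b):
        """Projection of vector a onto vector b: (a.b / |b|^2) b."""
        return (np.dot(a, b) / np.dot(b, b)) * b

    a = np.array([3.0, 2.0, 0.0])
    b = np.array([1.0, 0.0, 0.0])
    c = project(a, b)
    print(c)                      # [3. 0. 0.]: the component of a along b
    print(np.dot(a - c, b))       # 0.0: the remainder is perpendicular to b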

There is a related application of the scalar product in physics/engineering. The work done (energy used up) by a force $\mathbf{F}$ moving a particle through a displacement $\mathbf{d}$ is

$$W = \mathbf{F} \cdot \mathbf{d}.$$

It's the component of the force in the direction of motion, multiplied by the distance $|\mathbf{d}|$.

Example. A particle is displaced $\mathbf{d} = 2\mathbf{i} + 3\mathbf{j}$ by a force $\mathbf{F} = \mathbf{i} + \mathbf{j}$. The work done on the particle by the force is just

$$W = \mathbf{F} \cdot \mathbf{d} = (2\mathbf{i} + 3\mathbf{j}) \cdot (\mathbf{i} + \mathbf{j}) = 2(1) + 3(1) = 5.$$

A nice property of the standard basis vectors such as $\mathbf{i}, \mathbf{j}, \mathbf{k}$ in $\mathbb{R}^3$ is that they are all unit length and mutually perpendicular. That is,

$$\mathbf{i} \cdot \mathbf{i} = \mathbf{j} \cdot \mathbf{j} = \mathbf{k} \cdot \mathbf{k} = 1$$

and

$$\mathbf{i} \cdot \mathbf{j} = \mathbf{i} \cdot \mathbf{k} = \mathbf{j} \cdot \mathbf{k} = 0.$$

A basis of mutually perpendicular unit vectors is called orthonormal.


Example. The basis $\mathbf{v}_1 = (\cos\theta)\mathbf{i} + (\sin\theta)\mathbf{j}$, $\mathbf{v}_2 = (-\sin\theta)\mathbf{i} + (\cos\theta)\mathbf{j}$ of $\mathbb{R}^2$.

We can check that this basis is orthonormal:

$$\mathbf{v}_1 \cdot \mathbf{v}_1 = \cos^2\theta + \sin^2\theta = 1,$$
$$\mathbf{v}_2 \cdot \mathbf{v}_2 = \sin^2\theta + \cos^2\theta = 1,$$
$$\mathbf{v}_1 \cdot \mathbf{v}_2 = -\cos\theta\sin\theta + \sin\theta\cos\theta = 0.$$

Notice that $\mathbf{v}_1$ and $\mathbf{v}_2$ are just the vectors obtained by rotating $\mathbf{i}$ and $\mathbf{j}$ anti-clockwise about the origin by angle $\theta$.

▶ The concept of vectors goes far beyond the Euclidean spaces $\mathbb{R}^n$ we consider in this course. Ideas such as scalar products and orthonormal bases work in much more generality.

A particular example you may meet later in your course is Fourier series. Functions which are both odd and periodic with period $2\pi$, i.e.

$$f(-x) = -f(x) \quad\text{and}\quad f(x + 2\pi) = f(x),$$

can be thought of as vectors in an infinite-dimensional vector space. We can add them, multiply by scalars, ..., just as we do with ordinary vectors. A basis for this vector space of functions is

$$\{\sin x, \sin(2x), \sin(3x), \ldots\},$$

and (almost) every function of this type can be written as an infinite sum

$$f(x) = a_1\sin x + a_2\sin(2x) + a_3\sin(3x) + \ldots = \sum_{n=1}^{\infty} a_n\sin(nx)$$

for some real numbers $a_1, a_2, \ldots$. The scalar product in this case is

$$f \cdot g = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)g(x)\,dx$$

and one can show that

$$\sin(mx) \cdot \sin(nx) = \frac{1}{\pi}\int_{-\pi}^{\pi} \sin(mx)\sin(nx)\,dx = \begin{cases} 1 & \text{if } m = n \\ 0 & \text{if } m \neq n \end{cases}$$

so that the basis $\{\sin x, \sin(2x), \sin(3x), \ldots\}$ is actually orthonormal.

These kinds of functions are useful wherever oscillatory behaviour appears – for example, vibrating strings, signal processing or quantum mechanics.


3.5 Lines in 3 dimensions

We can write the equation for a straight line in terms of vectors. Given two points $A$ and $B$ with position vectors $\mathbf{a}$ and $\mathbf{b}$ respectively, the line joining $A$ and $B$ has direction $\mathbf{d} = \mathbf{b} - \mathbf{a}$.

[Figure: the line through $A$ and $B$, with position vectors $\mathbf{a}$, $\mathbf{b}$, direction $\mathbf{d} = \mathbf{b} - \mathbf{a}$ and a general point $\mathbf{x} = \mathbf{a} + t\mathbf{d}$.]

We can get to any point $\mathbf{x}$ on the line by travelling to $A$, then moving some distance in direction $\mathbf{d}$. In other words,

$$\mathbf{x} = \mathbf{a} + t\mathbf{d}$$

for some real number $t$. This $t$ is a parameter and this equation for an arbitrary point $\mathbf{x}$ is called a parametric equation of the line.

▶ It can be useful to think of $t$ as a time variable. At $t = 0$ we are at $A$, at $t = 1$ we are at $B$ and we move at constant speed $|\mathbf{d}|$ along the line.

Note that there are many ways to write the line in parametric form. We could have started at $B$ rather than $A$, or indeed any other point on the line, and could use any non-zero multiple of $\mathbf{d}$ for the direction.

Example. Find a parametric equation of the line joining $A = (1, 2, 1)$ and $B = (3, 7, 4)$.

The direction vector is

$$\mathbf{d} = \mathbf{b} - \mathbf{a} = \begin{pmatrix} 3 \\ 7 \\ 4 \end{pmatrix} - \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 5 \\ 3 \end{pmatrix},$$

so an arbitrary point on the line is $\mathbf{x} = \mathbf{a} + t\mathbf{d} = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + t\begin{pmatrix} 2 \\ 5 \\ 3 \end{pmatrix}$.

Notice that if $\mathbf{x}$ has coordinates $x, y, z$, then we can eliminate $t$:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} + t\begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix} \implies t = \frac{x - a_1}{d_1} = \frac{y - a_2}{d_2} = \frac{z - a_3}{d_3}.$$

We say that

$$\frac{x - a_1}{d_1} = \frac{y - a_2}{d_2} = \frac{z - a_3}{d_3}$$

are Cartesian equations of the line.


▶ Notice if e.g. $d_1 = 0$, the above doesn't quite work and we need to replace that equation by $x = a_1$. In fact, parametric form is often easier to work with in practice.

Example. Find Cartesian equations of the line in the previous example.

$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 + 2t \\ 2 + 5t \\ 1 + 3t \end{pmatrix} \implies t = \frac{x - 1}{2} = \frac{y - 2}{5} = \frac{z - 1}{3},$$

so Cartesian equations are

$$\frac{x - 1}{2} = \frac{y - 2}{5} = \frac{z - 1}{3}.$$

Example. Suppose there are particles of mass $m_1$, $m_2$ at position vectors $\mathbf{R}_1$, $\mathbf{R}_2$ respectively. Where is the centre of mass $\mathbf{R}$? The definition of this is the weighted average position

$$\mathbf{R} = \frac{m_1\mathbf{R}_1 + m_2\mathbf{R}_2}{m_1 + m_2}.$$

We can rewrite this to show explicitly that it is a point on the line through $\mathbf{R}_1$ and $\mathbf{R}_2$ by adding and subtracting $m_2\mathbf{R}_1$ on the numerator:

$$\mathbf{R} = \frac{(m_1 + m_2)\mathbf{R}_1 + m_2(\mathbf{R}_2 - \mathbf{R}_1)}{m_1 + m_2} = \mathbf{R}_1 + \frac{m_2}{m_1 + m_2}(\mathbf{R}_2 - \mathbf{R}_1).$$

This is the parametric equation of the line with $\mathbf{a} = \mathbf{R}_1$ and $\mathbf{d} = \mathbf{R}_2 - \mathbf{R}_1$. It shows that $\mathbf{R}$ is a fraction $\dfrac{m_2}{m_1 + m_2}$ of the way along the line from $\mathbf{R}_1$ to $\mathbf{R}_2$.

[Figure: masses $m_1$ at $\mathbf{R}_1$ and $m_2$ at $\mathbf{R}_2$, with the centre of mass $\mathbf{R}$ on the line between them.]

Just to check, if $m_2 = m_1$ then $\dfrac{m_2}{m_1 + m_2} = \dfrac{m_1}{2m_1} = \dfrac{1}{2}$, which is midway between, as expected. Similarly, if $m_1$ is much bigger than $m_2$ then $\dfrac{m_2}{m_1 + m_2} \approx 0$, and if $m_1$ is much smaller than $m_2$ then $\dfrac{m_2}{m_1 + m_2} \approx 1$.

▶ If we had more particles, $\mathbf{R}_1, \ldots, \mathbf{R}_N$ with masses $m_1, \ldots, m_N$, then the corresponding centre of mass would be a similar weighted average

$$\mathbf{R} = \frac{m_1\mathbf{R}_1 + \ldots + m_N\mathbf{R}_N}{m_1 + \ldots + m_N}.$$


3.6 Planes in 3 dimensions

Now consider planes in three-dimensional space. One way to define a plane is by specifying any point $\mathbf{a}$ on it as well as a direction $\mathbf{n}$ which is orthogonal (perpendicular) to the plane. We call such a vector $\mathbf{n}$ a normal vector to the plane.

[Figure: a plane containing the points $\mathbf{a}$ and $\mathbf{x}$, with $\mathbf{x} - \mathbf{a}$ lying in the plane and the normal $\mathbf{n}$ perpendicular to it.]

Then, given an arbitrary point on the plane with position vector $\mathbf{x}$, the vector $\mathbf{x} - \mathbf{a}$ is perpendicular to $\mathbf{n}$, so

$$(\mathbf{x} - \mathbf{a}) \cdot \mathbf{n} = 0.$$

This is a vector equation for the plane. There are lots of choices here: we could take any point on the plane and any non-zero multiple of the normal direction.

To find an equation for the plane involving the coordinates,

$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \quad \mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, \quad \mathbf{n} = \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix},$$

note that the vector equation gives $\mathbf{x} \cdot \mathbf{n} = \mathbf{a} \cdot \mathbf{n}$. We have

$$n_1x + n_2y + n_3z = n_1a_1 + n_2a_2 + n_3a_3.$$

Or, since $\mathbf{a}$ and $\mathbf{n}$ are fixed, there is a constant $d$ with

$$n_1x + n_2y + n_3z = d.$$

This is the standard form for the Cartesian equation of a plane.

Example. Find a vector equation for the plane $x + 2y + 2z = 1$.

We can read off the normal vector $\mathbf{n} = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}$ from the coefficients of $x, y, z$. We also need any point $\mathbf{a}$ on the plane. This must satisfy $a_1 + 2a_2 + 2a_3 = 1$, so a simple choice is $a_2 = a_3 = 0$ and $a_1 = 1$. So a vector equation is

$$\left(\begin{pmatrix} x \\ y \\ z \end{pmatrix} - \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\right) \cdot \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} = 0.$$


Take any point $\mathbf{a}$ on a plane and any two non-zero, non-parallel vectors $\mathbf{u}$, $\mathbf{v}$ along the plane.

[Figure: the plane through $\mathbf{a}$ spanned by the directions $\mathbf{u}$ and $\mathbf{v}$, with a general point $\mathbf{x}$.]

Then an arbitrary point on the plane can be written as

$$\mathbf{x} = \mathbf{a} + \lambda\mathbf{u} + \mu\mathbf{v}$$

for some values of parameters $\lambda$ and $\mu$. This is a parametric equation for the plane.

▶ Notice that if $\mathbf{n}$ is normal to the plane, then $\mathbf{n}$ is perpendicular to both $\mathbf{u}$ and $\mathbf{v}$. Hence

$$(\mathbf{x} - \mathbf{a}) \cdot \mathbf{n} = \lambda\mathbf{u} \cdot \mathbf{n} + \mu\mathbf{v} \cdot \mathbf{n} = 0,$$

which is consistent with the vector equation of the plane.

Example. Find a parametric equation for the plane $x + 2y + 2z = 1$.

From the earlier example we already found a point $\mathbf{a} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$ on this plane.

Now find two more points $\mathbf{b}$ and $\mathbf{c}$ on the plane with $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ not all in a line. We can do this by e.g. fixing two of the coordinates to determine the third.

For example, $\mathbf{b} = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}$ and $\mathbf{c} = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}$ will work. Then

$$\mathbf{u} = \mathbf{b} - \mathbf{a} = \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \mathbf{v} = \mathbf{c} - \mathbf{a} = \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}$$

give two non-parallel directions on the plane and a parametric equation is

$$\mathbf{x} = \mathbf{a} + \lambda\mathbf{u} + \mu\mathbf{v} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \lambda\begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix} + \mu\begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}.$$


We could also get back to the Cartesian equation by setting

$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 - 2\lambda - 2\mu \\ \lambda \\ \mu \end{pmatrix}$$

and eliminating $\lambda$ and $\mu$. Notice that this is a system of linear equations with two free parameters. So when solving a $3 \times 3$ linear system via Gaussian elimination, if we find two free parameters, there is only one independent equation and the solutions must lie on a plane.

Now that we have various ways to write down planes, we can ask some further questions.

For instance, how can two planes intersect in 3-dimensional space? Either

• they are equal (so they intersect everywhere),

• they are parallel (so they do not intersect),

• or they intersect in a straight line.

In this final case, notice that if the planes have normal vectors $\mathbf{n}_1$ and $\mathbf{n}_2$, then the angle between the planes is the same as the angle between these normal vectors:

[Figure: two planes $P_1$, $P_2$ meeting in a line $L$, with normals $\mathbf{n}_1$, $\mathbf{n}_2$ and the angle $\theta$ between both the planes and the normals.]

Example. Find a parametric equation for the line of intersection of $x + 2y + 2z = 1$ and $x + y + z = 1$, and find the angle between the two planes.

The intersection solves the simultaneous equations

$$x + 2y + 2z = 1, \qquad x + y + z = 1.$$

Gaussian elimination gives

$$\left(\begin{array}{ccc|c} 1 & 2 & 2 & 1 \\ 1 & 1 & 1 & 1 \end{array}\right) \xrightarrow{R_2 - R_1} \left(\begin{array}{ccc|c} 1 & 2 & 2 & 1 \\ 0 & -1 & -1 & 0 \end{array}\right) \xrightarrow{-R_2} \left(\begin{array}{ccc|c} 1 & 2 & 2 & 1 \\ 0 & 1 & 1 & 0 \end{array}\right).$$

Taking $z = \lambda$ as a free parameter, back substitution gives $y = -\lambda$, $x = 1$, so the intersection line has the parametric equation

$$\mathbf{x} = \begin{pmatrix} 1 \\ -\lambda \\ \lambda \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \lambda\begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}.$$


The normals to the two planes are

$$\mathbf{n}_1 = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} \quad\text{and}\quad \mathbf{n}_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},$$

so the angle between the planes is the angle $\theta$ between these two vectors and satisfies

$$\cos\theta = \frac{\mathbf{n}_1 \cdot \mathbf{n}_2}{|\mathbf{n}_1|\,|\mathbf{n}_2|} = \frac{1 + 2 + 2}{\sqrt{1^2 + 2^2 + 2^2}\sqrt{1^2 + 1^2 + 1^2}} = \frac{5}{3\sqrt{3}}.$$

Vector equations for planes also give us a quick way to find shortest distances between points and planes. Suppose we have a plane $(\mathbf{x} - \mathbf{a}) \cdot \mathbf{n} = 0$. We can re-write this as

$$\mathbf{x} \cdot \hat{\mathbf{n}} = \mathbf{a} \cdot \hat{\mathbf{n}},$$

where $\hat{\mathbf{n}} = \dfrac{\mathbf{n}}{|\mathbf{n}|}$ is a unit normal vector. The distance between $\mathbf{x}$ and the origin is $|\mathbf{x}|$ so, using the formula for scalar products, we see that

$$|\mathbf{x}|\,|\hat{\mathbf{n}}|\cos\theta = \mathbf{a} \cdot \hat{\mathbf{n}} \implies |\mathbf{x}| = \frac{\mathbf{a} \cdot \hat{\mathbf{n}}}{\cos\theta},$$

where $\theta$ is the angle between $\mathbf{x}$ and $\hat{\mathbf{n}}$. This distance is shortest when $\cos\theta = 1$, i.e. when $\mathbf{x}$ is in the direction of $\hat{\mathbf{n}}$, giving $|\mathbf{x}| = |\mathbf{a} \cdot \hat{\mathbf{n}}|$. Furthermore, the corresponding point on the plane is $\mathbf{x} = (\mathbf{a} \cdot \hat{\mathbf{n}})\hat{\mathbf{n}}$ as this has the right length and direction.

Example. Find the closest point to the origin on the plane x + 2y + 2z = 1 and itsdistance from the origin.

We found a = (1, 0, 0)^T and n = (1, 2, 2)^T so a unit normal vector is n̂ = (1/3)(1, 2, 2)^T.

The minimal distance is

|a · n̂| = 1 × (1/3) + 0 × (2/3) + 0 × (2/3) = 1/3

and the closest point is

x = (a · n̂) n̂ = (1/3) · (1/3)(1, 2, 2)^T = (1/9)(1, 2, 2)^T.

▶ This can be generalised to find the shortest distance between an arbitrary point b and a plane by first translating everything so that b moves to the origin.
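▶ The closest-point formula is easy to check numerically. A minimal sketch (again assuming NumPy), using the plane x + 2y + 2z = 1 from the example:

    import numpy as np

    a = np.array([1.0, 0.0, 0.0])  # a point on the plane
    n = np.array([1.0, 2.0, 2.0])  # normal vector
    n_hat = n / np.linalg.norm(n)  # unit normal

    print(abs(a @ n_hat))          # shortest distance: 0.3333... = 1/3
    print((a @ n_hat) * n_hat)     # closest point: (1/9)(1, 2, 2)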


3.7 The Vector Product

There is another useful way to multiply two vectors in R3 which produces a vector. This is the vector product (also called the cross product)

u × v = (u1, u2, u3)^T × (v1, v2, v3)^T = (u2v3 − u3v2, u3v1 − u1v3, u1v2 − u2v1)^T.

Note, this only works in 3-dimensional space R3.

A helpful way to remember the formula is to write it in terms of determinants (again with rows separated by semicolons):

u × v = |i j k; u1 u2 u3; v1 v2 v3|
      = i |u2 u3; v2 v3| − j |u1 u3; v1 v3| + k |u1 u2; v1 v2|
      = (u2v3 − u3v2) i + (u3v1 − u1v3) j + (u1v2 − u2v1) k.

Example. Calculate u × v when u = (1, 2, 0)^T and v = (2, 3, 1)^T.

u × v = (1, 2, 0)^T × (2, 3, 1)^T = (2(1) − 0(3), 0(2) − 1(1), 1(3) − 2(2))^T = (2, −1, −1)^T.
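▶ NumPy implements exactly this formula as np.cross, which gives a quick way to check hand calculations like the one above (a sketch, assuming NumPy):

    import numpy as np

    u = np.array([1, 2, 0])
    v = np.array([2, 3, 1])

    print(np.cross(u, v))  # [ 2 -1 -1], matching the hand calculation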

Example. Calculate u × v when u = (a, 0, 0)^T and v = (b cos θ, b sin θ, 0)^T.

Notice here that u and v lie in the xy-plane with an angle θ between them. We have

u × v = (a, 0, 0)^T × (b cos θ, b sin θ, 0)^T = (0, 0, ab sin θ)^T.

So u × v is in the z-direction and orthogonal (perpendicular) to both u and v.

The vector product has the following properties:

1. Anti-commutativity: u × v = −v × u.

2. Scalar associativity: (λu) × v = λ(u × v).

3. Distributivity: u × (v + w) = u × v + u × w.

These are simple to show directly from the definition.

The geometrical interpretation of u × v is as a kind of “directed area”. Its direction is orthogonal to both u and v and its magnitude is the area of the parallelogram defined by u and v, as shown here:


[figure: parallelogram with sides u and v and angle θ between them; u × v points perpendicular to both, with magnitude |u × v| equal to the parallelogram’s area]

If θ is the angle between u and v, then the area of the parallelogram is given by base times height, so

|u × v| = |u| |v| sin θ.

▶ This is a little complicated to prove in general from the definition of the vector product in terms of coordinates of u and v, but we saw a specific case in the previous example.

Thus, just as the scalar product tells us when two non-zero vectors are perpendicular (it’s when u · v = 0), the vector product tells us when two non-zero vectors are parallel (it’s when u × v = 0). In particular, u × u = 0 for any vector u.

However, there are two directions orthogonal to the parallelogram. Should the vector u × v point up or down? This is determined by the right-hand rule – with your right hand, point your index finger in the u direction and your middle finger in the v direction. Then your thumb points in the u × v direction.

In particular, you can check that

i × j = k,  j × k = i,  k × i = j,

whereas

i × k = −j,  j × i = −k,  k × j = −i.

▶ We say that our usual Cartesian coordinates are right-handed because the standard basis vectors obey these identities.


The vector product gives us an easy way to find a vector perpendicular to two given vectors. We can use this to quickly find a normal vector to a plane.

Example. Find a vector equation for the plane containing the points A = (2, 0, 0), B = (0, 4, 0) and C = (0, 0, −3).

We can create two independent directions on the plane using the three points:

u = −→AB = (0, 4, 0)^T − (2, 0, 0)^T = (−2, 4, 0)^T and v = −→AC = (0, 0, −3)^T − (2, 0, 0)^T = (−2, 0, −3)^T.

Then a normal vector to the plane is

n = u × v = (−2, 4, 0)^T × (−2, 0, −3)^T = (−12, −6, 8)^T.

Now we just need any point, e.g. a = (2, 0, 0)^T, on the plane and we then have a vector equation (x − a) · n = 0.
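▶ This recipe is mechanical enough to script. A sketch (assuming NumPy) for the plane through A, B and C above:

    import numpy as np

    A = np.array([2, 0, 0])
    B = np.array([0, 4, 0])
    C = np.array([0, 0, -3])

    u = B - A           # one direction in the plane
    v = C - A           # another, non-parallel direction
    n = np.cross(u, v)  # normal vector

    print(n)      # [-12  -6   8]
    print(n @ A)  # -24, so the Cartesian equation is -12x - 6y + 8z = -24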

We can also use the vector product to find a vector equation of a line. Recall the parametric equation of a line – if a is a point on the line and d is its direction, then

x = a + td =⇒ x − a = td.

Taking the cross product with d, and using d × d = 0, gives a vector equation of the line,

(x − a) × d = 0.

Example. Find a vector equation for the line parallel to i + 2j − k which passes through the point (0, 1, 0).

The line has direction d = i + 2j − k and a point on the line is a = j so the vector equation is (x − a) × d = 0, that is,

(x − a) × d = (x, y − 1, z)^T × (1, 2, −1)^T = (1 − y − 2z, x + z, 2x − y + 1)^T = (0, 0, 0)^T.

The equations of each of the coordinates here give us Cartesian equations for the line, and solving these equations simultaneously with one variable as a parameter would get us back to the parametric form.

▶ We now have Cartesian, parametric and vector equations for both lines and planes in 3 dimensions. Being able to describe the same object in different ways gives us extra mathematical flexibility.


3.8 Distances Between Lines

Finding the shortest distance between two lines is another application of vector products.

First, suppose the two lines are parallel. Thus they have the same direction vector u and can be written in parametric form as

x = a + λu,
x = b + µu

for some vectors a and b.

[figure: two parallel lines with direction u through the points a and b; the perpendicular segment between them has length |a − b| sin θ, where θ is the angle between a − b and u]

The shortest distance will be the length of the line perpendicular to both lines as shown. If θ is the angle between a − b and u, then this distance is |a − b| sin θ. However,

|(a − b) × u| = |a − b| |u| sin θ,

so the distance can be written |(a − b) × û|, where û = u/|u|.

Example. Find the distance between the lines with Cartesian equations

(x − 3)/2 = (y + 8)/(−2) = (z − 1)/1 and (x + 5)/4 = (y + 3)/(−4) = (z − 6)/2.

The direction vectors u = (2, −2, 1)^T and v = (4, −4, 2)^T are parallel since v = 2u.

A unit vector in this direction is û = (1/3)(2, −2, 1)^T, and points on the two lines are a = (3, −8, 1)^T and b = (−5, −3, 6)^T respectively. So

(a − b) × û = (1/3) (8, −5, −5)^T × (2, −2, 1)^T = (1/3)(−5 − 10, −10 − 8, −16 + 10)^T = (1/3)(−15, −18, −6)^T = (−5, −6, −2)^T.

The minimum distance is then the length of this, that is, √(5² + 6² + 2²) = √65.
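▶ The formula |(a − b) × û| translates directly into code. A sketch (assuming NumPy) of the calculation above:

    import numpy as np

    a = np.array([3.0, -8.0, 1.0])   # point on the first line
    b = np.array([-5.0, -3.0, 6.0])  # point on the second line
    u = np.array([2.0, -2.0, 1.0])   # common direction

    u_hat = u / np.linalg.norm(u)
    print(np.linalg.norm(np.cross(a - b, u_hat)))  # 8.0622... = sqrt(65)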


Now consider the case where the two lines are not parallel. Then we can write arbitrary points on the two lines as

x1 = a + λu,
x2 = b + µv

for some vectors a, b and non-parallel direction vectors u, v.

[figure: two skew lines x1 = a + λu and x2 = b + µv, with the shortest connecting segment of length d perpendicular to both]

The distance is minimum when the line connecting x1 to x2 is perpendicular to the two lines, hence to both u and v. But we know a vector in this direction, namely n = u × v. In fact, if d denotes the minimum distance, then letting n̂ = n/|n|, we must have

x1 − x2 = ±d n̂

and so

(a + λu) − (b + µv) = ±d n̂.

Taking the scalar product of both sides with n̂, and remembering u · n̂ = v · n̂ = 0 and n̂ · n̂ = 1, we obtain

(a − b) · n̂ = ±d.

In other words, the minimum distance between the two lines is

d = |(a − b) · n̂| = |(a − b) · (u × v)/|u × v||.

Example. Find the distance between the lines given by parametric equations

x = (1, 1, 0)^T + λ(1, 6, 2)^T and x = (1, 5, −2)^T + µ(2, 15, 6)^T.

The lines are not parallel since their direction vectors u = (1, 6, 2)^T and v = (2, 15, 6)^T aren’t multiples of each other. A vector perpendicular to both is

n = u × v = (1, 6, 2)^T × (2, 15, 6)^T = (36 − 30, 4 − 6, 15 − 12)^T = (6, −2, 3)^T.


This has length |n| = √(6² + (−2)² + 3²) = 7, so the unit vector in this direction is n̂ = (1/7)(6, −2, 3)^T.

Taking points on the lines a = (1, 1, 0)^T and b = (1, 5, −2)^T, we have

(a − b) · n̂ = (0, −4, 2)^T · (1/7)(6, −2, 3)^T = (1/7)(0 + 8 + 6) = 2

and so the minimum distance is 2.
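▶ Again the distance formula is a single line of code. A sketch (assuming NumPy) of this example:

    import numpy as np

    a = np.array([1.0, 1.0, 0.0])
    u = np.array([1.0, 6.0, 2.0])
    b = np.array([1.0, 5.0, -2.0])
    v = np.array([2.0, 15.0, 6.0])

    n = np.cross(u, v)  # perpendicular to both lines
    print(abs((a - b) @ n) / np.linalg.norm(n))  # 2.0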


3.9 The Scalar Triple Product

Given three vectors u, v, w in R3, the scalar triple product is

u · (v × w).

Geometrically, this gives the volume of the parallelepiped (the 3-d version of a parallelogram) with edges given by u, v and w.

[figure: parallelepiped with edges u, v, w; the base is the parallelogram spanned by v and w, and v × w makes an angle θ with u]

To see why this is, note that the volume is the area of the base times the height. We know that the area of the base is |v × w|. The vector n = v × w is normal to the base. If it makes an angle θ with u then the height of the parallelepiped is |u| cos θ. So the volume is

|v × w| |u| cos θ = u · (v × w).

Notice that cycling u, v, w doesn’t change the volume, so

u · (v × w) = v · (w × u) = w · (u × v).

On the other hand, by the anti-commutativity of the vector product,

u · (v × w) = −u · (w × v) = −v · (u × w) = −w · (v × u).

Also notice that if u = v or u = w, then the volume is zero, so

v · (v × w) = w · (v × w) = 0.

This again tells us that v × w is perpendicular to both v and w.

In terms of determinants, the scalar triple product has a nice expression:

u · (v × w) = (u1 i + u2 j + u3 k) · |i j k; v1 v2 v3; w1 w2 w3| = |u1 u2 u3; v1 v2 v3; w1 w2 w3|.
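▶ Both sides of this identity are easy to compare numerically, e.g. (a sketch assuming NumPy; the vectors are arbitrary test values):

    import numpy as np

    u = np.array([1.0, 2.0, 0.0])
    v = np.array([2.0, 3.0, 1.0])
    w = np.array([0.0, 1.0, 4.0])

    lhs = u @ np.cross(v, w)                  # u . (v x w)
    rhs = np.linalg.det(np.array([u, v, w]))  # determinant with rows u, v, w

    print(lhs, rhs)  # both -5.0 (up to floating-point rounding)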


3.10 Application to Mechanics

Suppose we have a position vector r(t) which depends on time t. Then we can differentiate r(t) to give the derivative

ṙ(t) = dr/dt = lim_{h→0} (r(t + h) − r(t))/h.

▶ In Physics, we often put a dot over a variable as shorthand for differentiation with respect to time.

▶ We will look more at the limit definition of differentiation next term.

To calculate this derivative, we just differentiate each component separately:

r(t) = (x(t), y(t), z(t))^T =⇒ ṙ(t) = (ẋ(t), ẏ(t), ż(t))^T.

Differentiating vectors like this satisfies a number of natural properties which can be checked directly from the definitions and the product rule. For vector-valued functions r(t), s(t) and a scalar λ(t) all depending on t, we have:

d/dt (r + s) = ṙ + ṡ,    d/dt (λr) = λ̇ r + λ ṙ,

d/dt (r · s) = ṙ · s + r · ṡ,    d/dt (r × s) = ṙ × s + r × ṡ.

▶ In the last formula it’s important not to change the order of the r and s, since the vector product is anti-commutative (u × v = −v × u).

In mechanics, we can use this to find vector expressions for various quantities associated with a particle of mass m and position vector r(t):

Velocity: v = ṙ.

Momentum: p = mv = mṙ.

Acceleration: a = v̇ = r̈.

Force (Newton’s Second Law): F = ma = ṗ = mr̈.

Angular momentum about the origin: L = r × p = mr × ṙ.

Moment of force about the origin (torque): τ = r × F = mr × r̈.

Newton’s Second Law tells us that if F = 0 then ṗ = 0, so p is constant. In other words, if the net force on an object is continually zero then the momentum is conserved.


Example. A particle moves with position vector

r(t) = (x(t), y(t), 0)^T = (R cos(ωt), R sin(ωt), 0)^T

for some constants R and ω. Show that the particle moves in a circle, find the force acting on the particle, and show its angular momentum is conserved.

Since x(t)² + y(t)² = R², we see that the particle moves around a circle in the xy-plane with radius R centred at the origin. To find the force F, we first calculate the velocity:

v = (ẋ, ẏ, 0)^T = (−Rω sin(ωt), Rω cos(ωt), 0)^T.

Notice that r · v = 0 at any time t so the velocity is always perpendicular to the position, meaning along the tangent to the circle:

[figure: particle on a circle of radius R at angle ωt, with position r and velocity v tangent to the circle]

The acceleration is

a = v̇ = (−Rω² cos(ωt), −Rω² sin(ωt), 0)^T = −ω² r.

Hence the force is F = ma = −mω² r. Notice that this is always directed towards the origin. Also, the angular momentum is

L = m r(t) × v(t)
  = m (R cos(ωt), R sin(ωt), 0)^T × (−Rω sin(ωt), Rω cos(ωt), 0)^T
  = mR²ω (cos²(ωt) + sin²(ωt)) k
  = mR²ω k.

This is independent of t, so the angular momentum L is conserved.
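▶ A numerical sanity check of this example, sketched in Python (assuming NumPy): approximate v by a finite difference and confirm that L = mr × v stays constant.

    import numpy as np

    m, R, omega = 2.0, 3.0, 0.5
    r = lambda t: np.array([R*np.cos(omega*t), R*np.sin(omega*t), 0.0])

    h = 1e-6
    for t in [0.0, 1.0, 2.5, 7.0]:
        v = (r(t + h) - r(t - h)) / (2*h)  # central-difference velocity
        print(m * np.cross(r(t), v))       # always about [0, 0, 9] = m R^2 omega k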


More generally, differentiating L = mr × ṙ gives

L̇ = mṙ × ṙ + mr × r̈
  = 0 + r × F
  = τ,

so the rate of change of angular momentum is equal to the torque (“turning force”) on the particle.

▶ If the force always acts directly toward the origin, then we call it a central force. (Think about e.g. gravitational attraction.) Under such a force, we have τ = r × F = 0, as r and F are parallel. Hence L̇ = 0 and this means angular momentum is conserved.

We can also see that the angular momentum only arises from the perpendicular component of the velocity by decomposing the velocity as v = v∥ + v⊥, where v∥ is parallel to the position vector r and v⊥ is perpendicular to r:

[figure: velocity v at position r decomposed into v∥ along r and v⊥ perpendicular to r]

Then

L = mr × v∥ + mr × v⊥ = 0 + mr × v⊥ = mr × v⊥,

which doesn’t depend on v∥.


3.11 Eigenvalues and Eigenvectors

Multiplying a vector in Rn by an n × n matrix A changes it into another vector – for example,

(2, 3)^T ↦ (2 2; 0 4)(2, 3)^T = (10, 12)^T.

So we can think of matrices as representing transformations of space, mapping vectors to vectors. These are called linear maps and they “respect linear things”. For instance, they map straight lines to straight lines, maybe by stretching, reflecting or rotating space. Transformations like this will often have “preferred” directions, such as the location of a reflecting mirror or the axis of a rotation. These special directions are examples of eigenvectors belonging to a matrix, and finding them can make difficult linear algebra problems easy. They have many applications that you will learn about in this and later courses.

Example.

(1) The matrix A = (2 0; 0 4) stretches vectors:

(x, y)^T ↦ (2 0; 0 4)(x, y)^T = (2x, 4y)^T.

• If v = (t, 0)^T then Av = 2v. It stretches vectors on the x-axis by factor 2 but doesn’t change their direction.

• If v = (0, t)^T then Av = 4v. It stretches vectors on the y-axis by factor 4 but doesn’t change their direction.

• All other vectors change direction: Av is not a multiple of v.

(2) The matrix A = (0 1; 1 0) reflects in the line y = x:

(x, y)^T ↦ (0 1; 1 0)(x, y)^T = (y, x)^T.

• If v = (t, t)^T then Av = v. It doesn’t change these vectors - they lie on the mirror.

• If v = (t, −t)^T then Av = −v. These are vectors perpendicular to the mirror.

• All other vectors change direction when reflected: Av is not a multiple of v.

Suppose A is an n × n matrix and

Av = λv

for some scalar λ and some non-zero vector v ≠ 0. Then we say that λ is an eigenvalue of A and v is an eigenvector corresponding to λ.

▶ The prefix “eigen-” comes from the German for “own”, “special” or “characteristic”. They are special values and vectors associated with a particular matrix.


We can actually find what the eigenvalues must be without mentioning the eigenvectors. Rearrange the above equation:

Av = λv ⟺ Av − λv = 0 ⟺ (A − λI)v = 0,

where I = In is the n × n identity matrix. Notice that v = 0 is automatically a solution to the system (A − λI)v = 0. If there is also a non-zero solution v ≠ 0, then this system doesn’t have a unique solution. In particular, λ is an eigenvalue of A precisely when

det(A − λI) = 0.

Because of the way determinants are defined, det(A − λI) is a polynomial in λ of degree n and is called the characteristic polynomial of A.

Once we have found the roots of the characteristic polynomial to find the eigenvalues λ, we can then solve the linear system (A − λI)v = 0 to find the corresponding eigenvectors for each λ in turn. Because the system is singular, we know the v for each eigenvalue are not unique, so it is convenient to express them in terms of the eigenspace Vλ. This is the set spanned by the possible v corresponding to λ, together with the zero vector.

It’s easiest to see how this works with an example.

Example. Find the eigenvalues and corresponding eigenspaces of A = (2 2; 0 4).

1. Find the eigenvalues: these are the roots of the characteristic polynomial

det(A − λI) = |2 − λ, 2; 0, 4 − λ| = 0 ⟺ (2 − λ)(4 − λ) = 0.

So the eigenvalues are λ = 2 and 4.

2. Find the eigenspace V2: the eigenvectors v for λ = 2 satisfy (A − 2I)v = 0 (which we remember has infinitely many solutions).

Writing v = (x, y)^T, we have (0 2; 0 2)(x, y)^T = (2y, 2y)^T = (0, 0)^T and thus y = 0 and x = µ for some parameter µ. The eigenvectors are thus v = (µ, 0)^T = µ(1, 0)^T for µ ≠ 0.

This eigenspace consists of all eigenvectors as well as the zero vector, i.e. V2 = Span{(1, 0)^T}.

3. Find the eigenspace V4: the eigenvectors v satisfy the linear system (A − 4I)v = 0, or

(−2 2; 0 0)(x, y)^T = (−2x + 2y, 0)^T = (0, 0)^T =⇒ y = µ, x = µ.

So v = (µ, µ)^T = µ(1, 1)^T for µ ≠ 0, and this eigenspace is V4 = Span{(1, 1)^T}.
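▶ In practice eigenvalues and eigenvectors are usually computed numerically. A sketch using np.linalg.eig (assuming NumPy) on this example:

    import numpy as np

    A = np.array([[2.0, 2.0],
                  [0.0, 4.0]])

    eigvals, eigvecs = np.linalg.eig(A)
    print(eigvals)  # [2. 4.]
    print(eigvecs)  # columns are normalised eigenvectors: the first is
                    # (1, 0)^T, the second a multiple of (1, 1)^T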


▶ Just for fun, let’s investigate the previous example as a transformation.

The matrix A = (2 2; 0 4) maps

(x, y)^T ↦ (2 2; 0 4)(x, y)^T = (2x + 2y, 4y)^T.

This has the effect of stretching and shearing an initial square region into a parallelogram:

[figure: a square region in the (x, y)-plane (left) and its image under A, a sheared and stretched parallelogram (right); the eigenvector directions are marked with blue and red arrows]

Notice that straight lines are mapped to straight lines – this is always true when you multiply by a (non-singular) matrix. The blue and red arrows in the left hand plot show the eigenvectors (1, 0)^T and (1, 1)^T. They are stretched in length by their corresponding eigenvalues 2 and 4, but they don’t change direction. Any straight line that is not parallel to one of these eigenvectors will necessarily change its direction – like the vertical lines, for instance.

▶ Another fun fact: the determinant of A tells you how much the area of the square is increased during its transformation, so in this example, the square with area 1 becomes a parallelogram with area det(A) = 8. A negative determinant would mean that there is also a reflection involved, leading to a “negative area”.

Example. Find the eigenvalues and eigenspaces of the matrix A = (5 6 −3; 0 −1 0; 6 6 −4).

1. Find the eigenvalues: The characteristic polynomial is det(A − λI3) = 0, and, taking the transpose (which doesn’t change the determinant),

det(A − λI3) = |5 − λ, 6, −3; 0, −1 − λ, 0; 6, 6, −4 − λ| = |5 − λ, 0, 6; 6, −1 − λ, 6; −3, 0, −4 − λ|
= (5 − λ) |−1 − λ, 6; 0, −4 − λ| + 6 |6, −1 − λ; −3, 0|
= (5 − λ)(1 + λ)(4 + λ) − 18(1 + λ)
= (1 + λ)(20 + λ − λ² − 18) = (1 + λ)²(2 − λ).

So we have eigenvalues λ = −1, −1, 2.


2. Find the eigenspace V−1: eigenvectors v corresponding to λ = −1 satisfy (A + I)v = 0, i.e.

(6 6 −3; 0 0 0; 6 6 −3)(x, y, z)^T = (0, 0, 0)^T.

There is effectively only one equation 2x + 2y − z = 0, so there will be a two-parameter family of solutions (i.e. a plane). Choosing x = µ and y = ν shows that

v = (x, y, z)^T = (µ, ν, 2µ + 2ν)^T = µ(1, 0, 2)^T + ν(0, 1, 2)^T

and so V−1 = Span{(1, 0, 2)^T, (0, 1, 2)^T}.

3. Find the eigenspace V2: eigenvectors v corresponding to λ = 2 satisfy (A − 2I)v = 0, i.e.

(3 6 −3; 0 −3 0; 6 6 −6)(x, y, z)^T = (0, 0, 0)^T.

This time we can’t immediately read off the solutions so easily, so instead let’s solve by Gaussian elimination:

(3 6 −3 | 0; 0 −3 0 | 0; 6 6 −6 | 0) →((1/3)R1, −(1/3)R2, (1/6)R3) (1 2 −1 | 0; 0 1 0 | 0; 1 1 −1 | 0) →(R1 − 2R2, R3 − R2) (1 0 −1 | 0; 0 1 0 | 0; 1 0 −1 | 0) →(R3 − R1) (1 0 −1 | 0; 0 1 0 | 0; 0 0 0 | 0).

Hence y = 0 and, taking z = µ to be a free parameter, we get x = µ. So

v = (µ, 0, µ)^T = µ(1, 0, 1)^T and so V2 = Span{(1, 0, 1)^T}.

▶ Notice in the above two examples that the combined eigenspaces spanned the whole of space (R2 in the first case and R3 in the second). But this is not always the case, as illustrated by the next example.

Example. Find the eigenvalues and eigenspaces of the matrix A = (3 1; 0 3).

1. Find the eigenvalues: the characteristic polynomial is

det(A − λI2) = 0 ⟺ |3 − λ, 1; 0, 3 − λ| = 0 ⟺ (3 − λ)² = 0,

so A has the repeated eigenvalue λ = 3, 3.


2. Find the eigenspace V3: the eigenvectors v with λ = 3 satisfy (A − 3I)v = 0, or

(0 1; 0 0)(x, y)^T = (0, 0)^T =⇒ y = 0 and x = µ.

So v = (µ, 0)^T = µ(1, 0)^T and V3 = Span{(1, 0)^T}.

▶ Consider the matrix A = (0 −1; 1 0) mapping

(x, y)^T ↦ (0 −1; 1 0)(x, y)^T = (−y, x)^T.

This rotates vectors anti-clockwise around the origin by angle π/2.

[figure: a vector v = (x, y)^T rotated through π/2 to Av = (−y, x)^T]

Now we said that a transformation v ↦ Av doesn’t alter the direction of its eigenvectors. But surely a rotation by angle π/2 about the origin will change the direction of every non-zero vector - it seems we have a matrix with no eigenvalues or eigenvectors? The answer is that eigenvalues (and eigenvectors) can be complex.

Indeed, the characteristic polynomial of A is

det(A − λI) = |−λ, −1; 1, −λ| = λ² + 1

and so the eigenvalues are λ = ±i. To find the corresponding eigenspace for λ = i, we find v = (x, y)^T such that (A − iI)v = 0:

(−i −1 | 0; 1 −i | 0) →(iR1) (1 −i | 0; 1 −i | 0) →(R2 − R1) (1 −i | 0; 0 0 | 0).

This means x = iy and the corresponding eigenspace is Vi = Span{(i, 1)^T}.

Similarly, we can show that the eigenspace corresponding to λ = −i is V−i = Span{(1, i)^T}.

The rotation doesn’t change the “directions” of these complex vectors!
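▶ Numerical routines handle the complex case automatically; a sketch with np.linalg.eig (assuming NumPy):

    import numpy as np

    A = np.array([[0.0, -1.0],
                  [1.0, 0.0]])

    eigvals, eigvecs = np.linalg.eig(A)
    print(eigvals)  # [0.+1.j 0.-1.j], i.e. +i and -i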


3.12 Diagonalisation

One important application of eigenvalues and eigenvectors is to diagonalise a matrix A, which means finding a matrix Y such that

Y⁻¹AY = D,

where D is a diagonal matrix. As we saw in the Linear Algebra topic, diagonal matrices are much easier to handle.

The trick to finding such a matrix Y is to choose its columns to be linearly independent eigenvectors of A.

Example. Find a matrix Y that diagonalises the matrix A = (2 2; 0 4).

In the last section we found the eigenvalues λ1 = 2, λ2 = 4, with corresponding independent eigenvectors v1 = (1, 0)^T and v2 = (1, 1)^T. So let’s try Y = (v1 | v2) = (1 1; 0 1). Then

Y⁻¹AY = (1 −1; 0 1)(2 2; 0 4)(1 1; 0 1) = (1 −1; 0 1)(2 4; 0 4) = (2 0; 0 4).

This matrix is indeed diagonal. Furthermore, notice that its entries are the eigenvalues λ1 and λ2. This is not a coincidence!

For a given A, the matrix Y is not unique. Suppose we wrote the eigenvectors the other way around, setting Y = (1 1; 1 0). Then

Y⁻¹AY = (0 1; 1 −1)(2 2; 0 4)(1 1; 1 0) = (4 0; 0 2),

so the order of the eigenvectors in Y is swapped, but we still diagonalised A and this time the eigenvalues are swapped in D as well.

Alternatively, if we chose instead the eigenvector v2 = (−2, −2)^T, so that Y = (1 −2; 0 −2), then

Y⁻¹AY = (1 −1; 0 −1/2)(2 2; 0 4)(1 −2; 0 −2) = (2 0; 0 4).

So it doesn’t seem to matter which eigenvectors we choose, except for the order we write them.

To explain why this works, suppose A is an n × n matrix with (not necessarily distinct) eigenvalues λ1, ..., λn and corresponding eigenvectors v1, ..., vn so that Avi = λivi. Set

Y = (v1 | v2 | ... | vn) and D = (λ1 0 ··· 0; 0 λ2 ··· 0; ...; 0 0 ··· λn).


Due to the way matrix multiplication is defined, the columns of AY are precisely A times the columns of Y, i.e. they are Avi = λivi:

AY = A (v1 | v2 | ... | vn) = (Av1 | Av2 | ... | Avn) = (λ1v1 | λ2v2 | ... | λnvn).

On the other hand, the columns of Y D are the columns of Y multiplied by the corresponding diagonal entries of D:

Y D = (v1 | v2 | ... | vn)(λ1 0 ··· 0; 0 λ2 ··· 0; ...; 0 0 ··· λn) = (λ1v1 | λ2v2 | ... | λnvn).

In particular, we see that AY = Y D. Now provided that Y is invertible, we get the diagonalisation D = Y⁻¹AY. But we need Y⁻¹ to exist for this to work. This will happen if the vectors {v1, ..., vn} are a basis for Rn, since then the determinant det(Y) ≠ 0 and Y is invertible.

Example. Now consider the matrix A = (3 1; 0 3). In the last section, we saw it has a repeated eigenvalue λ = 3. The eigenspace V3 = Span{(1, 0)^T} is only one-dimensional and we can only choose one linearly independent eigenvector, e.g. v = (1, 0)^T.

We don’t have enough independent eigenvectors to put in the matrix Y to make it invertible, so the above method of diagonalisation will not work.

In fact no Y exists such that Y⁻¹AY is diagonal. If Y = (a b; c d) and D = (r 0; 0 s) then

AY = Y D =⇒ (3 1; 0 3)(a b; c d) = (a b; c d)(r 0; 0 s) =⇒ 3a + c = ar, 3b + d = bs, 3c = cr, 3d = ds.

Assuming c ≠ 0 gives r = 3, and then 3a + c = ar = 3a forces c = 0, a contradiction; so we must have c = 0. Similarly, d = 0, but then the matrix Y isn’t invertible as it has a row of zeros.

▶ One can more rigorously show that a square matrix A is diagonalisable if and only if there is a basis consisting of eigenvectors. Notice it is the number of independent eigenvectors that matters, not the number of distinct eigenvalues. A matrix can have repeated eigenvalues and still be diagonalisable.

▶ It can also be proved that symmetric n × n matrices (with real coefficients) always have a basis of n eigenvectors, and hence are always diagonalisable.

▶ In fact, if a matrix is not diagonalisable, there is something called the Jordan normal form which “almost” diagonalises the matrix.


Eigenvectors corresponding to different eigenvalues are automatically linearly independent. A sketch of why this is true is as follows. Suppose Av1 = λ1v1 and Av2 = λ2v2 for the same matrix A, with λ1 ≠ λ2. Then

av1 + bv2 = 0 =⇒ (A − λ1I)(av1 + bv2) = 0
           =⇒ (A − λ1I)bv2 = 0
           =⇒ (λ2 − λ1)bv2 = 0.

Since λ1 ≠ λ2 and v2 ≠ 0 by definition, this is only possible if b = 0. But then av1 = 0, and since v1 ≠ 0 we see that a = 0. There is therefore no non-trivial linear combination of v1 and v2 that gives 0, so they are linearly independent.

Example. Find a matrix Y that diagonalises the matrix A = (5 6 −3; 0 −1 0; 6 6 −4).

Previously we found:

- an eigenvalue λ = 2 with eigenspace Span{(1, 0, 1)^T};

- a repeated eigenvalue λ = −1 with eigenspace Span{(1, 0, 2)^T, (0, 1, 2)^T}.

So if we take

Y = (1 1 0; 0 0 1; 1 2 2) then D = Y⁻¹AY = (2 0 0; 0 −1 0; 0 0 −1).

It’s a good idea to check that this really works. Multiply out the matrices AY = Y D and check you didn’t go wrong during the calculation.
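▶ This check is quick to automate; a sketch (assuming NumPy):

    import numpy as np

    A = np.array([[5.0, 6.0, -3.0],
                  [0.0, -1.0, 0.0],
                  [6.0, 6.0, -4.0]])
    Y = np.array([[1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 2.0, 2.0]])

    print(np.round(np.linalg.inv(Y) @ A @ Y))  # diag(2, -1, -1)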

Diagonalisation has important applications in physics, chemistry, mechanics, ... as well as in simultaneous differential equations, which we will see next term. Here we will finish with a couple of simple applications involving powers of matrices.

Example. Given A = (2 2; 0 4), calculate A^100.

Using A = Y DY⁻¹, with the Y = (1 1; 0 1) and D = (2 0; 0 4) we found before, we have

A^100 = (Y DY⁻¹)^100 = Y DY⁻¹ Y DY⁻¹ ... Y DY⁻¹ Y DY⁻¹ = Y D^100 Y⁻¹.

But finding D^100 is easy – just raise the diagonal elements to the power 100. So

A^100 = (1 1; 0 1)(2^100 0; 0 4^100)(1 −1; 0 1) = (2^100, 4^100 − 2^100; 0, 4^100).
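▶ Since the diagonalisation is exact, this computation can be carried out with exact integer arithmetic. A sketch in plain Python (no libraries needed):

    def matmul(P, Q):
        """Multiply two 2x2 matrices given as nested lists of integers."""
        return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    Y     = [[1, 1], [0, 1]]
    Y_inv = [[1, -1], [0, 1]]
    D100  = [[2**100, 0], [0, 4**100]]  # raise the diagonal entries to the power 100

    A100 = matmul(matmul(Y, D100), Y_inv)
    print(A100[0][0] == 2**100)           # True
    print(A100[0][1] == 4**100 - 2**100)  # True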


The same kind of trick works for finding “roots” of matrices. The point is that finding a root of a diagonal matrix is easier, so the diagonalisation process allows us to shift the problem to these instead.

Example. Find a matrix B such that B³ = A = (−46 135; −18 53).

We first find the eigenvalues:

det(A − λI) = |−46 − λ, 135; −18, 53 − λ| = λ² − 7λ − 8 = (λ + 1)(λ − 8),

so the eigenvalues are λ1 = −1 and λ2 = 8.

For λ1 = −1, we solve (A + I)v1 = 0, i.e.

(−45 135 | 0; −18 54 | 0) → (1 −3 | 0; 0 0 | 0) =⇒ x − 3y = 0.

Hence we can take v1 = (3, 1)^T as the corresponding eigenvector.

For λ2 = 8, we solve (A − 8I)v2 = 0, i.e.

(−54 135 | 0; −18 45 | 0) → (2 −5 | 0; 0 0 | 0) =⇒ 2x − 5y = 0.

Hence we can take v2 = (5, 2)^T as the corresponding eigenvector.

We now have Y⁻¹AY = D where Y = (3 5; 1 2) and D = (−1 0; 0 8). (Check this!)

Writing down a cube root of a diagonal matrix is easy - just take cube roots of the diagonal elements. In other words, C³ = D where C = (−1 0; 0 2).

But then letting B = Y CY⁻¹ gives

B³ = (Y CY⁻¹)³ = Y C³Y⁻¹ = Y DY⁻¹ = A

and this matrix B is what we are looking for:

B = Y CY⁻¹ = (3 5; 1 2)(−1 0; 0 2)(2 −5; −1 3) = (−16 45; −6 17).
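▶ As a final check (a sketch assuming NumPy; integer arrays keep the arithmetic exact):

    import numpy as np

    A = np.array([[-46, 135],
                  [-18, 53]])
    B = np.array([[-16, 45],
                  [-6, 17]])

    print(np.array_equal(B @ B @ B, A))  # True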
