Numbers, Sets and Functions 2021–22

Dr M. Fayers

Contents

1 Mathematical notation
    1.1 Basic arithmetic operations
    1.2 Brackets
    1.3 Infinity
    1.4 The Greek alphabet
    1.5 ∑ notation
    1.6 Manipulating sums
    1.7 Products and factorials

2 Logic
    2.1 Statements
    2.2 Using statements to make new ones
    2.3 Quantifiers
    2.4 Implications
    2.5 Converse and contrapositive

3 Proofs
    3.1 What is a proof?
    3.2 Proving implications
    3.3 Disproving implications
    3.4 Proof by contradiction
    3.5 Proof by induction
    3.6 Finding mistakes in proofs

4 Integers
    4.1 Natural numbers and integers
    4.2 Divisibility and primes
    4.3 Greatest common divisor
    4.4 Lowest common multiple

5 Sets
    5.1 Definition, notation and examples
    5.2 Subsets
    5.3 Set operations
    5.4 Some set identities
    5.5 Another set operation: Cartesian product

6 Counting
    6.1 Cardinality
    6.2 Counting subsets
    6.3 Counting subsets of a particular size

7 Functions
    7.1 Definition of functions
    7.2 Injective, surjective, bijective
    7.3 Restriction and composition of functions
    7.4 Inverses
    7.5 Bijections and cardinality
    7.6 Images and inverse images of subsets

8 Some more mathematical objects
    8.1 Relations
    8.2 Sequences

9 Rational numbers and real numbers
    9.1 Rational numbers
    9.2 An irrational number
    9.3 Real Numbers
    9.4 Upper bounds

10 Complex numbers
    10.1 Definition and operations
    10.2 The complex plane
    10.3 Roots of unity (non-examinable)

1 Mathematical notation

One of the most important things in mathematics is to write things accurately. There are two aspects to this:

• using words to make clear logical statements;

• using mathematical notation correctly.

In this short section we’ll focus on notation. I’ll introduce various notation (some of which you will have seen before), and make a few comments on how to use it accurately.

1.1 Basic arithmetic operations

I don’t need to say much about the basic arithmetic operations +, −, × and ÷. You’re used to the fact that you can often leave out the × symbol: you usually write xy instead of x × y. Don’t ever do this if the two things you’re multiplying are both numbers: 2 × 2 can’t be written as 22.

It’s quite unusual to see the symbol ÷ in university mathematics. Much more often we’ll use /, or actually write a fraction $\frac{x}{y}$ instead of x ÷ y.

1.2 Brackets

Brackets are important when you have different mathematical operations in the same expression, and you need to show which takes priority: you know that 2 + (2 × 2) and (2 + 2) × 2 are not the same thing. Even if the two operations are the same you may need brackets: (2 − 2) − 2 and 2 − (2 − 2) are not the same thing. At school you learnt the rule BIDMAS, which specifies priority with basic arithmetic operations. But at university (including in this module) you’ll learn about new operations for which there’s no standard rule, so you need to make sure you use brackets wherever needed.

Often complicated expressions will have several pairs of brackets. In some subjects it’s typical to use a different style of brackets for some of them, such as [ ] or { }. You should avoid doing this: in mathematics, the brackets { } and [ ] have specific meanings. Just use ( ) for nested brackets, making each pair larger than the ones inside them: $2 - \Bigl(2 - \bigl(2 - (2 - 2)\bigr)\Bigr)$.

We’ll be using curly brackets { } a lot when we talk about sets. One place where you may see square brackets [ ] used is to represent the integer part of a number: if x is a real number, then [x] is used for its integer part, i.e. the largest integer less than or equal to x. For example, [3.2] = 3. Sometimes you may see ⌊ ⌋ used for the integer part instead.
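Here’s a small sketch of the integer part in Python. It assumes that the standard library function `math.floor` is an acceptable stand-in for [x]; the helper name `integer_part` is made up for this illustration and is not part of the notes.

```python
import math

def integer_part(x):
    """Largest integer less than or equal to x (hypothetical helper name)."""
    return math.floor(x)

print(integer_part(3.2))   # 3
print(integer_part(-3.2))  # -4: the largest integer <= -3.2, not -3
print(integer_part(5))     # 5
```

The negative case is the one worth noticing: the definition says “largest integer less than or equal to x”, so the integer part of −3.2 is −4.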

1.3 Infinity

As you know, the symbol ∞ means “infinity”. But its use in mathematics is very restricted: you shouldn’t think of ∞ as being a number, or even a specific object. It’s only used in very special ways, such as in $\sum_{n=1}^{\infty}$ or $\int_0^{\infty}$, or $\lim_{x \to \infty}$.

1.4 The Greek alphabet

Mathematicians like to use letters to represent things, and in particular like to use different kinds of letters to represent different kinds of things. One way to do this is to use capital and lower-case letters, but often Greek letters are used. In a maths course it’s assumed that you know the Greek alphabet. I’ve put it on the QMplus page. You don’t have to memorise it now, but you should practise writing all the letters. Make sure your ν is different from v, your ρ is different from p, and so on.

We sometimes use capital Greek letters, but not as much, because many of them are the same as their Roman counterparts. For example, capital alpha is just A. But now let’s look at a capital Greek letter you’ll see a lot.

1.5 ∑ notation

Σ is the capital letter sigma, and is almost always used to represent a sum. Here’s an example:

\[ \sum_{n=1}^{100} n = 5050. \]

What the left-hand side means is 1 + 2 + ⋯ + 99 + 100.

In general, a summation like this is written

\[ \sum_{n=a}^{b} x_n, \]

where a and b are integers satisfying a ≤ b, and $x_n$ (which is called the summand) is a number which depends on n. The summation notation says we take these values and add them up. Here’s another example:

\[ \sum_{n=-3}^{3} n^2 = 28. \]

The letter n is called a dummy variable: it’s a variable that we introduce temporarily. In the first example above, we let it take the values 1, 2, . . . , 100 in turn, and for each value of n we add n to our sum. The dummy variable should only appear inside the sum. You should never have an expression like

\[ \sum_{n=a}^{b} x_n = \text{something involving } n. \]

You should be able to change n to some other symbol without changing the meaning of the sum:

\[ \sum_{n=a}^{b} x_n = \sum_{m=a}^{b} x_m. \]
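If it helps, you can check sums like these on a computer. The following is a minimal Python sketch (Python is my choice here, not something the module requires); the variable names are invented for the illustration. Note that `range(a, b + 1)` runs over a, a + 1, . . . , b.

```python
# Sum of n for n = 1, ..., 100.
total = sum(n for n in range(1, 101))

# Sum of n^2 for n = -3, ..., 3.
squares = sum(n**2 for n in range(-3, 4))

# Renaming the dummy variable from n to m changes nothing.
same_total = sum(m for m in range(1, 101))

print(total, squares, total == same_total)  # 5050 28 True
```

The loop variable plays exactly the role of the dummy variable: it only has a meaning inside the sum.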

Here are some variations of this notation.

• $\sum_{n=a}^{b}$ may be written as $\sum_{a \le n \le b}$.

• The upper limit can be ∞. What this means is that we take all values of n going from a upwards. For example, you may have seen the sum of an infinite geometric progression:

\[ \sum_{n=1}^{\infty} \frac{1}{2^n} = 1. \]

In general it’s an awkward issue to say what the sum of infinitely many numbers means; you will see this in your calculus modules. $\sum_{n=1}^{\infty}$ may also be written as $\sum_{n \ge 1}$.

In a similar way, we can make the lower limit of the sum −∞.

• The limits of the sum don’t need to be definite numbers; they can also be variables. Here they are not dummy variables, so they may appear elsewhere. You may have seen the following formula:

\[ \sum_{n=1}^{m} n = \frac{m(m+1)}{2}. \]

• You might want to sum over values of n which are not consecutive whole numbers. If you have any set X of numbers, you can sum over all values in X. For example:

\[ \sum_{n \in \{2,4,7,9\}} n^2 \]

means 2² + 4² + 7² + 9². (As we’ll see later in the module, n ∈ X means “n is an element of the set X”.)
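The variations above can also be tried out numerically. This is only a sketch in Python, with names (`triangular`, `set_sum`, `partial`) made up for the purpose; checking finitely many cases is evidence, not a proof, and a computer can only add up finitely many terms of an infinite sum.

```python
def triangular(m):
    """Closed form m(m + 1)/2 for 1 + 2 + ... + m."""
    return m * (m + 1) // 2

# The upper limit m is an ordinary variable, so it can appear on both sides.
for m in range(0, 50):
    assert sum(n for n in range(1, m + 1)) == triangular(m)

# Summing over a set of values that are not consecutive.
X = {2, 4, 7, 9}
set_sum = sum(n**2 for n in X)
print(set_sum)  # 150

# The first 30 terms of the geometric progression: close to 1, but below it.
partial = sum(1 / 2**n for n in range(1, 31))
print(partial < 1, 1 - partial < 1e-8)  # True True
```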

1.6 Manipulating sums

Often you’ll have an expression using summation notation, and you’ll need to manipulate it to make it more useful.

First you may want to shift the range of summation. For example, suppose we have the sum

\[ \sum_{n=0}^{9} n(n+1) \]

but we would like the range of summation to be from 1 to 10 instead of 0 to 9. We can just define a new variable m equal to n + 1, and use this instead. Now as n goes from 0 to 9, m goes from 1 to 10. We can write n(n + 1) as (m − 1)m, so

\[ \sum_{n=0}^{9} n(n+1) = \sum_{m=1}^{10} (m-1)m. \]
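A quick way to convince yourself that shifting the range hasn’t changed the value is to compute both sides. Here is a short Python sketch of that check; the names `lhs` and `rhs` are just labels for the two sides.

```python
lhs = sum(n * (n + 1) for n in range(0, 10))   # n = 0, ..., 9
rhs = sum((m - 1) * m for m in range(1, 11))   # m = n + 1 = 1, ..., 10

print(lhs, rhs)  # 330 330
```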

This technique can be particularly useful if you have two sums with different ranges of summation.

There are two useful ways to split a sum into two sums. The first is to break up the range of summation. For example,

\[ \sum_{n=0}^{9} n(n+1) = \sum_{n=0}^{4} n(n+1) + \sum_{n=5}^{9} n(n+1). \]

Another way is to split the summand into two parts. For example, suppose we want to evaluate

\[ \sum_{n=0}^{99} \bigl((n+1)^2 - n^2\bigr). \]

Let’s split this into two sums:

\[ \sum_{n=0}^{99} (n+1)^2 - \sum_{n=0}^{99} n^2. \]

Now we use a different substitution in each of these two sums. In the first sum, we set m = n + 1, and in the second sum, we set m = n:

\[ \sum_{m=1}^{100} m^2 - \sum_{m=0}^{99} m^2. \]

Now we can split up the range of each sum:

\[ \sum_{m=1}^{99} m^2 + 100^2 - 0^2 - \sum_{m=1}^{99} m^2 \]

which comes to 100² = 10000.
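The same calculation can be sanity-checked in Python. This sketch computes the original sum directly and also the two sums obtained after the substitutions; the names `direct`, `first` and `second` are invented for the illustration.

```python
# The original sum, term by term.
direct = sum((n + 1)**2 - n**2 for n in range(0, 100))

# After substituting m = n + 1 in the first sum and m = n in the second.
first = sum(m**2 for m in range(1, 101))    # m = 1, ..., 100
second = sum(m**2 for m in range(0, 100))   # m = 0, ..., 99

print(direct, first - second)  # 10000 10000
```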

1.7 Products and factorials

We can write notation for products in the same way as for sums, using Π (capital pi) instead of Σ. For example,

\[ \prod_{n=1}^{3} (2n+1) = 105. \]

One very special case of this is the factorial

\[ m! = \prod_{n=1}^{m} n. \]

For example, 3! = 1 × 2 × 3 = 6 and 5! = 1 × 2 × 3 × 4 × 5 = 120.

We also define 0! = 1. This seems strange to most people when they first see it: they expect 0! = 0. But actually 0! = 1 is more logical and useful: for example, there’s a simple formula

\[ (n+1) \times n! = (n+1)! \]

which is obviously true when n ≥ 1. If you want it to be true for n = 0 as well, you need 0! = 1.
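Here is a short Python sketch of products and factorials. It assumes Python 3.8 or later, where the standard library provides `math.prod`; the function name `factorial` below is my own definition for the illustration (the library also has `math.factorial`). The product of no numbers at all is taken to be 1, which matches the convention 0! = 1.

```python
import math

def factorial(m):
    """m! as the product of 1, 2, ..., m; the empty product (m = 0) is 1."""
    return math.prod(range(1, m + 1))

print(math.prod(2 * n + 1 for n in range(1, 4)))  # 105
print(factorial(3), factorial(5), factorial(0))   # 6 120 1

# (n + 1) * n! == (n + 1)! holds for n = 0 as well, because 0! = 1.
for n in range(0, 10):
    assert (n + 1) * factorial(n) == factorial(n + 1)
```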

2 Logic

Before we get to mathematical objects such as numbers, sets and functions, we’ll look at how we make precise meaningful statements in maths, and the logical constructions that go into theorems. In the next section, we’ll look at how to go about proving things.

2.1 Statements

Definition. A statement or assertion is an expression which can be a complete sentence by itself, and is either true or false.

Examples. In our examples we’ll take some everyday statements as well as mathematical ones.

• “Paris is the capital of France” is a statement which is true.

• “The capital of Italy” is not a statement – it’s a way of specifying a city.

• “Broccoli is delicious” is a statement, though whether it’s true is a matter of opinion.

• “3 ≤ 4” is a statement which is true.

• “1 + 1 = 3” is a statement which is false.

• “1 + 2” is not a statement; it’s just a number. It makes no sense to say that 1 + 2 is true or false.

• “√2 is irrational” is a statement which is true (as we’ll see later in the module).

• “There is a prime number larger than 1000000” is a statement which is true.

The statements above do not involve any external variables and were all universally true or false (subject to your opinion about broccoli). Often a statement will involve a variable (representing a number or set, or some other object) and whether the statement is true or not depends on the value of that variable. Here are some examples of statements involving variables:

• x = 2

• x ≠ y

• n is even

• there is a prime number larger than n

• A ⊆ B

• A ∩ B = C.

These are all statements whose truth depends on one or more external quantities (the numbers x, y and n, or the sets A, B and C). To say whether the statement “n is even” is true or false, we need to know what n is.

We will often label a statement with a letter or symbol for easy reference, just as we do with numbers and functions. A general statement may be denoted P or Q, and a statement that depends on some variable x by P(x) or Q(x). For example, we could say: let P(n) denote the statement “n is even”. Then P(2) is true, but P(5) is false.

2.2 Using statements to make new ones

Just as there are ways to combine numbers to make new numbers (such as addition), there are ways to combine statements to make new statements. Two of the simplest ones are to use the words “and” and “or”. So if we have two statements P and Q, we get a new statement “P and Q”. (In symbolic logic this is often written P ∧ Q and called the conjunction of P and Q, but in this course I will stick with the everyday word “and”.) In order for the statement “P and Q” to be true, P must be true and Q must be true. For example, if P is the statement “Paris is the capital of France” and Q is the statement “Madrid is the capital of Italy”, then we get the statement “Paris is the capital of France and Madrid is the capital of Italy”. This statement is false, because Q is false.

Notation. Quite often when mathematical statements are combined with “and”, a shorthand is used which doesn’t involve the word “and”. For example, you’re probably familiar with writing

a < b < c

to mean “a < b and b < c”. You might also see

a, b > 0

to mean “a > 0 and b > 0”. So be alert for hidden “and”s.

Similarly, if P and Q are statements then we get a new statement “P or Q”. (In symbolic logic this is written P ∨ Q and called the disjunction of P and Q.) In order for “P or Q” to be true, we just need P to be true or Q to be true or both. Continuing the example above, the statement “Paris is the capital of France or Madrid is the capital of Italy” is true, because P is true.

In fact we’ve already seen an example of an “or” statement: the statement “3 ≤ 4” is just a shorthand way of writing “3 < 4 or 3 = 4”. And it’s true, because 3 < 4 (even though the statement 3 = 4 is not true).

You may have seen the idea of a truth table to show you how the truth of statements like “P and Q” depends on whether the individual statements P and Q are true. For example, here are the truth tables for “P and Q” and “P or Q”.

P      Q      P and Q
true   true   true
true   false  false
false  true   false
false  false  false

P      Q      P or Q
true   true   true
true   false  true
false  true   true
false  false  false

In each truth table we have some basic statements (in this case the statements P and Q) from which we’re building up more complicated statements. The table should have one row for each possible combination of true/false for the basic statements.
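Truth tables are mechanical enough that a computer can write them out. The sketch below is one way to do it in Python, using `itertools.product` from the standard library to list every combination of true/false; the function name `truth_table` is made up for this illustration.

```python
from itertools import product

def truth_table(connective):
    """Return one row (P, Q, result) for each combination of true/false."""
    rows = []
    for p, q in product([True, False], repeat=2):
        rows.append((p, q, connective(p, q)))
    return rows

and_table = truth_table(lambda p, q: p and q)
or_table = truth_table(lambda p, q: p or q)

for p, q, value in and_table:
    print(p, q, value)
```

With two basic statements there are four rows; with three basic statements, `repeat=3` would give eight.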

Another simple way to turn statements into new ones is negation: this simply means replacing a statement with its opposite. Often in everyday English, we make the negation of a statement by inserting the word “not” in the appropriate place.

Definition. If P is a statement, the negation of P is the statement “not P” (that is “P is false”).

Examples of Negations.

• The negation of the statement “it is raining” is “it is not raining”.

• The negation of the statement “Paris is not the capital of France” is “Paris is the capital of France”.

• The negation of the statement “x = 2” is “x ≠ 2”.

• The negation of the statement “x ≥ y” is “x < y”.

• The negation of the statement “x = 2 and y = 0” is “x ≠ 2 or y ≠ 0”.

• The negation of the statement “x = 2 or y = 0” is “x ≠ 2 and y ≠ 0”.

Notice how “and” and “or” behave with negation: “and” becomes “or” and vice versa. I hope this is intuitively obvious, but we can check it with a truth table. First, here’s the truth table for “not P”.

P      not P
true   false
false  true

Now here’s a combined truth table for the statements “not (P and Q)” and “(not P) or (not Q)”.

P      Q      not P  not Q  not (P and Q)  (not P) or (not Q)
true   true   false  false  false          false
true   false  false  true   true           true
false  true   true   false  true           true
false  false  true   true   true           true

You can see that “not (P and Q)” is true exactly when “(not P) or (not Q)” is true, so they are equivalent statements.

As an exercise, you should do the same thing for the statements “not (P or Q)” and “(not P) and (not Q)”.
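Once you have done the exercise by hand, you could confirm your table with a few lines of Python. This is only a sketch: the helper `equivalent` is a made-up name, and it simply compares two compound statements on all four rows of the truth table.

```python
from itertools import product

def equivalent(f, g):
    """True if f and g agree on every combination of true/false for P and Q."""
    return all(f(p, q) == g(p, q) for p, q in product([True, False], repeat=2))

law_and = equivalent(lambda p, q: not (p and q), lambda p, q: (not p) or (not q))
law_or = equivalent(lambda p, q: not (p or q), lambda p, q: (not p) and (not q))

# A tempting mistake: negating each part but forgetting to swap "and" with "or".
not_a_law = equivalent(lambda p, q: not (p and q), lambda p, q: (not p) and (not q))

print(law_and, law_or, not_a_law)  # True True False
```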

2.3 Quantifiers

There are two special constructions that allow us to turn a statement involving a variable into a general statement. These are called quantifier statements, and involve adding “for all” or “there exists” to a statement.

For example, the statement “n is even” depends on the integer n. But we can write “n is even for all integers n” and we have a general statement. (Of course, it’s a false statement.) n is now a dummy variable: just as with dummy variables for summation, you can replace it with any other label (as long as you’re not already using that label for something else). So you could instead write “m is even for all integers m” and logically you’re saying exactly the same thing.

Be alert to variations in wording: mathematicians like to re-arrange sentences or use different words for the same thing. You may see “for every” or “for any” instead of “for all”, and the phrase may come at various places in the sentence. So the statement in the last paragraph could be written “for every integer n, n is even” or (avoiding the use of a dummy variable) “every integer is even”.

“For all” statements can also be expressed using “If . . . then”: for example “if n is an integer, then n is even”. This kind of statement is called an implication, and we’ll discuss these below. Another way of writing “for all” statements that you’ll see a lot and often causes confusion is the “Let . . . ” construction. So you may see the following.

Let n be an integer. Then n is even.

This is a mathematician’s way of writing “n is even for all integers n”. This sometimes trips students up in coursework and exam questions: if you see “Let n be an integer. Prove that . . . ”, it doesn’t mean you can choose the integer n and prove whatever you’re supposed to prove for that particular n; you’re supposed to prove it for every n. This may seem like a strange way to word this, but it’s a standard way to write things in maths.

We can also turn statements depending on a parameter into general statements using “there exists”: we can write “There exists an integer n such that n is even”. Again, n has become a dummy variable. (Of course, this statement is true.) As with “for all” statements, there are different ways of wording this: you may see “for some” instead of “there exists”, and it could come at different places in the sentence. So for example we could write “n is even for some integer n”, or (avoiding using a dummy variable) “there is an even integer”.

It’s important to be able to read a mathematical statement and recognise that there’s a dummy variable, and understand whether it’s a “for all” statement or a “there exists” statement. In maths most of the theorems you’ll see are “for all” statements, because these are much more powerful: we’re interested in things that are always true, no matter what values we choose for the variables.

We use “for all” and “there exists” statements in everyday English too. If someone asked you how your week-long holiday was, and you replied “it rained that day”, it wouldn’t make sense: it’s a grammatically sound sentence, but effectively you’re making a statement involving a variable (even though you don’t give that variable a label), and the statement doesn’t make sense until you either specify a value for that variable (“it rained on the last day”) or use “for all” (“it rained every day”) or “there exists” (“it rained one day”).

Examples of quantifier statements. Many mathematical statements involve quantifiers either directly or in disguise. For example:

(a) Every prime number greater than 2 is odd.

(b) For any integer n, there is an integer m which is larger than n.

Example (a) is a “for all” statement written without a dummy variable. Example (b) uses two dummy variables, one with “for all” and one with “there exists” (although alternative words are used). In examples like this the order is very important. This statement says that for any choice of n, you can find an integer m which is bigger than n. This statement is true: given n, we could let m = n + 1. But now consider the following statement:

there is an integer m which is larger than any integer n.

This statement still has the same three ingredients: “for all n”, “there exists m”, and “m > n”, but now it’s obviously false: it says that there’s an integer m which works for any choice of n.

Notation. You may see the symbols ∀ to mean “for all” and ∃ to mean “there exists”. These are usually only used in symbolic formulæ; when we write a theorem as an English sentence we tend to write “for all” and “there exists” in words.

Quantifiers and negation

An important thing to be aware of is how to form the negation of quantifier statements: “for all” and “there exists” get swapped. If we take example (a) above:

every prime number greater than 2 is odd

its negation is

there is a prime number greater than 2 which is even.

Also for example (b): the negation of

for any integer n, there is an integer m which is larger than n

is

there is an integer n such that every integer m is less than or equal to n.

Notice that the “for all n” has been replaced with “there exists n”, the “there exists m” has been replaced with “for all m”, and “m > n” has been replaced with its negation “m ≤ n”.
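Python’s built-in `all` and `any` behave like “for all” and “there exists” over a finite collection, so they give a way to experiment with quantifiers and their negations. The sketch below is only an illustration under a strong assumption: it looks at the integers from −50 to 50 rather than all integers, so it can never prove a “for all” statement about every integer. The helper `is_prime` and the other names are made up for the example.

```python
def is_prime(n):
    """Trial division; fine for the small range used here."""
    return n >= 2 and all(n % d != 0 for d in range(2, int(n**0.5) + 1))

numbers = range(-50, 51)  # a finite sample, chosen only for illustration

all_even = all(n % 2 == 0 for n in numbers)   # "n is even for all n"
some_even = any(n % 2 == 0 for n in numbers)  # "n is even for some n"

# Example (a), restricted to the sample: every prime greater than 2 is odd.
primes_odd = all(n % 2 == 1 for n in numbers if is_prime(n) and n > 2)
# Its negation: there is a prime greater than 2 which is even.
even_prime = any(n % 2 == 0 for n in numbers if is_prime(n) and n > 2)

print(all_even, some_even, primes_odd, even_prime)  # False True True False
```

Notice that `primes_odd` and `even_prime` always have opposite truth values, which is what it means for one statement to be the negation of the other.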

2.4 Implications

Definition. Suppose that P and Q are statements. The statement “If P then Q” is called an implication. It is false if P is true and Q is false. Otherwise it is true. We write P ⇒ Q to mean “If P then Q”.

As with quantifier expressions, I tend to prefer words to symbols and I recommend using the ⇒ symbol rather sparingly. In particular, you should rarely use it in a piece of continuous prose.

In everyday speech we use “If . . . then . . . ” in essentially the same way as in mathematics. For example:

if we go to McDonalds, I will have a cheeseburger.

If you go to McDonalds and have a cheeseburger, then this statement is true. The statement is also true if you go to Burger King and have a cheeseburger, or if you go to Burger King and have chicken nuggets. The only way this statement can be false is if you go to McDonalds and don’t have a cheeseburger.

As with quantifier statements, there are several different ways of wording implications. The following are all ways of saying P ⇒ Q:

• if P then Q

• Q if P

• P implies Q

• Q is implied by P

• P only if Q

• P is a sufficient condition for Q

• Q is a necessary condition for P.

There is another word we can use in the case where Q involves the word “not”: we can replace “if . . . not” with “unless”: for example “n² is not even if n is not even” can be written as “n² is not even unless n is even”.

A difference between everyday use and mathematical logic is that when we say P implies Q there is no suggestion that P causes Q. We’re simply saying that if P is true then so is Q, whether they’re related or not. Consider the following statement.

(1 + 1 = 2) ⇒ (7 is prime)

This is a true implication, even though the fact that 7 is prime has nothing to do with the fact that 1 + 1 = 2. In mathematics, saying P ⇒ Q simply means that either Q is true or P is false, or both. Here’s a truth table for “P implies Q”:

P      Q      P ⇒ Q
true   true   true
true   false  false
false  true   true
false  false  true

Both of the following are true implications:

(1 + 1 = 3) ⇒ (7 is prime)

(1 + 1 = 3) ⇒ (9 is prime).
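Python has no built-in “implies”, but the description “either Q is true or P is false, or both” translates directly. The following sketch defines a helper called `implies` (a name chosen for this illustration) and evaluates the examples above; the two variables recording whether 7 and 9 are prime are simply typed in by hand.

```python
def implies(p, q):
    """Truth value of "if p then q": false only when p is true and q is false."""
    return (not p) or q

seven_is_prime = True
nine_is_prime = False

print(implies(1 + 1 == 2, seven_is_prime))  # True
print(implies(1 + 1 == 3, seven_is_prime))  # True: the hypothesis is false
print(implies(1 + 1 == 3, nine_is_prime))   # True: the hypothesis is false
print(implies(1 + 1 == 2, nine_is_prime))   # False
```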

These examples are slightly unusual, in that P and Q are self-contained statements that don’t involve variables. More often you’ll see them with variables; an implication is regarded as being true if it’s true for all values of the variables. (This ties in with what we said earlier about re-wording a “for all” statement using “if . . . then”.)

Here are some more examples of implications with variables.

Examples of Implications.

• (x = 2) ⇒ (x² = 4) is a true implication.

• (x² = 4) ⇒ (x = 2) is a false implication. Although it’s true for most values of x, it’s not always true: if we set x = −2 then x² = 4 is true but x = 2 is false.

• (n > 4) ⇒ (n > 3) is a true implication.

• (n + 2) ⇒ 3 is meaningless. It is not an implication because n + 2 and 3 are not statements.
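An implication with a variable is false as soon as one value of the variable makes the hypothesis true and the conclusion false. The Python sketch below searches a small range of integers for such values. The names `implies`, `counterexamples` and `sample` are made up for the illustration, and the range −10 to 10 is an arbitrary choice: finding a counterexample settles that an implication is false, but finding none in a sample proves nothing.

```python
def implies(p, q):
    """Truth value of "if p then q"."""
    return (not p) or q

def counterexamples(hypothesis, conclusion, values):
    """Values for which the hypothesis holds but the conclusion fails."""
    return [x for x in values if not implies(hypothesis(x), conclusion(x))]

sample = range(-10, 11)  # finite sample of integers, for illustration only

print(counterexamples(lambda x: x == 2, lambda x: x**2 == 4, sample))  # []
print(counterexamples(lambda x: x**2 == 4, lambda x: x == 2, sample))  # [-2]
print(counterexamples(lambda n: n > 4, lambda n: n > 3, sample))       # []
```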

2.5 Converse and contrapositive

Related to an implication P ⇒ Q are two other implications involving the statements P and Q.

• The converse of P ⇒ Q is the implication Q ⇒ P.

• The contrapositive of P ⇒ Q is the implication (not Q) ⇒ (not P).

The converse of P ⇒ Q is a completely different and logically independent statement from P ⇒ Q. We saw an example above where P ⇒ Q was true but Q ⇒ P was false:

(x = 2) ⇒ (x² = 4) is a true implication

but

(x² = 4) ⇒ (x = 2) is a false implication.

It’s important to make sure your implications go in the right direction, and to use the symbol ⇒ accurately.

Quite often a theorem will say that P ⇒ Q and Q ⇒ P. This is often expressed as “P ⇔ Q” or “P is equivalent to Q” or “P if and only if Q”. You will often see “if and only if” abbreviated as “iff” (this is a standard abbreviation, not just bad spelling). When you have a theorem like this, there are two things to prove. If it’s a very simple theorem, then it may be possible to prove them at the same time by giving a chain of statements going from P to Q, with ⇔ between consecutive statements. But more often, a separate argument is needed for each of the two statements. Often the proof will have two paragraphs: the first will begin with something like “First suppose P holds . . . ”, and will go on to deduce that Q is true; and the second will begin with “Conversely, suppose Q holds . . . ” and will go on to deduce P.

Finding the converse. There is a slight subtlety here arising from the fact that there are often several equivalent ways to write an implication. As we said earlier, most interesting implications have the feature that they apply to specified objects (such as all integers). For instance, earlier we saw the following example of a statement about an arbitrary integer n:

(n is even) ⇒ (n² is even).

The converse of this is the following statement about an arbitrary integer n:

(n² is even) ⇒ (n is even).

Both of these statements are true (and we’ll talk about how to prove them later).

However, we could have moved the assumption that n is an integer into the implication to get the following statement which is equivalent to the first statement above. For all real numbers n:

(n is an even integer) ⇒ (n² is an even integer).

The converse is the following statement about an arbitrary real number n:

(n² is an even integer) ⇒ (n is an even integer)

which is false (why?).

So when we’re finding the converse of an implication P ⇒ Q, we need to be clear about which of the assumptions are part of P, and which assumptions hold generally.

Now let’s look at the contrapositive. This is important because it’s logically equivalent to the original implication: P ⇒ Q and (not Q) ⇒ (not P) are either both true or both false. This takes some thinking about; let’s use a non-mathematical example to make it more convincing. Here’s a traditional belief:

if it’s about to rain, then cows lie down.

If you believe this, then you should also believe the following statement:

if cows aren’t lying down, then it’s not about to rain.

Another way to check that the contrapositive is logically the same as the original statement is to use a truth table.

P      Q      P ⇒ Q  not Q  not P  (not Q) ⇒ (not P)
true   true   true   false  false  true
true   false  false  true   false  false
false  true   true   false  true   true
false  false  true   true   true   true

The third column is the same as the sixth column, so P ⇒ Q and (not Q) ⇒ (not P) are logically equivalent.
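The same comparison of columns can be done in a few lines of Python. This sketch reuses the idea of an `implies` helper (a made-up name, defined here so the example is self-contained) and checks, over all four rows, that the contrapositive always agrees with the original implication while the converse does not.

```python
from itertools import product

def implies(p, q):
    """Truth value of "if p then q"."""
    return (not p) or q

rows = list(product([True, False], repeat=2))

contrapositive_same = all(implies(p, q) == implies(not q, not p) for p, q in rows)
converse_same = all(implies(p, q) == implies(q, p) for p, q in rows)

print(contrapositive_same, converse_same)  # True False
```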

3 Proofs

3.1 What is a proof?

One of the things that distinguishes mathematics from other fields is the concept of mathematical proof. A proof is a logical argument which establishes some result with complete certainty. This is the standard we require for a result to be accepted as a mathematical theorem. Proofs can be long or short, routine or ingenious, self-contained or relying on many other results.

In this chapter we will look at what a proof is and the process of writing a proof in more detail. Becoming comfortable with these ideas will help you in all the modules you study.

You can think of the process of writing a proof as convincing a critical friend that the result is true. The person we are trying to convince will point out the possible holes in our argument and press us to say why something is true if we are vague. Although some particular proofs are long and use ingenious ideas, there are a lot of standard proof techniques that you can use, and you will get proficient at proving simple results for yourself.

Here is an example showing the structure of a simple mathematical result and its proof.

Theorem 3.1. Let n be an integer. If n is even then n² is even.

Before we prove this, let’s look at the statement of the theorem. The first sentence sets the context (n is an integer) to which the second sentence applies. The second sentence contains two statements (“n is even” and “n² is even”) which are linked together with “if . . . then” to form an implication (“If n is even then n² is even”). The things that we are given (“n is an integer” and “n is even”) are called the hypotheses or assumptions, and the thing that results from them is the conclusion.

Before trying to prove this, let’s think what a proof needs to achieve. This will suggest the structure the proof must have. We will call this structure a “roadmap” for the proof.

Roadmap for the proof of Theorem 3.1.

Let n be an integer. Suppose that n is even.
⋮
(some argument)
⋮
So n² is even.

This may look very simple, but it’s really important that the proof goes in the right direction. Often students go wrong by writing the thing they’re supposed to prove as the first line of the proof. You should start with what you know (meaning the hypotheses of the theorem, or other results that you know to be true) and end with what you’re supposed to prove.

With this structure or roadmap of the proof in mind, it is not too difficult to fill in the argument. For this, we need a precise definition of what an even number is: an integer n is even if it can be written as 2k for some integer k.

Proof of Theorem 3.1. Let n be an integer. Suppose that n is even. Then n = 2k for some integer k. So

\[ n^2 = (2k)^2 = 4k^2 = 2(2k^2). \]

Since $2k^2$ is an integer, $n^2$ is even.

In this chapter we will look at proof techniques. The main thing I want to emphasise is that you need to be clear about what a proof of a particular statement needs to do. With some easy proofs, once you have the structure, the details (the “some argument” bit above) are much easier. Of course, there are also proofs where these details require a lot of ingenuity and imagination to come up with.

3.2 Proving implications

Many mathematical theorems state an implication that involves a variable; so they have the following form.

Let x be an integer. If P(x) holds then Q(x) holds. (Imp1)

(Of course, this structure may involve objects other than integers; I’ve just used integers as an example.) I’ve labelled this statement (Imp1) for easy reference. To prove that (Imp1) is true, we need to check that whenever P(x) holds, Q(x) holds as well. Another way of thinking of this is that to prove that the implication P(x) ⇒ Q(x) is true we need to rule out the possibility that P(x) is true and Q(x) is false. Here are three ways of doing this.

Roadmap for proving directly that (Imp1) is true.
Suppose that x is an integer and P(x) holds.
...
(some argument)
...
So Q(x) holds.

Another way of proving (Imp1) is to use the contrapositive. Remember that the contrapositive of P(x) ⇒ Q(x) is (not Q(x)) ⇒ (not P(x)). So we can show that whenever Q(x) is false, P(x) is also false. This leads to a second roadmap for a proof of (Imp1).

Roadmap for proving (Imp1) via the contrapositive.
We prove the contrapositive. Suppose x is an integer and that Q(x) is false.
...
(some argument)
...
So P(x) is false.

Finally, suppose that we were able to establish that having P(x) true and Q(x) false would lead to some false statement. We would conclude that this could not happen and so the statement (Imp1) is true. This is a proof technique called proof by contradiction which we will look at more thoroughly later.

Roadmap for proving (Imp1) by contradiction.
Suppose (for a contradiction) that there is an integer x for which P(x) is true and Q(x) is false.
...
(some argument)
...
(A contradiction)
But this is impossible. So (Imp1) is true.

Our proof of Theorem 3.1 above was an example of a direct proof of an implication. Here’s an example of how we can prove an implication by proving the contrapositive.

Theorem 3.2. Let n be an integer. If $n^2$ is even then n is even.

Proof. We use the contrapositive. Let n be an integer, and suppose that n is odd. Then n = 2k + 1 for some k ∈ Z. So

\[ n^2 = (2k + 1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1, \]

which is odd.

This theorem would be hard to prove directly: we could say “suppose $n^2$ is even; then $n^2 = 2k$ for some integer k”. But then what?

3.3 Disproving implications

Now suppose we want to disprove (Imp1), i.e. prove that (Imp1) is false. Disproving an implication is generally much easier than proving it, because an implication is supposed to be true for all x, so to disprove it you just have to provide a single x for which the implication doesn’t hold. This x is called a counterexample to the implication. For instance, the implication “If p is prime then p is odd” is false because p = 2 is a counterexample.

Roadmap for disproving (Imp1).
Let x = . . . (a particular integer).
...
(some argument)
...
So P(x) holds but Q(x) does not hold. So (Imp1) is false.
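
The search for a counterexample can be mimicked by a brute-force loop. The following Python sketch is only an illustration: the helper names `is_prime` and `find_counterexample` are made up for this example and are not part of these notes. It looks through a finite list of candidates for an x with P(x) true and Q(x) false.

```python
from math import isqrt


def is_prime(n):
    """Return True if n > 1 and n has no positive factors except 1 and n."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))


def find_counterexample(P, Q, candidates):
    """Return the first x in candidates with P(x) true and Q(x) false, or None."""
    for x in candidates:
        if P(x) and not Q(x):
            return x
    return None


# The implication "if p is prime then p is odd", tested on 1, 2, ..., 99.
counterexample = find_counterexample(is_prime, lambda p: p % 2 == 1, range(1, 100))
print(counterexample)
```

A single counterexample found this way really does disprove the implication. The reverse is not true: if the search returns `None`, that only says there is no counterexample among the candidates tried, which is not a proof.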

3.4 Proof by contradiction

In mathematics, statements are either true or false. This means that if we can prove that a statement P is not false then we can conclude that P is true. One way of doing this is to show that P being false would lead to some impossibility – a false statement (like 1 + 1 = 3) or contradiction (like n > 0 and n < 0).

This idea is called proof by contradiction.

Roadmap for a proof of a statement P by contradiction.
Suppose (for a contradiction) that P is false.
...
(some argument)
...
Therefore (some statement which isn’t true or doesn’t make sense).
But this is a contradiction, so P is true.

Here’s an example.

Theorem 3.3. There is no smallest positive real number.

Proof. Suppose (for a contradiction) that there is a smallest positive real number. Call this number r. Then 0 < r and so 0 < r/2 < r.

It follows that r/2 is a positive real number which is smaller than r. This contradicts the assumption that r was the smallest such.

We conclude that there is no smallest positive real number.

In particular, statements of the form “there is no . . . ” are very good candidates for proof by contradiction.

3.5 Proof by induction

Another common proof technique is proof by induction. This technique is useful when the statement we are proving depends on a positive integer n.

Suppose we want to prove the statement P(n) for every positive integer n. To prove it by induction, we prove two things:

Base case: P(1) is true.

Inductive step: If n ≥ 2 and P(n − 1) is true then P(n) is true.

Then we can conclude that P(n) is true for all n.

Roadmap for proof of this statement by induction.
Base Case: We check that P(1) is true.

Inductive Step: Suppose n ≥ 2 and that P(n − 1) is true.
...
(some argument)
...
So P(n) is true.
We conclude that P(n) is true for all n by induction.

The assumption that P(n − 1) is true is called the inductive hypothesis.

Here’s an example you may have seen before.

Theorem 3.4. Suppose n is a positive integer. Then

\[ \sum_{k=1}^{n} k = \frac{n(n+1)}{2}. \]

Proof. We use proof by induction. Let P(n) denote the equation $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$.

Base case: P(1) says that

\[ \sum_{k=1}^{1} k = \frac{1 \times (1+1)}{2}. \]

This is true because both sides equal 1.

Inductive step: Suppose n ≥ 2 and that P(n − 1) is true. Then

\[ \sum_{k=1}^{n-1} k = \frac{(n-1)n}{2}. \]

So

\[
\begin{aligned}
\sum_{k=1}^{n} k &= \sum_{k=1}^{n-1} k + n && \text{splitting the last term off from the sum} \\
&= \frac{(n-1)n}{2} + n && \text{using } P(n-1) \\
&= \frac{n(n+1)}{2},
\end{aligned}
\]

so P(n) is true.

So by induction P(n) is true for every n.
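
A numerical check is no substitute for the proof, but it is a quick way to catch a wrongly remembered formula. The following Python sketch (the function names `sum_first` and `formula` are chosen just for this illustration) compares both sides of the equation in Theorem 3.4 for n = 1, . . . , 200.

```python
def sum_first(n):
    """Left-hand side: add up 1 + 2 + ... + n term by term."""
    return sum(range(1, n + 1))


def formula(n):
    """Right-hand side: n(n + 1)/2, which is always an integer."""
    return n * (n + 1) // 2


agree = all(sum_first(n) == formula(n) for n in range(1, 201))
print(agree)
```

This only confirms finitely many cases; the induction argument is what covers every positive integer n.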

Why does proof by induction work? Think how the chain of implications will eventually reach any natural number:

• P(1) is true (by the base case),

• P(2) is true (because P(1) is true and P(1) ⇒ P(2)),

• P(3) is true (because P(2) is true and P(2) ⇒ P(3)),

and so on.

Given any k we can imagine continuing this list for k steps after which we reach the statement P(k). It follows that P(k) holds for all k.

You may see slight variations in the use of variables with induction proofs. Sometimes n is a non-negative integer rather than a positive integer, in which case the base case is n = 0 rather than n = 1. Sometimes the inductive step is written as P(n) ⇒ P(n + 1) rather than P(n − 1) ⇒ P(n).

Here’s another example of proof by induction.

Theorem 3.5. If a and r are real numbers with r ≠ 1 and n is a positive integer, then

\[ \sum_{k=1}^{n} a r^{k-1} = \frac{a(r^n - 1)}{r - 1}. \]

In this theorem there are three variables a, r, n. We think of a and r as being fixed, and use induction on n.

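Proof. Let A(n) be the equation

\[ \sum_{k=1}^{n} a r^{k-1} = \frac{a(r^n - 1)}{r - 1}. \]

Base case: A(1) says that a = a, which is true.

Inductive step: Suppose n ≥ 2 and that A(n − 1) is true. This means

\[ \sum_{k=1}^{n-1} a r^{k-1} = \frac{a(r^{n-1} - 1)}{r - 1}. \]

So

\[
\begin{aligned}
\sum_{k=1}^{n} a r^{k-1} &= \sum_{k=1}^{n-1} a r^{k-1} + a r^{n-1} && \text{splitting off the last term from the sum} \\
&= \frac{a(r^{n-1} - 1)}{r - 1} + a r^{n-1} && \text{because } A(n-1) \text{ is true} \\
&= \frac{a(r^{n-1} - 1 + (r - 1) r^{n-1})}{r - 1} \\
&= \frac{a(r^{n-1} - 1 + r^n - r^{n-1})}{r - 1} \\
&= \frac{a(r^n - 1)}{r - 1},
\end{aligned}
\]

so A(n) is true.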

We conclude that A(n) holds for all n by induction.
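
As a sanity check on Theorem 3.5, the two sides can be compared with exact rational arithmetic. This is a minimal Python sketch, with a and r restricted to rational numbers so that no rounding occurs; the names `geometric_sum` and `closed_form` and the sample values of a and r are chosen only for this illustration.

```python
from fractions import Fraction


def geometric_sum(a, r, n):
    """Left-hand side: a + a*r + ... + a*r^(n-1), added term by term."""
    return sum(a * r ** (k - 1) for k in range(1, n + 1))


def closed_form(a, r, n):
    """Right-hand side: a(r^n - 1)/(r - 1); only valid when r != 1."""
    return a * (r ** n - 1) / (r - 1)


a, r = Fraction(3), Fraction(2, 5)
checks = [geometric_sum(a, r, n) == closed_form(a, r, n) for n in range(1, 21)]
print(all(checks))
```

Using `Fraction` rather than floating-point numbers means the equality test is exact. Trying r = 1 in `closed_form` raises a division-by-zero error, which matches the hypothesis r ≠ 1 in the theorem.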

A lot of students write induction proofs badly even when they understand what they’re doing. Here are some tips.

• Give a name to the statement you are trying to prove: “Let P(n) be the statement . . . ”. This will clarify in your mind what you need to prove and provide a helpful label, rather than saying things like “now suppose the theorem is true with n replaced by n − 1”.

• Don’t confuse a statement with a quantity. In the proof of Theorem 3.5 above, A(n) is not the number $\sum_{k=1}^{n} a r^{k-1}$; it is the statement that two quantities are equal. So don’t write $A(n) = \sum_{k=1}^{n} a r^{k-1}$.

• Don’t write nonsense like “let n = n + 1”.

Look back at our explanation of why induction works. You will notice that when we came to consider the statement P(k) we had already established that all of P(1), P(2), . . . , P(k − 1) were true. This leads to the following observation and an approach sometimes called strong induction.

Base case: P(1) is true.

Inductive step: If n ≥ 2 and P(1), P(2), . . . , P(n − 1) are all true, then P(n) is true.

If we can do these two steps, then again we can deduce that P(n) is true for all n. This makes proof by induction more powerful. You can think of the induction step as being to prove P(n), allowing ourselves to use P(1), P(2), . . . , P(n − 1) in the proof.

Here’s an example of strong induction: let’s prove a statement from the week 2 tutorial sheet.

Theorem 3.6. Define the sequence $F_1, F_2, F_3, \dots$ by

\[ F_1 = F_2 = 1, \qquad F_n = F_{n-1} + F_{n-2} \text{ for } n \geq 3. \]

Then $F_n < 2^n$ for every n.

Let P(n) be the statement “$F_n < 2^n$”. We will prove this by induction. To prove P(n) we need to consider $F_n$, and since this depends on $F_{n-1}$ and $F_{n-2}$ it is natural to use strong induction. More precisely, our induction step will take n ≥ 3 and establish P(n) under the assumption that both P(n − 2) and P(n − 1) are true. The base case will be that P(1) and P(2) both hold. Here are the details.

Proof of Theorem 3.6. Let P(n) denote the inequality $F_n < 2^n$.

Base case: $F_1 = 1 < 2^1$ so P(1) holds, and $F_2 = 1 < 2^2$, so P(2) holds.

Inductive step: Suppose that n ≥ 3 and that P(n − 2) and P(n − 1) both hold. Then

\[ F_{n-2} < 2^{n-2}, \qquad F_{n-1} < 2^{n-1}. \]

So

\[
\begin{aligned}
F_n &= F_{n-1} + F_{n-2} \\
&< 2^{n-1} + 2^{n-2} && \text{using } P(n-1) \text{ and } P(n-2) \\
&< 2^{n-1} + 2^{n-1} \\
&= 2^n,
\end{aligned}
\]

so P(n) holds.

It follows that P(n) is true for all n by (strong) induction.
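
The recursive definition in Theorem 3.6 is easy to run on a computer, which gives a quick check of the inequality for small n. The following Python sketch is illustrative only; the function name `fibonacci` is an assumption of this example.

```python
def fibonacci(count):
    """Return the list [F_1, F_2, ..., F_count], where F_1 = F_2 = 1."""
    values = [1, 1]
    while len(values) < count:
        # F_n = F_(n-1) + F_(n-2): each new term uses the previous two
        values.append(values[-1] + values[-2])
    return values[:count]


fib = fibonacci(30)
# enumerate(..., start=1) pairs each F_n with its index n
bound_holds = all(f < 2 ** n for n, f in enumerate(fib, start=1))
print(fib[:8], bound_holds)
```

Notice that computing each new term needs the two previous terms, which is the same reason the proof needs both P(n − 1) and P(n − 2).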

3.6 Finding mistakes in proofs

It’s useful to be able to identify when a proof has an error. When you’re writing your own proofs, you should check them at the end to see whether they make sense and prove the thing they set out to prove. Finding mistakes in proofs is a skill that you gain with experience, but here are some tips.

• Make sure that the “mathematical grammar” of the proof makes sense: this means that symbols should only be used where it makes sense. For example, a proof (or any mathematical writing) should never say that a number is equal to a set. If A and B are sets, then it shouldn’t say A < B. If a function f is used and f is supposed to be a function defined on the positive integers, then the proof shouldn’t use f(−1).

• Make sure the proof goes in the right direction. It should start from the things you know (i.e. the hypotheses of the theorem, together with other results that you know are true), and end with the conclusion (i.e. the thing you’re supposed to prove).

If the proof is by contradiction or by using the contrapositive, then the direction of the proof is a bit different. For a proof by contradiction, you should assume that the thing you are trying to prove is false. In other words, you assume the negation of the thing you’re trying to prove. You should go from here (together with the hypotheses and other things you know to be true) and end up with a contradiction. For a proof by contrapositive, you’re trying to prove that P ⇒ Q for some statements P and Q. Your proof should start from the negation of Q, and end up with the negation of P.

• Make sure you don’t use any invalid assumptions. This is why it’s a good idea when you’re working through the calculations in a proof to say where each line comes from. If it’s just a simple algebraic manipulation from the previous line, then you don’t need to justify it, but if at some point you’re bringing in one of the hypotheses or a different theorem, you should say so. Correspondingly, when you’re reading a proof, if there’s a line with no explanation, make sure you can see where that line has come from. Does it use an assumption without saying so? If so, is that assumption valid?

• Check whether the proof uses all the hypotheses. For example, if the theorem includes the hypothesis that n is an even integer (together with some other hypotheses), where does the proof actually use the assumption that n is even? If it doesn’t use this assumption, then either the proof is wrong or this hypothesis is not necessary. Check and see whether the theorem is plausible if n is an odd integer instead. If it’s not (e.g. it clearly fails for n = 5) then there’s something wrong with the proof.

• Work through the proof for some special values of the variables. If the theorem says that something is true for all m from 1 to n, then check the cases m = 1 and m = n; the edge cases are where mistakes can creep in. Similarly, for a proof by induction, check the inductive step carefully in the case n = 2.

In this chapter we’ve seen some ideas about how to prove things, but these won’t always be enough; sometimes proving a theorem requires an ingenious idea that isn’t part of any general proof strategy. But as you gain experience, you’ll get more of a feeling for how to come up with proofs, and what at first seemed like a clever idea will become a standard trick, and what seems like a standard trick will become a basic observation. The important thing is not to be afraid of proofs. When you encounter a proof in your lectures, make sure you can understand the logic (so that you really believe the theorem has been proved), but also try to isolate the key ideas that make the proof work. Often what looks like quite a long proof just rests on one key idea, and the rest of the proof is routine manipulation. Please ask lecturers and tutors for help if you’re struggling to get to grips with a proof.

4 Integers

Now we start looking at some mathematical objects, starting with numbers. In this section we will look at natural numbers and integers; later we will explore some more number systems (rational numbers, real numbers and complex numbers).

4.1 Natural numbers and integers

We start with the positive whole numbers, also called natural numbers. We use the symbol N to mean the set of all natural numbers.

Blackboard bold. The symbol N (typeset as ℕ) is the letter N written in a special font called blackboard bold. Mathematicians use blackboard bold letters for very specific things; in this module, we’ll see them used for various sets of numbers.

You will have a good intuition for the natural numbers and the idea of visualising them as a “number line”.

1 2 3 4 5

The set N has several useful features:

• The arithmetic operations of addition and multiplication are defined in N and obey the familiar properties. In other words if n, m ∈ N then n + m ∈ N and nm ∈ N. These operations obey familiar laws like n + m = m + n and n(l + m) = nl + nm.

• The relation < gives an ordering of N.

• Every non-empty subset of N has a minimal element. (This sounds like a strange statement, but it’s actually what makes proof by induction possible.)

Number systems bigger than N are usually introduced to allow us to solve an equation we can’t solve in a smaller number system. To begin with, the operation of subtraction may or may not work within N. For example, 1, 3 ∈ N and 3 − 1 ∈ N but 1 − 3 ∉ N. Another way of saying this is that the equation x + 1 = 3 does have a solution with x ∈ N but the equation x + 3 = 1 does not. We extend N to Z (the set of all integers) to get round this.

(We use blackboard bold Z for the integers: the Z stands for “Zahlen”, which is German for “numbers”.) Again, Z can be visualised as a number line.

−3 −2 −1 0 1 2 3

In Z, the operations of addition and multiplication still work and still obey the familiar rules from N, and now we have the operation of subtraction too. Later in the module we’ll extend further by introducing the rational numbers, but for now we’ll explore the integers.

4.2 Divisibility and primes

We haven’t mentioned division because this may take us outside Z. For instance 3, 4 ∈ Z but 4/3 ∉ Z. Later we will use this to define the rational numbers, but here we’ll use it to define an important relation on Z: divisibility.

Definition. Let d, n ∈ Z. We say that d divides n if there exists some k ∈ Z with n = dk. We write d | n to mean that d divides n. We write d ∤ n to mean that d does not divide n.

Examples.

• 2 | 12 (since 12 = 6 × 2).

• 2 | −12 (since −12 = (−6) × 2).

• d | 0 for all d ∈ Z (since 0 = 0 × d).

• 0 | n if and only if n = 0 (since n = k × 0 if and only if n = 0).

• n | n for all n ∈ Z (since n = 1 × n).

• 1 | n for all n ∈ Z (since n = n × 1).
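
The definition of divisibility can be turned into a small test. This Python sketch is one possible way to do it, using the remainder operator `%`; the function name `divides` is chosen for this illustration. The case d = 0 has to be treated separately, because taking a remainder on division by 0 is an error in Python even though the statement 0 | n makes sense.

```python
def divides(d, n):
    """Return True if d | n, i.e. if n = d*k for some integer k."""
    if d == 0:
        # n = k * 0 forces n = 0, so 0 | n exactly when n = 0
        return n == 0
    # for d != 0, d | n exactly when the remainder of n on division by d is 0
    return n % d == 0


print(divides(2, 12), divides(2, -12), divides(3, 7))
```

Like the expression d | n itself, the function returns a truth value (`True` or `False`), not a number.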

Warning. Don’t confuse d | n with d/n. The expression d | n (pronounced “d divides n”) is a statement which is either true or false; for example, it is true that 3 | 6 but it is not true that 3 | 7. The expression d/n (pronounced “d divided by n”) is a number; for example, 3/6 equals 1/2.

There are other ways of saying that d divides n: we may say that

• d is a factor of n,

• d is a divisor of n,

• n is divisible by d,

• n is a multiple of d.

For example, the factors of 6 are −6,−3,−2,−1, 1, 2, 3, 6. The multiples of 6 are

. . . ,−12,−6, 0, 6, 12, 18, . . . .

Lemma 4.1. Suppose a, b, c ∈ Z. If a | b and b | c, then a | c.

Proof. Since a | b and b | c, there are integers k, l such that b = ak and c = lb. Then c = (kl)a, and kl ∈ Z, so a | c.

Lemma 4.1 says that | is a transitive relation. We’ll see more on relations later in the module.

Now we come to prime numbers. As we saw in the examples above, any number n has 1 and n as factors. Natural numbers which have no factors in N other than these are said to be prime.

Definition. Suppose n ∈ N.
n is prime if n > 1 and n has no positive factors except 1 and n.
n is composite if n > 1 and n is not prime.

In particular, the number 1 is neither prime nor composite; every integer greater than 1 is either prime or composite but not both.

We often say “a prime” to mean “a prime number”. It may seem surprising that we don’t include 1 as a prime number, but this is a universal mathematical convention which turns out to be more convenient in some situations.

Let’s see some basic facts about primes.

Lemma 4.2. Every natural number n with n > 1 has at least one prime factor.

Proof. We will use proof by strong induction. Let P(n) be the statement “n has a prime factor”.

Base case: The base case is P(2). Since 2 | 2 and 2 is prime, P(2) is true.

Inductive step: For the inductive step, take n ≥ 3, and assume that P(2), P(3), . . . , P(n − 1) are all true. Now we consider two cases: n is either prime or not prime.

• If n is prime then n is a prime factor of n, so P(n) is true.

• If n is not prime then it has a factor a, where 1 < a < n. Because we are assuming P(a) is true, a has a prime factor p. Now p | a and a | n, so p | n by Lemma 4.1. So n has a prime factor p, so P(n) is true.

The result follows by induction.
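
The case analysis in the proof of Lemma 4.2 translates directly into a recursive procedure: either n has no factor strictly between 1 and n (so n is prime), or we find such a factor a and look for a prime factor of a instead. The following Python sketch follows that structure; the function name `prime_factor` is an assumption of this example, and the search by trial division is just the simplest choice, not an efficient one.

```python
def prime_factor(n):
    """Return a prime factor of n, for an integer n > 1."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    for a in range(2, n):
        if n % a == 0:
            # n is not prime: a is a factor with 1 < a < n.
            # A prime factor of a is also a prime factor of n (Lemma 4.1).
            return prime_factor(a)
    # no factor strictly between 1 and n, so n itself is prime
    return n


print(prime_factor(60), prime_factor(91), prime_factor(97))
```

The recursive call is always on a strictly smaller number, which is why the procedure stops; this mirrors the way strong induction lets the proof use P(a) for any a < n.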

In fact, more is true: every natural number n can be written as a product of primes, and this product, called the prime factorisation of n, is unique (up to re-ordering).

For example, 60 = 2 × 2 × 3 × 5, and any other expression for 60 as the product of prime numbers will have the prime factors 2, 2, 3, 5 in some order.

The fact that prime factorisation is unique up to re-ordering is called the Fundamental Theorem of Arithmetic.

The previous lemma is an important ingredient in the following famous result. The proof of this goes back to Euclid (around 300 BC), and is one of the most famous proofs in maths.

Theorem 4.3. There are infinitely many primes.

Proof. We use proof by contradiction. Suppose there are only finitely many primes. This means that we can write all the prime numbers in a finite list

\[ p_1, p_2, p_3, \dots, p_m. \]

Now let $n = p_1 p_2 \cdots p_m + 1$. By Lemma 4.2, n has a prime factor p. We are assuming $p_1, p_2, \dots, p_m$ are the only prime numbers, so $p = p_i$ for some i. This means that p is a factor of n but is also a factor of n − 1. So n = pk and n − 1 = pl for some k, l ∈ Z. But then

p(k − l) = pk − pl = n − (n − 1) = 1.

But this is impossible since k − l ∈ Z and p > 1.

We conclude that our original assumption was false and so there are infinitely many primes.
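
The construction in this proof can be tried out on any finite list of primes: multiply them together, add 1, and take a prime factor of the result. The following Python sketch does this for the first six primes; the helper name `smallest_prime_factor` and the particular list are choices made for this illustration.

```python
from math import prod


def smallest_prime_factor(n):
    """Return the smallest prime factor of an integer n > 1 by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n is prime


primes = [2, 3, 5, 7, 11, 13]
n = prod(primes) + 1           # the number p_1 p_2 ... p_m + 1 from the proof
p = smallest_prime_factor(n)   # a prime factor of n, as in Lemma 4.2
print(n, p, p in primes)
```

Here n = 30031 = 59 × 509, so n itself is not prime. The proof does not claim that it is; it only needs a prime factor of n, and that factor (59 in this case) cannot be in the original list, because every prime in the list leaves remainder 1 when dividing n.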

4.3 Greatest common divisor

Definition. Suppose a, b ∈ Z and a, b are not both 0. The greatest common divisor of a and b is the largest natural number d such that d | a and d | b. We write gcd(a, b) for the greatest common divisor of a and b.
We say that a and b are coprime if gcd(a, b) = 1. This means they have no common factors except 1 and −1.

(When defining gcd(a, b), we have to exclude the case where a and b are both 0, because d | 0 for every d, so there is no greatest common divisor of 0 and 0.)

Examples.

• gcd(5, 7) = 1.

• gcd(90, 100) = 10.

• gcd(4, 6) = 2.

• gcd(−6, −4) = 2.

• If a and b are primes and a ≠ b, then gcd(a, b) = 1; this is because the only positive factors of a are 1 and a, while the only positive factors of b are 1 and b, so the only positive common factor is 1.

• gcd(1, b) = 1 for any b, because the only positive divisor of 1 is 1.

• If b > 0, then gcd(0, b) = b, because every positive integer divides 0, so gcd(0, b) is the greatest divisor of b.

• If b > 0, then gcd(b, b) = b.

There’s no need to stop at two numbers a and b. Given any integers $a_1, \dots, a_m$, we can define their greatest common divisor $\gcd(a_1, \dots, a_m)$, as long as $a_1, \dots, a_m$ are not all 0. We can even define the greatest common divisor of infinitely many integers.

How do we find the greatest common divisor of a and b? The most obvious method is to list the positive divisors of each one and find the largest number that appears on both lists. For example, take a = 24 and b = 42:

• the positive divisors of 24 are 1, 2, 3, 4, 6, 8, 12, 24;

• the positive divisors of 42 are 1, 2, 3, 6, 7, 14, 21, 42.

The largest number on both lists is 6, so gcd(24, 42) = 6. The problem with this method is that finding all the factors of a large number takes a lot of computation.

A slightly better way to find the greatest common divisor is to use prime factorisations. If you write down the prime factorisation of n, then each positive factor is found by taking the product of some of the primes that occur. (We have to include the empty product to get the factor 1.)

For example, the prime factorisation of 40 is 2× 2× 2× 5. So the positive factors of 40 are

2, 5, 2× 2, 2× 5, 2× 2× 2, 2× 2× 5, 2× 2× 2× 5 and 1.

So to work out gcd(a, b), we can work out the prime factorisation of each and write down the primes that occur in both factorisations. For example,

24 = 2 × 2 × 2 × 3; 42 = 2 × 3 × 7

and we can read off from this that gcd(24, 42) = 2 × 3 = 6. Again, this is conceptually simple but hard in practice since it is computationally difficult to find the prime factorisation of a large integer.
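
The first method above (listing the positive divisors of each number and taking the largest one on both lists) can be sketched in Python as follows. The function names `positive_divisors` and `gcd_by_listing` are chosen for this illustration, and the sketch assumes both inputs are non-zero so that each list of divisors is finite.

```python
def positive_divisors(n):
    """Return the sorted list of positive divisors of a non-zero integer n."""
    if n == 0:
        raise ValueError("every positive integer divides 0, so the list would be infinite")
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def gcd_by_listing(a, b):
    """Greatest common divisor of non-zero integers a and b, by comparing divisor lists."""
    common = set(positive_divisors(a)) & set(positive_divisors(b))
    return max(common)  # common always contains 1, so it is non-empty


print(positive_divisors(24))
print(positive_divisors(42))
print(gcd_by_listing(24, 42))
```

The loop in `positive_divisors` tries every number from 1 up to |n|, which illustrates why this approach becomes impractical for large inputs.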

We will give a method for calculating greatest common divisors which avoids these computational difficulties. The next two lemmas prepare the way for this method. The first lemma expresses the concept of “dividing with a remainder”.

Lemma 4.4. Suppose a and b are integers and b > 0. Then there exist integers q, r with 0 ≤ r < b such that

a = qb + r.

Proof. Let q be the largest integer such that qb ≤ a, and let r = a − qb. Then certainly r ≥ 0, because a ≥ qb. But also (q + 1)b > a (because of the way we chose q), which rearranges to say b > a − qb = r.
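
The choice of q in this proof corresponds to what Python calls floor division. The following sketch (the function name `divide_with_remainder` is an assumption of this example) computes q and r as in Lemma 4.4, and works for negative a as well.

```python
def divide_with_remainder(a, b):
    """Return (q, r) with a = q*b + r and 0 <= r < b, for integers a and b with b > 0."""
    if b <= 0:
        raise ValueError("b must be a positive integer")
    q = a // b      # floor division: the largest integer q with q*b <= a
    r = a - q * b   # the remainder, exactly as defined in the proof
    return q, r


print(divide_with_remainder(32, 12))
print(divide_with_remainder(-7, 3))
```

For a = −7 and b = 3 the largest q with 3q ≤ −7 is q = −3, giving r = −7 − (−9) = 2, so the remainder is still between 0 and b − 1.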

The next proposition relates “division with remainder” to greatest common divisors. This is the crucial result that will allow our algorithm to work.

Proposition 4.5. Suppose a, b, q, r ∈ Z with b > 0 and a = qb + r. Then gcd(a, b) = gcd(b, r).

Proof. Suppose d is a positive integer. We will show that d is a divisor of both a and b if and only if it is a divisor of both b and r.

Suppose d | a and d | b. Then there are integers k, l such that a = dk and b = dl. So r = a − qb = d(k − ql), and so d | r. So if d is a common divisor of a and b then d is a common divisor of b and r.

Conversely, suppose d | b and d | r. Then there are integers l, m such that b = dl and r = dm. Then a = qb + r = d(ql + m), so d | a. So if d is a common divisor of b and r then d is a common divisor of a and b.

So the common divisors of a and b are exactly the same as the common divisors of b and r. In particular, the greatest common divisor of a and b is the same as the greatest common divisor of b and r.

Example. Take a = 32 and b = 12. When we do division with remainder, we get 32 = 2 × 12 + 8, so let r = 8.

The positive divisors of 32 are 1, 2, 4, 8, 16, 32.
The positive divisors of 12 are 1, 2, 3, 4, 6, 12.
The positive divisors of 8 are 1, 2, 4, 8.
So the positive common divisors of 32 and 12 are 1, 2, 4. The positive common divisors of 12 and 8 are also 1, 2, 4. So gcd(32, 12) = 4 = gcd(12, 8).

Now we can give a fast algorithm for finding the greatest common divisor of two integers. This is called Euclid’s algorithm.

The idea is as follows. Suppose we want to find the greatest common divisor of a and b. We may as well assume a, b ≥ 0, and by swapping a and b if necessary we can assume a ≥ b.

Suppose b > 0. Then by Lemma 4.4 we can find q, r ∈ Z with 0 ≤ r < b such that a = qb + r. Now by Proposition 4.5 gcd(a, b) (the thing we want to find) equals gcd(b, r). So we can replace the integers a, b with the smaller integers b, r without changing the greatest common divisor.

Let’s do an example.

Example. Suppose we want to find gcd(75, 27).

• 75 = 2 × 27 + 21 and so (by Proposition 4.5) gcd(75, 27) = gcd(27, 21).

• 27 = 1 × 21 + 6 and so gcd(27, 21) = gcd(21, 6).

• 21 = 3 × 6 + 3 and so gcd(21, 6) = gcd(6, 3).

• 6 = 2 × 3 + 0 and so gcd(6, 3) = gcd(3, 0).

• gcd(3, 0) = 3.

Putting these equalities together gives gcd(75, 27) = 3.

This example contains all the ideas needed for the general algorithm. We just need to specify what we have done precisely and make sure that every case is covered and that the algorithm always gives the right answer.

Euclid’s algorithm for finding gcd(a, b)

Input: a, b ∈ Z with a ≥ b ≥ 0.

• If b = 0 then output gcd(a, b) = a and stop.

• If b > 0 then find q, r with a = bq + r, 0 ≤ r < b. Replace a, b with b, r and repeat.

Why does this work?

• Whenever we replace a, b with b, r we do not change the greatest common divisor, by Proposition 4.5.

• If b > 0, then we can always find q, r as required, by Lemma 4.4.

• When we replace a, b with b, r, the numbers get smaller. So as we run the algorithm we follow a sequence

\[ \gcd(a_1, b_1) = \gcd(a_2, b_2) = \gcd(a_3, b_3) = \gcd(a_4, b_4) = \dots \]

with $b_1 > b_2 > b_3 > \dots$. Since each $b_i$ is a non-negative integer, they cannot carry on getting smaller and smaller for ever. It follows that eventually we must hit $b_k = 0$ for some k and the algorithm ends.
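
The algorithm described above can be written as a short loop. The following Python sketch is one way to do it; the function name `euclid_gcd` is chosen for this illustration, and the first few lines (taking absolute values and swapping) are an added convenience so that the input does not have to satisfy a ≥ b ≥ 0 already.

```python
def euclid_gcd(a, b):
    """Greatest common divisor of two integers, not both zero, by Euclid's algorithm."""
    a, b = abs(a), abs(b)       # d | a exactly when d | -a, so signs do not matter
    if a < b:
        a, b = b, a             # arrange that a >= b >= 0
    if a == 0:
        raise ValueError("gcd(0, 0) is not defined")
    while b > 0:
        q, r = divmod(a, b)     # a = q*b + r with 0 <= r < b (Lemma 4.4)
        a, b = b, r             # gcd(a, b) = gcd(b, r) (Proposition 4.5)
    return a                    # now b = 0, and gcd(a, 0) = a


print(euclid_gcd(75, 27))
```

Running this on 75 and 27 passes through the same pairs as the worked example: (75, 27), (27, 21), (21, 6), (6, 3), (3, 0), and returns 3. The loop must stop because b strictly decreases and never goes below 0.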

The idea of an algorithm as a precise specification of a procedure for solving a particular instance of some problem (for example finding the greatest common divisor of a pair of integers) is very important in mathematics and theoretical computer science. Often people are interested in how fast an algorithm performs. This can be defined in a precise mathematical way (roughly speaking, we consider how the number of steps the algorithm takes depends on the size of the input). For finding the greatest common divisor, Euclid’s algorithm is much faster than the alternative approaches based on finding all factors of a and b or factorising a and b into primes.

4.4 Lowest common multiple

Definition. Suppose a, b are non-zero integers. The lowest common multiple of a and b is the smallest m ∈ N such that a | m and b | m. We write lcm(a, b) for the lowest common multiple of a and b.

Note that we have to take a, b both non-zero, because the only multiple of 0 is 0, so there is no m ∈ N for which 0 | m.

Examples.

• lcm(4, 6) = 12.

• lcm(4, −4) = 4.

• lcm(−6, 9) = 18.

• If a > 0, then lcm(a, 1) = a; this is because every natural number is a multiple of 1, so we just need the lowest positive multiple of a, which is a itself.

• If a > 0, then lcm(a, a) = a.

In fact, knowing the lowest common multiple of a and b is enough to tell us all the common multiples of a and b, as the following lemma shows.

Lemma 4.6. Suppose a and b are non-zero integers, and let m = lcm(a, b). Then the positive common multiples of a and b are m, 2m, 3m, . . . .

Proof. We have to prove two things: that the numbers m, 2m, 3m, . . . are common multiples of a and b, and that every common multiple of a and b is one of m, 2m, 3m, . . . .

Suppose q ∈ N. Then a | m and m | qm, so a | qm by Lemma 4.1. So m, 2m, 3m, . . . are all multiples of a. Similarly they are all multiples of b.

Now suppose n ∈ N is a multiple of both a and b; we have to show that n = qm for some q ∈ N. Using Lemma 4.4 we can write

n = qm + r

where 0 ≤ r < m. We know a | n and a | m, so we can write n = ak and m = al for k, l ∈ Z. So

r = n − qm = a(k − ql)

so a | r. Similarly b | r. If r > 0 then this is a contradiction, because m is the smallest natural number divisible by a and b, and r < m. So instead r = 0, and therefore n = qm.

How do we find the lowest common multiple of a and b? We can do this using the following theorem.

Theorem 4.7. Suppose a, b ∈ N. Then

\[ \operatorname{lcm}(a, b) = \frac{ab}{\gcd(a, b)}. \]

Proof (non-examinable). Let g = gcd(a, b) and m = lcm(a, b). Because g | b, the number $\frac{b}{g}$ is an integer. So $\frac{ab}{g} = a \times \frac{b}{g}$ is divisible by a. Similarly $\frac{ab}{g}$ is divisible by b, so $\frac{ab}{g}$ is a common multiple of a and b. So by Lemma 4.6, $\frac{ab}{g} = km$ for some k ∈ N. We need to show that k = 1.

Now a | m, so there is n ∈ Z such that m = an. Then $\frac{ab}{g} = kan$, which we can rearrange to get b = kgn. So kg | b. Similarly, kg | a, so kg is a common divisor of a and b. But g is the greatest common divisor of a and b, so k = 1, so $m = \frac{ab}{g}$.

This tells us how to find lcm(a, b): we just use Euclid’s algorithm to find gcd(a, b), and then apply Theorem 4.7.

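Examples.

• gcd(42, 24) = 6, so $\operatorname{lcm}(42, 24) = \frac{42 \times 24}{6} = 168$.

• gcd(90, 100) = 10, so $\operatorname{lcm}(90, 100) = \frac{90 \times 100}{10} = 900$.

• If a and b are primes and a ≠ b, then gcd(a, b) = 1, so $\operatorname{lcm}(a, b) = \frac{a \times b}{1} = ab$.

• If a ∈ N, then gcd(a, 1) = 1, so $\operatorname{lcm}(a, 1) = \frac{a \times 1}{1} = a$.

• If a ∈ N, then gcd(a, a) = a, so $\operatorname{lcm}(a, a) = \frac{a \times a}{a} = a$.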
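
Theorem 4.7 gives a direct recipe for computing lowest common multiples. The following Python sketch uses the standard library function `math.gcd` for the greatest common divisor. The function names `lcm` and `lcm_by_search` are choices made for this illustration. Theorem 4.7 is stated for a, b ∈ N; taking the absolute value of ab to handle negative inputs is an extra assumption of this sketch, checked here against a brute-force search based on the definition.

```python
from math import gcd


def lcm(a, b):
    """Lowest common multiple of non-zero integers a and b, via Theorem 4.7."""
    if a == 0 or b == 0:
        raise ValueError("lcm is only defined for non-zero integers")
    return abs(a * b) // gcd(a, b)


def lcm_by_search(a, b):
    """Lowest common multiple straight from the definition (slow, for comparison)."""
    m = 1
    while m % a != 0 or m % b != 0:
        m += 1
    return m


print(lcm(42, 24), lcm(90, 100), lcm(-6, 9))
```

The division `//` is exact here because gcd(a, b) divides ab.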

5 Sets

5.1 Definition, notation and examples

The most fundamental thing in mathematics (from which most other mathematical objects are defined) is a set. A set just means a collection of objects gathered together. If A is a set, we call the objects in A the elements of A. We write x ∈ A to mean “x is an element of A”, and x ∉ A to mean “x is not an element of A”.

The simplest way to write down a set is to list its elements. The way we do this is to put the elements between curly brackets, with commas between them. For example,

S = {red, white, blue}

means “S is the set whose elements are red, white and blue, and nothing else”. Now we can write red ∈ S, and green ∉ S.

The order of elements in a set doesn’t matter, so

{♥,♠,♣,♦} = {♠,♦,♣,♥}.

We’re also allowed to write elements more than once:

{1, 2, 4} = {1, 2, 1, 4}.

(This might seem a bit pointless, but with more complicated sets it can be convenient. In Introduction to Probability the convention is that sets are never written with elements repeated.)

An important thing about sets is that a set is defined by its elements:

If two sets have exactly the same elements, then they are the same set.
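
Python’s built-in `set` type behaves in the same way, which makes it a convenient place to experiment. In the following sketch the elements are represented by strings and integers; this representation, and the variable names, are choices made just for the illustration.

```python
S = {"red", "white", "blue"}

# membership: "red" is an element of S, "green" is not
red_in_S = "red" in S
green_not_in_S = "green" not in S

# the order of the elements does not matter
same_order = {"hearts", "spades", "clubs", "diamonds"} == {"spades", "diamonds", "clubs", "hearts"}

# writing an element more than once does not change the set
same_repeats = {1, 2, 4} == {1, 2, 1, 4}

print(red_in_S, green_not_in_S, same_order, same_repeats)
```

The comparison `==` between two Python sets tests exactly the condition above: it is true when the two sets have the same elements.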

More ways to write down a set

1. The list notation can be used for infinite sets, provided the set obeys some sort of pattern which is obvious from giving a few elements of the set. For example,

{2, 4, 6, 8, . . . }

is the set of positive even numbers. You might also have a set that goes to infinity in both directions: for example,

{. . . , −3, −1, 1, 3, . . . }

is the set of all odd integers.

You can also use . . . when writing large finite sets: it should be clear what

{1, 2, . . . , 1000}

means. You can do the same thing for finite sets of unspecified size. So you might write

{1, 2, . . . , n}

for the set of all positive whole numbers up to n, where n is a positive whole number that hasn’t been specified yet.

2. Sometimes a set can be best described in words, for example:

let S be the set of all buses in London.

3. Some sets (especially sets of numbers) have standard names that are agreed upon by all mathematicians.

• N is the set of all natural numbers:

N = {1, 2, 3, . . . }.

• Z is the set of all integers, that is

Z = {. . . ,−2,−1, 0, 1, 2, . . . }.

• Q is the set of all rational numbers, that is numbers which can be written as a/b where a and b are integers and b ≠ 0. (Of course such a representation is not unique: 1/2 is the same as 2/4.)

• R is the set of all real numbers. (Think of these as all numbers which have a (possibly infinite) decimal expansion. We’ll be a bit more precise later in the course.)

There is one more very important set with a standard name: the empty set is the set with no elements. It is written as ∅.

4. We can define a set by defining it to be the set of all things that satisfy a given condition. For example,

{x : x ∈ R, x > 0}

is the set of positive real numbers. It’s important to get the notation right here: we introduce a variable (in this case x). Then inside the { } there’s a colon. Before the colon you put x. Then after the colon you write the condition that x has to satisfy to be in your set. (Some mathematicians use a vertical line | instead of a colon.) Here x is a dummy variable: it should only appear inside the { }, and you can replace it with any other symbol that you’re not already using and get the same set.

Quite often we specify that x has to belong to some given set S, so we put x ∈ S before the colon. The idea is that S is a set that’s already known, and you’re defining your new set by taking only the elements of S that satisfy some additional condition.

Here’s another example:

{n ∈ N : n is even}

is another way of writing the set of positive even numbers: you’re taking the known set N, and specifying that you’re choosing the elements that are even.

5. As a more advanced version of the above notation, you can define a set by taking all the things that satisfy a particular condition, and then applying some operation to all of them. For example, if you wanted to write the set of positive square numbers, you could write

{1, 4, 9, 16, . . . }

and the pattern is obvious enough, but you could also write this set as

{x² : x ∈ N}.

The rules for the notation are the same as before: you define the set by putting the elements of the set before the colon, and the conditions they satisfy after the colon. (When you read this set notation out loud, often the colon is read as “such that” or “where”.)


Here’s another example:

{2n : n ∈ N}

is yet another way of writing the set of positive even numbers.

We can define the set of rational numbers using this notation, with two dummy variables:

Q = {a/b : a, b ∈ Z, b ≠ 0}.

Notice that this is a case where it’s convenient to allow ourselves to write elements of a set more than once: in writing the elements of Q this way, we’ve written 1/2 and 2/4, but these are the same element of Q.
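The set-builder notation above has a close cousin in Python's set comprehensions, which can help when experimenting. The sketch below is only an illustration under assumptions: a Python set must be finite, so each infinite set is cut down to a small range that we chose, and the variable names are ours.

```python
from fractions import Fraction

# {n ∈ N : n is even}, cut down to n ≤ 20 (a Python set must be finite)
evens_by_condition = {n for n in range(1, 21) if n % 2 == 0}

# {2n : n ∈ N}, cut down to n ≤ 10
evens_by_formula = {2 * n for n in range(1, 11)}

# Two different descriptions, same elements, so the same set.
print(evens_by_condition == evens_by_formula)  # True

# Order and repetition make no difference to a set.
print({1, 2, 4} == {1, 2, 1, 4})  # True

# {a/b : a, b ∈ Z, b ≠ 0}, cut down to -3 ≤ a, b ≤ 3
rationals = {Fraction(a, b) for a in range(-3, 4) for b in range(-3, 4) if b != 0}

# 1/2 and 2/4 are written separately but are the same element.
print(Fraction(1, 2) == Fraction(2, 4))  # True
```

Here `Fraction` stores each rational in lowest terms, which is why the repeated representations collapse to a single element.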

5.2 Subsets

Definition. Let A and B be sets. Then A is a subset of B if every element of A is an element of B.

Notation. We write A ⊆ B to mean “A is a subset of B”. We write A ⊈ B to mean that A is not a subset of B. We can also write B ⊇ A instead of A ⊆ B. We write A ⊂ B to mean that A is a subset of B and A ≠ B (in this case A is called a proper subset of B).

Examples.

• {1, 3} ⊂ {1, 2, 3}.

• N ⊂ Z.

• A ⊆ A for any A.

• ∅ ⊆ A for any A.

Proposition 5.1. Suppose A and B are sets. If A ⊆ B and B ⊆ A, then A = B.

Proof. We use the contrapositive. So suppose A ≠ B. Then we need to show that A ⊈ B or B ⊈ A.

A set is defined by its elements, so if A ≠ B then there is an x which is an element of one of A and B but not the other.

If x ∈ A but x ∉ B, then A ⊈ B (because not every element of A is an element of B).

If x ∈ B but x ∉ A, then B ⊈ A (because not every element of B is an element of A).

So A ≠ B implies that A ⊈ B or B ⊈ A.

Quite often in maths we want to prove that two sets A and B are equal, and we do this using Proposition 5.1: we show that A ⊆ B and B ⊆ A.
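As a small illustration of this "show both inclusions" technique, here is a sketch in Python. The two sets are our own example (multiples of 6 up to 30, described in two ways); a computer check like this only covers the finite sets we feed it and is not a substitute for a proof.

```python
# Two descriptions that we suspect give the same set (our own example).
A = {n for n in range(1, 31) if n % 6 == 0}
B = {n for n in range(1, 31) if n % 2 == 0 and n % 3 == 0}

# A ⊆ B: every element of A is an element of B.
a_subset_b = all(x in B for x in A)
# B ⊆ A: every element of B is an element of A.
b_subset_a = all(x in A for x in B)

# Both inclusions hold, so A = B, as Proposition 5.1 says.
print(a_subset_b, b_subset_a, A == B)  # True True True

# Python's built-in comparisons: <= means ⊆ and < means ⊂ (proper subset).
print({1, 3} <= {1, 2, 3})  # True
print({1, 3} < {1, 2, 3})   # True
print(set() <= A)           # True: the empty set is a subset of any set
print(A < A)                # False: A ⊆ A, but A is not a proper subset of itself
```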

5.3 Set operations

We often use set operations. Each of these takes two sets A and B and creates a new set from them (just as an arithmetic operation like + takes two numbers a and b and produces a new number a + b).

For these operations, it is important that A and B are sets (rather than numbers or functions or something else), and the thing the operation produces is also a set.

• The union A ∪ B (pronounced “A union B”) is the set consisting of all elements which are in A or in B (or both).

• The intersection A ∩ B (pronounced “A intersect B”) is the set consisting of all elements which are in A and in B.

• The difference A \ B is the set consisting of all elements which are in A but not in B.

• The symmetric difference A △ B is the set consisting of all elements which are in A or in B but not both.

You may have seen these set operations expressed with pictures called Venn diagrams: the sets are represented by overlapping circles, and then intersections and unions correspond to different regions of the diagram.

[Figure: four Venn diagrams of overlapping circles A and B, illustrating A ∪ B, A ∩ B, A \ B and A △ B.]

It’s important to express these sets symbolically as well. For example,

A ∩ B = {x : x ∈ A and x ∈ B},   A ∪ B = {x : x ∈ A or x ∈ B},   A \ B = {x : x ∈ A and x ∉ B}.

Symmetric difference can be expressed in terms of the other operations in two different ways:

A △ B = (A ∪ B) \ (A ∩ B)   or   A △ B = (A \ B) ∪ (B \ A).

With intersection defined, we can make one more definition.

Definition. Two sets A and B are disjoint if A ∩ B = ∅.
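The four operations correspond to built-in operators on Python sets, which makes it easy to try small examples. The sets below are our own choice, purely for illustration.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B                 # A ∪ B
intersection = A & B          # A ∩ B
difference = A - B            # A \ B
symmetric_difference = A ^ B  # A △ B

print(union)                 # {1, 2, 3, 4, 5}
print(intersection)          # {3, 4}
print(difference)            # {1, 2}
print(symmetric_difference)  # {1, 2, 5}

# The two expressions for the symmetric difference agree on this example.
print(symmetric_difference == (A | B) - (A & B) == (A - B) | (B - A))  # True

# Disjoint means the intersection is empty.
print(A.isdisjoint({7, 8}), A.isdisjoint(B))  # True False
```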

Words and Pictures: is it OK to just use Venn Diagrams?

Venn diagrams are useful tools to help remember how the operations are defined. However, you need to get used to working with sets using symbols. The reason for this is that you might work with several sets. It’s easy to draw a Venn diagram to show the pattern of intersections between two or three sets. However, if you had an expression involving a hundred sets the Venn diagram would be impossible to visualise. Even worse, you might have an expression involving an unspecified number of sets A_1, . . . , A_n, which cannot be drawn precisely with a Venn diagram. Worse still, you might be working with infinitely many sets.

5.4 Some set identities

We now know how some set operations are defined. But if we want to use these operations it will be helpful to know some of their properties. As an analogy, it is helpful when doing arithmetic to know that for numbers (a + b) + c = a + (b + c). We can’t just assume that the set operations defined satisfy this kind of identity. Is it true that for sets (A ∪ B) ∪ C = A ∪ (B ∪ C)? The next proposition gives several useful identities for set operations.


Proposition 5.2. Let A, B, C be sets. Then:

(a) (A ∩ B) ∩ C = A ∩ (B ∩ C),

(b) (A ∪ B) ∪ C = A ∪ (B ∪ C),

(c) (A △ B) △ C = A △ (B △ C),

(d) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C),

(e) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

The statements in this proposition only involve three sets A, B and C, so we could check them just by drawing Venn diagrams. But we’re going to prove them more rigorously so that we get used to the kind of argument that we need to use in more complicated situations.

Proof of Proposition 5.2. We will only prove some of the parts; you should prove the others as an exercise.

In each part we have to prove that two sets are equal. Remember what this means: two sets X and Y are equal if they have the same elements.

(b) If x ∈ (A ∪ B) ∪ C, then either x ∈ A ∪ B or x ∈ C. This says that x ∈ A or x ∈ B or x ∈ C.

Similarly, if x ∈ A ∪ (B ∪ C) then either x ∈ A or x ∈ B ∪ C. So x ∈ A or x ∈ B or x ∈ C.

So the condition for x to be in (A ∪ B) ∪ C is the same as the condition for x to be in A ∪ (B ∪ C), so (A ∪ B) ∪ C and A ∪ (B ∪ C) have the same elements.

(c) x ∈ (A △ B) △ C means that either

• x ∈ A △ B but x ∉ C

or

• x ∉ A △ B but x ∈ C.

In the first case either x is in A but not in B and not in C, or x is in B but not in A and not in C. In the second case either x is in neither A nor B but is in C, or x is in both A and B and in C. We conclude that the elements of (A △ B) △ C are precisely those elements which are in exactly one of A, B, C or in all three of A, B, C.

Similarly, x ∈ A △ (B △ C) means that x is either in exactly one of A, B, C or in all three of A, B, C.

It follows that (A △ B) △ C = A △ (B △ C).

(e) This identity is a little harder to digest, so we’ll use Proposition 5.1, and prove that every element of A ∩ (B ∪ C) belongs to (A ∩ B) ∪ (A ∩ C) and vice versa.

• Suppose x ∈ A ∩ (B ∪ C). Then x ∈ A and x ∈ B ∪ C. So either x ∈ B or x ∈ C. We will split into two cases.

Case 1: If x ∈ B then x ∈ A ∩ B.

Case 2: If x ∈ C then x ∈ A ∩ C.

So either x ∈ A ∩ B or x ∈ A ∩ C, so x ∈ (A ∩ B) ∪ (A ∩ C).

• Now suppose x ∈ (A ∩ B) ∪ (A ∩ C). Then either x ∈ A ∩ B or x ∈ A ∩ C. Again, we split into two cases.

Case 1: If x ∈ A ∩ B then x ∈ A and x ∈ B.

Case 2: If x ∈ A ∩ C then x ∈ A and x ∈ C.

In either case x ∈ A and x ∈ B ∪ C, so x ∈ A ∩ (B ∪ C).

Because of parts (a)–(c) of Proposition 5.2, we can write expressions like A ∪ B ∪ C without brackets. We can even do this for more than three sets. However, we can’t write A ∩ B ∪ C without brackets, because A ∩ (B ∪ C) is not always the same as (A ∩ B) ∪ C. Here’s a very small example of this: take

A = {1}, B = {1, 2}, C = {1, 2, 3}.

Then A ∩ (B ∪ C) = {1} ∩ {1, 2, 3} = {1}, whereas (A ∩ B) ∪ C = {1} ∪ {1, 2, 3} = {1, 2, 3}.

(By the way, there is no rule like BIDMAS for set operations: you need to put the brackets in.)
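If you want some reassurance about Proposition 5.2 before (or after) proving it, a brute-force check is easy to write. The sketch below tests all five identities for every choice of A, B, C among the subsets of {1, 2, 3}; the helper names `subsets` and `identities_hold` are ours. This is evidence for one small universe only, not a proof. It also reproduces the example with A = {1}, B = {1, 2}, C = {1, 2, 3}.

```python
from itertools import combinations, product

def subsets(universe):
    """List every subset of a finite set."""
    items = sorted(universe)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def identities_hold(A, B, C):
    """Check parts (a) to (e) of Proposition 5.2 for one choice of A, B, C."""
    return (
        (A & B) & C == A & (B & C)
        and (A | B) | C == A | (B | C)
        and (A ^ B) ^ C == A ^ (B ^ C)
        and A | (B & C) == (A | B) & (A | C)
        and A & (B | C) == (A & B) | (A & C)
    )

# 8 subsets of {1, 2, 3}, so 8 * 8 * 8 = 512 triples to test.
all_hold = all(
    identities_hold(A, B, C) for A, B, C in product(subsets({1, 2, 3}), repeat=3)
)
print(all_hold)  # True

# Where the brackets go matters when ∩ and ∪ are mixed.
A, B, C = {1}, {1, 2}, {1, 2, 3}
print(A & (B | C))  # {1}
print((A & B) | C)  # {1, 2, 3}
```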

5.5 Another set operation: Cartesian product

We write (a, b) for the ordered pair “a then b”. An ordered pair is not a set; it is a different kind of mathematical object. As the name suggests, order matters and (1, 2) ≠ (2, 1) (compare with sets where {1, 2} = {2, 1}). We can also have a = b: the ordered pair (1, 1) is not the same as the number 1.

Definition. If A and B are sets, the Cartesian product of A and B is the set of all ordered pairs (a, b) with a ∈ A and b ∈ B. We denote this by A × B, so

A × B = {(a, b) : a ∈ A and b ∈ B}.

For example:

• {1, 2} × {1, 2, 3} = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)}

• {1, 2, 3} × {1, 2} = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)}

• ∅ × S = ∅ for any set S (there are no ordered pairs (a, b) with a ∈ ∅ and b ∈ S because there are no elements in ∅).

Note that we have given a special definition of × which applies to sets. We cannot assume that what we know about multiplication of numbers transfers to the Cartesian product of sets; it is a completely different concept. For instance, we know that if a and b are numbers then a × b = b × a. However, the first two examples above show that this does not always hold for the Cartesian product of sets: {1, 2} × {1, 2, 3} ≠ {1, 2, 3} × {1, 2}.

Now suppose we have three sets A, B, C. Strictly speaking, (A × B) × C and A × (B × C) are different sets: (A × B) × C is the set of all expressions ((a, b), c), where a ∈ A, b ∈ B and c ∈ C, while A × (B × C) is the set of all expressions (a, (b, c)). But usually we overlook this distinction and think of (A × B) × C and A × (B × C) as being the same set. We write this set just as A × B × C, and write its elements as (a, b, c), rather than ((a, b), c) or (a, (b, c)).

Of course, we can extend this to more than three sets. These sets may be the same: we sometimes write Aⁿ for the Cartesian product A × · · · × A (where there are n copies of A). Sometimes we write elements of Aⁿ as column vectors; you may be familiar with writing R³ to mean the set of all column vectors consisting of three real numbers. This is just an example of the Cartesian product.
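For finite sets, the Cartesian product can be computed with `itertools.product` from the Python standard library, using tuples to play the role of ordered pairs. The sketch below redoes the examples above; the variable names are ours.

```python
from itertools import product

A = {1, 2}
B = {1, 2, 3}

A_times_B = set(product(A, B))  # all pairs (a, b) with a in A and b in B
B_times_A = set(product(B, A))

print(len(A_times_B), len(B_times_A))  # 6 6
print(A_times_B == B_times_A)          # False: for example (1, 3) is only in A × B
print((1, 2) == (2, 1))                # False: order matters in an ordered pair

# ∅ × S = ∅
print(set(product(set(), B)) == set())  # True

# A³ = A × A × A, with elements written as triples (a, b, c)
A_cubed = set(product(A, repeat=3))
print(len(A_cubed))  # 8
```

Note that `product` with three sets produces flat triples (a, b, c), matching the convention of not distinguishing ((a, b), c) from (a, (b, c)).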


6 Counting

6.1 Cardinality

Definition. If A is a finite set then the cardinality of A is the number of elements it contains. We write |A| for the cardinality of A.

Examples.

• |{1, 2, 4}| = 3.

• |∅| = 0.

• If a is an integer, let S_a = {−a, a}. Then

|S_a| = 2 if a ≠ 0,
        1 if a = 0.

• If n is a positive integer, then |{2, 4, 6, . . . , 2n}| = n.

For small sets like these which are specified explicitly it is easy to determine the cardinality just by counting the elements. However, for large and more general sets, counting problems can involve some interesting mathematics.

If A is an infinite set, then cardinality is a much more difficult issue. In this module, we will just write |A| = ∞ when A is an infinite set, but actually there are different cardinalities that an infinite set can have.

6.2 Counting subsets

Definition. If X is a set, we define the power set of X to be the set of all subsets of X. It is denoted by P(X):

P(X) = {S : S ⊆ X}.

For example,

P({1, 2, 3}) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}.

Counting the elements we see that |P({1, 2, 3})| = 8. In words, this says that {1, 2, 3} has eight subsets. We want to generalise this.

Counting by making a series of choices

Counting is a very important idea in maths. Often we simplify counting things by making a series of choices. Here’s a primary school example: I have m bags, and each bag contains n apples. How many apples do I have? Of course you instinctively know that to get the answer you multiply m by n, but I want to use this to illustrate an important general idea.

We often think of counting in terms of choices. So instead of asking “how many apples are there?”, we ask “how many ways can I choose an apple?”. We can break the process of choosing an apple into two stages.

• choose a bag;

• choose an apple from that bag.

At the first stage we have m options (because there are m bags). At the second stage we have n options regardless of which option we chose at the first stage; this is because each bag contains n apples.

The Multiplication Principle says that if we want to find the number of ways of choosing an object, and we can break the choosing down into a number of steps in such a way that the number of options we have at a given stage doesn’t depend on which options we chose at earlier stages, then the number of choices overall is just the product of the numbers of options at each stage.

As another example of the Multiplication Principle: how many permutations (i.e. orderings) of the numbers 1, . . . , k are there? To answer this, we choose a permutation a_1 a_2 . . . a_k by making a sequence of choices:

• choose a_1 (k choices);

• choose a_2 (k − 1 choices – it can be any number except a_1);

• choose a_3 (k − 2 choices – it can be any number except a_1 or a_2);

...

• choose a_k (only one possible choice).

So by the Multiplication Principle the number of permutations of 1, 2, . . . , k is k × (k − 1) × · · · × 1 = k!.
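For a small value of k we can list every permutation and compare the count with k!. This is only a sanity check for one value of k (we picked k = 4), using `itertools.permutations` and `math.factorial` from the standard library.

```python
from itertools import permutations
from math import factorial

k = 4  # our choice of a small example

# Every ordering a_1 a_2 ... a_k of the numbers 1, ..., k, as a tuple.
orderings = list(permutations(range(1, k + 1)))

print(orderings[:3])   # [(1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 2, 4)]
print(len(orderings))  # 24
print(factorial(k))    # 24, which is 4 × 3 × 2 × 1
```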

We can use the Multiplication Principle to find |P(X )|.

Theorem 6.1. Suppose X is a finite set, and |X| = n. Then |P(X)| = 2ⁿ.

Proof. Because |X| = n, we can label the elements of X as x_1, x_2, . . . , x_n. Now we can choose a subset A ⊆ X by making a series of choices:

• decide whether x_1 ∈ A;

• decide whether x_2 ∈ A;

...

• decide whether x_n ∈ A.

We’re making n choices, and for each choice we have two options: the element x_i is in A or not. So by the Multiplication Principle the number of possible A is 2 × 2 × · · · × 2 = 2ⁿ.

Note that this theorem also applies in the case n = 0: the set with 0 elements (i.e. ∅) has exactly one subset.
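The proof of Theorem 6.1 is really an algorithm: run through every sequence of n in/out decisions and collect the subset each one describes. Here is a minimal sketch of that idea; the function name `power_set` and the use of `frozenset` (so that subsets can themselves be put in a set) are our choices.

```python
from itertools import product

def power_set(xs):
    """Return all subsets of the list xs, one per sequence of in/out decisions."""
    result = []
    # Each element of product(...) is one sequence of len(xs) yes/no decisions.
    for decisions in product([False, True], repeat=len(xs)):
        result.append(frozenset(x for x, keep in zip(xs, decisions) if keep))
    return result

subsets_of_123 = power_set([1, 2, 3])
print(len(subsets_of_123))       # 8, which is 2 ** 3
print(len(set(subsets_of_123)))  # 8: different decisions give different subsets

# The case n = 0: the empty set has exactly one subset, namely itself.
print(power_set([]))  # [frozenset()]
```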

6.3 Counting subsets of a particular size

Definition. Suppose X is a set and k ∈ Z. A k-element subset of X means a subset with exactly k elements. If n ≥ 0, we define $\binom{n}{k}$ to be the number of k-element subsets of the set {1, . . . , n}.

The notation $\binom{n}{k}$ is read out loud as “n choose k”: we can think of it as the number of ways of choosing k elements from 1, . . . , n. The numbers $\binom{n}{k}$ are called binomial coefficients.

(There’s nothing special about the set {1, . . . , n}: any set with n elements will do.)


Examples.

• The subsets of {1, 2, 3} are

∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}.

So $\binom{3}{0} = 1$, $\binom{3}{1} = 3$, $\binom{3}{2} = 3$, $\binom{3}{3} = 1$.

• $\binom{4}{2} = 6$: the 2-element subsets of {1, 2, 3, 4} are {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4} and {3, 4}.

• $\binom{n}{0} = 1$: there is one 0-element subset of {1, . . . , n}, namely ∅.

• $\binom{n}{n} = 1$: the only n-element subset of {1, . . . , n} is {1, . . . , n}.

• $\binom{n}{1} = n$: the 1-element subsets of {1, . . . , n} are the sets {1}, {2}, . . . , {n}.

• $\binom{n}{k} = 0$ if k < 0: no subset can have a negative number of elements.

• $\binom{n}{k} = 0$ if k > n: no subset of {1, . . . , n} can have more than n elements.

Here’s a formula for $\binom{n}{k}$ in the case where 0 ≤ k ≤ n.

Theorem 6.2. Suppose n and k are integers with 0 ≤ k ≤ n. Then

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$

Proof. Let X be a set with n elements. We want to count the ways of choosing a k-element subset {a_1, a_2, . . . , a_k} of X. First let’s count the ways of choosing distinct elements a_1, . . . , a_k in order.

• We can choose a_1 in n different ways.

• We can then choose a_2 in n − 1 different ways (it can be any element of X except a_1).

• We can then choose a_3 in n − 2 different ways (it can be any element of X except a_1 or a_2).

...

• We can then choose a_k in n − (k − 1) different ways (it can be any element of X except a_1, a_2, . . . , a_{k−1}).

So by the Multiplication Principle, the number of ways to choose a_1, . . . , a_k in order is n(n − 1)(n − 2) · · · (n − k + 1). But observe that this product is the product n × (n − 1) × (n − 2) × · · · × 1 with the terms (n − k) × (n − k − 1) × · · · × 1 removed, so

$$n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!}.$$

Each choice of a_1, . . . , a_k in order gives us a k-element subset {a_1, . . . , a_k}. But in a set the order doesn’t matter, so we’ve counted each set more than once. How many times have we counted each k-element subset? Given a k-element subset {a_1, . . . , a_k}, we’ve counted this set k! times, because there are k! possible orders for the elements a_1, . . . , a_k.

So to get the number of k-element subsets, we take the number of ways of choosing a_1, . . . , a_k in order, and divide by k! to get

$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}.$$

Example. Let’s work out $\binom{5}{3}$ following the proof of Theorem 6.2. Let X = {1, 2, 3, 4, 5}. First we find all the ways of choosing three distinct numbers a_1, a_2, a_3 in order. By the Multiplication Principle, the number of ways of doing this is

$$5 \times 4 \times 3 = \frac{5 \times 4 \times 3 \times 2 \times 1}{2 \times 1} = \frac{5!}{2!}.$$

But now each 3-element subset has been counted 3! = 6 times. For example, the set {1, 3, 4} has been counted from the choices

(1, 3, 4), (1, 4, 3), (3, 1, 4), (3, 4, 1), (4, 1, 3), (4, 3, 1).

So to get the number of 3-element subsets we divide by 3!:

$$\binom{5}{3} = \frac{5!}{2!\,3!} = 10.$$

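The formula for $\binom{n}{k}$ in Theorem 6.2 is very useful. But when you use it, do as much cancelling as possible, rather than working out very large factorials. For example:

$$\begin{aligned}
\binom{10}{6} = \frac{10!}{6!\,4!}
&= \frac{10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{(6 \times 5 \times 4 \times 3 \times 2 \times 1) \times (4 \times 3 \times 2 \times 1)} \\
&= \frac{10 \times 9 \times 8 \times 7 \times 6 \times 5}{6 \times 5 \times 4 \times 3 \times 2} \\
&= \frac{10 \times 9 \times 8 \times 7}{4 \times 3 \times 2} \\
&= \frac{10 \times 9 \times 7}{3} = 10 \times 3 \times 7 \\
&= 210.
\end{aligned}$$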
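The counting argument in the proof of Theorem 6.2 can be replayed on a computer for small n and k: list the ordered choices, forget the order, and compare with the formula. The sketch below does this for n = 5 and k = 3 (our choice), and uses `math.comb` from the standard library (available from Python 3.8) for comparison.

```python
from itertools import permutations
from math import comb, factorial

n, k = 5, 3

# All ways of choosing k distinct numbers from 1, ..., n in order.
ordered = list(permutations(range(1, n + 1), k))
print(len(ordered))  # 60, which is 5 × 4 × 3 = 5!/2!

# Forgetting the order: each k-element subset arises from k! ordered choices.
unordered = {frozenset(choice) for choice in ordered}
print(len(unordered))                # 10
print(len(ordered) // factorial(k))  # 10

# The library function agrees, and handles k > n as in the examples above.
print(comb(5, 3), comb(10, 6), comb(3, 5))  # 10 210 0
```

One caveat: `math.comb` raises a `ValueError` for negative k, so the convention that the binomial coefficient is 0 when k < 0 has to be handled separately.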

Although this formula is very useful, it’s not the definition of the binomial coefficient $\binom{n}{k}$. In fact the formula is only valid when 0 ≤ k ≤ n (because the factorial of a negative number doesn’t make sense). So you should think of $\binom{n}{k}$ as meaning “the number of k-element subsets of an n-element set” and try to use this interpretation when working with binomial coefficients. Only resort to the formula n!/(k!(n − k)!) when you’re stuck, and even then make sure you only use it for 0 ≤ k ≤ n.

Binomial coefficients crop up in various parts of mathematics and have lots of nice properties. You will see them again in the Introduction to Probability module and we will also revisit them in the next section.

7 Functions

7.1 Definition of functions

Now we introduce another fundamental object: functions. You may have a good idea of what a function is, and you will certainly have seen lots of examples of them.

Definition. Let A and B be sets. A function from A to B is a rule which assigns an element of B to each element of A.

We write f : A → B to mean “f is a function from A to B”. If a ∈ A, we write f(a) for the element of B assigned to a by f.

The set A is called the domain of f, and B is the codomain.

If a ∈ A, we say that f(a) is the image of a under f, or sometimes (especially if B is a set of numbers) the value of f at a. We say that f maps a to f(a).

For a rule to define a function from A to B we need f(a) to be an unambiguously defined element of B for every a ∈ A.

Examples. Here are several examples of functions.

• f : R → R defined by f(x) = x².

• f : N → N defined by f(x) = x².

• f : N → N ∪ {0} defined by f(n) = n − 1.

(We had to be a bit careful with the codomain here. It would not work to say f : N → N, f(n) = n − 1, because f would map 1 to something outside the codomain.)

• f : R → R defined by

f(x) = 0 if x < 0,
       1 if 0 ≤ x ≤ 2,
       2 if x > 2.

In this kind of expression where a function is defined by a different formula in different parts of the domain, you need to make sure that the function is defined unambiguously at each point of the domain.

• f : {1, 2, 3, 4} → {♠,♥,♦,♣} defined by f(1) = ♦, f(2) = ♠, f(3) = ♠, f(4) = ♦.

This is an example of a function where there’s no nice formula (because the codomain is just a set of symbols with no meaning), so we define the function just by writing down all its values.

• f : P({1, 2, 3, . . . , n}) → {0, 1, 2, . . . , n} defined by f(A) = |A|.

• f : P({1, 2, 3, . . . , n}) → P({1, 2, 3, . . . , n}) defined by f(A) = {1, 2, 3, . . . , n} \ A.

Note that the first two examples are different functions, because the domain and codomain are part of the specification of a function. These functions have different properties; in the second example the statement “if f(n) = 4 then n = 2” is true, but in the first example it’s not. So when defining a function properly, you shouldn’t talk about “the function x²”: you need to say what your domain and codomain are.

Be careful with arrows: we write f : A → B to mean “f is a function from A to B”. If a ∈ A, then we may write a ↦ f(a) to show where a maps to. (Note the different kinds of arrow here: → is used between the sets A and B, and ↦ is used between the elements a and f(a).)

The figure below is a mental picture that you could have to help understand the idea of a function f : A → B. The blob on the left is the domain, the blob on the right is the codomain, and the arrows indicate how the rule assigns an element of the codomain to each element of the domain.

[Figure: two blobs labelled A and B, with arrows labelled f from the points of A to points of B.]

The three figures below show how a rule can fail to be a function.

[Figure: three arrow diagrams, captioned “element of domain with no image”, “element of domain with two images” and “image outside codomain”.]

Definition. If f : A → B, the range of f is the set of all elements of B which actually occur as the image of some element of A:

range(f) = {f(a) : a ∈ A}.

(The range is sometimes (especially in algebra) called the image of f. We’ll stick to “range” in this module, but they mean exactly the same thing.)

Examples. Let’s work out the range for some functions.

• f : {1, 2, 3, 4} → {♠,♥,♦,♣} defined by f(1) = ♦, f(2) = ♠, f(3) = ♠, f(4) = ♦.

In this case range(f) = {♠,♦}.

• f : R → R defined by f(x) = x².

In this case range(f) = {x ∈ R : x ≥ 0}.

• f : N → N defined by f(n) = n + 1.

In this case range(f) = {2, 3, 4, . . . }.

It is important to note the difference between the codomain and the range. The codomain is a set of allowed outputs. The range is the set of outputs that occur (as described formally in the definition above).
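For a function on a finite domain, one way to experiment is to store the function as a Python dictionary, with each element of the domain as a key and its image as the value. This representation is our own modelling choice, not part of the definition. The sketch below computes the range of the card-suit example and compares it with the codomain.

```python
# f : {1, 2, 3, 4} → {♠, ♥, ♦, ♣}, stored as a dictionary of values.
f = {1: "♦", 2: "♠", 3: "♠", 4: "♦"}
domain = set(f)                 # the keys: {1, 2, 3, 4}
codomain = {"♠", "♥", "♦", "♣"}

# range(f) = {f(a) : a ∈ A}
range_of_f = {f[a] for a in domain}

print(range_of_f == {"♠", "♦"})  # True
print(range_of_f <= codomain)    # True: the range is always a subset of the codomain
print(range_of_f == codomain)    # False: ♥ and ♣ are allowed outputs that never occur
```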


7.2 Injective, surjective, bijective

Now we look at three important properties which a function might have.

Definition. Suppose f : A→ B is a function.

• f is injective if different elements of the domain A are mapped to different elements of the codomain; that is, if a_1, a_2 ∈ A and a_1 ≠ a_2, then f(a_1) ≠ f(a_2).

• f is surjective if its codomain is equal to its range; that is, for every b ∈ B there is some a ∈ A with f(a) = b.

• f is bijective if it is both injective and surjective.

We also refer to a bijective function as a bijection.

Injective and surjective are independent properties: a function can be injective, or surjective, or both, or neither. The following pictures help to illustrate the definitions.

[Figure: four arrow diagrams, captioned “injective, not surjective”, “surjective, not injective”, “neither injective nor surjective” and “bijective”.]

Examples. Let’s look at some of our examples from above.

• f : R → R defined by f(x) = x².

f is not injective because f(1) = f(−1). Also f is not surjective, because there is no a ∈ R for which f(a) = −1.

• f : P({1, 2, 3, . . . , n}) → {0, 1, 2, . . . , n} defined by f(A) = |A|.

f is surjective: given m ∈ {0, . . . , n}, the subset {1, 2, . . . , m} satisfies f({1, 2, . . . , m}) = m. (Note that I’m including the case m = 0 here, where you have to interpret {1, 2, . . . , m} as ∅.)

Assuming n ≥ 2, f is not injective, because f({1}) = 1 = f({2}).

• f : N → N defined by f(x) = x².

f is not surjective, as there is no a ∈ N for which f(a) = 2.

f is injective. To prove this, we suppose m, n ∈ N and m² = n². Then m = ±n. But m, n > 0, so it can’t be the case that m = −n. So m = n.

We make some comments about proving injectivity and surjectivity. Suppose f : A → B.

• To prove that f is not surjective, you have to find an element b ∈ B such that there is no a ∈ A with f(a) = b.

• To prove that f is surjective, you have to give a general argument to show that for every b ∈ B there is an a ∈ A for which f(a) = b.

• To prove that f is not injective, you have to find a_1, a_2 ∈ A such that a_1 ≠ a_2 but f(a_1) = f(a_2).

• To prove that f is injective, you have to give a general argument to show that if a_1 ≠ a_2 then f(a_1) ≠ f(a_2). In fact, we usually do this using the contrapositive, and show that if f(a_1) = f(a_2) then a_1 = a_2. This is because = is generally a more useful relation than ≠. (See the third example above.)
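When the domain and codomain are finite, these properties can be checked mechanically. The sketch below stores a function as a dictionary (our modelling choice) and uses hypothetical helper names `is_injective`, `is_surjective` and `is_bijective`. The two test functions are finite analogues of the examples above, so they illustrate the definitions rather than prove anything about the functions on N or R.

```python
from itertools import combinations

def is_injective(f):
    """No two elements of the domain share an image."""
    values = list(f.values())
    return len(values) == len(set(values))

def is_surjective(f, codomain):
    """The range of f is the whole codomain."""
    return set(f.values()) == set(codomain)

def is_bijective(f, codomain):
    return is_injective(f) and is_surjective(f, codomain)

# f(A) = |A| on P({1, 2, 3}), with codomain {0, 1, 2, 3}
subsets = [frozenset(c) for r in range(4) for c in combinations([1, 2, 3], r)]
size = {A: len(A) for A in subsets}
print(is_injective(size), is_surjective(size, {0, 1, 2, 3}))  # False True

# Squaring on {1, ..., 5}, with codomain {1, ..., 25}
square = {x: x * x for x in range(1, 6)}
print(is_injective(square), is_surjective(square, range(1, 26)))  # True False
```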

7.3 Restriction and composition of functions

Here we look at some ways of making new functions from old ones. Often we only want to consider what a function does to part of the domain.

Definition. Suppose f : A → B is a function, and C ⊆ A. The restriction of f to C is the function g : C → B defined by g(c) = f(c) for all c ∈ C. This function is written as f|_C.

Defining the function f|_C really just means “forgetting what happens to things not in C”.

Examples.

• f : {1, 2, 3, 4} → {♠,♥,♦,♣} defined by f(1) = ♦, f(2) = ♠, f(3) = ♠, f(4) = ♦.

The restriction f|_C with C = {1, 2} is the function g : {1, 2} → {♠,♥,♦,♣} defined by g(1) = ♦, g(2) = ♠.

• f : R → R defined by f(x) = cos(πx).

f|_Z is the function g : Z → R defined by g(n) = (−1)ⁿ.

Now we consider composition of functions: this is where we apply a function, and then apply another function to the output of the first function. Here’s a formal definition.

Definition. Suppose A, B and C are sets, f is a function from A to B, and g is a function from B to C. The composition g ◦ f is the function from A to C defined by (g ◦ f)(a) = g(f(a)).

Sometimes we just write gf instead of g ◦ f. Note that in order for g ◦ f to be defined, the domain of g must be the same as the codomain of f. So we have the following picture.

A --f--> B --g--> C

The notation is a little counterintuitive in that g ◦ f means “do f then g”. This is a consequence of the fact that we write our functions on the left (i.e. we write f(a) rather than (a)f).

Examples.

• Consider the functions f : N → N and g : N → N given by f(a) = a + 4 and g(a) = 2a. Then g ◦ f is the function N → N given by (g ◦ f)(a) = 2(a + 4). On the other hand, f ◦ g is the function N → N given by (f ◦ g)(a) = 2a + 4.

• Consider the functions f : N → Z and g : Z → N given by f(a) = 1 − a and g(b) = 1 + |b|. Then for a ∈ N we have 1 − a ≤ 0, so |1 − a| = a − 1, and

(g ◦ f)(a) = g(f(a)) = g(1 − a) = 1 + |1 − a| = 1 + (a − 1) = a,

so g ◦ f is the identity function on N.
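Composition translates directly into code. The sketch below defines a hypothetical helper `compose(g, f)` returning g ◦ f and replays the two examples; Python functions do not record a domain or codomain, so we simply test on a range of natural numbers of our choosing.

```python
def compose(g, f):
    """Return the function g ∘ f, which applies f first and then g."""
    def g_after_f(a):
        return g(f(a))
    return g_after_f

# First example: f(a) = a + 4 and g(a) = 2a on N.
def add_four(a):
    return a + 4

def double(a):
    return 2 * a

print(compose(double, add_four)(1))  # 10, which is 2(1 + 4)
print(compose(add_four, double)(1))  # 6, which is 2·1 + 4, so the order matters

# Second example: f(a) = 1 - a from N to Z, and g(b) = 1 + |b| from Z to N.
def f(a):
    return 1 - a

def g(b):
    return 1 + abs(b)

# g ∘ f sends every natural number we test back to itself ...
print(all(compose(g, f)(a) == a for a in range(1, 101)))  # True
# ... but f ∘ g is not the identity: it sends 1 to f(2) = -1.
print(compose(f, g)(1))  # -1
```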


7.4 Inverses

Now we give another way of thinking about bijective functions.

Definition. Suppose A is a set. The identity function on A is the function f : A → A defined by f(a) = a for all a ∈ A.

Definition. Suppose f : A → B is a function. An inverse to f is a function g : B → A satisfying the two conditions

• g(f(a)) = a for all a ∈ A

• f(g(b)) = b for all b ∈ B.

If f has an inverse then f is called invertible.

The condition that g is an inverse of f can be expressed as saying that g ◦ f is the identity function on A, and f ◦ g is the identity function on B.

Examples.

• Define f : R → R by f(x) = 2x + 3. Then an inverse to f is the function g : R → R defined by g(x) = (x − 3)/2.

Let’s check that this works:

g(f(x)) = g(2x + 3) = ((2x + 3) − 3)/2 = x

and

f(g(x)) = f((x − 3)/2) = 2((x − 3)/2) + 3 = x.

• Define f : P({1, 2, 3, 4, 5}) → Z by f(A) = |A|. Then f doesn’t have an inverse. If it had an inverse g, then we would have g(f(A)) = A for every A. In particular, g(f({1})) = {1} and g(f({2})) = {2}, which means g(1) = {1} and g(1) = {2}. So {1} = {2}, a contradiction.

• From one of the examples above, take f : N → Z and g : Z → N given by f(a) = 1 − a and g(b) = 1 + |b|. We saw that g(f(a)) = a for every a ∈ N. But g is not an inverse to f, because f(g(1)) = f(2) = −1 ≠ 1.

Usually we talk about the inverse of a function f, rather than an inverse. The next lemma shows why we can do this.

Lemma 7.1. Suppose f : A → B is an invertible function. Then the inverse of f is unique.

When we say the inverse is unique, we mean f has only one inverse. There’s a very standard way of proving that something is unique: we suppose there are two of them, and then use the definitions to show that they’re the same.

Proof. Suppose g : B → A and h : B → A are both inverses for f. Our task is to show that g = h. Now g and h are both functions B → A, and saying that they’re equal means that g(b) = h(b) for every b ∈ B. To do this, first we know that f(h(b)) = b, because h is an inverse to f. Applying g, we get g(f(h(b))) = g(b). But h(b) ∈ A, and the fact that g is an inverse to f means that g(f(a)) = a for every a ∈ A. So in particular g(f(h(b))) = h(b). So g(b) = g(f(h(b))) = h(b).


This lemma allows us to define the following notation.

Notation. If f : A → B is invertible, then we write f⁻¹ for the inverse of f.

Here are some more examples of inverses of functions.

Examples.

• f : N → N ∪ {0} defined by f(n) = n − 1.

The inverse is the function f⁻¹ : N ∪ {0} → N defined by f⁻¹(n) = n + 1.

• Let X be a set, and define f : P(X) → P(X) by f(A) = X \ A. (f is the “complement function” which you’ve used in Introduction to Probability.)

The function f is its own inverse: if A ⊆ X, then

f(f(A)) = X \ (X \ A) = X \ {x ∈ X : x ∉ A} = A.

So f is invertible, with f⁻¹ = f.

• Here’s another example that we didn’t see earlier. Define f : Z → N by

f(n) = 2n if n > 0,
       1 − 2n if n ≤ 0.

Let’s just double-check that this function really does map Z to N: if n ∈ Z with n > 0 then certainly 2n ∈ N. If n ≤ 0, then 2n ≤ 0, so −2n ≥ 0, so 1 − 2n > 0, as required.

Here’s an inverse for f: define g : N → Z by

g(n) = n/2 if n is even,
       (1 − n)/2 if n is odd.

Let’s check that g really is an inverse for f. First we take n ∈ Z, and check that g(f(n)) = n. Let’s consider two cases.

• If n > 0, then f(n) = 2n, which is even, so

g(f(n)) = g(2n) = 2n/2 = n.

• If n ≤ 0, then f(n) = 1 − 2n, which is odd, so

g(f(n)) = g(1 − 2n) = (1 − (1 − 2n))/2 = n.

Now we take m ∈ N, and check that f(g(m)) = m. Again we consider two cases.

• If m is even, then g(m) = m/2 which is positive, so

f(g(m)) = f(m/2) = 2(m/2) = m.

• If m is odd, then g(m) = (1 − m)/2 which is non-positive, so

f(g(m)) = f((1 − m)/2) = 1 − 2((1 − m)/2) = m.
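The case-by-case check above can be mirrored numerically. The sketch below implements f : Z → N and g : N → Z as Python functions and tests both inverse conditions on a finite window of integers that we chose; this is supporting evidence, while the argument above is the actual proof.

```python
def f(n):
    """f : Z → N, sending positive n to 2n and non-positive n to 1 - 2n."""
    return 2 * n if n > 0 else 1 - 2 * n

def g(n):
    """g : N → Z, sending even n to n/2 and odd n to (1 - n)/2."""
    # Both divisions are exact, so integer division // is safe here.
    return n // 2 if n % 2 == 0 else (1 - n) // 2

print([f(n) for n in range(-3, 4)])  # [7, 5, 3, 1, 2, 4, 6]
print([g(m) for m in range(1, 8)])   # [0, 1, -1, 2, -2, 3, -3]

# f really lands in N on the window we test.
print(all(f(n) >= 1 for n in range(-50, 51)))     # True
# g(f(n)) = n for the integers n we test.
print(all(g(f(n)) == n for n in range(-50, 51)))  # True
# f(g(m)) = m for the natural numbers m we test.
print(all(f(g(m)) == m for m in range(1, 101)))   # True
```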


Now let’s relate invertibility to the properties that we saw earlier. Suppose I want to find an inverseto a given function f . If we represent the function f by arrows from A to B, we would like to define gas the function from B to A which “reverses the arrows”. What could go wrong with this? If there is anelement b1 ∈ B with no arrow pointing to it then we cannot define g(b1). This problem does not arise if fis surjective. Also, if there is an element b2 ∈ B with more than one arrow pointing to it then we cannotdefine g(b2) unambiguously. This problem does not arise if f is injective.

This suggests that a function should be invertible if and only if it is both injective and surjective.

Theorem 7.2. Suppose f : A→ B. Then f has an inverse if and only if f is bijective.

Proof (non-examinable). The theorem contains “if and only if”, so we have two things to prove.

First suppose that f has an inverse g : B → A. We need to show that f is bijective.

f is injective: If f(a_1) = f(a_2) then we can apply the inverse function to both sides to get g(f(a_1)) = g(f(a_2)). By the definition of inverse this implies that a_1 = a_2 and so f is injective.

f is surjective: If b ∈ B then g(b) ∈ A. The definition of inverse means that f(g(b)) = b. So there is a ∈ A such that f(a) = b, so f is surjective.

Since f is injective and surjective it is bijective.

In the other direction, suppose that f is bijective. We need to show that f has an inverse g : B → A.

Suppose that b ∈ B. Because f is surjective, there is at least one a ∈ A such that f(a) = b. Choose such an a, and let g(b) = a. This defines g(b) for every b ∈ B, so it defines a function g : B → A. We need to check that g is an inverse.

If b ∈ B, then f(g(b)) = b because g(b) was chosen to be an element with this property.

If a ∈ A, let b = f(a). Then f(g(b)) = b, or in other words f(g(f(a))) = f(a). But because f is injective, this means g(f(a)) = a.

We conclude that g is indeed an inverse for f and so f is invertible.

7.5 Bijections and cardinality

There is a connection between the injective and surjective properties and cardinalities.

Theorem 7.3. Let A and B be finite sets and f : A→ B be a function.

(a) If f is injective then |A| ≤ |B|.

(b) If f is surjective then |A| ≥ |B|.

(c) If f is bijective then |A| = |B|.

Proof.

(a) Let a_1, a_2, . . . , a_m be the elements of A. Because f is injective, the elements f(a_1), f(a_2), . . . , f(a_m) of B are different. So B contains at least |A| elements.

(b) Let b_1, b_2, . . . , b_n be the elements of B. Because f is surjective, we can find elements a_1, a_2, . . . , a_n ∈ A such that

f(a_1) = b_1, f(a_2) = b_2, . . . , f(a_n) = b_n.

These elements of A are all different, because f sends them to different elements of B. So A contains at least |B| elements.

(c) If f is bijective then it is both injective and surjective. Since f is injective, |A| ≤ |B|. Since f is surjective, |A| ≥ |B|. It follows that |A| = |B|.

Theorem 7.3(c) is often used in counting problems. The idea is that if we have a set A which is hard to count directly, we can try to find a bijection from it to a set B which is easier to count.

Example. How many 8-element subsets of {1, 2, . . . , 100} are there that contain the number 100 but not the number 99?

Let A be the set consisting of these sets:

A = {X ⊆ {1, 2, . . . , 100} : |X| = 8, 99 ∉ X, 100 ∈ X}.

Then we want to know |A|. Also define

B = {X ⊆ {1, 2, . . . , 98} : |X| = 7}.

Then there is a function f : A → B defined by f(X) = X \ {100}. This function is bijective (check this!), so |A| = |B| = (98 choose 7), the number of 7-element subsets of a 98-element set.
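The same bijection argument can be tested by brute force on a smaller instance, where listing every subset is feasible. The sketch below is illustrative only: the helper name count_subsets and the small parameters 12 and 4 are choices made here, not part of the example.

```python
from itertools import combinations
from math import comb


def count_subsets(n: int, size: int) -> int:
    """Count size-element subsets of {1, ..., n} that contain n but not n - 1."""
    return sum(
        1
        for X in combinations(range(1, n + 1), size)
        if n in X and (n - 1) not in X
    )


# Small analogue of the example: the bijection X -> X \ {n} predicts
# (n - 2 choose size - 1) such subsets.
small_n, small_size = 12, 4
brute = count_subsets(small_n, small_size)
predicted = comb(small_n - 2, small_size - 1)
print(brute, predicted)  # 120 120

# The example itself is far too large to enumerate, so only the formula is used.
answer = comb(98, 7)
print(answer)
```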

Theorem 7.3 is really the basis of counting. What do we really mean by the number 3? The sets

{red, white, blue}, {apple, orange, banana}, {cat, dog, mouse}

are intuitively “equivalent”, and we encapsulate this equivalence by labelling them all with the number 3. In fact this equivalence is that there is a bijection between any two of these sets.

7.6 Images and inverse images of subsets

Definition. Suppose f : A → B is a function, and C ⊆ A. The image of C under f is the set of all values of f at elements of C; that is, the set

f(C) = {f(a) : a ∈ C}.

Examples.

• Define f : {1, 2, 3, 4} → {1, 2, 3} by f(1) = 2, f(2) = 3, f(3) = 3, f(4) = 1. Then

f({1}) = {2},
f({1, 2}) = {2, 3},
f({1, 2, 3}) = {2, 3}.

• Define f : R → R by f(x) = x². Then

f({1, 2, 3}) = {1, 4, 9},
f({x ∈ R : x > 2}) = {x ∈ R : x > 4},
f({x ∈ R : x ≤ 1}) = {x ∈ R : x ≥ 0}.

(In the last line, every non-negative real number y is the square of −√y, which is at most 1, and no negative number is a square.)

• Define f : P({1, 2, 3, 4, 5}) → P({1, 2, 3, 4, 5}) by f(X) = {1, 2, 3, 4, 5} \ X. Then

f({{1, 2}, {2, 3, 4}}) = {{3, 4, 5}, {1, 5}},
f({X : X ⊆ {1, 2, 3, 4, 5}, |X| = 2}) = {X : X ⊆ {1, 2, 3, 4, 5}, |X| = 3},
f({X : X ⊆ {1, 2, 3, 4, 5}, 1 ∈ X}) = {X : X ⊆ {1, 2, 3, 4, 5}, 1 ∉ X}.

• If f : A → B is any function, then f(∅) = ∅.

• If f : A → B is any function, then f(A) is just the range of f.

Now here’s a related concept.

Definition. Suppose f : A → B is a function and D ⊆ B. The inverse image of D under f is the set of all elements of A that get mapped to D by f; that is, the set

f⁻¹(D) = {a ∈ A : f(a) ∈ D}.

Warning. When we make this definition, we are not assuming that f is invertible. This definition makes sense for any function f, and we can always write f⁻¹ if we are applying it to a subset of B. (If f does happen to be invertible, then the inverse image of D under f is the same as the image of D under f⁻¹.)

Examples.

• Define g : {1, 2, 3, 4} → {1, 2, 3} by g(1) = 2, g(2) = 3, g(3) = 3, g(4) = 1. Then

g⁻¹({1, 2}) = {1, 4},
g⁻¹({1, 3}) = {2, 3, 4}.

• Define f : R → R by f(x) = x². Then

f⁻¹({1, 4, 9}) = {1, −1, 2, −2, 3, −3},
f⁻¹({x ∈ R : x ≥ 4}) = {x ∈ R : x ≥ 2} ∪ {x ∈ R : x ≤ −2},
f⁻¹({x ∈ R : x ≤ 1}) = {x ∈ R : −1 ≤ x ≤ 1},
f⁻¹({x ∈ R : x < 0}) = ∅.

• If f : A → B is any function, then f⁻¹(∅) = ∅.

• If f : A → B is any function, then f⁻¹(B) = A.

One thing which is quite surprising is that image and inverse image aren’t inverses of each other. If C ⊆ A, then it need not be the case that f⁻¹(f(C)) = C, and if D ⊆ B, then it need not be true that f(f⁻¹(D)) = D.

Example. Take f : Z → Q defined by f(x) = x².

Let C = {1, 2}. Then f(C) = {1, 4}, so f⁻¹(f(C)) = {−2, −1, 1, 2} ≠ C.

Let D = {1, 2}. Then f⁻¹(D) = {−1, 1}, so f(f⁻¹(D)) = {1} ≠ D.
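The example can be reproduced with a few lines of code. This is a sketch under one stated assumption: the infinite domain Z is replaced by the finite window {−10, . . . , 10}, which is large enough to contain every element relevant to C and D. The function names image and preimage are introduced here for illustration.

```python
def image(f, C):
    """f(C) = {f(a) : a in C}."""
    return {f(a) for a in C}


def preimage(f, domain, D):
    """The inverse image {a in domain : f(a) in D}; f need not be invertible."""
    return {a for a in domain if f(a) in D}


def square(x):
    return x * x


# Finite stand-in for Z (an assumption, so that the search terminates).
domain = set(range(-10, 11))

C = {1, 2}
D = {1, 2}
round_trip_C = preimage(square, domain, image(square, C))
round_trip_D = image(square, preimage(square, domain, D))
print(round_trip_C == {-2, -1, 1, 2}, round_trip_D == {1})  # True True
```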

Something for you to think about: what property does f need to have in order to guarantee that f⁻¹(f(C)) = C for every C ⊆ A? What property does f need to have in order to guarantee that f(f⁻¹(D)) = D for every D ⊆ B?


8 Some more mathematical objects

8.1 Relations

Now we come to another important object in mathematics.

Definition. Suppose X is a set. A relation on X is a property which may or may not hold for each ordered pair of elements of X.

We think of a relation as a symbol R that we can put between any two elements of X to produce a statement which is true or false. For example, < is a relation on R: for any a, b ∈ R, we have a statement a < b which is either true or false. We write a R̸ b to mean that a R b is false.

Here are some other examples:

• if X is a set of numbers, then ≤ is a relation on X;

• if X is a set of integers, then | is a relation on X;

• if X is a set of sets, then ⊆ is a relation on X;

• if X is a set of people, then “is a parent of” is a relation on X;

• if X is any set, then = is a relation on X.

There are several properties that a relation on a set X may have.

Definition. A relation R on a set X is:

• reflexive if a R a for all a ∈ X;

• symmetric if a R b implies b R a, for a, b ∈ X;

• anti-symmetric if a R b and b R a imply that a = b, for a, b ∈ X;

• transitive if a R b and b R c imply that a R c, for a, b, c ∈ X.

Note that “anti-symmetric” does not mean the same thing as “not symmetric”.

Examples.

• Take X = R and consider the relation ≤. This relation is:

  • reflexive (a ≤ a for every a ∈ R);

  • not symmetric (since 1 ≤ 2 but 2 ≰ 1);

  • anti-symmetric (if a ≤ b and b ≤ a then a = b);

  • transitive (if a ≤ b and b ≤ c then a ≤ c).

• Take X = R and consider the relation <. This relation is not reflexive (since 1 ≮ 1). It is anti-symmetric and transitive, but not symmetric (just as for ≤).

• Take X = Z and consider the relation “|”. This relation is:

  • reflexive (a | a for any a);

  • not symmetric (2 | 4 but 4 ∤ 2);

  • not anti-symmetric (2 | −2 and −2 | 2, but 2 ≠ −2);

  • transitive (Lemma 4.1).

• Let X be the set of all the people in the world, and R the relation “is a parent of”. This relation is:

  • not reflexive (you are not your own parent);

  • not symmetric (your mum is your parent, but you are not your mum’s parent);

  • anti-symmetric (two different people can’t be each other’s parents);

  • not transitive (your nan is your mum’s parent and your mum is your parent, but your nan is not your parent).

• Take X = P(N) and consider the relation ⊆. This relation is:

  • reflexive (because A ⊆ A for any set A);

  • not symmetric (since {1} ⊆ {1, 2} but {1, 2} ⊈ {1});

  • anti-symmetric (if A ⊆ B and B ⊆ A then A = B);

  • transitive (if A ⊆ B and B ⊆ C then A ⊆ C).

• Take X = P(N) and consider the relation R defined by A R B if A and B are disjoint (remember what this means: A and B have no elements in common). This relation is:

  • not reflexive (because {1} and {1} are not disjoint, so {1} R̸ {1});

  • symmetric (if A and B are disjoint, then B and A are disjoint);

  • not anti-symmetric (because {1} and {2} are disjoint, and also {2} and {1} are disjoint, but {1} ≠ {2});

  • not transitive ({1, 2} and {3, 4} are disjoint, {3, 4} and {2, 5} are disjoint, but {1, 2} and {2, 5} are not disjoint).

• Take X = Z, and consider the relation R defined by a R b if a − b is even. This relation is:

  • reflexive (a R a for any a ∈ Z, because a − a = 0 which is even);

  • symmetric (if a − b is even then b − a is even);

  • not anti-symmetric (3 R 7 and 7 R 3 but 3 ≠ 7);

  • transitive (if a R b and b R c, then a − b and b − c are both even, so a − c = (a − b) + (b − c) is also even, so a R c).

Certain combinations of our properties of relations capture the ideas of ordering and equivalence. This leads to the important concepts of a partial ordering (a relation which is reflexive, transitive and anti-symmetric) and an equivalence relation (a relation which is reflexive, transitive and symmetric). You will meet these ideas in later modules.
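On a finite set, each of the four properties can be checked by trying every pair or triple of elements. The sketch below does this for three of the relations from the examples, restricted to a small window of integers. The window and the helper names are choices made here for illustration. A property that holds on all of Z also holds on the window, and each failure shown in the examples already has a counterexample inside it.

```python
from itertools import product


def is_reflexive(X, R):
    return all(R(a, a) for a in X)


def is_symmetric(X, R):
    return all(R(b, a) for a, b in product(X, repeat=2) if R(a, b))


def is_antisymmetric(X, R):
    return all(a == b for a, b in product(X, repeat=2) if R(a, b) and R(b, a))


def is_transitive(X, R):
    return all(R(a, c) for a, b, c in product(X, repeat=3) if R(a, b) and R(b, c))


def properties(X, R):
    """(reflexive, symmetric, anti-symmetric, transitive) for R on the finite set X."""
    return (is_reflexive(X, R), is_symmetric(X, R),
            is_antisymmetric(X, R), is_transitive(X, R))


X = list(range(-6, 7))
nonzero = [a for a in X if a != 0]  # avoids computing a remainder modulo 0

leq = properties(X, lambda a, b: a <= b)
same_parity = properties(X, lambda a, b: (a - b) % 2 == 0)
divides = properties(nonzero, lambda a, b: b % a == 0)

print(leq)          # (True, False, True, True): a partial ordering
print(same_parity)  # (True, True, False, True): an equivalence relation
print(divides)      # (True, False, False, True)
```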

8.2 Sequences

Definition. A sequence is an ordered list a_1, a_2, a_3, . . . of elements of some set X.

In this module the word sequence will always mean infinite sequence. (This is the usual convention in maths, but sometimes people use sequence to include ordered lists of finite length.)

We sometimes write (a_k)_{k∈N} or (a_k)_{k=1}^∞ for the sequence a_1, a_2, a_3, . . . .

Sometimes we will describe a sequence by giving the first few terms until there’s an obvious pattern. Sometimes we will give a formula. Sometimes we may need to describe the sequence in words.


Examples.

• The sequence of real numbers

1, 1/2, 1/3, 1/4, 1/5, . . . .

This can be written as (a_k)_{k=1}^∞, where a_k = 1/k.

• The sequence of integers

1, −1, 1, −1, 1, −1, . . . .

This can be written as (b_k)_{k=1}^∞, where b_k = (−1)^(k+1).

• The sequence of sets

{1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}, . . . .

This can be written as (C_k)_{k=1}^∞, where C_k = {1, 2, 3, . . . , k}.

• The sequence of digits

1, 4, 1, 5, 9, 2, 6, 5, . . . .

This can be described as “the sequence of digits after the decimal point in π”.

Another way of thinking of a sequence of elements of X is as a function from N to X, the image of k being the k-th element of the sequence. Our first example above corresponds to the function a : N → R, a(k) = 1/k.

Definition. A subsequence of the sequence (a_k)_{k=1}^∞ is a sequence obtained from (a_k)_{k=1}^∞ by deleting some elements and keeping the rest in the same order.

Remember that we defined a sequence as an infinite list, so when forming a subsequence we cannot delete too many elements of our original sequence.

Examples. Here are some examples of subsequences.

• Take the sequence (a_k)_{k=1}^∞ where a_k = k²:

1, 4, 9, 16, . . . .

If we delete the odd terms, we get the subsequence

4, 16, 36, 64, . . . .

This is the sequence (b_k)_{k=1}^∞, where b_k = 4k².

• Take the sequence (a_k)_{k=1}^∞, where a_k = 2^k:

2, 4, 8, 16, . . . .

If we delete just the first two terms, we get the subsequence

8, 16, 32, 64, . . . .

This is the sequence (b_k)_{k=1}^∞, where b_k = 4 × 2^k.

• Take the Fibonacci sequence (F_k)_{k=1}^∞, defined by F_1 = F_2 = 1 and F_k = F_{k−1} + F_{k−2} for k ≥ 3:

1, 1, 2, 3, 5, 8, . . . .

If we delete just the first term, we get the subsequence

1, 2, 3, 5, 8, 13, . . . .

This is the sequence (G_k)_{k=1}^∞ defined by G_1 = 1, G_2 = 2 and G_k = G_{k−1} + G_{k−2} for k ≥ 3.
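Since a sequence is infinite, one natural way to model it in code is as a generator, and a subsequence is then a filtered generator. The sketch below revisits the first example above (deleting the odd terms of the squares) and compares the first few terms with the formula 4k². Only a finite prefix is compared, and the names used are illustrative.

```python
from itertools import count, islice


def squares():
    """The sequence a_k = k^2 for k = 1, 2, 3, ... as an endless generator."""
    return (k * k for k in count(1))


# Delete the odd terms of 1, 4, 9, 16, ... and keep the rest in the same order.
even_squares = (a for a in squares() if a % 2 == 0)

first_terms = list(islice(even_squares, 6))
formula_terms = [4 * k * k for k in range(1, 7)]
print(first_terms)    # [4, 16, 36, 64, 100, 144]
print(formula_terms)  # [4, 16, 36, 64, 100, 144]
```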

Mostly we are interested in sequences of numbers. Here are various properties that a sequence of numbers may have.

Definition. A sequence (x_k)_{k=1}^∞ of numbers is

• increasing if x_k < x_l whenever k < l,

• decreasing if x_k > x_l whenever k < l,

• weakly increasing if x_k ≤ x_l whenever k < l,

• weakly decreasing if x_k ≥ x_l whenever k < l,

• constant if x_k = x_l for all k, l.

Sometimes “non-decreasing” and “non-increasing” are used instead of “weakly increasing” and “weakly decreasing”.

In your calculus modules, you will see some more properties that sequences can have, such as convergent or divergent.

Examples.

• The sequence (a_k)_{k=1}^∞ with a_k = 1/k is decreasing.

• The sequence (b_k)_{k=1}^∞ with b_k = (−1)^(k+1) is neither increasing nor decreasing.

Given any property of sequences (such as “increasing”), we can make a weaker property by putting the word “eventually” before it: this means that from some point onwards the sequence has that property. For instance, we can ask whether a sequence is eventually increasing. The formal way to say that the sequence a_1, a_2, a_3, . . . is eventually increasing is: there exists N ∈ N such that if N ≤ i < j then a_i < a_j.

For example, consider the sequence (d_k)_{k=1}^∞ given by d_k = (k − 5)². This sequence goes

16, 9, 4, 1, 0, 1, 4, 9, 16, 25, . . . .

This sequence is not increasing (since d_1 > d_2) but it is eventually increasing: if 5 ≤ i < j, then d_i < d_j.
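A finite prefix of the sequence (d_k) can be inspected directly. The sketch below checks consecutive terms only, which is enough for a strict increase because < is transitive. The cut-off 50 is an arbitrary choice, so this illustrates the claim on a prefix rather than proving it for the whole sequence.

```python
def d(k: int) -> int:
    """The sequence d_k = (k - 5)^2 from the example."""
    return (k - 5) ** 2


def increasing_between(seq, start: int, stop: int) -> bool:
    """Check seq(i) < seq(i + 1) for every i with start <= i < stop."""
    return all(seq(i) < seq(i + 1) for i in range(start, stop))


first_ten = [d(k) for k in range(1, 11)]
from_start = increasing_between(d, 1, 50)  # fails already at d_1 > d_2
from_five = increasing_between(d, 5, 50)   # holds on the whole prefix checked
print(first_ten)
print(from_start, from_five)  # False True
```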


9 Rational numbers and real numbers

Now we return to numbers, and look at some more families of numbers.

9.1 Rational numbers

We introduced the integers because subtraction is not always possible within N. Similarly, we introduce the rational numbers because division is not always possible within Z. For example 5, 2 ∈ Z but there is no integer x such that 5 = 2x. If we want division to always be possible (except for division by 0), we need to extend the integers to a larger set of numbers; this yields the rational numbers.

A rational number is a fraction a/b with a ∈ Z, b ∈ Z \ {0}. We denote the set of all rational numbers by Q. Each element of Q can be written in many different ways. For instance 1/2 = 2/4 = 101/202 = (−1)/(−2). Notice that in the simplest expression for 1/2, the numerator and denominator are coprime (remember what this means: a and b have no common factor except 1). We will later use the fact that you can always write a rational number as a/b with a and b coprime.

The arithmetic operations of addition, subtraction, multiplication and division are defined in Q. For instance, if b and d are non-zero then

a/b + c/d = (ad + bc)/(bd),

which is rational (bd ≠ 0 because both b and d are non-zero).

Why not divide by zero? Why don’t we extend the rational numbers by including a number 1/0? Then we would be able to divide 1 by 0. The trouble is that if we then apply familiar rules for multiplying and dividing, we run into problems. One familiar rule is that (a × c)/(b × c) = a/b. This gives 1/0 = 2/0. Another familiar rule is that b × (a/b) = a. Applying this with a = 1 and b = 0, we get 0 × (1/0) = 1. Similarly, 0 × (2/0) = 2. Combining these three equations gives 1 = 2, which is a problem.

So we can’t come up with a number system extending Z, keeping our familiar rules, in which you can divide by every number. But the rational numbers are still very useful, even though we don’t allow division by 0.

Here are some other properties of Q which aren’t true for Z.

• The set P = {x ∈ Q : x > 0} has no smallest element. This is because for any element m ∈ P we can find a smaller element of P, for example m/2. This means in particular that we can’t do proof by induction in Q: if we have a statement P(n) which we want to prove for every positive rational number n and we try to prove it by induction, what would our base case be?

• For any a, b ∈ Q with a < b, there exists some c ∈ Q such that a < c < b. To see this, set c = (a + b)/2. Then a < c < b. Since a and b are rational, c is also rational.

The second property says that between any two rational numbers we can find another rational number. The rational numbers occur so densely that we cannot talk about two consecutive rational numbers (compare this with integers).
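Exact rational arithmetic is available in Python’s standard fractions module, which makes the facts above easy to try out. The sketch below is only an illustration with arbitrarily chosen numbers: it shows that different fractions can represent the same rational number, applies the addition formula, and finds a rational number strictly between two others by averaging.

```python
from fractions import Fraction

# Different fractions for the same rational number compare equal.
same = Fraction(1, 2) == Fraction(2, 4) == Fraction(101, 202) == Fraction(-1, -2)

# a/b + c/d = (ad + bc)/(bd), here with a/b = 1/2 and c/d = 1/3.
total = Fraction(1, 2) + Fraction(1, 3)

# The midpoint c = (a + b)/2 lies strictly between a and b and is rational.
a, b = Fraction(1, 3), Fraction(1, 2)
c = (a + b) / 2

print(same)          # True
print(total)         # 5/6
print(c, a < c < b)  # 5/12 True
```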

9.2 An irrational number

Consider a right-angled isosceles triangle whose short sides have length 1, and let x be the length of the hypotenuse. By Pythagoras’s Theorem x² = 1² + 1² = 2. The following theorem shows that x can’t be rational.

[Figure: a right-angled triangle with short sides of length 1 and hypotenuse of length x.]

Theorem 9.1. There is no q ∈ Q with q² = 2.


You may have seen a proof of this before; it’s a classic example of proof by contradiction. But the statement of the theorem may look a little strange. This theorem is usually written as “√2 is irrational”. But that raises the question “what is √2 then?” We’ve sort-of answered this above with a bit of geometry. But Theorem 9.1 as we have written it is cleaner in that it’s a statement entirely about rational numbers; it doesn’t assume there is a thing called √2.

It’s also a helpful way of writing the theorem because it suggests a method of proof. A very good way to prove a statement of the form “something doesn’t exist” is to use proof by contradiction: if we suppose such a thing does exist, then we have something we can work with, and hopefully derive a contradiction.

Proof of Theorem 9.1. Suppose, for a contradiction, that there does exist q ∈ Q with q² = 2. Then we can write q = a/b where a, b ∈ Z and b ≠ 0. In fact we can do this in such a way that gcd(a, b) = 1: if gcd(a, b) > 1, then we can just divide a and b by gcd(a, b) to express q with a simpler fraction.

The assumption q² = 2 now gives a² = 2b². So a² is even and (by Theorem 3.2) a is even. So we can write a = 2k for some k ∈ Z and so

(2k)² = 2b²,
so 4k² = 2b²,
so 2k² = b².

So b² is even, which means that b is even.

Since a and b are both even, we have gcd(a, b) ≥ 2. This contradicts our assumption that gcd(a, b) = 1.

We conclude that no such q exists.

9.3 Real Numbers

Our aim is to define the set of real numbers, which extends the rational numbers in such a way that the number line has no “gaps”; in particular, this allows us to find numbers that solve equations like x² = 2. There are actually several ways to rigorously define the real numbers. We will use the most familiar one of decimal expansions.

Let’s think a bit about what decimal expansions mean. Let’s start with terminating decimal expansions. When we write a number like 0.125, we mean

0 + 1/10 + 2/100 + 5/1000.

You can simplify this fraction and show that it equals 1/8.

But some rational numbers have an infinite decimal expansion. For example,

1/3 = 0.3333 . . . .

This means

1/3 = 0 + 3/10 + 3/100 + 3/1000 + . . . .

We won’t say precisely what an infinite sum like this means; you’ll see that in your calculus modules.


Here are some more decimal expansions of rational numbers:

1/2 = 0.5 = 0.5000000 . . .

2/3 = 0.666666 . . .

1/11 = 0.090909 . . .

7/6 = 1.166666 . . .

You will notice that all of these are eventually periodic. In other words, each of them consists of a section which repeats infinitely often, possibly preceded by some initial non-repeating section. Any terminating decimal such as 0.5 has this form because we can write it as 0.5000 . . . (the repeating part is 0, the initial non-repeating part is the 0.5).

In fact, the decimal expansion of any rational number is eventually periodic, and any eventually periodic decimal expansion is the decimal expansion of a rational number. (The proof of this is beyond the scope of this module.)

Now we can define the real numbers to be all decimal expansions, including those which are not eventually periodic.

Definition. A real number is an infinite decimal; that is, an expression of the form n.a_1 a_2 a_3 . . . where n ∈ Z and a_1, a_2, a_3, . . . ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.

The set of all real numbers is denoted by R. The elements of the set R \ Q are called irrational numbers.

This defines a set, but strictly speaking we still need to define how to add, subtract, multiply and divide,and check that these operations satisfy all the familiar properties.

You might wonder whether different decimal expansions always give different real numbers. The answer is usually yes. The one oddity comes with any rational number whose decimal expansion ends with an infinite string of 9s. For instance

0.99999 . . . .

In this case we can deal with the infinite sum because it is a geometric progression. Using the formula for the sum of a geometric progression with common ratio 1/10 we get

0.99999 . . . = (9/10)/(1 − 1/10) = 1.

So the decimal expansions 0.99999 . . . and 1.000000 . . . both represent the same real number 1. Similarly any terminating decimal can also be represented by a decimal ending with an infinite string of 9s (for example 2.19999 . . . = 2.2). Apart from this, any different decimals really do represent different numbers.
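To get a feel for why 0.99999 . . . equals 1, it may help to compute some partial sums exactly. The sketch below (an illustration only; the function name partial_sum is introduced here) shows that the gap between 1 and the sum of the first n terms is exactly 1/10^n, which shrinks towards 0 as n grows.

```python
from fractions import Fraction


def partial_sum(n: int) -> Fraction:
    """9/10 + 9/100 + ... + 9/10^n, computed exactly."""
    return sum(Fraction(9, 10 ** k) for k in range(1, n + 1))


# Gap between 1 and the first n terms of 0.999..., for n = 1, ..., 7.
gaps = [1 - partial_sum(n) for n in range(1, 8)]
print(gaps[0], gaps[-1])  # 1/10 1/10000000
```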

How does the idea of infinite decimals give us a number whose square is 2?

We will try to construct a real number (infinite decimal) whose square is exactly 2. If we can do this we will write this number as √2. To do this, we work out √2 one digit at a time:

• 1² < 2 < 2² and so we should take √2 = 1. . . .

• (1.4)² < 2 < (1.5)² and so we should take √2 = 1.4 . . .

• (1.41)² < 2 < (1.42)² and so we should take √2 = 1.41 . . .

• (1.414)² < 2 < (1.415)² and so we should take √2 = 1.414 . . .

and so on.

If we repeat this infinitely many times to get an infinite decimal then it seems reasonable that the square of this should be exactly 2 (proving this involves interpreting exactly what we mean by the infinite sum).

Another way of thinking of this is that we have an infinite sequence of rational numbers:

1, 1.4, 1.41, 1.414, . . .

This sequence looks as if it should tend to a limit (and once you have set the definitions up properly it does); that limit is the real number √2.
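The digit-by-digit procedure is mechanical enough to write down as a short program. The following sketch uses exact rational arithmetic, so no rounding is involved. The function name sqrt2_truncated is a choice made here, and the code is one possible reading of the procedure described above rather than a definitive implementation.

```python
from fractions import Fraction


def sqrt2_truncated(places: int) -> Fraction:
    """Largest decimal with `places` digits after the point whose square is below 2."""
    x = Fraction(1)  # 1^2 < 2 < 2^2, so the integer part is 1
    for p in range(1, places + 1):
        step = Fraction(1, 10 ** p)
        # Raise the p-th digit for as long as the square stays below 2.
        while (x + step) ** 2 < 2:
            x += step
    return x


approximations = [sqrt2_truncated(p) for p in range(4)]
print([float(x) for x in approximations])  # [1.0, 1.4, 1.41, 1.414]
```

Each approximation x with p decimal places satisfies x² < 2 < (x + 1/10^p)², which is exactly the pair of inequalities used in the list above.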

9.4 Upper bounds

Now we introduce an important property that R has but Q doesn’t. But to introduce this, we’ll go back to the integers and talk about upper bounds.

Any finite non-empty set X of integers has a maximum: this means an element x ∈ X such that y ≤ x for all y ∈ X. We normally write max X for the maximum of X. For example,

max{1, 4, 2, 7, 3} = 7.

What about infinite sets of integers? Sometimes they have a maximum and sometimes they don’t. For example, the set N has no maximum, but the set {−1, −2, −3, . . . } does, namely −1. To study this further, we need to introduce upper bounds.

Definition. Suppose X ⊆ R, and u ∈ R. We say that u is an upper bound for X if x ≤ u for all x ∈ X. X is bounded above if it has an upper bound.

Examples.

• {1, 2, 4} is bounded above: for example, 4 is an upper bound.

• {−1, −2, −3, . . . } is bounded above: 0 is an upper bound.

• N is not bounded above.

• The set of prime numbers is not bounded above: because there are infinitely many primes, for any real number u there is a prime number bigger than u.

• The set {x ∈ R : 0 < x < 1} is bounded above: 1 is an upper bound.

• The set {x ∈ Q : x² ≤ 2} is bounded above: 3 is an upper bound.

• ∅ is bounded above: for example, 16 is an upper bound.

A non-empty set of integers has a maximum if and only if it is bounded above. But the same is not true with sets of rational or real numbers. For example, the set

X = {x ∈ Q : 0 < x < 1}

is bounded above (for example, 1 is an upper bound) but it has no maximum. The number 1 is a bit like the maximum, but it’s not a maximum because it doesn’t belong to X.

To get round this, we introduce the idea of a supremum.


Definition. Suppose X is a non-empty set of real numbers which is bounded above. A supremum for X is a real number s such that:

• s is an upper bound for X, and

• if t is any upper bound for X, then t ≥ s.

If X has a supremum then it’s unique: if s and t are both suprema of X, then the definition implies that s ≤ t and also t ≤ s, so s = t. We write sup X for the supremum of X, if it exists. If X is not bounded above, then we write sup X = ∞. If X = ∅, then we write sup X = −∞.

The supremum of X is also called the least upper bound of X: it’s an upper bound, and it’s smaller than any other upper bound.

Examples.

• Suppose X has a maximum x. Then x is a supremum of X: certainly x is an upper bound (this is part of the definition of “maximum”), and if t is any other upper bound for X then t ≥ x because x ∈ X.

• If X = {x ∈ R : 0 < x < 1}, then 1 is a supremum of X. Certainly x ≤ 1 for every x ∈ X, so 1 is an upper bound. Now suppose t is another upper bound. If t < 1, then take a real number u with u > 0 and t < u < 1. Then u ∈ X and u > t, so t is not an upper bound for X, a contradiction. So t ≥ 1 for any upper bound t, which means that 1 is a supremum.

• ∅ has no supremum: because any real number u is an upper bound for ∅, there is no least upper bound: given any upper bound, we can always find a smaller one.

Informally, you should think of “supremum” as being very similar to “maximum”: if X has a maximum then the maximum is the supremum, but sometimes the supremum is defined even when the maximum isn’t. The second example above shows that sup X does not need to be an element of X.

Now we show why we need the real numbers. Let

C = {x ∈ Q : x² ≤ 2}.

Then C is bounded above. C doesn’t have a maximum, but this is slightly tricky to prove. To prove it, we need to show that given any x ∈ C, we can find a larger element v ∈ C. In fact v = (2x + 2)/(x + 2) will do the job. (An exercise for you is to check this: show that if x ∈ C then (2x + 2)/(x + 2) ∈ C and (2x + 2)/(x + 2) > x.)
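Before attempting the exercise it can be reassuring to try the formula on a few elements of C. The sketch below starts from x = 1 (an arbitrary starting point) and applies v = (2x + 2)/(x + 2) repeatedly using exact fractions. It is a numerical illustration only, not a solution to the exercise.

```python
from fractions import Fraction


def bigger_element(x: Fraction) -> Fraction:
    """The candidate v = (2x + 2)/(x + 2) for an element of C larger than x."""
    return (2 * x + 2) / (x + 2)


# Start from x = 1 (which lies in C) and apply the formula repeatedly.
chain = [Fraction(1)]
for _ in range(5):
    chain.append(bigger_element(chain[-1]))

stays_in_C = all(x * x <= 2 for x in chain)
strictly_grows = all(x < y for x, y in zip(chain, chain[1:]))
print(chain[1], chain[2], chain[3])  # 4/3 7/5 24/17
print(stays_in_C, strictly_grows)    # True True
```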

C does have a supremum, namely √2.

The point of this example is that to find the supremum (even of a set of rational numbers), we may need to work over R rather than Q. The set C is a set of rational numbers whose supremum is irrational.

You might wonder whether the real numbers are enough: could there be a non-empty set of real numbers which is bounded above but doesn’t have a supremum in R? The following theorem says that the answer is no.

Theorem 9.2 (Principle of the Supremum). If X is a non-empty set of real numbers which is bounded above, then X has a supremum.

We will not prove this. (In fact one approach to defining R uses this principle as part of the definition.)

Everything we’ve done in this section we could also do with lower bounds. If X ⊆ R, we can define the minimum of X in the obvious way. A lower bound for X is a real number u such that u ≤ x for all x ∈ X; we say that X is bounded below if it has a lower bound. The infimum of X (if it exists) is the greatest lower bound. We write this as inf X. We define inf X = −∞ if X is not bounded below, and inf ∅ = ∞. For example, with the set C as above, the infimum inf C equals −√2. Then Theorem 9.2 holds with “above” replaced by “below” and “supremum” replaced by “infimum”.


10 Complex numbers

10.1 Definition and operations

Now we introduce the complex numbers, which you may have seen before. Just as with other number systems, this arises by defining new numbers to enable us to solve equations that we can’t solve with our existing number system. For example, we extend N to Z in order to be able to do subtractions like 2 − 3, and we extend from Z to Q to enable us to do divisions like 1 ÷ 2.

In R, the square of any number is non-negative, so for example −1 does not have a square root. So we extend R by adding a new number i which is defined to be a square root of −1.

Definition. A complex number is an expression a + bi where a, b ∈ R. We write C for the set of all complex numbers.

If a ∈ R, then we usually write a + 0i just as a. So we think of R as a subset of C.

If z = a + bi with a, b ∈ R, then a is called the real part of z (written Re(z)), and b is called the imaginary part of z (written Im(z)).

We define addition and multiplication in C by

• (a + bi) + (c + di) = (a + c) + (b + d)i,

• (a + bi)(c + di) = (ac − bd) + (ad + bc)i.

The letter i is chosen to mean “imaginary”; a number of the form bi, for b ∈ R, is called an imaginary number. (Usually in physics j is used instead of i.)

The idea of the definition is that i will stand for √−1. Given this, if we want multiplication to behave as it does in R with respect to brackets, we must take:

(a + bi) × (c + di) = ac + bd i² + ad i + bc i = (ac − bd) + (ad + bc)i.

Examples.

• (4 + 3i)(3 + 2i) = (12 − 6) + (8 + 9)i = 6 + 17i

• (4 + 3i)(4 − 3i) = (16 + 9) + (−12 + 12)i = 25 + 0i = 25

This example shows we can have z, w ∈ C \ R with zw ∈ R.

• i × i = (0 + i)(0 + i) = (0 − 1) + (0 + 0)i = −1 (as expected)

• (−i) × (−i) = (0 − i)(0 − i) = (0 − 1) + (0 + 0)i = −1

(So −1 has two square roots in C, namely i and −i.)

We defined the operations of addition and multiplication only, but we can work out from these how subtraction and division should be defined. If we want subtraction to satisfy z − w = z + (−1) × w then we need to define it by:

(a + bi) − (c + di) = (a − c) + (b − d)i.

Division is a little bit trickier to define. Dividing by real numbers is easy: if a, b, c ∈ R and c ≠ 0, then we define

(a + bi) ÷ c = a/c + (b/c)i.

In other words, we separately divide the two numbers a and b by c. (As in R, we still cannot divide by 0.)

To divide by any non-zero complex number, we make the following observation about multiplication:

(a + bi) × (a − bi) = a² + b² + (ba − ab)i = a² + b².

So

(a + bi) × (a − bi)/(a² + b²) = 1.

So if division is going to be defined, we will have

1/(a + bi) = (a − bi)/(a² + b²).

This tells us how to do division: as in R, dividing by x should be the same as multiplying by 1/x. So to divide by a + bi we multiply by

(a − bi)/(a² + b²).

(Note that because a and b are real, the denominator a² + b² can only be zero if a = b = 0. So we can divide by any non-zero complex number.)

To simplify a fraction involving complex numbers, multiply the top and bottom by some factor which makes the denominator into a real number. If the denominator is a + bi then the factor a − bi will do the trick. For example

(2 + 3i)/(1 − 8i) = (2 + 3i)(1 + 8i)/((1 − 8i)(1 + 8i)) = (−22 + 19i)/65 = −22/65 + (19/65)i.
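The multiplication and division rules can be turned into code almost word for word. The sketch below represents a + bi as a pair (a, b) of exact fractions instead of using Python’s built-in floating-point complex type, so the worked examples can be checked exactly. The pair representation and the function names mul and div are choices made here for illustration.

```python
from fractions import Fraction


def mul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i, with a + bi stored as (a, b)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


def div(z, w):
    """Divide z by w = c + di by multiplying by (c - di)/(c^2 + d^2)."""
    c, d = w
    n = c * c + d * d  # for real c and d this is zero only when w = 0
    if n == 0:
        raise ZeroDivisionError("cannot divide by the complex number 0")
    return mul(z, (Fraction(c) / n, Fraction(-d) / n))


print(mul((4, 3), (3, 2)))   # (6, 17)
print(mul((4, 3), (4, -3)))  # (25, 0)
quotient = div((2, 3), (1, -8))
print(quotient == (Fraction(-22, 65), Fraction(19, 65)))  # True
```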

It can be checked that these arithmetic operations on C satisfy the usual properties, such as (uv)w = u(vw) and u(v + w) = uv + uw.

Definition. Let z = a + bi ∈ C be a complex number. The complex conjugate of z is the number a − bi, denoted by z̄.

As we saw when we introduced division, we have

z z̄ = (a + bi)(a − bi) = a² + b².

So for any z ∈ C we have z z̄ ∈ R. Moreover, z z̄ ≥ 0, and z z̄ is zero only if z = 0.

10.2 The complex plane

The real numbers can be thought of as a one-dimensional “number line”. How can we visualise complex numbers in a similar way? We need to represent the real and imaginary parts, and the most natural way to do this is with a two-dimensional plane. We will see that this does indeed give a useful geometric way of representing complex numbers, and that the operations we have seen (addition, multiplication, conjugation) all have simple interpretations in this context.

The complex plane simply means the plane R², but with each point (a, b) representing the complex number a + bi. The x-axis is now called the real axis, and the y-axis is called the imaginary axis.

[Figure: the complex plane with real and imaginary axes, showing the point z = a + bi = r(cos θ + i sin θ), its real part a, its imaginary part bi, and the angle θ at the origin.]

Consider the line from the origin to the point z = a + bi. From the figure, we can see that another way of specifying z is to give the length r of this line, and the angle θ between it and the positive real axis (measured anti-clockwise). This leads to the expression

z = r(cos θ + i sin θ)

called the polar form of z.

Definition. Let z = a + bi ∈ C be a complex number.

• The modulus of z is √(a² + b²); it is denoted by |z|.

• If z ≠ 0, the argument of z is the unique θ with b/|z| = sin θ, a/|z| = cos θ and −π < θ ≤ π; it is denoted by arg(z).

(arg(0) is undefined. Some people take arg(z) so that 0 ≤ arg(z) < 2π, rather than −π < arg(z) ≤ π. This doesn’t make much practical difference.)

Recall that we defined addition of complex numbers by

(a + bi) + (c + d i) = (a + c) + (b + d)i.

In other words, we add the real and imaginary parts separately, which is just like adding vectors in R2.Thinking about this geometrically, the origin and the points of the complex plane representing z, w andz + w form a parallelogram.

[Figure: the parallelogram in the complex plane formed by the origin and the points z, w and w + z.]

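To multiply complex numbers in the complex plane, it is much easier to use polar form. If $z = r(\cos\alpha + i\sin\alpha)$ and $w = s(\cos\beta + i\sin\beta)$ then

$$\begin{aligned}
zw &= r(\cos\alpha + i\sin\alpha)\,s(\cos\beta + i\sin\beta) \\
&= rs\bigl((\cos\alpha\cos\beta - \sin\alpha\sin\beta) + i(\cos\alpha\sin\beta + \cos\beta\sin\alpha)\bigr) \\
&= rs\bigl(\cos(\alpha + \beta) + i\sin(\alpha + \beta)\bigr).
\end{aligned}$$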

So, to multiply two complex numbers we multiply the moduli and add the arguments. Geometrically, to multiply z by w we stretch z by a factor |w| and rotate it anti-clockwise by angle arg(w).

One particular consequence of this is that |zw| = |z||w|.

Unlike addition, multiplication of complex numbers is not just using something we already know about for vectors in $\mathbb{R}^2$. Thinking of complex numbers as points in the complex plane does not just mean reducing them to vectors; multiplication gives some extra structure.
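The rule "multiply the moduli and add the arguments" can be checked numerically. One point to watch is that the sum of two arguments may fall outside the range −π < θ ≤ π, in which case a multiple of 2π has to be added or subtracted to recover arg(zw). The Python sketch below handles this with a helper called `wrap_to_principal`, a hypothetical name introduced here; the sample values of z and w are also ours.

```python
import cmath
import math

def wrap_to_principal(theta):
    """Shift theta by a multiple of 2*pi so that it lies in (-pi, pi]."""
    return theta - 2 * math.pi * math.ceil((theta - math.pi) / (2 * math.pi))

# Sample values: arg(z) = 3*pi/4 and arg(w) = pi/2, so the sum exceeds pi.
z = complex(-1.0, 1.0)
w = complex(0.0, 1.0)

product = z * w

# Moduli multiply.
modulus_gap = abs(abs(product) - abs(z) * abs(w))

# Arguments add, up to a multiple of 2*pi.
predicted_arg = wrap_to_principal(cmath.phase(z) + cmath.phase(w))
argument_gap = abs(cmath.phase(product) - predicted_arg)

print(modulus_gap, argument_gap)  # both should be close to zero
```

Here zw = −1 − i, whose argument is −3π/4; this is the sum 5π/4 reduced by 2π.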

Finally, the complex conjugate has a geometric interpretation. If $z = r(\cos\theta + i\sin\theta)$ then

$$\bar{z} = r(\cos\theta - i\sin\theta) = r(\cos(-\theta) + i\sin(-\theta)).$$

In other words, we form the conjugate by negating the argument and leaving the modulus unchanged, so $|\bar{z}| = |z|$ and $\arg(\bar{z}) = -\arg(z)$ (except when z is a negative real number, where both arguments equal π). Geometrically, this corresponds to reflection in the real axis.

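[Figure: the point z = a + bi = r(cos θ + i sin θ) and its conjugate $\bar{z}$ = a − bi = r(cos θ − i sin θ), each at angle θ from the real axis on opposite sides, with a marked on the real axis and b, −b on the imaginary axis.]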

Thinking of raising to a power as being repeated multiplication we get the following result.

Theorem 10.1 (De Moivre’s Theorem). For all n ∈ N and θ ∈ R,

$$(\cos\theta + i\sin\theta)^n = \cos(n\theta) + i\sin(n\theta).$$

Proof. Let P(n) be the statement that $(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta$. We will prove that this holds for all n ∈ N by induction.

Base case: P(1) says that cos θ + i sin θ = cos θ + i sin θ, which is true.

Inductive step: Suppose that n ≥ 2 and that P(n − 1) holds. Then

$$\begin{aligned}
(\cos\theta + i\sin\theta)^n &= (\cos\theta + i\sin\theta)^{n-1} \times (\cos\theta + i\sin\theta) \\
&= \bigl(\cos((n-1)\theta) + i\sin((n-1)\theta)\bigr) \times (\cos\theta + i\sin\theta) && \text{(by the induction hypothesis)} \\
&= \cos(n\theta) + i\sin(n\theta) && \text{(by the polar expression for multiplication).}
\end{aligned}$$

So P(n) holds.

We conclude that $(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta$ for all n ∈ N by induction.
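As an illustration of how the theorem is used (the example is ours, not part of the result), it lets us compute $(1 + i)^8$ without expanding the brackets. Since $|1 + i| = \sqrt{2}$ and $\arg(1 + i) = \pi/4$, we can write 1 + i in polar form and apply De Moivre's Theorem with n = 8:

```latex
\begin{aligned}
(1 + i)^8 &= \left(\sqrt{2}\left(\cos\tfrac{\pi}{4} + i\sin\tfrac{\pi}{4}\right)\right)^8 \\
&= (\sqrt{2})^8 \left(\cos\tfrac{8\pi}{4} + i\sin\tfrac{8\pi}{4}\right) \\
&= 16\,(\cos 2\pi + i\sin 2\pi) \\
&= 16.
\end{aligned}
```

This agrees with the direct calculation $(1 + i)^2 = 2i$, which gives $(1 + i)^8 = (2i)^4 = 16$.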

10.3 Roots of unity (non-examinable)

In maths, we often use “unity” as a fancy word for the number 1. The question here is: given n ∈ N, which complex numbers z satisfy $z^n = 1$? These numbers z are called the nth roots of unity.

If we replace C with R then the only nth roots of unity are 1 and (if n is even) −1. In C, we can answer the question using De Moivre’s Theorem.

Write z in polar form as r(cos θ + i sin θ). Then |z| = r, so $|z^n| = r^n$. So if $z^n = 1$, then $r^n = 1$, where r is a non-negative real number. The only way this can happen is if r = 1, so z = cos θ + i sin θ.

Now we can use De Moivre: if $z^n = 1$, then cos(nθ) + i sin(nθ) = 1. Looking at the real part and the imaginary part, this means

cos(nθ) = 1, sin(nθ) = 0.

From basic trigonometry, we know that cos α = 1 if and only if α = 2πm for some integer m, and in this case also sin α = 0.

So we can answer our question: $z^n = 1$ if and only if z = cos θ + i sin θ, where nθ = 2mπ for some integer m. So

$$z = \cos\left(\frac{2\pi m}{n}\right) + i\sin\left(\frac{2\pi m}{n}\right) \quad \text{for some integer } m.$$

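Thinking about this in the complex plane, the solutions to $z^n = 1$ are all at distance 1 from 0, going round in angles of 2π/n. Increasing m by n changes the angle by 2π and so gives the same z, which means the values m = 0, 1, ..., n − 1 already give every solution. So there are exactly n solutions, and (for n ≥ 3) they form a regular n-sided polygon.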
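The formula for the solutions translates directly into a short computation. The Python sketch below lists the nth roots of unity by taking m = 0, 1, ..., n − 1. The function name `roots_of_unity` is our own, and `cmath.rect(r, phi)` is the standard-library function that returns the complex number with modulus r and argument phi.

```python
import cmath
import math

def roots_of_unity(n):
    """Return cos(2*pi*m/n) + i*sin(2*pi*m/n) for m = 0, 1, ..., n - 1."""
    return [cmath.rect(1.0, 2 * math.pi * m / n) for m in range(n)]

# Each root z should satisfy z**n = 1, up to floating-point rounding.
n = 6
for z in roots_of_unity(n):
    print(z, abs(z ** n - 1))
```

Because the values are computed in floating point, entries that should be exactly 0 (such as the real part of i when n = 4) may appear as tiny non-zero numbers.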

For example, when n = 4, the four solutions are z = 1, i, −1, −i.

[Figure: the four solutions 1, i, −1 and −i marked in the complex plane.]

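When n = 3, the solutions are 1, cos(2π/3) + i sin(2π/3) and cos(4π/3) + i sin(4π/3). Remembering that

$$\cos(2\pi/3) = -\cos(\pi/3) = -\tfrac{1}{2}, \qquad \sin(2\pi/3) = \sin(\pi/3) = \tfrac{\sqrt{3}}{2},$$

$$\cos(4\pi/3) = -\cos(\pi/3) = -\tfrac{1}{2}, \qquad \sin(4\pi/3) = -\sin(\pi/3) = -\tfrac{\sqrt{3}}{2},$$

we get the following picture of the cube roots of unity.

[Figure: the cube roots of unity 1, $\frac{-1 + i\sqrt{3}}{2}$ and $\frac{-1 - i\sqrt{3}}{2}$ marked in the complex plane.]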

In general, any polynomial equation of the form $a_n z^n + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0 = 0$, with complex coefficients and $a_n \neq 0$, has n solutions in C (counting repeated roots). This result is called the Fundamental Theorem of Algebra.
