introduction to analysishajlasz/teaching/math1530fall2018/selection.pdfanother example, less obvious...

Introduction to Analysis

Piotr Haj lasz

Contents

1 Logic 7

1.1 Propositional calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.2.1 Direct proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.2.2 Proof by contradiction . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.2.3 Proof by contrapositive . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.3 Context and quantifiers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2 Set theory 21

2.1 Elementary set theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.2 Russell’s paradox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.3 Axioms of the set theory∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3 Cartesian products and functions 33

3.1 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.2 More logic∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.3 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4 Mathematical induction 41

4.1 Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.2 Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3

4 CONTENTS

4.3 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

5 Relations, integers and rational numbers 53

5.1 Binary relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

5.2 Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.3 Rational numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

6 Real numbers 65

6.1 The first sixteen axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

6.2 Natural numbers, integers and rational numbers . . . . . . . . . . . . . . . . . 74

6.3 Supremum, infimum and the last axiom . . . . . . . . . . . . . . . . . . . . . 75

6.4 Natural and rational numbers among real numbers . . . . . . . . . . . . . . . 80

6.5 Formal construction of real numbers . . . . . . . . . . . . . . . . . . . . . . . 81

7 Cardinality 85

8 Sequences and limits 95

8.1 Limits of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

8.2 The Stolz theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

8.3 Monotone sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

8.4 Subsequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

8.5 Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

8.6 Decimal expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

8.7 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

8.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

9 Exponential and logarithmic functions 121

9.1 The definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

9.2 Number e and natural logarithm . . . . . . . . . . . . . . . . . . . . . . . . . 124

9.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

CONTENTS 5

10 Subsequences, series and the Cauchy condition 131

10.1 The Cauchy condition and the Bolzano-Weierstrass theorem . . . . . . . . . . 131

10.2 The upper and the lower limits . . . . . . . . . . . . . . . . . . . . . . . . . . 133

10.3 Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

10.4 Alternating series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

10.5 Multiplication of series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

10.6 Changing the order in a series . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

10.7 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

11 Limits and continuity 151

11.1 Limits of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

11.2 One-sided limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

11.3 Limits involving infinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

11.4 Continuous functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

11.5 Trigonometric and exponential functions . . . . . . . . . . . . . . . . . . . . . 162

11.6 Composition of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

11.7 Continuity of monotone and inverse functions . . . . . . . . . . . . . . . . . . 165

11.8 Inverse functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

11.9 Intermediate value property . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

11.10Logarithmic function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

11.11Maximum, minimum and uniform continuity . . . . . . . . . . . . . . . . . . 169

11.12Uniform convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

11.13Power series I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

11.14Weierstrass theorem: approximation by polynomials . . . . . . . . . . . . . . 181

12 Derivative 185

12.1 Derivatives of elementary functions I . . . . . . . . . . . . . . . . . . . . . . . 190

12.2 Mean value property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

12.3 Derivatives of elementary functions II: inverse functions . . . . . . . . . . . . 194

12.4 L’Hospital’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

6 CONTENTS

12.5 Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

12.6 Power series II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

12.7 Taylor’s theorem(s) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

12.7.1 Peano’s remainder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

12.7.2 Little o calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

12.7.3 The Lagrange and the Cauchy remainders . . . . . . . . . . . . . . . . 213

12.7.4 Applications and examples . . . . . . . . . . . . . . . . . . . . . . . . 215

12.8 Differentiability of a limit function . . . . . . . . . . . . . . . . . . . . . . . . 218

13 Integral 223

13.1 Existence of an antiderivative . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

13.2 Definite and indefinite integral . . . . . . . . . . . . . . . . . . . . . . . . . . 227

13.2.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

13.2.2 Geometric interpretation of the integral . . . . . . . . . . . . . . . . . 231

13.3 Basic results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

13.3.1 Taylor’s theorem with an integral remainder . . . . . . . . . . . . . . . 236

13.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

13.4.1 Wallis’ formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

13.4.2 Stirling’s formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

13.4.3 π is irrational . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239

13.5 Improper integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

13.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244

Chapter 1

Logic

1.1 Propositional calculus

All mathematical reasoning is based on logic. Below we provide a brief introduction to themain concepts of logic. We will try to keep a balance between rigorousness and intuition.

Mathematics consists of statements that are true or false, and we need to find out whichstatements are true. If a “statement” is neither true of false, it is not a statement in themathematical sense.

2 + 2 = 4 is a true statement,

1 + 2 = 4 is a false statement.

Let’s have a lunch. It is not a statement, because it has no logical value of being true orfalse. This sentence is false. It is also not a statement. Assuming it is true we arrive to acontradiction and assuming it is false we also arrive to a contradiction, so it is neither trueor false and thus it is not a statement in the sense of logic. The problem with this sentenceis that it refers to itself. Although this sentence is not mathematical in its nature, later wewill see that some mathematical sentences1 refer to themselves causing antinomies.

Some statements depend on variables and they are sometimes true and sometimes false,depending on the value of the variable. Consider the statement

x2 = 1.

It is true if x = 1 or if x = −1, false otherwise.

We say that statements p and q are logically equivalent if p is true if and only if q is true.We denote logically equivalence by p ≡ q or by p⇔ q. For example the two statements

(x2 = 1) and (x ∈ {−1, 1})

are logically equivalent, so we can write

(x2 = 1) ≡ (x ∈ {−1, 1}).1See Russel’s paradox in Section 2.2.

7

8 CHAPTER 1. LOGIC

Given statements p and q, we can build new statements.

Conjunction: “p and q” dentoed “p ∧ q”.

Disjunction: “p or q” dentoed “p ∨ q”.

The statement p ∧ q is true if and only if both statements p and q are true

(2 + 2 = 4) ∧ (2 + 2 = 5)

is false, while(2 + 2 = 4) ∧ (2 + 3 = 5)

is true. The statement p ∨ q is true if and only if at least one of the statements p and q istrue, so

(2 + 2 = 4) ∨ (2 + 2 = 5)

is true, but(2 + 2 = 5) ∨ (2 + 2 = 6)

is false.

To a true statement we assign a logical value 1 and 0 to a false one. Thus we maysummarize the above discussion using a truth table:

p q p ∧ q p ∨ q0 0 0 00 1 0 11 0 0 11 1 1 1

Negation of a statement p is the statement

“not p” denoted ¬p

that is true if and only if p is false, so we have

p ¬p0 11 0

Another, less obvious construction is the implication

(p implies q) or (If p, then q) or (p⇒ q).

Here p is called hypothesis and q is called conclusion. When is the implication true? Let’sconsider an example.

If I win the lottery︸︷︷︸p

, then︸︷︷︸⇒

I will buy a house︸︷︷︸q

.

If I win the lottery, but decide not to buy a house, I break my promise making the abovestatement false,

p is true, q is false; p⇒ q is false.

1.1. PROPOSITIONAL CALCULUS 9

If I win the lottery and buy a house I fulfill my promise and the statement is true,

p is true, q is true; p⇒ q is true.

If I do not win in the lottery, then I have no obligation to buy a house. Thus no matter whatI do, I will not break my promise. I can decide to buy a house or not to buy a house and ineither case the statement is true.

p is false, q is true or false; p⇒ q is true.

Thus the only situation when the implication is false is when the hypothesis p is true, butthe conclusion q is false. In all other cases the implication is true.

p q p⇒ q

0 0 10 1 11 0 01 1 1

That seems obvious, but there are situations when it is not.

Example 1.1. The following implication is true since false assumption always implies a trueconclusion

2 + 2 = 5 =⇒ 2 + 2 = 4.

Although is does seem right, we can deduce the equality 2 + 2 = 4 from the assumption that2 + 2 = 5 as follows. If 2 + 2 = 5, then 0 · (2 + 2) = 0 · 5, 0 = 0. Adding 4 yields 4 = 4.Replacing the left hand side with 2 + 2 (we can do it because we know that 2 + 2 = 4) gives2 + 2 = 4.

Example 1.2.

2 + 2 = 4 =⇒ Ronald Reagan was a president.

The implication is certainly true, because both the hypothesis and the conclusion are true,but there is no way we can logically deduce from a simple mathematical equality 2 + 2 = 4,the historical fact that Reagan was a president. That does not matter. In the mathematicallogic, for the implication to be true we do not require the conclusion to be deducible fromthe hypothesis.

Example 1.3. Let us consider the following bizarre statement:

If I will eat my car, my right hand will turn into a leg.

That sounds quite insane, but the statement is true. Since I will not ear my car2, thehypothesis is false, and thus the implication is true.

2Michel Lotito eat Cesna 150 so eating a car is possible, but I am not that crazy.

10 CHAPTER 1. LOGIC

If p and q are statements, then the equivalence p ≡ q is also a statement with obviousrules

p q p ≡ q0 0 10 1 01 0 01 1 1

Indeed, p ≡ q is true if and only if both p and q are true or false at the same time. Intuitivelyit is obvious that the following statements are equivalent

(p ≡ q) and((p⇒ q) ∧ (q ⇒ p)

).

The reader may verify the equivalence formally using a truth table in a similar way to theverification of the De Morgan Laws in (1.1) provided below.

Another, easy to check example of equivalent statements is that p is logically equivalentto ¬(¬p),

p ≡ ¬(¬p).

As an application of the above discussion we will prove De Morgan’s Laws:

¬(p ∧ q) ≡ (¬p) ∨ (¬q)

¬(p ∨ q) ≡ (¬p) ∧ (¬q).

We will discuss the first law only, the argument for the second one is similar. The law saysthat the conjunction p∧ q is false if and only if p is false or q is false, and that seems obvious,but we can (and we should to be mathematically rigorous) justify De Morgan’s Laws formallyusing a truth table.

(1.1)

p q p ∧ q ¬(p ∧ q) ¬p ¬q (¬p) ∨ (¬q)0 0 0 1 1 1 10 1 0 1 1 0 11 0 0 1 0 1 11 1 1 0 0 0 0

By looking at the fourth and the seventh column in the above table we see that the statements¬(p ∧ q) and (¬p) ∨ (¬q) are both true or both false at the same time, so they are logicallyequivalent. That proves the first of the two De Morgan’s laws.

Another example, less obvious than De Morgan’s Laws, is that the following statementsare equivalent

(1.2) p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)

p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)

1.2. PROOFS 11

We will prove the equivalence of statements in (1.2) leaving details of the second equivalenceas an exercise.

p q r p ∧ (q ∨ r) (p ∧ q) ∨ (p ∧ r)0 0 0 0 00 0 1 0 00 1 0 0 00 1 1 0 01 0 0 0 01 0 1 1 11 1 0 1 11 1 1 1 1

Example 1.4. Prove that the statement p∧¬(q∧¬r) is logically equivalent to (p∧¬q)∨(p∧r).

Proof. We could check it using a table as above, but instead we will prove it directly usingthe rules that we have already verified.

p ∧ ¬(q ∧ ¬r)De Morgan≡

p ∧ ((¬q) ∨ (¬(¬r))) ≡

p ∧ ((¬q) ∨ r)(1.2)≡

(p ∧ ¬q) ∨ (p ∧ r) .

1.2 Proofs

In this section we will discuss the logical structure of proofs.

1.2.1 Direct proof

The only situation when an implication is false is when the hypothesis is true and the con-clusion is false, so in order to prove that p⇒ q we can assume that p is true and we need toshow that q is true as well. This is so called direct proof.

Example 1.5. Prove that if a, b ≥ 0, then a+b2 ≥

√ab.

Proof. Let a, b ≥ 0. Then (a− b)2 ≥ 0, a2 − 2ab+ b2 ≥ 0, a2 + 2ab+ b2 ≥ 4ab, (a+ b)2 ≥ 4aband taking the square root gives a+b ≥ 2

√ab. Dividing both sides by 2 yields a+b

2 ≥√ab.

1.2.2 Proof by contradiction

Let us start with an example of a simple proof by contradiction. Then we will carefullyanalyze the structure of the proof from the perspective of logic.

12 CHAPTER 1. LOGIC

Example 1.6. Let n be an integer. If n3 + 5 is odd, then n is even.

Proof. Suppose to the contrary that n3 + 5 is odd and n is odd to. Then n3 + 5 = 2k + 1,n = 2`+ 1 for some integers k and ` and a simple calculation yields

2k + 1 = (2`+ 1)3 + 5, 2k = 8`3 + 12`2 + 6`+ 5

which is a contradiction since in the last equality the left hand side is even while the rightone is odd. The contradiction completes the proof.

The logical argument used in the proof is based on the fact that the following two state-ments are logically equivalent3

(1.3) (p⇒ q) ≡ ¬(p ∧ ¬q).

Therefore, in order to prove that the statement p ⇒ q is true it suffices to prove that thestatement ¬(p∧¬q) is true, i.e. that the statement p∧¬q is false. How do we prove that thestatement p ∧ ¬q is false? We assume that it is true and we arrive to a contradiction.

Let us look at the above proof again to see that it follows the logical argument describedhere. We want to prove

If n3 + 5 is odd︸︷︷︸p

, then︸︷︷︸⇒

n is even︸︷︷︸q

.

To this end it suffices to prove that the following statement is false

(1.4) n3 + 5 is odd︸︷︷︸p

and︸︷︷︸∧

n is odd︸︷︷︸¬q

.

So we assume that (1.4) is true and, after some calculations, we arrive to a contradiction.This completes the proof.

The argument by contradiction is very common in mathematics and we often employ itusing our intuition without realizing that the method of the proof by contradiction can beformally justified in the mathematical logic.

Sometimes a simpler argument is enough. In order to prove that the statement p is trueit suffices to prove that the statement ¬p is false. Here are two examples. The proof ofExample 1.7 is due to Euclid (c. 300 BC).

Example 1.7. There are infinitely many prime numbers.

Proof. Suppose to the contrary that there are only finitely many primes. List them asp1, p2, . . . , pn. Consider the number q = p1p2 · . . . · pn + 1 > 1. Let p be a prime factorof q. Then p 6∈ {p1, p2, . . . , pn}, because otherwise it would divide q − p1p2 · . . . · pn = 1.Contradiction. The proof is complete.

Example 1.8.√

2 is irrational.

3We leave the proof of the equivalence (1.3) as an exercise.

1.2. PROOFS 13

Proof. Suppose to the contrary that√

2 is rational. Then√

2 = p/q for some positive integersand we may assume that p and q have no common factors. We have

2 =p2

q2, p2 = 2q2.

Therefore p2 is even. Since the square of an odd number is odd, we have that p is even, p = 2kso

(2k)2 = 2q2, 4k2 = 2q2, 2k2 = q2.

Thus q2 is even and since the square of an odd number is odd, we have that q is even.We proved that both numbers p and q are even which contradicts the assumption that thenumbers p and q have no common factors.

1.2.3 Proof by contrapositive

This proof is based on the equivalence

(1.5) (p⇒ q) ≡ (¬q ⇒ ¬p).

Here is an example.

Example 1.9. Let x, y ∈ R. If y3 + yx2 ≤ x3 + xy2, then y ≤ x.

In order to prove

y3 + yx2 ≤ x3 + xy2︸︷︷︸p

=⇒ y ≤ x︸︷︷︸q

it suffices to prove that

y > x︸︷︷︸¬q

=⇒ y3 + yx2 > x3 + xy2︸︷︷︸¬p

.

Proof. Suppose that it is not true that y ≤ x, i.e. y > x. Then y − x > 0 and we have

(y − x)(x2 + y2) > 0 (because x2 + y2 > 0)

yx2 + y3 − x3 − xy2 > 0

y3 + yx2 > x3 + xy2

so it is not true that y3 + yx2 ≤ x3 + xy2. The proof is complete.

The proof by contrapositive is less common than the proof by contradiction.

Here is a non mathematical application of the equivalence (1.5)

14 CHAPTER 1. LOGIC

Example 1.10. The following two sentences were written on a jar of a barbecue sauce4:

Good goods are never cheap.

Cheap goods are never good.

Do the two sentences have the same meaning? A product is either good (G) or not good(¬G), cheap (C) or not cheap (¬C) so the sentences can be stated as follows

G ⇒ ¬CC ⇒ ¬G.

According to (1.5) we have

(G ⇒ ¬C) ≡ (¬(¬C) ⇒ ¬G) ≡ (C ⇒ ¬G).

Therefore the sentences have the same meaning so adding the second one was redundant sinceit did not provide any new information.

1.3 Context and quantifiers.

Mathematical statements often involve variables like e.g.

x2 = 1.

It is important to fix the context in which we consider this statement. That means we needto fix a set from which we choose x. It can be the set of all complex numbers, the set of allreal numbers, the set of all integers, etc.

Very often mathematical statements involve phrases for all or there exists. For examplethe statement

For all integeres x, x2 = 1

is false, but the statement

There is an integer x such that x2 = 1

is true.

If we fix the context first, we can express the two statements in a shorter way. Let x bean integer.

For all x, x2 = 1.

There is x such that x2 = 1.

The two statements are equivalent to the previous ones. We did not have to include theinformation that x is an integer into the statements, because it was assumed earlier.

The phrase for all or for every is called the universal quantifier and is dented by thesymbol ∀. The phrase there exists is called the existential quantifier and is denoted by ∃.

4Daddy Sam’s Bar-B-Que Sawce.

1.3. CONTEXT AND QUANTIFIERS. 15

Let x be an integer, then we can reformulate the above two statements as follows

∀x (x2 = 1)

∃x (x2 = 1).

Sometimes we may explain the context directly in the formula as in the following true state-ment:

∀x ∈ R(x2 = 1⇒ x ∈ {−1, 1}

)That fact that it is true seems obvious, but let’s look at it more carefully. The statementsays that for every x ∈ R the statement that follow the quantifier is true. Since it is true forevery x ∈ R, in particular, it is true for x = 0:

02 = 1 ⇒ 0 ∈ {−1, 1}.

While this implication seems ridiculous, it is certainly true, because an implication with thefalse hypothesis is always true!

Let x and y be real numbers. The statement

∀x ∃y (x+ y = 0)

means:For every real x there is a real y such that x+ y = 0.

This statement is obviously true.

The order of quantifiers is very important.

∀x ∃y (x+ y = 0) is true

while∃y ∀x (x+ y = 0) is false.

More generally if P (x, y) is a statement depending of two variables x and y (e.g. x+y = 0)we can consider statements like

∀x ∃y P (x, y).

The statement P (x, y) is often called a property. Indeed, this is a property that is satisfied(when true) by the given x and y or is not satisfied (when false). The example of theproperty x+ y = 0 shows that in general ∃y ∀x P (x, y) does not imply ∃y ∀x P (x, y) since atrue statement does not imply a false one. However we have.

Example 1.11. For any property P ,(∃y ∀x P (x, y)

)⇒(∀x ∃y P (x, y)

).

Proof. Since the only situation when an implication can be false is when the hypothesis istrue and the conclusion is false, it suffices to assume that the hypothesis is true and then toshow that the conclusion is true as well.

Assume that the statement

(1.6) ∃y ∀x P (x, y)

16 CHAPTER 1. LOGIC

is true. We need to show that the other statement

(1.7) ∀x ∃y P (x, y)

is true as well. From the assumption that (1.6) is true we see that there is y, denote it by y0,such that for all x, P (x, y0) is true so for all x we can find y (namely y0) such that P (x, y) istrue which is precisely (1.7).

It is of fundamental importance to understand how to negate statements that involvequantifiers. Let A be a set. Observe that the following two statements are equivalent

¬(∀x ∈ A P (x)) ≡ ∃x ∈ A (¬P (x))

Indeed, the fact that for all x ∈ A, P (x) is satisfied is not true means that there is x ∈ Asuch that P (x) is not satisfied. Similarly we justify the equivalence

¬(∃x ∈ A P (x)) ≡ ∀x ∈ A (¬P (x)).

Note that we reverse the quantifier and negate the statement, but we do not negate thecondition x ∈ A. For example the negation of the (false) statement

∀ε > 0 (ε2 ≥ 1)

is the (true) statement

∃ε > 0 (ε2 < 1).

Now we can apply the two rules to the statements that involve more than one quantifier.For example

¬(∀x ∃y P (x, y)) ≡ ∃x (¬(∃y P (x, y))) ≡ ∃x ∀y (¬P (x, y)).

We can apply the same argument to the case in which we have more than two quantifiers,e.g.

¬(∀x ∃y ∃z ∀w P ) ≡ ∃x ∀y ∀z ∃w (¬P ).

The rule is quite clear, I hope. Each time we change the quantifier to the other one and thenwe negate the condition P . Let’s see how it works. The statement

∃x ∀y (x+ y = 0)

is clearly false, so its negation should be true

¬(∃x ∀y (x+ y = 0)) ≡ ∀x ∃y (x+ y 6= 0)

and that is indeed true.

Example 1.12. Prove the statement

(∀ε > 0) (∃n ∈ N) (n−1 < ε).

1.3. CONTEXT AND QUANTIFIERS. 17

Proof. That statement means that for every positive ε there is a natural number n such thatn−1 < ε. Suppose to the contrary that the statement is false. Then its negation is true, i.e.5

¬((∀ε > 0) (∃n ∈ N) (n−1 < ε)

)≡ (∃ε > 0) (∀n ∈ N) ¬(n−1 < ε)

≡ (∃ε > 0) (∀n ∈ N) (n−1 ≥ ε) .

That is(∃ε > 0) (∀n ∈ N) (n−1 ≥ ε)

is true. It says that we can find an ε > 0 such that for every n, n−1 ≥ ε. Fix such ε > 0.Then for all positive integers n, n−1 ≥ ε.

Let n0 be any positive integer such that n0 > ε−1. Since the statement above is true forall positive integers, it is true, in particular, for n = n0, so we have n−10 ≥ ε. This however,implies that n0 ≤ ε−1 which contradicts the fact that n0 is bigger than ε−1. The contradictionproves the claim.

Remark 1.13. A good question is, how did we know that we should consider n0 > ε−1. Wewill learn it later, but for checking that the proof is correct we do not need to know howwe got this argument. The next example will have a similar issue. We will take δ =

√ε/2

without any clue why.

The next example is similar in nature, but more complicated because of a more involvedlogical structure of the statement.

Example 1.14. Prove that the following statement is true6

(∀ε > 0) (∃δ > 0) (∀x ∈ R)(|x| < δ ⇒ |x2| < ε

).

Proof. The statement that we plan to prove should be read as follows. For every ε > 0there is δ > 0 such that for ever x ∈ R, if |x| < δ, then |x2| < ε.

Suppose to the contrary that the statement is false, then its negation is true

¬((∀ε > 0) (∃δ > 0) (∀x ∈ R) (|x| < δ ⇒ |x2| < ε)

)≡

(∃ε > 0) (∀δ > 0) (∃x ∈ R) ¬(|x| < δ ⇒ |x2| < ε) ≡(∃ε > 0) (∀δ > 0) (∃x ∈ R) (|x| < δ ∧ |x2| ≥ ε).

In the last step we used equivalence of the statements

¬(p⇒ q) ≡ p ∧ ¬q .

Thus the following statement is true and we need to arrive to a contradiction.

(∃ε > 0) (∀δ > 0) (∃x ∈ R) (|x| < δ ∧ |x2| ≥ ε).5As we have already explained earlier we reverse quantifiers and negate the statement n−1 < ε, but we keep

conditions ε > 0 and n ∈ N unchanged.6A reader familiar with a formal definition of continuity will see that the statement says that the function

f(x) = x2 is continuous at x = 0. However, at this point we are not interested in studying continuity and weonly want to learn logic behind a proof of a statement.

18 CHAPTER 1. LOGIC

The statement says that there is ε > 0 such that for every δ > 0 there is x such that |x| < δand |x2| ≥ ε.

Fix such ε > 0. Since for every δ > 0 there is x such that |x| < δ and |x2| ≥ ε, it is truefor δ =

√ε/2. That means for δ =

√ε/2 there is x such that

|x| <√ε/2 and |x2| ≥ ε.

However, the first inequality gives |x2| < ε/4 which contradicts the second one. The proof iscomplete. 2

This proof is difficult, but arguments of this type will appear in our reasoning very oftenand eventually we will learn how to use them.

1.4 Problems

Problem 1. Use the truth table to prove equivalence of the statements:p ≡ q and (p⇒ q) ∧ (q ⇒ p).

Problem 2. Use the truth table to prove equivalence of the statements:

(p⇒ q) ≡ ¬(p ∧ ¬q) ≡ (¬q ⇒ ¬p).

Problem 3. Use the equivalence

(1.8) p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)

to provep ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r).

To this end apply (1.8) to ¬p, ¬q, ¬r in place of p, q, r, and negate the statement using DeMorgan’s Laws.

Problem 4. Negate the statement7

∀ε > 0 ∃δ > 0 ∀x ∈ R ∀y ∈ R (|x− y| < δ ⇒ | sinx− sin y| < ε).

Problem 5. Negate the statement: For all real numbers x, y satisfying x < y, there is arational number q such that x < q < y. Formulate the negation as a sentence and not as aformula involving quantifiers.

Problem 6. Use an argument by contradiction prove that√

3 is irrational.

Problem 7. In Example 1.5 we provided a direct proof. Prove the same statement using aproof by contradiction.

Problem 8. Prove the following statement8

∀ε > 0 ∃n0 ∈ N ∀n ∈ N (n ≥ n0 ⇒ n−1 ≤ ε).7This is a true statement known as uniform continuity of the function sinx. However, you are not asked to

prove the statement only to negate it.8Compare with Example 1.12.

1.4. PROBLEMS 19

Problem 9. Find all functions f : R→ R that satisfy the following property9

∀ ε > 0 ∀x, y ∈ R ∃ δ > 0 (|x− y| < δ ⇒ |f(x)− f(y)| < ε).

Problem 10. Find a mistake in the following ‘solution’ to Problem 9.

“Solution”. We will prove that only constant functions satisfy the condition. Clearly constandfunctions satisfy it, because |f(x)−f(y)| = 0 < ε for all x, y ∈ R. Suppose now that a functionf satisfies the condition. We have to prove that f is constant. The condition says that forany ε > 0 and any x, y ∈ R we can find δ > 0 such that

|x− y| < δ ⇒ |f(x)− f(y)| < ε.

Since the inequality |f(x) − f(y)| < ε is required to hold for all ε > 0 we have that |f(x) −f(y)| ≤ 0 and hence |f(x)−f(y)| = 0, f(x) = f(y). Since for all x, y ∈ R we have f(x) = f(y)the function f is constant. “2”

9A reader familiar with uniform continuity should notice that this is not the condition for the uniformcontinuity, although it looks very similar: the order of quantifiers has been changed, see Section 11.11.

20 CHAPTER 1. LOGIC

Chapter 2

Set theory

2.1 Elementary set theory

A set is a collection of arbitrary objects. A priori there are no restrictions of what objectswe consider, but we will soon see that one has to be careful. There is no formal definition ofa set and we only use our intuition and imagination to create sets.

The simplest set is the empty set denoted by ∅. This is a set with no elements at all.

By writing x ∈ A we denote that x is an element of the set A. If x is not an element ofA, we write x 6∈ A. That is

¬(x ∈ A) ≡ x 6∈ A.

Two sets are equal if they have the same elements. Formally

(A = B) ≡ ∀x (x ∈ A ≡ x ∈ B).

We say that A is a subset of B, A ⊂ B (or B ⊃ A) if every element of A is also an elementof B. Formally

(A ⊂ B) ≡ ∀x (x ∈ A ⇒ x ∈ B)

Therefore

(A = B) ≡ (A ⊂ B ∧ B ⊂ A).

Is the empty set of real numbers the same as the empty set of chairs? Yes. They both havethe same elements – no elements at all.

Sets can be defined by listing all of the elements, e.g.

A = {1, 3, 5} = {5, 1, 3}.

This is a set with three elements 1, 3 and 5 only. The order of elements in a set is notimportant, this is why {1, 3, 5} = {5, 1, 3}. Also

{1, 1, 2, 5, 5} = {1, 2, 5},

21

22 CHAPTER 2. SET THEORY

because both sets have the same elements: 1, 2, 5.

N = {1, 2, 3, 4, . . .}

is the set of all natural numbers, or, in other words, all positive integers.

Z = {. . . ,−3,−2,−1, 0, 1, 2, 3, . . .}

is the set of all integers.

If A is a given set and P (x) is a property of elements of A that may or may not be satisfiedby a given x ∈ A, then we can define a new set

{x ∈ A : P (x)}.

This is the set of all elements in A such that the property P (x) is satisfied. For example

{x ∈ N : x is even} = {2, 4, 6, 8, . . .}

{x ∈ R : x2 ≤ 1} = [−1, 1].

The union of sets A and B is defined by

A ∪B = {x : x ∈ A ∨ x ∈ B}.

That means the union A∪B is a set that consists of all elements of A, all elements of B andno other element. The intersection of the sets A∩B consists of elements that belong to bothsets A and B at the same time, that is

A ∩B = {x : x ∈ A ∧ x ∈ B}.

Example 2.1. If a ∈ A, then {a} ⊂ A, {a} ∪A = A and {a} ∩A = {a}.

We can also define the union and the intersection of an infinite family of sets. IfA1, A2, A3, . . . are sets, then we define

∞⋃n=1

An = A1 ∪A2 ∪A3 ∪ . . . = {x : ∃n x ∈ An}

∞⋂n=1

An = A1 ∩A2 ∩A3 ∩ . . . = {x : ∀n x ∈ An} .

In the above unions and intersections of infinite families of sets the sets were arranged intoa sequence, but we can also think about other infinite families of sets that even cannot bearranged into a sequence. For example we may think of a situation that we have a setassociated with each real number

At, t ∈ R.

Then we define⋃t∈R

At = {x : ∃t ∈ R (x ∈ At)} ,⋂t∈R

At = {x : ∀t ∈ R (x ∈ At)}.

2.1. ELEMENTARY SET THEORY 23

Example 2.2. ⋃t∈R{t} = R,

⋂t∈R{t} = ∅.

More generally, if I is a set and for each i ∈ I, Ai is a set we can talk about the set {Ai}i∈Iof all lets Ai. Then we can define⋃

i∈IAi = {x : ∃i ∈ I (x ∈ Ai)} and

⋂i∈I

Ai = {x : ∀i ∈ I (x ∈ Ai)}.

A set {Ai}i∈I whose elements are sets is often called a family of sets. In the definition of thesum and intersection there is no restriction how large family of sets we consider.

The complement of A relative to B or just the difference of sets A and B is defined by

B \A = {x ∈ B : x /∈ A}

sox ∈ B \A ≡ (x ∈ B) ∧ ¬(x ∈ A).

Example 2.3. N \ {2, 4, 6, 8, . . .} = {1, 3, 5, 7, . . .}

The operations on sets can be represented graphically using Venn diagrams.

Example 2.4. Prove that A \ (B \ C) = (A \B) ∪ (A ∩ C).

Proof. This is an important example, because it shows different methods that can be used tostudy operations on sets and also it shows a direct link to logic.

First we will present a geometric and not very rigorous proof based on Venn diagrams.


The result follows after comparing Fig. 1 with Fig. 2.

Now we will show a rigorous proof that will employ methods of logic developed earlier. Wewill prove that the two sets are equal by showing that they have the same elements. Recallthat

p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r).

We havex ∈ A \ (B \ C) ≡

x ∈ A ∧ ¬(x ∈ B \ C) ≡x ∈ A ∧ ¬(x ∈ B ∧ x 6∈ C) ≡x ∈ A ∧ (x 6∈ B ∨ x ∈ C) ≡

(x ∈ A ∧ x 6∈ B) ∨ (x ∈ A ∧ x ∈ C) ≡(x ∈ A \B) ∨ (x ∈ A ∩ C) ≡x ∈ (A \B) ∪ (A ∩ C) .

We proved that x is an element of A\(B\C) if and only if x is an element of (A\B)∪(A∩C).Therefore the two sets are equal.

Remark 2.5. Observe that the equivalence between the third and the fifth line follows fromthe equivalence

p ∧ ¬(q ∧ ¬r) ≡ (p ∧ ¬q) ∨ (p ∧ r)

that we proved in Example 1.4. Indeed,

x ∈ A︸︷︷︸p

∧ ¬(x ∈ B︸︷︷︸q

∧ x 6∈ C︸︷︷︸¬r

) ≡ (x ∈ A︸︷︷︸p

∧ x 6∈ B︸︷︷︸¬q

) ∨ (x ∈ A︸︷︷︸p

∧x ∈ C︸︷︷︸r

)

The next result is a version of De Morgan Laws for sets.

2.2. RUSSELL’S PARADOX 25

Proposition 2.6.A \ (B ∪ C) = (A \B) ∩ (A \ C) ,

A \ (B ∩ C) = (A \B) ∪ (A \ C)

Proof. We will prove only the first equality, the proof of the second one is similar. We have

x ∈ A \ (B ∪ C) ≡x ∈ A ∧ ¬(x ∈ B ∪ C) ≡

x ∈ A ∧ ¬(x ∈ B ∨ x ∈ C) ≡x ∈ A ∧ (x 6∈ B ∧ x 6∈ C) ≡

(x ∈ A ∧ x 6∈ B) ∧ (x ∈ A ∧ x 6∈ C) ≡(x ∈ A \B) ∧ (x ∈ A \ C) ≡x ∈ (A \B) ∩ (A \ C) .

We proved that x is an element of A\(B∪C) if and only if x is an element of (A\B)∩(A\C).Therefore the two sets are equal.

Similarly one can prove a more general form of the De Morgan Laws. We leave a proofas an exercise.

Proposition 2.7 (De Morgan Laws). For any set A and any family of sets {Ai}i∈I

A \⋃i∈I

Ai =⋂i∈I

(A \Ai) ,

A \⋂i∈I

Ai =⋃i∈I

(A \Ai) .

2.2 Russell’s paradox

Our intuition is that a set can be any collection of any elements. In particular elements ofa set can also be sets. Let X be a set. Since its elements can be sets the statement X 6∈ Xmakes sense and we may consider the set

A = {X : X is a set ∧ X 6∈ X}.

Now we can ask an important question

Is A is an element of A?

Suppose that A ∈ A. Then it satisfies the condition X 6∈ X from the definition of A so A 6∈ Aand we have a contradiction. Suppose now that A 6∈ A, then A is a set and it satisfies thecondition X 6∈ X from the definition of A so A ∈ A and we arrive at a contradiction again.That simply means the set A cannot exist.

But then we may ask: what is a set? How can we be sure that a set we define will notlead to a contradiction?


The only way to handle this unpleasant and annoying situation is to create a list of “safetyrules” which tell us what operations on sets are allowed and how we can create new sets fromthe ones that already exist. Such safety rules are called axioms and we are not allowed to doanything which cannot be concluded from axioms.

The axioms will be discusses in the next section, but let us mention one of the axiomsnow. The axiom of specification says that if X is a set and P is a property of elements of X,then the set

{Y ∈ X : P (Y )}

exists. In the Russel Paradox we tried to create the set:

A = {X ∈ All sets : X 6∈ X}.

Suppose that the set of all sets exists. Then according to the axiom of specification the set Aexists too and we arrive to a contradiction which is explained above. Assuming that the setof all sets exists leads to a contradiction and that proves that the set of all sets cannot exist.Since the set of all sets does not exists, A is not a set, because the axiom of specification doesnot apply. We proved

Proposition 2.8. The set of all sets does not exist.

2.3 Axioms of the set theory∗

The material of this section is difficult, because of a high level of abstraction. It is okay ifafter going through the material of this section you will be confused and lost. What is writtenhere will not be really used in the subsequent sections so if you decide to skip this section itis okay, but I would advise to read it anyway. If you like it that is great. If not, then no muchharm is done.

Formal axiomatic theory of sets, the so called Zermelo-Fraenkel set theory with the Axiomof Choice or just ZFC for short is difficult and it is not of our intention to present it withdetails. Instead, we will present a rather intuitive approach which however, will give a quitegood understanding of what is it all about.

Mathematics deals with abstract entities. Integers are also abstract objects. While wecan use them to count apples, integers by itself have nothing to do with objects in a real life.

The same applies to sets. We said that sets can be collections of any objects, but inmathematics we would need to know how to define these objects. Thus for simplicity weassume that all objects existing in the set theory are sets and so elements of sets can be setsonly1. That is bizarre, I know, but that makes the theory a lot simpler. In fact we will onlyassume that the empty set exist and starting from that we will create the entire mathematicalrealm, including integers and real numbers. We will create everything from nothing (i.e. formthe empty set). Isn’t that great?

1In the following sections however, we will not be that restrictive and by a set we will mean any collectionof objects that are not necessarily sets.

2.3. AXIOMS OF THE SET THEORY∗ 27

Axiom of Existence. The empty set ∅ exists.

Formally the empty set is defined by the condition

∃Y ∀X (X 6∈ Y ).

By know we know that at least one set exists: empty set. That is not much, but other axiomswill allow us to construct new sets from it. Actually we will be able to construct all the setsthat are needed in mathematics. The next axiom explains what it means that two sets areequal.

Axiom of Equality2. Two sets are equal X = Y if they have the same elements, i.e.,

(X = Y ) ≡ ∀Z(Z ∈ X ≡ Z ∈ Y )

It implies that there is exactly one empty set: the empty set of integers is equal to the emptyset of complex numbers, because both sets have exactly the same elements – no elements atall.

For sets we define what it means that X is a subset of Y exactly as in Section 2 and thentwo sets X and Y are equal if and only if X ⊂ Y and Y ⊂ X.

The next axiom tells us how to build new sets from the sets that exist.

Axiom of Pairing. If the sets X and Y exist, then also the set {X,Y } exists.

Let us emphasize that the set {X,Y } is a set with elements X and Y and it is not theunion of sets X and Y .

As for now we only know that ∅ exists, but the Axiom of Pairing implies that also

{∅, ∅} = {∅}

exists. Note that both of the sets {∅, ∅} and {∅} are equal, because both have the sameelements:

(X ∈ {∅, ∅}) ≡ (X = ∅ ∨ X = ∅) ≡ (X = ∅) ≡ (X ∈ {∅}).

Then also the following sets also exist:

∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}, . . .

It is easy to prove that these sets are different. For example ∅ 6= {∅}. Indeed, ∅ ∈ {∅} while∅ 6∈ ∅, because ∅ has no elements. Therefore the sets ∅, and {∅} do not have the same elementsso they are different.

We can now define natural numbers N∗ including zero3 by calling

(2.1) 0 := ∅, 1 := {∅}, 2 := {∅, {∅}}, 3 := {∅, {∅}, {∅, {∅}}}, . . .2Known also as theAxiom of Extensionality.3While here we include zero among natural numbers, in the remaining sections of the book we will always

assume that natural numbers are N = {1, 2, 3, . . .}


Axiom of Union. If I is a set (of sets), then there is a set whose elements belong to at leastone of the elements of I.

In other words the following set exists:⋃X∈I

X =⋃{X : X ∈ I} := {Y : (∃X ∈ I) (Y ∈ X)}.

Now we can define the operation of adding 1 to an integer in the following way:

If n = X is a natural number, that is a member of the list (2.1), then we define n + 1 =X ∪ {X}.

For example

2 = {∅, {∅}}︸︷︷︸X

so 2 + 1 = X ∪ {X} = {∅, {∅}} ∪ {{∅, {∅}}}︸︷︷︸this set has one element

= {∅, {∅}, {∅, {∅}}} = 3.

Then we can define addition and multiplication of integers in terms of operations on sets (wewill not provide details).

Once we know how to add natural numbers we can define the inequality n ≤ m as follows:

(n ≤ m) ≡ (∃a ∈ N∗ (n = a+m)).

In particular we have the relation n ≤ m between natural numbers and clearly 0 is thesmallest natural number with respect to this relation.

Axiom of Specification4. If X is a set and P is a property of elements of X, then the set

{Y ∈ X : P (Y )}

exists.

The Axiom of Specification implies the existence intersection of sets. If X and Y are sets,then we define

X ∩ Y = {Z ∈ X : Z ∈ Y }.

More generally if I is any nonempty set of sets, we define the intersection of all sets in I asfollows. Fix any X0 ∈ I and set⋂

X∈IX =

⋂{X : X ∈ I} := {Y ∈ X0 : (∀X ∈ I) (Y ∈ X)}.

Therefore we do not need to add as an axiom the existence of an intersection of sets since itit a consequence of the axiom of specification. Although the Axiom of Specification can beused to define intersection of sets it cannot be used to define the union of sets so we had tostate the existence of union of sets as an axiom.

4Known also as the Axiom Schema of Specification.

2.3. AXIOMS OF THE SET THEORY∗ 29

In particular, if I = {X1, X2, . . .}, we write

⋂{X : X ∈ I} =

∞⋂i=1

Xi = {Y ∈ X1 : ∀i (Y ∈ Xi)}.

Here however, we went a bit ahead, because the set I is infinite and we do not know yet thatinfinite sets exist. We will get to that.

Axiom of Powers. For a given set X there is a set P (X) whose elements are all subsets ofX.

The set P (X) is called the power set. That explains the name of the axiom. One canprove that if a set X has n elements, then P (X) has 2n elements and for that reason thepower set is sometimes denoted by 2X .

The axioms listed above allow us to construct new sets starting from the empty set.However, in all the constructions we end up with finite sets only. We could define naturalnumbers, and clearly there are infinitely many natural numbers, but we do not know yetwhether the set of all natural numbers exist. We cannot conclude yet that any infinite setexists. The problem is solved by introducing the Axiom of Infinity.

Axiom of Infinity. There is a set X such that ∅ ∈ X and whenever Y is an element of X,then also Y ∪ {Y } is an element of X.

Formally,

(∃X)(∅ ∈ X ∧

((∀Y ∈ X) (Y ∪ {Y } ∈ X)

)).

In particular the set contains

(2.2) ∅︸︷︷︸0

, ∅ ∪ {∅} = {∅}︸︷︷︸1

, {∅} ∪ {{∅}} = {∅, {∅}}︸︷︷︸2

, . . .

so it contains all natural numbers. However, the set X may contain more elements than thoselisted in (2.2). More precisely, there are many infinite sets with the above property, some ofthem are larger than just the set of natural numbers and we define the set of natural numbersN∗ as the intersection of all such sets. This is the smallest set with that property.

Thus the set N∗ has the three properties:

1. ∅ ∈ N∗ (i.e. 0 ∈ N∗) ,

2. if X ∈ N∗, then X ∪ {X} ∈ N∗ (i.e. if n ∈ N∗, then n+ 1 ∈ N∗).

Being the smallest set with the two properties means that

3. If a set S has the above two properties, then N∗ ⊂ S.

It can also be formulated it as follows as the Principle of Mathematical Induction.


Theorem 2.9 (Principle of Mathematical Induction). Let S be a subset of N∗ that has twoproperties

1. 0 ∈ S,

2. For every natural number n, if n ∈ S, then n+ 1 ∈ S.

Then S = N∗.

Proof. The set S has the properties 1. and 2., so by property 3. of the set of natural numbersN∗ ⊂ S. On the other hand we know that S ⊂ N∗ so S = N∗.

The Principle of Mathematical Induction has many applications and we will discuss themin Chapter 4.

Once we have the definition of the set of natural numbers it is also not difficult to definethe set of integers Z and the set of rational numbers Q. Rational numbers p/q can be identifiedwith pairs of integers (p, q) such that q ∈ N and p and q are relatively prime. Then we candefine real numbers. We will return to these constructions in Section 6.

The next axiom is the Axiom of Replacement. The formal statement of the Axiom ofReplacement is very difficult to comprehend and we will provide only a not very rigorousversion of it. It refers to functions. A formal definition of a function will be provided inSection 3.1.

Axiom of Replacement5. Let f : A → B be a function. Then the image of the functionexists. More precisely the image f(A) is a subset of B defined by

f(A) = {Y ∈ B : ∃X ∈ A (Y = f(X))}.

The last and the least obvious axiom is the Axiom of Choice. Although the statementseems quite natural it leads to very unexpected examples that often contradict our intuition.

Axiom of Choice. For every set X whose elements are nonempty sets, there is a functionf with domain X such that for all Y ∈ X, f(Y ) ∈ Y .

The function f is called a choice function. To each set Y ∈ X the function f selects oneelement f(Y ) ∈ Y from Y . Then the image of the function f is a set that contains at leastone element from each of the sets in X. If the sets in X are pairwise disjoint, then the imageof f contains exactly one element from each of the sets in X.

There are many statements that are equivalent to the Axiom of Choice. Among thestatements equivalent to the Axiom of Choice the most important are Zorn’s lemma and theexistence of well-ordering of sets. Another equivalent statement is that every linear space hasa basis.

5Known also as the Axiom Schema of Replacement

2.4. PROBLEMS 31

2.4 Problems

Problem 1. Let A = {1, 2, 5}. Find the power set P (A).

Problem 2. Let A = {2n− 1 : n ∈ N} and B = {4n : n ∈ N}. Find A ∩B.

Problem 3. Prove that if A ⊂ B, then A = B \ (B \A).

Problem 4. Prove that (A \B) \ C = A \ (B ∪ C)

Problem 5. Prove that {∅} 6= {∅, {∅}}.

Problem 6. Prove that {{a}, {a, {b}}} = {{c}, {c, {d}}} if and only if a = b and c = d.

MORE PROBLEMS WILL BE ADDED!

Chapter 3

Cartesian products and functions

The sets do not specify the order of its elements, e.g.

{1, 2} = {2, 1}.

Thus in order to take the order into consideration we need the notion of ordered pair (a, b).Namely we define

(a, b) = (c, d) if and only if a = c and b = d.

In particular (1, 2) 6= (2, 1).

Formally we need to be able to define the ordered pair using the set theory (that is allwhat we have) and a formal definition due to Kuratowski is as follows:

(a, b) := {{a}, {a, {b}}}.

It is an easy exercise1 that with this definition (a, b) = (c, d) if and only if a = c and b = d.

If A and B are sets, then we define the Cartesian product by

A×B = {(a, b) : a ∈ A ∧ b ∈ B}.

The Cartesian product has a transparent geometric interpretation.

1See Problem 6 in Chapter 2.

33

34 CHAPTER 3. CARTESIAN PRODUCTS AND FUNCTIONS

3.1 Functions

Let X and Y be given sets. A function f : X → Y is a rule that assigns to each x ∈ X anelement of Y denoted by f(x). X is called the domain of f and Y the target of f .

f(X) = {y ∈ Y : y = f(x) for some x ∈ X}

is the range of f or the image of f .

Example 3.1. If f : R→ R, f(x) = x2, then the range of f is f(R) = [0,∞).

The graph of f isG(f) = {(x, f(x)) ∈ X × Y : x ∈ X}.

Observe that a graph of a function is not an arbitrary subset of X × Y . The graph G(f) ⊂X×Y has the property that for every x ∈ X there is exactly one y ∈ Y such that (x, y) ∈ G(f).We denote exists exactly one by ∃!. Thus the graph G(f) has the property2

(∀x ∈ X ∃!y ∈ Y ) ((x, y) ∈ G(f)).

Suppose now that we have any subset R ⊂ X × Y with the above property that is:

(3.1) (∀x ∈ X ∃!y ∈ Y ) ((x, y) ∈ R).

Then the set R is a graph of some function f : X → Y . Namely, the function f that is definedas follows. For x ∈ X we define f(x) to be the unique y in Y such that (x, y) ∈ R.

In other words, if we know the graph, we also know the function. All information aboutthe function is contained in its graph.3

The definition of a function as a certain ‘rule’ is vague and not very rigorous so in mathe-matics we define a function to be any subset R ⊂ X × Y with the property (3.1). That is weidentify a function with its graph and that is a good definition, because we can reconstructthe function if we know its graph. However, in what follows we will use a less formal definitionthat a function is a ‘rule’.

A function f : X → Y is called one-to-one if4

∀x1, x2 ∈ X (x1 6= x2 ⇒ f(x1) 6= f(x2)).

If a function f : X → Y is one-to-one, then there is an inverse function

f−1 : f(X)→ X

defined by f−1(f(x)) = x for all x ∈ X.

2This is an abstract version of the vertical line test.3This definition of a function leads to a funny example. If X = Y = ∅, then X × Y = ∅ and R = ∅ ⊂

∅ = X × Y satisfies the definition of the function as given above. Therefore ∅ : ∅ → ∅ is a function! That iscorrect. We have to accept such examples if we want to have theory to be simple and consistent. Althoughwe will never deal with such a function, there is no need to exclude it from the set of functions.

4This is an abstract version of the horizontal line test.

3.1. FUNCTIONS 35

Example 3.2. If f : R→ R, f(x) = ex, then f is one-to-one, the range if f(R) = (0,∞) andthe inverse function is f−1 : (0,∞)→ R, f−1(x) = lnx.

A function is said to be onto or a surjection or surjective if f(X) = Y , i.e.,

∀y ∈ Y ∃x ∈ X (y = f(x)).

A function f : X → Y that is one-to-one and surjective is called a bijection. In this situationthe inverse function is defined on f(X) = Y and the inverse function f−1 : Y → X is alsoa bijection. Thus a bijection is a one-to-one correspondence between all points of X and allpoints of Y .

Example 3.3. f : R→ R, f(x) = x3 is one-to-one and onto so it is a bijection. The inversefunction is f−1(x) = 3

√x. It is also a bijection.

If f : X → Y and A ⊂ X, then we define

f(A) = {y ∈ Y : ∃x ∈ X (y = f(x))} = {f(x) : x ∈ X}

and we call f(A) the image of A under f .

If f : X → Y and A ⊂ Y , then we define

f−1(A) = {x ∈ X : f(x) ∈ A}

and we call this set the preimage of A under f or the inverse image of A under f . Note that

x ∈ f−1(A) ≡ f(x) ∈ A.

The function f : X → X, f(x) = x for all x ∈ X is called identity and is often denotedby IX : X → X.

If the image of f consists of one point, then f is called a constant function. That isf : X → Y is a constant function if there is c ∈ Y such that f(x) = c for all x ∈ X.

If f : X → Y and g : Y → Z are two functions, then the composition of f and g is thefunction

g ◦ f : X → Z

defined by the formula(g ◦ f)(x) = g(f(x)).

Example 3.4. If f : R→ R, f(x) = 2x+ 1 and g : R→ R, g(x) = x2, then g ◦ f : R→ R isgiven by (g ◦ f)(x) = g(f(x)) = g(2x+ 1) = (2x+ 1)2.

If f : X → Y and A ⊂ X, then the restriction of f to A is

f |A : A→ Y, (f |A)(x) = f(x) for all x ∈ A.

In this situation we say that f is an extension of f |A.


Proposition 3.5. Let f : X → Y be a function and let A,B ⊂ Y . Then

(3.2) f−1(A ∪B) = f−1(A) ∪ f−1(B),

f−1(A ∩B) = f−1(A) ∩ f−1(B)

(3.3) f−1(Y \A) = X \ f−1(A).

Remark 3.6. Given a set E ⊂ X we often write Ec = X \ E to denote the complement ofthe set E in X. With this notation (3.3) can be stated as

f−1(Ac) = (f−1(A))c.

Proof. We will only prove (3.2) leaving the proofs of the other two statements as an exercise.We need to prove that

(3.4) x ∈ f−1(A ∪B) if and only if x ∈ f−1(A) ∪ f−1(B).

Let x ∈ f−1(A ∪B). This means that f(x) ∈ A ∪B, i.e. f(x) ∈ A or f(x) ∈ B. If f(x) ∈ A,then x ∈ f−1(A) ⊂ f−1(A) ∪ f−1(B). If f(x) ∈ B, then x ∈ f−1(B) ⊂ f−1(A) ∪ f−1(B). Ineither case x ∈ f−1(A) ∪ f−1(B).

Suppose now that x ∈ f−1(A)∪ f−1(B). Then x ∈ f−1(A) of x ∈ f−1(B). If x ∈ f−1(A),then f(x) ∈ A ⊂ A∪B. If x ∈ f−1(B), then f(x) ∈ B ⊂ A∪B. In either case f(x) ∈ A∪B,i.e. x ∈ f−1(A ∪B). This completes the proof of the equivalence (3.4).

We could also prove the equivalence (3.4) more formally as follows

x ∈ f−1(A ∪B) ≡f(x) ∈ A ∪B ≡

(f(x) ∈ A) ∨ (f(x) ∈ B) ≡(x ∈ f−1(A)) ∨ (x ∈ f−1(B)) ≡

x ∈ f−1(A) ∪ f−1(B).

Proposition 3.7. Let f : X → Y be a function and let A,B ⊂ X. Then

(3.5) f(A ∪B) = f(A) ∪ f(B), f(A ∩B) ⊂ f(A) ∩ f(B).

If in addition f is one-to-one, then

(3.6) f(A ∩B) = f(A) ∩ f(B).

We will only prove (3.5) leaving the proof of (3.6) as an exercise.

Proof that f(A ∪B) = f(A) ∪ f(B). If y ∈ f(A ∪ B), then y = f(x) for some x ∈ A ∪ Band we have two cases. If x ∈ A, then y = f(x) ∈ f(A) ⊂ f(A) ∪ f(B). If x ∈ B, theny = f(x) ∈ f(B) ⊂ f(A) ∪ f(B). In either case, y ∈ f(A) ∪ f(B). We proved that

y ∈ f(A ∪B) ⇒ y ∈ f(A) ∪ f(B),

3.1. FUNCTIONS 37

i.e.,

(3.7) f(A ∪B) ⊂ f(A) ∪ f(B).

If y ∈ f(A) ∪ f(B), then y ∈ f(A) or y ∈ f(B) and we have two cases. If y ∈ f(A), theny = f(x) for some x ∈ A ⊂ A ∪ B so y = f(x) ∈ f(A ∪ B). If y ∈ f(B), then y = f(x) forsome x ∈ B ⊂ A ∪B so y = f(x) ∈ f(A ∪B). We proved

y ∈ f(A) ∪ f(B) ⇒ y ∈ f(A ∪B),

i.e.,

(3.8) f(A) ∪ f(B) ⊂ f(A ∪B).

Now (3.7) and (3.8) yield equality f(A ∪B) = f(A) ∪ f(B).

Proof that f(A ∩B) ⊂ f(A) ∩ f(B). If y ∈ f(A ∩ B), then y = f(x) for some x ∈ A ∩ B.Hence x ∈ A and x ∈ B at the same time so y = f(x) ∈ f(A) and y = f(x) ∈ f(B) at thesame time and hence f ∈ f(A) ∩ f(B). We proved that

y ∈ f(A ∩B) ⇒ y ∈ f(A) ∩ f(B),

i.e. f(A ∩B) ⊂ f(A) ∩ f(B).

Remark 3.8. In general we do not have equality in f(A ∩ B) ⊂ f(A) ∩ f(B). That is,in general it is not true that f(A) ∩ f(B) ⊂ f(A ∩ B). This is somewhat surprising sincewe proved that f(A ∪ B) = f(A) ∪ f(B) and one could expect a similar equality for theintersection of sets.

Let us try to prove that f(A) ∩ f(B) ⊂ f(A ∩B) to see where the proof falls apart.

Let y ∈ f(A) ∩ f(B). Then y ∈ f(A) and y ∈ f(B) at the same time so

y = f(x) for some x ∈ A and y = f(x) for some x ∈ B.

Therefore y = f(x) for some x ∈ A ∩B so y ∈ f(A ∩B) and we proved that

y ∈ f(A) ∩ f(B) ⇒ y ∈ f(A ∩B),

i.e.,f(A) ∩ f(B) ⊂ f(A ∩B).

The proof is not correct! But where did we make a mistake?

We proved (and that is correct) that

(3.9) y = f(x) for some x ∈ A,

(3.10) y = f(x) for some x ∈ B,

and we concluded that y = f(x) for some x ∈ A∩B. This conclusion is however, not correctsince x ∈ A for which (3.9) is true need not be the same as x ∈ B for which (3.10) is true andso we cannot claim that there is a common x ∈ A ∩B for which y = f(x)!


3.2 More logic∗

We will prove now (3.5) using a more formal logical notation. In practice we do not do thisand we write proofs as those written above, but the point is to learn what logical structureis hidden behind the above proofs. First we have to learn some more logic.

Assume that P (x) and Q(x) are some statements. Then we have

∃x (P (x) ∨Q(x)) ≡ (∃x P (x)) ∨ (∃x Q(x)).

This is correct. ∃x (P (x) ∨ Q(x)) is true, when there is x such that P (x) is true or Q(x)which happens exactly when there is x such that P (x) is true or there is x such that Q(x) istrue i.e., (∃x P (x)) ∨ (∃x Q(x)). Thus the two statements are equivalent.

∀x (P (x) ∧Q(x)) ≡ (∀x P (x)) ∧ (∀x Q(x)).

This is also true. For every x both statements P (x) and Q(x) are true exactly when for allx, P (x) is true and for all x, Q(x) is true.

However in the following statements we only have implication in one direction

∃x (P (x) ∧Q(x)) ⇒ 6⇐ (∃x P (x)) ∧ (∃x Q(x)).

Indeed, if ∃x (P (x) ∧Q(x)) is true, then we can find x such that P (x) and Q(x) are true atthe same time. However, if the statement (∃x P (x))∧ (∃x Q(x)) is true, then P (x) and Q(x)can be true for different values of x. Here is an example. We assume that x ∈ R.

∃x (x2 ≥ 1 ∧ x2 ≤ 1/2)︸︷︷︸false

⇒ 6⇐ (∃x x2 ≥ 1)︸︷︷︸true

∧ (∃x x2 ≤ 1/2)︸︷︷︸true︸︷︷︸

true

.

A false statement implies a true statement, but a true statement does not imply a false oneso we only have an implication from left to right. Similarly we have

∀x(P (x) ∨Q(x)) 6⇒ ⇐ (∀x P (x)) ∨ (∀x Q(x))

Here is an example

∀x (x2 ≥ 1 ∨ x2 ≤ 1)︸︷︷︸true

6⇒ ⇐ (∀x x2 ≥ 1) ∨ (∀x x2 ≤ 1)︸︷︷︸false

.

Now we are ready to provide a formal proof of Proposition 3.7. We will only prove (3.5).

Proof that f(A ∪B) = f(A) ∪ f(B).

y ∈ f(A ∪B) ≡∃x ((x ∈ A ∪B) ∧ (y = f(x)) ≡∃x ((x ∈ A ∨ x ∈ B) ∧ (y = f(x)) ≡

∃x (x ∈ A ∧ y = f(x)) ∨ (x ∈ B ∧ y = f(x)) ≡(∃x (x ∈ A ∧ y = f(x))) ∨ (∃x (x ∈ B ∧ y = f(x))) ≡

y ∈ f(A) ∨ y ∈ f(B) ≡y ∈ f(A) ∪ f(B) .

3.3. PROBLEMS 39

Proof that f(A ∩B) ⊂ f(A) ∩ f(B).

y ∈ f(A ∩B) ≡∃x ((x ∈ A ∩B) ∧ (y = f(x)) ≡∃x (x ∈ A ∧ x ∈ B ∧ y = f(x)) ≡

∃x ((x ∈ A ∧ y = f(x)) ∧ (x ∈ B ∧ y = f(x)) ⇒ 6⇐(∃x (x ∈ A ∧ y = f(x))) ∧ (∃x (x ∈ B ∧ y = f(x))) ≡

y ∈ f(A) ∧ y ∈ f(B) ≡y ∈ f(A) ∩ f(B) .

All statements but one are equivalent and in one case we only have the implication ⇒ so weproved that y ∈ f(A ∩B) ⇒ y ∈ f(A) ∩ f(B) and hence f(A ∩B) ⊂ f(A) ∩ f(B).

3.3 Problems

Problem 1. Prove Proposition 3.7.

Problem 2. Prove that if f : X → Y is a function and A1, A2, A3, . . . are subsets of X, then

f

( ∞⋃i=1

Ai

)=∞⋃i=1

f(Ai) ,

and

(3.11) f

( ∞⋂i=1

Ai

)⊂∞⋂i=1

f(Ai).

Provide an example to show that we do not necessarily have equality in (3.11)

Problem 3. Prove that if f : X → Y is one-to-one and A1, A2, A3, . . . are subsets of X, then

f

( ∞⋂i=1

Ai

)=

∞⋂i=1

f(Ai) .

MORE PROBLEMS WILL BE ADDED!

Chapter 4

Mathematical induction

In this section we will assume familiarity with real numbers although a formal set of axiomsand a formal construction of real numbers will be presented later. While it might looklike a lack of consistency, mathematical induction is so important and straightforward thatpostponing it for later would be a mistake.

4.1 Patterns

Consider a well known formula

1 + 2 + 3 + . . .+ n =n(n+ 1)

2.

How can we check that it is true? We can check it for n = 1, 2, 3, . . . , 100 on a calculatorand we see that both sides are equal. We observe a pattern. We often guess general formulasfrom a pattern. Since the formula is true for all n = 1, 2, 3, . . . , 100 it seems obvious that itmust be true for all n. Is it a convincing argument?

Here is another example. Consider the sequence 2n, n = 0, 1, 2, 3, . . . and lookat the first digits in the decimal representation of 2n. The sequence 2n starts with1,2,4,8,16,32,64,128,256,512 . . . If we keep playing with the calculator we see that thefirst digits in the sequence are

1, 2, 4, 8, 1, 3, 6, 1, 2, 5,

1, 2, 4, 8, 1, 3, 6, 1, 2, 5,

1, 2, 4, 8, 1, 3, 6, 1, 2, 5,

1, 2, 4, 8, 1, 3, 6, 1, 2, 5,

1, 2, 4, 8, 1, 3, . . .

The above sequence shows the first digits of 2n for n = 0, 1, 2, 3, . . . , 45. We can easily seea pattern that keeps repeating. The sequence of digits 1, 2, 4, 8, 1, 3, 6, 1, 2, 5 appears with aperiodic regularity. In particular digits 7 and 9 do not appear in the sequence. That seems like

41

42 CHAPTER 4. MATHEMATICAL INDUCTION

a general rule. Can we state it as a theorem? We seem to have enough evidence. However, ifwe would add one more term to the sequence we would see that 246 = 7.036874418 . . .× 1013.Moreover 253 = 9.007199255 . . . × 1015. One can prove even a more surprising result thateventually the digit 7 will appear more often1 than the digit 8!

Let us consider now another, more complicated, example.2

Example 4.1. Let a1 = 14, a2 = 128, a3 = 1170, a4 = 10695, a5 = 97763, and let thefollowing elements in the sequence be defined by the recursive function

an+5 = 10an+4 − 8an+3 + an+2 + 3an+1 + 2an for n ≥ 1.

This sequence is defined in a similar way as the famous Fibonacci sequence. Consider alsoanother sequence b1 = 14, b2 = 128 and

bn+2 =

[b2n+1

bn+

1

2

], for n ≥ 1.

Here [x] denotes the integer part of x, i.e., the largest integer less than or equal to x. Are thetwo sequences an and bn equal for all n? The formulas are very different, but if we experimentwith a calculator or a computer, we will see that they agree for n = 1, 2, . . . , 100. We seea pattern, so two sequences seem equal. We are not sure, so we keep checking for largern = 101, 102, . . . , 5000. Still works. Now we can be sure that they must be equal for all n.Really? If we would be more patient and have checked the formula for still larger n we wouldsee that an 6= bn for n = 5016. That is really surprising! Isn’t it?

Example 4.2. Let a0 = a1 = 1 and

an+1 =a970 + a971 + a972 + . . .+ a97n

nfor n = 1, 2, 3, . . .

One can prove that the numbers an are integers for n ≤ 2039, but a2040 is not an integer.

4.2 Induction

If we observe a pattern that suggest a formula for a given sequence, it can never be regarded asa proof and we need a rigorous mathematical argument that will allow us to verify the formulafor all n. A method of proving such statements is based on the Principle of MathematicalInduction (Theorem 2.9). Let’s recall it again.3

Theorem 4.3 (Principle of Mathematical Induction). Let S be a subset of N that has twoproperties

1The reason why the sequence of 10 digits seems to appear periodically has a rational explanation. 210 =1024 is very close to 1000. Thus multiplying a given number 10 times by 2 it is almost like multiplying itby 1000, and multiplication by 1000 just adds three zeros at the end — it does not change the digits at thebeginning. However, we do not multiply by 1000, but by 1024 and 24 creates an error which, like a tumor,grows so much that at the time of taking 246 the error changes the anticipated first digit 6 to 7. Tumor killsthe patient.

2This example and the next one are due to J. Wroblewski.3While in Theorem 2.9 we started induction form n = 0, here we start the induction from n = 1.

4.2. INDUCTION 43

1. 1 ∈ S,

2. For every natural number n, if n ∈ S, then n+ 1 ∈ S.

Then S = N.

The above theorem leads to a beautiful and powerful method of proving that a statementP (n) about a natural number n is true for all n ∈ N. The method is called MathematicalInduction. We formulate the method as a theorem.

Theorem 4.4 (Principle of Mathematical Induction). Let for each n ∈ N, P (n) be a state-ment about a natural number n. Suppose also that

1. P (1) is true,

2. For every n ∈ N, if P (n) is true, then P (n+ 1) is true.

Then P (n) is true for all n ∈ N.

Proof. If S = {n ∈ N : P (n) is true}, then S satisfies the assumptions of from Theorem 4.3and hence S = N. 2

The assumption “if P (n) is true” in 2. is called the induction hypothesis.

Let us check how the theorem works in practice. As a first application of MathematicalInduction we will prove

Example 4.5. Prove that for every n ∈ N, 26n+1 + 32n+2 is divisible by 11.

Proof.

1. For n = 1 we have26n+1 + 32n+2 = 27 + 34 = 209 = 11 · 19.

2. Suppose now 11|26n+1 + 32n+2. We need to prove that 11|26(n+1)+1 + 32(n+1)+2. By theassumption 26n+1 + 32n+2 = 11k for some k ∈ N. We have

26(n+1)+1 + 32(n+1)+2 = 26n+1 · 26 + 32n+2 · 32 = 64 · 26n+1 + 9 · 32n+2

= 64(26n+1 + 32n+2)− 55 · 32n+2 = 64 · 11k − 5 · 11 · 32n+2 = 11(64k − 5 · 32n+2) .

The proof is complete. 2

Proposition 4.6 (Bernoulli’s inequality). If n is a positive integer and a ≥ −1, a realnumber, then

(a+ 1)n ≥ 1 + na.

Proof.

1. For n = 1 the inequality is obvious.


2. Suppose that (1 + a)n ≥ 1 + na. We need to prove that (1 + a)n+1 ≥ 1 + (n + 1)a. Wehave

(1 + a)n+1 = (1 + a)n(1 + a) ≥ (1 + na)(1 + a) = 1 + (n+ 1)a+ na2 ≥ 1 + (n+ 1)a .

In the proof of the first inequality we used the fact that (1 +a)n ≥ 1 +na and 1 +a ≥ 0. Theproof is complete. 2

Example 4.7. Prove that for n ≥ 1

1

n+ 1+

1

n+ 2+

1

n+ 3+ . . .+

1

3n+

1

3n+ 1> 1 .

Proof.

1. For n = 1 we have1

2+

1

3+

1

4=

6 + 4 + 3

12=

13

12> 1 .

2. Suppose the inequality is true for n, we have to prove it for n+ 1.

1

n+ 2+

1

n+ 3+ . . .+

1

3(n+ 1) + 1

=1

n+ 2+

1

n+ 3+ . . .+

1

3n+ 1+

1

3n+ 2+

1

3n+ 3+

1

3n+ 4

=

(1

n+ 1+

1

n+ 2+ . . .+

1

3n+ 1

)− 1

n+ 1+

1

3n+ 2+

1

3n+ 3+

1

3n+ 4.

Now it suffices to prove that

− 1

n+ 1+

1

3n+ 2+

1

3n+ 3+

1

3n+ 4> 0 .

We have1

3n+ 2+

1

3n+ 3+

1

3n+ 4>

1

n+ 1=

3

3n+ 3≡

1

3n+ 2+

1

3n+ 4>

2

3n+ 3≡

3n+ 4 + 3n+ 2

(3n+ 2)(3n+ 4)>

2

3n+ 3≡

(6n+ 6)(3n+ 3) > 2(3n+ 2)(3n+ 4) ≡

(3n+ 3)2 > (3n+ 2)(3n+ 4) ≡

9n2 + 18n+ 9 > 9n2 + 18n+ 8 ≡

9 > 8.

Since the last statement is true and all statements above are equivalent, the inequality wewanted to prove is true. The proof is complete. 2

4.2. INDUCTION 45

Recall that n factorial is defined by n! = 1 · 2 · . . . · n and 0! = 1. We also define binomialcoefficients

(4.1)

(n

k

)=

n!

k!(n− k)!=n(n− 1) · . . . · (n− k + 1)

k!.

Theorem 4.8 (Binomial Theorem). For a, b ∈ R and n ∈ N we have

(a+ b)n =n∑k=0

(n

k

)an−kbk = an+

n

1!an−1b+

n(n− 1)

2!an−2b2 + . . .+

n(n− 1) · · · 2(n− 1)!

abn−1 + bn.

Remark 4.9. The first three and the last three terms of the above formula are

an + nan−1b+n(n− 1)

2an−2b2 + . . .+

n(n− 1)

2a2bn−2 + nabn−1 + bn

Also the coefficients in the binomial formula are symmetric in the sense that the coefficientat an−kbk is equal to the coefficient at akbn−k, because one can easily prove that(

n

k

)=

(n

n− k

).

Proof. For n = 1 the equality is obvious. Suppose it is true for n and we will prove it forn+ 1. One can easily prove (see problem 1) that(

n

k − 1

)+

(n

k

)=

(n+ 1

k

).

By the induction hypothesis we assume that binomial formula is true for n. Multiplying itby a+ b yields

(a+ b)n+1 =n∑k=0

(n

k

)an−k+1bk +

n∑k=0

(n

k

)an−kbk+1 .

Observe thatn∑k=0

(n

k

)an−kbk+1 = bn+1 +

n−1∑k=0

(n

k

)an−kbk+1 = bn+1 +

n∑k=1

(n

k − 1

)an+1−kbk .

Hence

(a+ b)n+1 = an+1 +

n∑k=1

(n

k

)an+1−kbk

+ bn+1 +

n∑k=1

(n

k − 1

)an+1−kbk

= an+1 +

(n∑k=1

[(n

k

)+

(n

k − 1

)]an+1−kbk

)+ bn+1

= an+1 +

(n∑k=1

(n+ 1

k

)an+1−kbk

)+ bn+1

=

n+1∑k=0

(n+ 1

k

)an+1−kbk .

The proof is complete.


You need to memorize two cases:

(a± b)2 = a2 ± 2ab+ b2 and (a+ b)3 = a3 + 3a2b+ 3ab2 + b3.

For the cases with higher exponents the easiest way to obtain formula is to use the Pascaltriangle:

Then we have

(a+ b)n Expansion Pascal’s triangle

(a+ b)0 1 1(a+ b)1 1a+ 1b 1,1(a+ b)2 1a2 + 2ab+ 1b2 1,2,1(a+ b)3 1a3 + 3a2b+ 3ab2 + 1b3 1,3,3,1(a+ b)4 1a4 + 4a3b+ 6a2b2 + 4ab3 + 1b4 1,4,6,4,1(a+ b)5 1a5 + 5a4b+ 10a3b2 + 10a2b3 + 5ab4 + 1b5 1,5,10,10,5,1

Theorem 4.10. If n is a natural number and a, b ∈ R, then

an − bn = (a− b)(an−1 + an−2b+ an−3b2 + . . .+ abn−2 + bn−1).

Proof. While one can prove the equality using mathematical induction, here is a shorterargument

(a− b)(an−1 + an−2b+ an−3b2 + . . .+ abn−2 + bn−1)

= a(an−1 + an−2b+ an−3b2 + . . .+ abn−2 + bn−1)

− b(an−1 + an−2b+ an−3b2 + . . .+ abn−2 + bn−1)

= (an + an−1b+ an−2b2 + . . .+ a2bn−2 + abn−1)

− (an−1b+ an−2b2 + an−3b3 + . . .+ abn−1 + bn)

= an − bn.

The last equality follows from the fact that most of the terms will cancel out.

4.2. INDUCTION 47

Theorem 4.11 (Cauchy-Schwarz inequality). For n ∈ N and real numbers ai, bi, i =1, 2, . . . , n we have ∣∣∣∣∣

n∑i=1

ai bi

∣∣∣∣∣ ≤(

n∑i=1

a2i

)1/2( n∑i=1

b2i

)1/2

.

Proof. While it is possible to prove the inequality using mathematical induction (Problem 6)we will present a much more clever and a shorter proof.

For any real number t we have

0 ≤n∑i=1

(ai + tbi)2 =

n∑i=1

a2i + 2t

n∑i=1

aibi + t2n∑i=1

b2i .

That means the quadratic function

t 7→

(n∑i=1

b2i

)t2 +

(2

n∑i=1

aibi

)t+

(n∑i=1

a2i

)

is always non-negative so it can have at more one root and therefore its discriminant is lessthan or equal to zero, that is(

2n∑i=1

aibi

)2

− 4

(n∑i=1

b2i

)(n∑i=1

a2i

)≤ 0,

(n∑i=1

aibi

)2

≤

(n∑i=1

b2i

)(n∑i=1

a2i

)and the Cauchy-Schwarz inequality follows upon taking the square root.4

Example 4.12. “Prove” that for all n ∈ N the inequality 30n < 2n + 110 is true.

“Proof”. For n = 1 we have 30 < 2 + 110 which is true. Suppose the inequality is truefor given n and we need to prove it for n+ 1. We have

30(n+ 1) = 30n+ 30 < 2n + 110 + 30 = 2n+1 + 110 + 30− 2n < 2n+1 + 110,

and clearly the last inequality is true for n ≥ 5. Therefore it remains to prove the inequality30n < 2n + 110 for n = 2, 3, 4 and we do it by a direct computation. For n = 2 we have60 < 4 + 110, For n = 3 we have 90 < 8 + 110 and for n = 4 we have 120 < 16 + 110. “2”

In particular for n = 6 we obtain an amazing inequality 180 < 174. The problem is thatwe proved the inequality for n = 1, 2, 3, 4 and then we proved that if it is true for n ≥ 5,then it is also true for n + 1. We did not however, checked the inequality for n = 5 and theinequality is false for n = 5. Be careful!

4Remember that√x2 = |x|.


There are many modifications of the method of induction, for example in many situationswe need to start with n = n0 rather than n = 1. For example if we want to prove P (n) forall n ≥ 3 it suffices to prove P (3) and that P (n) implies P (n+ 1) for n ≥ 3.

As an application of the Principle of Mathematics Induction we will prove the followingimportant result.

Theorem 4.13 (Well-Ordering Principle). Every nonempty subset S ⊂ N has the smallestelement, i.e., there is n0 ∈ S such that n0 ≤ n for all n ∈ S.

Proof. Suppose to the contrary that ∅ 6= S ⊂ N does not have the smallest element. LetX = N \ S and let

Y = {n ∈ X : ∀m ≤ n (m ∈ X)} = {n ∈ X : 1, 2, 3, . . . , n ∈ X}.

That is, the set Y consists of all natural numbers n ∈ X such that none of the elements1, 2, . . . , n belong to S. Clearly Y ⊂ X = N \ S. We will use the Principle of MathematicalInduction to prove that Y = N and hence S = ∅, so we will have a contradiction. Clearly1 ∈ X as otherwise 1 ∈ S would be the smallest element. Hence also 1 ∈ Y . Suppose nowthat n ∈ Y . That means all natural numbers m ≤ n are not in S. Thus n+ 1 cannot belongto S as otherwise it would be the smallest element in S, so n+ 1 ∈ X and since n+ 1 and allsmaller numbers than n + 1 are also in X we have that n + 1 ∈ Y . It follows now from thePrinciple of Mathematical Induction (Theorem 4.3) that Y = N which is a contradiction aspointed above.

Sometimes in order to prove P (n+1) it is not enough to use P (n) but we also need to useP (k) for all k ≤ n. In this situation the following modification of the method of inductionapplies.

Theorem 4.14 (Principle of Strong Induction). Let for each n ∈ N, P (n) be a statementabout a natural number n. Suppose also that

1. P (1) is true,

2. For every n ∈ N, if P (k) is true, for k = 1, 2, . . . n, then P (n+ 1) is true.

Then P (n) is true for all n ∈ N.

Again this theorem seems obvious and we will not prove it.

Theorem 4.15 (Binary representation). Every non-negative integer can be represented as

n = ck2k + ck−12

k−1 + . . .+ c020,

where ci ∈ {0, 1}. It is called a binary representation.

4.2. INDUCTION 49

Proof. For n = 0 we have 0 = 0 · 20. Let n ≥ 0 and assume that all integers 0, 1, 2, . . . , n havebinary representations (Principle of Strong Induction). We need to prove that n+ 1 also hasit.

If n+ 1 is even, n ≥ 1 and hence (n+ 1)/2 ≤ n has a binary representation

n+ 1

2= ck2

k + ck−12k−1 + . . .+ c02

0.

Accordingly

n+ 1 = ck2k+1 + ck−12

k + . . .+ c021 + 0 · 20.

is a binary representation of n+ 1. If n+ 1 is odd, then n/2 ≥ 0 has a binary representation

n

2= ck2

k + ck−12k−1 + . . .+ c02

0

and hence

n+ 1 = ck2k+1 + ck−12

k + . . .+ c021 + 1 · 20

is a binary representation of n+ 1.

A quite unusual modification of the method of induction will be used in the proof of thefollowing result (no cheating this time!).

Theorem 4.16 (Arithmetic-Geometric Mean Inequality). If a1, a2, a3, . . . , an ≥ 0, then

n√a1 · a2 · . . . · an ≤

a1 + a2 + . . .+ ann

and the equality holds if and only if a1 = a2 = . . . = an.

Proof. We will prove the inequality only, but the reader may conclude from the proofthat the equality holds if and only if a1 = a2 = . . . = an. We leave this last conclusion as anexercise.

First we will prove the inequality for n = 2k, k = 1, 2, 3, . . ., i.e., we will prove it forn = 2, 4, 8, 16, . . .

1. For n = 21 we have

a1 + a22

≥√a1a2 ≡ a1 − 2

√a1a2 + a2 ≡ (

√a1 −

√a2)

2 ≥ 0 .

Since the last condition is obviously true, the inequality, as equivalent to the last statement,is also true.


2. Suppose that the inequality is true for n = 2k. We need to prove it for n = 2k+1. We have

2k+1√a1 · . . . · a2k · a2k+1 · . . . · a2k+1

=√

2k√a1 · . . . · a2k 2k

√a2k+1 · . . . · a2k+1

≤2k√a1 · . . . · a2k + 2k

√a2k+1 · . . . · a2k+1

2

≤a1+...+a2k

2k+

a2k+1

+...a2k+1

2k

2

=a1 + a2 + . . .+ a2k+1

2k+1.

The above estimates require some explanations. The first equality is obvious. The secondinequality is just a consequence of the arithmetic-geometric inequality for n = 1 which wasproved in 1. The third inequality follows from the inductive assumption that the inequalityis true for n = 2k and the last equality is obvious again.

We proved the inequality for n = 2, 4, 8, 16, . . .. In order to prove that the inequality istrue for all integers it suffices to prove that if it is true for n, then it is also true for n − 1(reverse induction).

Thus suppose that the inequality is true for n. We will prove it is true for n− 1. We have

n

√a1 · . . . · an−1 ·

(a1 + . . .+ an−1

n− 1

)

≤a1 + . . . an−1 +

(a1+...+an−1

n−1

)n

=a1 + . . .+ an−1

n− 1.

The first inequality above follows from the assumption that the arithmetic-geometric inequal-ity is true for n. Hence

n√a1 · . . . · an−1 n

√a1 + . . .+ an−1

n− 1≤ a1 + . . .+ an−1

n− 1,

so (a1 · . . . · an−1

)1/n ≤ (a1 + . . . an−1n− 1

)1−1/n

4.3. PROBLEMS 51

and finally (a1 · . . . · an−1

) 1n−1 ≤ a1 + . . .+ an−1

n− 1

which is what we wanted to prove. 2

4.3 Problems

Problem 1. Prove that if 0 ≤ k ≤ n are integers, then(n

k − 1

)+

(n

k

)=

(n+ 1

k

).

Problem 2. Prove that

1 + 2 + . . .+ n =n(n+ 1)

2.

Problem 3. Prove that for q 6= 1 we have

1 + q + q2 + . . .+ qn =1− qn+1

1− q.

Problem 4. Prove using induction that if n is a natural number and a, b ∈ R, then

an − bn = (a− b)(an−1 + an−2b+ an−3b2 + . . .+ abn−2 + bn−1).

Problem 5. Prove that the number of diagonals in a convex polygon with n sides equalsn(n− 3)/2.

Problem 6. Use mathematical induction to prove the Cauchy-Schwarz inequality∣∣∣∣∣n∑i=1

ai bi

∣∣∣∣∣ ≤(

n∑i=1

a2i

)1/2( n∑i=1

b2i

)1/2

.

Problem 7. Examine the sequence of first digits in the decimal expansion of 1.778n, n =1, 2, 3, . . . We see a periodic sequence 1, 3, 5, 9. The sequence will eventually change. Usingcalculator find the smallest value of n for which the sequence changes. Why does the sequenceof digits 1, 3, 5, 9 appear periodically for such a long time? Can you modify the sequence sothat the digits 1, 3, 5, 9 will appear periodically even more times?

Chapter 5

Relations, integers and rationalnumbers

In Section 2.3 we learned how to define natural numbers N∗ = {0, 1, 2, . . .} directly from theaxioms of the set theory. Those who did not follow this construction carefully, may assumethat somehow we know what the natural numbers N∗ are and that they are equipped withthe operations of addition, multiplication and the order n ≤ m. We also assume familiaritywith basic properties of arithmetic like

n+m = m+ n, n(m+ k) = nm+ nk, etc.

While all these properties can be proved using the definitions, we take them for granted. Themain point of this chapter is to learn how to define all integers Z and all rational numbers Q.In the next chapter we will learn how to define all real numbers R.

We are born with a good understanding of what natural numbers are, however, existenceof negative integers, rational numbers or real numbers is not so natural and it took humanitythousands of years to understand what they are and how they should be constructed. LeopoldKronecker (1823–1891) who worked on number theory, algebra and logic once said: Godcreated the natural numbers; all the rest is the work of man.

However, before we can proceed to the construction of Z or Q we need to learn aboutbinary relations.

5.1 Binary relations

Before we provide a general definition of a binary relation, called relation for short, let uslook at some examples of relations:

1. A ⊂ B, where A and B are sets,

2. A = B, where A and B are sets,

53

54 CHAPTER 5. RELATIONS, INTEGERS AND RATIONAL NUMBERS

3. x ≤ y, where x, y are real numbers,

4. n < m, where n,m are integers,

5. Congruence relation on the set of integers. Given n ∈ N, n ≥ 2, two integers a, b ∈ Zare congruent modulo n, denoted

a ≡ b (mod n),

if a− b is divisible by n, n|a− b,

6. ∆ABC ∼= ∆A′B′C ′, relation of congruence of triangles.

More generally, if X is a set, and ∼ is a relation between elements of X, then we write x ∼ y ifthe elements x, y are in this relation. However, we still need a general and rigorous definitionof a relation.

A (binary) relation on X is any subset R ⊂ X × X of the Cartesian product. Then wewrite

a ∼ b if and only if (a, b) ∈ R.

Let us look at some examples.

Example 5.1. The real numbers satisfy the relation x ≤ y if and only if (x, y) belongs tothe set:

Example 5.2. The natural numbers satisfy the relation n = m is and only if (n,m) belongsto the set:

We say that a relation ∼ on a set X is

1. Reflexive, if a ∼ a for all a ∈ X,

2. Symmetric, if for all a, b ∈ X, a ∼ b implies b ∼ a,

3. Transitive, if for all a, b, c ∈ X, if a ∼ b and b ∼ c, then also a ∼ c,

4. Equivalence relation, if it is reflexive, symmetric and transitive.

1. The relation A ⊂ B, where A and B are sets is reflexive and transitive, but not sym-metric.

2. The relation A = B, where A and B are sets is an equivalence relation.

3. The relation x ≤ y, where x, y are real numbers is reflexive and transitive, but notsymmetric.

4. The relation n < m, where n,m are integers is transitive, but neither reflexive norsymmetric.

5.1. BINARY RELATIONS 55

5. The congruence relation between integers is an equivalence relation. Indeed, it is re-flexive, a ≡ a (mod n), because a − a = 0 is divisible by n. It is symmetric because ifa ≡ b (mod n), then b ≡ a (mod n) (if a − b is divisible by n, then obviously b − a isdivisible by n). Finally, the relation is transitive. If a ≡ b (mod n) and b ≡ c (mod n),then a− b = kn and b− c = `n so a− c = (a− b) + (b− c) = (k + `)n and hence a ≡ c(mod n).

6. The relation of congruence of triangles is an equivalence relation.

Formally, there is a ‘small’ problem with the example of sets. We defined a relation to be arelation between elements of some set X. However, the relation A ⊂ B between any two setsshould be defined on the set of all sets, but there is no set of all sets (Proposition 2.8)! Stillwe can think of the collection of all sets. Why not? While the collection of all sets is not aset, we call it a category of sets so the relation A ⊂ B is defined in the category of all sets.We use the world ‘category’ to emphasize that this is not a set.

Let ∼ be an equivalence relation in a set X. For any a ∈ X we define the equivalenceclass of a as

[a] = {b ∈ X : b ∼ a}.

Clearly a ∈ [a] (because the relation ∼ is reflexive) so [a] is non-empty.

Example 5.3. The relation x = y between real numbers is an equivalence relation and[x] = {x} for very x ∈ R.

Example 5.4. Consider the congruence relation a ≡ b (mod 5) in Z. We have that b ≡ a(mod 5) if b − a = 5k or b = a + 5k for some k ∈ Z. Therefore [a] = {a + 5k : k ∈ Z}. Wehave

[0] = {. . . ,−15,−10,−5, 0, 5, 10, 15, . . .}[1] = {. . . ,−14, −9, −4, 1, 6, 11, 16, . . .}[2] = {. . . ,−13, −8, −3, 2, 7, 12, 17, . . .}[3] = {. . . ,−12, −7, −2, 3, 8, 13, 18, . . .}[4] = {. . . ,−11, −6, −1, 4, 9, 14, 19, . . .}[5] = {. . . ,−10, −5, 0, 5, 10, 15, 20, . . .}

Clearly [0] = [5]. Each equivalence class [n] consists of a set of integers that is infinite in bothdirections with gaps of length 5 between consecutive integers. The set [n + 1] is obtainedfrom the set [n] by shifting all integers to the right by 1. Since the gaps have length 5, theset [n + 5] is obtained from [n] by shifting all integers to the right by 5 and we obtain thesame set as [n], i.e., [n + 5] = [n]. This is to say, that we have only 5 different equivalenceclasses: [0], [1], [2], [3], [4]. Every integer belongs to one and only one equivalence class and[a] = [b] if and only if a ≡ b (mod 5), that is if and only if a and b differ by a multiple of 5.The collection of these equivalence classes is denoted by

Z5 = {[0], [1], [2], [3], [4]},

and with some abuse of notation we can simply identify Z5 with the set {0, 1, 2, 3, 4}.


Example 5.5. The above example can be easily generalized as follows. Let n ∈ N. Then theequivalence classes of the congruence relation a ≡ b (mod n) are

[a] = {a+ kn : k ∈ Z} = {. . . , a− 2n, a− n, a, a+ n, a+ 2n, . . .},

[a] = [a+n] and we have only n distinct equivalence classes [0], [1], . . . , [n− 1]. Every integerbelongs to one and only one equivalence class and [a] = [b] if and only if a ≡ b (mod n). Theequivalence classes form the set

Zn = {[0], [1], [2], . . . , [n− 1]},

and we can identify it with Zn = {0, 1, 2, . . . , n− 1}.

In the case of an arbitrary equivalence relation on a set X we have

Theorem 5.6. Let ∼ be an equivalence relation on a set X. Then every element of X belongsto one and only one equivalence class. More precisely, for every a ∈ X, a ∈ [a] and if a ∈ [b],then [a] = [b]. Moreover a ∼ b if and only if [a] = [b].

That means the set X is partitioned into a collection of pairwise disjoint equivalenceclasses.

Proof. If a ∈ A, then a ∼ a (reflexive relation) so a ∈ [a]. Suppose now that a ∈ [b]. We needto show that [a] = [b].

Since a ∈ [b], it follows from the definition of [b] that a ∼ b and hence b ∼ a (symmetricrelation).

If c ∈ [a] then c ∼ a, a ∼ b so c ∼ b (transitive relation) and hence c ∈ [b]. This provesthat [a] ⊂ [b]. If c ∈ [b], then c ∼ b, b ∼ a and hence c ∼ a so c ∈ [a]. This proves that[b] ⊂ [a]. Since [a] ⊂ [b] and [b] ⊂ [a] we have that [a] = [b].

It remains to prove that a ∼ b if and only if [a] = [b]. If a ∼ b, then a ∈ [b], but (as weproved above) it implies that [b] = [a]. If [a] = [b], then a ∈ [a] = [b], a ∈ [b] and hence a ∼ b.This completes the proof that a ∼ b is equivalent to [a] = [b].

5.2 Integers

Every child knows what the natural numbers are. I have 5 fingers in my palm, I can countthem! But what is −5? I cannot think of −5 fingers! In the elementary school they teachus that the temperature can be negative. Just look at your thermometer! Or that you canhave negative savings if you have a debt. All these examples are just to convince us that weshould take negative integers for granted and just to learn how to use them. Imagine thatyou want to teach a child (who knows natural numbers including 0) about all integers, butyou do not have in mind examples with the temperature or debt. Then the natural (but notvery didactic) way to introduce integers would be to say:

5.2. INTEGERS 57

You know, integers are just numbers n − m, where n,m ∈ N∗. If n ≥ m, then youknow that it is a number in N∗, but you can also write n − m, even if n < m. Don’tworry what it is. Treat it formally and just learn how to compute. First, remember thatn1−m1 = n2−m2 if n1 +m2 = n2 +m1. You know how to add natural numbers, don’t you?You add integers as follows: (n1 −m1) + (n2 −m2) = (n1 + n2) − (m1 + m2) and multiplythem by (n1 − m1)(n1 − m2) = (n1n2 + m1m2) − (n1m2 + m1n2). Aha, there is one morething. You can regard natural numbers n ∈ N∗ as integers of the form n−0 and we also write−n = 0− n. Now you know everything you need to know. Go and do your homework!

While, I am sure, the child would appreciate your lesson (a poor child left behind), thisis exactly what the integers are and this is how we define them in mathematics. We identifyintegers with ordered pairs of natural numbers (n,m), n,m ∈ N∗. Such a pair representsn −m and we identify pairs (n1,m1) and (n2,m2) if n1 + m2 = n2 + m1. That means weintroduce the relation: (n1,m1) ∼ (n2,m2) if and only if n1 + m2 = n2 + m1. This relationturns out to be an equivalence relation and we identify integers with equivalence classes ofthis relation. Okay, let’s do it formally.

In the set N∗ × N∗ we define the relation ∼ as follows

(n1,m1) ∼ (n2,m2) if and only if n1 +m2 = n2 +m1.

This is an equivalence relation. Indeed (n,m) ∼ (n,m) because n + m = n + m so it isreflexive. If (n1,m1) ∼ (n2,m2), then n1 +m2 = n2 +m1 and hence n2 +m1 = n1 +m2 andthe last condition means that (n2,m2) ∼ (n1,m1). Therefore the relation is symmetric. If(n1,m1) ∼ (n2,m2) and (n2,m2) ∼ (n3,m3), then n1 +m2 = n2 +m1 and n2 +m3 = n3 +m2

so

(n1 +m2) + (n2 +m3) = (n2 +m1) + (n3 +m2),

(n1 +m3) + (n2 +m2) = (n3 +m1) + (n2 +m2),

n1 +m3 = n3 +m1

and hence (n1,m1) ∼ (n3,m3), which proves that the relation is transitive.

We proved that ∼ is an equivalence relation and hence we can talk about equivalenceclasses of this relation.

The set of integers, denoted by Z is defined to be the set of all equivalence relations

Z = {[(n,m)] : (n,m) ∈ N∗ × N∗}.

If a, b ∈ Z, then there are natural numbers n1,m1, n2,m2 ∈ N∗ such that a = [(n1,m1)] andb = [(n2,m2)] and we define

a+ b = [(n1 + n2,m1 +m2)],

a · b = [(n1n2 +m1m2, n1m2 +m1n2)]

Note that the natural numbers n1,m1, n2,m2 are not uniquely determined so in order toshow that addition and multiplication of integers is well defined we need to show that if werepresent a and b differently, that is a = [(n1, m1)] and b = [(n2, m2)], then the result of


addition and multiplication computed with this new representation will be the same as withthe old one, that is we need to show that

(5.1) [(n1 + n2, m1 + m2)] = [(n1 + n2,m1 +m2)],

(5.2) [(n1n2 + m1m2, n1m2 + m1n2)] = [(n1n2 +m1m2, n1m2 +m1n2)].

We will verify (5.1) only, leaving verification of (5.2) as an exercise.

We have

a = [(n1,m1)] = [(n1, m1)], b = [(n2,m2)] = [(n2, m2)]

so

n1 + m1 = n1 +m1, n2 + m2 = n2 +m2.

Adding both equalities yields

n1 + m1 + n2 + m2 = n1 +m1 + n2 +m2,

(n1 + n2) + (m1 +m2) = (n1 + n2) + (m1 + m2),

which proves (5.1).

Let 0 = [(0, 0)] and 1 = [(1, 0)]. If a = [(n,m)] ∈ Z, then we write −a = [(m,n)].

Theorem 5.7. The following properties are satisfied by all a, b, c ∈ Z:

(Z1) (a+ b) + c = a+ (b+ c),

(Z2) a+ b = b+ a,

(Z3) a+ 0 = a,

(Z4) a+ (−a) = 0,

(Z5) (a · b) · c = a · (b · c),

(Z6) a · b = b · a,

(Z7) a · 1 = a,

(Z8) a · (b+ c) = (a · b) + (a · c).

(Z9) 1 6= 0.

All these properties are easy to verify directly from the definitions and the proof is left tothe reader. In what follows we take them for granted.

The subtraction k − ` is well defined among elements of N∗ as long as k ≥ `. Namely, wedefine k − ` to be a unique n ∈ N∗ such that k = n + `, but k − ` makes no sense in N∗ ifk < `.

5.2. INTEGERS 59

Each integer can be uniquely represented as

(5.3) [(n, 0)] or [(0, n)].

Indeed, let [(k, `)] ∈ Z be any integer. If k ≥ `, then [(k, `) = [(n, 0)], where n = k − ` ∈ N∗.If ` ≥ k, then [(k, `)] = [(0, n)], where n = ` − k ∈ N∗. The integer n representing [(k, `)] isunique. If [(n, 0)] = [(n, 0)], then n + 0 = n + 0, n = n. Similarly, if [(0, n)] = [(0, n)], thenn = n.

Note also that [(0, n)] = −[(n, 0)]. Because of the (unique) representation (5.3) of integers,we can define n = [(n, 0)] so −n = [(0, n)] and we can list all integers in Z as

(5.4) . . . , [(0, 4)]︸︷︷︸−4

, [(0, 3)]︸︷︷︸−3

, [(0, 2)]︸︷︷︸−2

, [(0, 1)]︸︷︷︸−1

, [(0, 0)]︸︷︷︸0

, [(1, 0)]︸︷︷︸1

, [(2, 0)]︸︷︷︸2

, [(3, 0)]︸︷︷︸3

, [(4, 0)]︸︷︷︸4

, . . .

We now can identify natural numbers with integers, by identifying n ∈ N∗ with n = [(n, 0)],but we still need to check that the arithmetic operations of addition and multiplication in N∗are consistent with the corresponding operations in Z. That is we need to check that for alln,m ∈ N∗ we have

n+m = n+m and n ·m = n ·m.

The verification is easy and left to the reader.

Finally, we can equip integers with the inequality ≤ which is a relation defined as follows.For any a, b ∈ Z

a ≤ b if and only if b = a+ n for some n ∈ N∗.

With this inequality, integers in (5.4) are listed in the increasing order.

Theorem 5.8. For all integers a, b, c ∈ Z, the relation of inequality ≤ has the followingproperties:

(Z10) a ≤ a.

(Z11) a ≤ b ∧ b ≤ a ⇒ a = c.

(Z12) a ≤ b ∧ b ≤ c ⇒ a ≤ c.

(Z13) Either a ≤ b or b ≤ a.

(Z14) a ≤ b ⇒ a+ c ≤ b+ c

(Z15) 0 ≤ a ∧ 0 ≤ b ⇒ 0 ≤ a · b.

We also write a ≥ b if b ≤ a.

From now one we will write n and −n instead of n and −n and we will regard N∗ (andhence N) as a subset of Z.


5.3 Rational numbers

In the ancient Greece they did not talk about rational numbers, but about proportions. Twosegments (areas etc.) A and B are in the proportion like n : m if A increased m times equalsB increased n times. Here, of course m and n are positive integers. Clearly proportionsn1 : m1 and n2 : m2 are equivalent if n1m2 = n2m1. Rational numbers are the abstractionof a notion of a proportion and formally they have to be defined as equivalence classes ofproportions n : m. Here we however, allow the integers to be negative.

ConsiderF = {(n,m) ∈ Z× Z : m 6= 0}

and define the relation

(n1,m1) ∼ (n2,m2) if and only if n1m2 = n2m1.

A reader will not have difficulty to check that this is an equivalence relation. The set ofrational numbers is

Q = {[(n,m)] : (n,m) ∈ Z}with the operations of addition and multiplication defined below.

The equivalence classes are denote by

n

m:= [(n,m)].

Let r, s ∈ Q. Then

r =n1m1

, s =n2m2

for some n1,m1, n2,m2 ∈ Z, nm1,m2 6= 0

and we define

r + s = [(n1m2 + n2m1,m1m2)] =n1m2 + n2m1

m1m2, rs = [(n1n2,m1m2)] =

n1n2m1m2

.

As in the case of the construction of integers we need to show that the definition of r+ s andrs does not depend on how we represent r and s. That is, we need to show that if also

r =n1m1

, s =n2m2

, that is n1m1 = n1m1, n2m2 = n2m2,

thenn1m2 + n2m1

m1m2=n1m2 + n2m1

m1m2and

n1n2m1m2

=n1n2m1m2

.

We will prove the second equality, leaving the proof of the first one as an exercise. First notethat

n

m=

kn

km, for all n,m, k ∈ Z, m, k 6= 0, because n(km) = (kn)m.

Using this equality we can argue as follows:

n1n2m1m2

=m1m2n1n2m1m2m1m2

=(n1m1)(n2m2)

m1m2m1m2=

(n1m1)(n2m2)

m1m2m1m2=

n1n2m1m2

.

If r 6= 01 , i.e., r = n/m, where n,m 6= 0, then we write r−1 = m/n. It is not difficult to prove

that addition and multiplication in Q have the following properties.

5.3. RATIONAL NUMBERS 61

Theorem 5.9. For all r, s, t ∈ Q we have

(Q1) (r + s) + t = r + (s+ t),

(Q2) r + s = s+ t,

(Q3) r + 01 = r,

(Q4) r + (−r) = 01 ,

(Q5) (r · s) · t = r · (s · t),

(Q6) r · s = s · r,

(Q7) r · 11 = r,

(Q8) If r 6= 01 , then r · (r−1) = 1

1 ,

(Q9) r · (s+ t) = (r · s) + (r · t).

(Q10) 11 6=

01 .

We leave the proof as an exercise.

Note that the property (Q8) has no counterpart in Z (see Theorem 5.7), all other propertiesdo.

Now, each integer n ∈ Z can be identified with a rational number n1 . Namely the function

Φ : Z→ Q defined by Φ(n) =n

1

is one-to-one and it has the following properties Φ(0) = 01 , Φ(1) = 1

1 and

Φ(n+m) = Φ(n) + Φ(m), Φ(n ·m) = Φ(n) · Φ(m) for all n,m ∈ Z.

That means the operations of addition and multiplication of integers n and m are consistentwith the operations of addition and multiplication of these integers regarded as rationalnumbers n

1 and m1 .

In what follows we will regard Z as a subset of Q and simply write n instead of n1 .

Each rational number can be represented as

(5.5) r =n

m, where n ∈ Z and m ∈ N.

Indeed, if r = k/`, and ` > 0, then we take n = k, m = `. If r = k/`, and ` < 0, thenr = (−k)/(−`) and we take n = −k, m = −`.

Therefore for any q ∈ Q, there is m ∈ N (denominator in (5.5)) such that mq ∈ Z. Ifr, s ∈ Q, then we can find m ∈ N such that mr,ms ∈ Z (common denominator).

Integers are equipped with the relation of the inequality ≤. It can be extended to the setof all rational numbers Q as follows: r ≤ s if and only if there is m ∈ N such that mr,ms ∈ Zand mr ≤ ms.


Theorem 5.10. For all r, s, t ∈ Q we have

(Q11) r ≤ s.

(Q12) r ≤ s ∧ s ≤ r ⇒ r = s.

(Q13) r ≤ s ∧ s ≤ t ⇒ r ≤ t.

(Q14) Either r ≤ s or s ≤ r.

(Q15) r ≤ s ⇒ r + t ≤ s+ t

(Q16) 01 ≤ r ∧

01 ≤ s ⇒

01 ≤ r · s.

5.4 Problems

Problem 1. Prove the property (Z14): for all a, b, c ∈ Z we have a ≤ b⇒ a+ c ≤ b+ c.

Problem 2. Let

F = {(n,m) ∈ Z× Z : m 6= 0}.

Prove that the relation


is an equivalence relation.

Problem 3. Let1 F∗ = Z× Z and define the relation


We can try to mimic the construction of rational numbers with the set F∗ in place of F.Where does the theory fall apart?

Problem 4. Z11 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10} is a field. Find 7 + 8, 7 · 8, 7/8 in Z11.

Problem 5. Let X be a non-empty set and let P (X) be the power set. In the power set wehave the operation of addition A+B := A ∪B. Check which of the axioms A1, A2, A3, A4are satisfied.

Problem 6. Give an example of a non-empty set X with a relation ∼ that is

(a) reflexive, but not symmetric or transitive;

(b) symmetric, but not reflexive or transitive;

(c) transitive, but not reflexive or symmetric;

1It differs from F in Problem 2, because we include (n, 0). Remember that [(n,m)] represents the fractionnm

so now we try to develop rational numbers, where we can divide by 0.

5.4. PROBLEMS 63

(d) reflexive and symmetric, but not transitive;

(e) reflexive and transitive, but not symmetric;

(f) symmetric and transitive, but not reflexive.

Problem 7. Let X be a nonempty set with a relation ∼. Consider the following argument.Claim: If ∼ is symmetric and transitive, then it is also reflexive.Proof. x ∼ y ⇒ y ∼ x; also x ∼ y and y ∼ x⇒ x ∼ x. Therefore x ∼ x for every x ∈ X. 2

This argument is at odds with part (f) of the previous problem. Where is the flaw?

Chapter 6

Real numbers

6.1 The first sixteen axioms

As we already mentioned in Section 5.3, in the ancient Greece rational numbers were under-stood as proportions. They believed that any two quantities are in a certain proportion. Inparticular, they believed that given two segments A and B, there are natural numbers n andm such that A increased m times and B increased n have the same length. In other words thesegments A and B are in the proportion n : m for some natural numbers n and m. However,Hippasus, one of the students of Pythagoras, discovered that there is no proportion betweenthe diagonal of the square and its side. No matter how many times we will increase the lengthof the diagonal and how many times we will increase the length of a side, the two lengths willnever be equal. Pythagoras could not accept this fact and a legend says that Hippasus wasthrown overboard and drowned.

Translating into modern language we would say that Hippasus proved that√

2 is irrational.We know what the rational numbers are and here we have an example of a (real) numberthat is not rational. This leads to a question: What are the real numbers? How can we definethem? In the ancient Greece they did not know how to answer this question and there wasno good answer until XIXth century.

Actually, the formal construction of real numbers is much more complicated than theformal construction of the rational numbers. Instead we will list all properties that uniquelydetermine real numbers. Such properties are called axioms. Recall that rational numberssatisfy properties (Q1)-(Q16) listed in Theorems 5.9 and 5.10. These are so called axiomsof an ordered field (we will discuss it below in details). These axioms do not determine R,because both Q and R satisfy them. We will need one more axiom (axiom of completeness)that will distinguish R from Q and, in fact will determine R uniquely (up to an isomorphism).These 17 axioms provide a complete set of rules needed in order to use real numbers. Allproperties of real numbers can be concluded from them. Just like the axioms of the set theoryare all what we need in order to use sets. However, we still need to make sure that there isa set with operations of addition and multiplication that satisfies all the 17 axioms. That is,we need to construct the set of real numbers. We have constructed a set of rational numbers

65

66 CHAPTER 6. REAL NUMBERS

satisfying properties (axioms) (Q1)-(Q16) so we know that an ordered field exists, but realnumbers satisfy one more axiom (axiom of completeness) and we need to make sure that thereis a set satisfying the axiom of completeness. Such a construction was provided in XIXthcentury by Dedekind and we will briefly describe it in the last Section 6.5. Well, we saida lot of words here making the above description opaque, but don’t worry, we will explaineverything what was mentioned here.

The first sixteen axioms of real numbers are modelled on properties (Q1)-(Q16).

A field is a set F with operations of addition ‘+’ and multiplication ‘·’ that satisfies thefollowing 10 axioms:

(A1) x+ y = y + x.

(A2) x+ (y + z) = (x+ y) + z.

(A3) There is an element denoted by 0 such that for every x, x+ 0 = x.

(A4) For every x there is an element denoted by −x such that x+ (−x) = 0.

(A5) x · y = y · x

(A6) x · (y · z) = (x · y) · z

(A7) There is an element denoted by 1 such that x · 1 = x.

(A8) For every x 6= 0 there is an element denoted by x−1 such that x · (x−1) = 1.

(A9) x · (y + z) = x · y + x · z.

(A10) 1 6= 0.

Here we understand that conditions are satisfied for all x, y, z ∈ F. We did not includequantifiers in the conditions to make them more transparent.

The element 0 is the neutral element of addition and 1 is the neutral element of themultiplication. Note that despite the fact we use symbols 0 and 1, we do not assume that0 and 1 are integers equal 0 and 1. We just use the same notation, because of similarity ofproperties to those of integers 0 and 1 Elements −x and x−1 are called the inverse elementsof addition and multiplication respectively.

We will also often write xy instead of x · y.

Axioms A2 and A6 say that the order in which we add elements and the order in whichwe multiply elements does not matter and therefore we may write x+ y+ z and xyz in placeof expressions at A2 and A6.

Axiom A3 says that there is an element 0 such that x+ 0 = x for all x. However, it doesnot say that the element 0 is unique. It is unique and that is a theorem. In what follows byT1, T2,. . . we will denote a very tiny theorems that easily follows from the axioms.

T1. If 0 has the property that x+ 0 = x for all x, then 0 = 0.

6.1. THE FIRST SIXTEEN AXIOMS 67

Indeed,

0 = 0 + 0︸︷︷︸property of 0 with x = 0

(A1)= 0 + 0 = 0︸︷︷︸

property of 0 with x = 0

.

Similarly we prove

T2. If 1 has the property that x · 1 = x for all x, then 1 = 1.

In the next result we prove that the inverse element of addition −x is unique.

T3. Let x ∈ F. If an element −x ∈ F has the property that x+ (−x) = 0, then −x = −x.

Proof. Let x ∈ F and assume that −x has the property (A4’): x+ (−x) = 0. Then

−x (A3)= (−x) + 0

(A1)= 0 + (−x)

(A4)= (x+ (−x)) + (−x)

(A1)= ((−x) + x) + (−x)

(A2)= (−x) + (x+ (−x))

(A4’)= (−x) + 0

(A3)= −x

proving that −x = −x.

Similarly we prove uniqueness of x−1.

T4. Let 0 6= x ∈ F. If an element x−1 ∈ F has the property that x · x−1 = 1, then x−1 = x−1.

Note that a set of rational numbers Q, the set of real numbers R or a set of complexnumbers C are fields, but there are more examples. Much more. Although the propertieslook like very natural properties of numbers, there are unexpected examples of fields.

Example 6.1. Recall that for n ∈ N, n ≥ 2, Zn = {[0], [1], . . . , [n − 1]} was defined inExample 5.5 as the set of equivalence classes of the congruence relation a ≡ b (mod n).

We define in Zn addition and multiplication by

[a] + [b] = [a+ b], [a] · [b] = [ab].

First we need to prove that the operations of addition and multiplication are well defined.That is we need to prove:

If ([a] = [a] and [b] = [b]), then ([a+ b] = [a+ b] and [ab] = [ab]).

Since [a] = [a] if and only if a ≡ a (mod n) and [b] = [b] if and only if b ≡ b (mod n),(Theorem 5.6), we have that

a = a+ kn, b = b+ `n for some k, ` ∈ Z.

Therefore

a+ b = a+ b+ (k + `)n, a+ b ≡ a+ b (mod n), [a+ b] = [a+ b].


Similarly

ab = (a+ kn)(b+ `n) = ab+ a`n+ bkn+ kln2 = ab+ (a`+ bk + k`n)n,

ab ≡ ab (mod n), [ab] = [ab].

Note that[a] + [0] = [a], [a] + [−a] = [0], [a] · [1] = [a]

and it easily follows that Zn with 0 = [0], 1 = [1] and −[a] = [−a] satisfies all axioms (A1)-(A7), A9 and A10. It turns out however, that the axiom A8 need not be satisfied. The caseof axiom A8 is completely answered by the next result.

Theorem 6.2. Zn is a field, that is it satisfies all axioms (A1)-(A10), if and only if n = pis a prime number.

Proof. We know that Zn satisfies all (A1)-(A10) with a possible exception of the axiom A8.Therefore it remians to show that the axiom A8 is satisfied if and only if n = p is a primenumber. That is we need to prove two implications: If A8 is satisfied, then n is a primenumber and if n is a prime number, then A8 is satisfied.

To prove the first implication it suffices to show that if n ≥ 2 is not a prime number, thenA8 is not satisfied.1 Assume that n ≥ 2 is not prime, then n = k`, for some k, ` ∈ N, k, ` > 1.We will show that the element [k] ∈ Zn does not have the multiplicative inverse [k]−1 provingthat the axiom A8 is not satisfied. Suppose to the contrary that there is a ∈ Z such that[a] = [k]−1 i.e., [k][a] = [ka] = [1]. Then ak ≡ 1 (mod n). That is ak − 1 = mn = m(k`) forsome m ∈ Z, 1 = k(a −m`). This however, gives a contradiction, because 1 is not divisibleby k.2

Now we will prove the second implication. Suppose now that n = p is a prime number.We need to show that the axiom A8 is satisfied, that is, we need to show that for any [a] 6= [0],there is an integer x such that [x] = [a]−1, that is [a][x] = [ax] = [1].

We will need the following result from the elementary number theory that we state withoutproof.

Lemma 6.3 (Bezout’s identity). Let a, b ∈ Z \ {0} be relatively prime integers i.e., integerswith the greatest common divisor equal 1. Then there are integers x, y ∈ Z such that ax+by =1.

Since [a] 6= [0], a is not divisible by p. Since p is prime it follows that a and p are relativelyprime so there are integers x, y ∈ Z such that ax+py = 1. However, this equality means thatax ≡ 1 (mod p) so [1] = [ax] = [a][x] proving that [x] = [a]−1.

In a field we define the following operations

x− y := x+ (−y),

1Because p⇒ q is equivalent to ¬q ⇒ ¬p, an argument by contrapositive.2The proof presented here has an interesting logical structure. Inside a contrapositive argument we used

an argument by contradiction.


x

y:= x · (y−1), provided y 6= 0,

x2 := x · x, x3 := x2 · x, . . .

2 := 1 + 1, 3 := 2 + 1, 4 := 3 + 1, . . .

Example 6.4. Since in Zp, 0 = [0], 1 = [1], we have 2 = 1 + 1 = [1] + [1] = [1 + 1] = [2],3 = [3] etc. Thus

Zp = {[0], [1], . . . , [p− 1]} = {0, 1, . . . , p− 1}.

For example in the field Z5 we have

3 + 4 = [3] + [4] = [7] = [2] = 2, 3 · 2 = [3] · [2] = [3 · 2] = [6] = [1] = 1, 3−1 = 2.

All the properties of operations in a field can be directly deduced from the axioms. Forexample

Proposition 6.5. In any field (a+ b)2 = a2 + 2ab+ b2.

Proof. The result seems obvious and well known, but since elements of fields can bedifferent than real or rational numbers, this fact really requires a careful proof.

(a+ b)2def. of x2

= (a+ b) · (a+ b)

(A9)= (a+ b) · a+ (a+ b) · b

(A5)= a · (a+ b) + b · (a+ b)

(A9)= a · a+ a · b+ b · a+ b · b

def. of x2= a2 + a · b+ b · a+ b2

(A5)= a2 + a · b+ a · b+ b2

(A7)= a2 + (a · b) · 1 + (a · b) · 1 + b2

(A9)= a2 + (a · b) · (1 + 1) + b2

def. of 2= a2 + (a · b) · 2 + b2

(A5)= a2 + 2 · (a · b) + b2

= a2 + 2ab+ b2.


The above proof is quite long and very formal. For a while we will write such formal proofs,just to realize how to use axioms. In fact all standard rules of addition and multiplicationare true in any field and, in practice, we work with them just like in the case of real numberswithout paying much attention how to verify our computations directly from the axioms.

In addition to operations of addition and multiplication, the fields of rational and realnumbers are equipped with the relation ≤ of being less than or equal to. This relation isnot present in fields C or Zp, p-prime. The relation ≤ has to satisfy the following axioms(A11)-(A16) listed below.


A field F (i.e. a set with operations ‘+’ and ‘·’ satisfying axioms (A1)-(A10)) equippedwith a relation ≤ is called an ordered field if

(A11) x ≤ x.

(A12) x ≤ y ∧ y ≤ x ⇒ x = y.

(A13) x ≤ y ∧ y ≤ z ⇒ x ≤ z.

(A14) For every x, y either x ≤ y or y ≤ x.

(A15) x ≤ y ⇒ x+ z ≤ y + z

(A16) 0 ≤ x ∧ 0 ≤ y ⇒ 0 ≤ xy.

We also write x ≥ y if y ≤ x.

Note that Q, R, C and Zp, where p is a prime number are fields, and Q, R are orderedfields. There are however, other ordered fields than just Q and R and need one more axiomthat will uniquely identify R among all ordered fields. We will discuss it in the next section.

In any ordered field we introduce more notation. For example we say that

x < y if x ≤ y and x 6= y,

and we also write y > x if x < y. Then for any two x, y exactly one of the following conditionsis satisfied

x < y, y < x or x = y.

To see how to use formal language of axioms we will prove some additional properties.

T5. x · 0 = 0 · x = 0.

Indeed,

0(A4)= x · 0 + (−(x · 0))

(A3)= x · (0 + 0) + (−(x · 0))

(A9)= (x · 0 + x · 0) + (−(x · 0))

(A2)= x · 0 + (x · 0 + (−(x · 0))

(A4)= x · 0 + 0

(A3)= x · 0.

We proved that x · 0 = 0. Hence 0 · x = x · 0 = 0 by A5.

T6. −(−x) = x.

Indeed, (−x) + x(A1)= x + (−x)

(A4)= 0 so by T3, x is the unique additive inverse of −x and

hence x = −(−x). We could also argue as follows:3

−(−x)(A3)= (−(−x)) + 0

(A4)= (−(−x)) + (x+ (−x))

(A1)= (x+ (−x)) + (−(−x))

(A2)= x+ ((−x) + (−(−x)))

(A4)= x+ 0

(A3)= x.

3We already have one proof so we do not need another one, but we just want to practice how to use axioms.


T7. (−x) · y = x · (−y) = −(x · y).

By T5 we have 0 = y·0 = y·(x+(−x)) = y·x+y·(−x) = x·y+(−x)·y. Since x·y+(−x)·y = 0,by T3, (−x) ·y is the unique additive inverse of x ·y proving that (−x) ·y = −(x ·y). Similarlywe prove that x · (−y) = −(x · y).

T8. (−x) · (−y) = x · y.

Indeed, (−x) · (−y)(T7)= −(x · (−y))

(T7)= −(−(x · y))

(T6)= x · y.

T9. If x > 0 and y > 0, then x+ y > 0.

Since x ≥ 0, axiom A15 gives

x+ y ≥ 0 + y = y ≥ 0,

so x+ y ≥ 0 by axiom A13. It remains to show that x+ y 6= 0. Suppose to the contrary thatx+ y = 0, then the above inequality gives

0 = x+ y ≥ y

and hence 0 ≥ y, which is a contradiction with the assumption y > 0.

T10. If x 6= 0, then −x 6= 0.

Suppose to the contrary that −x = 0. Then 0 = x+(−x) = x+0 = x which is a contradiction.

T11. If x > 0, then −x < 0.

Indeed, A11 and A15 give

0 ≥ 0 = x+ (−x) ≥ 0 + (−x) = −x, 0 ≥ −x.

The fact that x > 0 implies that x 6= 0 so −x 6= 0 by T10. Thus −x ≤ 0 and −x 6= 0 yields−x < 0.

Similarly we can prove

T12. If x < 0, then −x > 0.

T13. If x ≥ y, then −x ≤ −y.

Indeed, A15 yields

−y = (x+ (−x)) + (−y) = x+ ((−x) + (−y)) ≥ y+ ((−x) + (−y)) = (y+ (−y)) + (−x) = −x.


T14. If x > 0 and y > 0, then xy > 0.

Indeed, xy ≥ 0 by A16 and it remains to prove that xy 6= 0. Suppose to the contrary thatxy = 0. Since y 6= 0, by A8, there is y−1 such that y · y−1 = 1 and we have

0 = 0 · y−1 = (x · y) · y−1 = x · (y · y−1) = x · 1 = x

which is a contradiction with the assumption x > 0.

T15. If x 6= 0, then x2 > 0.

If x > 0, then x2 = x · x > 0 by T14. If x < 0, then −x > 0 by T12 so T8 and T14 givex2 = (−x) · (−x) > 0.

T16. 1 > 0.

Since 1 6= 0, by A10, 1 = 12 > 0 by T15.

T17. If x < 0 and y < 0, then xy > 0

Indeed, by T12 −x > 0 and −y > 0, so T8 and T14 yield

x · y = (−x) · (−y) > 0.

T18. If x < 0, then y > 0, then xy < 0.

Indeed, −x > 0, so

−(x · y) = (−x) · y > 0, x · y < 0.

In any ordered field we define the absolute value by

|x| =

x if x > 0,0 if x = 0,−x if x < 0.

Proposition 6.6. If F is an ordered field and x, y ∈ F, then

T19. |xy| = |x||y|.

T20. |x|2 = x2.

T21. If c ≥ 0, then |x| ≤ c if and only if −c ≤ x ≤ c.

T22. −|x| ≤ x ≤ |x|.

T23. |x+ y| ≤ |x|+ |y|.


T24.∣∣|x| − |y|∣∣ ≤ |x− y|.

Proof of T19. If either x = 0 or y = 0, then both sides are equal zero. Therefore wemay assume that x 6= 0 and y 6= 0 and we have to consider four cases: (a) x > 0, y > 0; (b)x > 0, y < 0; (c) x < 0, y > 0; (d) x < 0, y < 0. As an illustration we will investigate the case(c) only; other cases are left to the reader. Since x < 0 and y > 0 we have xy < 0 by T18, so

|xy| = −(xy) = (−x) · y = |x||y|.

Proof of T20. If x ≥ 0, then x = |x|, so |x|2 = x2. If x < 0, then |x| = −x and hence|x|2 = (−x) · (−x) = x2 by T8.

Proof of T21. To prove “if and only if” condition we need to prove two implications. Theimplication from left to right (⇒) and the implication from right to left (⇐).

(⇒) We need to show that if |x| ≤ c, then −c ≤ x ≤ c. Remember that we assume c ≥ 0. Wehave two cases.

If x ≥ 0, then x = |x| ≤ c, x ≤ c. Since c ≥ 0, −c ≤ 0 by T13 and hence x ≥ 0 ≥ −c givesx ≥ −c. Now both inequalities that we proved can be put together: −c ≤ x ≤ c.

If x < 0, then −x = |x| ≤ c, so −x ≤ c and hence x ≥ −c by T13. Since x ≤ 0 ≤ c, wealso have x ≤ c and hence −c ≤ x ≤ c.

(⇐) We need to prove that if −c ≤ x ≤ c, then |x| ≤ c.

We have x ≤ c and the inequality −c ≤ x gives −x ≤ c. Hence |x| ≤ c, because |x| isequal to x or −x.

Proof of T22. Taking c = |x| in T21 we obtain |x| ≤ |x| = c, so −|x| ≤ x ≤ |x|.

Proof of T23. We have −|x| ≤ x ≤ |x|, −|y| ≤ y ≤ |y|. Applying A15 several times we have

−(|x|+ |y|) = −|x| − |y| ≤ −|x|+ y ≤ x+ y ≤ x+ |y| ≤ |x|+ |y|

and the inequality |x+ y| ≤ |x|+ |y| follows from T21.

Proof of T24. Since x = (x− y) + y, T23 gives |x| ≤ |x− y|+ |y|, so

(6.1) |x| − |y| ≤ |x− y|

Replacing x by y and y by x in the above inequality we obtain |y| − |x| ≤ |y−x| = |x− y|, so

(6.2) −|x− y| ≤ |x| − |y|.

Now (6.1) and (6.2) give−|x− y| ≤ |x| − |y| ≤ |x− y|

and the claim follows from T21. 2

T25. If x > 0, then x > x/2 > 0.


Proof. First observe that

x

2+x

2= (x+ x) · 2−1 = (x · 2) · 2−1 = x.

To prove that x/2 > 0, suppose to the contrary that x/2 ≤ 0. Then x = x/2 + x/2(A15)

≤0 + x/2 = x/2 ≤ 0 which contradicts the assumption that x > 0.

To prove that x > x/2 suppose to the contrary that x ≤ x/2. Then

x

2=x

2+x

2+(−x

2

)= x+

(−x

2

) (A15)

≤ x

2+(−x

2

)= 0

which contradicts the fact that x/2 > 0 that we have just proved.

Proposition 6.7. Let F be an ordered field and x ∈ F. If for every ε > 0, x < ε, then x ≤ 0.

Proof. Suppose to the contrary that x > 0. Take ε = x/2. Then ε > 0 and x > ε by T25 soit is not true that x < ε. Contradiction.

As we can see, we can prove all basic properties of operations, inequalities and absolutevalue directly from the axioms, but proving each new property takes a considerable amountof time. We will not continue this investigation and in what follows we will take all basicproperties of operations and inequalities for granted without proving carefully how they followfrom axioms. However, we have to be aware that it is really important to know that they canbe proved directly from axioms, especially because the ordered fields can be very differentfrom familiar fields of numbers.

6.2 Natural numbers, integers and rational numbers

In a field we can define4 quasi natural numbers N, quasi integers Z, and quasi rational numbersQ as

N = {1, 2 = 1 + 1, 3 = 2 + 1, . . .},

Z = {. . . ,−3,−2,−1, 0, 1, 2, 3, . . .},

Q ={ nm, n,m ∈ Z, m 6= 0

}.

We call these numbers ‘quasi’, because for example in the field Z5, 7 = 2, −1 = 6 or 2 · 3 = 1so 2−1 = 3 and hence 3/2 = 3 · 3 = 4. That means such ‘quasi’ natural numbers, integers ofrational numbers are very different from ‘real’ ones.

However, in the ordered field we have 1 > 0 (T16) and hence n + 1 > n for all n ∈ Z soall quasi integers are ordered

. . . ,−3 < −2 < −1 < 0 < 1 < 2 < 3 < . . .

4The names quasi natural numbers, quasi integers, and quasi rational numbers are not official and you willnot find them in the literature. We will use this name only in this section and not in the remaining parts thethe book.

6.3. SUPREMUM, INFIMUM AND THE LAST AXIOM 75

and unlike in the case of Zp all these quasi integers are different so there is a bijection betweenΦ : Z→ Z that maps n ∈ Z to n ∈ Z. One can prove that

(6.3) Φ(n+m) = Φ(n) + Φ(m) and Φ(nm) = Φ(n)Φ(m)

which means, addition and multiplication in Z is the same as in Z and we can identify Z inan ordered field with Z. A bijection Φ : Z→ Z satisfying (6.3) is called an isomorphism.

Note that quasi rational numbers Q follow the rules (6.4) and (6.5)

(6.4)n

m=k

`≡ n` = mk

Indeed,n

m=k

`≡ (n ·m−1) · (m`) = (k · `−1) · (m`) ≡ n` = mk.

In the first equivalence we use the fact that (m`)−1 exists so that the equality in the middleimplies the equality on the left (after we multiply both sides by (m`)−1). Similarly we prove

(6.5)n

m· k`

=nk

m`

n

m+k

`=n`+mk

m`.

We will prove the formula for adding fractions leaving the formula for the product of fractionsas an exercise. We have

n

m+k

`=n`+mk

m`≡(n

m+k

`

)· (m`) =

(n`+mk

m`

)· (m`) ≡ n`+ km = n`+mk

Since the last equality is obviously true and all equalities are equivalent, the first one is alsotrue.

Thus we have the same operations in Q as in Q and one can prove that there is a bijectionΦ : Q→ Q such that

(6.6) Φ(r + s) = Φ(r) + Φ(s) and Φ(rs) = Φ(r)Φ(s)

A bijection Φ : Q→ Q satisfying (6.6) is called an isomorphism.

In other words we can assume that any ordered field contains a (isomorphic) copy of N,Z and Q.

6.3 Supremum, infimum and the last axiom

The above list of axioms is not sufficient for the theory of real numbers. Indeed, Q satisfiesall axioms (A1)-(A16) and we know that there are real numbers that are not rational. Westill need one axiom. However, it is not very easy to formulate and we need to do somepreparations.

Definition. Let F be an ordered field and let ∅ 6= A ⊂ F.


We say that a set A is bounded from above if there is M ∈ F such that x ≤ M for allx ∈ A. Each such M is called an upper bound of A.

We say that A is bounded from below if there is m ∈ F such that m ≤ x for all x ∈ A.Each such m is called a lower bound of A.

Finally the set A is called bounded if it is bounded from above and bounded from below.The set A is called unbounded if it is not bounded.

Similarly we can define sets unbounded from above and unbounded from below.

Example 6.8. The set A = {x ∈ Q : x < 2} is bounded from above.

Indeed, 2 is an upper bound and any M ≥ 2 is an upper bound. However, 2 is the smallest(the least) upper bound of the set A.

Definition. Let F be an ordered field and let ∅ 6= A ⊂ F.

If A is bounded from above, then M is called a supremum (or a least upper bound) of A if

1. M is an upper bound of A;

2. If a is any upper bound of A, then a ≥M .

If A is bounded from below, then m is called an infimum (or a greatest lower bound) of Aif

1. m is a lower bound of A;

2. If b is any lower bound of A, then b ≤ m.

It easily follows from the definition that a set can have at most one supremum and atmost one infimum, so if a set has a supremum of an infimum, then it is uniquely defined. Ifthe supremum or the infimum of a set A exist, it is denoted respectively by

supA and inf A.

If the set A is unbounded from above we write supA =∞ and if it is unbounded from belowwe write inf A = −∞.

Example 6.9. Let A ⊂ Q be defined by

A =

{x

x+ 1: x > 0, x ∈ Q

}.

Find supA and inf A (in Q).

Solution. Clearly 0 < x/(x + 1) < 1 for all x > 0. Hence 0 is a lower bound and 1 is anupper bound of A. We will show that supA = 1 and inf A = 0. To prove that supA = 1, i.e.,that 1 is the least upper bound of A it suffices to show that if y < 1, then y cannot be an


upper bound of A, that is there is x > 0, x ∈ Q such that x/(x+ 1) > y. To find such x wesimply take any x > y/(1 − y) and one easily checks that this indeed gives x/(x + 1) > y.5

Similarly we prove that inf A = 0. We leave details to the reader. 2

The following result provides a useful method of verification whether a given upper boundM is the supremum.

Proposition 6.10. Let F be an ordered field and let M ∈ F be an upper bound of ∅ 6= A ⊂ F.Then M = supA if and only if

∀ε > 0 ∃x ∈ A (x > M − ε).

Remark 6.11. A careful reader will see that a similar argument has already been employedin the solution to the above exercise.

Proof. This result is nearly obvious, but let us prove it carefully. Let M ∈ F be an upperbound of A. We want to prove

(M = supA) ≡ (∀ε > 0 ∃x ∈ A x > M − ε).

⇒ Suppose to the contrary6 that M = supA and there is ε > 0 such that for all x ∈ A,x ≤M − ε. Then a = M − ε < M is an upper bound of A which contradicts the fact that Mwas the least upper bound.

⇐We know that M is an upper bound and we need to prove that it is the least one. Supposeto the contrary that there is a smaller upper bound a < M , i.e., x ≤ a for all x ∈ A. Takeε = (M − a)/2. Then ε > 0 and M − ε > a. It follows from the what we assumed that thereis x ∈ A such that x > M − ε > a and that contradicts the assumption that a is an upperbound.

Similarly we prove

Proposition 6.12. Let F be an ordered field and let m ∈ F be a lower bound of ∅ 6= A ⊂ F.Then m = inf A if and only if

∀ε > 0 ∃x ∈ A (x < m+ ε).

If F is an ordered field and A ⊂ F is bounded from above, then it is not necessarily truethat A has supremum in F. Here is an example. Let

A = {x ∈ Q : x > 0 and x2 ≤ 2} .

This set is bounded from above, i.e., by 2. Indeed, if x > 2, then x2 > 4, so no elementwith x > 2 can belong to A and hence every element of A satisfies x ≤ 2. It is also not verysurprising that if M is the supremum of A, then it must satisfy M2 = 2 (we will providea rigorous proof of this fact later, see the proof of Theorem 6.14), but there is no rationalnumber M such that M2 = 2, so the set A has no supremum in Q.

5The condition x > y/(y − 1) is obtained from solving the inequality x/(x+ 1) > y for x.6Recall that ¬(p⇒ q) ≡ p ∧ ¬q.


A fundamental difference between Q and R is that every subset of R bounded from abovehas supremum.

Definition. We say that an ordered field F is complete if

(A17) every subset A ⊂ F bounded from above has supremum.

The axiom A17 is called the axiom of completeness.

The next important result shows that there is one, up to an isomorphism, only one com-plete ordered field.

Theorem 6.13. There exists a complete ordered field F . Moreover if F1 and F2 are completeordered fields, then they are isomorphis, i.e. there is a bijection Φ : F1 → F2 such thatΦ(0) = 0, Φ(1) = 1, Φ(x + y) = Φ(x) + Φ(y), Φ(xy) = Φ(x)Φ(y) and x ≤ y if and only ifΦ(x) ≤ Φ(y).

The proof of the theorem is rather difficult. A construction of a complete ordered field isbriefly described in Section 6.5.

Definition. A complete ordered field is called the set of real numbers.

Now we can prove that√

2 exists. Actually, we will prove a more general result.

Theorem 6.14. For any x ≥ 0 and n ∈ N there is one and only one y ≥ 0 such that yn = x.

This unique number y is denoted by y = n√x = x1/n. In the proof we will need the

following lemma.

Proof. First we will prove that if a nonnegative solution of the equation yn = x exists, then itis unique i.e., if yn1 = yn2 = x, then y1 = y2. If x = 0, then clearly y1 = y2 = 0. Thus assumethat x > 0 and hence y1, y2 > 0. By Theorem 4.10 we have

0 = yn1 − yn2 = (y1 − y2)(yn−11 + yn−21 y2 + . . .+ yn−12 ).

Since the second factor on the right hand side is positive, the first one must be zero and hencey1 = y2.

We are left with the proof of the existence of a solution. Assume first that x > 1. Considerthe set

S = {z > 0 : zn ≤ x}.

If z > x, then zn > xn > x (because x > 1) and hence z 6∈ S. Thus all elements z ∈ S satisfyz ≤ x and hence S is bounded from above by x. Since 1 ∈ S we have S 6= ∅ and thus

y = supS ∈ R.

We have

yn < x or yn > x or yn = x.


It suffices to show that the first and the second possibility lead to a contradiction. Then wewill necessarily have yn = x.

Suppose that yn < x. If ε ∈ (0, x), then

(y + ε)n − yn = ε ·((y + ε)n−1 + (y + ε)n−2y + . . .+ yn−1

)< ε · n(y + x)n−1.

Taking

0 < ε <x− yn

n(y + x)n−1

we have

(y + ε)n − yn < x− yn, (y + ε)n < x

and hence y + ε ∈ S. Thus y + ε ≤ supS = y which is a contradiction.

Suppose now that yn > x. For y > ε > 0 we have

yn − (y − ε)n = ε ·(yn−1 + yn−2(y − ε) + . . .+ (y − ε)n−1

)< ε · nyn−1.

Taking

0 < ε < min

{y,yn − xnyn−1

}we have yn − (y − ε)n < yn − x i.e., (y − ε)n > x. Since x ≥ zn for all z ∈ S we obtain that(y − ε)n > z for all z ∈ S i.e., y − ε is an upper bound of S which contradicts the fact that yis the least upper bound. The proof in the case x > 1 is complete.

If x = 0, then y = 0. If x = 1, then y = 1. If 0 < x < 1, then 1/x > 1 and hence there isw > 0 such that wn = 1/x. Now y = 1/w satisfies yn = x.

In the definition of the complete ordered field we only discuss the existence of supremumfor sets bounded from above, but it turns out that this condition also implies that every setbounded from below has infimum. Indeed, If A ⊂ F is bounded from below, then the setA′ = {−x : x ∈ A} is bounded from above, so M = supA′ exists. Then it is easy to see that−M = inf A.

Exercise 6.15. Provide a rigorous proof that A′ is bounded from above and that −M = inf A.

We have seen that Q is an ordered field and Q can be easily defined with the help ofintegers. Unfortunately Q is not complete. How do we know that there exists a completeordered field? We cannot take this fact for granted. It requires a proof. Namely we haveto construct an ordered field. The construction and then verification that it is a completeordered field is actually quite technical and boring. Very boring indeed. Thus we will onlyprovide a sketch.

We start with the field of rational numbers Q and we need to add all “missing” realnumbers to make the field complete. How do we define real numbers that are not rational?


6.4 Natural and rational numbers among real numbers

We know that natural numbers N, integers Z and rational numbers Q form subsets of R. Thefirst property that we are going to prove is that for any real number x we can find a naturalnumber that is larger than x.

Theorem 6.16. The set N ⊂ R is unbounded from above.

Proof. Suppose that N is bounded from above. Since it is not empty, M = supN ∈ Rexists. Thus

n ≤M for all n ∈ N.

Hence alson+ 1 ≤M for all n ∈ N,

son ≤M − 1 for all n ∈ N,

and therefore M − 1 is an upper bound of N. This is however, a contradiction with theassumption that M is the least upper bound. 2

Theorem 6.17 (Archimedes Postulate). 7 For any positive real numbers a > 0 and b ∈ Rthere is a natural number n ∈ N such that an > b.

Proof. By contrary if na ≤ b for all n ∈ N, then N is bounded from above by b/a whichis a contradiction. 2

The next result states that rational numbers are dense in R.

Theorem 6.18. For any real numbers x < y there is a rational number q such that x < q < y.

Proof. Since y − x > 0 it follows from the previous result that n(y − x) > 1 for somen ∈ N and hence 1 + nx < ny. Another application of Theorem 6.17 gives the existence ofm1,m2 ∈ N such that

m1 > nx and m2 > −nx.

Hence 1 < nx+m2 + 1 < m1 +m2 + 1. By Well-Ordering Principle (Theorem 4.13) there isa smallest integer m3 ∈ N such that 1 < nx+m2 + 1 < m3. Since m3− 1 ∈ N is smaller thanm3 we clearly have

m3 − 1 ≤ nx+m2 + 1 < m3,

i.e.,m− 1 ≤ nx < m,

where m = m3 −m2 − 1 ∈ Z. Hence

nx < m ≤ 1 + nx < ny

and since n > 0 it follows thatx <

m

n< y .

7In Archimedes’ words: Any magnitude when added to itself enough times will exceed any given magnitude.

6.5. FORMAL CONSTRUCTION OF REAL NUMBERS 81


One can also easily prove that the set of irrational numbers is dense in R. To prove thisit suffices to consider numbers of the form q

√2, where q ∈ Q. We leave details to the reader.

Theorem 6.19. For any real numbers x < y there is an irrational number z such thatx < z < y.

6.5 Formal construction of real numbers

Sketch of the construction. In the proof we have to construct real numbers out or rationalones. Suppose for a moment that we already know what the real numbers are and we ask toidentify all real numbers using rational numbers only. Every real number α defines a set ofrational numbers strictly less than than α

(6.7) Aα = {p ∈ Q : p < α}

and such a set uniquely determines the real number α. Thus the idea is to identify realnumbers with certain subsets of rational numbers. For example the set

{p ∈ Q : p2 < 2}

will determine the real number α with the property α2 = 2.

Observe that the sets Aα have the following properties

1. ∅ 6= Aα 6= Q.

2. If p ∈ Aα and Q 3 q < p, then q ∈ Aα.

3. If p ∈ Aα, then there is p < r ∈ Q such that r ∈ Aα.

The above reasoning is not a part of proof, because we already assumed that the realnumbers exist. However, it provides a motivation for what is discussed below.

Thus we no longer have real numbers, only rational numbers and we want to constructreal numbers. The construction is as follows.

We define the set R to be the family of subsets of Q called cuts. A cut is a subset α ⊂ Qsuch that

1. α is not empty and α 6= Q.

2. If p ∈ α, q ∈ Q and q < p, then q ∈ α.

3. If p ∈ α, then p < r for some r ∈ α.


This is a reasonable construction, because the set defined in (6.7) has all the properties listedhere.

We will denote rational numbers by letters p, q, r,. . . and cuts by Greek letters α, β, γ,. . .

Thus we identify the set of all real numbers R with cuts. Since cuts do not look likenumbers, we have to define the operations of addition, multiplication and the inequality forcuts to make them numbers. Moreover we have to identify rational numbers with certaincuts.

If α, β ∈ R, then we define

α+ β = {p+ q : p ∈ α and q ∈ β} .

Thus α + β is a subset of Q and one is required to prove that it is a cut. We leave it to thereader.

If α, β ∈ R, we say that α ≤ β, if α ⊂ β.

Since elements of R are sets and elements of Q are numbers we need to identify rationalnumbers with cuts to have Q ⊂ R. Namely any q ∈ Q is identified with the cut

q∗ = {r ∈ Q : r < q}.

In particular 0 is identifies with

0∗ = {r ∈ Q : r < 0}.

Now we can also define multiplication, but it is slightly more complicated. For cuts α ≥ 0∗

and β ≥ 0∗ we can define

α · β = {pq : p ∈ α and q ∈ β}

This corresponds to the multiplication of nonnegative numbers and one can think how todefine multiplication of other real numbers. We leave it to the reader.

Now R is equipped with the operations “+”, “·”, the inequality “≤” and Q ⊂ R. One canprove that R with the operations and the inequality is an ordered field. We will not show it(too boring). It still remains to prove that R is complete., i.e., we have to show that any set∅ 6= A ⊂ R bounded from above has a supremum. Here is a proof.

Elements α ∈ A are non-empty sets of rational numbers (cuts), and hence the union ofall sets α ∈ A,

γ =⋃α∈A

α 6= ∅

is also a non-empty set of rational numbers.

Let β ∈ R be any upper bound of A, i.e., α ≤ β for all α ∈ A, i.e., α ⊂ β for all α ∈ A.Hence the union γ of all sets α ∈ A also satisfies γ ⊂ β. In particular γ 6= Q.

We proved that γ 6= ∅ and γ 6= Q. This is the first property of a cut. One can easily provethat γ satisfies the other two properties., i.e., γ is a cut, i.e., γ ∈ R.

6.5. FORMAL CONSTRUCTION OF REAL NUMBERS 83

We claim that γ is an upper bound of A. Indeed, since γ is the union of all sets α ∈ A,α ⊂ γ for all α ∈ A, i.e., α ≤ γ for all α ∈ A. On the other hand we already proved that forany upper bound β, γ ⊂ β, i.e., γ ≤ β, so γ is the least upper bound of A. In other wordsγ = supA. The proof is complete. 2

The proof presented above is very abstract and it is difficult to follow the construction.To have a better imagination of what happens here one can draw pictures

on the margin of the proof to interpret the cuts α and then all the steps of the constructionwill have a clear interpretation with respect to such pictures.

Form now one we will only need to know that R, the complete ordered field that containsQ exists and we do not have to know how it was constructed. The construction was onlyto make sure that such a field exists. If we want to use a certain mathematical object thatsatisfies certain properties (like R) we cannot take the existence of it for granted. Otherwiseit could happen that the properties have some permanent logical error and we can arriveto a contradiction like in the case of Russell’s paradox. In the case of real numbers thecompleteness of the field is a highly nontrivial and non obvious property, so we really had toprove the existence such a field.

One more remark is that there are many very different fields however, one can provethat in a certain sense the complete ordered field is unique. All complete ordered fields areisomorphic which roughly speaking means they are identical.

More precisely if F1 and F2 are complete ordered fields, then there is a bijection f : F1 →R2 such that

f(x1 + x2) = f(x1) + f(x2), f(x1x2) = f(x1)f(x2),

x1 ≤ x2 if and only if f(x1) ≤ f(x2).

Such a bijection is called an isomorphism of F1 and F2. We will not prove this fact. Observethat f(0) = 0 and f(1) = 1.

Chapter 7

Cardinality

What does it mean that a set A has n elements? That mean that we can label the elementsof the set with numbers 1, 2, . . . , n. Being more precise that means that there is a bijection

f : {1, 2, . . . , n} → A.

It is also easy to see that two finite sets A and B have the same number of elements if andonly if there is a bijection f : A→ B.

A set A is finite if there is a natural number n such that A has n elements. Sets that arenot finite are called infinite sets.

We say that two sets (finite or infinite) A and B have the same cardinality if there is abijection f : A → B. As we already observed in the case of finite sets they have the samecardinality if and only if they have the same number of elements. Thus the intuition shouldbe that two infinite sets have the same cardinality if they have the same amount of elements.

Proposition 7.1. If A has the same cardinality as B and B has the same cardinality as C,then A has the same cardinality as C.

Proof. That is obvious. If f : A → B is a bijection and g : B → C is a bijection, theng ◦ f : A→ C is a bijection. 2

Proposition 7.2. The set of all natural numbers has the same cardinality as the set of allpositive even numbers.

Proof. One could think that the set of all natural numbers is larger, because it containsall positive even integers but also all positive odd integers, so it should be twice as large.However, the two sets have the same cardinality, because the function

ϕ : {1, 2, 3, 4, . . .} → {2, 4, 6, 8, . . .}, ϕ(n) = 2n

is the bijection. This is one of the antinomies that are persistent when we deal with infinity.2

85

86 CHAPTER 7. CARDINALITY

We say that a set A is denumerable if it has the same cardinality as the set of naturalnumbers N. A set A is countable if it is finite or denumerable. A set A is uncountable if it isnot countable.

Proposition 7.2 says that the set of positive integers is denumerable.

Proposition 7.3. Z has the same cardinality as N, i.e., Z is denumerable.

Proof. It suffices to show that we can arrange elements of Z into a sequence, but that iseasy.

0, 1,−1, 2,−2, 3,−3, 4,−4, . . .

Then the bijection between N and Z is given by

1 2 3 4 5 6 7 9 9 . . .↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓0 1 -1 2 -2 3 -3 4 -4 . . .

2

We defined the set of natural numbers using the set theory, and me mentioned that it isnot difficult to define integers and rational numbers. However, we do not know yet how todefine real numbers. We will provide a formal definition in Section 6, but in the next fewexamples we will assume familiarity with real numbers and decimal expansions.

Z2 is the set of all points in R2 with both coordinates being integers. It is called integerlattice.

Proposition 7.4. Z2 is denumerable.

Proof. The set Z2 looks as follows

and the next picture shows how to find a bijection between N and Z2.

87

2

Proposition 7.5. If A is denumerable and B ⊂ A is an infinite subset, then B is denumer-able.

Proof. The fact that A has the same cardinality as N means that elements of A can bearranged into a sequence. Indeed, if f : N→ A is a bijection, and an = f(n), then

a1, a2, a3, a4, . . .

is such an arrangement. If B ⊂ A is an infinite set, then we can arrange the set into aninfinite sequence by erasing those elements from the above sequence that do not belong to A.Since elements of B can be arranged into a sequence the set B has the same cardinality asN. 2

Proposition 7.6. The set of rational numbers is denumerable.

Proof. Each rational number can uniquely be represented in the form ±p/q, where p, q ∈ Nare relatively prime. Therefore with each rational number we can associate a unique element(±p, q) in Z2. However, not every point in Z2 is associated with a rational number. Forexample the point (2, 4) will not be associated with any rational number, because the numbers2 and 4 are not relatively prime.

Thus there is a bijection between Q and an infinite subset of Z2. Since Z2 has the samecardinality as N, any infinite subset of Z2 has the same cardinality as N (Proposition 7.5). Inparticular Q has the same cardinality as N. 2

Not every set is however, countable.

Theorem 7.7 (Cantor). The set R of all real numbers is uncountable.

Proof. It suffices to prove that the subset (0, 1) ⊂ R is uncountable.1 Suppose (0, 1) iscountable, so we can arrange all real numbers from (0, 1) into a sequence. Suppose

x1, x2, x3, x4, . . .

1Indeed, countability of R would imply countability of (0, 1), see Proposition 7.5.


is such a sequence.

Note that the decimal expansion of a real number is not necessarily unique. For example

1 = 0.9999 . . .

Hence

0.001 = 0.0009999 . . .

and thus

0.237 = 0.2369999 . . .

This also shows that every real number x > 0 has a decimal expansion in which infinitelymany digits are different than zero, and one can prove (we skip the proof) that such anexpansion is unique. Let

x1 = 0.a11a12a13a14 . . .

x2 = 0.a21a22a23a24 . . .

x3 = 0.a31a32a33a34 . . .

. . .

be the sequence of decimal expansions of numbers x1, x2, x3, . . . with infinite number of non-zero digits. Here aij represents jth digit of xi.

For every n let bn ∈ {1, . . . , 8} be a digit different than ann. Then

x = 0.b1b2b3b4 . . . ∈ (0, 1)

However, x does not appear in the sequence x1, x2, x3 . . ., because the decimal expansion ofx differs from that of xn on nth position. Therefore the sequence x1, x2, x3, . . . cannot list allreal numbers from (0, 1). This contradiction completes the proof. 2

It is interesting to investigate what other sets have the same cardinality as R.

Proposition 7.8. Any finite interval (a, b) has the same cardinality as R.

Proof. The function f(x) = arctanx gives a bijection between (−π/2, π/2) and R.Therefore (−π/2, π/2) has the same cardinality as R, but (a, b) has the same cardinality as(−π/2, π/2), because we may easily find a linear function that maps (a, b) onto (−π/2, π/2)in a bijective way and then the result follows from Proposition 7.1. 2

Now a tricky problem.

Example 7.9. Prove that (0, 1] has the same cardinality as (0, 1).

Proof. It is not obvious how to start. If we try to build a bijection from (0, 1] to (0, 1)it is not clear what to do with the endpoint 1. Actually it is not possible to construct acontinuous bijection. One can prove that every continuous bijection defined on (0, 1] will

89

have an interval with an endpoint included as an image, so it cannot work. Thus we willconstruct a discontinuous bijection f : (0, 1]→ (0, 1). We define it by a formula

f

(1

n

)=

1

n+ 1for n = 1, 2, 3, . . .

f(x) = x for x 6= 1n .

We just shift the points of the form 1/n to the left and do not move other points. It takesa moment to see that f is a bijection, but if you look at the picture for a while it should beobvious and no further explanations are needed. 2

Theorem 7.10 (Cantor-Bernstein). If A has the same cardinality as a subset A′ ⊂ B andB has the same cardinality as a subset B′ ⊂ A, then A has the same cardinality as B.

Intuitively, the theorem seems clear. Since A has the same cardinality as a subset of B,the amount of elements in the set B should be greater than or equal to the amount of elementsin A. Since B has the same cardinality as a subset of A, the amount of elements in A shouldbe greater than or equal to the amount of elements in B, so the amount of elements in sets Aand B should be the same. However, since we talk about infinite sets nothing is obvious. Theamount of elements in infinite sets is not a number and there is reason to claim that b ≥ a,a ≥ b implies b = a, because a and b (amounts of elements in sets A and B) are not numbers.By the assumption of the theorem we have bijections f : A → A′ ⊂ B and g : B → B′ ⊂ A,but we need to construct a bijection h : A→ B and it is not obvious at all how to do it. Thetheorem has many non-trivial applications. Consider A = (0, 1) and B = (0, 1]. f : A → B,f(x) = x is a bijection between A = (0, 1) and A′ = (0, 1) ⊂ (0, 1] = B and g : B → A,g(x) = x/2 is a bijection between B = (0, 1] and B′ = (0, 1/2] ⊂ (0, 1) = A. Thus accordingto the Cantor-Bernstein theorem the sets A = (0, 1) and B = (0, 1] have the same cardinality,but we have already seen that proving directly that the two sets have the same cardinality isnot an easy task, so the Cantor-Barnstein theorem cannot be easy.

Proof. Let f : A→ A′ ⊂ B and g : B → B′ ⊂ A be bijections. For a given set X ⊂ A wedefine a subset F (X) ⊂ A by

F (X) := A \ g(B \ f(X)).

The construction of the set F (X) is explained on the picture


Suppose that there is a set X0 ⊂ A such that F (X0) = X0, i.e., X0 is a “fixed point” ofF . In this case the situation is as on the picture

and hence the function

h(x) =

{f(x) if x ∈ X0,g−1(x) if x 6∈ X0,

defines a bijection of A onto B. Thus it remains to show that there is a set X0 such thatF (X0) = X0.

Lemma 7.11. For any family X1, X2, X3, . . . of subsets of A we have

F

( ∞⋂i=1

Xi

)=

∞⋂i=1

F (Xi) .

Proof. First compare the lemma with Exercise ??. Does the exercise apply here? Is Fone-to-one? That is a completely wrong point of view. Observe that F is not a functiondefined on A, because F is not defined for points in A, but for subsets of A, so we cannotreally compare the lemma with the exercise. However, as we will see the exercise will beemployed in the proof.

91

We have

F

( ∞⋂i=1

Xi

)= A \ g

(B \ f

( ∞⋂i=1

Xi

))

= A \ g

(B \

∞⋂i=1

f(Xi)

)(Exercise ??, since f is one-to-one)

= A \ g

( ∞⋃i=1

(B \ f(Xi)

)(De Morgan Law, Proposition 2.7)

= A \∞⋃i=1

g(B \ f(Xi)) (Exercise ??)

=∞⋂i=1

(A \ g(B \ f(Xi)) (De Morgan Law, Proposition 2.7)

=∞⋂i=1

F (Xi) .

This completes the proof of the lemma. 2

Observe that we have a decreasing sequence of sets (Why is it decreasing?)

A ⊃ F (A) ⊃ F 2(A) ⊃ F 3(A) ⊃ . . . ,

where F k(A) = F (F (. . . (F (A)) . . .)). Define

X0 = A︸︷︷︸X1

∩F (A)︸︷︷︸X2

∩F 2(A)︸︷︷︸X3

∩F 3(A)︸︷︷︸X4

∩ . . . =∞⋂i=1

Xi .

Observe also that

X0 = A ∩ F (A) ∩ F 2(A) ∩ F 3(A) ∩ . . . = F (A) ∩ F 2(A) ∩ F 3(A) ∩ . . .

because if B is a subset of A, then A ∩B = B. Now the lemma immediately yields

F (X0) =

∞⋂i=1

F (Xi) = F (A) ∩ F 2(A) ∩ F 3(A) ∩ F 4(A) ∩ . . . = X0

and the theorem follows. 2

The plane R2 can be described as the set of all ordered pairs of real numbers

R2 = {(x, y) : x, y ∈ R} .

The following result is quite surprising.

Theorem 7.12. R2 has the same cardinality as R.


Proof. According to the Cantor-Bernstein theorem it suffices to find a bijection f : R →A ⊂ R2 and a bijection g : R2 → B ⊂ R. The first of the two bijections is easy to construct.

f(x) = (x, 0)

is a bijection of R onto the x-axis in R2. The second one is more difficult to construct. In thefirst step we simplify the problem by observing that R2 has the same cardinality as the openunit square

Q = {(x, y) : 0 < x, y < 1} .

Indeed, in Proposition 7.8 we constructed a bijection h : R→ (0, 1) and hence

H(x, y) = (h(x), h(y))

is a bijection H : R2 → Q. Therefore it suffices to construct a bijection w : Q → A ⊂ R,because, then g = w ◦H : R2 → A ⊂ R will a bijection that we need.

Let (x, y) ∈ Q, i.e., x ∈ (0, 1) and y ∈ (0, 1). Consider the unique decimal expansions ofx and y with infinite non-zero digits (cf. proof of Theorem 7.7)

x = 0.a1a1a3 . . .

y = 0.b1b2b2 . . .

and defineH(x, y) = 0.a1b1a2b2a3b3 . . .

Namely H(x, y) ∈ (0, 1) is a number obtained from x and y by mixing the digits of x and y.It is easy to see that the function H is one-to-one and that completes the proof. 2

Exercise 7.13. Prove that H : Q→ (0, 1) is not a surjection.

When we look at the above proof carefully, we see that the result is not that surprising,after all. Points in R2 are ordered pairs of numbers (x, y) and hence can be encoded by twosequences of digits (decimal expansions) and points in R can be encoded by one sequence ofdigits. Now the main idea in the proof is the observation that two sequences of digits can beencoded in a single sequence of digits – by a suitable mixing of the digits from the two givensequences.

We have seen that R is uncountable, i.e., it has different cardinality than N. The followingresults provides another method of constructing infinite sets with different cardinalities.

Theorem 7.14 (Cantor). There is no bijection between a set A and its power set P (A).

Recall that P (A) is a family of all subsets of A. If A has n elements, then P (A) has 2n

elements, so clearly there is no bijection, because 2n > n. The result is however, not obviousif the sets are infinite.

Proof. Suppose ϕ : A → P (A) is a bijection. Since ϕ(a) is a subset of A, either a is anelement of ϕ(a) or a is not an element of ϕ(a). Let

(7.1) R = {a ∈ A : a 6∈ ϕ(a)} .

93

This set is well defined, since its existence is guaranteed by the Axiom of Specification. R isa subset of A, so R = ϕ(a0) for some a0 ∈ A (because ϕ is a bijection). We may inquire nowwhether a0 ∈ R or a0 6∈ R. If a0 ∈ R, then a0 must satisfy the condition from the definition(7.1) of the set R. Hence a0 6∈ ϕ(a0) = R which is a contradiction. If a0 6∈ R = ϕ(a0), thenthe condition from the definition (7.1) is satisfied and hence a0 ∈ R, a contradiction again.Every possibility leads to a contradiction. That proves that a bijection ϕ : A → P (A) doesnot exist. 2

Observe that the above proof is remarkably similar to the argument used in Russell’sparadox. This time however, it is not a paradox – it is a rigorous mathematical argument.

Cantor’s theorem implies, in particular, that the two infinite sets N and P (N) are not ofthe same cardinality. That means a collection of all subsets of N cannot be arranged as asequence, in other words, the set P (N) is uncountable.

Exercise 7.15. Prove that P (N) has the same cardinality as R.

Note that there is a one-to-one function

f : A→ P (A), f(a) = {a}

but there is no one-to-one function g : P (A)→ A, otherwise the sets A and P (A) would havethe same cardinality (Cantor-Bernstein). Thus in some sense P (A) has a “larger” amount ofelements than A. Then also P (P (A)) has “larger” amount of elements than P (A) and so on.

Actually one can associate with any set A a cardinal number n which is something like the“amount” of elements in the set A. Two sets have the same cardinal numbers if they havethe same cardinality. If A is a finite set, then its cardinal number is the number of elementsin A. If the cardinal number of A is n and the cardinal number of B is m, we write n ≥ m isthere is a one-to-one function f : B → A. If n ≥ m and m ≥ n, then n = m. Indeed, it is adirect consequence of the Cantor-Bernstein theorem. If n ≥ m, but n 6= m we write n > m.

If A has the cardinal number n, the cardinal number of P (A) is denoted2 by 2n. Sincef : A → P (A), f(a) = {a} is one-to-one, 2n ≥ n. Since, according to Cantor’s theorem, Aand P (A) do not have the same cardinality, 2n > n. In particular for any cardinal number wecan find a bigger cardinal number. The largest cardinal number does not exist.

Given two cardinal numbers n and m, at most one condition is satisfied

n < m, n = m, n > m.

One can prove that for any two cardinal numbers n and m, n < m, or n = m, or n > m. Thisresult is actually difficult to prove. Indeed, it exactly means that for any two sets A and Bthere is a one-to-one function f : A → B, or a bijection f : A → B or a one-to one functiong : B → A. How can we assure that such a function exists if we do not know anything aboutthe sets A and B? The proof is based on the Axiom of Choice.

2This notation is consistent with the case of finite sets: if A has n elements, the set P (A) has 2n elements.

Chapter 8

Sequences and limits

In this chapter we will discuss basic result regarding limits of sequences and sums of infiniteseries.

8.1 Limits of sequences

A sequence (an) = (an)∞n=1 is a collection of real numbers

(8.1) a1, a2, a3, a4, . . .

where the order of elements is important. More precisely we can define a sequence as afunction f : N→ R. Namely, the sequence (8.1) is identified with a function

f : N→ R, f(n) = an.

We say that a sequence (an) is

increasing if a1 ≤ a2 ≤ a3 ≤ . . .,

decreasing if a1 ≥ a2 ≥ a3 ≥ . . .,

strictly increasing if a1 < a2 < a3 < . . .,

strictly decreasing if a1 > a2 > a3 > . . .,

monotone if it is either increasing or decreasing,

strictly monotone if it is either strictly increasing or strictly decreasing.

bounded above if there is M ∈ R such that an ≤M for all n,

bounded below if there is m ∈ R such that an ≥ m for all n,

bounded if there is M ≥ 0 such that |an| ≤M for all n ∈ N.

95

96 CHAPTER 8. SEQUENCES AND LIMITS

Clearly, a sequence is bounded if and only if it is bounded above and below.

Roughly speaking we say that a sequence (an) converges to g ∈ R if the numbers an getcloser and closer to g as n gets larger and larger. While the intuition behind this descriptionis clear, the notion of convergence has to be given a formal definition.

Let us dissect the intuitive notion of limit more carefully. Suppose that (an) converges tog ∈ R. If ε > 0 is a very small positive number, then for all sufficiently large n, the distancefrom an to g, which equals |an − g| will be less than ε i.e. |an − g| < ε.

That means if ε > 0 is any number,1 then we can find n0 such that for all n ≥ n0 we willhave |an − g| < ε. This leads to the following definition.

Definition. We say that a sequence (an)∞n=1 of real numbers converges to g ∈ R if

∀ε > 0 ∃n0 ∈ N ∀n ≥ n0 |an − g| < ε .

We denote this fact by

limn→∞

an = g or by an → g as n→∞.

We say that a sequence (an) is convergent if it converges to some limit g ∈ R, otherwise thesequence is divergent.

Example 8.1. As an exercise in logic, let us write a formal definition of a divergent sequenceusing quantifiers. A sequence (an) is divergent, if for every g ∈ R it is not true that (an) isconvergent to g which can be formally written as

∀g ∈ R(¬ (∀ε > 0 ∃n0 ∈ N ∀n ≥ n0 |an − g| < ε)

)and this is clearly equivalent to

(8.2) ∀g ∈ R ∃ε > 0 ∀n0 ∈ N ∃n ≥ n0 |an − g| ≥ ε.

Thus formally we could use (8.2) as a definition of a divergent sequence. However, theexpression (8.2) is very complicated and it is much more convenient to define a divergentsequence the way we did before.

In the next four examples we will study limits using directly the definition. Later we willtechniques that will greatly simplify the task of finding limits.

Example 8.2. Prove that limn→∞ 1/n = 0.

Remark 8.3. While this is a very simple example of a limit, the main purpose of it is tolearn how a proof of a statement about the limit should be structured.

1While we want to think that ε > 0 is a very small number, we put the condition that ε > 0 is any number:if the inequality |an − g| < ε is true for small ε, then it is certainly true for all larger values of ε so no harm isdone if we require the condition for all ε > 0.

8.1. LIMITS OF SEQUENCES 97

Proof. We need to prove that

∀ε > 0 ∃n0 ∈ N ∀n ≥ n0∣∣∣∣ 1n − 0

∣∣∣∣ < ε .

Let ε > 0 be given.2 Let n0 be a natural number such that n0 > 1/ε.3 Then for any n ≥ n0we have4

n ≥ n0 >1

ε, n >

1

ε,

1

n< ε,

∣∣∣∣ 1n − 0

∣∣∣∣ < ε.


Remark 8.4. One may ask how did we know that n0 > 1/ε would be a right choice. In thenext two examples we will carefully show the procedure of how to select n0.

Example 8.5. Prove that

limn→∞

n

n+ 1= 1 .

Finding n0. Before proving the claim, we will show how to find n0. Given ε > 0 we wantto find n0 ∈ N so that for all n ≥ n0, |n/(n + 1) − 1| < ε. Thus we start by analyzing theinequality that we want to be true.∣∣∣∣ n

n+ 1− 1

∣∣∣∣ < ε,

∣∣∣∣n− (n+ 1)

n+ 1

∣∣∣∣ < ε,1

n+ 1< ε, n+ 1 >

1

ε, n >

1

ε− 1.

If we choose n0 >1ε − 1, then n ≥ n0 will certainly satisfy n > 1

ε − 1 and by reversing theestimates we will be able to prove the required inequality. This is what we will do below. Itis not necessarily to include the above reasoning in the proof since the proof written belowwill contain all necessary details.


∀ε > 0 ∃n0 ∈ N ∀n ≥ n0∣∣∣∣ n

n+ 1− 1

∣∣∣∣ < ε .

Let ε > 0 be given. Let n0 be any natural number such that

(8.3) n0 >1

ε− 1 .

Then for any n ≥ n0 we have

n ≥ n0 >1

ε− 1, n >

1

ε− 1, n+ 1 >

1

ε,

1

n+ 1< ε,

∣∣∣∣n− (n+ 1)

n+ 1

∣∣∣∣ < ε,∣∣∣∣ n

n+ 1− 1

∣∣∣∣ < ε .

The proof is complete.2We need to prove that the above statement is true for all ε > 0 so we choose ε > 0 arbitrarily and then

we need to show that ∃n0 ∈ N ∀n ≥ n0 |1/n− 0| < ε.3Once ε > 0 was chosen, we need to show that there exists n0 ∈ N such that ∀n ≥ n0 |1/n− 0| < ε. That

means it suffices to show just one value of n0 such that ∀n ≥ n0 |1/n− 0| < ε. Since ε > 0 was selected beforeselecting n0, the choice of n0 may depend on ε.

4Since ε > 0 and n0 ∈ N have already been selected, it remains to prove that ∀n ≥ n0 |1/n − 0| < ε, andthis is what we are doing now.


Remark 8.6. The proof is perfectly fine, even if it does not explain the choice of n0 in (8.3).5


limn→∞

3n2 + 2

n2 + 2n+ 1= 3.

Finding n0. The problem is more difficult thank the two examples discussed above.

We need to find n0 ∈ N such that∣∣∣∣ 3n2 + 2

n2 + 2n+ 1− 3

∣∣∣∣ < ε for n ≥ n0.

We have6∣∣∣∣ 3n2 + 2

n2 + 2n+ 1− 3

∣∣∣∣ =

∣∣∣∣3n2 + 2− 3n2 − 6n− 3

n2 + 2n+ 1

∣∣∣∣ =6n+ 1

n2 + 2n+ 1<

6n+ 1

n2≤ 7n

n2=

7

n< ε.

7

n< ε, n >

7

εand we take n0 >

7

ε.

The ‘finding n0’ part is what we write on a scratch paper and we do not have to include itin the actual proof. We write the proof and we put the scratch paper in trash so a personreading the proof will have no idea why we have chosen n0 > 7/ε.

Proof. Let ε > 0 be given. Let n0 ∈ N be such that n0 > 7/ε. Then for n ≥ n0 we haven > 7/ε, 7/n < ε so∣∣∣∣ 3n2 + 2

n2 + 2n+ 1− 3

∣∣∣∣ =

∣∣∣∣3n2 + 2− 3n2 − 6n− 3

n2 + 2n+ 1

∣∣∣∣ =6n+ 1

n2 + 2n+ 1<

6n+ 1

n2≤ 7n

n2=

7

n< ε,

∣∣∣∣ 3n2 + 2

n2 + 2n+ 1− 3

∣∣∣∣ < ε for all n ≥ n0.


In the next example we will show that the sequence an = (−1)n is divergent. Intuitively, itis absolutely obvious: the sequence oscillates between 1 and −1 infinitely many times so thereis no single number g ∈ R to which (an) would converge. While it seems obvious, writing aformal proof is surprisingly difficult.

Example 8.8. Prove that the sequence an = (−1)n is divergent.

Proof. Suppose to the contrary that limn→∞ an = g for some g ∈ R. Then

∀ε > 0 ∃n0 ∈ N ∀n ≥ n0 |an − g| < ε.

5If a student asks me how did I know that (8.3) would be a right choice, I often smile and say: I often hearvoices in my head and one whispered to me: ‘take n0 > 1/ε− 1’.

6Some explanations are necessary. We want 6n+1n2+2n+1

to be less than ε, but the expression is complicated

so we estimate it from above by a simpler one 7/n so if 7/n < ε, then also 6n+1n2+2n+1

< ε.


Since it is true for all ε > 0, it is true for ε = 1. That is there is n0 ∈ R such that

|an − g| < 1 for all n ≥ n0.

Taking n ≥ n0 even, we have

|1− g| < 1.

Taking n ≥ n0 odd we have

|(−1)− g| < 1

but the two inequalities lead to a contradiction.7 Indeed, the triangle inequality |a + b| ≤|a|+ |b| yields

2 = |1− (−1)| = |(1− g) + (g − (−1))| ≤ |1− g|+ |g − (−1)| < 1 + 1 = 2,

2 < 2.


Theorem 8.9. The limit of a sequence is uniquely defined, i.e., if

limn→∞

an = g and limn→∞

an = h

then g = h.

Proof. 8 Although the result seems obvious it requires a proof. Suppose g 6= h. Thenε = |g − h|/2 > 0. The definition of the limit with this choice of ε implies that there is n1such that

|an − g| < ε for all n ≥ n1

and also there is n2 such that

|an − h| < ε for all n ≥ n2.

Hence for all n ≥ max{n1, n2} we have

|g − h| ≤ |an − g|+ |an − h| < ε+ ε = |g − h|

which is an obvious contradiction.

Example 8.10. As we know limn→∞ an = g, g ∈ R means that

(8.4) ∀ε > 0 ∃n0 ∈ N ∀n ≥ n0 |an − g| < ε.

Given a sequence (an) convergent to g ∈ R, is the following condition also true?

(8.5) ∀ε > 0 ∃n0 ∈ N ∀n ≥ n0 |an − g| <ε

3.

7Geometrically speaking, since the distance between 1 and −1 equals 2, there is no g ∈ R such that thedistance of g to 1 and to −1 is less than 1.

8Note that the argument used in this proof is very similar to the one used in Example 8.8.


Yes, it is! Let η > 0 be given. Then ε = η/3 > 0. Hence according to (8.4), there is n0 ∈ Nsuch that for all n ≥ n0 |an − g| < ε = η/3 so we proved that

∀η > 0 ∃n0 ∈ N ∀n ≥ n0 |an − g| <η

3.

Here η is used to denote any positive number and instead of η we can use symbol ε whichshows that (8.5) is true. For the same reason the following condition (and many others) istrue.

∀ε > 0 ∃n0 ∈ N ∀n ≥ n0 |an − g| < ε2.

Example 8.11. If limn→∞ an = g ∈ R, then limn→∞ |an| = |g|.

Proof. Recall that ||a| − |b|| ≤ |a− b| (Theorem 6.6). We need to prove

(8.6) ∀ε > 0 ∃n0 ∀n ≥ n0∣∣|an| − |g|∣∣ < ε,

and we know that

(8.7) ∀ε > 0 ∃n0 ∀n ≥ n0 |an − g| < ε,

so we prove (8.6) as follows. Let ε > 0 be given. Let n0 be such that for n ≥ n0, |an − g| < ε(such n0 exists by (8.7)). Then for n ≥ n0 we have∣∣|an| − |g|∣∣ ≤ |an − g| < ε,∣∣|an| − |g|∣∣ < ε.

That completes the proof of (8.6).

Example 8.12. If an = (−1)n, then the sequence (an) diverges, but limn→∞ |an| = 1.

However we have

Proposition 8.13. limn→∞ an = 0 if and only if limn→∞ |an| = 0.

Proof. limn→∞ an = 0 means that

(8.8) ∀ε > 0 ∃n0 ∀n ≥ n0 |an − 0| < ε︸︷︷︸|an|<ε

while limn→∞ |an| = 0 means that

(8.9) ∀ε > 0 ∃n0 ∀n ≥ n0 ||an| − 0| < ε︸︷︷︸|an|<ε

so (8.8) and (8.9) are trivially equivalent.

Theorem 8.14. Convergent sequences are bounded.


Proof. Assume that (an) is convergent, limn→∞ an = g. Taking ε = 1 in the definition of thelimit we see that there is n0 such that

|an − g| < 1 for all n ≥ n0.

Now|an| = |(an − g) + g| ≤ |an − g|+ |g| ≤ 1 + |g| for all n ≥ n0.

Hence|an| ≤ 1 + |g|+ |a1|+ |a2|+ . . .+ |an0−1|︸︷︷︸

M

for all n.

Example 8.15. Prove that the sequence (an), an = n+ n(−1)n is divergent.

Proof. Let M > 0 be given. If n = 2k, k > M/4, then an = 4k > M so the sequnece isunbounded above and therefore, divergent (by Theorem 8.14).

The following result is of fundamental importance.

Theorem 8.16. Iflimn→∞

an = a ∈ R and limn→∞

bn = b ∈ R,

then

1. limn→∞(an + bn) = a+ b,

2. limn→∞ can = ca for any c ∈ R,

3. limn→∞ anbn = ab,

4. limn→∞ an/bn = a/b, provided bn 6= 0 for all n and b 6= 0.

Proof. 1. Given ε > 0 there is n1 such that

|an − a| <ε

2for all n ≥ n1

and there is n2 such that

|bn − b| <ε

2for all n ≥ n2.

Hence for all n ≥ n0 = max{n1, n2} we have

|(an + bn)− (a+ b)| ≤ |an − a|+ |bn − b| <ε

2+ε

2= ε.

Similarly we treat the case of an − bn.

2. If c = 0, then can = 0→ 0 = ca so the result is obvious. Thus we may assume that c 6= 0.Given ε > 0, there is n0 such that for n ≥ n0 we have |an − a| < ε/|c| so

|can − ca| = |c| |an − a| < ε for n ≥ n0.


3. Given ε > 0 we need to prove that there is n0 such that

|anbn − ab| < ε for all n ≥ n0.

First we need to do an analysis to find an appropriate n0. We have9

|anbn − ab| = |anbn − anb+ anb− ab| ≤ |an||bn − b|+ |an − a||b| = ♥.

The sequence (an) is bounded as convergent (Theorem 8.14). Hence there is M > 0 suchthat

|an| < M for all n.

Taking M larger, if necessary, we can also assume that |b| < M . Then we have

♥ ≤M |bn − b|+ |an − a|M.

Since an → a and bn → b as n→∞, there is n0 such that

(8.10) |an − a| <ε

2Mand |bn − b| <

ε

2Mfor all n ≥ n0.

Now we can complete the proof. Fix M > 0 such that |b| < M and |an| < M for all n. Letε > 0 be given. Let n0 be such that (8.10) holds. Then for n ≥ n0 we have

|anbn − ab| ≤M |bn − b|+ |an − a|M < Mε

2M+

ε

2MM = ε .


4. It suffices to show that 1/bn → 1/b as n → ∞. The rest will follow from 2. Sincebn → b 6= 0, there is n1 such that

|bn − b| <|b|2

for n ≥ n1

and hence

|bn| >|b|2

for n ≥ n1.

For n ≥ n1 we have ∣∣∣∣ 1

bn− 1

b

∣∣∣∣ =|bn − b||bnb|

≤ 2|b− bn||b|2

.

Let ε > 0. Let n0 ≥ n1 be such that

|b− bn| <ε|b|2

2for n ≥ n0.

Then for n ≥ n0 we have ∣∣∣∣ 1

bn− 1

b

∣∣∣∣ ≤ 2|b− bn||b|2

<2

|b|2· ε|b|

2

2= ε.


9We split anbn− ab into two expressions, and we applied the triangle inequality. This allows us to estimatean and bn separately. It is important to understand this trick. Similar tricks will be used over and over again.


The above result can easily be generalized as follows. If k ∈ N and (a1n), (a2n),. . . , (akn)are convergent sequences, then

(8.11) limn→∞

(a1n + a2n + . . .+ akn) = limn→∞

a1n + limn→∞

a2n + . . .+ limn→∞

akn

limn→∞

(a1n · a2n · . . . · akn) = limn→∞

a1n · limn→∞

a2n · . . . · limn→∞

akn

Example 8.17. Applying the above result we have

1 = limn→∞

1 = limn→∞

1

n+

1

n+ . . .+

1

n︸︷︷︸n

= limn→∞

1

n+ limn→∞

1

n+ . . .+ lim

n→∞

1

n︸︷︷︸n

= 0 + 0 + . . .+ 0 = 0.

Where did we make a mistake? The formula (8.11) is true if k is a fixed integer, while herewe have n terms with n approaching to infinity. Be careful!

Theorem 8.18. If an ≤ bn ≤ cn and

limn→∞

an = limn→∞

cn = g ∈ R,

thenlimn→∞

bn = g .

Proof. Let ε > 0 be arbitrary. Then there are n1 and n2 such that

|cn − g| < ε for n ≥ n1,

|an − g| < ε for n ≥ n2.

Hence for n ≥ n0 = max{n1, n2} we have

g − ε < an ≤ bn ≤ cn < g + ε,

g − ε < bn < g + ε,

−ε < bn − g < ε,

|bn − g| < ε.


Theorem 8.19. For a > 0, limn→∞ n√a = 1.

Proof. We assume first that a > 1. Let xn = n√a − 1. Applying Bernoulli’s inequality

(Proposition 4.6) we havea = (1 + xn)n ≥ 1 + nxn

0 < xn ≤a− 1

n.

Since the left hand side converges to zero (as a constant sequence) and the right hand sideconverges to zero we conclude from the previous theorem that n

√a− 1 = xn → 0, and hence

n√a→ 1.


If a = 1, then n√a = 1→ 1. If 0 < a < 1, then 1/a > 1, so n

√1/a→ 1 and thus

n√a =

1n√

1/a→ 1.


Exercise 8.20. Find the limit limn→∞n√

3n + 5n.

Solution. We have5 =

n√

5n ≤ n√

3n + 5n ≤ n√

2 · 5n =n√

2 · 5.

Since the left hand side and the right hand side both converge to 5 we conclude that

limn→∞

n√

3n + 5n = 5 .

Theorem 8.21. limn→∞ n√n = 1.

Proof. Let xn = n√n− 1. Then xn ≥ 0. Applying the binomial formula we have

n = (1 + xn)n ≥(n

2

)1n−2 · x2n =

n(n− 1)

2x2n ,

0 ≤ xn ≤√

2

n− 1.

Since the left hand side and the right hand side both converge to zero (Problem 1) we getn√n− 1 = xn → 0, n

√n→ 1.

Proposition 8.22. If a > 1, then limn→∞ a−n = 0. If |a| < 1, then limn→∞ a

n = 0.

Proof. If a > 1, Bernoulli’s inequality, Proposition 4.6, yields

an = (1 + (a− 1))n ≥ 1 + n(a− 1)

and hence

0 < a−n ≤ 1

1 + n(a− 1)→ 0 .

If a = 0, then an = 0→ 0 and if 0 < |a| < 1, then 1/|a| > 1 and hence |an| = (1/|a|)−n → 0,so an → 0.

Definition. We say that a sequence (an) diverges to +∞ if10

∀M > 0 ∃n0 ∀n ≥ n0 an > M.

Then we writelimn→∞

an = +∞ or an → +∞ as n→∞.

10In this case the limit exists limn→∞ an = +∞ or (−∞), but we still say that the sequence (an) diverges.


We say that a sequence (an) diverges to −∞ if

∀M > 0 ∃n0 ∀n ≥ n0 an < −M.

Then we write

limn→∞

an = −∞ or an → −∞ as n→∞.

Theorem 8.23. Let an > 0. Then an → 0 if and only if 1/an →∞.

Proof. Let an > 0, an → 0. Let M > 0 be given. Let n0 be such that an < 1/M forn ≥ n0. Then 1/an > M for n ≥ n0 and hence 1/an → +∞. Similarly we prove the otherimplication.

The above result and Proposition 8.22 give

Corollary 8.24. For a > 1, limn→∞ an = +∞.

Example 8.25. limn→∞√n = +∞.


∀M > 0 ∃n0 ∀n ≥ n0√n > M.

Given M > 0, let n0 be such that n0 > M2. Then for all n ≥ n0 we have n ≥ n0 > M2,n > M2,

√n > M .

Example 8.26. limn→∞(√n+ 1−

√n) = 0.

Proof.

√n+ 1−

√n =

(√n+ 1−

√n)(√n+ 1 +

√n)√

n+ 1 +√n

=1√

n+ 1 +√n→ 0.

It is easy to show that√n+ 1 +

√n→∞ as n→∞ so 1√

n+1+√n→ 0 by Theorem 8.23.

Definition. The extended real line R is defined as

R = R ∪ {−∞,+∞} .

Theorem 8.27. If limn→∞ an = g ∈ R, then

limn→∞

a1 + a2 + . . .+ ann

= g.

Theorem 8.28. If limn→∞ an = g ∈ R and an > 0 for all n, then

limn→∞

n√a1a2 · · · an = g.


We will only prove the first theorem. The proof of the second theorem is similar.11

Proof. We will prove the theorem under the additional assumption that g ∈ R. Since an−g →0, the sequence an − g is bounded i.e., there is M > 0 such that

|an − g| ≤M for all n.

Given ε > 0, let n0 be such that

|an − g| <ε

2for n ≥ n0.

We need to estimate the arithmetic mean with the help of the above inequality. Observethat the inequality applies only to n ≥ n0. Therefore we need to split the arithmetic meaninto two parts, where the first part will contain a1, . . . , an0−1 and the second one an0 , . . . , an.Then we will estimate the two parts separately.

For n ≥ n0 we have∣∣∣∣a1 + . . .+ ann

− g∣∣∣∣ =

∣∣∣∣(a1 − g) + . . .+ (an − g)

n

∣∣∣∣≤ |a1 − g|+ . . .+ |an0 − g|

n+|an0+1 − g|+ . . .+ |an − g|

n

≤ n0M

n+

(n− n0)ε/2n

≤ n0M

n+ε

2.

Let n1 > n0 be such that n0M/n < ε/2 for n > n1. The for n > n1∣∣∣∣a1 + . . .+ ann

− g∣∣∣∣ < ε.


As an immediate consequence of the above results we obtain.

Example 8.29. limn→∞

1 +√

2 + 3√

3 + . . .+ n√n

n= 1.

Example 8.30. If a1 = 1, a2 = 2/1,. . . , an = n/(n− 1), then an → 1 and hence

n√n = n

√1 · 2

1· 3

2· · · n

n− 1→ 1 .

This proof of the fact that n√n → 1 is much more complicated than the one we obtained

earlier, but it is still a nice illustration of Theorem 8.28.

More generally we have.

11It can actually be concluded from the result about arithmetic means with the help of continuity of thelogarithm.

8.2. THE STOLZ THEOREM 107

Theorem 8.31. If an > 0 and all n and limn→∞ an+1/an = a ∈ R, then limn→∞ n√an = a.

Proof. Since an+1/an → a we see that also the the following sequence converges to a

1,a2a1,a3a2, . . . ,

anan−1

, . . . −→ a.

Thereforen√an

n√a1

= n

√1 · a2

a1· a3a2· anan−1

→ a .

Since n√a1 → 1, we conclude that n

√an → a.

Example 8.32. an = n→∞ and hence

limn→∞

n√n! = lim

n→∞n√a1a2 · · · an =∞.

8.2 The Stolz theorem

Theorem 8.33. Suppose that (xn)n = 1∞ and (yn)∞n=1 are sequences of real numbers suchthat

(a) there is N such that 0 < yn < yn+1 for all n ≥ N ,

(b) limn→∞ yn =∞.

Under the above assumptions, if

limn→∞

xn − xn−1yn − yn−1

= g ∈ R, then limn→∞

xnyn

= g.

Remark 8.34. One may view this result as a version of l’Hospital’s rule for sequences. Infact, one of the proof of l’Hospital’s rule will be based on Theorem 8.33.

Proof. Assume first that the limit is finite

(8.12) limn→∞


= g ∈ R.

Fix ε > 0. There is n0 ≥ N such that

(8.13) g − ε

2<xm+1 − xmym+1 − ym

< g +ε

2for m ≥ n0.

Note that according to (a), all denominators ym+1−ym are positive. We will need the followingsimple observation.

Lemma 8.35. If b, d > 0 and a/b < c < d, then

a

b<a+ c

b+ d<c

d.


Applying the lemma to (8.13) with help of a simple induction argument, gives that for alln > n0 we have

xn − xn0

yn − yn0

=

∑n−1m=n0

(xm+1 − xm)∑n−1m=n0

(ym+1 − ym)∈ (g − ε/2, g + ε/2),

i.e.,

(8.14)

∣∣∣∣xn − xn0

yn − yn0

− g∣∣∣∣ < ε

2for n > n0.

Note thatxnyn− g =

xn0 − gyn0

yn+

(1− yn0

yn

)(xn − xn0

yn − yn0

− g).

Since |1− yn0yn| < 1 for n > n0, we have

(8.15)

∣∣∣∣xnyn − g∣∣∣∣ ≤ ∣∣∣∣xn0 − gyn0

yn

∣∣∣∣+

∣∣∣∣xn − xn0

yn − yn0

− g∣∣∣∣ .

Let n1 > n0 be such that

(8.16)

∣∣∣∣xn − xn0

yn − yn0

− g∣∣∣∣ < ε

2for n ≥ n1.

Existence of n1 follows from (b). Hence(8.14), (8.15) and (8.16) imply∣∣∣∣xnyn − g∣∣∣∣ < ε

2+ε

2= ε for n ≥ n1.

This completes the proof in the case of finite limit (8.12).

The case of infinite limits can be easily reduced to the case of a finite limit. Indeed,assume for example that

limn→∞


=∞.

Then there is N such that for n ≥ N

xn − xn−1 > yn − yn−1 > 0 and hence (why?) limn→∞

xn =∞.

Since

limn→∞

yn − yn−1xn − xn−1

= 0,

the assumptions of Stolz’s theorem are satisfied (finite limit case) and hence

limn→∞

ynxn

= 0 so limn→∞

xnyn

=∞.

The case of the −∞ limit is similar ad left to the reader.

As an application we will provide a new proof of Theorem 8.27.

8.3. MONOTONE SEQUENCES 109

Proof of Theorem 8.27. Let xn = a1 + . . .+ an and yn = n. Then

limn→∞

a1 + . . .+ ann

= limn→∞


= limn→∞

an = g.

Example 8.36. If k ∈ N, then

limn→∞

1k + 2k + . . .+ nk

nk+1=

1

k + 1.

Indeed, let xn = 1k + . . .+ nk and yn = nk+1. It follows from binomial’s formula that

yn − yn−1 = nk+1 − (n− 1)k+1 = (k + 1)nk + lower order terms.

Hence

limn→∞

1k + 2k + . . .+ nk

nk+1= lim

n→∞


= limn→∞

nk

(k + 1)nk + l.o.t.=

1

k + 1

8.3 Monotone sequences

The following important result stems from the fact that the ordered field of real numbers iscomplete.

Theorem 8.37. Every sequence that is increasing and bounded from above is convergent.

Proof. Let A = {a1, a2, a3, . . .} be the set of all values of the sequence (an). Since the set isbounded from above

g = supA = sup{a1, a2, a3, . . .}

exists and belongs to R. Note that an ≤ g for all n, because g is an upper bound of A. Sinceg is the least upper bound, for every ε > 0, g − ε is not an upper bound, so there is n0 suchthat an0 > g − ε. Hence for all n ≥ n0

g − ε < an0 ≤ an ≤ g

and thus|an − g| < ε .

We proved that for any ε > 0 there is n0 such that for all n ≥ n0, |an− g| < ε i.e., we provedthat

limn→∞

an = g.


By a similar argument one can prove.

Theorem 8.38. Every sequence that is decreasing and bounded from below is convergent.


8.4 Subsequences

If (an) is a sequence and n1 < n2 < n3 < . . . are positive integers, then the sequence

(bk), bk = ank

is called a subsequence of (an). The sequence (ank)∞k=1 is simply obtained from (an) by

selecting infinitely many terms with indices in the increasing order.

Note that nk ≥ k. It is important to remember this fact.

Proposition 8.39. If limn→∞ an = g ∈ R and (ank)∞k=1 is a subsequence of (an), then

limk→∞ ank= g.

Proof. We will prove the result in the case in which g ∈ R. The cases g = ±∞ are left to thereader. We need to prove that

∀ε > 0 ∃k0 ∀k ≥ k0 |ank− g| < ε,

but we know that∀ε > 0 ∃n0 ∀n ≥ n0 |an − g| < ε,

Recall that nk ≥ k.

Let ε > 0 be given and let n0 be such that for all n ≥ n0, |an− g| < ε. Let k0 = n0. Thenfor k ≥ k0 we have

nk ≥ k ≥ k0 = n0, nk ≥ n0 so |ank− g| < ε

and we prove that|ank− g| < ε for k ≥ k0.

Example 8.40. If nk = k + 1, then ank= ak+1 and hence12

limn→∞

an = limn→∞

an+1 = g.

If nk = 2k, then ank= a2k and hence13

limn→∞

an = limn→∞

a2n = g.

However, form convergence of a subsequence we cannot conclude convergence of the se-quence.

Example 8.41. In Example 8.8 we proved that the sequence an = (−1)n is divergent. Hereis another proof of this fact. Suppose (an) converges to g. Then 1 = a2n → g so g = 1 and−1 = a2n+1 → g so g = −1 and hence 1 = −1 which is a contradiction. This contradictionproves divergence of the sequence an = (−1)n. Although the sequence an = (−1)n is divergent,both subsequences (a2n)n and (a2n+1)n, are convergent.

12Step by step explanation: limn→∞ an+1 = limk→∞ ak+1 = limk→∞ ank = limn→∞ an by Proposition 8.39.13Step by step explanation: limn→∞ a2n = limk→∞ a2k = limk→∞ ank = limn→∞ an by Proposition 8.39.

8.4. SUBSEQUENCES 111

However, we have.14

Proposition 8.42. If limn→∞ a2n = limn→∞ a2n+1 = g ∈ R, then limn→∞ an = g.

Proof. We will prove the result under the additional assumption that g ∈ R. For any ε > 0there is n1 such that for n ≥ n1

|a2n − g| < ε and |a2n+1 − g| < ε.

Hence for n0 = 2n1 + 1 and n ≥ n0 we have

|an − g| < ε,

because every integer n ≥ n0 = 2n1 + 1 is of the form n = 2k or n = 2k + 1 for somek ≥ n1.

Here is a nice and quite typical application of subsequences.

Example 8.43. Find the limit of the sequence (xn) which is defined as follows:15 x1 = 2018is any number and

(8.17) xn+1 =1

2

(xn +

1

xn

)for all n ≥ 1.

Proof. Suppose for a moment that we know that (xn) converges to a positive limit g ∈ (0,∞).Then

limn→∞

xn = limn→∞

xn+1 = limn→∞

1

2

(xn +

1

xn

)=

1

2

(limn→∞

xn +1

limn→∞ xn

),

so

g =1

2

(g +

1

g

), 2g = g +

1

g, g =

1

g, g2 = 1,

and hence g = 1, because g > 0. Therefore it remains to prove that the sequence (xn) has apositive and finite limit. To this end it suffices show that starting from n = 2 the sequence isdecreasing and bounded from below by 1 (see Theorem 8.38).

• Bounded from below by 1 for n ≥ 2:

xn ≥ 1 ≡1

2

(xn−1 +

1

xn−1

)≥ 1 ≡

xn−1 +1

xn−1≥ 2 ≡ (it is easy to see that all xk > 0 for all k)

x2n−1 + 1 ≥ 2xn−1 ≡(xn−1 − 1)2 ≥ 0.

14In Example 8.41 both (a2n) and (a2n+1) converge, but to different limits.15The sequence is defined by a so called recursive formula. An initial condition that tells where the sequence

starts: x1 = 2018. A recursion formula (8.17) that tells how any term of the sequence is defined in terms ofthe preceding terms. According to the Principle of Mathematical Induction, the sequence (xn) is well definedfor all n. Another example of a sequence defined by a recursive formula is a well known Fibonacci sequence.


Since the last inequality is obviously true, the first inequality is also true as equivalent.

• Decreasing starting from n ≥ 2:

xn+1 ≤ xn ≡1

2

(xn +

1

xn

)≤ xn ≡

xn +1

xn≤ 2xn ≡

1

xn≤ xn ≡

x2n ≥ 1 ≡xn ≥ 1.

Since the last inequality is true for n ≥ 2 (as proved earlier), the first inequality is true forn ≥ 2 as equivalent.

Proposition 8.44. A sequence (an) does not converge to g ∈ R if and only if there is ε > 0and a subsequence (ank

)k such that |ank− g| ≥ ε.

Proof. A sequence (an) does not converge to g ∈ R if and only if the following negation istrue:

¬ (∀ε > 0 ∃N ∀n > N |an − g| < ε) ,

that is

(8.18) ∃ε > 0 ∀N ∃n > N |an − g| ≥ ε.

Fix such ε > 0 and take N = 1. It follows from (8.18) that there is n1 > N = 1 such that|an1 − g| ≥ ε. Take now N = n1 (ε remains the same through the rest of the proof). Then itfollows from (8.18) that there is nn > N = n1 such that |an2 − g| ≥ ε. Take now N = n2 andagain we can find n3 > n2 such that |an3 − g| ≥ ε. Continuing this procedure ad infinitumwe obtain a subsequence (ank

)k, n1 < n2 < . . . such that |ank− g| ≥ ε for all k ∈ N.

As an application we will prove the following useful result.16

Theorem 8.45. limn→∞ an = g ∈ R if and only if every subsequence of (an) has a subse-quence with the limit equal g.

Proof. We will prove the result under the assumption that g ∈ R, leaving the case of g = ±∞to the reader.

If (an) converges to g, then every subsequence of a subsequence converges to g. That isnearly obvious.

16It is nearly obvious that an → g if and only if every subsequence of (an) has limit equal g. However, thetheorem is deeper than that since we do not require subsequences to have limits equal g, but instead we havea weaker condition that each subsequence has a subsequence with the limit equal g.

8.5. SERIES 113

To prove the other implication suppose to the contrary that every subsequence has a sub-sequence convergent to g ∈ R and (an) does not converge to g. According to Proposition 8.44,there is ε > 0 and a subsequence (ank

)k such that |ank− g| ≥ ε for all k ∈ N. Hence (ank

)kcannot have a subsequence convergent to g which contradicts our assumption. The proof iscomplete.

8.5 Series

An important class of examples of sequences is provided by series.

Definition. If sn = a1 + a2 + . . .+ an and

limn→∞

sn = limn→∞

(a1 + a2 + . . .+ an) = g,

then we write

a1 + a2 + a3 + . . . =

∞∑n=1

an = g .

We call the sequence (sn) a series and the elements sn partial sums of the series. The limit gis called the sum of the series. We call the series convergent if (sn) is convergent. If g = +∞or g = −∞, then we say that the series is divergent to +∞ or −∞.

The following result provides a necessary, but not sufficient condition for the convergenceof the series.

Theorem 8.46. If the series∑∞

n=1 an is convergent, then an → 0.

Proof. Suppose the series is convergent to g, i.e., sn → g. Then also sn−1 → g (why?)and hence an = sn − sn−1 → g − g = 0. 2

Theorem 8.47. If |q| < 1, then

∞∑n=0

qn = 1 + q + q2 + . . . =1

1− q.

If |q| ≥ 1, then the series∑∞

n=1 qn is divergent.

Proof. If |g| < 1, then (see Problem 3 in Chapter 4)

1 + q + q2 + . . .+ qn =1− qn+1

1− q→ 1

1− q

by Proposition 8.22. If |q| ≥ 1, then qn does not converge to zero, so the series is divergent(see the previous theorem). 2

Theorem 8.48.∞∑n=1

1

n= +∞.


Proof. Since the sequence of partial sums is increasing it suffices to show that it is notbounded. This is however, impossible, because according to Example 4.7

s3n+1 − sn =1

n+ 1+

1

n+ 2+ . . .+

1

3n+ 1> 1

and hence

s4−s1 > 1, s13−s1 = (s13−s4)+(s4−s1) > 2, s40−s1 = (s40−s13)+(s13−s1) > 3 etc.

2

Series will be carefully investigated in the Section 10.3.

8.6 Decimal expansion

In order to understand the decimal expansion from the perspective of limits of sequences weneed to know that bounded and monotone sequences are convergent.

The notion of the limit of a sequence can be used to explain the meaning of the infinitedecimal expansion of a real number. Let x0 be an arbitrary real number. Let n0 be thelargest integer less than or equal to x.

Let n1 be the largest integer such that n0 +n1/10 ≤ x. Since n0 + 10/10 = n0 + 1 > x weconclude that n1 ∈ {0, 1, 2, . . . , 9}, i.e., n1 is a digit.

Let n2 be the largest integer such that n0 + n1/10 + n2/102 ≤ x. Since

n0 +n110

+10

102= n0 +

n110

+1

10> x

we conclude that n2 ∈ {0, 1, 2, . . . , 9} etc.

8.6. DECIMAL EXPANSION 115

The sequence

n0, n0 +n110, n0 +

n110

+n2102

, . . .

is increasing and it is easy to see that it converges to x. Indeed, if ε > 0, then 10−N < ε forsome N and hence for k ≥ N

x− ε < x− 1

10N≤ x− 1

10k< n0 +

n110

+ . . .+nk10k︸︷︷︸

ak

≤ x, |ak − x| < ε ,

which meanslimk→∞

ak = x.

The fact that

x = limk→∞

(n0 +

n110

+ . . .+nk10k

)=

∞∑k=0

nk10k

where n0 ∈ Z and ni ∈ {0, 1, 2, . . . , 9} is simply denoted by

(8.19) x = n0.n1n2n3 . . .

and the right hand side of (8.19) is called the decimal expansion of x.

Thus we proved that every real number has a decimal expansion. Note also that thedecimal expansion of a real number is not unique. Indeed,

an = 0 +9

10+

9

102+ . . .+

9

10k= 1− 1

10k→ 1 as n→∞

and hence

(8.20) 1 = 0.99999999 . . .

We already discussed the issue of uniqueness of the decimal expansion when we discussedcardinality of sets, but then we relied of an intuitive understanding of the real number andits decimal expansion so the equality (8.20) could not be properly justified until now.

Observe that in the above argument we did not use completeness of the field of realnumbers. For example we could apply the above proof to the field of rational numbers andprove that every rational number has a decimal expansion. However, we will use completenessof R now. Namely, if we select an arbitrary sequence

n0 ∈ Z, ni ∈ {0, 1, 2, 3, . . . , 9}, i = 1, 2, 3, . . .

then the sequence

ak = n0 +n110

+n2102

+ . . .+nk10k

, k = 0, 1, 2, . . .

is increasing and bounded from above by n0+1. Therefore it is convergent, see Theorem 8.37.The limit is, of course, a real number whose decimal expansion is n0.n1n2n3 . . . That means,using completeness of R we proved that every infinite decimal expansion defines a real number.


We used here completeness of R, because it was used in the proof of Theorem 8.37. Indeed,the sequence of rational numbers (approximation of

√2):

1, 1.4, 1.41, 1.414, 1.4142, 1.41421, 1, 414213, 1.4142135, . . .

is increasing and bounded, but it has no limit in Q.

Remark. We constructed the field of real numbers from the field of rational numbers usingcuts. One can also provide a different construction by identification of real numbers withdecimal expansions. This is possible, but not as simple as it seems. In the construction weassume familiarity with rational numbers, so numbers 1, 1.4, 1.41, 1.414, 1.4142,. . . are welldefined however, 1.4142135 . . . (decimal expansion of

√2) can only be understood as a formal

expression and we have to define addition17 and multiplication of such expressions and proveall properties. We would also have to address the problem of non uniqueness of the decimalexpansion. After all, this construction of real numbers would not be easier than the one thatuses cuts.

8.7 Examples

Example 8.49. Find the limit limn→∞

n( 3√n3 + n− n

).

Solution. Identity

a3 − b3

a2 + ab+ b2= a− b

gives

3√n3 + n− n =

(n3 + n)− n3(3√n3 + n

)2+ n 3√n3 + n+ n2

,

n( 3√n3 + n− n

)=

n2(3√n3 + n

)2+ n 3√n3 + n+ n2

=1(

3

√1 + 1

n2

)2+ 3

√1 + 1

n2 + 1

→ 1

3.

2

Example 8.50. Let a ∈ {1, 2, 3, . . . , 9} be a digit. By aaaaa we denote a number whosedecimal representation has 5 digits a. Find the limit

limn→∞

a+ aa+ aaa+ . . .+

n digits︷︸︸︷aaa . . . a

10n.

17That is not easy! When we add 23.2987732 to 4.1307921 we start from the last digit, but for infiniteexpansions we do not have the last digit to start with.

8.7. EXAMPLES 117

Solution. We have

a+ aa+ . . .+ aaa . . . a = a(1 + 11 + 111 + . . .+

n︷︸︸︷111 . . . 1)

= a

(10− 1

9+

102 − 1

9+

103 − 1

9+ . . .+

10n − 1

9

)

= a( n+1︷︸︸︷

111 . . . 10−n9

)=a

9

(10 ·

n︷︸︸︷111 . . . 1−n

)=

a

9

(10 · 10n − 1

9− n

)=

a

81

(10(10n − 1)− 9n

)and hence

a+ aa+ . . .+ aaa . . . a

10n=

a

81

(10

(1− 1

10n

)− 9n

10n

)→ 10a

81.

2

Example 8.51. Prove that limn→∞

(n√n− 1)n = 0.

Proof. Since n√n→ 1, n

√n− 1 < 1/2 for n ≥ n0 and hence

0 < ( n√n− 1)n < 1/2n → 0.

2

Example 8.52. Find a series whose nth partial sum equals sn = (n+ 1)/n.

Solution. a1 = s1 = 2,

a1 + . . .+ an−1 = sn−1 =n

n− 1, a1 + . . .+ an = sn =

n+ 1

n.

Hence

an = sn − sn−1 =n+ 1

n− n

n− 1=

(n+ 1)(n− 1)− n2

n(n− 1)= − 1

n(n− 1),

so the series is

2− 1

2 · 1− 1

3 · 2− 1

4 · 3− 1

5 · 4− . . .

2

Example 8.53. Find the sum of the series∞∑n=1

2n+ 1

n2(n+ 1)2.

Solution. Since2n+ 1

n2(n+ 1)2=

1

n2− 1

(n+ 1)2,


we have

sn =

(1

12− 1

22

)+

(1

22− 1

32

)+ . . .+

(1

n2− 1

(n+ 1)2

)= 1− 1

(n+ 1)2→ 1 .

Accordingly∞∑n=1

2n+ 1

n2(n+ 1)2= 1 .

2

Example 8.54. Find the sum∞∑n=1

sinn!π

720.

Solution. If n ≥ 6, then

n! = 1 · 2 · · · 6︸︷︷︸720

·7 · · ·n ,

so an = 0 for n ≥ 6. Hence

∞∑n=1

sinn!π

720= sin

π

720+ sin

2π

720+ sin

6π

720+ sin

24π

720+ sin

120π

720

= sinπ

720+ sin

π

360+ sin

π

120+ sin

π

30+ sin

π

6.

2

Example 8.55. Prove that∞∑n=1

n

3 · 5 · · · (2n+ 1)=

1

2.

Proof. For n > 1 we have

an =n

3 · 5 · · · (2n+ 1)=

1

2

(2n+ 1)− 1

3 · 5 · · · (2n+ 1)

=1

2

(1

3 · 5 · · · (2n− 1)− 1

3 · 5 · · · (2n+ 1)

)and hence

a1 + a2 + . . .+ an =

=1

3︸︷︷︸a1

+1

2

(1

3− 1

3 · 5+

1

3 · 5− 1

3 · 5 · 7+ . . .+

1

3 · 5 · · · (2n− 1)− 1

3 · 5 · · · (2n+ 1)

)

=1

3+

1

2

(1

3− 1

3 · 5 · · · (2n+ 1)

)→ 1

3+

1

6=

1

2.

2

8.8. PROBLEMS 119

8.8 Problems

Problem 1. Show directly from the definition of the limit that

limn→∞

√2

n− 1= 0.

Problem 2. Prove directly using the definition, that limn→∞

2n+ 5

3n− 7=

2

3.

Problem 3. Prove directly using the definition, that limn→∞

2n+ 5

3n2 − 7= 0.

Problem 4. Prove that if an > 0, a > 0, and limn→∞ an = a, then limn→∞

√an =

√a.

Hint: Use the formula a− b = (a2 − b2)/(a+ b) to estimate√an −

√a.

Problem 5. Find the limit limn→∞(√n2 + 2n+ 5− n). You can use results proved in class,

but clearly explain what results you use.Hint: Use the formula a − b = (a2 − b2)/(a + b). You can use without proving it, that

limn→∞

√1 + 2

n + 5n2 = 1.

Problem 6. Find the limit limn→∞n√n2 + n. You can use results that have been proved in

class.

Problem 7. Prove that the sequencen3 + (−1)nn3

n2 + 1is divergent by showing that it is un-

bounded.

Problem 8. Although the sequencen3 + (−1)nn3

n2 + 1is unbounded (Problem 7), show that it

does not diverge to +∞.

Problem 9. Prove that if limn→∞ an =∞, then

limn→∞

a1 + a2 + . . .+ ann

=∞.

Chapter 9

Exponential and logarithmicfunctions

9.1 The definition

Definition. For n,m ∈ N and x > 0 we define

xmn =

(x

1n

)mand x−

mn =

1

xmn

.

Therefore we definedxq for any x > 0 and q ∈ Q.

Observe that a rational number q has many representations as a fraction

q =m

n=km

kn.

Thus in order to prove that xq is well defined one needs to show that the definition is inde-pendent of the choice of a particular representation i.e.(

x1n

)m=(x

1kn

)km.

This is however, easy and the reader can (and should) check it.

Proposition 9.1. For x, y > 0 and q ∈ Q we have (xy)q = xqyq.

Proof. We have(x

1n y

1n

)n=

(x

1n y

1n

)· · ·(x

1n y

1n

)︸︷︷︸

n times

= x1n · · ·x

1n︸︷︷︸

n

y1n · · · y

1n︸︷︷︸

n

=(x

1n

)n (y

1n

)n= xy.

Since z = x1/ny1/n satisfies zn = xy we have z = (xy)1/n i.e.,

x1n y

1n = (xy)

1n .

121

122 CHAPTER 9. EXPONENTIAL AND LOGARITHMIC FUNCTIONS

Thus(xy)

mn =

((xy)

1n

)m=(x

1n y

1n

)m=(x

1n

)m (y

1n

)m= x

mn y

mn .

Similarly one can prove that(xy)−

mn = x−

mn y−

mn .


Using similar arguments one can prove other well known algebraic properties of the func-tion f(x) = xq like for example

Proposition 9.2. If x > 0 and p, q ∈ Q, then xpxq = xp+q, (xp)q = xpq.


Proposition 9.3. If a > 1, the the function

Q 3 q 7→ aq

is strictly increasing.

Proof. We will prove that if 0 < q1 < q2, then aq1 < aq2 . The case in which we allow theexponents to be less than or equal zero is left to the reader. Let 0 < q1 = m1/n1 < q2 =m2/n2. Then m1n2 < m2n1. Since

a1

n1n2 > 1

we haveaq1 = a

m1n1 =

(a

1n1n2

)m1n2

<(a

1n1n2

)m2n1

= am2n2 = aq2 .


If a = 1, the function q 7→ aq is constant (equal 1) and if 0 < a < 1, the function q 7→ aq

is decreasing.

So far we defined xq for x > 0 and rational exponents q only.

Definition. If a > 1 and x ∈ R we define

ax = sup {aq : q ≤ x ∧ q ∈ Q} .

Clearly the set is nonempty. Since the function q 7→ aq is increasing, the set is bounded fromabove by aq0 , where q0 is any rational number larger than x. Therefore the supremum existsand is finite.

If 0 < a < 1, then we define

ax =1(1a

)x .Note that (1/a)x is well defined, because 1/a > 1.

One can easily prove that the function f(x) = ax, x ∈ R has the same algebraic propertiesas in the case of rational exponents. For example (ab)x = axbx, ax+y = axay, (ax)y = axy

9.1. THE DEFINITION 123

Moreover the function f(x) = ax is increasing when a > 1 and decreasing when 0 < a < 1.We leave details as an exercise.

For a > 1, the function f(x) = ax, f : R → (0,∞) is strictly increasing. Therefore it isone-to-one and hence it has an inverse. The domain of the inverse function is the image of f .The problem is that we did not prove so far that the image of f is (0,∞). We will do it now.

Proposition 9.4. Let f(x) = ax, 1 6= a > 0, f : R→ (0,∞). Then f(R) = (0,∞).

Proof. Assume first that a > 1. Let y ∈ (0,∞). We have to prove that there is x ∈ Rsuch that ax = y. Let

A = {z ∈ R : az ≤ y} .

A 6= ∅, because a−n → 0 as n → ∞ (Corollary 8.22), so a−n0 < y for some n0, and hence−n0 ∈ A. A is bounded from above, because an →∞ as n→∞ (Corollary 8.24) and hencean1 > y for some n1. Thus az > y for all z ≥ n1, so n1 is an upper bound of A. Accordingly

x := supA ∈ R.

Clearly x − 1/n is not an upper bound of A, so there is z ∈ A such that z > x − 1/n andhence

ax−1/n < az ≤ y.

Since x+ 1/n 6∈ A we also have ax+1/n > y. Thus

ax(a1/n)−1 = ax−1/n < y < ax+1/n = axa1/n.

Both sequences that bound y converge to ax as n→∞. Hence the constant sequence y alsoconverges to ax i.e., ax = y.

If 0 < a < 1, then 1/a > 1, so for every y ∈ (0,∞) there is z ∈ R such that (1/a)z = yand hence for x = −z we have ax = (1/a)z = y. The proof is complete. 2

The function f(x) = ax, f : R → (0,∞) is strictly increasing for a > 1 and strictlydecreasing for 0 < a < 1. In either case the inverse function f−1(x) exists. Since f(R) =(0,∞), the domain of f−1 is (0,∞). The function f−1 is strictly increasing when a > 1(why?) and strictly decreasing when 0 < a < 1 (why?). We denote it by

loga(x) := f−1(x).

We have

(9.1) loga(ax) = f−1(f(x)) = x for all x ∈ R.

(9.2) a(loga(x)) = f(f−1(x)) = x for all x ∈ (0,∞),

Algebraic properties of the logarithm follow easily from the algebraic properties of the expo-nential function. For example (9.2) yields

aloga xy = xy = aloga xaloga y = aloga x+loga y .


Since the function z 7→ az is one-to-one the exponents must be equal i.e.,

loga xy = loga x+ loga y.

All other algebraic properties of the logarithm can be proved in a similar way. We assumethat the reader is familiar with those properties and in what follow we will take them forgranted.

Theorem 9.5. If a > 0, then limn→∞ n−a = limn→∞ 1/na = 0.

Proof. Given ε > 0, for n > ε−1/a, we have∣∣∣∣ 1

na− 0

∣∣∣∣ =1

na< ε

and the claim follows. 2

Example 9.6. Consider the sequence an = n1,000,000/1.000001n. The numerator growthsvery fast and denominator seems to growth slowly. However, it turns out that eventually thegrowth of the denominator will be much faster than that of numerator and

limn→∞

n1000000

1.000001n= 0 .

This follows from the next result. Remember this example when you have money in the bankeven with a very small interest rate, but you are willing to wait for a sufficiently long time!

Theorem 9.7. If p > 0 and α ∈ R, then

limn→∞

nα

(1 + p)n= 0 .

Proof. For k > α, k > 0 and n > 2k we have

(1 + p)n >

(n

k

)pk · 1n−k =

n(n− 1) · · · (n− k + 1)

k!pk >

(n2

)k pkk!,

because each factor in the product n(n− 1) · · · (n− k + 1) is bigger than n/2. Hence

0 <nα

(1 + p)n< nα

(2

n

)k k!

pk=

2kk!

pknα−k .

Since α − k < 0, the right hand side converges to zero as n → ∞ (Theorem 9.5), and theclaim follows. 2

9.2 Number e and natural logarithm

Theorem 9.8. The sequence an = (1 + 1/n)n is strictly increasing, the sequence bn = (1 +1/n)n+1 is strictly decreasing and both sequences converge to the same limit. We denote thislimit by

e = limn→∞

(1 +

1

n

)n= lim

n→∞

(1 +

1

n

)n+1

.

9.2. NUMBER E AND NATURAL LOGARITHM 125

Proof. First we will prove that an is strictly increasing. To this end it suffices to showthat an+1/an > 1. We have

an+1

an=

(1 + 1

n+1

)n+1

(1 + 1

n

)n =

(n+2n+1

)n+1

(n+1n

)n=

((n+ 2)n

(n+ 1)2

)n n+ 2

n+ 1=

(n2 + 2n

n2 + 2n+ 1

)nn+ 2

n+ 1

=

(1− 1

n2 + 2n+ 1

)n n+ 2

n+ 1

≥(

1− n

n2 + 2n+ 1

)n+ 2

n+ 1(Bernoulli)

=n3 + 3n2 + 3n+ 2

n3 + 3n2 + 3n+ 1> 1.

Similarly we can show that the sequence bn is decreasing (we leave it as an exercise). Sincean ≤ bn for every n we have

2 = a1 < a2 < a3 < . . . < an < . . . < bn < bn−1 < . . . < b1 = 4.

Hence an is increasing and bounded from above, so convergent. Also bn is decreasing andbounded from above, so convergent. Clearly limn→∞ an ∈ (2, 4), so limn→∞ an 6= 0 and hence

limn→∞ bnlimn→∞ an

= limn→∞

bnan

= limn→∞

1 +1

n= 1, lim

n→∞bn = lim

n→∞an.

2

One more proof that an is increasing. A clever application of the Arithmetic-Geometricmean inequality gives ((

1 +1

n

)n· 1)1/(n+1)

= n+1

√(1 +

1

n

)· · ·(

1 +1

n

)· 1

≤(1 + 1

n

)+ . . .+

(1 + 1

n

)+ 1

n+ 1= 1 +

1

n+ 1.

Hence (1 +

1

n

)n· 1 ≤

(1 +

1

n+ 1

)n+1

,

(1 +

1

n

)n≤(

1 +1

n+ 1

)n+1

.

2

Remark. Since (1+1/n)n is increasing and (1+1/n)n+1 is decreasing and e is their commonlimit, we have that (

1 +1

n

)n< e <

(1 +

1

n

)n+1

for every n. Taking n large we obtain lower and upper estimate for e. One can prove that

e = 2.718281828 . . .


Theorem 9.9. e =∞∑n=0

1

n!.

Proof. Let

xn =

(1 +

1

n

)n, yn = 1 +

1

1!+

1

2!+ . . .+

1

n!.

Thus (yn) is the sequence of the partial sums of the above series. The binomial formula yields

xn = 1n +

(n

1

)1n−1

1

n+

(n

2

)1n−2

1

n2+ . . .+

(n

n− 1

)11

1

nn−1+

(n

n

)10

1

nn

= 1 + 1 +n(n− 1)

2!

1

n2+n(n− 1)(n− 2)

3!

1

n3+ . . .

+n(n− 1)(n− 2) · · · (n− k + 1)

k!

1

nk+ . . .+

n(n− 1)(n− 2) · · · 1n!

1

nn

= 1 + 1 +1

2!

(n− 1

n

)+

1

3!

(n− 1

n

)(n− 2

n

)+ . . .+

1

n!

(n− 1

n

)(n− 2

n

)· · ·(

1

n

).

Therefore xn ≤ yn. On the other hand the coefficients at 1/k! converge to 1 as n → ∞ sowe should expect that the limit of xn will be as large as that of yn which together with theinequality xn ≤ yn should give equality of the limits. To turn this observation into a rigorousargument, fix k. Then for n ≥ k we have

xn ≥ 1 + 1 +1

2!

(n− 1

n

)+ . . .+

1

k!

(n− 1

n

)(n− 2

n

)· · ·(n− k + 1

n

).

With that fixed k, letting n→∞ on both sides of the above inequality yields

e = limn→∞

xn ≥ 1 + 1 +1

2!+

1

3!+ . . .+

1

k!= yk.

This and the inequality xn ≤ yn gives e← xn ≤ yn ≤ e and hence

∞∑n=0

1

n!= lim

n→∞yn = e.

2

Theorem 9.10. e is irrational.

Proof. Let

xn = 1 +1

1!+

1

2!+ . . .+

1

n!.

9.2. NUMBER E AND NATURAL LOGARITHM 127

Then

e− xn =1

(n+ 1)!+

1

(n+ 2)!+

1

(n+ 3)!+ . . .

=1

(n+ 1)!

(1 +

1

n+ 2+

1

(n+ 2)(n+ 3)+

1

(n+ 2)(n+ 3)(n+ 4)+ . . .

)<

1

(n+ 1)!

(1 +

1

n+ 1+

1

(n+ 1)2+

1

(n+ 1)3+ . . .︸︷︷︸

geometric series

)

=1

(n+ 1)!

1

1− 1n+1

=1

n!n.

Hence

0 < e− xn <1

n!n.

Suppose that e is a rational number i.e., e = p/q for some p, q ∈ N. Then

0 < e− xq <1

q! q,

0 < eq!︸︷︷︸integer

− xqq!︸︷︷︸integer

<1

q.

Since there are no integers between 0 and 1/q we arrived a contradiction. This proves that ecannot be a rational number. 2

Definition. The natural logarithm is defined by

lnx = log x = loge x.

Observe that differently than in high school, log x is with base e instead of 10.

It is not clear at this point why the base e is more important than any other base. It willbe transparent later when we will study derivatives, but even now the following result showsa nice and important inequality that is true for the natural logarithm.

Lemma 9.11.1

n+ 1< ln

(1 +

1

n

)<

1

nfor n = 1, 2, 3, . . ..

Proof. The inequality (1 +

1

n

)n< e <

(1 +

1

n

)n+1

implies

n ln

(1 +

1

n

)< 1 < (n+ 1) ln

(1 +

1

n

).

The left inequality gives

ln

(1 +

1

n

)<

1

n


and the right inequality gives1

n+ 1< ln

(1 +

1

n

).

2

Theorem 9.12. The sequence

an = 1 +1

2+

1

3+ . . .+

1

n− lnn

is convergent to a finite limit

γ := limn→∞

(1 +

1

2+

1

3+ . . .+

1

n− lnn

).

Remark. The limit γ = 0.5772156649 . . . is called the Euler constant. It is not know if γis rational or not. Good exercise for you!

Proof. We will prove that the sequence is decreasing. To this end it suffices to show thatan+1 − an < 0. We have

an+1 − an =

(1 +

1

2+ . . .+

1

n+

1

n+ 1

)− ln(n+ 1)−

(1 +

1

2+ . . .+

1

n

)+ lnn

=1

n+ 1− ln(n+ 1) + lnn

=1

n+ 1− ln

(n+ 1

n

)=

1

n+ 1− ln

(1 +

1

n

)< 0 ,

where the last inequality follows from the lemma. Therefore the sequence (an) is decreasing.Applying the lemma one more time we have

1 > ln(1 + 1),1

2> ln

(1 +

1

2

), . . . ,

1

n> ln

(1 +

1

n

),

and hence

an = 1 +1

2+ . . .+

1

n− lnn

> ln(1 + 1) + ln

(1 +

1

2

)+ . . .+ ln

(1 +

1

n

)− lnn

= ln 2 + ln3

2+ . . .+ ln

n+ 1

n− lnn

= ln

(2 · 3

2· 4

3· 5

4· · · n+ 1

n

)− lnn

= ln(n+ 1)− lnn > 0.

Thus the sequence is decreasing and bounded from below by 0. Hence it is convergent. 2

9.3. EXAMPLES 129

As a corollary we obtain another proof that

∞∑n=1

1

n= +∞.

Indeed, since the sequence of partial sums

sn = 1 +1

2+

1

3+ . . .+

1

n

is increasing it suffices to show that it is not convergent. Suppose it is convergent. Since thesequence in Theorem 9.12 is also convergent, the difference of two sequences i.e., the sequencesn − an = lnn is also convergent, but it is not, since limn→∞ lnn = +∞.

9.3 Examples

Example 9.13. Prove that the sequence n√n is decreasing starting from n = 3.

Proof. We have

n1/n > (n+ 1)1/(n+1) ≡ nn+1 > (n+ 1)n ≡ n >(n+ 1)n

nn=

(1 +

1

n

)n.

The last inequality is true for n ≥ 3, because n ≥ 3 > e > (1 + 1/n)n and hence the firstinequality is true for n ≥ 3 as equivalent. 2

Example 9.14. Find the following limits

(a) limn→∞

(n!

nne−n

)1/n

, (b) limn→∞

((n!)3

n3ne−n

)1/n

.

Solution.

(a). Let an =n!

nne−n. Then

an+1

an=

(n+ 1)!

(n+ 1)n+1e−(n+1)

nne−n

n!=

n!(n+ 1)

(n+ 1)(n+ 1)ne−ne−1nne−n

n!

=nne

(n+ 1)n=

e(1 + 1

n

)n → 1

and hence Theorem 8.31 gives

n√an =

(n!

nne−n

)1/n

→ 1.


(b). Let an =(n!)3

n3ne−n. Then

an+1

an=

((n+ 1)!

)3((n+ 1)n+1

)3e−(n+1)

n3ne−n

(n!)3=

(n!)3(n+ 1)3

(n+ 1)3(n+ 1)3ne−ne−1n3ne−n

(n!)3

=n3ne

(n+ 1)3n=

e((1 + 1

n

)n)3 → e

e3= e−2

and hence Theorem 8.31 gives

n√an =

((n!)3

n3ne−n

)1/n

→ e−2 .

2

Chapter 10

Subsequences, series and theCauchy condition

10.1 The Cauchy condition and the Bolzano-Weierstrass the-orem

The following result plays a fundamental role in analysis and toplogy.

Theorem 10.1 (Bolzano-Weierstrass). Every bounded sequence of real numbers has a con-vergent subsequence.

Proof. Let (xn) be a bounded sequence i.e.,

a1 ≤ xn ≤ b1

for some a1, b1 ∈ R and all n ∈ N. Divide the interval [a1, b1] into two subintervals of equallength: [

a1,a1 + b1

2

],

[a1 + b1

2, b1

].

Obviously infinitely many elements belong to one of the two intervals. Perhaps to both ofthem, but for sure to at least one of them. Choose such an interval and denote its endpointsby a2 and b2. Hence

[a2, b2] =

[a1,

a1 + b12

]or [a2, b2] =

[a1 + b1

2, b1

].

Now divide the interval [a2, b2] into two subintervals of equal length[a2,

a2 + b22

],

[a2 + b2

2, b2

].

Infinitely many elements of the sequence belong to at least one of the two intervals. Denotesuch an interval by [a3, b3] etc. We repeat this procedure infinitely many times.

131

132 CHAPTER 10. SUBSEQUENCES, SERIES AND THE CAUCHY CONDITION

In this construction we obtain an increasing sequence (an) and a decreasing sequence (bn):

a1 ≤ a2 ≤ a3 ≤ . . . ≤ b3 ≤ b2 ≤ b1.

We see that the sequence (an) is increasing and bounded from above, so it is convergent. Alsothe sequence (bn) is decreasing and bounded form below, so it is also convergent. Since

bn − an =b1 − a12n−1

→ 0

we conclude that both sequences converge to the same limit

limn→∞

an = limn→∞

bn = g.

Now we will show how to select a subsequence (xnk) such that

limk→∞

xnk= g.

Choose n1 = 1. We have a1 ≤ xn1 ≤ b1. Infinitely many xn’s satisfy the inequality a2 ≤xn ≤ b2, so there is n2 > n1 such that a2 ≤ xn2 ≤ b2. Since infinitely many xn’s satisfy theinequality a3 ≤ xn ≤ b3, there is n3 > n2 such that a3 ≤ xn3 ≤ b3 etc.

We constructed an increasing sequence of integers n1 < n2 < n3 < . . . such that thesubsequence (xnk

) satisfiesak ≤ xnk

≤ bk.

Since both the left and the right hand side converge to g we conclude that xnk→ g. 2

Definition 10.2. We say that (xn) is a Cauchy sequence if

∀ε > 0 ∃n0 ∀n,m ≥ n0 |xn − xm| < ε.

The above condition is called the Cauchy condition.

The next result states that a sequence is convergent if and only if it is a Cauchy sequence.This is a very important result and it is easy to understand why. If we want to prove that asequence (xn) is convergent using the definition we have to find a number g (limit) such that|xn− g| < ε for all sufficiently large n, so the first thing we have to do is to identify the limit.However, in many situations one can prove that a sequence is convergent, even if it is notpossible to find the limit. This can be achieved by checking that the sequence satisfies theCauchy condition – this condition does not involve the limit g and only requires an estimateof |xn − xm|.

Theorem 10.3. A sequence (xn) is convergent if and only if (xn) is a Cauchy sequence.

Proof. Since this is “if and only if” condition, we have to prove two implications.

(⇒) Suppose that limn→∞ xn = g ∈ R. Then, for every ε > 0 we can find n0 such that

|xn − g| <ε

2for n ≥ n0

10.2. THE UPPER AND THE LOWER LIMITS 133

and thus

|xn − xm| = |xn − g + g − xm| ≤ |xn − g|+ |xm − g| <ε

2+ε

2= ε

for all n,m ≥ n0.

(⇐) Suppose (xn) is a Cauchy sequence. First we will prove that (xn) is bounded. Thedefinition of the Cauchy sequence with ε = 1 gives that there is n0 such that

|xn − xm| < 1 for all n,m ≥ n0.

In particular we can take m = n0 and hence

|xn − xn0 | < 1 for all n ≥ n0,

so

|xn| < 1 + |xn0 | for all n ≥ n0.

Therefore

|xn| < 1 + |xn0 |+ |x1|+ |x2|+ . . .+ |xn0−1| for all n.

Since the sequence (xn) is bounded, the Bolzano-Weierstrass theorem implies that there is asubsequence (xnk

) convergent to a finite limit

xnk→ g ∈ R.

We will prove that limn→∞ xn = g. To this end let ε > 0 be given. We need to find N0 suchthat

|xn − g| < ε for all n ≥ N0.

Because (xn) is a Cauchy sequence, there is N1 such that

|xn − xm| <ε

2for all n,m ≥ N1.

Convergence of xnkimplies that there is nk0 ≥ N1 such that

|xnk0− g| < ε

2.

Set N0 = nk0 . Then for n ≥ N0 = nk0 ≥ N1 we have

|xn − g| ≤ |xn − xnk0|+ |xnk0

− g| < ε

2+ε

2= ε .

2

10.2 The upper and the lower limits

For any sequence (an) of real numbers we define

lim supn→∞

an = limn→∞

sup{an, an+1, an+2 . . .}, lim infn→∞

an = limn→∞

inf{an, an+1, an+2 . . .}.


First observe that if (an) is unbounded from above then there is a subsequence divergentto ∞ and lim supn→∞ an = ∞. Similarly, if (an) is unbounded from below, then there is asubsequence divergent to −∞ and lim infn→∞ an = −∞.

Moreover, it is not difficult to prove that

(10.1) limn→∞

an =∞ if and only if lim infn→∞

an = lim supn→∞

an = +∞

and

(10.2) limn→∞

an = −∞ if and only if lim infn→∞

an = lim supn→∞

an = −∞.

We leave details to the reader.

For the rest of this section we will focus on a more interesting case of bounded sequences.

Let (an) be a bounded sequence. Since it has a convergent subsequence, the set of possiblelimits of convergent subsequences of (an) is bounded and non-empty.

Theorem 10.4. Let (an) be a bounded sequence. Then there is a convergent subsequence of(an) such that its limit is greater than or equal to the limit of any other convergent subsequenceof (an). In other words supG ∈ G.

Proof. Let x = supG. It suffices to show that there is a subsequence convergent to x.

Let k ∈ N. Since x − 1/k < supG, there is a subsequence with the limit larger thanx − 1/k and, in particular, infinitely many an’s are larger than x − 1/k. Therefore, thereis n1 such that an1 > x − 1/1. Since infinitely many an’s are larger than x − 1/2, there isn2 > n1 such that an2 > x − 1/2. Since there are infinitely many an’s larger than x − 1/3,there is n3 > n2 such that an3 > x − 1/3 etc. We obtain a subsequence (ank

)k such thatank

> x − 1/k. The sequence (ank)k is bounded and so it has a convergent subsequence

(ank`)`. Clearly lim`∞ ank`

≤ x. Note that k` ≥ ` so

ank`≥ x− 1

k`≥ x− 1

`≥ x− 1

iprovided ` ≥ i

and hence

x ≥ lim`→∞

ank`≥ x− 1

ifor all i ∈ N

proving that lim`→∞ ank`= x.

Note that the sequencebn = sup{an, an+1, an+2, . . .}

is bounded and decreasing1 so it is convergent.

Theorem 10.5. Let (an) be a bounded sequence. Then

limn→∞

sup{an, an+1, an+2, . . .} = supG.

1Increasing n we, decrease the set over which we take supremum and hence supremum is decreasing.

10.2. THE UPPER AND THE LOWER LIMITS 135

Proof. Let (ank) be a subsequence convergent to supG, the existence of which is guaranteed

by Theorem 10.4. Fix i ∈ N. For each n ∈ N, there is nk > n such that ank> supG − 1/i

and hence for each n

bn = sup{an, an + 1, an+2 . . .} ≥ ank> supG− 1

i

proving that limn→∞ bn ≥ supG − 1/i so limn→∞ bn ≥ supG. It remains to showthat limn→∞ bn ≤ supG. Suppose to the contrary that limn→∞ bn > supG. Thenlimn→∞ bn > supG + ε for some ε > 0. Since the sequence bn is decreasing, for eachn, sup{an, an+1, an+2, . . .} > supG + ε and so for each n we can find k ≥ n such thatak > supG + ε. It easily follows that there is a subsequence (ank

), ank> supG + ε and so

(ank) has a subsequence convergent to a limit that is greater than or equal to supG+ε which

contradicts the definition of the set G.

Similarly we prove

Theorem 10.6. Let (an) be a bounded sequence. Then there is a convergent subsequence of(an) such that its limit is less than or equal to the limit of any other convergent subsequenceof (an). In other words, inf G ∈ G.

Theorem 10.7. Let (an) be a bounded sequence. Then

limn→∞

inf{an, an+1, an+2 . . .} = inf G.

It follows that is lim supn→∞ an and lim infn→∞ an are the lowest and the largest possiblelimits among all subsequences of (an).

Finally we have

Theorem 10.8. For any sequence (an) of real numbers we have

limn→∞

an = g if and only if lim infn→∞

an = lim supn→∞

an = g.

Proof. The case g = ±∞ was discussed in (10.1) and (10.2) and left to the reader as anexercise. Thus we assume that g ∈ R.

If an → g, then all the subsequences converge to g so G = {g}, inf G = supG = g andclearly

(10.3) lim infn→∞

an = inf G = lim supn→∞

an = supG = g.

That proves implication from left to right. To prove the implication from right to left assumethat (10.3) is satisfied. Since g ∈ R, the sequence (an) is bounded as otherwise it would havea subsequence divergent to −∞ when not bounded from below (so lim infn→∞ an = −∞) ora subsequence divergent to +∞ when not bounded from above (so lim supn→∞ an = +∞).

Since (an) is bounded, every subsequence of (an) has a convergent subsequence. SincesupG = inf G = {g} it follows that G = {g} so all such convergent subsequences mustconverge to g and hence limn→∞ an = g by Theorem 8.45.


10.3 Series

As we know, the series a1 + a1 + a3 + . . . can be identified with the sequence of partial sumssn = a1 + a2 + . . . + an. Therefore the characterization that a sequence is convergent if andonly if it is a Cauchy sequence has a direct reformulation for series. Recall that (sn) is Cauchyif and only if

∀ε > 0 ∃n0 ∀k, ` ≥ n0 |sk − s`| < ε.

It is easy to see that this condition is equivalent to

∀ε > 0 ∃n0 ∀k > ` ≥ n0 − 1 |sk − s`| < ε.

If we write` = n− 1, n ≥ n0, k = n+m, m ≥ 0,

then the condition becomes

∀ε > 0 ∃n0 ∀n ≥ n0 ∀m ≥ 0 | an + an+1 + . . .+ an+1︸︷︷︸sn+m−sn−1

| < ε.

Therefore we have

Theorem 10.9. The series a1 + a2 + a3 + . . . is convergent (to a finite sum) if and only if

∀ε > 0 ∃n0 ∀n ≥ n0 ∀m ≥ 0 |an + an+1 + . . .+ an+m| < ε.

The condition formulated in the above theorem is called the Cauchy condition for a series.

Theorem 10.10. If an ≥ 0, n = 1, 2, 3, . . ., then∑∞

n=1 an converges if and only if thesequence of partial sums is bounded.

Proof. It is easy to see that the condition an ≥ 0 is equivalent to the condition that thesequence of partial sums is increasing, and an increasing sequence is convergent if and only ifit is bounded. 2

The next test for convergence of a series makes use of the Cauchy condition.

Theorem 10.11 (Comparison test).

(a) Suppose that there is N such that |an| ≤ bn for n ≥ N and∑∞

n=1 bn converges. Thenthe series

∑∞n=1 an converges, too.

(b) Suppose that there is N such that an ≥ bn ≥ 0 for all n ≥ N and∑∞

n=1 bn diverges (to∞, of course), then the series

∑∞n=1 an diverges, too (to ∞, of course).

Proof. Part (b) is quite obvious, so we will only prove part (a). Since the series∑∞

n=1 bnconverges, it satisfies the Cauchy condition

∀ε > 0 ∃n0 ∀n ≥ n0 ∀m ≥ 0 bn + bn+1 + . . . bn+m < ε.

10.3. SERIES 137

We did not put absolute value because bk ≥ 0. Hence for n ≥ max{n0, N} and m ≥ 0

|an + an+1 + . . .+ an+m| ≤ bn + bn+1 + . . .+ bn+m < ε,

so the series∑∞

n=1 an satisfies the Cauchy condition and thus it is convergent. 2

Definition. We say that a series∑∞

n=1 an is absolutely convergent if∑∞

n=1 |an| is convergent.

Theorem 10.12. If∑∞

n=1 an is absolutely convergent, then∑∞

n=1 an is convergent.

Proof. Observe that |an| ≤ |an| := bn. Since the series∑∞

n=1 bn =∑∞

n=1 |an| is convergent,the Comparison Test implies convergence of

∑∞n=1 an. 2

Theorem 10.13 (Cauchy Condensation Test). Suppose a1 ≥ a2 ≥ a3 ≥ . . . ≥ 0. Then∑∞n=1 an converges if and only if

∑∞n=0 2na2n converges.

Proof. Denote

sn = a1 + a2 + . . .+ an, tk = a1 + 2a2 + 4a4 + . . .+ 2ka2k .

For n < 2k we have

sn = a1 + . . .+ an ≤ a1 + (a2 + a3) + (a4 + a5 + a6 + a7) + . . .+ (a2k + . . .+ a2k+1−1)

≤ a1 + 2a2 + 4a4 + . . .+ 2ka2k = tk.

Now if∑∞

k=0 2ka2k converges, the partial sums tk are bounded, so are the partial sums sn,and since an ≥ 0 that implies convergence of the series

∑∞n=1 an.

For n > 2k we have

sn = a1 + a2 + . . .+ an

≥ a1 + a2 + (a3 + a4) + (a5 + a6 + a7 + a8) + . . .+ (a2k−1+1 + . . .+ a2k)

≥ 1

2a1 + a2 + 2a4 + 4a8 + . . .+ 2k−1a2k =

1

2tk .

Now if∑∞

n=1 an converges, the partial sums sn are bounded, so are the partial sums tk andhence the series

∑∞k=0 2ka2k converges. 2

Theorem 10.14 (p-Series Test).

∞∑n=1

1

npconverges for p > 1 and diverges for 0 < p ≤ 1.

Remark 10.15. The divergent series∑∞

n=11n =∞ is called the harmonic series.

Proof. For 0 < p ≤ 1, 1/np ≥ 1/n, so the divergence of the series∑∞

n=1 1/np follows from∑∞n=1 1/n = +∞. To prove convergence for p > 1 we apply the Cauchy condensation test.

Let an = 1/np. Then

2na2n = 2n1(

2n)p =

1(2p−1

)n ,


∞∑n=0

2na2n =∞∑n=0

(1

2p−1

)nconverges as a geometric series with |1/2p−1| < 1. Hence

∑∞n=1 an converges, too. 2

Remark. One can prove that∞∑n=1

1

n2=π2

6.

Theorem 10.16.∞∑n=2

1

n(log n)pconverges for p > 1 and diverges for 0 < p ≤ 1.

Proof. Let an = 1/(n(log n)p). Then

2na2n = 2n1

2n(log 2n)p=

(1

log 2

)p 1

np.

Hence the previous result yields that

∞∑n=1

2na2n =

(1

log 2

)p ∞∑n=2

1

np

converges if and only if p > 1 and the theorem follows from the Cauchy condensation test. 2

Theorem 10.17 (Ratio Test (d’Alambert Test)).

(a) If limn→∞

∣∣∣∣an+1

an

∣∣∣∣ < 1, then the series∑∞

n=1 an converges absolutely.

(b) If limn→∞

∣∣∣∣an+1

an

∣∣∣∣ > 1, then∑∞

n=1 an diverges.

Theorem 10.18 (Root Test (Cauchy Test)).

(a) If limn→∞n√|an| < 1, then the series

∑∞n=1 an converges absolutely.

(b) If limn→∞n√|an| > 1, then

∑∞n=1 an diverges.

Remark. If

limn→∞

∣∣∣∣an+1

an

∣∣∣∣ = 1 or limn→∞

n√|an| = 1,

then we cannot conclude convergence or divergence of the series. For example, if an = 1/n,then the above limits are equal 1 and the series

∑∞n=1 an diverges. If an = 1/n2, then still

the above limits are equal 1, but this time the series∑∞

n=1 an converges.

We will prove the d’Alambert test only; the proof for the Cauchy test is similar and leftas an exercise.

10.3. SERIES 139

Proof. If limn→∞ |an+1/an| < 1, then there is 0 < q < 1 and n0 such that∣∣∣∣an+1

an

∣∣∣∣ < q for n ≥ n0.

For n ≥ n0 we have

|an+1| < q|an| < q2|an−1| < . . . < qn+1−n0 |an0 | ,

|an+1| <(q−n0 |an0 |

)qn+1

Replacing n+ 1 by n in this formula we have

|an| <(q−n0 |an0 |

)qn for n > n0.

Since the series∞∑n=1

(q−n0 |an0 |

)qn =

(q−n0 |an0 |

) ∞∑n=1

qn

converges, the series∑∞

n=1 an converges absolutely by the comparison test.

If limn→∞ |an+1/an| > 1, there are n0 and q > 1 such that |an+1/an| > q for n ≥ n0 andit easily follows that an does not converge to zero. Hence the series

∑∞n=1 an diverges (see

Theorem 8.46). 2

Example 10.19. For every x ∈ R the series∑∞

n=0 xn/n! converges absolutely. It is obvious

if x = 0, so we can assume that x 6= 0. If an = xn/n!, then |an+1/an| = |x|/(n + 1) → 0, sothe absolute convergence follows from the d’Alambert test.

Example 10.20. Investigate convergence of the series∞∑n=1

(n

n+ 1

)(n+1)n

.

Solution. Let an =

(n

n+ 1

)(n+1)n

. Then

n√an =

(n

n+ 1

)n+1

=1(

n+1n

)n+1 =1(

1 + 1n

)n 1

1 + 1n

→ 1

e< 1

and hence the series converges. 2

Theorem 10.21. Assume that an > 0, bn > 0 and

an+1

an≤ bn+1

bnfor all n ≥ n0.

If the series∑∞

n=1 bn converges, then the series∑∞

n=1 an converges, too.

Remark 10.22. If limn→∞ bn+1/bn < 1, then convergence of the series∑∞

n=1 an follows fromthe d’Alambert test. However, if

limn→∞

an+1

an= lim

n→∞

bn+1

bn= 1

and we know that the series∑∞

n=1 bn converges, we still can conclude convergence of∑∞

n=1 aneven if d’Alambert’s test does not apply. We will see examples after we prove the theorem.


Proof. Let cn = an/bn. Then

cn+1 =an+1

bn+1≤ anbn

= cn for n ≥ n0,

so cn is decreasing starting from n = n0. Hence cn is bounded, say cn ≤M for all n. Therefore

an = cnbn ≤Mbn

and convergence of the series∑∞

n=1Mbn = M∑∞

n=1 bn implies convergence of∑∞

n=1 an.

Now we will show two applications of the above result.


nn−2

enn!.

Solution. Let an =nn−2

enn!. Then

an+1

an=

(n+ 1)n−1

en+1(n+ 1)!

enn!

nn−2=

(n+ 1)n−2(n+ 1)enn!

e en(n+ 1)n!nn−2

=

(1 + 1

n

)n−2e

=

(1 + 1

n

)ne︸︷︷︸<1

(1 +

1

n

)−2

<

(n

n+ 1

)2

=

1(n+1)2

1n2

.

Hencean+1

an<

1(n+1)2

1n2

.

Since the series∑∞

n=1 1/n2 converges, the series∑∞

n=1 an converges, too. 2


nn

enn!.

Solution. Let an =nn

enn!. Then

an+1

an=

(n+ 1)n+1

en+1(n+ 1)!

enn!

nn=

(n+ 1)n(n+ 1)enn!

enen!(n+ 1)nn

=

(1 + 1

n

)ne

>

(1 + 1

n

)n(1 + 1

n

)n+1 =1

1 + 1n

=n

n+ 1=

1n+11n

.

10.4. ALTERNATING SERIES 141

Hence1

n+11n

≤ an+1

an.

Suppose that the series∑∞

n=1 an converges. The the theorem would give convergence of theseries

∑∞n=1 1/n which is a contradiction. Therefore

∑∞n=1 an diverges. 2

10.4 Alternating series

Theorem 10.25. Suppose that

(a) |c1| ≥ |c2| ≥ |c3| ≥ . . ., limn→∞ cn = 0.

(b) c1 ≥ 0, c2 ≤ 0, c3 ≥ 0, c4 ≤ 0,. . .

Then the series∑∞

n=1 cn is convergent.

Remark 10.26. A series of the form described in the above theorem is called an alternatingseries, because cn changes sign.

Proof. Observe that

c2n+1 + c2n+2 ≥ 0, because c2n+1 ≥ 0 and |c2n+2| ≤ |c2n+1|

andc2n + c2n+1 ≤ 0, because c2n ≤ 0 and |c2n+1| ≤ |c2n|.

Hence the sequence of partial sums sn = a1 + a2 + . . .+ an satisfies

s2n ≤ s2n + (c2n+1 + c2n+2) = s2n+2, s2n−1 ≥ s2n−1 + (c2n + c2n+1) = s2n+2.

Thus s2 ≤ s4 ≤ s6 ≤ . . . and s1 ≥ s3 ≥ s5 ≥ . . . Since

s2n ≤ s2n + c2n+1︸︷︷︸≥0

= s2n+1

we haves2 ≤ s4 ≤ s6 ≤ s8 ≤ . . . ≤ s9 ≤ s7 ≤ s5 ≤ s3 ≤ s1.

Thus the sequence (s2n)∞n=1 is convergent as increasing and bounded and (s2n+1)∞n=0 is con-

vergent as decreasing and bounded. Since

|s2n+1 − s2n| = |c2n+1| → 0

we conclude thatlimn→∞

s2n = limn→∞

s2n+1

and hence the sequence (sn) is also convergent (Proposition 8.42). This however, meansconvergence of the series

∑∞n=1 cn.


As an immediate application of the result wee see that the series∑∞

n=1(−1)n+1/n isconvergent, but not absolutely convergent. We will find the sum of this series.

Theorem 10.27. 1− 1

2+

1

3− 1

4+ . . . = ln 2 .

Proof. We need the following lemma.

Lemma 10.28.

limn→∞

(1

n+

1

n+ 1+ . . .+

1

2n

)= ln 2 .

Proof. Recall the inequality

(10.4)1

n+ 1< ln

(1 +

1

n

)<

1

n.

The left inequality yields

1

n+

1

n+ 1+ . . .+

1

2n< ln

(1 +

1

n− 1

)+ ln

(1 +

1

n

)+ . . .+ ln

(1 +

1

2n− 1

)= ln

(n

n− 1· n+ 1

n· n+ 2

n+ 1· · · 2n

2n− 1

)= ln

2n

n− 1.

Similarly the right inequality of (10.4) gives

ln2n+ 1

n<

1

n+

1

n+ 1+ . . .+

1

2n.

Hence

(10.5) ln2n+ 1

n<

1

n+

1

n+ 1+ . . .+

1

2n< ln

2n

n− 1

It suffices to prove that2

limn→∞

ln2n+ 1

n= lim

n→∞ln

2n

n− 1= ln 2.

We have1

2n+ 1< ln

(1 +

1

2n

)<

1

2n

and hence

limn→∞

ln

(1 +

1

2n

)= 0.

Thus

limn→∞

ln2n+ 1

n= lim

n→∞

(ln 2 + ln

(1 +

1

2n

))= ln 2.

2This follows immediately from the continuity of the function lnx, but we do not want to refer to continuousfunctions at this point of the game.

10.5. MULTIPLICATION OF SERIES 143

Similarly we prove that limn→∞ ln 2n/(n− 1) = ln 2. 2

Now we can prove the theorem. We know that the series converges (alternating series),sn → g, so s2n → g. We have

s2n = 1− 1

2+

1

3− . . .− 1

2n

= 1 +1

2+

1

3+

1

4+ . . .+

1

2n− 2

(1

2+

1

4+ . . .+

1

2n

)= 1 +

1

2+

1

3+ . . .+

1

2n−(

1 +1

2+

1

3+ . . .+

1

n

)=

1

n+ 1+

1

n+ 2+ . . .+

1

2n

=

(1

n+

1

n+ 1+ . . .+

1

2n

)− 1

n→ ln 2

and hence g = ln 2. 2

10.5 Multiplication of series

Formally we would like to multiply two series as follows

(a1 + a2 + a3 + . . .)(b1 + b2 + b3 + . . .) = a1b1 + (a1b2 + a2b1) + (a1b3 + a2b2 + a3b1) + . . .

In the first group a1b1 we collect all terms with indices that add up to 2. In the second groupa1b2 + a2b1 we collect terms with indices that add up to 3. Then terms with indices that addup to 4 and so on. Since we deal with infinite sums we have to rigorously investigate whenthe above formula is correct. We have

Theorem 10.29 (Cauchy Multiplication Formula). If the series∑∞

n=1 an converges abso-lutely and the series

∑∞n=1 bn converges, then( ∞∑

n=1

an

)( ∞∑n=1

bn

)=∞∑n=1

cn ,

wherec1 = a1b1

c2 = a1b2 + a2b1

· · ·

cn = a1bn + a2bn−1 + . . .+ anb1

· · ·

Proof. Let

sn = a1 + a2 + . . .+ an, tn = b1 + b2 + . . .+ bn, un = c1 + c2 + . . .+ cn.


It suffices to prove thatlimn→∞

sntn − un = 0.

Indeed, this will readily yield

∞∑n=1

cn = limn→∞

un = limn→∞

sntn = limn→∞

sn · limn→∞

tn =

( ∞∑n=1

an

)( ∞∑n=1

bn

).

Observe that

un =∑

i+j≤n+1

aibj = a1(b1 + . . .+ bn) + a2(b1 + . . .+ bn−1) + . . .+ anb1

= a1tn + a2tn−1 + . . .+ ant1.


n=1 bn and∑∞

n=1 |an| converge, the sequences of partial sums are boundedi.e., there is M > 0 such that

|tn| ≤M, |a1|+ . . .+ |an| ≤M for all n.

In particular |tn − tm| ≤ 2M for all n and m. Moreover it follows from the Cauchy conditionthat given ε > 0 there is N such that for n,m ≥ N

|tn − tm| <ε

3M, |aN+1|+ . . .+ |an| <

ε

3M.

Now for n > 2N we have

|sntn − un| =∣∣ (a1tn + a2tn + . . .+ antn)︸︷︷︸

sntn

− (a1tn + a2tn−1 + . . .+ ant1)︸︷︷︸un

∣∣≤

(|a1| |tn − tn|+ |a2| |tn − tn−1|+ . . .+ |aN | |tn − tn−N+1|

)+

(|aN+1| |tn − tn−N |+ . . .+ |an| |tn − t1|

)≤

(|a1|+ . . .+ |aN |

) ε

3M+(|aN+1|+ . . .+ |an|

)2M

≤ M · ε

3M+

ε

3M· 2M = ε

and hence limn→∞ sntn − un = 0. We used here the fact that

|tn − tn|, |tn − tn−1|, . . . , |tn − tn−N+1| <ε

3M

which is true, becausen, n− 1, . . . , n−N + 1 ≥ N.

2

Exercise 10.30. Use the Cauchy multiplication formula to find the sum of the series

∞∑n=1

nxn−1, |x| < 1.

10.6. CHANGING THE ORDER IN A SERIES 145

Solution. The series∑∞

n=0 xn converges absolutely for |x| < 1. Hence(

1

1− x

)2

=

( ∞∑n=0

xn

)( ∞∑n=0

xn

)= (1 + x+ x2 + . . .)(1 + x+ x2 + . . .)

= 1 + (1 · x+ x · 1) + (1 · x2 + x · x+ x2 · 1) + (1 · x3 + x · x2 + x2 · x+ x3 · 1) + . . .

+ (1 · xn + x · xn−1 + x2 · xn−2 + . . .+ xn · 1) + . . .

= 1 + 2x+ 3x2 + 4x3 + . . .+ (n+ 1)xn + . . .

=∞∑n=0

(n+ 1)xn =∞∑n=1

nxn−1 .

Thus∞∑n=1

nxn−1 =

(1

1− x

)2

.

2

Example 10.31. As an application of the Cauchy multiplication formula we will prove that( ∞∑n=0

xn

n!

)( ∞∑n=0

yn

n!

)=∞∑n=0

(x+ y)n

n!.

It follows from the d’Alambert test that both series∞∑n=0

xn

n!and

∞∑n=0

yn

n!

converge absolutely. Hence the Cauchy formula gives( ∞∑n=0

xn

n!

)( ∞∑n=0

yn

n!

)

=

∞∑n=0

(x0

0!· y

n

n!+x1

1!· yn−1

(n− 1)!+x2

2!· yn−2

(n− 2)!+ . . .+

xn

n!· y

0

0!

)

=∞∑n=0

1

n!

(x0yn +

n

1!xyn−1 +

n(n− 1)

2!x2yn−2 + . . .+

n(n− 1) · · · 1n!

xny0︸︷︷︸(x+y)n

)

=∞∑n=0

(x+ y)n

n!.

2

10.6 Changing the order in a series

Let∑∞

n=1 an be a series and let φ : N → N be a bijection. Then a series∑∞

n=1 aφ(n) isobtained from

∑∞n=1 an by rearrangement of the elements: we add exactly the same numbers,

but in a different order.


Theorem 10.32. If∑∞

n=1 an converges absolutely and φ : N → N us a bijection, then∑∞n=1 aφ(n) converges and

∞∑n=1

aφ(n) =∞∑n=1

an.

Proof. Suppose that∑∞

n=1 an converges absolutely and let φ : N → N be a bijection. Thenfrom the Cauchy condition for series we have

(10.6) ∀ε > 0 ∃n0 ∀m |an0 |+ |an0+1 + . . .+ |an0+m| < ε.

Denote partial sums of∑∞

n=1 an and∑∞

n=1 aφ(n) by sn and tn respectively. Choose p so largethat

{1, 2, . . . , n0 − 1} ⊂ {φ(1), φ(2), . . . , φ(p)}.

If n > p, then the numbers a1, a2, . . . , an0−1 will cancel out in the difference of partial sums

sn−tn = (a1+a2+. . .+an0−1+an0 +. . .+an)−(aφ(1)+aφ(2)+. . .+aφ(p)+aφ(p+1)+. . .+aφ(n)).

The remaining terms will be ai’s with i ≥ n0 and signs + or −. The + sign will be associatedwith terms in the partial sum sn and the − sign will be associated with the terms in thepartial sum tn. No index i will be repeated twice as elements with the same index will showonce with sign + and once with sign − so they will cancel out. Therefore there is m suchthat3

|sn − tn| ≤ |an0 |+ |an0+1|+ . . .+ |an0+m| < ε,

where the last inequality follows from (10.6). We prove that for every ε > 0 there is p suchthat for n > p, |sn − tn| < ε. That clearly implies that tn converges to the same limit as sn,i.e.,

∞∑n=1

aφ(n) = limn→∞

tn = limn→∞

sn =∞∑n=1

an.

The above result is not true without the assumption of the absolute convergence of∑∞n=1 an.

Example 10.33. Consider the series

1− 1

2+

1

3− 1

4+

1

5− . . .

3For example if n0 = 3, p = 4 and n = 6 we could have

sn − tn = (a1 + a2︸︷︷︸n0−1

+a3 + a4 + a5 + a6)− (a7 + a2 + a5 + a1︸︷︷︸p

+a10 + a8)

so

|sn − tn| ≤ |a3|+ |a4|+ |a6|+ |a7|+ |a8|+ |a10| ≤ |a3|+ |a4|+ |a5|+ |a6|+ |a7|+ |a8|+ |a9|+ |a10|.

10.6. CHANGING THE ORDER IN A SERIES 147

The series is convergent conditionally, but not absolutely. It follows that

5

6= 1− 1

2+

1

3= s3 > s5 > s7 > . . . > 1− 1

2+

1

3− 1

4+

1

5− . . .

Let us now change the order of elements as follows

(10.7) 1 +1

3− 1

2︸︷︷︸>0

+1

5+

1

7− 1

4︸︷︷︸>0

+1

9+

1

11− 1

6︸︷︷︸>0

+ . . .

The sum in each of the groups is positive, because of the elementary estimate

1

4k − 3+

1

4k − 1− 1

2k> 0.

Hence the partial sums of the series (10.7) satisfy t3 < t6 < t9 < . . . Therefore the seriescannot converge4 to

1− 1

2+

1

3− 1

4+

1

5− . . . < t3 =

5

6.

This example is apart of a much stronger result

Theorem 10.34 (Riemann). If a series∑∞

n=1 an converges conditionally, but not absolutely,then for every g ∈ R there is a bijection φ : N→ N such that

∞∑n=1

aφ(n) = g.

Proof. We will prove the theorem in the case g ∈ R leaving the cases g = ±∞ to the reader.We will actually only sketch the proof. The sketch will show a clear idea, while writing theproof with all necessary details would hide the intuition.

Denote non-negative and negative terms in the series by bn and cn. For example in theseries

1

2+ 0 +

1

5+ 6− 2 + 0− 1

3+ 1− 1

4+ . . .

we have∞∑n=1

bn =1

2+ 0 +

1

5+ 6 + 0 + 1 + . . .

∞∑n=1

cn = −2− 1

3− 1

4− . . .

Since the series is convergent, an → 0, and hence bn → 0 and cn → 0. Since∑∞

n=1 |an| =∞,it follows that

∑∞n=1 bn =∞ and

∑∞n=1 cn = −∞. Indeed, if both sums were finite, then

k∑n=1

|an| ≤∞∑n=1

|bn|+∞∑n=1

|cn| <∞ so

∞∑n=1

|an| ≤∞∑n=1

|bn|+∞∑n=1

|cn| <∞

which is a contradiction. Similarly, if∑∞

n=1 bn ∈ R and∑∞

n=1 cn = −∞, then∑∞

n=1 an = −∞and if

∑∞n=1 bn =∞ and

∑∞n=1 cn ∈ R, then

∑∞n=1 an =∞ leading to a contradiction in each

of the cases.4One can show that the series converges.


Take a smallest index k1 such that

B1 = b1 + . . .+ bk1 > g.

Then the smallest m1 such that

B1 + C1 = B1 + c1 + . . .+ cm1 < g.

Then the smallest index k2 > k1 such that

B1 + C1 +B2 = B1 + C1 + bk1+1 + . . .+ bk2 > g

and smallest index m2 > m1 such that

B1 + C1 +B2 + C2 = B1 + C1 +B2 + cm1+1 + . . . cm2 > g.

We continue this procedure forever. The series

(10.8) b1 + . . .+ bk1 + c1 + . . .+ cm1 + bk1+1 + . . .+ bk2 + cm1+1 + . . .+ cm2 + . . .

converges to g. Indeed, it is increasing (part B1), then decreasing (part C1), then increasing(part B2) and so on. Each time it changes from increasing to decreasing and from decreasingto increasing once the partial sum passes through g. But since bn → 0 and cn → 0, the partialsum can cross g by distances that converge to 0 and hence the series converges to g.

Now it remains to observe that the series (10.8) is obtained from the series∑∞

n=1 an bychanging the order in which we add terms. This change of the order is clearly given by acertain bijection φ : N→ N.

10.7 Examples

Exercise 10.35. Investigate convergence of the series

∞∑n=1

1

nln

(1 +

1

n

).

Solution. The inequality

ln

(1 +

1

n

)<

1

n

yields

0 <1

nln

(1 +

1

n

)<

1

n2.


n=1 converges, the comparison test implies convergence of the series

∞∑n=1

1

nln

(1 +

1

n

).

2

10.7. EXAMPLES 149

Exercise 10.36. Prove that the series∞∑n=1

(n√n− 1

)nconverges.

Proof. Let an =(

n√n− 1

)n. Since n

√|an| = n

√n− 1→ 0 < 1, the series converges by the

Cauchy test. 2

Exercise 10.37. Prove that if limn→∞ nan = g > 0, then the series∑∞

n=1 an diverges.

Proof. Since nan → g > 0, there is n0 such that for n ≥ n0, nan > g/2, an > g/(2n).Divergence of the series

∞∑n=1

g

2n=g

2

∞∑n=1

1

n=∞

implies divergence of the series∑∞

n=1 an (comparison test). 2

Exercise 10.38. Find a convergent series∑∞

n=1 an, an > 0 such that nan does not convergeto 0.

Solution. The series∑∞

n=1 1/n2 converges, but nan = 1/n → 0, so this is not a goodexample. However, we can make a trick. Consider the series

a1 + a2 + a3 + . . . =1

12+ 0 + 0 +

1

22+ 0 + 0 + 0 + 0 +

1

32+ 0 + 0 + 0 + 0 + 0 + 0 +

1

42+ . . . ,

i.e.,

an2 =1

n2, ak = 0 for k 6= 12, 22, 32, . . .

This series clearly converges. Suppose that nan → 0. Then the subsequence n2an2 alsoconverges to zero, so

n2an2 = n2 · 1

n2= 1→ 0

which is a contradiction. Hence nan does not converge to 0. 2

Exercise 10.39. Investigate convergence of the series

∞∑n=2

1

(lnn)lnn.

Solution. We have1

(lnn)lnn=

1

nln(lnn)<

1

nefor n > ee

e.


n=1 1/ne converges, the original series converges, too (comparison test). 2

Chapter 11

Limits and continuity

11.1 Limits of functions

For simplicity suppose that f : R → R is a function defined on the set of all real numbers.Let x0 ∈ R. In order to investigate behavior of the function near x0 we introduce the notionof limit.

We say that f has limit g at x0 if

∀ε > 0 ∃δ > 0 ∀x ∈ R 0 < |x− x0| < δ ⇒ |f(x)− g| < ε.

We denote it bylimx→x0

f(x) = g

orf(x)→ g as x→ x0.

This definition is similar to the definition of the limit of a sequence. Geometrically speaking itasserts that if x approaches to x0 through vales different than x0, then values of the functionapproach to g. It is very important to require that x 6= x0, because in many interestinginstances the limit g will be different than the value of f at x0.

Example 11.1. limx→x0

x2 = x20 .

Proof. Given ε > 0 we need to find δ > 0 such that

0 < |x− x0| < δ ⇒ |x2 − x20| < ε.

First we have to do analysis in which we find an appropriate candidate for δ. Inequality|x− x0| < δ implies |x| < |x0|+ δ, so if 0 < δ ≤ 1, then

|x+ x0| ≤ |x|+ |x0| < 2|x0|+ 1.

Hence|x2 − x20| = |x− x0| |x+ x0| ≤ (2|x0|+ 1)|x− x0|.

151

152 CHAPTER 11. LIMITS AND CONTINUITY

We want the right hand side to be less than ε, so we need

|x− x0| <ε

2|x0|+ 1.

Thus it seems that

(11.1) δ = min

{1,

ε

2|x0|+ 1

}would be a good choice. Let’s check if we can prove the claim with this choice of δ.

Let ε > 0 be given. Let δ be given by (11.1). Then

|x+ x0| ≤ |x|+ |x0| ≤ 2|x0|+ 1 (becase 0 < δ ≤ 1).

Hence

|x2 − x20| = |x− x0| |x+ x0| ≤ (2|x0|+ 1)|x− x0|

< (2|x0|+ 1)ε

2|x0|+ 1= ε.


In particular limx→0 x2 = 0. Consider now a function

f(x) =

{x2 if x 6= 0,1 if x = 0.

The proof given above yields limx→0 f(x) = 0. Indeed, for x 6= 0, f(x) = x2 and when westudy the limit at 0, we investigate values of f at x 6= 0 only, so no matter what is the valueof f at 0, the limit will always be the same. In our example

limx→0

f(x) = 0 6= f(0).

If we would change the condition 0 < |x− 0| < δ to |x− 0| < δ in the definition of the limit(at x0 = 0), the limit of f at 0 would not exist. This is why we have to require x 6= x0 whenwe study the limit at x0.

The assumption that f is defined on R is too restrictive. In order to define limit at x0 itsuffices to require that there are points at which f is defined that are arbitrarily close to x0,but different than x0. This leads to the following definition.

Definition. Let A ⊂ R. We say that x0 ∈ R is a cluster point of A if

∀δ > 0 ∃x ∈ A 0 < |x− x0| < δ.

Observe that the cluster point does not necessarily belong to A.

Example 11.2.

(a) 0 is a cluster point of A = (0, 1), but 0 6∈ A.

11.1. LIMITS OF FUNCTIONS 153

(b) 0 is a cluster point of A = {1/n : n ∈ N}.

(c) Any real number is a cluster point of A = Q.

(d) A = N has no cluster points.

Theorem 11.3. x0 ∈ R is a cluster point of A ⊂ R if and only if there is a sequence (xn),xn ∈ A such that xn → x0 and xn 6= x0 for all n.

Proof. (⇒) If x0 is a cluster point of A, then for ε = 1/n, n = 1, 2, 3, . . . there is xn ∈ Asuch that 0 < |xn − x0| < 1/n and hence xn → x0.

(⇐) Suppose that there is a sequence xn ∈ A, xn 6= x0, xn → A. Then for any ε > 0 there isn such that |xn − x0| < ε and hence 0 < |xn − x0| < ε, because xn 6= x0, so x0 is a clusterpoint of A. 2

Now we can provide a general definition of the limit.

Definition. Let f : A → R, A ⊂ R be a given function and let x0 ∈ R be a cluster point ofA. We say that g ∈ R is a limit of f at x0 if

∀ε > 0 ∃δ > 0 ∀x ∈ A 0 < |x− x0| < δ ⇒ |f(x)− g| < ε.

As before, we writelimx→x0

f(x) = g

orf(x)→ g as x→ x0.

If f has no (finite) limit at x0 we say that f diverges at x0.

Similarly as in the case of sequences one can prove

Theorem 11.4. If f : A → R and x0 ∈ R is a cluster point of A, then f can have at mostone limit at x0.

Exercise 11.5. Let A =

{1

n: n ∈ N

}and let f : A→ R be defined by

f

(1

n

)= 1 +

1

22+ . . .+

1

n2.

Find limx→0 f(x).

Solution. We will prove that

limx→0

f(x) =∞∑k=1

1

k2.

To this end we have to show that

∀ε > 0 ∃δ > 0 ∀x ∈ A 0 < |x− 0| < δ ⇒

∣∣∣∣∣f(x)−∞∑k=1

1

k2

∣∣∣∣∣ < ε.


Let ε > 0 be given. We know that(1 +

1

22+

1

32+ . . .+

1

n2

)→

∞∑k=1

1

k2as n→∞.

Thus there is n0 such that∣∣∣∣∣1 +1

22+ . . .+

1

n2−∞∑k=1

1

k2

∣∣∣∣∣ < ε for all n ≥ n0.

Let δ > 0 be such that 0 < δ < 1/n0 and let x ∈ A = {1/n : n ∈ N} be such that0 < |x− 0| < δ. Then x| < 1/n0 and hence x = 1/n for some n > n0, so∣∣∣∣∣f(x)−

∞∑k=1

1

k2

∣∣∣∣∣ =

∣∣∣∣∣f(

1

n

)−∞∑k=1

1

k2

∣∣∣∣∣=

∣∣∣∣∣(

1 +1

22+ . . .+

1

n2

)−∞∑k=1

1

k2

∣∣∣∣∣ < ε,

because n > n0. 2

Remark. One can prove an amazing result that

∞∑k=1

1

k2=π2

6.

The following result provides a very useful tool to study limits of functions.

Theorem 11.6. Let f : A → R be given and let x0 ∈ R be a cluster point of A. Thenlimx→x0 f(x) = g ∈ R if and only if for every sequence (xn) such that A 3 xn → x0, xn 6= x0for all n, we have f(xn)→ g.

Proof. (⇒) Suppose limx→x0 f(x) = g ∈ R and let A 3 xn → x0, xn 6= x0 for all n. Thenthere is δ > 0 such that |f(x) − g| < ε whenever 0 < |x − x0| < δ. Since xn converges to x0and xn 6= x0, there is n0 such that 0 < |xn − x0| < δ for all n ≥ n0 and hence

|f(xn)− g| < ε for all n ≥ n0

which means f(xn)→ g as n→∞.

(⇐) We want to prove implication g ⇒ p. By the contrapositive argument it is equivalent toprove that ¬p⇒ ¬q. Thus suppose it is not true that f(x)→ g as x→ x0 i.e.,

¬ (∀ε > 0 ∃δ > 0 ∀x ∈ A (0 < |x− x0| < δ ⇒ |f(x)− g| < ε)) ,

i.e.∃ε > 0 ∀δ > 0 ∃x ∈ A (0 < |x− x0| < δ ∧ |f(x)− g| ≥ ε).

Fix such ε. Since the remaining part of the statement is true for all δ > 0. It is true, inparticular, for δ = 1/n, n = 1, 2, 3, . . ., so for δ = 1/n there is xn ∈ A such that 0 < |xn−x0| <1/n and |f(xn)− g| ≥ ε. Hence A 3 xn → x0, xn 6= x0, but f(xn) 6→ g, so it is not true thatfor every sequence A 3 xn → x0, xn 6= x0 we have f(xn)→ g. 2

This result immediately implies

11.1. LIMITS OF FUNCTIONS 155

Theorem 11.7. Let f, g : A → R, A ⊂ R be given and let x0 ∈ R be a cluster point of A.Suppose that

limx→x0

f(x) = a ∈ R and limx→x0

g(x) = b ∈ R.

Then

1. limx→x0

f(x)± g(x) = a± b ,

2. limx→x0

f(x)g(x) = ab ,

3. limx→x0

f(x)/g(x) = a/b , provided g(x) 6= 0 for all x ∈ A and b 6= 0.

To see how the previous result applies here, we will prove (2). Then it will be clear howto prove the other parts of the theorem.

Let A 3 xn → x0, xn 6= x0. Then f(xn)→ a, g(xn)→ b and hence f(xn)g(xn)→ ab (weapply here a suitable result about limits of sequences). Thus

limx→x0

f(x)g(x) = ab

by the previous theorem. 2

Similarly one can prove

Theorem 11.8. Let f, g, h : A→ R, A ⊂ R and let x0 ∈ R be a cluster point of A. Supposethat

f(x) ≤ g(x) ≤ h(x) for all x ∈ A such that x 6= x0.

Iflimx→x0

f(x) = limx→x0

h(x) = L ∈ R ,

thenlimx→x0

g(x) = L.

Now we will consider examples of divergent functions.

Exercise 11.9. Show that the limit limx→0

sin1

xdoes not exist in R.

Proof. Suppose that the limit exists, so there is g ∈ R such that if 0 6= xn → 0, thensin(1/xn)→ g. Taking xn = 1/(nπ) we have

sin1

xn= sinnπ = 0→ 0,

so g = 0. Taking xn = 1/(2πn+ π/2) we have

sin1

xn= sin

(2nπ +

π

2

)= sin

π

2= 1→ 1,

so g = 1. Thus 0 = 1. Contradiction. Hence the limit does not exist in R. 2


Exercise 11.10. Show that the limit limx→0

1

xdoes nor exist in R.

Proof. Suppose that limx→0 1/x = g ∈ R. Then for xn = 1/n we have n = 1/nn → gwhich implies g =∞. Contradiction. 2

There are many modifications of the notion of limit that we will describe next.

11.2 One-sided limits

Definition. Let f : A→ R be given and let x0 ∈ R be a cluster point of A ∩ (x0,∞). Thenwe say that g ∈ R is the right-hand limit of f at x0,

limx→x+0

f(x) = g

if

∀ε > 0 ∃δ > 0 ∀x ∈ A x0 < x < x0 + δ ⇒ |f(x)− g| < ε.

Similarly we can define the left-hand limit

limx→x−0

f(x) = g.

Theorem 11.11. Let f : A→ R, A ⊂ R and let x0 ∈ R be a cluster point of both of the setsA ∩ (x0,∞) and A ∩ (−∞, x0). Then

limx→x0

f(x) = g ∈ R

if and only if

limx→x+0

f(x) = limx→x−0

f(x) = g.


In many situations the result is useful: in order to find the limit at x0 we need to findthe left-hand limit, the right-hand limit and show that they are equal. This is useful, forexample, in a situation when f is given by different formulas for x > x0 and x < x0.

11.3 Limits involving infinity

Previously we studied finite limits, but one can also define the limit equal to infinity.

Definition. Let f : A→ R, A ⊂ R be given and let x0 ∈ A be a cluster point of A. We saythat f diverges to ∞ at x0 if

∀M > 0 ∃δ > 0 ∀x ∈ A 0 < |x− x0| < δ ⇒ f(x) > M.

11.3. LIMITS INVOLVING INFINITY 157

Then we writelimx→x0

f(x) =∞

orf(x)→∞ as x→ x0.

We also say that f(x) approaches ∞ as x→ x0. Similarly we define limit equal to −∞

limx→x0

f(x) = −∞.

Clearly, one can also talk about one-sided limits equal to ∞ of −∞.

In all cases discussed above we considered limits at a finite point x0. However, we canalso talk about limits as x→∞.

Definition. Let f : A → R, A ⊂ R. Suppose that the set A is unbounded from above, i.e.every interval (M,∞) contains points of A. We say that g ∈ R is the limit of f as x→∞,

limx→∞

f(x) = g

if∀ε > 0 ∃M > 0 ∀x ∈ A x > M ⇒ |f(x)− g| < ε.

Similarly we define limit at −∞,lim

x→−∞f(x) = g.

Finally, we can also define infinite limits at infinity like e.g.

limx→∞

f(x) =∞.

We leave details of corresponding definitions to the reader.

We close this section with some applications. We know that

(11.2) limn→∞

(1 +

1

n

)n= e.

Theorem 11.12. limx→∞

(1 +

1

x

)x= e.

Proof. Let xn →∞ and let kn = [xn] be the integer part of xn. Hence

(11.3) kn ≤ xn < kn + 1,

so

1 +1

kn + 1< 1 +

1

xn≤ 1 +

1

kn,

and hence (11.3) yields(1 +

1

kn + 1

)kn<

(1 +

1

xn

)xn≤(

1 +1

kn

)kn+1

,


which can be written as

(11.4)

(1 +

1

kn + 1

)kn+1 1

1 + 1kn+1

<

(1 +

1

xn

)xn≤(

1 +1

kn

)kn (1 +

1

kn

).

Since kn →∞ as n→∞, (11.2) implies that both expressions on the left-hand side and theright-hand side of (11.4) converge to e, so also

limn→∞

(1 +

1

xn

)xn= e.

Thus

limx→∞

(1 +

1

x

)x= e.

2

Proposition 11.13. If limx→∞ f(x) = g ∈ R = R ∪ {−∞,+∞}, then

limx→0+

f

(1

x

)= g.

Indeed, if xn → 0+, then 1/xn →∞ and hence f(1/xn)→ g. 2

Theorem 11.14. limx→0

(1 + x)1/x = e.

Remark. We consider here two-sided limit, i.e. we allow x to be negative.

Proof. We will consider the left-hand limit and the right-hand limit separately and wewill show that both limits are equal e. The previous two results give

limx→0+

(1 + x)1/x = limx→∞

(1 +

1

x

)x= e.

We are left with the left-hand limit. We have.

limx→0−

(1 + x)1/x = limx→0+

(1− x)−1/x = limx→∞

(1− 1

x

)−x= lim

x→∞

1(x−1x

)x = limx→∞

(x

x− 1

)x

= limx→∞

(1 +

1

x− 1

)x−1(1 +

1

x− 1

)= ♥.

Since

limx→∞

(1 +

1

x

)x= e, then lim

x→∞

(1 +

1

x− 1

)x−1= e

and hence ♥ = e · 1 = e. 2

11.4. CONTINUOUS FUNCTIONS 159

11.4 Continuous functions

Definition. Let f : A → R, A ⊂ R be a given function. We say that f is continuous atx0 ∈ A if

∀ε > 0 ∃δ > 0 ∀x ∈ A |x− x0| < δ ⇒ |f(x)− f(x0)| < ε.

We say that f is continuous if it is continuous at every point of A.

The definition seems related to the definition of the limit, but the setting is slightlydifferent.

• We assume that x0 ∈ A, while we could investigate limits at points that do not belongto A.

• We do not assume that x0 is a cluster point of A.

Definition. We say that x0 ∈ A is an isolated point of A, if there is δ > 0 such that

(x0 − δ, x0 + δ) ∩A = {x0}.

Proposition 11.15. Let x0 ∈ A. Then x0 is an isolated point of A if and only if x0 is nota cluster point of A.

Proof. (⇒) If (x0 − δ, x0 + δ) ∩A = {x0}, then clearly if is not true that

∀δ > 0 ∃x ∈ A 0 < |x− x0| < δ

so x0 cannot be a cluster point of A.

(⇐) The condition that x0 is a cluster point of A can be stated as

∀δ > 0 ∃x0 6= x ∈ A |x− x0| < δ

Suppose now that x0 ∈ A is not a cluster point of A. Then the negation of the above conditionis true, i.e.

∃δ > 0 ∀x0 6= x ∈ A |x− x0| ≥ δ.

If δ > 0 is such that this condition is satisfied, then clearly

(x0 − δ, x0 + δ) ∩A = {x0}

so x0 is an isolated point of A. 2

Proposition 11.16. Let f : A → R, A ⊂ R. If x0 ∈ A is an isolated point of A, then f iscontinuous at x0.

Proof. Continuity at x0 means

∀ε > 0 ∃δ > 0 ∀x ∈ A |x− x0| < δ ⇒ |f(x)− f(x0)| < ε.


Let ε > 0 be given. Let δ > 0 be such that

(11.5) (x0 − δ, x0 + δ) ∩A = {x0}

Let x ∈ A be such that |x− x0| < δ. Then (11.5) implies that x = x0 and hence

|f(x)− f(x0)| = |f(x0)− f(x0)| = 0 < ε,

so f is continuous at x0.

Example 11.17. Any function f : N→ R is continuous. Indeed, every point of N is isolated.Recall that sequences are identified with functions f : N→ R, namely, we identify a sequence(an) with a function f defined by f(n) = an, so the result can be stated as follows: anysequence is continuous. Sounds strange, but it is correct, because of the definitions we areusing.

Continuity at isolated points is not an interesting issue, because any function is continuousat isolated points, so this does not provide any interesting information. Hence we should onlybe concerned about continuity at cluster points.

Proposition 11.18. Let f : A → R be given and let x0 ∈ A be a cluster point. Then f iscontinuous at x0 if and only if limx→x0 f(x) = f(x0).

The proof is easy and we leave it as an exercise.

Example 11.19. We proved that limx→x0 x2 = x20, so f(x) = x2 is continuous.

Since we characterized limits of functions in terms of sequences, one can easily prove

Proposition 11.20. Let f : A → R, A ⊂ R be given and let x0 ∈ A. Then f is continuousat x0 if and only if for any sequence (xn), xn ∈ A, xn → x0, the sequence (f(x0)) convergesto f(x0).

In contrast with a similar characterization of limits of functions, we allow here xn = x0.The proposition applies both to the case when x0 is a cluster point of A as well to the situationwhen x0 is an isolated point of A. If x0 is an isolated point of A, then xn → x0 implies thatthere is n0 such that xn = x0 for all n ≥ n0 and hence f(xn) = f(x0) → f(x0). Thuscontinuity at isolated points follows.

Theorem 11.21. Let f, g : A → R, A ⊂ R be continuous at x0 ∈ A. Then the functionsf ± g, g are continuous at x0 and f/g is continuous at x0, provided g 6= 0 in A.

This result follows easily from a suitable result for limits of functions. We leave details asan exercise.

Obviously a constant function f(x) = c is continuous. It also immediately follows fromthe definition that the function g(x) = x is continuous. Thus the above result implies thatany polynomial

f(x) = anxn + an−1x

n−1 + . . .+ a1x+ a0

11.4. CONTINUOUS FUNCTIONS 161

is continuous and also any rational function

Q(x) =anx

n + an−1xn−1 + . . .+ a1x+ a0

bmxm + bm−1xm−1 + . . .+ b1x+ b0

is continuous on the set where the denominator is different than zero.

The following example is due to Dirichlet.

Example 11.22. Let f : R→ R be defined as follows

f(x) =

{1 if x is rational,0 if x is irrational.

Then f is discontinuous at every point.

Indeed, suppose that f is continuous at a rational point x. There is a sequence of irrationalnumbers xn that converge to x (for example we may take xn = x +

√2/n). Assuming

continuity at x we have

0 = f(xn)→ f(x) = 1

which is impossible. Similarly we prove that f is discontinuous at irrational points.

The next example constructed by Riemann is more interesting.

Example 11.23. Let f : (0, 1)→ R be defined by f(x) = 0 when x is irrational and f(x) =1/q, when x = p/q is a rational number, where p, q ∈ N and p and q are relatively prime.Then f is continuous at every irrational point of (0, 1) and discontinuous at every rationalone.

The last property of the function is really surprising. One should expect that since thefunction is discontinuous at rational points, it has jumps at every rational point, so the jumpsare “everywhere” and hence the function has to be discontinuous everywhere. That is a oneof many examples in mathematics, where intuition suggests an incorrect answer.

Here is a proof. First, assume that all rational numbers p/q, p, q ∈ N, that we considerbelow are represented in such a way that p and q are relatively prime. Discontinuity at rationalpoints is easy. Let x0 = p/q ∈ (0, 1) be a rational number. Suppose that f is continuous atx0. Let xn ∈ (0, 1) be a sequence of irrational numbers that converge to x0. Then

0 = f(xn)→ f(x0) =1

q

which is a contradiction.

Continuity at irrational points is more difficult. Let x0 ∈ (0, 1) be irrational. We have toprove that

∀ε > 0 ∃δ > 0 ∀x ∈ (0, 1) |x− x0| < δ ⇒ |f(x)− f(x0)| < ε.


Let ε > 0 be given. Let n0 be such that 1/n0 < ε. There is only a finite number of rationalnumbers in (0, 1) with denominator less than n0. Denote all such numbers by α1, . . . , αm.Since x0 is irrational it does not belong to the set and hence

δ := min{|x0 − αi| : i = 1, 2, 3, . . . ,m} > 0.

Hence ∣∣∣∣pq − x0∣∣∣∣ ≥ δ if q < n0.

Thus for any x ∈ (0, 1), if |x − x0| < δ, the either x is irrational or x = p/q, q ≥ n0. In thefirst instance

|f(x)− f(x0)| = |0− 0| = 0 < ε

and in the second one

|f(x)− f(x0)| =∣∣∣∣1q − 0

∣∣∣∣ =1

q≤ 1

n0< ε.

Hence f is continuous at x0. 2

11.5 Trigonometric and exponential functions

We assume that the function f(x) = sinx is defined via geometric considerations. This isnot a particularly good definition from the point of view of analysis, because one can doubtif it is rigorous. Later we will provide a rigorous definition, but as for now we will use thegeometric one.

We also assume certain trigonometric identities and inequalities which can be verifiedusing elementary geometry. In particular we assume that

| sinx| ≤ |x| for all x ∈ R,

| sinx| ≤ 1, and | cosx| ≤ 1

for all x ∈ R. We will also use the identity

sinx− sin y = 2 sinx− y

2cos

x+ y

2for all x, y ∈ R.

Theorem 11.24. Functions sinx and cosx are continuous.

Proof. We have

| sinx− sinx0| ≤ 2

∣∣∣∣sin x− x02

∣∣∣∣ ∣∣∣∣cosx+ x0

2

∣∣∣∣ ≤ 2

∣∣∣∣sin x− x02

∣∣∣∣≤ 2

∣∣∣∣x− x02

∣∣∣∣ = |x− x0| ,

so if xn → x0, then |xn − x0| → 0 and hence | sinxn − sinx0| → 0, sinxn → sinx0. Thisproves continuity of the function sinx. Since cosx = sin(x + π/2), continuity of cosx easilyfollows from that of sinx. 2

11.5. TRIGONOMETRIC AND EXPONENTIAL FUNCTIONS 163

Corollary 11.25. The function tanx is continuous on its domain of definition.

Indeed, tanx = sinx/ cosx is the quotient of two continuous functions. 2

Now we will need the inequality

sinx < x < tanx for x ∈ (0, π/2).

this inequality easily implies

cosx <sinx

x< 1 for x ∈ (0, π/2).

Since cos 0 = 1, we have that cosx→ 1 as x→ 0 and hence

limx→0+

sinx

x= 1.

The function sinx is odd and hence

limx→0−

sinx

x= lim

x→0+

sin(−x)

−x= lim

x→0+

sinx

x= 1.

Hence we proved

Theorem 11.26. limx→0

sinx

x= 1.

This result implies that the function

f(x) =

sinx

xif x 6= 0,

1 if x = 0

is continuous on R. Indeed, continuity ar x 6= 0 is obvious (quotient of two continuousfunctions) and for x = 0 we have

limx→0

f(x) = limx→0

sinx

x= 1 = f(0).

Now we will discuss continuity of the exponential function.

Theorem 11.27. The function f(x) = ax, a > 0 is continuous.

Proof. First we need to recall how the function f(x) = ax is defined. If a = 1, then ax = 1.Assume that a > 1. If x = p/q, p, q ∈ N is a rational number, then we define ax = (ap)1/q

and a−x = 1/(ax). Then one can prove that ax+y = axay for rational exponents x, y ∈ Q.Also one can prove that the function Q 3 x 7→ ax is strictly increasing. Now if x ∈ R, thenwe define

ax = sup {aα : α ∈ Q ∧ α ≤ x} .


Monotonicity of ax over rational exponents easily implies that the function f(x) = ax isstrictly increasing over R. One can also prove that ax+y = axay for all x, y ∈ R. Finally, if0 < a < 1 we define ax = 1/(1/a)x.

With this information about the function f(x) = ax we can proceed to the proof of itscontinuity.

If a = 1, then f(x) = 1x = 1, so continuity is obvious. Assume now that a > 1. First wewill prove continuity at 0. To this end we have to prove that

∀ε > 0 ∃δ > 0 ∀x ∈ R |x− 0| < δ ⇒ |ax − 1| < ε.

Let ε > 0 be given. Recall that

limn→∞

a1/n = limn→∞

n√a = 1

and also

limn→∞

a−1/n = limn→∞

1n√a

= 1.

Hence there is n0 such that ∣∣∣a1/n0 − 1∣∣∣ < ε,

∣∣∣a−1/n0 − 1∣∣∣ < ε.

Let 0 < δ < 1/n0. If |x− 0| < δ, Then |x| < 1/n0,

− 1

n0< x <

1

n0.

Now the fact that f(x) = ax is strictly increasing implies

1− ε < a−1/n0 < ax < a1/n0 < ε+ 1,

−ε < ax − 1 < ε, |xx − 1| < ε

which proves continuity of f(x) = ax at x = 0. If x0 ∈ R is an arbitrary real number, then

ax = ax0(ax−x0 − 1

)+ ax0 .

If x→ x0, then x− x0 → 0, so ax−x0 → 1 by continuity at 0 and hence

limx→x0

ax = limx→x0

ax0(ax−x0 − 1

)+ ax0 = ax0(1− 1) + ax0 = ax0 .

That proves continuity at x0 ∈ R.

If 0 < a < 1, then

limx→x0

ax = limx→x0

1(1a

)x =1(

1a

)x0 = ax0 ,

so f(x) = ax is continuous. 2

Let a > 1. Since f(x) = ax is strictly increasing and an →∞ as n→∞, a−n = 1/an → 0as n→∞ one can easily conclude that ax →∞ as x→∞ and ax → 0 as x→ −∞. Similarresult can also be obtained for 0 < a < 1. This gives the following result.

11.6. COMPOSITION OF FUNCTIONS 165

Theorem 11.28. If a > 1, then

limx→∞

ax =∞, limx→−∞

ax = 0.

If 0 < a < 1, thenlimx→∞

ax = 0, limx→−∞

ax =∞.

11.6 Composition of functions

Theorem 11.29. Let A,B ⊂ R and let f : A→ R, g : B → R be such that f(A) ⊂ B. If f iscontinuous at x0 ∈ A and f is continuous at f(x0) ∈ B, then the composition g ◦ f : A→ Ris continuous at x0.

Proof. let xn ∈ A, xn → x0. Then continuity of f at x0 yields yn := f(xn)→ f(x0). Sinceyn ∈ B and yn → f(x0), continuity of g at f(x0) gives

(g ◦ f)(xn) = g(f(xn)) = g(yn)→ g(f(x0)) = (g ◦ f)(x0)

and hence continuity of g ◦ f at x0 follows. 2

Corollary 11.30. Let A,B ⊂ R and let f : A→ R, g : B → R be such that f(A) ⊂ B. If fis continuous on A and g is continuous on B, then g ◦ f : A→ R is continuous on A.

11.7 Continuity of monotone and inverse functions

Definition. Intervals are subsets of R of the form (a, b), [a, b), (a, b], [a, b], (a,∞), [a,∞),(−∞, a), (−∞, a], where a, b ∈ R.

If P is an interval, we say that x ∈ P is an interior point of A if there is δ > 0 such that(x− δ, x+ δ) ⊂ P .

For example for P = [a, b], a and b are not interior points of P , while any other pointof P is an interior one. The points a, b are called endpoints. Also for an interval (a, b] thepoints a, b are called endpoints. In the interval (a,∞), a is the endpoint, but ∞ is not. Theendpoints are finite numbers.

Example 11.31. The function

f(x) =

{x if x ≤ 0,

x+ 1 if x > 0,

is strictly increasing, but it is not continuous at 0.

Theorem 11.32. If f : A → R, A ⊂ R is a monotone function1 and f(A) is an interval,then f is continuous.

1That means increasing or decreasing.


Proof. Suppose that f is increasing (the proof in the case of decreasing function is almostthe same). Let x0 ∈ A. Suppose also that f(x0) is an interior point of the interval f(A) (theproof in the case in which f(x0) is an endpoint is similar).

Let ε > 0 be given. Since f(x0) is an interior point of f(A), there are points y1, y2 ∈ f(A)such that

y1 < f(x0) < y2.

Since f(A) is an interval, we have [y1, y2] ⊂ f(A). We can assume that |y1 − y2| < ε. Sincey1, y2 ∈ f(A), there are x1, x2 ∈ A such that f(x1) = y1, f(x2) = y2. Since f is increasing,

x1 < x0 < x2.

Let δ = min{|x1 − x0|, |x2 − x0|}. If |x− x0| < δ, then x1 < x < x2 and hence

y1 = f(x1) ≤ f(x) ≤ f(x2) = y2.

Hence

f(x), f(x0) ∈ [y1, y2].

Since the length of the interval is less than ε,

|f(x)− f(x0)| < ε

which proves continuity of f at x0. 2

11.8 Inverse functions

If f is strictly increasing and continuous, then the inverse function f−1 exists, but it is notnecessarily continuous. Here is an example

f(x) =

{x if x ≤ 0,

x− 1 if x > 1

is continuous and strictly increasing however, the inverse function

f−1(x) =

{x if x ≤ 0,

x+ 1 if x > 0

is not continuous at 0. Observe that the domain of f is not an interval.

Theorem 11.33. Let P ⊂ R be an interval. If f : P → R is strictly monotone, then theinverse function f−1 exists and is continuous.

Proof. The inverse function f−1 is strictly monotone too and its range is the interval P ,so continuity of f−1 follows from Theorem 11.32. 2

11.9. INTERMEDIATE VALUE PROPERTY 167

11.9 Intermediate value property

Theorem 11.34. Let P ⊂ R be an interval. If f : P → R is continuous and

f(x) < c < f(y),

then there is z between x and y such that f(z) = c.

Proof. Suppose that x < y. The proof in the case x > y is similar. Since P is an interval,[x, y] ⊂ P . Let

A = {t ∈ [x, y] : f(t) < c}.The set A is not empty, because x ∈ A. The set A is bounded from above by y, so

z = supA ≤ y.Since x ∈ A and z = supA is an upper bound of A, z ≥ x. Thus z ∈ [x, y]. We will provethat f(z) = c.

Suppose that f(z) 6= c. We have two cases.

Case 1: f(z) < c. Then also z < y. Let ε = c − f(z). Since f is continuous, there is δ > 0such that if

z < t < z + δ, z < t < y

then|f(z)− f(t)| < ε = c− f(z)

and hence f(t) < c. Hence t ∈ A. Since t > z we have a contradiction with the fact thatz = supA.

Case 2: f(z) > c. This case is similar and we leave details to the reader. 2

Exercise 11.35. Let f(x) = anxn + . . .+ a1x+ a0, be a polynomial of an odd degree. Prove

that there is x such that f(x) = 0.

Proof. Suppose an > 0 (the argument for an < 0 is similar). We have

f(x) = xn(an +

an−1x

+ . . .+anxn

).

Since n is odd, xn →∞ as x→∞, and xn → −∞ as x→ −∞. Since the expression in theparenthesis converges to an > 0 as x→∞ or x→ −∞ we have

limx→∞

f(x) =∞, limx→−∞

f(x) = −∞.

In particular there are x1, x2 such that

f(x1) < 0 < f(x2)

and it follows from the intermediate value property that there is x between x1 and x2 suchthat f(x) = 0. 2

Exercise 11.36. Let f : [0, 1] → [0, 1] be a continuous function. Prove that f has a fixedpoint, i.e. there is x such that f(x) = x.

Proof. Let g(x) = f(x) − x. Then g(1) ≤ 0 ≤ g(0), so there is x such that f(x) − x =g(x) = 0, f(x) = x. 2


11.10 Logarithmic function

Let a > 1. We proved that the function f(x) = ax is strictly increasing and

limx→∞

f(x) =∞, limx→−∞

f(x) = 0.

Since f is continuous, it follows from the intermediate value property that f(R) = (0,∞).Hence the inverse function is strictly increasing, defined on (0,∞) and with the range R,

f−1 : (0,∞)→ R, surjection.

Since f is defined on an interval, f−1 is continuous, see Theorem 11.33. Similarly if 0 < a < 1,the function f−1 : (0,∞) → R is continuous and strictly decreasing. We denote the inversefunction by

loga : (0,∞)→ R.

Thus

aloga x = x for all x ∈ (0,∞),

loga ax = x for all x ∈ R.

Algebraic properties of the exponential function imply

(11.6) loga xy = loga x+ loga y, x, y > 0

loga(xr) = r loga x, x > 0, r ∈ R

and all other algebraic properties that we know. As an example we will prove (11.6). Wehave

aloga xy = xy = aloga xaloga y = aloga x+loga y ,

loga

(aloga xy

)= loga

(aloga x+loga y

),

loga xy = loga x+ loga y.

Theorem 11.37. If the functions f : A → (0,∞) and g : A → R are continuous, then thefunction

h(x) = f(x)g(x)

is continuous.

Proof. We have

h(x) = ag(x) loga f(x) .

The function x 7→ loga f(x) is continuous as a composition of two continuous functions, thefunction x 7→ g(x) loga f(x) is continuous as a product of two continuous functions and finallyh(x) is continuous as a composition of a continuous functions x 7→ ax and x 7→ g(x) loga f(x).2

Corollary 11.38. The function f(x) = ax, x > 0, a ∈ R is continuous.

11.11. MAXIMUM, MINIMUM AND UNIFORM CONTINUITY 169

11.11 Maximum, minimum and uniform continuity

A continuous function defined on a bounded interval is not necessarily bounded. f(x) = 1/x,x ∈ (0, 1] is an example. Even if j is bounded, sup and inf are not necessarily attained. Forexample f(x) = x, x ∈ (0, 1) is bounded,

supx∈(0,1)

f(x) = 1, infx∈(0,1)

f(x) = 0,

but the values 1 and 0 are not attained on (0, 1).

Exercise 11.39. Find a bounded continuous function on (0, 1] such that sup and inf are notattained.

The situation is different if f is defined on a bounded interval [a, b] that contains endpoints.We call such interval compact.

Theorem 11.40. Let f : [a, b] → R be continuous. Then f is bounded. Moreover there arex1, x2 ∈ [a, b] such that

f(x1) = supx∈[a,b]

f(x), f(x2) = infx∈[a,b]

f(x).

Proof. Clearly there is a sequence zn ∈ [a, b] such that

f(zn)→ supx∈[a,b]

f(x)

(even if the supremum equals ∞). Since the sequence zn is bounded it has a convergentsubsequence znk

→ x1 ∈ [a, b] and hence

supx∈[a,b]

f(x) = limk→∞

f(znk) = f(x1)

by continuity of f . Similarly we can prove that there is x2 ∈ [a, b] such that f(x2) =infx∈[a,b] f(x). In particular f(x2) ≤ f(x) ≤ f(x1) for all x ∈ [a, b] and hence the function fis bounded. 2

Definition. We say that a function f : A→ R is uniformly continuous if

∀ε > 0 ∃δ > 0 ∀x, y ∈ A |x− y| < δ ⇒ |f(x)− f(y)| < ε.

Let us compare uniform continuity with the definition of a continuous function on A. Afunction f : A→ R is continuous on A is

∀x ∈ A ∀ε > 0 ∃δ > 0 ∀y ∈ A |x− y| < δ ⇒ |f(x)− f(y)| < ε.

We fix x ∈ A and ε > 0 first and then we find δ > 0. It may happen that δ depends on X,while in the definition of uniform continuity we fix ε > 0 and then find δ > 0 which is goodfor all x. That is a huge difference as is easily seen in the next example.


Example 11.41. f(x) = 1/x, x ∈ (0, 1] is continuous, but not uniformly continuous.

Proof. Continuity is obvious, so we will show that f is not uniformly continuous. Bycontrary suppose that f is uniformly continuous, i.e.

∀ε > 0 ∃δ > 0 ∀x, y ∈ (0, 1] |x− y| < δ ⇒∣∣∣∣1x − 1

y

∣∣∣∣ < ε.

Take ε = 1. Then there is δ > 0 such that for all x, y ∈ (0, 1]

|x− y| < δ ⇒∣∣∣∣1x − 1

y

∣∣∣∣ < 1.

If n > 1/δ is an integer, then x = 1/n and y = 1/(n+ 1) are both less than δ, so

|x− y| =∣∣∣∣ 1n − 1

n+ 1

∣∣∣∣ < δ,

but ∣∣∣∣1x − 1

y

∣∣∣∣ = |n− (n+ 1)| = 1 ≥ 1

which is a contradiction. 2

In this example the function f was not defined on a compact interval. That is crucial.

Theorem 11.42. If f : [a, b]→ R is continuous, then f is uniformly continuous.

Proof. By contradiction, suppose that f is not uniformly continuous, i.e.

¬ (∀ε > 0 ∃δ > 0 ∀x, y ∈ [a, b] |x− y| < δ ⇒ |f(x)− f(y)| < ε) ,

i.e.∃ε > 0 ∀δ > 0 ∃x, y ∈ [a, b] |x− y| < δ ∧ |f(x)− f(y)| ≥ ε.

Let ε > 0 be such that the above statement is true. Then the rest of the statement is truefor all δ > 0. In particular for δ = 1/n we can find xn, yn ∈ [a, b] such that

|xn − yn| <1

n∧ |f(xn)− f(yn)| ≥ ε.

The sequence (xn) is bounded, so it has a convergent subsequence

xnk→ x0 ∈ [a, b] as k →∞.

Since |xnk− ynk

| < 1/nk → 0 as k →∞ we conclude that

ynk→ x0 as k →∞

and hence|f(xnk

)− f(ynk)| → |f(x0)− f(x0)| = 0,

but it is not possible, since |f(xnk)− f(ynk

)| ≥ ε. 2

The function f(x) = x, x ∈ (0, 1) is uniformly continuous, although it is not defined ona compact interval. Indeed, f = x, x ∈ [0, 1] is uniformly continuous on [0, 1] and hence itis uniformly continuous on (0, 1), because uniform continuity on a larger set always impliesuniform continuity on a smaller set. This is to say that an uniformly continuous function isnot necessarily defined on a compact interval.

11.11. MAXIMUM, MINIMUM AND UNIFORM CONTINUITY 171

Proposition 11.43. If f : (a, b) → R is an uniformly continuous function defined on abounded interval (i.e. a, b ∈ R), then f is bounded.

Proof. The definition of uniform continuity with ε = 1 yields that there is δ > 0 such that

|x− y| ≤ δ ⇒ |f(x)− f(y)| < 1.

Choose n0 such that n0δ > (b− a)/2. Let x ≥ (a+ b)/2 (estimate in the case x < (a+ b)/2is similar). Consider a sequence

xk =a+ b

2+ δk, k = 0, 1, 2, . . .

Then there is n < n0 such that

xn ≤ x but xn+1 > x.

We have∣∣∣∣f(x)− f(a+ b

2

)∣∣∣∣=

∣∣(f(x0)− f(x1))

+(f(x1)− f(x2)

)+ . . .+

(f(xn−1)− f(xn)

)+(f(xn)− f(x)

)∣∣≤ |f(x0)− f(x1)|+ . . .+ |f(xn−1)− f(xn)|+ |f(xn)− f(x)| ≤ n+ 1 ≤ n0.

Hence

|f(x)| ≤ n0 +

∣∣∣∣f (a+ b

2

)∣∣∣∣which proves boundedness of f . 2

Definition. We say that a function f : A → R, A ⊂ R is α-Holder continuous, α > 0, ifthere is a constant C > 0 such that

|f(x)− f(y)| ≤ C|x− y|α for all x, y ∈ A.

We say a function f : A→ R is Lipschitz continuous if there is a constant C such that

|f(x)− f(y)| ≤ C|x− y| for all x, y ∈ A.

Thus Lipschitz continuous functions are exactly 1-Holder continuous functions.

Theorem 11.44. If f : A→ R, A ⊂ R is α-Holder continuous, then f is uniformly contin-uous. In particular Lipschitz continuous functions are uniformly continuous.

Proof. Let f be α-Holder continuous. Let ε > 0 be given and let δ = (ε/C)1/α. If|x− y| < δ, then

|f(x)− f(y)| ≤ C|x− y|α < Cδα = ε

and uniform continuity follows. 2


Exercise 11.45. Let f(x) =√x, x ≥ 0. Prove that f is 1

2 -Holder continuous and thusuniformly continuous.

Proof. For y ≥ x ≥ 0 we have

y ≤(√y − x+

√x)2,

√y ≤√y − x+

√x,

√y −√x ≤√y − x,

|√y −√x| ≤ |y − x|1/2.


The above example shows that a function defined on an unbounded set can be uniformlycontinuous.

11.12 Uniform convergence

Definition 11.46. Let fn : A → R be a sequence of functions defined on a set2 A. We saythat (fn) converges pointwise if

∀x ∈ A limn→∞

fn(x) ∈ R.

Denoting the limit by f(x) = limn→∞ fn(x), we say that the sequence (fn) converges pointwiseto the function f : A→ R.

In other words pointwise convergence fn → f means that

(11.7) ∀x ∈ A ∀ε > 0 ∃n0 ∀n ≥ n0 |fn(x)− f(x)| < ε.

Definition 11.47. Let fn : A → R be a sequence of functions defined on a set A. We saythat (fn) converges uniformly to a function f : A→ R if

(11.8) ∀ε > 0 ∃n0 ∀n ≥ n0 ∀x ∈ A |fn(x)− f(x)| < ε.

We will denote the uniform convergence by fn ⇒ f .

We can rewrite (11.7) and (11.8) respectively as

(11.9) ∀ε > 0 ∀x ∈ A ∃n0 ∀n ≥ n0 |fn(x)− f(x)| < ε.

and

(11.10) ∀ε > 0 ∃n0 ∀x ∈ A ∀n ≥ n0 |fn(x)− f(x)| < ε.

2Here A is any set, not necessarily a subset of R.

11.12. UNIFORM CONVERGENCE 173

This is because changing the order of quantifiers

∀x ∈ A ∀ε > 0 to ∀ε > 0 ∀x ∈ A and ∀n ≥ n0 ∀x ∈ A to ∀x ∈ A ∀n ≥ n0

does not alter the logical meaning of the statements.

Note that conditions (11.9) and (11.10) differ by the order of quantifiers: ∀x ∈ A∃n0 vs.∃n0 ∀x ∈ A. That is, in the case of pointwise convergence, for each x we can find n0 and itcan happen that different values of x require different choice of n0. However, in the case ofuniform convergence we can find n0 that works for all x at the same time. Clearly, this isa stronger condition and so uniform convergence implies the pointwise one. However, as wewill see, we can have pointwise convergence that is not uniform.

The uniform convergence is well explained on a picture:

Example 11.48. Let fn : [0, 1]→ [0, 1] be a sequence of functions defined by their graphs

It is easy to see that fn → 0 pointwise.3 However, the convergence is not uniform. If0 < ε < 1, then there is no n such that the graph of fn is contained in the ε-neighborhood ofthe constant function 0 so the convergence fn → 0 cannot be uniform.

Formally, uniform convergence fn ⇒ 0 would mean that

∀ε > 0 ∃n0 ∀n ≥ n0 ∀x ∈ [0, 1] |fn(x)− 0| < ε.

Hence to prove that the convergence if not uniform we need to prove that

∃ε > 0 ∀n0 ∃n ≥ n0 ∃x ∈ [0, 1] |fn(x)| ≥ ε.3Why?


Let ε = 1/2, let n0 be arbitrary, let n = n0 and let x = 2/n. Then fn(x) = 1 and hence|fn(x)| ≥ ε which proves the claim.

Example 11.49. Let fn : [0, 1]→ R be defined by fn(x) = xn. Clearly,

limn→∞

fn(x) = f(x) :=

{1 if x = 1

0 if 0 ≤ x < 1.

The convergence fn → f is pointwise, but not uniform. Indeed, suppose to the contrary thatfn ⇒ f uniformly. In particular, the convergence is uniform on [0, 1). That is

∀ε > 0 ∃n0 ∀n ≥ n0 ∀x ∈ [0, 1) |xn − 0| < ε.

Let ε = 1/2. Let n0 be such that the above statement is true (for ε = 1/2). Let n ≥ n0be arbitrary. Then the above statement says that for all x ∈ [0, 1), we have |xn| < 1/2. Inparticular, taking x = 2−1/n we have

1

2= |xn| < 1

2

which is a clear contradiction.

The next result is one of the main reasons why the uniform convergence is so important.

Theorem 11.50. Let fn : A→ R, A ⊂ R be a sequence of continuous functions that convergeuniformly to f : A→ R. Then f : A→ R is continuous.

Proof. Let x0 ∈ A be arbitrary. We shall prove continuity of f at x0. Let ε > 0 be given. Weneed to find δ > 0 such that

x ∈ A, |x− x0| < δ ⇒ |f(x)− f(x0)| < ε.

Since fn ⇒ f converges uniformly, there is n such that for all x ∈ A, |fn(x) − f(x)| < ε/3.The function fn is continuous at x0 so there is δ > 0 such that

x ∈ A, |x− x0| < δ ⇒ |fn(x)− fn(x0)| <ε

3.

Therefore for x ∈ A such that |x− x0| < δ we have

|f(x)− f(x0)| ≤ |f(x)− fn(x)|+ |fn(x)− fn(x0)|+ |fn(x0)− f(x0)| <ε

3+ε

3+ε

3= ε.

which proves continuity of f at x0.

Let us now look at Example 11.49 again. Since the functions fn are continuous and thelimiting function f is not, the convergence cannot be uniform.

11.12. UNIFORM CONVERGENCE 175

Definition 11.51. Let A be any set and let f : A → R, n ∈ N, be a sequence of functions.The pointwise and the uniform convergence of the series

∞∑n=1

fn(x)

is defined respectively as the pointwise and the uniform convergence of the sequence of partialsums

sk(x) =

k∑n=1

fn(x).

Corollary 11.52. If the functions fn : A → R, A ⊂ R, are continuous and the series∑∞n=1 fn = f converges uniformly, then the function f : A→ R is continuous.

Note that we can reformulate this corollary in terms of exchanging the order of taking thelimit and the sum. Namely, under the assumptions of the corollary we have

(11.11) limx→x0

∞∑n=1

fn(x) =

∞∑n=1

limx→x0

fn(x).

Indeed, since the functions fn are continuous, the right hand side is nothing else, but∑∞n=1 fn(x0) = f(x0). On the other hand, the left side is equal to limx→x0 f(x) so (11.11) is

a different way of writing limx→x0 f(x) = f(x0) which is continuity of f at x0.

This also shows that if the functions fn are continuous at x0, but the function f(x) =∑∞n=1 fn(x) is not, then equality (11.11) does not hold.

The next result provides a powerful tool for proving uniform convergence.

Theorem 11.53 (Weierstrass M -test). Suppose that fn : A → R, A ⊂ R, is a sequence offunction such that |fn(x)| ≤ an for all x ∈ A and n ∈ N. If the series

∑∞n=1 an converges,

then the series∑∞

n=1 fn(x) converges uniformly.

Proof. It follows from the Comparison Test that the series∑∞

n=1 fn(x) converges pointwise.Denote its sum by

f(x) =

∞∑k=1

fk(x).

Let ε > 0 be arbitrary. Take n0 such that

∞∑n=n0

ak < ε.

Such n0 exists since the series∑∞

n=1 an is convergent. Then for m > n ≥ n0 and every x ∈ Awe have ∣∣∣∣∣

m∑k=1

fk(x)−n∑k=1

fk(x)

∣∣∣∣∣ =

∣∣∣∣∣m∑

k=n+1

fk(x)

∣∣∣∣∣ ≤m∑

k=n+1

ak ≤∞∑

k=n0

ak < ε.


Therefore, letting m → ∞ (while n ≥ n0 is fixed), and bearing in mind that∑m

k=1 fk(x) →f(x) as m→∞, we get∣∣∣∣∣f(x)−

n∑k=1

fk(x)

∣∣∣∣∣ < ε for all n ≥ n0 and all x ∈ A.

This however, means that the series∑∞

k=1 fk(x) converges uniformly to f(x).

Example 11.54. Prove that the fucntion

f(x) =

∞∑n=1

1

x2 − n2

is continuous for all real non-integer x i.e., for all x ∈ R \ Z.

Proof. It suffices to prove that for every M > 0, the function f is well defined4 and continuousat all points of [−M,M ] \ Z.

For n >√

2M and x ∈ [−M,M ] we have x2 < n2/2 and hence∣∣∣∣ 1

x2 − n2

∣∣∣∣ =1

n2 − x2≤ 1

n2 − n2

2

=2

n2.

Now convergence of the series∑

2/n2 and the M -test yield the uniform convergence of theseries ∑

n>√2M

1

x2 − n2on[−M,M ].

Since the convergence is uniform and the functions that we add are continuous (zero does notshow up in the denominator), the sum of the above series defines a continuous function on[−M,M ]. Therefore

f(x) =∑

n≤√2M

1

x2 − n2︸︷︷︸continuous in [−M,M ] \ Z

+∑

n>√2M

1

x2 − n2︸︷︷︸continuous in [−M,M ]

,

and the claim follows.

Example 11.55. Prove that the function

f(x) =

∞∑n=1

x

1 + n2x2

is continuous on R \ {0} and discontinuous at x = 0.

4i.e. the series is convergent

11.13. POWER SERIES I 177

Proof. If x = 0, then the series converges and f(0) = 0. Fix a > 0. For |x| ≥ a we have∣∣∣∣ x

1 + n2x2

∣∣∣∣ =|x|

1 + n2x2<|x|n2x2

=1

n2|x|≤ 1

n2a2

and it follows from the M -test that for every a > 0 the series converges uniformly on

(11.12) (−∞,−a] ∪ [a,∞) = {x : |x| ≥ a}.

Therefore, for every a > 0 the function f is continuous in the set (11.12). Hence, f(x) is welldefined for all x ∈ R (as the series is convergent for all x ∈ R) and f is continuous in R\{0}.5

It remains to prove that f is not continuous at 0. For x > 0 we have

f(x) =∞∑n=1

x

1 + n2x2>

m∑n=1

x

1 + n2x2>

mx

1 +m2x2.

In particular f(1/m) > 1/2 so f(1/m) does not converge to f(0) = 0 as m→∞ proving thatf is not continuous at 0.

Remark 11.56. In fact, as a corollary we can conclude that the above series does not convergeuniformly in any of the intervals −[a, a], a > 0 as otherwise the limit would be continuous at0.

11.13 Power series I

Definition 11.57. The power series is a function defined by a series

(11.13) f(x) = c0 + c1x+ c2x2 + . . . =

∞∑n=0

cnxn.

The domain of the definition of this fucntion is a set of those x, where the series converges.

The next result explains the structure of the set of points where the series converges.

Theorem 11.58. There is R ∈ [0,∞] such that6

(a) If |x| > R, then the series (11.13) diverges.

(b) If |x| < R, then the series (11.13) converges. Moreover for any 0 < r < R, theseries converges uniformly and absolutely in [−r, r] and therefore it defines a continuousfunction on (−R,R).

Definition 11.59. The number R is called the radius of convergence of the series (11.13).

5Indeed, if x0 6= 0 and a = |x0|/2, then the function f is continuous is (11.12); since x0 belongs to theinterior of that set, f is continuous at x0.

6R =∞ is included.


Proof. Let A = {x ∈ R : cnxn → 0}. Clearly 0 ∈ A so A 6= ∅. Let R = supA. Clearly,

R ∈ [0,∞]. We will prove that R has all desired properties.

(a). If |x| > R, then |x| 6∈ A so cn|x|n 6→ 0 and hence (11.13) diverges.

(b). If R = 0, then the claim is obvious. If R > 0, it suffices to prove that for every 0 < r < R,the series converges uniformly and absolutely on [−r, r].

Since 0 < r < R = supA, there is a ∈ A such that a > r. By the definition of the set A,we have that cna

n → 0 as n→∞. In particular the sequence is bounded so

|cnan| ≤M for some M > 0 and all n.

Therefore for x ∈ [−r, r]

|anxn| ≤ |cnrn| = |cnan|(ra

)n≤M

(ra

)nand hence

(11.14)

∞∑n=0

|cnxn| ≤∞∑n=0

|cnrn| ≤M∞∑n=0

(ra

)n<∞,

because 0 < r/a < 1. Now the absolute and the uniform convergence of the series (11.13)follows from (11.14) and the M -test.

Example 11.60. The series

∞∑n=0

xn =1

1− xfor |x| < 1

has the radius of convergence equal R = 1.

Example 11.61. Prove that the radius of convergence of the series

∞∑n=0

xn

n!= 1 +

x

1!+x2

2!+ . . .

is R =∞.

Proof. The result is an immediate consequence of the Ratio Test. If an = xn/n!, then|an+1/an| = |x|/(n+ 1)→ 0 < 1 so the series converges for all x ∈ R and hence R =∞.

Example 11.62. Show that the radius of convergence of the series∞∑n=1

n!xn

nnequals e.

Proof. Let an = n!xn/nn. Then

limn→∞

∣∣∣∣an+1

an

∣∣∣∣ = limn→∞

|x|(1 + 1

n

)n =|x|e.

If |x| < e, then the series converges according to the Ratio Test, and if |x| > e, it diverges.Therefore R = e.

11.13. POWER SERIES I 179

If 0 < R <∞ is the radius of convergence of a power series (11.13), then the series definesa continuous function on (−R,R) and it diverges when |x| > R. What does happen whenx = ±R? Actually, everything can happen. There are series such that they

(a) diverge at R and diverge at −R,

(b) converge at R and converge at −R,

(c) converge at R and diverge at −R,

(d) diverge at R and converge at −R.

The geometric series∑∞

n=0 xn has the radius of convergence R = 1 and it diverges at x = 1

and x = −1 which is the case (a).

The series

(11.15)

∞∑n=1

(−1)n+1xn

n= x− x2

2+x3

3− x4

4+ . . .

has radius of convergence R = 1 (Ratio Test). It converges at x = 1 as alternating series anddiverges at x = −1 as negative harmonic series. This is the case (c). Multiplying the seriesby −1 we obtain the case (d).

The series

(11.16)∞∑n=0

(−1)nx2n+1

2n+ 1= x− x3

3+x5

5− x7

7+ . . .

has radius of convergence R = 1 (Ratio Test) and it converges at both x = 1 and x = −1since in both cases we have an alternating test. This is the case (b).

In the above examples we always have R = 1. However, we may easily change them toget any 0 < R <∞. To this end it suffices to observe that if the radius of convergence of theseries

∑∞n=0 cnx

n equals 1, then the radius of convergence of the series∑∞

n=0 cnR−nxn equals

R. (Why?)

Remark 11.63. Later, we will prove that the series (11.15) equals ln(1+x) for all−1 < x ≤ 1.In particular taking x = 1 we obtain

ln 2 = 1− 1

2+

1

3− 1

4+

1

5− . . .

This will give a new proof of Theorem 10.27. We will also prove that the series (11.16) equalsarctanx for all −1 ≤ x ≤ 1 so taking x = 1 yields an even more interesting formula due toLeibniz

π

4= 1− 1

3+

1

5− 1

7+

1

9− . . .

In the examples discussed earlier, we computed the radius of convergence of the powerseries using Ratio Test. With the same argument we obtain


Theorem 11.64. If limn→∞

∣∣∣∣cn+1

cn

∣∣∣∣ = λ ∈ [0,∞], then the radius of convergence of the series

∞∑n=0

cnxn equals R =

1

λ,

where we adopt the convention that 1/0 =∞ and 1/∞ = 0.

We leave details of the proof to the reader.

While the above result is easy to use, it applies only to those few power series for whichthe limit of |cn+1/cn| exists. However, the next result provides a formula for the radius ofconvergence that works for all power series.

Theorem 11.65 (Cauchy-Hadamard). The radius of convergence of the series

∞∑n=0

cnxn

equals R = 1/λ, where

λ = lim supn→∞

n√|cn| ∈ [0,∞].

Again, we use the same convention as before: 1/0 =∞ and 1/∞ = 0.

Proof. First we will show convergence of series when |x| < R = 1/λ. Since the series triviallyconverges at x = 0, we can assume that |x| > 0. If λ = ∞, then R = 0 and there is nothingto check. Thus assume that λ < ∞ and 0 < |x| < R = 1/λ. Since λ < 1/|x|, we can findγ > 0 such that

lim supn→∞

n√|cn| = λ < γ <

1

|x|.

It follows from the definition of the upper limit that only finitely many terms n√|cn| can be

greater than or equal γ, as otherwise we would find a subsequence of n√|cn| with the limit

no less than γ, but the largest limit of all possible subsequences is λ < γ. This is to say thatthere is n0 such that

n√|cn| < γ <

1

|x|for all n ≥ n0.

Therefore

(11.17) |cnxn| ≤ (γ|x|)n for n ≥ n0.

Since γ|x| < 1, the series∑∞

n=1(γ|x|)n converges, and hence convergence of the series∑∞n=0 cnx

n follows from (11.17) and the Comparison Test.

It remains to prove that the series is divergent when |x| > R = 1/λ. If λ = 0, then R =∞and there is nothing to check. Thus we may assume that λ > 0 and hence R <∞. Let η > 0be such that

1

|x|< η < λ.

11.14. WEIERSTRASS THEOREM: APPROXIMATION BY POLYNOMIALS 181

Since λ is a limit of a subsequence nk√|cnk|, there is k0 such that

nk

√|cnk| > η for k ≥ k0.

This however, yields that

|cnk||x|nk > (η|x|)nk →∞ as k →∞.

and hence cnkxnk does not converge to zero, proving the divergence of the series

∑∞n=0 cnx

n.

11.14 Weierstrass theorem: approximation by polynomials

Theorem 11.66 (Weierstrass). If f : [a, b] → R is a continuous function, then for everyε > 0, there is a polynomial pε(x) such that

|f(x)− pε(x)| < ε for all x ∈ [a, b].

Remark 11.67. Note that the theorem can be equivalently stated as: there is a sequence ofpolynomials qn that converges to f uniformly on [a, b]. Indeed, if a sequence of polynomialsqn ⇒ f converges uniformly, then for any ε > 0 there is n0 such that |f(x) − qn(x)| < ε forall x ∈ [a, b] and all n ≥ n0 so we can take pε = qn0 . On the other hand if for every ε > 0 wecan find a polynomial pε such that |f(x) − pε(x)| < ε for all x ∈ [a, b], then the sequence ofpolynomials qn = p1/n converges uniformly to f .

Observe that it suffices to prove the theorem when [a, b] = [0, 1], because we problem canbe reduced to the interval [0, 1] by a linear change of variables. More precisely, if f : [a, b]→ Ris a continuous function and L : [0, 1]→ [a, b] is a linear parametrization, L(x) = a+(b−a)x,then f ◦ L : [0, 1] → R is continuous. If qε is a polynomial such that |(f ◦ L)(y)− qε(y)| < εfor all y ∈ [0, 1], then for all x ∈ [a, b] we have∣∣∣∣f(x)− qε

(x− ab− a

)∣∣∣∣ =

∣∣∣∣(f ◦ L)

(x− ab− a

)− qε

(x− ab− a

)∣∣∣∣ < ε, becausex− ab− a

∈ [0, 1].

Therefore we may take pε(x) = qε(x−ab−a ).

The above argument shows that Theorem 11.66 is an immediate consequence of

Theorem 11.68 (Bernstein). If f : [0, 1]→ R is continuous, then the sequence of polynomials

pn(x) =

n∑k=0

(n

k

)xk(1− x)n−kf

(k

n

), where

(n

k

)=

n!

k!(n− k)!,

converges uniformly to f on [0, 1].


Proof. According to the Binomial Theorem 4.8,

(11.18) (x+ y)n =n∑k=0

(n

k

)xkyn−k.

Regarding (11.18) as equality of two functions of variable x (with y fixed), taking derivativewith respect to x twice yields

(11.19) n(x+ y)n−1 =n∑k=0

k

(n

k

)xk−1yn−k,

(11.20) n(n− 1)(x+ y)n−2 =n∑k=0

k(k − 1)

(n

k

)xk−2yn−k.

Multiplying both sides of (11.19) by x and both sides of (11.20) by x2 yields

(11.21) nx(x+ y)n−1 =n∑k=0

k

(n

k

)xkyn−k,

(11.22) n(n− 1)x2(x+ y)n−2 =n∑k=0

k(k − 1)

(n

k

)xkyn−k.

Let y = 1− x and

rn(x) =

(n

k

)xkyn−k, then

n∑k=0

rk(x) = 1 andn∑k=0

krk(x) = nx

by (11.18) and (11.21). Now (11.22) implies that

n(n− 1)x2 =

n∑k=0

k(k − 1)rk(x) =

n∑k=0

k2rk(x)−n∑k=0

krk(x)︸︷︷︸nx

.

Therefore,n∑k=0

k2rk(x) = n(n− 1)x2 + nx

and hence

n∑k=0

(k − nx)2rk(x) =n∑k=0

k2rk(x)− 2nxn∑k=0

krk(x) + n2x2n∑k=0

rk(x)

= n(n− 1)x2 + nx− 2nx · nx+ n2x2 = nx(1− x).

Chapter 12

Derivative

Definition. Let A ⊂ R. We say that x ∈ A is an interior point of A, denoted x ∈ intA, ifthere is δ > 0 such that (x− δ, x+ δ) ⊂ A.

Definition 12.1. Let f : A → R, A ⊂ R and let x0 ∈ intA be an interior point of A. Wesay that f is differentiable at x0 if

f(x0 + h)− f(x0)

his convergent as h→ 0.

We denote the limit by

f ′(x0) = limh→0

f(x0 + h)− f(x0)

h

and call it derivative of f at x0.

We say that a function f : (a, b)→ R is differentiable if it is differentiable at every pointof (a, b).

Note that

f ′(x0) = limx→x0

f(x)− f(x0)

x− x0.

Example 12.2. It is easy to see that if f is constant, then f ′(x) = 0 and if f(x) = x, thenf ′(x) = 1.

Example 12.3. (√x)′ =

1

2√x

for x > 0.

Proof. We have

limh→0

√x+ h−

√x

h= lim

h→0

(√x+ h−

√x)(√x+ h+

√x)

h(√x+ h+

√x)

= limh→0

1√x+ h+

√x

=1

2√h, .

2

185

186 CHAPTER 12. DERIVATIVE

Proposition 12.4. If f is differentiable at x0, then f is continuous at x0.

Proof. For x 6= x0 we have

f(x) =f(x)− f(x0)

x− x0(x− x0) + f(x0)

and hencelimx→x0

f(x) = f ′(x0) · 0 + f(x0) = f(x0)

which proves continuity of f at x0. 2

Definition 12.5. We say that a function g(h) is of order O(hα) (read: big O) if there isε > 0 and M > 0 such that

|g(h)||h|α

≤M for all 0 < |h| < ε.

We say that g(h) is of order o(hα) (read: little o) if

g(h)

|h|α→ 0 as h→ 0.

Clearly the condition o(hα) is stronger than O(hα).

Remark 12.6. 3h is O(h), h2 is o(h), but√h is neither O(h) nor o(h). However,

√h =

O(h1/2). In general o(hα) denotes any function with the property described above and thesame symbol is used to denote different functions. Thus h2 = o(h) and h3 = o(h) despite thefact that these are two different functions. Thus, in general, it is also true that o(hα)+o(hα) =o(hα) and that does not imply that o(hα) = 0.

Theorem 12.7. Let f : A → R, A ⊂ R, x0 ∈ intA. Then the following conditions areequivalent.

(a) f is differentiable at x0.

(b) There is L ∈ R such that

f(x0 + h) = f(x0) + Lh+ o(h).

(c) There is L ∈ R and a function ψ that is continuous at 0, ψ(0) = 0 such that

f(x0 + h) = f(x0) + Lh+ hψ(h).

Moreover L = f ′(x0).

Proof.

(c)⇒ (b). Sincehψ(h)

h= ψ(h)→ 0 as h→ 0

187

we conclude that the function hψ(h) is of order o(h).

(b)⇒ (a). Assume that the function f satisfies

f(x0 + h) = f(x0) + Lh+ o(h).

Thenf(x0 + h)− f(x0)

h= L+

o(h)

h→ L as h→ 0,

so f is differentiable at x0 and f ′(x0) = L.

(a)⇒ (c). By our assumption the function f is differentiable at x0, i.e.

f(x0 + h)− f(x0)

h→ f ′(x0) as h→ 0,

Hencef(x0 + h)− f(x0)− f ′(x0)h

h→ 0 as h→ 0.

This however, implies that the function

ψ(h) =

f(x0 + h)− f(x0)− f ′(x0)h

hif h 6= 0,

0 if h = 0.

is continuous at 0 and ψ(0) = 0. It is easy to check that

f(x0 + h) = f(x0) + f ′(x0)h+ hψ(h).

Hence (c) follows with L = f ′(x0). 2

Definition. If a function f is differentiable at x0, then the function

y = f(x0) + f ′(x0)(x− x0)

is called tangent line to y = f(x) at x0. the function y = f(x0) + f ′(x0)(x− x0) is also calledlinear approximation of f near x0 or linearization of f at x0.

Geometric interpretation of the derivative:

i.e. the difference between the tangent line and the value of the function at (x0 + h) is oforder o(h), i.e. much less that the distance between x0 + h and x0, provided h is sufficientlysmall.


Exercise 12.8. Use the linearization of y =√x to approximate

√4.12.

Solution. Let f(x) =√x. We proved that f ′(x) = 1/(2

√x), so f ′(4) = 1/4 and hence

y =√

4 +1

4(x− 4) = 2 +

1

4(x− 4)

is the linearization of f(x) =√x at x = 4

Hence√

4 + h = 2 +1

4h+ o(h),

√h+ 4 ≈ 2 +

1

4h for small h.

In particular√

4.12 ≈ 2 +1

4· 0.12 = 2.03.

Note that the approximation 2+h/4 has the error o(h) which is much smaller than h, providedh is sufficiently small. In our example

√4.12 = 2.029778313 . . .

and the error is √4.12− 2.03 = −0.0002216 . . .

which is substantially smaller than h = 0.12. 2

Theorem 12.9. If f, g : A → R, A ⊂ R are functions that are differentiable at x0 ∈ intA,then

(a) (f + g)′(x0) = f ′(x0) + g′(x0).

(b) (kf)′(x0) = kf ′(x0) for k ∈ R.

(c) (fg)′(x0) = f ′(x0)g(x0) + f(x0)g′(x0).

(d) (f

g

)′(x0) =

f ′(x0)g(x0)− f(x0)g′(x0)

g2(x0)provided g(x0) 6= 0.

189

Proof. Properties (a) and (b) are very easy and left to the reader.

(c). Applying Theorem 12.7 we have

f(x0 + h) = f(x0) + f ′(x0)h+ hψ1(h), g(x0 + h) = g(x0) + g′(x0)h+ hψ2(h).

Therefore

(fg)(x0 + h) =(f(x0) + f ′(x0)h+ hψ1(h)

)(g(x0 + h) = g(x0) + g′(x0)h+ hψ2(h)

)= f(x0)g(x0) +

(f ′(x0)g(x0) + f(x0)g

′(x0))h+ hψ(h),

where

ψ(h) = f ′(x0)g′(x0)h

2 + ψ1(h)(g(x0) + g′(x0)h+ hψ2(h)

)+ ψ2(h)

(f(x0) + f ′(x0)h

).

Clearly ψ is continuous at 0 and ψ(0) = 0, so

(fg)′(x0) = f ′(x0)g(x0) + f(x0)g′(x0).

(d) This easily follows from (c) if we know what trick to use. We have

f ′ =

(f

g· g)′

=

(f

g

)′g +

(f

g

)g′,

f ′g =

(f

g

)′g2 + fg′,(

f

g

)′=f ′g − fg′

g2.


Theorem 12.10 (Chain rule). If f is differentiable at x0 and g is differentiable at f(x0),then g ◦ f is differentiable at x0 and

(g ◦ f)′(x0) = g′(f(x0))f′(x0).

Proof. Denote f(x0) = y0. Since g is differentiable at y0, Theorem 12.7 yields

g(y) = g(y0) + g′(y0)(y − y0) + (y − y0)ψ(y − y0),

where φ is continuous at 0 with φ(0) = 0. Substituting y by f(x) and y0 by f(x0) we have

g(f(x)) = g(f(x0)) + g′(f(x0)(f(x)− f(x0)) + (f(x)− f(x0))ψ(f(x)− f(x0)),

and hence

(g ◦ f)(x)− (g ◦ f)(x0)

x− x0= g′(f(x0))

f(x)− f(x0)

x− x0+f(x)− f(x0x− x0

ψ(f(x)− f(x0)).

Since f(x)→ f(x0) as x→ x0, we have ψ(f(x)− f(x0))→ 0 as x→ x0 and hence

limx→x0

(g ◦ f)(x)− (g ◦ f)(x0)

x− x0= g′(f(x0))f

′(x0) + f ′(x0) · 0 = g′(f(x0))f′(x0).

2


12.1 Derivatives of elementary functions I

It is easy to see that if f is constant then f(x)′ = 0 and if f(x) = x, then f ′(x) = 1. HenceTheorem 12.9 implies that if

f(x) = anxn + an−1x

n−1 + . . .+ a1x+ a0

is a polynomial, then

f ′(x) = nanxn−1 + (n− 1)an−1x

n−2 + . . .+ a1.

Theorem 12.11. (lnx)′ =1

x, x > 0.

Proof. Let f(x) = lnx, x > 0. Then

f(x+ h)− f(x)

h=

1

h(ln(x+ h)− lnx) =

1

hln

(1 +

h

x

)=

1

xln

(1 +

h

x

)x/h.

Since by Theorem 11.14

limh→0

(1 +

h

x

)x/h= lim

t→0(1 + t)1/t = e

using continuity of f(x) = lnx we conclude that

limh→0

f(x+ h)− f(x)

h=

1

xln e =

1

x.

2

Theorem 12.12. (sinx)′ = cosx and (cosx)′ = − sinx for x ∈ R.

Proof. For x ∈ R and h 6= 0 we have

sin(x+ h)− sinx

h=

1

h· 2 sin

(x+ h)− x2

cos(x+ h) + x

2

=sin h

2h2

cos

(x+

h

2

)→ cosx

as h → 0. We applied here Theorem 11.26 and continuity of the cosx function. Similarargument show that (cosx)′ = − sinx. 2

Using this result and Theorem 12.9 we can find derivatives of other trigonometric func-tions. For example the reader will easily check the following result.

Theorem 12.13. (tanx)′ =1

cos2 xfor x 6= π

2+ kπ, k ∈ Z.

12.2. MEAN VALUE PROPERTY 191

12.2 Mean value property

Proposition 12.14. If f : (a, b)→ R is differentiable at x ∈ (a, b) and f has local maximumor local minimum at c, then f ′(c) = 0.

Proof. Suppose that f has local maximum at c. Then

f ′(c) = limh→0+

f(x0)− f(x0)

h≤ 0

because the numerator is less than or equal zero and denominator is greater than zero. How-ever, we also have

f ′(c) = limh→0−

f(x0)− f(x0)

h≥ 0

because now numerator is still less than or equal zero, but the denominator is negative. Thetwo inequalities yield f ′(c) = 0. The same argument applies to the case of local minimum. 2

Theorem 12.15 (Rolle). If f : [a, b] → R is continuous, differentiable in (a, b) and f(a) =f(b), then there is c ∈ (a, b) such that f ′(c) = 0.

Proof. If f is constant, then the claim is obvious. Since f is continuous it attains maximumand minimum.

f(c1) = supx∈[a,b]

f(x), f(c2) = infx∈[a,b]

f(x).

Since f is not constant, then either

f(c1) > f(a) = f(b) or f(c2) < f(a) = f(b).

Suppose e.g. f(c1) > f(a) = f(b). Then c1 ∈ (a, b) and it follows from the previous resultthat f ′(c1) = 0. 2

Theorem 12.16 (Mean Value Theorem). If f : [a, b] → R is continuous and differentiablein (a, b), then there is c ∈ (a, b) such that

f(b)− f(a) = f ′(c)(b− a).

Remark. Rolle’s theorem is a special case of the mean value theorem.

Proof. Let

ϕ(x) = f(x)− f(a)−(x− a)

(f(b)− f(a)

)b− a

.

Then ϕ is continuous on [a, b], differentiable in (a, b) and ϕ(a) = ϕ(b) = 0. Hence it followsfrom the Rolle theorem that

f ′(c)− f(b)− f(a)

b− a= ϕ′(c) = 0

for some c ∈ (a, b) and the theorem easily follows. 2


Theorem 12.17. Suppose that f : [a, b]→ R is continuous and differentiable in (a, b).

(a) If f ′(x) ≥ 0 in (a, b), then f is increasing in [a, b].

(b) If f ′(x) ≤ 0 in (a, b), then f is decreasing in [a, b].

(c) If f ′(x) > 0 in (a, b), then f is strictly increasing in [a, b].

(d) If f ′(x) < 0 in (a, b), then f is strictly decreasing in [a, b].

Proof. Proofs of all the four cases are similar and we will prove the case (b) only. Ifa ≤ x1 < x2 ≤ b, then there is c ∈ (x1, x2) such that

f(x2)− f(x1) = f ′(c)(x2 − x1) ≤ 0.

Hence f(x2) ≤ f(x1), i.e. f is decreasing. 2

Theorem 12.18. If f : (a, b)→ R is differentiable and the derivative is bounded, |f ′(x)| ≤Mfor a ∈ (a, b), then

|f(x)− f(y)| ≤M |x− y| for a, y ∈ (a, b).

Hence f is Lipschitz continuous and thus uniformly continuous.

Proof. We have

f(x)− f(y) = f ′(c)(x− y),

|f(x)− f(y)| ≤ |f ′(c)| |x− y| ≤M |x− y|.

2

Exercise 12.19. Give an example of a function f : R → R which is differentiable and suchthat f ′ is not continuous.

Solution. Let

f(x) =

{x2 sin 1

x if x 6= 0,0 if x = 0.

Clearly f is differentiable at x 6= 0. It is also differentiable at 0, because

f ′(0) = limh→0

h2 sin 1h

h= 0.

Hence

f ′(x) =

{− cos 1

x + 2x sin 1x if x 6= 0,

0 if x = 0

is clearly discontinuous at 0.

12.2. MEAN VALUE PROPERTY 193

Although f ′ need not be continuous, it still satisfies the intermediate value property, justlike a continuous function.

Theorem 12.20. If f : (a, b)→ R is differentiable, then the derivative has the intermediatevalue property, i.e. if a < α < β < b and

f ′(α) < y < f ′(β) or f ′(β) < y < f ′(α),

then there is x ∈ (α, β) such that f ′(x) = y.

Proof. We will prove the theorem in the case

f ′(α) < y < f ′(β).

Suppose first that y = 0, i.e.f ′(α) < 0 < f ′(β).

Since f ′(α) < 0, f(α + h) < f(α) for sufficiently small h > 0. Similarly f(β − h) < f(β) forsufficiently small h > 0. Therefore

inf{f(x) : x ∈ [α, β]} < min{f(α), f(β)}.

Since f is continuous it attains minimum on the interval [α, β], but the above inequality showsthat the minimum is not attained at the endpoints, so it is attained at some c ∈ (α, beta) andtherefore f ′(c) = 0 by Theorem 12.14.

Now let’s consider the general case

f ′(α) < y < f ′(β).

Let g(x) = f(x)− xy. Then g′(x) = f ′(x)− y and hence

g′(α) = f ′(α)− y < 0 < f ′(β)− y = g′(β).

Thus there is c ∈ (a, b) such that

f ′(c)− y = g′(c) = 0, f ′(c) = y.

2

We will close this section with one more, very important, variant of the mean valuetheorem.


Theorem 12.21 (Cauchy mean value theorem). Let f, g : [a, b] → R be continuous anddifferentiable in (a, b). Suppose also that g′(x) 6= 0 for all x ∈ (a, b). Then there is c ∈ (a, b)such that

f(b)− f(a)

g(b)− g(a)=f ′(c)

g′(c).

Proof. Since g′(x) 6= 0 for all x ∈ (a, b) it follows from Rolle’s theorem that g(b)−g(a) 6= 0.For x ∈ [a, b] define

ϕ(x) =f(b)− f(a)

g(b)− g(a)

(g(x)− g(a)

)−(f(x)− f(a)

).

The function ϕ is continuous in [a, b] and differentiable in (a, b). Moreover ϕ(a) = ϕ(b) = 0.Hence according to Rolle’s theorem, there is c ∈ (a, b) such that

0 = ϕ′(c) =f(b)− f(a)

g(b)− g(a)g′(c)− f ′(c)

and the theorem follows. 2

12.3 Derivatives of elementary functions II: inverse functions

Theorem 12.22. If f : (a, b)→ R is differentiable and f ′(x) 6= 0 for all x ∈ (a, b), then eitherf is strictly increasing or strictly decreasing. Hence f has the inverse function. Moreover theinverse function f−1 is differentiable and

(f−1)′(f(x)) =1

f ′(x)for x ∈ (a, b).

Proof. Since f ′(x) 6= 0 for all x ∈ (a, b) we conclude that

f ′(x) > 0 for all x ∈ (a, b)

orf ′(x) < 0 for all x ∈ (a, b).

Indeed, if we would have f ′(x1) > 0 and f ′(x1) < 0, then we would also have f ′(x) = 0 forsome x between x1 and x2 by the intermediate value property of the derivative, Theorem 12.20,which contradicts our assumptions.

Thus either f is strictly increasing or strictly decreasing and hence it has the inversefunction. The inverse function is continuous according to Theorem 11.33. To prove differen-tiability of f−1 we compute its derivative as follows

(f−1)′(y0) = limy→y0

f−1(y)− f−1(y0)y − y0

= ♥.

If we let x = f−1(y) and x0 = f−1(y0), then x→ x0 by continuity of f−1, so

♥ = limy→y0

x− x0f(x)− f(x0)

= limy→y0

1f(x)−f(x0)

x−x0

=1

f ′(x0).

12.4. L’HOSPITAL’S RULE 195

2

Using this result we can find derivatives of many elementary functions.

Theorem 12.23. (ex)′ = ex, x ∈ R.

Proof. If f(x) = lnx, x > 0, then f−1(x) = ex, x ∈ R, so ex is differentiable and

(ex)′ = (f−1)′(x) =1

f ′(f−1(x))= f−1(x) = ex,

because f ′(y) = 1/y. 2

Theorem 12.24. If g, h : (a, b)→ R are differentiable functions and g(x) > 0 for x ∈ (a, b),then

f(x) = g(x)h(x)

is differentiable and

f ′(x) = g(x)h(x)(h′(x) ln g(x) +

h(x)g′(x)

g(x)

).

Proof. Since

f(x) = eh(x) ln g(x)

the result follows from the chain rule. 2

In particular we have.

Theorem 12.25. (xα)′ = αxα−1, x > 0, α ∈ R.

Using derivatives of the trigonometric functions and Theorem 12.22 one can easily prove

Theorem 12.26. (arctanx)′ =1

1 + x2, x ∈ R.

Theorem 12.27. (arcsinx)′ =1√

1− x2, x ∈ (−1, 1).

12.4 L’Hospital’s rule

The following result is a very powerful tool in computing limits with the help of the derivative.Recall that R = R ∪ {−∞,+∞}.

Theorem 12.28 (L’Hospital’s rule). Let f, g : A→ R be given functions, where the set A isof the form

A = (a, x0), −∞ ≤ a < x0 ≤ +∞

or

A = (x0, a), −∞ ≤ x0 < a ≤ +∞


orA = (a, x0) ∪ (x0, b), −∞ ≤ a < x0 < b ≤ +∞.

Suppose f and g are differentiable and g′(x) 6= 0 for every x ∈ A. Suppose also that

limx→x0

f(x) = limx→x0

g(x) = 0

orlimx→x0

|g(x)| = +∞.

Then if the limit

limx→x0

f ′(x)

g′(x)= α ∈ R

exists, then also

limx→x0

f(x)

g(x)= α.

Note that if the functions are defined on the interval (x0, a), then

limx→x0

f(x)

g(x)= lim

x→x+0

f(x)

g(x),

so the theorem includes the case of one-sided limits.

Proof. The theorem has many cases, so the proof must be very long and we will not prove itin the full generality, just in one special case.

Suppose f, g : (x0, a)→ R, −∞ ≤ x0 < a ≤ ∞ are differentiable, g′ 6= 0 and

(12.1) limx→x+0

f(x) = limx→x+0

g(x) = 0.

Assume also that

(12.2) limx→x+0

f ′(x)

g′(x)= g ∈ R.

Let x0 < x < a. Note that (12.1) implies that is we define f(x0) = g(x0) = 0, then thefunctions f, g : [x0, x] → R are continuous. It follows from The Cauchy mean value theorem(Theorem 12.21) that there is ξ ∈ (x0, x) such that1

f(x)

g(x)=f(x)− f(x0)

g(x)− g(x0)=f ′(ξ)

g′(ξ).

If x→ x+0 , then ξ → x+0 and hence

limx→x+0

f(x)

g(x)= lim

ξ→x+0

f ′(ξ)

g′(ξ)= g.

1The fact that g′ 6= 0 and Rolle’s theorem guarantees that g(x) = g(x)− g(x0) 6= 0.

12.4. L’HOSPITAL’S RULE 197

Example 12.29. Find the limit

limx→0

(ax − x ln a

bx − b ln b

)1/x2

, where a, b > 0.

Proof.

limx→0

( ax − x ln a

bx − x ln b︸︷︷︸f(x)

)1/x2= ♥.

If we want to find a limit or a derivative of a function of the form f(x)g(x), then it is convenientto consider the function ln(f(x)g(x)) = g(x) ln f(x). In our case we have

ln f(x) =ln(ax − x ln a)− ln(bx − x ln b)

x2.

Therefore the limit of ln f(x) splits into two similar limits, one involving a and the other oneinvolving b. We have

limx→0

ln(ax − x ln a)

x2

00= limx→0

ax ln a− ln a

2x(ax − x ln a)00= limx→0

ax(ln a)2

2(ax − x ln a) + 2x(ax ln a− ln a)=

(ln a)2

2.

Similarly we compute the limit involving b. Therefore

♥ = exp

((ln a)2 − (ln b)2

2

).


f(x) =3√x2 + sinx+ cosx− ex

is differentiable at x = 0. Find f ′(0).

Proof. We have

f ′(0) = limx→0

f(x)− f(0)

x= lim

x→0

3√x2 + sinx+ cosx− ex

x= ♥.

We want to compute the limit using the l’Hospital rule. However, taking the derivative of3√f(x) would lead to ugly computations. There is however a simple trick how to avoid that

♥ =

(limx→0

x2 + sinx+ cosx− ex

x3

)1/3

=

(limx→0

2x+ cosx− sinx− ex

3x2

)1/3

=

(limx→0

2− sinx− cosx− ex

6x

)1/3

=

(limx→0

− cosx+ sinx− ex

6

)1/3

=

(−2

6

)1/3

= − 13√

3.


12.5 Inequalities

Differentiability properties of functions can be used to prove many difficult inequalities andin this section we will see some examples.

Example 12.31. Prove that is 0 < α < 1 and x, y > 0, then (x+ y)α < xα + yα.

Proof. Let f(y) = xα + yα − (x+ y)α. Then f(0) = 0 and

f ′(y) = αyα−1 − α(x+ y)α−1 > 0.

The first equality is just a formula for the derivative and the fact that the expression is greaterthan zero we can prove as follows.

αyα−1 − α(x+ y)α−1 > 0 ≡yα−1 > (x+ y)α−1 ≡y1−α < (x+ y)1−α ≡

y < x+ y

Since the last inequality is true, the first one is true as well.

We proved that f(0) = 0 and f ′(y) > 0 for y > 0. This implies that f(y) > 0 for y > 0which gives desired inequality. 2

Example 12.32. Prove thatx

1 + x≤ ln(1 + x) ≤ x for all x ∈ (−1,∞).

Proof. First we will prove the left inequality. Let

f(x) = ln(1 + x)− x

1 + x.

We have

f ′(x) =1

1 + x− 1 + x− x

(1 + x)2=

1

1 + x− 1

(1 + x)2=

x

(1 + x)2

so f ′(x) > 0 for x > 0 and f ′(x) < 0 for −1 < x < 0. The function f is decreasing on theinterval (−1, 0), attains value zero at 0 and then it is increasing. Hence f has minimum atx = 0, so f(x) ≥ f(0) = 0 which is our inequality.

To prove the right inequality let f(x) = x− ln(1 + x). We have f(0) = 0 and

f ′(x) = 1− 1

1 + x

Hence f ′(x) > 0 if x > 0 and f ′(x) < 0 if −1 < x < 0. The function f is decreasing on theinterval (−1, 0), attains value zero at 0 and then it is increasing. Hence it attains minimumat x = 0 and thus f(x) ≥ f(0) = 0 which is our inequality.

Example 12.33. Prove the inequality

0 < −1 +

(1

x+

1

2

)ln(1 + x) <

x2

12

for x > 0.

12.5. INEQUALITIES 199

Proof. Easy computation shows that the inequality is equivalent to

2x

x+ 2< ln(1 + x) <

x3 + 12x

6(x+ 2).

First we will prove the left inequality. Let

f(x) = ln(1 + x)− 2x

x+ 2.

We have f(0) = 0 and for x > 0

f ′(x) =1

1 + x− 2(x+ 2)− 2x

(x+ 2)2=

1

1 + x− 4

(x+ 2)2> 0 ≡

1

1 + x>

4

(x+ 2)2≡

(x+ 2)2 > 4(1 + x) ≡x2 + 4x+ 4 > 4 + 4x ≡

x2 > 0 true.

Hence f(x) > 0 for x > 0 which is our inequality.

To prove the right inequality it suffices to prove that

f(x) =x3 + 12x

x+ 2− 6 ln(1 + x) > 0 for x > 0.

We have f(0) = 0 and for x > 0

f ′(x) =(3x2 + 12)(x+ 2)− (x3 + 12x)

(x+ 2)2− 6

1 + x≡

3x3 + 6x2 + 12x+ 24− x3 − 12x

(x+ 2)2>

6

1 + x≡

(2x3 + 6x2 + 24)(1 + x) > 6(x+ 2)2 ≡2x3 + 6x2 + 24 + 2x4 + 6x3 + 24x > 6x2 + 24x+ 24 ≡

2x3 + 2x4 + 6x3 > 0 true, because x > 0.

Since f(0) = 0 and f ′(x) > 0 for x > 0 we conclude that f(x) > 0 for x > 0 which is whatwe needed to prove. 2

Exercise 12.34. Prove that the equation 3x + 4x = 5x, x > 0 has exactly one solution.

Proof. Clearly x = 2 is a solution. Let

f(x) =

(3

5

)x+

(4

5

)x− 1.


Obviously f(2) = 0. For x > 0 we have

f ′(x) =

(3

5

)xln

(3

5

)+

(4

5

)xln

(4

5

)< 0

Hence the function f is strictly decreasing and thus f(x) 6= 0 for x 6= 2, i.e.

3x + 4x 6= 5x for x 6= 2.

Example 12.35. Prove that for 0 < p < 1 we have

2

e< p

pp−1 + p

11−p < 1.

Proof. We have2

e< p

pp−1 + p

11−p = (1 + p)p

p1−p .

Since we will want to compute the derivative of f it is more convenient to consider

ln f(p) = ln(1 + p) +p

1− pln p.

It easily follows from the l’Hospital’s rule that

limp→0+

ln f(p) = 0 and limp→1−

ln f(p) = ln 2− 1.

Therefore

limp→0+

f(p) = 1 and limp→1−

f(p) =2

e

and it suffices to prove now that f is decreasing. To this end, it suffices to show that ln f(p)is decreasing. We have

(ln f(p))′ =1

1 + p+

1

(1− p)2ln p+

1

1− p=

1

(1− p)2(

ln p+ 21− p1 + p︸︷︷︸

g(p)

),

g′(p) =1

p

(1− p1 + p

)2

> 0.

Therefore g(p) < g(1) = 0 and hence

(ln f(p))′ =g(p)

(1− p)2< 0

which proves that ln f(p) is decreasing.

Example 12.36. What is larger eπ or πe?

Proof. Let f(x) = x− e lnx. Note that f(e) = 0 and

f ′(x) = 1− e

x> 0 for x > e.

Since π > e and f is incrising for x > e, we have that

f(π) > f(e) = 0, π − e lnπ > 0, π > e lnπ = lnπe, eπ > πe.

12.6. POWER SERIES II 201

12.6 Power series II

We will prove now that the power series defines not only a continuous function but an infinitelydifferentiable and that in fact we can differentiate the power series term by term, just like apolynomial.

Theorem 12.37. The two series

∞∑n=0

cnxn and

∞∑n=1

ncnxn−1

have the same radius of convergence R. If R > 0, then the first series is differentiable in(−R,R) and ( ∞∑

n=1

cnxn

)′=∞∑n=1

ncnxn−1 for all x ∈ (−R,R).

Note that we can apply the theorem again to the series of derivatives( ∞∑n=1

ncnxn−1

)′=∞∑n=2

n(n− 1)cnxn−1

and so on. That proves that the power series defines an infinitely differentiable function in

(−R,R). If f(x) =∞∑n=0

cnxn, then

f (k)(x) =

∞∑n=k

n(n− 1) · · · (n− k + 1)cnxn−k.

Taking x = 0, we see that the only non-zero term in the above sum is the one with n = k so

f (k)(0) = k!ck.

We proved

Corollary 12.38. If f(x) =∞∑n=0

cnxn has the radius of convergence R > 0, then the function

f(x) is of class C∞(−R,R) and

f(x) =

∞∑n=0

f (n)(0)

n!xn for all x ∈ (−R,R).

Remark 12.39. We will see later that this is a special case of the Taylor series.

Proof of Theorem12.37. The radii of convergence of the series

∞∑n=0

cnxn and

∞∑n=1

ncnxn−1 =

∞∑n=0

(n+ 1)cn+1xn


equal

R =1

lim supn→∞n√|cn|

and R =1

lim supn→∞n√

(n+ 1)|cn+1|.

Since√n→ 1 as n→∞, it easily follows that both radii of convergence are equal. The use

of the upper limit is however, quite mysterious so let us show a direct argument that both

radii are equal. Let as before R be the radius of convergence of∞∑n=0

cnxn. We need to show

that

|x| < R ⇒∞∑n=1

ncnxn−1 converges and |x| > R ⇒

∞∑n=1

ncnxn−1 diverges.

If |x| < R, then there is r such that |x| < r < R and we have

|ncnxn−1| = |cnrn−1|n∣∣∣xr

∣∣∣n−1 .Since the series

∑∞n=0 cnr

n converges, cnrn−1 = rn−1cnr

n converges to zero and hence formsa bounded sequence, |cnrn−1| ≤ M for all n. Therefore. Since

∑∞n=1Mn|x/r|n−1 converges

(Ratio Test and |x/r| < 1), convergence of the series∑∞

n=1 ncnxn−1 follows (Comparison

Test).

Assume now that |x| > R. In the proof of Theorem 11.58 we showed that R = supA,where R = {x ∈ R : cnx

n → 0}. Therefore cnxn 6→ 0. Since |ncnxn−1| ≥ |x|−1|cnxn|, we

conclude that ncnxn−1 6→ 0 so the series

∑∞n=1 ncnx

n−1 cannot converge.

That completes the proof that both series have equal radii of convergence.

Suppose now that R > 0. We are left with the proof of the term by term differentiabilityof the series f(x) =

∑∞n=0 cnx

n. Fix x0 ∈ (−R,R). For x 6= x0 we have2

f(x)− f(x0)

x− x0=

∞∑n=1

cnxn − xn0x− x0

=

∞∑n=1

cn(xn−1 + x0xn−2 + . . .+ xxn−20 + xn−10 )︸︷︷︸G(x)

.

Let r be such that |x0| < r < R. We will prove that the series G(x) converges uniformly on[−r, r]. Note that

(12.3) |cn(xn−1 + x0xn−2 + . . .+ xxn−20 + xn−10 )| ≤ n|cn|rn−1.


n=1 ncnxn−1 converges absolutely on [−r, r] we have that

∞∑n=1

n|cn|rn−1 <∞

so (12.3) and the M -test imply uniform convergence of G(x) on [−r, r]. Therefore G(x) is acontinuous function on [−r, r] and in particular, continuous at x0 so

f ′(x0) = limx→x0

f(x)− f(x0)

x− x0= lim

x→x0G(x) = G(x0) =

∞∑n=1

ncnxn−10 .

2an − bn = (a− b)(an−1 + an−2b+ . . .+ abn−2 + bn−1).



Example 12.40. Since

1

1− x=∞∑n=0

xn for |x| < 1,

we conclude that(1

(1− x)2

)2

=

(1

1− x

)′=

( ∞∑x=0

xn

)′=∞∑n=1

nxn−1 for |x| < 1.

We already proved the same formula using the Cauchy multiplication of the series.

Theorem 12.41. ex = 1 +1

1!+x2

2!+ . . . =

∞∑n=0

xn

n!for all x ∈ R.

Proof. Let g(x) =∞∑n=0

xn

n!. It follows from the ration test, that the series converges for all

x ∈ R and hence g ∈ C∞(R) and we can differentiate the series term by term. Therefore

g′(x) =

∞∑n=0

(xn

n!

)′=

∞∑n=1

xn−1

(n− 1)!=

∞∑n=0

xn

n!= g(x).

We proved that g′(x) = g(x), and we also know that (ex)′ = ex, but how can we prove thatg(x) = ex? Her is a clever trick

(g(x)e−x)′ = g′(x)︸︷︷︸g(x)

e−x − g(x)e−x = 0.

Therefore g(x)e−x = C is a constant function. Taking x = 0 we get 1 = g(0)e−0 = C, C = 1and hence g(x) = ex.

Theorem 12.42. For all x ∈ R we have

sinx = x− 1

3!x3 +

1

5!x5 − . . . =

∞∑n=0

(−1)nx2n+1

(2n+ 1)!

cosx = 1− 1

2!x2 +

1

4!x4 − . . . =

∞∑n=0

(−1)nx2n

(2n)!.

Proof. Let

s(x) =

∞∑n=0

(−1)nx2n+1

(2n+ 1)!and c(x) =

∞∑n=0

(−1)nx2n

(2n)!.

According to the ration test, the series converge on R and hence they define smooth functions:we can differentiate series term by term. Easy calculation shows that

s′(x) = c(x) and c′(x) = −s(x).


Consider the fucntion h(x) = (s(x)− sinx)2 + (c(x)− cos(x))2. We have

h′(x) = 2(s(x)− sin(x))(c(x)− cosx) + 2(c(x)− cosx)(−s(x) + sinx) = 0.

Therefore h(x) is a constant function. Since h(0) = 0, we have that h(x) = 0 for all x ∈ Rand hence s(x) = sinx and c(x) = cosx.

As we know, a power series f(x) =

∞∑n=0

cnxn whose radius of convergence is 0 < R < ∞

may converge at x = R or at x = −R, see examples (11.15) and (11.16). If it converges forexample at x = R, then it defines a function on f : (0, R] → R. We already proved thatthe function is smooth in (−R,R), but the next result (Abel’s theorem) shows that it is alsocontinuous on (0, R]. of course, the only problem is to prove left continuity at x = R. Thatdoes not seem much, but this continuity result is actually non-trivial and it has surprisingapplications, see Corollary 12.45 and 12.47

Theorem 12.43 (Abel). If the radius of convergence of the series f(x) =∑∞

n=0 cnxn equals

0 < R < ∞, and the series converges at x = R (or at x = −R), then the function f :(−R,R] → R is left continuous at x = R (f : [−R,R) → R is right continuous at x = −R).That is,

f(R) = limx→R−

f(x) (f(−R) = limx→−R+

f(x)).

In other words∞∑n=0

cnRn = lim

x→R−

∞∑n=0

cnxn

( ∞∑n=0

cn(−R)n = limx→−R+

∞∑n=0

cnxn

).

Proof. If the series converges at x = −R, then the series f(−x) =∑∞

n=1 cn(−1)nxn convergesat x = R so it suffices to consider the case of convergence at x = R only. If the series hasradius of convergence 0 < R < ∞, then the series f(Rx) =

∑∞n=0 cnR

nxn has radius ofconvergence equal 1. Therefore we may assume in the proof that R = 1 and that the seriesconverges at x = 1, i.e. we assume convergence of the series s =

∑∞n=0 cn. Therefore, it

remains to prove that

(12.4) limx→1−

∞∑n=0

cnxn =

∞∑n=0

cn, i.e., limx→1−

f(x) = s.

Letsn = c0 + c1 + . . .+ cn, s−1 = 0

be the sequence of partial sums of the series∑∞

n=0 cn. We added the term s−1 = 0 forconvenience of notation. For |x| < 1 we have

m∑n=0

cnxn =

m∑n=0

(sn − sn−1)xn

= (s0 + s1x+ s2x2 + . . .+ sm−1x

m−1 + smxm)− ( s−1︸︷︷︸

0

+s0x+ s1x2 + . . .+ sm−1x

m)

= (1− x)s0 + (1− x)xs1 + . . .+ (1− x)xm−1sm−1 + smxm = (1− x)

m−1∑n=0

snxn + smx

m.


(12.5) −smxm +m∑n=0

cnxn = (1− x)

m−1∑n=0

snxn.

For |x| < 1, the left hand side converges as m→∞. Indeed, the series converges to f(x) andsmx

m → 0, because |x| < 1, and the sequence sm of partial sums is bounded (as convergent).Therefore the right hand side of (12.5) converges too. Passing to the limit in (12.5) as m→∞yields

f(x) = (1− x)∞∑n=0

snxn for |x| < 1.

Since (1− x)∑∞

n=0 xn = 1, we have that

s = (1− x)∞∑n=0

sxn.

Therefore

|f(x)− s| =

∣∣∣∣∣(1− x)

∞∑n=0

(sn − s)xn∣∣∣∣∣ .

We want to show that given ε > 0, there is 0 < δ < 1 such that if 1 − δ < x < 1, then|f(x) − s| < ε. Note that sn → s since sn is a partial sum of a series convergent to s. Thusfor large n, |sn − s| is small. We certainly need to use this fact and we need to break theseries into two parts that we will estimate separately.

Given ε > 0, there is N such that

|sn − s| <ε

2for n > N .

Let 0 < δ < 1. Then for 1− δ < x < 1 we have3

|f(x)− s| ≤ |1− x|︸︷︷︸<δ

N∑n=0

|sn − s||x|n︸︷︷︸<1

+ε

2

∣∣∣∣∣(1− x)∞∑

n=N+1

xn

∣∣∣∣∣︸︷︷︸<1

≤ δN∑n=0

|sn − s|+ε

2.

Let 0 < δ < 1 be such that

δ <ε

2∑N

n=0 |sn − s|, then |f(x)− s| < ε.

Note that δ depends on N only and N depends directly on ε so δ depends directly on ε.We proved that for every ε > 0 we can find 0 < δ < 1 such that if 1 − δ < x < 1, then|f(x)− s| < ε. That is, we proved (12.4). The proof is complete.

Now we will show some remarkable applications of Abel’s theorem.

3We do not know yet what choice of δ will work so we take any 0 < δ < 1 and hope that the estimates thatfollow will lead us to an appropriate choice of δ.


Theorem 12.44.

ln(1 + x) = x− x2

2+x3

3− . . . =

∞∑n=1

(−1)n+1xn

nfor all −1 < x ≤ 1.

Taking x = 1 yields

Corollary 12.45. ln 2 = 1− 1

2+

1

3− 1

4+ . . .

Proof of Theorem 12.44. First we will prove the equality for |x| < 1. Let f(x) = ln(1 + x).Then

f ′(x) =1

1 + x=∞∑n=0

(−1)nxn for |x| < 1.

Clearly, R = 1 is the radius of convergence of this geometric series. We want to find a powerseries whose derivative equals to the above series. Obviously,4( ∞∑

n=1

(−1)n+1xn

n

)′=∞∑n=0

(−1)nxn = f ′(x) for |x| < 1.

Since f and the series above have the same derivative for all |x| < 1, both functions differ bya constant. However, f(0) = 0 and the above series evaluated at 0 equals 0 so we have that

(12.6) ln(1 + x) = f(x) =

∞∑n=1

(−1)n+1xn

nfor all |x| < 1.

We still have to prove equality at x = 1. The series converges at x = 1 as an alternatingseries

∞∑n=1

(−1)n+1 1

n= 1− 1

2+

1

3=

1

4+ . . .

Therefore, according to Abel’s theorem the series defines a continuous function on (−1, 1],since ln(1 + x) is also continuous on (−1, 1] and coincides with the series on (−1, 1), we bothln(1 + x) and the series must also be equal at x = 1. The proof is complete.

Theorem 12.46.

arctanx = x− x3

3+x5

5− . . . =

∞∑n=0

(−1)nx2n+1

2n+ 1for all |x| ≤ 1.

Taking x = 1 yields

Corollary 12.47 (Leibniz).π

4= 1− 1

3+

1

5− 1

7+ . . .

4We use here the fact that a power series and its derivative have the same radius of convergence, in ourcase R = 1.

12.7. TAYLOR’S THEOREM(S) 207

Proof of Theorem 12.46. First, we will prove the theorem for |x| < 1 and the case |x| = 1will follow from Abel’s theorem, just like in the proof of Theorem 12.44. Let f(x) = arctanx.We have

(12.7) f ′(x) =1

1 + x2=∞∑n=0

(−1)nx2n, for |x| < 1.

Clearly, R = 1 is the radius of convergence of the above geometric series. We want to find apower series whose derivative equals to the derivative of the series on the right hand side of(12.7). Clearly, ( ∞∑

n=0

(−1)nx2n+1

2n+ 1

)′=∞∑n=0

(−1)nx2n = f ′(x), for |x| < 1.

Since the derivative of f(x) equals to the derivative of the above series for all |x| < 1, f(x)differs from the series by a constant, but at x = 0 both f(0) = 0 and the series equals zero so

(12.8) arctanx = f(x) =∞∑n=0

(−1)nx2n+1

2n+ 1for all |x| < 1.

It remains to prove the equality at |x| = 1. If x = 1 or x = −1, the series on the right handside of (12.8) converges as an alternating series. Therefore, according to Abel’s theorem itdefies a continuous function on [−1, 1], but f(x) = arctanx is also continuous on [−1, 1] and,as we proved, it coincides with the series on (−1, 1). Therefore both f(x) = arctanx and theseries must coincide on the whole interval [−1, 1], and in particular at x = ±1. The proof iscomplete.

12.7 Taylor’s theorem(s)

In Corollary 12.38 we proved that if a power series f(x) =∑∞

n=0 cnxn has radius of conver-

gence R > 0, then f is infinitely differentiable on (−R,R) and

(12.9) f(x) =

∞∑k=0

f (k)(0)

k!xk for all x ∈ (−R,R).

In particular, when n is large, the polynomial (12.10) provides a good approximation of f .Therefore, if f : I → R is defined in an open interval I and n-times differentiable at 0 ∈ I,5

it is natural to consider the Maclaurin polynomial defined by

(12.10)

n∑k=0

f (k)(0)

k!xk = f(0) + f ′(0)x+ f ′′(0)

x2

2!+ . . .+ f (n)(0)

xn

n!.

While the Maclaurin polynomial cannot be equal to f(x) if f is not a polynomial, we stillexpect that it will provide a good approximation of f . Since it is computed from the deriva-tives of f at 0, only information about f near 0 is used and therefore, we should expect good

5n-times differentiability at 0 requires the function to be (n− 1)-times differentiable in some open intervalthat contains 0.


approximation of f(x) only when x is close to 0. The difference between the Maclaurin poly-nomial and the function is called the remainder term; the remainder term Rn(x) is definedthrough the equality

(12.11) f(x) =n∑k=0

f (k)(0)

k!xk +Rn(x) for all x ∈ I.

In other words, Rn(x) is the error of the approximation of f by the Maclaurin polynomial.The following lemma is easy to prove.

Lemma 12.48. If f is (n− 1)-times differentiable in an open interval I, and n-times differ-entiable at x = 0 ∈ I, then the remainder Rn(x) defined by (12.11) is n-times differentiable

at 0 and Rn(0) = R′n(0) = . . . = R(n)n (0) = 0.

Proof. The fact that Rn(x) is n-times differentiable at zero is obvious: f is n-times differen-tiable and Maclaurin’s polynomial is infinitely differentiable.

Let us show now a couple of steps of the proofs that Rn(0) = R′n(0) = . . . = R(n)n (0) = 0,

and the formal proof based on the induction argument is left to the reader.

Rn(x) = f(x)−(f(0) + f ′(0)x+ f ′′(0)

x2

2!+ f ′′′(0)

x3

3!+ . . .+ f (n)(0)

xn

n!

).

Therefore Rn(0) = f(0)− f(0) = 0. Taking the derivative yields

R′n(x) = f ′(x)−(f ′(0) + f ′′(0)x+ f ′′′(0)

x2

2!+ . . .+ f (n)(0)

xn−1

(n− 1)!

)so R′n(0) = f ′(0)− f ′(0) = 0. Similarly

R′′n(x) = f ′′(x)−(f ′′(0) + f ′′′(0)x+ . . .+ f (n)(0)

xn−2

(n− 2)!

)yields R′′n(0) = f ′′(0)− f ′′(0) = 0 and so on.

Since the first n derivatives of Rn(x) are equal zero at x = 0, it suggests that Rn(x)should be very small when x is close to 0, which together with (12.11) would imply that theMaclaurin polynomial gives a very good approximation of f(x) when x is close to 0. Makingthis statements precise is the content of Taylor’s theorem. We will actually show severalversion of such a result6 with different formulas expressing the remainder term Rn(x).

In the formula (12.11), all derivatives of f were computed at x = 0. However, we mayalso consider a similar formula, where the derivatives are computed at any other point a ∈ I.Thus, if f is n-times differentiable at a ∈ I, then the Taylor polynomial and the correspondingremainder term Rn are defined by the equality

(12.12) f(a+ h) =

n∑k=0

f (k)(a)

k!hk︸︷︷︸

Taylor’s polynomial

+Rn(h).

6See Theorems 12.50 and 12.59.


Taylor’s polynomial is defined for all h while the remainder term Rn(h) is defined only forall h such that a + h ∈ I. Clearly, if a = 0 and h is replaced by x, the Taylor polynomialbecomes the Maclaurin one. Lemma 12.48 easily implies

Lemma 12.49. If f is (n− 1)-times differentiable in I and n-times differentiable at x = a ∈I, then the remainder Rn(h) defined by (12.12) is n-times differentiable at 0 and Rn(0) =

R′n(0) = . . . = R(n)n (0) = 0.

Indeed, it suffices to apply Lemma 12.48 to the fucntion f(x) = f(a+ x).

12.7.1 Peano’s remainder

Let us recall (Theorem 12.7) that if f is differentiable at a ∈ I, then there is a function ψthat is continuous at 0, ψ(0) = 0, and

(12.13) f(a+ h) = f(a) + f ′(a)h+ hψ(h) = f(a) + f ′(a) + o(h).

In that case f(a)+f ′(a)h is the Taylor polynomial of degree n = 1, and R1(h) = hψ(h) = o(h)is the remainder term. The next result is an important generalization of this observation tothe case of higher order derivatives.

Theorem 12.50 (Taylor’s theorem with Peano’s remainder). If f : I → R is (n − 1)-timesdifferentiable in an open interval I and if f and has n-th derivative at a ∈ I, then there is afunction ψ, continuous at 0, ψ(0) = 0 such that

(12.14) f(a+ h) = f(a) + f ′(a)h+f ′′(a)

2!h2 + . . .+

f (n)(a)

n!hn + hnψ(h),

whenever a+ h ∈ I. In particular

f(a+ h) = f(a) + f ′(a)h+f ′′(a)

2!h2 + . . .+

f (n)(a)

n!hn + o(hn).

Remark 12.51. The remainder term Rn(h) = hnψ(h) = o(hn) is called the Peano remainder.

Remark 12.52. Replacing a+ h by x the obtain another form of Taylor’s formula

f(x) = f(a) + f ′(a)(x− a) +f ′′(a)

2!(x− a)2 + . . .+

f (n)(a)

n!(x− a)n + (x− a)nψ(x− a) .

Proof. Let

R(h) = f(a+ h)−n∑k=0

f (k)(a)

k!hk

and define

ψ(h) =

R(h)

hnif h 6= 0,

0 if h = 0.


With this definition, equality (12.14) is obvious and it remains to show that ψ(h) =R(h)/hn → 0 as h→ 0. Recall that according to Lemma 12.49 we have

R(0) = R′(0) = . . . = R(n)(0) = 0.

Therefore the l’Hospital rule gives

limh→0

R(h)

hn=

R′(h)

nhn−1= . . . = lim

h→0

R(n−1)(h)

(n− 1)!h= ♥.

Of course, the above equalities hold, provided the limit on the right hand side exists. Wecannot apply the l’Hospital rule again, because R(n−1) is not necessarily differentiable at otherpoints than h = 0. However, we can use now the definition of the derivative:

♥ = limh→0

f (n−1)(a+ h)− f (n−1)(a)− f (n)(a)h

(n− 1)!h= 0.


Definition 12.53. We say that a function f(x) has a local maximum at x = a, if there isε > 0 such that f(x) ≥ f(a), when |x − a| < ε, and f has a strict local maximum at a isthere is ε > 0 such that f(x) > f(a), when 0 < |x− a| < ε. Local minimum and strict localminimum are defined in a similar way. Extremum is a maximum or minimum.

As a direct application of Theorem 12.50 we will prove the following important result.

Theorem 12.54. Suppose f is (n−1)-times differentiable in a neighborhood of a and n-timesdifferentiable at a.

(a) If n is odd andf ′(a) = . . . = f (n−1)(a) = 0, f (n)(a) 6= 0,

then f has neither local maximum nor local minimum at a.

(b) If n is even andf ′(a) = . . . = f (n−1)(a) = 0, f (n)(a) > 0,

then f has a strict local minimum at a.

(c) If n is even andf ′(a) = . . . = f (n−1)(a) = 0, f (n)(a) < 0,

then f has a strict local maximum at a.

Proof. We will only prove he result in the case (a); the other cases are similar and left to thereader. Let f be as in the case (a). Applying Theorem 12.50 we have

f(a+ h) = f(a) + f ′(a)h+ f ′′(a)h

2+ . . .+ f (n)(a)

hn

n!+ hnψ(h)

= f(a) + f (n)(a)hn

n!+ hnψ(h) = f(a) +

( f (n)(a)

n!+ ψ(h)︸︷︷︸

A(h)

)hn.


Since f (n)(a) 6= 0, A(h) has constant sign if |h| < ε is small (because ψ(h)→ 0 as h→ 0). Iffor example f (n)(a) > 0, then A(h) is positive when |h| < ε. On the other hand, since n isodd, hn > 0 when h > 0 and hn < 0, when h < 0. Therefore,7 if f (n)(a) > 0, then

f(a+ h) = f(a) +A(h)hn

{> f(a) if 0 < h < ε,

< f(a) if −ε < h < 0,

and hence f cannot have extremum at x = a.

12.7.2 Little o calculus


f(x) =3√x2 + sinx+ cosx− ex

is differentiable at x = 0. Find f ′(0).

Proof. We already provided one proof based on the l’Hospital rule (Example 12.30), but nowwe will shown another one based on Theorem 12.50.8 According to Theorem 12.50 we have

sinx = x− x3

6+ o(x3), cosx = 1− x2

2+ o(x3), ex = 1 + x+

x2

2+x3

6+ o(x3)

and hence adding the functions, a lot of terms will cancel out and we will get9

x2 + sinx+ cosx− ex = −x3

3+ o(x3).

Therefore

f ′(0) = limx→0

f(x)− f(0)

x= lim

x→0

3√−x3/3 + o(x3)

x= lim

x→0

3

√−1

3+o(x3)

x3= − 1

3√

3.

The following examples are more involved, but they will teach us how to use o(xk) inorder to find Taylor polynomials and limits of complicated functions. It requires however,some practice as working with the little o expressions may be quite confusing.

Example 12.56. We will show that

(12.15) (sinx)3 = x3 + o(x4).

7The case f (n)(a) < 0 is similar.8After all, the is proof is not so much different from the one based on the l’Hospital rule, because the proof

of Theorem 12.50 was based on the l’Hospital rule.9With our notation o(x3) + o(x3) + o(x3) = o(x3), see Remark 12.6.


Of course, one can show it by computing derivatives of (sinx)3 at x = 0 of orders up to 4,but instead, we will show how to obtain such a formula directly from

sinx = x− x3

6+ o(x3).

We have

(sinx)3 = (x− x3

6+ o(x3))3 = a sum of 27 terms.

The polynomial terms arex3+ monomials of degree ≥ 5,

and the terms involving o(x3) are of the form

cxko(x3)`, k + ` ≥ 3, ` ≥ 1,

clearly, each such term is o(x4). This proves (12.15). With some practice one could write(12.15) directly while doing computations in memory.

Example 12.57. We will show that

sin(sinx) = x− x3

3+ o(x3),

using the little ‘o’ calculus. We have

sin(sinx) = sinx− (sinx)3

6+ o((sinx)3)

Clearly, o((sinx)3) = o(x3) because sinx→ 0 as x→ 0 and hence

o((sinx)3)

x3=o((sinx)3)

(sinx)3

(sinx

x

)3

→ 0 as x→ 0.

Therefore, invoking the previous example,10

sin(sinx) = x− x3

6+ o(x3)− x3 + o(x4)

6+ o(x3) = x− x3

3+ o(x3).


limx→0

sinx− sin(sinx)

ln(1 + x3).

Solution. Since ln(1 + x) = x + o(x), we have ln(1 + x3) = x3 + o(x3) and hence theprevious example gives

sinx− sin(sinx)

ln(1 + x3)=

(x− x3

6 + o(x3))−(x− x3

3 + o(x3))

x3 + o(x3)=

x3

6 + o(x3)

x3 + o(x3)=

16 + o(x3)

x3

1 + o(x3)x3

so the limit as x→ 0, equals 1/6. 2

10We are using here the fact that o(x3) + o(x4) + o(x3) = o(x3).


12.7.3 The Lagrange and the Cauchy remainders

Theorem 12.59 (Taylor’s theorem with the Lagrange and the Cauchy remainders). Letf : I → R be (n+ 1)-times differentiable on an open interval I. If a, a+h ∈ I, then there areθ1, θ2 ∈ (0, 1) such that the remainder term R(h) in

f(a+ h) = f(a) + f ′(a)h+f ′′(a)

2!h2 + . . .+

f (n)(a)

n!hn +R(h)

equals11

R(h) =f (n+1)(a+ θ1h)

(n+ 1)!hn+1︸︷︷︸

Lagrange’s remainder

=f (n+1)(a+ θ2h)

n!(1− θ2)nhn+1︸︷︷︸

Cauchy’s remainder

.

Remark 12.60. The Lagrange remainder is more common in applications, then the Cauchyone. Therefore, when we will refer to Taylor’s theorem, we will usually refer to the versionwith the Peano remainder or the Lagrange remainder, even if we do not say it explicitly.

Proof. Fix a and h, and for t ∈ [0, 1] define a new function

g(t) = f(a+ th) + f ′(a+ th)h(1− t) +f ′′(a+ th)

2!h2(1− t)2 + . . .+

f (n)(a+ th)

n!hn(1− t)n.

Clearly

g(1)− g(0) = f(a+ h)−

(f(a) + f ′(a)h+

f ′′(a)

2!h2 + . . .+

f (n)(a)

n!hn

)= R(h).

According to the mean value theorem12 g(1)−g(0) = g′(θ) for some θ ∈ (0, 1). Thus, we needto compute the derivative of g. Below, in each parenthesis, we have the result of applying theproduct rule. Then, most of the terms cancel out leading to a surprisingly simple expression

g′(t) = f ′(a+ th)h

+ (f ′′(a+ th)h2(1− t)− f ′(a+ th)h)

+

(f ′′′(a+ th)

2!h3(1− t)2 − f ′′(a+ th)h2(1− t)

)+ . . .

+

(f (n+1)(a+ th)

n!hn+1(1− t)n − f (n)(a+ th)

(n− 1)!hn(1− t)n−1

)

=f (n+1)(a+ th)

n!hn+1(1− t)n.

Therefore, the remainder term equals

R(h) = g(1)− g(0) = g′(θ) =f (n+1)(a+ θh)

n!hn+1(1− θ)n

11Note that ξ1 = a+ θ1h and ξ2 = a+ θ2h are points between a and a+ h.12Since f is (n+ 1)-times differentiable, g is differentiable.


for some θ ∈ (0, 1). This gives the Cauchy remainder.

Now applying the Cauchy Mean Value Theorem 12.21, to functions g(t) and u(t) =−(1− t)n+1, there is θ ∈ (0, 1) such that

R(h) = g(1)− g(0) =g(1)− g(0)

u(1)− u(0)=g′(θ)

u′(θ)=

f (n+1)(a+θh)n! hn+1(1− θ)n

(n+ 1)(1− θ)n=fn+1(a+ θh)

(n+ 1)!hn+1.

This gives the Lagrange remainder.

Remark 12.61. We proved Taylor’s theorem with the Lagrange remainder from that withthe Cauchy remainder by applying the Cauchy Mean Value Theorem. This method however,suggests how to obtain uncountably many different Taylor remainders. Indeed, if u : [0, 1]→ Ris any continuous function that is differentiable on (0, 1) with u′ 6= 0, then exactly as above,we can prove that there is θ ∈ (0, 1) such that

R(h) =g(1)− g(0)

u(1)− u(0)(u(1)− u(0)) =

f (n+1)(a+ θh)(1− θ)n(u(1)− u(0))hn+1

u′(θ)n!.

Another proof of Taylor’s formula with the Lagrange remainder. Fix a and h. Then there isa constant M such that

f(a+ h) = f(a) +f ′(a)

1!h+

f ′′(a)

2!h2 + . . .+

f (n)(a)

n!hn +Mhn+1 .

We can always find such M by solving the above equation for M . We have to prove that

M =f (n+1)(ξ)

(n+ 1)!for some ξ between a and a+ h.

Let

g(t) = f(t)−

(f(a) +

f ′(a)

1!(t− a) + . . .+

f (n)(a)

n!(t− a)n

)−M(t− a)n+1.

Clearlyg(a+ h) = 0 and g(n+1)(t) = f (n+1)(t)− (n+ 1)!M.

Also for k = 0, 1, . . . , n we have

gk(a) = f (k)(a)− f (k)(a)− 0 = 0.

Therefore, applying Rolle’s theorem several times we obtain

g(a) = 0, g(a+ h) = 0 ⇒ g′(ξ1) = 0 for some ξ1 between a and a+ h,

g′(a) = 0, g′(ξ1) = 0 ⇒ g′′(ξ2) = 0 for some ξ2 between a and ξ1,

. . .

gn(a) = 0, gn(ξn) = 0 ⇒ g(n+1)(ξn+1) = 0 for some ξn+1 between a and ξn.

Note that by construction, ξn+1 is between a and a+ h. We have

0 = g(n+1)(ξn+1) = f (n+1)(ξn+1)− (n+ 1)!M

and hence

M =f (n+1)(xn)

(n+ 1)!.



12.7.4 Applications and examples

Example 12.62. Prove that the approximation

√1 + x ≈ 1 +

x

2− x2

8

has the error less than1

2|x|3, when |x| ≤ 1/2.

Proof. We have

f(x) = (1 + x)1/2, f ′(x) =1

2(1 + x)−1/2, f ′′(x) = −1

4(1 + x)−3/2, f ′′′(x) =

3

8(1 + x)−5/2 .

Hence f(0) = 1, f ′(0) = 1/2, f ′′(0) = −1/4 and by Taylor’s theorem there is θ ∈ (0, 1) suchthat

f(x) = f(0) + f ′(0)x

1!+ f ′′(0)

x2

2!+ f ′′′(θx)

x3

3!,

i.e.√

1 + x = 1 +x

2− x2

8+

3

8(1 + θx)−5/2

x3

6,

∣∣∣∣√1 + x−(

1 +x

2− x2

8

)∣∣∣∣ =1

16|1 + θx|−5/2|x|3 ≤ 1

16

(1

2

)−5/2|x|3 =

√2

4|x|3 ≤ 1

2|x|3.

If f is n-times differentiable on an open interval I, then for a, a + h ∈ I, there is θ =θ(h) ∈ (0, 1) such that

(12.16) f(a+ h) = f(a) + f ′(a)h+ f ′′(a)h

2!+ . . .+ f (n−1)(a)

hn−1

(n− 1)!+ f (n)(a+ θ(h)h)

hn

n!.

This is Taylor’s theorem with Lagrange’s remainder. Note that we assume here only n-timesdifferentiability so the Taylor polynomial is of degree n − 1. The theorem does not provideany hint where θ is located in (0, 1). It could be close to 0 or close to 1. We simply do notknow. The next result shows however, that if in addition f is (n+ 1)-times differentiable at aand f (n+1)(a) 6= 0, then θ is very close to 1

n+1 ∈ (0, 1), at least, when h is close to zero. Notealso that since we do not assume (n + 1)-times differentiability on the entire interval I, westill have to have the Taylor polynomial of degree n− 1, when using Lagrange’s remainder.

Proposition 12.63. If f is n-times differentiable on an open interval I, (n + 1)-times dif-ferentiable at a, f (n+1)(a) 6= 0, and θ = θ(h) ∈ (0, 1) is such that Taylor’s formula (12.16)with Lagrange’s remainder is true, then

limh→0

θ(h) =1

n+ 1.


Proof. Note that the function f satisfies the assumptions of Theorem 12.50 with n + 1 inplace of n. Therefore, there is a function ψ such that

f(a+ h) = f(a) + f ′(a)h+ f ′′(a)h

2!+ . . .+ fn(a)

hn

n!+ f (n+1)(a)

hn+1

(n+ 1)!+ hn+1ψ(h).

Comparing this with (12.16) we have

f (n)(a+ θh)hn

n!= fn(a)

hn

n!+ f (n+1)(a)

hn+1

(n+ 1)!+ hn+1ψ(h).

Therefore,f (n)(a+ θh)− f (n)(a)

h=f (n+1)(a)

n+ 1+ n!ψ(h)

and hence

θ(h) =

f (n+1)(a)n+1 + n!ψ(h)

f (n)(a+θh)−f (n)(a)θh

Note that in the denominator we have a difference quotient that approximates f (n+1)(a) 6= 0.Since ψ(h)→ 0 as h→ 0, letting θ → 0 yields

limh→0

θ(h) =

f (n+1)(a)n+1

f (n+1)(a)=

1

n+ 1.

In particular, if f : R→ R is C∞ and f ′′(a) 6= 0, then in the formula

f(x+ h) = f(x) + f ′(x)h+f ′′(x+ θh)

2h2

θ has to be close to 1/3 when h is small. The question is: can it be equal 1/3 for all x andh? Only, if f is a polynomial of degree less than of equal 3.

Example 12.64. Prove that if f : R → R is 4-times continuously differentiable and suchthat for all x, h ∈ R we have

(12.17) f(x+ h) = f(x) + f ′(x)h+f ′′(x+ h/3)

2h2,

then f has to be a polynomial of degree less than or equal to 3.

Proof. It suffices to prove that f (4) = 0. Fix x ∈ R. Applying Taylor’s theorem to f ′′, thereis η between x and x+ h/3 such that

f ′′(x+

h

3

)= f ′′(x) + f ′′′(x)

h

3+f (4)(η)

2

(h

3

)2

Replacing f ′′(x+ h/3) in (12.17) with this formula yields

f(x+ h) = f(x) + f ′(x)h+1

2f ′′(x)h2 +

1

6f ′′′(x)h3 +

1

36f (4)(η)h4.


On the other hand Taylor’s theorem for f yields

f(x+ h) = f(x) + f ′(x)h+1

2f ′′(x)h2 +

1

6f ′′′(x)h3 +

1

24f (4)(ξ)h4

for some ξ between x and x+ h. Therefore,

1

36f (4)(η)h4 =

1

24f (4)(ξ)h4,

2

3f (4)(η) = f (4)(ξ)

If h→ 0, then η → x and ξ → x so continuity of f (4) yields

2

3f (4)(x) = f (4)(x), and hence f (4)(x) = 0.

For a function f : R→ R we define ‖f‖∞ = supx∈R |f(x)|.

Example 12.65. If f : R → R is twice differentiable, and both f and f ′′ are boundedfunctions,13 then f is bounded and

‖f ′‖∞ ≤ ‖f‖∞ + ‖f ′′‖∞ and ‖f ′‖∞ ≤ 2√‖f‖∞ ‖f ′′‖∞

Proof. Let x ∈ R and h > 0. According to Taylor’s theorem

(12.18) f(x+ h) = f(x) + f ′(x)h+ f ′′(ξ)h2

2for some ξ between x and x+ h.

Hence

(12.19) |f ′(x)|h ≤ |f(x+ h)|+ |f(x)|+ |f ′′(ξ)|h2

2≤ 2‖f‖∞ + ‖f ′′‖∞

h2

2.

Taking now supremum over x on the left hand side, and then dividing by h yields

(12.20) ‖f ′‖∞ ≤ ‖f‖∞2

h+ ‖f ′′‖∞

h

2.

When h = 2, we get the first of the two inequalities: ‖f ′‖∞ ≤ ‖f‖∞ + ‖f ′′‖∞.

It remains to prove the second inequality. Note that if ‖f ′′‖∞ = 0, then f ′′ = 0 sof(x) = ax + b. If a 6= 0, then f is unbounded, which contradicts our assumption. Thus,f(x) = a is constant so f ′ = 0 and the second inequality is obvious in that case. Therefore,we may assume that ‖f ′′‖∞ > 0.

On the right hand side of (12.20) we have a sum of two expressions depending on anarbitrary h. A common and important trick is to minimize it with respect to h. It turns outthat when minimum is attained, both terms that we add are equal so it suffices to solve forh such that14

‖f‖∞2

h= ‖f ′′‖∞

h

2, h = 2

√‖f‖∞‖f ′′‖∞

.

Applying this value of h to (12.20) we obtain the second inequality ‖f ′‖∞ ≤ 2√‖f‖∞ ‖f ′′‖∞.

13That is ‖f‖∞ <∞ and ‖f ′′‖∞ <∞.14This is where we need the assumption that ‖f ′′‖∞ > 0.


The above argument seems not to leave much space for an improvement so the next resultmight seem quite surprising.

Theorem 12.66 (Landau). If f : R → R is twice differentiable, and both f and f ′′ arebounded functions, then f is bounded and

‖f ′‖∞ ≤√

2‖f‖∞ ‖f ′′‖∞.

Proof. In addition to (12.18), Taylor’s theorem for h > 0 also gives

f(x− h) = f(x)− f ′(x)h+ f ′′(η)h2

2for some η between x− h and x.

Subtracting this equality from (12.18) yields

f(x+ h)− f(x− h) = 2f ′(x)h+ (f ′′(ξ)− f ′′(η))h2

2

and hence

2h|f ′(x)| ≤ |f(x+ h)− f(x− h)|+ |f ′′(ξ)− f ′′(η)|h2

2≤ 2‖f‖∞ + ‖f ′′‖∞h2,

(12.21) ‖f ′‖∞ ≤ ‖f‖∞1

h+ ‖f ′′‖∞

h

2.

As in Example 12.65 we can assume that ‖f ′′‖∞ > 0. Then taking h such that

‖f‖∞1

h= ‖f ′′‖∞

h

2, i.e., h =

√2‖f‖∞‖f ′′‖∞

and applying it to (12.21) yields ‖f ′‖∞ ≤√

2‖f‖∞ ‖f ′′‖∞.

12.8 Differentiability of a limit function

The following result generalizes Theorem 12.37.

Theorem 12.67. 15 Let fn : [a, b] → R be a sequence of continuously differentiable func-tions.16 If for some x0 ∈ [a, b], the sequence (fn(x0)) is convergent and the sequence f ′nis uniformly convergent on [a, b], then the sequence fn ⇒ f is uniformly convergent to acontinuously differentiable function f : [a, b]→ R and

f ′(x) = limn→∞

f ′n(x) for all x ∈ [a, b].

15In fact, we not have to assume that the derivatives are continuous, the result is still true if we only assumedifferentiability of the functions fn on [a, b], but the assumption of continuous differentiability is sufficient formost of the applications.

16Regarding the endpoints, the derivative is defined as one sided derivative. For example f ′n(a) =limh→0+(f(a+ h)− f(a))/h.

12.8. DIFFERENTIABILITY OF A LIMIT FUNCTION 219

We need to precede the proof with a simple, but useful observation.

Lemma 12.68. If gn ⇒ g is a uniformly convergent sequence of continuous functions, andxn → x, then gn(xn)→ g(x).

Proof. The result follows continuity of g:

|gn(xn)− g(x)| ≤ |gn(xn)− g(xn)|+ |g(xn)− g(x)| → 0 as n→∞.

Proof of Theorem 12.67. First we will prove that the sequence fn is uniformly convergent tosome function f : [a, b] → R. Let f ′n ⇒ g on [a, b]. According to the Mean Value Theorem,for x ∈ [a, b] and n,m ∈ N, there is ξ between x and x0 such that

|fn(x)− fm(x)| ≤ |(fn(x)− fm(x)

)−(fn(x0)− fm(x0)

)|+ |fn(x0)− fm(x0)|

= |f ′n(ξ)− f ′m(ξ)| |x− x0|+ |fn(x0)− fm(x0)|≤ |b− a|

(|f ′n(ξ)− g(ξ)|+ |f ′m(ξ)− g(ξ)|

)+ |fn(x0)− fm(x0)|.

Let ε > 0 be given and let n0 be such that for all n,m ≥ n0 and all t ∈ [a, b]

|f ′n(t)− g(t)| < ε

4|b− a|and |fn(x0)− fm(x0)| <

ε

2.

Then for n,m ≥ n0 and any x ∈ [a, b] we have

(12.22) |fn(x)− fm(x)| < |b− a|(

ε

4|b− a|+

ε

4|b− a|

)+ε

2= ε.

In particular for every x ∈ [a, b], the sequence (fn(x)) is Cauchy and hence convergent.Denoting the limit by f(x) we have pointwise convergence fn → f on [a, b]. Fix n and letm→∞ in (12.22). Then we get

|fn(x)− f(x)| ≤ ε for all x ∈ [a, b] and all n ≥ n0.

that proves uniform convergence fn ⇒ f .

The Mean Value Theorem implies that for x, x + h ∈ [a, b], and any n ∈ N, there is ξnbetween x and x+ h such that

fn(x+ h)− fn(x)

h= f ′n(ξn).

The sequence ξn is bounded so it has a convergent subsequence ξnk→ ξ. Therefore,

Lemma 12.68 implies that

f(x+ h)− f(x)

h← fnk

(x+ h)− fnk(x)

h= f ′nk

(ξnk)→ g(ξ) so

f(x+ h)− f(x)

h= g(ξ).

Since ξ is between x and x + h and g is continuous, letting h → 0 yields f ′(x) = g(x) =limn→∞ f

′n(x). The proof is complete.


The next result is an immediate consequence of Theorem 12.67.

Corollary 12.69. If fn : (a, b) → R is a sequence of continuously differentiable functionsthat converge at some point x0 ∈ (a, b) and if the sequence of derivatives (f ′n) is convergentuniformly on every interval [α, β] ⊂ (a, b), then fn ⇒ f converges uniformly on every interval[α, β] ⊂ (a, b) to a continuously differentiable function such that

f ′(x) = limn→∞

f ′n(x) for all x ∈ (a, b).

Applying this corollary to a sequence of partial sums we obtain.

Theorem 12.70. Let fn : (a, b) → R be a sequence of continuously differentiable functionssuch that the series

∑∞n=1 fn(x0) is convergent for some x0 ∈ (a, b). If the series

∑∞n=1 f

′n

converges uniformly on every interval [α, β] ⊂ (a, b), then the series∑∞

n=1 fn converges uni-formly on every interval [α, β] ⊂ (a, b) to a continuously differentiable function and( ∞∑

n=1

fn(x)

)′=∞∑n=1

f ′n(x) for all x ∈ (a, b).

This result immediately generalizes Theorem 12.37. Here is another corollary that easilyfollows by an induction argument.

Corollary 12.71. If fn ∈ C∞(a, b) and for every k = 0, 1, 2, . . . the series∑∞

n=1 f(k)n con-

verges uniformly on every interval [α, β] ⊂ (a, b), then f(x) =∑∞

n=1 fn(x) ∈ C∞(a, b) and( ∞∑n=1

fn(x)

)(k)

=∞∑n=1

f (k)n (x) for all x ∈ (a, b) and all k ∈ N.

It follows from the p-Series Test (Theorem 10.14 that the Riemann zeta function definedby

(12.23) ζ(x) =∞∑n=1

1

nx

is well defined for all x > 1. As an application of Corollary 12.71 we will prove the followingimportant result.

Theorem 12.72. ζ ∈ C∞(1,∞).

Proof. In order to prove convergence of derivatives of the series (12.23) we will need

Lemma 12.73. If a > 1 and k ≥ 1, then

∞∑k=2

(log n)k

na<∞ .

12.8. DIFFERENTIABILITY OF A LIMIT FUNCTION 221

Proof. Let ε > 0 be such that a− ε > 1. Observe that

(12.24)(log n)k

nε→ 0 as n→∞.

To prove this it suffices to show that

(log x)k

xε=

(log x

xε/k

)k→ 0 as x→∞

which is an immediate consequence of l’Hospital’s rule

limx→∞

log x

xε/k= lim

x→∞

x−1

εkx

εk−1 = lim

x→∞

k

εxε/k= 0 .

This proves (12.24). In particular the sequence at (12.24) is bounded, say by M and hence

∞∑n=2

(log n)k

na=∞∑n=2

(log n)k

nε1

na−ε≤M

∞∑n=2

1

na−ε<∞ ,

because a− ε > 1.

Now we can complete the proof of the theorem. According to Corollary 12.71 it sufficesto prove that for every k ≥ 0 and any a > 1 the series

(12.25)∞∑n=1

(1

nx

)(k)

converges uniformly on [a,∞).17

For k = 0∞∑n=1

∣∣∣∣ 1

nx

∣∣∣∣ ≤ ∞∑n=1

1

na<∞ ,

because a > 1. Thus the series ζ(x) =∑∞

n=1 1/nx converges uniformly on [a,∞) by theM -test.

For k ≥ 1 we have

∞∑n=1

(1

nx

)(k)

=∞∑k=2

(− log n)k

nx, so

∞∑n=1

∣∣∣∣∣(

1

nx

)(k)∣∣∣∣∣ ≤

∞∑n=2

(log n)k

na<∞

by Lemma 12.73 and hence the series (12.25) converges uniformly on [a,∞) by the M -test.

17It would suffice to prove uniform convergence on every interval [a, b], 1 < a < b <∞.

Chapter 13

Integral

13.1 Existence of an antiderivative

Definition 13.1. Given a continuous function f : I → R defined on an interval I, we saythat F : I → R an antiderivative of f , if F ′(x) = f(x) for all x ∈ I.1

Note that antiderivative is not unique: if F is an antiderivative of f , then for any constantc, F + c is also an antiderivative of f and in fact all antiderivatives differ by a constant.

Proposition 13.2. If f : I → R is a continuous function defined on an interval and F : I →R is antiderivative of f , then F1 : I → R is an antiderivative of f : I → R if and only is thereis a constant c such that F1(x) = F (x) + c for all x ∈ I.


Finding derivative of a function that is given by an explicit formula is a routine exercise.However, finding an antiderivative is usually much more difficult. That might appear surpris-ing at first: since finding derivative and finding an antiderivative are inverse operations oneto another, they should be equally difficult. But this intuition is not correct: placing a needlein a haystack is easy, but finding it, almost impossible. The situation is even much worsethan finding a needle in a haystack: Liouville proved there are functions, like for exampleexp(−x2), that have an antiderivative, but it is not possible to express it as a closed formulainvolving elementary functions.

We will prove now a very important result that states that every continuous functionhas an antiderivative. Since, in most of the situations, we cannot find an a formula for anantiderivative, how can we be sure that it exists?

Theorem 13.3. If f : (a, b) → R is a continuous function, then there is a function F :(a, b)→ R such that F ′(x) = f(x) for all x ∈ (a, b).

1We do not assume that I is open. If, for example, I = [a, b), we require that F ′+(a) = f(a), that is at theendpoints of the interval, we consider one-sided derivatives.

223

224 CHAPTER 13. INTEGRAL

This result is a consequence of

Lemma 13.4. If f : [a, b]→ R is a continuous function, then there is a function F : [a, b]→ Rsuch that F ′(x) = f(x) for all x ∈ [a, b].2

Before we prove the lemma, we will show how to conclude Theorem 13.3 from it.

Proof of Theorem 13.3 assuming Lemma 13.4. Let f : (a, b) → R be continuous. Fix x0 ∈(a, b) and take a strictly decreasing sequence an ↓ a and a strictly increasing sequence bn ↑ bsuch that

x0 ∈ [a1, b1] ⊂ [a2, b2] ⊂ [a3, b3] ⊂ . . . ⊂ (a, b),

∞⋃n=1

[an, bn] = (a, b).

According to Lemma 13.4, for each n, there is a function Fn : [an, bn] → R such thatF ′n(x) = f(x) for x ∈ (an, bn), and we can also assume that Fn(x0) = 0 (because we canalways add a constant to an antiderivative).

Observe that if m > n, then Fm = Fn on [an, bn] ⊂ [am, bm]. Indeed, (Fn − Fm)′(x) =f(x)−f(x) = 0 for x ∈ [an, bn] so Fn−Fm is constant on [an, bn]. Since (Fn−Fm)(x0) = 0, wehave that the constant equals zero and hence Fn = Fm on [an, bn]. This observation impliesthat the function

(13.1) F (x) = Fn(x) if x ∈ [an, bn]

is well defined on (a, b). Indeed, each x ∈ (a, b), belongs to many intervals [an, bn], but if x ∈[an, bn] and x ∈ [am, bm], then Fn(x) = Fm(x) so it does not matter which interval we choosein (13.1). Now it is easy to see that F ′(x) = f(x) for all x ∈ (a, b). Indeed, if x ∈ (a, b), thenfor a sufficiently large n, x ∈ (an, bn) and F = Fn on (an, bn) so F ′(x) = F ′n(x) = f(x).

It remains to prove Lemma 13.4. We will present two proofs. The first proof will berigorous, but not intuitive. It will follow from the Weierstrass Theorem 11.66 about approx-imation of continuous functions by polynomials and from Theorem 12.67. The second proofwill be intuitive, but not rigorous. It will be based on the notion of the area under the graphof a function. The second approach is much more important than the first one, because of theinterpretation of the antiderivative as the area under the graph. It is however, not rigorous,because the notion of the area under the graph has not been defined.

Proof of Lemma 13.4 (rigorous). According to Theorem 11.66, there is a sequence of polyno-mials pn(x) such that

|f(x)− pn(x)| < 1

nfor all x ∈ [a, b]

2At the endpoints we take one sided derivatives.

13.1. EXISTENCE OF AN ANTIDERIVATIVE 225

so pn ⇒ f . Each polynomial has an antiderivative. Indeed, if p(x) = anxn+an−1x

n−1 + . . .+a1x+ a0, then (

ann+ 1

xn+1 +an−1n

xn + . . .+a12x2 + a0x

)′= p(x).

Let Pn be an antiderivative of pn such that Pn(a) = 0. Since the sequence (Pn(a)) convergesand the sequence of derivatives P ′n = pn ⇒ f converges uniformly on [a, b], Theorem 12.67implies that Pn ⇒ F converges uniformly to some function F such that

F ′(x) = limn→∞

P ′n(x) = f(x) for all x ∈ [a, b].


Proof of Lemma 13.4 (intuitive, but not rigorous). Assume that f > 0 on [a, b]. We defineF (x) to be the area under the graph of f |[a,x]:

If a ≤ x < b, then for 0 < h < b − x, F (x + h) − F (x) is the area under the graph off |[x,x+h]

We can estimate the area from below by the area of a rectangle that is contained underthe graph and from above by the area of a rectangle that contains the the area under thegraph (picture on the right). That gives the inequalities

h inft∈[x,x+h]

f(t) ≤ F (x+ h)− F (x) ≤ h supt∈[x,x+h]

f(t)

Therefore, continuity of f yields,

f(x)h→0+

←−−− inft∈[x,x+h]

f(t) ≤ F (x+ h)− F (x)

h≤ sup

t∈[x,x+h]f(t)

h→0+

−−−→ f(x),


that is,

F ′+(x) = limh→0+

F (x+ h)− F (x)

h= f(x).

In particular F ′+(a) = f(a). Similarly we treat the left side derivative. If a < x ≤ b, then for0 < h < x− a, F (x)−F (x− h) is the area under the graph of f |[x−h,x] and arguing as abovewe have

f(x)h→0+

←−−− inft∈[x−h,x]

f(t) ≤ F (x)− F (x− h)

h≤ sup

t∈[x−h,x]f(t)

h→0+

−−−→ f(x).

Hence

F ′−(x) = limh→0−

F (x+ h)− F (x)

h= lim

h→0+

F (x− h)− F (x)

−h= lim

h→0+

F (x)− F (x− h)

h= f(x).

In particular F ′−(b) = f(b). If x ∈ (a, b), then F ′+(x) = F ′−(x) = f(x) implies that F ′(x) =f(x).

If f : [a, b]→ R is continuous, but not necessarily positive, then there is M > 0 such thatf+M is positive on [a, b]. Therefore, from what we already proved, there is G : [a, b]→ R suchthat G′(x) = f(x) +M for x ∈ [a, b] and hence F (x) = G(x)−Mx satisfies F ′(x) = f(x).

Remark 13.5. Not surprisingly, the second, non-rigorous proof is more important. In orderto turn it into a rigorous argument, one needs to develop the notion of the area under thegraph of a function. This will be done in Section 13.2.2.

Example 13.6. According to Theorem 13.3 the function exp(−x2) has an antiderivative,but as we mentioned earlier, it cannot be expressed in terms of elementary functions. We willshow now a straightforward construction of an antiderivative of exp(−x2) that does not referto Theorem 13.3. The function exp(−x2) has a power series expansion

e−x2

=

∞∑n=0

(−x2)n

n!=

∞∑n=0

(−1)nx2n

n!for all x ∈ R.

Therefore,

F (x) =∞∑n=0

(−1)nx2n+1

(2n+ 1)n!

satisfies F ′(x) = exp(−x2) for all x ∈ R. Indeed, taking the term by term derivative of thepower series F (x) gives the power series expansion of exp(−x2). Since, the series and its termby term derivative have the same radius of convergence, in our case R = ∞, the equalityF ′(x) = exp(−x2) is true for all x ∈ R.

Note that this proof is strictly related to the proof of Theorem 13.3 based on approximationby polynomials. Indeed, the power series expansion of e−x

2gives explicit polynomials

pn(x) =n∑k=0

(−1)kx2k

k!

that approximate exp(−x2) on finite intervals [0,M ], and the antiderivative F was constructedas a limit of antiderivatives of these polynomials.

13.2. DEFINITE AND INDEFINITE INTEGRAL 227

13.2 Definite and indefinite integral

Definition 13.7. The indefinite integral of a continuous function f : (a, b) → R is definedas a class of all antiderivatives of f and is denoted by

(13.2)

∫f(x) dx.

Therefore (13.2) is not a function, but a class of functions that differ by a constant. Forexample ∫

ex dx = ex + c where c ∈ R is any constant.

Later in Section ?? we will learn techniques of finding antiderivatives.

Definition 13.8. For a continuous function f : [a, b]→ R we define the definite integral by

(13.3)

∫ b

af(x) dx := F

∣∣∣ba

:= F (b)− F (a),

where F : [a, b]→ R is any antiderivative of f . If a = b, we set∫ a

af(x) dx = 0.

Note that the value of the integral (13.3) does not depend on the choice of an antiderivative.If F1 = F + c is another antiderivative of f , then F1(b)− F1(a) = F (b)− F (a).

The integral can be interpreted as the area under the graph. We already mentioned it inthe case in which f is positive. If f can attain negative values, then we take the areas belowthe x-axis with the negative sign.

In the definition of the definite integral (13.3) we assume that b ≥ a. If b < a, we define∫ b

af(x) dx = −

∫ a

bf(x) dx.

With this definition, we still have

∫ b

af(x) dx = F (b) − F (a), even if b < a. The way we

defined the definite integral makes the following result obvious.


Theorem 13.9 (Fundamental Theorem of Calculus). If f : (a, b) → R is continuous andc ∈ (a, b), then

d

dx

∫ x

cf(t) dt = f(x) for all x ∈ (a, b).

Indeed,d

dx

∫ x

cf(t) dt =

d

dx(F (x)− F (c)) = F ′(x) = f(x).

Remark 13.10. The above result is really obvious so how something absolutely obvious canbe called fundamental? We did define the definite integral through the antiderivative; withthis definition Theorem 13.9 is indeed obvious. However, if we interpret the indefinite integralas the area under the graph of the function between a and x, the result is not obvious at all.This interpretation makes the result fundamental.

Let us list some basic properties of the integral.

Theorem 13.11. If f, g : [a, b]→ R are continuous, and a1, a2 ∈ R then

(a)

∫ b

a(a1f(x) + a2g(x)) dx = a1

∫ b

af(x) dx+ a2

∫ b

ag(x) dx;

(b) If a < c < b, then

∫ b

af(x) dx =

∫ c

af(x) dx+

∫ b

cf(x) dx;

(c) If f ≤ g on [a, b], then

∫ b

af(x) dx ≤

∫ b

ag(x) dx;

(d) If m ≤ f(x) ≤M for all x ∈ [a, b], then

m(b− a) ≤∫ b

af(x) dx ≤M(b− a);

(e)

∣∣∣∣∫ b

af(x) dx

∣∣∣∣ ≤ ∫ b

a|f(x)| dx;

(f) (Mean Value Theorem). There is ξ ∈ [a, b] such that

(13.4)1

b− a

∫ b

af(x) dx = f(ξ).

Proof. Easy proofs of properties (a)-(c) are left to the reader. (d) follows from (c). Since−|f | ≤ f ≤ |f |, (e) easily follows from (c). It remains to prove (f). Since supremum andinfimum of f are attained, there are x1, x2 ∈ [a, b] such that f(x1) ≤ f(x) ≤ f(x2) for allx ∈ [a, b]. Hence

f(x1)(b− a) ≤∫ b

af(x) dx ≤ f(x2)(b− a)

∫ b

af(x) dx, f(x1) ≤

1

b− a

∫ b

af(x) dx ≤ f(x2).

Since f is continuous, it follows from the Intermediate Value Theorem that f attains any valuebetween f(x1) and f(x2), and in particular, there is ξ ∈ [a, b] such that (13.4) is satisfied.


13.2.1 Examples

The next two examples show simple applications of the Fundamental Theorem of Calculus.

Example 13.12. Find the derivative of the function g(x) =∫ x30 et

2dt.

Solution. g(x) = F (x3), where F (x) =∫ x0 e

t2 dt. Therefore

g′(x) = (F (x3))′ = F ′(x3) · 3x2 = e(x3)23x2 = 3x2ex

6.

2

Example 13.13. Let f : R → R be continuous. Find the derivative of the function g(x) =∫ ba f(x+ t) dt.

Solution. If F is an antiderivative of f , then ddtF (x+ t) = f(x+ t) and hence3∫ b

af(x+ t) dt = F (x+ t)

∣∣∣t=bt=a

= F (t)∣∣∣t=b+xt=a+x

=

∫ b+x

a+xf(t) dt.

Therefore

g(x) =

∫ b+x

a+xf(t) dt = F (b+ x)− F (a+ x)

and henceg′(x) = F ′(b+ x)− F ′(a+ x) = f(b+ x)− f(a+ x).

2

Example 13.14. Let f : [0, a]→ R be continuous and strictly increasing, f(0) = 0, f(a) = b.Then f has a continuous inverse g : [0, b]→ R, g(0) = 0, g(b) = b. Prove that∫ a

0f(x) dx+

∫ b

0g(x) dx = ab.

Proof. We have

3We could prove the equality of integrals using a linear change of variables, but we haven’t discussed achange of variables yet so we need a straightforward argument.


The area ab of the rectangle with sides a and b is split by the graph of f into two subareas:∫ a

0f(x) dx and

∫ b

0g(y) dy.

That completes the proof. However, the proof is not rigorous since it uses the notion of thearea under the graph of a function.

We will show now a rigorous proof under the additional assumption that the function fis differentiable. Let

ϕ(x) =

∫ x

0f(t) dt+

∫ f(x)

0g(t) dt− xf(x).

Clearly, ϕ(0) = 0. We will show that ϕ′(x) = 0. This will imply that ϕ(x) = 0. In particularϕ(a) = 0 that is

ϕ(a) =

∫ a

0f(t) dt+

∫ f(a)

0g(t) dt− af(a) = 0,

∫ a

0f(t) dt+

∫ b

0g(t) dt = ab.

Therefore, it remains to show that ϕ′(x) = 0. Let

F (x) =

∫ x

0f(t) dt and G(x) =

∫ x

0g(t) dt

be antiderivatives of f and g. We have

ϕ(x) = F (x) +G(f(x))− xf(x)

and hence

ϕ′(x) = F ′(x) +G′(f(x))f ′(x)− f(x)− xf ′(x) = f(x) + g(f(x))︸︷︷︸x

f ′(x)− f(x)− xf ′(x) = 0.


Remark 13.15. Let us explain why it was natural to consider the function ϕ(x). The areaof the rectangle

equals xf(x), but on the other hand, using geometric interpretation of the integral, the areaequals the sum of two integrals. Therefore the function ϕ(x) equals zero. While we do notwant rely on the geometric interpretation of the integral, we can prove directly that ϕ(x) = 0by showing that ϕ(0) = 0 and ϕ′(x) = 0.


Remark 13.16. It is rather annoying, that we cannot prove the the above result for generalcontinuous functions if we do not refer to the geometric interpretation of the integral. In thenext section we will discuss connection between the definite integral and the area under thegraph in a rigorous way; we will return to Example 13.14 and we will rigorously prove theclaim under the assumption that f is continuous.

Example 13.17. If f : [0, 1]→ R is continuous, then the sum Rn =n−1∑k=0

1

nf

(k

n

)

approximates the area under the graph of f . Therefore we should expect that limn→∞Rn =∫ 10 f(x) dx. However, there is no easy way to prove it if we only define the integral through

antiderivatives and we are reluctant to use the interpretation of the integral as the area asnot rigorous. Needless to say, we are stuck. This problem will be resolved in the next section.

13.2.2 Geometric interpretation of the integral

Without a rigorous argument that the integral can be interpreted as the area under the graphwe miss a lot. The Fundamental Theorem of Calculus is just a trivial observation, and wecannot provide rigorous argument for the claims of Examples 13.14 and 13.17. The next result,Theorem 13.18 will provide us with the rigorous proof that the integral can be interpretedas the area under the graph. With that result we will be able to justify Examples 13.14and 13.17.

Assume that f : [a, b] → R is continuous. Consider a partition of the interval [a, b] intosubintervals [xi, xi+1], where

a = x0 < x1 < x2 < . . . < xn = b and choose ti ∈ [xi, xi+1] for i = 0, 1, . . . , n− 1.

The Riemann sum defined by

Rn =

n−1∑i=0

f(ti)(xi+1 − xi),

is the sum of the areas of the rectangles above the intervals [xi, xi+1]. The height of each ofthe rectangles equals f(ti) so intuitively, the sum approximates the area under the graph.4

4Observe that if f(ti) < 0, the area of a corresponding rectangle is negative. That is consistent with ourinterpretation that the area of the part of the graph that is below the x-axis counts with the negative sign.


In order for the sum Rn to be a good approximation of the area, we need the bases ofthe rectangles xi+1 − xi to be very small. Therefore, if max

i=0,1,...n−1(xi+1 − xi)→ 0 as n→∞,

we should expect that Rn →∫ ba f(x) dx. This would give the desired interpretation of the

integral as the area. The next result proves this so much anticipated fact.

Theorem 13.18. Let f : [a, b]→ R be continuous. Let

a = x0 < x1 < x2 < . . . < xn = b and ti ∈ [xi, xi+1] for i = 0, 1, 2, . . . , n− 1.

Then for every ε > 0 there is δ > 0 such that if maxi=0,1,2,...,n−1

(xi+1 − xi) < δ, then

∣∣∣∣∣∫ b

af(x) dx−

n−1∑i=0

f(ti)(xi+1 − xi)

∣∣∣∣∣ < ε.

Proof. Theorem 13.11(b) and 13.11(e) imply that

∫ b

af(x) dx =

n−1∑i=0

∫ xi+1

xi

f(x) dx =

n−1∑i=0

f(ξi)(xi+1 − xi) for some ξi ∈ [xi, xi+1].

Therefore, ∣∣∣∣∣∫ b

af(x) dx−

n−1∑i=1

f(ti)(xi+1 − xi)

∣∣∣∣∣ ≤n−1∑i=0

|f(ξi)− f(ti)||xi+1 − xi| = ♥.

Since f is uniformly continuous, for every ε > 0, we can find δ > 0 such that |f(x)− f(y)| <ε/(b− a), whenever |x− y| < δ. In particular |f(ξi)− f(ti)| < ε/(b− a) since ξi, ti ∈ [xi, xi+1]belong to an interval of length less than δ. Therefore,

♥ ≤n−1∑i=0

ε

b− a|xi+1 − xi| = ε.



Corollary 13.19. If f : [a, b]→ R is continuous, then

limn→∞

b− an

n−1∑i=0

f

(a+

b− an

k

)=

∫ b

af(x) dx.

Proof. The points xi = a+ b−an k, partition [a, b] into intervals of equal length (b−a)/n. Since

the length of these intervals converge to zero as n→∞, the result follows form Theorem 13.18.

Now Example 13.17 follows from Corollary 13.19.


limn→∞

(1

n+

1

n+ 1+ . . .+

1

2n

)= ln 2.

Remark 13.21. This is Lemma 10.28, but now we will show a different proof based on theRiemann sum approximation.

Proof. Denote the sequence whose limit we must find by an and let f(x) = (1 + x)−1. Thenan = 1

n + bn, where

bn =1

n+ 1+

1

n+ 2+ · · ·+ 1

n+ n=

1

n

(1

1 + 1n

+1

1 + 2n

+ · · ·+ 1

1 + nn

)=

1

n

n∑i=1

f

(i

n

).

Therefore bn is a Riemann sum approximating the integral∫ 10 (1 + x)−1 dx so

limn→∞

an = limn→∞

1

n+ limn→∞

bn = 0 +

∫ 1

0

dx

1 + x= ln(1 + x)

∣∣∣10

= ln 2.

Now we can provide a rigorous proof for Example 13.14 without assuming differentiabilityof f .

Example 13.22. Let f : [0, a]→ R be continuous and strictly increasing, f(0) = 0, f(a) = b.Then f has a continuous inverse g : [0, b]→ R, g(0) = 0, g(b) = b. Prove that∫ a

0f(x) dx+

∫ b

0g(x) dx = ab.

Proof. Let xi = ai/n, i = 0, 1, 2, . . . , n be a partition of [0, a] into intervals of equal lengtha/n. Then

(13.5) limn→∞

n−1∑i=0

f(xi)(xi+1 − xi) =

∫ a

0f(x) dx.


The points yi = f(xi), i = 0, 1, 2, . . . , n form a partition of [0, b]. Since f is uniformlycontinuous,

(13.6) maxi=0,1,...,n−1

(yi+1 − yi)→ 0 as n→∞.

Therefore,5

limn→∞

n−1∑i=0

g(yi+1)(yi+1 − yi) =

∫ b

0g(y) dy.

Observe that

(13.7)n−1∑i=0

f(xi)(xi+1 − xi) +n−1∑i=0

g(yi+1)(yi+1 − yi) = ab.

Indeed, the first sum on the left hand side of (13.7) represents the sum of areas of therectangles below the graph of y = f(x) and the second sum on the left hand side of (13.7)represents the sum of areas above the graph of x = g(y). All the rectangles from both sumson the left hand side of (13.7) cover the whole area of the rectangle [0, a]× [0, b]. This proves(13.7). Now, passing to the limit in (13.7) as n→∞, yields the result.

13.3 Basic results

Theorem 13.23 (Integration by parts). If f, g : [a, b]→ R have continuous derivatives, then∫ b

af(x)g′(x) dx = fg

∣∣∣ba−∫ b

af ′(x)g(x) dx = (f(b)g(b)− f(a)g(a))−

∫ b

af ′(x)g(x) dx.

Proof. We have

fg∣∣∣ba

=

∫ b

a(fg)′(x) dx =

∫ b

af ′(x)g(x) dx+

∫ b

af(x)g′(x) dx

and the theorem follows.

Theorem 13.24. If fn : [a, b] → R is a sequence of continuous functions that convergeuniformly to f : [a, b]→ R, then∫ b

af(x) dx = lim

n→∞fn(x) dx.

Proof. Given ε > 0, there is n0 such that for n ≥ n0 |f(x)− fn(x)| < ε/(b− a) for all n ≥ n0and hence ∣∣∣∣∫ b

af(x) dx−

∫ b

afn(x) dx

∣∣∣∣ ≤ ∫ b

a|f(x)− fn(x)| dx < ε.

5Note that in (13.5) we have f(xi), but in (13.6) we have g(yi+1).

13.3. BASIC RESULTS 235

Corollary 13.25. If a series of continuous function∞∑n=0

fn(x) converges uniformly on [a, b],

then ∫ b

a

∞∑n=0

fn(x) dx =∞∑n=0

∫ b

afn(x) dx.

Theorem 13.26 (Integration by substitution). If f : [a, b]→ R is continuous and g : [a, b]→R is continuously differentiable and f is a continuous function defined on an interval thatcontains the image g([a, b]) of g, then

∫ b

af(g(x))g′(x) dx =

∫ g(b)

g(a)f(y) dy = F (g(b))− F (g(a)),

where F is an antiderivative of f .

Proof. The second equality follows from the definition of the definite integral. Since,F (g(x))′ = f(g(x))g′(x), the function F (g(x)) us an antiderivative of f(g(x))g′(x) and hence

∫ b

af(g(x))g′(x) dx = F (g(b))− F (g(a)).

Theorem 13.27 (Cauchy-Schwarz Inequality). If f, g : [a, b]→ R are continuous, then

∫ b

a|f(x)g(x)| dx ≤

(∫ b

af2(x) dx

)1/2(∫ b

ag2(x) dx

)1/2

.

Proof. 6 For t ∈ R we have

0 ≤∫ b

a(f(x)+tg(x))2 dx =

∫ b

af2(x) dx︸︷︷︸A

+2t

∫ b

af(x)g(x) dx︸︷︷︸

B

+t2∫ b

ag2(x) dx.︸︷︷︸C

= A+2tB+t2C.

The quadratic function t 7→ A + 2tB + t2C is nonnegative so its discriminant satisfies ∆ =(2B)2 − 4AC ≤ 0, i.e.,

4

(∫ b

af(x)g(x)

)2

− 4

(∫ b

af2(x) dx

)(∫ b

ag2(x) dx

)≤ 0

and the Cauchy-Schwarz inequality follows.

6This proof is very similar to the proof of Theorem 4.11.


13.3.1 Taylor’s theorem with an integral remainder

Recall that if f : (a, b)→ R is (n+ 1)-times differentiable and x, x+ h ∈ (a, b), then

(13.8) f(x+h) = f(x) + f ′(x)h+f ′′(x)

2!h2 + . . .+

f (n)(x)

n!hn+

f (n+1)(x+ θh)

n!(1− θ)nhn+1.

This is Taylor’s theorem with the Cauchy remainder. Here x + θh is some point betweenx and x + h, but we do not know exactly where. Now we will prove a version of Taylor’stheorem with the integral remainder. It will be similar to the Cauchy remainder, but insteadof being evaluated at one point x+ θh of undisclosed location, it will be integrated over thewhole interval from x to x+ h.

Suppose that f : (a, b) → R is (n + 1)-times continuously differentiable.7 Applying theintegration by parts multiple number of times, for x, x+ h ∈ (a, b) we have

f(x+ h)− f(x) =

∫ 1

0

d

dtf(x+ th) dt = h

∫ 1

0f ′(x+ th) dt = −h

∫ 1

0f ′(x+ th)(1− t)′ dt

= −hf ′(x+ th)(1− t)∣∣∣10

+ h

∫ 1

0

d

dtf ′(x+ th)(1− t)′ dt

= f ′(x)h− h2∫ 1

0f ′′(x+ th)(1− t) dt

= f ′(x)h− h2∫ 1

0f ′′(x+ th)

((1− t)2

2

)′dt

= f ′(x)h− h2f ′′(x+ th)(1− t)2

2

∣∣∣10

+ h2∫ 1

0

d

dtf ′′(x+ th)

(1− t)2

2dt

= f ′(x)h+ f ′′(x)h2

2+ h3

∫ 1

0f ′′′(x+ th)

(1− t)2

2dt

= f ′(x)h+ f ′′(x)h2

2− h3

∫ 1

0f ′′′(x+ th)

((1− t)3

3!

)′dt

= f ′(x)h+ f ′′(x)h2

2− h3f ′′′(x+ th)

(1− t)3

3!

∣∣∣10

+ h3∫ 1

0

d

dtf ′′′(x+ th)

(1− t)3

3!dt

= f ′(x)h+ f ′′(x)h2

2+ f ′′′(x)

h3

3!+ h4

∫ 1

0f (4)(x+ th)

(1− t)3

3!dt = · · · =

= f ′(x)h+ f ′′(x)h2

2+ · · ·+ f (n)(x)

hn

n!+hn+1

n!

∫ 1

0f (n+1)(x+ th)(1− t)n dt.

We proved:

Theorem 13.28. If f : (a, b)→ R is (n+ 1)-times continuously differentiable and x, x+ h ∈(a, b), then

f(x+ h) = f(x) + f ′(x)h+ f ′′(x)h2

2+ · · ·+ f (n)(x)

hn

n!+hn+1

n!

∫ 1

0f (n+1)(x+ th)(1− t)n dt.

7This is a stronger assumption than the one in (13.8) which did not require continuity of the derivativef (n+1).

13.4. APPLICATIONS 237

Remark 13.29. It follows from the Mean Value Theorem 13.11(f) that

hn+1

n!

∫ 1

0f (n+1)(x+ th)(1− t)n dt =

hn+1

n!f (n+1)(x+ θh)(1− θ)n dt.

for some θ ∈ (0, 1). This gives (13.8).

13.4 Applications

13.4.1 Wallis’ formula

As an application of the integration by parts we will prove

Theorem 13.30. For n = 0, 1, 2, . . . we have∫ π/2

0sin2n x dx =

(2n)!

22n(n!)2π

2,

∫ π/2

0sin2n+1 x dx =

22n(n!)2

(2n+ 1)!.

Proof. Let an =∫ π/20 sinn x dx. Then a0 = π/2 and a1 = 1. We will find a recursive formula

that will express an+2 in terms of an. We have

an+2 =

∫ π/2

0sinn+1 x sinx dx =

∫ π/2

0sinn+1 x(− cosx)′ dx

= − sinn+1 x cosx∣∣∣π/20−∫ π/2

0(sinn+1 x)′(− cosx) dx

= (n+ 1)

∫ π/2

0sinn x cos2 x dx

= (n+ 1)

∫ π/2

0sinn x(1− sin2 x) dx = (n+ 1)(an − an+2),

an+2 =n+ 1

n+ 2an.

Now a simple induction gives

a2n =1 · 3 · · · (2n− 1)

2 · 4 · · · (2n)

π

2=

1 · 2 · · · (2n)

(2 · 4 · · · (2n))2π

2=

(2n)!

22n(n!)2π

2,

and similarly

a2n+1 =2 · 4 · · · (2n)

1 · 3 · · · (2n+ 1)=

22n(n!)2

(2n+ 1)!.

There is something exciting about Theorem 13.30. For even exponents the integral involvesπ and for odd exponents there is no π in the formula. This observation will be used in theproof of


Theorem 13.31 (Wallis’ formula). limn→∞

22n(n!)2

(2n)!√n

=√π.

Proof. Clearly,

sinn x < sinm x for 0 < x <π

2and 0 ≤ m < n

so the sequence an defined in the proof of Theorem 13.30 is decreasing and hence

n+ 1

n+ 2=an+2

an<an+1

an< 1.

Therefore, an+1/an → 1, and in particular a2n+1/a2n → 1. We have

πa2n+1

a2n= π

22n(n!)2

(2n+ 1)!

22n(n!)2

(2n)!

2

π=

(22n(n!)2

(2n)!√n

)22n

2n+ 1→ π.

Since 2n/(2n+ 1)→ 1, the result follows upon taking the square root.

13.4.2 Stirling’s formula

We will use now Wallis’ formula to prove an approximate formula for n!.

Theorem 13.32 (Stirling’s formula). n! = e−nnn√

2πn(1 + εn), where εn → 0 as n→∞.8

Proof. Clearly n! = e−nnn√

2πn bn for some bn > 0 and we just need to prove that bn → 1.Replacing n! in Wallis’ formula by this expression involving bn we have

22n(e−nnn√

2πn bn)2

(e−2n(2n)2n√

2π · 2n b2n)√n→√π.

Most of the terms in this expression will cancel out and we get that b2n/b2n → 1. If we canprove that bn converges to a positive number g, we will have g2/g = 1, g = 1 and this willcomplete the proof.

Therefore it remains to show that bn converges to a positive number. Note that conver-gence of bn to a positive number is equivalent to convergence of ln bn to a real number which,in turn, in equivalent to convergence of the series 9

(13.9)∞∑n=1

(ln bn − ln bn+1) =∞∑n=1

ln

(bnbn+1

).

Elementary calculation show that

bnbn+1

=n!

e−nnn√

2πn

e−(n+1)(n+ 1)n+1√

2π(n+ 1)

(n+ 1)!= e−1

(1 +

1

n

)n+1/2

8We do not claim that εn > 0.9Indeed, partial sums of the series are ln b1 − ln bn+1.

13.4. APPLICATIONS 239

so

ln

(bnbn+1

)= −1 +

(n+

1

2

)ln

(1 +

1

n

).

In Example 12.33 we proved that

0 < −1 +

(1

x+

1

2

)ln(1 + x) <

x2

12for x > 0.

This inequality with x = 1/n yields

0 < ln

(bnbn+1

)<

1

12n2,

which proves convergence of the series (13.9).

13.4.3 π is irrational

We will use integration by parts to prove the following result.

Theorem 13.33. π2 is irrational. Therefore, π is irrational too.

Proof. In the proof we will need the following lemma.

Lemma 13.34. For n ∈ N let

f(x) =xn(1− x)n

n!

Then the polynomial f has the following properties:

(a) f(x) =1

n!

(cnx

n + cn+1xn+1 + . . .+ c2nx

2n)

=1

n!

2n∑i=n

cixi, where ci ∈ Z,

(b) The derivatives of all orders at 0 and 1 are integers, f (k)(0) ∈ Z, f (k)(1) ∈ Z, k =0, 1, 2, . . .,10

(c) For 0 < x < 1, 0 < f(x) <1

n!.

Proof. (a) follows from the fact that (1 − x)n is a polynomial of degree n with integralcoefficients. To prove (b) observe that f (k)(0) = 0 when k < n or k > 2n and

f (k)(0) =ckk!

n!∈ Z, when n ≤ k ≤ 2n.

Indeed, k!/n! ∈ Z, because k ≥ n and n! appears as a factor of k!. It remains to provethat f (k)(1) ∈ Z. Note that f(1 − x) = f(x) and hence f (k)(x) = (−1)kf (k)(1 − x) sof (k)(1) = (−1)kf (k)(0) ∈ Z which completes the proof of (b).

10Here f (0) = f , that is, the derivative of order zero is just the function itself.


(c) If 0 < x < 1, then 0 < 1− x < 1, and hence 0 < xn(1− x)n < 1. Therefore,

0 < f(x) =xn(1− x)n

n!<

1

n!.

This completes the proof of the lemma.

Now we can complete the proof of the theorem. Suppose to the contrary that π2 is

rational, i.e. π2 =a

bfor some a, b ∈ N.

Consider the function

F (x) = bn(π2nf(x)−π2n−2f (2)(x)+π2n−4f (4)(x)− . . .− (−1)nπ2f (2n−2)(x)+(−1)nf (2n)(x)).

To explain the signs of the last two terms in the sum, note that the general term in the sumis:

(−1)kπ2n−2kf (2k)(x).

When 2k = 2n we get (−1)nf (2n)(x) and when 2k = 2n − 2 we get (−1)n−1π2f (2n−2)(x) =−(−1)nπ2f (2n−2)(x).

Lemma 13.35. F (0) ∈ Z and F (1) ∈ Z.

Proof. The k-th term representing F (x) equals

bn(

(−1)kπ2n−2kf (2k)(x))

= (−1)kbn(π2)n−kf (2k)(x) = (−1)kbn(ab

)n−kf (2k)(x)

= (−1)kbkan−kf (2k)(x).

Since by Lemma 13.34(b), f (2k)(0) ∈ Z and f (2k)(1) ∈ Z, it follows that each term representingF (0) or F (1) is an integer and hence F (0) ∈ Z and F (1) ∈ Z.

Lemma 13.36.

d

dx

(F ′(x) sinπx− πF (x) cosπx

)= π2anf(x) sinπx.

Proof. The product rule and the chain rule gives

d

dx

(F ′(x) sinπx− πF (x) cosπx

)= F ′′(x) sinπx+ πF ′(x) cosπx− πF ′(x) cosπx+ π2F (x) sinπx

= (F ′′(x) + π2F (x)) sinπx = ♥.

Now we can use the definition of F (x) to express F ′′(x) and π2F (x). In the first case weincrease the order of the derivative of f (2k) by 2 and in the second case we increase the

13.5. IMPROPER INTEGRAL 241

exponent at π2n−2k by 2. Since f(x) is a polynomial of degree 2n it follows that f (2n+2)(x) = 0so

♥ =[bn(π2nf (2)(x)− π2n−2f (4)(x) + π2n−4f (6)(x)− . . .− (−1)nπ2f (2n)(x) + (−1)n f (2n+2)(x)︸︷︷︸

0

)+ bn

(π2n+2f(x)− π2nf (2)(x) + π2n−2f (4)(x)− . . .− (−1)nπ4f (2n−2)x+ (−1)nπ2f (2n)x

)]sinπx

= bnπ2n+2f(x) sinπx = bnπ2(ab

)nf(x) sinπx = π2anf(x) sinπx.

That completes the proof of Lemma 13.36.

Dividing both sides of the equality in Lemma 13.36 by π yields

d

dx

(1

πF ′(x) sinπx− F (x) cosπx

)= anπf(x) sinπx.

Taking the integral from 0 to 1 and using the fundamental theorem of calculus gives∫ 1

0anπf(x) sinπx dx =

(1

πF ′(x) sinπx− F (x) cosπx

) ∣∣∣10

= (0− F (1)(−1))− (0− F (0)) = F (1) + F (0) ∈ Z.

On the other hand 0 < sinπx ≤ 1 for 0 < x < 1 so Lemma 13.34(c) yields that 0 <

f(x) sinπx <1

n!for 0 < x < 1 and hence

0 <

∫ 1

0anπf(x) sinπx dx <

anπ

n!.

Since an/n!→ 0 as n→∞, there is n such that anπ/n! < 1, and hence for such n we have

0 <

∫ 1

0anπf(x) sinπx dx < 1.

This however, gives a contradiction since the integral equals F (1) + F (1) ∈ Z and there areno inters greater than 0 and less than 1.

13.5 Improper integral

Definition 13.37. If f : (a, b)→ R is continuous and nonnegative, then we define∫ b

af(x) dx = lim

x→b−F (x)− lim

x→a+F (x),

where F is an antiderivative of f . Similarly, we define the integral in the case in whichf : (a, b)→ R satisfies f ≤ 0.


If f ≥ 0, then F is increasing (because F ′ = f ≥ 0) so the above limits exist. However, itmight happen that

limx→b−

F (x) = +∞ and/or limx→a+

F (x) = −∞.

We adopt the convention that for g ∈ R

+∞− g = +∞, g − (−∞) = +∞, +∞− (−∞) = +∞.

Therefore, if f ≥ 0, then∫ ba f(x) dx is finite or equal +∞. Similarly, if f ≤ 0, then

∫ ba f(x) dx

is finite or −∞.

Example 13.38. ∫ 1

0

1

xdx = ln 1− lim

x→0+lnx = 0− (−∞) = +∞.

Also ∫ ∞0

1

xdx = lim

x→∞lnx− lim

x→0+lnx =∞− (−∞) = +∞.

However, ∫ 1

0

1√xdx = 2

√1− lim

x→0+2√x = 2.

The last example shows that an integral of an unbounded function may be finite.

The above definition is a special case of a more general definition of an improper integral,where the function may change sign.

Definition 13.39. If f : [a, b)→ R is continuous, then we define the improper integral by

(13.10)

∫ b

af(x) dx = lim

t→b−

∫ t

af(x) dx,

provided the limit exists and is finite. In that case we say that the integral converges (con-ditionally). If the limit does not exist or if it has infinite value, we say that the integraldiverges.

That definition applies to cases when b = ∞ or if b is finite, but f does not extend to acontinuous function on [a, b].

Definition 13.40. Similarly we define an improper integral on (a, b]. If f : (a, b)→ R, thenwe fix c ∈ (a, b) and define the improper integral as∫ b

af(x) dx = lim

s→a+

∫ c

sf(x) dx+ lim

t→b−

∫ t

cf(x) dx,

provided both limits exist and are finite.

It is easy to see, that the existence of the limits does not depend on the choice of c.

13.5. IMPROPER INTEGRAL 243

Definition 13.41. We say that the improper integral∫ ba f(x) dx converges absolutely, if∫ b

a |f(x)| dx <∞.11

Easy proof of the next result is left to the reader.

Proposition 13.42. If an improper integral converges absolutely, then it converges condi-tionally.

Example 13.43. Prove that the integral

∫ ∞1

sinx

xdx converges conditionally, but not abso-

lutely.

Proof. First we will prove the conditional convergence. We have∫ t

1

sinx

xdx =

∫ t

1(− cosx)′

1

xdx = −cosx

x

∣∣∣t1−∫ t

a

cosx

x2dx.

−cosx

x

∣∣∣t1

= −cos t

t+ cos 1→ cos 1 as t→∞,

and the integral∫∞1 (cosx)/x2 dx converges, since it converges absolutely∫ t

1

| cosx|x2

dx ≤∫ t

1

dx

x2= −1

t+ 1→ 1 as t→∞.

Now we will prove that the integral does not converge absolutely.12 We have∫ ∞1

| sinx|x

dx ≥∫ nπ

π

| sinx|x

dx =n∑k=2

∫ kπ

(k−1)π

| sinx|x

dx ≥ 2

π

n∑k=2

1

k→∞ as n→∞.

We used here two facts:∫ kπ

(k−1)π| sinx| dx = 2, and

1

x≥ 1

kπfor x ∈ [(k − 1)π, kπ].

The next result provides a powerful method of proving convergence of series.

Theorem 13.44 (Integral Test). If f : [1,∞)→ R is continuous, nonnegative and decreasing,then

∞∑n=1

f(n) <∞ if and only if

∫ ∞1

f(x) dx <∞.

11This definition applies to all the cases, when f is defined on (a, b], [a, b) or (a, b).12The idea is simple. On many intervals | sinx| ≥ a > 0 and hence | sinx|/x > a/x so the fact that∫∞

1| sinx|/x dx =∞ should be related to

∫∞1dx/x =∞.


Proof. Since∫ n

1f(x) dx =

n−1∑k=1

∫ k+1

kf(x) dx, and f(k + 1) ≤

∫ k+1

kf(x) dx ≤ f(k),

we have

f(2) + f(3) + · · ·+ f(n) ≤∫ n

1f(x) dx ≤ f(1) + f(2) + · · ·+ f(n− 1)

and the result follows upon letting n→∞.

Example 13.45. As a straightforward application of the Integral Test we obtain a different

proof of the p-Series Test (Theorem 10.14):∞∑n=1

1

npconverges for p > 1 and diverges for

0 < p ≤ 1. We leave details to the reader.

13.6 Examples


limn→∞

n√

(n+ 1)(n+ 2) · · · (n+ n)

n.

Solution. We have

n√

(n+ 1)(n+ 2) · · · (n+ n)

n= n

√(1 +

1

n

)(1 +

2

n

)· · ·(

1 +n

n

).

The idea is to represent it a a Riemann sum. We have a product, not a sum, but applyingthe logarithm, the product will become a sum. Therefore

limn→∞

n√

(n+ 1)(n+ 2) · · · (n+ n)

n= lim

n→∞exp

(ln n

√(1 +

1

n

)(1 +

2

n

)· · ·(

1 +n

n

))

= exp

(limn→∞

ln n

√(1 +

1

n

)(1 +

2

n

)· · ·(

1 +n

n

))

= exp

(limn→∞

1

n

(ln

(1 +

1

n

)+ ln

(1 +

2

n

)+ . . .+ ln

(1 +

n

n

)))= exp

(∫ 2

1lnx dx

)= exp

(x(lnx− 1)

∣∣∣21

)=

4

e.

2

Example 13.47. Prove that if f : [a, b]→ R is continuous, then

limn→∞

n

√∫ b

a|f(x)|n dx = sup

x∈[a,b]|f(x)|.

13.6. EXAMPLES 245

Proof. Let M = supx∈[a,b] |f(x)|. Then

(13.11) lim supn→∞

n

√∫ b

a|f(x)|n dx ≤ lim sup

n→∞M(b− a)1/n = M.

Since |f | is continuous, the maximum if attained at some point |f(x0)| = M , a ≤ x0 ≤ b.Given ε > 0, it follows from the continuity of |f | that |f(x)| ≥M−ε for all x in some intervalI that contain x0 and hence

(13.12) lim infn→∞

n

√∫ b

a|f(x)|n dx ≥ lim inf

n→∞n

√∫I|f(x)|n dx ≥ lim inf

n→∞(M − ε)|I|1/n = M − ε.

We proved that for any ε > 0,

M − ε ≤ lim infn→∞

n

√∫ b

a|f(x)|n dx ≤ lim sup

n→∞

n

√∫ b

a|f(x)|n dx ≤M.

Therefore, the upper and the lower limits are equal M and the claim follows.13

Example 13.48. Prove that if f : [0, 1]→ R is continuous and∫ 1

0f(x)xn dx = 0 for n = 0, 1, 2, . . .,

then f(x) = 0 for all x ∈ [0, 1].

Proof. The function f is continuous and hence bounded, say |f(x)| ≤ M for all x ∈ [0, 1].According to the Weierstrass Theorem 11.66, there is a polynomial pε(x) such that

|f(x)− pε(x)| ≤ ε

Mfor all x ∈ [0, 1].

If pε(x) =∑n

i=0 aixi, then∫ 1

0f(x)pε(x) dx =

n∑i=0

ai

∫ 1

0f(x)xi dx = 0.

Therefore

0 ≤∫ 1

0f2(x) dx =

∫ 1

0f(x)(f(x)− pε(x)) dx+

∫ 1

0f(x)pε(x) dx︸︷︷︸

0

=

∫ 1

0f(x)(f(x)− pε(x)) dx ≤

∫ 1

0|f(x)||f(x)− pε(x)| dx ≤

∫ 1

0M

ε

Mdx = ε.

We proved that for any ε > 0 we have 0 ≤∫ 10 f

2(x) dx ≤ ε. Therefore, the integral of f2

equals zero and hence f(x) = 0 for all x ∈ [0, 1].13This proof shows how useful the notions of the upper and the lower limits are. One is tempting to replace

lim sup and lim inf in (13.11) and (13.12) simply by lim. This is what many students do. We could do it if wewould know that the limit exists. However, we do not know yet that the limit exists and the use of lim is notpermitted.


Example 13.49. Prove that the integral

∫ ∞−∞

e−x2dx is finite.

Proof. A problem is that we cannot find a formula for the antiderivative of e−x2. Therefore

we need to estimate e−x2

by a larger function whose antiderivative we can find. Inequalityx2 ≥ 2|x| − 1, implies that e−x

2 ≤ e−2|x|+1. It is easy to check that

F (x) =

−1

2e−2x+1, if x ≥ 0,

12e

2x+1 − e, if x < 0,

is an antiderivative of e−2|x|+1. Since∫ ∞−∞

e2|x|+1 dx = limx→∞

(−1

2e−2x+1

)− limx→−∞

(1

2e2x+1 − e

)= e <∞,

it follows that∫∞−∞ e

−x2 dx <∞.

Remark 13.50. Although, it is impossible to find closed formula for the antiderivative ofe−x

2, one can prove that

1√π

∫ ∞−∞

e−x2dx = 1.

The function e−x2/√π is bell-shaped, and it is is known in probability as a normal distribution.

The area under the graph equal 1 represents the total probability equal 1. It is one of themost important functions in applications.

Example 13.51. Evaluate the integral

∫ ∞0

e−x cosx dx.

Solution. Since

|e−x cosx| ≤ e−x and

∫ ∞0

e−x dx = −e−x∣∣∣∞0

= 1 <∞,

the integral we want to evaluate converges absolutely. Let F (x) =∫ x0 e−t cos t dt. Integration

by parts yields

F (x) =

∫ x

0e−t(sin t)′ dt = e−t sin t

∣∣∣x0−∫ x

0(e−t)′ sin t dt = e−x sinx+

∫ x

0e−t sin t dt

= e−x sinx+

∫ x

0e−t(− cos t)′ dt = e−x sinx+ e−t(− cos t)

∣∣∣x0−∫ x

0(e−t)′(− cos t) dt

= e−x sinx− e−x cosx+ 1−∫ x

0e−t cos t dt︸︷︷︸F (x)

.

Therefore,

F (x) =1

2+

1

2e−x(sinx− cosx),

∫ ∞0

e−x cosx dx = limx→∞

F (x) =1

2.

2

13.6. EXAMPLES 247

Example 13.52. Given that

∫ ∞−∞

e−x2dx =

√π, find the limit

limx→∞

( ∞∑n=0

(−1)nx2n+1

(2n+ 1)n!

).

Solution. A power series expansion of ex implies that

e−t2

=∞∑n=0

(−1)nt2n

n!, with the radius of convergence R =∞.

The series converges uniformly on [−x, x] for every x > 0 and hence Corollary 13.25 allowsus to integrate the series term by term and hence,∫ x

0e−t

2dt =

∫ x

0

∞∑n=0

(−1)nt2n

n!dt =

∞∑n=0

(−1)nx2n+1

(2n+ 1)n!.

Accordingly,

limx→∞

( ∞∑n=0

(−1)nx2n+1

(2n+ 1)n!

)= lim

x→∞

∫ x

0e−t

2dt =

∫ ∞0

e−t2dx =

1

2

∫ ∞−∞

e−t2dt =

√π

2.

2

Example 13.53. Evaluate the integral

I =

∫ ∞0

arctan(πx)− arctanx

xdx.

Solution. We have

I = limt→∞

(∫ t

0

arctan(πx)

xdx︸︷︷︸

G(t)

−∫ t

0

arctanx

xdx︸︷︷︸

F (t)

)

The integrals seem to be improper that 0, but they are proper, because

limx→0+

arctan(πx)

x= π and lim

x→0+

arctanx

x= 1

so both functions extend to continuous functions on the interval [0, t]. Using the substitutiony = πx, (dy = π dx) we have

G(t) =

∫ t

0

arctan(πx)

πxπ dx =

∫ πt

0

arctan y

ydy = F (πt).

Therefore,

I = limt→∞

(F (πt)− F (t)) = ♥.


The function g(t) = F (πt)−F (t) is increasing, because it has positive derivative. Indeed, theFundamental Theorem of Calculus yields

g′(t) = F ′(πt)− F ′(t) =arctan(πt)− arctan t

t> 0.

Since t 7→ F (πt) − F (t) is increasing, the limit ♥ exists and in order to find the limit, itsuffices to take the sequence t = πn, n→∞. We have

♥ = limn→∞

(F (πn+1)− F (πn)), F (πn+1)− F (πn) =

∫ πn+1

πn

arctanx

xdx.

arctan(πn)

x≤ arctanx

x≤ arctan(πn+1)

x, for x ∈ [πn, πn+1].

Therefore,

arctan(πn)

∫ πn+1

πn

dx

x︸︷︷︸arctan(πn) lnπ

≤∫ πn+1

πn

arctanx

xdx ≤ arctan(πn+1)

∫ πn+1

πn

dx

x︸︷︷︸arctan(πn) lnπ

.

Since arctanx → π/2 as x → ∞, both the integrals on the left and on the right converge toπ2 lnπ and hence

I = limn→∞

∫ πn+1

πn

arctanx

xdx =

π lnπ

2.

2

Example 13.54. 14 Let f : [0,∞)→ R be continuous, positive and decreasing. Prove that if∫∞0 f(x) dx <∞, then

(13.13) limt→0+

t

∞∑n=1

f(tn) =

∫ ∞0

f(x) dx.

Use this fact to evaluate the limit

(13.14) limt→0+

∞∑n=1

t

1 + n2t2.

Solution. Adding up the inequalities∫ t(k+1)

tkf(x) dx ≤ tf(tk) ≤

∫ tk

t(k−1)f(x) dx

from k = 1 to k = n we have∫ t(n+1)

tf(x) dx ≤ t

n∑k=1

f(kt) ≤∫ tn

0f(x) dx ≤

∫ ∞0

f(x) dx <∞.

14Note that the sum on the left hand side of (13.13) is a sort of ‘infinite Riemann sum’ and it is natural toexpect that it converges to the integral as t→ 0+. However, we only proved such a result for finite Riemannsums (Corollary 13.19) and a rigorous proof of (13.13) is required.

13.6. EXAMPLES 249

This proves convergence of the series t∑∞

k=1 f(kt), because partial sums are increasing andbounded by

∫∞0 f(x) dx. Letting n→∞ we get∫ ∞

tf(x) dx ≤ t

∞∑k=1

f(kt) ≤∫ ∞0

f(x) dx.

Now passing to the limit as t → 0+ yields (13.13). It remians to evaluate the limit (13.14).Applying (13.13) to f(x) = 1/(1 + x2) yields

limt→0+

∞∑n=1

t

1 + n2t2= lim

t→0+t

∞∑n=1

f(nt) =

∫ ∞0

f(x) dx = arctanx∣∣∣∞0

=π

2.

2

introduction to analysishajlasz/teaching/math1530fall2018/selection.pdfanother example, less obvious...

Documents