
Analysis I:

Calculus of One Real Variable

Peter Philip∗

Lecture Notes

Created for the Class of Winter Semester 2015/2016 at LMU Munich†

January 19, 2021

Contents

1 Foundations: Mathematical Logic and Set Theory 5

1.1 Introductory Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2 Propositional Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2.1 Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2.2 Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.2.3 Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3 Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.4 Predicate Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2 Functions and Relations 23

2.1 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.2 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3 Natural Numbers, Induction, and the Size of Sets 36

3.1 Induction and Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.2 Cardinality: The Size of Sets . . . . . . . . . . . . . . . . . . . . . . . . . 43

4 Real Numbers 46

4.1 The Real Numbers as a Complete Totally Ordered Field . . . . . . . . . 46

4.2 Important Subsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

∗E-Mail: [email protected]

†Resources used in the preparation of this text include [Kon04, Kun80, Wal04].


5 Complex Numbers 49

5.1 Definition and Basic Arithmetic . . . . . . . . . . . . . . . . . . . . . . . 49

5.2 Sign and Absolute Value (Modulus) . . . . . . . . . . . . . . . . . . . . . 51

5.3 Sums and Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

5.4 Binomial Coefficients and Binomial Theorem . . . . . . . . . . . . . . . . 55

6 Polynomials 59

6.1 Arithmetic of K-Valued Functions . . . . . . . . . . . . . . . . . . . . . . 59

6.2 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

7 Limits and Convergence of Real and Complex Numbers 63

7.1 Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

7.2 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

7.2.1 Definitions and First Examples . . . . . . . . . . . . . . . . . . . 72

7.2.2 Continuity, Sequences, and Function Arithmetic . . . . . . . . . . 74

7.2.3 Bounded, Closed, and Compact Sets . . . . . . . . . . . . . . . . 76

7.2.4 Intermediate Value Theorem . . . . . . . . . . . . . . . . . . . . . 80

7.2.5 Inverse Functions, Existence of Roots, Exponential Function, Logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

7.3 Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

7.3.1 Definition and Convergence . . . . . . . . . . . . . . . . . . . . . 91

7.3.2 Convergence Criteria . . . . . . . . . . . . . . . . . . . . . . . . . 93

7.3.3 Absolute Convergence and Rearrangements . . . . . . . . . . . . . 97

7.3.4 b-Adic Representations of Real Numbers . . . . . . . . . . . . . . 102

8 Convergence of K-Valued Functions 104

8.1 Pointwise and Uniform Convergence . . . . . . . . . . . . . . . . . . . . . 104

8.2 Power Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

8.3 Exponential Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

8.4 Trigonometric Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

8.5 Polar Form of Complex Numbers, Fundamental Theorem of Algebra . . . 120

9 Differential Calculus 124

9.1 Definition of Differentiability and Rules . . . . . . . . . . . . . . . . . . . 124


9.2 Higher Order Derivatives and the Sets C^k . . . . . . . . . . . . . . . . . 130

9.3 Mean Value Theorem, Monotonicity, and Extrema . . . . . . . . . . . . . 131

9.4 L’Hôpital’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

9.5 Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

10 The Riemann Integral on Intervals in R 141

10.1 Definition and Simple Properties . . . . . . . . . . . . . . . . . . . . . . 141

10.2 Important Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

10.2.1 Fundamental Theorem of Calculus . . . . . . . . . . . . . . . . . 152

10.2.2 Integration by Parts Formula . . . . . . . . . . . . . . . . . . . . 155

10.2.3 Change of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 155

10.3 Application: Taylor’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . 156

10.4 Improper Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

A Axiomatic Set Theory 165

A.1 Motivation, Russell’s Antinomy . . . . . . . . . . . . . . . . . . . . . . . 165

A.2 Set-Theoretic Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

A.3 The Axioms of Zermelo-Fraenkel Set Theory . . . . . . . . . . . . . . . . 168

A.3.1 Existence, Extensionality, Comprehension . . . . . . . . . . . . . 169

A.3.2 Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

A.3.3 Pairing, Union, Replacement . . . . . . . . . . . . . . . . . . . . . 172

A.3.4 Infinity, Ordinals, Natural Numbers . . . . . . . . . . . . . . . . . 175

A.3.5 Power Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

A.3.6 Foundation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

A.4 The Axiom of Choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

A.5 Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189

A.5.1 Relations to Injective, Surjective, and Bijective Maps; Schröder-Bernstein Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 189

A.5.2 Finite Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

A.5.3 Power Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

B Associativity and Commutativity 201

B.1 Associativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

B.2 Commutativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204


C Algebraic Structures 207

C.1 Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

C.2 Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

C.3 Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

D Construction of the Real Numbers 213

D.1 Natural Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

D.2 Interlude: Orders on Groups . . . . . . . . . . . . . . . . . . . . . . . . . 218

D.3 Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

D.4 Rational Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

D.5 Real Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

D.6 Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

E Series: Additional Material 237

E.1 Riemann Rearrangement Theorem . . . . . . . . . . . . . . . . . . . . . . 237

E.2 b-Adic Representations of Real Numbers . . . . . . . . . . . . . . . . . . 239

F Cardinality of R and Some Related Sets 244

G Partial Fraction Decomposition 248

H Irrationality of e and π 253

H.1 Irrationality of e . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253

H.2 Irrationality of π . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254

I Trigonometric Functions 256

I.1 Additional Trigonometric Formulas . . . . . . . . . . . . . . . . . . . . . 256

J Differential Calculus 256

J.1 Continuous, But Nowhere Differentiable Functions . . . . . . . . . . . . . 256

References 259


1 Foundations: Mathematical Logic and Set Theory

1.1 Introductory Remarks

The task of mathematics is to establish the truth or falsehood of (formalizable) statements using rigorous logic, and to provide methods for the solution of classes of (e.g. applied) problems, ideally including rigorous logical proofs verifying the validity of the methods (proofs that the method under consideration will, indeed, provide a correct solution).

The topic of this class is calculus, which is short for infinitesimal calculus, usually understood (as it is here) to mean differential and integral calculus of real and complex numbers (more generally, calculus may refer to any method or system of calculation guided by the symbolic manipulation of expressions; we will briefly touch on another example in Sec. 1.2 below). In that sense, calculus is the beginning part of the broader field of (mathematical) analysis, the section of mathematics concerned with the notion of a limit (for us, the most important examples will be limits of sequences (Def. 7.1 below) and limits of functions (Def. 8.17 below)).

Before we can properly define our first limit, however, some preparatory work is still needed. In modern mathematics, the objects under investigation are almost always so-called sets. So one aims at deriving (i.e. proving) true (and interesting and useful) statements about sets from other statements about sets known or assumed to be true. Such a derivation or proof means applying logical rules that guarantee the truth of the derived (i.e. proved) statement.

However, unfortunately, a proper definition of the notion of set is not easy, and neither is an appropriate treatment of logic and proof theory. Here, we will only be able to briefly touch on the bare necessities from logic and set theory needed to proceed to the core matter of this class. We begin with logic in Sec. 1.2, followed by set theory in Sec. 1.3, combining both in Sec. 1.4. The interested student can find an introductory presentation of axiomatic set theory in Appendix A and he/she should consider taking a separate class on set theory, logic, and proof theory at a later time.

1.2 Propositional Calculus

1.2.1 Statements

Mathematical logic is a large field in its own right and, as indicated above, a thorough introduction is beyond the scope of this class – the interested reader may refer to [EFT07], [Kun12], and references therein. Here, we will just introduce some basic concepts using common English (rather than formal symbolic languages – a concept touched on in Sec. A.2 of the Appendix and more thoroughly explained in books like [EFT07]).

As mentioned before, mathematics establishes the truth or falsehood of statements. By a statement or proposition we mean any sentence (any sequence of symbols) that can reasonably be assigned a truth value, i.e. a value of either true, abbreviated T, or false, abbreviated F. The following example illustrates the difference between statements and sentences that are not statements:

Example 1.1. (a) Sentences that are statements:

Every dog is an animal. (T)

Every animal is a dog. (F)

The number 4 is odd. (F)

2 + 3 = 5. (T)

√2 < 0. (F)

x + 1 > 0 holds for each natural number x. (T)

(b) Sentences that are not statements:

Let’s study calculus!

Who are you?

3 · 5 + 7.

x + 1 > 0.

All natural numbers are green.

The fourth sentence in Ex. 1.1(b) is not a statement, as it cannot be said to be either true or false without any further knowledge of x. The fifth sentence in Ex. 1.1(b) is not a statement as it lacks any meaning and can, hence, not be either true or false. It would become a statement if given a definition of what it means for a natural number to be green.

1.2.2 Logical Operators

The next step now is to combine statements into new statements using logical operators, where the truth value of the combined statements depends on the truth values of the original statements and on the type of logical operator facilitating the combination.

The simplest logical operator is negation, denoted ¬. It is actually a so-called unary operator, i.e. it does not combine statements, but is merely applied to one statement. For example, if A stands for the statement “Every dog is an animal.”, then ¬A stands for the statement “Not every dog is an animal.”; and if B stands for the statement “The number 4 is odd.”, then ¬B stands for the statement “The number 4 is not odd.”, which can also be expressed as “The number 4 is even.”

To completely understand the action of a logical operator, one usually writes what is known as a truth table. For negation, the truth table is

A   ¬A
T   F
F   T

(1.1)

that means if the input statement A is true, then the output statement ¬A is false; if the input statement A is false, then the output statement ¬A is true.

We now proceed to discuss binary logical operators, i.e. logical operators combining precisely two statements. The following four operators are essential for mathematical reasoning:

Conjunction: A and B, usually denoted A ∧ B.

Disjunction: A or B, usually denoted A ∨ B.

Implication: A implies B, usually denoted A ⇒ B.

Equivalence: A is equivalent to B, usually denoted A ⇔ B.

Here is the corresponding truth table:

A   B   A ∧ B   A ∨ B   A ⇒ B   A ⇔ B
T   T     T       T       T       T
T   F     F       T       F       F
F   T     F       T       T       F
F   F     F       F       T       T

(1.2)

When first seen, some of the assignments of truth values in (1.2) might not be completely intuitive, due to the fact that logical operators are often used somewhat differently in common English. Let us consider each of the four logical operators of (1.2) in sequence:

For the use in subsequent examples, let A1, . . . , A6 denote the six statements from Ex. 1.1(a).

Conjunction: Most likely the easiest of the four, basically identical to common language use: A ∧ B is true if, and only if, both A and B are true. For example, using Ex. 1.1(a), A1 ∧ A4 is the statement “Every dog is an animal and 2 + 3 = 5.”, which is true since both A1 and A4 are true. On the other hand, A1 ∧ A3 is the statement “Every dog is an animal and the number 4 is odd.”, which is false, since A3 is false.

Disjunction: The disjunction A ∨ B is true if, and only if, at least one of the statements A, B is true. Here one already has to be a bit careful – A ∨ B defines the inclusive or, whereas “or” in common English is often understood to mean the exclusive or (which is false if both input statements are true). For example, using Ex. 1.1(a), A1 ∨ A4 is the statement “Every dog is an animal or 2 + 3 = 5.”, which is true since both A1 and A4 are true. The statement A1 ∨ A3, i.e. “Every dog is an animal or the number 4 is odd.” is also true, since A1 is true. However, the statement A2 ∨ A5, i.e. “Every animal is a dog or √2 < 0.” is false, as both A2 and A5 are false.

As you will have noted in the above examples, logical operators can be applied to combine statements that have no obvious relation in content. While this might seem strange, introducing content-related restrictions is unnecessary as well as undesirable, since it is often not clear which seemingly unrelated statements might suddenly appear in a common context in the future. The same occurs when considering implications and equivalences, where it might seem even more obscure at first.

Implication: Instead of A implies B, one also says if A then B, B is a consequence of A, B is concluded or inferred from A, A is sufficient for B, or B is necessary for A. The implication A ⇒ B is always true, except if A is true and B is false. At first glance, it might be surprising that A ⇒ B is defined to be true for A false and B true; however, there are many examples of incorrect statements implying correct statements. For instance, squaring the (false) equality of integers −1 = 1 implies the (true) equality of integers 1 = 1. However, as with conjunction and disjunction, it is perfectly valid to combine statements without any obvious relation in content: For example, using Ex. 1.1(a), the statement A1 ⇒ A6, i.e. “Every dog is an animal implies x + 1 > 0 holds for each natural number x.” is true, since A6 is true, whereas the statement A4 ⇒ A2, i.e. “2 + 3 = 5 implies every animal is a dog.” is false, as A4 is true and A2 is false.

Of course, the implication A ⇒ B is not really useful in situations where the truth values of both A and B are already known. Rather, in a typical application, one tries to establish the truth of A to prove the truth of B (a strategy that will fail if A happens to be false).

Example 1.2. Suppose we know Sasha to be a member of a group of students, taking a class in Analysis. Then the statement A “Sasha has taken a class in Analysis before.” implies the statement B “There is at least one student in the group, who has taken the class before”. A priori, we might not know if Sasha has taken the Analysis class before, but if we can establish that Sasha has, indeed, taken the class before, then we also know B to be true. If we find Sasha to be taking the class for the first time, then we do not know whether B is true or false.

—

Equivalence: A ⇔ B means A is true if, and only if, B is true. Once again, using input statements from Ex. 1.1(a), we see that A1 ⇔ A4, i.e. “Every dog is an animal is equivalent to 2 + 3 = 5.”, is true as well as A2 ⇔ A3, i.e. “Every animal is a dog is equivalent to the number 4 is odd.”. On the other hand, A4 ⇔ A5, i.e. “2 + 3 = 5 is equivalent to √2 < 0.”, is false.

Analogous to the situation of implications, A ⇔ B is not really useful if the truth values of both A and B are known a priori, but can be a powerful tool to prove B to be true or false by establishing the truth value of A. It is obviously more powerful than the implication as illustrated by the following example (compare with Ex. 1.2):

Example 1.3. Suppose we know Sasha has obtained the highest score among the students registered for the Analysis class. Then the statement A “Sasha has taken the Analysis class before.” is equivalent to the statement B “The student with the highest score has taken the class before.” As in Ex. 1.2, if we can establish Sasha to have taken the class before, then we also know B to be true. However, in contrast to Ex. 1.2, if we find Sasha to have taken the class for the first time, then we know B to be false.

Remark 1.4. In computer science, the truth value T is often coded as 1 and the truth value F is often coded as 0.
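The truth table (1.2) can also be reproduced mechanically. The following is a minimal Python sketch that is not part of the original notes: the helper names implies, iff, and tv are hypothetical, and Python's True and False are assumed to stand in for T and F.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    # A ⇒ B is false only if A is true and B is false, cf. (1.2).
    return (not a) or b

def iff(a: bool, b: bool) -> bool:
    # A ⇔ B is true exactly when A and B have the same truth value.
    return a == b

def tv(x: bool) -> str:
    # Render a Python bool as the symbol T or F used in the text.
    return "T" if x else "F"

# Rows in the same order as in (1.2): (T,T), (T,F), (F,T), (F,F).
rows = [
    (a, b, a and b, a or b, implies(a, b), iff(a, b))
    for a, b in product([True, False], repeat=2)
]

print("A B and or => <=>")
for row in rows:
    print(" ".join(tv(x) for x in row))
```

In line with Rem. 1.4, Python evaluates int(True) to 1 and int(False) to 0.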


1.2.3 Rules

Note that the expressions in the first row of the truth table (1.2) (e.g. A ∧ B) are not statements in the sense of Sec. 1.2.1, as they contain the statement variables (also known as propositional variables) A or B. However, the expressions become statements if all statement variables are substituted with actual statements. We will call expressions of this form propositional formulas. Moreover, if a truth value is assigned to each statement variable of a propositional formula, then this uniquely determines the truth value of the formula. In other words, the truth value of the propositional formula can be calculated from the respective truth values of its statement variables – a first justification for the name propositional calculus.

Example 1.5. (a) Consider the propositional formula (A ∧ B) ∨ (¬B). Suppose A is true and B is false. The truth value of the formula is obtained according to the following truth table:

A   B   A ∧ B   ¬B   (A ∧ B) ∨ (¬B)
T   F     F     T          T

(1.3)

(b) The propositional formula A ∨ (¬A), also known as the law of the excluded middle, has the remarkable property that its truth value is T for every possible choice of truth values for A:

A   ¬A   A ∨ (¬A)
T   F       T
F   T       T

(1.4)

Formulas with this property are of particular importance.

Definition 1.6. A propositional formula is called a tautology or universally true if, and only if, its truth value is T for all possible assignments of truth values to all the statement variables it contains.
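Since a propositional formula in n statement variables admits only 2^n assignments of truth values, Def. 1.6 can be checked by exhaustive enumeration. The following Python sketch illustrates this under the assumption that a formula is modeled as a Python function of n boolean arguments; the names is_tautology, excluded_middle, and example_1_5a are hypothetical and do not occur in the notes.

```python
from itertools import product
from typing import Callable

def is_tautology(formula: Callable[..., bool], n: int) -> bool:
    # Try all 2**n assignments of truth values to the n statement variables.
    return all(formula(*values) for values in product([True, False], repeat=n))

def excluded_middle(a: bool) -> bool:
    # A ∨ (¬A), cf. (1.4)
    return a or (not a)

def example_1_5a(a: bool, b: bool) -> bool:
    # (A ∧ B) ∨ (¬B), cf. (1.3)
    return (a and b) or (not b)

print(is_tautology(excluded_middle, 1))  # the law of the excluded middle
print(is_tautology(example_1_5a, 2))     # fails for A false, B true
```

The second formula has truth value T for the assignment of Ex. 1.5(a), but it is not a tautology, as the assignment A false, B true yields F.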

Notation 1.7. We write φ(A1, . . . , An) if, and only if, the propositional formula φ contains precisely the n statement variables A1, . . . , An.

Definition 1.8. The propositional formulas φ(A1, . . . , An) and ψ(A1, . . . , An) are called equivalent if, and only if, φ(A1, . . . , An) ⇔ ψ(A1, . . . , An) is a tautology.

Lemma 1.9. The propositional formulas φ(A1, . . . , An) and ψ(A1, . . . , An) are equivalent if, and only if, they have the same truth value for all possible assignments of truth values to A1, . . . , An.

Proof. If φ(A1, . . . , An) and ψ(A1, . . . , An) are equivalent and Ai is assigned the truth value ti, i = 1, . . . , n, then φ(A1, . . . , An) ⇔ ψ(A1, . . . , An) being a tautology implies it has truth value T. From (1.2) we see that either φ(A1, . . . , An) and ψ(A1, . . . , An) both have truth value T or they both have truth value F.

If, on the other hand, we know φ(A1, . . . , An) and ψ(A1, . . . , An) have the same truth value for all possible assignments of truth values to A1, . . . , An, then, given such an assignment, either φ(A1, . . . , An) and ψ(A1, . . . , An) both have truth value T or both have truth value F, i.e. φ(A1, . . . , An) ⇔ ψ(A1, . . . , An) has truth value T in each case, showing it is a tautology. □

For all logical purposes, two equivalent formulas are exactly the same – it does not matter if one uses one or the other. The following theorem provides some important equivalences of propositional formulas. As too many parentheses tend to make formulas less readable, we first introduce some precedence conventions for logical operators:

Convention 1.10. ¬ takes precedence over ∧, ∨, which take precedence over ⇒, ⇔. So, for example,

(A ∨ ¬B ⇒ ¬B ∧ ¬A) ⇔ ¬C ∧ (A ∨ ¬D)

is the same as

((A ∨ (¬B)) ⇒ ((¬B) ∧ (¬A))) ⇔ ((¬C) ∧ (A ∨ (¬D))).

Theorem 1.11. (a) (A ⇒ B) ⇔ ¬A ∨ B. This means one can actually define implication via negation and disjunction.

(b) (A ⇔ B) ⇔ ((A ⇒ B) ∧ (B ⇒ A)), i.e. A and B are equivalent if, and only if, A is both necessary and sufficient for B. One also calls the implication B ⇒ A the converse of the implication A ⇒ B. Thus, A and B are equivalent if, and only if, both A ⇒ B and its converse hold true.

(c) Commutativity of Conjunction: A ∧ B ⇔ B ∧ A.

(d) Commutativity of Disjunction: A ∨ B ⇔ B ∨ A.

(e) Associativity of Conjunction: (A ∧ B) ∧ C ⇔ A ∧ (B ∧ C).

(f) Associativity of Disjunction: (A ∨ B) ∨ C ⇔ A ∨ (B ∨ C).

(g) Distributivity I: A ∧ (B ∨ C) ⇔ (A ∧ B) ∨ (A ∧ C).

(h) Distributivity II: A ∨ (B ∧ C) ⇔ (A ∨ B) ∧ (A ∨ C).

(i) De Morgan’s Law I: ¬(A ∧ B) ⇔ ¬A ∨ ¬B.

(j) De Morgan’s Law II: ¬(A ∨ B) ⇔ ¬A ∧ ¬B.

(k) Double Negative: ¬¬A ⇔ A.

(l) Contraposition: (A ⇒ B) ⇔ (¬B ⇒ ¬A).

Proof. Each equivalence is proved by providing a truth table and using Lem. 1.9.

(a):

A   B   ¬A   A ⇒ B   ¬A ∨ B
T   T   F      T        T
T   F   F      F        F
F   T   T      T        T
F   F   T      T        T

(b) – (h): Exercise.

(i):

A   B   ¬A   ¬B   A ∧ B   ¬(A ∧ B)   ¬A ∨ ¬B
T   T   F    F      T        F          F
T   F   F    T      F        T          T
F   T   T    F      F        T          T
F   F   T    T      F        T          T

(j): Exercise.

(k):

A   ¬A   ¬¬A
T   F     T
F   T     F

(l):

A   B   ¬A   ¬B   A ⇒ B   ¬B ⇒ ¬A
T   T   F    F      T         T
T   F   F    T      F         F
F   T   T    F      T         T
F   F   T    T      T         T

Having checked all the rules completes the proof of the theorem. □
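The truth table method of the above proof, combined with Lem. 1.9, lends itself to a mechanical check. The following Python sketch is an illustration only, not part of the original notes: the names IMPLIES, implies, equivalent, and rules are hypothetical, and implication is deliberately read off the table (1.2) rather than defined via ¬A ∨ B, so that checking (a) is not circular.

```python
from itertools import product

# Implication as given by the truth table (1.2).
IMPLIES = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}

def implies(a, b):
    return IMPLIES[(a, b)]

def equivalent(phi, psi, n):
    # Lem. 1.9: phi and psi are equivalent if, and only if, they have
    # the same truth value under every assignment to the n variables.
    return all(phi(*v) == psi(*v) for v in product([True, False], repeat=n))

# A selection of the rules of Th. 1.11 as (left side, right side, n).
rules = {
    "(a)": (lambda a, b: implies(a, b), lambda a, b: (not a) or b, 2),
    "(b)": (lambda a, b: a == b, lambda a, b: implies(a, b) and implies(b, a), 2),
    "(g)": (lambda a, b, c: a and (b or c), lambda a, b, c: (a and b) or (a and c), 3),
    "(h)": (lambda a, b, c: a or (b and c), lambda a, b, c: (a or b) and (a or c), 3),
    "(i)": (lambda a, b: not (a and b), lambda a, b: (not a) or (not b), 2),
    "(j)": (lambda a, b: not (a or b), lambda a, b: (not a) and (not b), 2),
    "(k)": (lambda a: not (not a), lambda a: a, 1),
    "(l)": (lambda a, b: implies(a, b), lambda a, b: implies(not b, not a), 2),
}

results = {name: equivalent(phi, psi, n) for name, (phi, psi, n) in rules.items()}
print(results)
```

As a sanity check that the test is not vacuous, an implication is not equivalent to its converse: equivalent(lambda a, b: implies(a, b), lambda a, b: implies(b, a), 2) evaluates to False.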

The importance of the rules provided by Th. 1.11 lies in their providing proof techniques, i.e. methods for establishing the truth of statements from statements known or assumed to be true. The rules of Th. 1.11 will be used frequently in proofs throughout this class.

Remark 1.12. Another important proof technique is the so-called proof by contradiction, also called indirect proof. It is based on the observation, called the principle of contradiction, that A ∧ ¬A is always false:

A   ¬A   A ∧ ¬A
T   F      F
F   T      F

(1.5)

Thus, one possibility of proving a statement B to be true is to show ¬B ⇒ A ∧ ¬A for some arbitrary statement A. Since the right-hand side of the implication is false, the left-hand side must also be false, proving B is true.

—

Two more rules we will use regularly in subsequent proofs are the so-called transitivity of implication and the transitivity of equivalence (we will encounter equivalence again in the context of relations in Sec. 1.3 below). In preparation for the transitivity rules, we generalize implication to propositional formulas:

Definition 1.13. In generalization of the implication operator defined in (1.2), we say the propositional formula φ(A1, . . . , An) implies the propositional formula ψ(A1, . . . , An) (denoted φ(A1, . . . , An) ⇒ ψ(A1, . . . , An)) if, and only if, each assignment of truth values to the A1, . . . , An that makes φ(A1, . . . , An) true, makes ψ(A1, . . . , An) true as well.

Theorem 1.14. (a) Transitivity of Implication: (A ⇒ B) ∧ (B ⇒ C) ⇒ (A ⇒ C).

(b) Transitivity of Equivalence: (A ⇔ B) ∧ (B ⇔ C) ⇒ (A ⇔ C).

Proof. According to Def. 1.13, the rules can be verified by providing truth tables that show that, for all possible assignments of truth values to the propositional formulas on the left-hand side of the implications, either the left-hand side is false or both sides are true.

(a):

A   B   C   A ⇒ B   B ⇒ C   (A ⇒ B) ∧ (B ⇒ C)   A ⇒ C
T   T   T     T       T             T              T
T   F   T     F       T             F              T
F   T   T     T       T             T              T
F   F   T     T       T             T              T
T   T   F     T       F             F              F
T   F   F     F       T             F              F
F   T   F     T       F             F              T
F   F   F     T       T             T              T

(b):

A   B   C   A ⇔ B   B ⇔ C   (A ⇔ B) ∧ (B ⇔ C)   A ⇔ C
T   T   T     T       T             T              T
T   F   T     F       F             F              T
F   T   T     F       T             F              F
F   F   T     T       F             F              F
T   T   F     T       F             F              F
T   F   F     F       T             F              F
F   T   F     F       F             F              T
F   F   F     T       T             T              T

Having checked both rules, the proof is complete. □
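The criterion of Def. 1.13 (every assignment making the left-hand side true also makes the right-hand side true) can be tested directly by enumeration. The following Python sketch is an illustration under the assumption that formulas are modeled as boolean functions; the names implies and formula_implies are hypothetical.

```python
from itertools import product

def implies(a, b):
    # A ⇒ B is false only if A is true and B is false, cf. (1.2).
    return (not a) or b

def formula_implies(phi, psi, n):
    # Def. 1.13: each assignment that makes phi true makes psi true as well.
    return all(psi(*v) for v in product([True, False], repeat=n) if phi(*v))

# Th. 1.14(a): (A ⇒ B) ∧ (B ⇒ C) implies A ⇒ C.
transitivity_of_implication = formula_implies(
    lambda a, b, c: implies(a, b) and implies(b, c),
    lambda a, b, c: implies(a, c),
    3,
)

# Th. 1.14(b): (A ⇔ B) ∧ (B ⇔ C) implies A ⇔ C.
transitivity_of_equivalence = formula_implies(
    lambda a, b, c: (a == b) and (b == c),
    lambda a, b, c: a == c,
    3,
)

# The reverse direction of (a) does not hold (take A true, B false, C true).
reverse_of_a = formula_implies(
    lambda a, b, c: implies(a, c),
    lambda a, b, c: implies(a, b) and implies(b, c),
    3,
)

print(transitivity_of_implication, transitivity_of_equivalence, reverse_of_a)
```

The failing reverse direction corresponds to the second row of the truth table for (a), where A ⇒ C is true while (A ⇒ B) ∧ (B ⇒ C) is false.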

Definition and Remark 1.15. A proof of the statement B is a finite sequence of statements A1, A2, . . . , An such that A1 is true; for 1 ≤ i < n, Ai implies Ai+1, and An implies B. If there exists a proof for B, then Th. 1.14(a) guarantees that B is true.

Remark 1.16. Principle of Duality: In Th. 1.11, there are several pairs of rules that have an analogous form: (c) and (d), (e) and (f), (g) and (h), (i) and (j). These analogies are due to the general law called the principle of duality: If φ(A1, . . . , An) ⇒ ψ(A1, . . . , An) and only the operators ∧, ∨, ¬ occur in φ and ψ, then the reverse implication Φ(A1, . . . , An) ⇐ Ψ(A1, . . . , An) holds, where one obtains Φ from φ and Ψ from ψ by replacing each ∧ with ∨ and each ∨ with ∧. In particular, if, instead of an implication, we start with an equivalence (as in the examples from Th. 1.11), then we obtain another equivalence.

1.3 Set Theory

In the previous section, we have had a first glance at statements and corresponding truth values. In the present section, we will move our focus to the objects such statements are about. Reviewing Example 1.1(a), and recalling that this is a mathematics class rather than one in zoology, the first two statements of Example 1.1(a) are less relevant for us than statements 3–6. As in these examples, we will nearly always be interested in statements involving numbers or collections of numbers or collections of such collections etc.

In modern mathematics, the term one usually uses instead of “collection” is “set”. In 1895, Georg Cantor defined a set as “any collection into a whole M of definite and separate objects m of our intuition or our thought”. The objects m are called the elements of the set M. As explained in Appendix A, without restrictions and refinements, Cantor’s set theory is not free of contradictions and, thus, not viable to be used in the foundation of mathematics. Axiomatic set theory provides these necessary restrictions and refinements and an introductory treatment can also be found in Appendix A. However, it is possible to follow and understand the rest of this class, without having studied Appendix A.

Notation 1.17. We write m ∈ M for the statement “m is an element of the set M”.

Definition 1.18. The sets M and N are equal, denoted M = N, if, and only if, M and N have precisely the same elements.

—

Definition 1.18 means we know everything about a set M if, and only if, we know all its elements.

Definition 1.19. The set with no elements is called the empty set; it is denoted by the symbol ∅.

Example 1.20. For a finite set, we can simply write down all its elements, for example, A := {0}, B := {0, 17.5}, C := {5, 1, 5, 3}, D := {3, 5, 1}, E := {2, √2, −2}, where the symbolism “:=” is to be read as “is defined to be equal to”.

Note C = D, since both sets contain precisely the same elements. In particular, the order in which the elements are written down plays no role and a set does not change if an element is written down more than once.


If a set has many elements, instead of writing down all its elements, one might use abbreviations such as F := {−4,−2, . . . , 20, 22, 24}, where one has to make sure the meaning of the dots is clear from the context.
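The built-in set type of Python behaves like the finite sets of Ex. 1.20 in that neither the order of the elements nor repetitions matter. The following sketch is an illustration only; reading the dots in F as "all even integers from −4 to 24" is an assumption about the intended meaning.

```python
# The sets C and D of Ex. 1.20.
C = {5, 1, 5, 3}
D = {3, 5, 1}

# Order and repetition play no role, so C and D are equal (Def. 1.18).
print(C == D)
print(len(C))

# One reading of F := {-4, -2, ..., 20, 22, 24}: the even integers
# from -4 to 24 (assumption about the meaning of the dots).
F = set(range(-4, 25, 2))
print(sorted(F))
```

Here the range expression makes the meaning of the dots explicit, which is precisely the ambiguity the text warns about.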

Definition 1.21. The set A is called a subset of the set B (denoted A ⊆ B and also referred to as the inclusion of A in B) if, and only if, every element of A is also an element of B (one sometimes also calls B a superset of A and writes B ⊇ A). Please note that A = B is allowed in the above definition of a subset. If A ⊆ B and A ≠ B, then A is called a strict subset of B, denoted A ⊊ B.

If B is a set and P(x) is a statement about an element x of B (i.e., for each x ∈ B, P(x) is either true or false), then we can define a subset A of B by writing

A := {x ∈ B : P(x)}. (1.6)

This notation is supposed to mean that the set A consists precisely of those elements of B such that P(x) is true (has the truth value T in the language of Sec. 1.2).

Example 1.22. (a) For each set A, one has A ⊆ A and ∅ ⊆ A.

(b) If A ⊆ B, then A = {x ∈ B : x ∈ A}.

(c) We have {3} ⊆ {6.7, 3, 0}. Letting A := {−10,−8, . . . , 8, 10}, we have {−2, 0, 2} = {x ∈ A : x³ ∈ A}, ∅ = {x ∈ A : x + 21 ∈ A}.
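The notation (1.6) corresponds closely to set comprehensions in Python, so Ex. 1.22(c) can be checked directly. The sketch below is an illustration; the variable names are hypothetical and A is assumed to consist of the even integers from −10 to 10.

```python
# A := {-10, -8, ..., 8, 10}, read as the even integers from -10 to 10.
A = set(range(-10, 11, 2))

# {x in A : x^3 in A}
cubes_in_A = {x for x in A if x ** 3 in A}

# {x in A : x + 21 in A}; x + 21 is odd for even x, hence never in A.
shifted_in_A = {x for x in A if x + 21 in A}

print(cubes_in_A == {-2, 0, 2})
print(shifted_in_A == set())

# The operator <= tests the inclusion of Def. 1.21.
print({3} <= {6.7, 3, 0})
print(set() <= A and A <= A)   # cf. Ex. 1.22(a)
```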

Remark 1.23. As a consequence of Def. 1.18, the sets A and B are equal if, and only if, one has both inclusions, namely A ⊆ B and B ⊆ A. Thus, when proving the equality of sets, one often divides the proof into two parts, first proving one inclusion, then the other.

Definition 1.24. (a) The intersection of the sets A and B, denoted A ∩ B, consists of all elements that are in A and in B. The sets A, B are said to be disjoint if, and only if, A ∩ B = ∅.

(b) The union of the sets A and B, denoted A ∪ B, consists of all elements that are in A or in B (as in the logical disjunction in (1.2), the or is meant nonexclusively). If A and B are disjoint, one sometimes writes A ∪̇ B and speaks of the disjoint union of A and B.

(c) The difference of the sets A and B, denoted A \ B (read “A minus B” or “A without B”), consists of all elements of A that are not elements of B, i.e. A \ B := {x ∈ A : x ∉ B}. If B is a subset of a given set A (sometimes called the universe in this context), then A \ B is also called the complement of B with respect to A. In that case, one also writes B^c := A \ B (note that this notation suppresses the dependence on A).

Example 1.25. (a) Examples of Intersections:

{1, 2, 3} ∩ {3, 4, 5} = {3}, (1.7a)

{√2} ∩ {1, 2, . . . , 10} = ∅, (1.7b)

{−1, 2,−3, 4, 5} ∩ {−10,−9, . . . ,−1} ∩ {−1, 7,−3} = {−1,−3}. (1.7c)


(b) Examples of Unions:

{1, 2, 3} ∪ {3, 4, 5} = {1, 2, 3, 4, 5}, (1.8a)

{1, 2, 3} ∪̇ {4, 5} = {1, 2, 3, 4, 5}, (1.8b)

{−1, 2,−3, 4, 5} ∪ {−99,−98, . . . ,−1} ∪ {−1, 7,−3} = {−99,−98, . . . ,−2,−1, 2, 4, 5, 7}. (1.8c)

(c) Examples of Differences:

{1, 2, 3} \ {3, 4, 5} = {1, 2}, (1.9a)

{1, 2, 3} \ {3, 2, 1,√5} = ∅, (1.9b)

{−10,−9, . . . , 9, 10} \ {0} = {−10,−9, . . . ,−1} ∪ {1, 2, . . . , 9, 10}. (1.9c)

With respect to the universe {1, 2, 3, 4, 5}, it is

{1, 2, 3}^c = {4, 5}; (1.9d)

with respect to the universe {0, 1, . . . , 20}, it is

{1, 2, 3}^c = {0} ∪ {4, 5, . . . , 20}. (1.9e)
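The identities of Ex. 1.25 can be reproduced with Python's set operators &, |, and -. The following sketch is an illustration only: the variable names and the helper complement are hypothetical, and the floating point values 2 ** 0.5 and 5 ** 0.5 merely approximate √2 and √5 (which suffices here, since only membership among integers matters).

```python
# Intersections, cf. (1.7a)-(1.7c).
i_a = {1, 2, 3} & {3, 4, 5}
i_b = {2 ** 0.5} & set(range(1, 11))
i_c = {-1, 2, -3, 4, 5} & set(range(-10, 0)) & {-1, 7, -3}

# Unions, cf. (1.8a) and (1.8c).
u_a = {1, 2, 3} | {3, 4, 5}
u_c = {-1, 2, -3, 4, 5} | set(range(-99, 0)) | {-1, 7, -3}

# Differences, cf. (1.9a)-(1.9c).
d_a = {1, 2, 3} - {3, 4, 5}
d_b = {1, 2, 3} - {3, 2, 1, 5 ** 0.5}
d_c = set(range(-10, 11)) - {0}

def complement(B, universe):
    # Complement of B with respect to a universe containing B, cf. Def. 1.24(c).
    if not B <= universe:
        raise ValueError("B must be a subset of the universe")
    return universe - B

print(i_a, i_b, i_c)
print(complement({1, 2, 3}, {1, 2, 3, 4, 5}))   # cf. (1.9d)
print({1, 2, 3}.isdisjoint({4, 5}))             # the union in (1.8b) is disjoint
```

Passing the universe explicitly to complement makes visible the dependence that the complement notation suppresses.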

As mentioned earlier, it will often be unavoidable to consider sets of sets. Here are first examples:

{∅, {0}, {0, 1}}, {{0, 1}, {1, 2}}.

Definition 1.26. Given a set A, the set of all subsets of A is called the power set of A, denoted P(A) (for reasons explained later (cf. Prop. 2.18), the power set is sometimes also denoted as 2^A).

Example 1.27. Examples of Power Sets:

P(∅) = {∅}, (1.10a)

P({0}) = {∅, {0}}, (1.10b)

P(P({0})) = P({∅, {0}}) = {∅, {∅}, {{0}}, P({0})}. (1.10c)
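For finite sets, the power set of Def. 1.26 can be computed by listing the subsets of each size. The following Python sketch is an illustration; the function name power_set is hypothetical, and frozenset is used because Python sets may only contain hashable (immutable) elements, so that sets of sets must be built from frozensets.

```python
from itertools import combinations

def power_set(A):
    # Collect all subsets of A, grouped by their number of elements r.
    elems = list(A)
    return frozenset(
        frozenset(c)
        for r in range(len(elems) + 1)
        for c in combinations(elems, r)
    )

empty = frozenset()
P0 = power_set(empty)            # P(∅) = {∅}, cf. (1.10a)
P1 = power_set(frozenset({0}))   # P({0}) = {∅, {0}}, cf. (1.10b)
P2 = power_set(P1)               # P(P({0})), cf. (1.10c)

print(P0 == frozenset({empty}))
print(P1 == frozenset({empty, frozenset({0})}))
print(len(P2))
```

For a finite set with n elements, the computed power set has 2^n elements, which fits the alternative notation mentioned in Def. 1.26.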

—

So far, we have restricted our set-theoretic examples to finite sets. However, not surprisingly, many sets of interest to us will be infinite (we will have to postpone a mathematically precise definition of finite and infinite to Sec. 3.2). We will now introduce the simplest infinite set.

Definition 1.28. The set N := {1, 2, 3, . . . } is called the set of natural numbers (for a more rigorous construction of N, based on the axioms of axiomatic set theory, see Sec. A.3.4 of the Appendix, where Th. A.46 shows N to be, indeed, infinite). Moreover, we define N0 := {0} ∪ N.

—


The following theorem compiles important set-theoretic rules:

Theorem 1.29. Let A,B,C, U be sets.

(a) Commutativity of Intersections: A ∩ B = B ∩ A.

(b) Commutativity of Unions: A ∪B = B ∪ A.

(c) Associativity of Intersections: (A ∩ B) ∩ C = A ∩ (B ∩ C).

(d) Associativity of Unions: (A ∪ B) ∪ C = A ∪ (B ∪ C).

(e) Distributivity I: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

(f) Distributivity II: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

(g) De Morgan’s Law I: U \ (A ∩ B) = (U \ A) ∪ (U \ B).

(h) De Morgan’s Law II: U \ (A ∪ B) = (U \ A) ∩ (U \ B).

(i) Double Complement: If A ⊆ U , then U \ (U \ A) = A.

Proof. In each case, the proof results from the corresponding rule of Th. 1.11:

(a):

x ∈ A ∩ B ⇔ x ∈ A ∧ x ∈ B
 ⇔ x ∈ B ∧ x ∈ A   (by Th. 1.11(c))
 ⇔ x ∈ B ∩ A.

(g): Under the general assumption of x ∈ U , we have the following equivalences:

x ∈ U \ (A ∩ B) ⇔ ¬(x ∈ A ∩ B) ⇔ ¬(x ∈ A ∧ x ∈ B)
 ⇔ ¬(x ∈ A) ∨ ¬(x ∈ B)   (by Th. 1.11(i))
 ⇔ x ∈ U \ A ∨ x ∈ U \ B ⇔ x ∈ (U \ A) ∪ (U \ B).

The proofs of the remaining rules are left as an exercise. ∎
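A finite check is no substitute for the proofs, but it can serve as a sanity test of the rules in Th. 1.29. The following sketch verifies rules (e) through (i) for all subsets of a small universe; the universe {0, 1, 2} and the helper names are assumptions made for illustration.

```python
from itertools import combinations, product

def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = list(universe)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

U = frozenset({0, 1, 2})   # small universe chosen for illustration

def rules_hold(A, B, C):
    """Check rules (e)-(i) of Th. 1.29 for subsets A, B, C of U."""
    return (
        A & (B | C) == (A & B) | (A & C)          # (e) Distributivity I
        and A | (B & C) == (A | B) & (A | C)      # (f) Distributivity II
        and U - (A & B) == (U - A) | (U - B)      # (g) De Morgan I
        and U - (A | B) == (U - A) & (U - B)      # (h) De Morgan II
        and U - (U - A) == A                      # (i) Double complement
    )

all_ok = all(rules_hold(A, B, C) for A, B, C in product(subsets(U), repeat=3))
print(all_ok)
```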

Remark 1.30. The correspondence between Th. 1.11 and Th. 1.29 is no coincidence. One can actually prove that, starting with an equivalence of propositional formulas φ(A_1, . . . , A_n) ⇔ ψ(A_1, . . . , A_n), where both formulas contain only the operators ∧, ∨, ¬, one obtains a set-theoretic rule (stating an equality of sets) by reinterpreting all statement variables A_1, . . . , A_n as variables for sets, all subsets of a universe U, and replacing ∧ by ∩, ∨ by ∪, and ¬ by U \ (if there are no multiple negations, then we do not need the hypothesis that A_1, . . . , A_n are subsets of U). The procedure also works in the opposite direction – one can start with a set-theoretic formula for an equality of sets and translate it into two equivalent propositional formulas.


1.4 Predicate Calculus

Now that we have introduced sets in the previous section, we have to return to the subject of mathematical logic once more. As it turns out, propositional calculus, which we discussed in Sec. 1.2, does not quite suffice to develop the theory of calculus (nor most other mathematical theories). The reason is that we need to consider statements such as

x+ 1 > 0 holds for each natural number x. (T) (1.11a)

All real numbers are positive. (F) (1.11b)

There exists a natural number bigger than 10. (T) (1.11c)

There exists a real number x such that x^2 = −1. (F) (1.11d)

For all natural numbers n, there exists a natural number bigger than n. (T) (1.11e)

That means we are interested in statements involving universal quantification via the quantifier “for all” (one also often uses “for each” or “for every” instead), existential quantification via the quantifier “there exists”, or both. The quantifier of universal quantification is denoted by ∀ and the quantifier of existential quantification is denoted by ∃. Using these symbols as well as N and R to denote the sets of natural and real numbers, respectively, we can restate (1.11) as

∀_{x∈N} x + 1 > 0. (T) (1.12a)

∀_{x∈R} x > 0. (F) (1.12b)

∃_{n∈N} n > 10. (T) (1.12c)

∃_{x∈R} x^2 = −1. (F) (1.12d)

∀_{n∈N} ∃_{m∈N} m > n. (T) (1.12e)

Definition 1.31. A universal statement has the form

∀_{x∈A} P(x), (1.13a)

whereas an existential statement has the form

∃_{x∈A} P(x). (1.13b)

In (1.13), A denotes a set and P(x) is a sentence involving the variable x, a so-called predicate of x, that becomes a statement (i.e. becomes either true or false) if x is substituted with any concrete element of the set A (in particular, P(x) is allowed to contain further quantifiers, but it must not contain any other quantifier involving x – one says x must be a free variable in P(x), not bound by any quantifier in P(x)).

The universal statement (1.13a) has the truth value T if, and only if, P(x) has the truth value T for all elements x ∈ A; the existential statement (1.13b) has the truth value T if, and only if, P(x) has the truth value T for at least one element x ∈ A.


Remark 1.32. Some people prefer to write ⋀_{x∈A} instead of ∀_{x∈A} and ⋁_{x∈A} instead of ∃_{x∈A}.

Even though this notation has the advantage of emphasizing that the universal statement can be interpreted as a big logical conjunction and the existential statement can be interpreted as a big logical disjunction, it is significantly less common. So we will stick to ∀ and ∃ in this class.

Remark 1.33. According to Def. 1.31, the existential statement (1.13b) is true if, and only if, P(x) is true for at least one x ∈ A. So if there is precisely one such x, then (1.13b) is true; and if there are several different x ∈ A such that P(x) is true, then (1.13b) is still true. Uniqueness statements are often of particular importance, and one sometimes writes

∃!_{x∈A} P(x) (1.14)

for the statement “there exists a unique x ∈ A such that P(x) is true”. This notation can be defined as an abbreviation for

∃_{x∈A} ( P(x) ∧ ∀_{y∈A} (P(y) ⇒ x = y) ). (1.15)
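On a finite set, the abbreviation (1.15) can be evaluated literally. The following sketch does so and compares the result with simply counting witnesses; the function names and the finite range standing in for an initial segment of N are assumptions made for illustration.

```python
def exists_unique(A, P):
    """Evaluate (1.15): some x in A satisfies P, and every y in A with P(y) equals x."""
    return any(P(x) and all((not P(y)) or x == y for y in A) for x in A)

def exists_unique_by_counting(A, P):
    """Alternative reading: exactly one element of A satisfies P."""
    return sum(1 for x in A if P(x)) == 1

A = range(1, 21)   # finite stand-in for an initial segment of N (assumption)
checks = [
    lambda n: n > 10,          # several witnesses in A
    lambda n: 12 > n > 10,     # exactly one witness, n = 11
    lambda n: 11 > n > 10,     # no witness
]
results = [exists_unique(A, P) for P in checks]
print(results)
```

The implication P(y) ⇒ x = y is written as (not P(y)) or x == y, in line with Th. 1.11(a).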

Example 1.34. Here are some examples of uniqueness statements:

∃!_{n∈N} n > 10. (F) (1.16a)

∃!_{n∈N} 12 > n > 10. (T) (1.16b)

∃!_{n∈N} 11 > n > 10. (F) (1.16c)

∃!_{x∈R} x^2 = −1. (F) (1.16d)

∃!_{x∈R} x^2 = 1. (F) (1.16e)

∃!_{x∈R} x^2 = 0. (T) (1.16f)

Remark 1.35. As for propositional calculus, we also have some important rules for predicate calculus:

(a) Consider the negation of a universal statement, ¬∀_{x∈A} P(x), which is true if, and only if, P(x) does not hold for each x ∈ A, i.e. if, and only if, there exists at least one x ∈ A such that P(x) is false (such that ¬P(x) is true). We have just proved the rule

¬∀_{x∈A} P(x) ⇔ ∃_{x∈A} ¬P(x). (1.17a)

Similarly, consider the negation of an existential statement. We claim the corresponding rule is

¬∃_{x∈A} P(x) ⇔ ∀_{x∈A} ¬P(x). (1.17b)

Indeed, we can prove (1.17b) from (1.17a):

¬∃_{x∈A} P(x) ⇔ ¬∃_{x∈A} ¬¬P(x)   (by Th. 1.11(k))
 ⇔ ¬¬∀_{x∈A} ¬P(x)   (by (1.17a))
 ⇔ ∀_{x∈A} ¬P(x).   (by Th. 1.11(k))   (1.18)

One can interpret (1.17) as a generalization of De Morgan’s laws Th. 1.11(i),(j).

One can actually generalize (1.17) even a bit more: If a statement starts with several quantifiers, then one negates the statement by replacing each ∀ with ∃ and vice versa plus negating the predicate after the quantifiers (see the example in (1.21e) below).

(b) If A, B are sets and P(x, y) denotes a predicate of both x and y, then ∀_{x∈A} ∀_{y∈B} P(x, y) and ∀_{y∈B} ∀_{x∈A} P(x, y) both hold true if, and only if, P(x, y) holds true for each x ∈ A and each y ∈ B, i.e. the order of two consecutive universal quantifiers does not matter:

∀_{x∈A} ∀_{y∈B} P(x, y) ⇔ ∀_{y∈B} ∀_{x∈A} P(x, y). (1.19a)

In the same way, we obtain the following rule:

∃_{x∈A} ∃_{y∈B} P(x, y) ⇔ ∃_{y∈B} ∃_{x∈A} P(x, y). (1.19b)

If A = B, one also uses abbreviations of the form

∀_{x,y∈A} P(x, y) for ∀_{x∈A} ∀_{y∈A} P(x, y), (1.20a)

∃_{x,y∈A} P(x, y) for ∃_{x∈A} ∃_{y∈A} P(x, y). (1.20b)

Generalizing rules (1.19), we can always commute identical quantifiers. Caveat: Quantifiers that are not identical must not be commuted (see Ex. 1.36(d) below).

Example 1.36. (a) Negation of universal and existential statements:

Negation of (1.12a): ∃_{x∈N} ¬(x + 1 > 0), i.e. ∃_{x∈N} x + 1 ≤ 0. (F) (1.21a)

Negation of (1.12b): ∃_{x∈R} ¬(x > 0), i.e. ∃_{x∈R} x ≤ 0. (T) (1.21b)

Negation of (1.12c): ∀_{n∈N} ¬(n > 10), i.e. ∀_{n∈N} n ≤ 10. (F) (1.21c)

Negation of (1.12d): ∀_{x∈R} ¬(x^2 = −1), i.e. ∀_{x∈R} x^2 ≠ −1. (T) (1.21d)

Negation of (1.12e): ∃_{n∈N} ∀_{m∈N} ¬(m > n), i.e. ∃_{n∈N} ∀_{m∈N} m ≤ n. (F) (1.21e)

(b) As a more complicated example, consider the negation of the uniqueness statement (1.14), i.e. of (1.15):

¬∃!_{x∈A} P(x) ⇔ ¬∃_{x∈A} ( P(x) ∧ ∀_{y∈A} (P(y) ⇒ x = y) )
 ⇔ ∀_{x∈A} ¬( P(x) ∧ ∀_{y∈A} (¬P(y) ∨ x = y) )   (by (1.17b), Th. 1.11(a))
 ⇔ ∀_{x∈A} ( ¬P(x) ∨ ¬∀_{y∈A} (¬P(y) ∨ x = y) )   (by Th. 1.11(i))
 ⇔ ∀_{x∈A} ( ¬P(x) ∨ ∃_{y∈A} ¬(¬P(y) ∨ x = y) )   (by (1.17a))
 ⇔ ∀_{x∈A} ( ¬P(x) ∨ ∃_{y∈A} (P(y) ∧ x ≠ y) )   (by Th. 1.11(j),(k))
 ⇔ ∀_{x∈A} ( P(x) ⇒ ∃_{y∈A} (P(y) ∧ x ≠ y) ).   (by Th. 1.11(a))   (1.22)

So how do we decode the expression we have obtained at the end? It states that if P(x) holds for some x ∈ A, then there must be at least a second, different, element y ∈ A such that P(y) is true. This is, indeed, precisely the negation of ∃!_{x∈A} P(x).

(c) Identical quantifiers commute:

∀_{x∈R} ∀_{n∈N} x^{2n} ≥ 0 ⇔ ∀_{n∈N} ∀_{x∈R} x^{2n} ≥ 0, (1.23a)

∀_{x∈R} ∃_{y∈R} ∃_{n∈N} ny > x^2 ⇔ ∀_{x∈R} ∃_{n∈N} ∃_{y∈R} ny > x^2. (1.23b)

(d) The following example shows that different quantifiers do, in general, not commute (i.e. do not yield equivalent statements when commuted): While the statement

∀_{x∈R} ∃_{y∈R} y > x (1.24a)

is true (for each real number x, there is a bigger real number y, e.g. y := x + 1 will do the job), the statement

∃_{y∈R} ∀_{x∈R} y > x (1.24b)

is false (for example, since y > y is false). In particular, (1.24a) and (1.24b) are not equivalent.

(e) Even though (1.14) provides useful notation, it is better not to think of ∃! as a quantifier. It is really just an abbreviation for (1.15), and it behaves very differently from ∃ and ∀: The following examples show that, in general, ∃! commutes neither with ∃, nor with itself:

∃_{n∈N} ∃!_{m∈N} m < n ⇎ ∃!_{m∈N} ∃_{n∈N} m < n

(the statement on the left is true, as one can choose n = 2, but the statement on the right is false, as ∃_{n∈N} m < n holds for every m ∈ N). Similarly,

∃!_{n∈N} ∃!_{m∈N} m < n ⇎ ∃!_{m∈N} ∃!_{n∈N} m < n

(the statement on the left is still true and the statement on the right is still false (there is no m ∈ N such that ∃!_{n∈N} m < n)).
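On a finite set, nested quantifiers translate directly into nested all/any expressions, which makes the effect of swapping ∀ and ∃ easy to observe. The sketch below uses a small set and the predicate y ≠ x, both chosen by us for illustration (the example in (1.24) uses R and y > x, which cannot be enumerated).

```python
A = [0, 1, 2]          # small finite set (assumption; (1.24) uses R instead)

def P(x, y):
    return y != x      # hypothetical predicate chosen for illustration

forall_exists = all(any(P(x, y) for y in A) for x in A)   # ∀x ∃y P(x, y)
exists_forall = any(all(P(x, y) for x in A) for y in A)   # ∃y ∀x P(x, y)

# Negation rule (1.17): swap the quantifiers and negate the predicate.
negated = any(all(not P(x, y) for y in A) for x in A)     # ∃x ∀y ¬P(x, y)

print(forall_exists, exists_forall, negated == (not forall_exists))
```

The first statement holds since every x has some different y, whereas the second fails for the same reason as (1.24b): choosing x = y violates the predicate.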

Remark 1.37. One can make the following observations regarding the strategy for proving universal and existential statements:

(a) To prove that ∀_{x∈A} P(x) is true, one must check the truth of P(x) for every element x ∈ A – examples are not enough!

(b) To prove that ∀_{x∈A} P(x) is false, it suffices to find one x ∈ A such that P(x) is false – such an x is then called a counterexample and one counterexample is always enough to prove ∀_{x∈A} P(x) is false!

(c) To prove that ∃_{x∈A} P(x) is true, it suffices to find one x ∈ A such that P(x) is true – such an x is then called an example and one example is always enough to prove ∃_{x∈A} P(x) is true!

—

The subfield of mathematical logic dealing with quantified statements is called predicate calculus. In general, one does not restrict the quantified variables to range only over elements of sets (as we have done above). Again, we refer to [EFT07] for a deeper treatment of the subject.

As an application of quantified statements, let us generalize the notion of union and intersection:

Definition 1.38. Let I ≠ ∅ be a nonempty set, usually called an index set in the present context. For each i ∈ I, let A_i denote a set (some or all of the A_i can be identical).

(a) The intersection

⋂_{i∈I} A_i := { x : ∀_{i∈I} x ∈ A_i } (1.25a)

consists of all elements x that belong to every A_i.

(b) The union

⋃_{i∈I} A_i := { x : ∃_{i∈I} x ∈ A_i } (1.25b)

consists of all elements x that belong to at least one A_i. The union is called disjoint if, and only if, for each i, j ∈ I, i ≠ j implies A_i ∩ A_j = ∅.

Proposition 1.39. Let I ≠ ∅ be an index set, let M denote a set, and, for each i ∈ I, let A_i denote a set. The following set-theoretic rules hold:

(a) (⋂_{i∈I} A_i) ∩ M = ⋂_{i∈I} (A_i ∩ M).

(b) (⋃_{i∈I} A_i) ∪ M = ⋃_{i∈I} (A_i ∪ M).

(c) (⋂_{i∈I} A_i) ∪ M = ⋂_{i∈I} (A_i ∪ M).

(d) (⋃_{i∈I} A_i) ∩ M = ⋃_{i∈I} (A_i ∩ M).

(e) M \ ⋂_{i∈I} A_i = ⋃_{i∈I} (M \ A_i).

(f) M \ ⋃_{i∈I} A_i = ⋂_{i∈I} (M \ A_i).

Proof. We prove (c) and (e) and leave the remaining proofs as an exercise.

(c):

x ∈ (⋂_{i∈I} A_i) ∪ M ⇔ x ∈ M ∨ ∀_{i∈I} x ∈ A_i
 ⇔ ∀_{i∈I} (x ∈ A_i ∨ x ∈ M)   (∗)
 ⇔ x ∈ ⋂_{i∈I} (A_i ∪ M).

To justify the equivalence at (∗), we make use of Th. 1.11(b) and verify ⇒ and ⇐. For ⇒, note that the truth of x ∈ M implies x ∈ A_i ∨ x ∈ M is true for each i ∈ I. If x ∈ A_i is true for each i ∈ I, then x ∈ A_i ∨ x ∈ M is still true for each i ∈ I. To verify ⇐, note that the truth of x ∈ M implies the truth of x ∈ M ∨ ∀_{i∈I} x ∈ A_i. If x ∈ M is false, then x ∈ A_i must be true for each i ∈ I, showing x ∈ M ∨ ∀_{i∈I} x ∈ A_i is true also in this case.

(e):

x ∈ M \ ⋂_{i∈I} A_i ⇔ x ∈ M ∧ ¬∀_{i∈I} x ∈ A_i ⇔ x ∈ M ∧ ∃_{i∈I} x ∉ A_i
 ⇔ ∃_{i∈I} x ∈ M \ A_i ⇔ x ∈ ⋃_{i∈I} (M \ A_i),

completing the proof. ∎
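The rules of Prop. 1.39 can be tried out on a concrete finite family. The following sketch is only a plausibility check on one example, not a proof; the family, the set M, and the helper names big_intersection and big_union are assumptions made for illustration.

```python
from functools import reduce

# Hypothetical finite family (A_i), i in I = {1, 2, 3}, and a set M.
family = {1: {1, 2, 3}, 2: {2, 3, 4}, 3: {3, 4, 5}}
M = {0, 3, 4}

def big_intersection(sets):
    """Intersection of a nonempty collection of sets, cf. (1.25a)."""
    return reduce(lambda s, t: s & t, sets)

def big_union(sets):
    """Union of a nonempty collection of sets, cf. (1.25b)."""
    return reduce(lambda s, t: s | t, sets)

A = list(family.values())
rule_c = big_intersection(A) | M == big_intersection([a | M for a in A])
rule_d = big_union(A) & M == big_union([a & M for a in A])
rule_e = M - big_intersection(A) == big_union([M - a for a in A])
rule_f = M - big_union(A) == big_intersection([M - a for a in A])
print(rule_c, rule_d, rule_e, rule_f)
```

Both helpers require a nonempty collection, mirroring the hypothesis I ≠ ∅.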


Example 1.40. We have the following identities of sets:

⋂_{x∈R} N = N, (1.26a)

⋂_{n∈N} {1, 2, . . . , n} = {1}, (1.26b)

⋃_{x∈R} N = N, (1.26c)

⋃_{n∈N} {1, 2, . . . , n} = N, (1.26d)

N \ ⋃_{n∈N} {2n} = {1, 3, 5, . . . } = ⋂_{n∈N} (N \ {2n}). (1.26e)

2 Functions and Relations

2.1 Functions

Definition 2.1. Let A,B be sets. Given x ∈ A, y ∈ B, the set

(x, y) := {{x}, {x, y}} (2.1)

is called the ordered pair (often shortened to just pair) consisting of x and y. The set of all such pairs is called the Cartesian product A × B, i.e.

A× B := {(x, y) : x ∈ A ∧ y ∈ B}. (2.2)

Example 2.2. Let A be a set.

A× ∅ = ∅ × A = ∅, (2.3a)

{1, 2} × {1, 2, 3} = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)} (2.3b)

≠ {1, 2, 3} × {1, 2} = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)}. (2.3c)

Also note that, for x ≠ y,

(x, y) = {{x}, {x, y}} ≠ {{y}, {x, y}} = (y, x). (2.4)
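The set-theoretic encoding (2.1) of ordered pairs can be imitated with nested frozensets. The sketch below is an illustration only; the function names are hypothetical, and for the Cartesian product we fall back on Python tuples as a stand-in for ordered pairs.

```python
from itertools import product

def kuratowski_pair(x, y):
    """Encode (x, y) as the set {{x}, {x, y}} from (2.1), using frozensets."""
    return frozenset({frozenset({x}), frozenset({x, y})})

def cartesian_product(A, B):
    """A × B as in (2.2), here with Python tuples standing in for ordered pairs."""
    return set(product(A, B))

p12 = kuratowski_pair(1, 2)
p21 = kuratowski_pair(2, 1)
p11 = kuratowski_pair(1, 1)          # collapses to {{1}}

left = cartesian_product({1, 2}, {1, 2, 3})
right = cartesian_product({1, 2, 3}, {1, 2})
print(p12 != p21, len(p11), len(left), left != right)
```

The comparison p12 != p21 reflects (2.4), and left != right reflects (2.3b) versus (2.3c).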

Definition 2.3. Given sets A, B, a function or map f is an assignment rule that assigns to each x ∈ A a unique y ∈ B. One then also writes f(x) for the element y. The set A is called the domain of f, denoted dom(f), and B is called the codomain of f, denoted codom(f). The information about a map f can be concisely summarized by the notation

f : A −→ B, x ↦ f(x), (2.5)

where x ↦ f(x) is called the assignment rule for f, f(x) is called the image of x, and x is called a preimage of f(x) (the image must be unique, but there might be several preimages). The set

graph(f) := {(x, y) ∈ A × B : y = f(x)} (2.6)

is called the graph of f (not to be confused with pictures visualizing the function f, which are also called graph of f). If one wants to be completely precise, then one identifies the function f with the ordered triple (A, B, graph(f)).

The set of all functions with domain A and codomain B is denoted by F(A, B) or B^A, i.e.

F(A, B) := B^A := {(f : A −→ B) : A = dom(f) ∧ B = codom(f)}. (2.7)

Caveat: Some authors reserve the word map for continuous functions, but we use function and map synonymously.

Definition 2.4. Let A,B be sets and f : A −→ B a function.

(a) If T is a subset of A, then

f(T ) := {f(x) ∈ B : x ∈ T} (2.8)

is called the image of T under f .

(b) If U is a subset of B, then

f^{-1}(U) := {x ∈ A : f(x) ∈ U} (2.9)

is called the preimage or inverse image of U under f.

(c) f is called injective or one-to-one if, and only if, every y ∈ B has at most one preimage, i.e. if, and only if, the preimage of {y} has at most one element:

f injective ⇔ ∀_{y∈B} ( f^{-1}{y} = ∅ ∨ ∃!_{x∈A} f(x) = y )
 ⇔ ∀_{x_1,x_2∈A} ( x_1 ≠ x_2 ⇒ f(x_1) ≠ f(x_2) ). (2.10)

(d) f is called surjective or onto if, and only if, every element of the codomain of f has a preimage:

f surjective ⇔ ∀_{y∈B} ∃_{x∈A} y = f(x) ⇔ ∀_{y∈B} f^{-1}{y} ≠ ∅. (2.11)

(e) f is called bijective if, and only if, f is injective and surjective.

Example 2.5. Examples of Functions:

f : {1, 2, 3, 4, 5} −→ {1, 2, 3, 4, 5}, f(x) := −x+ 6, (2.12a)

g : N −→ N, g(n) := 2n, (2.12b)

h : N −→ {2, 4, 6, . . . }, h(n) := 2n, (2.12c)

h̃ : N −→ {2, 4, 6, . . . }, h̃(n) := { n for n even,
                                      { n + 1 for n odd, (2.12d)

G : N −→ R, G(n) := n/(n + 1), (2.12e)

F : P(N) −→ P(P(N)), F(A) := P(A). (2.12f)

Instead of f(x) := −x + 6 in (2.12a), one can also write x ↦ −x + 6 and analogously in the other cases. Also note that, in the strict sense, functions g and h are different, since their codomains are different (however, using Def. 2.4(a), they have the same image in the sense that g(N) = h(N)). Furthermore,

f({1, 2}) = {5, 4} = f^{-1}({1, 2}), h̃^{-1}({2, 4, 6}) = {1, 2, 3, 4, 5, 6}, (2.13)

f is bijective; g is injective, but not surjective; h is bijective; h̃ is surjective, but not injective. Can you figure out if G and F are injective and/or surjective?
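For maps between finite sets, images, preimages, injectivity, and surjectivity can be checked by enumeration. The sketch below does this for f from (2.12a) and for a finite truncation of the piecewise map from (2.12d); the helper names and the truncated domain {1, . . . , 6} with codomain {2, 4, 6} are assumptions made for illustration.

```python
def image(f, T):
    """f(T) as in (2.8)."""
    return {f(x) for x in T}

def preimage(f, domain, U):
    """f^{-1}(U) as in (2.9)."""
    return {x for x in domain if f(x) in U}

def is_injective(f, domain):
    # On a finite domain, f is injective iff no two points share an image.
    return len(image(f, domain)) == len(domain)

def is_surjective(f, domain, codomain):
    return image(f, domain) == set(codomain)

D = {1, 2, 3, 4, 5}
f = lambda x: -x + 6                       # the map f from (2.12a)

# Finite truncation of the piecewise map in (2.12d) (assumed domain and codomain).
D6 = {1, 2, 3, 4, 5, 6}
h_piecewise = lambda n: n if n % 2 == 0 else n + 1

print(image(f, {1, 2}), preimage(f, D, {1, 2}))
print(is_injective(f, D), is_surjective(f, D, D))
print(is_injective(h_piecewise, D6), is_surjective(h_piecewise, D6, {2, 4, 6}))
```

The maps g, G, and F have infinite domains and cannot be handled by enumeration, so the question about G and F remains a pen-and-paper exercise.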

Example 2.6. (a) For each nonempty set A, the map Id : A −→ A, Id(x) := x, is called the identity on A. If one needs to emphasize that Id operates on A, then one also writes Id_A instead of Id. The identity is clearly bijective.

(b) Let A, B be nonempty sets. A map f : A −→ B is called constant if, and only if, there exists c ∈ B such that f(x) = c for each x ∈ A. In that case, one also writes f ≡ c, which can be read as “f is identically equal to c”. If f ≡ c, ∅ ≠ T ⊆ A, and U ⊆ B, then

f(T) = {c}, f^{-1}(U) = { A for c ∈ U,
                        { ∅ for c ∉ U. (2.14)

f is injective if, and only if, A = {x}; f is surjective if, and only if, B = {c}.

(c) Given A ⊆ X, the map

ι : A −→ X, ι(x) := x, (2.15)

is called inclusion (also embedding or imbedding). An inclusion is always injective; it is surjective if, and only if, A = X, i.e. if, and only if, it is the identity on A.

(d) Given A ⊆ X and a map f : X −→ B, the map g : A −→ B, g(x) = f(x), is called the restriction of f to A; f is called the extension of g to X. In this situation, one also uses the notation f↾A for g (some authors prefer the notation f|_A or f|A).

Theorem 2.7. Let f : A −→ B be a map, let ∅ ≠ I be an index set, and assume S, T, S_i, i ∈ I, are subsets of A, whereas U, V, U_i, i ∈ I, are subsets of B. Then we have the following rules concerning functions and set-theoretic operations:

f(S ∩ T) ⊆ f(S) ∩ f(T), (2.16a)

f(⋂_{i∈I} S_i) ⊆ ⋂_{i∈I} f(S_i), (2.16b)

f(S ∪ T) = f(S) ∪ f(T), (2.16c)

f(⋃_{i∈I} S_i) = ⋃_{i∈I} f(S_i), (2.16d)

f^{-1}(U ∩ V) = f^{-1}(U) ∩ f^{-1}(V), (2.16e)

f^{-1}(⋂_{i∈I} U_i) = ⋂_{i∈I} f^{-1}(U_i), (2.16f)

f^{-1}(U ∪ V) = f^{-1}(U) ∪ f^{-1}(V), (2.16g)

f^{-1}(⋃_{i∈I} U_i) = ⋃_{i∈I} f^{-1}(U_i), (2.16h)

f(f^{-1}(U)) ⊆ U, f^{-1}(f(S)) ⊇ S, (2.16i)

f^{-1}(U \ V) = f^{-1}(U) \ f^{-1}(V). (2.16j)

Proof. We prove (2.16b) (which includes (2.16a) as a special case) and the second part of (2.16i), and leave the remaining cases as exercises.

For (2.16b), one argues

y ∈ f(⋂_{i∈I} S_i) ⇔ ∃_{x∈A} ∀_{i∈I} (x ∈ S_i ∧ y = f(x)) ⇒ ∀_{i∈I} y ∈ f(S_i) ⇔ y ∈ ⋂_{i∈I} f(S_i).

The observation

x ∈ S ⇒ f(x) ∈ f(S) ⇔ x ∈ f^{-1}(f(S))

establishes the second part of (2.16i). ∎

It is an exercise to find counterexamples that show one cannot, in general, replace the four subset symbols in (2.16) by equalities (it is possible to find examples with sets that have at most 2 elements).

Definition 2.8. The composition of maps f and g with f : A −→ B, g : C −→ D, and f(A) ⊆ C is defined to be the map

g ∘ f : A −→ D, (g ∘ f)(x) := g(f(x)). (2.17)

The expression g ∘ f is read as “g after f” or “g composed with f”.


Example 2.9. Consider the maps

f : N −→ R, n ↦ n^2, (2.18a)

g : N −→ R, n ↦ 2n. (2.18b)

We obtain f(N) = {1, 4, 9, . . . } ⊆ dom(g), g(N) = {2, 4, 6, . . . } ⊆ dom(f), and the compositions

(g ∘ f) : N −→ R, (g ∘ f)(n) = g(n^2) = 2n^2, (2.19a)

(f ∘ g) : N −→ R, (f ∘ g)(n) = f(2n) = 4n^2, (2.19b)

showing that composing functions is, in general, not commutative, even if the involved functions have the same domain and the same codomain.
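The two compositions can be evaluated on a few sample points. The following sketch uses a hypothetical helper compose that mirrors (2.17); it does not check the condition f(A) ⊆ C, which is assumed to hold.

```python
def compose(g, f):
    """Return g ∘ f, i.e. the map x ↦ g(f(x)); assumes f(A) ⊆ dom(g)."""
    return lambda x: g(f(x))

f = lambda n: n ** 2      # f from (2.18a)
g = lambda n: 2 * n       # g from (2.18b)

g_after_f = compose(g, f)  # n ↦ 2n^2, cf. (2.19a)
f_after_g = compose(f, g)  # n ↦ 4n^2, cf. (2.19b)

sample = range(1, 6)
print([g_after_f(n) for n in sample])
print([f_after_g(n) for n in sample])
```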

Proposition 2.10. Consider maps f : A −→ B, g : C −→ D, h : E −→ F, satisfying f(A) ⊆ C and g(C) ⊆ E.

(a) Associativity of Compositions:

h ∘ (g ∘ f) = (h ∘ g) ∘ f. (2.20)

(b) One has the following law for forming preimages:

∀_{W∈P(D)} (g ∘ f)^{-1}(W) = f^{-1}(g^{-1}(W)). (2.21)

Proof. (a): Both h ∘ (g ∘ f) and (h ∘ g) ∘ f map A into F. So it just remains to prove (h ∘ (g ∘ f))(x) = ((h ∘ g) ∘ f)(x) for each x ∈ A. One computes, for each x ∈ A,

(h ∘ (g ∘ f))(x) = h((g ∘ f)(x)) = h(g(f(x))) = (h ∘ g)(f(x)) = ((h ∘ g) ∘ f)(x), (2.22)

establishing the case.

(b): Exercise. ∎

Definition 2.11. A function g : B −→ A is called a right inverse (resp. left inverse) of a function f : A −→ B if, and only if, f ∘ g = Id_B (resp. g ∘ f = Id_A). Moreover, g is called an inverse of f if, and only if, it is both a right and a left inverse. If g is an inverse of f, then one also writes f^{-1} instead of g. The map f is called (right, left) invertible if, and only if, there exists a (right, left) inverse for f.

Example 2.12. (a) Consider the map

f : N −→ N, f(n) := 2n. (2.23a)

The maps

g_1 : N −→ N, g_1(n) := { n/2 if n even,
                        { 1 if n odd, (2.23b)

g_2 : N −→ N, g_2(n) := { n/2 if n even,
                        { 2 if n odd, (2.23c)

both constitute left inverses of f. It follows from Th. 2.13(c) below that f does not have a right inverse.

(b) Consider the map

f : N −→ N, f(n) := { n/2 for n even,
                    { (n + 1)/2 for n odd. (2.24a)

The maps

g_1 : N −→ N, g_1(n) := 2n, (2.24b)

g_2 : N −→ N, g_2(n) := 2n − 1, (2.24c)

both constitute right inverses of f. It follows from Th. 2.13(c) below that f does not have a left inverse.

(c) The map

f : N −→ N, f(n) := { n − 1 for n even,
                    { n + 1 for n odd, (2.25a)

is its own inverse, i.e. f^{-1} = f. For the map

g : N −→ N, g(n) := { 2 for n = 1,
                    { 3 for n = 2,
                    { 1 for n = 3,
                    { n for n ∉ {1, 2, 3}, (2.25b)

the inverse is

g^{-1} : N −→ N, g^{-1}(n) := { 3 for n = 1,
                              { 1 for n = 2,
                              { 2 for n = 3,
                              { n for n ∉ {1, 2, 3}. (2.25c)

While Examples 2.12(a),(b) show that left and right inverses are usually not unique, they are unique provided f is bijective (see Th. 2.13(c)).
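The claims of Example 2.12 can be spot-checked on finitely many arguments. The sketch below tests the defining identities g ∘ f = Id and f ∘ g = Id on the sample {1, . . . , 50}; the sample and the suffixed names (f_a, g1_a, and so on) are assumptions made for illustration, and a finite test of course does not replace the argument for all n ∈ N.

```python
N_sample = range(1, 51)    # finite sample of N (assumption)

f_a = lambda n: 2 * n                                   # (2.23a)
g1_a = lambda n: n // 2 if n % 2 == 0 else 1            # (2.23b)
g2_a = lambda n: n // 2 if n % 2 == 0 else 2            # (2.23c)

f_b = lambda n: n // 2 if n % 2 == 0 else (n + 1) // 2  # (2.24a)
g1_b = lambda n: 2 * n                                  # (2.24b)
g2_b = lambda n: 2 * n - 1                              # (2.24c)

f_c = lambda n: n - 1 if n % 2 == 0 else n + 1          # (2.25a)

# g1_a and g2_a undo f_a, i.e. they are left inverses on the sample.
left_inverses = all(g(f_a(n)) == n for g in (g1_a, g2_a) for n in N_sample)
# g1_a is not a right inverse: f_a(g1_a(n)) differs from n for some n.
not_right = any(f_a(g1_a(n)) != n for n in N_sample)
# g1_b and g2_b are right inverses of f_b on the sample.
right_inverses = all(f_b(g(n)) == n for g in (g1_b, g2_b) for n in N_sample)
# f_c is its own inverse on the sample.
self_inverse = all(f_c(f_c(n)) == n for n in N_sample)
print(left_inverses, not_right, right_inverses, self_inverse)
```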

Theorem 2.13. Let A,B be nonempty sets.

(a) f : A −→ B is right invertible if, and only if, f is surjective (where the implication “⇐” makes use of the axiom of choice (AC), see Appendix A.4).

(b) f : A −→ B is left invertible if, and only if, f is injective.

(c) f : A −→ B is invertible if, and only if, f is bijective. In this case, the right inverse and the left inverse are unique and both identical to the inverse.


Proof. (a): If f is surjective, then, for each y ∈ B, there exists x_y ∈ f^{-1}{y} such that f(x_y) = y. By AC, we can define the choice function

g : B −→ A, g(y) := x_y. (2.26)

Then, for each y ∈ B, f(g(y)) = y, showing g is a right inverse of f. Conversely, if g : B −→ A is a right inverse of f, then, for each y ∈ B, it is y = f(g(y)), showing that g(y) ∈ A is a preimage of y, i.e. f is surjective.

(b): Fix a ∈ A. If f is injective, then, for each y ∈ B with f^{-1}{y} ≠ ∅, let x_y denote the unique element in A satisfying f(x_y) = y. Define

g : B −→ A, g(y) := { x_y for f^{-1}{y} ≠ ∅,
                    { a otherwise. (2.27)

Then, for each x ∈ A, g(f(x)) = x, showing g is a left inverse of f. Conversely, if g : B −→ A is a left inverse of f and x_1, x_2 ∈ A with f(x_1) = f(x_2) = y, then x_1 = (g ∘ f)(x_1) = g(f(x_1)) = g(f(x_2)) = (g ∘ f)(x_2) = x_2, showing y has precisely one preimage and f is injective.

(c): Assume g to be a left inverse of f and h to be a right inverse of f. Then, for each y ∈ B,

g(y) = (g ∘ (f ∘ h))(y) = ((g ∘ f) ∘ h)(y) = h(y), (2.28)

showing g = h. In particular, if f has an inverse f^{-1}, then g = h = f^{-1}. If f is invertible, then f is bijective by (a) and (b). If f is bijective, then, by (a) and (b), f has a left inverse g and a right inverse h (here, this follows without using AC, since, if f is both injective and surjective, then, for each y ∈ B, the element x_y ∈ f^{-1}{y} is unique, and (2.26) can be defined without AC). By (2.28), g = h, i.e. f is invertible. ∎

Theorem 2.14. Consider maps f : A −→ B, g : B −→ C. If f and g are both injective (resp. both surjective, both bijective), then so is g ∘ f. Moreover, in the bijective case, one has

(g ∘ f)^{-1} = f^{-1} ∘ g^{-1}. (2.29)

Proof. Exercise. ∎

Definition 2.15. (a) Given an index set I and a set A, a map f : I −→ A is sometimes called a family (of elements in A), and is denoted in the form f = (a_i)_{i∈I} with a_i := f(i). When using this representation, one often does not even specify f and A, especially if the a_i are themselves sets.

(b) A sequence in a set A is a family of elements in A, where the index set is the set of natural numbers N. In this case, one writes (a_n)_{n∈N} or (a_1, a_2, . . . ). More generally, a family is called a sequence, given a bijective map between the index set I and a subset of N.

(c) Given a family of sets (A_i)_{i∈I}, we define the Cartesian product of the A_i to be the set of functions

∏_{i∈I} A_i := { (f : I −→ ⋃_{j∈I} A_j) : ∀_{i∈I} f(i) ∈ A_i }. (2.30)

If I has precisely n elements with n ∈ N, then the elements of the Cartesian product ∏_{i∈I} A_i are called (ordered) n-tuples, (ordered) triples for n = 3.

Example 2.16. (a) Using the notion of family, we can now say that the intersection ⋂_{i∈I} A_i and union ⋃_{i∈I} A_i as defined in Def. 1.38 are the intersection and union of the family of sets (A_i)_{i∈I}, respectively. As a concrete example, let us revisit (1.26b), where we have

(A_n)_{n∈N}, A_n := {1, 2, . . . , n}, ⋂_{n∈N} A_n = {1}. (2.31)

(b) Examples of Sequences:

Sequence in {0, 1}: (1, 0, 1, 0, 1, 0, . . . ), (2.32a)

Sequence in N: (n^2)_{n∈N} = (1, 4, 9, 16, 25, . . . ), (2.32b)

Sequence in R: ((−1)^n √n)_{n∈N} = (−1, √2, −√3, . . . ), (2.32c)

Sequence in R: (1/n)_{n∈N} = (1, 1/2, 1/3, . . . ), (2.32d)

Finite Sequence in P(N): ({3, 2, 1}, {2, 1}, {1}, ∅). (2.32e)

(c) The Cartesian product ∏_{i∈I} A, where all sets A_i = A, is the same as A^I, the set of all functions from I into A. So, for example, ∏_{n∈N} R = R^N is the set of all sequences in R. If I = {1, 2, . . . , n} with n ∈ N, then

∏_{i∈I} A = A^{{1,2,...,n}} =: ∏_{i=1}^{n} A =: A^n (2.33)

is the set of all n-tuples with entries from A.

—

In the following, we explain the common notation 2^A for the power set P(A) of a set A. It is related to a natural identification between subsets and their corresponding characteristic functions.

Definition 2.17. Let A be a set and let B ⊆ A be a subset of A. Then

χ_B : A −→ {0, 1}, χ_B(x) := { 1 if x ∈ B,
                             { 0 if x ∉ B, (2.34)

is called the characteristic function of the set B (with respect to the universe A). One also finds the notations 1_B and 𝟙_B instead of χ_B (note that all the notations suppress the dependence of the characteristic function on the universe A).


Proposition 2.18. Let A be a set. Then the map

χ : P(A) −→ {0, 1}^A, χ(B) := χ_B, (2.35)

is bijective (recall that P(A) denotes the power set of A and {0, 1}^A denotes the set of all functions from A into {0, 1}).

Proof. χ is injective: Let B, C ∈ P(A) with B ≠ C. By possibly switching the names of B and C, we may assume there exists x ∈ B such that x ∉ C. Then χ_B(x) = 1, whereas χ_C(x) = 0, showing χ(B) ≠ χ(C), proving χ is injective.

χ is surjective: Let f : A −→ {0, 1} be an arbitrary function and define B := {x ∈ A : f(x) = 1}. Then χ(B) = χ_B = f, proving χ is surjective. ∎

Proposition 2.18 allows one to identify the sets P(A) and {0, 1}^A via the bijective map χ. This fact together with the common practice of set theory to identify the number 2 with the set {0, 1} (cf. the first paragraph of Sec. D.1 in the Appendix) explains the notation 2^A for P(A).
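For a small set A, the bijection of Prop. 2.18 can be written out explicitly. In the sketch below, a function A −→ {0, 1} is stored as the tuple of its values along a fixed listing of A; this representation, the set A = {0, 1, 2}, and the name chi are assumptions made for illustration.

```python
from itertools import combinations, product

A = [0, 1, 2]    # small example set (assumption)

def chi(B):
    """Characteristic function of B ⊆ A, stored as a tuple of values along A."""
    return tuple(1 if x in B else 0 for x in A)

# All subsets of A and all maps A -> {0, 1} (as value tuples).
subsets = [frozenset(c) for k in range(len(A) + 1) for c in combinations(A, k)]
functions = set(product((0, 1), repeat=len(A)))

images = [chi(B) for B in subsets]
injective = len(set(images)) == len(subsets)   # distinct subsets, distinct functions
surjective = set(images) == functions          # every 0/1-valued function is hit
print(len(subsets), len(functions), injective, surjective)
```

Both collections have 2^3 = 8 elements, in line with the notation 2^A.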

2.2 Relations

Definition 2.19. Given sets A and B, a relation is a subset R of A × B (if one wants to be completely precise, a relation is an ordered triple (A, B, R), where R ⊆ A × B). If A = B, then we call R a relation on A. One says that a ∈ A and b ∈ B are related according to the relation R if, and only if, (a, b) ∈ R. In this context, one usually writes a R b instead of (a, b) ∈ R.

Example 2.20. (a) The relations we are probably most familiar with are = and ≤. The relation R of equality, usually denoted =, makes sense on every nonempty set A:

R := ∆(A) := {(x, x) ∈ A × A : x ∈ A}. (2.36)

The set ∆(A) is called the diagonal of the Cartesian product, i.e., as a subset of A × A, the relation of equality is identical to the diagonal:

x = y ⇔ x R y ⇔ (x, y) ∈ R = ∆(A). (2.37)

Similarly, the relation ≤ on R is identical to the set

R_≤ := {(x, y) ∈ R^2 : x ≤ y}. (2.38)

(b) Every function f : A −→ B is a relation, namely the relation

R_f = {(x, y) ∈ A × B : y = f(x)} = graph(f). (2.39)

Conversely, if B ≠ ∅, then every relation R ⊆ A × B uniquely corresponds to the function

f_R : A −→ P(B), f_R(x) = {y ∈ B : x R y}. (2.40)

Definition 2.21. Let R be a relation on the set A.

(a) R is called reflexive if, and only if,

∀_{x∈A} x R x, (2.41)

i.e. if, and only if, every element is related to itself.

(b) R is called symmetric if, and only if,

∀_{x,y∈A} (x R y ⇒ y R x), (2.42)

i.e. if, and only if, each x is related to y if, and only if, y is related to x.

(c) R is called antisymmetric if, and only if,

∀_{x,y∈A} ((x R y ∧ y R x) ⇒ x = y), (2.43)

i.e. if, and only if, the only possibility for x to be related to y at the same time that y is related to x is in the case x = y.

(d) R is called transitive if, and only if,

∀_{x,y,z∈A} ((x R y ∧ y R z) ⇒ x R z), (2.44)

i.e. if, and only if, the relatedness of x and y together with the relatedness of y and z implies the relatedness of x and z.

Example 2.22. The relations = and ≤ on R (or N) are reflexive, antisymmetric, and transitive; = is also symmetric, whereas ≤ is not; < is antisymmetric (since x < y ∧ y < x is always false) and transitive, but neither reflexive nor symmetric. The relation

R := {(x, y) ∈ N^2 : (x, y are both even) ∨ (x, y are both odd)} (2.45)

on N is not antisymmetric, but reflexive, symmetric, and transitive. The relation

S := {(x, y) ∈ N^2 : y = x^2} (2.46)

is not transitive (for example, 2 S 4 and 4 S 16, but not 2 S 16), not reflexive, not symmetric; it is only antisymmetric.
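The four properties of Def. 2.21 can be tested by enumeration when a relation is stored as a finite set of pairs. The sketch below restricts the relations of this example to {1, . . . , 20}; this truncation and the function names are assumptions made for illustration.

```python
def is_reflexive(R, A):
    return all((x, x) in R for x in A)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

A = range(1, 21)   # finite stand-in for N (assumption)
leq = {(x, y) for x in A for y in A if x <= y}
parity = {(x, y) for x in A for y in A if x % 2 == y % 2}    # cf. (2.45)
square = {(x, y) for x in A for y in A if y == x ** 2}       # cf. (2.46)

def profile(R):
    """(reflexive, symmetric, antisymmetric, transitive) for R on A."""
    return (is_reflexive(R, A), is_symmetric(R),
            is_antisymmetric(R), is_transitive(R))

print(profile(leq), profile(parity), profile(square))
```

The failure of transitivity for the truncated relation S is witnessed by the same pairs (2, 4) and (4, 16) as in the text.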

Definition 2.23. A relation R on a set A is called an equivalence relation if, and only if, R is reflexive, symmetric, and transitive. If R is an equivalence relation, then one often writes x ∼ y instead of x R y.

Example 2.24. (a) The equality relation = is an equivalence relation on each A ≠ ∅.

(b) The relation R defined in (2.45) is an equivalence relation on N.

(c) Given a disjoint union A = ⋃̇_{i∈I} A_i with every A_i ≠ ∅ (which is sometimes called a decomposition of A), an equivalence relation on A is defined by

x ∼ y ⇔ ∃_{i∈I} (x ∈ A_i ∧ y ∈ A_i). (2.47)

Conversely, given an equivalence relation ∼ on a nonempty set A, we can construct a decomposition A = ⋃̇_{i∈I} A_i such that (2.47) holds: For each x ∈ A, define

[x] := {y ∈ A : x ∼ y}, (2.48)

called the equivalence class of x; each y ∈ [x] is called a representative of [x]. One verifies that the properties of ∼ guarantee

([x] = [y] ⇔ x ∼ y) ∧ ([x] ∩ [y] = ∅ ⇔ ¬(x ∼ y)). (2.49)

The set of all equivalence classes I := A/∼ := {[x] : x ∈ A} is called the quotient set of A by ∼, and A = ⋃̇_{i∈I} A_i with A_i := i for each i ∈ I is the desired decomposition of A.
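For a finite set, the quotient set can be computed directly from (2.48). The following sketch does this for the parity relation from (2.45), restricted to {1, . . . , 10}; the restriction and the function name equivalence_classes are assumptions made for illustration.

```python
def equivalence_classes(A, related):
    """Return the quotient set A/∼ as a set of frozensets, cf. (2.48)."""
    return {frozenset(y for y in A if related(x, y)) for x in A}

A = range(1, 11)                                  # finite sample (assumption)
same_parity = lambda x, y: x % 2 == y % 2         # the relation from (2.45)

quotient = equivalence_classes(A, same_parity)
# The classes cover A and are pairwise disjoint, i.e. they form a decomposition.
covers_A = set().union(*quotient) == set(A)
pairwise_disjoint = all(c == d or not (c & d) for c in quotient for d in quotient)
print(len(quotient), covers_A, pairwise_disjoint)
```

Since equal classes collapse inside a Python set, the ten elements of A yield only two classes, the odd and the even numbers.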

Definition 2.25. A relation R on a set A is called a partial order if, and only if, R is reflexive, antisymmetric, and transitive. If R is a partial order, then one usually writes x ≤ y instead of x R y. A partial order ≤ is called a total or linear order if, and only if, for each x, y ∈ A, one has x ≤ y or y ≤ x.

Notation 2.26. Given a (partial or total) order ≤ on A ≠ ∅, we write x < y if, and only if, x ≤ y and x ≠ y, calling < the strict order corresponding to ≤ (note that the strict order is never a partial order).

Definition 2.27. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A.

(a) x ∈ A is called lower (resp. upper) bound for B if, and only if, x ≤ b (resp. b ≤ x) for each b ∈ B. Moreover, B is called bounded from below (resp. from above) if, and only if, there exists a lower (resp. upper) bound for B; B is called bounded if, and only if, it is bounded from above and from below.

(b) x ∈ B is called minimum or just min (resp. maximum or max) of B if, and only if, x is a lower (resp. upper) bound for B. One writes x = min B if x is minimum and x = max B if x is maximum.

(c) A maximum of the set of lower bounds of B (i.e. a largest lower bound) is called infimum of B, denoted inf B; a minimum of the set of upper bounds of B (i.e. a smallest upper bound) is called supremum of B, denoted sup B.

Example 2.28. (a) For each A ⊆ R, the usual relation ≤ defines a total order on A. For A = R, we see that N has 0 and 1 as lower bounds with 1 = min N = inf N. On the other hand, N is unbounded from above. The set M := {1, 2, 3} is bounded with min M = 1, max M = 3. The positive real numbers R⁺ := {x ∈ R : x > 0} have inf R⁺ = 0, but they do not have a minimum (if x > 0, then 0 < x/2 < x).

(b) Consider A := N × N. Then

$(m_1, m_2) \le (n_1, n_2) \;\Leftrightarrow\; m_1 \le n_1 \,\wedge\, m_2 \le n_2$, (2.50)

defines a partial order on A that is not a total order (for example, neither (1, 2) ≤ (2, 1) nor (2, 1) ≤ (1, 2)). For the set

B := {(1, 1), (2, 1), (1, 2)}, (2.51)

we have inf B = min B = (1, 1), B does not have a max, but sup B = (2, 2) (if (m, n) ∈ A is an upper bound for B, then (2, 1) ≤ (m, n) implies 2 ≤ m and (1, 2) ≤ (m, n) implies 2 ≤ n, i.e. (2, 2) ≤ (m, n); since (2, 2) is clearly an upper bound for B, we have proved sup B = (2, 2)).

A different order on A is the so-called lexicographic order defined by

$(m_1, m_2) \le (n_1, n_2) \;\Leftrightarrow\; m_1 < n_1 \,\vee\, (m_1 = n_1 \wedge m_2 \le n_2)$. (2.52)

In contrast to the order from (2.50), the lexicographic order does define a total order on A.
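The two orders of Example 2.28(b) can be explored numerically. The following is a minimal sketch, assuming a finite window {1, ..., 4} × {1, ..., 4} of N × N that is large enough to contain all bounds of interest; the helper names `leq_product`, `leq_lex`, and `supremum` are illustrative and not part of the notes.

```python
from itertools import product

def leq_product(p, q):
    """Componentwise order (2.50)."""
    return p[0] <= q[0] and p[1] <= q[1]

def leq_lex(p, q):
    """Lexicographic order (2.52)."""
    return p[0] < q[0] or (p[0] == q[0] and p[1] <= q[1])

def supremum(B, universe, leq):
    """Least upper bound of B within the finite set `universe`, or None."""
    uppers = [u for u in universe if all(leq(b, u) for b in B)]
    least = [u for u in uppers if all(leq(u, v) for v in uppers)]
    return least[0] if least else None

# Assumption: a finite window of N x N stands in for the whole set.
universe = list(product(range(1, 5), repeat=2))
B = [(1, 1), (2, 1), (1, 2)]

# (1, 2) and (2, 1) are not comparable with respect to (2.50).
comparable = leq_product((1, 2), (2, 1)) or leq_product((2, 1), (1, 2))
# B has a supremum, but no element of B is an upper bound for B.
sup_B = supremum(B, universe, leq_product)
has_max = any(all(leq_product(b, c) for b in B) for c in B)
# On the window, any two elements are comparable lexicographically.
lex_total = all(leq_lex(p, q) or leq_lex(q, p)
                for p in universe for q in universe)
```

The check on a finite window is, of course, only an illustration; the argument given in the example is what proves sup B = (2, 2).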

Lemma 2.29. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A. Then the relation ≥, defined by

x ≥ y ⇔ y ≤ x, (2.53)

is also a partial order on A. Moreover, using obvious notation, we have, for each x ∈ A,

x ≤-lower bound for B ⇔ x ≥-upper bound for B, (2.54a)

x ≤-upper bound for B ⇔ x ≥-lower bound for B, (2.54b)

$x = \min_{\le} B \;\Leftrightarrow\; x = \max_{\ge} B$, (2.54c)

$x = \max_{\le} B \;\Leftrightarrow\; x = \min_{\ge} B$, (2.54d)

$x = \inf_{\le} B \;\Leftrightarrow\; x = \sup_{\ge} B$, (2.54e)

$x = \sup_{\le} B \;\Leftrightarrow\; x = \inf_{\ge} B$. (2.54f)

Proof. Reflexivity, antisymmetry, and transitivity of ≤ clearly imply the same properties for ≥, respectively. Moreover,

x ≤-lower bound for B ⇔ $\forall_{b\in B}\; x \le b$ ⇔ $\forall_{b\in B}\; b \ge x$ ⇔ x ≥-upper bound for B,

proving (2.54a). Analogously, we obtain (2.54b). Next, (2.54c) and (2.54d) are implied by (2.54a) and (2.54b), respectively. Finally, (2.54e) is proved by

$x = \inf_{\le} B \;\Leftrightarrow\; x = \max_{\le}\{y \in A : y \text{ is a } \le\text{-lower bound for } B\} \;\Leftrightarrow\; x = \min_{\ge}\{y \in A : y \text{ is a } \ge\text{-upper bound for } B\} \;\Leftrightarrow\; x = \sup_{\ge} B$,

and (2.54f) follows analogously. ∎

Proposition 2.30. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A. The elements max B, min B, sup B, inf B are all unique, provided they exist.


Definition 2.31. Let A, B be nonempty sets with partial orders, both denoted by ≤ (even though they might be different). A function f : A −→ B is called (strictly) isotone, order-preserving, or increasing if, and only if,

$\forall_{x,y\in A}\; \bigl(x < y \;\Rightarrow\; f(x) \le f(y) \ (\text{resp. } f(x) < f(y))\bigr)$; (2.55a)

f is called (strictly) antitone, order-reversing, or decreasing if, and only if,

$\forall_{x,y\in A}\; \bigl(x < y \;\Rightarrow\; f(x) \ge f(y) \ (\text{resp. } f(x) > f(y))\bigr)$. (2.55b)

Functions that are (strictly) isotone or antitone are called (strictly) monotone.

Proposition 2.32. Let A,B be nonempty sets with partial orders, both denoted by ≤.

(a) A (strictly) isotone function f : A −→ B becomes a (strictly) antitone function and vice versa if precisely one of the relations ≤ is replaced by ≥.

(b) If the order ≤ on A is total and f : A −→ B is strictly isotone or strictly antitone, then f is one-to-one.

(c) If the order ≤ on A is total and f : A −→ B is invertible and strictly isotone (resp. antitone), then $f^{-1}$ is also strictly isotone (resp. antitone).

Proof. (a) is immediate from (2.55).

(b): Due to (a), it suffices to consider the case that f is strictly isotone. If f is strictly isotone and x ≠ y, then x < y or y < x since the order on A is total. Thus, f(x) < f(y) or f(y) < f(x), i.e. f(x) ≠ f(y) in every case, showing f is one-to-one.

(c): Again, due to (a), it suffices to consider the isotone case. If u, v ∈ B such that u < v, then $u = f(f^{-1}(u))$, $v = f(f^{-1}(v))$, and the isotonicity of f imply $f^{-1}(u) < f^{-1}(v)$ (we are using that the order on A is total; otherwise, $f^{-1}(u)$ and $f^{-1}(v)$ need not be comparable). ∎

Example 2.33. (a) f : N −→ N, f(n) := 2n, is strictly increasing, every constant map on N is both increasing and decreasing, but not strictly increasing or decreasing. All maps occurring in (2.25) are neither increasing nor decreasing.

(b) The map f : R −→ R, f(x) := −2x, is invertible and strictly decreasing, and so is $f^{-1}$ : R −→ R, $f^{-1}(x) := -x/2$.


(c) The following counterexamples show that the assertions of Prop. 2.32(b),(c) are no longer correct if one does not assume the order on A is total. Let A be the set from (2.51) (where it had been called B) with the (nontotal) order from (2.50). The map

f : A −→ N, f(1, 1) := 1, f(1, 2) := 2, f(2, 1) := 2, (2.56)

is strictly isotone, but not one-to-one. The map

f : A −→ {1, 2, 3}, f(1, 1) := 1, f(1, 2) := 2, f(2, 1) := 3, (2.57)

is strictly isotone and invertible, however $f^{-1}$ is not isotone (since 2 < 3, but $f^{-1}(2) = (1, 2)$ and $f^{-1}(3) = (2, 1)$ are not comparable, i.e. $f^{-1}(2) \le f^{-1}(3)$ is not true).
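The two counterexamples (2.56) and (2.57) are small enough to be checked mechanically. The following sketch assumes the maps are stored as dictionaries; the names `lt_product`, `lt_int`, and `is_strictly_isotone` are hypothetical helpers introduced only for this illustration.

```python
def lt_product(p, q):
    """Strict order belonging to the componentwise order (2.50)."""
    return p != q and p[0] <= q[0] and p[1] <= q[1]

def lt_int(m, n):
    """Usual strict order on the integers."""
    return m < n

def is_strictly_isotone(f, domain, lt_dom, lt_cod):
    """Check x < y => f(x) < f(y) for all x, y in the finite domain."""
    return all(lt_cod(f[x], f[y])
               for x in domain for y in domain if lt_dom(x, y))

A = [(1, 1), (1, 2), (2, 1)]

# Map (2.56): strictly isotone, but not one-to-one.
f = {(1, 1): 1, (1, 2): 2, (2, 1): 2}
f_isotone = is_strictly_isotone(f, A, lt_product, lt_int)
f_injective = len(set(f.values())) == len(f)

# Map (2.57): strictly isotone and invertible, inverse not isotone.
g = {(1, 1): 1, (1, 2): 2, (2, 1): 3}
g_isotone = is_strictly_isotone(g, A, lt_product, lt_int)
g_inverse = {value: key for key, value in g.items()}
g_inverse_isotone = is_strictly_isotone(g_inverse, [1, 2, 3],
                                        lt_int, lt_product)
```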

3 Natural Numbers, Induction, and the Size of Sets

3.1 Induction and Recursion

One of the most useful proof techniques is the method of induction. It is used in situations where one needs to verify the truth of statements φ(n) for each n ∈ N, i.e. the truth of the statement

$\forall_{n\in\mathbb{N}}\; \phi(n)$. (3.1)

Induction is based on the fact that N satisfies the so-called Peano axioms:

P1: N contains a special element called one, denoted 1.

P2: There exists an injective map S : N −→ N \ {1}, called the successor function (for each n ∈ N, S(n) is called the successor of n).

P3: If a subset A of N has the property that 1 ∈ A and S(n) ∈ A for each n ∈ A, then A is equal to N. Written as a formula, the third axiom is:

$\forall_{A\in\mathcal{P}(\mathbb{N})}\; \bigl(1 \in A \,\wedge\, S(A) \subseteq A \;\Rightarrow\; A = \mathbb{N}\bigr)$.

Remark 3.1. In Def. 1.28, we had introduced the natural numbers N := {1, 2, 3, ...}. The successor function is S(n) = n + 1. In axiomatic set theory, one starts with the Peano axioms and shows that the axioms of set theory allow the construction of a set N which satisfies the Peano axioms. One then defines 2 := S(1), 3 := S(2), ..., n + 1 := S(n). The interested reader can find more details in Appendix D.1.


Theorem 3.2 (Principle of Induction). Suppose, for each n ∈ N, φ(n) is a statement (i.e. a predicate of n in the language of Def. 1.31). If (a) and (b) both hold, where

(a) φ(1) is true,

(b) $\forall_{n\in\mathbb{N}}\; \bigl(\phi(n) \Rightarrow \phi(n+1)\bigr)$,

then (3.1) is true, i.e. φ(n) is true for every n ∈ N.

Proof. Let A := {n ∈ N : φ(n)}. We have to show A = N. Since 1 ∈ A by (a), and

$n \in A \;\Rightarrow\; \phi(n) \;\overset{(b)}{\Rightarrow}\; \phi(n+1) \;\Rightarrow\; S(n) = n+1 \in A$, (3.2)

i.e. S(A) ⊆ A, the Peano axiom P3 implies A = N. ∎

Remark 3.3. To prove some φ(n) for each n ∈ N by induction according to Th. 3.2 consists of the following two steps:

(a) Prove φ(1), the so-called base case.

(b) Perform the inductive step, i.e. prove that φ(n) (the induction hypothesis) implies φ(n + 1).

Example 3.4. We use induction to prove the statement

$\forall_{n\in\mathbb{N}}\; \underbrace{\Bigl(1 + 2 + \dots + n = \frac{n(n+1)}{2}\Bigr)}_{\phi(n)}$ : (3.3)

Base Case (n = 1): $1 = \frac{1\cdot 2}{2}$, i.e. φ(1) is true.

Induction Hypothesis: Assume φ(n), i.e. $1 + 2 + \dots + n = \frac{n(n+1)}{2}$ holds.

Induction Step: One computes

$1 + 2 + \dots + n + (n+1) \;\overset{(\phi(n))}{=}\; \frac{n(n+1)}{2} + n + 1 = \frac{n(n+1) + 2n + 2}{2} = \frac{n^2 + 3n + 2}{2} = \frac{(n+1)(n+2)}{2}$, (3.4)

i.e. φ(n + 1) holds and the induction is complete.
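A numerical check cannot replace the induction proof, but it can serve as a sanity test of formula (3.3) and of the computation (3.4). The sketch below compares the closed form with direct summation for the first 500 natural numbers; the function names and the bound 500 are arbitrary choices made for this illustration.

```python
def gauss_sum(n):
    """Closed form n(n+1)/2 from (3.3); the division is exact
    because n(n+1) is always even."""
    return n * (n + 1) // 2

def direct_sum(n):
    """1 + 2 + ... + n, computed term by term."""
    return sum(range(1, n + 1))

# phi(n) for n = 1, ..., 500.
formula_holds = all(direct_sum(n) == gauss_sum(n) for n in range(1, 501))

# The algebra of the induction step (3.4):
# n(n+1)/2 + (n+1) equals (n+1)(n+2)/2.
step_holds = all(gauss_sum(n) + (n + 1) == gauss_sum(n + 1)
                 for n in range(1, 501))
```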

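Corollary 3.5. Theorem 3.2 remains true if (b) is replaced by

$\forall_{n\in\mathbb{N}}\; \Bigl(\bigl(\forall_{1\le m\le n}\; \phi(m)\bigr) \Rightarrow \phi(n+1)\Bigr)$. (3.5)

Proof. If, for each n ∈ N, we use ψ(n) to denote $\forall_{1\le m\le n}\; \phi(m)$, then (3.5) is equivalent to $\forall_{n\in\mathbb{N}}\; \bigl(\psi(n) \Rightarrow \psi(n+1)\bigr)$, i.e. to Th. 3.2(b) with φ replaced by ψ. Since ψ(1) is the same statement as φ(1), which holds by Th. 3.2(a), Th. 3.2 implies ψ(n) holds true for each n ∈ N, i.e. φ(n) holds true for each n ∈ N. ∎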

Corollary 3.6. Let I be an index set. Suppose, for each i ∈ I, φ(i) is a statement. If there is a bijective map f : N −→ I and (a) and (b) both hold, where

(a) φ(f(1)) is true,

(b) $\forall_{n\in\mathbb{N}}\; \bigl(\phi(f(n)) \Rightarrow \phi(f(n+1))\bigr)$,

then φ(i) is true for every i ∈ I.

Finite Induction: The above assertion remains true if f : {1, ..., m} −→ I is bijective for some m ∈ N and N in (b) is replaced by {1, ..., m − 1}.

Proof. If, for each n ∈ N, we use ψ(n) to denote φ(f(n)), then Th. 3.2 shows ψ(n) is true for every n ∈ N. Given i ∈ I, we have $n := f^{-1}(i) \in \mathbb{N}$ with f(n) = i, showing that φ(i) = φ(f(n)) = ψ(n) is true.

For the finite induction, let ψ(n) denote (n ≤ m ∧ φ(f(n))) ∨ n > m. Then, for 1 ≤ n < m, we have ψ(n) ⇒ ψ(n + 1) due to (b). For n ≥ m, we also have ψ(n) ⇒ ψ(n + 1) due to n ≥ m ⇒ n + 1 > m. Thus, Th. 3.2 shows ψ(n) is true for every n ∈ N. Given i ∈ I, it is $n := f^{-1}(i) \in \{1, \dots, m\}$ with f(n) = i. Since n ≤ m ∧ ψ(n) ⇒ φ(f(n)), we obtain that φ(i) is true. ∎

Apart from providing a widely employable proof technique, the most important application of Th. 3.2 is the possibility to define sequences inductively, using so-called recursion:

Theorem 3.7 (Recursion Theorem). Let A be a nonempty set and x ∈ A. Given a sequence of functions $(f_n)_{n\in\mathbb{N}}$, where $f_n : A^n \longrightarrow A$, there exists a unique sequence $(x_n)_{n\in\mathbb{N}}$ in A satisfying the following two conditions:

(i) $x_1 = x$.

(ii) $\forall_{n\in\mathbb{N}}\; x_{n+1} = f_n(x_1, \dots, x_n)$.

The same holds if N is replaced by an index set I as in Cor. 3.6.

Proof. To prove uniqueness, let $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ be sequences in A, both satisfying (i) and (ii), i.e.

$x_1 = y_1 = x$ and (3.6a)

$\forall_{n\in\mathbb{N}}\; \bigl(x_{n+1} = f_n(x_1, \dots, x_n) \,\wedge\, y_{n+1} = f_n(y_1, \dots, y_n)\bigr)$. (3.6b)

We prove by induction (in the form of Cor. 3.5) that $(x_n)_{n\in\mathbb{N}} = (y_n)_{n\in\mathbb{N}}$, i.e.

$\forall_{n\in\mathbb{N}}\; \underbrace{x_n = y_n}_{\phi(n)}$ : (3.7)

Base Case (n = 1): φ(1) is true according to (3.6a).

Induction Hypothesis: Assume φ(m) for each m ∈ {1, ..., n}, i.e. $x_m = y_m$ holds for each m ∈ {1, ..., n}.

Induction Step: One computes

$x_{n+1} \;\overset{(3.6b)}{=}\; f_n(x_1, \dots, x_n) \;\overset{(\phi(1),\dots,\phi(n))}{=}\; f_n(y_1, \dots, y_n) \;\overset{(3.6b)}{=}\; y_{n+1}$, (3.8)

i.e. φ(n + 1) holds and the induction is complete.

To prove existence, we have to show that there is a function F : N −→ A such that the following two conditions hold:

$F(1) = x$, (3.9a)

$\forall_{n\in\mathbb{N}}\; F(n+1) = f_n\bigl(F(1), \dots, F(n)\bigr)$. (3.9b)

To this end, let

$\mathcal{F} := \Bigl\{ B \subseteq \mathbb{N} \times A : (1, x) \in B \;\wedge\; \forall_{n\in\mathbb{N},\ (1,a_1),\dots,(n,a_n)\in B}\; \bigl(n+1, f_n(a_1, \dots, a_n)\bigr) \in B \Bigr\}$ (3.10)

and

$G := \bigcap_{B\in\mathcal{F}} B$. (3.11)

Note that G is well-defined, as $\mathbb{N} \times A \in \mathcal{F}$. Also, clearly, $G \in \mathcal{F}$. We would like to define F such that G = graph(F). For this to be possible, we will show, by induction,

$\forall_{n\in\mathbb{N}}\; \underbrace{\exists!_{x_n\in A}\; (n, x_n) \in G}_{\phi(n)}$. (3.12)

Base Case (n = 1): From the definition of G, we know (1, x) ∈ G. If (1, a) ∈ G with a ≠ x, then $H := G \setminus \{(1, a)\} \in \mathcal{F}$, implying G ⊆ H in contradiction to (1, a) ∉ H. This shows a = x and proves φ(1).

Induction Hypothesis: Assume φ(m) for each m ∈ {1, ..., n}.

Induction Step: From the induction hypothesis, we know

$\exists!_{(x_1,\dots,x_n)\in A^n}\; (1, x_1), \dots, (n, x_n) \in G$.

Thus, if we let $x_{n+1} := f_n(x_1, \dots, x_n)$, then $(n+1, x_{n+1}) \in G$ by the definition of G. If (n + 1, a) ∈ G with $a \ne x_{n+1}$, then $H := G \setminus \{(n+1, a)\} \in \mathcal{F}$ (using the uniqueness of the $(1, x_1), \dots, (n, x_n) \in G$), implying G ⊆ H in contradiction to (n + 1, a) ∉ H. This shows $a = x_{n+1}$, proves φ(n + 1), and completes the induction.

Due to (3.12), we can now define F : N −→ A, $F(n) := x_n$, and the definition of G then guarantees the validity of (3.9). ∎

Example 3.8. In many applications of Th. 3.7, one has functions $g_n : A \longrightarrow A$ and uses

$\forall_{n\in\mathbb{N}}\; \bigl(f_n : A^n \longrightarrow A, \quad f_n(a_1, \dots, a_n) := g_n(a_n)\bigr)$. (3.13)

Here are some important concrete examples:

(a) The factorial function $F : \mathbb{N}_0 \longrightarrow \mathbb{N}$, n ↦ n!, is defined recursively by

$0! := 1, \quad 1! := 1, \quad \forall_{n\in\mathbb{N}}\; (n+1)! := (n+1) \cdot n!$, (3.14a)

i.e. we have A = N and $g_n(x) := (n+1) \cdot x$. So we obtain

$(n!)_{n\in\mathbb{N}_0} = (1, 1, 2, 6, 24, 120, \dots)$. (3.14b)

(b) For each a ∈ R and each d ∈ R, we define the following arithmetic progression (also called arithmetic sequence) recursively by

$a_1 := a, \quad \forall_{n\in\mathbb{N}}\; a_{n+1} := a_n + d$, (3.15a)

i.e. we have A = R and $g_n = g$ with g(x) := x + d. For example, for a = 2 and d = −0.5, we obtain

$(a_n)_{n\in\mathbb{N}} = (2, 1.5, 1, 0.5, 0, -0.5, -1, -1.5, \dots)$. (3.15b)

(c) For each a ∈ R and each q ∈ R \ {0}, we define the following geometric progression (also called geometric sequence) recursively by

$x_1 := a, \quad \forall_{n\in\mathbb{N}}\; x_{n+1} := x_n \cdot q$, (3.16a)

i.e. we have A = R and $g_n = g$ with g(x) := x · q. For example, for a = 3 and q = −2, we obtain

$(x_n)_{n\in\mathbb{N}} = (3, -6, 12, -24, 48, \dots)$. (3.16b)

For the time being, we will continue to always specify A and the $g_n$ or $f_n$ in subsequent recursive definitions, but in the literature, most of the time, the $g_n$ or $f_n$ are not provided explicitly.

Example 3.9. (a) The Fibonacci sequence consists of the Fibonacci numbers, defined recursively by

$F_0 := 0, \quad F_1 := 1, \quad \forall_{n\in\mathbb{N}}\; F_{n+1} := F_n + F_{n-1}$, (3.17a)

i.e. we have $A = \mathbb{N}_0$ and

$f_n : A^n \longrightarrow A, \quad f_n(a_1, \dots, a_n) := \begin{cases} 1 & \text{for } n = 1, \\ a_n + a_{n-1} & \text{for } n \ge 2. \end{cases}$ (3.17b)

So we obtain

$(F_n)_{n\in\mathbb{N}_0} = (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, \dots)$. (3.17c)

(b) For A := N, x := 1, and

$f_n : A^n \longrightarrow A, \quad f_n(a_1, \dots, a_n) := a_1 + \dots + a_n$, (3.18a)

one obtains

$x_1 = 1, \quad x_2 = f_1(1) = 1, \quad x_3 = f_2(1, 1) = 2, \quad x_4 = f_3(1, 1, 2) = 4, \quad x_5 = f_4(1, 1, 2, 4) = 8, \quad x_6 = f_5(1, 1, 2, 4, 8) = 16, \quad \dots$ (3.18b)
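The recursion scheme of Th. 3.7 translates directly into a loop that builds the sequence term by term. The following is a minimal sketch, assuming the family $(f_n)$ is passed as a single function `f(n, xs)` receiving the index n and the tuple $(x_1, \dots, x_n)$; the helper `recursive_sequence` is hypothetical and only computes finitely many terms.

```python
def recursive_sequence(x, f, count):
    """First `count` terms of the sequence with x_1 = x and
    x_{n+1} = f(n, (x_1, ..., x_n)), as in Th. 3.7."""
    terms = [x]
    for n in range(1, count):
        terms.append(f(n, tuple(terms)))
    return terms

# Example 3.8(a): x_1 = 1! and g_n(x) = (n + 1) * x.
factorials = recursive_sequence(1, lambda n, xs: (n + 1) * xs[-1], 5)

# Example 3.8(b) with a = 2, d = -0.5 (exact in binary floating point).
arithmetic = recursive_sequence(2.0, lambda n, xs: xs[-1] - 0.5, 8)

# Example 3.8(c) with a = 3, q = -2.
geometric = recursive_sequence(3, lambda n, xs: xs[-1] * (-2), 5)

# Example 3.9(a): the sequence starts with F_0 = 0, and f_n is
# given by (3.17b).
fibonacci = recursive_sequence(
    0, lambda n, xs: 1 if n == 1 else xs[-1] + xs[-2], 12)

# Example 3.9(b): each term is the sum of all previous terms.
partial_sums = recursive_sequence(1, lambda n, xs: sum(xs), 6)
```

Passing the whole tuple of previous terms mirrors the generality of Th. 3.7; in the situation of (3.13), only the last entry `xs[-1]` is used.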

Definition 3.10. (a) Summation Symbol: On A = R (or, more generally, on every set A, where an addition + : A × A −→ A is defined), define recursively, for each given (possibly finite) sequence $(a_1, a_2, \dots)$ in A:

$\sum_{i=1}^{1} a_i := a_1, \quad \sum_{i=1}^{n+1} a_i := a_{n+1} + \sum_{i=1}^{n} a_i$ for n ≥ 1, (3.19a)

i.e.

$f_n : A^n \longrightarrow A, \quad f_n(x_1, \dots, x_n) := x_n + a_{n+1}$. (3.19b)

In (3.19a), one can also use other symbols for i, except a and n; for a finite sequence, n needs to be less than the maximal index of the finite sequence.

More generally, if I is an index set and φ : {1, ..., n} −→ I a bijective map, then define

$\sum_{i\in I} a_i := \sum_{i=1}^{n} a_{\phi(i)}$. (3.19c)

The commutativity of addition implies that the definition in (3.19c) is actually independent of the chosen bijective map φ (cf. Th. B.5). Also define

$\sum_{i\in\emptyset} a_i := 0$ (3.19d)

(for a general A, 0 is meant to be an element such that a + 0 = 0 + a = a for each a ∈ A and we can even define this if 0 ∉ A).

(b) Product Symbol: On A = R (or, more generally, on every set A, where a multiplication · : A × A −→ A is defined), define recursively, for each given (possibly finite) sequence $(a_1, a_2, \dots)$ in A:

$\prod_{i=1}^{1} a_i := a_1, \quad \prod_{i=1}^{n+1} a_i := a_{n+1} \cdot \prod_{i=1}^{n} a_i$ for n ≥ 1, (3.20a)

i.e.

$f_n : A^n \longrightarrow A, \quad f_n(x_1, \dots, x_n) := a_{n+1} \cdot x_n$. (3.20b)

In (3.20a), one can also use other symbols for i, except a and n; for a finite sequence, n needs to be less than the maximal index of the finite sequence.

More generally, if I is an index set and φ : {1, ..., n} −→ I a bijective map, then define

$\prod_{i\in I} a_i := \prod_{i=1}^{n} a_{\phi(i)}$. (3.20c)

The commutativity of multiplication implies that the definition in (3.20c) is actually independent of the chosen bijective map φ (cf. Th. B.5). Also define

$\prod_{i\in\emptyset} a_i := 1$ (3.20d)

(for a general A, 1 is meant to be an element such that a · 1 = 1 · a = a for each a ∈ A and we can even define this if 1 ∉ A).

Example 3.11. (a) Given a, d ∈ R, let $(a_n)_{n\in\mathbb{N}}$ be the arithmetic sequence as defined in (3.15a). It is an exercise to prove by induction that

$\forall_{n\in\mathbb{N}}\; a_n = a + (n-1)d$, (3.21a)

$\forall_{n\in\mathbb{N}}\; S_n := \sum_{i=1}^{n} a_i = \frac{n}{2}(a_1 + a_n) = \frac{n}{2}\bigl(2a + (n-1)d\bigr)$, (3.21b)

where the $S_n$ are called arithmetic sums.

(b) Given a ∈ R and q ∈ R \ {0}, let $(x_n)_{n\in\mathbb{N}}$ be the geometric sequence as defined in (3.16a). We will prove by induction that

$\forall_{n\in\mathbb{N}}\; x_n = a q^{n-1}$, (3.22a)

$\forall_{n\in\mathbb{N}}\; S_n := \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} (a q^{i-1}) = a \sum_{i=0}^{n-1} q^i = \begin{cases} n a & \text{for } q = 1, \\ \frac{a(1-q^n)}{1-q} & \text{for } q \ne 1, \end{cases}$ (3.22b)

where the $S_n$ are called geometric sums.

For the induction proof of (3.22a), φ(n) is $x_n = a q^{n-1}$. The base case, φ(1), is the statement $x_1 = a q^0 = a$, which is true. For the induction step, we assume φ(n) and compute

$x_{n+1} = x_n \cdot q \;\overset{(\phi(n))}{=}\; a q^{n-1} \cdot q = a q^n$, (3.23)

showing φ(n) ⇒ φ(n + 1) and completing the proof.

For q = 1, the sum $S_n$ is actually arithmetic with d = 0, i.e. $S_n = na$ can be obtained from (3.21b). For the induction proof of (3.22b) with q ≠ 1, φ(n) is $S_n = \frac{a(1-q^n)}{1-q}$. The base case, φ(1), is the statement $S_1 = \frac{a(1-q)}{1-q} = a$, which is true. For the induction step, we assume φ(n) and compute

$S_{n+1} = S_n + x_{n+1} \;\overset{(\phi(n))}{=}\; \frac{a(1-q^n)}{1-q} + a q^n = \frac{a(1-q^n) + a q^n (1-q)}{1-q} = \frac{a(1-q^{n+1})}{1-q}$, (3.24)

showing φ(n) ⇒ φ(n + 1) and completing the proof.
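As a sanity check of (3.22b), one can compare the term-by-term sum with the closed form using exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` to avoid rounding; the function names and the sample values of a and q are assumptions made for this illustration, not part of the notes.

```python
from fractions import Fraction

def geometric_sum(a, q, n):
    """S_n = x_1 + ... + x_n with x_1 = a and x_{k+1} = x_k * q,
    summed term by term as in (3.16a)."""
    total, term = Fraction(0), Fraction(a)
    for _ in range(n):
        total += term
        term *= q
    return total

def closed_form(a, q, n):
    """Right-hand side of (3.22b)."""
    a, q = Fraction(a), Fraction(q)
    if q == 1:
        return n * a
    return a * (1 - q**n) / (1 - q)

# Compare both sides for a few sample parameters (arbitrary choices).
samples = [(3, -2), (Fraction(1, 2), Fraction(1, 3)), (5, 1), (-7, 4)]
agree = all(geometric_sum(a, q, n) == closed_form(a, q, n)
            for a, q in samples for n in range(1, 30))
```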

3.2 Cardinality: The Size of Sets

Cardinality measures the size of sets. For a finite set A, it is precisely the number of elements in A. For an infinite set, it classifies the set’s degree or level of infinity (it turns out that not all infinite sets have the same size).

Definition 3.12. (a) The sets A, B are defined to have the same cardinality or the same size if, and only if, there exists a bijective map ϕ : A −→ B. One can show that this defines an equivalence relation on every set of sets (see Th. A.53 of the Appendix).

(b) The cardinality of a set A is n ∈ N (denoted #A = n) if, and only if, there exists a bijective map ϕ : A −→ {1, ..., n}. The cardinality of ∅ is defined as 0, i.e. #∅ := 0. A set A is called finite if, and only if, there exists $n \in \mathbb{N}_0$ such that #A = n; A is called infinite if, and only if, A is not finite, denoted #A = ∞. In the strict sense, this is an abuse of notation, since ∞ is not a cardinality: for example, #N = ∞ and #P(N) = ∞, but N and P(N) do not have the same cardinality, since the power set P(A) is always strictly bigger than A (see Th. A.69 of the Appendix); #A = ∞ is merely an abbreviation for the statement “A is infinite”. The interested student finds additional material regarding the uniqueness of finite cardinality in Th. A.61 and Cor. A.62, and regarding characterizations of infinite sets in Th. A.54 of the Appendix.

(c) The set A is called countable if, and only if, A is finite or A has the same cardinality as N. Otherwise, A is called uncountable.

—

In the rest of the section, we present a number of important results regarding the natural numbers and countability.

Theorem 3.13. (a) Every nonempty finite subset of a totally ordered set has a minimum and a maximum.

(b) Every nonempty subset of N has a minimum.

Proof. (a): Let A be a set and let ≤ denote a total order on A. Moreover, let ∅ ≠ B ⊆ A. We show by induction

$\forall_{n\in\mathbb{N}}\; \underbrace{\bigl(\#B = n \;\Rightarrow\; B \text{ has a min}\bigr)}_{\phi(n)}$.

Base Case (n = 1): For n = 1, B contains a unique element b, i.e. b = min B, proving φ(1).

Induction Step: Suppose φ(n) holds and consider B with #B = n + 1. Let b be one element from B. Then C := B \ {b} has cardinality n and, according to the induction hypothesis, there exists c ∈ C satisfying c = min C. If c ≤ b, then c ≤ x for each x ∈ B, proving c = min B. If b ≤ c, then b ≤ x for each x ∈ B, proving b = min B. In each case, B has a min, proving φ(n + 1) and completing the induction. The existence of a max follows by applying the above to the order ≥ (cf. Lem. 2.29).

(b): Let ∅ ≠ A ⊆ N. We have to show A has a min. If A is finite, then A has a min by (a). If A is infinite, let n be an element from A. Then the finite set B := {k ∈ A : k ≤ n} must have a min m by (a). Since m ≤ x for each x ∈ B and m ≤ n < x for each x ∈ A \ B, we have m = min A. ∎

Proposition 3.14. Every subset A of N is countable.

Proof. Since ∅ is countable, we may assume A ≠ ∅. From Th. 3.13(b), we know that every nonempty subset of N has a min. We recursively define a sequence in A by

$a_1 := \min A, \quad a_{n+1} := \begin{cases} \min A_n & \text{if } A_n := A \setminus \{a_i : 1 \le i \le n\} \ne \emptyset, \\ a_n & \text{if } A_n = \emptyset. \end{cases}$

This sequence is the same as the function f : N −→ A, $f(n) = a_n$. An easy induction shows that, for each n ∈ N, $a_n \ne a_{n+1}$ implies the restriction $f{\restriction}_{\{1,\dots,n+1\}}$ is injective. Thus, if there exists n ∈ N such that $a_n = a_{n+1}$, then $f{\restriction}_{\{1,\dots,k\}} : \{1, \dots, k\} \longrightarrow A$ is bijective, where $k := \min\{n \in \mathbb{N} : a_n = a_{n+1}\}$, showing A is finite, i.e. countable. If there does not exist n ∈ N with $a_n = a_{n+1}$, then f is injective. Another easy induction shows that, for each n ∈ N, f({1, ..., n}) ⊇ {k ∈ A : k ≤ n}, showing f is also surjective, proving A is countable. ∎
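The enumeration by successive minima used in this proof can be tried out on a finite set. The following sketch assumes A is given as a finite nonempty set of natural numbers; the function `enumerate_by_minima` is a hypothetical helper that returns the first `count` terms $a_1, \dots, a_{count}$.

```python
def enumerate_by_minima(A, count):
    """Terms a_1, ..., a_count with a_1 = min A and a_{n+1} the minimum
    of A without {a_1, ..., a_n}, or a_n once that set is empty."""
    A = set(A)
    seq = [min(A)]
    for _ in range(count - 1):
        rest = A - set(seq)
        seq.append(min(rest) if rest else seq[-1])
    return seq

terms = enumerate_by_minima({7, 3, 10}, 5)

# First index n (1-based) with a_n = a_{n+1}; the proof shows that it
# equals the number of elements of a finite set A.
k = min(n for n in range(1, len(terms)) if terms[n - 1] == terms[n])
```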

Proposition 3.15. For each set A ≠ ∅, the following three statements are equivalent:

(i) A is countable.

(ii) There exists an injective map f : A −→ N.

(iii) There exists a surjective map g : N −→ A.

Proof. Directly from the definition of countable in Def. 3.12(c), one obtains (i)⇒(ii) and (i)⇒(iii). To prove (ii)⇒(i), let f : A −→ N be injective. Then f : A −→ f(A) is bijective, and, since f(A) ⊆ N, f(A) is countable by Prop. 3.14, proving A is countable as well. To prove (iii)⇒(i), let g : N −→ A be surjective. Then g has a right inverse f : A −→ N. One can obtain this from Th. 2.13(a), but, here, we can actually construct f without the axiom of choice: For a ∈ A, let $f(a) := \min g^{-1}(\{a\})$ (recall Th. 3.13(b)). Then, clearly, $g \circ f = \mathrm{Id}_A$. But this means g is a left inverse for f, showing f is injective according to Th. 2.13(b). Then A is countable by the already proved implication (ii)⇒(i). ∎

Theorem 3.16. If $(A_1, \dots, A_n)$, n ∈ N, is a finite family of countable sets, then $\prod_{i=1}^{n} A_i$ is countable.

Proof. We first consider the special case n = 2 with $A_1 = A_2 = \mathbb{N}$ and show the map

$\varphi : \mathbb{N} \times \mathbb{N} \longrightarrow \mathbb{N}, \quad \varphi(m, n) := 2^m \cdot 3^n$,

is injective: If ϕ(m, n) = ϕ(p, q), then $2^m \cdot 3^n = 2^p \cdot 3^q$. Moreover, m ≤ p or p ≤ m; by symmetry, it suffices to consider m ≤ p. If m ≤ p, then $3^n = 2^{p-m} \cdot 3^q$. Since $3^n$ is odd, $2^{p-m} \cdot 3^q$ must also be odd, implying p − m = 0, i.e. m = p. Moreover, we now have $3^n = 3^q$, implying n = q, showing (m, n) = (p, q), i.e. ϕ is injective.

We now come back to the general case stated in the theorem. If at least one of the $A_i$ is empty, then the product is empty and, thus, countable. So it remains to consider the case where all $A_i$ are nonempty. The proof is conducted by induction by showing

$\forall_{n\in\mathbb{N}}\; \underbrace{\prod_{i=1}^{n} A_i \text{ is countable}}_{\phi(n)}$.

Base Case (n = 1): φ(1) is merely the hypothesis that $A_1$ is countable.

Induction Step: Assuming φ(n), Prop. 3.15(ii) provides injective maps $f_1 : \prod_{i=1}^{n} A_i \longrightarrow \mathbb{N}$ and $f_2 : A_{n+1} \longrightarrow \mathbb{N}$. To prove φ(n + 1), we provide an injective map $h : \prod_{i=1}^{n+1} A_i \longrightarrow \mathbb{N}$: Define

$h : \prod_{i=1}^{n+1} A_i \longrightarrow \mathbb{N}, \quad h(a_1, \dots, a_n, a_{n+1}) := \varphi\bigl(f_1(a_1, \dots, a_n), f_2(a_{n+1})\bigr)$.

The injectivity of $f_1$, $f_2$, and ϕ clearly implies the injectivity of h, thereby proving φ(n + 1) and completing the induction. ∎
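The injectivity of the map ϕ(m, n) = 2^m · 3^n can be made tangible by recovering (m, n) from the value of ϕ. The sketch below does this by repeated division; `phi_inverse` is a hypothetical helper that relies on the uniqueness of the exponents of 2 and 3, which the parity argument of the proof establishes for values in the image of ϕ.

```python
def phi(m, n):
    """The map from the proof of Th. 3.16."""
    return 2**m * 3**n

def phi_inverse(k):
    """Recover (m, n) from k = 2^m * 3^n by counting the factors
    2 and 3; raises ValueError if other factors remain."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    m = n = 0
    while k % 2 == 0:
        k //= 2
        m += 1
    while k % 3 == 0:
        k //= 3
        n += 1
    if k != 1:
        raise ValueError("not of the form 2^m * 3^n")
    return m, n

pairs = [(m, n) for m in range(1, 8) for n in range(1, 8)]
values = [phi(m, n) for m, n in pairs]

# Distinct pairs yield distinct values, and each pair is recovered.
injective_on_sample = len(set(values)) == len(pairs)
round_trip = all(phi_inverse(phi(m, n)) == (m, n) for m, n in pairs)
```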

Theorem 3.17. If $(A_i)_{i\in I}$ is a countable family of countable sets (i.e. ∅ ≠ I is countable and each $A_i$, i ∈ I, is countable), then the union $A := \bigcup_{i\in I} A_i$ is also countable (this result makes use of AC, cf. Rem. 3.18 below).

Proof. It suffices to consider the case that all $A_i$ are nonempty. Moreover, according to Prop. 3.15(iii), it suffices to construct a surjective map ϕ : N −→ A. Also according to Prop. 3.15(iii), the countability of I and the $A_i$ provides us with surjective maps f : N −→ I and $g_i : \mathbb{N} \longrightarrow A_i$ (here AC is used to select each $g_i$ from the set of all surjective maps from N onto $A_i$). Define

$F : \mathbb{N} \times \mathbb{N} \longrightarrow A, \quad F(m, n) := g_{f(m)}(n)$.

Then F is surjective: Given x ∈ A, there exists i ∈ I such that $x \in A_i$. Since f is surjective, there is m ∈ N satisfying f(m) = i. Moreover, since $g_i$ is surjective, there exists n ∈ N with $g_i(n) = x$. Then $F(m, n) = g_i(n) = x$, verifying that F is surjective. As N × N is countable by Th. 3.16, there exists a surjective map h : N −→ N × N. Thus, F ∘ h is the desired surjective map from N onto A. ∎

Remark 3.18. The axiom of choice is, indeed, essential for the proof of Th. 3.17. It is shown in [Jec73, Th. 10.6] that it is consistent with the axioms of ZF (i.e. with the axioms of Sec. A.3 of the Appendix) that, e.g., the uncountable sets P(N) and R (cf. Th. F.2 of the Appendix) are countable unions of countable sets.


4 Real Numbers

4.1 The Real Numbers as a Complete Totally Ordered Field

The set of real numbers, denoted R, is a set with special properties, namely a so-called complete totally ordered field, which, after some preliminaries, will be defined in Def. 4.3 below.

Definition 4.1. A total order ≤ on a nonempty set A is called complete if, and only if, every nonempty subset B of A that is bounded from above has a supremum, i.e.

$\forall_{B\in\mathcal{P}(A)\setminus\{\emptyset\}}\; \Bigl(\bigl(\exists_{x\in A}\; \forall_{b\in B}\; b \le x\bigr) \;\Rightarrow\; \exists_{s\in A}\; s = \sup B\Bigr)$. (4.1)

Lemma 4.2. A total order ≤ on a nonempty set A is complete if, and only if, every nonempty subset B of A that is bounded from below has an infimum.

Proof. According to Lem. 2.29, it suffices to prove one implication. We show that (4.1) implies that every nonempty B bounded from below has an infimum: Define

C := {x ∈ A : x is lower bound for B}. (4.2)

Then C ≠ ∅, since B is bounded from below. Moreover, every b ∈ B is an upper bound for C and (4.1) implies there exists s = sup C ∈ A. To verify s = inf B, it remains to show s ∈ C, i.e. that s is a lower bound for B. However, every b ∈ B is an upper bound for C and s = sup C is the min of all upper bounds for C, i.e. s ≤ b for each b ∈ B, showing s ∈ C. ∎

Definition 4.3. Let (A, +, ·) be a field and let ≤ be a total order on A. Then A (or, more precisely, (A, +, ·, ≤)) is called a totally ordered field if, and only if, the order is compatible with addition and multiplication, i.e. if, and only if,

$\forall_{x,y,z\in A}\; \bigl(x \le y \;\Rightarrow\; x + z \le y + z\bigr)$, (4.3a)

$\forall_{x,y\in A}\; \bigl(0 \le x \,\wedge\, 0 \le y \;\Rightarrow\; 0 \le xy\bigr)$. (4.3b)

Finally, A is called a complete totally ordered field if, and only if, A is a totally ordered field that is complete in the sense of Def. 4.1.

Theorem 4.4. There exists a complete totally ordered field R (it is called the set of real numbers). Moreover, R is unique up to isomorphism, i.e. if A is a complete totally ordered field, then there exists an isomorphism φ : A −→ R, i.e. a bijective map φ : A −→ R, satisfying

$\forall_{x,y\in A}\; \phi(x + y) = \phi(x) + \phi(y)$, (4.4a)

$\forall_{x,y\in A}\; \phi(xy) = \phi(x)\phi(y)$, (4.4b)

$\forall_{x,y\in A}\; \bigl(x < y \;\Rightarrow\; \phi(x) < \phi(y)\bigr)$. (4.4c)

It also turns out that the isomorphism is unique.


Proof. To really prove the existence of the real numbers by providing a construction is tedious and not easy. One possible construction is provided in Appendix D (the existence proof is completed in Th. D.41, the results regarding the isomorphism can be found in Th. D.45). ∎

Theorem 4.5. The following statements and rules are valid in the set of real numbers R (and, more generally, in every totally ordered field):

(a) x ≤ y ⇒ −x ≥ −y.

(b) x ≤ y ∧ z ≥ 0 ⇒ xz ≤ yz holds as well as x ≤ y ∧ z ≤ 0 ⇒ xz ≥ yz.

(c) x ≠ 0 ⇒ $x^2 := x \cdot x > 0$. In particular 1 > 0.

(d) x > 0 ⇒ 1/x > 0, whereas x < 0 ⇒ 1/x < 0.

(e) If 0 < x < y, then x/y < 1, y/x > 1, and 1/x > 1/y.

(f) x < y ∧ u < v ⇒ x + u < y + v.

(g) 0 < x < y ∧ 0 < u < v ⇒ xu < yv.

(h) x < y ∧ 0 < λ < 1 ⇒ x < λx + (1 − λ)y < y. In particular $x < \frac{x+y}{2} < y$.

Proof. (a): Using (4.3a): x ≤ y ⇒ 0 ≤ y − x ⇒ −y ≤ −x.

(b): One argues, for z ≥ 0,

$x \le y \;\Rightarrow\; 0 \le y - x \;\overset{(4.3b)}{\Rightarrow}\; 0 \le (y - x)z = yz - xz \;\Rightarrow\; xz \le yz$,

and, for z ≤ 0,

$x \le y \;\Rightarrow\; 0 \le y - x \;\overset{(4.3b)}{\Rightarrow}\; 0 \le (y - x)(-z) = xz - yz \;\Rightarrow\; xz \ge yz$.

(c): From (b), one obtains $x^2 \ge 0$. From Th. C.10(i), one then gets $x^2 > 0$.

(d): If x > 0, then $x^{-1} < 0$ implies the false statement $1 = x x^{-1} < 0$, i.e. $x^{-1} > 0$. The case x < 0 is treated analogously.

(e): Using (d), we obtain from 0 < x < y that $x/y = x y^{-1} < y y^{-1} = 1$ and $1 = x x^{-1} < y x^{-1} = y/x$.

(f): x < y ⇒ x + u < y + u and u < v ⇒ y + u < y + v; both combined yield x + u < y + v.

(g): 0 < x < y ∧ 0 < u < v ⇒ xu < yu ∧ yu < yv ⇒ xu < yv.

(h): Since 0 < λ and 1 − λ > 0, x < y implies

λx < λy ∧ (1 − λ)x < (1 − λ)y.

Using (4.3a), we obtain

x = λx + (1 − λ)x < λx + (1 − λ)y < λy + (1 − λ)y = y,

completing the proof of the theorem. ∎


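Theorem 4.6. Let ∅ ≠ A, B ⊆ R, λ ∈ R, and define

A + B := {a + b : a ∈ A ∧ b ∈ B}, (4.5a)

λA := {λa : a ∈ A}. (4.5b)

If A and B are bounded, then

sup(A + B) = sup A + sup B, (4.6a)

inf(A + B) = inf A + inf B, (4.6b)

$\sup(\lambda A) = \begin{cases} \lambda \cdot \sup A & \text{for } \lambda \ge 0, \\ \lambda \cdot \inf A & \text{for } \lambda < 0, \end{cases}$ (4.6c)

$\inf(\lambda A) = \begin{cases} \lambda \cdot \inf A & \text{for } \lambda \ge 0, \\ \lambda \cdot \sup A & \text{for } \lambda < 0. \end{cases}$ (4.6d)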
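For finite sets, the supremum and infimum coincide with max and min, so the rules (4.6) can be illustrated with a few lines of code. The sketch below is restricted to this finite special case and uses arbitrarily chosen sample sets; `minkowski_sum` and `scale` are hypothetical helper names for (4.5a) and (4.5b).

```python
from fractions import Fraction

def minkowski_sum(A, B):
    """A + B as in (4.5a)."""
    return {a + b for a in A for b in B}

def scale(lam, A):
    """lambda * A as in (4.5b)."""
    return {lam * a for a in A}

# Finite sample sets (assumption: sup = max and inf = min here).
A = {Fraction(1, 2), Fraction(3), Fraction(-2)}
B = {Fraction(5), Fraction(-1, 3)}

sup_sum_ok = max(minkowski_sum(A, B)) == max(A) + max(B)       # (4.6a)
inf_sum_ok = min(minkowski_sum(A, B)) == min(A) + min(B)       # (4.6b)
sup_scale_pos_ok = max(scale(3, A)) == 3 * max(A)              # (4.6c)
sup_scale_neg_ok = max(scale(-2, A)) == -2 * min(A)            # (4.6c)
inf_scale_neg_ok = min(scale(-2, A)) == -2 * max(A)            # (4.6d)
```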


4.2 Important Subsets

Remark 4.7. We would like to recover the natural numbers N as a subset of R. Indeed, if we start with 1 as the neutral element of multiplication and define 2 := 1 + 1, 3 := 2 + 1, ..., then N := {1, 2, ...} is a subset of R, satisfying the Peano axioms P1, P2, P3 of Sec. 3.1. However, if one does actually construct R according to the axioms of axiomatic set theory, then one starts by constructing N first, constructing R from N in several steps (cf. Appendix D). Depending on the construction used, the original set of natural numbers will typically not be the same set as the natural numbers as a subset of R. However, both sets will satisfy the Peano axioms and you will have a canonical bijection between the two sets. Which one you consider the “genuine” set of natural numbers depends on your personal taste and philosophy and is completely irrelevant. Any two models of N will always produce equivalent results, since they must both satisfy the three Peano axioms.

—

We now introduce a zoo of important subsets of R together with corresponding notation:

N := {1, 2, 3, ...} (natural numbers), (4.7a)

N₀ := N ∪ {0}, (4.7b)

Z⁻ := {−n : n ∈ N} (negative integers), (4.7c)

Z := Z⁻ ∪ N₀ (integers), (4.7d)

Q⁺ := {m/n : m, n ∈ N} (positive rational numbers), (4.7e)

Q⁺₀ := Q⁺ ∪ {0} (nonnegative rational numbers), (4.7f)

Q⁻ := {−q : q ∈ Q⁺} (negative rational numbers), (4.7g)


Q⁻₀ := Q⁻ ∪ {0} (nonpositive rational numbers), (4.7h)

Q := Q⁺₀ ∪ Q⁻ (rational numbers), (4.7i)

R⁺ := {x ∈ R : x > 0} (positive real numbers), (4.7j)

R⁺₀ := {x ∈ R : x ≥ 0} (nonnegative real numbers), (4.7k)

R⁻ := {x ∈ R : x < 0} (negative real numbers), (4.7l)

R⁻₀ := {x ∈ R : x ≤ 0} (nonpositive real numbers). (4.7m)

For a, b ∈ R with a ≤ b, one also defines the following intervals:

[a, b] := {x ∈ R : a ≤ x ≤ b} (bounded closed interval), (4.8a)

]a, b[ := {x ∈ R : a < x < b} (bounded open interval), (4.8b)

]a, b] := {x ∈ R : a < x ≤ b} (bounded half-open interval), (4.8c)

[a, b[ := {x ∈ R : a ≤ x < b} (bounded half-open interval), (4.8d)

]−∞, b] := {x ∈ R : x ≤ b} (unbounded closed interval), (4.8e)

]−∞, b[ := {x ∈ R : x < b} (unbounded open interval), (4.8f)

[a,∞[ := {x ∈ R : a ≤ x} (unbounded closed interval), (4.8g)

]a,∞[ := {x ∈ R : a < x} (unbounded open interval). (4.8h)

For a = b, one says that the intervals defined by (4.8a) to (4.8d) are degenerate or trivial, where [a, a] = {a}, ]a, a[ = ]a, a] = [a, a[ = ∅; it is sometimes convenient to have included the degenerate cases in the definition. It is sometimes also useful to abandon the restriction a ≤ b, to let c := min{a, b}, d := max{a, b}, and to define

[a, b] := [c, d], ]a, b[:=]c, d[, ]a, b] := [c, d] \ {a}, [a, b[:= [c, d] \ {b}. (4.8i)

Theorem 4.8 (Archimedean Property). Let ε, x be real numbers. If ε > 0 and x > 0, then there exists n ∈ N such that nε > x.

Proof. We conduct the proof by contradiction: Suppose x is an upper bound for the set A := {nε : n ∈ N}. Since the order ≤ on R is complete, according to (4.1), there exists s ∈ R such that s = sup A. In particular, s − ε is not an upper bound for A, i.e. there exists n ∈ N satisfying nε > s − ε. But then (n + 1)ε > s in contradiction to s = sup A. This shows x is not an upper bound for A, thereby establishing the case. ∎
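For rational ε and x, a suitable n can be written down explicitly. The following sketch is not the argument of the proof (which rests on completeness); it merely assumes rational inputs and computes n := ⌊x/ε⌋ + 1 with exact arithmetic. The function name `archimedean_n` is hypothetical.

```python
from fractions import Fraction
from math import floor

def archimedean_n(eps, x):
    """Return a natural number n with n * eps > x for rational
    eps > 0 and x > 0, namely n = floor(x / eps) + 1."""
    eps, x = Fraction(eps), Fraction(x)
    if eps <= 0 or x <= 0:
        raise ValueError("eps and x must be positive")
    return floor(x / eps) + 1

eps, x = Fraction(1, 1000), Fraction(7)
n = archimedean_n(eps, x)

exceeds = n * eps > x            # n satisfies the claim of Th. 4.8
minimal = (n - 1) * eps <= x     # no smaller natural number works
```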

5 Complex Numbers

5.1 Definition and Basic Arithmetic

According to Th. 4.5(c), $x^2 \ge 0$ holds for every real number x ∈ R, i.e. the equation $x^2 + 1 = 0$ has no solution in R. This deficiency of the real numbers motivates the effort to try to extend the field of real numbers to a larger field C, the so-called complex numbers. The two requirements that C is to be a field containing R and that there is to be some complex number i ∈ C satisfying $i^2 = -1$ already dictate the following laws of addition and multiplication for complex numbers z = x + iy and w = u + iv with x, y, u, v ∈ R:

z + w = x + iy + u + iv = x + u + i(y + v), (5.1a)

zw = (x + iy)(u + iv) = xu − yv + i(xv + yu). (5.1b)

Moreover, if x + iy = u + iv, then $(x - u)^2 = -(v - y)^2$, i.e. x − u = 0 = v − y, implying x = u and y = v. This suggests trying to define complex numbers as pairs of real numbers. Indeed, this works:

Definition 5.1. We define the set of complex numbers C := R × R, where, keeping in mind (5.1), addition on C is defined by

$+ : \mathbb{C} \times \mathbb{C} \longrightarrow \mathbb{C}, \quad \bigl((x, y), (u, v)\bigr) \mapsto (x, y) + (u, v) := (x + u, y + v)$, (5.2)

and multiplication on C is defined by

$\cdot : \mathbb{C} \times \mathbb{C} \longrightarrow \mathbb{C}, \quad \bigl((x, y), (u, v)\bigr) \mapsto (x, y) \cdot (u, v) := (xu - yv, xv + yu)$. (5.3)

Theorem 5.2. (a) The set of complex numbers $\mathbb{C}$ with addition and multiplication as defined in Def. 5.1 forms a field, where $(0, 0)$ and $(1, 0)$ are the neutral elements with respect to addition and multiplication, respectively,

\[ -z := (-x, -y) \tag{5.4a} \]

is the additive inverse to $z = (x, y)$, whereas

\[ z^{-1} := \frac{1}{z} := \left( \frac{x}{x^2 + y^2}, \frac{-y}{x^2 + y^2} \right) \tag{5.4b} \]

is the multiplicative inverse to $z = (x, y) \neq (0, 0)$.

(b) Defining subtraction and division in the usual way, for each $z, w \in \mathbb{C}$, by $w - z := w + (-z)$, and $w/z := wz^{-1}$ for $z \neq (0, 0)$, respectively, all the rules stated in Th. C.10 are valid in $\mathbb{C}$.

(c) The map

\[ \iota : \mathbb{R} \longrightarrow \mathbb{C}, \quad \iota(x) := (x, 0), \tag{5.5} \]

is a monomorphism, i.e. it is injective and satisfies

\[ \underset{x, y \in \mathbb{R}}{\forall} \quad \iota(x + y) = \iota(x) + \iota(y), \tag{5.6a} \]

\[ \underset{x, y \in \mathbb{R}}{\forall} \quad \iota(xy) = \iota(x) \cdot \iota(y). \tag{5.6b} \]

It is customary to identify $\mathbb{R}$ with $\iota(\mathbb{R})$, as it usually does not cause any confusion. One then just writes $x$ instead of $(x, 0)$.

Proof. All computations required for (a) and (c) are straightforward and are left as an exercise; (b) is a consequence of (a), since Th. C.10 is valid for every field. □
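The pair arithmetic of Def. 5.1 and the inverse formula (5.4b) can be tried out directly. The following is a minimal sketch, not part of the notes: the function names `c_add`, `c_mul`, `c_inv` are hypothetical, and rational numbers (`Fraction`) stand in for real numbers so that all computations are exact.

```python
from fractions import Fraction

def c_add(z, w):
    """Addition of pairs as in (5.2)."""
    (x, y), (u, v) = z, w
    return (x + u, y + v)

def c_mul(z, w):
    """Multiplication of pairs as in (5.3)."""
    (x, y), (u, v) = z, w
    return (x * u - y * v, x * v + y * u)

def c_inv(z):
    """Multiplicative inverse as in (5.4b); requires z != (0, 0)."""
    x, y = z
    d = x * x + y * y
    if d == 0:
        raise ZeroDivisionError("(0, 0) has no multiplicative inverse")
    return (Fraction(x) / d, Fraction(-y) / d)

i = (0, 1)                          # the pair that plays the role of the imaginary unit
z = (Fraction(3), Fraction(-4))     # an arbitrary nonzero example
print(c_mul(i, i))                  # the pair (-1, 0), i.e. i * i = -1
print(c_mul(z, c_inv(z)))           # the neutral element (1, 0) of multiplication
```

Such a spot check does not replace the exercise of verifying the field axioms; it only illustrates that (5.3) and (5.4b) fit together.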

Notation 5.3. The number $i := (0, 1)$ is called the imaginary unit (note that, indeed, $i^2 = i \cdot i = (0, 1) \cdot (0, 1) = (0 \cdot 0 - 1 \cdot 1, 0 \cdot 1 + 1 \cdot 0) = (-1, 0) = -1$). Using $i$, one obtains the commonly used representation of a complex number $z = (x, y) \in \mathbb{C}$:

\[ z = (x, y) = x \cdot (1, 0) + y \cdot (0, 1) = x + iy, \tag{5.7} \]

where one calls $\operatorname{Re} z := x$ the real part of $z$ and $\operatorname{Im} z := y$ the imaginary part of $z$. Moreover, $z$ is called purely imaginary if, and only if, $\operatorname{Re} z = 0$ (as a consequence of this convention, one has the (harmless) pathology that 0 is both real and purely imaginary).

Remark 5.4. There does not exist a total order $\leq$ on $\mathbb{C}$ that makes $\mathbb{C}$ into a totally ordered field (i.e. no total order on $\mathbb{C}$ can be compatible with addition and multiplication in the sense of (4.3)): Indeed, if there were such a total order $\leq$ on $\mathbb{C}$, then all the rules of Th. 4.5 would have to be valid with respect to that total order $\leq$. In particular, $0 < 1^2 = 1$ and $0 < i^2 = -1$ would have to be valid by Th. 4.5(c), and, then, $0 < 1 + (-1) = 0$ would have to be valid by Th. 4.5(f). However, $0 < 0$ is false, showing that there is no total order on $\mathbb{C}$ that satisfies (4.3). Caveat: Of course, there do exist total orders on $\mathbb{C}$, just none compatible with addition and multiplication; for example, the lexicographic order on $\mathbb{R} \times \mathbb{R}$ (defined as it was in (2.52) for $\mathbb{N} \times \mathbb{N}$) constitutes a total order on $\mathbb{C}$.

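Definition and Remark 5.5. Conjugation: For each complex number $z = x + iy$, we define its complex conjugate or just conjugate to be the complex number $\bar{z} := x - iy$. We then have the following rules that hold for each $z = x + iy$, $w = u + iv \in \mathbb{C}$:

(a) $\overline{z + w} = x + u - iy - iv = \bar{z} + \bar{w}$ and $\overline{zw} = xu - yv - (xv + yu)i = (x - iy)(u - iv) = \bar{z}\,\bar{w}$.

(b) $z + \bar{z} = 2x = 2 \operatorname{Re} z$ and $z - \bar{z} = 2yi = 2i \operatorname{Im} z$.

(c) $z = \bar{z} \Leftrightarrow x + iy = x - iy \Leftrightarrow y = 0 \Leftrightarrow z \in \mathbb{R}$.

(d) $z\bar{z} = (x + iy)(x - iy) = x^2 + y^2 \in \mathbb{R}_0^+$.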

5.2 Sign and Absolute Value (Modulus)

We face a certain conundrum regarding the handling of square roots. The problem is that we will need the notion of a continuous function to prove the existence of a unique square root $\sqrt{x}$ for every nonnegative real number $x$ and, in consequence, we will have to wait until Section 7.2.5 below to carry out this proof. On the other hand, it is extremely desirable to present the theory of convergence simultaneously for real and for complex numbers, which requires the notion of the absolute value or modulus of a complex number, to be defined in Def. 5.7(b) below as the square root of a nonnegative real number.

Faced with this difficulty, we will introduce the notion of square root now, assuming the existence, until we can add the proof in Section 7.2.5. Some students might be worried that this might lead to a circular argument, where our later proof of the existence of square roots would somehow make use of our previous assumption of that existence. Of course, we will be careful not to make such a circular (and, thereby, logically invalid) argument. The point is that for real numbers the notion of absolute value does in no way depend on the notion of a square root (see Lem. 5.8 below).

Definition and Remark 5.6. We define a nonnegative real number $y \in \mathbb{R}_0^+$ to be the square root of the nonnegative real number $x \in \mathbb{R}_0^+$ if, and only if, $y^2 = x$. If $y$ is the square root of $x$, then one uses the notation $\sqrt{x} := y$. We will see in Rem. and Def. 7.61 that every $x \in \mathbb{R}_0^+$ has a unique square root and that the function $f : \mathbb{R}_0^+ \longrightarrow \mathbb{R}_0^+$, $f(x) := \sqrt{x}$, is strictly increasing (in particular, injective).

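Definition 5.7. (a) The sign function is defined by

\[ \operatorname{sgn} : \mathbb{R} \longrightarrow \mathbb{R}, \quad \operatorname{sgn}(x) := \begin{cases} 1 & \text{for } x > 0, \\ 0 & \text{for } x = 0, \\ -1 & \text{for } x < 0. \end{cases} \tag{5.8} \]

It is emphasized that the sign function is only defined for real numbers (cf. Rem. 5.4)!

(b) The absolute value or modulus function is defined by

\[ \operatorname{abs} : \mathbb{C} \longrightarrow \mathbb{R}_0^+, \quad z = x + iy \mapsto |z| := \sqrt{z\bar{z}} = \sqrt{x^2 + y^2}, \tag{5.9} \]

where the term absolute value is often preferred for real numbers $z \in \mathbb{R}$ and the term modulus is often preferred if one also considers complex numbers $z \notin \mathbb{R}$.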

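Lemma 5.8. For each $x \in \mathbb{R}$, one has

\[ |x| = x \cdot \operatorname{sgn}(x) = \begin{cases} x & \text{for } x \geq 0, \\ -x & \text{for } x < 0. \end{cases} \tag{5.10} \]

Proof. One has

\[ |x| = \sqrt{x^2} = \begin{cases} x & \text{for } x \geq 0, \\ -x & \text{for } x < 0, \end{cases} \tag{5.11} \]

as claimed. □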

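Theorem 5.9. The following rules hold for each $z, w \in \mathbb{C}$:

(a) $z \neq 0 \Rightarrow |z| > 0$.

(b) $\big||z|\big| = |z|$.

(c) $|\bar{z}| = |z|$.

(d) $\max\{|\operatorname{Re} z|, |\operatorname{Im} z|\} \leq |z| \leq |\operatorname{Re} z| + |\operatorname{Im} z|$.

(e) $|zw| = |z||w|$.

(f) For $w \neq 0$, one has $\left|\frac{z}{w}\right| = \frac{|z|}{|w|}$.

(g) Triangle Inequality:

\[ |z + w| \leq |z| + |w|. \tag{5.12} \]

(h) Inverse Triangle Inequality:

\[ \big||z| - |w|\big| \leq |z - w|. \tag{5.13} \]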

Proof. We carry out the proofs for $z, w \in \mathbb{C}$. However, for $z, w \in \mathbb{R}$, everything can easily be shown directly from (5.10), without making use of square roots.

Let $z = x + iy$ with $x, y \in \mathbb{R}$.

(a): If $z \neq 0$, then $x \neq 0$ or $y \neq 0$, i.e. $x^2 > 0$ or $y^2 > 0$ by Th. 4.5(c), implying $x^2 + y^2 > 0$ by Th. 4.5(f), i.e. $|z| = \sqrt{x^2 + y^2} > 0$.

(b): Since $a := |z| \in \mathbb{R}_0^+$, we have $|a| = \sqrt{a^2} = a = |z|$.

(c): Since $\bar{z} = x - iy$, we have $|\bar{z}| = \sqrt{x^2 + (-y)^2} = \sqrt{x^2 + y^2} = |z|$.

(d): We have $x = \operatorname{Re} z$, $y = \operatorname{Im} z$. Let $a := \max\{|x|, |y|\}$. As remarked in Def. and Rem. 5.6, the square root function is increasing and, thus, taking square roots in the chain of inequalities $a^2 \leq x^2 + y^2 \leq (|x| + |y|)^2$ implies $a \leq |z| \leq |x| + |y|$ as claimed.

(e): As remarked in Def. and Rem. 5.6, the square root function is injective, and, thus, (e) follows from

\[ |zw|^2 = zw\,\overline{zw} \overset{\text{Def. and Rem. 5.5(a)}}{=} zw\,\bar{z}\,\bar{w} = z\bar{z}\,w\bar{w} = |z|^2 |w|^2. \]

(f): Let $w = u + iv$ with $u, v \in \mathbb{R}$. We first consider the special case $z = 1$. Applying the formula (5.4b) for the inverse to $w$, one obtains

\[ |w^{-1}|^2 = \frac{u^2}{(u^2 + v^2)^2} + \frac{v^2}{(u^2 + v^2)^2} = \frac{1}{u^2 + v^2} = \big(|w|^{-1}\big)^2, \]

i.e. $|w^{-1}| = |w|^{-1}$. Now (f) follows from (e): $\left|\frac{z}{w}\right| = |zw^{-1}| = |z||w^{-1}| = |z||w|^{-1} = \frac{|z|}{|w|}$.

(g) follows from

\[ \begin{aligned} |z + w|^2 &= (z + w)(\overline{z + w}) = z\bar{z} + w\bar{z} + z\bar{w} + w\bar{w} \\ &\overset{\text{Def. and Rem. 5.5(b)}}{=} |z|^2 + 2\operatorname{Re}(z\bar{w}) + |w|^2 \overset{\text{(d)}}{\leq} |z|^2 + 2|z\bar{w}| + |w|^2 = \big(|z| + |w|\big)^2, \end{aligned} \]

once again using that the square root function is increasing.

(h): Using (g), we obtain

\[ \begin{aligned} |z| = |z - w + w| \leq |z - w| + |w| \quad &\Rightarrow \quad |z| - |w| \leq |z - w|, \\ |w| = |w - z + z| \leq |z - w| + |z| \quad &\Rightarrow \quad -(|z| - |w|) \leq |z - w|, \end{aligned} \]

implying $\big||z| - |w|\big| \leq |z - w|$ by (5.10) (notice $|z| - |w| \in \mathbb{R}$). □


Remark 5.10. Each complex number $(x, y) = x + iy$ can be visualized as a point in the so-called complex plane, where the horizontal $x$-axis represents real numbers and the vertical $y$-axis represents purely imaginary numbers. Then the addition of complex numbers is precisely the vector addition of 2-dimensional vectors in the complex plane, and conjugation is represented by reflection through the $x$-axis. Moreover, the modulus $|z|$ of a complex number is precisely its distance from the origin $(0, 0)$, and $|z - w|$ is the distance between the points $z = (x, y)$ and $w = (u, v)$ in the plane. Complex multiplication can also be interpreted geometrically in the plane: If $\varphi$ denotes the angle that the vector representing $z = (x, y)$ forms with the $x$-axis, and, likewise, $\psi$ denotes the angle that the vector representing $w = (u, v)$ forms with the $x$-axis, then $zw$ is the vector of length $|zw|$ that forms the angle $\varphi + \psi$ with the $x$-axis (we will better understand this geometrical interpretation of complex multiplication later (see Def. and Rem. 8.29), when writing complex numbers in the polar form $z = x + iy = |z| \exp(i\varphi)$, making use of the exponential function $\exp$).
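Some of the rules of Th. 5.9 can be spot-checked numerically with Python's built-in `complex` type, whose `abs` computes the modulus $\sqrt{x^2 + y^2}$. The sketch below is only an illustration under the assumption of floating-point arithmetic (hence the small tolerance `tol`); the helper name `check_modulus_rules` is hypothetical.

```python
import math

def check_modulus_rules(z, w, tol=1e-12):
    """Return True if rules (d), (e), (g), (h) of Th. 5.9 hold for z, w up to tol."""
    rule_d = (max(abs(z.real), abs(z.imag)) <= abs(z) + tol
              and abs(z) <= abs(z.real) + abs(z.imag) + tol)
    rule_e = math.isclose(abs(z * w), abs(z) * abs(w), rel_tol=1e-12, abs_tol=tol)
    rule_g = abs(z + w) <= abs(z) + abs(w) + tol          # triangle inequality (5.12)
    rule_h = abs(abs(z) - abs(w)) <= abs(z - w) + tol     # inverse triangle inequality (5.13)
    return rule_d and rule_e and rule_g and rule_h

print(abs(3 + 4j))                              # distance of (3, 4) from the origin
print(check_modulus_rules(3 + 4j, -1 + 2j))
```

A check at finitely many points is, of course, no proof; it merely makes the geometric reading of $|z|$ and $|z - w|$ as distances tangible.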

5.3 Sums and Products

Here we compile some important rules involving sums and products. We are mostly interested in applying them to real and complex numbers. However, most of the rules, without extra difficulty, can be proved to hold in more general structures. We will provide the more general statements, but the reader will not lose much by merely thinking of $\mathbb{C}$ rather than a general ring in Th. 5.11, and of $\mathbb{R}$ rather than a general totally ordered field in Th. 5.12. Some rules involve exponentiation as defined in (C.10a) of the Appendix (in particular, it is used that $z^0 = 1$ for each $z \in R$, where $R$ is a ring with unity), and some proofs make use of the corresponding exponentiation rules of Th. C.6.

Theorem 5.11. Let $(R, +, \cdot)$ be a ring (cf. Def. C.7 of the Appendix).

(a) For each $n \in \mathbb{N}$ and each $\lambda, \mu, z_j, w_j \in R$, $j \in \{1, \dots, n\}$:

\[ \sum_{j=1}^{n} (\lambda z_j + \mu w_j) = \lambda \sum_{j=1}^{n} z_j + \mu \sum_{j=1}^{n} w_j. \]

(b) If $R$ is a ring with unity, then, for each $n \in \mathbb{N}_0$ and each $z \in R$:

\[ (1 - z)(1 + z + z^2 + \dots + z^n) = (1 - z) \sum_{j=0}^{n} z^j = 1 - z^{n+1}. \]

(c) If $R$ is a commutative ring with unity, then, for each $n \in \mathbb{N}_0$ and each $z, w \in R$:

\[ w^{n+1} - z^{n+1} = (w - z) \sum_{j=0}^{n} z^j w^{n-j} = (w - z)(w^n + zw^{n-1} + \dots + z^{n-1}w + z^n). \]

Proof. In each case, the proof can be conducted by an easy induction. We carry out (c) and leave the other cases as exercises. For (c), the base case ($n = 0$) is provided by the true statement $w^{0+1} - z^{0+1} = w - z = (w - z) z^0 w^{0-0}$. For the induction step, one computes

\[ \begin{aligned} (w - z) \sum_{j=0}^{n+1} z^j w^{n+1-j} &= (w - z) \left( z^{n+1} w^0 + \sum_{j=0}^{n} z^j w^{n+1-j} \right) \\ &= (w - z) z^{n+1} + (w - z)\, w \sum_{j=0}^{n} z^j w^{n-j} \\ &\overset{\text{ind. hyp.}}{=} (w - z) z^{n+1} + w (w^{n+1} - z^{n+1}) = w^{n+2} - z^{n+2}, \end{aligned} \]

completing the induction. □
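The identities of Th. 5.11(b),(c) can be tested with exact arithmetic in the commutative rings $\mathbb{Z}$ and $\mathbb{Q}$. The following is a minimal sketch; the function names are hypothetical, and Python's convention `z**0 == 1` (also for `z == 0`) matches the convention $z^0 = 1$ used in the text.

```python
from fractions import Fraction

def geometric_identity_holds(z, n):
    """Check (1 - z) * sum_{j=0}^{n} z^j == 1 - z^(n+1), i.e. Th. 5.11(b)."""
    return (1 - z) * sum(z**j for j in range(n + 1)) == 1 - z**(n + 1)

def difference_of_powers_holds(w, z, n):
    """Check w^(n+1) - z^(n+1) == (w - z) * sum_{j=0}^{n} z^j w^(n-j), i.e. Th. 5.11(c)."""
    s = sum(z**j * w**(n - j) for j in range(n + 1))
    return w**(n + 1) - z**(n + 1) == (w - z) * s

print(geometric_identity_holds(Fraction(2, 3), 10))
print(difference_of_powers_holds(7, -3, 6))
```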

Theorem 5.12. Let $(F, +, \cdot)$ be a totally ordered field (cf. Def. 4.3).

(a) For each $n \in \mathbb{N}$ and each $x_j, y_j \in F$, $j \in \{1, \dots, n\}$:

\[ \left( \underset{j \in \{1, \dots, n\}}{\forall} \; x_j \leq y_j \right) \Rightarrow \sum_{j=1}^{n} x_j \leq \sum_{j=1}^{n} y_j, \]

where equality can only hold if $x_j = y_j$ for each $j \in \{1, \dots, n\}$.

(b) For each $n \in \mathbb{N}$ and each $x_j, y_j \in F$, $j \in \{1, \dots, n\}$:

\[ \left( \underset{j \in \{1, \dots, n\}}{\forall} \; 0 < x_j \leq y_j \right) \Rightarrow \prod_{j=1}^{n} x_j \leq \prod_{j=1}^{n} y_j, \]

where equality can only hold if $x_j = y_j$ for each $j \in \{1, \dots, n\}$.

Proof. Both cases are proved by simple inductions, where Th. 4.5(f) is used in (a) and Th. 4.5(g) is used in (b). □

Theorem 5.13. Triangle Inequality: For each $n \in \mathbb{N}$ and each $z_j \in \mathbb{C}$, $j \in \{1, \dots, n\}$:

\[ \left| \sum_{j=1}^{n} z_j \right| \leq \sum_{j=1}^{n} |z_j|. \]

Proof. Another very simple induction that is left to the reader. □

5.4 Binomial Coefficients and Binomial Theorem

The goal in this section is to expand $(z + w)^n$ into a sum. This sum involves the so-called binomial coefficients $\binom{n}{k}$, which are also useful in other contexts. To obtain an idea for what to expect, let us compute the cases $n = 0, 1, 2, 3$: $(z + w)^0 = 1$, $(z + w)^1 = z + w$, $(z + w)^2 = z^2 + 2zw + w^2$, $(z + w)^3 = z^3 + 3z^2w + 3zw^2 + w^3$. One finds that the coefficients form what is known as Pascal's triangle, which we write for $n = 0, \dots, 5$:

\[ \begin{array}{lc} n = 0: & 1 \\ n = 1: & 1 \quad 1 \\ n = 2: & 1 \quad 2 \quad 1 \\ n = 3: & 1 \quad 3 \quad 3 \quad 1 \\ n = 4: & 1 \quad 4 \quad 6 \quad 4 \quad 1 \\ n = 5: & 1 \quad 5 \quad 10 \quad 10 \quad 5 \quad 1 \end{array} \tag{5.14} \]

The entries of the $n$th row of Pascal's triangle are denoted by $\binom{n}{0}, \dots, \binom{n}{n}$. One also observes that one obtains each entry of the $(n+1)$st row, except the first and last entry, by adding the corresponding entries in row $n$ to the left and to the right of the considered entry in row $n + 1$. The first and last entry of each row are always set to 1. This can be summarized as

\[ \underset{n \in \mathbb{N}_0}{\forall} \left( \binom{n}{0} = \binom{n}{n} = 1, \quad \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k} \text{ for } k \in \{1, \dots, n\} \right). \tag{5.15} \]

The following Def. 5.14 provides a different and more general definition of binomial coefficients. We will then prove in Prop. 5.15 that the binomial coefficients as defined in Def. 5.14 do, indeed, satisfy (5.15).

Definition 5.14. For each $\alpha \in \mathbb{C}$ and each $k \in \mathbb{N}_0$, we define the binomial coefficient

\[ \binom{\alpha}{0} := 1, \quad \binom{\alpha}{k} := \prod_{j=1}^{k} \frac{\alpha + 1 - j}{j} = \frac{\alpha(\alpha - 1) \cdots (\alpha - k + 1)}{1 \cdot 2 \cdots k} \quad \text{for } k \in \mathbb{N}. \tag{5.16} \]
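The product formula (5.16) translates almost literally into code. The following is a minimal sketch, assuming rational $\alpha$ so that `Fraction` gives exact results (the definition itself allows every $\alpha \in \mathbb{C}$); the function name `binom` is hypothetical.

```python
from fractions import Fraction

def binom(alpha, k):
    """Binomial coefficient as in (5.16): product of (alpha + 1 - j) / j for j = 1, ..., k."""
    result = Fraction(1)            # the empty product, i.e. the case k = 0
    for j in range(1, k + 1):
        result = result * (alpha + 1 - j) / j
    return result

# Row n = 5 of Pascal's triangle (5.14), recomputed from the product formula
print([int(binom(5, k)) for k in range(6)])

# The definition also covers non-integer alpha, e.g. alpha = 1/2
print(binom(Fraction(1, 2), 3))
```

Note that, for $n \in \mathbb{N}_0$ and $k > n$, the product contains the factor $0$, so that `binom(n, k)` is zero in this case.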

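Proposition 5.15. (a) For each $\alpha \in \mathbb{C}$ and each $k \in \mathbb{N}$:

\[ \binom{\alpha}{0} = 1, \quad \binom{\alpha + 1}{k} = \binom{\alpha}{k-1} + \binom{\alpha}{k}. \tag{5.17} \]

(b) For each $n \in \mathbb{N}_0$:

\[ \binom{n}{n} = 1. \tag{5.18} \]

The above statements include (5.15) as a special case.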

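Proof. (a): The first identity is part of the definition in (5.16). For the second identity, we first observe, for each $k \in \mathbb{N}$,

\[ \binom{\alpha}{k} = \prod_{j=1}^{k} \frac{\alpha + 1 - j}{j} = \frac{\alpha + 1 - k}{k} \prod_{j=1}^{k-1} \frac{\alpha + 1 - j}{j} = \binom{\alpha}{k-1} \frac{\alpha + 1 - k}{k}, \tag{5.19} \]

which implies

\[ \begin{aligned} \binom{\alpha}{k-1} + \binom{\alpha}{k} &= \binom{\alpha}{k-1} \left( 1 + \frac{\alpha + 1 - k}{k} \right) = \binom{\alpha}{k-1} \frac{\alpha + 1}{k} \\ &= \frac{\alpha + 1}{k} \prod_{j=1}^{k-1} \frac{\alpha + 1 - j}{j} = \prod_{j=1}^{k} \frac{\alpha + 2 - j}{j} = \binom{\alpha + 1}{k}. \end{aligned} \tag{5.20} \]

(b): $\binom{0}{0} = 1$ according to (5.16). For $n \in \mathbb{N}$, (5.18) is proved by induction. The base case ($n = 1$) is provided by the true statement $\binom{1}{1} = \frac{1 + 1 - 1}{1} = 1$. For the induction step, one computes

\[ \binom{n+1}{n+1} = \prod_{j=1}^{n+1} \frac{n + 1 + 1 - j}{j} = \frac{n+1}{n+1} \prod_{j=1}^{n} \frac{n + 1 - j}{j} = \binom{n}{n} \overset{\text{ind. hyp.}}{=} 1, \tag{5.21} \]

which completes the induction. □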

Theorem 5.16 (Binomial Theorem). Let $R$ be a commutative ring with unity (cf. Def. C.7 of the Appendix; for us, $R = \mathbb{C}$ is the most important example). For each $z, w \in R$ and each $n \in \mathbb{N}_0$, the following formula holds:

\[ (z + w)^n = \sum_{k=0}^{n} \binom{n}{k} z^{n-k} w^k = z^n + \binom{n}{1} z^{n-1} w + \dots + \binom{n}{n-1} z w^{n-1} + w^n. \tag{5.22} \]

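Proof. The proof is conducted via induction on $n$. The base case ($n = 0$) is provided by the correct statement $(z + w)^0 = 1 = \binom{0}{0} z^{0-0} w^0$. For the induction step, we first observe

\[ (z + w)^{n+1} = (z + w)(z + w)^n = z (z + w)^n + w (z + w)^n. \tag{5.23} \]

Using the induction hypothesis, we now further manipulate the two terms on the right-hand side of (5.23):

\[ z (z + w)^n \overset{\text{ind. hyp.}}{=} z \sum_{k=0}^{n} \binom{n}{k} z^{n-k} w^k = \sum_{k=0}^{n} \binom{n}{k} z^{n+1-k} w^k \overset{\binom{n}{n+1} = 0}{=} \sum_{k=0}^{n+1} \binom{n}{k} z^{n+1-k} w^k, \tag{5.24} \]

\[ w (z + w)^n \overset{\text{ind. hyp.}}{=} w \sum_{k=0}^{n} \binom{n}{k} z^{n-k} w^k = \sum_{k=0}^{n} \binom{n}{k} z^{n-k} w^{k+1} = \sum_{k=1}^{n+1} \binom{n}{k-1} z^{n+1-k} w^k. \tag{5.25} \]

Plugging (5.24) and (5.25) into (5.23) yields

\[ \begin{aligned} (z + w)^{n+1} &= \binom{n}{0} z^{n+1} w^0 + \sum_{k=1}^{n+1} \left( \binom{n}{k} + \binom{n}{k-1} \right) z^{n+1-k} w^k \\ &\overset{\text{Prop. 5.15}}{=} \binom{n+1}{0} z^{n+1} w^0 + \sum_{k=1}^{n+1} \binom{n+1}{k} z^{n+1-k} w^k \\ &= \sum_{k=0}^{n+1} \binom{n+1}{k} z^{n+1-k} w^k, \end{aligned} \tag{5.26} \]

which completes the induction. □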


The binomial theorem can now be used to infer a few more rules that hold for the binomial coefficients:

Corollary 5.17. One has the following identities:

\[ \underset{n \in \mathbb{N}_0}{\forall} \quad \sum_{k=0}^{n} \binom{n}{k} = \binom{n}{0} + \binom{n}{1} + \dots + \binom{n}{n} = 2^n, \tag{5.27a} \]

\[ \underset{n \in \mathbb{N}}{\forall} \quad \sum_{k=0}^{n} \binom{n}{k} (-1)^k = \binom{n}{0} - \binom{n}{1} + \binom{n}{2} - + \dots + (-1)^n \binom{n}{n} = 0. \tag{5.27b} \]

Proof. (5.27a) is just (5.22) with $z = w = 1$; (5.27b) is just (5.22) with $z = 1$ and $w = -1$. □
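The binomial formula (5.22) and the two identities of Cor. 5.17 can be verified for concrete integers. The sketch below uses `math.comb` from the Python standard library for $\binom{n}{k}$ with $n, k \in \mathbb{N}_0$; the function name `binomial_expansion` is hypothetical.

```python
from math import comb

def binomial_expansion(z, w, n):
    """Right-hand side of (5.22): sum over k of C(n, k) * z^(n-k) * w^k."""
    return sum(comb(n, k) * z**(n - k) * w**k for k in range(n + 1))

print(binomial_expansion(2, 3, 5) == (2 + 3)**5)   # (5.22) for z = 2, w = 3, n = 5
print(binomial_expansion(1, 1, 10) == 2**10)       # (5.27a) for n = 10
print(binomial_expansion(1, -1, 10) == 0)          # (5.27b) for n = 10
```

For $n = 0$, the alternating sum equals $\binom{0}{0} = 1$, which is why (5.27b) is stated only for $n \in \mathbb{N}$.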

The formulas provided by the following proposition are also sometimes useful.

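Proposition 5.18. (a) For each $\alpha \in \mathbb{C}$ and each $k \in \mathbb{N}_0$:

\[ \sum_{j=0}^{k} \binom{\alpha + j}{j} = \binom{\alpha}{0} + \binom{\alpha + 1}{1} + \dots + \binom{\alpha + k}{k} = \binom{\alpha + k + 1}{k}. \tag{5.28} \]

(b) For each $n, k \in \mathbb{N}_0$ with $k \leq n$:

\[ \binom{n}{k} = \frac{n!}{k!\,(n - k)!}. \tag{5.29} \]

Moreover, for $n \geq 1$, one has $\binom{n}{k} = \#\mathcal{P}_k(\{1, \dots, n\})$, where

\[ \mathcal{P}_k(A) := \big\{ B \in \mathcal{P}(A) : \#B = k \big\} \tag{5.30} \]

denotes the set of all subsets of a set $A$ that have precisely $k$ elements.

(c) For each $n, k \in \mathbb{N}_0$:

\[ \sum_{j=0}^{k} \binom{n + j}{n} = \binom{n}{n} + \binom{n + 1}{n} + \dots + \binom{n + k}{n} = \binom{n + k + 1}{n + 1}. \tag{5.31} \]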


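Proof. The induction proofs of (a) and (b) are left as exercises. For (c), one computes

\[ \begin{aligned} \sum_{j=0}^{k} \binom{n + j}{n} &\overset{(5.29)}{=} \sum_{j=0}^{k} \frac{(n + j)!}{n!\,(n + j - n)!} \overset{(5.29)}{=} \sum_{j=0}^{k} \binom{n + j}{j} \\ &\overset{(5.28)}{=} \binom{n + k + 1}{k} \overset{(5.29)}{=} \frac{(n + k + 1)!}{k!\,(n + 1)!} = \binom{n + k + 1}{n + 1}, \end{aligned} \]

thereby establishing the case. □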
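Identity (5.31) can be checked for small $n, k \in \mathbb{N}_0$ with exact integer arithmetic. This is only a sketch using `math.comb`; the function name `column_sum` is hypothetical.

```python
from math import comb

def column_sum(n, k):
    """Left-hand side of (5.31): sum of C(n + j, n) for j = 0, ..., k."""
    return sum(comb(n + j, n) for j in range(k + 1))

# Example n = 2, k = 3: C(2,2) + C(3,2) + C(4,2) + C(5,2) = 1 + 3 + 6 + 10
print(column_sum(2, 3), comb(2 + 3 + 1, 2 + 1))
```

In Pascal's triangle (5.14), the left-hand side adds up entries along a diagonal, and the right-hand side is the entry just below the end of that diagonal.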

6 Polynomials

6.1 Arithmetic of K-Valued Functions

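Notation 6.1. We will write $\mathbb{K}$ in situations where we allow $\mathbb{K}$ to be $\mathbb{R}$ or $\mathbb{C}$.

Notation 6.2. If $A$ is any nonempty set, then one can add and multiply arbitrary functions $f, g : A \longrightarrow \mathbb{K}$, and one can define several further operations to create new functions from $f$ and $g$:

\[ (f + g) : A \longrightarrow \mathbb{K}, \quad (f + g)(x) := f(x) + g(x), \tag{6.1a} \]

\[ (\lambda f) : A \longrightarrow \mathbb{K}, \quad (\lambda f)(x) := \lambda f(x) \quad \text{for each } \lambda \in \mathbb{K}, \tag{6.1b} \]

\[ (fg) : A \longrightarrow \mathbb{K}, \quad (fg)(x) := f(x) g(x), \tag{6.1c} \]

\[ (f/g) : A \longrightarrow \mathbb{K}, \quad (f/g)(x) := f(x)/g(x) \quad (\text{assuming } g(x) \neq 0), \tag{6.1d} \]

\[ \operatorname{Re} f : A \longrightarrow \mathbb{R}, \quad (\operatorname{Re} f)(x) := \operatorname{Re}(f(x)), \tag{6.1e} \]

\[ \operatorname{Im} f : A \longrightarrow \mathbb{R}, \quad (\operatorname{Im} f)(x) := \operatorname{Im}(f(x)). \tag{6.1f} \]

For $\mathbb{K} = \mathbb{R}$, we further define

\[ \max(f, g) : A \longrightarrow \mathbb{R}, \quad \max(f, g)(x) := \max\{f(x), g(x)\}, \tag{6.1g} \]

\[ \min(f, g) : A \longrightarrow \mathbb{R}, \quad \min(f, g)(x) := \min\{f(x), g(x)\}, \tag{6.1h} \]

\[ f^+ : A \longrightarrow \mathbb{R}, \quad f^+ := \max(f, 0), \tag{6.1i} \]

\[ f^- : A \longrightarrow \mathbb{R}, \quad f^- := \max(-f, 0). \tag{6.1j} \]

Finally, once again also allowing $\mathbb{K} = \mathbb{C}$,

\[ |f| : A \longrightarrow \mathbb{R}, \quad |f|(x) := |f(x)|. \tag{6.1k} \]

One calls $f^+$ and $f^-$ the positive part and the negative part of $f$, respectively. For $\mathbb{R}$-valued functions $f$, we have

\[ |f| = f^+ + f^-. \tag{6.1l} \]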
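The pointwise definitions (6.1i), (6.1j) and the identity (6.1l) can be illustrated with ordinary Python functions. This is a minimal sketch; the names `positive_part`, `negative_part`, and the example function `f` are hypothetical choices made here.

```python
def positive_part(f):
    """f^+ := max(f, 0), defined pointwise as in (6.1i)."""
    return lambda x: max(f(x), 0)

def negative_part(f):
    """f^- := max(-f, 0), defined pointwise as in (6.1j)."""
    return lambda x: max(-f(x), 0)

f = lambda x: x**3 - x          # an arbitrary real-valued example function
f_plus, f_minus = positive_part(f), negative_part(f)

for x in (-2, -0.5, 0, 0.5, 2):
    # (6.1l): |f|(x) equals f^+(x) + f^-(x) at every sample point
    print(x, abs(f(x)) == f_plus(x) + f_minus(x))
```

Note that, despite its name, the negative part $f^-$ takes only nonnegative values.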


6.2 Polynomials

Definition 6.3. Let $n \in \mathbb{N}_0$. Each function from $\mathbb{K}$ into $\mathbb{K}$, $x \mapsto x^n$, is called a monomial. A function $P$ from $\mathbb{K}$ into $\mathbb{K}$ is called a polynomial if, and only if, it is a linear combination of monomials, i.e. if, and only if, $P$ has the form

\[ P : \mathbb{K} \longrightarrow \mathbb{K}, \quad P(x) = \sum_{j=0}^{n} a_j x^j = a_0 + a_1 x + \dots + a_n x^n, \quad a_j \in \mathbb{K}. \tag{6.2} \]

The $a_j$ are called the coefficients of $P$. The largest number $d \leq n$ such that $a_d \neq 0$ is called the degree of $P$, denoted $\deg(P)$. If all coefficients are 0, then $P$ is called the zero polynomial; the degree of the zero polynomial is defined as $-1$ (in Th. 6.6(b) below, we will see that each polynomial of degree $n \in \mathbb{N}_0$ is uniquely determined by its coefficients $a_0, \dots, a_n$ and vice versa).

Polynomials of degree $\leq 0$ are constant. Polynomials of degree $\leq 1$ have the form $P(x) = a + bx$ and are called affine functions (often they are also called linear functions, even though this is not really correct for $a \neq 0$, since every function $P$ that is linear (in the sense of linear algebra) must satisfy $P(0) = 0$). Polynomials of degree $\leq 2$ have the form $P(x) = a + bx + cx^2$ and are called quadratic functions.

Each $\xi \in \mathbb{K}$ such that $P(\xi) = 0$ is called a zero or a root of $P$.

A rational function is a quotient $P/Q$ of two polynomials $P$ and $Q$.

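Remark 6.4. Let $\lambda \in \mathbb{K}$ and let $P, Q$ be polynomials. Then $\lambda P$, $P + Q$, and $PQ$ defined according to Not. 6.2 are polynomials as well. More precisely, if $\lambda = 0$ or $P \equiv 0$, then $\lambda P = 0$; if $P \equiv 0$, then $P + Q = Q$; if $Q \equiv 0$, then $P + Q = P$; if $P \equiv 0$ or $Q \equiv 0$, then $PQ = 0$. If $\lambda \neq 0$ and

\[ P(x) = \sum_{j=0}^{n} a_j x^j, \quad Q(x) = \sum_{j=0}^{m} b_j x^j, \quad \text{with } \deg(P) = n \geq 0, \; \deg(Q) = m \geq 0, \; n \geq m \geq 0, \tag{6.3} \]

then, defining $b_j := 0$ for each $j \in \{m + 1, \dots, n\}$ in case $n > m$,

\[ (\lambda P)(x) = \sum_{j=0}^{n} (\lambda a_j)\, x^j, \quad \deg(\lambda P) = n, \tag{6.4a} \]

\[ (P + Q)(x) = \sum_{j=0}^{n} (a_j + b_j)\, x^j, \quad \deg(P + Q) \leq n = \max\{m, n\}, \tag{6.4b} \]

\[ (PQ)(x) = \sum_{j=0}^{m+n} c_j\, x^j, \quad \deg(PQ) = m + n, \tag{6.4c} \]

where, setting $a_k := 0$ for each $k \in \{n + 1, \dots, m + n\}$ and $b_k := 0$ for each $k \in \{m + 1, \dots, m + n\}$,

\[ \underset{j \in \{0, \dots, n + m\}}{\forall} \quad c_j = \sum_{k=0}^{j} a_k b_{j-k}. \tag{6.4d} \]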


Formula (6.4c) can be proved by induction on $m = \deg(Q) \in \mathbb{N}_0$ as follows: For $m = 0$, we compute

\[ (PQ)(x) = b_0 \sum_{j=0}^{n} a_j x^j = \sum_{j=0}^{n+0} b_0 a_j x^j, \]

i.e. $c_j = b_0 a_j = \sum_{k=0}^{j} a_k b_{j-k}$, which establishes the base case, remembering $b_{j-k} = 0$ for $j > k$. For the induction step, we compute, for $\deg(Q) = m + 1$,

\[ \begin{aligned} (PQ)(x) &= \sum_{j=0}^{n} a_j x^j \sum_{\alpha=0}^{m+1} b_\alpha x^\alpha = \sum_{j=0}^{n} a_j x^j \left( b_{m+1} x^{m+1} + \sum_{\alpha=0}^{m} b_\alpha x^\alpha \right) \\ &\overset{\text{ind. hyp.}}{=} \sum_{j=0}^{n} a_j b_{m+1} x^{m+1+j} + \sum_{j=0}^{m+n} \left( \sum_{k=0}^{j} a_k b_{j-k} \right) x^j \\ &= \sum_{j=m+1}^{m+n+1} a_{j-m-1} b_{m+1} x^j + \sum_{j=0}^{m+n} \left( \sum_{k=0}^{j} a_k b_{j-k} \right) x^j \\ &= \sum_{j=0}^{m+n+1} \left( \sum_{k=0}^{j} a_k b_{j-k} \right) x^j, \end{aligned} \]

which completes the induction step. There is a notational issue in the second and third lines of the above computation, since, in both lines, the $b_{m+1}$ in the first sum is the actual $b_{m+1}$ from $Q$, but $b_{m+1} = 0$ in the second sum in both lines, which is due to the induction hypothesis being applied for $m < m + 1$. This is actually used when combining both sums in the last step, computing, for $m + 1 \leq j \leq m + n$: $a_{j-m-1} b_{m+1} x^j + a_{j-m-1} \cdot 0 \cdot x^j = a_{j-m-1} b_{m+1} x^j$. For $j = m + n + 1$, one has $\sum_{k=0}^{m+n+1} a_k b_{m+n+1-k} = a_n b_{m+1}$, since $b_{m+n+1-k} = 0$ for $n > k$ and $a_k = 0$ for $k > n$.

Finally, $\deg(PQ) = m + n$ follows from $c_{m+n} = a_n b_m \neq 0$.
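The coefficient formula (6.4d) is easy to implement. In the following sketch (hypothetical names `poly_mul`, `poly_eval`), a polynomial is assumed to be represented by its coefficient list `[a_0, ..., a_n]`; the evaluation uses Horner's scheme, which is a choice made here and not taken from the notes.

```python
def poly_mul(a, b):
    """Coefficients c_j = sum_k a_k * b_(j-k) of the product PQ, as in (6.4d).

    a = [a_0, ..., a_n] and b = [b_0, ..., b_m] are coefficient lists.
    """
    n, m = len(a) - 1, len(b) - 1
    c = [0] * (n + m + 1)
    for k, a_k in enumerate(a):
        for l, b_l in enumerate(b):
            c[k + l] += a_k * b_l      # a_k * b_l contributes to c_j with j = k + l
    return c

def poly_eval(coeffs, x):
    """Evaluate sum_j coeffs[j] * x^j by Horner's scheme."""
    value = 0
    for coeff in reversed(coeffs):
        value = value * x + coeff
    return value

p, q = [1, 2, 3], [4, 5]               # P(x) = 1 + 2x + 3x^2, Q(x) = 4 + 5x
print(poly_mul(p, q))                  # coefficients of PQ: [4, 13, 22, 15]
```

The leading coefficient of the product is $a_n b_m$ (here $3 \cdot 5 = 15$), in agreement with $\deg(PQ) = m + n$.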

Theorem 6.5. (a) For each polynomial $P$ given in the form of (6.3) and each $\xi \in \mathbb{K}$, we have the identity

\[ P(x) = \sum_{j=0}^{n} b_j\, (x - \xi)^j, \tag{6.5} \]

where

\[ \underset{j \in \{0, \dots, n\}}{\forall} \quad b_j = \sum_{k=j}^{n} a_k \binom{k}{j} \xi^{k-j}, \quad \text{in particular } b_0 = P(\xi), \; b_n = a_n. \tag{6.6} \]

(b) If $P$ is a polynomial with $n := \deg(P) \geq 1$, then, for each $\xi \in \mathbb{K}$, there exists a polynomial $Q$ with $\deg(Q) = n - 1$ such that

\[ P(x) = P(\xi) + (x - \xi)\, Q(x). \tag{6.7} \]

In particular, if $\xi$ is a zero of $P$, then $P(x) = (x - \xi)\, Q(x)$.


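Proof. (a): For $\xi = 0$, there is nothing to prove. For $\xi \neq 0$, defining the auxiliary variable $\eta := x - \xi$, we obtain $x = \xi + \eta$ and

\[ \begin{aligned} P(x) &= \sum_{k=0}^{n} a_k (\xi + \eta)^k \overset{(5.22)}{=} \sum_{k=0}^{n} \sum_{j=0}^{k} a_k \binom{k}{j} \xi^{k-j} \eta^j = \sum_{k=0}^{n} \sum_{j=0}^{n} a_k \binom{k}{j} \xi^{k-j} \eta^j \\ &= \sum_{j=0}^{n} \sum_{k=0}^{n} a_k \binom{k}{j} \xi^{k-j} \eta^j = \sum_{j=0}^{n} \sum_{k=j}^{n} a_k \binom{k}{j} \xi^{k-j} \eta^j, \end{aligned} \tag{6.8} \]

which is (6.5).

(b): According to (a), we have

\[ P(x) = P(\xi) + (x - \xi)\, Q(x), \quad \text{with } Q(x) = \sum_{j=1}^{n} b_j\, (x - \xi)^{j-1} = \sum_{j=0}^{n-1} b_{j+1}\, (x - \xi)^j, \tag{6.9} \]

proving (b). □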
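Formula (6.6) gives an explicit recipe for re-expanding a polynomial around a point $\xi$. The following minimal sketch (hypothetical name `shifted_coefficients`, coefficient lists `[a_0, ..., a_n]` as an assumed representation) computes the $b_j$ and checks (6.5) on an example.

```python
from math import comb

def shifted_coefficients(a, xi):
    """Coefficients b_j = sum_{k=j}^{n} a_k * C(k, j) * xi^(k-j) from (6.6)."""
    n = len(a) - 1
    return [sum(a[k] * comb(k, j) * xi**(k - j) for k in range(j, n + 1))
            for j in range(n + 1)]

a = [1, -2, 0, 1]                      # P(x) = 1 - 2x + x^3
xi = 2
b = shifted_coefficients(a, xi)
print(b)                               # b_0 is P(2) = 5, b_3 is a_3 = 1

# Spot check of (6.5) at x = 3: both sides evaluate P(3)
x = 3
lhs = sum(a_j * x**j for j, a_j in enumerate(a))
rhs = sum(b_j * (x - xi)**j for j, b_j in enumerate(b))
print(lhs == rhs)
```

According to (6.9), the list `b[1:]` contains the coefficients of $Q$ with respect to the powers of $(x - \xi)$.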

Theorem 6.6. (a) If $P$ is a polynomial with $n := \deg(P) \geq 0$, then $P$ has at most $n$ zeros.

(b) Let $P, Q$ be polynomials as in (6.3) with $n = m$, $\deg(P) \leq n$, and $\deg(Q) \leq n$. If $P(x_j) = Q(x_j)$ at $n + 1$ distinct points $x_0, x_1, \dots, x_n \in \mathbb{K}$, then $a_j = b_j$ for each $j \in \{0, \dots, n\}$.

Consequence 1: If $P, Q$ with degree $\leq n$ agree at $n + 1$ distinct points, then $P = Q$.

Consequence 2: If we know $P = Q$, then they agree everywhere, in particular at $\max\{\deg(P), \deg(Q)\} + 1$ distinct points, which implies they have the same coefficients.

Proof. (a): For $n = 0$, $P$ is constant, but not the zero polynomial, i.e. $P \equiv a_0 \neq 0$ with no zeros as claimed. For $n \in \mathbb{N}$, the proof is conducted by induction. The base case ($n = 1$) is provided by the observation that $\deg(P) = 1$ implies $P$ is the affine function with $P(x) = a_0 + a_1 x$, $a_1 \neq 0$, i.e. $P$ has precisely one zero at $\xi = -a_0/a_1$. For the induction step, assume $\deg(P) = n + 1$. If $P$ has no zeros, then the assertion of (a) holds true. Otherwise, $P$ has at least one zero $\xi \in \mathbb{K}$, and, according to Th. 6.5(b), there exists a polynomial $Q$ such that $\deg(Q) = n$ and

\[ P(x) = (x - \xi)\, Q(x). \tag{6.10} \]

From the induction hypothesis, we gather that $Q$ has at most $n$ zeros, i.e. (6.10) implies $P$ has at most $n + 1$ zeros, which completes the induction.

(b): If $P(x_j) = Q(x_j)$ at $n + 1$ distinct points $x_j$, then each of these points is a zero of $P - Q$. Thus $P - Q$ is a polynomial of degree $\leq n$ with at least $n + 1$ zeros. Then (a) implies $\deg(P - Q) = -1$, i.e. $P - Q$ is the zero polynomial, i.e. $a_j - b_j = 0$ for each $j \in \{0, \dots, n\}$. □


Remark 6.7. Let $P$ be a polynomial with $n := \deg(P) \geq 0$. According to Th. 6.6(a), $P$ has at most $n$ zeros. Using Th. 6.5(b) for an induction shows there exists $k \in \{0, \dots, n\}$ and a polynomial $Q$ of degree $n - k$ such that

\[ P(x) = Q(x) \prod_{j=1}^{k} (x - \xi_j) = (x - \xi_1)(x - \xi_2) \cdots (x - \xi_k)\, Q(x), \tag{6.11a} \]

where $Q$ does not have any zeros in $\mathbb{K}$ and $\{\xi_1, \dots, \xi_k\} = \{\xi \in \mathbb{K} : P(\xi) = 0\}$ is the set of zeros of $P$. It can of course happen that $P$ does not have any zeros and $P = Q$ (no $\xi_j$ exist). It can also occur that some of the $\xi_j$ in (6.11a) are identical. Thus, we can rewrite (6.11a) as

\[ P(x) = Q(x) \prod_{j=1}^{l} (x - \lambda_j)^{m_j} = (x - \lambda_1)^{m_1} (x - \lambda_2)^{m_2} \cdots (x - \lambda_l)^{m_l}\, Q(x), \tag{6.11b} \]

where $\lambda_1, \dots, \lambda_l$, $l \in \{0, \dots, k\}$, are the distinct zeros of $P$, and $m_j \in \mathbb{N}$ with $\sum_{j=1}^{l} m_j = k$. Then $m_j$ is called the multiplicity of the zero $\lambda_j$ of $P$.

7 Limits and Convergence in the Real and Complex Numbers

7.1 Sequences

Recall from Def. 2.15(b) that a sequence in $\mathbb{K}$ is a function $f : \mathbb{N} \longrightarrow \mathbb{K}$, in this context usually denoted as $f = (z_n)_{n \in \mathbb{N}}$ or $(z_1, z_2, \dots)$ with $z_n := f(n)$. Sometimes the sequence also has the form $(z_n)_{n \in I}$, where $I \neq \emptyset$ is a countable index set (e.g. $I = \mathbb{N}_0$) different from $\mathbb{N}$ (in the context of convergence (see the following Def. 7.1), $I$ must be $\mathbb{N}$ or it must have the same cardinality as $\mathbb{N}$, i.e. finite $I$ are not permissible).

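Definition 7.1. The sequence $(z_n)_{n \in \mathbb{N}}$ in $\mathbb{K}$ is said to be convergent with limit $z \in \mathbb{K}$ if, and only if, for each $\epsilon > 0$, there exists an index $N \in \mathbb{N}$ such that $|z_n - z| < \epsilon$ for every index $n > N$. The notation for $(z_n)_{n \in \mathbb{N}}$ converging to $z$ is $\lim_{n \to \infty} z_n = z$ or $z_n \to z$ for $n \to \infty$. Thus, by definition,

\[ \lim_{n \to \infty} z_n = z \quad \Leftrightarrow \quad \underset{\epsilon \in \mathbb{R}^+}{\forall} \; \underset{N \in \mathbb{N}}{\exists} \; \underset{n > N}{\forall} \quad |z_n - z| < \epsilon. \tag{7.1} \]

The sequence $(z_n)_{n \in \mathbb{N}}$ in $\mathbb{K}$ is called divergent if, and only if, it is not convergent.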

Example 7.2. (a) For every constant sequence $(z_n)_{n \in \mathbb{N}} = (a)_{n \in \mathbb{N}}$ with $a \in \mathbb{K}$, one has $\lim_{n \to \infty} z_n = \lim_{n \to \infty} a = a$: Since, for each $n \in \mathbb{N}$, $|z_n - a| = |a - a| = 0$, one can choose $N = 1$ for each $\epsilon > 0$.

(b) $\lim_{n \to \infty} \frac{1}{n + a} = 0$ for each $a \in \mathbb{C}$: Here $z_n := 1/(n + a)$ (if $n = -a$, then set $z_n := w$ with $w \in \mathbb{C}$ arbitrary). Given $\epsilon > 0$, choose an arbitrary $N \in \mathbb{N}$ with $N \geq \epsilon^{-1} + |a|$. Then, for each $n > N$, we compute $|n + a| = |n - (-a)| \geq \big|n - |a|\big| = n - |a| > N - |a| \geq \epsilon^{-1}$, and, thus, $|z_n| = |n + a|^{-1} < \epsilon$ as desired.

(c) $((-1)^n)_{n \in \mathbb{N}}$ is not convergent: We have $z_n = 1$ for each even $n$ and $z_n = -1$ for each odd $n$. Thus, for each $z \neq 1$ and each even $n$, $|z_n - z| = |1 - z| > |1 - z|/2 =: \epsilon > 0$, i.e. $z$ is not a limit of $(z_n)_{n \in \mathbb{N}}$. However, $z = 1$ is also not a limit of the sequence, since, for each odd $n$, $|z_n - 1| = |-1 - 1| = 2 > 1 =: \epsilon > 0$, proving that the sequence has no limit.
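The choice of $N$ in Ex. 7.2(b) can be made concrete. The following sketch (hypothetical name `index_for`) computes an index $N \geq \epsilon^{-1} + |a|$ and spot-checks $|1/(n + a)| < \epsilon$ for some $n > N$; it assumes floating-point arithmetic and checks only finitely many indices, so it illustrates the definition rather than proving anything.

```python
import math

def index_for(eps, a):
    """An index N in N with N >= 1/eps + |a|, as chosen in Ex. 7.2(b)."""
    return math.ceil(1 / eps + abs(a))

a, eps = 3 - 4j, 1e-3
N = index_for(eps, a)
# Def. 7.1 requires |z_n - 0| < eps for every n > N; we test the next 1000 indices.
print(N, all(abs(1 / (n + a)) < eps for n in range(N + 1, N + 1001)))
```

For the divergent sequence of Ex. 7.2(c), no such $N$ can exist for small $\epsilon$, whichever candidate limit one tries.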

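Theorem 7.3. (a) Let $(z_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{C}$. Then $(z_n)_{n \in \mathbb{N}}$ is convergent in $\mathbb{C}$ if, and only if, both $(\operatorname{Re} z_n)_{n \in \mathbb{N}}$ and $(\operatorname{Im} z_n)_{n \in \mathbb{N}}$ are convergent in $\mathbb{R}$. Moreover, in that case,

\[ \lim_{n \to \infty} z_n = z \quad \Leftrightarrow \quad \lim_{n \to \infty} \operatorname{Re} z_n = \operatorname{Re} z \; \wedge \; \lim_{n \to \infty} \operatorname{Im} z_n = \operatorname{Im} z. \tag{7.2} \]

(b) Let $(x_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{R}$ and $z \in \mathbb{C}$. Then

\[ \lim_{n \to \infty} x_n = z \quad \Rightarrow \quad z \in \mathbb{R}. \tag{7.3} \]

Proof. (a): Suppose $(z_n)_{n \in \mathbb{N}}$ converges to $z \in \mathbb{C}$. Then, given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $|z_n - z| < \epsilon$. In consequence, for each $n > N$,

\[ |\operatorname{Re} z_n - \operatorname{Re} z| = |\operatorname{Re}(z_n - z)| \overset{\text{Th. 5.9(d)}}{\leq} |z_n - z| < \epsilon, \tag{7.4} \]

proving $\lim_{n \to \infty} \operatorname{Re} z_n = \operatorname{Re} z$. The proof of $\lim_{n \to \infty} \operatorname{Im} z_n = \operatorname{Im} z$ is completely analogous. Conversely, suppose there are $x, y \in \mathbb{R}$ such that $\lim_{n \to \infty} \operatorname{Re} z_n = x$ and $\lim_{n \to \infty} \operatorname{Im} z_n = y$. Here we encounter, for the first time, what is sometimes called an $\epsilon/2$ argument: Given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $|\operatorname{Re} z_n - x| < \epsilon/2$ and $|\operatorname{Im} z_n - y| < \epsilon/2$, implying, for each $n > N$,

\[ \begin{aligned} |z_n - (x + iy)| &= |\operatorname{Re} z_n + i \operatorname{Im} z_n - (x + iy)| \\ &\leq |\operatorname{Re} z_n - x| + |i|\,|\operatorname{Im} z_n - y| < \epsilon/2 + \epsilon/2 = \epsilon, \end{aligned} \tag{7.5} \]

proving $\lim_{n \to \infty} z_n = x + iy$.

(b) is a direct consequence of (a). □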

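Example 7.4. (a) According to Th. 7.3(a), we have

\[ \lim_{n \to \infty} \left( \sqrt{2} + \frac{i}{n - 17} \right) \overset{\text{Ex. 7.2(a),(b)}}{=} \sqrt{2} + 0i = \sqrt{2}. \]

(b) According to Th. 7.3(a) and Ex. 7.2(c), the sequence $\left( \frac{1}{n} + (-1)^n i \right)_{n \in \mathbb{N}}$ is divergent.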

Another important example relies on the following inequality:

Proposition 7.5 (Bernoulli's Inequality). For each $n \in \mathbb{N}_0$ and each $x \in [-1, \infty[$, we have

\[ (1 + x)^n \geq 1 + nx, \tag{7.6} \]

with strict inequality whenever $n > 1$ and $x \neq 0$.

Proof. For $n = 0$, (7.6) reads $1 \geq 1$, for $n = 1$, (7.6) reads $1 + x \geq 1 + x$, for $n = 2$, (7.6) reads $(1 + x)^2 = 1 + 2x + x^2 \geq 1 + 2x$, all three statements being trivially true, in the case $n = 2$ with strict inequality for $x \neq 0$. We now proceed by induction for $n \geq 2$. For the induction step, one estimates

\[ \begin{aligned} (1 + x)^{n+1} = (1 + x)^n (1 + x) &\overset{\text{ind. hyp., } x \geq -1}{\geq} (1 + nx)(1 + x) = 1 + (n + 1)x + nx^2 \\ &\geq 1 + (n + 1)x, \end{aligned} \tag{7.7} \]

with strict inequality for $x \neq 0$. □
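Bernoulli's inequality (7.6), including the claim about strict inequality, can be tested exactly on a grid of rational $x \geq -1$. The sketch below uses `Fraction` to avoid rounding; the function names are hypothetical.

```python
from fractions import Fraction

def bernoulli_holds(x, n):
    """Check (1 + x)^n >= 1 + n*x, i.e. (7.6), for rational x >= -1 and n in N_0."""
    return (1 + x)**n >= 1 + n * x

def bernoulli_strict(x, n):
    """Check the strict version (1 + x)^n > 1 + n*x, claimed for n > 1 and x != 0."""
    return (1 + x)**n > 1 + n * x

grid = [Fraction(k, 4) for k in range(-4, 13)]          # x = -1, -3/4, ..., 3
print(all(bernoulli_holds(x, n) for x in grid for n in range(0, 10)))
print(all(bernoulli_strict(x, n) for x in grid if x != 0 for n in range(2, 10)))
```

The hypothesis $x \geq -1$ matters: for example, $x = -4$ and $n = 3$ give $(1 + x)^3 = -27 < -11 = 1 + 3x$.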

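Example 7.6. We have, for each $q \in \mathbb{C}$,

\[ |q| < 1 \quad \Rightarrow \quad \lim_{n \to \infty} q^n = 0: \tag{7.8} \]

For $q = 0$, there is nothing to prove. For $0 < |q| < 1$, we have $|q|^{-1} > 1$, i.e. $h := |q|^{-1} - 1 > 0$. Thus, for each $\epsilon > 0$ and $N \geq 1/(\epsilon h)$, we obtain

\[ n > N \quad \Rightarrow \quad |q|^{-n} = (1 + h)^n \overset{(7.6)}{\geq} 1 + nh > nh > 1/\epsilon \quad \Rightarrow \quad |q^n| = |q|^n < \epsilon. \tag{7.9} \]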

Definition 7.7. (a) Given $z \in \mathbb{K}$ and $\epsilon \in \mathbb{R}^+$, we call the set $B_\epsilon(z) := \{w \in \mathbb{K} : |w - z| < \epsilon\}$ the $\epsilon$-neighborhood of $z$ or, in anticipation of Analysis II, the (open) $\epsilon$-ball with center $z$ (in fact, for $\mathbb{K} = \mathbb{C}$, $B_\epsilon(z)$ represents an open disk in the complex plane with center $z$ and radius $\epsilon$, whereas, for $\mathbb{K} = \mathbb{R}$, $B_\epsilon(z) = \,]z - \epsilon, z + \epsilon[$ is the open interval with center $z$ and length $2\epsilon$). More generally, a set $U \subseteq \mathbb{K}$ is called a neighborhood of $z$ if, and only if, there exists $\epsilon > 0$ with $B_\epsilon(z) \subseteq U$ (so, for example, for $\epsilon > 0$, $B_\epsilon(z)$ is always a neighborhood of $z$, whereas $\mathbb{R}$ and $[z - \epsilon, \infty[$ are neighborhoods of $z$ for $\mathbb{K} = \mathbb{R}$, but not for $\mathbb{K} = \mathbb{C}$ ($[z - \epsilon, \infty[$ not even being defined for $z \notin \mathbb{R}$); the sets $\{z\}$, $\{w \in \mathbb{K} : \operatorname{Re} w \geq \operatorname{Re} z\}$, $\{w \in \mathbb{K} : \operatorname{Re} w \geq \operatorname{Re} z + \epsilon\}$ are never neighborhoods of $z$).

(b) If $\phi(n)$ is a statement for each $n \in \mathbb{N}$, then $\phi(n)$ is said to be true for almost all $n \in \mathbb{N}$ if, and only if, there exists a finite subset $A \subseteq \mathbb{N}$ such that $\phi(n)$ is true for each $n \in \mathbb{N} \setminus A$, i.e. if, and only if, $\phi(n)$ is always true, with the possible exception of finitely many cases.

Remark 7.8. In the language of Def. 7.7, the sequence $(z_n)_{n \in \mathbb{N}}$ converges to $z$ if, and only if, every neighborhood of $z$ contains almost all $z_n$.

Definition 7.9. The sequence $(z_n)_{n \in \mathbb{N}}$ in $\mathbb{K}$ is called bounded if, and only if, the set $\{|z_n| : n \in \mathbb{N}\}$ is bounded in the sense of Def. 2.27(a).

Proposition 7.10. Let $(z_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{K}$.

(a) Limits are unique, that means if $z, w \in \mathbb{K}$ are such that $\lim_{n \to \infty} z_n = z$ and $\lim_{n \to \infty} z_n = w$, then $z = w$.

(b) If $(z_n)_{n \in \mathbb{N}}$ is convergent, then it is bounded.

Proof. (a): Exercise.

(b): If $\lim_{n \to \infty} z_n = z$, then $A := \{|z_n| : |z_n - z| \geq 1\} \cup \{|z_1|\}$ is nonempty and finite. According to Th. 3.13(a), $A$ has an upper bound $M$. Then $\max\{M, |z| + 1\}$ is an upper bound for $\{|z_n| : n \in \mathbb{N}\}$, and 0 is always a lower bound, showing that the sequence is bounded. □

Proposition 7.11. Let $(z_n)_{n \in \mathbb{N}}$ be a sequence in $\mathbb{C}$ with $\lim_{n \to \infty} z_n = 0$.

(a) If $(b_n)_{n \in \mathbb{N}}$ is a sequence in $\mathbb{C}$ such that there exists $C \in \mathbb{R}^+$ with $|b_n| \leq C|z_n|$ for almost all $n$, then $\lim_{n \to \infty} b_n = 0$.

(b) If $(c_n)_{n \in \mathbb{N}}$ is a bounded sequence in $\mathbb{C}$, then $\lim_{n \to \infty} (c_n z_n) = 0$.

Proof. (a): Given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that $|z_n| < \epsilon/C$ and $|b_n| \leq C|z_n|$ for each $n > N$. Then, for each $n > N$, $|b_n| \leq C|z_n| < \epsilon$, proving $\lim_{n \to \infty} b_n = 0$.

(b): If $(c_n)_{n \in \mathbb{N}}$ is bounded, then there exists $C \in \mathbb{R}^+$ such that $|c_n| \leq C$ for each $n \in \mathbb{N}$. Thus, $|c_n z_n| \leq C|z_n|$ for each $n \in \mathbb{N}$, implying $\lim_{n \to \infty} (c_n z_n) = 0$ via (a). □

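Example 7.12. The sequences $((-1)^n)_{n \in \mathbb{N}}$ and $(b)_{n \in \mathbb{N}}$ with $b \in \mathbb{C}$ are bounded. Since, for each $a \in \mathbb{C}$, $\lim_{n \to \infty} \frac{1}{n + a} = 0$ by Example 7.2(b), we obtain

\[ \lim_{n \to \infty} \frac{(-1)^n}{n + a} = \lim_{n \to \infty} \frac{b}{n + a} = 0 \tag{7.10} \]

from Prop. 7.11(b).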

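Theorem 7.13. (a) Let $(z_n)_{n \in \mathbb{N}}$ and $(w_n)_{n \in \mathbb{N}}$ be sequences in $\mathbb{C}$. Moreover, let $z, w \in \mathbb{C}$ with $\lim_{n \to \infty} z_n = z$ and $\lim_{n \to \infty} w_n = w$. We have the following identities:

\[ \lim_{n \to \infty} (\lambda z_n) = \lambda z \quad \text{for each } \lambda \in \mathbb{C}, \tag{7.11a} \]

\[ \lim_{n \to \infty} (z_n + w_n) = z + w, \tag{7.11b} \]

\[ \lim_{n \to \infty} (z_n w_n) = zw, \tag{7.11c} \]

\[ \lim_{n \to \infty} z_n / w_n = z/w \quad \text{given all } w_n \neq 0 \text{ and } w \neq 0, \tag{7.11d} \]

\[ \lim_{n \to \infty} |z_n| = |z|, \tag{7.11e} \]

\[ \lim_{n \to \infty} \bar{z}_n = \bar{z}, \tag{7.11f} \]

\[ \lim_{n \to \infty} z_n^p = z^p \quad \text{for each } p \in \mathbb{N}. \tag{7.11g} \]

(b) Let $(x_n)_{n \in \mathbb{N}}$ and $(y_n)_{n \in \mathbb{N}}$ be sequences in $\mathbb{R}$. Moreover, let $x, y \in \mathbb{R}$ with $\lim_{n \to \infty} x_n = x$ and $\lim_{n \to \infty} y_n = y$. Then

\[ \lim_{n \to \infty} \max\{x_n, y_n\} = \max\{x, y\}, \tag{7.12a} \]

\[ \lim_{n \to \infty} \min\{x_n, y_n\} = \min\{x, y\}. \tag{7.12b} \]

(c) If, in the situation of (b) (i.e. for real sequences), $x_n \leq y_n$ holds for almost all $n \in \mathbb{N}$, then $x \leq y$. In particular, if almost all $x_n \geq 0$, then $x \geq 0$.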

Proof. We start with the identities of (a).

(7.11a): For $\lambda = 0$, there is nothing to prove. For $\lambda \neq 0$ and $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $|z_n - z| < \epsilon/|\lambda|$, implying

\[ \underset{n > N}{\forall} \quad |\lambda z_n - \lambda z| = |\lambda|\,|z_n - z| < \epsilon. \tag{7.13a} \]

(7.11b): Given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $|z_n - z| < \epsilon/2$ and $|w_n - w| < \epsilon/2$, implying

\[ \underset{n > N}{\forall} \quad |z_n + w_n - (z + w)| \leq |z_n - z| + |w_n - w| < \epsilon/2 + \epsilon/2 = \epsilon. \tag{7.13b} \]

(7.11c): Let $M_1 := \max\{|z|, 1\}$. According to Prop. 7.10(b), there exists $M_2 \in \mathbb{R}^+$ such that $M_2$ is an upper bound for $\{|w_n| : n \in \mathbb{N}\}$. Moreover, given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $|z_n - z| < \epsilon/(2M_2)$ and $|w_n - w| < \epsilon/(2M_1)$, implying

\[ \begin{aligned} \underset{n > N}{\forall} \quad |z_n w_n - zw| &= \big|(z_n - z) w_n + z (w_n - w)\big| \\ &\leq |w_n| \cdot |z_n - z| + |z| \cdot |w_n - w| < \frac{M_2\, \epsilon}{2M_2} + \frac{M_1\, \epsilon}{2M_1} = \epsilon. \end{aligned} \tag{7.13c} \]

(7.11d): We first consider the case where all $z_n = 1$. Given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $|w_n - w| < \epsilon |w|^2/2$ and $|w_n - w| < |w|/2$ (since $w \neq 0$ for this case), implying $|w| \leq |w - w_n| + |w_n| < |w|/2 + |w_n|$ and $|w_n| > |w|/2$. Thus,

\[ \underset{n > N}{\forall} \quad \left| \frac{1}{w_n} - \frac{1}{w} \right| = \left| \frac{w_n - w}{w_n w} \right| \leq \frac{2\,|w_n - w|}{|w|^2} < \frac{2}{|w|^2} \cdot \frac{\epsilon\,|w|^2}{2} = \epsilon. \tag{7.13d} \]

The general case now follows from (7.11c).

(7.11e): This is a consequence of the inverse triangle inequality (5.13): Given $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that, for each $n > N$, $|z_n - z| < \epsilon$, implying

\[ \underset{n > N}{\forall} \quad \big||z_n| - |z|\big| \leq |z_n - z| < \epsilon. \tag{7.13e} \]

(7.11f): Write $z_n = x_n + iy_n$ and $z = x + iy$ with $x_n, y_n, x, y \in \mathbb{R}$, $n \in \mathbb{N}$. Then we know $\lim_{n \to \infty} x_n = x$ and $\lim_{n \to \infty} y_n = y$ from (7.2), and

\[ \lim_{n \to \infty} \bar{z}_n = \lim_{n \to \infty} (x_n - iy_n) \overset{(7.11a),(7.11b)}{=} x - iy = \bar{z}, \tag{7.13f} \]

which establishes the case.

(7.11g) follows by induction from (7.11c) (cf. (7.16b) below).

The proofs for the two identities of (b) are left as exercises.

(c): Proceeding by contraposition, assume $x > y$ and set $s := (x + y)/2$. Then $y < s < x$ and $y_n < s < x_n$ holds for almost all $n$, i.e. $x_n \leq y_n$ does not hold for almost all $n$. □

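Example 7.14. (a) lim_{n→∞} (n + a)/(n + b) = 1 for each a, b ∈ ℂ: Here z_n := (n + a)/(n + b) (if n = −b, then set z_n := w with w ∈ ℂ arbitrary). Using (7.11b) and (7.11d), one obtains

\[ \lim_{n\to\infty} \frac{n+a}{n+b} = \lim_{n\to\infty} \frac{1 + a/n}{1 + b/n} = \frac{\lim_{n\to\infty} 1 + \lim_{n\to\infty} \frac{a}{n}}{\lim_{n\to\infty} 1 + \lim_{n\to\infty} \frac{b}{n}} = \frac{1+0}{1+0} = 1. \tag{7.14} \]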

(b) Using (7.11b), (7.11d), and (7.11g), one obtains

\[ \lim_{n\to\infty} \frac{2n^5 - 3i\,n^3 + 2i}{3n^5 + 17n} = \lim_{n\to\infty} \frac{2 - 3i/n^2 + 2i/n^5}{3 + 17/n^4} = \frac{2 + 0 + 0}{3 + 0} = \frac{2}{3}. \tag{7.15} \]
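The limit in (7.15) can also be observed numerically. The following is a minimal Python sketch, not part of the original notes; the function names `z` and `distance_to_limit` are chosen here for illustration only. It evaluates the sequence for growing n and measures the distance to 2/3. A numerical check of this kind illustrates the statement but does not replace the proof via Th. 7.13.

```python
def z(n: int) -> complex:
    """n-th member of the sequence in (7.15), for n >= 1."""
    return (2 * n**5 - 3j * n**3 + 2j) / (3 * n**5 + 17 * n)


def distance_to_limit(n: int) -> float:
    """Distance |z_n - 2/3| between the n-th member and the claimed limit."""
    return abs(z(n) - 2 / 3)


if __name__ == "__main__":
    # The printed distances should shrink as n grows.
    for n in (1, 10, 100, 1000, 10_000):
        print(n, z(n), distance_to_limit(n))
```

The dominant error term is 3i/n² in the numerator, so the distance is expected to decay roughly like 1/n².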

Corollary 7.15. For k ∈ ℕ, let (z_n^{(1)})_{n∈ℕ}, ..., (z_n^{(k)})_{n∈ℕ} be sequences in ℂ. Moreover, let z^{(1)}, ..., z^{(k)} ∈ ℂ with lim_{n→∞} z_n^{(j)} = z^{(j)} for each j ∈ {1, ..., k}. Then

\[ \lim_{n\to\infty} \sum_{j=1}^{k} z_n^{(j)} = \sum_{j=1}^{k} z^{(j)}, \tag{7.16a} \]

\[ \lim_{n\to\infty} \prod_{j=1}^{k} z_n^{(j)} = \prod_{j=1}^{k} z^{(j)}. \tag{7.16b} \]

Proof. (7.16) follows by simple inductions from (7.11b) and (7.11c), respectively. ∎

Theorem 7.16 (Sandwich Theorem). Let (x_n)_{n∈ℕ}, (y_n)_{n∈ℕ}, and (a_n)_{n∈ℕ} be sequences in ℝ. If x_n ≤ a_n ≤ y_n holds for almost all n ∈ ℕ, then

\[ \lim_{n\to\infty} x_n = \lim_{n\to\infty} y_n = x \in \mathbb{R} \quad\Rightarrow\quad \lim_{n\to\infty} a_n = x. \tag{7.17} \]

Proof. Given ε > 0, there exists N ∈ ℕ such that, for each n > N, x_n ≤ a_n ≤ y_n, |x_n − x| < ε, and |y_n − x| < ε, implying

\[ \forall_{n>N} \quad x - \epsilon < x_n \le a_n \le y_n < x + \epsilon, \tag{7.18} \]

which establishes the case. ∎

Example 7.17. Since 0 < 1/n! ≤ 1/n holds for each n ∈ ℕ, the Sandwich Th. 7.16 implies

\[ \lim_{n\to\infty} \frac{1}{n!} = 0. \tag{7.19} \]
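The two-sided estimate used in Example 7.17 can be checked exactly for finitely many n with rational arithmetic. The following Python sketch is an illustration added here under that assumption; the name `sandwich_holds` is hypothetical and does not occur in the notes.

```python
from fractions import Fraction
from math import factorial


def sandwich_holds(n: int) -> bool:
    """Check 0 < 1/n! <= 1/n exactly, using rational numbers (n >= 1)."""
    lower = Fraction(0)                     # x_n := 0
    middle = Fraction(1, factorial(n))      # a_n := 1/n!
    upper = Fraction(1, n)                  # y_n := 1/n
    return lower < middle <= upper


if __name__ == "__main__":
    print(all(sandwich_holds(n) for n in range(1, 50)))
```

Since both bounding sequences converge to 0, Th. 7.16 then yields (7.19); the code only confirms the inequality for the tested indices.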

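Definition 7.18. Let (x_n)_{n∈ℕ} be a sequence in ℝ. The sequence is said to diverge to ∞ (resp. to −∞), denoted lim_{n→∞} x_n = ∞ (resp. lim_{n→∞} x_n = −∞) if, and only if, for each K ∈ ℝ, almost all x_n are bigger (resp. smaller) than K. Thus,

\[ \lim_{n\to\infty} x_n = \infty \quad\Leftrightarrow\quad \forall_{K\in\mathbb{R}}\ \exists_{N\in\mathbb{N}}\ \forall_{n>N} \quad x_n > K, \tag{7.20a} \]

\[ \lim_{n\to\infty} x_n = -\infty \quad\Leftrightarrow\quad \forall_{K\in\mathbb{R}}\ \exists_{N\in\mathbb{N}}\ \forall_{n>N} \quad x_n < K. \tag{7.20b} \]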

Theorem 7.19. Suppose S := (x_n)_{n∈ℕ} is a monotone sequence in ℝ (increasing or decreasing). Defining A := {x_n : n ∈ ℕ}, the following holds:

\[ \lim_{n\to\infty} x_n = \begin{cases} \sup A & \text{if $S$ is increasing and bounded,} \\ \infty & \text{if $S$ is increasing and not bounded,} \\ \inf A & \text{if $S$ is decreasing and bounded,} \\ -\infty & \text{if $S$ is decreasing and not bounded.} \end{cases} \tag{7.21} \]

Proof. We treat the increasing case; the decreasing case is proved completely analogously. If A is bounded and ε > 0, let K := sup A − ε; if A is unbounded, then let K ∈ ℝ be arbitrary. In both cases, since K cannot be an upper bound, there exists N ∈ ℕ such that x_N > K. Since the sequence is increasing, for each n > N, x_N ≤ x_n, showing |sup A − x_n| < ε in the bounded case, and x_n > K in the unbounded case. ∎

Example 7.20. Theorem 7.19 implies

\[ \forall_{k\in\mathbb{N}} \quad \Bigl( \lim_{n\to\infty} n^k = \infty, \quad \lim_{n\to\infty} (-n^k) = -\infty \Bigr). \tag{7.22} \]

—

It is sometimes necessary to consider so-called subsequences and reorderings of a given sequence. Here, we are interested in sequences in ℝ or ℂ, but for subsequences and reorderings it is irrelevant in which set A the sequence takes its values. As it presents virtually no extra difficulty to introduce the notions for general sequences, and since we will need to consider sequences with values in sets other than ℝ or ℂ in Analysis II, we admit general sequences in the following definition.

Definition 7.21. Let A be an arbitrary nonempty set. Consider a sequence σ : ℕ ⟶ A. Given a function φ : ℕ ⟶ ℕ (that means (φ(n))_{n∈ℕ} constitutes a sequence of indices), the new sequence (σ ∘ φ) : ℕ ⟶ A is called a subsequence of σ if, and only if, φ is strictly increasing (i.e. 1 ≤ φ(1) < φ(2) < ...). Moreover, σ ∘ φ is called a reordering of σ if, and only if, φ is bijective. One can write σ in the form (z_n)_{n∈ℕ} by setting z_n := σ(n), and one can write σ ∘ φ in the form (w_n)_{n∈ℕ} by setting w_n := (σ ∘ φ)(n) = z_{φ(n)}. Especially for a subsequence of (z_n)_{n∈ℕ}, it is also common to write (z_{n_k})_{k∈ℕ}. This notation corresponds to the one above if one lets n_k := φ(k). Analogous definitions work if the index set ℕ of σ is replaced by a general countable nonempty index set I.

Example 7.22. Consider the sequence (1, 2, 3, ...). Then (2, 4, 6, ...) constitutes a subsequence and (2, 1, 4, 3, 6, 5, ...) constitutes a reordering. Using the notation of Def. 7.21, the original sequence is given by σ : ℕ ⟶ ℕ, σ(n) := n; the subsequence is selected via φ_1 : ℕ ⟶ ℕ, φ_1(n) := 2n; and the reordering is accomplished via

\[ \varphi_2 : \mathbb{N} \longrightarrow \mathbb{N}, \quad \varphi_2(n) := \begin{cases} n + 1 & \text{if $n$ is odd,} \\ n - 1 & \text{if $n$ is even.} \end{cases} \]
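The index maps of Example 7.22 translate directly into code. The following Python sketch is an illustration only; the helper `first_terms` is a hypothetical name introduced here. It composes the sequence σ with the index maps φ_1 and φ_2 as in Def. 7.21 and lists the first members of σ ∘ φ.

```python
def sigma(n: int) -> int:
    """The original sequence sigma(n) := n."""
    return n


def phi1(n: int) -> int:
    """Strictly increasing index map phi_1(n) := 2n (selects a subsequence)."""
    return 2 * n


def phi2(n: int) -> int:
    """Bijective index map phi_2: swaps each odd n with its successor."""
    return n + 1 if n % 2 == 1 else n - 1


def first_terms(seq, index_map, how_many: int) -> list:
    """First members of the composed sequence seq o index_map (indices start at 1)."""
    return [seq(index_map(n)) for n in range(1, how_many + 1)]


if __name__ == "__main__":
    print(first_terms(sigma, phi1, 3))
    print(first_terms(sigma, phi2, 6))
```

Note that φ_2 is not increasing and φ_1 is not surjective, so the reordering is not a subsequence and the subsequence is not a reordering.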

Proposition 7.23. Let (z_n)_{n∈ℕ} be a sequence in ℂ. If lim_{n→∞} z_n = z, then every subsequence and every reordering of (z_n)_{n∈ℕ} is also convergent with limit z.

Proof. Let (w_n)_{n∈ℕ} be a subsequence of (z_n)_{n∈ℕ}, i.e. there is a strictly increasing function φ : ℕ ⟶ ℕ such that w_n = z_{φ(n)}. If lim_{n→∞} z_n = z, then, given ε > 0, there is N ∈ ℕ such that z_n ∈ B_ε(z) for each n > N. For Ñ choose any number from ℕ that is ≥ N and in φ(ℕ). Take M := φ^{−1}(Ñ) (where φ^{−1} : φ(ℕ) ⟶ ℕ). Then, for each n > M, one has φ(n) > Ñ ≥ N, and, thus, w_n = z_{φ(n)} ∈ B_ε(z), showing lim_{n→∞} w_n = z.

Let (w_n)_{n∈ℕ} be a reordering of (z_n)_{n∈ℕ}, i.e. there is a bijective function φ : ℕ ⟶ ℕ such that w_n = z_{φ(n)}. Let ε and N be as before. Define

\[ M := \max\{\varphi^{-1}(n) : n \le N\}. \tag{7.23} \]

As φ is bijective, one has φ(n) > N for each n > M. Then, for each n > M, one has w_n = z_{φ(n)} ∈ B_ε(z), showing lim_{n→∞} w_n = z. ∎

Definition 7.24. Let (z_n)_{n∈ℕ} be a sequence in 𝕂. A point z ∈ 𝕂 is called a cluster point or an accumulation point of the sequence if, and only if, for each ε > 0, B_ε(z) contains infinitely many members of the sequence (i.e. #{n ∈ ℕ : z_n ∈ B_ε(z)} = ∞).

Example 7.25. The sequence ((−1)^n)_{n∈ℕ} has cluster points 1 and −1.

Proposition 7.26. A point z ∈ 𝕂 is a cluster point of the sequence (z_n)_{n∈ℕ} in 𝕂 if, and only if, the sequence has a subsequence converging to z.

Proof. If (w_n)_{n∈ℕ} is a subsequence of (z_n)_{n∈ℕ}, lim_{n→∞} w_n = z, then every B_ε(z), ε > 0, contains infinitely many w_n, i.e. infinitely many z_n, i.e. z is a cluster point of (z_n)_{n∈ℕ}. Conversely, if z is a cluster point of (z_n)_{n∈ℕ}, then, inductively, define φ : ℕ ⟶ ℕ as follows: For φ(1), choose the index k of any point z_k in B_1(z) (such a point exists, since z is a cluster point of the sequence). Now assume that n > 1 and that φ(m) has already been defined for each m < n. Let M := max{φ(m) : m < n}. Since B_{1/n}(z) contains infinitely many z_k, there must be some z_k ∈ B_{1/n}(z) such that k > M. Choose this k as φ(n). Thus, by construction, φ is strictly increasing, i.e. (w_n)_{n∈ℕ} with w_n := z_{φ(n)} is a subsequence of (z_n)_{n∈ℕ}. Moreover, for each ε > 0, there is N ∈ ℕ such that 1/N < ε. Then, for each n > N, w_n ∈ B_{1/n}(z) ⊆ B_{1/N}(z) ⊆ B_ε(z), showing lim_{n→∞} w_n = z. ∎
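The inductive construction of φ in the proof above can be sketched in Python. This is an illustration under stated assumptions, not part of the notes: the proof allows any admissible index k > M, whereas the sketch always takes the smallest one; the names `subsequence_indices`, `example_sequence`, and the parameter `search_limit` are hypothetical, and the search limit only guards against a point that is not actually a cluster point.

```python
def subsequence_indices(seq, z, how_many: int, search_limit: int = 10**6) -> list:
    """Return phi(1), ..., phi(how_many) with |seq(phi(n)) - z| < 1/n and phi strictly increasing."""
    phi = []
    last = 0  # plays the role of M := max{phi(m) : m < n}
    for n in range(1, how_many + 1):
        k = last + 1
        # Search for the smallest index k > M with z_k in B_{1/n}(z).
        while abs(seq(k) - z) >= 1 / n:
            k += 1
            if k > search_limit:
                raise ValueError("no admissible index found; z may not be a cluster point")
        phi.append(k)
        last = k
    return phi


def example_sequence(k: int) -> float:
    """Sample sequence z_k := (-1)^k (1 + 1/k) with cluster points 1 and -1."""
    return (-1) ** k * (1 + 1 / k)


if __name__ == "__main__":
    print(subsequence_indices(example_sequence, 1.0, 5))
```

For the sample sequence and the cluster point 1, only even indices qualify, so the selected subsequence consists of members 1 + 1/k with k even.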

Theorem 7.27 (Bolzano-Weierstrass). Every bounded sequence S := (x_n)_{n∈ℕ} in 𝕂 has at least one cluster point in 𝕂. Moreover, for 𝕂 = ℝ, the set A := {x ∈ ℝ : x is cluster point of S} has a max x^* ∈ ℝ and a min x_* ∈ ℝ, i.e. every bounded sequence in ℝ has a largest and a smallest cluster point. In addition, for each ε > 0, the inequality x_* − ε < x_n < x^* + ε holds for almost all n.

Proof. We first consider the case 𝕂 = ℝ. Define

\[ A^* := \{x \in \mathbb{R} : x_n \le x \text{ for almost all } n\}, \tag{7.24a} \]

\[ A_* := \{x \in \mathbb{R} : x_n \ge x \text{ for almost all } n\}. \tag{7.24b} \]

We claim A^* ≠ ∅ is bounded from below and x^* = max A = inf A^*; A_* ≠ ∅ is bounded from above and x_* = min A = sup A_*. We prove the claim for A^* – the proof for A_* is conducted completely analogously. Let m, M ∈ ℝ be a lower and an upper bound for S, respectively. Then M ∈ A^*, showing A^* ≠ ∅; and m is a lower bound for A^*. Since A^* is bounded from below, a := inf A^* ∈ ℝ by the completeness of ℝ. Moreover, for each ε > 0, a − ε ∉ A^*, as a is a lower bound for A^*, i.e. x_n > a − ε holds for infinitely many n ∈ ℕ. On the other hand, a + ε/2 ∈ A^* follows from a being the largest lower bound of A^*, i.e. x_n > a + ε/2 holds for only finitely many n (if any). In particular, we have shown x_n < a + ε holds for almost all n, and a − ε < x_n < a + ε must hold for infinitely many n, showing a is a cluster point of S. To see that a is the largest cluster point of S (i.e. a = max A), we have to show that x > a implies x is not a cluster point of S. However, letting ε := x − a > 0, we have seen above that x_n > a + ε/2 holds for only finitely many n, i.e. B_{ε/2}(x) contains only finitely many x_n, showing x is not a cluster point of S.

It now remains to consider the complex case, i.e. a bounded sequence S := (z_n)_{n∈ℕ} in ℂ. For each n ∈ ℕ, let z_n = x_n + i y_n with x_n, y_n ∈ ℝ. Due to Th. 5.9(d), we have |x_n| ≤ |z_n| and |y_n| ≤ |z_n|, i.e. the boundedness of S implies the boundedness of both (x_n)_{n∈ℕ} and (y_n)_{n∈ℕ}. Then we know that (x_n)_{n∈ℕ} has a cluster point x and, by Prop. 7.26, S has a subsequence (z_{n_j})_{j∈ℕ} such that x = lim_{j→∞} x_{n_j}. As the subsequence (y_{n_j})_{j∈ℕ} is still bounded, it must have a cluster point y and a subsequence (y_{n_{j_k}})_{k∈ℕ} such that y = lim_{k→∞} y_{n_{j_k}}. Since x = lim_{k→∞} x_{n_{j_k}} as well, we now have lim_{k→∞} z_{n_{j_k}} = x + iy =: z, i.e. S has a subsequence converging to z. According to Prop. 7.26, z is a cluster point of S. ∎

Definition 7.28. A sequence (z_n)_{n∈ℕ} in ℂ is defined to be a Cauchy sequence if, and only if, for each ε ∈ ℝ^+, there exists N ∈ ℕ such that |z_n − z_m| < ε for each n, m > N, i.e.

\[ (z_n)_{n\in\mathbb{N}} \text{ Cauchy} \quad\Leftrightarrow\quad \forall_{\epsilon\in\mathbb{R}^+}\ \exists_{N\in\mathbb{N}}\ \forall_{n,m>N} \quad |z_n - z_m| < \epsilon. \tag{7.25} \]

Theorem 7.29. The sequence (z_n)_{n∈ℕ} in ℂ is convergent if, and only if, it is a Cauchy sequence.

Proof. Suppose the sequence is convergent with lim_{n→∞} z_n = z. Then, given ε > 0, there is N ∈ ℕ such that z_n ∈ B_{ε/2}(z) for each n > N. If n, m > N, then |z_n − z_m| ≤ |z_n − z| + |z − z_m| < ε/2 + ε/2 = ε, establishing that (z_n)_{n∈ℕ} is a Cauchy sequence.

Conversely, suppose the sequence is a Cauchy sequence. Using similar reasoning as in the proof of Prop. 7.10(b), we first show the sequence is bounded. If the sequence is Cauchy, then there exists N ∈ ℕ such that |z_n − z_m| < 1 for all n, m > N. Thus, the set A := {|z_n| : |z_n − z_{N+1}| ≥ 1} ∪ {|z_1|} ⊆ ℝ_0^+ is nonempty and finite. According to Th. 3.13(a), A has an upper bound M. Then max{M, |z_{N+1}| + 1} is an upper bound for {|z_n| : n ∈ ℕ}, showing that the sequence is bounded. From Th. 7.27, we obtain that the sequence has a cluster point z. It remains to show lim_{n→∞} z_n = z. Given ε > 0, choose N ∈ ℕ such that |z_n − z_m| < ε/2 for all n, m > N. Since z is a cluster point, there exists k > N such that |z_k − z| < ε/2. Thus,

\[ \forall_{n>N} \quad |z_n - z| \le |z_n - z_k| + |z_k - z| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \tag{7.26} \]

proving lim_{n→∞} z_n = z. ∎

Example 7.30. Consider the sequence S := (s_n)_{n∈ℕ} defined by

\[ s_n := \sum_{k=1}^{n} \frac{1}{k} = 1 + \frac{1}{2} + \dots + \frac{1}{n}. \tag{7.27} \]

We claim S is not a Cauchy sequence and, thus, not convergent by Th. 7.29: For each N ∈ ℕ, we find n, m > N such that s_n − s_m > 1/2, namely m = N + 1 and n = 2(N + 1):

\[ s_{2(N+1)} - s_{N+1} = \sum_{k=N+2}^{2(N+1)} \frac{1}{k} = \frac{1}{N+2} + \frac{1}{N+3} + \dots + \frac{1}{2(N+1)} > (N+1) \cdot \frac{1}{2(N+1)} = \frac{1}{2}. \tag{7.28} \]

While we have just seen that S is not convergent, it is clearly increasing, i.e. Th. 7.19 implies S is unbounded and lim_{n→∞} s_n = ∞. Sequences defined by longer and longer sums are known as series and will be studied further in Sec. 7.3 below. The series of the present example is known as the harmonic series. It has become famous as the simplest example of a series that does not converge even though its summands converge to 0. In terms of the notation introduced in Sec. 7.3 below, we have shown

\[ \sum_{k=1}^{\infty} \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \dots = \infty. \tag{7.29} \]
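The estimate (7.28) can be verified exactly for individual N using rational arithmetic. The following Python sketch is an added illustration; the function names `s` and `cauchy_gap` are chosen here and are not notation from the notes.

```python
from fractions import Fraction


def s(n: int) -> Fraction:
    """Partial sum s_n = 1 + 1/2 + ... + 1/n of the harmonic series, computed exactly."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def cauchy_gap(N: int) -> Fraction:
    """The difference s_{2(N+1)} - s_{N+1} estimated from below in (7.28)."""
    return s(2 * (N + 1)) - s(N + 1)


if __name__ == "__main__":
    for N in (1, 10, 100):
        gap = cauchy_gap(N)
        print(N, float(gap), gap > Fraction(1, 2))
```

Because the gap stays above 1/2 no matter how large N is chosen, the Cauchy condition (7.25) fails for ε = 1/2, which is precisely the argument of the example.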

7.2 Continuity

7.2.1 Definitions and First Examples

Roughly, a function is continuous if a small change in its input results in a small change of its output. For functions defined on an interval, the notion of continuity makes precise the idea of a function having no jump – no discontinuity – at some point x in its domain. For example, we would say the sign function of (5.8) has precisely one jump – one discontinuity – at x = 0, whereas quadratic functions (or, more generally, polynomials) do not have any jumps – they are continuous.

Definition 7.31. Let M ⊆ ℂ. If ζ ∈ M, then a function f : M ⟶ 𝕂 is said to be continuous in ζ if, and only if, for each ε > 0, there is δ > 0 such that the distance between the values f(z) and f(ζ) is less than ε, provided the distance between z and ζ is less than δ, i.e. if, and only if,

\[ \forall_{\epsilon\in\mathbb{R}^+}\ \exists_{\delta\in\mathbb{R}^+}\ \forall_{z\in M} \quad \bigl( |z - \zeta| < \delta \ \Rightarrow\ |f(z) - f(\zeta)| < \epsilon \bigr). \tag{7.30} \]

Moreover, f is called continuous if, and only if, f is continuous in every ζ ∈ M. The set of all continuous functions from M into 𝕂 is denoted by C(M, 𝕂), C(M) := C(M, ℝ).

Example 7.32. (a) Every constant map f : M ⟶ 𝕂, ∅ ≠ M ⊆ ℂ, is continuous: In this case, given ε, we can choose any δ > 0 we want, say δ := 42: If ζ, z ∈ M, then |f(ζ) − f(z)| = 0 < ε, which holds independently of δ, in particular, if |ζ − z| < δ.

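(b) Every affine function f : 𝕂 ⟶ 𝕂, f(z) := az + b, is continuous: For a = 0, this follows from (a). For a ≠ 0, given ε > 0, choose δ := ε/|a|. Then,

\[ \forall_{\zeta,z\in\mathbb{K}} \quad \Bigl( |z - \zeta| < \delta = \frac{\epsilon}{|a|} \ \Rightarrow\ \bigl|f(z) - f(\zeta)\bigr| = \bigl|az + b - a\zeta - b\bigr| = |a|\,|z - \zeta| < |a|\,\frac{\epsilon}{|a|} = \epsilon \Bigr). \tag{7.31} \]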

(c) The sign function of (5.8) is not continuous: It is continuous in each ξ ∈ ℝ \ {0}, but not continuous in 0: If ξ ≠ 0, then, given ε > 0, choose δ := |ξ|. If |x − ξ| < δ, then sgn(x) = sgn(ξ), i.e. |sgn(x) − sgn(ξ)| = 0 < ε, proving continuity in ξ. However, at 0, for ε := 1/2, we have

\[ \forall_{\delta>0} \quad \bigl|\operatorname{sgn}(0) - \operatorname{sgn}(\delta/2)\bigr| = |0 - 1| = 1 > \frac{1}{2} = \epsilon, \tag{7.32} \]

showing sgn is not continuous in 0.
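The choice δ := ε/|a| for affine functions in Example 7.32(b) can be probed numerically. The following Python sketch is an added illustration with hypothetical names (`affine`, `delta_for`, `check_affine`); it samples points z with |z − ζ| < δ and tests the conclusion of (7.30). Sampling can only fail to find a counterexample; it does not prove continuity. The factor 0.99 keeps the samples away from the boundary of B_δ(ζ) so that floating-point rounding does not produce spurious failures.

```python
import cmath
import math
import random


def affine(z: complex, a: complex, b: complex) -> complex:
    """The affine function f(z) := a z + b."""
    return a * z + b


def delta_for(eps: float, a: complex) -> float:
    """The delta chosen in Example 7.32(b); requires a != 0."""
    return eps / abs(a)


def check_affine(a: complex, b: complex, zeta: complex, eps: float,
                 samples: int = 1000, seed: int = 0) -> bool:
    """Sample points z with |z - zeta| < delta and test |f(z) - f(zeta)| < eps."""
    rng = random.Random(seed)
    delta = delta_for(eps, a)
    for _ in range(samples):
        radius = 0.99 * delta * rng.random()
        angle = rng.uniform(0.0, 2.0 * math.pi)
        z = zeta + cmath.rect(radius, angle)
        if abs(affine(z, a, b) - affine(zeta, a, b)) >= eps:
            return False
    return True


if __name__ == "__main__":
    print(check_affine(a=2 - 1j, b=3, zeta=1 + 1j, eps=1e-3))
```

The same δ works for every ζ, which reflects that δ in (7.31) depends on ε and a only.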

Some subtleties arise from the possibility that f can be defined on subsets of ℂ with very different properties. The notions introduced in Def. 7.33 help to deal with these subtleties.

Definition 7.33. Let M ⊆ C.

(a) The point z ∈ ℂ is called a cluster point or accumulation point of M if, and only if, each ε-neighborhood of z, ε ∈ ℝ^+, contains infinitely many points of M, i.e. if, and only if,

\[ \forall_{\epsilon\in\mathbb{R}^+} \quad \#\bigl(M \cap B_\epsilon(z)\bigr) = \infty. \tag{7.33} \]

Note: A cluster point of M is not necessarily in M.

(b) The point z is called an isolated point of M if, and only if, there is ε ∈ ℝ^+ such that B_ε(z) ∩ M = {z}. Note: An isolated point of M is always in M.

Proposition 7.34. If M ⊆ ℂ, then each point of M is either a cluster point or an isolated point of M, i.e.

\[ M = \{z \in M : z \text{ cluster point of } M\} \mathbin{\dot\cup} \{z \in M : z \text{ isolated point of } M\}. \tag{7.34} \]

Proof. Consider z ∈ M that is not a cluster point of M. We have to show that z is an isolated point of M. Since z is not a cluster point of M, there exists ε̃ > 0 such that A := (M ∩ B_ε̃(z)) \ {z} is finite. Define

\[ \epsilon := \begin{cases} \min\{|a - z| : a \in A\} & \text{if } A \ne \emptyset, \\ \tilde\epsilon & \text{if } A = \emptyset. \end{cases} \tag{7.35} \]

Then B_ε(z) ∩ M = {z}, showing z is an isolated point of M. Finally, the union in (7.34) is clearly disjoint. ∎

Lemma 7.35. Let M ⊆ ℂ, f : M ⟶ 𝕂. If ζ is an isolated point of M, then f is always continuous in ζ.

Proof. Independently of the concrete definition of f, we know there is δ > 0 such that B_δ(ζ) ∩ M = {ζ}. In other words, if z ∈ M with |z − ζ| < δ, then z = ζ, implying |f(z) − f(ζ)| = 0 < ε for each ε > 0, showing f to be continuous in ζ. ∎

Example 7.36. (a) The sign function restricted to the set M := ]−∞, −1] ∪ {0} ∪ [1, ∞[, i.e.

\[ \operatorname{sgn}(x) = \begin{cases} 1 & \text{for } x \in [1, \infty[, \\ 0 & \text{for } x = 0, \\ -1 & \text{for } x \in\, ]-\infty, -1] \end{cases} \]

is continuous: As in Ex. 7.32(c), one sees that sgn is continuous in each ξ ∈ M \ {0}. However, now it is also continuous in 0, since 0 is an isolated point of M.

(b) Every function f : ℕ ⟶ 𝕂 is continuous, since every n ∈ ℕ is an isolated point of ℕ (due to {n} = ℕ ∩ B_{1/2}(n)).

7.2.2 Continuity, Sequences, and Function Arithmetic

To make available the power of the results on convergent sequences from Sec. 7.1 to investigations regarding the continuity of functions, we need to understand the relationship between both notions. The core of this relationship is the content of the following Th. 7.37, which provides a criterion allowing one to test continuity in terms of convergent sequences:

Theorem 7.37. Let M ⊆ ℂ, f : M ⟶ 𝕂. If ζ ∈ M, then f is continuous in ζ if, and only if, for each sequence (z_n)_{n∈ℕ} in M with lim_{n→∞} z_n = ζ, the sequence (f(z_n))_{n∈ℕ} converges to f(ζ), i.e.

\[ \lim_{n\to\infty} z_n = \zeta \quad\Rightarrow\quad \lim_{n\to\infty} f(z_n) = f(\zeta). \tag{7.36} \]

Proof. If ζ ∈ M is an isolated point of M, then there is δ > 0 such that M ∩ B_δ(ζ) = {ζ}. Then every f : M ⟶ 𝕂 is continuous in ζ according to Lem. 7.35. On the other hand, every sequence in M converging to ζ must be eventually constant and equal to ζ, i.e. (7.36) is trivially valid at ζ. Thus, the assertion of the theorem holds if ζ ∈ M is an isolated point of M.

If ζ ∈ M is not an isolated point of M, then ζ is a cluster point of M according to Prop. 7.34. So, for the remainder of the proof, let ζ ∈ M be a cluster point of M. Assume that f is continuous in ζ and (z_n)_{n∈ℕ} is a sequence in M with lim_{n→∞} z_n = ζ. For each ε > 0, there is δ > 0 such that z ∈ M and |z − ζ| < δ implies |f(z) − f(ζ)| < ε. Since lim_{n→∞} z_n = ζ, there is also N ∈ ℕ such that, for each n > N, |z_n − ζ| < δ. Thus, for each n > N, |f(z_n) − f(ζ)| < ε, proving lim_{n→∞} f(z_n) = f(ζ). Conversely, assume that f is not continuous in ζ. We have to construct a sequence (z_n)_{n∈ℕ} in M with lim_{n→∞} z_n = ζ such that (f(z_n))_{n∈ℕ} does not converge to f(ζ). Since f is not continuous in ζ, there must be some ε_0 > 0 such that, for each 1/n, n ∈ ℕ, there is at least one z_n ∈ M satisfying |z_n − ζ| < 1/n and |f(z_n) − f(ζ)| ≥ ε_0. Then (z_n)_{n∈ℕ} is a sequence in M with lim_{n→∞} z_n = ζ and (f(z_n))_{n∈ℕ} does not converge to f(ζ). ∎
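The sequence criterion of Th. 7.37 gives a quick way to exhibit a discontinuity. As an added illustration (a Python sketch with names chosen here, not taken from the notes), the sequence z_n := 1/n converges to 0, while the values of the sign function along it stay at 1 and therefore do not converge to sgn(0) = 0. For contrast, the values of x ↦ x² along the same sequence approach 0² = 0.

```python
def sgn(x: float) -> int:
    """Sign function: 1 for x > 0, 0 for x = 0, -1 for x < 0."""
    return (x > 0) - (x < 0)


def z(n: int) -> float:
    """Test sequence z_n := 1/n, which converges to 0."""
    return 1.0 / n


def square(x: float) -> float:
    """A continuous comparison function x -> x^2."""
    return x * x


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        # sgn(z_n) stays 1, while square(z_n) tends to 0.
        print(n, z(n), sgn(z(n)), square(z(n)))
    print("sgn(0) =", sgn(0.0))
```

A single sequence violating (7.36) suffices to disprove continuity in ζ, whereas proving continuity requires (7.36) for every sequence in M converging to ζ.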

We can now apply the rules of Th. 7.13 to see that all the arithmetic operations definedin Not. 6.2 preserve continuity:

Theorem 7.38. Let M ⊆ ℂ, f, g : M ⟶ 𝕂, λ ∈ 𝕂, ζ ∈ M. If f, g are both continuous in ζ, then λf, f + g, fg, f/g for g ≠ 0, |f|, Re f, and Im f are all continuous in ζ. If 𝕂 = ℝ, then max(f, g), min(f, g), f^+ and f^− are also all continuous in ζ.

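Proof. Let (z_n)_{n∈ℕ} be a sequence in M such that lim_{n→∞} z_n = ζ. Then the continuity of f and g in ζ yields lim_{n→∞} f(z_n) = f(ζ) and lim_{n→∞} g(z_n) = g(ζ). Then

\begin{align*}
(7.11a) &\ \Rightarrow\ \lim_{n\to\infty} (\lambda f)(z_n) = (\lambda f)(\zeta), \\
(7.11b) &\ \Rightarrow\ \lim_{n\to\infty} (f + g)(z_n) = (f + g)(\zeta), \\
(7.11c) &\ \Rightarrow\ \lim_{n\to\infty} (fg)(z_n) = (fg)(\zeta), \\
(7.11d) &\ \Rightarrow\ \lim_{n\to\infty} (f/g)(z_n) = (f/g)(\zeta), \\
(7.11e) &\ \Rightarrow\ \lim_{n\to\infty} |f|(z_n) = |f|(\zeta), \\
(7.2) &\ \Rightarrow\ \lim_{n\to\infty} (\operatorname{Re} f)(z_n) = (\operatorname{Re} f)(\zeta), \\
(7.2) &\ \Rightarrow\ \lim_{n\to\infty} (\operatorname{Im} f)(z_n) = (\operatorname{Im} f)(\zeta).
\end{align*}

If f, g are both ℝ-valued, then we also have

\begin{align*}
(7.12a) &\ \Rightarrow\ \lim_{n\to\infty} \max(f, g)(z_n) = \max(f, g)(\zeta), \\
(7.12b) &\ \Rightarrow\ \lim_{n\to\infty} \min(f, g)(z_n) = \min(f, g)(\zeta),
\end{align*}

and, finally, the continuity of f^+ and f^− follows from the continuity of max(f, g). ∎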

Corollary 7.39. A function f : M ⟶ ℂ, M ⊆ ℂ, is continuous in ζ ∈ M if, and only if, both Re f and Im f are continuous in ζ.

Proof. If f is continuous in ζ, then Re f and Im f are both continuous in ζ by Th. 7.38. If Re f and Im f are both continuous in ζ, then, as

\[ f = \operatorname{Re} f + i \operatorname{Im} f, \tag{7.37} \]

f is continuous in ζ, once again, by Th. 7.38. ∎

Example 7.40. (a) The continuity of the absolute value function z ↦ |z| on 𝕂 can be concluded directly from (7.11e) and, alternatively, from combining the continuity of f : 𝕂 ⟶ 𝕂, f(z) = z, according to Ex. 7.32(b), with the continuity of |f| according to Th. 7.38.

(b) Every polynomial P : 𝕂 ⟶ 𝕂, P(x) = ∑_{j=0}^{n} a_j x^j, a_j ∈ 𝕂, is continuous: First note that every monomial x ↦ x^j is continuous on 𝕂 by (7.11g). Then Th. 7.38 implies the continuity of x ↦ a_j x^j on 𝕂. Now the continuity of P follows from (7.16a) or, alternatively, by an induction from the f + g part of Th. 7.38.

(c) Let P, Q : 𝕂 ⟶ 𝕂 be polynomials and let A := Q^{−1}{0} be the set of all zeros of Q (if any). Then the rational function (P/Q) : 𝕂 \ A ⟶ 𝕂 is continuous as a consequence of (b) plus the f/g part of Th. 7.38.

Theorem 7.41. Let D_f, D_g ⊆ ℂ, f : D_f ⟶ ℂ, g : D_g ⟶ 𝕂, f(D_f) ⊆ D_g. If f is continuous in ζ ∈ D_f and g is continuous in f(ζ) ∈ D_g, then g ∘ f : D_f ⟶ 𝕂 is continuous in ζ. In consequence, if f and g are both continuous, then the composition g ∘ f is also continuous.

Proof. Let ζ ∈ D_f and assume f is continuous in ζ and g is continuous in f(ζ). If (z_n)_{n∈ℕ} is a sequence in D_f such that lim_{n→∞} z_n = ζ, then the continuity of f in ζ implies that lim_{n→∞} f(z_n) = f(ζ). Then the continuity of g in f(ζ) implies lim_{n→∞} g(f(z_n)) = g(f(ζ)), thereby establishing the continuity of g ∘ f in ζ. ∎

7.2.3 Bounded, Closed, and Compact Sets

Subsets A of ℂ (and even subsets of ℝ) can be extremely complicated. If the set A has one or more of the benign properties defined in the following, then this can often be exploited in some useful way (we will see an important example in Th. 7.54 below).

Definition 7.42. Consider A ⊆ C.

(a) A is called bounded if, and only if, A = ∅ or the set {|z| : z ∈ A} is bounded in ℝ in the sense of Def. 2.27(a), i.e. if, and only if,

\[ \exists_{M\in\mathbb{R}^+} \quad A \subseteq B_M(0). \]

(b) A is called closed if, and only if, every sequence in A that converges in ℂ has its limit in A (note that ∅ is, thus, closed).

(c) A is called compact if, and only if, A is both closed and bounded.

Example 7.43. (a) Clearly, ∅ and sets containing single points {z}, z ∈ ℂ, are compact. The sets ℂ and ℝ are simple examples of closed sets that are not bounded.

(b) Let a, b ∈ ℝ, a < b. Each bounded interval ]a, b[, ]a, b], [a, b[, [a, b] is, indeed, bounded (e.g. by M := ε + max{|a|, |b|} for each ε ∈ ℝ^+). If (x_n)_{n∈ℕ} is a sequence in [a, b], converging to x ∈ ℝ, then Th. 7.13(c) shows a ≤ x ≤ b, i.e. x ∈ [a, b] and [a, b] is, indeed, closed. Analogously, one sees that the unbounded intervals [a, ∞[ and ]−∞, a] are also closed. On the other hand, open and half-open intervals are not closed: For sufficiently large n, the convergent sequence (b − 1/n)_{n∈ℕ} is in [a, b[, but lim_{n→∞} (b − 1/n) = b ∉ [a, b[, and the other cases are treated analogously. In particular, only intervals of the form [a, b] (and trivial intervals) are compact.

(c) For each ε > 0 and each z ∈ ℂ, the set B_ε(z) is bounded (since B_ε(z) ⊆ B_{ε+|z|}(0) by the triangle inequality), but not closed (since, for sufficiently large n ∈ ℕ, (z + ε − 1/n)_{n∈ℕ} is a sequence in B_ε(z), converging to z + ε ∉ B_ε(z)). In particular, B_ε(z) is not compact.

Proposition 7.44. (a) Finite unions of bounded (resp. closed, resp. compact) sets are bounded (resp. closed, resp. compact), i.e. if A_1, ..., A_n ⊆ ℂ, n ∈ ℕ, are bounded (resp. closed, resp. compact), then A := ⋃_{j=1}^{n} A_j is also bounded (resp. closed, resp. compact).

(b) Arbitrary (i.e. finite or infinite) intersections of bounded (resp. closed, resp. compact) sets are bounded (resp. closed, resp. compact), i.e. if I ≠ ∅ is an arbitrary index set and, for each j ∈ I, A_j ⊆ ℂ is bounded (resp. closed, resp. compact), then A := ⋂_{j∈I} A_j is also bounded (resp. closed, resp. compact).

Proof. (a): Exercise.

(b): Fix j_0 ∈ I. If all A_j, j ∈ I, are bounded, then, in particular, there is M ∈ ℝ^+ such that A_{j_0} ⊆ B_M(0). Thus, A = ⋂_{j∈I} A_j ⊆ A_{j_0} ⊆ B_M(0) shows A is also bounded. If all A_j, j ∈ I, are closed and (a_n)_{n∈ℕ} is a sequence in A that converges to some z ∈ ℂ, then (a_n)_{n∈ℕ} is a sequence in each A_j, j ∈ I, and, since each A_j is closed, z ∈ A_j for each j ∈ I, i.e. z ∈ A = ⋂_{j∈I} A_j. If all A_j, j ∈ I, are compact, then they are all closed and bounded and, thus, A is closed and bounded, i.e. A is compact. ∎

Example 7.45. (a) According to Prop. 7.44(a), all finite subsets of C are compact.

(b) ℕ = ⋃_{n∈ℕ} {n} shows that infinite unions of compact sets can be unbounded, and ]0, 1[ = ⋃_{n∈ℕ} [1/n, 1 − 1/n] shows that infinite unions of compact sets are not always closed.

Many more examples of closed sets can be obtained as preimages of closed sets under continuous maps according to the following remark:

Remark 7.46. In Analysis II, it will be shown in the more general context of maps f between topological spaces that a map f is continuous if, and only if, all preimages f^{−1}(A) under f of closed sets A are closed. Here, we will only prove the following special case:

\[ f : \mathbb{C} \longrightarrow \mathbb{K} \text{ continuous and } A \subseteq \mathbb{K} \text{ closed} \quad\Rightarrow\quad f^{-1}(A) \subseteq \mathbb{C} \text{ closed}. \tag{7.38} \]

Indeed, suppose f is continuous and A ⊆ 𝕂 is closed. If (z_n)_{n∈ℕ} is a sequence in f^{−1}(A) with lim_{n→∞} z_n = z ∈ ℂ, then (f(z_n))_{n∈ℕ} is a sequence in A. The continuity of f then implies lim_{n→∞} f(z_n) = f(z) and, then, f(z) ∈ A, since A is closed. Thus, z ∈ f^{−1}(A), showing f^{−1}(A) is closed.

Example 7.47. (a) For each z ∈ ℂ and each r > 0, the closed disk B̄_r(z) := {w ∈ ℂ : |z − w| ≤ r} with radius r and center z is, indeed, closed by (7.38), since

\[ \overline{B}_r(z) = f^{-1}[0, r], \tag{7.39} \]

where f is the continuous map f : ℂ ⟶ ℝ, f(w) := |z − w|. Since B̄_r(z) is clearly bounded, it is also compact.

(b) For each z ∈ ℂ and each r > 0, the circle (also called a 1-sphere) S_r(z) := {w ∈ ℂ : |z − w| = r} with radius r and center z is closed by (7.38), since S_r(z) = f^{−1}{r}, where f is the same map as in (7.39). Moreover, S_r(z) is also clearly bounded, and, thus, compact.

(c) According to (7.38), for each x ∈ ℝ, the closed half-spaces {z ∈ ℂ : Re z ≥ x} = Re^{−1}[x, ∞[ and {z ∈ ℂ : Im z ≥ x} = Im^{−1}[x, ∞[ are, indeed, closed.

Theorem 7.48. A subset K of ℂ is compact if, and only if, every sequence in K has a subsequence that converges to some limit z ∈ K.

Proof. If K is closed and bounded, and (z_n)_{n∈ℕ} is a sequence in K, then the boundedness, the Bolzano-Weierstrass Th. 7.27, and Prop. 7.26 yield a subsequence that converges to some z ∈ ℂ. However, since K is closed, z ∈ K.

Conversely, assume every sequence in K has a subsequence that converges to some limit z ∈ K. Let (z_n)_{n∈ℕ} be a sequence in K that converges to some w ∈ ℂ. Then this sequence must have a subsequence that converges to some z ∈ K. However, according to Prop. 7.23, it must be w = z ∈ K, showing K is closed. If K is not bounded, then there exists a sequence (z_n)_{n∈ℕ} in K such that lim_{n→∞} |z_n| = ∞. Every subsequence (z_{n_k})_{k∈ℕ} then still has the property that lim_{k→∞} |z_{n_k}| = ∞, in particular, each subsequence is unbounded and cannot converge to some z ∈ ℂ (let alone in K). ∎

Caveat 7.49. In Analysis II, we will generalize the notion of compactness to subsets of so-called metric spaces. In metric spaces, it is still true that a set K is compact if, and only if, every sequence in K has a subsequence that converges to some limit in K. However, while it remains true that every compact set is closed and bounded, the converse does not(!) hold in general metric spaces (in general, even in closed sets, there exist bounded sequences that do not have convergent subsequences).

—

One reason that compact sets are useful is that real-valued continuous functions on compact sets assume a maximum and a minimum, which is the content of Th. 7.54 below. In preparation, we now define maxima and minima for real-valued functions.

Definition 7.50. Let M ⊆ C, f : M −→ R.

(a) Given z ∈ M, f has a (strict) global min at z if, and only if, f(z) ≤ f(w) (f(z) < f(w)) for each w ∈ M \ {z}. Analogously, f has a (strict) global max at z if, and only if, f(z) ≥ f(w) (f(z) > f(w)) for each w ∈ M \ {z}. Moreover, f has a (strict) global extreme value at z if, and only if, f has a (strict) global min or a (strict) global max at z.

(b) Given z ∈ M, f has a (strict) local min at z if, and only if, there exists ε > 0 such that f(z) ≤ f(w) (f(z) < f(w)) for each w ∈ {w ∈ M : |z − w| < ε} \ {z}. Analogously, f has a (strict) local max at z if, and only if, there exists ε > 0 such that f(z) ≥ f(w) (f(z) > f(w)) for each w ∈ {w ∈ M : |z − w| < ε} \ {z}. Moreover, f has a (strict) local extreme value at z if, and only if, f has a (strict) local min or a (strict) local max at z.

Remark 7.51. In the context of Def. 7.50, it is immediate from the respective definitions that f has a (strict) global min at z ∈ M if, and only if, −f has a (strict) global max at z. Moreover, the same holds if “global” is replaced by “local”. It is equally obvious that every (strict) global min/max is a (strict) local min/max.

Theorem 7.52. If K ⊆ ℂ is compact, and f : K ⟶ ℂ is continuous, then f(K) is compact.

Proof. If (w_n)_{n∈ℕ} is a sequence in f(K), then, for each n ∈ ℕ, there is some z_n ∈ K such that f(z_n) = w_n. As K is compact, there is a subsequence (a_n)_{n∈ℕ} of (z_n)_{n∈ℕ} with lim_{n→∞} a_n = a for some a ∈ K. Then (f(a_n))_{n∈ℕ} is a subsequence of (w_n)_{n∈ℕ} and the continuity of f yields lim_{n→∞} f(a_n) = f(a) ∈ f(K), showing that (w_n)_{n∈ℕ} has a convergent subsequence with limit in f(K). By Th. 7.48, we have therefore established that f(K) is compact. ∎

Lemma 7.53. If K is a nonempty compact subset of ℝ, then K contains a smallest and a largest element, i.e. there exist m, M ∈ K such that m ≤ x ≤ M for each x ∈ K.

Proof. Since the compact set K is bounded, we know that

\[ -\infty < m := \inf K \le \sup K =: M < \infty. \]

According to the definition of the inf and sup as largest lower bound and smallest upper bound, respectively, for each n ∈ ℕ, there must be elements x_n, y_n ∈ K such that m ≤ x_n ≤ m + 1/n and M − 1/n ≤ y_n ≤ M. Since the compact set K is also closed, we get m = lim_{n→∞} x_n ∈ K and M = lim_{n→∞} y_n ∈ K. ∎

Theorem 7.54. If K ⊆ ℂ is compact, and f : K ⟶ ℝ is continuous, then f assumes its max and its min, i.e. there are z_m ∈ K and z_M ∈ K such that f has a global min at z_m and a global max at z_M. In particular, the continuous function f assumes its max and min on each compact interval K = [a, b] ⊆ ℝ, a, b ∈ ℝ.

Proof. Since K is compact and f is continuous, f(K) ⊆ ℝ is compact according to Th. 7.52. Then, by Lem. 7.53, f(K) contains a smallest element m and a largest element M. This, in turn, implies that there are z_m, z_M ∈ K such that f(z_m) = m and f(z_M) = M. ∎

Example 7.55. On an unbounded set, a continuous function does not necessarily have a global max or a global min, as one can already see from x ↦ x. An example for a continuous function on a bounded, but not closed, interval, that does not have a global max is f : ]0, 1] ⟶ ℝ, f(x) := 1/x, which is continuous by Th. 7.38.

7.2.4 Intermediate Value Theorem

Theorem 7.56 (Bolzano’s Theorem). Let a, b ∈ ℝ with a < b. If f : [a, b] ⟶ ℝ is continuous with f(a) > 0 and f(b) < 0, then f has at least one zero in ]a, b[. More precisely, the set A := f^{−1}{0} has a min ξ_1 and a max ξ_2, a < ξ_1 ≤ ξ_2 < b, where f > 0 on [a, ξ_1[ and f < 0 on ]ξ_2, b].

Proof. Let ξ_1 := inf f^{−1}(ℝ_0^−).

(a): f(ξ_1) ≤ 0: This is clear if ξ_1 = b. If ξ_1 < b, then, for each n ∈ ℕ sufficiently large, there exists x_n ∈ [ξ_1, ξ_1 + 1/n[ ⊆ [a, b] such that f(x_n) ≤ 0. Then lim_{n→∞} x_n = ξ_1 and the continuity of f implies lim_{n→∞} f(x_n) = f(ξ_1). Now f(ξ_1) ≤ 0 is a consequence of Th. 7.13(c). In particular, (a) yields a < ξ_1 and f > 0 on [a, ξ_1[.

(b): f(ξ_1) ≥ 0: The continuity of f implies lim_{n→∞} f(ξ_1 − 1/n) = f(ξ_1) and, since we have already seen f(ξ_1 − 1/n) > 0 for each n ∈ ℕ sufficiently large, f(ξ_1) ≥ 0 is again a consequence of Th. 7.13(c). In particular, we have ξ_1 < b.

Combining (a) and (b), we have f(ξ_1) = 0 and a < ξ_1 < b.

Defining ξ_2 := sup f^{−1}(ℝ_0^+), f(ξ_2) = 0 and a < ξ_2 < b is shown completely analogously.

Then f < 0 on ]ξ_2, b] is also clear as well as ξ_1 ≤ ξ_2. ∎

Theorem 7.57 (Intermediate Value Theorem). Let a, b ∈ ℝ with a < b. If f : [a, b] ⟶ ℝ is continuous, then f assumes every value between f(a) and f(b), i.e.

\[ \bigl[\min\{f(a), f(b)\}, \max\{f(a), f(b)\}\bigr] \subseteq f\bigl([a, b]\bigr). \tag{7.40} \]

Proof. If f(a) = f(b), then there is nothing to prove. If f(a) < f(b) and η ∈ ]f(a), f(b)[, then consider the auxiliary function g : [a, b] ⟶ ℝ, g(x) := η − f(x). Then g is continuous with g(a) = η − f(a) > 0 and g(b) = η − f(b) < 0. According to Bolzano’s Th. 7.56, there exists ξ ∈ ]a, b[ such that g(ξ) = η − f(ξ) = 0, i.e. f(ξ) = η as claimed. If f(b) < f(a) and η ∈ ]f(b), f(a)[, then consider the auxiliary function g : [a, b] ⟶ ℝ, g(x) := f(x) − η. Then g is continuous with g(a) = f(a) − η > 0 and g(b) = f(b) − η < 0. Once again, according to Bolzano’s Th. 7.56, there exists ξ ∈ ]a, b[ such that g(ξ) = f(ξ) − η = 0, i.e. f(ξ) = η. ∎
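The Intermediate Value Theorem guarantees that a solution ξ of f(ξ) = η exists; it does not say how to compute it. A common numerical companion is the bisection method, sketched below in Python. Note that this is a different technique from the infimum argument used in the proof of Th. 7.56, and it is added here only as an illustration; the name `solve_by_bisection` and the parameters `tol` and `max_iter` are assumptions of this sketch.

```python
import math
from typing import Callable


def solve_by_bisection(f: Callable[[float], float], a: float, b: float, eta: float,
                       tol: float = 1e-12, max_iter: int = 200) -> float:
    """Approximate some xi in [a, b] with f(xi) = eta, for continuous f and eta between f(a), f(b)."""
    fa, fb = f(a), f(b)
    if not (min(fa, fb) <= eta <= max(fa, fb)):
        raise ValueError("eta must lie between f(a) and f(b)")
    lo, hi = a, b
    # True means the invariant f(lo) <= eta <= f(hi); False means f(lo) >= eta >= f(hi).
    low_side_below = fa <= fb
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2
        # Keep the half interval on which eta still lies between the endpoint values.
        if (f(mid) <= eta) == low_side_below:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(solve_by_bisection(lambda x: x**3, 0.0, 2.0, 2.0))   # approximates the cube root of 2
    print(solve_by_bisection(math.cos, 0.0, 3.0, 0.5))         # approximates a solution of cos(x) = 1/2
```

Each step halves the interval while preserving that η lies between the values of f at the endpoints, so Th. 7.57 applies to every intermediate interval.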

Theorem 7.58. If I ⊆ ℝ is an interval (of one of the 8 types listed in (4.8)) and f : I ⟶ ℝ is continuous, then f(I) is also an interval (it can degenerate to a single point if f is constant). More precisely, if ∅ ≠ I = [a, b] is a compact interval, then ∅ ≠ f(I) = [min f(I), max f(I)]; if I is not a compact interval, then one of the following 9 cases occurs:

\begin{align*}
f(I) &= \mathbb{R}, \tag{7.41a} \\
f(I) &= \,]-\infty, \sup f(I)], \tag{7.41b} \\
f(I) &= \,]-\infty, \sup f(I)[, \tag{7.41c} \\
f(I) &= [\inf f(I), \infty[, \tag{7.41d} \\
f(I) &= [\inf f(I), \sup f(I)], \tag{7.41e} \\
f(I) &= [\inf f(I), \sup f(I)[, \tag{7.41f} \\
f(I) &= \,]\inf f(I), \infty[, \tag{7.41g} \\
f(I) &= \,]\inf f(I), \sup f(I)], \tag{7.41h} \\
f(I) &= \,]\inf f(I), \sup f(I)[. \tag{7.41i}
\end{align*}

Proof. If I is a compact interval, then we merely combine Th. 7.54 with Th. 7.57. Otherwise, let η ∈ f(I). If f(I) has an upper bound, then Th. 7.57 implies [η, sup f(I)[ ⊆ f(I) and f(I) ∩ [η, ∞[ ⊆ [η, sup f(I)]. If f(I) does not have an upper bound, then Th. 7.57 implies f(I) ∩ [η, ∞[ = [η, ∞[. Analogously, one obtains f(I) ∩ ]−∞, η] = ]−∞, η] or f(I) ∩ ]−∞, η] = [inf f(I), η] or f(I) ∩ ]−∞, η] = ]inf f(I), η], showing that there are precisely the 9 possibilities of (7.41) for f(I) = (f(I) ∩ ]−∞, η]) ∪ (f(I) ∩ [η, ∞[). ∎

The above results will have striking consequences in the following Sec. 7.2.5.

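Example 7.59. The piecewise affine function

\[ f : ]0, 1] \longrightarrow \mathbb{R}, \quad f(x) := \begin{cases} (-1)^n \cdot n - \dfrac{2n - 1}{\frac{1}{n-1} - \frac{1}{n}} \left(x - \dfrac{1}{n}\right) & \text{for } x \in \left[\frac{1}{n}, \frac{1}{n-1}\right],\ n \text{ even}, \\[2ex] (-1)^n \cdot n + \dfrac{2n - 1}{\frac{1}{n-1} - \frac{1}{n}} \left(x - \dfrac{1}{n}\right) & \text{for } x \in \left[\frac{1}{n}, \frac{1}{n-1}\right],\ n \ge 3 \text{ odd}, \end{cases} \]

satisfies $f(1/n) = (-1)^n \cdot n$ for each $n \in \mathbb{N}$ and is an example of a continuous function on the bounded half-open interval $I := ]0, 1]$ with $f(I) = \mathbb{R}$.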
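The function of Ex. 7.59 can be probed numerically. The following Python sketch does not use the displayed slope formula; it assumes only the node values $f(1/n) = (-1)^n n$ and interpolates linearly between neighboring nodes, which describes the same piecewise affine function. The name `f` and the use of exact `Fraction` arithmetic are choices made for this illustration.

```python
from fractions import Fraction
from math import ceil


def f(x: Fraction) -> Fraction:
    """Piecewise affine function on ]0, 1] with nodes f(1/n) = (-1)^n * n."""
    if not 0 < x <= 1:
        raise ValueError("x must lie in ]0, 1]")
    # Pick n >= 2 with 1/n <= x <= 1/(n-1); x = 1 falls into the piece n = 2.
    n = max(2, ceil(1 / x))
    left, right = Fraction(1, n), Fraction(1, n - 1)
    y_left = (-1) ** n * n                # prescribed value at 1/n
    y_right = (-1) ** (n - 1) * (n - 1)   # prescribed value at 1/(n-1)
    # Linear interpolation between the two neighboring nodes.
    return y_left + (y_right - y_left) * (x - left) / (right - left)
```

Evaluating `f` at the nodes $1/n$ shows values of alternating sign and growing absolute value, which, together with the intermediate value theorem, is what forces $f(I) = \mathbb{R}$.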

7.2.5 Inverse Functions, Existence of Roots, Exponential Function, Logarithm

Theorem 7.60. Let $I \subseteq \mathbb{R}$ be an interval (of one of the 8 types listed in (4.8)). If $f : I \longrightarrow \mathbb{R}$ is strictly increasing (resp. decreasing), then $f$ has an inverse function $f^{-1}$ defined on $J := f(I)$, i.e. $f^{-1} : J \longrightarrow I$, and $f^{-1}$ is continuous and strictly increasing (resp. decreasing). If $f$ is also continuous, then $J$ must be an interval.

Proof. From Prop. 2.32(b), we know $f : I \longrightarrow \mathbb{R}$ is one-to-one. Then $f : I \longrightarrow f(I)$ is invertible and Prop. 2.32(c) shows $f^{-1}$ is strictly monotone in the same sense as $f$. We need to prove the continuity of $f^{-1}$. We assume $f$ to be strictly increasing (the case where $f$ is strictly decreasing then follows by considering $-f$). Let $\eta \in J$, $\epsilon > 0$, and $\xi \in I$ with $f(\xi) = \eta$. If $I = \{\xi\}$, then $J = \{\eta\}$, and there is nothing to prove. It remains to consider three cases:

(a) $\xi = \min I$, i.e. $\xi$ is the left endpoint of $I$ (and $\xi \neq \max I$),

(b) $\xi = \max I$, i.e. $\xi$ is the right endpoint of $I$ (and $\xi \neq \min I$),

(c) $\xi$ is neither the min nor the max of $I$.

We carry out the proof for (c) and leave the (very similar) cases (a) and (b) as exercises. In case (c), there are points $\xi_1, \xi_2 \in B_\epsilon(\xi) \cap I$ such that

\[ \xi - \epsilon < \xi_1 < \xi < \xi_2 < \xi + \epsilon. \tag{7.42} \]

As $f$ is strictly increasing, this implies

\[ f(\xi_1) < \eta < f(\xi_2). \]

Choose $\delta > 0$ such that

\[ f(\xi_1) < \eta - \delta < \eta < \eta + \delta < f(\xi_2). \]

Then

\[ \forall_{y \in J \cap B_\delta(\eta)} \Bigl( f(\xi_1) < y < f(\xi_2) \ \overset{f^{-1} \text{ str. inc.}}{\Rightarrow}\ \xi_1 < f^{-1}(y) < \xi_2 \ \overset{(7.42)}{\Rightarrow}\ f^{-1}(y) \in B_\epsilon(\xi) \Bigr), \]

proving the continuity of $f^{-1}$ in $\eta$. That $J$ must be an interval if $f$ is continuous was already shown in Th. 7.58. □

Remark and Definition 7.61 (Roots). We are now in a position to fulfill the promise made in Def. and Rem. 5.6, i.e. to prove the existence of unique roots for nonnegative real numbers: For each $n \in \mathbb{N}$, the function $f : \mathbb{R}_0^+ \longrightarrow \mathbb{R}$, $f(x) := x^n$, is continuous and strictly increasing with $J := f(\mathbb{R}_0^+) = \mathbb{R}_0^+$. Then Th. 7.60 implies the existence of a continuous and strictly increasing inverse function $f^{-1} : \mathbb{R}_0^+ \longrightarrow \mathbb{R}_0^+$. For each $x \in \mathbb{R}_0^+$, we call $f^{-1}(x)$ the $n$th root of $x$ and write $\sqrt[n]{x} := x^{1/n} := f^{-1}(x)$. Then $(\sqrt[n]{x})^n = (x^{1/n})^n = x$ is immediate from the definition. Caveat: By definition, roots are always nonnegative and they are only defined for nonnegative numbers (when studying complex numbers and $\mathbb{C}$-valued functions more deeply in the field of Complex Analysis, one typically extends the notion of root, but we will not pursue this route in this class). As anticipated in Def. and Rem. 5.6, one also writes $\sqrt{x}$ instead of $\sqrt[2]{x}$ and calls $\sqrt{x}$ the square root of $x$.

Remark and Definition 7.62. It turns out that $\sqrt{2}$ (like many other roots) is not a rational number, i.e. $\sqrt{2} \notin \mathbb{Q}$. This is easily proved by contradiction: If $\sqrt{2} \in \mathbb{Q}$, then there exist natural numbers $m, n \in \mathbb{N}$ such that $\sqrt{2} = m/n$. Moreover, by canceling possible factors of 2, we may assume at least one of the numbers $m, n$ is odd. Now $\sqrt{2} = m/n$ implies $m^2 = 2n^2$, i.e. $m^2$ and, thus, $m$ must be even. In consequence, there exists $p \in \mathbb{N}$ such that $m = 2p$, implying $2n^2 = m^2 = 4p^2$ and $n^2 = 2p^2$. Thus $n^2$ and $n$ must also be even, in contradiction to $m, n$ not both being even.

The elements of $\mathbb{R} \setminus \mathbb{Q}$ are called irrational numbers. It turns out that most real numbers are irrational numbers: one can show that $\mathbb{Q}$ is countable, whereas $\mathbb{R} \setminus \mathbb{Q}$ is not countable (actually, every interval contains countably many rational and uncountably many irrational numbers, see Appendix F, in particular, Th. F.1(c) and Cor. F.4).

Theorem 7.63 (Inequality Between the Arithmetic Mean and the Geometric Mean). If $n \in \mathbb{N}$ and $x_1, \dots, x_n \in \mathbb{R}_0^+$, then

\[ \sqrt[n]{x_1 \cdots x_n} \le \frac{x_1 + \dots + x_n}{n}, \tag{7.43} \]

where the left-hand side is called the geometric mean and the right-hand side is called the arithmetic mean of the numbers $x_1, \dots, x_n$. Equality occurs if, and only if, $x_1 = \dots = x_n$.

Proof. If at least one of the $x_j$ is 0, then (7.43) becomes the true statement $0 \le \frac{x_1 + \dots + x_n}{n}$ with strict inequality if at least one $x_j > 0$. If $x_1 = \dots = x_n = x$, then (7.43) also holds since both sides are equal to $x$. Thus, for the remainder of the proof, we assume all $x_j > 0$ and not all $x_j$ are equal. First, we consider the special case where $\frac{x_1 + \dots + x_n}{n} = 1$. Since not all $x_j$ are equal, there exists $k$ with $x_k \neq 1$. We prove (7.43) by induction for $n \in \{2, 3, \dots\}$ in the form

\[ \left( \sum_{j=1}^n x_j = n \ \wedge\ \exists_{k \in \{1, \dots, n\}}\ x_k \neq 1 \right) \ \Rightarrow\ \prod_{j=1}^n x_j < 1. \]

Base Case ($n = 2$): Since $x_1 + x_2 = 2$, $0 < x_1, x_2$, and not both $x_1$ and $x_2$ are equal to 1, there is, after possibly renaming $x_1$ and $x_2$, some $\epsilon > 0$ such that $x_1 = 1 + \epsilon$ and $x_2 = 1 - \epsilon$, i.e. $x_1 x_2 = 1 - \epsilon^2 < 1$, which establishes the base case.

Induction Step: We now have $n \ge 2$ and $0 < x_1, \dots, x_{n+1}$ with $\sum_{j=1}^{n+1} x_j = n + 1$ plus the existence of $k, l \in \{1, \dots, n+1\}$ such that $x_k = 1 + \alpha$, $x_l = 1 - \beta$ with $\alpha, \beta > 0$. Then define $y := x_k + x_l - 1 = 1 + \alpha - \beta$. One observes $y > 0$ (since $\beta < 1$) and

\[ y + \sum_{\substack{j=1 \\ j \neq k, l}}^{n+1} x_j = -1 + \sum_{j=1}^{n+1} x_j = n \ \overset{\text{ind. hyp.}}{\Rightarrow}\ y \prod_{\substack{j=1 \\ j \neq k, l}}^{n+1} x_j \le 1 \]

(we cannot exclude equality as $y$ and all the remaining $x_j$ might be equal to 1). Since $x_k x_l = (1 + \alpha)(1 - \beta) = 1 + \alpha - \beta - \alpha\beta = y - \alpha\beta < y$, we now infer $\prod_{j=1}^{n+1} x_j < 1$, concluding the induction proof.

It remains to consider the case $\frac{x_1 + \dots + x_n}{n} = \lambda > 0$, not all $x_j$ equal. One estimates

\[ \sqrt[n]{x_1 \cdots x_n} = \lambda \sqrt[n]{\frac{x_1}{\lambda} \cdots \frac{x_n}{\lambda}} \ \overset{\text{special case}}{<}\ \lambda\, \frac{x_1 + \dots + x_n}{\lambda n} = \frac{x_1 + \dots + x_n}{n}, \]

completing the proof of the theorem. □
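The inequality (7.43) is easy to test on concrete data. The following Python sketch computes both means in floating point; the helper name `means` is hypothetical, and rounding means that the equality case is only reproduced approximately.

```python
import math


def means(xs):
    """Return (geometric mean, arithmetic mean) of nonnegative numbers xs."""
    if not xs or any(x < 0 for x in xs):
        raise ValueError("need a nonempty list of nonnegative numbers")
    n = len(xs)
    geometric = math.prod(xs) ** (1.0 / n)   # n-th root of the product
    arithmetic = sum(xs) / n
    return geometric, arithmetic
```

For instance, `means([1.0, 2.0, 3.0, 4.0])` returns a geometric mean strictly below the arithmetic mean 2.5, while for equal entries both values agree up to rounding.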

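Corollary 7.64. For each $a \in \mathbb{R}_0^+ \setminus \{1\}$, $n \in \{2, 3, \dots\}$, $p \in \{1, \dots, n-1\}$:

\[ \sqrt[n]{a^p} < 1 + \frac{p}{n}(a - 1); \qquad p = 1 \text{ yields } \sqrt[n]{a} < 1 + \frac{a - 1}{n}. \tag{7.44} \]

Proof. The simple application

\[ \sqrt[n]{a^p} = \sqrt[n]{a^p \cdot \prod_{j=1}^{n-p} 1} \ \overset{\text{Th. 7.63}}{<}\ \frac{pa + n - p}{n} = 1 + \frac{p}{n}(a - 1) \tag{7.45} \]

of Th. 7.63 establishes the case. □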

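Example 7.65. We use (7.43) to show

\[ \lim_{n \to \infty} \sqrt[n]{n} = 1. \tag{7.46} \]

First note that $0 < x \le 1$ implies $0 < x^n \le 1$; hence $(\sqrt[n]{n})^n = n > 1$ yields $\sqrt[n]{n} > 1$ for each $n > 1$. Now write $n$ as the product of $n$ factors $n = \sqrt{n} \sqrt{n} \cdot \prod_{k=1}^{n-2} 1$. Then, for $n > 1$,

\[ \sqrt[n]{n} = \sqrt[n]{\sqrt{n} \sqrt{n} \cdot \prod_{k=1}^{n-2} 1} \ \overset{\text{Th. 7.63}}{\le}\ \frac{2\sqrt{n} + n - 2}{n} < 1 + \frac{2}{\sqrt{n}}. \tag{7.47} \]

It is an exercise to show

\[ \lim_{n \to \infty} \frac{1}{\sqrt{n}} = 0. \tag{7.48} \]

Now this together with $1 \le \sqrt[n]{n} \le 1 + \frac{2}{\sqrt{n}}$ and the Sandwich Th. 7.16 proves (7.46).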
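The sandwich used in Ex. 7.65 can be observed numerically. The following Python sketch returns the $n$th root of $n$ together with the upper bound $1 + 2/\sqrt{n}$; the function name is hypothetical and the values are floating-point approximations.

```python
import math


def nth_root_with_bound(n: int):
    """Return (n**(1/n), 1 + 2/sqrt(n)) for an integer n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    root = n ** (1.0 / n)                  # approximates the n-th root of n
    upper = 1.0 + 2.0 / math.sqrt(n)       # upper bound from the sandwich
    return root, upper
```

Both returned values approach 1 as $n$ grows, with the root always strictly between 1 and the bound.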

Example 7.66 (Euler's Number). We use Th. 7.63 to prove the limit

\[ e := \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n \tag{7.49} \]

exists. It is known as Euler's number. One can show it is an irrational number (see Appendix H.1) and its first digits are $e = 2.71828\dots$ It is of exceptional importance for analysis and mathematics in general, as it pops up in all kinds of different mathematical contexts. From Th. 7.63, we obtain

\[ \forall_{n \in \mathbb{N}}\ \forall_{x \in [-n, \infty[,\ x \neq 0} \quad \left(1 + \frac{x}{n}\right)^n = 1 \cdot \left(1 + \frac{x}{n}\right)^n < \left(1 + \frac{x}{n+1}\right)^{n+1}, \tag{7.50} \]

where we have used that, on both sides of the inequality in (7.50), there are $n + 1$ factors having the same sum, namely $n + 1 + x$; and the inequality in (7.43) is strict, unless all factors are equal. We now apply (7.50) to the sequences $(a_n)_{n \in \mathbb{N}}$, $(b_n)_{n \in \mathbb{N}}$, $(c_n)_{n \in \mathbb{N}}$, where

\[ \forall_{n \in \mathbb{N}} \quad a_n := \left(1 + \frac{1}{n}\right)^n, \quad b_n := \left(1 - \frac{1}{n}\right)^n, \quad c_n := b_{n+1}^{-1} = \left(\left(1 - \frac{1}{n+1}\right)^{-1}\right)^{n+1} = \left(1 + \frac{1}{n}\right)^{n+1}: \tag{7.51} \]

Applying (7.50) with $x = 1$ and $x = -1$, respectively, yields that $(a_n)_{n \in \mathbb{N}}$ and $(b_n)_{n \in \mathbb{N}}$ are strictly increasing, and $(c_n)_{n \in \mathbb{N}}$ is strictly decreasing. On the other hand, $a_n < c_n$ holds for each $n \in \mathbb{N}$, showing $(a_n)_{n \in \mathbb{N}}$ is bounded from above by $c_1$, and $(c_n)_{n \in \mathbb{N}}$ is bounded from below by $a_1$. In particular, Th. 7.19 implies the convergence of both $(a_n)_{n \in \mathbb{N}}$ and $(c_n)_{n \in \mathbb{N}}$. Moreover, $\lim_{n \to \infty} c_n = \lim_{n \to \infty} \bigl(a_n (1 + 1/n)\bigr) = e \cdot 1 = e$, which, together with $a_n < e < c_n$ for each $n \in \mathbb{N}$, can be used to compute $e$ to an arbitrary precision.
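The enclosure $a_n < e < c_n$ can be made concrete with exact rational arithmetic. The following Python sketch computes $a_n$ and $c_n$ as fractions; the function name `euler_bracket` is hypothetical, and the comparison with `math.e` in the usage note relies on the floating-point constant being accurate enough for the chosen $n$.

```python
from fractions import Fraction
import math


def euler_bracket(n: int):
    """Return the exact pair (a_n, c_n) = ((1+1/n)^n, (1+1/n)^(n+1))."""
    if n < 1:
        raise ValueError("n must be a natural number")
    base = Fraction(n + 1, n)      # 1 + 1/n as an exact fraction
    a_n = base ** n                # strictly increasing lower bound for e
    c_n = base ** (n + 1)          # strictly decreasing upper bound for e
    return a_n, c_n
```

Since $c_n - a_n = a_n / n$, the width of the enclosure shrinks only like $1/n$; for $n = 1000$, the pair brackets `math.e` with a gap of roughly $0.003$, so this is a slow way to compute digits of $e$.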

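Definition 7.67. Let $A \subseteq \mathbb{R}$ be a subset of the real numbers. Then $A$ is called dense in $\mathbb{R}$ if, and only if, every $\epsilon$-neighborhood of every real number contains a point from $A$, i.e. if, and only if,

\[ \forall_{x \in \mathbb{R}}\ \forall_{\epsilon \in \mathbb{R}^+} \quad A \cap B_\epsilon(x) \neq \emptyset. \]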

Theorem 7.68. (a) Q is dense in R.

(b) R \Q is dense in R.

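(c) For each $x \in \mathbb{R}$, there exist sequences $(r_n)_{n \in \mathbb{N}}$ and $(s_n)_{n \in \mathbb{N}}$ in the rational numbers $\mathbb{Q}$ such that $x = \lim_{n \to \infty} r_n = \lim_{n \to \infty} s_n$, $(r_n)_{n \in \mathbb{N}}$ is strictly increasing and $(s_n)_{n \in \mathbb{N}}$ is strictly decreasing.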

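Proof. (a): Since each $B_\epsilon(x)$ is an interval, it suffices to prove that every interval $]a, b[$, $a < b$, contains a rational number. If $0 \in ]a, b[$, then there is nothing to prove. Suppose $0 \le a < b$ and let $\delta := b - a > 0$. Choose $n \in \mathbb{N}$ such that $1/n < \delta$ and let

\[ q := \max\left\{ \frac{k}{n} : k \in \mathbb{N} \wedge \frac{k}{n} < b \right\}. \]

Then $q \in \mathbb{Q}$ and $a < q < b$, since $q + 1/n \ge b$ implies $q \ge b - 1/n > b - \delta = a$. Now suppose $a < b \le 0$ and, again, let $\delta := b - a > 0$. Choose $n \in \mathbb{N}$ such that $1/n < \delta$ and let

\[ q := \min\left\{ -\frac{k}{n} : k \in \mathbb{N} \wedge -\frac{k}{n} > a \right\}. \]

Then, once again, $q \in \mathbb{Q}$ and $a < q < b$.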

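(b): Analogous to (a), we show that every interval $]a, b[$, $a < b$, contains an irrational number: According to (a), we choose $q \in \mathbb{Q} \cap ]a, b[$, $\delta := b - q > 0$ and $n \in \mathbb{N}$ such that $\sqrt{2}/n < \delta$. Then $a < \lambda := q + \sqrt{2}/n < b$ and also $\lambda \in \mathbb{R} \setminus \mathbb{Q}$ (otherwise, $\sqrt{2} = n(\lambda - q) \in \mathbb{Q}$).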

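(c): Using (a), for each $n \in \mathbb{N}$, we choose rational numbers $r_n$ and $s_n$ such that

\[ r_n \in \left] x - \frac{1}{n},\ x - \frac{1}{n+1} \right[, \qquad s_n \in \left] x + \frac{1}{n+1},\ x + \frac{1}{n} \right[. \]

Then, clearly, $(r_n)_{n \in \mathbb{N}}$ is strictly increasing, $(s_n)_{n \in \mathbb{N}}$ is strictly decreasing, and the Sandwich Th. 7.16 implies $x = \lim_{n \to \infty} r_n = \lim_{n \to \infty} s_n$. □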
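The idea behind Th. 7.68(a) is constructive: once $1/n$ is smaller than the length of the interval, some multiple of $1/n$ must land inside it. The following Python sketch is a variant of that argument, not the construction of the proof itself: it lets $k$ range over all integers and takes the smallest $k$ with $k/n > a$, which avoids the case distinction by sign. The function name is hypothetical; with floating-point inputs the result is only as reliable as the rounding of `a` and `b`.

```python
from fractions import Fraction
import math


def rational_between(a, b) -> Fraction:
    """Return a rational q with a < q < b (a, b of type Fraction, int, or float)."""
    if not a < b:
        raise ValueError("need a < b")
    delta = b - a
    n = math.floor(1 / delta) + 1   # n > 1/delta, i.e. 1/n < delta
    k = math.floor(a * n) + 1       # smallest integer k with k/n > a
    # k/n <= a + 1/n < a + delta = b, so k/n lies in ]a, b[.
    return Fraction(k, n)
```

For example, `rational_between(Fraction(1, 3), Fraction(1, 2))` uses $n = 7$ and returns $3/7$.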

Definition and Remark 7.69 (Exponentiation). We have already used that Not. C.5 of the Appendix defines $a^x$ for $(a, x) \in \mathbb{C} \times \mathbb{N}_0$ and for $(a, x) \in (\mathbb{C} \setminus \{0\}) \times \mathbb{Z}$ (using $G = \mathbb{C}$ and $G = \mathbb{C} \setminus \{0\}$, respectively). We will now extend the definition to $(a, x) \in \mathbb{R}^+ \times \mathbb{R}$ (later, we will further extend the definition to $(a, z) \in \mathbb{R}^+ \times \mathbb{C}$). The present extension to $(a, x) \in \mathbb{R}^+ \times \mathbb{R}$ is accomplished in two steps: first, in (a), for rational $x$, then, in (b), for irrational $x$.

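(a) For rational $x = k/n$ with $k \in \mathbb{Z}$ and $n \in \mathbb{N}$, define

\[ a^x := a^{\frac{k}{n}} := \sqrt[n]{a^k}. \tag{7.52} \]

For this definition to make sense, we have to check it does not depend on the special representation of $x$, i.e., we have to verify that $x = \frac{k}{n} = \frac{km}{nm}$ with $k \in \mathbb{Z}$ and $m, n \in \mathbb{N}$ implies $a^{\frac{k}{n}} = a^{\frac{km}{nm}}$. To this end, observe, using Rem. and Def. 7.61,

\[ \bigl(a^{\frac{k}{n}}\bigr)^{nm} = \bigl(\sqrt[n]{a^k}\bigr)^{nm} = a^{km} \quad \text{and} \quad \bigl(a^{\frac{km}{nm}}\bigr)^{nm} = \bigl(\sqrt[nm]{a^{km}}\bigr)^{nm} = a^{km}, \tag{7.53} \]

proving $a^{\frac{k}{n}} = a^{\frac{km}{nm}}$ (here, as in Rem. and Def. 7.61, we used that $\lambda \mapsto \lambda^N$ is one-to-one on $\mathbb{R}_0^+$ for each $N \in \mathbb{N}$). The exponentiation rules of Th. C.6 now extend to rational exponents in a natural way, i.e., for each $a, b > 0$ and each $x, y \in \mathbb{Q}$:

\begin{align*}
a^{x+y} &= a^x a^y, \tag{7.54a}\\
(a^x)^y &= a^{xy}, \tag{7.54b}\\
a^x b^x &= (ab)^x. \tag{7.54c}
\end{align*}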

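For the proof, by possibly multiplying numerator and denominator by some natural number, we can assume $x = k/n$ and $y = l/n$ with $k, l \in \mathbb{Z}$ and $n \in \mathbb{N}$. Then

\[ (a^{x+y})^n = \bigl(a^{\frac{k+l}{n}}\bigr)^n = a^{k+l} \overset{\text{Th. C.6(a)}}{=} a^k a^l = \bigl(a^{\frac{k}{n}}\bigr)^n \bigl(a^{\frac{l}{n}}\bigr)^n \overset{\text{Th. C.6(c)}}{=} (a^x a^y)^n, \]

proving (7.54a);

\[ \bigl((a^x)^y\bigr)^{n^2} \overset{\text{Th. C.6(b)}}{=} \Bigl(\bigl((a^x)^{\frac{l}{n}}\bigr)^n\Bigr)^n = \bigl((a^{\frac{k}{n}})^l\bigr)^n \overset{\text{Th. C.6(b)}}{=} \bigl((a^{\frac{k}{n}})^n\bigr)^l \overset{\text{Th. C.6(b)}}{=} a^{kl} = \bigl(a^{\frac{kl}{n^2}}\bigr)^{n^2} = (a^{xy})^{n^2}, \]

proving (7.54b);

\[ (a^x b^x)^n \overset{\text{Th. C.6(c)}}{=} \bigl(a^{\frac{k}{n}}\bigr)^n \bigl(b^{\frac{k}{n}}\bigr)^n = a^k b^k \overset{\text{Th. C.6(c)}}{=} (ab)^k = (ab)^{\frac{k}{n} \cdot n} \overset{\text{Th. C.6(b)}}{=} \bigl((ab)^x\bigr)^n, \]

proving (7.54c).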

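Moreover, we obtain the following monotonicity rules for each $a, b \in \mathbb{R}^+$ and each $x, y \in \mathbb{Q}$:

\begin{align*}
&\forall_{x > 0}\ \bigl( a < b \Rightarrow a^x < b^x \bigr), \tag{7.55a}\\
&\forall_{x < 0}\ \bigl( a < b \Rightarrow a^x > b^x \bigr), \tag{7.55b}\\
&\forall_{a > 1}\ \bigl( x < y \Rightarrow a^x < a^y \bigr), \tag{7.55c}\\
&\forall_{0 < a < 1}\ \bigl( x < y \Rightarrow a^x > a^y \bigr). \tag{7.55d}
\end{align*}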

If $x = k/n$ with $k, n \in \mathbb{N}$ and $a < b$, then $a^{1/n} < b^{1/n}$ according to Rem. and Def. 7.61, which, in turn, implies $a^x = (a^{1/n})^k < (b^{1/n})^k = b^x$, proving (7.55a); and $a^{-1} > b^{-1}$ implies $a^{-x} = (a^{-1})^x > (b^{-1})^x = b^{-x}$, proving (7.55b). If $x < y$, set $q := y - x > 0$. Then $1 < a$ and (7.55a) imply $1 = 1^q < a^q$, i.e. $a^x < a^x a^q = a^y$, proving (7.55c). Similarly, $0 < a < 1$ and (7.55a) imply $a^q < 1^q = 1$, i.e. $a^y = a^x a^q < a^x$, proving (7.55d).

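The following estimates will also come in handy: For $a \in \mathbb{R}^+$ and $x, y \in \mathbb{Q}$:

\[ a > 1 \wedge x > 0 \ \Rightarrow\ a^x - 1 < x \cdot a^{x+1}, \tag{7.56} \]

\[ \forall_{m \in \mathbb{N}}\ \Bigl( x, y \in [-m, m] \ \Rightarrow\ |a^x - a^y| \le L\, |x - y|, \quad \text{where } L := \max\{a^{m+1}, (1/a)^{m+1}\} \Bigr). \tag{7.57} \]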

For $x \ge 1$, (7.56) is proved by $a^x < a^{x+1} < x \cdot a^{x+1} + 1$; for $x < 1$, write $x = p/n$ with $p, n \in \mathbb{N}$ and $p < n$, and apply (7.44) to obtain $a^x < 1 + x(a - 1) < 1 + xa < 1 + x \cdot a^{x+1}$. For the proof of (7.57), first consider $a > 1$. Moreover, by possibly renaming $x$ and $y$, we may assume $x < y$, i.e. $z := y - x > 0$. Thus, (7.56) holds with $x$ replaced by $z$. Multiplying the resulting inequality by $a^x$ yields

\[ a^x a^z - a^x = a^y - a^x < z \cdot a^x a^{z+1} = (y - x)\, a^{y+1} \le (y - x)\, a^{m+1}, \]

proving (7.57) for $a > 1$. For $a = 1$, it is clearly true, and for $a < 1$, it is $a^{-1} > 1$, i.e.

\[ |a^x - a^y| = \bigl|(a^{-1})^{-x} - (a^{-1})^{-y}\bigr| \le |y - x|\, (a^{-1})^{m+1}, \]

finishing the proof of (7.57).

(b) We now define $a^x$ for irrational $x$ by letting

\[ a^x := \lim_{n \to \infty} a^{q_n}, \quad \text{where } (q_n)_{n \in \mathbb{N}} \text{ is a sequence in } \mathbb{Q} \text{ with } \lim_{n \to \infty} q_n = x. \tag{7.58} \]

For this definition to make sense, we have to know such sequences $(q_n)_{n \in \mathbb{N}}$ exist, which we do know from Th. 7.68(c). We also know from Th. 7.68(c) that there exists an increasing sequence $(q_n)_{n \in \mathbb{N}}$ in $\mathbb{Q}$ converging to $x$, in particular, bounded by $x$. Then, by (7.55c) and (7.55d), respectively, $(a^{q_n})_{n \in \mathbb{N}}$ is increasing for $a > 1$ and decreasing for $0 < a < 1$. Moreover, the sequence is bounded from above by $a^N$ with $N \in \mathbb{N}$, $N > x$, for $a > 1$; and bounded from below by 0 for $0 < a < 1$. In both cases, Th. 7.19 implies convergence of the sequence to some limit that we may call $a^x$. However, we still need to verify that, for each sequence $(r_n)_{n \in \mathbb{N}}$ in $\mathbb{Q}$ with $\lim_{n \to \infty} r_n = x$, the sequence $(a^{r_n})_{n \in \mathbb{N}}$ converges to the same limit $a^x$ in $\mathbb{R}$. If $\lim_{n \to \infty} r_n = x$, then $\lim_{n \to \infty} |q_n - r_n| = 0$. Since $(r_n)_{n \in \mathbb{N}}$ and $(q_n)_{n \in \mathbb{N}}$ are bounded, (7.57) implies

\[ \exists_{L \in \mathbb{R}^+}\ \forall_{n \in \mathbb{N}} \quad |a^{q_n} - a^{r_n}| \le L\, |q_n - r_n|, \tag{7.59} \]

such that Prop. 7.11(a) implies $\lim_{n \to \infty} |a^{q_n} - a^{r_n}| = 0$ and

\[ \lim_{n \to \infty} a^{r_n} = \lim_{n \to \infty} (a^{r_n} - a^{q_n} + a^{q_n}) = 0 + a^x = a^x, \tag{7.60} \]

showing (7.58) does not depend on the chosen sequence.

Proposition 7.70. The exponentiation rules (7.54), the monotonicity rules (7.55), and the estimates (7.56) and (7.57) remain valid if $x, y \in \mathbb{Q}$ is replaced by $x, y \in \mathbb{R}$. Moreover, for each $a > 0$ and each sequence $(x_n)_{n \in \mathbb{N}}$ in $\mathbb{R}$:

\[ \lim_{n \to \infty} x_n = x \in \mathbb{R} \ \Rightarrow\ \lim_{n \to \infty} a^{x_n} = a^x. \tag{7.61} \]

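Proof. Given $x, y \in \mathbb{R}$, let $(p_n)_{n \in \mathbb{N}}$ and $(q_n)_{n \in \mathbb{N}}$ be sequences in $\mathbb{Q}$ such that $\lim_{n \to \infty} p_n = x$ and $\lim_{n \to \infty} q_n = y$.

We start by verifying (7.57). As we can assume $(p_n)_{n \in \mathbb{N}}$ and $(q_n)_{n \in \mathbb{N}}$ to be monotone, we may also assume $p_n, q_n \in [-m, m]$ for each $n \in \mathbb{N}$. Then the rational case of (7.57) implies

\[ \forall_{n \in \mathbb{N}} \quad |a^{p_n} - a^{q_n}| \le L\, |p_n - q_n|, \]

and Th. 7.13(c) establishes the case. Then (7.61) also follows, since

\[ 0 \le |a^{x_n} - a^x| \le L\, |x_n - x| \to 0. \]

We deal with (7.54) next. For each $a, b > 0$:

\begin{align*}
a^{x+y} &= \lim_{n \to \infty} a^{p_n + q_n} \overset{(7.54a)}{=} \lim_{n \to \infty} (a^{p_n} a^{q_n}) = a^x a^y, \\
a^x b^x &= \lim_{n \to \infty} a^{p_n} \lim_{n \to \infty} b^{p_n} = \lim_{n \to \infty} (a^{p_n} b^{p_n}) \overset{(7.54c)}{=} \lim_{n \to \infty} (ab)^{p_n} = (ab)^x, \\
\forall_{k \in \mathbb{N}} \quad (a^x)^{q_k} &= \lim_{n \to \infty} (a^{p_n})^{q_k} \overset{(7.54b)}{=} \lim_{n \to \infty} a^{p_n q_k} = a^{x q_k}, \\
\Rightarrow \quad (a^x)^y &= \lim_{n \to \infty} (a^x)^{q_n} = \lim_{n \to \infty} a^{x q_n} \overset{(7.57)}{=} a^{xy},
\end{align*}

thereby proving (7.54).

Proceeding to (7.55c), let $a > 1$ and $h > 0$. If $(q_n)_{n \in \mathbb{N}}$ is a strictly increasing sequence in $\mathbb{Q}^+$ with $\lim_{n \to \infty} q_n = h$, then $a^h = \lim_{n \to \infty} a^{q_n} > a^{q_1} > 1$. Thus, if $x < y$, let $h := y - x > 0$ to obtain $a^y = a^x a^h > a^x$, i.e. (7.55c). If $0 < a < 1$ and $x < y$, then $(1/a)^x < (1/a)^y$, yielding (7.55d). For (7.55a), consider $x > 0$ and $0 < a < b$. Then

\[ \frac{b}{a} > 1 \ \Rightarrow\ \frac{b^x}{a^x} = \left(\frac{b}{a}\right)^x > 1 \ \Rightarrow\ b^x > a^x, \]

proving (7.55a). If $x < 0$ and $0 < a < b$, then $a^x = (1/a)^{-x} > (1/b)^{-x} = b^x$, proving (7.55b).

Finally, it remains to verify (7.56). For $x \ge 1$, the proof for rational $x$ still works for irrational $x$. For $0 < x < 1$, one uses the usual sequence $(q_n)_{n \in \mathbb{N}}$ in $\mathbb{Q}$ with $\lim_{n \to \infty} q_n = x$ and obtains (recalling $a > 1$)

\[ a^x = \lim_{n \to \infty} a^{q_n} \overset{(7.44)}{\le} \lim_{n \to \infty} \bigl(1 + q_n (a - 1)\bigr) = 1 + x(a - 1) < 1 + x \cdot a^{x+1}, \]

proving (7.56). □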

Definition 7.71 (Exponential and Power Functions). (a) Each function of the form

\[ f : \mathbb{R}^+ \longrightarrow \mathbb{R}, \quad f(x) := x^\alpha, \quad \alpha \in \mathbb{R}, \tag{7.62} \]

is called a power function. For $\alpha > 0$, the power function is extended to $x = 0$ by setting $0^\alpha := 0$; for $\alpha \in \mathbb{Z}$, it is defined on $\mathbb{R} \setminus \{0\}$; for $\alpha \in \mathbb{N}_0$ even on $\mathbb{R}$.

(b) Each function of the form

\[ f : \mathbb{R} \longrightarrow \mathbb{R}^+, \quad f(x) := a^x, \quad a > 0, \tag{7.63} \]

is called a (general) exponential function. The case where $a = e$ with $e$ being Euler's number from (7.49) is of particular interest and importance. Most of the time, when referring to an exponential function, one actually means $x \mapsto e^x$. It is also common to write $\exp(x)$ instead of $e^x$.

Theorem 7.72. (a) Every power function as defined in Def. 7.71(a) is continuous on its respective domain. Moreover, for each $\alpha > 0$, it is strictly increasing on $[0, \infty[$; for each $\alpha < 0$, it is strictly decreasing on $]0, \infty[$.

(b) Every exponential function as defined in Def. 7.71(b) is continuous. Moreover, for each $a > 1$, it is strictly increasing; for each $0 < a < 1$, it is strictly decreasing.

Proof. (a): The monotonicity claims are provided by (7.55a) and (7.55b), respectively. For each $\alpha \in \mathbb{N}_0$, the power function is a polynomial, for each $\alpha \in \mathbb{Z}$, a rational function, i.e. continuity is provided by Ex. 7.40(b) and Ex. 7.40(c), respectively. For a general $\alpha \in \mathbb{R}$, the continuity proof on $\mathbb{R}^+$ will be postponed to Ex. 7.76(a) below, where it can be accomplished more easily. So it remains to show the continuity in $x = 0$ for $\alpha > 0$. However, if $(x_n)_{n \in \mathbb{N}}$ is a sequence in $\mathbb{R}^+$ with $\lim_{n \to \infty} x_n = 0$ and $k \in \mathbb{N}$ with $1/k \le \alpha$, then, at least for $n$ sufficiently large such that $x_n \le 1$, $0 < x_n^\alpha \le x_n^{1/k}$ by (7.55d). Then the continuity of $x \mapsto x^{1/k}$ implies $\lim_{n \to \infty} x_n^{1/k} = 0$ and the Sandwich Th. 7.16 implies $\lim_{n \to \infty} x_n^\alpha = 0$, proving continuity in $x = 0$.

(b): Everything has already been proved: continuity is provided by (7.61), monotonicity is provided by (7.55c) and (7.55d). □

Remark and Definition 7.73 (Logarithm). According to Th. 7.72(b), for each $a \in \mathbb{R}^+ \setminus \{1\}$, the exponential function $f : \mathbb{R} \longrightarrow \mathbb{R}^+$, $f(x) := a^x$, is continuous and strictly monotone with $f(\mathbb{R}) = \mathbb{R}^+$ (verify that the image is all of $\mathbb{R}^+$ as an exercise). Then Th. 7.60 implies the existence of a continuous and strictly monotone inverse function $f^{-1} : \mathbb{R}^+ \longrightarrow \mathbb{R}$. For each $x \in \mathbb{R}^+$, we call $f^{-1}(x)$ the logarithm of $x$ to base $a$ and write $\log_a x := f^{-1}(x)$. The most important special case is where the base is Euler's number, $a = e$. This is called the natural logarithm. Bases $a = 2$ and $a = 10$ also carry special names, binary and common logarithm, respectively. The notation is

\[ \ln x := \log_e x, \quad \operatorname{lb} x := \log_2 x, \quad \lg x := \log_{10} x, \tag{7.64} \]

however, the notation in the literature varies: one finds $\log$ used instead of $\ln$, $\operatorname{lb}$, and $\lg$; one also finds $\lg$ instead of $\operatorname{lb}$. So you always need to verify what precisely is meant by either notation.

Corollary 7.74. For each $a \in \mathbb{R}^+ \setminus \{1\}$, the logarithm function $f : \mathbb{R}^+ \longrightarrow \mathbb{R}$, $f(x) = \log_a x$, is continuous. For $a > 1$, it is strictly increasing; for $0 < a < 1$, it is strictly decreasing. □

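Theorem 7.75. One obtains the following logarithm rules:

\begin{align*}
&\forall_{a \in \mathbb{R}^+ \setminus \{1\}} & \log_a 1 &= 0, \tag{7.65a}\\
&\forall_{a \in \mathbb{R}^+ \setminus \{1\}} & \log_a a &= 1, \tag{7.65b}\\
&\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x \in \mathbb{R}^+} & a^{\log_a x} &= x, \tag{7.65c}\\
&\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x \in \mathbb{R}} & \log_a a^x &= x, \tag{7.65d}\\
&\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x, y \in \mathbb{R}^+} & \log_a(xy) &= \log_a x + \log_a y, \tag{7.65e}\\
&\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x \in \mathbb{R}^+}\ \forall_{y \in \mathbb{R}} & \log_a(x^y) &= y \log_a x, \tag{7.65f}\\
&\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x, y \in \mathbb{R}^+} & \log_a(x/y) &= \log_a x - \log_a y, \tag{7.65g}\\
&\forall_{a \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x \in \mathbb{R}^+}\ \forall_{n \in \mathbb{N}} & \log_a \sqrt[n]{x} &= \frac{1}{n} \log_a x, \tag{7.65h}\\
&\forall_{a, b \in \mathbb{R}^+ \setminus \{1\}}\ \forall_{x \in \mathbb{R}^+} & \log_b x &= (\log_b a) \log_a x. \tag{7.65i}
\end{align*}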

Proof. All the rules are easy consequences of the logarithm being defined as the inverse function to $f : \mathbb{R} \longrightarrow \mathbb{R}^+$, $f(x) := a^x$.

(7.65a): It is $\log_a 1 = f^{-1}(1) = 0$, as $f(0) = a^0 = 1$.

(7.65b): It is $\log_a a = f^{-1}(a) = 1$, as $f(1) = a^1 = a$.

(7.65c): It is $a^{\log_a x} = f(f^{-1}(x)) = x$.

(7.65d): It is $\log_a a^x = f^{-1}(f(x)) = x$.

(7.65e): It is $\log_a(xy) = f^{-1}(xy) = f^{-1}\bigl(f(\log_a x + \log_a y)\bigr) = \log_a x + \log_a y$, since

\[ f(\log_a x + \log_a y) = a^{\log_a x + \log_a y} = a^{\log_a x}\, a^{\log_a y} \overset{(7.65c)}{=} xy. \]

(7.65f): It is $\log_a(x^y) = f^{-1}(x^y) = f^{-1}\bigl(f(y \log_a x)\bigr) = y \log_a x$, since

\[ f(y \log_a x) = a^{y \log_a x} = (a^{\log_a x})^y \overset{(7.65c)}{=} x^y. \]

(7.65g) is just a combination of (7.65e) and (7.65f): $\log_a(x/y) = \log_a(x y^{-1}) = \log_a x - \log_a y$.

(7.65h) is just a special case of (7.65f): $\log_a \sqrt[n]{x} = \log_a x^{1/n} = \frac{1}{n} \log_a x$.

(7.65i): One computes

\[ (\log_b a) \log_a x \overset{(7.65f)}{=} \log_b a^{\log_a x} \overset{(7.65c)}{=} \log_b x. \]

Thus, we have verified all the rules and concluded the proof. □
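The rules (7.65) lend themselves to a quick floating-point sanity check. In the following Python sketch, the hypothetical helper `log_base` computes $\log_a x$ as $\ln x / \ln a$, which is itself an instance of (7.65i) with $b = e$; identities are therefore expected to hold only up to rounding.

```python
import math


def log_base(a: float, x: float) -> float:
    """Logarithm of x to base a, for a in R+ \\ {1} and x in R+."""
    if a <= 0 or a == 1 or x <= 0:
        raise ValueError("need a > 0, a != 1, x > 0")
    # Change of base, cf. (7.65i) with b = e: log_a x = ln x / ln a.
    return math.log(x) / math.log(a)
```

For example, `log_base(2, 8)` is approximately 3, and `log_base(0.5, 4)` is approximately $-2$, in line with the logarithm to a base $0 < a < 1$ being strictly decreasing.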

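Example 7.76. (a) For each $\alpha \in \mathbb{R}$, the power function

\[ f : \mathbb{R}^+ \longrightarrow \mathbb{R}, \quad f(x) := x^\alpha = e^{\alpha \ln x}, \tag{7.66} \]

is continuous, which follows from Th. 7.41, since $f = \exp \circ (\alpha \ln)$, $\ln$ is continuous by Cor. 7.74, and $\exp$ is continuous by Th. 7.72(b).

(b) As a consequence of Th. 7.41, each of the following functions $f_1, f_2, f_3$, where

\begin{align*}
f_1 &: \mathbb{R} \longrightarrow \mathbb{R}, & f_1(x) &:= \bigl(\exp(\lambda + x^2)\bigr)^\alpha, \\
f_2 &: \mathbb{R} \longrightarrow \mathbb{R}, & f_2(x) &:= \frac{1}{e^{\alpha x} + \lambda}, \\
f_3 &: \mathbb{R} \longrightarrow \mathbb{R}, & f_3(x) &:= \frac{x^5}{(\lambda + |x|)^\alpha},
\end{align*}

is continuous for each $\alpha \in \mathbb{R}$ and each $\lambda \in \mathbb{R}^+$.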

7.3 Series

7.3.1 Definition and Convergence

Series are a special type of sequences, namely sequences whose members arise from summing up the members of another sequence. We have, on occasion, already encountered series, for example the harmonic series $(s_n)_{n \in \mathbb{N}}$, whose members $s_n$ were defined in (7.27). In the present section, we will study series more systematically.

Definition 7.77. Given a sequence $(a_n)_{n \in \mathbb{N}}$ in $\mathbb{K}$ (or, more generally, in any set $A$, where an addition is defined), the sequence $(s_n)_{n \in \mathbb{N}}$, where

\[ \forall_{n \in \mathbb{N}} \quad s_n := \sum_{j=1}^n a_j, \tag{7.67} \]

is called an (infinite) series and is denoted by

\[ \sum_{j=1}^\infty a_j := \sum_{j \in \mathbb{N}} a_j := (s_n)_{n \in \mathbb{N}}. \tag{7.68} \]

The $a_n$ are called the summands of the series, the $s_n$ its partial sums. Moreover, each series $\sum_{j=k}^\infty a_j$ with $k \in \mathbb{N}$ is called a remainder (series) of the series $(s_n)_{n \in \mathbb{N}}$.

The example of the remainder series already shows that it is useful to allow countable index sets other than $\mathbb{N}$. Thus, if $(a_j)_{j \in I}$, where $I$ is a countable index set and $\phi : \mathbb{N} \longrightarrow I$ a bijective map, then define

\[ \sum_{j \in I} a_j := \sum_{j=1}^\infty a_{\phi(j)} \tag{7.69} \]

(compare the definition in (3.19c) for finite sums). Note that the definition depends on $\phi$, which is suppressed in the notation $\sum_{j \in I} a_j$.

—

For sequences in $\mathbb{K}$, the notion of convergence is available, and, thus, it is also available for series arising from real or complex sequences (as such series are, again, sequences in $\mathbb{K}$).

Definition 7.78. If $(s_n)_{n \in \mathbb{N}}$ is a series with the $s_n$ defined as in (7.67) and with summands $a_j \in \mathbb{K}$, then the series is called convergent with limit $s \in \mathbb{K}$ if, and only if, $\lim_{n \to \infty} s_n = s$ in the sense of (7.1). In that case, one writes

\[ \sum_{j=1}^\infty a_j = s \tag{7.70} \]

and calls $s$ the sum of the series. The series is called divergent if, and only if, it is not convergent. We write $\sum_{j=1}^\infty a_j = \infty$ (resp. $\sum_{j=1}^\infty a_j = -\infty$) if, and only if, $(s_n)_{n \in \mathbb{N}}$ diverges to $\infty$ (resp. $-\infty$) in the sense of Def. 7.18.

Caveat 7.79. One has to use care as the symbol $\sum_{j=1}^\infty a_j$ is used with two completely different meanings. If it is used according to (7.68), then it means a sequence; if it is used according to (7.70), then it means a real or complex number (or, possibly, $\infty$ or $-\infty$). It should always be clear from the context whether it means a sequence or a number. For example, in the statement "the series $\sum_{j=1}^\infty 2^{-j}$ is convergent", it means a sequence; whereas in the statement "$\sum_{j=1}^\infty 2^{-j} = 1$", it means a number.

Example 7.80. (a) For each $q \in \mathbb{C}$ with $|q| < 1$, $\sum_{j=0}^\infty q^j$ is called a geometric series. From (3.22b) (the reader is asked to go back and check that (3.22b) and its proof, indeed, remain valid for each $q \in \mathbb{C}$), we obtain the partial sums $s_n = \sum_{j=0}^n q^j = \frac{1 - q^{n+1}}{1 - q}$. Since $|q| < 1$, we know $\lim_{n \to \infty} q^{n+1} = 0$ from Ex. 7.6. Thus, the series is convergent with

\[ \forall_{|q| < 1} \quad \sum_{j=0}^\infty q^j = \lim_{n \to \infty} s_n = \lim_{n \to \infty} \frac{1 - q^{n+1}}{1 - q} = \frac{1}{1 - q}. \tag{7.71} \]

(b) In Ex. 7.30, we obtained the divergence of the harmonic series:

\[ \sum_{k=1}^\infty \frac{1}{k} = \infty. \tag{7.72} \]
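The closed form of the geometric partial sums and the limit (7.71) can be checked for a concrete complex $q$. The following Python sketch sums the terms directly; the function name is hypothetical, and the comparison with $\frac{1}{1-q}$ is a floating-point approximation.

```python
def geometric_partial_sum(q: complex, n: int) -> complex:
    """Return s_n = q^0 + q^1 + ... + q^n by direct summation."""
    total = 0
    power = 1                 # current power q^j, starting with q^0
    for _ in range(n + 1):
        total += power
        power *= q
    return total
```

For $q = 0.5 + 0.3i$, the partial sum with $n = 200$ already agrees with $\frac{1}{1-q}$ to machine precision, since $|q|^{n+1}$ is negligible.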

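Corollary 7.81. Let $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty b_j$ be convergent series in $\mathbb{C}$.

(a) Linearity:

\[ \forall_{\lambda, \mu \in \mathbb{C}} \quad \sum_{j=1}^\infty (\lambda a_j + \mu b_j) = \lambda \sum_{j=1}^\infty a_j + \mu \sum_{j=1}^\infty b_j. \tag{7.73} \]

(b) Complex Conjugation:

\[ \sum_{j=1}^\infty \overline{a_j} = \overline{\sum_{j=1}^\infty a_j}. \tag{7.74} \]

(c) Monotonicity:

\[ \Bigl( \forall_{j \in \mathbb{N}} \quad a_j, b_j \in \mathbb{R} \ \wedge\ a_j \le b_j \Bigr) \ \Rightarrow\ \sum_{j=1}^\infty a_j \le \sum_{j=1}^\infty b_j. \tag{7.75} \]

(d) Each remainder series $\sum_{j=n+1}^\infty a_j$, $n \in \mathbb{N}$, converges, and, letting $S := \sum_{j=1}^\infty a_j$, $s_n := \sum_{j=1}^n a_j$, $r_n := \sum_{j=n+1}^\infty a_j$, one has

\[ \Bigl( \forall_{n \in \mathbb{N}} \quad S = s_n + r_n \Bigr), \qquad \lim_{n \to \infty} a_n = \lim_{n \to \infty} r_n = 0. \tag{7.76} \]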

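Proof. (a) follows from the first two identities of Th. 7.13(a), (b) is due to

\[ \sum_{j=1}^\infty \overline{a_j} = \lim_{n \to \infty} \sum_{j=1}^n \overline{a_j} \overset{\text{Def. and Rem. 5.5(a)}}{=} \lim_{n \to \infty} \overline{\sum_{j=1}^n a_j} \overset{(7.11f)}{=} \overline{\lim_{n \to \infty} \sum_{j=1}^n a_j} = \overline{\sum_{j=1}^\infty a_j}, \]

(c) follows from Th. 7.13(c), and, for (d), one computes

\begin{align*}
\lim_{n \to \infty} a_n &= \lim_{n \to \infty} (s_n - s_{n-1}) = S - S = 0, \\
\forall_{n \in \mathbb{N}} \quad r_n &= \lim_{k \to \infty} \sum_{j=n+1}^k a_j = \lim_{k \to \infty} (s_k - s_n) = S - s_n, \\
\lim_{n \to \infty} r_n &= \lim_{n \to \infty} (S - s_n) = S - S = 0.
\end{align*}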


7.3.2 Convergence Criteria

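Corollary 7.82. Let $\sum_{j=1}^\infty a_j$ be a series such that all $a_j \in \mathbb{R}_0^+$. If $s_n := \sum_{j=1}^n a_j$ are the partial sums of $\sum_{j=1}^\infty a_j$, then

\[ \lim_{n \to \infty} s_n = \begin{cases} \sup\{s_n : n \in \mathbb{N}\} & \text{if } (s_n)_{n \in \mathbb{N}} \text{ is bounded}, \\ \infty & \text{if } (s_n)_{n \in \mathbb{N}} \text{ is not bounded}. \end{cases} \tag{7.77} \]

Proof. Since $(s_n)_{n \in \mathbb{N}}$ is increasing, (7.77) is a consequence of (7.21). □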

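Theorem 7.83. Let $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty b_j$ be series in $\mathbb{C}$ such that $|a_j| \le |b_j|$ holds for each $j \ge k$ for some fixed $k \in \mathbb{N}$.

(a) If $\sum_{j=1}^\infty |b_j|$ is convergent, then $\sum_{j=1}^\infty a_j$ is convergent as well, and, moreover,

\[ \left| \sum_{j=k}^\infty a_j \right| \le \sum_{j=k}^\infty |b_j|. \tag{7.78} \]

(b) If $\sum_{j=1}^\infty a_j$ is divergent, then $\sum_{j=1}^\infty |b_j|$ is divergent as well.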

Proof. Since (b) is merely the contraposition of (a), it suffices to prove (a). To this end, let $s_n := \sum_{j=1}^n a_j$ and $t_n := \sum_{j=1}^n |b_j|$ be the partial sums of $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty |b_j|$, respectively. Since $(t_n)_{n \in \mathbb{N}}$ converges, it must be a Cauchy sequence by Th. 7.29. Thus,

\[ \forall_{\epsilon \in \mathbb{R}^+}\ \exists_{N \in \mathbb{N},\, N \ge k}\ \forall_{n > m > N} \quad |t_n - t_m| = |b_{m+1}| + \dots + |b_n| < \epsilon \]

and the triangle inequality for finite sums implies

\[ \forall_{\epsilon \in \mathbb{R}^+}\ \exists_{N \in \mathbb{N},\, N \ge k}\ \forall_{n > m > N} \quad |s_n - s_m| = |a_{m+1} + \dots + a_n| \le |a_{m+1}| + \dots + |a_n| \le |b_{m+1}| + \dots + |b_n| < \epsilon, \]

showing $(s_n)_{n \in \mathbb{N}}$ is a Cauchy sequence as well, i.e. convergent by Th. 7.29. Since the triangle inequality for finite sums also implies $\bigl|\sum_{j=k}^n a_j\bigr| \le \sum_{j=k}^n |b_j|$ for each $n \ge k$, (7.78) is now a consequence of Th. 7.13(c). □

Definition 7.84. A series $\sum_{j=1}^\infty a_j$ in $\mathbb{R}$ is called alternating if, and only if, its summands alternate between positive and negative signs, i.e. if $\operatorname{sgn}(a_{j+1}) = -\operatorname{sgn}(a_j) \neq 0$ for each $j \in \mathbb{N}$.

Theorem 7.85 (Leibniz Criterion). Let $\sum_{j=1}^\infty a_j$ be an alternating series. If the sequence $(|a_n|)_{n \in \mathbb{N}}$ of absolute values is strictly decreasing and $\lim_{n \to \infty} a_n = 0$, then the series is convergent and

\[ \forall_{n \in \mathbb{N}}\ \exists_{0 < \theta_n < 1} \quad r_n := \sum_{j=n+1}^\infty a_j = \theta_n\, a_{n+1}, \tag{7.79} \]

that means the error made when approximating the limit by the partial sum $s_n$ has the same sign as the first neglected summand $a_{n+1}$, and its absolute value is less than $|a_{n+1}|$.

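Proof. We first consider the case where $a_1 > 0$, i.e. where there exists a strictly decreasing sequence of positive numbers $(b_n)_{n \in \mathbb{N}}$ such that $a_n = (-1)^{n+1} b_n$. As the $b_n$ are strictly decreasing, we obtain $b_n - b_{n+1} > 0$ for each $n \in \mathbb{N}$, such that the sequences $(u_n)_{n \in \mathbb{N}}$ and $(v_n)_{n \in \mathbb{N}}$, defined by

\begin{align*}
\forall_{n \in \mathbb{N}} \quad u_n := s_{2n} &= \sum_{j=1}^n (b_{2j-1} - b_{2j}) = (b_1 - b_2) + (b_3 - b_4) + \dots + (b_{2n-1} - b_{2n}), \\
\forall_{n \in \mathbb{N}} \quad v_n := s_{2n+1} &= b_1 - \sum_{j=1}^n (b_{2j} - b_{2j+1}) = b_1 - (b_2 - b_3) - (b_4 - b_5) - \dots - (b_{2n} - b_{2n+1}),
\end{align*}

are strictly monotone, namely $(u_n)_{n \in \mathbb{N}}$ strictly increasing and $(v_n)_{n \in \mathbb{N}}$ strictly decreasing. Since $0 < u_n < u_n + b_{2n+1} = v_n < b_1$ for each $n \in \mathbb{N}$, both sequences $(u_n)_{n \in \mathbb{N}}$ and $(v_n)_{n \in \mathbb{N}}$ are also bounded, and, thus, convergent by Th. 7.19, i.e. $U := \lim_{n \to \infty} u_n \in \mathbb{R}$ and $V := \lim_{n \to \infty} v_n \in \mathbb{R}$. Since

\[ V - U = \lim_{n \to \infty} (v_n - u_n) = \lim_{n \to \infty} (s_{2n+1} - s_{2n}) = \lim_{n \to \infty} a_{2n+1} = 0, \]

we obtain $U = V$ and $\lim_{n \to \infty} s_n = U$ and $0 < U < b_1 = a_1$. In particular, there is $\theta \in ]0, 1[$ satisfying $\sum_{j=1}^\infty a_j = \theta\, a_1$.

In the case $a_1 < 0$, the above proof yields convergence of $-\sum_{j=1}^\infty a_j = \sum_{j=1}^\infty (-a_j)$ with $\sum_{j=1}^\infty (-a_j) = \theta\, (-a_1)$ for a suitable $\theta \in ]0, 1[$. However, this then yields, as before, $\sum_{j=1}^\infty a_j = \theta\, a_1$.

Applying the above result to each remainder series $\sum_{j=n+1}^\infty a_j$, $n \in \mathbb{N}$, completes the proof of (7.79) and the theorem. □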
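The error statement (7.79) can be illustrated on the alternating harmonic series $\sum_{j=1}^\infty (-1)^{j+1}/j$. The following Python sketch assumes the well-known value $\ln 2$ for its sum, a fact not proved in the text above, and computes $\theta_n = r_n / a_{n+1}$ in floating point; the function name is hypothetical.

```python
import math


def leibniz_theta(n: int) -> float:
    """Return theta_n = r_n / a_{n+1} for a_j = (-1)^(j+1) / j."""
    # Partial sum s_n, summed with math.fsum to limit rounding error.
    s_n = math.fsum((-1) ** (j + 1) / j for j in range(1, n + 1))
    r_n = math.log(2.0) - s_n              # remainder, ASSUMING the sum is ln 2
    a_next = (-1) ** (n + 2) / (n + 1)     # first neglected summand a_{n+1}
    return r_n / a_next
```

In line with Th. 7.85, the returned values lie strictly between 0 and 1; for this particular series they are close to $1/2$ for large $n$.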

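Example 7.86. (a) Each of the following alternating series converges, as the Leibniz criterion of Th. 7.85 clearly applies in each case:

\begin{align*}
\sum_{j=1}^\infty \frac{(-1)^{j+1}}{j} &= 1 - \frac{1}{2} + \frac{1}{3} - + \dots, \tag{7.80a}\\
\sum_{j=1}^\infty \frac{(-1)^{j+1}}{2j - 1} &= 1 - \frac{1}{3} + \frac{1}{5} - + \dots, \tag{7.80b}\\
\sum_{j=1}^\infty \frac{(-1)^{j+1}}{\ln(j + 1)} &= \frac{1}{\ln 2} - \frac{1}{\ln 3} + \frac{1}{\ln 4} - + \dots \tag{7.80c}
\end{align*}

(b) To see that Th. 7.85 is false without its monotonicity requirement, take any divergent series with $\sum_{j=1}^\infty a_j = \infty$, $0 < a_j$, $\lim_{j \to \infty} a_j = 0$ (for example the harmonic series), any convergent series with $\sum_{j=1}^\infty c_j = s \in \mathbb{R}^+$ and $0 < c_j$ (for example any geometric series with $0 < q < 1$), and define

\[ d_n := \begin{cases} a_{(n+1)/2} & \text{for } n \text{ odd}, \\ -c_{n/2} & \text{for } n \text{ even}. \end{cases} \]

It is an exercise to show that $\sum_{j=1}^\infty d_j$ is an alternating series with $\lim_{n \to \infty} d_n = 0$ and $\sum_{j=1}^\infty d_j = \infty$.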

Definition 7.87. The series $\sum_{j=1}^\infty a_j$ in $\mathbb{C}$ is said to be absolutely convergent if, and only if, $\sum_{j=1}^\infty |a_j|$ is convergent.

Corollary 7.88. Every absolutely convergent series $\sum_{j=1}^\infty a_j$ is also convergent and satisfies the triangle inequality for infinite series:

\[ \left| \sum_{j=1}^\infty a_j \right| \le \sum_{j=1}^\infty |a_j|. \tag{7.81} \]

Proof. The corollary is given by the special case $a_j = b_j$ for each $j \in \mathbb{N}$ of Th. 7.83(a). □

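Theorem 7.89. We consider the series $\sum_{j=1}^\infty a_j$ in $\mathbb{C}$.

(a) If $\sum_{j=1}^\infty c_j$ is a convergent series such that $c_j \in \mathbb{R}_0^+$ and $|a_j| \le c_j$ for each $j \in \mathbb{N}$, then $\sum_{j=1}^\infty a_j$ is absolutely convergent.

(b) Root Test:

\begin{align*}
\Bigl( \exists_{0 < q < 1}\ \bigl( \sqrt[n]{|a_n|} \le q < 1 \text{ for almost all } n \in \mathbb{N} \bigr) \Bigr) \ &\Rightarrow\ \sum_{j=1}^\infty a_j \text{ is absolutely convergent}, \tag{7.82a}\\
\#\bigl\{ n \in \mathbb{N} : \sqrt[n]{|a_n|} \ge 1 \bigr\} = \infty \ &\Rightarrow\ \sum_{j=1}^\infty a_j \text{ is divergent}. \tag{7.82b}
\end{align*}

(c) Ratio Test: If all $a_n \neq 0$, then

\begin{align*}
\Bigl( \exists_{0 < q < 1}\ \Bigl( \left| \frac{a_{n+1}}{a_n} \right| \le q < 1 \text{ for almost all } n \in \mathbb{N} \Bigr) \Bigr) \ &\Rightarrow\ \sum_{j=1}^\infty a_j \text{ is absolutely convergent}, \tag{7.83a}\\
\left| \frac{a_{n+1}}{a_n} \right| \ge 1 \text{ for almost all } n \in \mathbb{N} \ &\Rightarrow\ \sum_{j=1}^\infty a_j \text{ is divergent}. \tag{7.83b}
\end{align*}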

Proof. (a) is just another special case of Th. 7.83(a).

(b): If there is $q \in ]0, 1[$ and $N \in \mathbb{N}$ such that $\sqrt[n]{|a_n|} \le q$ for each $n > N$, i.e. $|a_n| \le q^n$ for each $n > N$, then, by (7.71), $\sum_{j=1}^\infty |a_j|$ is bounded by $\frac{1}{1-q} + \sum_{j=1}^N |a_j|$ and, thus, convergent. If $\sqrt[n]{|a_n|} \ge 1$ for infinitely many $n \in \mathbb{N}$, then $|a_n| \ge 1$ for infinitely many $n \in \mathbb{N}$, showing that $(a_n)_{n \in \mathbb{N}}$ does not converge to 0, proving the divergence of $\sum_{j=1}^\infty a_j$.

(c): If there is $q \in ]0, 1[$ and $N \in \mathbb{N}$ such that $\left| \frac{a_{n+1}}{a_n} \right| \le q$ for each $n > N$, then, letting $C := |a_{N+1}|$, an induction shows $|a_{N+1+k}| \le C q^k$ for each $k \in \mathbb{N}$, i.e., by (7.71), $\sum_{j=1}^\infty |a_j|$ is bounded by $\frac{C}{1-q} + \sum_{j=1}^{N+1} |a_j|$ and, thus, convergent. If there is $N \in \mathbb{N}$ such that $\left| \frac{a_{n+1}}{a_n} \right| \ge 1$ for each $n > N$, then $|a_n| \ge |a_{N+1}| > 0$ for each $n > N$, showing $(a_n)_{n \in \mathbb{N}}$ does not converge to 0 and proving the divergence of $\sum_{j=1}^\infty a_j$. □

Caveat 7.90. In (7.82a), it does not suffice to have $\sqrt[n]{|a_n|} < 1$ to conclude convergence, and, likewise, $\left| \frac{a_{n+1}}{a_n} \right| < 1$ does not suffice in (7.83a): As a counterexample, consider the harmonic series, which does not converge, but $\sqrt[n]{1/n} < 1$ for each $n \ge 2$ and $\frac{1/(n+1)}{1/n} = \frac{n}{n+1} < 1$ for each $n \in \mathbb{N}$.

Example 7.91. (a) For each $z \in \mathbb{C}$ with $|z| < 1$ and each $p \in \mathbb{N}_0$, the series $\sum_{n=1}^\infty n^p z^n$ is absolutely convergent: We have $\lim_{n \to \infty} \sqrt[n]{n^p} = 1$ as a consequence of Ex. 7.65. This implies $\lim_{n \to \infty} \sqrt[n]{|a_n|} = \lim_{n \to \infty} \sqrt[n]{n^p |z|^n} = |z| < 1$. Thus, the root test of (7.82a) applies and proves convergence of the series.

(b) Let $z \in \mathbb{C}$. The series $\sum_{n=1}^\infty \frac{z^n\, n!}{n^n}$ is absolutely convergent for $|z| < e$ and divergent for $|z| \ge e$, where $e$ is Euler's number from (7.49). We have, for each $n \in \mathbb{N}$,

\[ \left| \frac{a_{n+1}}{a_n} \right| = \frac{|z|\, (n+1)\, n^n}{(n+1)^{n+1}} = \frac{|z|}{\left(1 + \frac{1}{n}\right)^n} \to \frac{|z|}{e} \quad \text{for } n \to \infty. \tag{7.84} \]

Thus, the ratio test of (7.83a) applies and proves absolute convergence of the series for $|z| < e$. For $|z| > e$, (7.83b) applies and proves divergence. Since, according to Ex. 7.66, $\left(1 + \frac{1}{n}\right)^n < e$ for each $n \in \mathbb{N}$, (7.83b) applies to prove divergence also for $|z| = e$.
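The ratio computation (7.84) can be reproduced numerically. The following Python sketch evaluates the summands $a_n = z^n\, n! / n^n$ directly and, separately, the simplified ratio $|z| / (1 + 1/n)^n$; both function names are hypothetical and all values are floating-point approximations.

```python
import math


def term(z: complex, n: int) -> complex:
    """Summand a_n = z^n * n! / n^n (only suitable for moderate n)."""
    return z ** n * math.factorial(n) / n ** n


def ratio(z: complex, n: int) -> float:
    """Simplified quotient |a_{n+1} / a_n| = |z| / (1 + 1/n)^n from (7.84)."""
    return abs(z) / (1.0 + 1.0 / n) ** n
```

For $|z| = e$, every value `ratio(z, n)` exceeds 1, reflecting $(1 + 1/n)^n < e$, which is exactly why (7.83b) still yields divergence on the boundary $|z| = e$.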

7.3.3 Absolute Convergence and Rearrangements

In general, one has to use care when dealing with infinite series, as convergence properties and even the limit in case of convergence can depend on the order of the summands (in obvious contrast to the situation of finite sums). For real series that are convergent, but not absolutely convergent, one has the striking Riemann rearrangement Th. 7.93, which states that one can choose an arbitrary number $S \in \mathbb{R} \cup \{-\infty, \infty\}$ and reorder the summands such that the new series converges to $S$ (actually, Th. 7.93 says even more, namely that one can prescribe an entire interval of cluster points for the rearranged series). However, the situation is better for absolutely convergent series. In Th. 7.95, we will see that the sum of absolutely convergent series does not depend on the order of the summands.

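Proposition 7.92. Let $\sum_{j=1}^\infty a_j$ be a series in $\mathbb{R}$. Defining

\[ \forall_{j \in \mathbb{N}} \quad \bigl( a_j^+ := \max\{a_j, 0\}, \quad a_j^- := \max\{-a_j, 0\} \bigr), \tag{7.85} \]

the following assertions (a) and (b) hold:

(a) $\sum_{j=1}^\infty a_j$ is absolutely convergent if, and only if, both series $\sum_{j=1}^\infty a_j^+$ and $\sum_{j=1}^\infty a_j^-$ are convergent.

(b) If $\sum_{j=1}^\infty a_j$ is convergent, but not absolutely convergent, then

\[ \sum_{j=1}^\infty a_j^+ = \sum_{j=1}^\infty a_j^- = \infty. \tag{7.86} \]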

Proof. The key observation is that (7.85) implies, for each $j \in \mathbb{N}$,

\[ a_j^+ + a_j^- = |a_j|, \tag{7.87a} \]
\[ a_j^+ - a_j^- = a_j, \tag{7.87b} \]
\[ 0 \le a_j^+,\, a_j^- \le |a_j|. \tag{7.87c} \]

(a): If $\sum_{j=1}^\infty a_j^+$ and $\sum_{j=1}^\infty a_j^-$ are convergent, then

\[ \sum_{j=1}^\infty |a_j| \overset{(7.87a),(7.73)}{=} \sum_{j=1}^\infty a_j^+ + \sum_{j=1}^\infty a_j^-, \tag{7.88} \]

and, in particular, $\sum_{j=1}^\infty a_j$ is absolutely convergent. Conversely, if $\sum_{j=1}^\infty a_j$ is absolutely convergent, then $\sum_{j=1}^\infty a_j^+$ and $\sum_{j=1}^\infty a_j^-$ are convergent by (7.87c) and Th. 7.83(a).

(b): If $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty a_j^+$ are convergent, then (7.87b) implies that $\sum_{j=1}^\infty a_j^-$ is also convergent and, thus, $\sum_{j=1}^\infty a_j$ absolutely convergent by (a). Likewise, if $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty a_j^-$ are convergent, then (7.87b) implies that $\sum_{j=1}^\infty a_j^+$ is also convergent and, once again, $\sum_{j=1}^\infty a_j$ absolutely convergent by (a). Therefore, if $\sum_{j=1}^\infty a_j$ is convergent, but not absolutely convergent, then (7.86) must hold by (7.77). □

Theorem 7.93 (Riemann Rearrangement Theorem a.k.a. Riemann Series Theorem). Let $\sum_{j=1}^\infty a_j$ be a series in $\mathbb{R}$. If $\sum_{j=1}^\infty a_j$ is convergent, but not absolutely convergent, then, given $x, y \in \mathbb{R} \cup \{-\infty, \infty\}$ with $x \le y$, there exists a rearrangement $\sum_{j=1}^\infty b_j$ of the series (i.e. a reordering $(b_j)_{j \in \mathbb{N}}$ of $(a_j)_{j \in \mathbb{N}}$) such that $\sum_{j=1}^\infty b_j$ has precisely all elements of $[x, y]$ as cluster points (where we call $-\infty$ (resp. $\infty$) a cluster point of the real sequence $(t_n)_{n \in \mathbb{N}}$ if, and only if, $\#\{n \in \mathbb{N} : t_n < -N\} = \infty$ (resp. $\#\{n \in \mathbb{N} : t_n > N\} = \infty$) for each $N \in \mathbb{N}$). In particular, choosing $S := x = y \in \mathbb{R} \cup \{-\infty, \infty\}$, one can prescribe an arbitrary limit $S$ such that $\sum_{j=1}^\infty b_j = S$.

Sketch of Proof. Here we just give a sketch of the proof to convey its fairly simple idea; a detailed proof is provided in Appendix E.1. According to Prop. 7.92(b), (7.86) must hold, where the $a_j^+$ and $a_j^-$ are as defined in (7.85). Thus, we can define

\[ \forall_{k \in \mathbb{N}} \quad x_k := \begin{cases} -k & \text{for } x = -\infty, \\ x & \text{for } x \in \mathbb{R}, \\ k & \text{for } x = \infty, \end{cases} \qquad y_k := \begin{cases} -k & \text{for } y = -\infty, \\ y & \text{for } y \in \mathbb{R}, \\ k & \text{for } y = \infty, \end{cases} \]

and, noting $x_k \le y_k$ for almost all $k \in \mathbb{N}$, alternate between adding summands $a_j^+$ until the partial sum exceeds $y_k$ and subtracting summands $a_j^-$ until the partial sum falls below $x_k$. If $k$ is sufficiently large such that $x_k \le y_k$, then, at each switching point (from adding to subtracting or vice versa), the absolute value of the difference between the last partial sum and $x_k$ or $y_k$, respectively, is less than the value of the last contributing nonzero summand. Since

\[ \lim_{j \to \infty} a_j^+ = \lim_{j \to \infty} a_j^- = 0, \]

the partial sums corresponding to the switching points converge to the respective endpoints $x$ or $y$, respectively, and precisely all points between $x$ and $y$ are cluster points. □
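The idea of the sketch can be tried out on the alternating harmonic series $\sum_{j=1}^\infty (-1)^{j+1}/j$, which is convergent, but not absolutely convergent. The following is a minimal sketch for the special case $x = y = S \in \mathbb{R}$ only; the function name `rearranged_partial_sum` and the greedy switching rule are illustrative choices, not the construction of Appendix E.1.

```python
def rearranged_partial_sum(target, num_terms):
    """Partial sum of a greedy rearrangement of sum (-1)^(j+1) / j.

    Positive summands 1/1, 1/3, 1/5, ... are used while the running sum
    is <= target; negative summands -1/2, -1/4, ... are used otherwise.
    Each summand of the original series is used at most once and in its
    original relative order within its sign class.
    """
    next_pos = 1  # denominator of the next unused positive summand
    next_neg = 2  # denominator of the next unused negative summand
    total = 0.0
    for _ in range(num_terms):
        if total <= target:
            total += 1.0 / next_pos
            next_pos += 2
        else:
            total -= 1.0 / next_neg
            next_neg += 2
    return total


approx_two = rearranged_partial_sum(2.0, 200_000)
approx_zero = rearranged_partial_sum(0.0, 100_000)
```

After the first crossing of the target, the running sum never deviates from the target by more than the size of a recently used summand, which is why the partial sums approach the prescribed value.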

We will now study the more benign situation of absolutely convergent series.

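Theorem 7.94. Let $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty b_j$ be series in $\mathbb{C}$ such that $(b_n)_{n \in \mathbb{N}}$ is a reordering of $(a_n)_{n \in \mathbb{N}}$ in the sense of Def. 7.21. If $\sum_{j=1}^\infty a_j$ is absolutely convergent, then so is $\sum_{j=1}^\infty b_j$ and $\sum_{j=1}^\infty a_j = \sum_{j=1}^\infty b_j$.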

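Proof. Let $s_n := \sum_{j=1}^n a_j$, $\tilde{s}_n := \sum_{j=1}^n |a_j|$, and $t_n := \sum_{j=1}^n b_j$ denote the respective partial sums. We will show that $\lim_{n \to \infty} (s_n - t_n) = 0$. Given $\epsilon > 0$, since $(\tilde{s}_n)_{n \in \mathbb{N}}$ is a Cauchy sequence by Th. 7.29, there exists $N \in \mathbb{N}$ such that

\[ \forall_{n > m > N} \quad |\tilde{s}_n - \tilde{s}_m| = |a_{m+1}| + \dots + |a_n| < \epsilon. \]

Since $(b_n)_{n \in \mathbb{N}}$ is a reordering of $(a_n)_{n \in \mathbb{N}}$, there exists a bijective map $\phi : \mathbb{N} \to \mathbb{N}$ such that $b_n = a_{\phi(n)}$ for each $n \in \mathbb{N}$. Since $\phi$ is bijective, there exists $M \in \mathbb{N}$ such that $\{1, 2, \dots, N+1\} \subseteq \phi\{1, 2, \dots, M\}$. Then $n > M$ implies $\phi(n) > N + 1$, and

\[ \forall_{n > M} \ \exists_{k \in \mathbb{N}} \quad |s_n - t_n| \le |a_{N+2}| + \dots + |a_{N+k}| < \epsilon, \]

since all $a_j$ with $j \le N+1$ occur in both $s_n$ and $t_n$ and cancel in $s_n - t_n$ (i.e. all $a_j$ that do not cancel must have an index $j > N+1$). So we have shown that $\lim_{n \to \infty} (s_n - t_n) = 0$, which, in turn, implies

\[ \sum_{j=1}^\infty b_j = \lim_{n \to \infty} t_n = \lim_{n \to \infty} (t_n - s_n + s_n) = 0 + \sum_{j=1}^\infty a_j = \sum_{j=1}^\infty a_j. \]

Applying this to the series $\sum_{j=1}^\infty |a_j|$ (i.e. with $\tilde{s}_n$ in place of $s_n$) yields $\sum_{j=1}^\infty |b_j| = \sum_{j=1}^\infty |a_j|$, proving absolute convergence of $\sum_{j=1}^\infty b_j$. □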

Theorem 7.95. Let $I$ be an arbitrary infinite countable index set and let

\[ I = \bigcup_{n \in \mathbb{N}} I_n \tag{7.89} \]

be a disjoint decomposition of $I$ into (empty, finite, or infinite) countable index sets $I_n$.

(a) If the series $\sum_{j \in I} a_j$ (cf. (7.69)) is absolutely convergent, then

\[ \sum_{j \in I} a_j = \sum_{n=1}^\infty \sum_{\alpha \in I_n} a_\alpha. \tag{7.90} \]

(b) The following statements are equivalent:

(i) $\sum_{j \in I} a_j$ is absolutely convergent.

(ii) There exists a constant $C \in \mathbb{R}_0^+$ such that $\sum_{j \in J} |a_j| \le C$ for each finite subset $J$ of $I$.

(iii) $\sum_{n=1}^\infty \sum_{\alpha \in I_n} |a_\alpha| < \infty$.

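Proof. (a): First note that Th. 7.94 implies that, for absolutely convergent $\sum_{j \in I} a_j$, the limit $\sum_{j \in I} a_j = \sum_{j=1}^\infty a_{\phi(j)}$ does not depend on the bijective map $\phi : \mathbb{N} \to I$: For each bijective map $\psi : \mathbb{N} \to I$, $(a_{\psi(j)})_{j \in \mathbb{N}}$ is a reordering of $(a_{\phi(j)})_{j \in \mathbb{N}}$ and, thus, $\sum_{j=1}^\infty a_{\psi(j)} = \sum_{j=1}^\infty a_{\phi(j)}$.

Analogously, the sums $\sum_{\alpha \in I_n} a_\alpha$ do not depend on the order of the indices in $I_n$.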

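Claim 1. If $M \subseteq I$, then $S(I) = S(M) + S(I \setminus M)$, where $S(J) := \sum_{j \in J} a_j$ for each $J \subseteq I$.

Proof. If $M = \emptyset$, then there is nothing to prove. If $\#M = n \in \mathbb{N}$, then let $\phi_1 : \{1, \dots, n\} \to M$ and $\phi_2 : \{n+1, n+2, \dots\} \to I \setminus M$ be bijective maps. Then

\[ \phi : \mathbb{N} \to I, \quad \phi(j) := \begin{cases} \phi_1(j) & \text{for } j \le n, \\ \phi_2(j) & \text{for } j > n, \end{cases} \]

is a bijective map. Moreover,

\[ S(I) = \sum_{j=1}^\infty a_{\phi(j)} \overset{(7.76)}{=} \sum_{j=1}^n a_{\phi(j)} + \sum_{j=n+1}^\infty a_{\phi(j)} = S(M) + S(I \setminus M), \]

establishing the case.

If $\#M = \#(I \setminus M) = \#\mathbb{N}$, then let $\phi_1 : \{1, 3, 5, \dots\} \to M$ and $\phi_2 : \{2, 4, 6, \dots\} \to I \setminus M$ be bijective maps. Then

\[ \phi : \mathbb{N} \to I, \quad \phi(j) := \begin{cases} \phi_1(j) & \text{for } j \text{ odd}, \\ \phi_2(j) & \text{for } j \text{ even}, \end{cases} \]

is a bijective map. Define

\[ b_{\phi(j)} := \begin{cases} a_{\phi(j)} & \text{for } j \text{ odd}, \\ 0 & \text{for } j \text{ even}, \end{cases} \qquad c_{\phi(j)} := \begin{cases} a_{\phi(j)} & \text{for } j \text{ even}, \\ 0 & \text{for } j \text{ odd}. \end{cases} \]

One then obtains

\[ S(I) = \sum_{j=1}^\infty a_{\phi(j)} \overset{(7.73)}{=} \sum_{j=1}^\infty b_{\phi(j)} + \sum_{j=1}^\infty c_{\phi(j)} = S(M) + S(I \setminus M), \]

establishing the case. ▲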

Claim 2. If $I = \dot{\bigcup}_{n=1}^k M_n$ with $k \in \mathbb{N}$ is a decomposition of $I$, then, using the notation introduced in Cl. 1, $S(I) = \sum_{n=1}^k S(M_n)$.

Proof. Follows by an induction from Cl. 1. ▲

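Coming back to (7.89), Cl. 2 implies

\[ \forall_{k \in \mathbb{N}} \quad \Bigl( S(I) = S(I_1) + S(I_2) + \dots + S(I_k) + S(M_k), \quad \text{where } M_k := I \setminus \bigcup_{j=1}^k I_j \Bigr). \]

To prove the equality in (7.90), fix a bijective $\phi : \mathbb{N} \to I$, and let $\epsilon > 0$. Due to Cor. 7.81(d), the sums $r_n := \sum_{j=n+1}^\infty |a_{\phi(j)}|$ of the remainder series converge to 0, i.e. there exists $N \in \mathbb{N}$ such that $r_n < \epsilon$ for each $n > N$. More generally, for each (empty, finite, or infinite) subset $J \subseteq \{N+2, N+3, \dots\}$,

\[ \sum_{j \in J} |a_{\phi(j)}| \le \sum_{j=N+2}^\infty |a_{\phi(j)}| = r_{N+1} < \epsilon. \]

Next, we choose $M \in \mathbb{N}$ sufficiently large such that $\{\phi(1), \dots, \phi(N+1)\} \subseteq I_1 \cup \dots \cup I_M$. Then, for each $k > M$,

\[ |S(M_k)| = \Bigl| \sum_{j \in M_k} a_j \Bigr| \overset{(7.81)}{\le} \sum_{j \in M_k} |a_j| \le \sum_{j=N+2}^\infty |a_{\phi(j)}| = r_{N+1} < \epsilon, \]

proving

\[ \sum_{j \in I} a_j = S(I) = \lim_{k \to \infty} \sum_{n=1}^k S(I_n) = \sum_{n=1}^\infty \sum_{\alpha \in I_n} a_\alpha, \]

which is (7.90).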

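(b): (i) implies (ii) with $C := \sum_{j \in I} |a_j|$ using Cl. 1 (with $a_j$ replaced by $|a_j|$). (i) implies (iii) using (7.90) (with $a_j$ replaced by $|a_j|$). (ii) implies (i) via (7.77), as $C$ is an upper bound for $\bigl( \sum_{j=1}^n |a_{\phi(j)}| \bigr)_{n \in \mathbb{N}}$ for each bijection $\phi : \mathbb{N} \to I$. Finally, (iii) implies (ii) with $C := \sum_{n=1}^\infty \sum_{\alpha \in I_n} |a_\alpha|$, since, given a finite $J \subseteq I$, there exists $k \in \mathbb{N}$ such that $J \subseteq I_1 \cup \dots \cup I_k$, i.e.

\[ \sum_{j \in J} |a_j| \le \sum_{n=1}^k \sum_{\alpha \in I_n} |a_\alpha| \le \sum_{n=1}^\infty \sum_{\alpha \in I_n} |a_\alpha| = C, \]

thereby completing the proof. □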

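Example 7.96. We apply Th. 7.95 to so-called double series, i.e. to series with index set $I := \mathbb{N} \times \mathbb{N}$. The following notation is common:

\[ \sum_{m,n=1}^\infty a_{mn} := \sum_{(m,n) \in \mathbb{N} \times \mathbb{N}} a_{(m,n)}, \tag{7.91} \]

where one writes $a_{mn}$ (also $a_{m,n}$) instead of $a_{(m,n)}$. Recall from Th. 3.16 that $\mathbb{N} \times \mathbb{N}$ is countable. In general, the convergence properties of the double series and, if it exists, the value of the sum, will depend on the chosen bijection $\phi : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$.

However, we will now assume our double series to be absolutely convergent. Then Th. 7.94 guarantees the sum does not depend on the chosen bijection and we can apply Th. 7.95. Applying Th. 7.95 to the decompositions

\[ \mathbb{N} \times \mathbb{N} = \bigcup_{m \in \mathbb{N}} \{(m,n) : n \in \mathbb{N}\}, \tag{7.92a} \]
\[ \mathbb{N} \times \mathbb{N} = \bigcup_{n \in \mathbb{N}} \{(m,n) : m \in \mathbb{N}\}, \tag{7.92b} \]
\[ \mathbb{N} \times \mathbb{N} = \bigcup_{k \in \mathbb{N}} \{(m,n) \in \mathbb{N} \times \mathbb{N} : m + n = k\}, \tag{7.92c} \]

yields

\[ \sum_{(m,n) \in \mathbb{N} \times \mathbb{N}} a_{(m,n)} \overset{(7.92a)}{=} \sum_{m=1}^\infty \sum_{n=1}^\infty a_{mn} \overset{(7.92b)}{=} \sum_{n=1}^\infty \sum_{m=1}^\infty a_{mn} \overset{(7.92c)}{=} \sum_{k=2}^\infty \sum_{m+n=k} a_{mn} := \sum_{k=2}^\infty \sum_{m=1}^{k-1} a_{m,k-m}. \tag{7.93} \]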

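Theorem 7.97. It is possible to compute the product of two absolutely convergent (real or complex) series $\sum_{m=1}^\infty a_m$ and $\sum_{m=1}^\infty b_m$ as a double series:

\[ \Bigl( \sum_{m=1}^\infty a_m \Bigr) \Bigl( \sum_{m=1}^\infty b_m \Bigr) = \sum_{m,n=1}^\infty a_m b_n = \sum_{k=2}^\infty \sum_{m=1}^{k-1} a_m b_{k-m} = \sum_{k=2}^\infty c_k, \quad \text{where } c_k := \sum_{m=1}^{k-1} a_m b_{k-m} = a_1 b_{k-1} + a_2 b_{k-2} + \dots + a_{k-1} b_1. \tag{7.94} \]

This form of computing the product is known as a Cauchy product.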

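Proof. We first show that $\sum_{m,n=1}^\infty a_m b_n$ is absolutely convergent: By letting $A := \sum_{m=1}^\infty |a_m|$ and $B := \sum_{m=1}^\infty |b_m|$, we obtain

\[ \sum_{m=1}^\infty \sum_{n=1}^\infty |a_m b_n| = \sum_{m=1}^\infty \bigl( |a_m| B \bigr) = AB < \infty, \]

i.e. $\sum_{m,n=1}^\infty a_m b_n$ is absolutely convergent according to Th. 7.95(b)(iii). Now the second equality in (7.94) is just the third equality in (7.93), and the first equality in (7.94) also follows from (7.93):

\[ \sum_{m,n=1}^\infty a_m b_n = \sum_{m=1}^\infty \sum_{n=1}^\infty a_m b_n = \sum_{m=1}^\infty a_m \sum_{n=1}^\infty b_n = \Bigl( \sum_{m=1}^\infty a_m \Bigr) \Bigl( \sum_{m=1}^\infty b_m \Bigr). \]

□
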
Theorem 7.97 will be useful in Sec. 8.2 below.
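The Cauchy product (7.94) can be illustrated numerically. The following is a minimal sketch under the assumption $a_m := 2^{-m}$ and $b_m := 3^{-m}$ (two geometric series with sums 1 and 1/2, chosen only for illustration); the function name is hypothetical.

```python
def cauchy_product_partial_sum(a, b, K):
    """Sum of c_k for k = 2, ..., K with c_k = sum_{m=1}^{k-1} a(m) * b(k - m)."""
    total = 0.0
    for k in range(2, K + 1):
        total += sum(a(m) * b(k - m) for m in range(1, k))
    return total


def a(m):
    return 0.5 ** m  # sum over m >= 1 equals 1


def b(m):
    return (1.0 / 3.0) ** m  # sum over m >= 1 equals 1/2


approx_product = cauchy_product_partial_sum(a, b, 60)
```

The partial sums of the $c_k$ approach the product $1 \cdot \tfrac{1}{2}$ of the two sums, in line with (7.94).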

7.3.4 b-Adic Representations of Real Numbers

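We are mostly used to representing real numbers in the decimal system. For example, we write

\[ x = \frac{395}{3} = 131.\overline{6} = 1 \cdot 10^2 + 3 \cdot 10^1 + 1 \cdot 10^0 + \sum_{n=1}^\infty 6 \cdot 10^{-n}, \tag{7.95a} \]

where

\[ \sum_{n=1}^\infty 6 \cdot 10^{-n} \overset{(7.71)}{=} 6 \cdot \Bigl( \frac{1}{1 - \frac{1}{10}} - 1 \Bigr) = 6 \cdot \frac{1}{9} = \frac{2}{3}. \]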

The decimal system represents real numbers as, in general, infinite series of decimal fractions. Digital computers represent numbers in the dual system, using base 2 instead of 10. For example, the number from (7.95a) has the dual representation

\[ x = 10000011.\overline{10} = 2^7 + 2^1 + 2^0 + \sum_{n=0}^\infty 2^{-(2n+1)}, \tag{7.95b} \]

where it is an exercise to verify

\[ \sum_{n=0}^\infty 2^{-(2n+1)} = \frac{2}{3}. \]

Representations with base 16 (hexadecimal) and 8 (octal) are also of importance when working with digital computers. More generally, each natural number $b \ge 2$ can be used as a base.

Definition 7.98. Let $b \ge 2$ be a natural number.

(a) Given an integer $N \in \mathbb{Z}$ and a sequence $(d_N, d_{N-1}, d_{N-2}, \dots)$ in $\{0, \dots, b-1\}$, the series

\[ \sum_{\nu=0}^\infty d_{N-\nu}\, b^{N-\nu} \tag{7.96} \]

is called a b-adic series. The number $b$ is called the base or the radix, and the numbers $d_\nu$ are called digits.

(b) If $x \in \mathbb{R}_0^+$ is the sum of the b-adic series given by (7.96), then one calls the b-adic series a b-adic representation or a b-adic expansion of $x$.

Theorem 7.99. Given a natural number $b \ge 2$ and a nonnegative real number $x \in \mathbb{R}_0^+$, there exists a b-adic series representing $x$, i.e. there is $N \in \mathbb{Z}$ and a sequence $(d_N, d_{N-1}, d_{N-2}, \dots)$ in $\{0, \dots, b-1\}$ such that

\[ x = \sum_{\nu=0}^\infty d_{N-\nu}\, b^{N-\nu}. \tag{7.97} \]

If one introduces the additional requirement that $0 \ne d_N$, then each $x > 0$ has either a unique b-adic representation or precisely two b-adic representations. More precisely, for $0 \ne d_N$ and $x > 0$, the following statements are equivalent:

(i) The b-adic representation of $x$ is not unique.

(ii) There are precisely two b-adic representations of $x$.

(iii) There exists a b-adic representation of $x$ such that $d_n = 0$ for each $n \le n_0$ for some $n_0 < N$.

(iv) There exists a b-adic representation of $x$ such that $d_n = b - 1$ for each $n \le n_0$ for some $n_0 \le N$.


Proof. The proof is a bit lengthy and is provided in Appendix E.2. □

Example 7.100. Every natural number has precisely two decimal (i.e. 10-adic) representations. For instance,

\[ 2 = 2.\overline{0} = 1.\overline{9} = 1 + \sum_{n=1}^\infty 9 \cdot 10^{-n} \overset{(7.71)}{=} 1 + 9 \cdot \Bigl( \frac{1}{1 - \frac{1}{10}} - 1 \Bigr) = 1 + 9 \cdot \frac{1}{9}, \tag{7.98} \]

and analogously for all other natural numbers.
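For rational $x$, the digits of a b-adic representation can be computed exactly. The following is a minimal sketch (the function name `b_adic_digits` is hypothetical); the greedy digit extraction used here always produces the representation that does not end in the repeating digit $b - 1$, so, for $x = 2$ and $b = 10$, it yields $2.000\dots$ rather than $1.999\dots$

```python
from fractions import Fraction


def b_adic_digits(x, b, num_fractional):
    """Digits of x >= 0 in base b: (integer-part digits, fractional digits)."""
    integer_part = x.numerator // x.denominator
    frac = x - integer_part
    int_digits = []
    while integer_part > 0:
        integer_part, digit = divmod(integer_part, b)
        int_digits.append(digit)
    int_digits = int_digits[::-1] or [0]  # most significant digit first
    frac_digits = []
    for _ in range(num_fractional):
        frac *= b
        digit = frac.numerator // frac.denominator  # floor of b * frac
        frac_digits.append(digit)
        frac -= digit
    return int_digits, frac_digits


decimal_digits = b_adic_digits(Fraction(395, 3), 10, 4)
dual_digits = b_adic_digits(Fraction(395, 3), 2, 4)
```

For $x = 395/3$ this reproduces the digit patterns of (7.95a) and (7.95b).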

8 Convergence of K-Valued Functions

8.1 Pointwise and Uniform Convergence

So far we have studied the convergence of sequences in $\mathbb{K}$. We will now also need to study the convergence of sequences $(f_n)_{n \in \mathbb{N}}$, where each member $f_n$ of the sequence is a function $f_n : M \to \mathbb{K}$, $M \subseteq \mathbb{C}$. Here, for the first time, we encounter the situation that there exist different useful notions of convergence for such sequences.

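Definition 8.1. Let $(f_n)_{n \in \mathbb{N}}$ be a sequence of functions, $f_n : M \to \mathbb{K}$, $\emptyset \ne M \subseteq \mathbb{C}$.

(a) We say $(f_n)_{n \in \mathbb{N}}$ converges pointwise to $f : M \to \mathbb{K}$ if, and only if, $\lim_{n \to \infty} f_n(z) = f(z)$ for each $z \in M$, i.e. if, and only if,

\[ \forall_{z \in M} \ \forall_{\epsilon \in \mathbb{R}^+} \ \exists_{N \in \mathbb{N}} \ \forall_{n > N} \quad |f_n(z) - f(z)| < \epsilon. \tag{8.1} \]

So, in general, $N$ in (8.1) depends on both $z$ and $\epsilon$.

(b) We say $(f_n)_{n \in \mathbb{N}}$ converges uniformly to $f : M \to \mathbb{K}$ if, and only if,

\[ \forall_{\epsilon \in \mathbb{R}^+} \ \exists_{N \in \mathbb{N}} \ \forall_{n > N} \ \forall_{z \in M} \quad |f_n(z) - f(z)| < \epsilon. \tag{8.2} \]

In (8.2), $N$ is still allowed to depend on $\epsilon$, but, in contrast to the situation of (8.1), not on $z$; in that sense, the convergence is uniform in $z$.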

Remark 8.2. It is immediate from Def. 8.1(a),(b) that uniform convergence implies pointwise convergence, but Ex. 8.3(b) below will show the converse is not true.

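Example 8.3. (a) Let $\emptyset \ne M \subseteq \mathbb{C}$ (for example $M = [0, 1]$ or $M = B_1(0)$), and $f_n : M \to \mathbb{K}$, $f_n(z) = 1/n$ for each $n \in \mathbb{N}$. Then, clearly, $(f_n)_{n \in \mathbb{N}}$ converges uniformly to $f \equiv 0$.

(b) The sequence $(f_n)_{n \in \mathbb{N}}$, where $f_n : [0, 1] \to \mathbb{R}$, $f_n(x) := x^n$, converges pointwise, but not uniformly, to

\[ f : [0, 1] \to \mathbb{R}, \quad f(x) := \begin{cases} 0 & \text{for } 0 \le x < 1, \\ 1 & \text{for } x = 1: \end{cases} \tag{8.3} \]

For $x = 1$, $\lim_{n \to \infty} x^n = \lim_{n \to \infty} 1 = 1$, and, for $0 \le x < 1$, $\lim_{n \to \infty} x^n = 0$ by Ex. 7.6. To see that the convergence is not uniform, consider $\epsilon := \frac{1}{2}$. Then, for every $n \in \mathbb{N}$, according to the intermediate value Th. 7.57, there exists $\xi_n \in\, ]0, 1[$ such that $f_n(\xi_n) = \xi_n^n = \frac{1}{2}$, i.e.

\[ \forall_{n \in \mathbb{N}} \quad |f_n(\xi_n) - f(\xi_n)| = \xi_n^n = \frac{1}{2} = \epsilon, \tag{8.4} \]

proving the convergence is not uniform.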
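The points $\xi_n$ of (8.4) can be made explicit as $\xi_n = (1/2)^{1/n}$. The following is a minimal floating-point sketch of Ex. 8.3(b); the names `f_n` and `xi` are chosen for illustration only.

```python
def f_n(n, x):
    """The function f_n(x) = x^n on [0, 1]."""
    return x ** n


def xi(n):
    """A point in ]0, 1[ with f_n(xi(n)) = 1/2, namely (1/2)^(1/n)."""
    return 0.5 ** (1.0 / n)


indices = (1, 10, 100, 1000)
# The deviation |f_n(xi_n) - 0| stays at 1/2 for every n: not uniform.
deviations = [f_n(n, xi(n)) for n in indices]
# At a fixed point x < 1 the values still tend to 0: pointwise convergence.
value_at_fixed_point = f_n(1000, 0.9)
```

The points `xi(n)` move towards 1 as `n` grows, which is exactly why no single $N$ works for all $x \in [0, 1]$.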

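Theorem 8.4. Let $(f_n)_{n \in \mathbb{N}}$ be a sequence of functions, $f_n : M \to \mathbb{K}$, $\emptyset \ne M \subseteq \mathbb{C}$. If $(f_n)_{n \in \mathbb{N}}$ converges uniformly to $f : M \to \mathbb{K}$ and all $f_n$ are continuous at $\zeta \in M$, then $f$ is also continuous at $\zeta$. In particular, if each $f_n$ is continuous, then so is $f$ (uniform limits of continuous functions are continuous).

Proof. Let $\epsilon > 0$. Due to the uniform convergence of $(f_n)_{n \in \mathbb{N}}$,

\[ \exists_{m \in \mathbb{N}} \ \forall_{z \in M} \quad |f_m(z) - f(z)| < \frac{\epsilon}{3}. \tag{8.5} \]

Due to the continuity of $f_m$ at $\zeta$,

\[ \exists_{\delta > 0} \ \forall_{z \in M \cap B_\delta(\zeta)} \quad |f_m(z) - f_m(\zeta)| < \frac{\epsilon}{3}. \tag{8.6} \]

Thus,

\[ \forall_{z \in M \cap B_\delta(\zeta)} \quad |f(z) - f(\zeta)| \le |f(z) - f_m(z)| + |f_m(z) - f_m(\zeta)| + |f_m(\zeta) - f(\zeta)| < 3 \cdot \frac{\epsilon}{3} = \epsilon, \tag{8.7} \]

proving continuity of $f$ at $\zeta$. □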

8.2 Power Series

Definition 8.5. (a) In Def. 7.77, it was mentioned that series can be formed from each sequence in a set $A$, where an addition is defined. Letting $\emptyset \ne M \subseteq \mathbb{C}$, we now consider $A := \mathcal{F}(M, \mathbb{K})$, i.e. the set of functions from $M$ into $\mathbb{K}$. Then the addition on $A$ is defined according to (6.1a) and, given a sequence of functions $(f_n)_{n \in \mathbb{N}}$ in $A$, the series

\[ \sum_{j=1}^\infty f_j := (s_n)_{n \in \mathbb{N}} \tag{8.8} \]

is defined as the sequence of partial sums $s_n := \sum_{j=1}^n f_j$.

(b) Given a sequence of functions $(f_n)_{n \in \mathbb{N}_0}$, where $f_n : \mathbb{K} \to \mathbb{K}$, $f_n(z) = a_n z^n$ with $a_n \in \mathbb{K}$, the series

\[ \sum_{j=0}^\infty a_j z^j := \sum_{j=0}^\infty f_j \tag{8.9} \]

is called a power series and the $a_j$ are called the coefficients of the power series. Note: The notation $\sum_{j=0}^\infty a_j z^j$ introduced in (8.9) is very common, but not entirely correct, since one writes $a_j z^j = f_j(z)$ for the summands, even though one actually means $f_j$. Moreover, one uses the same notation if one actually does mean the series $\sum_{j=0}^\infty f_j(z)$ in $\mathbb{K}$, so one has to see from the context if $\sum_{j=0}^\infty a_j z^j$ means a series of $\mathbb{K}$-valued functions or a series of numbers.

Definition 8.6. Consider a series of $\mathbb{K}$-valued functions $\sum_{j=1}^\infty f_j$ as in Def. 8.5(a), in particular, $s_n := \sum_{j=1}^n f_j$ for each $n \in \mathbb{N}$.

(a) The series converges pointwise to $f : M \to \mathbb{K}$ if, and only if, it (i.e. $(s_n)_{n \in \mathbb{N}}$) converges pointwise in the sense of Def. 8.1(a). In that case, we use the notation

\[ f = \sum_{j=1}^\infty f_j. \tag{8.10} \]

If (8.10) holds, then the series is sometimes called a series expansion of $f$, in particular, a power series expansion if the series happens to be a power series.

Analogous to the situation of series in $\mathbb{K}$, the notation $\sum_{j=1}^\infty f_j$ is also used with two different meanings: it can mean the sequence of partial sums as in (8.8) or, in the case of convergent series, the limit function as in (8.10) (cf. Caveat 7.79).

(b) The series converges uniformly to $f : M \to \mathbb{K}$ if, and only if, it converges uniformly in the sense of Def. 8.1(b).

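Corollary 8.7. Consider a function series $\sum_{j=1}^\infty f_j$ with $f_j : M \to \mathbb{K}$, $\emptyset \ne M \subseteq \mathbb{C}$.

(a) The series converges uniformly to some $f : M \to \mathbb{K}$ if, and only if, for each $n \in \mathbb{N}$ and each $z \in M$, the remainder series $\sum_{j=n+1}^\infty f_j(z)$ in $\mathbb{K}$ converges to some $r_n(z) \in \mathbb{K}$ such that

\[ \forall_{\epsilon \in \mathbb{R}^+} \ \exists_{N \in \mathbb{N}} \ \forall_{n > N} \ \forall_{z \in M} \quad |r_n(z)| < \epsilon. \tag{8.11} \]

(b) If $\sum_{j=1}^\infty a_j$ is a convergent series in $\mathbb{R}_0^+$, then the condition

\[ \forall_{z \in M} \ \forall_{j \in \mathbb{N}} \quad |f_j(z)| \le a_j \tag{8.12} \]

implies uniform convergence of $\sum_{j=1}^\infty f_j$.

(c) If each $f_j$ is continuous at $\zeta \in M$ and the series converges uniformly to $f : M \to \mathbb{K}$, then $f$ is continuous at $\zeta$. In particular, if each $f_j$ is continuous, then $f$ is continuous.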

Proof. (a): If $\sum_{j=1}^\infty f_j$ converges uniformly to $f$, then $f(z) = \sum_{j=1}^\infty f_j(z)$ holds for each $z \in M$, and $r_n(z) = f(z) - s_n(z)$ for each $n \in \mathbb{N}$, $z \in M$ according to (7.76), where $s_n(z) := \sum_{j=1}^n f_j(z)$. Then (8.11) is just (8.2), where the $s_n$ now play the role of the $f_n$ in (8.2). Conversely, if the remainder series converge for each $z \in M$, then we can define $f : M \to \mathbb{K}$, $f(z) := f_1(z) + r_1(z) = \sum_{j=1}^\infty f_j(z)$. Then, once again, $r_n(z) = f(z) - s_n(z)$ for each $n \in \mathbb{N}$, $z \in M$, and (8.11) is just (8.2), yielding the uniform convergence of the series.

(b): First, (8.12) implies each remainder series $\sum_{j=n+1}^\infty f_j(z)$ converges absolutely. Thus, with $r_n(z)$ as in (a),

\[ \forall_{z \in M} \quad |r_n(z)| \overset{(7.81)}{\le} \sum_{j=n+1}^\infty |f_j(z)| \le \sum_{j=n+1}^\infty a_j \to 0 \quad \text{for } n \to \infty, \]

such that (a) yields uniform convergence.

(c) is immediate from Th. 8.4. □

Remark 8.8. Consider a function series $\sum_{j=1}^\infty f_j$ with $f_j : M \to \mathbb{K}$, $\emptyset \ne M \subseteq \mathbb{C}$. For each $z \in M$, $\sum_{j=1}^\infty f_j(z)$ constitutes a series in $\mathbb{K}$. Typically, one will only have convergence of $\sum_{j=1}^\infty f_j(z)$ in $\mathbb{K}$ on a subset $C \subseteq M$. The series then converges pointwise in the sense of Def. 8.6(a) if all $f_j$ are restricted to $C$. It can be very difficult to determine if $\sum_{j=1}^\infty f_j(z)$ converges or diverges for some $z \in M$, and such investigations are often of particular interest in the context of function series. Even for power series, studying convergence can still be difficult, but the availability of the following Th. 8.9 does help to (at least partially) settle the question in many cases.

Theorem 8.9. For each power series $\sum_{j=0}^\infty a_j z^j$, $a_j \in \mathbb{K}$, there exists a number $r \in [0, \infty] := \mathbb{R}_0^+ \cup \{\infty\}$, called the radius of convergence of the power series, such that

\[ \bigl( z \in \mathbb{K} \wedge |z| < r \bigr) \ \Rightarrow \ \sum_{j=0}^\infty a_j z^j \text{ converges absolutely in } \mathbb{K}, \tag{8.13a} \]
\[ \bigl( z \in \mathbb{K} \wedge |z| > r \bigr) \ \Rightarrow \ \sum_{j=0}^\infty a_j z^j \text{ diverges in } \mathbb{K} \tag{8.13b} \]

(for $r = \infty$, (8.13a) claims absolute convergence for each $z \in \mathbb{K}$). In particular, $\sum_{j=0}^\infty a_j z^j$ converges pointwise in the sense of Def. 8.6(a) for each $z \in B_r(0)$ (cf. Def. 7.7(a)). Moreover,

\[ \forall_{0 < r_0 < r} \quad \Bigl( \sum_{j=0}^\infty a_j z^j \text{ converges uniformly on } \overline{B}_{r_0}(0) \text{ (cf. Ex. 7.47(a)) in the sense of Def. 8.6(b)} \Bigr). \tag{8.14} \]

For the radius of convergence, one has the formula

\[ r = \frac{1}{L}, \quad \text{where } L := \limsup_{n \to \infty} \sqrt[n]{|a_n|}. \tag{8.15} \]

In (8.15), $\limsup$ denotes the so-called limit superior, which is defined as the largest cluster point of the sequence $\bigl( \sqrt[n]{|a_n|} \bigr)_{n \in \mathbb{N}}$ if the sequence is bounded (cf. Th. 7.27) and $\infty$ if the sequence is unbounded. As the limit superior can be 0 or $\infty$, we also define $1/0 := \infty$ and $1/\infty := 0$ in (8.15).

One has the simpler formula

\[ r = \lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right|, \tag{8.16} \]

provided all $a_n$ are nonzero and provided the limit in (8.16) either exists in $\mathbb{R}_0^+$ or is $\infty$.

Proof. For the proof of (8.15), we apply the root test from Th. 7.89(b). Here, for the root test, we have to consider the sequence $\bigl( \sqrt[n]{|a_n| |z|^n} \bigr)_{n \in \mathbb{N}}$. As a consequence of (7.11a) and Prop. 7.26, $\limsup_{n \to \infty} (\lambda x_n) = \lambda \limsup_{n \to \infty} x_n$ for each $\lambda > 0$ and each sequence $(x_n)_{n \in \mathbb{N}}$ in $\mathbb{R}$ (with $\lambda \cdot \infty := \infty$, this also holds if the limit superior is infinite). Thus,

\[ \limsup_{n \to \infty} \sqrt[n]{|a_n| |z|^n} = |z| \limsup_{n \to \infty} \sqrt[n]{|a_n|} = |z| L. \]

If $|z| > 1/L$, then $|z| L > 1$ and (7.82b) applies, i.e. (8.13b) holds for $r = 1/L$. If $|z| < 1/L$, then $|z| L < 1$, and, recalling the Bolzano-Weierstrass Th. 7.27, one sees that (7.82a) applies, i.e. (8.13a) holds for $r = 1/L$.

Next, if $0 < r_0 < r$, then $\sum_{j=0}^\infty |a_j r_0^j|$ converges according to (8.13a). Since, for each $z \in \overline{B}_{r_0}(0)$ and each $j \in \mathbb{N}_0$, we have $|a_j z^j| \le |a_j r_0^j|$, (8.14) is a consequence of Cor. 8.7(b).

The validity of (8.16) follows from the ratio test of Th. 7.89(c): If all $a_n \ne 0$ and $z \ne 0$, then

\[ \lim_{n \to \infty} \left| \frac{a_{n+1} z^{n+1}}{a_n z^n} \right| = |z| \lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| = \frac{|z|}{\lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right|}. \]

If $|z| < l := \lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right|$, then $|z|/l < 1$, i.e. (7.83a) applies, proving (8.13a) for $r = l$. If $|z| > l$, then $|z|/l > 1$, i.e. (7.83b) applies, proving (8.13b) for $r = l$. □

Corollary 8.10. If $\sum_{j=0}^\infty a_j z^j$, $a_j \in \mathbb{K}$, is a power series with radius of convergence $r \in\, ]0, \infty]$, then the function

\[ f : B_r(0) \to \mathbb{K}, \quad f(z) := \sum_{j=0}^\infty a_j z^j, \tag{8.17} \]

is continuous. In particular, if $r = \infty$, then $f$ is continuous on $\mathbb{K}$.

Proof. Each partial sum $z \mapsto \sum_{j=0}^n a_j z^j$ is a polynomial, i.e. continuous on $\mathbb{K}$. Moreover, if $\zeta \in B_r(0)$, then the power series converges uniformly on $M := \overline{B}_{|\zeta|}(0)$ by (8.14), i.e. it is continuous at $\zeta \in M$ by Th. 8.4. □

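Example 8.11. (a) For each $\alpha \in \mathbb{R}$, the radius of convergence of $\sum_{n=1}^\infty n^\alpha z^n$ is $r = 1$, since

\[ \limsup_{n \to \infty} \sqrt[n]{|a_n|} = \lim_{n \to \infty} \sqrt[n]{n^\alpha} = 1, \tag{8.18} \]

which, for each $\alpha \in \mathbb{Z}$, follows from (7.46) and Th. 7.13(a), and, then, for all $\alpha \in \mathbb{R}$ from the Sandwich Th. 7.16.

Let us investigate what can happen for $|z| = r = 1$ for some cases: The series $\sum_{n=1}^\infty z^n$ ($\alpha = 0$) is divergent for each $z \in \mathbb{C}$ with $|z| = 1$ by the observation that $(z^n)_{n \in \mathbb{N}}$ does not converge to 0 for $n \to \infty$ (as $|z^n| = 1$ for each $n \in \mathbb{N}$); the series $\sum_{n=1}^\infty n^{-1} z^n$ ($\alpha = -1$) is the harmonic series, i.e. divergent, for $z = 1$, but convergent for $z = -1$ according to Ex. 7.86(a).

(b) The radius of convergence of both $\sum_{n=0}^\infty \frac{z^n}{n!}$ and $\sum_{n=0}^\infty \frac{z^n}{n^n}$ is $r = \infty$ by (8.16) and (8.15), respectively, since

\[ \lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right| = \lim_{n \to \infty} \frac{(n+1)!}{n!} = \lim_{n \to \infty} (n+1) = \infty, \tag{8.19a} \]
\[ \limsup_{n \to \infty} \sqrt[n]{|a_n|} = \lim_{n \to \infty} \sqrt[n]{\frac{1}{n^n}} = \lim_{n \to \infty} \frac{1}{n} = 0. \tag{8.19b} \]

(c) The radius of convergence of $\sum_{n=0}^\infty n!\, z^n$ is $r = 0$ by (8.16), since

\[ \lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right| = \lim_{n \to \infty} \frac{n!}{(n+1)!} = \lim_{n \to \infty} \frac{1}{n+1} = 0. \tag{8.20} \]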
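The quotients $|a_n / a_{n+1}|$ from (8.16) can be evaluated exactly for the coefficient sequences of this example. The following is a minimal sketch; the helper `coefficient_quotient` and the coefficient functions are named here only for illustration, and a finite index can of course only hint at the limit.

```python
from fractions import Fraction
from math import factorial


def coefficient_quotient(a, n):
    """Exact value of |a_n / a_{n+1}| for a coefficient function a."""
    return abs(Fraction(a(n)) / Fraction(a(n + 1)))


def inverse_factorial(n):
    return Fraction(1, factorial(n))  # coefficients of sum z^n / n!


def plain_factorial(n):
    return Fraction(factorial(n))  # coefficients of sum n! z^n


def cube(n):
    return Fraction(n) ** 3  # coefficients of sum n^3 z^n (alpha = 3)


q_exp = coefficient_quotient(inverse_factorial, 50)   # equals n + 1
q_fact = coefficient_quotient(plain_factorial, 50)    # equals 1 / (n + 1)
q_cube = coefficient_quotient(cube, 1000)             # close to 1
```

The three quotients grow without bound, tend to 0, and tend to 1, respectively, matching $r = \infty$, $r = 0$, and $r = 1$.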

Caveat 8.12. Theorem 8.9 does not claim the uniform convergence of $\sum_{j=0}^\infty a_j z^j$ on $B_r(0)$, which is usually not true (e.g., it is an exercise to show that $\sum_{j=0}^\infty z^j$ does not converge uniformly on $B_1(0)$). Theorem 8.9 also claims nothing about the convergence or divergence of $\sum_{j=0}^\infty a_j z^j$ for $|z| = r$, which has to be determined case by case (cf. Ex. 8.11(a)).

Definition and Remark 8.13. Given two power series $p := \sum_{j=0}^\infty a_j z^j$ and $q := \sum_{j=0}^\infty b_j z^j$ in $\mathbb{K}$, we define their Cauchy product

\[ p * q := \sum_{j=0}^\infty c_j z^j, \quad \text{where } c_j := \sum_{k=0}^j a_k b_{j-k} = a_0 b_j + a_1 b_{j-1} + \dots + a_j b_0. \tag{8.21} \]

Note that we have not assumed any convergence of the series so far, i.e. $p$, $q$, and $p * q$ are not $\mathbb{K}$-valued functions, but sequences of $\mathbb{K}$-valued functions according to Def. 8.5 (sequences of polynomials, actually). Sometimes one also calls the Cauchy product $p * q$ the convolution of $p$ and $q$.

Now, if we do assume $p$ and $q$ to have some nonzero radii of convergence, say $r_p, r_q \in\, ]0, \infty]$, respectively, then, by (8.13a), both series are absolutely convergent for each $z \in B_r(0)$, where $r := \min\{r_p, r_q\}$. Thus, the functions

\[ f : B_r(0) \to \mathbb{K}, \quad f(z) := \sum_{j=0}^\infty a_j z^j, \qquad g : B_r(0) \to \mathbb{K}, \quad g(z) := \sum_{j=0}^\infty b_j z^j, \tag{8.22} \]

are well-defined, and (7.94) implies

\[ \forall_{z \in B_r(0)} \quad f(z) g(z) = \sum_{j=0}^\infty c_j z^j \quad \text{with } c_j \text{ as in (8.21)}. \tag{8.23} \]


8.3 Exponential Functions

The notion of power series allows us to extend the definition of exponential functions to complex arguments:

Definition and Remark 8.14. We define the exponential function

\[ \exp : \mathbb{C} \to \mathbb{C}, \quad \exp(z) := \sum_{n=0}^\infty \frac{z^n}{n!} = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \dots \tag{8.24} \]

From Ex. 8.11(b), we already know the radius of convergence of the power series in (8.24) is $\infty$, such that the function in (8.24) is well-defined.

For the time being, we also redefine Euler's number as $e := \exp(1) > 1 > 0$ and, for each $x \in \mathbb{R}^+$, $\ln x := \log_{\exp(1)}(x)$. This, as well as calling the function of (8.24) exponential function, will be justified as soon as we will have proved

\[ \lim_{n \to \infty} \Bigl( 1 + \frac{1}{n} \Bigr)^n = \sum_{n=0}^\infty \frac{1}{n!} = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \dots \tag{8.25} \]

and

\[ \forall_{x \in \mathbb{R}} \quad e^x = \sum_{n=0}^\infty \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots \tag{8.26} \]

in (8.36) of Th. 8.18 and in Th. 8.16(c) below, respectively.

Proposition 8.15. If a continuous function $E : \mathbb{R} \to \mathbb{R}$ satisfies

\[ a := E(1) > 0 \quad \text{and} \tag{8.27a} \]
\[ \forall_{x, y \in \mathbb{R}} \quad E(x + y) = E(x) E(y), \tag{8.27b} \]

then $E$ is an exponential function, more precisely

\[ \forall_{x \in \mathbb{R}} \quad E(x) = a^x. \tag{8.28} \]

Proof. First, $a = E(1) = E(0 + 1) = E(0) E(1) = E(0)\, a$ and $a > 0$ shows $E(0) = 1$. Then, for each $x \in \mathbb{R}$, $1 = E(0) = E(x - x) = E(x) E(-x)$, i.e. $E(-x) = \bigl( E(x) \bigr)^{-1}$, showing $E(x) \ne 0$ for each $x \in \mathbb{R}$. Thus, $E(1) > 0$, the continuity of $E$, and the intermediate value Th. 7.57 imply $E(x) > 0$ for each $x \in \mathbb{R}$. Next, an induction shows

\[ \forall_{x \in \mathbb{R}} \ \forall_{n \in \mathbb{N}} \quad E(n \cdot x) = \bigl( E(x) \bigr)^n: \tag{8.29} \]

The base case is trivially true and the induction step is

\[ E((n+1) x) = E(n x) E(x) \overset{\text{ind. hyp.}}{=} (E(x))^n E(x) = (E(x))^{n+1}. \]

Applying (8.29) with $x = 1$ shows $E(n) = a^n$ for each $n \in \mathbb{N}$. Applying (8.29) with $x = 1/n$, $n \in \mathbb{N}$, shows $a = E(1) = (E(1/n))^n$, i.e. $E(1/n) = a^{1/n}$ since $E(1/n) > 0$. Next,

\[ \forall_{n, k \in \mathbb{N}} \quad E(k/n) \overset{(8.29)}{=} \bigl( E(1/n) \bigr)^k = (a^{1/n})^k = a^{k/n}, \]

showing (8.28) holds for each $x \in \mathbb{Q}^+$. Then (8.28) also holds for each $x \in \mathbb{R}^+$, since, if $(q_n)_{n \in \mathbb{N}}$ is a sequence in $\mathbb{Q}^+$ with $\lim_{n \to \infty} q_n = x$, then the continuity of $E$ implies

\[ a^x = \lim_{n \to \infty} a^{q_n} = \lim_{n \to \infty} E(q_n) = E(x). \]

Finally, if $x \in \mathbb{R}^-$, then

\[ a^x = (a^{-x})^{-1} = \bigl( E(-x) \bigr)^{-1} = E(x), \]

completing the proof that (8.28) holds for each $x \in \mathbb{R}$. □

Theorem 8.16. We consider the exponential function exp as defined in (8.24). The following holds:

(a) exp is continuous on $\mathbb{C}$.

(b) $\exp(z + w) = \exp(z) \exp(w)$ is valid for all $z, w \in \mathbb{C}$.

(c) With $e := \exp(1)$ (cf. Def. and Rem. 8.14), we have

\[ \forall_{x \in \mathbb{R}} \quad e^x = \exp(x) = \sum_{n=0}^\infty \frac{x^n}{n!}. \]

Proof. (a) holds by Cor. 8.10; for (b), we compute (using (7.94)),

\[ \forall_{z, w \in \mathbb{C}} \quad \exp(z) \exp(w) = \sum_{n=0}^\infty c_n, \quad \text{where } c_n = \sum_{j=0}^n \frac{z^j}{j!} \frac{w^{n-j}}{(n-j)!} = \frac{1}{n!} \sum_{j=0}^n \binom{n}{j} z^j w^{n-j} \overset{(5.22)}{=} \frac{(z+w)^n}{n!}; \tag{8.30} \]

and then (c) is an immediate consequence of (a), (b), and Prop. 8.15. □

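Definition 8.17. Let $M \subseteq \mathbb{C}$. If $\zeta \in \mathbb{C}$ is a cluster point of $M$, then a function $f : M \to \mathbb{K}$ is said to tend to $\eta \in \mathbb{K}$ (or to have the limit $\eta \in \mathbb{K}$) for $z \to \zeta$ (denoted by $\lim_{z \to \zeta} f(z) = \eta$) if, and only if, for each sequence $(z_k)_{k \in \mathbb{N}}$ in $M \setminus \{\zeta\}$ with $\lim_{k \to \infty} z_k = \zeta$, the sequence $(f(z_k))_{k \in \mathbb{N}}$ converges to $\eta \in \mathbb{K}$, i.e.

\[ \lim_{z \to \zeta} f(z) = \eta \quad \Leftrightarrow \quad \forall_{(z_k)_{k \in \mathbb{N}} \text{ in } M \setminus \{\zeta\}} \ \Bigl( \lim_{k \to \infty} z_k = \zeta \ \Rightarrow \ \lim_{k \to \infty} f(z_k) = \eta \Bigr). \tag{8.31} \]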


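Theorem 8.18. We consider the exponential function exp as defined in (8.24). With $e^z := \exp(z)$ for each $z \in \mathbb{C}$ and $\ln x := \log_{\exp(1)}(x)$ for each $x \in \mathbb{R}^+$ (cf. Th. 8.16(c) and Def. and Rem. 8.14), we have the following limits:

\[ \lim_{z \to 0} \frac{e^z - 1}{z} = 1 \quad \bigl( z \in M := \mathbb{C} \setminus \{0\} \bigr), \tag{8.32} \]
\[ \lim_{x \to 0} \frac{\ln(1 + x)}{x} = 1 \quad \bigl( x \in M :=\, ]-1, \infty[\, \setminus \{0\} \bigr), \tag{8.33} \]
\[ \forall_{\xi \in \mathbb{R}} \quad \lim_{x \to 0} \ln \bigl( (1 + \xi x)^{\frac{1}{x}} \bigr) = \xi \quad \bigl( x \in M := \{x \in \mathbb{R} : 1 + \xi x > 0\} \setminus \{0\} \bigr), \tag{8.34} \]
\[ \forall_{\xi \in \mathbb{R}} \quad \lim_{x \to 0} (1 + \xi x)^{\frac{1}{x}} = e^\xi \quad \bigl( x \in M := \{x \in \mathbb{R} : 1 + \xi x > 0\} \setminus \{0\} \bigr), \tag{8.35} \]
\[ \forall_{x \in \mathbb{R}} \quad \lim_{n \to \infty} \Bigl( 1 + \frac{x}{n} \Bigr)^n = e^x = \sum_{n=0}^\infty \frac{x^n}{n!}. \tag{8.36} \]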

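Proof. (8.32): From (8.24) and $e^z = \exp(z)$, we obtain

\[ \forall_{z \ne 0} \quad \frac{e^z - 1}{z} = \sum_{n=0}^\infty \frac{z^n}{(n+1)!} = 1 + \frac{z}{2!} + \frac{z^2}{3!} + \dots, \]

which, since $z \mapsto \sum_{n=0}^\infty \frac{z^n}{(n+1)!}$ is continuous on $\mathbb{C}$ by Cor. 8.10, implies (8.32).

(8.33): Consider the auxiliary function $f :\, ]-1, \infty[\, \to \mathbb{R}$, $f(x) := \ln(x + 1)$, with $f^{-1}(x) = e^x - 1$. Now, given a sequence $(x_k)_{k \in \mathbb{N}}$ in $]-1, \infty[\, \setminus \{0\}$ with $\lim_{k \to \infty} x_k = 0$, one obtains

\[ \lim_{k \to \infty} \frac{\ln(1 + x_k)}{x_k} = \lim_{k \to \infty} \frac{\ln \bigl( 1 + f^{-1}(f(x_k)) \bigr)}{f^{-1}(f(x_k))} = \lim_{k \to \infty} \frac{\ln \bigl( 1 + e^{f(x_k)} - 1 \bigr)}{e^{f(x_k)} - 1} = \lim_{k \to \infty} \frac{f(x_k)}{e^{f(x_k)} - 1} \overset{(8.32)}{=} 1, \]

where, in the last step, it was used that $\lim_{k \to \infty} x_k = 0$ and the continuity of $f$ imply $\lim_{k \to \infty} f(x_k) = \ln 1 = 0$.

Similarly, but simpler, one obtains (8.34) and (8.35) (exercise). Finally, applying (8.35) with $\xi := x$ to the sequence $(x_n)_{n \in \mathbb{N}}$ with $x_n := 1/n$ yields (8.36). □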
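The two descriptions of $e^x$ in (8.36) and the functional equation of Th. 8.16(b) can be compared numerically. The following is a minimal floating-point sketch; the function names and the cutoff of 40 series terms are assumptions made only for this illustration.

```python
import math


def exp_series(x, terms=40):
    """Partial sum of sum_{n=0}^{terms-1} x^n / n!."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)  # turns x^n / n! into x^(n+1) / (n+1)!
    return total


def compound(x, n):
    """The expression (1 + x/n)^n from (8.36)."""
    return (1.0 + x / n) ** n


series_e = exp_series(1.0)
compound_e = compound(1.0, 10 ** 6)
product_check = exp_series(2.0) * exp_series(3.0) - exp_series(5.0)
```

The power series converges much faster than $(1 + x/n)^n$: a few dozen terms give close to machine precision, whereas $n = 10^6$ yields only about six correct digits.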

Definition 8.19 (Exponentiation with Complex Exponents). For each $(a, z) \in \mathbb{R}^+ \times \mathbb{C}$, we define

\[ a^z := \exp(z \ln a), \tag{8.37} \]

where exp is the function defined in (8.24). For $a = e$, (8.37) yields $e^z = \exp(z)$, i.e. (8.37) is consistent with (8.26).

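Theorem 8.20. (a) The first two exponentiation rules of (7.54) still hold for each $a, b > 0$ and each $z, w \in \mathbb{C}$:

\[ a^{z+w} = a^z a^w, \tag{8.38a} \]
\[ a^z b^z = (ab)^z. \tag{8.38b} \]

(b) For each $a \in \mathbb{R}^+$, the exponential function

\[ f : \mathbb{C} \to \mathbb{C}, \quad f(z) := a^z, \tag{8.39a} \]

is continuous, and, for each $\zeta \in \mathbb{C}$, the power function

\[ g : \mathbb{R}^+ \to \mathbb{C}, \quad g(x) := x^\zeta, \tag{8.39b} \]

is continuous.

(c) The limit in (8.36) extends to complex numbers:

\[ \forall_{z \in \mathbb{C}} \quad \lim_{n \to \infty} \Bigl( 1 + \frac{z}{n} \Bigr)^n = e^z = \sum_{n=0}^\infty \frac{z^n}{n!}. \tag{8.40} \]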

Proof. (a): We compute

\[ a^{z+w} \overset{(8.37)}{=} \exp((z + w) \ln a) = \exp(z \ln a + w \ln a) \overset{\text{Th. 8.16(b)}}{=} \exp(z \ln a) \exp(w \ln a) \overset{(8.37)}{=} a^z a^w, \]

proving (8.38a), and

\[ a^z b^z \overset{(8.37)}{=} \exp(z \ln a) \exp(z \ln b) \overset{\text{Th. 8.16(b)}}{=} \exp(z \ln a + z \ln b) \overset{(7.65e)}{=} \exp \bigl( z \ln(ab) \bigr) \overset{(8.37)}{=} (ab)^z, \]

proving (8.38b).

(b): The continuity of both functions follows from the continuity of exp (according to Th. 8.16(a)) and from the fact that continuity is preserved by compositions (according to Th. 7.41): The exponential function $f$, given by $f(z) = e^{z \ln a}$, is the composition of the continuous functions $z \mapsto z \ln a$ and $w \mapsto e^w$, whereas (analogous to Ex. 7.76(a)), the power function $g$, given by $g(x) = e^{\zeta \ln x}$, is the composition $g = \exp \circ (\zeta \ln)$, where ln is continuous by Cor. 7.74.

(c): Exercise. □

8.4 Trigonometric Functions

The first “definition” of the trigonometric functions sine and cosine is the one based on geometric visualization usually given in high school: $\cos x$ and $\sin x$ are the coordinates of the point $p = (p_1, p_2) \in \mathbb{R}^2$ on the unit circle, such that $x$ is the angle measured in radians between the line segment between $(0, 0)$ and $(1, 0)$ and the line segment between $(0, 0)$ and $p$.

While this “definition” allows one to obtain many important properties of sine and cosine using geometric arguments, it is not mathematically rigorous, and, for example, provides no clue how to compute values like $\sin 1$. The problem is related to the fact that the angle measured in radians between the line segment between $(0, 0)$ and $(1, 0)$ and the line segment between $(0, 0)$ and $p$ is supposed to be the length of the segment of the unit circle between $(1, 0)$ and $p$ (taken in the counter-clockwise direction).

In the following Def. and Rem. 8.21, we will provide a mathematically rigorous definition of sine and cosine using power series, and we will then verify that the functions have the familiar properties one learns in high school. However, as the computation of lengths of curved paths is actually beyond the scope of this lecture, we will not be able to see that our sine and cosine functions are precisely the same we visualized in high school (the interested reader is referred to Ex. 1 in Sec. 5.14 of [Wal02] and to [Phi17, Ex. 3.13(b)]).

Definition and Remark 8.21. We define the sine function, denoted sin, and thecosine function, denoted cos by

sin : C −→ C, sin z :=∞∑

n=0

(−1)n z2n+1

(2n+ 1)!= z − z3

3!+z5

5!−+ . . . , (8.41a)

cos : C −→ C, cos z :=∞∑

n=0

(−1)n z2n

(2n)!= 1− z2

2!+z4

4!−+ . . . . (8.41b)

(a) sin and cos are well-defined and continuous: For both series and each z ∈ C, we canestimate the absolute value of the nth summand by the nth summand of the seriesfor the exponential function e|z| (cf. (8.36)), which we know to be convergent fromEx. 8.11(b). Thus, by Th. 8.9, both series in (8.41) have radius of convergence ∞and are continuous by Cor. 8.10.

(b) cos : R −→ R (i.e. cos↾R) has a smallest positive zero α ∈ R+. We define π := 2α. One can show π is an irrational number (see Appendix H.2) and its first digits are π = 3.14159 . . .

To see cos has a smallest positive zero and to obtain a first (very coarse) estimate, note

\[ \forall_{x\in\mathbb{R}^+} \ \forall_{k\in\mathbb{N}} \quad \left( \frac{x^k}{k!} > \frac{x^{k+1}}{(k+1)!} \ \Leftrightarrow\ 1 > \frac{x}{k+1} \ \Leftrightarrow\ k+1 > x \right), \]

showing $\frac{x^k}{k!} > \frac{x^{k+1}}{(k+1)!}$ holds for each k ≥ 2 and each x ∈ ]0, 3[. In particular, the absolute values of the summands of the series in (8.41) converge monotonically to 0 (for k ≥ 2) and, since the series are alternating for x ≠ 0, Th. 7.85 applies and (7.79) yields

\[ \forall_{0<x<3} \quad f(x) := 1 - \frac{x^2}{2} < \cos x < 1 - \frac{x^2}{2} + \frac{x^4}{24} =: g(x), \qquad x - \frac{x^3}{6} < \sin x < x - \frac{x^3}{6} + \frac{x^5}{120}. \tag{8.42} \]

The zeros of x ↦ f(x) are $-\sqrt{2}$, $\sqrt{2}$, i.e. $\sqrt{2}$ is its smallest positive zero; the zeros of x ↦ g(x) are $-\sqrt{6-2\sqrt{3}}$, $-\sqrt{6+2\sqrt{3}}$, $\sqrt{6-2\sqrt{3}}$, $\sqrt{6+2\sqrt{3}}$, i.e. $\sqrt{6-2\sqrt{3}}$ is its smallest positive zero. Thus, as f(0) = g(0) = 1, the intermediate value Th. 7.57 implies cos has a smallest positive zero α and

\[ 1.4 < \sqrt{2} < \frac{\pi}{2} = \alpha < \sqrt{6-2\sqrt{3}} < 1.6. \tag{8.43} \]
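The definition above can be tried out numerically. The following is only a minimal sketch, assuming floating-point partial sums of (8.41) are accurate enough on the interval in question; the names `sin_series`, `cos_series`, and `smallest_positive_zero_of_cos` are hypothetical and not part of these notes. It locates the zero α of cos by bisection on [1.4, 1.6], the bracket provided by (8.43), and sets π ≈ 2α.

```python
import math

def sin_series(x, terms=30):
    """Partial sum of the sine series (8.41a)."""
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))

def cos_series(x, terms=30):
    """Partial sum of the cosine series (8.41b)."""
    return sum((-1) ** n * x ** (2 * n) / math.factorial(2 * n)
               for n in range(terms))

def smallest_positive_zero_of_cos(lo=1.4, hi=1.6, tol=1e-12):
    """Bisection on [lo, hi]; by (8.43), cos changes sign from + to - there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cos_series(mid) > 0:
            lo = mid   # zero lies to the right of mid
        else:
            hi = mid   # zero lies to the left of (or at) mid
    return (lo + hi) / 2

alpha = smallest_positive_zero_of_cos()
pi_approx = 2 * alpha   # pi := 2 * alpha as in Def. and Rem. 8.21(b)
```

Comparing `pi_approx` with the library constant `math.pi` is then a plausibility check, not a proof that the two notions of π coincide.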

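Theorem 8.22. We have the following identities:

\begin{align}
& \sin 0 = 0, \quad \cos 0 = 1, \tag{8.44a} \\
\forall_{z\in\mathbb{C}} \quad & \sin z = -\sin(-z), \quad \cos z = \cos(-z), \tag{8.44b} \\
\forall_{z,w\in\mathbb{C}} \quad & \sin(z+w) = \sin z \cos w + \cos z \sin w, \tag{8.44c} \\
\forall_{z,w\in\mathbb{C}} \quad & \cos(z+w) = \cos z \cos w - \sin z \sin w, \tag{8.44d} \\
\forall_{z\in\mathbb{C}} \quad & (\sin z)^2 + (\cos z)^2 = 1, \tag{8.44e} \\
& \cos\frac{\pi}{2} = 0, \quad \sin\frac{\pi}{2} = 1, \quad \forall_{x\in[0,\frac{\pi}{2}[} \ \cos x > 0, \tag{8.44f} \\
\forall_{z\in\mathbb{C}} \quad & \sin\left(z+\frac{\pi}{2}\right) = \cos z, \quad \cos\left(z+\frac{\pi}{2}\right) = -\sin z, \tag{8.44g} \\
\forall_{z\in\mathbb{C}} \quad & \sin(z+\pi) = -\sin z, \quad \cos(z+\pi) = -\cos z, \tag{8.44h} \\
\forall_{z\in\mathbb{C}} \quad & \sin(z+2\pi) = \sin z, \quad \cos(z+2\pi) = \cos z, \tag{8.44i} \\
& \lim_{z\to 0} \frac{\sin z}{z} = 1, \quad \lim_{z\to 0} \frac{\cos z - 1}{z^2} = -\frac{1}{2}. \tag{8.44j}
\end{align}

Identities (8.44i) can be restated as sine and cosine being periodic functions with period 2π.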

Proof. (8.44a) is immediate from (8.41) since, for z = 0, all summands of the sine series are 0 and all summands of the cosine series are 0, except the first one, which is $(-1)^0 \frac{0^0}{0!} = 1$.

(8.44b) is also immediate from (8.41), since $(-z)^{2n+1} = (-1)^{2n+1} z^{2n+1} = -z^{2n+1}$ and $(-z)^{2n} = (-1)^{2n} z^{2n} = z^{2n}$.

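(8.44c) and (8.44d) can be verified using the Cauchy product: According to (7.94),

\begin{align*}
\forall_{z,w\in\mathbb{C}} \quad & \sin z \cos w = \sum_{n=0}^{\infty} c_n, \quad \cos z \sin w = \sum_{n=0}^{\infty} d_n, \\
\text{where} \quad & c_n = \sum_{j=0}^{n} \frac{(-1)^j z^{2j+1}}{(2j+1)!} \, \frac{(-1)^{n-j} w^{2(n-j)}}{(2(n-j))!}, \\
& d_n = \sum_{j=0}^{n} \frac{(-1)^j z^{2j}}{(2j)!} \, \frac{(-1)^{n-j} w^{2(n-j)+1}}{(2(n-j)+1)!},
\end{align*}

that means, for each z, w ∈ C,

\begin{align*}
c_n + d_n &= \sum_{j=0}^{n} \frac{(-1)^n z^{2j+1}}{(2j+1)!} \, \frac{w^{2(n-j)}}{(2(n-j))!} + \sum_{j=0}^{n} \frac{(-1)^n z^{2j}}{(2j)!} \, \frac{w^{2(n-j)+1}}{(2(n-j)+1)!} \\
&= \sum_{j=0}^{n} \frac{(-1)^n z^{2j+1}}{(2j+1)!} \, \frac{w^{2n+1-(2j+1)}}{(2n+1-(2j+1))!} + \sum_{j=0}^{n} \frac{(-1)^n z^{2j}}{(2j)!} \, \frac{w^{2n+1-2j}}{(2n+1-2j)!} \\
&= (-1)^n \sum_{j=0}^{2n+1} \frac{z^j}{j!} \, \frac{w^{2n+1-j}}{(2n+1-j)!} = \frac{(-1)^n}{(2n+1)!} \sum_{j=0}^{2n+1} \binom{2n+1}{j} z^j w^{2n+1-j} \\
&= \frac{(-1)^n (z+w)^{2n+1}}{(2n+1)!},
\end{align*}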

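proving (8.44c). Similarly, according to (7.94),

\begin{align*}
\forall_{z,w\in\mathbb{C}} \quad & \cos z \cos w = \sum_{n=0}^{\infty} c_n, \quad \sin z \sin w = \sum_{n=0}^{\infty} d_n, \\
\text{where} \quad & c_n = \sum_{j=0}^{n} \frac{(-1)^j z^{2j}}{(2j)!} \, \frac{(-1)^{n-j} w^{2(n-j)}}{(2(n-j))!}, \\
& d_n = \sum_{j=0}^{n} \frac{(-1)^j z^{2j+1}}{(2j+1)!} \, \frac{(-1)^{n-j} w^{2(n-j)+1}}{(2(n-j)+1)!},
\end{align*}

that means, for each z, w ∈ C, $c_0 = 1$ and

\begin{align*}
\forall_{n\in\mathbb{N}} \quad c_n - d_{n-1} &= \sum_{j=0}^{n} \frac{(-1)^n z^{2j}}{(2j)!} \, \frac{w^{2(n-j)}}{(2(n-j))!} - \sum_{j=0}^{n-1} \frac{(-1)^{n-1} z^{2j+1}}{(2j+1)!} \, \frac{w^{2(n-1-j)+1}}{(2(n-1-j)+1)!} \\
&= \sum_{j=0}^{n} \frac{(-1)^n z^{2j}}{(2j)!} \, \frac{w^{2n-2j}}{(2n-2j)!} + \sum_{j=0}^{n-1} \frac{(-1)^n z^{2j+1}}{(2j+1)!} \, \frac{w^{2n-(2j+1)}}{(2n-(2j+1))!} \\
&= (-1)^n \sum_{j=0}^{2n} \frac{z^j}{j!} \, \frac{w^{2n-j}}{(2n-j)!} = \frac{(-1)^n}{(2n)!} \sum_{j=0}^{2n} \binom{2n}{j} z^j w^{2n-j} \\
&= \frac{(-1)^n (z+w)^{2n}}{(2n)!},
\end{align*}

proving (8.44d).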

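(8.44e): One computes for each z ∈ C:

\[ (\sin z)^2 + (\cos z)^2 \overset{(8.44b)}{=} \cos z \cos(-z) - \sin z \sin(-z) \overset{(8.44d)}{=} \cos(z - z) = \cos 0 = 1. \]

(8.44f): $\cos\frac{\pi}{2} = 0$ and cos x > 0 for $0 \le x < \frac{\pi}{2}$ hold according to the definition of π in Def. and Rem. 8.21(b). Then

\[ \left(\sin\frac{\pi}{2}\right)^2 \overset{(8.44e)}{=} 1 - \left(\cos\frac{\pi}{2}\right)^2 = 1 \quad \text{and} \quad \sin\frac{\pi}{2} \overset{(8.42)}{>} \frac{\pi}{2} - \frac{(\pi/2)^3}{6} \overset{(8.43)}{>} 1.4 - \frac{(1.6)^3}{6} > 0.7 > 0, \]

which, together, yield $\sin\frac{\pi}{2} = 1$.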


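(8.44g) is immediate from (8.44c), (8.44d), and (8.44f).

(8.44h): One obtains

\begin{align*}
& \sin\pi = \sin\left(\frac{\pi}{2} + \frac{\pi}{2}\right) \overset{(8.44c)}{=} 1 \cdot 0 + 0 \cdot 1 = 0, \\
& \cos\pi = \cos\left(\frac{\pi}{2} + \frac{\pi}{2}\right) \overset{(8.44d)}{=} 0 \cdot 0 - 1 \cdot 1 = -1, \\
\forall_{z\in\mathbb{C}} \quad & \sin(z+\pi) \overset{(8.44c)}{=} -\sin z + 0 = -\sin z, \\
\forall_{z\in\mathbb{C}} \quad & \cos(z+\pi) \overset{(8.44d)}{=} -\cos z + 0 = -\cos z.
\end{align*}

(8.44i): One obtains

\begin{align*}
& \sin(2\pi) = \sin(\pi + \pi) \overset{(8.44c)}{=} 0 + 0 = 0, \\
& \cos(2\pi) = \cos(\pi + \pi) \overset{(8.44d)}{=} (-1)(-1) - 0 = 1, \\
\forall_{z\in\mathbb{C}} \quad & \sin(z+2\pi) \overset{(8.44c)}{=} \sin z + 0 = \sin z, \\
\forall_{z\in\mathbb{C}} \quad & \cos(z+2\pi) \overset{(8.44d)}{=} \cos z - 0 = \cos z.
\end{align*}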

(8.44j): One obtains

\begin{align*}
\forall_{z\in\mathbb{C}\setminus\{0\}} \quad \frac{\sin z}{z} &= \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n}}{(2n+1)!} = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - + \dots, \\
\frac{\cos z - 1}{z^2} &= \sum_{n=0}^{\infty} \frac{(-1)^{n+1} z^{2n}}{(2(n+1))!} = -\frac{1}{2!} + \frac{z^2}{4!} - \frac{z^4}{6!} + - \dots.
\end{align*}

For both series on the right-hand side and each z ∈ C, we can estimate the absolute value of each summand by the corresponding summand of the exponential series for $e^{|z|}$ (cf. (8.36)), showing they have radius of convergence ∞ and are continuous by Cor. 8.10. In particular, their continuity in z = 0 proves (8.44j). ∎
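The identities (8.44c), (8.44d), and (8.44e) hold for complex arguments, which is easy to overlook when one thinks of sine and cosine geometrically. As a plausibility check only (a sketch, assuming that 40 terms of the series (8.41) suffice for arguments of modulus around 1; the helper names and the sample points `z`, `w` are arbitrary choices, not taken from the notes), one can evaluate both sides with partial sums:

```python
import math

def sin_series(z, terms=40):
    """Partial sum of (8.41a); z may be real or complex."""
    return sum((-1) ** n * z ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))

def cos_series(z, terms=40):
    """Partial sum of (8.41b); z may be real or complex."""
    return sum((-1) ** n * z ** (2 * n) / math.factorial(2 * n)
               for n in range(terms))

# Arbitrary non-real sample points.
z, w = complex(0.3, -1.2), complex(-0.7, 0.5)

# (8.44c): sin(z + w) = sin z cos w + cos z sin w
sin_gap = abs(sin_series(z + w)
              - (sin_series(z) * cos_series(w) + cos_series(z) * sin_series(w)))

# (8.44d): cos(z + w) = cos z cos w - sin z sin w
cos_gap = abs(cos_series(z + w)
              - (cos_series(z) * cos_series(w) - sin_series(z) * sin_series(w)))

# (8.44e): (sin z)^2 + (cos z)^2 = 1, even though |sin z| may exceed 1
pythagoras_gap = abs(sin_series(z) ** 2 + cos_series(z) ** 2 - 1)
```

All three gaps should be of the order of floating-point rounding error.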

Theorem 8.23. One has sin(R) = cos(R) = [−1, 1], i.e. the image of both sine and cosine is [−1, 1]. Moreover, for each k ∈ Z:

\begin{align}
& \text{sin is strictly increasing on } \left[-\frac{\pi}{2} + 2k\pi, \ \frac{\pi}{2} + 2k\pi\right], \tag{8.45a} \\
& \text{sin is strictly decreasing on } \left[\frac{\pi}{2} + 2k\pi, \ \frac{3\pi}{2} + 2k\pi\right], \tag{8.45b} \\
& \text{cos is strictly increasing on } [(2k-1)\pi, \ 2k\pi], \tag{8.45c} \\
& \text{cos is strictly decreasing on } [2k\pi, \ (2k+1)\pi], \tag{8.45d}
\end{align}

which, due to (8.44e), can be summarized (and visualized) by saying that, if x runs from 2kπ to 2(k + 1)π, then (cos x, sin x) runs once counterclockwise through the unit circle, starting at (1, 0).

Proof. From (8.44e), we know sin(R) ⊆ [−1, 1] and cos(R) ⊆ [−1, 1]. As

\[ \sin\frac{\pi}{2} \overset{(8.44f)}{=} 1, \quad \sin\left(-\frac{\pi}{2}\right) \overset{(8.44b)}{=} -1, \quad \cos 0 \overset{(8.44a)}{=} 1, \quad \cos\pi \overset{(8.44h)}{=} -\cos 0 = -1, \]

the continuity of sine and cosine together with the intermediate value Th. 7.57 implies sin(R) = cos(R) = [−1, 1].

From (8.42), we know $0 < x - \frac{x^3}{6} < \sin x$ and $\cos x < 1 - \frac{x^2}{2} + \frac{x^4}{24} < 1$ for each $x \in \,]0, \frac{\pi}{2}]$, implying

\[ \forall_{0 \le x < x+y \le \frac{\pi}{2}} \quad \cos(x+y) = \cos x \cos y - \sin x \sin y \le \cos x \cos y < \cos x, \]

showing cos is strictly decreasing on $[0, \frac{\pi}{2}]$. Then cos is strictly increasing on $[-\frac{\pi}{2}, 0]$ by (8.44b), sin is strictly increasing on $[0, \frac{\pi}{2}]$ and strictly decreasing on $[\frac{\pi}{2}, \pi]$ by (8.44g), implying sin is strictly increasing on $[-\frac{\pi}{2}, 0]$ and strictly decreasing on $[-\pi, -\frac{\pi}{2}]$ by (8.44b), i.e. sin is strictly increasing on $[\frac{3\pi}{2}, 2\pi]$ and strictly decreasing on $[\pi, \frac{3\pi}{2}]$ by (8.44i), implying cos is strictly decreasing on $[\frac{\pi}{2}, \pi]$ and strictly increasing on $[-\pi, -\frac{\pi}{2}]$ by (8.44g). Since this fixes the monotonicity properties of both sine and cosine over more than one period, the general statements in (8.45) are provided by (8.44i). ∎

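We now come to important complex number relations between sine, cosine, and the exponential function.

Theorem 8.24. One has the following formulas, relating the (complex) sine, cosine, and exponential function:

\begin{align}
\forall_{z\in\mathbb{C}} \quad & e^{iz} = \cos z + i \sin z \quad \text{(Euler formula)}, \tag{8.46a} \\
\forall_{z\in\mathbb{C}} \quad & \cos z = \frac{e^{iz} + e^{-iz}}{2}, \tag{8.46b} \\
\forall_{z\in\mathbb{C}} \quad & \sin z = \frac{e^{iz} - e^{-iz}}{2i}. \tag{8.46c}
\end{align}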

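Proof. Let z ∈ C. For (8.46a), one computes

\begin{align*}
e^{iz} &\overset{(8.37),(8.24)}{=} \sum_{n=0}^{\infty} \frac{(iz)^n}{n!} = \sum_{n=0}^{\infty} \left( \frac{(-1)^n z^{2n}}{(2n)!} + \frac{i\,(-1)^n z^{2n+1}}{(2n+1)!} \right) \\
&\overset{\text{Th. 7.95(a)}}{=} \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n}}{(2n)!} + i \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n+1}}{(2n+1)!} \overset{(8.41)}{=} \cos z + i \sin z.
\end{align*}

Then

\[ e^{iz} + e^{-iz} \overset{(8.46a)}{=} \cos z + i \sin z + \cos(-z) + i \sin(-z) = 2 \cos z \]

proves (8.46b), and

\[ e^{iz} - e^{-iz} \overset{(8.46a)}{=} \cos z + i \sin z - \cos(-z) - i \sin(-z) = 2i \sin z \]

proves (8.46c). ∎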
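The formulas (8.46) can be spot-checked with Python's standard `cmath` module. This is only an illustration under the assumption that `cmath.exp`, `cmath.sin`, and `cmath.cos` agree (up to rounding) with the power series used in these notes; the sample point `z` is an arbitrary choice.

```python
import cmath

z = complex(0.8, -0.6)   # arbitrary non-real sample point

# (8.46a): e^{iz} = cos z + i sin z
euler_gap = abs(cmath.exp(1j * z) - (cmath.cos(z) + 1j * cmath.sin(z)))

# (8.46b): cos z = (e^{iz} + e^{-iz}) / 2
cos_gap = abs(cmath.cos(z) - (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2)

# (8.46c): sin z = (e^{iz} - e^{-iz}) / (2i)
sin_gap = abs(cmath.sin(z) - (cmath.exp(1j * z) - cmath.exp(-1j * z)) / 2j)
```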


As a first application of (8.46), we can now determine all solutions to the equation $e^z = 1$ and all zeros (if any) of exp, sin, and cos:

Theorem 8.25. The set of (complex) solutions to the equation $e^z = 1$ consists precisely of all integer multiples of 2πi, the exponential function has no zeros (neither in R nor in C), and the set of all (real or complex) zeros of sine and cosine consists of a discrete set of real numbers. More precisely:

\begin{align}
\exp^{-1}\{1\} &= \{2k\pi i : k \in \mathbb{Z}\}, \tag{8.47a} \\
\exp^{-1}\{0\} &= \emptyset, \tag{8.47b} \\
\sin^{-1}\{0\} &= \{k\pi : k \in \mathbb{Z}\}, \tag{8.47c} \\
\cos^{-1}\{0\} &= \left\{(2k+1)\frac{\pi}{2} : k \in \mathbb{Z}\right\}. \tag{8.47d}
\end{align}


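Definition and Remark 8.26. We define tangent and cotangent by

\begin{align}
\tan : \underbrace{\mathbb{C} \setminus \cos^{-1}\{0\}}_{\mathbb{C} \setminus \{(2k+1)\frac{\pi}{2} :\, k \in \mathbb{Z}\} \text{ by (8.47d)}} \longrightarrow \mathbb{C}, \quad \tan z &:= \frac{\sin z}{\cos z}, \tag{8.48a} \\
\cot : \underbrace{\mathbb{C} \setminus \sin^{-1}\{0\}}_{\mathbb{C} \setminus \{k\pi :\, k \in \mathbb{Z}\} \text{ by (8.47c)}} \longrightarrow \mathbb{C}, \quad \cot z &:= \frac{\cos z}{\sin z}, \tag{8.48b}
\end{align}

respectively. Since sine and cosine are both continuous, tangent and cotangent are also both continuous on their respective domains. Both functions have period π, since, for each z in the respective domains,

\[ \tan(z+\pi) = \frac{\sin(z+\pi)}{\cos(z+\pi)} \overset{(8.44h)}{=} \frac{-\sin z}{-\cos z} = \tan z, \quad \cot(z+\pi) \overset{(8.44h)}{=} \frac{-\cos z}{-\sin z} = \cot z. \tag{8.49} \]

Since

\begin{align*}
& \lim_{n\to\infty} \sin\left(\frac{\pi}{2} - \frac{1}{n}\right) = \sin\frac{\pi}{2} = 1 \ \wedge \ \lim_{n\to\infty} \cos\left(\frac{\pi}{2} - \frac{1}{n}\right) = \cos\frac{\pi}{2} = 0 \\
& \qquad \wedge \ \cos\left(\frac{\pi}{2} - \frac{1}{n}\right) > 0 \quad \Rightarrow \quad \lim_{n\to\infty} \tan\left(\frac{\pi}{2} - \frac{1}{n}\right) = \infty, \\
& \lim_{n\to\infty} \sin\left(-\frac{\pi}{2} + \frac{1}{n}\right) = \sin\left(-\frac{\pi}{2}\right) = -1 \ \wedge \ \lim_{n\to\infty} \cos\left(-\frac{\pi}{2} + \frac{1}{n}\right) = \cos\left(-\frac{\pi}{2}\right) = 0 \\
& \qquad \wedge \ \cos\left(-\frac{\pi}{2} + \frac{1}{n}\right) > 0 \quad \Rightarrow \quad \lim_{n\to\infty} \tan\left(-\frac{\pi}{2} + \frac{1}{n}\right) = -\infty, \\
& \lim_{n\to\infty} \sin\frac{1}{n} = \sin 0 = 0 \ \wedge \ \lim_{n\to\infty} \cos\frac{1}{n} = \cos 0 = 1 \ \wedge \ \sin\frac{1}{n} > 0 \quad \Rightarrow \quad \lim_{n\to\infty} \cot\frac{1}{n} = \infty, \\
& \lim_{n\to\infty} \sin\left(\pi - \frac{1}{n}\right) = \sin\pi = 0 \ \wedge \ \lim_{n\to\infty} \cos\left(\pi - \frac{1}{n}\right) = \cos\pi = -1 \\
& \qquad \wedge \ \sin\left(\pi - \frac{1}{n}\right) > 0 \quad \Rightarrow \quad \lim_{n\to\infty} \cot\left(\pi - \frac{1}{n}\right) = -\infty,
\end{align*}

we obtain $\tan(\mathbb{R} \setminus \cos^{-1}\{0\}) = \cot(\mathbb{R} \setminus \sin^{-1}\{0\}) = \mathbb{R}$.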

For each k ∈ Z,

\begin{align}
& \text{tan is strictly increasing on } \left]-\frac{\pi}{2} + k\pi, \ \frac{\pi}{2} + k\pi\right[, \tag{8.50a} \\
& \text{cot is strictly decreasing on } \left]k\pi, \ (k+1)\pi\right[: \tag{8.50b}
\end{align}

On $]0, \frac{\pi}{2}[$, sin is strictly increasing and cos is strictly decreasing, i.e. tan is strictly increasing and cot is strictly decreasing. Since tan(−x) = sin(−x)/cos(−x) = −tan(x), on $]-\frac{\pi}{2}, 0[$, tan is strictly increasing and cot is strictly decreasing. Taking into account the signs of tan and cot on the respective intervals and their π-periodicity according to (8.49) proves (8.50).

Definition and Remark 8.27. Since we have seen sin to be strictly increasing on $[-\frac{\pi}{2}, \frac{\pi}{2}]$ with image [−1, 1], cos to be strictly decreasing on [0, π] with image [−1, 1], tan to be strictly increasing on $]-\frac{\pi}{2}, \frac{\pi}{2}[$ with image R, and cot to be strictly decreasing on ]0, π[ with image R; and since all four functions are continuous, Th. 7.60 implies the existence of inverse functions, denoted by

\begin{align}
\arcsin &: [-1, 1] \longrightarrow [-\pi/2, \pi/2], \tag{8.51a} \\
\arccos &: [-1, 1] \longrightarrow [0, \pi], \tag{8.51b} \\
\arctan &: \mathbb{R} \longrightarrow \,]-\pi/2, \pi/2[, \tag{8.51c} \\
\operatorname{arccot} &: \mathbb{R} \longrightarrow \,]0, \pi[, \tag{8.51d}
\end{align}

respectively, where all four inverse functions are continuous, arcsin is strictly increasing, arccos is strictly decreasing, arctan is strictly increasing, and arccot is strictly decreasing.

Of course, using (8.45) and (8.50), respectively, one can also obtain the inverse functions on different intervals, and, in the literature, such inverse functions are, indeed, considered as well. Somewhat confusingly, it is common to denote all these different functions by the same symbols, namely the ones introduced in (8.51). Here, we will not need to pursue this any further, i.e. we will only consider the inverse functions precisely as defined in (8.51), which are also known as the principal inverse functions of sin, cos, tan, and cot, respectively.

8.5 Polar Form of Complex Numbers, Fundamental Theorem of Algebra

Theorem 8.28. For each complex number z ∈ C, there exist real numbers r ≥ 0 and ϕ ∈ R such that

\[ z = r e^{i\varphi}. \tag{8.52} \]

Moreover, if (8.52) holds with r ≥ 0 and ϕ ∈ R, then r is the modulus of z and, for z ≠ 0, ϕ is uniquely determined up to addition of an integer multiple of 2π, i.e.

\[ \forall_{z\in\mathbb{C}\setminus\{0\}} \quad \left( z = r e^{i\varphi_1} = r e^{i\varphi_2} \ \wedge \ r \ge 0 \quad \Rightarrow \quad r = |z| \ \wedge \ \exists_{k\in\mathbb{Z}} \ \varphi_1 - \varphi_2 = 2\pi k \right). \tag{8.53} \]

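Proof. For z = 0, there is nothing to prove, so we assume z ≠ 0 and set r := |z|. We write z = x + iy with x, y ∈ R, first assuming y ≥ 0. Then

\[ \frac{z}{r} = \xi + i\eta, \quad \text{where} \quad \xi = \frac{x}{r}, \quad \eta = \frac{y}{r} \ge 0, \quad \xi^2 + \eta^2 = 1. \tag{8.54} \]

In particular, −1 ≤ ξ ≤ 1. Thus, letting

\[ \varphi := \arccos \xi, \]

we obtain ϕ ∈ [0, π], ξ = cos ϕ, and sin ϕ ≥ 0, yielding

\[ \sin\varphi = \sqrt{1 - (\cos\varphi)^2} = \sqrt{1 - \xi^2} \overset{(8.54)}{=} \eta. \]

In consequence,

\[ \frac{z}{r} = \xi + i\eta = \cos\varphi + i \sin\varphi \overset{(8.46a)}{=} e^{i\varphi}, \]

as desired. If y ≤ 0, then the above, applied to the complex conjugate $\bar{z}$, shows the existence of ψ ∈ R such that $\bar{z} = x - iy = r e^{i\psi} = r \cos\psi + i r \sin\psi$. Letting ϕ := −ψ, we, once again, have $z = r \cos\psi - i r \sin\psi = r e^{-i\psi} = r e^{i\varphi}$, as desired, completing the existence proof for the representation (8.52). Now assume (8.52) holds with r ≥ 0. Then

\[ |z| = r |e^{i\varphi}| = r \sqrt{(\sin\varphi)^2 + (\cos\varphi)^2} = r. \]

Finally, if $r e^{i\varphi_1} = r e^{i\varphi_2}$ with r > 0, then $e^{i(\varphi_1 - \varphi_2)} = 1$, i.e. $i(\varphi_1 - \varphi_2) \in \{2k\pi i : k \in \mathbb{Z}\}$ by (8.47a). ∎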
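The existence part of the proof is constructive and can be followed step by step in code. The sketch below mirrors the argument (ϕ := arccos ξ for y ≥ 0, reflection via the conjugate for y < 0); the function name `polar_form` is hypothetical, and `math.acos` is assumed to implement the principal arccos of (8.51b). The clamping of ξ to [−1, 1] is a floating-point safeguard that has no counterpart in the proof.

```python
import cmath
import math

def polar_form(z):
    """Return (r, phi) with z = r * e^{i phi}, following the proof of Th. 8.28."""
    r = abs(z)
    if r == 0:
        return 0.0, 0.0                    # for z = 0, every phi works
    x, y = z.real, z.imag
    xi = max(-1.0, min(1.0, x / r))        # guard against rounding outside [-1, 1]
    phi = math.acos(xi)                    # phi in [0, pi], case y >= 0
    if y < 0:
        phi = -phi                         # apply the first case to conj(z), then negate
    return r, phi

samples = [complex(1, 1), complex(-2, 0.5), complex(0.3, -4), complex(-1, -1), complex(-1, 0)]
reconstruction_errors = [abs(r * cmath.exp(1j * phi) - z)
                         for z in samples
                         for (r, phi) in [polar_form(z)]]
```

This choice returns an argument in ]−π, π], which is one of the normalizations found in the literature; other conventions differ from it by an integer multiple of 2π, in line with (8.53).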

Definition and Remark 8.29. The representation of z ∈ C given by (8.52) is called its polar form, where (r, ϕ) are also called polar coordinates of z, and ϕ is called an argument of z. For z ≠ 0, one can fix the argument uniquely by the additional requirement ϕ ∈ [0, 2π[ (but one also finds other choices, for example ϕ ∈ ]−π, π], in the literature). The above terminology is consistent with the common use of calling (r, ϕ) polar coordinates of the vector z = (x, y) ∈ R² (= C) (in contrast to the Cartesian coordinates (x, y)), where r constitutes the distance of the point z = (x, y) from the origin (0, 0) and ϕ is the angle between the vector z = (x, y) and the x-axis (cf. the three introductory paragraphs of the previous Sec. 8.4). As promised, we can now better understand the geometric interpretation of complex multiplication already described in Rem. 5.10: If $z_1 = r_1 e^{i\varphi_1}$ and $z_2 = r_2 e^{i\varphi_2}$, then $z_1 z_2 = r_1 r_2 e^{i(\varphi_1 + \varphi_2)}$, i.e. complex multiplication, indeed, means multiplying absolute values and adding arguments.

Corollary 8.30. If z ∈ C, then |z| = 1 holds if, and only if, there exists ϕ ∈ R such that $z = e^{i\varphi}$; in other words, the map

\[ f : \mathbb{R} \longrightarrow \{z \in \mathbb{C} : |z| = 1\}, \quad f(\varphi) := e^{i\varphi}, \tag{8.55} \]

is surjective. Moreover, f(ϕ1) = f(ϕ2) holds if, and only if, ϕ1 − ϕ2 = 2πk for some k ∈ Z.

Proof. Everything is immediate from Th. 8.28. ∎

Corollary 8.31 (Roots of Unity). For each n ∈ N, the equation $z^n = 1$ has precisely n distinct solutions $\zeta_1, \dots, \zeta_n \in \mathbb{C}$, where

\[ \forall_{k=1,\dots,n} \quad \zeta_k := e^{k 2\pi i/n} \overset{(8.46a)}{=} \cos\frac{k 2\pi}{n} + i \sin\frac{k 2\pi}{n} = \zeta_1^k. \tag{8.56} \]

The numbers $\zeta_1, \dots, \zeta_n$ defined in (8.56) are called the nth roots of unity.

Proof. It is $\zeta_k^n = e^{k 2\pi i} = 1$ for each k ∈ {1, . . . , n} and the $\zeta_1, \dots, \zeta_n$ are all distinct by Cor. 8.30, since, for k, l ∈ {1, . . . , n} with k ≠ l, (k − l)/n ∉ Z. As $\zeta_1, \dots, \zeta_n$ are n distinct zeros of the polynomial P : C −→ C, $P(z) := z^n - 1$, and P has at most n zeros by Th. 6.6(a), $\zeta_1, \dots, \zeta_n$ constitute all solutions to $z^n = 1$. ∎
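Formula (8.56) translates directly into a short computation. The following sketch (the function name `roots_of_unity` is hypothetical; `cmath.exp` is assumed to agree with the exponential function of these notes up to rounding) lists the nth roots of unity and checks, for n = 6, the three properties used in the proof: each root solves $z^n = 1$, the roots are pairwise distinct, and $\zeta_k = \zeta_1^k$.

```python
import cmath
import math

def roots_of_unity(n):
    """Return [zeta_1, ..., zeta_n] with zeta_k = e^{k * 2*pi*i / n}, cf. (8.56)."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(1, n + 1)]

n = 6
zetas = roots_of_unity(n)

# Each zeta_k solves z^n = 1 (up to rounding).
max_residual = max(abs(z ** n - 1) for z in zetas)

# The roots are pairwise distinct: the minimal mutual distance is clearly positive.
min_distance = min(abs(zetas[k] - zetas[l])
                   for k in range(n) for l in range(k + 1, n))

# zeta_k equals the k-th power of zeta_1.
max_power_gap = max(abs(zetas[k - 1] - zetas[0] ** k) for k in range(1, n + 1))
```

For n = 6, neighbouring roots lie at distance 1 from each other (they form a regular hexagon inscribed in the unit circle), so `min_distance` should be close to 1.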

We are now in a position to prove one of the central results of analysis and algebra, namely the fundamental theorem of algebra. The following proof does not need any tools beyond the ones provided by this class; it is actually mainly founded on continuous functions attaining a min and a max on compact sets according to Th. 7.54 and the existence of nth roots of unity according to Cor. 8.31.

Theorem 8.32 (Fundamental Theorem of Algebra). Every polynomial P : C −→ C, $P(z) := \sum_{j=0}^{n} a_j z^j$, of degree n ≥ 1 (i.e. $a_0, \dots, a_n \in \mathbb{C}$ with $a_n \ne 0$) has at least one zero $z_0 \in \mathbb{C}$.

Proof. Dividing the equation P(z) = 0 by $a_n \ne 0$, it suffices to consider the case $a_n = 1$. We therefore assume

\[ \forall_{z\in\mathbb{C}} \quad P(z) = z^n + a_{n-1} z^{n-1} + \dots + a_1 z + a_0. \]

Claim 1. The function |P| attains its global min on C, i.e. there exists $z_0 \in \mathbb{C}$ such that |P| is minimal in $z_0$.

Proof. We first note

\[ \forall_{z \ne 0} \quad P(z) = z^n \big(1 + r(z)\big), \quad \text{where} \quad r(z) := \frac{a_{n-1}}{z} + \dots + \frac{a_0}{z^n}. \]

Set $M := |a_0| + \dots + |a_{n-1}|$ and $R := \max\{1, 2M\}$. Then

\[ \forall_{|z| \ge R} \quad |r(z)| \overset{|z| \ge 1}{\le} \frac{M}{|z|} \overset{|z| \ge 2M}{\le} \frac{1}{2} \]

and, thus,

\[ \forall_{|z| \ge R} \quad |P(z)| = |z|^n \, \big|1 + r(z)\big| \ge \frac{|z|^n}{2} \ge M. \]

This estimate together with $|P(0)| = |a_0| \le M$ shows that the min of |P| on the compact disk $\overline{B}_R(0)$ (see Ex. 7.47(a)) (such a min $z_0 \in \overline{B}_R(0)$ exists due to Th. 7.54) must be the global min of |P| on C. ▲

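Claim 2. If |P| has a min in $z_0 \in \mathbb{C}$, then $P(z_0) = 0$.

Proof. Proceeding by contraposition, we assume $P(z_0) \ne 0$ and show that |P| does not have a min in $z_0$. We need to construct $z_1 \in \mathbb{C}$ such that $|P(z_1)| < |P(z_0)|$. To this end, define

\[ p : \mathbb{C} \longrightarrow \mathbb{C}, \quad p(z) := \frac{P(z_0 + z)}{P(z_0)}. \]

Then p is still a polynomial of degree n. Since p(0) = 1,

\[ \exists_{k\in\{1,\dots,n\}} \ \exists_{b_k,\dots,b_n\in\mathbb{C}} \ \forall_{z\in\mathbb{C}} \quad p(z) = 1 + \sum_{j=k}^{n} b_j z^j, \quad b_k \ne 0. \]

Write $-b_k^{-1}$ in polar form, i.e. $-b_k^{-1} = r e^{i\varphi}$ with $r \in \mathbb{R}^+$ and ϕ ∈ R. Define

\[ \beta := \sqrt[k]{r} \, e^{i\varphi/k} \quad \left(\text{i.e. } \beta^k = r e^{i\varphi} = -b_k^{-1}\right) \]

and

\[ q : \mathbb{C} \longrightarrow \mathbb{C}, \quad q(z) := p(\beta z) = 1 + b_k \beta^k z^k + \sum_{j=k+1}^{n} b_j \beta^j z^j = 1 - z^k + z^{k+1} S(z), \]

where S is the polynomial

\[ S : \mathbb{C} \longrightarrow \mathbb{C}, \quad S(z) := \sum_{j=0}^{n-k-1} b_{k+1+j} \, \beta^{k+1+j} z^j \quad (S \equiv 0 \text{ in case } k = n). \]

Then, according to Th. 7.54,

\[ \exists_{C\in\mathbb{R}^+} \ \forall_{z\in\overline{B}_1(0)} \quad |S(z)| \le C. \]

Letting

\[ c := \min\{1, C^{-1}\}, \]

one obtains

\[ \forall_{0<|z|<c} \quad \big|z^{k+1} S(z)\big| \le C \, |z|^{k+1} < |z|^k \]

and, thus,

\[ \forall_{x\in\,]0,c[} \quad |q(x)| \le 1 - x^k + \big|x^{k+1} S(x)\big| < 1 - x^k + x^k = 1. \]

Thus, finally,

\[ \forall_{x\in\,]0,c[} \quad \frac{|P(z_0 + \beta x)|}{|P(z_0)|} = |p(\beta x)| = |q(x)| < 1, \]

showing |P| does not have a min in $z_0$. ▲

Combining Claims 1 and 2 completes the proof of the theorem. ∎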
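The construction in the proof of Claim 2 is explicit enough to be carried out numerically. The sketch below is an illustration only: the function names are hypothetical, the tolerance `eps` used to decide whether a coefficient counts as nonzero is an assumption forced by floating-point arithmetic, and instead of computing the constant c of the proof, the code simply halves x until the strict decrease guaranteed by Claim 2 is observed.

```python
import cmath

def evaluate(coeffs, z):
    """Horner evaluation of P(z) = sum_j coeffs[j] * z**j."""
    result = 0j
    for a in reversed(coeffs):
        result = result * z + a
    return result

def shifted_coeffs(coeffs, z0):
    """Coefficients of the polynomial z -> P(z0 + z), via repeated Horner steps."""
    c = [complex(a) for a in coeffs]
    n = len(c) - 1
    for i in range(n):
        for j in range(n - 1, i - 1, -1):
            c[j] += z0 * c[j + 1]
    return c

def descent_step(coeffs, z0, eps=1e-12, max_halvings=200):
    """Return z1 with |P(z1)| < |P(z0)|, assuming P(z0) != 0 and deg P >= 1."""
    p0 = evaluate(coeffs, z0)
    b = [c / p0 for c in shifted_coeffs(coeffs, z0)]      # p(z) = P(z0 + z) / P(z0)
    k = next(j for j in range(1, len(b)) if abs(b[j]) > eps)  # first nonzero b_k
    r, phi = cmath.polar(-1 / b[k])                       # -1/b_k = r e^{i phi}
    beta = r ** (1 / k) * cmath.exp(1j * phi / k)         # beta**k = -1/b_k
    x = 1.0
    for _ in range(max_halvings):
        z1 = z0 + beta * x
        if abs(evaluate(coeffs, z1)) < abs(p0):
            return z1
        x /= 2                                            # Claim 2: small x > 0 works
    raise ArithmeticError("no descent found; P(z0) may be numerically zero")

# P(z) = z^2 + 1 at z0 = 0: here beta = i, and the step lands (almost) on the zero i.
z1 = descent_step([1, 0, 1], 0j)
```

Iterating `descent_step` is not, by itself, a reliable root finder (the step sizes may shrink too fast); the existence of a zero in the proof rests on Claim 1, i.e. on compactness, not on iteration.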


Corollary 8.33. (a) For every polynomial P : C −→ C of degree n ≥ 1, there exist numbers $c, \zeta_1, \dots, \zeta_n \in \mathbb{C}$ such that

\[ P(z) = c \prod_{j=1}^{n} (z - \zeta_j) = c (z - \zeta_1)(z - \zeta_2) \cdots (z - \zeta_n) \tag{8.57} \]

(the $\zeta_1, \dots, \zeta_n$ are precisely all the zeros of P, some or all of which might be identical).

(b) For every polynomial P : R −→ R of degree n ≥ 1, there exist numbers $n_1, n_2 \in \mathbb{N}_0$ and $c, \xi_1, \dots, \xi_{n_1}, \alpha_1, \dots, \alpha_{n_2}, \beta_1, \dots, \beta_{n_2} \in \mathbb{R}$ such that

\[ n = n_1 + 2 n_2, \tag{8.58a} \]

and

\[ P(x) = c \prod_{j=1}^{n_1} (x - \xi_j) \prod_{j=1}^{n_2} (x^2 + \alpha_j x + \beta_j). \tag{8.58b} \]

Proof. For (a), one just combines Th. 8.32 with Rem. 6.7.

(b): If P has only real coefficients, then we can take complex conjugates to obtain

\[ P(\zeta) = 0 \quad \Rightarrow \quad P(\bar{\zeta}) = \overline{P(\zeta)} = 0, \tag{8.59} \]

showing that the nonreal zeros of P (if any) must occur in conjugate pairs. Moreover,

\[ (x - \zeta)(x - \bar{\zeta}) = x^2 - (\zeta + \bar{\zeta}) x + \zeta \bar{\zeta} = x^2 - 2x \operatorname{Re} \zeta + |\zeta|^2, \tag{8.60} \]

showing that (8.57) implies (8.58). ∎

9 Differential Calculus

9.1 Definition of Differentiability and Rules

The basic idea of differential calculus is to locally approximate nonlinear functions f by linear functions. In our case, f will be defined on a subset M of R and, given ξ ∈ M and R-valued f, we will investigate the question if we can define a number f′(ξ) ∈ R that represents the slope of the graph of f at ξ such that the line through (ξ, f(ξ)) with slope f′(ξ) (called the tangent of f in ξ) can be considered as a local approximation of the graph of f.

If such a local approximation of f in ξ is at all reasonable, then, for x ≠ ξ,

\[ \frac{f(x) - f(\xi)}{x - \xi} \]

should provide “good” approximations of f′(ξ) if x tends to ξ. This leads to the following Def. 9.1, where we also allow C-valued functions (while the above-described geometric interpretation only works for R-valued functions, it can be applied to both the real and the imaginary parts of a C-valued function, cf. Rem. 9.2 below); but note that we do not consider differentiability of functions f : C −→ C, which would lead to the notion of complex differentiability or holomorphicity, which is studied in the field of Complex Analysis and is beyond the scope of this class.

Definition 9.1. Let a < b, f : ]a, b[ −→ K (a = −∞, b = ∞ is admissible), and ξ ∈ ]a, b[. Then f is said to be differentiable at ξ if, and only if, the following limit in (9.1) exists in the sense of Def. 8.17 (where $x \mapsto \frac{f(x) - f(\xi)}{x - \xi}$ plays the role of x ↦ f(x) in Def. 8.17). The limit is then called the derivative of f in ξ. Many symbols are used in the literature to denote derivatives; the following provides a selection:

\[ f'(\xi) := \partial_x f(\xi) := \frac{df(\xi)}{dx} := \lim_{x\to\xi} \frac{f(x) - f(\xi)}{x - \xi} = \lim_{h\to 0} \frac{f(\xi + h) - f(\xi)}{h}. \tag{9.1} \]

Note both limits occurring in (9.1) are, indeed, identical, since the sequence $(x_k)_{k\in\mathbb{N}}$ in ]a, b[ converges to ξ if, and only if, the sequence $(h_k)_{k\in\mathbb{N}}$ with $h_k := x_k - \xi$ converges to 0. The number in (9.1) (if it exists) is also called a differential quotient, whereas $\frac{f(x) - f(\xi)}{x - \xi}$ is known as a difference quotient.

f is called differentiable if, and only if, it is differentiable at each ξ ∈ ]a, b[. In that case, one calls the function

\[ f' : \,]a, b[\, \longrightarrow \mathbb{K}, \quad x \mapsto f'(x), \tag{9.2} \]

the derivative of f.
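To get a feeling for the limit in (9.1), one can tabulate difference quotients for shrinking h. The sketch below does this for the exponential function at ξ = 0, where the derivative equals 1; the helper name `difference_quotient` and the choice of step sizes $h = 10^{-k}$ are illustrative assumptions, and floating-point cancellation limits how small h can usefully be.

```python
import math

def difference_quotient(f, xi, h):
    """(f(xi + h) - f(xi)) / h for h != 0, cf. (9.1)."""
    return (f(xi + h) - f(xi)) / h

# Difference quotients of exp at xi = 0 for h = 10^-1, ..., 10^-6.
steps = [10.0 ** -k for k in range(1, 7)]
quotients = [difference_quotient(math.exp, 0.0, h) for h in steps]

# Distance of each quotient from the derivative exp'(0) = 1.
errors = [abs(q - 1.0) for q in quotients]
```

For this example, the errors shrink roughly in proportion to h, which matches the expansion $(e^h - 1)/h = 1 + h/2 + \dots$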

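Remark 9.2. In the situation of Def. 9.1, the complex-valued function f : ]a, b[ −→ C is differentiable at ξ ∈ ]a, b[ if, and only if, both functions Re f, Im f : ]a, b[ −→ R are differentiable at ξ, and, in that case,

\[ f'(\xi) = (\operatorname{Re} f)'(\xi) + i \, (\operatorname{Im} f)'(\xi). \tag{9.3} \]

Indeed, we merely have to note

\[ \forall_{x,\xi\in\,]a,b[,\ x \ne \xi} \quad \frac{f(x) - f(\xi)}{x - \xi} = \frac{\operatorname{Re} f(x) - \operatorname{Re} f(\xi)}{x - \xi} + i \, \frac{\operatorname{Im} f(x) - \operatorname{Im} f(\xi)}{x - \xi} \tag{9.4} \]

and that, by (7.2), a sequence $(z_n)_{n\in\mathbb{N}}$ in C converges to ζ ∈ C if, and only if, both $\lim_{n\to\infty} \operatorname{Re} z_n = \operatorname{Re} \zeta$ and $\lim_{n\to\infty} \operatorname{Im} z_n = \operatorname{Im} \zeta$ hold.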

Definition 9.3. If f : ]a, b[ −→ R as in Def. 9.1 is differentiable at ξ ∈ ]a, b[, then the graph of the affine function

\[ L : \mathbb{R} \longrightarrow \mathbb{R}, \quad L(x) := f(\xi) + f'(\xi)(x - \xi), \tag{9.5} \]

i.e. the line through (ξ, f(ξ)) with slope f′(ξ), is called the tangent to the graph of f at ξ.


Theorem 9.4. If f : ]a, b[ −→ K as in Def. 9.1 is differentiable at ξ ∈ ]a, b[, then it is continuous at ξ. In particular, if f is everywhere differentiable, then it is everywhere continuous.

Proof. Let $(x_k)_{k\in\mathbb{N}}$ be a sequence in ]a, b[ \ {ξ} such that $\lim_{k\to\infty} x_k = \xi$. Then

\[ \lim_{k\to\infty} \big(f(x_k) - f(\xi)\big) = \lim_{k\to\infty} \frac{(x_k - \xi)\big(f(x_k) - f(\xi)\big)}{x_k - \xi} = 0 \cdot f'(\xi) = 0, \tag{9.6} \]

proving the continuity of f in ξ. ∎

Example 9.5. (a) For each a, b ∈ K, the affine function f : R −→ K, f(x) := ax + b, is differentiable with f′(x) = a for each x ∈ R: If x ∈ R and $(h_k)_{k\in\mathbb{N}}$ is a sequence with $h_k \ne 0$ such that $\lim_{k\to\infty} h_k = 0$, then

\[ \lim_{k\to\infty} \frac{f(x + h_k) - f(x)}{h_k} = \lim_{k\to\infty} \frac{a(x + h_k) + b - ax - b}{h_k} = \lim_{k\to\infty} \frac{a h_k}{h_k} = a. \tag{9.7} \]

In particular, each constant function f ≡ b has derivative f′ ≡ 0.

(b) For each c ∈ K, the function f : R −→ K, $f(x) := e^{cx}$, is differentiable with $f'(x) = c\,e^{cx}$ for each x ∈ R (in particular, c = 1 yields $f'(x) = e^x$ for $f(x) = e^x$, and c = ln a yields $f'(x) = (\ln a)\,a^x$ for $f(x) = a^x = e^{x \ln a}$, $a \in \mathbb{R}^+$): The case c = 0 was treated in (a). Thus, let c ≠ 0. If x ∈ R and $(h_k)_{k\in\mathbb{N}}$ is a sequence with $h_k \ne 0$ such that $\lim_{k\to\infty} h_k = 0$, then

\[ \lim_{k\to\infty} \frac{f(x + h_k) - f(x)}{h_k} = \lim_{k\to\infty} \frac{e^{cx + c h_k} - e^{cx}}{h_k} = c\,e^{cx} \lim_{k\to\infty} \frac{e^{c h_k} - 1}{c h_k} \overset{(8.32)}{=} c\,e^{cx}. \tag{9.8} \]

(c) The sine and the cosine function f, g : R −→ R, f(x) := sin x, g(x) := cos x, are differentiable with f′(x) = cos x and g′(x) = −sin x for each x ∈ R: If x ∈ R and $(h_k)_{k\in\mathbb{N}}$ is a sequence with $h_k \ne 0$ such that $\lim_{k\to\infty} h_k = 0$, then

\begin{align}
\lim_{k\to\infty} \frac{f(x + h_k) - f(x)}{h_k} &= \lim_{k\to\infty} \frac{\sin(x + h_k) - \sin x}{h_k} \overset{(8.44c)}{=} \lim_{k\to\infty} \frac{\sin x \cos h_k + \cos x \sin h_k - \sin x}{h_k} \notag \\
&= \sin x \lim_{k\to\infty} \frac{h_k (\cos h_k - 1)}{h_k^2} + \cos x \lim_{k\to\infty} \frac{\sin h_k}{h_k} \notag \\
&\overset{(8.44j)}{=} (\sin x) \cdot 0 \cdot \left(-\frac{1}{2}\right) + (\cos x) \cdot 1 = \cos x. \tag{9.9}
\end{align}

The proof of g′(x) = −sin x is left as an exercise.

(d) The absolute value function f : R −→ R, f(x) := |x|, is not differentiable at ξ = 0:

\begin{align}
\lim_{n\to\infty} \frac{f(0 + \frac{1}{n}) - f(0)}{\frac{1}{n}} &= \lim_{n\to\infty} 1 = 1, \tag{9.10a} \\
\lim_{n\to\infty} \frac{f(0 - \frac{1}{n}) - f(0)}{-\frac{1}{n}} &= \lim_{n\to\infty} \frac{\frac{1}{n}}{-\frac{1}{n}} = -1, \tag{9.10b}
\end{align}

showing that $\frac{f(0 + h) - f(0)}{h}$ does not have a limit for h → 0.
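The two one-sided computations in (9.10) can be reproduced verbatim. In the following sketch (the list names `right_quotients` and `left_quotients` are arbitrary), the difference quotients of the absolute value function at 0 are evaluated along the sequences 1/n and −1/n used above:

```python
# Difference quotients of f(x) = |x| at xi = 0 along h = 1/n, cf. (9.10a).
right_quotients = [(abs(0 + 1 / n) - abs(0)) / (1 / n) for n in range(1, 11)]

# Difference quotients of f(x) = |x| at xi = 0 along h = -1/n, cf. (9.10b).
left_quotients = [(abs(0 - 1 / n) - abs(0)) / (-1 / n) for n in range(1, 11)]
```

Every entry of the first list is 1 and every entry of the second list is −1, so no single number can serve as the limit in (9.1).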

Remark 9.6. As just seen in Ex. 9.5(d), the absolute value function shows continuous functions do not need to be differentiable. Somewhat surprisingly, with a bit more effort, one can even construct continuous functions f : R −→ R that are not differentiable at any x ∈ R (see Appendix J.1).

Theorem 9.7. Let a < b, f, g : ]a, b[ −→ K (a = −∞, b = ∞ is admissible), and ξ ∈ ]a, b[. Assume f and g are differentiable at ξ.

(a) For each λ ∈ K, λf is differentiable at ξ and (λf)′(ξ) = λf′(ξ).

(b) f + g is differentiable at ξ and (f + g)′(ξ) = f′(ξ) + g′(ξ).

(c) Product Rule: fg is differentiable at ξ and (fg)′(ξ) = f′(ξ)g(ξ) + f(ξ)g′(ξ).

(d) Quotient Rule: If g(ξ) ≠ 0, then f/g is differentiable at ξ and

\[ (f/g)'(\xi) = \frac{f'(\xi) g(\xi) - f(\xi) g'(\xi)}{(g(\xi))^2}, \quad \text{in particular} \quad (1/g)'(\xi) = -\frac{g'(\xi)}{(g(\xi))^2}. \]

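Proof. Let $(h_k)_{k\in\mathbb{N}}$ be a sequence with $h_k \ne 0$ such that $\lim_{k\to\infty} h_k = 0$.

For (a), one computes

\[ \lim_{k\to\infty} \frac{(\lambda f)(\xi + h_k) - (\lambda f)(\xi)}{h_k} = \lim_{k\to\infty} \frac{\lambda f(\xi + h_k) - \lambda f(\xi)}{h_k} = \lambda \lim_{k\to\infty} \frac{f(\xi + h_k) - f(\xi)}{h_k} = \lambda f'(\xi). \]

For (b), one computes

\begin{align*}
\lim_{k\to\infty} \frac{(f + g)(\xi + h_k) - (f + g)(\xi)}{h_k} &= \lim_{k\to\infty} \frac{f(\xi + h_k) - f(\xi) + g(\xi + h_k) - g(\xi)}{h_k} \\
&= \lim_{k\to\infty} \frac{f(\xi + h_k) - f(\xi)}{h_k} + \lim_{k\to\infty} \frac{g(\xi + h_k) - g(\xi)}{h_k} = f'(\xi) + g'(\xi).
\end{align*}

For (c), one computes

\begin{align*}
\lim_{k\to\infty} \frac{(fg)(\xi + h_k) - (fg)(\xi)}{h_k} &= \lim_{k\to\infty} \frac{f(\xi + h_k) g(\xi + h_k) - f(\xi) g(\xi + h_k) + f(\xi) g(\xi + h_k) - f(\xi) g(\xi)}{h_k} \\
&= \lim_{k\to\infty} g(\xi + h_k) \lim_{k\to\infty} \frac{f(\xi + h_k) - f(\xi)}{h_k} + f(\xi) \lim_{k\to\infty} \frac{g(\xi + h_k) - g(\xi)}{h_k} \\
&= f'(\xi) g(\xi) + f(\xi) g'(\xi),
\end{align*}

where, in the last equality, we used the continuity of g in ξ according to Th. 9.4.

For (d), one first proves the special case f ≡ 1 by

\[ \lim_{k\to\infty} \frac{(1/g)(\xi + h_k) - (1/g)(\xi)}{h_k} = \lim_{k\to\infty} \frac{g(\xi) - g(\xi + h_k)}{g(\xi + h_k) \, g(\xi) \, h_k} = -\frac{g'(\xi)}{(g(\xi))^2}, \]

which implies the general case using (c):

\[ (f/g)'(\xi) = \left(f \cdot \frac{1}{g}\right)'(\xi) = \frac{f'(\xi)}{g(\xi)} - \frac{f(\xi) g'(\xi)}{(g(\xi))^2} = \frac{f'(\xi) g(\xi) - f(\xi) g'(\xi)}{(g(\xi))^2}. \]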


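Example 9.8. (a) Each polynomial is differentiable and the derivative is, again, a polynomial. More precisely,

\[ P : \mathbb{R} \longrightarrow \mathbb{K}, \quad P(x) = \sum_{j=0}^{n} a_j x^j, \quad a_j \in \mathbb{K} \quad \Rightarrow \quad P' : \mathbb{R} \longrightarrow \mathbb{K}, \quad P'(x) = \sum_{j=1}^{n} j \, a_j x^{j-1}: \tag{9.11} \]

The cases n = 0, 1 are provided by Ex. 9.5(a). To complete the induction proof of (9.11), we carry out the induction step for each n ∈ N: Writing $P(x) = \sum_{j=0}^{n} a_j x^j + a_{n+1} \, x \, x^n$ and applying the induction hypothesis as well as the rules of Th. 9.7 yields

\[ P'(x) = \sum_{j=1}^{n} j \, a_j x^{j-1} + a_{n+1} \left(1 \cdot x^n + x \cdot n \cdot x^{n-1}\right) = \sum_{j=1}^{n+1} j \, a_j x^{j-1}, \]

which establishes the case.

(b) Clearly, the derivatives of rational functions P/Q with polynomials P and Q can be computed from (9.11) and the quotient rule of Th. 9.7(d).

(c) The functions tan and cot as defined in (8.48) and restricted to $\mathbb{R} \setminus \cos^{-1}\{0\}$ and $\mathbb{R} \setminus \sin^{-1}\{0\}$, respectively, are differentiable and one obtains

\begin{align}
\tan' : \underbrace{\mathbb{R} \setminus \cos^{-1}\{0\}}_{\mathbb{R} \setminus \{(2k+1)\frac{\pi}{2} :\, k \in \mathbb{Z}\}} \longrightarrow \mathbb{R}, \quad \tan' x &= \frac{1}{(\cos x)^2} = 1 + (\tan x)^2, \tag{9.12a} \\
\cot' : \underbrace{\mathbb{R} \setminus \sin^{-1}\{0\}}_{\mathbb{R} \setminus \{k\pi :\, k \in \mathbb{Z}\}} \longrightarrow \mathbb{R}, \quad \cot' x &= -\frac{1}{(\sin x)^2} = -\left(1 + (\cot x)^2\right): \tag{9.12b}
\end{align}

One merely needs the derivatives of sin and cos from Ex. 9.5(c) and the quotient rule of Th. 9.7(d):

\begin{align*}
\tan' x &= \frac{\cos x \cos x - \sin x \, (-\sin x)}{(\cos x)^2} \overset{(8.44e)}{=} \frac{1}{(\cos x)^2} \overset{(8.44e)}{=} 1 + (\tan x)^2, \\
\cot' x &= \frac{-\sin x \sin x - \cos x \cos x}{(\sin x)^2} \overset{(8.44e)}{=} -\frac{1}{(\sin x)^2} \overset{(8.44e)}{=} -\left(1 + (\cot x)^2\right).
\end{align*}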
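Formula (9.11) acts on coefficient lists in an entirely mechanical way. As a small sketch (the representation of a polynomial as a Python list `coeffs` with `coeffs[j]` $= a_j$, and the function names, are assumptions made for this illustration):

```python
def poly_derivative(coeffs):
    """Coefficients of P' for P(x) = sum_j coeffs[j] * x**j, following (9.11)."""
    # The j-th coefficient a_j contributes j * a_j to the coefficient of x**(j-1);
    # the constant term a_0 drops out.
    return [j * a for j, a in enumerate(coeffs)][1:]

def poly_eval(coeffs, x):
    """Horner evaluation of P(x) = sum_j coeffs[j] * x**j."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# P(x) = 5 + 3x^2 + 2x^3 has derivative P'(x) = 6x + 6x^2.
p = [5, 0, 3, 2]
dp = poly_derivative(p)
```

Applying `poly_derivative` n times to a polynomial of degree n leaves the single coefficient $n!\,a_n$; one more application yields the empty list, which represents the zero polynomial.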

Theorem 9.9 (Derivative of Inverse Functions). Let a < b, I := ]a, b[ (a = −∞, b = ∞ is admissible). If f : I −→ R is differentiable and strictly increasing (resp. decreasing), then f has a continuous, strictly increasing (resp. decreasing) inverse function $f^{-1}$ defined on the interval J := f(I), i.e. $f^{-1} : J \longrightarrow I$, and, for each ξ ∈ I with f′(ξ) ≠ 0, $f^{-1}$ is differentiable at η := f(ξ) with

\[ (f^{-1})'(\eta) = \frac{1}{f'(\xi)} = \frac{1}{f'\big(f^{-1}(\eta)\big)}. \tag{9.13} \]

Proof. As a differentiable function, f is continuous by Th. 9.4, i.e. Th. 7.60 provides all the present assertions, except differentiability at η and (9.13). Let $(y_k)_{k\in\mathbb{N}}$ be a sequence in J \ {η} such that $\lim_{k\to\infty} y_k = \eta$. Then, as $f^{-1}$ is bijective and continuous, $(f^{-1}(y_k))_{k\in\mathbb{N}}$ is a sequence in I \ {ξ} such that $\lim_{k\to\infty} f^{-1}(y_k) = \xi$, and one obtains

\[ \lim_{k\to\infty} \frac{f^{-1}(y_k) - f^{-1}(\eta)}{y_k - \eta} = \lim_{k\to\infty} \frac{f^{-1}(y_k) - f^{-1}(\eta)}{f\big(f^{-1}(y_k)\big) - f\big(f^{-1}(\eta)\big)} = \frac{1}{f'\big(f^{-1}(\eta)\big)}, \tag{9.14} \]

establishing the case. ∎

Example 9.10. (a) The function ln : R+ −→ R is differentiable and, for each x ∈ R+, ln′ x = 1/x: If $f(x) = e^x$, then $f'(x) = e^x \ne 0$ for each x ∈ R, $\ln x = f^{-1}(x)$, and (9.13) yields

\[ \ln' x = \frac{1}{f'(\ln x)} = \frac{1}{e^{\ln x}} = \frac{1}{x}. \]

(b) The function arcsin : ]−1, 1[ −→ ]−π/2, π/2[ is differentiable and, for each x ∈ ]−1, 1[, $\arcsin' x = 1/\sqrt{1 - x^2}$: If f(x) = sin x, then f′(x) = cos x ≠ 0 for each x ∈ ]−π/2, π/2[, $\arcsin x = f^{-1}(x)$, and (9.13) yields

\[ \arcsin' x = \frac{1}{f'(\arcsin x)} = \frac{1}{\cos \arcsin x} \overset{(*)}{=} \frac{1}{\sqrt{1 - (\sin \arcsin x)^2}} = \frac{1}{\sqrt{1 - x^2}}, \]

where, at (∗), it was used that $\cos^2 = 1 - \sin^2$ and cos t > 0 for each t ∈ ]−π/2, π/2[.

(c) The function arccos : ]−1, 1[ −→ ]0, π[ is differentiable and, for each x ∈ ]−1, 1[, $\arccos' x = -1/\sqrt{1 - x^2}$: If f(x) = cos x, then f′(x) = −sin x ≠ 0 for each x ∈ ]0, π[, $\arccos x = f^{-1}(x)$, and (9.13) yields

\[ \arccos' x = \frac{1}{f'(\arccos x)} = \frac{1}{-\sin \arccos x} \overset{(*)}{=} -\frac{1}{\sqrt{1 - (\cos \arccos x)^2}} = -\frac{1}{\sqrt{1 - x^2}}, \]

where, at (∗), it was used that $\sin^2 = 1 - \cos^2$ and sin t > 0 for each t ∈ ]0, π[.

(d) The function arctan : R −→ ]−π/2, π/2[ is differentiable and, for each x ∈ R, $\arctan' x = 1/(1 + x^2)$: Apply Th. 9.9 with f(x) = tan x as an exercise.

(e) The function arccot : R −→ ]0, π[ is differentiable and, for each x ∈ R, $\operatorname{arccot}' x = -1/(1 + x^2)$: Apply Th. 9.9 with f(x) = cot x as an exercise.
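The derivative formulas of this example can be compared against numerical differentiation. The sketch below uses a symmetric difference quotient, which is an assumption of convenience (it is not the one-sided quotient of Def. 9.1, but converges to the same value wherever the derivative exists); the helper name `central_difference`, the step size, and the sample point are arbitrary. Python's `math` module has no arccot, so only (a) to (d) are checked.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.3   # sample point inside ]-1, 1[ and inside R+

# Pairs (numerical derivative, closed-form derivative from Ex. 9.10).
checks = {
    "ln":     (central_difference(math.log, x),   1 / x),
    "arcsin": (central_difference(math.asin, x),  1 / math.sqrt(1 - x * x)),
    "arccos": (central_difference(math.acos, x), -1 / math.sqrt(1 - x * x)),
    "arctan": (central_difference(math.atan, x),  1 / (1 + x * x)),
}
gaps = {name: abs(numeric - exact) for name, (numeric, exact) in checks.items()}
```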

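Theorem 9.11 (Chain Rule). Let a < b, c < d, f : ]a, b[ −→ R, g : ]c, d[ −→ K, f(]a, b[) ⊆ ]c, d[ (a, c = −∞; b, d = ∞ is admissible). If f is differentiable in ξ ∈ ]a, b[ and g is differentiable in f(ξ) ∈ ]c, d[, then g ◦ f : ]a, b[ −→ K is differentiable in ξ and

\[ (g \circ f)'(\xi) = f'(\xi) \, g'(f(\xi)). \tag{9.15} \]

Proof. Let η := f(ξ) and define the auxiliary function

\[ \tilde{g} : \,]c, d[\, \longrightarrow \mathbb{K}, \quad \tilde{g}(x) := \begin{cases} \frac{g(x) - g(\eta)}{x - \eta} & \text{for } x \ne \eta, \\ g'(\eta) & \text{for } x = \eta. \end{cases} \tag{9.16} \]

Then

\[ \forall_{x\in\,]c,d[} \quad g(x) - g(\eta) = \tilde{g}(x)(x - \eta). \tag{9.17} \]

Let $(x_k)_{k\in\mathbb{N}}$ be a sequence in ]a, b[ \ {ξ} such that $\lim_{k\to\infty} x_k = \xi$. One obtains

\begin{align}
\lim_{k\to\infty} \frac{g\big(f(x_k)\big) - g\big(f(\xi)\big)}{x_k - \xi} &\overset{(9.17)}{=} \lim_{k\to\infty} \frac{\tilde{g}\big(f(x_k)\big)\big(f(x_k) - f(\xi)\big)}{x_k - \xi} \notag \\
&= \lim_{k\to\infty} \tilde{g}\big(f(x_k)\big) \lim_{k\to\infty} \frac{f(x_k) - f(\xi)}{x_k - \xi} = f'(\xi) \, g'(f(\xi)), \tag{9.18}
\end{align}

where the last equality uses that f is continuous in ξ by Th. 9.4 and that $\tilde{g}$ is continuous in η, since g is differentiable in η.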

Example 9.12. (a) According to the chain rule of Th. 9.11, the function h : R −→ R, $h(x) := \sin(-x^3)$, is differentiable and, for each x ∈ R, $h'(x) = -3x^2 \cos(-x^3)$.

(b) According to the chain rule of Th. 9.11, each power function h : R+ −→ K, $h(x) := x^\alpha = e^{\alpha \ln x}$, α ∈ K, is differentiable and, for each x ∈ R+, $h'(x) = \frac{\alpha}{x} e^{\alpha \ln x} = \alpha x^{\alpha - 1}$. Indeed, h = g ◦ f, where f : R+ −→ R, f(x) := ln x, with f′ : R+ −→ R, $f'(x) = \frac{1}{x}$, according to Ex. 9.10(a), and g : R −→ K, $g(x) := e^{\alpha x}$, with g′ : R −→ K, $g'(x) = \alpha e^{\alpha x}$, according to Ex. 9.5(b).
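Both parts of this example can be cross-checked numerically. The following sketch compares the closed-form derivatives obtained from the chain rule with symmetric difference quotients; the helper `central_difference`, the sample points, and the exponent `alpha` are arbitrary choices made for this illustration, and only real α is considered here.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# (a) h(x) = sin(-x^3), with h'(x) = -3 x^2 cos(-x^3) by the chain rule.
def h(x):
    return math.sin(-x ** 3)

def h_prime(x):
    return -3 * x ** 2 * math.cos(-x ** 3)

x_a = 0.7
gap_a = abs(central_difference(h, x_a) - h_prime(x_a))

# (b) Power function x -> x^alpha = exp(alpha * ln x), derivative alpha * x^(alpha - 1).
alpha = 2.5
def power(x):
    return math.exp(alpha * math.log(x))

x_b = 1.3
gap_b = abs(central_difference(power, x_b) - alpha * x_b ** (alpha - 1))
```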

9.2 Higher Order Derivatives and the Sets $C^k$

Definition 9.13. Let a < b, I := ]a, b[, f : I −→ K (a = −∞, b = ∞ is admissible). If f is differentiable, then f′ might or might not itself be differentiable. If f′ is differentiable, then its derivative is denoted by f′′ and is called the second derivative of f. Clearly, this process can be iterated, leading to the following general recursive definition of higher-order derivatives:

Let $f^{(0)} := f$. For $k \in \mathbb{N}_0$, assume the kth derivative of f, denoted by $f^{(k)}$, exists on I. Then f is said to have a derivative of order k + 1 at ξ ∈ I if, and only if, $f^{(k)}$ is differentiable at ξ. In that case, define

\[ f^{(k+1)}(\xi) := (f^{(k)})'(\xi). \tag{9.19} \]

If $f^{(k+1)}(\xi)$ exists for all ξ ∈ I, then f is said to be (k + 1)-times differentiable and the function $f^{(k+1)} : I \longrightarrow \mathbb{K}$, $x \mapsto f^{(k+1)}(x)$, is called the (k + 1)st derivative of f. It is common to write $f' := f^{(1)}$, $f'' := f^{(2)}$, $f''' := f^{(3)}$, but $f^{(k)}$ if k ≥ 4.

If $f^{(k)}$ exists, it might or might not be continuous (cf. Ex. 9.14(c) below). One defines

\begin{align}
\forall_{k\in\mathbb{N}_0} \quad C^k(I, \mathbb{K}) &:= \left\{ f \in \mathcal{F}(I, \mathbb{K}) : f^{(k)} \text{ exists and is continuous on } I \right\}, \tag{9.20} \\
C^\infty(I, \mathbb{K}) &:= \bigcap_{k\in\mathbb{N}_0} C^k(I, \mathbb{K}) \tag{9.21}
\end{align}

(note $C^0(I, \mathbb{K}) = C(I, \mathbb{K})$ and $C(I, \mathbb{K}) \supseteq C^1(I, \mathbb{K}) \supseteq C^2(I, \mathbb{K}) \supseteq \dots$). Finally, we define the notation $C^k(I) := C^k(I, \mathbb{R})$ for $k \in \mathbb{N}_0 \cup \{\infty\}$.

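Example 9.14. (a) One has $\sin \in C^\infty(\mathbb{R})$ with sin′ = cos, sin′′ = −sin, sin′′′ = −cos, $\sin^{(4)} = \sin$, . . .

(b) A simple induction shows, for each polynomial P : R −→ K, $P(x) = \sum_{j=0}^{n} a_j x^j$, $a_j \in \mathbb{K}$, $n \in \mathbb{N}_0$, that $P^{(n)}(x) = n! \, a_n$. In particular, $P \in C^\infty(\mathbb{R}, \mathbb{K})$.

(c) It is an exercise to show the following function f is differentiable, but f′ is not continuous, i.e. $f \notin C^1(\mathbb{R})$:

\[ f : \mathbb{R} \longrightarrow \mathbb{R}, \quad f(x) := \begin{cases} x^2 \cos\left(\frac{1}{x}\right) & \text{for } x \ne 0, \\ 0 & \text{for } x = 0. \end{cases} \]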
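Without replacing the exercise in (c), a numerical experiment can suggest what goes wrong. The sketch below assumes (this is the part left to the reader) that f′(0) = 0 and that, for x ≠ 0, the product and chain rules give $f'(x) = 2x \cos(1/x) + \sin(1/x)$. It then evaluates f′ along the sequence $x_k = 1/((2k + \frac{1}{2})\pi)$, which tends to 0; the names `f`, `f_prime`, and `points` are illustrative choices.

```python
import math

def f(x):
    """The function of Ex. 9.14(c)."""
    return x * x * math.cos(1 / x) if x != 0 else 0.0

def f_prime(x):
    """Candidate derivative (assumption: to be verified as part of the exercise)."""
    if x == 0:
        return 0.0                      # since |f(h) / h| = |h cos(1/h)| <= |h|
    return 2 * x * math.cos(1 / x) + math.sin(1 / x)

# Points x_k -> 0 at which sin(1/x_k) = 1 and cos(1/x_k) = 0 (up to rounding).
points = [1 / ((2 * k + 0.5) * math.pi) for k in range(1, 50)]
values = [f_prime(x) for x in points]

# Difference quotients of f at 0 are bounded by |h| and hence tend to 0.
quotients_at_zero = [f(h) / h for h in points]
```

Along this sequence, the values of f′ stay near 1, while f′(0) = 0, which is incompatible with continuity of f′ at 0.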

9.3 Mean Value Theorem, Monotonicity, and Extrema

Theorem 9.15. Let a < b. If f : ]a, b[ −→ R is differentiable in ξ ∈ ]a, b[ and f has a local min or max in ξ, then f′(ξ) = 0.

Proof. Suppose f has a local max at ξ. Then there exists ε > 0 such that |h| < ε implies f(ξ + h) − f(ξ) ≤ 0. Now let $(h_k)_{k\in\mathbb{N}}$ be a sequence in ]0, ε[ with $\lim_{k\to\infty} h_k = 0$. Then $f(\xi \pm h_k) - f(\xi) \le 0$ for all k ∈ N implies

\[ f'(\xi) = \lim_{k\to\infty} \frac{f(\xi + h_k) - f(\xi)}{h_k} \le 0, \quad f'(\xi) = \lim_{k\to\infty} \frac{f(\xi - h_k) - f(\xi)}{-h_k} \ge 0, \tag{9.22} \]

showing f′(ξ) = 0. Now, if f has a local min at ξ, then −f has a local max at ξ, and f′(ξ) = −(−f)′(ξ) = 0 establishes the case. ∎

Remark 9.16. For f : R −→ R, $f(x) := x^3$, it is f′(0) = 0, but f does not have a local min or max at 0, showing that, while being necessary for a differentiable function f to have a local extremum at ξ, f′(ξ) = 0 is not a sufficient condition for such an extremum at ξ. Points ξ with f′(ξ) = 0 are sometimes called stationary or critical points of f.

—

Now, we first prove an important special case of the mean value theorem:

Theorem 9.17 (Rolle’s Theorem). Let a < b. If f : [a, b] −→ R is continuous on the compact interval [a, b], differentiable on the open interval ]a, b[, and f(a) = f(b), then there exists ξ ∈ ]a, b[ such that f′(ξ) = 0.

Proof. If f is constant, then f′(ξ) = 0 holds for each ξ ∈ ]a, b[. If f is nonconstant, then there exists x ∈ ]a, b[ with f(x) ≠ f(a). If f(x) > f(a), then Th. 7.54 implies the existence of ξ ∈ ]a, b[ such that f attains its (global and, thus, local) max in ξ. Then Th. 9.15 yields f′(ξ) = 0. The case f(x) < f(a) is treated analogously. ∎

Theorem 9.18 (Mean Value Theorem). Let a < b. If f, g : [a, b] −→ R are continuous on the compact interval [a, b], differentiable on the open interval ]a, b[, and g′(x) ≠ 0 for each x ∈ ]a, b[, then there exists ξ ∈ ]a, b[ such that

\[ \frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(\xi)}{g'(\xi)}. \tag{9.23a} \]

In the special case that g : [a, b] −→ R, g(x) = x, one obtains the standard form

\[ \frac{f(b) - f(a)}{b - a} = f'(\xi). \tag{9.23b} \]

Proof. First note that Rolle’s Th. 9.17 and g′ ≠ 0 imply g(b) − g(a) ≠ 0. Next, one applies Rolle’s Th. 9.17 to the auxiliary function

\[ h : [a, b] \to \mathbb{R}, \quad h(x) := f(x) - \bigl(g(x) - g(a)\bigr)\,\frac{f(b) - f(a)}{g(b) - g(a)}. \tag{9.24} \]

Since f and g are continuous on [a, b] and differentiable on ]a, b[, so is h. Moreover, h(a) = f(a) = h(b), i.e. Rolle’s Th. 9.17 applies and yields ξ ∈ ]a, b[ satisfying h′(ξ) = 0. However, (9.24) yields h′(ξ) = f′(ξ) − g′(ξ) (f(b) − f(a))/(g(b) − g(a)), such that, due to g′(ξ) ≠ 0, h′(ξ) = 0 is equivalent to (9.23a). □
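The standard form (9.23b) can be illustrated numerically. The following is a minimal sketch, not part of the original notes: the helper name `mean_value_point` and the use of bisection are assumptions made for illustration, and the bisection only works under the extra assumption that f′ is continuous and f′ minus the secant slope changes sign on [a, b] (Th. 9.18 itself does not need this).

```python
import math


def mean_value_point(f, df, a, b, tol=1e-12):
    """Find xi in ]a, b[ with df(xi) = (f(b) - f(a)) / (b - a) by bisection.

    Assumption (not part of Th. 9.18): df is continuous and
    df - slope has opposite signs at a and b.
    """
    slope = (f(b) - f(a)) / (b - a)

    def gap(x):
        return df(x) - slope

    lo, hi = a, b
    if gap(lo) * gap(hi) > 0:
        raise ValueError("bisection needs a sign change of f' - slope")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# f(x) = x^3 on [0, 2]: the secant slope is 4, so 3 xi^2 = 4, xi = 2 / sqrt(3).
xi = mean_value_point(lambda x: x**3, lambda x: 3 * x**2, 0.0, 2.0)
print(xi, 2 / math.sqrt(3))
```

For f(x) = x³ on [0, 2], the point ξ is unique; in general, Th. 9.18 only guarantees existence.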

Corollary 9.19. Let c < d and f : ]c, d[ −→ R be differentiable (c = −∞, d = ∞ is admissible).

(a) If f′ ≥ 0 (resp. f′ ≤ 0), then f is increasing (resp. decreasing). Moreover, if the inequalities are strict, then the monotonicity of f is strict as well.

(b) If f′ ≡ 0, then f is constant.

Proof. If c < a < b < d and f′ ≥ 0 (resp. f′ ≤ 0, resp. f′ ≡ 0), then (9.23b) implies f(b) ≥ f(a) (resp. f(b) ≤ f(a), resp. f(b) = f(a)). Moreover, strict inequalities for f′ yield strict inequality between f(b) and f(a). □

Lemma 9.20. Let a < b, f : ]a, b[ −→ R, ξ ∈ ]a, b[, and assume f is differentiable at ξ. If f′(ξ) > 0 (resp. f′(ξ) < 0), then there exists ε > 0 such that ]ξ − ε, ξ + ε[ ⊆ ]a, b[ and

\[ \forall_{a_1 \in ]\xi-\varepsilon,\xi[} \quad \forall_{b_1 \in ]\xi,\xi+\varepsilon[} \quad f(a_1) < f(\xi) < f(b_1) \quad \bigl(\text{resp. } f(a_1) > f(\xi) > f(b_1)\bigr). \]

Proof. If there does not exist ε > 0 such that f(a_1) < f(ξ) < f(b_1) for each a_1 ∈ ]ξ − ε, ξ[ and each b_1 ∈ ]ξ, ξ + ε[, then there exists a sequence (x_k)_{k∈N} in ]a, b[ \ {ξ} such that lim_{k→∞} x_k = ξ and

\[ \forall_{k \in \mathbb{N}} \quad \frac{f(x_k) - f(\xi)}{x_k - \xi} \le 0, \]

showing f′(ξ) ≤ 0. Analogously, one obtains that f′(ξ) ≥ 0 provided there does not exist ε > 0 such that f(a_1) > f(ξ) > f(b_1) for each a_1 ∈ ]ξ − ε, ξ[ and each b_1 ∈ ]ξ, ξ + ε[. □

Caveat 9.21. The hypotheses of Lem. 9.20 are not sufficient for f to be increasing or decreasing in any neighborhood of ξ: It is an exercise to find a counterexample.

Theorem 9.22 (Sufficient Conditions for Extrema). Let c < d, let f : ]c, d[ −→ R be differentiable, and assume f′(ξ) = 0 for some ξ ∈ ]c, d[.

(a) If f′(x) > 0 for each x ∈ ]c, ξ[ and f′(x) < 0 for each x ∈ ]ξ, d[, then f has a strict max at ξ. Likewise, if f′′(ξ) exists and is negative, then f has a strict max at ξ.

(b) If f′(x) < 0 for each x ∈ ]c, ξ[ and f′(x) > 0 for each x ∈ ]ξ, d[, then f has a strict min at ξ. Likewise, if f′′(ξ) exists and is positive, then f has a strict min at ξ.

Proof. We just present the proof for (a); (b) is proved analogously. If f′(x) > 0 for each x ∈ ]c, ξ[, then (9.23b) shows f(ξ) − f(a) > 0 for each c < a < ξ; analogously, if f′(x) < 0 for each x ∈ ]ξ, d[, then (9.23b) shows f(ξ) − f(b) > 0 for each ξ < b < d. Thus, f has a strict max at ξ. If f′′(ξ) exists and is negative, then Lem. 9.20, applied to f′ (recall f′(ξ) = 0), yields ε > 0 such that f′ is positive on ]ξ − ε, ξ[ and negative on ]ξ, ξ + ε[. Applying what we have already proved with c := ξ − ε and d := ξ + ε establishes the case. □

Example 9.23. One obtains

\[ f : \mathbb{R} \to \mathbb{R}, \quad f(x) := x\,e^x, \tag{9.25a} \]

\[ f' : \mathbb{R} \to \mathbb{R}, \quad f'(x) = e^x + x\,e^x = (1 + x)\,e^x, \tag{9.25b} \]

\[ f'' : \mathbb{R} \to \mathbb{R}, \quad f''(x) = 2e^x + x\,e^x = (2 + x)\,e^x. \tag{9.25c} \]

From Th. 9.15, we know that f can have at most one extremum, namely at ξ = −1, where f′(ξ) = 0. Since f′′(ξ) = e^{−1} > 0, Th. 9.22(b) implies that f has a strict min at −1.
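The conclusion of Ex. 9.23 can be cross-checked numerically. The following is only a sketch under stated assumptions: the sample grid and the variable names are choices made for illustration, and a finite sample can support, but never prove, that ξ = −1 is a strict min.

```python
import math


def f(x):
    return x * math.exp(x)


def df(x):
    # (9.25b)
    return (1 + x) * math.exp(x)


def d2f(x):
    # (9.25c)
    return (2 + x) * math.exp(x)


xi = -1.0
# Sample points in [-4, 2] around xi, excluding xi itself (illustrative grid).
samples = [xi + k * 0.01 for k in range(-300, 301) if k != 0]
is_strict_min_on_samples = all(f(x) > f(xi) for x in samples)

print(df(xi), d2f(xi), is_strict_min_on_samples)
```

The first printed value is f′(−1) = 0, the second is f′′(−1) = e^{−1} > 0, in line with Th. 9.22(b).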

—

The following Th. 9.24, the intermediate value theorem for derivatives, is another application of Lem. 9.20. Even though Ex. 9.14(c) shows that derivatives do not have to be continuous, not every function can occur as a derivative: The following Th. 9.24 shows that derivatives always satisfy an intermediate value property, even if they are not continuous (that also means that discontinuities in derivatives are always due to oscillations rather than jumps).

Theorem 9.24 (Intermediate Value Theorem for Derivatives). Let a, b, c, d ∈ R with a < c ≤ d < b. If f : ]a, b[ −→ R is differentiable, then f′ assumes every value between f′(c) and f′(d), i.e.

\[ \bigl[\min\{f'(c), f'(d)\},\ \max\{f'(c), f'(d)\}\bigr] \subseteq f'([c, d]). \tag{9.26} \]

Proof. Exercise (hint: use a suitable auxiliary function and apply Lem. 9.20 together with Th. 7.54 and Th. 9.15). □

9.4 L’Hôpital’s Rule

L’Hôpital’s rule is a result that can help to determine (function) limits (cf. Def. 8.17).

Theorem 9.25 (L’Hôpital’s Rule). Let ξ ∈ R and either I := ]a, ξ[ with a < ξ or I := ]ξ, b[ with ξ < b. Moreover, assume f, g : I −→ R are differentiable, g′(x) ≠ 0 for each x ∈ I, and one of the following two conditions (a), (b) is satisfied:

(a) lim_{x→ξ} f(x) = lim_{x→ξ} g(x) = 0.

(b) lim_{x→ξ} g(x) = ∞ or lim_{x→ξ} g(x) = −∞, where Def. 8.17 is extended to the case η ∈ {−∞, ∞} in the obvious way.

Then

\[ \lim_{x\to\xi} \frac{f'(x)}{g'(x)} = \eta \quad\Rightarrow\quad \lim_{x\to\xi} \frac{f(x)}{g(x)} = \eta. \tag{9.27} \]

The above statement also holds for ξ ∈ {−∞, ∞} and/or η ∈ {−∞, ∞} if, as in (b), one extends Def. 8.17 to these cases in the obvious way.

Proof. First, we assume (a). Consider the case ξ ∈ R. Since f and g are continuous, (a) implies f and g remain continuous, if we extend them to ξ by letting f(ξ) := g(ξ) := 0. This extension will now allow us to apply Th. 9.18 to f and g. To prove (9.27), let (x_k)_{k∈N} be a sequence in I with lim_{k→∞} x_k = ξ. Then (9.23a) yields, for each k ∈ N, some ξ_k ∈ ]x_k, ξ[ if x_k < ξ and some ξ_k ∈ ]ξ, x_k[ if ξ < x_k, satisfying

\[ \frac{f(x_k)}{g(x_k)} = \frac{f(x_k) - f(\xi)}{g(x_k) - g(\xi)} = \frac{f'(\xi_k)}{g'(\xi_k)}. \tag{9.28} \]

From the Sandwich Th. 7.16, we obtain lim_{k→∞} ξ_k = ξ, i.e. (9.28) and lim_{x→ξ} f′(x)/g′(x) = η imply lim_{x→ξ} f(x)/g(x) = η (also for η ∈ {−∞, ∞}). Now consider the case ξ ∈ {−∞, ∞} and let (x_k)_{k∈N} be as before. If ξ = ∞, then choose 1 ≤ c ∈ I and set Ĩ := ]0, c^{−1}[; if ξ = −∞, then choose −1 ≥ c ∈ I and set Ĩ := ]c^{−1}, 0[. We apply what we have already proved above to the auxiliary functions

\[ \tilde f : \tilde I \to \mathbb{R}, \quad \tilde f(x) := f(1/x), \qquad \tilde g : \tilde I \to \mathbb{R}, \quad \tilde g(x) := g(1/x) \]

at ξ̃ := 0. From the chain rule (9.15), we know f̃′(x) = −f′(1/x)/x² and g̃′(x) = −g′(1/x)/x² for each x ∈ Ĩ. Thus, lim_{x→ξ} f′(x)/g′(x) = η implies

\[ \eta = \lim_{k\to\infty} \frac{f'(x_k)}{g'(x_k)} = \lim_{k\to\infty} \frac{-x_k^2\, f'(x_k)}{-x_k^2\, g'(x_k)} = \lim_{k\to\infty} \frac{\tilde f'(1/x_k)}{\tilde g'(1/x_k)} = \lim_{k\to\infty} \frac{\tilde f(1/x_k)}{\tilde g(1/x_k)} = \lim_{k\to\infty} \frac{f(x_k)}{g(x_k)}, \]

proving lim_{x→ξ} f(x)/g(x) = η.

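We now assume (b), still letting (x_k)_{k∈N} be as before. Note that g′ ≠ 0 implies g is injective by Rolle’s Th. 9.17. Then the intermediate value theorem implies g is either strictly increasing or strictly decreasing. We proceed with the proof for the case I = ]a, ξ[; the proof for I = ]ξ, b[ can be done completely analogously. We first consider the case where g is strictly increasing, i.e. lim_{x→ξ} g(x) = ∞. Assume η ∈ R and ε > 0. Then lim_{x→ξ} f′(x)/g′(x) = η and lim_{x→ξ} g(x) = ∞ imply

\[ \exists_{c \in ]a,\xi[} \quad \forall_{x \in ]c,\xi[} \quad \Bigl( g(x) > 0 \ \wedge\ \eta - \frac{\varepsilon}{2} < \frac{f'(x)}{g'(x)} < \eta + \frac{\varepsilon}{2} \Bigr). \]

Since lim_{k→∞} x_k = ξ, there exists N_0 ∈ N such that, for each k > N_0, c < x_k < ξ. Next, according to Th. 9.18,

\[ \forall_{k > N_0} \quad \exists_{\xi_k \in ]c,x_k[} \quad \eta - \frac{\varepsilon}{2} < \frac{f(x_k) - f(c)}{g(x_k) - g(c)} = \frac{f'(\xi_k)}{g'(\xi_k)} < \eta + \frac{\varepsilon}{2}. \]

In consequence, using g(x_k) > g(c), as g is strictly increasing,

\[ \forall_{k > N_0} \quad \Bigl(\eta - \frac{\varepsilon}{2}\Bigr)\bigl(g(x_k) - g(c)\bigr) < f(x_k) - f(c) < \Bigl(\eta + \frac{\varepsilon}{2}\Bigr)\bigl(g(x_k) - g(c)\bigr) \]

and

\[ \forall_{k > N_0} \quad \Bigl(\eta - \frac{\varepsilon}{2}\Bigr) + \frac{f(c) - \bigl(\eta - \frac{\varepsilon}{2}\bigr) g(c)}{g(x_k)} < \frac{f(x_k)}{g(x_k)} < \Bigl(\eta + \frac{\varepsilon}{2}\Bigr) + \frac{f(c) - \bigl(\eta + \frac{\varepsilon}{2}\bigr) g(c)}{g(x_k)}. \]

Since lim_{k→∞} g(x_k) = ∞,

\[ \exists_{N \ge N_0} \quad \forall_{k > N} \quad \Biggl( \biggl| \frac{f(c) - \bigl(\eta - \frac{\varepsilon}{2}\bigr) g(c)}{g(x_k)} \biggr| < \frac{\varepsilon}{2} \ \wedge\ \biggl| \frac{f(c) - \bigl(\eta + \frac{\varepsilon}{2}\bigr) g(c)}{g(x_k)} \biggr| < \frac{\varepsilon}{2} \Biggr), \]

that means

\[ \forall_{k > N} \quad \eta - \varepsilon < \frac{f(x_k)}{g(x_k)} < \eta + \varepsilon, \]

i.e. lim_{k→∞} f(x_k)/g(x_k) = η. For η = ∞ and given n ∈ N, the argument is similar: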

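lim_{x→ξ} f′(x)/g′(x) = η and lim_{x→ξ} g(x) = ∞ imply

\[ \exists_{c \in ]a,\xi[} \quad \forall_{x \in ]c,\xi[} \quad \Bigl( g(x) > 0 \ \wedge\ n < \frac{f'(x)}{g'(x)} \Bigr). \]

As before, since lim_{k→∞} x_k = ξ, there exists N_0 ∈ N such that, for each k > N_0, c < x_k < ξ. Again, according to Th. 9.18,

\[ \forall_{k > N_0} \quad \exists_{\xi_k \in ]c,x_k[} \quad n < \frac{f(x_k) - f(c)}{g(x_k) - g(c)} = \frac{f'(\xi_k)}{g'(\xi_k)}. \]

In consequence, using g(x_k) > g(c), as g is strictly increasing,

\[ \forall_{k > N_0} \quad n\,\bigl(g(x_k) - g(c)\bigr) < f(x_k) - f(c) \]

and

\[ \forall_{k > N_0} \quad n + \frac{f(c) - n\,g(c)}{g(x_k)} < \frac{f(x_k)}{g(x_k)}. \]

Since lim_{k→∞} g(x_k) = ∞,

\[ \exists_{N \ge N_0} \quad \forall_{k > N} \quad \biggl| \frac{f(c) - n\,g(c)}{g(x_k)} \biggr| < 1, \]

that means

\[ \forall_{k > N} \quad n - 1 < \frac{f(x_k)}{g(x_k)}, \]

i.e. lim_{k→∞} f(x_k)/g(x_k) = η. If η = −∞, then, using what we have already shown,

\[ \lim_{x\to\xi} \frac{f'(x)}{g'(x)} = \eta \ \Rightarrow\ \lim_{x\to\xi} \frac{-f'(x)}{g'(x)} = \infty = \lim_{x\to\xi} \frac{-f(x)}{g(x)} \ \Rightarrow\ \lim_{x\to\xi} \frac{f(x)}{g(x)} = \eta. \]

Finally, if g is strictly decreasing, then −g is strictly increasing and we obtain

\[ \lim_{x\to\xi} \frac{f'(x)}{g'(x)} = \eta \ \Rightarrow\ \lim_{x\to\xi} \frac{f'(x)}{-g'(x)} = -\eta = \lim_{x\to\xi} \frac{f(x)}{-g(x)} \ \Rightarrow\ \lim_{x\to\xi} \frac{f(x)}{g(x)} = \eta, \]

concluding the proof. □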

Example 9.26. (a) Applying L’Hôpital’s rule to f : ]−π/2, π/2[ −→ R, f(x) := tan x, g : ]−π/2, π/2[ −→ R, g(x) := e^x − 1, with ξ = 0 yields

\[ \lim_{x\to 0} \frac{\tan x}{e^x - 1} = \lim_{x\to 0} \frac{1 + \tan^2 x}{e^x} = \frac{1}{1} = 1 \tag{9.29} \]

(note g′(x) = e^x ≠ 0 for each x ∈ ]−π/2, π/2[).

(b) It can happen that a single application of L’Hôpital’s rule does not, yet, yield a useful result, but that a repeated application does. An example is provided by considering α > 0, n ∈ N, and f : R⁺ −→ R, f(x) := e^{αx}, g : R⁺ −→ R, g(x) := x^n, ξ := ∞. Applying L’Hôpital’s rule n times yields

\[ \forall_{\alpha \in \mathbb{R}^+} \quad \forall_{n \in \mathbb{N}} \quad \lim_{x\to\infty} \frac{e^{\alpha x}}{x^n} = \lim_{x\to\infty} \frac{\alpha^n\, e^{\alpha x}}{n!} = \infty \tag{9.30} \]

(note g^{(k)}(x) = n(n − 1) · · · (n − k + 1) x^{n−k} ≠ 0 for each k ∈ {1, . . . , n} and each x ∈ R⁺).

(c) It can also happen that even repeated applications of L’Hôpital’s rule do not help at all, even though lim_{x→ξ} f(x)/g(x) does exist and the hypotheses of Th. 9.25 are all satisfied. A simple example is given by f : R −→ R, f(x) := e^x, g : R −→ R, g(x) := 2e^x, and ξ = −∞. Even though lim_{x→−∞} f(x)/g(x) = 1/2, one has lim_{x→−∞} f^{(n)}(x) = lim_{x→−∞} g^{(n)}(x) = 0 for every n ∈ N.
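The limit (9.29) can be checked against direct evaluation. The following is a small sketch, with the function name `ratio` and the choice of sample points 10^{−k} being assumptions made for illustration; `math.expm1` is used to evaluate e^x − 1 accurately for small x.

```python
import math


def ratio(x):
    # tan(x) / (e^x - 1); expm1 avoids cancellation for small x
    return math.tan(x) / math.expm1(x)


# Evaluate the quotient along the sequence x_k = 10^(-k) -> 0.
values = [ratio(10.0 ** (-k)) for k in range(1, 8)]
errors = [abs(v - 1.0) for v in values]

# Quotient of the derivatives at 0, as on the right-hand side of (9.29).
derivative_ratio_at_0 = (1 + math.tan(0.0) ** 2) / math.exp(0.0)

print(values[-1], derivative_ratio_at_0)
```

The errors shrink roughly like x/2, which is consistent with the limit 1 obtained from L’Hôpital’s rule, but a finite table of values is of course no substitute for the proof.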

9.5 Convex Functions

In the present section, we provide an introduction to (one-dimensional) convex functions. They have many important applications, some of which we will need in Analysis II, when studying so-called norms on K^n.

The idea is to call a function f : I −→ R (where I ⊆ R is an interval) convex if, and only if, each line segment connecting two points on the graph of f lies above this graph, and to call f concave if, and only if, each such line segment lies below the graph of f. Noting that, for x_1 < x_2, the line through the two points (x_1, f(x_1)) and (x_2, f(x_2)) is represented by the equation

\[ L(x) = \frac{x_2 - x}{x_2 - x_1}\, f(x_1) + \frac{x - x_1}{x_2 - x_1}\, f(x_2), \tag{9.31} \]

this leads to the following definition:

Definition 9.27. Let I ⊆ R be an interval (I can be open, closed, or half-open, it can be of finite or of infinite length) and f : I −→ R. Then f is called convex if, and only if, for each x_1, x, x_2 ∈ I such that x_1 < x < x_2, one has

\[ f(x) \le \frac{x_2 - x}{x_2 - x_1}\, f(x_1) + \frac{x - x_1}{x_2 - x_1}\, f(x_2); \tag{9.32a} \]

f is called concave if, and only if, for each x_1, x, x_2 ∈ I such that x_1 < x < x_2, one has

\[ f(x) \ge \frac{x_2 - x}{x_2 - x_1}\, f(x_1) + \frac{x - x_1}{x_2 - x_1}\, f(x_2). \tag{9.32b} \]

Moreover, f is called strictly convex (resp. strictly concave) if, and only if, (9.32a) (resp. (9.32b)) always holds with strict inequality.

Lemma 9.28. Let I ⊆ R be an interval. Then f : I −→ R is (strictly) concave if, and only if, −f is (strictly) convex.

Proof. Merely multiply (9.32b) by (−1) and compare with (9.32a). □

The following Prop. 9.29 provides equivalences for convexity. One can easily obtain the corresponding equivalences for concavity by combining Prop. 9.29 with Lem. 9.28.

Proposition 9.29. Let I ⊆ R be an interval and f : I −→ R. Then the following statements are equivalent:

(i) f is (strictly) convex.

(ii) For each a, b ∈ I such that a ≠ b and each λ ∈ ]0, 1[, the following estimate holds (with strict inequality):

\[ f\bigl(\lambda a + (1 - \lambda) b\bigr) \le \lambda f(a) + (1 - \lambda) f(b). \tag{9.33} \]

(iii) For each x_1, x, x_2 ∈ I such that x_1 < x < x_2, one has (with strict inequality)

\[ \frac{f(x) - f(x_1)}{x - x_1} \le \frac{f(x_2) - f(x)}{x_2 - x}. \tag{9.34} \]

(iv) For each x_1, x, x_2 ∈ I such that x_1 < x < x_2, one has (with strict inequality)

\[ \frac{f(x) - f(x_1)}{x - x_1} \le \frac{f(x_2) - f(x_1)}{x_2 - x_1} \le \frac{f(x_2) - f(x)}{x_2 - x}. \tag{9.35} \]

Proof. We leave the proof as an exercise. Suggestion: Establish the following implications: (i) ⇔ (ii), (i) ⇔ (iii), (iv) ⇒ (iii), (i) ⇒ (iv). □

Example 9.30. Since |λx + (1 − λ)y| ≤ λ|x| + (1 − λ)|y| for each 0 < λ < 1 and each x, y ∈ R, the absolute value function is convex. This example also shows that a convex function does not need to be differentiable.
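The estimate (9.33) lends itself to a quick numerical test on finitely many points. The following sketch is an illustration only: the helper `satisfies_9_33`, the sample points, and the tolerance are assumptions, and passing such a test on a finite sample can refute convexity but never prove it.

```python
import itertools


def satisfies_9_33(f, points, lambdas, tol=1e-12):
    """Check f(lam*a + (1-lam)*b) <= lam*f(a) + (1-lam)*f(b) on a finite sample.

    The small tolerance tol absorbs floating-point rounding.
    """
    return all(
        f(lam * a + (1 - lam) * b) <= lam * f(a) + (1 - lam) * f(b) + tol
        for a, b in itertools.permutations(points, 2)
        for lam in lambdas
    )


points = [-2.0, -0.5, 0.0, 0.3, 1.0, 4.0]
lambdas = [0.1, 0.25, 0.5, 0.75, 0.9]

abs_ok = satisfies_9_33(abs, points, lambdas)                       # |x| is convex
neg_abs_ok = satisfies_9_33(lambda x: -abs(x), points, lambdas)     # -|x| is not

print(abs_ok, neg_abs_ok)
```

For −|x|, the sample already contains a violating triple (for example a = −2, b = 4, λ = 1/2), in line with Lem. 9.28: −|x| is concave, not convex.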

—

For differentiable functions, one can formulate convexity criteria in terms of the derivative:

Proposition 9.31. Let a < b, and suppose that f : [a, b] −→ R is continuous on [a, b] and differentiable on ]a, b[. Then f is (strictly) convex (resp. (strictly) concave) on [a, b] if, and only if, the derivative f′ is (strictly) increasing (resp. (strictly) decreasing) on ]a, b[.

Proof. Since (−f)′ = −f′ and −f′ is (strictly) increasing if, and only if, f′ is (strictly) decreasing, it suffices to consider the (strictly) convex case. So assume that f is (strictly) convex. Then for each x_1, x, x_0, y, x_2 ∈ ]a, b[ such that x_1 < x < x_0 < y < x_2, applying Prop. 9.29(iv), one has (with strict inequalities),

\[ \frac{f(x) - f(x_1)}{x - x_1} \le \frac{f(x_0) - f(x_1)}{x_0 - x_1} \le \frac{f(x_2) - f(x_1)}{x_2 - x_1} \le \frac{f(x_2) - f(y)}{x_2 - y}. \tag{9.36} \]

Thus,

\[ f'(x_1) = \lim_{x \downarrow x_1} \frac{f(x) - f(x_1)}{x - x_1} \le \frac{f(x_0) - f(x_1)}{x_0 - x_1} \overset{(*)}{\le} \frac{f(x_2) - f(x_1)}{x_2 - x_1} \le \lim_{y \uparrow x_2} \frac{f(x_2) - f(y)}{x_2 - y} = f'(x_2) \tag{9.37} \]

(where the inequality at (∗) is strict if it is strict in (9.36)), showing that f′ is (strictly) increasing on ]a, b[. On the other hand, if f′ is (strictly) increasing on ]a, b[, then for each x_1, x, x_2 ∈ [a, b] such that x_1 < x < x_2, Th. 9.18 yields ξ_1 ∈ ]x_1, x[ and ξ_2 ∈ ]x, x_2[ such that

\[ \frac{f(x) - f(x_1)}{x - x_1} = f'(\xi_1) \quad\text{and}\quad \frac{f(x_2) - f(x)}{x_2 - x} = f'(\xi_2). \tag{9.38} \]

As ξ_1 < ξ_2 and f′ is (strictly) increasing, (9.38) implies (9.34) (with strict inequality) and, thus, the (strict) convexity of f. □

Proposition 9.32. Let a < b, and suppose that f : [a, b] −→ R is continuous on [a, b] and twice differentiable on ]a, b[.

(a) f is convex (resp. concave) on [a, b] if, and only if, f′′ ≥ 0 (resp. f′′ ≤ 0) on ]a, b[.

(b) If f′′ > 0 (resp. f′′ < 0) on ]a, b[, then f is strictly convex (resp. strictly concave) (as a caveat we remark that, here, the converse does not hold: for example, x ↦ x⁴ is strictly convex, but its second derivative x ↦ 12x² is 0 at x = 0).

Proof. Since −f′′ ≥ 0 if, and only if, f′′ ≤ 0, and −f′′ > 0 if, and only if, f′′ < 0, it suffices to consider the convex cases. Moreover, for (a), one merely has to combine Prop. 9.31 with the fact that f′ is increasing on ]a, b[ if, and only if, f′′ ≥ 0 on ]a, b[. The proof of (b) is left as an exercise. □

Example 9.33. (a) Since for f : R −→ R, f(x) = e^x, one has f′′(x) = e^x > 0, the exponential function is strictly convex on R.

(b) Since for f : R⁺ −→ R, f(x) = ln x, one has f′′(x) = −1/x² < 0, the natural logarithm is strictly concave on R⁺.

Theorem 9.34 (Jensen’s inequality). Let I ⊆ R be an interval and let f : I −→ R be convex. If n ∈ N and λ_1, . . . , λ_n > 0 such that λ_1 + · · · + λ_n = 1, then

\[ \forall_{x_1,\dots,x_n \in I} \quad f(\lambda_1 x_1 + \dots + \lambda_n x_n) \le \lambda_1 f(x_1) + \dots + \lambda_n f(x_n). \tag{9.39a} \]

If f is concave, then

\[ \forall_{x_1,\dots,x_n \in I} \quad f(\lambda_1 x_1 + \dots + \lambda_n x_n) \ge \lambda_1 f(x_1) + \dots + \lambda_n f(x_n). \tag{9.39b} \]

If f is strictly convex or strictly concave, then equality in the above inequalities can only hold if x_1 = · · · = x_n.

Proof. If one lets a := min{x_1, . . . , x_n}, b := max{x_1, . . . , x_n}, and x := λ_1 x_1 + · · · + λ_n x_n, then

\[ a = \sum_{j=1}^{n} \lambda_j\, a \le x \le \sum_{j=1}^{n} \lambda_j\, b = b \quad\Rightarrow\quad x \in I. \tag{9.40} \]

Since f is (strictly) concave if, and only if, −f is (strictly) convex, it suffices to consider the cases where f is convex and where f is strictly convex. Thus, we assume that f is convex and prove (9.39a) by induction. For n = 1, one has λ_1 = 1 and there is nothing to prove. For n = 2, (9.39a) reduces to (9.33), which holds due to the convexity of f. Finally, let n > 2 and assume that (9.39a) already holds for each 1 ≤ l ≤ n − 1. Set

\[ \lambda := \lambda_1 + \dots + \lambda_{n-1}, \qquad x := \frac{\lambda_1}{\lambda}\, x_1 + \dots + \frac{\lambda_{n-1}}{\lambda}\, x_{n-1}. \tag{9.41} \]

Then x ∈ I follows as in (9.40). One computes

\[ f(\lambda_1 x_1 + \dots + \lambda_n x_n) = f\Bigl(\sum_{j=1}^{n-1} \lambda_j x_j + \lambda_n x_n\Bigr) = f(\lambda x + \lambda_n x_n) \overset{l=2}{\le} \lambda f(x) + \lambda_n f(x_n) \overset{l=n-1}{\le} \lambda \sum_{j=1}^{n-1} \frac{\lambda_j}{\lambda}\, f(x_j) + \lambda_n f(x_n) = \lambda_1 f(x_1) + \dots + \lambda_n f(x_n), \tag{9.42} \]

thereby completing the induction, and, thus, the proof of (9.39a). If f is strictly convex and (9.39a) holds with equality, then one can also proceed by induction to prove the equality of the x_j. Again, if n = 1, then there is nothing to prove. If n = 2 and x_1 ≠ x_2, then strict convexity requires (9.39a) to hold with strict inequality. Thus x_1 = x_2. Now let n > 2. It is noted that (9.42) still holds. By hypothesis, the first and last term in (9.42) are now equal, implying that all terms in (9.42) must be equal. Using the induction hypothesis for l = 2 and the corresponding equality in (9.42), we conclude that x = x_n. Using the induction hypothesis for l = n − 1 and the corresponding equality in (9.42), we conclude that x_1 = · · · = x_{n−1}. Finally, x = x_n and x_1 = · · · = x_{n−1} are combined using (9.41) to get x_1 = x_n, finishing the proof of the theorem. □

Theorem 9.35 (Inequality Between the Weighted Arithmetic Mean and the Weighted Geometric Mean). If n ∈ N, x_1, . . . , x_n ≥ 0 and λ_1, . . . , λ_n > 0 such that λ_1 + · · · + λ_n = 1, then

\[ x_1^{\lambda_1} \cdots x_n^{\lambda_n} \le \lambda_1 x_1 + \dots + \lambda_n x_n, \tag{9.43} \]

where equality occurs if, and only if, x_1 = · · · = x_n. In particular, for λ_1 = · · · = λ_n = 1/n, one recovers the inequality between the arithmetic and the geometric mean without weights, known from Th. 7.63.

Proof. If at least one of the x_j is 0, then (9.43) becomes the true statement 0 ≤ ∑_{j=1}^{n} λ_j x_j with strict inequality if, and only if, at least one x_j > 0. Thus, it remains to consider the case x_1, . . . , x_n > 0. As we noted in Ex. 9.33(b), the natural logarithm ln : R⁺ −→ R is concave and even strictly concave. Employing Jensen’s inequality (9.39b) yields

\[ \ln(\lambda_1 x_1 + \dots + \lambda_n x_n) \ge \lambda_1 \ln x_1 + \dots + \lambda_n \ln x_n = \ln\bigl(x_1^{\lambda_1} \cdots x_n^{\lambda_n}\bigr). \tag{9.44} \]

Applying the exponential function to both sides of (9.44), one obtains (9.43). Since (9.44) is equivalent to (9.43), the strict concavity of ln yields that equality in (9.44) implies x_1 = · · · = x_n. □
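As a sanity check of (9.43), one can compare both means for concrete data. The following is a minimal sketch; the function names and the sample data are assumptions chosen for illustration (the weights 1/2, 1/4, 1/4 sum to 1, as Th. 9.35 requires).

```python
import math


def weighted_arithmetic_mean(xs, lambdas):
    # right-hand side of (9.43)
    return sum(lam * x for lam, x in zip(lambdas, xs))


def weighted_geometric_mean(xs, lambdas):
    # left-hand side of (9.43)
    return math.prod(x ** lam for lam, x in zip(lambdas, xs))


lambdas = [0.5, 0.25, 0.25]

# Distinct x_j: strict inequality is expected.
xs = [1.0, 4.0, 16.0]
am = weighted_arithmetic_mean(xs, lambdas)
gm = weighted_geometric_mean(xs, lambdas)

# Equal x_j: both means coincide (up to rounding).
xs_equal = [3.0, 3.0, 3.0]
am_equal = weighted_arithmetic_mean(xs_equal, lambdas)
gm_equal = weighted_geometric_mean(xs_equal, lambdas)

print(gm, am, gm_equal, am_equal)
```

Here the weighted arithmetic mean is 0.5 + 1 + 4 = 5.5, while the weighted geometric mean is 1 · 4^{1/4} · 16^{1/4} = 2√2.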

10 The Riemann Integral on Intervals in R

10.1 Definition and Simple Properties

Given a nonnegative function f : M −→ R₀⁺, M ⊆ R, we aim to compute the area ∫_M f of the set “under the graph” of f, i.e. of the set

\[ \bigl\{(x, y) \in \mathbb{R}^2 : x \in M \text{ and } 0 \le y \le f(x)\bigr\}. \tag{10.1} \]

This area ∫_M f (if it exists) will be called the integral of f over M. Moreover, for functions f : M −→ R that are not necessarily nonnegative, we would like to count areas of sets of the form (10.1) (which are below the graph of f and above the set M ≅ {(x, 0) ∈ R² : x ∈ M} ⊆ R²) with a positive sign, whereas we would like to count areas of sets above the graph of f and below the set M with a negative sign. In other words, making use of the positive and negative parts f⁺ and f⁻ of f = f⁺ − f⁻ as defined in (6.1i) and (6.1j), respectively, we would like our integral to satisfy

\[ \int_M f = \int_M f^+ - \int_M f^-. \tag{10.2} \]

Difficulties arise from the fact that both the function f and the set M can be extremely complicated. To avoid dealing with complicated sets M, we restrict ourselves to the situation of integrals over compact intervals, i.e. to integrals over sets of the form M = [a, b]. Moreover, we will also restrict ourselves to bounded functions f, which we now define:

Definition 10.1. Let ∅ ≠ M ⊆ R and f : M −→ R. Then f is called bounded if, and only if, the set {|f(x)| : x ∈ M} ⊆ R₀⁺ is bounded, i.e. if, and only if,

\[ \|f\|_{\sup} := \sup\{|f(x)| : x \in M\} \in \mathbb{R}_0^+. \tag{10.3} \]

The basic idea for the definition of the Riemann integral ∫_M f is rather simple: Decompose the set M into small pieces I_1, . . . , I_N and approximate ∫_M f by the finite sum ∑_{j=1}^{N} f(x_j) |I_j|, where x_j ∈ I_j and |I_j| denotes the size of the set I_j. Define ∫_M f as the limit of such sums as the size of the I_j tends to zero (if the limit exists). However, to carry out this idea precisely and rigorously does require some work.

As stated before, we will assume that M is a closed bounded interval, and we will choose the I_j to be closed bounded intervals as well. To emphasize we are dealing with intervals, in the following, we will prefer to use the symbol I instead of M.

Definition 10.2. If a, b ∈ R, a ≤ b, and I := [a, b], then we call

\[ |I| := b - a = |a - b| \tag{10.4} \]

the length or the (1-dimensional) size, volume, or measure of I.

Definition 10.3. Given a real interval I := [a, b] ⊆ R, a, b ∈ R, a < b, the (N + 1)-tuple ∆ := (x_0, . . . , x_N) ∈ R^{N+1}, N ∈ N, is called a partition of I if, and only if, a = x_0 < x_1 < · · · < x_N = b. We call x_0, . . . , x_N the nodes of ∆, and let ν(∆) := {x_0, . . . , x_N} be the set of all nodes. A tagged partition of I is a partition together with an N-tuple (t_1, . . . , t_N) ∈ R^N such that t_j ∈ [x_{j−1}, x_j] for each j ∈ {1, . . . , N}. Given a partition ∆ (with or without tags) of I as above and letting I_j := [x_{j−1}, x_j], the number

\[ |\Delta| := \max\bigl\{|I_j| : j \in \{1, \dots, N\}\bigr\} \tag{10.5} \]

is called the mesh size of ∆. It is sometimes convenient to extend our definitions to trivial intervals, consisting of just one point: For a = b, we have I = [a, a] = {a}. We then define ∆ = x_0 = a to be a partition of I, ν(∆) = {x_0}, and a is then the only tag that makes ∆ into a tagged partition. We also set I_0 := I = {a}, and the mesh size in this case is |∆| := 0.

Definition 10.4. Let ∆ be a partition of I = [a, b] ⊆ R, a ≤ b, as in Def. 10.3. Given a function f : I −→ R that is bounded according to Def. 10.1, define

\[ m_j := m_j(f) := \inf\{f(x) : x \in I_j\}, \qquad M_j := M_j(f) := \sup\{f(x) : x \in I_j\}, \tag{10.6} \]

and

\[ r(\Delta, f) := \sum_{j=1}^{N} m_j\, |I_j| = \sum_{j=1}^{N} m_j\, (x_j - x_{j-1}), \tag{10.7a} \]

\[ R(\Delta, f) := \sum_{j=1}^{N} M_j\, |I_j| = \sum_{j=1}^{N} M_j\, (x_j - x_{j-1}), \tag{10.7b} \]

where r(∆, f) is called the lower Riemann sum and R(∆, f) is called the upper Riemann sum associated with ∆ and f. If ∆ is tagged by τ := (t_1, . . . , t_N), then we also define the intermediate Riemann sum

\[ \rho(\Delta, f) := \sum_{j=1}^{N} f(t_j)\, |I_j| = \sum_{j=1}^{N} f(t_j)\, (x_j - x_{j-1}). \tag{10.7c} \]

Note that, for a = b, all the above sums are empty and we have r(∆, f) = R(∆, f) = ρ(∆, f) = 0.
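The sums (10.7a) to (10.7c) can be computed directly once m_j and M_j are known. The following sketch makes a simplifying assumption that is not part of Def. 10.4: f is increasing, so that m_j = f(x_{j−1}) and M_j = f(x_j). The function name `riemann_sums` and the default choice of midpoints as tags are likewise illustrative.

```python
def riemann_sums(f, nodes, tags=None):
    """Return (r, R, rho) as in (10.7a)-(10.7c) for an INCREASING f.

    Assumption: f is increasing on [nodes[0], nodes[-1]], so the infimum on
    each I_j is attained at the left node and the supremum at the right node.
    If no tags are given, the midpoints of the I_j are used.
    """
    intervals = list(zip(nodes, nodes[1:]))
    if tags is None:
        tags = [0.5 * (x0 + x1) for x0, x1 in intervals]
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in intervals)
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in intervals)
    intermediate = sum(f(t) * (x1 - x0) for t, (x0, x1) in zip(tags, intervals))
    return lower, upper, intermediate


# Partition (0, 1/4, 1/2, 1) of [0, 1] with mesh size 1/2, f(x) = x^2.
nodes = [0.0, 0.25, 0.5, 1.0]
r, R, rho = riemann_sums(lambda x: x * x, nodes)
print(r, R, rho)
```

For this partition, one obtains r = 0.140625, R = 0.578125, and ρ = 0.3203125, so that r ≤ ρ ≤ R as expected from (10.6).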

Definition 10.5. Let I = [a, b] ⊆ R be an interval, a ≤ b, and suppose f : I −→ R is bounded.

(a) Define

\[ J_*(f, I) := \sup\bigl\{r(\Delta, f) : \Delta \text{ is a partition of } I\bigr\}, \tag{10.8a} \]

\[ J^*(f, I) := \inf\bigl\{R(\Delta, f) : \Delta \text{ is a partition of } I\bigr\}. \tag{10.8b} \]

We call J_*(f, I) the lower Riemann integral of f over I and J^*(f, I) the upper Riemann integral of f over I.

(b) The function f is called Riemann integrable over I if, and only if, J_*(f, I) = J^*(f, I). If f is Riemann integrable over I, then

\[ \int_a^b f(x)\,dx := \int_I f(x)\,dx := \int_a^b f := \int_I f := J_*(f, I) = J^*(f, I) \tag{10.9} \]

is called the Riemann integral of f over I. The set of all functions f : I −→ R that are Riemann integrable over I is denoted by R(I, R) or just by R(I).

(c) The function g : I −→ C is called Riemann integrable over I if, and only if, both Re g and Im g are Riemann integrable. The set of all Riemann integrable functions g : I −→ C is denoted by R(I, C). If g ∈ R(I, C), then

\[ \int_I g := \Bigl(\int_I \operatorname{Re} g,\ \int_I \operatorname{Im} g\Bigr) = \int_I \operatorname{Re} g + i \int_I \operatorname{Im} g \in \mathbb{C} \tag{10.10} \]

is called the Riemann integral of g over I.

Remark 10.6. If I = [a, b] ⊆ R, ∆ is a partition of I, and f : I −→ R is bounded, then (10.6) implies

\[ m_j(f) \overset{(4.6c)}{=} -M_j(-f) \quad\text{and}\quad m_j(-f) \overset{(4.6d)}{=} -M_j(f), \tag{10.11a} \]

(10.7) implies

\[ r(\Delta, f) = -R(\Delta, -f) \quad\text{and}\quad r(\Delta, -f) = -R(\Delta, f), \tag{10.11b} \]

and (10.8) implies

\[ J_*(f, I) = -J^*(-f, I) \quad\text{and}\quad J_*(-f, I) = -J^*(f, I). \tag{10.11c} \]

Example 10.7. (a) If I = [a, b] ⊆ R as before and f : I −→ R is constant, i.e. f ≡ c with c ∈ R, then f ∈ R(I) and

\[ \int_a^b f = c\,(b - a) = c\,|I| : \tag{10.12} \]

We have, for each partition ∆ of I,

\[ r(\Delta, f) = \sum_{j=1}^{N} m_j\, |I_j| = c \sum_{j=1}^{N} |I_j| = c\,|I| = c\,(b - a) = \sum_{j=1}^{N} M_j\, |I_j| = R(\Delta, f), \tag{10.13} \]

proving J_*(f, I) = c (b − a) = J^*(f, I).

(b) An example of a function that is not Riemann integrable for a < b is given by the Dirichlet function

\[ f : [a, b] \to \mathbb{R}, \quad f(x) := \begin{cases} 0 & \text{for } x \text{ irrational}, \\ 1 & \text{for } x \text{ rational}, \end{cases} \qquad a < b. \tag{10.14} \]

Since r(∆, f) = 0 and R(∆, f) = ∑_{j=1}^{N} |I_j| = b − a for every partition ∆ of I, one obtains J_*(f, I) = 0 ≠ (b − a) = J^*(f, I), showing that f ∉ R(I).

Definition 10.8. (a) If ∆ is a partition of [a, b] ⊆ R as in Def. 10.3, then another partition ∆′ of [a, b] is called a refinement of ∆ if, and only if, ν(∆) ⊆ ν(∆′), i.e. if, and only if, the nodes of ∆′ include all the nodes of ∆.

(b) If ∆ and ∆′ are partitions of [a, b] ⊆ R, then the superposition of ∆ and ∆′, denoted ∆ + ∆′, is the unique partition of [a, b] having ν(∆) ∪ ν(∆′) as its set of nodes. Note that the superposition of ∆ and ∆′ is always a common refinement of ∆ and ∆′.

Lemma 10.9. Let a, b ∈ R, a < b, I := [a, b], and suppose f : I −→ R is bounded with M := ‖f‖_sup ∈ R₀⁺. Let ∆′ be a partition of I and assume

\[ \alpha := \#\bigl(\nu(\Delta') \setminus \{a, b\}\bigr) \ge 1 \tag{10.15} \]

is the number of interior nodes that occur in ∆′. Then, for each partition ∆ of I, the following holds:

\[ r(\Delta, f) \le r(\Delta + \Delta', f) \le r(\Delta, f) + 2\alpha M\,|\Delta|, \tag{10.16a} \]

\[ R(\Delta, f) \ge R(\Delta + \Delta', f) \ge R(\Delta, f) - 2\alpha M\,|\Delta|. \tag{10.16b} \]

Proof. We carry out the proof of (10.16a); the proof of (10.16b) can be conducted completely analogously. Consider the case α = 1 and let ξ be the single element of ν(∆′) \ {a, b}. If ξ ∈ ν(∆), then ∆ + ∆′ = ∆, and (10.16a) is trivially true. If ξ ∉ ν(∆), then x_{k−1} < ξ < x_k for a suitable k ∈ {1, . . . , N}. Define

\[ I' := [x_{k-1}, \xi], \qquad I'' := [\xi, x_k] \tag{10.17} \]

and

\[ m' := \inf\{f(x) : x \in I'\}, \qquad m'' := \inf\{f(x) : x \in I''\}. \tag{10.18} \]

Then we obtain

\[ r(\Delta + \Delta', f) - r(\Delta, f) = m'\,|I'| + m''\,|I''| - m_k\,|I_k| = (m' - m_k)\,|I'| + (m'' - m_k)\,|I''|. \tag{10.19} \]

Together with the observation

\[ 0 \le m' - m_k \le 2M, \qquad 0 \le m'' - m_k \le 2M, \tag{10.20} \]

(10.19) implies

\[ 0 \le r(\Delta + \Delta', f) - r(\Delta, f) \le 2M\,\bigl(|I'| + |I''|\bigr) \le 2M\,|\Delta|. \tag{10.21} \]

The general form of (10.16a) follows by an induction on α. □

Theorem 10.10. Let a, b ∈ R, a ≤ b, I := [a, b], and let f : I −→ R be bounded.

(a) Suppose ∆ and ∆′ are partitions of I such that ∆′ is a refinement of ∆. Then

\[ r(\Delta, f) \le r(\Delta', f), \qquad R(\Delta, f) \ge R(\Delta', f). \tag{10.22} \]

(b) For arbitrary partitions ∆ and ∆′, the following holds:

\[ r(\Delta, f) \le R(\Delta', f). \tag{10.23} \]

(c) J_*(f, I) ≤ J^*(f, I).

(d) For each sequence of partitions (∆_n)_{n∈N} of I such that lim_{n→∞} |∆_n| = 0, one has

\[ \lim_{n\to\infty} r(\Delta_n, f) = J_*(f, I), \qquad \lim_{n\to\infty} R(\Delta_n, f) = J^*(f, I). \tag{10.24} \]

In particular, if f ∈ R(I), then

\[ \lim_{n\to\infty} r(\Delta_n, f) = \lim_{n\to\infty} R(\Delta_n, f) = \int_I f, \tag{10.25a} \]

and if f ∈ R(I) and the ∆_n are tagged, then also

\[ \lim_{n\to\infty} \rho(\Delta_n, f) = \int_I f. \tag{10.25b} \]

Proof. (a): If ∆′ is a refinement of ∆, then ∆′ = ∆ + ∆′. Thus, (10.22) is immediate from (10.16).

(b): This also follows from (10.16):

\[ r(\Delta, f) \overset{(10.16a)}{\le} r(\Delta + \Delta', f) \overset{(10.7)}{\le} R(\Delta + \Delta', f) \overset{(10.16b)}{\le} R(\Delta', f). \tag{10.26} \]

(c): One just combines (10.8) with (b).

(d): For a = b, there is nothing to show. For a < b, let (∆_n)_{n∈N} be a sequence of partitions of I such that lim_{n→∞} |∆_n| = 0, and let ∆′ be an arbitrary partition of I with numbers α and M defined as in Lem. 10.9. Then, according to (10.16a):

\[ r(\Delta_n, f) \le r(\Delta_n + \Delta', f) \le r(\Delta_n, f) + 2\alpha M\,|\Delta_n| \quad\text{for each } n \in \mathbb{N}. \tag{10.27} \]

From (b), we conclude the sequence (r(∆_n, f))_{n∈N} is bounded. According to the Bolzano-Weierstrass Th. 7.27, if we can show that the sequence has J_*(f, I) as its only cluster point, then the first equality of (10.24) must hold. Thus, according to Prop. 7.26, it suffices to show that every converging subsequence of (r(∆_n, f))_{n∈N} converges to J_*(f, I). To this end, suppose (r(∆_{n_k}, f))_{k∈N} is a converging subsequence of (r(∆_n, f))_{n∈N} with β := lim_{k→∞} r(∆_{n_k}, f). First note β ≤ J_*(f, I) due to the definition of J_*(f, I). Moreover, (10.27) implies lim_{k→∞} r(∆_{n_k} + ∆′, f) = β. Since r(∆′, f) ≤ r(∆_{n_k} + ∆′, f) and ∆′ is arbitrary, we obtain J_*(f, I) ≤ β, i.e. J_*(f, I) = β. Thus, we have shown that, indeed, every converging subsequence of (r(∆_n, f))_{n∈N} converges to β = J_*(f, I). In the same manner, one conducts the proof of J^*(f, I) = lim_{n→∞} R(∆_n, f). Then (10.25a) is immediate from the definition of Riemann integrability, and (10.25b) follows from (10.25a), since (10.7) implies r(∆, f) ≤ ρ(∆, f) ≤ R(∆, f) for each tagged partition ∆ of I. □
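The convergence statement (10.24) can be observed numerically. The following is a sketch under the same simplifying assumption as before, namely that f is increasing (so that inf and sup on each subinterval sit at the endpoints); the helper name `lower_upper_sums` and the use of equidistant partitions are choices made for illustration.

```python
def lower_upper_sums(f, a, b, n):
    """Lower and upper Riemann sums of an INCREASING f on the equidistant
    partition of [a, b] with n subintervals (mesh size (b - a) / n)."""
    h = (b - a) / n
    nodes = [a + j * h for j in range(n + 1)]
    intervals = list(zip(nodes, nodes[1:]))
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in intervals)
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in intervals)
    return lower, upper


def square(x):
    return x * x


# Mesh sizes 1/10, 1/100, 1/1000 on [0, 1]; the integral of x^2 is 1/3.
results = {n: lower_upper_sums(square, 0.0, 1.0, n) for n in (10, 100, 1000)}
for n, (r, R) in results.items():
    print(n, r, R, R - r)
```

For an increasing f on an equidistant partition, R − r = (f(b) − f(a)) (b − a)/n, so both sums are squeezed towards the common value 1/3 as the mesh size tends to 0.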

Theorem 10.11. Let a, b ∈ R, a ≤ b, I := [a, b].

(a) The integral is linear: More precisely, if f, g ∈ R(I, K) and λ, µ ∈ K, then λf + µg ∈ R(I, K) and

\[ \int_I (\lambda f + \mu g) = \lambda \int_I f + \mu \int_I g. \tag{10.28} \]

(b) Let ∆ = (y_0, . . . , y_M), M ∈ N, be a partition of I, J_k := [y_{k−1}, y_k]. Then f ∈ R(I, K) if, and only if, f ∈ R(J_k, K) for each k ∈ {1, . . . , M}. If f ∈ R(I, K), then

\[ \int_a^b f = \int_I f = \sum_{k=1}^{M} \int_{J_k} f = \sum_{k=1}^{M} \int_{y_{k-1}}^{y_k} f. \tag{10.29} \]

(c) Monotonicity of the Integral: If f, g : I −→ R are bounded and f ≤ g (i.e. f(x) ≤ g(x) for each x ∈ I), then J_*(f, I) ≤ J_*(g, I) and J^*(f, I) ≤ J^*(g, I). In particular, if f, g ∈ R(I) and f ≤ g, then

\[ \int_I f \le \int_I g. \tag{10.30} \]

(d) Triangle Inequality: For each f ∈ R(I, C), one has

\[ \Bigl| \int_I f \Bigr| \le \int_I |f|. \tag{10.31} \]

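Proof. (a): First, consider K = R, i.e. f, g : I −→ R and λ, µ ∈ R. For a = b, there is nothing to prove, so let a < b. Let (∆_n)_{n∈N} be a sequence of partitions of I, ∆_n = (x_{n,0}, . . . , x_{n,N_n}), I_{n,j} := [x_{n,j−1}, x_{n,j}], satisfying lim_{n→∞} |∆_n| = 0. Note that, for each n ∈ N and each j ∈ {1, . . . , N_n},

\[ m_{n,j}(f + g) = \inf\{f(x) + g(x) : x \in I_{n,j}\} \ge \inf\{f(x) : x \in I_{n,j}\} + \inf\{g(x) : x \in I_{n,j}\} = m_{n,j}(f) + m_{n,j}(g), \tag{10.32a} \]

\[ M_{n,j}(f + g) = \sup\{f(x) + g(x) : x \in I_{n,j}\} \le \sup\{f(x) : x \in I_{n,j}\} + \sup\{g(x) : x \in I_{n,j}\} = M_{n,j}(f) + M_{n,j}(g), \tag{10.32b} \]

\[ \forall_{\lambda \in \mathbb{R}} \quad m_{n,j}(\lambda f) = \inf\{\lambda f(x) : x \in I_{n,j}\} \overset{(4.6d)}{=} \begin{cases} \lambda \inf\{f(x) : x \in I_{n,j}\} = \lambda\, m_{n,j}(f) & \text{for } \lambda \ge 0, \\ \lambda \sup\{f(x) : x \in I_{n,j}\} = \lambda\, M_{n,j}(f) & \text{for } \lambda < 0, \end{cases} \tag{10.32c} \]

\[ \forall_{\lambda \in \mathbb{R}} \quad M_{n,j}(\lambda f) = \sup\{\lambda f(x) : x \in I_{n,j}\} \overset{(4.6c)}{=} \begin{cases} \lambda \sup\{f(x) : x \in I_{n,j}\} = \lambda\, M_{n,j}(f) & \text{for } \lambda \ge 0, \\ \lambda \inf\{f(x) : x \in I_{n,j}\} = \lambda\, m_{n,j}(f) & \text{for } \lambda < 0. \end{cases} \tag{10.32d} \]

Thus,

\[ J_*(f + g, I) \overset{(10.24)}{=} \lim_{n\to\infty} r(\Delta_n, f + g) \overset{(10.7a)}{=} \lim_{n\to\infty} \sum_{j=1}^{N_n} m_{n,j}(f + g)\,|I_{n,j}| \overset{(10.32a)}{\ge} \lim_{n\to\infty} \bigl(r(\Delta_n, f) + r(\Delta_n, g)\bigr) = J_*(f, I) + J_*(g, I), \tag{10.33a} \]

\[ J^*(f + g, I) \overset{(10.24)}{=} \lim_{n\to\infty} R(\Delta_n, f + g) \overset{(10.7b)}{=} \lim_{n\to\infty} \sum_{j=1}^{N_n} M_{n,j}(f + g)\,|I_{n,j}| \overset{(10.32b)}{\le} \lim_{n\to\infty} \bigl(R(\Delta_n, f) + R(\Delta_n, g)\bigr) = J^*(f, I) + J^*(g, I), \tag{10.33b} \]

\[ \forall_{\lambda \in \mathbb{R}} \quad J_*(\lambda f, I) \overset{(10.24)}{=} \lim_{n\to\infty} r(\Delta_n, \lambda f) \overset{(10.7a)}{=} \lim_{n\to\infty} \sum_{j=1}^{N_n} m_{n,j}(\lambda f)\,|I_{n,j}| \overset{(10.32c)}{=} \begin{cases} \lambda \lim_{n\to\infty} r(\Delta_n, f) = \lambda\, J_*(f, I) & \text{for } \lambda \ge 0, \\ \lambda \lim_{n\to\infty} R(\Delta_n, f) = \lambda\, J^*(f, I) & \text{for } \lambda < 0, \end{cases} \tag{10.33c} \]

\[ \forall_{\lambda \in \mathbb{R}} \quad J^*(\lambda f, I) \overset{(10.24)}{=} \lim_{n\to\infty} R(\Delta_n, \lambda f) \overset{(10.7b)}{=} \lim_{n\to\infty} \sum_{j=1}^{N_n} M_{n,j}(\lambda f)\,|I_{n,j}| \overset{(10.32d)}{=} \begin{cases} \lambda \lim_{n\to\infty} R(\Delta_n, f) = \lambda\, J^*(f, I) & \text{for } \lambda \ge 0, \\ \lambda \lim_{n\to\infty} r(\Delta_n, f) = \lambda\, J_*(f, I) & \text{for } \lambda < 0. \end{cases} \tag{10.33d} \]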

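Thus, if f and g are both Riemann integrable over I, then we obtain J_*(f + g, I) ≥ J_*(f, I) + J_*(g, I) = J^*(f, I) + J^*(g, I) ≥ J^*(f + g, I), i.e., by Th. 10.10(c), (f + g) ∈ R(I); and J_*(λf, I) = λJ_*(f, I) = λJ^*(f, I) = J^*(λf, I) for λ ≥ 0, J_*(λf, I) = λJ^*(f, I) = λJ_*(f, I) = J^*(λf, I) for λ < 0, i.e. (λf) ∈ R(I) in each case. In particular, for each λ, µ ∈ R,

\[ \int_I (\lambda f + \mu g) = J_*(\lambda f + \mu g, I) = \lambda\, J_*(f, I) + \mu\, J_*(g, I) = \lambda \int_I f + \mu \int_I g, \]

proving (10.28) for K = R. It remains to consider f, g ∈ R(I, C) and λ, µ ∈ C. One computes, using the real-valued case,

\[ \int_I (\lambda f) = \Bigl( \int_I (\operatorname{Re}\lambda \operatorname{Re} f - \operatorname{Im}\lambda \operatorname{Im} f),\ \int_I (\operatorname{Re}\lambda \operatorname{Im} f + \operatorname{Im}\lambda \operatorname{Re} f) \Bigr) = \Bigl( \operatorname{Re}\lambda \int_I \operatorname{Re} f - \operatorname{Im}\lambda \int_I \operatorname{Im} f,\ \operatorname{Re}\lambda \int_I \operatorname{Im} f + \operatorname{Im}\lambda \int_I \operatorname{Re} f \Bigr) = \lambda \int_I f \]

and

\[ \int_I (f + g) = \Bigl( \int_I \operatorname{Re}(f + g),\ \int_I \operatorname{Im}(f + g) \Bigr) = \Bigl( \int_I \operatorname{Re} f + \int_I \operatorname{Re} g,\ \int_I \operatorname{Im} f + \int_I \operatorname{Im} g \Bigr) = \Bigl( \int_I \operatorname{Re} f,\ \int_I \operatorname{Im} f \Bigr) + \Bigl( \int_I \operatorname{Re} g,\ \int_I \operatorname{Im} g \Bigr) = \int_I f + \int_I g. \]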

(b): Once again, consider K = R first. For a = b, there is nothing to prove, so let a < b. For M = 1, there is still nothing to prove. For M = 2, we have a = y_0 < y_1 < y_2 = b. Consider a sequence (∆_n)_{n∈N} of partitions of I, ∆_n = (x_{n,0}, . . . , x_{n,N_n}), such that lim_{n→∞} |∆_n| = 0 and y_1 ∈ ν(∆_n) for each n ∈ N. Define ∆′_n := (x_{n,0}, . . . , y_1), ∆′′_n := (y_1, . . . , x_{n,N_n}). Then ∆′_n and ∆′′_n are partitions of J_1 and J_2, respectively, and lim_{n→∞} |∆′_n| = lim_{n→∞} |∆′′_n| = 0. Moreover,

\[ \forall_{n \in \mathbb{N}} \quad \Bigl( r(\Delta_n, f) = r(\Delta'_n, f) + r(\Delta''_n, f), \quad R(\Delta_n, f) = R(\Delta'_n, f) + R(\Delta''_n, f) \Bigr), \]

implying J_*(f, I) = J_*(f, J_1) + J_*(f, J_2) and J^*(f, I) = J^*(f, J_1) + J^*(f, J_2). This proves ∫_I f = ∫_{J_1} f + ∫_{J_2} f provided f ∈ R(I) ∩ R(J_1) ∩ R(J_2). So it just remains to show the claimed equivalence between f ∈ R(I) and f ∈ R(J_1) ∩ R(J_2). If f ∈ R(J_1) ∩ R(J_2), then J_*(f, I) = J_*(f, J_1) + J_*(f, J_2) = J^*(f, J_1) + J^*(f, J_2) = J^*(f, I), showing f ∈ R(I). Conversely, J_*(f, I) = J^*(f, I) implies J_*(f, J_1) = J^*(f, J_1) + J^*(f, J_2) − J_*(f, J_2) ≥ J^*(f, J_1), showing J_*(f, J_1) = J^*(f, J_1) and f ∈ R(J_1); f ∈ R(J_2) follows completely analogously. The general case now follows by induction on M. If f ∈ R(I, C), then one computes, using the real-valued case,

\[ \int_I f = \Bigl( \int_I \operatorname{Re} f,\ \int_I \operatorname{Im} f \Bigr) = \Bigl( \sum_{k=1}^{M} \int_{J_k} \operatorname{Re} f,\ \sum_{k=1}^{M} \int_{J_k} \operatorname{Im} f \Bigr) = \sum_{k=1}^{M} \int_{J_k} f. \]

(c): If f, g : I −→ R are bounded and f ≤ g, then, for each partition ∆ of I, r(∆, f) ≤ r(∆, g) and R(∆, f) ≤ R(∆, g) are immediate from (10.7). As these inequalities are preserved when taking the sup and the inf, respectively, all claims of (c) are established.

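(d): We will see in Th. 10.18(c) below that f ∈ R(I,K) implies |f| ∈ R(I). Let ∆ be an arbitrary partition of I, tagged by $(t_1, \dots, t_N)$. Then, using the same notation as in Def. 10.3 and Def. 10.4,

\[
\begin{aligned}
\Big| \big( \rho(\Delta, \operatorname{Re} f),\ \rho(\Delta, \operatorname{Im} f) \big) \Big|
&:= \Bigg| \Big( \sum_{j=1}^{N} \operatorname{Re} f(t_j)\, |I_j|,\ \sum_{j=1}^{N} \operatorname{Im} f(t_j)\, |I_j| \Big) \Bigg|
\le \sum_{j=1}^{N} \Big| \big( \operatorname{Re} f(t_j),\ \operatorname{Im} f(t_j) \big) \Big|\, |I_j| \\
&= \sum_{j=1}^{N} |f(t_j)|\, |I_j| =: \rho(\Delta, |f|).
\end{aligned}
\tag{10.34}
\]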

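Since the intermediate Riemann sums in (10.34) converge to the respective integrals by (10.25b), one obtains

\[
\Big| \int_I f \Big| = \lim_{|\Delta| \to 0} \Big| \big( \rho(\Delta, \operatorname{Re} f),\ \rho(\Delta, \operatorname{Im} f) \big) \Big|
\overset{(10.34)}{\le} \lim_{|\Delta| \to 0} \rho(\Delta, |f|) = \int_I |f|,
\]

proving (10.31). ∎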

Theorem 10.12 (Mean Value Theorem). Let a, b ∈ R, a ≤ b, I := [a, b]. If f, p ∈ R(I) and p ≥ 0, then, for each m, M ∈ R with m ≤ f ≤ M:

\[
m \int_I p \le \int_I f p \le M \int_I p. \tag{10.35a}
\]

In particular, if f is continuous, then

\[
\exists_{\xi \in I} \quad \int_I f p = f(\xi) \int_I p. \tag{10.35b}
\]

Returning to a general f ∈ R(I), if p ≡ 1, then we obtain the theorem’s classical form:

\[
m\,(b - a) = m\,|I| \le \int_a^b f = \int_I f \le M\,|I| = M\,(b - a). \tag{10.35c}
\]

The theorem’s name comes from the fact that, for a < b, $|I|^{-1} \int_I f$ is sometimes referred to as the mean value of f on I.

Proof. Since mp ≤ fp ≤ Mp, we compute

\[
m \int_I p \overset{\text{Th. 10.11(c)}}{\le} \int_I f p \overset{\text{Th. 10.11(c)}}{\le} M \int_I p. \tag{10.36}
\]

If $\int_I p = 0$, then (10.35a) implies $\int_I fp = 0$ and (10.35b) is immediate. It remains to consider the case $\int_I p > 0$. If f is continuous, then we can let m, M be such that f(I) = [m, M] by Th. 7.54 and the intermediate value Th. 7.57. Then (10.35a) shows there is ξ ∈ I such that $f(\xi) = \frac{\int_I fp}{\int_I p}$, i.e. (10.35b) holds. ∎

Theorem 10.13 (Riemann’s Integrability Criterion). Let I = [a, b] ⊆ R and suppose f : I −→ R is bounded. Then f is Riemann integrable over I if, and only if, for each ε > 0, there exists a partition ∆ of I such that

\[
R(\Delta, f) - r(\Delta, f) < \epsilon. \tag{10.37}
\]

Proof. Suppose, for each ε > 0, there exists a partition ∆ of I such that (10.37) is satisfied. Then

\[
J^*(f, I) - J_*(f, I) \le R(\Delta, f) - r(\Delta, f) < \epsilon, \tag{10.38}
\]

showing $J^*(f, I) \le J_*(f, I)$. As the opposite inequality always holds, we have $J^*(f, I) = J_*(f, I)$, i.e. f ∈ R(I) as claimed. Conversely, if f ∈ R(I) and $(\Delta_n)_{n\in\mathbb N}$ is a sequence of partitions of I with $\lim_{n\to\infty} |\Delta_n| = 0$, then (10.25a) implies that, for each ε > 0, there is N ∈ N such that $R(\Delta_n, f) - r(\Delta_n, f) < \epsilon$ for each n > N. ∎

The previous theorem will allow us to prove that every continuous function on [a, b] is Riemann integrable. However, we will also need to make use of the following result:

Proposition 10.14. Let I = [a, b] ⊆ R, a ≤ b, f : I −→ R. If f is continuous, then f is even uniformly continuous, i.e.

\[
\forall_{\epsilon \in \mathbb R^+}\ \exists_{\delta \in \mathbb R^+}\ \forall_{x, y \in I}\ \big( |x - y| < \delta \ \Rightarrow\ |f(x) - f(y)| < \epsilon \big). \tag{10.39}
\]

Proof. Arguing by contraposition, we assume f not to be uniformly continuous on I. Then the negation of (10.39) must hold, i.e.

\[
\exists_{\epsilon_0 \in \mathbb R^+}\ \forall_{\delta \in \mathbb R^+}\ \exists_{x, y \in I}\ \big( |x - y| < \delta \ \wedge\ |f(x) - f(y)| \ge \epsilon_0 \big). \tag{10.40}
\]

In particular, for each n ∈ N, there exist $x_n, y_n \in I$ such that

\[
|x_n - y_n| < \delta_n := 1/n \tag{10.41}
\]

and $|f(x_n) - f(y_n)| \ge \epsilon_0$. Then the sequence $(x_n)_{n\in\mathbb N}$ is bounded and the Bolzano-Weierstrass Th. 7.27 provides a convergent subsequence $(x_{\varphi(n)})_{n\in\mathbb N}$, i.e. there is ξ ∈ R with $\lim_{n\to\infty} x_{\varphi(n)} = \xi$. Clearly, ξ ∈ [a, b] and (10.41) implies $\lim_{n\to\infty} y_{\varphi(n)} = \xi$ as well. However, due to $|f(x_{\varphi(n)}) - f(y_{\varphi(n)})| \ge \epsilon_0 > 0$, the sequences $(f(x_{\varphi(n)}))_{n\in\mathbb N}$ and $(f(y_{\varphi(n)}))_{n\in\mathbb N}$ cannot both converge to f(ξ), showing that f cannot be continuous. ∎

Caveat 10.15. It is important in Prop. 10.14 that f is defined on a compact interval I. The functions f : ]0, 1] −→ R, f(x) := 1/x, and f : R −→ R, f(x) := x², are examples of continuous functions that are not uniformly continuous.

Theorem 10.16. Let I = [a, b] ⊆ R, a ≤ b.

(a) If f : I −→ C is continuous, then f is Riemann integrable over I.

(b) If f : I −→ R is increasing or decreasing, then f is Riemann integrable over I.

Proof. (a): As f is continuous if, and only if, Re f and Im f are both continuous, it suffices to consider a real-valued continuous f. For a = b, there is nothing to prove, so let a < b. First note that, if f is continuous on I = [a, b], then f is bounded by Th. 7.54. Moreover, f is uniformly continuous due to Prop. 10.14. Thus, given ε > 0, there is δ > 0 such that |x − y| < δ implies |f(x) − f(y)| < ε/|I| for each x, y ∈ I. Then, for each partition ∆ of I satisfying |∆| < δ, we obtain

\[
R(\Delta, f) - r(\Delta, f) = \sum_{j=1}^{N} (M_j - m_j)\, |I_j| \le \frac{\epsilon}{|I|} \sum_{j=1}^{N} |I_j| = \epsilon, \tag{10.42}
\]

as |∆| < δ implies |x − y| < δ for each $x, y \in I_j$ and each j ∈ {1, . . . , N}. Finally, (10.42) implies f ∈ R(I) due to Riemann’s integrability criterion of Th. 10.13.

(b): Suppose f : [a, b] −→ R is increasing. Then f is bounded, as f(a) ≤ f(x) ≤ f(b) for each x ∈ [a, b]. If f(a) = f(b), then f is constant. Thus, assume f(a) < f(b). Moreover, if $\Delta = (x_0, \dots, x_N)$ is a partition of I as in Def. 10.3, then

\[
R(\Delta, f) - r(\Delta, f) = \sum_{j=1}^{N} (M_j - m_j)\, |I_j| = \sum_{j=1}^{N} \big( f(x_j) - f(x_{j-1}) \big)\, |I_j| \le |\Delta|\, \big( f(b) - f(a) \big). \tag{10.43}
\]

Thus, given ε > 0, we have R(∆, f) − r(∆, f) < ε for each partition ∆ of I satisfying |∆| < ε/(f(b) − f(a)). In consequence, f ∈ R(I), once again due to Riemann’s integrability criterion of Th. 10.13. If f is decreasing, then −f is increasing, and Th. 10.11(a) establishes the case. ∎
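The estimate (10.43) can be checked numerically. The following is a minimal sketch, assuming the increasing function f(x) = x³ on [0, 2] and equidistant partitions (both are illustrative choices, not taken from the text); the helper name `lower_upper_sums_increasing` is hypothetical. For an increasing function, the infimum and supremum on each subinterval are attained at the left and right endpoints, so both sums can be computed exactly.

```python
def lower_upper_sums_increasing(f, a, b, n):
    """Lower and upper Riemann sums r and R of an increasing f on [a, b]
    for the equidistant partition with n subintervals."""
    h = (b - a) / n
    xs = [a + j * h for j in range(n + 1)]
    # For increasing f: m_j = f(x_{j-1}) and M_j = f(x_j).
    lower = sum(f(xs[j - 1]) * h for j in range(1, n + 1))
    upper = sum(f(xs[j]) * h for j in range(1, n + 1))
    return lower, upper


def f(x):
    return x ** 3  # increasing on [0, 2] (illustrative choice)


a, b = 0.0, 2.0
for n in (10, 100, 1000):
    lo, up = lower_upper_sums_increasing(f, a, b, n)
    mesh = (b - a) / n
    # The gap R - r is bounded by |Delta| (f(b) - f(a)), as in (10.43).
    print(n, lo, up, up - lo, mesh * (f(b) - f(a)))
```

For equidistant partitions the sum in (10.43) telescopes, so the gap equals the bound up to rounding; it tends to 0 with the mesh size, which is what Th. 10.13 requires.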

Definition and Remark 10.17. Let M ⊆ C. A function f : M −→ C is called Lipschitz continuous in M with Lipschitz constant L if, and only if,

\[
\exists_{L \in \mathbb R_0^+}\ \forall_{x, y \in M} \quad |f(x) - f(y)| \le L\, |x - y|. \tag{10.44}
\]

Every Lipschitz continuous function is, indeed, continuous, since, if ξ ∈ M and $(y_n)_{n\in\mathbb N}$ is a sequence in M with $\lim_{n\to\infty} y_n = \xi$, then (10.44) implies

\[
\forall_{n \in \mathbb N} \quad |f(\xi) - f(y_n)| \le L\, |\xi - y_n|, \tag{10.45}
\]

proving $\lim_{n\to\infty} f(y_n) = f(\xi)$. Moreover, it is not too much harder to prove Lipschitz continuous functions are even uniformly continuous, but we will not pursue this right now. On the other hand, $f : \mathbb R_0^+ \to \mathbb R$, $f(x) := \sqrt{x}$, is an example of a continuous function (actually, even uniformly continuous) that is not Lipschitz continuous.
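To see why the square root cannot satisfy (10.44), one can look at difference quotients against the point 0. The sketch below is only an illustration (the helper `difference_quotient` is a hypothetical name): for x > 0, the quotient |√x − √0| / |x − 0| equals 1/√x, which exceeds any candidate constant L as x approaches 0.

```python
import math


def difference_quotient(f, x, y):
    """|f(x) - f(y)| / |x - y|, the quantity a Lipschitz constant must bound."""
    return abs(f(x) - f(y)) / abs(x - y)


# Quotients between 0 and x = 10^-k equal 1/sqrt(x) and grow without bound.
quotients = [difference_quotient(math.sqrt, 10.0 ** (-k), 0.0) for k in range(1, 9)]
for k, q in enumerate(quotients, start=1):
    print(f"x = 1e-{k}: quotient = {q:.1f}")
```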

Theorem 10.18. Let a, b ∈ R, a ≤ b, I := [a, b].

(a) If f ∈ R(I,R) and φ : f(I) −→ C is Lipschitz continuous, then φ ◦ f ∈ R(I,C).

(b) If f ∈ R(I,C) and φ : f(I) −→ R is Lipschitz continuous, then φ ◦ f ∈ R(I,R).

(c) If f ∈ R(I), then $|f|, f^2, f^+, f^- \in R(I)$. In particular, we, indeed, have (10.2) from the introduction (with M replaced by I). If, in addition, there exists δ > 0 such that f(x) ≥ δ for each x ∈ I, then 1/f ∈ R(I). Moreover, |f| ∈ R(I) also holds for f ∈ R(I,C).

(d) If f, g ∈ R(I), then max(f, g), min(f, g) ∈ R(I). If f, g ∈ R(I,K), then $\bar f, fg \in R(I,K)$. If, in addition, there exists δ > 0 such that |g(x)| ≥ δ for each x ∈ I, then f/g ∈ R(I,K).

Proof. (a),(b): We just carry out the proof for the case f ∈ R(I), φ : f(I) −→ R, and leave the extension to the case that f or φ is C-valued as an exercise. Thus, let f ∈ R(I) and let φ : f(I) −→ R be Lipschitz continuous. Then there exists L ≥ 0 such that

\[
|\varphi(x) - \varphi(y)| \le L\, |x - y| \quad \text{for each } x, y \in f(I). \tag{10.46}
\]

As f ∈ R(I), given ε > 0, Th. 10.13 provides a partition ∆ of I such that R(∆, f) − r(∆, f) < ε/L, and we obtain

\[
\begin{aligned}
R(\Delta, \varphi \circ f) - r(\Delta, \varphi \circ f)
&= \sum_{j=1}^{N} \big( M_j(\varphi \circ f) - m_j(\varphi \circ f) \big)\, |I_j|
\le \sum_{j=1}^{N} L\, \big( M_j(f) - m_j(f) \big)\, |I_j| \\
&= L\, \big( R(\Delta, f) - r(\Delta, f) \big) < \epsilon.
\end{aligned}
\tag{10.47}
\]

Thus, φ ◦ f ∈ R(I) by another application of Th. 10.13.

(c): $|f|, f^2, f^+, f^- \in R(I)$ follows from (b) (for |f| also for f ∈ R(I,C)), since each of the maps $x \mapsto |x|$, $x \mapsto x^2$, $x \mapsto \max\{x, 0\}$, $x \mapsto -\min\{x, 0\}$ is Lipschitz continuous on the bounded set f(I) (recall that f ∈ R(I) implies that f is bounded). Since $f = f^+ - f^-$, (10.2) is implied by (10.28). Finally, if f(x) ≥ δ > 0, then $x \mapsto x^{-1}$ is Lipschitz continuous on the bounded set f(I), and 1/f ∈ R(I) follows from (b).

(d): For f, g ∈ R(I), we note that, due to

\[
fg = \tfrac14 (f + g)^2 - \tfrac14 (f - g)^2, \tag{10.48a}
\]
\[
\max(f, g) = f + (g - f)^+, \tag{10.48b}
\]
\[
\min(f, g) = g - (f - g)^-, \tag{10.48c}
\]

everything is a consequence of (c). For f, g ∈ R(I,C), due to

\[
\bar f = (\operatorname{Re} f,\ -\operatorname{Im} f), \tag{10.48d}
\]
\[
fg = (\operatorname{Re} f \operatorname{Re} g - \operatorname{Im} f \operatorname{Im} g,\ \operatorname{Re} f \operatorname{Im} g + \operatorname{Im} f \operatorname{Re} g), \tag{10.48e}
\]
\[
1/g = (\operatorname{Re} g / |g|^2,\ -\operatorname{Im} g / |g|^2), \tag{10.48f}
\]

everything follows from the real-valued case together with (c) and Th. 10.11(a) (where |g| ≥ δ > 0 guarantees |g|² ≥ δ² > 0). ∎

10.2 Important Theorems

This section compiles a number of important theorems on Riemann integrals, which, in particular, provide powerful tools to actually evaluate such integrals.

10.2.1 Fundamental Theorem of Calculus

We provide two variants of the fundamental theorem with slightly different flavors: In the first variant, Th. 10.20(a), we start with a function f, obtain another function F by means of integrating f, and recover f by taking the derivative of F. In the second variant, Th. 10.20(b), one first differentiates the given function F, obtaining f := F′, followed by integrating f, recovering F up to an additive constant.

Notation 10.19. If a, b ∈ R, a ≤ b, I := [a, b], f : I −→ C, then denote

\[
\int_a^b f := \int_I f, \qquad \int_b^a f := -\int_a^b f, \tag{10.49a}
\]
\[
[f(t)]_a^b := [f]_a^b := f(b) - f(a), \qquad [f(t)]_b^a := [f]_b^a := f(a) - f(b), \tag{10.49b}
\]

where f ∈ R(I,C) for (10.49a).

Theorem 10.20. Let a, b ∈ R, a < b, I := [a, b].

(a) If f ∈ R(I,K) is continuous in ξ ∈ I, then, for each c ∈ I, the function

\[
F_c : I \to \mathbb K, \quad F_c(x) := \int_c^x f(t)\, dt, \tag{10.50}
\]

is differentiable in ξ with $F_c'(\xi) = f(\xi)$. In particular, if f ∈ C(I,K), then $F_c \in C^1(I,\mathbb K)$ and $F_c'(x) = f(x)$ for each x ∈ I.

(b) If $F \in C^1(I,\mathbb K)$ or, alternatively, F : I −→ K is differentiable with integrable derivative F′ ∈ R(I,K), then

\[
F(b) - F(a) = [F(t)]_a^b = \int_a^b F'(t)\, dt, \tag{10.51a}
\]

and

\[
F(x) = F(c) + \int_c^x F'(t)\, dt \quad \text{for each } c, x \in I. \tag{10.51b}
\]

Proof. It suffices to prove the case K = R, since the case K = C then follows by applying the case K = R to $\operatorname{Re} F_c$ and $\operatorname{Im} F_c$ (for (a)) and to Re F and Im F (for (b)). Thus, for the rest of the proof, we assume K = R.

(a): We need to show that

\[
\lim_{h \to 0} A(h) = 0, \quad \text{where} \quad A(h) := \frac{F_c(\xi + h) - F_c(\xi)}{h} - f(\xi). \tag{10.52}
\]

One computes

\[
A(h) = \frac1h \int_\xi^{\xi+h} f(t)\, dt - \frac1h f(\xi) \int_\xi^{\xi+h} dt = \frac1h \int_\xi^{\xi+h} \big( f(t) - f(\xi) \big)\, dt. \tag{10.53}
\]

Now, given ε > 0, the continuity of f in ξ allows us to find δ > 0 such that |f(t) − f(ξ)| < ε/2 for each t ∈ I with |t − ξ| < δ. Thus, for each h ≠ 0 with |h| < δ and ξ + h ∈ I, we obtain

\[
|A(h)| \le \frac{1}{|h|} \left| \int_\xi^{\xi+h} |f(t) - f(\xi)|\, dt \right| \le \frac{\epsilon\, |h|}{2\, |h|} < \epsilon, \tag{10.54}
\]

thereby proving $\lim_{h\to0} A(h) = 0$, i.e. $f(\xi) = F_c'(\xi)$.

(b): First assume $F \in C^1(I)$. Then F′ is continuous on I, and we can apply (a) to the function

\[
G : I \to \mathbb R, \quad G(x) := \int_a^x F'(t)\, dt, \tag{10.55}
\]

to obtain G′ = F′. Thus, for H := F − G, we obtain H′ ≡ 0, showing that H must be constant on I, i.e. H(x) = H(a) = F(a) − G(a) = F(a) for each x ∈ I. Evaluating at x = b yields

\[
F(a) = H(b) = F(b) - \int_a^b F'(t)\, dt, \tag{10.56}
\]

thereby establishing the case.

Now we consider the remaining case of a differentiable F with integrable derivative F′ ∈ R(I). Consider a partition $\Delta = (x_0, \dots, x_N)$ of I as in Def. 10.3. Then, for each j ∈ {1, . . . , N}, the mean value theorem provides $\xi_j \in\, ]x_{j-1}, x_j[$ such that $F(x_j) - F(x_{j-1}) = (x_j - x_{j-1})\, F'(\xi_j)$. Thus,

\[
F(b) - F(a) = \sum_{j=1}^{N} \big( F(x_j) - F(x_{j-1}) \big) = \sum_{j=1}^{N} (x_j - x_{j-1})\, F'(\xi_j) = \rho(\Delta, F'). \tag{10.57}
\]

If we choose a sequence of partitions ∆ of I such that |∆| → 0, then the integrability of F′ implies that the right-hand side of (10.57) converges to $\int_a^b F'$, once again establishing the case. ∎

Definition 10.21. If I ⊆ R, f : I −→ K, and F : I −→ K is a differentiable function with F′ = f, then F is called a primitive or antiderivative of f.

Example 10.22. (a) Due to the fundamental theorem, if we know a function’s antiderivative, we can easily compute its integral over a given interval. Here are three simple examples:

\[
\int_0^1 (x^5 - 3x)\, dx = \left[ \frac{x^6}{6} - \frac{3x^2}{2} \right]_0^1 = \frac16 - \frac32 = -\frac43, \tag{10.58a}
\]
\[
\int_1^e \frac1x\, dx = [\ln x]_1^e = \ln e - \ln 1 = 1, \tag{10.58b}
\]
\[
\int_0^\pi \sin x\, dx = [-\cos x]_0^\pi = 2. \tag{10.58c}
\]

(b) As a more complicated example, consider a rational function $x \mapsto r(x) = p(x)/q(x)$ with polynomials p and q. To find an antiderivative of r, one first writes r in the form r = s + p/q with polynomials s, p, q such that deg(p) < deg(q) (this can be done using so-called polynomial long division). One then applies the partial fraction decomposition of Sec. G of the Appendix to p/q: As a concrete example, consider

\[
r : \mathbb R \setminus \{-1, 2\} \to \mathbb R, \quad r(x) := \frac{3x^8 - 9x^6 - 6x^5 + x^3 + 5x^2 - 1}{x^3 - 3x - 2}.
\]

One then finds

\[
r(x) = 3x^5 + 1 + \frac{5x^2 + 3x + 1}{(x+1)^2 (x-2)},
\]

and the partial fraction decomposition according to (G.5) is

\[
r(x) = 3x^5 + 1 + \frac{2}{x+1} - \frac{1}{(x+1)^2} + \frac{3}{x-2}.
\]

One can now easily provide an antiderivative for each summand, and putting everything together yields the antiderivative

\[
R : \mathbb R \setminus \{-1, 2\} \to \mathbb R, \quad R(x) := \frac12 x^6 + x + 2 \ln|x+1| + \frac{1}{x+1} + 3 \ln|x-2|,
\]

for r. We note that, to apply partial fraction decomposition, one always needs the zeros of the denominator (i.e. of q). For polynomials of high degree, it will usually not be possible to determine these zeros exactly, but merely approximately.
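The partial fraction decomposition in Ex. 10.22(b) can be double-checked with exact rational arithmetic. The following is a minimal sketch using Python's `fractions.Fraction`; the function names `r` and `r_decomposed` and the choice of sample points (multiples of 1/7, avoiding the poles −1 and 2) are illustrative assumptions. Agreement at more points than the degrees involved confirms the identity of the two rational expressions.

```python
from fractions import Fraction


def r(x):
    """The rational function of Ex. 10.22(b) in its original form."""
    num = 3 * x**8 - 9 * x**6 - 6 * x**5 + x**3 + 5 * x**2 - 1
    den = x**3 - 3 * x - 2
    return num / den


def r_decomposed(x):
    """Polynomial part plus partial fractions, as stated in the example."""
    return 3 * x**5 + 1 + 2 / (x + 1) - 1 / (x + 1) ** 2 + 3 / (x - 2)


# Sample points n/7, skipping the poles x = -1 (n = -7) and x = 2 (n = 14).
samples = [Fraction(n, 7) for n in range(-30, 31) if n not in (-7, 14)]
mismatches = [x for x in samples if r(x) != r_decomposed(x)]
print("number of mismatches:", len(mismatches))
```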

10.2.2 Integration by Parts Formula

Theorem 10.23. Let a, b ∈ R, a < b, I := [a, b]. If $f, g \in C^1(I,\mathbb C)$, then the following integration by parts formula holds:

\[
\int_a^b f g' = [fg]_a^b - \int_a^b f' g. \tag{10.59}
\]

Proof. If $f, g \in C^1(I,\mathbb C)$, then, according to the product rule, $fg \in C^1(I,\mathbb C)$ with (fg)′ = f′g + fg′. Applying (10.51a), we obtain

\[
[fg]_a^b = \int_a^b (fg)' = \int_a^b f' g + \int_a^b f g', \tag{10.60}
\]

which is precisely (10.59). ∎

Example 10.24. We compute the integral $\int_0^{2\pi} \sin^2 t\, dt$:

\[
\int_0^{2\pi} \sin^2 t\, dt = [-\sin t \cos t]_0^{2\pi} + \int_0^{2\pi} \cos^2 t\, dt = \int_0^{2\pi} \cos^2 t\, dt. \tag{10.61}
\]

Adding $\int_0^{2\pi} \sin^2 t\, dt$ on both sides of (10.61) and using $\sin^2 + \cos^2 \equiv 1$ yields

\[
2 \int_0^{2\pi} \sin^2 t\, dt = \int_0^{2\pi} 1\, dt = 2\pi, \tag{10.62}
\]

i.e. $\int_0^{2\pi} \sin^2 t\, dt = \pi$.
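As a plausibility check of (10.61) and (10.62), one can approximate both integrals by intermediate Riemann sums. The sketch below assumes equidistant partitions tagged at the midpoints of the subintervals; the helper name `midpoint_rule` and the number of subintervals are illustrative choices.

```python
import math


def midpoint_rule(f, a, b, n):
    """Intermediate Riemann sum for the equidistant partition of [a, b]
    with n subintervals, tagged at the midpoints."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))


sin_sq = midpoint_rule(lambda t: math.sin(t) ** 2, 0.0, 2 * math.pi, 1000)
cos_sq = midpoint_rule(lambda t: math.cos(t) ** 2, 0.0, 2 * math.pi, 1000)
print(sin_sq, cos_sq, math.pi)
```

Both approximations should agree with each other and with π up to rounding, in line with the example.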

10.2.3 Change of Variables

Theorem 10.25. Let I, J ⊆ R be intervals, $\varphi \in C^1(I)$ and f ∈ C(J,C). If φ(I) ⊆ J, then the following change of variables formula holds for each a, b ∈ I:

\[
\int_{\varphi(a)}^{\varphi(b)} f = \int_{\varphi(a)}^{\varphi(b)} f(x)\, dx = \int_a^b f(\varphi(t))\, \varphi'(t)\, dt = \int_a^b (f \circ \varphi)\, \varphi'. \tag{10.63}
\]

Proof. Let

\[
F : J \to \mathbb C, \quad F(x) := \int_{\varphi(a)}^{x} f(t)\, dt. \tag{10.64}
\]

According to Th. 10.20(a) and the chain rule of Th. 9.11, we obtain

\[
(F \circ \varphi)' : I \to \mathbb C, \quad (F \circ \varphi)'(x) = \varphi'(x)\, f(\varphi(x)). \tag{10.65}
\]

Thus, we can apply (10.51a), which yields

\[
\int_{\varphi(a)}^{\varphi(b)} f = F(\varphi(b)) - F(\varphi(a)) = \int_a^b (f \circ \varphi)\, \varphi', \tag{10.66}
\]

proving (10.63). ∎

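Example 10.26. We compute the integral $\int_0^1 t^2 \sqrt{1 - t}\, dt$ using the change of variables x := φ(t) := 1 − t, φ′(t) = −1:

\[
\begin{aligned}
\int_0^1 t^2 \sqrt{1 - t}\, dt
&= -\int_1^0 (1 - x)^2 \sqrt{x}\, dx = \int_0^1 \big( \sqrt{x} - 2x \sqrt{x} + x^2 \sqrt{x} \big)\, dx \\
&= \left[ \frac{2 x^{3/2}}{3} - \frac{4 x^{5/2}}{5} + \frac{2 x^{7/2}}{7} \right]_0^1 = \frac{16}{105}.
\end{aligned}
\tag{10.67}
\]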
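The value 16/105 in (10.67) can be confirmed in two ways: exactly, by evaluating the antiderivative at x = 1 with rational arithmetic, and approximately, by midpoint Riemann sums of the integrand before and after the substitution. This is only a sketch; the helper `midpoint_rule` and the number of subintervals are illustrative assumptions.

```python
import math
from fractions import Fraction

# Exact value of the bracket in (10.67) at x = 1 (it vanishes at x = 0).
exact = Fraction(2, 3) - Fraction(4, 5) + Fraction(2, 7)


def midpoint_rule(f, a, b, n):
    """Midpoint-tagged Riemann sum on the equidistant partition of [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))


# Integrand in the variable t and in the substituted variable x = 1 - t.
lhs = midpoint_rule(lambda t: t**2 * math.sqrt(1 - t), 0.0, 1.0, 20000)
rhs = midpoint_rule(lambda x: (1 - x) ** 2 * math.sqrt(x), 0.0, 1.0, 20000)
print(exact, float(exact), lhs, rhs)
```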

10.3 Application: Taylor’s Theorem

Theorem 10.27 (Taylor’s Theorem). Let I ⊆ R be an open interval and a, x ∈ I, x ≠ a. If $m \in \mathbb N_0$ and $f \in C^{m+1}(I,\mathbb K)$, then

\[
f(x) = T_m(x, a) + R_m(x, a), \tag{10.68}
\]

where

\[
T_m(x, a) := \sum_{k=0}^{m} \frac{f^{(k)}(a)}{k!} (x - a)^k = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!} (x - a)^2 + \dots + \frac{f^{(m)}(a)}{m!} (x - a)^m \tag{10.69}
\]

is the mth Taylor polynomial and

\[
R_m(x, a) := \int_a^x \frac{(x - t)^m}{m!} f^{(m+1)}(t)\, dt \tag{10.70}
\]

is the integral form of the remainder term. For K = R, one can also write the remainder term in Lagrange form:

\[
R_m(x, a) = \frac{f^{(m+1)}(\theta)}{(m+1)!} (x - a)^{m+1} \quad \text{with some suitable } \theta \in\, ]x, a[. \tag{10.71}
\]

Proof. The integral form (10.70) of the remainder term we prove by using induction on m: For m = 0, the assertion is

\[
f(x) = f(a) + \int_a^x f'(t)\, dt, \tag{10.72}
\]

which holds according to the fundamental theorem of calculus in the form Th. 10.20(b). For the induction step, we assume (10.68) holds for fixed $m \in \mathbb N_0$ with $R_m(x, a)$ in integral form (10.70) and consider $f \in C^{m+2}(I,\mathbb K)$. For fixed x ∈ I, we define the function

\[
g : I \to \mathbb K, \quad g(t) := \frac{(x - t)^{m+1}}{(m+1)!} f^{(m+1)}(t). \tag{10.73}
\]

Using the product rule, its derivative is

\[
g' : I \to \mathbb K, \quad g'(t) = \frac{(x - t)^{m+1}}{(m+1)!} f^{(m+2)}(t) - \frac{(x - t)^m}{m!} f^{(m+1)}(t). \tag{10.74}
\]

Applying the fundamental theorem to g then yields

\[
-g(a) = g(x) - g(a) = \int_a^x g'(t)\, dt \overset{(10.74)}{=} R_{m+1}(x, a) - R_m(x, a), \tag{10.75}
\]

with $R_m(x, a)$ and $R_{m+1}(x, a)$ defined according to (10.70). Thus,

\[
\begin{aligned}
T_{m+1}(x, a) + R_{m+1}(x, a)
&\overset{(10.75)}{=} T_m(x, a) + \frac{f^{(m+1)}(a)}{(m+1)!} (x - a)^{m+1} + R_m(x, a) - g(a) \\
&= T_m(x, a) + R_m(x, a) \overset{\text{ind. hyp.}}{=} f(x),
\end{aligned}
\tag{10.76}
\]

thereby completing the induction and the proof of (10.70).

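It remains to prove the Lagrange form (10.71) of the remainder term for K = R. The function $t \mapsto (x - t)^m$ does not change its sign on the interval between a and x (it is ≥ 0 if x > a or m is even, and ≤ 0 if x < a and m is odd). Thus, (10.70) and (10.35b) (applied with p or −p, as needed) imply the existence of some θ in the closed interval between x and a such that

\[
R_m(x, a) = f^{(m+1)}(\theta) \int_a^x \frac{(x - t)^m}{m!}\, dt = \frac{f^{(m+1)}(\theta)}{(m+1)!} (x - a)^{m+1}, \tag{10.77}
\]

proving the Lagrange form, where it remains an exercise to show one can always choose θ ∈ ]x, a[. ∎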

Remark 10.28. The importance of Taylor’s Th. 10.27 does not lie in the decomposition $f = T_m + R_m$, which can be accomplished simply by defining $R_m := f - T_m$. The importance lies rather in the specific formulas for the remainder term.

Example 10.29. Depending on $f \in C^\infty(I)$ and x, a ∈ I, the Taylor series

\[
\big( T_m(x, a) \big)_{m \in \mathbb N} = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!} (x - a)^k
\]

can converge to f(x) (see (a) below), diverge (see (b) below), or converge to η ≠ f(x) (see (c) below):

(a) For f : R −→ R, $f(x) = e^x$, x ∈ R, a = 0, we have $f^{(k)}(a) = e^0 = 1$ for each $k \in \mathbb N_0$, recovering the power series for the exponential function, which we already know to converge from Def. and Rem. 8.14:

\[
\lim_{m\to\infty} T_m(x, 0) = \lim_{m\to\infty} \sum_{k=0}^{m} \frac{1}{k!} x^k = e^x.
\]

(b) For $f : \mathbb R^+ \to \mathbb R$, f(x) = ln x, we have $f^{(m)}(x) = (-1)^{m-1} (m-1)!\, x^{-m}$ for each m ∈ N and each $x \in \mathbb R^+$. Thus, for $x = \frac32$ and $a = \frac12$, using $f^{(k)}(a) = (-1)^{k-1} (k-1)!\, 2^k$ for each k ∈ N, we have

\[
T_m(x, a) = \sum_{k=0}^{m} \frac{f^{(k)}(a)}{k!} (x - a)^k = \ln\tfrac12 + \sum_{k=1}^{m} (-1)^{k-1} \frac{2^k}{k},
\]

which diverges for m → ∞, since the summands do not converge to 0.

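(c) For

\[
f : \mathbb R \to \mathbb R, \quad f(x) := \begin{cases} e^{-1/x^2} & \text{for } x \ne 0, \\ 0 & \text{for } x = 0, \end{cases}
\]

it is an exercise to show $f \in C^\infty(\mathbb R)$ with $f^{(k)}(0) = 0$ for each $k \in \mathbb N_0$ (hint: for x ≠ 0, one obtains $f^{(k)}(x) = P_k(x^{-1})\, e^{-1/x^2}$, where $P_k$ is a polynomial of degree 3k). Thus, $T_m(x, 0) = 0$ for each x ∈ R and each $m \in \mathbb N_0$, implying

\[
\lim_{m\to\infty} T_m(x, 0) = 0 \ne f(x) \quad \text{for each } x \in \mathbb R \setminus \{0\}.
\]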
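The behaviors in Ex. 10.29(a) and (b) can be observed numerically. The following is a rough sketch; the function names `taylor_exp` and `taylor_ln` are hypothetical, and the expansion of ln around a = 1 is added only as a contrasting, convergent case that is not discussed in the text. For the logarithm, the k-th Taylor coefficient at a is $(-1)^{k-1} / (k a^k)$ for k ≥ 1, and the term for k = 0 is ln a.

```python
import math


def taylor_exp(x, m):
    """T_m(x, 0) for the exponential function."""
    return sum(x**k / math.factorial(k) for k in range(m + 1))


def taylor_ln(x, a, m):
    """T_m(x, a) for the natural logarithm, a > 0."""
    tail = sum((-1) ** (k - 1) * (x - a) ** k / (k * a**k) for k in range(1, m + 1))
    return math.log(a) + tail


print(taylor_exp(1.0, 20), math.e)               # (a): converges to e
print(taylor_ln(1.5, 1.0, 60), math.log(1.5))    # converges, since |x - a| < a
for m in (10, 20, 40):
    print(m, taylor_ln(1.5, 0.5, m))             # (b): partial sums blow up
```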

10.4 Improper Integrals

For our definition of the Riemann integral in Def. 10.5, it was important that we considered bounded functions on compact intervals (where the boundedness of the intervals was more important than the closedness). For unbounded functions and/or unbounded intervals, even Def. 10.4 of lower and upper Riemann sums no longer makes sense.

Still, for sufficiently benign functions, it is possible to extend the notion of a definite Riemann integral to both unbounded intervals and unbounded functions, and in such situations we will speak of improper integrals (cf. Def. 10.33 below).

Definition 10.30. Let ∅ ≠ I ⊆ R be an interval. We call f : I −→ R locally Riemann integrable if, and only if, f ∈ R(J) for each compact interval J ⊆ I. Let $R_{\mathrm{loc}}(I)$ denote the set of all locally Riemann integrable functions on I.

Remark 10.31. In particular, locally Riemann integrable functions are bounded on every compact interval. Moreover, $R_{\mathrm{loc}}(I) = R(I)$ if, and only if, I is a compact interval. For example, for each a, b ∈ R with a < b, the function given by the assignment rule

\[
f(x) := \frac{1}{(x - a)(x - b)}
\]

is clearly locally Riemann integrable, but not bounded on each of the intervals ]−∞, a[, ]a, b[, and ]b, ∞[.

—

Before we can define improper Riemann integrals, we define, in partial extension of Def. 8.17:

Definition 10.32. Let M ⊆ R. If M is unbounded from above (resp. below), then f : M −→ K is said to tend to η ∈ K (or to have the limit η ∈ K) for x → ∞ (resp. for x → −∞) (denoted by $\lim_{x\to\pm\infty} f(x) = \eta$) if, and only if, for each sequence $(\xi_k)_{k\in\mathbb N}$ in M with $\lim_{k\to\infty} \xi_k = \infty$ (resp. with $\lim_{k\to\infty} \xi_k = -\infty$), the sequence $(f(\xi_k))_{k\in\mathbb N}$ converges to η ∈ K, i.e.

\[
\lim_{x\to\pm\infty} f(x) = \eta \quad \Leftrightarrow \quad \forall_{(\xi_k)_{k\in\mathbb N} \text{ in } M}\ \Big( \lim_{k\to\infty} \xi_k = \pm\infty \ \Rightarrow\ \lim_{k\to\infty} f(\xi_k) = \eta \Big). \tag{10.78}
\]

Definition 10.33. Let a < c < b (a = −∞, b = ∞ is admissible).

(a) Let I := [c, b[, $f \in R_{\mathrm{loc}}(I)$, and assume b = ∞ or f is unbounded. Consider the function

\[
F : I \to \mathbb R, \quad F(x) := \int_c^x f.
\]

If the limit

\[
\lim_{x\to b} F(x) = \lim_{x\to b} \int_c^x f \tag{10.79}
\]

exists in R, then we define

\[
\int_I f := \int_c^b f(t)\, dt := \int_c^b f := \lim_{x\to b} \int_c^x f.
\]

(b) Let I := ]a, c], $f \in R_{\mathrm{loc}}(I)$, and assume a = −∞ or f is unbounded. Consider the function

\[
F : I \to \mathbb R, \quad F(x) := \int_x^c f.
\]

If the limit

\[
\lim_{x\to a} F(x) = \lim_{x\to a} \int_x^c f \tag{10.80}
\]

exists in R, then we define

\[
\int_I f := \int_a^c f(t)\, dt := \int_a^c f := \lim_{x\to a} \int_x^c f.
\]

(c) Let I = ]a, b[, $f \in R_{\mathrm{loc}}(I)$. If the conditions of both (a) and (b) hold, i.e. (i) to (iv), where

(i) b = ∞ or f is unbounded on [c, b[,

(ii) $\lim_{x\to b} \int_c^x f$ exists in R,

(iii) a = −∞ or f is unbounded on ]a, c],

(iv) $\lim_{x\to a} \int_x^c f$ exists in R,

then we define

\[
\int_I f := \int_a^b f(t)\, dt := \int_a^b f := \int_a^c f + \int_c^b f.
\]

All the above limits of Riemann integrals (if they exist) are called improper Riemann integrals. In each case, if the limit exists, we call f improperly Riemann integrable and write f ∈ R(I).

Remark 10.34. (a) The definitions in Def. 10.33 are consistent with what occurs if the limits are proper Riemann integrals: Let a, c, b ∈ R, a < c < b, and f ∈ R[a, b]. Then

\[
\lim_{x\to b} \int_c^x f = \int_c^b f \quad \text{and} \quad \lim_{x\to a} \int_x^c f = \int_a^c f. \tag{10.81}
\]

Indeed, since f ∈ R[a, b], |f| is bounded by some $M \in \mathbb R^+$; and if $(x_k)_{k\in\mathbb N}$ is a sequence in [a, b[ such that $\lim_{k\to\infty} x_k = b$, then

\[
\Big| \int_{x_k}^b f \Big| \le \int_{x_k}^b |f| \le M\, (b - x_k) \to 0 \quad \text{for } k \to \infty,
\]

implying

\[
\lim_{k\to\infty} \int_c^{x_k} f \overset{\text{Th. 10.11(b)}}{=} \lim_{k\to\infty} \Big( \int_c^b f - \int_{x_k}^b f \Big) = \int_c^b f - 0 = \int_c^b f.
\]

An analogous argument shows the remaining equality in (10.81).

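(b) In Def. 10.33(c), it can occur that $\int_{-\infty}^{\infty} f$ does not exist, even though the limit $\lim_{x\to\infty} \int_{-x}^{x} f$ exists: For example, if f : R −→ R, f(x) = x, then $f \in R_{\mathrm{loc}}(\mathbb R)$, and, for each sequence $(x_k)_{k\in\mathbb N}$ in R such that $\lim_{k\to\infty} x_k = \infty$ and each c ∈ R, one has

\[
\lim_{k\to\infty} \int_{-x_k}^{x_k} t\, dt = \lim_{k\to\infty} \left[ \frac{t^2}{2} \right]_{-x_k}^{x_k} = \lim_{k\to\infty} \frac{x_k^2 - x_k^2}{2} = 0,
\]
\[
\lim_{k\to\infty} \int_c^{x_k} t\, dt = \lim_{k\to\infty} \left[ \frac{t^2}{2} \right]_c^{x_k} = \lim_{k\to\infty} \frac{x_k^2 - c^2}{2} = \infty,
\]
\[
\lim_{k\to\infty} \int_{-x_k}^{c} t\, dt = \lim_{k\to\infty} \left[ \frac{t^2}{2} \right]_{-x_k}^{c} = \lim_{k\to\infty} \frac{c^2 - x_k^2}{2} = -\infty,
\]

i.e. $\lim_{x\to\infty} \int_{-x}^{x} t\, dt = 0$, but neither $\lim_{x\to\infty} \int_c^x t\, dt$ nor $\lim_{x\to-\infty} \int_x^c t\, dt$ exists in R.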

(c) Let $a < c_1 < c_2 < b$ (a = −∞, b = ∞ is admissible). If $I := [c_1, b[$, $f \in R_{\mathrm{loc}}(I)$, and b = ∞ or f is unbounded, then $\int_{c_1}^b f$ exists if, and only if, $\int_{c_2}^b f$ exists. Moreover, if the integrals exist, then

\[
\int_{c_1}^b f = \int_{c_1}^{c_2} f + \int_{c_2}^b f. \tag{10.82a}
\]

Indeed, if $(x_k)_{k\in\mathbb N}$ is a sequence in $[c_1, b[$ such that $\lim_{k\to\infty} x_k = b$ and if $\int_{c_1}^b f$ exists, then

\[
\lim_{k\to\infty} \int_{c_2}^{x_k} f \overset{\text{Th. 10.11(b)}}{=} \lim_{k\to\infty} \Big( \int_{c_1}^{x_k} f - \int_{c_1}^{c_2} f \Big) = \int_{c_1}^b f - \int_{c_1}^{c_2} f,
\]

proving $\int_{c_2}^b f$ exists and (10.82a) holds. Conversely, if $\int_{c_2}^b f$ exists, then

\[
\lim_{k\to\infty} \int_{c_1}^{x_k} f \overset{\text{Th. 10.11(b)}}{=} \lim_{k\to\infty} \Big( \int_{c_2}^{x_k} f + \int_{c_1}^{c_2} f \Big) = \int_{c_2}^b f + \int_{c_1}^{c_2} f,
\]

proving $\int_{c_1}^b f$ exists and (10.82a) holds. Analogously, one shows that if $I := ]a, c_2]$, $f \in R_{\mathrm{loc}}(I)$, and a = −∞ or f is unbounded, then $\int_a^{c_1} f$ exists if, and only if, $\int_a^{c_2} f$ exists, where, if the integrals exist, then

\[
\int_a^{c_2} f = \int_{c_1}^{c_2} f + \int_a^{c_1} f. \tag{10.82b}
\]

In particular, we see that neither the existence nor the value of the improper integral in Def. 10.33(c) depends on the choice of c.

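Example 10.35. (a) Let 0 < α < 1. We claim that

\[
\int_0^1 \frac{1}{t^\alpha}\, dt = \frac{1}{1 - \alpha} \qquad \Big( \alpha = \frac12 \text{ yields } \int_0^1 \frac{1}{\sqrt t}\, dt = 2 \Big). \tag{10.83}
\]

Indeed, if $(x_k)_{k\in\mathbb N}$ is a sequence in ]0, 1] such that $\lim_{k\to\infty} x_k = 0$, then

\[
\lim_{k\to\infty} \int_{x_k}^1 \frac{1}{t^\alpha}\, dt = \lim_{k\to\infty} \left[ \frac{t^{1-\alpha}}{1 - \alpha} \right]_{x_k}^1 = \lim_{k\to\infty} \frac{1 - x_k^{1-\alpha}}{1 - \alpha} = \frac{1}{1 - \alpha}.
\]

(b) If $(x_k)_{k\in\mathbb N}$ is a sequence in ]0, 1] such that $\lim_{k\to\infty} x_k = 0$, then

\[
\lim_{k\to\infty} \int_{x_k}^1 \frac1t\, dt = \lim_{k\to\infty} [\ln t]_{x_k}^1 = \lim_{k\to\infty} (0 - \ln x_k) = \infty,
\]

showing the limit does not exist in R, but diverges to ∞. Sometimes, this is stated in the form

\[
\int_0^1 \frac1t\, dt = \infty. \tag{10.84}
\]

(c) We claim that

\[
\int_{-\infty}^0 e^t\, dt = 1. \tag{10.85}
\]

Indeed, if $(x_k)_{k\in\mathbb N}$ is a sequence in $\mathbb R_0^-$ such that $\lim_{k\to\infty} x_k = -\infty$, then

\[
\lim_{k\to\infty} \int_{x_k}^0 e^t\, dt = \lim_{k\to\infty} [e^t]_{x_k}^0 = \lim_{k\to\infty} (1 - e^{x_k}) = 1.
\]

(d) Consider the function

\[
f : \mathbb R_0^+ \to \mathbb R, \quad f(t) := \begin{cases} n & \text{for } n \le t \le n + \frac{1}{n 2^n},\ n \in \mathbb N, \\ 0 & \text{otherwise.} \end{cases}
\]

Then $\lim_{t\to\infty} f(t)$ does not exist and f is not even bounded. However, $f \in R(\mathbb R_0^+)$ and

\[
\int_0^\infty f = \sum_{n=1}^{\infty} \int_n^{n + 1/(n 2^n)} n\, dt = \sum_{n=1}^{\infty} 2^{-n} = \frac{1}{1 - \frac12} - 1 = 1.
\]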
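The contrast between Ex. 10.35(a) and (b) is easy to observe by evaluating the proper integrals over [x, 1] for shrinking x. The sketch below uses the antiderivatives from the example rather than numerical quadrature; the function names and the sample points x = 10⁻ᵏ are illustrative assumptions.

```python
import math


def integral_t_pow_minus_alpha(x, alpha):
    """Proper integral of t^(-alpha) over [x, 1] for 0 < x <= 1, alpha != 1."""
    return (1.0 - x ** (1.0 - alpha)) / (1.0 - alpha)


def integral_one_over_t(x):
    """Proper integral of 1/t over [x, 1] for 0 < x <= 1."""
    return -math.log(x)


alpha = 0.5
values = [integral_t_pow_minus_alpha(10.0 ** (-k), alpha) for k in range(1, 13)]
logs = [integral_one_over_t(10.0 ** (-k)) for k in range(1, 13)]
for k, (v, l) in enumerate(zip(values, logs), start=1):
    # First column approaches 1/(1 - alpha) = 2; second grows without bound.
    print(f"x = 1e-{k}: {v:.6f}  {l:.3f}")
```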

Lemma 10.36. Let a < c < b (a = −∞, b = ∞ is admissible). Let I ⊆ ]a, b[ be one of the three kinds of intervals occurring in Def. 10.33 (i.e. I = [c, b[, I = ]a, c], or I = ]a, b[), and assume f, g : I −→ R to be improperly Riemann integrable over I.

(a) Linearity: For each λ, µ ∈ R, λf + µg is improperly Riemann integrable over I and

\[
\int_I (\lambda f + \mu g) = \lambda \int_I f + \mu \int_I g.
\]

(b) Monotonicity: If f ≤ g, then

\[
\int_I f \le \int_I g.
\]

Proof. We conduct the proof for the case I = [c, b[; the case I = ]a, c] can be shown analogously, and the case I = ]a, b[ then also follows. Let $(x_k)_{k\in\mathbb N}$ be a sequence in I such that $\lim_{k\to\infty} x_k = b$.

(a): One computes

\[
\lim_{k\to\infty} \int_c^{x_k} (\lambda f + \mu g) \overset{\text{Th. 10.11(a)}}{=} \lim_{k\to\infty} \Big( \lambda \int_c^{x_k} f + \mu \int_c^{x_k} g \Big) = \lambda \int_c^b f + \mu \int_c^b g,
\]

showing (λf + µg) ∈ R(I) and proving (a).

(b): One estimates

\[
\int_c^b f = \lim_{k\to\infty} \int_c^{x_k} f \overset{\text{Th. 10.11(c)}}{\le} \lim_{k\to\infty} \int_c^{x_k} g = \int_c^b g,
\]

proving (b). ∎

Definition 10.37. Let a < c < b (a = −∞, b = ∞ is admissible). Let I ⊆ ]a, b[ be one of the three kinds of intervals occurring in Def. 10.33 (i.e. I = [c, b[, I = ]a, c], or I = ]a, b[), and assume $f \in R_{\mathrm{loc}}(I)$. Then, by Th. 10.18(c), $|f| \in R_{\mathrm{loc}}(I)$. If $\int_I |f|$ exists as an improper integral, then we call the improper integral $\int_I f$ absolutely convergent.

—

Before we can proceed to Prop. 10.39 about convergence criteria for improper integrals, we need to prove the analogon of Th. 7.19 for limits of functions.

Proposition 10.38. Let ∅ ≠ M ⊆ R, a ∈ R ∪ {−∞}, b ∈ R ∪ {∞}, and assume

\[
a = \begin{cases} \inf(M \setminus \{a\}) & \text{if } M \text{ is bounded from below,} \\ -\infty & \text{if } M \text{ is unbounded from below,} \end{cases} \tag{10.86a}
\]
\[
b = \begin{cases} \sup(M \setminus \{b\}) & \text{if } M \text{ is bounded from above,} \\ \infty & \text{if } M \text{ is unbounded from above.} \end{cases} \tag{10.86b}
\]

Let f : M −→ R be monotone (increasing or decreasing). Defining A := f(M) = {f(x) : x ∈ M}, the following holds:

\[
\lim_{x\to b} f(x) = \begin{cases}
\sup A & \text{if } f \text{ is increasing and } A \text{ is bounded from above,} \\
\infty & \text{if } f \text{ is increasing and } A \text{ is not bounded from above,} \\
\inf A & \text{if } f \text{ is decreasing and } A \text{ is bounded from below,} \\
-\infty & \text{if } f \text{ is decreasing and } A \text{ is not bounded from below,}
\end{cases} \tag{10.87a}
\]
\[
\lim_{x\to a} f(x) = \begin{cases}
\sup A & \text{if } f \text{ is decreasing and } A \text{ is bounded from above,} \\
\infty & \text{if } f \text{ is decreasing and } A \text{ is not bounded from above,} \\
\inf A & \text{if } f \text{ is increasing and } A \text{ is bounded from below,} \\
-\infty & \text{if } f \text{ is increasing and } A \text{ is not bounded from below.}
\end{cases} \tag{10.87b}
\]

Proof. We prove (10.87a) for the case where f is increasing; the remaining case of (10.87a) as well as (10.87b) can be proved completely analogously. Let $(x_k)_{k\in\mathbb N}$ be a sequence in M \ {b} such that $\lim_{k\to\infty} x_k = b$. We have to show that $\lim_{k\to\infty} f(x_k) = \eta$, where η := sup A for A bounded from above and η := ∞ for A not bounded from above. Seeking a contradiction, assume $\lim_{k\to\infty} f(x_k) = \eta$ does not hold. Due to the choice of b, there then must be ε > 0 and a subsequence $(y_k)_{k\in\mathbb N}$ of $(x_k)_{k\in\mathbb N}$ such that $(y_k)_{k\in\mathbb N}$ is strictly increasing and

\[
\forall_{k\in\mathbb N} \quad f(y_k) \le \begin{cases} \eta - \epsilon & \text{if } \eta = \sup A, \\ \epsilon & \text{if } \eta = \infty. \end{cases}
\]

Since $\lim_{k\to\infty} y_k = b$ and f is increasing, this means sup A ≤ η − ε or sup A ≤ ε, which means a contradiction in each case. Thus, $\lim_{k\to\infty} f(x_k) = \eta$ must hold and the proof is complete. ∎

Proposition 10.39. Let a < c < b (a = −∞, b = ∞ is admissible). Let I ⊆ ]a, b[ be one of the three kinds of intervals occurring in Def. 10.33 (i.e. I = [c, b[, I = ]a, c], or I = ]a, b[), and assume $f \in R_{\mathrm{loc}}(I)$.

(a) If $g \in R_{\mathrm{loc}}(I)$, 0 ≤ f ≤ g, and $\int_I g$ exists, then $\int_I f$ exists as well. Conversely, if 0 ≤ g ≤ f and $\int_I g$ diverges, then $\int_I f$ diverges as well.

(b) If $\int_I f$ is an improper integral that is absolutely convergent, then it is also convergent.

Proof. (a): We consider the case I = [c, b[; the proof for the case I = ]a, c] is completely analogous, and the case I = ]a, b[ then also follows. First, suppose 0 ≤ f ≤ g, and $\int_I g$ exists. Since 0 ≤ f, the function

\[
F : [c, b[ \to \mathbb R_0^+, \quad F(x) := \int_c^x f,
\]

is increasing. Due to

\[
\forall_{x \in [c, b[} \quad F(x) = \int_c^x f \le \int_c^x g \le \int_c^b g \in \mathbb R_0^+,
\]

F is also bounded from above (in the sense that {F(x) : x ∈ [c, b[} is bounded from above), i.e. Prop. 10.38 yields that $\lim_{x\to b} F(x) = \lim_{x\to b} \int_c^x f$ exists in R as claimed.

Now suppose 0 ≤ g ≤ f and $\int_I g$ diverges. As the function F above, the function

\[
G : [c, b[ \to \mathbb R_0^+, \quad G(x) := \int_c^x g,
\]

is increasing. Since we assume that $\lim_{x\to b} G(x)$ does not exist in R, Prop. 10.38 implies $\lim_{x\to b} G(x) = \infty$. As a consequence, if $(x_k)_{k\in\mathbb N}$ is a sequence in [c, b[ such that $\lim_{k\to\infty} x_k = b$, then

\[
F(x_k) = \int_c^{x_k} f \ge \int_c^{x_k} g = G(x_k) \to \infty \quad \text{for } k \to \infty,
\]

showing that $\int_I f$ diverges as well.

(b): We assume $\int_I f$ to converge absolutely, i.e. $\int_I |f|$ must exist in R. Since $0 \le f^+ \le |f|$ and $0 \le f^- \le |f|$, (a) then implies the existence of $\int_I f^+$ and of $\int_I f^-$. Thus, according to Lem. 10.36(a), $\int_I f = \int_I f^+ - \int_I f^-$ must also exist. ∎

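Example 10.40. (a) We will use Prop. 10.39(a) to show that the improper integral

\[
\int_0^\infty e^{-t^2}\, dt
\]

exists. Indeed,

\[
\forall_{t\in\mathbb R} \quad \Big( (t - 1)^2 = t^2 - 2t + 1 \ge 0 \ \Rightarrow\ -t^2 \le -2t + 1 \ \Rightarrow\ 0 \le e^{-t^2} \le e^{-2t+1} \Big),
\]

and, since

\[
\int_0^\infty e^{-2t+1}\, dt = \lim_{x\to\infty} \int_0^x e^{-2t+1}\, dt = \lim_{x\to\infty} \left[ -\frac{e^{-2t+1}}{2} \right]_0^x = \lim_{x\to\infty} \frac{e - e^{-2x+1}}{2} = \frac e2,
\]

Prop. 10.39(a) implies that $\int_0^\infty e^{-t^2}\, dt$ exists in R.

(b) We will use Prop. 10.39(a) to show that

\[
\int_0^\infty e^{t^2}\, dt
\]

diverges. Indeed,

\[
\forall_{t\in\mathbb R} \quad \big( t^2 \ge 0 \ \Rightarrow\ e^{t^2} \ge 1 \big),
\]

and, since

\[
\lim_{x\to\infty} \int_0^x 1\, dt = \lim_{x\to\infty} x = \infty,
\]

Prop. 10.39(a) implies that $\int_0^\infty e^{t^2}\, dt = \infty$.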


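(c) We provide an example that shows an improper integral can converge without converging absolutely: Consider the function

\[
f : [0, \infty[ \to \mathbb R, \quad f(t) := \begin{cases} (-1)^{n+1} & \text{for } n \le t \le n + \frac1n,\ n \in \mathbb N, \\ 0 & \text{otherwise.} \end{cases} \tag{10.88}
\]

Then

\[
\int_0^\infty |f| = \lim_{n\to\infty} \sum_{k=1}^{n} \int_k^{k + \frac1k} 1\, dt = \lim_{n\to\infty} \sum_{k=1}^{n} \frac1k \overset{(7.72)}{=} \infty, \tag{10.89}
\]

showing $\int_0^\infty f$ does not converge absolutely. However, we will show

\[
\int_0^\infty f = \sum_{j=1}^{\infty} \frac{(-1)^{j+1}}{j} =: \alpha > 0. \tag{10.90}
\]

We know α > 0 from Ex. 7.86(a) and Th. 7.85. Let $(x_k)_{k\in\mathbb N}$ be a sequence in $\mathbb R_0^+$ such that $\lim_{k\to\infty} x_k = \infty$. Given ε > 0, choose K ∈ N such that $\frac1K < \frac\epsilon2$ and N ∈ N such that

\[
\forall_{k > N} \quad x_k > K. \tag{10.91}
\]

Then, for each k > N, there exists $K_1 \in \mathbb N$ such that $K \le K_1 \le x_k < K_1 + 1$. Thus

\[
\int_0^{x_k} f(t)\, dt = (-1)^{K_1+1} \min\Big\{ x_k - K_1,\ \frac{1}{K_1} \Big\} + \sum_{j=1}^{K_1 - 1} \frac{(-1)^{j+1}}{j} \tag{10.92}
\]

and

\[
\Big| \alpha - \int_0^{x_k} f(t)\, dt \Big| = \Bigg| \sum_{j=K_1}^{\infty} \frac{(-1)^{j+1}}{j} - (-1)^{K_1+1} \min\Big\{ x_k - K_1,\ \frac{1}{K_1} \Big\} \Bigg| \overset{(7.79)}{<} \frac{1}{K_1} + \frac{1}{K_1} \le \frac2K < 2 \cdot \frac\epsilon2 = \epsilon, \tag{10.93}
\]

thereby proving (10.90).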
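The convergence in (10.90) can be illustrated by evaluating the partial integrals in closed form. The following is a sketch only: `partial_integral` is a hypothetical helper implementing the sum of the complete blocks on $[j, j + 1/j]$ plus the signed contribution of the last, possibly incomplete block. It also uses the standard fact, not stated in the text, that the alternating harmonic series has the value ln 2.

```python
import math


def partial_integral(x):
    """Integral of f from (10.88) over [0, x]; the block on [n, n + 1/n]
    contributes (-1)^(n+1) times its covered length."""
    k1 = math.floor(x)
    if k1 < 1:
        return 0.0
    full_blocks = sum((-1) ** (j + 1) / j for j in range(1, k1))
    last_block = (-1) ** (k1 + 1) * min(x - k1, 1.0 / k1)
    return full_blocks + last_block


def partial_integral_abs(x):
    """Same for |f|: every block contributes its covered length."""
    k1 = math.floor(x)
    if k1 < 1:
        return 0.0
    return sum(1.0 / j for j in range(1, k1)) + min(x - k1, 1.0 / k1)


for x in (10.5, 100.5, 1000.5, 10000.5):
    # The first value settles near ln 2; the second keeps growing.
    print(x, partial_integral(x), partial_integral_abs(x))
print("ln 2 =", math.log(2))
```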

A Axiomatic Set Theory

A.1 Motivation, Russell’s Antinomy

As it turns out, naive set theory, founded on the definition of a set according to Cantor (as stated at the beginning of Sec. 1.3), is not suitable to be used in the foundation of mathematics. The problem lies in the possibility of obtaining contradictions such as Russell’s antinomy, after Bertrand Russell, who described it in 1901.

Russell’s antinomy is obtained when considering the set X of all sets that do not contain themselves as an element: When asking the question if X ∈ X, one obtains the contradiction that X ∈ X ⇔ X ∉ X:

Suppose X ∈ X. Then X is a set that contains itself. But X was defined to contain only sets that do not contain themselves, i.e. X ∉ X.

So suppose X ∉ X. Then X is a set that does not contain itself. Thus, by the definition of X, X ∈ X.

Perhaps you think Russell’s construction is rather academic, but it is easily translated into a practical situation. Consider a library. The catalog C of the library should contain all the library’s books. Since the catalog itself is a book of the library, it should occur as an entry in the catalog. So there can be catalogs such as C that have themselves as an entry and there can be other catalogs that do not have themselves as an entry. Now one might want to have a catalog X of all catalogs that do not have themselves as an entry. As in Russell’s antinomy, one is led to the contradiction that the catalog X must have itself as an entry if, and only if, it does not have itself as an entry.

One can construct arbitrarily many versions, which we will not do. Just one more: Consider a small town with a barber, who, each day, shaves all inhabitants who do not shave themselves. The poor barber now faces a terrible dilemma: He will have to shave himself if, and only if, he does not shave himself.

To avoid contradictions such as Russell’s antinomy, axiomatic set theory restricts the construction of sets via so-called axioms, as we will see below.

A.2 Set-Theoretic Formulas

The contradiction of Russell’s antinomy is related to Cantor’s sets not being hierarchical. Another source of contradictions in naive set theory is the imprecise nature of informal languages such as English. In (1.6), we said that

A := {x ∈ B : P (x)}

defines a subset of B if P(x) is a statement about an element x of B. Now take B := N := {1, 2, . . . } to be the set of the natural numbers and let

P (x) := “The number x can be defined by fifty English words or less”. (A.1)

Then A is a finite subset of N, since there are only finitely many English words (if you think there might be infinitely many English words, just restrict yourself to the words contained in some concrete dictionary). Then there is a smallest natural number n that is not in A. But then n is the smallest natural number that cannot be defined by fifty English words or less, which, actually, defines n by fewer than fifty English words, in contradiction to n ∉ A.

To avoid contradictions of this type, we require P(x) to be a so-called set-theoretic formula.

Definition A.1. (a) The language of set theory consists precisely of the following symbols: ∧, ¬, ∃, (, ), ∈, =, vj, where j = 1, 2, . . . .

(b) A set-theoretic formula is a finite string of symbols from the above language of set theory that can be built using the following recursive rules:

(i) vi ∈ vj is a set-theoretic formula for i, j = 1, 2, . . . .

(ii) vi = vj is a set-theoretic formula for i, j = 1, 2, . . . .

(iii) If φ and ψ are set-theoretic formulas, then (φ) ∧ (ψ) is a set-theoretic formula.

(iv) If φ is a set-theoretic formula, then ¬(φ) is a set-theoretic formula.

(v) If φ is a set-theoretic formula, then ∃vj(φ) is a set-theoretic formula for j = 1, 2, . . . .

Example A.2. Examples of set-theoretic formulas are (v3 ∈ v5) ∧ (¬(v2 = v3)), ∃v1(¬(v1 = v1)); examples of symbol strings that are not set-theoretic formulas are v1 ∈ v2 ∈ v3, ∃∃¬, and ∈ v3∃.

Remark A.3. It is noted that, for a given finite string of symbols, a computer can, in principle, check in finitely many steps if the string constitutes a set-theoretic formula or not. The symbols that can occur in a set-theoretic formula are to be interpreted as follows: The variables v1, v2, . . . are variables for sets. The symbols ∧ and ¬ are to be interpreted as the logical operators of conjunction and negation as described in Sec. 1.2.2. Similarly, ∃ stands for an existential quantifier as in Sec. 1.4: The statement ∃vj(φ) means “there exists a set vj that has the property φ”. Parentheses ( and ) are used to make clear the scope of the logical symbols ∃, ∧, ¬. Where the symbol ∈ occurs, it is interpreted to mean that the set to the left of ∈ is contained as an element in the set to the right of ∈. Similarly, = is interpreted to mean that the sets occurring to the left and to the right of = are equal.
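The mechanical check mentioned in Rem. A.3 can be sketched as a small recursive parser that follows the rules (i) – (v) of Def. A.1(b). This is only an illustration, not part of the formal development: the names tokenize, parse, and is_formula are hypothetical, and ignoring whitespace is an assumption of the sketch.

```python
import re

# One token is a variable v1, v2, ... or one of the seven other symbols.
TOKEN = re.compile(r"v[1-9][0-9]*|[()∧¬∃∈=]")

def tokenize(text):
    """Split text into symbols of the language; return None on a foreign symbol."""
    text = "".join(text.split())  # assumption: whitespace is ignored
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            return None
        tokens.append(match.group())
        pos = match.end()
    return tokens

def parse(tokens, i):
    """Return the index just past one formula starting at index i, or None."""
    def tok(k):
        return tokens[k] if k < len(tokens) else ""
    def is_var(k):
        return tok(k).startswith("v")
    if is_var(i):  # rules (i) and (ii)
        return i + 3 if tok(i + 1) in ("∈", "=") and is_var(i + 2) else None
    if tok(i) == "(":  # rule (iii)
        j = parse(tokens, i + 1)
        if j is None or [tok(j), tok(j + 1), tok(j + 2)] != [")", "∧", "("]:
            return None
        k = parse(tokens, j + 3)
        return k + 1 if k is not None and tok(k) == ")" else None
    if tok(i) == "¬" and tok(i + 1) == "(":  # rule (iv)
        j = parse(tokens, i + 2)
        return j + 1 if j is not None and tok(j) == ")" else None
    if tok(i) == "∃" and is_var(i + 1) and tok(i + 2) == "(":  # rule (v)
        j = parse(tokens, i + 3)
        return j + 1 if j is not None and tok(j) == ")" else None
    return None

def is_formula(text):
    """Decide if text is a set-theoretic formula in the sense of Def. A.1(b)."""
    tokens = tokenize(text)
    return tokens is not None and parse(tokens, 0) == len(tokens)
```

Applied to the strings of Ex. A.2, is_formula accepts the two formulas and rejects the three other symbol strings.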

Remark A.4. A disadvantage of set-theoretic formulas as defined in Def. A.1 is that they quickly become lengthy and unreadable (at least to the human eye). To make formulas more readable and concise, one introduces additional symbols and notation. Formally, additional symbols and notation are always to be interpreted as abbreviations or transcriptions of actual set-theoretic formulas. For example, we use the rules of Th. 1.11 to define the additional logical symbols ∨, ⇒, ⇔ as abbreviations:

(φ) ∨ (ψ) is short for ¬((¬(φ)) ∧ (¬(ψ))) (cf. Th. 1.11(j)), (A.2a)

(φ) ⇒ (ψ) is short for (¬(φ)) ∨ (ψ) (cf. Th. 1.11(a)), (A.2b)

(φ) ⇔ (ψ) is short for ((φ) ⇒ (ψ)) ∧ ((ψ) ⇒ (φ)) (cf. Th. 1.11(b)). (A.2c)

Similarly, we use (1.17a) to define the universal quantifier:

∀vj(φ) is short for ¬(∃vj(¬(φ))). (A.2d)

Further abbreviations and transcriptions are obtained from omitting parentheses if it is clear from the context and/or from Convention 1.10 where to put them in, by writing variables bound by quantifiers under the respective quantifiers (as in Sec. 1.4), and by using other symbols than vj for set variables. For example,

$$\forall_{x}\, (\phi \Rightarrow \psi) \quad \text{transcribes} \quad \neg(\exists v_1(\neg((\neg(\phi)) \vee (\psi)))).$$

Moreover,

vi ≠ vj is short for ¬(vi = vj); vi ∉ vj is short for ¬(vi ∈ vj). (A.2e)
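To illustrate how the abbreviations (A.2a) – (A.2d) resolve into formulas of Def. A.1(b), here is a minimal sketch that expands them by plain string substitution. The function names lor, implies, iff, and forall are hypothetical helpers, and the arguments are assumed to already be set-theoretic formulas.

```python
def lor(phi, psi):
    """(phi) ∨ (psi), expanded according to (A.2a)."""
    return f"¬((¬({phi}))∧(¬({psi})))"

def implies(phi, psi):
    """(phi) ⇒ (psi), i.e. (¬(phi)) ∨ (psi) according to (A.2b)."""
    return lor(f"¬({phi})", psi)

def iff(phi, psi):
    """(phi) ⇔ (psi), expanded according to (A.2c)."""
    return f"({implies(phi, psi)})∧({implies(psi, phi)})"

def forall(var, phi):
    """∀var(phi), expanded according to (A.2d)."""
    return f"¬(∃{var}(¬({phi})))"

# The fully expanded form of ∀v1 (v1 ∈ v2 ⇒ v1 ∈ v3).
expanded = forall("v1", implies("v1 ∈ v2", "v1 ∈ v3"))
```

Even for this short statement, the expanded string is considerably longer than its abbreviation, which is the point made in Rem. A.4.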

Remark A.5. Even though axiomatic set theory requires the use of set-theoretic formulas as described above, the systematic study of formal symbolic languages is the subject of the field of mathematical logic and is beyond the scope of this class (see, e.g., [EFT07]). In Def. and Rem. 1.15, we defined a proof of statement B from statement A1 as a finite sequence of statements A1, A2, . . . , An such that, for 1 ≤ i < n, Ai implies Ai+1, and An implies B. In the field of proof theory, also beyond the scope of this class, such proofs are formalized via a finite set of rules that can be applied to (set-theoretic) formulas (see, e.g., [EFT07, Sec. IV], [Kun12, Sec. II]). Once proofs have been formalized in this way, one can, in principle, mechanically check if a given sequence of symbols does, indeed, constitute a valid proof (without even having to understand the actual meaning of the statements). Indeed, several different computer programs have been devised that can be used for automatic proof checking, for example Coq [Wik15a], HOL Light [Wik15b], and Isabelle [Wik15c], to name just a few.

A.3 The Axioms of Zermelo-Fraenkel Set Theory

Axiomatic set theory seems to provide a solid and consistent foundation for conducting mathematics, and most mathematicians have accepted it as the basis of their everyday work. However, there do remain some deep, difficult, and subtle philosophical issues regarding the foundation of logic and mathematics (see, e.g., [Kun12, Sec. 0, Sec. III]).

Definition and Remark A.6. An axiom is a statement that is assumed to be true without any formal logical justification. The most basic axioms (for example, the standard axioms of set theory) are taken to be justified by common sense or some underlying philosophy. However, on a less fundamental (and less philosophical) level, it is a common mathematical strategy to state a number of axioms (for example, the axioms defining the mathematical structure called a group), and then to study the logical consequences of these axioms (for example, group theory studies the statements that are true for all groups as a consequence of the group axioms). For a given system of axioms, the question if there exists an object satisfying all the axioms in the system (i.e. if the system of axioms is consistent, i.e. free of contradictions) can be extremely difficult to answer.

—

We are now in a position to formulate and discuss the axioms of axiomatic set theory. More precisely, we will present the axioms of Zermelo-Fraenkel set theory, usually abbreviated as ZF, which are Axiom 0 – Axiom 8 below. While there exist various set theories in the literature, each set theory defined by some collection of axioms, the axioms of ZFC, consisting of the axioms of ZF plus the axiom of choice (Axiom 9, see Sec. A.4 below), are used as the foundation of mathematics currently accepted by most mathematicians.


A.3.1 Existence, Extensionality, Comprehension

Axiom 0 Existence:

$$\exists_{X}\; (X = X).$$

Recall that this is just meant to be a more readable transcription of the set-theoretic formula ∃v1(v1 = v1). The axiom of existence states that there exists (at least one) set X.

In Def. 1.18 two sets are defined to be equal if, and only if, they contain precisely the same elements. In axiomatic set theory, this is guaranteed by the axiom of extensionality:

Axiom 1 Extensionality:

$$\forall_{X}\; \forall_{Y}\; \Bigl( \forall_{z}\, (z \in X \Leftrightarrow z \in Y) \;\Rightarrow\; X = Y \Bigr).$$

Following [Kun12], we assume that the substitution property of equality is part of the underlying logic, i.e. if X = Y, then X can be substituted for Y and vice versa without changing the truth value of a (set-theoretic) formula. In particular, this yields the converse to extensionality:

$$\forall_{X}\; \forall_{Y}\; \Bigl( X = Y \;\Rightarrow\; \forall_{z}\, (z \in X \Leftrightarrow z \in Y) \Bigr).$$

Before we discuss further consequences of extensionality, we would like to have the existence of the empty set. However, Axioms 0 and 1 do not suffice to prove the existence of an empty set (see [Kun12, I.6.3]). This, rather, needs the additional axiom of comprehension. More precisely, in the case of comprehension, we do not have a single axiom, but a scheme of infinitely many axioms, one for each set-theoretic formula. Its formulation makes use of the following definition:

Definition A.7. One obtains the universal closure of a set-theoretic formula φ by writing ∀vj in front of φ for each variable vj that occurs as a free variable in φ (recall from Def. 1.31 that vj is free in φ if, and only if, it is not bound by a quantifier in φ).

Axiom 2 Comprehension Scheme: For each set-theoretic formula φ, not containing Y as a free variable, the universal closure of

$$\exists_{Y}\; \forall_{x}\; \Bigl( x \in Y \Leftrightarrow (x \in X \wedge \phi) \Bigr)$$

is an axiom. Thus, the comprehension scheme states that, given the set X, there exists (at least one) set Y, containing precisely the elements of X that have the property φ.


Remark A.8. Comprehension does not provide uniqueness. However, if both Y and Y′ are sets containing precisely the elements of X that have the property φ, then

$$\forall_{x}\; \Bigl( x \in Y \Leftrightarrow (x \in X \wedge \phi) \Leftrightarrow x \in Y' \Bigr),$$

and, then, extensionality implies Y = Y′. Thus, due to extensionality, the set Y given by comprehension is unique, justifying the notation

{x : x ∈ X ∧ φ} := {x ∈ X : φ} := Y (A.3)

(this is the axiomatic justification for (1.6)).

Theorem A.9. There exists a unique empty set (which we denote by ∅ or by 0; it is common to identify the empty set with the number zero in axiomatic set theory).

Proof. Axiom 0 provides the existence of a set X. Then comprehension allows us to define the empty set by

0 := ∅ := {x ∈ X : x ≠ x},

where, as explained in Rem. A.8, extensionality guarantees uniqueness. □

Remark A.10. In Rem. A.4 we said that every formula with additional symbols and notation is to be regarded as an abbreviation or transcription of a set-theoretic formula as defined in Def. A.1(b). Thus, formulas containing symbols for defined sets (e.g. 0 or ∅ for the empty set) are to be regarded as abbreviations for formulas without such symbols. Some logical subtleties arise from the fact that there is some ambiguity in the way such abbreviations can be resolved: For example, 0 ∈ X can abbreviate either

$$\psi:\; \exists_{y}\, \bigl( \phi(y) \wedge y \in X \bigr) \quad \text{or} \quad \chi:\; \forall_{y}\, \bigl( \phi(y) \Rightarrow y \in X \bigr), \quad \text{where } \phi(y) \text{ stands for } \forall_{v}\, (v \notin y).$$

Then ψ and χ are equivalent if ∃!y φ(y) is true (e.g., if Axioms 0 – 2 hold), but they can be nonequivalent otherwise (see discussion between Lem. 2.9 and Lem. 2.10 in [Kun80]).

—

At first glance, the role played by the free variables in φ, which are allowed to occur in Axiom 2, might seem a bit obscure. So let us consider examples to illustrate that allowing free variables (i.e. set parameters) in comprehension is quite natural:

Example A.11. (a) Suppose φ in comprehension is the formula x ∈ Z (having Z as a free variable), then the set given by the resulting axiom is merely the intersection of X and Z:

X ∩ Z := {x ∈ X : φ} = {x ∈ X : x ∈ Z}.

(b) Note that it is even allowed for φ in comprehension to have X as a free variable, so one can let φ be the formula ∃u (x ∈ u ∧ u ∈ X) to define the set

$$X^{*} := \Bigl\{ x \in X : \exists_{u}\, (x \in u \wedge u \in X) \Bigr\}.$$

Then, if 0 := ∅, 1 := {0}, 2 := {0, 1}, we obtain

$$2^{*} = \{0\} = 1.$$
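For hereditarily finite sets, the two instances of comprehension in Ex. A.11 can be evaluated directly. The following minimal sketch models sets by Python frozensets, which is an assumption of the illustration (Python sets are not the sets of ZF); the names intersection and star are hypothetical.

```python
ZERO = frozenset()             # 0 := ∅
ONE = frozenset({ZERO})        # 1 := {0}
TWO = frozenset({ZERO, ONE})   # 2 := {0, 1}

def intersection(X, Z):
    """Comprehension with the formula x ∈ Z, as in Ex. A.11(a)."""
    return frozenset(x for x in X if x in Z)

def star(X):
    """Comprehension with the formula ∃u (x ∈ u ∧ u ∈ X), as in Ex. A.11(b)."""
    return frozenset(x for x in X if any(x in u for u in X))

two_star = star(TWO)
```

Here 0 belongs to two_star, since 0 ∈ 1 ∈ 2, whereas 1 does not, since 1 is an element of neither 0 nor 1.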

—

It is a consequence of extensionality that the mathematical universe consists of sets and only of sets: Suppose there were other objects in the mathematical universe, for example a cow C and a monkey M (or any other object without elements, other than the empty set). This would be equivalent to allowing a cow or a monkey (or any other object without elements, other than the empty set) to be considered a set, which would mean that our set-theoretic variables vj were allowed to be a cow or a monkey as well. However, extensionality then implies the false statement C = M = ∅, thereby excluding cows and monkeys from the mathematical universe.

Similarly, {C} and {M} (or any other object that contains a non-set) cannot be inside the mathematical universe. Indeed, otherwise we would have

$$\forall_{x}\; \bigl( x \in \{C\} \Leftrightarrow x \in \{M\} \bigr)$$

(as C and M are non-sets) and, by extensionality, {C} = {M} would be true, in contradiction to a set with a cow inside not being the same as a set with a monkey inside. Thus, we see that all objects of the mathematical universe must be so-called hereditary sets, i.e. sets all of whose elements (thinking of the elements as being the children of the sets) are also sets.

A.3.2 Classes

As we need to avoid contradictions such as Russell’s antinomy, we must not require the existence of a set {x : φ} for each set-theoretic formula φ. However, it can still be useful to think of a “collection” of all sets having the property φ. Such collections are commonly called classes:

Definition A.12. (a) If φ is a set-theoretic formula, then we call {x : φ} a class, namely the class of all sets that have the property φ (typically, φ will have x as a free variable).

(b) If φ is a set-theoretic formula, then we say the class {x : φ} exists (as a set) if, and only if,

$$\exists_{X}\; \Bigl( \forall_{x}\; ( x \in X \Leftrightarrow \phi ) \Bigr) \tag{A.4}$$

is true. Then X is actually unique by extensionality and we identify X with the class {x : φ}. If (A.4) is false, then {x : φ} is called a proper class (and the usual interpretation is that the class is in some sense “too large” to be a set).

Example A.13. (a) Due to Russell’s antinomy of Sec. A.1, we know that R := {x : x ∉ x} forms a proper class.

(b) The universal class of all sets, V := {x : x = x}, is a proper class. Once again, this is related to Russell’s antinomy: If V were a set, then

R = {x : x ∉ x} = {x : x = x ∧ x ∉ x} = {x : x ∈ V ∧ x ∉ x}

would also be a set by comprehension. However, this is in contradiction to R being a proper class by (a).

Remark A.14. From the perspective of formal logic, statements involving proper classes are to be regarded as abbreviations for statements without proper classes. For example, it turns out that the class G of all sets forming a group is a proper class. But we might write G ∈ G as an abbreviation for the statement “The set G is a group.”

A.3.3 Pairing, Union, Replacement

Axioms 0 – 2 are still consistent with the empty set being the only set in existence (see [Kun12, I.6.13]). The next axiom provides the existence of nonempty sets:

Axiom 3 Pairing:

$$\forall_{x}\; \forall_{y}\; \exists_{Z}\; (x \in Z \wedge y \in Z).$$

Thus, the pairing axiom states that, for all sets x and y, there exists a set Z that contains x and y as elements.

In consequence of the pairing axiom, the sets

0 := ∅, (A.5a)

1 := {0}, (A.5b)

2 := {0, 1} (A.5c)

all exist. More generally, we may define:

Definition A.15. If x, y are sets and Z is given by the pairing axiom, then we call

(a) {x, y} := {u ∈ Z : u = x ∨ u = y} the unordered pair given by x and y,

(b) {x} := {x, x} the singleton set given by x,

(c) (x, y) := {{x}, {x, y}} the ordered pair given by x and y (cf. Def. 2.1).

—

We can now show that ordered pairs behave as expected:

Lemma A.16. The following holds true:

$$\forall_{x, y, x', y'}\; \Bigl( (x, y) = (x', y') \;\Leftrightarrow\; (x = x') \wedge (y = y') \Bigr).$$

Proof. “⇐” is merely

$$(x, y) = \{\{x\}, \{x, y\}\} \overset{x = x',\; y = y'}{=} \{\{x'\}, \{x', y'\}\} = (x', y').$$

“⇒” is done by distinguishing two cases: If x = y, then

$$\{\{x\}\} = (x, y) = (x', y') = \{\{x'\}, \{x', y'\}\}.$$

Next, by extensionality, we first get {x} = {x′} = {x′, y′}, followed by x = x′ = y′, establishing the case. If x ≠ y, then

$$\{\{x\}, \{x, y\}\} = (x, y) = (x', y') = \{\{x'\}, \{x', y'\}\},$$

where, by extensionality, {x} ≠ {x, y} ≠ {x′}. Thus, using extensionality again, {x} = {x′} and x = x′. Next, we conclude

$$\{x, y\} = \{x', y'\} = \{x, y'\}$$

and a last application of extensionality yields y = y′. □
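Lem. A.16 can be tested exhaustively on a small universe of hereditarily finite sets. The sketch below models sets by Python frozensets (an assumption of the illustration, not part of the axiomatic development); the names pair, universe, and lemma_holds are hypothetical.

```python
from itertools import product

def pair(x, y):
    """Ordered pair (x, y) := {{x}, {x, y}} as in Def. A.15(c)."""
    return frozenset({frozenset({x}), frozenset({x, y})})

ZERO = frozenset()
ONE = frozenset({ZERO})
TWO = frozenset({ZERO, ONE})
universe = [ZERO, ONE, TWO]

# (x, y) = (a, b) should hold if, and only if, x = a and y = b.
lemma_holds = all(
    (pair(x, y) == pair(a, b)) == (x == a and y == b)
    for x, y, a, b in product(universe, repeat=4)
)
```

Note that pair(x, x) collapses to {{x}}, which is exactly the first case distinguished in the proof.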

While we now have the existence of the infinitely many different sets 0, {0}, {{0}}, . . . , we are not, yet, able to form sets containing more than two elements. This is remedied by the following axiom:

Axiom 4 Union:

$$\forall_{M}\; \exists_{Y}\; \forall_{x}\; \forall_{X}\; \Bigl( (x \in X \wedge X \in M) \Rightarrow x \in Y \Bigr).$$

Thus, the union axiom states that, for each set of sets M, there exists a set Y containing all elements of elements of M.

Definition A.17. (a) If M is a set and Y is given by the union axiom, then define

$$\bigcup M := \bigcup_{X \in M} X := \Bigl\{ x \in Y : \exists_{X \in M}\; x \in X \Bigr\}.$$

(b) If X and Y are sets, then define

$$X \cup Y := \bigcup \{X, Y\}.$$

(c) If x, y, z are sets, then define

$$\{x, y, z\} := \{x, y\} \cup \{z\}.$$

Remark A.18. (a) The definition of set-theoretic unions as

$$\bigcup_{i \in I} A_i := \Bigl\{ x : \exists_{i \in I}\; x \in A_i \Bigr\}$$

in (1.25b) will be equivalent to the definition in Def. A.17(a) if we are allowed to form the set

$$M := \{A_i : i \in I\}.$$

If I is a set and Ai is a set for each i ∈ I, then M as above will be a set by Axiom 5 below (the axiom of replacement).


(b) In contrast to unions, intersections can be obtained directly from comprehension without the introduction of an additional axiom: For example

$$X \cap Y := \{x \in X : x \in Y\}, \qquad \bigcap_{i \in I} A_i := \Bigl\{ x \in A_{i_0} : \forall_{i \in I}\; x \in A_i \Bigr\},$$

where i0 ∈ I ≠ ∅ is an arbitrary fixed element of I.

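(c) The union

$$\bigcup \emptyset = \bigcup_{X \in \emptyset} X = \bigcup_{i \in \emptyset} A_i = \emptyset$$

is the empty set, in particular, a set. However,

$$\bigcap \emptyset = \Bigl\{ x : \forall_{X \in \emptyset}\; x \in X \Bigr\} = V = \Bigl\{ x : \forall_{i \in \emptyset}\; x \in A_i \Bigr\} = \bigcap_{i \in \emptyset} A_i,$$

i.e. the intersection over the empty set is the class of all sets, in particular, a proper class and not a set.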

Definition A.19. We define the successor function

x ↦ S(x) := x ∪ {x} (for each set x).

Thus, recalling (A.5), we have 1 = S(0), 2 = S(1); and we can define 3 := S(2), . . . In general, we call the set S(x) the successor of the set x.
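The successor function can be tried out on hereditarily finite sets. The following minimal sketch models sets by Python frozensets (an assumption of the illustration); the names successor and numeral are hypothetical.

```python
def successor(x):
    """S(x) := x ∪ {x} as in Def. A.19."""
    return x | frozenset({x})

def numeral(n):
    """Return the set obtained by applying S to the empty set n times."""
    result = frozenset()
    for _ in range(n):
        result = successor(result)
    return result

ZERO, ONE, TWO, THREE = (numeral(n) for n in range(4))
```

In this model, numeral(n) has exactly n elements, namely numeral(0), . . . , numeral(n − 1), in agreement with (A.5).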

—

In Def. 2.3 and Def. 2.19, respectively, we define functions and relations in the usual manner, making use of the Cartesian product A × B of two sets A and B, which, according to (2.2), consists of all ordered pairs (x, y), where x ∈ A and y ∈ B. However, Axioms 0 – 4 are not sufficient to justify the existence of Cartesian products. To obtain Cartesian products, we employ the axiom of replacement. Analogous to the axiom of comprehension, the following axiom of replacement actually consists of a scheme of infinitely many axioms, one for each set-theoretic formula:

Axiom 5 Replacement Scheme: For each set-theoretic formula φ, not containing Y as a free variable, the universal closure of

$$\Bigl( \forall_{x \in X}\; \exists!_{y}\; \phi \Bigr) \;\Rightarrow\; \Bigl( \exists_{Y}\; \forall_{x \in X}\; \exists_{y \in Y}\; \phi \Bigr)$$

is an axiom. Thus, the replacement scheme states that if, for each x ∈ X, there exists a unique y having the property φ (where, in general, φ will depend on x), then there exists a set Y that, for each x ∈ X, contains this y with property φ. One can view this as obtaining Y by replacing each x ∈ X by the corresponding y = y(x).


Theorem A.20. If A and B are sets, then the Cartesian product of A and B, i.e. the class

$$A \times B := \Bigl\{ x : \exists_{a \in A}\; \exists_{b \in B}\; x = (a, b) \Bigr\}$$

exists as a set.

Proof. For each a ∈ A, we can use replacement with X := B and φ := φa being the formula y = (a, x) to obtain the existence of the set

{a} × B := {(a, x) : x ∈ B} (A.6a)

(in the usual way, comprehension and extensionality were used as well). Analogously, using replacement again with X := A and φ being the formula y = {x} × B, we obtain the existence of the set

M := {{x} × B : x ∈ A}. (A.6b)

In a final step, the union axiom now shows

$$\bigcup M = \bigcup_{a \in A} \{a\} \times B = A \times B \tag{A.6c}$$

to be a set as well. □
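The three steps (A.6a) – (A.6c) of the proof can be mirrored for finite sets. The following sketch models sets by Python frozensets (an assumption of the illustration); the names pair, big_union, and cartesian are hypothetical.

```python
def pair(x, y):
    """Ordered pair (x, y) := {{x}, {x, y}} as in Def. A.15(c)."""
    return frozenset({frozenset({x}), frozenset({x, y})})

def big_union(M):
    """Union of all elements of M, as in Def. A.17(a)."""
    return frozenset(x for X in M for x in X)

def cartesian(A, B):
    """Build A × B as the union of the slices {a} × B."""
    # (A.6a): replace each x ∈ B by (a, x); (A.6b): replace each a ∈ A by {a} × B.
    slices = frozenset(frozenset(pair(a, x) for x in B) for a in A)
    # (A.6c): the union over all slices is the Cartesian product.
    return big_union(slices)

ZERO = frozenset()
ONE = frozenset({ZERO})
TWO = frozenset({ZERO, ONE})
product_set = cartesian(frozenset({ZERO, ONE}), frozenset({ZERO, ONE, TWO}))
```

For the two-element set {0, 1} and the three-element set {0, 1, 2}, the resulting set contains six ordered pairs.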

A.3.4 Infinity, Ordinals, Natural Numbers

The following axiom of infinity guarantees the existence of infinite sets (e.g., it will allow us to define the set of natural numbers N, which is infinite by Th. A.46 below).

Axiom 6 Infinity:

$$\exists_{X}\; \Bigl( 0 \in X \wedge \forall_{x \in X}\; (x \cup \{x\} \in X) \Bigr).$$

Thus, the infinity axiom states the existence of a set X containing ∅ (identified with the number 0), and, for each of its elements x, its successor S(x) = x ∪ {x}.

In preparation for our official definition of N in Def. A.41 below, we will study so-called ordinals, which are special sets also of further interest to the field of set theory (the natural numbers will turn out to be precisely the finite ordinals). We also need some notions from the theory of relations, in particular, order relations (cf. Def. 2.19 and Def. 2.25).

Definition A.21. Let R be a relation on a set X.

(a) R is called asymmetric if, and only if,

$$\forall_{x, y \in X}\; \bigl( xRy \Rightarrow \neg(yRx) \bigr), \tag{A.7}$$

i.e. if x is related to y only if y is not related to x.

(b) R is called a strict partial order if, and only if, R is asymmetric and transitive. It is noted that this is consistent with Not. 2.26, since, recalling the notation ∆(X) := {(x, x) : x ∈ X}, R is a partial order on X if, and only if, R \ ∆(X) is a strict partial order on X. We extend the notions lower/upper bound, min, max, inf, sup of Def. 2.27 to strict partial orders R by applying them to R ∪ ∆(X): We call x ∈ X a lower bound of Y ⊆ X with respect to R if, and only if, x is a lower bound of Y with respect to R ∪ ∆(X), and analogously for the other notions.

(c) A strict partial order R is called a strict total order or a strict linear order if, and only if, for each x, y ∈ X, one has x = y or xRy or yRx.

(d) R is called a (strict) well-order if, and only if, R is a (strict) total order and every nonempty subset of X has a min with respect to R (for example, the usual ≤ constitutes a well-order on N (see Th. D.5 below), but not on R (e.g., R+ does not have a min)).

(e) If Y ⊆ X, then the relation on Y defined by

xSy :⇔ xRy

is called the restriction of R to Y, denoted S = R↾Y (usually, one still writes R for the restriction).

Lemma A.22. Let R be a relation on a set X and Y ⊆ X.

(a) If R is transitive, then R↾Y is transitive.

(b) If R is reflexive, then R↾Y is reflexive.

(c) If R is antisymmetric, then R↾Y is antisymmetric.

(d) If R is asymmetric, then R↾Y is asymmetric.

(e) If R is a (strict) partial order, then R↾Y is a (strict) partial order.

(f) If R is a (strict) total order, then R↾Y is a (strict) total order.

(g) If R is a (strict) well-order, then R↾Y is a (strict) well-order.

Proof. (a): If a, b, c ∈ Y with aRb and bRc, then aRc, since a, b, c ∈ X and R is transitive on X.

(b): If a ∈ Y, then a ∈ X and aRa, since R is reflexive on X.

(c): If a, b ∈ Y with aRb and bRa, then a = b, since a, b ∈ X and R is antisymmetric on X.

(d): If a, b ∈ Y with aRb, then ¬bRa, since a, b ∈ X and R is asymmetric on X.

(e) follows by combining (a) – (d).

(f): If a, b ∈ Y with a ≠ b and ¬aRb, then bRa, since a, b ∈ X and R is total on X. Combining this with (e) yields (f).

(g): Due to (f), it merely remains to show that every nonempty subset Z ⊆ Y has a min. However, since Z ⊆ X and R is a well-order on X, there is m ∈ Z such that m is the min of Z with respect to R on X, implying m to be the min of Z with respect to R↾Y as well. □

Remark A.23. Since the universal class V is not a set, ∈ is not a relation in the sense of Def. 2.19. It can be considered as a “class relation”, i.e. a subclass of V × V, but it is a proper class. However, ∈ does constitute a relation in the sense of Def. 2.19 on each set X (recalling that each element of X must be a set as well). More precisely, if X is a set, then so is

R∈ := {(x, y) ∈ X × X : x ∈ y}. (A.8a)

Then

$$\forall_{x, y \in X}\; \bigl( (x, y) \in R_{\in} \Leftrightarrow x \in y \bigr). \tag{A.8b}$$

Definition A.24. A set X is called transitive if, and only if, every element of X is also a subset of X:

$$\forall_{x \in X}\; x \subseteq X. \tag{A.9a}$$

Clearly, (A.9a) is equivalent to

$$\forall_{x, y}\; \bigl( x \in y \wedge y \in X \Rightarrow x \in X \bigr). \tag{A.9b}$$

Lemma A.25. If X, Y are transitive sets, then X ∩ Y is a transitive set.

Proof. If x ∈ X ∩ Y and y ∈ x, then y ∈ X (since X is transitive) and y ∈ Y (since Y is transitive). Thus y ∈ X ∩ Y, showing X ∩ Y is transitive. □

Definition A.26. (a) A set α is called an ordinal number or just an ordinal if, and only if, α is transitive and ∈ constitutes a strict well-order on α. An ordinal α is called a successor ordinal if, and only if, there exists an ordinal β such that α = S(β), where S is the successor function of Def. A.19. An ordinal α ≠ 0 is called a limit ordinal if, and only if, it is not a successor ordinal. We denote the class of all ordinals by ON (it is a proper class by Cor. A.33 below).

(b) We define

$$\forall_{\alpha, \beta \in \mathbf{ON}}\; (\alpha < \beta \;:\Leftrightarrow\; \alpha \in \beta), \tag{A.10a}$$

$$\forall_{\alpha, \beta \in \mathbf{ON}}\; (\alpha \le \beta \;:\Leftrightarrow\; \alpha < \beta \vee \alpha = \beta). \tag{A.10b}$$

Example A.27. Using (A.5), 0 = ∅ is an ordinal, and 1 = S(0), 2 = S(1) are both successor ordinals (in Prop. A.43, we will identify N0 as the smallest limit ordinal). Even though X := {1} and Y := {0, 2} are well-ordered by ∈, they are not ordinals, since they are not transitive sets: 1 ∈ X, but 1 ⊈ X (since 0 ∈ 1, but 0 ∉ X); similarly, 1 ∈ 2 ∈ Y, but 1 ∉ Y.
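For hereditarily finite sets, the conditions of Def. A.26(a) can be checked by brute force. The sketch below models sets by Python frozensets (an assumption of the illustration) and uses that, on a finite set, every strict total order is automatically a strict well-order; the function names are hypothetical.

```python
from itertools import product

def is_transitive(X):
    """Check (A.9a): every element of X is a subset of X."""
    return all(x <= X for x in X)

def is_strictly_totally_ordered_by_membership(X):
    """Check that ∈ is asymmetric, transitive, and total on the finite set X."""
    asymmetric = all(not (x in y and y in x) for x, y in product(X, repeat=2))
    transitive = all(x in z for x, y, z in product(X, repeat=3) if x in y and y in z)
    total = all(x == y or x in y or y in x for x, y in product(X, repeat=2))
    return asymmetric and transitive and total

def is_ordinal(X):
    """Finite version of Def. A.26(a)."""
    return is_transitive(X) and is_strictly_totally_ordered_by_membership(X)

ZERO = frozenset()
ONE = frozenset({ZERO})
TWO = frozenset({ZERO, ONE})
X = frozenset({ONE})         # {1}
Y = frozenset({ZERO, TWO})   # {0, 2}
```

The sets X and Y of Ex. A.27 pass the order check, but fail the transitivity check, so is_ordinal rejects them.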


Lemma A.28. No ordinal contains itself, i.e.

$$\forall_{\alpha \in \mathbf{ON}}\; \alpha \notin \alpha.$$

Proof. If α is an ordinal, then ∈ is a strict order on α. Due to asymmetry of strict orders, x ∈ x cannot be true for any element x of α. If α ∈ α were true, then x := α would be such an element, implying that α ∈ α cannot be true. □

Proposition A.29. Every element of an ordinal is an ordinal, i.e.

$$\forall_{\alpha \in \mathbf{ON}}\; \bigl( X \in \alpha \Rightarrow X \in \mathbf{ON} \bigr)$$

(in other words, ON is a transitive class).

Proof. Let α ∈ ON and X ∈ α. Since α is transitive, we have X ⊆ α. As ∈ is a strict well-order on α, it must also be a strict well-order on X by Lem. A.22(g). In consequence, it only remains to prove that X is transitive as well. To this end, let x ∈ X. Then x ∈ α, as α is transitive. If y ∈ x, then, using transitivity of α again, y ∈ α. Now y ∈ X, as ∈ is transitive on α, proving x ⊆ X, i.e. X is transitive. □

Proposition A.30. If α, β ∈ ON, then X := α ∩ β ∈ ON (we will see in Th. A.35(a) below that, actually, α ∩ β = min{α, β}).

Proof. X is transitive by Lem. A.25, and, since X ⊆ α, ∈ is a strict well-order on X by Lem. A.22(g). □

Proposition A.31. On the class ON, the relation ≤ (as defined in (A.10)) is the same as the relation ⊆, i.e.

$$\forall_{\alpha, \beta \in \mathbf{ON}}\; \bigl( \alpha \le \beta \;\Leftrightarrow\; \alpha \subseteq \beta \;\Leftrightarrow\; (\alpha \in \beta \vee \alpha = \beta) \bigr). \tag{A.11}$$

Proof. Let α, β ∈ ON.

Assume α ≤ β. If α = β, then α ⊆ β. If α ∈ β, then α ⊆ β, since β is transitive.

Conversely, assume α ⊆ β and α ≠ β. We have to show α ∈ β. To this end, we set X := β \ α. Then X ≠ ∅ and, as ∈ well-orders β, we can let m := min X. We will show m = α (note that this will complete the proof, due to α = m ∈ X ⊆ β). If µ ∈ m, then µ ∈ β (since m ∈ β and β is transitive) and µ ∉ X (since m = min X), implying µ ∈ α (since X = β \ α) and, thus, m ⊆ α. Seeking a contradiction, assume m ≠ α. Then there must be some γ ∈ α \ m ⊆ α ⊆ β. In consequence γ, m ∈ β. As γ ∉ m and ∈ is a total order on β, we must have either m = γ or m ∈ γ. However, m ≠ γ, since γ ∈ α and m ∉ α (as m ∈ X). So it must be m ∈ γ ∈ α, implying m ∈ α, as α is transitive. This contradiction proves m = α and establishes the proposition. □

Theorem A.32. The class ON is well-ordered by ∈, i.e.

(i) ∈ is transitive on ON:

$$\forall_{\alpha, \beta, \gamma \in \mathbf{ON}}\; \bigl( \alpha < \beta \wedge \beta < \gamma \Rightarrow \alpha < \gamma \bigr).$$

(ii) ∈ is asymmetric on ON:

$$\forall_{\alpha, \beta \in \mathbf{ON}}\; \bigl( \alpha < \beta \Rightarrow \neg(\beta < \alpha) \bigr).$$

(iii) Ordinals are always comparable:

$$\forall_{\alpha, \beta \in \mathbf{ON}}\; \bigl( \alpha < \beta \vee \beta < \alpha \vee \alpha = \beta \bigr).$$

(iv) Every nonempty set of ordinals has a min.

Proof. (i) is clear, as γ is a transitive set.

(ii): If α, β ∈ ON, then α ∈ β ∈ α implies α ∈ α by (i), which is a contradiction to Lem. A.28.

(iii): Let γ := α ∩ β. Then γ ∈ ON by Prop. A.30. Thus

$$\gamma \subseteq \alpha \wedge \gamma \subseteq \beta \;\overset{\text{Prop. A.31}}{\Rightarrow}\; (\gamma \in \alpha \vee \gamma = \alpha) \wedge (\gamma \in \beta \vee \gamma = \beta). \tag{A.12}$$

If γ ∈ α and γ ∈ β, then γ ∈ α ∩ β = γ, in contradiction to Lem. A.28. Thus, by (A.12), γ = α or γ = β. If γ = α, then α ⊆ β. If γ = β, then β ⊆ α, completing the proof of (iii) by Prop. A.31.

(iv): Let X be a nonempty set of ordinals and consider α ∈ X. If α = min X, then we are already done. Otherwise, Y := α ∩ X = {β ∈ X : β ∈ α} ≠ ∅. Since α is well-ordered by ∈, there is m := min Y. If β ∈ X, then either β < α or α ≤ β by (iii). If β < α, then β ∈ Y and m ≤ β. If α ≤ β, then m < α ≤ β. Thus, m = min X, proving (iv). □

Corollary A.33. ON is a proper class (i.e. there is no set containing all the ordinals).

Proof. If there is a set X containing all ordinals, then, by comprehension, β := ON = {α ∈ X : α is an ordinal} must be a set as well. But then Prop. A.29 says that the set β is transitive and Th. A.32 yields that the set β is well-ordered by ∈, implying β to be an ordinal, i.e. β ∈ β in contradiction to Lem. A.28. □

Corollary A.34. For each set X of ordinals, we have:

(a) X is well-ordered by ∈.

(b) X is an ordinal if, and only if, X is transitive. Note: A transitive set of ordinals X is sometimes called an initial segment of ON, since, here, transitivity can be restated in the form

$$\forall_{\alpha \in \mathbf{ON}}\; \forall_{\beta \in X}\; \bigl( \alpha < \beta \Rightarrow \alpha \in X \bigr). \tag{A.13}$$

Proof. (a) is a simple consequence of Th. A.32(i)-(iv).

(b) is immediate from (a). □

Theorem A.35. Let X be a nonempty set of ordinals.

(a) Then γ := ⋂X is an ordinal, namely γ = min X. In particular, if α, β ∈ ON, then min{α, β} = α ∩ β.

(b) Then δ := ⋃X is an ordinal, namely δ = sup X. In particular, if α, β ∈ ON, then max{α, β} = α ∪ β.

Proof. (a): Let m := min X. Then γ ⊆ m, since m ∈ X. Conversely, if α ∈ X, then m ≤ α implies m ⊆ α by Prop. A.31, i.e. m ⊆ γ. Thus, m = γ, proving (a).

(b): To show δ ∈ ON, we need to show δ is transitive (then δ is an ordinal by Cor. A.34(b)). If α ∈ δ, then there is β ∈ X such that α ∈ β. Thus, if γ ∈ α, then γ ∈ β, since β is transitive. As γ ∈ β implies γ ∈ δ, we see that δ is transitive, as needed. It remains to show δ = sup X. If α ∈ X, then α ⊆ δ, i.e. α ≤ δ, showing δ to be an upper bound for X. Now let u ∈ ON be an arbitrary upper bound for X, i.e.

$$\forall_{\alpha \in X}\; \alpha \subseteq u.$$

Thus, δ ⊆ u, i.e. δ ≤ u, proving δ = sup X. □
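Prop. A.31 and Th. A.35 can be observed on finite ordinals. The following sketch models sets by Python frozensets (an assumption of the illustration) and considers only the first few finite ordinals; the names numeral, ordinals, and sample are hypothetical.

```python
from functools import reduce

def numeral(n):
    """Return the finite ordinal obtained by applying S(x) = x ∪ {x} to ∅ n times."""
    result = frozenset()
    for _ in range(n):
        result = result | frozenset({result})
    return result

ordinals = [numeral(n) for n in range(7)]

# Prop. A.31: α ≤ β (i.e. α ∈ β or α = β) holds if, and only if, α ⊆ β.
order_is_inclusion = all(
    (a in b or a == b) == (a <= b) for a in ordinals for b in ordinals
)

# Th. A.35: the intersection is the min, the union is the sup.
sample = [numeral(2), numeral(3), numeral(5)]
intersection = reduce(frozenset.intersection, sample)
union = reduce(frozenset.union, sample)
```

For the sample {2, 3, 5}, the intersection equals the ordinal 2 and the union equals the ordinal 5.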

Next, we obtain some results regarding the successor function of Def. A.19 in the contextof ordinals.

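Lemma A.36. We have

$$\forall_{\alpha \in \mathbf{ON}}\; \bigl( x, y \in S(\alpha) \wedge x \in y \;\Rightarrow\; x \ne \alpha \bigr).$$

Proof. Seeking a contradiction, we reason as follows:

$$x = \alpha \;\overset{\alpha \notin \alpha}{\Rightarrow}\; y \ne \alpha \;\overset{y \in S(\alpha)}{\Rightarrow}\; y \in \alpha \;\overset{\alpha \text{ transitive}}{\Rightarrow}\; y \subseteq \alpha \;\overset{x \in y}{\Rightarrow}\; \alpha \in \alpha.$$

This contradiction to α ∉ α yields x ≠ α, concluding the proof. □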

Proposition A.37. For each α ∈ ON, the following holds:

(a) S(α) ∈ ON.

(b) α < S(α).

(c) For each ordinal β, β < S(α) holds if, and only if, β ≤ α.

(d) For each ordinal β, if β < α, then S(β) < S(α).

(e) For each ordinal β, if S(β) < S(α), then β < α.

Proof. (a): Due to Prop. A.29, S(α) is a set of ordinals. Thus, by Cor. A.34(b), it merely remains to prove that S(α) is transitive. Let x ∈ S(α). If x = α, then x = α ⊆ α ∪ {α} = S(α). If x ≠ α, then x ∈ α and, since α is transitive, this implies x ⊆ α ⊆ S(α), showing S(α) to be transitive, thereby completing the proof of (a).

(b) holds, as α ∈ S(α) holds by the definition of S(α).

(c) is clear, since, for each ordinal β,

β < S(α) ⇔ β ∈ S(α) ⇔ β ∈ α ∨ β = α ⇔ β ≤ α.

(d): If β < α, then S(β) = β ∪ {β} ⊆ α, i.e. S(β) ≤ α < S(α).

(e) follows from (d) using contraposition: If ¬(β < α), then β = α or α < β, implying S(β) = S(α) or S(α) < S(β), i.e. ¬(S(β) < S(α)). □
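The statements (b) – (e) of Prop. A.37 can be verified by brute force for small finite ordinals. The sketch below models sets by Python frozensets (an assumption of the illustration, covering only finitely many cases); the names successor, ordinals, and the result flags are hypothetical.

```python
def successor(x):
    """S(x) := x ∪ {x} as in Def. A.19."""
    return x | frozenset({x})

# The first six finite ordinals 0, 1, ..., 5.
ordinals = [frozenset()]
for _ in range(5):
    ordinals.append(successor(ordinals[-1]))

# (b): α < S(α), where < means ∈.
b_holds = all(a in successor(a) for a in ordinals)

# (c): β < S(α) holds if, and only if, β ≤ α.
c_holds = all(
    (b in successor(a)) == (b in a or b == a) for a in ordinals for b in ordinals
)

# (d) and (e): β < α holds if, and only if, S(β) < S(α).
d_e_hold = all(
    (b in a) == (successor(b) in successor(a)) for a in ordinals for b in ordinals
)
```

All three flags should evaluate to True, in agreement with the proposition.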

We now proceed to define the natural numbers:

Definition A.38. An ordinal n is called a natural number if, and only if,

$$n \ne 0 \;\wedge\; \forall_{m \in \mathbf{ON}}\; \bigl( m \le n \;\Rightarrow\; m = 0 \vee m \text{ is a successor ordinal} \bigr).$$

Proposition A.39. If n = 0 or n is a natural number, then S(n) is a natural number and every element of n is a natural number or 0.

Proof. Suppose n is 0 or a natural number. If m ∈ n, then m is an ordinal by Prop. A.29. Suppose m ≠ 0 and k ∈ m. Then k ∈ n, since n is transitive. Since n is a natural number, k = 0 or k is a successor ordinal. Thus, m is a natural number. It remains to show that S(n) is a natural number. By definition, S(n) = n ∪ {n} ≠ 0. Moreover, S(n) ∈ ON by Prop. A.37(a), and, thus, S(n) is a successor ordinal. If m ∈ S(n), then m ≤ n, implying m = 0 or m is a successor ordinal, completing the proof that S(n) is a natural number. □

Theorem A.40 (Principle of Induction). If X is a set satisfying

$$0 \in X \;\wedge\; \forall_{x \in X}\; S(x) \in X, \tag{A.14}$$

then X contains 0 and all natural numbers.

Proof. Let X be a set satisfying (A.14). Then 0 ∈ X is immediate. Let n be a natural number and, seeking a contradiction, assume n ∉ X. Consider N := S(n) \ X. According to Prop. A.39, S(n) is a natural number and all nonzero elements of S(n) are natural numbers. Since N ⊆ S(n) and 0 ∈ X, 0 ∉ N and all elements of N must be natural numbers. As n ∈ N, N ≠ 0. Since S(n) is well-ordered by ∈ and 0 ≠ N ⊆ S(n), N must have a min m ∈ N, 0 ≠ m ≤ n. Since m is a natural number, there must be k such that m = S(k). Then k < m, implying k ∉ N. On the other hand

$$k < m \wedge m \le n \;\Rightarrow\; k \le n \;\Rightarrow\; k \in S(n).$$

Thus, k ∈ X, implying m = S(k) ∈ X, in contradiction to m ∈ N. This contradiction proves n ∈ X, thereby establishing the case. □

Definition A.41. If the set X is given by the axiom of infinity, then we use comprehension to define the set

\[ \mathbb{N}_0 := \{ n \in X : n = 0 \,\vee\, n \text{ is a natural number} \} \]

and note ℕ₀ to be unique by extensionality. We also denote ℕ := ℕ₀ \ {0}. In set theory, it is also very common to use the symbol ω for the set ℕ₀.

Corollary A.42. ℕ₀ is the set of all natural numbers and 0, i.e.

\[ \forall_{n} \bigl( n \in \mathbb{N}_0 \;\Leftrightarrow\; n = 0 \,\vee\, n \text{ is a natural number} \bigr). \]

Proof. “⇒” is clear from Def. A.41 and “⇐” is due to Th. A.40. □

Proposition A.43. ω = ℕ₀ is the smallest limit ordinal.

Proof. Since ω is a set of ordinals and ω is transitive by Prop. A.39, ω is an ordinal by Cor. A.34(b). Moreover ω ≠ 0, since 0 ∈ ω; and ω is not a successor ordinal (if ω = S(n) = n ∪ {n}, then n ∈ ω and S(n) ∈ ω by Prop. A.39, in contradiction to ω = S(n)), implying it is a limit ordinal. To see that ω is the smallest limit ordinal, let α ∈ ON, α < ω. Then α ∈ ω, that means α = 0 or α is a natural number (in particular, a successor ordinal). Thus, α is not a limit ordinal. □

In the following Th. A.44, we will prove that ℕ satisfies the Peano axioms P1 – P3 of Sec. 3.1 (if one prefers, one can show the same for ℕ₀, where 0 takes over the role of 1).

Theorem A.44. The set of natural numbers ℕ satisfies the Peano axioms P1 – P3 of Sec. 3.1.

Proof. For P1 and P2, we have to show that, for each n ∈ ℕ, one has S(n) ∈ ℕ \ {1} and that S(m) ≠ S(n) for each m, n ∈ ℕ, m ≠ n. Let n ∈ ℕ. Then S(n) ∈ ℕ by Prop. A.39. If S(n) = 1, then n < S(n) = 1 by Prop. A.37(b), i.e. n = 0, in contradiction to n ∈ ℕ. If m, n ∈ ℕ with m ≠ n, then S(m) ≠ S(n) is due to Prop. A.37(d). To prove P3, suppose A ⊆ ℕ has the property that 1 ∈ A and S(n) ∈ A for each n ∈ A. We need to show A = ℕ (i.e. ℕ ⊆ A, as A ⊆ ℕ is assumed). Let X := A ∪ {0}. Then X satisfies (A.14) and Th. A.40 yields ℕ₀ ⊆ X. Thus, if n ∈ ℕ, then n ∈ X \ {0} = A, showing ℕ ⊆ A. □

Notation A.45. For each n ∈ ℕ₀, we introduce the notation n + 1 := S(n) (more generally, one also defines α + 1 := S(α) for each ordinal α).

Theorem A.46. Let n ∈ ℕ₀. Then A := ℕ₀ \ n is infinite (see Def. 3.12(b)). In particular, ℕ₀ and ℕ = ℕ₀ \ {0} = ℕ₀ \ 1 are infinite.

Proof. Since n ∉ n, we have n ∈ A ≠ ∅. Thus, if A were finite, then there would be a bijection f : A −→ A_m := {1, …, m} = {k ∈ ℕ : k ≤ m} for some m ∈ ℕ. However, we will show by induction on m ∈ ℕ that there is no injective map f : A −→ A_m. Since S(n) ∉ n, we have S(n) ∈ A. Thus, if f : A −→ A₁ = {1}, then f(n) = f(S(n)), showing that f is not injective and proving the case m = 1. For the induction step, we proceed by contraposition and show that the existence of an injective map f : A −→ A_{m+1}, m ∈ ℕ, (cf. Not. A.45) implies the existence of an injective map g : A −→ A_m. To this end, let m ∈ ℕ and f : A −→ A_{m+1} be injective. If m + 1 ∉ f(A), then f itself is an injective map into A_m. If m + 1 ∈ f(A), then there is a unique a ∈ A such that f(a) = m + 1. Define

\[ g : A \longrightarrow A_m, \quad g(k) := \begin{cases} f(k) & \text{for } k < a, \\ f(k+1) & \text{for } a \leq k. \end{cases} \tag{A.15} \]

Then g is well-defined: If k ∈ A and a ≤ k, then k + 1 ∈ A \ {a}, and, since f is injective, g does, indeed, map into A_m. We verify g to be injective: If k, l ∈ A, k < l, then also k < l + 1 and k + 1 ≠ l + 1 (by Peano axiom P2; k + 1 < l + 1 then also follows, but we do not make use of that here). In each case, g(k) ≠ g(l), proving g to be injective. □

For more basic information regarding ordinals see, e.g., [Kun12, Sec. I.8].

A.3.5 Power Set

There is one more basic construction principle for sets that is not covered by Axioms 0 – 6, namely the formation of power sets. This needs another axiom:

Axiom 7 Power Set:

\[ \forall_{X}\; \exists_{\mathcal{M}}\; \forall_{Y} \bigl( Y \subseteq X \;\Rightarrow\; Y \in \mathcal{M} \bigr). \]

Thus, the power set axiom states that, for each set X, there exists a set ℳ that contains all subsets Y of X as elements.

Definition A.47. If X is a set and ℳ is given by the power set axiom, then we call

\[ \mathcal{P}(X) := \{ Y \in \mathcal{M} : Y \subseteq X \} \]

the power set of X. Another common notation for 𝒫(X) is 2^X (cf. Prop. 2.18).
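For a finite set, the power set can be enumerated directly, which also makes the notation 2^X plausible, since a set with n elements has 2^n subsets. The following short Python sketch assumes finite sets of hashable elements; the function name power_set is a hypothetical helper.

```python
from itertools import combinations


def power_set(xs):
    """Return the set of all subsets of the finite set xs, each as a frozenset."""
    items = list(xs)
    return {
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    }


X = {1, 2, 3}
P = power_set(X)
print(len(P) == 2 ** len(X))   # the number of subsets of X is 2^(#X)
```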

A.3.6 Foundation

Foundation is, perhaps, the least important of the axioms in ZF. It basically cleanses the mathematical universe of unnecessary “clutter”, i.e. of certain pathological sets that are of no importance to standard mathematics anyway.

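Axiom 8 Foundation:

\[ \forall_{X} \Bigl( \exists_{x}\,(x \in X) \;\Rightarrow\; \exists_{x \in X}\, \neg \exists_{z} \bigl( z \in x \,\wedge\, z \in X \bigr) \Bigr). \]

Thus, the foundation axiom states that every nonempty set X contains an element x that is disjoint from X.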


Theorem A.48. Due to the foundation axiom, the ∈ relation can have no cycles, i.e. there do not exist sets x₁, x₂, …, x_n, n ∈ ℕ, such that

\[ x_1 \in x_2 \in \dots \in x_n \in x_1. \tag{A.16a} \]

In particular, sets cannot be members of themselves:

\[ \neg \exists_{x}\; x \in x. \tag{A.16b} \]

Proof. If there were sets x₁, x₂, …, x_n, n ∈ ℕ, such that (A.16a) were true, then, by using the pairing axiom and the union axiom, we could form the set

\[ X := \{x_1, \dots, x_n\}. \]

Then, in contradiction to the foundation axiom, X ∩ x_i ≠ ∅ for each i = 1, …, n: Indeed, x_n ∈ X ∩ x₁, and x_{i−1} ∈ X ∩ x_i for each i = 2, …, n. □

For a detailed explanation of why “sets” forbidden by foundation do not occur in standard mathematics anyway, see, e.g., [Kun12, Sec. I.14].

A.4 The Axiom of Choice

In addition to the axioms of ZF discussed in the previous section, there is one more axiom, namely the axiom of choice (AC) that, together with ZF, makes up ZFC, the axiom system at the basis of current standard mathematics. Even though AC is used and accepted by most mathematicians, it does have the reputation of being somewhat less “natural”. Thus, many mathematicians try to avoid the use of AC where possible, and it is often pointed out explicitly if a result depends on the use of AC (but this practice is by no means consistent, neither in the literature nor in this class, and one might sometimes be surprised which seemingly harmless result does actually depend on AC in some subtle nonobvious way). We will now state the axiom:

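Axiom 9 Axiom of Choice (AC):

\[ \forall_{\mathcal{M}} \Bigl( \emptyset \notin \mathcal{M} \;\Rightarrow\; \exists_{f :\, \mathcal{M} \longrightarrow \bigcup_{N \in \mathcal{M}} N} \bigl( \forall_{M \in \mathcal{M}}\; f(M) \in M \bigr) \Bigr). \]

Thus, the axiom of choice postulates, for each nonempty set ℳ, whose elements are all nonempty sets, the existence of a choice function, that means a function that assigns, to each M ∈ ℳ, an element m ∈ M.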

Example A.49. For example, the axiom of choice postulates, for each nonempty set A, the existence of a choice function on 𝒫(A) \ {∅} that assigns each subset of A one of its elements.

—

The axiom of choice is remarkable since, at first glance, it seems so natural that one can hardly believe it is not provable from the axioms in ZF. However, one can actually show that it is neither provable nor disprovable from ZF (see, e.g., [Jec73, Th. 3.5, Th. 5.16]; such a result is called an independence proof, see [Kun80] for further material). If you want to convince yourself that the existence of choice functions is, indeed, a tricky matter, try to define a choice function on 𝒫(ℝ) \ {∅} without AC (but do not spend too much time on it, as one can show this is actually impossible to accomplish).
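By way of contrast, choice functions are unproblematic when an explicit rule is available. The following Python sketch assumes a finite set A of integers, where taking the minimum of each nonempty subset is a choice function that needs no appeal to AC; the names nonempty_subsets and choose are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(a):
    """All nonempty subsets of the finite set a, each as a frozenset."""
    items = sorted(a)
    return [
        frozenset(c)
        for r in range(1, len(items) + 1)
        for c in combinations(items, r)
    ]


def choose(subset):
    """An explicit choice rule: pick the smallest element of the subset."""
    return min(subset)


A = {3, 1, 4}
f = {Y: choose(Y) for Y in nonempty_subsets(A)}
print(all(f[Y] in Y for Y in f))   # f(Y) ∈ Y for each nonempty Y ⊆ A
```

The rule works because every nonempty finite set of integers has a smallest element; no comparable rule is available for arbitrary nonempty subsets of ℝ, which is where AC becomes indispensable.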

Theorem A.52 below provides several important equivalences of AC. Its statement and proof need some preparation. We start by introducing some more relevant notions from the theory of partial orders:

Definition A.50. Let X be a set and let ≤ be a partial order on X.

(a) An element m ∈ X is called maximal (with respect to ≤) if, and only if, there exists no x ∈ X such that m < x (note that a maximal element does not have to be a max and that a maximal element is not necessarily unique).

(b) A nonempty subset C of X is called a chain if, and only if, C is totally ordered by ≤. Moreover, a chain C is called maximal if, and only if, no strict superset Y of C (i.e. no Y ⊆ X such that C ⊊ Y) is a chain.

—

The following lemma is a bit technical and will be used to prove the implication AC ⇒ (ii) in Th. A.52 (other proofs in the literature often make use of so-called transfinite recursion, but that would mean further developing the theory of ordinals, and we will not pursue this route in this class).

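Lemma A.51. Let X be a set and let ∅ ≠ ℳ ⊆ 𝒫(X) be a nonempty set of subsets of X. We let ℳ be partially ordered by inclusion, i.e. setting A ≤ B :⇔ A ⊆ B for each A, B ∈ ℳ. Moreover, define

\[ \forall_{\mathcal{S} \subseteq \mathcal{M}} \quad \bigcup \mathcal{S} := \bigcup_{S \in \mathcal{S}} S \tag{A.17} \]

and assume

\[ \forall_{\mathcal{C} \subseteq \mathcal{M}} \Bigl( \mathcal{C} \text{ is a chain} \;\Rightarrow\; \bigcup \mathcal{C} \in \mathcal{M} \Bigr). \tag{A.18} \]

If the function g : ℳ −→ ℳ has the property that

\[ \forall_{M \in \mathcal{M}} \bigl( M \subseteq g(M) \;\wedge\; \#(g(M) \setminus M) \leq 1 \bigr), \tag{A.19} \]

then g has a fixed point, i.e.

\[ \exists_{M \in \mathcal{M}} \quad g(M) = M. \tag{A.20} \]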

Proof. Fix some arbitrary M₀ ∈ ℳ. We call 𝒯 ⊆ ℳ an M₀-tower if, and only if, 𝒯 satisfies the following three properties:

(i) M₀ ∈ 𝒯.

(ii) If 𝒞 ⊆ 𝒯 is a chain, then ⋃𝒞 ∈ 𝒯.

(iii) If M ∈ 𝒯, then g(M) ∈ 𝒯.

Let 𝕋 := {𝒯 ⊆ ℳ : 𝒯 is an M₀-tower}. If 𝒯₁ := {M ∈ ℳ : M₀ ⊆ M}, then, clearly, 𝒯₁ is an M₀-tower and, in particular, 𝕋 ≠ ∅. Next, we note that the intersection of all M₀-towers, i.e. 𝒯₀ := ⋂_{𝒯∈𝕋} 𝒯, is also an M₀-tower. Clearly, no strict subset of 𝒯₀ can be an M₀-tower and

\[ M \in \mathcal{T}_0 \;\Rightarrow\; M \in \mathcal{T}_1 \;\Rightarrow\; M_0 \subseteq M. \tag{A.21} \]

The main work of the rest of the proof consists of showing that 𝒯₀ is a chain. To show 𝒯₀ to be a chain, define

\[ \Gamma := \Bigl\{ M \in \mathcal{T}_0 : \forall_{N \in \mathcal{T}_0}\, (M \subseteq N \,\vee\, N \subseteq M) \Bigr\}. \tag{A.22} \]

We intend to show that Γ = 𝒯₀ by verifying that Γ is an M₀-tower. As an intermediate step, we define

\[ \forall_{M \in \Gamma} \quad \Phi(M) := \{ N \in \mathcal{T}_0 : N \subseteq M \,\vee\, g(M) \subseteq N \} \]

and also show each Φ(M) to be an M₀-tower. Actually, Γ and each Φ(M) satisfy (i) due to (A.21). To verify Γ satisfies (ii), let 𝒞 ⊆ Γ be a chain and U := ⋃𝒞. Then U ∈ 𝒯₀, since 𝒯₀ satisfies (ii). If N ∈ 𝒯₀, and C ⊆ N for each C ∈ 𝒞, then U ⊆ N. If N ∈ 𝒯₀, and there is C ∈ 𝒞 such that C ⊈ N, then N ⊆ C (since C ∈ Γ), i.e. N ⊆ U, showing U ∈ Γ and Γ satisfying (ii).

Now, let M ∈ Γ. To verify Φ(M) satisfies (ii), let 𝒞 ⊆ Φ(M) be a chain and U := ⋃𝒞. Then U ∈ 𝒯₀, since 𝒯₀ satisfies (ii). If U ⊆ M, then U ∈ Φ(M) as desired. If U ⊈ M, then there is x ∈ U such that x ∉ M. Thus, there is C ∈ 𝒞 such that x ∈ C and g(M) ⊆ C (since C ∈ Φ(M)), i.e. g(M) ⊆ U, showing U ∈ Φ(M) also in this case, and Φ(M) satisfies (ii).

We will verify that Φ(M) satisfies (iii) next. For this purpose, fix N ∈ Φ(M). We need to show g(N) ∈ Φ(M). We already know g(N) ∈ 𝒯₀, as 𝒯₀ satisfies (iii). As N ∈ Φ(M), we can now distinguish three cases. Case 1: N ⊊ M. In this case, we cannot have M ⊊ g(N) (otherwise, #(g(N) \ N) ≥ 2 in contradiction to (A.19)). Thus, g(N) ⊆ M (since M ∈ Γ), showing g(N) ∈ Φ(M). Case 2: N = M. Then g(N) = g(M) ∈ Φ(M) (since g(M) ∈ 𝒯₀ and g(M) ⊆ g(M)). Case 3: g(M) ⊆ N. Then g(M) ⊆ g(N) by (A.19), again showing g(N) ∈ Φ(M). Thus, we have verified that Φ(M) satisfies (iii) and, therefore, is an M₀-tower. Then, by the definition of 𝒯₀, we have 𝒯₀ ⊆ Φ(M). As we also have Φ(M) ⊆ 𝒯₀ (from the definition of Φ(M)), we have shown

\[ \forall_{M \in \Gamma} \quad \Phi(M) = \mathcal{T}_0. \]

As a consequence, if N ∈ 𝒯₀ and M ∈ Γ, then N ∈ Φ(M) and this means N ⊆ M ⊆ g(M) or g(M) ⊆ N, i.e. each N ∈ 𝒯₀ is comparable to g(M), showing g(M) ∈ Γ and Γ satisfying (iii), completing the proof that Γ is an M₀-tower. As with the Φ(M) above, we conclude Γ = 𝒯₀, as desired. To conclude the proof of the lemma, we note Γ = 𝒯₀ implies 𝒯₀ is a chain. We claim that

\[ M := \bigcup \mathcal{T}_0 \]

satisfies (A.20): Indeed, M ∈ 𝒯₀, since 𝒯₀ satisfies (ii). Then g(M) ∈ 𝒯₀, since 𝒯₀ satisfies (iii). We then conclude g(M) ⊆ M from the definition of M. As we always have M ⊆ g(M) by (A.19), we have established g(M) = M and proved the lemma. □

Theorem A.52 (Equivalences to the Axiom of Choice). The following statements (i) – (v) are equivalent to the axiom of choice (as stated as Axiom 9 above).

(i) Every Cartesian product ∏_{i∈I} A_i of nonempty sets A_i, where I is a nonempty index set, is nonempty (cf. Def. 2.15(c)).

(ii) Hausdorff’s Maximality Principle: Every nonempty partially ordered set X contains a maximal chain (i.e. a maximal totally ordered subset).

(iii) Zorn’s Lemma: Let X be a nonempty partially ordered set. If every chain C ⊆ X (i.e. every nonempty totally ordered subset of X) has an upper bound in X (such chains with upper bounds are sometimes called inductive), then X contains a maximal element (cf. Def. A.50(a)).

(iv) Zermelo’s Well-Ordering Theorem: Every set can be well-ordered (recall the definition of a well-order from Def. A.21(d)).

(v) Every vector space V over a field F has a basis B ⊆ V.

Proof. “(i) ⇔ AC”: Assume (i). Given a nonempty set of nonempty sets ℳ, let I := ℳ and, for each M ∈ ℳ, let A_M := M. If f ∈ ∏_{M∈I} A_M, then, according to Def. 2.15(c), for each M ∈ I = ℳ, one has f(M) ∈ A_M = M, proving AC holds. Conversely, assume AC. Consider a family (A_i)_{i∈I} such that I ≠ ∅ and each A_i ≠ ∅. Let ℳ := {A_i : i ∈ I}. Then, by AC, there is a map g : ℳ −→ ⋃_{N∈ℳ} N = ⋃_{j∈I} A_j such that g(M) ∈ M for each M ∈ ℳ. Then we can define

\[ f : I \longrightarrow \bigcup_{j \in I} A_j, \quad f(i) := g(A_i) \in A_i, \]

to prove (i).

Next, we will show AC ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ AC.

“AC ⇒ (ii)”: Assume AC and let X be a nonempty partially ordered set. Let ℳ be the set of all chains in X (i.e. the set of all nonempty totally ordered subsets of X). Then ∅ ∉ ℳ and ℳ ≠ ∅ (since X ≠ ∅ and {x} ∈ ℳ for each x ∈ X). Moreover, ℳ satisfies the hypothesis of Lem. A.51, since, if 𝒞 ⊆ ℳ is a chain of totally ordered subsets of X, then ⋃𝒞 is a totally ordered subset of X, i.e. in ℳ (here we have used the notation of (A.17); also note that we are dealing with two different types of chains here, namely those with respect to the order on X and those with respect to the order given by ⊆ on ℳ). Let f : 𝒫(X) \ {∅} −→ X be a choice function given by AC, i.e. such that

\[ \forall_{Y \in \mathcal{P}(X) \setminus \{\emptyset\}} \quad f(Y) \in Y. \]

As an auxiliary notation, we set

\[ \forall_{M \in \mathcal{M}} \quad M^* := \bigl\{ x \in X \setminus M : M \cup \{x\} \in \mathcal{M} \bigr\}. \]

With the intention of applying Lem. A.51, we define

\[ g : \mathcal{M} \longrightarrow \mathcal{M}, \quad g(M) := \begin{cases} M \cup \{f(M^*)\} & \text{if } M^* \neq \emptyset, \\ M & \text{if } M^* = \emptyset. \end{cases} \]

Since g clearly satisfies (A.19), Lem. A.51 applies, providing an M ∈ ℳ such that g(M) = M. Thus, M* = ∅, i.e. M is a maximal chain, proving (ii).
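For a finite partially ordered set, the fixed point of g can be reached by simply iterating g, which gives a concrete picture of the construction (the general case needs Lem. A.51 precisely because such an iteration need not terminate). The following Python sketch assumes the poset X = {1, …, 12} ordered by divisibility and uses min as the choice function f; all function names are hypothetical.

```python
def divides(a, b):
    """The partial order on X: a <= b if, and only if, a divides b."""
    return b % a == 0


def is_chain(c, leq):
    """c is a chain if any two of its elements are comparable."""
    return all(leq(a, b) or leq(b, a) for a in c for b in c)


def extensions(m, x_set, leq):
    """M* = {x in X \\ M : M ∪ {x} is a chain}."""
    return {x for x in x_set - m if is_chain(m | {x}, leq)}


def g(m, x_set, leq, choose=min):
    """g(M) = M ∪ {f(M*)} if M* is nonempty, and g(M) = M otherwise."""
    star = extensions(m, x_set, leq)
    return m | {choose(star)} if star else m


def maximal_chain(start, x_set, leq):
    """Iterate g, starting from the chain {start}, until g(M) = M."""
    m = frozenset({start})
    while True:
        nxt = frozenset(g(m, x_set, leq))
        if nxt == m:
            return m
        m = nxt


X = frozenset(range(1, 13))
chain = maximal_chain(6, X, divides)
print(sorted(chain))   # a maximal chain containing 6
```

Each step adds at most one element, which is exactly property (A.19), and the loop stops when M* = ∅, i.e. when the chain is maximal.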

“(ii) ⇒ (iii)”: Assume (ii). To prove Zorn’s lemma, let X be a nonempty set, partially ordered by ≤, such that every chain C ⊆ X has an upper bound. Due to Hausdorff’s maximality principle, we can assume C ⊆ X to be a maximal chain. Let m ∈ X be an upper bound for the maximal chain C. We claim that m is a maximal element: Indeed, if there were x ∈ X such that m < x, then x ∉ C (since m is an upper bound for C) and C ∪ {x} would constitute a strict superset of C that is also a chain, contradicting the maximality of C.

“(iii) ⇒ (iv)”: Assume (iii) and let X be a nonempty set. We need to construct a well-order on X. Let 𝒲 be the set of all well-orders on subsets of X, i.e.

\[ \mathcal{W} := \bigl\{ (Y, W) : Y \subseteq X \;\wedge\; W \subseteq Y \times Y \subseteq X \times X \text{ is a well-order on } Y \bigr\}. \]

We define a partial order ≤ on 𝒲 by setting

\[ \forall_{(Y,W),(Y',W') \in \mathcal{W}} \Bigl( (Y,W) \leq (Y',W') \;:\Leftrightarrow\; Y \subseteq Y' \;\wedge\; W = W'{\restriction}_{Y} \;\wedge\; \bigl( y \in Y,\ y' \in Y',\ y' W' y \;\Rightarrow\; y' \in Y \bigr) \Bigr) \]

(recall the definition of the restriction of a relation from Def. A.21(e)). To apply Zorn’s lemma to (𝒲, ≤), we need to check that every chain 𝒞 ⊆ 𝒲 has an upper bound. To this end, if 𝒞 ⊆ 𝒲 is a chain, let

\[ U_{\mathcal{C}} := (Y_{\mathcal{C}}, W_{\mathcal{C}}), \quad \text{where} \quad Y_{\mathcal{C}} := \bigcup_{(Y,W) \in \mathcal{C}} Y, \quad W_{\mathcal{C}} := \bigcup_{(Y,W) \in \mathcal{C}} W. \]

We need to verify U_𝒞 ∈ 𝒲: If a W_𝒞 b, then there is (Y, W) ∈ 𝒞 such that a W b. In particular, (a, b) ∈ Y × Y ⊆ Y_𝒞 × Y_𝒞, showing W_𝒞 to be a relation on Y_𝒞. Clearly, W_𝒞 is a total order on Y_𝒞 (one just uses that, if a, b ∈ Y_𝒞, then, as 𝒞 is a chain, there is (Y, W) ∈ 𝒞 such that a, b ∈ Y and W = W_𝒞↾Y is a total order on Y). To see that W_𝒞 is a well-order on Y_𝒞, let ∅ ≠ A ⊆ Y_𝒞. If a ∈ A, then there is (Y, W) ∈ 𝒞 such that a ∈ Y. Since W = W_𝒞↾Y is a well-order on Y, we can let m := min(Y ∩ A). We claim that m = min A as well: Let b ∈ A. Then there is (B, U) ∈ 𝒞 such that b ∈ B. If B ⊆ Y, then b ∈ Y ∩ A and m W b. If Y ⊆ B, then m, b ∈ B. If m U b, then we are done. If b U m, then b ∈ Y (since (Y, W) ≤ (B, U)), i.e., again, b ∈ Y ∩ A and m W b (actually m = b in this case), proving m = min A. This completes the proof that W_𝒞 is a well-order on Y_𝒞 and, thus, shows U_𝒞 ∈ 𝒲. Next, we check U_𝒞 to be an upper bound for 𝒞: If (Y, W) ∈ 𝒞, then Y ⊆ Y_𝒞 and W = W_𝒞↾Y are immediate. If y ∈ Y, y′ ∈ Y_𝒞, and y′ W_𝒞 y, then y′ ∈ Y (otherwise, y′ ∈ A with (A, U) ∈ 𝒞, (Y, W) ≤ (A, U), y′ U y, in contradiction to y′ ∉ Y). Thus, (Y, W) ≤ U_𝒞, showing U_𝒞 to be an upper bound for 𝒞. By Zorn’s lemma, we conclude that 𝒲 contains a maximal element (M, W_M). But then M = X and W_M is the desired well-order on X: Indeed, if there is x ∈ X \ M, then we can let Y := M ∪ {x} and

\[ \forall_{a,b \in Y} \Bigl( a W b \;:\Leftrightarrow\; (a, b \in M \,\wedge\, a W_M b) \;\vee\; b = x \Bigr). \]

Then (Y, W) ∈ 𝒲 with (M, W_M) < (Y, W) in contradiction to the maximality of (M, W_M).

“(iv) ⇒ AC”: Assume (iv). Given a nonempty set of nonempty sets ℳ, let X := ⋃_{M∈ℳ} M. By (iv), there exists a well-order R on X. Then every nonempty Y ⊆ X has a unique min. As every M ∈ ℳ is a nonempty subset of X, we can define a choice function

\[ f : \mathcal{M} \longrightarrow X, \quad f(M) := \min M \in M, \]

proving AC.

“(v) ⇔ AC”: That every vector space has a basis is proved in [Phi19, Th. 5.23] by use of Zorn’s lemma. That, conversely, (v) implies AC was first shown in [Bla84], but the proof needs more algebraic tools than we have available in this class. □

A.5 Cardinality

A.5.1 Relations to Injective, Surjective, and Bijective Maps; Schröder-Bernstein Theorem

Theorem A.53. Let M be a set of sets. Then the relation ∼ on M, defined by

A ∼ B :⇔ A and B have the same cardinality, (A.23)

constitutes an equivalence relation on M.

Proof. According to Def. 2.23, we have to prove that ∼ is reflexive, symmetric, and transitive. According to Def. 3.12(a), A ∼ B holds for A, B ∈ M if, and only if, there exists a bijective map f : A −→ B. Thus, since the identity Id : A −→ A is bijective, A ∼ A, showing ∼ is reflexive. If A ∼ B, then there exists a bijective map f : A −→ B, and f⁻¹ is a bijective map f⁻¹ : B −→ A, showing B ∼ A and that ∼ is symmetric. If A ∼ B and B ∼ C, then there are bijective maps f : A −→ B and g : B −→ C. Then, according to Th. 2.14, the composition (g ∘ f) : A −→ C is also bijective, proving A ∼ C and that ∼ is transitive. □

The next theorem provides two interesting, and sometimes useful, characterizations of infinite sets:

Theorem A.54. Let A be a set. Using the axiom of choice (AC) of Sec. A.4, the following statements (i) – (iii) are equivalent. More precisely, (ii) and (iii) are equivalent even without AC (a set A is sometimes called Dedekind-infinite if, and only if, it satisfies (iii)), (iii) implies (i) without AC, but AC is needed to show (i) implies (ii), (iii).

(i) A is infinite.

(ii) There exists M ⊆ A and a bijective map f : M −→ N.

(iii) There exists a strict subset B ⊊ A and a bijective map g : A −→ B.

One sometimes expresses the equivalence between (i) and (ii) by saying that a set is infinite if, and only if, it contains a copy of the natural numbers. The property stated in (iii) might seem strange at first, but infinite sets are, indeed, precisely those identical in size to some of their strict subsets (as an example think of the natural bijection n ↦ 2n between all natural numbers and the even numbers).

Proof. We first prove, without AC, the equivalence between (ii) and (iii).

“(ii) ⇒ (iii)”: Let E denote the even numbers. Then E ⊊ ℕ and d : ℕ −→ E, d(n) := 2n, is a bijection, showing that (iii) holds for the natural numbers. According to (ii), there exist M ⊆ A and a bijective map f : M −→ ℕ. Define B := (A \ M) ∪ f⁻¹(E) and

\[ h : A \longrightarrow B, \quad h(x) := \begin{cases} x & \text{for } x \in A \setminus M, \\ (f^{-1} \circ d \circ f)(x) & \text{for } x \in M. \end{cases} \tag{A.24} \]

Then B ⊊ A since B does not contain the elements of M that are mapped to odd numbers under f. Still, h is bijective, since h↾(A \ M) = Id_{A\M} and h↾M = f⁻¹ ∘ d ∘ f is the composition of the bijective maps f, d, and f⁻¹↾E : E −→ f⁻¹(E).

“(iii) ⇒ (ii)”: As (iii) is assumed, there exist B ⊆ A, a ∈ A \ B, and a bijective map g : A −→ B. Set

\[ M := \{ a_n := g^n(a) : n \in \mathbb{N} \}. \]

We show that a_n ≠ a_m for each m, n ∈ ℕ with m ≠ n: Indeed, suppose m, n ∈ ℕ with n > m and a_n = a_m. Then, since g is bijective, we can apply g⁻¹ m times to a_n = a_m to obtain

\[ a = (g^{-1})^m(a_m) = (g^{-1})^m(a_n) = g^{n-m}(a). \]

Since l := n − m ≥ 1, we have a = g(g^{l−1}(a)), in contradiction to a ∈ A \ B. Thus, all the a_n ∈ M are distinct and we can define f : M −→ ℕ, f(a_n) := n, which is clearly bijective, proving (ii).

“(iii) ⇒ (i)”: The proof is conducted by contraposition, i.e. we assume A to be finite and prove that (iii) does not hold. If A = ∅, then there is nothing to prove. If ∅ ≠ A is finite, then, by Def. 3.12(b), there exist n ∈ ℕ and a bijective map f : A −→ {1, …, n}. If B ⊊ A, then, according to Th. A.63(a), there exist m ∈ ℕ₀, m < n, and a bijective map h : B −→ {1, …, m}. If there were a bijective map g : A −→ B, then h ∘ g ∘ f⁻¹ would be a bijective map from {1, …, n} onto {1, …, m} with m < n in contradiction to Th. A.61.

“(i) ⇒ (ii)”: Inductively, we construct a strictly increasing sequence M₁ ⊊ M₂ ⊊ … of subsets M_n of A, n ∈ ℕ, and a sequence of functions f_n : M_n −→ {1, …, n} satisfying

\begin{align}
&\forall_{n \in \mathbb{N}} \quad f_n \text{ is bijective}, \tag{A.25a} \\
&\forall_{m,n \in \mathbb{N}} \quad \bigl( m \leq n \;\Rightarrow\; f_n{\restriction}_{M_m} = f_m \bigr): \tag{A.25b}
\end{align}

Since A ≠ ∅, there exists m₁ ∈ A. Set M₁ := {m₁} and f₁ : M₁ −→ {1}, f₁(m₁) := 1. Then M₁ ⊆ A and f₁ bijective are trivially clear. Now let n ∈ ℕ and suppose M₁, …, M_n and f₁, …, f_n satisfying (A.25) have already been constructed. Since A is infinite, there must be m_{n+1} ∈ A \ M_n (otherwise M_n = A and the bijectivity of f_n : M_n −→ {1, …, n} shows A is finite with #A = n; AC is used to select the m_{n+1} ∈ A \ M_n). Set M_{n+1} := M_n ∪ {m_{n+1}} and

\[ f_{n+1} : M_{n+1} \longrightarrow \{1, \dots, n+1\}, \quad f_{n+1}(x) := \begin{cases} f_n(x) & \text{for } x \in M_n, \\ n+1 & \text{for } x = m_{n+1}. \end{cases} \tag{A.26} \]

Then the bijectivity of f_n implies the bijectivity of f_{n+1}, and, since f_{n+1}↾M_n = f_n holds by definition of f_{n+1}, the implication

\[ m \leq n+1 \;\Rightarrow\; f_{n+1}{\restriction}_{M_m} = f_m \]

holds true as well. An induction also shows M_n = {m₁, …, m_n} and f_n(m_n) = n for each n ∈ ℕ. We now define

\[ M := \bigcup_{n \in \mathbb{N}} M_n = \{ m_n : n \in \mathbb{N} \}, \quad f : M \longrightarrow \mathbb{N}, \quad f(m_n) := f_n(m_n) = n. \tag{A.27} \]

Clearly, M ⊆ A, and f is bijective with f⁻¹ : ℕ −→ M, f⁻¹(n) = m_n. □

Theorem A.55 (Schröder-Bernstein). Let A, B be sets. The following statements are equivalent (even without assuming the axiom of choice):

(i) The sets A and B have the same cardinality (i.e. there exists a bijective map φ : A −→ B).

(ii) There exist an injective map f : A −→ B and an injective map g : B −→ A.

We will give two proofs of the Schröder-Bernstein theorem. The first proof is rather elegant, but also quite abstract. The second proof is longer, but less abstract. Even though it is still nonconstructive in the general situation, in many concrete cases, it does provide a method for actually constructing a bijective map from two injective maps. The first proof is based on the following lemma:


Lemma A.56. Let A be a set. Consider 𝒫(A) to be endowed with the partial order given by set inclusion, i.e., for each X, Y ∈ 𝒫(A), X ≤ Y if, and only if, X ⊆ Y. If F : 𝒫(A) −→ 𝒫(A) is isotone with respect to that order, then F has a fixed point, i.e. F(X₀) = X₀ for some X₀ ∈ 𝒫(A).

Proof. Define

\[ \mathcal{A} := \{ X \in \mathcal{P}(A) : F(X) \subseteq X \}, \quad X_0 := \bigcap_{X \in \mathcal{A}} X \tag{A.28} \]

(X₀ is well-defined, since F(A) ⊆ A, i.e. A ∈ 𝒜 ≠ ∅). Suppose X ∈ 𝒜, i.e. F(X) ⊆ X and X₀ ⊆ X. Then F(X₀) ⊆ F(X) ⊆ X due to the isotonicity of F. Thus, F(X₀) ⊆ X for every X ∈ 𝒜, i.e. F(X₀) ⊆ X₀. Using the isotonicity of F again shows F(F(X₀)) ⊆ F(X₀), implying F(X₀) ∈ 𝒜 and X₀ ⊆ F(X₀), i.e. F(X₀) = X₀ as desired. □
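The construction (A.28) can be carried out literally when A is a small finite set. The following Python sketch assumes A = {0, …, 5} and a sample isotone map F chosen purely for illustration; the names power_set, fixed_point, and F are hypothetical.

```python
from itertools import combinations


def power_set(a):
    """All subsets of the finite set a, each as a frozenset."""
    items = sorted(a)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]


def fixed_point(a, F):
    """X0 := intersection of all X ⊆ A with F(X) ⊆ X, as in (A.28)."""
    x0 = frozenset(a)          # A itself satisfies F(A) ⊆ A
    for x in power_set(a):
        if F(x) <= x:
            x0 &= x
    return x0


A = frozenset(range(6))


def F(x):
    """Sample isotone map: F(X) = {0} ∪ {n + 2 : n ∈ X, n + 2 ∈ A}."""
    return frozenset({0}) | frozenset(n + 2 for n in x if n + 2 in A)


x0 = fixed_point(A, F)
print(sorted(x0), F(x0) == x0)
```

The brute-force enumeration of 𝒫(A) is only feasible for very small A; the point of the sketch is that the intersection X₀ is indeed fixed by F, as the lemma asserts.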

First Proof of Th. A.55. (i) trivially implies (ii), as one can simply set f := φ and g := φ⁻¹. It remains to show (ii) implies (i). Thus, let f : A −→ B and g : B −→ A be injective. To apply Lem. A.56, define

\[ F : \mathcal{P}(A) \longrightarrow \mathcal{P}(A), \quad F(X) := A \setminus g\bigl( B \setminus f(X) \bigr), \]

and note

\[ X \subseteq Y \subseteq A \;\Rightarrow\; f(X) \subseteq f(Y) \;\Rightarrow\; B \setminus f(Y) \subseteq B \setminus f(X) \;\Rightarrow\; g\bigl( B \setminus f(Y) \bigr) \subseteq g\bigl( B \setminus f(X) \bigr) \;\Rightarrow\; F(X) \subseteq F(Y). \]

Thus, by Lem. A.56, F has a fixed point X₀. We claim that a bijection is obtained via setting

\[ \varphi : A \longrightarrow B, \quad \varphi(x) := \begin{cases} f(x) & \text{for } x \in X_0, \\ g^{-1}(x) & \text{for } x \notin X_0. \end{cases} \]

First, φ is well-defined, since x ∉ X₀ = F(X₀) implies x ∈ g(B \ f(X₀)). To verify that φ is injective, let x, y ∈ A, x ≠ y. If x, y ∈ X₀, then φ(x) ≠ φ(y), as f is injective. If x, y ∈ A \ X₀, then φ(x) ≠ φ(y), as g⁻¹ is well-defined. If x ∈ X₀ and y ∉ X₀, then φ(x) ∈ f(X₀) and φ(y) ∈ B \ f(X₀), once again implying φ(x) ≠ φ(y). It remains to prove surjectivity. If b ∈ f(X₀), then φ(f⁻¹(b)) = b. If b ∈ B \ f(X₀), then g(b) ∉ X₀ = F(X₀), i.e. φ(g(b)) = b, showing φ to be surjective. □

Second Proof of Th. A.55. As in the first proof, we only need to show (ii) implies (i). We first assume that A and B are disjoint. To define φ, we first construct a suitable partition of A ∪ B, where the subsets of the partition are given via sequences defined by using f and g. The idea is to assign a unique sequence σ(a) to each a ∈ A and a unique sequence σ(b) to each b ∈ B by alternately applying f and g to advance the sequence to the right and by alternately applying f⁻¹ and g⁻¹ to advance the sequence to the left, if possible (for a given a ∈ A, g⁻¹(a) might not be defined and, for a given b ∈ B, f⁻¹(b) might not be defined). Thus, for a ∈ A, σ(a) has the form

\[ \dots,\; f^{-1}\bigl(g^{-1}(a)\bigr),\; g^{-1}(a),\; a,\; f(a),\; g\bigl(f(a)\bigr),\; \dots \tag{A.29} \]

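More precisely, for each a ∈ A, we define σ(a) = (σ_i(a))_{i∈I_a} recursively by

\begin{align}
\sigma_i(a) &:= a && \text{for } i = 0, \tag{A.30a} \\
\sigma_i(a) &:= f\bigl(\sigma_{i-1}(a)\bigr) && \text{for } i > 0 \text{ odd}, \tag{A.30b} \\
\sigma_i(a) &:= g\bigl(\sigma_{i-1}(a)\bigr) && \text{for } i > 0 \text{ even}, \tag{A.30c} \\
\sigma_i(a) &:= g^{-1}\bigl(\sigma_{i+1}(a)\bigr) && \text{for } i < 0 \text{ odd and } \sigma_{i+1}(a) \in g(B), \tag{A.30d} \\
m_a := i+1, \quad I_a &:= \{ k \in \mathbb{Z} : m_a \leq k \} && \text{for } i < 0 \text{ odd and } \sigma_{i+1}(a) \notin g(B), \tag{A.30e} \\
\sigma_i(a) &:= f^{-1}\bigl(\sigma_{i+1}(a)\bigr) && \text{for } i < 0 \text{ even and } \sigma_{i+1}(a) \in f(A), \tag{A.30f} \\
m_a := i+1, \quad I_a &:= \{ k \in \mathbb{Z} : m_a \leq k \} && \text{for } i < 0 \text{ even and } \sigma_{i+1}(a) \notin f(A), \tag{A.30g}
\end{align}

where the conditions in (A.30e) and (A.30g) are meant to implicitly require σ_{i+1}(a) to be defined for i + 1. By induction, one shows σ_{i−1}(a) ∈ A for each i > 0 odd, σ_{i−1}(a) ∈ B for each i > 0 even, σ_{i+1}(a) ∈ A for each m_a ≤ i < 0 odd, and σ_{i+1}(a) ∈ B for each m_a ≤ i < 0 even, such that σ_i(a) is well-defined by (A.30) for each i ∈ I_a (with I_a = ℤ if (A.30e) and (A.30g) are never satisfied). Analogously, for each b ∈ B, we define σ(b) = (σ_i(b))_{i∈I_b} recursively by

\begin{align}
\sigma_i(b) &:= b && \text{for } i = 0, \tag{A.31a} \\
\sigma_i(b) &:= g\bigl(\sigma_{i-1}(b)\bigr) && \text{for } i > 0 \text{ odd}, \tag{A.31b} \\
\sigma_i(b) &:= f\bigl(\sigma_{i-1}(b)\bigr) && \text{for } i > 0 \text{ even}, \tag{A.31c} \\
\sigma_i(b) &:= f^{-1}\bigl(\sigma_{i+1}(b)\bigr) && \text{for } i < 0 \text{ odd and } \sigma_{i+1}(b) \in f(A), \tag{A.31d} \\
m_b := i+1, \quad I_b &:= \{ k \in \mathbb{Z} : m_b \leq k \} && \text{for } i < 0 \text{ odd and } \sigma_{i+1}(b) \notin f(A), \tag{A.31e} \\
\sigma_i(b) &:= g^{-1}\bigl(\sigma_{i+1}(b)\bigr) && \text{for } i < 0 \text{ even and } \sigma_{i+1}(b) \in g(B), \tag{A.31f} \\
m_b := i+1, \quad I_b &:= \{ k \in \mathbb{Z} : m_b \leq k \} && \text{for } i < 0 \text{ even and } \sigma_{i+1}(b) \notin g(B), \tag{A.31g}
\end{align}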

where the conditions in (A.31e) and (A.31g) are meant to implicitly require σ_{i+1}(b) to be defined for i + 1. By induction, one shows σ_{i−1}(b) ∈ B for each i > 0 odd, σ_{i−1}(b) ∈ A for each i > 0 even, σ_{i+1}(b) ∈ B for each m_b ≤ i < 0 odd, and σ_{i+1}(b) ∈ A for each m_b ≤ i < 0 even, such that σ_i(b) is well-defined by (A.31) for each i ∈ I_b (with I_b = ℤ if (A.31e) and (A.31g) are never satisfied). The σ(a) and σ(b) now allow us to define the sets

\[ \forall_{x \in A \cup B} \quad S_x := \{ \sigma_i(x) : i \in I_x \} \subseteq A \cup B. \tag{A.32} \]

Moreover, we call x ∈ A ∪ B an A-stopper if, and only if, σ(x) terminates to the left with some element in A; a B-stopper if, and only if, σ(x) terminates to the left with some element in B; and a non-stopper if σ(x) never terminates to the left. Thus,

\begin{align}
x \text{ A-stopper} \;&\Leftrightarrow\; \Bigl( I_x \neq \mathbb{Z} \;\wedge\; \bigl( (x \in A \wedge m_x \text{ even}) \vee (x \in B \wedge m_x \text{ odd}) \bigr) \Bigr), \notag \\
x \text{ B-stopper} \;&\Leftrightarrow\; \Bigl( I_x \neq \mathbb{Z} \;\wedge\; \bigl( (x \in A \wedge m_x \text{ odd}) \vee (x \in B \wedge m_x \text{ even}) \bigr) \Bigr), \notag \\
x \text{ non-stopper} \;&\Leftrightarrow\; I_x = \mathbb{Z}. \tag{A.33}
\end{align}


Next, we prove that the S_x form a partition of A ∪ B. Since, for each x ∈ A ∪ B, x = σ₀(x) ∈ S_x, it only remains to show

\[ \forall_{x,y \in A \cup B} \bigl( S_x = S_y \;\vee\; S_x \cap S_y = \emptyset \bigr). \tag{A.34} \]

To prove (A.34), it clearly suffices to show

\[ \forall_{x,z \in A \cup B} \bigl( z \in S_x \;\Rightarrow\; S_x = S_z \bigr). \tag{A.35} \]

To verify (A.35), let z ∈ S_x. Then there exists i ∈ I_x such that z = σ₀(z) = σ_i(x) and simple inductions show σ_k(z) = σ_{k+i}(x) for each k ∈ I_z and σ_{k−i}(z) = σ_k(x) for each k ∈ I_x (in particular, i + I_z = I_x), proving S_x = S_z.

We are now in a position to define the desired bijection φ : A −→ B:

\[ \varphi : A \longrightarrow B, \quad \varphi(a) := \begin{cases} f(a) & \text{if } a \text{ is an A-stopper or a non-stopper}, \\ g^{-1}(a) & \text{if } a \text{ is a B-stopper}. \end{cases} \tag{A.36} \]

Indeed, φ is injective: If a₁, a₂ ∈ {a ∈ A : a A-stopper or non-stopper} with a₁ ≠ a₂, then φ(a₁) ≠ φ(a₂) due to f being injective; if a₁, a₂ ∈ {a ∈ A : a B-stopper} with a₁ ≠ a₂, then φ(a₁) ≠ φ(a₂) due to g⁻¹ being injective; and if a₁, a₂ ∈ A with a₂ a B-stopper and a₁ not a B-stopper, then S_{a₁} = S_{f(a₁)} and S_{a₂} = S_{g⁻¹(a₂)}, i.e. φ(a₂) is also a B-stopper, whereas φ(a₁) is not a B-stopper; in particular, φ(a₁) ≠ φ(a₂). Moreover, φ is also surjective: If b ∈ B is a B-stopper, then, due to S_b = S_{g(b)}, so is g(b), and b = g⁻¹(g(b)) = φ(g(b)); if b ∈ B is not a B-stopper, then f⁻¹(b) is defined and in S_b, i.e. f⁻¹(b) is not a B-stopper either, and b = f(f⁻¹(b)) = φ(f⁻¹(b)).

To conclude the proof, we consider the case that A and B are not necessarily disjoint. Since A × {0} and B × {1} are always disjoint with

\begin{align}
\tilde{f} : A \times \{0\} \longrightarrow B \times \{1\}, \quad \tilde{f}(a, 0) &:= (f(a), 1), \tag{A.37a} \\
\tilde{g} : B \times \{1\} \longrightarrow A \times \{0\}, \quad \tilde{g}(b, 1) &:= (g(b), 0), \tag{A.37b}
\end{align}

still being injective if f, g are, the first part of the proof yields a bijective function φ̃ : A × {0} −→ B × {1}. Then, using the clearly bijective functions

\begin{align}
\alpha : A \longrightarrow A \times \{0\}, \quad \alpha(a) &:= (a, 0), \tag{A.38a} \\
\beta : B \longrightarrow B \times \{1\}, \quad \beta(b) &:= (b, 1), \tag{A.38b}
\end{align}

φ := β⁻¹ ∘ φ̃ ∘ α : A −→ B is also bijective. □

Remark A.57. In general, the second proof of the Schröder-Bernstein Th. A.55 is still nonconstructive, since one has, in general, no algorithm to determine if a given element is an A-stopper, a B-stopper, or a non-stopper. However, as the following Ex. A.58 shows, in particular situations, determining A-stoppers, B-stoppers, and non-stoppers does not have to be difficult.


Example A.58. Let A := ℕ₀, B := {n ∈ ℕ₀ : n even}. We consider A and B as being made disjoint (for example, by using the trick employed in the last part of the proof of Th. A.55 above), but, for the sake of readability, we will not reflect this in the notation used. Define the maps

\begin{align}
f : A \longrightarrow B, \quad f(n) &:= 4n, \tag{A.39a} \\
g : B \longrightarrow A, \quad g(n) &:= n, \tag{A.39b}
\end{align}

both being clearly injective, but not surjective. The goal is to explicitly find the bijective map φ : A −→ B, given by (A.36). As an intermediate step, we determine which elements of A are non-stoppers, A-stoppers, and B-stoppers, and likewise for the elements of B. Clearly 0 ∈ A and 0 ∈ B are non-stoppers. We will see that all other elements are either A-stoppers or B-stoppers. The precise claim is

\begin{align}
A_1 &:= \{ a \in A : a \text{ is A-stopper} \} = C_1 := \{ a \in A : a = n\,4^k,\ n \text{ odd},\ k \in \mathbb{N}_0 \}, \tag{A.40a} \\
A_2 &:= \{ a \in A : a \text{ is B-stopper} \} = C_2 := A \setminus (C_1 \cup \{0\}), \tag{A.40b} \\
B_1 &:= \{ b \in B : b \text{ is A-stopper} \} = D_1 := B \setminus (D_2 \cup \{0\}), \tag{A.40c} \\
B_2 &:= \{ b \in B : b \text{ is B-stopper} \} = D_2 := \{ b \in B : b = n\,2^k;\ n, k \text{ odd};\ n, k \geq 1 \}. \tag{A.40d}
\end{align}

Indeed, if c = n 4^k ∈ C₁, then (f⁻¹ ∘ g⁻¹)^k(c) = n is odd, i.e. n ∉ g(B), showing c is an A-stopper, proving C₁ ⊆ A₁. If d = n 2^k ∈ D₂, then k − 1 = 2m with m ∈ ℕ₀, i.e. d = n · 2 · 4^m and (g⁻¹ ∘ f⁻¹)^m(d) = 2n is not divisible by 4, i.e. 2n ∉ f(A), showing d is a B-stopper, proving D₂ ⊆ B₂. Clearly, each a ∈ ℕ either has the form a = n 4^k with n odd and k ∈ ℕ₀ (i.e. a ∈ C₁) or a = 2 · n 4^k with n odd and k ∈ ℕ₀, i.e.

\[ C_2 = \{ a \in A : a = 2 \cdot n (2 \cdot 2)^k;\ n \text{ odd};\ k \in \mathbb{N}_0 \} = \{ a \in A : a = n\,2^k;\ n, k \text{ odd};\ n, k \geq 1 \} = g(D_2). \tag{A.41} \]

Since D₂ ⊆ B₂, all elements of D₂ are B-stoppers, and, thus, so are all elements of C₂, proving C₂ ⊆ A₂. Since A = C₁ ∪ C₂ ∪ {0} (a disjoint union), we then also obtain A₁ = C₁ and A₂ = C₂. Clearly, each even b ∈ ℕ either has the form b = n 2^k with odd n, k ≥ 1 (i.e. b ∈ D₂) or b = n 4^k with n odd and k ∈ ℕ, i.e.

\[ D_1 = \{ b \in B : b = n\,4^k,\ n \text{ odd},\ k \in \mathbb{N} \} = f(C_1). \tag{A.42} \]

Since C₁ = A₁, all elements of C₁ are A-stoppers, and, thus, so are all elements of D₁. Since B = D₁ ∪ D₂ ∪ {0} (a disjoint union), we then also obtain B₁ = D₁ and B₂ = D₂.

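Now that we have identified explicit formulas for A₁ and A₂, we can write the assignment rule for the bijective φ : A −→ B, given by (A.36), in the explicit form

\[ \varphi(a) := \begin{cases} 0 & \text{if } a = 0, \\ 4a & \text{if } a = n\,4^k \text{ with } n \text{ odd and } k \in \mathbb{N}_0, \\ a & \text{if } a = 2 \cdot n\,4^k \text{ with } n \text{ odd and } k \in \mathbb{N}_0. \end{cases} \tag{A.43} \]

Thus, φ starts out with the assignments

\[ \varphi : \quad \begin{array}{ccccccccc} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow \\ 0 & 4 & 2 & 12 & 16 & 20 & 6 & 28 & 8 \end{array} \quad \dots \tag{A.44} \]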
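The classification into stoppers in this example is algorithmic, so φ can also be computed by literally walking the sequence σ(a) to the left. The following Python sketch assumes the maps f(n) = 4n and g(n) = n of (A.39); the function names (in_image_f, in_image_g, classify, phi) are hypothetical, and the non-stopper 0 is treated as a special case, since its leftward walk never ends.

```python
def f(n):
    """f : A -> B, f(n) = 4n, as in (A.39a)."""
    return 4 * n


def in_image_f(b):
    """b ∈ f(A) if, and only if, b is divisible by 4."""
    return b % 4 == 0


def in_image_g(a):
    """a ∈ g(B) if, and only if, a is even (g is the inclusion of B into A)."""
    return a % 2 == 0


def classify(a):
    """Return 'A', 'B', or 'non' for an A-stopper, B-stopper, or non-stopper a ∈ A."""
    if a == 0:
        return "non"               # 0 = g(0) = f(0): the walk to the left never stops
    side, x = "A", a
    while True:
        if side == "A":
            if not in_image_g(x):
                return "A"         # sigma(a) terminates to the left in A
            side = "B"             # g^{-1}(x) = x, now viewed as an element of B
        else:
            if not in_image_f(x):
                return "B"         # sigma(a) terminates to the left in B
            x, side = x // 4, "A"  # f^{-1}(x) = x / 4


def phi(a):
    """The bijection of (A.36): f(a) for A-stoppers and non-stoppers, g^{-1}(a) otherwise."""
    return f(a) if classify(a) in ("A", "non") else a


print([phi(a) for a in range(9)])
```

For a > 0, every round trip A → B → A divides the current value by 4, so the loop terminates; the values printed for a = 0, …, 8 can be compared with (A.44).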

Theorem A.59. Let A, B be nonempty sets. Then the following statements are equivalent (where the implication “(ii) ⇒ (i)” makes use of the axiom of choice (AC) of Sec. A.4).

(i) There exists an injective map f : A −→ B.

(ii) There exists a surjective map g : B −→ A.

Proof. According to Th. 2.13(b), (i) is equivalent to f having a left inverse g : B −→ A (i.e. g ◦ f = Id_A), which is equivalent to g having a right inverse, which, according to Th. 2.13(a), is equivalent to (ii) (AC is used in the proof of Th. 2.13(a) to show each surjective map has a right inverse). ∎

Corollary A.60. Let A, B be nonempty sets. Using AC, we can expand the two equivalent statements of Th. A.55 to the following list of equivalent statements:

(i) The sets A and B have the same cardinality (i.e. there exists a bijective map φ : A −→ B).

(ii) There exist an injective map f : A −→ B and an injective map g : B −→ A.

(iii) There exist a surjective map f : A −→ B and a surjective map g : B −→ A.

(iv) There exist an injective map f1 : A −→ B and a surjective map f2 : A −→ B.

(v) There exist an injective map g1 : B −→ A and a surjective map g2 : B −→ A.

Proof. The equivalences are an immediate consequence of combining Th. A.55 with Th. A.59. ∎

A.5.2 Finite Sets

It is intuitively clear that finite cardinalities are uniquely determined. Still, one has to provide a rigorous proof. The key is the following theorem:

Theorem A.61. If m, n ∈ N and the map f : {1, . . . , m} −→ {1, . . . , n} is bijective, then m = n.

Proof. We conduct the proof via induction on $m$. If $m = 1$, then the surjectivity of $f$ implies $n = 1$. For the induction step, we now consider $m > 1$. From the bijective map $f$, we define the map

\[ g : \{1, \dots, m\} \to \{1, \dots, n\}, \quad g(x) := \begin{cases} n & \text{for } x = m, \\ f(m) & \text{for } x = f^{-1}(n), \\ f(x) & \text{otherwise}. \end{cases} \tag{A.45} \]

Then $g$ is bijective, since it is the composition $g = h \circ f$ of the bijective map $f$ with the bijective map

\[ h : \{1, \dots, n\} \to \{1, \dots, n\}, \quad h(f(m)) := n, \quad h(n) := f(m), \quad h(y) := y \text{ for } y \notin \{f(m), n\}. \tag{A.46} \]

Since $g(m) = n$, the restriction $g{\restriction}_{\{1,\dots,m-1\}} : \{1, \dots, m-1\} \to \{1, \dots, n-1\}$ must also be bijective, such that the induction hypothesis yields $m - 1 = n - 1$, which, in turn, implies $m = n$ as desired. ∎

Corollary A.62. Let $m, n \in \mathbb{N}$ and let $A$ be a set. If $\#A = m$ and $\#A = n$, then $m = n$.

Proof. If $\#A = m$, then, according to Def. 3.12(b), there exists a bijective map $f : A \to \{1, \dots, m\}$. Analogously, if $\#A = n$, then there exists a bijective map $g : A \to \{1, \dots, n\}$. In consequence, we have the bijective map $(g \circ f^{-1}) : \{1, \dots, m\} \to \{1, \dots, n\}$, such that Th. A.61 yields $m = n$. ∎

Theorem A.63. Let $A \neq \emptyset$ be a finite set.

(a) If $B \subseteq A$ with $A \neq B$, then $B$ is finite with $\#B < \#A$.

(b) If $a \in A$, then $\#(A \setminus \{a\}) = \#A - 1$.

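Proof. For $\#A = n \in \mathbb{N}$, we use induction to prove (a) and (b) simultaneously, i.e. we show

\[ \forall_{n \in \mathbb{N}} \; \underbrace{\Bigl( \#A = n \;\Rightarrow\; \forall_{B \in \mathcal{P}(A) \setminus \{A\}} \; \forall_{a \in A} \;\; \#B \in \{0, \dots, n-1\} \,\wedge\, \#(A \setminus \{a\}) = n - 1 \Bigr)}_{\phi(n)}. \]

Base Case ($n = 1$): In this case, $A$ has precisely one element, i.e. $B = A \setminus \{a\} = \emptyset$, and $\#\emptyset = 0 = n - 1$ proves $\phi(1)$.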

Induction Step: For the induction hypothesis, we assume $\phi(n)$ to be true, i.e. we assume (a) and (b) hold for each $A$ with $\#A = n$. We have to prove $\phi(n+1)$, i.e., we consider $A$ with $\#A = n + 1$. From $\#A = n + 1$, we conclude the existence of a bijective map $\varphi : A \to \{1, \dots, n+1\}$. We have to construct a bijective map $\psi : A \setminus \{a\} \to \{1, \dots, n\}$. To this end, set $k := \varphi(a)$ and define the auxiliary function

\[ f : \{1, \dots, n+1\} \to \{1, \dots, n+1\}, \quad f(x) := \begin{cases} n + 1 & \text{for } x = k, \\ k & \text{for } x = n + 1, \\ x & \text{for } x \notin \{k, n+1\}. \end{cases} \]

Then $f \circ \varphi : A \to \{1, \dots, n+1\}$ is bijective by Th. 2.14, and

\[ (f \circ \varphi)(a) = f(\varphi(a)) = f(k) = n + 1. \]

Thus, the restriction $\psi := (f \circ \varphi){\restriction}_{A \setminus \{a\}}$ is the desired bijective map $\psi : A \setminus \{a\} \to \{1, \dots, n\}$, proving $\#(A \setminus \{a\}) = n$. It remains to consider the strict subset $B$ of $A$. Since $B$ is a strict subset of $A$, there exists $a \in A \setminus B$. Thus, $B \subseteq A \setminus \{a\}$ and, as we have already shown $\#(A \setminus \{a\}) = n$, the induction hypothesis applies and yields that $B$ is finite with $\#B \le \#(A \setminus \{a\}) = n$, i.e. $\#B \in \{0, \dots, n\}$, proving $\phi(n+1)$, thereby completing the induction. ∎


Theorem A.64. For #A = #B = n ∈ N and f : A −→ B, the following statements are equivalent:

(i) f is injective.

(ii) f is surjective.

(iii) f is bijective.

Proof. It suffices to prove the equivalence of (i) and (ii).

If $f$ is injective, then $f : A \to f(A)$ is bijective. Since $\#A = n$, there exists a bijective map $\varphi : A \to \{1, \dots, n\}$. Then $(\varphi \circ f^{-1}) : f(A) \to \{1, \dots, n\}$ is also bijective, showing $\#f(A) = n$, i.e., according to Th. A.63(a), $f(A)$ cannot be a strict subset of $B$, i.e. $f(A) = B$, proving $f$ is surjective.

If $f$ is surjective, then $f$ has a right inverse $g : B \to A$: One can obtain this from Th. 2.13(a), but, here, we can actually construct $g$ without the axiom of choice: We let $\varphi : A \to \{1, \dots, n\}$ be the bijective map from above and, for $b \in B$, we let $g(b)$ be the unique $a \in C := f^{-1}(\{b\})$ such that $\varphi(a) = \min \varphi(C)$. Then, clearly, $f \circ g = \mathrm{Id}_B$. But this also means $f$ is a left inverse for $g$, such that $g$ must be injective by Th. 2.13(b). According to what we have already proved above, $g$ injective implies $g$ surjective, i.e. $g$ must be bijective. From Th. 2.13(c), we then know the left inverse of $g$ is unique, implying $f = g^{-1}$. In particular, $f$ is injective. ∎
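For small n, Th. A.64 and the choice-free construction of the right inverse can be verified by brute force. The Python sketch below is only an illustration under assumptions made here: both A and B are modeled as {0, ..., n−1}, a map f is stored as a tuple, and the identity plays the role of the enumeration ϕ; none of these names come from the text.

```python
from itertools import product

n = 4
domain = range(n)  # stands in for both A and B, each with n elements


def is_injective(f):
    return len(set(f)) == len(f)


def is_surjective(f):
    return set(f) == set(domain)


def right_inverse(f):
    # For surjective f: g(b) is the smallest a with f(a) = b,
    # mirroring the choice of a with minimal enumeration index.
    return tuple(min(a for a in domain if f[a] == b) for b in domain)


# A map f is stored as a tuple: f[a] is the image of a.
maps = list(product(domain, repeat=n))
equivalent = all(is_injective(f) == is_surjective(f) for f in maps)
bijections = [f for f in maps if is_surjective(f)]
inverses_ok = all(
    all(f[right_inverse(f)[b]] == b for b in domain) for f in bijections
)
```

Here `equivalent` records whether injectivity and surjectivity coincide for every one of the n^n maps, and `inverses_ok` records whether f ◦ g is the identity for each surjective f.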

Lemma A.65. For each finite set $A$ (i.e. $\#A = n \in \mathbb{N}_0$) and each $B \subseteq A$, one has $\#(A \setminus B) = \#A - \#B$.

Proof. For $B = \emptyset$, the assertion is true since $\#(A \setminus B) = \#A = \#A - 0 = \#A - \#B$.

For $B \neq \emptyset$, the proof is conducted over the size of $B$, i.e. as a finite induction (cf. Cor. 3.6) over the set $\{1, \dots, n\}$, showing

\[ \forall_{m \in \{1, \dots, n\}} \; \underbrace{\bigl( \#B = m \;\Rightarrow\; \#(A \setminus B) = \#A - \#B \bigr)}_{\phi(m)}. \]

Base Case (m = 1): φ(1) is precisely the statement provided by Th. A.63(b).

Induction Step: For the induction hypothesis, we assume $\phi(m)$ with $1 \le m < n$. To prove $\phi(m+1)$, consider $B \subseteq A$ with $\#B = m + 1$. Fix an element $b \in B$ and set $B_1 := B \setminus \{b\}$. Then $\#B_1 = m$ by Th. A.63(b), $A \setminus B = (A \setminus B_1) \setminus \{b\}$, and we compute

\[ \#(A \setminus B) = \#\bigl((A \setminus B_1) \setminus \{b\}\bigr) \overset{\text{Th. A.63(b)}}{=} \#(A \setminus B_1) - 1 \overset{\phi(m)}{=} \#A - \#B_1 - 1 = \#A - \#B, \]

proving $\phi(m+1)$ and completing the induction. ∎

Theorem A.66. If A,B are finite sets, then #(A ∪ B) = #A+#B −#(A ∩B).

Proof. The assertion is clearly true if $A$ or $B$ is empty. If $A$ and $B$ are nonempty, then there exist $m, n \in \mathbb{N}$ such that $\#A = m$ and $\#B = n$, i.e. there are bijective maps $f : A \to \{1, \dots, m\}$ and $g : B \to \{1, \dots, n\}$.

We first consider the case $A \cap B = \emptyset$. We need to construct a bijective map $h : A \cup B \to \{1, \dots, m+n\}$. To this end, we define

\[ h : A \cup B \to \{1, \dots, m+n\}, \quad h(x) := \begin{cases} f(x) & \text{for } x \in A, \\ g(x) + m & \text{for } x \in B. \end{cases} \]

The bijectivity of $f$ and $g$ clearly implies the bijectivity of $h$, proving $\#(A \cup B) = m + n = \#A + \#B$.

Finally, we consider the case of arbitrary $A, B$. Since $A \cup B = A \mathbin{\dot\cup} (B \setminus A)$ and $B \setminus A = B \setminus (A \cap B)$, we can compute

\[ \#(A \cup B) = \#\bigl(A \mathbin{\dot\cup} (B \setminus A)\bigr) = \#A + \#(B \setminus A) = \#A + \#\bigl(B \setminus (A \cap B)\bigr) \overset{\text{Lem. A.65}}{=} \#A + \#B - \#(A \cap B), \]

thereby establishing the case. ∎
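The gluing map h from the disjoint case can be written down concretely. The following Python sketch is an illustration under assumptions made here: finite sets are Python sets of sortable elements, the enumerations f and g are obtained by sorting, and the helper names `enumerate_set` and `glue` are hypothetical.

```python
def enumerate_set(s):
    """A bijection s -> {1, ..., len(s)}, stored as a dict."""
    return {x: i for i, x in enumerate(sorted(s), start=1)}


def glue(A, B):
    """The map h from the proof, for disjoint A and B."""
    assert not (A & B), "A and B must be disjoint"
    f, g = enumerate_set(A), enumerate_set(B)
    m = len(A)
    return {x: (f[x] if x in A else g[x] + m) for x in A | B}


A = {"a", "b", "c"}
B = {"c", "d"}
# General case: apply the disjoint case to A and B \ A.
h = glue(A, B - A)
```

The values of `h` enumerate A ∪ B without gaps, which is the counting statement #(A ∪ B) = #A + #(B \ A) used in the last step of the proof.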

Theorem A.67. If $(A_1, \dots, A_n)$, $n \in \mathbb{N}$, is a finite sequence of finite sets, then

\[ \#\prod_{i=1}^{n} A_i = \#(A_1 \times \dots \times A_n) = \prod_{i=1}^{n} \#A_i. \tag{A.47} \]

Proof. If at least one $A_i$ is empty, then (A.47) is true, since both sides are 0.

The case where all $A_i$ are nonempty is proved by induction over $n$, i.e. we know $k_i := \#A_i \in \mathbb{N}$ for each $i \in \{1, \dots, n\}$ and show by induction

\[ \forall_{n \in \mathbb{N}} \; \underbrace{\#\prod_{i=1}^{n} A_i = \prod_{i=1}^{n} k_i}_{\phi(n)}. \]

Base Case ($n = 1$): $\#\prod_{i=1}^{1} A_i = \#A_1 = k_1 = \prod_{i=1}^{1} k_i$, i.e. $\phi(1)$ holds.


Induction Step: From the induction hypothesis $\phi(n)$, we obtain a bijective map $\varphi : A \to \{1, \dots, N\}$, where $A := \prod_{i=1}^{n} A_i$ and $N := \prod_{i=1}^{n} k_i$. To prove $\phi(n+1)$, we need to construct a bijective map $h : A \times A_{n+1} \to \{1, \dots, N \cdot k_{n+1}\}$. Since $\#A_{n+1} = k_{n+1}$, there exists a bijective map $f : A_{n+1} \to \{1, \dots, k_{n+1}\}$. We define

\[ h : A \times A_{n+1} \to \{1, \dots, N \cdot k_{n+1}\}, \quad h(a_1, \dots, a_n, a_{n+1}) := \bigl(f(a_{n+1}) - 1\bigr) \cdot N + \varphi(a_1, \dots, a_n). \]

Since $\varphi$ and $f$ are bijective, and since every $m \in \{1, \dots, N \cdot k_{n+1}\}$ has a unique representation in the form $m = a \cdot N + r$ with $a \in \{0, \dots, k_{n+1} - 1\}$ and $r \in \{1, \dots, N\}$ (exercise), $h$ is also bijective. This proves $\phi(n+1)$ and completes the induction. ∎
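The map h from the induction step is a mixed-radix encoding, and its bijectivity can be observed on a small example. In the Python sketch below, the concrete sets and the enumerations `phi` and `f` (built by listing the elements in a fixed order) are assumptions chosen here for illustration.

```python
from itertools import product

A = list(product("xy", "pqr"))    # stands in for A_1 x ... x A_n, so N = 6
A_next = ["u", "v", "w", "z"]     # stands in for A_{n+1}, so k_{n+1} = 4

# Enumerations playing the roles of the bijections from the proof.
phi = {a: i for i, a in enumerate(A, start=1)}
f = {b: i for i, b in enumerate(A_next, start=1)}
N, k = len(A), len(A_next)


def h(a, b):
    """h(a_1, ..., a_n, a_{n+1}) = (f(a_{n+1}) - 1) * N + phi(a_1, ..., a_n)."""
    return (f[b] - 1) * N + phi[a]


image = sorted(h(a, b) for a in A for b in A_next)
```

The sorted `image` lists each number in {1, ..., N · k} exactly once, which is the bijectivity claimed for h.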

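Theorem A.68. For each finite set $A$ (i.e. $\#A = n \in \mathbb{N}_0$), one has $\#\mathcal{P}(A) = 2^n$.

Proof. The proof is conducted by induction by showing

\[ \forall_{n \in \mathbb{N}_0} \; \underbrace{\bigl( \#A = n \;\Rightarrow\; \#\mathcal{P}(A) = 2^n \bigr)}_{\phi(n)}. \]

Base Case ($n = 0$): For $n = 0$, we have $A = \emptyset$, i.e. $\mathcal{P}(A) = \{\emptyset\}$. Thus, $\#\mathcal{P}(A) = 1 = 2^0$, proving $\phi(0)$.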

Induction Step: Assume $\phi(n)$ and consider $A$ with $\#A = n + 1$. Then $A$ contains at least one element $a$. For $B := A \setminus \{a\}$, we then know $\#B = n$ from Th. A.63(b). Moreover, setting $M := \{C \cup \{a\} : C \in \mathcal{P}(B)\}$, we have the disjoint decomposition $\mathcal{P}(A) = \mathcal{P}(B) \mathbin{\dot\cup} M$. As the map $\varphi : \mathcal{P}(B) \to M$, $\varphi(C) := C \cup \{a\}$, is clearly bijective, $\mathcal{P}(B)$ and $M$ have the same cardinality. Thus,

\[ \#\mathcal{P}(A) \overset{\text{Th. A.66}}{=} \#\mathcal{P}(B) + \#M = \#\mathcal{P}(B) + \#\mathcal{P}(B) \overset{\phi(n)}{=} 2 \cdot 2^n = 2^{n+1}, \]

thereby proving $\phi(n+1)$ and completing the induction. ∎
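The disjoint decomposition P(A) = P(B) ∪ M used in the induction step translates directly into a recursive enumeration of the power set. The Python sketch below is an illustration; the function name `power_set` and the use of `frozenset` to represent subsets are choices made here.

```python
def power_set(A: frozenset) -> set:
    """All subsets of A, built as in the induction step of the proof."""
    if not A:
        return {frozenset()}        # P(empty set) = {empty set}
    a = next(iter(A))               # pick some element a of A
    PB = power_set(A - {a})         # P(B) with B := A \ {a}
    M = {C | {a} for C in PB}       # M := {C u {a} : C in P(B)}
    return PB | M                   # disjoint union P(B) u M


subsets = power_set(frozenset("abcd"))
```

For a four-element set, `subsets` contains 2^4 = 16 elements, in agreement with the theorem.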

A.5.3 Power Sets

Theorem A.69. Let A be a set. There can never exist a surjective map from A onto P(A) (in this sense, the size of P(A) is always strictly bigger than the size of A; in particular, A and P(A) can never have the same size).

Proof. If A = ∅, then there is nothing to prove. For nonempty A, the idea is to conduct a proof by contradiction. To this end, assume there does exist a surjective map f : A −→ P(A) and define

B := {x ∈ A : x /∈ f(x)}. (A.48)

Now B is a subset of A, i.e. B ∈ P(A), and the assumption that f is surjective implies the existence of a ∈ A such that f(a) = B. If a ∈ B, then a ∉ f(a) = B, i.e. a ∈ B implies a ∈ B ∧ ¬(a ∈ B), so that the principle of contradiction tells us a ∉ B must be true. However, a ∉ B implies a ∈ f(a) = B, i.e., this time, the principle of contradiction tells us a ∈ B must be true. In conclusion, we have shown our original assumption that there exists a surjective map f : A −→ P(A) implies a ∈ B ∧ ¬(a ∈ B), i.e., according to the principle of contradiction, no surjective map from A into P(A) can exist. ∎
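For a small finite set, the diagonal set B of (A.48) can be computed for every map f : A −→ P(A) to confirm that it is never in the image of f. The following Python sketch is an illustration only; the three-element set and the helper name `diagonal_set` are assumptions made here.

```python
from itertools import combinations, product

A = [0, 1, 2]
subsets = [frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)]


def diagonal_set(f):
    """B := {x in A : x not in f(x)} as in (A.48); f is a dict A -> P(A)."""
    return frozenset(x for x in A if x not in f[x])


# Enumerate all 8**3 maps f : A -> P(A) and test whether B is ever hit.
all_missed = all(
    diagonal_set(dict(zip(A, images))) not in images
    for images in product(subsets, repeat=len(A))
)
```

The flag `all_missed` states that, for each of the 512 maps, the set B is missing from the image, so none of them is surjective.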

B General Forms of the Laws of Associativity and Commutativity

B.1 Associativity

In the literature, the general law of associativity is often stated in the form that $a_1 a_2 \cdots a_n$ gives the same result “for every admissible way of inserting parentheses into $a_1 a_2 \cdots a_n$”, but a completely precise formulation of what that actually means seems to be rare. As a warm-up, we first prove a special case of the general law:

Proposition B.1. Let $A$ be a set with a composition $\cdot : A \times A \to A$ (which we write as a multiplication, but, clearly, this is not essential, and we could also write it as an addition or with some other symbol). If the composition is associative, i.e. if

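\[ \forall_{a, b, c \in A} \quad (ab)c = a(bc), \tag{B.1} \]

then

\[ \forall_{n \in \mathbb{N},\, n \ge 2} \;\; \forall_{a_1, \dots, a_n \in A} \;\; \forall_{k \in \{2, \dots, n\}} \quad \Bigl( \prod_{i=k}^{n} a_i \Bigr) \Bigl( \prod_{i=1}^{k-1} a_i \Bigr) = \prod_{i=1}^{n} a_i, \tag{B.2} \]

where the product symbol is defined according to (3.20a).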

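Proof. If $k = n$, then (B.2) is immediate from (3.20a). For $2 \le k < n$, we prove (B.2) by induction on $n$: For the base case, $n = 2$, there is nothing to prove. For $n > 2$, one computes

\[ \Bigl( \prod_{i=k}^{n} a_i \Bigr) \Bigl( \prod_{i=1}^{k-1} a_i \Bigr) \overset{(3.20a)}{=} \Bigl( a_n \cdot \prod_{i=k}^{n-1} a_i \Bigr) \Bigl( \prod_{i=1}^{k-1} a_i \Bigr) \overset{(B.1)}{=} a_n \cdot \Bigl( \Bigl( \prod_{i=k}^{n-1} a_i \Bigr) \Bigl( \prod_{i=1}^{k-1} a_i \Bigr) \Bigr) \overset{\text{ind. hyp.}}{=} a_n \cdot \prod_{i=1}^{n-1} a_i \overset{(3.20a)}{=} \prod_{i=1}^{n} a_i, \tag{B.3} \]

completing the induction and the proof of the proposition. ∎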

The difficulty in stating the general form of the law of associativity lies in giving a precise definition of what one means by “an admissible way of inserting parentheses into $a_1 a_2 \cdots a_n$”. So how does one actually proceed to calculate the value of $a_1 a_2 \cdots a_n$, given that parentheses have been inserted in an admissible way? The answer is that one does it in $n - 1$ steps, where, in each step, one combines two juxtaposed elements, consistent with the inserted parentheses. There can still be some ambiguity: For example, for $(a_1 a_2)(a_3(a_4 a_5))$, one has the freedom of first combining $a_1, a_2$, or of first combining $a_4, a_5$. In consequence, our general law of associativity will show that, for each admissible sequence of $n - 1$ directives for combining two juxtaposed elements, the final result is the same (under the hypothesis that (B.1) holds). This still needs some preparatory work.

In the following, one might see it as a slight notational inconvenience that we have defined $\prod_{i=1}^{n} a_i$ as $a_n \cdots a_1$ rather than $a_1 \cdots a_n$. For this reason, we will enumerate the elements to be combined by composition from right to left rather than from left to right.

Definition and Remark B.2. Let $A$ be a (nonempty) set with a composition $\cdot : A \times A \to A$, let $n \in \mathbb{N}$, $n \ge 2$, and let $I$ be a totally ordered index set, $\#I = n$, $I = \{i_1, \dots, i_n\}$ with $i_1 < \dots < i_n$. Moreover, let $F := (a_{i_n}, \dots, a_{i_1})$ be a family of $n$ elements of $A$.

(a) An admissible composition directive (for combining two juxtaposed elements of the family) is an index $i_k \in I$ with $1 \le k \le n - 1$. It transforms the family $F$ into the family $G := (a_{i_n}, \dots, a_{i_{k+1}} a_{i_k}, \dots, a_{i_1})$. In other words, $G = (b_j)_{j \in J}$, where $J := I \setminus \{i_{k+1}\}$, $b_j = a_j$ for each $j \in J \setminus \{i_k\}$, and $b_{i_k} = a_{i_{k+1}} a_{i_k}$. We can write this transformation as two maps

\[ F \mapsto \delta^{(1)}_{i_k}(F) := G = (a_{i_n}, \dots, a_{i_{k+1}} a_{i_k}, \dots, a_{i_1}) = (b_j)_{j \in J}, \tag{B.4a} \]

\[ I \mapsto \delta^{(2)}_{i_k}(I) := J = I \setminus \{i_{k+1}\}. \tag{B.4b} \]

Thus, an application of an admissible composition directive reduces the length of the family and the number of indices by one.

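(b) Recursively, we define (finite) sequences of families, index sets, and indices as follows:

\[ F_n := F, \quad I_n := I, \tag{B.5a} \]

\[ \forall_{\alpha \in \{2, \dots, n\}} \quad F_{\alpha - 1} := \delta^{(1)}_{j_\alpha}(F_\alpha), \quad I_{\alpha - 1} := \delta^{(2)}_{j_\alpha}(I_\alpha), \quad \text{where } j_\alpha \in I_\alpha \setminus \{\max I_\alpha\}. \tag{B.5b} \]

The corresponding sequence of indices $D := (j_n, \dots, j_2)$ in $I$ is called an admissible evaluation directive. Clearly,

\[ \forall_{\alpha \in \{1, \dots, n\}} \quad \#I_\alpha = \alpha, \text{ i.e. } F_\alpha \text{ has length } \alpha. \tag{B.6} \]

In particular, $I_1 = \{j_2\} = \{i_1\}$ (where the second equality follows from (B.4b)), $F_1 = (a)$, and we call

\[ D(F) := a \tag{B.7} \]

the result of the admissible evaluation directive $D$ applied to $F$.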

Theorem B.3 (General Law of Associativity). Let $A$ be a (nonempty) set with a composition $\cdot : A \times A \to A$, let $n \in \mathbb{N}$, $n \ge 2$, and let $I$ be a totally ordered index set, $\#I = n$, $I = \{i_1, \dots, i_n\}$ with $i_1 < \dots < i_n$. Moreover, let $F := (a_{i_n}, \dots, a_{i_1})$ be a family of $n$ elements of $A$. If the composition is associative, i.e. if (B.1) holds, then, for each admissible evaluation directive as defined in Def. and Rem. B.2(b), the result is the same, namely

\[ D(F) = \prod_{k=1}^{n} a_{i_k}. \tag{B.8} \]

Proof. We conduct the proof via induction on $n$. For $n = 3$, there are only two possible directives and (B.1) guarantees that they yield the same result. For the induction step, let $n > 3$. As in Def. and Rem. B.2(b), we write $D = (j_n, \dots, j_2)$ and obtain some $I_2 = \{i_1, i_m\}$, $1 < m \le n$, as the corresponding penultimate index set. Depending on $i_m$, we partition $(j_n, \dots, j_3)$ as follows: Set

\[ J_1 := \bigl\{ k \in \{3, \dots, n\} : j_k < i_m \bigr\}, \quad J_2 := \bigl\{ k \in \{3, \dots, n\} : j_k \ge i_m \bigr\}. \tag{B.9} \]

Then, for $k \in J_1$, $j_k$ is a composition directive to combine two elements to the right of $a_{i_m}$ and, for $k \in J_2$, $j_k$ is a composition directive to combine two elements to the left of $a_{i_m}$. Moreover, $J_1$ and $J_2$ might or might not be the empty set: If $J_1 = \emptyset$, then $j_k \neq i_1$ for each $k \in \{3, \dots, n\}$, implying $i_m = i_2$; if $J_2 = \emptyset$, then, in each of the $n - 2$ steps to obtain $I_2$, an $i_k$ with $k < m$ was removed from $I$, implying $i_m = i_n$ (in particular, as $n \neq 2$, $J_1$ and $J_2$ cannot both be empty). If $J_1 \neq \emptyset$, then $D_1 := (j_k)_{k \in J_1}$ is an admissible evaluation directive for $(a_{i_{m-1}}, \dots, a_{i_1})$; this follows from

\[ j_k \in K \subseteq \{i_1, \dots, i_{m-1}\} \;\Rightarrow\; \delta^{(2)}_{j_k}(K) \subseteq K \subseteq \{i_1, \dots, i_{m-1}\}. \tag{B.10} \]

Since $m - 1 < n$, the induction hypothesis applies and yields

\[ D_1(a_{i_{m-1}}, \dots, a_{i_1}) = \prod_{k=1}^{m-1} a_{i_k}. \tag{B.11} \]

Analogously, if $J_2 \neq \emptyset$, then $D_2 := (j_k)_{k \in J_2}$ is an admissible evaluation directive for $(a_{i_n}, \dots, a_{i_m})$; this follows from

\[ j_k \in K \subseteq \{i_m, \dots, i_n\} \;\Rightarrow\; \delta^{(2)}_{j_k}(K) \subseteq K \subseteq \{i_m, \dots, i_n\}. \tag{B.12} \]

Since $m > 1$, the induction hypothesis applies and yields

\[ D_2(a_{i_n}, \dots, a_{i_m}) = \prod_{k=m}^{n} a_{i_k}. \tag{B.13} \]

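Thus, if $J_1 \neq \emptyset$ and $J_2 \neq \emptyset$, then we obtain

\[ D(F) \overset{j_2 = i_1}{=} D_2(a_{i_n}, \dots, a_{i_m}) \cdot D_1(a_{i_{m-1}}, \dots, a_{i_1}) = \Bigl( \prod_{k=m}^{n} a_{i_k} \Bigr) \Bigl( \prod_{k=1}^{m-1} a_{i_k} \Bigr) \overset{\text{Prop. B.1}}{=} \prod_{k=1}^{n} a_{i_k} \tag{B.14} \]

as desired. If $J_1 = \emptyset$, then, as explained above, $i_m = i_2$. Thus, in this case,

\[ D(F) \overset{j_2 = i_1}{=} D_2(a_{i_n}, \dots, a_{i_2}) \cdot a_{i_1} = \Bigl( \prod_{k=2}^{n} a_{i_k} \Bigr) \cdot a_{i_1} \overset{\text{Prop. B.1}}{=} \prod_{k=1}^{n} a_{i_k} \tag{B.15} \]

as needed. Finally, if $J_2 = \emptyset$, then, as explained above, $i_m = i_n$. Thus, in this case,

\[ D(F) \overset{j_2 = i_1}{=} a_{i_n} \cdot D_1(a_{i_{n-1}}, \dots, a_{i_1}) = \prod_{k=1}^{n} a_{i_k}, \tag{B.16} \]

again, as desired, and completing the induction. ∎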
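The statement of Th. B.3 can be explored experimentally by running through every sequence of n − 1 combinations of juxtaposed elements. The Python sketch below simplifies the bookkeeping of Def. and Rem. B.2: a family is a tuple read from left to right, and a directive is a position in the current tuple rather than an index from I. These simplifications, and the name `all_results`, are assumptions made here.

```python
def all_results(family, op):
    """Results of every sequence of n-1 combinations of adjacent elements."""
    if len(family) == 1:
        return {family[0]}
    results = set()
    for p in range(len(family) - 1):
        # Combine the juxtaposed elements at positions p and p+1.
        merged = family[:p] + (op(family[p], family[p + 1]),) + family[p + 2:]
        results |= all_results(merged, op)
    return results


def concat(x, y):
    return x + y    # string concatenation: associative, not commutative


def sub(x, y):
    return x - y    # subtraction: not associative


assoc_results = all_results(("a", "b", "c", "d", "e"), concat)
nonassoc_results = all_results((1, 2, 3, 4), sub)
```

For the associative composition, all 4! = 24 evaluation orders give a single result, whereas subtraction produces several different values.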

B.2 Commutativity

In the present section, we will generalize the law of commutativity $ab = ba$ to a finite number of factors, provided the composition is also associative. For this purpose, we introduce the notion of permutation, also useful in many other mathematical contexts.

Definition and Remark B.4. Let $n \in \mathbb{N}$. Each bijective map $\pi : \{1, \dots, n\} \to \{1, \dots, n\}$ is called a permutation of $\{1, \dots, n\}$. The set of permutations of $\{1, \dots, n\}$ forms a group with respect to the composition of maps, the so-called symmetric group $S_n$: Indeed, the composition of maps is associative by Prop. 2.10(a); the neutral element is the identity map $e : \{1, \dots, n\} \to \{1, \dots, n\}$, $e(i) = i$; and, for each $\pi \in S_n$, its inverse map $\pi^{-1}$ is also its inverse element in the group $S_n$. Caveat: Simple examples show that $S_n$ is not commutative for $n \ge 3$.

Theorem B.5 (General Law of Commutativity). Let $A$ be a set with an associative composition $\cdot : A \times A \to A$ (which we write as a multiplication, but, clearly, this is not essential, and we could also write it as an addition or with some other symbol). If the composition is commutative, i.e. if

\[ \forall_{a, b \in A} \quad ab = ba, \tag{B.17} \]

then

\[ \forall_{n \in \mathbb{N}} \;\; \forall_{\pi \in S_n} \;\; \forall_{a_1, \dots, a_n \in A} \quad \prod_{i=1}^{n} a_i = \prod_{i=1}^{n} a_{\pi(i)}. \tag{B.18} \]

Before we can carry out the proof, we need to learn a bit more about permutations.

Definition B.6. Let $k, n \in \mathbb{N}$, $k \le n$. A permutation $\pi \in S_n$ is called a $k$-cycle if, and only if, there exist $k$ distinct numbers $i_1, \dots, i_k \in \{1, \dots, n\}$ such that

\[ \pi(i) = \begin{cases} i_{j+1} & \text{if } i = i_j,\ j \in \{1, \dots, k-1\}, \\ i_1 & \text{if } i = i_k, \\ i & \text{if } i \notin \{i_1, \dots, i_k\}. \end{cases} \tag{B.19} \]

If π is a cycle as in (B.19), then one writes

π = (i1 i2 . . . ik). (B.20)

Each 2-cycle is also known as a transposition.

Theorem B.7. Let n ∈ N.

(a) Each permutation can be decomposed into finitely many disjoint cycles: For each $\pi \in S_n$, there exists a decomposition of $\{1, \dots, n\}$ into disjoint sets $A_1, \dots, A_N$, $N \in \mathbb{N}$, i.e.

\[ \{1, \dots, n\} = \bigcup_{i=1}^{N} A_i \quad \text{and} \quad A_i \cap A_j = \emptyset \text{ for } i \neq j, \tag{B.21} \]

such that $A_i$ consists of the distinct elements $a_{i1}, \dots, a_{i,N_i}$ and

\[ \pi = (a_{N1} \dots a_{N,N_N}) \cdots (a_{11} \dots a_{1,N_1}). \tag{B.22} \]

The decomposition (B.22) is unique up to the order of the cycles.

(b) If $n \ge 2$, then every permutation $\pi \in S_n$ is the composition of finitely many transpositions, where each transposition permutes two juxtaposed elements, i.e.

\[ \forall_{\pi \in S_n} \;\; \exists_{N \in \mathbb{N}} \;\; \exists_{\tau_1, \dots, \tau_N \in T} \quad \pi = \tau_N \circ \dots \circ \tau_1, \tag{B.23} \]

where $T := \bigl\{ (i \;\, i+1) : i \in \{1, \dots, n-1\} \bigr\}$.

Proof. (a): We prove the statement by induction on $n$. For $n = 1$, there is nothing to prove. Let $n > 1$ and choose $i \in \{1, \dots, n\}$. We claim that

\[ \exists_{k \in \mathbb{N}} \; \Bigl( \pi^k(i) = i \;\wedge\; \forall_{l \in \{1, \dots, k-1\}} \; \pi^l(i) \neq i \Bigr). \tag{B.24} \]

Indeed, since $\{1, \dots, n\}$ is finite, there must be a smallest $k \in \mathbb{N}$ such that $\pi^k(i) \in A_1 := \{i, \pi(i), \dots, \pi^{k-1}(i)\}$. Since $\pi$ is bijective, we must have $\pi^k(i) = i$, and $(i \;\, \pi(i) \,\dots\, \pi^{k-1}(i))$ is a $k$-cycle. We are already done in case $k = n$. If $k < n$, then consider $B := \{1, \dots, n\} \setminus A_1$. Then, again using the bijectivity of $\pi$, $\pi{\restriction}_B$ is a permutation on $B$ with $1 \le \#B < n$. By induction, there are disjoint sets $A_2, \dots, A_N$ such that $B = \bigcup_{j=2}^{N} A_j$, $A_j$ consists of the distinct elements $a_{j1}, \dots, a_{j,N_j}$ and

\[ \pi{\restriction}_B = (a_{N1} \dots a_{N,N_N}) \cdots (a_{21} \dots a_{2,N_2}). \]

Since $\pi = (i \;\, \pi(i) \,\dots\, \pi^{k-1}(i)) \circ \pi{\restriction}_B$, this finishes the proof of (B.22). If there were another, different, decomposition of $\pi$ into cycles, say, given by disjoint sets $B_1, \dots, B_M$, $\{1, \dots, n\} = \bigcup_{i=1}^{M} B_i$, $M \in \mathbb{N}$, then there would exist $A_i \neq B_j$ and $k \in A_i \cap B_j$. But then $k$ would be in the cycle given by $A_i$ and in the cycle given by $B_j$, implying $A_i = \{\pi^l(k) : l \in \mathbb{N}\} = B_j$, in contradiction to $A_i \neq B_j$.

(b): We first show that every $\pi \in S_n$ is a composition of finitely many transpositions (not necessarily transpositions from the set $T$): According to (a), it suffices to show that every cycle is a composition of finitely many transpositions. Since each 1-cycle is the identity, it is $(i) = \mathrm{Id} = (1\;2)(1\;2)$ for each $i \in \{1, \dots, n\}$. If $(i_1 \dots i_k)$ is a $k$-cycle, $k \in \{2, \dots, n\}$, then

\[ (i_1 \dots i_k) = (i_1 \; i_2)(i_2 \; i_3) \cdots (i_{k-1} \; i_k): \tag{B.25} \]

Indeed,

\[ \forall_{i \in \{1, \dots, n\}} \quad (i_1 \; i_2)(i_2 \; i_3) \cdots (i_{k-1} \; i_k)(i) = \begin{cases} i_1 & \text{for } i = i_k, \\ i_{l+1} & \text{for } i = i_l,\ l \in \{1, \dots, k-1\}, \\ i & \text{for } i \notin \{i_1, \dots, i_k\}, \end{cases} \tag{B.26} \]

proving (B.25). To finish the proof of (b), we observe that every transposition is a composition of finitely many elements of $T$: If $i, j \in \{1, \dots, n\}$, $i < j$, then

\[ (i \; j) = (i \;\, i+1) \cdots (j-2 \;\, j-1)(j-1 \;\, j) \cdots (i+1 \;\, i+2)(i \;\, i+1): \tag{B.27} \]

Indeed,

\[ \forall_{k \in \{1, \dots, n\}} \quad (i \;\, i+1) \cdots (j-2 \;\, j-1)(j-1 \;\, j) \cdots (i+1 \;\, i+2)(i \;\, i+1)(k) = \begin{cases} j & \text{for } k = i, \\ i & \text{for } k = j, \\ k & \text{for } i < k < j, \\ k & \text{for } k \notin \{i, i+1, \dots, j\}, \end{cases} \tag{B.28} \]

proving (B.27). ∎
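Both parts of Th. B.7 are constructive and can be illustrated in code. In the Python sketch below, a permutation π of {1, ..., n} is stored as the list [π(1), ..., π(n)]. The decomposition into elements of T is obtained by bubble sort, which is a different route than the one via (B.25) and (B.27) in the proof. The function names are hypothetical, and the identity yields the empty list rather than (1 2)(1 2).

```python
from itertools import permutations


def cycles(p):
    """Disjoint cycles of p (1-cycles included), p = [pi(1), ..., pi(n)]."""
    seen, result = set(), []
    for i in range(1, len(p) + 1):
        if i in seen:
            continue
        cyc, j = [], i
        while j not in seen:        # follow i, pi(i), pi(pi(i)), ...
            seen.add(j)
            cyc.append(j)
            j = p[j - 1]
        result.append(tuple(cyc))
    return result


def adjacent_transpositions(p):
    """Indices [l_1, ..., l_N] with pi = tau_{l_N} o ... o tau_{l_1}."""
    p, swaps = list(p), []
    changed = True
    while changed:                  # bubble sort, recording each swap
        changed = False
        for l in range(1, len(p)):
            if p[l - 1] > p[l]:
                p[l - 1], p[l] = p[l], p[l - 1]
                swaps.append(l)     # swap of positions l and l+1
                changed = True
    return swaps


def compose_adjacent(swaps, n):
    """Evaluate tau_{l_N} o ... o tau_{l_1}, where swaps = [l_1, ..., l_N]."""
    result = []
    for i in range(1, n + 1):
        for l in swaps:             # tau_{l_1} is applied first
            if i == l:
                i = l + 1
            elif i == l + 1:
                i = l
        result.append(i)
    return result


round_trip_ok = all(
    compose_adjacent(adjacent_transpositions(p), 4) == list(p)
    for p in permutations(range(1, 5))
)
```

Each recorded swap corresponds to composing π with one τ_l from the right until the identity is reached; since every τ_l is its own inverse, reading the swaps in the recorded order gives the factors τ_{l_1}, ..., τ_{l_N} of (B.23).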

Proof of Th. B.5. For $n = 1$, there is nothing to prove. So let $n > 1$. For $l \in \{1, \dots, n-1\}$, let $\tau_l : \{1, \dots, n\} \to \{1, \dots, n\}$ be the transposition that interchanges $l$ and $l + 1$ and leaves all other elements fixed (i.e. $\tau_l(l) = l + 1$, $\tau_l(l+1) = l$, $\tau_l(\alpha) = \alpha$ for each $\alpha \in \{1, \dots, n\} \setminus \{l, l+1\}$) and let $T := \{\tau_1, \dots, \tau_{n-1}\}$. Then (B.17) and Th. B.3 imply that the theorem holds for $\pi = \tau$ for each $\tau \in T$. For a general permutation $\pi \in S_n$, Th. B.7(b) provides a finite sequence $(\tau^1, \dots, \tau^N)$, $N \in \mathbb{N}$, of elements of $T$ such that $\pi = \tau^N \circ \dots \circ \tau^1$. Thus, as we already know that the theorem holds for $N = 1$, the case $N > 1$ follows by induction. ∎

The following example shows that, if the composition is not associative, then, in general,(B.17) does not imply (B.18):

Example B.8. Let $A := \{a, b, c\}$ with $\#A = 3$ (i.e. the elements $a, b, c$ are all distinct). Let the composition $\cdot$ on $A$ be defined according to the composition table

\[ \begin{array}{c|ccc} \cdot & a & b & c \\ \hline a & b & b & b \\ b & b & b & a \\ c & b & a & a \end{array} \]

(the table entry at the intersection of the row labeled with factor $x$ and the column labeled with factor $y$ is the definition of $x \cdot y$). Then, clearly, $\cdot$ is commutative. However, $\cdot$ is not associative, since, e.g.,

\[ (cb)a = aa = b \neq a = cb = c(ba), \tag{B.29} \]

and (B.18) does not hold, since, e.g.,

\[ a(bc) = aa = b \neq a = cb = c(ba). \tag{B.30} \]
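The claims of Example B.8 can be confirmed by exhausting the composition table. The Python sketch below encodes the table as a dictionary; the names `table` and `op` are choices made here.

```python
# Composition table of Example B.8: table[(x, y)] is x · y.
table = {
    ("a", "a"): "b", ("a", "b"): "b", ("a", "c"): "b",
    ("b", "a"): "b", ("b", "b"): "b", ("b", "c"): "a",
    ("c", "a"): "b", ("c", "b"): "a", ("c", "c"): "a",
}
E = "abc"


def op(x, y):
    return table[(x, y)]


commutative = all(op(x, y) == op(y, x) for x in E for y in E)
associative = all(
    op(op(x, y), z) == op(x, op(y, z)) for x in E for y in E for z in E
)
lhs = op(op("c", "b"), "a")   # (cb)a
rhs = op("c", op("b", "a"))   # c(ba)
```

The composition is commutative but not associative, and `lhs` and `rhs` are the two sides of (B.29).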

C Algebraic Structures

C.1 Groups

Definition C.1. Let G be a nonempty set with a map

◦ : G×G −→ G, (x, y) 7→ x ◦ y (C.1)

(called a composition on G; the examples we have in mind are addition and multiplication on R). Then (G, ◦) (or just G, if the composition ◦ is understood) is called a group if, and only if, the following three conditions are satisfied:

(i) Associativity: x ◦ (y ◦ z) = (x ◦ y) ◦ z holds for all x, y, z ∈ G.

(ii) There exists a neutral element e ∈ G, i.e. an element e ∈ G such that

∀x∈G

x ◦ e = x. (C.2)

(iii) For each $x \in G$, there exists an inverse element $\overline{x} \in G$, i.e. an element $\overline{x} \in G$ such that

\[ x \circ \overline{x} = e. \]

G is called a commutative or abelian group if, and only if, it is a group and satisfies the additional condition:

(iv) Commutativity: x ◦ y = y ◦ x holds for all x, y ∈ G.

Theorem C.2. The following statements and rules are valid in every group (G, ◦):

(a) If (C.2) holds for $e \in G$, then

\[ \forall_{x \in G} \quad e \circ x = x \tag{C.3} \]

also holds.


(b) The neutral element is unique: If $e, f \in G$, then

\[ \Bigl( \bigl( \forall_{x \in G} \; x \circ e = x \bigr) \wedge \bigl( \forall_{x \in G} \; x \circ f = x \bigr) \Bigr) \;\Rightarrow\; e = f. \tag{C.4} \]

(c) If $x, a \in G$ and $x \circ a = e$ (where $e \in G$ is the neutral element), then $a \circ x = e$ as well. Moreover, inverse elements are unique (for each $x \in G$, the unique inverse is then denoted by $x^{-1}$).

(d) $(x^{-1})^{-1} = x$ holds for each $x \in G$.

(e) $y^{-1} \circ x^{-1} = (x \circ y)^{-1}$ holds for each $x, y \in G$.

(f) x ◦ a = y ◦ a ⇒ x = y holds for each x, y, a ∈ G.

Proof. (a): Let $x \in G$. By Def. C.1(iii), there exists $y \in G$ such that $x \circ y = e$ and, in turn, $z \in G$ such that $y \circ z = e$. Thus,

\[ e \circ z = (x \circ y) \circ z = x \circ (y \circ z) = x \circ e = x, \tag{C.5} \]

implying

\[ x = e \circ z = (e \circ e) \circ z = e \circ (e \circ z) = e \circ x \tag{C.6} \]

as desired.

(b): If e, f are both neutral elements, then, using (a), f = e ◦ f = e.

(c): Assume x ◦ a = e. Then there is b such that a ◦ b = e. One computes

\[ e = a \circ b = (a \circ e) \circ b = (a \circ (x \circ a)) \circ b = a \circ ((x \circ a) \circ b) = a \circ (x \circ (a \circ b)) = a \circ (x \circ e) = a \circ x, \tag{C.7} \]

establishing the case. Now let $a, b$ be inverses to $x$. Then $a = a \circ e = a \circ x \circ b = e \circ b = b$.

(d): $x^{-1} \circ x = e$ holds according to (c) and shows that $x$ is the inverse to $x^{-1}$. Thus, $(x^{-1})^{-1} = x$ as claimed.

(e) is due to $y^{-1} \circ x^{-1} \circ x \circ y = y^{-1} \circ e \circ y = e$.

(f): If $x \circ a = y \circ a$, then $x = x \circ a \circ a^{-1} = y \circ a \circ a^{-1} = y$ as claimed. ∎

Definition C.3. Let (G, ◦) and (H, ◦) be groups. A map φ : G −→ H is called a (group) homomorphism if, and only if,

∀a,b∈G

φ(a ◦ b) = φ(a) ◦ φ(b). (C.8)

Proposition C.4. Let (G, ◦) and (H, ◦) be groups and let φ : G −→ H be a group homomorphism. Let e, e′ denote the neutral elements of G and H, respectively. Then the following holds:

(a) φ(e) = e′.


(b) $\phi(a^{-1}) = (\phi(a))^{-1}$ for each $a \in G$.

(c) If $\phi$ is bijective, then $\phi^{-1}$ is also a group homomorphism.

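Proof. (a): We compute

\[ \phi(e) \circ e' = \phi(e) = \phi(e \circ e) = \phi(e) \circ \phi(e). \]

Applying $(\phi(e))^{-1}$ to both sides of the above equality then proves $\phi(e) = e'$.

(b): We compute

\[ \phi(a^{-1}) \circ \phi(a) = \phi(a^{-1} \circ a) = \phi(e) = e', \]

proving (b).

(c): Applying $\phi^{-1}$ to (C.8) yields

\[ \forall_{a, b \in G} \quad a \circ b = \phi^{-1}\bigl(\phi(a) \circ \phi(b)\bigr). \tag{C.9} \]

Thus, for each $x, y \in H$, we obtain

\[ \phi^{-1}(x \circ y) = \phi^{-1}\Bigl( \phi\bigl(\phi^{-1}(x)\bigr) \circ \phi\bigl(\phi^{-1}(y)\bigr) \Bigr) \overset{(C.9)}{=} \phi^{-1}(x) \circ \phi^{-1}(y), \]

establishing the case and completing the proof of the proposition. ∎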

Notation C.5. Exponentiation with Integer Exponents: Let $G$ be a nonempty set with a composition $\cdot : G \times G \to G$. Assume there exists a (unique) neutral element $1 \in G$ (satisfying $x \cdot 1 = x$ for each $x \in G$). Define recursively for each $x \in G$ and each $n \in \mathbb{N}_0$:

\[ x^0 := 1, \quad \forall_{n \in \mathbb{N}_0} \;\; x^{n+1} := x \cdot x^n. \tag{C.10a} \]

Moreover, if $(G, \cdot)$ constitutes a group, then also define for each $x \in G$ and each $n \in \mathbb{N}$:

\[ x^{-n} := (x^{-1})^n. \tag{C.10b} \]

Theorem C.6. Exponentiation Rules: Let $G$ be a nonempty set with a composition $\cdot : G \times G \to G$. Assume that the composition satisfies the law of associativity and that there exists a (unique) neutral element $1 \in G$ (satisfying $x \cdot 1 = x$ for each $x \in G$). Let $x, y \in G$. Then the following rules hold for each $m, n \in \mathbb{N}_0$. If $(G, \cdot)$ is a group, then the rules even hold for every $m, n \in \mathbb{Z}$.

(a) $x^{m+n} = x^m \cdot x^n$.

(b) $(x^m)^n = x^{mn}$.

(c) If the composition is commutative (i.e. $xy = yx$ for each $x, y \in G$), then it holds that $x^n y^n = (xy)^n$.

Proof. (a): First, we fix $n \in \mathbb{N}_0$ and prove the statement for each $m \in \mathbb{N}_0$ by induction: The base case ($m = 0$) is $x^n = x^n$, which is true. For the induction step, we compute

\[ x^{m+1+n} \overset{(C.10a)}{=} x \cdot x^{m+n} \overset{\text{ind. hyp.}}{=} x \cdot x^m \cdot x^n \overset{(C.10a)}{=} x^{m+1} x^n, \]

completing the induction step. Now assume $G$ to be a group. Consider $m \ge 0$ and $n < 0$. If $m + n \ge 0$, then, using what we have already shown,

\[ x^m x^n \overset{(C.10b)}{=} x^m (x^{-1})^{-n} = x^{m+n} x^{-n} (x^{-1})^{-n} = x^{m+n}. \]

Similarly, if $m + n < 0$, then

\[ x^m x^n \overset{(C.10b)}{=} x^m (x^{-1})^{-n} = x^m (x^{-1})^m (x^{-1})^{-n-m} \overset{(C.10b)}{=} x^{m+n}. \]

The case $m < 0$, $n \ge 0$ is treated completely analogously. It just remains to consider $m < 0$ and $n < 0$. In this case,

\[ x^{m+n} = x^{-(-m-n)} \overset{(C.10b)}{=} (x^{-1})^{-m-n} = (x^{-1})^{-m} \cdot (x^{-1})^{-n} \overset{(C.10b)}{=} x^m \cdot x^n. \]

(b): First, we prove the statement for each $n \in \mathbb{N}_0$ by induction (for $m < 0$, we assume $G$ to be a group): The base case ($n = 0$) is $(x^m)^0 = 1 = x^0$, which is true. For the induction step, we compute

\[ (x^m)^{n+1} \overset{(C.10a)}{=} x^m \cdot (x^m)^n \overset{\text{ind. hyp.}}{=} x^m \cdot x^{mn} \overset{(a)}{=} x^{mn+m} = x^{m(n+1)}, \]

completing the induction step. Now, let $G$ be a group and $n < 0$. We already know $(x^m)^{-1} = x^{-m}$. Thus, using what we have already shown,

\[ (x^m)^n \overset{(C.10b)}{=} \bigl( (x^m)^{-1} \bigr)^{-n} = (x^{-m})^{-n} = x^{(-m)(-n)} = x^{mn}. \]

(c): For $n \in \mathbb{N}_0$, the statement is proved by induction: The base case ($n = 0$) is $x^0 y^0 = 1 = (xy)^0$, which is true. For the induction step, we compute

\[ x^{n+1} y^{n+1} \overset{(C.10a)}{=} x \cdot x^n \cdot y \cdot y^n \overset{\text{ind. hyp.}}{=} xy \cdot (xy)^n \overset{(C.10a)}{=} (xy)^{n+1}, \]

completing the induction step. If $G$ is a group and $n < 0$, then, using what we have already shown,

\[ x^n y^n \overset{(C.10b)}{=} (x^{-1})^{-n} (y^{-1})^{-n} = (x^{-1} y^{-1})^{-n} \overset{\text{Th. C.2(e)}}{=} \bigl( (xy)^{-1} \bigr)^{-n} \overset{(C.10b)}{=} (xy)^n, \]

which completes the proof. ∎
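The recursive definitions (C.10a) and (C.10b) and the rules of Th. C.6 can be tested in a concrete commutative group. The Python sketch below uses the nonzero residues modulo the prime 7 under multiplication. This choice of group, and the computation of inverses as x^(P−2) mod P via Fermat's little theorem, are assumptions made here and not part of the text.

```python
P = 7  # the nonzero residues mod 7 form a commutative group under multiplication


def mul(x, y):
    return (x * y) % P


def inv(x):
    # Inverse via Fermat's little theorem (an assumption outside the notes).
    return pow(x, P - 2, P)


def power(x, n):
    """x**n for integer n, following (C.10a) and (C.10b)."""
    if n < 0:
        return power(inv(x), -n)       # (C.10b): x^(-n) := (x^(-1))^n
    if n == 0:
        return 1                       # (C.10a): x^0 := 1
    return mul(x, power(x, n - 1))     # (C.10a): x^(n+1) := x * x^n


G = range(1, P)
exps = range(-6, 7)
rule_a = all(power(x, m + n) == mul(power(x, m), power(x, n))
             for x in G for m in exps for n in exps)
rule_b = all(power(power(x, m), n) == power(x, m * n)
             for x in G for m in exps for n in exps)
rule_c = all(mul(power(x, n), power(y, n)) == power(mul(x, y), n)
             for x in G for y in G for n in exps)
```

The three flags correspond to rules (a), (b), and (c), checked for all exponents between −6 and 6.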


C.2 Rings

Definition C.7. Let R be a nonempty set with two maps

+ : R×R −→ R, (x, y) 7→ x+ y,

· : R×R −→ R, (x, y) 7→ x · y (C.11)

(+ is called addition and · is called multiplication; often one writes xy instead of x · y). Then (R, +, ·) (or just R, if + and · are understood) is called a ring if, and only if, the following three conditions are satisfied:

(i) R is a commutative group with respect to +.

(ii) Multiplication is associative.

(iii) Distributivity:

∀x,y,z∈R

x · (y + z) = x · y + x · z, (C.12a)

∀x,y,z∈R

(y + z) · x = y · x+ z · x. (C.12b)

A ring R is called commutative if, and only if, its multiplication is commutative. Moreover, a ring is called a ring with unity if, and only if, R contains a neutral element with respect to multiplication (i.e. there is 1 ∈ R such that 1 · x = x · 1 = x for each x ∈ R); some authors always require a ring to have a neutral element with respect to multiplication.

Theorem C.8. The following statements and rules are valid in every ring (R,+, ·) (let0 denote the additive neutral element and let x, y, z ∈ R):

(a) x · 0 = 0 = 0 · x.

(b) x(−y) = −(xy) = (−x)y.

(c) (−x)(−y) = xy.

(d) x(y − z) = xy − xz.

Proof. (a): One computes

\[ x \cdot 0 + x \cdot 0 \overset{(C.12a)}{=} x \cdot (0 + 0) = x \cdot 0 = 0 + x \cdot 0, \]

i.e. $x \cdot 0 = 0$ follows since $(R, +)$ is a group. The second equality follows analogously using (C.12b).

(b): $xy + x(-y) = x(y - y) = x \cdot 0 = 0$, where we used (C.12a) and (a). This shows $x(-y)$ is the additive inverse to $xy$. The second equality follows analogously using (C.12b).

(c): $xy = -(-(xy)) = -(x(-y)) = (-x)(-y)$, where (b) was used twice.

(d): $x(y - z) = x(y + (-z)) = xy + x(-z) = xy - xz$. ∎


C.3 Fields

Definition C.9. Let (F, +, ·) be a ring with unity. Then (F, +, ·) (or just F, if + and · are understood) is called a field if, and only if, F \ {0} is a commutative group with respect to ·.

Theorem C.10. The following statements and rules are valid in every field (F,+, ·):

(a) Inverse elements are unique. For each $x \in F$, the unique inverse with respect to addition is denoted by $-x$. Also define $y - x := y + (-x)$. For each $x \in F \setminus \{0\}$, the unique inverse with respect to multiplication is denoted by $x^{-1}$. For $x \neq 0$, define the fractions $\frac{y}{x} := y/x := y x^{-1}$ with numerator $y$ and denominator $x$.

(b) $-(-x) = x$ and $(x^{-1})^{-1} = x$ for $x \neq 0$.

(c) $(-x) + (-y) = -(x + y)$ and $x^{-1} y^{-1} = (xy)^{-1}$ for $x, y \neq 0$.

(d) $x + a = y + a \Rightarrow x = y$ and, for $a \neq 0$, $xa = ya \Rightarrow x = y$.

(e) x · 0 = 0.

(f) x(−y) = −(xy).

(g) (−x)(−y) = xy.

(h) x(y − z) = xy − xz.

(i) xy = 0 ⇒ x = 0 ∨ y = 0.

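(j) Rules for Fractions:

\[ \frac{a}{c} + \frac{b}{d} = \frac{ad + bc}{cd}, \qquad \frac{a}{c} \cdot \frac{b}{d} = \frac{ab}{cd}, \qquad \frac{a/c}{b/d} = \frac{ad}{bc}, \]

where all denominators are assumed $\neq 0$.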

Proof. (a) follows by applying Th. C.2(c) to the groups $(F, +)$ and $(F \setminus \{0\}, \cdot)$.

(b) follows by applying Th. C.2(d) to the groups $(F, +)$ and $(F \setminus \{0\}, \cdot)$.

(c) follows by applying Th. C.2(e) to the groups $(F, +)$ and $(F \setminus \{0\}, \cdot)$, plus then using commutativity of the groups.

(d) follows by applying Th. C.2(f) to the groups $(F, +)$ and $(F \setminus \{0\}, \cdot)$ (in the latter situation, the case $x = y = 0$ is also clear).

(e) follows by applying Th. C.8(a) to the ring with unity $(F, +, \cdot)$.

(f) follows by applying Th. C.8(b) to the ring with unity $(F, +, \cdot)$.

(g) follows by applying Th. C.8(c) to the ring with unity $(F, +, \cdot)$.

(h) follows by applying Th. C.8(d) to the ring with unity $(F, +, \cdot)$.

(i): If $xy = 0$ and $x \neq 0$, then $y = 1 \cdot y = x^{-1} x y = x^{-1} \cdot 0 = 0$.

(j): One computes

\[ \frac{a}{c} + \frac{b}{d} = a c^{-1} + b d^{-1} = a d d^{-1} c^{-1} + b c c^{-1} d^{-1} = (ad + bc)(cd)^{-1} = \frac{ad + bc}{cd} \]

and

\[ \frac{a}{c} \cdot \frac{b}{d} = a c^{-1} b d^{-1} = ab (cd)^{-1} = \frac{ab}{cd} \]

and

\[ \frac{a/c}{b/d} = a c^{-1} (b d^{-1})^{-1} = a c^{-1} b^{-1} d = ad (bc)^{-1} = \frac{ad}{bc}. \]

∎


D Construction of the Real Numbers

In Th. 4.4, we have defined the set of real numbers R as a complete totally ordered field and we claimed that such a complete totally ordered field does actually exist. In the following, we will describe how R can be constructed. We will follow [EHH+95, Chs. 1, 2], which contains several different approaches for the construction of R.

D.1 Natural Numbers

In the first step, one starts with the natural numbers N. The set of natural numbers N was defined in Def. A.41 and it was shown in Th. A.44 that N satisfies the Peano axioms P1 – P3 of Sec. 3.1. We denote natural numbers using the usual symbols 0 := ∅, 1 := S(0) = {0}, 2 := S(1) = {0, 1}, 3 := S(2) = {0, 1, 2}, . . . , n + 1 := S(n) = n ∪ {n} = {0, 1, . . . , n} (which is consistent with previous definitions in (A.5) and Not. A.45).

Theorem 3.7 allows us to define addition and multiplication on N0 via recursion:

Definition D.1. (a) For each m, n ∈ N0, m + n is defined recursively by

\[ m + 0 := m, \qquad m + 1 := S(m), \qquad \forall_{n \in \mathbb{N}} \; m + S(n) := S(m + n). \tag{D.1} \]

This fits into the framework of Th. 3.7, using A := N0, $x_1 := S(m)$, and, for each n ∈ N, $f_n : A^n \to A$, $f_n(x_1, \dots, x_n) := S(x_n)$ (due to the different initializations, one obtains a different recursion for each m ∈ N0).

(b) For each m, n ∈ N0, mn := m · n is defined recursively by

\[ m \cdot 0 := 0, \qquad m \cdot 1 := m, \qquad \forall_{n \in \mathbb{N}} \; m \cdot (n + 1) := m \cdot n + m. \tag{D.2} \]

This fits into the framework of Th. 3.7, using A := N0, $x_1 := m$, and, for each m, n ∈ N, $f_{m,n} : A^n \to A$, $f_{m,n}(x_1, \dots, x_n) := x_n + m$.
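The recursions (D.1) and (D.2) translate almost literally into code. The following is a minimal sketch, assuming that Python's built-in integers stand in for the set-theoretic natural numbers and that `n - 1` plays the role of the predecessor of a nonzero n; the function names `S`, `add`, and `mul` are chosen here for illustration.

```python
def S(n: int) -> int:
    """Successor map; Python ints stand in for the set-theoretic naturals."""
    return n + 1


def add(m: int, n: int) -> int:
    """m + n via (D.1): m + 0 = m and m + S(n) = S(m + n)."""
    if n == 0:
        return m
    return S(add(m, n - 1))  # n - 1 is the predecessor of n


def mul(m: int, n: int) -> int:
    """m * n via (D.2): m * 0 = 0 and m * (n + 1) = m * n + m."""
    if n == 0:
        return 0
    return add(mul(m, n - 1), m)


print(add(2, 3), mul(3, 4))
```

Only the successor map and recursion are used, so properties such as commutativity are not built in; they have to be proved, which is the content of Th. D.2.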


Theorem D.2. The set N0 of the natural numbers (including 0) with the maps of addition and multiplication

+ : N0 × N0 −→ N0, (x, y) ↦ x + y,

· : N0 × N0 −→ N0, (x, y) ↦ x · y,

as defined in Def. D.1(a) and Def. D.1(b), respectively, satisfies Def. C.1(i),(ii),(iv) for both addition and multiplication, i.e. associativity, commutativity, and the existence of a neutral element. This can be summarized as the statement that N0 forms a commutative semigroup with respect to both addition and multiplication (however, no group, as the existence of inverse elements is lacking). Moreover, distributivity, i.e. Def. C.7(iii), is also satisfied.

Proof. Associativity of Addition: We have to show

\[ \forall_{k,m,n \in \mathbb{N}_0} \; (k + m) + n = k + (m + n). \tag{D.3a} \]

The proof of (D.3a) is carried out by induction on n. The base case (n = 0) follows from the first definition in (D.1): (k + m) + 0 = k + m = k + (m + 0) for every k, m ∈ N0. For the induction step, one computes, for every k, m, n ∈ N0,

\begin{align*}
(k + m) + (n + 1) &\overset{(D.1)}{=} (k + m) + S(n) \overset{(D.1)}{=} S((k + m) + n) \overset{\text{ind. hyp.}}{=} S(k + (m + n)) \\
&\overset{(D.1)}{=} k + S(m + n) \overset{(D.1)}{=} k + (m + S(n)) \overset{(D.1)}{=} k + (m + (n + 1)), \tag{D.3b}
\end{align*}

completing the induction.

Neutral Element of Addition: That 0 is the neutral element of addition is immediate from (D.1).

Commutativity of Addition: We have to show

\[ \forall_{m,n \in \mathbb{N}_0} \; m + n = n + m. \tag{D.4a} \]

The proof of (D.4a) is also carried out by induction on n. More precisely, we prove n = 0 separately, and then carry out the induction for n ∈ N. The case n = 0 is proved by induction on m: The base case (m = 0) is the true statement 0 + 0 = 0 = 0 + 0. For the induction step, one computes (m + 1) + 0 = m + 1 = S(m) = S(m + 0) = S(0 + m) = 0 + S(m) = 0 + (m + 1). The base case for the induction on n, i.e. n = 1, is also proved by induction on m: The base case (m = 0) is the true statement 0 + 1 = S(0) = 1 = 1 + 0. For the induction step, one computes, for every m ∈ N0,

\[ (m + 1) + 1 \overset{(D.1)}{=} S(m + 1) \overset{\text{ind. hyp.}}{=} S(1 + m) \overset{(D.1)}{=} (1 + m) + 1 \overset{(D.3a)}{=} 1 + (m + 1). \tag{D.4b} \]

Now, for the induction step of the induction on n, one computes, for every (m, n) ∈ N0 × N,

\begin{align*}
m + (n + 1) &\overset{(D.1)}{=} m + S(n) \overset{(D.1)}{=} S(m + n) \overset{\text{ind. hyp.}}{=} S(n + m) \overset{(D.1)}{=} n + S(m) \\
&\overset{(D.1)}{=} n + (m + 1) \overset{\text{base case}}{=} n + (1 + m) \overset{(D.3a)}{=} (n + 1) + m, \tag{D.4c}
\end{align*}

completing the induction.


Neutral Element of Multiplication: That 1 is the neutral element of multiplication is immediate from (D.2).

Commutativity of Multiplication: We have to show

\[ \forall_{m,n \in \mathbb{N}_0} \; m \cdot n = n \cdot m. \tag{D.5a} \]

We start with some preparatory steps: We first show

\[ \forall_{m \in \mathbb{N}_0} \; m = m \cdot 1 = 1 \cdot m. \tag{D.5b} \]

We have m · 1 = m for each m ∈ N0 directly from (D.2). We prove 1 · m = m for each m ∈ N0 via induction on m: 1 · 0 = 0 and 1 · 1 = 1 are immediate from (D.2). For the induction step, one computes, for every m ∈ N,

\[ 1 \cdot (m + 1) \overset{(D.2)}{=} 1 \cdot m + 1 \overset{\text{ind. hyp.}}{=} m + 1. \]

Next, we show

\[ \forall_{m,n \in \mathbb{N}_0} \; n \cdot m + m = (n + 1) \cdot m \tag{D.5c} \]

via induction on m. For the base case (m = 0), we note (n + 1) · 0 = 0 by (D.2), and n · 0 + 0 = 0 by (D.1) and (D.2). For the induction step, we compute

\begin{align*}
n \cdot (m + 1) + m + 1 &\overset{(D.2)}{=} n \cdot m + n + m + 1 \overset{(D.4a)}{=} n \cdot m + m + n + 1 \\
&\overset{\text{ind. hyp.}}{=} (n + 1) \cdot m + n + 1 \overset{(D.2)}{=} (n + 1) \cdot (m + 1).
\end{align*}

We are now in a position to carry out the proof of (D.5a) by induction on n. More precisely, we prove n = 0 separately, and then carry out the induction for n ∈ N. Let n = 0. We have m · 0 = 0 for each m ∈ N0 directly from (D.2). We prove 0 · m = 0 for each m ∈ N0 via induction on m: 0 · 0 = 0 and 0 · 1 = 0 are immediate from (D.2). For the induction step, one computes, for every m ∈ N,

\[ 0 \cdot (m + 1) \overset{(D.2)}{=} 0 \cdot m + 0 \overset{\text{ind. hyp., (D.1)}}{=} 0. \]

The base case for the induction on n ∈ N is provided by (D.5b). For the induction step, one computes, for every (m, n) ∈ N0 × N,

\[ m \cdot (n + 1) \overset{(D.2)}{=} m \cdot n + m \overset{\text{ind. hyp.}}{=} n \cdot m + m \overset{(D.5c)}{=} (n + 1) \cdot m, \]

completing the proof of (D.5a).

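Distributivity: As we have commutativity of multiplication, we only need to show

\[ \forall_{k,m,n \in \mathbb{N}_0} \; (k + m) \cdot n = k \cdot n + m \cdot n. \tag{D.6a} \]

The proof of (D.6a) is carried out by induction on n. The base case (n = 0) follows from (D.2): (k + m) · 0 = 0 = k · 0 + m · 0. For the induction step, one computes, for every k, m, n ∈ N0,

\begin{align*}
(k + m) \cdot (n + 1) &\overset{(D.2)}{=} (k + m) \cdot n + k + m \overset{\text{ind. hyp.}}{=} k \cdot n + m \cdot n + k + m \\
&\overset{(D.4a)}{=} k \cdot n + k + m \cdot n + m \overset{(D.2)}{=} k \cdot (n + 1) + m \cdot (n + 1), \tag{D.6b}
\end{align*}

completing the induction.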


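Associativity of Multiplication: We have to show

\[ \forall_{k,m,n \in \mathbb{N}_0} \; (k \cdot m) \cdot n = k \cdot (m \cdot n). \tag{D.7a} \]

The proof of (D.7a) is carried out by induction on n. The base case (n = 0) follows from (D.2): (k · m) · 0 = 0 = k · (m · 0) for every k, m ∈ N0. For the induction step, one computes, for every k, m, n ∈ N0,

\begin{align*}
(k \cdot m) \cdot (n + 1) &\overset{(D.2)}{=} (k \cdot m) \cdot n + k \cdot m \overset{\text{ind. hyp.}}{=} k \cdot (m \cdot n) + k \cdot m \\
&\overset{(D.6a)}{=} k \cdot (m \cdot n + m) \overset{(D.2)}{=} k \cdot (m \cdot (n + 1)), \tag{D.7b}
\end{align*}

completing the induction. □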


Lemma D.3. We have

\[ \forall_{m \in \mathbb{N}_0} \; \forall_{n \in \mathbb{N}} \; m + n \neq m. \tag{D.8} \]

Proof. We fix n ∈ N and conduct an induction over m ∈ N0. The base case (m = 0) is clear, since 0 + n = n ≠ 0. The induction step is also clear, since m + n ≠ m implies m + 1 + n = m + n + 1 = S(m + n) ≠ S(m) = m + 1, where we used that S is injective by Peano axiom P2. □

Next, one defines an order ≤ on N0:

Definition D.4. For each n, m ∈ N0, let

\[ n \leq m \;:\Leftrightarrow\; \exists_{k \in \mathbb{N}_0} \; n + k = m. \tag{D.9} \]

Theorem D.5. The relation defined in (D.9) constitutes a well-order on N0 (in particular, a total order) that is compatible with addition and multiplication, i.e. it satisfies (4.3).

Proof. ≤ is Reflexive: For each n ∈ N0, we have n+ 0 = n, showing n ≤ n.

≤ is Antisymmetric: If m ≤ n and n ≤ m, then there are k, l ∈ N0 such that m + k = n and n + l = m. Thus, n = m + k = n + l + k. Thus, l + k = 0 by Lem. D.3, implying l = k = 0 (since 0 ∉ S(N0)) and m = n.

≤ is Transitive: If n ≤ m and m ≤ l, then there are $k_n, k_m$ ∈ N0 such that $n + k_n = m$ and $m + k_m = l$. Then $n + k_n + k_m = m + k_m = l$, showing n ≤ l.

≤ is Total Order: We have to show that, if m, n ∈ N0, then m ≤ n or n ≤ m. To this end, fix n ∈ N0 and conduct an induction over m. If m = 0, then m + n = n, showing m ≤ n. For the induction step, let m ∈ N0. If m ≤ n, then m + k = n for some k ∈ N0. If k = 0, then n ≤ n + 1 = m + 1. If k ≠ 0, then k = S(l) for some l ∈ N0, i.e. n = m + l + 1 = m + 1 + l, showing m + 1 ≤ n. If n ≤ m, then n ≤ m + 1 is immediate, completing the induction.

≤ is Well-Order: We first show that, for each n ∈ N0, the set $A_n := \{m \in \mathbb{N}_0 : m \leq n\}$ is finite: Indeed, this follows by an induction over n if we can show $A_{n+1} = A_n \cup \{n + 1\}$. If m ≤ n, then m ≤ n ≤ n + 1, i.e. m ≤ n + 1 by transitivity, showing $A_n \cup \{n + 1\} \subseteq A_{n+1}$. If m ≤ n + 1 and m ≰ n, then n < m, i.e. n + k = m with k ≠ 0, i.e. n + 1 + l = m, showing n + 1 ≤ m, i.e. m = n + 1 and $A_{n+1} \subseteq A_n \cup \{n + 1\}$. One now finishes the proof that ≤ is a well-order as in the proof of Th. 3.13(b): Let ∅ ≠ A ⊆ N0. We have to show A has a min. If A is finite, then A has a min by Th. 3.13(a). If A is infinite, let n be an element from A. Then the finite set $B := \{k \in A : k \leq n\} = A_n \cap A$ must have a min m by Th. 3.13(a). Since m ≤ x for each x ∈ B and m ≤ n < x for each x ∈ A \ B, we have m = min A.

Compatibility with Addition: We have to show

\[ \forall_{k,m,n \in \mathbb{N}_0} \; \big( k \leq m \;\Rightarrow\; k + n \leq m + n \big). \tag{D.10} \]

To this end, note that k ≤ m means that there is l ∈ N0 such that k + l = m. But then k + n + l = k + l + n = m + n, showing k + n ≤ m + n.

Compatibility with Multiplication: As 0 ≤ n holds for every n ∈ N0, there is nothing to prove. □

Lemma D.6. Let a, n ∈ N. Then the following holds:

(a) a+ n ∈ N.

(b) a · n ∈ N.

Proof. (a): Since a + n = 0 + (a + n), a + n ≠ 0 is due to Lem. D.3.

(b): We conduct the proof via induction on n: For n = 1, a · 1 = a ∈ N by hypothesis. For n ≥ 1, a · (n + 1) = a · n + a · 1 = a · n + a. Since a · n ∈ N by induction hypothesis, we have a · n + a ∈ N by (a). □

Theorem D.7. (a) Monotonicity: Addition and multiplication on N0 are strictly increasing in the sense that, for each a, b ∈ N0 and each n ∈ N,

a < b ⇒ a+ n < b+ n (D.11a)

a < b ⇒ a · n < b · n. (D.11b)

(b) Cancellation Laws: Given a, b ∈ N0 and n ∈ N, addition and multiplication on N0 satisfy

a+ n = b+ n ⇒ a = b (D.12a)

a · n = b · n ⇒ a = b. (D.12b)

Proof. Let a, b ∈ N0 with a < b. Then there exists k ∈ N with a+ k = b. Let n ∈ N.

(a): We have b + n = a + k + n. Since k + n > 0 by Lem. D.6(a), this shows a + n < b + n. We have b · n = (a + k) · n = a · n + k · n. Since k · n > 0 by Lem. D.6(b), this shows a · n < b · n.

(b): Arguing via contraposition, both laws are immediate from (a). □

D.2 Interlude: Orders on Groups

In the succeeding sections, we will construct the set of integers Z, the set of rational numbers Q, and the set of real numbers R. In each case, we will use the same method to define a total order on the constructed set, making use of the algebraic structure of its additive group. It is therefore economical, as well as mathematically interesting, to study this construction once in its abstract form, which is the purpose of the present section.

Recall the definition of a group from Def. C.1.

Theorem D.8. Assume (G, +) to be a group. Moreover, assume we have a disjoint decomposition

\[ G = P \,\dot\cup\, \{0\} \,\dot\cup\, (-P), \qquad -P := \{x \in G : -x \in P\}, \tag{D.13} \]

where −x denotes the inverse of x with respect to +. If P is closed under + (i.e. x, y ∈ P implies x + y ∈ P), then

\[ y \leq x \;:\Leftrightarrow\; x - y \in P \cup \{0\} \tag{D.14} \]

defines a total order on G that is compatible with addition, i.e. it satisfies (4.3a). Moreover, if a multiplication is also defined on G and P ∪ {0} is closed under this multiplication, then ≤ is also compatible with multiplication, i.e. it satisfies (4.3b). Of course, one refers to the elements of P as positive and to the elements of −P as negative.

Proof. For each x ∈ G, one has x − x = 0 ∈ P ∪ {0}, i.e. x ≤ x and the relation is reflexive. If x, y ∈ G, x ≤ y and y ≤ x, then x − y ∈ P ∪ {0} and −(x − y) = y − x ∈ P ∪ {0}, and the disjointness of the union in (D.13) implies x − y = 0, i.e. x = y, showing the relation is antisymmetric. If x, y, z ∈ G with x ≤ y and y ≤ z, then y − x ∈ P ∪ {0}, z − y ∈ P ∪ {0}, and z − x = z − y + y − x ∈ P ∪ {0} since P is closed under +, showing the relation is transitive. So we have shown ≤ constitutes a partial order on G. It remains to show the order is total. However, given the decomposition in (D.13), for each x, y ∈ G, precisely one of the statements x − y ∈ P (i.e. y < x), x − y = 0 (i.e. x = y), x − y ∈ −P (i.e. x < y) must be true, proving that the order is total. To see ≤ satisfies (4.3a), let x, y, z ∈ G. If x ≤ y, then y − x ∈ P ∪ {0}, i.e. y + z − (x + z) = y + z − z − x ∈ P ∪ {0}, showing x + z ≤ y + z. The proof is completed by noting (4.3b) is precisely the statement that P ∪ {0} is closed under multiplication. □
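The construction in (D.14) can be sketched in code: an order relation is built from nothing but a membership test for P. The following is a minimal sketch, assuming G = (Z, +) modeled by Python integers and P the set of positive integers; the names `make_order` and `in_P` are hypothetical and introduced only for this illustration.

```python
from itertools import product
from typing import Callable


def make_order(in_P: Callable[[int], bool]) -> Callable[[int, int], bool]:
    """Build the relation y <= x  :<=>  x - y in P ∪ {0}, as in (D.14)."""
    def leq(y: int, x: int) -> bool:
        diff = x - y
        return diff == 0 or in_P(diff)
    return leq


# Assumption for illustration: G = (Z, +) with P the positive integers.
leq = make_order(lambda x: x > 0)

sample = range(-3, 4)
# Totality: any two elements are comparable.
total = all(leq(x, y) or leq(y, x) for x, y in product(sample, repeat=2))
# Compatibility with addition, i.e. (4.3a), on the sample.
compatible = all(
    leq(x + z, y + z)
    for x, y, z in product(sample, repeat=3)
    if leq(x, y)
)
print(total, compatible)
```

The same function applies unchanged to any other choice of P satisfying (D.13), which is how the theorem is used for Z, Q, and R in the following sections.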

D.3 Integers

As compared to our goal, the set of real numbers R, the set N0 still has three deficiencies, namely the lack of inverse elements for addition, the lack of inverse elements for multiplication, and that the order ≤ lacks completeness. The construction of the integers will remedy (only) the first of the three deficiencies by providing the inverse elements of addition.

Definition and Remark D.9. The relation ∼ on N0 × N0 defined by

(a, b) ∼ (c, d) :⇔ a+ d = b+ c, (D.15)

constitutes an equivalence relation on N0 × N0 (cf. Def. 2.23): Indeed, if a, b ∈ N0, then a + b = b + a shows (a, b) ∼ (a, b), proving ∼ to be reflexive. If a, b, c, d ∈ N0, then

(a, b) ∼ (c, d) ⇒ a+ d = b+ c ⇒ c+ b = d+ a ⇒ (c, d) ∼ (a, b),

proving ∼ to be symmetric. If a, b, c, d, e, f ∈ N0, then

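\begin{align*}
(a, b) \sim (c, d) \,\wedge\, (c, d) \sim (e, f) &\Rightarrow a + d = b + c \,\wedge\, c + f = d + e \\
&\Rightarrow a + d + c + f = b + c + d + e \overset{(D.12a)}{\Rightarrow} a + f = b + e \Rightarrow (a, b) \sim (e, f),
\end{align*}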

proving ∼ to be transitive and an equivalence relation.

Definition D.10. (a) Define the set of integers Z as the set of equivalence classes of the equivalence relation ∼ defined in (D.15), i.e.

\[ \mathbb{Z} := (\mathbb{N}_0 \times \mathbb{N}_0)/\!\sim \;=\; \big\{ [(a, b)] : (a, b) \in \mathbb{N}_0 \times \mathbb{N}_0 \big\} \tag{D.16} \]

is the quotient set of N0 × N0 with respect to ∼ (cf. Ex. 2.24(c)). To simplify notation, in the following, we will write

\[ [a, b] := [(a, b)] \tag{D.17} \]

for the equivalence class of (a, b) with respect to ∼.

(b) Addition on Z is defined by

\[ + : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}, \qquad \big([a, b], [c, d]\big) \mapsto [a, b] + [c, d] := [a + c, b + d]. \tag{D.18} \]

Subtraction on Z is defined by

\[ - : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}, \qquad \big([a, b], [c, d]\big) \mapsto [a, b] - [c, d] := [a, b] + [d, c]. \tag{D.19} \]

—


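For the definitions in Def. D.10(b) to make sense, one needs to check that they do not depend on the chosen representatives of the equivalence classes. Moreover, one needs to convince oneself that these definitions yield the desired familiar operations of addition and subtraction. Let us start by verifying the independence of the representatives in the following Lem. D.11.

Lemma D.11. The definitions in Def. D.10(b) do not depend on the chosen representatives, i.e.

\[ \forall_{a,b,c,d,\tilde a,\tilde b,\tilde c,\tilde d \in \mathbb{N}_0} \; \Big( [a, b] = [\tilde a, \tilde b] \,\wedge\, [c, d] = [\tilde c, \tilde d] \;\Rightarrow\; [a + c, b + d] = [\tilde a + \tilde c, \tilde b + \tilde d] \Big) \tag{D.20} \]

and

\[ \forall_{a,b,c,d,\tilde a,\tilde b,\tilde c,\tilde d \in \mathbb{N}_0} \; \Big( [a, b] = [\tilde a, \tilde b] \,\wedge\, [c, d] = [\tilde c, \tilde d] \;\Rightarrow\; [a, b] - [c, d] = [\tilde a, \tilde b] - [\tilde c, \tilde d] \Big). \tag{D.21} \]

Proof. (D.20): $[a, b] = [\tilde a, \tilde b]$ means $a + \tilde b = b + \tilde a$, $[c, d] = [\tilde c, \tilde d]$ means $c + \tilde d = d + \tilde c$, implying $a + c + \tilde b + \tilde d = b + \tilde a + d + \tilde c$, i.e. $[a + c, b + d] = [\tilde a + \tilde c, \tilde b + \tilde d]$.

(D.21) is just (D.19) combined with (D.20). □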
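The independence of representatives in (D.20) and (D.21) can also be checked by brute force on a small range. The following is a minimal sketch, assuming pairs of Python integers as representatives of the classes [a, b]; the helper names `equiv`, `add`, and `sub` are introduced here for illustration, and a finite check of this kind illustrates the lemma without replacing its proof.

```python
from itertools import product


def equiv(p, q):
    """(a, b) ~ (c, d)  :<=>  a + d = b + c, as in (D.15)."""
    (a, b), (c, d) = p, q
    return a + d == b + c


def add(p, q):
    """[a, b] + [c, d] := [a + c, b + d], as in (D.18)."""
    return (p[0] + q[0], p[1] + q[1])


def sub(p, q):
    """[a, b] - [c, d] := [a, b] + [d, c], as in (D.19)."""
    return add(p, (q[1], q[0]))


# Brute-force check of (D.20) and (D.21) on all pairs with entries in {0, 1, 2, 3}.
pairs = list(product(range(4), repeat=2))
well_defined = all(
    equiv(add(p, q), add(p2, q2)) and equiv(sub(p, q), sub(p2, q2))
    for p, p2, q, q2 in product(pairs, repeat=4)
    if equiv(p, p2) and equiv(q, q2)
)
print(well_defined)
```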

Theorem D.12. The set of integers Z forms a commutative group with respect to addition as defined in Def. D.10(b), where [0, 0] is the neutral element, [b, a] is the inverse element of [a, b] for each a, b ∈ N0, and, denoting the inverse element of [a, b] by −[a, b] in the usual way, [a, b] − [c, d] = [a, b] + (−[c, d]) for each a, b, c, d ∈ N0.

Proof. To verify commutativity and associativity of addition on Z, let a, b, c, d, e, f ∈ N0. Then

\[ [a, b] + [c, d] = [a + c, b + d] = [c + a, d + b] = [c, d] + [a, b], \]

proving commutativity, and

\begin{align*}
[a, b] + \big([c, d] + [e, f]\big) &= [a, b] + [c + e, d + f] = [a + (c + e), b + (d + f)] \\
&= [(a + c) + e, (b + d) + f] = [a + c, b + d] + [e, f] \\
&= \big([a, b] + [c, d]\big) + [e, f],
\end{align*}

proving associativity. For every a, b ∈ N0, one obtains [a, b] + [0, 0] = [a + 0, b + 0] = [a, b], proving neutrality of [0, 0], whereas [a, b] + [b, a] = [a + b, b + a] = [a + b, a + b] = [0, 0] (since (a + b, a + b) ∼ (0, 0)) shows [b, a] = −[a, b]. Now [a, b] − [c, d] = [a, b] + (−[c, d]) is immediate from (D.19). □

Remark D.13. The map

ι : N0 −→ Z, ι(n) := [n, 0], (D.22)

is a monomorphism, i.e. it is injective (since ι(m) = [m, 0] = ι(n) = [n, 0] implies m + 0 = 0 + n, i.e. m = n) and satisfies

\[ \forall_{m,n \in \mathbb{N}_0} \; \iota(m + n) = [m + n, 0] = [m, 0] + [n, 0] = \iota(m) + \iota(n). \tag{D.23} \]

It is customary to identify N0 with ι(N0), as it usually does not cause any confusion. One then just writes n instead of [n, 0] and −n instead of [0, n] = −[n, 0].

Lemma D.14. We have the disjoint decomposition

\[ \mathbb{Z} = \mathbb{N} \,\dot\cup\, \{0\} \,\dot\cup\, \mathbb{Z}^-, \qquad \mathbb{Z}^- := -\mathbb{N} = \{n \in \mathbb{Z} : -n \in \mathbb{N}\}. \tag{D.24} \]

Proof. Note that, due to (D.15), an equivalence class remains the same if a natural number is added or subtracted in both components: [a, b] = [a + m, b + m]. Thus, for each x = [a, b] ∈ Z, if a > b, then x = [a − b, 0] ∈ N; if a = b, then x = [0, 0] = 0; if a < b, then x = [0, b − a] = −[b − a, 0] ∈ $\mathbb{Z}^-$. It just remains to verify that the union in (D.24) is disjoint. However, if [n, 0] = [0, m] with m, n ∈ N0, then n + m = 0, proving n = m = 0, completing the proof. □
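The case distinction in the proof of Lem. D.14 amounts to choosing a canonical representative for each class. The following is a minimal sketch, assuming pairs of Python integers as representatives; the function names `normalize` and `to_int` are hypothetical and introduced only for this illustration.

```python
def normalize(p):
    """Map a representative (a, b) of [a, b] to the canonical one from Lem. D.14."""
    a, b = p
    if a > b:
        return (a - b, 0)   # element of N, written n
    if a < b:
        return (0, b - a)   # element of Z^-, written -n
    return (0, 0)           # the zero element


def to_int(p):
    """Read the class [a, b] as the familiar integer a - b."""
    return p[0] - p[1]


print(normalize((7, 4)), normalize((2, 9)), normalize((5, 5)))
```

Two pairs represent the same integer precisely when `normalize` returns the same canonical pair for both.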

Remark D.15. In the above construction, we obtained the commutative group (Z, +) from the commutative semigroup (N0, +). It is worth pointing out that the same construction always works when, instead of with N0, one starts with any commutative semigroup (H, +) that satisfies the cancellation law a + c = b + c ⇒ a = b, to obtain a commutative group (G, +) and a monomorphism ι : H −→ G.

—

To obtain the expected laws of arithmetic, multiplication on Z needs to be defined such that (a − b) · (c − d) = (ac + bd) − (ad + bc), which leads to the following definition.

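Definition D.16. Multiplication on Z is defined by

\[ \cdot : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}, \qquad \big([a, b], [c, d]\big) \mapsto [a, b] \cdot [c, d] := [ac + bd, ad + bc]. \tag{D.25} \]

Lemma D.17. The definition in Def. D.16 does not depend on the chosen representatives, i.e.

\[ \forall_{a,b,c,d,\tilde a,\tilde b,\tilde c,\tilde d \in \mathbb{N}_0} \; \Big( [a, b] = [\tilde a, \tilde b] \,\wedge\, [c, d] = [\tilde c, \tilde d] \;\Rightarrow\; [ac + bd, ad + bc] = [\tilde a \tilde c + \tilde b \tilde d, \tilde a \tilde d + \tilde b \tilde c] \Big). \tag{D.26} \]

Proof. As mentioned before, due to (D.15), an equivalence class remains the same if a natural number is added or subtracted in both components. Thus, using $a + \tilde b = b + \tilde a$ and $c + \tilde d = d + \tilde c$, one computes

\begin{align*}
[ac + bd, ad + bc] &\overset{(D.15)}{=} [ac + bd + \tilde b c, \; ad + bc + \tilde b c] = [(a + \tilde b)c + bd, \; ad + bc + \tilde b c] \\
&= [(\tilde a + b)c + bd, \; ad + bc + \tilde b c] \overset{(D.15)}{=} [\tilde a \tilde d + \tilde a c + bd, \; \tilde a \tilde d + ad + \tilde b c] \\
&= [\tilde a(\tilde d + c) + bd, \; \tilde a \tilde d + ad + \tilde b c] = [\tilde a(d + \tilde c) + bd, \; \tilde a \tilde d + ad + \tilde b c] \\
&= [\tilde a \tilde c + (\tilde a + b)d, \; \tilde a \tilde d + ad + \tilde b c] = [\tilde a \tilde c + (a + \tilde b)d, \; \tilde a \tilde d + ad + \tilde b c] \\
&\overset{(D.15)}{=} [\tilde a \tilde c + \tilde b d + \tilde b \tilde c, \; \tilde a \tilde d + \tilde b c + \tilde b \tilde c] = [\tilde a \tilde c + \tilde b(d + \tilde c), \; \tilde a \tilde d + \tilde b c + \tilde b \tilde c] \\
&= [\tilde a \tilde c + \tilde b(\tilde d + c), \; \tilde a \tilde d + \tilde b c + \tilde b \tilde c] = [\tilde a \tilde c + \tilde b \tilde d, \; \tilde a \tilde d + \tilde b \tilde c], \tag{D.27}
\end{align*}

proving (D.26). □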
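The multiplication rule (D.25) and its independence of representatives can be spot-checked numerically. The following is a minimal sketch, assuming pairs of Python integers as representatives of the classes [a, b]; the helper names `equiv` and `mul` are introduced here for illustration, and the finite check does not replace the computation in (D.27).

```python
from itertools import product


def equiv(p, q):
    """(a, b) ~ (c, d)  :<=>  a + d = b + c, as in (D.15)."""
    (a, b), (c, d) = p, q
    return a + d == b + c


def mul(p, q):
    """[a, b] * [c, d] := [ac + bd, ad + bc], as in (D.25)."""
    (a, b), (c, d) = p, q
    return (a * c + b * d, a * d + b * c)


# Brute-force check of (D.26) on all pairs with entries in {0, 1, 2, 3}.
pairs = list(product(range(4), repeat=2))
well_defined = all(
    equiv(mul(p, q), mul(p2, q2))
    for p, p2, q, q2 in product(pairs, repeat=4)
    if equiv(p, p2) and equiv(q, q2)
)

# (2, 5) and (1, 4) both represent -3, so the product should represent 9.
print(well_defined, mul((2, 5), (1, 4)))
```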

Theorem D.18. The multiplication on the set of integers Z defined in Def. D.16 is associative and commutative. Moreover, distributivity, i.e. Def. C.7(iii), is satisfied, [1, 0] is the neutral element of multiplication, and there are no zero divisors, i.e.

\[ \forall_{a,b,c,d \in \mathbb{N}_0} \; \Big( [a, b] \cdot [c, d] = [ac + bd, ad + bc] = [0, 0] \;\Rightarrow\; [a, b] = [0, 0] \,\vee\, [c, d] = [0, 0] \Big). \tag{D.28} \]

Algebraically, the theorem can be summarized by saying that (Z, +, ·) constitutes an integral domain.

Proof. Let a, b, c, d, e, f ∈ N0. Then, using commutativity of addition and multiplication on N0,

\[ [a, b] \cdot [c, d] = [ac + bd, ad + bc] = [ca + db, cb + da] = [c, d] \cdot [a, b], \]

proving commutativity of multiplication on Z; using distributivity and commutativity of addition on N0,

\begin{align*}
[a, b] \cdot \big([c, d] \cdot [e, f]\big) &= [a, b] \cdot [ce + df, cf + de] \\
&= [ace + adf + bcf + bde, \; acf + ade + bce + bdf] \\
&= [ace + bde + adf + bcf, \; acf + bdf + ade + bce] \\
&= [ac + bd, ad + bc] \cdot [e, f] = \big([a, b] \cdot [c, d]\big) \cdot [e, f],
\end{align*}

proving associativity of multiplication on Z; again using distributivity and commutativity of addition on N0,

\begin{align*}
[a, b] \cdot \big([c, d] + [e, f]\big) &= [a, b] \cdot [c + e, d + f] = [ac + ae + bd + bf, \; ad + af + bc + be] \\
&= [ac + bd + ae + bf, \; ad + bc + af + be] \\
&= [ac + bd, ad + bc] + [ae + bf, af + be] \\
&= [a, b] \cdot [c, d] + [a, b] \cdot [e, f],
\end{align*}

proving distributivity on Z. Next, [a, b] · [1, 0] = [a · 1 + b · 0, a · 0 + b · 1] = [a, b] proves neutrality of [1, 0]. It remains to prove (D.28). Note that, due to (D.15), the conclusion is equivalent to a = b or c = d. We assume 0 ≤ a < b (the case b < a is analogous) and have to prove c = d. According to Def. D.4, a < b means b = a + k for some k ∈ N. Thus, [ac + bd, ad + bc] = [0, 0] implies

\[ ac + (a + k)d = ac + bd = ad + bc = ad + (a + k)c \;\Rightarrow\; kd = kc \overset{(D.12b)}{\Rightarrow} c = d, \tag{D.29} \]

completing the proof. □


Definition D.19. For each k, l ∈ Z, let

l ≤ k :⇔ k − l ∈ N0. (D.30)

Theorem D.20. (a) The relation defined in (D.30) constitutes a total order on Z that is compatible with addition and multiplication, i.e. it satisfies (4.3).

(b) The map ι from (D.22) is strictly increasing.

Proof. (a) follows from (D.30), (D.24), and Th. D.8 since N0 is closed under addition and multiplication.

(b): According to Def. D.4, if m, n ∈ N0 with n < m, then m = n + k for some k ∈ N. In consequence, ι(m) = ι(n) + ι(k) by (D.23), i.e. ι(m) − ι(n) = ι(k) ∈ N, proving ι(n) < ι(m). □

D.4 Rational Numbers

The remaining two deficiencies of the set of integers Z (as compared with R) are the lack of inverse elements for multiplication and that the order ≤ lacks completeness. We proceed to the construction of the rational numbers, which will provide the inverse elements for multiplication. The completion of the order will then be achieved in the last step in the next section.

Definition and Remark D.21. The relation ∼ on Z× (Z \ {0}) defined by

(a, b) ∼ (c, d) :⇔ ad = bc, (D.31)

constitutes an equivalence relation on Z × (Z \ {0}) (cf. Def. 2.23): Indeed, noting that (D.31) is precisely the same as (D.15) if + is replaced by ·, the proof from Def. and Rem. D.9 also shows that (D.31) does, indeed, define an equivalence relation on Z × (Z \ {0}): One merely replaces each + with · and each N0 with Z or Z \ {0}, respectively. The only modification needed occurs for 0 ∈ {a, c, e} in the proof of transitivity (in this case, the proof of Def. and Rem. D.9 yields adcf = 0 = bcde, which does not imply af = be), where one now argues, for a = 0,

\begin{align*}
(a, b) \sim (c, d) \,\wedge\, (c, d) \sim (e, f) &\Rightarrow ad = 0 = bc \,\wedge\, cf = de \\
&\overset{b \neq 0}{\Rightarrow} c = 0 \overset{d \neq 0}{\Rightarrow} e = 0 \Rightarrow af = 0 = be \Rightarrow (a, b) \sim (e, f),
\end{align*}

for c = 0,

\begin{align*}
(a, b) \sim (c, d) \,\wedge\, (c, d) \sim (e, f) &\Rightarrow ad = 0 = bc \,\wedge\, cf = 0 = de \\
&\overset{d \neq 0}{\Rightarrow} a = e = 0 \Rightarrow af = 0 = be \Rightarrow (a, b) \sim (e, f),
\end{align*}

and, for e = 0,

\begin{align*}
(a, b) \sim (c, d) \,\wedge\, (c, d) \sim (e, f) &\Rightarrow ad = bc \,\wedge\, cf = 0 = de \\
&\overset{f \neq 0}{\Rightarrow} c = 0 \overset{d \neq 0}{\Rightarrow} a = 0 \Rightarrow af = 0 = be \Rightarrow (a, b) \sim (e, f).
\end{align*}

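Definition D.22. (a) Define the set of rational numbers Q as the set of equivalence classes of the equivalence relation ∼ defined in (D.31), i.e.

\[ \mathbb{Q} := \big(\mathbb{Z} \times (\mathbb{Z} \setminus \{0\})\big)/\!\sim \;=\; \big\{ [(a, b)] : (a, b) \in \mathbb{Z} \times (\mathbb{Z} \setminus \{0\}) \big\} \tag{D.32} \]

is the quotient set of Z × (Z \ {0}) with respect to ∼ (cf. Ex. 2.24(c)). As is common, we will write

\[ \frac{a}{b} := a/b := [(a, b)] \tag{D.33} \]

for the equivalence class of (a, b) with respect to ∼.

(b) Addition on Q is defined by

\[ + : \mathbb{Q} \times \mathbb{Q} \to \mathbb{Q}, \qquad \Big(\frac{a}{b}, \frac{c}{d}\Big) \mapsto \frac{a}{b} + \frac{c}{d} := \frac{ad + bc}{bd}. \tag{D.34} \]

Multiplication on Q is defined by

\[ \cdot : \mathbb{Q} \times \mathbb{Q} \to \mathbb{Q}, \qquad \Big(\frac{a}{b}, \frac{c}{d}\Big) \mapsto \frac{a}{b} \cdot \frac{c}{d} := \frac{ac}{bd}. \tag{D.35} \]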
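The definitions (D.31), (D.34), and (D.35) can be tried out directly on pairs of integers. The following is a minimal sketch, assuming (numerator, denominator) tuples of Python integers as representatives and using the standard-library `fractions.Fraction` type only as an independent reference for comparison; the helper names are chosen here for illustration.

```python
from fractions import Fraction


def equiv(p, q):
    """(a, b) ~ (c, d)  :<=>  ad = bc, as in (D.31)."""
    (a, b), (c, d) = p, q
    return a * d == b * c


def add(p, q):
    """a/b + c/d := (ad + bc)/(bd), as in (D.34)."""
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)


def mul(p, q):
    """a/b * c/d := (ac)/(bd), as in (D.35)."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)


half, third = (1, 2), (1, 3)
s = add(half, third)
m = mul(half, third)

# Other representatives of 1/2 and 1/3 yield an equivalent sum.
s_other = add((2, 4), (-1, -3))
print(s, m, equiv(s, s_other))
```

Note that `add` and `mul` return unreduced pairs; equality of rational numbers is equality of classes, which is why results are compared with `equiv` rather than with `==`.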

—

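For the definitions in Def. D.22(b) to make sense, one needs to check that they do not depend on the chosen representatives of the equivalence classes, and that the results of both addition and multiplication are always elements of Q. All this is provided by the following lemma.

Lemma D.23. The definitions in Def. D.22(b) do not depend on the chosen representatives, i.e.

\[ \forall_{a,c,\tilde a,\tilde c \in \mathbb{Z}} \; \forall_{b,d,\tilde b,\tilde d \in \mathbb{Z} \setminus \{0\}} \; \Big( \frac{a}{b} = \frac{\tilde a}{\tilde b} \,\wedge\, \frac{c}{d} = \frac{\tilde c}{\tilde d} \;\Rightarrow\; \frac{ad + bc}{bd} = \frac{\tilde a \tilde d + \tilde b \tilde c}{\tilde b \tilde d} \Big) \tag{D.36} \]

and

\[ \forall_{a,c,\tilde a,\tilde c \in \mathbb{Z}} \; \forall_{b,d,\tilde b,\tilde d \in \mathbb{Z} \setminus \{0\}} \; \Big( \frac{a}{b} = \frac{\tilde a}{\tilde b} \,\wedge\, \frac{c}{d} = \frac{\tilde c}{\tilde d} \;\Rightarrow\; \frac{ac}{bd} = \frac{\tilde a \tilde c}{\tilde b \tilde d} \Big). \tag{D.37} \]

Furthermore, the results of both addition and multiplication are always elements of Q.

Proof. (D.36) and (D.37): $a/b = \tilde a/\tilde b$ means $a \tilde b = \tilde a b$, $c/d = \tilde c/\tilde d$ means $c \tilde d = \tilde c d$, implying

\[ (ad + bc)\,\tilde b \tilde d = bd\,(\tilde a \tilde d + \tilde b \tilde c), \quad \text{i.e.} \quad \frac{ad + bc}{bd} = \frac{\tilde a \tilde d + \tilde b \tilde c}{\tilde b \tilde d} \tag{D.38} \]

and

\[ ac\,\tilde b \tilde d = bd\,\tilde a \tilde c, \quad \text{i.e.} \quad \frac{ac}{bd} = \frac{\tilde a \tilde c}{\tilde b \tilde d}. \tag{D.39} \]

That the results of both addition and multiplication are always elements of Q follows from (D.28), i.e. from the fact that Z has no zero divisors. In particular, if b, d ≠ 0, then bd ≠ 0, showing (ad + bc)/(bd) ∈ Q and (ac)/(bd) ∈ Q. □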

Theorem D.24. (a) The set of rational numbers Q with addition and multiplication as defined in Def. D.22 forms a field, where 0/1 and 1/1 are the neutral elements with respect to addition and multiplication, respectively, (−a)/b is the additive inverse to a/b, whereas b/a is the multiplicative inverse to a/b with a ≠ 0.

(b) Defining subtraction and division in the usual way, for each r, s ∈ Q, by s − r := s + (−r) and $s/r := s r^{-1}$, respectively, with −r denoting the additive inverse of r and $r^{-1}$ denoting the multiplicative inverse of r ≠ 0, all the rules stated in Th. C.10 are valid in Q.

(c) The map

\[ \iota : \mathbb{Z} \to \mathbb{Q}, \qquad \iota(k) := \frac{k}{1}, \tag{D.40} \]

is a monomorphism, i.e. it is injective and satisfies

\[ \forall_{k,l \in \mathbb{Z}} \; \iota(k + l) = \iota(k) + \iota(l), \tag{D.41a} \]

\[ \forall_{k,l \in \mathbb{Z}} \; \iota(kl) = \iota(k) \cdot \iota(l). \tag{D.41b} \]

It is customary to identify Z with ι(Z), as it usually does not cause any confusion. One then just writes k instead of $\frac{k}{1}$.

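Proof. (a): We verify + and · to be commutative and associative on Q: Let a, c, e ∈ Z and b, d, f ∈ Z \ {0}. Then, using commutativity on Z, we compute

\[ \frac{c}{d} + \frac{a}{b} = \frac{cb + da}{db} = \frac{ad + bc}{bd} = \frac{a}{b} + \frac{c}{d}, \qquad \frac{c}{d} \cdot \frac{a}{b} = \frac{ca}{db} = \frac{ac}{bd} = \frac{a}{b} \cdot \frac{c}{d}, \]

showing commutativity on Q. Using associativity and distributivity on Z, we compute

\begin{align*}
\frac{a}{b} + \Big(\frac{c}{d} + \frac{e}{f}\Big) &= \frac{a}{b} + \frac{cf + de}{df} = \frac{adf + b(cf + de)}{bdf} = \frac{(ad + bc)f + bde}{bdf} \\
&= \frac{ad + bc}{bd} + \frac{e}{f} = \Big(\frac{a}{b} + \frac{c}{d}\Big) + \frac{e}{f}, \\
\frac{a}{b} \cdot \Big(\frac{c}{d} \cdot \frac{e}{f}\Big) &= \frac{a(ce)}{b(df)} = \frac{(ac)e}{(bd)f} = \Big(\frac{a}{b} \cdot \frac{c}{d}\Big) \cdot \frac{e}{f},
\end{align*}

showing associativity on Q. We proceed to checking distributivity on Q: Using commutativity, associativity, and distributivity on Z, we compute

\[ \frac{a}{b} \cdot \Big(\frac{c}{d} + \frac{e}{f}\Big) = \frac{a(cf + de)}{bdf} = \frac{acf + dae}{bdf} = \frac{acbf + bdae}{bdbf} = \frac{ac}{bd} + \frac{ae}{bf} = \frac{a}{b} \cdot \frac{c}{d} + \frac{a}{b} \cdot \frac{e}{f}, \]

proving distributivity on Q. We now check the claims regarding neutral and inverse elements:

\begin{align*}
\frac{a}{b} + \frac{0}{1} &= \frac{a \cdot 1 + b \cdot 0}{b \cdot 1} = \frac{a}{b}, \\
\frac{a}{b} + \frac{-a}{b} &= \frac{ab + b(-a)}{b^2} \overset{\text{Def. C.7(iii) for } \mathbb{Z}}{=} \frac{(a - a)b}{b^2} = \frac{0}{b^2} \overset{(D.31)}{=} \frac{0}{1}, \\
\frac{a}{b} \cdot \frac{1}{1} &= \frac{a \cdot 1}{b \cdot 1} = \frac{a}{b}, \\
\frac{a}{b} \cdot \frac{b}{a} &= \frac{ab}{ba} \overset{(D.31)}{=} \frac{1}{1}.
\end{align*}

Thus, (Q, +, ·) is a ring and (Q \ {0}, ·) is a group, implying Q to be a field.

(b) is a consequence of (a), since Th. C.10 and its proof are valid in every field.

(c): The map ι is injective, as ι(k) = k/1 = ι(l) = l/1 implies k · 1 = l · 1, i.e. k = l. Moreover,

\begin{align*}
\iota(k) + \iota(l) &= \frac{k}{1} + \frac{l}{1} = \frac{k \cdot 1 + 1 \cdot l}{1} = \frac{k + l}{1} = \iota(k + l), \tag{D.42a} \\
\iota(k) \cdot \iota(l) &= \frac{k}{1} \cdot \frac{l}{1} = \frac{kl}{1} = \iota(kl), \tag{D.42b}
\end{align*}

completing the proof. □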


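Definition and Remark D.25. Define

\[ \mathbb{Q}^+ := \Big\{ r \in \mathbb{Q} : \exists_{a,b \in \mathbb{N}} \; r = \frac{a}{b} \Big\}. \tag{D.43} \]

We then have the decomposition

\[ \mathbb{Q} = \mathbb{Q}^+ \,\dot\cup\, \{0\} \,\dot\cup\, \mathbb{Q}^-, \qquad \mathbb{Q}^- := -\mathbb{Q}^+ = \{r \in \mathbb{Q} : -r \in \mathbb{Q}^+\}, \tag{D.44} \]

since

\begin{align*}
a/b \in \mathbb{Q}^+ &\Leftrightarrow \big((a > 0 \wedge b > 0) \vee (a < 0 \wedge b < 0)\big), \tag{D.45a} \\
a/b = 0 &\Leftrightarrow a = 0, \tag{D.45b} \\
a/b \in \mathbb{Q}^- &\Leftrightarrow \big((a > 0 \wedge b < 0) \vee (a < 0 \wedge b > 0)\big). \tag{D.45c}
\end{align*}

Definition D.26. For each r, s ∈ Q, let

\[ s \leq r \;:\Leftrightarrow\; r - s \in \mathbb{Q}^+_0 := \mathbb{Q}^+ \cup \{0\}. \tag{D.46} \]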
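The order (D.46) can be evaluated on representatives using only the sign criterion (D.45a). The following is a minimal sketch, assuming (numerator, denominator) tuples of Python integers with nonzero denominators; the names `in_Q_plus`, `sub`, and `leq` are hypothetical and introduced only for this illustration.

```python
def in_Q_plus(p):
    """a/b lies in Q^+ iff a and b are nonzero with the same sign, cf. (D.45a)."""
    a, b = p
    return a * b > 0


def sub(p, q):
    """p - q for fractions given as (numerator, denominator) pairs."""
    (a, b), (c, d) = p, q
    return (a * d - b * c, b * d)


def leq(s, r):
    """s <= r  :<=>  r - s in Q^+ ∪ {0}, as in (D.46)."""
    diff = sub(r, s)
    return diff[0] == 0 or in_Q_plus(diff)


print(leq((1, 3), (1, 2)), leq((1, 2), (1, 3)), leq((1, -2), (1, 3)))
```

The test `a * b > 0` is one way to encode the two cases of (D.45a) at once; it works for negative denominators as well, as the third call illustrates.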

Theorem D.27. (a) The relation defined in (D.46) constitutes a total order on Q that is compatible with addition and multiplication, i.e. it satisfies (4.3); in other words, (Q, +, ·, ≤) constitutes a totally ordered field.

(b) All the rules stated in Th. 4.5 are valid in Q.

(c) The map ι from (D.40) is strictly increasing.

Proof. (a) follows from (D.46), (D.44), and Th. D.8, since it is immediate from (D.34) and (D.35) that $\mathbb{Q}^+$ is closed under addition and multiplication.

(b) is a consequence of (a), since Th. 4.5 and its proof are valid in every totally ordered field.

(c): According to Def. D.19, if k, l ∈ Z with l < k, then n := k − l ∈ N. In consequence, ι(k) = ι(l) + ι(n) by (D.41a), i.e. ι(k) − ι(l) = ι(n) = n/1 ∈ $\mathbb{Q}^+$, proving ι(l) < ι(k). □

D.5 Real Numbers

In the previous section, the construction of the rational numbers Q yielded a totally ordered field. However, the order on Q is not complete – for example, Rem. and Def. 7.62 shows that the set $M := \{r \in \mathbb{Q} : r^2 < 2\}$, which is bounded from above (for example by 2), has no supremum in Q (otherwise, we would have a rational number q = sup M with $q^2 = 2$). Finally, in the present section, we will start out from Q to construct the set of real numbers R such that it becomes a complete totally ordered field. There are several different important constructions to obtain R from Q. We will describe the construction that defines real numbers as equivalence classes of rational Cauchy sequences following [EHH+95, Ch. 2.3]. The construction using so-called Dedekind cuts can be found in [EHH+95, Ch. 2.2], the construction via nested intervals in [EHH+95, Ch. 2.4].

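Definition D.28. (a) Let $\mathcal{S}$ denote the set of all Cauchy sequences in Q, where we call a sequence $(r_n)_{n \in \mathbb{N}}$ in Q a Cauchy sequence if, and only if,

\[ \forall_{\epsilon \in \mathbb{Q}^+} \; \exists_{N \in \mathbb{N}} \; \forall_{n,m > N} \; |r_n - r_m| < \epsilon, \tag{D.47} \]

which differs from (7.25) in that ε has to be from $\mathbb{Q}^+$ rather than from $\mathbb{R}^+$.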

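(b) Addition on $\mathcal{S}$ is defined by

\[ + : \mathcal{S} \times \mathcal{S} \to \mathcal{S}, \qquad \big((r_n)_{n \in \mathbb{N}}, (s_n)_{n \in \mathbb{N}}\big) \mapsto (r_n)_{n \in \mathbb{N}} + (s_n)_{n \in \mathbb{N}} := (r_n + s_n)_{n \in \mathbb{N}}. \tag{D.48} \]

Multiplication on $\mathcal{S}$ is defined by

\[ \cdot : \mathcal{S} \times \mathcal{S} \to \mathcal{S}, \qquad \big((r_n)_{n \in \mathbb{N}}, (s_n)_{n \in \mathbb{N}}\big) \mapsto (r_n)_{n \in \mathbb{N}} \cdot (s_n)_{n \in \mathbb{N}} := (r_n s_n)_{n \in \mathbb{N}}. \tag{D.49} \]

As a consequence of the following Lem. D.29, addition and multiplication are well-defined on $\mathcal{S}$.

Lemma D.29. If $(r_n)_{n \in \mathbb{N}}$ and $(s_n)_{n \in \mathbb{N}}$ are Cauchy sequences in Q, so are $(r_n + s_n)_{n \in \mathbb{N}}$ and $(r_n s_n)_{n \in \mathbb{N}}$.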

Proof. The proofs are analogous to the proofs of Th. 7.13(7.11b),(7.11c):

Given ε ∈ $\mathbb{Q}^+$, there exists N ∈ N such that, for each n, m > N, $|r_n - r_m| < \epsilon/2$ and $|s_n - s_m| < \epsilon/2$, implying

\[ \forall_{n,m > N} \; |r_n + s_n - (r_m + s_m)| \leq |r_n - r_m| + |s_n - s_m| < \epsilon/2 + \epsilon/2 = \epsilon, \tag{D.50} \]

proving $(r_n + s_n)_{n \in \mathbb{N}}$ is Cauchy.

The proof of Th. 7.29 shows both $(r_n)_{n \in \mathbb{N}}$ and $(s_n)_{n \in \mathbb{N}}$ are bounded, i.e. there exists M ∈ $\mathbb{Q}^+$ that is an upper bound for the sets $\{|r_n| : n \in \mathbb{N}\}$ and $\{|s_n| : n \in \mathbb{N}\}$. Moreover, given ε ∈ $\mathbb{Q}^+$, there exists N ∈ N such that, for each n, m > N, $|r_n - r_m| < \epsilon/(2M)$ and $|s_n - s_m| < \epsilon/(2M)$, implying

\begin{align*}
\forall_{n,m > N} \; |r_n s_n - r_m s_m| &= \big|(r_n - r_m) s_n + r_m (s_n - s_m)\big| \\
&\leq |s_n| \cdot |r_n - r_m| + |r_m| \cdot |s_n - s_m| < \frac{M \epsilon}{2M} + \frac{M \epsilon}{2M} = \epsilon, \tag{D.51}
\end{align*}

completing the proof of the lemma. □
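To make the notion of a rational Cauchy sequence concrete, one can compute the first terms of a sequence of rationals whose squares approach 2. The following is a minimal sketch; the recursion $r_1 = 1$, $r_{n+1} = (r_n + 2/r_n)/2$ is an example chosen here for illustration and is not taken from the text, and the function name `newton_sqrt2` is hypothetical. Exact rational arithmetic is provided by the standard-library `fractions.Fraction` type.

```python
from fractions import Fraction


def newton_sqrt2(count):
    """Return the first `count` terms of r_1 = 1, r_{n+1} = (r_n + 2/r_n) / 2."""
    terms = [Fraction(1)]
    while len(terms) < count:
        r = terms[-1]
        terms.append((r + 2 / r) / 2)
    return terms


r = newton_sqrt2(6)
# Distances between consecutive terms; they shrink rapidly.
gaps = [abs(r[n + 1] - r[n]) for n in range(len(r) - 1)]

for term, gap in zip(r, gaps):
    print(term, float(gap))
```

Every term is an element of Q, and no term squares to exactly 2. Finitely many terms cannot establish (D.47), of course; the computation merely illustrates how the terms cluster without having a limit in Q.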

Theorem D.30. ($\mathcal{S}$, +) is a group and, in addition, multiplication on $\mathcal{S}$ is associative and commutative. Moreover, distributivity also holds in $\mathcal{S}$. In algebraic terms, this can be summarized as the statement that ($\mathcal{S}$, +, ·) constitutes a commutative ring.

Proof. Note that, since the rational sequence $(r_n)_{n \in \mathbb{N}}$ is nothing but the function f : N −→ Q, $f(n) = r_n$, addition and multiplication as defined in Def. D.28(b) are analogous to the definition of addition and multiplication of real-valued functions in (6.1a), (6.1c), respectively. It is an easy exercise to verify that these function operations always inherit associativity, commutativity, and distributivity if these rules hold for the operations defined on the function’s codomain (i.e. for + and · on Q in our present situation of rational sequences). The constant sequence (0, 0, . . . ) is the neutral element of addition on $\mathcal{S}$ and $-(r_n)_{n \in \mathbb{N}} = (-r_n)_{n \in \mathbb{N}}$ is the additive inverse of $(r_n)_{n \in \mathbb{N}}$. □

The reason that we need another step in our construction of R is the fact that $\mathcal{S}$ is not a field: As soon as 0 occurs, even just once, in the sequence $(r_n)_{n \in \mathbb{N}} \in \mathcal{S}$, the sequence does not have a multiplicative inverse (where the neutral element of multiplication is obviously the constant sequence (1, 1, . . . )). The solution to this problem consists of factoring out all sequences converging to 0.

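Definition and Remark D.31. Let

\[ \mathcal{N} := \Big\{ (r_n)_{n \in \mathbb{N}} \in \mathcal{S} : \lim_{n \to \infty} r_n = 0 \Big\} \tag{D.52} \]

be the set of rational sequences converging to zero. The relation ∼ on $\mathcal{S}$ defined by

\[ (r_n)_{n \in \mathbb{N}} \sim (s_n)_{n \in \mathbb{N}} \;:\Leftrightarrow\; (r_n)_{n \in \mathbb{N}} - (s_n)_{n \in \mathbb{N}} \in \mathcal{N}, \tag{D.53} \]

constitutes an equivalence relation on $\mathcal{S}$ (cf. Def. 2.23): Indeed, ∼ is reflexive, as f ∈ $\mathcal{S}$ implies f − f = 0 ∈ $\mathcal{N}$; ∼ is symmetric, since f, g ∈ $\mathcal{S}$ with f − g ∈ $\mathcal{N}$ implies g − f ∈ $\mathcal{N}$; ∼ is transitive, as f, g, h ∈ $\mathcal{S}$ with f − g ∈ $\mathcal{N}$ and g − h ∈ $\mathcal{N}$ implies f − h = f − g + g − h ∈ $\mathcal{N}$.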
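A standard illustration of (D.53) is that the sequences 1, 1, 1, . . . and 0.9, 0.99, 0.999, . . . are equivalent, since their difference converges to 0. The following is a minimal sketch of this example, assuming exact rational arithmetic via the standard-library `fractions.Fraction` type; the function names `r` and `s` are chosen here for illustration, and the example itself is not taken from the text.

```python
from fractions import Fraction


def r(n):
    """Constant sequence r_n = 1."""
    return Fraction(1)


def s(n):
    """s_n = 1 - 10^(-n), i.e. 0.9, 0.99, 0.999, ..."""
    return 1 - Fraction(1, 10 ** n)


# The differences r_n - s_n = 10^(-n) form a sequence converging to 0.
diffs = [r(n) - s(n) for n in range(1, 6)]
print(diffs)
```

Both sequences therefore represent the same equivalence class, which is the real number 1 in the construction that follows.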

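Definition D.32. (a) Define the set of real numbers R as the set of equivalence classes of the equivalence relation ∼ defined in (D.53), i.e.

\[ \mathbb{R} := \mathcal{S}/\!\sim \;=\; \big\{ [(r_n)_{n \in \mathbb{N}}] : (r_n)_{n \in \mathbb{N}} \in \mathcal{S} \big\} \tag{D.54} \]

is the quotient set of $\mathcal{S}$ with respect to ∼ (cf. Ex. 2.24(c)).

(b) Addition on R is defined by

\[ + : \mathbb{R} \times \mathbb{R} \to \mathbb{R}, \qquad \big([f], [g]\big) \mapsto [f] + [g] := [f + g]. \tag{D.55} \]

Multiplication on R is defined by

\[ \cdot : \mathbb{R} \times \mathbb{R} \to \mathbb{R}, \qquad \big([f], [g]\big) \mapsto [f] \cdot [g] := [fg]. \tag{D.56} \]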

—

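Once again, for the definitions in Def. D.32(b) to make sense, one needs to check that they do not depend on the chosen representatives of the equivalence classes, and once again, we provide a lemma providing this check:

Lemma D.33. The definitions in Def. D.32(b) do not depend on the chosen representatives, i.e.

\[ \forall_{f,g,\tilde f,\tilde g \in \mathcal{S}} \; \Big( f - \tilde f \in \mathcal{N} \,\wedge\, g - \tilde g \in \mathcal{N} \;\Rightarrow\; f + g - (\tilde f + \tilde g) \in \mathcal{N} \Big) \tag{D.57} \]

and

\[ \forall_{f,g,\tilde f,\tilde g \in \mathcal{S}} \; \Big( f - \tilde f \in \mathcal{N} \,\wedge\, g - \tilde g \in \mathcal{N} \;\Rightarrow\; fg - \tilde f \tilde g \in \mathcal{N} \Big). \tag{D.58} \]

Proof. Let $f = (r_n)_{n \in \mathbb{N}}$, $g = (s_n)_{n \in \mathbb{N}}$, $\tilde f = (\tilde r_n)_{n \in \mathbb{N}}$, $\tilde g = (\tilde s_n)_{n \in \mathbb{N}}$ be elements of $\mathcal{S}$ such that $f - \tilde f \in \mathcal{N}$ and $g - \tilde g \in \mathcal{N}$, i.e. $\lim_{n \to \infty}(r_n - \tilde r_n) = \lim_{n \to \infty}(s_n - \tilde s_n) = 0$.

Then (7.11b) implies $0 = \lim_{n \to \infty}\big(r_n + s_n - (\tilde r_n + \tilde s_n)\big)$, proving (D.57).

To prove (D.58), one computes

\[ \lim_{n \to \infty} \big(r_n s_n - \tilde r_n \tilde s_n\big) = \lim_{n \to \infty} \big(r_n (s_n - \tilde s_n) - \tilde s_n (\tilde r_n - r_n)\big) = 0, \tag{D.59} \]

where the last equality follows from the boundedness of $(r_n)_{n \in \mathbb{N}}$ and $(\tilde s_n)_{n \in \mathbb{N}}$ together with Prop. 7.11(b). □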

We will also use the following auxiliary result:

Proposition D.34. If $(r_n)_{n \in \mathbb{N}} \in \mathcal{S}$, then precisely one of the following statements is correct:

\[ (r_n)_{n \in \mathbb{N}} \in \mathcal{N}, \tag{D.60a} \]

\[ \exists_{\epsilon \in \mathbb{Q}^+} \; \#\{n \in \mathbb{N} : r_n \leq \epsilon\} \in \mathbb{N}_0, \tag{D.60b} \]

\[ \exists_{\epsilon \in \mathbb{Q}^+} \; \#\{n \in \mathbb{N} : r_n \geq -\epsilon\} \in \mathbb{N}_0. \tag{D.60c} \]

Proof. Let us first verify that the three statements in (D.60) are mutually exclusive. If (D.60a) holds, then, for every ε ∈ $\mathbb{Q}^+$, $-\epsilon < r_n < \epsilon$ holds for almost all (in particular, for infinitely many) n ∈ N, i.e. (D.60b) and (D.60c) are both false. If (D.60b) holds, then (D.60a) must be false as we have just seen. Moreover, if $r_n \leq \epsilon$ holds for at most finitely many n ∈ N, then $r_n > \epsilon > 0$ must hold for infinitely many n ∈ N, i.e. (D.60c) is false.

Now suppose (D.60a) and (D.60b) are false. We have to show that (D.60c) is true. Since (D.60a) is false, there exist δ > 0 and an increasing sequence of indices $(n_k)_{k \in \mathbb{N}}$ with $|r_{n_k}| > \delta$ for each k ∈ N. Since (D.60b) is false, there is an increasing sequence of indices $(m_k)_{k \in \mathbb{N}}$ with $r_{m_k} < 1/k$. Thus, since $(r_n)_{n \in \mathbb{N}}$ is a Cauchy sequence, only finitely many $r_{n_k} > \delta$ and infinitely many $r_{n_k} < -\delta$. Now, if N ∈ N is such that $|r_n - r_m| < \delta/2$ for all n, m > N and $k_0$ ∈ N is such that $n_{k_0} > N$ and $r_{n_{k_0}} < -\delta$, then $r_n < -\delta/2$ for each n > N (since $|r_n - r_{n_{k_0}}| < \delta/2$). Thus, (D.60c) holds with ε := δ/2. □

Theorem D.35. (a) The set of real numbers $\mathbb{R}$ with addition and multiplication as defined in Def. D.32 forms a field, where $[(0, 0, \dots)]$ and $[(1, 1, \dots)]$ are the neutral elements with respect to addition and multiplication, respectively.

(b) The map

\[ \iota : \mathbb{Q} \longrightarrow \mathbb{R}, \quad \iota(r) := [(r, r, \dots)], \tag{D.61} \]

is injective and satisfies

\[ \underset{r,s\in\mathbb{Q}}{\forall}\quad \iota(r+s) = \iota(r) + \iota(s), \tag{D.62a} \]

\[ \underset{r,s\in\mathbb{Q}}{\forall}\quad \iota(rs) = \iota(r)\cdot\iota(s). \tag{D.62b} \]

It is customary to identify $\mathbb{Q}$ with $\iota(\mathbb{Q})$, as it usually does not cause any confusion. One then just writes $r$ instead of $[(r, r, \dots)]$.

Proof. (a): Clearly, Def. D.32(b) ensures the laws of associativity and commutativity of addition and multiplication valid in $\mathcal{S}$ are preserved in $\mathbb{R}$, and, likewise, the law of distributivity. It is also immediate from (D.55) and (D.56), respectively, that $[(0, 0, \dots)]$ and $[(1, 1, \dots)]$ are the respective neutral elements of addition and multiplication. Moreover, if $-f$ is the additive inverse of $f \in \mathcal{S}$, then $[-f]$ is the additive inverse of $[f] \in \mathbb{R}$. It remains to show that each $x = [(r_n)_{n\in\mathbb{N}}] \ne [(0, 0, \dots)]$ has a multiplicative inverse $x^{-1}$ in $\mathbb{R}$. We claim $x^{-1} = [(s_n)_{n\in\mathbb{N}}]$, where

\[ \underset{n\in\mathbb{N}}{\forall}\quad s_n := \begin{cases} r_n^{-1} & \text{for } r_n \ne 0,\\ 1 & \text{for } r_n = 0. \end{cases} \tag{D.63} \]

We need to verify $[(s_n)_{n\in\mathbb{N}}] \in \mathbb{R}$, i.e. that $(s_n)_{n\in\mathbb{N}}$ is a Cauchy sequence. We know $(r_n)_{n\in\mathbb{N}}$ is a Cauchy sequence that does not converge to $0$. Thus, according to Prop. D.34, there exist $\delta > 0$ and $M \in \mathbb{N}$ such that, for each $n > M$, we have $|r_n| > \delta$ (in particular, $r_n \ne 0$). Let $\epsilon > 0$. As $(r_n)_{n\in\mathbb{N}}$ is a Cauchy sequence, there exists $N \in \mathbb{N}$ such that $N \ge M$ and, for each $n, m > N$, $|r_n - r_m| < \epsilon\,\delta^2$. Thus,

\[ \underset{n,m>N}{\forall}\quad |s_n - s_m| = \left|\frac{1}{r_n} - \frac{1}{r_m}\right| = \left|\frac{r_n - r_m}{r_n r_m}\right| < \frac{\epsilon\,\delta^2}{\delta^2} = \epsilon, \tag{D.64} \]

proving $(s_n)_{n\in\mathbb{N}}$ is a Cauchy sequence. Moreover,

\[ [(r_n)_{n\in\mathbb{N}}]\cdot[(s_n)_{n\in\mathbb{N}}] = [(r_n s_n)_{n\in\mathbb{N}}] = [(1, 1, \dots)], \tag{D.65} \]

since $r_n s_n = 1$ for almost all $n \in \mathbb{N}$, and the proof of (a) is complete.

(b): The map $\iota$ is injective, since $\iota(r) = [(r, r, \dots)] = \iota(s) = [(s, s, \dots)]$ implies $\lim_{n\to\infty}(r - s) = 0$, i.e. $r = s$. Moreover,

\[ \iota(r) + \iota(s) = [(r, r, \dots)] + [(s, s, \dots)] = [(r+s, r+s, \dots)] = \iota(r+s), \tag{D.66a} \]

\[ \iota(r)\cdot\iota(s) = [(r, r, \dots)]\cdot[(s, s, \dots)] = [(rs, rs, \dots)] = \iota(rs), \tag{D.66b} \]

completing the proof. □


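Definition D.36. We define $\mathbb{R}^+$ to consist of all real numbers represented by sequences $(r_n)_{n\in\mathbb{N}}$ such that there exists $\epsilon \in \mathbb{Q}^+$ satisfying $r_n > \epsilon$ for almost all $n \in \mathbb{N}$, i.e.

\[ \mathbb{R}^+ := \Bigl\{ [(r_n)_{n\in\mathbb{N}}] \in \mathbb{R} : \underset{\epsilon\in\mathbb{Q}^+}{\exists}\ \#\{n\in\mathbb{N} : r_n \le \epsilon\} \in \mathbb{N}_0 \Bigr\}. \tag{D.67} \]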

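Proposition D.37. (a) The definition in (D.67) does not depend on the chosen representatives $(r_n)_{n\in\mathbb{N}}$.

(b) We have the decomposition

\[ \mathbb{R} = \mathbb{R}^+ \,\cup\, \{0\} \,\cup\, \mathbb{R}^-, \quad \mathbb{R}^- := -\mathbb{R}^+ = \{x \in \mathbb{R} : -x \in \mathbb{R}^+\}. \tag{D.68} \]

Proof. (a): Let $\epsilon \in \mathbb{Q}^+$ be such that $r_n > \epsilon$ for almost all $n \in \mathbb{N}$. If $(s_n)_{n\in\mathbb{N}} \in \mathcal{S}$ with $\lim_{n\to\infty}(r_n - s_n) = 0$, then $|r_n - s_n| < \epsilon/2$ for almost all $n \in \mathbb{N}$. Thus, since $s_n \ge r_n - |r_n - s_n|$, we obtain $s_n > \epsilon/2$ for almost all $n \in \mathbb{N}$, i.e. $\#\{n \in \mathbb{N} : s_n \le \frac{\epsilon}{2}\} \in \mathbb{N}_0$.

(b) is an immediate consequence of Prop. D.34. □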

Definition D.38. For each $x, y \in \mathbb{R}$, let

\[ y \le x \;:\Leftrightarrow\; x - y \in \mathbb{R}_0^+ := \mathbb{R}^+ \cup \{0\}. \tag{D.69} \]

Theorem D.39. (a) The relation defined in (D.69) constitutes a total order on $\mathbb{R}$ that is compatible with addition and multiplication, i.e. it satisfies (4.3); in other words, $(\mathbb{R}, +, \cdot, \le)$ constitutes a totally ordered field.

(b) The map $\iota$ from (D.61) is strictly increasing.

Proof. (a) follows from (D.69), (D.68), and Th. D.8, once we have shown that $\mathbb{R}^+$ is closed under addition and multiplication. Let $(r_n)_{n\in\mathbb{N}} \in \mathcal{S}$, $(s_n)_{n\in\mathbb{N}} \in \mathcal{S}$. If $r_n > \epsilon_1 \in \mathbb{Q}^+$ for almost all $n \in \mathbb{N}$ and $s_n > \epsilon_2 \in \mathbb{Q}^+$ for almost all $n \in \mathbb{N}$, then $r_n + s_n > \epsilon_1 + \epsilon_2$ for almost all $n \in \mathbb{N}$, showing $\mathbb{R}^+$ is closed under addition. Moreover, $r_n s_n > \epsilon_1 \epsilon_2$ for almost all $n \in \mathbb{N}$, showing $\mathbb{R}^+$ is closed under multiplication.

(b): According to Def. D.39, if $r, s \in \mathbb{Q}$ with $s < r$, then $q := r - s \in \mathbb{Q}^+$. In consequence, $\iota(r) = \iota(s) + \iota(q)$ by (D.62a), i.e. $\iota(r) - \iota(s) = \iota(q) = [(q, q, \dots)] \in \mathbb{R}^+$, proving $\iota(s) < \iota(r)$. □

Finally, we will show in Th. D.41 below that the order ≤ on R is complete. However, we first need some additional auxiliary results.

Proposition D.40. (a) For each x ∈ R, there is (rn)n∈N ∈ S satisfying limn→∞ rn = x.

(b) Every (rn)n∈N ∈ S converges in R – more precisely, limn→∞ rn = [(rn)n∈N].

(c) Every Cauchy sequence in R converges in R.

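Proof. (a) and (b): If $x = [(r_n)_{n\in\mathbb{N}}]$ with $(r_n)_{n\in\mathbb{N}} \in \mathcal{S}$, then, given $\epsilon > 0$, choose $N \in \mathbb{N}$ such that, for each $m, n > N$, one has $|r_n - r_m| < \epsilon/2$. Then, for each $k > N$, one has $|x - r_k| = |[(r_n - r_k)_{n\in\mathbb{N}}]| < \epsilon$, since $|r_n - r_k| < \epsilon/2$ for all $n \ge k$, showing $\lim_{n\to\infty} r_n = x$.

(c): Let $(x_n)_{n\in\mathbb{N}}$ be a Cauchy sequence in $\mathbb{R}$. According to (a), for each $n \in \mathbb{N}$, there exists $r_n \in \mathbb{Q}$ such that $|x_n - r_n| < \frac{1}{n}$. Then $(r_n)_{n\in\mathbb{N}}$ is a Cauchy sequence: Given $\epsilon > 0$, choose $k \in \mathbb{N}$ such that $\frac{1}{k} < \frac{\epsilon}{3}$ and $|x_n - x_m| < \frac{\epsilon}{3}$ for each $n, m > k$. Then

\[ \underset{n,m>k}{\forall}\quad |r_n - r_m| \le |r_n - x_n| + |x_n - x_m| + |x_m - r_m| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon, \tag{D.70} \]

showing $(r_n)_{n\in\mathbb{N}}$ is Cauchy. Thus, from (b), we obtain $x \in \mathbb{R}$ with $\lim_{n\to\infty} r_n = x$. We can now show $\lim_{n\to\infty} x_n = x$ as well: Given $\epsilon > 0$, choose $N \in \mathbb{N}$ such that $\frac{1}{N} < \frac{\epsilon}{2}$ and $|x - r_n| < \frac{\epsilon}{2}$ for each $n > N$. Then

\[ \underset{n>N}{\forall}\quad |x - x_n| \le |x - r_n| + |r_n - x_n| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \tag{D.71} \]

showing $\lim_{n\to\infty} x_n = x$ and completing the proof. □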

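Theorem D.41. The order ≤ on $\mathbb{R}$ is complete, i.e. $(\mathbb{R}, +, \cdot, \le)$ constitutes a complete totally ordered field.

Proof. Let $\emptyset \ne A \subseteq \mathbb{R}$ and let $M \in \mathbb{R}$ be an upper bound for $A$. We have to show that $A$ has a supremum in $\mathbb{R}$. To this end, we recursively construct two Cauchy sequences $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ in $\mathbb{R}$ such that $(x_n)_{n\in\mathbb{N}}$ is increasing, $(y_n)_{n\in\mathbb{N}}$ is decreasing, $x_n \le y_n$, and $\lim_{n\to\infty}(y_n - x_n) = 0$. Let $x_1 \in A$ be arbitrary and $y_1 := M$. Define

\[ \underset{n\in\mathbb{N}}{\forall}\quad x_{n+1} := \begin{cases} (x_n + y_n)/2 & \text{if } (x_n + y_n)/2 \text{ is not an upper bound for } A,\\ x_n & \text{otherwise}, \end{cases} \qquad y_{n+1} := \begin{cases} (x_n + y_n)/2 & \text{if } (x_n + y_n)/2 \text{ is an upper bound for } A,\\ y_n & \text{otherwise}. \end{cases} \tag{D.72} \]

Then, clearly, the $x_n$ are increasing, the $y_n$ are decreasing, and $x_n \le y_n$ holds for each $n \in \mathbb{N}$. Moreover, letting $d := M - x_1 \ge 0$, a simple induction shows $y_n - x_n = d/2^{n-1}$ and, thus, $\lim_{n\to\infty}(y_n - x_n) = 0$. Also, for $m > n$,

\[ x_m - x_n = \sum_{i=n}^{m-1}(x_{i+1} - x_i) \le d \sum_{i=n}^{m-1} 2^{-i} = \frac{d}{2^n} \sum_{i=n}^{m-1} 2^{-i+n} = \frac{d}{2^n} \sum_{i=0}^{m-1-n} 2^{-i} \le \frac{2d}{2^n}, \tag{D.73} \]

showing $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence. Analogously, one sees that $(y_n)_{n\in\mathbb{N}}$ is a Cauchy sequence. By Prop. D.40(c), we obtain $s \in \mathbb{R}$ such that $s = \lim_{n\to\infty} x_n = \lim_{n\to\infty}(y_n - x_n + x_n) = \lim_{n\to\infty} y_n$. We claim $s = \sup A$. If $s < y$, then there is $n \in \mathbb{N}$ with $s \le y_n < y$, showing $y \notin A$, i.e. $s$ is an upper bound for $A$. If $y < s$, then there is $n \in \mathbb{N}$ with $y < x_n \le s$, showing $y$ is not an upper bound for $A$. Thus, $s$ is the smallest upper bound for $A$, i.e. $s = \sup A$. □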
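The bisection recursion (D.72) can be tried out numerically. The following is only a minimal sketch under stated assumptions: exact rationals (`fractions.Fraction`) stand in for real numbers, and the names `bisect_supremum` and `is_upper_bound_of_A`, as well as the example set A = {q ∈ Q : q > 0, q² < 2}, are hypothetical choices made for illustration and are not part of the notes.

```python
from fractions import Fraction


def bisect_supremum(is_upper_bound, x1, M, steps):
    """Run the recursion (D.72) for `steps` steps.

    is_upper_bound(q) must decide whether q is an upper bound for A;
    x1 is an element of A, M an upper bound for A.
    Returns the lists [x_1, ..., x_{steps+1}] and [y_1, ..., y_{steps+1}].
    """
    xs, ys = [Fraction(x1)], [Fraction(M)]
    for _ in range(steps):
        mid = (xs[-1] + ys[-1]) / 2
        if is_upper_bound(mid):
            # midpoint is an upper bound: shrink from the right
            xs.append(xs[-1])
            ys.append(mid)
        else:
            # midpoint is not an upper bound: shrink from the left
            xs.append(mid)
            ys.append(ys[-1])
    return xs, ys


# Hypothetical example: A = {q in Q : q > 0 and q*q < 2}.
# A positive rational q is an upper bound for A precisely if q*q >= 2.
def is_upper_bound_of_A(q):
    return q > 0 and q * q >= 2


xs, ys = bisect_supremum(is_upper_bound_of_A, 1, 2, 30)
print(float(xs[-1]), float(ys[-1]))  # both values enclose sqrt(2)
```

In this sketch, the gap `ys[n] - xs[n]` halves in every step, mirroring the identity y_n − x_n = d/2^{n−1} used in the proof.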

D.6 Uniqueness

We will show in Th. D.45 below that, up to a unique isomorphism, $\mathbb{R}$ is the only complete totally ordered field.

Notation D.42. Let $(A, +, \cdot, \le)$ be a complete totally ordered field. We denote the neutral elements with respect to $+$ and $\cdot$ by $0_A$ and $1_A$, respectively. We recursively define $(n+1)_A := n_A + 1_A$ for each $n \in \mathbb{N}$. Then $\mathbb{N}_A := \{n_A : n \in \mathbb{N}\}$, $\mathbb{Z}_A := \mathbb{N}_A \cup \{0_A\} \cup \{-n_A : n \in \mathbb{N}\}$, $\mathbb{Q}_A := \{0_A\} \cup \{\frac{k}{l} : k, l \in \mathbb{Z}_A \setminus \{0_A\}\}$.

Proposition D.43. Let $(A, +, \cdot, \le)$ and $(B, +, \cdot, \le)$ be complete totally ordered fields. Moreover, let $\phi : A \longrightarrow B$ be a field isomorphism, i.e. a bijective map, satisfying

\[ \underset{x,y\in A}{\forall}\quad \phi(x + y) = \phi(x) + \phi(y), \tag{D.74a} \]

\[ \underset{x,y\in A}{\forall}\quad \phi(xy) = \phi(x)\,\phi(y). \tag{D.74b} \]

(a) $\phi(n_A) = n_B$ holds for each $n \in \mathbb{N}$.

(b) $\phi$ is strictly isotone, i.e.

\[ \underset{x,y\in A}{\forall}\quad \bigl( x < y \;\Rightarrow\; \phi(x) < \phi(y) \bigr). \tag{D.75} \]

Proof. (a): As (D.74a) and (D.74b) state $\phi$ to be a group homomorphism with respect to addition and multiplication, respectively, Prop. C.4(a) yields $\phi(0_A) = 0_B$ and $\phi(1_A) = 1_B$. If $n \in \mathbb{N}$, then (D.74a) implies $\phi(n_A + 1_A) = \phi(n_A) + 1_B$ and, thus, an induction shows $\phi(n_A) = n_B$ for each $n \in \mathbb{N}$.

(b): If $x, y \in A$ with $x < y$, then, by Rem. and Def. 7.61, there exists a unique $z \in A$, $z > 0_A$, such that $z^2 = y - x$. Thus, $(\phi(z))^2 = \phi(y) - \phi(x)$. As $\phi$ is injective, $\phi(z) \ne 0_B$. By Th. 4.5(c), we have $(\phi(z))^2 > 0_B$ and, thus, $\phi(x) < \phi(y)$, proving the strict isotonicity of $\phi$. □

Proposition D.44. Let $(A, +, \cdot, \le)$ be a complete totally ordered field. If $\phi : A \longrightarrow A$ is a (field) automorphism, i.e. a bijective map, satisfying (D.74), then $\phi$ is the identity on $A$.

Proof. From Prop. D.43(a), we already know $\phi(n) = n$ for each $n \in \mathbb{N}_A$. Next, if $n \in \mathbb{N}_A$, then, using Prop. C.4(b), we obtain $\phi(-n) = -\phi(n) = -n$, showing $\phi(k) = k$ for each $k \in \mathbb{Z}_A$. If $k, l \in \mathbb{Z}_A \setminus \{0_A\}$, then $\phi(k/l) = \phi(k \cdot l^{-1}) = \phi(k) \cdot (\phi(l))^{-1} = k\,l^{-1}$, where Prop. C.4(b) was used again. Thus, we already have $\phi(q) = q$ for each $q \in \mathbb{Q}_A$. From Prop. D.43(b), we know $\phi$ to be strictly isotone. Finally, if $x \in A$, then, by Th. 7.68(c), there exist sequences $(r_n)_{n\in\mathbb{N}}$ in $\mathbb{Q}_A$ and $(s_n)_{n\in\mathbb{N}}$ in $\mathbb{Q}_A$ such that $(r_n)_{n\in\mathbb{N}}$ is strictly increasing, $(s_n)_{n\in\mathbb{N}}$ is strictly decreasing, and

\[ \lim_{n\to\infty} r_n = \lim_{n\to\infty} s_n = x. \]

As $\phi$ is isotone, we obtain

\[ \underset{n\in\mathbb{N}}{\forall}\quad \phi(r_n) = r_n \le \phi(x) \le s_n = \phi(s_n). \]

But then $\phi(x) = x$ follows from Th. 7.16, proving $\phi = \mathrm{Id}_A$. □

Theorem D.45. Let $(A, +, \cdot, \le)$ and $(B, +, \cdot, \le)$ be complete totally ordered fields. Then there exists a unique isomorphism $\phi : A \longrightarrow B$, i.e. a unique bijective $\phi : A \longrightarrow B$, satisfying (D.74a), (D.74b), and (D.75).

Proof. Uniqueness: Suppose $\phi : A \longrightarrow B$ and $\psi : A \longrightarrow B$ are both isomorphisms. Then, according to Prop. D.44, $\phi^{-1} \circ \psi = \mathrm{Id}_A$, where Prop. C.4(c) was used as well. However, this already shows $\psi = \phi$.

Existence: Due to Prop. D.43(b), it suffices to show there exists a bijective $\phi : A \longrightarrow B$, satisfying (D.74). We define $\phi$ and verify (D.74) in several steps. In the first step, set

\[ \underset{n\in\mathbb{N}_0}{\forall}\quad \phi(n_A) := n_B. \tag{D.76} \]

Then $\phi : \mathbb{N}_A \cup \{0_A\} \longrightarrow \mathbb{N}_B \cup \{0_B\}$ is bijective with $\phi^{-1} : \mathbb{N}_B \cup \{0_B\} \longrightarrow \mathbb{N}_A \cup \{0_A\}$, $\phi^{-1}(n_B) = n_A$. We first verify

\[ \underset{n\in\mathbb{N}_0}{\forall}\quad \phi(1_A + n_A) = \phi(1_A) + \phi(n_A): \tag{D.77} \]

Indeed,

\[ \phi(1_A + n_A) = \phi((n+1)_A) = (n+1)_B = 1_B + n_B = \phi(1_A) + \phi(n_A). \]

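Next, we verify

\[ \underset{m,n\in\mathbb{N}_0}{\forall}\quad \phi(m_A + n_A) = \phi(m_A) + \phi(n_A) \tag{D.78} \]

via induction on $m$: The case $m = 0$ holds, due to $\phi(0_A + n_A) = \phi(n_A) = n_B = 0_B + n_B = \phi(0_A) + \phi(n_A)$. The case $m = 1$ holds, due to (D.77). For the induction step, we compute

\begin{align*}
\phi((m+1)_A + n_A) &= \phi(m_A + 1_A + n_A) \overset{\text{ind. hyp.}}{=} \phi(m_A) + \phi(1_A + n_A) \\
&\overset{\text{(D.77)}}{=} \phi(m_A) + \phi(1_A) + \phi(n_A) \overset{\text{(D.77)}}{=} \phi((m+1)_A) + \phi(n_A),
\end{align*}

proving (D.78). We now prove

\[ \underset{m,n\in\mathbb{N}_0}{\forall}\quad \phi(m_A n_A) = \phi(m_A)\,\phi(n_A) \tag{D.79} \]

via induction on $m$: The case $m = 0$ holds, due to

\[ \phi(0_A n_A) = \phi(0_A) = 0_B = \phi(0_A)\,\phi(n_A). \]

For the induction step, we compute

\begin{align*}
\phi((m+1)_A\, n_A) &= \phi((m_A + 1_A)\, n_A) = \phi(m_A n_A + n_A) \overset{\text{(D.78)}}{=} \phi(m_A n_A) + \phi(n_A) \\
&\overset{\text{ind. hyp.}}{=} \phi(m_A)\,\phi(n_A) + \phi(n_A) = \bigl(\phi(m_A) + 1_B\bigr)\,\phi(n_A) \\
&\overset{\text{(D.78)}}{=} \phi(m_A + 1_A)\,\phi(n_A) = \phi((m+1)_A)\,\phi(n_A).
\end{align*}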

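In particular, according to (D.78) and (D.79), (D.74) holds for each $x, y \in \mathbb{N}_A$.

In the second step, set

\[ \underset{n\in\mathbb{N}}{\forall}\quad \phi(-n_A) := -n_B. \tag{D.80} \]

Then $\phi : \mathbb{Z}_A \longrightarrow \mathbb{Z}_B$ is still bijective, where, for each $n \in \mathbb{N}$, we have $\phi^{-1}(-n_B) = -n_A$. Let $m, n \in \mathbb{N}_0$. If $m \le n$, then

\[ \phi(-m_A + n_A) = \phi((n-m)_A) = (n-m)_B = -m_B + n_B = \phi(-m_A) + \phi(n_A). \]

If $m > n$, then

\[ \phi(-m_A + n_A) = \phi(-(m-n)_A) = -(m-n)_B = -m_B + n_B = \phi(-m_A) + \phi(n_A). \]

Now, for arbitrary $m, n \in \mathbb{N}_0$, $\phi(m_A + (-n_A)) = \phi(-n_A + m_A) = \phi(-n_A) + \phi(m_A) = \phi(m_A) + \phi(-n_A)$ and $\phi(-m_A + (-n_A)) = \phi(-(m_A + n_A)) = -\phi(m_A + n_A) = -(\phi(m_A) + \phi(n_A)) = -\phi(m_A) - \phi(n_A) = \phi(-m_A) + \phi(-n_A)$. We now consider multiplication, still for $m, n \in \mathbb{N}_0$:

\[ \phi((-m_A)\, n_A) = -\phi(m_A n_A) \overset{\text{(D.79)}}{=} -\phi(m_A)\,\phi(n_A) = \phi(-m_A)\,\phi(n_A). \]

Then $\phi(m_A (-n_A)) = \phi(m_A)\,\phi(-n_A)$ also follows and

\[ \phi((-m_A)(-n_A)) = \phi(m_A n_A) \overset{\text{(D.79)}}{=} \phi(m_A)\,\phi(n_A) = \phi(-m_A)\,\phi(-n_A). \]

Thus, we have verified (D.74) for each $x, y \in \mathbb{Z}_A$.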

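In the third step, set

\[ \underset{k,l\in\mathbb{Z}_A\setminus\{0_A\}}{\forall}\quad \phi(k/l) := \phi(k)/\phi(l). \tag{D.81} \]

We verify that (D.81) well-defines $\phi$ for each $q \in \mathbb{Q}_A$: If $m, n, k, l \in \mathbb{Z}_A$ with $n, l \ne 0_A$, then $m/n = k/l$ implies $ml = kn$ and $\phi(m)\phi(l) = \phi(ml) = \phi(kn) = \phi(k)\phi(n)$. Thus,

\[ \phi(m/n) = \phi(m)/\phi(n) = \phi(k)/\phi(l) = \phi(k/l). \]

We show $\phi : \mathbb{Q}_A \longrightarrow \mathbb{Q}_B$ to be bijective by providing the inverse map: Define $\psi : \mathbb{Q}_B \longrightarrow \mathbb{Q}_A$ by setting

\[ \underset{k,l\in\mathbb{Z}_B\setminus\{0_B\}}{\forall}\quad \psi(k/l) := \phi^{-1}(k)/\phi^{-1}(l). \]

We claim that $\psi = \phi^{-1}$ on $\mathbb{Q}_B$: Indeed, for each $k, l \in \mathbb{Z}_A \setminus \{0_A\}$ and for each $m, n \in \mathbb{Z}_B \setminus \{0_B\}$,

\[ \psi(\phi(k/l)) = \psi\bigl(\phi(k)/\phi(l)\bigr) = \phi^{-1}(\phi(k))/\phi^{-1}(\phi(l)) = k/l, \]

\[ \phi(\psi(m/n)) = \phi\bigl(\phi^{-1}(m)/\phi^{-1}(n)\bigr) = \phi(\phi^{-1}(m))/\phi(\phi^{-1}(n)) = m/n. \]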

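Moreover, we have

\begin{align*}
\phi\left(\frac{m}{n} + \frac{k}{l}\right) &= \phi\left(\frac{ml + kn}{nl}\right) = \frac{\phi(ml + kn)}{\phi(nl)} \overset{\text{(D.74) for } \mathbb{Z}_A}{=} \frac{\phi(m)\phi(l) + \phi(k)\phi(n)}{\phi(n)\phi(l)} \\
&= \frac{\phi(m)}{\phi(n)} + \frac{\phi(k)}{\phi(l)} = \phi\left(\frac{m}{n}\right) + \phi\left(\frac{k}{l}\right)
\end{align*}

and

\begin{align*}
\phi\left(\frac{m}{n} \cdot \frac{k}{l}\right) &= \phi\left(\frac{mk}{nl}\right) = \frac{\phi(mk)}{\phi(nl)} \overset{\text{(D.74) for } \mathbb{Z}_A}{=} \frac{\phi(m)\phi(k)}{\phi(n)\phi(l)} \\
&= \frac{\phi(m)}{\phi(n)} \cdot \frac{\phi(k)}{\phi(l)} = \phi\left(\frac{m}{n}\right) \cdot \phi\left(\frac{k}{l}\right).
\end{align*}

Thus, we have verified (D.74) for each $x, y \in \mathbb{Q}_A$.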

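We now show $\phi : \mathbb{Q}_A \longrightarrow \mathbb{Q}_B$ to be strictly isotone, i.e.

\[ \underset{r,s\in\mathbb{Q}_A}{\forall}\quad \bigl( r < s \;\Rightarrow\; \phi(r) < \phi(s) \bigr): \tag{D.82} \]

Let $r, s \in \mathbb{Q}_A$ such that $r < s$. Then $d := s - r > 0_A$, i.e. there are $m, n \in \mathbb{N}_A$ satisfying $d = \frac{m}{n}$. Then $\phi(s) - \phi(r) = \phi(d) = \frac{\phi(m)}{\phi(n)} > 0_B$, proving $\phi(r) < \phi(s)$.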

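In the fourth (and last) step, for each $x \in A$, we choose a sequence $(r_n)_{n\in\mathbb{N}}$ in $\mathbb{Q}_A$ such that $x = \lim_{n\to\infty} r_n$ and set

\[ \phi(x) := \lim_{n\to\infty} \phi(r_n). \tag{D.83} \]

To show that $\phi$ is well-defined by (D.83), we have to verify that $(\phi(r_n))_{n\in\mathbb{N}}$ does, indeed, converge in $B$, and that $\phi(x)$ does not depend on the chosen sequence $(r_n)_{n\in\mathbb{N}}$. As $(r_n)_{n\in\mathbb{N}}$ converges to $x$, it has to be a Cauchy sequence by Th. 7.29. We show that $(\phi(r_n))_{n\in\mathbb{N}}$ must be a Cauchy sequence as well: Let $\epsilon \in B$, $\epsilon > 0_B$, and choose $\tilde\epsilon \in \mathbb{Q}_B$ such that $0_B < \tilde\epsilon < \epsilon$. Then

\[ \underset{N\in\mathbb{N}}{\exists}\ \underset{n,m>N}{\forall}\quad |r_n - r_m| < \phi^{-1}(\tilde\epsilon). \]

As $\phi$ is strictly isotone, we obtain

\[ \underset{n,m>N}{\forall}\quad |\phi(r_n) - \phi(r_m)| < \tilde\epsilon < \epsilon, \]

proving $(\phi(r_n))_{n\in\mathbb{N}}$ to be a Cauchy sequence. Now Th. 7.29 implies the convergence of $(\phi(r_n))_{n\in\mathbb{N}}$. Next, we show $\lim_{n\to\infty} r_n = 0_A$ implies $\lim_{n\to\infty} \phi(r_n) = 0_B$ for each sequence in $\mathbb{Q}_A$: Indeed, as above, let $\epsilon > 0_B$ and choose $\tilde\epsilon \in \mathbb{Q}_B$ such that $0_B < \tilde\epsilon < \epsilon$. Then

\[ \underset{N\in\mathbb{N}}{\exists}\ \underset{n>N}{\forall}\quad |r_n| < \phi^{-1}(\tilde\epsilon). \]

As $\phi$ is strictly isotone, we obtain

\[ \underset{n>N}{\forall}\quad |\phi(r_n)| < \tilde\epsilon < \epsilon, \]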

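proving $\lim_{n\to\infty} \phi(r_n) = 0_B$. Thus, if $(r_n)_{n\in\mathbb{N}}$ and $(s_n)_{n\in\mathbb{N}}$ are sequences in $\mathbb{Q}_A$ such that $\lim_{n\to\infty} r_n = x = \lim_{n\to\infty} s_n$, then

\[ \lim_{n\to\infty} \phi(r_n) = \lim_{n\to\infty} \phi(r_n - s_n + s_n) = 0_B + \lim_{n\to\infty} \phi(s_n) = \lim_{n\to\infty} \phi(s_n), \]

showing $\phi$ to be well-defined by (D.83). To see that $\phi$ is injective, let $x, y \in A$ with $x < y$ and choose $r, s \in \mathbb{Q}_A$ such that $x < r < s < y$. If $(r_n)_{n\in\mathbb{N}}$ and $(s_n)_{n\in\mathbb{N}}$ are sequences in $\mathbb{Q}_A$ such that $x = \lim_{n\to\infty} r_n$ and $y = \lim_{n\to\infty} s_n$, then

\[ \underset{N\in\mathbb{N}}{\exists}\ \underset{n>N}{\forall}\quad \bigl( r_n < r < s < s_n \bigr). \]

Thus, as $\phi$ is strictly isotone on $\mathbb{Q}_A$,

\[ \underset{n>N}{\forall}\quad \bigl( \phi(r_n) < \phi(r) < \phi(s) < \phi(s_n) \bigr), \]

showing $\phi(x) \ne \phi(y)$ and the injectivity of $\phi$. To see that $\phi$ is surjective, let $b \in B$ and let $(r_n)_{n\in\mathbb{N}}$ be an increasing sequence in $\mathbb{Q}_B$ such that $b = \lim_{n\to\infty} r_n$. Then $(\phi^{-1}(r_n))_{n\in\mathbb{N}}$ is an increasing sequence in $\mathbb{Q}_A$ that is bounded, i.e. it must converge to some $a \in A$. Then

\[ \phi(a) = \lim_{n\to\infty} \phi(\phi^{-1}(r_n)) = \lim_{n\to\infty} r_n = b, \]

showing $\phi$ to be surjective. Finally, if $x, y \in A$, then let $(r_n)_{n\in\mathbb{N}}$ and $(s_n)_{n\in\mathbb{N}}$ be sequences in $\mathbb{Q}_A$ such that $x = \lim_{n\to\infty} r_n$ and $y = \lim_{n\to\infty} s_n$. Then

\[ \phi(x + y) = \lim_{n\to\infty} \phi(r_n + s_n) = \lim_{n\to\infty} \phi(r_n) + \lim_{n\to\infty} \phi(s_n) = \phi(x) + \phi(y) \]

and

\[ \phi(xy) = \lim_{n\to\infty} \phi(r_n s_n) = \lim_{n\to\infty} \phi(r_n)\, \lim_{n\to\infty} \phi(s_n) = \phi(x)\,\phi(y). \]

Thus, we have verified (D.74) for each $x, y \in A$, and, thereby, completed the proof. □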

E Series: Additional Material

E.1 Riemann Rearrangement Theorem

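Here, we provide the details for the proof of the Riemann rearrangement Th. 7.93, which was merely sketched in Sec. 7.3.3.

Proof of Th. 7.93. As already stated in the sketch, we define

\[ \underset{k\in\mathbb{N}}{\forall}\quad x_k := \begin{cases} -k & \text{for } x = -\infty,\\ x & \text{for } x \in \mathbb{R},\\ k & \text{for } x = \infty, \end{cases} \qquad y_k := \begin{cases} -k & \text{for } y = -\infty,\\ y & \text{for } y \in \mathbb{R},\\ k & \text{for } y = \infty, \end{cases} \tag{E.1} \]

noting $x_k \le y_k$ for almost all $k \in \mathbb{N}$. Next, we observe

\[ \mathbb{N} = I^+ \cup I^-, \quad\text{where} \tag{E.2a} \]

\[ I^+ := \{j \in \mathbb{N} : a_j \ge 0\}, \tag{E.2b} \]

\[ I^- := \{j \in \mathbb{N} : a_j < 0\}. \tag{E.2c} \]

We have to define a suitable bijective map $\phi : \mathbb{N} \longrightarrow \mathbb{N}$ such that

\[ \underset{j\in\mathbb{N}}{\forall}\quad b_j := a_{\phi(j)}, \tag{E.3a} \]

\[ \underset{n\in\mathbb{N}}{\forall}\quad t_n := \sum_{j=1}^{n} b_j. \tag{E.3b} \]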

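The definition of $\phi$ will be recursive, and we will also need to recursively define an auxiliary sequence $(\sigma_j)_{j\in\mathbb{N}}$ taking values in $\{-1, 1\}$, serving as an accounting tool to keep track of whether we are in the process of moving right (i.e. adding $a_j^+$) or moving left (i.e. subtracting $a_j^-$). Moreover, we need a recursively defined auxiliary function $\kappa : \mathbb{N} \longrightarrow \mathbb{N}$ to update the left and right boundaries $x_k$ and $y_k$, respectively, to handle the first and third case of (E.1) if need be. The recursion is initialized by

\[ \phi(1) := 1, \tag{E.4a} \]

\[ \sigma_1 := \begin{cases} 1 & \text{if } t_1 \le y_1,\\ -1 & \text{if } t_1 > y_1, \end{cases} \tag{E.4b} \]

\[ \kappa(1) := \begin{cases} 1 & \text{if } t_1 \le y_1,\\ 2 & \text{if } t_1 > y_1, \end{cases} \tag{E.4c} \]

and completed by

\[ \underset{j>1}{\forall}\quad \phi(j) := \begin{cases} \min\bigl(I^+ \setminus \phi\{1, \dots, j-1\}\bigr) & \text{if } \sigma_{j-1} = 1,\\ \min\bigl(I^- \setminus \phi\{1, \dots, j-1\}\bigr) & \text{if } \sigma_{j-1} = -1, \end{cases} \tag{E.5a} \]

\[ \underset{j>1}{\forall}\quad \sigma_j := \begin{cases} 1 & \text{if } \sigma_{j-1} = 1 \text{ and } t_j \le y_{\kappa(j-1)},\\ -1 & \text{if } \sigma_{j-1} = 1 \text{ and } t_j > y_{\kappa(j-1)},\\ -1 & \text{if } \sigma_{j-1} = -1 \text{ and } t_j \ge x_{\kappa(j-1)},\\ 1 & \text{if } \sigma_{j-1} = -1 \text{ and } t_j < x_{\kappa(j-1)}, \end{cases} \tag{E.5b} \]

\[ \underset{j>1}{\forall}\quad \kappa(j) := \begin{cases} \kappa(j-1) & \text{if } \sigma_{j-1} = 1 \text{ and } t_j \le y_{\kappa(j-1)},\\ 1 + \kappa(j-1) & \text{if } \sigma_{j-1} = 1 \text{ and } t_j > y_{\kappa(j-1)},\\ \kappa(j-1) & \text{if } \sigma_{j-1} = -1 \text{ and } t_j \ge x_{\kappa(j-1)},\\ 1 + \kappa(j-1) & \text{if } \sigma_{j-1} = -1 \text{ and } t_j < x_{\kappa(j-1)}. \end{cases} \tag{E.5c} \]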

We note that $\phi$ is well-defined, since, according to (7.86), both $I^+$ and $I^-$ must have infinitely many elements. Moreover, $\phi$ is injective, since, for $j_1 < j_2$, $\phi(j_2) \ne \phi(j_1)$ is immediate from (E.5a). Finally, $\phi$ is also surjective: Otherwise, there is a smallest $n \in \mathbb{N} \setminus \{1\}$ such that $n \notin \phi(\mathbb{N})$. Suppose $n \in I^+$. Then, according to (E.5a), there must be $j_0 \in \mathbb{N}$ such that $\sigma_j = -1$ for every $j > j_0$, i.e., according to (E.5b) and (E.5c), $t_j \ge x_{\kappa(j_0)} \in \mathbb{R}$ for each $j > j_0$, which is in contradiction to the $\sum_{j=1}^{\infty} a_j^- = \infty$ part of (7.86). Analogously, $n \in I^-$ leads to a contradiction to the $\sum_{j=1}^{\infty} a_j^+ = \infty$ part of (7.86), completing the proof of surjectivity of $\phi$. So we have shown that $\sum_{j=1}^{\infty} b_j$ is a rearrangement of $\sum_{j=1}^{\infty} a_j$ as desired. We still need to verify that $\sum_{j=1}^{\infty} b_j$ (i.e. $(t_n)_{n\in\mathbb{N}}$) has precisely all elements of $[x, y]$ as cluster points. To this end, first note that, due to (7.86) and (E.1), $\lim_{j\to\infty} x_{\kappa(j)} = -\infty$ holds if, and only if, $x = -\infty$; and $\lim_{j\to\infty} x_{\kappa(j)} = \infty$ holds if, and only if, $x = \infty$; and likewise for the $y_{\kappa(j)}$ and $y$. If $x = -\infty$, then $\lim_{j\to\infty} x_{\kappa(j)} = -\infty$ and the bijectivity of $\phi$ together with (E.5b) and (E.5c) implies

\[ \underset{N\in\mathbb{N}}{\forall}\ \underset{j\in\mathbb{N}}{\exists}\quad t_j < x_{\kappa(j-1)} \le -N, \tag{E.6} \]

showing $-\infty$ is a cluster point of $(t_n)_{n\in\mathbb{N}}$. Analogously, if $y = \infty$, then $\lim_{j\to\infty} y_{\kappa(j)} = \infty$ and the bijectivity of $\phi$ together with (E.5b) and (E.5c) implies

\[ \underset{N\in\mathbb{N}}{\forall}\ \underset{j\in\mathbb{N}}{\exists}\quad t_j > y_{\kappa(j-1)} \ge N, \tag{E.7} \]

showing $\infty$ is a cluster point of $(t_n)_{n\in\mathbb{N}}$. Now let $\xi \in [x, y] \cap \mathbb{R}$ and $\epsilon > 0$. Due to $\lim_{j\to\infty} a_j^+ = \lim_{j\to\infty} a_j^- = 0$, we have

\[ \underset{N\in\mathbb{N}}{\exists}\ \underset{j>N}{\forall}\quad |t_j - t_{j-1}| < \epsilon. \tag{E.8} \]

Due to the bijectivity of $\phi$ together with (E.5b) and (E.5c), for each $j_0 \in \mathbb{N}$, there exists $j > \max\{j_0, N\}$ such that $t_{j-1} \le \xi \le t_j$, showing $\xi$ is a cluster point of $(t_n)_{n\in\mathbb{N}}$. On the other hand, if $\xi \in\, ]-\infty, x[$, then $x \ne -\infty$. If $x = \infty$, then $\lim_{j\to\infty} t_j = \infty$ and $\xi$ is not a cluster point of $(t_n)_{n\in\mathbb{N}}$. If $\xi < x < \infty$, then let $\epsilon := (x - \xi)/2$ and choose $N$ as in (E.8). Then, by (E.5b) and (E.5c), for each $j > N$, $t_j > x - \epsilon = \xi + \epsilon$, showing $\xi$ is not a cluster point of $(t_n)_{n\in\mathbb{N}}$. Analogously, one sees that $\xi \in\, ]y, \infty[$ cannot be a cluster point of $(t_n)_{n\in\mathbb{N}}$. □
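The recursion (E.4), (E.5) can be sketched in code for the special case of finite bounds x ≤ y, where x_k = x and y_k = y for every k, so that κ is not needed. The following is an illustration only: the function name `rearrange`, the use of floating-point numbers, and the alternating harmonic series as test input are assumptions made here and do not come from the notes.

```python
def rearrange(a, x, y, n_terms):
    """Compute phi(1..n_terms) and the partial sums t_1..t_{n_terms}
    following (E.4) and (E.5) for real (finite) bounds x <= y.

    a is a function j -> a_j for j = 1, 2, ...
    """
    used = {1}                      # indices already in the image of phi
    phi = [1]                       # (E.4a): phi(1) := 1
    total = a(1)
    sums = [total]
    sigma = 1 if total <= y else -1  # (E.4b)
    cursor = {1: 1, -1: 1}          # where to continue searching I+ / I-
    for _ in range(n_terms - 1):
        # (E.5a): smallest unused index in I+ (sigma = 1) or I- (sigma = -1)
        j = cursor[sigma]
        while j in used or (a(j) >= 0) != (sigma == 1):
            j += 1
        cursor[sigma] = j + 1
        used.add(j)
        phi.append(j)
        total += a(j)
        sums.append(total)
        # (E.5b): switch direction after crossing y (moving right)
        # or after crossing x (moving left)
        if sigma == 1 and total > y:
            sigma = -1
        elif sigma == -1 and total < x:
            sigma = 1
    return phi, sums


# Assumed test input: the alternating harmonic series a_j = (-1)^(j+1) / j.
def alt_harmonic(j):
    return (-1) ** (j + 1) / j


phi, sums = rearrange(alt_harmonic, 0.5, 1.0, 20000)
print(phi[:5])
print(min(sums[1000:]), max(sums[1000:]))
```

For this input, the later partial sums keep sweeping back and forth across the interval [0.5, 1], overshooting each end only by the size of the last term added, which is the mechanism behind the cluster point argument above.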

E.2 b-Adic Representations of Real Numbers

The main goal of this section is to provide a proof of Th. 7.99. We begin with some preparatory lemmas.

Lemma E.1. Given a natural number $b \ge 2$, consider the $b$-adic series given by (7.96). Then

\[ \sum_{\nu=0}^{\infty} d_{N-\nu}\, b^{N-\nu} \le b^{N+1}, \tag{E.9} \]

and, in particular, the $b$-adic series converges to some $x \in \mathbb{R}_0^+$. Moreover, equality in (E.9) holds if, and only if, $d_n = b - 1$ for every $n \in \{N, N-1, N-2, \dots\}$.

Proof. One estimates, using the formula for the value of a geometric series:

\[ \sum_{\nu=0}^{\infty} d_{N-\nu}\, b^{N-\nu} \le \sum_{\nu=0}^{\infty} (b-1)\, b^{N-\nu} = (b-1)\, b^N \sum_{\nu=0}^{\infty} b^{-\nu} = (b-1)\, b^N\, \frac{1}{1 - \frac{1}{b}} = b^{N+1}. \tag{E.10} \]

Note that (E.10) also shows that equality is achieved if all $d_n$ are equal to $b - 1$. Conversely, if there is $n \in \{N, N-1, N-2, \dots\}$ such that $d_n < b - 1$, then there is $n \in \mathbb{N}_0$ such that $d_{N-n} < b - 1$ and one estimates

\[ \sum_{\nu=0}^{\infty} d_{N-\nu}\, b^{N-\nu} < \sum_{\nu=0}^{n-1} d_{N-\nu}\, b^{N-\nu} + (b-1)\, b^{N-n} + \sum_{\nu=n+1}^{\infty} d_{N-\nu}\, b^{N-\nu} \le b^{N+1}, \tag{E.11} \]

showing that the inequality in (E.9) is strict. □

Lemma E.2. Given a natural number $b \ge 2$, consider two $b$-adic series

\[ x := \sum_{\nu=0}^{\infty} d_{N-\nu}\, b^{N-\nu} = \sum_{\nu=0}^{\infty} e_{N-\nu}\, b^{N-\nu}, \tag{E.12} \]

where $N \in \mathbb{Z}$ and $d_n, e_n \in \{0, \dots, b-1\}$ for each $n \in \{N, N-1, N-2, \dots\}$. If $d_N < e_N$, then $e_N = d_N + 1$, $d_n = b - 1$ for each $n < N$, and $e_n = 0$ for each $n < N$.

Proof. By subtracting $d_N b^N$ from both series, one can assume $d_N = 0$ without loss of generality. From Lem. E.1, we know

\[ x = \sum_{\nu=0}^{\infty} d_{N-\nu}\, b^{N-\nu} = \sum_{\nu=0}^{\infty} d_{N-1-\nu}\, b^{N-1-\nu} \le b^N. \tag{E.13a} \]

On the other hand:

\[ x = \sum_{\nu=0}^{\infty} e_{N-\nu}\, b^{N-\nu} \ge b^N. \tag{E.13b} \]

Combining (E.13a) and (E.13b) yields $x = b^N$. Once again employing Lem. E.1, (E.13a) also shows that $d_n = b - 1$ for each $n \le N - 1$ as claimed. Since $e_N > 0$ and $e_n \ge 0$ for each $n$, equality in (E.13b) can only occur for $e_N = 1$ and $e_n = 0$ for each $n < N$, thereby completing the proof of the lemma. □

Notation E.3. For each x ∈ R, we let

⌊x⌋ := max{k ∈ Z : k ≤ x} (E.14)

denote the integral part of x (also called floor of x or x rounded down).

Proof of Th. 7.99. We start by constructing numbers $N$ and $d_n$, $n \in \{N, N-1, N-2, \dots\}$, such that (7.97) holds. For $x = 0$, one chooses an arbitrary $N \in \mathbb{Z}$ and $d_n = 0$ for each $n \in \{N, N-1, N-2, \dots\}$. Thus, for the remainder of the proof, fix $x > 0$. Let

\[ N := \max\{n \in \mathbb{Z} : b^n \le x\}. \tag{E.15} \]

The numbers $d_{N-n} \in \{0, \dots, b-1\}$ and $x_n \in \mathbb{R}^+$, $n \in \mathbb{N}_0$, are defined inductively by letting

\[ d_N := \left\lfloor \frac{x}{b^N} \right\rfloor, \quad x_0 := d_N\, b^N, \tag{E.16a} \]

\[ d_{N-n} := \left\lfloor \frac{x - x_{n-1}}{b^{N-n}} \right\rfloor, \quad x_n := x_{n-1} + d_{N-n}\, b^{N-n} \quad\text{for } n \ge 1. \tag{E.16b} \]

Claim 3. One can verify by induction on $n$ that the numbers $d_{N-n}$ and $x_n$ enjoy the following properties for each $n \in \mathbb{N}_0$:

\[ d_{N-n} \in \{0, \dots, b-1\}, \tag{E.17a} \]

\[ 0 < x_n = \sum_{\nu=0}^{n} d_{N-\nu}\, b^{N-\nu} \le x, \tag{E.17b} \]

\[ x - x_n < b^{N-n}. \tag{E.17c} \]

Proof. The induction is carried out for all three statements of (E.17) simultaneously. From (E.15), we know $b^N \le x < b^{N+1}$, i.e. $1 \le \frac{x}{b^N} < b$. Using (E.16a), this yields $d_N \in \{1, \dots, b-1\}$ and $0 < x_0 = d_N b^N = b^N d_N \le b^N \frac{x}{b^N} = x$ as well as $x - x_0 = x - d_N b^N = b^N\bigl(\frac{x}{b^N} - d_N\bigr) < b^N$. For $n \ge 1$, by induction, one obtains $0 \le x - x_{n-1} < b^{1+N-n}$, i.e. $0 \le \frac{x - x_{n-1}}{b^{N-n}} < b$. Using (E.16b), this yields $d_{N-n} \in \{0, \dots, b-1\}$ and $x_n = x_{n-1} + d_{N-n}\, b^{N-n} \le x_{n-1} + b^{N-n}\, \frac{x - x_{n-1}}{b^{N-n}} = x$. Moreover, by induction, $0 < x_{n-1} = \sum_{\nu=0}^{n-1} d_{N-\nu}\, b^{N-\nu}$, such that (E.16b) implies $x_n = x_{n-1} + d_{N-n}\, b^{N-n} \ge x_{n-1} > 0$ and $x_n = x_{n-1} + d_{N-n}\, b^{N-n} = d_{N-n}\, b^{N-n} + \sum_{\nu=0}^{n-1} d_{N-\nu}\, b^{N-\nu} = \sum_{\nu=0}^{n} d_{N-\nu}\, b^{N-\nu}$. Finally, $x - x_n = x - x_{n-1} - d_{N-n}\, b^{N-n} = b^{N-n}\bigl(\frac{x - x_{n-1}}{b^{N-n}} - d_{N-n}\bigr) < b^{N-n}$, completing the proof of the claim. ▲

Since, for each $n \in \mathbb{N}_0$,

\[ 0 \overset{\text{(E.17b)}}{\le} x - x_n \overset{\text{(E.17c)}}{<} b^{N-n}, \tag{E.18} \]

and $\lim_{n\to\infty} b^{N-n} = 0$, we have $\lim_{n\to\infty} x_n = x$, thereby establishing (7.97).
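The construction (E.15), (E.16) translates directly into a digit-by-digit procedure. The following is a minimal sketch, assuming the input is a positive rational number so that exact arithmetic with `fractions.Fraction` is possible; the function name `b_adic_digits` and its parameters are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def b_adic_digits(x, b, count):
    """Return (N, [d_N, d_{N-1}, ..., d_{N-count+1}]) for rational x > 0,
    following (E.15) and (E.16)."""
    x = Fraction(x)
    if x <= 0:
        raise ValueError("x must be positive")
    base = Fraction(b)
    # (E.15): N := max{n in Z : b**n <= x}
    N = 0
    while base ** (N + 1) <= x:
        N += 1
    while base ** N > x:
        N -= 1
    digits = []
    approx = Fraction(0)            # plays the role of x_{n-1}
    for n in range(count):
        place = base ** (N - n)     # b^(N-n)
        d = (x - approx) // place   # (E.16): floor of (x - x_{n-1}) / b^(N-n)
        digits.append(int(d))
        approx += d * place         # x_n := x_{n-1} + d_{N-n} b^(N-n)
    return N, digits


# Example input chosen for illustration: x = 131 + 2/3.
print(b_adic_digits(Fraction(395, 3), 2, 12))
print(b_adic_digits(Fraction(395, 3), 16, 4))
```

Each computed digit lies in {0, …, b − 1}, as stated in (E.17a), and the running approximation `approx` never exceeds x, as stated in (E.17b).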

It remains to verify the equivalence of (i) – (iv).

“(ii) ⇒ (i)” is trivial.

“(iii) ⇒ (i)”: Assume (iii) holds. Without loss of generality, we can assume that $n_0$ is the largest index such that $d_n = 0$ for each $n \le n_0$. We distinguish two cases. If $n_0 < N - 1$ or $d_N \ne 1$, then

\[ \sum_{\nu=0}^{N-n_0-2} d_{N-\nu}\, b^{N-\nu} + (d_{n_0+1} - 1)\, b^{n_0+1} + \sum_{\nu=N-n_0}^{\infty} (b-1)\, b^{N-\nu} \]

is a different $b$-adic representation of $x$ and its first coefficient is nonzero. If $n_0 = N - 1$ and $d_N = 1$, then

\[ \sum_{\nu=1}^{\infty} (b-1)\, b^{N-\nu} = \sum_{\nu=0}^{\infty} (b-1)\, b^{N-1-\nu} \]

is a different $b$-adic representation of $x$ and its first coefficient is nonzero.

“(iv) ⇒ (i)”: Assume (iv) holds. Without loss of generality, we can assume that $n_0$ is the largest index such that $d_n = b - 1$ for each $n \le n_0$. Then

\[ \sum_{\nu=0}^{N-n_0-2} d_{N-\nu}\, b^{N-\nu} + (d_{n_0+1} + 1)\, b^{n_0+1} + \sum_{\nu=N-n_0}^{\infty} 0 \cdot b^{N-\nu} \]

is a different $b$-adic representation of $x$ and its first coefficient is nonzero.

We will now show that, conversely, (i) implies (ii), (iii), and (iv). To that end, let $x > 0$ and suppose that $x$ has two different $b$-adic representations

\[ x = \sum_{\nu=0}^{\infty} d_{N_1-\nu}\, b^{N_1-\nu} = \sum_{\nu=0}^{\infty} e_{N_2-\nu}\, b^{N_2-\nu} \tag{E.19} \]

with $N_1, N_2 \in \mathbb{Z}$; $d_n, e_n \in \{0, \dots, b-1\}$; and $d_{N_1}, e_{N_2} > 0$. This implies

\[ x \ge b^{N_1}, \quad x \ge b^{N_2}. \tag{E.20a} \]

Moreover, Lem. E.1 yields

\[ x \le b^{N_1+1}, \quad x \le b^{N_2+1}. \tag{E.20b} \]

If $N_2 > N_1$, then (E.20) implies $N_2 = N_1 + 1$ and $b^{N_2} \le x \le b^{N_1+1} = b^{N_2}$, i.e. $x = b^{N_2} = b^{N_1+1}$. Since $e_{N_2} > 0$, one must have $e_{N_2} = 1$, and, in turn, $e_n = 0$ for each $n < N_2$. Moreover, $x = b^{N_1+1}$ and Lem. E.1 imply that $d_n = b - 1$ for each $n \in \{N_1, N_1 - 1, \dots\}$. Thus, for $N_2 > N_1$, the value of $N_1$ is determined by $N_2$ and the values of all $d_n$ and $e_n$ are also completely determined, showing that there are precisely two $b$-adic representations of $x$. Moreover, the $d_n$ have the property required in (iv) and the $e_n$ have the property required in (iii). The argument also shows that, for $N_1 > N_2$, one must have $N_1 = N_2 + 1$ with the $e_n$ taking the values of the $d_n$ and vice versa. Once again, there are precisely two $b$-adic representations of $x$; now the $d_n$ have the property required in (iii) and the $e_n$ have the property required in (iv).

It remains to consider the case $N := N_1 = N_2$. Since, by hypothesis, the two $b$-adic representations of $x$ in (E.19) are not identical, there must be a largest index $n \le N$ such that $d_n \ne e_n$. Thus, (E.19) implies

\[ y := \sum_{\nu=0}^{\infty} d_{n-\nu}\, b^{n-\nu} = \sum_{\nu=0}^{\infty} e_{n-\nu}\, b^{n-\nu}. \tag{E.21} \]

Now Lem. E.2 shows that there are precisely two $b$-adic representations of $x$, one having the property required in (iii) and the other having the property required in (iv).

Thus, in each case ($N_2 > N_1$, $N_1 > N_2$, and $N_1 = N_2$), we find that (i) implies (ii), (iii), and (iv), thereby concluding the proof of the theorem. □

In most cases, it is understood that we work only with decimal representations such that there is no confusion about the meaning of symbol strings like 101.01. However, in general, 101.01 could also be meant with respect to any other base, and the number represented by the same string of symbols does obviously depend on the base used. Thus, when working with different representations, one needs some notation to keep track of the base.

Notation E.4. Given a natural number $b \ge 2$ and finite sequences

\[ (d_{N_1}, d_{N_1-1}, \dots, d_0) \in \{0, \dots, b-1\}^{N_1+1}, \tag{E.22a} \]

\[ (e_1, e_2, \dots, e_{N_2}) \in \{0, \dots, b-1\}^{N_2}, \tag{E.22b} \]

\[ (p_1, p_2, \dots, p_{N_3}) \in \{0, \dots, b-1\}^{N_3}, \tag{E.22c} \]

$N_1, N_2, N_3 \in \mathbb{N}_0$ (where $N_2 = 0$ or $N_3 = 0$ is supposed to mean that the corresponding sequence is empty), the respective string

\[ \begin{cases} (d_{N_1} d_{N_1-1} \dots d_0)_b & \text{for } N_2 = N_3 = 0,\\ (d_{N_1} d_{N_1-1} \dots d_0 \,.\, e_1 \dots e_{N_2}\, \overline{p_1 \dots p_{N_3}})_b & \text{for } N_2 + N_3 > 0 \end{cases} \tag{E.23} \]

represents the number

\[ \sum_{\nu=0}^{N_1} d_\nu\, b^\nu + \sum_{\nu=1}^{N_2} e_\nu\, b^{-\nu} + \sum_{\alpha=0}^{\infty} \sum_{\nu=1}^{N_3} p_\nu\, b^{-N_2 - \alpha N_3 - \nu}. \tag{E.24} \]

Example E.5. For the number from (7.95), we get

\[ x = (131.\overline{6})_{10} = (10000011.\overline{10})_2 = (83.\overline{A})_{16} \tag{E.25} \]

(for the hexadecimal system, it is customary to use the symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F).

—

One frequently needs to convert representations with respect to one base into representations with respect to another base. When working with digital computers, conversions between bases 10 and 2 are the most obvious ones that come up. Converting representations is related to the following elementary remainder theorem and the well-known long division algorithm.

Theorem E.6. For each pair of numbers $(a, b) \in \mathbb{N}^2$, there exists a unique pair of numbers $(q, r) \in \mathbb{N}_0^2$ satisfying the two conditions $a = qb + r$ and $0 \le r < b$.

Proof. Existence: Define

\[ q := \max\{n \in \mathbb{N}_0 : nb \le a\}, \tag{E.26a} \]

\[ r := a - qb. \tag{E.26b} \]

Then $q \in \mathbb{N}_0$ by definition and (E.26b) immediately yields $a = qb + r$ as well as $r \in \mathbb{Z}$. Moreover, from (E.26a), $qb \le a = qb + r$, i.e. $0 \le r$, in particular, $r \in \mathbb{N}_0$. Since (E.26a) also implies $(q+1)b > a = qb + r$, we also have $b > r$ as required.

Uniqueness: Suppose $(q_1, r_1) \in \mathbb{N}_0^2$ satisfies the two conditions $a = q_1 b + r_1$ and $0 \le r_1 < b$. Then $q_1 b = a - r_1 \le a$ and $(q_1 + 1)b = a - r_1 + b > a$, showing $q_1 = \max\{n \in \mathbb{N}_0 : nb \le a\} = q$. This, in turn, implies $r_1 = a - q_1 b = a - qb = r$, thereby establishing the case. □
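The existence part of the proof is constructive, and repeated division with remainder is one common way to convert a natural number into base b. The sketch below is illustrative only: the names `quotient_remainder` and `to_base` are hypothetical, and the linear search for q is chosen to mirror (E.26a) rather than for efficiency (Python's built-in `divmod` computes the same pair).

```python
def quotient_remainder(a, b):
    """Return (q, r) with a == q*b + r and 0 <= r < b, for natural a, b,
    using q := max{n in N_0 : n*b <= a} as in (E.26a)."""
    q = 0
    while (q + 1) * b <= a:
        q += 1
    return q, a - q * b             # (E.26b): r := a - q*b


def to_base(a, b):
    """Digits of the natural number a in base b, most significant first."""
    digits = []
    while a > 0:
        a, r = quotient_remainder(a, b)
        digits.append(r)            # remainders are the digits, lowest first
    return digits[::-1]


print(quotient_remainder(131, 16))
print(to_base(131, 2))
```

By the uniqueness part of Th. E.6, `quotient_remainder(a, b)` agrees with `divmod(a, b)` for all natural numbers a and b.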


F Cardinality of R and Some Related Sets

Theorem F.1. (a) The set of natural numbers N is countable.

(b) The set of integers Z is countable: #Z = #N.

(c) The set of rational numbers Q is countable: #Q = #N.

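Proof. (a): The identity $\mathrm{Id} : \mathbb{N} \longrightarrow \mathbb{N}$ shows $\mathbb{N}$ is countable.

(b): Using (D.24), the map

\[ \phi : \mathbb{N} \longrightarrow \mathbb{Z}, \quad \phi(n) := \begin{cases} n/2 & \text{if } n \text{ is even},\\ 0 & \text{if } n = 1,\\ -(n-1)/2 & \text{if } n \text{ is odd and } n > 1, \end{cases} \tag{F.1} \]

is clearly bijective, proving $\#\mathbb{Z} = \#\mathbb{N}$.

(c): According to (b), $\mathbb{Z}$ and $\mathbb{Z} \setminus \{0\}$ are countable. Then Th. 3.16 implies that $A := \mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$ is countable and there is a bijective map $f : \mathbb{N} \longrightarrow A$. It is then immediate from Def. D.22(a) that the map

\[ \phi : \mathbb{N} \longrightarrow \mathbb{Q}, \quad \phi(n) := [f(n)], \tag{F.2} \]

where $[f(n)]$ denotes the equivalence class of $f(n)$ with respect to ∼ from (D.31), is surjective. Thus, $\mathbb{Q}$ is countable by Prop. 3.15. □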
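The bijection (F.1) between the natural numbers and the integers is easy to check on an initial segment. The sketch below is only an illustration; the explicit inverse `phi_inverse` is not stated in the notes and is an assumption added here to make the bijectivity testable.

```python
def phi(n):
    """The map (F.1): N -> Z, enumerating 0, 1, -1, 2, -2, ..."""
    if n % 2 == 0:
        return n // 2
    # n odd; for n = 1 this gives 0, matching the middle case of (F.1)
    return -((n - 1) // 2)


def phi_inverse(z):
    """Assumed inverse of phi: positive z -> 2z, non-positive z -> 1 - 2z."""
    return 2 * z if z > 0 else 1 - 2 * z


print([phi(n) for n in range(1, 8)])
```

Composing the two functions in either order returns the original argument, which is exactly what bijectivity of (F.1) amounts to on the tested range.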

In the following theorem and its two corollaries, we will see that the set $\mathbb{R}$ of real numbers is not countable, but has the same cardinality as the power set of $\mathbb{N}$. Moreover, the same is true for every nontrivial interval of real numbers.

Theorem F.2. Let $a, b \in \mathbb{R}$ with $a < b$. Recalling the notation $\mathcal{F}(\mathbb{N}, \{0, 1\}) = \{0, 1\}^{\mathbb{N}}$ for the set of sequences in $\{0, 1\}$, we obtain the following equalities of cardinalities:

\[ \#\mathbb{R} = \#\,]a, b[\; = \#\{0, 1\}^{\mathbb{N}} = \#\mathcal{P}(\mathbb{N}). \tag{F.3} \]

Proof. We divide the proof into the following steps:

(i) #{0, 1}N = #P(N).

(ii) #]0, 1[= #{0, 1}N.

(iii) #]− 1, 1[= #R.

(iv) #]a, b[= #]0, 1[.


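(i): To prove $\#\{0, 1\}^{\mathbb{N}} = \#\mathcal{P}(\mathbb{N})$, we have to show the existence of a bijective map $f : \{0, 1\}^{\mathbb{N}} \longrightarrow \mathcal{P}(\mathbb{N})$. Given $\sigma \in \{0, 1\}^{\mathbb{N}}$, i.e. $\sigma$ is a function $\sigma : \mathbb{N} \longrightarrow \{0, 1\}$, define

\[ f(\sigma) := \sigma^{-1}\{1\} = \{n \in \mathbb{N} : \sigma(n) = 1\}. \tag{F.4} \]

Then, indeed, $f : \{0, 1\}^{\mathbb{N}} \longrightarrow \mathcal{P}(\mathbb{N})$. It remains to show $f$ is bijective. To verify $f$ is injective, consider $\sigma, \tau \in \{0, 1\}^{\mathbb{N}}$. If $\sigma \ne \tau$, then there exists $n \in \mathbb{N}$ with $\sigma(n) \ne \tau(n)$. If $\sigma(n) = 1$, then $\tau(n) = 0$, i.e. $n \in f(\sigma)$, but $n \notin f(\tau)$, showing $f(\sigma) \ne f(\tau)$. Analogously, if $\sigma(n) = 0$, then $\tau(n) = 1$, i.e. $n \in f(\tau)$, but $n \notin f(\sigma)$, again showing $f(\sigma) \ne f(\tau)$, concluding the proof that $f$ is injective. To verify $f$ is surjective, for each $A \in \mathcal{P}(\mathbb{N})$, define

\[ \sigma_A : \mathbb{N} \longrightarrow \{0, 1\}, \quad \sigma_A(n) := \begin{cases} 1 & \text{if } n \in A,\\ 0 & \text{if } n \notin A. \end{cases} \tag{F.5} \]

Then $\sigma_A \in \{0, 1\}^{\mathbb{N}}$ and $f(\sigma_A) = \sigma_A^{-1}\{1\} = A$, proving $f$ is surjective.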

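(ii): To prove $\#\{0, 1\}^{\mathbb{N}} = \#\,]0, 1[$, we have to show the existence of a bijective map $f : \{0, 1\}^{\mathbb{N}} \longrightarrow\, ]0, 1[$. The map

\[ g : \{0, 1\}^{\mathbb{N}} \longrightarrow [0, 1], \quad g\bigl((x_i)_{i\in\mathbb{N}}\bigr) := \sum_{i=1}^{\infty} x_i\, 2^{-i}, \tag{F.6} \]

is well-defined by Lem. E.1 (i.e. $0 \le g \le 1$). Moreover, according to Th. 7.99, $g$ is surjective, but not injective, as there are numbers $x \in\, ]0, 1[$ that have two different dual (i.e. 2-adic) representations. However, as there are only countably many such numbers, we can use a modification to obtain our desired $f$. In preparation, we define, for each $n \in \mathbb{N}$, the sequences $e^n := (e^n_i)_{i\in\mathbb{N}}$ and $f^n := (f^n_i)_{i\in\mathbb{N}}$, where

\[ \underset{n,i\in\mathbb{N}}{\forall}\quad e^n_i := \begin{cases} 1 & \text{for } i = n,\\ 0 & \text{for } i \ne n, \end{cases} \tag{F.7a} \]

\[ \underset{n,i\in\mathbb{N}}{\forall}\quad f^n_i := \begin{cases} 1 & \text{for } i > n,\\ 0 & \text{for } i \le n, \end{cases} \tag{F.7b} \]

and we note

\[ g\bigl((0, 0, \dots)\bigr) = 0, \tag{F.8a} \]

\[ g\bigl((1, 1, \dots)\bigr) = 1, \tag{F.8b} \]

\[ g(e^n) = g(f^n) = 2^{-n} \quad\text{for each } n \in \mathbb{N}. \tag{F.8c} \]

We are now in a position to define

\[ f : \{0, 1\}^{\mathbb{N}} \longrightarrow\, ]0, 1[, \quad f\bigl((x_i)_{i\in\mathbb{N}}\bigr) := \begin{cases} 2^{-1} & \text{if } (x_i)_{i\in\mathbb{N}} = (0, 0, \dots),\\ 2^{-2} & \text{if } (x_i)_{i\in\mathbb{N}} = (1, 1, \dots),\\ 2^{-(2n+1)} & \text{if } x_i = e^n_i \text{ for each } i \in \mathbb{N},\\ 2^{-(2n+2)} & \text{if } x_i = f^n_i \text{ for each } i \in \mathbb{N},\\ \sum_{i=1}^{\infty} x_i\, 2^{-i} & \text{otherwise}. \end{cases} \tag{F.9} \]

Introducing the auxiliary sets

\[ A := \{(0, 0, \dots), (1, 1, \dots)\} \cup \{e^n : n \in \mathbb{N}\} \cup \{f^n : n \in \mathbb{N}\}, \tag{F.10a} \]

\[ B := \{2^{-n} : n \in \mathbb{N}\}, \tag{F.10b} \]

it follows from Th. 7.99 that (the following restrictions of $f$ which, to simplify notation, we also denote by $f$)

\[ f : \{0, 1\}^{\mathbb{N}} \setminus A \longrightarrow\, ]0, 1[\, \setminus B \tag{F.11a} \]

and

\[ f : A \longrightarrow B \tag{F.11b} \]

are bijective, i.e. the full $f$ of (F.9) is itself bijective, completing the proof of (ii).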

(iii): To prove #] − 1, 1[= #R, we have to show the existence of a bijective map f :R −→]− 1, 1[. Since we know from Def. and Rem. 8.27 that arctan : R −→]−π/2, π/2[is bijective, we can define

f : R −→]0, 1[, f(x) :=2 arctan x

π. (F.12)

However, even though this provides a valid proof, arctan is a somewhat complicatedfunction (as it is defined via sin and cos, which are defined via power series). Thus, itmight be desirable to see an alternative proof, using a more elementary f . We claimthat

f : R −→]− 1, 1[, f(x) :=x

|x|+ 1, (F.13)

is also bijective. Since f is clearly continuous, according to the intermediate value Th.7.57, it suffices to show

∀ǫ∈]0,1[

∃x1,x2∈R

f(x1) < −1 + ǫ < 1− ǫ < f(x2). (F.14)

However, for each ǫ ∈]0, 1[,

x1 <−1 + ǫ

ǫ= −ǫ−1 + 1 ⇒ x1 < x1 − 1− ǫ x1 + ǫ ⇒ f(x1) =

x1−x1 + 1

< −1 + ǫ,

x2 >1− ǫ

ǫ= ǫ−1 − 1 ⇒ x2 > 1 + x2 − ǫ− ǫ x2 ⇒ f(x2) =

x

x+ 1> 1− ǫ,

proving (F.14) and the surjectivity of f . To verify f is injective, it suffices to show thatf is strictly increasing. Since

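\[ x_1 \le 0 \le x_2 \;\wedge\; x_1 < x_2 \;\Rightarrow\; f(x_1) = \frac{x_1}{-x_1 + 1} \le 0 \le \frac{x_2}{x_2 + 1} = f(x_2) \;\wedge\; f(x_1) < f(x_2), \]

\[ x_1 < x_2 \le 0 \;\Rightarrow\; -x_1 x_2 + x_1 < -x_1 x_2 + x_2 \;\Rightarrow\; f(x_1) = \frac{x_1}{-x_1 + 1} < \frac{x_2}{-x_2 + 1} = f(x_2), \]

\[ 0 \le x_1 < x_2 \;\Rightarrow\; x_1 x_2 + x_1 < x_1 x_2 + x_2 \;\Rightarrow\; f(x_1) = \frac{x_1}{x_1 + 1} < \frac{x_2}{x_2 + 1} = f(x_2), \]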

showing f is strictly increasing and, hence, injective.
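The map of (F.13) is simple enough to test with exact rational arithmetic. The sketch below is an illustration under stated assumptions: the closed-form inverse `f_inv` and the particular witnesses chosen in `witnesses` do not appear in the notes, they are merely one convenient choice consistent with (F.14).

```python
from fractions import Fraction

def f(x):
    """The map of (F.13), from R into ]-1, 1[."""
    return x / (abs(x) + 1)

def f_inv(y):
    """Candidate inverse on ]-1, 1[ (an assumption, verified by the checks)."""
    return y / (1 - abs(y))

def witnesses(eps):
    """Points x1, x2 with f(x1) < -1 + eps < 1 - eps < f(x2), as in (F.14).

    Any x2 > 1/eps - 1 works; x2 = 1/eps is chosen here, and x1 = -x2."""
    x2 = 1 / eps
    return -x2, x2
```

For instance, with `eps = Fraction(1, 100)` the witnesses are $\mp 100$, and $f(100) = 100/101 > 99/100$.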

(iv): To prove $\#]a,b[ \, = \#]0,1[$, we have to show the existence of a bijective map $f : ]a,b[ \longrightarrow ]0,1[$. Such a bijective map is given by the (restriction of an) affine map

\[ f : ]a,b[ \longrightarrow ]0,1[, \quad f(x) := \frac{x - a}{b - a}. \tag{F.15} \]

The proof that $f$ is bijective can be conducted analogously to (but much more simply than) the proof in (iii), or one can use (for example, from Linear Algebra) that every nonconstant affine map from $\mathbb{R}$ into $\mathbb{R}$ is bijective. ∎

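Corollary F.3. $\#\mathbb{R} = \#\mathcal{P}(\mathbb{N})$; in particular, $\mathbb{R}$ is not countable.

Proof. $\#\mathbb{R} = \#\mathcal{P}(\mathbb{N})$ was proved in Th. F.2 and $\mathcal{P}(\mathbb{N})$ is uncountable by Th. A.69. ∎

Corollary F.4. If $a, b \in \mathbb{R}$ with $a < b$, then $\#(\mathbb{Q} \cap ]a,b[) = \#\mathbb{N}$ and $\#(]a,b[ \, \setminus \mathbb{Q}) = \#\mathbb{R}$, i.e. $]a,b[$ contains countably many rational and uncountably many irrational numbers.

Proof. Since $\mathbb{Q} \cap ]a,b[ \, \subseteq \mathbb{Q}$, the claim $\#(\mathbb{Q} \cap ]a,b[) = \#\mathbb{N}$ follows from Th. F.1(c), Prop. 3.14, and Th. 7.68(a).

To prove $\#(]a,b[ \, \setminus \mathbb{Q}) = \#\mathbb{R}$, a bijection between $]a,b[ \, \setminus \mathbb{Q}$ and $\mathbb{R}$ can be constructed analogously to the construction of $f$ in step (ii) of the proof of Th. F.2, making use of the fact that $\#]a,b[ \, = \#\mathbb{R}$ and $\#\mathbb{Q} = \#\mathbb{N}$. ∎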

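Theorem F.5. The set of complex numbers $\mathbb{C} = \mathbb{R} \times \mathbb{R}$ has the same cardinality as $\mathbb{R}$: $\#(\mathbb{R} \times \mathbb{R}) = \#\mathbb{R} = \#\mathcal{P}(\mathbb{N})$.

Proof. Let

\[ A := \{0,1\}^{\mathbb{N}}. \tag{F.16} \]

By an application of Th. F.2, it suffices to prove $\#A = \#(A \times A)$, which is accomplished by showing the existence of a bijective map $f : A \longrightarrow A \times A$. We define

\[ f : A \longrightarrow A \times A, \quad f\big((x_j)_{j\in\mathbb{N}}\big) := \big((y_j)_{j\in\mathbb{N}}, (z_j)_{j\in\mathbb{N}}\big), \tag{F.17a} \]

where

\[ \forall_{j\in\mathbb{N}} \quad y_j := x_{2j-1}, \tag{F.17b} \]

\[ \forall_{j\in\mathbb{N}} \quad z_j := x_{2j}, \tag{F.17c} \]

and

\[ g : A \times A \longrightarrow A, \quad g\big((y_j)_{j\in\mathbb{N}}, (z_j)_{j\in\mathbb{N}}\big) := (x_j)_{j\in\mathbb{N}}, \tag{F.18a} \]

where

\[ \forall_{j\in\mathbb{N}} \quad x_j := \begin{cases} y_{(j+1)/2} & \text{for } j \text{ odd}, \\ z_{j/2} & \text{for } j \text{ even}. \end{cases} \tag{F.18b} \]

Clearly, $g = f^{-1}$, proving that $f$ is bijective as desired. ∎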
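The maps of (F.17) and (F.18) are the familiar "de-interleave" and "interleave" operations. A minimal sketch on finite prefixes of even length follows; working with finite lists instead of infinite sequences, and the names `split` and `merge`, are assumptions made for illustration.

```python
def split(x):
    """Finite-prefix version of f in (F.17): the entries with odd index
    (y_j = x_{2j-1}) and with even index (z_j = x_{2j}), counting from 1."""
    return x[0::2], x[1::2]

def merge(y, z):
    """Finite-prefix version of g in (F.18); expects len(y) == len(z)."""
    x = []
    for yj, zj in zip(y, z):
        x.extend((yj, zj))
    return x
```

On such prefixes, `merge(*split(x))` returns `x` and `split(merge(y, z))` returns `(y, z)`, mirroring $g = f^{-1}$.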


G Partial Fraction Decomposition

We consider $\mathbb{C}$-valued rational functions of the form

\[ z \mapsto R(z) := \frac{P(z)}{Q(z)}, \tag{G.1} \]

where $P, Q : \mathbb{C} \longrightarrow \mathbb{C}$ are polynomials such that $\deg(P) < \deg(Q) =: n$. Using Cor. 8.33 as well as Rem. 6.7, we write $Q$ in the form

\[ Q(z) = c \prod_{j=1}^{k} (z - \lambda_j)^{m_j}, \tag{G.2} \]

where $c \in \mathbb{C}$, and $\lambda_1, \dots, \lambda_k \in \mathbb{C}$, $k \in \{1, \dots, n\}$, are the distinct zeros of $Q$, $m_j \in \mathbb{N}$ with $\sum_{j=1}^{k} m_j = n$ being their respective multiplicities.

It can be useful to write $R$ as a linear combination of the so-called partial fractions

\[ \frac{1}{z - \lambda_j}, \quad \frac{1}{(z - \lambda_j)^2}, \quad \dots, \quad \frac{1}{(z - \lambda_j)^{m_j}} \qquad (j = 1, \dots, k) \tag{G.3} \]

(for example, for the computation of the antiderivative of $R$, cf. Ex. 10.22(b)). The following Th. G.1 guarantees this is always possible:

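Theorem G.1. Let $P, Q : \mathbb{C} \longrightarrow \mathbb{C}$ be polynomials such that $\deg(P) < \deg(Q) =: n$. Moreover, let $\mathcal{N}(Q)$ denote the set of zeros of $Q$, and assume $Q$ to have the form of (G.2). Then there exists a unique family of coefficients

\[ a_{jl} \in \mathbb{C} \qquad (j = 1, \dots, k, \quad l = 1, \dots, m_j), \tag{G.4} \]

such that

\[ \forall_{z\in\mathbb{C}\setminus\mathcal{N}(Q)} \quad R(z) = \frac{P(z)}{Q(z)} = \sum_{j=1}^{k} \sum_{l=1}^{m_j} \frac{a_{jl}}{(z - \lambda_j)^l}. \tag{G.5} \]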

Proof. We first prove the existence of the decomposition (G.5) via induction on $n = \deg(Q)$. If $n = 1$, then $P$ must be constant and there is nothing to prove. For the induction step, consider $n \ge 2$. Let $\zeta$ be a zero of $Q$ with multiplicity $m \in \{1, \dots, n\}$. Then, according to Rem. 6.7, there exists a polynomial $S : \mathbb{C} \longrightarrow \mathbb{C}$ such that $Q(z) = (z - \zeta)^m S(z)$ and $S(\zeta) \neq 0$. Noting

\[ \tilde{R}(z) := \frac{P(z)}{S(z)} - \frac{P(\zeta)}{S(\zeta)} = \frac{P(z) S(\zeta) - S(z) P(\zeta)}{S(z) S(\zeta)} \tag{G.6} \]

and that $P(z) S(\zeta) - S(z) P(\zeta)$ vanishes for $z = \zeta$, there exists a polynomial $T : \mathbb{C} \longrightarrow \mathbb{C}$, $\deg T \le n - 2$, such that

\[ \tilde{R}(z) = \frac{(z - \zeta)\, T(z)}{S(z)}. \tag{G.7} \]

Thus, for each $z \in \mathbb{C} \setminus \mathcal{N}(Q)$, we have

\[ R(z) - \frac{P(\zeta)}{(z - \zeta)^m S(\zeta)} = \frac{\tilde{R}(z)}{(z - \zeta)^m} \overset{(G.7)}{=} \frac{T(z)}{(z - \zeta)^{m-1} S(z)}. \tag{G.8} \]

We will now apply (G.8) with $\zeta = \lambda_k$ and $m = m_k$. Since $\deg(T) < n - 1 = \deg\big((z - \zeta)^{m-1} S(z)\big) < \deg(Q)$, the induction hypothesis applies to the function in (G.8), yielding coefficients $a_{jl} \in \mathbb{C}$, $j = 1, \dots, k$, with $l = 1, \dots, m_j$ for $j < k$ and $l = 1, \dots, m_j - 1$ for $j = k$, satisfying

\[ R(z) - \frac{P(\lambda_k)}{(z - \lambda_k)^{m_k} S(\lambda_k)} = \sum_{j=1}^{k-1} \sum_{l=1}^{m_j} \frac{a_{jl}}{(z - \lambda_j)^l} + \sum_{l=1}^{m_k - 1} \frac{a_{kl}}{(z - \lambda_k)^l}, \tag{G.9} \]

thereby completing the induction for the existence proof.

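It remains to prove the uniqueness of the coefficients $a_{jl}$ in (G.5). Thus, suppose one has $b_{jl} \in \mathbb{C}$, $j = 1, \dots, k$, $l = 1, \dots, m_j$, such that

\[ \sum_{j=1}^{k} \sum_{l=1}^{m_j} \frac{a_{jl}}{(z - \lambda_j)^l} = \sum_{j=1}^{k} \sum_{l=1}^{m_j} \frac{b_{jl}}{(z - \lambda_j)^l}. \tag{G.10} \]

We fix $j_0$ and prove $a_{j_0 l} = b_{j_0 l}$ via induction on $l = 1, \dots, m_{j_0}$: Let $l \in \{1, \dots, m_{j_0}\}$ and assume $a_{j_0 \alpha} = b_{j_0 \alpha}$ has already been shown for each $\alpha > l$ (the induction does, indeed, start at $l = m_{j_0}$, working itself down to $l = 1$). Then (G.10) implies

\[ \sum_{\beta=1}^{l} \frac{a_{j_0 \beta}}{(z - \lambda_{j_0})^\beta} + \sum_{\substack{j=1 \\ j \neq j_0}}^{k} \sum_{\beta=1}^{m_j} \frac{a_{j\beta}}{(z - \lambda_j)^\beta} = \sum_{\beta=1}^{l} \frac{b_{j_0 \beta}}{(z - \lambda_{j_0})^\beta} + \sum_{\substack{j=1 \\ j \neq j_0}}^{k} \sum_{\beta=1}^{m_j} \frac{b_{j\beta}}{(z - \lambda_j)^\beta}. \tag{G.11} \]

One now multiplies (G.11) by $(z - \lambda_{j_0})^l$. Then taking the limit for $z \to \lambda_{j_0}$ on both sides yields $a_{j_0 l} = b_{j_0 l}$ as desired. ∎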

If, in (G.1), $P, Q : \mathbb{R} \longrightarrow \mathbb{R}$, then the partial fraction decomposition (G.5) of Th. G.1 is not quite satisfactory, since, even though $P$ and $Q$ are both real, the $a_{jl}$ will typically be nonreal elements of $\mathbb{C}$. As the following Th. G.2 shows, if $P, Q$ are real, then it is always possible to obtain a partial fraction decomposition with only real coefficients; however, its form is somewhat more complicated.

We start by using the real factorization of $Q : \mathbb{R} \longrightarrow \mathbb{R}$, $\deg(Q) = n \in \mathbb{N}$, according to (8.58), where, as in (G.2), we combine identical factors, obtaining

\[ Q(x) = c \prod_{j=1}^{k_1} (x - \lambda_j)^{m_j} \prod_{j=1}^{k_2} (x^2 + \alpha_j x + \beta_j)^{n_j}, \tag{G.12} \]

where $c \in \mathbb{R}$; $\lambda_1, \dots, \lambda_{k_1} \in \mathbb{R}$, $k_1 \in \{0, \dots, n\}$, are the distinct real zeros of $Q$ (if any), $m_j \in \mathbb{N}$ being their respective multiplicities; and $(\alpha_1, \beta_1), \dots, (\alpha_{k_2}, \beta_{k_2}) \in \mathbb{R}^2$, $k_2 \in \{0, \dots, n\}$, are distinct pairs of real numbers, each pair arising from combining two conjugate nonreal zeros of $Q$ according to (8.60), $n_j \in \mathbb{N}$ being their respective multiplicities;

\[ \sum_{j=1}^{k_1} m_j + 2 \sum_{j=1}^{k_2} n_j = n. \tag{G.13} \]

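Theorem G.2. Let $P, Q : \mathbb{R} \longrightarrow \mathbb{R}$ be polynomials such that $\deg(P) < \deg(Q) =: n$. Moreover, let $\mathcal{N}(Q)$ denote the set of zeros of $Q$, and assume $Q$ to have the form of (G.12). Then there exist families of coefficients

\[ a_{jl} \in \mathbb{R} \qquad (j = 1, \dots, k_1, \quad l = 1, \dots, m_j), \tag{G.14a} \]

\[ b_{jl} \in \mathbb{R} \qquad (j = 1, \dots, k_2, \quad l = 1, \dots, n_j), \tag{G.14b} \]

\[ c_{jl} \in \mathbb{R} \qquad (j = 1, \dots, k_2, \quad l = 1, \dots, n_j) \tag{G.14c} \]

such that

\[ \forall_{x\in\mathbb{R}\setminus\mathcal{N}(Q)} \quad R(x) = \frac{P(x)}{Q(x)} = \sum_{j=1}^{k_1} \sum_{l=1}^{m_j} \frac{a_{jl}}{(x - \lambda_j)^l} + \sum_{j=1}^{k_2} \sum_{l=1}^{n_j} \frac{b_{jl}\, x + c_{jl}}{(x^2 + \alpha_j x + \beta_j)^l}. \tag{G.15} \]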

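Proof. We show that, if $P, Q$ are real, where $Q$ is as in (G.12), then (G.5) can be rewritten in the form (G.15): First, consider $\lambda_{j_0} \in \mathbb{R}$ to be a real zero of $Q$. Then all corresponding coefficients in (G.5) are real: We prove $a_{j_0 l} \in \mathbb{R}$ via induction on $l = 1, \dots, m_{j_0}$: Let $l \in \{1, \dots, m_{j_0}\}$ and assume $a_{j_0 \alpha} \in \mathbb{R}$ has already been shown for each $\alpha > l$. Then (G.5) (with $z$ replaced by $x$) implies

\[ \forall_{x\in\mathbb{R}\setminus\mathcal{N}(Q)} \quad S(x) := R(x) - \sum_{\beta=l+1}^{m_{j_0}} \frac{a_{j_0 \beta}}{(x - \lambda_{j_0})^\beta} = \sum_{\beta=1}^{l} \frac{a_{j_0 \beta}}{(x - \lambda_{j_0})^\beta} + \sum_{\substack{j=1 \\ j \neq j_0}}^{k} \sum_{\beta=1}^{m_j} \frac{a_{j\beta}}{(x - \lambda_j)^\beta} \in \mathbb{R}. \tag{G.16} \]

One now multiplies (G.16) by $(x - \lambda_{j_0})^l$. Then taking the limit for $x \to \lambda_{j_0}$ on both sides yields $a_{j_0 l} \in \mathbb{R}$ as desired (the limit on the right-hand side is clearly $a_{j_0 l}$ and all values and, thus, the limit on the left-hand side are clearly in $\mathbb{R}$).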

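Thus, the summands corresponding to real zeros of $Q$ are identical in (G.5) and (G.15). It remains to show that terms in (G.5), corresponding to conjugate nonreal zeros of $Q$, can be combined to result in the summands involving the $b_{jl}$ and $c_{jl}$ in (G.15). To this end, consider $\lambda_{j_0}, \lambda_{j_1} \in \mathbb{C}$ to be conjugate nonreal zeros of $Q$, $\lambda_{j_1} = \overline{\lambda_{j_0}}$. Then all corresponding coefficients in (G.5) are conjugate: We prove $a_{j_0 l} = \overline{a_{j_1 l}}$ via induction on $l = 1, \dots, m_{j_0} = m_{j_1}$: Let $l \in \{1, \dots, m_{j_0}\}$ and assume $a_{j_0 \alpha} = \overline{a_{j_1 \alpha}}$ has already been shown for each $\alpha > l$. We once again have the formula (G.16) for $S(x)$ (even for each $x \in \mathbb{C} \setminus \mathcal{N}(Q)$, but we can no longer expect $S(x) \in \mathbb{R}$). As before, after multiplying (G.16) by $(x - \lambda_{j_0})^l$, we obtain

\[ \lim_{x \to \lambda_{j_0}} \big( S(x) (x - \lambda_{j_0})^l \big) = a_{j_0 l}. \tag{G.17} \]

Analogously, we also have

\[ \forall_{x\in\mathbb{C}\setminus\mathcal{N}(Q)} \quad R(x) - \sum_{\beta=l+1}^{m_{j_1}} \frac{a_{j_1 \beta}}{(x - \lambda_{j_1})^\beta} = \sum_{\beta=1}^{l} \frac{a_{j_1 \beta}}{(x - \lambda_{j_1})^\beta} + \sum_{\substack{j=1 \\ j \neq j_1}}^{k} \sum_{\beta=1}^{m_j} \frac{a_{j\beta}}{(x - \lambda_j)^\beta}. \tag{G.18} \]

Taking complex conjugates in (G.16) and using the induction hypothesis as well as $\overline{R(x)} = R(\overline{x})$ (since the coefficients of $P, Q$ are real) yields

\[ \forall_{x\in\mathbb{C}\setminus\mathcal{N}(Q)} \quad \overline{S(x)} = R(\overline{x}) - \sum_{\beta=l+1}^{m_{j_1}} \frac{a_{j_1 \beta}}{(\overline{x} - \lambda_{j_1})^\beta} \overset{(G.18)}{=} \sum_{\beta=1}^{l} \frac{a_{j_1 \beta}}{(\overline{x} - \lambda_{j_1})^\beta} + \sum_{\substack{j=1 \\ j \neq j_1}}^{k} \sum_{\beta=1}^{m_j} \frac{a_{j\beta}}{(\overline{x} - \lambda_j)^\beta}. \tag{G.19} \]

If we multiply (G.19) by $(\overline{x} - \lambda_{j_1})^l$, we obtain

\[ \lim_{\overline{x} \to \lambda_{j_1}} \big( \overline{S(x)}\, (\overline{x} - \lambda_{j_1})^l \big) = a_{j_1 l}. \tag{G.20} \]

Thus, since $\overline{x} \to \lambda_{j_1}$ is equivalent to $x \to \lambda_{j_0}$,

\[ a_{j_0 l} \overset{(G.17)}{=} \lim_{x \to \lambda_{j_0}} \big( S(x) (x - \lambda_{j_0})^l \big) = \lim_{x \to \lambda_{j_0}} \overline{\big( \overline{S(x)}\, (\overline{x} - \lambda_{j_1})^l \big)} \overset{(G.20)}{=} \overline{a_{j_1 l}}, \tag{G.21} \]

as needed.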

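We now combine two corresponding summands of (G.5) (for $x \in \mathbb{R} \setminus \mathcal{N}(Q)$):

\[ \sigma_l := \frac{a_{j_0 l}}{(x - \lambda_{j_0})^l} + \frac{\overline{a_{j_0 l}}}{(x - \overline{\lambda_{j_0}})^l} = \frac{a_{j_0 l} (x - \overline{\lambda_{j_0}})^l + \overline{a_{j_0 l}} (x - \lambda_{j_0})^l}{(x^2 - 2x \operatorname{Re} \lambda_{j_0} + |\lambda_{j_0}|^2)^l} = \frac{a (x - \overline{\lambda})^l + \overline{a} (x - \lambda)^l}{(x^2 + bx + c)^l}, \tag{G.22} \]

where we have set

\[ a := a_{j_0 l}, \quad \lambda := \lambda_{j_0}, \quad b := -2 \operatorname{Re} \lambda_{j_0}, \quad c := |\lambda_{j_0}|^2, \tag{G.23} \]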

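to simplify notation. To finish the proof of (G.15), it remains to show there are real coefficients $s_{1l}, \dots, s_{ll}$ and $t_{1l}, \dots, t_{ll}$ such that

\[ \forall_{l\in\{1,\dots,m_{j_0}\}} \quad \sigma_l = \sum_{\beta=1}^{l} \frac{s_{\beta l}\, x + t_{\beta l}}{(x^2 + bx + c)^\beta}, \tag{G.24} \]

which we prove via induction on $l$: For $l = 1$, we have

\[ \sigma_1 = \frac{a (x - \overline{\lambda}) + \overline{a} (x - \lambda)}{x^2 + bx + c} = \frac{(a + \overline{a})\, x - (a \overline{\lambda} + \overline{a} \lambda)}{x^2 + bx + c}, \tag{G.25} \]

which fits the requirements of (G.24), since $a + \overline{a}$ and $a \overline{\lambda} + \overline{a} \lambda$ are both real. For the induction step, we consider, for $l = 1, \dots, m_{j_0} - 1$,

\[ \sigma_{l+1} = \frac{a (x - \overline{\lambda})^{l+1} + \overline{a} (x - \lambda)^{l+1}}{(x^2 + bx + c)^{l+1}}. \tag{G.26} \]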


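The numerator can be rewritten as

\begin{align}
a (x - \overline{\lambda})^{l+1} + \overline{a} (x - \lambda)^{l+1} ={} & a (x - \overline{\lambda})^{l+1} + \overline{a} (x - \lambda)^l (x - \overline{\lambda}) \notag \\
& - a (x - \overline{\lambda})^l (x - \lambda) - \overline{a} (x - \lambda)^l (x - \overline{\lambda}) \notag \\
& + a (x - \overline{\lambda})^l (x - \lambda) + \overline{a} (x - \lambda)^{l+1}. \tag{G.27}
\end{align}

Thus, $\sigma_{l+1} = S_1 + S_2 + S_3$, where

\[ S_1 = \frac{\big( a (x - \overline{\lambda})^l + \overline{a} (x - \lambda)^l \big) (x - \overline{\lambda})}{(x^2 + bx + c)^{l+1}} = \frac{\sigma_l\, (x - \overline{\lambda})}{x^2 + bx + c}, \tag{G.28} \]

\[ S_2 = - \frac{\big( a (x - \overline{\lambda})^{l-1} + \overline{a} (x - \lambda)^{l-1} \big) (x - \lambda)(x - \overline{\lambda})}{(x^2 + bx + c)^{l+1}} = - \frac{\sigma_{l-1}}{x^2 + bx + c}, \tag{G.29} \]

\[ S_3 = \frac{\big( a (x - \overline{\lambda})^l + \overline{a} (x - \lambda)^l \big) (x - \lambda)}{(x^2 + bx + c)^{l+1}} = \frac{\sigma_l\, (x - \lambda)}{x^2 + bx + c}, \tag{G.30} \]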

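where, for (G.29) to hold for $l = 1$, we set $\sigma_0 := a + \overline{a} \in \mathbb{R}$. Using the induction hypothesis, $S_2$ clearly has the form required by (G.24). Using the induction hypothesis together with the elementary equality

\[ \frac{s x^2}{x^2 + bx + c} = s - \frac{s b x + s c}{x^2 + bx + c}, \tag{G.31} \]

we also obtain

\begin{align}
S_1 + S_3 &= \frac{\sigma_l\, (2x + b)}{x^2 + bx + c} = \frac{2x + b}{x^2 + bx + c} \sum_{\beta=1}^{l} \frac{s_{\beta l}\, x + t_{\beta l}}{(x^2 + bx + c)^\beta} \notag \\
&= \sum_{\beta=1}^{l} \frac{2 s_{\beta l}\, x^2 + (b s_{\beta l} + 2 t_{\beta l})\, x + b t_{\beta l}}{(x^2 + bx + c)^\beta (x^2 + bx + c)} \notag \\
&\overset{(G.31)}{=} \sum_{\beta=1}^{l} \frac{1}{(x^2 + bx + c)^\beta} \left( 2 s_{\beta l} - \frac{2 s_{\beta l} (bx + c)}{x^2 + bx + c} \right) + \sum_{\beta=1}^{l} \frac{(b s_{\beta l} + 2 t_{\beta l})\, x + b t_{\beta l}}{(x^2 + bx + c)^{\beta+1}} \tag{G.32}
\end{align}

to have the form required by (G.24), thereby finishing the induction and the proof of the theorem. ∎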

Remark G.3. Given a rational function $R = P/Q$ as in Th. G.1 (or Th. G.2), there remains the question of how to actually compute the coefficients $a_{jl}$ of the partial fraction decomposition (G.5) (or $a_{jl}, b_{jl}, c_{jl}$ of the partial fraction decomposition (G.15) in the real case). First, one always needs to obtain the zeros $\lambda_j$ and their respective multiplicities $m_j$, which, for $\deg(Q)$ large, can be very difficult. Then there are basically three different possibilities to proceed, where, in practice, the most efficient way in a concrete situation might be to mix the three strategies:

(a) Linear System: To determine $k$ unknown coefficients, one can plug $k$ different values for $z$ into (G.5) (or for $x$ into (G.15)) to obtain a linear system for the unknown coefficients.

(b) One can multiply (G.5) (or (G.15)) by $Q$, obtaining a polynomial on both sides of the equation. As the polynomials need to be equal, the coefficients of equal powers need to be equal on both sides, yielding a system of equations for the unknown coefficients.

(c) Multiplying (G.5) (or (G.15)) by $(z - \lambda_j)^{m_j}$ and setting $z = \lambda_j$ yields the coefficient of $\frac{1}{(z - \lambda_j)^{m_j}}$, etc.
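Strategy (c) becomes particularly simple when all zeros of $Q$ are simple ($m_j = 1$): multiplying (G.5) by $z - \lambda_j$ and setting $z = \lambda_j$ gives $a_{j1} = P(\lambda_j) / \big(c \prod_{i \neq j} (\lambda_j - \lambda_i)\big)$. The following is a minimal sketch of this special case with exact rational arithmetic; the function names and the coefficient-list representation of $P$ are assumptions for illustration, and zeros of higher multiplicity are deliberately not handled.

```python
from fractions import Fraction

def poly_eval(coeffs, z):
    """Evaluate a polynomial given by its coefficients (highest degree
    first) at z, using Horner's scheme."""
    result = Fraction(0)
    for coef in coeffs:
        result = result * z + coef
    return result

def simple_pole_coefficients(p_coeffs, c, zeros):
    """Coefficients a_j with P(z)/Q(z) = sum_j a_j / (z - zeros[j]), where
    Q(z) = c * prod_j (z - zeros[j]) has pairwise distinct zeros and
    deg(P) < len(zeros)."""
    coefficients = []
    for j, lam in enumerate(zeros):
        rest = Fraction(c)          # value of Q(z) / (z - lam) at z = lam
        for i, mu in enumerate(zeros):
            if i != j:
                rest *= (lam - mu)
        coefficients.append(poly_eval(p_coeffs, lam) / rest)
    return coefficients
```

For example, for $R(z) = \frac{3z + 5}{(z - 1)(z + 2)}$, the call `simple_pole_coefficients([3, 5], 1, [1, -2])` yields the coefficients $\frac{8}{3}$ and $\frac{1}{3}$, i.e. $R(z) = \frac{8/3}{z - 1} + \frac{1/3}{z + 2}$.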

H Irrationality of e and π

H.1 Irrationality of e

The following Prop. H.1, which will then be used to prove the irrationality of $e$ in Th. H.2, shows, in particular, that the series (8.26) can be used to efficiently compute accurate approximations of $e$.

Proposition H.1. Defining

\[ \forall_{n\in\mathbb{N}} \quad \forall_{z\in\mathbb{C}} \quad R_n(z) := e^z - \sum_{j=0}^{n-1} \frac{z^j}{j!}, \tag{H.1} \]

we have

\[ \forall_{n\in\mathbb{N}} \quad \left( |z| \le 1 \;\Rightarrow\; \big| R_n(z) \big| \le \frac{2\, |z|^n}{n!} \right), \tag{H.2} \]

i.e. the error made when approximating $e^z$ by the partial sum (for $|z| \le 1$) is at most as large as twice the modulus of the first missing summand.

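Proof. One estimates, for each $n \in \mathbb{N}$ and each $z \in \mathbb{C}$ with $|z| \le 1$,

\begin{align}
\big| R_n(z) \big| &\overset{(8.24),(7.81)}{\le} \sum_{j=n}^{\infty} \frac{|z|^j}{j!} \overset{(7.73)}{=} \frac{|z|^n}{n!} \left( 1 + \frac{|z|}{n+1} + \frac{|z|^2}{(n+1)(n+2)} + \dots \right) \notag \\
&\overset{|z| \le 1}{\le} \frac{|z|^n}{n!} \left( 1 + \frac{1}{2} + \frac{1}{2^2} + \dots \right) \overset{(7.71)}{=} \frac{2\, |z|^n}{n!}, \tag{H.3}
\end{align}

which establishes the case. ∎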
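The estimate (H.2) can be observed numerically for $z = 1$. The sketch below is an illustration, not part of the proof: it uses the floating-point constant `math.e` as a stand-in for $e$, so it is only meaningful for small $n$, where the remainder is far larger than the rounding error; the function names are assumptions.

```python
import math
from fractions import Fraction

def partial_sum(n):
    """sum_{j=0}^{n-1} 1/j!, i.e. the partial sum in (H.1) for z = 1."""
    return sum(Fraction(1, math.factorial(j)) for j in range(n))

def remainder_and_bound(n):
    """Return (R_n(1), 2/n!) in floating point; math.e stands in for e."""
    return math.e - float(partial_sum(n)), 2 / math.factorial(n)
```

Calling `remainder_and_bound(n)` for $n = 1, \dots, 10$ returns pairs whose first entry is positive and does not exceed the second, as (H.2) predicts.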

Theorem H.2. Euler's number $e$ is irrational.

Proof. Seeking a contradiction, we assume $e$ to be rational. Then there exist $m, n \in \mathbb{N}$ with $n \ge 2$ such that $e = \frac{m}{n}$. Then $n!\, e \in \mathbb{N}$ and, thus,

\[ n!\, R_{n+1}(1) \overset{(H.1)}{=} n!\, e - n! \sum_{j=0}^{n} \frac{1}{j!} \in \mathbb{Z}, \tag{H.4} \]

in contradiction to $0 < |n!\, R_{n+1}(1)| < \frac{2}{n+1} < 1$, which holds according to (H.2) (recalling $n \ge 2$). ∎

H.2 Irrationality of π

Theorem H.3. $\pi^2$ is irrational (then, in particular, $\pi$ must be irrational as well).

Proof. Seeking a contradiction, we assume $\pi^2$ to be rational. Then

\[ \exists_{a,b\in\mathbb{N}} \quad \pi^2 = \frac{a}{b}. \tag{H.5} \]

We can then choose some even $n \in \mathbb{N}$ satisfying

\[ 0 < \frac{\pi\, a^n}{n!} < 1. \tag{H.6} \]

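We now consider the function

\[ f : \mathbb{R} \longrightarrow \mathbb{R}, \quad f(x) := \frac{x^n (1 - x)^n}{n!} \overset{(*)}{=} \frac{1}{n!} \sum_{k=n}^{2n} (-1)^k \binom{n}{k - n} x^k, \tag{H.7} \]

where the equality at $(*)$ is proved by

\[ \frac{x^n (1 - x)^n}{n!} \overset{(5.22)}{=} \frac{x^n}{n!} \sum_{k=0}^{n} (-1)^k \binom{n}{k} x^k \overset{n \text{ even}}{=} \frac{1}{n!} \sum_{k=n}^{2n} (-1)^k \binom{n}{k - n} x^k. \tag{H.8} \]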

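Thus, for the polynomial $f$, we obtain the derivatives

\[ f^{(j)}(0) = \begin{cases} 0 & \text{for } 0 \le j < n, \\ \frac{j!}{n!} (-1)^j \binom{n}{j - n} & \text{for } n \le j \le 2n, \\ 0 & \text{for } 2n < j. \end{cases} \tag{H.9} \]

In consequence, since, for $n \le j \le 2n$, $\frac{j!}{n!} \in \mathbb{N}$ and $\binom{n}{j - n} \in \mathbb{N}$,

\[ \forall_{j\in\mathbb{N}_0} \quad f^{(j)}(0) \in \mathbb{Z}. \tag{H.10} \]

Moreover, since $f(1 - x) = f(x)$ for each $x \in \mathbb{R}$, and, thus, $f^{(j)}(1 - x) = (-1)^j f^{(j)}(x)$ for each $x \in \mathbb{R}$, we also have

\[ \forall_{j\in\mathbb{N}_0} \quad f^{(j)}(1) \in \mathbb{Z}. \tag{H.11} \]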

Next, we consider another polynomial, namely

\[ g : \mathbb{R} \longrightarrow \mathbb{R}, \quad g(x) := b^n \sum_{k=0}^{n} (-1)^k \pi^{2(n-k)} f^{(2k)}(x). \tag{H.12} \]

Due to (H.5), (H.10), (H.11), and (H.12), we have

\[ g(0) \in \mathbb{Z} \;\wedge\; g(1) \in \mathbb{Z}. \tag{H.13} \]

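For each $x \in \mathbb{R}$, one calculates

\begin{align}
g''(x) + \pi^2 g(x) &= b^n \sum_{k=0}^{n} (-1)^k \pi^{2(n-k)} f^{(2(k+1))}(x) + b^n \sum_{k=0}^{n} (-1)^k \pi^{2(n-(k-1))} f^{(2k)}(x) \notag \\
&= b^n \sum_{k=1}^{n+1} (-1)^{k-1} \pi^{2(n-(k-1))} f^{(2k)}(x) + b^n \sum_{k=0}^{n} (-1)^k \pi^{2(n-(k-1))} f^{(2k)}(x) \notag \\
&= b^n (-1)^n f^{(2n+2)}(x) + b^n \pi^{2n+2} f(x) = b^n \pi^{2n+2} f(x), \tag{H.14}
\end{align}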

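and, thus, for

\[ h : \mathbb{R} \longrightarrow \mathbb{R}, \quad h(x) := g'(x) \sin(\pi x) - \pi g(x) \cos(\pi x), \tag{H.15} \]

one obtains, for each $x \in \mathbb{R}$,

\begin{align}
h'(x) &= g''(x) \sin(\pi x) + \pi g'(x) \cos(\pi x) - \pi g'(x) \cos(\pi x) + \pi^2 g(x) \sin(\pi x) \notag \\
&= \big( g''(x) + \pi^2 g(x) \big) \sin(\pi x) \overset{(H.14)}{=} b^n \pi^{2n+2} f(x) \sin(\pi x) \overset{(H.5)}{=} \pi^2 a^n f(x) \sin(\pi x), \tag{H.16}
\end{align}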

implying the function $h$ is an antiderivative of the function $x \mapsto \pi^2 a^n f(x) \sin(\pi x)$. This, together with the fundamental theorem of calculus in the form Th. 10.20(b), implies

\[ I := \frac{\pi^2 a^n}{\pi} \int_0^1 f(x) \sin(\pi x) \, \mathrm{d}x = \frac{h(1) - h(0)}{\pi} = \frac{\pi g(1) + \pi g(0)}{\pi} = g(1) + g(0) \in \mathbb{Z}. \tag{H.17} \]

On the other hand, the definition of $f$ in (H.7) yields

\[ \forall_{0<x<1} \quad 0 < f(x) < \frac{1}{n!}, \tag{H.18} \]

and, thus, by (10.30) (i.e. by the monotonicity of the integral),

\[ 0 < I < \frac{\pi\, a^n}{n!} \overset{(H.6)}{<} 1. \tag{H.19} \]

The contradiction between (H.19) and (H.17) establishes the case. ∎
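The integrality statements (H.10) and (H.11), on which the proof hinges, can be verified exactly for small even $n$. The following sketch is for illustration only; the helper names and the coefficient-list representation of $f$ are assumptions. It builds the coefficients of $f(x) = x^n (1 - x)^n / n!$ from the binomial expansion and evaluates derivatives with rational arithmetic.

```python
from fractions import Fraction
from math import comb, factorial

def f_coefficients(n):
    """Coefficients c_0, ..., c_{2n} of f(x) = x^n (1 - x)^n / n!."""
    coeffs = [Fraction(0)] * (2 * n + 1)
    for k in range(n + 1):
        # x^n * (-1)^k * binom(n, k) * x^k / n!
        coeffs[n + k] = Fraction((-1) ** k * comb(n, k), factorial(n))
    return coeffs

def derivative_at(coeffs, j, x):
    """Exact value of the j-th derivative of the polynomial at x."""
    total = Fraction(0)
    for k in range(j, len(coeffs)):
        # d^j/dx^j x^k = k!/(k-j)! * x^(k-j)
        falling = factorial(k) // factorial(k - j)
        total += coeffs[k] * falling * Fraction(x) ** (k - j)
    return total
```

For $n = 4$, every value `derivative_at(f_coefficients(4), j, 0)` and `derivative_at(f_coefficients(4), j, 1)` has denominator 1, and the values at 1 agree with those at 0 up to the sign $(-1)^j$, in line with the symmetry $f(1 - x) = f(x)$.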


I Trigonometric Functions

I.1 Additional Trigonometric Formulas

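Proposition I.1. We have the following identities:

\[ \forall_{z\in\mathbb{C}} \quad \sin(2z) = 2 \sin z \cos z, \tag{I.1a} \]

\[ \forall_{z\in\mathbb{C}} \quad \cos(2z) = (\cos z)^2 - (\sin z)^2, \tag{I.1b} \]

\[ \forall_{z\in\mathbb{C}} \quad \frac{1 - \cos z}{2} = \left( \sin \frac{z}{2} \right)^2, \tag{I.1c} \]

\[ \forall_{z\in\mathbb{C}\setminus\{(2k+1)\pi :\, k\in\mathbb{Z}\}} \quad \tan \frac{z}{2} = \frac{\sin z}{\cos z + 1}, \tag{I.1d} \]

\[ \forall_{z\in\mathbb{C}\setminus\{(2k+1)\pi :\, k\in\mathbb{Z}\}} \quad \cos z = \frac{1 - (\tan \frac{z}{2})^2}{1 + (\tan \frac{z}{2})^2}. \tag{I.1e} \]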

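Proof. (I.1a) is immediate from (8.44c), (I.1b) is immediate from (8.44d).

(I.1c): For each $z \in \mathbb{C}$, one computes

\[ \frac{1 - \cos z}{2} \overset{(I.1b)}{=} \frac{1 - (\cos \frac{z}{2})^2 + (\sin \frac{z}{2})^2}{2} \overset{(8.44e)}{=} \frac{2\, (\sin \frac{z}{2})^2}{2} = \left( \sin \frac{z}{2} \right)^2. \tag{I.2} \]

(I.1d): Note that, according to (8.47d), one has

\[ \cos \frac{z}{2} = 0 \quad \Leftrightarrow \quad \exists_{k\in\mathbb{Z}} \quad z = (2k + 1)\, \pi. \tag{I.3} \]

Thus, for each $z \in \mathbb{C} \setminus \{(2k + 1)\pi : k \in \mathbb{Z}\}$, one computes

\[ \tan \frac{z}{2} = \frac{2 \sin \frac{z}{2} \cos \frac{z}{2}}{2\, (\cos \frac{z}{2})^2} \overset{(I.1a),(8.44e)}{=} \frac{\sin z}{(\cos \frac{z}{2})^2 - (\sin \frac{z}{2})^2 + 1} = \frac{\sin z}{\cos z + 1}. \tag{I.4} \]

(I.1e): Once again, using (I.3), one computes for each $z \in \mathbb{C} \setminus \{(2k + 1)\pi : k \in \mathbb{Z}\}$:

\[ \cos z \overset{(I.1b),(8.44e)}{=} \frac{(\cos \frac{z}{2})^2 - (\sin \frac{z}{2})^2}{(\cos \frac{z}{2})^2 + (\sin \frac{z}{2})^2} = \frac{1 - (\tan \frac{z}{2})^2}{1 + (\tan \frac{z}{2})^2}, \tag{I.5} \]

as claimed. ∎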
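Since the identities (I.1c) to (I.1e) hold for complex arguments, a quick floating-point spot check with Python's `cmath` module can help catch sign or factor slips when using them. This is a numerical illustration only, not a proof; the function name and the choice of sample points are assumptions.

```python
import cmath

def half_angle_residuals(z):
    """Absolute differences between the two sides of (I.1c), (I.1d),
    and (I.1e) at the complex point z (z must not be an odd multiple of pi)."""
    t = cmath.tan(z / 2)
    return (
        abs((1 - cmath.cos(z)) / 2 - cmath.sin(z / 2) ** 2),
        abs(t - cmath.sin(z) / (cmath.cos(z) + 1)),
        abs(cmath.cos(z) - (1 - t ** 2) / (1 + t ** 2)),
    )
```

At sample points such as `0.3`, `1 + 2j`, or `-2.5 + 0.5j`, all three residuals are at the level of floating-point rounding.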

J Differential Calculus

J.1 Continuous, But Nowhere Differentiable Functions

The following Ex. J.1 provides functions $f : \mathbb{R} \longrightarrow \mathbb{R}$ that are continuous, but nowhere differentiable.

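Example J.1. We start by defining the triangle wave function

\[ g : \mathbb{R} \longrightarrow \mathbb{R}, \quad g(x) := \begin{cases} x - k & \text{for } k \le x \le k + \frac{1}{2},\ k \in \mathbb{Z}, \\ -x + k + 1 & \text{for } k + \frac{1}{2} \le x \le k + 1,\ k \in \mathbb{Z}. \end{cases} \tag{J.1} \]

Then $g$ is well-defined and continuous, since, clearly, $g$ is piecewise affine, $k + \frac{1}{2} - k = \frac{1}{2} = -k - \frac{1}{2} + k + 1$, and $-(k + 1) + k + 1 = 0 = k + 1 - (k + 1)$. Moreover, for each $k \in \mathbb{Z}$, $g$ is clearly strictly increasing on $[k, k + \frac{1}{2}]$ and clearly strictly decreasing on $[k + \frac{1}{2}, k + 1]$, implying

\[ \forall_{x\in\mathbb{R}} \quad 0 \le g(x) \le \frac{1}{2}. \tag{J.2} \]

Clearly, (J.1) implies $g$ to be periodic with period 1, i.e.

\[ \forall_{x\in\mathbb{R}} \quad g(x + 1) = g(x). \tag{J.3} \]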

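Now fix $q \in \mathbb{R}$, $a \in \mathbb{N}$ such that

\[ 0 < q < 1 \;\wedge\; a \ge 4 \;\wedge\; aq > 2 \tag{J.4} \]

(clearly, $q = \frac{1}{2}$ and $a = 5$ satisfy (J.4), and there are (uncountably) many other admissible choices for $a$ and $q$). We now claim that

\[ f : \mathbb{R} \longrightarrow \mathbb{R}, \quad f(x) := \sum_{n=0}^{\infty} q^n g(a^n x), \tag{J.5} \]

is continuous and nowhere differentiable. We first note that, as $\sum_{n=0}^{\infty} q^n$ converges and

\[ \forall_{n\in\mathbb{N}} \quad \forall_{x\in\mathbb{R}} \quad 0 \overset{(J.2)}{\le} q^n g(a^n x) \overset{(J.2)}{\le} \frac{q^n}{2} < q^n, \tag{J.6} \]

Cor. 8.7(b) implies the series in (J.5) to converge uniformly. Then, since each function $f_n : \mathbb{R} \longrightarrow \mathbb{R}$, $f_n(x) := q^n g(a^n x)$, is continuous, $f$ must be continuous by Cor. 8.7(c).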

In preparation for showing $f$ to be nowhere differentiable, we have to further investigate the properties of $g$. We proceed by showing $g$ to be Lipschitz continuous with Lipschitz constant 1, i.e.

\[ \forall_{x,y\in\mathbb{R}} \quad |g(x) - g(y)| \le |x - y| : \tag{J.7} \]

If $|x - y| \ge \frac{1}{2}$, then, using (J.2),

\[ |g(x) - g(y)| \le \frac{1}{2} \le |x - y|. \]

If $|x - y| < \frac{1}{2}$, then we distinguish four cases, where, without loss of generality, we let $x$ denote the smaller of the two points and $y$ the larger, i.e. $x \le y$.

Case (i): There is $k \in \mathbb{Z}$ such that $k \le x, y \le k + \frac{1}{2}$. Then

\[ |g(x) - g(y)| = |x - k - y + k| = |x - y|. \]

Case (ii): There is $k \in \mathbb{Z}$ such that $k + \frac{1}{2} \le x, y \le k + 1$. Then

\[ |g(x) - g(y)| = |-x + k + 1 + y - k - 1| = |x - y|. \]

Case (iii): There is $k \in \mathbb{Z}$ such that $k \le x \le k + \frac{1}{2}$ and $k + \frac{1}{2} \le y \le k + 1$. Then $x - k - \frac{1}{2} \le 0$ and $y - k - \frac{1}{2} \ge 0$, implying

\[ |g(x) - g(y)| = |x - k + y - k - 1| = \left| x - k - \frac{1}{2} + \frac{1}{2} + y - k - 1 \right| \le \left| x - k - \frac{1}{2} - y + k + \frac{1}{2} \right| = |x - y|. \]

Case (iv): There is $k \in \mathbb{Z}$ such that $k - \frac{1}{2} \le x \le k$ and $k \le y \le k + \frac{1}{2}$. Then $-x + k \ge 0$ and $-y + k \le 0$, implying

\[ |g(x) - g(y)| = |-x + k - y + k| \le |-x + k + y - k| = |x - y|, \]

finishing the proof of (J.7).

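For each $c \in \mathbb{R}$, we also consider the following modified versions of $g$:

\[ g_c : \mathbb{R} \longrightarrow \mathbb{R}, \quad g_c(x) := g(cx) = \begin{cases} cx - k & \text{for } k \le cx \le k + \frac{1}{2},\ k \in \mathbb{Z}, \\ -cx + k + 1 & \text{for } k + \frac{1}{2} \le cx \le k + 1,\ k \in \mathbb{Z}. \end{cases} \tag{J.8} \]

Then, for $c \neq 0$, $g_c$ is periodic with period $c^{-1}$:

\[ \forall_{x\in\mathbb{R}} \quad g_c(x + c^{-1}) = g(cx + 1) \overset{(J.3)}{=} g(cx) = g_c(x). \tag{J.9} \]

Moreover, $g_c$ is Lipschitz continuous with Lipschitz constant $|c|$:

\[ \forall_{x,y\in\mathbb{R}} \quad |g_c(x) - g_c(y)| = |g(cx) - g(cy)| \overset{(J.7)}{\le} |cx - cy| = |c|\, |x - y|. \tag{J.10} \]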

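To show that $f$ is nowhere differentiable, we will now study suitable difference quotients. Let $(h_k)_{k\in\mathbb{N}}$ be a sequence such that

\[ \forall_{k\in\mathbb{N}} \quad h_k = \pm \frac{1}{a^{k+1}}. \tag{J.11} \]

Then (J.4) implies $\lim_{k\to\infty} h_k = 0$. Let $x \in \mathbb{R}$ be arbitrary. Define

\[ \forall_{k,n\in\mathbb{N}} \quad \delta_k g(a^n x) := \frac{g\big(a^n (x + h_k)\big) - g(a^n x)}{h_k}. \tag{J.12} \]

Then

\[ \forall_{k,n\in\mathbb{N}} \quad |\delta_k g(a^n x)| \overset{(J.10)}{\le} \frac{a^n\, |h_k|}{|h_k|} = a^n, \tag{J.13} \]

and, recalling $a \in \mathbb{N}$,

\[ \forall_{n,k\in\mathbb{N},\ n>k} \quad \delta_k g(a^n x) = \frac{g_{a^n}\big(x \pm \frac{1}{a^{k+1}}\big) - g_{a^n}(x)}{h_k} = \frac{g_{a^n}\big(x \pm \frac{a^{n-(k+1)}}{a^n}\big) - g_{a^n}(x)}{h_k} \overset{(J.9)}{=} 0. \tag{J.14} \]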

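Thus, for each $k \in \mathbb{N}$, we obtain

\[ \delta_k f := \frac{f(x + h_k) - f(x)}{h_k} = \frac{\sum_{n=0}^{\infty} q^n \big( g(a^n (x + h_k)) - g(a^n x) \big)}{h_k} \overset{(J.14)}{=} \sum_{n=0}^{k-1} q^n\, \delta_k g(a^n x) + q^k\, \delta_k g(a^k x) \tag{J.15} \]

and estimate, recalling $aq > 2$,

\[ \left| \sum_{n=0}^{k-1} q^n\, \delta_k g(a^n x) \right| \le \sum_{n=0}^{k-1} q^n\, |\delta_k g(a^n x)| \overset{(J.13)}{\le} \sum_{n=0}^{k-1} q^n a^n = \frac{1 - q^k a^k}{1 - qa} < \frac{q^k a^k}{qa - 1}. \tag{J.16} \]

We rewrite (J.16) as

\[ \left| \sum_{n=0}^{k-1} q^n\, \delta_k g(a^n x) \right| < \eta\, q^k a^k, \quad \text{where} \quad \eta := \frac{1}{aq - 1}, \quad 0 < \eta < 1. \tag{J.17} \]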

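According to (J.8), $g_{a^k}$ is affine on intervals of length $\frac{1}{2 a^k}$. Thus, since $a \ge 4$ implies

\[ \forall_{k\in\mathbb{N}} \quad \frac{1}{a^{k+1}} \le \frac{1}{4 a^k}, \tag{J.18} \]

we can always choose the sign of $h_k$ such that

\[ |\delta_k g(a^k x)| = \frac{|a^k (x + h_k - x)|}{|h_k|} = a^k. \tag{J.19} \]

Using this choice for the $h_k$, we combine our estimates to obtain

\[ \forall_{k\in\mathbb{N}} \quad |\delta_k f| \overset{(J.15)}{=} \left| \sum_{n=0}^{k-1} q^n\, \delta_k g(a^n x) + q^k\, \delta_k g(a^k x) \right| \overset{(J.17),(J.19)}{\ge} (1 - \eta)\, q^k a^k. \tag{J.20} \]

Thus, as $qa > 2$, $\lim_{k\to\infty} |\delta_k f| = \infty$, proving that $f$ is not differentiable in $x$. As $x \in \mathbb{R}$ was arbitrary, $f$ is nowhere differentiable.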
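The growth of the difference quotients in (J.20) can be observed with exact rational arithmetic, because (J.14) makes all summands with $n > k$ vanish, so that $\delta_k f$ is a finite sum. The sketch below is an illustration under stated assumptions: it fixes the admissible choice $q = \frac{1}{2}$, $a = 5$ from (J.4), restricts to rational $x$, and the function names are introduced here only for this purpose.

```python
from fractions import Fraction

def g(x):
    """Triangle wave of (J.1) for a Fraction x: distance to the nearest integer."""
    frac = x - (x.numerator // x.denominator)   # fractional part in [0, 1[
    return min(frac, 1 - frac)

def delta_f(x, k, sign, q=Fraction(1, 2), a=5):
    """Difference quotient of f at x for h_k = sign / a^(k+1).

    By (J.14) the summands with n > k vanish, so the finite sum is exact."""
    h = Fraction(sign, a ** (k + 1))
    total = sum(q ** n * (g(a ** n * (x + h)) - g(a ** n * x))
                for n in range(k + 1))
    return total / h

def best_quotient(x, k):
    """Largest |delta_k f| over the two admissible signs of h_k."""
    return max(abs(delta_f(x, k, +1)), abs(delta_f(x, k, -1)))
```

With $q = \frac{1}{2}$ and $a = 5$, one has $\eta = \frac{2}{3}$, so (J.20) predicts `best_quotient(x, k)` to be at least $\frac{1}{3} \left(\frac{5}{2}\right)^k$ for every rational `x` (passed as a `Fraction`), which grows without bound in $k$.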

References

[Bla84] A. Blass. Existence of Bases Implies the Axiom of Choice. Contemporary Mathematics 31 (1984), 31–33.

[EFT07] H.-D. Ebbinghaus, J. Flum, and W. Thomas. Einführung in die mathematische Logik, 5th ed. Spektrum Akademischer Verlag, Heidelberg, 2007 (German).

[EHH+95] H.-D. Ebbinghaus, H. Hermes, F. Hirzebruch, M. Koecher, K. Mainzer, J. Neukirch, A. Prestel, and R. Remmert. Numbers. Graduate Texts in Mathematics, Vol. 123, Springer-Verlag, New York, 1995, corrected 3rd printing.

[Jec73] T. Jech. The Axiom of Choice. North-Holland, Amsterdam, 1973.

[Kon04] Konrad Königsberger. Analysis 1, 6th ed. Springer-Verlag, Berlin, 2004 (German).

[Kun80] Kenneth Kunen. Set Theory. Studies in Logic and the Foundations of Mathematics, Vol. 102, North-Holland, Amsterdam, 1980.

[Kun12] Kenneth Kunen. The Foundations of Mathematics. Studies in Logic, Vol. 19, College Publications, London, 2012.

[Phi17] P. Philip. Analysis III: Measure and Integration Theory of Several Variables. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2016/2017, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_Analysis3.pdf.

[Phi19] P. Philip. Linear Algebra I. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2018/2019, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_LinearAlgebra1.pdf.

[Wal02] Wolfgang Walter. Analysis 2, 5th ed. Springer-Verlag, Berlin, 2002 (German).

[Wal04] Wolfgang Walter. Analysis 1, 7th ed. Springer-Verlag, Berlin, 2004 (German).

[Wik15a] Wikipedia. Coq — Wikipedia, The Free Encyclopedia. 2015, https://en.wikipedia.org/wiki/Coq. Online; accessed Sep-01-2015.

[Wik15b] Wikipedia. HOL Light — Wikipedia, The Free Encyclopedia. 2015, https://en.wikipedia.org/wiki/HOL_Light. Online; accessed Sep-01-2015.

[Wik15c] Wikipedia. Isabelle (proof assistant) — Wikipedia, The Free Encyclopedia. 2015, https://en.wikipedia.org/wiki/Isabelle_(proof_assistant). Online; accessed Sep-01-2015.
