a simple introduction to ergodic theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  ·...

145
A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin December 18, 2008

Upload: trinhnhan

Post on 24-Mar-2018

219 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

A Simple Introduction to Ergodic Theory

Karma Dajani and Sjoerd Dirksin

December 18, 2008

Page 2: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

2

Page 3: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Contents

1 Introduction and preliminaries 51.1 What is Ergodic Theory? . . . . . . . . . . . . . . . . . . . . . 51.2 Measure Preserving Transformations . . . . . . . . . . . . . . 61.3 Basic Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 91.4 Recurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151.5 Induced and Integral Transformations . . . . . . . . . . . . . . 15

1.5.1 Induced Transformations . . . . . . . . . . . . . . . . . 151.5.2 Integral Transformations . . . . . . . . . . . . . . . . . 18

1.6 Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201.7 Other Characterizations of Ergodicity . . . . . . . . . . . . . . 221.8 Examples of Ergodic Transformations . . . . . . . . . . . . . . 24

2 The Ergodic Theorem 292.1 The Ergodic Theorem and its consequences . . . . . . . . . . . 292.2 Characterization of Irreducible Markov Chains . . . . . . . . . 412.3 Mixing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3 Measure Preserving Isomorphisms and Factor Maps 473.1 Measure Preserving Isomorphisms . . . . . . . . . . . . . . . . 473.2 Factor Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513.3 Natural Extensions . . . . . . . . . . . . . . . . . . . . . . . . 52

4 Entropy 554.1 Randomness and Information . . . . . . . . . . . . . . . . . . 554.2 Definitions and Properties . . . . . . . . . . . . . . . . . . . . 564.3 Calculation of Entropy and Examples . . . . . . . . . . . . . . 634.4 The Shannon-McMillan-Breiman Theorem . . . . . . . . . . . 664.5 Lochs’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 72

3

Page 4: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

4

5 Hurewicz Ergodic Theorem 795.1 Equivalent measures . . . . . . . . . . . . . . . . . . . . . . . 795.2 Non-singular and conservative transformations . . . . . . . . . 805.3 Hurewicz Ergodic Theorem . . . . . . . . . . . . . . . . . . . . 83

6 Invariant Measures for Continuous Transformations 896.1 Existence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 896.2 Unique Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . 96

7 Topological Dynamics 1017.1 Basic Notions . . . . . . . . . . . . . . . . . . . . . . . . . . . 1027.2 Topological Entropy . . . . . . . . . . . . . . . . . . . . . . . 108

7.2.1 Two Definitions . . . . . . . . . . . . . . . . . . . . . . 1087.2.2 Equivalence of the two Definitions . . . . . . . . . . . . 114

7.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

8 The Variational Principle 1278.1 Main Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 1278.2 Measures of Maximal Entropy . . . . . . . . . . . . . . . . . . 136

Page 5: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Chapter 1

Introduction and preliminaries

1.1 What is Ergodic Theory?

It is not easy to give a simple definition of Ergodic Theory because it usestechniques and examples from many fields such as probability theory, statis-tical mechanics, number theory, vector fields on manifolds, group actions ofhomogeneous spaces and many more.

The word ergodic is a mixture of two Greek words: ergon (work) and odos(path). The word was introduced by Boltzmann (in statistical mechanics)regarding his hypothesis: for large systems of interacting particles in equilib-rium, the time average along a single trajectory equals the space average. Thehypothesis as it was stated was false, and the investigation for the conditionsunder which these two quantities are equal lead to the birth of ergodic theoryas is known nowadays.

A modern description of what ergodic theory is would be: it is the studyof the long term average behavior of systems evolving in time. The collectionof all states of the system form a space X, and the evolution is representedby either

– a transformation T : X → X, where Tx is the state of the system attime t = 1, when the system (i.e., at time t = 0) was initially in state x.(This is analogous to the setup of discrete time stochastic processes).

– if the evolution is continuous or if the configurations have spacial struc-ture, then we describe the evolution by looking at a group of transfor-mations G (like Z2, R, R2) acting on X, i.e., every g ∈ G is identifiedwith a transformation Tg : X → X, and Tgg′ = Tg Tg′ .

5

Page 6: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

6 Introduction and preliminaries

The space X usually has a special structure, and we want T to preservethe basic structure on X. For example–if X is a measure space, then T must be measurable.–if X is a topological space, then T must be continuous.–if X has a differentiable structure, then T is a diffeomorphism.In this course our space is a probability space (X,B, µ), and our time isdiscrete. So the evolution is described by a measurable map T : X → X, sothat T−1A ∈ B for all A ∈ B. For each x ∈ X, the orbit of x is the sequence

x, Tx, T 2x, . . . .

If T is invertible, then one speaks of the two sided orbit

. . . , T−1x, x, Tx, . . . .

We want also that the evolution is in steady state i.e. stationary. In thelanguage of ergodic theory, we want T to be measure preserving.

1.2 Measure Preserving Transformations

Definition 1.2.1 Let (X,B, µ) be a probability space, and T : X → X mea-surable. The map T is said to be measure preserving with respect to µ ifµ(T−1A) = µ(A) for all A ∈ B.

This definition implies that for any measurable function f : X → R, theprocess

f, f T, f T 2, . . .

is stationary. This means that for all Borel sets B1, . . . , Bn, and all integersr1 < r2 < . . . < rn, one has for any k ≥ 1,

µ (x : f(T r1x) ∈ B1, . . . f(T rnx) ∈ Bn) =

µ(x : f(T r1+kx) ∈ B1, . . . f(T rn+kx) ∈ Bn

).

In case T is invertible, then T is measure preserving if and only if µ(TA) =µ(A) for all A ∈ B. We can generalize the definition of measure preserving tothe following case. Let T : (X1,B1, µ1) → (X2,B2, µ2) be measurable, thenT is measure preserving if µ1(T

−1A) = µ2(A) for all A ∈ B2. The following

Page 7: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Measure Preserving Transformations 7

gives a useful tool for verifying that a transformation is measure preserving.For this we need the notions of algebra and semi-algebra.Recall that a collection S of subsets of X is said to be a semi-algebra if(i) ∅ ∈ S, (ii) A ∩ B ∈ S whenever A,B ∈ S, and (iii) if A ∈ S, thenX\A = ∪ni=1Ei is a disjoint union of elements of S. For example if X = [0, 1),and S is the collection of all subintervals, then S is a semi-algebra. Or ifX = 0, 1Z, then the collection of all cylinder sets x : xi = ai, . . . , xj = ajis a semi-algebra.An algebra A is a collection of subsets of X satisfying:(i) ∅ ∈ A, (ii) ifA,B ∈ A, then A ∩ B ∈ A, and finally (iii) if A ∈ A, then X \ A ∈ A.Clearly an algebra is a semi-algebra. Furthermore, given a semi-algebra Sone can form an algebra by taking all finite disjoint unions of elements of S.We denote this algebra by A(S), and we call it the algebra generated by S.It is in fact the smallest algebra containing S. Likewise, given a semi-algebraS (or an algebra A), the σ-algebra generated by S (A) is denoted by B(S)(B(A)), and is the smallest σ-algebra containing S (or A).A monotone class C is a collection of subsets of X with the following twoproperties

– if E1 ⊆ E2 ⊆ . . . are elements of C, then ∪∞i=1Ei ∈ C,

– if F1 ⊇ F2 ⊇ . . . are elements of C, then ∩∞i=1Fi ∈ C.

The monotone class generated by a collection S of subsets of X is the smallestmonotone class containing S.

Theorem 1.2.1 Let A be an algebra of X, then the σ-algebra B(A) gener-ated by A equals the monotone class generated by A.

Using the above Theorem, one can get an easier criterion for checking thata transformation is measure preserving.

Theorem 1.2.2 Let (Xi,Bi, µi) be probability spaces, i = 1, 2, and T : X1 →X2 a transformation. Suppose S2 is a generating semi-algebra of B2. Then,T is measurable and measure preserving if and only if for each A ∈ S2, wehave T−1A ∈ B1 and µ1(T

−1A) = µ2(A).

Proof. Let

C = B ∈ B2 : T−1B ∈ B1, and µ1(T−1B) = µ2(B).

Page 8: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

8 Introduction and preliminaries

Then, S2 ⊆ C ⊆ B2, and hence A(S2) ⊂ C. We show that C is a monotoneclass. Let E1 ⊆ E2 ⊆ . . . be elements of C, and let E = ∪∞i=1Ei. Then,T−1E = ∪∞i=1T

−1Ei ∈ B1.

µ1(T−1E) = µ1(∪∞n=1T

−1En) = limn→∞

µ1(T−1En)

= limn→∞

µ2(En)

= µ2(∪∞n=1En)

= µ2(E).

Thus, E ∈ C. A similar proof shows that if F1 ⊇ F2 ⊇ . . . are elements of C,then ∩∞i=1Fi ∈ C. Hence, C is a monotone class containing the algebra A(S2).By the monotone class theorem, B2 is the smallest monotone class containingA(S2), hence B2 ⊆ C. This shows that B2 = C, therefore T is measurableand measure preserving.

For example if– X = [0, 1) with the Borel σ-algebra B, and µ a probability measure on B.Then a transformation T : X → X is measurable and measure preserving ifand only if T−1[a, b) ∈ B and µ (T−1[a, b)) = µ ([a, b)) for any interval [a, b).– X = 0, 1N with product σ-algebra and product measure µ. A trans-formation T : X → X is measurable and measure preserving if and onlyif

T−1 (x : x0 = a0, . . . , xn = an) ∈ B,

and

µ(T−1x : x0 = a0, . . . , xn = an

)= µ (x : x0 = a0, . . . , xn = an)

for any cylinder set.

Exercise 1.2.1 Recall that if A and B are measurable sets, then

A∆B = (A ∪B) \ (A ∩B) = (A \B) ∪ (B \ A).

Show that for any measurable sets A,B,C one has

µ(A∆B) ≤ µ(A∆C) + µ(C∆B).

Another useful lemma is the following (see also ([KT]).

Page 9: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Basic Examples 9

Lemma 1.2.1 Let (X,B, µ) be a probability space, and A an algebra gener-ating B. Then, for any A ∈ B and any ε > 0, there exists C ∈ A such thatµ(A∆C) < ε.

Proof. Let

D = A ∈ B : for any ε > 0, there exists C ∈ A such that µ(A∆C) < ε.

Clearly, A ⊆ D ⊆ B. By the Monotone Class Theorem (Theorem (1.2.1)),we need to show that D is a monotone class. To this end, let A1 ⊆ A2 ⊆ · · ·be a sequence in D, and let A =

⋃∞n=1An, notice that µ(A) = lim

n→∞µ(An).

Let ε > 0, there exists an N such that µ(A∆AN) = |µ(A) − µ(AN)| < ε/2.Since AN ∈ D, then there exists C ∈ A such that µ(AN∆C) < ε/2. Then,

µ(A∆C) ≤ µ(A∆AN) + µ(AN∆C) < ε.

Hence, A ∈ D. Similarly, one can show that D is closed under decreasing in-tersections so that D is a monotone class containg A, hence by the Monotoneclass Theorem B ⊆ D. Therefore, B = D, and the theorem is proved.

1.3 Basic Examples

(a) Translations – Let X = [0, 1) with the Lebesgue σ-algebra B, andLebesgue measure λ. Let 0 < θ < 1, define T : X → X by

Tx = x+ θ mod 1 = x+ θ − bx+ θc.

Then, by considering intervals it is easy to see that T is measurable andmeasure preserving.

(b) Multiplication by 2 modulo 1 – Let (X,B, λ) be as in example (a), andlet T : X → X be given by

Tx = 2x mod 1 =

2x 0 ≤ x < 1/22x− 1 1/2 ≤ x < 1.

For any interval [a, b),

T−1[a, b) = [a

2,b

2) ∪ [

a+ 1

2,b+ 1

2),

Page 10: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

10 Introduction and preliminaries

andλ(T−1[a, b)

)= b− a = λ ([a, b)) .

Although this map is very simple, it has in fact many facets. For example,iterations of this map yield the binary expansion of points in [0, 1) i.e., usingT one can associate with each point in [0, 1) an infinite sequence of 0’s and1’s. To do so, we define the function a1 by

a1(x) =

0 if 0 ≤ x < 1/2

1 if 1/2 ≤ x < 1,

then Tx = 2x − a1(x). Now, for n ≥ 1 set an(x) = a1(Tn−1x). Fix x ∈ X,

for simplicity, we write an instead of an(x), then Tx = 2x − a1. Rewritingwe get x = a1

2+ Tx

2. Similarly, Tx = a2

2+ T 2x

2. Continuing in this manner,

we see that for each n ≥ 1,

x =a1

2+a2

22+ · · ·+ an

2n+T nx

2n.

Since 0 < T nx < 1, we get

x−n∑i=1

ai2i

=T nx

2n→ 0 as n→∞.

Thus, x =∑∞

i=1ai

2i . We shall later see that the sequence of digits a1, a2, . . .forms an i.i.d. sequence of Bernoulli random variables.

(c) Baker’s Transformation – This example is the two-dimensional versionof example (b). The underlying probability space is [0, 1)2 with productLebesgue σ-algebra B × B and product Lebesgue measure λ × λ. DefineT : [0, 1)2 → [0, 1)2 by

T (x, y) =

(2x, y

2) 0 ≤ x < 1/2

(2x− 1, y+12

) 1/2 ≤ x < 1.

Exercise 1.3.1 Verify that T is invertible, measurable and measure preserv-ing.

(d) β-transformations – Let X = [0, 1) with the Lebesgue σ-algebra B. Let

β = 1+√

52

, the golden mean. Notice that β2 = β+1. Define a transformation

Page 11: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Basic Examples 11

T : X → X by

Tx = βx mod 1 =

βx 0 ≤ x < 1/ββx− 1 1/β ≤ x < 1.

Then, T is not measure preserving with respect to Lebesgue measure (givea counterexample), but is measure preserving with respect to the measure µgiven by

µ(B) =

∫B

g(x) dx,

where

g(x) =

5+3

√5

100 ≤ x < 1/β

5+√

510

1/β ≤ x < 1.

Exercise 1.3.2 Verify that T is measure preserving with respect to µ, andshow that (similar to example (b)) iterations of this map generate expansionsfor points x ∈ [0, 1) (known as β-expansions) of the form

x =∞∑i=1

biβi,

where bi ∈ 0, 1 and bibi+1 = 0 for all i ≥ 1.

(e) Bernoulli Shifts – Let X = 0, 1, . . . k − 1Z (or X = 0, 1, . . . k − 1N),F the σ-algebra generated by the cylinders. Let p = (p0, p1, . . . , pk−1) be apositive probability vector, define a measure µ on F by specifying it on thecylinder sets as follows

µ (x : x−n = a−n, . . . , xn = an) = pa−n . . . pan .

Let T : X → X be defined by Tx = y, where yn = xn+1. The map T , calledthe left shift, is measurable and measure preserving, since

T−1x : x−n = a−n, . . . xn = an = x : x−n+1 = a−n, . . . , xn+1 = an,

and

µ (x : x−n+1 = a−n, . . . , xn+1 = an) = pa−n . . . pan .

Page 12: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

12 Introduction and preliminaries

Notice that in case X = 0, 1, . . . k − 1N, then one should consider cylindersets of the form x : x0 = a0, . . . xn = an. In this case

T−1x : x0 = a0, . . . , xn = an = ∪k−1j=0x : x0 = j, x1 = a0, . . . , xn+1 = an,

and it is easy to see that T is measurable and measure preserving.

(f) Markov Shifts – Let (X,F , T ) be as in example (e). We define a measureν on F as follows. Let P = (pij) be a stochastic k × k matrix, and q =(q0, q1, . . . , qk−1) a positive probability vector such that qP = q. Define ν oncylinders by

ν (x : x−n = a−n, . . . xn = an) = qa−npa−na−n+1 . . . pan−1an .

Just as in example (e), one sees that T is measurable and measure pre-serving.

(g) Stationary Stochastic Processes– Let (Ω,F , IP) be a probability space,and

. . . , Y−2, Y−1, Y0, Y1, Y2, . . .

a stationary stochastic process on Ω with values in R. Hence, for each k ∈ Z

IP (Yn1 ∈ B1, . . . , Ynr ∈ Br) = IP (Yn1+k ∈ B1, . . . , Ynr+k ∈ Br)

for any n1 < n2 < · · · < nr and any Lebesgue sets B1, . . . , Br. We want tosee this process as coming from a measure preserving transformation.Let X = RZ = x = (. . . , x1, x0, x1, . . .) : xi ∈ R with the product σ-algebra(i.e. generated by the cylinder sets). Let T : X → X be the left shift i.e.Tx = z where zn = xn+1. Define φ : Ω → X by

φ(ω) = (. . . , Y−2(ω), Y−1(ω), Y0(ω), Y1(ω), Y2(ω), . . . ).

Then, φ is measurable since if B1, . . . , Br are Lebesgue sets in R, then

φ−1 (x ∈ X : xn1 ∈ B1, . . . xnr ∈ Br) = Y −1n1

(B1) ∩ . . . ∩ Y −1nr

(Br) ∈ F .

Define a measure µ on X by

µ(E) = IP(φ−1(E)

).

On cylinder sets µ has the form,

µ (x ∈ X : xn1 ∈ B1, . . . xnr ∈ Br) = IP (Yn1 ∈ B1, . . . , Ynr ∈ Br) .

Page 13: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Basic Examples 13

Since

T−1 (x : xn1 ∈ B1, . . . xnr ∈ Br) = x : xn1+1 ∈ B1, . . . xnr+1 ∈ Br,stationarity of the process Yn implies that T is measure preserving. Further-more, if we let πi : X → R be the natural projection onto the ith coordinate,then Yi(ω) = πi(φ(ω)) = π0 T i(φ(ω)).

(h) Random Shifts – Let (X,B, µ) be a probability space, and T : X → X aninvertible measure preserving transformation. Then, T−1 is measurable andmeasure preserving with respect to µ. Suppose now that at each momentinstead of moving forward by T (x→ Tx), we first flip a fair coin to decidewhether we will use T or T−1. We can describe this random system by meansof a measure preserving transformation in the following way.Let Ω = −1, 1Z with product σ-algebra F (i.e. the σ-algebra generatedby the cylinder sets), and the uniform product measure IP (see example (e)),and let σ : Ω → Ω be the left shift. As in example (e), the map σ is measurepreserving. Now, let Y = Ω × X with the product σ-algebra, and productmeasure IP× µ. Define S : Y → Y by

S(ω, x) = (σω, T ω0x).

Then S is invertible (why?), and measure preserving with respect to IP× µ.To see the latter, for any set C ∈ F , and any A ∈ B, we have

(IP× µ)(S−1(C × A)

)= (IP× µ) ((ω, x) : S(ω, x) ∈ (C × A))

= (IP× µ) ((ω, x) : ω0 = 1, σω ∈ C, Tx ∈ A)

+ (IP× µ)((ω, x) : ω0 = −1, σω ∈ C, T−1x ∈ A

)= (IP× µ)

(ω0 = 1 ∩ σ−1C × T−1A

)+ (IP× µ)

(ω0 = −1 ∩ σ−1C × TA

)= IP

(ω0 = 1 ∩ σ−1C

)µ(T−1A

)+ IP

(ω0 = −1 ∩ σ−1C

)µ (TA)

= IP(ω0 = 1 ∩ σ−1C

)µ(A)

+ IP(ω0 = −1 ∩ σ−1C

)µ(A)

= IP(σ−1C)µ(A) = IP(C)µ(A) = (IP× µ)(C × A).

(h) continued fractions – Consider ([0, 1),B), where B is the Lebesgue σ-algebra. Define a transformation T : [0, 1) → [0, 1) by T0 = 0 and for x 6= 0

Tx =1

x− b1

xc.

Page 14: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

14 Introduction and preliminaries

Exercise 1.3.3 Show that T is not measure preserving with respect toLebesgue measure, but is measure preserving with respect to the so calledGauss probability measure µ given by

µ(B) =

∫B

1

log 2

1

1 + xdx.

An interesting feature of this map is that its iterations generate the continuedfraction expansion for points in (0, 1). For if we define

a1 = a1(x) =

1 if x ∈ (1

2, 1)

n if x ∈ ( 1n+1

, 1n], n ≥ 2,

then, Tx = 1x− a1 and hence x =

1

a1 + Tx. For n ≥ 1, let an = an(x) =

a1(Tn−1x). Then, after n iterations we see that

x =1

a1 + Tx= . . . =

1

a1 +1

a2 +.. . +

1

an + T nx

.

In fact, ifpnqn

=1

a1 +1

a2 +.. . +

1

an

, then one can show that qn are mono-

tonically increasing, and

|x− pnqn| < 1

q2n

→ 0 as n→∞.

The last statement implies that

x =1

a1 +1

a2 +1

a3 +1. . .

.

Page 15: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Recurrence 15

1.4 Recurrence

Let T be a measure preserving transformation on a probability space (X,F , µ),and let B ∈ F . A point x ∈ B is said to be B-recurrent if there exists k ≥ 1such that T kx ∈ B.

Theorem 1.4.1 (Poincare Recurrence Theorem) If µ(B) > 0, thena.e. x ∈ B is B-recurrent.

Proof Let F be the subset of B consisting of all elements that are not B-recurrent. Then,

F = x ∈ B : T kx /∈ B for all k ≥ 1.

We want to show that µ(F ) = 0. First notice that F ∩ T−kF = ∅ for allk ≥ 1, hence T−lF ∩ T−mF = ∅ for all l 6= m. Thus, the sets F, T−1F, . . .are pairwise disjoint, and µ(T−nF ) = µ(F ) for all n ≥ 1 (T is measurepreserving). If µ(F ) > 0, then

1 = µ(X) ≥ µ(∪k≥0T

−kF)

=∑k≥0

µ(F ) = ∞,

a contradiction.

The proof of the above theorem implies that almost every x ∈ B returnsto B infinitely often. In other words, there exist infinitely many integersn1 < n2 < . . . such that T nix ∈ B. To see this, let

D = x ∈ B : T kx ∈ B for finitely many k ≥ 1.

Then,D = x ∈ B : T kx ∈ F for some k ≥ 0 ⊆ ∪∞k=0T

−kF.

Thus, µ(D) = 0 since µ(F ) = 0 and T is measure preserving.

1.5 Induced and Integral Transformations

1.5.1 Induced Transformations

Let T be a measure preserving transformation on the probability space(X,F , µ). Let A ⊂ X with µ(A) > 0. By Poincare’s Recurrence Theo-rem almost every x ∈ A returns to A infinitely often under the action of T .

Page 16: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

16 Introduction and preliminaries

For x ∈ A, let n(x) := infn ≥ 1 : T nx ∈ A. We call n(x) the first returntime of x to A.

Exercise 1.5.1 Show that n is measurable with respect to the σ-algebraF ∩ A on A.

By Poincare Theorem, n(x) is finite a.e. on A. In the sequel we removefrom A the set of measure zero on which n(x) = ∞, and we denote the newset again by A. Consider the σ-algebra F ∩A on A, which is the restrictionof F to A. Furthermore, let µA be the probability measure on A, defined by

µA(B) =µ(B)

µ(A), for B ∈ F ∩ A,

so that (A,F ∩A, µA) is a probability space. Finally, define the induced mapTA : A→ A by

TAx = T n(x)x , for x ∈ A.

From the above we see that TA is defined onA. What kind of a transformationis TA?

Exercise 1.5.2 Show that TA is measurable with respect to the σ-algebraF ∩ A.

Proposition 1.5.1 TA is measure preserving with respect to µA.

Proof For k ≥ 1, let

Ak = x ∈ A : n(x) = k

Bk = x ∈ X \ A : Tx, . . . , T k−1x 6∈ A, T kx ∈ A.

Notice that A =⋃∞k=1Ak, and

T−1A = A1 ∪B1 and T−1Bn = An+1 ∪Bn+1. (1.1)

Let C ∈ F∩A, since T is measure preserving it follows that µ(C) = µ(T−1C).To show that µA(C) = µA(T−1

A C), we show that

µ(T−1A C) = µ(T−1C).

Page 17: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Induced and Integral Transformations 17

A1 A2 A3 A4 A5

B1 B2 B3 B4

B1 B2 B3

B1 B2

B1

A ≡ (A, 1)

TA \ A ≡ (A, 2)

T 2A \ A ∪ TA ≡ (A, 3)

...

........

........

........

........

.................

.................................................................................

........

........

........

........

.................

.................................................................................

........

........

........

........

.................

.................................................................................

........

........

........

........

.................

................

........

........

........

........

.................

.................................................................................

........

........

........

........

.................

................

Figure 1.1: A tower.

Now,

T−1A (C) =

∞⋃k=1

Ak ∩ T−1A C =

∞⋃k=1

Ak ∩ T−kC,

hence

µ(T−1A (C)

)=

∞∑k=1

µ(Ak ∩ T−kC

).

On the other hand, using repeatedly (1.1) , one gets for any n ≥ 1,

µ(T−1(C)

)= µ(A1 ∩ T−1C) + µ(B1 ∩ T−1C)

= µ(A1 ∩ T−1C) + µ(T−1(B1 ∩ T−1C))

= µ(A1 ∩ T−1C) + µ(A2 ∩ T−2C) + µ(B2 ∩ T−2C)...

=n∑k=1

µ(Ak ∩ T−kC) + µ(Bn ∩ T−nC).

Since

1 ≥ µ

(∞⋃n=1

Bn ∩ T−nC

)=

∞∑n=1

µ(Bn ∩ T−nC),

it follows thatlimn→∞

µ(Bn ∩ T−nC) = 0.

Page 18: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

18 Introduction and preliminaries

Thus,

µ(C) = µ(T−1C) =∞∑k=1

µ(Ak ∩ T−kC

)= µ(T−1

A C).

This shows that µA(C) = µA(T−1A C), which implies that TA is measure pre-

serving with respect to µA.

Exercise 1.5.3 Assume T is invertible. Without using Proposition 1.5.1show that for all C ∈ F ∩ A,

µA(C) = µA(TAC).

Exercise 1.5.4 Let G =1 +

√5

2, so that G2 = G+ 1. Consider the set

X = [0,1

G)× [0, 1)

⋃[1

G, 1)× [0,

1

G),

endowed with the product Borel σ-algebra, and the normalized Lebesguemeasure λ× λ. Define the transformation

T (x, y) =

(Gx,

y

G), (x, y) ∈ [0, 1

G)× [0, 1]

(Gx− 1,1 + y

G), (x, y) ∈ [ 1

G, 1)× [0, 1

G).

(a) Show that T is measure preserving with respect to λ× λ.

(b) Determine explicitely the induced transformation of T on the set [0, 1)×[0, 1

G).

1.5.2 Integral Transformations

Let S be a measure preserving transformation on a probability space (A,F , ν),and let f ∈ L1(A, ν) be positive and integer valued. We now construct a mea-sure preserving transformation T on a probability space (X, C, µ), such thatthe original transformation S can be seen as the induced transformation onX with return time f .

(1) X = (y, i) : y ∈ A and 1 ≤ i ≤ f(y), i ∈ N,

Page 19: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Induced and Integral Transformations 19

(2) C is generated by sets of the form

(B, i) = (y, i) : y ∈ B and f(y) ≥ i ,

where B ⊂ A, B ∈ F and i ∈ N.

(3) µ(B, i) =ν(B)∫

A

f(y) dν(y)and then extend µ to all of X.

(4) Define T : X → X as follows:

T (y, i) =

(y, i+ 1), if i+ 1 ≤ f(y),

(Sy, 1), if i+ 1 > f(y).

Now (X, C, µ, T ) is called an integral system of (A,F , ν, S) under f . We nowshow that T is µ-measure preserving. In fact, it suffices to check this on thegenerators.

Let B ⊂ A be F -measurable, and let i ≥ 1. We have to discern thefollowing two cases:

(1) If i > 1, then T−1(B, i) = (B, i− 1) and clearly

µ(T−1(B, i)) = µ(B, i− 1) = µ(B, i) =ν(B)∫

Af(y) dν(y)

.

(2) If i = 1, we write An = y ∈ A : f(y) = n, and we have

T−1(B, 1) =∞⋃n=1

(An ∩ S−1B, n) (disjoint union).

Since⋃∞n=1An = A we therefore find that

µ(T−1(B, 1)) =∞∑n=1

ν(An ∩ S−1B)∫A

f(y) dν(y)=

ν(S−1B)∫A

f(y) dν(y)

=ν(B)∫

A

f(y) dν(y)= µ(B, 1) .

Page 20: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

20 Introduction and preliminaries

This shows that T is measure preserving. Moreover, if we consider theinduced transformation of T on the set (A, 1), then the first return timen(x, 1) = infk ≥ 1 : T k(x, 1) ∈ (A, 1) is given by n(x, 1) = f(x), andT(A,1)(x, 1) = (Sx, 1).

1.6 Ergodicity

Definition 1.6.1 Let T be a measure preserving transformation on a proba-bility space (X,F , µ). The map T is said to be ergodic if for every measurableset A satisfying T−1A = A, we have µ(A) = 0 or 1.

Theorem 1.6.1 Let (X,F , µ) be a probability space and T : X → X mea-sure preserving. The following are equivalent:

(i) T is ergodic.

(ii) If B ∈ F with µ(T−1B∆B) = 0, then µ(B) = 0 or 1.

(iii) If A ∈ F with µ(A) > 0, then µ (∪∞n=1T−nA) = 1.

(iv) If A,B ∈ F with µ(A) > 0 and µ(B) > 0, then there exists n > 0 suchthat µ(T−nA ∩B) > 0.

Remark 1.6.1

1. In case T is invertible, then in the above characterization one can replaceT−n by T n.

2. Note that if µ(B4T−1B) = 0, then µ(B \ T−1B) = µ(T−1B \ B) = 0.Since

B =(B \ T−1B

)∪(B ∩ T−1B

),

andT−1B =

(T−1B \B

)∪(B ∩ T−1B

),

we see that after removing a set of measure 0 from B and a set of measure 0from T−1B, the remaining parts are equal. In this case we say that B equalsT−1B modulo sets of measure 0.3. In words, (iii) says that if A is a set of positive measure, almost everyx ∈ X eventually (in fact infinitely often) will visit A.4. (iv) says that elements of B will eventually enter A.

Page 21: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Ergodicity 21

Proof of Theorem 1.6.1

(i)⇒(ii) Let B ∈ F be such that µ(B∆T−1B) = 0. We shall define ameasurable set C with C = T−1C and µ(C∆B) = 0. Let

C = x ∈ X : T nx ∈ B i.o. =∞⋂n=1

∞⋃k=n

T−kB.

Then, T−1C = C, hence by (i) µ(C) = 0 or 1. Furthermore,

µ(C∆B) = µ

(∞⋂n=1

∞⋃k=n

T−kB ∩Bc

)+ µ

(∞⋃n=1

∞⋂k=n

T−kBc ∩B

)

≤ µ

(∞⋃k=1

T−kB ∩Bc

)+ µ

(∞⋃k=1

T−kBc ∩B

)

≤∞∑k=1

µ(T−kB∆B

).

Using induction (and the fact that µ(E∆F ) ≤ µ(E∆G)+µ(G∆F )), one canshow that for each k ≥ 1 one has µ

(T−kB∆B

)= 0. Hence, µ(C∆B) = 0

which implies that µ(C) = µ(B). Therefore, µ(B) = 0 or 1.

(ii)⇒(iii) Let µ(A) > 0 and let B =⋃∞n=1 T

−nA. Then T−1B ⊂ B. Since Tis measure preserving, then µ(B) > 0 and

µ(T−1B∆B) = µ(B \ T−1B) = µ(B)− µ(T−1B) = 0.

Thus, by (ii) µ(B) = 1.

(iii)⇒(iv) Suppose µ(A)µ(B) > 0. By (iii)

µ(B) = µ

(B ∩

∞⋃n=1

T−nA

)= µ

(∞⋃n=1

(B ∩ T−nA)

)> 0.

Hence, there exists k ≥ 1 such that µ(B ∩ T−kA) > 0.

(iv)⇒(i) Suppose T−1A = A with µ(A) > 0. If µ(Ac) > 0, then by (iv) thereexists k ≥ 1 such that µ(Ac ∩ T−kA) > 0. Since T−kA = A, it follows thatµ(Ac ∩ A) > 0, a contradiction. Hence, µ(A) = 1 and T is ergodic.

Page 22: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

22 Introduction and preliminaries

1.7 Other Characterizations of Ergodicity

We denote by L0(X,F , µ) the space of all complex valued measurable func-tions on the probability space (X,F , µ). Let

Lp(X,F , µ) = f ∈ L0(X,F , µ) :

∫X

|f |p dµ(x) <∞.

We use the subscript R whenever we are dealing only with real-valued func-tions.Let (Xi,Fi, µi), i = 1, 2 be two probability spaces, and T : X1 → X2 a mea-sure preserving transformation i.e., µ2(A) = µ1(T

−1A). Define the inducedoperator UT : L0(X2,F2, µ2) → L0(X1,F1, µ1) by

UTf = f T.

The following properties of UT are easy to prove.

Proposition 1.7.1 The operator UT has the following properties:

(i) UT is linear

(ii) UT (fg) = UT (f)UT (g)

(iii) UT c = c for any constant c.

(iv) UT is a positive linear operator

(v) UT1B = 1B T = 1T−1B for all B ∈ F2.

(vi)∫X1UTf dµ1 =

∫X2f dµ2 for all f ∈ L0(X2,F2, µ2), (where if one side

doesn’t exist or is infinite, then the other side has the same property).

(vii) Let p ≥ 1. Then, UTLp(X2,F2, µ2) ⊂ Lp(X1,F1, µ1), and ||UTf ||p =

||f ||p for all f ∈ Lp(X2,F2, µ2).

Exercise 1.7.1 Prove Proposition 1.7.1

Using the above properties, we can give the following characterization ofergodicity

Page 23: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Other Characterizations of Ergodicity 23

Theorem 1.7.1 Let (X,F , µ) be a probability space, and T : X → X mea-sure preserving. The following are equivalent:

(i) T is ergodic.

(ii) If f ∈ L0(X,F , µ), with f(Tx) = f(x) for all x, then f is a constanta.e.

(iii) If f ∈ L0(X,F , µ), with f(Tx) = f(x) for a.e. x, then f is a constanta.e.

(iv) If f ∈ L2(X,F , µ), with f(Tx) = f(x) for all x, then f is a constanta.e.

(v) If f ∈ L2(X,F , µ), with f(Tx) = f(x) for a.e. x, then f is a constanta.e.

Proof

The implications (iii)⇒(ii), (ii)⇒(iv), (v)⇒(iv), and (iii)⇒(v) are all clear.It remains to show (i)⇒(iii) and (iv)⇒(i).

(i)⇒(iii) Suppose f(Tx) = f(x) a.e. and assume without any loss of gener-ality that f is real (otherwise we consider separately the real and imaginaryparts of f). For each n ≥ 1 and k ∈ Z, let

X(k,n) = x ∈ X :k

2n≤ f(x) <

k + 1

2n.

Then, T−1X(k,n)∆X(k,n) ⊆ x : f(Tx) 6= f(x) which implies that

µ(T−1X(k,n)∆X(k,n)

)= 0.

By ergodicity of T , µ(X(k,n)) = 0 or 1, for each k ∈ Z. On the other hand,for each n ≥ 1, we have

X =⋃k∈Z

X(k,n) (disjoint union).

Hence, for each n ≥ 1, there exists a unique integer kn such that µ(X(kn,n)

)=

1. In fact, X(k1,1) ⊇ X(k2,2) ⊇ . . ., and kn2n is a bounded increasing sequence,

Page 24: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

24 Introduction and preliminaries

hence limn→∞kn2n

exists. Let Y =⋂n≥1X(kn,n), then µ(Y ) = 1. Now, if

x ∈ Y , then 0 ≤ |f(x)− kn/2n| < 1/2n for all n. Hence, f(x) = limn→∞

kn2n

,

and f is a constant on Y .

(iv)⇒(i) Suppose T−1A = A and µ(A) > 0. We want to show that µ(A) = 1.Consider 1A, the indicator function of A. We have 1A ∈ L2(X,F , µ), and1A T = 1T−1A = 1A. Hence, by (iv), 1A is a constant a.e., hence 1A = 1 a.e.and therefore µ(A) = 1.

1.8 Examples of Ergodic Transformations

Example 1–Irrational Rotations. Consider ([0, 1),B, λ), where B is the Lebesgueσ-algebra, and λ Lebesgue measure. For θ ∈ (0, 1), consider the transforma-tion Tθ : [0, 1) → [0, 1) defined by Tθx = x + θ (mod 1). We have seenin example (a) that Tθ is measure preserving with respect λ. When is Tθergodic?

If θ is rational, then Tθ is not ergodic. Consider for example θ = 1/4, thenthe set

A = [0, 1/8) ∪ [1/4, 3/8) ∪ [1/2, 5/8) ∪ [3/4, 7/8)

is Tθ-invariant but µ(A) = 1/2.

Exercise 1.8.1 Suppose θ =p

qwith gcd(p, q) = 1. Find a non-trivial Tθ-

invariant set. Conclude that Tθ is not ergodic if θ is a rational.

Claim. Tθ is ergodic if and only if θ is irrational.

Proof of Claim.

(⇒) The contrapositive statement is given in Exercise 1.8.1 i.e. if θ is rational,then Tθ is not ergodic.

(⇐) Suppose θ is irrational, and let f ∈ L2(X,B, λ) be Tθ-invariant. Writef in its Fourier series

f(x) =∑n∈Z

ane2πinx.

Page 25: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Examples of Ergodic Transformations 25

Since f(Tθx) = f(x), then

f(Tθx) =∑n∈Z

ane2πin(x+θ) =

∑n∈Z

ane2πinθe2πinx

= f(x) =∑n∈Z

ane2πinx.

Hence,∑

n∈Z an(1 − e2πinθ)e2πinx = 0. By the uniqueness of the Fouriercoefficients, we have an(1 − e2πinθ) = 0 for all n ∈ Z. If n 6= 0, since θ isirrational we have 1 − e2πinθ 6= 0. Thus, an = 0 for all n 6= 0, and thereforef(x) = a0 is a constant. By Theorem 1.7.1, Tθ is ergodic.

Exercise 1.8.2 Consider the probability space ([0, 1),B × B, λ× λ), whereas above B is the Lebesgue σ-algebra on [0, 1), and λ normalized Lebesguemeasure. Suppose θ ∈ (0, 1) is irrational, and define Tθ×Tθ : [0, 1)× [0, 1) →[0, 1)× [0, 1) by

Tθ × Tθ(x, y) = (x+ θ mod (1) , y + θ mod (1) ) .

Show that Tθ × Tθ is measure preserving, but is not ergodic.

Example 2–One (or Two) sided shift. Let X = 0, 1, . . . k − 1N, F the σ-algebra generated by the cylinders, and µ the product measure defined oncylinder sets by

µ (x : x0 = a0, . . . xn = an) = pa0 . . . pan ,

where p = (p0, p1, . . . , pk−1) is a positive probability vector. Consider theleft shift T defined on X by Tx = y, where yn = xn+1 (See Example (e) inSubsection 1.3). We show that T is ergodic. Let E be a measurable subsetof X which is T -invariant i.e., T−1E = E. For any ε > 0, by Lemma 1.2.1(see subsection 1.2), there exists A ∈ F which is a finite disjoint union ofcylinders such that µ(E∆A) < ε. Then

|µ(E)− µ(A)| = |µ(E \ A)− µ(A \ E)|≤ µ(E \ A) + µ(A \ E) = µ(E∆A) < ε.

Since A depends on finitely many coordinates only, there exists n0 > 0such that T−n0A depends on different coordinates than A. Since µ is aproduct measure, we have

µ(A ∩ T−n0A) = µ(A)µ(T−n0A) = µ(A)2.

Page 26: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

26 Introduction and preliminaries

Further,

µ(E∆T−n0A) = µ(T−n0E∆T−n0A) = µ(E∆A) < ε,

andµ(E∆(A ∩ T−n0A)

)≤ µ(E∆A) + µ(E∆T−n0A) < 2ε.

Hence,

|µ(E)− µ((A ∩ T−n0A))| ≤ µ(E∆(A ∩ T−n0A)

)< 2ε.

Thus,

|µ(E)− µ(E)2| ≤ |µ(E)− µ(A)2|+ |µ(A)2 − µ(E)2|= |µ(E)− µ((A ∩ T−n0A))|+ (µ(A) + µ(E))|µ(A)− µ(E)|< 4ε.

Since ε > 0 is arbitrary, it follows that µ(E) = µ(E)2, hence µ(E) = 0 or 1.Therefore, T is ergodic.

The following lemma provides, in some cases, a useful tool to verify that ameasure preserving transformation defined on ([0, 1),B, µ) is ergodic, whereB is the Lebesgue σ-algebra, and µ is a probability measure equivalent toLebesgue measure λ (i.e., µ(A) = 0 if and only if λ(A) = 0).

Lemma 1.8.1 (Knopp’s Lemma) . If B is a Lebesgue set and C is a classof subintervals of [0, 1) satisfying

(a) every open subinterval of [0, 1) is at most a countable union of disjointelements from C,

(b) ∀A ∈ C , λ(A ∩B) ≥ γλ(A), where γ > 0 is independent of A,

then λ(B) = 1.

Proof The proof is done by contradiction. Suppose λ(Bc) > 0. Given ε > 0there exists by Lemma 1.2.1 a set Eε that is a finite disjoint union of openintervals such that λ(Bc4Eε) < ε. Now by conditions (a) and (b) (thatis, writing Eε as a countable union of disjoint elements of C) one gets thatλ(B ∩ Eε) ≥ γλ(Eε).

Page 27: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Examples of Ergodic Transformations 27

Also from our choice of Eε and the fact that

λ(Bc4Eε) ≥ λ(B ∩ Eε) ≥ γλ(Eε) ≥ γλ(Bc ∩ Eε) > γ(λ(Bc)− ε),

we have that

γ(λ(Bc)− ε) < λ(Bc4Eε) < ε .

Hence γλ(Bc) < ε+ γε, and since ε > 0 is arbitrary, we get a contradiction.

Example 3–Multiplication by 2 modulo 1–Consider ([0, 1),B, λ) be as in Ex-ample (1) above, and let T : X → X be given by

Tx = 2x mod 1 =

2x 0 ≤ x < 1/22x− 1 1/2 ≤ x < 1,

(see Example (b), subsection 1.3). We have seen that T is measure preserving.We will use Lemma 1.8.1 to show that T is ergodic. Let C be the collectionof all intervals of the form [k/2n, (k+ 1)/2n) with n ≥ 1 and 0 ≤ k ≤ 2n− 1.Notice that the the set k/2n : n ≥ 1, 0 ≤ k < 2n − 1 of dyadic rationalsis dense in [0, 1), hence each open interval is at most a countable union ofdisjoint elements of C. Hence, C satisfies the first hypothesis of Knopp’sLemma. Now, T n maps each dyadic interval of the form [k/2n, (k + 1)/2n)linearly onto [0, 1), (we call such an interval dyadic of order n); in fact,T nx = 2nx mod(1). Let B ∈ B be T -invariant, and assume λ(B) > 0. LetA ∈ C, and assume that A is dyadic of order n. Then, T nA = [0, 1) and

λ(A ∩B) = λ(A ∩ T−nB) =1

λ(A)λ(T nA ∩B)

=1

2nλ(B) = λ(A)λ(B).

Thus, the second hypothesis of Knopp’s Lemma is satisfied with γ = λ(B) >0. Hence, λ(B) = 1. Therefore T is ergodic.

Exercise 1.8.3 Let β > 1 be a non-integer, and consider the transformationTβ : [0, 1) → [0, 1) given by Tβx = βx mod(1) = βx − bβxc. Use Lemma1.8.1 to show that Tβ is ergodic with respect to Lebesgue measure λ, i.e. ifT−1β A = A, then λ(A) = 0 or 1.

Page 28: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

28 Introduction and preliminaries

Example 4–Induced transformations of ergodic transformations– Let T be anergodic measure preserving transformation on the probability space (X,F , µ),and A ∈ F with µ(A) > 0. Consider the induced transformation TA on(A,F ∩ A, µA) of T (see subsection 1.5). Recall that TAx = T n(x)x, wheren(x) := infn ≥ 1 : T nx ∈ A. Let (as before)

Ak = x ∈ A : n(x) = k

Bk = x ∈ X \ A : Tx, . . . , T k−1x 6∈ A, T kx ∈ A.

Proposition 1.8.1 If T is ergodic on (X,F , µ), then TA is ergodic on (A,F∩A, µA).

Proof Let C ∈ F ∩ A be such that T−1A C = C. We want to show that

µA(C) = 0 or 1; equivalently, µ(C) = 0 or µ(C) = µ(A). Since A =⋃k≥1Ak,

we have C = T−1A C =

⋃k≥1Ak ∩ T−kC. Let E =

⋃k≥1Bk ∩ T−kC, and

F = E∪C (disjoint union). Recall that (see subsection 1.5) T−1A = A1∪B1,and T−1Bk = Ak+1 ∪Bk+1. Hence,

T−1F = T−1E ∪ T−1C

=⋃k≥1

[(Ak+1 ∪Bk+1) ∩ T−(k+1)C

]∪[(A1 ∪B1) ∩ T−1C

]=

⋃k≥1

(Ak ∩ T−kC) ∪⋃k≥1

(Bk ∩ T−kC)

= C ∪ E = F.

Hence, F is T -invariant, and by ergodicity of T we have µ(F ) = 0 or 1.–If µ(F ) = 0, then µ(C) = 0, and hence µA(C) = 0.–If µ(F ) = 1, then µ(X \ F ) = 0. Since

X \ F = (A \ C) ∪ ((X \ A) \ E) ⊇ A \ C,

it follows thatµ(A \ C) ≤ µ(X \ F ) = 0.

Since µ(A \ C) = µ(A)− µ(C), we have µ(A) = µ(C), i.e., µA(C) = 1.

Exercise 1.8.4 Show that if TA is ergodic and µ(⋃

k≥1 T−kA

)= 1, then, T

is ergodic.

Page 29: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Chapter 2

The Ergodic Theorem

2.1 The Ergodic Theorem and its consequences

The Ergodic Theorem is also known as Birkhoff’s Ergodic Theorem or theIndividual Ergodic Theorem (1931). This theorem is in fact a generalizationof the Strong Law of Large Numbers (SLLN) which states that for a sequenceY1, Y2, . . . of i.i.d. random variables on a probability space (X,F , µ), withE|Yi| < ∞; one has

limn→∞

1

n

n∑i=1

Yi = EY1 (a.e.).

For example consider X = 0, 1N, F the σ-algebra generated by the cylindersets, and µ the uniform product measure, i.e.,

µ (x : x1 = a1, x2 = a2, . . . , xn = an) = 1/2n.

Suppose one is interested in finding the frequency of the digit 1. More pre-cisely, for a.e. x we would like to find

limn→∞

1

n#1 ≤ i ≤ n : xi = 1.

Using the Strong Law of Large Numbers one can answer this question easily.Define

Yi(x) :=

1, if xi = 1,0, otherwise.

29

Page 30: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

30 The Ergodic Theorem

Since µ is product measure, it is easy to see that Y1, Y2, . . . form an i.i.d.Bernoulli process, and EYi = E|Yi| = 1/2. Further, #1 ≤ i ≤ n : xi = 1 =∑n

i=1 Yi(x). Hence, by SLLN one has

limn→∞

1

n#1 ≤ i ≤ n : xi = 1 =

1

2.

Suppose now we are interested in the frequency of the block 011, i.e., wewould like to find

limn→∞

1

n#1 ≤ i ≤ n : xi = 0, xi+1 = 1, xi+2 = 1.

We can start as above by defining random variables

Zi(x) :=

1, if xi = 0, xi+1 = 1, xi+2 = 1,0, otherwise.

Then,

1

n#1 ≤ i ≤ n : xi = 0, xi+1 = 1, xi+2 = 1 =

1

n

n∑i=1

Zi(x).

It is not hard to see that this sequence is stationary but not independent. Soone cannot directly apply the strong law of large numbers. Notice that if Tis the left shift on X, then Yn = Y1 T n−1 and Zn = Z1 T n−1.In general, suppose (X,F , µ) is a probability space and T : X → X a measurepreserving transformation. For f ∈ L1(X,F , µ), we would like to know under

what conditions does the limit limn→∞1

n

n−1∑i=0

f(T ix) exist a.e. If it does exist

what is its value? This is answered by the Ergodic Theorem which wasoriginally proved by G.D. Birkhoff in 1931. Since then, several proofs of thisimportant theorem have been obtained; here we present a recent proof givenby T. Kamae and M.S. Keane in [KK].

Theorem 2.1.1 (The Ergodic Theorem) Let (X,F , µ) be a probability spaceand T : X → X a measure preserving transformation. Then, for any f inL1(µ),

limn→∞

1

n

n−1∑i=0

f(T i(x)) = f ∗(x)

exists a.e., is T -invariant and∫Xf dµ =

∫Xf ∗ dµ. If moreover T is ergodic,

then f ∗ is a constant a.e. and f ∗ =∫Xf dµ.

Page 31: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

The Ergodic Theorem and its consequences 31

For the proof of the above theorem, we need the following simple lemma.

Lemma 2.1.1 Let M > 0 be an integer, and suppose ann≥0, bnn≥0 aresequences of non-negative real numbers such that for each n = 0, 1, 2, . . . thereexists an integer 1 ≤ m ≤M with

an + · · ·+ an+m−1 ≥ bn + · · ·+ bn+m−1.

Then, for each positive integer N > M , one has

a0 + · · ·+ aN−1 ≥ b0 + · · ·+ bN−M−1.

Proof of Lemma 2.1.1 Using the hypothesis we recursively find integersm0 < m1 < . . . < mk < N with the following properties

m0 ≤M, mi+1 −mi ≤M for i = 0, . . . , k − 1, and N −mk < M,

a0 + . . .+ am0−1 ≥ b0 + . . .+ bm0−1,

am0 + . . .+ am1−1 ≥ bm0 + . . .+ bm1−1,

...

amk−1+ . . .+ amk−1 ≥ bmk−1

+ . . .+ bmk−1.

Then,

a0 + . . .+ aN−1 ≥ a0 + . . .+ amk−1

≥ b0 + . . .+ bmk−1 ≥ b0 + . . . bN−M−1.

Proof of Theorem 2.1.1 Assume with no loss of generality that f ≥ 0(otherwise we write f = f+ − f−, and we consider each part separately).

Let fn(x) = f(x) + . . . + f(T n−1x), f(x) = lim supn→∞fn(x)

n, and f(x) =

lim infn→∞fn(x)

n. Then f and f are T -invariant. This follows from

f(Tx) = lim supn→∞

fn(Tx)

n

= lim supn→∞

[fn+1(x)

n+ 1· n+ 1

n− f(x)

n

]= lim sup

n→∞

fn+1(x)

n+ 1= f(x).

Page 32: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

32 The Ergodic Theorem

(Similarly f is T -invariant). Now, to prove that f ∗ exists, is integrable andT -invariant, it is enough to show that∫

X

f dµ ≥∫X

f dµ ≥∫X

f dµ.

For since f − f ≥ 0, this would imply that f = f = f ∗. a.e.

We first prove that∫Xfdµ ≤

∫Xf dµ. Fix any 0 < ε < 1, and let L > 0 be

any real number. By definition of f , for any x ∈ X, there exists an integerm > 0 such that

fm(x)

m≥ min(f(x), L)(1− ε).

Now, for any δ > 0 there exists an integer M > 0 such that the set

X0 = x ∈ X : ∃ 1 ≤ m ≤M with fm(x) ≥ m min(f(x), L)(1− ε)

has measure at least 1− δ. Define F on X by

F (x) =

f(x) x ∈ X0

L x /∈ X0.

Notice that f ≤ F (why?). For any x ∈ X, let an = an(x) = F (T nx), andbn = bn(x) = min(f(x), L)(1 − ε) (so bn is independent of n).We now showthat an and bn satisfy the hypothesis of Lemma 2.1.1 with M > 0 asabove. For any n = 0, 1, 2, . . .

–if T nx ∈ X0, then there exists 1 ≤ m ≤M such that

fm(T nx) ≥ m min(f(T nx), L)(1− ε)

= m min(f(x), L)(1− ε)

= bn + . . .+ bn+m−1.

Hence,

an + . . .+ an+m−1 = F (T nx) + . . .+ F (T n+m−1x)

≥ f(T nx) + . . .+ f(T n+m−1x) = fm(T nx)

≥ bn + . . .+ bn+m−1.

–If T nx /∈ X0, then take m = 1 since

an = F (T nx) = L ≥ min(f(x), L)(1− ε) = bn.

Page 33: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

The Ergodic Theorem and its consequences 33

Hence by Lemma 2.1.1 for all integers N > M one has

F (x) + . . .+ F (TN−1x) ≥ (N −M) min(f(x), L)(1− ε).

Integrating both sides, and using the fact that T is measure preserving onegets

N

∫X

F (x) dµ(x) ≥ (N −M)

∫X

min(f(x), L)(1− ε) dµ(x).

Since ∫X

F (x) dµ(x) =

∫X0

f(x) dµ(x) + Lµ(X \X0),

one has∫X

f(x) dµ(x) ≥∫X0

f(x) dµ(x)

=

∫X

F (x) dµ(x)− Lµ(X \X0)

≥ (N −M)

N

∫X

min(f(x), L)(1− ε) dµ(x)− Lδ.

Now letting first N →∞, then δ → 0, then ε→ 0, and lastly L→∞ onegets together with the monotone convergence theorem that f is integrable,and ∫

X

f(x) dµ(x) ≥∫X

f(x) dµ(x).

We now prove that ∫X

f(x) dµ(x) ≤∫X

f(x) dµ(x).

Fix ε > 0, for any x ∈ X there exists an integer m such that

fm(x)

m≤ (f(x) + ε).

For any δ > 0 there exists an integer M > 0 such that the set

Y0 = x ∈ X : ∃ 1 ≤ m ≤M with fm(x) ≤ m (f(x) + ε)

Page 34: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

34 The Ergodic Theorem

has measure at least 1− δ. Define G on X by

G(x) =

f(x) x ∈ Y0

0 x /∈ Y0.

Notice that G ≤ f . Let bn = G(T nx), and an = f(x)+ε (so an is independentof n). One can easily check that the sequences an and bn satisfy thehypothesis of Lemma 2.1.1 with M > 0 as above. Hence for any M > N ,one has

G(x) + . . .+G(TN−M−1x) ≤ N(f(x) + ε).

Integrating both sides yields

(N −M)

∫X

G(x)dµ(x) ≤ N(

∫X

f(x)dµ(x) + ε).

Since f ≥ 0, the measure ν defined by ν(A) =∫Af(x) dµ(x) is absolutely

continuous with respect to the measure µ. Hence, there exists δ0 > 0 suchthat if µ(A) < δ, then ν(A) < δ0. Since µ(X \ Y0) < δ, then ν(X \ Y0) =∫X\Y0

f(x)dµ(x) < δ0. Hence,∫X

f(x) dµ(x) =

∫X

G(x) dµ(x) +

∫X\Y0

f(x) dµ(x)

≤ N

N −M

∫X

(f(x) + ε) dµ(x) + δ0.

Now, let first N → ∞, then δ → 0 (and hence δ0 → 0), and finally ε → 0,one gets ∫

X

f(x) dµ(x) ≤∫X

f(x) dµ(x).

This shows that ∫X

f dµ ≥∫X

f dµ ≥∫X

f dµ,

hence, f = f = f ∗ a.e., and f ∗ is T -invariant. In case T is ergodic, then theT -invariance of f ∗ implies that f ∗ is a constant a.e. Therefore,

f ∗(x) =

∫X

f ∗(y)dµ(y) =

∫X

f(y) dµ(y).

Page 35: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

The Ergodic Theorem and its consequences 35

Remarks(1) Let us study further the limit f ∗ in the case that T is not ergodic. Let Ibe the sub-σ-algebra of F consisting of all T -invariant subsets A ∈ F . Noticethat if f ∈ L1(µ), then the conditional expectation of f given I (denoted byEµ(f |I)), is the unique a.e. I-measurable L1(µ) function with the propertythat ∫

A

f(x) dµ(x) =

∫A

Eµ(f |I)(x) dµ(x)

for all A ∈ I i.e., T−1A = A. We claim that f ∗ = Eµ(f |I). Since the limitfunction f ∗ is T -invariant, it follows that f ∗ is I-measurable. Furthermore,for any A ∈ I, by the ergodic theorem and the T -invariance of 1A,

limn→∞

1

n

n−1∑i=0

(f1A)(T ix) = 1A(x) limn→∞

1

n

n−1∑i=0

f(T ix) = 1A(x)f ∗(x) a.e.

and ∫X

f1A(x) dµ(x) =

∫X

f ∗1A(x) dµ(x).

This shows that f ∗ = Eµ(f |I).

(2) Suppose T is ergodic and measure preserving with respect to µ, and letν be a probability measure which is equivalent to µ (i.e. µ and ν have thesame sets of measure zero so µ(A) = 0 if and only if ν(A) = 0), then forevery f ∈ L1(µ) one has

limn→∞

1

n

n−1∑i=0

f(T i(x)) =

∫X

f dµ

ν a.e.

Exercise 2.1.1 (Kac’s Lemma) Let T be a measure preserving and ergodictransformation on a probability space (X,F , µ). Let A be a measurablesubset of X of positive µ measure, and denote by n the first return time mapand let TA be the induced transformation of T on A (see section 1.5). Provethat ∫

A

n(x) dµ = 1.

Page 36: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

36 The Ergodic Theorem

Conclude that n(x) ∈ L1(A, µA), and that

limn→∞

1

n

n−1∑i=0

n(T iA(x)) =1

µ(A),

almost everywhere on A.

Exercise 2.1.2 Let β =1 +

√5

2, and consider the transformation Tβ :

[0, 1) → [0, 1) given by Tβx = βx mod(1) = βx − bβxc. Define b1 on[0, 1) by

b1(x) =

0 if 0 ≤ x < 1/β1 if 1/β ≤ x < 1,

Fix k ≥ 0. Find the a.e. value (with respect to Lebesgue measure) of thefollowing limit

limn→∞

1

n#1 ≤ i ≤ n : bi = 0, bi+1 = 0, . . . , bi+k = 0.

Using the Ergodic Theorem, one can give yet another characterization ofergodicity.

Corollary 2.1.1 Let (X,F , µ) be a probability space, and T : X → X ameasure preserving transformation. Then, T is ergodic if and only if for allA,B ∈ F , one has

limn→∞

1

n

n−1∑i=0

µ(T−iA ∩B) = µ(A)µ(B). (2.1)

Proof Suppose T is ergodic, and let A,B ∈ F . Since the indicator function1A ∈ L1(X,F , µ), by the ergodic theorem one has

limn→∞

1

n

n−1∑i=0

1A(T ix) =

∫X

1A(x) dµ(x) = µ(A) a.e.

Page 37: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

The Ergodic Theorem and its consequences 37

Then,

limn→∞

1

n

n−1∑i=0

1T−iA∩B(x) = limn→∞

1

n

n−1∑i=0

1T−iA(x)1B(x)

= 1B(x) limn→∞

1

n

n−1∑i=0

1A(T ix)

= 1B(x)µ(A) a.e.

Since for each n, the function limn→∞1n

∑n−1i=0 1T−iA∩B is dominated by the

constant function 1, it follows by the dominated convergence theorem that

limn→∞

1

n

n∑i=0

µ(T−iA ∩B) =

∫X

limn→∞

1

n

n−1∑i=0

1T−iA∩B(x) dµ(x)

=

∫X

1Bµ(A) dµ(x) = µ(A)µ(B).

Conversely, suppose (2.1) holds for every A,B ∈ F . Let E ∈ F be such thatT−1E = E and µ(E) > 0. By invariance of E, we have µ(T−iE∩E) = µ(E),hence

limn→∞

1

n

n−1∑i=0

µ(T−iE ∩ E) = µ(E).

On the other hand, by (2.1)

limn→∞

1

n

n−1∑i=0

µ(T−iE ∩ E) = µ(E)2.

Hence, µ(E) = µ(E)2. Since µ(E) > 0, this implies µ(E) = 1. Therefore, Tis ergodic.

To show ergodicity one needs to verify equation (2.1) for sets A and Bbelonging to a generating semi-algebra only as the next proposition shows.

Proposition 2.1.1 Let (X,F , µ) be a probability space, and S a generatingsemi-algebra of F . Let T : X → X be a measure preserving transformation.Then, T is ergodic if and only if for all A,B ∈ S, one has

limn→∞

1

n

n−1∑i=0

µ(T−iA ∩B) = µ(A)µ(B). (2.2)

Page 38: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

38 The Ergodic Theorem

Proof We only need to show that if (2.2) holds for all A,B ∈ S, then itholds for all A,B ∈ F . Let ε > 0, and A,B ∈ F . Then, by Lemma 1.2.1(subsection 1.2) there exist sets A0, B0 each of which is a finite disjoint unionof elements of S such that

µ(A∆A0) < ε, and µ(B∆B0) < ε.

Since,

(T−iA ∩B)∆(T−iA0 ∩B0) ⊆ (T−iA∆T−iA0) ∪ (B∆B0),

it follows that

|µ(T−iA ∩B)− µ(T−iA0 ∩B0)| ≤ µ[(T−iA ∩B)∆(T−iA0 ∩B0)

]≤ µ(T−iA∆T−iA0) + µ(B∆B0)

< 2ε.

Further,

|µ(A)µ(B)− µ(A0)µ(B0)| ≤ µ(A)|µ(B)− µ(B0)|+ µ(B0)|µ(A)− µ(A0)|≤ |µ(B)− µ(B0)|+ |µ(A)− µ(A0)|≤ µ(B∆B0) + µ(A∆A0)

< 2ε.

Hence,∣∣∣∣∣(

1

n

n−1∑i=0

µ(T−iA ∩B)− µ(A)µ(B)

)−

(1

n

n−1∑i=0

µ(T−iA0 ∩B0)− µ(A0)µ(B0)

)∣∣∣∣∣≤ 1

n

n−1∑i=0

∣∣µ(T−iA ∩B) + µ(T−iA0 ∩B0)∣∣− |µ(A)µ(B)− µ(A0)µ(B0)|

< 4ε.

Therefore,

limn→∞

[1

n

n−1∑i=0

µ(T−iA ∩B)− µ(A)µ(B)

]= 0.

Page 39: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

The Ergodic Theorem and its consequences 39

Theorem 2.1.2 Suppose µ1 and µ2 are probability measures on (X,F), andT : X → X is measurable and measure preserving with respect to µ1 and µ2.Then,

(i) if T is ergodic with respect to µ1, and µ2 is absolutely continuous withrespect to µ1, then µ1 = µ2,

(ii) if T is ergodic with respect to µ1 and µ2, then either µ1 = µ2 or µ1 andµ2 are singular with respect to each other.

Proof (i) Suppose T is ergodic with respect to µ1 and µ2 is absolutely con-tinuous with respect to µ1. For any A ∈ F , by the ergodic theorem for a.e.x one has

limn→∞

1

n

n−1∑i=0

1A(T ix) = µ1(A).

Let

CA = x ∈ X : limn→∞

1

n

n−1∑i=0

1A(T ix) = µ1(A),

then µ1(CA) = 1, and by absolute continuity of µ2 one has µ2(CA) = 1. SinceT is measure preserving with respect to µ2, for each n ≥ 1 one has

1

n

n−1∑i=0

∫X

1A(T ix) dµ2(x) = µ2(A).

On the other hand, by the dominated convergence theorem one has

limn→∞

∫X

1

n

n−1∑i=0

1A(T ix)dµ2(x) =

∫X

µ1(A) dµ2(x).

This implies that µ1(A) = µ2(A). Since A ∈ F is arbitrary, we have µ1 = µ2.

(ii) Suppose T is ergodic with respect to µ1 and µ2. Assume that µ1 6= µ2.Then, there exists a set A ∈ F such that µ1(A) 6= µ2(A). For i = 1, 2 let

Ci = x ∈ X : limn→∞

1

n

n−1∑j=0

1A(T jx) = µi(A).

Page 40: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

40 The Ergodic Theorem

By the ergodic theorem µi(Ci) = 1 for i = 1, 2. Since µ1(A) 6= µ2(A), thenC1 ∩ C2 = ∅. Thus µ1 and µ2 are supported on disjoint sets, and hence µ1

and µ2 are mutually singular.

We end this subsection with a short discussion that the assumption of er-godicity is not very restrictive. Let T be a transformation on the probabilityspace (X,F , µ), and suppose T is measure preserving but not necessarilyergodic. We assume that X is a complete separable metric space, and F thecorresponding Borel σ-algebra (in order to make sure that the conditionalexpectation is well-defined a.e.). Let I be the sub-σ-algebra of T -invariantmeasurable sets. We can decompose µ into T -invariant ergodic componentsin the following way. For x ∈ X, define a measure µx on F by

µx(A) = Eµ(1A|I)(x).

Then, for any f ∈ L1(X,F , µ),∫X

f(y) dµx(y) = Eµ(f |I)(x).

Note that

µ(A) =

∫X

Eµ(1A|I)(x) dµ(x) =

∫X

µx(A) dµ(x),

and that Eµ(1A|I)(x) is T -invariant. We show that µx is T -invariant andergodic for a.e. x ∈ X. So let A ∈ F , then for a.e. x

µx(T−1A) = Eµ(1A T |I)(x) = Eµ(IA|I)(Tx) = Eµ(IA|I)(x) = µx(A).

Now, let A ∈ F be such that T−1A = A. Then, 1A is T -invariant, and henceI-measurable. Then,

µx(A) = Eµ(1A|I)(x) = 1A(x) a.e.

Hence, for a.e. x and for any B ∈ F ,

µx(A ∩B) = Eµ(1A1B|I)(x) = 1A(x)Eµ(1B|I)(x) = µx(A)µx(B).

In particular, if A = B, then the latter equality yields µx(A) = µx(A)2 whichimplies that for a.e. x, µx(A) = 0 or 1. Therefore, µx is ergodic. (Onein fact needs to work a little harder to show that one can find a set N ofµ-measure zero, such that for any x ∈ X \N , and any T -invariant set A, onehas µx(A) = 0 or 1. In the above analysis the a.e. set depended on the choiceof A. Hence, the above analysis is just a rough sketch of the proof of whatis called the ergodic decomposition of measure preserving transformations.)

Page 41: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Characterization of Irreducible Markov Chains 41

2.2 Characterization of Irreducible Markov

Chains

Consider the Markov Chain in Example(f) subsection 1.3. That is X =0, 1, . . . N − 1Z, F the σ-algebra generated by the cylinders, T : X → Xthe left shift, and µ the Markov measure defined by the stochastic N × Nmatrix P = (pij), and the positive probability vector π = (π0, π1, . . . , πN−1)satisfying πP = π. That is

µ(x : x0 = i0, x1 = i1, . . . xn = in) = πi0pi0i1pi1i2 . . . pin−1in .

We want to find necessary and sufficient conditions for T to be ergodic. Toachieve this, we first set

Q = limn→∞

1

n

n−1∑k=0

P k,

where P k = (p(k)ij ) is the kth power of the matrix P , and P 0 is the k × k

identity matrix. More precisely, Q = (qij), where

qij = limn→∞

1

n

n−1∑k=0

p(k)ij .

Lemma 2.2.1 For each i, j ∈ 0, 1, . . . N−1, the limit limn→∞1n

∑n−1k=0 p

(k)ij

exists, i.e., qij is well-defined.

Proof For each n,

1

n

n−1∑k=0

p(k)ij =

1

πi

1

n

n−1∑k=0

µ(x ∈ X : x0 = i, xk = j).

Since T is measure preserving, by the ergodic theorem,

limn→∞

1

n

n−1∑k=0

1x:xk=j(x) = limn→∞

1

n

n−1∑k=0

1x:x0=j(Tkx) = f ∗(x),

where f ∗ is T -invariant and integrable. Then,

limn→∞

1

n

n−1∑k=0

1x:x0=i,xk=j(x) = 1x:x0=i(x) limn→∞

1

n

n−1∑k=0

1x:x0=j(Tkx) = f ∗(x)1x:x0=i(x).

Page 42: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

42 The Ergodic Theorem

Since 1n

∑n−1k=0 1x:x0=i,xk=j(x) ≤ 1 for all n, by the dominated convergence

theorem,

qij =1

πilimn→∞

1

n

n−1∑k=0

µ(x ∈ X : x0 = i, xk = j)

=1

πi

∫X

limn→∞

1

n

n−1∑k=0

1x:x0=i,xk=j(x) dµ(x)

=1

πi

∫X

f ∗(x)1x:x0=i(x) dµ(x)

=1

πi

∫x:x0=i

f ∗(x) dµ(x)

which is finite since f ∗ is integrable. Hence qij exists.

Exercise 2.2.1 Show that the matrix Q has the following properties:(a) Q is stochastic.(b) Q = QP = PQ = Q2.(c) πQ = π.

We now give a characterization for the ergodicity of T . Recall that thematrix P is said to be irreducible if for every i, j ∈ 0, 1, . . . N − 1, there

exists n ≥ 1 such that p(n)ij > 0.

Theorem 2.2.1 The following are equivalent,

(i) T is ergodic.

(ii) All rows of Q are identical.

(iii) qij > 0 for all i, j.

(iv) P is irreducible.

(v) 1 is a simple eigenvalue of P .

Proof(i) ⇒ (ii) By the ergodic theorem for each i, j,

limn→∞

1

n

n−1∑k=0

1x:x0=i,xk=j(x) = 1x:x0=i(x)πj.

Page 43: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Characterization of Irreducible Markov Chains 43

By the dominated convergence theorem,

limn→∞

1

n

n−1∑k=0

µ(x ∈ X : x0 = i, xk = j) = πiπj.

Hence,

qij =1

πilimn→∞

1

n

n−1∑k=0

µ(x ∈ X : x0 = i, xk = j) = πj,

i.e., qij is independent of i. Therefore, all rows of Q are identical.

(ii) ⇒ (iii) If all the rows of Q are identical, then all the columns of Q areconstants. Thus, for each j there exists a constant cj such that qij = cj forall i. Since πQ = π, it follows that qij = cj = πj > 0 for all i, j.

(iii) ⇒ (iv) For any i, j

limn→∞

1

n

n−1∑k=0

p(k)ij = qij > 0.

Hence, there exists n such that p(n)ij > 0, therefore P is irreducible.

(iv) ⇒ (iii) Suppose P is irreducible. For any state i ∈ 0, 1, . . . , N − 1, letSi = j : qij > 0. Since Q is a stochastic matrix, it follows that Si 6= ∅. Letl ∈ Si, then qil > 0. Since Q = QP = QP n for all n, then for any state j

qij =N−1∑m=0

qimp(n)mj ≥ qilp

(n)lj

for any n. Since P is irreducible, there exists n such that p(n)lj > 0. Hence,

qij > 0 for all i, j.

(iii) ⇒ (ii) Suppose qij > 0 for all j = 0, 1, . . . , N − 1. Fix any state j, andlet qj = max0≤i≤N−1 qij. Suppose that not all the qij’s are the same. Thenthere exists k ∈ 0, 1, . . . , N − 1 such that qkj < qj. Since Q is stochasticand Q2 = Q, then for any i ∈ 0, 1, . . . , N − 1 we have,

qij =N−1∑l=0

qilqlj <

N−1∑l=0

qilqj = qj.

Page 44: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

44 The Ergodic Theorem

This implies that qj = max0≤i≤N−1 qij < qj, a contradiction. Hence, thecolumns of Q are constants, or all the rows are identical.

(ii) ⇒ (i) Suppose all the rows of Q are identical. We have shown abovethat this implies qij = πj for all i, j ∈ 0, 1, . . . , N − 1. Hence πj =

limn→∞1n

∑n−1k=0 p

(k)ij .

Let

A = x : xr = i0, . . . , xr+l = il, and B = x : xs = j0, . . . , xs+m = jm

be any two cylinder sets of X. By Proposition 2.1.1 in Section 2, we mustshow that

limn→∞

1

n

n−1∑i=0

µ(T−iA ∩B) = µ(A)µ(B).

Since T is the left shift, for all n sufficiently large, the cylinders T−nA andB depend on different coordinates. Hence, for n sufficiently large,

µ(T−nA ∩B) = πj0pj0j1 . . . pjm−1jmp(n+r−s−m)jmi0

pi0i1 . . . pil−1il .

Thus,

limn→∞

1

n

n−1∑k=0

µ(T−kA ∩B)

= πj0pj0j1 . . . pjm−1jmpi0i1 . . . pil−1il limn→∞

1

n

n−1∑k=0

p(k)jmi0

= (πj0pj0j1 . . . pjm−1jm)(πi0pi0i1 . . . pil−1il)

= µ(B)µ(A).

Therefore, T is ergodic.

(ii) ⇒ (v) If all the rows of Q are identical, then qij = πj for all i, j. If

vP = v, then vQ = v. This implies that for all j, vj = (∑N−1

i=0 vi)πj. Thus,v is a multiple of π. Therefore, 1 is a simple eigenvalue.

(v)⇒ (ii) Suppose 1 is a simple eigenvalue. For any i, let q∗i = (qi0, . . . , qi(N−1))denote the ith row of Q then, q∗i is a probability vector. From Q = QP , we getq∗i = q∗i P . By hypothesis π is the only probability vector satisfying πP = P ,hence π = q∗i , and all the rows of Q are identical.

Page 45: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Characterization of Irreducible Markov Chains 45

2.3 Mixing

As a corollary to the ergodic theorem we found a new definition of ergodicity;namely, asymptotic average independence. Based on the same idea, we nowdefine other notions of weak independence that are stronger than ergodicity.

Definition 2.3.1 Let (X,F , µ) be a probability space, and T : X → X ameasure preserving transformation. Then,

(i) T is weakly mixing if for all A,B ∈ F , one has

limn→∞

1

n

n−1∑i=0

∣∣µ(T−iA ∩B)− µ(A)µ(B)∣∣ = 0. (2.3)

(ii) T is strongly mixing if for all A,B ∈ F , one has

limn→∞

µ(T−iA ∩B) = µ(A)µ(B). (2.4)

Notice that strongly mixing implies weakly mixing, and weakly mixing im-plies ergodicity. This follows from the simple fact that if an is a sequence

of real numbers such that limn→∞ an = 0, then limn→∞1

n

n−1∑i=0

|ai| = 0, and

hence limn→∞1

n

n−1∑i=0

ai = 0. Furthermore, if an is a bounded sequence,

then the following are equivalent:

(i) limn→∞1

n

n−1∑i=0

|ai| = 0

(ii) limn→∞1

n

n−1∑i=0

|ai|2 = 0

(iii) there exists a subset J of the integers of density zero, i.e.

limn→∞

1

n# (0, 1, . . . , n− 1 ∩ J) = 0,

such that limn→∞,n/∈J an = 0.

Page 46: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

46 The Ergodic Theorem

Using this one can give three equivalent characterizations of weakly mixingtransformations, can you state them?

Exercise 2.3.1 Let (X,F , µ) be a probability space, and T : X → X ameasure preserving transformation. Let S be a generating semi-algebra ofF .

(a) Show that if equation (2.3) holds for all A,B ∈ S, then T is weaklymixing.

(b) Show that if equation (2.4) holds for all A,B ∈ S, then T is stronglymixing.

Exercise 2.3.2 Consider the one or two-sided Bernoulli shift T as given inExample (e) in subsection 1.3, and Example (2) in subsection 1.8. Show thatT is strongly mixing.

Exercise 2.3.3 Let (X,F , µ) be a probability space, and T : X → X ameasure preserving transformation. Consider the transformation T × T de-fined on (X ×X,F × F , µ× µ) by T × T (x, y) = (Tx, Ty).

(a) Show that T × T is measure preserving with respect to µ× µ.

(b) Show that T × T is ergodic, if and only if T is weakly mixing.

Page 47: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Chapter 3

Measure PreservingIsomorphisms and Factor Maps

3.1 Measure Preserving Isomorphisms

Given a measure preserving transformation T on a probability space (X,F , µ),we call the quadruple (X,F , µ, T ) a dynamical system . Now, given two dy-namical systems (X,F , µ, T ) and (Y, C, ν, S), what should we mean by: thesesystems are the same? On each space there are two important structures:

(1) The measure structure given by the σ-algebra and the probability mea-sure. Note, that in this context, sets of measure zero can be ignored.

(2) The dynamical structure, given by a measure preserving transforma-tion.

So our notion of being the same must mean that we have a map

ψ : (X,F , µ, T ) → (Y, C, ν, S)

satisfying

(i) ψ is one-to-one and onto a.e. By this we mean, that if we remove a(suitable) set NX of measure 0 in X, and a (suitable) set NY of measure0 in Y , the map ψ : X \NX → Y \NY is a bijection.

(ii) ψ is measurable, i.e., ψ−1(C) ∈ F , for all C ∈ C.

47

Page 48: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

48 Measure Preserving Isomorphism and Factor Maps

(iii) ψ preserves the measures: ν = µ ψ−1, i.e., ν(C) = µ (ψ−1(C)) for allC ∈ C.

Finally, we should have that

(iv) ψ preserves the dynamics of T and S, i.e., ψ T = S ψ, which is thesame as saying that the following diagram commutes.

................................................................................................................................. ................

.........................................................................................................................

........

........

........

.........................................................................................................................

........

........

........

................................................................................................................................. ................N N

N ′ N ′S

T

ψ ψ

This means that T -orbits are mapped to S-orbits:

x→ Tx→ T 2x→ · · · → T nx→↓ ↓ ↓ ↓ ↓

ψ(x) → S (ψ(x)) → S2 (ψ(x)) → · · · → Sn (ψ(x)) →

Definition 3.1.1 Two dynamical systems (X,F , µ, T ) and (Y, C, ν, S) areisomorphic if there exist measurable sets N ⊂ X and M ⊂ Y with µ(X\N) =ν(Y \ M) = 0 and T (N) ⊂ N, S(M) ⊂ M , and finally if there exists ameasurable map ψ : N →M such that (i)–(iv) are satisfied.

Exercise 3.1.1 Suppose (X,F , µ, T ) and (Y, C, ν, S) are two isomorphic dy-namical systems. Show that

(a) T is ergodic if and only if S is ergodic.

(b) T is weakly mixing if and only if S is weakly mixing.

(c) T is strongly mixing if and only if S is strongly mixing.

Examples

(1) Let K = z ∈ C : |z| = 1 be equipped with the Borel σ-algebra B onK, and Haar measure (i.e., normalized Lebeque measure on the unit circle).

Page 49: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Measure Preserving Isomorphisms 49

Define S : K → K by Sz = z2; equivalently Se2πiθ = e2πi(2θ).. One caneasily check that S is measure preserving. In fact, the map S is isomorphicto the map T on ([0, 1),B, λ) given by Tx = 2x (mod 1) (see Example(b) in subsection 1.3, and Example (3) in subsection 1.8). Define a mapφ : [0, 1) → K by φ(x) = e2πix. We leave it to the reader to check that φis a measurable isomorphism, i.e., φ is a measurable and measure preservingbijection such that Sφ(x) = φ(Tx) for all x ∈ [0, 1).

(2) Consider ([0, 1),B, λ), the unit interval with the Lebesgue σ-algebra, andLebesgue measure. Let T : [0, 1) → [0, 1) be given by Tx = Nx − bNxc.Iterations of T generate the N -adic expansion of points in the unit interval.Let Y := 0, 1, . . . , N − 1N, the set of all sequences (yn)n≥1, with yn ∈0, 1, . . . , N − 1 for n ≥ 1. We now construct an isomorphism between([0, 1),B, λ, T ) and (Y,F , µ, S), where F is the σ-algebra generated by thecylinders, and µ the uniform product measure defined on cylinders by

µ((yi)i≥1 ∈ Y : y1 = a1, y2 = a2, . . . , yn = an) =1

Nn,

for any (a1, a2, a3, . . .) ∈ Y , and where S is the left shift.Define ψ : [0, 1) → Y = 0, 1, . . . , N − 1N by

ψ : x =∞∑k=1

akNk

7→ (ak)k≥1 ,

where∑∞

k=1 ak/Nk is the N -adic expansion of x (for example if N = 2 we

get the binary expansion, and if N = 10 we get the decimal expansion). Let

C(i1, . . . , in) = (yi)i≥1 ∈ Y : y1 = i1, . . . , yn = in .

In order to see that ψ is an isomorphism one needs to verify measurabilityand measure preservingness on cylinders:

ψ−1 (C(i1, . . . , in)) =[ i1N

+i2N2

+ · · ·+ inNn

,i1N

+i2N2

+ · · ·+ in + 1

Nn

)and

λ(ψ−1(C(i1, . . . , in))

)=

1

Nn= µ (C(i1, . . . , in))) .

Note that

N = (yi)i≥1 ∈ Y : there exists a k ≥ 1 such that yi = N−1 for all i ≥ k

Page 50: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

50 Measure Preserving Isomorphism and Factor Maps

is a subset of Y of measure 0. Setting Y = Y \ N , then ψ : [0, 1) → Y is abijection, since every x ∈ [0, 1) has a unique N -adic expansion generatedby T . Finally, it is easy to see that ψ T = S ψ.

Exercise 3.1.2 Consider ([0, 1)2,B × B, λ× λ), where B×B is the productLebesgue σ-algebra, and λ × λ is the product Lebesgue measure Let T :[0, 1)2 → [0, 1)2 be given by

T (x, y) =

(2x,

1

2y), 0 ≤ x < 1

2

(2x− 1,1

2(y + 1)), 1

2≤ x < 1.

Show that T is isomorphic to the two-sided Bernoulli shift S on(0, 1Z,F , µ

),

where F is the σ-algebra generated by cylinders of the form

∆ = x−k = a−k, . . . , x` = a` : ai ∈ 0, 1, i = −k, . . . , `, k, ` ≥ 0,

and µ the product measure with weights (12, 1

2) (so µ(∆) = (1

2)k+`+1).

Exercise 3.1.3 Let G =1 +

√5

2, so that G2 = G+ 1. Consider the set

X = [0,1

G)× [0, 1)

⋃[1

G, 1)× [0,

1

G),

endowed with the product Borel σ-algebra. Define the transformation

T (x, y) =

(Gx,

y

G), (x, y) ∈ [0, 1

G)× [0, 1]

(Gx− 1,1 + y

G), (x, y) ∈ [ 1

G, 1)× [0, 1

G).

(a) Show that T is measure preserving with respect to normalized Lebesguemeasure on X.

(b) Now let S : [0, 1)× [0, 1) → [0, 1)× [0, 1) be given by

S(x, y) =

(Gx,

y

G), (x, y) ∈ [0, 1

G)× [0, 1]

(G2x−G,G+ y

G2), (x, y) ∈ [ 1

G, 1)× [0, 1).

Page 51: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Factor Maps 51

Show that S is measure preserving with respect to normalized Lebesguemeasure on [0, 1)× [0, 1).

(c) Let Y = [0, 1) × [0, 1G), and let U be the induced transformation of T

on Y , i.e., for (x, y) ∈ Y , U(x, y) = T n(x,y), where n(x, y) = infn ≥1 : T n(x, y) ∈ Y . Show that the map φ : [0, 1)× [0, 1) → Y given by

φ(x, y) = (x,y

G)

defines an isomorphism from S to U , where Y has the induced measurestructure (see Section 1.5).

3.2 Factor Maps

In the above section, we discussed the notion of isomorphism which describeswhen two dynamical systems are considered the same. Now, we give a precisedefinition of what it means for a dynamical system to be a subsystem ofanother one.

Definition 3.2.1 Let (X,F , µ, T ) and (Y, C, ν, S) be two dynamical systems.We say that S is a factor of T if there exist measurable sets M1 ∈ F andM2 ∈ C, such that µ(M1) = ν(M2) = 1 and T (M1) ⊂ M1, S(M2) ⊂ M2,and finally if there exists a measurable and measure preserving map ψ : M1 →M2 which is surjective, and satisfies ψ(T (x)) = S(ψ(x)) for all x ∈M1. Wecall ψ a factor map.

Remark Notice that if ψ is a factor map, then G = ψ−1C is a T -invariantsub-σ-algebra of F , since

T−1G = T−1ψ−1C = ψ−1S−1C ⊆ ψ−1C = G.

Examples Let T be the Baker’s transformation on ([0, 1)2,B × B, λ× λ),given by

T (x, y) =

(2x, 1

2y), 0 ≤ x < 1

2

(2x− 1, 12(y + 1)), 1

2≤ x < 1,

Page 52: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

52 Measure Preserving Isomorphism and Factor Maps

and let S be the left shift on X = 0, 1N with the σ-algebra F generated bythe cylinders, and the uniform product measure µ. Define ψ : [0, 1)× [0, 1) →X by

ψ(x, y) = (a1, a2, . . .),

where x =∑∞

n=1

an2n

is the binary expansion of x. It is easy to check that ψ

is a factor map.

Exercise 3.2.1 Let T be the left shift on X = 0, 1, 2N which is endowedwith the σ-algebra F , generated by the cylinder sets, and the uniform productmeasure µ giving each symbol probability 1/3, i.e.,

µ (x ∈ X : x1 = i1, x2 = i2, . . . , xn = in) =1

3n,

where i1, i2, . . . , in ∈ 0, 1, 2.Let S be the left shift on Y = 0, 1N which is endowed with the σ-algebra G,generated by the cylinder sets, and the product measure ν giving the symbol0 probability 1/3 and the symbol 1 probability 2/3, i.e.,

µ (y ∈ Y : y1 = j1, y2 = j2, . . . , yn = jn) = (2

3)j1+j2+...+jn(

1

3)n−(j1+j2+...+jn),

where j1, j2, . . . , jn ∈ 0, 1. Show that S is a factor of T .

Exercise 3.2.2 Show that a factor of an ergodic (weakly mixing/stronglymixing) transformation is also ergodic (weakly mixing/strongly mixing).

3.3 Natural Extensions

Suppose (Y,G, ν, S) is a non-invertible measure-preserving dynamical system.An invertible measure-preserving dynamical system (X,F , µ, T ) is called anatural extension of (Y,G, ν, S) if S is a factor of T and the factor map ψsatisfies ∨∞m=0T

mψ−1G = F , where

∞∨k=0

T kψ−1G

is the smallest σ-algebra containing the σ-algebras T kψ−1G for all k ≥ 0.

Page 53: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Natural Extensions 53

Example Let T on(0, 1Z,F , µ

)be the two-sided Bernoulli shift, and S on(

0, 1N∪0,G, ν)

be the one-sided Bernoulli shift, both spaces are endowedwith the uniform product measure. Notice that T is invertible, while S isnot. Set X = 0, 1Z, Y = 0, 1N∪0, and define ψ : X → Y by

ψ (. . . , x−1, x0, x1, . . .) = (x0, x1, . . .) .

Then, ψ is a factor map. We claim that

∞∨k=0

T kψ−1G = F .

To prove this, we show that∨∞k=0 T

kψ−1G contains all cylinders generatingF .

Let ∆ = x ∈ X : x−k = a−k, . . . , x` = a` be an arbitrary cylinder inF , and let D = y ∈ Y : y0 = a−k, . . . , yk+` = a` which is a cylinder in G.Then,

ψ−1D = x ∈ X : x0 = a−k, . . . , xk+` = a` and T kψ−1D = ∆.

This shows that∞∨k=0

T kψ−1G = F .

Thus, T is the natural extension of S.

Page 54: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

54 Measure Preserving Isomorphism and Factor Maps

Page 55: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Chapter 4

Entropy

4.1 Randomness and Information

Given a measure preserving transformation T on a probability space (X,F , µ),we want to define a nonnegative quantity h(T ) which measures the averageuncertainty about where T moves the points of X. That is, the value ofh(T ) reflects the amount of ‘randomness’ generated by T . We want to defineh(T ) in such a way, that (i) the amount of information gained by an appli-cation of T is proportional to the amount of uncertainty removed, and (ii)that h(T ) is isomorphism invariant, so that isomorphic transformations haveequal entropy.

The connection between entropy (that is randomness, uncertainty) andthe transmission of information was first studied by Claude Shannon in 1948.As a motivation let us look at the following simple example. Consider a source(for example a ticker-tape) that produces a string of symbols · · ·x−1x0x1 · · ·from the alphabet a1, a2, . . . , an. Suppose that the probability of receivingsymbol ai at any given time is pi, and that each symbol is transmitted inde-pendently of what has been transmitted earlier. Of course we must have herethat each pi ≥ 0 and that

∑i pi = 1. In ergodic theory we view this process

as the dynamical system (X,F ,B, µ, T ), where X = a1, a2, . . . , anN, B theσ-algebra generated by cylinder sets of the form

∆n(ai1 , ai2 , . . . , ain) := x ∈ X : xi1 = ai1 , . . . , xin = ain

µ the product measure assigning to each coordinate probability pi of seeing

55

Page 56: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

56 Entropy

the symbol ai, and T the left shift. We define the entropy of this system by

H(p1, . . . , pn) = h(T ) := −n∑i=1

pi log2 pi . (4.1)

If we define log pi as the amount of uncertainty in transmitting the symbol ai,then H is the average amount of information gained (or uncertainty removed)per symbol (notice that H is in fact an expected value). To see why this is anappropriate definition, notice that if the source is degenerate, that is, pi = 1for some i (i.e., the source only transmits the symbol ai), then H = 0. Inthis case we indeed have no randomness. Another reason to see why thisdefinition is appropriate, is that H is maximal if pi = 1

nfor all i, and this

agrees with the fact that the source is most random when all the symbols areequiprobable. To see this maximum, consider the function f : [0, 1] → R+

defined by

f(t) =

0 if t = 0,

−t log2 t if 0 < t ≤ 1.

Then f is continuous and concave downward, and Jensen’s Inequality impliesthat for any p1, . . . , pn with pi ≥ 0 and p1 + . . .+ pn = 1,

1

nH(p1, . . . , pn) =

1

n

n∑i=1

f(pi) ≤ f(1

n

n∑i=1

pi) = f(1

n) =

1

nlog2 n ,

so H(p1, . . . , pn) ≤ log2 n for all probability vectors (p1, . . . , pn). But

H(1

n, . . . ,

1

n) = log2 n,

so the maximum value is attained at ( 1n, . . . , 1

n).

4.2 Definitions and Properties

So far H is defined as the average information per symbol. The above defini-tion can be extended to define the information transmitted by the occurrenceof an event E as − log2 P (E). This definition has the property that the in-formation transmitted by E ∩ F for independent events E and F is the sumof the information transmitted by each one individually, i.e.,

− log2 P (E ∩ F ) = − log2 P (E)− log2 P (F ).

Page 57: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Definitions and Properties 57

The only function with this property is the logarithm function to any base.We choose base 2 because information is usually measured in bits.

In the above example of the ticker-tape the symbols were transmittedindependently. In general, the symbol generated might depend on what hasbeen received before. In fact these dependencies are often ‘built-in’ to beable to check the transmitted sequence of symbols on errors (think here ofthe Morse sequence, sequences on compact discs etc.). Such dependenciesmust be taken into consideration in the calculation of the average informationper symbol. This can be achieved if one replaces the symbols ai by blocks ofsymbols of particular size. More precisely, for every n, let Cn be the collectionof all possible n-blocks (or cylinder sets) of length n, and define

Hn := −∑C∈Cn

P (C) logP (C) .

Then 1nHn can be seen as the average information per symbol when a block

of length n is transmitted. The entropy of the source is now defined by

h := limn→∞

Hn

n. (4.2)

The existence of the limit in (4.2) follows from the fact thatHn is a subadditivesequence, i.e., Hn+m ≤ Hn + Hm, and proposition (4.2.2) (see proposition(4.2.3) below).

Now replace the source by a measure preserving system (X,B, µ, T ).How can one define the entropy of this system similar to the case of asource? The symbols a1, a2, . . . , an can now be viewed as a partitionA = A1, A2, . . . , An of X, so that X is the disjoint union (up to setsof measure zero) of A1, A2, . . . , An. The source can be seen as follows: witheach point x ∈ X, we associate an infinite sequence · · ·x−1, x0, x1, · · · , wherexi is aj if and only if T ix ∈ Aj. We define the entropy of the partition α by

H(α) = Hµ(α) := −n∑i=1

µ(Ai) log µ(Ai) .

Our aim is to define the entropy of the transformation T which is independentof the partition we choose. In fact h(T ) must be the maximal entropy overall possible finite partitions. But first we need few facts about partitions.

Page 58: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

58 Entropy

Exercise 4.2.1 Let α = A1, . . . , An and β = B1, . . . , Bm be two parti-tions of X. Show that

T−1α := T−1A1, . . . , T−1An

andα ∨ β := Ai ∩Bj : Ai ∈ α, Bj ∈ β

are both partitions of X.

The members of a partition are called the atoms of the partition. We saythat the partition β = B1, . . . , Bm is a refinement of the partition α =A1, . . . , An, and write α ≤ β, if for every 1 ≤ j ≤ m there exists an1 ≤ i ≤ n such that Bj ⊂ Ai (up to sets of measure zero). The partitionα ∨ β is called the common refinement of α and β.

Exercise 4.2.2 Show that if β is a refinement of α, each atom of α is a finite(disjoint) union of atoms of β.

Given two partitions α = A1, . . . An and β = B1, . . . , Bm of X, wedefine the conditional entropy of α given β by

H(α|β) := −∑A∈α

∑B∈β

log

(µ(A ∩B)

µ(B)

)µ(A ∩B) .

(Under the convention that 0 log 0 := 0.)The above quantity H(α|β) is interpreted as the average uncertainty

about which element of the partition α the point x will enter (under T )if we already know which element of β the point x will enter.

Proposition 4.2.1 Let α, β and γ be partitions of X. Then,

(a) H(T−1α) = H(α) ;

(b) H(α ∨ β) = H(α) +H(β|α);

(c) H(β|α) ≤ H(β);

(d) H(α ∨ β) ≤ H(α) +H(β);

(e) If α ≤ β, then H(α) ≤ H(β);

Page 59: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Definitions and Properties 59

(f) H(α ∨ β|γ) = H(α|γ) +H(β|α ∨ γ);

(g) If β ≤ α, then H(γ|α) ≤ H(γ|β);

(h) If β ≤ α, then H(β|α) = 0.

(i) We call two partitions α and β independent if

µ(A ∩B) = µ(A)µ(B) for all A ∈ α, B ∈ β .

If α and β are independent partitions, one has that

H(α ∨ β) = H(α) +H(β) .

Proof We prove properties (b) and (c), the rest are left as an exercise.

H(α ∨ β) = −∑A∈α

∑B∈β

µ(A ∩B) log µ(A ∩B)

= −∑A∈α

∑B∈β

µ(A ∩B) logµ(A ∩B)

µ(A)

+ −∑A∈α

∑B∈β

µ(A ∩B) log µ(A)

= H(β|α) +H(α).

We now show that H(β|α) ≤ H(β). Recall that the function f(t) = −t log tfor 0 < t ≤ 1 is concave down. Thus,

H(β|α) = −∑B∈β

∑A∈α

µ(A ∩B) logµ(A ∩B)

µ(A)

= −∑B∈β

∑A∈α

µ(A)µ(A ∩B)

µ(A)log

µ(A ∩B)

µ(A)

=∑B∈β

∑A∈α

µ(A)f

(µ(A ∩B)

µ(A)

)

≤∑B∈β

f

(∑A∈α

µ(A)µ(A ∩B)

µ(A)

)=

∑B∈β

f(µ(B)) = H(β).

Page 60: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

60 Entropy

Exercise 4.2.3 Prove the rest of the properties of Proposition 4.2.1

Now consider the partition∨n−1i=0 T

−iα, whose atoms are of the form Ai0∩T−1Ai1 ∩ . . .∩T−(n−1)Ain−1 , consisting of all points x ∈ X with the propertythat x ∈ Ai0 , Tx ∈ Ai1 , . . . , T n−1x ∈ Ain−1 .

Exercise 4.2.4 Show that if α is a finite partition of (X,F , µ, T ), then

H(n−1∨i=0

T−iα) = H(α) +n−1∑j=1

H(α|j∨i=1

T−iα).

To define the notion of the entropy of a transformation with respect to apartition, we need the following two propositions.

Proposition 4.2.2 If an is a subadditive sequence of real numbers i.e.,an+p ≤ an + ap for all n, p, then

limn→∞

ann

exists.

Proof Fix any m > 0. For any n ≥ 0 one has n = km+ i for some i between0 ≤ i ≤ m− 1. By subadditivity it follows that

ann

=akm+i

km+ i≤ akm

km+

aikm

≤ kamkm

+aikm

.

Note that if n → ∞, k → ∞ and so lim supn→∞ an/n ≤ am/m. Since m isarbitrary one has

lim supn→∞

ann≤ inf

amm

≤ lim infn→∞

ann.

Therefore limn→∞ an/n exists, and equals inf an/n.

Proposition 4.2.3 Let α be a finite partitions of (X,B, µ, T ), where T is ameasure preserving transformation. Then, limn→∞

1nH(∨n−1i=0 T

−iα) exists.

Page 61: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Definitions and Properties 61

Proof Let an = H(∨n−1i=0 T

−iα) ≥ 0. Then, by Proposition 4.2.1, we have

an+p = H(

n+p−1∨i=0

T−iα)

≤ H(n−1∨i=0

T−iα) +H(

n+p−1∨i=n

T−iα)

= an +H(

p−1∨i=0

T−iα)

= an + ap.

Hence, by Proposition 4.2.2

limn→∞

ann

= limn→∞

1

nH(

n−1∨i=0

T−iα)

exists. We are now in position to give the definition of the entropy of the trans-

formation T .

Definition 4.2.1 The entropy of the measure preserving transformation Twith respect to the partition α is given by

h(α, T ) = hµ(α, T ) := limn→∞

1

nH(

n−1∨i=0

T−iα) ,

where

H(n−1∨i=0

T−iα) = −∑

D∈Wn−1

i=0 T−iα

µ(D) log(µ(D)) .

Finally, the entropy of the transformation T is given by

h(T ) = hµ(T ) := supαh(α, T ) .

The following theorem gives an equivalent definition of entropy..

Page 62: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

62 Entropy

Theorem 4.2.1 The entropy of the measure preserving transformation Twith respect to the partition α is also given by

h(α, T ) = limn→∞

H(α|n−1∨i=1

T−iα) .

Proof Notice that the sequence H(α|∨ni=1 T

−iα) is bounded from below,and is non-increasing, hence limn→∞H(α|

∨ni=1 T

−iα) exists. Furthermore,

limn→∞

H(α|n∨i=1

T−iα) = limn→∞

1

n

n∑j=1

H(α|j∨i=1

T−iα).

From exercise 4.2.4, we have

H(n−1∨i=0

T−iα) = H(α) +n−1∑j=1

H(α|j∨i=1

T−iα).

Now, dividing by n, and taking the limit as n → ∞, one gets the desiredresult

Theorem 4.2.2 Entropy is an isomorphism invariant.

Proof Let (X,B, µ, T ) and (Y, C, ν, S) be two isomorphic measure preserv-ing systems (see Definition 1.2.3, for a definition), with ψ : X → Y thecorresponding isomorphism. We need to show that hµ(T ) = hν(S).

Let β = B1, . . . , Bn be any partition of Y , then ψ−1β = ψ−1B1, . . . , ψ−1Bn

is a partition of X. Set Ai = ψ−1Bi, for 1 ≤ i ≤ n. Since ψ : X → Y is anisomorphism, we have that ν = µψ−1 and ψT = Sψ, so that for any n ≥ 0and Bi0 , . . . , Bin−1 ∈ β

ν(Bi0 ∩ S−1Bi1 ∩ . . . ∩ S−(n−1)Bin−1

)= µ

(ψ−1Bi0 ∩ ψ−1S−1Bi1 ∩ . . . ∩ ψ−1S−(n−1)Bin−1

)= µ

(ψ−1Bi0 ∩ T−1ψ−1Bi1 ∩ . . . ∩ T−(n−1)ψ−1Bin−1

)= µ

(Ai0 ∩ T−1Ai1 ∩ . . . ∩ T−(n−1)Ain−1

).

Setting

A(n) = Ai0 ∩ . . . ∩ T−(n−1)Ain−1 and B(n) = Bi0 ∩ . . . ∩ S−(n−1)Bin−1 ,

Page 63: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Calculation of Entropy and Examples 63

we thus find that

hν(S) = supβhν(β, S) = sup

βlimn→∞

1

nHν(

n−1∨i=0

S−iβ)

= supβ

limn→∞

− 1

n

∑B(n)∈

Wn−1i=0 S−iβ

ν(B(n)) log ν(B(n))

= supψ−1β

limn→∞

− 1

n

∑A(n)∈

Wn−1i=0 T−iψ−1β

µ(A(n)) log µ(A(n))

= supψ−1β

hµ(ψ−1β, T )

≤ supαhµ(α, T ) = hµ(T ) ,

where in the last inequality the supremum is taken over all possible finitepartitions α of X. Thus hν(S) ≤ hµ(T ). The proof of hµ(T ) ≤ hν(S) is donesimilarly. Therefore hν(S) = hµ(T ), and the proof is complete.

4.3 Calculation of Entropy and Examples

Calculating the entropy of a transformation directly from the definition doesnot seem feasible, for one needs to take the supremum over all finite parti-tions, which is practically impossible. However, the entropy of a partition isrelatively easier to calculate if one has full information about the partitionunder consideration. So the question is whether it is possible to find a par-tition α of X where h(α, T ) = h(T ). Naturally, such a partition contains allthe information ‘transmitted’ by T . To answer this question we need somenotations and definitions.

For α = A1, . . . , AN and all m, n ≥ 0, let

σ

(m∨i=n

T−iα

)and σ

(−n∨

i=−m

T−iα

)

be the smallest σ-algebras containing the partitions∨mi=n T

−iα and∨−ni=−m T

−iα

respectively. Furthermore, let σ(∨−∞

i=−∞ T−iα)

be the smallest σ-algebra con-

taining all the partitions∨mi=n T

−iα and∨−ni=−m T

−iα for all n and m. We

call a partition α a generator with respect to T if σ(∨∞

i=−∞ T−iα)

= F ,

Page 64: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

64 Entropy

where F is the σ-algebra on X. If T is non-invertible, then α is said to bea generator if σ (

∨∞i=0 T

−iα) = F . Naturally, this equality is modulo setsof measure zero. One has also the following characterization of generators,saying basically, that each measurable set in X can be approximated by afinite disjoint union of cylinder sets. See also [W] for more details and proofs.

Proposition 4.3.1 The partition α is a generator of F if for each A ∈ Fand for each ε > 0 there exists a finite disjoint union C of elements of αmn ,such that µ(A4C) < ε.

We now state (without proofs) two famous theorems known as Kolmogorov-Sinai’s Theorem and Krieger’s Generator Theorem. For the proofs, we referthe interested reader to the book of Karl Petersen or Peter Walter.

Theorem 4.3.1 (Kolmogorov and Sinai, 1958) If α is a generator with re-spect to T and H(α) <∞, then h(T ) = h(α, T ).

Theorem 4.3.2 (Krieger, 1970) If T is an ergodic measure preserving trans-formation with h(T ) <∞, then T has a finite generator.

We will use these two theorems to calculate the entropy of a Bernoullishift.Example (Entropy of a Bernoulli shift)–Let T be the left shift on X =1, 2, · · · , nZ endowed with the σ-algebra F generated by the cylinder sets,and product measure µ giving symbol i probability pi, where p1 + p2 + . . .+pn = 1. Our aim is to calculate h(T ). To this end we need to find a partitionα which generates the σ-algebra F under the action of T . The natural choiceof α is what is known as the time-zero partition α = A1, . . . , An, where

Ai := x ∈ X : x0 = i , i = 1, . . . , n .

Notice that for all m ∈ Z,

T−mAi = x ∈ X : xm = i ,

and

Ai0 ∩ T−1Ai1 ∩ · · · ∩ T−mAim = x ∈ X : x0 = i0, . . . , xm = im .

Page 65: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Calculation of Entropy and Examples 65

In other words,∨mi=0 T

−iα is precisely the collection of cylinders of lengthm (i.e., the collection of all m-blocks), and these by definition generate F .Hence, α is a generating partition, so that

h(T ) = h(α, T ) = limm→∞

1

mH

(m−1∨i=0

T−iα

).

First notice that − since µ is product measure here − the partitions

α, T−1α, · · · , T−(m−1)α

are all independent since each specifies a different coordinate, and so

H(α ∨ T−1α ∨ · · · ∨ T−(m−1)α)

= H(α) +H(T−1α) + · · ·+H(T−(m−1)α)

= mH(α) = −mn∑i=1

pi log pi.

Thus,

h(T ) = limm→∞

1

m(−m)

n∑i=1

pi log pi = −n∑i=1

pi log pi .

Exercise 4.3.1 Let T be the left shift on X = 1, 2, · · · , nZ endowed withthe σ-algebra F generated by the cylinder sets, and the Markov measure µgiven by the stochastic matrix P = (pij), and the probability vector π =(π1, . . . , πn) with πP = π. Show that

h(T ) = −n∑j=1

n∑i=1

πipij log pij

Exercise 4.3.2 Suppose (X1,B1, µ1, T1) and (X2,B2, µ2, T2) are two dynam-ical systems. Show that

hµ1×µ2(T1 × T2) = hµ1(T1) + hµ2(T2).

Page 66: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

66 Entropy

4.4 The Shannon-McMillan-Breiman Theorem

In the previous sections we have considered only finite partitions on X, how-ever all the definitions and results hold if we were to consider countable par-titions of finite entropy. Before we state and prove the Shannon-McMillan-Breiman Theorem, we need to introduce the information function associatedwith a partition.

Let (X,F , µ) be a probability space, and α = A1, A2, . . . be a finite ora countable partition of X into measurable sets. For each x ∈ X, let α(x)be the element of α to which x belongs. Then, the information functionassociated to α is defined to be

Iα(x) = − log µ(α(x)) = −∑A∈α

1A(x) log µ(A).

For two finite or countable partitions α and β of X, we define the condi-tional information function of α given β by

Iα|β(x) = −∑B∈β

∑A∈α

1(A∩B)(x) log

(µ(A ∩B)

µ(B)

).

We claim that

Iα|β(x) = − logEµ(1α(x)|σ(β)) = −∑A∈α

1A(x) logE(1A|σ(β)), (4.3)

where σ(β) is the σ-algebra generated by the finite or countable partition β,(see the remark following the proof of Theorem (2.1.1)). This follows from thefact (which is easy to prove using the definition of conditional expectations)that if β is finite or countable, then for any f ∈ L1(µ), one has

Eµ(f |σ(β)) =∑B∈β

1B1

µ(B)

∫B

fdµ.

Clearly, H(α|β) =∫XIα|β(x) dµ(x).

Exercise 4.4.1 Let α and β be finite or countable partitions of X. Showthat

IαWβ = Iα + Iβ|α.

Page 67: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

The Shannon-McMillan-Breiman Theorem 67

Now suppose T : X → X is a measure preserving transformation on(X,F , µ), and let α = A1, A2, . . . be any countable partition. Then T−1 =T−1A1, T

−1A2, . . . is also a countable partition. Since T is measure pre-serving one has,

IT−1α(x) = −∑Ai∈α

1T−1Ai(x) log µ(T−1Ai) = −

∑Ai∈α

1Ai(Tx) log µ(Ai) = Iα(Tx).

Furthermore,

limn→∞

1

n+ 1H(

n∨i=0

T−iα) = limn→∞

1

n+ 1

∫X

IWni=0 T

−iα(x) dµ(x) = h(α, T ).

The Shannon-McMillan-Breiman theorem says if T is ergodic and if α has

finite entropy, then in fact the integrand1

n+ 1IWn

i=0 T−iα(x) converges a.e. to

h(α, T ). Notice that the integrand can be written as

1

n+ 1IWn

i=0 T−iα(x) = − 1

n+ 1log µ

((n∨i=0

T−iα)(x)

),

where (∨ni=0 T

−iα)(x) is the element of∨ni=0 T

−iα containing x (often referredto as the α-cylinder of order n containing x). Before we proceed we need thefollowing proposition.

Proposition 4.4.1 Let α = A1, A2, . . . be a countable partition with finiteentropy. For each n = 1, 2, 3, . . ., let fn(x) = Iα|Wn

i=1 T−iα(x), and let f ∗ =

supn≥1 fn. Then, for each λ ≥ 0 and for each A ∈ α,

µ (x ∈ A : f ∗(x) > λ) ≤ 2−λ.

Furthermore, f ∗ ∈ L1(X,F , µ).

Proof Let t ≥ 0 and A ∈ α. For n ≥ 1, let

fAn (x) = − logEµ

(1A|

n∨i=1

T−iα

)(x),

andBn = x ∈ X : fA1 (x) ≤ t, . . . , fAn−1(x) ≤ t, fAn (x) > t.

Page 68: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

68 Entropy

Notice that for x ∈ A one has fn(x) = fAn (x), and for x ∈ Bn one hasEµ (1A|

∨ni=1 T

−iα) (x) < 2−t. Since Bn ∈ σ (∨ni=1 T

−iα), then

µ(Bn ∩ A) =

∫Bn

1A(x) dµ(x)

=

∫Bn

(1A|

n∨i=1

T−iα

)(x) dµ(x)

≤∫Bn

2−t dµ(x) = 2−tµ(Bn).

Thus,

µ (x ∈ A : f ∗(x) > t) = µ (x ∈ A : fn(x) > t, for some n)= µ

(x ∈ A : fAn (x) > t, for some n

)= µ (∪∞n=1A ∩Bn)

=∞∑n=1

µ (A ∩Bn)

≤ 2−t∞∑n=1

µ(Bn) ≤ 2−t.

We now show that f ∗ ∈ L1(X,F , µ). First notice that

µ (x ∈ A : f ∗(x) > t) ≤ µ(A),

hence,

µ (x ∈ A : f ∗(x) > t) ≤ min(µ(A), 2−t).

Page 69: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

The Shannon-McMillan-Breiman Theorem 69

Using Fubini’s Theorem, and the fact that f ∗ ≥ 0 one has∫X

f ∗(x) dµ(x) =

∫ ∞

0

µ (x ∈ X : f ∗(x) > t) dt

=

∫ ∞

0

∑A∈α

µ (x ∈ A : f ∗(x) > t) dt

=∑A∈α

∫ ∞

0

µ (x ∈ A : f ∗(x) > t) dt

≤∑A∈α

∫ ∞

0

min(µ(A), 2−t) dt

=∑A∈α

∫ − log µ(A)

0

µ(A) dt+∑A∈α

∫ ∞

− log µ(A)

2−t dt

= −∑A∈α

µ(A) log µ(A) +∑A∈α

µ(A)

loge 2

= Hµ(α) +1

loge 2<∞.

So far we have defined the notion of the conditional entropy Iα|β when αand β are countable partitions. We can generalize the definition to the caseα is a countable partition and G is a σ-algebra as follows (see equation (4.3)),

Iα|G(x) = − logEµ(1α(x)|G).

If we denote by∨∞i=1 T

−iα = σ(∪n∨ni=1 T

−iα), then

Iα|W∞

i=1 T−iα(x) = lim

n→∞Iα|

Wni=1 T

−iα(x). (4.4)

Exercise 4.4.2 Give a proof of equation (4.4) using the following importanttheorem, known as the Martingale Convergence Theorem (and is stated toour setting)

Theorem 4.4.1 (Martingale Convergence Theorem) Let C1 ⊆ C2 ⊆ · · · be asequence of increasing σalgebras, and let C = σ(∪nCn). If f ∈ L1(µ), then

Eµ(f |C) = limn→∞

Eµ(f |Cn)

µ a.e. and in L1(µ).

Page 70: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

70 Entropy

Exercise 4.4.3 Show that if T is measure preserving on the probabilityspace (X,F , µ) and f ∈ L1(µ), then

limn→∞

f(T nx)

n= 0, µ a.e.

Theorem 4.4.2 (The Shannon-McMillan-Breiman Theorem) Suppose T isan ergodic measure preserving transformation on a probability space (X,F , µ),and let α be a countable partition with H(α) <∞. Then,

limn→∞

1

n+ 1IWn

i=0 T−iα(x) = h(α, T ) a.e.

Proof For each n = 1, 2, 3, . . ., let fn(x) = Iα|Wn

i=1 T−iα(x). Then,

IWni=0 T

−iα(x) = IWni=1 T

−iα(x) + Iα|Wn

i=1 T−iα(x)

= IWn−1i=0 T−iα(Tx) + fn(x)

= IWn−1i=1 T−iα(Tx) + Iα|

Wn−1i=1 T−iα(Tx) + fn(x)

= IWn−2i=0 T−iα(T

2x) + fn−1(Tx) + fn(x)

...

= Iα(Tnx) + f1(T

n−1x) + . . .+ fn−1(Tx) + fn(x).

Let f(x) = Iα|W∞

i=1 T−iα(x) = limn→∞ fn(x). Notice that f ∈ L1(X,F , µ)

since∫Xf(x) dµ(x) = h(α, T ). Now letting f0 = Iα, we have

1

n+ 1IWn

i=0 T−iα(x) =

1

n+ 1

n∑k=0

fn−k(Tkx)

=1

n+ 1

n∑k=0

f(T kx) +1

n+ 1

n∑k=0

(fn−k − f)(T kx).

By the ergodic theorem,

limn→∞

n∑k=0

f(T kx) =

∫X

f(x) dµ(x) = h(α, T ) a.e.

Page 71: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

The Shannon-McMillan-Breiman Theorem 71

We now study the sequence 1

n+ 1

n∑k=0

(fn−k − f)(T kx). Let

FN = supk≥N

|fk − f |, and f ∗ = supn≥1

fn.

Notice that 0 ≤ FN ≤ f ∗+f , hence FN ∈ L1(X,F , µ) and limN→∞ FN(x) = 0a.e. Also for any k, |fn−k− f | ≤ f ∗+ f , so that |fn−k− f | ∈ L1(X,F , µ) andlimn→∞|fn−k − f | = 0 a.e.For any N ≤ n,

1

n+ 1

n∑k=0

|fn−k − f |(T kx) =1

n+ 1

n−N∑k=0

|fn−k − f |(T kx)

+1

n+ 1

n∑k=n−N+1

|fn−k − f |(T kx)

≤ 1

n+ 1

n−N∑k=0

FN(T kx)

+1

n+ 1

N−1∑k=0

|fk − f |(T n−kx).

If we take the limit as n→∞, then by exercise (4.4.3) the second term tendsto 0 a.e., and by the ergodic theorem as well as the dominated convergencetheorem, the first term tends to zero a.e. Hence,

limn→∞

1

n+ 1IWn

i=0 T−iα(x) = h(α, T ) a.e.

The above theorem can be interpreted as providing an estimate of the sizeof the atoms of

∨ni=0 T

−iα. For n sufficiently large, a typical element A ∈∨ni=0 T

−iα satisfies

− 1

n+ 1log µ(A) ≈ h(α, T )

orµ(An) ≈ 2−(n+1)h(α,T ).

Furthermore, if α is a generating partition (i.e.∨∞i=0 T

−iα = F , then in theconclusion of Shannon-McMillan-Breiman Theorem one can replace h(α, T )by h(T ).

Page 72: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

72 Entropy

4.5 Lochs’ Theorem

In 1964, G. Lochs compared the decimal and the continued fraction expan-sions. Let x ∈ [0, 1) be an irrational number, and suppose x = .d1d2 · · · is thedecimal expansion of x (which is generated by iterating the map Sx = 10x(mod 1)). Suppose further that

x =1

a1 +1

a2 +1

a3 +1. . .

= [0; a1, a2, · · · ] (4.5)

is its regular continued fraction (RCF) expansion (generated by the mapTx = 1

x− b 1

xc). Let y = .d1d2 · · · dn be the rational number determined by

the first n decimal digits of x, and let z = y + 10−n. Then, [y, z) is thedecimal cylinder of order n containing x, which we also denote by Bn(x).Now let

y =1

b1 +1

b2 +.. . +

1

bl

and

z =1

c1 +1

c2 +.. . +

1

ck

be the continued fraction expansion of y and z. Let

m (n, x) = max i ≤ min (l, k) : for all j ≤ i, bj = cj . (4.6)

In other words, if Bn(x) denotes the decimal cylinder consisting of all pointsy in [0, 1) such that the first n decimal digits of y agree with those of x,and if Cj(x) denotes the continued fraction cylinder of order j containingx, i.e., Cj(x) is the set of all points in [0, 1) such that the first j digits intheir continued fraction expansion is the same as that of x, then m(n, x) isthe largest integer such that Bn(x) ⊂ Cm(n,x)(x). Lochs proved the followingtheorem:

Page 73: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Lochs’ Theorem 73

Theorem 4.5.1 Let λ denote Lebesgue measure on [0, 1). Then for a.e.x ∈ [0, 1)

limn→∞

m(n, x)

n=

6 log 2 log 10

π2.

In this section, we will prove a generalization of Lochs’ theorem thatallows one to compare any two known expansions of numbers. We showthat Lochs’ theorem is true for any two sequences of interval partitions on[0, 1) satisfying the conclusion of Shannon-McMillan-Breiman theorem. Thecontent of this section as well as the proofs can be found in [DF]. We beginwith few definitions that will be used in the arguments to follow.

Definition 4.5.1 By an interval partition we mean a finite or countablepartition of [0, 1) into subintervals. If P is an interval partition and x ∈ [0, 1),we let P (x) denote the interval of P containing x.

Let P = Pn∞n=1 be a sequence of interval partitions. Let λ denoteLebesgue probability measure on [0, 1).

Definition 4.5.2 Let c ≥ 0. We say that P has entropy c a.e. with respectto λ if

− log λ (Pn (x))

n→ c a.e.

Note that we do not assume that each Pn is refined by Pn+1.

Suppose that P = Pn∞n=1 and Q = Qn∞n=1 are sequences of intervalpartitions. For each n ∈ N and x ∈ [0, 1), define

mP,Q (n, x) = sup m | Pn (x) ⊂ Qm (x) .

Theorem 4.5.2 Let P = Pn∞n=1 and Q = Qn∞n=1 be sequences of intervalpartitions and λ Lebesgue probability measure on [0, 1). Suppose that for someconstants c > 0 and d > 0, P has entropy c a.e with respect to λ and Q hasentropy d a.e. with respect to λ. Then

mP,Q (n, x)

n→ c

da.e.

Page 74: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

74 Entropy

Proof First we show that

lim supn→∞mP,Q (n, x)

n≤ c

da.e.

Fix ε > 0. Let x ∈ [0, 1) be a point at which the convergence conditions of

the hypotheses are met. Fix η > 0 so thatc+ η

c− c

dη< 1 + ε. Choose N so

that for all n ≥ Nλ (Pn (x)) > 2−n(c+η)

andλ (Qn (x)) < 2−n(d−η).

Fix n so that minn,c

dn≥ N , and let m′ denote any integer greater than

(1 + ε)c

dn. By the choice of η,

λ (Pn (x)) > λ (Qm′ (x))

so that Pn (x) is not contained in Qm′ (x). Therefore

mP,Q (n, x) ≤ (1 + ε)c

dn

and so

lim supn→∞mP,Q (n, x)

n≤ (1 + ε)

c

da.e.

Since ε > 0 was arbitrary, we have the desired result.Now we show that

lim infn→∞mP,Q (n, x)

n≥ c

da.e.

Fix ε ∈ (0, 1). Choose η > 0 so that ζ =: εc − η(1 + (1− ε)

c

d

)> 0. For

each n ∈ N let m (n) =⌊(1− ε)

c

dn⌋. For brevity, for each n ∈ N we call an

element of Pn (respectively Qn) (n, η)−good if

λ (Pn (x)) < 2−n(c−η)

(respectivelyλ (Qn (x)) > 2−n(d+η).)

Page 75: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Lochs’ Theorem 75

For each n ∈ N, let

Dn (η) =

x :

Pn (x) is (n, η)− good and Qm(n) (x) is (m (n) , η)− goodand Pn (x) " Qm(n) (x)

.

If x ∈ Dn (η), then Pn (x) contains an endpoint of the (m (n) , η)−goodinterval Qm(n) (x). By the definition of Dn (η) and m (n),

λ (Pn (x))

λ(Qm(n) (x)

) < 2−nζ .

Since no more than one atom of Pn can contain a particular endpoint of anatom of Qm(n), we see that λ (Dn (η)) < 2 · 2−nζ and so

∞∑n=1

λ (Dn (η)) <∞,

which implies thatλ x | x ∈ Dn (η) i.o. = 0.

Since m (n) goes to infinity as n does, we have shown that for almost everyx ∈ [0, 1), there exists N ∈ N, so that for all n ≥ N , Pn (x) is (n, η)−goodand Qm(n) (x) is (m (n) , η)−good and x /∈ Dn (η). In other words, for al-most every x ∈ [0, 1), there exists N ∈ N, so that for all n ≥ N , Pn (x)is (n, η)−good and Qm(n) (x) is (m (n) , η)−good and Pn (x) ⊂ Qm(n) (x).Thus, for almost every x ∈ [0, 1), there exists N ∈ N, so that for all n ≥ N ,mP,Q (n, x) ≥ m (n), so that

mP,Q (n, x)

n≥ b(1− ε)

c

dc.

This proves that

lim infn→∞mP,Q (n, x)

n≥ (1− ε)

c

da.e.

Since ε > 0 was arbitrary, we have established the theorem.

The above result allows us to compare any two well-known expansionsof numbers. Since the commonly used expansions are usually performed forpoints in the unit interval, our underlying space will be ([0, 1),B, λ), whereB is the Lebesgue σ-algebra, and λ the Lebesgue measure. The expansionswe have in mind share the following two properties.

Page 76: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

76 Entropy

Definition 4.5.3 A surjective map T : [0, 1) → [0, 1) is called a numbertheoretic fibered map (NTFM) if it satisfies the following conditions:

(a) there exists a finite or countable partition of intervals α = Ai; i ∈ Dsuch that T restricted to each atom of α (cylinder set of order 0) ismonotone, continuous and injective. Furthermore, α is a generatingpartition.

(b) T is ergodic with respect to Lebesgue measure λ, and there exists a Tinvariant probability measure µ equivalent to λ with bounded density.

(Bothdµ

dλand

dµare bounded, and µ(A) = 0 if and only if λ(A) = 0

for all Lebesgue sets A.).

Let T be an NTFM with corresponding partition α, and T -invariant measureµ equivalent to λ. Let L,M > 0 be such that

Lλ(A) ≤ µ(A) < Mλ(A)

for all Lebesgue sets A (property (b)). For n ≥ 1, let Pn =∨n−1i=0 T

−iα, thenby property (a), Pn is an interval partition. If Hµ(α) < ∞, then Shannon-McMillan-Breiman Theorem gives

limn→∞

− log µ(Pn(x))

n= hµ(T ) a.e. with respect to µ.

Exercise 4.5.1 Show that the conclusion of the Shannon-McMillan-BreimanTheorem holds if we replace µ by λ, i.e.

limn→∞

− log λ(Pn(x))

n= hµ(T ) a.e. with respect to λ.

Iterations of T generate expansions of points x ∈ [0, 1) with digits in D.We refer to the resulting expansion as the T -expansion of x.

Almost all known expansions on [0, 1) are generated by a NTFM. Amongthem are the n-adic expansions (Tx = nx (mod 1), where n is a positiveinteger), β expansions (Tx = βx (mod 1), where β > 1 is a real number),continued fraction expansions (Tx = 1

x− b 1

xc), and many others (see the

book Ergodic Theory of Numbers).

Page 77: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Lochs’ Theorem 77

Exercise 4.5.2 Prove Theorem (4.5.1) using Theorem (4.5.2). Use the factthat the continued fraction map T is ergodic with respect to Gauss measureµ, given by

µ(B) =

∫B

1

log 2

1

1 + xdx,

and has entropy equal to hµ(T ) =π2

6 log 2.

Exercise 4.5.3 Reformulate and prove Lochs’ Theorem for any two NTFMmaps S and T on [0, 1).

Page 78: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

78 Entropy

Page 79: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Chapter 5

Hurewicz Ergodic Theorem

In this chapter we consider a class of non-measure preserving transformations.In particular, we study invertible, non-singular and conservative transforma-tions on a probability space. We first start with a quick review of equivalentmeasures, we then define non-singular and conservative transformations, andstate some of their properties. We end this section by giving a new proofof Hurewicz Ergodic Theorem, which is a generalization of Birkhoff ErgodicTheorem to non-singular conservative transformations.

5.1 Equivalent measures

Recall that two measures µ and ν on a measure space (Y,F) are equivalentif µ and ν have the same null-sets, i.e.,

µ(A) = 0 ⇔ ν(A) = 0, A ∈ F .

The theorem of Radon-Nikodym says that if µ, ν are σ-finite and equiv-alent, then there exist measurable functions f, g ≥ 0, such that

µ(A) =

∫A

f dν and ν(A) =

∫A

g dµ.

Furthermore, for all h ∈ L1(µ) (or L1(ν)),∫h dµ =

∫hf dν and

∫h dν =

∫hg dµ.

79

Page 80: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

80 Hurewicz Ergodic Theorem

Usually the function f is denoted bydµ

dνand the function g by

dµ.

Now suppose that (X,B, µ) is a probability space, and T : X → X ameasurable transformation. One can define a new measure µT−1 on (X,B)by µ T−1(A) = µ(T−1A) for A ∈ B. It is not hard to prove that forf ∈ L1(µ), ∫

f d(µ T−1) =

∫f T dµ (5.1)

Exercise 5.1.1 Starting with indicator functions, give a proof of (5.1).

Note that if T is invertible, then one has that∫f d(µ T ) =

∫f T−1 dµ (5.2)

5.2 Non-singular and conservative transfor-

mations

Definition 5.2.1 Let (X,B, µ) be a probability space and T : X → X aninvertible measurable function. T is said to be non-singular if for any A ∈ B,

µ(A) = 0 if and only if µ(T−1A) = 0.

Since T is invertible, non-singularity implies that

µ(A) = 0 if and only if µ(T nA) = 0, n 6= 0.

This implies that the measures µ T n defined by µ T n(A) = µ(T nA) isequivalent to µ (and hence equivalent to each other). By the theorem ofRadon-Nikodym, there exists for each n 6= 0, a non-negative measurable

function ωn(x) =dµ T n

dµ(x) such that

µ(T nA) =

∫A

ωn(x) dµ(x).

We have the following propositions.

Page 81: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Non-singular and conservative transformations 81

Proposition 5.2.1 Suppose (X,B, µ) is a probability space, and T : X → Xis invertible and non-singular. Then for every f ∈ L1(µ),∫

X

f(x) dµ(x) =

∫X

f(Tx)ω1(x) dµ(x) =

∫X

f(T nx)ωn(x) dµ(x).

Proof We show the result for indicator functions only, the rest of the proofis left to the reader.∫

X

1A(x) dµ(x) = µ(A) = µ(T (T−1A))

=

∫T−1A

ω1(x) dµ(x)

=

∫X

1A(Tx)ω1(x) dµ(x).

Proposition 5.2.2 Under the assumptions of Proposition 5.2.1, one has forall n,m ≥ 1, that

ωn+m(x) = ωn(x)ωm(T nx), µ a.e.

Proof For any A ∈ B,∫A

ωn(x)ωm(T nx)dµ(x) =

∫X

1A(x)ωm(T nx)d(µ T n)(x)

=

∫X

1A(T−nx)ωm(x)dµ(x)

=

∫X

1TnA(x)d(µ Tm)(x)

= µ Tm(T nA) = µ(Tm+nA) =

∫A

ωn+m(x)dµ(x).

Hence, ωn+m(x) = ωn(x)ωm(T nx), µ a.e.

Exercise 5.2.1 Let (X,B, µ) be a probability space, and T : X → X aninvertible non-singular transformation. For any measurable function f , setfn(x) =

∑n−1i=0 f(T ix)ωi(x), n ≥ 1, where ω0(x) = 1. Show that for all

n,m ≥ 1,fn+m(x) = fn(x) + ωn(x)fm(T nx).

Page 82: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

82 Hurewicz Ergodic Theorem

Definition 5.2.2 Let (X,B, µ) be a probability space, and T : X → X ameasurable transformation. We say that T is conservative if for any A ∈ Bwith µ(A) > 0, there exists an n ≥ 1 such that µ(A ∩ T−nA) > 0.

Note that if T is invertible, non-singular and conservative , then T−1 is alsonon-singular and conservative. In this case, for any A ∈ B with µ(A) > 0,there exists an n 6= 0 such that µ(A ∩ T nA) > 0.

Proposition 5.2.3 Suppose T is invertible, non-singular and conservativeon the probability space (X,B, µ), and let A ∈ B with µ(A) > 0. Then for µa.e. x ∈ A there exist infinitely many positive and negative integers n, suchthat T nx ∈ A.

Proof Let B = x ∈ A : T nx /∈ A for all n ≥ 1. Note that for any n ≥ 1,B ∩ T−nB = ∅. If µ(B) > 0, then by conservativity there exists an n ≥ 1,such that µ(B∩T−nB) is positive, which is a contradiction. Hence, µ(B) = 0,and by non-singularity we have µ(T−nB) = 0 for all n ≥ 1.

Now, let C = x ∈ A; T nx ∈ A for only finitely many n ≥ 1, thenC ⊂

⋃∞n=1 T

−nB, implying that

µ(C) ≤∞∑n=1

µ(T−nB) = 0.

Therefore, for almost every x ∈ A there exist infinitely many n ≥ 1 such thatT nx ∈ A. Replacing T by T−1 yields the result for n ≤ −1.

Proposition 5.2.4 Suppose T is invertible, non-singular and conservative,then

∞∑n=1

ωn(x) = ∞, µ a.e.

Proof Let A = x ∈ X :∑∞

n=1 ωn(x) <∞. Note that

A =∞⋃

M=1

x ∈ X :∞∑n=1

ωn(x) < M.

If µ(A) > 0, then there exists an M ≥ 1 such that the set

B = x ∈ X :∞∑n=1

ωn(x) < M

Page 83: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Hurewicz Ergodic Theorem 83

has positive measure. Then,∫B

∑∞n=1 ωn(x) dµ(x) < Mµ(B) <∞. However,∫

B

∞∑n=1

ωn(x) dµ(x) =∞∑n=1

∫B

ωn(x) dµ(x)

=∞∑n=1

µ(T nB)

=∞∑n=1

∫X

1TnB(x) dµ(x)

=

∫X

∞∑n=1

1B(T−nx) dµ(x).

Hence,∫X

∑∞n=1 1B(T−nx) dµ(x) <∞, which implies that

∞∑n=1

1B(T−nx) <∞ µ a.e.

Therefore, for µ a.e. x one has T−nx ∈ B for only finitely many n ≥ 1,contradicting Proposition 5.2.3. Thus µ(A) = 0, and

∞∑n=1

ωn(x) = ∞, µ a.e.

5.3 Hurewicz Ergodic Theorem

The following theorem by Hurewicz is a generalization of Birkhoff’s ErgodicTheorem to our setting; see also Hurewicz’ original paper [H]. We give a newprove, similar to the proof for Birkhoff’s Theorem; see Section 2.1 and [KK].

Theorem 5.3.1 Let (X,B, µ) be a probability space, and T : X → X aninvertible, non-singular and conservative transformation. If f ∈ L1(µ), then

limn→∞

n−1∑i=0

f(T ix)ωi(x)

n−1∑i=0

ωi(x)

= f∗(x)

Page 84: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

84 Hurewicz Ergodic Theorem

exists µ a.e. Furthermore, f∗ is T -invariant and∫X

f(x) dµ(x) =

∫X

f∗(x) dµ(x).

Proof Assume with no loss of generality that f ≥ 0 (otherwise we writef = f+ − f−, and we consider each part separately). Let

fn(x) = f(x) + f(Tx)ω1(x) + · · ·+ f(T n−1x)ωn−1(x),

gn(x) = ω0(x) + ω1(x) + · · ·+ ωn−1(x), ω0(x) = g0(x) = 1,

f(x) = lim supn→∞

fn(x)n−1∑i=0

ωi(x)

= lim supn→∞

fn(x)

gn(x),

and

f(x) = lim infn→∞

fn(x)n−1∑i=0

ωi(x)

= lim infn→∞

fn(x)

gn(x).

By Proposition (5.2.2), one has gn+m(x) = gn(x) + gm(T nx). Using Exercise(5.2.1) and Proposition (5.2.4), we will show that f and f are T -invariant.To this end,

f(Tx) = lim supn→∞

fn(Tx)

gn(T nx)

= lim supn→∞

fn+1(x)− f(x)

ω1(x)

gn+1(x)− g(x)ω1(x)

= lim supn→∞

fn+1(x)− f(x)

gn+1(x)− g(x)

= lim supn→∞

[fn+1(x)

gn+1(x)· gn+1(x)

gn+1(x)− g(x)− f(x)

gn+1(x)− g(x)

]= lim sup

n→∞

fn+1(x)

gn+1(x)

= f(x).

Page 85: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Hurewicz Ergodic Theorem 85

(Similarly f is T -invariant).Now, to prove that f∗ exists, is integrable and T -invariant, it is enough

to show that ∫X

f dµ ≥∫X

f dµ ≥∫X

f dµ.

For since f − f ≥ 0, this would imply that f = f = f∗. a.e.

We first prove that∫Xfdµ ≤

∫Xf dµ. Fix any 0 < ε < 1, and let L > 0 be

any real number. By definition of f , for any x ∈ X, there exists an integerm > 0 such that

fm(x)

gm(x)≥ min(f(x), L)(1− ε).

Now, for any δ > 0 there exists an integer M > 0 such that the set

X0 = x ∈ X : ∃ 1 ≤ m ≤M with fm(x) ≥ gm(x) min(f(x), L)(1− ε)

has measure at least 1− δ. Define F on X by

F (x) =

f(x) x ∈ X0

L x /∈ X0.

Notice that f ≤ F (why?). For any x ∈ X, let an = an(x) = F (T nx)ωn(x),and bn = bn(x) = min(f(x), L)(1 − ε)ωn(x). We now show that an andbn satisfy the hypothesis of Lemma 2.1.1 with M > 0 as above. For anyn = 0, 1, 2, . . .

–if T nx ∈ X0, then there exists 1 ≤ m ≤M such that

fm(T nx) ≥ min(f(x), L)(1− ε)gm(T nx).

Hence,ωn(x)fm(T nx) ≥ min(f(x), L)(1− ε)gm(T nx)ωn(x).

Now,

bn + . . .+ bn+m−1 = min(f(x), L)(1− ε)gm(T nx)ωn(x)

≤ ωn(x)fm(T nx)

= f(T nx)ωn(x) + f(T n+1x)ωn+1(x) + · · ·+ f(T n+m−1x)ωn+m−1(x)

≤ F (T nx)ωn(x) + F (T n+1x)ωn+1(x) + · · ·+ F (T n+m−1x)ωn+m−1(x)

= an + an+1 + · · ·+ an+m−1.

Page 86: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

86 Hurewicz Ergodic Theorem

–If T nx /∈ X0, then take m = 1 since

an = F (T nx)ωn(x) = Lωn(x) ≥ min(f(x), L)(1− ε)ωn(x) = bn.

Hence by T -invariance of f , and Lemma 2.1.1 for all integers N > M onehas

F (x)+F (Tx)+ω1(x)+· · ·+ωN−1(x)F (TN−1x) ≥ min(f(x), L)(1−ε)gN−M(x).

Integrating both sides, and using Proposition (5.2.1) together with the T -invariance of f one gets

N

∫X

F (x) dµ(x) ≥∫X

min(f(x), L)(1− ε)gN−M(x) dµ(x)

= (N −M)

∫X

min(f(x), L)(1− ε) dµ(x).

Since ∫X

F (x) dµ(x) =

∫X0

f(x) dµ(x) + Lµ(X \X0),

one has∫X

f(x) dµ(x) ≥∫X0

f(x) dµ(x)

=

∫X

F (x) dµ(x)− Lµ(X \X0)

≥ (N −M)

N

∫X

min(f(x), L)(1− ε) dµ(x)− Lδ.

Now letting first N →∞, then δ → 0, then ε→ 0, and lastly L→∞ onegets together with the monotone convergence theorem that f is integrable,and ∫

X

f(x) dµ(x) ≥∫X

f(x) dµ(x).

We now prove that ∫X

f(x) dµ(x) ≤∫X

f(x) dµ(x).

Page 87: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Hurewicz Ergodic Theorem 87

Fix ε > 0, for any x ∈ X there exists an integer m such that

fm(x)

gm(x)≤ (f(x) + ε).

For any δ > 0 there exists an integer M > 0 such that the set

Y0 = x ∈ X : ∃ 1 ≤ m ≤M with fm(x) ≤ (f(x) + ε)gm(x)

has measure at least 1− δ. Define G on X by

G(x) =

f(x) x ∈ Y0

0 x /∈ Y0.

Notice that G ≤ f . Let bn = G(T nx)ωn(x), and an = (f(x) + ε)ωn(x). Wenow check that the sequences an and bn satisfy the hypothesis of Lemma2.1.1 with M > 0 as above.

–if T nx ∈ Y0, then there exists 1 ≤ m ≤M such that

fm(T nx) ≤ (f(x) + ε)gm(T nx).

Hence,

ωn(x)fm(T nx) ≤ (f(x)+ε)gm(T nx)ωn(x) = (f(x)+ε)(ωn(x)+· · ·+ωn+m−1(x).

By Proposition (5.2.2), and the fact that f ≥ G, one gets

bn + . . .+ bn+m−1 = G(T nx)ωn(x) + · · ·+G(T n+m−1x)ωn+m−1(x)

≤ f(T nx)ωn(x) + · · ·+ f(T n+m−1x)ωn+m−1(x)

= ωn(x)fm(T nx)

≤ (f(x) + ε)(ωn(x) + · · ·+ ωn+m+1(x))

= an + · · · an+m−1.

–If T nx /∈ Y0, then take m = 1 since

bn = G(T nx)ωn(x) = 0 ≤ (f(x) + ε)(ωn(x)) = an.

Hence by Lemma 2.1.1 one has for all integers N > M

G(x) +G(Tx)ω1(x) + . . .+G(TN−M−1x)ωN−M−1(x) ≤ (f(x) + ε)gN(x).

Page 88: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

88 Hurewicz Ergodic Theorem

Integrating both sides yields

(N −M)

∫X

G(x)dµ(x) ≤ N(

∫X

f(x)dµ(x) + ε).

Since f ≥ 0, the measure ν defined by ν(A) =∫Af(x) dµ(x) is absolutely

continuous with respect to the measure µ. Hence, there exists δ0 > 0 suchthat if µ(A) < δ, then ν(A) < δ0. Since µ(X \ Y0) < δ, then ν(X \ Y0) =∫X\Y0

f(x)dµ(x) < δ0. Hence,∫X

f(x) dµ(x) =

∫X

G(x) dµ(x) +

∫X\Y0

f(x) dµ(x)

≤ N

N −M

∫X

(f(x) + ε) dµ(x) + δ0.

Now, let first N → ∞, then δ → 0 (and hence δ0 → 0), and finally ε → 0,one gets ∫

X

f(x) dµ(x) ≤∫X

f(x) dµ(x).

This shows that ∫X

f dµ ≥∫X

f dµ ≥∫X

f dµ,

hence, f = f = f∗ a.e., and f∗ is T -invariant.

Remark We can extend the notion of ergodicity to our setting. If T is non-singular and conservative, we say that T is ergodic if for any measurable setA satisfying µ(A∆T−1A) = 0, one has µ(A) = 0 or 1. It is easy to check thatthe proof of Proposition (1.7.1) holds in this case, so that T ergodic impliesthat each T -invariant function is a constant µ a.e. Hence, if T is invertible,non-singular, conservative and ergodic, then by Hurewicz Ergodic Theoremone has for any f ∈ L1(µ),

limn→∞

n−1∑i=0

f(T ix)ωi(x)

n−1∑i=0

ωi(x)

=

∫X

fdµ µ a.e.

Page 89: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Chapter 6

Invariant Measures forContinuous Transformations

6.1 Existence

Suppose X is a compact metric space, and let B be the Borel σ-algebra i.e.,the σ-algebra generated by the open sets. Let M(X) be the collection of allBorel probability measures on X. There is natural embedding of the spaceX in M(X) via the map x→ δx, where δX is the Dirac measure concentratedat x (δx(A) = 1 if x ∈ A, and is zero otherwise). Furthermore, M(X) is aconvex set, i.e., pµ+(1−p)ν ∈M(X) whenever µ, ν ∈M(X) and 0 ≤ p ≤ 1.Theorem 6.1.2 below shows that a member of M(X) is determined by howit integrates continuous functions. We denote by C(X) the Banach space ofall complex valued continuous functions on X under the supremum norm.

Theorem 6.1.1 Every member of M(X) is regular, i.e., for all B ∈ B andevery ε > 0 there exist an open set Uε and a closed sed Cε such that Cε ⊆B ⊆ Uε such that µ(Uε \ Cε) < ε.

Idea of proof Call a set B ∈ B with the above property a regular set. LetR = B ∈ B : B is regular . Show that R is a σ-algebra containing all theclosed sets.

Corollary 6.1.1 For any B ∈ B, and any µ ∈M(X),

µ(B) = supC⊆B:C closed

µ(C) = infB⊆U :U open

µ(U).

89

Page 90: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

90 Invariant measures for continuous transformations

Theorem 6.1.2 Let µ,m ∈M(X). If∫X

f dµ =

∫X

f dm

for all f ∈ C(X), then µ = m.

Proof From the above corollary, it suffices to show that µ(C) = m(C) forall closed subsets C of X. Let ε > 0, by regularity of the measure m thereexists an open set Uε such that C ⊆ Uε and m(Uε \C) < ε. Define f ∈ C(X)as follows

f(x) =

0 x /∈ Uε

d(x,X\Uε)d(x,X\Uε)+d(x,C)

x ∈ Uε.

Notice that 1C ≤ f ≤ 1Uε , thus

µ(C) ≤∫X

f dµ =

∫X

f dm ≤ m(Uε) ≤ m(C) + ε.

Using a similar argument, one can show that m(C) ≤ µ(C) + ε. Therefore,µ(C) = m(C) for all closed sets, and hence for all Borel sets.

This allows us to define a metric structure on M(X) as follows. A sequenceµn in M(X) converges to µ ∈M(X) if and only if

limn→∞

∫X

f(x) dµn(x) =

∫X

f(x) dµ(x)

for all f ∈ C(X). We will show that under this notion of convergencethe space M(X) is compact, but first we need The Riesz RepresentationRepresentation Theorem.

Theorem 6.1.3 (The Riesz Representation Theorem) Let X be a compactmetric space and J : C(X) → C a continuous linear map such that J is apositive operator and J(1) = 1. Then there exists a µ ∈ M(X) such thatJ(f) =

∫Xf(x) dµ(x).

Theorem 6.1.4 The space M(X) is compact.

Page 91: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Existence 91

Idea of proof Let µn be a sequence in M(X). Choose a countable densesubset of fn of C(X). The sequence

∫Xf1 dµn is a bounded sequence of

complex numbers, hence has a convergent subsequence ∫Xf1 dµ

(1)n . Now,

the sequence ∫Xf2 dµ

(1)n is bounded, and hence has a convergent sub-

sequence ∫Xf2 dµ

(2)n . Notice that

∫Xf1 dµ

(2)n is also convergent. We

continue in this manner, to get for each i a subsequence µ(i)n of µn

such that for all j ≤ i, µ(i)n is a subsequence of µ(j)

n and ∫Xfj dµ

(i)n

converges. Consider the diagonal sequence µ(n)n , then

∫Xfj dµ

(n)n con-

verges for all j, and hence ∫Xf dµ

(n)n converges for all f ∈ C(X). Now

define J : C(X) → C by J(f) = limn→∞∫Xf dµ

(n)n . Then, J is lin-

ear, continuous (|J(f)| ≤ supx∈X |f(x)|), positive and J(1) = 1. Thus,by Riesz Representation Theorem, there exists a µ ∈ M(X) such that

J(f) = limn→∞∫Xf dµ

(n)n =

∫Xf dµ. Therefore, limn→∞ µ

(n)n = µ, and

M(X) is compact.

Let T : X → X be a continuous transformation. Since B is generatedby the open sets, then T is measurable with respect to B. Furthermore, Tinduces in a natural way, an operator T : M(X) →M(X) given by

(Tµ)(A) = µ(T−1A)

for all A ∈ B. Then Tiµ(A) = µ(T−iA). Using a standard argument, one

can easily show that∫X

f(x) d(Tµ)(x) =

∫X

f(Tx)dµ(x)

for all continuous functions f on X. Note that T is measure preserving withrespect to µ ∈ M(X) if and only if Tµ = µ. Equivalently, µ is measurepreserving if and only if∫

X

f(x) dµ(x) =

∫X

f(Tx) dµ(x)

for all continuous functions f on X. Let

M(X,T ) = µ ∈M(X) : Tµ = µ

be the collection of all probability measures under which T is measure pre-serving.

Page 92: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

92 Invariant measures for continuous transformations

Theorem 6.1.5 Let T : X → X be continuous, and σn a sequence inM(X). Define a sequence µn in M(X) by

µn =1

n

n−1∑i=0

Tiσn.

Then, any limit point µ of µn is a member of M(X,T ).

Proof We need to show that for any continuous function f on X, onehas

∫Xf(x) dµ(x) =

∫Xf(Tx) dµ. Since M(X) is compact there exists a

µ ∈ M(X) and a subsequence µni such that µni

→ µ in M(X). Now forany f continuous, we have∣∣∣∣∫X

f(Tx) dµ(x)−∫X

f(x) dµ(x)

∣∣∣∣ = limj→∞

∣∣∣∣∫X

f(Tx) dµnj(x)−

∫X

f(x) dµnj(x)

∣∣∣∣= lim

j→∞

∣∣∣∣∣ 1

nj

∫X

nj−1∑i=0

(f(T i+1x)− f(T ix)

)dσnj

(x)

∣∣∣∣∣= lim

j→∞

∣∣∣∣ 1

nj

∫X

(f(T njx)− f(x)) dσnj(x)

∣∣∣∣≤ lim

j→∞

2 supx∈X |f(x)|nj

= 0.

Therefore µ ∈M(X,T ).

Theorem 6.1.6 Let T be a continuous transformation on a compact metricspace. Then,

(i) M(X,T ) is a compact convex subset of M(X).

(ii) µ ∈ M(X,T ) is an extreme point (i.e. µ cannot be written in a non-trivial way as a convex combination of elements of M(X,T )) if andonly if T is ergodic with respect to µ.

Proof (i) Clearly M(X,T ) is convex. Now let µn be a sequence inM(X,T ) converging to µ in M(X). We need to show that µ ∈ M(X,T ).

Page 93: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Existence 93

Since T is continuous, then for any continuous function f on X, the functionf T is also continuous. Hence,∫

X

f(Tx) dµ(x) = limn→∞

∫X

f(Tx) dµn(x)

= limn→∞

∫X

f(x) dµn(x)

=

∫X

f(x) dµ(x).

Therefore, T is measure preserving with respect to µ, and µ ∈M(X,T ).(ii) Suppose T is ergodic with respect to µ, and assume that

µ = pµ1 + (1− p)µ2

for some µ1, µ2 ∈M(X,T ), and some 0 < p ≤ 1. We will show that µ = µ1.Notice that the measure µ1 is absolutely continuous with respect to µ, andT is ergodic with respect to µ, hence by Theorem (2.1.2) we have µ1 = µ.

Conversely, (we prove the contrapositive) suppose that T is not ergodic withrespect to µ. Then there exists a measurable set E such that T−1E = E,and 0 < µ(E) < 1. Define measures µ1, µ2 on X by

µ1(B) =µ(B ∩ E)

µ(E)and µ1(B) =

µ (B ∩ (X \ E))

µ(X \ E).

Since E and X \E are T -invariant sets, then µ1, µ2 ∈M(X,T ), and µ1 6= µ2

since µ1(E) = 1 while µ2(E) = 0. Furthermore, for any measurable set B,

µ(B) = µ(E)µ1(B) + (1− µ(E))µ2(B),

i.e. µ1 is a non-trivial convex combination of elements of M(X,T ). Thus, µis not an extreme point of M(X,T ).

Since the Banach space C(X) of all continuous functions on X (underthe sup norm) is separable i.e. C(X) has a countable dense subset, one getsthe following strengthening of the Ergodic Theorem.

Theorem 6.1.7 If T : X → X is continuous and µ ∈ M(X,T ) is ergodic,then there exists a measurable set Y such that µ(Y ) = 1, and

limn→∞

1

n

n−1∑i=0

f(T ix) =

∫X

f(x) dµ(x)

for all x ∈ Y , and f ∈ C(X).

Page 94: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

94 Invariant measures for continuous transformations

Proof Choose a countable dense subset fk in C(X). By the ErgodicTheorem, for each k there exists a subset Xk with µ(Xk) = 1 and

limn→∞

1

n

n−1∑i=0

fk(Tix) =

∫X

fk(x) dµ(x)

for all x ∈ Xk. Let Y =⋂∞k=1Xk, then µ(Y ) = 1, and

limn→∞

1

n

n−1∑i=0

fk(Tix) =

∫X

fk(x)dµ(x)

for all k and all x ∈ Y . Now, let f ∈ C(X), then there exists a subse-quence fkj

converging to f in the supremum norm, and hence is uniformlyconvergent. For any x ∈ Y , using uniform convergence and the dominatedconvergence theorem, one gets

limn→∞

1

n

n−1∑i=0

f(T ix) = limn→∞

limj→∞

1

n

n−1∑i=0

fkj(T ix)

= limj→∞

limn→∞

1

n

n−1∑i=0

fkj(T ix)

= limj→∞

∫X

fkjdµ =

∫X

f dµ.

Theorem 6.1.8 Let T : X → X be continuous, and µ ∈M(X,T ). Then Tis ergodic with respect to µ if and only if

1

n

n−1∑i=0

δT ix → µ a.e.

Proof Suppose T is ergodic with respect to µ. Notice that for any f ∈C(X), ∫

X

f(y) d(δT ix)(y) = f(T ix),

Page 95: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Existence 95

Hence by theorem 6.1.7, there exists a measurable set Y with µ(Y ) = 1 suchthat

limn→∞

1

n

n−1∑i=0

∫X

f(y) d(δT ix)(y) = limn→∞

1

n

n−1∑i=0

f(T ix) =

∫X

f(y) dµ(y)

for all x ∈ Y , and f ∈ C(X). Thus, 1n

∑n−1i=0 δT ix → µ for all x ∈ Y .

Conversely, suppose 1n

∑n−1i=0 δT ix → µ for all x ∈ Y , where µ(Y ) = 1. Then

for any f ∈ C(X) and any g ∈ L1(X,B, µ) one has

limn→∞

1

n

n−1∑i=0

f(T ix)g(x) = g(x)

∫X

f(y) dµ(y).

By the dominated convergence theorem

limn→∞

1

n

n−1∑i=0

∫X

f(T ix)g(x) dµ(x) =

∫X

g(x)dµ(x)

∫X

f(y) dµ(y).

Now, let F,G ∈ L2(X,B, µ). Then, G ∈ L1(X,B, µ) so that

limn→∞

1

n

n−1∑i=0

∫X

f(T ix)G(x) dµ(x) =

∫X

G(x)dµ(x)

∫X

f(y)dµ(y)

for all f ∈ C(X). Let ε > 0, there exists f ∈ C(X) such that ||F − f ||2 < εwhich implies that |

∫F dµ −

∫fdµ| < ε. Furthermore, there exists N so

that for n ≥ N one has

∣∣∣∣∣∫X

1

n

n−1∑i=0

f(T ix)G(x) dµ(x)−∫X

Gdµ

∫X

f dµ

∣∣∣∣∣ < ε.

Page 96: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

96 Invariant measures for continuous transformations

Thus, for n ≥ N one has∣∣∣∣∣∫X

1

n

n−1∑i=0

F (T ix)G(x) dµ(x)−∫X

Gdµ

∫X

F dµ

∣∣∣∣∣≤

∫X

1

n

n−1∑i=0

∣∣F (T ix)− f(T ix)∣∣ |G(x)| dµ(x)

+

∣∣∣∣∣∫X

1

n

n−1∑i=0

f(T ix)G(x)dµ(x)−∫X

Gdµ

∫X

fdµ

∣∣∣∣∣+

∣∣∣∣∫X

fdµ

∫X

Gdµ−∫X

F dµ

∫X

G dµ

∣∣∣∣< ε||G||2 + ε+ ε||G||2.

Thus,

limn→∞

1

n

n−1∑i=0

∫X

F (T ix)G(x) dµ(x) =

∫X

G(x) dµ(x)

∫X

F (y) dµ(y)

for all F,G ∈ L2(X,B, µ) and x ∈ Y . Taking F and G to be indicatorfunctions, one gets that T is ergodic.

Exercise 6.1.1 Let X be a compact metric space and T : X → X be acontinuous homeomorphism. Let x ∈ X be periodic point of T of period n,i.e. T nx = x and T jx 6= x for j < i. Show that if µ ∈ M(X,T ) is ergodic

and µ(x) > 0, then µ =1

n

n−1∑i=0

δT ix.

6.2 Unique Ergodicity

A continuous transformation T : X → X on a compact metric space isuniquely ergodic if there is only one T -invariant probabilty measure µ onX. In this case, M(X,T ) = µ, and µ is necessarily ergodic, since µ is anextreme point of M(X,T ). Recall that if ν ∈M(X,T ) is ergodic, then thereexists a measurable subset Y such that ν(Y ) = 1 and

limn→∞

1

n

n−1∑i=0

f(T ix) =

∫X

f(y) dν(y)

Page 97: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Unique Ergodicity 97

for all x ∈ Y and all f ∈ C(X). When T is uniquely ergodic we will see thatwe have a much stronger result.

Theorem 6.2.1 Let T : X → X be a continuous transformation on a com-pact metric space X. Then the following are equivalent:

(i) For every f ∈ C(X), the sequence 1

n

n−1∑j=0

f(T jx) converges uniformly

to a constant.

(ii) For every f ∈ C(X), the sequence 1

n

n−1∑j=0

f(T jx) converges pointwise

to a constant.

(iii) There exists a µ ∈ M(X,T ) such that for every f ∈ C(X) and allx ∈ X.

limn→∞

1

n

n−1∑i=0

f(T ix) =

∫X

f(y) dµ(y).

(iv) T is uniquely ergodic.

Proof (i) ⇒ (ii) immediate.

(ii) ⇒ (iii) Define L : C(X) → C by

L(f) = limn→∞

1

n

n−1∑i=0

f(T ix).

By assumption L(f) is independent of x, hence L is well defined. It is easyto see that L is linear, continuous (|L(f)| ≤ supx∈X |f(x)|), positive andL(1) = 1. Thus, by Riesz Representation Theorem there exists a probabilitymeasure µ ∈M(X) such that

L(f) =

∫X

f(x) dµ(x)

for all f ∈ C(x). But

L(f T ) = limn→∞

1

n

n−1∑i=0

f(T i+1x) = L(f).

Page 98: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

98 Invariant measures for continuous transformations

Hence, ∫X

f(Tx) dµ(x) =

∫X

f(x) dµ(x).

Thus, µ ∈M(X,T ), and for every f ∈ C(X),

limn→∞

1

n

n−1∑i=0

f(T ix) =

∫X

f(x) dµ(x)

for all x ∈ X.

(iii) ⇒ (iv) Suppose µ ∈M(X,T ) is such that for every f ∈ C(X),

limn→∞

1

n

n−1∑i=0

f(T ix) =

∫X

f(x) dµ(x)

for all x ∈ X. Assume ν ∈ M(X,T ), we will show that µ = ν. For any f ∈

C(X), since the sequence 1

n

n−1∑j=0

f(T jx) converges pointwise to the constant

function∫Xf(x) dµ(x), and since each term of the sequence is bounded in

absolute value by the constant supx∈X |f(x)|, it follows by the DominatedConvergence Theorem that

limn→∞

∫X

1

n

n−1∑i=0

f(T ix) dν(x) =

∫X

∫X

f(x) dµ(x)dν(y) =

∫X

f(x) dµ(x).

But for each n, ∫X

1

n

n−1∑i=0

f(T ix) dν(x) =

∫X

f(x) dν(x).

Thus,∫Xf(x) dµ(x) =

∫Xf(x) dν(x), and µ = ν.

(iv) ⇒ (i) The proof is done by contradiction. Assume M(X,T ) = µ and

suppose g ∈ C(X) is such that the sequence 1

n

n−1∑j=0

gT j does not converge

uniformly on X. Then there exists ε > 0 such that for each N there existsn > N and there exists xn ∈ X such that∣∣ 1

n

n−1∑j=0

g(T jxn)−∫X

g dµ∣∣ ≥ ε.

Page 99: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Unique Ergodicity 99

Let

µn =1

n

n−1∑j=0

δT jxn=

1

n

n−1∑j=0

Tjδxn .

Then,

|∫X

gdµn −∫X

g dµ| ≥ ε.

Since M(X) is compact. there exists a subsequence µniconverging to ν ∈

M(X). Hence,

|∫X

g dν −∫X

g dµ| ≥ ε.

By Theorem (6.1.5), ν ∈M(X,T ) and by unique ergodicity µ = ν, which isa contradiction.

Example If Tθ is an irrational rotation, then Tθ is uniquely ergodic. This isa consequence of the above theorem and Weyl’s Theorem on uniform distri-bution: for any Riemann integrable function f on [0, 1), and any x ∈ [0, 1),one has

limn→∞

1

n

n−1∑i=0

f(x+ iθ − bx+ iθc) =

∫X

f(y) dy.

As an application of this, let us consider the following question. Considerthe sequence of first digits

1, 2, 4, 8, 1, 3, 6, 1, . . .

obtained by writing the first decimal digit of each term in the sequence

2n : n ≥ 0 = 1, 2, 4, 8, 16, 32, 64, 128, . . ..

For each k = 1, 2, . . . , 9, let pk(n) be the number of times the digit k appearsin the first n terms of the first digit sequence. The asymptotic relative fre-

quency of the digit k is then limn→∞pk(n)

n. We want to identify this limit

for each k ∈ 1, 2, . . . 9. To do this, let θ = log10 2, then θ is irrational. Fork = 1, 2, . . . , 9, let Jk = [log10 k, log10(k+1)). By unique ergodicity of Tθ, wehave for each k = 1, 2, . . . , 9,

limn→∞

1

n

n−1∑i=0

1Jk(T jθ (0)) = λ(Jk) = log10

(k + 1

k

).

Page 100: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

100 Invariant measures for continuous transformations

Returning to our original problem, notice that the first digit of 2i is k if andonly if

k · 10r ≤ 2i < (k + 1) · 10r

for some r ≥ 0. In this case,

r + log10 k ≤ i log10 2 = iθ < r + log10(k + 1).

This shows that r = biθc, and

log10 k ≤ iθ − biθc < log10(k + 1).

But T iθ(0) = iθ−biθc, so that T iθ(0) ∈ Jk. Summarizing, we see that the firstdigit of 2i is k if and only if T iθ(0) ∈ Jk. Thus,

limn→∞

pk(n)

n= lim

n→∞

1

n

n−1∑i=0

1Jk(T iθ(0)) = log10

(k + 1

k

).

Page 101: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Chapter 7

Topological Dynamics

In this chapter we will shift our attention from the theory of measure preserv-ing transformations and ergodic theory to the study of topological dynamics.The field of topological dynamics also studies dynamical systems, but in adifferent setting: where ergodic theory is set in a measure space with a mea-sure preserving transformation, topological dynamics focuses on a topologicalspace with a continuous transformation. The two fields bear great similarityand seem to co-evolve. Whenever a new notion has proven its value in onefield, analogies will be sought in the other. The two fields are, however, alsocomplementary. Indeed, we shall encounter a most striking example of a mapwhich exhibits its most interesting topological dynamics in a set of measurezero, right in the blind spot of the measure theoretic eye.In order to illuminate this interplay between the two fields we will be merelyinterested in compact metric spaces. The compactness assumption will playthe role of the assumption of a finite measure space. The assumption of theexistence of a metric is sufficient for applications and will greatly simplifyall of our proofs. The chapter is structured as follows. We will first intro-duce several basic notions from topological dynamics and discuss how theyare related to each other and to analogous notions from ergodic theory. Wewill then devote a separate section to topological entropy. To conclude thechapter, we will discuss some examples and applications.

101

Page 102: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

102 Basic Notions

7.1 Basic Notions

Throughout this chapter X will denote a compact metric space. Unlessstated otherwise, we will always denote the metric by d and occasionallywrite (X, d) to denote a metric space if there is any risk of confusion. We willassume that the reader has some familiarity with basic (analytic) topology.To jog the reader’s memory, a brief outline of these basics is included as anappendix. Here we will only summarize the properties of the space X, forfuture reference.

Theorem 7.1.1 Let X be a compact metric space, Y a topological space andf : X −→ Y continuous. Then,

• X is a Hausdorff space

• Every closed subspace of X is compact

• Every compact subspace of X is closed

• X has a countable basis for its topology

• X is normal, i.e. for any pair of disjoint closed sets A and B of Xthere are disjoint open sets U , V containing A and B, respectively

• If O is an open cover of X, then there is a δ > 0 such that every subsetA of X with diam(A) < δ is contained in an element of O. We callδ > 0 a Lebesgue number for O.

• X is a Baire space

• f is uniformly continuous

• If Y is an ordered space then f attains a maximum and minimum onX

• If A and B are closed sets in X and [a, b] ⊂ R, then there exists acontinuous map g : X −→ [a, b] such that g(x) = a for all x ∈ A andg(x) = b for all x ∈ b

Page 103: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

103

The reader may recognize that this theorem contains some of the mostimportant results in analytic topology, most notably the Lebesgue numberlemma, Baire’s category theorem, the extreme value theorem and Urysohn’slemma.Let us now introduce some concepts of topological dynamics: topologicaltransitivity, topological conjugacy, periodicity and expansiveness.

Definition 7.1.1 Let T : X −→ X be a continuous transformation. Forx ∈ X, the forward orbit of x is the set FOT (x) = T n(x)|n ∈ Z≥0. T iscalled one-sided topologically transitive if for some x ∈ X, FOT (x) is densein X.If T is invertible, the orbit of x is defined as OT (x) = T n(x)|n ∈ Z. If Tis a homeomorphism, T is called topologically transitive if for some x ∈ XOT (x) is dense in X.

In the literature sometimes a stronger version of topological transitivity isintroduced, called minimality.

Definition 7.1.2 A homeomorphism T : X −→ X is called minimal if everyx ∈ X has a dense orbit in X.

Example 7.1.1 Let X be the topological space X = 1, 2, . . . , 1000 withthe discrete topology. X is a finite space, hence compact, and the discretemetric

ddisc(x, y) =

0 if x = y

1 if x 6= y

induces the discrete topology on X. Thus X is a compact metric space. Ifwe now define T : X −→ X by the permutation (12 · · · 1000), i.e. T (1) = 2,T (2) = 3,. . .,T (1000) = 1, then it is easy to see that T is a homeomorphism.Moreover, OT (x) = X for all x ∈ X, so T is minimal.

Example 7.1.2 Fix n ∈ Z>0. Let X be the topological space X = 0 ∪ 1nk |k ∈ Z>0 equipped with the subspace topology inherited from R. ThenX is a compact metric space. Indeed, if O is an open cover of X, then Ocontains an open set U which contains 0. U contains all but finitely manypoints of X, so by adding an open set Uy to U from O with y ∈ Uy, for eachy not in U , we obtain a finite subcover. Define T : X −→ X by T (x) = x

n.

Page 104: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

104 Basic Notions

Note that T is continuous, but not surjective as T−1(1) = ∅. Observethat FOT (1) = X − 0 and X − 0 = X, so T is one-sided topologicallytransitive as x = 1 has a dense forward orbit.

The following theorem summarizes some equivalent definitions of topologicaltransitivity.

Theorem 7.1.2 Let T : X −→ X be a homeomorphism of a compact metricspace. Then the following are equivalent:

1. T is topologically transitive

2. If A is a closed set in X satisfying TA = A, then either A = X or Ahas empty interior (i.e. X − A is dense in X)

3. For any pair of non empty open sets U , V in X there exists an n ∈ Zsuch that T nU ∩ V 6= ∅

Proof

Recall that OT (x) is dense in X if and only if for every U open in X,U ∩OT (x) 6= ∅.(1)⇒(2): Suppose that OT (x) = X and let A be as stated. Suppose that Adoes not have an empty interior, then there exists an open set U such thatU is open, non-empty and U ⊂ A. Now, we can find a p ∈ Z such thatT p(x) ∈ U ⊂ A. Since T nA = A for all n ∈ Z, OT (x) ⊂ A and takingclosures we obtain X ⊂ A. Hence, either A has empty interior, or A = X.(2)⇒(3): Suppose U , V are open and non-empty. Then W = ∪∞n=−∞T

nUis open, non-empty and invariant under T . Hence, A = X −W is closed,TA = A and A 6= X. By (2), A has empty interior and therefore W is densein X. Thus W ∩ V 6= ∅, which implies (3).(3)⇒(1): For an arbitrary y ∈ X we have: y ∈ OT (x) if and only if everyopen neighborhood V of y intersects OT (x), i.e. Tm(x) ∈ V for some m ∈ Z.By theorem 7.1.1 there exists a countable basis U = Un∞n=1 for the topologyof X. Since every open neighborhood of y contains a basis element Ui suchthat y ∈ Ui, we see that OT (x) = X if and only if for every n ∈ Z>0 there issome m ∈ Z such that Tm(x) ∈ Un. Hence,

x ∈ X|OT (x) = X =∞⋂n=1

∞⋃m=−∞

TmUn

Page 105: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

105

But ∪∞m=−∞TmUn is T -invariant and therefore intersects every open set in

X by (3). Thus, ∪∞m=−∞TmUn is dense in X for every n. Now, since X is

a Baire space (c.f. theorem 7.1.1), we see that x ∈ X|OT (x) = X is itselfdense in X and thus certainly non-empty (as X 6= ∅).

An analogue of this theorem exists for one-sided topological transitivity, see[W].Note that theorem 7.1.2 clearly resembles theorem (1.6.1) and (2) impliesthat we can view topological transitivity as an analogue of ergodicity (insome sense). This is also reflected in the following (partial) analogue oftheorem (1.7.1).

Theorem 7.1.3 Let T : X −→ X be continuous and one-sided topologicallytransitive or a topologically transitive homeomorphism. Then every continu-ous T -invariant function is constant.

Proof

Let f : X −→ Y be continuous on X and suppose f T = f . Thenf T n = f , so f is constant on (forward) orbits of points. Let x0 ∈ X have adense (forward) orbit in X and suppose f(x0) = c for some c ∈ Y . Fix ε > 0and let x ∈ X be arbitrary. By continuity of f at x, we can find a δ > 0 suchthat d(f(x), f(x)) < ε for any x ∈ X with d(x, x) < δ. But then, since x0

has a dense (forward) orbit in X, there is some n ∈ Z (n ∈ Z≥0) such thatd(x, T n(x0)) < δ. Hence,

d(f(x), c) = d(f(x), f(T n(x0))) < ε

Since ε > 0 was arbitrary, our proof is complete.

Exercise 7.1.1 Let X be a compact metric space with metric d and letT : X −→ X is a topologically transitive homeomorphism. Show that if T isan isometry (i.e. d(T (x), T (y)) = d(x, y), for all x, y ∈ X) then T is minimal.

Definition 7.1.3 Let X be a topological space, T : X −→ X a transforma-tion and x ∈ X. Then x is called a periodic point of T of period n (n ∈ Z>0)if T n(x) = x and Tm(x) 6= x for m < n. A periodic point of period 1 is calleda fixed point.

Page 106: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

106 Basic Notions

Example 7.1.3 The map T : R −→ R, defined by T (x) = −x has one fixedpoint, namely x = 0. All other points in the domain are periodic points ofperiod 2.

Definition 7.1.4 Let X be a compact metric space and T : X −→ X ahomeomorphism. T is said to be expansive if there exist a δ > 0 such that: ifx 6= y then there exists an n ∈ Z such that d(T n(x), T n(y)) > δ. δ is calledan expansive constant for T .

Example 7.1.4 Let X be a finite space with the discrete topology. X is acompact metric space, since the discrete metric ddisc induces the topology onX. Now, any bijective map T : X −→ X is an expansive homeomorphismand any 0 < δ < 1 is an expansive constant for T .

For a non-trivial example of an expansive homeomorphism, see exercise 7.3.1.Expansiveness is closely related to the concept of a generator.

Definition 7.1.5 Let X be a compact metric space and T : X −→ X ahomeomorphism. A finite open cover α of X is called a generator for T ifthe set ∩∞n=−∞T

−nAn contains at most one point of X, for any collectionAn∞n=−∞, Ai ∈ α.

Theorem 7.1.4 Let X be a compact metric space and T : X −→ X ahomeomorphism. Then T is expansive if and only if T has a generator.

Proof

Suppose that T is expansive and let δ > 0 be an expansive constant for T .Let α be the open cover of X defined by α = B(a, δ

2)|a ∈ X. By compact-

ness, there exists a finite subcover β of α. Suppose that x, y ∈ ∩∞n=−∞T−nBn,

Bi ∈ β for all i. Then d(Tm(x), Tm(y)) ≤ δ, for all m ∈ Z. But by expan-siveness, there exists an l ∈ Z such that d(T l(x), T l(y)) > δ, a contradiction.Hence, x = y. We conclude that β is a generator for T .Conversely, suppose that α is a generator for T . By theorem 7.1.1, thereexists a Lebesgue number δ > 0 for the open cover α. We claim thatδ/4 is an expansive constant for T . Indeed, if x, y ∈ X are such thatd(Tm(x), Tm(y)) ≤ δ/4, for all m ∈ Z. Then, for all m ∈ Z, the closedball B(Tm(x), δ/4) contains Tm(y) and has diameter δ/2 < δ. Hence,

B(Tm(x), δ/4) ⊂ Am,

Page 107: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

107

for some Am ∈ α. It follows that x, y ∈ ∩∞m=−∞T−mAm.But α is a generator,

so ∩∞m=−∞T−mAm contains at most one point. We conclude that x = y, T is

expansive.

Exercise 7.1.2 In the literature, our definition of a generator is sometimescalled a weak generator. A generator is then defined by replacing ∩∞n=−∞T

−nAnby ∩∞n=−∞T

−nAn in definition 7.1.5. Show that in a compact metric spaceboth concepts are in fact equivalent.

Exercise 7.1.3 Prove the following basic properties of an expansive home-omorphism T : X −→ X:

a. T is expansive if and only if T n is expansive (n 6= 0)

b. If A is a closed subset of X and T (A) = A, then the restriction of T toA, T |A, is expansive

c. Suppose S : Y −→ Y is an expansive homeomorphism. Then the productmap T ×S : X × Y −→ X × Y is expansive with respect to the metricd = maxdX , dY .

Definition 7.1.6 Let X, Y be compact spaces and let T : X −→ X, S :Y −→ Y be homeomorphisms. T is said to be topologically conjugate to S ifthere exists a homeomorphism φ : X −→ Y such that φ T = S φ. φ iscalled a (topological) conjugacy.

Note that topological conjugacy defines an equivalence relation on the spaceof all homeomorphisms.The term ‘topological conjugacy’ is, in a sense, a misnomer. The followingtheorem shows that topological conjugacy can be considered as the counter-part of a measure preserving isomorphism.

Theorem 7.1.5 Let X, Y be compact spaces and let T : X −→ X, S :Y −→ Y be topologically conjugate homeomorphisms. Then,

1. T is topologically transitive if and only if S is topologically transitive

2. T is minimal if and only if S is minimal

3. T is expansive if and only if S is expansive

Page 108: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

108 Two Definitions

Proof

The proof of the first two statements is trivial. For the third statementwe note that

∞⋂n=−∞

T−n(φ−1An) =∞⋂

n=−∞

φ−1 S−n(An) = φ−1(∞⋂

n=−∞

S−nAn)

The lefthandside of the equation contains at most one point if and only ifthe righthandside does, so a finite open cover α is a generator for S if andonly if φ−1α = φ−1A|A ∈ α is a generator for T . Hence the result followsfrom theorem 7.1.4.

7.2 Topological Entropy

Topological entropy was first introduced as an analogue of the succesful con-cept of measure theoretic entropy. It will turn out to be a conjugacy invariantand is therefore a useful tool for distinguishing between topological dynamicalsystems. The definition of topological entropy comes in two flavors. The firstis in terms of open covers and is very similar to the definition of measure the-oretic entropy in terms of partitions. The second (and chronologically later)definition uses (n,ε) separating and spanning sets. Interestingly, the latterdefinition was a topological dynamical discovery, which was later engineeredinto a similar definition of measure theoretic entropy.

7.2.1 Two Definitions

We will start with a definition of topological entropy similar to the defini-tion of measure theoretic entropy introduced earlier. To spark the reader’srecognition of the similarities between the two, we use analogous notationand terminology. Before kicking off, let us first make a remark on the as-sumptions about our space X. The first definition presented will only requireX to be compact. The second, on the other hand, only requires X to be ametric space. We will obscure from these subtleties and simply stick to thecontext of a compact metric space, in which the two definitions will turn outto be equivalent. Nevertheless, the reader should be aware that both defini-tions represent generalizations of topological entropy in different directions

Page 109: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

109

and are therefore interesting in their own right.Let X be a compact metric space and let α, β be open covers of X. We saythat β is a refinement of α, and write α ≤ β, if for every B ∈ β there is anA ∈ α such that B ⊂ A. The common refinement of α and β is defined tobe α ∨ β = A ∩ B|A ∈ α,B ∈ β. For a finite collection αini=1 of opencovers we may define

∨ni=1 αi = ∩ni=1Aji|Aji ∈ αi. For a continuous trans-

formation T : X −→ X we define T−1α = T−1A|A ∈ α. Note that theseare all again open covers of X. Finally, we define the diameter of an opencover α as diam(α) := supA∈α diam(A), where diam(A) = supx,y∈A d(x, y).

Exercise 7.2.1 Show that T−1(α ∨ β) = T−1(α) ∨ T−1(β) and show thatα ≤ β implies T−1α ≤ T−1β.

Definition 7.2.1 Let α be an open cover of X and let N(α) be the numberof sets in a finite subcover of α of minimal cardinality. We define the entropyof α to be H(α) = log(N(α)).

The following proposition summarizes some easy properties of H(α). Theproof is left as an exercise.

Proposition 7.2.1 Let α be an open cover of X and let H(α) be the entropyof α. Then

1. H(α) ≥ 0

2. H(α) = 0 if and only if N(α) = 1 if and only if X ∈ α

3. If α ≤ β, then H(α) ≤ H(β)

4. H(α ∨ β) ≤ H(α) +H(β)

5. For T : X −→ X continuous we have H(T−1α) ≤ H(α). If T issurjective then H(T−1α) = H(α).

Exercise 7.2.2 Prove proposition 7.2.1 (Hint: If T is surjective then T (T−1A) =A).

We will now move on to the definition of topological entropy for a contin-uous transformation with respect to an open cover and subsequently makethis definition independent of open covers.

Page 110: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

110 Two Definitions

Definition 7.2.2 Let α be an open cover of X and let T : X −→ X becontinuous. We define the topological entropy of T with respect to α to be:

h(T, α) = limn→∞

1

nH(

n−1∨i=0

T−iα)

We must show that h(T, α) is well-defined, i.e. that the right hand side exists.

Theorem 7.2.1 limn→∞1nH(∨n−1i=1 T

−iα) exists.

Proof

Define

an = H(n−1∨i=0

T−iα)

Then, by proposition (4.2.2),, it suffices to show that an is subadditive.Now, by (4) of proposition 7.2.1 and exercise 7.2.1

an+p = H(

n+p−1∨i=0

T−iα)

≤ H(n−1∨i=0

T−iα) +H(T−np−1∨j=0

T−jα)

≤ an + ap

This completes our proof.

Proposition 7.2.2 h(T, α) satisfies the following:

1. h(T, α) ≥ 0

2. If α ≤ β, then h(T, α) ≤ h(T, β)

3. h(T, α) ≤ H(α)

Page 111: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

111

Proof

These are easy consequences of proposition 7.2.1. We will only prove thethird statement. We have:

H(n−1∨i=0

T−iα) ≤n−1∑i=0

H(T−iα)

≤ nH(α)

Here we have subsequently used (4) and (5) of the proposition.

Finally, we arrive at our sought definition:

Definition 7.2.3 (I) Let T : X −→ X be continuous. We define the topo-logical entropy of T to be:

h1(T ) = supαh(T, α)

where the supremum is taken over all open covers of X.

We will defer a discussion of the properties of topological entropy till the endof this chapter.

Let us now turn to the second approach to defining topological entropy.This approach was first taken by E.I. Dinaburg.We shall first define the main ingredients. Let d be the metric on the com-pact metric space X and for each n ∈ Z≥0 define a new metric by:

dn(x, y) = max0≤i≤n−1

d(T i(x), T i(y))

Definition 7.2.4 Let n ∈ Z≥0, ε > 0 and A ⊂ X. A is called (n, ε)-spanning for X if for all x ∈ X there exists a y ∈ A such that dn(x, y) < ε.Define span(n, ε, T ) to be the minimum cardinality of an (n, ε)-spanning set.A is called (n, ε)-separated if for any x, y ∈ A dn(x, y) > ε. We definesep(n, ε, T ) to be the maximum cardinality of an (n, ε)-separated set.

Page 112: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

112 Two Definitions

Figure 7.1: An (n, ε)-separated set (left) and (n, ε)-spanning set (right). Thedotted circles represent open balls of dn-radius ε

2(left) and of dn-radius ε

(right), respectively.

Note that we can extend this definition by letting A ⊂ K, where K isa compact subset of X. This generalization to the context of non-compactmetric spaces was devised by R.E. Bowen. See [W] for an outline of thisextended framework.We can also formulate the above in terms of open balls. If B(x, r) = y ∈X|d(x, y) < r is an open ball in the metric d, then the open ball with centrex and radius r in the metric dn is given by:

Bn(x, r) :=n−1⋂i=0

T−iB(T i(x), r)

Hence, A is (n, ε)-spanning for X if:

X =⋃a∈A

Bn(a, ε)

and A is (n, ε)-separated if:

(A− a) ∩Bn(a, ε) = ∅ for all a ∈ A

Definition 7.2.5 For n ∈ Z≥0 and ε > 0 we define cov(n, ε, T ) to be theminimum cardinality of a covering of X by open sets of dn-diameter lessthan ε.

The following theorem shows that the above notions are actually two sidesof the same coin.

Page 113: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

113

Theorem 7.2.2 cov(n, 3ε, T ) ≤ span(n, ε, T ) ≤ sep(n, ε, T ) ≤ cov(n, ε, T ).Furthermore, sep(n, ε, T ), span(n, ε, T ) and cov(n, ε, T ) are finite, for all n,ε > 0 and continuous T .

Proof

We will only prove the last two inequalities. The first is left as an exerciseto the reader. Let A be an (n, ε)-separated set of cardinality sep(n, ε, T ).Suppose A is not (n, ε)-spanning for X. Then there is some x ∈ X suchthat dn(x, a) ≥ ε, for all a ∈ A. But then A ∪ x is an (n, ε)-separatedset of cardinality larger than sep(n, ε, T ). This contradiction shows that Ais an (n, ε)-spanning set for X. The second inequality now follows since thecardinality of A is at least as large as span(n, ε, T ).To prove the third inequality, let A be an (n, ε)-separated set of cardinal-ity sep(n, ε, T ). Note that if α is an open cover of dn-diameter less than ε,then no element of α can contain more than 1 element of A. This holds inparticular for an open cover of minimal cardinality, so the third inequality isproved.The final statement follows for span(n, ε, T ) and cov(n, ε, T ) from the com-pactness of X and subsequently for sep(n, ε, T ) by the last inequality.

Exercise 7.2.3 Finish the proof of the above theorem by proving the firstinequality.

Lemma 7.2.1 The limit cov(ε, T ) := limn→∞1n

log(cov(n, ε, T )) exists andis finite.

Proof

Fix ε > 0. Let α, β be open covers of X consisting of sets with dm-diameter and dn-diameter, smaller than ε and with cardinalities cov(m, ε, T )and cov(n, ε, T ), respectively. Pick any A ∈ α and B ∈ β. Then, forx, y ∈ A ∩ T−mB,

dm+n(x, y) = max0≤i≤m+n−1

d(T i(x), T i(y))

≤ max max0≤i≤m−1

d(T i(x), T i(y)), maxm≤j≤m+n−1

d(T j(x), T j(y))< ε

Page 114: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

114 Equivalence of the two Definitions

Thus, A∩T−mB has dm+n-diameter less than ε and the sets A∩T−mB|A ∈α,B ∈ β form a cover of X. Hence,

cov(m+ n, ε, T ) ≤ cov(m, ε, T )· cov(n, ε, T )

Now, since log is an increasing function, the sequence an∞n=1 defined byan = log cov(n, ε, T ) is subadditive, so the result follows from proposition 5.

Motivated by the above lemma and theorem, we have the following definition

Definition 7.2.6 (II) The topological entropy of a continuous transforma-tion T : X −→ X is given by:

h2(T ) = limε↓0

limn→∞

1

nlog(cov(n, ε, T ))

= limε↓0

limn→∞

1

nlog(span(n, ε, T ))

= limε↓0

limn→∞

1

nlog(sep(n, ε, T ))

Note that there seems to be a concealed ambiguity in this definition, sinceh2(T ) still depends on the chosen metric d. As we will see later on, incompact metric spaces h2(T ) is independent of d, as long as it induces thetopology of X. However, there is definitely an issue at this point if we dropour assumption of compactness of the space X.

7.2.2 Equivalence of the two Definitions

We will now show that definitions (I) and (II) of topological entropy coincideon compact metric spaces. This will justify the use of the notation h(T ) forboth definitions of topological entropy.In the meantime, we will continue to write h1(T ) for topological entropy inthe definition using open covers and h2(T ) for the second definition of entropy.After all the work done in the last section, our only missing ingredient is thefollowing lemma:

Page 115: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

115

Lemma 7.2.2 Let X be a compact metric space. Suppose αn∞n=1 is a se-quence of open covers of X such that limn→∞ diam(αn) = 0, where diam isthe diameter under the metric d. Then limn→∞ h1(T, αn) = h1(T ).

Proof

We will only prove the lemma for the case h1(T ) < ∞. Let ε > 0 be arbi-trary and let β be an open cover of X such that h1(T, β) > h1(T ) − ε. Bytheorem 7.1.1 there exists a Lebesgue number δ > 0 for β. By assumption,there exists an N > 0 such that for n ≥ N we have diam(αn) < δ. So, forsuch n, we find that for any A ∈ αn there exists a B ∈ β such that A ⊂ B,i.e. β ≤ αn. Hence, by proposition 7.2.2 and the above,

h1(T )− ε < h1(T, β) ≤ h1(T, αn) ≤ h1(T )

for all n ≥ N .

Exercise 7.2.4 Finish the proof of lemma 7.2.2. That is, show that ifh1(T ) = ∞, then limn→∞ h1(T, αn) = ∞.

Theorem 7.2.3 For a continuous transformation T : X −→ X of a compactmetric space X, definitions (I) and (II) of topological entropy coincide.

Proof

Step 1: h2(T ) ≤ h1(T ).For any n, let αk∞k=1 be the sequence of open covers of X, defined byαk = Bn(x,

13k

)|x ∈ X. Then, clearly, the dn-diameter of αk is smaller than1k. Now,

∨n−1i=0 T

−iαk is an open cover of X by sets of dn-diameter smaller

than 1k, hence cov(n, 1

k, T ) ≤ N(

∨n−1i=0 T

−iαk). Since limk→∞ diam(αk) = 0,by lemma 7.2.2, we can subsequently take the log, divide by n, take the limitfor n→∞ and the limit for k →∞ to obtain the desired result.Step 2: h1(T ) ≤ h2(T ).Let βk∞k=1 be defined by βk = B(x, 1

k)|x ∈ X. Note that δk = 2

kis a

Lebesgue number for βk, for each k. Let A be an (n, δk2)-spanning set for X

of cardinality span(n, δk2, T ). Then, for each a ∈ A the ball B(T i(a), δk

2) is

contained in a member of βk (for all 0 ≤ i < n), hence Bn(Ti(a), δk

2) is con-

tained in a member of∨n−1i=0 T

−iβk. Thus, N(∨n−1i=0 T

−iβk) ≤ span(n, 14k, T ).

Proceeding in the same way as in step 1, we obtain the desired inequality.

Page 116: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

116 Equivalence of the two Definitions

Properties of Entropy

We will now list some elementary properties of topological entropy.

Theorem 7.2.4 Let X be a compact metric space and let T : X −→ X becontinuous. Then h(T ) satisfies the following properties:

1. If the metrics d and d both generate the topology of X, then h(T ) is thesame under both metrics

2. h(T ) is a conjugacy invariant

3. h(T n) = n·h(T ), for all n ∈ Z>0

4. If T is a homeomorphism, then h(T−1) = h(T ). In this case, h(T n) =|n|·h(T ), for all n ∈ Z

Proof

(1): Let (X, d) and (X, d) denote the two metric spaces. Since both inducethe same topology, the identity maps i : (X, d) −→ (X, d) and j : (X, d) −→(X, d) are continuous and hence by theorem 7.1.1 uniformly continuous. Fixε1 > 0. Then, by uniform continuity, we can subsequently find an ε2 > 0 andε3 > 0 such that for all x, y ∈ X:

d(x, y) < ε1 ⇒ d(x, y) < ε2

d(x, y) < ε2 ⇒ d(x, y) < ε3

Let A be an (n, ε1)-spanning set for T under d. Then A is also (n, ε2)-spanning for T under d. Analogously, every (n, ε2)-spanning set for T underd is (n, ε3)-spanning for T under d. Therefore,

span(n, ε3, T, d) ≤ span(n, ε2, T, d) ≤ span(n, ε1, T, d)

where the fourth argument emphasizes the metric. By subsequently takingthe log, dividing by n, taking the limit for n → ∞ and the limit for ε1 ↓ 0(so that ε2, ε3 ↓ 0) in these inequalities, we obtain the desired result.(2): Suppose that S : Y −→ Y is topologically conjugate to T , where Y is a

Page 117: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

117

compact metric space, with conjugacy ψ : Y −→ X. Let dX be the metric onX. Then the map dY defined by dY (x, y) = dX(ψ(x), ψ(y)) defines a metricon Y which induces the topology of Y . Indeed, let BdY

(y0, η) be a basiselement of the topology of Y . Then ψ(BdY

(y0, η)) is open in X, as ψ is anopen map. Hence, there is an open ball BdX

(ψ(y0), η) ⊂ ψ(BdY(y0, η)), since

the open balls are a basis for the topology of X. Now, ψ−1(BdX(ψ(y0), η)) ⊂

BdY(y0, η) and

y ∈ ψ−1(BdX(ψ(y0), η)) ⇔ dY (y0, y) = dX(ψ(y), ψ(y0)) < η

hence, BdY(y0, η) ⊂ BdY

(y0, η). Analogously, for any y0 ∈ Y and δ > 0, thereis a δ > 0 such that

BdY(y0, δ) ⊂ BdY

(y0, δ)

Thus, dY induces the topology of Y .Now, for x1, x2 ∈ X, we have:

dX(T (x1), T (x2)) = dX(T ψ(y1), T ψ(y2))

= dX(ψ S(y1), ψ S(y2))

= dY (S(y1), S(y2))

Where we have used that x1 = ψ(y1), x2 = ψ(y2) for some y1, y2 ∈ Y , sinceψ is a bijection. Thus, we see that the n−diameters (in the metrics dX anddY ) remain the same under ψ. Hence, cov(n, ε, T, dX) = cov(n, ε, S, dY ), forall ε > 0 and n ∈ Z≥0. By (1), h(S) is the same under dY and dY , so it noweasily follows that h(S) = h(T ).(3): Observe that

max1≤i≤m−1

d(T ni(x), T ni(y)) ≤ max1≤j≤nm−1

d(T j(x), T j(y))

therefore, span(m, ε, T n) ≤ span(nm, ε, T ). This implies 1mspan(m, ε, T n) ≤

nnmspan(nm, ε, T ), thus h(T n) ≤ nh(T ).

Fix ε > 0. Then, by uniform continuity of T i on X (c.f. theorem 7.1.1),we can find a δ > 0 such that d(T i(x), T i(y)) < ε for all x, y ∈ X satisfyingd(x, y) < δ and i = 0, . . . , n − 1. Hence, for x, y ∈ X satisfying d(x, y) < δ,we find dn(x, y) < ε. Let A be an (m, δ)-spanning set for T n. Then, for allx ∈ X there exists some a ∈ A such that

max0≤i≤m−1

d(T ni(x), T ni(a)) < δ

Page 118: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

118 Equivalence of the two Definitions

But then, by the above,

max0≤k≤n−1

max0≤i≤m−1

d(T ni+k(x), T ni+k(a)) = max0≤j≤nm−1

d(T j(x), T j(a)) < ε

Thus, span(mn, ε, T ) ≤ span(m, δ, T n) and it follows that h(T n) ≥ nh(T )(4): Let A be an (n, ε)-separated set for T . Then, for any x, y ∈ A dn(x, y) >ε. But then,

max0≤i≤n−1

d(T−i(T n−1(x)), T−i(T n−1(y))) = max0≤i≤n−1

d(T i(x), T i(y)) > ε

so T n−1A is an (n, ε)-separated set for T−1 of the same cardinality. Con-versely, every (n, ε)-separated set B for T−1 gives the (n, ε)-separated setT−(n−1)B for T . Thus, sep(n, ε, T ) = sep(n, ε, T−1) and the first statementreadily follows. The second statement is a consequence of (3).

Exercise 7.2.5 Show that assertion (4) of theorem 7.2.4, i.e. h(T−1) =h(T ), still holds if X is merely assumed to be compact.

Exercise 7.2.6 Let (X, dX), (Y, dY ) be compact metric spaces and T :X −→ X, S : Y −→ Y continuous. Show that h(T × S) = h(T ) + h(S).(Hint: first show that the metric d = maxdX , dY induces the producttopology on X × Y ).

7.3 Examples

In this section we provide a few detailed examples to get some familiaritywith the material discussed in this chapter. We derive some simple propertiesof the dynamical systems presented below, but leave most of the good stufffor you to discover in the exercises. After all, the proof of the pudding is inthe eating!

Example. (The Quadratic Map) Consider the following biological popu-lation model. Suppose we are studying a colony of penguins and wish tomodel the evolution of the population through time. We presume that thereis a certain limiting population level, P ∗, which cannot be exceeded. When-ever the population level is low, it is supposed to increase, since there is

Page 119: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

119

plenty of fish to go around. If, on the other hand, population is near the up-per limit level P ∗, it is expected to decrease as a result of overcrowding. Wearrive at the following simple model for the population at time n, denotedby Pn:

Pn+1 = λPn(P∗ − Pn)

Where λ is a ‘speed’ parameter. If we now set P ∗ = 1, we can think of Pnas the population at time n as a percentage of the limiting population level.Then, writing x = P0, the population dynamics are described by iteration ofthe following map:

Definition 7.3.1 The Quadratic Map with parameter λ, Qλ : R −→ R, isdefined to be

Qλ(x) = λx(1− x)

Of course, in the context of the biological model formulated above, our atten-tion is restricted to the interval [0, 1] since the population percentage cannotlie outside this interval.The quadratic map is deceivingly simple, as it in fact displays a wide rangeof interesting dynamics, such as periodicity, topological transitivity, bifurca-tions and chaos.

Let us take a closer look at the case λ > 4. For x ∈ (−∞, 0) ∪ (1,∞)it can be shown that Qn

λ(x) → −∞ as n → ∞. The interesting dynam-ics occur in the interval [0, 1]. Figure 7.2 shows a phase portrait for Qλ

on [0, 1], which is nothing more than the graph of Qλ together with thegraph of the line y = x. In a phase portrait the trajectory of a pointunder iteration of the map Qλ is easily visualized by iteratively project-ing from the line y = x to the graph and vice versa. From our phaseportrait for Qλ it is immediately visible that there is a ‘leak’ in our in-terval, i.e. the subset A1 = x ∈ [0, 1]|Qλ(x) /∈ [0, 1] is non-empty. DefineAn = x ∈ [0, 1]|Qi

λ(x) ∈ [0, 1] for 0 ≤ i ≤ n − 1, Qnλ(x) /∈ [0, 1] and set

C = [0, 1]−∪∞n=1An. This procedure is reminiscent of the construction of the‘classical’ Cantor set, by removing middle-thirds in subsequent steps. In theexercises below it is shown that C is indeed a Cantor set.

Page 120: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

120 Equivalence of the two Definitions

Figure 7.2: Phase portrait of the quadratic map Qλ for λ > 4 on [0, 1]. Thearrows indicate the (partial) trajectory of a point in [0, 1]. The dotted linesdivide [0, 1] in three parts: I0, A1 and I1 (from left to right).

Example 7.3.1 . (Circle Rotations) Let S1 denote the unit circle in C. Wedefine the rotation map Rθ : S1 −→ S1 with parameter θ by Rθ(z) = eiθz =ei(θ+ω), where θ ∈ [0, 2π) and z = eiω for some ω ∈ [0, 2π). It is easily seenthat Rθ is a homeomorphism for every θ. The rotation map is very similarto the earlier introduced shift map Tθ. This is not surprising, considering thefact that the interval [0, 1] is homeomorphic to S1 after identification of theendpoints 0 and 1.

Proposition 7.3.1 Let θ ∈ [0, 2π) and Rθ : S1 −→ S1 be the rotation map.If θ is rational, say θ = a

b, then every z ∈ S1 is periodic with period b. Rθ is

minimal if and only if θ is irrational.

Proof

The first statement follows trivially from the fact that e2niπ = 1 for n ∈ Z.

Page 121: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

121

Suppose that θ is irrational. Fix ε > 0 and z ∈ S1, so z = eiω for someω ∈ [0, 2π). Then

Rnθ (z) = Rm

θ (z) ⇔ ei(ω+nθ) = ei(ω+mθ)

⇔ ei(n−m)θ = 1

⇔ (n−m)θ ∈ Z

Thus, the points Rnθ (z)|n ∈ Z are all distinct and it follows that Rn

θ (z)∞n=1

is an infinite sequence. By compactness, this sequence has a convergentsubsequence, which is Cauchy. Therefore, we can find integers n > m suchthat

d(Rnθ (z), R

mθ (z)) < ε

where d is the arc length distance function. Now, since Rθ is distance preserv-ing with respect to this metric, we can set l = n−m to obtain d(Rl

θ(z), z) < ε.Also, by continuity of Rl

θ, Rlθ maps the connected, closed arc from z to Rl

θ(z)onto the connected closed arc from Rl

θ(z) to R2lθ (z) and this one onto the arc

connecting R2lθ (z) to R3l

θ (z), etc. Since these arcs have positive and equallength, they cover S1. The result now follows, since the arcs have lengthsmaller than ε, and ε > 0 was arbitrary.

As mentioned in the proof, Rθ is an isometry with respect to the arc lengthdistance function and this metric induces the topology on S1. Hence, wesee that span(n, ε, Rθ) = span(1, ε, Rθ) for all n ∈ Z≥0 and it follows thath(Rθ) = 0, for any θ ∈ [0, 2π).

Example 7.3.2 . (Bernoulli Shift Revisited) In this example we will rein-troduce the Bernoulli shift in a topological context. Let Xk = 0, 1, . , k−1Z

(or X+k = 0, . , k− 1N∪0) and recall that the left shift on Xk is defined by

L : Xk −→ Xk, L(x) = y, yn = xn+1. We can define a topology on Xk (orX+k ) which makes the shift map continuous (in fact, a homeomorphism in

the case of Xk) as follows: we equip 0, . . . , k−1 with the discrete topologyand subsequently give Xk the corresponding product topology. It is easy tosee that the cylinder sets form a basis for this topology. By the Tychonofftheorem (see e.g. [Mu]), which asserts that an arbitrary product of compact

Page 122: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

122 Equivalence of the two Definitions

spaces is compact in the product topology, our space Xk is compact. In theexercises you are asked to show that the metric

d(x, y) = 2−l if l = min|j| : xj 6= yj

induces the product topology, i.e. the open balls B(x, r) with respect to thismetric form a basis for the product topology. Here we set d(x, y) = 0 ifx = y, this corresponds to the case l = ∞.We will end this example by calculating the topological entropy of the fullshift L.

Proposition 7.3.2 The topological entropy of the full shift map is equal toh(L) = log(k).

Proof

We will prove the proposition for the left shift onX+k , the case for the left shift

on Xk is similar. Fix 0 < ε < 1 and pick any x = xi∞i=0, y = yi∞i=0 ∈ X+k .

Notice that if at least one of the first n symbols in the itineraries of x and ydiffer, then

dn(x, y) = max0≤i≤n−1

d(T i(x), T i(y)) = 1 > ε

where we have used the metric d on X+k defined above. Thus, the set An

consisting of sequences a ∈ X+k such that ai ∈ 0, . . . , k−1 for 0 ≤ i ≤ n−1

and aj = 0 for j ≥ n is (n, ε)-separated. Explicitly, a ∈ An is of the form

a0a1a2 . . . an−1000 . . .

Since there are kn possibilities to choose the first n symbols of an itinerary,we obtain sep(n, ε, L) ≥ kn. Hence,

h(T ) = limε↓0

limn→∞

1

nlog(sep(n, ε, L)) ≥ log(k)

To prove the reverse inequality, take l ∈ Z>0 such that 2−l < ε. Then An+l

is an (n, ε, L)-spanning set, since for every x ∈ X+k there is some a ∈ An+l

for which the first n+ l symbols coincide. In other words, d(x, a) < 2−l < ε.Therefore,

h(T ) = limε↓0

limn→∞

1

nlog(span(n, ε, L)) ≤ lim

ε↓0limn→∞

n+ l

nlog(k) = log(k)

and our proof is complete.

Page 123: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

123

Figure 7.3: Phase portrait of the teepee map T on [0, 1].

Example 7.3.3 . (Symbolic Dynamics) The Bernoulli shift on the bise-quence space Xk is a topological dynamical system whose dynamics are quitewell understood. The subject of symbolic dynamics is devoted to using thisknowledge to unravel the dynamical properties of more difficult systems. Thescheme works as follows. Let T : X −→ X be a topological dynamical sys-tem and let α = A0, . . . , Ak−1 be a partition of X such that Ai ∩ Aj = ∅for i 6= j (Note the difference with the earlier introduced notion of a parti-tion). We assume that T is invertible. Now, for x ∈ X, we let φi(x) be theindex of the element of α which contains T i(x), i ∈ Z. This defines a mapφ : X −→ Xk, called the Itinerary map generated by T on α,

φ(x) = φi(x)∞i=−∞

The image of x under φ is called the itinerary of x. Note that φ satisfiesφ T = L φ. Of course, if T is not invertible, we may apply the sameprocedure by replacing Xk by X+

k .We will now apply the above procedure to determine the number of pe-

riodic points of the teepee (or tent) map T : [0, 1] −→ [0, 1], defined by (seefigure 7.3)

T (x) =

2x if 0 ≤ x ≤ 1

2

2(1− x) if 12≤ x ≤ 1

Let I0 = [0, 12], I1 = [1

2, 1]. We would like to define an itinerary map

Page 124: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

124 Equivalence of the two Definitions

ψ : [0, 1] −→ X+2 generated on I0, I1. Unfortunately, the fact that I0 ∩

I1 6= ∅ causes some complications. Indeed, we have an ambiguous choice ofthe itinerary for x = 1

2, since it can be described by either (01000 . . .) or

(11000 . . .), or any other point in [0, 1] which will end up in x = 12

uponiteration of T . We will therefore identify any pair of sequences of the form(a0a1 . . . al01000 . . .) and (a0a1 . . . al11000 . . .) and replace the two sequencesby a single equivalence class. The resulting space will be denoted by X+

2 andthe resulting itinerary map again by ψ : [0, 1] −→ X+

2 .We will first show that ψ is injective. Suppose that there are x, y ∈ [0, 1]with ψ(x) = ψ(y). Then T n(x) and T n(y) lie on the same side of x = 1

2for

all n ∈ Z≥0. Suppose that x 6= y. Then, for all n ∈ Z≥0, Tn([x, y]) does

not contain the point x = 12. But then no rational numbers of the form p

2n ,p ∈ 1, 2, 3, . . . , 2n − 1, are in [x, y] for all n ∈ Z>0. Since these form adense set in [0, 1], we obtain a contradiction. We conclude that x = y, ψ isone-to-one.Now let us show that ψ is also surjective. Let a = ai∞i=0 ∈ X+

2 . If a is oneof the aforementioned equivalence classes, we use the ‘0’ element to representa. Now, define

An = x ∈ [0, 1]|x ∈ Ia0 , T (x) ∈ Ia1 , . . . , Tn(x) ∈ Ian

= Ia0 ∩ T−1Ia1 ∩ . . . ∩ T−nIan

Since T is continuous, T−iIaiis closed for all i and therefore An is closed for

n ∈ Z≥0. As [0, 1] is a compact metric space, it follows from theorem 7.1.1that An is compact. Moreover, An∞n=0 forms a decreasing sequence, in thesense that An+1 ⊂ An. Therefore, the intersection ∩∞n=0An is not empty anda is precisely the itinerary of x ∈ ∩∞n=0An. We conclude that ψ is onto.Now, since ψ T = L ψ, with L : X+

2 −→ X+2 the (induced) left shift, we

have found a bijection between the periodic points of L on X+2 and those

of T on [0, 1]. The periodic points for T are quite difficult to find, but theperiodic points of L are easily identified. First observe that there are noperiodic points in the earlier defined equivalence classes in X+

2 since thesepoints are eventually mapped to 0. Now, any periodic point a of L of periodn in X+

2 must have an itinerary of the form

(a0a1 . . . an−1a0a1 . . . an−1a0a1 . . .)

Hence, T has 2n periodic points of period n in [0, 1].

Page 125: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

125

Exercise 7.3.1 In this exercise you are asked to prove some properties ofthe shift map. Let Xk = 0, 1, . , k − 1Z and X+

k = 0, 1, . , k − 1N∪0.

a. Show that the map d : Xk ×Xk −→ Xk, defined by

d(x, y) = 2−l if l = min|j| : xj 6= yj

is a metric on Xk that induces the product topology on Xk.

b. Show that the map d∗ : Xk ×Xk −→ Xk, defined by

d∗(x, y) =∞∑

n=−∞

|xn − yn|2|n|

is a metric on Xk that induces the product topology on Xk.

c. Show that the one-sided shift on X+k is continuous. Show that the two-

sided shift on Xk is a homeomorphism.

d. Show that the two-sided shift L : Xk −→ Xk is expansive.

Exercise 7.3.2 Let Qλ be the quadratic map and let C,A1 be as in the firstexample. Set [0, 1] − A1 = I0 ∪ I1, where I0 is the interval left of A1 and I1the interval to the right. We assume that λ > 2+

√5, so that |Q′

λ(x)| > 1 forall x ∈ I0 ∪ I1 (check this!). Let X+

2 = 0, 1N and let φ : C −→ X+2 be the

itinerary map generated by Qλ on I0, I1. This exercise serves as anotherdemonstration of the usefulness of symbolic dynamics.

a. Show that φ is a homeomorphism.

b. Show that φ is a topological conjugacy between Qλ and the one-sidedshift map on X+

2 .

c. Prove that Qλ has exactly 2n periodic points of period n in C. Show thatthe periodic points of Qλ are dense in C. Finally, prove that Qλ istopologically transitive on C.

Exercise 7.3.3 Let Qλ be the quadratic map and let C be as in the firstexample. This exercise shows that C is a Cantor set, i.e. a closed, totallydisconnected and perfect subset of [0, 1]. As in the previous exercise, we willassume that λ > 2 +

√5.

Page 126: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

126 Equivalence of the two Definitions

a. Prove that C is closed.

b. Show that C is totally disconnected, i.e. the only connected subspaces ofC are one-point sets.

c. Demonstrate that C is perfect, i.e. every point is a limit point of C.

Page 127: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Chapter 8

The Variational Principle

In this chapter we will establish a powerful relationship between measuretheoretic and topological entropy, known as the Variational Principle. Itasserts that for a continuous transformation T of a compact metric space Xthe topological entropy is given by h(T ) = suphµ(T )|µ ∈ M(X,T ). Toprove this statement, we will proceed along the shortest and most popularroute to victory, provided by [M]1. The proof is at times quite technical andis therefore divided into several digestable pieces.

8.1 Main Theorem

For the first part of the proof of the Variational Principle we will only usesome properties of measure theoretic entropy. All the necessary ingredientsare listed in the following lemma:

Lemma 8.1.1 Let α, β, γ be finite partitions of X and T a measure preserv-ing transformation of the probability space (X,F , µ). Then,

1. Hµ(α) ≤ log(N(α)), where N(α) is the number of sets in the partitionof non-zero measure. Equality holds only if µ(A) = 1

N(α), for all A ∈ α

with µ(A) > 0

2. If α ≤ β, then hµ(T, α) ≤ hµ(T, β)

1Our exposition of Misiurewicz’s proof follows [W] to a certain extent

127

Page 128: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

128 Equivalence of the two Definitions

3. hµ(T, α) ≤ hµ(T, γ) +Hµ(α|γ)

4. hµ(Tn) = n · hµ(T ), for all n ∈ Z>0

Proofs

(1): This is a straighforward consequence of the concavity of the functionf : [0,∞) −→ R, defined by f(t) = −t log(t) (t > 0), f(0) = 0. In thisnotation, Hµ(α) =

∑A∈α f(µ(A)).

(2): α ≤ β implies that for every B ∈ β, there some A ∈ α such that B ⊂ Aand hence also T−iB ⊂ T−iA, i ∈ Z≥0. Therefore, for n ≥ 1

n−1∨i=0

T−iα ≤n−1∨i=0

T−iβ

The result now follows from proposition .(3): By proposition (4.2.1)(e) and (b), respectively,

H(n−1∨i=0

T−iα) ≤ H(n−1∨i=0

T−iγ ∨n−1∨i=0

T−iα)

= H(n−1∨i=0

T−iγ) +H(n−1∨i=0

T−iα|n−1∨i=0

T−iγ)

Now, by successively applying proposition (4.2.1) (d), (g) and finally usingthe fact that T is measure preserving, we get

H(n−1∨i=0

T−iα|n−1∨i=0

T−iγ) ≤n−1∑i=0

H(T−iα|n−1∨i=0

T−iγ)

≤n−1∑i=0

H(T−iα|T−iγ)

= nH(α|γ)

Combining the inequalities, dividing both sides by n and taking the limit forn→∞ gives the desired result.

Page 129: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Main Theorem 129

(4): Note that

h(T n,n−1∨i=0

T−iα) = limk→∞

1

kH(

k−1∨j=0

T−nj(n−1∨i=0

T−iα))

= limk→∞

n

knH(

kn−1∨i=0

T−iα)

= limm→∞

n

mH(

m−1∨i=0

T−iα)

= nh(T, α)

Notice that ∨n−1i=0 T

−iα | α a partition of X ⊂ β | β a partition of X.Hence,

nh(T ) = n supαh(T, α)

= supαh(T n,

n−1∨i=0

T−iα)

≤ supαh(T n, α)

= h(T n)

Finally, since α ≤∨n−1i=0 T

−iα, we obtain by (2),

h(T n, α) ≤ h(T n,n−1∨i=0

T−iα) = nh(T, α)

So h(T n) ≤ nh(T ). This concludes our proof.

Exercise 8.1.1 Use lemma 8.1.1 and proposition (4.2.1) to show that for aninvertible measure preserving transformation T on (X,F , µ):

h(T n) = |n|h(T ), for all n ∈ Z

We are now ready to prove the first part of the theorem.

Page 130: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

130 Equivalence of the two Definitions

Theorem 8.1.1 Let T : X −→ X be a continuous transformation of acompact metric space. Then h(T ) ≥ suphµ(T )|µ ∈M(X,T ).

Proof

Fix µ ∈ M(X,T ). Let α = A1, . . . , An be a finite partition of X. Pickε > 0 such that ε < 1

n logn. By theorem 16, µ is regular, so we can find closed

sets Bi ⊂ Ai such that µ(Ai−Bi) < ε, i = 1, . . . , n. Define B0 = X−∪ni=1Bi

and let β be the partition β = B0, . . . , Bn. Then, writing f(t) = −t log(t),

Hµ(α|β) =n∑i=0

n∑j=1

µ(Bi)f(µ(Bi ∩ Aj)µ(Bi)

)

= µ(B0)n∑j=1

f(µ(B0 ∩ Aj)µ(B0)

)

in the final step we used the fact that for i ≥ 1,

Bi ∩ Ai = Bi

Bi ∩ Aj = ∅ if i 6= j

We can define a σ-algebra on B0 by F ∩ B0 = F ∩ B0|F ∈ F and definethe conditional measure µ(·|B0) : F ∩B0 −→ [0, 1] by

µ(A|B0) =µ(A ∩B0)

µ(B0)

It is easy to check that (B0,F ∩ B0, µ(·|B0)) is a probability space andµ(·|B0) ∈M(B0, T |B0). Now, noting that α0 := A∩B0|A ∈ α is a partitionof B0 under µ(·|B0), we get

Hµ(α|β) = µ(B0)n∑j=1

f(µ(B0 ∩ Aj)µ(B0)

) = µ(B0)Hµ(·|B0)(α0)

Hence, we can apply (1) of lemma 8.1.1 and use the fact that

µ(B0) = µ(X − ∪ni=1Bi) = µ(∪ni=1Ai − ∪ni=1Bi) = µ(∪i=1(Ai −Bi)) < nε

Page 131: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Main Theorem 131

to obtainHµ(α|β) ≤ µ(B0) log(n) < nε log(n) < 1

Define for each i, i = 1, . . . , n, the open set Ci by Ci = B0 ∪ Bi = X −∪j 6=iBj. Then we can define an open cover of X by γ = C1, . . . , Cn.By (1) of lemma 8.1.1 we have for m ≥ 1, in the notation of the lemma,Hµ(

∨m−1i=0 T−iβ) ≤ log(N(

∨m−1i=0 T−iβ)). Note that γ is not necessarily a par-

tition, but by the proof of (1) of lemma 8.1.1, we still have Hµ(∨m−1i=0 T−iβ) ≤

log(2mN(∨m−1i=0 T−iγ)), since β contains (at most) one more set of non-zero

measure.Thus, it follows that

hµ(T, β) ≤ h(T, γ) + log 2 ≤ h(T ) + log 2

and by (3) of lemma 8.1.1 we get obtain

hµ(T, α) ≤ hµ(T, β) +Hµ(α|β) ≤ h(T ) + log 2 + 1

Note that if µ ∈M(X,T ), then also µ ∈M(X,Tm), so the above inequalityholds for Tm as well. Applying (4) of lemma 8.1.1 and (3) of theorem 7.2.4leads us to mhµ(T ) ≤ mh(T ) + log 2 + 1. By dividing by m, taking the limitfor m → ∞ and taking the supremum over all µ ∈ M(X,T ), we obtain thedesired result. We will now finish the proof of the Variational Principle by proving theopposite inequality. This is the hard part of the proof, since it does not onlyrequire more machinery, but also a clever trick.

Lemma 8.1.2 Let X be a compact metric space and α a finite partition ofX. Then, for any µ, ν ∈ M(X,T ) and p ∈ [0, 1] we have Hpµ+(1−p)ν(α) ≥pHµ(α) + (1− p)Hν(α)

Proof

Define the function f : [0,∞) −→ R by f(t) = −t log(t) (t > 0), f(0) = 0.Then f is concave and so, for A measurable, we have:

0 ≤ f(pµ(A) + (1− p)ν(A))− pf(µ(A))− (1− p)f(ν(A))

The result now follows easily.

Page 132: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

132 Equivalence of the two Definitions

Exercise 8.1.2 Suppose X is a compact metric space and T : X −→ X acontinuous transformation. Use (the proof of) lemma 8.1.2 to show that forany µ, ν ∈ M(X,T ) and p ∈ [0, 1] we have hpµ+(1−p)ν(T ) ≥ phµ(T ) + (1 −p)hν(T ).

Exercise 8.1.3 Improve the result in the exercise above by showing that wecan replace the inequality by an equality sign, i.e. for any µ, ν ∈ M(X,T )and p ∈ [0, 1] we have hpµ+(1−p)ν(T ) = phµ(T ) + (1− p)hν(T ).

Recall that the boundary of a set A is defined by ∂A = A− A.

Lemma 8.1.3 Let X be a compact metric space and µ ∈M(X). Then,

1. For any x ∈ X and δ > 0, there is a 0 < η < δ such that µ(∂B(x, η)) =0

2. For any δ > 0, there is a finite partition α = A1, . . . , An of X suchthat diam(Aj) < δ and µ(∂Aj) = 0, for all j

3. If T : X −→ X is continuous, µ ∈ M(X,T ) and Aj ⊂ X measurablesuch that µ(∂Aj) = 0, j = 0, . . . , n− 1, then µ(∂(∩n−1

j=0T−jAj)) = 0

4. If µn → µ in M(X) and A is a measurable set such that µ(∂A) = 0,then limn→∞ µn(A) = µ(A)

Proof

(1): Suppose the statement is false. Then there exists an x ∈ X and δ > 0such that for all 0 < η < δ µ(∂B(x, η)) > 0. Let ηi∞i=1 be a sequence ofdistinct real numbers satisfying 0 < ηi < δ and ηi ↑ δ for i → ∞. Then,µ(∪∞i=1∂B(x, ηi)) =

∑∞i=1 µ(∂B(x, ηi)) = ∞, contradicting the fact that µ is

a probability measure on X.(2): Fix δ > 0. By (1), for each x ∈ X, we can find an 0 < ηx < δ/2such that µ(∂B(x, ηx)) = 0. The collection B(x, ηx)|x ∈ X forms an opencover of X, so by compactness there exists a finite subcover which we denoteby β = B1, . . . , Bn. Define α by letting A1 = B1 and for 0 < j ≤ n letAj = Bj − (∪j−1

k=1Bk). Then α is a partition of X, diam(Aj) < diam(Bj) < δand µ(∂Aj) ≤ µ(∪ni=1∂Bi) = 0, since ∂Aj ⊂ ∪ni=1∂Bi.

(3): Let x ∈ ∂(∩n−1j=0T

−jAj). Then x ∈ ∩n−1j=0T

−jAj, but x /∈ ∩n−1j=0T

−jAj.

That is, every open neighborhood of x intersects every T−jAj, but x /∈ T−kAk

Page 133: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Main Theorem 133

for some 0 ≤ k ≤ n − 1. Hence, x ∈ T−kAk − T−kAk and by continuity ofT k,

T k(x) ∈ T k(T−kAk)− Ak ⊂ Ak − Ak = ∂Ak

Thus, ∂(∩n−1j=0T

−jAj) ⊂ ∪n−1j=0T

−j∂Aj and the statement readily follows.(4): Recall that µn → µ in M(X) for n→∞ if and only if:

limn→∞

∫X

f(x)dµn(x) =

∫X

f(x)dµ(x)

for all f ∈ C(X). Let A be as stated and define Ak = x ∈ X|d(x,A) > 1k,

k ∈ Z>0. Then, since A and X−Ak are closed, it follows from theorem 7.1.1that for every k there exists a function fk ∈ C(X) such that 0 ≤ fk ≤ 1,fk(x) = 1 for all x ∈ A and fk(x) = 0 for all x ∈ X − Ak. Now, for each k,

limn→∞

µn(A) = limn→∞

∫X

1A(x)dµn(x) ≤ limn→∞

∫X

fk(x)dµn(x) =

∫X

fk(x)dµ(x)

Note that fk ↓ 1A in µ-measure for k →∞, so by the Dominated ConvergenceTheorem,

limn→∞

µn(A) ≤ limk→∞

∫X

fk(x)dµ(x) = µ(A) = µ(A)

where the final inequality follows from the assumption µ(∂A) = 0. The proofof the opposite inequality is similar.

Lemma 8.1.4 Let q, n be integers such that 1 < q < n. Define, for 0 ≤ j ≤q − 1, a(j) = bn−j

qc, where bc means taking the integer part. Then we have

the following

1. a(0) ≥ a(1) ≥ · · · ≥ a(q − 1)

2. Fix 0 ≤ j ≤ q − 1. Define

Sj = 0, 1, . . . , j − 1, j + a(j)q, j + a(j)q + 1, . . . , n− 1

Then

0, 1, . . . , n− 1 = j + rq + i|0 ≤ r ≤ a(j)− 1, 0 ≤ i ≤ q − 1 ∪ Sj

and card(Sj) ≤ 2q.

Page 134: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

134 Equivalence of the two Definitions

3. For each 0 ≤ j ≤ q− 1, (a(j)− 1)q + j ≤ bn−jq− 1cq + j ≤ n− q. The

numbers j + rq|0 ≤ j ≤ q − 1, 0 ≤ r ≤ a(j) − 1 are distinct and nogreater than n− q.

The three statements are clear after a little thought.

Example 8.1.1 Let us work out what is going on in lemma 8.1.4 for n =10, q = 4. Then a(0) = b10−0

4c = 2 and similarly, a(1) = 2, a(2) = 2

and a(3) = 1. This gives us S0 = 8, 9, S1 = 0, 9, S2 = 0, 1 andS3 = 0, 1, 2, 7, 8, 9. For example, S3 is obtained by taking all nonnegativeintegers smaller than 3 and adding the numbers 3+a(3)q = 7, 3+a(3)q+1 = 8and 3 + a(3)q + 2 = 9. Note that card(Sj) ≤ 8. One can readily check allproperties stated in the lemma, e.g. (a(3) − 1)q + 3 ≤ b10−3

4− 1c · 4 + 3 =

3 ≤ 6 = n− q.

We shall now finish the proof of our main theorem. The proof is presentedin a logical order, which is in this case not quite the same as thinking order.Therefore, before plunging into a formal proof, we will briefly discuss themain ideas.We would like to construct a Borel probability measure µ with measure the-oretic entropy hµ(T ) ≥ sep(ε, T ) := limn→∞

1n

log(sep(n, ε, T )). To do this,

we first find a measure σn with Hσn(∨n−1i=0 T

−iα) equal to log(sep(n, ε, T )),where α is a suitably chosen partition. This is not too difficult. The problemis to find an element of M(X,T ) with this property. Theorem (6.1.5) sug-gest a suitable measure µ which can be obtained from the σn. The trick inlemma 8.1.4 plays a crucial role in getting from an estimate for Hσn to onefor Hµ. The idea is to first remove the ‘tails’ Sj of the partition

∨n−1i=0 T

−iα,make a crude estimate for these tails and later add them back on again.Lemma 8.1.3 fills in the remaining technicalities.

Theorem 8.1.2 Let T : X −→ X be a continuous transformation of acompact metric space. Then h(T ) ≤ suphµ(T )|µ ∈M(X,T ).

Proof

Fix ε > 0. For each n, let En be an (n, ε)-separated set of cardinal-ity sep(n, ε, T ). Define σn ∈ M(X) by σn = (1/sep(n, ε, T ))

∑x∈En

δx,

Page 135: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Main Theorem 135

where δx is the Dirac measure concentrated at x. Define µn ∈ M(X) byµn = 1

n

∑n−1i=0 σnT−i. By theorem 19, M(X) is compact, hence there exists a

subsequence nj∞j=1 such that µnj converges in M(X) to some µ ∈M(X).

By theorem (6.1.5), µ ∈M(X,T ). We will show that hµ(T ) ≥ sep(ε, T ), fromwhich the result clearly follows.By (2) of lemma 8.1.3, we can find a µ-measurable partition α = A1, . . . , Akof X such that diam(Aj) < ε and µ(∂Aj) = 0, j = 1, . . . , k. We may as-sume that every x ∈ En is contained in some Aj. Now, if A ∈

∨n−1i=0 T

−iα,then A cannot contain more than one element of En. Hence, σn(A) = 0 or1/sep(n, ε, T ). Since ∪kj=1Aj contains all of En, we see thatHσn(

∨n−1i=0 T

−iα) =log(sep(n, ε, T )).Fix integers q, n with 1 < q < n and define for each 0 ≤ j ≤ q − 1 a(j) as inlemma 8.1.4. Fix 0 ≤ j ≤ q − 1. Since

n−1∨i=0

T−iα = (

a(j)−1∨r=0

T−(rq+j)(

q−1∨i=0

T−iα)) ∨ (∨i∈Sj

T−iα)

we find that

log(sep(n, ε, T )) = Hσn(n−1∨i=0

T−iα)

≤a(j)−1∑r=0

Hσn(T−rq−j(

q−1∨i=0

T−iα)) +∑i∈Sj

Hσn(T−iα)

≤a(j)−1∑r=0

HσnT−(rq+j)(

q−1∨i=0

T−iα) + 2q log k

Here we used proposition (4.2.1)(d) and lemma 8.1.1. Now if we sum theabove inequality over j and divide both sides by n, we obtain by (3) and (1)of lemma 8.1.4

q

nlog(sep(n, ε, T )) ≤ 1

n

n−1∑l=0

HσnT−l(

q−1∨i=0

T−iα) +2q2

nlog k

≤ Hµn(

q−1∨i=0

T−iα) +2q2

nlog k

Page 136: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

136 Measures of Maximal Entropy

By (3) of lemma 8.1.3, each atom A of∨q−1i=0 T

−iα has boudary of µ-measurezero, so by (4) of the same lemma, limj→∞ µnj

(A) = µ(A). Hence, if wereplace n by nj in the above inequality and take the limit for j → ∞ weobtain qsep(ε, T ) ≤ Hµ

∨q−1i=0 T

−iα. If we now divide both sides by q andtake the limit for q →∞, we get the desired inequality.

Corollary 8.1.1 (The Variational Principle) The topological entropy of acontinuous transformation T : X −→ X of a compact metric space X isgiven by h(T ) = suphµ(T )|µ ∈M(X,T )

To get a taste of the power of this statement, let us recast our proof of theinvariance of topological entropy under conjugacy ((2) of theorem 7.2.4). Letφ : X1 → X2 denote the conjugacy. We note that µ ∈M(X1, T1) if and onlyif µ φ−1 ∈M(X2, T2) and we observe that hµ(T1) = hµφ−1(T2). The resultnow follows from the Variational Principle. It is that simple.

8.2 Measures of Maximal Entropy

The Variational Principle suggests an educated way of choosing a Borel prob-ability measure on X, namely one that maximizes the entropy of T .

Definition 8.2.1 Let X be a compact metric space and T : X −→ X becontinuous. A measure µ ∈M(X,T ) is called a measure of maximal entropyif hµ(T ) = h(T ). Let Mmax(X,T ) = µ ∈ M(X,T )|hµ(T ) = h(T ). IfMmax(X,T ) = µ then µ is called a unique measure of maximal entropy.

Example. Recall that the topological entropy of the circle rotation Rθ isgiven by h(Rθ) = 0. Since hµ(Rθ) ≥ 0 for all µ ∈ M(X,Rθ), we see thatevery µ ∈ M(X,Rθ) is a measure of maximal entropy, i.e. Mmax(X,Rθ) =M(X,Rθ). More generally, we have h(T ) = 0 and hence Mmax(X,T ) =M(X,T ) for any continuous isometry T : X −→ X.

Measures of maximal entropy are closely connected to (uniquely) ergodicmeasures, as will become apparent from the following theorem.

Theorem 8.2.1 Let X be a compact metric space and T : X −→ X becontinuous. Then

Page 137: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

137

• Mmax(X,T ) is a convex set.

• If h(T ) < ∞ then the extreme points of Mmax(X,T ) are precisely theergodic members of Mmax(X,T ).

• If h(T ) = ∞ then Mmax(X,T ) 6= ∅. If, moreover, T has a uniquemeasure of maximal entropy, then T is uniquely ergodic.

• A unique measure of maximal entropy is ergodic. Conversely, if T isuniquely ergodic, then T has a measure of maximal entropy.

Proofs

(1): Let p ∈ [0, 1] and µ, ν ∈Mmax(X,T ). Then, by exercise 8.1.3,

hpµ+(1−p)ν(T ) = phµ(T ) + (1− p)hν(T ) = ph(T ) + (1− p)h(T ) = h(T )

Hence, pµ+ (1− p)ν ∈Mmax(X,T ).(2): Suppose µ ∈ Mmax(X,T ) is ergodic. Then, by theorem 6.1.6, µ cannotbe written as a non-trivial convex combination of elements of M(X,T ). SinceMmax(X,T ) ⊂ M(X,T ), µ is an extreme point of Mmax(X,T ). Conversely,suppose µ is an extreme point of M(X,T ) and suppose there is a p ∈ (0, 1)and ν1, ν2 ∈ M(X,T ) such that µ = pν1 + (1 − p)ν2. By exercise 8.1.3,h(T ) = hµ(T ) = phν1(T ) + (1− p)hν2(T ). But by the Variational Principle,h(T ) = suphµ(T )|µ ∈ M(X,T ), so h(T ) = hν1(T ) = hν2(T ). In otherwords, ν1, ν2 ∈ Mmax(X,T ), thus µ = ν1 = ν2. Therefore, µ is also anextreme point of M(X,T ) and we conclude that µ is ergodic.For the second part, suppose that Mmax(X,T ) 6= ∅.(3): By the Variational Principle, for any n ∈ Z>0, we can find a µn ∈M(X,T ) such that hµn(T ) > 2n. Define µ ∈M(X,T ) by

µ =∞∑n=1

µn2n

=N∑n=1

µn2n

+∞∑n=N

µn2n

But then,

hµ(T ) ≥N∑n=1

µn2n

> N

This holds for arbitrary N ∈ Z>0, so hµ(T ) = h(T ) = ∞ and µ is a measureof maximal entropy.

Page 138: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

138 Measures of Maximal Entropy

Now suppose that T has a unique measure of maximal entropy. Then, forany ν ∈ M(X,T ), hµ/2+ν/2(T ) = 1

2hµ(T ) + 1

2hν(T ) = ∞. Hence, µ = ν,

M(X,T ) = µ.(4): The first statement follows from (2), for h(T ) < ∞, and (3), forh(T ) = ∞. If T is uniquely ergodic, then M(X,T ) = µ for some µ.By the Variational Principle, hµ(T ) = h(T ).

Example 8.2.1 For θ irrationalRθ is uniquely ergodic with respect to Lebesguemeasure. By the theorem above, it follows that Lebesgue measure is also aunique measure of maximal entropy for Rθ.

Exercise 8.2.1 Let X = 0, 1, . . . , k − 1Z and let T : X −→ X be thefull two-sided shift. Use proposition 7.3.2 to show that the uniform productmeasure is a unique measure of maximal entropy for T .

We end this section with a generalization of the above exercise.

Proposition 8.2.1 Every expansive homeomorphism of a compact metricspace has a measure of maximal entropy.

Proof

Let T : X −→ X be an expansive homeomorphism and let δ > 0 be anexpansive constant for T . Fix 0 < ε < δ. Define µ ∈ M(X,T ) as in theproof of theorem 8.1.2. Then, by the proof, hµ(T ) ≥ sep(ε, T ). We will showthat h(T ) = sep(ε, T ). It then immediately follows that µ ∈Mmax(X,T ).Pick any 0 < η < ε and let A be an (n, η)-separated set of cardinal-ity sep(n, η, T ). By expansiveness, we can find for any x, y ∈ A somek = kx,y ∈ Z such that

d(T k(x), T k(y)) > ε

By theorem 7.2.2, A is finite, so l := max|kx,y| : x, y ∈ A is in Z>0. Now,by our choice of l, for x, y ∈ T−lA we have

max0≤i≤2l+n−1

d(T i(x), T i(y)) > ε

Page 139: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

139

So T−lA is an (2l + n, ε, T )-separated set of cardinality sep(n, η, T ). Hence,

sep(η, T ) = limn→∞

1

nlog(sep(n, η, T ))

≤ limn→∞

1

nlog(sep(2l + n, ε, T ))

≤ limn→∞

1

2l + nlog(sep(2l + n, ε, T ))

= sep(ε, T )

Conversely, if A is (n, ε)-separated then A is also (n, η)-separated, so

sep(ε, T ) ≤ sep(η, T ).

We conclude that sep(ε, T ) = sep(η, T ). Since 0 < η < ε was arbitrary, thisshows that h(T ) = limη↓0 sep(η, T ) = sep(ε, T ). This completes our proof.

From the proof we extract the following corollary.

Corollary 8.2.1 Let T : X −→ X be an expansive homeomorphism and letδ be an expansive constant for T . Then h(T ) = sep(ε, T ), for any 0 < ε < δ.

At this point we would like to make a remark on the proof of the above propo-sition. One might be inclined to believe that the measure µ constructed inthe proof of theorem 8.1.2 is a measure of maximal entropy for any continu-ous transformation T . The reader should note, though, that µ still dependson ε and it is therefore only because of the above corollary that µ is a mea-sure of maximal entropy for expansive homeomorphisms.

Page 140: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

140 Measures of Maximal Entropy

Page 141: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Bibliography

[B] Michael Brin and Garrett Stuck, Introduction to Dynamical Systems,Cambridge University Press, 2002.

[D] K. Dajani and C. Kraaikamp, Ergodic theory of numbers, CarusMathematical Monographs, 29. Mathematical Association of Amer-ica, Washington, DC 2002.

[DF] Dajani, K., Fieldsteel, A. Equipartition of Interval Partitions and anApplication to Number Theory, Proc. AMS Vol 129, 12, (2001), 3453- 3460.

[De] Devaney, R.L., 1989, An Introduction to Chaotic Dynamical Systems,Second Edition, Addison-Wesley Publishing Company, Inc.

[G] Gleick, J., 1998, Chaos: Making a New Science, Vintage

[Ho] Hoppensteadt, F.C., 1993, Analysis and Simulation of Chaotic Sys-tems, Springer-Verlag New York, Inc.

[H] W. Hurewicz, Ergodic theorem withour invariant measure. Annals ofMath., 45 (1944), 192–206.

[KK] Kamae, Teturo; Keane, Michael, A simple proof of the ratio ergodictheorem, Osaka J. Math. 34 (1997), no. 3, 653–657.

[KT] Kingman and Taylor, Introduction to measure and probability, Cam-bridge Press, 1966.

[M] Misiurewicz, M., 1976, A short proof of the variational principle fora Zn

+ action on a compact space, Asterique, 40, pp. 147-187

[Mu] Munkres, J.R., 2000, Topology, Second Edition, Prentice Hall, Inc.

141

Page 142: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

142 Measures of Maximal Entropy

[Pa] William Parry, Topics in Ergodic Theory, Reprint of the 1981 original.Cambridge Tracts in Mathematics, 75. Cambridge University Press,Cambridge, 2004.

[P] Karl Petersen, Ergodic Theory, Cambridge Studies in AdvancedMathematics, 2. Cambridge University Press, Cambridge, 1989.

[W] Peter Walters, An Introduction to Ergodic Theory, Graduate Textsin Mathematics, 79. Springer-Verlag, New York-Berlin, 1982.

Page 143: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

Index

(X, d), 102(n, ε)-separated, 111(n, ε)-spanning, 111B(x, r), 112Bn(x, r), 112FOT (x), 103H(α), 109Mmax(X,T ), 136N(α), 109OT (x), 103Qλ, 119Rθ, 120T−1α, 109X, 102α ∨ β, 109β-transformations, 10∨ni=1 αi, 109

∂A, 132cov(ε, T ), 113cov(n, ε, T ), 112d, 102ddisc, 103dn, 111diam(A), 109diam(α), 109h(T ), 114h(T, α), 110h1(T ), 111h2(T ), 114sep(ε, T ), 134

sep(n, ε, T ), 111span(n, ε, T ), 111

Stationary Stochastic Processes, 12

algebra, 7algebra generated, 7atoms of a partition, 58

Baker’s Transformation, 10Bernoulli Shifts, 11binary expansion, 10Birkhoff’s Ergodic Theorem, 29

Cantor set, 125circle rotation, 120common refinement, 58

of open cover, 109conditional expectation, 35conditional information function, 66conservative, 82Continued Fractions, 13

diameterof a set, 109of open cover, 109

dynamical system, 47

entropy of the partition, 57entropy of the transformation, 61equivalent measures, 79ergodic, 20

143

Page 144: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

144 Measures of Maximal Entropy

ergodic decomposition, 40expansive, 106

constant, 106extreme point, 92

factor map, 51first return time, 16fixed point, 105

generator, 64, 106weak, 107

homeomorphismconjugate, 107expansive, 106, 107generator for, 106minimal, 103topologically transitive, 103weak generator for, 107

Hurewicz Ergodic Theorem, 83

induced map, 16induced operator, 22information function, 66integral system, 19irreducible Markov chain, 42isometry, 105isomorphic, 47

Kac’s Lemma, 35Knopp’s Lemma, 26

Lebesgue number, 102Lochs’ Theorem, 72

Markov measure, 41Markov Shifts, 12measure of maximal entropy, 136

unique, 136measure preserving, 6

monotone class, 7

natural extension, 52non-singular transformation, 80

orbit, 103forward, 103

periodic point, 105phase portrait, 119Poincare Recurrence Theorem, 15

quadratic map, 118

Radon-Nikodym Theorem, 79random shifts, 13refinement

of open cover, 109regular measure, 89Riesz Representation Representation

Theorem, 90

semi-algebra, 7Shannon-McMillan-Breiman Theo-

rem, 70shift map

full, 121strongly mixing, 45subadditive sequence, 60symbolic dynamics, 123

topological conjugacy, 107topological entropy

definition (I), 111definition (II), 114of open cover, 109properties, 116wrt open cover, 110

topologically transitive, 103homeomorphism, 103

Page 145: A Simple Introduction to Ergodic Theory - science.uu.nl …kraai101/lecturenotes2009.pdf ·  · 2009-09-07A Simple Introduction to Ergodic Theory Karma Dajani and Sjoerd Dirksin

145

one-sided, 103translations, 9

uniquely ergodic, 96

Variational Principle, 136

weakly mixing, 45