The Metrical Theory of Continued fractions Marius Iosifescu and Cor Kraaikamp


Contents

Preface  ix
Frequently Used Notation  xv

1 Basic properties of the continued fraction expansion  1
  1.1 A generalization of Euclid’s algorithm  1
    1.1.1 The continued fraction transformation $\tau$  1
    1.1.2 Continuants and convergents  4
    1.1.3 Some special continued fraction expansions  11
  1.2 Basic metric properties  14
    1.2.1 Defining random variables of interest  14
    1.2.2 Gauss’ problem and measure  15
    1.2.3 Fundamental intervals, and applications  17
  1.3 The natural extension of $\tau$  25
    1.3.1 Definition and basic properties  25
    1.3.2 Approximation coefficients  27
    1.3.3 Extended random variables  31
    1.3.4 The conditional probability measures  36
    1.3.5 Paul Lévy’s solution to Gauss’ problem  39
    1.3.6 Mixing properties  43

2 Solving Gauss’ problem  53
  2.0 Banach space preliminaries  53
    2.0.1 A few classical Banach spaces  53
    2.0.2 Bounded essential variation  55
  2.1 The Perron–Frobenius operator  56
    2.1.1 Definition and basic properties  56
    2.1.2 Asymptotic behaviour  62
    2.1.3 Restricting the domain of the Perron–Frobenius operator  64
    2.1.4 A solution to Gauss’ problem for probability measures with densities  70
    2.1.5 Computing variances of certain sums  71
  2.2 Wirsing’s solution to Gauss’ problem  79
    2.2.1 Elementary considerations  79
    2.2.2 A functional-theoretic approach  85
    2.2.3 The case of Lipschitz densities  95
  2.3 Babenko’s solution to Gauss’ problem  101
    2.3.1 Preliminaries  101
    2.3.2 A symmetric linear operator  103
    2.3.3 An ‘exact’ Gauss–Kuzmin–Lévy theorem  111
    2.3.4 ψ-mixing revisited  119
  2.4 Extending Babenko’s and Wirsing’s work  120
    2.4.1 The Mayer–Roepstorff Hilbert space approach  120
    2.4.2 The Mayer–Roepstorff Banach space approach  127
    2.4.3 Mayer–Ruelle operators  130
  2.5 The Markov chain associated with the continued fraction expansion  135
    2.5.1 The Perron–Frobenius operator on $BV(I)$  135
    2.5.2 An upper bound  139
    2.5.3 Two asymptotic distributions  151
    2.5.4 A generalization of a result of A. Denjoy  156

3 Limit theorems  165
  3.0 Preliminaries  165
  3.1 The Poisson law  169
    3.1.1 The case of incomplete quotients  169
    3.1.2 The case of associated random variables  171
    3.1.3 Some extreme value theory  173
  3.2 Normal convergence  179
    3.2.1 Two general invariance principles  179
    3.2.2 The case of incomplete quotients  182
    3.2.3 The case of associated random variables  188
  3.3 Convergence to non-normal stable laws  196
    3.3.1 The case of incomplete quotients  196
    3.3.2 Sums of incomplete quotients  202
    3.3.3 The case of associated random variables  207
  3.4 Fluctuation results  213
    3.4.1 The case of incomplete quotients  213
    3.4.2 The case of associated random variables  215

4 Ergodic theory of continued fractions  219
  4.0 Ergodic theory preliminaries  219
    4.0.1 A few general concepts  219
    4.0.2 The special case of the transformations $\tau$ and $\bar\tau$  224
  4.1 Classical results and generalizations  225
    4.1.1 The case of incomplete quotients  225
    4.1.2 Empirical evidence, and normal continued fraction numbers  240
    4.1.3 The case of associated and extended random variables  244
  4.2 Other continued fraction expansions  257
    4.2.1 Preliminaries  257
    4.2.2 Semi-regular continued fraction expansions  260
    4.2.3 The singularization process  264
    4.2.4 S-expansions  266
    4.2.5 Ergodic properties of S-expansions  273
  4.3 Examples of S-expansions  281
    4.3.1 Nakada’s α-expansions  281
    4.3.2 Minkowski’s diagonal continued fraction expansion  289
    4.3.3 Bosma’s optimal continued fraction expansion  292
  4.4 Continued fraction expansions with σ-finite, infinite invariant measure  299
    4.4.1 The insertion process  299
    4.4.2 The Lehner and Farey continued fraction expansions  300
    4.4.3 The backward continued fraction expansion  307

Appendix 1: Spaces, functions, and measures  313
  A1.1  313
  A1.2  313
  A1.3  314
  A1.4  314
  A1.5  316
  A1.6  319

Appendix 2: Regularly varying functions  321
  A2.1  321
  A2.2  323
  A2.3  324

Appendix 3: Limit theorems for mixing random variables  325
  A3.1  325
  A3.2  327
  A3.3  328

Notes and Comments  333

References  347

Index  377

Preface

This monograph is intended to be a complete treatment of the metrical theory of the (regular) continued fraction expansion and related representations of real numbers. We have attempted to give the best possible results known so far, with proofs which are the simplest and most direct.

The book has had a long gestation period because we first decided to write it in March 1994. This gave us the possibility of essentially improving the initial versions of many parts of it. Even if the two authors are different in style and approach, every effort has been made to hide the differences.

Let $\Omega$ denote the set of irrationals in $I = [0, 1]$. Define the (regular) continued fraction transformation $\tau$ by $\tau(\omega)$ = fractional part of $1/\omega$, $\omega \in \Omega$. Write $\tau^n$ for the $n$th iterate of $\tau$, $n \in \mathbf{N} = \{0, 1, \dots\}$, with $\tau^0$ = identity map. The positive integers $a_n(\omega) = a_1(\tau^{n-1}(\omega))$, $n \in \mathbf{N}_+ = \{1, 2, \dots\}$, where $a_1(\omega)$ = integer part of $1/\omega$, $\omega \in \Omega$, are called the (regular continued fraction) digits of $\omega$. Writing

$$[x_1] = 1/x_1, \quad [x_1, \dots, x_n] = 1/(x_1 + [x_2, \dots, x_n]), \quad n \ge 2,$$

for arbitrary indeterminates $x_i$, $1 \le i \le n$, we have

$$\omega = \lim_{n\to\infty} [a_1(\omega), \dots, a_n(\omega)], \quad \omega \in \Omega,$$

thus explaining the name of $\tau$. The above equation will also be written as

$$\omega = [a_1(\omega), a_2(\omega), \dots], \quad \omega \in \Omega.$$

The $a_n$, $n \in \mathbf{N}_+$, to be called incomplete quotients, are clearly positive integer-valued random variables which are defined almost surely on $(I, \mathcal{B}_I)$ with respect to any probability measure assigning probability 0 to the set $I \setminus \Omega$ of rationals in $I$. (Here $\mathcal{B}_I$ denotes the σ-algebra of Borel subsets of $I$.) The metrical theory of the (regular) continued fraction expansion is about the sequence $(a_n)_{n\in\mathbf{N}_+}$ of its incomplete quotients, and related sequences.

C.F. Gauss stated in 1812 that, in current notation,

$$\lim_{n\to\infty} \lambda(\tau^{-n}([0, x])) = \gamma([0, x]), \quad x \in I,$$

where $\lambda$ denotes Lebesgue measure and $\gamma$ is what we now call Gauss’ measure, defined by

$$\gamma(A) = \frac{1}{\log 2} \int_A \frac{dx}{x + 1}, \quad A \in \mathcal{B}_I.$$

Gauss asked for an estimate of the convergence rate in the above limiting relation, and this has actually been the first problem of the metrical theory of continued fractions. Ramifications of this problem, which was given a first solution only in 1928, still pervade the current developments. Chapter 2 contains a detailed treatment of Gauss’ problem by an elementary approach and functional-theoretic methods as well. The latter are applied to the Perron–Frobenius operator associated with $\tau$, considered as acting on various Banach spaces including that of functions of bounded variation on $I$.

Gauss’ measure is important since it is preserved by $\tau$, that is, $\gamma(\tau^{-1}(A)) = \gamma(A)$ for any $A \in \mathcal{B}_I$. This implies that, by its very definition, the sequence $(a_n)_{n\in\mathbf{N}_+}$ is strictly stationary under $\gamma$. As such, there should exist a doubly infinite version of it, say $(\bar a_\ell)_{\ell\in\mathbf{Z}}$, $\mathbf{Z} = \{\dots, -1, 0, 1, \dots\}$, defined on a richer probability space. It appears that this doubly infinite version can be effectively constructed on $(I^2, \mathcal{B}_I^2, \bar\gamma)$, where $\bar\gamma$ is the so-called extended Gauss’ measure defined by

$$\bar\gamma(B) = \frac{1}{\log 2} \iint_B \frac{dx\,dy}{(xy + 1)^2}, \quad B \in \mathcal{B}_I^2.$$
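As a purely illustrative numerical check (not part of the original text), the invariance $\gamma(\tau^{-1}(A)) = \gamma(A)$ can be tested for $A = [0, x]$. The sketch below assumes the elementary description $\tau^{-1}([0, x]) = \bigcup_{k \ge 1} [1/(k + x), 1/k]$ and truncates the union after finitely many terms; the function names and the truncation level are choices made here.

```python
import math

def gauss_measure(lo: float, hi: float) -> float:
    """Gauss measure of the interval [lo, hi] inside [0, 1]."""
    return (math.log1p(hi) - math.log1p(lo)) / math.log(2)

def gauss_measure_of_preimage(x: float, terms: int = 100_000) -> float:
    """Approximate gamma(tau^{-1}([0, x])) by summing over the first
    `terms` intervals [1/(k + x), 1/k] (an assumed truncation level)."""
    return sum(gauss_measure(1 / (k + x), 1 / k) for k in range(1, terms + 1))

if __name__ == "__main__":
    for x in (0.1, 0.5, 0.9):
        print(x, gauss_measure(0.0, x), gauss_measure_of_preimage(x))
```

The two printed columns should agree up to the truncation error, which shrinks as `terms` grows.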

Put $\bar a_{-n}(\omega, \theta) = a_{n+1}(\theta)$, $\bar a_0(\omega, \theta) = a_1(\theta)$, $\bar a_n(\omega, \theta) = a_n(\omega)$ for any $n \in \mathbf{N}_+$ and $(\omega, \theta) \in \Omega^2$. Then whatever $\ell \in \mathbf{Z}$, $k \in \mathbf{N}$, and $n \in \mathbf{N}_+$, the probability distribution of the random vector $(\bar a_\ell, \dots, \bar a_{\ell+k})$ under $\bar\gamma$ is identical with that of the random vector $(a_n, \dots, a_{n+k})$ under $\gamma$, that is, $(\bar a_\ell)_{\ell\in\mathbf{Z}}$ under $\bar\gamma$ is a doubly infinite version of $(a_n)_{n\in\mathbf{N}_+}$ under $\gamma$. A distinctive feature of our treatment is the consistent use of the extended incomplete quotients $\bar a_\ell$, $\ell \in \mathbf{Z}$. It appears that

$$\bar\gamma([0, x] \times I \mid \bar a_0, \bar a_{-1}, \dots) = \frac{(a + 1)x}{ax + 1} \quad \bar\gamma\text{-a.s.}$$

for any $x \in I$, where $a = [\bar a_0, \bar a_{-1}, \dots]$, which in turn implies that

$$\bar\gamma(\bar a_{\ell+1} = i \mid \bar a_\ell, \bar a_{\ell-1}, \dots) = \frac{a + 1}{(a + i)(a + i + 1)} \quad \bar\gamma\text{-a.s.}$$

for any $i \in \mathbf{N}_+$ and $\ell \in \mathbf{Z}$. The last equation emphasizes a ‘chain of infinite order’ structure of the incomplete quotients when properly defined on a richer probability space. This idea goes back to W. Doeblin (1940) and, hopefully, is fully clarified by our treatment. Also, the considerations above motivate the introduction of the family $(\gamma_a)_{a\in I}$ of probability measures on $\mathcal{B}_I$ defined by their distribution functions

$$\gamma_a([0, x]) = \frac{(a + 1)x}{ax + 1}, \quad x \in I.$$
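The distribution functions $\gamma_a$ can be explored with exact rational arithmetic. The sketch below, an illustration rather than anything taken from the original text, computes $\gamma_a$ of the interval $(1/(i+1), 1/i]$, on which the first digit $a_1$ equals $i$, and compares it with the expression $(a+1)/((a+i)(a+i+1))$ from the conditional probability formula above. The function names are hypothetical.

```python
from fractions import Fraction

def gamma_a_cdf(a: Fraction, x: Fraction) -> Fraction:
    """Distribution function gamma_a([0, x]) = (a + 1) x / (a x + 1)."""
    return (a + 1) * x / (a * x + 1)

def prob_first_digit(a: Fraction, i: int) -> Fraction:
    """gamma_a of the interval (1/(i+1), 1/i], where the first digit is i."""
    return gamma_a_cdf(a, Fraction(1, i)) - gamma_a_cdf(a, Fraction(1, i + 1))

if __name__ == "__main__":
    a = Fraction(2, 7)  # an arbitrary parameter value in I
    for i in range(1, 6):
        print(i, prob_first_digit(a, i), (a + 1) / ((a + i) * (a + i + 1)))
```

Using `Fraction` keeps the comparison exact; for $a = 0$ the distribution function reduces to $x$, that is, to Lebesgue measure.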

In particular, $\gamma_0 = \lambda$. Besides $\gamma$, these probability measures, which we call conditional, are the most natural ones associated with the regular continued fraction expansion. It appears that $(\bar a_\ell)_{\ell\in\mathbf{Z}}$ is ψ-mixing under $\bar\gamma$ while $(a_n)_{n\in\mathbf{N}_+}$ is ψ-mixing under $\gamma$ and any $\gamma_a$, $a \in I$, and that the ψ-mixing coefficients of the latter under $\gamma$ (which are equal to the corresponding ones of the former under $\bar\gamma$) can in principle be exactly calculated. The facts just described are part of our Chapter 1.

Chapter 3 is devoted to limit theorems for incomplete quotients, related random variables, and their extended versions. These include weak convergence to the Poisson, normal, and non-normal stable laws as well as the law of the iterated logarithm, in both classical and functional approaches, and are essentially based, in general, on the ψ-mixing property of both $(\bar a_\ell)_{\ell\in\mathbf{Z}}$ and $(a_n)_{n\in\mathbf{N}_+}$.

The ergodic properties of the regular continued fraction expansion, leading to strong laws of large numbers, are deferred to Chapter 4. The reason is that whilst these properties are inherited by the continued fraction expansions which can be derived from the regular continued fraction expansion by the procedures called singularization and insertion, the limit properties in Chapter 3 do not transfer automatically to continued fraction expansions so derived. We give applications of the ergodic properties of the continued fraction transformation $\tau$ and its natural extension $\bar\tau$. After an introduction, in which several general ergodic theoretical concepts and results, such as Birkhoff’s ergodic theorem, are described, various classical results and important recent results, based on the natural extension, are derived. It is then shown that, via singularization and insertion, the ergodic properties of very many other continued fraction expansions can easily be obtained. In particular, the ergodic properties of the so-called S-expansions are described in detail. Several examples of S-expansions are studied, such as Nakada’s α-expansions, Minkowski’s diagonal continued fraction expansion and Bosma’s optimal continued fraction expansion. Also, the connection between the regular continued fraction expansion and continued fraction expansions with σ-finite, infinite invariant measures, such as the backward continued fraction expansion and Lehner’s continued fraction expansion, is explained.

To make the book as self-contained as reasonably possible, we have included three appendices containing less known notions and results from measure theory, regularly varying functions, and limit theorems for mixing sequences of random variables, which we use frequently, especially in Chapter 3. We urge the reader to become familiar with the appendices early on so as to be aware of what can be found there as needed. We also warn the reader that Chapter 3 and some subsections of Chapter 2 are more involved or more abstract, and thus they make more difficult reading.

The concluding notes and comments aim at giving credit, pointing out results not included in the main text, or tracing historical developments.

The list of references greatly exceeds the number of works quoted in the course of the book. It should be consulted with the purpose of discovering historical sources, parallel research, and starting points for new investigations.

For what our work is not, the reader is referred to the books by Brezinski (1991) and von Plato (1994) for the history of continued fractions, and to Jones and Thron (1980), Lorenzen and Waadeland (1992), Olds (1963), Perron (1954, 1957), Rockett and Szüsz (1992), Schmidt (1980), Sprindžuk (1979), Sudan (1959), and Wall (1948) for various, mainly non-metric, aspects of the theory of continued fractions.

Acknowledgements

Much of our original work included in this book has been carried out in the framework of our association with the Bucharest ‘Gheorghe Mihoc’ Centre for Mathematical Statistics of the Romanian Academy, and the Department of Probability and Statistics (CROSS), Faculty ITS, of the Delft University of Technology.

Many institutions and persons have helped us in various ways.

The first of us wishes to acknowledge the hospitality of Université René Descartes – Paris 5, Université des Sciences et des Technologies de Lille, and Université Victor Segalen – Bordeaux 2. He is grateful to Bui Trong Lieu, Michel Schreiber (both of Paris 5), George Haiman (Lille), and Jean-Marc Deshouillers (Bordeaux 2) for their kind invitations to these locations, where his stays in the period 1996–1999 were very helpful in completing parts of the book. He is also grateful to the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO), the Dutch organization for scientific research, for two one-month research grants in the years 2000 and 2001, and to the Department of Probability and Statistics (CROSS) for invitations allowing several short stays in Delft during which much of the joint work on the book was done. A short stay in the spring of 2000 at the Department of Mathematics of Uppsala University, for which he is grateful to Allan Gut, was very beneficial for gathering recent literature on the subject. Last but not least, he gratefully acknowledges generous financial support in the years 2000 and 2001 from a French–Romanian CNRS International Project of Scientific Cooperation (PICS) directed by Haïm Brezis and Doina Cioranescu (both of Université Pierre et Marie Curie – Paris 6). This allowed him to spend more time in Delft, which was decisive for completing the book. Finally, he wishes to acknowledge the technical help he has received from Adriana Gradinaru, who changed his handwritten, hardly legible drafts into a camera-ready copy.

The second author would also like to thank the Romanian Academy for their support during his visits to Bucharest.

Adriana Berechet read several versions of the typescript, and with her penetrating mind detected some inaccuracies and slips. Expressing our indebtedness to her, we wish to make it clear that any remaining errors are our own.

Finally, we must thank all the people with Kluwer Academic Publishers who helped during the development and production of this book project.

Delft, November 2001
M.I.
C.K.


Frequently Used Notation

Abbreviations

a.e. = almost everywhere (with respect to Lebesgue measure)

a.s. = almost surely (with respect to any other measure)

Cov = covariance

g.c.d. = greatest common divisor

i.i.d.= independent identically distributed

i.o. = infinitely often

log = natural logarithm

p.m. = probability measure

s.i. = strongly infinitesimal

r.v. = random variable

var = total variation

Var = variance

□ = end of example, proof, or remark

Symbols

$\mathbf{N} = \{0, 1, 2, \dots\}$, $\mathbf{N}_+ = \{1, 2, \dots\}$, $-\mathbf{N} = \{\dots, -2, -1, 0\}$

$\mathbf{Z} = (-\mathbf{N}) \cup \mathbf{N}_+ = \{\dots, -1, 0, 1, \dots\}$

$\mathbf{Q}$ = the set of rational numbers

$\mathbf{R}$ = the set of real numbers

$\lfloor a \rfloor$ = integer part of $a \in \mathbf{R}$

$\{a\}$ = fractional part of $a \in \mathbf{R}$

$\mathbf{R}_+ = (x \in \mathbf{R} : x \ge 0)$, $\mathbf{R}_{++} = (x \in \mathbf{R} : x > 0)$

$I = [0, 1]$ = the unit interval of $\mathbf{R}$

$\Omega = I \setminus \mathbf{Q}$ = the set of irrationals in $I$

$\mathbf{C}$ = the set of complex numbers

$i = \sqrt{-1}$ (imaginary unit)

$z^*$ = complex conjugate of $z \in \mathbf{C}$

$\mathbf{R}^n$ = real $n$-vector space, or Euclidean $n$-space, $n \in \mathbf{N}_+$; $\mathbf{R}^1 = \mathbf{R}$

$\mathcal{B}^n$ = σ-algebra of Borel sets in $\mathbf{R}^n$; $\mathcal{B}^1 = \mathcal{B}$

$\mathcal{B}_M = \mathcal{B}^n \cap M := (B \cap M : B \in \mathcal{B}^n)$, $M \in \mathcal{B}^n$, $n \in \mathbf{N}_+$

$\mathcal{B}_I = \mathcal{B} \cap I$ = σ-algebra of Borel sets in $I$

$\mathcal{B}_I^2 = \mathcal{B}_{I^2}$ = σ-algebra of Borel sets in $I^2$

$A^c$ = complementary set of the set $A$

$I_A$ = indicator function of the set $A$

$\partial A$ = boundary of the Borel set $A$

$\delta_x$ = p.m. concentrated at the point $x$

$\lambda$ = Lebesgue measure on $\mathcal{B}$

$\lambda^2$ = Lebesgue measure on $\mathcal{B}^2$

$N(0, 1)$ = standard normal distribution

$\Phi$ = standard normal distribution function

$P(\theta)$ = Poisson distribution with parameter $\theta$

$Pf^{-1}$ = $P$-distribution of r.v. $f$

$*$ = convolution of measures

$\otimes$ = product of σ-algebras or measures

$C = 0.577\,215\cdots$ (Euler’s constant)

$F_n$ = $n$th Fibonacci number: $F_0 = F_1 = 1$, $F_{n+1} = F_n + F_{n-1}$, $n \in \mathbf{N}_+$

$g = (\sqrt{5} - 1)/2$, $G = g + 1$ (‘golden ratios’)

$K_0 = 2.685\,452\cdots$ (Khinchin’s constant)

$K_{-1} = 1.745\,405\cdots$ (Khinchin’s constant)

$\lambda_0 = 0.303\,663\,002\,898\,732\,568\cdots$ (Wirsing’s constant)

$\zeta(2) = \sum_{i\in\mathbf{N}_+} i^{-2} = \pi^2/6$


an, 3, 14

a`, 31

B(I), 53

||| · |||, 53

BEV (I), 55

||·||v, 56

||·||v,µ, 56

BV (I), 54

||| · ||| v, 54

C, 319

C(I), 53

C1 (I), 53

||| · ||| 1, 53

cτ Pois µ, 317

γ, 16

γ, 26

γa, 36

d0, 319

dP, 315

D = D(I), 319

ess sup, 55

F, 324

G, Gan, 39

L(I), 53

||| · ||| L, 54

Lp, 55

||·||p, 55

L∞, 55

||.||∞, 55

Lpµ, 54

||·||p,µ, 54

L∞µ , 55

||.||∞,µ, 55

m(X ), 314

µ-ess sup, 55

να, 197

Pλ, 60

Pi, 22

Pi1···in , 136

Pµ, 57

pn, 4, 19

pen, 261

pen, 265

Pois µ, 317

pr(X ), 314

qn, 4, 19

qen, 261

qen, 265


Qν , 328

rn, 14

r`, 34

s (f), 53

sn, 14

san, 36

s`, 34

σ (C), 313

σ ((fi)i∈I), 314

ten, 263

ten, 273

τ , 2

τ , 25

Θn, 27

Θ′n, 251

Θen, 263

Θen, 280

u(i(n)

), 18

U := Pγ , 59

un, 14

uan, 38

u`, 34

v (f), 55

v(i(n)

), 18

W , 319

yn, 15

y`, 34


Chapter 1

Basic properties of the continued fraction expansion

In this chapter the (regular) continued fraction expansion is introduced and notation fixed. Some basic properties to be used in subsequent chapters are also derived.

1.1 A generalization of Euclid’s algorithm

1.1.1 The continued fraction transformation τ

In Proposition 2 of Book VII, Euclid gave an algorithm, now bearing his name, for finding the greatest common divisor (g.c.d.) of two given integers: let $a, b \in \mathbf{Z}$ and assume for convenience that $a > b > 0$. Put

$$v_0 := a, \quad v_1 := b,$$

and determine $a_1 \in \mathbf{N}_+$, $v_2 \in \mathbf{N}$, such that

$$v_0 = a_1 v_1 + v_2,$$

where $0 \le v_2 < v_1$. If $v_2 \ne 0$ then we repeat this procedure and obtain

$$v_1 = a_2 v_2 + v_3,$$

where $0 \le v_3 < v_2$. In general, if $v_m \ne 0$ for some $m \ge 2$, then we obtain

$$v_{m-1} = a_m v_m + v_{m+1}, \tag{1.1.1}$$

where $0 \le v_{m+1} < v_m$. Clearly, the procedure must stop after finitely many steps: there exists $n \in \mathbf{N}_+$ such that $v_n \ne 0$ and $v_{n+1} = 0$. Then, as is well known, we have

$$v_n = \text{g.c.d.}(a, b).$$
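The division scheme (1.1.1) translates directly into code. The following Python sketch is an illustration added here (the function name `euclid_quotients` is a choice of this example, not the book's notation); it records the quotients $a_1, \dots, a_n$ and returns the last non-zero remainder $v_n$.

```python
def euclid_quotients(a: int, b: int) -> tuple[list[int], int]:
    """Run Euclid's algorithm on integers a > b > 0.

    Returns the list of quotients [a_1, ..., a_n] and v_n = g.c.d.(a, b).
    """
    if not a > b > 0:
        raise ValueError("expected a > b > 0")
    v_prev, v_cur = a, b                      # v_0 and v_1
    quotients = []
    while v_cur != 0:
        # One division step: v_{m-1} = a_m * v_m + v_{m+1}
        a_m, v_next = divmod(v_prev, v_cur)
        quotients.append(a_m)
        v_prev, v_cur = v_cur, v_next
    return quotients, v_prev                  # v_prev is the last non-zero v_n

if __name__ == "__main__":
    print(euclid_quotients(43, 30))
```

For $a = 43$, $b = 30$ the steps are $43 = 1\cdot 30 + 13$, $30 = 2\cdot 13 + 4$, $13 = 3\cdot 4 + 1$, $4 = 4\cdot 1 + 0$, so the quotients are $1, 2, 3, 4$ and the g.c.d. is 1.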

Remark. The running time of Euclid’s algorithm depends on the number of division steps required to get the g.c.d. of the given positive integers $v_0 > v_1$. In an 1844 paper of the French mathematician Gabriel Lamé it is essentially shown that (i) given $n \in \mathbf{N}_+$, if Euclid’s algorithm applied to $v_0$ and $v_1$ requires exactly $n$ division steps and $v_0$ is as small as possible satisfying this condition, then $v_0 = F_{n+1}$ and $v_1 = F_n$; (ii) if $v_1 < v_0 < m \in \mathbf{N}_+$, then the number of division steps required by Euclid’s algorithm when applied to $v_0$ and $v_1$ is at most

$$\left\lfloor \frac{\log(\sqrt{5}\, m)}{\log((\sqrt{5} + 1)/2)} \right\rfloor - 2 \approx \lfloor 2.078 \log m + 1.672 \rfloor - 2,$$

where $\lfloor\,\cdot\,\rfloor : \mathbf{R} \to \mathbf{Z}$ is the greatest integer function, that is,

$$\lfloor x \rfloor = \text{greatest integer not exceeding } x \in \mathbf{R}.$$

For historical details we refer the reader to Shallit (1994), and for recent developments to Knuth (1981, Section 4.5.3) and Hensley (1994). It should be noted that the latter are based on results to be proved in this and later chapters. □
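Both parts of Lamé’s result can be checked by brute force on small inputs. The sketch below is only an illustration: it uses the convention $F_0 = F_1 = 1$ from the notation list, floating-point logarithms, and helper names invented for this example.

```python
import math

def division_steps(v0: int, v1: int) -> int:
    """Number of division steps Euclid's algorithm needs for v0 > v1 > 0."""
    steps = 0
    while v1 != 0:
        v0, v1 = v1, v0 % v1
        steps += 1
    return steps

def fib(n: int) -> int:
    """Fibonacci numbers with the convention F_0 = F_1 = 1."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def lame_bound(m: int) -> int:
    """Upper bound on the number of steps when v1 < v0 < m."""
    golden = (math.sqrt(5) + 1) / 2
    return math.floor(math.log(math.sqrt(5) * m) / math.log(golden)) - 2

if __name__ == "__main__":
    for n in range(1, 8):
        print(n, fib(n + 1), fib(n), division_steps(fib(n + 1), fib(n)))
```

Consecutive Fibonacci numbers are the worst case because every quotient except the last equals 1, so the remainders shrink as slowly as possible.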

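To consider Euclid’s algorithm more closely we define the so-called continued fraction transformation $\tau : I \to I$ by

$$\tau(x) = \begin{cases} x^{-1} - \lfloor x^{-1} \rfloor & \text{if } x \ne 0,\\ 0 & \text{if } x = 0. \end{cases}$$

Then putting $x = b/a$ we obviously have

$$a_1 = a_1(x) = \lfloor v_0/v_1 \rfloor, \ \dots, \ a_n = a_n(x) = \lfloor v_{n-1}/v_n \rfloor$$

and

$$\frac{v_m}{v_{m-1}} = \tau^{m-1}(x), \quad 1 \le m \le n, \qquad \tau^n(x) = 0,$$

where $\tau^0$ = identity map and $\tau^\ell$, $\ell \in \mathbf{N}_+$, is the composition of $\tau$ with itself $\ell$ times. Note that

$$a_m(x) = a_1(\tau^{m-1}(x)), \quad 1 \le m \le n. \tag{1.1.2}$$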

As $v_{m-1} = a_m v_m + v_{m+1}$, we have

$$\frac{1}{\tau^{m-1}(x)} = a_m + \tau^m(x), \quad 1 \le m \le n.$$

If for arbitrary indeterminates $x_i$, $1 \le i \le n$, $n \in \mathbf{N}_+$, we write

$$[x_1] = \frac{1}{x_1}, \quad [x_1, \dots, x_n] = \frac{1}{x_1 + [x_2, \dots, x_n]}, \quad n \ge 2,$$

then it follows that

$$x = [a_1 + \tau(x)] = [a_1, \dots, a_{m-1}, a_m + \tau^m(x)] = [a_1, \dots, a_n] \tag{1.1.3}$$

for $1 < m \le n$. An expression as on the right hand side of (1.1.3) is called a finite (regular) continued fraction (RCF for short). It follows from Euclid’s algorithm that each rational number $x \notin \mathbf{Z}$ can be written as

$$x = a_0 + [a_1, \dots, a_n], \tag{1.1.4}$$

where $a_0 = \lfloor x \rfloor$. (Note that for any $x \in \mathbf{R}$, $x \notin \mathbf{Z}$, the fractional part $x - \lfloor x \rfloor$ of $x$ is a number in the open interval $(0, 1)$!) The right hand side of (1.1.4) will be denoted by

$$[a_0; a_1, \dots, a_n].$$

Euclid’s algorithm yields $a_n \ge 2$. Hence each rational number $x \notin \mathbf{Z}$ has two continued fraction expansions, namely,

$$[a_0; a_1, \dots, a_n] = [a_0; a_1, \dots, a_n - 1, 1].$$
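For rationals the digits can be computed exactly by iterating $\tau$. The following sketch, added here for illustration with Python's `fractions.Fraction` (the helper names are hypothetical), produces $[a_0; a_1, \dots, a_n]$ and evaluates a finite continued fraction from the inside out, so that the two expansions of a rational can be compared.

```python
from fractions import Fraction
from math import floor

def tau(x: Fraction) -> Fraction:
    """Continued fraction transformation on [0, 1), with tau(0) = 0."""
    if x == 0:
        return Fraction(0)
    inv = 1 / x
    return inv - floor(inv)

def rcf_digits(x: Fraction) -> list[int]:
    """Digits [a_0, a_1, ..., a_n] of a rational x, obtained by iterating tau."""
    a0 = floor(x)
    digits = [a0]
    t = x - a0                       # fractional part, in [0, 1)
    while t != 0:
        digits.append(floor(1 / t))  # a_1 of the current point
        t = tau(t)
    return digits

def rcf_value(digits: list[int]) -> Fraction:
    """Evaluate a_0 + [a_1, ..., a_n], working from the last digit inwards."""
    value = Fraction(0)
    for a in reversed(digits[1:]):
        value = 1 / (a + value)
    return digits[0] + value

if __name__ == "__main__":
    x = Fraction(30, 43)
    print(rcf_digits(x))
    print(rcf_value([0, 1, 2, 3, 4]), rcf_value([0, 1, 2, 3, 3, 1]))
```

The expansion produced by `rcf_digits` always ends in a digit that is at least 2; replacing that last digit $a_n$ by the pair $a_n - 1, 1$ gives the second expansion of the same number.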

Of course, there is no reason whatsoever to stick to rationals. Let $x \in \mathbf{R} \setminus \mathbf{Q}$ and, as in the case of rationals, put $a_0 = \lfloor x \rfloor$. It follows from the very definition of $\tau$ that

$$\tau^n(x - a_0) \in \Omega = I \setminus \mathbf{Q}, \quad n \in \mathbf{N}.$$

Let us define

$$a_n = a_n(x) = \lfloor 1/\tau^{n-1}(x - a_0) \rfloor, \quad n \in \mathbf{N}_+,$$

so that, similarly to (1.1.2),

$$a_n(x) = a_1(\tau^{n-1}(x - a_0)), \quad n \in \mathbf{N}_+. \tag{1.1.2$'$}$$

Hence

$$x = [a_0; a_1 + \tau(x - a_0)] = \cdots = [a_0; a_1, \dots, a_{n-1}, a_n + \tau^n(x - a_0)] \tag{1.1.5}$$

for any $n \ge 2$.

The two cases $x \in \mathbf{Q}$ and $x \in \mathbf{R} \setminus \mathbf{Q}$ can be treated in a unified manner if we define

$$a_1(0) = \infty,$$

the symbol $\infty$ being subject to the rules $1/\infty = 0$, $1/0 = \infty$. Equations (1.1.5) are then valid for any $x \in \mathbf{R}$. Clearly, for any $x \in \mathbf{Q}$ there exists $n = n(x) \in \mathbf{N}_+$ such that $a_m(x) = \infty$ for any $m \ge n$.

The integers $a_1(x), a_2(x), \dots$ will be called the (continued fraction) digits of $x \in \mathbf{R}$ whilst the functions $x \to a_i(x) \in \mathbf{N}_+ \cup \{\infty\}$, $x \in \mathbf{R}$, $i \in \mathbf{N}_+$, will be called the incomplete (or partial) quotients of the continued fraction expansion. Euclid’s algorithm implies that $x \in \mathbf{R}$ has finitely many finite continued fraction digits if and only if $x \in \mathbf{Q}$.

1.1.2 Continuants and convergents

Throughout the first three chapters, unless expressly stated otherwise, we will assume that $x \in [0, 1)$, which implies that $a_0 = 0$, and write

$$[0; a_1, \dots, a_n] = [a_1, \dots, a_n], \quad n \in \mathbf{N}_+.$$

We will usually drop the dependence on $x$ in the notation. Define

$$\omega_0 = 0, \quad \omega_n = \omega_n(x) = [a_1, \dots, a_n], \quad x \in [0, 1), \ n \in \mathbf{N}_+.$$

Clearly, $\omega_n \in \mathbf{Q}$, say

$$\omega_n = \frac{p_n}{q_n}, \quad n \in \mathbf{N}_+,$$

where $p_n, q_n \in \mathbf{N}_+$ and g.c.d.$(p_n, q_n) = 1$. The number $\omega_n = \omega_n(x)$ is called the $n$th (regular continued fraction) (RCF) convergent of $x$, $n \in \mathbf{N}$. As a rule, in the first three chapters the specification RCF will be dropped. Clearly, for any $x \in \mathbf{Q}$ there exists $n = n(x) \in \mathbf{N}$ such that $\omega_m(x) = x$ for any $m \ge n$. We shall show that for any irrational $\omega \in \Omega := I \setminus \mathbf{Q}$ we have

$$\lim_{n\to\infty} \omega_n(\omega) = \omega.$$

For that we need some preparation. Define recursively polynomials $Q_n$ of $n$ variables, $n \in \mathbf{N}$, by

$$Q_n(x_1, \dots, x_n) = \begin{cases} 1 & \text{if } n = 0,\\ x_1 & \text{if } n = 1,\\ x_1 Q_{n-1}(x_2, \dots, x_n) + Q_{n-2}(x_3, \dots, x_n) & \text{if } n \ge 2. \end{cases}$$

Thus

$$Q_2(x_1, x_2) = x_1 x_2 + 1, \quad Q_3(x_1, x_2, x_3) = x_1 x_2 x_3 + x_1 + x_3,$$

$$Q_4(x_1, x_2, x_3, x_4) = x_1 x_2 x_3 x_4 + x_1 x_2 + x_1 x_4 + x_3 x_4 + 1,$$

etc. In general, as noted by Leonhard Euler, for any $n \in \mathbf{N}_+$, $Q_n(x_1, \dots, x_n)$ is the sum of all terms which can be obtained starting from $x_1 \cdots x_n$ and deleting zero or more non-overlapping pairs $(x_i, x_{i+1})$ of consecutive variables. There are $F_n$ such terms. (Prove it!) The polynomials $Q_n$, $n \in \mathbf{N}$, are called continuants, and their basic property is that

$$[x_1, \dots, x_n] = \frac{Q_{n-1}(x_2, \dots, x_n)}{Q_n(x_1, \dots, x_n)}, \quad n \in \mathbf{N}_+. \tag{1.1.6}$$

The proof by induction is immediate and is left to the reader. The continuants enjoy the symmetry property

$$Q_n(x_1, \dots, x_n) = Q_n(x_n, \dots, x_1), \quad n \in \mathbf{N}_+. \tag{1.1.7}$$

This follows from Euler’s remark above. Hence

$$Q_n(x_1, \dots, x_n) = x_n Q_{n-1}(x_1, \dots, x_{n-1}) + Q_{n-2}(x_1, \dots, x_{n-2}) \tag{1.1.8}$$

for any $n \ge 2$. The continuants also satisfy the equation

$$Q_n(x_1, \dots, x_n)\, Q_n(x_2, \dots, x_{n+1}) - Q_{n+1}(x_1, \dots, x_{n+1})\, Q_{n-1}(x_2, \dots, x_n) = (-1)^n, \quad n \in \mathbf{N}_+. \tag{1.1.9}$$

The proof is immediate. For $n = 1$ equation (1.1.9) is true. By the very definition of $Q_n$, for any $n \ge 2$ we have

$$\begin{aligned}
& Q_n(x_1, \dots, x_n)\, Q_n(x_2, \dots, x_{n+1}) - Q_{n+1}(x_1, \dots, x_{n+1})\, Q_{n-1}(x_2, \dots, x_n)\\
&\quad = \bigl(x_1 Q_{n-1}(x_2, \dots, x_n) + Q_{n-2}(x_3, \dots, x_n)\bigr)\, Q_n(x_2, \dots, x_{n+1})\\
&\qquad - \bigl(x_1 Q_n(x_2, \dots, x_{n+1}) + Q_{n-1}(x_3, \dots, x_{n+1})\bigr)\, Q_{n-1}(x_2, \dots, x_n)\\
&\quad = (-1)\, Q_{n-1}(x_2, \dots, x_n)\, Q_{n-1}(x_3, \dots, x_{n+1}) - (-1)\, Q_n(x_2, \dots, x_{n+1})\, Q_{n-2}(x_3, \dots, x_n)\\
&\quad = \cdots = (-1)^{n-1} \bigl(Q_1(x_n)\, Q_1(x_{n+1}) - Q_2(x_n, x_{n+1})\bigr) = (-1)^n.
\end{aligned}$$
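The continuant identities above lend themselves to a quick machine check on integer arguments. The sketch below is an added illustration (function names are choices of this example): `continuant` follows the defining recursion literally and `bracket` evaluates $[x_1, \dots, x_n]$ directly, so that (1.1.6), (1.1.7) and (1.1.9) can be compared on sample inputs.

```python
from fractions import Fraction

def continuant(xs) -> int:
    """Q_n(x_1, ..., x_n), computed by the defining recursion."""
    xs = tuple(xs)
    if len(xs) == 0:
        return 1
    if len(xs) == 1:
        return xs[0]
    return xs[0] * continuant(xs[1:]) + continuant(xs[2:])

def bracket(xs) -> Fraction:
    """[x_1, ..., x_n] = 1 / (x_1 + [x_2, ..., x_n]), evaluated inside out."""
    value = Fraction(0)
    for x in reversed(tuple(xs)):
        value = 1 / (x + value)
    return value

if __name__ == "__main__":
    xs = (1, 2, 3, 4)
    print(continuant(xs), continuant(xs[1:]), bracket(xs))
    # Number of terms of Q_n: set every variable equal to 1.
    print([continuant((1,) * n) for n in range(8)])
```

Setting all variables to 1 counts the monomials of $Q_n$, which reproduces the Fibonacci numbers with $F_0 = F_1 = 1$.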

Now, let $\omega \in \Omega = I \setminus \mathbf{Q}$ have digits $a_1(\omega), a_2(\omega), \dots$. It follows from (1.1.6) and (1.1.9) that

$$\omega_n(\omega) = \frac{Q_{n-1}(a_2, \dots, a_n)}{Q_n(a_1, \dots, a_n)}, \tag{1.1.10}$$

$$p_n = Q_{n-1}(a_2, \dots, a_n), \quad q_n = Q_n(a_1, \dots, a_n), \quad n \in \mathbf{N}_+.$$

Hence $p_n(\omega) = q_{n-1}(\tau(\omega))$, $n \in \mathbf{N}_+$, $\omega \in \Omega$, and using (1.1.8) we obtain

$$\begin{aligned}
q_n &= a_n q_{n-1} + q_{n-2}, \quad n \ge 2,\\
p_n &= a_n p_{n-1} + p_{n-2}, \quad n \ge 3,
\end{aligned} \tag{1.1.11}$$

with $q_0 = 1$, $q_1 = a_1$, $p_1 = 1$, $p_2 = a_2$. If we define $p_0 = q_{-1} = 0$, $p_{-1} = 1$, then equations (1.1.11) hold for any $n \in \mathbf{N}_+$. It follows from (1.1.9) and (1.1.10) that

$$p_n q_{n-1} - p_{n-1} q_n = (-1)^{n+1}, \quad n \in \mathbf{N}. \tag{1.1.12}$$

Clearly, either (1.1.10) or (1.1.11) implies that

$$p_{n+1} \ge F_n, \quad q_n \ge F_n, \quad n \in \mathbf{N}. \tag{1.1.13}$$
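The recurrences (1.1.11), started from $p_{-1} = 1$, $q_{-1} = 0$, $p_0 = 0$, $q_0 = 1$, give a simple way to compute convergents. The sketch below is an added illustration with a hypothetical function name; it returns the pairs $(p_n, q_n)$ for a given list of digits, which makes (1.1.12) and (1.1.13) easy to inspect.

```python
def convergents(digits: list[int]) -> list[tuple[int, int]]:
    """Return [(p_0, q_0), (p_1, q_1), ..., (p_n, q_n)] for digits a_1..a_n."""
    p_prev, q_prev = 1, 0        # p_{-1}, q_{-1}
    p_cur, q_cur = 0, 1          # p_0, q_0
    pairs = [(p_cur, q_cur)]
    for a in digits:
        # p_n = a_n p_{n-1} + p_{n-2},  q_n = a_n q_{n-1} + q_{n-2}
        p_prev, p_cur = p_cur, a * p_cur + p_prev
        q_prev, q_cur = q_cur, a * q_cur + q_prev
        pairs.append((p_cur, q_cur))
    return pairs

if __name__ == "__main__":
    pairs = convergents([1, 2, 3, 4])
    print(pairs)
    for n in range(1, len(pairs)):
        (p_n, q_n), (p_m, q_m) = pairs[n], pairs[n - 1]
        print(n, p_n * q_m - p_m * q_n)   # alternates in sign, cf. (1.1.12)
```

For the digits $1, 2, 3, 4$ the last pair is $(30, 43)$, in agreement with $[1, 2, 3, 4] = 30/43$.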

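Notice that by (1.1.5), (1.1.6), (1.1.7), (1.1.10), and (1.1.11) we also have

$$\omega = [a_1 + \tau(\omega)] = \frac{1}{a_1 + \tau(\omega)} = \frac{p_1 + \tau(\omega)\, p_0}{q_1 + \tau(\omega)\, q_0},$$

$$\omega = [a_1, a_2 + \tau^2(\omega)] = \frac{a_2 + \tau^2(\omega)}{a_1 a_2 + 1 + a_1 \tau^2(\omega)} = \frac{p_2 + \tau^2(\omega)\, p_1}{q_2 + \tau^2(\omega)\, q_1},$$

and for $n \ge 3$,

$$\begin{aligned}
\omega &= [a_1, \dots, a_{n-1}, a_n + \tau^n(\omega)] = \frac{Q_{n-1}(a_n + \tau^n(\omega), a_{n-1}, \dots, a_2)}{Q_n(a_n + \tau^n(\omega), a_{n-1}, \dots, a_1)}\\
&= \frac{(a_n + \tau^n(\omega))\, Q_{n-2}(a_2, \dots, a_{n-1}) + Q_{n-3}(a_2, \dots, a_{n-2})}{(a_n + \tau^n(\omega))\, Q_{n-1}(a_1, \dots, a_{n-1}) + Q_{n-2}(a_1, \dots, a_{n-2})}\\
&= \frac{a_n p_{n-1} + p_{n-2} + \tau^n(\omega)\, p_{n-1}}{a_n q_{n-1} + q_{n-2} + \tau^n(\omega)\, q_{n-1}} = \frac{p_n + \tau^n(\omega)\, p_{n-1}}{q_n + \tau^n(\omega)\, q_{n-1}}.
\end{aligned}$$

Therefore we can assert that

$$\omega = \frac{p_n + \tau^n(\omega)\, p_{n-1}}{q_n + \tau^n(\omega)\, q_{n-1}}, \quad \omega \in \Omega, \ n \in \mathbf{N}, \tag{1.1.14}$$

and remark that (1.1.14) also holds for any rational $\omega$ in $[0, 1)$.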

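Remark. A matrix approach to equations (1.1.12) and (1.1.14) is as follows. Consider the matrices

$$M_n = \begin{pmatrix} p_{n-1} & p_n\\ q_{n-1} & q_n \end{pmatrix}, \quad n \in \mathbf{N},$$

so that $M_0$ = identity matrix, and define

$$M_{-1} = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}.$$

Then equations (1.1.11) imply that

$$M_n = M_{n-1} A_n, \quad n \in \mathbf{N},$$

where

$$A_n = \begin{pmatrix} 0 & 1\\ 1 & a_n \end{pmatrix}, \quad n \in \mathbf{N},$$

with $a_0 = 0$. Hence

$$M_n = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix} \prod_{i=0}^{n} \begin{pmatrix} 0 & 1\\ 1 & a_i \end{pmatrix}, \quad n \in \mathbf{N},$$

and (1.1.12) is nothing but the equation

$$\det M_n = (-1)^n, \quad n \in \mathbf{N}.$$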

Clearly, $M_{-1}, M_n, A_n \in SL(2, \mathbf{Z})$, $n \in \mathbf{N}$, that is, the entries of these $2 \times 2$ matrices belong to $\mathbf{Z}$ and their determinants are equal either to 1 or $-1$. Recall that any matrix

$$M = \begin{pmatrix} a & b\\ c & d \end{pmatrix} \in SL(2, \mathbf{Z})$$

can be viewed as a Möbius transformation, denoted by the same letter, of the compactified complex plane $\mathbf{C}^*$, which is defined by

$$M(z) = \begin{pmatrix} a & b\\ c & d \end{pmatrix}(z) := \frac{az + b}{cz + d}, \quad z \in \mathbf{C}^*.$$

With $T$ denoting transpose we also have

$$M(z) = \frac{(1, 0)\, M\, (z, 1)^T}{(0, 1)\, M\, (z, 1)^T}, \quad z \in \mathbf{C}^*,$$

which implies at once that

$$M' M''(z) = M'(M''(z)), \quad z \in \mathbf{C}^*,$$

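for any $M', M'' \in SL(2, \mathbf{Z})$.

Next, for any $z \in \mathbf{C}$ and $n \in \mathbf{N}$ we have

$$\begin{pmatrix} p_n + z p_{n-1}\\ q_n + z q_{n-1} \end{pmatrix} = M_n \begin{pmatrix} z\\ 1 \end{pmatrix} = M_{n-1} A_n \begin{pmatrix} z\\ 1 \end{pmatrix} = M_{n-1} \begin{pmatrix} 1\\ a_n + z \end{pmatrix}.$$

In particular, for $z = 0$ we have

$$\begin{pmatrix} p_n\\ q_n \end{pmatrix} = M_n \begin{pmatrix} 0\\ 1 \end{pmatrix} = M_{n-1} \begin{pmatrix} 1\\ a_n \end{pmatrix}, \quad n \in \mathbf{N}, \tag{1.1.10$'$}$$

whence

$$M_n(0) = \frac{(1, 0)\, M_{n-1}\, (1, a_n)^T}{(0, 1)\, M_{n-1}\, (1, a_n)^T} = \frac{p_n}{q_n} := \begin{cases} [a_1, \dots, a_n] & \text{if } n \in \mathbf{N}_+,\\ 0 & \text{if } n = 0. \end{cases}$$

It follows that

$$M_n(z) = \frac{p_n + z p_{n-1}}{q_n + z q_{n-1}} = [a_1, \dots, a_{n-1}, a_n + z], \quad n \ge 2,$$

for any $z \in \mathbf{C}$, $z \ne -q_n/q_{n-1}$, and

$$M_1(z) = \frac{1}{a_1 + z} \left( = \frac{p_1 + z p_0}{q_1 + z q_0} \right)$$

for any $z \in \mathbf{C}$, $z \ne -a_1$. Now, (1.1.14) follows from the last two equations by taking $z = \tau^n(\omega)$, $n \ge 2$, respectively $z = \tau(\omega)$, $\omega \in \Omega$.

Finally, it is obvious by (1.1.10$'$) that $p_n$ and $q_n$, $n \in \mathbf{N}_+$, can actually be defined as

$$\begin{pmatrix} p_n\\ q_n \end{pmatrix} = \begin{pmatrix} 0 & 1\\ 1 & a_1 \end{pmatrix} \cdots \begin{pmatrix} 0 & 1\\ 1 & a_n \end{pmatrix} \begin{pmatrix} 0\\ 1 \end{pmatrix}.$$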
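The matrix description is convenient computationally. The following sketch is an added illustration using plain nested tuples (all names are choices of this example): it forms $M_n = A_1 \cdots A_n$, checks the determinant, and evaluates the associated Möbius map.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def M(digits):
    """M_n = A_1 ... A_n with A_i = ((0, 1), (1, a_i)); M_0 is the identity."""
    result = ((1, 0), (0, 1))
    for a in digits:
        result = mat_mul(result, ((0, 1), (1, a)))
    return result

def det(A):
    (a, b), (c, d) = A
    return a * d - b * c

def moebius(A, z: Fraction) -> Fraction:
    """Moebius transformation z -> (a z + b) / (c z + d)."""
    (a, b), (c, d) = A
    return (a * z + b) / (c * z + d)

def bracket(xs) -> Fraction:
    """[x_1, ..., x_n], evaluated from the last entry inwards."""
    value = Fraction(0)
    for x in reversed(list(xs)):
        value = 1 / (x + value)
    return value

if __name__ == "__main__":
    digits = [1, 2, 3, 4]
    print(M(digits), det(M(digits)))
    print(moebius(M(digits), Fraction(0)), bracket(digits))
```

The columns of `M(digits)` hold $(p_{n-1}, q_{n-1})$ and $(p_n, q_n)$, so evaluating the Möbius map at 0 returns the convergent $p_n/q_n$.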

It is worth mentioning that any irrational number

$$\omega = [a_0; a_1, a_2, \dots] \in \mathbf{R}$$

can be represented in terms of only two elements of $SL(2, \mathbf{Z})$, namely

$$Q = \begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix} \quad \text{and} \quad R = \begin{pmatrix} 1 & 1\\ 0 & 1 \end{pmatrix},$$

so that $Q(z) = -1/z$, $R(z) = z + 1$, $z \in \mathbf{C}$. It is not hard to check that $Q$ and $R$ generate $SL(2, \mathbf{Z})$ and that

$$\omega = \lim_{n\to\infty} R^{a_0} Q R^{-a_1} Q R^{a_2} Q \cdots R^{-a_{2n-1}} Q R^{a_{2n}}(z_0)$$

for any $z_0 \in \mathbf{C}$. This simple remark is the starting point for understanding, by the use of elementary results about continued fractions, the behaviour of the geodesic flow on a certain Riemann surface. For details see Series (1982, 1991). See also Adler (1991), Faivre (1993), and Nakada (1995). For another representation of irrationals $\omega \in \mathbf{R}$ in terms of matrices $R$ and L = (PQ)2Q see Raney (1973). □

We can now prove the result announced before defining the continuants.

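Proposition 1.1.1 For any x ∈ [0, 1) we have

\[ x - \omega_n(x) = \frac{(-1)^n \tau^n(x)}{q_n (q_n + \tau^n(x)\, q_{n-1})}, \quad n \in \mathbb{N}. \tag{1.1.15} \]

For any ω ∈ Ω we have

\[ \frac{1}{q_n (q_{n+1} + q_n)} < |\omega - \omega_n(\omega)| < \frac{1}{q_n q_{n+1}}, \quad n \in \mathbb{N}, \tag{1.1.16} \]

and

\[ \lim_{n \to \infty} \omega_n(\omega) = \omega. \tag{1.1.17} \]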

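Proof. Equation (1.1.15) follows from (1.1.12) and (1.1.14). Next, since

\[ \frac{1}{\tau^n(\omega)} = a_{n+1} + \tau^{n+1}(\omega), \quad n \in \mathbb{N},\ \omega \in \Omega, \]

by (1.1.11) we have

\[ \frac{\tau^n(\omega)}{q_n (q_n + \tau^n(\omega)\, q_{n-1})} = \frac{1}{q_n (q_n (a_{n+1} + \tau^{n+1}(\omega)) + q_{n-1})} = \frac{1}{q_n (q_{n+1} + q_n \tau^{n+1}(\omega))}, \]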


and (1.1.16) follows.

Finally, (1.1.17) follows from (1.1.16) and (1.1.13). □

Remark. It is easy to see that (1.1.15) implies

\[ |x - \omega_n(x)| \le \frac{1}{q_n q_{n+1}}, \quad n \in \mathbb{N}, \]

for any x ∈ [0, 1). Of course, for a rational x the inequality above is meaningful just for finitely many values of n ∈ N. □

Notice that (1.1.12) implies that

\[ \omega_n - \omega_{n-1} = \frac{(-1)^{n+1}}{q_n q_{n-1}}, \quad n \in \mathbb{N}_+,\ \omega \in \Omega, \tag{1.1.18} \]

which in conjunction with (1.1.15) yields

\[ 0 = \omega_0 < \omega_2 < \omega_4 < \cdots < \omega_3 < \omega_1 < 1 \tag{1.1.19} \]

for any ω ∈ Ω. Clearly, the above inequalities also hold for any rational ω ∈ [0, 1) with some inequality signs ‘<’ replaced by ‘≤’.

In what follows we shall write

ω = [a1, a2, · · · ] , ω ∈ Ω,

to mean precisely equation (1.1.17).

The next result shows that the continued fraction expansion of an irrational number is unique in a certain sense.

Proposition 1.1.2 Let (i_n)_{n∈N+} be a sequence of positive integers. Define the rational numbers

\[ \omega_n = [i_1, \cdots, i_n], \quad n \in \mathbb{N}_+. \]

Then the limit

\[ \lim_{n \to \infty} \omega_n = \omega \]

exists, where ω ∈ Ω and, moreover, the i_n, n ∈ N+, are the continued fraction digits of ω.

Proof. Writing ω_n = p_n/q_n, n ∈ N+, ω_0 = 0, where p_n, q_n ∈ N+ and g.c.d.(p_n, q_n) = 1, it follows from (1.1.18) that

\[ \omega_n = \sum_{k=1}^{n} \frac{(-1)^{k+1}}{q_{k-1} q_k}, \quad n \in \mathbb{N}_+. \]


As q_k increases with k, Leibnitz’s theorem ensures the existence of lim_{n→∞} ω_n, say, ω, and (1.1.19) shows that 0 < ω < 1.

It remains to show that a_n(ω) = i_n, n ∈ N+. This will also prove that ω ∈ Ω, since if ω ∈ Q then we should have a_m(ω) = a_{m+1}(ω) = · · · = ∞ for some m ∈ N+. As

\[ \omega_n = \frac{1}{i_1 + [i_2, \cdots, i_n]}, \quad n \ge 2, \tag{1.1.20} \]

it is sufficient to show that

\[ a_1(\omega) = \lfloor 1/\omega \rfloor = i_1. \]

This follows from (1.1.20) letting n → ∞ and noting that lim_{n→∞} [i_2, · · · , i_n] exists and lies in the open interval (0, 1). □

1.1.3 Some special continued fraction expansions

The continued fraction expansion of a real number is a fundamental representation of it through its connection with the Euclidean algorithm and with ‘best’ rational approximations [see, e.g., Hardy and Wright (1979, Ch. 11)]. At the same time very little is known about the explicit continued fraction expansions of some interesting numbers.

We already know that these expansions are finite (i.e., terminating) exactly for rational numbers. Also, by a well known theorem of J.-L. Lagrange [for all classical non-metric results the basic reference is Perron (1954, 1957)], the sequence of digits of an irrational number x is eventually periodic if and only if x is a quadratic irrationality. Here ‘eventually periodic’ means that if

\[ x = [a_0; a_1, a_2, \cdots], \]

then there exist k ∈ N and ℓ ∈ N+ such that a_n = a_{n+ℓ} for any n ≥ k, and we use the notation

\[ x = \begin{cases} [\,\overline{a_0; \cdots, a_{\ell-1}}\,] & \text{if } k = 0, \\ [a_0; \overline{a_1, \cdots, a_\ell}\,] & \text{if } k = 1, \\ [a_0; a_1, \cdots, a_{k-1}, \overline{a_k, \cdots, a_{k+\ell-1}}\,] & \text{if } k \ge 2 \end{cases} \]

as a convenient abbreviation. The smallest such ℓ ∈ N+ is called the period length of x. If we can take k = 0, then x is called purely periodic. Next, a quadratic irrationality is a number of the form

\[ x = \frac{a + \sqrt{b}}{c}, \]


where b ∈ N+ is not a perfect square, and a, c ∈ Z, c ≠ 0. Then x′ = (a − √b)/c is called the algebraic conjugate of x. A purely periodic quadratic irrationality x is characterized by the inequalities x > 1, −1 < x′ < 0. We have, for example,

\[ \frac{1 + \sqrt{7}}{2} = [\,\overline{1; 1, 4, 1}\,] \quad \text{and} \quad \frac{1 + \sqrt{2}}{3} = [1, \overline{4, 8}\,]. \]

The first quadratic irrationality above is purely periodic and has period length 4 while the second one has period length 2 but is not purely periodic.

Apart from that, the continued fraction expansion of even a single additional algebraic number is not explicitly known. We do not even know whether the sequence of digits is unbounded for such a number. [In connection with this matter see, however, Brjuno (1964) and Richtmyer (1975).]

For transcendental numbers of interest it is not clear when to expect a continued fraction expansion with a good ‘pattern’. For example, in a paper titled De Fractionibus Continuis, published in 1737, Leonhard Euler gave a nice continued fraction expansion for e = \sum_{n∈N} 1/n!, namely

\[ e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, \cdots, 1, 2n, 1, \cdots]. \]

In this expansion the digits are eventually comprised of a meshing of two arithmetic progressions, one of which has zero common difference while the other has difference two. Generalizing the above result, Euler showed (the overline in the notation indicates infinite arithmetic progressions) that

\[ e^{1/n} = [\,\overline{1; n - 1 + 2in, 1}\,]_{i \in \mathbb{N}} = [1; n - 1, 1, 1, 3n - 1, 1, 1, 5n - 1, 1, \cdots] \]

for any 1 < n ∈ N+, and

\[ e^{2/n} = [\,\overline{1; (n-1)/2 + 3in,\ 6n + 12in,\ (5n-1)/2 + 3in,\ 1}\,]_{i \in \mathbb{N}} = [1; (n-1)/2, 6n, (5n-1)/2, 1, 1, (7n-1)/2, 18n, (11n-1)/2, 1, \cdots] \]

for any odd n ∈ N+ greater than 1.

Recently, Clemens et al. (1995) have given explicit formulae relating continued fraction expansions with almost periodic or almost symmetric patterns in their digits, and series whose terms satisfy certain recurrence relations. The method developed by these authors ties together as a single


phenomenon previous results by Davison and Shallit (1991), Kohler (1980), Petho (1982), Shallit (1979, 1982 a,b), van der Poorten and Shallit (1992), and Tamura (1991), who have found continued fraction expansions for numbers expressed by certain types of series.

On the other hand, nobody has made any sense out of the pattern in the continued fraction expansion for π:

\[ \pi = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, \cdots]. \]

The digits of π do not appear to follow any pattern and are widely suspected to be in some sense random.
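The leading digits quoted for π can be reproduced with exact rational arithmetic. The following is a minimal Python sketch, not part of the original text: it runs the Euclidean algorithm on a 35-decimal truncation of π, which is an assumption that is accurate enough only for the first few digits. The names `cf_digits` and `PI_35` are illustrative.

```python
from fractions import Fraction


def cf_digits(x, count):
    """First `count` digits [a0; a1, a2, ...] of the continued fraction of a rational x."""
    digits = []
    for _ in range(count):
        a = x.numerator // x.denominator  # integer part
        digits.append(a)
        x -= a
        if x == 0:  # the expansion of a rational number terminates
            break
        x = 1 / x
    return digits


# 35-decimal truncation of pi; later digits of its expansion differ from those of pi itself.
PI_35 = Fraction("3.14159265358979323846264338327950288")

if __name__ == "__main__":
    print(cf_digits(PI_35, 11))  # [3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3]
```

Since the truncation is itself rational, its full expansion is finite; only the digits that are insensitive to the truncation error agree with those of π.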

There is a vague folklore statement [cf. Thakur (1996)] that the nice patterns come from the connection with hypergeometric functions and the representation of the latter by certain generalized continued fraction expansions. For more on that see Chudnovsky and Chudnovsky (1991, 1993).

Remark. Using the continued fraction expansion for e, Alzer (1998) proved that

\[ \min_{p, q \in \mathbb{N}_+,\ q \ge 3} \frac{q^2 \log q}{\log \log q} \left| e - \frac{p}{q} \right| \]

exists and is only attained at the 19th convergent of e,

\[ \frac{p_{19}}{q_{19}} = \frac{28\,245\,729}{10\,391\,013}, \]

thus it is equal to

\[ \frac{(10\,391\,013)^2 \log 10\,391\,013}{\log \log 10\,391\,013} \left| e - \frac{28\,245\,729}{10\,391\,013} \right| = 0.386\,249\,199\,819 \cdots. \]

Further, the inequality

\[ \frac{q^2 \log q}{\log \log q} \left| e - \frac{p}{q} \right| < c \]

has infinitely many solutions in integers p, q ∈ N+ if and only if c ≥ 1/2. For further developments see Elsner (1999). □


1.2 Basic metric properties

1.2.1 Defining random variables of interest

By (1.1.2′) the incomplete quotients a_n, n ∈ N+, of the irrationals in I are defined by

\[ a_1(\omega) = \lfloor 1/\omega \rfloor, \quad a_n(\omega) = a_1(\tau^{n-1}(\omega)), \quad \omega \in \Omega,\ n \in \mathbb{N}_+. \]

If we define a_1(0) = ∞ then the above equations also define the incomplete quotients for the rational numbers in [0, 1). As we have noted in Subsection 1.1.1, for any rational x ∈ [0, 1) there exists n = n(x) ∈ N+ such that a_m(x) = ∞ for any m ≥ n.
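The definition of the incomplete quotients, including the convention a_1(0) = ∞, can be illustrated on rational inputs. The following is a minimal Python sketch, not part of the original text; it assumes the usual convention τ(0) = 0 and τ(x) = 1/x − ⌊1/x⌋ otherwise from Subsection 1.1.1, and the function names are illustrative.

```python
import math
from fractions import Fraction


def tau(x):
    """Continued fraction transformation on [0, 1), with tau(0) = 0 assumed."""
    if x == 0:
        return Fraction(0)
    y = 1 / x
    return y - math.floor(y)


def a1(x):
    """First incomplete quotient, with the convention a1(0) = infinity."""
    return math.inf if x == 0 else math.floor(1 / x)


def incomplete_quotients(x, n):
    """Return [a_1(x), ..., a_n(x)] using a_k = a_1 composed with tau^(k-1)."""
    out = []
    for _ in range(n):
        out.append(a1(x))
        x = tau(x)
    return out


if __name__ == "__main__":
    print(incomplete_quotients(Fraction(7, 16), 5))  # [2, 3, 2, inf, inf]
```

For the rational 7/16 = [2, 3, 2] the quotients become ∞ from the fourth one on, in line with the remark about n(x).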

The metric point of view in studying the sequence (a_n)_{n∈N+} is to consider that the a_n, n ∈ N+, are N+-valued random variables on (I, B_I) which are defined µ-a.s. in I for any probability measure µ on B_I assigning measure 0 to the rationals in I. (Such a µ is clearly Lebesgue measure λ.) Alternatively, we can look at the a_n, n ∈ N+, as N+ ∪ {∞}-valued random variables which are defined everywhere in [0, 1). It is clear, for example, that

\[ \begin{aligned}
a_1(0) &= \infty, \qquad a_1(x) = 1, \quad x \in \left( \tfrac{1}{2}, 1 \right), \\
a_1(x) &= i, \quad x \in \left( \frac{1}{i+1}, \frac{1}{i} \right], \quad i \ge 2, \\
a_2(0) &= a_2\!\left( \frac{1}{i} \right) = \infty, \quad i \ge 2, \\
a_2(x) &= 1, \quad x \in \bigcup_{i \in \mathbb{N}_+} \left( \frac{1}{i+1}, \frac{1}{i + 1/2} \right), \\
a_2(x) &= j, \quad x \in \bigcup_{i \in \mathbb{N}_+} \left[ \frac{1}{i + 1/j}, \frac{1}{i + 1/(j+1)} \right), \quad j \ge 2.
\end{aligned} \]

The distinction between the two cases is nevertheless immaterial as we shall only consider probability measures on B_I assigning measure 0 to the rationals in I.

The probability structure of (a_n)_{n∈N+} under λ will be given later. See Proposition 1.2.7.

Let us define some related random variables. For any n ∈ N+ put

\[ r_n = \frac{1}{\tau^{n-1}} = [a_n; a_{n+1}, a_{n+2}, \cdots], \tag{1.2.1} \]

\[ s_n = \frac{q_{n-1}}{q_n}, \quad y_n = \frac{1}{s_n}, \tag{1.2.2} \]

\[ u_n(\omega) = q_{n-1}^{-2} \left| \omega - \frac{p_{n-1}}{q_{n-1}} \right|^{-1}, \quad \omega \in \Omega, \tag{1.2.3} \]

where, as usual, p_n/q_n = [a_1, · · · , a_n], n ∈ N+, is the nth convergent, p_0 = 0, q_0 = 1. Note that q_n = y_1 · · · y_n = (s_1 · · · s_n)^{−1}, n ∈ N+. Next, it follows from the first equation (1.1.11) that

\[ \frac{1}{s_n} = a_n + s_{n-1}, \quad n \in \mathbb{N}_+, \]

with s_0 = 0. Hence

\[ s_n = [a_n, \cdots, a_1], \quad n \in \mathbb{N}_+. \tag{1.2.2′} \]

Finally, using (1.1.15) it is easy to see that

\[ u_n = s_{n-1} + r_n, \quad n \in \mathbb{N}_+. \tag{1.2.3′} \]

In what follows we shall refer to the q_n, r_n, s_n, u_n, y_n, n ∈ N+, as associated (with (a_n)_{n∈N+}) random variables.

It is clear that 0 < s_n < 1 whilst r_n, u_n, y_n > 1, n ∈ N+. We defer to Subsection 1.2.3 the study of distributional properties under λ of the associated random variables.
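The relations between q_n, s_n and y_n can be checked exactly on a finite digit string. The following is a minimal Python sketch, not part of the original text; it assumes the standard recurrence q_n = a_n q_{n−1} + q_{n−2} with q_{−1} = 0, q_0 = 1 (the first equation of (1.1.11), which is not reproduced in this section), and the function names are illustrative.

```python
from fractions import Fraction


def denominators(digits):
    """Return [q_0, q_1, ..., q_n] for the digits a_1, ..., a_n."""
    qs = [Fraction(1)]
    q_prev = Fraction(0)  # q_{-1}
    for a in digits:
        q_prev, q = qs[-1], a * qs[-1] + q_prev
        qs.append(q)
    return qs


def reversed_cf(digits):
    """Evaluate [a_n, ..., a_1] for digits given in the order a_1, ..., a_n."""
    value = Fraction(0)
    for a in digits:  # a_1 is innermost, a_n outermost
        value = 1 / (a + value)
    return value


if __name__ == "__main__":
    digits = [2, 1, 3, 5, 1]
    qs = denominators(digits)
    for n in range(1, len(digits) + 1):
        s_n = qs[n - 1] / qs[n]
        print(n, s_n, s_n == reversed_cf(digits[:n]))
```

Each printed line confirms s_n = q_{n−1}/q_n = [a_n, ..., a_1] for the chosen digits.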

1.2.2 Gauss’ problem and measure

Of paramount importance for the metric theory of the continued fraction expansion, actually its first basic result, is the asymptotic behaviour of the distribution function F_n(x) = λ(τ^n < x) = λ(τ^{−n}([0, x))), x ∈ I, of τ^n as n → ∞. C.F. Gauss wrote on 25th October 1800 in his diary that (in modern notation)

\[ \lim_{n \to \infty} F_n(x) = \frac{\log(x + 1)}{\log 2}, \quad x \in I. \]

Gauss’ proof has never been found. Later, in a letter dated 30th January 1812, Gauss asked Laplace what we now call:

Gauss’ Problem. Estimate the error

\[ e_n(x) := F_n(x) - \frac{\log(x + 1)}{\log 2}, \quad n \in \mathbb{N},\ x \in I. \]


Gauss’ letter has been published on pages 371–372 of his Werke, Volume 1, Section 1, Teubner, Leipzig, 1917. Almost the whole letter is reproduced on pages 396–397 of J.V. Uspensky’s Introduction to Mathematical Probability, McGraw-Hill, New York, 1937. See also Gray (1984, p. 123) for other historical details about Gauss’ problem.

The first one to give a solution to Gauss’ problem (implicitly proving Gauss’ 1800 assertion) was R.O. Kuzmin, who showed in 1928 [see Kuzmin (1928, 1932)] that e_n(x) = O(q^{√n}) as n → ∞, with 0 < q < 1, uniformly in x ∈ I. Kuzmin’s proof is reproduced in Khintchine (1956, 1963, 1964). Independently, Paul Levy showed one year later [see Levy (1929) and also Levy (1954, Ch. IX)] that |e_n(x)| ≤ q^n, n ∈ N+, x ∈ I, with q = 3.5 − 2√2 = 0.67157 · · · . We present a slightly improved version of Levy’s solution in Subsection 1.3.5. Using Kuzmin’s approach, Szusz (1961) claimed to have lowered the Levy estimate for q to 0.4. Actually, Szusz’s argument yields just 0.485 rather than 0.4. The optimal value of q was determined by Wirsing (1974), who found that it was equal to 0.303 663 002 · · · .

Chapter 2 is devoted to a thorough treatment of Gauss’ problem. In particular, Corollary 2.3.6 provides a complete solution to a generalization of it, where the interval [0, x), x ∈ I, is replaced by an arbitrary set A ∈ B_I.

The limiting distribution function log(x + 1)/ log 2, x ∈ I, occurring in Gauss’ problem motivates the introduction of what we now call Gauss’ measure γ, which is defined on B_I by

\[ \gamma(A) = \frac{1}{\log 2} \int_A \frac{dx}{x + 1}, \quad A \in \mathcal{B}_I. \]

Then clearly γ([0, x]) = log(x + 1)/ log 2, x ∈ I. We are going to prove that γ and τ enjoy an important property. First, we note that τ does not preserve λ. This means that the equality λ(τ^{−1}(A)) = λ(A) does not hold for every A ∈ B_I. Indeed, for, e.g., A = (1/2, 1) we have

\[ \tau^{-1}(A) = \bigcup_{i \in \mathbb{N}_+} \left( \frac{1}{i + 1}, \frac{1}{i + 1/2} \right) \]

and

\[ \lambda(\tau^{-1}(A)) = \sum_{i \in \mathbb{N}_+} \left( \frac{1}{i + 1/2} - \frac{1}{i + 1} \right) = 2 \sum_{i \in \mathbb{N}_+} \left( \frac{1}{2i + 1} - \frac{1}{2i + 2} \right) = 2 \left( \log 2 - 1 + \frac{1}{2} \right) = 2 \log 2 - 1 \]

while λ(A) = 1/2.
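The value 2 log 2 − 1 can be confirmed numerically by truncating the series. The following is a minimal Python sketch, not part of the original text; the function name and the truncation level are illustrative assumptions, and the truncation error is of order 1/(2·terms).

```python
import math


def lebesgue_of_preimage(terms=10**5):
    """Partial sum of sum_i (1/(i + 1/2) - 1/(i + 1)), i.e. lambda(tau^{-1}((1/2, 1)))."""
    return sum(1 / (i + 0.5) - 1 / (i + 1) for i in range(1, terms + 1))


if __name__ == "__main__":
    approx = lebesgue_of_preimage()
    print(approx, 2 * math.log(2) - 1)  # both close to 0.3863, whereas lambda(A) = 0.5
```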


Instead, τ does preserve γ and we state formally this result, which is a basic one in the metric theory of the RCF expansion.

Theorem 1.2.1 Gauss’ measure γ is preserved by τ, and the sequence (a_n)_{n∈N+} is strictly stationary under γ.

Proof. We should show that

\[ \gamma(\tau^{-1}(A)) = \gamma(A), \quad A \in \mathcal{B}_I. \]

For this it is enough to show that the above equation holds for any interval A = (0, u], 0 < u ≤ 1. As

\[ \tau^{-1}((0, u]) = \bigcup_{i \in \mathbb{N}_+} \left[ \frac{1}{u + i}, \frac{1}{i} \right), \]

we only need to verify that

\[ \int_0^u \frac{dx}{x + 1} = \sum_{i \in \mathbb{N}_+} \int_{1/(u+i)}^{1/i} \frac{dx}{x + 1}, \]

which is an easy exercise.

Since a_n = a_1 ∘ τ^{n−1}, n ∈ N+, the second assertion is obvious. □
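The identity behind the invariance of γ can also be checked numerically. The following is a minimal Python sketch, not part of the original text; the function names are illustrative, and the series is truncated, so agreement is only up to an error of order u/terms.

```python
import math


def gauss_measure(a, b):
    """Gauss measure of the interval (a, b): (log(1 + b) - log(1 + a)) / log 2."""
    return (math.log1p(b) - math.log1p(a)) / math.log(2)


def gauss_measure_of_preimage(u, terms=10**5):
    """Truncated sum of the Gauss measures of [1/(u + i), 1/i), i = 1, ..., terms."""
    return sum(gauss_measure(1 / (u + i), 1 / i) for i in range(1, terms + 1))


if __name__ == "__main__":
    for u in (0.25, 0.5, 1.0):
        print(u, gauss_measure(0.0, u), gauss_measure_of_preimage(u))
```

The two printed values agree to about four decimal places for each u, which is consistent with the telescoping of the logarithms in the exact computation.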

Remark. The expectation of a_1 under γ is infinite. Indeed

\[ \frac{1}{\log 2} \int_0^1 \frac{a_1(x)}{x + 1}\, dx = \frac{1}{\log 2} \sum_{i \in \mathbb{N}_+} i \int_{1/(i+1)}^{1/i} \frac{dx}{x + 1} = \infty. \]

□

1.2.3 Fundamental intervals, and applications

For any n ∈ N+ and i^{(n)} = (i_1, · · · , i_n) ∈ N_+^n define

\[ I(i^{(n)}) = \{ \omega \in \Omega : a_k(\omega) = i_k,\ 1 \le k \le n \}. \]

For example, for any i ∈ N+ we have

\[ I(i) = \{ \omega \in \Omega : a_1(\omega) = i \} = \Omega \cap \left( \frac{1}{i + 1}, \frac{1}{i} \right). \]

We are going to prove that any I(i^{(n)}) is the set of irrationals from a certain open interval with rational endpoints. The sets I(i^{(n)}), i^{(n)} ∈ N_+^n, are called fundamental intervals of rank n. Let us make the convention that I(i^{(0)}) = Ω.

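Theorem 1.2.2 For any n ∈ N+ and i^{(n)} = (i_1, · · · , i_n) ∈ N_+^n let

\[ \frac{p_{n-1}}{q_{n-1}} = [i_1, \cdots, i_{n-1}], \quad \frac{p_n}{q_n} = [i_1, \cdots, i_n] \]

with g.c.d.(p_{n−1}, q_{n−1}) = g.c.d.(p_n, q_n) = 1, p_0 = 0, q_0 = 1. Then

\[ I(i^{(n)}) = \Omega \cap (u(i^{(n)}), v(i^{(n)})), \tag{1.2.4} \]

where

\[ u(i^{(n)}) = \begin{cases} \dfrac{p_n + p_{n-1}}{q_n + q_{n-1}} & \text{if } n \text{ is odd}, \\[2ex] \dfrac{p_n}{q_n} & \text{if } n \text{ is even}, \end{cases} \qquad v(i^{(n)}) = \begin{cases} \dfrac{p_n}{q_n} & \text{if } n \text{ is odd}, \\[2ex] \dfrac{p_n + p_{n-1}}{q_n + q_{n-1}} & \text{if } n \text{ is even}. \end{cases} \]

We have

\[ \frac{p_n + p_{n-1}}{q_n + q_{n-1}} = \begin{cases} [i_1 + 1] & \text{if } n = 1, \\ [i_1, \cdots, i_{n-1}, i_n + 1] & \text{if } n > 1, \end{cases} \]

\[ \lambda(I(i^{(n)})) = \frac{1}{q_n (q_n + q_{n-1})} \tag{1.2.5} \]

and

\[ \max_{i^{(n)} \in \mathbb{N}_+^n} \lambda(I(i^{(n)})) = \lambda(I(1(n))) = \frac{1}{F_n F_{n+1}}, \quad n \in \mathbb{N}_+, \tag{1.2.6} \]

with 1(n) = (i_1, · · · , i_n), where i_1 = · · · = i_n = 1.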

Proof. Since [i_1, · · · , i_{n−1}, i_n + ω] ∈ I(i^{(n)}), n ≥ 2, and [i_1 + ω] ∈ I(i_1) for any ω ∈ Ω, we have τ^n(I(i^{(n)})) = Ω for any n ∈ N+ and i^{(n)} ∈ N_+^n. In conjunction with (1.1.14) this proves (1.2.4). It thus appears that I(i^{(n)}) is the image of Ω under the map

\[ \omega \to \frac{p_n + \omega p_{n-1}}{q_n + \omega q_{n-1}}, \quad \omega \in \Omega. \]

Next, (1.2.5) follows from (1.2.4) and (1.1.12).

Finally, (1.2.6) is an immediate consequence of (1.2.5), as the minimum of q_n is attained for i_1 = · · · = i_n = 1 [cf. (1.1.13)]. □
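The endpoints and lengths of fundamental intervals can be computed exactly. The following is a minimal Python sketch, not part of the original text; it assumes the standard recurrences for p_n and q_n with p_{−1} = 1, p_0 = 0, q_{−1} = 0, q_0 = 1, and that the Fibonacci numbers in (1.2.6) are indexed with F_0 = F_1 = 1. All function names are illustrative.

```python
import itertools
from fractions import Fraction


def convergents(digits):
    """Return (p_{n-1}, q_{n-1}, p_n, q_n) for [i_1, ..., i_n]."""
    p_prev, p, q_prev, q = 1, 0, 0, 1
    for a in digits:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    return p_prev, q_prev, p, q


def fundamental_interval(digits):
    """Endpoints (u, v) of the fundamental interval, following the parity rule of (1.2.4)."""
    p1, q1, p, q = convergents(digits)
    conv = Fraction(p, q)
    mediant = Fraction(p + p1, q + q1)
    return (mediant, conv) if len(digits) % 2 == 1 else (conv, mediant)


def interval_length(digits):
    u, v = fundamental_interval(digits)
    return v - u


def longest_interval(n, max_digit=5):
    """Brute-force search for the digit string of rank n with the longest interval."""
    candidates = itertools.product(range(1, max_digit + 1), repeat=n)
    return max(candidates, key=interval_length)


if __name__ == "__main__":
    print(fundamental_interval((1, 1, 1)))  # (Fraction(3, 5), Fraction(2, 3))
    print(longest_interval(3))              # (1, 1, 1)
```

The brute-force search is restricted to digits up to `max_digit`, which suffices here because larger digits only shrink the interval.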

Remark. When denoting by p_n and q_n, n ∈ N+, quantities seemingly different from those already defined in Subsection 1.1.2, we clearly abused the notation. However, it should be noted that according to the context p_n and q_n will appear to be either functions of ω ∈ Ω or of i^{(n)} ∈ N_+^n as well. Actually, p_n(i^{(n)}) (q_n(i^{(n)})) is the common value of p_n (q_n) as defined in Subsection 1.1.2 at all points ω ∈ I(i^{(n)}), n ∈ N+. □

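Corollary 1.2.3 For p, q ∈ N+ with p < q and g.c.d.(p, q) = 1 let

\[ \frac{p}{q} = [i_1, \cdots, i_n] = [i_1, \cdots, i_{n-1}, i_n - 1, 1] \]

for some n = n(p/q) ∈ N+, where i_n ≥ 2. Define

\[ \frac{p_{n-1}}{q_{n-1}} = [i_1, \cdots, i_{n-1}], \quad \frac{p_n^-}{q_n^-} = [i_1, \cdots, i_{n-1}, i_n - 1] \]

with g.c.d.(p_{n−1}, q_{n−1}) = g.c.d.(p_n^−, q_n^−) = 1, p_0 = 0, q_0 = 1, and

\[ I_{p/q} = \left\{ \omega \in \Omega : \frac{p}{q} \text{ is a convergent of } \omega \right\}. \]

Then

\[ I_{p/q} = I(i_1, \cdots, i_n) \cup I(i_1, \cdots, i_{n-1}, i_n - 1, 1) \tag{1.2.7} \]

\[ = \begin{cases} \Omega \cap \left( \dfrac{p + p_{n-1}}{q + q_{n-1}}, \dfrac{p + p_n^-}{q + q_n^-} \right) & \text{if } n \text{ is odd}, \\[2ex] \Omega \cap \left( \dfrac{p + p_n^-}{q + q_n^-}, \dfrac{p + p_{n-1}}{q + q_{n-1}} \right) & \text{if } n \text{ is even} \end{cases} \]

and

\[ \lambda(I_{p/q}) = \frac{3}{(q + q_{n-1})(q + q_n^-)}. \]

We have

\[ \max_{p, q \in \mathbb{N}_+:\ n(p,q) = n} \lambda(I_{p/q}) = \lambda(I_{F_n/F_{n+1}}) = \frac{3}{(F_{n-1} + F_{n+1}) F_{n+2}}, \quad n \in \mathbb{N}_+. \]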


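Proof. By (1.1.11) we have

\[ p = p_n^- + p_{n-1}, \quad q = q_n^- + q_{n-1}. \]

It then follows from (1.2.4) that

\[ I(i_1, \cdots, i_{n-1}, i_n - 1, 1) = \begin{cases} \Omega \cap \left( \dfrac{p}{q}, \dfrac{p + p_n^-}{q + q_n^-} \right) & \text{if } n \text{ is odd}, \\[2ex] \Omega \cap \left( \dfrac{p + p_n^-}{q + q_n^-}, \dfrac{p}{q} \right) & \text{if } n \text{ is even} \end{cases} \]

while, by (1.2.4) again,

\[ I(i_1, \cdots, i_n) = \begin{cases} \Omega \cap \left( \dfrac{p + p_{n-1}}{q + q_{n-1}}, \dfrac{p}{q} \right) & \text{if } n \text{ is odd}, \\[2ex] \Omega \cap \left( \dfrac{p}{q}, \dfrac{p + p_{n-1}}{q + q_{n-1}} \right) & \text{if } n \text{ is even}. \end{cases} \]

The last two equations show that (1.2.7) holds.

To compute λ(I_{p/q}) we have to use (1.1.12) three times. Finally, we should note that the maximum of λ(I_{p/q}) is obtained for i_1 = · · · = i_{n−1} = 1, i_n = 2. □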

Corollary 1.2.4 (Legendre’s theorem) For ω ∈ Ω and p, q ∈ N+ with p < q and g.c.d.(p, q) = 1 let

\[ \frac{p}{q} = [i_1, \cdots, i_n], \quad \frac{p_{n-1}}{q_{n-1}} = [i_1, \cdots, i_{n-1}] \]

with p_0 = 0, q_0 = 1, where the length n = n(p/q) ∈ N+ of the continued fraction expansion of p/q is chosen in such a way that it is even if p/q < ω and odd otherwise. Define

\[ \Theta = q^2 \left| \omega - \frac{p}{q} \right|. \]

Then

\[ \Theta < \frac{q}{q + q_{n-1}} \quad \text{if and only if} \quad \frac{p}{q} \text{ is a convergent of } \omega. \]

In particular, if Θ ≤ 1/2 then p/q is a convergent of ω.

Proof. If p/q is a convergent of ω, then by (1.1.15) we have

\[ \Theta = q^2 \left| \omega - \frac{p}{q} \right| = \frac{q\, \tau^n(\omega)}{q + \tau^n(\omega)\, q_{n-1}} < \frac{q}{q + q_{n-1}}. \]

Conversely, if Θ < q/(q + q_{n−1}) then

\[ \left| \omega - \frac{p}{q} \right| < \frac{1}{q (q + q_{n-1})}. \tag{1.2.8} \]

Assuming that p/q < ω, that is, n is even, from (1.2.8) we obtain

\[ \frac{p}{q} < \omega < \frac{p}{q} + \frac{1}{q (q + q_{n-1})} = \frac{p + p_{n-1}}{q + q_{n-1}} \]

[by (1.1.12)]. Similarly, assuming that p/q > ω, that is, n is odd, we obtain

\[ \frac{p}{q} > \omega > \frac{p}{q} - \frac{1}{q (q + q_{n-1})} = \frac{p + p_{n-1}}{q + q_{n-1}} \]

[by (1.1.12) again]. In both cases we thus have ω ∈ I(i_1, · · · , i_n). Hence p/q = [i_1, · · · , i_n] is a convergent of ω.

The special case follows from the inequality q/(q + q_{n−1}) > 1/2 which holds since q > q_{n−1}. □

Corollary 1.2.5 For any n ∈ N+ and i^{(n)} = (i_1, · · · , i_n) ∈ N_+^n we have

\[ \gamma(a_k = i_1, \cdots, a_{k+n-1} = i_n) = \frac{1}{\log 2} \log \frac{1 + v(i^{(n)})}{1 + u(i^{(n)})}, \quad k \in \mathbb{N}_+. \]

In particular,

\[ \gamma(a_k = i) = \frac{1}{\log 2} \log \frac{(i + 1)^2}{i (i + 2)} = \frac{1}{\log 2} \log \left( 1 + \frac{1}{i (i + 2)} \right) \tag{1.2.9} \]

for any k, i ∈ N+.

Proof. Theorem 1.2.1 and equation (1.2.4). □
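Formula (1.2.9) is easy to tabulate. The following is a minimal Python sketch, not part of the original text; the function name `digit_probability` is an illustrative assumption. It evaluates γ(a_k = i) and checks numerically that the probabilities add up to 1, which in exact terms follows from the telescoping product of the ratios (i + 1)²/(i(i + 2)).

```python
import math


def digit_probability(i):
    """gamma(a_k = i) = log2(1 + 1/(i (i + 2))), as in (1.2.9)."""
    return math.log2(1 + 1 / (i * (i + 2)))


if __name__ == "__main__":
    for i in range(1, 6):
        print(i, round(digit_probability(i), 4))
    # The partial sum up to N equals log2(2 (N + 1) / (N + 2)), which tends to 1.
    print(sum(digit_probability(i) for i in range(1, 10**5 + 1)))
```

The digit 1 has probability log2(4/3), roughly 0.415, and the probabilities decay like 1/(i² log 2).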

Corollary 1.2.6 (Broden–Borel–Levy formula) For any n ∈ N+ we have

\[ \lambda(\tau^n < x \mid a_1, \cdots, a_n) = \frac{(s_n + 1) x}{s_n x + 1}, \quad x \in I, \tag{1.2.10} \]

where s_n is defined by (1.2.2) or (1.2.2′).

Proof. Clearly, for any n ∈ N+ and x ∈ I,

\[ \lambda(\tau^n < x \mid a_1, \cdots, a_n) = \frac{\lambda((\tau^n < x) \cap I(a_1, \cdots, a_n))}{\lambda(I(a_1, \cdots, a_n))}. \]

By (1.1.14) and (1.2.4) we have

\[ (\tau^n < x) \cap I(a_1, \cdots, a_n) = \begin{cases} \left\{ \omega \in \Omega : \dfrac{p_n + x p_{n-1}}{q_n + x q_{n-1}} < \omega < \dfrac{p_n}{q_n} \right\} & \text{if } n \text{ is odd}, \\[2ex] \left\{ \omega \in \Omega : \dfrac{p_n}{q_n} < \omega < \dfrac{p_n + x p_{n-1}}{q_n + x q_{n-1}} \right\} & \text{if } n \text{ is even}. \end{cases} \]

Hence, using (1.2.5) and (1.1.12),

\[ \lambda(\tau^n < x \mid a_1, \cdots, a_n) = \frac{q_n (q_n + q_{n-1})\, x}{q_n (q_n + x q_{n-1})} = \frac{(s_n + 1) x}{s_n x + 1} \]

for any n ∈ N+ and x ∈ I, and the proof is complete. □

Remark. For x ∈ N+ equation (1.2.10) has been obtained by the Swedish mathematician T. Broden as early as 1900 [see Broden (1900, p. 246)], nine years before E. Borel [see Borel (1909)]. Levy (1929) also obtained and used (1.2.10). This equation was called the Borel-Levy formula by Doeblin (1940). A generalization of (1.2.10) will be given in Proposition 1.3.8. □

The Broden–Borel–Levy formula (1.2.10) allows us to determine the probability structure of (a_n)_{n∈N+} under λ.

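Proposition 1.2.7 For any i, n ∈ N+ we have

\[ \lambda(a_1 = i) = \frac{1}{i (i + 1)}, \tag{1.2.11} \]

\[ \lambda(a_{n+1} = i \mid a_1, \cdots, a_n) = P_i(s_n), \tag{1.2.12} \]

where

\[ P_i(x) = \frac{x + 1}{(x + i)(x + i + 1)}, \quad x \in I. \tag{1.2.13} \]

Proof. As we have already noted,

\[ \{ \omega \in \Omega : a_1(\omega) = i \} = \Omega \cap \left( \frac{1}{i + 1}, \frac{1}{i} \right), \quad i \in \mathbb{N}_+, \]

and (1.2.11) follows at once.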


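Since τ^n(ω) = [a_{n+1}(ω), a_{n+2}(ω), · · · ], n ∈ N+, ω ∈ Ω, we have

\[ \{ \omega \in \Omega : a_{n+1}(\omega) = i \} = \left\{ \omega \in \Omega : \tau^n(\omega) \in \left( \frac{1}{i + 1}, \frac{1}{i} \right) \right\} \]

for any n, i ∈ N+ so that

\[ \lambda(a_{n+1} = i \mid a_1, \cdots, a_n) = \lambda\left( \tau^n \in \left( \frac{1}{i + 1}, \frac{1}{i} \right) \,\Big|\, a_1, \cdots, a_n \right), \]

and (1.2.12) follows from (1.2.10). □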
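The last step of the proof amounts to subtracting two values of the right-hand side of (1.2.10). The following is a minimal Python sketch, not part of the original text; the function names `bbl` and `P` are illustrative. It verifies this exactly with rational arithmetic, together with the special case (1.2.11) at s = 0.

```python
from fractions import Fraction


def bbl(s, x):
    """Right-hand side of (1.2.10): (s + 1) x / (s x + 1)."""
    return (s + 1) * x / (s * x + 1)


def P(i, s):
    """P_i(s) = (s + 1) / ((s + i)(s + i + 1)), as in (1.2.13)."""
    return (s + 1) / ((s + i) * (s + i + 1))


if __name__ == "__main__":
    s = Fraction(2, 3)
    for i in range(1, 5):
        lhs = bbl(s, Fraction(1, i)) - bbl(s, Fraction(1, i + 1))
        print(i, lhs, P(i, s), lhs == P(i, s))
```

Summing P_i(s) over i = 1, ..., N telescopes to 1 − (s + 1)/(s + N + 1), so for each fixed s the P_i(s) form a probability distribution on N+.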

Remark. Proposition 1.2.7 is the starting point of an approach to the metrical theory of the continued fraction expansion via dependence with complete connections. See Iosifescu and Grigorescu (1990, Section 5.2). □

Corollary 1.2.8 The sequence (s_n)_{n∈N+} with s_0 = 0 is a Q ∩ I-valued Markov chain on (I, B_I, λ) with the following transition mechanism: from state s ∈ Q ∩ I the possible transitions are to any state 1/(s + i) with corresponding transition probability P_i(s), i ∈ N+.

We conclude this subsection by considering the random variables r_n and u_n, n ∈ N+, introduced in Subsection 1.2.1.

Proposition 1.2.9 For any n ∈ N+ and x ≥ 1 we have

\[ \lambda(r_1 < x) = \lambda(u_1 < x) = 1 - \frac{1}{x}, \tag{1.2.14} \]

\[ \lambda(r_{n+1} < x \mid a_1, \cdots, a_n) = 1 - \frac{s_n + 1}{s_n + x}, \tag{1.2.15} \]

\[ \lambda(u_{n+1} < x \mid a_1, \cdots, a_n) = \begin{cases} 0 & \text{if } x \le s_n + 1, \\[1ex] 1 - \dfrac{s_n + 1}{x} & \text{if } x > s_n + 1. \end{cases} \tag{1.2.16} \]

Proof. Equations (1.2.14) are obvious since r_1 = u_1 = 1/τ^0. Then for any n ∈ N+ and x ≥ 1 we have

\[ \lambda(r_{n+1} < x \mid a_1, \cdots, a_n) = \lambda\left( \tau^n > \frac{1}{x} \,\Big|\, a_1, \cdots, a_n \right) \]

and

\[ \lambda(u_{n+1} < x \mid a_1, \cdots, a_n) = \lambda(r_{n+1} < x - s_n \mid a_1, \cdots, a_n) = \lambda\left( \tau^n > \frac{1}{x - s_n} \,\Big|\, a_1, \cdots, a_n \right). \]

To obtain equations (1.2.15) and (1.2.16) it remains to use (1.2.10). □

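Corollary 1.2.10 For any n ∈ N+ let G_n(s) = λ(s_n < s), s ∈ R, G_0(s) = 0 or 1 according as s ≤ 0 or s > 0. For any n ∈ N+ and x ≥ 1 we have

\[ \lambda(r_n < x) = \int_0^1 \frac{x - 1}{s + x}\, dG_{n-1}(s) = (x - 1) \left( \frac{1}{x + 1} + \int_0^1 \frac{G_{n-1}(s)\, ds}{(s + x)^2} \right), \tag{1.2.17} \]

\[ \lambda(u_n < x) = \begin{cases} \displaystyle \int_0^{x-1} \left( 1 - \frac{s + 1}{x} \right) dG_{n-1}(s) & \text{if } 1 \le x \le 2, \\[2ex] \displaystyle \int_0^1 \left( 1 - \frac{s + 1}{x} \right) dG_{n-1}(s) & \text{if } x > 2 \end{cases} \tag{1.2.18} \]

\[ = \begin{cases} \displaystyle \frac{1}{x} \int_0^{x-1} G_{n-1}(s)\, ds & \text{if } 1 \le x \le 2, \\[2ex] \displaystyle 1 - \frac{2}{x} + \frac{1}{x} \int_0^1 G_{n-1}(s)\, ds & \text{if } x > 2 \end{cases} \;=\; \frac{1}{x} \int_0^{x-1} G_{n-1}(s)\, ds, \]

\[ \frac{d}{dx} \lambda(r_n < x) = \int_0^1 \frac{(s + 1)\, dG_{n-1}(s)}{(s + x)^2} = \frac{2}{(x + 1)^2} + \int_0^1 \frac{(s - x + 2)\, G_{n-1}(s)\, ds}{(s + x)^3}. \tag{1.2.19} \]

Also, for any n ∈ N+ we have

\[ \frac{d}{dx} \lambda(u_n < x) = \frac{1}{x} G_{n-1}(x - 1) - \frac{1}{x^2} \int_0^{x-1} G_{n-1}(s)\, ds \tag{1.2.20} \]

\[ = \begin{cases} \displaystyle \frac{1}{x} G_{n-1}(x - 1) - \frac{1}{x^2} \int_0^{x-1} G_{n-1}(s)\, ds & \text{if } 1 \le x \le 2, \\[2ex] \displaystyle \frac{1}{x^2} \left( 2 - \int_0^1 G_{n-1}(s)\, ds \right) & \text{if } x > 2 \end{cases} \]

a.e. in [1, ∞).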

Proof. The first equality in (1.2.17) follows at once from (1.2.15). To obtain the second one we integrate by parts noting that G_n(0) = 0 and G_n(1) = 1 for any n ∈ N.

Similarly, the first equality in (1.2.18) follows at once from (1.2.16). To obtain the second and third ones we integrate by parts and then note that G_n(s) = 1 for any n ∈ N and s ≥ 1.

Finally, equations (1.2.19) and (1.2.20) follow immediately from (1.2.17) and (1.2.18), respectively. □

1.3 The natural extension of τ

1.3.1 Definition and basic properties

The incomplete quotients a_n, n ∈ N+, are expressed in terms of a_1 and the powers of the continued fraction transformation τ. Such a thing is not possible for the variables s_n or u_n, n ∈ N+. To rule out this inconvenience we consider the so called natural extension τ̄ of τ, which is a transformation of (0, 1) × I defined by

\[ \bar\tau(\omega, \theta) = \left( \tau(\omega), \frac{1}{a_1(\omega) + \theta} \right), \quad (\omega, \theta) \in (0, 1) \times I. \tag{1.3.1} \]

This is a one-to-one transformation of Ω² with inverse

\[ \bar\tau^{-1}(\omega, \theta) = \left( \frac{1}{a_1(\theta) + \omega}, \tau(\theta) \right), \quad (\omega, \theta) \in \Omega^2. \tag{1.3.2} \]

It is easy to see that for any n ≥ 2 we have

\[ \bar\tau^{n}(\omega, \theta) = (\tau^n(\omega), [a_n(\omega), \cdots, a_2(\omega), a_1(\omega) + \theta]) \tag{1.3.1′} \]

whatever (ω, θ) ∈ Ω × I, and

\[ \bar\tau^{-n}(\omega, \theta) = ([a_n(\theta), \cdots, a_2(\theta), a_1(\theta) + \omega], \tau^n(\theta)) \tag{1.3.2′} \]

whatever (ω, θ) ∈ Ω².

Equations (1.3.1) and (1.3.1′) imply that

\[ \bar\tau^{n}(\omega, 0) = (\tau^n(\omega), s_n(\omega)), \quad n \in \mathbb{N}_+, \tag{1.3.3} \]


for any ω ∈ Ω. Note that the above equation also holds for n = 0 if we define τ̄^0 = identity map.
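Equation (1.3.3) and the inverse formula (1.3.2) can be checked on rational test points. The following is a minimal Python sketch, not part of the original text; a rational ω with a sufficiently long expansion stands in for an irrational one (an assumption made only so that the arithmetic is exact), and the function names are illustrative.

```python
import math
from fractions import Fraction


def tau(x):
    """Continued fraction transformation for x in (0, 1)."""
    y = 1 / x
    return y - math.floor(y)


def natural_extension(omega, theta):
    """The map (1.3.1): (omega, theta) -> (tau(omega), 1/(a_1(omega) + theta))."""
    a = math.floor(1 / omega)
    return tau(omega), 1 / (a + theta)


def natural_extension_inverse(omega, theta):
    """The map (1.3.2): (omega, theta) -> (1/(a_1(theta) + omega), tau(theta))."""
    a = math.floor(1 / theta)
    return 1 / (a + omega), tau(theta)


def cf_value(digits):
    """Evaluate [a_1, ..., a_n] exactly."""
    value = Fraction(0)
    for a in reversed(digits):
        value = 1 / (a + value)
    return value


if __name__ == "__main__":
    digits = [2, 1, 3, 5, 1, 4]
    point = (cf_value(digits), Fraction(0))
    for n in range(1, 6):
        point = natural_extension(*point)
        s_n = cf_value(digits[:n][::-1])  # s_n = [a_n, ..., a_1]
        print(n, point, point[1] == s_n)
```

The first coordinate discards leading digits of ω while the second coordinate accumulates them in reverse order, which is why the past of the digit sequence becomes accessible.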

Now, define the extended Gauss measure γ̄ on B_I^2 by

\[ \bar\gamma(B) = \frac{1}{\log 2} \iint_B \frac{dx\, dy}{(xy + 1)^2}, \quad B \in \mathcal{B}_I^2. \]

Note that

\[ \bar\gamma(A \times I) = \bar\gamma(I \times A) = \gamma(A) \tag{1.3.4} \]

for any A ∈ B_I. The result below shows that γ̄ plays with respect to τ̄ the part played by γ with respect to τ (cf. Theorem 1.2.1).

Theorem 1.3.1 The extended Gauss measure γ̄ is preserved by τ̄.

Proof. We should show that γ̄(τ̄^{−1}(B)) = γ̄(B) for any B ∈ B_I^2 or, equivalently, since τ̄ is invertible on Ω², that γ̄(τ̄(B)) = γ̄(B) for any B ∈ B_I^2. As the set of Cartesian products I(i^{(m)}) × I(j^{(n)}), i^{(m)} ∈ N_+^m, j^{(n)} ∈ N_+^n, m, n ∈ N, generates the σ-algebra B_I^2, it is enough to show that

\[ \bar\gamma(\bar\tau(I(i^{(m)}) \times I(j^{(n)}))) = \bar\gamma(I(i^{(m)}) \times I(j^{(n)})) \tag{1.3.5} \]

for any i^{(m)} ∈ N_+^m, j^{(n)} ∈ N_+^n, m, n ∈ N. It follows from (1.3.4) and Theorem 1.2.1 that (1.3.5) holds for m = 0 and n ∈ N. If m ∈ N+ then it is easy to see that

\[ \bar\tau(I(i^{(m)}) \times I(j^{(n)})) = I(i_2, \cdots, i_m) \times I(i_1, j_1, \cdots, j_n), \quad n \in \mathbb{N}_+, \]

where I(i_2, · · · , i_m) equals Ω for m = 1. Also, if I(i^{(m)}) = Ω ∩ (a, b) and I(j^{(n)}) = Ω ∩ (c, d), with a, b, c, d ∈ Q ∩ I, then I(i_2, · · · , i_m) = Ω ∩ (b^{−1} − i_1, a^{−1} − i_1) and I(i_1, j_1, · · · , j_n) = Ω ∩ ((d + i_1)^{−1}, (c + i_1)^{−1}).

A simple computation yields

\[ \bar\gamma((a, b) \times (c, d)) = \frac{1}{\log 2} \log \frac{(bd + 1)(ac + 1)}{(bc + 1)(ad + 1)}, \]

and then

\[ \begin{aligned}
&\bar\gamma\left( (b^{-1} - i_1, a^{-1} - i_1) \times ((d + i_1)^{-1}, (c + i_1)^{-1}) \right) \\
&\quad = \frac{1}{\log 2} \log \frac{((a^{-1} - i_1)(c + i_1)^{-1} + 1)((b^{-1} - i_1)(d + i_1)^{-1} + 1)}{((a^{-1} - i_1)(d + i_1)^{-1} + 1)((b^{-1} - i_1)(c + i_1)^{-1} + 1)} \\
&\quad = \frac{1}{\log 2} \log \frac{(bd + 1)(ac + 1)}{(bc + 1)(ad + 1)},
\end{aligned} \]

that is, (1.3.5) holds. □

For more details on natural extensions we refer the reader to Subsection 4.0.1.


1.3.2 Approximation coefficients

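On account of Legendre’s theorem (see Corollary 1.2.4), for any ω ∈ Ω we define the approximation coefficients Θ_n = Θ_n(ω) as

\[ \Theta_n = \Theta_n(\omega) = q_n^2 \left| \omega - \frac{p_n}{q_n} \right|, \quad n \in \mathbb{N}. \]

Clearly, Θ_0(ω) = ω, ω ∈ Ω, and by (1.2.3) we have

\[ \Theta_n = u_{n+1}^{-1}, \quad n \in \mathbb{N}. \tag{1.3.6} \]

Hence

\[ 0 < \Theta_n < 1, \quad n \in \mathbb{N}. \]

It is rather easy to obtain more information about Θ_n, n ∈ N. It follows from (1.2.3′) and (1.2.1) that

\[ \Theta_n = \frac{1}{s_n + r_{n+1}} = \frac{\tau^n}{s_n \tau^n + 1}, \quad n \in \mathbb{N}. \]

Moreover, as s_n^{−1} = a_n + s_{n−1} and r_n = a_n + r_{n+1}^{−1}, n ∈ N+, we also have

\[ \Theta_{n-1} = \frac{1}{s_{n-1} + r_n} = \frac{1}{s_{n-1} + a_n + r_{n+1}^{-1}} = \frac{s_n}{s_n \tau^n + 1}, \quad n \in \mathbb{N}_+. \]

Thus it appears that

\[ (\Theta_{n-1}, \Theta_n) = \Psi(\tau^n, s_n), \quad n \in \mathbb{N}_+, \tag{1.3.7} \]

the function Ψ : I² → R_+^2 being defined by

\[ \Psi(x, y) = \left( \frac{y}{xy + 1}, \frac{x}{xy + 1} \right), \quad (x, y) \in I^2. \]

Clearly, Ψ is a C¹-diffeomorphism between the interior of I² and the interior of the triangle ∆ with vertices (0, 0), (1, 0) and (0, 1). It then follows from (1.3.7) that

\[ \Theta_{n-1} + \Theta_n < 1, \quad n \in \mathbb{N}_+, \]

whence

\[ \min(\Theta_{n-1}, \Theta_n) < \frac{1}{2}, \quad n \in \mathbb{N}_+, \]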


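a well known result due to Vahlen (1895).

The inverse Ψ^{−1} of Ψ is given by

\[ \Psi^{-1}(\alpha, \beta) = \left( \frac{2\beta}{1 + \sqrt{1 - 4\alpha\beta}}, \frac{2\alpha}{1 + \sqrt{1 - 4\alpha\beta}} \right), \quad (\alpha, \beta) \in \Delta. \]

For i ∈ N+ put

\[ V_i = I(i) \times \Omega, \qquad H_i = \Omega \times I(i). \]

It follows from the definition of τ̄ that

\[ \bar\tau(V_i) = H_i, \quad V_i = \bar\tau^{-1}(H_i), \quad i \in \mathbb{N}_+, \]

and that for any i ∈ N+ we have

\[ \bar\tau^{n} \in V_i \text{ if and only if } a_{n+1} = i, \quad n \in \mathbb{N}, \tag{1.3.8} \]

\[ \bar\tau^{n} \in H_i \text{ if and only if } a_n = i, \quad n \in \mathbb{N}_+. \tag{1.3.9} \]

Furthermore, the set V_i^* = ΨV_i is a quadrangle with vertices

\[ \left( 0, \frac{1}{i} \right), \quad \left( \frac{i}{i + 1}, \frac{1}{i + 1} \right), \quad \left( \frac{i + 1}{i + 2}, \frac{1}{i + 2} \right) \quad \text{and} \quad \left( 0, \frac{1}{i + 1} \right), \]

and its mirror image with respect to the diagonal α = β is H_i^* = ΨH_i, i ∈ N+. (For i = 1 both quadrangles are in fact triangles.) Define the mapping F : ∆ → ∆ as F = Ψ τ̄ Ψ^{−1}.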

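It is easy to check that for any i ∈ N+ we have

\[ (\alpha, \beta) \in V_i^* \Rightarrow F(\alpha, \beta) = \left( \beta,\ \alpha + i \sqrt{1 - 4\alpha\beta} - i^2 \beta \right). \tag{1.3.10} \]

Now, by (1.3.7) we have

\[ \Psi^{-1}(\Theta_{n-1}, \Theta_n) = (\tau^n, s_n), \]

whence

\[ \bar\tau\, \Psi^{-1}(\Theta_{n-1}, \Theta_n) = (\tau^{n+1}, s_{n+1}), \quad n \in \mathbb{N}_+. \]

Therefore, by (1.3.7) again,

\[ F(\Theta_{n-1}, \Theta_n) = \Psi \bar\tau \Psi^{-1}(\Theta_{n-1}, \Theta_n) = \Psi(\tau^{n+1}, s_{n+1}) = (\Theta_n, \Theta_{n+1}), \quad n \in \mathbb{N}_+. \tag{1.3.11} \]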


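Hence, by (1.3.3), (1.3.8), and (1.3.10),

\[ \Theta_{n+1} = \Theta_{n-1} + a_{n+1} \sqrt{1 - 4\Theta_{n-1}\Theta_n} - a_{n+1}^2 \Theta_n, \quad n \in \mathbb{N}_+. \tag{1.3.12} \]

Similarly, for any i ∈ N+ we have

\[ (\alpha, \beta) \in H_i^* \Rightarrow F^{-1}(\alpha, \beta) = \left( \beta + i \sqrt{1 - 4\alpha\beta} - i^2 \alpha,\ \alpha \right). \tag{1.3.13} \]

As by (1.3.3), (1.3.9), and (1.3.13) we have

\[ F^{-1}(\Theta_n, \Theta_{n+1}) = (\Theta_{n-1}, \Theta_n), \quad n \in \mathbb{N}_+, \]

we obtain

\[ \Theta_{n-1} = \Theta_{n+1} + a_{n+1} \sqrt{1 - 4\Theta_n\Theta_{n+1}} - a_{n+1}^2 \Theta_n, \quad n \in \mathbb{N}_+. \tag{1.3.12′} \]

We note that both (1.3.12) and (1.3.12′) can be established by direct computation using the relationships between Θ_n, r_n, s_n, and a_n, n ∈ N+.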
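The recurrence (1.3.12) lends itself to a quick numerical check. The following is a minimal Python sketch, not part of the original text; a rational ω with a finite expansion stands in for an irrational one (an assumption, so that the Θ_n can be computed exactly from their definition), the square root is taken in floating point, and the function names are illustrative.

```python
import math
from fractions import Fraction


def thetas(digits):
    """Theta_0, ..., Theta_N for omega = [a_1, ..., a_N], from Theta_n = q_n^2 |omega - p_n/q_n|."""
    omega = Fraction(0)
    for a in reversed(digits):
        omega = 1 / (a + omega)
    p_prev, p, q_prev, q = 1, 0, 0, 1
    out = [q * q * abs(omega - Fraction(p, q))]  # Theta_0 = omega
    for a in digits:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append(q * q * abs(omega - Fraction(p, q)))
    return out


def next_theta(theta_prev, theta, a_next):
    """Right-hand side of (1.3.12), evaluated in floating point."""
    root = math.sqrt(1 - 4 * float(theta_prev) * float(theta))
    return float(theta_prev) + a_next * root - a_next**2 * float(theta)


if __name__ == "__main__":
    digits = [2, 1, 3, 5, 1, 4]
    th = thetas(digits)
    for n in range(1, len(digits)):
        a_next = digits[n]  # a_{n+1}
        print(n, float(th[n + 1]), next_theta(th[n - 1], th[n], a_next))
```

The two printed columns agree up to rounding, and consecutive values also satisfy Θ_{n−1} + Θ_n < 1.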

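We are now able to derive some classical results in Diophantine approximation. Put

\[ f_i(\alpha, \beta) = \alpha + i \sqrt{1 - 4\alpha\beta} - i^2 \beta, \quad i \in \mathbb{N}_+, \]

so that (1.3.10) can be rewritten as

\[ (\alpha, \beta) \in V_i^* \Rightarrow F(\alpha, \beta) = (\beta, f_i(\alpha, \beta)). \]

It is easy to check that

\[ \frac{\partial f_i}{\partial \alpha}(\alpha, \beta) < 0, \quad \frac{\partial f_i}{\partial \beta}(\alpha, \beta) < 0, \quad (\alpha, \beta) \in V_i^*,\ i \in \mathbb{N}_+. \tag{1.3.14} \]

The only fixed point of τ̄ in V_i is (ξ_i, ξ_i), where

\[ \xi_i = [i, i, i, \cdots] = \frac{-i + \sqrt{i^2 + 4}}{2}, \quad i \in \mathbb{N}_+, \]

while the only fixed point of F in V_i^* = ΨV_i is (ξ_i^*, ξ_i^*), where

\[ (\xi_i^*, \xi_i^*) = \Psi(\xi_i, \xi_i) = \left( \frac{1}{\sqrt{i^2 + 4}}, \frac{1}{\sqrt{i^2 + 4}} \right), \quad i \in \mathbb{N}_+. \tag{1.3.15} \]

Note that by (1.3.11) we have (Θ_{n−1}, Θ_n, Θ_{n+1}) = (Θ_{n−1}, F(Θ_{n−1}, Θ_n)), n ∈ N+. Hence, for any i, n ∈ N+,

\[ (\Theta_{n-1}, \Theta_n, \Theta_{n+1}) = (\Theta_{n-1}, \Theta_n, f_i(\Theta_{n-1}, \Theta_n)) \]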


if and only if (Θ_{n−1}, Θ_n) ∈ V_i^*, that is, by (1.3.7), if and only if a_{n+1} = i.

Finally, note that

\[ \Theta_{n-1}(\xi_i^*) \ne \Theta_n(\xi_i^*) \tag{1.3.16} \]

for any i, n ∈ N+.

Now, on account of (1.3.14) through (1.3.16) we can state the following result.

Theorem 1.3.2 For any ω ∈ Ω and n ∈ N+ we have

\[ \min(\Theta_{n-1}, \Theta_n, \Theta_{n+1}) < \frac{1}{\sqrt{a_{n+1}^2 + 4}} \tag{1.3.17} \]

and

\[ \max(\Theta_{n-1}, \Theta_n, \Theta_{n+1}) > \frac{1}{\sqrt{a_{n+1}^2 + 4}}. \tag{1.3.18} \]

Inequality (1.3.17) generalizes a result of Borel (1903) according to which

\[ \min(\Theta_{n-1}, \Theta_n, \Theta_{n+1}) < \frac{1}{\sqrt{5}}, \quad n \in \mathbb{N}_+. \tag{1.3.11} \]

A great number of people independently found (1.3.17). See, e.g., Bagemihl and McLaughlin (1966), Obrechkoff (1951), Sendov (1959/60).

Inequality (1.3.18) is due to Tong (1983). Actually, the method sketched above yields easy proofs of generalizations of a great number of classical results by M. Fujiwara, B. Segre, J. LeVeque, P. Szusz, and others. We will mention here a generalization of a result of B. Segre. For other results the reader is referred to Jager and Kraaikamp (1989) and Kraaikamp (1991).

Theorem 1.3.3 Let ρ ≥ 0 and n ∈ N+. Then of the three inequalities

\[ \Theta_{2n-1} < \frac{\rho}{\sqrt{a_{2n+1}^2 + 4\rho}}, \quad \Theta_{2n} < \frac{1}{\sqrt{a_{2n+1}^2 + 4\rho}}, \quad \Theta_{2n+1} < \frac{\rho}{\sqrt{a_{2n+1}^2 + 4\rho}} \]

at least one is satisfied and at least one is not satisfied.

Corollary 1.3.4 [Segre (1945)] Let ρ ≥ 0 and ω ∈ Ω. Then there are infinitely many rational numbers p/q with p < q and g.c.d.(p, q) = 1 satisfying the inequalities

\[ -\frac{\rho}{\sqrt{1 + 4\rho}} \frac{1}{q^2} < \omega - \frac{p}{q} < \frac{1}{\sqrt{1 + 4\rho}} \frac{1}{q^2}. \]


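Remark. Tong (1994) proved the optimal version of Theorem 1.3.2 by showing that for any ω ∈ Ω and n ∈ N+ we have

\[ \min(\Theta_{n-1}, \Theta_n, \Theta_{n+1}) < \frac{1}{\sqrt{(a_{n+1} + |\tau^{n+1} - s_n|)^2 + 4}} \]

and

\[ \max(\Theta_{n-1}, \Theta_n, \Theta_{n+1}) > \frac{1}{\sqrt{(a_{n+1} - |\tau^{n+1} - s_n|)^2 + 4}}. \]

□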

1.3.3 Extended random variables

It is well known [see, e.g., Doob (1953, p. 456)] that a doubly infinite version of (a_n)_{n∈N+} under γ (i.e., when the process is a strictly stationary one, see Theorem 1.2.1) should exist on a richer probability space. It is possible to construct it effectively by using the natural extension τ̄ as follows. Define extended incomplete quotients ā_ℓ, ℓ ∈ Z, on Ω² by

\[ \bar a_{\ell+1}(\omega, \theta) = \bar a_1(\bar\tau^{\ell}(\omega, \theta)), \quad \ell \in \mathbb{Z}, \]

with

\[ \bar a_1(\omega, \theta) = a_1(\omega), \quad (\omega, \theta) \in \Omega^2. \]

Clearly, by (1.3.1′) and (1.3.2′) we have

\[ \bar a_n(\omega, \theta) = a_n(\omega), \quad \bar a_0(\omega, \theta) = a_1(\theta), \quad \bar a_{-n}(\omega, \theta) = a_{n+1}(\theta), \quad n \in \mathbb{N}_+,\ (\omega, \theta) \in \Omega^2. \]

Similarly to the interpretation of the a_n, n ∈ N+, in Subsection 1.2.1, we can consider the ā_ℓ, ℓ ∈ Z, as N+-valued random variables on (I², B_I^2) which are defined µ-a.s. in I² for any probability measure µ on B_I^2 assigning measure 0 to I²\Ω². (Such a µ is clearly γ̄.) Alternatively, we can look at the ā_ℓ, ℓ ∈ Z, as N+ ∪ {∞}-valued random variables which are defined everywhere in [0, 1)², as the a_n, n ∈ N+, can be defined everywhere in [0, 1) (cf. Subsection 1.2.1). In the latter case a typical trajectory of (ā_ℓ)_{ℓ∈Z} is either

- a doubly infinite sequence of natural numbers;
- a doubly infinite sequence of elements of N+ ∪ {∞} in which the natural numbers appear finitely many times in consecutive positions;
- a doubly infinite sequence of elements of N+ ∪ {∞} in which the natural numbers appear in consecutive positions from a certain rank on or up to a certain rank.


The distinction between the two cases is again immaterial.

Since τ̄ preserves γ̄, the doubly infinite sequence (ā_ℓ)_{ℓ∈Z} is strictly stationary under γ̄. It is indeed a doubly infinite version of (a_n)_{n∈N+} under γ, that is, the distribution of (ā_h, · · · , ā_{h+m}) under γ̄ and that of (a_k, · · · , a_{k+m}) under γ are identical for any h ∈ Z, m ∈ N, and k ∈ N+.

The probability structure of (ā_ℓ)_{ℓ∈Z} under γ̄ is described by Corollary 1.3.6 to Theorem 1.3.5 below. The latter also brings to light an important family of probability measures on B_I, to be called conditional, which we shall consider in some detail in the next subsection.

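Theorem 1.3.5 For any x ∈ I we have

\[ \bar\gamma([0, x] \times I \mid \bar a_0, \bar a_{-1}, \cdots) = \frac{(a + 1) x}{a x + 1} \quad \bar\gamma\text{-a.s.}, \]

where a = [ā_0, ā_{−1}, · · · ].

Proof. As is well known,

\[ \bar\gamma([0, x] \times I \mid \bar a_0, \bar a_{-1}, \cdots) = \lim_{n \to \infty} \bar\gamma([0, x] \times I \mid \bar a_0, \cdots, \bar a_{-n}) \quad \bar\gamma\text{-a.s.} \]

For typographical convenience let us denote by I_n the fundamental interval I(ā_0, · · · , ā_{−n}) for any arbitrarily fixed values of the ā_i, i = 0, −1, . . . , −n. Then we have

\[ \begin{aligned}
\bar\gamma([0, x] \times I \mid \bar a_0, \cdots, \bar a_{-n}) &= \frac{\bar\gamma([0, x] \times I_n)}{\bar\gamma(I \times I_n)} \\
&= \frac{(\log 2)^{-1} \displaystyle\int_{I_n} dy \int_0^x \frac{du}{(uy + 1)^2}}{\gamma(I_n)} = \frac{(\log 2)^{-1} \displaystyle\int_{I_n} \frac{x (y + 1)}{xy + 1} \frac{dy}{y + 1}}{\gamma(I_n)} \\
&= \frac{\displaystyle\int_{I_n} \frac{x (y + 1)}{xy + 1}\, \gamma(dy)}{\gamma(I_n)} = \frac{x (y_n + 1)}{x y_n + 1}
\end{aligned} \]

for some y_n ∈ I_n. Since

\[ \lim_{n \to \infty} y_n = [\bar a_0, \bar a_{-1}, \cdots] = a, \]

the proof is complete. □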


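Corollary 1.3.6 For any $i\in\mathbb{N}_+$ we have

$$\bar\gamma\left(\bar a_1 = i \mid \bar a_0,\bar a_{-1},\cdots\right) = P_i(a) \quad \bar\gamma\text{-a.s.},$$

where $a = [\bar a_0,\bar a_{-1},\cdots]$ and the functions $P_i$, $i\in\mathbb{N}_+$, are defined by (1.2.13).

Proof. We have

$$(\bar a_1 = i) = \begin{cases} \left(\frac12,1\right)\times[0,1) & \text{if } i = 1,\\[4pt] \left(\frac{1}{i+1},\frac1i\right]\times[0,1) & \text{if } i\ge 2.\end{cases}$$

Hence the conditional probability in the statement is $\bar\gamma$-a.s. equal to

$$\frac{(a+1)/i}{1+a/i} - \frac{(a+1)/(i+1)}{1+a/(i+1)} = P_i(a).$$

□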
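The last step of the proof can be checked in exact rational arithmetic. The sketch below assumes the closed form $P_i(a) = (a+1)/((a+i)(a+i+1))$, which is what the displayed difference reduces to; the function names `P` and `gamma_a_cdf` and the sample value of `a` are ours, chosen only for illustration.

```python
from fractions import Fraction


def P(i, a):
    """P_i(a) = (a + 1) / ((a + i)(a + i + 1)), the assumed closed form of (1.2.13)."""
    return (a + 1) / ((a + i) * (a + i + 1))


def gamma_a_cdf(a, x):
    """Conditional distribution function (a + 1) x / (a x + 1) from Theorem 1.3.5."""
    return (a + 1) * x / (a * x + 1)


a = Fraction(3, 7)  # arbitrary sample value of a in [0, 1]

# The mass of (1/(i+1), 1/i] under the conditional law equals P_i(a).
for i in range(1, 6):
    mass = gamma_a_cdf(a, Fraction(1, i)) - gamma_a_cdf(a, Fraction(1, i + 1))
    assert mass == P(i, a)

# The partial sums telescope: sum_{i <= n} P_i(a) = 1 - (a + 1)/(a + n + 1),
# so the P_i(a), i >= 1, form a probability distribution.
n = 50
partial = sum(P(i, a) for i in range(1, n + 1))
assert partial == 1 - (a + 1) / (a + n + 1)
```

Working with `Fraction` avoids rounding, so the assertions test the algebraic identities themselves rather than a floating-point approximation.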

Remarks. 1. The strict stationarity of $(\bar a_\ell)_{\ell\in\mathbb{Z}}$ under $\bar\gamma$ implies that the conditional probability

$$\bar\gamma\left(\bar a_{\ell+1} = i \mid \bar a_\ell,\bar a_{\ell-1},\cdots\right), \quad i\in\mathbb{N}_+,$$

does not depend on $\ell\in\mathbb{Z}$ and is $\bar\gamma$-a.s. equal to $P_i(a)$, where $a = [\bar a_\ell,\bar a_{\ell-1},\cdots]$. Thus Proposition 1.2.7 and Corollary 1.3.6 provide interpretations of $P_i(x)$ for all $x\in[0,1)$.

2. The process $(\bar a_\ell)_{\ell\in\mathbb{Z}}$ is an example of what is called an infinite-order chain in the theory of dependence with complete connections, see Section 5.5 in Iosifescu and Grigorescu (1990). The existence of such chains is not obvious. To ensure the existence several restrictions should be imposed. See, e.g., Theorems 5.5.1 and 5.5.2 in Iosifescu and Grigorescu (op. cit.). The latter refers to $\mathbb{N}_+$-valued infinite-order chains and makes explicit use of the continued fraction expansion. The simple effective construction of $(\bar a_\ell)_{\ell\in\mathbb{Z}}$ on the probability space $(I^2,\mathcal{B}_I^2,\bar\gamma)$ fully clarifies an idea of Wolfgang Doeblin [see Doeblin (1940)], who was the first to use dependence with complete connections in the metric theory of the continued fraction expansion. □

Note that by its very construction $(\bar a_\ell)_{\ell\in\mathbb{Z}}$ is a reversible process, that is, the finite dimensional distributions under $\bar\gamma$ of $(\bar a_\ell)_{\ell\in\mathbb{Z}}$ and $(\bar a_{-\ell})_{\ell\in\mathbb{Z}}$ are identical. A similar property holds for $(a_n)_{n\in\mathbb{N}_+}$ under $\gamma$, as is shown by the following result.

Proposition 1.3.7 The random sequence $(a_n)_{n\in\mathbb{N}_+}$ on $(I,\mathcal{B}_I,\gamma)$ is reversible, i.e., the distributions of $(a_\ell : m\le\ell\le n)$ and $(a_{m+n-\ell} : m\le\ell\le n)$ are identical for any $m,n\in\mathbb{N}_+$, $m\le n$.

Proof. By the strict stationarity under $\bar\gamma$ of $(\bar a_\ell)_{\ell\in\mathbb{Z}}$, the distribution of $(\bar a_\ell : m\le\ell\le n)$ is identical with the distribution of $(\bar a_{\ell-m-n+1} : m\le\ell\le n)$ (both under $\bar\gamma$). But by the very definition of $(\bar a_\ell)_{\ell\in\mathbb{Z}}$ the first distribution is identical with that of $(a_\ell : m\le\ell\le n)$ while the second one is identical with that of $(a_{m+n-\ell} : m\le\ell\le n)$ (both under $\gamma$). □

Remark. The result stated in Proposition 1.3.7 amounts to the fact that the $\gamma$-measures of the fundamental intervals $I(i_1,\cdots,i_n)$ and $I(i_n,\cdots,i_1)$ are equal for any $n\in\mathbb{N}_+$ and $i_1,\cdots,i_n\in\mathbb{N}_+$. This can also be proved by direct computation using results from Subsection 1.2.3. See Philipp (1967) and Dürner (1992). □

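Define extended associated random variables $\bar s_\ell$, $\bar y_\ell$, $\bar r_\ell$ and $\bar u_\ell$ as

$$\bar s_\ell = [\bar a_\ell,\bar a_{\ell-1},\cdots], \quad \bar y_\ell = \frac{1}{\bar s_\ell},$$

$$\bar r_\ell = [\bar a_\ell;\bar a_{\ell+1},\bar a_{\ell+2},\cdots], \quad \bar u_\ell = \bar s_{\ell-1}+\bar r_\ell, \quad \ell\in\mathbb{Z}.$$

Clearly,

$$\bar s_\ell = \bar s_0\circ\bar\tau^{\,\ell}, \quad \bar y_\ell = \bar y_0\circ\bar\tau^{\,\ell},$$

$$\bar r_\ell = \bar r_0\circ\bar\tau^{\,\ell}, \quad \bar u_\ell = \bar s_0\circ\bar\tau^{\,\ell-1}+\bar r_1\circ\bar\tau^{\,\ell-1}, \quad \ell\in\mathbb{Z}.$$

It follows from the above equations, Theorem 1.3.1, and Corollary 1.3.6 that $(\bar s_\ell)_{\ell\in\mathbb{Z}}$ is a strictly stationary $\Omega$-valued Markov process on $(I^2,\mathcal{B}_I^2,\bar\gamma)$ with the following transition mechanism: from state $s\in\Omega$ the possible transitions are to any state $1/(s+i)$ with corresponding transition probability $P_i(s)$, $i\in\mathbb{N}_+$. Clearly, for any $\ell\in\mathbb{Z}$ we have

$$\bar\gamma(\bar s_\ell<x) = \bar\gamma(\bar s_0<x) = \bar\gamma(I\times[0,x]) = \gamma([0,x]), \quad x\in I.$$

Similar considerations can be made about the process $(\bar y_\ell)_{\ell\in\mathbb{Z}}$. This is a strictly stationary $\Omega'$-valued Markov process on $(I^2,\mathcal{B}_I^2,\bar\gamma)$, where $\Omega'$ = the set of irrationals in $[1,\infty)$. The transition mechanism of $(\bar y_\ell)_{\ell\in\mathbb{Z}}$ is as follows: from state $y\in\Omega'$ the only possible transitions are to any state $y^{-1}+i$ with corresponding transition probability $P_i(1/y)$, $i\in\mathbb{N}_+$. For any $\ell\in\mathbb{Z}$ we have

$$\bar\gamma(\bar y_\ell<x) = \bar\gamma(\bar y_0<x) = \gamma([x^{-1},1]) = \gamma'([1,x]), \quad x\in[1,\infty),$$

where $\gamma'$ is the probability measure on $\mathcal{B}_{[1,\infty)}$ defined by

$$\gamma'(A') = \frac{1}{\log 2}\int_{A'}\frac{dy}{y(y+1)}, \quad A'\in\mathcal{B}_{[1,\infty)}.$$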

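Next, the process $(\bar r_\ell)_{\ell\in\mathbb{Z}}$ is a strictly stationary $\Omega'$-valued ‘deterministic’ Markov process on $(I^2,\mathcal{B}_I^2,\bar\gamma)$ in which state $r\in\Omega'$ is followed by state $1/(r-\lfloor r\rfloor) = 1/\tau(r^{-1})$. Obviously, for any $\ell\in\mathbb{Z}$ we have

$$\bar\gamma(\bar r_\ell<x) = \bar\gamma(\bar r_1<x) = \gamma(r_1<x) = \gamma'([1,x)), \quad x\in[1,\infty).$$

Note that by the reversibility of $(\bar a_\ell)_{\ell\in\mathbb{Z}}$ the finite-dimensional distributions under $\bar\gamma$ of $(\bar s_\ell)_{\ell\in\mathbb{Z}}$ and $(\bar r_\ell^{-1})_{\ell\in\mathbb{Z}}$ are identical.

Finally, the process $(\bar s_{\ell-1},\bar r_\ell^{-1})_{\ell\in\mathbb{Z}}$ is a strictly stationary $\Omega^2$-valued ‘deterministic’ Markov process on $(I^2,\mathcal{B}_I^2,\bar\gamma)$ in which state $(s,\omega)\in\Omega^2$ is followed by state

$$\bar\tau^{-1}(s,\omega) = \left(\frac{1}{s+\lfloor\omega^{-1}\rfloor},\ \omega^{-1}-\lfloor\omega^{-1}\rfloor\right).$$

For any $\ell\in\mathbb{Z}$ we have

$$\bar\gamma\left(\bar s_{\ell-1}<x,\ \bar r_\ell^{-1}<y\right) = \bar\gamma\left(\bar s_0<x,\ \bar r_1^{-1}<y\right) = \bar\gamma([0,y]\times[0,x])$$

$$= \frac{1}{\log 2}\int_0^y\!\!\int_0^x\frac{du\,dv}{(uv+1)^2} = \frac{\log(xy+1)}{\log 2}, \quad x,y\in I.$$

The process $(\bar u_\ell)_{\ell\in\mathbb{Z}}$, which is a functional of $(\bar s_{\ell-1},\bar r_\ell^{-1})_{\ell\in\mathbb{Z}}$ (note that $\bar u_\ell = \bar s_{\ell-1}+\bar r_\ell$, $\ell\in\mathbb{Z}$), is no longer Markovian but is still a strictly stationary one. For any $\ell\in\mathbb{Z}$ we have

$$\bar\gamma(\bar u_\ell<x) = \bar\gamma(\bar u_1<x) = \bar\gamma(\bar s_0+\bar r_1<x) = \frac{1}{\log 2}\iint_D\frac{du\,dv}{(uv+1)^2}, \quad x\in[1,\infty),$$

where $D = \left\{(u,v)\in I^2 : u+v^{-1}<x\right\}$. Hence

$$\bar\gamma(\bar u_\ell<x) = \begin{cases}\dfrac{1}{\log 2}\left(\log x-\dfrac{x-1}{x}\right) & \text{if } 1\le x\le 2,\\[8pt] \dfrac{1}{\log 2}\left(\log 2-\dfrac1x\right) & \text{if } x\ge 2.\end{cases}$$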
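As a sanity check on the last display, the double integral over $D$ can be compared with the closed form by a crude quadrature. The following is only a sketch: the midpoint rule on a uniform grid, the grid size `n`, and the function names are our choices, and the indicator of $D$ limits the accuracy to roughly $1/n$.

```python
import math


def u_cdf_closed_form(x):
    """Closed form of the distribution function of u_l under the extended Gauss measure, x >= 1."""
    if x <= 2:
        return (math.log(x) - (x - 1) / x) / math.log(2)
    return (math.log(2) - 1 / x) / math.log(2)


def u_cdf_quadrature(x, n=300):
    """Midpoint-rule estimate of (1/log 2) * integral over D of du dv / (uv + 1)^2,
    where D = {(u, v) in [0, 1]^2 : u + 1/v < x}."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            if u + 1.0 / v < x:
                total += 1.0 / (u * v + 1.0) ** 2
    return total * h * h / math.log(2)


for x in (1.25, 1.5, 2.0, 3.0, 10.0):
    print(x, u_cdf_closed_form(x), u_cdf_quadrature(x))
```

The two columns should agree to about two decimal places; the two branches of the closed form coincide at $x = 2$, where both equal $(\log 2 - 1/2)/\log 2$.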


1.3.4 The conditional probability measures

Motivated by Theorem 1.3.5 we shall consider the family of (conditional) probability measures $(\gamma_a)_{a\in I}$ on $\mathcal{B}_I$ defined by their distribution functions

$$\gamma_a([0,x]) = \frac{(a+1)x}{ax+1}, \quad x\in I,\ a\in I.$$

In particular, $\gamma_0 = \lambda$. The density $h_a$ of $\gamma_a$ is

$$h_a(x) = \frac{a+1}{(ax+1)^2}, \quad x\in I,\ a\in I,$$

and [see, e.g., Billingsley (1968, p. 224)] we then have

$$\sup_{A\in\mathcal{B}_I}|\gamma_a(A)-\gamma_b(A)| = \frac12\int_I|h_a(x)-h_b(x)|\,dx$$

$$= \frac12|b-a|\int_I\frac{\left|(ab+a+b)x^2+2x-1\right|}{(ax+1)^2(bx+1)^2}\,dx$$

$$= |b-a|\int_0^\alpha\frac{1-2x-(ab+a+b)x^2}{(ax+1)^2(bx+1)^2}\,dx$$

$$= \frac{\alpha(1-\alpha)|b-a|}{(\alpha a+1)(\alpha b+1)},$$

where $\alpha = \left(1+\sqrt{(a+1)(b+1)}\right)^{-1}$, $a,b\in I$. Hence

$$\sup_{A\in\mathcal{B}_I}|\gamma_a(A)-\gamma_b(A)| \le \frac14|b-a|, \quad a,b\in I. \tag{1.3.19}$$

It is easy to see that we also have

$$\sup_{x\in I}|\gamma_a([0,x])-\gamma_b([0,x])| = \frac{\alpha(1-\alpha)|b-a|}{(1+\alpha a)(1+\alpha b)}, \quad a,b\in I.$$
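The closed form for the total variation distance and the bound (1.3.19) lend themselves to a quick numerical check. In the sketch below the helper names and the midpoint quadrature are ours; only the formulas for $h_a$ and $\alpha$ are taken from the text.

```python
import math


def h(a, x):
    """Density of gamma_a: h_a(x) = (a + 1) / (a x + 1)^2."""
    return (a + 1) / (a * x + 1) ** 2


def tv_closed_form(a, b):
    """alpha (1 - alpha) |b - a| / ((alpha a + 1)(alpha b + 1)),
    with alpha = 1 / (1 + sqrt((a + 1)(b + 1)))."""
    alpha = 1.0 / (1.0 + math.sqrt((a + 1) * (b + 1)))
    return alpha * (1 - alpha) * abs(b - a) / ((alpha * a + 1) * (alpha * b + 1))


def tv_quadrature(a, b, n=20000):
    """(1/2) * integral over [0, 1] of |h_a - h_b|, by the midpoint rule."""
    step = 1.0 / n
    return 0.5 * step * sum(
        abs(h(a, (k + 0.5) * step) - h(b, (k + 0.5) * step)) for k in range(n)
    )


for a, b in [(0.0, 1.0), (0.2, 0.9), (0.5, 0.5)]:
    print(a, b, tv_closed_form(a, b), tv_quadrature(a, b), abs(b - a) / 4)
```

For $a = 0$, $b = 1$ one gets $\alpha = \sqrt2-1$ and the distance equals $3-2\sqrt2 = 0.17157\cdots$, comfortably below the bound $1/4$.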

For any $a\in I$ put $s_0^a = a$ and

$$s_n^a = \frac{1}{s_{n-1}^a+a_n}, \quad n\in\mathbb{N}_+. \tag{1.3.20}$$

It follows from the properties just described of the process $(\bar s_\ell)_{\ell\in\mathbb{Z}}$ that the sequence $(s_n^a)_{n\in\mathbb{N}_+}$ is an $I$-valued Markov chain on $(I,\mathcal{B}_I,\gamma_a)$ which starts at $s_0^a = a$ and has the following transition mechanism: from state $s\in I$ the possible transitions are to any state $1/(s+i)$ with corresponding transition probability $P_i(s)$, $i\in\mathbb{N}_+$. [Strictly speaking, this only holds for any $a\in E\subset\Omega$, for some $E\in\mathcal{B}_I$ with $\lambda(E) = 1$, as $(s_n^a)_{n\in\mathbb{N}}$ under $\gamma_a$ is a version of $(\bar s_n)_{n\in\mathbb{N}}$ under $\bar\gamma(\,\cdot\mid\bar s_0 = a)$, $a\in E$. The validity of the above assertion for the remaining $a\in I\setminus E$ follows by continuity on account of (1.3.19).]

Proposition 1.3.8 (Generalized Broden–Borel–Levy formula) For any $a\in I$ and $n\in\mathbb{N}_+$ we have

$$\gamma_a\left(\tau^n<x \mid a_1,\cdots,a_n\right) = \frac{(s_n^a+1)x}{s_n^ax+1}, \quad x\in I. \tag{1.3.21}$$

Proof. For any $n\in\mathbb{N}_+$ and $x\in I$ consider the conditional probability

$$\bar\gamma\left(\bar\tau^{-n}([0,x]\times I) \mid \bar a_n,\cdots,\bar a_1,\bar a_0,\bar a_{-1},\cdots\right). \tag{1.3.22}$$

Put $a = [\bar a_0,\bar a_{-1},\cdots]$ (actually, $a(\omega,\theta) = \theta$, $(\omega,\theta)\in\Omega^2$) and note that

$$[\bar a_n,\cdots,\bar a_1,\bar a_0,\bar a_{-1},\cdots] = s_n^a.$$

On the one hand, it follows from Theorems 1.3.1 and 1.3.5 (see also Remark 1 after Corollary 1.3.6) that the conditional probability (1.3.22) is $\bar\gamma$-a.s. equal to

$$\frac{(s_n^a+1)x}{s_n^ax+1}.$$

On the other hand, putting

$$\bar\gamma_a(\,\cdot\,) = \bar\gamma\left(\,\cdot\mid\bar a_0,\bar a_{-1},\cdots\right),$$

it is clear that (1.3.22) is $\bar\gamma$-a.s. equal to

$$\frac{\bar\gamma_a\left(\bar\tau^{-n}([0,x]\times I)\cap(I(a_1,\cdots,a_n)\times I)\right)}{\bar\gamma_a\left(I(a_1,\cdots,a_n)\times I\right)}. \tag{1.3.23}$$

Since $\bar\tau^{-n}([0,x]\times I) = \tau^{-n}([0,x])\times I$ and $\bar\gamma_a(A\times I) = \gamma_a(A)$, $A\in\mathcal{B}_I$, the fraction in (1.3.23) is equal to

$$\gamma_a\left(\tau^{-n}([0,x]) \mid I(a_1,\cdots,a_n)\right) = \gamma_a\left(\tau^n<x \mid a_1,\cdots,a_n\right).$$

Therefore (1.3.21) holds for any $a\in E\subset\Omega$, for some $E\in\mathcal{B}_I$ with $\lambda(E) = 1$, hence by continuity [use (1.3.19)] for the remaining $a\in I\setminus E$. □


Remark. Equation (1.3.21) can also be proved by direct computation (cf. the proof of Corollary 1.2.6). □

Corollary 1.3.9 For any $a\in I$ and $n\in\mathbb{N}_+$ we have

$$\gamma_a\left(A \mid a_1,\cdots,a_n\right) = \gamma_{s_n^a}\left(\tau^n(A)\right) \tag{1.3.24}$$

whatever the set $A$ belonging to the σ-algebra generated by the random variables $a_{n+1},a_{n+2},\cdots$, that is, $\tau^{-n}(\mathcal{B}_I)$.

We now give a generalization of Proposition 1.2.9, where Lebesgue measure $\lambda\ (=\gamma_0)$ is replaced by $\gamma_a$, $a\in I$. Define first the random variables $u_n^a$ as

$$u_n^a = s_{n-1}^a+r_n, \quad n\in\mathbb{N}_+,\ a\in I.$$

Proposition 1.3.10 For any $a\in I$, $n\in\mathbb{N}_+$, and $x\ge 1$ we have

$$\gamma_a(r_1<x) = 1-\frac{a+1}{x+a},$$

$$\gamma_a(u_1^a<x) = \begin{cases}0 & \text{if } x\le a+1,\\[4pt] 1-\dfrac{a+1}{x} & \text{if } x>a+1,\end{cases}$$

$$\gamma_a\left(r_{n+1}<x \mid a_1,\ldots,a_n\right) = 1-\frac{s_n^a+1}{x+s_n^a},$$

$$\gamma_a\left(u_{n+1}^a<x \mid a_1,\ldots,a_n\right) = \begin{cases}0 & \text{if } x\le s_n^a+1,\\[4pt] 1-\dfrac{s_n^a+1}{x} & \text{if } x>s_n^a+1.\end{cases}$$

The proof is entirely similar to that of Proposition 1.2.9. □

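Corollary 1.3.11 For any $a\in I$ and $n\in\mathbb{N}_+$ let $G_n^a(s) = \gamma_a(s_n^a<s)$, $s\in\mathbb{R}$, and let $G_0^a(s) = 0$ or $1$ according as $s\le a$ or $s>a$. For any $a\in I$, $n\in\mathbb{N}_+$, and $x\ge 1$ we have

$$\gamma_a(r_n<x) = \int_0^1\frac{x-1}{s+x}\,dG_{n-1}^a(s) = (x-1)\left(\frac{1}{x+1}+\int_0^1\frac{G_{n-1}^a(s)\,ds}{(s+x)^2}\right),$$

$$\gamma_a(u_n^a<x) = \begin{cases}\displaystyle\int_0^{x-1}\left(1-\frac{s+1}{x}\right)dG_{n-1}^a(s) & \text{if } 1\le x\le 2,\\[10pt] \displaystyle\int_0^1\left(1-\frac{s+1}{x}\right)dG_{n-1}^a(s) & \text{if } x>2\end{cases}$$

$$= \frac1x\int_0^{x-1}G_{n-1}^a(s)\,ds.$$

Equations similar to (1.2.19) and (1.2.20) hold, too.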

1.3.5 Paul Levy’s solution to Gauss’ problem

We now present the elegant solution given by Levy (1929) to Gauss’ problem. Actually, as Levy has done in the case $a = 0$, we shall obtain estimates for both ‘errors’ $F_n^a-G$ and $G_n^a-G$, $a\in I$, $n\in\mathbb{N}$, where

$$F_n^a(x) = \gamma_a(\tau^n<x), \quad x\in I, \qquad G_n^a(s) = \gamma_a(s_n^a<s), \quad s\in\mathbb{R},$$

and $G(s) = 0$, $\gamma([0,s])$, or $1$ according as $s<0$, $s\in I$, or $s>1$.

It follows from Corollary 1.3.11 that

$$F_n^a(x) = \int_0^1\frac{x(s+1)}{xs+1}\,dG_n^a(s) \tag{1.3.25}$$

for any $a,x\in I$ and $n\in\mathbb{N}$. It is easy to check that

$$G(x) = \int_0^1\frac{x(s+1)}{xs+1}\,dG(s), \quad x\in I, \tag{1.3.26}$$

and

$$G_{n+1}^a\left(\frac1m\right) = F_n^a\left(\frac1m\right), \quad m,n\in\mathbb{N}_+,\ a\in I. \tag{1.3.27}$$

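The last equation is still valid for $n = 0$ and $a\ne 0$ while

$$G_1^0\left(\frac1m\right) = F_0^0\left(\frac{1}{m+1}\right) = \frac{1}{m+1}, \quad m\in\mathbb{N}_+. \tag{1.3.27′}$$

Since $(s_n^a)_{n\in\mathbb{N}}$ is a Markov chain on $(I,\mathcal{B}_I,\gamma_a)$ (see the preceding subsection), for any $m,n\in\mathbb{N}_+$, $a\in I$, and $\theta\in[0,1)$ we have

$$G_{n+1}^a\left(\frac1m\right)-G_{n+1}^a\left(\frac{1}{m+\theta}\right) = \gamma_a\left(\frac{1}{m+\theta}\le s_{n+1}^a<\frac1m\right)$$

$$= E\left(\gamma_a\left(\frac{1}{m+\theta}\le s_{n+1}^a<\frac1m\ \Big|\ s_n^a\right)\right) = \int_0^\theta P_m(s)\,dG_n^a(s) \tag{1.3.28}$$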

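while

$$G_1^a\left(\frac1m\right)-G_1^a\left(\frac{1}{m+\theta}\right) = \gamma_a\left(\frac{1}{m+\theta}\le\frac{1}{a_1+a}<\frac1m\right) = \int_{0+}^\theta P_m(s)\,dG_0^a(s), \tag{1.3.28′}$$

that is, (1.3.28) also holds for $n = 0$ if $a\ne 0$.

It is easy to check that

$$\int_0^\theta P_m(s)\,dG(s) = G\left(\frac1m\right)-G\left(\frac{1}{m+\theta}\right) \tag{1.3.29}$$

for any $m\in\mathbb{N}_+$ and $\theta\in[0,1)$.

Now, by (1.3.25) and (1.3.26) we have

$$F_n^a(x)-G(x) = \int_0^1\frac{x(s+1)}{xs+1}\,d\left(G_n^a(s)-G(s)\right) = -\int_0^1\left(G_n^a(s)-G(s)\right)\frac{\partial}{\partial s}\left(\frac{x(s+1)}{xs+1}\right)ds$$

for any $a,x\in I$ and $n\in\mathbb{N}$. Setting

$$\alpha_n^a = \sup_{s\in I}\left|G_n^a(s)-G(s)\right|, \quad a\in I,\ n\in\mathbb{N},$$

we obtain

$$\left|F_n^a(x)-G(x)\right| \le \alpha_n^a\int_0^1\frac{x(1-x)}{(xs+1)^2}\,ds = \frac{x(1-x)}{x+1}\,\alpha_n^a,$$

hence

$$\left|F_n^a(x)-G(x)\right| \le (3-2\sqrt2)\,\alpha_n^a \tag{1.3.30}$$

for any $a,x\in I$ and $n\in\mathbb{N}$. Let us note that

$$\alpha_0^a = \max\left(G(a),\,1-G(a)\right), \quad a\in I.$$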

Theorem 1.3.12 For any $n\in\mathbb{N}_+$ and $a\in I$ we have

$$\sup_{x\in I}\left|F_n^a(x)-G(x)\right| \le \frac12(3-2\sqrt2)(3.5-2\sqrt2)^{n-1},$$

$$\sup_{x\in I}\left|G_n^a(x)-G(x)\right| \le \frac12(3.5-2\sqrt2)^{n-1}.$$

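Proof. By (1.3.27) through (1.3.30), for any $m,n\in\mathbb{N}_+$, $a\in I$, and $\theta\in[0,1)$ (also for $n = 0$ and any $m\in\mathbb{N}_+$, $a\in(0,1]$, and $\theta\in[0,1)$) we have

$$\left|G_{n+1}^a\left(\frac{1}{m+\theta}\right)-G\left(\frac{1}{m+\theta}\right)\right|$$

$$\le \left|G_{n+1}^a\left(\frac1m\right)-G\left(\frac1m\right)\right| + \left|G_{n+1}^a\left(\frac1m\right)-G_{n+1}^a\left(\frac{1}{m+\theta}\right)-G\left(\frac1m\right)+G\left(\frac{1}{m+\theta}\right)\right|$$

$$= \left|F_n^a\left(\frac1m\right)-G\left(\frac1m\right)\right| + \left|\int_0^\theta P_m(s)\,d\left(G_n^a(s)-G(s)\right)\right|$$

$$\le (3-2\sqrt2)\,\alpha_n^a + \left|\int_0^\theta\left(G(s)-G_n^a(s)\right)dP_m(s) + P_m(\theta)\left(G_n^a(\theta)-G(\theta)\right)\right|$$

$$\le \left(3-2\sqrt2+\beta(m,\theta)\right)\alpha_n^a,$$

where

$$\beta(m,\theta) = \int_0^\theta\left|\frac{dP_m(s)}{ds}\right|ds + P_m(\theta).$$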

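It is easy to check that $\beta(m,\theta)\le 1/2$ for any $m\in\mathbb{N}_+$ and $\theta\in[0,1)$. Actually,

$$\beta(m,\theta) = \begin{cases}1/2 & \text{if } m = 1,\\[4pt] 4/(3+\theta)-2/(2+\theta)-1/6 & \text{if } m = 2 \text{ and } \theta\le\sqrt2-1,\\[4pt] 6-4\sqrt2-1/6 & \text{if } m = 2 \text{ and } \theta\ge\sqrt2-1,\\[4pt] 2P_m(\theta)-1/(m(m+1)) & \text{if } m\ge 3.\end{cases}$$

Hence

$$\alpha_{n+1}^a = \sup_{m\in\mathbb{N}_+,\ \theta\in[0,1)}\left|G_{n+1}^a\left(\frac{1}{m+\theta}\right)-G\left(\frac{1}{m+\theta}\right)\right| \le (3.5-2\sqrt2)\,\alpha_n^a \tag{1.3.31}$$

for any $a\in I$ and $n\in\mathbb{N}_+$.

Finally, by (1.3.27), (1.3.27′), and (1.3.28′),

$$G_1^0\left(\frac{1}{m+\theta}\right) = G_1^0\left(\frac1m\right) = \frac{1}{m+1}$$

and

$$G_1^a\left(\frac{1}{m+\theta}\right) = G_1^a\left(\frac1m\right)-\int_0^\theta P_m(s)\,dG_0^a(s)$$

$$= \begin{cases}F_0^a\left(\dfrac1m\right)-P_m(a) & \text{if } a\le\theta,\\[8pt] F_0^a\left(\dfrac1m\right) & \text{if } a>\theta\end{cases} \;=\; \begin{cases}\dfrac{a+1}{a+m+1} & \text{if } a\le\theta,\\[8pt] \dfrac{a+1}{a+m} & \text{if } a>\theta\end{cases}$$

for any $a\in(0,1]$, $\theta\in[0,1)$, and $m\in\mathbb{N}_+$. (Indeed, for $a\in(0,1]$ the event $1/(m+\theta)\le 1/(a_1+a)<1/m$ in (1.3.28′) occurs precisely when $a_1 = m$ and $a\le\theta$, and has then $\gamma_a$-probability $P_m(a)$.) It is easy to see that

$$\alpha_1^a = \sup_{m\in\mathbb{N}_+,\ \theta\in[0,1)}\left|G_1^a\left(\frac{1}{m+\theta}\right)-G\left(\frac{1}{m+\theta}\right)\right| \le \frac12, \quad a\in I. \tag{1.3.32}$$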

It follows from (1.3.31) and (1.3.32) that

$$\alpha_n^a \le \frac12(3.5-2\sqrt2)^{n-1}, \quad n\in\mathbb{N}_+,\ a\in I.$$

By (1.3.30) the proof is complete. □

Theorem 1.3.12 shows that both $F_n^a$ and $G_n^a$ converge very fast to Gauss’ distribution function $G$. Actually, the convergence is even considerably faster. See Corollary 2.3.6 and Theorem 2.5.5.
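The two constants entering the proof of Theorem 1.3.12 can be checked numerically. The sketch below assumes the closed form $P_m(s) = (s+1)/((s+m)(s+m+1))$ for the functions defined by (1.2.13); the grid sizes and helper names are ours, and the variation of $P_m$ is approximated from below on a uniform grid.

```python
import math


def P(m, s):
    """P_m(s) = (s + 1) / ((s + m)(s + m + 1)), the assumed closed form of (1.2.13)."""
    return (s + 1) / ((s + m) * (s + m + 1))


def beta(m, theta, n=500):
    """beta(m, theta) = variation of P_m over [0, theta] plus P_m(theta),
    with the variation approximated on a uniform grid of n steps."""
    grid = [theta * k / n for k in range(n + 1)]
    variation = sum(abs(P(m, grid[k + 1]) - P(m, grid[k])) for k in range(n))
    return variation + P(m, theta)


# sup over x in [0, 1] of x (1 - x) / (x + 1): the constant 3 - 2 sqrt(2) in (1.3.30)
sup_130 = max(x * (1 - x) / (x + 1) for x in (k / 10**5 for k in range(10**5 + 1)))
print(sup_130, 3 - 2 * math.sqrt(2))

# largest value of beta(m, theta) over a grid of m and theta; the text asserts beta <= 1/2
beta_max = max(beta(m, t / 50) for m in range(1, 8) for t in range(50))
print(beta_max)

# the resulting bounds of Theorem 1.3.12 for the first few n
c = 3.5 - 2 * math.sqrt(2)
for n_ in range(1, 6):
    print(n_, 0.5 * (3 - 2 * math.sqrt(2)) * c ** (n_ - 1), 0.5 * c ** (n_ - 1))
```

The contraction factor $3.5-2\sqrt2$ in (1.3.31) is just the sum of the two constants, $(3-2\sqrt2)+1/2$.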

1.3.6 Mixing properties

We conclude this section by studying the ψ-mixing coefficients of $(a_n)_{n\in\mathbb{N}_+}$ under either $\gamma_a$, $a\in I$, or $\gamma$. Theorem 1.3.12 plays here an important part.

For any $k\in\mathbb{N}_+$ let $\mathcal{B}_1^k = \sigma(a_1,\cdots,a_k)$ and $\mathcal{B}_k^\infty = \sigma(a_k,a_{k+1},\cdots)$ denote the σ-algebras generated by the random variables $a_1,\cdots,a_k$, respectively, $a_k,a_{k+1},\cdots$. Clearly, $\mathcal{B}_1^k$ is the σ-algebra generated by the closures of the fundamental intervals of rank $k$ while $\mathcal{B}_k^\infty = \tau^{-k+1}(\mathcal{B}_I)$, $k\in\mathbb{N}_+$.

For any $\mu\in\mathrm{pr}(\mathcal{B}_I)$ consider the ψ-mixing coefficients (cf. Section A3.1)

$$\psi_\mu(n) = \sup\left|\frac{\mu(A\cap B)}{\mu(A)\mu(B)}-1\right|, \quad n\in\mathbb{N}_+,$$

where the supremum is taken over all $A\in\mathcal{B}_1^k$ and $B\in\mathcal{B}_{k+n}^\infty$ such that $\mu(A)\mu(B)\ne 0$, and $k\in\mathbb{N}_+$.

Define

$$\varepsilon_n = \sup\left|\frac{\gamma_a(B)}{\gamma(B)}-1\right|, \quad n\in\mathbb{N}_+,$$

where the supremum is taken over all $a\in I$ and $B\in\mathcal{B}_n^\infty$ with $\gamma(B)>0$. Note that the sequence $(\varepsilon_n)_{n\in\mathbb{N}_+}$ is non-increasing since $\mathcal{B}_{n+1}^\infty\subset\mathcal{B}_n^\infty$ for any $n\in\mathbb{N}_+$. We shall show that $\varepsilon_n$ can be expressed in terms of $F_{n-1}^a$, $a\in I$, and $G$, namely, $\varepsilon_n = \varepsilon_n'$ with

$$\varepsilon_n' = \sup_{a,x\in I}\left|\frac{dF_{n-1}^a(x)/dx}{g(x)}-1\right|, \quad n\in\mathbb{N}_+,$$

where $g(x) = G'(x) = (\log 2)^{-1}/(x+1)$, $x\in I$.

Indeed, by the very definition of $\varepsilon_n'$, for any $a,x\in I$ we have

$$\varepsilon_n'\,g(x) \ge \left|\frac{dF_{n-1}^a(x)}{dx}-g(x)\right|.$$

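By integrating the above inequality over $B\in\mathcal{B}_n^\infty$ we obtain

$$\gamma(B)\,\varepsilon_n' \ge \int_B\left|\frac{dF_{n-1}^a(x)}{dx}-g(x)\right|dx \ge \left|\int_B dF_{n-1}^a(x)-\int_B g(x)\,dx\right| = \left|\gamma_a(B)-\gamma(B)\right|$$

for any $B\in\mathcal{B}_n^\infty$, $n\in\mathbb{N}_+$, and $a\in I$. Hence $\varepsilon_n'\ge\varepsilon_n$, $n\in\mathbb{N}_+$. On the other hand, for any arbitrarily given $n\in\mathbb{N}_+$ let $B_{x,h}^+ = (x\le\tau^{n-1}<x+h)\in\mathcal{B}_n^\infty$, with $x\in[0,1)$, $h>0$, $x+h\in I$, and $B_{x,h}^- = (x-h\le\tau^{n-1}<x)\in\mathcal{B}_n^\infty$, with $x\in(0,1]$, $h>0$, $x-h\in I$. Clearly,

$$\varepsilon_n \ge \max\left(\left|\frac{\gamma_a(B_{x,h}^+)}{\gamma(B_{x,h}^+)}-1\right|,\ \left|\frac{\gamma_a(B_{x,h}^-)}{\gamma(B_{x,h}^-)}-1\right|\right)$$

for any $a\in I$ and suitable $x\in I$ and $h>0$. Letting $h\to 0$ we get $\varepsilon_n\ge\varepsilon_n'$, $n\in\mathbb{N}_+$. Therefore $\varepsilon_n = \varepsilon_n'$, $n\in\mathbb{N}_+$.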

It is easy to compute $\varepsilon_1' = \varepsilon_1$ and $\varepsilon_2' = \varepsilon_2$. Since $F_0^a(x) = \gamma_a(\tau^0<x) = \gamma_a([0,x])$, $a,x\in I$, we have

$$\varepsilon_1 = \sup_{a,x\in I}\left|\frac{dF_0^a(x)/dx}{g(x)}-1\right| = \sup_{a,x\in I}\left|\frac{(a+1)(x+1)}{(ax+1)^2}\log 2-1\right|.$$

As

$$1 \le \frac{(a+1)(x+1)}{(ax+1)^2} \le 2, \quad a,x\in I,$$

it follows that

$$\varepsilon_1 = 2\log 2-1 = 0.38629\cdots.$$

Next, as $\gamma_a\left(s_1^a = 1/(a+i)\right) = P_i(a)$, $a\in I$, $i\in\mathbb{N}_+$, by Proposition 1.3.8 we have

$$F_1^a(x) = \sum_{i\in\mathbb{N}_+}\frac{(a+i+1)x}{x+a+i}\cdot\frac{a+1}{(a+i)(a+i+1)} = \sum_{i\in\mathbb{N}_+}\frac{(a+1)x}{(x+a+i)(a+i)}, \quad a,x\in I.$$

Then

$$\varepsilon_2 = \sup_{a,x\in I}\left|\frac{dF_1^a(x)/dx}{g(x)}-1\right| = \sup_{a,x\in I}\left|(\log 2)(a+1)(x+1)\sum_{i\in\mathbb{N}_+}\frac{1}{(x+a+i)^2}-1\right|.$$

It is not difficult to check that

$$2(\zeta(2)-1) \le (a+1)(x+1)\sum_{i\in\mathbb{N}_+}\frac{1}{(x+a+i)^2} \le \zeta(2), \quad a,x\in I.$$

Hence

$$\varepsilon_2 = \max\left(\zeta(2)\log 2-1,\ 1-2(\zeta(2)-1)\log 2\right) = \zeta(2)\log 2-1 = 0.14018\cdots.$$
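The values of $\varepsilon_1$ and $\varepsilon_2$ can be reproduced by evaluating the two suprema on a grid. This is a rough sketch under our own choices: a $21\times21$ grid of $(a,x)$ values (which happens to contain the extremal corner points), a series truncated after `terms` summands, and a tail estimate $1/(t+N+1/2)$ for $\sum_{i>N}(t+i)^{-2}$.

```python
import math

LOG2 = math.log(2)
ZETA2 = math.pi ** 2 / 6


def ratio1(a, x):
    """(dF_0^a/dx) / g(x) = (log 2)(a + 1)(x + 1) / (a x + 1)^2."""
    return LOG2 * (a + 1) * (x + 1) / (a * x + 1) ** 2


def ratio2(a, x, terms=2000):
    """(dF_1^a/dx) / g(x) = (log 2)(a + 1)(x + 1) * sum_{i >= 1} 1 / (x + a + i)^2,
    truncated after `terms` summands plus an integral-type estimate of the tail."""
    t = x + a
    s = sum(1.0 / (t + i) ** 2 for i in range(1, terms + 1))
    s += 1.0 / (t + terms + 0.5)  # approximates the omitted tail of the series
    return LOG2 * (a + 1) * (x + 1) * s


grid = [k / 20 for k in range(21)]
eps1 = max(abs(ratio1(a, x) - 1) for a in grid for x in grid)
eps2 = max(abs(ratio2(a, x) - 1) for a in grid for x in grid)
print(eps1, 2 * LOG2 - 1)        # both approximately 0.38629
print(eps2, ZETA2 * LOG2 - 1)    # both approximately 0.14018
```

On this grid the maximum for $\varepsilon_1$ is attained at $(a,x) = (0,1)$ and $(1,0)$, and the one for $\varepsilon_2$ at $(a,x) = (0,0)$, in agreement with the bounds displayed above.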

For $n\ge 3$ the computation of $\varepsilon_n$ becomes forbidding. Instead, Theorem 1.3.12 can be used to derive good upper bounds for $\varepsilon_n$ whatever $n\in\mathbb{N}_+$.

Proposition 1.3.13 We have $\varepsilon_1<\log 2$ and

$$\varepsilon_n \le \frac12(\log 2)\,c^{\,n-2}, \quad n\ge 2,$$

where $c = 3.5-2\sqrt2 = 0.67157\cdots$.

Proof. It follows from (1.3.25) and (1.3.26) that

$$\frac{dF_n^a(x)}{dx} = \int_0^1\frac{s+1}{(xs+1)^2}\,dG_n^a(s)$$

and

$$g(x) = \int_0^1\frac{s+1}{(xs+1)^2}\,dG(s)$$

for any $a,x\in I$ and $n\in\mathbb{N}$. Using the last two equations, integration by parts yields

$$\left|\frac{dF_n^a(x)}{dx}-g(x)\right| = \left|\int_0^1\frac{s+1}{(xs+1)^2}\,d\left(G_n^a(s)-G(s)\right)\right|$$

$$= \left|\int_0^1\left(G_n^a(s)-G(s)\right)\frac{\partial}{\partial s}\left(\frac{s+1}{(xs+1)^2}\right)ds\right|$$

$$\le \sup_{s\in I}\left|G_n^a(s)-G(s)\right|\int_0^1\frac{|x(s+2)-1|}{(xs+1)^3}\,ds.$$

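But

$$\int_0^1\frac{|x(s+2)-1|}{(xs+1)^3}\,ds = \begin{cases}\displaystyle\int_0^1\frac{1-x(s+2)}{(xs+1)^3}\,ds & \text{if } 0\le x\le\frac13,\\[10pt] \displaystyle\int_0^{(1-2x)/x}\frac{1-x(s+2)}{(xs+1)^3}\,ds-\int_{(1-2x)/x}^1\frac{1-x(s+2)}{(xs+1)^3}\,ds & \text{if } \frac13\le x\le\frac12,\\[10pt] \displaystyle\int_0^1\frac{x(s+2)-1}{(xs+1)^3}\,ds & \text{if } \frac12\le x\le 1\end{cases}$$

$$= \begin{cases}2(x+1)^{-2}-1 & \text{if } 0\le x\le\frac13,\\[4pt] -2(x+1)^{-2}-1+(2x(1-x))^{-1} & \text{if } \frac13\le x\le\frac12,\\[4pt] 1-2(x+1)^{-2} & \text{if } \frac12\le x\le 1\end{cases}$$

and

$$(x+1)\int_0^1\frac{|x(s+2)-1|}{(xs+1)^3}\,ds = \begin{cases}2(x+1)^{-1}-(x+1) & \text{if } 0\le x\le\frac13,\\[4pt] -2(x+1)^{-1}-(x+1)+(x+1)(2x(1-x))^{-1} & \text{if } \frac13\le x\le\frac12,\\[4pt] x+1-2(x+1)^{-1} & \text{if } \frac12\le x\le 1\end{cases} \ \le 1.$$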

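Therefore

$$\sup_{a,x\in I}\left|\frac{dF_n^a(x)/dx}{g(x)}-1\right| \le (\log 2)\sup_{a,s\in I}\left|G_n^a(s)-G(s)\right|, \quad n\in\mathbb{N}.$$

Then

$$\varepsilon_1' = \varepsilon_1 \le \log 2$$

and, by Theorem 1.3.12,

$$\varepsilon_{n+1}' = \varepsilon_{n+1} \le \frac12(\log 2)\,c^{\,n-1}, \quad n\in\mathbb{N}_+.$$

□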

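Theorem 1.3.14 For any $a\in I$ we have

$$\psi_{\gamma_a}(n) \le \frac{\varepsilon_n+\varepsilon_{n+1}}{1-\varepsilon_{n+1}}, \quad n\in\mathbb{N}_+. \tag{1.3.33}$$

Also,

$$\psi_\gamma(n) = \varepsilon_n, \quad n\in\mathbb{N}_+. \tag{1.3.34}$$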

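Proof. It follows from (1.3.24) that for any $a\in I$ we have

$$\varepsilon_n = \sup\left|\frac{\gamma_a\left(B \mid I(i^{(k)})\right)}{\gamma(B)}-1\right|, \quad n\in\mathbb{N}_+, \tag{1.3.35}$$

where the supremum is taken over all $B\in\mathcal{B}_{k+n}^\infty$ with $\gamma(B)>0$, $i^{(k)}\in\mathbb{N}_+^k$, and $k\in\mathbb{N}$. For arbitrarily given $k,\ell,n\in\mathbb{N}_+$, $i^{(k)}\in\mathbb{N}_+^k$, and $j^{(\ell)}\in\mathbb{N}_+^\ell$ put

$$A = I(i^{(k)}), \quad B = \left((a_{k+n},\cdots,a_{k+n+\ell-1}) = j^{(\ell)}\right)$$

and note that $\gamma_a(A)\gamma_a(B)\ne 0$ for any $a\in I$. By (1.3.35) we have

$$\left|\gamma_a(B \mid A)-\gamma(B)\right| \le \varepsilon_n\gamma(B) \tag{1.3.36}$$

and

$$\left|\gamma_a(B)-\gamma(B)\right| \le \varepsilon_{n+k}\gamma(B). \tag{1.3.37}$$

It follows from (1.3.36) and (1.3.37) that

$$\left|\gamma_a(B \mid A)-\gamma_a(B)\right| \le (\varepsilon_n+\varepsilon_{n+k})\gamma(B),$$

whence

$$\left|\gamma_a(A\cap B)-\gamma_a(A)\gamma_a(B)\right| \le (\varepsilon_n+\varepsilon_{n+k})\gamma_a(A)\gamma(B).$$

Finally, note that (1.3.37) yields

$$\gamma(B) \le \frac{\gamma_a(B)}{1-\varepsilon_{n+k}}.$$

Since the sequence $(\varepsilon_n)_{n\in\mathbb{N}_+}$ is non-increasing, we have

$$\frac{\varepsilon_n+\varepsilon_{n+k}}{1-\varepsilon_{n+k}} \le \frac{\varepsilon_n+\varepsilon_{n+1}}{1-\varepsilon_{n+1}}, \quad k,n\in\mathbb{N}_+,$$

which completes the proof of (1.3.33).

To prove (1.3.34) we first note that putting $A = I(i^{(k)})$ for any given $k\in\mathbb{N}_+$ and $i^{(k)}\in\mathbb{N}_+^k$, by (1.3.35) we have

$$\left|\gamma_a(A\cap B)-\gamma_a(A)\gamma(B)\right| \le \varepsilon_n\gamma_a(A)\gamma(B)$$

for any $a\in I$, $B\in\mathcal{B}_{k+n}^\infty$, and $n\in\mathbb{N}_+$. By integrating the above inequality over $a\in I$ with respect to $\gamma$ and taking into account that

$$\int_I\gamma_a(E)\,\gamma(da) = \gamma(E), \quad E\in\mathcal{B}_I,$$

we obtain $\psi_\gamma(n)\le\varepsilon_n$, $n\in\mathbb{N}_+$.

To prove the converse inequality remark that the ψ-mixing coefficients under the extended Gauss measure $\bar\gamma$ of the doubly infinite sequence $(\bar a_\ell)_{\ell\in\mathbb{Z}}$ of extended incomplete quotients are equal to the corresponding ψ-mixing coefficients under $\gamma$ of $(a_n)_{n\in\mathbb{N}_+}$. This is obvious by the very definitions of $(\bar a_\ell)_{\ell\in\mathbb{Z}}$ and ψ-mixing coefficients. See Subsection 1.3.3 and Section A3.1. As $(\bar a_\ell)_{\ell\in\mathbb{Z}}$ is strictly stationary under $\bar\gamma$, we have

$$\psi_\gamma(n) = \psi_{\bar\gamma}(n) = \sup\left|\frac{\bar\gamma(\bar A\cap\bar B)}{\bar\gamma(\bar A)\,\bar\gamma(\bar B)}-1\right|, \quad n\in\mathbb{N}_+,$$

where the upper bound is taken over all $\bar A\in\sigma(\bar a_n,\bar a_{n+1},\cdots)$ and $\bar B\in\sigma(\bar a_0,\bar a_{-1},\cdots)$ for which $\bar\gamma(\bar A)\,\bar\gamma(\bar B)\ne 0$. Clearly, $\bar A = A\times I$ and $\bar B = I\times B$, with $A\in\mathcal{B}_n^\infty = \tau^{-n+1}(\mathcal{B}_I)$ and $B\in\mathcal{B}_I$. Then

$$\psi_\gamma(n) = \sup_{\substack{A\in\tau^{-n+1}(\mathcal{B}_I),\ B\in\mathcal{B}_I\\ \gamma(A)\gamma(B)\ne 0}}\left|\frac{\bar\gamma(A\times B)}{\gamma(A)\gamma(B)}-1\right|, \quad n\in\mathbb{N}_+. \tag{1.3.38}$$

Now, it is easy to check that

$$\bar\gamma(A\times B) = \int_A\gamma(da)\,\gamma_a(B) = \int_B\gamma(db)\,\gamma_b(A)$$

for any $A,B\in\mathcal{B}_I$. It then follows from (1.3.38) and the very definition of $\varepsilon_n$ that

$$\psi_\gamma(n) \ge \sup_{\substack{b\in I,\ A\in\tau^{-n+1}(\mathcal{B}_I)\\ \gamma(A)\ne 0}}\left|\frac{\gamma_b(A)}{\gamma(A)}-1\right| = \varepsilon_n, \quad n\in\mathbb{N}_+.$$

This completes the proof of (1.3.34). □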

Corollary 1.3.15 The sequence $(a_n)_{n\in\mathbb{N}_+}$ is ψ-mixing under $\gamma$ and any $\gamma_a$, $a\in I$. For any $a\in I$ we have $\psi_{\gamma_a}(1)\le(\varepsilon_1+\varepsilon_2)/(1-\varepsilon_2) = 0.61231\cdots$ and

$$\psi_{\gamma_a}(n) \le \frac{(\log 2)\,c^{\,n-2}(1+c)}{2-(\log 2)\,c^{\,n-1}}, \quad n\ge 2.$$

Also, $\psi_\gamma(1) = 2\log 2-1 = 0.38629\cdots$, $\psi_\gamma(2) = \zeta(2)\log 2-1 = 0.14018\cdots$ and

$$\psi_\gamma(n) \le \frac12(\log 2)\,c^{\,n-2}, \quad n\ge 3.$$

The doubly infinite sequence $(\bar a_\ell)_{\ell\in\mathbb{Z}}$ of extended incomplete quotients is ψ-mixing under the extended Gauss measure $\bar\gamma$, and its ψ-mixing coefficients are equal to the corresponding ψ-mixing coefficients under $\gamma$ of $(a_n)_{n\in\mathbb{N}_+}$.

The proof follows from Proposition 1.3.13 and Theorem 1.3.14. As already noted, the last assertion is obvious by the very definitions of $(\bar a_\ell)_{\ell\in\mathbb{Z}}$ and ψ-mixing coefficients. □

Remark. The above result will be improved in Chapter 2. See Proposition 2.3.7. □

Proposition 1.3.16 (F. Bernstein’s theorem) Let $(c_n)_{n\in\mathbb{N}_+}$ be a sequence of positive numbers. The random event $(a_n\ge c_n)$ occurs infinitely often with $\gamma$-probability 0 or 1, according as the series $\sum_{n\in\mathbb{N}_+}1/c_n$ converges or diverges. In other words, $\gamma(a_n\ge c_n \text{ i.o.})$ is either 0 or 1 according as the series $\sum_{n\in\mathbb{N}_+}1/c_n$ converges or diverges.

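Proof. We can clearly assume that $c_n\ge 1$, $n\in\mathbb{N}_+$. Let $E_n = (a_n\ge c_n)$, $n\in\mathbb{N}_+$. By (1.2.9) we have

$$\gamma(E_n) = \gamma(a_n\ge c_n) = \gamma(a_1\ge c_n) = \gamma(a_1\ge c_n') = \frac{1}{\log 2}\log\left(1+\frac{1}{c_n'}\right),$$

where either $c_n' = \lfloor c_n\rfloor+1$ or $c_n' = \lfloor c_n\rfloor$. Hence

$$\frac{1}{2c_n} \le \gamma(E_n) \le \frac{2}{c_n\log 2}, \quad n\in\mathbb{N}_+,$$

since $x\log 2\le\log(1+x)\le x$ for any $x\in I$. Thus if $\sum_{n\in\mathbb{N}_+}1/c_n$ converges, then the result stated follows from the Borel–Cantelli lemma.

Assume now that $\sum_{n\in\mathbb{N}_+}1/c_n$ diverges. It follows from Theorem 1.3.14 that for any $k,n\in\mathbb{N}_+$ such that $k\le n$ we have

$$\left|\gamma\left(E_k^c\cap\cdots\cap E_n^c\cap E_{n+1}\right)-\gamma\left(E_k^c\cap\cdots\cap E_n^c\right)\gamma(E_{n+1})\right| \le \varepsilon_1\,\gamma\left(E_k^c\cap\cdots\cap E_n^c\right)\gamma(E_{n+1}),$$

where $\varepsilon_1 = 2\log 2-1 = 0.38629\cdots$. Hence

$$\gamma\left(E_{n+1} \mid E_k^c\cap\cdots\cap E_n^c\right) \ge (1-\varepsilon_1)\gamma(E_{n+1}) \ge \frac{1-\varepsilon_1}{2c_{n+1}},$$

therefore

$$\gamma\left(E_{n+1}^c \mid E_k^c\cap\cdots\cap E_n^c\right) \le 1-\frac{1-\varepsilon_1}{2c_{n+1}}$$

for any $k,n\in\mathbb{N}_+$ such that $k\le n$.

It follows that for any $k,m\in\mathbb{N}_+$ we have

$$\gamma\left(E_k^c\cap\cdots\cap E_{k+m}^c\right) \le \prod_{i=0}^m\left(1-\frac{1-\varepsilon_1}{2c_{k+i}}\right),$$

whence

$$\gamma\left(E_k^c\cap E_{k+1}^c\cap\cdots\right) \le \lim_{m\to\infty}\prod_{i=0}^m\left(1-\frac{1-\varepsilon_1}{2c_{k+i}}\right) = 0$$

since $\sum_{n\in\mathbb{N}_+}1/c_n$ diverges.

Finally,

$$\gamma(a_n\ge c_n \text{ i.o.}) = \gamma\left(\bigcap_{k\in\mathbb{N}_+}\bigcup_{i\ge k}E_i\right) = \lim_{k\to\infty}\gamma\left(\bigcup_{i\ge k}E_i\right) = \lim_{k\to\infty}\gamma\left(\Big(\bigcap_{i\ge k}E_i^c\Big)^c\right)$$

$$= 1-\lim_{k\to\infty}\gamma\left(E_k^c\cap E_{k+1}^c\cap\cdots\right) = 1.$$

□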

In Chapter 3 we shall need the following result.

Corollary 1.3.17 Let $b_n$, $n\in\mathbb{N}_+$, be real-valued random variables on $(I,\mathcal{B}_I)$ such that $a_n\le b_n\le a_n+c$, $n\in\mathbb{N}_+$, for some $c\in\mathbb{R}_+$. Let $(c_n)_{n\in\mathbb{N}_+}$ be a sequence of positive numbers. Then $\gamma(b_n\ge c_n \text{ i.o.})$ is either 0 or 1 according as the series $\sum_{n\in\mathbb{N}_+}1/c_n$ converges or diverges.

Proof. Clearly,

$$(a_n\ge c_n \text{ i.o.}) \subset (b_n\ge c_n \text{ i.o.}) \subset (a_n\ge\max(1,c_n-c) \text{ i.o.}),$$

and the series $\sum_{n\in\mathbb{N}_+}1/c_n$ and $\sum_{n\in\mathbb{N}_+}1/\max(1,c_n-c)$ are both convergent or both divergent. □


Chapter 2

Solving Gauss’ problem

In this chapter a generalization of Gauss’ problem stated in Subsection 1.2.1 is solved. Several applications are also given.

2.0 Banach space preliminaries

2.0.1 A few classical Banach spaces

In this subsection we describe some Banach spaces which are often mentioned throughout the book. We consider just functions defined on $I$, but almost all considerations below can be easily extended to more general cases.

We denote by $B(I)$ the collection of all bounded measurable functions $f: I\to\mathbb{C}$. This is a commutative Banach algebra with unit under the supremum norm

$$|||f||| = \sup_{x\in I}|f(x)|, \quad f\in B(I).$$

We denote by $C(I)$ the collection of all continuous functions $f: I\to\mathbb{C}$. This is a commutative Banach algebra with unit under the supremum norm. We denote by $C^1(I)$ the collection of all functions $f: I\to\mathbb{C}$ which have a continuous derivative. This is a commutative Banach algebra with unit under the norm

$$|||f|||_1 = |||f|||+|||f'|||, \quad f\in C^1(I).$$

We denote by $L(I)$ the collection of all Lipschitz functions $f: I\to\mathbb{C}$, that is, those for which

$$s(f) := \sup_{x'\ne x''}\frac{|f(x')-f(x'')|}{|x'-x''|} < \infty.$$

This is a commutative Banach algebra with unit under the norm

$$|||f|||_L = |||f|||+s(f), \quad f\in L(I).$$

Clearly,

$$C^1(I)\subset L(I)\subset C(I)\subset B(I).$$

The variation $\mathrm{var}_A f$ over $A\subset I$ of a function $f: I\to\mathbb{C}$ is defined as

$$\sup\sum_{i=1}^{k-1}|f(t_{i+1})-f(t_i)|,$$

the supremum being taken over $t_1<\cdots<t_k$, $t_i\in A$, $1\le i\le k$, and $k\ge 2$. We write simply $\mathrm{var}\,f$ for $\mathrm{var}_I f$. If $\mathrm{var}\,f<\infty$ then $f$ is called a function of bounded variation. The collection $BV(I)$ of all functions $f: I\to\mathbb{C}$ of bounded variation is a commutative Banach algebra with unit under the norm

$$|||f|||_v = |||f|||+\mathrm{var}\,f, \quad f\in BV(I).$$

Clearly,

$$L(I)\subset BV(I)\subset B(I).$$

Let $\mu$ be a measure on $\mathcal{B}_I$. Two measurable functions $f: I\to\mathbb{C}$ and $g: I\to\mathbb{C}$ are said to be $\mu$-indistinguishable, or to be $\mu$-versions of each other, if and only if $\mu(f\ne g) = 0$. Let us partition the collection of all measurable complex-valued functions defined on $I$ into (equivalence) classes of $\mu$-indistinguishable functions. For any real number $p\ge 1$ we denote by $L^p(I,\mathcal{B}_I,\mu) = L_\mu^p$ the collection of all such classes of $\mu$-indistinguishable functions $f: I\to\mathbb{C}$ for which $\int_I|f|^p\,d\mu<\infty$. Clearly, $L_\mu^p\subset L_\mu^{p'}$ if $p\ge p'\ge 1$. Next, $L_\mu^p$ is a Banach space under the norm

$$\|f\|_{p,\mu} = \left(\int_I|f|^p\,d\mu\right)^{1/p}, \quad f\in L_\mu^p.$$

(Note that the value of the integral is the same for all functions in an equivalence class.)

To define $L_\mu^\infty$ we should first define the $\mu$-essential supremum. For a measurable function $f: I\to\mathbb{R}$, its $\mu$-essential supremum, which is denoted $\mu$-ess sup $f$, is defined as

$$\inf\{a\in\mathbb{R} : \mu(f>a) = 0\}.$$

A measurable function $f: I\to\mathbb{C}$ is said to be $\mu$-essentially bounded if and only if

$$\mu\text{-ess sup}\,|f| < \infty.$$

Note that

$$\mu\text{-ess sup}\,|f| = \inf|||\tilde f|||,$$

where the lower bound is taken over all $\mu$-versions $\tilde f$ of $f$. We denote by $L^\infty(I,\mathcal{B}_I,\mu) = L_\mu^\infty$ the collection of all classes of $\mu$-essentially bounded complex-valued $\mu$-indistinguishable functions defined on $I$; $L_\mu^\infty$ is a commutative Banach algebra with unit under the norm

$$\|f\|_{\infty,\mu} = \mu\text{-ess sup}\,|f|, \quad f\in L_\mu^\infty.$$

(Note that the value of the essential supremum is the same for all functions in an equivalence class.) Clearly, $L_\mu^\infty\subset L_\mu^p$ for any $p\ge 1$.

The special case $p = 2$ is an important one: $L_\mu^2$ can also be considered as a Hilbert space with inner product $(\cdot,\cdot)_\mu$ defined by

$$(f,g)_\mu = \int_I fg^*\,d\mu, \quad f,g\in L_\mu^2.$$

In the case where $\mu = \lambda$ we simply write $L^p$, $\|f\|_p$, $L^\infty$, $\|f\|_\infty$, and ess sup $f$ instead of $L_\lambda^p$, $\|f\|_{p,\lambda}$, $L_\lambda^\infty$, $\|f\|_{\infty,\lambda}$, and $\lambda$-ess sup $f$, respectively.

2.0.2 Bounded essential variation

A variation $v(f)$ for $f\in L^\infty$ is defined as $v(f) = \inf\mathrm{var}\,\tilde f$, the infimum being taken over all $\lambda$-versions $\tilde f$ of $f$. If $v(f)<\infty$ then $f\in L^\infty$ is called a function of bounded essential variation. It can be shown that

$$v(f) = \lim_{0<a\to 0}\frac1a\int_0^1|f(u+a)-f(u)|\,du,$$

where for $x>1$ we define $f(x) = f(1)$. Clearly, if $f\in BV(I)$ then, in general, $v(f)\le\mathrm{var}\,f$. This is a special instance of the following more general result due to Stadje (1985). If $v(f)<\infty$ then the limit

$$\tilde f(t) = \lim_{0<a\to 0}\frac1a\int_t^{t+a}f(u)\,du$$

exists for any $t\in I$, the function $\tilde f$ is a right-continuous $\lambda$-version of $f$, and $\mathrm{var}\,\tilde f = v(f)$. The collection $BEV(I)$ of all functions $f\in L^\infty$ of bounded essential variation is a commutative Banach algebra with unit under any of the norms

$$\|f\|_{v,\mu} = v(f)+\|f\|_{1,\mu}, \quad f\in BEV(I),$$

with $\mu\in\mathrm{pr}(\mathcal{B}_I)$ such that $\mu\equiv\lambda$. See Rautu and Zbaganu (1989). In the case where $\mu = \lambda$ we simply write $\|f\|_v$ instead of $\|f\|_{v,\lambda}$.

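Proposition 2.0.1 (i) Let $\mu\in\mathrm{pr}(\mathcal{B}_I)$. If $f\in BV(I)$ then

$$|||f||| \le \mathrm{var}\,f+\left|\int_I f\,d\mu\right|. \tag{2.0.1}$$

(ii) Let $\mu\in\mathrm{pr}(\mathcal{B}_I)$ with $\mu\equiv\lambda$. If $f\in BEV(I)$ then

$$\mu\text{-ess sup}\,|f| \le v(f)+\left|\int_I f\,d\mu\right|. \tag{2.0.2}$$

Proof. (i) For any $x\in I$ we can write

$$|f(x)|-\left|\int_I f\,d\mu\right| \le \left|f(x)-\int_I f\,d\mu\right| = \left|\int_I(f(x)-f(u))\,\mu(du)\right| \le \mathrm{var}\,f,$$

from which (2.0.1) follows at once.

(ii) (2.0.2) follows from (2.0.1) since

$$\mu\text{-ess sup}\,|f| = \inf_{\tilde f}|||\tilde f|||, \quad v(f) = \inf_{\tilde f}\mathrm{var}\,\tilde f,$$

the infimum being taken over all $\mu$-versions $\tilde f$ of $f$, and $\int_I\tilde f\,d\mu = \int_I f\,d\mu$ for such an $\tilde f$. □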

2.1 The Perron–Frobenius operator

2.1.1 Definition and basic properties

Let $\mu\in\mathrm{pr}(\mathcal{B}_I)$ such that

$$\mu\left(\tau^{-1}(A)\right) = 0 \text{ whenever } \mu(A) = 0, \quad A\in\mathcal{B}_I, \tag{2.1.1}$$

where $\tau$ is the continued fraction transformation defined in Subsection 1.1.1.

In particular, this condition is satisfied if $\tau$ is $\mu$-preserving, that is, $\mu\tau^{-1} = \mu$, to mean $\mu\left(\tau^{-1}(A)\right) = \mu(A)$ for any $A\in\mathcal{B}_I$. In general, assuming that $\mu\ll\lambda$ and putting $h = d\mu/d\lambda$, it is easy to check that (2.1.1) holds if and only if $\lambda(E) = 0$, where $E = \{x\in I : h(x) = 0\}$.

The Perron–Frobenius operator $P_\mu$ of $\tau$ under $\mu$ is defined as the bounded linear operator on $L_\mu^1$ which takes $f\in L_\mu^1$ into $P_\mu f\in L_\mu^1$ with

$$\int_A P_\mu f\,d\mu = \int_{\tau^{-1}(A)}f\,d\mu, \quad A\in\mathcal{B}_I,$$

or, equivalently,

$$\int_I g\,P_\mu f\,d\mu = \int_I(g\circ\tau)\,f\,d\mu \tag{2.1.2}$$

for any $f\in L_\mu^1$ and $g\in L_\mu^\infty$. The existence of $P_\mu f$ is ensured by the Radon–Nikodym theorem on account of (2.1.1). Actually, $P_\mu$ so defined takes $L_\mu^p$ into itself for any $p\ge 1$ and $p = \infty$. So, (2.1.2) holds for any $f\in L_\mu^p$ and $g\in L_\mu^q$, with $p>1$ and $q = p/(p-1)$. In particular, (2.1.2) holds for any $f,g\in L_\mu^2$.

The probabilistic interpretation of $P_\mu$ is immediate: if an $I$-valued random variable $\xi$ on $I$ has $\mu$-density $h$, that is, $\mu(\xi\in A) = \int_A h\,d\mu$, $A\in\mathcal{B}_I$, with $h\ge 0$ and $\int_I h\,d\mu = 1$, then $\tau\circ\xi$ has $\mu$-density $P_\mu h$. In the special case $\mu = \lambda$ we obviously have

$$P_\lambda f(x) = \frac{d}{dx}\int_{\tau^{-1}([0,x])}f\,d\lambda \quad \text{a.e. in } I.$$

Proposition 2.1.1 The following properties hold:

(i) $P_\mu$ is positive, that is, $P_\mu f\ge 0$ if $f\ge 0$;

(ii) $P_\mu$ preserves integrals, that is,

$$\int_I P_\mu f\,d\mu = \int_I f\,d\mu, \quad f\in L_\mu^1;$$

(iii) $\|P_\mu\|_{p,\mu} := \sup\left(\|P_\mu f\|_{p,\mu} : f\in L_\mu^p,\ \|f\|_{p,\mu} = 1\right)\le 1$ for any $p\ge 1$ and $p = \infty$;

(iv) for any $n\in\mathbb{N}_+$ the $n$th power $P_\mu^n$ of $P_\mu$ is the Perron–Frobenius operator of the $n$th iterate $\tau^n$ of $\tau$ under $\mu$;

(v) $(P_\mu f)^* = P_\mu f^*$ for any $f\in L_\mu^1$;

(vi) $P_\mu((g\circ\tau)f) = g\,P_\mu f$ for any $f\in L_\mu^1$ and $g\in L_\mu^\infty$ and for any $f\in L_\mu^p$ and $g\in L_\mu^q$ with $p>1$ and $q = p/(p-1)$;

(vii) $P_\mu f = f$ if and only if $\tau$ is $\nu$-preserving, where $\nu$ is defined by $\nu(A) = \int_A f\,d\mu$, $A\in\mathcal{B}_I$. In particular, $P_\mu 1 = 1$ if and only if $\tau$ is $\mu$-preserving.

For the proof see Boyarski and Gora (1997, Ch. 4), Lasota and Mackey (1985, Ch. 3) or Mackey (1992, Ch. 4). □

Remark. The above considerations on the Perron–Frobenius operator of the continued fraction transformation $\tau$ under different probability measures on $\mathcal{B}_I$ apply mutatis mutandis to the general case of a transformation of an arbitrary probability space. For example, in the case of the natural extension $\bar\tau$ of $\tau$ (see Subsection 1.3.1) we should start by considering measures $\bar\mu\in\mathrm{pr}(\mathcal{B}_I^2)$ such that

$$\bar\mu\left(\bar\tau^{-1}(B)\right) = 0 \text{ whenever } \bar\mu(B) = 0, \quad B\in\mathcal{B}_I^2. \tag{2.1.1′}$$

Assuming that $\bar\mu\ll\lambda^2$ (two-dimensional Lebesgue measure) and putting $\bar h = d\bar\mu/d\lambda^2$, it is easy to check that (2.1.1′) holds if and only if $\lambda^2(\bar E) = 0$, where $\bar E = \left\{(x,y)\in I^2 : \bar h(x,y) = 0\right\}$.

The Perron–Frobenius operator $P_{\bar\mu}$ of $\bar\tau$ under $\bar\mu$ is the bounded linear operator on $L_{\bar\mu}^1(I^2)$ which takes $\bar f\in L_{\bar\mu}^1(I^2)$ into $P_{\bar\mu}\bar f\in L_{\bar\mu}^1(I^2)$ with

$$\int_B P_{\bar\mu}\bar f\,d\bar\mu = \int_{\bar\tau^{-1}(B)}\bar f\,d\bar\mu, \quad B\in\mathcal{B}_I^2.$$

It is also quite easy to check that if $\bar\mu\ll\lambda^2$ and $\bar h = d\bar\mu/d\lambda^2>0$ a.e. in $I^2$, then

$$P_{\bar\mu}\bar f(x,y) = \frac{(\bar h\circ\bar\tau^{-1})(x,y)\,(\bar f\circ\bar\tau^{-1})(x,y)}{y^2\,(x+\lfloor 1/y\rfloor)^2\,\bar h(x,y)}$$

a.e. in $I^2$. Alternatively,

$$P_{\bar\mu}\bar f(x,y) = \left(\frac{s_1^x(y)}{\tau^0(y)}\right)^2\frac{(\bar h\circ\bar\tau^{-1})(x,y)}{\bar h(x,y)}\,(\bar f\circ\bar\tau^{-1})(x,y)$$

a.e. in $I^2$. In particular, for $\bar\mu = \bar\gamma$ when

$$\bar h(x,y) = \frac{1}{\log 2}\,\frac{1}{(xy+1)^2}, \quad (x,y)\in I^2,$$

we have $P_{\bar\gamma}\bar f = \bar f\circ\bar\tau^{-1}$ a.e. in $I^2$. Hence

$$P_{\bar\mu}^n\bar f(x,y) = \left(\frac{s_1^x(y)\cdots s_n^x(y)}{\tau^0(y)\cdots\tau^{n-1}(y)}\right)^2\frac{(\bar h\circ\bar\tau^{-n})(x,y)}{\bar h(x,y)}\,(\bar f\circ\bar\tau^{-n})(x,y),$$

$$P_{\bar\gamma}^n\bar f = \bar f\circ\bar\tau^{-n}$$

a.e. in $I^2$ for any $n\in\mathbb{N}_+$.

We should, however, note that the Perron–Frobenius operator of an invertible transformation, like $\bar\tau$, is not of great value for deriving asymptotic properties of its $n$th power as $n\to\infty$. For an interesting discussion of the Perron–Frobenius operator of $\bar\tau$ in connection with the time evolution of certain spatially homogeneous cosmologies (‘mixmaster universe’), we refer the reader to Mayer (1987). □

Proposition 2.1.2 The Perron–Frobenius operator $P_\gamma := U$ of $\tau$ under $\gamma$ is given a.e. in $I$ by the equation

$$Uf(x) = \sum_{i \in \mathbf{N}_+} P_i(x)\, f\!\left(\frac{1}{x + i}\right), \quad f \in L^1_\gamma. \tag{2.1.3}$$

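Proof. Let $\tau_i : I_i \to I$ denote the restriction of $\tau$ to the interval $I_i = (1/(i+1), 1/i]$, $i \in \mathbf{N}_+$, that is,

$$\tau_i(u) = \frac{1}{u} - i, \quad u \in I_i.$$

For any $f \in L^1_\gamma$ and any $A \in \mathcal{B}_I$ we have

$$\int_{\tau^{-1}(A)} f\,d\gamma = \sum_{i \in \mathbf{N}_+} \int_{\tau^{-1}(A) \cap I_i} f\,d\gamma = \sum_{i \in \mathbf{N}_+} \int_{\tau_i^{-1}(A)} f\,d\gamma. \tag{2.1.4}$$

For any $i \in \mathbf{N}_+$, by the change of variable $x = \tau_i^{-1}(y) = (y + i)^{-1}$ we successively obtain

$$\begin{aligned}
\int_{\tau_i^{-1}(A)} f\,d\gamma &= \frac{1}{\log 2} \int_{\tau_i^{-1}(A)} \frac{f(x)}{x + 1}\,dx \\
&= \frac{1}{\log 2} \int_A f\!\left(\frac{1}{y + i}\right) \frac{1}{(y + i)^{-1} + 1}\,\frac{dy}{(y + i)^2} \\
&= \frac{1}{\log 2} \int_A P_i(y)\, f\!\left(\frac{1}{y + i}\right) \frac{dy}{y + 1} \\
&= \int_A P_i(y)\, f\!\left(\frac{1}{y + i}\right) \gamma(dy).
\end{aligned} \tag{2.1.5}$$

Now, (2.1.3) follows from (2.1.4) and (2.1.5). □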
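As a rough numerical companion to (2.1.3), the following sketch evaluates $Uf$ by a truncated series. It assumes the weights $P_i(x) = (x+1)/((x+i)(x+i+1))$, which is the form implied by (2.1.5); the names `P`, `apply_U` and the parameter `terms` are choices made for this illustration only. The discarded tail weight telescopes to $(x+1)/(x+\text{terms}+1)$ and is attached to $f(0)$, which presumes that $f$ is continuous at 0.

```python
import math

def P(i, x):
    """Weight P_i(x) = (x + 1) / ((x + i)(x + i + 1)), as read off from (2.1.5)."""
    return (x + 1.0) / ((x + i) * (x + i + 1.0))

def apply_U(f, x, terms=2000):
    """Approximate Uf(x) from (2.1.3): truncated sum plus a tail correction.

    The tail weight sum_{i > terms} P_i(x) equals (x + 1)/(x + terms + 1);
    multiplying it by f(0) is an approximation that assumes f is continuous at 0.
    """
    head = sum(P(i, x) * f(1.0 / (x + i)) for i in range(1, terms + 1))
    tail_weight = (x + 1.0) / (x + terms + 1.0)
    return head + tail_weight * f(0.0)

# U preserves the constant functions, because the P_i(x) sum to 1.
u_of_one = [apply_U(lambda u: 1.0, x) for x in (0.0, 0.25, 0.5, 1.0)]

# For f(x) = x the value at 0 is sum 1/(i^2 (i + 1)) = zeta(2) - 1.
u_of_identity_at_0 = apply_U(lambda u: u, 0.0)
```

The first list should consist of values equal to 1 up to rounding, and the last value should be close to $\pi^2/6 - 1$.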


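Proposition 2.1.3 Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$. Assume that $\mu \ll \lambda$ and $h = d\mu/d\lambda > 0$ a.e. in $I$. Then the Perron–Frobenius operator $P_\mu$ of $\tau$ under $\mu$ is given a.e. in $I$ by the equation

$$P_\mu f(x) = \frac{1}{h(x)} \sum_{i \in \mathbf{N}_+} \frac{h((x + i)^{-1})}{(x + i)^2}\, f\!\left(\frac{1}{x + i}\right) = \frac{Ug(x)}{(x + 1)h(x)}, \quad f \in L^1_\mu, \tag{2.1.6}$$

where $g(x) = (x + 1)h(x)f(x)$, $x \in I$.

The powers of $P_\mu$ are given a.e. in $I$ by the equation

$$P^n_\mu f(x) = \frac{U^n g(x)}{(x + 1)h(x)}, \quad f \in L^1_\mu,\ n \in \mathbf{N}_+. \tag{2.1.7}$$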

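Proof. The proof of (2.1.6) is entirely similar to that of (2.1.3), and is left to the reader. Note that $f \in L^1_\mu$ entails $g \in L^1_\gamma$.

To prove (2.1.7) note that it holds for $n = 1$. Assuming that (2.1.7) holds for some $n \in \mathbf{N}_+$, we have

$$\begin{aligned}
P^{n+1}_\mu f(x) &= P_\mu(P^n_\mu f)(x) = P_\mu\!\left(\frac{U^n g}{(\cdot + 1)h}\right)(x) \\
&= \frac{1}{h(x)} \sum_{i \in \mathbf{N}_+} \frac{h((x + i)^{-1})}{(x + i)^2}\, U^n g\!\left(\frac{1}{x + i}\right) \Big/ \left[\left(\frac{1}{x + i} + 1\right) h((x + i)^{-1})\right] \\
&= \frac{1}{(x + 1)h(x)} \sum_{i \in \mathbf{N}_+} P_i(x)\, U^n g\!\left(\frac{1}{x + i}\right) = \frac{U^{n+1} g(x)}{(x + 1)h(x)}
\end{aligned}$$

a.e. in $I$, and the proof is complete. □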

Corollary 2.1.4 The Perron–Frobenius operator $P_\lambda$ of $\tau$ under $\lambda$ is given a.e. in $I$ by the equation

$$P_\lambda f(x) = \sum_{i \in \mathbf{N}_+} \frac{1}{(x + i)^2}\, f\!\left(\frac{1}{x + i}\right), \quad f \in L^1.$$

The powers of $P_\lambda$ are given a.e. in $I$ by the equation

$$P^n_\lambda f(x) = \frac{U^n g(x)}{x + 1}, \quad f \in L^1,\ n \in \mathbf{N}_+,$$

where $g(x) = (x + 1)f(x)$, $x \in I$.

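Proposition 2.1.5 Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$. Assume that $\mu \ll \lambda$ and let $h = d\mu/d\lambda$. Then

$$\mu(\tau^{-n}(A)) = \int_A \frac{U^n f(x)}{x + 1}\,dx \tag{2.1.8}$$

for any $n \in \mathbf{N}$ and $A \in \mathcal{B}_I$, where $f(x) = (x + 1)h(x)$, $x \in I$.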

Proof. For $n = 0$ equation (2.1.8) reduces to

$$\mu(A) = \int_A h(x)\,dx, \quad A \in \mathcal{B}_I,$$

which is obviously true. Assume that (2.1.8) holds for some $n \in \mathbf{N}$. Then

$$\mu(\tau^{-(n+1)}(A)) = \mu(\tau^{-n}(\tau^{-1}(A))) = \int_{\tau^{-1}(A)} \frac{U^n f(x)}{x + 1}\,dx = (\log 2) \int_{\tau^{-1}(A)} U^n f\,d\gamma.$$

By the very definition of the Perron–Frobenius operator $U = P_\gamma$ we have

$$\int_{\tau^{-1}(A)} U^n f\,d\gamma = \int_A U^{n+1} f\,d\gamma.$$

Therefore

$$\mu(\tau^{-(n+1)}(A)) = (\log 2) \int_A U^{n+1} f\,d\gamma = \int_A \frac{U^{n+1} f(x)}{x + 1}\,dx,$$

and the proof is complete. □

Remark. It should be noted that (2.1.8) holds without assuming that $h > 0$ a.e. Since

$$\mu(\tau^n \in A) = \mu(\tau^{-n}(A)) = \int_A P^n_\mu 1\,d\mu, \quad n \in \mathbf{N},\ A \in \mathcal{B}_I,$$

it is possible to derive (2.1.8) from Proposition 2.1.3 assuming that $h > 0$ a.e., which clearly restricts the generality of the result. □


2.1.2 Asymptotic behaviour

It is easy to check that $1/(x + 1)$ is an eigenfunction of $P_\lambda$ corresponding to the eigenvalue 1.
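The check can be written out as a short telescoping computation. The sketch below takes the formula for $P_\lambda$ from Corollary 2.1.4 and applies it to $f(x) = 1/(x+1)$; no other assumption is used.

```latex
\begin{aligned}
P_\lambda f(x)
  &= \sum_{i \in \mathbf{N}_+} \frac{1}{(x+i)^2} \cdot \frac{1}{\frac{1}{x+i} + 1}
   = \sum_{i \in \mathbf{N}_+} \frac{1}{(x+i)(x+i+1)} \\
  &= \sum_{i \in \mathbf{N}_+} \left( \frac{1}{x+i} - \frac{1}{x+i+1} \right)
   = \frac{1}{x+1} = f(x), \qquad x \in I.
\end{aligned}
```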

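Define on $L^1$ the linear operators $\Pi_1$ and $T_0$ by

$$\Pi_1 f(x) = \frac{(\log 2)^{-1}}{x + 1} \int_I f\,d\lambda, \quad f \in L^1,\ x \in I,$$

$$T_0 = P_\lambda - \Pi_1.$$

Hence

$$\Pi_1^2 = \Pi_1, \quad P_\lambda \Pi_1 = \Pi_1 P_\lambda = \Pi_1, \quad T_0 \Pi_1 = \Pi_1 T_0 = 0. \tag{2.1.9}$$

It follows from the last equation (2.1.9) that

$$P^n_\lambda = \Pi_1 + T_0^n, \quad n \in \mathbf{N}_+. \tag{2.1.10}$$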

Theorem 2.1.6 The only eigenvalue of modulus 1 of $P_\lambda : L^1 \to L^1$ is 1 and this eigenvalue is simple. The operator $T_0$ has the following properties:

(i) $T_0(BEV(I)) \subset BEV(I)$;

(ii) there exists $0 < q < 1$ such that $\|T_0^n\|_v = O(q^n)$ as $n \to \infty$ (equivalently, the spectral radius of $T_0$ in $BEV(I)$ under $\|\cdot\|_v$ is less than 1);

(iii) $\sup_{n \in \mathbf{N}_+} \|T_0^n\|_1 < \infty$ and $\lim_{n \to \infty} \|T_0^n h\|_1 = 0$ for any $h \in L^1$.

Proof. This is a special case of Theorem 5.3.12 in Iosifescu and Grigorescu (1990). □

The result just stated concerning the asymptotic behaviour of $T_0^n$ as $n \to \infty$ can be used to derive the asymptotic behaviour of $U^n$ as $n \to \infty$. It follows from Corollary 2.1.4 and equation (2.1.10) that

$$U^n g(x) = U^\infty g + (x + 1)\, T_0^n\!\left(\frac{g}{\cdot + 1}\right)(x) \tag{2.1.11}$$

a.e. in $I$ for any $g \in L^1_\gamma$, where

$$U^\infty g = \int_I g\,d\gamma.$$


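It is obvious that $U^\infty U^\infty = U U^\infty = U^\infty$. Using the last equation (2.1.9) it is easy to check that

$$U^\infty U = U^\infty. \tag{2.1.12}$$

Now, defining the linear operator $T : L^1_\gamma \to L^1_\gamma$ by

$$Tg(x) = (x + 1)\, T_0\!\left(\frac{g}{\cdot + 1}\right)(x), \quad g \in L^1_\gamma,$$

a.e. in $I$, it is easy to check that

$$T^n g(x) = (x + 1)\, T_0^n\!\left(\frac{g}{\cdot + 1}\right)(x), \quad g \in L^1_\gamma, \tag{2.1.13}$$

a.e. in $I$ for any $n \in \mathbf{N}_+$, and

$$T U^\infty = U^\infty T = 0. \tag{2.1.14}$$

It follows from (2.1.11) and (2.1.13) that

$$U^n = U^\infty + T^n, \quad n \in \mathbf{N}_+.$$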

Proposition 2.1.7 The only eigenvalue of modulus 1 of $U : L^1_\gamma \to L^1_\gamma$ is 1 and this eigenvalue is simple. The corresponding eigenspace consists of the a.e. constant functions on $I$. The linear operator $T : L^1_\gamma \to L^1_\gamma$ has the following properties:

(i) $T(BEV(I)) \subset BEV(I)$;

(ii) there exists $0 < q < 1$ such that $\|T^n\|_{v,\gamma} = O(q^n)$ as $n \to \infty$ (equivalently, the spectral radius of $T$ in $BEV(I)$ under $\|\cdot\|_{v,\gamma}$ is less than 1);

(iii) $\sup_{n \in \mathbf{N}_+} \|T^n\|_{1,\gamma} < \infty$ and $\lim_{n \to \infty} \|T^n h\|_{1,\gamma} = 0$ for any $h \in L^1_\gamma$.

Proof. By (2.1.11) and (2.1.13), all the conclusions are immediate consequences of the corresponding conclusions of Theorem 2.1.6. In checking (ii) we have to use Proposition 2.0.1(ii). □
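The decomposition $U^n = U^\infty + T^n$ can be watched numerically on a simple example. The sketch below is only an illustration under stated assumptions: it takes $f(x) = x$, for which $U^\infty f = \int_I x\,\gamma(dx) = (1 - \log 2)/\log 2$, uses the weights $P_i(x) = (x+1)/((x+i)(x+i+1))$, and approximates $U$ by a series truncated at `terms` with the remaining weight attached to the value at 0. The helper names `P` and `U` are hypothetical.

```python
import math

def P(i, x):
    """Weight P_i(x) = (x + 1) / ((x + i)(x + i + 1))."""
    return (x + 1.0) / ((x + i) * (x + i + 1.0))

def U(f, terms=40):
    """Return an approximation of Uf as a new function (truncated series plus tail weight at 0)."""
    def Uf(x):
        head = sum(P(i, x) * f(1.0 / (x + i)) for i in range(1, terms + 1))
        return head + (x + 1.0) / (x + terms + 1.0) * f(0.0)
    return Uf

f = lambda x: x
U_inf = (1.0 - math.log(2.0)) / math.log(2.0)  # integral of x against Gauss' measure

grid = [0.0, 0.5, 1.0]
U1 = U(f)
U3 = U(U(U(f)))

# Largest deviation from the limit on the grid, after one and after three applications of U.
dev1 = max(abs(U1(x) - U_inf) for x in grid)
dev3 = max(abs(U3(x) - U_inf) for x in grid)
```

On this example `dev3` should be visibly smaller than `dev1`, in line with the geometric decay of $T^n$ on $BEV(I)$; the grid and the truncation level are arbitrary choices.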

Remark. Since

$$\frac{\lambda(A)}{2 \log 2} \le \gamma(A) \le \frac{\lambda(A)}{\log 2}, \quad A \in \mathcal{B}_I,$$

the domains of the operators $U$, $U^\infty$ and $T$ can as well be taken to be $L^1$, and then in (ii) and (iii) the norms $\|\cdot\|_{v,\gamma}$ and $\|\cdot\|_{1,\gamma}$ should be replaced by the norms $\|\cdot\|_v$ and $\|\cdot\|_1$, respectively. □


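Corollary 2.1.8 For any $h \in L^1$ we have

$$\lim_{n \to \infty} \int_I |U^n h - U^\infty h|\,d\gamma = \lim_{n \to \infty} \int_I |U^n h - U^\infty h|\,d\lambda = 0.$$

Hence, for any $h \in L^1$,

$$\lim_{n \to \infty} \int_A U^n h\,d\mu = \mu(A)\, U^\infty h \tag{2.1.15}$$

uniformly with respect to $A \in \mathcal{B}_I$, where $\mu$ stands for either $\lambda$ or $\gamma$.

Proof. For any $A \in \mathcal{B}_I$ we have

$$\left| \int_A U^n h\,d\mu - \mu(A)\, U^\infty h \right| = \left| \int_A (U^n h - U^\infty h)\,d\mu \right| \le \int_A |U^n h - U^\infty h|\,d\mu \le \int_I |U^n h - U^\infty h|\,d\mu \longrightarrow 0$$

as $n \to \infty$, and the proof is complete. □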

Remark. It is not possible to show that $U^n h \to U^\infty h$ a.e. as $n \to \infty$ by using (2.1.15). It is an open problem whether this is actually true. Cf. Petek (1989) and Iosifescu (1992, p. 912). □

2.1.3 Restricting the domain of the Perron–Frobenius operator

The asymptotic properties of the Perron–Frobenius operator $U : L^1_\gamma \to L^1_\gamma$, as described by Proposition 2.1.7, are not strong enough to lead to a satisfactory solution to Gauss' problem, whilst when restricting $U$ to $BEV(I)$ they are substantially better. See further Proposition 2.1.17.

In the next sections the domain of $U$ will be successively restricted to various Banach spaces. In this subsection we show that $U$, defined by

$$Uf(x) = \sum_{i \in \mathbf{N}_+} P_i(x)\, f\!\left(\frac{1}{x + i}\right) \tag{2.1.16}$$

for any $x \in I$, is a bounded linear operator on any of the Banach spaces $B(I)$, $C(I)$, $BV(I)$, $L(I)$, and $C^1(I)$.


Proposition 2.1.9 The operator U defined by (2.1.16) is a bounded linear operator of norm 1 on both B(I) and C(I).

Proof. It is obvious that if f ∈ B(I) then Uf ∈ B(I) and ||| Uf ||| ≤ ||| f |||. Next, if f ∈ C(I) then Uf ∈ C(I) since the series defining Uf is uniformly convergent, it being dominated by a convergent series of positive constants. We also have ||| Uf ||| ≤ ||| f |||, f ∈ C(I) ⊂ B(I), as a consequence of the validity of this inequality for f ∈ B(I). In both cases ||| U ||| = 1 since U preserves the constant functions. □

A different interpretation is available for the operator U : B (I) → B (I).

Proposition 2.1.10 The operator $U : B(I) \to B(I)$ is the transition operator of both the Markov chain $(s^a_n)_{n \in \mathbf{N}}$ on $(I, \mathcal{B}_I, \gamma_a)$, for any $a \in I$, and the Markov chain $(\bar s_\ell)_{\ell \in \mathbf{Z}}$ on $(I^2, \mathcal{B}^2_I, \bar\gamma)$.

Proof. As noted in Subsection 1.3.4, for any $a \in I$ the sequence $(s^a_n)_{n \in \mathbf{N}}$ is an $I$-valued Markov chain with the following transition mechanism: from state $s \in I$ the possible transitions are to any state $1/(s + i)$ with corresponding transition probability $P_i(s)$, $i \in \mathbf{N}_+$. Then the transition operator of $(s^a_n)_{n \in \mathbf{N}}$ takes $f \in B(I)$ to the function defined by

$$E\left(f(s^a_{n+1}) \mid s^a_n = s\right) = \sum_{i \in \mathbf{N}_+} P_i(s)\, f\!\left(\frac{1}{s + i}\right) = Uf(s), \quad s \in I,$$

that is, it coincides with the operator $U$ whatever $a \in I$.

A similar reasoning is valid for the case of the Markov chain $(\bar s_\ell)_{\ell \in \mathbf{Z}}$, whose transition mechanism is identical with that of $(s^a_n)_{n \in \mathbf{N}}$. (See Subsection 1.3.3.) □

To prove a result similar to Proposition 2.1.9 for the Banach spaces $BV(I)$, $L(I)$, and $C^1(I)$ we need some preparation.

We first prove that the operator $U : B(I) \to B(I)$ preserves monotonicity.

Proposition 2.1.11 If $f \in B(I)$ is non-decreasing (non-increasing), then $Uf$ is non-increasing (non-decreasing).

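Proof. To make a choice assume that $f$ is non-decreasing. Let $y > x$, $x, y \in I$. We have $Uf(y) - Uf(x) = S_1 + S_2$, where

$$S_1 = \sum_{i \in \mathbf{N}_+} P_i(y) \left( f\!\left(\frac{1}{y + i}\right) - f\!\left(\frac{1}{x + i}\right) \right), \qquad S_2 = \sum_{i \in \mathbf{N}_+} (P_i(y) - P_i(x))\, f\!\left(\frac{1}{x + i}\right).$$

Clearly, $S_1 \le 0$. We shall prove that $S_2 \le 0$, too. Since

$$\sum_{i \in \mathbf{N}_+} P_i(u) = 1, \quad u \in I,$$

we can write

$$S_2 = -\sum_{i \in \mathbf{N}_+} \left( f\!\left(\frac{1}{x + 1}\right) - f\!\left(\frac{1}{x + i}\right) \right) (P_i(y) - P_i(x)).$$

As is easy to see, the function $P_1$ is decreasing while the functions $P_i$, $i \ge 3$, are all increasing. Note also that

$$f\!\left(\frac{1}{x + 1}\right) - f\!\left(\frac{1}{x + i}\right) \ge f\!\left(\frac{1}{x + 1}\right) - f\!\left(\frac{1}{x + 2}\right) \ge 0, \quad i \ge 2.$$

Therefore

$$\begin{aligned}
S_2 &= -\sum_{i \ge 2} \left( f\!\left(\frac{1}{x + 1}\right) - f\!\left(\frac{1}{x + i}\right) \right) (P_i(y) - P_i(x)) \\
&\le -\left( f\!\left(\frac{1}{x + 1}\right) - f\!\left(\frac{1}{x + 2}\right) \right) \sum_{i \ge 2} (P_i(y) - P_i(x)) \\
&= \left( f\!\left(\frac{1}{x + 1}\right) - f\!\left(\frac{1}{x + 2}\right) \right) (P_1(y) - P_1(x)) \le 0,
\end{aligned}$$

as claimed. Thus $Uf(y) - Uf(x) \le 0$, and the proof is complete. □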

Remark. It is possible to show more generally that if $f \in L^1$ is non-decreasing (non-increasing), then $Uf$ is non-increasing (non-decreasing). The proof, along the same lines as above, is left to the reader. □

Proposition 2.1.12 If $f \in B(I)$ is monotone, then

$$\operatorname{var} Uf \le \frac{1}{2} \operatorname{var} f.$$

The constant 1/2 cannot be lowered.

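Proof. Assume, with no loss of generality, that $f$ is non-decreasing. [Note that if $f$ is non-increasing, then $-f$ is non-decreasing while $\operatorname{var} U(-f) = \operatorname{var} Uf$ and $\operatorname{var}(-f) = \operatorname{var} f$.] Then by Proposition 2.1.11 we have

$$\operatorname{var} Uf = Uf(0) - Uf(1) = \sum_{i \in \mathbf{N}_+} \left( P_i(0)\, f\!\left(\frac{1}{i}\right) - P_i(1)\, f\!\left(\frac{1}{i + 1}\right) \right).$$

Since $P_i(1) = 2P_{i+1}(0)$, $i \in \mathbf{N}_+$, it follows that

$$\operatorname{var} Uf = P_1(0) f(1) - \sum_{i \in \mathbf{N}_+} P_{i+1}(0)\, f\!\left(\frac{1}{i + 1}\right).$$

As

$$P_1(0) = \sum_{i \in \mathbf{N}_+} P_{i+1}(0) = \frac{1}{2}$$

and

$$f\!\left(\frac{1}{i + 1}\right) \ge f(0), \quad i \in \mathbf{N}_+,$$

we finally obtain

$$\operatorname{var} Uf \le \frac{1}{2} (f(1) - f(0)) = \frac{1}{2} \operatorname{var} f.$$

Since for $f$ defined by $f(x) = 0$, $0 \le x < 1$, and $f(1) = 1$ we have $\operatorname{var} Uf = (\operatorname{var} f)/2$, it follows that the constant 1/2 cannot be lowered. □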
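Propositions 2.1.11 and 2.1.12 can be illustrated on a concrete monotone function. The sketch below is a numerical check under assumptions, not part of the proof: it takes the hypothetical test function $f(x) = x^2$, the weights $P_i(x) = (x+1)/((x+i)(x+i+1))$, and a truncated series for $Uf$ whose tail weight is attached to $f(0)$. The names `apply_U`, `grid` and `terms` are illustrative.

```python
def P(i, x):
    """Weight P_i(x) = (x + 1) / ((x + i)(x + i + 1))."""
    return (x + 1.0) / ((x + i) * (x + i + 1.0))

def apply_U(f, x, terms=2000):
    """Truncated series for Uf(x); the remaining weight is placed on f(0)."""
    head = sum(P(i, x) * f(1.0 / (x + i)) for i in range(1, terms + 1))
    return head + (x + 1.0) / (x + terms + 1.0) * f(0.0)

f = lambda u: u * u                      # non-decreasing on [0, 1]
grid = [k / 50.0 for k in range(51)]
values = [apply_U(f, x) for x in grid]

# Proposition 2.1.11: Uf should be non-increasing.
non_increasing = all(b <= a for a, b in zip(values, values[1:]))

# Proposition 2.1.12: var Uf = Uf(0) - Uf(1) should not exceed var f / 2.
var_f = f(1.0) - f(0.0)
var_Uf = values[0] - values[-1]
```

For this choice of $f$ one finds $Uf(0) = \zeta(3) - \zeta(2) + 1$ and $Uf(1) = 2(Uf(0) - 1/2)$ by hand, so `var_Uf` should be close to $0.44288$, below the bound $1/2$.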

Corollary 2.1.13 If $f \in BV(I)$ is real-valued, then

$$\operatorname{var} Uf \le \frac{1}{2} \operatorname{var} f.$$

The constant 1/2 cannot be lowered.

Proof. By Hahn's decomposition of a signed measure, for any $f \in BV(I)$ there exist monotone functions $f_1, f_2 \in B(I)$ such that $f = f_1 - f_2$ and $\operatorname{var} f = \operatorname{var} f_1 + \operatorname{var} f_2$. [To obtain this consider the signed measure $\mu$ on $\mathcal{B}_I$ defined by $\mu((a, b]) = f(b) - f(a)$, $a < b$, $a, b \in I$.] Then by Proposition 2.1.12 we have

$$\operatorname{var} Uf = \operatorname{var}(Uf_1 - Uf_2) \le \operatorname{var} Uf_1 + \operatorname{var} Uf_2 \le \frac{1}{2} (\operatorname{var} f_1 + \operatorname{var} f_2) = \frac{1}{2} \operatorname{var} f.$$

The optimality of the constant 1/2 follows from Proposition 2.1.12. □

Proposition 2.1.14 We have

$$s(Uf) \le (2\zeta(3) - \zeta(2))\, s(f) \tag{2.1.17}$$

for any $f \in L(I)$. The constant $\theta = 2\zeta(3) - \zeta(2) = 0.7591797\cdots$ cannot be lowered.


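Proof. For $x \ne y$, $x, y \in I$, we have

$$\frac{Uf(y) - Uf(x)}{y - x} = \sum_{i \in \mathbf{N}_+} \frac{P_i(y) - P_i(x)}{y - x}\, f\!\left(\frac{1}{x + i}\right) - \sum_{i \in \mathbf{N}_+} P_i(y)\, \frac{f\!\left(\frac{1}{y+i}\right) - f\!\left(\frac{1}{x+i}\right)}{\frac{1}{y+i} - \frac{1}{x+i}}\, \frac{1}{(x + i)(y + i)}. \tag{2.1.18}$$

Next, remark that

$$P_i(x) = \frac{i}{x + i + 1} - \frac{i - 1}{x + i}, \quad i \in \mathbf{N}_+,$$

and then

$$\frac{P_i(y) - P_i(x)}{y - x} = \frac{i - 1}{(x + i)(y + i)} - \frac{i}{(x + i + 1)(y + i + 1)}, \quad i \in \mathbf{N}_+.$$

Hence

$$\sum_{i \in \mathbf{N}_+} \frac{P_i(y) - P_i(x)}{y - x}\, f\!\left(\frac{1}{x + i}\right) = \sum_{i \in \mathbf{N}_+} \frac{i}{(x + i + 1)(y + i + 1)} \left( f\!\left(\frac{1}{x + i + 1}\right) - f\!\left(\frac{1}{x + i}\right) \right). \tag{2.1.19}$$

Assume that $x > y$. It then follows from (2.1.18) and (2.1.19) that

$$\left| \frac{Uf(y) - Uf(x)}{y - x} \right| \le s(f) \sum_{i \in \mathbf{N}_+} \left( \frac{P_i(y)}{(y + i)^2} + \frac{i}{(y + i)(y + i + 1)^3} \right).$$

Now, the function $g$ defined by

$$g(y) = \sum_{i \in \mathbf{N}_+} \frac{P_i(y)}{(y + i)^2}, \quad y \in I,$$

is precisely $Uh$ for $h(y) = y^2$, $y \in I$. Since $h$ is increasing, $g$ is decreasing by Proposition 2.1.11. Therefore for any $y \in I$ we have

$$\begin{aligned}
\sum_{i \in \mathbf{N}_+} \left( \frac{P_i(y)}{(y + i)^2} + \frac{i}{(y + i)(y + i + 1)^3} \right)
&\le \sum_{i \in \mathbf{N}_+} \left( \frac{1}{i^3 (i + 1)} + \frac{1}{(i + 1)^3} \right) \\
&= \sum_{i \in \mathbf{N}_+} \left( \frac{1}{i^3} - \frac{1}{i^2} + \frac{1}{i} - \frac{1}{i + 1} + \frac{1}{(i + 1)^3} \right) \\
&= \zeta(3) - \zeta(2) + 1 + \zeta(3) - 1 = 2\zeta(3) - \zeta(2).
\end{aligned}$$

As clearly

$$\sup_{x, y \in I,\ x > y} \left| \frac{Uf(y) - Uf(x)}{y - x} \right| = s(Uf),$$

we obtain (2.1.17).

Finally, it is easy to check that for $f(x) = x$, $x \in I$, we have $s(f) = 1$ and $s(Uf) = 2\zeta(3) - \zeta(2)$. The proof is complete. □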
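The value of $\theta = 2\zeta(3) - \zeta(2)$ and the series bound used in the proof can be checked numerically. The sketch below is an illustration only: `zeta` is a hypothetical helper that sums the defining series and adds an integral estimate of the tail, and `slope_bound` evaluates the series $\sum_i \big(1/(i^3(i+1)) + 1/(i+1)^3\big)$ that appears at $y = 0$.

```python
def zeta(s, terms=200000):
    """Partial sum of sum_{i>=1} i^(-s) plus the integral estimate of the tail (s > 1)."""
    head = sum(1.0 / i ** s for i in range(1, terms + 1))
    return head + terms ** (1 - s) / (s - 1)

theta = 2.0 * zeta(3) - zeta(2)

# The bounding series from the proof, evaluated directly at y = 0.
slope_bound = sum(
    1.0 / (i ** 3 * (i + 1)) + 1.0 / (i + 1) ** 3
    for i in range(1, 200001)
)
```

Both quantities should agree to many digits, which is just the partial-fraction identity used in the last display of the proof; the number of terms is an arbitrary choice.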

Proposition 2.1.15 We have

$$|||(Uf)'||| \le (2\zeta(3) - \zeta(2))\, |||f'||| \tag{2.1.20}$$

for any $f \in C^1(I)$. The constant $\theta = 2\zeta(3) - \zeta(2) = 0.7591797\cdots$ cannot be lowered.

Proof. Equations (2.1.19) and (2.1.18) show that for $f \in C^1(I)$ the series defining $Uf$ can be differentiated term by term since the series of the derivatives is uniformly convergent, it being dominated by a convergent series of positive constants (cf. further Subsection 2.2.1). Then (2.1.20) follows from (2.1.17) since for any $f \in C^1(I)$ we have $s(f) = |||f'|||$. □

Now, we can state the result announced.

Proposition 2.1.16 The operator $U$ defined by (2.1.16) is a bounded linear operator of norm 1 on any of the Banach spaces $BV(I)$, $L(I)$, and $C^1(I)$.

Proof. The result follows from Corollary 2.1.13 and Propositions 2.1.14 and 2.1.15, having in view that $U$ preserves the constant functions. In the case of $BV(I)$ we should note that for a complex-valued $f \in BV(I)$ we have

$$\max(\operatorname{var} \operatorname{Re} f, \operatorname{var} \operatorname{Im} f) \le \operatorname{var} f \le \operatorname{var} \operatorname{Re} f + \operatorname{var} \operatorname{Im} f.$$

Hence by Corollary 2.1.13 we have $\operatorname{var} Uf \le \operatorname{var} f$ for such an $f$. □


2.1.4 A solution to Gauss’ problem for probability measures with densities

Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$. By Proposition 2.1.5 for any $n \in \mathbf{N}$ we have

$$\mu(\tau^{-n}(A)) = \int_A \frac{U^n f_0(x)}{x + 1}\,dx, \quad A \in \mathcal{B}_I, \tag{2.1.21}$$

with $f_0(x) = (x + 1) F_0'(x)$, $x \in I$, where $F_0' = d\mu/d\lambda$. We shall consider Gauss' problem in a more general form, namely, that of the asymptotic behaviour of $\mu(\tau^{-n}(A))$ as $n \to \infty$ for any $A \in \mathcal{B}_I$.

Equation (2.1.21) shows that solving this more general Gauss' problem for a given $\mu \in \mathrm{pr}(\mathcal{B}_I)$ amounts to studying the behaviour of the $n$th power of the Perron–Frobenius operator $U$ on a suitable Banach space. On account of the results obtained in Subsection 2.1.2 we can state the following result.

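Proposition 2.1.17 Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$. We have

$$\lim_{n \to \infty} \sup_{A \in \mathcal{B}_I} \left| \mu(\tau^{-n}(A)) - \gamma(A) \right| = 0. \tag{2.1.22}$$

If $F_0' = d\mu/d\lambda \in BEV(I)$ then there exists a constant $C \in \mathbf{R}_+$ such that

$$\left| \mu(\tau^{-n}(A)) - \gamma(A) \right| \le C q^n \gamma(A) \tag{2.1.23}$$

for any $n \in \mathbf{N}_+$ and $A \in \mathcal{B}_I$. Here $0 < q < 1$ is the constant occurring in Proposition 2.1.7(ii).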

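Proof. We have

$$\mu(\tau^{-n}(A)) - \gamma(A) = \int_A \frac{U^n f_0(x) - U^\infty f_0}{x + 1}\,dx \tag{2.1.24}$$

since

$$U^\infty f_0 = \int_I f_0\,d\gamma = \frac{1}{\log 2} \int_I F_0'\,d\lambda = \frac{1}{\log 2},$$

and equation (2.1.22) follows by (2.1.15).

If $F_0' \in BEV(I)$ then for some $C_0 \in \mathbf{R}_+$ by Proposition 2.1.7(ii) we have

$$\|U^n f_0 - U^\infty f_0\|_v \le C_0 q^n \|f_0\|_v, \quad n \in \mathbf{N}_+.$$

It then follows from Proposition 2.0.1(ii) that

$$\operatorname{ess\,sup} |U^n f_0 - U^\infty f_0| \le C_0 q^n \|f_0\|_v, \quad n \in \mathbf{N}_+. \tag{2.1.25}$$

Now, (2.1.23) follows from (2.1.24) and (2.1.25). □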


Remark. As for $q$, we conjecture that its (optimal) value is $g^2 = (3 - \sqrt{5})/2 = 0.38196\cdots$, as in a further related result, namely, Corollary 2.5.7. □

In the next three sections we will take up Gauss' problem assuming that $F_0' = d\mu/d\lambda$ belongs to Banach spaces 'smaller' than $BEV(I)$.

2.1.5 Computing variances of certain sums

In this subsection, using properties of the Perron–Frobenius operator $U$ on $BEV(I)$, we give some results concerning the variances of certain sums of random variables constructed starting from either the $\bar a_\ell$, $\ell \in \mathbf{Z}$, or the $a_n$, $n \in \mathbf{N}_+$. These results will be used in Chapter 3.

Let $H$ be a real-valued function on $\mathbf{N}_+^{\mathbf{Z}}$. Set $H_\ell = H_1 \circ \bar\tau^{\ell - 1}$, $\ell \in \mathbf{Z}$, where

$$H_1 = H(\cdots, \bar a_{-2}, \bar a_{-1}, \bar a_0, \bar a_1, \bar a_2, \cdots).$$

Clearly $(H_\ell)_{\ell \in \mathbf{Z}}$ is a strictly stationary process on $(I^2, \mathcal{B}^2_I, \bar\gamma)$. Set $S_0 = 0$, $S_n = \sum_{i=1}^n H_i$, $n \in \mathbf{N}_+$. We start with some well known results.

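Theorem 2.1.18 If $E_{\bar\gamma} H_1^2 < \infty$, $E_{\bar\gamma} H_1 = 0$, and $\lim_{n \to \infty} E_{\bar\gamma} H_1 H_n = 0$, then the finite or infinite limit $\lim_{n \to \infty} E_{\bar\gamma} S_n^2$ exists. We have $\lim_{n \to \infty} E_{\bar\gamma} S_n^2 < \infty$ if and only if there exists $g \in L^2_{\bar\gamma}(I^2)$ such that $H_1 = g \circ \bar\tau - g$ a.e. in $I^2$.

This is a special case of Theorem 18.2.2 in Ibragimov and Linnik (1971).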

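Proposition 2.1.19 If $E_{\bar\gamma} H_1^2 < \infty$, $E_{\bar\gamma} H_1 = 0$, and the series

$$\sigma^2 = E_{\bar\gamma} H_1^2 + 2 \sum_{n \in \mathbf{N}_+} E_{\bar\gamma} H_1 H_{n+1} \tag{2.1.26}$$

converges absolutely, then $\sigma^2 \ge 0$ and

$$E_{\bar\gamma} S_n^2 = n\left(\sigma^2 + o(1)\right) \tag{2.1.27}$$

as $n \to \infty$. If the stronger assumption $\sum_{n \in \mathbf{N}_+} n\, |E_{\bar\gamma} H_1 H_{n+1}| < \infty$ holds, then

$$E_{\bar\gamma} S_n^2 = n\left(\sigma^2 + O(n^{-1})\right) \tag{2.1.28}$$

as $n \to \infty$.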

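Proof. By strict stationarity, for any $n > 1$ we have

$$E_{\bar\gamma} S_n^2 = \sum_{i,j=1}^n E_{\bar\gamma} H_i H_j = n E_{\bar\gamma} H_1^2 + 2 \sum_{j=1}^{n-1} (n - j)\, E_{\bar\gamma} H_1 H_{j+1}.$$

Therefore

$$\frac{1}{n} \left| E_{\bar\gamma} S_n^2 - n\sigma^2 \right| \le 2 \left( \sum_{j=1}^{n-1} \frac{j\, |E_{\bar\gamma} H_1 H_{j+1}|}{n} + \sum_{j \ge n} |E_{\bar\gamma} H_1 H_{j+1}| \right),$$

and the right hand side is $o(1)$ as $n \to \infty$ when $\sum_{n \in \mathbf{N}_+} |E_{\bar\gamma} H_1 H_{n+1}| < \infty$ (note that $\sum_{n \in \mathbf{N}_+} |u_n| < \infty$ implies $\lim_{n \to \infty} \sum_{j=1}^n j |u_j| / n = 0$), so that (2.1.27) holds. Finally, since

$$\sum_{j=1}^{n-1} \frac{j\, |E_{\bar\gamma} H_1 H_{j+1}|}{n} + \sum_{j \ge n} |E_{\bar\gamma} H_1 H_{j+1}| \le \frac{\sum_{j \in \mathbf{N}_+} j\, |E_{\bar\gamma} H_1 H_{j+1}|}{n},$$

equation (2.1.28) holds, too, under our stronger assumption. □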
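The bookkeeping in this proof can be tried out on a toy covariance sequence. The sketch below is purely illustrative and does not involve continued fractions: it assumes a hypothetical stationary sequence with $E H_1^2 = 1$ and $E H_1 H_{j+1} = \rho^j$ for $\rho = 1/2$, for which the series (2.1.26) sums to $\sigma^2 = 1 + 2\rho/(1-\rho) = 3$. The function name `second_moment` is an assumption of this example.

```python
def second_moment(n, cov):
    """E S_n^2 = n*cov(0) + 2*sum_{j=1}^{n-1} (n - j)*cov(j) for a stationary sequence."""
    return n * cov(0) + 2.0 * sum((n - j) * cov(j) for j in range(1, n))

rho = 0.5
cov = lambda j: rho ** j                    # hypothetical covariances E H_1 H_{j+1}
sigma2 = 1.0 + 2.0 * rho / (1.0 - rho)      # series (2.1.26) in closed form

n = 1000
# E S_n^2 - n*sigma^2 stays bounded, which is the O(1/n) statement (2.1.28) after dividing by n.
gap = second_moment(n, cov) - n * sigma2
```

Here the gap equals $-2\sum_{j<n} j\rho^j - 2n\sum_{j \ge n} \rho^j$, which for this $\rho$ is very close to $-4$ once $n$ is moderately large.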

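Corollary 2.1.20 Assume that $E_{\bar\gamma} H_1^2 < \infty$, $E_{\bar\gamma} H_1 = 0$, and

$$\sum_{n \in \mathbf{N}_+} n\, |E_{\bar\gamma} H_1 H_{n+1}| < \infty.$$

Then $\sigma = 0$ if and only if there exists $g \in L^2_{\bar\gamma}(I^2)$ such that $H_1 = g \circ \bar\tau - g$ a.e. in $I^2$.

Proposition 2.1.21 If $E_{\bar\gamma} H_1^2 < \infty$, $E_{\bar\gamma} H_1 = 0$, and

$$\sum_{n \in \mathbf{N}_+} E_{\bar\gamma}^{1/2} \left[ H_1 - E_{\bar\gamma}(H_1 \mid \bar a_{-n}, \cdots, \bar a_n) \right]^2 < \infty, \tag{2.1.29}$$

then series (2.1.26) converges absolutely.

On account of Corollary 1.3.15, this is a transcription of part of Theorem 18.6.1 in Ibragimov and Linnik (1971) for the special case of the doubly infinite sequence $(\bar a_\ell)_{\ell \in \mathbf{Z}}$. □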

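Note that both the conditional mean value occurring in (2.1.29) and $\sigma^2$ can be expressed in terms of the random variable $h$ on $(I^2, \mathcal{B}^2_I)$ defined on $\Omega^2$ (thus a.e. in $I^2$) by

$$h([i_1, i_2, \cdots], [i_0, i_{-1}, \cdots]) = H(\cdots, i_{-1}, i_0, i_1, \cdots)$$

for any $(i_\ell)_{\ell \in \mathbf{Z}} \in \mathbf{N}_+^{\mathbf{Z}}$. Clearly,

$$E_{\bar\gamma} H_1 = \int_{I^2} h\,d\bar\gamma, \qquad E_{\bar\gamma} H_1^2 = \int_{I^2} h^2\,d\bar\gamma,$$

$$E_{\bar\gamma}(H_1 \mid \bar a_{-n}, \cdots, \bar a_n)(\omega, \theta) = \frac{1}{\bar\gamma(I^2(i_{-n}, \cdots, i_n))} \int_{I^2(i_{-n}, \cdots, i_n)} h\,d\bar\gamma$$

for $(\omega, \theta) \in I^2(i_{-n}, \cdots, i_n)$, where

$$I^2(i_{-n}, \cdots, i_n) = I(i_1, \cdots, i_n) \times I(i_0, i_{-1}, \cdots, i_{-n})$$

for any $i_k \in \mathbf{N}_+$, $-n \le k \le n$, $n \in \mathbf{N}_+$, and

$$\sigma^2 = \int_{I^2} h^2\,d\bar\gamma + 2 \sum_{n \in \mathbf{N}_+} \int_{I^2} h\,(h \circ \bar\tau^n)\,d\bar\gamma.$$

Condition (2.1.29) is fulfilled for a large class of functions $h$ as shown by the following result.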

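Proposition 2.1.22 Put

$$c_n = \sup \left| h(\omega, \theta) - h(\omega', \theta') \right|, \quad n \in \mathbf{N}_+,$$

where the upper bound is taken over all $(\omega, \theta), (\omega', \theta') \in I^2(i_{-n}, \cdots, i_n)$ and $i_k \in \mathbf{N}_+$, $-n \le k \le n$. Assume that $E_{\bar\gamma} H_1^2 = \int_{I^2} h^2\,d\bar\gamma < \infty$ and $\sum_{n \in \mathbf{N}_+} c_n < \infty$. Then (2.1.29) holds.

Proof. Write $J = I^2(i_{-n}, \cdots, i_n)$ for brevity. For any $n \in \mathbf{N}_+$ we have

$$\begin{aligned}
E_{\bar\gamma} \left[ H_1 - E_{\bar\gamma}(H_1 \mid \bar a_{-n}, \cdots, \bar a_n) \right]^2
&= \sum_{i_{-n}, \cdots, i_n \in \mathbf{N}_+} \int_J \left( h(\omega', \theta') - \frac{\int_J h(\omega, \theta)\,\bar\gamma(d\omega, d\theta)}{\bar\gamma(J)} \right)^2 \bar\gamma(d\omega', d\theta') \\
&= \sum_{i_{-n}, \cdots, i_n \in \mathbf{N}_+} \int_J \bar\gamma(d\omega', d\theta')\, \frac{\left( \int_J \left( h(\omega', \theta') - h(\omega, \theta) \right) \bar\gamma(d\omega, d\theta) \right)^2}{\bar\gamma^2(J)} \\
&\le \sum_{i_{-n}, \cdots, i_n \in \mathbf{N}_+} \bar\gamma(J)\, c_n^2 = c_n^2.
\end{aligned}$$

Hence the series occurring in (2.1.29) is dominated by the convergent series $\sum_{n \in \mathbf{N}_+} c_n$, which completes the proof. □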


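Remark. If for some positive constants $c$ and $\varepsilon$ we have

$$\left| h(\omega, \theta) - h(\omega', \theta') \right| \le c \left( \left| \frac{1}{\omega} - \frac{1}{\omega'} \right| + \left| \frac{1}{\theta} - \frac{1}{\theta'} \right| \right)^{\varepsilon} \tag{2.1.30}$$

for any $(\omega, \theta), (\omega', \theta') \in \Omega^2$, then the assumption of Proposition 2.1.22 holds. Indeed, for $(\omega, \theta), (\omega', \theta') \in I^2(i_{-n}, \cdots, i_n)$ we have

$$\left| \frac{1}{\theta} - \frac{1}{\theta'} \right| \le \sup_{i_{-1}, \cdots, i_{-n} \in \mathbf{N}_+} \lambda(I(i_{-1}, \cdots, i_{-n})) = (F_n F_{n+1})^{-1}$$

and similarly

$$\left| \frac{1}{\omega} - \frac{1}{\omega'} \right| \le (F_{n-1} F_n)^{-1}.$$

Hence

$$\left| h(\omega, \theta) - h(\omega', \theta') \right| \le c\, 2^{\varepsilon} (F_{n-1} F_n)^{-\varepsilon}, \quad n \in \mathbf{N}_+,$$

for any $(\omega, \theta), (\omega', \theta') \in I^2(i_{-n}, \cdots, i_n)$, $i_k \in \mathbf{N}_+$, $-n \le k \le n$, and clearly the series $\sum_{n \in \mathbf{N}_+} (F_{n-1} F_n)^{-\varepsilon}$ is convergent.

In particular, (2.1.30) holds if $h$ satisfies a Hölder condition of order $\varepsilon > 0$, that is,

$$\sup_{(\omega, \theta), (\omega', \theta') \in \Omega^2} \frac{|h(\omega, \theta) - h(\omega', \theta')|}{(|\omega' - \omega| + |\theta' - \theta|)^{\varepsilon}} < \infty.$$

□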

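The results above clearly apply to the special case where $H$ is a real-valued function on $\mathbf{N}_+^{\mathbf{N}_+}$. In this case we set

$$H_n = H(a_n, a_{n+1}, \cdots) = H_1 \circ \tau^{n-1}, \quad n \in \mathbf{N}_+.$$

Then $(H_n)_{n \in \mathbf{N}_+}$ is a strictly stationary sequence on $(I, \mathcal{B}_I, \gamma)$. Theorem 2.1.18, Proposition 2.1.19, Corollary 2.1.20, and Proposition 2.1.21 hold in the present case if in their statements we replace $\bar\gamma$ by $\gamma$, $I^2$ by $I$, $\bar\tau$ by $\tau$ and inequality (2.1.29) by

$$\sum_{n \in \mathbf{N}_+} E_\gamma^{1/2} \left[ H_1 - E_\gamma(H_1 \mid a_1, \cdots, a_n) \right]^2 < \infty. \tag{2.1.31}$$

In the present case the conditional mean value occurring in (2.1.31) and $\sigma^2$ can be expressed in terms of the random variable $h$ on $(I, \mathcal{B}_I)$ defined on $\Omega$ (thus a.e. in $I$) by

$$h([i_1, i_2, \cdots]) = H(i_1, i_2, \cdots)$$

for any $(i_\ell)_{\ell \in \mathbf{N}_+} \in \mathbf{N}_+^{\mathbf{N}_+}$. Clearly,

$$E_\gamma H_1 = \int_I h\,d\gamma, \qquad E_\gamma H_1^2 = \int_I h^2\,d\gamma,$$

$$E_\gamma(H_1 \mid a_1, \cdots, a_n)(\omega) = \frac{1}{\gamma(I(i^{(n)}))} \int_{I(i^{(n)})} h\,d\gamma$$

for any $\omega \in I(i^{(n)})$, $i^{(n)} \in \mathbf{N}_+^n$, $n \in \mathbf{N}_+$, and

$$\sigma^2 = \int_I h^2\,d\gamma + 2 \sum_{n \in \mathbf{N}_+} \int_I h\,(h \circ \tau^n)\,d\gamma = \int_I h^2\,d\gamma + 2 \sum_{n \in \mathbf{N}_+} \int_I h\,U^n h\,d\gamma$$

[the last equation is a consequence of (2.1.2)].

It follows from Proposition 2.1.22 that condition (2.1.31) is fulfilled if we assume that $\int_I h^2\,d\gamma < \infty$ and $\sum_{n \in \mathbf{N}_+} c_n < \infty$, where

$$c_n = \sup_{i^{(n)} \in \mathbf{N}_+^n}\ \sup_{\omega, \omega' \in I(i^{(n)})} \left| h(\omega) - h(\omega') \right|, \quad n \in \mathbf{N}_+.$$

In turn, the second assumption holds if for some positive constants $c$ and $\varepsilon$ we have

$$\left| h(\omega) - h(\omega') \right| \le c \left| \frac{1}{\omega} - \frac{1}{\omega'} \right|^{\varepsilon}, \quad \omega, \omega' \in \Omega. \tag{2.1.32}$$

In particular, (2.1.32) holds if $h$ satisfies a Hölder condition of order $\varepsilon > 0$, that is,

$$\sup_{\omega, \omega' \in \Omega} \frac{|h(\omega) - h(\omega')|}{|\omega - \omega'|^{\varepsilon}} < \infty.$$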

To indicate another class of functions $h$ for which (2.1.31) holds let us recall that a function $h : I \to \mathbf{C}$ is said to be of bounded $p$-variation, $p \ge 1$, on $A \subset I$ if and only if

$$\operatorname{var}^{(p)}_A h := \sup \sum_{i=1}^{k-1} |h(t_{i+1}) - h(t_i)|^p < \infty,$$

the supremum being taken over $t_1 < \cdots < t_k$, $t_i \in A$, $1 \le i \le k$, and $k \ge 2$. We write simply $\operatorname{var}^{(p)} h$ for $\operatorname{var}^{(p)}_I h$. If $\operatorname{var}^{(p)} h < \infty$ then $h$ is called a function of bounded $p$-variation. Clearly, $\operatorname{var}^{(1)} h = \operatorname{var} h$ and a function of bounded variation is also a function of bounded $p$-variation for any $p > 1$. (The converse of this assertion is in general not true.) More generally, a function of bounded $p$-variation, $p \ge 1$, is also a function of bounded $p'$-variation, $p' > p$.

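Proposition 2.1.23. If $h$ is a function of bounded $p$-variation on $\Omega$, then (2.1.31) holds.

Proof. Without any loss of generality, on account of the last assertion above we can assume that $p \ge 2$. It is obvious that

$$\left| h(\omega) - h(\omega') \right| \le \left( \operatorname{var}^{(p)}_A h \right)^{1/p}$$

for any $A \subset \Omega$ and $\omega, \omega' \in A$. Then, writing $J = I(i^{(n)})$ for brevity,

$$\begin{aligned}
E_\gamma^{1/2} \left[ H_1 - E_\gamma(H_1 \mid a_1, \cdots, a_n) \right]^2
&\le E_\gamma^{1/p} \left| H_1 - E_\gamma(H_1 \mid a_1, \cdots, a_n) \right|^p \\
&= \left( \sum_{i^{(n)} \in \mathbf{N}_+^n} \int_J \left| h(\omega) - \frac{1}{\gamma(J)} \int_J h(\omega')\,\gamma(d\omega') \right|^p \gamma(d\omega) \right)^{1/p} \\
&= \left( \sum_{i^{(n)} \in \mathbf{N}_+^n} \frac{1}{\gamma^p(J)} \int_J \gamma(d\omega) \left| \int_J \left( h(\omega) - h(\omega') \right) \gamma(d\omega') \right|^p \right)^{1/p} \\
&\le \left( \max_{i^{(n)} \in \mathbf{N}_+^n} \gamma(I(i^{(n)})) \sum_{i^{(n)} \in \mathbf{N}_+^n} \operatorname{var}^{(p)}_{I(i^{(n)})} h \right)^{1/p} \\
&\le \left( \frac{1}{\log 2} (F_n F_{n+1})^{-1} \right)^{1/p} \left( \operatorname{var}^{(p)}_\Omega h \right)^{1/p}.
\end{aligned}$$

Hence the series occurring in (2.1.31) is dominated by

$$\frac{\left( \operatorname{var}^{(p)}_\Omega h \right)^{1/p}}{(\log 2)^{1/p}} \sum_{n \in \mathbf{N}_+} (F_n F_{n+1})^{-1/p},$$

and clearly the last series is convergent. □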

It is important to know when $\sigma^2$ defined in terms of $H$ or, equivalently, in terms of $h$, is non-zero. In the result below the function $h$, which is only defined on $\Omega$, is considered as the representative of a class of $\lambda$-indistinguishable functions on $I$, after having been extended in an arbitrary manner to the whole of $I$.

Proposition 2.1.24 Assume that $h \in L^2_\gamma(I)$, $\int_I h\,d\gamma = 0$, and $Uh \in BEV(I)$. Then the series

$$\sigma^2 = \int_I h^2\,d\gamma + 2 \sum_{n \in \mathbf{N}_+} \int_I h\,U^n h\,d\gamma \tag{2.1.33}$$

converges absolutely, and we have $\sigma = 0$ if and only if there exists $b \in L^2_\gamma(I)$ such that $h = b \circ \tau - b$ a.e. in $I$. In particular, if $h$ is essentially unbounded then $\sigma \ne 0$.

Proof. By (2.0.2) and Proposition 2.1.7(ii) we have

$$\operatorname{ess\,sup} |U^n h| \le \|U^n h\|_v \le q^{n-1} \|Uh\|_v, \quad n \in \mathbf{N}_+, \tag{2.1.34}$$

for some positive $q < 1$. This clearly entails the absolute convergence of both series (2.1.33) and $\sum_{n \in \mathbf{N}_+} n \int_I h\,U^n h\,d\gamma$. Then Corollary 2.1.20 completes the proof of the first two assertions concerning $\sigma$.

Without appealing to Corollary 2.1.20, the characterization of the case $\sigma = 0$ can be given a direct proof as follows. Put $h_1 = \sum_{n \in \mathbf{N}_+} U^n h$. By (2.1.34) this series converges in $BEV(I)$, and we have $h_1 = Uh + Uh_1 = U(h + h_1)$. Writing $g = h + h_1$ we note that $Ug \in BEV(I)$ and

$$\sigma^2 = \int_I \left( h^2 + 2 h h_1 \right) d\gamma = \int_I \left( g^2 - (Ug)^2 \right) d\gamma.$$

By (2.1.2) we have

$$\int_I (Ug)^2\,d\gamma = \int_I ((Ug) \circ \tau)\, g\,d\gamma$$

and

$$\int_I (Ug)^2\,d\gamma = \int_I ((Ug) \circ \tau)^2\,d\gamma.$$

[Note that (2.1.2) implies in general that $\int_I f\,d\gamma = \int_I f \circ \tau\,d\gamma$, $f \in L^1_\gamma$, which also follows from the fact that $\tau$ is $\gamma$-preserving.] Consequently, we can write

$$\sigma^2 = \int_I g^2\,d\gamma - 2 \int_I ((Ug) \circ \tau)\, g\,d\gamma + \int_I ((Ug) \circ \tau)^2\,d\gamma = \int_I \left( g - (Ug) \circ \tau \right)^2 d\gamma.$$

Now, if $\sigma = 0$ then $g = (Ug) \circ \tau$ a.e. in $I$. Hence

$$h = (Ug) \circ \tau - Ug \quad \text{a.e. in } I, \tag{2.1.35}$$

that is, we can take $b = Ug$. Conversely, if $h = b \circ \tau - b$ a.e. in $I$ then $S_n = b \circ \tau^n - b$ a.e. in $I$ for any $n \in \mathbf{N}_+$. Hence

$$n^{-1} E_\gamma S_n^2 \le 4 n^{-1} \int_I b^2\,d\gamma \to 0 \quad \text{as } n \to \infty,$$

that is, $\sigma = 0$.

Finally, since $Ug \in BEV(I)$ as shown above, equation (2.1.35) cannot hold in the case where $h$ is essentially unbounded, that is, we cannot have $\sigma = 0$. □

Corollary 2.1.25 Let $f : \mathbf{N}_+ \to \mathbf{R}$ such that $E_\gamma f^2(a_1) < \infty$, $E_\gamma f(a_1) = 0$. Put

$$\sigma^2 = E_\gamma f^2(a_1) + 2 \sum_{n \in \mathbf{N}_+} E_\gamma f(a_1) f(a_{n+1}). \tag{2.1.36}$$

Then $\sigma = 0$ if and only if $f = 0$.

Proof. As a special case of (2.1.26) with (2.1.31) trivially satisfied, series (2.1.36) is absolutely convergent. Moreover, in the present case $h$ is defined by

$$h(\omega) = f(\lfloor 1/\omega \rfloor), \quad \omega \in \Omega,$$

and by hypothesis $h \in L^2_\gamma(I)$ and $\int_I h\,d\gamma = 0$. We then have

$$Uh(\omega) = \sum_{i \in \mathbf{N}_+} P_i(\omega) f(i), \quad \omega \in \Omega,$$

and

$$v(Uh) \le \sum_{i \in \mathbf{N}_+} |f(i)| \operatorname{var} P_i \le C \sum_{i \in \mathbf{N}_+} \frac{|f(i)|}{i^2}$$

for some $C > 0$. The last series is convergent since $E_\gamma |f(a_1)| < \infty$, so that $Uh \in BEV(I)$. Then by Proposition 2.1.24 we have $\sigma = 0$ if and only if there exists $b \in L^2_\gamma(I)$ such that $h = b \circ \tau - b$ a.e. in $I$, and we have to show that this happens if and only if $f = 0$. Clearly, if $f = 0$ then $\sigma = 0$. To prove the converse we first note that

$$Uh = U(b \circ \tau) - Ub = b - Ub \quad \text{a.e. in } I.$$

This equation holds for $b$ equal to $h_1 = \sum_{n \in \mathbf{N}_+} U^n h \in BEV(I)$. Putting $b = b_1 + h_1$ we get $b_1 = Ub_1$. But by Proposition 2.1.7 the last equation only holds for a.e. constant functions $b_1$. This shows that actually $b \in BEV(I)$. Next, whatever $i \in \mathbf{N}_+$, for $u \in (1/(i+1), 1/i)$ the equation $h(u) = (b \circ \tau)(u) - b(u)$ a.e. in $I$ implies

$$f(i) = b(x) - b\!\left(\frac{1}{x + i}\right)$$

a.e. in $I$. Hence

$$n f(i) = b(x) - b([i(n-1), x + i])$$

a.e. in $I$ for any $n \ge 2$, where $i(n-1) = (i_1, \cdots, i_{n-1})$ with $i_1 = \cdots = i_{n-1} = i$. If $f(i) \ne 0$ then this contradicts the fact that $b \in BEV(I)$. The proof is complete. □

We note another criterion for having $\sigma \ne 0$ under stronger assumptions than in Proposition 2.1.24.

Proposition 2.1.26 Let $h : I \to \mathbf{R}$ be continuous except for a finite number of points of $I$ and assume that $\inf_{x \in (0, \delta)} |h(x)| > 0$ for some $\delta > 0$, $\int_I h\,d\gamma = 0$, and $\int_I h^2\,d\gamma < \infty$. If the series defining $Uh$ is uniformly convergent in $I$ and $Uh \in BV(I)$, then $\sigma$ defined by (2.1.33) is non-zero.

For the proof see Samur (1996). □

2.2 Wirsing’s solution to Gauss’ problem

2.2.1 Elementary considerations

Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$. For any $n \in \mathbf{N}$ put

$$F_n(x) = \mu(\tau^n < x), \quad x \in I,$$

with $\tau^0$ = identity map. As $(\tau^n < x) = \tau^{-n}((0, x))$, by Proposition 2.1.5 we have

$$F_n(x) = \int_0^x \frac{U^n f_0(u)}{u + 1}\,du, \quad n \in \mathbf{N},\ x \in I, \tag{2.2.1}$$

with $f_0(x) = (x + 1) F_0'(x)$, $x \in I$, where $F_0' = d\mu/d\lambda$. [Clearly, (2.2.1) is a special case of (2.1.21).]

In this subsection we will assume that $F_0' \in C^1(I)$. In other words, we study the behaviour of $U^n$ as $n \to \infty$, assuming that the domain of $U$ is $C^1(I)$.


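Let $f \in C^1(I)$. Then

$$Uf(x) = \sum_{i \in \mathbf{N}_+} P_i(x)\, f\!\left(\frac{1}{x + i}\right) = \sum_{i \in \mathbf{N}_+} \left( \frac{i}{x + i + 1} - \frac{i - 1}{x + i} \right) f\!\left(\frac{1}{x + i}\right), \quad x \in I,$$

can be differentiated term by term to give

$$\begin{aligned}
(Uf)'(x) &= -\sum_{i \in \mathbf{N}_+} \left( \left( \frac{i}{(x + i + 1)^2} - \frac{i - 1}{(x + i)^2} \right) f\!\left(\frac{1}{x + i}\right) + \left( \frac{i}{x + i + 1} - \frac{i - 1}{x + i} \right) \frac{1}{(x + i)^2}\, f'\!\left(\frac{1}{x + i}\right) \right) \\
&= -\sum_{i \in \mathbf{N}_+} \left( \frac{i}{(x + i + 1)^2} \left( f\!\left(\frac{1}{x + i}\right) - f\!\left(\frac{1}{x + i + 1}\right) \right) + \frac{x + 1}{(x + i)^3 (x + i + 1)}\, f'\!\left(\frac{1}{x + i}\right) \right), \quad x \in I,
\end{aligned}$$

since the series of derivatives is uniformly convergent, it being dominated by a convergent series of positive constants. Hence

$$(Uf)' = -V f', \quad f \in C^1(I), \tag{2.2.2}$$

where $V : C(I) \to C(I)$ is defined by

$$Vg(x) = \sum_{i \in \mathbf{N}_+} \left( \frac{i}{(x + i + 1)^2} \int_{1/(x+i+1)}^{1/(x+i)} g(u)\,du + \frac{x + 1}{(x + i)^3 (x + i + 1)}\, g\!\left(\frac{1}{x + i}\right) \right), \quad g \in C(I),\ x \in I.$$

Clearly,

$$(U^n f)' = (-1)^n V^n f', \quad n \in \mathbf{N}_+,\ f \in C^1(I). \tag{2.2.3}$$

We are going to show that $V^n$ takes certain functions into functions with very small values when $n \in \mathbf{N}_+$ is large.

Proposition 2.2.1 There are positive constants $v > 0.29017$ and $w < 0.30796$, and a real-valued function $\varphi \in C(I)$ such that

$$v\varphi \le V\varphi \le w\varphi.$$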


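Proof. Let $h : \mathbf{R}_+ \to \mathbf{R}$ be a continuous bounded function such that $\lim_{x \to \infty} h(x)/x = 0$. We look for a function $g : (0, 1] \to \mathbf{R}$ such that $Ug = h$, assuming that the equation

$$Ug(x) = \sum_{i \in \mathbf{N}_+} P_i(x)\, g\!\left(\frac{1}{x + i}\right) = h(x) \tag{2.2.4}$$

holds for $x \in \mathbf{R}_+$. Then (2.2.4) yields

$$\frac{h(x)}{x + 1} - \frac{h(x + 1)}{x + 2} = \frac{1}{(x + 1)(x + 2)}\, g\!\left(\frac{1}{x + 1}\right), \quad x \in \mathbf{R}_+.$$

Hence

$$g(u) = \left( \frac{1}{u} + 1 \right) h\!\left( \frac{1}{u} - 1 \right) - \frac{1}{u}\, h\!\left( \frac{1}{u} \right), \quad u \in (0, 1],$$

and we indeed have $Ug = h$ since

$$Ug(x) = (x + 1) \sum_{i \in \mathbf{N}_+} \left( \frac{h(x + i - 1)}{x + i} - \frac{h(x + i)}{x + i + 1} \right) = (x + 1) \left( \frac{h(x)}{x + 1} - \lim_{i \to \infty} \frac{h(x + i)}{x + i + 1} \right) = h(x), \quad x \in \mathbf{R}_+.$$

In particular, for any fixed $a \in I$ we consider the function $h_a : \mathbf{R}_+ \to \mathbf{R}$ defined by

$$h_a(x) = \frac{1}{x + a + 1}, \quad x \in \mathbf{R}_+.$$

We have just seen that the function $g_a : (0, 1] \to \mathbf{R}$ defined by

$$g_a(x) = \left( \frac{1}{x} + 1 \right) h_a\!\left( \frac{1}{x} - 1 \right) - \frac{1}{x}\, h_a\!\left( \frac{1}{x} \right) = \frac{x + 1}{ax + 1} - \frac{1}{(a + 1)x + 1}, \quad x \in (0, 1],$$

satisfies

$$U g_a(x) = h_a(x), \quad x \in I.$$

We come to $V$ via (2.2.2). Setting

$$\varphi_a(x) = g_a'(x) = \frac{1 - a}{(ax + 1)^2} + \frac{a + 1}{((a + 1)x + 1)^2}, \quad x \in I,$$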

we have

$$V\varphi_a(x) = -(U g_a)'(x) = \frac{1}{(x + a + 1)^2}, \quad x \in I.$$

Let us choose $a$ by asking that

$$\frac{\varphi_a}{V\varphi_a}(0) = \frac{\varphi_a}{V\varphi_a}(1).$$

This amounts to

$$(a + 1)^3 (2a + 1) + (a - 1)(a + 2)^2 = 0$$

or

$$2(a + 1)^4 - 3(a + 1) - 2 = 0,$$

which yields as unique acceptable solution

$$a = 0.3126597\cdots.$$

For this value of $a$ the function $\varphi_a / V\varphi_a$ attains its maximum equal to $2(a + 1)^2 = 3.44615\cdots$ at $x = 0$ and at $x = 1$, and has a minimum equal to

$$m(a) = a^3 + a^2 - a + 1 + 3a(a + 2)\left(1 - a - a^2\right) \left( (1 - a)\delta + \frac{a + 1}{\delta} \right) = 3.247229\cdots$$

at $x = (\delta - 1)/(1 - a(\delta - 1)) = 0.3655\cdots$, where

$$\delta = \left( \frac{a(a + 1)(a + 2)}{(1 - a)(1 - a - a^2)} \right)^{1/3} = 1.328024\cdots.$$

It follows that for $\varphi = \varphi_a$ with $a = 0.3126597\cdots$ we have

$$\frac{\varphi}{2(a + 1)^2} \le V\varphi \le \frac{\varphi}{m(a)},$$

that is,

$$v\varphi \le V\varphi \le w\varphi,$$

where

$$v = \frac{1}{2(a + 1)^2} > 0.29017, \qquad w = \frac{1}{m(a)} < 0.30796.$$

□
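The constants in this proof are easy to reproduce numerically. The sketch below is an illustration under the formulas stated above: it solves $2(a+1)^4 - 3(a+1) - 2 = 0$ on $[0, 1]$ by bisection and then scans the ratio $\varphi_a / V\varphi_a$ on a grid to recover $v$ and $w$. The helper names `quartic`, `bisect`, `phi`, `V_phi` and the grid size are choices of this example.

```python
def quartic(a):
    """Left-hand side of 2(a + 1)^4 - 3(a + 1) - 2 = 0."""
    return 2.0 * (a + 1.0) ** 4 - 3.0 * (a + 1.0) - 2.0

def bisect(func, lo, hi, steps=80):
    """Plain bisection; assumes func changes sign exactly once on [lo, hi]."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if func(lo) * func(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

a = bisect(quartic, 0.0, 1.0)

def phi(x):
    """phi_a(x) = (1 - a)/(ax + 1)^2 + (a + 1)/((a + 1)x + 1)^2."""
    return (1.0 - a) / (a * x + 1.0) ** 2 + (a + 1.0) / ((a + 1.0) * x + 1.0) ** 2

def V_phi(x):
    """V phi_a(x) = 1/(x + a + 1)^2."""
    return 1.0 / (x + a + 1.0) ** 2

grid = [k / 10000.0 for k in range(10001)]
ratios = [phi(x) / V_phi(x) for x in grid]
v = 1.0 / max(ratios)   # maximum of the ratio is attained at the endpoints
w = 1.0 / min(ratios)   # minimum is attained in the interior, near x = 0.3655
```

The grid minimum can only overestimate the true minimum $m(a)$, so the computed `w` is a slight underestimate of $1/m(a)$; the difference is far below the digits quoted in the proposition.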


Remark. As noted by Wirsing (1974, p. 513), a better choice of $\varphi$ is $\varphi = 8\varphi_{a'} - 7\varphi_{a''}$ with $a' = 0.6247$ and $a'' = 0.7$, which yields $v = 0.3020$, $w = 0.3043$. □

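Corollary 2.2.2 Let $f_0 \in C^1(I)$ such that $f_0' > 0$. Put

$$\alpha = \min_{x \in I} \frac{\varphi(x)}{f_0'(x)}, \qquad \beta = \max_{x \in I} \frac{\varphi(x)}{f_0'(x)}.$$

Then

$$\frac{\alpha}{\beta}\, v^n f_0' \le V^n f_0' \le \frac{\beta}{\alpha}\, w^n f_0', \quad n \in \mathbf{N}_+. \tag{2.2.4}$$

Proof. Since $V$ is a positive operator (that is, takes non-negative functions into non-negative functions) we have

$$v^n \varphi \le V^n \varphi \le w^n \varphi, \quad n \in \mathbf{N}_+.$$

Noting that $\alpha f_0' \le \varphi \le \beta f_0'$ we then can write

$$\frac{\alpha}{\beta}\, v^n f_0' \le \frac{1}{\beta}\, v^n \varphi \le \frac{1}{\beta}\, V^n \varphi \le V^n f_0' \le \frac{1}{\alpha}\, V^n \varphi \le \frac{1}{\alpha}\, w^n \varphi \le \frac{\beta}{\alpha}\, w^n f_0', \quad n \in \mathbf{N}_+,$$

which shows that (2.2.4) holds. □

Remark. A similar result holds if $f_0 \in C^1(I)$ and $f_0' < 0$. □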

Theorem 2.2.3 (Near-optimal solution to Gauss' problem) Let $f_0 \in C^1(I)$ such that $f_0' > 0$. For any $n \in \mathbf{N}_+$ and $x \in I$ we have

$$\frac{(\log 2)^2\, \alpha \min_{x \in I} f_0'(x)}{2\beta}\, v^n G(x)(1 - G(x)) \le |\mu(\tau^n < x) - G(x)| \le \frac{(\log 2)^2\, \beta \max_{x \in I} f_0'(x)}{\alpha}\, w^n G(x)(1 - G(x)),$$

where $\alpha$, $\beta$, $v$, and $w$ are defined in Proposition 2.2.1 and Corollary 2.2.2. In particular, for any $n \in \mathbf{N}_+$ and $x \in I$ we have

$$0.07739\, v^n G(x)(1 - G(x)) \le |\lambda(\tau^n < x) - G(x)| \le 1.49132\, w^n G(x)(1 - G(x)).$$


Proof. For any $n \in \mathbf{N}$ and $y \in I$ set $d_n(y) = \mu\left(\tau^n < e^{y \log 2} - 1\right) - y$ so that

\[ d_n(G(x)) = \mu(\tau^n < x) - G(x), \quad x \in I. \]

Then by (2.2.1) we have

\[ d_n(G(x)) = \int_0^x \frac{U^n f_0(u)}{u+1}\, du - G(x). \]

Differentiating twice with respect to $x$ yields

\[ d_n'(G(x))\, \frac{1}{(x+1)\log 2} = \frac{U^n f_0(x)}{x+1} - \frac{1}{(x+1)\log 2}, \]

\[ (U^n f_0(x))' = \frac{1}{(\log 2)^2}\, \frac{d_n''(G(x))}{x+1}, \quad n \in \mathbf{N},\ x \in I. \]

Hence, by (2.2.3),

\[ d_n''(G(x)) = (-1)^n (\log 2)^2 (x+1)\, V^n f_0'(x), \quad n \in \mathbf{N},\ x \in I. \]

Since $d_n(0) = d_n(1) = 0$, it follows from a well known interpolation formula that

\[ d_n(y) = -\frac{y(1-y)}{2}\, d_n''(\theta), \quad n \in \mathbf{N},\ y \in I, \]

for a suitable $\theta = \theta(n, y) \in I$. Therefore

\[ \mu(\tau^n < x) - G(x) = (-1)^{n+1} (\log 2)^2\, \frac{\theta + 1}{2}\, V^n f_0'(\theta)\, G(x)(1 - G(x)) \]

for any $n \in \mathbf{N}$ and $x \in I$, and another suitable $\theta = \theta(n, x) \in I$. The result stated follows now from Corollary 2.2.2.

In the special case $\mu = \lambda$ we have $f_0(x) = x + 1$, $x \in I$. Then with $a = 0.3126597\cdots$ we have

\[ \alpha = \min_{x \in I} \frac{\varphi(x)}{f_0'(x)} = \frac{1-a}{(a+1)^2} + \frac{a+1}{(a+2)^2} = 0.644333\cdots, \qquad \beta = \max_{x \in I} \frac{\varphi(x)}{f_0'(x)} = 2, \]

so that

\[ \frac{(\log 2)^2 \alpha}{2\beta} = 0.07739\cdots, \qquad \frac{(\log 2)^2 \beta}{\alpha} = 1.49131\cdots. \]

The proof is complete. □

Remark. It follows from the above proof that for any $n \in \mathbf{N}$ the difference $\mu(\tau^n < x) - G(x)$ has a constant sign equal to $(-1)^{n+1}$ whatever $0 < x < 1$. □

2.2.2 A functional-theoretic approach

The question naturally arises whether the operator $V$ has an eigenvalue $\lambda_0$ such that $v \le \lambda_0 \le w$ (see Theorem 2.2.3). This will indeed follow from the result below.

Let $B$ be a collection of bounded real-valued functions defined on a set $X$, with the following properties: (i) $B$ is a linear space over $\mathbf{R}$; (ii) $B$ is complete with respect to the supremum norm, and (iii) $B$ contains the constant functions.

Theorem 2.2.4 Let $V : B \to B$ be a positive bounded linear operator and $F : B \to \mathbf{R}$ a positive bounded linear functional such that

\[ V \ge F. \tag{2.2.5} \]

Assume that there exist $\varphi \in B$ with

\[ m(\varphi) = \inf_{x \in X} \varphi(x) > 0 \]

and two positive numbers $v$ and $w$, $v \le w$, such that

\[ v \le \frac{V\varphi(x)}{\varphi(x)} \le w, \quad x \in X, \tag{2.2.6} \]

and

\[ F(\varphi) > \left(1 - \frac{v}{w}\right) |||V\varphi|||. \tag{2.2.7} \]

Then $V$ has an eigenvalue $\lambda_0 \in [v, w]$ with corresponding positive eigenfunction $\psi \in B$ such that

\[ \psi \ge \varphi \ge m(\varphi) > 0, \qquad 0 < \frac{wF(\varphi)}{|||V\varphi|||} - (w - v) \le \frac{F(\psi)}{|||\psi|||} \le \lambda_0, \]

and for any $n \in \mathbf{N}$ and $f \in B$ we have

\[ V^n f = G(f)\, \lambda_0^n\, \psi + \operatorname{osc}\frac{f}{\psi} \left(\lambda_0 - \frac{F(\psi)}{|||\psi|||}\right)^n \theta_n \psi, \tag{2.2.8} \]

where $G : B \to \mathbf{R}$ is a positive bounded linear functional with $|||G||| \le 1/m(\varphi)$, and $\theta_n : X \to \mathbf{R}$ is a function satisfying $|\theta_n| \le 1$.

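Proof. Define $\varphi_n = V^n \varphi$, $n \in \mathbf{N}$, $\varphi_0 = \varphi$. Since $V$ is positive, from (2.2.6) we get

\[ v\varphi_n \le \varphi_{n+1} \le w\varphi_n, \quad n \in \mathbf{N}. \]

It follows that

\[ \inf_{x \in X} \varphi_n(x) > 0, \quad n \in \mathbf{N}. \]

Set $v_0 = v$, $w_0 = w$, and

\[ v_n = \inf \frac{\varphi_{n+1}}{\varphi_n}, \qquad w_n = \sup \frac{\varphi_{n+1}}{\varphi_n}, \quad n \in \mathbf{N}_+. \]

Then

\[ v_n \varphi_n \le \varphi_{n+1} \le w_n \varphi_n, \quad n \in \mathbf{N}, \tag{2.2.9} \]

whence

\[ v_n V\varphi_n \le V\varphi_{n+1} \le w_n V\varphi_n, \]

that is,

\[ v_n \varphi_{n+1} \le \varphi_{n+2} \le w_n \varphi_{n+1}. \]

Therefore $v_{n+1} \ge v_n$ and $w_{n+1} \le w_n$, $n \in \mathbf{N}$. We are going to improve these inequalities.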

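It follows from (2.2.5) and (2.2.9) that

\[ \varphi_{n+2} - v_n \varphi_{n+1} = V(\varphi_{n+1} - v_n \varphi_n) \ge F(\varphi_{n+1} - v_n \varphi_n) \ge \frac{\varphi_{n+1}}{|||\varphi_{n+1}|||}\, F(\varphi_{n+1} - v_n \varphi_n), \]

whence

\[ v_{n+1} \ge v_n + \frac{F(\varphi_{n+1} - v_n \varphi_n)}{|||\varphi_{n+1}|||}, \quad n \in \mathbf{N}. \tag{2.2.10} \]

Similarly,

\[ w_n \varphi_{n+1} - \varphi_{n+2} = V(w_n \varphi_n - \varphi_{n+1}) \ge F(w_n \varphi_n - \varphi_{n+1}) \ge \frac{\varphi_{n+1}}{|||\varphi_{n+1}|||}\, F(w_n \varphi_n - \varphi_{n+1}), \]

whence

\[ w_{n+1} \le w_n - \frac{F(w_n \varphi_n - \varphi_{n+1})}{|||\varphi_{n+1}|||}, \quad n \in \mathbf{N}. \tag{2.2.10$'$} \]

Putting $d_n = w_n - v_n$ and $e_n = F(\varphi_n)/|||\varphi_{n+1}|||$, $n \in \mathbf{N}$, it follows from (2.2.10) and (2.2.10′) that

\[ d_{n+1} \le d_n (1 - e_n), \quad n \in \mathbf{N}, \tag{2.2.11} \]

which shows that $e_n \le 1$, $n \in \mathbf{N}$. Now, note that (2.2.9) implies

\[ F(\varphi_{n+1}) \ge v_n F(\varphi_n) \quad \text{and} \quad |||\varphi_{n+2}||| \le w_{n+1} |||\varphi_{n+1}|||, \quad n \in \mathbf{N}. \]

Hence

\[ e_{n+1} \ge \frac{v_n}{w_{n+1}}\, e_n, \quad n \in \mathbf{N}. \tag{2.2.12} \]

In conjunction with (2.2.11) and (2.2.12), assumption (2.2.7), which can be written as

\[ e_0 - \frac{d_0}{w_0} > 0, \]

ensures exponential decrease of the $d_n$, $n \in \mathbf{N}$, since

\[ w_{n+1} e_{n+1} - d_{n+1} \ge v_n e_n - d_n (1 - e_n) = w_n e_n - d_n, \quad n \in \mathbf{N}, \]

whence

\[ w_n e_n - d_n \ge w_0 e_0 - d_0, \]

\[ 1 \ge e_n \ge \frac{1}{w_n}(w_0 e_0 - d_0) \ge e_0 - \frac{d_0}{w_0} > 0, \tag{2.2.13} \]

and

\[ d_n \le d_0 \left(1 - e_0 + \frac{d_0}{w_0}\right)^n, \quad n \in \mathbf{N}. \tag{2.2.14} \]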

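Put $\lambda_0 = \lim_{n \to \infty} v_n = \lim_{n \to \infty} w_n$, and define

\[ \widetilde{\varphi}_0 = \varphi_0 = \varphi, \qquad \widetilde{\varphi}_n = \varphi_n (v_0 \cdots v_{n-1})^{-1}, \quad n \in \mathbf{N}_+. \]

Then (2.2.9) amounts to

\[ \widetilde{\varphi}_n \le \widetilde{\varphi}_{n+1} \le \frac{w_n}{v_n}\, \widetilde{\varphi}_n = \left(1 + \frac{d_n}{v_n}\right) \widetilde{\varphi}_n \le \left(1 + \frac{d_n}{v_0}\right) \widetilde{\varphi}_n, \quad n \in \mathbf{N}, \tag{2.2.15} \]

and (2.2.14) implies that

\[ A = \prod_{n \in \mathbf{N}} \left(1 + \frac{d_n}{v_0}\right) < \infty. \]

Hence

\[ \widetilde{\varphi}_n \le \prod_{i=0}^{n-1} \left(1 + \frac{d_i}{v_0}\right) \varphi_0 \le A \varphi_0, \quad n \in \mathbf{N}_+. \tag{2.2.16} \]

It follows from (2.2.15) and (2.2.16) that

\[ 0 \le \widetilde{\varphi}_{n+1} - \widetilde{\varphi}_n \le \frac{d_n}{v_0}\, \widetilde{\varphi}_n \le \frac{d_n A}{v_0}\, \varphi_0, \quad n \in \mathbf{N}. \]

Therefore by (2.2.14) the series $\sum_{n \in \mathbf{N}} |||\widetilde{\varphi}_{n+1} - \widetilde{\varphi}_n|||$ converges. By the completeness of $B$ the limit $\psi = \lim_{n \to \infty} \widetilde{\varphi}_n$ exists. Letting $n \to \infty$ in $v_n \widetilde{\varphi}_n \le V\widetilde{\varphi}_n \le w_n \widetilde{\varphi}_n$, $n \in \mathbf{N}$, yields $V\psi = \lambda_0 \psi$.

Since $\widetilde{\varphi}_{n+1} \ge \widetilde{\varphi}_n \ge \cdots \ge \widetilde{\varphi}_0 = \varphi$, we have $\psi \ge \varphi$. As $1 \ge e_n = F(\varphi_n)/|||\varphi_{n+1}||| = F(\widetilde{\varphi}_n)/|||V\widetilde{\varphi}_n|||$, $n \in \mathbf{N}$, letting $n \to \infty$ yields $1 \ge F(\psi)/(\lambda_0 |||\psi|||)$. Finally, by (2.2.13) we have

\[ \frac{F(\psi)}{|||\psi|||} = \frac{\lambda_0 F(\psi)}{|||V\psi|||} = \lim_{n \to \infty} w_n e_n \ge w_0 e_0 - d_0 = \frac{wF(\varphi)}{|||V\varphi|||} - w + v > 0. \]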

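To prove (2.2.8) let $f \in B$ and define $f_n = V^n f$, $n \in \mathbf{N}$, $f_0 = f$,

\[ v_n = \inf \frac{f_n}{\lambda_0^n \psi}, \qquad w_n = \sup \frac{f_n}{\lambda_0^n \psi}, \quad n \in \mathbf{N}. \]

Hence

\[ f_{n+1} - v_n \lambda_0^{n+1} \psi = V(f_n - v_n \lambda_0^n \psi) \ge F(f_n - v_n \lambda_0^n \psi) \ge \frac{\psi}{|||\psi|||}\, F(f_n - v_n \lambda_0^n \psi), \]

which yields

\[ v_{n+1} \ge v_n + \frac{1}{\lambda_0^{n+1} |||\psi|||}\, F(f_n - v_n \lambda_0^n \psi) \ge v_n, \quad n \in \mathbf{N}. \]

Similarly,

\[ w_{n+1} \le w_n - \frac{1}{\lambda_0^{n+1} |||\psi|||}\, F(w_n \lambda_0^n \psi - f_n) \le w_n, \quad n \in \mathbf{N}. \]

Therefore

\[ w_{n+1} - v_{n+1} \le (w_n - v_n) \left(1 - \frac{F(\psi)}{\lambda_0 |||\psi|||}\right), \quad n \in \mathbf{N}, \]

whence

\[ w_n - v_n \le \operatorname{osc}\frac{f}{\psi} \left(1 - \frac{F(\psi)}{\lambda_0 |||\psi|||}\right)^n, \quad n \in \mathbf{N}, \]

since

\[ w_0 - v_0 = \sup \frac{f}{\psi} - \inf \frac{f}{\psi} = \operatorname{osc}\frac{f}{\psi}. \]

If we denote by $G(f)$ the common limit of $v_n$ and $w_n$ as $n \to \infty$, then we have

\[ v_n,\ w_n = G(f) + \theta_n \operatorname{osc}\frac{f}{\psi} \left(1 - \frac{F(\psi)}{\lambda_0 |||\psi|||}\right)^n, \quad n \in \mathbf{N}, \]

with a suitable $\theta_n \in \mathbf{R}$ satisfying $|\theta_n| \le 1$. Hence, by the very definition of the $v_n$ and $w_n$, $n \in \mathbf{N}$, equation (2.2.8) should hold. Since

\[ |G(f)| \le \max(|v_0|, |w_0|) \le \frac{|||f|||}{\inf \psi}, \quad f \in B, \]

it follows that

\[ |||G||| = \sup_{f \in B} \frac{|G(f)|}{|||f|||} \le \frac{1}{\inf \psi}. \]

The fact that $G$ is a positive linear functional is an immediate consequence of equation (2.2.8). □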
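The bracketing mechanism of the proof ($v_n$ non-decreasing, $w_n$ non-increasing, both squeezing $\lambda_0$) can be watched in a finite-dimensional toy case. The sketch below is only an analogy under stated assumptions: the operator is a hypothetical $2 \times 2$ matrix with positive entries acting on functions on a two-point set $X$, not the operator $V$ of Gauss’ problem, and the iterates are rescaled at each step, which does not change the ratios $\varphi_{n+1}/\varphi_n$.

```python
import math


def apply(matrix, vec):
    """Matrix-vector product: the action of a positive operator on a function on X."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def bracket_eigenvalue(matrix, phi, steps):
    """Return the list of pairs (v_n, w_n) = (inf, sup) of (V phi_n) / phi_n."""
    brackets = []
    for _ in range(steps):
        nxt = apply(matrix, phi)
        ratios = [b / a for a, b in zip(phi, nxt)]
        brackets.append((min(ratios), max(ratios)))
        scale = max(nxt)                 # rescaling leaves the ratios unchanged
        phi = [x / scale for x in nxt]
    return brackets


# Hypothetical positive operator on a two-point set X (illustration only).
V = [[2.0, 1.0], [1.0, 3.0]]
brackets = bracket_eigenvalue(V, [1.0, 1.0], 30)

dominant = (5.0 + math.sqrt(5.0)) / 2.0  # largest eigenvalue of this matrix
for n, (v_n, w_n) in enumerate(brackets[:5]):
    print(n, v_n, w_n)
```

In this toy case the bracket width shrinks geometrically, which mirrors the role of (2.2.14); the actual decay factor in the theorem depends on $F$ and is not modelled here.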

Let us show that Theorem 2.2.4 applies to Gauss’ problem as considered in Subsection 2.2.1. The space $B$ is $C_r(I)$, the collection of all real-valued functions in $C(I)$, and the operator $V$ the one denoted there by the same letter. As function $\varphi$ we could use the function $\varphi_a$ constructed in Subsection 2.2.1 with $a = 0.3126597\cdots$. Nevertheless, it is more convenient to use $V\varphi_a$ instead, for which the same values of $v$ and $w$ apply. Thus we take

\[ \varphi(x) = \frac{1}{(x+a+1)^2}, \quad x \in I, \]

with $a = 0.3126597\cdots$. Finally, the functional $F$ can be constructed as follows. Let $f \in C_r(I)$, $f \ge 0$. [Note that actually the considerations below hold for any non-negative $f \in B(I)$.] Then

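\[ Vf(x) \ge \sum_{i \in \mathbf{N}_+} \frac{i}{(x+i+1)^2} \int_{1/(x+i+1)}^{1/(x+i)} f(y)\, dy = \int_0^1 k(x, y) f(y)\, dy, \quad x \in I, \]

where

\[ k(x, 0) = 0, \quad x \in I, \qquad k(x, y) = \frac{\lfloor y^{-1} - x \rfloor}{\left(x + \lfloor y^{-1} - x \rfloor + 1\right)^2}, \quad x \in I,\ y \in (0, 1]. \]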

If $0 < y \le 1/3$ then $\lfloor y^{-1} - x \rfloor \ge 2$, and since

\[ t \mapsto (t + x + 1)^{-2}, \quad t \ge 2, \]

is a decreasing function, we have

\[ k(x, y) \ge \frac{y^{-1} - x}{(y^{-1} + 1)^2} \ge \frac{y^{-1} - 1}{(y^{-1} + 1)^2} = \frac{y(1-y)}{(y+1)^2} \]

for $x \in I$, $0 \le y \le 1/3$. If $1/3 < y \le 1/2$ then either $k(x, y) = (2 + x)^{-2}$ or $k(x, y) = 2(3 + x)^{-2}$. Hence $k(x, y) \ge 1/9$ for $x \in I$, $1/3 < y \le 1/2$. Thus we have $Vf \ge F(f)$, where

\[ F(f) = \int_0^{1/3} \frac{y(1-y)}{(y+1)^2}\, f(y)\, dy + \frac{1}{9} \int_{1/3}^{1/2} f(y)\, dy. \]

Elementary calculations yield

\[ F(\varphi) = \int_0^{1/3} \frac{y(1-y)\, dy}{(y+1)^2 (y+a+1)^2} + \frac{1}{9} \int_{1/3}^{1/2} \frac{dy}{(y+a+1)^2} \]

\[ = \int_0^{1/3} \left( \frac{3a+4}{a^3 (y+1)} - \frac{2}{a^2 (y+1)^2} - \frac{3a+4}{a^3 (y+a+1)} - \frac{a^2 + 3a + 2}{a^2 (y+a+1)^2} \right) dy + \frac{1}{9} \int_{1/3}^{1/2} \frac{dy}{(y+a+1)^2} \]

\[ = \frac{3a+4}{a^3} \log \frac{4(a+1)}{3a+4} - \frac{88a^2 + 279a + 216}{18 a^2 (2a+3)(3a+4)}. \]

As $V\varphi \le w\varphi$, we have

\[ \frac{wF(\varphi)}{|||V\varphi|||} \ge \frac{F(\varphi)}{|||\varphi|||} = (a+1)^2 F(\varphi) > 0.033184. \tag{2.2.17} \]

Since $w - v < 0.01779$, inequality (2.2.7) holds. Thus Theorem 2.2.4 applies and we have

\[ \frac{F(\psi)}{|||\psi|||} \ge (a+1)^2 F(\varphi) - (w - v) > 0.01539. \tag{2.2.18} \]
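As a sanity check on the closed form for $F(\varphi)$ and on (2.2.17) and (2.2.18), the two integrals defining $F(\varphi)$ can be evaluated numerically. The sketch below uses a hand-written composite Simpson rule; the function names and the number of subintervals are illustrative choices, and $a$, $v$, $w$ are taken at the truncated values quoted in the text.

```python
import math

a = 0.3126597                         # value of a quoted in the text (truncated)
v_lower, w_upper = 0.29017, 0.30796   # v > 0.29017 and w < 0.30796, from the text


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def phi(y):
    return 1.0 / (y + a + 1.0) ** 2


def F(f):
    # F(f) = int_0^{1/3} y(1-y)/(y+1)^2 f(y) dy + (1/9) int_{1/3}^{1/2} f(y) dy
    first = simpson(lambda y: y * (1 - y) / (y + 1) ** 2 * f(y), 0.0, 1.0 / 3.0)
    second = simpson(f, 1.0 / 3.0, 0.5) / 9.0
    return first + second


F_numeric = F(phi)
F_closed = ((3 * a + 4) / a ** 3) * math.log(4 * (a + 1) / (3 * a + 4)) \
    - (88 * a ** 2 + 279 * a + 216) / (18 * a ** 2 * (2 * a + 3) * (3 * a + 4))

lhs_2_2_17 = (a + 1) ** 2 * F_numeric               # compare with 0.033184
lhs_2_2_18 = lhs_2_2_17 - (w_upper - v_lower)       # compare with 0.01539
print(F_numeric, F_closed, lhs_2_2_17, lhs_2_2_18)
```

Because $w - v$ is replaced here by the difference of the quoted bounds, the last quantity is a conservative stand-in for the left-hand side of (2.2.18).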


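To state the result corresponding to Theorem 2.2.3 we should first introduce some notation. Let

\[ \Psi(x) = \int_0^x \psi(u)\, du \]

and

\[ \widetilde{\psi}(x) = \int_0^x \frac{\Psi(u) - U^\infty \Psi}{u+1}\, du, \quad x \in I. \]

It is easy to check that

\[ \left((x+1)\, \widetilde{\psi}'(x)\right)' = \psi(x), \quad x \in I, \]

and $\widetilde{\psi}(0) = \widetilde{\psi}(1) = 0$.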

Remarks. 1. As noted by Wirsing (1974, p. 521), using as function $\varphi$ the function $V(8\varphi_{a'} - 7\varphi_{a''})$ with $a' = 0.6247$ and $a'' = 0.7$ one can improve (2.2.18) to

\[ F(\psi)/|||\psi||| \ge 0.031. \]

2. Wirsing (1974, § 5) proved that the functions $\psi$ and $\widetilde{\psi}$ are analytic. Their analytic continuations are holomorphic in the whole complex plane with a cut along the negative real axis from $\infty$ to $-1$, which is the natural boundary of these functions. □

Theorem 2.2.5 Let $f_0 \in C^1(I)$ (equivalently, $d\mu/d\lambda = F_0' \in C^1(I)$). For any $n \in \mathbf{N}$ and $x \in I$ we have

\[ \left| \mu(\tau^n < x) - G(x) - (-\lambda_0)^n\, G(f_0')\, \widetilde{\psi}(x) \right| \le |||\psi|||\, \operatorname{osc}\frac{f_0'}{\psi}\, (\log 2)^2\, (\lambda_0 - 0.01539)^n\, G(x)(1 - G(x)), \]

where $\lambda_0 = 0.303\,663\,002\,898\,732\,658\cdots$,

\[ \frac{1}{(x+a+1)^2} \le \psi(x) \le \frac{3.41}{(x+a+1)^2}, \quad x \in I, \]

with $a = 0.3126597$, and $G$ is a positive bounded functional on $C_r(I)$ such that

\[ |||G||| \le \frac{1}{\inf \psi} \le (a+2)^2 = 5.34839\cdots. \]

In particular, for any $n \in \mathbf{N}$ and $x \in I$ we have

\[ \left| \lambda(\tau^n < x) - G(x) - (-\lambda_0)^n\, G(1)\, \widetilde{\psi}(x) \right| \le 4.605\, (\lambda_0 - 0.01539)^n\, G(x)(1 - G(x)). \tag{2.2.19} \]

Proof. We use the same trick as in the proof of Theorem 2.2.3. For $n \in \mathbf{N}$ and $y \in I$ set

\[ d_n(y) = \mu\left(\tau^n < e^{y \log 2} - 1\right) - y - (-\lambda_0)^n\, G(f_0')\, \widetilde{\psi}\left(e^{y \log 2} - 1\right) \]

so that

\[ d_n(G(x)) = \mu(\tau^n < x) - G(x) - (-\lambda_0)^n\, G(f_0')\, \widetilde{\psi}(x), \quad x \in I. \]

Differentiating twice with respect to $x$ yields

\[ \frac{1}{(\log 2)^2}\, \frac{d_n''(G(x))}{x+1} = (U^n f_0)'(x) - (-\lambda_0)^n\, G(f_0') \left((x+1)\, \widetilde{\psi}'(x)\right)' = (-1)^n V^n f_0'(x) - (-\lambda_0)^n\, G(f_0')\, \psi(x). \]

Hence, by Theorem 2.2.4 and (2.2.18),

\[ \left| d_n''(G(x)) \right| \le 2\, |||\psi|||\, \operatorname{osc}\frac{f_0'}{\psi}\, (\log 2)^2\, (\lambda_0 - 0.01539)^n, \quad n \in \mathbf{N},\ x \in I. \]

Since $d_n(0) = d_n(1) = 0$, the first inequality in the statement follows (cf. the proof of Theorem 2.2.3).

In principle, Theorem 2.2.4 provides the means for computing $\lambda_0$ to any accuracy. It follows from that theorem that for any real-valued $f \in C^1(I)$ and $n \in \mathbf{N}$ we have

\[ U^n f(1) - U^n f(0) = (-1)^n \lambda_0^n\, G(f') \int_0^1 \psi\, d\lambda + (\lambda_0 - 0.01539)^n\, \operatorname{osc}\frac{f'}{\psi} \int_0^1 \theta_n \psi\, d\lambda \]

with a suitable $\theta_n : I \to \mathbf{R}$ satisfying $|\theta_n| \le 1$. Therefore if $f' > 0$ then

\[ \frac{U^n f(1) - U^n f(0)}{U^{n-1} f(1) - U^{n-1} f(0)} = -\lambda_0 + O\left(\left(\frac{\lambda_0 - 0.01539}{\lambda_0}\right)^n\right) \]

as $n \to \infty$. Using this equation Wirsing (1974) has obtained the value given in the statement. Note that in Knuth (1981, p. 350) the first 20 (RCF) digits of $\lambda_0$ are given as 3, 3, 2, 2, 3, 13, 1, 174, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1. The 20th convergent equals

\[ \frac{227\,769\,828}{750\,074\,345}, \]

which yields 14 exact significant digits of $\lambda_0$.
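The 20th convergent can be recomputed from the listed digits with the usual recurrences $p_n = a_n p_{n-1} + p_{n-2}$, $q_n = a_n q_{n-1} + q_{n-2}$, started from $p_0 = 0$, $q_0 = 1$. The sketch below uses exact rational arithmetic; the function name `convergents` is an illustrative choice.

```python
from fractions import Fraction

# The first 20 regular continued fraction digits of lambda_0 quoted in the text.
digits = [3, 3, 2, 2, 3, 13, 1, 174, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1]


def convergents(partial_quotients):
    """Convergents p_n/q_n of [a_1, a_2, ...] = 1/(a_1 + 1/(a_2 + ...))."""
    p_prev, q_prev = 1, 0   # (p_{-1}, q_{-1})
    p, q = 0, 1             # (p_0, q_0)
    result = []
    for a in partial_quotients:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append(Fraction(p, q))
    return result


convs = convergents(digits)
last = convs[-1]
lambda_0 = 0.303663002898732658   # value quoted in Theorem 2.2.5
print(last, float(last), abs(float(last) - lambda_0))
```

The comparison with the decimal value is limited by double precision, so it can confirm agreement only to roughly 15 or 16 significant digits.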


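Now, we refer to the proof of Theorem 2.2.4. It is shown there that $\varphi \le \psi \le A\varphi$, with

\[ A = \prod_{n \in \mathbf{N}} \left(1 + \frac{d_n}{v}\right), \qquad d_n \le (w - v) \left(1 - e_0 + \frac{w - v}{w}\right)^n, \quad n \in \mathbf{N}, \]

where in the present case $v > 0.29017$ and $w < 0.30796$. Then since by (2.2.17) we have

\[ w e_0 = \frac{wF(\varphi)}{|||V(\varphi)|||} \ge (a+1)^2 F(\varphi) \ge 0.033184, \]

it follows that

\[ A \le \exp \frac{\sum_{n \in \mathbf{N}} d_n}{v} \le \exp \frac{w(w - v)}{v\,(w e_0 - (w - v))} \le 3.409\cdots. \]

In the special case $\mu = \lambda$ we have

\[ \operatorname{osc}\frac{f_0'}{\psi} = \operatorname{osc}\frac{1}{\psi} = \frac{1}{\inf \psi} - \frac{1}{\sup \psi} \le (a+2)^2 - \frac{(a+1)^2}{3.41} = 4.843094\cdots, \]

and (2.2.19) follows. □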
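The numerical constants in this last step are easy to recheck. The sketch below plugs the quoted bounds into the expressions above; reading the constant 4.605 in (2.2.19) as the product $|||\psi|||\, \operatorname{osc}(1/\psi)\, (\log 2)^2$ with $|||\psi||| \le 3.41/(a+1)^2$ is an interpretation of the text, not something it states explicitly.

```python
import math

a = 0.3126597            # value of a quoted in the text (truncated)
v, w = 0.29017, 0.30796  # bounds on v and w quoted in the text
w_e0 = 0.033184          # lower bound for w * e_0 from (2.2.17)

# Upper bound for A = prod (1 + d_n / v).
A_bound = math.exp(w * (w - v) / (v * (w_e0 - (w - v))))

# Bounds derived from 1/(x+a+1)^2 <= psi(x) <= 3.41/(x+a+1)^2 on I = [0, 1].
osc_bound = (a + 2) ** 2 - (a + 1) ** 2 / 3.41   # osc(1/psi)
psi_norm_bound = 3.41 / (a + 1) ** 2             # sup of psi, attained at x = 0

# Assumed reading of the constant in (2.2.19).
constant = psi_norm_bound * osc_bound * math.log(2) ** 2
print(A_bound, osc_bound, constant)
```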

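Theorem 2.2.6 Let $f \in C^1(I)$ be real-valued. For any $n \in \mathbf{N}$ we have

\[ |||U^n f - U^\infty f||| \le \left( \lambda_0^n\, |G(f')| + \operatorname{osc}\frac{f'}{\psi}\, (\lambda_0 - 0.01539)^n \right) \int_I \gamma(dx) \int_0^x \psi\, d\lambda \tag{2.2.20} \]

and

\[ |||U^n f - U^\infty f||| \ge \left( \lambda_0^n\, |G(f')| - \operatorname{osc}\frac{f'}{\psi}\, (\lambda_0 - 0.01539)^n \right) \int_I \gamma(dx) \int_0^x \psi\, d\lambda. \tag{2.2.21} \]

Here $G$ is a positive bounded linear functional on $C_r(I)$ with $|||G||| \le 5.34839\cdots$, and the last inequality is meaningful for $n \in \mathbf{N}_+$ large enough.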

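Proof. It follows from (2.2.3) and (2.2.8) that

\[ U^n f(x) - U^n f(y) = (-1)^n \left( G(f')\, \lambda_0^n \int_x^y \psi\, d\lambda + \operatorname{osc}\frac{f'}{\psi}\, (\lambda_0 - 0.01539)^n \int_x^y \theta_n \psi\, d\lambda \right) \]

for any $n \in \mathbf{N}$ and $x, y \in I$ with a suitable $\theta_n : I \to \mathbf{R}$ satisfying $|\theta_n| \le 1$. Integrating over $y \in I$ with respect to $\gamma$, on account of (2.1.12) we obtain

\[ U^n f(x) - U^\infty f = (-1)^n \left( G(f')\, \lambda_0^n \int_I \gamma(dy) \int_x^y \psi\, d\lambda + \operatorname{osc}\frac{f'}{\psi}\, (\lambda_0 - 0.01539)^n \int_I \gamma(dy) \int_x^y \theta_n \psi\, d\lambda \right) \tag{2.2.22} \]

for any $n \in \mathbf{N}$ and $x \in I$. Hence (2.2.20) and (2.2.21) follow at once. For the lower bound (2.2.21) we should note that

\[ |||U^n f - U^\infty f||| \ge |U^n f(0) - U^\infty f|. \]

□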

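Remarks. 1. Equation (2.2.22) shows that whatever $f \in C^1(I)$ the exact rate of convergence of $U^n f(x) - U^\infty f$ to 0 as $n \to \infty$ is $O(\lambda_0^n)$ for any $x \notin E$, where

\[ E = \left( x \in I : \int_I \gamma(dy) \int_x^y \psi\, d\lambda = 0 \right). \]

Clearly, $E$ is not empty since

\[ \int_I \gamma(dy) \int_0^y \psi\, d\lambda > 0 \quad \text{and} \quad \int_I \gamma(dy) \int_1^y \psi\, d\lambda < 0. \]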

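2. By (2.1.12) and Proposition 2.0.1(i) with $\mu = \gamma$, for any $f \in C^1(I)$ we have

\[ |||U^n f - U^\infty f||| \le \operatorname{var} U^n f, \quad n \in \mathbf{N}. \]

Next, since

\[ U^n f(1) - U^n f(0) = U^n f(1) - U^\infty f - (U^n f(0) - U^\infty f), \]

we have

\[ |U^n f(1) - U^n f(0)| \le 2\, |||U^n f - U^\infty f|||. \]

Finally, noting that by (2.2.3) we have

\[ \operatorname{var} U^n f = \int_I \left| (U^n f)' \right| d\lambda = \int_I \left| V^n f' \right| d\lambda, \qquad |U^n f(1) - U^n f(0)| = \left| \int_I (U^n f)'\, d\lambda \right| = \left| \int_I V^n f'\, d\lambda \right|, \]

from (2.2.8) we obtain

\[ |||U^n f - U^\infty f||| \le \left( \lambda_0^n\, |G(f')| + \operatorname{osc}\frac{f'}{\psi}\, (\lambda_0 - 0.01539)^n \right) \int_I \psi\, d\lambda \]

and

\[ |||U^n f - U^\infty f||| \ge \frac{1}{2} \left( \lambda_0^n\, |G(f')| - \operatorname{osc}\frac{f'}{\psi}\, (\lambda_0 - 0.01539)^n \right) \int_I \psi\, d\lambda \]

for any $n \in \mathbf{N}$ and any real-valued $f \in C^1(I)$. Since

\[ \int_I \gamma(dx) \int_0^x \psi\, d\lambda < \int_I \psi\, d\lambda, \]

the upper bound for $|||U^n f - U^\infty f|||$ just derived is slightly worse than that given in Theorem 2.2.6. The comparison of the lower bounds for $|||U^n f - U^\infty f|||$, here and in Theorem 2.2.6, amounts to a comparison of $\int_I \psi\, d\lambda / 2$ and $\int_I \gamma(dx) \int_0^x \psi\, d\lambda$, a question we cannot answer. □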

Corollary 2.2.7 The spectral radius of the operator $U - U^\infty$ in $C^1(I)$ is equal to $\lambda_0$.

Proof. We should show that

\[ \lim_{n \to \infty} |||U^n - U^\infty|||_1^{1/n} = \lim_{n \to \infty} \left( \sup_{0 \ne f \in C^1(I)} \frac{|||U^n f - U^\infty f|||_1}{|||f|||_1} \right)^{1/n} = \lambda_0. \]

This follows easily using Theorem 2.2.6 and equations (2.2.3) and (2.2.8). The details are left to the reader. □

2.2.3 The case of Lipschitz densities

Theorem 2.2.4 can also be used to solve Gauss’ problem in the case where $F_0' = d\mu/d\lambda \in L(I)$. In other words, Theorem 2.2.4 enables us to study the behaviour of $U^n$ as $n \to \infty$ assuming that the domain of $U$ is $L(I)$.

Let $f \in L(I)$. Then the derivative $f'$ exists a.e. in $I$ and is bounded by $s(f)$. Abusing the notation, we will also denote by $f'$ the extension to $I$ of the derivative of $f$, which is obtained by assigning the value 0 at the points where $f$ is not differentiable.

It is obvious that the operator $V : C(I) \to C(I)$ introduced in Subsection 2.2.1 can be extended to $B(I)$ with $Vg$, $g \in B(I)$, defined by the same formula as in the case of a continuous $g$. The point is that, as is easy to see, equations (2.2.2) and (2.2.3) hold now a.e. in $I$, that is,

\[ (U^n f)' = (-1)^n V^n f', \quad f \in L(I),\ n \in \mathbf{N}_+, \tag{2.2.23} \]

a.e. in $I$, with the null set of exempted points depending on $f$ and $n$.

Let us now apply Theorem 2.2.4 to our $V$ in the case where $B$ is $B_r(I)$, the collection of all real-valued functions in $B(I)$, with the same function $\varphi$ and functional $F$ as in the case where $B = C_r(I) \subset B_r(I)$, which has been considered in Subsection 2.2.2. It follows that the operator $V : B_r(I) \to B_r(I)$ has an eigenvalue $\lambda_0 = 0.303\,663\,002\,898\,732\,658\cdots$ with corresponding positive eigenfunction $\psi \in C(I)$ satisfying

\[ \frac{1}{(x+a+1)^2} \le \psi(x) \le \frac{3.41}{(x+a+1)^2}, \quad x \in I, \]

where $a = 0.3126597\cdots$, and

\[ V^n g = G(g)\, \lambda_0^n\, \psi + \operatorname{osc}\frac{g}{\psi}\, (\lambda_0 - 0.01539)^n\, \theta_n \psi \tag{2.2.24} \]

for any $n \in \mathbf{N}$ and $g \in B_r(I)$. Here $G : B_r(I) \to \mathbf{R}$ is a positive bounded linear functional with $|||G||| \le (a+2)^2$ and $\theta_n : I \to \mathbf{R}$ is a function satisfying $|\theta_n| \le 1$.

Theorem 2.2.8 Let $f \in L(I)$ be real-valued. For any $n \in \mathbf{N}_+$ we have

\[ |||U^n f - U^\infty f||| \le \left( \lambda_0^n\, |G(f')| + \operatorname{osc}\frac{f'}{\psi}\, (\lambda_0 - 0.01539)^n \right) \int_I \gamma(dx) \int_0^x \psi\, d\lambda \]

and

\[ |||U^n f - U^\infty f||| \ge \left( \lambda_0^n\, |G(f')| - \operatorname{osc}\frac{f'}{\psi}\, (\lambda_0 - 0.01539)^n \right) \int_I \gamma(dx) \int_0^x \psi\, d\lambda. \]

Here $G$ is a positive bounded functional on $B_r(I)$ with $|||G||| < 5.34839\cdots$, and the last inequality is meaningful for $n \in \mathbf{N}_+$ large enough.

The proof is identical with that of Theorem 2.2.6. Instead of (2.2.3) and (2.2.8) we should use (2.2.23) and (2.2.24). In particular, equation (2.2.22) holds for $f \in L(I)$, too. □

Remark. The contents of Remarks 1 and 2 following the proof of Theorem 2.2.6 apply mutatis mutandis to the present $L(I)$ framework. □


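Corollary 2.2.9 Let $f_0 \in L(I)$ (equivalently, $d\mu/d\lambda = F_0' \in L(I)$). For any $n \in \mathbf{N}$ and $A \in \mathcal{B}_I$ we have

\[ \left| \mu(\tau^{-n}(A)) - \gamma(A) \right| \le (1 - \log 2) \left( \lambda_0^n\, |G(f_0')| + \operatorname{osc}\frac{f_0'}{\psi}\, (\lambda_0 - 0.01539)^n \right) |||\psi|||\, \min(\gamma(A),\, 1 - \gamma(A)). \tag{2.2.25} \]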

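Proof. By Proposition 2.1.5, for any $n \in \mathbf{N}$ and $A \in \mathcal{B}_I$ we have

\[ \mu(\tau^{-n}(A)) - \gamma(A) = \int_A \frac{U^n f_0(x) - U^\infty f_0}{x+1}\, dx \tag{2.2.26} \]

since

\[ U^\infty f_0 = \int_I f_0\, d\gamma = \frac{1}{\log 2} \int_I F_0'\, d\lambda = \frac{1}{\log 2}. \]

Note that

\[ \int_I \gamma(dx) \int_0^x \psi\, d\lambda \le \frac{|||\psi|||}{\log 2} \int_0^1 \frac{x\, dx}{x+1} = |||\psi||| \left( \frac{1}{\log 2} - 1 \right) \tag{2.2.27} \]

and

\[ \mu(\tau^{-n}(A)) - \gamma(A) = \gamma(A^c) - \mu(\tau^{-n}(A^c)) \tag{2.2.28} \]

for any $n \in \mathbf{N}$ and $A \in \mathcal{B}_I$. Now, (2.2.25) follows from (2.2.26) through (2.2.28) and Theorem 2.2.8. □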

Corollary 2.2.10 The spectral radius of the operator $U - U^\infty$ in $L(I)$ equals $\lambda_0$.

Proof. Obvious by Theorem 2.2.8. □

As an application of Theorem 2.2.8 we shall derive the asymptotic behaviour of

\[ \gamma_a(u_n^a < x), \quad x \ge 1, \]

as $n \to \infty$ for any $a \in I$. While it is natural to think that for any $a \in I$ the limit distribution function

\[ \lim_{n \to \infty} \gamma_a(u_n^a < x) \]

is the common distribution function $\gamma(u_1 < x)$, $x \ge 1$, of the extended random variables $u_\ell$, $\ell \in \mathbf{Z}$ (cf. the last paragraph of Subsection 1.3.3), it is somewhat surprising to find out that the (exact) convergence rate is $O(\lambda_0^n)$ for most $a \in I$.

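Theorem 2.2.11 For any $n \in \mathbf{N}_+$ and $x \ge 1$ we have

\[ \sup_{a \in I} \left| \gamma_a(u_{n+1}^a < x) - H(x) \right| \le \frac{3.2228\, I_{(1,\infty)}(x)}{x}\, \lambda_0^n\, (1 + (0.94932)^n), \tag{2.2.29} \]

where

\[ H(x) = \begin{cases} \dfrac{1}{\log 2} \left( \log x - \dfrac{x-1}{x} \right) & \text{if } 1 \le x \le 2, \\[2ex] \dfrac{1}{\log 2} \left( \log 2 - \dfrac{1}{x} \right) & \text{if } x \ge 2. \end{cases} \]

In (2.2.29), $\lambda_0$ cannot be replaced by a smaller constant, and the exact convergence rate to 0 of the left hand side of (2.2.29) is $O(\lambda_0^n)$.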

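Proof. By Proposition 1.3.10, for any $a \in I$, $x \ge 1$, and $n \in \mathbf{N}_+$ we have

\[ \gamma_a(u_{n+1}^a < x \mid a_1, \ldots, a_n) = \left( 1 - \frac{s_n^a + 1}{x} \right) I_{(s_n^a + 1,\, \infty)}(x). \]

Hence

\[ \gamma_a\left( u_{n+1}^a \ge \frac{1}{t} \,\Big|\, a_1, \ldots, a_n \right) = 1 - (1 - t(s_n^a + 1))\, I_{(s_n^a + 1,\, \infty)}\left( \frac{1}{t} \right) = \min(1,\, t(s_n^a + 1)) = f_t(s_n^a) \]

for any $a \in I$, $t \in (0, 1]$, and $n \in \mathbf{N}_+$, with

\[ f_t(y) = \min(1,\, t(y + 1)), \quad y \in I. \]

Therefore, by Proposition 2.1.10,

\[ \gamma_a\left( u_{n+1}^a \ge \frac{1}{t} \right) = E\left( \gamma_a\left( u_{n+1}^a \ge \frac{1}{t} \,\Big|\, a_1, \ldots, a_n \right) \right) = U^n f_t(a), \tag{2.2.30} \]

for any $a \in I$, $t \in (0, 1]$, and $n \in \mathbf{N}_+$. It is easy to check that (2.2.30) holds for $n = 0$, too. Clearly, $f_t \in L(I)$ for any $t \in (0, 1]$, and

\[ U^\infty f_t = \int_I f_t(y)\, \gamma(dy) = \begin{cases} \dfrac{t}{\log 2} & \text{if } 0 < t \le 1/2, \\[2ex] \dfrac{1}{\log 2}\, (1 - t + \log(2t)) & \text{if } 1/2 \le t \le 1. \end{cases} \]

Next, $0 \le f_t'(y) \le t\, I_{(0,1)}(t)$, $t \in (0, 1]$, $y \in I$. Hence

\[ \operatorname{osc}\frac{f_t'}{\psi} \le 5.348396\, t\, I_{(0,1)}(t) \]

and

\[ |G(f_t')| \le |||G|||\, |||f_t'||| \le 5.348396\, t\, I_{(0,1)}(t) \]

for any $t \in (0, 1]$. Finally,

\[ \int_I \gamma(dx) \int_0^x \psi\, d\lambda \le \frac{3.41}{\log 2} \int_I \left( \frac{1}{1.312659} - \frac{1}{x + 1.312659} \right) \frac{dx}{x+1} = \frac{3.41}{0.312659} \left( \frac{1}{\log 2} \log \frac{2.312659}{1.312659} - \frac{1}{1.312659} \right) \le 0.60256. \]

Consequently, Theorem 2.2.8 yields

\[ \sup_{a \in I} \left| \gamma_a\left( u_{n+1}^a \ge \frac{1}{t} \right) - U^\infty f_t \right| \le 3.2228\, t\, I_{(0,1)}(t)\, \lambda_0^n\, (1 + (0.94932)^n) \]

for any $n \in \mathbf{N}$ and $t \in (0, 1]$. Hence, by putting $1/t = x$, (2.2.29) follows. Finally, the assertion concerning the optimality of $\lambda_0$ also follows from Theorem 2.2.8. □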
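The three numerical constants entering (2.2.29) can be rechecked directly from the expressions in the proof. The sketch below simply evaluates them in floating point; the variable names are illustrative, and the truncated values of $a$ and $\lambda_0$ are those printed in the text.

```python
import math

a = 0.312659                      # truncated value of a used in this computation
c = 1.0 + a                       # 1.312659
lambda_0 = 0.303663002898732658   # value quoted in Theorem 2.2.5

# (3.41 / a) * ( log((1+c)/c) / log 2 - 1/c ), the bound for the double integral.
integral_bound = (3.41 / a) * (math.log((1.0 + c) / c) / math.log(2.0) - 1.0 / c)

# 5.348396 bounds both |G(f_t')| / t and osc(f_t'/psi) / t in the proof.
constant = 5.348396 * integral_bound

# Ratio (lambda_0 - 0.01539) / lambda_0 appearing as 0.94932 in (2.2.29).
rate = (lambda_0 - 0.01539) / lambda_0
print(integral_bound, constant, rate)
```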

Remarks. 1. The convergence of $\lambda(u_n < x)$ to $H(x)$, $x \ge 1$, as $n \to \infty$ was first proved in outline by Doeblin (1940, p. 365) with an unspecified convergence rate. A detailed proof following Doeblin’s suggestions was given by Samur (1989, Lemma 4.5) together with a slower convergence rate than that occurring in Theorem 2.2.11.

2. Theorem 2.2.8 shows that the convergence rate to 0 as $n \to \infty$ of

\[ \sup_{a \in I}\, \sup_{x \ge 1} \left| \gamma_a(u_{n+1}^a < x) - H(x) \right| \]

is $O(\lambda_0^n)$. It is possible for some $a \in I$ that the convergence rate to 0 as $n \to \infty$ of

\[ \sup_{x \ge 1} \left| \gamma_a(u_{n+1}^a < x) - H(x) \right| \]

is $O(\alpha^n)$ with $0 < \alpha < \lambda_0$. It follows from equation (2.2.22), which is valid for $f \in L(I)$ too, that this happens if and only if $a \in E$, with $E$ defined in Remark 1 following Theorem 2.2.6. In particular, 0 and 1 do not belong to $E$, thus

\[ \sup_{x \ge 1} |\lambda(u_{n+1} < x) - H(x)| = O(\lambda_0^n) \]

and

\[ \sup_{x \ge 1} \left| \gamma_1(u_{n+1}^1 < x) - H(x) \right| = O(\lambda_0^n) \]

as $n \to \infty$. It would be interesting to effectively determine elements of $E$. □

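The asymptotic behaviour as $n \to \infty$ of the probability density of $u_n^a$, $n \in \mathbf{N}_+$, $a \in I$, which exists a.e. by Corollary 1.3.11, can be established using a result to be proved later in Subsection 2.5.3. Set

\[ h(x) = \frac{dH(x)}{dx} = \begin{cases} \dfrac{x-1}{x^2 \log 2} & \text{if } 1 \le x \le 2, \\[2ex] \dfrac{1}{x^2 \log 2} & \text{if } x \ge 2. \end{cases} \]

Recalling that

\[ G(x) = \begin{cases} 0 & \text{if } x \le 0, \\[1ex] \dfrac{\log(x+1)}{\log 2} & \text{if } 0 \le x \le 1, \\[2ex] 1 & \text{if } x > 1, \end{cases} \]

it is easy to check that

\[ H(x) = \frac{1}{x} \int_0^{x-1} G(s)\, ds, \quad x \ge 1. \]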
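The identity between $H$ and the integral of $G$, and the formula for $h = dH/dx$, can be checked numerically. The sketch below uses a midpoint rule and a central difference; the function names, sample points and step sizes are illustrative choices.

```python
import math

LOG2 = math.log(2.0)


def G(x):
    """Gauss' distribution function, extended by 0 and 1 outside [0, 1]."""
    if x <= 0.0:
        return 0.0
    if x <= 1.0:
        return math.log(x + 1.0) / LOG2
    return 1.0


def H(x):
    """Limit distribution function of Theorem 2.2.11, for x >= 1."""
    if x <= 2.0:
        return (math.log(x) - (x - 1.0) / x) / LOG2
    return (LOG2 - 1.0 / x) / LOG2


def h(x):
    """Claimed derivative of H, for x >= 1."""
    if x <= 2.0:
        return (x - 1.0) / (x * x * LOG2)
    return 1.0 / (x * x * LOG2)


def H_via_integral(x, n=20000):
    """(1/x) * int_0^{x-1} G(s) ds by the composite midpoint rule."""
    width = (x - 1.0) / n
    return sum(G((i + 0.5) * width) for i in range(n)) * width / x


def H_slope(x, eps=1e-5):
    """Central difference approximation of H'(x), away from the kink at x = 2."""
    return (H(x + eps) - H(x - eps)) / (2.0 * eps)


for x in (1.2, 1.5, 2.0, 3.0, 10.0):
    print(x, H(x), H_via_integral(x))
```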

Corollary 1.3.11 then yields

\[ \gamma_a(u_n^a < x) - H(x) = \frac{1}{x} \int_0^{x-1} \left( G_{n-1}^a(s) - G(s) \right) ds \tag{2.2.31} \]

for any $a \in I$, $n \in \mathbf{N}_+$, and $x \ge 1$. Letting $D_x \gamma_a(u_n^a < x)$ denote any one of the four (two for $x = 1$) unilateral derivatives of $\gamma_a(u_n^a < x)$ at $x$, we can state the following result.

Proposition 2.2.12 For any $n \in \mathbf{N}_+$, $a \in I$, and $x \ge 1$ we have

\[ \left| D_x \gamma_a(u_n^a < x) - h(x) \right| \le \frac{k_0 \left[ \min(x - 1,\, 1) + x\, I_{(1,2]}(x) \right]}{x^2}\, \frac{1}{F_{n-1} F_n}, \]

where $k_0$ is a constant not exceeding 14.8.

The proof follows from (2.2.31) and Theorem 2.5.5. The details are left to the reader. □

Remark. The upper bound in Proposition 2.2.12 is $O(g^{2n})$ as $n \to \infty$ with $g = (\sqrt{5} - 1)/2$, $g^2 = (3 - \sqrt{5})/2 = 0.38196\cdots$. It is an open problem whether this yields the optimal convergence rate. □

Theorem 2.2.11 and Proposition 2.2.12 can be restated in terms of the approximation coefficients defined in Subsection 1.3.2. Indeed, by (1.3.6) we have $u_{n+1} = u_{n+1}^0 = \Theta_n^{-1}$, $n \in \mathbf{N}$, and the results below are easily checked.

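Theorem 2.2.13 For any $n \in \mathbf{N}_+$ and $t \in I$ we have

\[ |\lambda(\Theta_n \le t) - H(t)| \le 3.2228\, t\, I_{(0,1)}(t)\, \lambda_0^n\, (1 + (0.94932)^n) \]

and

\[ |D_t \lambda(\Theta_n \le t) - h(t)| \le \frac{k_0 \left[ \min(t^{-1} - 1,\, 1) + t^{-1} I_{[1/2,1)}(t) \right]}{F_n F_{n+1}}, \]

where

\[ H(t) = \begin{cases} \dfrac{t}{\log 2} & \text{if } 0 \le t \le 1/2, \\[2ex] \dfrac{1}{\log 2}\, (1 - t + \log(2t)) & \text{if } 1/2 \le t \le 1 \end{cases} \]

and

\[ h(t) = \frac{dH}{dt} = \begin{cases} \dfrac{1}{\log 2} & \text{if } 0 \le t \le 1/2, \\[2ex] \dfrac{1}{\log 2} \left( \dfrac{1}{t} - 1 \right) & \text{if } 1/2 \le t \le 1. \end{cases} \]

Remark. The first result above improves on the convergence rate obtained by Faivre (1998a) while the second one improves on that obtained by Knuth (1984). □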

2.3 Babenko’s solution to Gauss’ problem

2.3.1 Preliminaries

Let $H_{-1/2} = H$ denote the collection of all complex-valued functions $f$ which are holomorphic in the half-plane $\operatorname{Re} z > -1/2$, bounded in every half-plane $\operatorname{Re} z > -1/2 + \varepsilon$, $\varepsilon > 0$, and which satisfy

\[ \int_{\mathbf{R}} \left| f\left( -\frac{1}{2} + iy \right) \right|^2 dy < \infty. \]

Note that $H$ is known [see Duren (1970)] as the ordinary Hardy space of functions holomorphic in the half-plane $\operatorname{Re} z > -1/2$, which is a Hilbert space with inner product $(\cdot, \cdot)_H$ defined by

\[ (f, g)_H = \frac{1}{2\pi} \int_{\mathbf{R}} f^*\left( -\frac{1}{2} + iy \right) g\left( -\frac{1}{2} + iy \right) dy, \quad f, g \in H, \]

therefore a Banach space under the norm $|||\cdot|||_H$ defined by

\[ |||f|||_H = \left( \frac{1}{2\pi} \int_{\mathbf{R}} \left| f\left( -\frac{1}{2} + iy \right) \right|^2 dy \right)^{1/2}, \quad f \in H. \]

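Let $L^2(\mathbf{R}_+, \mathcal{B}_{\mathbf{R}_+}, \lambda) = L^2(\mathbf{R}_+)$ denote the Hilbert space of square $\lambda$-integrable functions $\varphi : \mathbf{R}_+ \to \mathbf{C}$ with the usual scalar product

\[ (\varphi, \psi) = \int_{\mathbf{R}_+} \varphi \psi^*\, d\lambda, \quad \varphi, \psi \in L^2(\mathbf{R}_+), \]

and norm

\[ \|\varphi\|_2 = (\varphi, \varphi)^{1/2}, \quad \varphi \in L^2(\mathbf{R}_+). \]

A Paley–Wiener theorem holds, giving a simple characterization of the elements of $H$ [see Duren (1970)]: $f \in H$ if and only if there exists $\varphi \in L^2(\mathbf{R}_+)$ such that

\[ f(z) = \int_{\mathbf{R}_+} e^{-zs - s/2}\, \varphi(s)\, ds, \quad \operatorname{Re} z > -1/2; \]

the function $\varphi$ is unique (in the $L^2$-sense) and

\[ |||f|||_H = \|\varphi\|_2. \tag{2.3.1} \]

In other words, the linear operator $M : L^2(\mathbf{R}_+) \to H$ defined by

\[ M\varphi(z) = \int_{\mathbf{R}_+} e^{-zs - s/2}\, \varphi(s)\, ds, \quad \varphi \in L^2(\mathbf{R}_+),\ \operatorname{Re} z > -1/2, \]

is an isometry and the image under $M$ of $L^2(\mathbf{R}_+)$ is $H$.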


Notice that in Babenko (1978) an equivalent definition of $H$ is considered. We follow here Mayer (1991). See also Hensley [(1992, p. 344) and (1994, p. 145)].

It is easy to check that the Perron–Frobenius operator $P_\lambda$ of $\tau$ under $\lambda$ takes $H$ into itself. Obviously, for $f \in H$ we define $P_\lambda f$ by

\[ P_\lambda f(z) = \sum_{i \in \mathbf{N}_+} \frac{1}{(z+i)^2}\, f\left( \frac{1}{z+i} \right), \quad \operatorname{Re} z > -1/2. \]
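A quick way to get a feel for this formula is to apply it to $f(z) = 1/(z+1)$: each term becomes $1/((z+i)(z+i+1))$, the series telescopes, and $P_\lambda f = f$, which is the familiar invariance of Gauss’ density (up to the factor $1/\log 2$). The sketch below checks this on real arguments with a truncated sum; the function names and the truncation level are illustrative, and the tail $1/(x + N + 1)$ is added back by hand.

```python
def perron_frobenius_partial(f, x, terms):
    """Partial sum  sum_{i=1}^{terms} f(1/(x+i)) / (x+i)^2  of P_lambda f at real x."""
    return sum(f(1.0 / (x + i)) / (x + i) ** 2 for i in range(1, terms + 1))


def gauss_shape(x):
    # Gauss' density up to the normalising constant 1/log 2.
    return 1.0 / (x + 1.0)


N = 20000
checks = []
for x in (0.0, 0.25, 0.5, 1.0):
    partial = perron_frobenius_partial(gauss_shape, x, N)
    tail = 1.0 / (x + N + 1.0)          # exact remainder of the telescoping series
    checks.append((x, partial + tail, gauss_shape(x)))

for x, lhs, rhs in checks:
    print(x, lhs, rhs)
```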

2.3.2 A symmetric linear operator

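Consider the linear operator $S : L^2(\mathbf{R}_+) \to L^2(\mathbf{R}_+)$ defined by

\[ S\varphi(s) = \left( \frac{1 - e^{-s}}{s} \right)^{1/2} \varphi(s), \quad \varphi \in L^2(\mathbf{R}_+),\ s \in \mathbf{R}_+. \]

Clearly, $S$ is invertible and

\[ S^{-1}\varphi(s) = \left( \frac{s}{1 - e^{-s}} \right)^{1/2} \varphi(s), \quad \varphi \in S\left( L^2(\mathbf{R}_+) \right),\ s \in \mathbf{R}_+. \]

Consider also the linear operator

\[ A = SM^{-1} : H \to L^2(\mathbf{R}_+) \]

with inverse

\[ A^{-1} = MS^{-1} : S\left( L^2(\mathbf{R}_+) \right) \to H. \]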

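Proposition 2.3.1 Define the symmetric linear operator $K : L^2(\mathbf{R}_+) \to L^2(\mathbf{R}_+)$ by

\[ K\varphi(s) = \int_{\mathbf{R}_+} k(s, t)\, \varphi(t)\, dt, \quad \varphi \in L^2(\mathbf{R}_+),\ s \in \mathbf{R}_+, \]

where

\[ k(s, t) = \frac{J_1\left( 2\sqrt{st} \right)}{\left( (e^s - 1)(e^t - 1) \right)^{1/2}}, \quad s, t \in \mathbf{R}_+, \]

and $J_1$ is the Bessel function of order 1 defined by

\[ J_1(s) = \frac{s}{2} \sum_{k \in \mathbf{N}} \frac{(-1)^k}{k!\,(k+1)!} \left( \frac{s}{2} \right)^{2k}, \quad s \in \mathbf{R}_+. \]

Then

\[ P_\lambda = A^{-1} K A. \tag{2.3.2} \]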

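Proof. Note first that the range of $K$ is included in $S\left( L^2(\mathbf{R}_+) \right)$. Let $\varphi \in L^2(\mathbf{R}_+)$ and put $f = M\varphi \in H$. We have

\[ A^{-1} K A f = M S^{-1} K S \varphi. \]

But

\[ \left( S^{-1} K S \varphi \right)(s) = \left( \frac{s}{1 - e^{-s}} \right)^{1/2} \int_{\mathbf{R}_+} k(s, t) \left( \frac{1 - e^{-t}}{t} \right)^{1/2} \varphi(t)\, dt = \int_{\mathbf{R}_+} \left( \frac{s}{t} \right)^{1/2} e^{\frac{s-t}{2}}\, \frac{J_1\left( 2\sqrt{st} \right)}{e^s - 1}\, \varphi(t)\, dt, \quad s \in \mathbf{R}_+, \]

whence

\[ \left( M S^{-1} K S \varphi \right)(z) = \iint_{\mathbf{R}_+^2} e^{-zs - \frac{t}{2}} \left( \frac{s}{t} \right)^{1/2} \frac{J_1\left( 2\sqrt{st} \right)}{e^s - 1}\, \varphi(t)\, ds\, dt = \int_{\mathbf{R}_+} \sum_{k \in \mathbf{N}_+} \frac{1}{(z+k)^2} \exp\left( -\frac{t}{z+k} - \frac{t}{2} \right) \varphi(t)\, dt, \]

for $\operatorname{Re} z > -1/2$, on account of the identity

\[ \sum_{k \in \mathbf{N}_+} \frac{1}{(z+k)^2} \exp\left( -\frac{t}{z+k} \right) = \int_{\mathbf{R}_+} \left( \frac{s}{t} \right)^{1/2} e^{-zs}\, \frac{J_1\left( 2\sqrt{st} \right)}{e^s - 1}\, ds, \]

which is valid for $t \in \mathbf{R}_+$ and $\operatorname{Re} z > -1$ [see Watson (1944, formula 7.13.9)]. It remains to note that

\[ \int_{\mathbf{R}_+} \varphi(t) \exp\left( -\frac{t}{z+k} - \frac{t}{2} \right) dt = (M\varphi)\left( \frac{1}{z+k} \right) = f\left( \frac{1}{z+k} \right) \]

for any $k \in \mathbf{N}_+$ and $\operatorname{Re} z > -1$, to obtain

\[ \left( A^{-1} K A f \right)(z) = \sum_{k \in \mathbf{N}_+} \frac{1}{(z+k)^2}\, f\left( \frac{1}{z+k} \right) = (P_\lambda f)(z), \quad \operatorname{Re} z > -1/2. \]

□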


As an integral symmetric linear operator with continuous kernel, $K$ is a compact operator on $L^2(\mathbf{R}_+)$ with only real eigenvalues $\lambda_j$, $j \in \mathbf{N}_+$, satisfying

\[ \lim_{j \to \infty} |\lambda_j| = 0. \]

See, e.g., Kanwal (1997, Ch. 7). Note that 0 cannot be an eigenvalue since $K\varphi = 0$ implies that $\varphi = 0$ by the invertibility of the Hankel transform. See, e.g., Magnus et al. (1966, Ch. 11). As usual, we order the eigenvalues according to their absolute values, that is, $|\lambda_1| \ge |\lambda_2| \ge \cdots$, where we list each eigenvalue according to its multiplicity. We then have

\[ K\varphi = \sum_{j \in \mathbf{N}_+} \lambda_j\, (\varphi, \varphi_j)\, \varphi_j, \quad \varphi \in L^2(\mathbf{R}_+), \tag{2.3.3} \]

where $\varphi_j$ is a (real-valued) eigenfunction corresponding to $\lambda_j$, that is $K\varphi_j = \lambda_j \varphi_j$, $j \in \mathbf{N}_+$, and the $\varphi_j$, $j \in \mathbf{N}_+$, define an orthonormal system in $L^2(\mathbf{R}_+)$. Note that this system is complete since 0 is not an eigenvalue of $K$.

We actually can prove more about $K$. For that we recall that a linear operator $L$ on a Banach space $B$ of norm $\|\cdot\|$ is called nuclear of order 0 (or of trace class) if and only if it can be written as

\[ Lx = \sum_{i \in I} y_i(x)\, x_i, \quad x \in B, \]

with

\[ \sum_{i \in I} \left( \|y_i\|\, \|x_i\| \right)^r < \infty \]

for any $r > 0$. Here $I$ is a countable set while $x_i \in B$ and $y_i \in B^*$ = the dual Banach space of $B$ (consisting of all bounded linear functionals on $B$) for any $i \in I$. Such operators have been introduced and studied by Grothendieck (1955, 1956). They are compact and thus have discrete spectra. Moreover, most of matrix algebra can be extended to them. In particular, one can define the trace of such an operator as

\[ \operatorname{Tr} L = \sum_{i \in I} y_i(x_i) = \sum_{j \in \mathbf{N}_+} \lambda_j, \tag{2.3.4} \]

where $\lambda_j$, $j \in \mathbf{N}_+$, are the eigenvalues of $L$, each of them counted with its multiplicity. The traces of the powers $L^n$, $n \ge 2$, are also well defined. The analog of the characteristic polynomial of a matrix for a nuclear operator of order 0 is known as the Fredholm determinant, which is an entire function of $z \in \mathbf{C}$ given by the formula

\[ \det(\mathrm{Id} - zL) = \prod_{j \in \mathbf{N}_+} (1 - \lambda_j z). \]

Then the equation

\[ \det(\mathrm{Id} - zL) = \exp\left( \operatorname{Tr} \log(\mathrm{Id} - zL) \right) = \exp\left( -\sum_{k \in \mathbf{N}_+} \frac{z^k}{k} \operatorname{Tr} L^k \right) \]

holds for $|z| < 1$. Hence

\[ \operatorname{Tr} L^n = \sum_{j \in \mathbf{N}_+} \lambda_j^n, \quad n \in \mathbf{N}_+. \]

Moreover, generalized traces defined as

\[ \sum_{j \in \mathbf{N}_+} |\lambda_j|^\varepsilon \]

exist for any $\varepsilon > 0$.

Let us finally note that in some Banach spaces every bounded linear operator is nuclear of order 0. A typical example of such a Banach space is $A_\infty(D_1)$, to be defined in Subsection 2.4.3.

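Proposition 2.3.2 $K$ is a nuclear operator of trace class. Hence

\[ \sum_{j \in \mathbf{N}_+} |\lambda_j|^\varepsilon < \infty \]

for any $\varepsilon > 0$. We have

\[ \operatorname{Tr} K = \sum_{j \in \mathbf{N}_+} \lambda_j = \int_{\mathbf{R}_+} k(s, s)\, ds = \int_{\mathbf{R}_+} \frac{J_1(2s)}{e^s - 1}\, ds = 0.7711255237\cdots, \]

\[ \operatorname{Tr} K^2 = \sum_{j \in \mathbf{N}_+} \lambda_j^2 = \iint_{\mathbf{R}_+^2} k(s, t)\, k(t, s)\, ds\, dt = \iint_{\mathbf{R}_+^2} \frac{J_1^2\left( 2\sqrt{st} \right)}{(e^s - 1)(e^t - 1)}\, ds\, dt = 1.103839654\cdots. \tag{2.3.5} \]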
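The value of $\operatorname{Tr} K$ can be approximated by direct quadrature of $J_1(2s)/(e^s - 1)$. The sketch below is a rough numerical check under stated assumptions: $J_1$ is evaluated from the power series given in Proposition 2.3.1 with a fixed number of terms, the integral is truncated at $s = 20$ (the neglected tail is of the order of $e^{-20}$), and Simpson's rule with an arbitrary step is used. The series suffers from cancellation for large arguments, which is why the truncation point is kept moderate.

```python
import math


def bessel_j1(x, terms=150):
    """J_1(x) = (x/2) * sum_k (-1)^k (x/2)^(2k) / (k! (k+1)!), truncated."""
    half = 0.5 * x
    term = half              # k = 0 term of the series
    total = term
    for k in range(terms):
        # Ratio of consecutive terms: -(x/2)^2 / ((k+1)(k+2)).
        term *= -half * half / ((k + 1) * (k + 2))
        total += term
    return total


def integrand(s):
    if s == 0.0:
        return 1.0           # limit of J_1(2s) / (e^s - 1) as s -> 0
    return bessel_j1(2.0 * s) / math.expm1(s)


def simpson(f, lo, hi, n):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


trace_estimate = simpson(integrand, 0.0, 20.0, 4000)
print(trace_estimate)
```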


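Proof. Consider the Laguerre polynomials

\[ L_n^1(s) = (n+1)! \sum_{m=0}^{n} \frac{(-1)^m s^m}{(m+1)!\, m!\, (n-m)!}, \quad n \in \mathbf{N},\ s \in \mathbf{R}_+. \]

We have

\[ \int_{\mathbf{R}_+} s e^{-s} \left( L_n^1(s) \right)^2 ds = n + 1, \quad n \in \mathbf{N}, \]

\[ \int_{\mathbf{R}_+} s e^{-s} L_m^1(s)\, L_n^1(s)\, ds = 0, \quad m, n \in \mathbf{N},\ m \ne n. \]

See, e.g., Magnus et al. (1966, Ch. 5). We expand $J_1\left( 2\sqrt{st} \right)/\sqrt{st}$, $s, t \in \mathbf{R}_+$, in terms of the $L_n^1(s)$, $n \in \mathbf{N}$, to obtain

\[ \frac{J_1\left( 2\sqrt{st} \right)}{\sqrt{st}} = \sum_{n \in \mathbf{N}} L_n^1(s)\, C_n(t), \quad s, t \in \mathbf{R}_+, \]

where

\[ C_n(t) = \frac{1}{n+1} \int_{\mathbf{R}_+} s e^{-s} L_n^1(s)\, \frac{J_1\left( 2\sqrt{st} \right)}{\sqrt{st}}\, ds = n! \sum_{m=0}^{n} \sum_{k \in \mathbf{N}} \frac{(-1)^{m+k} (m+k+1)!\, t^k}{k!\, (k+1)!\, m!\, (m+1)!\, (n-m)!} = \frac{e^{-t} t^n}{(n+1)!}, \quad n \in \mathbf{N},\ t \in \mathbf{R}_+. \]

It follows that

\[ K\varphi = \sum_{n \in \mathbf{N}} (\varphi, \beta_n)\, \alpha_n, \quad \varphi \in L^2(\mathbf{R}_+), \tag{2.3.6} \]

where $\alpha_n, \beta_n \in L^2(\mathbf{R}_+)$ are given by

\[ \alpha_n(s) = \frac{s^{1/2} L_n^1(s)}{(e^s - 1)^{1/2}}, \qquad \beta_n(t) = \frac{t^{n+1/2} e^{-t}}{(e^t - 1)^{1/2} (n+1)!}, \quad s, t \in \mathbf{R}_+. \]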

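To prove the first assertion we should show that

\[ \sum_{n \in \mathbf{N}} \left( \|\alpha_n\|_2\, \|\beta_n\|_2 \right)^r < \infty \]

for any $r > 0$. Since $(e^s - 1)^{-1} = \sum_{k \in \mathbf{N}_+} e^{-ks}$, $s \in \mathbf{R}_{++}$, the computation of $\|\alpha_n\|_2$ reduces to that of a standard integral:

\[ \|\alpha_n\|_2^2 = \sum_{k \in \mathbf{N}_+} \int_{\mathbf{R}_+} s e^{-ks} \left( L_n^1(s) \right)^2 ds = \sum_{k \in \mathbf{N}_+} \frac{n+1}{k^{2n+2}} \sum_{p=0}^{n} \binom{n+1}{p} \binom{n}{p} (k-1)^{2p}, \]

and since $\binom{n+1}{p} \le 2^{n+1}$, $0 \le p \le n$, we obtain

\[ \|\alpha_n\|_2^2 \le 2^{n+1} (n+1) \sum_{k \in \mathbf{N}_+} \frac{\left( (k-1)^2 + 1 \right)^n}{k^{2n+2}} \le 2^{n+1} (n+1)\, \zeta(2). \]

Next, as $\int_{\mathbf{R}_+} s^m e^{-s}\, ds = m!$, $m \in \mathbf{N}$, we have

\[ \|\beta_n\|_2^2 = \frac{1}{((n+1)!)^2} \sum_{k \ge 3} \int_{\mathbf{R}_+} s^{2n+1} e^{-ks}\, ds = \frac{(2n+1)!}{((n+1)!)^2} \sum_{k \ge 3} \frac{1}{k^{2n+2}} = \frac{\binom{2n+1}{n+1}}{n+1} \sum_{k \ge 3} \frac{1}{k^{2n+2}}, \quad n \in \mathbf{N}. \]

Since

\[ \sum_{k \ge 3} \frac{1}{k^{2n+2}} = \sum_{j=0}^{2} \sum_{\ell \in \mathbf{N}_+} \frac{1}{(3\ell + j)^{2n+2}} \le 3 \sum_{\ell \in \mathbf{N}_+} \frac{1}{(3\ell)^{2n+2}} = 3^{-2n-1} \zeta(2n+2) \]

and

\[ \binom{2n+1}{n+1} \le 2^{2n+1}, \qquad \zeta(2n+2) \le \zeta(2), \quad n \in \mathbf{N}, \]

we obtain

\[ \|\beta_n\|_2^2 \le \frac{\zeta(2)}{n+1} \left( \frac{2}{3} \right)^{2n+1}, \quad n \in \mathbf{N}. \]

Finally, for any $r > 0$ we have

\[ \sum_{n \in \mathbf{N}} \left( \|\alpha_n\|_2\, \|\beta_n\|_2 \right)^r \le \left( \frac{2}{\sqrt{3}}\, \zeta(2) \right)^r \sum_{n \in \mathbf{N}} \left( \left( \frac{2\sqrt{2}}{3} \right)^r \right)^n < \infty. \]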


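The formulae for $\operatorname{Tr} K$ and $\operatorname{Tr} K^2$ in the statement follow from (2.3.4) and (2.3.6), which, as is easily checked, yield

\[ \operatorname{Tr} K = \sum_{n \in \mathbf{N}} (\alpha_n, \beta_n) = \int_{\mathbf{R}_+} k(s, s)\, ds, \]

\[ \operatorname{Tr} K^2 = \sum_{m, n \in \mathbf{N}} (\alpha_m, \beta_n)(\alpha_n, \beta_m) = \iint_{\mathbf{R}_+^2} k(s, t)\, k(t, s)\, ds\, dt. \]

Concerning the numerical values of $\operatorname{Tr} K$ and $\operatorname{Tr} K^2$ we refer the reader to Mayer and Roepstorff (1987, Section 3). □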

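Remark. There is an interesting relationship between $\operatorname{Tr} K^n$ and the non-zero fixed points of $\tau^n$ for any $n \in \mathbf{N}_+$. It can be shown [see Mayer and Roepstorff (1987, Section 3) and (1988, Section 3)] that

\[ \operatorname{Tr} K^n = \sum_{i_1, \ldots, i_n \in \mathbf{N}_+} \left[ x_{i_1 \cdots i_n}^{-2} \prod_{k=2}^{n} x_{i_k \cdots i_n i_1 \cdots i_{k-1}}^{-2} - (-1)^n \right]^{-1}, \]

with $\prod_{k=2}^{1} = 1$, where $x_{i_1 \cdots i_n} = [\,\overline{i_1, \ldots, i_n}\,]$, $i_1, \ldots, i_n \in \mathbf{N}_+$. (For notation see Subsection 1.1.3.) Clearly, these quadratic irrationalities are all non-zero solutions of the equation $\tau^n x = x$. Hence

\[ x_{i_1 \cdots i_n} = \frac{1}{2 q_{n-1}} \left( p_{n-1} - q_n + \left( (p_{n-1} + q_n)^2 + 4(-1)^{n-1} \right)^{1/2} \right) \]

for any $n \in \mathbf{N}_+$ and $i_1, \ldots, i_n \in \mathbf{N}_+$. Here, as usual,

\[ \frac{p_n}{q_n} = [i_1, \ldots, i_n], \quad \mathrm{g.c.d.}(p_n, q_n) = 1, \quad n \in \mathbf{N}_+, \]

with $p_0 = 0$, $q_0 = 1$. In particular,

\[ x_i = \left( \frac{i^2}{4} + 1 \right)^{1/2} - \frac{i}{2}, \quad i \in \mathbf{N}_+, \qquad x_{ij} = \left( \frac{j^2}{4} + \frac{j}{i} \right)^{1/2} - \frac{j}{2}, \quad i, j \in \mathbf{N}_+. \]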

It is asserted in Babenko (1978, p. 140) that for any n ∈ N+, in ournotation, we have

Tr Kn =(−1)n−1

2

i1,... ,in∈N+

1− pn−1 + qn(

(pn−1 + qn)2 + 4(−1)n−1)1/2

.

Page 127: Kluwer

110 Chapter 2

For n = 1 and n = 2 this is in agreement with the Mayer–Roepstorff formula,as easily checked. Clearly, Babenko’s formula is much simpler than Mayer–Roepstorff’s. It can be shown that it is true for any n ∈ N+. See Subsection2.4.3.

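Let us finally note that by the above we have
\[
\mathrm{Tr}\,K=\frac{1}{2}\sum_{i\in\mathbb{N}_+}\left(1-\frac{i}{\sqrt{i^2+4}}\right)
\]
and
\[
\mathrm{Tr}\,K^2=\frac{1}{2}\sum_{i,j\in\mathbb{N}_+}\left(\frac{ij+2}{\sqrt{ij(ij+4)}}-1\right)
=\frac{1}{2}\sum_{k\in\mathbb{N}_+}\left(\frac{k+2}{\sqrt{k(k+4)}}-1\right)t(k),
\]
where $t(k)$ is the number of divisors of $k$, equal to $\prod_\alpha(n_\alpha+1)$ if $1<k=\prod_\alpha p_\alpha^{n_\alpha}$ is the factorization of $k$ into distinct primes, and $t(1)=1$. □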
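The two series for Tr $K$ and Tr $K^2$ lend themselves to a quick numerical experiment. The following is a minimal sketch, assuming plain truncation at a level `N` (an illustrative parameter, as are all function names); the double sum is restricted to pairs with $ij\le N$ so that it agrees term by term with the divisor form. Both series converge slowly (terms of order $1/i^2$), so the partial sums are rough approximations only.

```python
import math

def trace_K_partial(N):
    """Partial sum of (1/2) * sum_i (1 - i / sqrt(i^2 + 4)) for i = 1..N."""
    return 0.5 * sum(1.0 - i / math.sqrt(i * i + 4.0) for i in range(1, N + 1))

def g(k):
    """Summand (1/2) * ((k + 2) / sqrt(k (k + 4)) - 1) of the Tr K^2 series."""
    return 0.5 * ((k + 2.0) / math.sqrt(k * (k + 4.0)) - 1.0)

def divisor_counts(N):
    """t[k] = number of divisors of k for 1 <= k <= N, computed by a simple sieve."""
    t = [0] * (N + 1)
    for d in range(1, N + 1):
        for m in range(d, N + 1, d):
            t[m] += 1
    return t

def trace_K2_by_pairs(N):
    """Sum of g(i * j) over all pairs (i, j) with i * j <= N."""
    return sum(g(i * j) for i in range(1, N + 1) for j in range(1, N // i + 1))

def trace_K2_by_divisors(N):
    """Sum of t(k) * g(k) over 1 <= k <= N."""
    t = divisor_counts(N)
    return sum(t[k] * g(k) for k in range(1, N + 1))
```

Grouping the pairs $(i,j)$ by their product $k=ij$ is exactly what turns the double sum into the single sum weighted by $t(k)$, so the two truncated versions of Tr $K^2$ coincide up to rounding.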

Corollary 2.3.3 The dominant eigenvalue $\lambda_1$ of $K$ is simple and is equal to 1. The corresponding eigenfunction $\varphi_1$ is defined by
\[
\varphi_1(s)=\frac{1}{(\log 2)^{1/2}}\left(\frac{1-e^{-s}}{s}\right)^{1/2}e^{-s/2},\qquad s\in\mathbb{R}_+.
\]

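Proof. Since $\int_{\mathbb{R}_+}s^k e^{-s}\,ds=k!$, $k\in\mathbb{N}$, we have
\[
\begin{aligned}
K\varphi_1(s)&=\frac{1}{(\log 2)^{1/2}(e^s-1)^{1/2}}\int_{\mathbb{R}_+}J_1\!\left(2\sqrt{st}\right)t^{-1/2}e^{-t}\,dt\\
&=\frac{s^{1/2}}{(\log 2)^{1/2}(e^s-1)^{1/2}}\sum_{k\in\mathbb{N}}\frac{(-1)^k s^k}{k!\,(k+1)!}\int_{\mathbb{R}_+}t^k e^{-t}\,dt\\
&=\frac{s^{1/2}(1-e^{-s})}{(\log 2)^{1/2}(e^s-1)^{1/2}\,s}=\varphi_1(s),\qquad s\in\mathbb{R}_+,
\end{aligned}
\]
and
\[
\begin{aligned}
\|\varphi_1\|_2^2&=\frac{1}{\log 2}\int_{\mathbb{R}_+}\frac{(1-e^{-s})e^{-s}}{s}\,ds\\
&=\frac{1}{\log 2}\sum_{k\in\mathbb{N}_+}\frac{(-1)^{k+1}}{k!}\int_{\mathbb{R}_+}s^{k-1}e^{-s}\,ds\\
&=\frac{1}{\log 2}\sum_{k\in\mathbb{N}_+}\frac{(-1)^{k+1}}{k}=1.
\end{aligned}
\]
Thus 1 is an eigenvalue of $K$ with corresponding eigenfunction $\varphi_1$. It must be the dominant eigenvalue since $\lambda_n=1$ implies Tr $K^2\ge n$, which contradicts (2.3.5) unless $n=1$. It must also be simple since $\lambda_1=\lambda_2$ implies Tr $K^2\ge 2$, which again contradicts (2.3.5). □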

Concerning the remaining eigenvalues $\lambda_n$, $n\ge 2$, we first have

λ2 = −λ0 = −0.30366 30028 98732 65859 · · ·

(this follows from Theorem 2.2.5 and Theorem 2.3.5 below). Next, extensive computations [cf. Daude et al. (1997, Section 6) and MacLeod (1993)] yield

λ3 = 0.10088 45092 93104 07530 · · · ,

λ4 = −0.03549 61590 21659 84540 · · · ,

λ5 = 0.01284 37903 62440 26481 · · · ,

λ6 = −0.00471 77775 11571 03107 · · · ,

λ7 = 0.00174 86751 24305 51191 · · · ,

λ8 = −0.00065 20208 58320 50290 · · · ,

λ9 = 0.00024 41314 65524 51581 · · · ,

λ10 = −0.00009 16890 83768 59330 · · · .

It has been conjectured in Babenko (1978) that all eigenvalues $\lambda_j$, $j\in\mathbb{N}_+$, are simple. Another conjecture [Mayer and Roepstorff (1988)] is that $(-1)^{j+1}\lambda_j>0$, $j\in\mathbb{N}_+$.

2.3.3 An ‘exact’ Gauss–Kuzmin–Levy theorem

Let us define the functions $\psi_j\in H$, $j\in\mathbb{N}_+$, by
\[
\psi_j(z)=\left(A^{-1}\varphi_j\right)(z)=\int_{\mathbb{R}_+}e^{-zs-s/2}\left(\frac{s}{1-e^{-s}}\right)^{1/2}\varphi_j(s)\,ds,\qquad \mathrm{Re}\,z>-1/2.
\]
Note that since $\lambda_j\varphi_j=K\varphi_j$ implies
\[
|\varphi_j(s)|\le C_j\,s^{1/2}e^{-s/2},\qquad s\in\mathbb{R}_+,
\]
for some suitable $C_j\in\mathbb{R}_+$, it follows that $\psi_j$ is regular in the half-plane $\mathrm{Re}\,z>-1$. It is possible to show that actually the $\psi_j$, $j\in\mathbb{N}_+$, are regular outside a cut along the negative axis from $-1$ to $-\infty$, which is their natural boundary.

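In particular,
\[
\psi_1(z)=\frac{1}{(\log 2)^{1/2}}\int_{\mathbb{R}_+}e^{-zs-s}\,ds
=-\frac{1}{(\log 2)^{1/2}}\left.\frac{e^{-(z+1)s}}{z+1}\right|_0^{\infty}
=\frac{1}{(\log 2)^{1/2}}\,\frac{1}{z+1},\qquad \mathrm{Re}\,z>-1.
\tag{2.3.7}
\]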

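Proposition 2.3.4 We have
\[
\sum_{j\in\mathbb{N}_+}|\psi_j(z)|^2=\sum_{j\in\mathbb{N}_+}\frac{1}{(2\,\mathrm{Re}\,z+j)^2},\qquad \mathrm{Re}\,z>-1/2,
\tag{2.3.8}
\]
\[
\max_{x\in I}|\psi_j(x)|\le\left(\frac{\pi^2}{6}-\frac{1}{4\log 2}\right)^{1/2}=1.13325209315\cdots,\qquad j\ge 2.
\tag{2.3.9}
\]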

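Proof. For any fixed $z$ with $\mathrm{Re}\,z>-1/2$ consider the function
\[
\varphi(s)=e^{-zs-s/2}\left(\frac{s}{1-e^{-s}}\right)^{1/2},\qquad s\in\mathbb{R}_+,
\]
which clearly belongs to $L^2(\mathbb{R}_+)$. On account of the completeness of the system $(\varphi_j)_{j\in\mathbb{N}_+}$, whose properties are described in the lines following equation (2.3.3), we can write
\[
\varphi=\sum_{j\in\mathbb{N}_+}e_j\varphi_j,
\]
where $e_j=(\varphi,\varphi_j)=\psi_j(z)$, $j\in\mathbb{N}_+$. Parseval's equation then yields
\[
\sum_{j\in\mathbb{N}_+}|e_j|^2=\|\varphi\|_2^2.
\]
But
\[
\begin{aligned}
\|\varphi\|_2^2&=\int_{\mathbb{R}_+}\left|e^{-zs-s/2}\right|^2\frac{s\,ds}{1-e^{-s}}
=\int_{\mathbb{R}_+}e^{-2s\,\mathrm{Re}\,z}\,\frac{s\,ds}{e^{s}-1}
=\sum_{j\in\mathbb{N}_+}\int_{\mathbb{R}_+}e^{-(2\,\mathrm{Re}\,z+j)s}\,s\,ds\\
&=-\sum_{j\in\mathbb{N}_+}\left.e^{-(2\,\mathrm{Re}\,z+j)s}\left(\frac{s}{2\,\mathrm{Re}\,z+j}+\frac{1}{(2\,\mathrm{Re}\,z+j)^2}\right)\right|_0^{\infty}\\
&=\sum_{j\in\mathbb{N}_+}\frac{1}{(2\,\mathrm{Re}\,z+j)^2},\qquad \mathrm{Re}\,z>-1/2,
\end{aligned}
\]
and (2.3.8) follows. Finally, (2.3.9) follows from (2.3.7) and (2.3.8) since
\[
\min_{x\in I}\psi_1(x)=\frac{1}{2(\log 2)^{1/2}}.
\]
□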

Remarks. 1. It is conjectured in Babenko (1978, p. 140) that $\psi_j(0)\ne 0$ and $|\psi_j(0)|=\max_{x\in I}|\psi_j(x)|$, $j\ge 2$. Note that $\psi_2(0)\ne 0$ is implicit in Wirsing (1974).

2. If $\psi_j(0)\ne 0$ for some $j\ge 2$, then
\[
\psi_j(-i-[i_1,\dots,i_n]+z)=\frac{(-1)^{n+1}}{\lambda_j^{n+2}}\,\frac{(1-\lambda_j)\,\psi_j(0)}{z}+O(1)
\]
as $z\to 0$ for any $n\in\mathbb{N}_+$, $i,i_1,\dots,i_n\in\mathbb{N}_+$, $i_n\ge 2$, with $\varepsilon<|\arg z|<\pi-\varepsilon$ whatever $\varepsilon>0$. This was proved by Wirsing (1974) for $j=2$, thus establishing the cut along the negative real axis from $-1$ to $-\infty$ as the natural boundary of the functions ψ and ψ in Subsection 2.2.2. (See Remark 2 before Theorem 2.2.5.) It is asserted in Babenko & Jur′ev (1978) that Wirsing's reasoning also works for any $j\ge 3$. □

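We are now able to prove an ‘exact’ Gauss–Kuzmin–Levy theorem for the measures $\gamma_a$, $a\in I$ (cf. Subsection 1.3.4).

Theorem 2.3.5 For any $a\in I$, $A\in\mathcal{B}_I$, and $n\in\mathbb{N}_+$ we have
\[
\gamma_a\!\left(\tau^{-n}(A)\right)-\gamma(A)=(a+1)\sum_{j\ge 2}\lambda_j^{n-1}\psi_j(a)\int_A\psi_j\,d\lambda.
\tag{2.3.10}
\]
Next, $\int_I\psi_j\,d\lambda=0$, $j\ge 2$, and
\[
\left|\gamma_a\!\left(\tau^{-n}(A)\right)-\gamma(A)-(a+1)\sum_{j=2}^{\ell-1}\lambda_j^{n-1}\psi_j(a)\int_A\psi_j\,d\lambda\right|
\le\left(\frac{\pi^2\log 2}{6}-1\right)|\lambda_\ell|^{n-1}\min\left(\gamma(A),1-\gamma(A)\right)
\]
for any $a\in I$, $A\in\mathcal{B}_I$, $\ell\ge 2$, and $n\in\mathbb{N}_+$. (Clearly, $\sum_{j=2}^{1}=0$.)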

Proof. For any $a\in I$ consider the function $h_a$ defined by
\[
h_a(z)=\frac{a+1}{(az+1)^2},\qquad \mathrm{Re}\,z>-1/2.
\]
Note that $h_0$ does not belong to $H$. Instead, the function
\[
P_\lambda h_a(z)=(a+1)\sum_{i\in\mathbb{N}_+}\frac{1}{(z+a+i)^2},\qquad \mathrm{Re}\,z>-1/2,
\]
does belong to $H$ for any $a\in I$.

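By (2.3.2) and (2.3.3) for any $g\in H$ and $n\in\mathbb{N}$ we have
\[
P_\lambda^n g=A^{-1}K^nA\,g=A^{-1}\sum_{j\in\mathbb{N}_+}\lambda_j^n(Ag,\varphi_j)\,\varphi_j
=\sum_{j\in\mathbb{N}_+}\lambda_j^n(Ag,\varphi_j)\,\psi_j.
\]
Hence, for any $n\in\mathbb{N}_+$ and $a\in I$,
\[
P_\lambda^n h_a=P_\lambda^{n-1}(P_\lambda h_a)=\sum_{j\in\mathbb{N}_+}\lambda_j^{n-1}(AP_\lambda h_a,\varphi_j)\,\psi_j.
\tag{2.3.11}
\]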

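We assert that for any $a\in I$ we have
\[
(AP_\lambda h_a)(s)=(a+1)\,e^{-s/2-as}\left(\frac{s}{1-e^{-s}}\right)^{1/2},\qquad s\in\mathbb{R}_+.
\tag{2.3.12}
\]
This can be checked as follows. Since $P_\lambda h_a=MS^{-1}(AP_\lambda h_a)$, we have to prove that this last equation holds with $AP_\lambda h_a$ given by (2.3.12). We have
\[
S^{-1}(AP_\lambda h_a)(s)=(a+1)\,\frac{s}{1-e^{-s}}\,e^{-s/2-as},\qquad s\in\mathbb{R}_+,
\]
\[
\begin{aligned}
M\!\left(S^{-1}AP_\lambda h_a\right)(z)&=(a+1)\int_{\mathbb{R}_+}\frac{s\,e^{-s}e^{-(z+a)s}}{1-e^{-s}}\,ds
=(a+1)\sum_{j\in\mathbb{N}_+}\int_{\mathbb{R}_+}s\,e^{-(z+j+a)s}\,ds\\
&=(a+1)\sum_{j\in\mathbb{N}_+}\frac{1}{(z+j+a)^2}=P_\lambda h_a(z),\qquad \mathrm{Re}\,z>-1/2.
\end{aligned}
\]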

Thus (2.3.12) holds and we then have
\[
(AP_\lambda h_a,\varphi_j)=(a+1)\,\psi_j(a),\qquad a\in I,\ j\in\mathbb{N}_+.
\tag{2.3.13}
\]
Therefore (2.3.11) and (2.3.13) imply that
\[
P_\lambda^n h_a=(a+1)\sum_{j\in\mathbb{N}_+}\lambda_j^{n-1}\psi_j(a)\,\psi_j,\qquad a\in I,\ n\in\mathbb{N}_+.
\]
The last equation holds in $H$. By (2.3.9), Proposition 2.3.2, and Corollary 2.3.3, the series $\sum_{j\in\mathbb{N}_+}\lambda_j^{n-1}\psi_j(a)\,\psi_j$ is uniformly and absolutely convergent in $I$ for any $a\in I$ and $n\in\mathbb{N}_+$. Hence whatever $a\in I$ and $n\in\mathbb{N}_+$ by (2.3.7) we have
\[
P_\lambda^n h_a(x)-\frac{1}{(x+1)\log 2}=(a+1)\sum_{j\ge 2}\lambda_j^{n-1}\psi_j(a)\,\psi_j(x),\qquad x\in I.
\tag{2.3.14}
\]
Equation (2.3.10) follows by integrating the last equation over $A\in\mathcal{B}_I$ since by the very definition of the Perron–Frobenius operator we can write
\[
\int_A P_\lambda^n h_a\,d\lambda=\int_{\tau^{-n}(A)}h_a\,d\lambda=\int_{\tau^{-n}(A)}d\gamma_a=\gamma_a\!\left(\tau^{-n}(A)\right),\qquad n\in\mathbb{N}.
\]

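Since
\[
\int_I\gamma(da)\,\gamma_a\!\left(\tau^{-n}(A)\right)=\gamma\!\left(\tau^{-n}(A)\right)=\gamma(A),\qquad n\in\mathbb{N},\ A\in\mathcal{B}_I,
\]
if we divide equation (2.3.10) by $(a+1)\log 2$ and integrate the equation obtained over $a\in I$, then we obtain
\[
0=\sum_{j\ge 2}\lambda_j^{n-1}\int_I\psi_j\,d\lambda\int_A\psi_j\,d\lambda,\qquad n\in\mathbb{N}_+,\ A\in\mathcal{B}_I.
\]
Taking $A=I$ and $n=1$ we deduce that $\int_I\psi_j\,d\lambda=0$, $j\ge 2$.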

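Finally, for $a\in I$, $A\in\mathcal{B}_I$, $\ell\ge 2$, and $n\in\mathbb{N}_+$ set
\[
D_{a,\ell,n}(A)=D(A)=\left|\gamma_a\!\left(\tau^{-n}(A)\right)-\gamma(A)-(a+1)\sum_{j=2}^{\ell-1}\lambda_j^{n-1}\psi_j(a)\int_A\psi_j\,d\lambda\right|
\]
and note that $D(A)=D(I\setminus A)$. It follows from (2.3.10) that
\[
\begin{aligned}
D(A)&\le(a+1)\,|\lambda_\ell|^{n-1}\int_A\sum_{j\ge\ell}|\psi_j(a)|\,|\psi_j(x)|\,dx\\
&\le(a+1)\,|\lambda_\ell|^{n-1}\int_A\left(\sum_{j\ge\ell}\psi_j^2(a)\right)^{1/2}\left(\sum_{j\ge\ell}\psi_j^2(x)\right)^{1/2}dx\\
&=(\log 2)\,|\lambda_\ell|^{n-1}\int_A\left((a+1)^2\sum_{j\ge\ell}\psi_j^2(a)\right)^{1/2}\left((x+1)^2\sum_{j\ge\ell}\psi_j^2(x)\right)^{1/2}\gamma(dx).
\end{aligned}
\]
Now, equation (2.3.8) implies
\[
(a+1)^2\sum_{j\ge\ell}\psi_j^2(a)\le(a+1)^2\left(\sum_{j\in\mathbb{N}_+}\frac{1}{(2a+j)^2}-\frac{1}{(a+1)^2\log 2}\right)\le\zeta(2)-\frac{1}{\log 2}
\tag{2.3.15}
\]
for any $a\in I$ and $\ell\ge 2$. (The last inequality can be easily checked.) We therefore obtain
\[
D(A)\le\left(\frac{\pi^2\log 2}{6}-1\right)|\lambda_\ell|^{n-1}\gamma(A).
\]
Since $D(A)=D(I\setminus A)$ we conclude that
\[
D(A)\le\left(\frac{\pi^2\log 2}{6}-1\right)|\lambda_\ell|^{n-1}\min\left(\gamma(A),1-\gamma(A)\right).
\]
Note that
\[
\frac{\pi^2\log 2}{6}-1=0.14018\cdots=\varepsilon_2
\]
(cf. Subsection 1.3.6). □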
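The last inequality in (2.3.15) and the value of the constant $\varepsilon_2$ can be explored numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the infinite series is truncated after `terms` summands, and the remainder is approximated by the midpoint-rule estimate $1/(y+\text{terms}+1/2)$, which is a choice made here and not taken from the text. A grid check is of course not a proof.

```python
import math

def shifted_zeta2(y, terms=20000):
    """Approximate sum_{j>=1} 1/(y + j)^2 for y >= 0 (truncated series plus tail estimate)."""
    head = sum(1.0 / (y + j) ** 2 for j in range(1, terms + 1))
    tail = 1.0 / (y + terms + 0.5)  # midpoint estimate of the remainder (assumption)
    return head + tail

def middle_term_2_3_15(a):
    """(a + 1)^2 * sum_j 1/(2a + j)^2 - 1/log 2, the middle expression in (2.3.15)."""
    return (a + 1.0) ** 2 * shifted_zeta2(2.0 * a) - 1.0 / math.log(2.0)

ZETA2 = math.pi ** 2 / 6.0
RIGHT_SIDE = ZETA2 - 1.0 / math.log(2.0)   # zeta(2) - 1/log 2
EPS2 = ZETA2 * math.log(2.0) - 1.0         # pi^2 log 2 / 6 - 1

def holds_on_grid(steps=100):
    """Check middle_term_2_3_15(a) <= RIGHT_SIDE at a = 0, 1/steps, ..., 1."""
    return all(middle_term_2_3_15(k / steps) <= RIGHT_SIDE + 1e-9
               for k in range(steps + 1))
```

On the grid the middle expression attains its largest value at $a=0$, where it equals $\zeta(2)-1/\log 2$.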

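Corollary 2.3.6 For any $a,x\in I$, $n\in\mathbb{N}_+$, and $\ell\ge 2$ we have
\[
\gamma_a(\tau^n<x)-\gamma([0,x])=(a+1)\sum_{j\ge 2}\lambda_j^{n-1}\psi_j(a)\int_0^x\psi_j\,d\lambda,
\]
\[
\frac{d}{dx}\gamma_a(\tau^n<x)-\frac{1}{(x+1)\log 2}=(a+1)\sum_{j\ge 2}\lambda_j^{n-1}\psi_j(a)\,\psi_j(x),
\]
\[
\left|\gamma_a(\tau^n<x)-\gamma([0,x])-(a+1)\sum_{j=2}^{\ell-1}\lambda_j^{n-1}\psi_j(a)\int_0^x\psi_j\,d\lambda\right|
\le\left(\frac{\pi^2\log 2}{6}-1\right)|\lambda_\ell|^{n-1}\left(\frac{1}{2}-\left|\frac{1}{2}-\gamma([0,x])\right|\right),
\]
\[
\left|\frac{d}{dx}\gamma_a(\tau^n<x)-\frac{1}{(x+1)\log 2}-(a+1)\sum_{j=2}^{\ell-1}\lambda_j^{n-1}\psi_j(a)\,\psi_j(x)\right|
\le\left(\frac{\pi^2}{6}-\frac{1}{\log 2}\right)|\lambda_\ell|^{n-1}\,\frac{1}{x+1}.
\]
Next (cf. Corollary 1.2.5), for any $a\in I$, $n,k\in\mathbb{N}_+$, and $i^{(k)}\in\mathbb{N}_+^k$ we have
\[
\left|\frac{\gamma_a\!\left((a_{n+1},\dots,a_{n+k})=i^{(k)}\right)}{\gamma\!\left([u(i^{(k)}),v(i^{(k)})]\right)}-1\right|\le\left(\frac{\pi^2\log 2}{6}-1\right)\lambda_0^{n-1},
\]
which for $k=1$ reduces to
\[
\left|\frac{\gamma_a(a_{n+1}=i)}{(\log 2)^{-1}\log\left(1+1/(i(i+2))\right)}-1\right|\le\left(\frac{\pi^2\log 2}{6}-1\right)\lambda_0^{n-1},
\]
for any $a\in I$ and $i,n\in\mathbb{N}_+$.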

Proof. The first equation is (2.3.10) for $A=[0,x)$, $x\in I$, while the second one is simply (2.3.14). (Clearly, the latter can be obtained from the former by differentiation.) The first inequality is that occurring in Theorem 2.3.5 for $A=[0,x)$, $x\in I$, while the second one is easily obtained using (2.3.15). Finally, the last inequality (the general case) is that occurring in Theorem 2.3.5 for $A=[u(i^{(k)}),v(i^{(k)})]$ and $\ell=2$. □

It is interesting to compare Theorem 2.2.5 (with $\mu=\gamma_a$, $a\in I$) and Corollary 2.3.6. It is easy to see that for any $a,x\in I$ we have
\[
-\lambda_0\,G(f_a')\,\psi(x)=\psi_2(a)\int_0^x\psi_2\,d\lambda,
\tag{2.3.16}
\]
where
\[
f_a(x)=\frac{x+1}{(ax+1)^2},\qquad a,x\in I.
\]
Differentiating (2.3.16) with respect to $x$ and then putting $x=a$ yield
\[
\psi_2^2(a)=-\lambda_0\,G(f_a')\,\psi'(a),\qquad a\in I.
\]
In particular, $\psi_2^2(0)=-\lambda_0G(1)\psi'(0)=\lambda_0G(1)U^\infty\Psi\ne 0$ (since $G(1)>0$).

Now, it follows from (2.3.16) that for any $x\in I$ such that $\psi'(x)\ne 0$ the ratio $\psi_2(x)/\psi'(x)$ has a constant value equal to $-(\mathrm{sgn}\,\psi_2(0))(\lambda_0G(1)/U^\infty\Psi)^{1/2}$, and that for any $a\in I$ such that $\psi_2(a)\ne 0$ the ratio $G(f_a')/\psi_2(a)$ has a constant value equal to $G(1)/\psi_2(0)$. Then
\[
\psi(x)=-(\mathrm{sgn}\,\psi_2(0))\left(\frac{U^\infty\Psi}{\lambda_0G(1)}\right)^{1/2}\int_0^x\psi_2\,d\lambda
\]
and
\[
\psi_2(x)=-(\mathrm{sgn}\,\psi_2(0))\left(\frac{\lambda_0G(1)}{U^\infty\Psi}\right)^{1/2}\psi'(x)
\]
for any $x\in I$.

Remark. It follows from Corollary 2.3.6 that the exact convergence rate to 0 as $n\to\infty$ of
\[
\sup_{x\in I}\left|\gamma_a(\tau^n<x)-\gamma([0,x])\right|,\qquad a\in I,
\tag{2.3.17}
\]
is $O(\lambda_0^n)$ as long as $\psi_2(a)\ne 0$. In particular this holds for $a=0$ since, as we have just shown, $\psi_2(0)\ne 0$. If $\psi_2(a)=\cdots=\psi_{j-1}(a)=0$ and $\psi_j(a)\ne 0$ for some $j\ge 3$, then the exact convergence rate to 0 as $n\to\infty$ of (2.3.17) is $O(\lambda_j^n)$.

The high accuracy computations of MacLeod (1993) show, however, that the only possible value of $j$ is $j=3$, since there exists a unique $a\in I$, very close to 0.4, with $\psi_2(a)=0$ while $\psi_3(a)\ne 0$. □

2.3.4 ψ-mixing revisited

Theorem 2.3.5 allows for an important improvement of Corollary 1.3.15. With the notation in Subsection 1.3.6, it follows from Theorem 2.3.5 that
\[
\varepsilon_{n+1}\le\left(\frac{\pi^2\log 2}{6}-1\right)\lambda_0^{n-1},\qquad n\in\mathbb{N}_+.
\tag{2.3.18}
\]
It is easy to check that for $n=1$ we actually have equality in (2.3.18), that is,
\[
\varepsilon_2=\frac{\pi^2\log 2}{6}-1=0.14018\cdots,
\]
in accordance with the result obtained in Subsection 1.3.6. We can thus reformulate Corollary 1.3.15 as follows.

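Proposition 2.3.7 The sequence $(a_n)_{n\in\mathbb{N}_+}$ is ψ-mixing under $\gamma$ and any $\gamma_a$, $a\in I$. For any $a\in I$ we have $\psi_{\gamma_a}(1)\le 0.61231\cdots$ and
\[
\psi_{\gamma_a}(n)\le\frac{\varepsilon_2\,\lambda_0^{n-2}(1+\lambda_0)}{1-\varepsilon_2\,\lambda_0^{n-1}},\qquad n\ge 2.
\]
In particular $\psi_{\gamma_a}(2)\le\varepsilon_2(1+\lambda_0)/(1-\varepsilon_2\lambda_0)=0.19087\cdots$ for any $a\in I$. Also, $\psi_\gamma(1)=\varepsilon_1=2\log 2-1=0.38629\cdots$, $\psi_\gamma(2)=\varepsilon_2=0.14018\cdots$, and
\[
\psi_\gamma(n)\le\varepsilon_2\,\lambda_0^{n-2},\qquad n\ge 3.
\]
The doubly infinite sequence $(\bar a_\ell)_{\ell\in\mathbb{Z}}$ of extended incomplete quotients is ψ-mixing under the extended Gauss measure $\bar\gamma$ and its ψ-mixing coefficients are equal to the corresponding ψ-mixing coefficients under $\gamma$ of $(a_n)_{n\in\mathbb{N}_+}$.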
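The numerical constants in Proposition 2.3.7 follow directly from $\varepsilon_1$, $\varepsilon_2$ and $\lambda_0$. The sketch below tabulates the stated upper bounds; the function names are assumptions made for illustration, and the value of $\lambda_0$ is the decimal expansion quoted earlier in this section, truncated to double precision.

```python
import math

LAMBDA0 = 0.3036630028987326                     # lambda_0, truncated from the value quoted in the text
EPS1 = 2.0 * math.log(2.0) - 1.0                 # epsilon_1 = psi_gamma(1)
EPS2 = math.pi ** 2 * math.log(2.0) / 6.0 - 1.0  # epsilon_2 = psi_gamma(2)

def psi_bound_gamma_a(n):
    """Upper bound for psi_{gamma_a}(n), n >= 2, uniform in a (Proposition 2.3.7)."""
    if n < 2:
        raise ValueError("this bound is stated for n >= 2")
    return EPS2 * LAMBDA0 ** (n - 2) * (1.0 + LAMBDA0) / (1.0 - EPS2 * LAMBDA0 ** (n - 1))

def psi_bound_gamma(n):
    """Upper bound for psi_gamma(n), n >= 3 (Proposition 2.3.7)."""
    if n < 3:
        raise ValueError("this bound is stated for n >= 3")
    return EPS2 * LAMBDA0 ** (n - 2)

if __name__ == "__main__":
    for n in range(2, 8):
        print(n, psi_bound_gamma_a(n))
```

Both bounds decay geometrically at rate $\lambda_0$, which is the improvement over Corollary 1.3.15 mentioned above.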

Remark. From Theorem 2.3.5 we can also obtain a formula expressing the ψ-mixing coefficients $\psi_\gamma(n)$, $n\ge 2$, in terms of the eigenvalues $\lambda_j$ and functions $\psi_j$, $j\ge 2$, as
\[
\psi_\gamma(n+1)=(\log 2)\sup_{a,b\in I}\,(a+1)(b+1)\left|\sum_{j\ge 2}\lambda_j^{n-1}\psi_j(a)\,\psi_j(b)\right|,\qquad n\in\mathbb{N}_+.
\]
It is not difficult to check that the above formula yields $\psi_\gamma(2)=\varepsilon_2$. Otherwise it seems to be of little value. □

2.4 Extending Babenko’s and Wirsing’s work

2.4.1 The Mayer–Roepstorff Hilbert space approach

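In this subsection we describe the setting devised by Mayer and Roepstorff (1987) for Babenko's work which is thus simplified and extended. Proofs are in general not given, and for them the reader is referred to the original paper.

Let $m$ denote the measure on $\mathcal{B}_{\mathbb{R}_+}$ with density
\[
\frac{dm}{dt}=\frac{t}{e^t-1},\qquad t\in\mathbb{R}_+.
\]
Note that
\[
m(\mathbb{R}_+)=\int_{\mathbb{R}_+}t\sum_{k\in\mathbb{N}_+}e^{-kt}\,dt=\sum_{k\in\mathbb{N}_+}\frac{1}{k^2}=\zeta(2).
\]
Consider the Hilbert space $L^2(\mathbb{R}_+,\mathcal{B}_{\mathbb{R}_+},m)=L^2_m(\mathbb{R}_+)$ of $m$-square integrable functions $f:\mathbb{R}_+\to\mathbb{C}$ with inner product $(\cdot,\cdot)_m$ defined by
\[
(\varphi,\psi)_m=\int_{\mathbb{R}_+}\varphi\,\psi^*\,dm,\qquad \varphi,\psi\in L^2_m(\mathbb{R}_+),
\]
and norm
\[
\|\varphi\|_{2,m}=\left(\int_{\mathbb{R}_+}|\varphi|^2\,dm\right)^{1/2},\qquad \varphi\in L^2_m(\mathbb{R}_+).
\]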

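Let $D$ denote the half-plane $\mathrm{Re}\,z>-1/2$ and consider the measure $\nu$ on $\mathcal{B}_D$ with density
\[
\frac{d\nu}{dx\,dy}=
\begin{cases}
\dfrac{1}{\pi}\,\dfrac{1}{(x+1)^2+y^2} & \text{if } -\tfrac{1}{2}<x<0,\ y\in\mathbb{R},\\[2mm]
0 & \text{otherwise.}
\end{cases}
\]
Note that
\[
\nu(D)=\frac{1}{\pi}\int_{-1/2}^{0}dx\int_{\mathbb{R}}\frac{dy}{(x+1)^2+y^2}=\int_{-1/2}^{0}\frac{dx}{x+1}=\log 2.
\]
Consider the Hilbert space $H^2(\nu)$ of functions $f$ holomorphic in $D$ such that $\left|(z+1)^{-1}f(z)\right|$ is bounded in every half-plane $\mathrm{Re}\,z>-1/2+\varepsilon$, $\varepsilon>0$, and
\[
\|f\|_{2,\nu}=\left(\int_D|f|^2\,d\nu\right)^{1/2}<\infty,
\]
with inner product $(\cdot,\cdot)_\nu$ defined by
\[
(f,g)_\nu=\int_D f\,g^*\,d\nu,\qquad f,g\in H^2(\nu).
\]
Thus $H^2(\nu)$ is a Banach space under the norm $\|\cdot\|_{2,\nu}$. Let $f|_I$ denote the restriction of $f\in H^2(\nu)$ to $I$. Then
\[
U^\infty f|_I=\int_I f\,d\gamma=\frac{(f,1)_\nu}{\log 2}
\tag{2.4.1}
\]
and
\[
\left\|f|_I\right\|_{2,\gamma}\le\|f\|_{2,\nu}.
\tag{2.4.2}
\]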

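Next, the linear mapping $M:L^2_m(\mathbb{R}_+)\to H^2(\nu)$ defined by
\[
M\varphi(z)=(z+1)\int_{\mathbb{R}_+}e^{-zt}\varphi(t)\,m(dt),\qquad \varphi\in L^2_m(\mathbb{R}_+),\ z\in D,
\]
is an isometry and the image under $M$ of $L^2_m(\mathbb{R}_+)$ is $H^2(\nu)$.

The Perron–Frobenius operator $U$ takes $H^2(\nu)$ into itself. Obviously, for $f\in H^2(\nu)$ we define $Uf$ by
\[
Uf(z)=\sum_{i\in\mathbb{N}_+}P_i(z)\,f\!\left(\frac{1}{z+i}\right),\qquad z\in D.
\]
The mapping $\widetilde K:\varphi\to\widetilde K\varphi$, where
\[
\widetilde K\varphi(s)=\int_{\mathbb{R}_+}\frac{J_1\!\left(2\sqrt{st}\right)}{\sqrt{st}}\,\varphi(t)\,m(dt),\qquad \varphi\in L^2_m(\mathbb{R}_+),\ s\in\mathbb{R}_+,
\]
defines on $L^2_m(\mathbb{R}_+)$ an integral symmetric linear operator with continuous kernel
\[
k(s,t)=\frac{J_1\!\left(2\sqrt{st}\right)}{\sqrt{st}}=\sum_{n\in\mathbb{N}}\frac{(-1)^n}{n!\,(n+1)!}(st)^n,\qquad s,t\in\mathbb{R}_+.
\]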

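$\widetilde K$ has infinite-dimensional range, is nuclear (of trace class) and, therefore, compact. The spectra of the operators $\widetilde K$ and $K$ (introduced in Subsection 2.3.2) coincide. Thus with the notation from Subsection 2.3.2 for the eigenvalues of $K$ we have
\[
\widetilde K\varphi=\sum_{k\in\mathbb{N}_+}\lambda_k(\varphi,\widetilde\varphi_k)_m\,\widetilde\varphi_k,\qquad \varphi\in L^2_m(\mathbb{R}_+),
\tag{2.4.3}
\]
where $\widetilde\varphi_k$ is an eigenfunction corresponding to $\lambda_k$, that is, $\widetilde K\widetilde\varphi_k=\lambda_k\widetilde\varphi_k$, $k\in\mathbb{N}_+$, and the $\widetilde\varphi_k$, $k\in\mathbb{N}_+$, define an orthonormal basis in $L^2_m(\mathbb{R}_+)$. Actually,
\[
\widetilde\varphi_k(t)=t^{-1/2}\left(e^t-1\right)^{1/2}\varphi_k(t),\qquad k\in\mathbb{N}_+,\ t\in\mathbb{R}_+,
\]
where the $\varphi_k$, $k\in\mathbb{N}_+$, are those introduced in Subsection 2.3.2. The operators $M$, $\widetilde K$ and $U$ are connected by the equation $U=M\widetilde KM^{-1}$. Hence
\[
U^n=M\widetilde K^nM^{-1},\qquad n\in\mathbb{N}_+.
\tag{2.4.4}
\]
From (2.4.3) we have
\[
\widetilde K^n\varphi=\sum_{k\in\mathbb{N}_+}\lambda_k^n(\varphi,\widetilde\varphi_k)_m\,\widetilde\varphi_k,\qquad n\in\mathbb{N}_+,\ \varphi\in L^2_m(\mathbb{R}_+).
\tag{2.4.5}
\]
It then follows from (2.4.4) and (2.4.5) that
\[
U^ng=\sum_{k\in\mathbb{N}_+}\lambda_k^n(M^{-1}g,\widetilde\varphi_k)_m\,M\widetilde\varphi_k,\qquad n\in\mathbb{N}_+,\ g\in H^2(\nu).
\]
Alternatively,
\[
U^ng=\sum_{k\in\mathbb{N}_+}\lambda_k^n(g,M\widetilde\varphi_k)_\nu\,M\widetilde\varphi_k,\qquad n\in\mathbb{N}_+,\ g\in H^2(\nu).
\]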

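For $k=1$ we have $\lambda_1=1$ and
\[
\widetilde\varphi_1(t)=\frac{1}{(\log 2)^{1/2}}\,t^{-1}\left(e^t-1\right)e^{-t},\qquad t\in\mathbb{R}_+.
\]
Therefore
\[
M\widetilde\varphi_1(z)=(z+1)\int_{\mathbb{R}_+}e^{-zt}\,\widetilde\varphi_1(t)\,m(dt)
=\frac{z+1}{(\log 2)^{1/2}}\int_{\mathbb{R}_+}e^{-(z+1)t}\,dt=\frac{1}{(\log 2)^{1/2}},\qquad z\in D,
\]
and, by (2.4.1),
\[
(g,M\widetilde\varphi_1)_\nu\,M\widetilde\varphi_1=\frac{1}{\log 2}(g,1)_\nu=U^\infty g,\qquad g\in H^2(\nu).
\]
As 0 is not an eigenvalue of $\widetilde K$, we also have
\[
M^{-1}g=\sum_{k\in\mathbb{N}_+}(M^{-1}g,\widetilde\varphi_k)_m\,\widetilde\varphi_k,\qquad g\in H^2(\nu),
\]
or, alternatively,
\[
g=\sum_{k\in\mathbb{N}_+}(g,M\widetilde\varphi_k)_\nu\,M\widetilde\varphi_k,\qquad g\in H^2(\nu).
\]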

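Then
\[
\left\|M^{-1}g\right\|_{2,m}^2=\|g\|_{2,\nu}^2=\sum_{k\in\mathbb{N}_+}\left|(M^{-1}g,\widetilde\varphi_k)_m\right|^2=\sum_{k\in\mathbb{N}_+}\left|(g,M\widetilde\varphi_k)_\nu\right|^2
\]
for any $g\in H^2(\nu)$. Therefore
\[
\|U^ng-U^\infty g\|_{2,\nu}^2=\sum_{k\ge 2}|\lambda_k|^{2n}\left|(g,M\widetilde\varphi_k)_\nu\right|^2
\le\left(\|g\|_{2,\nu}^2-|U^\infty g|^2\log 2\right)|\lambda_2|^{2n}
\tag{2.4.6}
\]
for any $n\in\mathbb{N}_+$ and $g\in H^2(\nu)$. Inequalities (2.4.2) and (2.4.6) imply the following result.

Proposition 2.4.1 Let $g\in H^2(\nu)$. Then for any $n\in\mathbb{N}_+$ we have
\[
\|U^ng-U^\infty g\|_{2,\gamma}\le\left(\|g\|_{2,\nu}^2-|U^\infty g|^2\log 2\right)^{1/2}|\lambda_2|^n.
\]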

Corollary 2.4.2 ($L^2$-version of the Gauss–Kuzmin–Levy theorem) Let $h:D\to\mathbb{C}$ be such that the function $z\to(z+1)h(z)$, $z\in D$, belongs to $H^2(\nu)$ and the restriction of $h$ to $I$ is the Radon–Nikodym derivative with respect to $\lambda$ of a probability measure $\mu$ on $\mathcal{B}_I$. Then
\[
\left|\mu\!\left(\tau^{-n}(A)\right)-\gamma(A)\right|
\le(\log 2)\,\gamma^{1/2}(A)\left(\iint_D|h(x+iy)|^2\,dx\,dy-\frac{1}{\log 2}\right)^{1/2}|\lambda_2|^n
\tag{2.4.7}
\]
for any $n\in\mathbb{N}_+$ and $A\in\mathcal{B}_I$.

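Proof. Let $g(z)=(z+1)h(z)$, $z\in D$. For any $A\in\mathcal{B}_I$ and $n\in\mathbb{N}_+$ we have
\[
\left|(I_A,U^ng-U^\infty g)_\gamma\right|\le\left(\int_I I_A^2\,d\gamma\right)^{1/2}\|U^ng-U^\infty g\|_{2,\gamma}.
\tag{2.4.8}
\]
But
\[
(I_A,U^ng-U^\infty g)_\gamma=\frac{1}{\log 2}\int_A\frac{U^ng(x)-U^\infty g}{x+1}\,dx
\]
and, by Proposition 2.1.5,
\[
\int_A\frac{U^ng(x)-U^\infty g}{x+1}\,dx=\mu\!\left(\tau^{-n}(A)\right)-\gamma(A)
\]
since
\[
U^\infty g=\frac{1}{\log 2}\int_I\frac{(x+1)h(x)}{x+1}\,dx=\frac{1}{\log 2}.
\]
Therefore (2.4.8) amounts to
\[
\left|\mu\!\left(\tau^{-n}(A)\right)-\gamma(A)\right|\le(\log 2)\,\gamma^{1/2}(A)\,\|U^ng-U^\infty g\|_{2,\gamma}
\tag{2.4.9}
\]
for any $n\in\mathbb{N}_+$ and $A\in\mathcal{B}_I$. Now, (2.4.7) follows from (2.4.9) and Proposition 2.4.1. □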

Remark. Inequality (2.4.6) can be obviously generalized as follows. For any $n,\ell\in\mathbb{N}_+$ and $g\in H^2(\nu)$ we have
\[
\left\|U^ng-U^\infty g-\sum_{2\le k\le\ell}\lambda_k^n(g,M\widetilde\varphi_k)_\nu\,M\widetilde\varphi_k\right\|_{2,\nu}^2
\le\left(\|g\|_{2,\nu}^2-|U^\infty g|^2\log 2-\sum_{2\le k\le\ell}\left|(g,M\widetilde\varphi_k)_\nu\right|^2\right)|\lambda_{\ell+1}|^{2n}
\]
with the usual convention which assigns value 0 to a sum over the empty set. Proposition 2.4.1 and Corollary 2.4.2 can be accordingly generalized. □

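We can again derive the ‘exact’ Gauss–Kuzmin–Levy Theorem 2.3.5. First, we clearly have
\[
\begin{aligned}
\widetilde\psi_k(z):=M\widetilde\varphi_k(z)&=(z+1)\int_{\mathbb{R}_+}e^{-zt}\,\widetilde\varphi_k(t)\,m(dt)\\
&=(z+1)\int_{\mathbb{R}_+}e^{-zt}\,t^{1/2}\left(e^t-1\right)^{-1/2}\varphi_k(t)\,dt\\
&=(z+1)\,\psi_k(z),\qquad k\in\mathbb{N}_+,\ z\in D.
\end{aligned}
\tag{2.4.10}
\]
Second, the function $g_a$, $a\in I$, defined by
\[
g_a(z)=\frac{(a+1)(z+1)}{(az+1)^2},\qquad z\in D,
\]
does not belong to $H^2(\nu)$ for $a=0$. Instead, the function
\[
Ug_a(z)=(a+1)(z+1)\sum_{j\in\mathbb{N}_+}\frac{1}{(z+a+j)^2},\qquad z\in D,
\]
does belong to $H^2(\nu)$ for any $a\in I$. Then
\[
U^ng_a=U^{n-1}(Ug_a)=\sum_{k\in\mathbb{N}_+}\lambda_k^{n-1}(M^{-1}Ug_a,\widetilde\varphi_k)_m\,\widetilde\psi_k
\]
for any $a\in I$ and $n\in\mathbb{N}_+$. Now, it is easy to check that
\[
M^{-1}Ug_a(t)=(a+1)\,e^{-at},\qquad a\in I,\ t\in\mathbb{R}_+.
\tag{2.4.11}
\]
Hence
\[
(M^{-1}Ug_a,\widetilde\varphi_k)_m=(a+1)\int_{\mathbb{R}_+}e^{-at}\,\widetilde\varphi_k(t)\,m(dt)=\widetilde\psi_k(a),\qquad a\in I,\ k\in\mathbb{N}_+.
\]
Therefore
\[
U^ng_a=\sum_{k\in\mathbb{N}_+}\lambda_k^{n-1}\,\widetilde\psi_k(a)\,\widetilde\psi_k,\qquad n\in\mathbb{N}_+,\ a\in I,
\tag{2.4.12}
\]
which by (2.4.10) is identical with (2.3.14).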

Note that by (2.4.11) for any $a\in I$ we have
\[
\begin{aligned}
\|Ug_a\|_{2,\nu}^2=\left\|M^{-1}Ug_a\right\|_{2,m}^2&=(a+1)^2\int_{\mathbb{R}_+}e^{-2at}\,\frac{t\,dt}{e^t-1}\\
&=(a+1)^2\sum_{k\in\mathbb{N}_+}\int_{\mathbb{R}_+}t\,e^{-(2a+k)t}\,dt\\
&=(a+1)^2\sum_{k\in\mathbb{N}_+}\frac{1}{(2a+k)^2},
\end{aligned}
\tag{2.4.13}
\]
that is, by Proposition 2.3.4,
\[
\|Ug_a\|_{2,\nu}^2=(a+1)^2\sum_{k\in\mathbb{N}_+}|\psi_k(a)|^2.
\]
This result is not at all surprising. It can be derived immediately from (2.4.12) with $n=1$ on account of the fact that $(\widetilde\psi_k)_{k\in\mathbb{N}_+}$ is an orthonormal basis in $H^2(\nu)$. (Remark that the $\psi_k$, $k\in\mathbb{N}_+$, are not pairwise orthogonal in $H$!)

Next,
\[
U^\infty Ug_a=U^\infty g_a=\frac{1}{\log 2},\qquad a\in I.
\tag{2.4.14}
\]
It then follows from Proposition 2.4.1 that for any $n\in\mathbb{N}_+$ we have
\[
\|U^ng_a-U^\infty g_a\|_{2,\gamma}=\left\|U^{n-1}(Ug_a)-U^\infty g_a\right\|_{2,\gamma}
\le\left(\|Ug_a\|_{2,\nu}^2-|U^\infty g_a|^2\log 2\right)^{1/2}|\lambda_2|^{n-1}.
\tag{2.4.15}
\]
Proposition 2.4.3 For any $a\in I$, $n\in\mathbb{N}_+$ and $A\in\mathcal{B}_I$ we have
\[
\left|\gamma_a\!\left(\tau^{-n}(A)\right)-\gamma(A)\right|
\le(\log 2)\,\gamma^{1/2}(A)\left((a+1)^2\sum_{k\in\mathbb{N}_+}\frac{1}{(2a+k)^2}-\frac{1}{\log 2}\right)^{1/2}|\lambda_2|^{n-1}.
\tag{2.4.16}
\]
Proof. The function
\[
h_a(x)=\frac{g_a(x)}{x+1}=\frac{a+1}{(ax+1)^2},\qquad x\in I,
\]
is just the Radon–Nikodym derivative $d\gamma_a/d\lambda$. Now, (2.4.16) follows from (2.4.9) and (2.4.13) through (2.4.15). □

Remarks. 1. On account of the remark following Corollary 2.4.2, inequality (2.4.16) can be generalized as follows. For any $a\in I$, $\ell,n\in\mathbb{N}_+$, and $A\in\mathcal{B}_I$ we have
\[
\left|\gamma_a\!\left(\tau^{-n}(A)\right)-\gamma(A)-(\log 2)\sum_{2\le k\le\ell}\lambda_k^{n-1}\,\widetilde\psi_k(a)\int_A\widetilde\psi_k\,d\gamma\right|
\le(\log 2)\,\gamma^{1/2}(A)\left((a+1)^2\sum_{k\in\mathbb{N}_+}\frac{1}{(2a+k)^2}-\sum_{1\le k\le\ell}\widetilde\psi_k^{\,2}(a)\right)^{1/2}|\lambda_{\ell+1}|^{n-1}.
\tag{2.4.17}
\]
2. It is instructive to compare the inequality in Theorem 2.3.5 with (2.4.17). The difference between them reflects the difference between the Hilbert spaces $H$ and $H^2(\nu)$. □

2.4.2 The Mayer–Roepstorff Banach space approach

In this subsection we give a summary of the work of Mayer and Roepstorff (1988) on the $u_0$-positivity of the Perron–Frobenius operators $P_\lambda$ and $U=P_\gamma$ on a suitable Banach space.

Let us first recall a few concepts concerning positive operators with respect to a cone in a real Banach space $B$. A closed convex subset $C$ of $B$ is called a cone if and only if (i) $x\in C$ and $a\in\mathbb{R}_+$ imply $ax\in C$, and (ii) $x\in C$ and $-x\in C$ imply $x=0$.

A cone $C$ induces a partial order $\le_C$ ($\le$ for short): $x\le y$ if and only if $y-x\in C$. A cone $C$ is said to be reproducing if and only if $B=C-C$, that is, any $z\in B$ can be written as $z=x-y$ with $x,y\in C$. A linear operator $T:B\to B$ is said to be positive with respect to a cone $C$ if and only if $TC\subset C$.

Let $C$ be a cone and $0\ne u_0\in C$. An operator $T$ that is positive with respect to $C$ is said to be $u_0$-positive if and only if for any $0\ne x\in C$ there exist $p\in\mathbb{N}_+$ and $\alpha,\beta\in\mathbb{R}_{++}$ such that
\[
\alpha u_0\le T^px\le\beta u_0.
\]
Compact operators on the complexification of $B$, which are positive with respect to a reproducing cone $C\subset B$ and $u_0$-positive for some $0\ne u_0\in C$, enjoy properties similar to those of finite positive matrices. They obey a generalization of the Perron–Frobenius theorem for such matrices. For details the reader is referred to Krasnoselskii (1964).

Coming back to our problem, let
\[
D_1=(z\in\mathbb{C}:|z-1|<3/2)
\]
and consider the collection $A(D_1)$ of all holomorphic functions in $D_1$ which together with their first derivatives are continuous in $\overline{D}_1$; $A(D_1)$ is a Banach space under the norm
\[
\|f\|=\max\left(\sup_{z\in D_1}|f(z)|,\ \sup_{z\in D_1}\left|f'(z)\right|\right),\qquad f\in A(D_1).
\]
Both operators $P_\lambda$ and $U$ take $A(D_1)$ into itself. Obviously, for $f\in A(D_1)$ we define $P_\lambda f$ and $Uf$ by
\[
P_\lambda f(z)=\sum_{i\in\mathbb{N}_+}\frac{1}{(z+i)^2}\,f\!\left(\frac{1}{z+i}\right),\qquad z\in D_1,
\]
and
\[
Uf(z)=\sum_{i\in\mathbb{N}_+}P_i(z)\,f\!\left(\frac{1}{z+i}\right),\qquad z\in D_1,
\]
respectively. Both $P_\lambda$ and $U$ are nuclear operators of trace class on $A(D_1)$. Let us write (compare with Subsection 2.1.2) $P_\lambda=\Pi_1+T_0$, where
\[
\Pi_1f(z)=f_1(z)\int_I f\,d\lambda,\qquad f\in A(D_1),\ z\in D_1,
\]
and
\[
f_1(z)=\frac{(\log 2)^{-1}}{z+1},\qquad z\in D_1.
\]
Since $P_\lambda(f_1f)=f_1Uf$, $f\in A(D_1)$, the spectra of the operators $P_\lambda$ and $U$ on $A(D_1)$ are identical, algebraic multiplicities of the eigenvalues included.

Theorem 2.4.4 The spectra of $U$ on $A(D_1)$ and on $H^2(\nu)$ (see Subsection 2.4.1) are identical, algebraic multiplicities of the eigenvalues included.

Consider the subspaces
\[
A^\perp(D_1)=\left(f\in A(D_1):U^\infty f=\int_I f\,d\gamma=0\right)
\]
and
\[
\widetilde A^\perp(D_1)=\left(f\in A(D_1):\int_I f\,d\lambda=0\right)
\]
of $A(D_1)$ and the real subspaces $A_r^\perp(D_1)$ $\big(\widetilde A_r^\perp(D_1)\big)$ of $A^\perp(D_1)$ $\big(\widetilde A^\perp(D_1)\big)$ consisting of functions that take real values on $\mathbb{R}\cap\overline{D}_1=[-1/2,5/2]$. Note that by Proposition 2.1.1(ii) $U$ leaves invariant both subspaces $A^\perp(D_1)$ and $A_r^\perp(D_1)$ while $P_\lambda$ leaves invariant both subspaces $\widetilde A^\perp(D_1)$ and $\widetilde A_r^\perp(D_1)$. The complexification of $A_r^\perp(D_1)$ $\big(\widetilde A_r^\perp(D_1)\big)$ is just $A^\perp(D_1)$ $\big(\widetilde A^\perp(D_1)\big)$. Also, the spectrum of $T_0$ on $\widetilde A^\perp(D_1)$ is identical with the spectrum of $U$ on $A^\perp(D_1)$.

The set
\[
C=\left(f\in A_r^\perp(D_1):f'\ge 0\ \text{on}\ [-1/2,5/2]\right)
\]
is a reproducing cone in $A_r^\perp(D_1)$. Define $u_0\in A(D_1)$ by
\[
u_0(z)=z+1-\frac{1}{\log 2},\qquad z\in D_1.
\]
Clearly, $u_0\in C$.

Theorem 2.4.5 The operator $-U$ on $A_r^\perp(D_1)$ is positive with respect to the cone $C$. Moreover, $-U$ is $u_0$-positive. Hence the operator $-U+U^\infty$ on $A(D_1)$ has a simple positive dominant eigenvalue equal to $\lambda_0$ (cf. Theorem 2.2.5) with eigenfunction $f_2$ in the interior $C^o$ of $C$. There is no other eigenfunction in $C$.

Corollary 2.4.6 The operator $-T_0$ on $\widetilde A_r^\perp(D_1)$ is positive with respect to the (reproducing) cone $f_1C=(f_1f:f\in C)$. Moreover, $-T_0$ is $f_1u_0$-positive. Hence the operator $-T_0$ on $A(D_1)$ has a simple positive dominant eigenvalue equal to $\lambda_0$ with eigenfunction $\widetilde f_2=f_1f_2$. There is no other eigenfunction in $f_1C$.

Note that a minimax principle for $-\lambda_0$ holds. We namely have
\[
\min_{f\in C^o}\ \max_{-1/2\le x\le 5/2}\frac{(Uf)'(x)}{f'(x)}=-\lambda_0=\max_{f\in C^o}\ \min_{-1/2\le x\le 5/2}\frac{(Uf)'(x)}{f'(x)}.
\]
Hence
\[
\min_{-1/2\le x\le 5/2}\frac{(Uf)'(x)}{f'(x)}\le-\lambda_0\le\max_{-1/2\le x\le 5/2}\frac{(Uf)'(x)}{f'(x)}
\]
for any $f\in C^o$. For example, taking
\[
f(z)=\frac{z+1}{z+1.14617}-c,\qquad z\in D_1,
\]
with $c$ chosen such that $f\in A^\perp(D_1)$, we obtain
\[
0.2995\le\lambda_0\le 0.3038,
\]
that is, an approximation which is good enough.

2.4.3 Mayer–Ruelle operators

Statistical mechanics problems motivated the consideration of a class of operators including as a special case the Perron–Frobenius operator $P_\lambda$ of $\tau$ under $\lambda$. This class has been thoroughly studied by Mayer (1990, 1991). Nowadays, these operators are named after him and D. Ruelle.

Let $D_1=(z\in\mathbb{C}:|z-1|<3/2)$ and consider the collection $A_\infty(D_1)$ of all holomorphic functions in $D_1$ which are continuous in $\overline{D}_1$; $A_\infty(D_1)$ is a Banach space under the supremum norm
\[
|||f|||=\sup_{z\in D_1}|f(z)|,\qquad f\in A_\infty(D_1).
\]
For any $\beta\in\mathbb{C}$ with $\mathrm{Re}\,\beta>1$ and $f\in A_\infty(D_1)$ define
\[
G_\beta f(z)=\sum_{i\in\mathbb{N}_+}\frac{1}{(z+i)^\beta}\,f\!\left(\frac{1}{z+i}\right),\qquad z\in D_1.
\]
It is easy to check that $G_\beta$ is a bounded linear operator on $A_\infty(D_1)$. Hence, as mentioned when discussing nuclear operators in Subsection 2.3.2, $G_\beta$ is nuclear of order 0 and thus has a discrete spectrum.

For $\beta=2$, $G_\beta$ has the same analytical expression as $P_\lambda$. In what follows we give without proofs the most important properties of the Mayer–Ruelle operator $G_\beta$ for $\mathrm{Re}\,\beta>1$, which generalize those of $P_\lambda$. For proofs we refer the reader to Mayer (1990, 1991). See also Daude et al. (1997), Faivre (1992), Flajolet and Vallee (1998, 2000), and Vallee (1997).

Theorem 2.4.7 Let $\beta$ be real, strictly greater than 1.

(i) The operator $G_\beta:A_\infty(D_1)\to A_\infty(D_1)$ has a positive dominant eigenvalue $\lambda(\beta)$ which is simple and strictly greater in absolute value than all other eigenvalues. The corresponding eigenfunction $g_\beta\in A_\infty(D_1)$ is strictly positive on $\overline{D}_1\cap\mathbb{R}=[-1/2,5/2]$.

(ii) The map $\beta\to\lambda(\beta)$ defines on $(1,\infty)$ a strictly decreasing and log-concave function with
\[
\lim_{\beta\downarrow 1}\lambda(\beta)=\infty,\qquad \lambda(2)=1,\qquad \lim_{\beta\to\infty}\frac{\log\lambda(\beta)}{\beta}=\log\frac{\sqrt{5}-1}{2}.
\]
Moreover,
\[
\lambda(\beta+u)\le\left(\frac{\sqrt{5}-1}{2}\right)^{u}\lambda(\beta),\qquad u\in\mathbb{R}_+.
\]
(iii) There exists a linear functional $\ell_\beta$ on $A_\infty(D_1)$ with $\ell_\beta(g_\beta)=1$ and $\ell_\beta(f)>0$ for any $f\in A_\infty(D_1)$ such that $f|_{[-1/2,5/2]}>0$ (here $f|_{[-1/2,5/2]}$ denotes the restriction of $f$ to $[-1/2,5/2]$). If $\Pi_{1\beta}$ denotes the projection defined as
\[
\Pi_{1\beta}f=\ell_\beta(f)\,g_\beta,\qquad f\in A_\infty(D_1),
\]
then
\[
G_\beta=\lambda(\beta)\,\Pi_{1\beta}+T_{0\beta}
\]
with $\Pi_{1\beta}T_{0\beta}=T_{0\beta}\Pi_{1\beta}=0$. Hence
\[
G_\beta^n=\lambda^n(\beta)\,\Pi_{1\beta}+T_{0\beta}^n,\qquad n\in\mathbb{N}_+.
\]
(iv) The spectral radius $\rho(\beta)$ of the linear operator $T_{0\beta}:A_\infty(D_1)\to A_\infty(D_1)$ is strictly smaller than $\lambda(\beta)$, and for any $f\in A_\infty(D_1)$ such that $f|_{[-1/2,5/2]}>0$ we have
\[
\frac{G_\beta^nf(z)}{\lambda^n(\beta)\,\ell_\beta(f)\,g_\beta(z)}=1+O\!\left(\left(\frac{\rho(\beta)}{\lambda(\beta)}\right)^{n}\right)
\]
as $n\to\infty$, where the constant implied in $O$ is independent of $z\in D_1$ (but dependent on $f$ and $\beta$).

(v) There exists $\varepsilon=\varepsilon(\beta)>0$ such that for any $\alpha\in\mathbb{C}$ satisfying $|\alpha-\beta|\le\varepsilon$ the dominant spectral properties of $G_\beta:A_\infty(D_1)\to A_\infty(D_1)$ transfer to $G_\alpha:A_\infty(D_1)\to A_\infty(D_1)$: quantities $\lambda(\alpha)$, $\rho(\alpha)$, $g_\alpha$, $\ell_\alpha$ (thus $\Pi_{1\alpha}$) and $T_{0\alpha}$ can be defined to represent the dominant spectral objects associated with $G_\alpha$, and all of them are analytic with respect to $\alpha$. Moreover, let $a\in(\rho(\beta)/\lambda(\beta),1)$. For any $f\in A_\infty(D_1)$ such that $f|_{[-1/2,5/2]}>0$ we have
\[
\frac{G_\alpha^nf(z)}{\lambda^n(\alpha)\,\ell_\alpha(f)\,g_\alpha(z)}=1+O(a^n)
\]
as $n\to\infty$, where the constant implied in $O$ is independent of $z\in D_1$ and $\alpha$ satisfying $|\alpha-\beta|\le\varepsilon$, but depends on $a$, $f$, and $\beta$. Finally, $\rho(\beta+it)<\rho(\beta)$ for $t\in[-\varepsilon,\varepsilon]$, $t\ne 0$.

The proof is the same Perron–Frobenius type of argument used in the case $\beta=2$, which has been sketched in the preceding subsection. There the existence of a dominant simple real (in fact, negative) eigenvalue of $T_{02}=T_0$ followed by considering the subspace $A(D_1)\subset A_\infty(D_1)$. □

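As in the special case $\beta=2$, the Mayer–Ruelle operators enjoy better properties when defined on suitable Hilbert spaces.

Let $\mathrm{Re}\,\beta>1$. Consider the collection $H^{(\beta)}$ of functions $f$ which are holomorphic in the half plane $\mathrm{Re}\,z>-1/2$, bounded in any half-plane $\mathrm{Re}\,z>-1/2+\varepsilon$, $\varepsilon>0$, and can be represented in the form
\[
f(z)=\int_{\mathbb{R}_+}e^{-zs}\,\varphi(s)\,s^{(\beta-1)/2}\,m'(ds),\qquad \mathrm{Re}\,z>-1/2,
\tag{2.4.18}
\]
where $m'$ is the measure on $\mathcal{B}_{\mathbb{R}_+}$ with density
\[
\frac{dm'}{ds}=
\begin{cases}
\dfrac{1}{e^s-1} & \text{if } s>0,\\[2mm]
0 & \text{if } s=0,
\end{cases}
\]
for some $\varphi\in L^2_{m'}(\mathbb{R}_+)$, the Hilbert space of $m'$-square integrable functions $\varphi:\mathbb{R}_+\to\mathbb{C}$ with inner product $(\cdot,\cdot)_{m'}$ defined by
\[
(\varphi,\psi)_{m'}=\int_{\mathbb{R}_+}\varphi\,\psi^*\,dm',\qquad \varphi,\psi\in L^2_{m'}(\mathbb{R}_+),
\]
and norm
\[
\|\varphi\|_{2,m'}=\left(\int_{\mathbb{R}_+}|\varphi|^2\,dm'\right)^{1/2},\qquad \varphi\in L^2_{m'}(\mathbb{R}_+).
\]
Introducing the inner product
\[
(f_1,f_2)_{(\beta)}=(\varphi_1,\varphi_2)_{m'},
\]
where $\varphi_i$ is associated with $f_i$, $i=1,2$, by (2.4.18), $H^{(\beta)}$ is made a Hilbert space with norm
\[
|||f|||_{(\beta)}=\|\varphi\|_{2,m'},\qquad f\in H^{(\beta)},
\]
where $f$ and $\varphi$ are again associated by (2.4.18).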

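Theorem 2.4.8 Let $\mathrm{Re}\,\beta>1$.

(i) The linear operator $G_\beta$ takes boundedly $H^{(\beta)}$ into itself.

(ii) For any $f\in H^{(\beta)}$ we have
\[
G_\beta f(z)=\int_{\mathbb{R}_+}e^{-zs}\,K_\beta\varphi(s)\,s^{(\beta-1)/2}\,m'(ds),\qquad \mathrm{Re}\,z>-1/2,
\]
where $K_\beta:L^2_{m'}(\mathbb{R}_+)\to L^2_{m'}(\mathbb{R}_+)$ is a symmetric integral operator defined by
\[
K_\beta\varphi(s)=\int_{\mathbb{R}_+}J_{\beta-1}\!\left(2\sqrt{st}\right)\varphi(t)\,m'(dt),\qquad \varphi\in L^2_{m'}(\mathbb{R}_+),\ s\in\mathbb{R}_+.
\]
Here $J_{\beta-1}$ is the Bessel function of order $\beta-1$ defined by
\[
J_{\beta-1}(u)=\left(\frac{u}{2}\right)^{\beta-1}\sum_{k\in\mathbb{N}}\frac{(-1)^k}{k!\,\Gamma(k+\beta)}\left(\frac{u}{2}\right)^{2k},\qquad u\in\mathbb{R}_+.
\]
Hence $G_\beta:H^{(\beta)}\to H^{(\beta)}$ can be diagonalized in an orthonormal basis of $H^{(\beta)}$. Moreover, if $\beta\in\mathbb{R}$ then $G_\beta$ is self-adjoint and its spectrum is real.

(iii) The spectra of the operators $G_\beta:A_\infty(D_1)\to A_\infty(D_1)$, $G_\beta:H^{(\beta)}\to H^{(\beta)}$ and $K_\beta:L^2_{m'}(\mathbb{R}_+)\to L^2_{m'}(\mathbb{R}_+)$ are identical. Hence for any real $\beta>1$ these spectra are all real.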

Let us note in particular that for $\beta=2$ the symmetric operator $K_2$ from Theorem 2.4.8 is different from the symmetric operator $K$ from Proposition 2.3.1. They are related by the simple relation $K_2=SKS^{-1}$, where $S:L^2(\mathbb{R}_+)\to L^2_{m'}(\mathbb{R}_+)$ is an invertible linear operator defined by
\[
S\varphi(s)=(e^s-1)^{1/2}\varphi(s),\qquad s\in\mathbb{R}_+.
\]
Hence the spectra of $K$ and $K_2$ are identical.

As for $K$, formulae for the trace of $K_\beta$ and its powers are available. Denoting by $\lambda_i(\beta)$, $i\in\mathbb{N}_+$, the eigenvalues of $K_\beta$ taken in order of decreasing moduli and counting their multiplicity, we have
\[
\mathrm{Tr}\,K_\beta=\sum_{i\in\mathbb{N}_+}\lambda_i(\beta)=\sum_{i\in\mathbb{N}_+}\frac{1}{y_i^{\beta-2}\left(y_i^2+1\right)},
\]
where $y_i=\left(i+\sqrt{i^2+4}\right)/2$, $i\in\mathbb{N}_+$, and, in general,
\[
\mathrm{Tr}\,K_\beta^n=\sum_{i\in\mathbb{N}_+}\lambda_i^n(\beta)=\sum_{i_1,\dots,i_n\in\mathbb{N}_+}\frac{1}{y_{i_1\cdots i_n}^{\beta-2}\left(y_{i_1\cdots i_n}^2+(-1)^{n-1}\right)},
\]
where
\[
y_{i_1\cdots i_n}=\frac{p_{n-1}+q_n+\sqrt{(p_{n-1}+q_n)^2+4(-1)^{n-1}}}{2}
\]
with, as usual,
\[
\frac{p_n}{q_n}=[i_1,\dots,i_n],\qquad \mathrm{g.c.d.}(p_n,q_n)=1,\qquad p_0=0,
\]
for any $n\in\mathbb{N}_+$ and $i_1,\dots,i_n\in\mathbb{N}_+$. Let us note that for $\beta=2$ we recover Babenko's formula for Tr $K^n$, $n\in\mathbb{N}_+$. See the remark following the proof of Proposition 2.3.2.

In particular [see Daude et al. (1997)], we have
\[
\mathrm{Tr}\,K_4=\frac{7}{2}-\frac{2}{\sqrt{5}}-\frac{7}{2\sqrt{2}}+\frac{1}{2}\sum_{i\ge 2}(-1)^i\,\frac{i-1}{i+1}\binom{2i}{i}\left(\zeta(2i)-1-\frac{1}{2^{2i}}\right)
=0.14446\ 23962\ 46160\ 81588\cdots,
\]
\[
\mathrm{Tr}\,K_4^2=0.04647\ 18256\ 42727\ 93983\cdots,
\]
and

λ1(4) = 0.19945 88183 43767 26019 · · · ,

λ2(4) = −0.07573 95140 84360 60892 · · · ,

λ3(4) = 0.02856 64037 69818 52783 · · · ,

λ4(4) = −0.01077 74165 76612 69829 · · · ,

λ5(4) = 0.00407 09406 93426 42144 · · · .
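The single-index trace formula for $K_\beta$ is easy to evaluate numerically. The sketch below is illustrative only: the function names and the truncation level `terms` are assumptions, and only the first trace (the sum over $i\in\mathbb{N}_+$) is computed. For $\beta=4$ the summands decay like $1/i^4$, so a few thousand terms already give several correct digits; for $\beta=2$ each summand agrees with the corresponding summand of the series $\frac12\sum_i\left(1-i/\sqrt{i^2+4}\right)$ for Tr $K$.

```python
import math

def y_single(i):
    """y_i = (i + sqrt(i^2 + 4)) / 2, the reciprocal of the fixed point [i, i, i, ...]."""
    return (i + math.sqrt(i * i + 4.0)) / 2.0

def trace_K_beta(beta, terms=2000):
    """Truncated sum of 1 / (y_i^(beta - 2) * (y_i^2 + 1)) over i = 1..terms."""
    total = 0.0
    for i in range(1, terms + 1):
        y = y_single(i)
        total += 1.0 / (y ** (beta - 2.0) * (y * y + 1.0))
    return total

def trace_K_babenko(terms=2000):
    """Truncated sum of (1/2) * (1 - i / sqrt(i^2 + 4)) over i = 1..terms."""
    return 0.5 * sum(1.0 - i / math.sqrt(i * i + 4.0) for i in range(1, terms + 1))

if __name__ == "__main__":
    print(trace_K_beta(4.0))
```

With `beta = 4.0` the truncated sum can be compared with the value of Tr $K_4$ quoted above.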

To conclude this brief discussion of Mayer–Ruelle operators we mention two generalizations of them.

a. For any subset $M$ of $\mathbb{N}_+$ define
\[
G_{M,\beta}f(z)=\sum_{i\in M}\frac{1}{(z+i)^\beta}\,f\!\left(\frac{1}{z+i}\right),\qquad z\in D_1,
\]
whatever $\beta\in\mathbb{C}$ with $\mathrm{Re}\,\beta>1$ and $f\in A_\infty(D_1)$. Clearly, $G_{M,\beta}$ is a bounded linear operator on $A_\infty(D_1)$, hence a nuclear one of trace class, which coincides with $G_\beta$ when $M=\mathbb{N}_+$. Now, for an arbitrarily fixed $k\in\mathbb{N}_+$, let $M_i$, $1\le i\le k$, be subsets of $\mathbb{N}_+$ and write $\mathbf{M}=(M_1,\dots,M_k)$. Consider the linear operator $G_{\mathbf{M},\beta}:A_\infty(D_1)\to A_\infty(D_1)$ defined as
\[
G_{\mathbf{M},\beta}=G_{M_k,\beta}\cdots G_{M_1,\beta},
\]
which is nuclear of trace class, too.

The operators $G_{\mathbf{M},\beta}$ for various $\mathbf{M}$ control the dynamics of continued fraction expansions of irrationals subject to periodical constraints. Their spectral properties are entirely similar to those of $G_\beta$. For details see Vallee (1998), who considered systematically such operators. See, however, Fluch (1986, 1992) for special cases.

b. The second generalization has been motivated by the study of the transformation
\[
z\to\frac{1}{z}-\left\lfloor\mathrm{Re}\,\frac{1}{z}\right\rfloor,\qquad 0\ne z\in\mathbb{C},
\]
which extends to the complex domain the continued fraction transformation $\tau$. Let
\[
D_2=\left(z:|z-1|<\frac{5}{4}\right),
\]
and consider the collection $B_\infty(D_2)$ of all functions $F$ which are holomorphic in $D_2^2$ and continuous in $\overline{D}_2^{\,2}$. Under the supremum norm
\[
|||F|||=\sup_{(z,w)\in D_2^2}|F(z,w)|,
\]
$B_\infty(D_2)$ is a Banach space. Then for any $(\alpha,\beta)\in\mathbb{C}^2$ with $\mathrm{Re}\,(\alpha+\beta)>1$ a linear bounded operator
\[
G_{\alpha,\beta}:B_\infty(D_2)\to B_\infty(D_2)
\]
is defined by
\[
G_{\alpha,\beta}F(z,w)=\sum_{i\in\mathbb{N}_+}\frac{1}{(z+i)^\alpha(w+i)^\beta}\,F\!\left(\frac{1}{z+i},\frac{1}{w+i}\right)
\]
for any $F\in B_\infty(D_2)$ and $(z,w)\in D_2^2$. The spectral properties of $G_{\alpha,\beta}$, which is positive and nuclear of trace class, are strongly related to those of $G_{\alpha+\beta+2\ell}$, $\ell\in\mathbb{N}$. For details see Vallee (1997).

2.5 The Markov chain associated with the continued fraction expansion

2.5.1 The Perron–Frobenius operator on BV (I)

In this section we study the Perron–Frobenius operator $U$ on $BV(I)$. This is motivated by Proposition 2.1.10 which establishes $U$ as the transition operator of certain Markov chains. Throughout, except for Corollary 2.5.7, we consider just real-valued functions in $BV(I)$.

By Proposition 2.1.16, the operator $U$ defined by (2.1.16) is a bounded linear operator of norm 1 on $BV(I)$. Moreover, by Corollary 2.1.13 we have

\[ \operatorname{var} Uf \le \tfrac{1}{2} \operatorname{var} f \]

for any $f \in BV(I)$, the constant $1/2$ being optimal. Hence

\[ \operatorname{var} U^n f \le 2^{-n} \operatorname{var} f \]

for any $f \in BV(I)$ and $n \in \mathbf{N}_+$. As might be expected, we shall see that the constant $2^{-n}$ is not optimal for $n > 1$. A natural problem thus arises: what is the upper bound of $\operatorname{var} U^n f/\operatorname{var} f$ over non-constant $f \in BV(I)$? A satisfactory answer to this problem will be given in Theorem 2.5.3 and Corollary 2.5.6.

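It is easy to check by induction with respect to $n \in \mathbf{N}_+$ that

\[ U^n f(x) = \sum_{i_1,\dots,i_n\in\mathbf{N}_+} P_{i_1\cdots i_n}(x)\, f(u_{i_n\cdots i_1}(x)), \quad x \in I, \tag{2.5.1} \]

where

\[ u_{i_n\cdots i_1} = u_{i_n} \circ \cdots \circ u_{i_1}, \]

\[ P_{i_1\cdots i_n}(x) = P_{i_1}(x)\, P_{i_2}(u_{i_1}(x)) \cdots P_{i_n}(u_{i_{n-1}\cdots i_1}(x)), \quad n \ge 2, \tag{2.5.2} \]

and the functions $u_i$ and $P_i$, $i \in \mathbf{N}_+$, are defined by

\[ u_i(x) = \frac{1}{x+i}, \qquad P_i(x) = \frac{x+1}{(x+i)(x+i+1)}, \quad x \in I. \]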
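The definitions of $u_i$ and $P_i$ can be explored numerically. The following is a minimal sketch, not part of the original text: the function names and the truncation parameter `terms` are assumptions made for illustration. It uses exact rational arithmetic to show that the weights $P_i(x)$ telescope, so that their partial sums equal $1 - (x+1)/(x+N+1)$ and tend to 1, and it applies a truncated version of $U$ (the case $n = 1$ of (2.5.1)).

```python
from fractions import Fraction


def u(i, x):
    """Inverse branch u_i(x) = 1 / (x + i)."""
    return 1 / (Fraction(x) + i)


def P(i, x):
    """Weight P_i(x) = (x + 1) / ((x + i)(x + i + 1))."""
    x = Fraction(x)
    return (x + 1) / ((x + i) * (x + i + 1))


def partial_weight_sum(x, terms):
    """Exact sum of P_i(x) for i = 1, ..., terms."""
    return sum(P(i, x) for i in range(1, terms + 1))


def apply_U(f, x, terms=2000):
    """Truncated Uf(x) = sum_i P_i(x) f(u_i(x)).

    'terms' is an assumed cutoff for the infinite series; the neglected
    tail of the weights has total mass (x + 1) / (x + terms + 1).
    """
    return sum(P(i, x) * f(u(i, x)) for i in range(1, terms + 1))


if __name__ == "__main__":
    x = Fraction(1, 3)
    print(partial_weight_sum(x, 50), 1 - (x + 1) / (x + 51))
    print(float(apply_U(lambda t: t, x)))
```

Since $P_i(x) = (x+1)\left(\frac{1}{x+i} - \frac{1}{x+i+1}\right)$, the two printed fractions on the first line agree exactly.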

Note that by Proposition 2.1.10 we have

\[ U^n f(x) = E_x\big(f(s^x_n)\big) \]

for any $n \in \mathbf{N}$, $f \in B(I)$, and $x \in I$ (remember that $s^x_0 = x$, $x \in I$), where $E_x$ denotes the mean value operator with respect to the probability measure $\gamma_x$. As

\[ s^x_n = u_{a_n\cdots a_1}(x), \quad x \in I,\ n \in \mathbf{N}_+, \]

we thus have

\[ U^n f(x) = \sum_{i^{(n)}\in\mathbf{N}_+^n} \gamma_x\big((a_1,\dots,a_n) = i^{(n)}\big)\, f(u_{i_n\cdots i_1}(x)) \tag{2.5.3} \]

for any $n \in \mathbf{N}_+$, $f \in B(I)$, and $x \in I$. Hence

\[ P_{i_1\cdots i_n}(x) = \gamma_x\big(I(i^{(n)})\big) \tag{2.5.4} \]

for any $x \in I$, $n \in \mathbf{N}_+$, and $(i_1,\dots,i_n) = i^{(n)} \in \mathbf{N}_+^n$. Of course, equation (2.5.4) could also be obtained by direct computation.


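Now, by (1.2.4), $I(i^{(n)})$ is the set of irrationals in the interval with endpoints $p_n/q_n$ and $(p_n + p_{n-1})/(q_n + q_{n-1})$. Since

\[ \frac{p_n}{q_n} = [i_1,\dots,i_n] = \begin{cases} 1/i_1 & \text{if } n = 1,\\[4pt] \dfrac{1}{i_1 + p_{n-1}(i_2,\dots,i_n)/q_{n-1}(i_2,\dots,i_n)} & \text{if } n > 1 \end{cases} \]

and

\[ \frac{p_n + p_{n-1}}{q_n + q_{n-1}} = \begin{cases} 1/(i_1+1) & \text{if } n = 1,\\[2pt] [i_1,\dots,i_{n-1},i_n+1] & \text{if } n > 1 \end{cases} = \begin{cases} 1/(i_1+1) & \text{if } n = 1,\\[4pt] \dfrac{1}{i_1 + p_n(i_2,\dots,i_n,1)/q_n(i_2,\dots,i_n,1)} & \text{if } n > 1, \end{cases} \]

we can write

\[ P_{i_1\cdots i_n}(x) = (x+1) \times \frac{1}{q_{n-1}(i_2,\dots,i_n)(x+i_1) + p_{n-1}(i_2,\dots,i_n)} \times \frac{1}{q_n(i_2,\dots,i_n,1)(x+i_1) + p_n(i_2,\dots,i_n,1)} \tag{2.5.5} \]

for any $n \ge 2$, $i^{(n)} \in \mathbf{N}_+^n$, and $x \in I$.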
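Equation (2.5.4) can be checked by direct computation on small examples. The sketch below is illustrative only; the helper names are assumptions, and it takes the density of $\gamma_x$ to be $(x+1)/(xu+1)^2$ on $I$, the form used later in this section, so that $\gamma_x([0,t]) = (x+1)t/(xt+1)$. It compares the product formula (2.5.2) with the $\gamma_x$-measure of the fundamental interval whose endpoints are $[i_1,\dots,i_n]$ and $[i_1,\dots,i_{n-1},i_n+1]$.

```python
from fractions import Fraction


def cf(digits):
    """Exact value of the finite continued fraction [i1, ..., in]."""
    value = Fraction(0)
    for d in reversed(digits):
        value = 1 / (d + value)
    return value


def P_single(i, x):
    """P_i(x) = (x + 1) / ((x + i)(x + i + 1))."""
    x = Fraction(x)
    return (x + 1) / ((x + i) * (x + i + 1))


def P_product(digits, x):
    """P_{i1...in}(x) computed from the product formula (2.5.2)."""
    result = Fraction(1)
    point = Fraction(x)
    for i in digits:
        result *= P_single(i, point)
        point = 1 / (point + i)  # move on to u_i(point)
    return result


def gamma_x_cdf(x, t):
    """gamma_x([0, t]) = (x + 1) t / (x t + 1), assuming density (x+1)/(xu+1)^2."""
    x, t = Fraction(x), Fraction(t)
    return (x + 1) * t / (x * t + 1)


def gamma_x_of_fundamental_interval(digits, x):
    """gamma_x-measure of I(i1, ..., in), using its two endpoints."""
    digits = list(digits)
    left = cf(digits)
    right = cf(digits[:-1] + [digits[-1] + 1])
    lo, hi = min(left, right), max(left, right)
    return gamma_x_cdf(x, hi) - gamma_x_cdf(x, lo)


if __name__ == "__main__":
    digits, x = (2, 1, 3), Fraction(1, 3)
    print(P_product(digits, x), gamma_x_of_fundamental_interval(digits, x))
```

For $x = 0$ the measure $\gamma_0$ is Lebesgue measure, so `P_product(digits, 0)` is simply the length of the fundamental interval.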

A useful alternative representation of $U^n f$, $n \in \mathbf{N}_+$, when $f \in BV(I)$ is available.

Proposition 2.5.1 If $f \in BV(I)$ then for any $n \in \mathbf{N}_+$ and $x \in I$ we have

\[ U^n f(x) = \int_{[0,1)} U^n I_{(a,1]}(x)\, df(a) + f(0) \]

with $\int_{[0,x)} df = f(x) - f(0)$, $x \in I$.

Proof. Since $f$ can be represented as the difference of two non-decreasing functions, we may and shall assume that $f$ is non-decreasing. Then for any $x \in I$ we have

\[ f(x) - f(0) = \int_{[0,1)} I_{(a,1]}(x)\, df(a). \]

By (2.5.1), using the above equation and Fubini's theorem we obtain

\[ \begin{aligned} U^n f(x) &= \sum_{i_1,\dots,i_n\in\mathbf{N}_+} P_{i_1\cdots i_n}(x)\, f(u_{i_n\cdots i_1}(x)) \\ &= \sum_{i_1,\dots,i_n\in\mathbf{N}_+} P_{i_1\cdots i_n}(x) \int_{[0,1)} I_{(a,1]}(u_{i_n\cdots i_1}(x))\, df(a) + f(0) \\ &= \int_{[0,1)} \Bigg( \sum_{i_1,\dots,i_n\in\mathbf{N}_+} P_{i_1\cdots i_n}(x)\, I_{(a,1]}(u_{i_n\cdots i_1}(x)) \Bigg) df(a) + f(0) \\ &= \int_{[0,1)} U^n I_{(a,1]}(x)\, df(a) + f(0) \end{aligned} \]

for any $n \in \mathbf{N}_+$ and $x \in I$. □

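Corollary 2.5.2 For any $n \in \mathbf{N}_+$ we have

\[ \sup_{f\in BV(I)} \frac{\operatorname{var} U^n f}{\operatorname{var} f} = \sup_{f\in B(I),\, f\uparrow} \frac{\operatorname{var} U^n f}{\operatorname{var} f} = \sup_{f\in B(I),\, f\downarrow} \frac{\operatorname{var} U^n f}{\operatorname{var} f} = \sup_{a\in[0,1)} \operatorname{var} U^n I_{(a,1]}, \]

where the first three upper bounds are taken over non-constant functions $f$, and $f\uparrow$ ($\downarrow$) means that $f$ is non-decreasing (non-increasing).

Proof. It is clear that

\[ \sup_{f\in B(I),\, f\uparrow} \frac{\operatorname{var} U^n f}{\operatorname{var} f} = \sup_{f\in B(I),\, f\downarrow} \frac{\operatorname{var} U^n f}{\operatorname{var} f} \]

since

\[ \frac{\operatorname{var} U^n(-f)}{\operatorname{var}(-f)} = \frac{\operatorname{var} U^n f}{\operatorname{var} f}. \]

Next, let

\[ v_n = \sup_{f\in B(I),\, f\uparrow} \frac{\operatorname{var} U^n f}{\operatorname{var} f}, \quad n \in \mathbf{N}_+. \]

Then (cf. the proof of Corollary 2.1.13) for any non-constant $f \in BV(I)$ there exist two non-decreasing functions $f_1$ and $f_2$ such that $f = f_1 - f_2$ and $\operatorname{var} f = \operatorname{var} f_1 + \operatorname{var} f_2$. Therefore

\[ \operatorname{var} U^n f \le \operatorname{var} U^n f_1 + \operatorname{var} U^n f_2 \le v_n(\operatorname{var} f_1 + \operatorname{var} f_2) = v_n \operatorname{var} f, \quad n \in \mathbf{N}_+. \]

Hence

\[ \sup_{f\in BV(I)} \frac{\operatorname{var} U^n f}{\operatorname{var} f} \le v_n \]

and since

\[ \sup_{f\in BV(I)} \frac{\operatorname{var} U^n f}{\operatorname{var} f} \ge \sup_{f\in B(I),\, f\uparrow} \frac{\operatorname{var} U^n f}{\operatorname{var} f} = v_n, \]

the first equation should hold.

To derive the last equation let $f \in B(I)$ be non-decreasing. Then $U^n f$ is a monotone function by Proposition 2.1.11, and Proposition 2.5.1 implies that

\[ U^n f(1) - U^n f(0) = \int_{[0,1)} \big( U^n I_{(a,1]}(1) - U^n I_{(a,1]}(0) \big)\, df(a) \]

for any $n \in \mathbf{N}_+$. Noting that $I_{(a,1]} : I \to I$ is also a non-decreasing function for any $a \in [0,1)$, we obtain

\[ \operatorname{var} U^n f \le \Big( \sup_{a\in[0,1)} \operatorname{var} U^n I_{(a,1]} \Big) \operatorname{var} f. \]

Hence, for any $a \in [0,1)$ and $n \in \mathbf{N}_+$,

\[ \operatorname{var} U^n I_{(a,1]} \le \sup_{f\in B(I),\, f\uparrow} \frac{\operatorname{var} U^n f}{\operatorname{var} f} \le \sup_{a\in[0,1)} \operatorname{var} U^n I_{(a,1]} \]

and the proof is complete. □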

2.5.2 An upper bound

On account of Corollary 2.5.2 our guess for the upper bound of $\operatorname{var} U^n f/\operatorname{var} f$ over non-constant $f \in BV(I)$ is given in the conjecture below.

UB Conjecture. For any $n \in \mathbf{N}_+$ we have

\[ v_n = \sup_{a\in[0,1)} \operatorname{var} U^n I_{(a,1]} = \operatorname{var} U^n I_{(g,1]}, \]

where $g = [1,1,1,\dots] = (\sqrt{5} - 1)/2 = 0.6180339\cdots$.

Without any loss of generality, throughout this subsection we assume that $f \in BV(I)$ is non-decreasing. To simplify the writing put

\[ P_{i_1\cdots i_n}(0) = \alpha_{i_1\cdots i_n}, \quad u_{i_1\cdots i_n}(0) = \beta_{i_1\cdots i_n}, \quad i_1,\dots,i_n \in \mathbf{N}_+. \]


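If $n$ is odd then by Proposition 2.1.11 and equations (2.5.1), (2.5.2), and (2.5.5) we have

\[ \begin{aligned} \operatorname{var} U^n f &= U^n f(0) - U^n f(1) \\ &= \sum_{i_1,\dots,i_n\in\mathbf{N}_+} \big[ P_{i_1\cdots i_n}(0) f(u_{i_n\cdots i_1}(0)) - P_{i_1\cdots i_n}(1) f(u_{i_n\cdots i_1}(1)) \big] \\ &= \sum_{i_1,\dots,i_n\in\mathbf{N}_+} \big[ P_{i_1\cdots i_n}(0) f(u_{i_n\cdots i_1}(0)) - 2 P_{(i_1+1)i_2\cdots i_n}(0) f(u_{i_n\cdots i_2(i_1+1)}(0)) \big] \\ &= \sum_{i_2,\dots,i_n\in\mathbf{N}_+} \Bigg( \alpha_{1i_2\cdots i_n} f(\beta_{i_n\cdots i_2 1}) - \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)i_2\cdots i_n} f(\beta_{i_n\cdots i_2(i_1+1)}) \Bigg). \end{aligned} \tag{2.5.6} \]

Similarly, if $n$ is even then we have

\[ \begin{aligned} \operatorname{var} U^n f &= U^n f(1) - U^n f(0) \\ &= \sum_{i_2,\dots,i_n\in\mathbf{N}_+} \Bigg( \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)i_2\cdots i_n} f(\beta_{i_n\cdots i_2(i_1+1)}) - \alpha_{1i_2\cdots i_n} f(\beta_{i_n\cdots i_2 1}) \Bigg). \end{aligned} \tag{2.5.7} \]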

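It is easy to see that if $n$ is odd then $\operatorname{var} U^n I_{(a,1]}$ has a constant value for

\[ a \in \begin{cases} \left[ \dfrac{1}{j_1+1}, \dfrac{1}{j_1} \right) & \text{if } n = 1,\\[6pt] \big[\, [j_1,\dots,j_{n-1},j_n+1],\ [j_1,\dots,j_n] \,\big) & \text{if } n > 1, \end{cases} \]

while if $n$ is even then $\operatorname{var} U^n I_{(a,1]}$ has a constant value for

\[ a \in \big[\, [j_1,\dots,j_n],\ [j_1,\dots,j_{n-1},j_n+1] \,\big), \]

that is, in both cases, on the closure without the right endpoint of any fundamental interval $I(j^{(n)})$, $j^{(n)} = (j_1,\dots,j_n) \in \mathbf{N}_+^n$. Write $1(n)$ for $(j_1,\dots,j_n)$ with $j_k = 1$, $1 \le k \le n$, $n \in \mathbf{N}_+$. Then in particular for

\[ a \in \big[\, [1(2m+2)],\ [1(2m+1)] \,\big), \quad m \in \mathbf{N}, \]

that is,

\[ a \in \left[ \frac{F_{2m+1}}{F_{2m+2}}, \frac{F_{2m}}{F_{2m+1}} \right), \quad m \in \mathbf{N}, \tag{2.5.8} \]

we have

\[ v'_1 := \operatorname{var} U I_{(a,1]} = 1/2, \]

\[ v'_3 := \operatorname{var} U^3 I_{(a,1]} = \sum_{i_2\in\mathbf{N}_+} \Bigg( \alpha_{1i_21} - \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)i_21} \Bigg) + \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)11}, \]

and

\[ \begin{aligned} v'_{2m+1} := \operatorname{var} U^{2m+1} I_{(a,1]} ={}& \sum_{q=0}^{m-2}\ \sum_{i_2,\dots,i_{2m-2q}\in\mathbf{N}_+} \Bigg[ \alpha_{1i_2i_3\cdots i_{2m-2q-1}(i_{2m-2q}+1)1\cdots 1} - \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)i_2i_3\cdots i_{2m-2q-1}(i_{2m-2q}+1)1\cdots 1} \Bigg] \\ &+ \sum_{i_2\in\mathbf{N}_+} \Bigg( \alpha_{1i_21\cdots 1} - \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)i_21\cdots 1} \Bigg) + \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)1\cdots 1} \end{aligned} \]

for $m \ge 2$. (In the last equation the number of subscripts of the $\alpha$'s is $2m+1$.) Similarly, for

\[ a \in \big[\, [1(2m+2)],\ [1(2m+3)] \,\big), \quad m \in \mathbf{N}, \]

that is,

\[ a \in \left[ \frac{F_{2m+1}}{F_{2m+2}}, \frac{F_{2m+2}}{F_{2m+3}} \right), \quad m \in \mathbf{N}, \tag{2.5.9} \]

we have

\[ v'_2 := \operatorname{var} U^2 I_{(a,1]} = \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)1}, \]

\[ \begin{aligned} v'_{2m+2} := \operatorname{var} U^{2m+2} I_{(a,1]} ={}& \sum_{q=0}^{m-1}\ \sum_{i_2,\dots,i_{2m-2q+1}\in\mathbf{N}_+} \Bigg[ \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)i_2i_3\cdots i_{2m-2q}(i_{2m-2q+1}+1)1\cdots 1} - \alpha_{1i_2i_3\cdots i_{2m-2q}(i_{2m-2q+1}+1)1\cdots 1} \Bigg] \\ &+ \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)1\cdots 1} \end{aligned} \]

for $m \in \mathbf{N}_+$. (In the last equation the number of subscripts of the $\alpha$'s is $2m+2$.)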

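Since $g$ belongs to all intervals (2.5.8) and (2.5.9), the UB Conjecture amounts to

\[ v_n = v'_n, \quad n \in \mathbf{N}_+. \]

The case $n = 1$. This case was dealt with in Proposition 2.1.12. Actually, writing $i$ for $i_1$, equation (2.5.6) yields

\[ \operatorname{var} Uf = \alpha_1 f(\beta_1) - \sum_{i\in\mathbf{N}_+} \alpha_{i+1} f(\beta_{i+1}). \]

Hence

\[ \operatorname{var} U I_{(a,1]} = \frac{1}{i+1} \quad \text{for } a \in \left[ \frac{1}{i+1}, \frac{1}{i} \right),\ i \in \mathbf{N}_+, \]

and

\[ v_1 = \sup_{a\in[0,1)} \operatorname{var} U I_{(a,1]} = \frac{1}{2} = \operatorname{var} U I_{(g,1]} = v'_1 \]

as $g \in [1/2, 1)$. Thus in this case the UB Conjecture holds.

The case $n = 2$. Write $i$ for $i_1$ and $j$ for $i_2$. Then we have

\[ \alpha_{ij} = \frac{1}{(ij+1)(i(j+1)+1)}, \quad i, j \in \mathbf{N}_+, \]

and equation (2.5.7) yields

\[ \begin{aligned} \operatorname{var} U^2 f &= \sum_{j\in\mathbf{N}_+} \Bigg( \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)j} f(\beta_{j(i+1)}) - \alpha_{1j} f(\beta_{j1}) \Bigg) \\ &= \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)1} f(\beta_{1(i+1)}) + \sum_{j\in\mathbf{N}_+} \Bigg( \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)(j+1)} f(\beta_{(j+1)(i+1)}) - \alpha_{1j} f(\beta_{j1}) \Bigg). \end{aligned} \tag{2.5.10} \]

Clearly, $\beta_{(j+1)(i+1)} < \beta_{j1}$ for any $i, j \in \mathbf{N}_+$. Hence

\[ \operatorname{var} U^2 f \le f(1) \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)1} + \sum_{j\in\mathbf{N}_+} f(\beta_{j1}) \Bigg( \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)(j+1)} - \alpha_{1j} \Bigg). \]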


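But

\[ \begin{aligned} \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)(j+1)} &= \sum_{i\in\mathbf{N}_+} \frac{1}{((i+1)(j+1)+1)\,((i+1)(j+2)+1)} \\ &\le \frac{1}{(j+1)(j+2)} \sum_{i\in\mathbf{N}_+} \frac{1}{(i+1)^2} \\ &= (\zeta(2) - 1)\,\alpha_{1j} < \alpha_{1j} \end{aligned} \tag{2.5.11} \]

for any $j \in \mathbf{N}_+$. Since $f(\beta_{j1}) \ge f(0)$, $j \in \mathbf{N}_+$, and

\[ \sum_{j\in\mathbf{N}_+} \Bigg( \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)(j+1)} - \alpha_{1j} \Bigg) = - \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)1}, \]

(2.5.10) and (2.5.11) imply that

\[ \operatorname{var} U^2 f \le \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)1}\, (f(1) - f(0)) = \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)1} \operatorname{var} f \tag{2.5.12} \]

for any non-decreasing $f \in B(I)$. Now, note that for $f = I_{(a,1]}$ with $a \in [1/2, 2/3)$, in particular for $a = g$, we have

\[ \operatorname{var} U^2 I_{(a,1]} = \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)1}, \]

that is, the constant

\[ \begin{aligned} \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)1} &= \sum_{i\in\mathbf{N}_+} \frac{1}{(i+2)(2i+3)} = 2 \sum_{i\in\mathbf{N}_+} \left( \frac{1}{2i+3} - \frac{1}{2i+4} \right) \\ &= 2 \left( \log 2 - 1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} \right) = \log 4 - \frac{7}{6} = 0.21962\cdots \end{aligned} \]

occurring in (2.5.12) cannot be lowered. Therefore for $n = 2$ we have

\[ v_2 = \log 4 - \frac{7}{6} = 0.21962\cdots, \]

and the UB Conjecture holds in this case.

The case $n \ge 3$. We could try to treat this case similarly to the case $n = 2$. Using (2.5.5) it is not difficult to generalize inequality (2.5.11) to

\[ \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)(i_2+1)i_3\cdots i_n} \le (\zeta(2) - 1)\,\alpha_{1i_2\cdots i_n} < \alpha_{1i_2\cdots i_n} \tag{2.5.13} \]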


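for any $n \ge 3$ and $i_2,\dots,i_n \in \mathbf{N}_+$. Next, to make a choice let us assume that $n$ is odd. Then it is easy to see that

\[ \beta_{i_n\cdots i_3(i_2+1)(i_1+1)} > \beta_{i_n\cdots i_3i_21}, \qquad \beta_{i_n\cdots i_31(i_1+1)} > \beta_{i_n\cdots i_31}, \qquad \beta_{i_n\cdots i_3i_21} < \beta_{i_n\cdots i_3} \]

for any $i_1,\dots,i_n \in \mathbf{N}_+$. Then by (2.5.6) and (2.5.13) we have

\[ \begin{aligned} \operatorname{var} U^n f \le{}& \sum_{i_3,\dots,i_n\in\mathbf{N}_+} \Bigg( - \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)1i_3\cdots i_n} f\big(\beta_{i_n\cdots i_31(i_1+1)}\big) \\ &\qquad + \sum_{i_2\in\mathbf{N}_+} \Bigg( \alpha_{1i_2i_3\cdots i_n} - \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)(i_2+1)i_3\cdots i_n} \Bigg) f(\beta_{i_n\cdots i_3i_21}) \Bigg) \\ \le{}& \sum_{i_3,\dots,i_n\in\mathbf{N}_+} \Bigg( \sum_{i_2\in\mathbf{N}_+} \Bigg( \alpha_{1i_2i_3\cdots i_n} - \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)i_2i_3\cdots i_n} \Bigg) f(\beta_{i_n\cdots i_3}) \\ &\qquad + \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)1i_3\cdots i_n} \big( f(\beta_{i_n\cdots i_3}) - f(\beta_{i_n\cdots i_31}) \big) \Bigg). \end{aligned} \tag{2.5.14} \]

For an even $n$ the corresponding inequality is

\[ \begin{aligned} \operatorname{var} U^n f \le{}& \sum_{i_3,\dots,i_n\in\mathbf{N}_+} \Bigg( \sum_{i_2\in\mathbf{N}_+} \Bigg( \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)i_2i_3\cdots i_n} - \alpha_{1i_2i_3\cdots i_n} \Bigg) f(\beta_{i_n\cdots i_3}) \\ &\qquad + \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)1i_3\cdots i_n} \big( f(\beta_{i_n\cdots i_31}) - f(\beta_{i_n\cdots i_3}) \big) \Bigg). \end{aligned} \tag{2.5.15} \]

Put

\[ \delta_{i_3\cdots i_n} = (-1)^{n-1} \sum_{i_2\in\mathbf{N}_+} \Bigg( \alpha_{1i_2i_3\cdots i_n} - \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)i_2i_3\cdots i_n} \Bigg) \]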


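for any $i_3,\dots,i_n \in \mathbf{N}_+$. Note that

\[ \sum_{i_3,\dots,i_n\in\mathbf{N}_+} \delta_{i_3\cdots i_n} = (-1)^{n-1} \Bigg( \alpha_1 - \sum_{i_1\in\mathbf{N}_+} \alpha_{i_1+1} \Bigg) = 0. \tag{2.5.16} \]

Using (2.5.5), which implies

\[ P_{i_1\cdots i_n}(0) = (-1)^n \left( \frac{1}{i_1 + \dfrac{p_n(i_2,\dots,i_n,1)}{q_n(i_2,\dots,i_n,1)}} - \frac{1}{i_1 + \dfrac{p_{n-1}(i_2,\dots,i_n)}{q_{n-1}(i_2,\dots,i_n)}} \right) \]

for any $n \ge 2$ and $i_1,\dots,i_n \in \mathbf{N}_+$, it is easy to see that $\delta_{i_3\cdots i_n}$ can be expressed in terms of the digamma function $\psi$ as

\[ \begin{aligned} \delta_{i_3\cdots i_n} ={}& \psi\!\left( 2 + \frac{p'_{n-2}}{q'_{n-2}} \right) - \psi\!\left( 2 + \frac{p'_{n-2} + p'_{n-3}}{q'_{n-2} + q'_{n-3}} \right) \\ &+ \sum_{i\in\mathbf{N}_+} \left( \psi\!\left( 2 + \frac{1}{i + \dfrac{p'_{n-2}}{q'_{n-2}}} \right) - \psi\!\left( 2 + \frac{1}{i + \dfrac{p'_{n-2} + p'_{n-3}}{q'_{n-2} + q'_{n-3}}} \right) \right), \end{aligned} \]

where $p'_m = p_m(i_3,\dots,i_{m+2})$, $q'_m = q_m(i_3,\dots,i_{m+2})$, $m \in \mathbf{N}_+$, and $p'_0 = 0$, $q'_0 = 1$. Let us recall that the digamma function can be expressed by the convergent series

\[ \psi(z) = -C + \sum_{j\in\mathbf{N}_+} \left( \frac{1}{j} - \frac{1}{j+z-1} \right) = -C + \sum_{j\in\mathbf{N}_+} \frac{z-1}{j(j+z-1)} \]

for $z \neq 0, -1, -2, \dots$, where $C = 0.57721\cdots$ is the Euler constant. As is well known, $\psi$ satisfies the equation

\[ \psi(z+1) = \psi(z) + \frac{1}{z} \]

for $z \neq 0, -1, -2, \dots$. Tables for $\psi$ can be found in Abramowitz and Stegun (1964).

Putting

\[ \delta^{(n)}(f) = \sum_{i_3,\dots,i_n\in\mathbf{N}_+} \delta_{i_3\cdots i_n} f(\beta_{i_n\cdots i_3}), \]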


inequalities (2.5.14) and (2.5.15) imply that

\[ \operatorname{var} U^n f \le \delta^{(n)}(f) + \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)1\cdots 1} \operatorname{var} f \tag{2.5.17} \]

for any $n \ge 3$. Here we used the fact that $\alpha_{(i_1+1)1i_3\cdots i_n} < \alpha_{(i_1+1)11\cdots 1}$ for any $n \ge 3$ and $(i_3,\dots,i_n) \neq 1(n-2)$, which follows at once from (2.5.5).

First, note that by (2.5.16) we have

\[ \delta^{(n)}(f) \le \frac{1}{2} \sum_{i_3,\dots,i_n\in\mathbf{N}_+} |\delta_{i_3\cdots i_n}|\, (f(1) - f(0)). \tag{2.5.18} \]

Since

\[ \frac{1}{2} \sum_{i_3,\dots,i_n\in\mathbf{N}_+} |\delta_{i_3\cdots i_n}| = \sup \sum_{(i_3,\dots,i_n)\in A} \delta_{i_3\cdots i_n}, \]

where the supremum is taken over all $A \subset \mathbf{N}_+^{n-2}$, it follows that

\[ \frac{1}{2} \sum_{i_3,\dots,i_n\in\mathbf{N}_+} |\delta_{i_3\cdots i_n}| \ge \frac{1}{2} \sum_{i\in\mathbf{N}_+} |\delta_i|. \]

Hence the right hand side of (2.5.17) does not tend to 0 as $n \to \infty$, and (2.5.18) is useless for $n \ge 3$. As a matter of fact, it is a general result which does not take into account that $f$ is non-decreasing.

If for some given $n \ge 3$ the inequality

\[ \delta^{(n)}(I_{(a,1]}) \le \delta^{(n)}(I_{(g,1]}) \tag{2.5.19} \]

holds for any $a \in [0,1)$, then by (2.5.17) we have

\[ \operatorname{var} U^n I_{(a,1]} \le \delta^{(n)}(I_{(g,1]}) + \sum_{i_1\in\mathbf{N}_+} \alpha_{(i_1+1)1\cdots 1} \tag{2.5.20} \]

for any $a \in [0,1)$. It is easy to see that the right hand side of (2.5.20) is equal to $v'_n$. Since whatever $n \in \mathbf{N}_+$ we have

\[ \operatorname{var} U^n I_{(g,1]} = v'_n, \]

it follows from (2.5.20) that $v_n = v'_n$. Thus if (2.5.19) holds then for the given $n$ the UB Conjecture holds, too.

In particular for $n = 3$, writing $i, j, k$ for $i_1, i_2, i_3$, respectively, we have

\[ \alpha_{ijk} = \frac{1}{(i(jk+1)+k)\,(i(j(k+1)+1)+k+1)}, \quad i, j, k \in \mathbf{N}_+. \]


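It has been proved in Iosifescu (1994) that

\[ \delta_k = \sum_{j\in\mathbf{N}_+} \Bigg( \alpha_{1jk} - \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)jk} \Bigg) \]

is positive for $k = 1$ and negative for $k > 1$. Then (2.5.19) clearly holds in this case. Hence the UB Conjecture holds for $n = 3$ and

\[ \begin{aligned} v_3 &= \delta_1 + \sum_{i\in\mathbf{N}_+} \alpha_{(i+1)11} \\ &= \sum_{j\in\mathbf{N}_+} \left( \frac{1}{(j+2)(2j+3)} + \psi\!\left( 2 + \frac{1}{j+1} \right) - \psi\!\left( 2 + \frac{2}{2j+1} \right) \right) + \psi\!\left( 2 + \frac{2}{3} \right) - \psi\!\left( 2 + \frac{1}{2} \right) \\ &= \log 4 - \frac{7}{6} + \sum_{j\in\mathbf{N}_+} \left( \psi\!\left( 2 + \frac{1}{j+1} \right) - \psi\!\left( 2 + \frac{2}{2j+1} \right) \right) + \frac{3}{5} + \frac{3}{2} + \psi\!\left( \frac{2}{3} \right) - \frac{2}{3} - 2 - \psi\!\left( \frac{1}{2} \right) \\ &= \log 4 - \frac{7}{6} - \frac{17}{30} + \log \frac{4}{\sqrt{27}} + \frac{\pi}{2\sqrt{3}} + \sum_{j\in\mathbf{N}_+} \left( \psi\!\left( 2 + \frac{1}{j+1} \right) - \psi\!\left( 2 + \frac{2}{2j+1} \right) \right). \end{aligned} \]

We have [see Iosifescu (1994, p. 115)]

\[ 0.09104 < v_3 < 0.09759 \]

while a computation using MATHEMATICA yields

\[ 0.09436 < v_3 < 0.09445. \]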
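The numerical value of $v_3$ can be approximated from the last expression above without a computer algebra system. The following sketch is an illustration rather than the computation referred to in the text: the `digamma` routine (upward recurrence $\psi(z) = \psi(z+1) - 1/z$ followed by a standard asymptotic series), the function names, and the cutoff `terms` are all assumptions. The neglected tail of the series is of order $1/\text{terms}$, so the result is only an estimate; it also checks the identity $\psi(3) - \psi(5/2) = \log 4 - 7/6$, which links the constant of the case $n = 2$ to the digamma function.

```python
import math


def digamma(z):
    """Digamma function for real z > 0.

    Uses psi(z) = psi(z + 1) - 1/z to shift z above 10, then the
    asymptotic series log z - 1/(2z) - 1/(12 z^2) + 1/(120 z^4) - 1/(252 z^6).
    """
    shift = 0.0
    while z < 10.0:
        shift -= 1.0 / z
        z += 1.0
    inv2 = 1.0 / (z * z)
    series = math.log(z) - 0.5 / z - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + series


def v3_estimate(terms=50_000):
    """Truncated evaluation of the closed expression for v_3 (assumed cutoff 'terms')."""
    closed_part = (math.log(4) - 7 / 6 - 17 / 30
                   + math.log(4 / math.sqrt(27)) + math.pi / (2 * math.sqrt(3)))
    series = sum(digamma(2 + 1 / (j + 1)) - digamma(2 + 2 / (2 * j + 1))
                 for j in range(1, terms + 1))
    return closed_part + series


if __name__ == "__main__":
    print("psi(3) - psi(5/2):", digamma(3) - digamma(2.5))
    print("log 4 - 7/6      :", math.log(4) - 7 / 6)
    print("v_3 estimate     :", v3_estimate())
```

Increasing `terms` tightens the estimate, at the cost of a longer running time.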

Returning to the general case, a good upper bound for $v_n$, $n \in \mathbf{N}_+$, is available. For a lower bound see further Corollary 2.5.6.

Theorem 2.5.3 We have

\[ v_n \le \frac{k_0}{F_nF_{n+1}} \tag{2.5.21} \]


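for any $n \in \mathbf{N}_+$. Here and throughout the remainder of this section, $k_0$ is a constant not exceeding 14.8.

Proof. Clearly, (2.5.21) holds for $n = 1, 2, 3$ as was shown before. By Corollary 2.5.2 and on account of the constancy of the function $a \mapsto \operatorname{var} U^n I_{(a,1]}$ on any fundamental interval of order $n$, we have

\[ v_n = \sup_{a\in\Omega} \operatorname{var} U^n I_{(a,1]}, \quad n \in \mathbf{N}_+. \]

If to make a choice we assume that $n \in \mathbf{N}_+$ is odd, then by Proposition 2.1.11 and equation (2.5.3) for any $a \in I$ we have

\[ \begin{aligned} \operatorname{var} U^n I_{(a,1]} ={}& U^n I_{(a,1]}(0) - U^n I_{(a,1]}(1) \\ ={}& \sum_{i^{(n)}\in\mathbf{N}_+^n} \big( \gamma_0(I(i^{(n)})) - \gamma_1(I(i^{(n)})) \big)\, I_{(a,1]}(u_{i_n\cdots i_1}(1)) \\ &+ \sum_{i^{(n)}\in\mathbf{N}_+^n} \gamma_0(I(i^{(n)}))\, \big( I_{(a,1]}(u_{i_n\cdots i_1}(0)) - I_{(a,1]}(u_{i_n\cdots i_1}(1)) \big). \end{aligned} \tag{2.5.22} \]

Note that if $a \in \Omega$ then just one of the differences

\[ I_{(a,1]}(u_{i_n\cdots i_1}(0)) - I_{(a,1]}(u_{i_n\cdots i_1}(1)), \quad i^{(n)} \in \mathbf{N}_+^n, \]

is $\neq 0$ (and equal to 1). Also, for an arbitrarily given $a = [j_1, j_2, \dots] \in \Omega$ the set

\[ \big\{ i^{(n)} \in \mathbf{N}_+^n : u_{i_n\cdots i_1}(1) > a \big\} \]

consists of the $i^{(n)} = (i_1,\dots,i_n) \in \mathbf{N}_+^n$ satisfying

\[ (i_1 < j_1), \quad \text{if } n = 1; \]

\[ (i_3 < j_1) \cup (i_3 = j_1, i_2 > j_2) \cup (i_3 = j_1, i_2 = j_2, i_1 < j_3), \quad \text{if } n = 3; \]

\[ \begin{aligned} &(i_n < j_1) \cup (i_n = j_1, i_{n-1} > j_2) \cup (i_n = j_1, i_{n-1} = j_2, i_{n-2} < j_3) \cup \cdots \\ &\quad \cup (i_n = j_1, \dots, i_3 = j_{n-2}, i_2 > j_{n-1}) \cup (i_n = j_1, \dots, i_2 = j_{n-1}, i_1 < j_n), \end{aligned} \]

if $n \ge 5$. Therefore, putting $\mu = \gamma_0 - \gamma_1$, it follows from (2.5.22) that for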


$a = [j_1, j_2, \dots] \in \Omega$ and any odd $n \ge 5$ we have

\[ \begin{aligned} \operatorname{var} U^n I_{(a,1]} \le{}& |\mu(a_n < j_1)| + |\mu(a_n = j_1, a_{n-1} > j_2)| \\ &+ |\mu(a_n = j_1, a_{n-1} = j_2, a_{n-2} < j_3)| + \cdots \\ &+ |\mu(a_n = j_1, \dots, a_3 = j_{n-2}, a_2 > j_{n-1})| \\ &+ |\mu(a_n = j_1, \dots, a_2 = j_{n-1}, a_1 < j_n)| \\ &+ \max_{i^{(n)}\in\mathbf{N}_+^n} \gamma_0(I(i^{(n)})). \end{aligned} \tag{2.5.23} \]

We shall use the inequalities

\[ \begin{aligned} |\gamma_0(A) - \gamma_1(A)| &\le (\log 2)\,\gamma(A), \\ |\gamma_a(\tau^{-n}(A)) - \gamma(A)| &\le (\zeta(2)\log 2 - 1)\,\lambda_0^{n-1}\,\gamma(A), \end{aligned} \tag{2.5.24} \]

which are valid for any $a \in I$, $A \in \mathcal{B}_I$, and $n \in \mathbf{N}_+$, with $\lambda_0 = 0.303663\cdots$ (Wirsing's constant).

The first inequality follows by integrating over $A$ the double inequality

\[ -\frac{1}{x+1} \le 1 - \frac{2}{(x+1)^2} \le \frac{1}{x+1}, \quad x \in I, \]

while the second one is the inequality in Theorem 2.3.5 for $\ell = 2$. Note that

\[ \begin{aligned} (a_n < j_1) &= \tau^{-n+1}(a_1 < j_1), \\ (a_n = j_1, a_{n-1} > j_2) &= \tau^{-n+2}(a_2 = j_1, a_1 > j_2), \\ (a_n = j_1, a_{n-1} = j_2, a_{n-2} < j_3) &= \tau^{-n+3}(a_3 = j_1, a_2 = j_2, a_1 < j_3), \\ &\ \ \vdots \\ (a_n = j_1, \dots, a_3 = j_{n-2}, a_2 > j_{n-1}) &= \tau^{-1}(a_{n-1} = j_1, \dots, a_2 = j_{n-2}, a_1 > j_{n-1}) \end{aligned} \]

and

\[ \begin{aligned} (a_2 = j_1, a_1 > j_2) &\subset (a_2 = j_1), \\ (a_3 = j_1, a_2 = j_2, a_1 < j_3) &\subset (a_3 = j_1, a_2 = j_2), \\ &\ \ \vdots \\ (a_{n-1} = j_1, \dots, a_2 = j_{n-2}, a_1 > j_{n-1}) &\subset (a_{n-1} = j_1, \dots, a_2 = j_{n-2}), \\ (a_n = j_1, \dots, a_2 = j_{n-1}, a_1 < j_n) &\subset (a_n = j_1, \dots, a_2 = j_{n-1}). \end{aligned} \]


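Next, by Theorem 1.2.2 we have

\[ \max_{i^{(n)}\in\mathbf{N}_+^n} \gamma_0(I(i^{(n)})) = \frac{1}{F_nF_{n+1}} \ (:= \sigma(n)), \quad n \in \mathbf{N}_+, \]

and then

\[ \max_{i^{(n)}\in\mathbf{N}_+^n} \gamma(I(i^{(n)})) \le \frac{2}{3\log 2}\,\sigma(n), \quad n \in \mathbf{N}_+. \tag{2.5.26} \]

Now, by (2.5.24) through (2.5.26), with $k_1 = \log 2 = 0.69315\cdots$, $k_2 = \zeta(2)\log 2 - 1 = 0.14018\cdots$, $\theta = g^2 = (\sqrt{5} - 1)^2/4 = (3 - \sqrt{5})/2 = 0.38196\cdots$, it follows from (2.5.23) that

\[ \begin{aligned} \operatorname{var} U^n I_{(a,1]} &\le \frac{4k_2}{3\log 2} \left( \lambda_0^{n-2}\sigma(0) + \lambda_0^{n-3}\sigma(1) + \cdots + \sigma(n-2) + \frac{k_1}{2k_2}\,\sigma(n-1) \right) + \sigma(n) \\ &= \frac{4k_2}{3\log 2}\,\sigma(n-1) \left( \lambda_0^{n-2}\frac{\sigma(0)}{\sigma(n-1)} + \lambda_0^{n-3}\frac{\sigma(1)}{\sigma(n-1)} + \cdots + \frac{\sigma(n-2)}{\sigma(n-1)} + \frac{k_1}{2k_2} \right) + \sigma(n). \end{aligned} \]

Since

\[ \frac{\sigma(k)}{\sigma(n)} \le \frac{1}{2}\,\theta^{k-n-1}, \quad k, n \in \mathbf{N}, \]

and

\[ \frac{\sigma(n-1)}{\sigma(n)} \le \frac{8}{3}, \quad n \ge 3, \]

we finally obtain

\[ \operatorname{var} U^n I_{(a,1]} \le \left( 1 + \frac{16k_1}{9\log 2} + \frac{16k_2}{9\theta(\theta - \lambda_0)\log 2} \right) \sigma(n). \]

We have

\[ \begin{aligned} 1 + \frac{16k_1}{9\log 2} + \frac{16k_2}{9\theta(\theta - \lambda_0)\log 2} &= 1 + \frac{16}{9\log 2} \left( k_1 + \frac{k_2}{\theta(\theta - \lambda_0)} \right) \\ &= 1 + \frac{32}{9\log 2} \left( \frac{\log 2}{2} + \frac{\zeta(2)\log 2 - 1}{7 - 3\sqrt{5} - (3 - \sqrt{5})\lambda_0} \right) \\ &= 14.780\cdots < 14.8 \end{aligned} \]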


and the proof is complete for any odd $n$. The case of an even $n$ can be treated similarly. □

Corollary 2.5.4 Let $f \in BV(I)$. For any $n \in \mathbf{N}$ we have

\[ \| U^n f - U^\infty f \| \le \frac{k_0 \operatorname{var} f}{F_nF_{n+1}}. \]

Proof. By (2.1.12) and Proposition 2.0.1 (i) we have

\[ \| U^n f - U^\infty f \| \le \operatorname{var} U^n f, \quad n \in \mathbf{N}, \]

and the result stated is implied by Theorem 2.5.3 for $n \in \mathbf{N}_+$. The case $n = 0$ can be checked directly. □

Remark. It was claimed in Iosifescu (1997, p. 76) that Theorem 2.5.3 holds with $k_0 = 1/\log 2$ for all $n \in \mathbf{N}$ large enough. (This is clearly true for $n = 1, 2$, or 3.) A flaw detected by Adriana Berechet in the method of proof in that paper invalidates the conclusion. We conjecture, however, that both Theorem 2.5.3 and Corollary 2.5.4 hold with $k_0 = 1/\log 2$ for any $n \in \mathbf{N}$. □

2.5.3 Two asymptotic distributions

We are now able to derive the asymptotic behaviour of $\gamma_a(s^a_n \le x)$ as $n \to \infty$ for any $a, x \in I$.

Theorem 2.5.5 For any $a \in I$ and $n \in \mathbf{N}$ we have

\[ \frac{a+1}{2(F_n + aF_{n-1})(F_{n+1} + aF_n)} \le \sup_{x\in I} |\gamma_a(s^a_n \le x) - G(x)| \le \frac{k_0}{F_nF_{n+1}}. \]

Proof. (i) The upper bound. We have already used in Subsection 2.5.1 the property of $U$ of being the transition operator of the Markov chain $(s^a_n)_{n\in\mathbf{N}}$ for any $a \in I$. Therefore in particular

\[ U^n I_{[0,x]}(a) = E_a\big( I_{[0,x]}(s^a_n) \big) = \gamma_a(s^a_n \le x) \]

for any $a, x \in I$ and $n \in \mathbf{N}$. As

\[ U^\infty I_{[0,x]} = \int_I I_{[0,x]}\, d\gamma = \gamma([0,x]) = G(x), \quad x \in I, \]


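Corollary 2.5.4 yields the upper bound announced.

(ii) The lower bound. We start with two simple remarks. First, using the continuity of $G$ and the equations $\lim_{h\downarrow 0} \gamma_a(s^a_n \le x - h) = \gamma_a(s^a_n < x)$ and $\lim_{h\downarrow 0} \gamma_a(s^a_n < x + h) = \gamma_a(s^a_n \le x)$, $x \in I$, it is easy to see that

\[ \sup_{x\in I} |\gamma_a(s^a_n \le x) - G(x)| = \sup_{x\in I} |\gamma_a(s^a_n < x) - G(x)| \]

for any $a \in I$ and $n \in \mathbf{N}$. Second, for any $s \in I$ we have

\[ \begin{aligned} \gamma_a(s^a_n = s) &= \gamma_a(s^a_n \le s) - G(s) - \big( \gamma_a(s^a_n < s) - G(s) \big) \\ &\le \sup_{x\in I} |\gamma_a(s^a_n \le x) - G(x)| + \sup_{x\in I} |\gamma_a(s^a_n < x) - G(x)| \\ &= 2 \sup_{x\in I} |\gamma_a(s^a_n \le x) - G(x)|. \end{aligned} \]

Hence

\[ \sup_{x\in I} |\gamma_a(s^a_n \le x) - G(x)| \ge \frac{1}{2} \sup_{s\in I} \gamma_a(s^a_n = s) \tag{2.5.27} \]

for any $a \in I$ and $n \in \mathbf{N}$.

Next, recall (see Subsection 2.5.1) that

\[ \begin{aligned} \gamma_a\big( s^a_n = [i_n, \dots, i_2, i_1 + a] \big) &= \gamma_a(I(i^{(n)})) = P_{i_1\cdots i_n}(a), \quad n \ge 2, \\ \gamma_a\!\left( s^a_1 = \frac{1}{i_1 + a} \right) &= \gamma_a(I(i_1)) = P_{i_1}(a) \end{aligned} \]

for any $a \in I$ and $(i_1,\dots,i_n) = i^{(n)} \in \mathbf{N}_+^n$. By (2.5.5) and (2.5.27) we then have

\[ \sup_{s\in I} \gamma_a(s^a_n = s) = P_{1(n)}(a), \quad a \in I,\ n \in \mathbf{N}_+, \tag{2.5.28} \]

where we write $1(n)$ for $(i_1,\dots,i_n)$ with $i_1 = \cdots = i_n = 1$, $n \in \mathbf{N}_+$. With the convention $F_{-1} = 0$, by equation (2.5.5) again,

\[ \begin{aligned} P_{1(n)}(a) &= \frac{a+1}{((a+1)F_{n-1} + F_{n-2})\,((a+1)F_n + F_{n-1})} \\ &= \frac{a+1}{(F_n + aF_{n-1})(F_{n+1} + aF_n)}, \quad a \in I,\ n \in \mathbf{N}_+. \end{aligned} \tag{2.5.29} \]

The lower bound announced now follows from (2.5.27) through (2.5.29). The case $n = 0$ can be checked directly. □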
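The closed form (2.5.29) lends itself to an exact check. The sketch below is illustrative only (the function names are assumptions); it uses the convention $F_{-1} = 0$, $F_0 = F_1 = 1$ stated in the proof and compares the product formula (2.5.2), taken with all digits equal to 1, against the Fibonacci expression. Half of this quantity is the lower bound of Theorem 2.5.5.

```python
from fractions import Fraction


def fibonacci(n):
    """F_n with the convention F_{-1} = 0, F_0 = F_1 = 1 (n >= 0)."""
    prev, cur = 0, 1  # F_{-1}, F_0
    for _ in range(n):
        prev, cur = cur, prev + cur
    return cur


def P_all_ones_product(n, a):
    """P_{1(n)}(a) from the product formula (2.5.2) with every digit equal to 1."""
    point = Fraction(a)
    result = Fraction(1)
    for _ in range(n):
        result *= (point + 1) / ((point + 1) * (point + 2))  # P_1(point)
        point = 1 / (point + 1)                               # u_1(point)
    return result


def P_all_ones_closed(n, a):
    """Closed form (2.5.29), valid for n >= 1."""
    a = Fraction(a)
    return (a + 1) / ((fibonacci(n) + a * fibonacci(n - 1))
                      * (fibonacci(n + 1) + a * fibonacci(n)))


if __name__ == "__main__":
    a = Fraction(1, 2)
    for n in range(1, 6):
        print(n, P_all_ones_product(n, a), P_all_ones_closed(n, a) / 2)
```

The second printed column is the product formula and the third is the lower bound $P_{1(n)}(a)/2$ of the theorem for the sample value $a = 1/2$.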


Remarks. 1. It is easy to see that $P_{1(n)}(\cdot)$ is a decreasing function. Hence

\[ P_{1(n)}(a) \ge P_{1(n)}(1) = \frac{2}{F_{n+1}F_{n+2}} \tag{2.5.30} \]

for any $a \in I$ and $n \in \mathbf{N}_+$.

2. Both lower and upper bounds in Theorem 2.5.5 are $O(g^{2n})$ as $n \to \infty$ with $g = (\sqrt{5} - 1)/2$, $g^2 = (3 - \sqrt{5})/2 = 0.38196\cdots$. Thus the optimal convergence rate has been obtained. □

Corollary 2.5.6 For any $n \in \mathbf{N}_+$ we have

\[ v_n \ge \frac{2}{F_{n+1}F_{n+2}}. \]

Proof. As noted in the proof of Theorem 2.5.5, we have

\[ \gamma_a(s^a_n \le x) = U^n I_{[0,x]}(a), \qquad G(x) = U^\infty I_{[0,x]} \]

for any $a, x \in I$ and $n \in \mathbf{N}$. Then Theorem 2.5.5, inequality (2.5.30), and the argument used in the proof of Corollary 2.5.4 yield

\[ \frac{2}{F_{n+1}F_{n+2}} \le \sup_{x\in I} \| U^n I_{[0,x]} - U^\infty I_{[0,x]} \| \le \sup_{x\in I} \operatorname{var} U^n I_{[0,x]} \tag{2.5.31} \]

for any $n \in \mathbf{N}$. By Corollary 2.5.2 the proof is complete. □

Remark. Theorem 2.5.3 and Corollary 2.5.6 show that $v_n = O(g^{2n})$ as $n \to \infty$, and this convergence rate is optimal. □
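To see the two-sided estimate at work, one can tabulate the bounds of Corollary 2.5.6 and Theorem 2.5.3 next to the values of $v_1$ and $v_2$ obtained in Subsection 2.5.2. The following sketch is an illustration under assumptions: the names are hypothetical, and `K0 = 14.8` is the upper estimate for $k_0$ quoted in Theorem 2.5.3 rather than an exact constant. It also shows the ratio of consecutive lower bounds approaching $g^2$.

```python
import math


def fibonacci(n):
    """F_n with the convention F_0 = F_1 = 1 (n >= 0)."""
    prev, cur = 0, 1
    for _ in range(n):
        prev, cur = cur, prev + cur
    return cur


K0 = 14.8  # assumed value: the upper estimate for k_0 from Theorem 2.5.3


def lower_bound(n):
    """Lower bound 2 / (F_{n+1} F_{n+2}) of Corollary 2.5.6."""
    return 2 / (fibonacci(n + 1) * fibonacci(n + 2))


def upper_bound(n):
    """Upper bound k_0 / (F_n F_{n+1}) of Theorem 2.5.3 with k_0 replaced by K0."""
    return K0 / (fibonacci(n) * fibonacci(n + 1))


# Values of v_n established in the text for n = 1 and n = 2.
KNOWN_V = {1: 0.5, 2: math.log(4) - 7 / 6}

if __name__ == "__main__":
    g_squared = (3 - math.sqrt(5)) / 2
    for n in range(1, 11):
        known = KNOWN_V.get(n, float("nan"))
        ratio = lower_bound(n + 1) / lower_bound(n)
        print(f"n={n:2d}  lower={lower_bound(n):.3e}  v_n={known:.5f}  "
              f"upper={upper_bound(n):.3e}  ratio={ratio:.5f}  g^2={g_squared:.5f}")
```

The gap between the two bounds is a constant factor, which is the content of the remark on the optimal rate.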

Corollary 2.5.7 The spectral radius of the operator $U - U^\infty$ in $BV(I)$ equals $g^2 = (3 - \sqrt{5})/2 = 0.38196\cdots$.

Proof. We should show that

\[ \lim_{n\to\infty} \| U^n - U^\infty \|_v^{1/n} = \lim_{n\to\infty} \left( \sup_{0 \neq f\in BV(I)} \frac{\| U^n f - U^\infty f \|_v}{\| f \|_v} \right)^{1/n} = g^2. \]

The argument used in the proof of Corollary 2.5.4, and Theorem 2.5.3 yield

\[ \| U^n f - U^\infty f \|_v = \| U^n f - U^\infty f \| + \operatorname{var} U^n f \le 2 \operatorname{var} U^n f \le \frac{4k_0}{F_nF_{n+1}} \operatorname{var} f \le \frac{4k_0}{F_nF_{n+1}} \| f \|_v \]

for any $n \in \mathbf{N}$ and $f \in BV(I)$. (We took into account that, as mentioned at the beginning of this section, here $f$ is complex-valued. See the proof of Proposition 2.1.16.) Hence

\[ \lim_{n\to\infty} \| U^n - U^\infty \|_v^{1/n} \le g^2. \]

The converse inequality follows by taking $f = I_{[0,x]}$, $x \in I$, and using (2.5.31). □

Theorem 2.5.5 allows a quick derivation of the asymptotic behaviour of

\[ \gamma_a(\tau^n \le x,\ s^a_n \le y) \]

as $n \to \infty$ for any $a, x, y \in I$, and of the (optimal) convergence rate, the same as above.

Theorem 2.5.8 For any $a \in I$ and $n \in \mathbf{N}$ we have

\[ \frac{a+1}{2(F_n + aF_{n-1})(F_{n+1} + aF_n)} \le \sup_{x,y\in I} \left| \gamma_a(\tau^n \le x,\ s^a_n \le y) - \frac{\log(xy+1)}{\log 2} \right| \le \frac{k_0}{F_nF_{n+1}}. \]

Proof. Set $G^a_n(y) = \gamma_a(s^a_n \le y)$, $H^a_n(y) = G^a_n(y) - G(y)$, $a, y \in I$, $n \in \mathbf{N}$. Theorem 2.5.5 yields

\[ |H^a_n(y)| \le \frac{k_0}{F_nF_{n+1}}, \quad a, y \in I,\ n \in \mathbf{N}. \tag{2.5.32} \]

By the generalized Brodén–Borel–Lévy formula (1.3.21), for any $a, x, y \in I$


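and $n \in \mathbf{N}$ we have

\[ \begin{aligned} \gamma_a(\tau^n \le x,\ s^a_n \le y) &= \int_0^y \gamma_a(\tau^n \le x \mid s^a_n = z)\, dG^a_n(z) \\ &= \int_0^y \frac{(z+1)x}{zx+1}\, dG^a_n(z) \\ &= \frac{1}{\log 2} \int_0^y \frac{(z+1)x}{zx+1}\, \frac{dz}{z+1} + \int_0^y \frac{(z+1)x}{zx+1}\, dH^a_n(z) \\ &= \frac{\log(xy+1)}{\log 2} + \left. \frac{(z+1)x}{zx+1}\, H^a_n(z) \right|_{z=0}^{z=y} - \int_0^y \frac{x - x^2}{(zx+1)^2}\, H^a_n(z)\, dz. \end{aligned} \]

[When applying formula (1.3.21) we used the fact that the $\sigma$-algebras generated by $(a_1,\dots,a_n)$ and by $s^a_n$ are identical for any $a \in I$ and $n \in \mathbf{N}_+$.] Hence, by (2.5.32),

\[ \left| \gamma_a(\tau^n \le x,\ s^a_n \le y) - \frac{\log(xy+1)}{\log 2} \right| \le \frac{k_0}{F_nF_{n+1}} \left( \frac{(y+1)x}{xy+1} + \frac{(x - x^2)y}{xy+1} \right) \le \frac{k_0}{F_nF_{n+1}} \]

for any $a, x, y \in I$ and $n \in \mathbf{N}$, so that the upper bound holds.

To get the lower bound we note that by Theorem 2.5.5 for any $a \in I$ and $n \in \mathbf{N}$ we have

\[ \begin{aligned} \sup_{x,y\in I} \left| \gamma_a(\tau^n \le x,\ s^a_n \le y) - \frac{\log(xy+1)}{\log 2} \right| &\ge \sup_{y\in I} \left| \gamma_a(\tau^n \le 1,\ s^a_n \le y) - \frac{\log(y+1)}{\log 2} \right| \\ &= \sup_{y\in I} |\gamma_a(s^a_n \le y) - G(y)| \ge \frac{a+1}{2(F_n + aF_{n-1})(F_{n+1} + aF_n)}. \end{aligned} \]

□

Remarks. 1. We can replace $\gamma_a(\tau^n \le x,\ s^a_n \le y)$ by $\lambda(\tau^n \le x,\ s^a_n \le y)$ in the statement of Theorem 2.5.8 since it is possible to relate these quantities by noticing that $|s^a_n - s^0_n| \le 1/F_n^2$, $n \in \mathbf{N}$, $a \in I$. The new upper and lower bounds are of order $O(g^{2n})$ as $n \to \infty$, too.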


2. As noted at the end of Subsection 1.3.3, $\log(xy+1)/\log 2$, $x, y \in I$, is the joint distribution function under γ of the extended random variables τn and sn. □

2.5.4 A generalization of a result of A. Denjoy

Sixty five years ago, A. Denjoy published a Comptes Rendus Note [see Denjoy (1936 b)] in which he sketched a proof of the fact that (in our notation)

\[ \lim_{n\to\infty} \lambda\big( [a_1,\dots,a_n] \le x,\ s^0_n \le y \big) = \frac{x \log(y+1)}{\log 2} \tag{2.5.33} \]

uniformly with respect to $x, y \in I$. Of course, for $x = 1$ this follows at once from Theorem 2.5.5. In this subsection we prove that (2.5.33) holds with $\lambda$ replaced by any probability measure $\mu$ on $\mathcal{B}_I$ absolutely continuous with respect to $\lambda$, in particular with $\lambda$ replaced by any $\gamma_a$, $a \in (0,1]$. An estimate of the convergence rate is also given. These will follow from Theorem 2.5.9 below.

Since $|[a_1,\dots,a_n] - \tau^0| \le (F_nF_{n+1})^{-1}$, $n \in \mathbf{N}_+$, it is easy to see that for any probability measure $\mu$ on $\mathcal{B}_I$ absolutely continuous with respect to $\lambda$, we have

\[ \begin{aligned} &\big| \mu([a_1,\dots,a_n] \le x,\ s^0_n \le y) - \mu(\tau^0 \le x,\ s^0_n \le y) \big| \\ &\quad \le \max\big( \mu(x - (F_nF_{n+1})^{-1} < \tau^0 \le x),\ \mu(x < \tau^0 \le x + (F_nF_{n+1})^{-1}) \big) \to 0 \end{aligned} \]

uniformly with respect to $x, y \in I$ as $n \to \infty$. This allows us to replace $[a_1,\dots,a_n]$ by $\tau^0$ in (2.5.33) and its generalizations.

Fix $a \in I$ arbitrarily. Let $f$ be a $\lambda$-integrable complex-valued function on $I$. Since $\gamma_a$ is equivalent to $\lambda$ for any $a \in I$, $f$ is $\gamma_a$-integrable, too. Denote by $E_k$, $k \in \mathbf{N}$, the set consisting of the endpoints of all fundamental intervals of rank $\ell$, $0 \le \ell \le k$. For any $n \in \mathbf{N}$ we associate with $f$ a function $f^a_n$ which has a constant value on each fundamental interval of rank $n$. Specifically, $f^a_0 = \int_I f\, d\gamma_a$ and

\[ f^a_n(x) = \frac{1}{\gamma_a(I(i^{(n)}))} \int_{I(i^{(n)})} f\, d\gamma_a, \quad x \in I(i^{(n)}),\ i^{(n)} \in \mathbf{N}_+^n, \]

for $n \in \mathbf{N}_+$. Clearly,

\[ \int_I f^a_n\, d\gamma_a = \int_I f\, d\gamma_a, \quad n \in \mathbf{N}. \tag{2.5.34} \]


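Since for any $n \in \mathbf{N}_+$ and $x \in I \setminus E_n$ there is a unique $i^{(n)} \in \mathbf{N}_+^n$ such that $x \in I(i^{(n)})$ and since

\[ \max_{i^{(n)}\in\mathbf{N}_+^n} \gamma_a(I(i^{(n)})) \to 0 \]

as $n \to \infty$, by a well known property of the Lebesgue integral we have

\[ \lim_{n\to\infty} f^a_n(x) = f(x) \tag{2.5.35} \]

a.e. in $I$. It follows from (2.5.34) and (2.5.35) that

\[ \lim_{n\to\infty} \int_I |f - f^a_n|\, d\gamma_a = 0. \tag{2.5.36} \]

By (2.5.36) the right hand side of (2.5.37) below converges to 0 as $n \to \infty$.

Remark. It is easy to check that $(f^a_n)_{n\in\mathbf{N}}$ is a martingale on $(I, \mathcal{B}_I, \gamma_a)$ whatever $a \in I$. □

Theorem 2.5.9 Let $f$ be a $\lambda$-integrable complex-valued function on $I$ and let $h \in BV(I)$ be real-valued. Then

\[ \begin{aligned} &\left| \int_I f\,(h \circ s^a_n)\, d\gamma_a - \int_I f\, d\gamma_a \int_I h\, d\gamma \right| \\ &\quad \le \inf_{0\le k\le n} \left( \| h \| \int_I |f - f^a_k|\, d\gamma_a + \frac{k_0}{F_{n-k}F_{n-k+1}} \operatorname{var} h \int_I |f|\, d\gamma_a \right) \end{aligned} \tag{2.5.37} \]

for any $a \in I$ and $n \in \mathbf{N}$.

Proof. For any $a \in I$ and $k, n \in \mathbf{N}_+$, $k \le n$, we have

\[ \int_I f\,(h \circ s^a_n)\, d\gamma_a = \sum_{i^{(k)}\in\mathbf{N}_+^k} \left( \int_{I(i^{(k)})} (f - f^a_k)(h \circ s^a_n)\, d\gamma_a + \int_{I(i^{(k)})} f^a_k\,(h \circ s^a_n)\, d\gamma_a \right). \tag{2.5.38} \]

Clearly,

\[ \left| \sum_{i^{(k)}\in\mathbf{N}_+^k} \int_{I(i^{(k)})} (f - f^a_k)(h \circ s^a_n)\, d\gamma_a \right| \le \| h \| \int_I |f - f^a_k|\, d\gamma_a. \tag{2.5.39} \]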


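Next, for any fixed $i^{(k)} \in \mathbf{N}_+^k$ we can write

\[ \int_{I(i^{(k)})} f^a_k\,(h \circ s^a_n)\, d\gamma_a = \frac{1}{\gamma_a(I(i^{(k)}))} \int_{I(i^{(k)})} f\, d\gamma_a \int_{I(i^{(k)})} (h \circ s^a_n)\, d\gamma_a. \tag{2.5.40} \]

It is easy to check that

\[ \gamma_a(I(i^{(k)})) = \frac{a+1}{(q_k + ap_k)\,(q_k + q_{k-1} + a(p_k + p_{k-1}))}, \]

where

\[ \frac{p_k}{q_k} = [i_1,\dots,i_k], \quad \text{g.c.d.}\,(p_k, q_k) = 1, \quad k \in \mathbf{N}_+, \]

and $p_0 = 0$, $q_0 = 1$. With the change of variable

\[ u = \frac{p_k + t\,p_{k-1}}{q_k + t\,q_{k-1}}, \quad t \in I, \]

noting that

\[ s^a_n(u) = s^{a'}_{n-k}(t) \]

for $t \in \Omega$, where

\[ a' = \begin{cases} [i_k,\dots,i_2,i_1+a] & \text{if } k > 1,\\ 1/(i_1+a) & \text{if } k = 1 \end{cases} \;=\; \frac{q_{k-1} + ap_{k-1}}{q_k + ap_k}, \]

we obtain

\[ \begin{aligned} \int_{I(i^{(k)})} h(s^a_n(u))\, \gamma_a(du) &= (a+1) \int_{I(i^{(k)})} \frac{h(s^a_n(u))\, du}{(au+1)^2} \\ &= (a+1) \int_I \frac{h(s^{a'}_{n-k}(t))\, dt}{\big( t(q_{k-1} + ap_{k-1}) + q_k + ap_k \big)^2}. \end{aligned} \]

Hence

\[ \begin{aligned} \frac{1}{\gamma_a(I(i^{(k)}))} \int_{I(i^{(k)})} h(s^a_n(u))\, \gamma_a(du) &= (a'+1) \int_I \frac{h(s^{a'}_{n-k}(t))\, dt}{(a't+1)^2} \\ &= \int_I (h \circ s^{a'}_{n-k})\, d\gamma_{a'} = \int_I h(v)\, dG^{a'}_{n-k}(v), \end{aligned} \tag{2.5.41} \]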


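where $G^{a'}_m(v) = \gamma_{a'}(s^{a'}_m < v)$, $m \in \mathbf{N}$, $v \in I$. By Theorem 2.5.5 we have

\[ |G^a_m(v) - G(v)| \le \frac{k_0}{F_mF_{m+1}} \]

for any $a, v \in I$ and $m \in \mathbf{N}$. Then

\[ \begin{aligned} \left| \int_I h(v)\, dG^{a'}_{n-k}(v) - \int_I h\, d\gamma \right| &= \left| \int_I G^{a'}_{n-k}(v)\, dh(v) - \int_I G(v)\, dh(v) \right| \\ &\le \frac{k_0 \operatorname{var} h}{F_{n-k}F_{n-k+1}}. \end{aligned} \tag{2.5.42} \]

It follows from (2.5.40) through (2.5.42) that

\[ \left| \sum_{i^{(k)}\in\mathbf{N}_+^k} \int_{I(i^{(k)})} f^a_k\,(h \circ s^a_n)\, d\gamma_a - \int_I f\, d\gamma_a \int_I h\, d\gamma \right| \le \frac{k_0 \operatorname{var} h}{F_{n-k}F_{n-k+1}} \int_I |f|\, d\gamma_a. \tag{2.5.43} \]

Finally, (2.5.38), (2.5.39), and (2.5.43) for $k = 0$ and $n \in \mathbf{N}$ should be replaced by

\[ \int_I f\,(h \circ s^a_n)\, d\gamma_a = \int_I (f - f^a_0)(h \circ s^a_n)\, d\gamma_a + f^a_0 \int_I (h \circ s^a_n)\, d\gamma_a, \tag{2.5.38$'$} \]

\[ \left| \int_I (f - f^a_0)(h \circ s^a_n)\, d\gamma_a \right| \le \| h \| \int_I |f - f^a_0|\, d\gamma_a, \tag{2.5.39$'$} \]

and

\[ \left| f^a_0 \int_I (h \circ s^a_n)\, d\gamma_a - \int_I f\, d\gamma_a \int_I h\, d\gamma \right| \le \frac{k_0 \operatorname{var} h}{F_nF_{n+1}} \int_I |f|\, d\gamma_a, \tag{2.5.43$'$} \]

respectively.

Now, (2.5.37) follows from (2.5.38), (2.5.38′), (2.5.39), (2.5.39′), (2.5.43), and (2.5.43′). □

Corollary 2.5.10 For any $a, x, y \in I$ and $n \in \mathbf{N}$ we have

\[ \big| \gamma_a(\tau^0 \le x,\ s^a_n \le y) - \gamma_a([0,x])\,G(y) \big| \le \inf_{0\le k\le n} \left( \delta^a_k(x) + \frac{k_0}{F_{n-k}F_{n-k+1}}\, \gamma_a([0,x]) \right) \tag{2.5.44} \]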


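where

\[ \delta^a_k(x) = \begin{cases} 0 & \text{if } x \in E_k,\\[4pt] \dfrac{2(a+1)(x - a_k)(b_k - x)}{(b_k - a_k)(ax+1)^2} & \text{if } x \in (a_k, b_k), \end{cases} \]

and $[a_k, b_k]$ is the closure of the (unique) fundamental interval of order $k \in \mathbf{N}$ containing $x \in I \setminus E_k$.

Proof. Clearly,

\[ \gamma_a(\tau^0 \le x,\ s^a_n \le y) = \int_I I_{[0,x]}\,(I_{[0,y]} \circ s^a_n)\, d\gamma_a \]

for any $a, x, y \in I$ and $n \in \mathbf{N}$. Theorem 2.5.9 applies with $f = I_{[0,x]}$ and $h = I_{[0,y]}$, $x, y \in I$, yielding (2.5.44) since, as is easy to see, in the present case

\[ \int_I |f - f^a_k|\, d\gamma_a = \delta^a_k(x), \quad k \in \mathbf{N},\ a, x \in I. \]

□

Corollary 2.5.11 For any $a \in I$ and $n \in \mathbf{N}$ we have

\[ \begin{aligned} \frac{a+1}{2(F_n + aF_{n-1})(F_{n+1} + aF_n)} &\le \sup_{x,y\in I} \big| \gamma_a(\tau^0 \le x,\ s^a_n \le y) - \gamma_a([0,x])\,G(y) \big| \\ &\le \left( \frac{a+1}{2} + k_0 \right) \frac{1}{F_{\lfloor n/2\rfloor}F_{\lfloor n/2\rfloor+1}}. \end{aligned} \tag{2.5.45} \]

Proof. We clearly have

\[ \delta^a_k(x) \le \frac{a+1}{2} \max_{i^{(k)}} \lambda(I(i^{(k)})) = \frac{a+1}{2F_kF_{k+1}}, \quad k \in \mathbf{N},\ a, x \in I. \tag{2.5.46} \]

The upper bound from (2.5.45) follows by using (2.5.46) and taking $k = \lfloor n/2 \rfloor$.

Next, as in the proof of Theorem 2.5.8, we get

\[ \begin{aligned} \sup_{x,y\in I} \big| \gamma_a(\tau^0 \le x,\ s^a_n \le y) - \gamma_a([0,x])\,G(y) \big| &\ge \sup_{y\in I} |\gamma_a(s^a_n \le y) - G(y)| \\ &\ge \frac{a+1}{2(F_n + aF_{n-1})(F_{n+1} + aF_n)} \end{aligned} \]

for any $a \in I$ and $n \in \mathbf{N}$, and so the lower bound holds, too. □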


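Remark. The upper bound in Corollary 2.5.11 is $O(g^n)$ as $n \to \infty$, with $g = (\sqrt{5} - 1)/2$. The lower bound is $O(g^{2n})$ as $n \to \infty$ so that the problem of the exact rate of convergence is unsettled. □

Corollary 2.5.12 Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$ and let $g_a = d\mu/d\gamma_a$, $a \in I$. Then we have

\[ \begin{aligned} &\big| \mu(\tau^0 \le x,\ s^a_n \le y) - \mu([0,x])\,G(y) \big| \\ &\quad \le \inf_{0\le k\le n} \left( \int_I \big| g_aI_{[0,x]} - (g_aI_{[0,x]})^a_k \big|\, d\gamma_a + \frac{k_0}{F_{n-k}F_{n-k+1}}\, \mu([0,x]) \right) \end{aligned} \tag{2.5.47} \]

for any $a, x, y \in I$ and $n \in \mathbf{N}$. In particular, if $g_a$ has a version of bounded variation, again denoted $g_a$, then

\[ \int_I \big| g_aI_{[0,x]} - (g_aI_{[0,x]})^a_k \big|\, d\gamma_a \le \begin{cases} \dfrac{(a+1)\operatorname{var}_{[0,x]} g_a}{(F_k + aF_{k-1})(F_{k+1} + aF_k)} & \text{if } x \in E_k,\\[10pt] \dfrac{(a+1)\operatorname{var}_{[0,x]} g_a}{(F_k + aF_{k-1})(F_{k+1} + aF_k)} + 2 \displaystyle\int_{a_k}^{x} g_a(t)\, \gamma_a(dt) & \text{if } x \in (a_k, b_k), \end{cases} \tag{2.5.48} \]

where $[a_k, b_k]$ is the closure of the (unique) fundamental interval of order $k \in \mathbf{N}$ containing $x \in I \setminus E_k$.

Proof. We have

\[ \mu(\tau^0 \le x,\ s^a_n \le y) = \int_I I_{[0,x]}\,(I_{[0,y]} \circ s^a_n)\, g_a\, d\gamma_a \]

for any $a, x, y \in I$ and $n \in \mathbf{N}$. Theorem 2.5.9 applies with $f = g_aI_{[0,x]}$ and $h = I_{[0,y]}$, $x, y \in I$, yielding (2.5.47). Next, (2.5.48) can be obtained noting that (i) for a typical fundamental interval $I(i^{(k)})$ of order $k \in \mathbf{N}$ contained in $[0,x]$ we have

\[ \begin{aligned} \int_{I(i^{(k)})} \big| g_aI_{[0,x]} - (g_aI_{[0,x]})^a_k \big|\, d\gamma_a &= \int_{I(i^{(k)})} \left| g_a(t) - \frac{1}{\gamma_a(I(i^{(k)}))} \int_{I(i^{(k)})} g_a(s)\, \gamma_a(ds) \right| \gamma_a(dt) \\ &= \frac{1}{\gamma_a(I(i^{(k)}))} \int_{I(i^{(k)})} \left| \int_{I(i^{(k)})} (g_a(t) - g_a(s))\, \gamma_a(ds) \right| \gamma_a(dt) \\ &\le \gamma_a(I(i^{(k)})) \operatorname{var}_{I(i^{(k)})} g_a, \end{aligned} \]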


and (ii) for $x \in (a_k, b_k)$ we have

\[ \begin{aligned} \int_{a_k}^{b_k} \big| g_aI_{[0,x]} - (g_aI_{[0,x]})^a_k \big|\, d\gamma_a ={}& \int_{a_k}^{x} \left| g_a(t) - \frac{1}{\gamma_a([a_k,b_k])} \int_{a_k}^{x} g_a(s)\, \gamma_a(ds) \right| \gamma_a(dt) \\ &+ \frac{1}{\gamma_a([a_k,b_k])} \int_{x}^{b_k} \left| \int_{a_k}^{x} g_a(s)\, \gamma_a(ds) \right| \gamma_a(dt) \\ \le{}& \int_{a_k}^{x} g_a(t)\, \gamma_a(dt) + \frac{1}{\gamma_a([a_k,b_k])} \int_{a_k}^{b_k} \left( \int_{a_k}^{x} g_a(s)\, \gamma_a(ds) \right) \gamma_a(dt) \\ ={}& 2 \int_{a_k}^{x} g_a(t)\, \gamma_a(dt). \end{aligned} \]

The proof is complete. □

Corollary 2.5.13 Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$ and let $g_a = d\mu/d\gamma_a$, $a \in I$. Then we have

\[ |\mu(s^a_n \le x) - G(x)| \le \inf_{0\le k\le n} \left( \int_I |g_a - (g_a)^a_k|\, d\gamma_a + \frac{k_0}{F_{n-k}F_{n-k+1}} \right) \]

for any $a, x \in I$ and $n \in \mathbf{N}$. If $g_a$ has a version of bounded variation, then the right hand side of the above inequality is $O(g^n)$ as $n \to \infty$ uniformly with respect to $a, x \in I$.

Proof. Take $x = 1$ in (2.5.47), and then $x = 1$ and $k = \lfloor n/2 \rfloor$ in (2.5.48). □

Remark. Corollary 2.5.13 shows that the limiting distribution as $n \to \infty$ of $s^a_n$ under a probability measure on $\mathcal{B}_I$ absolutely continuous with respect to $\lambda$ is always Gauss' $\gamma$ for any $a \in I$. The problem of the exact rate of convergence, which should normally depend on $g_a$, remains unsettled. □

Other special cases of Theorem 2.5.9 and its corollaries can be considered. For example, we can check that

\[ \lim_{n\to\infty} \gamma(\tau^0 \le x,\ s^a_n \le y) = G(x)\,G(y), \quad a, x, y \in I. \tag{2.5.49} \]

It is interesting to note that (2.5.49) points to the asymptotic independence of $\tau^0$ and $s^a_n$ under $\gamma$ as $n \to \infty$.


As already noted at the beginning of this subsection, we can easily obtain the results corresponding to Corollaries 2.5.10 through 2.5.12 in the case considered by A. Denjoy, where $\tau^0$ is replaced by $[a_1, \dots, a_n]$. A definite difference occurs just in the convergence rates while the limiting probabilities are not altered.


Chapter 3

Limit theorems

This chapter is devoted to functional versions of central limit and other weak theorems, and of the law of the iterated logarithm for the incomplete quotients and associated random variables. The reader should keep in mind throughout that the sequence $(a_n)_{n \in \mathbb{N}_+}$ of incomplete quotients is $\psi$-mixing under different probability measures on $\mathcal{B}_I$ (see Subsections 1.3.6 and 2.3.4), while frequent reference is made to the three appendices at the end of the book.

3.0 Preliminaries

As in Subsection 2.5.4 let $g$ be a $\lambda$-integrable complex-valued function on $I$. We particularize here the framework considered there taking $a = 0$ and accordingly $\gamma_0 = \lambda$. Denote by $E_k$, $k \in \mathbb{N}$, the set consisting of the endpoints of all fundamental intervals of rank $\ell$, $0 \le \ell \le k$. For any $n \in \mathbb{N}_+$ we associate with $g$ a function $g_n$ which has a constant value on each fundamental interval $I(i^{(n)})$, $i^{(n)} \in \mathbb{N}_+^n$, of rank $n$. Specifically,

\[ g_n(x) = \frac{1}{\lambda(I(i^{(n)}))} \int_{I(i^{(n)})} g \, d\lambda, \quad x \in I(i^{(n)}),\ i^{(n)} \in \mathbb{N}_+^n,\ n \in \mathbb{N}_+. \tag{3.0.1} \]

Then

\[ \int_I g_n \, d\lambda = \int_I g \, d\lambda, \quad n \in \mathbb{N}_+, \tag{3.0.2} \]

and

\[ \lim_{n \to \infty} g_n(x) = g(x) \quad \text{a.e. in } I. \tag{3.0.3} \]


It follows from (3.0.2) and (3.0.3) that $\lim_{n \to \infty} \int_I |g - g_n| \, d\lambda = 0$. Hence

\[ \omega_{g,A}(n) = \int_A |g - g_n| \, d\lambda \to 0 \tag{3.0.4} \]

uniformly with respect to $A \in \mathcal{B}_I$ as $n \to \infty$.

We shall now prove a result which, in a sense, is dual to Theorem 2.5.9.

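Lemma 3.0.1 Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$ and let $g = d\mu/d\lambda$. For any $n \in \mathbb{N}_+$ and $A \in \mathcal{B}_n^\infty = \tau^{-n+1}(\mathcal{B}_I)$ we have

\[ |\mu(A) - \gamma(A)| \le \inf_{1 \le s < n} \bigl( \gamma(A^c)\, \omega_{g,A}(s) + \gamma(A)\, \omega_{g,A^c}(s) + \gamma(A)\, \varepsilon_{n-s} \bigr), \]

with $\varepsilon_n$, $n \in \mathbb{N}_+$, defined as in Subsection 1.3.6. Hence

\[ \lim_{n \to \infty} \sup_{A \in \mathcal{B}_n^\infty} |\mu(A) - \gamma(A)| = 0. \]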

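Proof. Put $h = I_A - \gamma(A)$, $A \in \mathcal{B}_n^\infty$. Then

\[ \mu(A) - \gamma(A) = \int_I g h \, d\lambda \]

and

\[ \left| \int_I g h \, d\lambda \right| \le \int_I |g_s - g|\, |h| \, d\lambda + \left| \int_I g_s h \, d\lambda \right|, \]

where $g_s$ is defined by (3.0.1) and $s < n$, $s \in \mathbb{N}_+$, is arbitrary. Since $|h| = 1 - \gamma(A) = \gamma(A^c)$ on $A$ and $|h| = \gamma(A)$ on $A^c$, we have

\[ \int_I |g_s - g|\, |h| \, d\lambda \le \gamma(A^c)\, \omega_{g,A}(s) + \gamma(A)\, \omega_{g,A^c}(s). \tag{3.0.5} \]

Next,

\[
\begin{aligned}
\left| \int_I g_s h \, d\lambda \right|
&= \Biggl| \sum_{i^{(s)} \in \mathbb{N}_+^s} \int_{I(i^{(s)})} g_s h \, d\lambda \Biggr| \\
&= \Biggl| \sum_{i^{(s)} \in \mathbb{N}_+^s} \left( \frac{1}{\lambda(I(i^{(s)}))} \int_{I(i^{(s)})} g \, d\lambda \right) \int_{I(i^{(s)})} h \, d\lambda \Biggr| \\
&= \Biggl| \sum_{i^{(s)} \in \mathbb{N}_+^s} \frac{\mu(I(i^{(s)}))}{\lambda(I(i^{(s)}))} \Bigl( \lambda(I(i^{(s)}) \cap A) - \lambda(I(i^{(s)}))\, \gamma(A) \Bigr) \Biggr|.
\end{aligned}
\]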


It then follows from equation (1.3.35) that

\[ \left| \int_I g_s h \, d\lambda \right| \le \gamma(A)\, \varepsilon_{n-s}. \tag{3.0.6} \]

Now, the result stated follows from (3.0.5), (3.0.6), and (3.0.4). □

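Let $f_n : \mathbb{N}_+ \to \mathbb{R}$, $n \in \mathbb{N}_+$, and define

\[ X_{nj} = f_n(a_j), \quad 1 \le j \le n, \]

\[ S_{n0} = 0, \quad S_{nk} = \sum_{j=1}^{k} X_{nj}, \quad 1 \le k \le n, \quad S_{nn} = S_n, \quad n \in \mathbb{N}_+. \]

For any $n \in \mathbb{N}_+$ define the process $\xi_n = (\xi_n(t))_{t \in I}$ by $\xi_n(t) = S_{n \lfloor nt \rfloor}$, $t \in I$.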
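As an illustration of these definitions, here is a minimal sketch that computes the incomplete quotients of a rational point of $I$ exactly and evaluates the step process $\xi_n(t) = S_{n\lfloor nt \rfloor}$. The helper names (`cf_digits`, `partial_sums`, `xi`), the test point 93/415, and the choice $f_n(a) = a$ are assumptions made for the example, not notation from the text.

```python
from fractions import Fraction
from math import floor


def cf_digits(x, n):
    """Return up to n incomplete quotients a_1, a_2, ... of x in (0, 1)."""
    digits = []
    while x != 0 and len(digits) < n:
        inv = 1 / x              # exact, since x is a Fraction
        a = floor(inv)           # a_1(x) = floor(1/x)
        digits.append(a)
        x = inv - a              # continued fraction map: tau(x) = 1/x - a_1(x)
    return digits


def partial_sums(digits, f):
    """Return [S_n0, S_n1, ..., S_nn] with S_nk = f(a_1) + ... + f(a_k)."""
    sums = [0]
    for a in digits:
        sums.append(sums[-1] + f(a))
    return sums


def xi(sums, t):
    """Evaluate xi_n(t) = S_{n, floor(nt)} for t in [0, 1]."""
    n = len(sums) - 1
    return sums[floor(n * t)]


digits = cf_digits(Fraction(93, 415), 10)      # 93/415 = [0; 4, 2, 6, 7]
sums = partial_sums(digits, lambda a: a)       # hypothetical choice f_n(a) = a
```

Passing `t` as a `Fraction` (for instance `xi(sums, Fraction(1, 2))`) keeps $\lfloor nt \rfloor$ exact; with floating-point `t` the floor can be off by one near jump points.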

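Lemma 3.0.2 Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$. Assume that the array $X = \{X_{nj},\ 1 \le j \le n,\ n \in \mathbb{N}_+\}$ is s.i. under $\gamma$.

(i) If either $(\gamma \xi_n^{-1})_{n \in \mathbb{N}_+}$ or $(\mu \xi_n^{-1})_{n \in \mathbb{N}_+}$ converges weakly in $\mathcal{B}_D$, then both sequences converge weakly in $\mathcal{B}_D$ and have the same limit.

(ii) If either $(\gamma S_n^{-1})_{n \in \mathbb{N}_+}$ or $(\mu S_n^{-1})_{n \in \mathbb{N}_+}$ converges weakly in $\mathcal{B}$, then both sequences converge weakly in $\mathcal{B}$ and have the same limit.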

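Proof. Clearly, (ii) is an immediate consequence of (i). Let us therefore prove the latter. Take a sequence $(k_n)_{n \in \mathbb{N}_+}$ such that $k_n \le n$, $\lim_{n \to \infty} k_n/n = 0$, and $\lim_{n \to \infty} k_n = \infty$. As $X$ is s.i. under $\gamma$, we have

\[ \lim_{n \to \infty} \gamma\bigl(|S_{n k_n}| > \varepsilon\bigr) = 0 \tag{3.0.7} \]

for any $\varepsilon > 0$.

Let us first show that

\[ \lim_{n \to \infty} \gamma\Bigl( \max_{1 \le k \le k_n} |S_{nk}| > \varepsilon \Bigr) = 0 \tag{3.0.8} \]

for any $\varepsilon > 0$. It follows from Proposition A3.5 (see also Section A1.4) that whatever $\varepsilon > 0$ we have

\[ d_P\bigl(\gamma S_{nk}^{-1}, \delta_0\bigr) \le \frac{\varepsilon}{4}, \quad 1 \le k \le k_n, \]

for any $n$ large enough ($\ge n_\varepsilon$). Therefore for some $\theta \le \varepsilon/4$ we have

\[ \delta_0(A) < \gamma S_{nk}^{-1}(A^\theta) + \theta \]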


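for any $n \ge n_\varepsilon$, $1 \le k \le k_n$, and $A \in \mathcal{B}$. Hence, with $A = (-\theta, \theta)$ for which $A^\theta = (-2\theta, 2\theta)$, we obtain

\[ \gamma S_{nk}^{-1}\Bigl( \Bigl( -\frac{\varepsilon}{2}, \frac{\varepsilon}{2} \Bigr) \Bigr) > \gamma S_{nk}^{-1}(A^\theta) > 1 - \theta \ge 1 - \frac{\varepsilon}{4} \]

for any $n \ge n_\varepsilon$ and $1 \le k \le k_n$. Equivalently,

\[ \min_{1 \le k \le k_n} \gamma\Bigl( |S_{nk}| < \frac{\varepsilon}{2} \Bigr) > 1 - \frac{\varepsilon}{4}, \quad n \ge n_\varepsilon. \]

If $\varepsilon$ is small enough so that

\[ 1 - \frac{\varepsilon}{4} > \varphi_\gamma(1), \]

then by an Ottaviani type inequality [see Lemma 1.1.6 in Iosifescu and Theodorescu (1969)] we can write

\[ \gamma\Bigl( \max_{1 \le k \le k_n} |S_{nk}| > \varepsilon \Bigr) \le \frac{\gamma\bigl( |S_{n k_n}| \ge \frac{\varepsilon}{2} \bigr)}{1 - \frac{\varepsilon}{4} - \varphi_\gamma(1)} \]

for any $n \ge n_\varepsilon$. Hence (3.0.8) holds on account of (3.0.7).

Next, for any $n \in \mathbb{N}_+$ consider the process $\tilde\xi_n = (\tilde\xi_n(t))_{t \in I}$ defined by

\[ \tilde\xi_n(t) = S_{n \lfloor nt \rfloor} - S_{n \min(\lfloor nt \rfloor, k_n)}, \quad t \in I. \]

Note that $\tilde\xi_n$ is $\mathcal{B}^\infty_{k_n+1}$-measurable and then by Lemma 3.0.1 and Lemma 2.1.1 in Iosifescu and Grigorescu (1990) we have

\[ \lim_{n \to \infty} \left( \int_D h \, d(\gamma \tilde\xi_n^{-1}) - \int_D h \, d(\mu \tilde\xi_n^{-1}) \right) = 0 \tag{3.0.9} \]

for any bounded continuous real-valued function $h$ on $D$. On the other hand (see Section A1.6), for any $n \in \mathbb{N}_+$ we have

\[ d_0(\xi_n, \tilde\xi_n) \le \sup_{t \in I} |\xi_n(t) - \tilde\xi_n(t)| \le \max_{1 \le k \le k_n} |S_{nk}|. \]

It then follows from (3.0.8) that

\[ d_0(\xi_n, \tilde\xi_n) \text{ converges to } 0 \text{ in } \gamma\text{-probability as } n \to \infty. \tag{3.0.10} \]

Hence as $\mu \ll \gamma$ we also have that

\[ d_0(\xi_n, \tilde\xi_n) \text{ converges to } 0 \text{ in } \mu\text{-probability as } n \to \infty. \tag{3.0.11} \]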


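We can now conclude the proof using (3.0.9) through (3.0.11). If, for example, $\gamma \xi_n^{-1} \xrightarrow{w} \nu$ for some $\nu \in \mathrm{pr}(\mathcal{B}_D)$, then it follows from (3.0.10) that $\gamma \tilde\xi_n^{-1} \xrightarrow{w} \nu$, too. Next, (3.0.9) implies that $\mu \tilde\xi_n^{-1} \xrightarrow{w} \nu$, which in conjunction with (3.0.11) yields $\mu \xi_n^{-1} \xrightarrow{w} \nu$. □

Remark. Lemma 3.0.2 still holds when the process $\xi_n$ is replaced by the process $\xi_n^C = (\xi_n^C(t))_{t \in I}$ defined by

\[ \xi_n^C(t) = S_{n \lfloor nt \rfloor} + (nt - \lfloor nt \rfloor)\bigl( S_{n(\lfloor nt \rfloor + 1)} - S_{n \lfloor nt \rfloor} \bigr), \quad t \in I, \]

with the convention $S_{n0} = 0$, $n \in \mathbb{N}_+$. □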

3.1 The Poisson law

3.1.1 The case of incomplete quotients

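Let $\theta \in \mathbb{R}_{++}$ and $\alpha \in \mathbb{R}$ be arbitrarily given. Consider the array

\[ X = \{X_{nj},\ 1 \le j \le n,\ n \in \mathbb{N}_+\}, \]

where

\[ X_{nj} = \Bigl( \frac{a_j}{n} \Bigr)^{\alpha} I_{(a_j > \theta n)}. \tag{3.1.1} \]

For this array we have

\[ S_{nk} = n^{-\alpha} \sum_{j=1}^{k} a_j^{\alpha} I_{(a_j > \theta n)}, \quad 1 \le k \le n, \quad S_n = S_{nn}, \quad n \in \mathbb{N}_+. \tag{3.1.2} \]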

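Proposition 3.1.1 The array (3.1.1) is s.i. under $\gamma$.

Proof. We only consider the case $\alpha \in \mathbb{R}_{++}$. The other cases can be treated similarly. We have

\[
\begin{aligned}
\gamma(|S_{nk}| > \varepsilon)
&\le \sum_{j=1}^{k} \gamma\Bigl( |X_{nj}| > \frac{\varepsilon}{k} \Bigr) = k\, \gamma\Bigl( |X_{n1}| > \frac{\varepsilon}{k} \Bigr) \\
&= k\, \gamma\Bigl( a_1 > n \max\Bigl( \theta, \Bigl( \frac{\varepsilon}{k} \Bigr)^{1/\alpha} \Bigr) \Bigr) \\
&\le k\, \gamma(a_1 > n\theta), \quad 1 \le k \le n.
\end{aligned}
\]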


Hence $X_{n1}$ converges in $\gamma$-probability to 0 as $n \to \infty$, and for any $0 < a < 1$ we have

\[ \limsup_{n \to \infty} \max_{1 \le k \le an} \gamma(|S_{nk}| > \varepsilon) \le \lim_{n \to \infty} a\, n\, \gamma(a_1 > n\theta) = \frac{a}{\theta \log 2}, \]

which is less than 1 if we choose

\[ 0 < a < \min(1, \theta \log 2). \]

On account of Proposition A3.6 the proof is complete. □
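The limit $n\,\gamma(a_1 > n\theta) \to 1/(\theta \log 2)$ used in this proof can be checked numerically. The sketch below assumes the standard identity $\gamma(a_1 \ge k) = \log(1 + 1/k)/\log 2$ for $k \in \mathbb{N}_+$; the function names and the value $\theta = 0.5$ are illustrative choices, not part of the text.

```python
from math import floor, log, log1p


def gauss_tail(m):
    """gamma(a_1 > m) for real m >= 0, using gamma(a_1 >= k) = log2(1 + 1/k)."""
    k = floor(m) + 1             # a_1 > m  <=>  a_1 >= floor(m) + 1
    return log1p(1.0 / k) / log(2.0)


def scaled_tail(n, theta):
    """n * gamma(a_1 > theta * n), whose limit is 1 / (theta * log 2)."""
    return n * gauss_tail(theta * n)


theta = 0.5                                    # hypothetical threshold parameter
limit = 1.0 / (theta * log(2.0))
values = [scaled_tail(n, theta) for n in (10**2, 10**4, 10**6)]
```

The values increase towards the limit from below, since $\log(1+y) < y$ and $n/(\lfloor n\theta \rfloor + 1) < 1/\theta$.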

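Theorem 3.1.2 We have

\[ \gamma S_n^{-1} \xrightarrow{w} \nu \quad \text{in } \mathcal{B}, \tag{3.1.3} \]

where:

(i) if $\alpha \in \mathbb{R}_{++}$ then $\nu = \mathrm{Pois}\,\rho$ with

\[ \frac{d\rho}{d\lambda}(x) = \delta_x\bigl((\theta^{\alpha}, \infty)\bigr)\, \frac{x^{-1-1/\alpha}}{\alpha \log 2}, \quad x \in \mathbb{R}; \]

(ii) if $-\alpha \in \mathbb{R}_{++}$ then $\nu = \mathrm{Pois}\,\rho$ with

\[ \frac{d\rho}{d\lambda}(x) = -\delta_x\bigl((0, \theta^{\alpha})\bigr)\, \frac{x^{-1-1/\alpha}}{\alpha \log 2}, \quad x \in \mathbb{R}; \]

(iii) if $\alpha = 0$ then $\nu = \mathrm{Pois}\bigl((\theta \log 2)^{-1} \delta_1\bigr)$, that is, $\nu$ is the Poisson distribution $P\bigl((\theta \log 2)^{-1}\bigr)$ with parameter $(\theta \log 2)^{-1}$.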

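Proof. We only prove (i), the proofs of (ii) and (iii) being completely similar.

Consider the measures $\mu_n$ on $\mathcal{B}$ defined by

\[ \mu_n(A) = \gamma\Bigl( \Bigl( \frac{a_1}{n} \Bigr)^{\alpha} \in A,\ a_1 > \theta n \Bigr), \quad A \in \mathcal{B},\ n \in \mathbb{N}_+. \]

Clearly,

\[ \mu_n(\mathbb{R}) = \gamma(a_1 > \theta n) \le 1, \quad \mu_n([-\theta^{\alpha}, \theta^{\alpha}]) = 0, \quad n \in \mathbb{N}_+, \]

and

\[ \gamma(X_{n1} \in A) = \gamma(a_1 \le \theta n)\, \delta_0(A) + \mu_n(A), \quad A \in \mathcal{B},\ n \in \mathbb{N}_+. \]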


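Also, for any $x \in \mathbb{R}$ we have

\[
\begin{aligned}
\lim_{n \to \infty} n\, \mu_n((x, \infty))
&= \lim_{n \to \infty} n\, \gamma\bigl( a_1 > n (\max(x, \theta^{\alpha}))^{1/\alpha} \bigr) \\
&= \frac{1}{\log 2} \lim_{n \to \infty} n \log\left( 1 + \frac{1}{\lfloor n (\max(x, \theta^{\alpha}))^{1/\alpha} \rfloor + 1} \right) \\
&= \frac{1}{\log 2}\, \frac{1}{(\max(x, \theta^{\alpha}))^{1/\alpha}} = \rho((x, \infty)).
\end{aligned}
\]

Finally,

\[ \lim_{n \to \infty} n\, \mu_n(\mathbb{R}) = \lim_{n \to \infty} n\, \gamma(a_1 > n\theta) = \frac{1}{\theta \log 2} = \rho(\mathbb{R}). \]

Therefore all hypotheses of Theorem A3.10 are fulfilled, and (3.1.3) holds. □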
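As a consistency check on case (i), here is a short worked computation. It is a sketch under the assumption that $\rho$ has Lebesgue density $t^{-1-1/\alpha}/(\alpha \log 2)$ on $(\theta^{\alpha}, \infty)$ and vanishes elsewhere. For $x \ge \theta^{\alpha}$,

\[ \rho((x, \infty)) = \int_x^{\infty} \frac{t^{-1-1/\alpha}}{\alpha \log 2}\, dt = \frac{1}{\alpha \log 2} \Bigl[ -\alpha\, t^{-1/\alpha} \Bigr]_{x}^{\infty} = \frac{x^{-1/\alpha}}{\log 2}, \]

which agrees with the limit of $n\,\mu_n((x, \infty))$ obtained in the proof. Taking $x = \theta^{\alpha}$ gives the total mass $\rho(\mathbb{R}) = 1/(\theta \log 2)$.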

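Now, on account of Proposition 3.1.1, Theorem 3.1.2, Lemma 3.0.2, and Theorem A3.7 we can state the following result. (See Section A3.3 for notation.)

Corollary 3.1.3 Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$. Then $\mu \xi_n^{-1} \xrightarrow{w} Q_\nu$ in $\mathcal{B}_D$, hence $\mu S_n^{-1} \xrightarrow{w} \nu$ in $\mathcal{B}$, where $\xi_n = (S_{n \lfloor nt \rfloor})_{t \in I}$, with the convention $S_{n0} = 0$, $n \in \mathbb{N}_+$.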

3.1.2 The case of associated random variables

We shall now show that both Theorem 3.1.2 and Corollary 3.1.3 still hold when $a_j$ is replaced by either $y_j$, $r_j$, or $u_j$, $1 \le j \le n$, in (3.1.1) and (3.1.2). This will follow from the result below.

Lemma 3.1.4 Let $b_n$, $n \in \mathbb{N}_+$, be real-valued random variables on $(I, \mathcal{B}_I)$ such that

\[ a_n \le b_n \le a_n + c, \quad n \in \mathbb{N}_+, \]

for some $c \in \mathbb{R}_+$. For any $n \in \mathbb{N}_+$ consider the stochastic processes $\xi_n = (S_{n \lfloor nt \rfloor})_{t \in I}$ and $\xi'_n = (S'_{n \lfloor nt \rfloor})_{t \in I}$, where $S_{nk}$, $1 \le k \le n$, is defined by (3.1.2) and

\[ S'_{nk} = n^{-\alpha} \sum_{j=1}^{k} b_j^{\alpha} I_{(b_j > \theta n)}, \quad 1 \le k \le n, \]

with the convention $S'_{n0} = 0$. Then $d_0(\xi_n, \xi'_n)$ converges to 0 in $\gamma$-probability as $n \to \infty$.

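Proof. For any $n \in \mathbb{N}_+$ we have

\[ d_0(\xi_n, \xi'_n) \le \sup_{t \in I} |S_{n \lfloor nt \rfloor} - S'_{n \lfloor nt \rfloor}| \le \sum_{j=1}^{n} |\delta_{nj}|, \]

where

\[ \delta_{nj} = n^{-\alpha} \bigl( b_j^{\alpha} I_{(b_j > \theta n)} - a_j^{\alpha} I_{(a_j > \theta n)} \bigr), \quad 1 \le j \le n. \]

Notice that $(a_j > \theta n) \subset (b_j > \theta n)$, $1 \le j \le n$, and put

\[ \delta'_n = n^{-\alpha} \sum_{j=1}^{n} b_j^{\alpha} \bigl( I_{(b_j > \theta n)} - I_{(a_j > \theta n)} \bigr) = n^{-\alpha} \sum_{j=1}^{n} b_j^{\alpha} I_{(b_j > \theta n,\ a_j \le \theta n)}, \]

\[ \delta''_n = n^{-\alpha} \sum_{j=1}^{n} |b_j^{\alpha} - a_j^{\alpha}|\, I_{(a_j > \theta n)}. \]

Then $\sum_{j=1}^{n} |\delta_{nj}| \le \delta'_n + \delta''_n$, and we are going to prove that $\delta'_n$ and $\delta''_n$ both converge to 0 in $\gamma$-probability as $n \to \infty$.

We have

\[ \gamma(\delta'_n > 0) \le n\, \gamma(\theta n - c < a_1 \le \theta n) \to 0 \]

as $n \to \infty$ while

\[ \delta''_n \le c_\alpha\, n^{-1} \Bigl( n^{-(\alpha - 1)} \sum_{j=1}^{n} a_j^{\alpha - 1} I_{(a_j > \theta n)} \Bigr), \]

where

\[ c_\alpha = \begin{cases} c\,\alpha\,(1 + c)^{\alpha - 1} & \text{if } \alpha \ge 1, \\ c\,|\alpha| & \text{if } \alpha < 1. \end{cases} \]

[We have used the inequality $(1 + a)^{\alpha} - 1 \le a\bigl( \alpha + \lfloor \alpha \rfloor (1 + a)^{\alpha - 1} \bigr)$, valid for non-negative $a$ and $\alpha$, which implies $1 - (1 + a)^{-\alpha} \le a\alpha$.] By Theorem 3.1.2, $\delta''_n$ converges to 0 in $\gamma$-probability as $n \to \infty$. It follows that $d_0(\xi_n, \xi'_n)$ is dominated by the sum of two non-negative random variables both converging in $\gamma$-probability to 0 as $n \to \infty$. The proof is complete. □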

Corollary 3.1.5 Let $b_n$ denote either $y_n$, $r_n$, or $u_n$, $n \in \mathbb{N}_+$. Put

\[ S'_{nk} = n^{-\alpha} \sum_{j=1}^{k} b_j^{\alpha} I_{(b_j > \theta n)}, \quad 1 \le k \le n, \]

and for any $n \in \mathbb{N}_+$ consider the stochastic process $\xi'_n = (S'_{n \lfloor nt \rfloor})_{t \in I}$, with the convention $S'_{n0} = 0$. Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$. Then $\mu \xi_n'^{-1} \xrightarrow{w} Q_\nu$ in $\mathcal{B}_D$, hence $\mu (S'_{nn})^{-1} \xrightarrow{w} \nu$ in $\mathcal{B}$.

Proof. Lemma 3.1.4 applies with $c = 1$ in the case of $y_n$ and $r_n$ and with $c = 2$ in the case of $u_n$. Since $\mu \ll \gamma$, the distance $d_0(\xi_n, \xi'_n)$ converges to 0 in $\mu$-probability, too, as $n \to \infty$. This property and Corollary 3.1.3 imply the result stated. □

Let $b_n$ denote either $a_n$, $y_n$, $r_n$ or $u_n$, $n \in \mathbb{N}_+$, and consider the special case $\alpha = 0$. By Corollaries 3.1.3 and 3.1.5, under any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$, the random variable

\[ S'_n = \sum_{j=1}^{n} I_{(b_j > \theta n)} \]

is asymptotically $P\bigl((\theta \log 2)^{-1}\bigr)$ as $n \to \infty$. It is possible to estimate the rate of convergence of $\gamma(S'_n = k)$, $k \in \mathbb{N}$, to its Poisson limit. The following result holds.

Theorem 3.1.6 Let $k \in \mathbb{N}$ and $0 < \delta < 1$ be fixed. We have

\[ \bigl| \gamma(S'_n = k) - e^{-\theta} \theta^{k}/k! \bigr| \le c \exp\bigl( -(\log n)^{\delta} \bigr), \quad n \in \mathbb{N}_+, \]

for $\theta = O(n^{a})$, $0 \le a < 1$, where $c$ only depends, perhaps, on $\delta$, $a$, and $k$.

The proof for the case $b_n = a_n$, $n \in \mathbb{N}_+$, $k = 0$, can be found in Philipp (1976, p. 382), where the proviso $\theta = O(n^{a})$, $0 \le a \le 1$, does not appear. Cf. Galambos (1972) and Iosifescu (1978, p. 35).

3.1.3 Some extreme value theory

Throughout this subsection let again $b_n$ denote either $a_n$, $y_n$, $r_n$ or $u_n$, $n \in \mathbb{N}_+$. For $1 \le k \le n$ let $M_n^{(k)}$ be the $k$th largest of $b_1, \dots, b_n$. Clearly, $M_n^{(1)} = M_n$ is the maximum of $b_1, \dots, b_n$. The asymptotic distribution of $M_n^{(k)}$ as $n \to \infty$ for any fixed $k$ can be easily obtained from previous results as shown below.

Proposition 3.1.7 Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$. For any fixed $k \in \mathbb{N}_+$ we have

\[ \lim_{n \to \infty} \mu\Bigl( \frac{M_n^{(k)} \log 2}{n} \le x \Bigr) = e^{-1/x} \sum_{j=0}^{k-1} \frac{x^{-j}}{j!}, \quad x \in \mathbb{R}_{++}. \tag{3.1.4} \]

In particular,

\[ \lim_{n \to \infty} \mu\Bigl( \frac{M_n \log 2}{n} \le x \Bigr) = e^{-1/x}, \quad x \in \mathbb{R}_{++}. \]

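Proof. Let $1 \le k \le n$. It is easy to see that $S'_n = \sum_{j=1}^{n} I_{(b_j > \theta n)}$ is less than $k$ if and only if $M_n^{(k)}$ does not exceed $\theta n$, that is,

\[ \bigl( M_n^{(k)} \le \theta n \bigr) = \bigl( S'_n < k \bigr) \tag{3.1.5} \]

for any $\theta \in \mathbb{R}_{++}$ and $n \in \mathbb{N}_+$. Hence, by Corollaries 3.1.3 and 3.1.5,

\[ \mu\bigl( M_n^{(k)} \le \theta n \bigr) = \mu\bigl( S'_n < k \bigr) = \sum_{j=0}^{k-1} \mu\bigl( S'_n = j \bigr) \to e^{-(\theta \log 2)^{-1}} \sum_{j=0}^{k-1} \frac{1}{j!\,(\theta \log 2)^{j}} \]

as $n \to \infty$ for any fixed $k \in \mathbb{N}_+$. Putting $x = \theta \log 2$ we obtain the result stated. □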
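The event identity (3.1.5) is purely combinatorial, so it can be checked on any finite list of values. The following sketch does that and also evaluates the limit distribution function in (3.1.4). The sample values and the function names are hypothetical and chosen only for illustration.

```python
from math import exp, factorial


def kth_largest(values, k):
    """Return M_n^(k), the k-th largest entry of values (k = 1 is the maximum)."""
    return sorted(values, reverse=True)[k - 1]


def exceedances(values, level):
    """Return S'_n, the number of entries strictly above level."""
    return sum(1 for b in values if b > level)


def limit_cdf(x, k):
    """Right-hand side of (3.1.4): exp(-1/x) * sum_{j<k} x**(-j) / j!."""
    return exp(-1.0 / x) * sum(x ** (-j) / factorial(j) for j in range(k))


sample = [3, 1, 12, 1, 2, 7, 1, 5]             # hypothetical values b_1, ..., b_8
identity_holds = all(
    (kth_largest(sample, k) <= level) == (exceedances(sample, level) < k)
    for k in range(1, len(sample) + 1)
    for level in range(0, 14)
)
```

Here `level` plays the role of $\theta n$; the check runs over every rank $k$ and every integer level from 0 to 13.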

Remark. The limit distribution for the special case $k = 1$ is known as the Type II Extreme Value distribution for sequences of i.i.d. random variables. See, e.g., de Haan (1970). The same result can also be obtained from general results of Loynes (1965) for mixing strictly stationary sequences. □

In what follows we give some almost sure asymptotic properties of $M_n$ due to Philipp (1976), which improve upon results of Galambos (1974). We start with an F. Bernstein type theorem (see Proposition 1.3.16).

Proposition 3.1.8 Let $(c_n)_{n \in \mathbb{N}_+}$ be a non-decreasing sequence of positive numbers. Then

\[ \gamma(M_n \ge c_n \text{ i.o.}) \]

is either 0 or 1 according as the series $\sum_{n \in \mathbb{N}_+} 1/c_n$ converges or diverges.

Proof. We have $(b_n \ge c_n \text{ i.o.}) \subset (M_n \ge c_n \text{ i.o.})$ since $b_n(\omega) \ge c_n$ for some $n \in \mathbb{N}_+$ and $\omega \in \Omega$ implies $M_n(\omega) \ge c_n$. Conversely, if $M_n(\omega) \ge c_n$ for some $n \in \mathbb{N}_+$ and $\omega \in \Omega$, then there exists $n' \le n$ such that $M_n(\omega) = b_{n'}(\omega) \ge c_n \ge c_{n'}$. Hence $(M_n \ge c_n \text{ i.o.}) \subset (b_n \ge c_n \text{ i.o.})$. Therefore $(M_n \ge c_n \text{ i.o.}) = (b_n \ge c_n \text{ i.o.})$, and the conclusion follows from Corollary 1.3.17. □


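Corollary 3.1.9 Let $(c_n)_{n \in \mathbb{N}_+}$ be as in Proposition 3.1.8. Then either

\[ \lim_{n \to \infty} \frac{M_n}{c_n} = 0 \quad \text{a.e.} \tag{3.1.6} \]

or

\[ \limsup_{n \to \infty} \frac{M_n}{c_n} = \infty \quad \text{a.e.} \tag{3.1.7} \]

according as the series $\sum_{n \in \mathbb{N}_+} 1/c_n$ converges or diverges.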

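Proof. First, assume that $s = \sum_{n \in \mathbb{N}_+} 1/c_n < \infty$. Choose positive numbers $d_n$, $n \in \mathbb{N}_+$, with $\lim_{n \to \infty} d_n = \infty$ such that $\sum_{n \in \mathbb{N}_+} d_n/c_n < \infty$. This is always possible. Indeed, put $s_n = \sum_{i=1}^{n} 1/c_i$, $n \in \mathbb{N}_+$, and define

\[ E_1 = \{ j \in \mathbb{N}_+ : s_j \le 3s/4 \}, \]

\[ E_n = \Bigl\{ j \in \mathbb{N}_+ : 3s \sum_{i=1}^{n-1} 4^{-i} < s_j \le 3s \sum_{i=1}^{n} 4^{-i} \Bigr\}, \quad n \ge 2. \]

Consider the increasing sequence $(n_k)_{k \in \mathbb{N}_+}$ of indices $n$ for which $E_n \ne \emptyset$ and take $d_j = 2^{n_{k-1}}$ if $j \in E_{n_k}$, $k \in \mathbb{N}_+$, with $n_0 = 0$. Then we have

\[ \sum_{j \in E_{n_k}} 1/c_j \le 3s \bigl( 4^{-n_k} + 4^{-(n_k - 1)} + \dots + 4^{-n_{k-1}} \bigr) \le 4^{-n_{k-1} + 1} s, \quad k \in \mathbb{N}_+, \]

hence

\[ \sum_{n \in \mathbb{N}_+} d_n/c_n = \sum_{k \in \mathbb{N}_+} \sum_{j \in E_{n_k}} d_j/c_j \le 4s \sum_{k \in \mathbb{N}_+} 2^{-n_{k-1}} \le 8s. \]

By Proposition 3.1.8 we have

\[ \gamma\Bigl( \frac{M_n}{c_n} \ge \frac{1}{d_n} \text{ i.o.} \Bigr) = 0, \]

which is equivalent to (3.1.6).

Second, assume that $\sum_{n \in \mathbb{N}_+} 1/c_n = \infty$. Choose positive numbers $d_n$, $n \in \mathbb{N}_+$, with $\lim_{n \to \infty} d_n = 0$ such that $\sum_{n \in \mathbb{N}_+} d_n/c_n = \infty$. This is again always possible. Indeed, put $s_n = \sum_{i=1}^{n} 1/c_i$, $n \in \mathbb{N}_+$, and define

\[ E_1 = \{ j \in \mathbb{N}_+ : s_j \le 4 \}, \]

\[ E_n = \{ j \in \mathbb{N}_+ : 4^{n-1} < s_j \le 4^{n} \}, \quad n \ge 2. \]

Consider the increasing sequence $(n_k)_{k \in \mathbb{N}_+}$ of indices $n$ for which $E_n \ne \emptyset$ and take $d_j = 2^{-n_{k-1}}$ if $j \in E_{n_k} \cup E_{n_{k+1}}$, $k = 1, 3, \dots$, with $n_0 = 0$. Then

\[ \sum_{j \in E_{n_k} \cup E_{n_{k+1}}} 1/c_j \ge 4^{n_k} - 4^{n_{k-1}} \ge 3 \cdot 4^{n_{k-1}}, \]

whence

\[ \sum_{j \in E_{n_k} \cup E_{n_{k+1}}} d_j/c_j \ge 3 \cdot 2^{n_{k-1}}, \quad k = 1, 3, \dots. \]

Clearly, this implies $\sum_{n \in \mathbb{N}_+} d_n/c_n = \infty$. By Proposition 3.1.8 we have

\[ \gamma\Bigl( \frac{M_n}{c_n} \ge \frac{1}{d_n} \text{ i.o.} \Bigr) = 1, \]

which is equivalent to (3.1.7). □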

Theorem 3.1.10 Let $(c_n)_{n \in \mathbb{N}_+}$ be a non-decreasing sequence of positive numbers such that the sequence $(n/c_n)_{n \in \mathbb{N}_+}$ is non-decreasing. Then

\[ \gamma\Bigl( M_n \le \frac{n}{c_n \log 2} \text{ i.o.} \Bigr) \]

is either 0 or 1 according as the series

\[ \sum_{n \in \mathbb{N}_+} \frac{\log \log n}{n \exp c_n} \]

converges or diverges.

The proof is completely similar to that given for the i.i.d. case in Barndorff–Nielsen (1961). Theorem 3.1.6 plays an essential part in the present case. For details in the case $b_n = a_n$, $n \in \mathbb{N}_+$, see Philipp (1976, pp. 384–385). □

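Corollary 3.1.11 We have

\[ \limsup_{n \to \infty}\ (\liminf_{n \to \infty})\ \frac{\log M_n - \log n}{\log \log n} = 1\ (0) \quad \text{a.e.}, \]

whence

\[ \lim_{n \to \infty} \frac{\log M_n}{\log n} = 1 \quad \text{a.e.} \]

Proof. For the lim sup case we should show that for any $\varepsilon > 0$ we have

\[ \gamma\Bigl( \frac{\log M_n - \log n}{\log \log n} \ge 1 + \varepsilon \text{ i.o.} \Bigr) = 0 \]

and

\[ \gamma\Bigl( \frac{\log M_n - \log n}{\log \log n} \ge 1 - \varepsilon \text{ i.o.} \Bigr) = 1 \]

or, equivalently,

\[ \gamma\bigl( M_n \ge n (\log n)^{1 + \varepsilon} \text{ i.o.} \bigr) = 0 \]

and

\[ \gamma\bigl( M_n \ge n (\log n)^{1 - \varepsilon} \text{ i.o.} \bigr) = 1. \]

These equations clearly hold by Proposition 3.1.8.

For the lim inf case we should show that for any $\varepsilon > 0$ we have

\[ \gamma\Bigl( \frac{\log M_n - \log n}{\log \log n} \le \varepsilon \text{ i.o.} \Bigr) = 1 \]

and

\[ \gamma\Bigl( \frac{\log M_n - \log n}{\log \log n} \le -\varepsilon \text{ i.o.} \Bigr) = 0 \]

or, equivalently,

\[ \gamma\bigl( M_n \le n (\log n)^{\varepsilon} \text{ i.o.} \bigr) = 1 \]

and

\[ \gamma\bigl( M_n \le n (\log n)^{-\varepsilon} \text{ i.o.} \bigr) = 0. \]

It is easy to check that these equations hold by Theorem 3.1.10. □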

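Corollary 3.1.12 We have

\[ \liminf_{n \to \infty} \frac{M_n \log \log n}{n} = \frac{1}{\log 2} \quad \text{a.e.} \]

Proof. We should show that for any $\varepsilon > 0$ we have

\[ \gamma\Bigl( \frac{M_n \log \log n}{n} - \frac{1}{\log 2} \le \varepsilon \text{ i.o.} \Bigr) = 1 \]

and

\[ \gamma\Bigl( \frac{M_n \log \log n}{n} - \frac{1}{\log 2} \le -\varepsilon \text{ i.o.} \Bigr) = 0 \]

or, equivalently,

\[ \gamma\Bigl( M_n \le \frac{n (1 + \varepsilon')}{(\log \log n)(\log 2)} \text{ i.o.} \Bigr) = 1 \]

and

\[ \gamma\Bigl( M_n \le \frac{n (1 - \varepsilon')}{(\log \log n)(\log 2)} \text{ i.o.} \Bigr) = 0, \]

where $\varepsilon' = \varepsilon \log 2$. This follows immediately from Theorem 3.1.10. □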
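To spell out how Theorem 3.1.10 is applied here, the following is a sketch; the specific choice of $c_n$ is ours and is assumed valid for $n$ large enough that $\log \log n > 0$ and both monotonicity conditions hold, with $0 < \varepsilon' < 1$. Take

\[ c_n = \frac{\log \log n}{1 \pm \varepsilon'}, \qquad \text{so that} \qquad \frac{n}{c_n \log 2} = \frac{n (1 \pm \varepsilon')}{(\log \log n)(\log 2)}, \qquad \exp c_n = (\log n)^{1/(1 \pm \varepsilon')}. \]

The series of Theorem 3.1.10 then becomes

\[ \sum_{n} \frac{\log \log n}{n\, (\log n)^{1/(1 \pm \varepsilon')}}. \]

With the plus sign the exponent $1/(1 + \varepsilon')$ is less than 1 and the series diverges, which gives probability 1. With the minus sign the exponent $1/(1 - \varepsilon')$ exceeds 1 and the series converges, which gives probability 0.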


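To conclude this subsection we consider the $k$th smallest $m_n^{(k)}$ of $b_1, \dots, b_n$, $1 \le k \le n$, $n \in \mathbb{N}_+$. Clearly, $m_n^{(1)} = M_n^{(n)}$. In general, we have $m_n^{(k)} = M_n^{(n-k+1)}$, $1 \le k \le n$. Then by (3.1.5) we have

\[ \bigl( m_n^{(k)} \le \theta n \bigr) = \bigl( S'_n < n - k + 1 \bigr) \]

for any $\theta \in \mathbb{R}_{++}$ and $n \in \mathbb{N}_+$. Hence, for any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$,

\[ \mu\bigl( m_n^{(k)} \le \theta n \bigr) = \mu\bigl( S'_n < n - k + 1 \bigr) = \sum_{j=0}^{n-k} \mu(S'_n = j) = 1 - \sum_{j=n-k+1}^{n} \mu(S'_n = j). \]

Since $n^{-1} S'_n$ converges to 0 in $\mu$-probability as $n \to \infty$ by Corollaries 3.1.3 and 3.1.5, we have

\[ \lim_{n \to \infty} \mu\bigl( S'_n = n - m \bigr) = 0 \]

for any fixed $m \in \mathbb{N}$. Consequently,

\[ \lim_{n \to \infty} \mu\bigl( m_n^{(k)} \le \theta n \bigr) = 1 \tag{3.1.8} \]

for any fixed $k \in \mathbb{N}_+$. This result is not at all surprising. Indeed, by Proposition 4.1.1 we have

\[ \lim_{n \to \infty} a_n^{(k)} = 1 \quad \text{a.e.} \]

for any fixed $k \in \mathbb{N}_+$, where $a_n^{(k)}$ denotes the $k$th smallest of $a_1, \dots, a_n$. As $m_n^{(k)} \le a_n^{(k)} + 2$, $n \in \mathbb{N}_+$, $1 \le k \le n$, it follows that

\[ \lim_{n \to \infty} \frac{m_n^{(k)}}{n} = 0 \quad \text{a.e.} \]

for any fixed $k \in \mathbb{N}_+$, which clearly entails (3.1.8).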

Remark. It is proved in Iosifescu (1977) that if $(\eta_n)_{n \in \mathbb{N}_+}$ is a strictly stationary $\psi$-mixing sequence of positive random variables on a probability space $(\Omega, \mathcal{K}, P)$ such that for some real-valued function $g$ on $\mathbb{N}_+$ there exists the positive finite limit

\[ \lim_{n \to \infty} n P(\eta_n < g(n)) = \theta, \]

say, then $P(\eta_k < g(n) \text{ for } p \text{ values } k,\ 1 \le k \le n) \to e^{-\theta} \theta^{p}/p!$ as $n \to \infty$ for any fixed $p \in \mathbb{N}$.

In particular this result applies to a sequence $(\eta_n)_{n \in \mathbb{N}_+}$ for which

\[ P(\eta_1 \ge x) = \log(1 + 1/x)/\log 2, \quad x \ge 1, \]

with

\[ g(n) = 1 + \frac{2\theta \log 2}{n}, \quad n \in \mathbb{N}_+. \]

For such a sequence, similarly to (3.1.4) we can write

\[ \lim_{n \to \infty} P\Bigl( \frac{n (\eta_n^{(k)} - 1)}{2 \log 2} \ge x \Bigr) = e^{-x} \sum_{j=0}^{k-1} \frac{x^{j}}{j!}, \quad x \in \mathbb{R}_{++}, \tag{3.1.9} \]

for any fixed $k \in \mathbb{N}_+$, where $\eta_n^{(k)}$ denotes the $k$th smallest of $\eta_1, \dots, \eta_n$, $1 \le k \le n$.

We cannot assert that (3.1.9) is true for $\eta_n = a_n$, $n \in \mathbb{N}_+$, since the equation $\gamma(a_1 \ge x) = \log(1 + 1/x)/\log 2$ holds just for $x \in \mathbb{N}_+$. It is conjectured in Iosifescu (1978) that (3.1.9) holds true for $\eta_n = r_n$, $n \in \mathbb{N}_+$, under any $P \ll \lambda$. [Notice that $\gamma(r_1 \ge x) = \log(1 + 1/x)/\log 2$ for any $x \ge 1$, but the sequence $(r_n)_{n \in \mathbb{N}_+}$ is not $\psi$-mixing under $\gamma$.] □
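The normalisation $g(n) = 1 + 2\theta \log 2 / n$ in this remark can be sanity-checked numerically: with $P(\eta_1 \ge x) = \log(1 + 1/x)/\log 2$ one expects $n P(\eta_1 < g(n)) \to \theta$. The sketch below assumes exactly that tail formula; the function names and the value $\theta = 1.5$ are illustrative only.

```python
from math import log, log1p


def eta_cdf_below(x):
    """P(eta_1 < x) for x >= 1, from P(eta_1 >= x) = log(1 + 1/x) / log 2."""
    return 1.0 - log1p(1.0 / x) / log(2.0)


def g(n, theta):
    """Threshold g(n) = 1 + 2 * theta * log(2) / n from the remark."""
    return 1.0 + 2.0 * theta * log(2.0) / n


theta = 1.5                                    # hypothetical Poisson parameter
scaled = [n * eta_cdf_below(g(n, theta)) for n in (10**2, 10**4, 10**6)]
```

The entries of `scaled` approach `theta` from below, which is the hypothesis $\lim_{n\to\infty} n P(\eta_n < g(n)) = \theta$ of the cited result.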

3.2 Normal convergence

3.2.1 Two general invariance principles

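Assume the framework of Subsection 2.1.5. Thus let $H$ be a real-valued function on $\mathbb{N}_+^{\mathbb{Z}}$. Set $\bar H_l = \bar H_1 \circ \bar\tau^{\,l-1}$, $l \in \mathbb{Z}$, where

\[ \bar H_1 = H(\dots, \bar a_{-2}, \bar a_{-1}, \bar a_0, \bar a_1, \bar a_2, \dots). \]

Then $(\bar H_l)_{l \in \mathbb{Z}}$ is a strictly stationary process on $(I^2, \mathcal{B}_I^2, \bar\gamma)$. Set $S_0 = 0$, $S_n = \sum_{i=1}^{n} \bar H_i - n E_{\bar\gamma} \bar H_1$, $n \in \mathbb{N}_+$, assuming that the mean value $E_{\bar\gamma} \bar H_1$ exists and is finite. For any $n \in \mathbb{N}_+$ let us define the stochastic processes $\xi_n^C = (\xi_n^C(t))_{t \in I}$ and $\xi_n^D = (\xi_n^D(t))_{t \in I}$ by

\[ \xi_n^C(t) = \frac{1}{\sigma \sqrt{n}} \bigl( S_{\lfloor nt \rfloor} + (nt - \lfloor nt \rfloor)(\bar H_{\lfloor nt \rfloor + 1} - E_{\bar\gamma} \bar H_1) \bigr), \]

\[ \xi_n^D(t) = \frac{1}{\sigma \sqrt{n}} S_{\lfloor nt \rfloor}, \quad t \in I, \]

where $\sigma = \sigma(H)$ is a positive number which will be specified later.

We start with a weak invariance principle.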

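Theorem 3.2.1 Assume that $E_{\bar\gamma} \bar H_1^2 < \infty$ and

\[ \sum_{n \in \mathbb{N}_+} E_{\bar\gamma}^{1/2} \bigl[ \bar H_1 - E_{\bar\gamma}(\bar H_1 \mid \bar a_{-n}, \dots, \bar a_n) \bigr]^2 < \infty \tag{3.2.1} \]

so that by Propositions 2.1.19 and 2.1.21

\[ \lim_{n \to \infty} \frac{1}{n} E_{\bar\gamma} S_n^2 = \sigma^2 \ge 0 \]

exists finitely and is given by the absolutely convergent series

\[ \sigma^2 = E_{\bar\gamma} \bar H_1^2 - E_{\bar\gamma}^2 \bar H_1 + 2 \sum_{n \in \mathbb{N}_+} \bigl( E_{\bar\gamma} \bar H_1 \bar H_{n+1} - E_{\bar\gamma}^2 \bar H_1 \bigr). \tag{3.2.2} \]

If $\sigma > 0$ then $\bar\gamma \xi_n^{-1} \xrightarrow{w} W$ in both $C$ and $D$, where $\xi_n$ stands for either $\xi_n^C$ or $\xi_n^D$. The last conclusion still holds when $\bar\gamma$ is replaced by any $\mu \in \mathrm{pr}(\mathcal{B}_I^2)$ such that $\mu \ll \lambda^2$.

Proof. This is a transcription of Theorem 21.1 in Billingsley (1968), with an improvement by Popescu (1978) (concerning the possibility of replacing $\bar\gamma$ by $\mu$), for the special case of the doubly infinite sequence $(\bar a_l)_{l \in \mathbb{Z}}$. Note that in Proposition 2.1.22 a class of functions $H$ is indicated, for which (3.2.1) holds. □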

Next, we state a strong invariance principle.

Theorem 3.2.2 Assume that there exist constants $0 < \delta \le 2$ and $c > 0$ such that $E_{\bar\gamma} |\bar H_1|^{2+\delta} < \infty$ and

\[ E_{\bar\gamma}^{1/(2+\delta)} \bigl| \bar H_1 - E_{\bar\gamma}(\bar H_1 \mid \bar a_{-n}, \dots, \bar a_n) \bigr|^{2+\delta} \le c\, n^{-(2 + 7/\delta)}, \quad n \in \mathbb{N}_+, \tag{3.2.3} \]

so that (3.2.1) holds and

\[ \lim_{n \to \infty} \frac{1}{n} E_{\bar\gamma} S_n^2 = \sigma^2 \ge 0 \]

exists finitely and is given by the absolutely convergent series (3.2.2). If $\sigma > 0$ then the strong invariance principle holds for the stochastic processes $\xi_n^C$ and $\xi_n^D$, $n \in \mathbb{N}_+$. That is, without changing their distributions, we can redefine these processes on a common richer probability space together with a standard Brownian motion process $(w(t))_{t \in I}$ such that

\[ \sup_{t \in I} |\xi_n(t) - w(t)| = O(n^{-a}) \quad \text{a.s.} \]

as $n \to \infty$, with a random constant implied in $O$, for each $a > 0$ small enough, depending on $\delta$. Here $\xi_n$ stands for either $\xi_n^C$ or $\xi_n^D$.

Proof. This is a transcription of Theorem 7.1.1 in Philipp and Stout (1975) for the special case of the doubly infinite sequence $(\bar a_l)_{l \in \mathbb{Z}}$. □

For further reference we also consider the special case where $H$ only depends on the coordinates with positive indices of a current point in $\mathbb{N}_+^{\mathbb{Z}}$, i.e., $H$ is a real-valued function on $\mathbb{N}_+^{\mathbb{N}_+}$. (Completely similar considerations can be made in the case where $H$ only depends on the coordinates with non-positive indices of a current point in $\mathbb{N}_+^{\mathbb{Z}}$, i.e., $H$ is a real-valued function on $\mathbb{N}_+^{(-\mathbb{N})}$.) In this case we set $H_n = H_1 \circ \tau^{n-1}$, $n \in \mathbb{N}_+$, where

\[ H_1 = H(a_1, a_2, \dots), \]

and we have a strictly stationary sequence $(H_n)_{n \in \mathbb{N}_+}$ on $(I, \mathcal{B}_I, \gamma)$. With the same definitions as before for $S_n$, $\xi_n^C$ and $\xi_n^D$, $n \in \mathbb{N}_+$, where $E_{\bar\gamma} \bar H_1$ is replaced by $E_\gamma H_1$, we can state the following special cases of Theorems 3.2.1 and 3.2.2.

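Theorem 3.2.1′ Assume that $E_\gamma H_1^2 < \infty$ and

\[ \sum_{n \in \mathbb{N}_+} E_\gamma^{1/2} \bigl[ H_1 - E_\gamma(H_1 \mid a_1, \dots, a_n) \bigr]^2 < \infty \tag{3.2.1′} \]

so that

\[ \lim_{n \to \infty} \frac{1}{n} E_\gamma S_n^2 = \sigma^2 \ge 0 \]

exists finitely and is given by the absolutely convergent series

\[ \sigma^2 = E_\gamma H_1^2 - E_\gamma^2 H_1 + 2 \sum_{n \in \mathbb{N}_+} \bigl( E_\gamma H_1 H_{n+1} - E_\gamma^2 H_1 \bigr). \tag{3.2.2′} \]

If $\sigma > 0$ then $\gamma \xi_n^{-1} \xrightarrow{w} W$ in both $C$ and $D$, where $\xi_n$ stands for either $\xi_n^C$ or $\xi_n^D$. The last conclusion still holds when $\gamma$ is replaced by any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$.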

Note that inequality (2.1.32) and Proposition 2.1.23 describe two classes of functions $H$ for which (3.2.1′) holds.

Theorem 3.2.2′ Assume that there exist constants $0 < \delta \le 2$ and $c > 0$ such that $E_\gamma |H_1|^{2+\delta} < \infty$ and

\[ E_\gamma^{1/(2+\delta)} \bigl| H_1 - E_\gamma(H_1 \mid a_1, \dots, a_n) \bigr|^{2+\delta} \le c\, n^{-(2 + 7/\delta)}, \quad n \in \mathbb{N}_+, \tag{3.2.3′} \]

so that (3.2.1′) holds and

\[ \lim_{n \to \infty} \frac{1}{n} E_\gamma S_n^2 = \sigma^2 \ge 0 \]

exists finitely and is given by the absolutely convergent series (3.2.2′). If $\sigma > 0$ then the strong invariance principle holds for the stochastic processes $\xi_n^C$ and $\xi_n^D$, $n \in \mathbb{N}_+$. That is, without changing their distributions, we can redefine these processes on a common richer probability space together with a standard Brownian motion process $(w(t))_{t \in I}$ such that

\[ \sup_{t \in I} |\xi_n(t) - w(t)| = O(n^{-a}) \quad \text{a.s.} \]

as $n \to \infty$, with a random constant implied in $O$, for each $a > 0$ small enough, depending on $\delta$. Here $\xi_n$ stands for either $\xi_n^C$ or $\xi_n^D$.

3.2.2 The case of incomplete quotients

An important special case of Theorem 3.2.1′ is obtained when the function $H$ only depends on finitely many coordinates of a current point of $\mathbb{N}_+^{\mathbb{N}_+}$, i.e., when $H$ is a real-valued function on $\mathbb{N}_+^{k}$ for a given $k \in \mathbb{N}_+$. In this case $H_n = H(a_n, \dots, a_{n+k-1})$, $n \in \mathbb{N}_+$, assumption (3.2.1′) is trivially satisfied, and by Corollary 1.2.5 we have

\[ E_\gamma H_1^{r} = \frac{1}{\log 2} \sum_{i^{(k)} \in \mathbb{N}_+^{k}} H^{r}(i^{(k)}) \log \frac{1 + v(i^{(k)})}{1 + u(i^{(k)})} \]

with $r = 1$ or 2, and

\[ \sigma^2 = E_\gamma H_1^2 - E_\gamma^2 H_1 + 2 \sum_{n \in \mathbb{N}_+} \Biggl( \sum_{i^{(n+k)} \in \mathbb{N}_+^{n+k}} \frac{H(i^{(k)})\, H(i_{n+1}, \dots, i_{n+k})}{\log 2} \log \frac{1 + v(i^{(n+k)})}{1 + u(i^{(n+k)})} - E_\gamma^2 H_1 \Biggr). \tag{3.2.2″} \]

Note that in the case $k = 1$ by either Corollary 2.1.25 or Proposition A3.4 we have $\sigma = 0$ if and only if $H = \text{const}$. It is an open problem to find necessary and sufficient conditions in terms of $H$ for $\sigma = 0$ to hold in the case $k > 1$.

The special framework assumed allows for an estimate of the convergence rate in the classical central limit theorem. Thus we have the following result.


Theorem 3.2.3 If $\sigma > 0$ and

\[ E_\gamma |H_1|^{2+\delta} = \frac{1}{\log 2} \sum_{i^{(k)} \in \mathbb{N}_+^{k}} \bigl| H(i^{(k)}) \bigr|^{2+\delta} \log \frac{1 + v(i^{(k)})}{1 + u(i^{(k)})} < \infty \]

for some $\delta > 0$, then there exist two positive constants $a < 1$ and $c$ such that

\[ \left| \gamma\Bigl( \frac{\sum_{j=1}^{n} H_j - n E_\gamma H_1}{\sigma \sqrt{n}} < x \Bigr) - \Phi(x) \right| \le c\, n^{-a} \]

for any $x \in \mathbb{R}$ and $n \in \mathbb{N}_+$.

Proof. This is a transcription of Theorem 1 in Iosifescu (1968) for the special case of the sequence $(a_n)_{n \in \mathbb{N}_+}$ of incomplete quotients. □

Remark. It is an open problem to determine the optimal value of $a$ in Theorem 3.2.3. We conjecture that $a = \delta/2$, that is, the same value as in the case of i.i.d. random variables with finite $(2 + \delta)$-absolute moment. □

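In what follows, by restricting the class of functions $H$ we give more precise results in the case $k = 1$. To emphasize this special framework we change the notation by using the letter $f$ instead of $H$.

Theorem 3.2.4 Let $f : \mathbb{N}_+ \to \mathbb{R}$, $A_n \in \mathbb{R}$, $B_n \in \mathbb{R}_{++}$, $n \in \mathbb{N}_+$, with $\lim_{n \to \infty} B_n = \infty$, and define

\[ X_{nj} = B_n^{-1} (f(a_j) - A_n), \quad 1 \le j \le n, \]

\[ S_{n0} = 0, \quad S_{nk} = \sum_{j=1}^{k} X_{nj}, \quad 1 \le k \le n, \quad S_{nn} = S_n, \quad n \in \mathbb{N}_+, \]

\[ \tilde F(x) = \frac{1}{\log 2} \sum_{k : |f(k)| \le x} f^2(k)\, k^{-2}, \]

\[ F(x) = E_\gamma f^2(a_1) I_{(|f(a_1)| \le x)} = \frac{1}{\log 2} \sum_{k : |f(k)| \le x} f^2(k) \log\Bigl( 1 + \frac{1}{k(k+2)} \Bigr), \quad x \in \mathbb{R}_+. \]

(i) The following assertions are equivalent.

(I) The stochastic process $\xi_n^D = \xi_n = (\xi_n(t))_{t \in I}$ defined for any $n \in \mathbb{N}_+$ by $\xi_n(t) = S_{n \lfloor nt \rfloor}$, $t \in I$, satisfies

\[ \gamma \xi_n^{-1} \xrightarrow{w} W_D \quad \text{in } \mathcal{B}_D, \]

where $W_D$ is the Wiener measure on $\mathcal{B}_D$.

(II) $\gamma S_n^{-1} \xrightarrow{w} N(0, 1)$, and the array $X = \{X_{nj},\ 1 \le j \le n,\ n \in \mathbb{N}_+\}$ is s.i. under $\gamma$.

(ii) When $\lim_{x \to \infty} F(x) = E_\gamma f^2(a_1) = \infty$, assertion (I) above holds with a bounded sequence $(A_n)_{n \in \mathbb{N}_+}$ if and only if

\[ \lim_{x \to \infty} \frac{x^2 \sum_{k : |f(k)| > x} k^{-2}}{\sum_{k : |f(k)| \le x} f^2(k)\, k^{-2}} = 0 \tag{3.2.4} \]

or, equivalently (see Theorem A2.5), if and only if $F$ is slowly varying. If this is the case, then we can take $A_n = E_\gamma f(a_1)$, $n \in \mathbb{N}_+$, and any sequence $(B_n)_{n \in \mathbb{N}_+}$ such that $\lim_{n \to \infty} n B_n^{-2} F(B_n) = 1$.

When $E_\gamma f^2(a_1) < \infty$, assertion (I) holds with a bounded sequence $(A_n)_{n \in \mathbb{N}_+}$ if and only if $f \ne \text{const}$. If this is the case, then we can take $A_n = E_\gamma f(a_1)$ and $B_n = \sqrt{n}\, \sigma_{(0)} E_\gamma^{1/2} f^2(a_1)$, $n \in \mathbb{N}_+$, for some $\sigma_{(0)} > 0$.

(iii) If either (I) or (II) holds, then $\gamma$ can be replaced in (i) by any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$.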

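Proof. (i) and (iii) follow from Theorem A3.7 and Lemma 3.0.2, respectively.

We thus have to only prove (ii). First, since

\[ \lim_{k \to \infty} \frac{\log\bigl( 1 + \frac{1}{k(k+2)} \bigr)}{k^{-2}} = 1, \]

either $F$ and $\tilde F$ both tend to $\infty$ as $x \to \infty$ and $\lim_{x \to \infty} F(x)/\tilde F(x) = 1$ or both have finite limits as $x \to \infty$. Consequently, $\tilde F$ is slowly varying if and only if $F$ is.

Assume that (3.2.4) holds. Note that this does always happen when

\[ 0 < E_\gamma f^2(a_1) = \lim_{x \to \infty} F(x) < \infty. \]

Then Theorem A3.12 applies with $X_n = f(a_n)$, $n \in \mathbb{N}_+$, and

\[ m^2(X_1) = \begin{cases} \dfrac{E_\gamma^2 f(a_1)}{E_\gamma f^2(a_1)} & \text{if } E_\gamma f^2(a_1) < \infty, \\[1ex] 0 & \text{if } E_\gamma f^2(a_1) = \infty, \end{cases} \]

\[ \varphi^{(0)}_1 = 1, \quad \varphi^{(0)}_n = \begin{cases} \dfrac{E_\gamma f(a_1) f(a_n)}{E_\gamma f^2(a_1)} & \text{if } E_\gamma f^2(a_1) < \infty, \\[1ex] 0 & \text{if } E_\gamma f^2(a_1) = \infty \end{cases} \]

for $n \ge 2$ [use Proposition A3.1 and equation (A3.2)], and $\sigma^2_{(0)}$ equals either

\[ \frac{E_\gamma f^2(a_1) - E_\gamma^2 f(a_1) + 2 \sum_{n \in \mathbb{N}_+} \bigl( E_\gamma f(a_1) f(a_{n+1}) - E_\gamma^2 f(a_1) \bigr)}{E_\gamma f^2(a_1)} \]

or 1 according as $E_\gamma f^2(a_1) < \infty$ or $E_\gamma f^2(a_1) = \infty$. Noting that when $E_\gamma f^2(a_1) < \infty$ by Corollary 2.1.25 we have $\sigma_{(0)} \ne 0$ if and only if $f \ne \text{const}$, we conclude that with $A_n$ and $B_n$, $n \in \mathbb{N}_+$, as indicated we have $\gamma \xi_n^{-1} \xrightarrow{w} W_D$, that is, (I) holds with a bounded sequence $(A_n)_{n \in \mathbb{N}_+}$.

Next, assume that (I) or, equivalently, (II) holds with a bounded sequence $(A_n)_{n \in \mathbb{N}_+}$. Clearly, this cannot happen if $f = \text{const}$. It thus remains to show that $F$ is slowly varying when

\[ \lim_{x \to \infty} F(x) = \infty. \tag{3.2.5} \]

Fix $\delta \in (0, 1)$ and put $X_{nj\delta} = X_{nj} I_{(|X_{nj}| \le \delta)} - E_\gamma X_{nj} I_{(|X_{nj}| \le \delta)}$ for any $1 \le j \le n$, $n \in \mathbb{N}_+$. As $\gamma S_n^{-1} \xrightarrow{w} N(0, 1)$ by (II), it follows from Theorem A3.11(i) that

\[ \lim_{n \to \infty} E_\gamma \Bigl( \sum_{j=1}^{n} X_{nj\delta} \Bigr)^2 = 1. \tag{3.2.6} \]

On the other hand, it follows from Corollary A3.2 that

\[ E_\gamma \Bigl( \sum_{j=1}^{n} X_{nj\delta} \Bigr)^2 \le \Bigl( 1 + 2 \sum_{k \in \mathbb{N}_+} \psi(k) \Bigr) n E_\gamma X_{n1}^2 I_{(|X_{n1}| \le \delta)}, \quad n \in \mathbb{N}_+. \tag{3.2.7} \]

Now, note that $|f(i) - A_n| \le \delta B_n$ entails

\[ |f(i)| \le |A_n| + \delta B_n = B_n \bigl( |A_n| B_n^{-1} + \delta \bigr) \le B_n \]

for any $n$ large enough since $\delta \in (0, 1)$, $(A_n)_{n \in \mathbb{N}_+}$ is bounded, and $\lim_{n \to \infty} B_n = \infty$. Then for such an $n$ we have

\[ E_\gamma X_{n1}^2 I_{(|X_{n1}| \le \delta)} \le B_n^{-2} E_\gamma (f(a_1) - A_n)^2 I_{(|f(a_1)| \le B_n)} \le 2 B_n^{-2} \bigl( F(B_n) + A_n^2 \bigr), \]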


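whence, by (3.2.5),

\[ E_\gamma X_{n1}^2 I_{(|X_{n1}| \le \delta)} \le 4 B_n^{-2} F(B_n) \tag{3.2.8} \]

for any $n$ large enough. It follows from (3.2.6) through (3.2.8) that there exist $c > 0$ and $n_0 \in \mathbb{N}_+$ such that

\[ n B_n^{-2} F(B_n) \ge c, \quad n \ge n_0. \tag{3.2.9} \]

Finally, by Theorem A3.11 we also have

\[ \lim_{n \to \infty} n\, \gamma(|X_{n1}| > \varepsilon) = 0 \]

for any $\varepsilon > 0$. Since

\[ (|X_{n1}| > \varepsilon) = (|f(a_1) - A_n| > \varepsilon B_n) \supset (|f(a_1)| > |A_n| + \varepsilon B_n) \]

and $\lim_{n \to \infty} (|A_n| + \varepsilon B_n)/\varepsilon B_n = 1$, we then have

\[ \lim_{n \to \infty} n\, \gamma(|f(a_1)| > B_n) = 0. \tag{3.2.10} \]

It follows from (3.2.9) and (3.2.10) that

\[ \lim_{n \to \infty} \frac{B_n^2\, \gamma(|f(a_1)| > B_n)}{E_\gamma f^2(a_1) I_{(|f(a_1)| \le B_n)}} = 0. \]

Noting that $\lim_{n \to \infty} B_{n+1}/B_n = 1$ (this follows from, e.g., Theorem A3.9, but a direct proof can be also easily given), the last equation implies

\[ \lim_{x \to \infty} \frac{x^2\, \gamma(|f(a_1)| > x)}{E_\gamma f^2(a_1) I_{(|f(a_1)| \le x)}} = 0, \]

which shows by Theorem A2.5 that $F$ is slowly varying.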

Remarks. 1. Theorem 3.2.4 still holds if we replace $D$ by $C$, $W_D$ by $W_C$, and the stochastic process $\xi_n^D$ by the stochastic process $\xi_n^C$ defined by

\[ \xi_n^C(t) = S_{n \lfloor nt \rfloor} + (nt - \lfloor nt \rfloor)\bigl( S_{n(\lfloor nt \rfloor + 1)} - S_{n \lfloor nt \rfloor} \bigr), \quad t \in I,\ n \in \mathbb{N}_+. \]

This follows from Theorem A3.8.

2. For the many consequences of Theorem 3.2.4 (as well as of other similar further results) concerning, e.g., the asymptotic behaviour as $n \to \infty$ of random variables such as $\min_{0 \le k \le n} S_{nk}$, $\max_{0 \le k \le n} S_{nk}$, $\max_{0 \le k \le n} |S_{nk}|$, $U_n$ = number of indices $k$, $1 \le k \le n$, for which $S_{nk} > 0$, we refer the reader to Billingsley (1968, § 11). In particular, in the last case we have an arc-sine law

\[ \lim_{n \to \infty} \mu\Bigl( \frac{U_n}{n} < a \Bigr) = \frac{2}{\pi} \arcsin \sqrt{a}, \quad 0 \le a \le 1, \]

for any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$. □

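Example 3.2.5 Let $f(n) = n^{a + 1/2}$, $n \in \mathbb{N}_+$, with $a \in \mathbb{R}$. Clearly, for $a < 0$ we have $E_\gamma f^2(a_1) < \infty$. For $a = 0$ we have $E_\gamma f^2(a_1) = \infty$, $F(x) \sim 2 \log x/\log 2$, and $x^2 \sum_{k : |f(k)| > x} k^{-2} = O(1)$ as $x \to \infty$. Thus (3.2.4) holds and we can take

\[ A_n = E_\gamma \bigl( a_1^{1/2} \bigr) = \frac{1}{\log 2} \sum_{k \in \mathbb{N}_+} k^{1/2} \log\Bigl( 1 + \frac{1}{k(k+2)} \Bigr) \]

and $B_n = (n \log n/\log 2)^{1/2}$, $n \in \mathbb{N}_+$. It is easy to check that

\[ \frac{\zeta(3/2)}{6 \log 2} < A_n < \frac{\zeta(3/2)}{\log 2} \]

and that we can also write

\[ A_n = \frac{1}{\log 2} \sum_{k \ge 2} \bigl( 2\sqrt{k - 1} - \sqrt{k} - \sqrt{k - 2} \bigr) \log k, \quad n \in \mathbb{N}_+. \]

Finally, for $a > 0$ we have $F(x) \sim x^{4a/(2a+1)}/(2a \log 2)$ and $x^2 \sum_{k : |f(k)| > x} k^{-2} \sim x^{4a/(2a+1)}$ as $x \to \infty$, that is, (3.2.4) does not hold. □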
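The bounds on $A_n$ for $a = 0$ can be explored numerically. The sketch below sums the defining series for $E_\gamma(a_1^{1/2})$ with the weights $\gamma(a_1 = k) = \log(1 + 1/(k(k+2)))/\log 2$, and compares it with the series rearranged by summation by parts. That the rearranged series carries the same $1/\log 2$ normalisation is an assumption made here for consistency with the first expression; the function names, the truncation point, and the numerical value of $\zeta(3/2)$ are likewise supplied for the illustration.

```python
from math import fsum, log, log1p, sqrt


def gauss_pmf(k):
    """gamma(a_1 = k) = log(1 + 1/(k(k + 2))) / log 2."""
    return log1p(1.0 / (k * (k + 2))) / log(2.0)


def mean_sqrt_digit(terms):
    """Partial sum over k <= terms of sqrt(k) * gamma(a_1 = k); a lower bound for A_n."""
    return fsum(sqrt(k) * gauss_pmf(k) for k in range(1, terms + 1))


def rearranged(terms):
    """Partial sum of (2 sqrt(k-1) - sqrt(k) - sqrt(k-2)) log k / log 2 over 2 <= k <= terms."""
    return fsum(
        (2.0 * sqrt(k - 1) - sqrt(k) - sqrt(k - 2)) * log(k)
        for k in range(2, terms + 1)
    ) / log(2.0)


ZETA_3_2 = 2.6123753486854883               # zeta(3/2), quoted to double precision
lower = ZETA_3_2 / (6.0 * log(2.0))
upper = ZETA_3_2 / log(2.0)
approx = mean_sqrt_digit(100_000)
```

Both series converge slowly, with terms of order $k^{-3/2}$, so the two partial sums agree only to about two decimal places at this truncation; the gap is a boundary term of order $\log K/\sqrt{K}$.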

As a special case of Theorem 3.2.2′ we note the following result.

Proposition 3.2.6 Let $f : \mathbb{N}_+ \to \mathbb{R}$ be a non-constant function. Assume that there exists a constant $\delta > 0$ such that $E_\gamma |f(a_1)|^{2+\delta} < \infty$. Put $S_0 = 0$, $S_n = \sum_{i=1}^{n} f(a_i) - n E_\gamma f(a_1)$, $n \in \mathbb{N}_+$. Let

\[ \sigma^2 = E_\gamma f^2(a_1) - E_\gamma^2 f(a_1) + 2 \sum_{n \in \mathbb{N}_+} \bigl( E_\gamma f(a_1) f(a_{n+1}) - E_\gamma^2 f(a_1) \bigr), \]

which by Corollary 2.1.25 is positive. Then the strong invariance principle holds for the stochastic processes $\xi_n^C$ and $\xi_n^D$, $n \in \mathbb{N}_+$. That is, without changing their distributions we can redefine these processes on a common richer probability space together with a standard Brownian motion process $(w(t))_{t \in I}$ such that

\[ \sup_{t \in I} |\xi_n(t) - w(t)| = O(n^{-a}) \quad \text{a.s.} \tag{3.2.11} \]

as $n \to \infty$, with a random constant implied in $O$, for each $a > 0$ small enough, depending on $\delta$. Here $\xi_n$ stands for either $\xi_n^C$ or $\xi_n^D$.

Remark. It follows from a general result of Heyde and Scott (1973) that if we only assume $E_\gamma f^2(a_1) < \infty$, then instead of (3.2.11) we can only assert that

\[ \sup_{t \in I} |\xi_n(t) - w(t)| = o\bigl( (\log \log n)^{1/2} \bigr) \quad \text{a.s.} \]

as $n \to \infty$, with a random constant implied in $o$. □

3.2.3 The case of associated random variables

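Write $b_n$ for either $y_n$, $r_n$ or $u_n$, $n \in \mathbb{N}_+$, respectively $\bar b_l$ for either $\bar y_l$, $\bar r_l$ or $\bar u_l$, $l \in \mathbb{Z}$. We now give a partial extension of Theorem 3.2.4 to the sequence $(b_n)_{n \in \mathbb{N}_+}$ in the case of infinite variance.

Theorem 3.2.7 Assume $f : [1, \infty) \to \mathbb{R}_+$ is regularly varying of index $1/2$, $E_\gamma f^2(a_1) = \infty$, and $f(x) = x^{1/2} L(x)$, where $L(x) = c \exp\bigl( \int_1^x \varepsilon(t)\, t^{-1} dt \bigr)$, $x \ge 1$, with $c > 0$, $\varepsilon : [1, \infty) \to \mathbb{R}_+$ continuous, and $\lim_{t \to \infty} \varepsilon(t) = 0$. For any $n \in \mathbb{N}_+$ define the stochastic process $\xi'_n = (\xi'_n(t))_{t \in I}$ by

\[ \xi'_n(t) = \frac{1}{B_n} \sum_{j \le \lfloor nt \rfloor} \bigl( f(b_j) - E_{\bar\gamma} f(\bar b_0) \bigr), \quad t \in I, \]

with the usual convention which assigns value 0 to a sum over the empty set, where $(B_n)_{n \in \mathbb{N}_+}$ is any sequence satisfying $\lim_{n \to \infty} n B_n^{-2} F(B_n) = 1$ with $F$ defined as in Theorem 3.2.4, and $E_{\bar\gamma} f(\bar b_0)$ is equal to

\[ E_{\bar\gamma} f(\bar y_0) = \frac{1}{\log 2} \int_1^{\infty} \frac{f(x)\, dx}{x(x+1)}, \qquad E_{\bar\gamma} f(\bar r_0) = E_\gamma f(r_1) = \frac{1}{\log 2} \int_1^{\infty} \frac{f(x)\, dx}{x(x+1)} \]

or

\[ E_{\bar\gamma} f(\bar u_0) = \frac{1}{\log 2} \left( \int_1^{2} \frac{(x - 1) f(x)\, dx}{x^2} + \int_2^{\infty} \frac{f(x)\, dx}{x^2} \right) \]

according as $b_n$ denotes $y_n$, $r_n$ or $u_n$, $n \in \mathbb{N}_+$. Then

\[ \mu \xi_n'^{-1} \xrightarrow{w} W_D \quad \text{in } \mathcal{B}_D \]

for any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$.

The proof of Theorem 3.2.7 for the cases where $b_n = r_n$ or $b_n = u_n$, $n \in \mathbb{N}_+$, can be found in Samur (1989, pp. 75–77). The case where $b_n = y_n$, $n \in \mathbb{N}_+$, can be treated in a similar manner. □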


We note that the hypothesis of a slowly varying $F$ occurring in Theorem 3.2.4 is replaced here by stronger hypotheses. [By Corollary A2.7(ii) the assumptions on $f$ imply that $F$ is slowly varying.] And even the Karamata representation of $f$ is assumed to present special features (compare with Theorem A2.1).

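Example 3.2.8 Let $f(x) = x^{1/2}$, $x \in [1,\infty)$ (cf. Example 3.2.5). Theorem 3.2.7 holds with $B_n = (n\log n/\log 2)^{1/2}$, $n \in \mathbf{N}_+$, and

$$E_\gamma f(y_0) = E_\gamma f(r_1) = \frac{1}{\log 2}\int_1^\infty \frac{dx}{\sqrt{x}\,(x+1)} = \frac{\pi}{2\log 2},$$

$$E_\gamma f(u_0) = \frac{1}{\log 2}\left(\int_1^2 \frac{(x-1)\,dx}{x^{3/2}} + \int_2^\infty \frac{dx}{x^{3/2}}\right) = \frac{4(\sqrt{2}-1)}{\log 2}.$$

□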

The next result covers the case of finite variance.

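Theorem 3.2.9 Let $f : [1,\infty) \to \mathbf{R}$. Assume that either

(i) $f$ satisfies a Lipschitz condition of order $0 < \varepsilon \le 1$, that is,

$$\sup_{x \ne y,\ x,y \ge 1} \frac{|f(x)-f(y)|}{|x-y|^{\varepsilon}} := s_\varepsilon(f) < \infty,$$

and $\int_1^\infty |f(x)|^{2+\delta}x^{-2}\,dx < \infty$ for some $\delta \ge 0$

or

(ii) $f = I_{(b,\infty)}$ for some $b > 1$.

Put $S'_0 = 0$, $S'_n = \sum_{i=1}^n \left(f(b_i) - E_\gamma f(b_0)\right)$, $n \in \mathbf{N}_+$, and for any $n \in \mathbf{N}_+$ define the stochastic processes $\xi'^{C}_n = (\xi'^{C}_n(t))_{t \in I}$ and $\xi'^{D}_n = (\xi'^{D}_n(t))_{t \in I}$ on $(I, \mathcal{B}_I, \gamma)$ by

$$\xi'^{C}_n(t) = \frac{1}{\sigma(f)\sqrt{n}}\left(S'_{\lfloor nt \rfloor} + (nt - \lfloor nt \rfloor)\left(f(b_{\lfloor nt \rfloor + 1}) - E_\gamma f(b_0)\right)\right),$$

$$\xi'^{D}_n(t) = \frac{S'_{\lfloor nt \rfloor}}{\sigma(f)\sqrt{n}}, \quad t \in I,$$

where $\sigma(f)$ is a positive number which is defined by (3.2.12) below. Then

$$\lim_{n\to\infty} \frac{1}{n} E_\gamma\left(\sum_{i=1}^n \left(f(b_i) - E_\gamma f(b_0)\right)\right)^2 = \sigma^2(f) \ge 0 \tag{3.2.12}$$

exists finitely. If $\sigma(f) > 0$ then

(a) assuming that $\delta = 0$, for any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$ we have

$$\mu\xi_n'^{-1} \xrightarrow{w} W \quad \text{in both } \mathcal{B}_C \text{ and } \mathcal{B}_D,$$

where $\xi'_n$ stands for either $\xi'^{C}_n$ or $\xi'^{D}_n$;

(b) assuming that $\delta > 0$, the strong invariance principle holds for the stochastic processes $\xi'^{C}_n$ and $\xi'^{D}_n$, $n \in \mathbf{N}_+$. That is, without changing their distributions we can redefine these processes on a richer common probability space together with a standard Brownian motion process $(w(t))_{t \in I}$ such that

$$\sup_{t \in I} \left|\xi'_n(t) - w(t)\right| = O(n^{-a}) \quad \text{a.s.}$$

as $n \to \infty$, with a random constant implied in $O$, for each $a > 0$ small enough, depending on $\delta$. Here $\xi'_n$ stands for either $\xi'^{C}_n$ or $\xi'^{D}_n$.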

Proof. We shall show that (a) and (b) follow from Theorems 3.2.1 and 3.2.2, respectively. We use the notation of Subsection 2.1.5. Define

$$H\left((i_l)_{l \in \mathbf{Z}}\right) = f\left(b_1([i_1, i_2, \cdots], [i_0, i_{-1}, \cdots])\right), \quad (i_l)_{l \in \mathbf{Z}} \in \mathbf{N}_+^{\mathbf{Z}},$$

$$H_1 = H\left((a_l)_{l \in \mathbf{Z}}\right), \quad H_m = H_1 \circ \tau^{m-1}, \quad m \in \mathbf{N}_+. \tag{3.2.13}$$

Hence

$$h(\omega,\theta) = \begin{cases} f(1/\theta) & \text{in the case where } b_l = y_l,\ l \in \mathbf{Z}, \\ f(1/\omega) & \text{in the case where } b_l = r_l,\ l \in \mathbf{Z}, \\ f(\theta + 1/\omega) & \text{in the case where } b_l = u_l,\ l \in \mathbf{Z} \end{cases}$$

for $(\omega,\theta) \in \Omega^2$. Also, as in the proof of Proposition 2.1.22 we easily obtain

$$E_\gamma\left|H_1 - E_\gamma(H_1 \mid a_{-n}, \cdots, a_n)\right|^{2+\delta} = \sum_{i_{-n},\cdots,i_n \in \mathbf{N}_+} \frac{1}{\gamma^{2+\delta}(I^2(i_{-n},\cdots,i_n))} \int_{I^2(i_{-n},\cdots,i_n)} \gamma(d\omega', d\theta') \times \left|\int_{I^2(i_{-n},\cdots,i_n)} \left(h(\omega',\theta') - h(\omega,\theta)\right)\gamma(d\omega, d\theta)\right|^{2+\delta}. \tag{3.2.14}$$

Now, under (i) it is easy to check that $h$ satisfies an inequality of the form (2.1.30), which yields $c_n \le cr^n$, $n \in \mathbf{N}_+$, for some $c > 0$ and $0 < r < 1$, with $c_n$, $n \in \mathbf{N}_+$, defined as in Proposition 2.1.22. It follows from (3.2.14) that

$$E_\gamma^{1/(2+\delta)}\left|H_1 - E_\gamma(H_1 \mid a_{-n}, \cdots, a_n)\right|^{2+\delta} \le cr^n, \quad n \in \mathbf{N}_+.$$

Hence (3.2.3) clearly holds.

Next, we are going to show that under (ii) condition (3.2.3) also holds. In the case where $b_l = y_l$, $l \in \mathbf{Z}$, for any given $n \in \mathbf{N}_+$ there is at most one fundamental interval $I(i_0, i_{-1}, \ldots, i_{-n})$ such that $1/b \in I(i_0, i_{-1}, \ldots, i_{-n})$. Similarly, in the case where $b_l = r_l$, $l \in \mathbf{Z}$, for any given $n \in \mathbf{N}_+$ there is at most one fundamental interval $I(i_1, \ldots, i_n)$ such that $1/b \in I(i_1, \ldots, i_n)$. Therefore by (3.2.14) in both these cases $E_\gamma\left|H_1 - E_\gamma(H_1 \mid a_{-n}, \ldots, a_n)\right|^{2+\delta}$ does not exceed $(F_nF_{n+1}\log 2)^{-1}$ for all $n \in \mathbf{N}_+$, hence (3.2.3) holds. In the case where $b_l = u_l$, $l \in \mathbf{Z}$, the last integral in (3.2.14) may be different from 0 only for those rectangles $I^2(i_{-n}, \ldots, i_n)$ which are intersected by the hyperbola $y + 1/x = 1/b$. It is easy to see that for $n$ large enough the total Euclidean area of them does not exceed $(F_nF_{n+1})^{-1}$ so that (3.2.3) holds in this case, too.

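To prove (a) note that for $\delta = 0$ by Theorem 3.2.1 we have

$$\mu\xi_n^{-1} \xrightarrow{w} W \quad \text{in both } \mathcal{B}_C \text{ and } \mathcal{B}_D \tag{3.2.15}$$

for any $\mu \in \mathrm{pr}(\mathcal{B}_I^2)$ such that $\mu \ll \lambda^2$, where $\xi_n$ stands for either $\xi_n^C$ or $\xi_n^D$ defined as in Section 3.2.1, for our special $H$ given by (3.2.13) and with $\sigma(f) = \sigma(H)$ defined by (3.2.12). But

$$\left|b_n(\omega) - b_n(\omega,\theta)\right| \le (F_{n-1}F_n)^{-1}, \quad n \in \mathbf{N}_+,\ (\omega,\theta) \in \Omega^2.$$

[In the case where $b_n = r_n$, $n \in \mathbf{N}_+$, we even have $b_n(\omega) = b_n(\omega,\theta)$, $n \in \mathbf{N}_+$, $(\omega,\theta) \in \Omega^2$.] Thus under (i) we have

$$\sup_{t \in I}\left|\xi'_n(t,\omega) - \xi_n(t,(\omega,\theta))\right| \le \frac{1}{\sigma(f)\sqrt{n}} \max_{1 \le i \le n}\left|S'_i(\omega) - S_i(\omega,\theta)\right|$$

$$\le \frac{1}{\sigma(f)\sqrt{n}} \sum_{i=1}^n \left|f(b_i(\omega)) - f(b_i(\omega,\theta))\right|$$

$$\le \frac{s_\varepsilon(f)}{\sigma(f)\sqrt{n}} \sum_{i=1}^n \left|b_i(\omega) - b_i(\omega,\theta)\right|^{\varepsilon} = O\left(n^{-1/2}\right)$$

as $n \to \infty$, with a non-random constant independent of $(\omega,\theta) \in \Omega^2$ implied in $O$, while under (ii) it is easy to see that

$$\sup_{t \in I}\left|\xi'_n(t,\omega) - \xi_n(t,(\omega,\theta))\right| \le \frac{1}{\sigma(f)\sqrt{n}} \sum_{i=1}^n \left|I_{(b,\infty)}(b_i(\omega)) - I_{(b,\infty)}(b_i(\omega,\theta))\right| \le \frac{O(1)}{\sigma(f)\sqrt{n}} = O\left(n^{-1/2}\right) \quad \gamma\text{-a.s.}$$

with a random constant implied in $O$. Therefore in both cases

$$\sup_{t \in I}\left|\xi'_n(t,\omega) - \xi_n(t,(\omega,\theta))\right| = O\left(n^{-1/2}\right) \quad \mu\text{-a.s.} \tag{3.2.16}$$

for any $\mu \in \mathrm{pr}(\mathcal{B}_I^2)$ such that $\mu \ll \lambda^2$. Now, (3.2.15) and (3.2.16) imply at once that

$$\mu\xi_n'^{-1} \xrightarrow{w} W \quad \text{in both } \mathcal{B}_C \text{ and } \mathcal{B}_D$$

for any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$.

To prove (b) note that for $\delta > 0$ by Theorem 3.2.2 we have

$$\sup_{t \in I}|\xi_n(t) - w(t)| = O(n^{-a}) \quad \text{a.s.}$$

as $n \to \infty$. By (3.2.16) it is obvious that the strong invariance principle holds as stated for the stochastic processes $\xi'^{C}_n$ or $\xi'^{D}_n$, $n \in \mathbf{N}_+$. □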

In the case where $b_n = r_n$, $n \in \mathbf{N}_+$, under different assumptions on $f$, we can derive from Theorems 3.2.1′ and 3.2.2′ the following result.

Theorem 3.2.10 Let $f : [1,\infty) \to \mathbf{R}$ and define the function $g$ by $g(u) = f(1/u)$, $u \in (0,1]$. Assume that $g$ is a function of bounded $p$-variation, $p \ge 1$. Put

$$S'_0 = 0, \quad S'_n = \sum_{i=1}^n f(r_i) - nE_\gamma f(r_1), \quad n \in \mathbf{N}_+.$$

Then the series

$$\sigma^2(f) = \int_I g^2\,d\gamma - \left(\int_I g\,d\gamma\right)^2 + 2\sum_{n \in \mathbf{N}_+}\left(\int_I g\,U^n g\,d\gamma - \left(\int_I g\,d\gamma\right)^2\right)$$

converges absolutely. If $\sigma(f) \ne 0$ then both the weak and strong invariance principles hold as described in Theorems 3.2.1′ and 3.2.2′ for the stochastic processes $\xi'^{C}_n$ and $\xi'^{D}_n$, $n \in \mathbf{N}_+$, defined as in Theorem 3.2.9 with $b_n = r_n$, $n \in \mathbf{N}_+$.

Proof. In this case the function $H$ considered in Theorems 3.2.1′ and 3.2.2′ is defined by

$$H(i_1, i_2, \ldots) = g([i_1, i_2, \ldots]), \quad (i_n)_{n \in \mathbf{N}_+} \in \mathbf{N}_+^{\mathbf{N}_+}.$$

It follows from Proposition 2.1.23 and its proof that both (3.2.1′) and (3.2.3′) hold in our special case, hence the present statement. □

Remark. Convergence rates in the central limit theorem are available for the sequence $\left(\sum_{i=1}^n f(r_i) - nE_\gamma f(r_1)\right)_{n \in \mathbf{N}_+}$. Hofbauer and Keller (1982, p. 133) proved that

$$\sup_{x \in \mathbf{R}}\left|\gamma\left(\frac{\sum_{i=1}^n f(r_i) - nE_\gamma f(r_1)}{\sigma(f)\sqrt{n}} < x\right) - \Phi(x)\right| = O(n^{-a})$$

as $n \to \infty$ for some $0 < a \le 1/2$. Rousseau-Egele (1983) showed that in the case $p = 1$ we can take $a = 1/2$. See also Iosifescu and Grigorescu (1990, pp. 212–213) and Misevicius (1971). □

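Example 3.2.11 Let $f(x) = \log x$, $x \in [1,\infty)$. This is clearly a Lipschitz function since $f'(x) = 1/x \le 1$ for any $x \in [1,\infty)$. Also, it is easy to see that $E_\gamma|f(b_0)|^{\alpha} < \infty$ for any $\alpha \in \mathbf{R}_+$. In the cases where $b_n = y_n$ or $b_n = r_n$, $n \in \mathbf{N}_+$, Theorem 3.2.9 holds with

$$E_\gamma f(b_0) = \frac{1}{\log 2}\int_1^\infty \frac{\log x\,dx}{x(x+1)}$$

$$= \frac{1}{\log 2}\left(-\log x \log\frac{x+1}{x}\,\Big|_1^\infty + \int_1^\infty \frac{1}{x}\log\left(1 + \frac{1}{x}\right)dx\right)$$

$$= \frac{1}{\log 2}\sum_{k \in \mathbf{N}_+} \frac{(-1)^{k+1}}{k}\int_1^\infty \frac{dx}{x^{k+1}} = \frac{1}{\log 2}\sum_{k \in \mathbf{N}_+} \frac{(-1)^{k+1}}{k^2} = \frac{\pi^2}{12\log 2}$$

while the corresponding $\sigma(f) = \sigma < \infty$ is non-zero. This can be shown as follows. By the reversibility of $(a_\ell)_{\ell \in \mathbf{Z}}$ (see Subsection 1.3.3) the finite dimensional distributions under $\gamma$ of $(y_\ell)_{\ell \in \mathbf{Z}}$ and $(r_\ell)_{\ell \in \mathbf{Z}}$ are identical. Then

$$\sigma^2 = \lim_{n\to\infty}\frac{1}{n}E_\gamma\left(\sum_{i=1}^n\left(\log y_i - \frac{\pi^2}{12\log 2}\right)\right)^2 = \lim_{n\to\infty}\frac{1}{n}E_\gamma\left(\sum_{i=1}^n\left(\log r_i - \frac{\pi^2}{12\log 2}\right)\right)^2.$$

So, $\sigma^2$ coincides with (2.1.33) in the case where the function $h$ is defined by

$$h(\omega) = \log\frac{1}{\omega} - \frac{\pi^2}{12\log 2}, \quad \omega \in \Omega.$$

It is easy to check that $Uh \in BV(I)$ while $h$ is essentially unbounded. Hence $\sigma \ne 0$ by Proposition 2.1.24.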

It is worth mentioning that Mayer (1990) showed that $-\pi^2/(12\log 2)$ is the value at $\beta = 2$ of the first derivative of the dominant eigenvalue $\lambda(\beta)$ of the Mayer–Ruelle operator $G_\beta$. See Theorem 2.4.7. Also, Hensley (1994) showed that $\sigma^2 = \lambda''(2) - (\lambda'(2))^2 > 1/6$.

Note that in the case where $b_n = y_n$, $n \in \mathbf{N}_+$, we have

$$S'_n = \sum_{i=1}^n \log y_i - \frac{n\pi^2}{12\log 2} = \log q_n - \frac{n\pi^2}{12\log 2}, \quad n \in \mathbf{N}_+.$$

In this case convergence rates in the central limit theorem are available. Misevicius (1981) proved that

$$\sup_{x \in \mathbf{R}}\left|\lambda\left(\frac{\log q_n - n\pi^2/(12\log 2)}{\sigma\sqrt{n}} < x\right) - \Phi(x)\right| = O\left(\frac{\log n}{\sqrt{n}}\right) \tag{3.2.17}$$

as $n \to \infty$. Vallée (1997) was able to obtain the optimal convergence rate in (3.2.17) using Mayer–Ruelle operators. She proved that for $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$ and the Radon–Nikodym derivative $d\mu/d\lambda$ is analytic and strictly positive in $I$, we have

$$\sup_{x \in \mathbf{R}}\left|\mu\left(\frac{\log q_n - n\pi^2/(12\log 2)}{\sigma\sqrt{n}} < x\right) - \Phi(x)\right| = O\left(\frac{1}{\sqrt{n}}\right) \tag{3.2.18}$$

as $n \to \infty$. The same result for $\mu = \lambda$ had also been obtained by Morita (1994). For further results on the sequence $(\log q_n)_{n \in \mathbf{N}_+}$ see Misevicius (1992) and Vallée (1997). See also Example 3.4.6.
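The centring constant $\pi^2/(12\log 2)$ can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it checks the alternating series from the computation of $E_\gamma f(b_0)$ and then evaluates $(\log q_n)/n$ for one pseudo-random point. The function names, the choice of a dyadic rational with 4000 binary digits as a stand-in for a Lebesgue-random point, and the seed are all assumptions made for illustration.

```python
import math
import random
from fractions import Fraction

# Centring constant pi^2 / (12 log 2) from Example 3.2.11.
LEVY_MEAN = math.pi ** 2 / (12 * math.log(2))


def alternating_series(terms):
    """Partial sum of sum_{k>=1} (-1)^(k+1) / k^2, which converges to pi^2 / 12."""
    return sum((-1) ** (k + 1) / k ** 2 for k in range(1, terms + 1))


def log_qn_over_n(n, bits=4000, seed=0):
    """Return (log q_n) / n for one pseudo-random dyadic rational in (0, 1).

    Assumption of this sketch: with `bits` binary digits the first n partial
    quotients behave like those of a Lebesgue-random point, provided n is
    much smaller than bits / 3.4.
    """
    rng = random.Random(seed)
    x = Fraction(rng.getrandbits(bits) | 1, 2 ** bits)  # odd numerator, so x != 0
    q_prev, q = 0, 1  # q_{-1} = 0, q_0 = 1
    for _ in range(n):
        if x == 0:  # the finite expansion ended early (not expected here)
            raise ValueError("continued fraction too short; increase bits")
        inv = 1 / x
        a = inv.numerator // inv.denominator  # partial quotient floor(1/x)
        x = inv - a                           # continued fraction map applied to x
        q_prev, q = q, a * q + q_prev         # q_k = a_k q_{k-1} + q_{k-2}
    return math.log(q) / n


if __name__ == "__main__":
    print(alternating_series(2000), math.pi ** 2 / 12)
    print(log_qn_over_n(500), LEVY_MEAN)
```

Exact rational arithmetic is used because iterating the map in floating point loses all accuracy after a few dozen steps.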

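From (3.2.18), using the double inequality

$$\frac{1}{2q_{n+1}^2(\omega)} \le \left|\omega - \frac{p_n(\omega)}{q_n(\omega)}\right| \le \frac{1}{q_n^2(\omega)}, \quad \omega \in \Omega,\ n \in \mathbf{N}_+,$$

we can derive the corresponding result for the random variable $z_n$ defined by

$$z_n(\omega) = \left|\omega - \frac{p_n(\omega)}{q_n(\omega)}\right|, \quad \omega \in \Omega,\ n \in \mathbf{N}_+.$$

We have

$$\left|\mu\left(\frac{\log z_n + n\pi^2/(6\log 2)}{2\sigma\sqrt{n}} < x\right) - \Phi(x)\right| = O\left(\frac{1}{\sqrt{n}}\right)$$

as $n \to \infty$. The details are left to the reader.

In the case where $b_n = u_n$, $n \in \mathbf{N}_+$, Theorem 3.2.9 should hold with

$$E_\gamma f(b_0) = \frac{1}{\log 2}\left(\int_1^2 \frac{(x-1)\log x}{x^2}\,dx + \int_2^\infty \frac{\log x\,dx}{x^2}\right)$$

$$= \frac{1}{\log 2}\left(\frac{1}{x}(\log x + 1)\,\Big|_1^2 + \frac{1}{2}(\log x)^2\,\Big|_1^2 - \frac{1}{x}(\log x + 1)\,\Big|_2^\infty\right) = 1 + \frac{1}{2}\log 2$$

while we conjecture that $\sigma(f)$ is non-zero. □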

Example 3.2.12 Let $f(x) = 1/x$, $x \in [1,\infty)$. This is also a Lipschitz function since $|f'(x)| = 1/x^2 \le 1$ for all $x \in [1,\infty)$ while $g(\omega) = f(1/\omega)$, $\omega \in \Omega$, is a function of bounded variation. Both Theorems 3.2.9, in the case where $b_n = r_n$, $n \in \mathbf{N}_+$, and 3.2.10 hold with

$$E_\gamma f(r_1) = E_\gamma f(r_0) = \frac{1}{\log 2}\int_1^\infty \frac{dx}{x^2(x+1)} = \frac{1}{\log 2} - 1$$

while the corresponding $\sigma(f) = \sigma$ is non-zero. Indeed, $\sigma^2$ coincides with (2.1.33) in the case where the function $h$ is defined by

$$h(\omega) = \omega - \frac{1}{\log 2} + 1, \quad \omega \in \Omega,$$

and Proposition 2.1.26 applies. □


3.3 Convergence to non-normal stable laws

3.3.1 The case of incomplete quotients

We start with a result which parallels Theorem 3.2.4.

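Theorem 3.3.1 Let $f : \mathbf{N}_+ \to \mathbf{R}$, $A_n \in \mathbf{R}$, $B_n \in \mathbf{R}_{++}$, $n \in \mathbf{N}_+$, with $\lim_{n\to\infty} B_n = \infty$, and define

$$X_{nj} = B_n^{-1}\left(f(a_j) - A_n\right), \quad 1 \le j \le n,$$

$$S_{n0} = 0, \quad S_{nk} = \sum_{j=1}^k X_{nj}, \quad 1 \le k \le n, \quad S_{nn} = S_n, \quad n \in \mathbf{N}_+.$$

Let $k_1, k_2 \ge 0$, $k_1 + k_2 > 0$, $\alpha \in (0,2)$, and denote by $\nu = \nu(k_1, k_2, \alpha)$ the stable p.m. $c_1\mathrm{Pois}\,\mu(k_1, k_2, \alpha)$ (see Section A1.5).

(i) The following assertions are equivalent.

(I) The stochastic process $\xi_n^D = \xi_n = (\xi_n(t))_{t \in I}$ defined for any $n \in \mathbf{N}_+$ by $\xi_n(t) = S_{n\lfloor nt \rfloor}$, $t \in I$, satisfies

$$\gamma\xi_n^{-1} \xrightarrow{w} Q_\nu \quad \text{in } \mathcal{B}_D,$$

where the p.m. $Q_\nu$ is defined as in Section A3.3.

(II) $\gamma S_n^{-1} \xrightarrow{w} \nu$, and the array $X = \{X_{nj},\ 1 \le j \le n,\ n \in \mathbf{N}_+\}$ is s.i. under $\gamma$.

(ii) Assertion (I) above holds if and only if

$$\bar F(x) = \sum_{k:|f(k)|>x} k^{-2},\ x \in \mathbf{R}_+, \text{ is regularly varying of index } -\alpha \tag{3.3.1}$$

and

$$\lim_{x\to\infty}\frac{1}{\bar F(x)}\sum_{k:f(k)>x} k^{-2} = \frac{k_1}{k_1+k_2}, \qquad \lim_{x\to\infty}\frac{1}{\bar F(x)}\sum_{k:f(k)<-x} k^{-2} = \frac{k_2}{k_1+k_2} \tag{3.3.2}$$

or, equivalently (see Theorem A2.5), if and only if

$$F(x) = (\log 2)^{-1}\sum_{k:|f(k)|\le x} f^2(k)k^{-2}, \quad x \in \mathbf{R}_+,$$

is regularly varying of index $2-\alpha$ and (3.3.2) holds or, equivalently, if and only if

$$\lim_{x\to\infty}\frac{x^2\bar F(x)}{F(x)} = \frac{2-\alpha}{\alpha}\log 2$$

and (3.3.2) holds. Here $\bar F$ denotes the tail sum in (3.3.1), as distinct from the truncated second moment $F$. If this is the case, then we can take

$$A_n = E_\gamma f(a_1)I_{(|f(a_1)| \le B_n)}, \quad n \in \mathbf{N}_+,$$

and any sequence $(B_n)_{n \in \mathbf{N}_+}$ such that

$$\lim_{n\to\infty} nB_n^{-2}F(B_n) = (k_1+k_2)/(2-\alpha).$$

(iii) If either (I) or (II) above holds, then $\gamma$ can be replaced in (i) by any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$.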

Proof. (i) and (iii) follow from Theorem A3.7 and Lemma 3.0.2, respectively. The proof of (ii) is entirely similar to that working in the case of i.i.d. random variables. See Samur (1989, p. 62) and Araujo and Giné (1980, pp. 81, 84–85, 87–88). □

Remark. In principle, from Theorem 3.3.1 we might derive the asymptotic behaviour as $n \to \infty$ of random variables such as, e.g.,

$$\min_{0 \le k \le n} S_{nk}, \quad \max_{0 \le k \le n} S_{nk}, \quad \text{or} \quad \max_{0 \le k \le n} |S_{nk}|.$$

This depends on the possibility of determining the distribution of the random vector

$$\left(\inf_{t \in I}\xi_\nu(t),\ \sup_{t \in I}\xi_\nu(t),\ \xi_\nu(1)\right),$$

where $\xi_\nu = (\xi_\nu(t))_{t \in I}$ is a stochastic process with stationary independent increments, $\xi_\nu(0) = 0$ a.s., trajectories in $D$, and $\xi_\nu(1)$ having probability distribution $\nu$ (see Section A3.3). Note that this problem could be solved in the case of normal convergence, when $\nu$ is the standard normal distribution and $\xi_\nu$ is the standard Brownian motion process (see Remark 2 following Theorem 3.2.4). □

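Corollary 3.3.2 Let $k_1$, $k_2$, $\alpha$, and $\nu = \nu(k_1, k_2, \alpha)$ be as in Theorem 3.3.1.

(i) Let $f \in \mathcal{F}$ (see Section A2.3). Then (3.3.1) and (3.3.2) hold if and only if $f$ is regularly varying of index $1/\alpha$.

(ii) Assume $f : [1,\infty) \to \mathbf{R}_{++}$ is bounded on finite intervals and regularly varying of index $1/\alpha$. Let

$$\nu_\alpha = \begin{cases} \delta_{\alpha/((1-\alpha)\log 2)} * \nu\left(\dfrac{\alpha}{\log 2}, 0, \alpha\right) & \text{if } \alpha \ne 1, \\[2mm] \nu\left(\dfrac{1}{\log 2}, 0, 1\right) & \text{if } \alpha = 1, \end{cases}$$

and for any $n \in \mathbf{N}_+$ define the stochastic process $\eta_n = (\eta_n(t))_{t \in I}$ by

$$\eta_n(t) = \frac{1}{f(n)} \times \begin{cases} \sum_{j \le \lfloor nt \rfloor} f(a_j) & \text{if } \alpha < 1, \\[1mm] \sum_{j \le \lfloor nt \rfloor}\left(f(a_j) - E_\gamma f(a_1)I_{(f(a_1) \le f(n))}\right) & \text{if } \alpha = 1, \\[1mm] \sum_{j \le \lfloor nt \rfloor}\left(f(a_j) - E_\gamma f(a_1)\right) & \text{if } \alpha > 1, \end{cases}$$

with the usual convention which assigns value 0 to a sum over the empty set. Then

$$\mu\eta_n^{-1} \xrightarrow{w} Q_{\nu_\alpha} \quad \text{in } \mathcal{B}_D$$

for any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$.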

Proof. (i) By Lemma A2.6(iii) it is sufficient to show that

$$\sum_{k:f(k)>x} k^{-2} \sim (f_1(x))^{-1} \quad \text{as } x \to \infty. \tag{3.3.3}$$

For any $x \ge 1$ by the definition of $f_1$ and $f_2$ (see Section A2.3) we have

$$\{k : k > f_2(x)\} \subset \{k : f(k) > x\} \subset \{k : k \ge f_1(x)\}. \tag{3.3.4}$$

Hence

$$1 \le \frac{\sum_{k:f(k)>x} k^{-2}}{\sum_{k>f_2(x)} k^{-2}} \le 1 + \frac{\sum_{f_1(x) \le k \le f_2(x)} k^{-2}}{\sum_{k>f_2(x)} k^{-2}} \tag{3.3.5}$$

for any $x \ge 1$. But

$$\sum_{f_1(x) \le k \le f_2(x)} k^{-2} \le (f_1(x)-1)^{-1} - (f_2(x))^{-1}, \tag{3.3.6}$$

$$\sum_{k>f_2(x)} k^{-2} \ge (f_2(x)+1)^{-1} \tag{3.3.7}$$

for any $x \ge 1$, and

$$(f_1(x))^{-1} \sim (f_2(x))^{-1} \sim \sum_{k>f_2(x)} k^{-2} \quad \text{as } x \to \infty. \tag{3.3.8}$$

Now, (3.3.3) follows from (3.3.5) through (3.3.8).

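(ii) By Lemma A2.6(ii) we have $f \in \mathcal{F}$. It follows from (i) above and Theorem 3.3.1 that

$$\mu\xi_n^{-1} \xrightarrow{w} Q_{\nu_\alpha} \quad \text{in } \mathcal{B}_D$$

for any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$, where for any $n \in \mathbf{N}_+$ the process $\xi_n = (\xi_n(t))_{t \in I}$ is defined by

$$\xi_n(t) = \frac{1}{B_n}\sum_{j \le \lfloor nt \rfloor}\left(f(a_j) - E_\gamma f(a_1)I_{(f(a_1) \le B_n)}\right), \quad t \in I,$$

with $B_n$ satisfying

$$\lim_{n\to\infty} nB_n^{-2}F(B_n) = \frac{k_1+k_2}{2-\alpha}. \tag{3.3.9}$$

It is therefore sufficient to prove that in (3.3.9) we can take $B_n = f(n)$, $n \in \mathbf{N}_+$, $k_1 = \alpha/\log 2$, $k_2 = 0$, and that

$$\lim_{n\to\infty} E_\gamma(\eta_n(1) - \xi_n(1)) = \lim_{n\to\infty}\frac{n}{f(n)} \times \begin{cases} E_\gamma f(a_1)I_{(f(a_1) \le f(n))} & \text{if } \alpha < 1, \\ -E_\gamma f(a_1)I_{(f(a_1) > f(n))} & \text{if } \alpha > 1 \end{cases} = \frac{\alpha}{(1-\alpha)\log 2}. \tag{3.3.10}$$

To proceed notice first that by the very definition of $f_1$ and $f_2$ we have

$$f_1(f(n)-1) \le n \le f_2(f(n)), \quad n \in \mathbf{N}_+.$$

Since $f_1$ is regularly varying, by Corollary A2.2(i) we have

$$f_1(f(n)-1) \sim f_1(f(n)) \quad \text{as } n \to \infty.$$

As $f_1 \sim f_2$, it follows that

$$f_i(f(n)) \sim n \quad \text{as } n \to \infty,\ i = 1, 2. \tag{3.3.11}$$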

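Taking up (3.3.9) we begin by noting that (3.3.4) implies that

$$\frac{\sum_{k<f_1(x)} f^2(k)k^{-2}}{\sum_{k \le f_2(x)} f^2(k)k^{-2}} \le \frac{\sum_{k:f(k)<x} f^2(k)k^{-2}}{\sum_{k \le f_2(x)} f^2(k)k^{-2}} \le 1 \tag{3.3.12}$$

for all $x \ge 1$. Next, we use Theorem A2.3 taking

$$L(x) = x^{-2/\alpha}f^2(\lfloor x \rfloor)(\lfloor x \rfloor + 1)/\lfloor x \rfloor, \quad x \ge 1,$$

which is a slowly varying function. We easily obtain

$$\lim_{x\to\infty}\frac{x\sum_{k \le x} f^2(k)k^{-2}}{f^2(x)} = \frac{\alpha}{2-\alpha}. \tag{3.3.13}$$

Clearly, (3.3.13) also holds when $\sum_{k \le x}$ is replaced by $\sum_{k<x}$. Because $f_1 \sim f_2$ and $f$ is regularly varying, it follows from (3.3.13) that the first fraction in (3.3.12) tends to 1 as $x \to \infty$. Then by (3.3.13) again and (3.3.11) we obtain

$$\frac{n}{f^2(n)}F(f(n)) \sim \frac{n}{f^2(n)\log 2}\sum_{k \le f_2(f(n))} f^2(k)k^{-2} \sim \frac{1}{\log 2}\,\frac{n}{f_2(f(n))}\,\frac{f^2(f_2(f(n)))}{f^2(n)}\,\frac{\alpha}{2-\alpha} \sim \frac{\alpha}{(2-\alpha)\log 2} \quad \text{as } n \to \infty, \tag{3.3.14}$$

that is, (3.3.9) is satisfied as stated.

Now, coming to (3.3.10) assume first $\alpha < 1$. Then since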

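$$\lim_{k\to\infty}\frac{\log\left(1 + \frac{1}{k(k+2)}\right)}{k^{-2}} = 1 \tag{3.3.15}$$

and $\sum_{k \in \mathbf{N}_+} f(k)k^{-2} = \infty$, we have

$$E_\gamma f(a_1)I_{(f(a_1) \le f(n))} \sim \frac{1}{\log 2}\sum_{k:f(k) \le f(n)} f(k)k^{-2} \quad \text{as } n \to \infty.$$

Therefore the asymptotic behaviour of

$$\frac{n}{f(n)}E_\gamma f(a_1)I_{(f(a_1) \le f(n))}$$

as $n \to \infty$ can be obtained from (3.3.14) by replacing $f^2$ by $f$, thus $\alpha$ by $2\alpha$ (note that while $f^2$ is regularly varying of index $2/\alpha$, $f$ is regularly varying of index $1/\alpha$). Thus

$$\frac{n}{f(n)}E_\gamma f(a_1)I_{(f(a_1) \le f(n))} \sim \frac{\alpha}{(1-\alpha)\log 2} \quad \text{as } n \to \infty,$$

that is, (3.3.10) holds when $\alpha < 1$.

Finally, let $\alpha > 1$. We now use Theorem A2.4 taking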

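$$L(x) = x^{-1/\alpha}f(\lfloor x \rfloor)(\lfloor x \rfloor + 1)/\lfloor x \rfloor, \quad x \ge 1,$$

which is a slowly varying function. We easily obtain

$$\lim_{x\to\infty}\frac{x\sum_{k \ge x} f(k)k^{-2}}{f(x)} = \frac{\alpha}{\alpha-1}. \tag{3.3.16}$$

Clearly, (3.3.16) also holds when $\sum_{k \ge x}$ is replaced by $\sum_{k>x}$. By (3.3.4), similarly to (3.3.12) we have

$$\frac{E_\gamma f(a_1)I_{(a_1 > f_2(f(n)))}}{E_\gamma f(a_1)I_{(a_1 \ge f_1(f(n)))}} \le \frac{E_\gamma f(a_1)I_{(f(a_1) > f(n))}}{E_\gamma f(a_1)I_{(a_1 \ge f_1(f(n)))}} \le 1, \quad n \in \mathbf{N}_+. \tag{3.3.17}$$

It follows from (3.3.16) that the first fraction in (3.3.17) tends to 1 as $n \to \infty$. Notice then that since $\sum_{k \in \mathbf{N}_+} f(k)k^{-2} < \infty$, by (3.3.15) we have

$$E_\gamma f(a_1)I_{(a_1 \ge f_1(f(n)))} \sim \frac{1}{\log 2}\sum_{k \ge f_1(f(n))} f(k)k^{-2} \quad \text{as } n \to \infty.$$

Using (3.3.16) again we thus obtain

$$\frac{n}{f(n)}E_\gamma f(a_1)I_{(f(a_1) > f(n))} \sim \frac{n}{f(n)\log 2}\sum_{k \ge f_1(f(n))} f(k)k^{-2} \sim \frac{1}{\log 2}\,\frac{n}{f_1(f(n))}\,\frac{f(f_1(f(n)))}{f(n)}\,\frac{\alpha}{\alpha-1} \sim \frac{\alpha}{(\alpha-1)\log 2} \quad \text{as } n \to \infty,$$

that is, (3.3.10) holds when $\alpha > 1$, too. □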

To complete the remark following Theorem 3.3.1 we note that Corollary 3.3.2 allows us to derive in some cases the asymptotic behaviour as $n \to \infty$ of the random variable $U_n$ = number of indices $k$, $1 \le k \le n$, for which $S_{nk} > 0$.

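Proposition 3.3.3 Assume $f$ is bounded on finite intervals and regularly varying of index $1/\alpha$ with $1 < \alpha < 2$. Then

$$\lim_{n\to\infty}\mu\left(\frac{U_n}{n} < x\right) = \lim_{n\to\infty}\mu\left(\frac{\mathrm{card}\left\{1 \le k \le n : \sum_{j=1}^k f(a_j) > kE_\gamma f(a_1)\right\}}{n} < x\right) = \frac{\sin(\pi/\alpha)}{\pi}\int_0^x \frac{dt}{t^{1-1/\alpha}(1-t)^{1/\alpha}}, \quad 0 \le x \le 1, \tag{3.3.18}$$

for any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$.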

Proof. It is easy to check that $\nu_\alpha$ defined in Corollary 3.3.2 is a strictly stable probability and $\nu_\alpha((0,\infty)) = 1/\alpha$ for any $1 < \alpha < 2$. Then (3.3.18) is an immediate consequence of Theorem 5.1 in de Acosta (1982). □

Remarks. 1. Proposition 3.3.3 holds for $\alpha = 2$, too. In this case the limiting distribution is the classical arc-sine law mentioned in Remark 2 following Theorem 3.2.4. However, the assumption on $f$ in Proposition 3.3.3 is slightly stronger [cf. Corollary A2.7(ii)] than the assumption on $f$ in Theorem 3.2.4, under which the arc-sine law holds.

2. It follows from Proposition 3.3.3 [cf. Theorem 5.2 in de Acosta (1982)] that

$$\mu\left(\lambda(t \in I : \xi_{\nu_\alpha}(t) > 0) < x\right) = \frac{\sin(\pi/\alpha)}{\pi}\int_0^x \frac{dt}{t^{1-1/\alpha}(1-t)^{1/\alpha}}, \quad 0 \le x \le 1,$$

for any $1 < \alpha < 2$. This generalizes P. Lévy's arc-sine law for Brownian motion. □
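That the limit in (3.3.18) is a genuine distribution function on $[0,1]$ can be checked numerically. The sketch below is illustrative only: it verifies that the density has total mass 1 through the Beta integral $B(p, 1-p) = \Gamma(p)\Gamma(1-p)$ with $p = 1/\alpha$, and evaluates the distribution function by a midpoint rule after the substitution $t = xu^{\alpha}$, which removes the singularity at $t = 0$. The function names and the number of quadrature steps are assumptions of the sketch.

```python
import math


def generalized_arcsine_total_mass(alpha):
    """Total mass of sin(pi/alpha)/pi * t^(1/alpha - 1) * (1 - t)^(-1/alpha) on (0, 1)."""
    p = 1.0 / alpha
    beta = math.gamma(p) * math.gamma(1.0 - p)  # B(p, 1 - p), because Gamma(1) = 1
    return math.sin(math.pi / alpha) / math.pi * beta


def generalized_arcsine_cdf(x, alpha, steps=2000):
    """Distribution function of (3.3.18) at 0 <= x < 1 via a midpoint rule.

    With t = x * u**alpha the integral over (0, x) becomes
    alpha * x**(1/alpha) * integral_0^1 (1 - x * u**alpha)**(-1/alpha) du,
    whose integrand is smooth for x < 1.
    """
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    p = 1.0 / alpha
    h = 1.0 / steps
    integral = h * sum((1.0 - x * ((i + 0.5) * h) ** alpha) ** (-p) for i in range(steps))
    return math.sin(math.pi / alpha) / math.pi * alpha * x ** p * integral


if __name__ == "__main__":
    print(generalized_arcsine_total_mass(1.5))
    print(generalized_arcsine_cdf(0.5, 2.0))  # classical arc-sine law at its median
```

For $\alpha = 2$ the formula reduces to the classical arc-sine distribution function $(2/\pi)\arcsin\sqrt{x}$, which gives a convenient reference value.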

3.3.2 Sums of incomplete quotients

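From Corollary 3.3.2 we can derive results for the sums $t_n = \sum_{j=1}^n a_j$, $n \in \mathbf{N}_+$, of incomplete quotients by taking $f(x) = x$, $x \in [1,\infty)$. In this case we have

$$A_n = E_\gamma a_1 I_{(a_1 \le n)} = \frac{1}{\log 2}\sum_{j=1}^n j\log\frac{(j+1)^2}{j(j+2)} = \frac{1}{\log 2}\left(\log(n+2) - (n+1)\log\frac{n+2}{n+1}\right), \quad n \in \mathbf{N}_+.$$

Hence

$$A_n = \frac{1}{\log 2}\left(\log n - 1 + o(1)\right) \tag{3.3.19}$$

as $n \to \infty$. For any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$ by Corollary 3.3.2(ii) we have

$$\mu(\eta_n(1))^{-1} \xrightarrow{w} \nu_1, \tag{3.3.20}$$

where

$$\eta_n(1) = \frac{1}{n}\sum_{j=1}^n (a_j - A_n), \quad n \in \mathbf{N}_+.$$

It follows from (3.3.19) and (3.3.20) that

$$\mu(\zeta_n(1))^{-1} \xrightarrow{w} \delta_{(C-1)/\log 2} * \nu_1 := \nu', \tag{3.3.21}$$

where

$$\zeta_n(1) = \frac{1}{n}\sum_{j=1}^n\left(a_j + \frac{C - \log n}{\log 2}\right), \quad n \in \mathbf{N}_+,$$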

and $C = 0.57722\cdots$ is Euler's constant. Note that the ch.f. of $\nu'$ is

$$\hat\nu'(t) = \exp\left(-\frac{\pi}{2\log 2}\left(1 + \frac{2i}{\pi}\,\mathrm{sgn}\,t\,\log|t|\right)|t|\right), \quad t \in \mathbf{R},$$

see Section A1.5. Hence $\nu'$ is strictly stable.

A convergence rate in (3.3.21) is available in the special case where $\mu = \gamma$. Heinrich (1987) proved that there exists $c_0 \in \mathbf{R}_{++}$ such that

$$\left|\gamma(\zeta_n(1) < x) - \nu'((-\infty, x))\right| \le \frac{c_0(\log n)^2}{n} \tag{3.3.22}$$

for any $n \in \mathbf{N}_+$ and $x \in \mathbf{R}$.
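The closed form of the truncated mean $A_n$ and its asymptotics (3.3.19) can be confirmed numerically. The following is a minimal sketch; the function names are hypothetical, and `math.log1p` is used in place of the logarithm of a ratio close to 1 purely for floating-point accuracy, using $(j+1)^2/(j(j+2)) = 1 + 1/(j(j+2))$.

```python
import math

LOG2 = math.log(2)


def truncated_mean_direct(n):
    """A_n summed term by term, with gamma(a_1 = j) = log(1 + 1/(j (j + 2))) / log 2."""
    return sum(j * math.log1p(1.0 / (j * (j + 2))) for j in range(1, n + 1)) / LOG2


def truncated_mean_closed(n):
    """Closed form (log(n + 2) - (n + 1) log((n + 2)/(n + 1))) / log 2."""
    return (math.log(n + 2) - (n + 1) * math.log1p(1.0 / (n + 1))) / LOG2


def truncated_mean_asymptotic(n):
    """Leading behaviour (log n - 1) / log 2 from (3.3.19)."""
    return (math.log(n) - 1.0) / LOG2


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, truncated_mean_direct(n), truncated_mean_closed(n))
    print(truncated_mean_closed(10 ** 6) - truncated_mean_asymptotic(10 ** 6))
```

The two expressions agree because the sum telescopes, and the gap to the asymptotic form shrinks like a constant multiple of $1/n$.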

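To conclude let us note that (3.3.21) is a special case of

$$\mu\zeta_n^{-1} \xrightarrow{w} Q_{\nu'} \quad \text{in } \mathcal{B}_D,$$

where for any $n \in \mathbf{N}_+$ the process $\zeta_n = (\zeta_n(t))_{t \in I}$ is defined by

$$\zeta_n(t) = \frac{1}{n}\sum_{j \le \lfloor nt \rfloor}\left(a_j + \frac{C - \log n}{\log 2}\right), \quad t \in I.$$

As a consequence (compare with Remark 2 following Proposition 3.3.3) we have

$$\lim_{n\to\infty}\mu\left(\frac{\mathrm{card}\left\{1 \le k \le n : \sum_{j=1}^k a_j > k(\log n - C)/\log 2\right\}}{n} < x\right) = \mu\left(\lambda(t \in I : \xi_{\nu'}(t) > 0) < x\right), \quad 0 \le x \le 1.$$

An explicit expression of the last distribution function is not known.

Immediate consequences of (3.3.21) and (3.3.22) are that (i) for any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$ we have

$$\frac{t_n}{n\log n} \longrightarrow \frac{1}{\log 2} \quad \text{in } \mu\text{-probability as } n \to \infty, \tag{3.3.23}$$

and (ii) for any $\varepsilon > 0$ and $n \in \mathbf{N}_+$ we have

$$\gamma\left(\left|\frac{t_n}{n\log n} - \frac{1}{\log 2}\right| \le \varepsilon\right) \ge \nu'\left(\left[-\varepsilon\log n + \frac{C}{\log 2},\ \varepsilon\log n + \frac{C}{\log 2}\right]\right) - \frac{2c_0(\log n)^2}{n}.$$

Khintchine (1934/35) proved using (3.3.23) that the series $\sum_{n \in \mathbf{N}_+} 1/t_n$ is divergent a.e. in $I$. A stronger result is Theorem 3.3.4 below. This was stated by Doeblin (1940), but his proof is incorrect. We reproduce here the proof of Iosifescu (1996).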

Theorem 3.3.4 The series

$$\sum_{n \ge 2}\left(\frac{1}{t_n} - \frac{\log 2}{n\log n}\right)$$

is absolutely convergent a.e. in $I$.

Proof. In what follows, the letter $c$ with different indices will denote suitable positive constants. Let $h : \mathbf{N}_+ \to \mathbf{N}_+$ be a function such that $\lim_{n\to\infty} h(n) = \infty$. For any $n \in \mathbf{N}_+$ put

$$t_n(h) = \sum_{i=1}^n a_i I_{(a_i \le h(n))}.$$

It follows from (3.3.19) and the strict stationarity of $(a_n)_{n \in \mathbf{N}_+}$ under $\gamma$ that

$$E_\gamma t_n(h) = \frac{n}{\log 2}\left(\log h(n) - 1 + o(1)\right) \tag{3.3.24}$$

as $n \to \infty$. Next, for any $n \in \mathbf{N}_+$ we have

$$E_\gamma a_1^2 I_{(a_1 \le n)} = \frac{1}{\log 2}\sum_{j=1}^n j^2\log\left(1 + \frac{1}{j(j+2)}\right) \le c_1 n,$$

and Corollary A3.2 yields

$$E_\gamma\left(t_n(h) - E_\gamma t_n(h)\right)^2 \le c_2\,n\,h(n), \quad n \in \mathbf{N}_+. \tag{3.3.25}$$
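The linear bound on the truncated second moment is elementary, since $\log(1+u) < u$ gives $j^2\log(1 + 1/(j(j+2))) < j/(j+2) < 1$ for every $j$, so that $c_1 = 1/\log 2$ is admissible. A short numerical sketch follows; the function name is hypothetical and the choice of sample sizes is arbitrary.

```python
import math


def truncated_second_moment(n):
    """E_gamma[a_1^2 ; a_1 <= n] = (1/log 2) * sum_{j<=n} j^2 log(1 + 1/(j (j + 2)))."""
    total = sum(j * j * math.log1p(1.0 / (j * (j + 2))) for j in range(1, n + 1))
    return total / math.log(2)


if __name__ == "__main__":
    for n in (1, 10, 1000, 100000):
        moment = truncated_second_moment(n)
        # The ratio to n stays below 1/log 2 and approaches it slowly.
        print(n, moment / n, 1.0 / math.log(2))
```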

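Now, write $\tilde t_n = t_n(h)$ for $h(n) = n\left(\lfloor\log^{4/3} n\rfloor + 1\right)$ and $t'_n = t_n(h)$ for $h(n) = n$, $n \in \mathbf{N}_+$. For any $n \ge 3$ by (3.3.24) we have

$$\left|\frac{1}{E_\gamma\tilde t_n} - \frac{\log 2}{n\log n}\right| \le c_3\frac{\log\log n}{n\log^2 n}.$$

Since the series $\sum_{n \ge 3}(\log\log n)/(n\log^2 n)$ is convergent, it is sufficient to prove that the series

$$\sum_{n \ge 2}\left(\frac{1}{t_n} - \frac{1}{E_\gamma\tilde t_n}\right) \tag{3.3.26}$$

is absolutely convergent a.e. in $I$.

For any $n \ge 2$ consider the random events

$$A_1(n) = A_1 = \left(\tilde t_n > \tfrac{3}{2}E_\gamma\tilde t_n\right), \quad A_2(n) = A_2 = \left(\tilde t_n < \tfrac{1}{2}E_\gamma\tilde t_n\right),$$

$$A_3(n) = A_3 = \left(\tfrac{1}{2}E_\gamma\tilde t_n \le \tilde t_n \le \tfrac{3}{2}E_\gamma\tilde t_n\right) \cap \left(t_n \ne \tilde t_n\right),$$

$$A_4(n) = A_4 = \left(\tfrac{1}{2}E_\gamma\tilde t_n \le \tilde t_n \le \tfrac{3}{2}E_\gamma\tilde t_n\right) \cap \left(t_n = \tilde t_n\right).$$

Let us find upper bounds for the $\gamma$-probabilities of $A_1$, $A_2$, and $A_3$. We have

$$A_1 = \left(\tilde t_n - E_\gamma\tilde t_n > \tfrac{1}{2}E_\gamma\tilde t_n\right) \subset \left(\left|\tilde t_n - E_\gamma\tilde t_n\right| > \tfrac{1}{2}E_\gamma\tilde t_n\right).$$

By (3.3.24) and (3.3.25) the Bienaymé–Chebyshev inequality implies

$$\gamma(A_1) \le 4c_2 n^2\left(\lfloor\log^{4/3} n\rfloor + 1\right)\big/\left(E_\gamma\tilde t_n\right)^2 \le c_4(\log n)^{-2/3}. \tag{3.3.27}$$

Since $t'_n \le \tilde t_n$, $n \in \mathbf{N}_+$, and $(E_\gamma\tilde t_n)/2 - E_\gamma t'_n < 0$ for $n$ large enough, for such an $n$ we have

$$A_2 = \left(\tilde t_n < \tfrac{1}{2}E_\gamma\tilde t_n\right) \subset \left(t'_n < \tfrac{1}{2}E_\gamma\tilde t_n\right) = \left(t'_n - E_\gamma t'_n < \tfrac{1}{2}E_\gamma\tilde t_n - E_\gamma t'_n\right) \subset \left(\left|t'_n - E_\gamma t'_n\right| > E_\gamma t'_n - \tfrac{1}{2}E_\gamma\tilde t_n\right).$$

Again by (3.3.24) and (3.3.25), the Bienaymé–Chebyshev inequality implies

$$\gamma(A_2) \le \frac{c'_2 n^2}{\left(E_\gamma t'_n - E_\gamma\tilde t_n/2\right)^2} \le c_5(\log n)^{-2}. \tag{3.3.28}$$

Noting that

$$(t_n \ne \tilde t_n) = \bigcup_{i=1}^n\left(a_i > n\left(\lfloor\log^{4/3} n\rfloor + 1\right)\right),$$

whence

$$\gamma(t_n \ne \tilde t_n) \le n\gamma\left(a_1 > n\left(\lfloor\log^{4/3} n\rfloor + 1\right)\right) \le c_6(\log n)^{-4/3}, \tag{3.3.29}$$

we obviously have

$$\gamma(A_3) \le c_6(\log n)^{-4/3}. \tag{3.3.30}$$

Next, let us find an upper bound for

$$E_\gamma\left|\frac{1}{t_n} - \frac{1}{E_\gamma\tilde t_n}\right| = \sum_{i=1}^4 I_i(n),$$

where

$$I_i(n) = \int_{A_i}\left|\frac{1}{t_n} - \frac{1}{E_\gamma\tilde t_n}\right|d\gamma, \quad 1 \le i \le 4.$$

Since $\tilde t_n \le t_n$, $n \in \mathbf{N}_+$, on $A_1$ we have

$$\frac{1}{t_n} \le \frac{1}{\tilde t_n} < \frac{2}{3E_\gamma\tilde t_n}. \tag{3.3.31}$$

It follows from (3.3.24), (3.3.27), and (3.3.31) that

$$I_1(n) \le c_7 n^{-1}(\log n)^{-5/3}. \tag{3.3.32}$$

Since $t_n \ge n$, $n \in \mathbf{N}_+$, by (3.3.24), (3.3.28), and (3.3.30) we have

$$I_2(n) \le c_8 n^{-1}(\log n)^{-2}, \quad I_3(n) \le c_9 n^{-1}(\log n)^{-4/3}. \tag{3.3.33}$$

Finally, set

$$w_n = (\tilde t_n - E_\gamma\tilde t_n)/E_\gamma\tilde t_n$$

and note that by (3.3.24) and (3.3.25) we have

$$E_\gamma|w_n| \le E_\gamma^{1/2}w_n^2 \le c_{10}(\log n)^{-1/3}.$$

Since on $A_4$ we have $t_n = \tilde t_n$ and $2/3 \le 1/(1+w_n) \le 2$, it follows that

$$I_4(n) = \int_{A_4}\left|\frac{1}{t_n} - \frac{1}{E_\gamma\tilde t_n}\right|d\gamma = \int_{A_4}\frac{|w_n|}{(1+w_n)E_\gamma\tilde t_n}\,d\gamma \le \frac{2}{E_\gamma\tilde t_n}E_\gamma|w_n| \le c_{11}n^{-1}(\log n)^{-4/3}. \tag{3.3.34}$$

Therefore by (3.3.32) through (3.3.34) we have

$$E_\gamma\left|\frac{1}{t_n} - \frac{1}{E_\gamma\tilde t_n}\right| = O\left(n^{-1}(\log n)^{-4/3}\right)$$

as $n \to \infty$. As the series $\sum_{n \ge 2} n^{-1}(\log n)^{-4/3}$ is convergent, by Beppo Levi's theorem series (3.3.26) is absolutely convergent a.e. in $I$. The proof is complete. □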

Corollary 3.3.5 We have

$$\lim_{n\to\infty}\frac{\sum_{i=1}^n 1/t_i}{\log\log n} = \log 2 \quad \text{a.e.}$$

Proof. This follows immediately from Theorem 3.3.4 since, as is well known,

$$\lim_{n\to\infty}\left(\sum_{i=2}^n \frac{1}{i\log i} - \log\log n\right)$$

exists and is finite. □
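The elementary fact used in this proof is easy to observe numerically. The sketch below is illustrative only and the function name is hypothetical. The sum is started at $i = 2$ because the term $1/(i\log i)$ is undefined at $i = 1$.

```python
import math


def centred_sum(n):
    """sum_{i=2}^{n} 1/(i log i) - log log n, for n >= 2."""
    total = sum(1.0 / (i * math.log(i)) for i in range(2, n + 1))
    return total - math.log(math.log(n))


if __name__ == "__main__":
    for n in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
        print(n, centred_sum(n))
```

The centred sums decrease in $n$, because $1/(x\log x)$ is decreasing and so each added term is smaller than the corresponding increment of $\log\log x$. Successive values differ by roughly $1/(2n\log n)$, which is consistent with convergence to a finite limit.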

For further results on the sums $t_n$, $n \in \mathbf{N}_+$, see Theorem 4.1.9 and its corollaries.

3.3.3 The case of associated random variables

We shall now show that Corollary 3.3.2 still holds in the case where $\alpha < 1$ when $a_j$ is replaced by either $y_j$, $r_j$, or $u_j$, $j \in \mathbf{N}_+$. This will follow from the result below (compare with Lemma 3.1.4).

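Lemma 3.3.6 Let $b_n$, $n \in \mathbf{N}_+$, be real-valued random variables on $(I, \mathcal{B}_I)$ such that

$$a_n \le b_n \le a_n + c, \quad n \in \mathbf{N}_+,$$

for some $c \in \mathbf{R}_+$. For any $n \in \mathbf{N}_+$ consider the stochastic processes $\eta_n = (\eta_n(t))_{t \in I}$ and $\eta'_n = (\eta'_n(t))_{t \in I}$ defined by

$$\eta_n(t) = \frac{1}{f(n)}\sum_{j \le \lfloor nt \rfloor} f(a_j), \quad \eta'_n(t) = \frac{1}{f(n)}\sum_{j \le \lfloor nt \rfloor} f(b_j), \quad t \in I,$$

with the usual convention which assigns value 0 to a sum over the empty set, where $f : [1,\infty) \to \mathbf{R}_{++}$ is bounded on finite intervals and regularly varying of index $\beta > 1$. Then $d_0(\eta_n, \eta'_n)$ converges to 0 in $\gamma$-probability as $n \to \infty$.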

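Proof. Write $f(x) = x^{\beta}L(x)$, $x \in [1,\infty)$, where $L$ is slowly varying. For any $n \in \mathbf{N}_+$ we have

$$d_0(\eta_n, \eta'_n) \le \sup_{t \in I}|\eta_n(t) - \eta'_n(t)| \le \frac{1}{f(n)}\sum_{j=1}^n |f(a_j) - f(b_j)| \le \delta'_n + \delta''_n, \tag{3.3.35}$$

where

$$\delta'_n = \frac{1}{f(n)}\sum_{j=1}^n\left(b_j^{\beta} - a_j^{\beta}\right)L(a_j), \quad \delta''_n = \frac{1}{f(n)}\sum_{j=1}^n b_j^{\beta}\left|L(b_j) - L(a_j)\right|.$$

Using the inequality $(1+a)^{\alpha} - 1 \le a\left(\alpha + \lfloor\alpha\rfloor(1+a)^{\alpha-1}\right)$, valid for non-negative $a$ and $\alpha$, we obtain

$$b_j^{\beta} - a_j^{\beta} \le c\beta(1+c)^{\beta-1}a_j^{\beta-1}, \quad 1 \le j \le n,$$

whence

$$\delta'_n \le c\beta(1+c)^{\beta-1}\frac{1}{f(n)}\sum_{j=1}^n a_j^{-1}f(a_j).$$

Writing

$$a_j^{-1}f(a_j) = a_j^{-1}f(a_j)I_{(a_j \le M)} + a_j^{-1}f(a_j)I_{(a_j > M)}, \quad 1 \le j \le n,$$

for an arbitrarily given $M \ge 1$, we easily obtain

$$\delta'_n \le c\beta(1+c)^{\beta-1}\left(\frac{n}{f(n)}\max_{1 \le i \le M}\frac{f(i)}{i} + \frac{1}{M}\,\frac{1}{f(n)}\sum_{j=1}^n f(a_j)\right).$$

Then for any $\varepsilon > 0$ by Corollary 3.3.2(ii) we have

$$\limsup_{n\to\infty}\gamma\left(\delta'_n > c\beta(1+c)^{\beta-1}\varepsilon\right) \le \limsup_{n\to\infty}\gamma\left(\eta_n(1) > M\varepsilon/2\right) \le \nu_{1/\beta}\left(\left[\frac{M\varepsilon}{2}, \infty\right)\right) \longrightarrow 0 \quad \text{as } M \to \infty.$$

Hence $\delta'_n$ converges to 0 in $\gamma$-probability as $n \to \infty$.

Next, for any fixed $M \ge 1$ we can write

$$\delta''_n \le \frac{1}{f(n)}\left(\sum_{j=1}^n\left(f(b_j) + \left(\frac{b_j}{a_j}\right)^{\beta}f(a_j)\right)I_{(a_j \le M)} + \sum_{j=1}^n\left(\frac{b_j}{a_j}\right)^{\beta}f(a_j)\left|\frac{L(b_j)}{L(a_j)} - 1\right|I_{(a_j > M)}\right)$$

$$\le \left(\left(1 + (1+c)^{\beta}\right)\sup_{1 \le x \le M+c} f(x)\right)\frac{n}{f(n)} + \frac{(1+c)^{\beta}}{f(n)}\sup_{0 \le s \le c,\ x > M}\left|\frac{L(x+s)}{L(x)} - 1\right|\sum_{j=1}^n f(a_j).$$

Given $\eta > 0$, choose $M \ge 1$ such that

$$\sup_{0 \le s \le c}\left|\frac{L(x+s)}{L(x)} - 1\right| \le \eta$$

for $x > M$, which is possible by the Karamata representation of $L$ (see Theorem A2.1). Then for any $\varepsilon > 0$ by Corollary 3.3.2(ii) again we have

$$\limsup_{n\to\infty}\gamma(\delta''_n > \varepsilon) \le \limsup_{n\to\infty}\gamma\left(\eta_n(1) > \eta^{-1}(1+c)^{-\beta}\varepsilon/2\right) \le \nu_{1/\beta}\left(\left[\frac{\eta^{-1}(1+c)^{-\beta}\varepsilon}{2}, \infty\right)\right) \longrightarrow 0 \quad \text{as } \eta \to 0.$$

Hence $\delta''_n$ converges to 0 in $\gamma$-probability as $n \to \infty$.

By (3.3.35) the proof is complete. □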

Corollary 3.3.7 Let $b_n$ denote either $y_n$, $r_n$ or $u_n$, $n \in \mathbf{N}_+$. For any $n \in \mathbf{N}_+$ consider the stochastic process

$$\eta'_n = \left(\frac{1}{f(n)}\sum_{j \le \lfloor nt \rfloor} f(b_j)\right)_{t \in I}$$

with the usual convention which assigns value 0 to a sum over the empty set, where $f : [1,\infty) \to \mathbf{R}_{++}$ is bounded on finite intervals and regularly varying of index $1/\alpha$, $0 < \alpha < 1$. Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ be such that $\mu \ll \lambda$. Then

$$\mu\eta_n'^{-1} \xrightarrow{w} Q_{\nu_\alpha} \quad \text{in } \mathcal{B}_D.$$

Proof. Lemma 3.3.6 applies with $c = 1$ in the case of $y_n$ and $r_n$ and with $c = 2$ in the case of $u_n$. Since $\mu \ll \lambda$, the distance $d_0(\eta_n, \eta'_n)$ converges to 0 in $\mu$-probability, too, as $n \to \infty$. This property and Corollary 3.3.2(ii) imply the result stated. □

In the case where $\alpha \ge 1$ we have results which complement Theorem 3.2.7. Write $b_0$ for either $y_0$, $r_0$ or $u_0$.

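Theorem 3.3.8 Let $b_n$ denote either $y_n$, $r_n$ or $u_n$. Assume $f : [1,\infty) \to \mathbf{R}_{++}$ is regularly varying of index $1/\alpha$, $\alpha \in [1,2)$, $E_\gamma f^2(a_1) = \infty$, and $f(x) = x^{1/\alpha}L(x)$, where $L(x) = c\exp\left(\int_1^x \varepsilon(t)t^{-1}\,dt\right)$, $x \ge 1$, with $c > 0$, $\varepsilon : [1,\infty) \to \mathbf{R}$ continuous, and $\lim_{t\to\infty}\varepsilon(t) = 0$. For any $n \in \mathbf{N}_+$ define the process $\eta'_n = (\eta'_n(t))_{t \in I}$ by

$$\eta'_n(t) = \frac{1}{f(n)} \times \begin{cases} \sum_{j \le \lfloor nt \rfloor}\left(f(b_j) - m(f, b_0) - E_\gamma f(a_1)I_{(f(a_1) \le f(n))}\right) & \text{if } \alpha = 1, \\[1mm] \sum_{j \le \lfloor nt \rfloor}\left(f(b_j) - E_\gamma f(b_0)\right) & \text{if } \alpha > 1 \end{cases}$$

with the usual convention which assigns value 0 to a sum over the empty set, where $m(f, b_0)$ and $E_\gamma f(b_0)$ are equal to

$$m(f, y_0) = m(f, r_0) = E_\gamma(f(r_0) - f(a_0)) = E_\gamma(f(r_1) - f(a_1)) = \frac{1}{\log 2}\int_1^\infty \frac{(f(x) - f(\lfloor x \rfloor))\,dx}{x(x+1)},$$

$$m(f, u_0) = E_\gamma(f(u_0) - f(a_0)) = \frac{1}{\log 2}\int_1^\infty\!\!\int_1^\infty\left(f\left(x + \frac{1}{y}\right) - f(\lfloor x \rfloor)\right)(xy+1)^{-2}\,dx\,dy$$

$$= \frac{1}{\log 2}\left(\int_1^2 \frac{(f(x) - f(1))(x-1)}{x^2}\,dx + \int_2^\infty \frac{f(x) - (\lfloor x \rfloor - x + 1)f(\lfloor x - 1 \rfloor) - (x - \lfloor x \rfloor)f(\lfloor x \rfloor)}{x^2}\,dx\right),$$

$$E_\gamma f(y_0) = E_\gamma f(r_0) = E_\gamma f(r_1) = \frac{1}{\log 2}\int_1^\infty \frac{f(x)\,dx}{x(x+1)},$$

$$E_\gamma f(u_0) = \frac{1}{\log 2}\left(\int_1^2 \frac{f(x)(x-1)\,dx}{x^2} + \int_2^\infty \frac{f(x)\,dx}{x^2}\right),$$

according as $b_n$ denotes $y_n$, $r_n$ or $u_n$, $n \in \mathbf{N}_+$. Then

$$\mu\eta_n'^{-1} \xrightarrow{w} Q_{\nu_\alpha} \quad \text{in } \mathcal{B}_D$$

for any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$, where $\nu_\alpha$ is defined as in Corollary 3.3.2(ii).

The proof of Theorem 3.3.8 for the cases $b_n = r_n$ or $b_n = u_n$, $n \in \mathbf{N}_+$, can be found in Samur (1989, pp. 75–77). The case where $b_n = y_n$, $n \in \mathbf{N}_+$, can be treated in a similar manner. □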

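Example 3.3.9 Let $f(x) = x^{1/\alpha}$, $x \in [1,\infty)$, where $\alpha \in (1,2)$. (For the case $\alpha = 2$ see Example 3.2.8.) Theorem 3.3.8 holds with

$$E_\gamma f(y_0) = E_\gamma f(r_0) = E_\gamma f(r_1) = \frac{1}{\log 2}\int_1^\infty \frac{x^{1/\alpha}\,dx}{x(x+1)} = \frac{1}{\log 2}\int_0^1 \frac{v^{-1/\alpha}\,dv}{v+1}$$

$$= \frac{1}{\log 2}\sum_{j \in \mathbf{N}_+}\frac{1}{(2j-1-1/\alpha)(2j-1/\alpha)} = \frac{1}{2\log 2}\left(\psi\left(1 - \frac{1}{2\alpha}\right) - \psi\left(\frac{1}{2} - \frac{1}{2\alpha}\right)\right),$$

where $\psi$ is the digamma function (see p. 145) and

$$E_\gamma f(u_0) = \frac{1}{\log 2}\left(\int_1^2 \frac{(x-1)\,dx}{x^{2-1/\alpha}} + \int_2^\infty \frac{dx}{x^{2-1/\alpha}}\right) = \frac{\alpha^2(2^{1/\alpha}-1)}{(\alpha-1)\log 2}.$$

□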
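The two means in Example 3.3.9 can be cross-checked numerically, and at the boundary value $\alpha = 2$ they should reproduce the constants $\pi/(2\log 2)$ and $4(\sqrt{2}-1)/\log 2$ of Example 3.2.8. The following sketch is an illustration under stated assumptions: the function names are hypothetical, the series is truncated after a fixed number of terms (its tail is of order $1/(4N)$), and the quadrature uses the substitution $v = u^{1/(1-\beta)}$ with $\beta = 1/\alpha$, which turns $\int_0^1 v^{-\beta}(1+v)^{-1}\,dv$ into the smooth integral $(1-\beta)^{-1}\int_0^1 (1 + u^{1/(1-\beta)})^{-1}\,du$.

```python
import math

LOG2 = math.log(2)


def mean_f_y0_series(alpha, terms=200_000):
    """Truncated series (1/log 2) * sum_j 1 / ((2j - 1 - 1/alpha)(2j - 1/alpha))."""
    beta = 1.0 / alpha
    total = sum(1.0 / ((2 * j - 1 - beta) * (2 * j - beta)) for j in range(1, terms + 1))
    return total / LOG2


def mean_f_y0_quadrature(alpha, steps=2000):
    """Midpoint rule for (1/log 2) * integral_0^1 v^(-1/alpha) / (1 + v) dv after substitution."""
    beta = 1.0 / alpha
    power = 1.0 / (1.0 - beta)
    h = 1.0 / steps
    integral = h * sum(1.0 / (1.0 + ((i + 0.5) * h) ** power) for i in range(steps))
    return integral / ((1.0 - beta) * LOG2)


def mean_f_u0(alpha):
    """Closed form alpha^2 (2^(1/alpha) - 1) / ((alpha - 1) log 2)."""
    return alpha ** 2 * (2 ** (1.0 / alpha) - 1.0) / ((alpha - 1.0) * LOG2)


if __name__ == "__main__":
    print(mean_f_y0_series(1.5), mean_f_y0_quadrature(1.5))
    print(mean_f_y0_series(2.0), math.pi / (2 * LOG2))
    print(mean_f_u0(2.0), 4 * (math.sqrt(2) - 1) / LOG2)
```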

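Example 3.3.10 Let $f(x) = x$, $x \in [1,\infty)$. Theorem 3.3.8 holds with

$$m(f, y_0) = m(f, r_0) = \frac{1}{\log 2}\int_1^\infty \frac{(x - \lfloor x \rfloor)\,dx}{x(x+1)} = \frac{1}{\log 2}\int_1^\infty \frac{dx}{x^2(x+1)} = (\log 2)^{-1} - 1,$$

$$m(f, u_0) = E_\gamma(r_0 - a_0 + y_0^{-1}) = m(f, r_0) + E_\gamma(y_0^{-1}) = \frac{2}{\log 2}\int_1^\infty \frac{dx}{x^2(x+1)} = 2\left((\log 2)^{-1} - 1\right).$$

It follows that if for any $n \in \mathbf{N}_+$ the process $\zeta'_n = (\zeta'_n(t))_{t \in I}$ is defined by

$$\zeta'_n(t) = \frac{1}{n}\sum_{j \le \lfloor nt \rfloor}\left(b_j + \frac{C - \log n}{\log 2}\right), \quad t \in I,$$

where $b_n$ denotes either $y_n$, $r_n$ or $u_n$, $n \in \mathbf{N}_+$, then for any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$ we have

$$\mu\zeta_n'^{-1} \xrightarrow{w} Q_{\nu''} \quad \text{in } \mathcal{B}_D$$

in the cases where $b_n = y_n$ or $b_n = r_n$, $n \in \mathbf{N}_+$, with $\nu'' = \delta_{C/\log 2 - 1} * \nu_1$, and

$$\mu\zeta_n'^{-1} \xrightarrow{w} Q_{\nu'''} \quad \text{in } \mathcal{B}_D$$

in the case where $b_n = u_n$, $n \in \mathbf{N}_+$, with $\nu''' = \delta_{(C+1)/\log 2 - 2} * \nu_1$. As a consequence (compare with the similar result for the incomplete quotients $a_n$, $n \in \mathbf{N}_+$, in Subsection 3.3.2) we have

$$\lim_{n\to\infty}\mu\left(\frac{\mathrm{card}\left\{1 \le k \le n : \sum_{j=1}^k y_j > k(\log n - C)/\log 2\right\}}{n} < x\right) = \lim_{n\to\infty}\mu\left(\frac{\mathrm{card}\left\{1 \le k \le n : \sum_{j=1}^k r_j > k(\log n - C)/\log 2\right\}}{n} < x\right)$$

$$= \mu\left(\lambda(t \in I : \xi_{\nu''}(t) > 0) < x\right), \quad 0 \le x \le 1,$$

and

$$\lim_{n\to\infty}\mu\left(\frac{\mathrm{card}\left\{1 \le k \le n : \sum_{j=1}^k u_j > k(\log n - C)/\log 2\right\}}{n} < x\right) = \mu\left(\lambda(t \in I : \xi_{\nu'''}(t) > 0) < x\right), \quad 0 \le x \le 1.$$

□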
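The value $m(f, r_0) = (\log 2)^{-1} - 1$ for $f(x) = x$ can be checked by integrating the fractional part exactly on each interval $[k, k+1]$, where an antiderivative of $(x-k)/(x(x+1))$ is $(k+1)\log(x+1) - k\log x$. The sketch below is illustrative; the function name and the truncation point are assumptions.

```python
import math


def fractional_part_integral(n_intervals):
    """Integral of (x - floor(x)) / (x (x + 1)) over [1, n_intervals + 1]."""
    total = 0.0
    for k in range(1, n_intervals + 1):
        # Exact contribution of [k, k + 1]:
        # [(k + 1) log(x + 1) - k log x] evaluated between x = k and x = k + 1.
        total += (k + 1) * math.log((k + 2) / (k + 1)) - k * math.log((k + 1) / k)
    return total


if __name__ == "__main__":
    value = fractional_part_integral(100_000)
    print(value, 1.0 - math.log(2))                    # integral over [1, infinity)
    print(value / math.log(2), 1.0 / math.log(2) - 1)  # m(f, r_0)
```

The partial sums telescope to $(N+1)\log\frac{N+2}{N+1} - \log 2$, which tends to $1 - \log 2$ as $N \to \infty$, in agreement with $\int_1^\infty dx/(x^2(x+1)) = 1 - \log 2$.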

3.4 Fluctuation results

3.4.1 The case of incomplete quotients

We start with a direct consequence of Theorem 3.2.2′.

Let $K \subset C$ be the collection of all absolutely continuous functions $x \in C$ for which $x(0) = 0$ and $\int_0^1 [x'(t)]^2\,dt \le 1$. Here $x'$ stands for the derivative of $x$ which exists a.e. in $I$.

Let $H$ be a real-valued function on $\mathbf{N}_+^{\mathbf{N}_+}$. Set $H_n = H(a_n, a_{n+1}, \cdots)$, $n \in \mathbf{N}_+$, and assume that $E_\gamma H_1^2 < \infty$ and (3.2.1′) holds. Denoting $S_n = \sum_{i=1}^n H_i - nE_\gamma H_1$, $n \in \mathbf{N}_+$, and assuming that $\sigma^2$ defined by (3.2.2′) is non-zero, for any $n \ge 3$ put

$$\theta_n(t) = \frac{1}{\sigma\sqrt{2n\log\log n}}\left(S_{\lfloor nt \rfloor} + (nt - \lfloor nt \rfloor)\left(H_{\lfloor nt \rfloor + 1} - E_\gamma H_1\right)\right) = \frac{1}{\sqrt{2\log\log n}}\,\xi_n^C(t), \quad t \in I.$$

Theorem 3.4.1 (Strassen's law of the iterated logarithm). Assume that $E_\gamma|H_1|^{2+\delta} < \infty$ for some constant $\delta > 0$, (3.2.3′) holds, and $\sigma^2$ defined by (3.2.2′) is non-zero. Then the sequence $(\theta_n)_{n \ge 3}$, viewed as a subset of $C$, is a relatively compact set whose derived set coincides a.e. with $K$.

Proof. The result follows from Strassen's law of the iterated logarithm for standard Brownian motion [see Theorem 1 in Strassen (1964)] and Theorem 3.2.2′. □

Corollary 3.4.2 (Classical law of the iterated logarithm). Under the assumptions of Theorem 3.4.1 the set of accumulation points of the sequence

$$\left(S_n/\sigma\sqrt{2n\log\log n}\right)_{n \ge 3}$$

coincides a.e. with the segment $[-1, 1]$.

In the special case where $H$ only depends on finitely many coordinates of a current point of $\mathbb{N}_+^{\mathbb{N}_+}$, i.e., when $H$ is a real-valued function on $\mathbb{N}_+^k$ for a given $k \in \mathbb{N}_+$, certain assumptions in Theorem 3.4.1 are no longer necessary. In this case $H_n = H(a_n, \cdots, a_{n+k-1})$, $n \in \mathbb{N}_+$, and (3.2.3′) is trivially satisfied. Also, $\sigma^2$ reduces to (3.2.2′′) and when $k = 1$ by Corollary 2.1.25 we have $\sigma^2 = 0$ if and only if $H = \mathrm{const}$. Finally, it is enough to assume that $E_\gamma H_1^2 < \infty$. This follows from the work of Heyde and Scott (1973). Cf. the remark following Proposition 3.2.6.

We state a most striking result.

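Proposition 3.4.3 Let $f : \mathbb{N}_+ \to \mathbb{R}$ be a nonconstant function. Assume that $E_\gamma f^2(a_1) < \infty$ and put $S_n = \sum_{i=1}^{n} f(a_i) - nE_\gamma f(a_1)$, $n \in \mathbb{N}_+$. Let

\[ \sigma^2 = E_\gamma f^2(a_1) - E_\gamma^2 f(a_1) + 2\sum_{n\in\mathbb{N}_+} \left( E_\gamma f(a_1) f(a_{n+1}) - E_\gamma^2 f(a_1) \right), \]

which by Corollary 2.1.25 is non-zero. For any $n \ge 3$ put

\[ \theta_n(t) = \frac{1}{\sigma\sqrt{2n\log\log n}} \left( S_{\lfloor nt\rfloor} + (nt - \lfloor nt\rfloor)\left( f_{\lfloor nt\rfloor+1} - E_\gamma f(a_1) \right) \right), \quad t \in I. \]

Then the sequence $(\theta_n)_{n\ge 3}$, viewed as a subset of $C$, is a relatively compact set whose derived set coincides a.e. with $K$. In particular, the set of accumulation points of the sequence $(S_n/\sigma\sqrt{2n\log\log n})_{n\ge 3}$ coincides a.e. with the segment $[-1, 1]$.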

The almost sure invariance principle is instrumental in establishing integral tests which characterize the asymptotic growth rates of partial sums and maximum absolute partial sums.

Proposition 3.4.4 Let $\theta : [1,\infty) \to \mathbb{R}_{++}$ be non-decreasing. Then under the assumptions of Theorem 3.4.1 the following assertions hold:

(i) $\gamma\left( S_n > \sigma\sqrt{n}\,\theta(n) \ \text{i.o.} \right) = 0$ or $1$ according as

\[ \int_1^\infty \frac{\theta(t)}{t} \exp\left( -\frac{\theta^2(t)}{2} \right) dt \]

converges or diverges.

(ii) $\gamma\left( \max_{1\le i\le n} |S_i| < \sigma\sqrt{n}/\theta(n) \ \text{i.o.} \right) = 0$ or $1$ according as

\[ \int_1^\infty \frac{\theta^2(t)}{t} \exp\left( -\frac{\pi^2\theta^2(t)}{8} \right) dt \]

converges or diverges.

Proof. These results follow from Theorem 3.2.2′ and properties of standard Brownian motion. See Jain and Taylor (1973) and Jain, Jogdeo and Stout (1975) [cf. Philipp and Stout (1975)]. □

Except for the sufficiency of the moment assumption $E_\gamma H_1^2 < \infty$ in the case considered there, the considerations on Theorem 3.4.1 following Corollary 3.4.2 are valid for Proposition 3.4.4, too.

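We note that Proposition 3.4.4(i) implies the classical law of the iterated logarithm

\[ \gamma\left( \limsup_{n\to\infty} \frac{S_n}{\sigma\sqrt{2n\log\log n}} = 1 \right) = 1. \tag{3.4.1} \]

To obtain (3.4.1) we should take successively $\theta(n) = (1+\varepsilon)\sqrt{2\log\log n}$ and $\theta(n) = (1-\varepsilon)\sqrt{2\log\log n}$, $0 < \varepsilon < 1$, $n \in \mathbb{N}_+$. Also, Proposition 3.4.4(ii) implies Chung's law of the iterated logarithm for maximum absolute partial sums

\[ \gamma\left( \liminf_{n\to\infty} \frac{\max_{1\le i\le n} |S_i|}{\sigma\sqrt{n/(\log\log n)}} = \frac{\pi}{\sqrt{8}} \right) = 1. \tag{3.4.2} \]

To obtain (3.4.2) we should take successively $\theta(n) = (\sqrt{8}/\pi)(1+\varepsilon)\sqrt{\log\log n}$ and $\theta(n) = (\sqrt{8}/\pi)(1-\varepsilon)\sqrt{\log\log n}$, $0 < \varepsilon < 1$, $n \in \mathbb{N}_+$.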

We conjecture that in the special case where $H$ only depends on finitely many coordinates of a current point in $\mathbb{N}_+^{\mathbb{N}_+}$, Chung's law of the iterated logarithm (3.4.2) holds only assuming that $E_\gamma H_1^2 < \infty$ [as (3.4.1) does]. See Jain and Pruitt (1975) for the i.i.d. case.

3.4.2 The case of associated random variables

Write $b_n$ for either $y_n$, $r_n$ or $u_n$, $n \in \mathbb{N}_+$, respectively $b_0$ for either $y_0$, $r_0$ or $u_0$.

Theorem 3.4.5 Let $f : [1,\infty) \to \mathbb{R}$ satisfy either (i) or (ii) of Theorem 3.2.9. With the notation of that theorem assume that $\sigma(f) > 0$ and put

\[ \theta_n'(t) = \frac{1}{\sqrt{2n\log\log n}}\,\xi_n'^{\,C}(t), \quad n \ge 3, \ t \in I. \]

If $\delta > 0$ then the sequence $(\theta_n')_{n\ge 3}$, viewed as a subset of $C$, is a relatively compact set whose derived set coincides a.e. with $K$. In particular, the set of accumulation points of the sequence $(S_n'/\sigma\sqrt{2n\log\log n})_{n\ge 3}$ coincides a.e. with the segment $[-1, 1]$.

Proof. The results follow at once from Theorem 3.2.9(b) and Strassen's law of the iterated logarithm for standard Brownian motion [see Theorem 1 in Strassen (1964)]. □

Note that in the present context we cannot make considerations similar to those following Corollary 3.4.2.

Example 3.4.6 Let $f(x) = \log x$, $x \in [1,\infty)$. As we have seen in Example 3.2.11, in the cases where $b_n = y_n$ or $b_n = r_n$, $n \in \mathbb{N}_+$, we have

\[ E_\gamma f(b_0) = \frac{\pi^2}{12\log 2} \]

and $\sigma(f) = \sigma < \infty$ is non-zero. It follows that Strassen's law of the iterated logarithm holds for the corresponding processes $\theta_n'$, $n \in \mathbb{N}_+$. In particular, the classical law of the iterated logarithm

\[ \gamma\left( \limsup_{n\to\infty} \frac{\log q_n - n\pi^2/(12\log 2)}{\sigma\sqrt{2n\log\log n}} = 1 \right) = 1 \]

holds. This had been proved by Gordin and Reznik (1970) and Philipp and Stackelberg (1969). □

A result similar to Proposition 3.4.4 holds.

Proposition 3.4.7 Let $\theta : [1,\infty) \to \mathbb{R}_{++}$ be non-decreasing. Then under the assumptions of Theorem 3.2.9 the following assertions hold:

(i) $\gamma\left( S_n' > \sigma(f)\sqrt{n}\,\theta(n) \ \text{i.o.} \right) = 0$ or $1$ according as

\[ \int_1^\infty \frac{\theta(t)}{t} \exp\left( -\frac{\theta^2(t)}{2} \right) dt \]

converges or diverges.

(ii) $\gamma\left( \max_{1\le i\le n} |S_i'| < \sigma(f)\sqrt{n}/\theta(n) \ \text{i.o.} \right) = 0$ or $1$ according as

\[ \int_1^\infty \frac{\theta^2(t)}{t} \exp\left( -\frac{\pi^2\theta^2(t)}{8} \right) dt \]

converges or diverges.

Proof. These results follow from Theorem 3.2.9 and properties of standard Brownian motion. See Jain and Taylor (1973) and Jain, Jogdeo and Stout (1975) [cf. Philipp and Stout (1975)]. □

The remarks following Proposition 3.4.4 concerning the classical and Chung's laws of the iterated logarithm apply mutatis mutandis in the present context, too.

It is obvious that all the results stated in this section still hold when $\gamma$ is replaced by any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$.

Chapter 4

Ergodic theory of continued fractions

In this chapter applications of the ergodic properties of the continued fraction transformation $\tau$ and its natural extension $\bar\tau$ are given. Next, two operations ('singularization' and 'insertion') on incomplete quotients are introduced, which make it possible to obtain most of the continued fraction expansions related to the RCF expansion. Ergodic properties of these expansions are also derived.

4.0 Ergodic theory preliminaries

4.0.1 A few general concepts

Let $(X, \mathcal{X}, \mu)$ be a probability space. An $X$-valued random variable on $X$, i.e., an $(\mathcal{X}, \mathcal{X})$-measurable map from $X$ into itself (see Section A1.2), is called a transformation of $X$. A transformation $T$ of $X$ is said to be $\mu$-non-singular if and only if $\mu(T^{-1}(A)) = 0$ for any $A \in \mathcal{X}$ for which $\mu(A) = 0$; it is said to be measure preserving if and only if $\mu T^{-1} = \mu$, i.e., $\mu(T^{-1}(A)) = \mu(A)$ for any $A \in \mathcal{X}$ (see Section A1.3). (When the probability $\mu$ should be emphasized we shall say that $T$ is $\mu$-preserving.) Clearly, any $\mu$-preserving transformation of $X$ is $\mu$-non-singular. A pair $(T, \mu)$, where $T$ is a $\mu$-preserving transformation of $X$, is called an endomorphism of $X$. An endomorphism $(T, \mu)$ of $X$ is called an automorphism if and only if $T$ is bijective [that is, $T(X) = X$ and $T^{-1}$ exists] and $T^{-1}$ is $(\mathcal{X}, \mathcal{X})$-measurable. A quadruple $(X, \mathcal{X}, T, \mu)$, where $(T, \mu)$ is an endomorphism of $X$, is called a (measurable) dynamical system.

A transformation $T$ of $X$ is said to be ergodic (or metrically transitive, or indecomposable) under $\mu$ if and only if the sets $A \in \mathcal{X}$ with $T^{-1}(A) = A$, which are called $T$-invariant, satisfy either $\mu(A) = 0$ or $\mu(A) = 1$. An equivalent definition, even if seemingly more general, is that

\[ \mu\left( (T^{-1}(A) \setminus A) \cup (A \setminus T^{-1}(A)) \right) = 0 \]

for $A \in \mathcal{X}$ if and only if either $\mu(A) = 0$ or $\mu(A) = 1$. Finally, in terms of functions this is equivalent to $f = f \circ T$ $\mu$-a.s. for an $X$-valued random variable $f$ on $X$ if and only if $f$ is constant $\mu$-a.s.

In particular, $T$ is ergodic under $\mu$ if it is strongly mixing under $\mu$, that is,

\[ \lim_{n\to\infty} \mu(T^{-n}(A) \cap B) = \mu(A)\mu(B) \]

for any sets $A, B \in \mathcal{X}$. This is equivalent to

\[ \lim_{n\to\infty} \int_X (f \circ T^n)\, g \, d\mu = \int_X f \, d\mu \int_X g \, d\mu \]

for any $f \in L^\infty(X, \mathcal{X}, \mu)$ and $g \in L^1(X, \mathcal{X}, \mu)$.

Proposition 4.0.1 Let $T$ be a $\mu$-non-singular transformation of $X$. If $T$ is ergodic under $\mu$, then there exists at most one probability measure $\nu$ on $\mathcal{X}$ such that $\nu \ll \mu$ and $(T, \nu)$ is an endomorphism of $X$. Conversely, if there exists a unique measure $\nu$ on $\mathcal{X}$ with $\nu \ll \mu$ and $d\nu/d\mu > 0$ $\mu$-a.s. such that $(T, \nu)$ is an endomorphism of $X$, then $T$ is ergodic under $\mu$.

The proof of Proposition 4.0.1, which entails the concept of the Perron–Frobenius operator of $T$ (cf. Section 2.1), can be found in Lasota and Mackey (1985). □

An endomorphism $(T, \mu)$ of $X$ is said to be exact if and only if, putting

\[ \mathcal{X}_n = \{ T^{-n}(A) : A \in \mathcal{X} \}, \quad n \in \mathbb{N}, \]

where $T^0$ is the identity map, the tail $\sigma$-algebra $\bigcap_{n\in\mathbb{N}} \mathcal{X}_n$ is $\mu$-trivial, i.e., it contains only sets $A$ for which either $\mu(A) = 0$ or $\mu(A) = 1$. If an endomorphism $(T, \mu)$ of $X$ is exact, then $T$ is ergodic under $\mu$; also, for any $A \in \mathcal{X}$ for which $\mu(A) > 0$ and $T^n(A) \in \mathcal{X}$, $n \in \mathbb{N}_+$, we have

\[ \lim_{n\to\infty} \mu(T^n(A)) = 1. \]

Proposition 4.0.2 Let $T$ be a $\mu$-preserving transformation of $X$ for which $T(A) \in \mathcal{X}$ for any $A \in \mathcal{X}$. Then the endomorphism $(T, \mu)$ is exact if and only if

\[ \lim_{n\to\infty} \left\| P^n f - \int_X f \, d\mu \right\|_{1,\mu} = 0 \]

for any non-negative $f \in L^1(X, \mathcal{X}, \mu)$, where $P$ is the Perron–Frobenius operator of $T$ under $\mu$ (cf. Section 2.1).

For the proof see Boyarski and Gora (1997, p. 82). □

Theorem 4.0.3 (Birkhoff's individual ergodic theorem) Let $T$ be a $\mu$-preserving transformation of $X$. Then for any $f \in L^1(X, \mathcal{X}, \mu)$ there exists $\tilde f \in L^1(X, \mathcal{X}, \mu)$ such that

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k(x)) = \tilde f \quad \mu\text{-a.s.} \]

and

\[ \tilde f \circ T = \tilde f \quad \mu\text{-a.s.} \]

Moreover, $\int_X \tilde f \, d\mu = \int_X f \, d\mu$ and if, in addition, $T$ is ergodic under $\mu$, then $\tilde f$ is $\mu$-a.s. a constant equal to $\int_X f \, d\mu$.

A proof of the ergodic theorem can be found in, e.g., Billingsley (1965), Walters (1982), Petersen (1983) or Cornfeld et al. (1982). In particular, in Keane (1991) a short proof, essentially based on an idea of Kamae (1982), is outlined. See also Katznelson and Weiss (1982). □

Under suitable assumptions it is possible to refine Birkhoff's theorem by giving an estimate of the convergence rate to the limit $\tilde f$. The result stated below is a special case of Theorem 3 of Gal and Koksma (1950).

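Proposition 4.0.4 Let $T$ be a $\mu$-preserving transformation of $X$ which is ergodic under $\mu$. Assume that

\[ \int_X \left( \sum_{\kappa=0}^{n-1} f \circ T^\kappa - n \int_X f \, d\mu \right)^2 d\mu = O(\Psi(n)) \]

as $n \to \infty$, where $\Psi : \mathbb{N}_+ \to \mathbb{R}$ is a function such that the sequence $(\Psi(n)/n)_{n\in\mathbb{N}_+}$ is non-decreasing. Then whatever $\varepsilon > 0$ we have

\[ \sum_{\kappa=0}^{n-1} f(T^\kappa(x)) = n \int_X f \, d\mu + o\left( \Psi^{1/2}(n) \log^{\frac{3+\varepsilon}{2}} n \right) \quad \mu\text{-a.s.} \]

as $n \to \infty$. Here the constant implied in $o$ depends on $\varepsilon$ and the current point $x \in X$.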

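Given a transformation $T$ of $X$ we can define its so-called natural extension $\bar T$ as follows. Let

\[ X_T = \{ (x_i)_{i\in\mathbb{N}} \in X^{\mathbb{N}} : x_i = T(x_{i+1}), \ i \in \mathbb{N} \} \]

and define $\bar T : X_T \to X_T$ by

\[ \bar T((x_i)_{i\in\mathbb{N}}) = (T(x_0), x_0, x_1, \cdots) \]

for any $(x_i)_{i\in\mathbb{N}} = (x_0, x_1, \cdots) \in X_T$. It is easy to check that $\bar T$ is bijective. If $T$ is $\mu$-preserving, then we can also define a measure $\bar\mu$ on the $\sigma$-algebra $\mathcal{X}_T \subset \mathcal{X}^{\mathbb{N}}$ generated by the cylinder sets

\[ C(A_0, \ldots, A_n) = \{ (x_i)_{i\in\mathbb{N}} \in X_T : x_j \in A_j, \ 0 \le j \le n \}, \]

where $A_j \in \mathcal{X}$, $0 \le j \le n$, $n \in \mathbb{N}$, by setting

\[ \bar\mu(C(A_0, \ldots, A_n)) = \mu\left( \bigcap_{0\le j\le n} T^{-n+j}(A_j) \right), \quad n \in \mathbb{N}. \]

Proposition 4.0.5 If $T$ is $\mu$-preserving, then $\bar T$ is $\bar\mu$-preserving; $\bar T$ is ergodic (strongly mixing) under $\bar\mu$ if and only if $T$ is ergodic (strongly mixing) under $\mu$.

Clearly, if $(T, \mu)$ is an endomorphism of $X$, then $(\bar T, \bar\mu)$ is an automorphism of $X_T$.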

Remarks. 1. The definition just given of the natural extension $\bar T$ of $T$ is a constructive one. More generally, starting from a transformation $T$ of $X$ which is $\mu$-preserving ($\mu T^{-1} = \mu$), a bijective transformation $\bar T : \bar X \to \bar X$ is called a natural extension of $T$ if and only if (i) there exists a measurable space $(\bar X, \bar{\mathcal{X}})$ and a probability measure $\bar\mu$ on $\bar{\mathcal{X}}$ such that $\bar T$ is $\bar\mu$-preserving, and (ii) there exists a random variable $f : \bar X \to X$ such that the $\sigma$-algebra generated by $\bigcup_{n\in\mathbb{N}} \bar T^{n} f^{-1}(\mathcal{X})$ (see Section A1.1) coincides with $\bar{\mathcal{X}}$ up to sets of $\bar\mu$-probability 0, $f \circ \bar T = T \circ f$ $\bar\mu$-a.s., and $\bar\mu f^{-1} = \mu$. The natural extension is unique up to isomorphism. By this we mean that if $\bar T_i : \bar X_i \to \bar X_i$, $i = 1, 2$, are natural extensions of $T : X \to X$, with $\bar T_i$ being $\bar\mu_i$-preserving for a probability measure $\bar\mu_i$ on $\bar{\mathcal{X}}_i$ (the $\sigma$-algebra in $\bar X_i$), $i = 1, 2$, then there exist $E_i \in \bar{\mathcal{X}}_i$ with $\bar\mu_i(E_i) = 0$, $i = 1, 2$, and a one-to-one random variable $g : \bar X_1 \setminus E_1 \to \bar X_2 \setminus E_2$ such that $g \bar T_1 = \bar T_2 g$ on $\bar X_1 \setminus E_1$ and $\bar\mu_1(g^{-1}(E)) = \bar\mu_2(E)$ for any set $E$ in $\bar{\mathcal{X}}_2$ which is included in $\bar X_2 \setminus E_2$. In the case of the constructive definition we clearly have $\bar X = X_T$ while $f$ is defined by

\[ f((x_i)_{i\in\mathbb{N}}) = x_0, \quad (x_i)_{i\in\mathbb{N}} \in X_T. \]

Note that the definition of isomorphism of two natural extensions of a given endomorphism also applies to the case of two arbitrary endomorphisms or dynamical systems.

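2. Unlike ergodicity or strong mixing, exactness does not transfer from an endomorphism $(T, \mu)$ to its natural extension $(\bar T, \bar\mu)$. As $\bar T$ is invertible, $(\bar T, \bar\mu)$ cannot be exact since

\[ \bar\mu\left( \bar T(A) \right) = \bar\mu\left( \bar T^{-1}(\bar T(A)) \right) = \bar\mu(A), \]

hence $\bar\mu(\bar T^{n}(A)) = \bar\mu(A)$ for any $n \in \mathbb{N}_+$ and $A \in \bar{\mathcal{X}}$. Instead, $(\bar T, \bar\mu)$ always is a K-automorphism, which means that there exists an algebra $\mathcal{A} \subset \bar{\mathcal{X}}$ such that $\bar T^{-1}(\mathcal{A}) \subset \mathcal{A}$, $\bigcup_{n\in\mathbb{N}_+} \bar T^{n}(\mathcal{A})$ generates $\bar{\mathcal{X}}$, and the tail $\sigma$-algebra $\bigcap_{n\in\mathbb{N}_+} \bar T^{-n}(\mathcal{A})$ is $\bar\mu$-trivial. Cf. Petersen (1983, Section 2.5). □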

Finally, let us consider together with the probability space $(X, \mathcal{X}, \mu)$ and a transformation $T : X \to X$, a family of probability spaces $((Y, \mathcal{Y}, \nu_x))_{x\in X}$ and a family $(T_x)_{x\in X}$ of transformations of $Y$ such that the map $(x, y) \in X \times Y \to T_x(y) \in Y$ is a $Y$-valued random variable on $X \times Y$. The map $S : X \times Y \to X \times Y$ defined by

\[ S(x, y) = (T(x), T_x(y)), \quad (x, y) \in X \times Y, \]

is called a skew product of $T$ and $(T_x)_{x\in X}$. In many cases the natural extensions are constructed as skew products. Several examples can be found in the next sections.

Assuming that $T$ is $\mu$-preserving and $T_x$ is $\nu_x$-preserving for any $x \in X$, we might expect the skew product $S$ to be $\nu$-preserving, where $\nu$ is the probability measure on $\mathcal{X} \otimes \mathcal{Y}$ defined by

\[ \nu(A \times B) = \int_A \nu_x(B)\, \mu(dx), \quad A \in \mathcal{X}, \ B \in \mathcal{Y}. \]

Unfortunately, such a result does not hold even if it is claimed in Boyarski and Gora (1997, p. 64). It is contradicted, e.g., by the case of the natural extension $\bar\tau$ of $\tau$. Cf. the next subsection.

4.0.2 The special case of the transformations $\tau$ and $\bar\tau$

It is possible to give a direct proof of the ergodicity under $\gamma$ of the continued fraction transformation $\tau$. See, e.g., Billingsley (1965, pp. 44–45).

Results proved in Chapter 2 allow us to assert that actually $\tau$ is strongly mixing under $\gamma$ and any $\gamma_a$, $a \in I$, thus in particular under $\gamma_0 = \lambda$. This is a direct consequence of Corollary 1.3.15. Therefore $\tau$ is also ergodic under $\gamma$ and any $\gamma_a$, $a \in I$. Moreover, the endomorphism $(\tau, \gamma)$ is exact by Corollary 2.1.8 and Proposition 4.0.2. It follows from Proposition 4.0.1 that any $\nu \ll \lambda$ for which $\tau$ is $\nu$-preserving should coincide with $\gamma$.

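As for $\bar\tau$, we shall show that it can be viewed as the natural extension of $\tau$ in the meaning of the constructive definition given in the preceding subsection. Indeed, in our case $X_T$ from the preceding subsection is

\[ \Omega_\tau = \{ (\omega_i)_{i\in\mathbb{N}} \in \Omega^{\mathbb{N}} : \omega_i = \tau(\omega_{i+1}), \ i \in \mathbb{N} \}, \]

and the natural extension of $\tau$ appears to be (we are bound to change notation) the transformation given by

\[ \tau_e((\omega_i)_{i\in\mathbb{N}}) = (\tau(\omega_0), \omega_0, \omega_1, \cdots) \]

for any $(\omega_i)_{i\in\mathbb{N}} = (\omega_0, \omega_1, \cdots) \in \Omega_\tau$. Let us remark that by the very definition of $\Omega_\tau$ we have $\omega_{i+1} = 1/(\kappa_i + \omega_i)$ for some $\kappa_i \in \mathbb{N}_+$ whatever $i \in \mathbb{N}$. Hence $\Omega_\tau$ can be viewed as the Cartesian product

\[ \Omega \times \mathbb{N}_+^{\mathbb{N}_+} \]

or, equivalently, $\Omega \times \Omega = \Omega^2$. More precisely, there is a one-to-one correspondence between $\Omega_\tau$ and $\Omega^2$ given by

\[ (\omega_i)_{i\in\mathbb{N}} \in \Omega_\tau \ \leftrightarrow \ \left( \omega_0, [\lfloor \omega_1^{-1} \rfloor, \lfloor \omega_2^{-1} \rfloor, \cdots] \right) \in \Omega^2. \]

Then there also is a one-to-one correspondence between

\[ \tau_e((\omega_i)_{i\in\mathbb{N}}) = (\tau(\omega_0), \omega_0, \omega_1, \cdots) \in \Omega_\tau \]

and

\[ \left( \tau(\omega_0), \frac{1}{\lfloor \omega_0^{-1} \rfloor + [\lfloor \omega_1^{-1} \rfloor, \lfloor \omega_2^{-1} \rfloor, \cdots]} \right) \in \Omega^2. \]

These considerations show that we can identify $\tau_e : \Omega_\tau \to \Omega_\tau$ and $\bar\tau : \Omega^2 \to \Omega^2$ defined as in Subsection 1.3.1 by

\[ \bar\tau(\omega, \theta) = \left( \tau(\omega), \frac{1}{a_1(\omega) + \theta} \right), \quad (\omega, \theta) \in \Omega^2. \]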

It follows from Proposition 4.0.5 that $\bar\tau$ is strongly mixing (thus ergodic) under $\bar\gamma$. Also, $(\bar\tau, \bar\gamma)$ is a K-automorphism. Clearly, $\bar\tau$ can be viewed as a skew product.
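As an illustration of the map $(\omega, \theta) \mapsto (\tau(\omega), 1/(a_1(\omega) + \theta))$ and of its invertibility, here is a minimal sketch in exact rational arithmetic. The function names are hypothetical, chosen for this example only; rational inputs are used so that equality can be checked exactly, even though rationals have finite expansions and the map can only be iterated finitely often on them.

```python
from fractions import Fraction
from math import floor

def first_digit(omega):
    """a_1(omega) = floor(1/omega) for 0 < omega < 1."""
    return floor(1 / omega)

def gauss_map(omega):
    """tau(omega) = 1/omega - floor(1/omega) for 0 < omega < 1."""
    inverse = 1 / omega
    return inverse - floor(inverse)

def natural_extension(omega, theta):
    """Forward map: (omega, theta) -> (tau(omega), 1/(a_1(omega) + theta))."""
    return gauss_map(omega), 1 / (first_digit(omega) + theta)

def natural_extension_inverse(omega, theta):
    """Inverse map: the digit pushed into theta is read back and prepended to omega."""
    digit = first_digit(theta)
    return 1 / (digit + omega), gauss_map(theta)

omega, theta = Fraction(3, 7), Fraction(2, 5)
image = natural_extension(omega, theta)
print(image, natural_extension_inverse(*image))
```

The forward step moves the leading digit of the first coordinate to the front of the second coordinate; the inverse step moves it back, which is why the two-dimensional map is bijective while $\tau$ itself is not.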

4.1 Classical results and generalizations

4.1.1 The case of incomplete quotients

Since $\tau$ is $\gamma$-preserving and ergodic under $\gamma$, it follows from Theorem 4.0.3 that

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{\kappa=0}^{n-1} f \circ \tau^\kappa = \frac{1}{\log 2} \int_0^1 \frac{f(x)}{x+1}\, dx \quad \text{a.e.} \tag{4.1.1} \]

for any measurable function $f : I \to \mathbb{R}$ such that $\int_I |f| \, d\lambda < \infty$. It is clear that under suitable further assumptions on $f$, Proposition 4.0.4 should lead to estimates of convergence rates in (4.1.1).
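The ergodic average in (4.1.1) can be explored numerically. The sketch below is only a heuristic illustration, not part of the theory: it replaces a "typical" point of $I$ by a pseudo-random rational $p/q$ with a 10,000-bit denominator (an assumption made so that all digits can be computed exactly with Euclid's algorithm), and compares the average of $f = I_{(a_1 = 1)}$ along the orbit with the value $\log(4/3)/\log 2$ of the right-hand side of (4.1.1). The function names and the seed are hypothetical.

```python
import random
from math import log2

def rational_cf_digits(p, q):
    """Digits a_1, a_2, ... of the regular continued fraction of p/q, 0 < p < q."""
    digits = []
    while p:
        a, r = divmod(q, p)   # q = a*p + r, hence p/q = 1/(a + r/p)
        digits.append(a)
        p, q = r, p           # tau(p/q) = r/p
    return digits

def birkhoff_average(f, digits):
    """(1/n) * sum of f over the n available digits, for f depending on a_1 only."""
    return sum(f(a) for a in digits) / len(digits)

rng = random.Random(0)                       # fixed seed, for reproducibility
q = rng.getrandbits(10_000) | (1 << 9_999)   # denominator with exactly 10,000 bits
p = rng.getrandbits(9_999) | 1               # numerator, guaranteed 0 < p < q
digits = rational_cf_digits(p, q)

empirical = birkhoff_average(lambda a: a == 1, digits)
limit = log2(4 / 3)   # (1/log 2) * integral of dx/(1+x) over (1/2, 1]
print(len(digits), empirical, limit)
```

Since rationals form a set of measure zero, agreement here is suggestive only; the a.e. statement itself is what Theorem 4.0.3 provides.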

We now state several classical results which can be derived from (4.1.1) by specializing $f$, together with the corresponding estimates of the convergence rates, when available. Let us note that throughout this subsection the constants implied in $o$ will depend on $\varepsilon$, the current point in $\Omega$, and the other variables involved.

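Proposition 4.1.1 [Asymptotic relative digit frequencies, Lévy (1929)] For any $i \in \mathbb{N}_+$ we have

\[ \lim_{n\to\infty} \frac{\mathrm{card}\{\kappa : a_\kappa = i, \ 1 \le \kappa \le n\}}{n} = \frac{1}{\log 2} \log\left( 1 + \frac{1}{i(i+2)} \right) \quad \text{a.e.} \]

More precisely, whatever $\varepsilon > 0$, for any $i \in \mathbb{N}_+$ we have

\[ \frac{\mathrm{card}\{\kappa : a_\kappa = i, \ 1 \le \kappa \le n\}}{n} = \frac{1}{\log 2} \log\left( 1 + \frac{1}{i(i+2)} \right) + o\left( n^{-\frac{1}{2}} \log^{(3+\varepsilon)/2} n \right) \quad \text{a.e.} \]

as $n \to \infty$.

Proof. The first equation in the above statement follows from (4.1.1) by taking $f = I_{(a_1 = i)}$, hence $f \circ \tau^\kappa = I_{(a_1 \circ \tau^\kappa = i)} = I_{(a_{\kappa+1} = i)}$, $\kappa \in \mathbb{N}$. The second equation follows from Proposition 4.0.4 on account of Corollaries 1.3.15 and A3.3 which yield $\Psi(n) = n$, $n \in \mathbb{N}_+$. □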
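The limiting frequencies can be tabulated directly. The following sketch (function names are illustrative) evaluates $\log(1 + 1/(i(i+2)))/\log 2$, checks it against the Gauss measure of the interval $(1/(i+1), 1/i]$ on which $a_1 = i$, and uses the telescoping identity $\sum_{i=1}^{N} \log\frac{(i+1)^2}{i(i+2)} = \log\frac{2(N+1)}{N+2}$ to confirm that the frequencies sum to 1 as $N \to \infty$.

```python
from math import log, log2

def digit_frequency(i):
    """Asymptotic relative frequency of the digit i (Proposition 4.1.1)."""
    return log2(1 + 1 / (i * (i + 2)))

def gauss_measure(a, b):
    """gamma([a, b]) = (1/log 2) * integral from a to b of dx/(1+x)."""
    return log((1 + b) / (1 + a)) / log(2)

for i in range(1, 6):
    # a_1 = i exactly on (1/(i+1), 1/i], so both columns should agree.
    print(i, digit_frequency(i), gauss_measure(1 / (i + 1), 1 / i))

N = 1000
partial_sum = sum(digit_frequency(i) for i in range(1, N + 1))
print(partial_sum, log2(2 * (N + 1) / (N + 2)))
```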

A more general result yielding the asymptotic relative $m$-digit block frequencies is also easily obtained.

Proposition 4.1.2 Whatever $\varepsilon > 0$, for any $m \in \mathbb{N}_+$ and $i^{(m)} = (i_1, \cdots, i_m) \in \mathbb{N}_+^m$ we have

\[ \frac{\mathrm{card}\{\kappa : (a_\kappa, \cdots, a_{\kappa+m-1}) = i^{(m)}, \ 1 \le \kappa \le n\}}{n} = \frac{1}{\log 2} \log \frac{1 + v(i^{(m)})}{1 + u(i^{(m)})} + o\left( n^{-\frac{1}{2}} \log^{(3+\varepsilon)/2} n \right) \quad \text{a.e.} \]

as $n \to \infty$.

The proof is quite similar to that of the preceding proposition. In (4.1.1) we should take $f = I_{((a_1, \cdots, a_m) = i^{(m)})}$. □

It is important to note that the asymptotic relative digit frequencies as well as the asymptotic relative $m$-digit block frequencies, $m \ge 2$, constitute probability distributions on $\mathbb{N}_+$ respectively $\mathbb{N}_+^m$. This is quite easily checked in the first case and not so easily in the second one (induction on $m$!). Actually, this follows from (4.1.1) on account of the countable additivity of the integral there with respect to the integrand.
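A block frequency can be computed from the endpoints of the corresponding fundamental interval. The sketch below assumes, as in Subsection 1.2.3, that $u(i^{(m)})$ and $v(i^{(m)})$ are the smaller and larger of the two rationals $[i_1, \cdots, i_m]$ and $[i_1, \cdots, i_m + 1]$; the function names are hypothetical. It also illustrates the additivity remark: summing 2-digit block frequencies over the second digit recovers the Gauss measure of the union of the corresponding intervals.

```python
from fractions import Fraction
from math import log2

def cf_value(digits):
    """Exact value of [i_1, ..., i_m] = 1/(i_1 + 1/(i_2 + ...))."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x

def block_frequency(digits):
    """(1/log 2) * log((1 + v)/(1 + u)) for the fundamental interval of the block."""
    digits = list(digits)
    first = cf_value(digits)
    second = cf_value(digits[:-1] + [digits[-1] + 1])
    u, v = min(first, second), max(first, second)
    return log2(float((1 + v) / (1 + u)))

print(block_frequency([1]), block_frequency([2, 2]), block_frequency([1, 1, 1]))

# Blocks (3, 1), ..., (3, 200) tile the interval between [3, 1] = 1/4 and [3, 201].
tiled = sum(block_frequency([3, j]) for j in range(1, 201))
direct = log2(float((1 + cf_value([3, 201])) / (1 + Fraction(1, 4))))
print(tiled, direct)
```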

We now give other results related to asymptotic relative digit frequencies.

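Corollary 4.1.3 (Asymptotic relative frequencies of digits between two given values) For any $i, j \in \mathbb{N}_+$ such that $i \le j$ we have

\[ \lim_{n\to\infty} \frac{\mathrm{card}\{\kappa : i \le a_\kappa \le j, \ 1 \le \kappa \le n\}}{n} = \frac{1}{\log 2} \log \frac{(i+1)(j+1)}{i(j+2)} \quad \text{a.e.} \]

More precisely, whatever $\varepsilon > 0$, for any $i, j \in \mathbb{N}_+$ such that $i \le j$ we have

\[ \frac{\mathrm{card}\{\kappa : i \le a_\kappa \le j, \ 1 \le \kappa \le n\}}{n} = \frac{1}{\log 2} \log \frac{(i+1)(j+1)}{i(j+2)} + o\left( n^{-\frac{1}{2}} \log^{\frac{3+\varepsilon}{2}} n \right) \quad \text{a.e.} \]

as $n \to \infty$.

This is a direct consequence of Proposition 4.1.1, which can be also obtained from (4.1.1) by taking $f = I_{(i \le a_1 \le j)}$.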

Proposition 4.1.4 (Asymptotic relative frequencies of digits exceeding a given value) For any $i \in \mathbb{N}_+$ we have

\[ \lim_{n\to\infty} \frac{\mathrm{card}\{\kappa : a_\kappa \ge i, \ 1 \le \kappa \le n\}}{n} = \frac{1}{\log 2} \log \frac{i+1}{i} \quad \text{a.e.} \]

More precisely, whatever $\varepsilon > 0$, for any $i \in \mathbb{N}_+$ we have

\[ \frac{\mathrm{card}\{\kappa : a_\kappa \ge i, \ 1 \le \kappa \le n\}}{n} = \frac{1}{\log 2} \log \frac{i+1}{i} + o\left( n^{-\frac{1}{2}} \log^{\frac{3+\varepsilon}{2}} n \right) \quad \text{a.e.} \]

as $n \to \infty$.

The proof is quite similar to that of Proposition 4.1.1. In (4.1.1) we should take $f = I_{(a_1 \ge i)}$. □

Let us note that on account of the complete additivity of the asymptotic relative digit frequencies, the first half of Proposition 4.1.4 is a direct consequence of the first half of Proposition 4.1.1.
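The additivity just mentioned is easy to check numerically. In the sketch below (illustrative function names), the closed forms of Corollary 4.1.3 and Proposition 4.1.4 are compared with sums of the single-digit frequencies of Proposition 4.1.1.

```python
from math import log2

def digit_frequency(i):
    """Frequency of the digit i: log2(1 + 1/(i(i+2)))."""
    return log2(1 + 1 / (i * (i + 2)))

def frequency_between(i, j):
    """Frequency of digits in {i, ..., j}: log2((i+1)(j+1) / (i(j+2)))."""
    return log2((i + 1) * (j + 1) / (i * (j + 2)))

def frequency_at_least(i):
    """Frequency of digits >= i: log2((i+1)/i)."""
    return log2((i + 1) / i)

print(frequency_between(2, 5), sum(digit_frequency(k) for k in range(2, 6)))
print(frequency_at_least(3), 1 - frequency_between(1, 2))
```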

Now, let $m \in \mathbb{N}_+$ be such that $m \ge 2$, and fix arbitrarily an $\ell \in \mathbb{N}_+$ not exceeding $m$. It then follows from Proposition 4.1.1 that

\[ \lim_{n\to\infty} \frac{\mathrm{card}\{\kappa : a_\kappa \equiv \ell \bmod m, \ 1 \le \kappa \le n\}}{n} = \frac{1}{\log 2} \sum_{p=0}^{\infty} \log \frac{(\ell + pm + 1)^2}{(\ell + pm)(\ell + pm + 2)} \quad \text{a.e.} \]

[By taking $f = I_{(a_1 \equiv \ell \bmod m)}$ in (4.1.1), an estimate of the convergence rate can be also obtained.] It has been shown that the sum of the series above can be expressed in terms of Euler's Gamma-function. To be precise, the following result holds.

Proposition 4.1.5 [Nolte (1990)] We have

\[ \frac{1}{\log 2} \sum_{p=0}^{\infty} \log \frac{(\ell + pm + 1)^2}{(\ell + pm)(\ell + pm + 2)} = \frac{1}{\log 2} \log\left( \frac{\Gamma\left(\frac{\ell}{m}\right) \Gamma\left(\frac{\ell+2}{m}\right)}{\Gamma^2\left(\frac{\ell+1}{m}\right)} \right). \]

The proof rests on a special case of a result from Whittaker and Watson (1927, Section 12.13), which reads as follows.

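Let $\alpha_i, \beta_i \in \mathbb{C} \setminus \mathbb{N}_+$, $1 \le i \le r$, for a given $r \in \mathbb{N}_+$. Then the infinite product

\[ \prod_{n\in\mathbb{N}_+} \frac{(n - \alpha_1)(n - \alpha_2) \cdots (n - \alpha_r)}{(n - \beta_1)(n - \beta_2) \cdots (n - \beta_r)} \]

converges if and only if $\sum_{i=1}^{r} \alpha_i = \sum_{i=1}^{r} \beta_i$. If this condition is fulfilled, then

\[ \prod_{n\in\mathbb{N}_+} \frac{(n - \alpha_1)(n - \alpha_2) \cdots (n - \alpha_r)}{(n - \beta_1)(n - \beta_2) \cdots (n - \beta_r)} = \prod_{i=1}^{r} \frac{\Gamma(1 - \beta_i)}{\Gamma(1 - \alpha_i)}. \tag{4.1.2} \]

□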

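For example, using the well known relations $\Gamma(z)\Gamma(1 - z) = \pi/\sin \pi z$, $z \notin \mathbb{Z}$, and $\Gamma(z + 1) = z\Gamma(z)$, $z \notin -\mathbb{N}$, if we take $m = 2$ and $\ell = 1$ then we find that

\[ \lim_{n\to\infty} \frac{\mathrm{card}\{\kappa : a_\kappa \equiv 1 \bmod 2, \ 1 \le \kappa \le n\}}{n} = \frac{1}{\log 2} \log \frac{\Gamma(1/2)\Gamma(3/2)}{\Gamma^2(1)} = \frac{\log \pi}{\log 2} - 1 = 0.6514 \cdots \quad \text{a.e.,} \]

i.e., about 65 % of the occurring digits are odd a.e.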
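The closed form of Proposition 4.1.5 can be compared with a truncated version of the series. The sketch below uses `math.lgamma` for the logarithm of the Gamma function; the function names and the truncation length are choices made for this illustration only. It also checks that the frequencies of the $m$ residue classes add up to 1.

```python
from math import lgamma, log, log1p, pi

def residue_frequency(l, m):
    """Closed form: log(Gamma(l/m) Gamma((l+2)/m) / Gamma((l+1)/m)^2) / log 2, 1 <= l <= m."""
    return (lgamma(l / m) + lgamma((l + 2) / m) - 2 * lgamma((l + 1) / m)) / log(2)

def residue_frequency_series(l, m, terms=200_000):
    """Truncated series: sum over p of log((k+1)^2 / (k(k+2))) with k = l + p*m."""
    total = 0.0
    for p in range(terms):
        k = l + p * m
        total += log1p(1 / (k * (k + 2)))   # (k+1)^2 / (k(k+2)) = 1 + 1/(k(k+2))
    return total / log(2)

print(residue_frequency(1, 2), log(pi) / log(2) - 1)      # odd digits
print(residue_frequency(1, 4))                            # digits = 1 mod 4
print(residue_frequency_series(1, 2))                     # truncated series, m = 2
print(sum(residue_frequency(l, 3) for l in range(1, 4)))  # all classes mod 3
```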

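Next, using the same relations for the function $\Gamma$, for $m = 4$ and $\ell = 1$ we find that

\[ \lim_{n\to\infty} \frac{\mathrm{card}\{\kappa : a_\kappa \equiv 1 \bmod 4, \ 1 \le \kappa \le n\}}{n} = \frac{1}{\log 2} \log \frac{\Gamma(1/4)\Gamma(3/4)}{\Gamma^2(1/2)} = \frac{1}{2} \quad \text{a.e.,} \]

i.e., about half of the occurring digits are $\equiv 1 \bmod 4$ a.e.

Similar considerations can be made about 2-digit blocks. For example, we have

\[ \lim_{n\to\infty} \frac{\mathrm{card}\{\kappa : (a_\kappa, a_{\kappa+1}) \equiv (0, 0) \bmod 2, \ 1 \le \kappa \le n\}}{n} = \frac{1}{\log 2} \sum_{i\in\mathbb{N}_+} \sum_{j\in\mathbb{N}_+} \log \frac{(4ij + 1)(4ij + 2i + 2j + 2)}{(4ij + 2i + 1)(4ij + 2j + 1)} \quad \text{a.e.,} \]

which by (4.1.2) is equal to

\[ \frac{1}{\log 2} \sum_{i\in\mathbb{N}_+} \log \frac{\Gamma\left(1 + \frac{2i+1}{4i}\right) \Gamma\left(1 + \frac{1}{4i+2}\right)}{\Gamma\left(1 + \frac{1}{4i}\right) \Gamma\left(1 + \frac{i+1}{2i+1}\right)}. \]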

Nolte (op. cit.) proved that the last quantity can be expressed as

\[ \alpha + \frac{1}{\log 2} \sum_{n\ge 2} (-1)^n \frac{\zeta(n) - 1}{n} \left( (2^{2-n} - 2^{2-2n} - 1)(\zeta(n) - 1) + \frac{2^{n-1} - 1}{2^{2n-2}} \right), \]

where

\[ \alpha = \log 2 - 1 + \frac{2}{\log 2} \log\left( 6\sqrt{2\pi} \right) - \frac{4}{\log 2} \log \Gamma\left( \frac{1}{4} \right) = 0.08167 \cdots. \]

Setting $y = 2 - \log \pi/\log 2 = 0.3485 \ldots$, Nolte's computations show that

\[ \lim_{n\to\infty} \frac{\mathrm{card}\{\kappa : (a_\kappa, a_{\kappa+1}) \equiv (a, b) \bmod 2, \ 1 \le \kappa \le n\}}{n} \]

is a.e. equal to

z = 0.11694 · · · for (a, b) = (0, 0);
y − z = 0.23156 · · · for (a, b) = (0, 1) or (1, 0);
1 − 2y + z = 0.41993 · · · for (a, b) = (1, 1).
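The value of $z$ can be approximated by truncating the double series for even-even blocks. The sketch below is a rough numerical check under stated assumptions: the summand is rewritten as $\log(1 + 1/D)$ with $D = (4ij + 2i + 1)(4ij + 2j + 1)$, which is legitimate because the numerator $(4ij + 1)(4ij + 2i + 2j + 2)$ exceeds $D$ by exactly 1, and the truncation level `n_max` is an arbitrary choice. The series converges slowly, so only about three correct decimals should be expected.

```python
from math import log, log1p, pi

def even_even_block_frequency(n_max=400):
    """Truncated double series for the frequency of blocks (even, even)."""
    total = 0.0
    for i in range(1, n_max + 1):
        for j in range(1, n_max + 1):
            d = (4 * i * j + 2 * i + 1) * (4 * i * j + 2 * j + 1)
            total += log1p(1 / d)   # log of (d + 1)/d
    return total / log(2)

z_approx = even_even_block_frequency()
y = 2 - log(pi) / log(2)   # frequency of even digits
print(z_approx, y - z_approx, 1 - 2 * y + z_approx)
```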

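Actually, all the results we have proved so far are special cases of the following result.

Proposition 4.1.6 Given $m \in \mathbb{N}_+$, let $H : \mathbb{N}_+^m \to \mathbb{R}$ be such that

\[ \sum_{i^{(m)} \in \mathbb{N}_+^m} |H(i^{(m)})| \left( v(i^{(m)}) - u(i^{(m)}) \right) < \infty \]

[which is equivalent to $E_\gamma |H(a_1, \cdots, a_m)| < \infty$]. Then we have

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{\kappa=0}^{n-1} H(a_\kappa, \cdots, a_{\kappa+m-1}) = \alpha_m \quad \text{a.e.,} \]

where

\[ \alpha_m = \frac{1}{\log 2} \sum_{i^{(m)} \in \mathbb{N}_+^m} H(i^{(m)}) \log \frac{1 + v(i^{(m)})}{1 + u(i^{(m)})}. \]

If, in addition,

\[ E_\lambda H^2(a_1, \cdots, a_m) = \sum_{i^{(m)} \in \mathbb{N}_+^m} H^2(i^{(m)}) \left( v(i^{(m)}) - u(i^{(m)}) \right) < \infty \]

[which is equivalent to $E_\gamma H^2(a_1, \cdots, a_m) < \infty$], then whatever $\varepsilon > 0$ we have

\[ \frac{1}{n} \sum_{\kappa=0}^{n-1} H(a_\kappa, \cdots, a_{\kappa+m-1}) = \alpha_m + o\left( n^{-\frac{1}{2}} \log^{(3+\varepsilon)/2} n \right) \quad \text{a.e.} \]

as $n \to \infty$.

For the proof this time the choice of $f$ in (4.1.1) is

\[ f(\omega) = H(a_1(\omega), \cdots, a_m(\omega)), \quad \omega \in \Omega, \]

while Corollaries 1.3.15 and A3.3 should be also invoked. □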

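Remark. A generalization of the second half of Proposition 4.1.6 was given by Philipp (1967). It allows the integer $m$ to vary in relation to $n$, and reads as follows.

Proposition 4.1.7 Let $H : \bigcup_{m\in\mathbb{N}_+} \mathbb{N}_+^m \to \mathbb{R}$ be such that

\[ E_\lambda H^2(a_1, \cdots, a_m) < \infty \]

for any $m \in \mathbb{N}_+$. Whatever $\varepsilon > 0$, if $2^m \le n < 2^{m+1}$ then

\[ \frac{1}{n} \sum_{\kappa=0}^{n-1} H(a_\kappa, \cdots, a_{\kappa+m-1}) = \alpha_m + o\left( n^{-\frac{1}{2}} \alpha_m^2 \log^{2+\varepsilon} n \right) \quad \text{a.e.} \]

as $n \to \infty$. □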

We shall now consider other important special cases of Proposition 4.1.6. With $m = 1$ and

\[ H(i) = H_p(i) = \begin{cases} i^p & \text{if } p < 1, \ p \ne 0, \\ \log i & \text{if } p = 0 \end{cases} \]

for $i \in \mathbb{N}_+$, we obtain the following results.

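Proposition 4.1.8 We have

\[ \lim_{n\to\infty} (a_1 \cdots a_n)^{1/n} = K_0 \quad \text{a.e.} \]

and

\[ \lim_{n\to\infty} \left( \frac{a_1^p + \cdots + a_n^p}{n} \right)^{1/p} = K_p \quad \text{a.e.} \]

for any $p < 1$, $p \ne 0$, where

\[ K_0 = \prod_{i\in\mathbb{N}_+} \left( 1 + \frac{1}{i(i+2)} \right)^{\log i/\log 2} = \exp\left( \frac{1}{\log 2} \int_0^1 \frac{\log \lfloor 1/t \rfloor}{1 + t}\, dt \right) = 2.685452 \cdots \]

and

\[ K_p = \left( \frac{1}{\log 2} \sum_{i\in\mathbb{N}_+} i^p \log\left( 1 + \frac{1}{i(i+2)} \right) \right)^{1/p} = \left( \frac{1}{\log 2} \int_0^1 \frac{(\lfloor 1/t \rfloor)^p}{1 + t}\, dt \right)^{1/p}. \]

In particular,

K−1 = 1.745405 · · · , K−2 = 1.450340 · · · , K−3 = 1.313507 · · · ,
K−4 = 1.236961 · · · , K−5 = 1.189003 · · · , K−6 = 1.156552 · · · ,
K−7 = 1.133323 · · · , K−8 = 1.115964 · · · , K−9 = 1.102543 · · · ,
K−10 = 1.091877 · · · .

More precisely, whatever $\varepsilon > 0$ we have

\[ (a_1 \cdots a_n)^{1/n} = K_0 + o\left( n^{-\frac{1}{2}} \log^{\frac{3+\varepsilon}{2}} n \right) \quad \text{a.e.} \]

as $n \to \infty$, and

\[ \left( \frac{a_1^p + \cdots + a_n^p}{n} \right)^{1/p} = K_p + o\left( n^{-\frac{1}{2}} \log^{\frac{3+\varepsilon}{2}} n \right) \quad \text{a.e.} \]

for any $p < 1/2$, $p \ne 0$, as $n \to \infty$.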
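The constants $K_p$ can be approximated by truncating the defining series. The sketch below is a naive evaluation (hypothetical function names; the truncation length is an arbitrary choice) and is not one of the rapidly converging schemes from the literature: for $p = 0$ the tail decays only like $(\log N)/N$, so roughly three decimals of $K_0$ should be expected, while for negative integers $p$ the series converges much faster.

```python
from math import exp, log, log1p

LOG2 = log(2)

def gauss_kuzmin_weight(i):
    """Limiting frequency of the digit i: log(1 + 1/(i(i+2))) / log 2."""
    return log1p(1 / (i * (i + 2))) / LOG2

def mean_constant(p, terms=200_000):
    """Truncated-series approximation of K_p for p = 0 or p < 0."""
    if p == 0:
        return exp(sum(log(i) * gauss_kuzmin_weight(i) for i in range(1, terms + 1)))
    total = sum(i ** p * gauss_kuzmin_weight(i) for i in range(1, terms + 1))
    return total ** (1 / p)

for p in (0, -1, -2):
    print(p, mean_constant(p))
```

For $0 < p < 1$ the same series applies in principle, but its terms decay like $i^{p-2}$, so a plain truncation converges too slowly to be useful there.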

The cases $p = 0$ and $p = -1$ leading to the asymptotic a.e. values $K_0$ and $K_{-1}$ of the geometric, respectively, harmonic mean of the first $n$ incomplete quotients as $n \to \infty$, were studied by Khintchine (1934/35). Ever since its discovery much effort has been put in the numerical evaluation of $K_0$. See Lehmer (1939), Pedersen (1959), Shanks and Wrench, Jr. (1959), Wrench, Jr. (1960). In the last reference $K_0$ has been evaluated to 155 decimal places. Recently, using work by Wrench, Jr. and Shanks (1996), Bailey et al. (1997) have presented rapidly converging series for any $K_p$, $p < 1$, allowing them to evaluate $K_0$ and $K_{-1}$ to 7,350 decimal places and $K_p$ for $p = -2, -3, \cdots, -10$ to 50 decimal places. Setting

\[ \zeta(s, n) = \zeta(s) - \sum_{i=1}^{n} i^{-s}, \quad s > 1, \ n \in \mathbb{N}_+, \]

the following identities hold:

(i) for any $n \in \mathbb{N}_+$ we have

\[ \log K_0 = \frac{1}{\log 2} \left[ \sum_{i\in\mathbb{N}_+} \frac{\zeta(2i, n) A_i}{i} - \sum_{2\le i\le n} \log\left( 1 - \frac{1}{i} \right) \log\left( 1 + \frac{1}{i} \right) \right], \]

where

\[ A_i = \sum_{\kappa=1}^{2i-1} (-1)^{\kappa-1}/\kappa, \quad i \in \mathbb{N}_+; \]

(ii) whatever the negative integer $p$, for any $n \in \mathbb{N}_+$ we have

\[ K_p^p = \frac{1}{\log 2} \left[ \sum_{i\in\mathbb{N}_+} \frac{\sum_{j\in\mathbb{N}} \binom{j-p-1}{-p-1} \zeta(2i + j - p, n)}{i} - \sum_{2\le i\le n} (i - 1)^p \log\left( 1 - \frac{1}{i^2} \right) \right]; \]

(iii) in particular, for any $n \in \mathbb{N}_+$ we have

\[ \frac{1}{K_{-1}} = \frac{1}{\log 2} \left[ \sum_{i\in\mathbb{N}_+} \frac{n^{-1} - \sum_{j=2}^{2i} \zeta(j, n)}{i} - \sum_{2\le i\le n} \frac{\log(1 - i^{-2})}{i - 1} \right]. \]

Clearly, for $n = 1$ the sums $\sum_{2\le i\le n}$ occurring above are empty, thus zero, so that both $K_p^p \log 2$ whatever the negative integer $p$ and $(\log K_0)(\log 2)$ can be cast in terms of series involving values of the Riemann zeta function and rationals.

From (i) above, the elegant integral representation

\[ \log K_0 = -\frac{1}{\log 2} \int_0^1 \frac{\log[\sin(\pi t)/\pi t]}{t(t + 1)}\, dt \]

can be derived. Let us note that we also have

\[ \log K_0 = \log 2 + \frac{1}{\log 2} \int_0^1 \frac{\log[\pi t(1 - t^2)/\sin \pi t]}{t(t + 1)}\, dt, \]

as shown in Shanks and Wrench, Jr. (1959). Actually, the second equation for $\log K_0$ follows from the first one since

\[ \int_0^1 \frac{\log(1 - t^2)}{t(t + 1)}\, dt = -\log^2 2. \]

See Bailey et al. (op. cit. p. 419).

Remarks. 1. Whatever $p \in \mathbb{R}$ the series $\sum_{i\in\mathbb{N}_+} a_i^p$ is divergent a.e. For $p < 0$ the assertion follows immediately from Proposition 4.1.8 while for $p \ge 0$ it is obvious since in this case clearly $\sum_{i=1}^{n} a_i^p \ge n$, $n \in \mathbb{N}_+$. For $p < 0$ arbitrarily large in absolute value this might seem strange at first sight. Actually, things are quite natural since by Proposition 4.1.1 any digit $i \in \mathbb{N}_+$ occurs a.e. infinitely often (and thus there is no need to invoke Proposition 4.1.8).

2. It has been proved by Salat (1969, 1984) that from a topological standpoint the sets of probability 1 in Propositions 4.1.1 and 4.1.8 (for $p = 0$) are only of the first Baire category, i.e., they are countable unions of nowhere dense subsets of $I$.

3. A set which is 'small' in the measure theoretical sense can be quite 'large' from the point of view of topology. Consider, for example, the set $E_2$ of all numbers in $[0, 1)$ whose RCF digits are 1 or 2. It is a trivial consequence of Proposition 4.1.1 that $\lambda(E_2) = \gamma(E_2) = 0$. On the other hand, it is also clear that $E_2$ has the power of the continuum.

To express the 'topological size' of sets like $E_2$ the concepts of Hausdorff measure and Hausdorff dimension are suitable. We first recall their formal definitions and then outline two applications of these concepts to continued fractions. Given a subset $E$ of $\mathbb{R}^n$, for any $\varepsilon, \delta > 0$ put

\[ H_\varepsilon^\delta(E) = \inf_{\mathcal{U}} \sum_i \mathrm{diam}(U_i)^\delta, \]

where the infimum is taken over all open coverings $\mathcal{U} = \{U_i\}_i$ of $E$ such that $\mathrm{diam}(U_i) \le \varepsilon$. The Hausdorff measure $H^\delta(E)$ and the Hausdorff dimension $\dim_H(E)$ of $E$ are then defined as

\[ H^\delta(E) = \lim_{\varepsilon\to 0} H_\varepsilon^\delta(E), \quad \dim_H(E) = \inf\left\{ \delta : H^\delta(E) = 0 \right\}. \]

See Falconer (1986, 1990), Harman (1998), and Rogers (1998).

It follows from Proposition 1.1.1 (see also Corollary 4.1.30) that for any $\omega \in \Omega$ the inequality

\[ \left| \omega - \frac{p}{q} \right| < \frac{1}{q^2} \]

has infinitely many solutions in integers $p, q \in \mathbb{N}_+$ with g.c.d. $(p, q) = 1$. Let then $M_c$ denote the set of all $x \in [0, 1)$ satisfying

\[ \left| x - \frac{p}{q} \right| < \frac{1}{q^c} \]

for infinitely many pairs $(p, q)$ of positive integers. Clearly, if $c \le 2$ then $M_c = [0, 1)$, but what happens when $c > 2$? It is fairly easy to show that $\lambda(M_c) = 0$ for $c > 2$. On the other hand, V. Jarník proved in 1929 that $\dim_H(M_c) = 2/c$ for any $c > 2$. A simplified proof of this result can be found in Falconer (1990, p. 142).

Using iterated function systems (IFS), which is another name for dependence with complete connections, it is possible to calculate the Hausdorff dimension of sets defined by number-theoretic properties. For instance, the set $E_2$ just defined is the attractor of the IFS consisting of the two (non-linear) contractions

\[ u_1(x) = \frac{1}{1 + x} \quad \text{and} \quad u_2(x) = \frac{1}{2 + x}. \]

It was first shown by Jarník that $\frac{1}{3} \le \dim_H(E_2) \le \frac{2}{3}$, but Jenkinson and Pollicott (2001) found that

\[ \dim_H(E_2) = 0.53128\ 05062\ 77205\ 14162\ 44686 \cdots, \]

an approximation accurate to 25 decimal places, which improves earlier estimates of Hensley (1996). A striking feature of Jenkinson and Pollicott's method is that successive approximations of $\dim_H(E_2)$ converge at a super-exponential rate. Their method can be also used to efficiently compute the Hausdorff dimension of other sets consisting of numbers whose RCF digits are constrained to belong to any given finite subset of $\mathbb{N}_+$. □
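To make the role of the two contractions concrete, the following sketch composes $u_1$ and $u_2$ according to a prescribed word of digits and then reads the digits back with the continued fraction map. It does not compute the Hausdorff dimension; it only illustrates that compositions of $u_1$ and $u_2$ produce numbers whose leading RCF digits are the chosen 1s and 2s. The function names and the base point $1/3$ are assumptions made for this example.

```python
from fractions import Fraction
from math import floor

def u(digit, x):
    """The contraction u_digit(x) = 1/(digit + x); digit = 1, 2 gives the IFS for E_2."""
    return 1 / (digit + x)

def point_with_digits(digits, base=Fraction(1, 3)):
    """u_{d_1} o u_{d_2} o ... o u_{d_n} applied to the base point."""
    x = base
    for d in reversed(digits):
        x = u(d, x)
    return x

def leading_digits(x, count):
    """First `count` RCF digits of x in (0, 1), via the map x -> 1/x - floor(1/x)."""
    out = []
    for _ in range(count):
        inverse = 1 / x
        a = floor(inverse)
        out.append(a)
        x = inverse - a
    return out

word = [1, 2, 2, 1, 1, 2]
x = point_with_digits(word)
print(x, leading_digits(x, len(word)))
```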

The case $p = 1$ is not settled by Proposition 4.1.8. For $H(i) = i$, $i \in \mathbb{N}_+$, the series

\[ \sum_{i\in\mathbb{N}_+} |H(i)| (v(i) - u(i)) = \sum_{i\in\mathbb{N}_+} \frac{i}{i(i+1)} = \sum_{i\in\mathbb{N}_+} \frac{1}{i+1} \]

is divergent. In this case $E_\gamma H(a_1) = \infty$ but, however, we have

\[ \lim_{n\to\infty} \frac{a_1 + \cdots + a_n}{n} = \infty \quad \text{a.e.} \]

Before proving this (see Corollary 4.1.10 and Remark 1 following it) let us recall that in Subsection 3.3.2 we noted that, writing $t_n = a_1 + \cdots + a_n$, $n \in \mathbb{N}_+$, $t_n/n\log n$ converges in $\mu$-probability to $1/\log 2$ as $n \to \infty$ for any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$. It follows that $t_{n_\kappa}/n_\kappa \log n_\kappa$ converges a.e. to $1/\log 2$ as $\kappa \to \infty$, where $(n_\kappa)_{\kappa\in\mathbb{N}_+}$ is some sequence of positive integers with $\lim_{\kappa\to\infty} n_\kappa = \infty$. Hence $t_{n_\kappa}/n_\kappa$ converges a.e. to $\infty$ as $\kappa \to \infty$. Thus $\limsup_{n\to\infty} t_n/n = \infty$ a.e. and it remains to show that $\limsup$ can be replaced by $\lim$. Actually, we shall prove much more.

Theorem 4.1.9 [Diamond and Vaaler (1986)] We have

\[ t_n = \frac{1 + o(1)}{\log 2}\, n \log n + \theta_n \max_{1\le i\le n} a_i \quad \text{a.e.} \]

as $n \to \infty$, where $\theta_n$ is an $I$-valued random variable for any $n \in \mathbb{N}_+$.

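Proof. Given $\varepsilon > 0$ and $n \in \mathbb{N}_+$ set

\[ a_i' = a_i I_{(a_i \le h(n))}, \quad 1 \le i \le n, \]

where $h : \mathbb{N}_+ \to \mathbb{R}$ is defined by $h(n) = n \log^{\frac{1}{2}+\varepsilon} n$, and $t_n' = a_1' + \cdots + a_n'$. Then

\[ E_\gamma t_n' = \frac{n}{\log 2} \sum_{j=1}^{\lfloor h(n) \rfloor} j \log\left( 1 + \frac{1}{j(j+2)} \right) = \frac{n}{\log 2} \sum_{j=1}^{\lfloor h(n) \rfloor} \frac{1}{j} (1 + o(1)) = n \log \lfloor h(n) \rfloor (1 + o(1))/\log 2 \]

as $n \to \infty$. By Corollaries 1.3.15 and A3.2 we have

\[ \mathrm{Var}_\gamma\, t_n' = O(n \mathrm{Var}_\gamma\, t_1') = O(n E_\gamma (t_1')^2) \]

as $n \to \infty$. But

\[ E_\gamma (t_1')^2 = \frac{1}{\log 2} \sum_{j=1}^{\lfloor h(n) \rfloor} j^2 \log\left( 1 + \frac{1}{j(j+2)} \right) = \lfloor h(n) \rfloor (1 + o(1))/\log 2 \]

as $n \to \infty$. Therefore $\mathrm{Var}_\gamma\, t_n' = O(n \lfloor h(n) \rfloor)$ as $n \to \infty$.

Now, consider the sequence $(n_\kappa)_{\kappa\in\mathbb{N}_+}$ defined as

\[ n_\kappa = \lfloor \exp \kappa^{1-\varepsilon} \rfloor, \quad \kappa \in \mathbb{N}_+. \]

Note that

\[ n_{\kappa-1} = \left( 1 + O(\kappa^{-\varepsilon}) \right) n_\kappa \]

as $\kappa \to \infty$ so that $n_{\kappa-1}/n_\kappa$ and $h(n_{\kappa-1})/h(n_\kappa)$ both converge to 1 as $\kappa \to \infty$.

By the choice of the $n_\kappa$ it is obvious that the series with general term

\[ \frac{E_\gamma (t_{n_\kappa}' - E_\gamma t_{n_\kappa}')^2}{n_\kappa h(n_\kappa) \kappa^{1+\varepsilon}}, \quad \kappa \in \mathbb{N}_+, \]

is convergent. Hence by Beppo Levi's theorem the random series with general term

\[ \frac{(t_{n_\kappa}' - E_\gamma t_{n_\kappa}')^2}{n_\kappa h(n_\kappa) \kappa^{1+\varepsilon}}, \quad \kappa \in \mathbb{N}_+, \]

is convergent a.e. Therefore

\[ |t_{n_\kappa}' - E_\gamma t_{n_\kappa}'| = o\left( n_\kappa \kappa^{(1+\varepsilon)/2} \log^{(1+2\varepsilon)/4} n_\kappa \right) \quad \text{a.e.} \]

as $\kappa \to \infty$. Now, it is easy to check that

\[ n_\kappa \kappa^{(1+\varepsilon)/2} \log^{(1+2\varepsilon)/4} n_\kappa = O\left( \frac{E_\gamma t_{n_\kappa}'}{\log^{\varepsilon/3} n_\kappa} \right) = o\left( E_\gamma t_{n_\kappa}' \right) \]

as $\kappa \to \infty$ provided that $\varepsilon < 0.126$. Thus

\[ t_{n_\kappa}' = (1 + o(1)) E_\gamma t_{n_\kappa}' \quad \text{a.e.} \]

as $\kappa \to \infty$.

Next, for any $n \in \mathbb{N}_+$ satisfying $n_{\kappa-1} < n \le n_\kappa$ for some $\kappa \in \mathbb{N}_+$ we clearly have

\[ t_{n_{\kappa-1}}' \le t_n' \le t_{n_\kappa}', \]

so that

\[ (1 + o(1)) E_\gamma t_{n_{\kappa-1}}' \le t_n' \le (1 + o(1)) E_\gamma t_{n_\kappa}' \quad \text{a.e.} \]

as $\kappa \to \infty$. On account of the properties already noted of the sequence $(n_\kappa)_{\kappa\in\mathbb{N}_+}$ we easily obtain

\[ t_n' = (1 + o(1)) E_\gamma t_n' \quad \text{a.e.} \]

as $n \to \infty$, and since

\[ n \log \lfloor h(n) \rfloor - n \log n = o(n \log n) \]

as $n \to \infty$, we can also write

\[ t_n' = (1 + o(1)) \frac{n \log n}{\log 2} \quad \text{a.e.} \tag{4.1.3} \]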

as $n \to \infty$.

To complete the proof we shall show that a.e. there exist at most finitely many integers $n \in \mathbb{N}_+$ for which the inequalities

$$a_i > h(n), \quad a_j > h(n)$$

hold for two distinct indices $i, j \le n$. To proceed fix $i < j$. It follows from Corollary 1.3.15 that

$$\gamma(a_i > h(n),\, a_j > h(n)) = O\left(\gamma(a_i > h(n))\, \gamma(a_j > h(n))\right) = O\left(\gamma^2(a_1 > h(n))\right) = O\left((h(n))^{-2}\right) = O\left(n^{-2} (\log n)^{-1-2\varepsilon}\right)$$

as $n \to \infty$. Hence the probability of the random event

$$(a_i > h(n),\, a_j > h(n) \text{ for distinct indices } i, j \le 2n)$$

is of order at most $(\log n)^{-1-2\varepsilon}$. For $\kappa \in \mathbb{N}_+$ let

$$E_\kappa = \bigcup_{\ell \ge \kappa} \left(a_i > h(2^\ell),\, a_j > h(2^\ell) \text{ for distinct indices } i, j \le 2^{\ell+1}\right).$$

Then $\gamma(E_\kappa) = O\left(\sum_{\ell \ge \kappa} \ell^{-1-2\varepsilon}\right) \to 0$ as $\kappa \to \infty$. It is now clear that for $\omega \notin E_\kappa$ and $n > 2^{\kappa+1}$ there exists at most one index $i \le n$ for which $a_i(\omega) > h(n)$.

Consequently, we can assert that

$$0 \le t_n - t'_n \le \max_{1 \le i \le n} a_i \quad \text{a.e.} \tag{4.1.4}$$

for all sufficiently large $n$. By (4.1.3) and (4.1.4) the proof is complete. □

Remarks. 1. It is now clear from the above theorem and Proposition 3.1.7 why $t_n/(n \log n)$ converges in probability, rather than a.e., to $1/\log 2$ as $n \to \infty$. The obstacle to a.e. convergence is the occurrence of a single large value of the digits. At the same time, a.e. convergence can be obtained by excluding at most one summand.

2. It is interesting to compare Theorems 3.3.4 and 4.1.9 (see also Corollary 3.1.11). □
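The effect described in Remark 1 can be explored numerically. The following is a minimal sketch, not part of the original text: it assumes that the continued fraction digits of a random rational with a very large denominator (obtained exactly by Euclid's algorithm) behave roughly like typical digits, and it prints the trimmed ratio $(t_n - \max_{i \le n} a_i) \log 2 / (n \log n)$. The helper names `cf_digits` and `trimmed_ratio`, the seed, and the bit length are illustrative choices. Since the theorem only gives a factor $1 + o(1)$, the printed values should be expected to approach 1 slowly.

```python
import math
import random


def cf_digits(p, q):
    """Digits [a1, a2, ...] of p/q in (0, 1), computed exactly by Euclid's algorithm."""
    digits = []
    while p:
        a, r = divmod(q, p)
        digits.append(a)
        q, p = p, r
    return digits


def trimmed_ratio(digits, n):
    """(t_n - max_{i<=n} a_i) * log 2 / (n log n) for the first n digits."""
    head = digits[:n]
    return (sum(head) - max(head)) * math.log(2) / (n * math.log(n))


if __name__ == "__main__":
    random.seed(0)                        # arbitrary seed, for repeatability only
    q = random.getrandbits(8000) | 1      # hypothetical choice of denominator size
    p = random.randrange(1, q)
    digits = cf_digits(p, q)
    for n in (100, 1000, len(digits)):
        print(n, round(trimmed_ratio(digits, n), 3))
```

Removing only the largest digit is exactly the "one excluded summand" of the remark; the untrimmed ratio can be compared by dropping the `max(head)` term.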

Corollary 4.1.10 Whatever $0 \le \varepsilon < 1$ we have

$$\lim_{n \to \infty} \frac{a_1 + \cdots + a_n}{n (\log n)^\varepsilon} = \infty \quad \text{a.e.}$$

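Remarks. 1. The equation

$$\lim_{n \to \infty} \frac{a_1 + \cdots + a_n}{n} = \infty \quad \text{a.e.}$$

can be also derived from a slight generalization of equation (4.1.1). Hartman (1951) proved that if $f : I \to \mathbb{R}_+$ is measurable and $\int_I f\, d\lambda = \infty$, then the limit in (4.1.1) exists and is equal to $\infty$ a.e. The equation above then follows by taking $f(\omega) = a_1(\omega)$, $\omega \in \Omega$. It is interesting to note that if we take $f(\omega) = a_2(\omega)/a_1(\omega)$ or $f(\omega) = a_1(\omega)/a_2(\omega)$, $\omega \in \Omega$, then we obtain

$$\lim_{n \to \infty} \frac{1}{n} \sum_{1 \le i \le n} \frac{a_{i+1}}{a_i} = \lim_{n \to \infty} \frac{1}{n} \sum_{1 \le i \le n} \frac{a_i}{a_{i+1}} = \infty \quad \text{a.e.}$$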

2. Salem (1943) proved that the celebrated Minkowski's ? function can be expressed in terms of the $t_n$, $n \in \mathbb{N}$, as

$$?(x) = \sum_{i \in \mathbb{N}_+} (-1)^{i-1}\, 2^{1 - t_i(x)}$$

for any $x \in I$, if we consider that $a_i(x) = \infty$ for any large enough $i \in \mathbb{N}_+$ when $x \in I \setminus \Omega$. It is known that ? is a strictly increasing singular function, that is, $?'(x) = 0$ a.e. in $I$. Recently, Viader et al. (1998) have shown that

$$\left(x \in I : \lim_{n \to \infty} \frac{t_n(x)}{n} = \infty\right) \cap \left(x \in I : ?'(x) \text{ exists finitely}\right) \subset \left(x \in I : ?'(x) = 0\right),$$

thus making more precise the set where the derivative of ? vanishes.

Note that the sequence $(a_n)_{n \in \mathbb{N}_+}$ is i.i.d. with common $\mu$-distribution $(2^{-m} : m \in \mathbb{N}_+)$ under the probability measure $\mu$ induced by ? on $\mathcal{B}_I$. Cf. Lagarias (1992, p. 45).

3. Vardi (1995, 1997) discussed an interesting relationship between the St. Petersburg game [see, e.g., Feller (1968, X.4)] and the sequence $(a_n)_{n \in \mathbb{N}_+}$, on account of the properties of the sequence $(t_n)_{n \in \mathbb{N}_+}$. That game is a well known example of a sequence of independent identically distributed random variables with infinite mean value, and was considered as a paradox since no 'fair' entry fee exists. It appears that $(a_n)_{n \in \mathbb{N}_+}$ makes a reasonable choice of entry fees for the St. Petersburg game. □
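Salem's series from Remark 2 is easy to evaluate exactly at rational points, where the expansion is finite and the terms with $a_i = \infty$ vanish. The sketch below is illustrative only; the function names are hypothetical and exact rational arithmetic is used so that the classical values $?(1/2) = 1/2$, $?(1/3) = 1/4$ and $?(2/3) = 3/4$ can be checked without rounding.

```python
from fractions import Fraction


def cf_digits(x):
    """RCF digits [a1, a2, ...] of a rational x in (0, 1), via Euclid's algorithm."""
    digits = []
    p, q = x.numerator, x.denominator
    while p:
        a, r = divmod(q, p)
        digits.append(a)
        q, p = p, r
    return digits


def minkowski_question_mark(x):
    """?(x) for rational x in (0, 1), summing Salem's series over the finite expansion."""
    total, t = Fraction(0), 0
    for i, a in enumerate(cf_digits(x), start=1):
        t += a                                            # t_i = a_1 + ... + a_i
        total += Fraction((-1) ** (i - 1)) * Fraction(2) ** (1 - t)
    return total


if __name__ == "__main__":
    for x in (Fraction(1, 2), Fraction(1, 3), Fraction(2, 3), Fraction(2, 5)):
        print(x, minkowski_question_mark(x))
```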

Corollary 4.1.11 Let $(c_n)_{n \in \mathbb{N}_+}$ be a non-decreasing sequence of positive numbers satisfying $\sum_{n \in \mathbb{N}_+} c_n^{-1} < \infty$. Then

$$t_n = \frac{1 + o(1)}{\log 2}\, n \log n + \theta_n c_n \quad \text{a.e.}$$

as $n \to \infty$, where $\theta_n$ is an $I$-valued random variable for any $n \in \mathbb{N}_+$.

Proof. This is an immediate consequence of Theorem 4.1.9 and Proposition 1.3.16 (F. Bernstein's theorem). □

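Corollary 4.1.12 Set $d_n = \exp(\kappa \log^2 \kappa)\, \kappa \log^2 \kappa$ for

$$\exp\left((\kappa - 1) \log^2 (\kappa - 1)\right) < n \le \exp(\kappa \log^2 \kappa), \quad \kappa \ge 2. \tag{4.1.5}$$

Then

$$\limsup_{n \to \infty} \frac{a_1 + \cdots + a_n}{d_n} = \frac{1}{\log 2} \quad \text{a.e.}$$

Proof. In Corollary 4.1.11 set

$$c_n = d_n / (\log \log 10\kappa)$$

for $n$ in the range (4.1.5). It is easy to check that $\sum_{n \in \mathbb{N}_+} c_n^{-1} < \infty$ and that (4.1.5) implies

$$n \log n \le d_n, \quad n \in \mathbb{N}_+.$$

Then by Corollary 4.1.11 we have

$$t_n \le \frac{1 + o(1)}{\log 2}\, d_n + \frac{d_n}{\log \log 10\kappa} \quad \text{a.e.}$$

as $\kappa \to \infty$, so $\limsup_{n \to \infty} t_n/d_n \le 1/\log 2$ a.e. To complete the proof we note that setting $n_\kappa = \exp\left((\kappa + 1) \log^2 (\kappa + 1)\right)$ we have $d_{n_\kappa} = n_\kappa \log n_\kappa$, $\kappa \in \mathbb{N}_+$, and $\lim_{\kappa \to \infty} t_{n_\kappa}/d_{n_\kappa} = 1/\log 2$. □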

Remarks. 1. Philipp (1988, Theorem 1) proved that (i) for any sequence $(c_n)_{n \in \mathbb{N}_+}$ of positive numbers such that $\sum_{n \in \mathbb{N}_+} c_n^{-1} < \infty$, we have $\limsup_{n \to \infty} t_n/c_n = 0$ a.e., and (ii) for any sequence $(c_n)_{n \in \mathbb{N}_+}$ of positive numbers such that the sequence $(c_n/n)_{n \in \mathbb{N}_+}$ is non-decreasing and $\sum_{n \in \mathbb{N}_+} c_n^{-1} = \infty$, we have $\limsup_{n \to \infty} t_n/c_n = \infty$ a.e. Corollary 4.1.11 shows that the condition on the sequence $(c_n/n)_{n \in \mathbb{N}_+}$ in (ii) cannot be dispensed with.

2. It is easy to show, see Diamond and Vaaler (op. cit., pp. 81–82), that if $(c_n)_{n \in \mathbb{N}_+}$ is as in Corollary 4.1.11, then setting

$$S = \{n \in \mathbb{N}_+ : c_n < n \log n\},$$

we have

$$\lim_{x \to \infty} \frac{1}{\log x} \sum_{n \le x,\, n \in S} \frac{1}{n} = 0,$$

that is, $S$ has logarithmic density zero. It then follows from Corollary 4.1.11 that

$$a_1 + \cdots + a_n = O(c_n)$$

as $n \to \infty$ for all integers $n$ outside a set of logarithmic density 0. See also Corollary 3.1.9.

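3. Theorem 4.1.9 can be easily generalized for a function $H : \mathbb{N}_+ \to \mathbb{R}_{++}$ satisfying

$$\sum_{1 \le i \le n} H^2(i)/i^2 \Bigg/ \left(\sum_{1 \le i \le n} H(i)/i^2\right)^2 = O\left(n \log^{-\frac{3}{2}-\varepsilon} n\right)$$

as $n \to \infty$ for some $\varepsilon > 0$. [Clearly, $H(i) = i$, $i \in \mathbb{N}_+$, satisfies the condition above.] For such a function $H$ we have

$$\sum_{i=1}^{n} H(a_i) = \frac{1 + o(1)}{\log 2}\, n \sum_{1 \le i \le n} H(i) \log\left(1 + \frac{1}{i(i+2)}\right) + \theta_n \max_{1 \le i \le n} H(a_i) \quad \text{a.e.},$$

where $\theta_n$ is an $I$-valued random variable for any $n \in \mathbb{N}_+$. The proof can be found in Diamond and Vaaler (op. cit.). □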

4.1.2 Empirical evidence, and normal continued fraction numbers

We shall now discuss the important amount of empirical evidence already accumulated on continued fraction expansions of certain real numbers. The interest of such computations lies in comparing statistics of such expansions with known theoretical limiting distributions.

It is clear that, for instance, contained in the exceptional set in Proposition 4.1.8 are all quadratic irrationalities and the number $e - 2$. See Subsection 1.1.3.

Clearly, all the numbers just mentioned are also contained in the exceptional set in Proposition 4.1.1.

As we have already mentioned in Subsection 1.1.3, in the opposite direction seems to lie $\pi - 3$ whose continued fraction expansion is

π − 3 = [ 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, · · · ] .

In Bailey et al. (1997, p. 423) it is asserted that, based on the first 17,001,303 continued fraction digits of $\pi - 3$, the geometric mean is 2.68639 and the harmonic mean is 1.745882, which are reasonably close to $K_0$ and $K_{-1}$ (see Proposition 4.1.8). Clearly, no conclusion can be drawn beyond this.
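The first digits of $\pi - 3$ quoted above, and the geometric and harmonic means of a digit string, can be reproduced with exact rational arithmetic. The sketch below is only illustrative: it assumes a 35-decimal truncation of $\pi$ (enough to determine the first ten digits, but far too few digits to say anything about $K_0$ or $K_{-1}$), and the names `rcf_digits`, `geometric_mean` and `harmonic_mean` are hypothetical.

```python
import math
from fractions import Fraction

# pi truncated to 35 decimals; an assumption adequate only for the first few digits
PI_35 = Fraction("3.14159265358979323846264338327950288")


def rcf_digits(x, count):
    """First `count` RCF digits of a rational x in (0, 1), by iterating x -> 1/x - floor(1/x)."""
    digits = []
    for _ in range(count):
        if x == 0:
            break
        x = 1 / x
        a = x.numerator // x.denominator
        digits.append(a)
        x -= a
    return digits


def geometric_mean(digits):
    """(a_1 * ... * a_n)^(1/n), computed through logarithms."""
    return math.exp(sum(math.log(a) for a in digits) / len(digits))


def harmonic_mean(digits):
    """n / (1/a_1 + ... + 1/a_n)."""
    return len(digits) / sum(1 / a for a in digits)


if __name__ == "__main__":
    digits = rcf_digits(PI_35 - 3, 10)
    print(digits)
    print(geometric_mean(digits), harmonic_mean(digits))
```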

For computations concerning the continued fraction digits of various irrationals in $I$ we refer the reader to Alexandrov (1978), Brjuno (1964), Choong, Daykin and Rathbone (1971) (see nevertheless D. Shanks' review [MR 52 # 7073] of this paper), Lang and Trotter (1972), Richtmyer (1975), Shiu (1995), and J.O. Shallit's review [MR 96b: 11165] of this last paper.

Presenting an algorithm for computing the continued fraction expansion of numbers which are zeroes of differentiable functions, Shiu (1995) obtained statistics of the first 10000 digits of irrationals in $I$ such as $\sqrt[3]{2} - 1$, $\pi - 3$, $\pi^2 - 9$, $\log 2$, $2^{\sqrt{2}} - 2$. Table 1 below is compiled from his Table 1. The last column contains the (theoretical) asymptotic relative digit frequencies

$$\frac{1}{\log 2} \log\left(1 + \frac{1}{i(i+2)}\right), \quad 1 \le i \le 10,$$

in the first 10 lines, the asymptotic relative frequency

$$\frac{1}{\log 2} \log \frac{12 \times 101}{11 \times 102}$$

of the digits in the range [11, 100] in the 11th line, and the asymptotic relative frequency

$$\frac{1}{\log 2} \log \frac{102}{101}$$

of the digits exceeding 100 in the last line. Cf. Propositions 4.1.1, 4.1.3, and 4.1.4.
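The theoretical column of Table 1 can be recomputed directly from the three formulas above. The following sketch is an illustration only; `digit_frequency` and `tail_frequency` are hypothetical helper names, and the tail formula uses the telescoping identity $\sum_{i \ge k} \log\left(1 + \frac{1}{i(i+2)}\right) = \log\frac{k+1}{k}$.

```python
import math


def digit_frequency(i):
    """Asymptotic relative frequency of the digit i: log2(1 + 1/(i(i+2)))."""
    return math.log2(1 + 1 / (i * (i + 2)))


def tail_frequency(k):
    """Asymptotic relative frequency of digits >= k (telescoped sum): log2((k+1)/k)."""
    return math.log2((k + 1) / k)


# Rows of the last column of Table 1: digits 1..10, the range [11, 100], and > 100.
last_column = [digit_frequency(i) for i in range(1, 11)]
last_column.append(tail_frequency(11) - tail_frequency(101))
last_column.append(tail_frequency(101))

if __name__ == "__main__":
    labels = [str(i) for i in range(1, 11)] + ["11-100", ">=101"]
    for label, value in zip(labels, last_column):
        print(f"{label:>7}  {value:.9f}")
```

The twelve values sum to 1, which gives a quick consistency check on the table.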

Frequency of occurrence of the digit $i$ in the first 10000 digits of each number, with the theoretical asymptotic relative frequency in the last column

Digit i    $\sqrt[3]{2}-1$    $\pi-3$    $\pi^2-9$    $\log 2$    $2^{\sqrt{2}}-2$    Theoretical
1          4173               4206       4134         4149        4192                0.415037499 ···
2          1675               1672       1706         1666        1639                0.169925001 ···
3          946                882        948          905         933                 0.093109404 ···
4          636                597        581          600         616                 0.058893689 ···
5          421                443        401          390         390                 0.040641984 ···
6          295                282        302          334         278                 0.029747343 ···
7          240                224        232          226         213                 0.022720076 ···
8          163                186        185          187         190                 0.017921908 ···
9          122                143        138          142         135                 0.014499569 ···
10         118                123        117          137         135                 0.011972641 ···
11−100     1060               1113       1111         1113        1130                0.111317022 ···
≥ 101      151                129        145          151         149                 0.014213859 ···

Table 1

It is also interesting to note that setting $M_{10000}(\omega) = \max_{1 \le \kappa \le 10000} a_\kappa(\omega)$ (cf. Subsection 3.1.3) we have

$M_{10000}(\sqrt[3]{2} - 1) = a_{1990}(\sqrt[3]{2} - 1) = 12737$,
$M_{10000}(\pi - 3) = a_{431}(\pi - 3) = 20776$,
$M_{10000}(\pi^2 - 9) = a_{1234}(\pi^2 - 9) = 12013$,
$M_{10000}(\log 2) = a_{9168}(\log 2) = 963664$,
$M_{10000}(2^{\sqrt{2}} - 2) = a_{6342}(2^{\sqrt{2}} - 2) = 44122$,

and that in all cases just considered there exist digits not exceeding 100 which do not appear, viz.

74, 86, 91, 96, 97, 99, and 100 for $\sqrt[3]{2} - 1$;
90, 91, and 96 for $\pi - 3$;
91 and 92 for $\pi^2 - 9$;
55, 73, 76, 96, and 97 for $\log 2$;
79, 80, 81, 82, 91, 94, 97, and 99 for $2^{\sqrt{2}} - 2$.

Concerning Khinchin's constant $K_0$, computations of

$$K_0(\omega, n) = \left(a_1(\omega) \cdots a_n(\omega)\right)^{1/n}$$

for $n \le 10000$ and various $\omega \in \Omega$, including those considered above, suggest that, e.g., $\pi - 3$ is not in the exceptional set. However, it should be pointed out that even if there is convergence, the rate has to be very slow. It was found that $K_0(\pi - 3, 10000)$ differs from $K_0$ by more than $K_0(\pi - 3, 100)$ does!

The existence of the asymptotic relative digit and, more generally, $m$-digit block frequencies (Propositions 4.1.1 and 4.1.2) raises naturally the question of normality for the continued fraction expansion.

The idea of normality, first introduced by E. Borel in 1909, is an attempt to formalize the notion of a real number being random. A real number $x \in I$ is said to be normal in base $b$, $b \in \mathbb{N}_+$, $b \ge 2$, if and only if in its representation in base $b$ all digits $0, 1, \cdots, b-1$ appear asymptotically equally often, i.e., with asymptotic relative frequencies all equal to $1/b$. In addition, for each $m \in \mathbb{N}_+$ the $b^m$ different $m$-digit blocks must occur equally often. In other words, for any $m \in \mathbb{N}_+$ we should have

$$\lim_{n \to \infty} \frac{1}{n} \left(\text{number of occurrences of a given } m\text{-digit block in the first } n + m - 1 \text{ base-}b \text{ digits of } x\right) = b^{-m}$$

whatever the given $m$-digit block. Actually, the above equation holds for all $x \in I$ except for a set of Lebesgue measure zero. This can easily be seen by applying Birkhoff's ergodic theorem to the transformation $Tx = bx \bmod 1$ of $I$. A number that is normal in all bases $b \in \mathbb{N}_+$, $b \ge 2$, is called normal. However, even if there are lots of normal numbers, when we are given a 'concrete' number $x \in I$ the existence result just mentioned does not help to decide whether $x$ is normal or not. Such a problem cannot be handled by methods known today. (Will it ever be solved?) For instance, it is not known whether $\pi - 3$, $e - 2$, or any irrational algebraic number is normal or not. The first example of a normal number in base 10 was given by Champernowne (1933). His number is

x = 0. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 · · ·

but an explicit example of a normal number is still lacking.

Clearly, a similar problem can be considered for the continued fraction expansion (which has the advantage of not being related to any base). An irrational $\omega \in I$ is said to be a normal continued fraction number if and only if all its asymptotic relative $m$-digit block frequencies exist and are equal to those occurring in Proposition 4.1.2 for any $m \in \mathbb{N}_+$. In other words, $\omega$ is a normal continued fraction number if it does not belong to the exceptional sets of $\lambda$-measure zero excluded in Proposition 4.1.2 for any $m \in \mathbb{N}_+$. For instance, the quadratic irrationalities are not normal since they eventually have periodic expansions, and neither is $e - 2$.

A construction of the Champernowne type for a normal continued fraction number was given by Adler, Keane, and Smorodinsky (1981). Their example is as follows. Let $(r_n)_{n \in \mathbb{N}_+}$ be the sequence of rationals in (0,1) obtained by first writing $r_1 = 1/2$, then $r_2 = 1/3$ and $r_3 = 2/3$, then $r_4 = 1/4$, $r_5 = 2/4$, $r_6 = 3/4$, etc., at each stage $m \in \mathbb{N}_+$ writing all quotients with denominator $m + 1$ in increasing order. Let $r_i = [a_{i,1}, a_{i,2}, \ldots, a_{i,n_i}]$ be the continued fraction expansion of $r_i$, with $a_{i,n_i} \ne 1$, $i \in \mathbb{N}_+$. The irrational $\omega$ with continued fraction expansion

$$[a_{1,1}, a_{2,1}, a_{3,1}, a_{3,2}, a_{4,1}, a_{5,1}, a_{6,1}, a_{6,2}, a_{7,1}, a_{8,1}, a_{8,2}, a_{9,1}, a_{9,2}, a_{9,3}, \cdots],$$

which is obtained by concatenating the expansions of $r_1, r_2, \cdots$ in the given order, is a normal continued fraction number. The first 14 digits of $\omega$ are

2, 3, 1, 2, 4, 2, 1, 3, 5, 2, 2, 1, 1, 2.
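The concatenation just described is straightforward to generate. The sketch below is illustrative, with hypothetical function names: it lists the quotients $j/(m+1)$ without reducing them (so that $2/4$ appears as in the text) and uses Euclid's algorithm, which automatically produces the expansion whose last digit differs from 1.

```python
from itertools import count, islice


def euclid_digits(p, q):
    """RCF digits of p/q in (0, 1); the last digit produced by Euclid's algorithm is never 1."""
    digits = []
    while p:
        a, r = divmod(q, p)
        digits.append(a)
        q, p = p, r
    return digits


def aks_digits():
    """Yield the digits of the Adler-Keane-Smorodinsky number, one stage (denominator) at a time."""
    for denominator in count(2):
        for numerator in range(1, denominator):
            yield from euclid_digits(numerator, denominator)


if __name__ == "__main__":
    print(list(islice(aks_digits(), 14)))
```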

Another example of a different nature had been given by Postnikov (1960).

We should emphasize that even if the empirical evidence pleads in favour of normality for the continued fraction expansion of algebraic irrationals of degree exceeding 2, or of $\pi - 3$, $\pi^2 - 9$ etc., the only mathematical results proved so far are the examples of normal continued fraction numbers just discussed.

Finally, a few words about the empirical evidence concerning Theorem 4.1.9. Von Neumann and Tuckerman (1955) computed $t_n(\sqrt[3]{2} - 1)$ and $n \log n / \log 2$ for $n = 100(100)2000$. It appears that $t_n(\sqrt[3]{2} - 1) \log 2 / (n \log n)$ is most of the time greater than 1 and often nearly 2. As $t_n \log 2 / (n \log n)$ converges just in probability to 1 as $n \to \infty$, these deviations cannot be seen as significant.

4.1.3 The case of associated and extended random variables

Since $\bar\tau$ is $\bar\gamma$-preserving and ergodic under $\bar\gamma$ (see Subsection 4.0.2), it follows again from Theorem 4.0.3 that

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ \bar\tau^k = \frac{1}{\log 2} \int_0^1 dx \int_0^1 \frac{f(x, y)}{(xy + 1)^2}\, dy \quad \text{a.e. in } I^2 \tag{4.1.6}$$

for any measurable function $f : I^2 \to \mathbb{R}$ such that $\iint_{I^2} |f|\, d\lambda^2 < \infty$. As in Subsection 4.1.1, for suitable choices of $f$, Proposition 4.0.4 will lead to estimates of convergence rates in (4.1.6).

We now give several results which can be derived from (4.1.6).

Proposition 4.1.13 For any $B \in \mathcal{B}_I^2$ we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} I_B(\tau^k, \bar s_k) = \frac{1}{\log 2} \iint_B \frac{dx\, dy}{(xy + 1)^2} \quad \text{a.e. in } I^2.$$

Proof. The equation above follows from (4.1.6) by taking $f = I_B$, $B \in \mathcal{B}_I^2$, and noting that by the very definition of the extended incomplete quotients (see Subsection 1.3.3), equations (1.3.1) and (1.3.1′) can be written as

$$\bar\tau^n(\omega, \theta) = (\tau^n(\omega), \bar s_n(\omega, \theta)), \quad (\omega, \theta) \in \Omega \times I,$$

for any $n \in \mathbb{N}_+$. (The last equation holds for $n = 0$, too.) □

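Corollary 4.1.14 For any $A \in \mathcal{B}_I$ we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} I_A(\tau^k) = \gamma(A) \quad \text{a.e. in } I,$$

and

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} I_A(\bar s_k) = \gamma(A) \quad \text{a.e. in } I^2.$$

Proof. The first equation follows by taking $B = A \times I$. [It might be also derived from equation (4.1.1).] The second equation follows by taking $B = I \times A$. □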

It follows by dominated convergence from Proposition 4.1.13 that for any $\mu \in \mathrm{pr}(\mathcal{B}_I^2)$ we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \mu\left(\bar\tau^{-k}(B)\right) = \bar\gamma(B), \quad B \in \mathcal{B}_I^2. \tag{4.1.7}$$

In particular,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \mu\left(\bar\tau^{-k}(I \times A)\right) = \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \mu(\bar s_k \in A) = \gamma(A), \quad A \in \mathcal{B}_I. \tag{4.1.8}$$

We are going to show under suitable assumptions that in (4.1.7) actual convergence holds instead of Cesàro convergence while in (4.1.8) the extended random variable $\bar s_k$ can be replaced by $s^a_k$, $k \in \mathbb{N}$, for a fixed $a \in I$.

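Proposition 4.1.15 Let $\mu \in \mathrm{pr}(\mathcal{B}_I^2)$ such that $\mu \ll \lambda^2$. Then

$$\lim_{n \to \infty} \mu\left(\bar\tau^{-n}(B)\right) = \bar\gamma(B) \tag{4.1.9}$$

for any $B \in \mathcal{B}_I^2$.

Proof. Let $h = d\mu/d\lambda^2$. Then for any $B \in \mathcal{B}_I^2$ we have

$$\mu(\bar\tau^{-n}(B)) = \iint_{I^2} I_B \circ \bar\tau^n\, d\mu = \iint_{I^2} (I_B \circ \bar\tau^n)(h/g)\, d\bar\gamma,$$

where $g = d\bar\gamma/d\lambda^2$, that is,

$$g(x, y) = \frac{1}{\log 2}\, \frac{1}{(xy + 1)^2}, \quad (x, y) \in I^2.$$

Now, since $\bar\tau$ is strongly mixing (see Subsections 4.0.1 and 4.0.2), the last integral in the equations above converges to

$$\iint_{I^2} I_B\, d\bar\gamma \iint_{I^2} (h/g)\, d\bar\gamma = \bar\gamma(B)\, \mu(I^2) = \bar\gamma(B)$$

as $n \to \infty$. □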

Remarks. 1. Proposition 2.1.5 shows that measures $\mu\tau^{-n}$, $n \in \mathbb{N}$, can be expressed in terms of the Perron–Frobenius operator $P_\gamma = U$ of $\tau$ with respect to $\gamma$. A similar representation holds for the case of a measure $\mu$ as in Proposition 4.1.15. It is easy to check that we have

$$\mu(\bar\tau^{-n}(B)) = \iint_B P^n_{\bar\gamma} f\, d\bar\gamma, \quad B \in \mathcal{B}_I^2,$$

where $f = h/g$ and $P_{\bar\gamma}$ is the Perron–Frobenius operator of $\bar\tau$ under $\bar\gamma$. See the Remark following Proposition 2.1.1.

If the endomorphism $(\bar\tau, \bar\gamma)$ were exact, then from Proposition 4.0.2 we might have deduced that convergence in (4.1.9) is uniform with respect to $B \in \mathcal{B}_I^2$. Since $(\bar\tau, \bar\gamma)$ is not exact, such a conclusion cannot be reached this way. It is an open problem whether this is really true.

2. Proposition 4.1.15 is a first step towards the solution of what can be called Gauss' problem for the natural extension $\bar\tau$ of $\tau$. □

Theorem 4.1.16 Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$. For any $B \in \mathcal{B}_I^2$ such that $\lambda^2(\partial B) = 0$ we have

(i) $\displaystyle\lim_{n \to \infty} \mu\left(\bar\tau^n(\,\cdot\,, a) \in B\right) = \bar\gamma(B)$ uniformly with respect to $a \in I$;

(ii) $\displaystyle\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} I_B(\tau^k, s^a_k) = \bar\gamma(B)$ a.e. in $I$ uniformly with respect to $a \in I$.

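Proof. (i) For any $\theta \in I$ and $B \in \mathcal{B}_I^2$ set

$$h_n(\theta, B) = \mu\left(\bar\tau^n(\,\cdot\,, \theta) \in B\right), \quad n \in \mathbb{N}_+.$$

By Fubini's theorem we have

$$(\mu \otimes \lambda)\left(\bar\tau^{-n}(B)\right) = \iint_{I^2} I_B\left(\bar\tau^n(\omega, \theta)\right) \mu(d\omega)\, d\theta = \int_0^1 d\theta \int_0^1 I_B\left(\bar\tau^n(\omega, \theta)\right) \mu(d\omega) = \int_0^1 \mu\left(\bar\tau^n(\,\cdot\,, \theta) \in B\right) d\theta = \int_0^1 h_n(\theta, B)\, d\theta.$$

Since $\mu \otimes \lambda \ll \lambda^2$, it follows from Proposition 4.1.15 that

$$\lim_{n \to \infty} \int_0^1 h_n(\theta, B)\, d\theta = \bar\gamma(B) \tag{4.1.10}$$

for any $B \in \mathcal{B}_I^2$.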

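Now, note that, letting $d$ denote the Euclidean distance in $I^2$, by Theorem 1.2.2 we have

$$d\left(\bar\tau^n(\omega, \theta), \bar\tau^n(\omega, a)\right) \le \max_{i^{(n)} \in \mathbb{N}_+^n} \lambda\left(I(i^{(n)})\right) = \frac{1}{F_n F_{n+1}}, \quad n \in \mathbb{N}_+, \tag{4.1.11}$$

for any $\theta, a \in I$. Given $\varepsilon > 0$, let

$$B^+_\varepsilon = \bigcup_{(x,y) \in B} D_\varepsilon(x, y),$$

where $D_\varepsilon(x, y)$ is the open disk of radius $\varepsilon$ centered at $(x, y) \in I^2$, and

$$B^-_\varepsilon = \left((x, y) \in B : D_\varepsilon(x, y) \subset B\right).$$

By (4.1.11), for $n \ge n_0(\varepsilon)$ great enough and any $\theta, a \in I$ we have

$$\left(\omega : \bar\tau^n(\omega, \theta) \in B^-_\varepsilon\right) \subset \left(\omega : \bar\tau^n(\omega, a) \in B\right) \subset \left(\omega : \bar\tau^n(\omega, \theta) \in B^+_\varepsilon\right). \tag{4.1.12}$$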

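On the other hand, for any $n \in \mathbb{N}$ and $\theta \in I$ we trivially have

$$\left(\omega : \bar\tau^n(\omega, \theta) \in B^-_\varepsilon\right) \subset \left(\omega : \bar\tau^n(\omega, \theta) \in B\right) \subset \left(\omega : \bar\tau^n(\omega, \theta) \in B^+_\varepsilon\right). \tag{4.1.13}$$

Hence

$$-h_n\left(\theta, B^+_\varepsilon \setminus B^-_\varepsilon\right) \le h_n(\theta, B) - h_n(a, B) \le h_n\left(\theta, B^+_\varepsilon \setminus B^-_\varepsilon\right)$$

for any $n \ge n_0(\varepsilon)$ and $\theta, a \in I$. Integrating the double inequality above over $\theta \in I$ yields

$$\left|\int_0^1 h_n(\theta, B)\, d\theta - h_n(a, B)\right| \le \int_0^1 h_n\left(\theta, B^+_\varepsilon \setminus B^-_\varepsilon\right) d\theta$$

for any $n \ge n_0(\varepsilon)$ whatever $a \in I$. Finally, let first $n \to \infty$ then $\varepsilon \to 0$ in the last inequality. By (4.1.10) we obtain

$$\limsup_{n \to \infty} \sup_{a \in I} \left|\bar\gamma(B) - h_n(a, B)\right| \le \lim_{\varepsilon \to 0} \bar\gamma(B^+_\varepsilon \setminus B^-_\varepsilon) = \bar\gamma(\partial B) = 0$$

since $\lambda^2(\partial B) = 0$, and the proof of (i) is complete.

(ii) It is easy to check that (4.1.12) and (4.1.13) imply the inequalities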

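$$I_{B^-_\varepsilon}(\tau^k, \bar s_k) \le I_B(\tau^k, s^a_k) \le I_{B^+_\varepsilon}(\tau^k, \bar s_k)$$

for any $a \in I$, $(\omega, \theta) \in \Omega \times I$, and any $k \ge n_0(\varepsilon)$ great enough. Also, we trivially have

$$I_{B^-_\varepsilon}(\tau^k, \bar s_k) \le I_B(\tau^k, \bar s_k) \le I_{B^+_\varepsilon}(\tau^k, \bar s_k)$$

for any $k \in \mathbb{N}$ and $(\omega, \theta) \in \Omega \times I$. Hence

$$\left|I_B(\tau^k, \bar s_k) - I_B(\tau^k, s^a_k)\right| \le I_{B^+_\varepsilon \setminus B^-_\varepsilon}(\tau^k, \bar s_k) \tag{4.1.14}$$

for any $k \ge n_0(\varepsilon)$, $a \in I$, and $(\omega, \theta) \in \Omega \times I$. By Proposition 4.1.13 we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} I_B(\tau^k, \bar s_k) = \bar\gamma(B)$$

and

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} I_{B^+_\varepsilon \setminus B^-_\varepsilon}(\tau^k, \bar s_k) = \bar\gamma(B^+_\varepsilon \setminus B^-_\varepsilon) \quad \text{a.e. in } I^2.$$

Since $\lambda^2(\partial B) = 0$, we have

$$\lim_{\varepsilon \to 0} \bar\gamma(B^+_\varepsilon \setminus B^-_\varepsilon) = \bar\gamma(\partial B) = 0.$$

It is now easy to see that (4.1.14) and the last three equations imply the result stated. □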

Remark. Theorem 4.1.16(i) has been proved by Barbolosi and Faivre (1995) while (ii) is implicit (or implicitly used) in many papers by Dutch authors. See, e.g., Bosma et al. (1983) or Jager (1986). □

Theorem 4.1.16 has a host of consequences. We state some of them.

Corollary 4.1.17 Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$. For any $B \in \mathcal{B}_I^2$ such that $\lambda^2(\partial B) = 0$ we have

$$\lim_{n \to \infty} \mu\left((\tau^n, s^a_n) \in B\right) = \bar\gamma(B) \tag{4.1.15}$$

uniformly with respect to $a \in I$.

Proof. This is just a transcription of the result stated in Theorem 4.1.16(i) as

$$\bar\tau^n(\omega, a) = (\tau^n(\omega), \bar s_n(\omega, a)) = (\tau^n(\omega), s^a_n(\omega)), \quad (\omega, a) \in \Omega \times I,$$

for any $n \in \mathbb{N}$. □

Let us note that in Theorem 2.5.8 the (optimal) convergence rate in (4.1.15) has been obtained in the case where $\mu = \gamma_a$ for the class of rectangles $B = [0, x] \times [0, y]$, $x, y \in I$. Using this result we can prove

Proposition 4.1.18 Let $B$ be a simply connected subset of $I^2$ such that $\partial B = \bigcup_{i=1}^m \ell_i$ for some $m \in \mathbb{N}_+$, where either

$$\ell_i := \left((x, f_i(x)) : a_i \le x \le b_i\right)$$

with $0 \le a_i < b_i \le 1$ and $f_i : [a_i, b_i] \to I$ continuous and monotone, or

$$\ell_i := \left((c_i, y) : a'_i \le y \le b'_i\right)$$

with $c_i \in I$ and $0 \le a'_i < b'_i \le 1$. Then

$$\gamma_a\left((\tau^n, s^a_n) \in B\right) = \bar\gamma(B) + O(g^n)$$

as $n \to \infty$, where the constant implied in $O$ depends on $m$ and the quantities defining the $\ell_i$, $1 \le i \le m$.

The proof in the case $a = 0$ can be found in Dajani and Kraaikamp (1994). □

By particularizing the set $B$ in Corollary 4.1.17 and Proposition 4.1.18 we obtain results originally derived by ad hoc methods. We shall state below some of them leaving the calculation details to the reader.

Corollary 4.1.19 For any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$ and any $t \in I$ we have

$$\lim_{n \to \infty} \mu(\Theta_n \le t) = H(t),$$

where $H$ has been defined in Theorem 2.2.13. For $\mu = \lambda$ the convergence rate in the equation above is $O(g^n)$ as $n \to \infty$.

Proof. This follows from Corollary 4.1.17 with $a = 0$ and

$$B = \left((x, y) \in I^2 : \frac{x}{xy + 1} \le t\right), \quad t \in I,$$

and Proposition 4.1.18, as $\Theta_n = \tau^n/(s_n \tau^n + 1)$, $n \in \mathbb{N}$, by equation (1.3.7). Note that, however, Theorem 2.2.13 yields a better convergence rate! □
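The representation $\Theta_n = \tau^n/(s_n\tau^n + 1)$ used in the proof can be checked in exact arithmetic. The sketch below is an illustration under two assumptions not restated in this section: that $\Theta_n$ denotes the usual approximation coefficient $q_n^2\,|\omega - p_n/q_n|$, and that $s_n = q_{n-1}/q_n$ (the case $a = 0$). A rational $\omega$ is used only so that both sides are exact fractions; the function name is hypothetical.

```python
from fractions import Fraction


def theta_two_ways(omega, n_max):
    """Rows (n, q_n^2 |omega - p_n/q_n|, tau^n / (s_n tau^n + 1)) for a rational omega in (0, 1)."""
    rows = []
    p_prev, q_prev, p, q = 1, 0, 0, 1        # (p_{-1}, q_{-1}) and (p_0, q_0)
    tau = omega                               # tau^0(omega) = omega
    for n in range(n_max + 1):
        s = Fraction(q_prev, q)               # s_n = q_{n-1} / q_n (assumed, case a = 0)
        direct = q * q * abs(omega - Fraction(p, q))
        via_extension = tau / (s * tau + 1)
        rows.append((n, direct, via_extension))
        if tau == 0:                          # finite expansion exhausted
            break
        inverse = 1 / tau
        a = inverse.numerator // inverse.denominator   # next digit a_{n+1}
        tau = inverse - a
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    return rows


if __name__ == "__main__":
    for row in theta_two_ways(Fraction(16, 113), 5):
        print(row)
```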

Corollary 4.1.20 For any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$ and any $(t_1, t_2) \in I^2$ we have

$$\lim_{n \to \infty} \mu(\Theta_{n-1} \le t_1,\, \Theta_n \le t_2) = H(t_1, t_2),$$

where $H$ is the distribution function with density

$$\begin{cases} \dfrac{1}{\log 2}\, \dfrac{1}{\sqrt{1 - 4 t_1 t_2}} & \text{if } t_1 \ge 0,\ t_2 \ge 0,\ t_1 + t_2 < 1, \\[2mm] 0 & \text{elsewhere.} \end{cases}$$

For $\mu = \lambda$ the convergence rate in the equation above is $O(g^n)$ as $n \to \infty$.

Proof. This follows from Corollary 4.1.17 with $a = 0$ and

$$B = \left((x, y) \in I^2 : \frac{y}{xy + 1} \le t_1,\ \frac{x}{xy + 1} \le t_2\right), \quad (t_1, t_2) \in I^2,$$

and Proposition 4.1.18, as

$$\Theta_{n-1} = \frac{s_n}{s_n \tau^n + 1}, \quad \Theta_n = \frac{\tau^n}{s_n \tau^n + 1}, \quad n \in \mathbb{N},$$

by equation (1.3.7). □

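Let us define random variables $\rho_n$ and $\Theta'_n$ by

$$\rho_n(\omega) = \frac{\left|\omega - \dfrac{p_{n+1}}{q_{n+1}}\right|}{\left|\omega - \dfrac{p_n}{q_n}\right|}, \quad \Theta'_n = q_n q_{n+1} \left|\omega - \frac{p_n}{q_n}\right|, \quad \omega \in \Omega,\ n \in \mathbb{N}.$$

It is easy to see that $\rho_n = s_{n+1} \tau^{n+1}$ and $\Theta'_n = 1/(s_{n+1} \tau^{n+1} + 1)$ so that

$$\Theta'_n = 1/(\rho_n + 1), \quad n \in \mathbb{N}.$$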

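Corollary 4.1.21 For any $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that $\mu \ll \lambda$ we have

$$\lim_{n \to \infty} \mu(\rho_n \le t) = \frac{1}{\log 2} \left(\log(t + 1) - \frac{t \log t}{t + 1}\right), \quad t \in I,$$

$$\lim_{n \to \infty} \mu(\Theta'_n \le t) = \begin{cases} 0 & \text{if } 0 \le t \le 1/2, \\[1mm] \dfrac{\log\left(2 t^t (1 - t)^{1-t}\right)}{\log 2} & \text{if } 1/2 \le t \le 1. \end{cases}$$

For $\mu = \lambda$ the convergence rate in the equations above is $O(g^n)$ as $n \to \infty$.

The proof is left to the reader. □

For other results of the same type, which can be derived as before, we refer the reader to Bosma et al. (1983), Jager (1986), Kraaikamp (1994).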

Corollary 4.1.22 For any $t, t_1, t_2 \in I$ the limits

$$\lim_{n \to \infty} \frac{1}{n}\, \mathrm{card}\{k : \Theta_k \le t,\ 0 \le k \le n - 1\},$$

$$\lim_{n \to \infty} \frac{1}{n}\, \mathrm{card}\{k : \Theta_k \le t_1,\ \Theta_{k+1} \le t_2,\ 0 \le k \le n - 1\},$$

$$\lim_{n \to \infty} \frac{1}{n}\, \mathrm{card}\{k : \rho_k \le t,\ 0 \le k \le n - 1\},$$

and

$$\lim_{n \to \infty} \frac{1}{n}\, \mathrm{card}\{k : \Theta'_k \le t,\ 0 \le k \le n - 1\},$$

all exist a.e. in $I$ and are equal to the corresponding values of the limiting distribution functions occurring in Corollaries 4.1.19, 4.1.20, and 4.1.21, respectively.

The proof is immediate on account of Theorem 4.1.16(ii) and the corollaries referred to in the statement. □

Remarks. 1. It has been proved by Hensley (1998) that if $(k_n)_{n \in \mathbb{N}_+}$ is a strictly increasing sequence of positive integers, then for any $t \in I$ we have

$$\lim_{n \to \infty} \frac{1}{n}\, \mathrm{card}\{j : \Theta_{k_j} \le t,\ 0 \le j \le n - 1\} = H(t) \quad \text{a.e. in } I, \tag{4.1.16}$$

where $H$ has been defined in Theorem 2.2.13. Corollary 4.1.22 only covers the case $k_n = n$, $n \in \mathbb{N}_+$.

2. In the case $k_n = n$, $n \in \mathbb{N}_+$, equation (4.1.16) has been conjectured by H.W. Lenstra Jr. Actually, this conjecture is implicit in Doeblin (1940), which enables us to call it after both Doeblin and Lenstra. The Doeblin–Lenstra conjecture has been proved by Bosma et al. (1983) by using, even if not explicitly, Theorem 4.1.16(ii) in a special case. □

Corollary 4.1.23 The equations

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \Theta_k = \frac{1}{4 \log 2} = 0.36067 \cdots$$

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \Theta_k \Theta_{k+1} = \frac{1}{6} \left(1 - \frac{1}{4 \log 2}\right) = 0.10655 \cdots$$

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \rho_k = \frac{\pi^2}{12 \log 2} - 1 = 0.18656 \cdots$$

and

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \Theta'_k = \frac{1}{2} + \frac{1}{4 \log 2} = 0.86067 \cdots$$

all hold a.e. in $I$.

Proof. We consider just the first equation, leaving the calculation details to the reader, as the same idea underlies the proofs in the other cases. By Corollary 4.1.22 we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} I_{[0,t]}(\Theta_k) = H(t)$$

a.e. in $I$ for any $t \in I \cap \mathbb{Q}$. Hence for any fixed $\omega \in \Omega$ not belonging to the exceptional set the distribution function

$$F_n(t) := \frac{1}{n} \sum_{k=0}^{n-1} I_{[0,t]}(\Theta_k), \quad t \in I,$$

converges weakly to $H$ as $n \to \infty$. Consequently,

$$\int_I t\, dF_n(t) = \frac{1}{n} \sum_{k=0}^{n-1} \Theta_k$$

should converge to

$$\int_I t\, dH(t) = \frac{1}{4 \log 2}$$

as $n \to \infty$ for any $\omega \in \Omega$ not belonging to the exceptional set, thus a.e. in $I$.

While for the last two equations the reasoning is quite similar, in the case of the second equation we should consider two-dimensional distribution functions, and the value of the limit equals $\iint_{I^2} t_1 t_2\, dH(t_1, t_2)$. □

We turn now to limit properties of certain associated random variables. It follows from (4.1.6) that for any measurable real-valued function $f$ on $I$ such that $\int_I |f|\, d\lambda < \infty$ we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(\bar s_k) = \int_I f\, d\gamma \quad \text{a.e. in } I^2. \tag{4.1.17}$$

From (4.1.17) we can derive a weaker result for the sequences $(s^a_n)_{n \in \mathbb{N}}$, $a \in I$.

Theorem 4.1.24 Let $f : I \to \mathbb{R}$ be continuous. Then for any $a \in I$ we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(s^a_k) = \int_I f\, d\gamma \quad \text{a.e. in } I.$$

Proof. We have $|\bar s_k - s^a_k| \le (F_k F_{k+1})^{-1}$ for any $k \in \mathbb{N}$, $(\omega, \theta) \in \Omega \times I$, $a \in I$. The result then follows from (4.1.17) and the uniform continuity of $f$ on $I$. □

Remarks. 1. The above result also follows from a theorem of Breiman (1960) on account of the Markov property of the sequences $(s^a_n)_{n \in \mathbb{N}}$, $a \in I$.

2. The corresponding result for $y^a_n = 1/s^a_n$, $n \in \mathbb{N}_+$, $a \in I$, can be easily stated. In this form it can be found in Elton (1987) and Grigorescu and Popescu (1989). □

Corollary 4.1.25 For any $m \in \mathbb{N}_+$ and $a \in I$ we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} (s^a_k)^m = \frac{1}{\log 2} \sum_{i \in \mathbb{N}_+} \frac{(-1)^{i-1}}{m + i} \quad \text{a.e. in } I.$$

In particular, for $m = 1$ the value of the limit is $(1/\log 2) - 1$.

The proof amounts to computing the integral

$$\frac{1}{\log 2} \int_0^1 \frac{t^m}{t + 1}\, dt,$$

which yields the result stated. □

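Taking $f(x) = \log x$, $x \in I$, in (4.1.17) and noting that

$$\begin{aligned} \int_I \log x\, \gamma(dx) &= \frac{1}{\log 2} \int_0^1 \frac{\log x\, dx}{x + 1} \\ &= \frac{1}{\log 2} \left(\log(x + 1) \log x \Big|_0^1 - \int_0^1 \frac{\log(x + 1)\, dx}{x}\right) \\ &= -\frac{1}{\log 2} \sum_{k \in \mathbb{N}} \frac{(-1)^k}{k + 1} \int_0^1 x^k\, dx = -\frac{1}{\log 2} \sum_{k \in \mathbb{N}} \frac{(-1)^k}{(k + 1)^2} \\ &= -\frac{1}{\log 2} \left(\zeta(2) - \frac{2}{4} \zeta(2)\right) = -\frac{\pi^2}{12 \log 2}, \end{aligned}$$

we obtain

$$\lim_{n \to \infty} \frac{1}{n} \log(\bar s_0 \bar s_1 \cdots \bar s_{n-1}) = -\frac{\pi^2}{12 \log 2} \quad \text{a.e. in } \Omega$$

or, equivalently

$$\lim_{n \to \infty} \frac{1}{n} \log(\bar y_0 \bar y_1 \cdots \bar y_{n-1}) = \frac{\pi^2}{12 \log 2} \quad \text{a.e. in } \Omega.$$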

In the last equation we can give an estimate of the convergence rate. We have shown in Example 3.2.11 that

$$\lim_{n \to \infty} \frac{1}{n} \left(\sum_{i=0}^{n-1} \left(\log \bar y_i - \frac{\pi^2}{12 \log 2}\right)\right)^2 > 0.$$

Then for any $\varepsilon > 0$ by Theorem 4.0.4 we obtain

$$\frac{1}{n} \sum_{k=0}^{n-1} \log \bar y_k = -\frac{1}{n} \sum_{k=0}^{n-1} \log \bar s_k = \frac{\pi^2}{12 \log 2} + o\left(n^{-\frac{1}{2}} \log^{(3+\varepsilon)/2} n\right) \quad \text{a.e. in } \Omega \tag{4.1.18}$$

as $n \to \infty$, where the constant implied in $o$ depends on $\varepsilon$ and the current point $(\omega, \theta) \in \Omega^2$.

While we cannot take $f(x) = \log x$, $x \in I$, in Theorem 4.1.24 since this is not a continuous function on $I$, we can however replace $\bar s_k$ by $s^a_k$, $k \in \mathbb{N}$, $a \in I$, in (4.1.18) as shown below.

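Theorem 4.1.26 For any $a \in I$ we have

$$\lim_{n \to \infty} \frac{1}{n} \log(s^a_1 s^a_2 \cdots s^a_n) = -\frac{\pi^2}{12 \log 2} \quad \text{a.e. in } \Omega.$$

More precisely, whatever $\varepsilon > 0$, for any $a \in I$ we have

$$\frac{1}{n} \log(s^a_1 s^a_2 \cdots s^a_n) = -\frac{\pi^2}{12 \log 2} + o\left(n^{-\frac{1}{2}} \log^{(3+\varepsilon)/2} n\right) \quad \text{a.e. in } \Omega$$

as $n \to \infty$, where the constant implied in $o$ depends on both $\varepsilon$ and the current point $\omega \in \Omega$.

In particular, for $a = 0$ the above equations amount to

$$\lim_{n \to \infty} \sqrt[n]{q_n} = e^{\pi^2/12 \log 2} \quad \text{a.e. in } \Omega \tag{4.1.19}$$

and

$$\sqrt[n]{q_n} = e^{\pi^2/12 \log 2} + o\left(n^{-\frac{1}{2}} \log^{(3+\varepsilon)/2} n\right) \quad \text{a.e. in } \Omega \tag{4.1.20}$$

as $n \to \infty$, respectively.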
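Equation (4.1.19) lends itself to a quick numerical experiment. The sketch below is illustrative only: the names `LEVY`, `continuants` and `euclid_digits` are hypothetical, the continuants are built from the standard recurrence $q_0 = 1$, $q_1 = a_1$, $q_n = a_n q_{n-1} + q_{n-2}$, and the digits come from a random rational with a large denominator, which is merely assumed to behave like a typical point for moderate $n$.

```python
import math
import random

# The a.e. limit of q_n^(1/n) in (4.1.19)
LEVY = math.exp(math.pi ** 2 / (12 * math.log(2)))


def continuants(digits):
    """[q_0, q_1, ..., q_n] with q_0 = 1, q_1 = a_1 and q_k = a_k q_{k-1} + q_{k-2}."""
    qs = [1]
    q_before = 0                       # plays the role of q_{-1} = 0
    for a in digits:
        qs.append(a * qs[-1] + q_before)
        q_before = qs[-2]
    return qs


def euclid_digits(p, q):
    """RCF digits of p/q in (0, 1) via Euclid's algorithm."""
    digits = []
    while p:
        a, r = divmod(q, p)
        digits.append(a)
        q, p = p, r
    return digits


if __name__ == "__main__":
    random.seed(1)                     # arbitrary seed, for repeatability only
    q = random.getrandbits(4000) | 1   # hypothetical choice of denominator size
    p = random.randrange(1, q)
    digits = euclid_digits(p, q)
    qs = continuants(digits)
    for n in (50, 500, len(digits) // 2):
        print(n, math.exp(math.log(qs[n]) / n), LEVY)
```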

Proof. By the mean value theorem we have

$$\left|\frac{\log x - \log y}{x - y}\right| \le \frac{1}{\min(x, y)}$$

for any $0 < x, y \le 1$, $x \ne y$. Next, note that

$$0 < \frac{v(i^{(k)})}{u(i^{(k)})} - 1 \le \max\left(\frac{1}{F_{k-1} F_{k+1}}, \frac{1}{F_k^2}\right)$$

for any fundamental interval $I(i^{(k)}) = \Omega \cap \left(u(i^{(k)}), v(i^{(k)})\right)$, $i^{(k)} \in \mathbb{N}_+^k$, $k \in \mathbb{N}_+$. This follows easily from (1.1.12), (1.1.13), and Theorem 1.1.2.

Consequently, for any $k \in \mathbb{N}_+$ and $a \in I$ we have

$$\left|\log \bar s_k - \log s^a_k\right| \le \max\left(\frac{1}{F_{k-1} F_{k+1}}, \frac{1}{F_k^2}\right) = O(g^{2k}) \tag{4.1.21}$$

as $k \to \infty$, whatever the current point $(\omega, \theta) \in \Omega^2$.

Clearly, by (4.1.18) and (4.1.21) the proof is complete for any $a \in I$. In the special case $a = 0$ we only should note that

$$s^0_k = \frac{q_{k-1}}{q_k}, \quad k \in \mathbb{N}_+.$$

□

Remark. The convergence rate in Theorem 4.1.26 with $a = 0$ is slightly better than that derived by Philipp (1967, p. 122). Equation (4.1.19) was first derived by Lévy (1929) using a different method. □

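Corollary 4.1.27 We have

$$\lim_{n \to \infty} \frac{1}{n} \log\left|\omega - \frac{p_n}{q_n}\right| = -\frac{\pi^2}{6 \log 2} \quad \text{a.e. in } \Omega$$

and, for any $\varepsilon > 0$,

$$\frac{1}{n} \log\left|\omega - \frac{p_n}{q_n}\right| = -\frac{\pi^2}{6 \log 2} + o\left(n^{-1/2} \log^{(3+\varepsilon)/2} n\right) \quad \text{a.e. in } \Omega$$

as $n \to \infty$, where the constant implied in $o$ depends on both $\varepsilon$ and $\omega \in \Omega$.

Proof. It follows from (1.1.16) that for any $\omega \in \Omega$ and $n \in \mathbb{N}$ we have

$$\frac{1}{2 q_{n+1}^2} < \left|\omega - \frac{p_n}{q_n}\right| < \frac{1}{q_n^2}.$$

Then the results stated are immediate consequences of equations (4.1.19) and (4.1.20). □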

Corollary 4.1.28 We have

$$\lim_{n \to \infty} \frac{1}{n} \log \lambda\left(I(a_1, \cdots, a_n)\right) = -\frac{\pi^2}{6 \log 2} \quad \text{a.e. in } \Omega$$

and, for any $\varepsilon > 0$,

$$\frac{1}{n} \log \lambda\left(I(a_1, \cdots, a_n)\right) = -\frac{\pi^2}{6 \log 2} + o\left(n^{-1/2} \log^{(3+\varepsilon)/2} n\right) \quad \text{a.e. in } \Omega$$

as $n \to \infty$, where the constant implied in $o$ depends on both $\varepsilon$ and $\omega \in \Omega$.

Proof. By (1.2.2) and (1.2.5) we have

$$\log \lambda\left(I(a_1, \cdots, a_n)\right) = -2 \log q_n - \log(s_n + 1), \quad n \in \mathbb{N}_+.$$

Since $s_n \in I$, the results stated are again immediate consequences of equations (4.1.19) and (4.1.20). □

Remark. The result above implies that the entropy $H(\tau)$ of the continued fraction transformation $\tau$ is equal to $\pi^2/6 \log 2$. See e.g., Billingsley (1965, p. 134). □

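Corollary 4.1.29 For any $\varepsilon > 0$ we have

$$\sqrt[n]{p_n(\omega)} = \omega^{1/n} e^{\pi^2/12 \log 2} + o\left(n^{-\frac{1}{2}} \log^{(3+\varepsilon)/2} n\right) \quad \text{a.e. in } \Omega$$

as $n \to \infty$, where the constant implied in $o$ depends on both $\varepsilon$ and $\omega \in \Omega$.

The proof follows from the inequality

$$\left|\sqrt[n]{p_n(\omega)} - \sqrt[n]{\omega\, q_n(\omega)}\right| \le \frac{1}{F_{n+1} F_n^{(n-1)/n}}, \quad \omega \in \Omega,\ n \in \mathbb{N}_+,$$

which can be easily checked. □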

Corollary 4.1.30 (Khinchin's fundamental theorem of Diophantine approximation) Let $f : \mathbb{N}_+ \to \mathbb{R}_{++}$.

(i) If $\sum_{i \in \mathbb{N}_+} f(i) = \infty$ and $i f(i) \ge (i + 1) f(i + 1)$, $i \in \mathbb{N}_+$, then a.e. in $\Omega$ the inequality

$$\left|\omega - \frac{p}{q}\right| < \frac{f(q)}{q}$$

has infinitely many solutions in integers $p, q \in \mathbb{N}_+$ with g.c.d.$(p, q) = 1$.

(ii) If $\sum_{i \in \mathbb{N}_+} f(i) < \infty$, then a.e. in $\Omega$ the above inequality has at most finitely many solutions in integers $p, q \in \mathbb{N}_+$ with g.c.d.$(p, q) = 1$.

The proof follows from Theorem 4.1.26 with $a = 0$ and F. Bernstein's theorem (Proposition 1.3.16). See, e.g., Billingsley (1965, p. 48). □

4.2 Other continued fraction expansions

4.2.1 Preliminaries

In this section we study a large class of continued fraction expansions which can be derived from the RCF expansion. Before defining them formally let us briefly describe the underlying idea.

The following rather old and well known remark is fundamental. For $a \in \mathbb{Z}$, $b \in \mathbb{N}_+$ and $x \in [0, 1)$ we have

$$a + \cfrac{1}{1 + \cfrac{1}{b + x}} = a + 1 + \cfrac{-1}{b + 1 + x}.$$

This operation is called a singularization. We have singularized the digit 1 in

$$[\,\cdot\,; \cdots, a, 1, b, \cdots].$$

The effect of a singularization is that a new and shorter continued fraction expansion is obtained. Moreover, we will see that the sequence of convergents associated with the 'new' continued fraction expansion is a subsequence of the sequence of convergents of the 'old' one. For example, given $n \in \mathbb{N}_+$, if we singularize the digit $a_{n+1}(\omega) = 1$ in the RCF expansion of some $\omega \in \Omega$, then the sequence of convergents of the 'new' continued fraction expansion is obtained by deleting the $n$th term from the sequence of RCF convergents of $\omega$. Obviously, the 'new' continued fraction expansion is no longer an RCF expansion!

Starting from the RCF expansion of a given $x \in [0, 1)$ it is not possible (i) to singularize two consecutive digits equal to 1, and (ii) to singularize digits other than 1.

It is also important to note that once we have singled out digits equal to 1 to be singularized, the order in which they are singularized has no impact on the final result. Of course, just one singularization does not make the new expansion ‘really faster’ than the old one. However, many algorithms can be devised such that for almost all $x \in [0, 1)$ infinitely many convergents are skipped. Before considering such algorithms, let us fix notation.

Let $x \in [0, 1)$ with RCF expansion

$$x = [a_1, a_2, \cdots].$$

Any finite or infinite string of consecutive digits

$$a_k(x) = 1,\ a_{k+1}(x) = 1,\ \cdots,\ a_{k+n-1}(x) = 1, \quad k \in \mathbb{N}_+,\ n \in \mathbb{N}_+ \cup \{\infty\},$$

is called a 1-block if either $k = 1$ and $a_{k+n}(x) \ne 1$ (if $n$ is finite) or $k > 1$ and $a_{k-1}(x) \ne 1$, $a_{k+n}(x) \ne 1$ (if $n$ is finite). The first algorithm we consider is:

A For any $x \in [0, 1)$ singularize the first, third, fifth, etc., components in any 1-block.


Applying algorithm A to a (finite or infinite) RCF expansion $[a_1, a_2, \cdots]$ yields a (finite or infinite) continued fraction of the form

$$b_0 + \cfrac{e_1}{b_1 + \cfrac{e_2}{b_2 + \ddots}} \tag{4.2.1}$$

or $[b_0; e_1/b_1, e_2/b_2, \cdots]$, for short. In (4.2.1) we have $b_0 \in \{0, 1\}$, $b_n \in \mathbb{N}_+$, $e_n \in \{-1, 1\}$, and $b_n + e_{n+1} \ge 2$, $n \in \mathbb{N}_+$.

Example 4.2.1 Let $x = (-3 + \sqrt{17})/2 = 0.56155\cdots$. As a quadratic irrationality $x$ should have a periodic RCF expansion (see Subsection 1.1.3). We easily find that

$$x = [0; 1, 1, 3, 1, 1, 3, \cdots] = [0; \overline{1, 1, 3}].$$

Applying algorithm A to the RCF expansion of $x$ yields

$$x = [1; -1/2, 1/4, -1/2, 1/4, \cdots]$$

or $x = [1; \overline{-1/2, 1/4}]$, for short. □

By the very construction, the convergents

$$\frac{p^e_n}{q^e_n} := b_0 + \cfrac{e_1}{b_1 + \cfrac{e_2}{b_2 + \ddots + \cfrac{e_n}{b_n}}}, \quad n = 1, 2, \cdots,$$

of (4.2.1) are a subset of the convergents of $[a_1, a_2, \cdots]$. Therefore in the case of an infinite RCF expansion we have

$$\lim_{n \to \infty} \frac{p^e_n}{q^e_n} = [a_1, a_2, \cdots].$$

Several questions naturally arise:

(i) Are there other algorithms yielding continued fraction expansions with the property above?

(ii) Does algorithm A always yield fastest continued fraction expansions? Closest expansions? (The precise meaning of these terms will be explained later. See Subsection 4.3.3. Informally, one would like the denominators $q^e_n$, $n \in \mathbb{N}_+$, to grow as fast as possible while the approximation coefficients associated with the new expansion are as small as possible.)

(iii) Is there an underlying ergodic transformation?

We can easily answer the first question. The second algorithm we consider is:

B For any $x \in [0, 1)$ singularize the last, third from last, fifth from last, etc., components in any 1-block.

Example 4.2.2 Let $x$ be as in Example 4.2.1. Applying algorithm B to the RCF expansion of $x$ yields

$$x = [0; 1/2, -1/4, 1/2, -1/4, \cdots],$$

or $x = [0; \overline{1/2, -1/4}]$, for short. (Here $b_0 = 0$, since algorithm B leaves the digit $a_1 = 1$ untouched.) □
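Algorithms A and B can be tried out on a finite truncation of the RCF expansion of Example 4.2.1. The sketch below is not taken from the text: the helper names (`one_blocks`, `positions_A`, `positions_B`, `singularize`, `value`) are hypothetical, and it assumes the truncation does not end inside a 1-block. It uses the fact that, starting from an RCF expansion, singularizing the digit at index $i$ adds 1 to both neighbouring digits and turns the numerator in front of the right neighbour into $-1$.

```python
from fractions import Fraction

def one_blocks(digits):
    """Return (start, length) for every maximal run of 1s among a_1, a_2, ..."""
    blocks, i = [], 1                  # digits[0] is a_0 and never belongs to a block
    while i < len(digits):
        if digits[i] == 1:
            j = i
            while j < len(digits) and digits[j] == 1:
                j += 1
            blocks.append((i, j - i))
            i = j
        else:
            i += 1
    return blocks

def positions_A(digits):
    # first, third, fifth, ... component of every 1-block
    return {s + k for s, m in one_blocks(digits) for k in range(0, m, 2)}

def positions_B(digits):
    # last, third from last, fifth from last, ... component of every 1-block
    return {s + m - 1 - k for s, m in one_blocks(digits) for k in range(0, m, 2)}

def singularize(digits, positions):
    """Singularize the chosen (pairwise non-adjacent) digits 1 of an RCF expansion."""
    a = list(digits)
    e = [1] * len(a)                   # e[k] is the numerator in front of a[k]
    chosen = {i for i in positions if i + 1 < len(a)}   # a final digit has no successor
    for i in chosen:
        a[i - 1] += 1
        a[i + 1] += 1
        e[i + 1] = -1
    kept = [k for k in range(1, len(a)) if k not in chosen]
    return a[0], [(e[k], a[k]) for k in kept]

def value(b0, pairs):
    """Exact value of the finite continued fraction [b0; e1/b1, e2/b2, ...]."""
    t = Fraction(0)
    for e, b in reversed(pairs):
        t = Fraction(e) / (b + t)
    return b0 + t

digits = [0] + [1, 1, 3] * 4           # truncation of [0; 1, 1, 3, 1, 1, 3, ...]
rcf_value = value(digits[0], [(1, a) for a in digits[1:]])
expansion_A = singularize(digits, positions_A(digits))
expansion_B = singularize(digits, positions_B(digits))
```

Both `value(*expansion_A)` and `value(*expansion_B)` equal `rcf_value` exactly, since each step is an instance of the singularization identity, and the order in which the chosen digits are processed plays no role.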

Clearly, in general, algorithms A and B yield different results. Actually it is possible to show that, in a sense, one cannot do better than either of these algorithms. Since one can singularize just digits equal to 1, and since two consecutive 1’s cannot be both singularized, it is not possible to go faster than either algorithm A or algorithm B. Slower algorithms are trivially at hand. Here is an example of such an algorithm:

C For any $x \in [0, 1)$ singularize all digits $a_{n+1}(x) = 1$ for which $\Theta_n(x) \ge 1/2$ (see Subsection 1.3.2) whatever $n \in \mathbb{N}$.

In Subsection 4.3.2 it is shown that algorithm C is well defined, that is, not in conflict with the requirements of the singularization procedure.

Example 4.2.3 Let $x$ be as in Example 4.2.1. A simple calculation shows that the first four digits equal to 1 in the RCF expansion of $x$ should not be singularized if we apply algorithm C to it. □

From this example it is clear that, in general, algorithm C does not yield expansions which are fastest. In Subsection 4.3.3 we will discuss an algorithm which yields both fastest and closest expansions. This algorithm was introduced by Selenius (1960) and, independently, by Bosma (1987), and is called the optimal continued fraction (OCF) expansion. Finally, in Subsection 4.2.5 we will answer question (iii) above.

4.2.2 Semi-regular continued fraction expansions

Apart from the RCF expansion there exist many so called semi-regular continued fraction expansions. To define the latter we start by defining a continued fraction (CF) as a pair of two sets $e = (e_k)_{k \in M}$ and $(a_k)_{k \in \{0\} \cup M}$ of integers with $e_k \in \{-1, 1\}$ and $a_0 \in \mathbb{Z}$, $a_k \in \mathbb{N}_+$, $k \in M$, where either $M = \{k : 1 \le k \le n\}$ for some $n \in \mathbb{N}_+$ or $M = \mathbb{N}_+$. Next, for arbitrary indeterminates $x_i, y_i$, $1 \le i \le n$, $n \in \mathbb{N}_+$, write

$$[y_1/x_1] = \frac{y_1}{x_1}, \qquad [y_1/x_1, \cdots, y_n/x_n] = \frac{y_1}{x_1 + [y_2/x_2, \cdots, y_n/x_n]}, \quad n \ge 2.$$

If card $M = n \in \mathbb{N}_+$ then we say that the CF considered has length $n$ and assign it the value

$$[a_0; e_1/a_1, \cdots, e_n/a_n] := a_0 + [e_1/a_1, \cdots, e_n/a_n] = a_0 + \cfrac{e_1}{a_1 + \cfrac{e_2}{a_2 + \ddots + \cfrac{e_n}{a_n}}} \in \mathbb{R} \cup \{-\infty, \infty\}.$$

If $M = \mathbb{N}_+$ then we say that the CF considered is infinite and look at it as the sequence

$$\bigl((e_k)_{1 \le k \le n}, (a_k)_{0 \le k \le n}\bigr)_{n \in \mathbb{N}_+}$$

of all finite CF’s which are obtained by finite truncation. In both cases we can associate with a CF its convergents

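$$\frac{p^e_0}{q^e_0} := a_0, \qquad \frac{p^e_k}{q^e_k} := [a_0; e_1/a_1, \cdots, e_k/a_k], \quad 1 \le k \le n,$$

for either some $n \in \mathbb{N}_+$ or any $n \in \mathbb{N}_+$, with $p^e_0 = a_0$, $q^e_0 = 1$, $p^e_k \in \mathbb{Z}$, $q^e_k \in \mathbb{N}_+$, g.c.d.$(|p^e_k|, q^e_k) = 1$, $1 \le k \le n$.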

To ensure the convergence of the sequence of convergents of an infinite CF, which would enable us to speak of a CF expansion, additional conditions should be imposed on the $e_k$ and $a_k$, $k \in \mathbb{N}_+$. One possibility, yielding the so called semi-regular continued fraction (SRCF) expansion, is to ask that $e_{i+1} + a_i \ge 1$, $i \in \mathbb{N}_+$, and $e_{i+1} + a_i \ge 2$ infinitely often (in the infinite case). It can be shown that the sequence of convergents of an infinite SRCF expansion converges to an irrational number. See Tietze (1913) [cf. Perron (1954, §37)]. This will be written as

$$\lim_{k \to \infty} \frac{p^e_k}{q^e_k} := [a_0; e_1/a_1, e_2/a_2, \cdots].$$

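As in the RCF expansion case a matrix theory is associated with an SRCF expansion (or, more generally, with a CF). Consider (cf. Remark preceding Proposition 1.1.1) the matrices

$$A^e_0 := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & a_0 \end{pmatrix} = \begin{pmatrix} 1 & a_0 \\ 0 & 1 \end{pmatrix}, \qquad A^e_n := \begin{pmatrix} 0 & e_n \\ 1 & a_n \end{pmatrix}, \quad n \in \mathbb{N}_+,$$

and

$$M^e_n := A^e_0 \cdots A^e_n, \quad n \in \mathbb{N}.$$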

Clearly,

$$\det M^e_0 = 1, \qquad \det M^e_n = (-1)^n e_1 \cdots e_n, \quad n \in \mathbb{N}_+. \tag{4.2.2}$$

One can prove that

$$M^e_n = \begin{pmatrix} p^e_{n-1} & p^e_n \\ q^e_{n-1} & q^e_n \end{pmatrix}, \quad n \in \mathbb{N}, \tag{4.2.3}$$

with $p^e_{-1} = 1$, $q^e_{-1} = 0$, which implies that the sequences $(p^e_n)_{n \in \mathbb{N}}$ and $(q^e_n)_{n \in \mathbb{N}}$ satisfy the recurrence relations

$$p^e_n = a_n p^e_{n-1} + e_n p^e_{n-2}, \qquad q^e_n = a_n q^e_{n-1} + e_n q^e_{n-2}, \quad n \in \mathbb{N}_+.$$

The second equation above implies at once that

$$s^e_n := \frac{q^e_{n-1}}{q^e_n} = [1/a_n, e_n/a_{n-1}, \cdots, e_2/a_1], \quad n > 1, \tag{4.2.4}$$

and clearly $s^e_1 := q^e_0/q^e_1 = 1/a_1$. It follows from (4.2.2) and (4.2.3) that

$$p^e_{-1} q^e_0 - p^e_0 q^e_{-1} = 1, \qquad p^e_{n-1} q^e_n - p^e_n q^e_{n-1} = (-1)^n e_1 \cdots e_n, \quad n \in \mathbb{N}_+,$$

showing that indeed g.c.d.$(|p^e_n|, q^e_n) = 1$, $n \in \mathbb{N}$.

Next (see again the RCF expansion case), looking at $M^e_n$ as a Möbius transformation one can show that

$$M^e_n(0) = \frac{p^e_n}{q^e_n}, \quad n \in \mathbb{N}.$$
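The recurrence relations and the determinant identity lend themselves to a quick numerical check. The sketch below is ours, not the authors’: the function name `convergents` and the sample data are illustrative assumptions. The sample is the beginning of the expansion $[1; -1/2, 1/4, \cdots]$ of Example 4.2.1.

```python
from math import gcd

def convergents(a0, pairs):
    """Return [(p_0, q_0), (p_1, q_1), ...] for [a0; e1/a1, e2/a2, ...]."""
    p_prev, q_prev = 1, 0              # p_{-1}, q_{-1}
    p, q = a0, 1                       # p_0, q_0
    out = [(p, q)]
    for e, a in pairs:
        # p_n = a_n p_{n-1} + e_n p_{n-2}, and likewise for q_n
        p, p_prev = a * p + e * p_prev, p
        q, q_prev = a * q + e * q_prev, q
        out.append((p, q))
    return out

pairs = [(-1, 2), (1, 4), (-1, 2), (1, 4), (-1, 2)]
conv = convergents(1, pairs)

sign = 1
for n in range(1, len(conv)):
    sign *= -pairs[n - 1][0]           # running value of (-1)^n e_1 ... e_n
    (p0, q0), (p1, q1) = conv[n - 1], conv[n]
    assert p0 * q1 - p1 * q0 == sign   # consequence of (4.2.2) and (4.2.3)
    assert gcd(abs(p1), q1) == 1
```

For this input the denominators come out as 1, 2, 9, 16, 73, 130, all of which also occur among the denominators of the RCF convergents of $[0; 1, 1, 3, 1, 1, 3, \cdots]$.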

More generally,

$$M^e_n(z) = \frac{p^e_n + z p^e_{n-1}}{q^e_n + z q^e_{n-1}} = [a_0; e_1/a_1, \cdots, e_{n-1}/a_{n-1}, e_n/(a_n + z)], \quad n \ge 2,$$

for any $z \in \mathbb{C}$, $z \ne -1/s^e_n$, and

$$a_0 + \frac{e_1}{a_1 + z} = M^e_1(z) \left( = \frac{p^e_1 + z p^e_0}{q^e_1 + z q^e_0} \right)$$

for any $z \in \mathbb{C}$, $z \ne -1/s^e_1$. It follows that putting $t^e_n = [e_{n+1}/a_{n+1}, \cdots]$, $n \in \mathbb{N}$, we have

$$a_0 + t^e_0 = \frac{p^e_n + t^e_n p^e_{n-1}}{q^e_n + t^e_n q^e_{n-1}}, \quad n \in \mathbb{N}.$$

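Finally, defining

$$\Theta^e_n(a_0 + t^e_0) = (q^e_n)^2 \left| a_0 + t^e_0 - \frac{p^e_n}{q^e_n} \right|, \quad n \in \mathbb{N},$$

it is easy to check that

$$\Theta^e_n(a_0 + t^e_0) = \frac{e_{n+1} t^e_n}{s^e_n t^e_n + 1} = \frac{|t^e_n|}{s^e_n t^e_n + 1}, \quad n \in \mathbb{N}. \tag{4.2.5}$$

Since

$$(t^e_n)^{-1} = e_{n+1}(a_{n+1} + t^e_{n+1}), \qquad s^e_n + e_{n+1} a_{n+1} = \frac{e_{n+1}}{s^e_{n+1}}, \quad n \in \mathbb{N},$$

we also have

$$\Theta^e_n(a_0 + t^e_0) = \frac{s^e_{n+1}}{s^e_{n+1} t^e_{n+1} + 1}, \quad n \in \mathbb{N}. \tag{4.2.6}$$

The numbers $\Theta^e_n$, $n \in \mathbb{N}$, associated with a (finite or infinite) SRCF expansion are called its approximation coefficients. Compare with the RCF expansion case in Subsection 1.3.2.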

We conclude this subsection with a few examples of well known SRCF expansions.

1. The RCF expansion: this is the SRCF expansion for which $e_n = 1$ for any $n \in \mathbb{N}_+$.

2. Nakada’s $\alpha$-expansions for $\alpha \in [1/2, 1]$: see Subsection 4.3.1.

3. The nearest integer continued fraction (NICF) expansion: this is the SRCF expansion for which $e_{n+1} + a_n \ge 2$ for any $n \in \mathbb{N}_+$. It was introduced by Minnigerode (1873) and studied by Hurwitz (1889). Actually, the NICF expansion is the 1/2-expansion, and is obtained by applying algorithm A defined in Subsection 4.2.1 to the RCF expansion.

4. The singular continued fraction (SCF) expansion: this is the SRCF expansion for which $e_n + a_n \ge 2$, $n \in \mathbb{N}_+$. It was introduced by Hurwitz (1889). Actually, the SCF expansion is the $g$-expansion with $g = (\sqrt{5} - 1)/2$, the golden ratio, and is obtained by applying algorithm B defined in Subsection 4.2.1 to the RCF expansion.

5. Minkowski’s diagonal continued fraction (DCF) expansion: this is the SRCF expansion which is obtained by applying algorithm C defined in Subsection 4.2.1 to the RCF expansion. See Subsection 4.3.2.

6. The continued fraction with odd incomplete quotients (Odd CF) expansion: this is the SRCF expansion for which $e_1 = 1$, $a_n \equiv 1 \bmod 2$, $e_{n+1} + a_n \ge 2$, $n \in \mathbb{N}_+$. It was introduced by Rieger (1981a) [see also Barbolosi (1990), Hartono and Kraaikamp (2002), and Schweiger (1995, Ch. 3)].

7. The continued fraction with even incomplete quotients (Even CF) expansion: this is the SRCF expansion for which $e_1 = 1$, $a_n \equiv 0 \bmod 2$, $e_{n+1} + a_n \ge 2$, $n \in \mathbb{N}_+$. See also Kraaikamp and Lopes (1996) and Schweiger (1995, Ch. 3).

4.2.3 The singularization process

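The following two easily checked identities are fundamental for the theory which we develop in this section:

$$\begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & c \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & b \end{pmatrix} = \begin{pmatrix} 1 & a + c \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & -c \\ 1 & b + 1 \end{pmatrix}, \tag{4.2.7}$$

$$\begin{pmatrix} 0 & c \\ 1 & a \end{pmatrix} \begin{pmatrix} 0 & d \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & b \end{pmatrix} = \begin{pmatrix} 0 & c \\ 1 & a + d \end{pmatrix} \begin{pmatrix} 0 & -d \\ 1 & b + 1 \end{pmatrix}, \tag{4.2.8}$$

where $a$, $b$, $c$ and $d$ are arbitrary real or complex numbers.

Let

$$(e_k)_{k \in M}, \quad (a_k)_{k \in \{0\} \cup M} \tag{4.2.9}$$

be a (finite or infinite) CF with $a_{\ell+1} = 1$, $e_{\ell+2} = 1$ for some $\ell \in \mathbb{N}$ for which $\ell + 2 \in M$. The transformation $\sigma_\ell$ which takes (4.2.9) into the CF

$$(\tilde e_k)_{k \in M \setminus \{\ell+1\}}, \quad (\tilde a_k)_{k \in \{0\} \cup (M \setminus \{\ell+1\})} \tag{4.2.10}$$

with $\tilde e_k = e_k$, $k \in M$, $k < \ell + 1$ or $k \ge \ell + 3$, $\tilde e_{\ell+2} = -e_{\ell+1}$, $\tilde a_k = a_k$, $k \in \{0\} \cup M$, $k < \ell$ or $k \ge \ell + 3$, $\tilde a_\ell = a_\ell + e_{\ell+1}$, $\tilde a_{\ell+2} = a_{\ell+2} + 1$, is called a singularization of the pair $(a_{\ell+1}, e_{\ell+2})$.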

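Let $(p^e_k/q^e_k)_{k \in \{0\} \cup M}$ and $(\tilde p^e_k/\tilde q^e_k)_{k \in \{0\} \cup (M \setminus \{\ell+1\})}$ be the sets of convergents associated with (4.2.9) and (4.2.10), respectively. We are going to derive the relationship between these sets. Let $(M^e_k)_{k \in \{0\} \cup M}$ and $(\tilde M^e_k)_{k \in \{0\} \cup (M \setminus \{\ell+1\})}$ be the sets of matrices defined in the preceding subsection, associated with (4.2.9) and (4.2.10), respectively. We have

$$\begin{pmatrix} \tilde p^e_k \\ \tilde q^e_k \end{pmatrix} = \tilde M^e_k \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad k \in \{0\} \cup (M \setminus \{\ell+1\}).$$

Clearly, $\tilde M^e_k = M^e_k$ for $k < \ell$ and, moreover, by (4.2.7) and (4.2.8) we have $\tilde M^e_k = M^e_{k+1}$ for $k \ge \ell + 1$. The matrix $\tilde M^e_\ell$ will then be given by

$$\tilde M^e_\ell = M^e_{\ell-1} \begin{pmatrix} 0 & e_\ell \\ 1 & a_\ell + e_{\ell+1} \end{pmatrix}$$

with $M^e_{-1} := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $e_0 = 1$. Hence

$$\tilde M^e_\ell = M^e_{\ell+1} \begin{pmatrix} 0 & e_{\ell+1} \\ 1 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 0 & e_\ell \\ 1 & a_\ell \end{pmatrix}^{-1} \begin{pmatrix} 0 & e_\ell \\ 1 & a_\ell + e_{\ell+1} \end{pmatrix} = M^e_{\ell+1} \begin{pmatrix} -e_{\ell+1} & 0 \\ e_{\ell+1} & 1 \end{pmatrix}.$$

Therefore

$$\begin{pmatrix} \tilde p^e_\ell \\ \tilde q^e_\ell \end{pmatrix} = M^e_{\ell+1} \begin{pmatrix} -e_{\ell+1} & 0 \\ e_{\ell+1} & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} p^e_{\ell+1} \\ q^e_{\ell+1} \end{pmatrix},$$

and we can state the following result.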

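Proposition 4.2.4 Let $\ell \in \mathbb{N}$ such that $\ell + 2 \in M$. The set of convergents

$$(\tilde p^e_k/\tilde q^e_k)_{k \in \{0\} \cup (M \setminus \{\ell+1\})}$$

resulting after the singularization $\sigma_\ell$ of the pair $(a_{\ell+1}, e_{\ell+2}) = (1, 1)$, is obtained by deleting $p^e_\ell/q^e_\ell$ from the set $(p^e_k/q^e_k)_{k \in \{0\} \cup M}$.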
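Proposition 4.2.4 can be illustrated on a short finite CF. The sketch below is a hedged illustration only: `convergents` and `singularize_at` are hypothetical helper names, and the sample CF is the truncation $[0; 1, 1, 3, 1, 1, 3]$, in which we singularize $a_4 = 1$, that is, $\ell = 3$.

```python
def convergents(a0, pairs):
    """Return [(p_0, q_0), (p_1, q_1), ...] for [a0; e1/a1, e2/a2, ...]."""
    p_prev, q_prev = 1, 0              # p_{-1}, q_{-1}
    p, q = a0, 1                       # p_0, q_0
    out = [(p, q)]
    for e, a in pairs:
        p, p_prev = a * p + e * p_prev, p
        q, q_prev = a * q + e * q_prev, q
        out.append((p, q))
    return out

def singularize_at(a0, pairs, l):
    """Singularize the pair (a_{l+1}, e_{l+2}) = (1, 1), following (4.2.10)."""
    a = [a0] + [b for _, b in pairs]
    e = [1] + [s for s, _ in pairs]    # e[0] = 1 is only a placeholder
    assert a[l + 1] == 1 and e[l + 2] == 1
    a[l] += e[l + 1]                   # new a_l     = a_l + e_{l+1}
    a[l + 2] += 1                      # new a_{l+2} = a_{l+2} + 1
    e[l + 2] = -e[l + 1]               # new e_{l+2} = -e_{l+1}
    del a[l + 1], e[l + 1]             # the index l + 1 disappears
    return a[0], list(zip(e[1:], a[1:]))

rcf = (0, [(1, 1), (1, 1), (1, 3), (1, 1), (1, 1), (1, 3)])
old = convergents(*rcf)
new = convergents(*singularize_at(*rcf, 3))
```

Here `new` coincides with `old` after removal of the entry of index 3, the convergent $4/7$.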

In what follows a singularization process will consist of a set $S$ of continued fractions and a rule which determines in an unambiguous way the pairs $a_{\ell+1} = 1$, $e_{\ell+2} = 1$ that should be singularized for any member of $S$.


Remark. For an infinite CF the sequence of convergents of the ‘new’ CF obtained after singularization is a subsequence of the sequence of convergents of the ‘old’ one. Therefore if the ‘old’ CF converged to $x$, so does the ‘new’ one, and it converges faster. In particular, this holds for any SRCF expansion to be singularized.

4.2.4 S-expansions

From now on we will concentrate on one special singularization process. The set $S$ of continued fraction expansions to be singularized is the set of all (finite or infinite) RCF expansions. Since in this case all the $e$’s are $+1$, we will speak of singularizing $a_{\ell+1} = 1$ instead of singularizing the pair $a_{\ell+1} = 1$, $e_{\ell+2} = 1$.

Before describing the general rule (as we should according to the definition just given) remark that Example 4.2.1 actually describes a singularization process: $S$ plus algorithm A yield the NICF expansion! Now, notice that algorithm A is equivalent to

singularize $a_{\ell+1} = 1$ if and only if $(\tau^\ell, s_\ell) \in S_A$, $\ell \in \mathbb{N}$,

where (cf. Subsection 1.3) $\tau^\ell = [a_{\ell+1}, a_{\ell+2}, \cdots]$, $s_\ell = [a_\ell, \cdots, a_1]$, $\ell \in \mathbb{N}$, with $s_0 = 0$, and

$$S_A = [1/2, 1) \times [0, g] \subset I^2.$$

We recall that the golden ratios $g$ and $G$ are defined as

$$g = \frac{\sqrt{5} - 1}{2}, \qquad G = g + 1.$$
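The equivalence between algorithm A and the region $S_A$ can be tested on finite digit strings. The sketch below rests on two assumptions of ours. The condition on the first coordinate (which for an irrational number holds exactly when $a_{\ell+1} = 1$) is replaced by the test $a_{\ell+1} = 1$. The comparison $s_\ell \le g$ is carried out exactly through $s_\ell^2 + s_\ell \le 1$, since $g$ is the positive root of $t^2 + t - 1$. The function names are hypothetical.

```python
from fractions import Fraction

def s_values(digits):
    """s_0 = 0 and s_l = [a_l, ..., a_1] = 1/(a_l + s_{l-1}), for digits a_1, a_2, ..."""
    s = [Fraction(0)]
    for a in digits:
        s.append(1 / (a + s[-1]))
    return s

def at_most_g(s):
    # for s >= 0 we have s <= g if and only if s^2 + s <= 1
    return s * s + s <= 1

def algorithm_A(digits):
    """Indices l for which a_{l+1} is the 1st, 3rd, 5th, ... component of a 1-block."""
    chosen, run = set(), 0
    for l, a in enumerate(digits):     # digits[l] is a_{l+1}
        run = run + 1 if a == 1 else 0
        if run % 2 == 1:
            chosen.add(l)
    return chosen

def region_SA(digits):
    """Indices l with a_{l+1} = 1 and s_l in [0, g]."""
    s = s_values(digits)
    return {l for l, a in enumerate(digits) if a == 1 and at_most_g(s[l])}

sample = [1, 1, 3, 1, 1, 3, 1, 1, 1, 2, 1, 1, 1, 1, 5]
assert algorithm_A(sample) == region_SA(sample)
```

The reason is that $s_\ell$ lies below $g$ exactly when an even number of digits 1 immediately precedes $a_{\ell+1}$.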

Similarly, we can verify that algorithm B, leading to Hurwitz’ SCF expansion, is equivalent to

singularize $a_{\ell+1} = 1$ if and only if $(\tau^\ell, s_\ell) \in S_B$, $\ell \in \mathbb{N}$,

where

$$S_B = [g, 1) \times I \subset I^2.$$

Finally, using properties of the approximation coefficients $\Theta_n$, $n \in \mathbb{N}$, defined in Subsection 1.3.2, we can also show that algorithm C, leading to Minkowski’s DCF expansion, is equivalent to

singularize $a_{\ell+1} = 1$ if and only if $(\tau^\ell, s_\ell) \in S_C$, $\ell \in \mathbb{N}$,

where

$$S_C = \left\{ (x, y) \in I^2 ;\ \frac{x}{xy + 1} \ge \frac{1}{2} \right\}.$$

These three examples lead to the idea of prescribing by a subset $S \subset I^2$ which digits $1 = a_{\ell+1}$ are to be singularized in the RCF expansion in the form of the condition $(\tau^\ell, s_\ell) \in S$, $\ell \in \mathbb{N}$. Such an $S$ cannot be just any set but must satisfy the conditions

$$S \subset [1/2, 1) \times I,$$

since otherwise $a_{\ell+1}$ would not be equal to 1, and

$$S \cap \tau(S) \subset \{(g, g)\},$$

since otherwise one would be forced to singularize two consecutive digits both equal to 1, which is impossible. Thus we are led in a natural way to the following definition which exactly describes all S-expansions.

Definition 4.2.5 A subset $S$ of $I^2$ is said to be a singularization area if and only if

(i) $S \in \mathcal{B}^2_I$ and $\gamma(\partial S) = 0$;

(ii) $S \subset [1/2, 1) \times I$;

(iii) $S \cap \tau(S) \subset \{(g, g)\}$.

If $S$ is a singularization area, then the S-expansion of $\omega \in \Omega$ is defined as the SRCF expansion converging to $\omega$ which is obtained from the RCF expansion of $\omega$ by singularizing a digit $1 = a_{\ell+1} = a_{\ell+1}(\omega)$ if and only if $(\tau^\ell, s_\ell) \in S$, whatever $\ell \in \mathbb{N}$.

Remarks. 1. We need the continuity condition $\gamma(\partial S) = 0$ in order to be able to draw the following conclusion. Let $A(S, n)$ be the random variable defined as

$$A(S, n) = \operatorname{card}\{j : (\tau^j, s_j) \in S,\ 1 \le j \le n\}, \quad n \in \mathbb{N}_+.$$

By Theorem 4.1.16(ii) we then have

$$\lim_{n \to \infty} \frac{A(S, n)}{n} = \gamma(S) \quad \text{a.e.}$$

2. Actually, the sets $S_A$ and $S_B$ do not satisfy condition (iii). Indeed, in both cases, $S \cap \tau(S)$ is a line segment. Of course, this can be easily repaired by taking

$$S^*_A = ([1/2, g] \times [0, g]) \cup ((g, 1) \times [0, g))$$

and

$$S^*_B = ([g, 1) \times [0, g]) \cup ((g, 1) \times (g, 1])$$

instead of $S_A$ and $S_B$, respectively.

3. Since

$$\gamma([1/2, 1) \times I) = (\log 2)^{-1} \log\frac{4}{3} = 0.41503\cdots,$$

a singularization area $S$ never can have $\gamma$-measure greater than $0.41503\cdots$. But condition (iii) forces the maximal possible $\gamma$-measure of a singularization area $S$ to be essentially smaller than $0.41503\cdots$ as shown below.

Proposition 4.2.6 For any singularization area $S$ we have

$$\gamma(S) \le 1 - \frac{\log G}{\log 2} = 0.30575\cdots,$$

where the bound is sharp.

Proof. Define $M_1 = S^*_A$ with $S^*_A$ as before and $M_2 = ([0, g) \times (g, 1]) \cup ([g, 1) \times [g, 1])$. It is easy to check that $M_2 = \tau(M_1)$ and

$$\gamma(M_1) = \gamma(M_2) = 1 - \frac{\log G}{\log 2}.$$

Next, put $S_1 = S \cap M_1$ and $S_2 = S \cap M_2$. Clearly,

$$\tau(S_1) \cup S_2 \subset M_2$$

and by Definition 4.2.5(iii) we have

$$\tau(S_1) \cap S_2 \subset \{(g, g)\},$$

see also Figure 4.1. We now see that

$$\gamma(S) = \gamma(S_1) + \gamma(S_2) = \gamma(\tau(S_1)) + \gamma(S_2) = \gamma(\tau(S_1) \cup S_2) \le \gamma(M_2) = 1 - \frac{\log G}{\log 2}.$$


Figure 4.1: $S = S_1 \cup S_2$ and $\tau(S_1)$

That a singularization area actually can have $\gamma$-measure $1 - (\log 2)^{-1}(\log G)$ is shown by the cases of $S^*_A$ and $S^*_B$. □
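The numerical values above are easy to reproduce. Assuming, as the density of $\gamma_\Delta$ given in Subsection 4.2.5 suggests, that $\gamma$ has density $(\log 2)^{-1}(xy + 1)^{-2}$ on $I^2$, the measure of a rectangle has a closed form. In the sketch below `gamma_rect` is a hypothetical helper name, and boundary segments are ignored since they carry no mass.

```python
from math import isclose, log, sqrt

g = (sqrt(5) - 1) / 2
G = g + 1

def gamma_rect(a, b, c, d):
    """gamma([a, b] x [c, d]) for the density 1 / (log 2 * (x*y + 1)**2)."""
    # integrating the density over the rectangle gives the logarithm of a cross-ratio
    return log((1 + b * d) * (1 + a * c) / ((1 + a * d) * (1 + b * c))) / log(2)

strip = gamma_rect(0.5, 1, 0, 1)       # [1/2, 1) x I
area_A = gamma_rect(0.5, 1, 0, g)      # S_A = [1/2, 1) x [0, g]
area_B = gamma_rect(g, 1, 0, 1)        # S_B = [g, 1) x I
bound = 1 - log(G) / log(2)

assert isclose(area_A, bound) and isclose(area_B, bound)
```

Both regions attain the bound because $G/(1 + g/2) = 2/G$, a consequence of $G^2 = G + 1$.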

On account of Proposition 4.2.6, a singularization area $S$ will be called maximal if

$$\gamma(S) = 1 - \frac{\log G}{\log 2}.$$

Given a singularization area $S$, let $B_S$ be a subset of $I^2$ such that whatever $\omega = [a_1, a_2, \cdots] \in \Omega$ any digit $1 = a_{\ell+1} = a_{\ell+1}(\omega)$ is unchanged by S-singularization if and only if $(\tau^\ell, s_\ell) \in B_S$, $\ell \in \mathbb{N}$. Clearly, such a set, which determines the occurrence of digits equal to 1 in the S-expansion, should have the following properties:

(1) $B_S \subset [1/2, 1) \times I$ since $a_{\ell+1} = 1$;

(2) $B_S \cap S = \emptyset$ since $a_{\ell+1} = 1$ is not singularized;

(3) $\tau^{-1}(B_S) \cap S = \emptyset$ since $a_\ell$ is not singularized;

(4) $\tau(B_S) \cap S = \emptyset$ since $a_{\ell+2}$ is not singularized.

On account of the considerations above, the subset $B_S$ of $I^2$ defined as

$$B_S = ([1/2, 1) \times I) \setminus (S \cup \tau^{-1}(S) \cup \tau(S))$$

is called the preservation area of 1’s.

We have the following result.


Proposition 4.2.7 If $S$ is maximal, then $\gamma(B_S) = 0$. In general, the converse of this statement does not hold.

Proof. Let $M_1$, $M_2$, $S_1$ and $S_2$ be as in the proof of Proposition 4.2.6. Put moreover

$$B_1 = B_S \cap M_1, \qquad B_2 = B_S \cap M_2.$$

It is now easy to see that

$$\tau(B_1) \cap (\tau(S_1) \cup S_2) = \emptyset, \qquad \tau(B_1) \cup \tau(S_1) \cup S_2 \subset M_2,$$

$$B_2 \cap (\tau(S_1) \cup S_2) = \emptyset, \qquad B_2 \cup \tau(S_1) \cup S_2 \subset M_2.$$

Hence, since $S$ is maximal,

$$\gamma(B_2) = 0, \qquad \gamma(B_1) = \gamma(\tau(B_1)) = 0,$$

which completes the proof. (The reader is invited to give an example where the converse does not hold.) □

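We conclude this subsection by deriving a number of results, which are obtained as easy spin-off. Let $S$ be a singularization area and $\omega \in \Omega$. As the sequence $(p^e_k/q^e_k)_{k \in \mathbb{N}_+}$ of S-convergents of $\omega$ is a subsequence of the sequence $(p_n/q_n)_{n \in \mathbb{N}_+}$ of its RCF convergents, there exists an increasing random function $n_S : \mathbb{N}_+ \to \mathbb{N}_+$ such that

$$\begin{pmatrix} p^e_k \\ q^e_k \end{pmatrix} = \begin{pmatrix} p_{n_S(k)} \\ q_{n_S(k)} \end{pmatrix}, \quad k \in \mathbb{N}_+.$$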

Theorem 4.2.8 Let $S$ be a singularization area. Then

$$\lim_{k \to \infty} \frac{n_S(k)}{k} = \frac{1}{1 - \gamma(S)} \quad \text{a.e.}$$

Proof. It follows from the definition of $n_S$ that

$$n_S(k) = k + \sum_{j=1}^{n_S(k)} I_S(\tau^j, s_j).$$

Since $\gamma(\partial S) = 0$, by Theorem 4.1.16(ii) we have

$$1 = \lim_{k \to \infty} \frac{k}{n_S(k)} + \lim_{k \to \infty} \frac{1}{n_S(k)} \sum_{j=1}^{n_S(k)} I_S(\tau^j, s_j) = \lim_{k \to \infty} \frac{k}{n_S(k)} + \gamma(S) \quad \text{a.e.},$$

whence the result stated. □

Remark. Theorem 4.2.8 implies that

$$\lim_{k \to \infty} \frac{n_S(k)}{k} \le \frac{\log 2}{\log G} = 1.4404\cdots \quad \text{a.e.},$$

the upper bound being attained if and only if $S$ is maximal. In words: sparsest sequences of S-convergents are given by maximal singularization areas. As the singularization area $S^*_A$ which yields the NICF is maximal, we have thus re-proved a theorem of Adams (1979), see also Jager (1982) and Nakada (1981). □

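The following corollary gives the S-expansion analogues of two classical results of P. Lévy in Subsection 4.1.3.

Corollary 4.2.9 Let $S$ be a singularization area and let $(p^e_k/q^e_k)_{k \in \mathbb{N}_+}$ be the corresponding sequence of S-convergents. Then

$$\lim_{k \to \infty} \frac{1}{k} \log q^e_k = \frac{1}{1 - \gamma(S)} \frac{\pi^2}{12 \log 2} \quad \text{a.e.},$$

$$\lim_{k \to \infty} \frac{1}{k} \log \left| \omega - \frac{p^e_k}{q^e_k} \right| = \frac{1}{1 - \gamma(S)} \cdot \frac{-\pi^2}{6 \log 2} \quad \text{a.e.}$$

Proof. This is an immediate consequence of Theorems 4.1.26 and 4.2.8. We have

$$\lim_{k \to \infty} \frac{1}{k} \log q^e_k = \lim_{k \to \infty} \frac{n_S(k)}{k} \cdot \frac{1}{n_S(k)} \log q_{n_S(k)} = \frac{1}{1 - \gamma(S)} \frac{\pi^2}{12 \log 2} \quad \text{a.e.},$$

and similarly for the second equation. □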

By the mechanism of singularization the collection of RCF convergents that are deleted to obtain the S-convergents has the same cardinality as the set of the $e_\ell$, $\ell \in \mathbb{N}_+$, which are equal to $-1$. It is easy to see that

$$n_S(k) - k = \frac{1}{2} \left( k - \sum_{\ell=1}^{k} e_\ell \right).$$

Therefore we can state the following result.

Corollary 4.2.10 We have

$$\lim_{k \to \infty} \frac{1}{k} \sum_{\ell=1}^{k} e_\ell = \frac{1 - 3\gamma(S)}{1 - \gamma(S)} \quad \text{a.e.}$$

The minimum of the limit above is attained if and only if $S$ is maximal, and is equal to

$$\frac{1}{\log G} \log \frac{G^3}{4} = 0.11915\cdots.$$
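All the almost sure limits of Theorem 4.2.8 and Corollaries 4.2.9 and 4.2.10 depend on $S$ only through $\gamma(S)$, so they can be tabulated by a few lines of code. The sketch below is merely a convenience of ours (the function name and the dictionary keys are hypothetical). It evaluates the limits for a maximal singularization area, for which $\gamma(S) = 1 - \log G/\log 2$.

```python
from math import log, pi, sqrt

def s_expansion_constants(gamma_S):
    """Almost sure limits of Theorem 4.2.8 and Corollaries 4.2.9, 4.2.10."""
    speed = 1 / (1 - gamma_S)          # limit of n_S(k)/k
    return {
        "n_S(k)/k": speed,
        "(1/k) log q_k^e": speed * pi ** 2 / (12 * log(2)),
        "(1/k) log |omega - p_k^e/q_k^e|": -speed * pi ** 2 / (6 * log(2)),
        "mean of e_l": (1 - 3 * gamma_S) / (1 - gamma_S),
    }

G = (sqrt(5) + 1) / 2
maximal = s_expansion_constants(1 - log(G) / log(2))
rcf = s_expansion_constants(0.0)       # no singularization at all
```

With `gamma_S = 0` no digit is singularized, and the first two limits for $q^e_k$ and for the approximation error reduce to the classical constants $\pi^2/(12 \log 2)$ and $-\pi^2/(6 \log 2)$.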

We conclude this subsection by giving the S-expansion analogue of Legendre’s theorem (see Corollary 1.2.4).

Theorem 4.2.11 Let

$$A(t) = \{(x, y) \in I^2 : x/(xy + 1) < t,\ y \in \mathbb{Q}\}, \quad 0 < t \le 1,$$

and define

$$c_S = \sup\{t \in (0, 1] : A(t) \cap S = \emptyset\}.$$

Put

$$L_S = \min(c_S, 1/2).$$

Let $\omega \in \Omega$ and $p, q \in \mathbb{N}_+$ with g.c.d.$(p, q) = 1$, $p < q$. If

$$\Theta = \Theta(\omega, p/q) = q^2 \left| \omega - \frac{p}{q} \right| < L_S,$$

then $p/q$ is an S-convergent of $\omega$. The constant $L_S$ is best possible.

Proof. Suppose that $\Theta(\omega, p/q) < L_S$ and that $p/q$ is not an S-convergent of $\omega$. Since $L_S \le 1/2$, $p/q$ is an RCF convergent of $\omega$ by Corollary 1.2.4, i.e., there exists $n \in \mathbb{N}_+$ such that $p/q = p_n/q_n$. Now, since $p_n/q_n$ is not an S-convergent, by the very definition of an S-expansion we have $(\tau^n, s_n) \in S$. The definition of $L_S$ then implies

$$\frac{\tau^n}{s_n \tau^n + 1} \ge c_S \ge L_S,$$

which by the definition of the approximation coefficients in Subsection 1.3.2 yields

$$\Theta(\omega, p/q) = \Theta_n \ge L_S,$$

contrary to the hypothesis. Finally, it follows from the definition of $L_S$ and Corollary 1.2.4 that $L_S$ is best possible. □

Remarks. 1. Rieger (1979) and Adams (1979) gave a proof of Corollary 4.2.10 for the special case of the NICF expansion, using a formula of Spence and Abel for the dilogarithm. We see that these transcendent techniques can be avoided, which was also observed by Jager (1982).

2. An easy calculation shows that for $S = S^*_A$ (the singularization area yielding the NICF expansion) we have

$$L_S = g^2 = 0.38196\cdots.$$

This value was also found by Ito (1987) and by Jager and Kraaikamp (1989). Their methods are different. Ito (op. cit.) developed a theory for determining the Legendre constants for a class of continued fractions, larger than the class of S-expansions. Unfortunately, his method is rather complicated.

4.2.5 Ergodic properties of S-expansions

In this subsection we show that for any S-expansion there exists an ‘underlying’ two-dimensional ergodic dynamical system. These systems will be obtained via an induced transformation from $(I^2, \mathcal{B}^2_I, \tau, \gamma)$, the two-dimensional ergodic dynamical system underlying the RCF expansion. Using the ergodic dynamical systems thus obtained we will then deduce more metric and arithmetic properties of S-expansions.

Let $S$ be a singularization area and let $x = [a_0; a_1, a_2, \cdots] = a_0 + [a_1, a_2, \cdots]$, $a_0 \in \mathbb{Z}$, $[a_1, a_2, \cdots] \in \Omega$. Denote by

$$[a_0; e_1/a_1, e_2/a_2, \cdots]$$

the S-expansion of $x$ (cf. Subsection 4.2.3). Recall that this is an SRCF expansion satisfying $e_{n+1} + a_n \ge 1$, $n \in \mathbb{N}_+$.

As before let

$$\tau^n = [a_{n+1}, a_{n+2}, \cdots], \quad n \in \mathbb{N},$$

$$s_n = [a_n, \cdots, a_1], \quad n \in \mathbb{N}_+, \qquad s_0 = 0,$$

and put

$$t^e_n = [e_{n+1}/a_{n+1}, \cdots], \quad n \in \mathbb{N},$$

$$s^e_n = \begin{cases} 0 & \text{if } n = 0, \\ 1/a_1 & \text{if } n = 1, \\ [1/a_n, e_n/a_{n-1}, \cdots, e_2/a_1] & \text{if } n > 1. \end{cases}$$

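By equations (1.2.2) and (4.2.4) we have

$$s_n = q_{n-1}/q_n, \qquad s^e_n = q^e_{n-1}/q^e_n, \quad n \in \mathbb{N},$$

where $(p_n/q_n)_{n \in \mathbb{N}}$ and $(p^e_n/q^e_n)_{n \in \mathbb{N}}$ are the sequences of RCF convergents and S-convergents of $x$, respectively. Also,

$$x = \frac{p_n + \tau^n p_{n-1}}{q_n + \tau^n q_{n-1}}, \qquad x = \frac{p^e_k + t^e_k p^e_{k-1}}{q^e_k + t^e_k q^e_{k-1}} \tag{4.2.11}$$

for any $k, n \in \mathbb{N}$, with $p_{-1} = p^e_{-1} = 1$, and $q_{-1} = q^e_{-1} = 0$. Finally, put

$$\Delta := I^2 \setminus S, \qquad \Delta^- = \tau(S), \qquad \Delta^+ = \Delta \setminus \Delta^-.$$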

Theorem 4.2.12 For any $n \in \mathbb{N}_+$ the following assertions hold:

(i) $(\tau^n, s_n) \in S$ if and only if $p_n/q_n$ is not an S-convergent;

(ii) if $p_n/q_n$ is not an S-convergent, then both $p_{n-1}/q_{n-1}$ and $p_{n+1}/q_{n+1}$ are S-convergents;

(iii) $(\tau^n, s_n) \in \Delta^+$ is equivalent to the existence of $k = k(n) \in \mathbb{N}$ such that

$$p^e_{k-1} = p_{n-1}, \quad p^e_k = p_n, \qquad q^e_{k-1} = q_{n-1}, \quad q^e_k = q_n,$$

and

$$t^e_k = \tau^n \ (\Rightarrow e_{k+1} = +1), \qquad s^e_k = s_n;$$

(iv) $(\tau^n, s_n) \in \Delta^-$ is equivalent to the existence of $k = k(n) \in \mathbb{N}$ such that

$$p^e_{k-1} = p_{n-2}, \quad p^e_k = p_n, \qquad q^e_{k-1} = q_{n-2}, \quad q^e_k = q_n,$$

and

$$t^e_k = -\tau^n/(\tau^n + 1) \ (\Rightarrow e_{k+1} = -1), \qquad s^e_k = 1 - s_n.$$

Proof. (i) This follows directly from Definition 4.2.5 and Proposition 4.2.4.

(ii) This follows from the fact that in the sequence of RCF convergents we cannot remove two or more consecutive convergents and still have a sequence of convergents of some SRCF.

(iii) If $(\tau^n, s_n) \in \Delta^+$ then the very definition of $\Delta^+$ implies that

$$(\tau^{n-1}, s_{n-1}) \notin S \quad \text{and} \quad (\tau^n, s_n) \notin S.$$

Hence neither $a_n$ nor $a_{n+1}$ is singularized and therefore both $p_{n-1}/q_{n-1}$ and $p_n/q_n$ are S-convergents. But then there exists $k \in \mathbb{N}_+$ such that

$$\frac{p^e_{k-1}}{q^e_{k-1}} = \frac{p_{n-1}}{q_{n-1}}, \qquad \frac{p^e_k}{q^e_k} = \frac{p_n}{q_n}.$$

Since all the fractions are in their lowest terms and their denominators are positive we should have

$$p^e_{k-1} = p_{n-1}, \quad p^e_k = p_n, \qquad q^e_{k-1} = q_{n-1}, \quad q^e_k = q_n.$$

Then (4.2.11) implies that

$$\frac{p_n + \tau^n p_{n-1}}{q_n + \tau^n q_{n-1}} = \frac{p_n + t^e_k p_{n-1}}{q_n + t^e_k q_{n-1}},$$

hence $t^e_k = \tau^n$. Finally, we have

$$s^e_k = \frac{q^e_{k-1}}{q^e_k} = \frac{q_{n-1}}{q_n} = s_n.$$

The converse is obvious.

(iv) If $(\tau^n, s_n) \in \Delta^-$ then the very definition of $\Delta^-$ implies that

$$(\tau^{n-1}, s_{n-1}) \in S \quad \text{and} \quad (\tau^n, s_n) \notin S.$$

Hence $a_n = 1$, and it should be singularized according to Definition 4.2.5. Then $p_{n-2}/q_{n-2}$ and $p_n/q_n$ are consecutive S-convergents by (ii). Again, there exists $k \in \mathbb{N}_+$ such that

$$p^e_{k-1} = p_{n-2}, \quad p^e_k = p_n, \qquad q^e_{k-1} = q_{n-2}, \quad q^e_k = q_n.$$

Since

$$p_n = a_n p_{n-1} + p_{n-2} = p_{n-1} + p_{n-2}, \qquad q_n = a_n q_{n-1} + q_{n-2} = q_{n-1} + q_{n-2} \tag{4.2.12}$$

we have

$$s^e_k = \frac{q_{n-2}}{q_n} = \frac{q_n - q_{n-1}}{q_n} = 1 - s_n.$$

Next, from (4.2.11) we have

$$\frac{p_n + \tau^n p_{n-1}}{q_n + \tau^n q_{n-1}} = \frac{p_n + t^e_k p_{n-2}}{q_n + t^e_k q_{n-2}},$$

and using equations (4.2.12) and (1.1.12) we obtain

$$t^e_k + t^e_k \tau^n + \tau^n = 0,$$

whence

$$t^e_k = -\frac{\tau^n}{\tau^n + 1}.$$

The converse is obvious. □

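Now, define the transformation $\tau_\Delta : \Delta \to \Delta$ as

$$\tau_\Delta(x, y) = \begin{cases} \tau(x, y) & \text{if } \tau(x, y) \notin S, \\ \tau^2(x, y) & \text{if } \tau(x, y) \in S \end{cases}$$

for any $(x, y) \in \Delta = I^2 \setminus S$. This is a very simple instance of an induced transformation. Cf., e.g., Petersen (1983, Sections 2.3 and 2.4). According to the general theory, it follows that $(\Delta, \mathcal{B}_\Delta, \tau_\Delta, \gamma_\Delta)$ is an ergodic dynamical system. Here $\gamma_\Delta$ is the probability measure on $\mathcal{B}_\Delta$ with density

$$\frac{1}{\gamma(\Delta) \log 2} \frac{1}{(xy + 1)^2}, \quad (x, y) \in \Delta.$$

Next, Theorem 4.2.12 leads us naturally to consider the map $M : \Delta \to \mathbb{R}^2$ defined by

$$M(x, y) = \begin{cases} (x, y), & (x, y) \in \Delta^+, \\ (-x/(x + 1), 1 - y), & (x, y) \in \Delta^-. \end{cases}$$

Set $A_S = M(\Delta)$. Clearly, $A_S$ consists of $\Delta^+ = I^2 \setminus (S \cup \tau(S))$ and the image $M(\tau(S))$ of $\Delta^- = \tau(S)$ under $M$, which lies in the second quadrant of the plane. Also, $M : \Delta \to A_S$ is one-to-one. We can then define the transformation $\tau_S : A_S \to A_S$ as $\tau_S = M \tau_\Delta M^{-1}$, and Theorem 4.2.12 implies that

$$(t^e_{k+1}, s^e_{k+1}) = \tau_S(t^e_k, s^e_k), \quad k \in \mathbb{N}. \tag{4.2.13}$$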


It is immediate that the determinant of the Jacobian $J$ of $M|_{\Delta^-}$ is equal to $1/(x + 1)^2 > 0$. For $(x, y) \in \Delta^-$ we have

$$|J|^{-1} \frac{1}{(xy + 1)^2} = \left( \frac{x + 1}{xy + 1} \right)^2 = \frac{1}{(st + 1)^2},$$

where $t = -x/(x + 1)$ and $s = 1 - y$. This shows that

$$\frac{1}{\log 2} \iint_{M(\Delta^-)} \frac{ds\,dt}{(st + 1)^2} = \frac{1}{\log 2} \iint_{\Delta^-} |J|\,|J|^{-1} \frac{dx\,dy}{(xy + 1)^2} = \gamma(\tau(S)) = \gamma(S). \tag{4.2.14}$$

Note also that

$$\gamma(\Delta^+) = 1 - \gamma(S) - \gamma(\tau(S)) = 1 - 2\gamma(S). \tag{4.2.15}$$

Theorem 4.2.13 Let $\rho$ be the probability measure on $\mathcal{B}_{A_S}$ with density

$$\frac{1}{(1 - \gamma(S)) \log 2} \frac{1}{(xy + 1)^2}, \quad (x, y) \in A_S.$$

Then $(A_S, \mathcal{B}_{A_S}, \tau_S, \rho)$ is an ergodic dynamical system which underlies the corresponding S-expansion.

Proof. The conclusion follows on account of equations (4.2.13) through (4.2.15) noting that the dynamical systems $(\Delta, \mathcal{B}_\Delta, \tau_\Delta, \gamma_\Delta)$ and $(A_S, \mathcal{B}_{A_S}, \tau_S, \rho)$ are isomorphic by the very definition of the latter. See Remark 1 following Proposition 4.0.5 and Petersen (1983, Sections 1.3 and 2.3). □

Remark. The entropy of the maps $\tau_\Delta$ and $\tau_S$ can be easily obtained using Abramov’s formula [see e.g. Petersen (1983, p. 257)]. Since $H(\tau) = \pi^2/(6 \log 2)$ (see Remark following Corollary 4.1.28), we have

$$H(\tau_\Delta) = \frac{H(\tau)}{\gamma(\Delta)} = \frac{1}{1 - \gamma(S)} \frac{\pi^2}{6 \log 2} = H(\tau_S),$$

which shows that entropy is maximal $\bigl(\pi^2/(6 \log G)\bigr)$ for maximal singularization areas. □

At first sight the dynamical system $(A_S, \mathcal{B}_{A_S}, \tau_S, \rho)$ looks very intricate. However, it is quite helpful. We have the following result.


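Theorem 4.2.14 Let the map $f : A_S \to \mathbb{R} \cup \{\infty\}$ be defined by

$$f(x, y) = |x^{-1}| - \tau_S^{(1)}(x, y), \quad (x, y) \in A_S,$$

where $\tau_S^{(1)}(x, y)$ is the first coordinate of $\tau_S(x, y)$. Let $a : [0, 1) \to \mathbb{N}_+ \cup \{\infty\}$ be defined as in Chapter 1, that is,

$$a(t) = \begin{cases} \lfloor t^{-1} \rfloor & \text{if } t \in (0, 1), \\ \infty & \text{if } t = 0. \end{cases}$$

We have

$$f(x, y) = \begin{cases} a(x) & \text{if } \operatorname{sgn} x = 1 \text{ and } \tau(x, y) \notin S, \\ a(x) + 1 & \text{if } \operatorname{sgn} x = 1 \text{ and } \tau(x, y) \in S, \\ a(-x/(x + 1)) + 1 & \text{if } \operatorname{sgn} x = -1 \text{ and } \tau(M^{-1}(x, y)) \notin S, \\ a(-x/(x + 1)) + 2 & \text{if } \operatorname{sgn} x = -1 \text{ and } \tau(M^{-1}(x, y)) \in S \end{cases}$$

and

$$\tau_S(x, y) = \bigl( |x^{-1}| - f(x, y),\ (f(x, y) + y \operatorname{sgn} x)^{-1} \bigr), \quad (x, y) \in A_S.$$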

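Proof. We should distinguish four cases, of which only two will be considered here. The other two cases can be treated similarly. Cf. Kraaikamp (1991, p. 26).

1. Let $(x, y) \in \Delta^+$ and $\tau(x, y) \in S$. Then $\operatorname{sgn} x = 1$ and

$$\tau_\Delta(M^{-1}(x, y)) = \tau^2(x, y) = \tau\left( \frac{1}{x} - a(x), \frac{1}{a(x) + y} \right) = \left( \frac{1}{x^{-1} - a(x)} - 1, \frac{1}{1 + 1/(a(x) + y)} \right) = \left( \frac{x - 1 + x a(x)}{1 - x a(x)}, \frac{a(x) + y}{a(x) + y + 1} \right) \in \Delta^-.$$

Therefore

$$\tau_S(x, y) = M(\tau_\Delta(M^{-1}(x, y))) = \left( \frac{-\dfrac{x - 1 + x a(x)}{1 - x a(x)}}{1 + \dfrac{x - 1 + x a(x)}{1 - x a(x)}},\ 1 - \frac{a(x) + y}{a(x) + y + 1} \right) = \left( \frac{1}{x} - (a(x) + 1),\ \frac{1}{a(x) + y + 1} \right).$$

Thus we see that

$$\tau_S(x, y) = \bigl( |x^{-1}| - f(x, y),\ (f(x, y) + y \operatorname{sgn} x)^{-1} \bigr),$$

where $f(x, y) = a(x) + 1$.

2. Let $(x, y) \in M(\Delta^-)$ and $\tau(M^{-1}(x, y)) \notin S$. Then $\operatorname{sgn} x = -1$ and we have

$$\tau_S(x, y) = M \tau M^{-1}(x, y) = \tau M^{-1}(x, y) = \tau\left( -\frac{x}{x + 1}, 1 - y \right) = \left( -\frac{1}{x/(x + 1)} - a\left( -\frac{x}{x + 1} \right),\ \frac{1}{a(-x/(x + 1)) + 1 - y} \right) = \left( -\frac{1}{x} - 1 - a\left( -\frac{x}{x + 1} \right),\ \frac{1}{a(-x/(x + 1)) + 1 + y \operatorname{sgn} x} \right).$$

Thus we see that

$$\tau_S(x, y) = \bigl( |x^{-1}| - f(x, y),\ (f(x, y) + y \operatorname{sgn} x)^{-1} \bigr),$$

where $f(x, y) = a(-x/(x + 1)) + 1$. □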

Corollary 4.2.15 We have

(i) $f(x, y) \in \mathbb{N}_+$ for $(x, y) \in A_S$, $x \ne 0$;

(ii) $a_{k+1} = f(t^e_k, s^e_k)$, $k \in \mathbb{N}$, with $(t^e_0, s^e_0) = (x - a_0, 0)$.
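To see Theorem 4.2.14 and Corollary 4.2.15 at work, one may take $S = S_A$ and iterate $\tau_S$ in floating point. The sketch below is an illustration under assumptions of ours: boundary points of $S_A$ and the point $x = 0$ are ignored, all function names are hypothetical, and the starting point is $(x - 1, 0)$ because the expansion of Example 4.2.1 obtained by algorithm A has integer part 1.

```python
from math import floor, sqrt

g = (sqrt(5) - 1) / 2

def in_S(x, y):
    # S_A = [1/2, 1) x [0, g]; boundary effects are ignored in this float sketch
    return 0.5 <= x < 1 and 0 <= y <= g

def tau(x, y):
    # the two-dimensional RCF map on I^2, for x in (0, 1]
    a = floor(1 / x)
    return 1 / x - a, 1 / (a + y)

def M_inv(x, y):
    # M is the identity where x > 0; where x < 0 it inverts t = -x/(x+1), s = 1 - y
    return (x, y) if x > 0 else (-x / (x + 1), 1 - y)

def f(x, y):
    # the four cases of Theorem 4.2.14
    u, v = M_inv(x, y)
    extra = 1 if in_S(*tau(u, v)) else 0
    base = floor(1 / x) if x > 0 else floor(1 / u) + 1
    return base + extra

def tau_S(x, y):
    sgn = 1 if x > 0 else -1
    d = f(x, y)
    return abs(1 / x) - d, 1 / (d + y * sgn)

def digits(t, n):
    """First n pairs (e_{k+1}, a_{k+1}) = (sgn t_k, f(t_k, s_k)), starting from (t, 0)."""
    s, out = 0.0, []
    for _ in range(n):
        out.append((1 if t > 0 else -1, f(t, s)))
        t, s = tau_S(t, s)
    return out

x = (-3 + sqrt(17)) / 2
nicf = digits(x - 1, 6)
```

On this input `nicf` reproduces the period $(-1, 2), (1, 4)$ of Example 4.2.1. For $S = S_A$ one also finds $f(x, y) = \lfloor |x^{-1}| + 1/2 \rfloor$ at the sample points tested, so that the first coordinate of $\tau_S$ follows the nearest integer rule.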


Let $A^i_S$, $i = 1, 2$, be the projections of $A_S$ onto the two axes and let $\lambda_{A^i_S}$ denote the probability measure defined by

$$\lambda_{A^i_S}(A) = \frac{\lambda(A \cap A^i_S)}{\lambda(A^i_S)}, \quad A \in \mathcal{B}_{A^i_S},\ i = 1, 2.$$

Proposition 4.2.16 Let $\mu \in \mathrm{pr}(\mathcal{B}_{A^1_S})$ such that $\mu \ll \lambda_{A^1_S}$. For any $B \in \mathcal{B}_{A_S}$ such that $\lambda_{A^1_S} \otimes \lambda_{A^2_S}(\partial B) = 0$ we have

$$\lim_{n \to \infty} \mu\bigl( (t^e_n, s^e_n) \in B \bigr) = \rho(B),$$

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} I_B(t^e_k, s^e_k) = \rho(B) \quad \text{a.e. in } A^1_S.$$

Proof. This is the result corresponding to Theorem 4.1.16 and Corollary 4.1.17 for the ergodic dynamical system $(A_S, \mathcal{B}_{A_S}, \tau_S, \rho)$. It is easy to see that the proof of Theorem 4.1.16 for the case of the ergodic dynamical system $(I^2, \mathcal{B}^2_I, \tau, \gamma)$ carries over to the present case. □

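Corollary 4.2.17 Consider the approximation coefficients

$$\Theta^e_n = (q^e_n)^2 \left| a^e_0 + t^e_0 - p^e_n/q^e_n \right|, \quad n \in \mathbb{N}.$$

For any $\mu \in \mathrm{pr}(\mathcal{B}_{A^1_S})$ such that $\mu \ll \lambda_{A^1_S}$ and any $(t_1, t_2) \in I^2$ we have

$$\lim_{n \to \infty} \mu\bigl( \Theta^e_{n-1} \le t_1,\ \Theta^e_n \le t_2 \bigr) = \rho(B),$$

$$\lim_{n \to \infty} \frac{1}{n} \operatorname{card}\{k : \Theta^e_k \le t_1,\ \Theta^e_{k+1} \le t_2,\ 0 \le k \le n - 1\} = \rho(B) \quad \text{a.e. in } A^1_S,$$

where

$$B = \left\{ (x, y) \in A_S ;\ \frac{y}{xy + 1} \le t_1,\ \frac{|x|}{xy + 1} \le t_2 \right\}.$$

Proof. The results stated follow from Proposition 4.2.16 on account of equations (4.2.5) and (4.2.6). □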


4.3 Examples of S-expansions

4.3.1 Nakada’s α-expansions

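Let $I_\alpha = [\alpha - 1, \alpha]$, $\alpha \in \mathbb{R}$, so that $I_1 = I$. In this subsection we will consider transformations $N_\alpha : I_\alpha \to I_\alpha$ defined by

\[ N_\alpha(x) = \begin{cases} |x^{-1}| - \lfloor |x^{-1}| + 1 - \alpha \rfloor & \text{if } x \neq 0, \\ 0 & \text{if } x = 0 \end{cases} \]

for $x \in I_\alpha$, with $\alpha \in [1/2, 1]$. Any irrational number $x \in I_\alpha$ has an infinite SRCF expansion, called its α-expansion, of the form

\[ \cfrac{e_1}{b_1 + \cfrac{e_2}{b_2 + \ddots}} := [\, e_1/b_1, e_2/b_2, \cdots ], \]

where

\[ (e_n, b_n) = (e_n(x), b_n(x)) = \left( e_1(N^{n-1}_\alpha(x)),\ b_1(N^{n-1}_\alpha(x)) \right), \quad n \in \mathbb{N}_+, \]

with

\[ (e_1(x), b_1(x)) = \left( \mathrm{sgn}\,x,\ \lfloor |x^{-1}| + 1 - \alpha \rfloor \right), \quad x \in I_\alpha. \]

Here $N^n_\alpha$ denotes the composition of $N_\alpha$ with itself $n$ times while $N^0_\alpha$ is the identity map.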
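The definition above can be turned directly into a digit generator. The following is a minimal sketch, assuming exact rational inputs (so that the expansion terminates); the function names `nakada_step`, `alpha_expansion` and `evaluate` are illustrative and do not come from the text.

```python
from fractions import Fraction
from math import floor

def nakada_step(x, alpha):
    """One application of N_alpha to x != 0: returns (e_1(x), b_1(x), N_alpha(x))."""
    e = 1 if x > 0 else -1
    inv = 1 / abs(x)
    b = floor(inv + 1 - alpha)
    return e, b, inv - b

def alpha_expansion(x, alpha, max_digits=50):
    """Digits (e_n, b_n) of a rational x in I_alpha; stops when the orbit reaches 0."""
    digits = []
    while x != 0 and len(digits) < max_digits:
        e, b, x = nakada_step(x, alpha)
        digits.append((e, b))
    return digits

def evaluate(digits):
    """Value of the finite continued fraction [e_1/b_1, ..., e_n/b_n]."""
    value = Fraction(0)
    for e, b in reversed(digits):
        value = e / (b + value)
    return value
```

For instance, with $\alpha = 1/2$ (the NICF case) the input `Fraction(2, 5)` yields the digits `(1, 3), (-1, 2)`, while $\alpha = 1$ gives the regular digits `(1, 2), (1, 2)`; both evaluate back to 2/5.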

The theory of α-expansions can be developed by parallelling that of the RCF expansion. This has been done by Nakada (1981), Nakada et al. (1977), Bosma et al. (1983), and Popescu (2000). Originally, these expansions were defined by McKinney (1907).

Our approach here consists in putting any α-expansion in the framework of the S-expansion theory by giving a suitable singularization area $S_\alpha$, $\alpha \in [1/2, 1]$. This will allow us to retrieve results derived by Nakada and co-workers (op. cit.) using different methods.

We should distinguish two cases: (i) $\alpha \in [1/2, g]$ and (ii) $\alpha \in (g, 1]$.

Case (i). Before giving the singularization areas $S_\alpha$, $\alpha \in [1/2, g]$, we first return to the special case $\alpha = 1/2$, which yields the NICF expansion. Recall that the NICF expansion of an irrational number can be obtained from its RCF expansion by applying algorithm A from Subsection 4.2.1 to the latter. We noticed in Subsection 4.2.4 that this is equivalent to

singularize $a_{\ell+1} = 1$ if and only if $(\tau^\ell, s_\ell) \in S_A$, $\ell \in \mathbb{N}$,

where $S_A = [1/2, 1) \times [0, g]$.

For $\alpha \in (1/2, g]$, notice that

\[ \tau([1/2, \alpha) \times [0, g]) = ((1-\alpha)/\alpha, 1] \times [g, 1]. \]

In particular, for $\alpha = g$ we have

\[ \begin{aligned} &(S_A \setminus ([1/2, \alpha) \times [0, g])) \cup (((1-\alpha)/\alpha, 1] \times [g, 1]) \\ &\quad = (S_A \setminus ([1/2, g) \times [0, g])) \cup ((g, 1] \times [g, 1]) \\ &\quad = ([g, 1) \times [0, g]) \cup ((g, 1] \times [g, 1]), \end{aligned} \]

which differs only slightly from the singularization area $S^*_B$ of Hurwitz's SCF expansion, which coincides with the $g$-expansion. See Remark 2 following Definition 4.2.5. It therefore seems natural to try as singularization areas $S_\alpha$ for $\alpha \in [1/2, g]$ the sets

\[ S_\alpha = ([\alpha, g) \times [0, g)) \cup ([g, (1-\alpha)/\alpha] \times [0, g]) \cup (((1-\alpha)/\alpha, 1] \times I). \tag{4.3.1} \]

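Hence

\[ \tau(S_\alpha) = ([0, (2\alpha-1)/(1-\alpha)) \times [1/2, 1]) \cup ([(2\alpha-1)/(1-\alpha), g] \times [g, 1]) \cup ((g, (1-\alpha)/\alpha] \times (g, 1]). \]

It is easy to check that $S_\alpha$ is indeed a singularization area: obviously, $\gamma(\partial S_\alpha) = 0$, $S_\alpha \subset [1/2, 1] \times I$, and clearly $S_\alpha \cap \tau(S_\alpha) = \{(g, g)\}$. Also,

\[ \gamma(S_\alpha) = 1 - \frac{\log G}{\log 2}, \]

hence $S_\alpha$ is maximal for any $\alpha \in [1/2, g]$.

Notice that with $M$ defined as in Subsection 4.2.5 we have

\[ M(\tau(S_\alpha)) = ([\alpha-1, g-1) \times [0, 1-g)) \cup ([g-1, (1-2\alpha)/\alpha] \times [0, 1-g]) \cup (((1-2\alpha)/\alpha, 0] \times [0, 1/2]). \]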


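Writing $A_\alpha$ for $A_{S_\alpha}$ (see again the general case in Subsection 4.2.5) we take

\[ \begin{aligned} A_\alpha &= \left( I^2 \setminus (S_\alpha \cup \tau(S_\alpha)) \right) \cup \left( M(\tau(S_\alpha)) \setminus (\{0\} \times [0, 1/2]) \right) \\ &= ([\alpha-1, g-1) \times [0, 1-g)) \cup ([g-1, (1-2\alpha)/\alpha] \times [0, 1-g]) \\ &\quad \cup (((1-2\alpha)/\alpha, 0) \times [0, 1/2]) \cup ([0, (2\alpha-1)/(1-\alpha)] \times [0, 1/2)) \\ &\quad \cup (((2\alpha-1)/(1-\alpha), \alpha) \times [0, g)). \end{aligned} \]

If we denote by $f_\alpha : A_\alpha \to \mathbb{R} \cup \{\infty\}$ the function corresponding to the function $f$ in Theorem 4.2.14, then it is easy to see that actually $f_\alpha$ maps $A_\alpha$ into $\mathbb{N}_+$ and that

\[ |x^{-1}| - f_\alpha(x, y) \in [\alpha-1, \alpha), \quad x \in [\alpha-1, \alpha) \setminus \{0\}. \]

Since there exists only one $n \in \mathbb{N}_+$ such that

\[ |x^{-1}| - n \in [\alpha-1, \alpha), \]

we deduce that $f_\alpha(x, y)$ does not depend on $y$ and that we should have

\[ f_\alpha(x, y) = \lfloor |x^{-1}| + 1 - \alpha \rfloor, \quad (x, y) \in A_\alpha,\ x \neq 0. \]

Hence $x \mapsto |x^{-1}| - f_\alpha(x, y)$ is Nakada's transformation $N_\alpha$. On account of Theorem 4.2.14 we can therefore state the main result for the case $\alpha \in [1/2, g]$.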

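Theorem 4.3.1 [Nakada (1981)] Let $\frac{1}{2} \le \alpha \le g$. Consider the probability measure $\gamma_\alpha$ on $\mathcal{B}_{A_\alpha}$ with density

\[ \frac{1}{\log G}\, \frac{1}{(xy+1)^2}, \quad (x, y) \in A_\alpha, \]

and the transformation $N_\alpha : A_\alpha \to A_\alpha$ defined by

\[ N_\alpha(x, y) = \left( |x^{-1}| - \lfloor |x^{-1}| + 1 - \alpha \rfloor,\ \left( \lfloor |x^{-1}| + 1 - \alpha \rfloor + y\,\mathrm{sgn}\,x \right)^{-1} \right), \]

where $(x, y) \in A_\alpha$. Then $(A_\alpha, \mathcal{B}_{A_\alpha}, N_\alpha, \gamma_\alpha)$ is an ergodic dynamical system underlying the corresponding α-expansion.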

Taking the projection onto the first axis, we deduce from Theorem 4.3.1 the following result.


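Corollary 4.3.2 Let $\frac{1}{2} \le \alpha \le g$. Consider the probability measure $\mu_\alpha$ on $\mathcal{B}_{I_\alpha}$ with density

\[ \frac{1}{\log G} \times \begin{cases} 1/(x + G + 1) & \text{if } x \in [\alpha-1, (1-2\alpha)/\alpha], \\ 1/(x + 2) & \text{if } x \in ((1-2\alpha)/\alpha, (2\alpha-1)/(1-\alpha)), \\ 1/(x + G) & \text{if } x \in [(2\alpha-1)/(1-\alpha), \alpha]. \end{cases} \]

Then $(I_\alpha, \mathcal{B}_{I_\alpha}, N_\alpha, \mu_\alpha)$ is an ergodic dynamical system.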

Remark. For $\alpha = 1/2$ we obtain the NICF expansion, and the corresponding result has been derived independently by Rieger (1979) and Rockett (1980). □

Figure 4.2: $S_\alpha$ for $\frac{1}{2} \le \alpha \le g$

From Figure 4.2 it is clear that the vertices $(\alpha, g)$ and $((1-\alpha)/\alpha, 1)$ of $S_\alpha$ determine the value of the Legendre constant $L_\alpha := L_{S_\alpha}$. See Theorem 4.2.11. More precisely, we have the following result.

Theorem 4.3.3 Let $\frac{1}{2} \le \alpha \le g$. Then

\[ L_\alpha = \min\left( \alpha/(1 + \alpha g),\ 1 - \alpha \right). \]

Remark. Notice that for the values of $\alpha \in [1/2, g]$ under consideration we have

\[ \tau([1/2, \alpha) \times [0, g)) \subset S_\alpha. \]

It follows at once from this and (4.3.1) that $B_{S_\alpha} \doteq \emptyset$, which is consistent with Proposition 4.2.7. □

Case (ii). Let $\alpha \in (g, 1]$. Put

\[ S_\alpha = [\alpha, 1] \times I. \tag{4.3.2} \]

Hence $\tau(S_\alpha) = [0, (1-\alpha)/\alpha] \times [1/2, 1]$, and $S_\alpha \cap \tau(S_\alpha) = \emptyset$ since for $\alpha \in (g, 1]$ we have

\[ (1-\alpha)/\alpha < \alpha. \]

It is then easy to check that $S_\alpha$ is indeed a singularization area. However, a simple calculation shows that

\[ \gamma(S_\alpha) = 1 - \frac{\log(1+\alpha)}{\log 2}, \]

thus for no value of $\alpha$ under consideration here is the singularization area $S_\alpha$ maximal.

Next, with $M$ defined as in Subsection 4.2.5 we have

\[ M(\tau(S_\alpha)) = [\alpha-1, 0] \times [0, 1/2]. \]

Define $A_\alpha$ exactly as in case (i) and denote by $f_\alpha : A_\alpha \to \mathbb{R} \cup \{\infty\}$ the function corresponding to the function $f$ in Theorem 4.2.14. The expression of $A_\alpha$ is now simpler, namely,

\[ A_\alpha = ([\alpha-1, 0) \times [0, 1/2]) \cup ([0, (1-\alpha)/\alpha] \times [0, 1/2)) \cup (((1-\alpha)/\alpha, \alpha) \times I), \]

see Figure 4.3.

Similarly to case (i) we find that $f_\alpha(x, y)$ is independent of $y$ and that in fact we have again

\[ f_\alpha(x, y) = \lfloor |x^{-1}| + 1 - \alpha \rfloor, \quad (x, y) \in A_\alpha,\ x \neq 0. \]

Thus we can state the main result for the case $\alpha \in (g, 1]$.

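Theorem 4.3.4 [Nakada (1981)] Let $g < \alpha \le 1$. Consider the probability measure $\gamma_\alpha$ on $\mathcal{B}_{A_\alpha}$ with density

\[ \frac{1}{\log(1+\alpha)}\, \frac{1}{(xy+1)^2}, \quad (x, y) \in A_\alpha, \]

and the transformation $N_\alpha : A_\alpha \to A_\alpha$ defined as in Theorem 4.3.1. Then $(A_\alpha, \mathcal{B}_{A_\alpha}, N_\alpha, \gamma_\alpha)$ is an ergodic dynamical system.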
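The two values of $\gamma(S_\alpha)$ quoted in cases (i) and (ii) can be checked numerically. The sketch below assumes that $\gamma$ on $I^2$ has density $1/(\log 2\,(xy+1)^2)$, for which the measure of a rectangle has a closed form; the helper names `rect_measure` and `measure_S_alpha` are illustrative only.

```python
from math import log, sqrt

G = (1 + sqrt(5)) / 2   # golden ratio
g = G - 1               # g = 1/G

def rect_measure(a, b, c, d):
    """Measure of [a, b] x [c, d] for the density 1 / (log 2 * (xy + 1)^2)."""
    return log((1 + b * d) * (1 + a * c) / ((1 + b * c) * (1 + a * d))) / log(2)

def measure_S_alpha(alpha):
    """Measure of S_alpha as given in (4.3.1) for alpha <= g and in (4.3.2) otherwise."""
    if alpha <= g:
        r = (1 - alpha) / alpha
        # [alpha, r] x [0, g] together with (r, 1] x [0, 1]; boundaries carry no mass
        return rect_measure(alpha, r, 0, g) + rect_measure(r, 1, 0, 1)
    return rect_measure(alpha, 1, 0, 1)
```

Under these assumptions `measure_S_alpha(alpha)` agrees with $1 - \log G/\log 2$ for every $\alpha \in [1/2, g]$ and with $1 - \log(1+\alpha)/\log 2$ for $\alpha \in (g, 1]$, up to rounding.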


Figure 4.3: $S_\alpha$ for $g \le \alpha \le 1$ (regions shown: $\tau(S_\alpha)$ and $M(\tau(S_\alpha))$)

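Taking again the projection onto the first axis, we deduce from Theorem 4.3.4 the following result.

Corollary 4.3.5 Let $g < \alpha \le 1$. Consider the probability measure $\mu_\alpha$ on $\mathcal{B}_{I_\alpha}$ with density

\[ \frac{1}{\log(1+\alpha)} \times \begin{cases} 1/(x + 2) & \text{if } x \in [\alpha-1, (1-\alpha)/\alpha], \\ 1/(x + 1) & \text{if } x \in ((1-\alpha)/\alpha, \alpha]. \end{cases} \]

Then $(I_\alpha, \mathcal{B}_{I_\alpha}, N_\alpha, \mu_\alpha)$ is an ergodic dynamical system.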
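As a sanity check on the densities in Corollaries 4.3.2 and 4.3.5, one can verify that each integrates to 1 over $I_\alpha$. This is only a normalization check, not a test of invariance; the function name `total_mass` is illustrative.

```python
from math import log, sqrt

G = (1 + sqrt(5)) / 2
g = G - 1

def total_mass(alpha):
    """Integral over I_alpha of the density of mu_alpha (Cor. 4.3.2 if alpha <= g, else Cor. 4.3.5)."""
    if alpha <= g:
        u = (1 - 2 * alpha) / alpha
        v = (2 * alpha - 1) / (1 - alpha)
        pieces = [(alpha - 1, u, G + 1), (u, v, 2.0), (v, alpha, G)]
        norm = log(G)
    else:
        u = (1 - alpha) / alpha
        pieces = [(alpha - 1, u, 2.0), (u, alpha, 1.0)]
        norm = log(1 + alpha)
    # the integral of 1/(x + c) over [lo, hi] equals log((hi + c) / (lo + c))
    return sum(log((hi + c) / (lo + c)) for lo, hi, c in pieces) / norm
```

For every $\alpha \in [1/2, 1]$ the returned value should equal 1 up to rounding.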

We conclude the discussion of case (ii) with some results from Kraaikamp (1991). It is obvious that the vertex $(\alpha, 1)$ of $S_\alpha$ determines the value of the Legendre constant $L_\alpha := L_{S_\alpha}$. As $\min(\alpha/(\alpha+1), 1/2) = \alpha/(\alpha+1)$ in case (ii), we have the following result. See again Theorem 4.2.11.

Theorem 4.3.6 Let $g < \alpha \le 1$. Then

\[ L_\alpha = \frac{\alpha}{\alpha + 1}. \]


Next, it is easy to check that

\[ \tau^{-1}(S_\alpha) \cap ([1/2, 1] \times I) = [1/2, 1/(1+\alpha)] \times I. \]

Since for our values of $\alpha$ we have $(1-\alpha)/\alpha < 1/(1+\alpha)$, we find that the set $B_\alpha := B_{S_\alpha}$ from Proposition 4.2.7 is $(1/(1+\alpha), \alpha) \times I$. Then

\[ \gamma_\alpha(B_\alpha) = 2 - \frac{\log(2+\alpha)}{\log(1+\alpha)}, \]

and we can state the following result.

Theorem 4.3.7 Let $g < \alpha \le 1$. For the α-expansion $[e_1/a_1, e_2/a_2, \cdots] = [e_1/b_1, e_2/b_2, \cdots]$ of irrationals in $I_\alpha$ we have

\[ \lim_{n\to\infty} \frac{1}{n}\, \mathrm{card}\{ k ;\ a_k = 1,\ 1 \le k \le n \} = 2 - \frac{\log(2+\alpha)}{\log(1+\alpha)} \quad \text{a.e.} \]

Remarks. 1. The case $\alpha = 1$ gives the classical result from Proposition 4.1.1.

2. For $\alpha \in [g, 1]$ the limit $2 - \frac{\log(2+\alpha)}{\log(1+\alpha)}$ increases monotonically from 0 to $2 - \frac{\log 3}{\log 2} = 0.4150\cdots$, the asymptotic relative frequency of digit 1 in the RCF expansion. At $\alpha = 0.76292\cdots$ we have already lost half of the original 1's.

3. It follows from Corollary 4.2.10 that for the α-expansion with $\alpha \in (g, 1]$ we have

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} e_k = 3 - \frac{\log 4}{\log(1+\alpha)} \quad \text{a.e.} \]

□
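The value $\alpha = 0.76292\cdots$ in Remark 2 can be recovered numerically by solving for the $\alpha$ at which the frequency of 1's is half its RCF value. The following is a small sketch using plain bisection; the names `freq_of_ones`, `bisect` and `alpha_half` are illustrative.

```python
from math import log

def freq_of_ones(alpha):
    """Asymptotic frequency of the digit 1 in the alpha-expansion (Theorem 4.3.7)."""
    return 2 - log(2 + alpha) / log(1 + alpha)

def bisect(f, lo, hi, tol=1e-12):
    """Root of an increasing function f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

half = freq_of_ones(1.0) / 2   # half of the RCF frequency 2 - log 3 / log 2
alpha_half = bisect(lambda a: freq_of_ones(a) - half, 0.62, 1.0)
```

The lower bracket 0.62 is just a convenient point slightly above $g$, where the frequency is still close to 0.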

We conclude this subsection by giving the analogue of Vahlen's theorem (see Subsection 1.3.2) for α-expansions with $\alpha \in [1/2, 1]$. For the NICF and Hurwitz' SCF expansions this analogue was independently given by Kurosu (1924) and Sendov (1959/60). Kraaikamp (1990) proved the Kurosu–Sendov results by giving a domain in $\mathbb{R}^2$ where the point $(\Theta^e_{n-1}, \Theta^e_n)$ always lies. For the two expansions just mentioned, that is for $\alpha = 1/2$ and $\alpha = g$, we have

\[ \min(\Theta^e_{n-1}, \Theta^e_n) < 2g^3 = 0.4721\cdots, \]

and the constant $2g^3$ is best possible.


However, one might ask whether there are values of $\alpha$ for which still smaller values can be obtained for the corresponding approximation coefficients $\Theta^e_n(\alpha) = \Theta^e_n$, $n \in \mathbb{N}$. Beforehand it is clear that a value smaller than $1/\sqrt{5} = 0.447\cdots$ can never be found, by a classical result of A. Hurwitz [see Perron (1954, p. 49)], according to which for every $\theta < 1/\sqrt{5}$ there exist irrational numbers $x$ such that the inequality $q^2 |x - (p/q)| < \theta$ is verified only for finitely many $p/q \in \mathbb{Q}$.

The above-mentioned method from Kraaikamp (1990) can easily be adapted for S-expansions. As an example we will mention here the case of α-expansions, for which the first result below is due to Bosma et al. (1983).

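Theorem 4.3.8 Let $\alpha \in [1/2, 1]$. For any irrational number in $I_\alpha$ and any $n \in \mathbb{N}_+$ we have

\[ \Theta^e_n < c(\alpha) \]

and

\[ \min(\Theta^e_{n-1}, \Theta^e_n) < V(\alpha), \]

where the functions $c, V : [1/2, 1] \to \mathbb{R}$ are defined by

\[ c(\alpha) = \max\left( G\,\frac{1-\alpha}{g\alpha+1},\ \alpha \right), \quad \frac{1}{2} \le \alpha \le 1, \]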

and

V (α) =

max

(g

1 + gα, 4α− 2

)if 1

2 ≤ α ≤ g,

max

(2(1− α)α + 1

α2 + 1

)if g ≤ α ≤ 1.

The bounds c(α) and V (α) are best possible.

For the proof see Kraaikamp (1991).

Remark. A simple calculation yields $\min_\alpha c(\alpha) = c(\alpha_0) = \alpha_0$, with

\[ \alpha_0 = \frac{1}{2}\left( -2 - \sqrt{5} + \sqrt{6\sqrt{5} + 15} \right) = 0.5473\cdots. \]

Moreover, we have $\min_\alpha V(\alpha) = V(\alpha_1) = 0.4484\cdots$, a constant slightly larger than $1/\sqrt{5}$, where

\[ \alpha_1 = \frac{1 - 3g + \sqrt{10 - 11g}}{4g^2} = 0.6121\cdots < g. \]

□
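The closed forms for $\alpha_0$ and $\alpha_1$ can be compared with a brute-force minimization. The sketch below rests on two assumptions about the formulas of Theorem 4.3.8, which are partly damaged in this transcript: that $c(\alpha) = \max(G(1-\alpha)/(g\alpha+1), \alpha)$, and that on $[1/2, g]$ the function $V$ is $\max(g/(1+g\alpha), 4\alpha-2)$. The names `c`, `V_low` and `argmin_on_grid` are illustrative.

```python
from math import sqrt

G = (1 + sqrt(5)) / 2
g = G - 1

def c(alpha):
    """Assumed reading of c(alpha) in Theorem 4.3.8."""
    return max(G * (1 - alpha) / (g * alpha + 1), alpha)

def V_low(alpha):
    """Assumed reading of V(alpha) on the branch 1/2 <= alpha <= g only."""
    return max(g / (1 + g * alpha), 4 * alpha - 2)

alpha0 = (-2 - sqrt(5) + sqrt(6 * sqrt(5) + 15)) / 2
alpha1 = (1 - 3 * g + sqrt(10 - 11 * g)) / (4 * g * g)

def argmin_on_grid(f, lo, hi, steps=100000):
    """Grid point in [lo, hi] at which f is smallest (accuracy about (hi - lo)/steps)."""
    return min((lo + (hi - lo) * i / steps for i in range(steps + 1)), key=f)
```

With these readings, `argmin_on_grid(c, 0.5, 1.0)` lands next to `alpha0`, and `argmin_on_grid(V_low, 0.5, g)` next to `alpha1`, where the two terms of each maximum cross.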


4.3.2 Minkowski’s diagonal continued fraction expansion

Let $x \in \mathbb{R}$ such that both $x$ and $2x \notin \mathbb{Z}$. Consider the sequence $\sigma$ of all irreducible fractions $p/q \in \mathbb{Q}$ with $q \in \mathbb{N}_+$ satisfying

\[ \left| x - \frac{p}{q} \right| < \frac{1}{2q^2}, \]

ordered in such a way that their denominators form an increasing sequence. It can be shown [see, e.g., Perron (1954, §45)] that there exists a unique SRCF expansion whose sequence of convergents coincides with $\sigma$. Legendre's theorem (see Corollary 1.2.4) implies that we take precisely those RCF convergents for which $\Theta_n < 1/2$. By (4.2.5) this SRCF expansion, which is called Minkowski's diagonal continued fraction (DCF) expansion, is an S-expansion with singularization area

\[ S = S_{DCF} := \left\{ (x, y) \in I^2 : \frac{x}{xy+1} \ge \frac{1}{2} \right\}. \]

Since $\min(\Theta_n, \Theta_{n+1}) < 1/2$ (cf. Subsection 1.3.2) the DCF expansion picks at least one out of two consecutive RCF convergents. Since

\[ \gamma(S_{DCF}) = 1 - \frac{1}{2\log 2}, \]

the singularization area $S_{DCF}$ is not maximal. Also, by Theorem 4.2.8 we have

\[ \lim_{k\to\infty} \frac{n_{S_{DCF}}(k)}{k} = 2\log 2 = 1.3862\cdots \quad \text{a.e.} \]

It can be shown [cf. Kraaikamp (1989, p. 210)] that the DCF expansion of any $\omega \in \Omega$ can be obtained from its RCF expansion $[a_1, a_2, \cdots]$ by singularizing any digit $a_{k+1}(\omega) = 1$ if and only if one of the following four conditions is fulfilled:

(i) $k = 0$, that is, $a_1 = 1$;

(ii) $a_k, a_{k+2} \neq 1$, $k \in \mathbb{N}_+$;

(iii) $a_k \neq 1$, $a_{k+2} = 1$, and $[a_{k+3}, a_{k+4}, \cdots] > [a_k - 1, \cdots, a_1]$, $k \in \mathbb{N}_+$, with the convention that the value of $[a_k - 1, \cdots, a_1]$ for $k = 1$ is $[a_1 - 1]$;

(iv) $a_k = 1$, $a_{k+2} \neq 1$, and $[a_{k-1}, \cdots, a_1] > [a_{k+2} - 1, a_{k+3}, \cdots]$, $k \ge 2$.


It is also interesting to note that the DCF expansion of a quadratic irrationality is periodic.

The general theory developed in Subsections 4.2.4 and 4.2.5 allows us to state the following results. For detailed proofs the reader is referred to Kraaikamp (op. cit.).

With the notation in Subsection 4.2.5, for the DCF expansion case we have

\[ \Delta^+_{DCF} = \left\{ (x, y) \in \mathbb{R}^2_{++} : \frac{x}{xy+1} < \frac{1}{2},\ \frac{y}{xy+1} < \frac{1}{2} \right\}, \]

\[ M(\tau(S_{DCF})) = \left\{ (x, y) \in \mathbb{R}^2 : \frac{(x+1)(1-y)}{xy+1} \le \frac{1}{2},\ -\frac{1}{2} \le x \le 0,\ y \ge 0 \right\}, \]

\[ A_{DCF} := A_{S_{DCF}} = \Delta^+_{DCF} \cup M(\tau(S_{DCF})), \]

see also Figure 4.4.

Figure 4.4: $S_{DCF}$ (regions shown: $S_{DCF}$, $\tau(S_{DCF})$ and $M(\tau(S_{DCF}))$)

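Furthermore, writing $f_{DCF}$ for $f_{S_{DCF}}$ and $\tau_{DCF}$ for $\tau_{S_{DCF}}$ we have

\[ f_{DCF}(x, y) = \left\lfloor |x^{-1}| + \frac{\lfloor |x^{-1}| \rfloor + y\,\mathrm{sgn}\,x - 1}{2\left( \lfloor |x^{-1}| \rfloor + y\,\mathrm{sgn}\,x \right) - 1} \right\rfloor, \]

and

\[ \tau_{DCF}(x, y) = \left( |x^{-1}| - f_{DCF}(x, y),\ \left( f_{DCF}(x, y) + y\,\mathrm{sgn}\,x \right)^{-1} \right) \]

for $(x, y) \in A_{DCF}$.

Proposition 4.3.9 Let $\rho_{DCF}$ be the probability measure on $\mathcal{B}_{S_{DCF}}$ with density

\[ \frac{2}{(xy+1)^2}, \quad (x, y) \in A_{DCF}. \]

Then $(A_{DCF}, \mathcal{B}_{S_{DCF}}, \tau_{DCF}, \rho_{DCF})$ is an ergodic dynamical system which underlies the DCF expansion.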

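Proposition 4.3.10 For any $\mu \in \mathrm{pr}(\mathcal{B}_{[-1/2,1]})$ such that $\mu \ll \lambda$ and any $(t_1, t_2) \in I^2$ we have

\[ \lim_{n\to\infty} \mu\left( \Theta^e_{n-1} \le t_1,\ \Theta^e_n \le t_2 \right) = H(t_1, t_2). \]

Here $H$ is the distribution function with density $d_1 + d_2$, where

\[ d_1(x, y) = \frac{2 I_{B_1}(x, y)}{\sqrt{1 - 4xy}}, \quad d_2(x, y) = \frac{2 I_{B_2}(x, y)}{\sqrt{1 + 4xy}}, \]

with

\[ B_1 = [0, 1/2] \times [0, 1/2], \]

\[ B_2 = B_1 \cap \left\{ (x, y) \in E_1 : 0 \le (x - y)^2 + x + y \le 3/4 \right\}. \]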

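The result above can also be stated in an equivalent form concerning the existence for any $(t_1, t_2) \in I^2$ of the limit a.e. equal to $H(t_1, t_2)$ of

\[ \frac{1}{n}\, \mathrm{card}\{ k : \Theta^e_k \le t_1,\ \Theta^e_{k+1} \le t_2,\ 0 \le k \le n-1 \} \]

as $n \to \infty$. It then follows, e.g., that

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} \Theta^e_k = \frac{1}{4} \quad \text{a.e.} \]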

We also note the following results.

Proposition 4.3.11 An RCF digit $a_{k+1}$ equal to 1 does not disappear in the DCF expansion if and only if

\[ (\tau^k, s_k) \in B = \left\{ (x, y) \in A_{DCF} : y < \frac{1 - 2x}{3x - 2},\ y > \frac{2x - 1}{x},\ y < \frac{1}{2 - x} \right\} \]

whatever $k \in \mathbb{N}$.

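Note that $\gamma(B)$ is equal to

\[ \frac{1}{\log 2} \left( \int_{1/2}^{1} dt \int_{(2t-1)/t}^{1/(2-t)} \frac{du}{(tu+1)^2} - \int_{1/2}^{2-\sqrt{2}} dt \int_{(2t-1)/(2-3t)}^{1/(2-t)} \frac{du}{(tu+1)^2} \right) = \frac{1}{\log 2} \left( \log(\sqrt{2} - 1) + \sqrt{2} - \frac{1}{2} \right) = 0.0473\cdots. \]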
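The closed form for $\gamma(B)$ can be confirmed by evaluating the double integral numerically. In the sketch below the inner integral is done exactly and the outer one by composite Simpson's rule; the helper names `inner` and `simpson` are illustrative, and the step count is an arbitrary choice.

```python
from math import log, sqrt

def inner(t, a, b):
    """Exact value of the integral of du / (t*u + 1)^2 over [a, b]."""
    return (b - a) / ((t * a + 1) * (t * b + 1))

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

first = simpson(lambda t: inner(t, (2 * t - 1) / t, 1 / (2 - t)), 0.5, 1.0)
second = simpson(lambda t: inner(t, (2 * t - 1) / (2 - 3 * t), 1 / (2 - t)),
                 0.5, 2 - sqrt(2))
gamma_B = (first - second) / log(2)
closed_form = (log(sqrt(2) - 1) + sqrt(2) - 0.5) / log(2)
```

Both `gamma_B` and `closed_form` should agree with the value $0.0473\cdots$ stated above.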

Corollary 4.3.12 Let $[a^e_0; a^e_1, a^e_2, \cdots]$ be the DCF expansion of an irrational number. Then

\[ \lim_{n\to\infty} \frac{1}{n}\, \mathrm{card}\{ k : a^e_k = 1,\ 1 \le k \le n \} = \rho_{DCF}(B) = \frac{\gamma(B)}{1 - \gamma(S_{DCF})} = 2\left( \log(\sqrt{2} - 1) + \sqrt{2} - \frac{1}{2} \right) = 0.0656\cdots \quad \text{a.e.} \]

This asymptotic relative frequency ($6.56\cdots\%$) should be compared with the asymptotic relative frequency of digit 1 in the RCF expansion ($2 - \frac{\log 3}{\log 2} = 41.50\cdots\%$). See Proposition 4.1.1 and Subsection 4.1.2.

4.3.3 Bosma’s optimal continued fraction expansion

A remarkable geometrical interpretation of the RCF expansion of an irrational number was given by Klein (1895). The idea behind it is to represent any irreducible $p/q \in \mathbb{Q} \cap I$ by an integer-valued vector in $\mathbb{R}^2_+$, namely, by the point $(q, p) \in \mathbb{R}^2_+$, and to represent an irrational number $\omega \in \Omega$ by a half-line $L$ with slope $\omega$. The approximation of $\omega$ by its RCF convergents amounts to systematically finding integer-valued vectors close to $L$. More precisely, starting from $V_{-1} = (0, 1)$ and $V_0 = (1, 0)$ we define $V_n$ recursively by

\[ V_n = a_n V_{n-1} + V_{n-2}, \quad n \in \mathbb{N}_+, \]

where $a_n \in \mathbb{N}_+$ is maximal with respect to the property that $V_n$ is on the same side of $L$ as $V_{n-2}$. It then appears that the positive integers $a_1, a_2, \cdots$ are in fact the RCF digits of $\omega$, that is, $\omega = [a_1, a_2, \cdots]$.
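Klein's construction can be run as stated. The sketch below takes the starting vectors to be $V_{-1} = (0, 1)$ and $V_0 = (1, 0)$ in the coordinates $(q, p)$, and uses floating-point arithmetic, so it is reliable only for the first few digits; the name `klein_digits` is illustrative.

```python
from math import sqrt, pi

def klein_digits(omega, count):
    """First `count` RCF digits of omega in (0, 1), found by Klein's geometric rule."""
    def side(v):
        q, p = v
        return p - omega * q          # > 0 above the half-line L, < 0 below it

    def combo(a, v1, v2):
        return (a * v1[0] + v2[0], a * v1[1] + v2[1])

    v_old, v_new = (0, 1), (1, 0)     # V_{n-2} and V_{n-1}, starting with V_{-1}, V_0
    digits = []
    for _ in range(count):
        a = 1
        # enlarge a while a*V_{n-1} + V_{n-2} stays on the same side of L as V_{n-2}
        while side(combo(a + 1, v_new, v_old)) * side(v_old) > 0:
            a += 1
        digits.append(a)
        v_old, v_new = v_new, combo(a, v_new, v_old)
    return digits
```

For example, `klein_digits(sqrt(2) - 1, 6)` returns six 2's and `klein_digits(pi - 3, 3)` returns `[7, 15, 1]`, matching the familiar regular continued fraction digits.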


Bosma (1987) gave a similar interpretation of α-expansions and, inspiredby this, presented a very interesting SRCF expansion formally defined asfollows.

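Definition 4.3.13 Let $-1/2 < x < 1/2$. Put $a^e_0 = 0$, $t^e_0 = x$, $e_1 = \mathrm{sgn}\, t^e_0$, $p^e_{-1} = 1$, $q^e_{-1} = 0$, $p^e_0 = 0$, $q^e_0 = 1$, $s^e_0 = 0$, and define recursively

\[ a^e_{k+1} = \left\lfloor |t^e_k|^{-1} + \frac{\lfloor |t^e_k|^{-1} \rfloor + e_{k+1} s^e_k}{2\left( \lfloor |t^e_k|^{-1} \rfloor + e_{k+1} s^e_k \right) + 1} \right\rfloor, \]

\[ t^e_{k+1} = |t^e_k|^{-1} - a^e_{k+1}, \quad e_{k+2} = \mathrm{sgn}\, t^e_{k+1}, \]

\[ p^e_{k+1} = a^e_{k+1} p^e_k + e_{k+1} p^e_{k-1}, \quad q^e_{k+1} = a^e_{k+1} q^e_k + e_{k+1} q^e_{k-1}, \]

\[ s^e_{k+1} = q^e_k / q^e_{k+1}, \quad k \in \mathbb{N}. \]

The optimal continued fraction (OCF) expansion of $x$, denoted $\mathrm{OCF}(x)$, is the SRCF expansion $[e_1/a^e_1, e_2/a^e_2, \cdots]$. For an irrational $x \in \mathbb{R}$ such that $2x \notin \mathbb{Z}$, $\mathrm{OCF}(x) = [a^e_0; e_1/a^e_1, e_2/a^e_2, \cdots]$ is defined as $a^e_0 + [e_1/a^e_1, e_2/a^e_2, \cdots]$, where $a^e_0 \in \mathbb{Z}$ is such that $-1/2 < x - a^e_0 < 1/2$, and $[e_1/a^e_1, e_2/a^e_2, \cdots] = \mathrm{OCF}(x - a^e_0)$.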

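It is not difficult to see that the $t^e_k$ and $s^e_k$ have the usual meaning, that is,

\[ t^e_k = [e_{k+1}/a^e_{k+1}, \cdots], \quad k \in \mathbb{N}, \]

\[ s^e_k = \begin{cases} 0 & \text{if } k = 0, \\ 1/a^e_1 & \text{if } k = 1, \\ [1/a^e_k, e_k/a^e_{k-1}, \cdots, e_2/a^e_1] & \text{if } k \ge 2 \end{cases} \]

and $p^e_k/q^e_k$, $k \in \mathbb{N}$, are the OCF convergents of $x$.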
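The recursion of Definition 4.3.13 can be sketched in a few lines. The version below assumes the outer floor in the formula for $a^e_{k+1}$ (as in the formula for $f_{OCF}$ given later in this subsection) and works with exact rationals, so the expansion terminates; the function name `ocf` is illustrative.

```python
from fractions import Fraction
from math import floor

def ocf(x, max_digits=50):
    """OCF digits (e_k, a_k) and convergents p_k/q_k of a rational x in (-1/2, 1/2)."""
    t, s = Fraction(x), Fraction(0)
    p_prev, q_prev, p, q = 1, 0, 0, 1        # (p_{-1}, q_{-1}) and (p_0, q_0)
    digits, convergents = [], []
    while t != 0 and len(digits) < max_digits:
        e = 1 if t > 0 else -1               # e_{k+1} = sgn t_k
        inv = 1 / abs(t)
        w = floor(inv) + e * s
        a = floor(inv + w / (2 * w + 1))     # a_{k+1}
        t = inv - a                          # t_{k+1}
        p_prev, p = p, a * p + e * p_prev
        q_prev, q = q, a * q + e * q_prev
        s = Fraction(q_prev, q)              # s_{k+1} = q_k / q_{k+1}
        digits.append((e, a))
        convergents.append(Fraction(p, q))
    return digits, convergents
```

For `Fraction(16, 113)` this gives the digits `(1, 7), (1, 16)` and the convergents 1/7 and 16/113.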

Next, the sequence of OCF convergents $(p^e_k/q^e_k)_{k\in\mathbb{N}}$ is a subsequence of the sequence $(p_n/q_n)_{n\in\mathbb{N}}$ of RCF convergents. If we define $n(k)$ in such a way that $p^e_k/q^e_k = p_{n(k)}/q_{n(k)}$, $k \in \mathbb{N}_+$, then

\[ n(k+1) = \begin{cases} n(k) + 1 & \text{if } e_{k+2} = 1, \\ n(k) + 2 & \text{if } e_{k+2} = -1 \end{cases} \]

with

\[ n(0) = \begin{cases} 0 & \text{if } x > 0, \\ 1 & \text{if } x < 0. \end{cases} \]

Finally, it appears that the OCF expansion gives approximation coefficients $\Theta^e_n = (q^e_n)^2 |x - (p^e_n/q^e_n)| < 1/2$ for any $n \in \mathbb{N}$ and, at the same time, it is a fastest expansion. Fastest SRCF expansions for which all convergents are RCF convergents can be defined as those in which always the maximal number of RCF convergents is skipped, meaning that whenever a 1-block of length $m \in \mathbb{N}_+$ occurs in the RCF expansion, exactly $\lfloor (m+1)/2 \rfloor$ out of the $m$ 1's are skipped. (Note that this implies that for fastest SRCF expansions only a choice is left in deciding which RCF convergents will be skipped when $m$ is even.) A still more precise definition of 'fastest' is as follows. Writing $n_\alpha(k) := n_{S_\alpha}(k)$, $k \in \mathbb{N}_+$, $\alpha \in [1/2, 1]$, by Theorem 4.2.8 we have a.e.

\[ \lim_{k\to\infty} \frac{n_\alpha(k)}{k} = \begin{cases} \dfrac{\log 2}{\log G} = 1.44042\cdots & \text{if } 1/2 \le \alpha \le g, \\[2ex] \dfrac{\log 2}{\log(\alpha+1)} & \text{if } g < \alpha \le 1. \end{cases} \]

Then an (arbitrary) SRCF expansion is said to be fastest if and only if $n_{SRCF}(k) = n_{1/2}(k)$ for infinitely many $k \in \mathbb{N}_+$. Here the non-decreasing function $n_{SRCF} : \mathbb{N}_+ \to \mathbb{N}_+$ is defined by

\[ q_{n_{SRCF}(k)} \le q^e_k < q_{n_{SRCF}(k)+1}, \quad k \in \mathbb{N}_+, \]

where the $q_i$ and $q^e_i$, $i \in \mathbb{N}_+$, are associated with the RCF expansion and the SRCF expansion considered, respectively. Cf. Bosma (1987, p. 364).

The next result [cf. Bosma and Kraaikamp (1990)] places OCF expansions in the context of the S-expansion theory. More precisely, it shows how singularizing appropriately the RCF expansion yields the OCF expansion. (Note that it is for this reason that we have anticipated notation by denoting the OCF expansion as an S-expansion.)

Lemma 4.3.14 Let $\omega \in \Omega$ have RCF expansion $[a_1, a_2, \cdots]$, RCF convergents $p_n/q_n$, and RCF approximation coefficients $\Theta_n$, $n \in \mathbb{N}$. Consider the set

\[ S_{OCF} = \left\{ (x, y) \in I^2 ;\ y < \min\left( x,\ \frac{2x - 1}{1 - x} \right) \right\}. \]

Then for any $n \in \mathbb{N}_+$ the following three assertions are equivalent:

(i) $p_n/q_n$ is not an OCF convergent of $\omega$;

(ii) $a_{n+1} = 1$, $\Theta_{n-1} < \Theta_n$ and $\Theta_n > \Theta_{n+1}$;

(iii) $(\tau^n, s_n) \in S_{OCF}$.

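Proof. For the proof of the equivalence of (i) and (ii) we refer the reader to Corollary (4.20) of Bosma (1987). Here we show that (ii) and (iii) are equivalent.

Since

\[ \Theta_{n-1} = \frac{s_n}{s_n \tau^n + 1}, \quad \Theta_n = \frac{\tau^n}{s_n \tau^n + 1}, \quad n \in \mathbb{N}_+, \tag{4.3.2} \]

we have

\[ \frac{|q_n \omega - p_n|}{|q_{n-1} \omega - p_{n-1}|} = \frac{\Theta_n q_{n-1}}{\Theta_{n-1} q_n} = \tau^n < 1, \quad \omega \in \Omega. \tag{4.3.3} \]

Also

\[ \Theta_{n-1} < \Theta_n \ \text{ if and only if } \ \tau^n > s_n. \tag{4.3.4} \]

Furthermore, if $a_{n+1} = 1$ then $p_{n+1} = p_n + p_{n-1}$ and $q_{n+1} = q_n + q_{n-1}$, and by (4.3.3) we have

\[ \begin{aligned} \Theta_{n+1} &= q_{n+1} |q_{n+1} \omega - p_{n+1}| \\ &= (q_n + q_{n-1}) |(q_n + q_{n-1}) \omega - (p_n + p_{n-1})| \\ &= (q_n + q_{n-1}) |(q_{n-1} \omega - p_{n-1}) + (q_n \omega - p_n)| \\ &= (q_n + q_{n-1}) \left( |q_{n-1} \omega - p_{n-1}| - |q_n \omega - p_n| \right) \end{aligned} \]

since $q_n \omega - p_n$ and $q_{n-1} \omega - p_{n-1}$ have different signs, as shown by equation (1.1.18). Thus

\[ \Theta_{n+1} = \Theta_{n-1} \left( 1 + \frac{q_n}{q_{n-1}} \right) - \Theta_n \left( 1 + \frac{q_{n-1}}{q_n} \right). \]

It follows from (4.3.3) that

\[ a_{n+1} = 1 \text{ and } \Theta_{n+1} < \Theta_n \ \text{ if and only if } \ s_n < \frac{2\tau^n - 1}{1 - \tau^n}. \tag{4.3.5} \]

Combining (4.3.4) and (4.3.5) with the definition of $S_{OCF}$ completes the proof. □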


Remarks. 1. It is easy to check that

\[ \gamma(S_{OCF}) = 1 - \frac{\log G}{\log 2}, \]

so $S_{OCF}$ is a maximal singularization area. See Figure 4.5. Notice that $S_{OCF}$ contains $S_{DCF}$, hence any sequence of OCF convergents is a subsequence of the corresponding sequence of DCF convergents. Since $\tau(S_{OCF}) \subset I^2 \setminus S_{OCF}$, the set $B_{S_{OCF}}$ of the OCF preservation area of 1's is empty. Hence any OCF incomplete quotient (or digit) is greater than or equal to 2.

2. It now appears that the function $n : \mathbb{N}_+ \to \mathbb{N}_+$ considered above is in fact $n_{S_{OCF}}$. It then follows from Theorem 4.2.8 that

\[ \lim_{k\to\infty} \frac{n(k)}{k} = \frac{\log 2}{\log G} = 1.4404\cdots \quad \text{a.e.} \]

□

As in the DCF expansion case, the general theory developed in Subsections 4.2.4 and 4.2.5 allows us to state the following results. For detailed proofs the reader is referred to Bosma and Kraaikamp (1990, 1991).

With the notation in Subsection 4.2.5, for the OCF case we have

\[ \Delta_{OCF} = I^2 \setminus S_{OCF} = \left\{ (x, y) \in I^2 : y \ge \min\left( x,\ \frac{2x - 1}{1 - x} \right) \right\}, \]

\[ \Delta^-_{OCF} = \tau(S_{OCF}) = \left\{ (x, y) \in I^2 : (y, x) \in S_{OCF} \right\}, \]

that is, reflecting $S_{OCF}$ in the diagonal $y = x$ yields $\Delta^-_{OCF}$, and

\[ A_{OCF} := A_{S_{OCF}} = M(\Delta_{OCF}) = \left\{ (x, y) \in (-1/2, g) \times [0, g] : y \le \min\left( \frac{2x + 1}{x + 1},\ \frac{x + 1}{x + 2} \right) \text{ and } y \ge \max\left( 0,\ \frac{2x - 1}{1 - x} \right) \right\}, \]

see Figure 4.5.

Furthermore, writing $f_{OCF}$ for $f_{S_{OCF}}$ and $\tau_{OCF}$ for $\tau_{S_{OCF}}$ we have

\[ f_{OCF}(x, y) = \left\lfloor |x^{-1}| + \frac{\lfloor |x^{-1}| \rfloor + y\,\mathrm{sgn}\,x}{2\left( \lfloor |x^{-1}| \rfloor + y\,\mathrm{sgn}\,x \right) + 1} \right\rfloor, \]

\[ \tau_{OCF}(x, y) = \left( |x^{-1}| - f_{OCF}(x, y),\ \left( f_{OCF}(x, y) + y\,\mathrm{sgn}\,x \right)^{-1} \right) \]

for $(x, y) \in A_{OCF}$.

Figure 4.5: $S_{OCF}$ (regions shown: $S_{OCF}$, $\tau(S_{OCF})$ and $M(\tau(S_{OCF}))$)

Theorem 4.3.15 Let $\rho_{OCF}$ be the probability measure on $\mathcal{B}_{A_{OCF}}$ with density

\[ \frac{1}{\log G}\, \frac{1}{(xy+1)^2}, \quad (x, y) \in A_{OCF}. \]

Then $(A_{OCF}, \mathcal{B}_{A_{OCF}}, \tau_{OCF}, \rho_{OCF})$ is an ergodic dynamical system which underlies the OCF expansion.

Remark. For both DCF and OCF expansions the two-dimensional sets $A_{DCF}$ and $A_{OCF}$ have curved boundaries. This implies that the functions $f_{DCF}$ and $f_{OCF}$ depend on both their arguments $x$ and $y$, and not only on $x$ as in the case of α-expansions, $\alpha \in [1/2, 1]$. As a result, no one-dimensional ergodic dynamical system exists for either DCF or OCF expansion. □

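Proposition 4.3.16 For any $\mu \in \mathrm{pr}(\mathcal{B}_{[-1/2,g]})$ such that $\mu \ll \lambda$ and any $(t_1, t_2) \in I^2$ we have

\[ \lim_{n\to\infty} \mu\left( \Theta^e_{n-1} \le t_1,\ \Theta^e_n \le t_2 \right) = H(t_1, t_2). \]

Here $H$ is the distribution function with density

\[ \begin{cases} \dfrac{1}{\log G} \left( \dfrac{1}{\sqrt{1 - 4xy}} + \dfrac{1}{\sqrt{1 + 4xy}} \right) & \text{if } (x, y) \in \Pi, \\[2ex] 0 & \text{elsewhere,} \end{cases} \]

where $\Pi = \left\{ (x, y) \in \mathbb{R}^2_{++} : 4x^2 + y^2 < 1,\ x^2 + 4y^2 < 1 \right\}$.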

The result above can also be stated in an equivalent form concerning the existence for any $(t_1, t_2) \in I^2$ of the limit a.e. equal to $H(t_1, t_2)$ of

\[ \frac{1}{n}\, \mathrm{card}\{ k : \Theta^e_k \le t_1,\ \Theta^e_{k+1} \le t_2,\ 0 \le k \le n-1 \} \]

as $n \to \infty$. It then follows, e.g., that

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \Theta^e_k = \frac{\arctan \frac{1}{2}}{4 \log G} = 0.24087\cdots \quad \text{a.e.} \tag{4.3.6} \]

Other consequences are that for any irrational number we have

(i) $0 < \Theta^e_n < 1/2$, $n \in \mathbb{N}_+$;

(ii) $0 < \Theta^e_{n-1} + \Theta^e_n < 2/\sqrt{5}$, hence $\min(\Theta^e_{n-1}, \Theta^e_n) < 1/\sqrt{5}$, $n \in \mathbb{N}_+$.

In connection with (ii) above, it should be noted that the constant $1/\sqrt{5}$ in the second inequality is 'best possible' by A. Hurwitz's result mentioned just before Theorem 4.3.8.

Remark. The a.e. asymptotic arithmetic mean (4.3.6) should be compared with the corresponding values

$\dfrac{1}{4 \log 2} = 0.36067\cdots$ for the RCF expansion,

$\dfrac{1}{4} = 0.25$ for the DCF expansion,

$\dfrac{\sqrt{5} - 2}{2 \log G} = 0.24528\cdots$ for the NICF and SCF expansions,

$\dfrac{\sqrt{8G + 6} - 2G - 1}{\log G} = 0.24195\cdots$ for the $\alpha_0$-expansion,

where $\alpha_0 = 0.55821\cdots$. See Corollary 4.1.23 and Proposition 4.3.10 for the first two values, and Bosma et al. (1983) for the last two. Note how close the value in (4.3.6) is to

1− γ (SOCF) =log G

2= 0.24061 · · · .


The latter gives an a priori bound for the a.e. asymptotic arithmetic meanof the approximation coefficients. It can be shown that the value in (4.3.6)is in fact ‘the best one can get’ for any irrational number. More precisely,we have the following result.

Theorem 4.3.17 [Bosma and Kraaikamp (1991)] Whatever the SRCFexpansion with convergents pe

n/qen and approximation coefficients Θe

n, n ∈ N,we have

1m

m∑

k=1

Θek ≥

1n

n∑

k=1

Θek , n ∈ N+,

for any irrational number, where m = cardk : qek < qe

n+1, k ∈ N+ andqen and Θe

n, n ∈ N+, are associated with the OCF expansion.

4.4 Continued fraction expansions with σ-finite, infinite invariant measure

4.4.1 The insertion process

We have seen in previous subsections how the concept of singularization leads to a class of SRCF expansions for which the underlying ergodic theory can be developed.

The idea of adding a convergent instead of removing one (as singularization does) leads to the concept of insertion, to some extent the opposite of that of singularization. Now, the fundamental identity is

\[ a + \cfrac{1}{b + x} = a + 1 + \cfrac{-1}{1 + \cfrac{1}{b - 1 + x}}, \]

where $a \in \mathbb{Z}$, $b \in \mathbb{N}_+$, $b > 1$, $x \in [0, 1)$. Let (cf. Subsection 4.2.2)

\[ (e_k)_{k \in M}, \quad (a_k)_{k \in \{0\} \cup M} \tag{4.4.1} \]

be a (finite or infinite) CF with $a_{\ell+1} > 1$, $e_{\ell+1} = 1$ for some $\ell \in \mathbb{N}$ for which $\ell + 1 \in M$. The transformation $\iota_\ell$ which takes (4.4.1) into the CF

\[ (\tilde{e}_k)_{k \in \widetilde{M}}, \quad (\tilde{a}_k)_{k \in \{0\} \cup \widetilde{M}}, \tag{4.4.2} \]

where $\widetilde{M} = M$ if $M = \mathbb{N}_+$ and $\widetilde{M} = \{k : 1 \le k \le n+1\}$ if $M = \{k : 1 \le k \le n\}$, $n \in \mathbb{N}_+$, with $\tilde{e}_k = e_k$, $k \in M$, $k \le \ell$, $\tilde{e}_{\ell+1} = -1$, $\tilde{e}_{\ell+2} = 1$, $\tilde{e}_k = e_{k-1}$, $k \in \widetilde{M}$, $k \ge \ell + 3$, $\tilde{a}_k = a_k$, $k \in \{0\} \cup M$, $k \le \ell - 1$, $\tilde{a}_\ell = a_\ell + 1$, $\tilde{a}_{\ell+1} = 1$, $\tilde{a}_{\ell+2} = a_{\ell+1} - 1$, $\tilde{a}_k = a_{k-1}$, $k \ge \ell + 3$, is called an insertion of the pair $(1, -1)$ before $a_{\ell+1}$, $e_{\ell+1}$. Let $(p^e_k/q^e_k)_{k \in \{0\} \cup M}$ and $(\tilde{p}^e_k/\tilde{q}^e_k)_{k \in \{0\} \cup \widetilde{M}}$ be the sets associated with (4.4.1) and (4.4.2), respectively. The result corresponding to Proposition 4.2.4 can be stated as follows.

Proposition 4.4.1 Let $\ell \in \mathbb{N}$ such that $\ell + 1 \in M$. The set of convergents $(\tilde{p}^e_k/\tilde{q}^e_k)_{k \in \{0\} \cup \widetilde{M}}$ resulting after the insertion $\iota_\ell$ of the pair $(1, -1)$ before $a_{\ell+1}$ ($> 1$), $e_{\ell+1}$ ($= 1$), is obtained by inserting the term $(p^e_\ell + p^e_{\ell-1})/(q^e_\ell + q^e_{\ell-1})$ in the set $(p^e_k/q^e_k)_{k \in \{0\} \cup M}$ before the convergent $p^e_\ell/q^e_\ell$. As usual, here $p^e_{-1} = 1$, $q^e_{-1} = 0$.

The proof is similar to that of Proposition 4.2.4 by using appropriate matrix identities. □
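The effect of an insertion on the list of convergents can be seen on a small finite example. The sketch below stores a finite CF as an integer `a0` plus a list of pairs `(e_k, a_k)`; this representation and the names `convergents` and `insert_pair` are choices made for the illustration only.

```python
from fractions import Fraction

def convergents(a0, pairs):
    """Convergents p_k/q_k of a0 + e1/(a1 + e2/(a2 + ...)), pairs = [(e1, a1), ...]."""
    p_prev, q_prev, p, q = 1, 0, a0, 1
    out = [Fraction(p, q)]
    for e, a in pairs:
        p_prev, p = p, a * p + e * p_prev
        q_prev, q = q, a * q + e * q_prev
        out.append(Fraction(p, q))
    return out

def insert_pair(a0, pairs, l):
    """Insertion before (a_{l+1}, e_{l+1}); pairs[l] holds (e_{l+1}, a_{l+1})."""
    e, a = pairs[l]
    assert e == 1 and a > 1, "insertion needs e_{l+1} = 1 and a_{l+1} > 1"
    new = list(pairs)
    if l == 0:
        a0 += 1                                   # a_0 becomes a_0 + 1
    else:
        new[l - 1] = (new[l - 1][0], new[l - 1][1] + 1)
    new[l:l + 1] = [(-1, 1), (1, a - 1)]          # new digits l+1 and l+2
    return a0, new
```

Starting from $[0; 1/2, 1/3] = 3/7$ with convergents $0, 1/2, 3/7$, an insertion at $\ell = 1$ produces the convergents $0, 1/3, 1/2, 3/7$: the mediant $(p_1 + p_0)/(q_1 + q_0) = 1/3$ appears just before $1/2$, as Proposition 4.4.1 predicts.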

Starting from the RCF expansion, by appropriate insertions we can obtain many classical SRCF expansions, and also continued fraction algorithms which are not SRCF expansions. Amongst the former we mention the Lehner continued fraction (LCF) expansion, and amongst the latter the Farey continued fraction (FCF) expansion. Both these expansions will be studied in the next subsection.

In particular, we can obtain this way the OddCF and EvenCF expansions (see the examples of SRCF expansions at the end of Subsection 4.2.2) as well as the backward continued fraction (BCF) expansion that we will study in Subsection 4.4.3.

4.4.2 The Lehner and Farey continued fraction expansions

Lehner (1994) showed that any number $x \in [1, 2)$ has a unique infinite SRCF expansion of the form

\[ b_0 + \cfrac{e_1}{b_1 + \cfrac{e_2}{b_2 + \ddots}} := [\, b_0; e_1/b_1, e_2/b_2, \cdots ], \tag{4.4.3} \]

where $(b_n, e_{n+1})$ is equal to either $(1, 1)$ or $(2, -1)$, $n \in \mathbb{N}$. We shall call this expansion the Lehner continued fraction (LCF) expansion. Dajani and Kraaikamp (2000) called it the Lehner fraction or the Lehner expansion, and showed that if we define the transformation $L : [1, 2) \to [1, 2)$ by

\[ L(x) = \frac{e(x)}{x - b(x)}, \quad x \in [1, 2), \]

where

\[ (b(x), e(x)) = \begin{cases} (2, -1) & \text{if } 1 \le x < \frac{3}{2}, \\ (1, 1) & \text{if } \frac{3}{2} \le x < 2, \end{cases} \]

then

\[ (b_n(x), e_{n+1}(x)) = \left( b(L^n(x)),\ e(L^n(x)) \right), \quad x \in [1, 2), \]

for any $n \in \mathbb{N}$. Here $L^n$, $n \in \mathbb{N}_+$, denotes the composition of $L$ with itself $n$ times while $L^0$ is the identity map.

Denoting as usual the RCF convergents of a real number $x = [a_0; a_1, a_2, \cdots]$ by $(p_n/q_n)_{n\in\mathbb{N}}$ and defining the mediant convergents of $x$ by

\[ \frac{k p_n + p_{n-1}}{k q_n + q_{n-1}}, \quad 1 \le k < a_{n+1},\ n = 1, 2, \cdots \]

(so that if $a_{n+1} = 1$ then there is no mediant convergent), we will see that the set of LCF convergents of $x$ is the union of the sets of RCF and mediant convergents of $x$. It is for this reason that the LCF expansion was called the mother of all SRCF expansions in Dajani and Kraaikamp (op. cit.).
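The interleaving of RCF and mediant convergents is easy to observe on an example. The sketch below iterates the map $L$ in floating point (adequate for a handful of digits) and builds the LCF convergents with the usual recurrences; the names `lehner_digits` and `lcf_convergents` are illustrative.

```python
from fractions import Fraction
from math import sqrt

def lehner_digits(x, count):
    """Pairs (b_n, e_{n+1}), n = 0..count-1, read off the orbit of x in [1, 2) under L."""
    out = []
    for _ in range(count):
        b, e = (2, -1) if x < 1.5 else (1, 1)
        out.append((b, e))
        x = e / (x - b)
    return out

def lcf_convergents(digits):
    """Convergents of [b0; e1/b1, e2/b2, ...] given the pairs (b_n, e_{n+1})."""
    p_prev, q_prev = 1, 0
    p, q = digits[0][0], 1
    out = [Fraction(p, q)]
    # digits[n] supplies b_n, digits[n-1] supplies e_n
    for (b, _), (_, e) in zip(digits[1:], digits):
        p_prev, p = p, b * p + e * p_prev
        q_prev, q = q, b * q + e * q_prev
        out.append(Fraction(p, q))
    return out
```

For $x = \sqrt{2} = [1; 2, 2, \cdots]$ the first LCF convergents are $2, 1, 4/3, 3/2, 10/7, 7/5, 24/17, 17/12$: the RCF convergents $1, 3/2, 7/5, 17/12$ alternate with the mediants $4/3, 10/7, 24/17$, preceded by the initial term $2$.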

Proposition 4.4.2 Let $x \in [1, 2) \setminus \mathbb{Q}$, with RCF expansion $[\, 1; a_1, a_2, \cdots ]$. Then the LCF expansion (4.4.3) of $x$ is given by the following algorithm.

(i) Let $n$ be the smallest $m \in \mathbb{N}$ for which $a_{m+1} > 1$. If $n = 0$, that is, $a_1 > 1$, then we replace $[\, 1; a_1, a_2, \cdots ]$ by

\[ [\, 2;\ \underbrace{-1/2, \cdots, -1/2}_{(a_1 - 2) \text{ times}},\ -1/1,\ 1/1,\ 1/a_2, \cdots ]. \]

If $n \ge 1$ then we replace

\[ [\, 1; 1, \cdots, 1, a_{n+1}, \cdots ] \]

by

\[ \iota_{n + a_{n+1} - 1}\left( \cdots \left( \iota_{n+1}\left( \iota_n\left( [\, 1; 1, \cdots, 1, a_{n+1}, \cdots ] \right) \right) \right) \cdots \right) = [\, 1;\ \underbrace{1/1, \cdots, 1/1}_{(n-1) \text{ times}},\ 1/2,\ \underbrace{-1/2, \cdots, -1/2}_{(a_{n+1} - 2) \text{ times}},\ -1/1,\ 1/1,\ 1/a_{n+2}, \cdots ], \]

where $\iota_n$ is defined as in Subsection 4.4.1. Denote the SRCF expansion of $x$ thus obtained by

\[ [\, b'_0; e'_1/b'_1, e'_2/b'_2, \cdots ]. \tag{4.4.4} \]

(ii) Let $n' > n$ be the smallest integer $m' > n$ for which $e'_{m'+1} = 1$ and $b'_{m'+1} > 1$. Apply to (4.4.4) the procedure from (i) to $b'_{n'+1}$.

The proof is easy and left to the reader. □

Remark. It follows from the very insertion mechanism that any RCF or mediant convergent is an LCF convergent. Conversely, the sequence of LCF convergents is obtained after all mediant convergents have been inserted into the sequence of RCF convergents. Another immediate consequence is that the LCF expansion of a quadratic irrationality is (eventually) periodic. □

Note that the transformation $L$ [which is implicit in Lehner (1994)] is isomorphic to the transformation $I : [0, 1) \to [0, 1)$ defined by

\[ I(x) = \begin{cases} \dfrac{x}{1 - x} & \text{if } 0 \le x < 1/2, \\[2ex] \dfrac{1 - x}{x} & \text{if } 1/2 \le x < 1, \end{cases} \]

which was used by Ito (1989) to generate the RCF and mediant convergents of any $x \in [0, 1)$. More precisely, we have

\[ L(x) = I(x - 1) + 1, \quad x \in [1, 2). \]

We also have

\[ L(x) = \frac{1}{I(h(x - 1))}, \quad x \in [1, 2), \]

where the bijective function $h : [0, 1) \to [1/3, 2/3)$ is defined by

\[ h(x) = \begin{cases} \dfrac{1}{2 - x} & \text{if } 0 \le x < 1/2, \\[2ex] \dfrac{x}{x + 1} & \text{if } 1/2 \le x < 1. \end{cases} \]

Ito (op. cit.) showed that $I$ is $\nu$-preserving, where $\nu$ is the σ-finite, infinite measure on $\mathcal{B}_{[0,1)}$ with density $x^{-1}$, $x \in (0, 1)$, and that $([0, 1), \mathcal{B}_{[0,1)}, I, \nu)$ is an ergodic dynamical system. This implies that $L$ is $\mu$-preserving, where $\mu$ is the σ-finite, infinite measure on $\mathcal{B}_{[1,2)}$ with density $(x - 1)^{-1}$, $x \in (1, 2)$, and that $([1, 2), \mathcal{B}_{[1,2)}, L, \mu)$ is an ergodic dynamical system underlying the LCF expansion.
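The two identities relating $L$ to Ito's map $I$ can be checked exactly on rational points. A small sketch follows; the function names `lehner`, `ito` and `h` simply mirror the symbols above.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def lehner(x):
    """L(x) = e(x) / (x - b(x)) on [1, 2)."""
    b, e = (2, -1) if x < Fraction(3, 2) else (1, 1)
    return e / (x - b)

def ito(x):
    """Ito's map I on [0, 1)."""
    return x / (1 - x) if x < HALF else (1 - x) / x

def h(x):
    """The bijection h from [0, 1) onto [1/3, 2/3)."""
    return 1 / (2 - x) if x < HALF else x / (x + 1)
```

On a grid such as $x = 1 + k/200$, $0 \le k < 200$, both `lehner(x) == ito(x - 1) + 1` and `lehner(x) == 1 / ito(h(x - 1))` hold exactly.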


We will now exhibit the relationship between the LCF expansion and analgorithm yielding the so called Farey continued fraction (FCF ) expansion.The latter is an infinite CF expansion of any x ∈ [−1, 0)∪ (0,∞) of the form

f1

d1 +f2

d2 +.. .

:= [ f1/d1, f2/d2, · · · ] , (4.4.5)

where $(d_n, f_n)$ is equal to either (1, 1) or (2, −1), n ∈ N+. Formally, as shown by Dajani and Kraaikamp (op. cit.), if we define the transformation F : [−1, ∞) → [−1, ∞) by

$$F(x) = \begin{cases} \dfrac{f(x)}{x} - d(x) & \text{if } x \ne 0, \\[1ex] 0 & \text{if } x = 0, \end{cases}$$

where

$$(d(x), f(x)) = \begin{cases} (2, -1) & \text{if } -1 \le x < 0, \\ (1, 1) & \text{if } x \ge 0, \end{cases}$$

then

$$(d_n(x), f_n(x)) = \left( d(F^{n-1}(x)), f(F^{n-1}(x)) \right), \quad x \in [-1, \infty),$$

for any n ∈ N+. Here $F^n$, n ∈ N+, denotes the composition of F with itself n times while $F^0$ is the identity map.
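The digit recursion above can be sketched in Python with exact rationals. This is only an illustration: the names `digit_pair`, `fcf_digits` and `fcf_value` are hypothetical, and stopping as soon as the orbit reaches 0 is a convention adopted here for rational inputs (the text deals with infinite expansions).

```python
from fractions import Fraction

def digit_pair(x):
    """(d(x), f(x)) as defined in the text: (2, -1) on [-1, 0), (1, 1) on [0, oo)."""
    return (2, -1) if x < 0 else (1, 1)

def fcf_digits(x, n):
    """Return up to n digit pairs (d_k, f_k) of x and the remaining tail F^k(x)."""
    digits = []
    for _ in range(n):
        if x == 0:                    # F(0) = 0: a rational orbit may stop here
            break
        d, f = digit_pair(x)
        digits.append((d, f))
        x = Fraction(f) / x - d       # one application of F
    return digits, x

def fcf_value(digits, tail):
    """Evaluate f_1/(d_1 + f_2/(d_2 + ... + f_k/(d_k + tail)))."""
    value = tail
    for d, f in reversed(digits):
        value = Fraction(f) / (d + value)
    return value

if __name__ == "__main__":
    digits, tail = fcf_digits(Fraction(5, 7), 10)
    print(digits, tail, fcf_value(digits, tail))
```

Because x = f(x)/(d(x) + F(x)) at every step, folding the digits back with `fcf_value` should reproduce the starting point exactly.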

By its very definition the FCF expansion is not an SRCF expansion since the condition $f_{n+1} + d_n \ge 1$, n ∈ N+, is violated.

Put D = [1, 2) × [−1, ∞), and define the transformation $\bar{L} : D \to D$ by

$$\bar{L}(x, y) = \left( L(x), \frac{e(x)}{b(x) + y} \right), \quad (x, y) \in D.$$

It is easy to check that $\bar{L}$ is a one-to-one transformation of D′ := [1, 2) × ([−1, 0) ∪ (0, ∞)) with inverse

$$\bar{L}^{-1}(x, y) = \left( \frac{f(y)}{x} + d(y), F(y) \right), \quad (x, y) \in D'.$$

Also, for any n ≥ 2 we have

$$\bar{L}^n(x, y) = \left( L^n(x), [\, e_n(x)/b_{n-1}(x), \cdots, e_2(x)/b_1(x), e_1(x)/(b_0(x) + y) \,] \right)$$

whatever (x, y) ∈ D, and

$$\bar{L}^{-n}(x, y) = \left( [\, d_n(y); f_n(y)/d_{n-1}(y), \cdots, f_2(y)/d_1(y), f_1(y)/x \,], F^n(y) \right)$$

whatever (x, y) ∈ D′.

Remark. It is interesting to compare the last two equations above with (1.3.1′) and (1.3.2′). This suggests that developments similar to those in Section 1.3 might be possible. □

Theorem 4.4.3 The quadruple $(D, \mathcal{B}_D, \bar{L}, \bar{\mu})$ is an ergodic dynamical system which is a natural extension of the dynamical system $([1, 2), \mathcal{B}_{[1,2)}, L, \mu)$. Here $\bar{\mu}$ is the σ-finite, infinite measure on $\mathcal{B}_D$ with density $(x + y)^{-2}$, (x, y) ∈ D = [1, 2) × [−1, ∞).

Proof. Let $\pi_1 : [1, 2) \times [-1, \infty) \to [1, 2)$ denote the projection onto the first axis. Cf. Remark 1 after Proposition 4.0.5. Then it is easy to check that $\pi_1 \circ \bar{L} = L \circ \pi_1$, and that

$$\bar{\mu}\left( \pi_1^{-1}(A) \right) = \mu(A), \quad A \in \mathcal{B}_{[1,2)}.$$

We should next show that $\bar{L}$ is $\bar{\mu}$-preserving and, finally, that the σ-algebra generated by

$$\bigcup_{n \in \mathbf{N}} \bar{L}^n \pi_1^{-1}\left( \mathcal{B}_{[1,2)} \right)$$

coincides with $\mathcal{B}_D$. We leave the details to the reader, who can find them in Dajani and Kraaikamp (op. cit.). □

Let us denote by φ the σ-finite, infinite measure on B[−1,∞) with density(x+1)−1−(x+2)−1, x ∈ (−1,∞). It is easy to check that F is φ-preserving.

Theorem 4.4.4 The map ξ : [−1, 0) ∪ (0, ∞) → [1, 2) defined by

$$\xi(x) = [\, d_1; f_1/d_2, f_2/d_3, \cdots ],$$

if x ∈ [−1, 0) ∪ (0, ∞) has FCF expansion

$$x = [\, f_1/d_1, f_2/d_2, \cdots ],$$

is an isomorphism from $([-1, \infty), \mathcal{B}_{[-1,\infty)}, F, \phi)$ to $([1, 2), \mathcal{B}_{[1,2)}, L, \mu)$.


Proof. It is clear that ξ is bijective. Since

$$L(\xi(x)) = L([\, d_1; f_1/d_2, f_2/d_3, \cdots ]) = [\, d_2; f_2/d_3, f_3/d_4, \cdots ] = \xi([\, f_2/d_2, f_3/d_3, \cdots ]) = \xi(F(x)),$$

we only need to show that ξ is measurable and that $\mu(A) = \phi(\xi^{-1}(A))$ for any $A \in \mathcal{B}_{[1,2)}$. Whilst measurability is obvious, the equation above can be easily checked. The details can be found in Dajani and Kraaikamp (op. cit.). □

An immediate consequence of Theorems 4.4.3 and 4.4.4 is that $([-1, \infty), \mathcal{B}_{[-1,\infty)}, F, \phi)$ is an ergodic dynamical system underlying the FCF expansion.

Remark. Corollary 4.1.10 in conjunction with the insertion concept provides a heuristic argument why the dynamical system $([1, 2), \mathcal{B}_{[1,2)}, L, \mu)$ should be ergodic, where L is µ-preserving for a σ-finite, infinite measure µ. After all, an insertion before a digit > 1 is simply building a tower over the RCF cylinder corresponding to that digit. Since the LCF expansion is obtained by using insertion as many times as possible in order to ‘shrink away’ any RCF digit > 1, it follows that the system thus obtained should be ergodic (it includes the RCF dynamical system as an induced system), but by Corollary 4.1.10 it should have infinite mass. □

The next result corresponds to Proposition 4.1.8 for the values p = −1, 0, 1 there.

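Theorem 4.4.5 Let x ∈ [1, 2) \ Q with LCF expansion [ b0; e1/b1, e2/b2, · · · ]. Then

$$\lim_{n\to\infty} \frac{n}{\frac{1}{b_1} + \cdots + \frac{1}{b_n}} = 2 \quad \text{a.e.},$$

$$\lim_{n\to\infty} \sqrt[n]{b_1 \cdots b_n} = 2 \quad \text{a.e.},$$

$$\lim_{n\to\infty} \frac{b_1 + \cdots + b_n}{n} = 2 \quad \text{a.e.}$$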


Proof. Let [1; a1, a2, · · · ] be the RCF expansion of x. For any given sufficiently large m ∈ N+ there (uniquely) exist integers k ∈ N+ and j ∈ N such that

$$m = a_1 + \cdots + a_k + j, \quad 0 \le j < a_{k+1}.$$

By Proposition 4.4.2 the LCF expansion is obtained by replacing any RCF digit ℓ by a block of LCF digits of length ℓ consisting of (ℓ − 1) 2’s followed by one 1. Then

$$\frac{1}{b_1} + \cdots + \frac{1}{b_m} = k + \frac{1}{2} \sum_{i=1}^{k} (a_i - 1) + \frac{j}{2} = \frac{m + k}{2}.$$

This implies that

$$\frac{m}{\frac{1}{b_1} + \cdots + \frac{1}{b_m}} = \frac{1}{\frac{1}{2}\left( 1 + \frac{k}{m} \right)}.$$

Since 0 ≤ j < a_{k+1}, we have

$$\frac{k}{m} \le \frac{1}{\frac{1}{k} \sum_{i=1}^{k} a_i},$$

which converges a.e. to 0 by Corollary 4.1.10. Hence

$$\lim_{m\to\infty} \frac{m}{\frac{1}{b_1} + \cdots + \frac{1}{b_m}} = 2.$$

Since any $b_n$, n ∈ N+, is equal to either 1 or 2, recalling the classical inequalities

$$\frac{m}{\frac{1}{b_1} + \cdots + \frac{1}{b_m}} \le \sqrt[m]{b_1 \cdots b_m} \le \frac{b_1 + \cdots + b_m}{m} \ (\le 2),$$

the result follows. □
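The counting step in the proof can be checked on a finite example. The sketch below, with hypothetical helper names, builds the LCF digit blocks described above (each RCF digit ℓ becomes ℓ − 1 twos followed by a single 1) and compares the sum of reciprocals with (m + k)/2; it assumes the digits b1, b2, ... are listed block by block exactly as in the proof.

```python
from fractions import Fraction

def lcf_blocks(rcf_digits):
    """Replace each RCF digit a by the block 2, ..., 2, 1 of length a."""
    out = []
    for a in rcf_digits:
        out.extend([2] * (a - 1) + [1])
    return out

def harmonic_mean(digits):
    """m / (1/b_1 + ... + 1/b_m), computed exactly."""
    return Fraction(len(digits)) / sum(Fraction(1, b) for b in digits)

if __name__ == "__main__":
    rcf = [3, 1, 4, 1, 5]            # a_1, ..., a_k with k = 5
    j = 2                            # extra twos from the next block (needs a_6 > 2)
    b = lcf_blocks(rcf) + [2] * j
    m, k = len(b), len(rcf)
    print(sum(Fraction(1, d) for d in b), Fraction(m + k, 2), harmonic_mean(b))
```

For this choice m = 16 and k = 5, so both the reciprocal sum and (m + k)/2 come out as 21/2.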

Corollary 4.4.6 Let x ∈ [−1, ∞) \ Q, with FCF expansion [ f1/d1, f2/d2, · · · ]. Then

$$\lim_{n\to\infty} \frac{n}{\frac{1}{d_1} + \cdots + \frac{1}{d_n}} = 2 \quad \text{a.e.},$$

$$\lim_{n\to\infty} \sqrt[n]{d_1 \cdots d_n} = 2 \quad \text{a.e.},$$

$$\lim_{n\to\infty} \frac{d_1 + \cdots + d_n}{n} = 2 \quad \text{a.e.}$$

The proof follows from Theorems 4.4.4 and 4.4.5. □

4.4.3 The backward continued fraction expansion

Until now we have used only the insertion mechanism in this section. As an example of combining singularization and insertion we discuss here the backward continued fraction (BCF) expansion.

Any irrational number ω ∈ I has an infinite CF expansion of the form

$$1 - \cfrac{1}{c_1 - \cfrac{1}{c_2 - \ddots}} := [\, 1; -1/c_1, -1/c_2, \cdots ], \tag{4.4.6}$$

where $2 \le c_n = c_n(\omega) \in \mathbf{N}_+$, so that (4.4.6) is an SRCF expansion. There is a transformation β : I → I naturally associated with the RCF transformation τ, which is defined by

$$\beta(x) = \begin{cases} (x-1)^{-1} - \left\lfloor (x-1)^{-1} \right\rfloor & \text{if } x \in [0, 1), \\ 0 & \text{if } x = 1. \end{cases}$$

The graph of β can be obtained from that of τ by reflecting the latter in the line x = 1/2. It is for this reason that (4.4.6) has been called ‘backward’. Note also that $\beta(x) = -N_0(x - 1)$, x ∈ I, where $N_0$ is defined in Subsection 4.3.1. In terms of β, the incomplete BCF quotients are given by $c_n = c_1(\beta^{n-1}(\omega))$, n ∈ N+, with $c_1 = \lfloor (1 - \omega)^{-1} \rfloor$, ω ∈ Ω. Here $\beta^n$, n ∈ N+, denotes the composition of β with itself n times while $\beta^0$ is the identity map. Renyi (1957) showed that β is ν-preserving, where ν is Ito’s σ-finite, infinite measure with density $x^{-1}$, x ∈ (0, 1), which has been considered in Subsection 4.4.2, and that the dynamical system $(I, \mathcal{B}_I, \beta, \nu)$ is ergodic. See also Adler and Flatto (1984).


As with Proposition 4.4.2 we leave to the reader the proof of the following result.

Proposition 4.4.7 Let ω ∈ Ω with RCF expansion [a1, a2, · · · ]. Then the BCF expansion (4.4.6) of ω is given by the following algorithm.

(i) If a1 = 1 then singularize a1 to arrive at

$$[\, 1; -1/(a_2 + 1), 1/a_3, \cdots ]$$

as a new SRCF expansion of ω. If a1 > 1 then insert (a1 − 1) times −1/1 before a1 to arrive at

$$[\, 1; \underbrace{-1/2, \cdots, -1/2}_{(a_1 - 2)\ \text{times}}, -1/1, 1/1, 1/a_2, \cdots ]$$

as a new SRCF expansion of ω, and then singularize the digit 1 appearing before 1/a2 in this expansion of ω. In either case we obtain as SRCF expansion of ω

$$[\, 1; (-1/2)^{a_1 - 1}, -1/(a_2 + 1), 1/a_3, \cdots ], \tag{4.4.7}$$

where $(-1/2)^{a_1 - 1}$ abbreviates $\underbrace{-1/2, \cdots, -1/2}_{(a_1 - 1)\ \text{times}}$.

(ii) Let n be the smallest integer m ∈ N+ for which $e_m = 1$ in (4.4.7). Apply to the latter expansion the procedure from (i) to $a_n$.

Remarks. 1. The above insertion/singularization mechanism implies that ω has a BCF expansion

$$[\, 1; (-1/2)^{a_1 - 1}, -1/(a_2 + 2), (-1/2)^{a_3 - 1}, -1/(a_4 + 2), \cdots ]. \tag{4.4.8}$$

See also Zagier (1981, Aufgabe 3, p. 131). It also follows easily from (4.4.8) that every quadratic irrationality has an (eventually) periodic BCF expansion.

2. Again, as for the LCF expansion, it heuristically follows from Corollary 4.1.10 and the insertion mechanism that the BCF transformation β should be ergodic, with invariant σ-finite, infinite measure. □

For the LCF expansion it was intuitively clear that $\sqrt[n]{b_1 \cdots b_n} \to 2$ a.e. as n → ∞ since the only digits are 1 and 2, and ‘there are very few 1’s against the 2’s’ (by Corollary 4.1.10). For the BCF expansion such an argument clearly does not work. However, we have the following result.

Theorem 4.4.8 Let ω ∈ Ω with BCF expansion (4.4.6). Then

$$\lim_{n\to\infty} \sqrt[n]{c_1 \cdots c_n} = 2 \quad \text{a.e.}$$

and

$$\lim_{n\to\infty} \frac{n}{\frac{1}{c_1} + \cdots + \frac{1}{c_n}} = 2 \quad \text{a.e.}$$

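Proof. Let [a1, a2, · · · ] be the RCF expansion of ω. For any given sufficiently large m ∈ N+ there (uniquely) exist integers k ∈ N+ and j ∈ N such that

$$m = a_1 + a_3 + \cdots + a_{2k-1} + j, \quad 0 \le j < a_{2k+1}.$$

It follows from (4.4.8) that

$$c_1 \cdots c_m = 2^{\sum_{i=1}^{k} (a_{2i-1} - 1) + j - 1} \prod_{i=1}^{k} (a_{2i} + 2),$$

and therefore

$$\frac{1}{m} \sum_{i=1}^{m} \log c_i = \frac{\log 2}{m} \left( \sum_{i=1}^{k} a_{2i-1} - k + j - 1 \right) + \frac{1}{m} \sum_{i=1}^{k} \log(a_{2i} + 2)$$

$$= (\log 2) \left( 1 - \frac{k + 1}{\sum_{i=1}^{k} a_{2i-1} + j} \right) + \frac{\sum_{i=1}^{k} \log(a_{2i} + 2)}{\sum_{i=1}^{k} a_{2i-1} + j}.$$

Since

$$\frac{k + 1}{\sum_{i=1}^{k} a_{2i-1} + j} = \frac{1}{\frac{1}{k+1} \sum_{i=1}^{k} a_{2i-1} + \frac{j}{k+1}} \to 0 \quad \text{a.e.}$$

as m → ∞, and

$$\frac{\sum_{i=1}^{k} \log(a_{2i} + 2)}{\sum_{i=1}^{k} a_{2i-1} + j} \to 0 \quad \text{a.e.}$$

as m → ∞, we deduce that

$$\sqrt[m]{c_1 \cdots c_m} \to 2 \quad \text{a.e.}$$

as m → ∞. Next, since $c_n \ge 2$, n ∈ N+, we have

$$\frac{m}{\frac{1}{c_1} + \cdots + \frac{1}{c_m}} \ge 2.$$

Using the same inequalities as in the proof of Theorem 4.4.5 we therefore obtain

$$2 \le \lim_{m\to\infty} \frac{m}{\frac{1}{c_1} + \cdots + \frac{1}{c_m}} \le \lim_{m\to\infty} \sqrt[m]{c_1 \cdot c_2 \cdots c_m} = 2,$$

that is,

$$\lim_{m\to\infty} \frac{m}{\frac{1}{c_1} + \cdots + \frac{1}{c_m}} = 2 \quad \text{a.e.}$$

□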

Remark. The asymptotic behaviour of the arithmetic mean

$$\frac{c_1 + \cdots + c_m}{m}$$

as m → ∞ was posed as an open problem in Dajani and Kraaikamp (2000). If we write m as before, then an easy calculation yields

$$\frac{c_1 + \cdots + c_m}{m} = 2 + \frac{\sum_{i=1}^{k} a_{2i}}{j + \sum_{i=1}^{k} a_{2i-1}},$$

with 0 ≤ j < a_{2k+1}. Thus we need to study the behaviour of

$$\frac{\sum_{i=1}^{k} a_{2i}}{\sum_{i=1}^{k} a_{2i-1}} \tag{4.4.9}$$

as k → ∞. The asymptotic behaviour of the numerator in (4.4.9) is the same as that of the denominator, and Aaronson (1986) showed that the fraction converges to 1 in probability. However, one expects that infinitely often the denominator is much larger than the numerator, and vice versa. Thus Dajani and Kraaikamp (op. cit.) conjectured that the lim inf and lim sup of (4.4.9) are a.e. equal to 0 and +∞, respectively. Recently, Aaronson and Nakada (2001) have proved this conjecture. □
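The ‘easy calculation’ behind the arithmetic-mean identity can be reproduced on a finite example. The following sketch uses hypothetical helper names and assumes the block structure read off (4.4.8): each pair (a_{2i−1}, a_{2i}) of RCF digits contributes a_{2i−1} − 1 twos followed by the digit a_{2i} + 2.

```python
from fractions import Fraction

def bcf_blocks(rcf_digits):
    """BCF digits for the RCF digit pairs (a_1, a_2), (a_3, a_4), ..."""
    out = []
    for odd, even in zip(rcf_digits[0::2], rcf_digits[1::2]):
        out.extend([2] * (odd - 1) + [even + 2])
    return out

def arithmetic_mean(digits):
    """(c_1 + ... + c_m) / m, computed exactly."""
    return Fraction(sum(digits), len(digits))

if __name__ == "__main__":
    rcf = [3, 1, 4, 1, 5, 9]          # k = 3 pairs of RCF digits
    j = 2                             # extra twos from the next block (needs a_7 > 2)
    c = bcf_blocks(rcf) + [2] * j
    odd_sum, even_sum = sum(rcf[0::2]), sum(rcf[1::2])
    print(arithmetic_mean(c), 2 + Fraction(even_sum, j + odd_sum))
```

Here m = 14, the even-indexed RCF digits sum to 11, and both expressions evaluate to 39/14.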

Appendix 1: Spaces, functions, and measures

A1.1

Let X be an arbitrary non-empty set. A non-empty collection $\mathcal{X}$ of subsets of X is said to be a σ-algebra (in X) if and only if it is closed under the formation of complements and countable unions. Clearly, ∅ and X both belong to $\mathcal{X}$, and $\mathcal{X}$ is also closed under the formation of countable intersections. For any non-empty collection $\mathcal{C}$ of subsets of X the σ-algebra generated by $\mathcal{C}$, denoted $\sigma(\mathcal{C})$, is defined as the smallest σ-algebra in X which contains $\mathcal{C}$. Clearly, $\sigma(\mathcal{C})$ is the intersection of all σ-algebras in X which contain $\mathcal{C}$.

A pair $(X, \mathcal{X})$ consisting of a non-empty set X and a σ-algebra $\mathcal{X}$ in X is called a measurable space. In the special case where X is a denumerable set the usual σ-algebra in X is $\mathcal{P}(X)$, the collection of all subsets of X. Clearly, $\mathcal{P}(X)$ is generated by the elements of X: $\mathcal{P}(X) = \sigma(\{x\} : x \in X)$.

The product of two measurable spaces $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$ is the measurable space $(X \times Y, \mathcal{X} \otimes \mathcal{Y})$, where the product σ-algebra $\mathcal{X} \otimes \mathcal{Y}$ is defined as $\sigma(\mathcal{C})$ with $\mathcal{C} = (A \times B : A \in \mathcal{X}, B \in \mathcal{Y})$.

A1.2

Let $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$ be two measurable spaces. A map f : X → Y from X into Y is said to be $(\mathcal{X}, \mathcal{Y})$-measurable or a Y-valued random variable (r.v.) on X if and only if the inverse image $f^{-1}(A) = (x \in X : f(x) \in A)$ of every set $A \in \mathcal{Y}$ is in $\mathcal{X}$. Setting $f^{-1}(\mathcal{Y}) = (f^{-1}(A) : A \in \mathcal{Y})$, the above condition can be compactly written as $f^{-1}(\mathcal{Y}) \subset \mathcal{X}$. [Note that $f^{-1}(\mathcal{Y})$ is always a σ-algebra in X whatever f : X → Y!]

Let $(X, \mathcal{X})$ be a measurable space, let $((Y_i, \mathcal{Y}_i))_{i \in I}$ be a family of measurable spaces, and for any i ∈ I let $f_i$ be a $Y_i$-valued r.v. on X. Then the σ-algebra $\sigma\left( \cup_{i \in I} f_i^{-1}(\mathcal{Y}_i) \right)$ is called the σ-algebra generated by the family $(f_i)_{i \in I}$ and is denoted $\sigma((f_i)_{i \in I})$. Clearly, this is the smallest σ-algebra $\mathcal{S} \subset \mathcal{X}$ having the property that $f_i$ is $(\mathcal{S}, \mathcal{Y}_i)$-measurable for any i ∈ I.

A1.3

Let $(X, \mathcal{X})$ be a measurable space. A function $\mu : \mathcal{X} \to \mathbf{R}_+$ is said to be a (finite) measure on $\mathcal{X}$ if and only if it is completely additive, that is, for any sequence $(A_i)_{i \in \mathbf{N}_+}$ of pairwise disjoint elements of $\mathcal{X}$ we have $\mu\left( \cup_{i \in \mathbf{N}_+} A_i \right) = \sum_{i \in \mathbf{N}_+} \mu(A_i)$. Complete additivity is equivalent to finite additivity [that is, for any finite collection $A_1, \ldots, A_n$ of pairwise disjoint elements of $\mathcal{X}$, we have $\mu\left( \cup_{i=1}^{n} A_i \right) = \sum_{i=1}^{n} \mu(A_i)$] in conjunction with continuity at ∅ (that is, for any decreasing sequence $A_1 \supset A_2 \supset \ldots$ of elements of $\mathcal{X}$ with $\cap_{i \in \mathbf{N}_+} A_i = \emptyset$ we have $\lim_{n\to\infty} \mu(A_n) = 0$). Clearly, finite additivity implies µ(∅) = 0. In the special case where X is a denumerable set a measure µ on $\mathcal{P}(X)$ is defined by simply giving the values $\mu(\{x\})$ for the elements x ∈ X. A probability on $\mathcal{X}$ is a measure P on $\mathcal{X}$ satisfying P(X) = 1. An important example of a probability on $\mathcal{X}$ is that of the probability $\delta_x$ concentrated at x for any given x ∈ X, which is defined by $\delta_x(A) = I_A(x)$, $A \in \mathcal{X}$. The collection of all measures (probabilities) on $\mathcal{X}$ will be denoted $\mathrm{m}(\mathcal{X})$ ($\mathrm{pr}(\mathcal{X})$).

A triple $(X, \mathcal{X}, P)$ consisting of a measurable space $(X, \mathcal{X})$ and a probability P on $\mathcal{X}$ is called a probability space. [The traditional notation for a probability space is $(\Omega, \mathcal{K}, P)$. The points ω ∈ Ω are interpreted as the possible outcomes (elementary events) of a random experiment, and the sets $A \in \mathcal{K}$ as the (random) events associated with it; these are the subsets of Ω arising as the truth sets of certain statements concerning the experiment.] We say that $A \in \mathcal{X}$ occurs P-almost surely, and write A P-a.s., if and only if P(A) = 1. Let $(Y, \mathcal{Y})$ be a measurable space and let f be a Y-valued r.v. on X. The P-distribution of f is the probability $Pf^{-1}$ on $\mathcal{Y}$ defined by $(Pf^{-1})(A) = P(f^{-1}(A))$, $A \in \mathcal{Y}$.

Let $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$ be two measurable spaces. The product measure of $\mu \in \mathrm{m}(\mathcal{X})$ and $\nu \in \mathrm{m}(\mathcal{Y})$ is the (unique) measure $\mu \otimes \nu \in \mathrm{m}(\mathcal{X} \otimes \mathcal{Y})$ satisfying the equation $\mu \otimes \nu(A \times B) = \mu(A)\nu(B)$ for any $A \in \mathcal{X}$ and $B \in \mathcal{Y}$.

A1.4

Let X be a metric space with metric d. The usual σ-algebra in X, denoted $\mathcal{B}_X$, is that of Borel subsets of X, that is, the σ-algebra generated by the collection of all open subsets of X. In the special case where $X = \mathbf{R}^n$ (n-dimensional Euclidean space) we write $\mathcal{B}^n$ for $\mathcal{B}_{\mathbf{R}^n}$, n ∈ N+, and $\mathcal{B} = \mathcal{B}^1$. Further, if X is a Borel subset M of $\mathbf{R}^n$, then $\mathcal{B}_M = \mathcal{B}^n \cap M = (A \cap M : A \in \mathcal{B}^n)$, n ∈ N+.

A sequence $(\mu_n)_{n \in \mathbf{N}_+}$ of measures on $\mathcal{B}_X$ is said to converge weakly to a measure µ on $\mathcal{B}_X$, and we write $\mu_n \xrightarrow{w} \mu$, if and only if

$$\lim_{n\to\infty} \int_X h \, d\mu_n = \int_X h \, d\mu$$

for any $h \in C_r(X)$ = the set of all real-valued bounded continuous functions on (X, d). An equivalent definition is obtained by asking that

$$\lim_{n\to\infty} \mu_n(A) = \mu(A) \tag{A1.1}$$

for any $A \in \mathcal{B}_X$ for which µ(∂A) = 0, where ∂A is the boundary of A defined as the closure of A minus the interior of A. In the special case where X = R, putting $F_n(x) = \mu_n((-\infty, x])$ and $F(x) = \mu((-\infty, x])$, x ∈ R, equation (A1.1) holds if and only if $\lim_{n\to\infty} \mu_n(\mathbf{R}) = \mu(\mathbf{R})$ and $\lim_{n\to\infty} F_n(x) = F(x)$ for any point of continuity x of F.

The Prokhorov metric $d_P$ on $\mathrm{pr}(\mathcal{B}_X)$ is defined by

$$d_P(P, Q) = \inf\left( \varepsilon > 0 : P(A) \le Q(A^{\varepsilon}) + \varepsilon,\ A \subset X,\ A \text{ closed} \right), \quad P, Q \in \mathrm{pr}(\mathcal{B}_X),$$

where $A^{\varepsilon} = (x : d(x, A) < \varepsilon)$ and $d(x, A) = \inf(d(x, y) : y \in A)$. If the metric space (X, d) is separable, then for $P, P_n \in \mathrm{pr}(\mathcal{B}_X)$, n ∈ N+, the weak convergence of $P_n$ to P is equivalent to $\lim_{n\to\infty} d_P(P_n, P) = 0$.

Let (X, d) and (Y, d′) be two metric spaces. Consider a Y-valued r.v. f on X. The set $D_f$ of all discontinuity points of f belongs to $\mathcal{B}_X$ since it can be written as $\cup_{\varepsilon} \cap_{\delta} A_{\varepsilon,\delta}$, where ε and δ vary over the positive rational numbers, and $A_{\varepsilon,\delta}$ is the (open) set of all points x ∈ X for which there exist x′, x′′ ∈ X such that d(x, x′) < δ, d(x, x′′) < δ and d′(f(x′), f(x′′)) ≥ ε.

Proposition A1.1 If $P_n, P \in \mathrm{pr}(\mathcal{B}_X)$, $P_n \xrightarrow{w} P$, and $P(D_f) = 0$, then $P_n f^{-1} \xrightarrow{w} P f^{-1}$.

In particular, the above result holds for a continuous f for which clearly $D_f = \emptyset$. For a characterization via weak convergence of almost everywhere continuous functions f, that is, such that $P(D_f) = 0$, see Mazzone (1995/96).


A1.5

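In this section (X, d) is the real line with the usual Euclidean distance.

The characteristic function (ch.f.) or Fourier transform of a measure $\mu \in \mathrm{m}(\mathcal{B})$ is the complex-valued function $\hat{\mu}$ defined on R by

$$\hat{\mu}(t) = \int_{\mathbf{R}} e^{itx} \mu(dx), \quad t \in \mathbf{R}.$$

If $\hat{\mu} = \hat{\nu}$ for two measures $\mu, \nu \in \mathrm{m}(\mathcal{B})$, then µ = ν.

Proposition A1.2 (Levy-Cramer continuity theorem) Let $P, P_n \in \mathrm{pr}(\mathcal{B})$, n ∈ N+.

(i) $P_n \xrightarrow{w} P \in \mathrm{pr}(\mathcal{B})$ implies $\lim_{n\to\infty} \hat{P}_n = \hat{P}$ pointwise, and the convergence of ch.f.s is uniform on compact subsets of R.

(ii) If $\lim_{n\to\infty} \hat{P}_n = h$ pointwise and h is continuous at 0, then h is the ch.f. of a probability $P \in \mathrm{pr}(\mathcal{B})$ and $P_n \xrightarrow{w} P$.

Let $\mu, \nu \in \mathrm{m}(\mathcal{B})$. The convolution µ ∗ ν is the measure on $\mathcal{B}$ defined by

$$\mu * \nu(A) = \int_{\mathbf{R}} \mu(A - x) \nu(dx), \quad A \in \mathcal{B},$$

where A − x := (y − x : y ∈ A), x ∈ R.

The convolution operator ∗ is associative and commutative. We have

$$\widehat{\mu * \nu} = \hat{\mu}\, \hat{\nu}, \quad \mu, \nu \in \mathrm{m}(\mathcal{B}).$$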

For any n ∈ N+ let $f_i$, 1 ≤ i ≤ n, be real-valued r.v.s on a probability space $(\Omega, \mathcal{K}, P)$. The $f_i$ are said to be independent if and only if the σ-algebras $f_i^{-1}(\mathcal{B})$, 1 ≤ i ≤ n, are P-independent, that is,

$$P\left( \bigcap_{i=1}^{n} A_i \right) = \prod_{i=1}^{n} P(A_i)$$

for any $A_i \in f_i^{-1}(\mathcal{B})$, 1 ≤ i ≤ n. For independent real-valued r.v.s $f_i$, 1 ≤ i ≤ n, the ch.f. of the P-distribution $P\left( \sum_{i=1}^{n} f_i \right)^{-1}$ of the sum $\sum_{i=1}^{n} f_i$ is equal to the product of the ch.f.s of the P-distributions $P f_i^{-1}$ of the summands, 1 ≤ i ≤ n. Also, $P\left( \sum_{i=1}^{n} f_i \right)^{-1}$ is the convolution of the $P f_i^{-1}$, 1 ≤ i ≤ n.

Let $\mu \in \mathrm{m}(\mathcal{B})$. For any n ∈ N+ the nth convolution $\mu^{*n}$ of µ with itself is defined recursively by $\mu^{*1} = \mu$ and $\mu^{*n} = \mu^{*(n-1)} * \mu$ for n ≥ 2. Define also $\mu^{*0}$ as $\delta_0$.

Let $\mu \in \mathrm{m}(\mathcal{B})$. The Poisson probability Pois µ associated with µ is defined as

$$\operatorname{Pois} \mu = e^{-\mu(\mathbf{R})} \sum_{n \in \mathbf{N}} \frac{\mu^{*n}}{n!} = e^{\mu - \mu(\mathbf{R})}.$$

Its ch.f. is $\widehat{\operatorname{Pois} \mu} = \exp(\hat{\mu} - \hat{\mu}(0))$. The classical Poisson distribution P(θ) with parameter θ > 0 is $\operatorname{Pois}(\theta \delta_1)$.
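To make the series definition concrete, here is a small Python sketch that evaluates a truncated version of the sum for a finitely supported measure on the integers and compares Pois(θδ1) with the classical Poisson weights. The dictionary representation of a measure, the function names, and the truncation level `terms` are all assumptions of this illustration.

```python
import math
from collections import defaultdict

def convolve(mu, nu):
    """Convolution of two finitely supported measures given as {point: mass}."""
    out = defaultdict(float)
    for x, p in mu.items():
        for y, q in nu.items():
            out[x + y] += p * q
    return dict(out)

def pois(mu, terms=60):
    """Truncated series e^{-mu(R)} * sum_{n < terms} mu^{*n} / n!."""
    total_mass = sum(mu.values())
    acc = defaultdict(float)
    power = {0: 1.0}                      # mu^{*0} = delta_0
    for n in range(terms):
        for x, p in power.items():
            acc[x] += p / math.factorial(n)
        power = convolve(power, mu)       # mu^{*(n+1)} = mu^{*n} * mu
    scale = math.exp(-total_mass)
    return {x: scale * p for x, p in acc.items()}

if __name__ == "__main__":
    theta = 2.5
    p = pois({1: theta})                  # Pois(theta * delta_1)
    for k in range(5):
        print(k, p[k], math.exp(-theta) * theta**k / math.factorial(k))
```

For µ = θδ1 the n-th convolution power is θ^n δ_n, so the series puts mass e^{−θ} θ^k / k! at each integer k, which is the classical Poisson law.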

A measure on $\mathcal{B}$ is said to be a Levy measure if and only if it integrates the function $\min(1, x^2)$ on the whole of R. Given a Levy measure µ, the τ-centered Poisson probability $c_\tau \operatorname{Pois} \mu$, τ > 0, is defined as the probability with characteristic function

$$\exp\left( \int_{\mathbf{R}} \left( e^{itx} - 1 - itx\, I_{[-\tau,\tau]}(x) \right) \mu(dx) \right).$$

We have $c_\tau \operatorname{Pois} \mu = (\operatorname{Pois} \mu) * \delta_{b(\tau)}$, where

$$b(\tau) = -\int_{-\tau}^{\tau} x \, \mu(dx).$$

A probability $P \in \mathrm{pr}(\mathcal{B})$ is said to be infinitely divisible if and only if for any n ∈ N+ there exists $P_n \in \mathrm{pr}(\mathcal{B})$ such that $P_n^{*n} = P$.

Proposition A1.3 (Levy–Khinchin representation) $P \in \mathrm{pr}(\mathcal{B})$ is infinitely divisible if and only if there exist σ ≥ 0 and a Levy measure ν, and for any τ > 0 there exists $a_\tau \in \mathbf{R}$ such that

$$\hat{P}(t) = \exp\left( i t a_\tau - \frac{\sigma^2 t^2}{2} + \int_{\mathbf{R}} \left( e^{itx} - 1 - itx\, I_{[-\tau,\tau]}(x) \right) \nu(dx) \right), \quad t \in \mathbf{R}.$$

It follows from Proposition A1.3 that an infinitely divisible probability is the convolution of a normal distribution $N(a_\tau, \sigma^2)$ and a τ-centered Poisson probability $c_\tau \operatorname{Pois} \nu$. Either of the two terms can be degenerate, that is, the cases σ = 0 and ν ≡ 0 are allowed.

An important special class of infinitely divisible probabilities on $\mathcal{B}$ is that of stable probabilities. A probability $P \in \mathrm{pr}(\mathcal{B})$ is said to be stable if and only if for any n ∈ N+ there exist $A_n \in \mathbf{R}_{++}$ and $B_n \in \mathbf{R}$ such that

$$P^{*n} = P f_n^{-1},$$

where $f_n$ is the affine function on R defined by

$$f_n(x) = A_n x + B_n, \quad x \in \mathbf{R}. \tag{A1.2}$$


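If $B_n = 0$ for any n ∈ N+, then P is said to be strictly stable. It appears that the only constants $A_n$ allowed in (A1.2) are $A_n = n^{1/\alpha}$, n ∈ N+, with α ∈ (0, 2], and then α is called the order of P. A probability $P \in \mathrm{pr}(\mathcal{B})$ is stable of order α if and only if its ch.f. $\hat{P}$ has the form

$$\hat{P}(t) = \exp\left[ i a t - c |t|^{\alpha} \left( 1 - i b \, \operatorname{sgn} t \; \sigma(t, \alpha) \right) \right], \quad t \in \mathbf{R},$$

where a, b, c ∈ R with |b| ≤ 1 and c ≥ 0, and

$$\sigma(t, \alpha) = \begin{cases} \operatorname{tg} \dfrac{\pi\alpha}{2} & \text{if } \alpha \ne 1, \\[1ex] \dfrac{2}{\pi} \log |t| & \text{if } \alpha = 1. \end{cases}$$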

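In particular, a stable probability has order 2 if and only if it is normal.

An important example of a stable probability is that of the 1-centered Poisson probability $c_1 \operatorname{Pois} \mu_{k_1,k_2,\alpha}$, 0 < α < 2, k1, k2 ≥ 0, k1 + k2 > 0, whose Levy measure has density

$$\frac{\mu_{k_1,k_2,\alpha}(dx)}{dx} = \left( k_2 I_{(-\infty,0)}(x) + k_1 I_{(0,\infty)}(x) \right) |x|^{-1-\alpha}, \quad x \ne 0.$$

The ch.f. $h_{k_1,k_2,\alpha}$ of $c_1 \operatorname{Pois} \mu_{k_1,k_2,\alpha}$ is

$$h_{k_1,k_2,\alpha}(t) = \exp\left( k_2 \int_{-\infty}^{0} \left( e^{itx} - 1 - itx\, I_{[-1,0)}(x) \right) |x|^{-1-\alpha} dx + k_1 \int_{0}^{\infty} \left( e^{itx} - 1 - itx\, I_{(0,1]}(x) \right) x^{-1-\alpha} dx \right), \quad t \in \mathbf{R},$$

which can be expressed in terms of elementary functions as follows. We have

$$h_{k_1,k_2,1}(t) = \exp\left( i (k_2 - k_1)(C - 1) t - \frac{\pi (k_1 + k_2)}{2} \left( 1 + i \, \operatorname{sgn} t \, \frac{k_1 - k_2}{k_1 + k_2} \log |t| \right) |t| \right),$$

where C = 0.57721... is Euler’s constant, while for α ≠ 1, 0 < α < 2,

$$h_{k_1,k_2,\alpha}(t) = \exp\left( \frac{i (k_2 - k_1) t}{1 - \alpha} + \frac{(k_1 + k_2) \Gamma(2 - \alpha)}{\alpha(\alpha - 1)} \cos \frac{\pi\alpha}{2} \left( 1 + i \, \operatorname{sgn} t \, \frac{k_1 - k_2}{k_1 + k_2} \operatorname{tg} \frac{\pi\alpha}{2} \right) |t|^{\alpha} \right),$$

where Γ is the classical gamma function.

Actually, any stable probability of order α ≠ 2 has the form

$$\delta_a * c_1 \operatorname{Pois} \mu_{k_1,k_2,\alpha}$$

with a ∈ R, k1, k2 ≥ 0, k1 + k2 > 0.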

A1.6

Let $C = C_r(I)$ be the metric space of real-valued continuous functions on I = [0, 1] with the uniform metric

$$d(x, y) = \sup_{t \in I} |x(t) - y(t)|, \quad x, y \in C.$$

The space C is complete and separable. The σ-algebra $\mathcal{B}_C$ of Borel sets in (C, d) coincides with the σ-algebra $\mathcal{B}^I \cap C$. Here $\mathcal{B}^I$ denotes the σ-algebra in $\mathbf{R}^I$ generated by the collection of its subsets of the form $\prod_{t \in I} A_t$, where $A_t \in \mathcal{B}$, t ∈ I, and $A_t \ne \mathbf{R}$ for finitely many t ∈ I.

Of paramount importance is the probability W on $\mathcal{B}_C$ known as the Wiener measure, for which

$$W(x : x(0) = 0) = 1,$$

$$W(x : x(t_i) - x(t_{i-1}) \le a_i,\ 1 \le i \le k) = \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi (t_i - t_{i-1})}} \int_{-\infty}^{a_i} e^{-u^2 / (2 (t_i - t_{i-1}))} \, du$$

for any k ∈ N+, $0 \le t_0 < t_1 < \cdots < t_k \le 1$, $a_i \in \mathbf{R}$, 1 ≤ i ≤ k.

Let $D = D(I)$ $(\supset C_r(I))$ be the metric space of real-valued functions on I which are right continuous and have left limits, with the Skorohod metric $d_0$ to be defined below. Clearly, we can also consider the uniform metric d in D which is defined similarly to that in C, that is, $d(x, y) = \sup_{t \in I} |x(t) - y(t)|$, x, y ∈ D.

Let L denote the set of all strictly increasing continuous functions ℓ : I → I with ℓ(0) = 0, ℓ(1) = 1, and put

$$s_0(\ell) = \sup_{s \ne t} \left| \log \left[ (\ell(t) - \ell(s)) / (t - s) \right] \right|$$

for any ℓ ∈ L. The distance $d_0(x, y)$ $(\le d(x, y))$ for x, y ∈ D is defined as the infimum of all ε > 0 for which there exists ℓ ∈ L such that $s_0(\ell) \le \varepsilon$ and $\sup_{t \in I} |x(t) - y(\ell(t))| \le \varepsilon$. The metrics $d_0$ and d generate the same topology in D. Nevertheless, while D is complete and separable under $d_0$, separability does not hold under d.

The σ-algebra $\mathcal{B}_D$ of Borel sets in $(D, d_0)$ coincides with the σ-algebra $\mathcal{B}^I \cap D$. Wiener measure W can be immediately extended from $\mathcal{B}_C$ to $\mathcal{B}_D$ as the topologies induced in D by the metrics $d_0$ and d are identical. Hence $A \cap C \in \mathcal{B}_C$ for any $A \in \mathcal{B}_D$. This allows us to define W(A) = W(A ∩ C), $A \in \mathcal{B}_D$. Clearly, C is the support of W in D, that is, the smallest closed subset of D whose W-measure equals 1.

General references: Araujo and Gine (1980), Billingsley (1968), Halmos(1950), Hoffmann–Jørgensen (1994), Samorodnitsky and Taqqu (1994).

Appendix 2: Regularly varying functions

A2.1

A measurable function $R : [r, \infty) \to \mathbf{R}_+$, where $r \in \mathbf{R}_+$, is said to be regularly varying (at ∞) of index α ∈ R if and only if there exists $x_0 \ge r$ such that $R([x_0, \infty)) \subset \mathbf{R}_{++}$ and

$$\lim_{x\to\infty} \frac{R(tx)}{R(x)} = t^{\alpha}$$

for any $t \in \mathbf{R}_{++}$. A regularly varying function of index 0 is called a slowly varying function.
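A numerical illustration of the defining limit may help. The sketch below uses the hypothetical example R(x) = x^α log x, which is regularly varying of index α because log is slowly varying; the function names and the sample points are choices made for this illustration only.

```python
import math

def R(x, alpha=1.5):
    """Hypothetical example: R(x) = x**alpha * log(x) for x > 1."""
    return x**alpha * math.log(x)

def ratio(t, x, alpha=1.5):
    """R(t x) / R(x), which should approach t**alpha as x grows."""
    return R(t * x, alpha) / R(x, alpha)

if __name__ == "__main__":
    t, alpha = 2.0, 1.5
    for p in (2, 10, 50, 100):
        x = 10.0**p
        print(f"x = 1e{p}: R(tx)/R(x) = {ratio(t, x, alpha):.6f}, t^alpha = {t**alpha:.6f}")
```

For this R the ratio equals t^α (1 + log t / log x), so the convergence is only logarithmic in x, which is typical when a slowly varying factor is present.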

It is obvious that R is regularly varying of index α if and only if it can be written in the form

$$R(x) = x^{\alpha} L(x), \quad x \in (r, \infty),$$

where L is a slowly varying function.

The general form of a slowly varying function is described by the celebrated Karamata theorem below [cf. Seneta (1976, Theorem 1.2 and its Corollary)].

Theorem A2.1 (Representation theorem) Let $r \in \mathbf{R}_+$. A function $L : [r, \infty) \to \mathbf{R}_+$ is slowly varying if and only if

$$L(x) = c(x) \exp\left( \int_{x_0}^{x} \frac{\varepsilon(t)}{t} \, dt \right), \quad x \ge x_0,$$

for some $x_0 \ge r$, where the function $c : [x_0, \infty) \to \mathbf{R}_+$ is bounded and measurable and $\lim_{x\to\infty} c(x) = c > 0$ while the function $\varepsilon : [x_0, \infty) \to \mathbf{R}$ is continuous and $\lim_{x\to\infty} \varepsilon(x) = 0$.

Corollary A2.2 If L is a slowly varying function, then

(i) $\lim_{x\to\infty} L(x + y)/L(x) = 1$ for any $y \in \mathbf{R}_{++}$;

(ii) $\lim_{x\to\infty} x^{\varepsilon} L(x) = \infty$ and $\lim_{x\to\infty} x^{-\varepsilon} L(x) = 0$ for any ε > 0;

(iii) L is bounded on finite intervals in $[x_0, \infty)$ if $x_0 \ge r$ is large enough.

There exist necessary or sufficient integral conditions for slow variationwhich are easy to check and use for theoretical and practical purposes. Hereare two such results. See, e.g., Seneta (1976, pp. 53-56 and 86-88).

Theorem A2.3 Let $r \in \mathbf{R}_+$. If $L : [r, \infty) \to \mathbf{R}_+$ is a slowly varying function and $x_0 \ge r$ is so large that L is bounded on finite intervals in [r, ∞), then for any α ≥ −1 we have

$$\lim_{x\to\infty} \frac{x^{\alpha+1} L(x)}{\int_{x_0}^{x} y^{\alpha} L(y) \, dy} = \alpha + 1 \tag{A2.1}$$

while the function $x \to \int_{x_0}^{x} y^{\alpha} L(y) \, dy$, $x > x_0$, is regularly varying of index α + 1.

Conversely, if $L : [r, \infty) \to \mathbf{R}_+$ is measurable and bounded on finite intervals in $[x_0, \infty)$ for some $x_0 \ge r$ and (A2.1) holds for some α > −1, then L is a slowly varying function while the function $x \to \int_{x_0}^{x} y^{\alpha} L(y) \, dy$, $x > x_0$, is regularly varying of index α + 1. The last assertion also holds for α = −1.

Theorem A2.4 Let $r \in \mathbf{R}_+$. If $L : [r, \infty) \to \mathbf{R}_+$ is a slowly varying function, then

$$\lim_{x\to\infty} \int_{x}^{\infty} y^{\alpha} L(y) \, dy < \infty \tag{A2.2}$$

for any α < −1. If $\lim_{x\to\infty} \int_{x}^{\infty} y^{-1} L(y) \, dy < \infty$ then for any α ≤ −1 we have

$$\lim_{x\to\infty} \frac{x^{\alpha+1} L(x)}{\int_{x}^{\infty} y^{\alpha} L(y) \, dy} = -(\alpha + 1) \tag{A2.3}$$

while the function $x \to \int_{x}^{\infty} y^{\alpha} L(y) \, dy$, for x large enough, is regularly varying of index α + 1.

Conversely, if $L : [r, \infty) \to \mathbf{R}_+$ is measurable, satisfies (A2.2), and (A2.3) holds for some α < −1, then L is a slowly varying function while the function $x \to \int_{x}^{\infty} y^{\alpha} L(y) \, dy$, for x large enough, is regularly varying of index α + 1.

A2.2

An important class of pairs of regularly varying functions is defined as follows. Let ξ be a non-degenerate real-valued random variable on a probability space $(\Omega, \mathcal{K}, P)$, and define real-valued functions F and $\bar{F}$ on [0, ∞) by

$$F(x) = E(\xi^2 I_{(|\xi| \le x)}), \quad \bar{F}(x) = P(|\xi| > x), \quad x \in \mathbf{R}_+.$$

Clearly, F is non-decreasing and $\bar{F}$ non-increasing. It is easy to check that

$$F(x) = -\int_{0}^{x} u^2 \, d\bar{F}(u), \quad \bar{F}(x) = \int_{x}^{\infty} u^{-2} \, dF(u), \quad x \in \mathbf{R}_+,$$

whence by integrating by parts we obtain

$$F(x) + x^2 \bar{F}(x) = 2 \int_{0}^{x} u \bar{F}(u) \, du, \tag{A2.4}$$

$$x^2 \bar{F}(x) + F(x) = 2 x^2 \int_{x}^{\infty} u^{-3} F(u) \, du, \quad x \in \mathbf{R}_+. \tag{A2.5}$$

Theorem A2.5 If either F or $\bar{F}$ varies regularly, then the limit

$$\lim_{x\to\infty} \frac{x^2 \bar{F}(x)}{F(x)} = c \tag{A2.6}$$

exists and 0 ≤ c ≤ ∞. Conversely, if (A2.6) holds with 0 < c < ∞, then

$$F(x) \sim x^{2 - \frac{2}{1+c}} L(x), \quad \bar{F}(x) \sim c \, x^{-\frac{2}{1+c}} L(x)$$

as x → ∞, where L is a slowly varying function. Finally, (A2.6) holds with c = 0 if and only if F is slowly varying while (A2.6) holds with c = ∞ if and only if $\bar{F}$ is slowly varying.

The proof follows immediately from equations (A2.4) and (A2.5) by using Theorems A2.3 and A2.4. □


A2.3

Let $f : [1, \infty) \to \mathbf{R}_{++}$ be a measurable function which is bounded on finite intervals and such that $\lim_{x\to\infty} f(x) = \infty$. For any y ∈ [f(1), ∞) define

$$f_0(y) = \inf\{x \ge 1 : f(x) \ge y\}, \quad f_1(y) = \inf\{x \ge 1 : f(x) > y\}, \quad f_2(y) = \sup\{x \ge 1 : f(x) \le y\}.$$

Clearly, the functions $f_i : [f(1), \infty) \to [1, \infty)$, i = 0, 1, 2, are well defined, any of them is non-decreasing, $1 \le f_0 \le f_1 \le f_2$, and $\lim_{y\to\infty} f_i(y) = \infty$, i = 0, 1, 2. We say that $f \in \mathcal{F}$ if and only if

$$\lim_{y\to\infty} \frac{f_1(y)}{f_2(y)} = 1.$$

Lemma A2.6 [Samur (1989, Lemma 2.11)] (i) If $f : [1, \infty) \to \mathbf{R}_{++}$ is non-decreasing and $\lim_{x\to\infty} f(x) = \infty$, then $f \in \mathcal{F}$.

(ii) If $f : [1, \infty) \to \mathbf{R}_{++}$ is bounded on finite intervals and regularly varying of index α > 0, then $f \in \mathcal{F}$. Moreover,

$$\lim_{y\to\infty} \frac{f_0(y)}{f_2(y)} = 1$$

and $f_i$ is regularly varying of index 1/α, i = 0, 1, 2.

(iii) If $f \in \mathcal{F}$ and $f_1$ is regularly varying of index 1/α for some α > 0, then f is regularly varying of index α.

Corollary A2.7 Let $f \in \mathcal{F}$, and define a real-valued function F on $\mathbf{R}_+$ by

$$F(x) = (\log 2)^{-1} \sum_{k \in \mathbf{N}_+ :\, |f(k)| \le x} f^2(k) k^{-2}, \quad x \in \mathbf{R}_+.$$

(i) F is slowly varying if and only if

$$\lim_{x\to\infty} \frac{f^2(x)}{x \sum_{k \in \mathbf{N}_+ :\, k \le x} f^2(k) k^{-2}} = 0. \tag{A2.7}$$

(ii) If $f \in \mathcal{F}$ is regularly varying of index 1/2, then (A2.7) holds, that is, F is slowly varying.

Appendix 3: Limit theorems for mixing random variables

A3.1

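Let $(\Omega, \mathcal{K}, P)$ be a probability space. For any two σ-algebras $\mathcal{K}_1$ and $\mathcal{K}_2$ included in the σ-algebra $\mathcal{K}$ define the dependence coefficients

$$\alpha(\mathcal{K}_1, \mathcal{K}_2) = \sup\left( |P(A_1 \cap A_2) - P(A_1) P(A_2)| : A_i \in \mathcal{K}_i,\ i = 1, 2 \right),$$

$$\varphi(\mathcal{K}_1, \mathcal{K}_2) = \sup\left( |P(A_2 | A_1) - P(A_2)| : A_i \in \mathcal{K}_i,\ i = 1, 2,\ P(A_1) > 0 \right),$$

$$\psi(\mathcal{K}_1, \mathcal{K}_2) = \sup\left( \left| \frac{P(A_2 | A_1)}{P(A_2)} - 1 \right| : A_i \in \mathcal{K}_i,\ P(A_i) > 0,\ i = 1, 2 \right).$$

Clearly,

$$\alpha(\mathcal{K}_1, \mathcal{K}_2) \le \varphi(\mathcal{K}_1, \mathcal{K}_2) \le \psi(\mathcal{K}_1, \mathcal{K}_2)$$

and

$$0 \le \alpha(\mathcal{K}_1, \mathcal{K}_2),\ \varphi(\mathcal{K}_1, \mathcal{K}_2) \le 1, \quad 0 \le \psi(\mathcal{K}_1, \mathcal{K}_2) \le \infty.$$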
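On a finite probability space the three coefficients can be computed by brute force, which gives a concrete feel for the ordering α ≤ ϕ ≤ ψ. The sketch below assumes that each σ-algebra is generated by a finite partition of Ω; the function names and the example distribution are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def events(partition):
    """All unions of blocks: the sigma-algebra generated by a finite partition."""
    blocks = [frozenset(b) for b in partition]
    out = []
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            out.append(frozenset().union(*combo))
    return out

def coefficients(prob, part1, part2):
    """Return (alpha, phi, psi) for the sigma-algebras generated by part1, part2."""
    def P(A):
        return sum((prob[w] for w in A), Fraction(0))
    alpha = phi = psi = Fraction(0)
    for A1 in events(part1):
        for A2 in events(part2):
            p1, p2, p12 = P(A1), P(A2), P(A1 & A2)
            alpha = max(alpha, abs(p12 - p1 * p2))
            if p1 > 0:
                phi = max(phi, abs(p12 / p1 - p2))
                if p2 > 0:
                    psi = max(psi, abs(p12 / (p1 * p2) - 1))
    return alpha, phi, psi

if __name__ == "__main__":
    prob = {0: Fraction(3, 8), 1: Fraction(1, 8), 2: Fraction(1, 8), 3: Fraction(3, 8)}
    print(coefficients(prob, [{0, 1}, {2, 3}], [{0, 2}, {1, 3}]))
```

With the uniform distribution on the same four points the two partitions are independent and all three coefficients vanish, while the dependent example above separates them.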

Let $(X, \mathcal{X})$ be a measurable space and consider an array

$$\mathbf{X} = \{ X_{nj},\ 1 \le j \le j_n,\ j_n \in \mathbf{N}_+,\ n \in \mathbf{N}_+ \} \tag{A3.1}$$

of X-valued r.v.s defined on $(\Omega, \mathcal{K}, P)$. [An infinite sequence $(X_n)_{n \in \mathbf{N}_+}$ of X-valued r.v.s can be seen as the (triangular) array $\{ X_{nj} \equiv X_j,\ 1 \le j \le n,\ n \in \mathbf{N}_+ \}$.] For such an array define the dependence coefficients

$$\delta(k) = \sup_{n \in \mathbf{N}_+^{(k)}} \max_{1 \le h \le j_n - k} \delta\left( \sigma(X_{nj},\ 1 \le j \le h),\ \sigma(X_{nj},\ h + k \le j \le j_n) \right),$$

where $\mathbf{N}_+^{(k)} = \{ n \in \mathbf{N}_+ : j_n > k \}$, k ∈ N+, and δ stands for either α, ϕ or ψ. Clearly, in the case of an infinite sequence $(X_n)_{n \in \mathbf{N}_+}$ we can write

$$\delta(k) = \sup_{h, \ell \in \mathbf{N}_+} \delta\left( \sigma(X_j,\ 1 \le j \le h),\ \sigma(X_j,\ h + k \le j \le h + k + \ell) \right).$$

It is obvious that the sequence $(\delta(k))_{k \in \mathbf{N}_+}$ is non-increasing. An array (resp. sequence) of r.v.s is said to be δ-mixing if and only if $\lim_{k\to\infty} \delta(k) = 0$. It can be shown [Bradley (1986, p. 184)] that ϕ(1) < 1 whenever ψ(1) < ∞.

A finite collection $(X_i)_{1 \le i \le n}$, n ≥ 2, of X-valued r.v.s is said to be strictly stationary if and only if the probability distribution of

$$(X_{k+1}, \cdots, X_{k+h}), \quad 0 \le k \le n - h,$$

does not depend on k whatever 1 ≤ h < n. A sequence $(X_n)_{n \in \mathbf{N}_+}$ of X-valued r.v.s is said to be strictly stationary if and only if the probability distribution of $(X_{k+1}, \cdots, X_{k+h})$ does not depend on k ∈ N whatever h ∈ N+. An array of X-valued r.v.s is said to be strictly stationary if and only if any row of it is strictly stationary.

Proposition A3.1 Let (A3.1) be a ψ-mixing array of X-valued r.v.s. Let ξ and η be real-valued random variables which are $\sigma(X_{nj},\ 1 \le j \le h)$- and $\sigma(X_{nj},\ h + k \le j \le j_n)$-measurable, respectively, for some h, k, n ∈ N+. Assume that E|ξ|, E|η| < ∞ and ψ(k) < ∞. Then Cov(ξ, η) exists and

$$|\operatorname{Cov}(\xi, \eta)| \le \psi(k)\, E|\xi|\, E|\eta|.$$

In particular, if $E\xi^2 < \infty$ and $E\eta^2 < \infty$ then

$$|\operatorname{Cov}(\xi, \eta)| \le \psi(k) \operatorname{Var}^{1/2} \xi \, \operatorname{Var}^{1/2} \eta.$$

Corollary A3.2 Let (A3.1) be a ψ-mixing strictly stationary array of real-valued r.v.s with ψ(1) < ∞. Assume that $E X_{n1}^2 < \infty$ for some n ∈ N+. Then

$$\operatorname{Var} \sum_{j=1}^{k} X_{nj} < k \left( 1 + 2 \sum_{j=1}^{j_n} \psi(j) \right) \operatorname{Var} X_{n1}, \quad 1 \le k \le j_n.$$

Corollary A3.3 Let $(X_n)_{n \in \mathbf{N}_+}$ be a ψ-mixing strictly stationary sequence of X-valued r.v.s. Assume that $\sum_{n \in \mathbf{N}_+} \psi(n) < \infty$. Let f be a real-valued r.v. on $(X, \mathcal{X})$, and assume that $E f^2(X_1) < \infty$. Then the series

$$\sigma^2 = E f^2(X_1) - E^2 f(X_1) + 2 \sum_{n \in \mathbf{N}_+} E\left( f(X_1) - E f(X_1) \right) \left( f(X_{n+1}) - E f(X_1) \right)$$

is absolutely convergent and σ ≥ 0. We have

$$\operatorname{Var} \sum_{j=1}^{n} f(X_j) = n (\sigma^2 + o(1))$$

as n → ∞.

The above results are already folklore. See, e.g., Doukhan (1994, Ch. 1).

Proposition A3.4 [Gordin (1971, Remark 3)] In addition to the hypotheses of Corollary A3.3 assume that ψ(1) < 1. Then σ = 0 if and only if f = const.

A3.2

For an array (A3.1) of real-valued r.v.s on $(\Omega, \mathcal{K}, P)$ set

$$S_{nk} = \sum_{j=1}^{k} X_{nj}, \quad 1 \le k \le j_n, \quad S_{n j_n} = S_n, \quad n \in \mathbf{N}_+.$$

Then such an array is said to be strongly infinitesimal (s.i. for short) if and only if it is strictly stationary and for any sequence $(k_n)_{n \in \mathbf{N}_+}$ of natural integers such that $k_n \le j_n$, n ∈ N+, and $\lim_{n\to\infty} k_n / j_n = 0$ the sum $S_{n k_n}$ converges in P-probability to 0 as n → ∞.

All results given below were proved by J.D. Samur, as indicated at appropriate places, in the more general case of Banach valued random variables.

Proposition A3.5 If (A3.1) is a ϕ-mixing s.i. array of real-valued r.v.s, then

$$\lim_{n\to\infty} \max_{1 \le k \le k_n} d_P\left( P S_{nk}^{-1}, \delta_0 \right) = 0$$

for any sequence $(k_n)_{n \in \mathbf{N}_+}$ of natural integers such that $k_n \le j_n$, n ∈ N+, and $\lim_{n\to\infty} k_n / j_n = 0$.

This is a consequence of a more general result [Samur (1984, Theorem 3.3)].

Proposition A3.6 [Samur (1987, § 3.4.3.2)] Let (A3.1) be a ϕ-mixing strictly stationary array of real-valued r.v.s such that $P S_n^{-1}$ converges weakly to some probability measure on $\mathcal{B}$. Then the array (A3.1) is s.i. if and only if $X_{n1}$ converges in P-probability to 0 as n → ∞, and for any ε > 0 there exists 0 < a = a(ε) < 1 such that

$$\limsup_{n\to\infty} \max_{1 \le k \le a j_n} P(|S_{nk}| > \varepsilon) < 1.$$

A3.3

Let ν be an infinitely divisible probability on $\mathcal{B}$. We denote by $Q_\nu$ the distribution (on $\mathcal{B}_D$) of a stochastic process $\xi_\nu = (\xi_\nu(t))_{t\in I}$ with stationary independent increments, $\xi_\nu(0) = 0$ a.s., trajectories in D, and $\xi_\nu(1)$ having probability distribution ν. When ν is Gaussian the process $\xi_\nu$ can be taken with trajectories in C. In this case the distribution of $\xi_\nu$ is concentrated on $\mathcal{B}_C$, and we shall denote it by $Q'_\nu$.

Given an array (A3.1) of real-valued r.v.s, for any n ∈ N+ define the stochastic processes $\xi_n^D = (\xi_n^D(t))_{t\in I}$ and $\xi_n^C = (\xi_n^C(t))_{t\in I}$ by

$$\xi_n^D(t) = S_{n\lfloor j_n t\rfloor},$$

$$\xi_n^C(t) = S_{n\lfloor j_n t\rfloor} + (j_n t - \lfloor j_n t\rfloor)\bigl(S_{n(\lfloor j_n t\rfloor + 1)} - S_{n\lfloor j_n t\rfloor}\bigr), \quad t \in I,$$

with the convention $S_{n0} = 0$, n ∈ N+. Clearly, for any n ∈ N+ the trajectories of $\xi_n^D$ and $\xi_n^C$ are in D and C, respectively.
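The two processes just defined can be illustrated numerically. The following is only a minimal sketch: the function names `partial_sums`, `xi_D` and `xi_C` are hypothetical, a single row of the array is passed as a plain list, and at t = 1 the interpolation term (whose coefficient is zero) is simply dropped.

```python
import math

def partial_sums(row):
    """Return [S_n0, S_n1, ..., S_njn] for one row, with the convention S_n0 = 0."""
    sums = [0.0]
    for x in row:
        sums.append(sums[-1] + x)
    return sums

def xi_D(row, t):
    """Step process: the partial sum with index floor(j_n * t), for t in [0, 1]."""
    sums = partial_sums(row)
    return sums[math.floor(len(row) * t)]

def xi_C(row, t):
    """Polygonal process: linear interpolation between consecutive partial sums."""
    sums = partial_sums(row)
    jn = len(row)
    k = math.floor(jn * t)
    if k == jn:  # t = 1: the interpolation weight is 0, nothing to add
        return sums[jn]
    return sums[k] + (jn * t - k) * (sums[k + 1] - sums[k])

row = [1.0, -2.0, 4.0, 0.5]   # an illustrative row with j_n = 4
print(xi_D(row, 0.5), xi_C(row, 0.625), xi_C(row, 1.0))
```

The step version jumps at the points k/j_n, which is why its trajectories live in D, while the interpolated version is continuous and agrees with it at those points.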

Theorem A3.7 [Samur (1987, Theorem 3.2 and Corollary 3.3)] Let (A3.1) be a ϕ-mixing strictly stationary array of real-valued r.v.s such that ψ(1) < ∞. Let ν be a probability measure on $\mathcal{B}$. Then the following statements are equivalent:

I. $PS_n^{-1} \xrightarrow{w} \nu$ and the array (A3.1) is s.i.

II. ν is infinitely divisible and $P(\xi_n^D)^{-1} \xrightarrow{w} Q_\nu$ in $\mathcal{B}_D$.

Remark. If the assumption ψ(1) < ∞ does not hold, then Theorem A3.7 still holds with statement I replaced by

I′. $PS_n^{-1} \xrightarrow{w} \nu$, the array (A3.1) is s.i., and

$$\sup_{n\in\mathbf{N}_+} j_n P(|X_{n1}| > \varepsilon) < \infty, \qquad \lim_{n\to\infty} j_n P(|X_{n1}| > \varepsilon,\ |X_{nj}| > \varepsilon) = 0$$

for any ε > 0 and any integer j ≥ 2. □


Theorem A3.8 [Samur (1987, Corollary 3.5 and § 3.6.4)] Let (A3.1) be a ϕ-mixing strictly stationary array of real-valued r.v.s. Let ν be a probability measure on $\mathcal{B}$. Then the following statements are equivalent:

I. $PS_n^{-1} \xrightarrow{w} \nu$, the array (A3.1) is s.i., and $\lim_{n\to\infty} j_n P(|X_{n1}| > \varepsilon) = 0$ for any ε > 0.

II. ν is Gaussian and $P(\xi_n^D)^{-1} \xrightarrow{w} Q_\nu$ in $\mathcal{B}_D$.

III. ν is Gaussian and $P(\xi_n^C)^{-1} \xrightarrow{w} Q'_\nu$ in $\mathcal{B}_C$.

IV. ν is Gaussian, and on a common probability space $(\Omega', \mathcal{K}', P')$ there exist an array

$$X' = \{X'_{nj},\ 1 \le j \le j_n,\ j_n \in \mathbf{N}_+,\ n \in \mathbf{N}_+\}$$

of real-valued r.v.s and a stochastic process $\zeta = (\zeta(t))_{t\in I}$ with trajectories in C which satisfy

$$P'(X'_{n1}, \cdots, X'_{nj_n})^{-1} = P(X_{n1}, \cdots, X_{nj_n})^{-1}, \quad n \in \mathbf{N}_+,$$

$$P'\zeta^{-1} = Q'_\nu,$$

$$\max_{1\le k\le j_n}\Bigl|\sum_{j=1}^{k} X'_{nj} - \zeta\Bigl(\frac{k}{j_n}\Bigr)\Bigr| \to 0 \quad P'\text{-a.s. as } n \to \infty.$$

Remark. If ϕ(1) < 1 and ν is Gaussian, then statement I above can be replaced by

I′. $PS_n^{-1} \xrightarrow{w} \nu$, and the array (A3.1) is s.i. □

Theorem A3.9 [Samur (1987, § 3.4.3.1)] Let $(X_n)_{n\in\mathbf{N}_+}$ be a ϕ-mixing strictly stationary sequence of real-valued r.v.s. Let $(B_n)_{n\in\mathbf{N}_+}$ be a sequence of positive numbers such that $\lim_{n\to\infty} B_n = \infty$, and let $(A_n)_{n\in\mathbf{N}_+}$ be a sequence of real numbers. Assume that

$$P\Bigl(\frac{1}{B_n}\sum_{j=1}^{n}(X_j - A_n)\Bigr)^{-1} \xrightarrow{w} \nu,$$

where ν is a non-degenerate probability measure on $\mathcal{B}$. Then ν is stable. Let α ∈ (0, 2] be the order of ν and write

$$X_{nj} = \frac{1}{B_n}(X_j - A_n), \quad 1 \le j \le n,\ n \in \mathbf{N}_+.$$

The array $X = \{X_{nj},\ 1 \le j \le n,\ n \in \mathbf{N}_+\}$ is s.i. if and only if:

(i) $B_n = n^{1/\alpha}L(n)$, n ∈ N+, for some slowly varying function $L : \mathbf{R}_+ \to \mathbf{R}_{++}$ integrable over finite intervals, and

(ii) for any sequence $(r_n)_{n\in\mathbf{N}_+}$ of natural integers such that $r_n \le n$ and $\lim_{n\to\infty} r_n/n = 0$ we have

$$\lim_{n\to\infty}\frac{r_n(A_{r_n} - A_n)}{B_n} = 0.$$

Theorem A3.10 [Samur (1984, Theorem 5.6)] Let (A3.1) be a ϕ-mixing strictly stationary array of real-valued r.v.s such that ψ(1) < ∞. Assume there exist positive measures $\mu_n$ on $\mathcal{B}$, n ∈ N+, such that $\mu_n(\mathbf{R}) \le 1$ and $\mu_n([-t, t]) = 0$, n ∈ N+, for some $t \in \mathbf{R}_{++}$. If $PX_{n1}^{-1} = (1 - \mu_n(\mathbf{R}))\delta_0 + \mu_n$ and $j_n\mu_n$ converges weakly to a finite measure µ on $\mathcal{B}$, then $PS_n^{-1} \xrightarrow{w} \mathrm{Pois}\,\mu$.

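Theorem A3.11 [Samur (1984, Theorems 4.1 and 4.2)] Let (A3.1) be a ϕ-mixing strictly stationary s.i. array of real-valued r.v.s such that ϕ(1) < 1. Assume that $PS_n^{-1}$ converges weakly to a probability measure ν on $\mathcal{B}$. Then ν is Gaussian if and only if

$$\lim_{n\to\infty} j_n P(|X_{n1}| > \varepsilon) = 0$$

for any ε > 0. If $\nu = N(m, \sigma^2)$ then for any ε > 0 we have

(i) $$\lim_{n\to\infty} E\Bigl(\sum_{j=1}^{j_n}\bigl(X_{nj}I_{(|X_{nj}|\le\varepsilon)} - EX_{nj}I_{(|X_{nj}|\le\varepsilon)}\bigr)\Bigr)^2 = \sigma^2$$

and

(ii) $$\lim_{n\to\infty} E\sum_{j=1}^{j_n} X_{nj}I_{(|X_{nj}|\le\varepsilon)} = m.$$

For any real-valued r.v. η put

$$m^2(\eta) = \begin{cases} E^2\eta/E\eta^2 & \text{if } 0 < E\eta^2 < \infty,\\ 0 & \text{if } E\eta^2 = \infty. \end{cases}$$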


It can be proved that if $E\eta^2 = \infty$ then

$$\lim_{x\to\infty}\frac{E^2\,\eta I_{(|\eta|\le x)}}{E\,\eta^2 I_{(|\eta|\le x)}} = 0. \tag{A3.2}$$

See, e.g., Araujo and Gine (1980, p. 80).

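Theorem A3.12 [Samur (1985), Corollary 3.4] Let $(X_n)_{n\in\mathbf{N}_+}$ be a ϕ-mixing strictly stationary sequence of real-valued r.v.s for which

$$\sum_{n\in\mathbf{N}_+}\varphi^{1/2}(n) < \infty.$$

Assume that

$$0 < EX_1^2 \le \infty, \qquad \lim_{x\to\infty}\frac{x^2 P(|X_1| > x)}{EX_1^2 I_{(|X_1|\le x)}} = 0,$$

and the limits

$$\varphi_n^{(0)} := \lim_{x\to\infty}\frac{EX_1X_nI_{(|X_1|\le x,\,|X_n|\le x)}}{EX_1^2 I_{(|X_1|\le x)}}, \quad n \in \mathbf{N}_+,$$

exist and are all finite. Put $S_n = \sum_{i=1}^{n} X_i$, n ∈ N+, $S_0 = 0$. Then the following assertions hold:

(i) $E|X_1| < \infty$.

(ii) The series

$$\sigma_{(0)}^2 = \varphi_1^{(0)} - m^2(X_1) + 2\sum_{n\ge 2}\bigl(\varphi_n^{(0)} - m^2(X_1)\bigr)$$

converges absolutely and its sum is non-negative.

(iii) If $\sigma_{(0)} \ne 0$ then for any sequence $(B_n)_{n\in\mathbf{N}_+}$ of positive numbers with $\lim_{n\to\infty} B_n = \infty$ satisfying

$$\lim_{n\to\infty} nB_n^{-2}\,EX_1^2 I_{(|X_1|\le B_n)} = 1$$

we have $P\xi_n^{-1} \xrightarrow{w} W_D$ in $\mathcal{B}_D$, where

$$\xi_n(t) = \frac{S_{\lfloor nt\rfloor} - \lfloor nt\rfloor\,EX_1}{\sigma_{(0)}B_n}, \quad n \in \mathbf{N}_+,\ t \in I.$$

When $a^2 = EX_1^2 < \infty$ we can take $B_n = |a|\,n^{1/2}$, n ∈ N+.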


Notes and Comments

1.1

As we have noted, the basic reference for classical non-metric results on different types of continued fraction expansions is Perron (1954, 1957).

There exist several metrical results about Euclid’s algorithm. Let b, n ∈ N+ with 1 ≤ b < n. Then $b/n = [a_1, \cdots, a_{\tau(b,n)}]$ with $a_{\tau(b,n)} \ge 2$, and τ(b, n) ∈ N+ is the number of division steps occurring when b and n are input to the algorithm. Since Euclid’s algorithm applied to b and n behaves essentially the same as when applied to b/g.c.d.(b, n) and n/g.c.d.(b, n), it is convenient to consider the average number $\tau_n$ of division steps when b is relatively prime to n and chosen at random, that is, probability 1/ϕ(n) is given to any integer in the range [1, n] which is prime to n. Here ϕ is Euler’s ϕ-function defined by

$$\varphi(n) = n\prod_{p\mid n}\Bigl(1 - \frac{1}{p}\Bigr), \quad n \ge 2,$$

and ϕ(1) = 1, where the product is taken over all prime numbers p which divide n. Clearly,

$$\tau_n = \frac{1}{\varphi(n)}\sum_{\substack{k=1\\ \mathrm{g.c.d.}(k,n)=1}}^{n}\tau(k, n).$$

Porter (1975) and Knuth (1976) showed that

$$\tau_n = \frac{12\log 2}{\pi^2}\log n + c + O(n^{-1/6+\varepsilon})$$

as n → ∞ for any ε > 0, with

$$c = \frac{6\log 2}{\pi^2}\Bigl(3\log 2 + 4C - \frac{24}{\pi^2}\zeta'(2) - 2\Bigr) - \frac{1}{2} = 1.467078\ldots.$$
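The quantities τ(b, n), τ_n and the constant c are easy to experiment with. The sketch below is illustrative only: the function names are hypothetical, and the numerical values of Euler's constant C and of ζ′(2) are hard-coded standard values rather than anything taken from the text.

```python
import math

def division_steps(b, n):
    """tau(b, n): number of division steps of Euclid's algorithm for 1 <= b < n."""
    steps = 0
    while b:
        n, b = b, n % b
        steps += 1
    return steps

def average_steps(n):
    """tau_n: mean of tau(k, n) over 1 <= k <= n with g.c.d.(k, n) = 1."""
    coprime = [k for k in range(1, n + 1) if math.gcd(k, n) == 1]
    return sum(division_steps(k, n) for k in coprime) / len(coprime)

EULER_C = 0.5772156649015329        # Euler's constant C (hard-coded assumption)
ZETA_PRIME_2 = -0.9375482543158438  # zeta'(2) (hard-coded assumption)

def porter_constant():
    """The constant c of the Porter-Knuth asymptotic formula."""
    lead = 6 * math.log(2) / math.pi ** 2
    inner = 3 * math.log(2) + 4 * EULER_C - 24 / math.pi ** 2 * ZETA_PRIME_2 - 2
    return lead * inner - 0.5

def porter_estimate(n):
    """Main terms (12 log 2 / pi^2) log n + c, without the O-term."""
    return 12 * math.log(2) / math.pi ** 2 * math.log(n) + porter_constant()

for n in (101, 1009, 10007):
    print(n, average_steps(n), porter_estimate(n))
```

Comparing the two printed columns for growing n gives a feel for how quickly the error term becomes negligible; no particular agreement is claimed here for small n.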


The leading coefficient $(12\log 2)/\pi^2 = 0.84276\ldots$ was independently derived by Dixon (1970, 1971) and Heilbronn (1969). A very interesting discussion of this topic can be found in Knuth (1981, Section 4.5.3). See also Lochs (1961), Szusz (1980), and Tonkov (1974). For recent generalizations of Dixon’s and Heilbronn’s results, see Hensley (1994). The largest quotient

$$\max_{1\le k\le\tau(b,n)} a_k$$

occurring in Euclid’s algorithm when b and n are input to the algorithm has been studied by Hensley (1991).

The continued fraction transformation τ underlies a chaotic discrete dynamical system which exhibits in an accessible manner all the common features of such systems. See, e.g., Corless (1992).

1.2

Whole sections or chapters on the metrical theory of continued fractions can be found in the books by Billingsley (1965), Ibragimov and Linnik (1971), Iosifescu and Grigorescu (1990), Kac (1959), Khin(t)chin(e) (1956, 1963, 1964), Knuth (1981), Koksma (1936), Levy (1954), Rockett and Szusz (1992), Sinai (1994), Urban (1923).

1.3

The natural extension $\overline{\tau}$ of τ was introduced in a more general context by Nakada (1981) in order to derive ergodic properties of associated random variables. See Sections 4.0 and 4.1.

The extended incomplete quotients were first introduced by Faivre (1996) and, in general, the extended random variables by Iosifescu (1997), who proved Theorem 1.3.5, which motivates the consideration of the conditional probability measures $\gamma_a$, a ∈ I. Proposition 1.3.8 and Corollary 1.3.9 can also be found in the latter reference.

Subsections 1.3.5 and 1.3.6 rely on the work of Iosifescu (1989, 2000b). It is worth mentioning that to our knowledge it is the first time that mixing coefficients have been computed exactly. A first estimate, $\psi(n) \le (0.8)^n$, n ∈ N+, of the ψ-mixing coefficients is due to Philipp (1988). As to other types of mixing, it seems possible to prove a kind of α-mixing for $(r_\ell)_{\ell\in\mathbf{Z}}$ using the Markovian structure of $(s_\ell)_{\ell\in\mathbf{Z}}$ and the reversibility of $(a_\ell)_{\ell\in\mathbf{Z}}$.


This is the appropriate place to mention that the sequence $(a_n)_{n\in\mathbf{N}_+}$ enjoys another mixing property known as the almost Markov property, a concept introduced by the Lithuanian school; see especially the references to the papers by V.A. Statulevicius and B. Riauba in Heinrich (1987) and Misevicius (1971). See also Saulis and Statulevicius (1991). Let $\mu \in \mathrm{pr}(\mathcal{B}_I)$ and for k, n ∈ N+ define the random variable

$$\alpha_{k,n}(\mu) = \sup\,\bigl|\mu(B \mid \sigma(a_1, \cdots, a_{k+n-1})) - \mu(B \mid \sigma(a_{k+1}, \cdots, a_{k+n-1}))\bigr|,$$

where the supremum is taken over all $B \in \sigma(a_{k+n}, a_{k+n+1}, \cdots)$. Put

$$\chi_\mu(n) = \sup_{k\in\mathbf{N}_+}\operatorname{ess\,sup}\ \alpha_{k,n}(\mu).$$

Then, as shown in Heinrich (op. cit.) (for a slightly weaker form of this result see Misevicius (1981)), assuming that µ ≪ λ and that f = dµ/dλ ∈ L(I) and is bounded away from 0, we have

$$\chi_\mu(n) \le 2^{-n+1}\bigl(24 + s(f)/\inf_{x\in I} f(x)\bigr), \quad n \in \mathbf{N}_+.$$

Finally, note that it has not been usual to prove F. Bernstein’s theorem (Proposition 1.3.16) as an application of ψ-mixing of the sequence of incomplete quotients.

2.1

Theorem 2.1.6 and Proposition 2.1.7 are in fact corollaries of the ergodic theorem of Ionescu Tulcea and Marinescu (1950) [see also Hennion (1993)], which is a deep generalization of an ergodic theorem of Doeblin and Fortet (1937). Cf. Iosifescu (1993b). As noted by Iosifescu (1993a), it is hard to understand how Doeblin (1940) missed a geometric rate solution to Gauss’ problem, which could have been obtained by using the latter theorem.

Subsection 2.1.3 relies on the work of Iosifescu (1992, 1993, 1994). In particular, Propositions 2.1.11 and 2.1.12 have allowed for the simplest solution known to date to Gauss’ problem, which is included in the first two references just quoted. Proposition 2.1.11 has also been proved by Szusz (1961) for $f \in C^1(I)$.

In connection with Proposition 2.1.17 we note that in the case of a singular $\mu \in \mathrm{pr}(\mathcal{B}_I)$ the solution to the corresponding Gauss’ problem has not yet been systematically studied. See Remark 2 following Corollary 4.1.10 for a case where the limit clearly differs from Gauss’ measure.


2.2

Subsections 2.2.1 and 2.2.2 contain a very detailed presentation of E. Wirsing’s celebrated 1974 paper. This also includes the effective computation of numerical constants occurring there.

Subsection 2.2.3 relies on the work of Iosifescu (2000 a, c). That Theorem 2.2.6 holds for f ∈ L(I), that is, that Theorem 2.2.8 holds, had been announced in Iosifescu (1992) and subsequently used by Faivre (1998a). We stress again the importance of a study of the set E defined in Remark 1 following Theorem 2.2.6. (See also Remark 2 following Theorem 2.2.11.)

2.3

This section contains a detailed presentation of K.I. Babenko’s work on Gauss’ problem, with some improvements and generalizations. Information about the life and work of K.I. Babenko (1919–1987) can be found in Russian Math. Surveys 35 (1980), no. 2, 265–275, and 43 (1988), no. 2, 138–151.

Proposition 2.3.2 and its proof are due to Mayer and Roepstorff (1987). For a = 0, that is, under Lebesgue measure $\lambda = \gamma_0$, the exact Gauss–Kuzmin–Levy Theorem 2.3.5 has been proved by Babenko (1978). The general case a ∈ I has been announced by Iosifescu (2000 b). Note that equation (2.3.14) is equivalent to equation (3.6) in Hensley (1992).

We stress the fact that for some a ∈ I the exact convergence rate in Gauss’ problem under $\gamma_a$ is faster than Wirsing’s optimal rate $O(\lambda_0^n)$ as n → ∞. See the Remark after the proof of Corollary 2.3.6.

It should be noted that by Proposition 2.1.17 for any $i^{(k)} \in \mathbf{N}_+^k$ the limit of $\mu[(a_{n+1}, \ldots, a_{n+k}) = i^{(k)}]$ as n → ∞ exists and is equal to $\gamma(I(i^{(k)}))$ whatever $\mu \in \mathrm{pr}(\mathcal{B}_I)$ such that µ ≪ λ. Corollary 2.3.6 shows that in the case where $\mu = \gamma_a$, a ∈ I, a good convergence rate also holds.

A note of historical nature is in order concerning the equation

$$\lim_{n\to\infty}\lambda(a_n = k) = \frac{1}{\log 2}\log\Bigl(1 + \frac{1}{k(k+2)}\Bigr), \quad k \in \mathbf{N}_+,$$

which is a weaker form of a result given in Corollary 2.3.6. This formula was first obtained as early as 1900. Two papers of the Swedish astronomer Hugo Gylden, whose understanding of the approximate computation of planetary motions led him in 1888 to study the asymptotics of $\lambda(a_n = k)$, k ∈ N+, as n → ∞, were taken up for revision by his fellow-countrymen Torsten Broden and Anders Wiman, both mathematicians associated with Lund University.


Wiman (1900) finally got the correct result after Sisyphean computations. Two subsequent papers, both published in 1901, of Broden and Wiman were then considered by Emile Borel as the first ones to notice the applicability of measure theory in probability. The reader will find precise references and all the necessary details in von Plato (1994, Ch. 2). This book is a fascinating account of the emergence of measure-theoretic probability in the first third of the 20th century (until the publication of A.N. Kolmogorov’s Grundbegriffe der Wahrscheinlichkeitsrechnung in 1933). It is convincingly argued there that the theory of the continued fraction expansion should be counted among the fields that brought infinitary events and the idea of measure 0 into probability.
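As a small numerical companion to the limiting law for λ(a_n = k) quoted in this note, the sketch below evaluates the limit probabilities and checks that they sum to 1 through the telescoping identity for the partial sums, $\sum_{k\le K}\log_2(1 + 1/(k(k+2))) = \log_2(2(K+1)/(K+2))$. The function names are hypothetical and chosen only for this illustration.

```python
import math

def limit_law(k):
    """Limit of lambda(a_n = k): (1/log 2) * log(1 + 1/(k(k+2)))."""
    return math.log2(1 + 1 / (k * (k + 2)))

def partial_sum(K):
    """Sum of the limit probabilities for k = 1, ..., K."""
    return sum(limit_law(k) for k in range(1, K + 1))

def telescoped(K):
    """Closed form of the same partial sum: log2(2(K+1)/(K+2))."""
    return math.log2(2 * (K + 1) / (K + 2))

for k in (1, 2, 3):
    print(k, limit_law(k))
print(partial_sum(1000), telescoped(1000))
```

The closed form makes it plain that the total mass is 1 and that the tail beyond K is of order 1/K.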

2.5

This section relies on the work of Iosifescu (1994, 1997, 1999). For a = 0, that is, under Lebesgue measure $\lambda = \gamma_0$, the optimal convergence rate $O(g^{2n})$ in Theorem 2.5.5 (without explicit lower and upper bounds) was first shown by Durner (1992) using a different approach. Also for a = 0, Theorem 2.2.8 with just an upper bound $O(g^n)$ [instead of the optimal one $O(g^{2n})$] has been proved by a different method by Dajani and Kraaikamp (1994). The proof given here emphasizes the importance of the generalized Broden–Borel–Levy formula (1.3.21).

It is hard to understand why A. Denjoy’s 1936 Comptes Rendus Notes went unnoticed for so many years. The method used here for proving and generalizing Denjoy’s results is quite different from that suggested by him.

3.0

The idea underlying Lemma 3.0.1 goes back to Philipp (1970). Lemma 3.0.2 is a special case of a result of Samur (1989, Lemma 2.3).

3.1

Except for Theorem 3.1.6, the results in Subsections 3.1.1 and 3.1.2 have been proved by Samur (1989). The classical Poisson law [Theorem 3.1.2 (iii)] under any µ ≪ λ was first given a complete proof by Iosifescu (1977), who filled a gap in an incomplete proof by Doeblin (1940, p. 358).


3.2 & 3.3

Subsections 3.2.2 and 3.2.3 mainly rely on the work of Samur (1989, 1996), who applied his earlier results for different mixing random variables to the special case of random variables occurring in the metrical theory of continued fractions. The presentation here is more transparent due to the consistent use of the extended random variables which only appear in an implicit manner in Samur’s treatment.

For the first versions of most of the results in these sections credit should be given to Doeblin (1940). An extensive analysis of Doeblin’s paper has been made by Iosifescu (1990, 1993 a,b), where the reader can find a comprehensive evaluation of Doeblin’s important contributions to the metrical theory of continued fractions as compared with subsequent work in the field.

It should be noted that Samur (1989) has also dealt with more general partial sums $S_n$ defined as follows. Let $(f_n)_{n\in\mathbf{N}_+}$ be a sequence of H-valued functions on N+, where H is a separable Hilbert space, and put $S_n = \sum_{i=1}^{n} f_n(a_i)$, n ∈ N+. He derived sufficient conditions for the laws of certain random functions associated with the $S_n$, n ∈ N+, to converge weakly (in the Skorohod space of H-valued functions on I) to an infinitely divisible probability measure on H.

Another generalization of the case considered in Theorem 3.2.4 is that of partial sums

$$S_n = \sum_{i=1}^{n} f_i(a_i),$$

where $(f_n)_{n\in\mathbf{N}_+}$ is a sequence of real-valued functions on N+. A very special case has been taken up by Doeblin (1940, p. 360), with $f_n(j) = 1$ or 0 according as $j \ge c_n$ or $j < c_n$, n, j ∈ N+. Here $(c_n)_{n\in\mathbf{N}_+}$ is a sequence of positive numbers. In this case $S_n$ is the number of occurrences of the random events $(a_i \ge c_i)$, 1 ≤ i ≤ n. By F. Bernstein’s theorem (see Corollary 1.3.16), $\lim_{n\to\infty} S_n < \infty$ or = ∞ a.e. in I according as the series $\sum_{n\in\mathbf{N}_+} 1/c_n$ converges or diverges. Doeblin gave valid hints for a proof that if $\sum_{n\in\mathbf{N}_+} 1/c_n = \infty$ then $(S_n)_{n\in\mathbf{N}_+}$ obeys the central limit theorem under λ. More precisely, $(S_n - A_n)/\sqrt{A_n}$ is asymptotically N(0, 1) under λ as n → ∞, with

$$A_n = \frac{1}{\log 2}\sum_{i=1}^{n}\log\Bigl(1 + \frac{1}{c_i}\Bigr), \quad n \in \mathbf{N}_+.$$

A complete proof with an estimate of the convergence rate under any µ ≪ λ has been given by Philipp (1970). This result has been improved by Zuparov (1981). The functional version of this central limit theorem was proved by Philipp and Webb (1973).

3.4

We only mention here a result not covered by those given in this section. It is about Doeblin’s sequence $(S_n)_{n\in\mathbf{N}_+}$ just discussed. Doeblin (1940, p. 361) asserted the validity of the law of the iterated logarithm

$$\lambda\Bigl(\limsup_{n\to\infty}\frac{S_n - A_n}{\sqrt{2A_n\log\log A_n}} = 1\Bigr) = 1.$$

A complete proof was again given by Philipp (1970). The functional version of this law of the iterated logarithm might follow from a more general result in Szusz and Volkmann (1982, p. 458).

4.0

Most of the results stated for probability measures are still valid for finitemeasures and even for σ-finite, infinite measures. See, e.g., Aaronson (1997).

4.1

Khin(t)chin(e) [1934/35, 1936; 1963 (or 1964), Ch. 3] proved the a.e. convergence of arithmetic means $\sum_{i=1}^{n} f(a_i, \cdots, a_{i+k-1})/n$, n ∈ N+, for some fixed k ∈ N, under an unnecessarily strong assumption on the function $f : \mathbf{N}_+^k \to \mathbf{R}$. His proofs are quite intricate since he made no use of the Birkhoff–Khinchin (!) ergodic theorem which, as we have seen, provides short and elegant proofs. (This should certainly be associated with the fact that ergodic theory at the time was restricted to invertible transformations. But even so a way out could perhaps have been found.) Unlike Khinchin, Doeblin (1940, p. 366) did make use of the ergodic theorem. He proved that the continued fraction transformation τ is ergodic under λ [a different proof had been given earlier by Knopp (1926), see also Martin (1934)]. Since τ is γ-preserving, this enabled him to derive (in an equivalent form) equation (4.1.1), and thus to retrieve Khinchin’s results under weaker assumptions in a straightforward manner. This is the appropriate place to note that, in spite of the fact that, e.g., Billingsley (1965, p. 49) fully credits Doeblin for the idea leading to (4.1.1), many authors assert that this idea is due to Ryll–Nardzewski (1951). Actually, the only real advance made after 1940 in using ergodic theorems in the metric theory of the RCF expansion originated with Nakada (1981) who, as already mentioned, introduced the natural extension $\overline{\tau}$ of τ, which allows one to derive equation (4.1.6). It is again really surprising that Doeblin (1940, p. 365) asserts that his version of Theorem 2.2.11 (see Remark 1 following that theorem) implies that

$$\lim_{n\to\infty}\frac{1}{n}\,\mathrm{card}\{k : \Theta_k^{-1} < x,\ 1 \le k \le n\} = H(x), \quad x \ge 1,$$

and that $n^{-1}\sum_{i=1}^{n}\Theta_i$ converges a.e. as n → ∞ to a constant (not indicated). Now, Doeblin’s first assertion above is equivalent to the first case considered in Corollary 4.1.22, while the second one is the first equation in Corollary 4.1.23 without the value of the limit. How did Doeblin guess these results, whose proofs involve the use of $\overline{\tau}$?

It should be noted that special cases of the Khinchin–Doeblin results had been known before. For example, as already noted, Proposition 4.1.1 and its consequences were first proved (without convergence rates) by Levy (1929).

The application of the Gal–Koksma theorem to the RCF expansion, yielding the convergence rates indicated, is due to de Vroedt (1962, 1964). Let us finally mention that in Philipp (1967) a more general problem is considered. Given an arbitrary sequence $(I_n)_{n\in\mathbf{N}_+}$ of intervals contained in I, it is shown there that for any ε > 0 the random variable

$$\mathrm{card}\{k : \tau^k \in I_k,\ 1 \le k \le n\}, \quad n \in \mathbf{N}_+,$$

is equal to

$$\sum_{k=1}^{n}\gamma(I_k) + O\Bigl(\Bigl(\sum_{k=1}^{n}\gamma(I_k)\Bigr)^{1/2}\log^{(3+\varepsilon)/2}\Bigl(\sum_{k=1}^{n}\gamma(I_k)\Bigr)\Bigr) \quad \text{a.e.}$$

as n → ∞, where the constant implied in O depends on both ε and the current point ω ∈ Ω.

Moeckel (1982), then Jager and Liardet (1988), using quite different methods showed, amongst other things, that if we consider modulo 2 the sequence $(q_n)_{n\in\mathbf{N}_+}$ of the denominators of the RCF convergents of any given ω ∈ Ω, then the asymptotic relative frequencies of the digit blocks 01, 10, and 11 are all a.e. equal to 1/3. [Note that the digit block 00 cannot occur since $|p_{n-1}q_n - p_nq_{n-1}| = 1$, n ∈ N+.] Jager and Liardet (op. cit.) showed that results of this kind can be easily derived from the ergodicity of a certain skew product. To define it we need some notation. For any integer m ≥ 2 let G(m) denote the finite group of 2 × 2 matrices with entries from Z/mZ (the classes of remainders modulo m) and determinant equal to ±1, that is,

$$G(m) = \Bigl\{\begin{pmatrix} a & b\\ c & d\end{pmatrix} : a, b, c, d \in \mathbf{Z}/m\mathbf{Z},\ ad - bc = \pm 1\Bigr\}.$$

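It is known that the cardinality of G(m) is given by the formula

$$\mathrm{card}\,G(m) = \begin{cases} 2J(2) = 6 & \text{if } m = 2,\\ 2mJ(m) & \text{if } m \ge 3,\end{cases}$$

where J is Jordan’s arithmetical totient function defined by

$$J(m) = m^2\prod_{p\mid m}\Bigl(1 - \frac{1}{p^2}\Bigr), \quad m \ge 2.$$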

Here the product is taken over all prime numbers p which divide m.

Jager and Liardet’s skew product $T_m : \Omega \times G(m) \to \Omega \times G(m)$ is then defined by

$$T_m(\omega, A) = \Bigl(\tau(\omega),\ A\begin{pmatrix} 0 & 1\\ 1 & a_1(\omega) \bmod m\end{pmatrix}\Bigr), \quad (\omega, A) \in \Omega \times G(m).$$

These authors showed that $T_m$ is $\gamma \otimes h_m$-preserving, where $h_m$ is the Haar measure on G(m), that is, the uniform one assigning measure 1/card G(m) to any element of G(m), and that $(T_m, \gamma \otimes h_m)$ is an ergodic endomorphism. Hence they deduced, e.g., that given integers m ≥ 2, a, b ∈ N+, with g.c.d.(a, b, m) = 1, we have

$$\lim_{n\to\infty}\frac{1}{n}\,\mathrm{card}\{k : p_k \equiv a,\ q_k \equiv b \bmod m,\ 1 \le k \le n\} = \frac{1}{J(m)} \quad \text{a.e.},$$

a result also obtained by Moeckel (1982). Subsequently, Nolte (1990) gave other interesting applications of Jager and Liardet’s endomorphism.
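The cardinality formula for G(m) quoted above can be checked by brute force for small m. The sketch below is only an illustration: the helper names are hypothetical, and the enumeration over all m⁴ matrices is meant for very small moduli only.

```python
from itertools import product

def prime_divisors(m):
    """Distinct prime divisors of m >= 2, by trial division."""
    primes, p = [], 2
    while p * p <= m:
        if m % p == 0:
            primes.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        primes.append(m)
    return primes

def jordan_J(m):
    """J(m) = m^2 * prod_{p | m} (1 - 1/p^2), in exact integer arithmetic."""
    value = m * m
    for p in prime_divisors(m):
        value = value // (p * p) * (p * p - 1)
    return value

def card_G_formula(m):
    """card G(m) as given by the quoted formula."""
    return 6 if m == 2 else 2 * m * jordan_J(m)

def card_G_brute_force(m):
    """Count 2x2 matrices over Z/mZ whose determinant is congruent to +1 or -1."""
    targets = {1 % m, (-1) % m}
    return sum(1 for a, b, c, d in product(range(m), repeat=4)
               if (a * d - b * c) % m in targets)

for m in range(2, 7):
    print(m, jordan_J(m), card_G_formula(m), card_G_brute_force(m))
```

The separate case m = 2 reflects the fact that +1 and −1 coincide modulo 2, so the two determinant classes collapse into one.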

A natural extension $\overline{T}_m$ of $T_m$ was obtained and studied by Dajani and Kraaikamp (1998). It appears that we can take $\overline{T}_m : \Omega^2 \times G(m) \to \Omega^2 \times G(m)$ defined by

$$\overline{T}_m((\omega, \theta), A) = \Bigl(\overline{\tau}(\omega, \theta),\ A\begin{pmatrix} 0 & 1\\ 1 & a_1(\omega) \bmod m\end{pmatrix}\Bigr)$$

for $(\omega, \theta, A) \in \Omega^2 \times G(m)$. Then $\overline{T}_m$ is $\overline{\gamma} \otimes h_m$-preserving, and $(\overline{T}_m, \overline{\gamma} \otimes h_m)$ is an ergodic automorphism. Hence Dajani and Kraaikamp (op. cit.) deduced, e.g., that for any integers m ≥ 2, 0 ≤ a, b ≤ m − 1, with g.c.d.(a, b, m) = 1 and for any $(t_1, t_2) \in I^2$ we have

$$\lim_{n\to\infty}\frac{1}{n}\,\mathrm{card}\{k : \Theta_{k-1} < t_1,\ \Theta_k < t_2,\ p_k \equiv a,\ q_k \equiv b \bmod m,\ 1 \le k \le n\} = \frac{H(t_1, t_2)}{J(m)} \quad \text{a.e.},$$

where the distribution function H has been defined in Corollary 4.1.20. Their paper contains a host of other results. They also showed that these results can be extended to S-expansions (cf. Sections 4.2 and 4.3). It is interesting to note that the sequences of numerators and denominators of the S-convergents have, mod m, the same asymptotic behaviour as that just indicated for the sequences of numerators and denominators of the RCF convergents.

It may seem difficult to compare, e.g., the decimal expansion with the RCF expansion, since their dynamics are different. However, Lochs (1964) obtained a then surprising result that was to serve as a prototype for further results of the same kind. Let ω ∈ Ω and consider the rational number $x_n = x_n(\omega) := \lfloor 10^n\omega\rfloor/10^n$, which yields the first n decimal digits of ω, and $y_n = x_n + 10^{-n}$, n ∈ N+. Clearly, for n large enough we have $y_n < 1$. Next, let $\omega = [a_1, a_2, \cdots]$, $x_n = [b_1, \cdots, b_k]$, and $y_n = [c_1, \cdots, c_\ell]$ be the RCF expansions of ω, $x_n$, and $y_n$, respectively, and for n ∈ N+ large enough put

$$m_n = m_n(\omega) = \max\{i \le \max(k, \ell) : b_j = c_j,\ 1 \le j \le i\}.$$

In other words, $m_n(\omega)$ is the largest integer such that the closed interval $[x_n, y_n]$ is contained in the closure of the fundamental interval $I(a_1, \cdots, a_{m_n(\omega)})$ (containing ω). For example, if $\omega = \sqrt[3]{2} - 1 = 0.259921\cdots$ then $x_5 = 0.25992$, $y_5 = 0.25993$, ω = [3, 1, 5, 1, 1, · · · ], $x_5$ = [3, 1, 5, 1, 1, 4, 2, 5, 1, 3], and $y_5$ = [3, 1, 5, 1, 1, 5, 5, 1, 2, 1, 4, 3]. Therefore $m_5(\omega) = 5$, that is, from the first 5 decimal digits of ω we obtain its first 5 RCF digits. Using arithmetic properties of τ and Paul Levy’s result (4.1.19), Lochs (op. cit.) proved that

$$\lim_{n\to\infty}\frac{m_n}{n} = \frac{6\log 2\,\log 10}{\pi^2} = 0.9702701\cdots \quad \text{a.e.}$$

This means that, roughly speaking, usually around 97% of the RCF digits are determined by the decimal digits. Using an early mainframe computer, by way of example, Lochs (1963) calculated that the first 1000 decimal digits of π determine 968 RCF digits of it!
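The worked example with ω = ∛2 − 1 can be reproduced with exact rational arithmetic. In the sketch below the function names `rcf_digits` and `lochs_m` are hypothetical; the decimal prefix ⌊10ⁿω⌋ is passed as an integer, and the RCF expansion of a rational is taken in the form produced by Euclid's algorithm.

```python
from fractions import Fraction
import math

def rcf_digits(x):
    """RCF digits of a rational x in (0, 1), computed exactly via Euclid's algorithm."""
    digits = []
    while x != 0:
        x = 1 / x
        a = x.numerator // x.denominator
        digits.append(a)
        x -= a
    return digits

def lochs_m(decimal_prefix, n):
    """m_n: number of leading RCF digits shared by x_n and y_n = x_n + 10^(-n),
    where x_n = decimal_prefix / 10^n."""
    x = Fraction(decimal_prefix, 10 ** n)
    y = Fraction(decimal_prefix + 1, 10 ** n)
    m = 0
    for b, c in zip(rcf_digits(x), rcf_digits(y)):
        if b != c:
            break
        m += 1
    return m

# x_5 and y_5 for omega = 2^(1/3) - 1 = 0.259921...
print(rcf_digits(Fraction(25992, 10 ** 5)))
print(rcf_digits(Fraction(25993, 10 ** 5)))
print(lochs_m(25992, 5))

# Lochs' limit 6 log 2 log 10 / pi^2
print(6 * math.log(2) * math.log(10) / math.pi ** 2)
```

Working with `Fraction` avoids the rounding problems that a floating-point iteration of τ would introduce after a few digits.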

Lochs’ result was generalized to a wider class of transformations of I by Bosma et al. (1999). Their results are based on the Shannon–McMillan–Breiman theorem in information theory [see Billingsley (1965, p. 129)], while Lochs’ limit appears in fact to be the ratio of the entropies of the transformation S : I → I defined as Sx = 10x mod 1, x ∈ I, underlying the decimal expansion, and of τ. Finally, Dajani and Fieldsteel (2001) gave wider applications and simpler proofs of results describing the rate at which the digits of one number theoretical expansion determine those of another. Their proofs are based on general measure-theoretic covering arguments and not on the dynamics of specific maps.

We mention that Lochs’ problem was also considered by Faivre (1997, 1998b), who showed that (i) for any ε > 0 there exist positive constants a < 1 and A such that

$$\lambda\Bigl(\Bigl|\frac{m_n}{n} - \frac{6\log 2\,\log 10}{\pi^2}\Bigr| \ge \varepsilon\Bigr) \le Aa^n, \quad n \in \mathbf{N}_+,$$

and (ii) the random variable $\bigl(m_n - 6(\log 2)(\log 10)\,n/\pi^2\bigr)/\sqrt{n}$ is asymptotically N(0, σ) for some σ > 0 (which is related to the constant denoted by the same letter in Example 3.2.11). Clearly, Lochs’ result is implied by (i) via the Borel–Cantelli lemma.

Cassels (1959) showed that there exist numbers x which are normal in base 3 but non-normal in any base that is not a power of 3. This result was generalized by Schmidt (1960) as follows. Let the notation r ∼ s stand for r, s ∈ N+ being powers of the same integer. It is fairly obvious that if r ∼ s then normality of x in base r and normality of x in base s imply each other. If r ≁ s then this implication does not hold. In fact, Schmidt (op. cit.) showed that in the latter case there is a continuum power set of numbers x which are normal in base r but not even simply normal in base s. (Simple normality means that each single digit occurs with the proper frequency.) Motivated by this, Schweiger (1969) defined two number theoretical transformations T and S on I (or $I^d$, the d-dimensional unit cube, d ∈ N+) to be equivalent (T ∼ S) if there exist positive integers m, n ∈ N+ such that $T^m = S^n$. Schweiger then showed that T ∼ S implies that every T-normal number is S-normal, and conjectured that T ≁ S implies the opposite conclusion.

Surprisingly, Kraaikamp and Nakada (2000) proved that the RCF and NICF expansions share the same set of normal numbers. Clearly, in itself this is not a counter-example to Schweiger’s conjecture, since the RCF transformation τ and the NICF transformation $N_{1/2}$ ‘live’ on different intervals. However, in Kraaikamp and Nakada (2001) two counter-examples are given.

4.2 & 4.3

Section 4.2 fully relies on the work of Kraaikamp (1991), see also his 1989 paper.

There exists a host of CF expansions which would have deserved to be discussed here. Two such expansions are the Rosen continued fraction expansions, and the α-expansions of Tanaka and Ito (1981). We will briefly discuss both of them.

Although Rosen (1954) introduced his CF expansions in the mid-1950s, it is only very recently that there has been any investigation of their metric properties; see Burton et al. (2000), Grochenig and Haas (1996), Nakada (1995), Sebe (2002), and Schmidt (1993).

The groups which underlie the Rosen continued fraction expansions are Fuchsian groups of the first kind, that is, discrete subgroups of PSL(2, R) acting upon the Poincare upper half-plane by Mobius (fractional linear) transformations, with all of R as their limit sets.

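Let $\lambda = \lambda_q = 2\cos(\pi/q)$ for q ∈ {3, 4, . . .}, and put

$$A = \begin{pmatrix} 1 & \lambda\\ 0 & 1\end{pmatrix}, \qquad B = \begin{pmatrix} 0 & -1\\ 1 & 0\end{pmatrix}.$$

Then the group $G_q$ generated by A and B is called the Hecke (triangle) group of index q. Rosen (op. cit.) defined a CF expansion related to $G_q$, q ≥ 4. (Note that for q = 3 we have the modular group.) Fix some such q and let $J_q = [-\lambda/2, \lambda/2]$. Then the transformation $f_q : J_q \to J_q$ defined by

$$f_q(x) = \frac{\mathrm{sgn}\,x}{x} - \Bigl\lfloor\frac{\mathrm{sgn}\,x}{\lambda x} + \frac{1}{2}\Bigr\rfloor\lambda, \quad x \in J_q \setminus \{0\}, \qquad f_q(0) = 0,$$

leads to a CF expansion of the form

$$x = \cfrac{e_1}{b_1\lambda + \cfrac{e_2}{b_2\lambda + \ddots}},$$

where $e_i$ is equal to either 1 or −1 and $b_i \in \mathbf{N}$, i ∈ N+. We call this the Rosen, or λ-continued fraction (λ-CF), expansion of $x \in J_q \setminus \{0\}$.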


In Burton et al. (op. cit.) the natural extension of the ergodic dynamical system underlying the λ-CF expansion was obtained for any q ≥ 3 (the case q = 3 is in fact the NICF expansion). [Previously, Nakada (op. cit.) obtained a similar result for any even q.] From this a large number of results similar to those holding for the RCF expansion were obtained for the λ-CF expansion.

At first sight Nakada’s α-expansions and those of Tanaka and Ito (1981) bear a close resemblance. Let α ∈ [1/2, 1], $I_\alpha = [\alpha - 1, \alpha]$, and define the transformation $T_\alpha : I_\alpha \to I_\alpha$ by

$$T_\alpha(x) = \begin{cases} x^{-1} - \lfloor x^{-1} + 1 - \alpha\rfloor & \text{if } x \in I_\alpha \setminus \{0\},\\ 0 & \text{if } x = 0.\end{cases}$$

It yields a unique Tanaka–Ito α-expansion of the form

$$x = \cfrac{1}{b_1 + \cfrac{1}{b_2 + \ddots}}, \quad x \in I_\alpha \setminus \{0\},$$

which is finite if and only if x is rational, and where $b_i \in \mathbf{Z} \setminus \{0\}$, i ∈ N+. In spite of the similarities it is much harder to obtain results for the Tanaka–Ito α-expansions as compared to the Nakada α-expansions discussed in Subsection 4.3.1. E.g., Tanaka and Ito (op. cit.) were able only to give the explicit form of the density of the invariant measure for 1/2 ≤ α ≤ g. For these values of α they were also able to derive the entropy of $T_\alpha$. It is interesting to note that the latter is independent of α ∈ [1/2, g], and is equal to π2/(6 log g), which is the value corresponding to an S-expansion with maximal singularization area.

It should be noted that limit properties such as those in Chapter 3 for CF expansions other than the RCF expansion need the corresponding Gauss–Kuzmin–Levy theorems (implying ψ-mixing of the sequence of their incomplete quotients). In this respect we mention the papers of Dajani and Kraaikamp (1999), Iosifescu and Kalpazidou (1993), Kalpazidou (1985a, c, 1986d, e, 1987b), Popescu (1997a, b, 1999, 2000), Rieger (1978, 1979), Rockett (1980), and Sebe (2000a, b, 2001a, b, 2002). It appears, as noted in the Preface, that for any single CF expansion a specific approach is required, which has to more or less mimic that working for the RCF expansion.

We conclude by briefly discussing a generalization of the RCF expansionknown as f -expansions (which, in general, are not CF expansions). Let f be

Page 363: Kluwer

346 Notes and Comments

a continuous strictly decreasing (increasing) real-valued function defined on[1, β], where either 2 < β ∈ N+ or β = ∞ ([0, β], where either 1 < β ∈ N+

or β = ∞), such that f(1) = 1 and f(β) = 0 (f(0) = 0 and f(β) = 1),with the convention f(β) = limx→β f(x) for β = ∞. Denote by f−1 theinverse function of f , which is defined on I. Such a function f can be usedto represent most real numbers t ∈ I as

t = f(a1(t) + f(a2(t) + · · · )) := limn→∞ fn(a1(t), · · · , an(t)),

where fn is defined recursively by

f1(x1) = f(x1), f2(x1, x2) = f1(x1 + f(x2)),

and

fn+1(x1, · · · , xn+1) = fn(x1, · · · , xn−1, xn + f(xn+1)), n ≥ 2.

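Here the ‘incomplete quotients’ $a_n(t)$ are defined recursively as

$$a_n(t) = \lfloor f^{-1}(\{r_{n-1}(t)\}) \rfloor$$

with

$$r_0(t) = t, \quad r_n(t) = f^{-1}(\{r_{n-1}(t)\}), \quad n \in \mathbb{N}_+,$$

where $\{x\}$ denotes the fractional part of $x$. Note that

$$r_n(t) = a_n(t) + f(a_{n+1}(t) + f(a_{n+2}(t) + \cdots)), \quad n \in \mathbb{N}_+.$$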
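The digit recursion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fractional part is taken before each application of $f^{-1}$, the loop stops when the fractional part reaches 0 (the terminating case for $f(x) = 1/x$), and the names `f_expansion_digits`, `f_n` and `reciprocal` are hypothetical helpers, not notation from the text. Exact rational arithmetic is used so that the RCF case $f(x) = 1/x$ can be checked by hand.

```python
from fractions import Fraction
from math import floor


def f_expansion_digits(t, f_inv, max_digits):
    """Return up to max_digits digits a_1, a_2, ... of t in [0, 1)."""
    digits = []
    for _ in range(max_digits):
        if t == 0:  # assumed stopping rule: the expansion has terminated
            break
        r = f_inv(t)   # r_n = f^{-1} applied to the current fractional part
        a = floor(r)   # a_n = floor(r_n)
        digits.append(a)
        t = r - a      # fractional part of r_n, fed into the next step
    return digits


def f_n(digits, f):
    """Evaluate f(a_1 + f(a_2 + ... + f(a_n))) from the innermost term outwards."""
    value = 0
    for a in reversed(digits):
        value = f(a + value)
    return value


def reciprocal(x):
    """f(x) = 1/x, which is its own inverse; exact on Fractions and integers."""
    return 1 / Fraction(x)


digits = f_expansion_digits(Fraction(3, 8), reciprocal, 10)
print(digits)                   # [2, 1, 2]
print(f_n(digits, reciprocal))  # 3/8
```

For $t = 3/8$ the digits are 2, 1, 2, and evaluating $f_3(2, 1, 2) = 1/(2 + 1/(1 + 1/2))$ recovers $3/8$, as the limit formula for $t$ requires in the terminating case.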

The above representation of t is called its f-expansion. Clearly, the RCF expansion is obtained for $f(x) = 1/x$, $x \ge 1$, and the role of the continued fraction transformation τ is now played by the f-expansion transformation $\tau_f$ of I defined by $\tau_f(t) = \{f^{-1}(t)\}$, $t \in I$, where $\{x\}$ denotes the fractional part of $x$. [Some caution is necessary in the case where $\beta = \infty$, when either $\tau_f(0)$ or $\tau_f(1)$ should be given the value 0.] Also, the natural extension $\overline{\tau}_f$ of $\tau_f$ is defined by

$$\overline{\tau}_f(t, u) = (\tau_f(t), f(a_1(t) + u))$$

for the points $(t, u)$ of a suitable subset of $I^2$ of Lebesgue measure 1. The f-expansions were first considered by Kakeya (1924), who proved that if $f^{-1}$ is absolutely continuous and $|(f^{-1})'| > 1$ a.e. in I then, save possibly for a countable subset of I, any other $t \in I$ has an f-expansion. A metrical theory of f-expansions parallelling that of the RCF expansion is available. See, e.g., Iosifescu and Grigorescu (1990, Section 5.4) and the references therein. Finally, if β does not belong to $\mathbb{N}_+ \cup \{\infty\}$, then the corresponding f leads to a so-called f-expansion with dependent digits. For recent results on such f-expansions, see Barrionuevo et al. (1996), Dajani and Kraaikamp (1996, 2001), and Dajani et al. (1996).
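As an illustration of the natural extension defined above, the following minimal Python sketch iterates the map $(t, u) \mapsto (\tau_f(t), f(a_1(t) + u))$ in the RCF case $f(x) = 1/x$. The function names are hypothetical, and the fractional-part form of $\tau_f$ is assumed. Starting from $u = 0$, the second coordinate after n steps is the finite continued fraction with digits $a_n, \dots, a_1$, that is, the past digits read in reverse order.

```python
from fractions import Fraction
from math import floor


def reciprocal(x):
    """f(x) = 1/x, its own inverse; exact on Fractions and integers."""
    return 1 / Fraction(x)


def natural_extension_step(t, u, f, f_inv):
    """One step (t, u) -> (tau_f(t), f(a_1(t) + u)), for t in (0, 1)."""
    r = f_inv(t)
    a1 = floor(r)              # first digit a_1(t)
    return r - a1, f(a1 + u)   # (fractional part of f^{-1}(t), f(a_1 + u))


t, u = Fraction(7, 30), Fraction(0)
orbit = []
while t != 0:                  # 7/30 has the finite expansion with digits 4, 3, 2
    t, u = natural_extension_step(t, u, reciprocal, reciprocal)
    orbit.append((t, u))

for point in orbit:
    print(point[0], point[1])
# 2/7 1/4
# 1/2 4/13
# 0 13/30
```

The second coordinates 1/4, 4/13, 13/30 are $1/4$, $1/(3 + 1/4)$ and $1/(2 + 1/(3 + 1/4))$, so the map records the digits already read while the first coordinate discards them.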

References

Aaronson, J. (1986) Random f-expansions. Ann. Probab. 14, 1037–1057.

Aaronson, J. (1997) An Introduction to Infinite Ergodic Theory. Mathematical Surveys and Monographs 50. Amer. Math. Soc., Providence, RI.

Aaronson, J. and Nakada, H. (2001) Sums without maxima. Preprint.

Abramov, L.M. (1959) Entropy of induced automorphisms. Dokl. Akad. Nauk SSSR 128, 647–650. (Russian)

Abramowitz, M. and Stegun, I.A. (Eds.) (1964) Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards, Washington, D.C.

de Acosta, A. (1982) Invariance principles in probability for triangular arrays of B-valued random vectors and some applications. Ann. Probab. 10, 346–373.

Adams, W.W. (1979) On a relationship between the convergents of the nearest integer and regular continued fractions. Math. Comp. 33, 1321–1331.

Adler, R.L. (1991) Geodesic flows, interval maps, and symbolic dynamics. In: Bedford, T. et al. (Eds.) (1991), 93–123.

Adler, R.L. and Flatto, L. (1984) The backward continued fraction map and geodesic flow. Ergodic Theory and Dynamical Systems 4, 487–492.

Adler, R., Keane, M., and Smorodinsky, M. (1981) A construction of a normal number for the continued fraction transformation. J. Number Theory 13, 95–105.

Alexandrov, A.G. (1978) Computer investigation of continued fractions. Algorithmic Studies in Combinatorics, 142–161. Nauka, Moscow. (Russian)

Aliev, I., Kanemitsu, S., and Schinzel, A. (1998) On the metric theory of continued fractions. Colloq. Math. 77, 141–146.

Alzer, H. (1998) On rational approximation to e. J. Number Theory 68, 57–62.

Araujo, A. and Gine, E. (1980) The Central Limit Theorem for Real and Banach Valued Random Variables. Wiley, New York.

Babenko, K.I. (1978) On a problem of Gauss. Soviet Math. Dokl. 19, 136–140.

Babenko, K.I. and Jur′ev, S.P. (1978) On the discretization of a problem of Gauss. Soviet Math. Dokl. 19, 731–735.

Bagemihl, F. and McLaughlin, J.R. (1966) Generalization of some classical theorems concerning triples of consecutive convergents to simple continued fractions. J. Reine Angew. Math. 221, 146–149.

Bailey, D.H., Borwein, J.M., and Crandall, R.E. (1997) On the Khintchine constant. Math. Comp. 66, 417–431.

Baladi, V. and Keller, G. (1990) Zeta functions and transfer operators for piecewise monotonic transformations. Comm. Math. Phys. 127, 459–477.

Barbolosi, D. (1990) Sur le développement en fractions continues à quotients partiels impairs. Monatsh. Math. 109, 25–37.

Barbolosi, D. (1993) Automates et fractions continues. J. Théor. Nombres Bordeaux 5, 1–22.

Barbolosi, D. (1997) Une application du théorème ergodique sous-additif à la théorie métrique des fractions continues. J. Number Theory 66, 172–182.

Barbolosi, D. (1999) Sur l’ordre de grandeur des quotients partiels du développement en fractions continues régulières. Monatsh. Math. 128, 189–200.

Barbolosi, D. and Faivre, C. (1995) Metrical properties of some random variables connected with the continued fraction expansion. Indag. Math. (N.S.) 6, 257–265.

Barndorff–Nielsen, O. (1961) On the rate of growth of the partial maxima of a sequence of independent identically distributed random variables. Math. Scand. 9, 383–394.

Barrionuevo, J., Burton, R.M., Dajani, K., and Kraaikamp, C. (1996) Ergodic properties of generalized Lüroth series. Acta Arith. 74, 311–327.

Bedford, T., Keane, M., and Series, C. (Eds.) (1991) Ergodic Theory, Symbolic Dynamics and Hyperbolic Spaces. Oxford University Press, Oxford.

Berechet, A. (2001a) A Kuzmin-type theorem with exponential convergence for a class of fibred systems. Ergodic Theory and Dynamical Systems 21, 673–688.

Berechet, A. (2001b) Perron–Frobenius operators acting on BV(I) as contractors. Ergodic Theory and Dynamical Systems 21, 1609–1624.

Bernstein, F. (1911) Über eine Anwendung der Mengenlehre auf ein aus der Theorie der säkularen Störungen herrührendes Problem. Math. Ann. 71, 417–439.

Billingsley, P. (1965) Ergodic Theory and Information. Wiley, New York.

Billingsley, P. (1968) Convergence of Probability Measures. Wiley, New York.

Borel, E. (1903) Contribution à l’analyse arithmétique du continu. J. Math. Pures Appl. (5) 9, 329–375.

Borel, E. (1909) Les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. Mat. Palermo 27, 247–271.

Bosma, W. (1987) Optimal continued fractions. Indag. Math. 49, 353–379.

Bosma, W. and Kraaikamp, C. (1990) Metrical theory for optimal continued fractions. J. Number Theory 34, 251–270.

Bosma, W. and Kraaikamp, C. (1991) Optimal approximation by continued fractions. J. Austral. Math. Soc. Ser. A 50, 481–504.

Bosma, W., Dajani, K., and Kraaikamp, C. (1999) Entropy and counting correct digits. Report No. 9925 (June), Univ. Nijmegen, Dept. of Math., Nijmegen (The Netherlands).

Bosma, W., Jager, H., and Wiedijk, F. (1983) Some metrical observations on the approximation by continued fractions. Indag. Math. 45, 281–299.

Bowman, K.O. and Shenton, L.R. (1989) Continued Fractions in Statistical Applications. Marcel Dekker, New York.

Boyarsky, A. and Gora, P. (1997) Laws of Chaos: Invariant Measures and Dynamical Systems in One Dimension. Birkhäuser, Boston.

Bradley, R.C. (1986) Basic properties of strong mixing conditions. In: Eberlein, E. and Taqqu, M.S. (Eds.) Dependence in Probability and Statistics, 165–192. Birkhäuser, Boston.

Breiman, L. (1960) A strong law of large numbers for a class of Markov chains. Ann. Math. Statist. 31, 801–803.

Brezinski, C. (1991) History of Continued Fractions and Padé Approximants. Springer–Verlag, Berlin.

Brjuno, A.D. (1964) The expansion of algebraic numbers into continued fractions. Z. Vycisl. Mat. i Mat. Fiz. 4, 211–221. (Russian)

Broden, T. (1900) Wahrscheinlichkeitsbestimmungen bei der gewöhnlichen Kettenbruchentwickelung reeller Zahlen. Öfversigt af Kongl. Svenska Vetenskaps-Akademiens Förhandlingar 57, 239–266.

Brown, G. and Yin, Q. (1996) Metrical theory for Farey continued fractions. Osaka J. Math. 33, 951–970.

Bruckheimer, M. and Arcavi, A. (1995) Farey series and Pick’s area theorem. Math. Intelligencer 17, no. 4, 64–67.

de Bruijn, N.G. and Post, K.A. (1968) A remark on uniformly distributed sequences and Riemann integrability. Indag. Math. 30, 149–150.

Bunimovich, L.A. (1996) Continued fractions and geometrical optics. Amer. Math. Soc. Transl. (2) 171, 45–55.

Burton, R.M., Kraaikamp, C., and Schmidt, T.A. (2000) Natural extensions for the Rosen fractions. Trans. Amer. Math. Soc. 352, 1277–1298.

Cassels, J.W.S. (1959) On a problem of Steinhaus about normal numbers. Colloq. Math. 7, 95–101.

Chaitin, G.J. (1998) The Limits of Mathematics: A Course on Information Theory and the Limits of Formal Reasoning. Springer–Verlag Singapore, Singapore.

Champernowne, D.G. (1933) The construction of decimals normal in the scale of ten. J. London Math. Soc. 8, 254–260.

Chatterji, S.D. (1966) Masse, die von regelmässigen Kettenbrüchen induziert sind. Math. Ann. 164, 113–117.

Choong, K.Y., Daykin, D.E., and Rathbone, C.R. (1971) Rational approximations to π. Math. Comp. 25, 387–392.

Chudnovsky, D.V. and Chudnovsky, G.V. (1991) Classical constants and functions: computations and continued fraction expansions. In: Chudnovsky, D.V. et al. (Eds.) Number Theory (New York, 1989/1990), 13–74. Springer–Verlag, New York.

Chudnovsky, D.V. and Chudnovsky, G.V. (1993) Hypergeometric and modular function identities, and new rational approximations to and continued fraction expansions of classical constants and functions. In: Knopp, M. and Sheingorn, M. (Eds.) (1993), 117–162.

Clemens, L.E., Merrill, K.D., and Roeder, D.W. (1995) Continued fractions and series. J. Number Theory 54, 309–317.

Cohn, H. (Ed.) (1993) Doeblin and Modern Probability (Blaubeuren, Germany, 1991). Contemporary Mathematics 149. Amer. Math. Soc., Providence, RI.

Corless, R.M. (1992) Continued fractions and chaos. Amer. Math. Monthly 99, 203–215.

Cornfeld, I.P., Fomin, S.V., and Sinai, Ya.G. (1982) Ergodic Theory. Springer–Verlag, Berlin.

Dajani, K. and Fieldsteel, A. (2001) Equipartition of interval partitions and an application to number theory. Proc. Amer. Math. Soc. 129, 3453–3460.

Dajani, K. and Kraaikamp, C. (1994) Generalization of a theorem of Kusmin. Monatsh. Math. 118, 55–73.

Dajani, K. and Kraaikamp, C. (1996) On approximation by Lüroth series. J. Théor. Nombres Bordeaux 8, 331–346.

Dajani, K. and Kraaikamp, C. (1998) A note on the approximation by continued fractions under an extra condition. New York J. Math. 3A, 69–80.

Dajani, K. and Kraaikamp, C. (1999) A Gauss–Kusmin theorem for optimal continued fractions. Trans. Amer. Math. Soc. 351, 2055–2079.

Dajani, K. and Kraaikamp, C. (2000) ‘The mother of all continued fractions’. Colloq. Math. 84/85, 109–123.

Dajani, K. and Kraaikamp, C. (2001) From greedy to lazy expansions and their driving dynamics. Preprint No. 1186, Utrecht Univ., Dept. of Math., Utrecht.

Dajani, K., Kraaikamp, C., and Solomyak, B. (1996) The natural extension of the β-transformation. Acta Math. Hungar. 73, 97–109.

Daude, H., Flajolet, P., and Vallee, B. (1997) An average-case analysis of the Gaussian algorithm for lattice reduction. Combinatorics, Probability and Computing 6, 397–433.

Davenport, H. (1999) The Higher Arithmetic: An Introduction to the Theory of Numbers, 7th Edition. Cambridge Univ. Press, Cambridge.

Davison, J.L. and Shallit, J.O. (1991) Continued fractions for some alternating series. Monatsh. Math. 111, 119–126.

Delmer, F. and Deshouillers, J-M. (1993) On a generalization of Farey sequences, I. In: Knopp, M. and Sheingorn, M. (Eds.) (1993), 243–246.

Delmer, F. and Deshouillers, J-M. (1995) On a generalization of Farey sequences, II. J. Number Theory 55, 60–67.

Denker, M. and Jakubowski, A. (1989) Stable limit distributions for strongly mixing sequences. Statist. Probab. Lett. 8, 477–483.

Denjoy, A. (1936a) Sur les fractions continues. C.R. Acad. Sci. Paris 202, 371–374.

Denjoy, A. (1936b) Sur une formule de Gauss. C.R. Acad. Sci. Paris 202, 537–540.

Diamond, H.G. and Vaaler, J.D. (1986) Estimates for partial sums of continued fraction partial quotients. Pacific J. Math. 122, 73–82.

Dixon, J.D. (1970) The number of steps in the Euclidean algorithm. J. Number Theory 2, 414–422.

Dixon, J.D. (1971) A simple estimate for the number of steps in the Euclidean algorithm. Amer. Math. Monthly 78, 374–376.

Doeblin, W. (1940) Remarques sur la théorie métrique des fractions continues. Compositio Math. 7, 353–371.

Doeblin, W. and Fortet, R. (1937) Sur des chaînes à liaisons complètes. Bull. Soc. Math. France 65, 132–148.

Doob, J.L. (1953) Stochastic Processes. Wiley, New York.

Doukhan, P. (1994) Mixing: Properties and Examples. Lecture Notes in Statist. 85. Springer–Verlag, New York.

Duren, P.L. (1970) Theory of $H^p$ Spaces. Academic Press, New York.

Durner, A. (1992) On a theorem of Gauss–Kuzmin–Levy. Arch. Math. (Basel) 58, 251–256.

Elsner, C. (1999) On arithmetic properties of the convergents of Euler’s number. Colloq. Math. 79, 133–145.

Elton, H.J. (1987) An ergodic theorem for iterated maps. Ergodic Theory and Dynamical Systems 7, 481–488.

Faivre, C. (1992) Distribution of Levy constants for quadratic numbers. Acta Arith. 61, 13–34.

Faivre, C. (1993) Sur la mesure invariante de l’extension naturelle de la transformation des fractions continues. J. Théor. Nombres Bordeaux 5, 323–332.

Faivre, C. (1996) On the central limit theorem for random variables related to the continued fraction expansion. Colloq. Math. 71, 153–159.

Faivre, C. (1997) On decimal and continued fraction expansions of a real number. Acta Arith. 82, 119–128.

Faivre, C. (1998a) The rate of convergence of approximations of a continued fraction. J. Number Theory 68, 21–28.

Faivre, C. (1998b) A central limit theorem related to decimal and continued fraction expansions. Arch. Math. (Basel) 70, 455–463.

Falconer, K.J. (1986) The Geometry of Fractal Sets. Cambridge Univ. Press, Cambridge.

Falconer, K. (1990) Fractal Geometry: Mathematical Foundations and Applications. Wiley, Chichester.

Feller, W. (1968) An Introduction to Probability Theory and Its Applications, Vol. I, 3rd Edition. Wiley, New York.

Finch, S. (1995) Favorite Mathematical Constants. Available at: http://www.mathsoft.com/asolve/constant/constant.html

Flajolet, P. and Vallee, B. (1998) Continued fractions algorithms, functional operators, and structure constants. Theoret. Comput. Sci. 194, 1–34.

Flajolet, P. and Vallee, B. (2000) Continued fractions, comparison algorithms, and fine structure constants. Constructive, Experimental, and Nonlinear Analysis (Limoges, 1999), 53–82. Amer. Math. Soc., Providence, RI.

Fluch, W. (1986) Eine Verallgemeinerung des Kuz’min-Theorems. Anz. Österreich. Akad. Wiss. Math.-Natur. Kl. Sitzungsber. II 195, 325–339.

Fluch, W. (1992) Ein Operator der Kettenbruchtheorie. Anz. Österreich. Akad. Wiss. Math.-Natur. Kl. 129, 39–49.

Gal, I.S. and Koksma, J.F. (1950) Sur l’ordre de grandeur des fonctions sommables. Indag. Math. 12, 638–653.

Galambos, J. (1972) The distribution of the largest coefficient in continued fraction expansions. Quart. J. Math. Oxford Ser. (2) 23, 147–151.

Galambos, J. (1973) The largest coefficient in continued fractions and related problems. In: Osgood, Ch. (Ed.) Diophantine Approximation and its Applications (Proc. Conf., Washington, D.C., 1972), 101–109. Academic Press, New York.

Galambos, J. (1994) An iterated logarithm type theorem for the largest coefficient in continued fractions. Acta Arith. 25, 359–364.

Gologan, R.-N. (1989) Applications of Ergodic Theory. Technical Publishing House, Bucharest. (Romanian)

Gordin, M.I. (1971) On the behavior of the variances of sums of random variables forming a stationary process. Theory Probab. Appl. 16, 474–484.

Gordin, M.I. and Reznik, M.H. (1970) The law of the iterated logarithm for the denominators of continued fractions. Vestnik Leningrad. Univ. 25, no. 13, 28–33. (Russian)

Gray, J.J. (1984) A commentary on Gauss’ mathematical diary, 1796–1814, with an English translation. Exposition. Math. 2, 97–130.

Grigorescu, S. and Popescu, G. (1989) Random systems with complete connections as a framework for fractals. Stud. Cerc. Mat. 41, 481–489.

Grochenig, K. and Haas, A. (1996) Backward continued fractions and their invariant measures. Canad. Math. Bull. 39, 186–198.

Grothendieck, A. (1955) Produits tensoriels topologiques et espaces nucléaires. Mem. Amer. Math. Soc. 16. Amer. Math. Soc., Providence, RI.

Grothendieck, A. (1956) La théorie de Fredholm. Bull. Soc. Math. France 84, 319–384.

de Haan, L. (1970) On Regular Variation and its Application to the Weak Convergence of Sample Extremes. Math. Centre Tracts 32. Math. Centrum, Amsterdam.

Halmos, P.R. (1950) Measure Theory. Van Nostrand, New York. (Reprinted 1974 by Springer–Verlag, New York)

Hardy, G.H. and Wright, E. (1979) An Introduction to the Theory of Numbers, 5th Edition. Clarendon Press, Oxford. [Reprinted (with corrections) 1983]

Harman, G. (1998) Metric Number Theory. Oxford University Press, New York.

Harman, G. and Wong, K.C. (2000) A note on the metrical theory of continued fractions. Amer. Math. Monthly 107, 834–837.

Hartman, S. (1951) Quelques propriétés ergodiques des fractions continues. Studia Math. 12, 271–278.

Hartono, Y. and Kraaikamp, C. (2002) On continued fractions with odd partial quotients. Rev. Roumaine Math. Pures Appl. 47, no. 1.

Heilbronn, H. (1969) On the average length of a class of finite continued fractions. Number Theory and Analysis (Papers in Honor of Edmund Landau), 87–96. Plenum, New York.

Heinrich, H. (1987) Rates of convergence in stable limit theorems for sums of exponentially ψ-mixing random variables with an application to metric theory of continued fractions. Math. Nachr. 131, 149–165.

Hennion, H. (1993) Sur un théorème spectral et son application aux noyaux lipschitziens. Proc. Amer. Math. Soc. 118, 627–634.

Hensley, D. (1988) A truncated Gauss–Kuzmin law. Trans. Amer. Math. Soc. 306, 307–327.

Hensley, D. (1991) The largest digit in the continued fraction expansion of a rational number. Pacific J. Math. 151, 237–255.

Hensley, D. (1992) Continued fraction Cantor sets, Hausdorff dimension, and functional analysis. J. Number Theory 40, 336–358.

Hensley, D. (1994) The number of steps in the Euclidean algorithm. J. Number Theory 49, 142–182.

Hensley, D. (1996) A polynomial time algorithm for the Hausdorff dimension of continued fraction Cantor sets. J. Number Theory 58, 9–45.

Hensley, D. (1998) Metric Diophantine approximation and probability. New York J. Math. 4, 249–257.

Hensley, D. (2000) The statistics of the continued fraction digit sum. Pacific J. Math. 192, 103–120.

Heyde, C.C. and Scott, D.J. (1973) Invariance principles for the law of the iterated logarithm for martingales and processes with stationary increments. Ann. Probab. 1, 428–436.

Hofbauer, F. and Keller, G. (1982) Ergodic properties of invariant measures for piecewise monotonic transformations. Math. Z. 180, 119–140.

Hoffmann-Jørgensen, J. (1994) Probability with a View toward Statistics, Vols. I and II. Chapman & Hall, New York.

Hurwitz, A. (1889) Über eine besondere Art der Kettenbruch-Entwicklung reeller Grössen. Acta Math. 12, 367–405.

Ibragimov, I.A. and Linnik, Yu.V. (1971) Independent and Stationary Sequences of Random Variables. Wolters–Noordhoff, Groningen.

Ionescu Tulcea, C.T. and Marinescu, G. (1950) Théorie ergodique pour des classes d’opérations non complètement continues. Ann. of Math. (2) 52, 140–147.

Iosifescu, M. (1968) The law of the iterated logarithm for a class of dependent random variables. Theory Probab. Appl. 13, 304–313. Addendum, ibid. 15 (1970), 160.

Iosifescu, M. (1972) On Strassen’s version of the loglog law for some classes of dependent random variables. Z. Wahrsch. Verw. Gebiete 24, 155–158.

Iosifescu, M. (1977) A Poisson law for φ-mixing sequences establishing the truth of a Doeblin statement. Rev. Roumaine Math. Pures Appl. 22, 1441–1447.

Iosifescu, M. (1978) Recent advances in the metric theory of continued fractions. Trans. Eighth Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes (Prague, 1978), Vol. A, 27–40. Reidel, Dordrecht.

Iosifescu, M. (1989) On mixing coefficients for the continued fraction expansion. Stud. Cerc. Mat. 41, 491–499.

Iosifescu, M. (1990) A survey of the metric theory of continued fractions, fifty years after Doeblin’s 1940 paper. In: Grigelionis, B. et al. (Eds.) Probability Theory and Mathematical Statistics (Proc. Fifth Vilnius Conference, 1989), Vol. I, 550–572. Mokslas, Vilnius & VSP, Utrecht.

Iosifescu, M. (1992) A very simple proof of a generalization of the Gauss–Kuzmin–Levy theorem on continued fractions, and related questions. Rev. Roumaine Math. Pures Appl. 37, 901–914.

Iosifescu, M. (1993a) Doeblin and the metric theory of continued fractions: a functional theoretical approach to Gauss’ 1812 problem. In: Cohn, H. (Ed.) (1993), 97–110.

Iosifescu, M. (1993b) A basic tool in mathematical chaos theory: Doeblin and Fortet’s ergodic theorem and Ionescu Tulcea and Marinescu’s generalization. In: Cohn, H. (Ed.) (1993), 111–124.

Iosifescu, M. (1994) On the Gauss–Kuzmin–Levy theorem, I. Rev. Roumaine Math. Pures Appl. 39, 97–117.

Iosifescu, M. (1995) On the Gauss–Kuzmin–Levy theorem, II. Rev. Roumaine Math. Pures Appl. 40, 91–105.

Iosifescu, M. (1996) On some series involving sums of incomplete quotients of continued fractions. Stud. Cerc. Mat. 48, 31–36. Corrigendum, ibid. 48, 146.

Iosifescu, M. (1997a) On the Gauss–Kuzmin–Levy theorem, III. Rev. Roumaine Math. Pures Appl. 42, 71–88.

Iosifescu, M. (1997b) A reversible random sequence arising in the metric theory of the continued fraction expansion. Rev. Anal. Numer. Theor. Approx. 26, 91–93.

Iosifescu, M. (1999) On a 1936 paper of Arnaud Denjoy on the metrical theory of the continued fraction expansion. Rev. Roumaine Math. Pures Appl. 44, 777–792.

Iosifescu, M. (2000a) An exact convergence rate result with application to Gauss’ 1812 problem. Proc. Romanian Acad. Ser. A 1, 11–13.

Iosifescu, M. (2000b) Exact values of ψ-mixing coefficients of the sequence of incomplete quotients of the continued fraction expansion. Proc. Romanian Acad. Ser. A 1, 67–69.

Iosifescu, M. (2000c) On the distribution of continued fraction approximations: optimal rates. Proc. Romanian Acad. Ser. A 1, 143–145.

Iosifescu, M. and Grigorescu, S. (1990) Dependence with Complete Connections and its Applications. Cambridge Univ. Press, Cambridge.

Iosifescu, M. and Kalpazidou, S. (1993) The nearest integer continued fraction expansion: an approach in the spirit of Doeblin. In: Cohn, H. (Ed.) (1993), 125–137.

Iosifescu, M. and Kraaikamp, C. (2001) On Denjoy’s canonical continued fraction expansion. Submitted.

Iosifescu, M. and Theodorescu, R. (1969) Random Processes and Learning. Springer–Verlag, Berlin.

Ito, Sh. (1987) On Legendre’s theorem related to Diophantine approximations. Séminaire de Théorie des Nombres, 1987–1988 (Talence, 1987–1988), Exp. No. 44, 19 pp.

Ito, Sh. (1989) Algorithms with mediant convergents and their metrical theory. Osaka J. Math. 26, 557–578.

Jager, H. (1982) On the speed of convergence of the nearest integer continued fraction. Math. Comp. 39, 555–558.

Jager, H. (1985) Metrical results for the nearest integer continued fraction. Indag. Math. 47, 417–427.

Jager, H. (1986a) The distribution of certain sequences connected with the continued fraction. Indag. Math. 48, 61–69.

Jager, H. (1986b) Continued fractions and ergodic theory. Transcendental Number Theory and Related Topics, 55–59. RIMS Kokyuroku 599. Kyoto Univ., Kyoto.

Jager, H. and Kraaikamp, C. (1989) On the approximation by continued fractions. Indag. Math. 51, 289–307.

Jager, H. and Liardet, P. (1988) Distributions arithmétiques des dénominateurs de convergents de fractions continues. Indag. Math. 50, 181–197.

Jain, N.C. and Pruitt, W.E. (1975) The other law of the iterated logarithm. Ann. Probab. 3, 1046–1049.

Jain, N.C. and Taylor, S.J. (1973) Local asymptotic laws for Brownian motion. Ann. Probab. 1, 527–549.

Jenkinson, O. and Pollicott, M. (2001) Computing the dimension of dynamically defined sets: $E_2$ and bounded continued fractions. Ergodic Theory and Dynamical Systems 21, 1429–1445.

Jain, N.C., Jogdeo, K., and Stout, W.F. (1975) Upper and lower functions for martingales and mixing processes. Ann. Probab. 3, 119–145.

Jones, W.B. and Thron, W.J. (1980) Continued Fractions: Analytic Theory and Applications. Addison-Wesley, Reading, Mass.

Kac, M. (1959) Statistical Independence in Probability and Statistics. Wiley, New York.

Kaijser, T. (1983) A note on random continued fractions. Probability and Mathematical Statistics: Essays in Honour of Carl-Gustav Esseen, 74–84. Uppsala Univ., Dept. of Math., Uppsala.

Kakeya, S. (1924) On a generalized scale of notations. Japan J. Math. 1, 95–108.

Kalpazidou, S. (1985a) On a random system with complete connections associated with the continued fraction to the nearer integer expansion. Rev. Roumaine Math. Pures Appl. 30, 527–537.

Kalpazidou, S. (1985b) On some bidimensional denumerable chains of infinite order. Stochastic Process. Appl. 19, 341–357.

Kalpazidou, S. (1985c) Denumerable chains of infinite order and Hurwitz expansion. Selected Papers Presented at the 16th European Meeting of Statisticians (Marburg, 1994). Statist. Decisions, Suppl. Issue no. 2, 83–87.

Kalpazidou, S. (1986a) A class of Markov chains arising in the metrical theory of the continued fraction to the nearer integer expansion. Rev. Roumaine Math. Pures Appl. 31, 877–890.

Kalpazidou, S. (1986b) Some asymptotic results on digits of the nearest integer continued fraction. J. Number Theory 22, 271–279.

Kalpazidou, S. (1986c) On nearest continued fractions with stochastically independent and identically distributed digits. J. Number Theory 24, 114–125.

Kalpazidou, S. (1986d) On a problem of Gauss–Kuzmin type for continued fractions with odd partial quotients. Pacific J. Math. 123, 103–114.

Kalpazidou, S. (1986e) A Gaussian measure for certain continued fractions. Proc. Amer. Math. Soc. 96, 629–635.

Kalpazidou, S. (1987a) On the entropy of the expansion with odd partial quotients. In: Grigelionis, B. et al. (Eds.) Probability Theory and Mathematical Statistics (Proc. Fourth Vilnius Conf., 1985), Vol. II, 55–62. VNU Science Press, Utrecht.

Kalpazidou, S. (1987b) On the application of dependence with complete connections to the metrical theory of G-continued fractions. Lithuanian Math. J. 27, no. 1, 32–40.

Kamae, T. (1982) A simple proof of the ergodic theorem using nonstandard analysis. Israel J. Math. 42, 284–290.

Kanwal, R.P. (1997) Linear Integral Equations: Theory and Technique, 2nd Edition. Birkhäuser, Boston.

Kargaev, P. and Zhigljavsky, A. (1997) Asymptotic distribution of the distance function to the Farey points. J. Number Theory 65, 130–149.

Katznelson, Y. and Weiss, B. (1982) A simple proof of some ergodic theorems. Israel J. Math. 42, 291–296.

Keane, M.S. (1991) Ergodic theory and subshifts of finite type. In: Bedford, T. et al. (Eds.) (1991), 35–70.

Keller, G. (1984) On the rate of convergence to equilibrium in one-dimensional systems. Comm. Math. Phys. 96, 181–193.

Khintchine, A. (1934/35) Metrische Kettenbruchprobleme. Compositio Math. 1, 361–382.

Khintchine, A. (1936) Zur metrischen Kettenbruchtheorie. Compositio Math. 3, 276–285.

Khintchine, A.J. (1956) Kettenbrüche. Teubner, Leipzig. [Translation of the 2nd (1949) Russian Edition; 1st Russian Edition 1935]

Khintchine, A.Ya. (1963) Continued Fractions. Noordhoff, Groningen. [Translation of the 3rd (1961) Russian Edition]

Khinchin, A.Ya. (1964) Continued Fractions. Univ. Chicago Press, Chicago. [Translation of the 3rd (1961) Russian Edition]

Klein, F. (1895) Über eine geometrische Auffassung der gewöhnlichen Kettenbruchentwicklung. Nachr. König. Gesellsch. Wiss. Göttingen Math.-Phys. Kl. 45, 357–359. [French version (1896) Sur une représentation géométrique du développement en fraction continue ordinaire. Nouvelles Ann. Math. (3), 15, 327–331]

Knopp, K. (1926) Mengentheoretische Behandlung einiger Probleme der diophantische Approximationen und der transfiniten Wahrscheinlichkeiten. Math. Ann. 95, 409–426.

Knopp, M. and Sheingorn, M. (Eds.) (1993) A Tribute to Emil Grosswald: Number Theory and Related Analysis. Contemporary Mathematics 143. Amer. Math. Soc., Providence, RI.

Knuth, D.E. (1976) Evaluation of Porter’s constant. Comput. Math. Appl. 2, 137–139.

Knuth, D.E. (1981) The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, 2nd Edition. Addison-Wesley, Reading, Mass.

Knuth, D.E. (1984) The distribution of continued fraction approximations. J. Number Theory 19, 443–448.

Kohler, G. (1980) Some more predictable continued fractions. Monatsh. Math. 89, 95–100.

Koksma, J.F. (1936) Diophantische Approximationen. J. Springer, Berlin.

Kraaikamp, C. (1987) The distribution of some sequences connected with the nearest integer continued fraction. Indag. Math. 49, 177–191.

Kraaikamp, C. (1989) Statistic and ergodic properties of Minkowski’s diagonal continued fraction. Theoret. Comput. Sci. 65, 197–212.

Kraaikamp, C. (1990) On the approximation by continued fractions, II. Indag. Math. (N.S.) 1, 63–75.

Kraaikamp, C. (1991) A new class of continued fractions. Acta Arith. 57, 1–39.

Kraaikamp, C. (1993) Maximal S-expansions are Bernoulli shifts. Bull. Soc. Math. France 121, 117–131.

Kraaikamp, C. (1994) On symmetric and asymmetric Diophantine approximation by continued fractions. J. Number Theory 46, 137–157.

Kraaikamp, C. and Liardet, P. (1991) Good approximations and continued fractions. Proc. Amer. Math. Soc. 112, 303–309.

Kraaikamp, C. and Lopes, A. (1996) The theta group and the continued fraction expansion with even partial quotients. Geometriae Dedicata 59, 293–333.

Kraaikamp, C. and Meester, R. (1998) Convergence of continued fraction type algorithms and generators. Monatsh. Math. 125, 1–14.

Kraaikamp, C. and Nakada, H. (2000) On normal numbers for continued fractions. Ergodic Theory and Dynamical Systems 20, 1405–1421.

Kraaikamp, C. and Nakada, H. (2001) On a problem of Schweiger concerning normal numbers. J. Number Theory 86, 330–340.

Krasnoselskii, M. (1964) Positive Solutions of Operator Equations. Noordhoff, Groningen.

Krengel, U. (1985) Ergodic Theorems (with a Supplement by Antoine Brunel). W. de Gruyter, Berlin.

Kuipers, L. and Niederreiter, H. (1974) Uniform Distribution of Sequences. Wiley, New York.

Kurosu, K. (1924) Notes on some points in the theory of continued fractions. Japan J. Math. 1, 17–21. Corrigendum, ibid. 2 (1926), 64.

Kuzmin, R.O. (1928) On a problem of Gauss. Dokl. Akad. Nauk SSSR Ser. A, 375–380. [Russian; French version in Atti Congr. Internaz. Mat. (Bologna, 1928), Tomo VI, 83–89. Zanichelli, Bologna, 1932]

Lagarias, J.C. (1992) Number theory and dynamical systems. In: Burr, S.A. (Ed.) The Unreasonable Effectiveness of Number Theory, 35–72. Proc. Sympos. Appl. Math. 46. Amer. Math. Soc., Providence, RI.

Lang, S. and Trotter, H. (1972) Continued fractions for some algebraic numbers. J. Reine Angew. Math. 255, 112–134. Addendum, ibid. 267 (1974), 219–220.

Lasota, A. and Mackey, M.C. (1985) Probabilistic Properties of Deterministic Systems. Cambridge Univ. Press, Cambridge. [2nd Edition (1994) Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics. Applied Mathematical Sciences 97. Springer–Verlag, New York]

Legendre, A.M. (1798) Essai sur la théorie des nombres. Duprat, Paris. [2ème édition (1808), Courcier, Paris; 3ème édition (1830), Didot, Paris; reprinted (1955), Blanchard, Paris]

Lehmer, D. (1939) Note on an absolute constant of Khintchine. Amer. Math. Monthly 46, 148–152.

Lehner, J. (1994) Semiregular continued fractions whose partial denominators are 1 or 2. In: Abikoff, W. et al. (Eds.) The Mathematical Legacy of Wilhelm Magnus: Groups, Geometry and Special Functions (Brooklyn, NY, 1992), 407–410. Contemporary Mathematics 169. Amer. Math. Soc., Providence, RI.

Levy, P. (1929) Sur les lois de probabilité dont dépendent les quotients complets et incomplets d’une fraction continue. Bull. Soc. Math. France 57, 178–194.

Levy, P. (1936) Sur le développement en fraction continue d’un nombre choisi au hasard. Compositio Math. 3, 286–303.

Levy, P. (1952) Fractions continues aléatoires. Rend. Circ. Mat. Palermo (2) 1, 170–208.

Levy, P. (1954) Théorie de l’addition des variables aléatoires, 2ème édition. Gauthier-Villars, Paris. (1ère édition 1937)

Liardet, P. and Stambul, P. (2000) Séries de Engel et fractions continues. J. Théor. Nombres Bordeaux 12, 37–68.

Lin, M. (1978) Quasi-compactness and uniform ergodicity of positive operators. Israel J. Math. 29, 309–311.

Lochs, G. (1961) Statistik der Teilnenner der zu den echten Brüchen gehörigen regelmässigen Kettenbrüche. Monatsh. Math. 65, 27–52.

Lochs, G. (1963) Die ersten 968 Kettenbruchnenner von π. Monatsh. Math. 67, 311–316.

Lochs, G. (1964) Vergleich der Genauigkeit von Dezimalbruch und Kettenbruch. Abh. Math. Sem. Hamburg 27, 142–144.

Lorenzen, L. and Waadeland, H. (1992) Continued Fractions and Applications. North-Holland, Amsterdam.

Loynes, R.M. (1965) Extreme values in uniformly mixing stationary stochastic processes. Ann. Math. Statist. 36, 993–999.

Lyons, R. (2000) Singularity of some random continued fractions. J. Theoret. Probab. 13, 535–545.

Mackey, M.C. (1992) Time’s Arrow: The Origins of Thermodynamic Behavior. Springer–Verlag, New York.

MacLeod, A.J. (1993) High-accuracy numerical values in the Gauss–Kuzmin continued fraction problem. Comput. Math. Appl. 26, 37–44.

Magnus, W., Oberhettinger, F., and Soni, R.P. (1966) Formulas and Theorems for the Special Functions of Mathematical Physics, 3rd Edition. Springer–Verlag, Berlin.

Marcus, S. (1961) Les approximations diophantiennes et la categorie de Baire. Math. Z. 76, 42–45.

Marques Henriques, J. (1966) On probability measures generated by regular continued fractions. Gaz. Mat. (Lisboa) 27, no. 103–104, 16–22.

Martin, M.H. (1934) Metrically transitive point transformations. Bull. Amer. Math. Soc. 40, 606–612.

Mayer, D.H. (1987) Relaxation properties of the mixmaster universe. Physics Lett. A 122, 390–394.

Mayer, D. (1990) On the thermodynamic formalism for the Gauss map. Comm. Math. Phys. 130, 311–333.

Mayer, D. (1991) Continued fractions and related transformations. In: Bedford, T. et al. (Eds.) (1991), 175–222.

Mayer, D. and Roepstorff, G. (1987) On the relaxation time of Gauss’ continued-fraction map. I. The Hilbert space approach (Koopmanism). J. Statist. Phys. 47, 149–171.

Mayer, D. and Roepstorff, G. (1988) On the relaxation time of Gauss’ continued-fraction map. II. The Banach space approach (transfer operator method). J. Statist. Phys. 50, 331–344.

Mazzone, F. (1995/96) A characterization of almost everywhere continuous functions. Real Anal. Exchange 21, no. 1, 317–319.

McKinney, T.E. (1907) Concerning a certain type of continued fractions depending on a variable parameter. Amer. J. Math. 29, 213–278.

Minkowski, H. (1900) Uber die Annaherung an eine reelle Grosse durch rationale Zahlen. Math. Ann. 54, 91–124.

Minnigerode, B. (1873) Uber eine neue Methode, die Pell’sche Gleichung aufzulosen. Nachr. Konig. Gesellsch. Wiss. Gottingen Math.-Phys. Kl. 23, 619–652.

Misevicius, G. (1971) Asymptotic expansions for the distribution functions of sums of the form $\sum_{j=0}^{n-1} f(T^j t)$. Ann. Univ. Sci. Budapest Eotvos Sect. Math. 14, 77–92. (Russian)

Misevicius, G. (1981) Estimate of the remainder term in the limit theorem for the denominators of continued fractions. Lithuanian Math. J. 21, 245–253.

Misevicius, G. (1992) The optimal zone for large deviations of the denominators of continued fractions. New Trends in Probability and Statistics (Palanga, 1991), Vol. 2, 83–90. VSP, Utrecht.

Moeckel, R. (1982) Geodesics on modular surfaces and continued fractions. Ergodic Theory and Dynamical Systems 2, 69–83.

Mollin, R.A. (1999) Continued fraction gems. Nieuw Arch. Wiskunde (4) 17, 383–405.

Morita, T. (1994) Local limit theorem and distribution of periodic orbits of Lasota-Yorke transformations with infinite Markov partitions. J. Math. Soc. Japan 46, 309–343. Errata, ibid. 47 (1995), 191–192.

Nakada, H. (1981) Metrical theory for a class of continued fraction transformations and their natural extensions. Tokyo J. Math. 7, 399–426.

Nakada, H. (1990) The metrical theory of complex continued fractions. Acta Arith. 56, 279–289.

Nakada, H. (1995) Continued fractions, geodesic flows and Ford circles. In: Takahashi, Y. (Ed.), Algorithms, Fractals and Dynamics, 179–191. Plenum, New York.

Nakada, H., Ito, Sh., and Tanaka, S. (1977) On the invariant measure for the transformations associated with some real continued fraction. Keio Engrg. Rep. 30, 159–175.

von Neumann, J. and Tuckerman, B. (1955) Continued fraction expansion of $2^{1/3}$. Math. Tables Aids Comput. 9, 23–24.

Nolte, V.N. (1990) Some probabilistic results on the convergents of continued fractions. Indag. Math. (N.S.) 1, 381–389.

Obrechkoff, N. (1951) Sur l’approximation des nombres irrationnels par des nombres rationnels. C.R. Acad. Bulgare Sci. 3, no. 1, 1–4.

Olds, C.D. (1963) Continued Fractions. Random House, Toronto.

Pedersen, P. (1959) On the expansion of π in a regular continued fraction. II. Nordisk Mat. Tidskr. 7, 165–168.

Perron, O. (1954, 1957) Die Lehre von der Kettenbruchen. Band I: Elementare Kettenbruche; Band II: Analytisch-funktiontheoretische Kettenbruche. Teubner, Stuttgart. (1st Edition 1913; 2nd Edition 1929)

Petek, P. (1989) The continued fraction of a random variable. Exposition. Math. 7, 369–378.

Petersen, K. (1983) Ergodic Theory. Cambridge Univ. Press, Cambridge.

Petho, A. (1982) Simple continued fractions for the Fredholm numbers. J. Number Theory 14, 232–236.

Philipp, W. (1967) Some metrical theorems in number theory. Pacific J. Math. 20, 109–127.

Philipp, W. (1970) Some metrical theorems in number theory II. Duke Math. J. 37, 447–458. Errata, ibid. 37, 788.

Philipp, W. (1976) A conjecture of Erdos on continued fractions. Acta Arith. 28, 379–386.

Philipp, W. (1988) Limit theorems for sums of partial quotients of continued fractions. Monatsh. Math. 105, 195–206.

Philipp, W. and Stackelberg, O.P. (1969) Zwei Grenzwertsatze fur Kettenbruche. Math. Ann. 181, 152–156.

Philipp, W. and Stout, W. (1975) Almost Sure Invariance Principles for Partial Sums of Weakly Dependent Random Variables. Mem. Amer. Math. Soc. 161. Amer. Math. Soc., Providence, RI.

Philipp, W. and Webb, G.R. (1973) An invariance principle for mixing sequences of random variables. Z. Wahrsch. Verw. Gebiete 25, 223–237.

von Plato, J. (1994) Creating Modern Probability: Its Mathematics, Physics and Philosophy in Historical Perspective. Cambridge Univ. Press, Cambridge.

van der Poorten, A. and Shallit, J. (1992) Folded continued fractions. J. Number Theory 40, 237–250.

Popescu, C. (1997a) Continued fractions with odd partial quotients: an approach in the spirit of Doeblin. Stud. Cerc. Mat. 49, 107–117.

Popescu, C. (1997b) On the rate of convergence in Gauss’ problem for the continued fraction expansion with odd partial quotients. Stud. Cerc. Mat. 49, 231–244.

Popescu, C. (1999) On the rate of convergence in Gauss’ problem for the nearest integer continued fraction expansion. Rev. Roumaine Math. Pures Appl. 44, 257–267.

Popescu, C. (2000) On a Gauss–Kuzmin problem for the α-continued fractions. Rev. Roumaine Math. Pures Appl. 45, 993–1004.

Popescu, G. (1978) Asymptotic behaviour of random systems with complete connections, I, II. Stud. Cerc. Mat. 30, 37–68, 181–215. (Romanian)

Porter, J.W. (1975) On a theorem of Heilbronn. Mathematika 22, 20–28.

Postnikov, A.G. (1960) Arithmetic Modeling of Random Processes. Trudy Mat. Inst. Steklov. 57. Nauka, Moscow. [Russian; English translation Selected Transl. in Math. Statist. and Probab. 13 (1973), 41–122]

Raney, G.N. (1973) On continued fractions and finite automata. Math. Ann. 206, 265–283.

Rautu, G. and Zbaganu, G. (1989) Some Banach algebras of functions of bounded variation. Stud. Cerc. Mat. 41, 513–519.

Renyi, A. (1957) Representations for real numbers and their ergodic properties. Acta Math. Acad. Sci. Hungar. 8, 477–493.

Richtmyer, R.D. (1975) Continued fraction expansion of algebraic numbers. Adv. in Math. 16, 362–367.

Rieger, G.J. (1977) Die metrische Theorie der Kettenbruche seit Gauss. Abh. Braunschweig. Wiss. Gesellsch. 27, 103–117.

Rieger, G.J. (1978) Ein Gauss–Kusmin–Levy–Satz fur Kettenbruche nach nachsten Ganzen. Manuscripta Math. 24, 437–448.

Rieger, G.J. (1979) Mischung und Ergodizitat bei Kettenbruchen nach nachsten Ganzen. J. Reine Angew. Math. 310, 171–181.

Rieger, G.J. (1981a) Ein Heilbronn–Satz fur Kettenbruche mit ungeraden Teilnennern. Math. Nachr. 101, 295–307.

Rieger, G.J. (1981b) Uber die Lange von Kettenbruchen mit ungeraden Teilnennern. Abh. Braunschweig. Wiss. Gesellsch. 32, 61–69.

Rieger, G.J. (1984) On the metrical theory of the continued fractions with odd partial quotients. Topics in Classical Number Theory (Budapest, 1981), Vol. II, 1371–1418. Colloq. Math. Soc. Janos Bolyai 34. North-Holland, Amsterdam.

Rivat, J. (1999) On the metric theory of continued fractions. Colloq. Math. 79, 9–15.

Rockett, A.M. (1980) The metrical theory of continued fractions to the nearer integer. Acta Arith. 38, 97–103.

Rockett, A.M. and Szusz, P. (1992) Continued Fractions. World Scientific, Singapore.

Rogers, C.A. (1998) Hausdorff measures, 2nd Printing, with a Foreword by K. Falconer. Cambridge Univ. Press, Cambridge.

Rosen, D. (1954) A class of continued fractions associated with certain properly discontinuous groups. Duke Math. J. 21, 549–563.

Rousseau-Egele, J. (1983) Un theoreme de la limite locale pour une classe de transformations dilatantes et monotones par morceaux. Ann. Probab. 11, 772–788.

Ruelle, D. (1978) Thermodynamic Formalism. The Mathematical Structures of Classical Equilibrium Statistical Mechanics. Addison-Wesley, Reading, Mass.

Ryll–Nardzewski, C. (1951) On the ergodic theorems. II. Ergodic theory of continued fractions. Studia Math. 12, 74–79.

Salat, T. (1967) Remarks on the ergodic theory of the continued fractions. Mat. Casopis Sloven. Akad. Vied 17, 121–130.

Salat, T. (1969) Bemerkung zu einem Satz von P. Levy in der metrischen Theorie der Kettenbruche. Math. Nachr. 41, 91–94.

Salat, T. (1984) On a metric result in the theory of continued fractions. Acta Math. Univ. Comenian. 44–45, 49–53.

Salem, R. (1943) On some singular monotonic functions which are strictly increasing. Trans. Amer. Math. Soc. 53, 427–439.

Samorodnitsky, G. and Taqqu, M.S. (1994) Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall, New York.

Samur, J.D. (1984) Convergence of sums of mixing triangular arrays of random vectors with stationary rows. Ann. Probab. 12, 390–426.

Samur, J.D. (1985) A note on the convergence to Gaussian laws of sums of stationary ϕ-mixing triangular arrays. Probability in Banach Spaces V (Proceedings, Medford, 1984), 387–399. Lecture Notes in Math. 1153. Springer–Verlag, Berlin.

Samur, J.D. (1987) On the invariance principle for stationary ϕ-mixing triangular arrays with infinitely divisible limits. Probab. Theory Related Fields 75, 245–259.

Samur, J.D. (1989) On some limit theorems for continued fractions. Trans. Amer. Math. Soc. 316, 53–79.

Samur, J.D. (1991) A functional central limit theorem in Diophantine approximation. Proc. Amer. Math. Soc. 111, 901–911.

Samur, J.D. (1996) Some remarks on a probability limit theorem for continued fractions. Trans. Amer. Math. Soc. 348, 1411–1428.

Saulis, L. and Statulevicius, V. (1991) Limit Theorems for Large Deviations. Kluwer, Dordrecht.

Schmidt, A.L. (1975) Diophantine approximation of complex numbers. Acta Math. 134, 1–85.

Schmidt, A.L. (1983) Ergodic theory for complex continued fractions. Monatsh. Math. 93, 39–62.

Schmidt, T.A. (1993) Remarks on the Rosen λ-continued fractions. In: Pollington, A. and Moran, W. (Eds.), Number Theory with an Emphasis on the Markoff Spectrum, 227–238. Marcel Dekker, New York.

Schmidt, W.M. (1960) On normal numbers. Pacific J. Math. 10, 661–672.

Schmidt, W.M. (1980) Diophantine Approximation. Lecture Notes in Math. 785. Springer–Verlag, Berlin.

Schweiger, F. (1969) Eine Bemerkung zu einer Arbeit von S.D. Chatterji. Mat. Casopis Sloven. Akad. Vied 19, 89–91.

Schweiger, F. (1995) Ergodic Theory of Fibred Systems and Metric Number Theory. Clarendon Press, Oxford.

Schweiger, F. (2000a) Kuzmin’s theorem revisited. Ergodic Theory and Dynamical Systems 20, 557–565.

Schweiger, F. (2000b) Multidimensional Continued Fractions. Oxford Univ. Press, Oxford.

Sebe, G.I. (1999) Spectral analysis of the Ruelle operator associated with the topological infinite order chain of the continued fraction expansion. Rev. Roumaine Math. Pures Appl. 44, 277–291.

Sebe, G.I. (2000a) The Gauss–Kuzmin theorem for Hurwitz’s singular continued fraction expansion. Rev. Roumaine Math. Pures Appl. 45, 495–514.

Sebe, G.I. (2000b) A two-dimensional Gauss–Kuzmin theorem for singular continued fractions. Indag. Math. (N.S.) 11, 593–605.

Sebe, G.I. (2001a) On convergence rate in the Gauss–Kuzmin problem for the grotesque continued fractions. Monatsh. Math. 133, 241–254.

Sebe, G.I. (2001b) Gauss’ problem for the continued fraction expansion with odd partial quotients revisited. Rev. Roumaine Math. Pures Appl. 46, 839–852.

Sebe, G.I. (2002) A Gauss–Kuzmin theorem for the Rosen fractions. J. Theor. Nombres Bordeaux 14.

Segre, B. (1945) Lattice points in infinite domains, and asymmetric Diophantine approximation. Duke Math. J. 12, 337–365.

Selenius, C.-O. (1960) Konstruktion und Theorie halbregelmassiger Kettenbruche mit idealer relativer Approximationen. Acta Acad. Abo. Math. Phys. 22, no. 2, 1–75.

Sendov, B. (1959/60) Der Vahlensatz uber die singularen Kettenbruche und die Kettenbruche nach nachsten Ganzen. Annuaire Univ. Sofia Fac. Sci. Phys. Math. Livre 1 Math. 54, 251–258.

Seneta, E. (1976) Regularly Varying Functions. Lecture Notes in Math. 508. Springer–Verlag, Berlin.

Series, C. (1982) Non-Euclidean geometry, continued fractions, and ergodic theory. Math. Intelligencer 4, no. 1, 24–31.

Series, C. (1991) Geometrical methods of symbolic coding. In: Bedford, T. et al. (Eds.) (1991), 125–151.

Shallit, J. (1979) Simple continued fractions for some irrational numbers. J. Number Theory 11, 209–217.

Shallit, J.O. (1982a) Simple continued fractions for some irrational numbers, II. J. Number Theory 14, 228–231.

Shallit, J.O. (1982b) Explicit descriptions of some continued fractions. Fibonacci Quart. 20, 77–81.

Shallit, J. (1994) Origins of the analysis of the Euclidean algorithm. Historia Math. 21, 401–419.

Shanks, D. and Wrench, J.W., Jr. (1959) Khintchine’s constant. Amer. Math. Monthly 66, 276–279.

Shiu, P. (1995) Computation of continued fractions without input values. Math. Comp. 64, 1307–1317.

Sinai, Ya.G. (1994) Topics in Ergodic Theory. Princeton Univ. Press, Princeton, NJ.

Sloane, N.J.A. and Plouffe, S. (1995) The Encyclopedia of Integer Sequences. Academic Press, San Diego.

Sprindzuk, V.G. (1979) Metric Theory of Diophantine Approximations. Wiley, New York.

Stadje, W. (1985) Bemerkung zu einem Satz von Akcoglu und Krengel. Studia Math. 81, 307–310.

Strassen, V. (1964) An invariance principle for the law of the iterated logarithm. Z. Wahrsch. Verw. Gebiete 3, 211–226.

Sudan, G. (1959) The Geometry of Continued Fractions. Technical Publishing House, Bucharest. (Romanian)

Szusz, P. (1961) Uber einen Kusminschen Satz. Acta Math. Acad. Sci. Hungar. 12, 447–453.

Szusz, P. (1962) Verallgemeinerung und Anwendungen eines Kusminschen Satzes. Acta Arith. 7, 149–160.

Szusz, P. (1980) On the length of continued fractions representing a rational number with given denominator. Acta Arith. 37, 55–59.

Szusz, P. and Volkmann, B. (1982) On Strassen’s law of the iterated logarithm. Z. Wahrsch. Verw. Gebiete 61, 453–458.

Tamura, J. (1991) Symmetric continued fractions related to certain series. J. Number Theory 38, 251–264.

Tanaka, S. and Ito, Sh. (1981) On a family of continued-fraction transformations and their ergodic properties. Tokyo J. Math. 4, 153–175.

Thakur, D.S. (1996) Exponential and continued fractions. J. Number Theory 59, 248–261.

Tietze, H. (1913) Uber die raschesten Kettenbruchentwicklungen reeller Zahlen. Monatsh. Math. Phys. 24, 209–242.

Tong, J. (1983) The conjugate property of the Borel theorem on Diophantine approximation. Math. Z. 184, 151–153.

Tong, J. (1994) The best approximation function to irrational numbers. J. Number Theory 49, 89–94.

Tonkov, T. (1974) On the average length of finite continued fractions. Acta Arith. 26, 47–57.

Urban, F.M. (1923) Grundlagen der Wahrscheinlichkeitsrechnung und der Theorie der Beobachtungsfehler. Teubner, Leipzig.

Urbanski, M. (2001) Porosity in conformal infinite iterated function systems. J. Number Theory 88, 283–312.

Vahlen, K.T. (1895) Uber Naherungswerthe und Kettenbruche. J. Reine Angew. Math. 115, 221–233.

Vajda, S. (1989) Fibonacci and Lucas Numbers, and the Golden Section: Theory and Applications. E. Horwood, Chichester.

Vallee, B. (1997) Operateurs de Ruelle–Mayer generalises et analyse des algorithmes d’Euclide et de Gauss. Acta Arith. 81, 101–144.

Vallee, B. (1998) Dynamique des fractions continues a contraintes periodiques. J. Number Theory 72, 183–235.

Vallee, B. (2000) Digits and continuants in Euclidean algorithms. Ergodic versus Tauberian theorems. J. Theor. Nombres Bordeaux 12, 531–570.

Vardi, I. (1995) The limiting distribution of the St. Petersburg game. Proc. Amer. Math. Soc. 123, 2875–2882.

Vardi, I. (1997) The St. Petersburg game and continued fractions. C.R. Acad. Sci. Paris Ser. I Math. 324, 913–918.

Veech, W.A. (1982) Gauss measures for transformations on the space of interval exchange maps. Ann. of Math. (2) 115, 201–242.

Vershik, A.M. and Sidorov, N.A. (1993) Arithmetic expansions associated with the rotation of a circle. Algebra i Analiz 5, no. 6, 97–115. (Russian)

Viader, P., Paradis, J., and Bibiloni, L. (1998) A new light on Minkowski’s ?(x)-function. J. Number Theory 73, 212–227.

Viswanath, D. (2000) Random Fibonacci sequences and the number 1.13198824.... Math. Comp. 69, 1131–1155.

de Vroedt, C. (1962) Measure-theoretical investigations concerning continued fractions. Indag. Math. 24, 583–591.

de Vroedt, C. (1964) Metrical problems concerning continued fractions. Compositio Math. 16, 191–195.

Wall, H.S. (1948) Analytic Theory of Continued Fractions. Van Nostrand, New York.

Walters, P. (1982) An Introduction to Ergodic Theory. Graduate Texts in Mathematics 79. Springer–Verlag, New York.

Watson, G.N. (1944) A Treatise on the Theory of Bessel Functions, 2nd Edition. Cambridge Univ. Press, Cambridge.

Whittaker, E.T. and Watson, G.N. (1927) A Course of Modern Analysis. Cambridge Univ. Press, Cambridge.

Wiman, A. (1900) Uber eine Wahrscheinlichkeitsaufgabe bei Kettenbruchentwickelungen. Ofversicht af Kongl. Svenska Vetenskaps-Akademiens Forhandlingar 57, 829–841.

Wirsing, E. (1974) On the theorem of Gauss–Kusmin–Levy and a Frobenius type theorem for function spaces. Acta Arith. 24, 507–528.

Wrench, J.W., Jr. (1960) Further evaluation of Khintchine’s constant. Math. Comp. 14, 370–371.

Wrench, J.W., Jr. and Shanks, D. (1966) Questions concerning Khintchine’s constant and the efficient computation of regular continued fractions. Math. Comp. 20, 444–448.

Zagier, D.B. (1981) Zetafunktionen und quadratische Korper. Eine Einfuhrung in die hohere Zahlentheorie. Springer–Verlag, Berlin-New York.

Zuparov, T.M. (1981) On a theorem from the metric theory of continued fractions. Izv. Akad. UzSSR Ser. Fiz.-Mat. Nauk no. 6, 9–12. (Russian)

Index

Aaronson, J., 311, 339
Abramov’s formula, 277
Acosta, A. de, 202
Adams, W.W., 271, 272
Adler, R.L., 9, 244, 307
α-expansion, 281, 344, 345
Alexandrov, A.G., 241
algorithm A, 259
algorithm B, 260
algorithm C, 260
almost Markov property, 335
Alzer, H., 13
approximation coefficient, 27, 263
Araujo, A., 197, 320, 331
arc-sine law, 187
    generalization of, 202
array, 325
    strictly stationary, 326
    strongly infinitesimal (s.i.), 327
associated random variables, 15
    extended, 34
automorphism, 219

Babenko, K.I., 103, 109, 111, 113, 336
backward continued fraction (BCF) expansion, 307
Bagemihl, F., 30
Bailey, D.H., 231, 233, 241
Barbolosi, D., 249, 264
Barndorff–Nielsen, O., 176
Barrionuevo, J., 346
Berechet, A., xiii, 151
Bernstein, F.
    F. Bernstein’s theorem, 49, 174
Bibiloni, L., 238
Billingsley, P., 36, 180, 187, 221, 224, 257, 320, 334, 343
Birkhoff’s individual ergodic theorem, 221
Borel sets, 314
Borel, E., 22, 30, 243, 337
Borwein, J.M., 231, 233, 241
Bosma, W., 249, 251, 252, 260, 281, 288, 293–296, 298, 299, 343
boundary, 315
bounded essential variation, 55
bounded p-variation, 75
Boyarski, A., 58, 221, 223
Bradley, R.C., 326
Breiman, L., 253
Brezinski, C., xii
Brjuno, A.D., 12, 241
Broden, T., 22, 336, 337
Broden–Borel–Levy formula, 21
    generalized, 37
Burton, R.M., 344, 345

Cassels, J.W.S., 343
Champernowne, D.G., 243
characteristic function, 316
Choong, K.Y., 241
Chudnovsky, D.V., 13

Chudnovsky, G.V., 13
Clemens, L.E., 12
conditional probability measures, 36
continuant, 5
continued fraction (CF), 260
continued fraction digits, 4
continued fraction expansion, 4
continued fraction expansion for e, 12
continued fraction expansion for π, 13
continued fraction transformation, 2
    natural extension of, 25
continued fraction with even incomplete quotients (Even CF) expansion, 264
continued fraction with odd incomplete quotients (Odd CF) expansion, 264
convolution, 316
Corless, R.M., 334
Cornfeld, I.P., 221
Crandall, R.E., 231, 233, 241

Dajani, K., 250, 300, 303–305, 310, 311, 337, 341, 343, 345, 346
Daude, H., 111, 130, 134
Davison, J.L., 13
Daykin, D.E., 241
Denjoy, A., 156, 163, 337
dependence coefficients, 325
dependence with complete connections, 23, 234
diagonal continued fraction (DCF) expansion, 289
Diamond, H.G., 235, 239, 240
digamma function ψ, 145
Diophantine approximation, 29
    fundamental theorem of, 257
Dixon, J.D., 334
δ-mixing, 326
Doeblin, W., xi, 22, 33, 99, 204, 252, 335, 337–340
Doeblin–Lenstra conjecture, 252
Doob, J.L., 31
Doukhan, P., 327
Duren, P.L., 102
Durner, A., 34, 337
dynamical system, 219

Elsner, C., 13
Elton, H.J., 253
endomorphism, 219
entropy, 257, 277
Euclid’s algorithm, 1, 2
Euler, L., 5, 12

Faivre, C., 9, 101, 130, 249, 334, 336, 343
Falconer, K.J., 233, 234
Farey continued fraction (FCF) expansion, 303
Feller, W., 238
f-expansion, 346
    with dependent digits, 346
Fieldsteel, A., 343
Flajolet, P., 111, 130, 134
Flatto, L., 307
Fluch, W., 134
Fortet, R., 335
Fourier transform, 316
Fujiwara, M., 30
fundamental interval, 18

Gal, I.S., 221, 340
Gora, P., 58, 221, 223
Galambos, J., 173, 174
Gauss, C.F., x, 15

Gauss–Kusmin–Levy theorem
    ‘exact’, 111, 125
    $L^2$-version, 123
Gauss’ measure, 16
    extended, 26
Gauss’ Problem, 15
    Babenko’s solution to, 101f
    Paul Levy’s solution to, 39f
    Wirsing’s solution to, 79f
Gauss’ problem for τ, 246
geodesic flow, 9
Gine, E., 197, 320, 331
Gordin, M.I., 216, 327
Grochenig, K., 344
Gray, J.J., 16
Grigorescu, S., 23, 33, 62, 168, 193, 253, 334, 346
Grothendieck, A., 105
Gylden, H., 336

Haan, L. de, 174
Haas, A., 344
Halmos, P.R., 320
Hardy, G.H., 11
Harman, G., 233
Hartman, S., 238
Hartono, Y., 264
Hausdorff dimension, 233
Hausdorff measure, 233
Heilbronn, H., 334
Heinrich, H., 203, 335
Hennion, H., 335
Hensley, D., 2, 103, 194, 234, 252, 334, 336
Heyde, C.C., 188, 214
Hofbauer, F., 193
Hoffmann–Jørgensen, J., 320
Hurwitz, A., 263, 264, 288, 298

Ibragimov, I.A., 71, 72, 334
infinite-order chain, 33
insertion, 300
Ionescu Tulcea, C.T., 335
Iosifescu, M., 23, 33, 62, 64, 147, 151, 168, 173, 178, 179, 183, 193, 204, 334–337, 345, 346
isomorphism, 222
iterated function systems, 234
Ito, Sh., 273, 281, 302, 344, 345

Jager, H., 30, 249, 251, 252, 271–273, 281, 288, 298, 340, 341
Jain, N.C., 215, 216
Jarník, V., 234
Jenkinson, O., 234
Jogdeo, K., 215, 216
Jones, W.B., xii
Jur’ev, S.P., 113

Kac, M., 334
Kakeya, S., 346
Kalpazidou, S., 345
Kamae, T., 221
Kanwal, R.P., 105
Karamata theorem, 321
Katznelson, Y., 221
K-automorphism, 223
Keane, M.S., 221, 244
Keller, G., 193
Khin(t)chin(e), A.Ya., 16, 204, 231, 257, 334, 339, 340
Knopp, K., 339
Knuth, D.E., 2, 92, 101, 333, 334
Kohler, G., 13
Koksma, J.F., 221, 334, 340
Kolmogorov, A.N., 337
Kraaikamp, C., 30, 250, 251, 264, 273, 278, 286–290, 294, 296, 299, 300, 303–305, 310, 311, 337, 341–346
Krasnoselskii, M., 128
Kurosu, K., 287
Kuzmin, R.O., 16

Lagarias, J.C., 238
Lagrange, J.-L., 11
Lame, G., 2
Lang, S., 241
Laplace, P.S., 15
Lasota, A., 58, 220
λ-continued fraction (λ-CF) expansion, 344
Law of the iterated logarithm
    Chung’s, 215
    classical, 213
    Strassen’s, 213, 216
Legendre constants, 273
Legendre’s theorem, 20
Lehmer, D., 231
Lehner continued fraction (LCF) expansion, 300
Lehner, J., 300, 302
Lenstra, H.W., 252
LeVeque, J., 30
Levy-Cramer continuity theorem, 316
Levy–Khinchin representation, 317
Levy measure, 317
Levy, Paul, 16, 22, 39, 256, 271, 334, 340, 342
Liardet, P., 340, 341
Linnik, Yu.V., 71, 72, 334
Lochs, G., 334, 342
Lopes, A., 264
Lorenzen, L., xii
Loynes, R.M., 174

Mackey, M.C., 58, 220
MacLeod, A.J., 111, 119
Magnus, W., 105, 107
Marinescu, G., 335
Martin, M.H., 339
matrix approach, 7
Mayer, D.H., 59, 103, 109, 111, 120, 127, 130, 194, 336
Mazzone, F., 315
McKinney, T.E., 281
McLaughlin, J.R., 30
measurable space, 313
measure, 314
mediant convergents, 301
Merrill, K.D., 12
Minnigerode, B., 263
Misevicius, G., 193, 194, 335
Mobius transformation, 7
Moeckel, R., 340, 341
Morita, T., 195
‘Mother of all SRCF expansions’, 301

Nakada, H., 9, 271, 281, 283, 285, 311, 334, 340, 343–345
nearest integer continued fraction (NICF) expansion, 263
Neumann, J. von, 244
Nolte, V.N., 227, 229, 341
normal continued fraction number, 243
normal number, 243
number normal in base b, 243

Oberhettinger, F., 105, 107
Obrechkoff, N., 30
Olds, C.D., xii
1–block, 258
Operator
    Mayer–Ruelle, 130
        generalization of, 134

    nuclear of order 0 (of trace class), 105
        trace of, 105
    Perron–Frobenius, 57, 58
    transition, 65
optimal continued fraction (OCF) expansion, 293

Paradis, J., 238
Pedersen, P., 231
Perron, O., xii, 11, 261, 288, 289, 333
Petek, P., 64
Petersen, K., 221, 223, 276, 277
Petho, A., 13
Philipp, W., 34, 173, 174, 176, 181, 215, 216, 230, 239, 256, 334, 337–340
Plato, J. von, xii, 337
Poisson probability, 317
    τ-centered, 317
Pollicott, M., 234
Poorten, A. van der, 13
Popescu, C., 180, 281, 345
Popescu, G., 253
Porter, J.W., 333
Postnikov, A.G., 244
preservation area, 269
probability, 314
    infinitely divisible, 317
    stable, 317
        order of, 318
    strictly stable, 318
probability space, 314
Prokhorov metric, 315
Pruitt, W.E., 215
ψ-mixing coefficient, 43

quadratic irrationality, 11

random variable (r.v.), 313
    independent, 316
    P-distribution of, 314
Raney, G.N., 9
Rathbone, C.R., 241
Rautu, G., 56
(regular) continued fraction (RCF), 3, 4
    convergents of [= (RCF) convergents], 4
    digits of, 4
    asymptotic relative digit frequencies, 225
    asymptotic relative frequencies of digits between two given values, 226
    asymptotic relative frequencies of digits exceeding a given value, 227
    asymptotic relative m-digit block frequencies, 226
    incomplete (partial) quotients of, 4
        extended, 31
regularly varying function, 321
    index of, 321
Reznik, M.H., 216
Riauba, R., 335
Richtmyer, R.D., 12, 241
Rieger, G.J., 264, 272, 284, 345
Rockett, A.M., xii, 284, 334, 345
Roeder, D.W., 12
Roepstorff, G., 109, 111, 120, 127, 336
Rogers, C.A., 233
Rosen continued fraction expansion, 344
Rosen, D., 344
Rousseau-Egele, J., 193
Ruelle, D., 130
Ryll–Nardzewski, C., 340

Salat, T., 233
Salem, R., 238
σ-algebra, 313
Samorodnitsky, G., 320
Samur, J.D., 79, 99, 188, 197, 211, 324, 327–331, 337, 338
Saulis, L., 335
Schmidt, T.A., 344
Schmidt, W.M., xii, 343
Schweiger, F., 264, 343
S-convergent, 270
Scott, D.J., 188, 214
Sebe, G.I., 344, 345
Segre, B., 30
Selenius, C.-O., 260
semi-regular continued fraction (SRCF) expansion, 261
    closest, 259
    fastest, 259, 294
Sendov, B., 30, 287
Seneta, E., 321, 322
Series, C., 9
S-expansion, 267
Shallit, J.O., 2, 13, 241
Shanks, D., 231, 232, 241
Shiu, P., 241
Sinai, Ya.G., 334
singular continued fraction (SCF) expansion, 264
singularization, 258, 265
singularization area, 267
    maximal, 269
singularization process, 265
skew product, 223
    Jager and Liardet’s, 341
Skorohod metric $d_0$, 319
slowly varying function, 321
    representation theorem, 321
Smorodinsky, M., 244
Soni, R.P., 105, 107
spectral radius, 95
Sprindzuk, V.G., xii
St. Petersburg game, 238
Stackelberg, O.P., 216
Stadje, W., 55
Statulevicius, V.A., 335
Stout, W.F., 181, 215, 216
Strassen, V., 213, 216
Sudan, G., xii
Szusz, P., xii, 16, 30, 334, 335, 339

Tamura, J., 13
Tanaka, S., 281, 344, 345
Taqqu, M.S., 320
Taylor, S.J., 215, 216
Thakur, D.S., 13
Theodorescu, R., 168
Thron, W.J., xii
Tietze, H., 261
Tong, J., 30, 31
Tonkov, T., 334
transformation, 219
    ergodic, 220
    exact, 220
    measure preserving, 219
    natural extension of, 222
    non-singular, 219
    strongly mixing, 220
Trotter, H., 241
Tuckerman, B., 244

UB Conjecture, 139
Urban, F.M., 334
Uspensky, J.V., 16

Vaaler, J.D., 235, 239, 240
Vahlen, K.T., 28
Vallee, B., 111, 130, 134, 135, 194
Vardi, I., 238
Viader, P., 238
Volkmann, B., 339

Vroedt, C. de, 340

Waadeland, H., xii
Wall, H.S., xii
Walters, P., 221
Watson, G.N., 104, 227
weak convergence, 315
Webb, G.R., 339
Weiss, B., 221
Whittaker, E.T., 227
Wiedijk, F., 249, 251, 252, 281, 288, 298
Wiener measure, 319
Wiman, A., 336, 337
Wirsing, E., 16, 83, 91, 92, 113, 336
Wrench, J.W., 231, 232
Wright, E., 11

Zagier, D.B., 308
Zbaganu, G., 56
Zuparov, T.M., 338