Probability Theory
Klaus Ritter
TU Kaiserslautern, WS 2017/18
Literature
In particular,
H. Bauer, Probability Theory, de Gruyter, Berlin, 1996.
P. Billingsley, Probability and Measure, Wiley, New York, 1995.
P. Gänssler, W. Stute, Wahrscheinlichkeitstheorie, Springer, Berlin, 1977.
A. Klenke, Probability Theory, Springer, Berlin, 2014.
Prerequisites
Stochastic Methods and Measure and Integration Theory
Contents
I Introduction
II Basic Concepts of Probability Theory
1 Random Variables and Distributions
2 Convergence in Probability
3 Convergence in Distribution
4 Uniform Integrability
5 Kernels and Product Measures
6 Independence
III Limit Theorems
1 Zero-One Laws
2 The Strong Law of Large Numbers
3 The Weak Law of Large Numbers
4 Characteristic Functions
5 The One-Dimensional Central Limit Theorem
6 The Law of the Iterated Logarithm
7 The Multi-dimensional Central Limit Theorem
IV Brownian Motion
1 Stochastic Processes
2 Donsker’s Invariance Principle
3 The Brownian Motion
A Measure and Integration
1 Measure Spaces and Measurable Mappings
2 Borel Sets
3 The Lebesgue Measure
4 Real-valued Measurable Mappings
5 Integration
6 Lp-Spaces
7 Dynkin Classes
8 Product Spaces
9 Caratheodory’s Theorem
10 The Factorization Lemma
Literature
Definitions
Chapter I
Introduction
A stochastic model: a probability space (Ω, A, P) together with a collection of random
variables (measurable mappings) Ω → R, say.
Main topics in this course:
(i) limit theorems,
(ii) conditional probabilities and expectations,
(iii) discrete-time martingales,
(iv) Brownian motion.
Example 1. Limit theorems like the law of large numbers or the central limit theorem
deal with sequences X1, X2, . . . of random variables and their partial sums
Sn = ∑_{i=1}^n Xi

(physics: position of a particle after n collisions; gambling: cumulative gain after n trials). Under which conditions and in which sense does n^{−α} · Sn converge as n tends to infinity?
Example 2. Consider two random variables X1 and X2. If P(X2 = v) > 0 then the conditional probability of {X1 ∈ A} given {X2 = v} is defined by

P(X1 ∈ A | X2 = v) = P({X1 ∈ A} ∩ {X2 = v}) / P(X2 = v).
How can we reasonably extend this definition to the case P (X2 = v) = 0, e.g.,
for X2 being normally distributed? How does the observation X2 = v change our
stochastic model?
Example 3. Martingales may be used to model fair games, and a particular case of
a martingale S0, S1, . . . arises in Example 1, if X1, X2, . . . is an independent sequence
with zero mean. A gambling strategy is defined by a sequence H1, H2, . . . of random
variables, where Hn may depend in any way on the outcomes of the first n− 1 trials.
The cumulative gain after n trials is given by the discrete integral
∑_{k=1}^n Hk · Xk = ∑_{k=1}^n Hk · (Sk − S_{k−1}).
Can we tilt the martingale in favor of the gambler by a suitable strategy?
Example 4. The fluctuation of a stock price defines a function on the time interval
[0,∞[ with values in R (for simplicity, we admit negative stock prices at this point).
What is a reasonable σ-algebra on the space Ω of all mappings [0,∞[→ R or on the
subspace of all continuous mappings? How can we define (non-discrete) probability
measures on these spaces in order to model the random dynamics of stock prices?
Analogously for random perturbations in physics, biology, etc.
More generally, the same questions arise for mappings I → S with an arbitrary
non-empty set I and S ⊂ Rd (physics: phase transition in ferromagnetic materials,
the orientation of magnetic dipoles on a set I of sites; medicine: spread of diseases,
certain biometric parameters for a set I of individuals; environmental science: the
concentration of certain pollutants in a region I).
Example 5. Suppose that X1, X2, . . . is an independent sequence with zero mean
and variance one. Rescaling the partial sums Sn in time and space according to
S^{(m)}_{n/m} = ∑_{k=1}^n Xk/√m

and using piecewise linear interpolation of S^{(m)}_0, S^{(m)}_{1/m}, . . . at the knots 0, 1/m, . . . we get a random variable S^{(m)} taking values in the space C([0,∞[). By the central limit theorem S^{(m)}_1 converges to a standard normally distributed random variable as m tends to infinity. The infinite-dimensional counterpart of this result, which deals with probability measures on the space C([0,∞[), guarantees convergence of S^{(m)} to
a Brownian motion. On the latter we quote from Schilling, Partzsch (2014, p. vi):
‘Brownian motion is arguably the single most important stochastic process.’
Chapter II
Basic Concepts of Probability Theory
Context for probability theoretical concepts: a probability space (Ω,A, P ).
Terminology: A ∈ A is called an event, and P(A) the probability of the event A ∈ A.
1 Random Variables and Distributions
Given: a probability space (Ω,A, P ) and a measurable space (Ω′,A′).
Definition 1. X : Ω → Ω′ is a random element if X is A-A′-measurable. Particular
cases:
(i) X is a (real) random variable if (Ω′,A′) = (R,B),
(ii) X is a numerical random variable if (Ω′,A′) = (R̄, B̄),
(iii) X is a k-dimensional (real) random vector if (Ω′,A′) = (Rk,Bk),
(iv) X is a k-dimensional numerical random vector if (Ω′,A′) = (R̄^k, B̄_k).
As is customary, we use the abbreviation
{X ∈ A} = {ω ∈ Ω : X(ω) ∈ A}

for any X : Ω → Ω′ and A ⊂ Ω′.
Definition 2.
(i) The distribution (probability law) of a random element X : Ω→ Ω′ (with respect
to P ) is the image measure
PX = X(P ).
Notation: X ∼ Q if PX = Q.
3
II.1. Random Variables and Distributions 4
(ii) Given: probability spaces (Ω1,A1, P1), (Ω2,A2, P2) and random elements
X1 : Ω1 → Ω′, X2 : Ω2 → Ω′.
X1 and X2 are identically distributed if
(P1)_{X1} = (P2)_{X2}.
Definition 3. A property Π holds P -almost surely (P -a.s., a.s., with probability one),
if
∃ A ∈ A : A ⊂ {ω ∈ Ω : Π holds for ω} ∧ P(A) = 1.
Remark 4.
(i) For random elements X, Y : Ω→ Ω′
X = Y P -a.s. ⇒ PX = PY ,
but the converse is not true in general. For instance, let P be the uniform distribution on Ω = {0, 1} and define X(ω) = ω and Y(ω) = 1 − ω.
(ii) For every probability measure Q on (Ω′,A′) there exists a probability space
(Ω,A, P) and a random element X : Ω → Ω′ such that X ∼ Q. Take (Ω,A, P) = (Ω′,A′, Q) and X = id_{Ω′}.
Caratheodory’s Theorem is a general tool for the construction of probability
measures, see Section A.9.
(iii) A major part of probability theory deals with properties of random elements
that can be formulated in terms of their distributions only.
Example 5. A discrete distribution PX is specified by a countable set ∅ ≠ D ⊂ Ω′ and a mapping p : D → R such that

∀ ω′ ∈ D : p(ω′) ≥ 0 ∧ ∑_{ω′∈D} p(ω′) = 1,

namely,

PX = ∑_{ω′∈D} p(ω′) · ε_{ω′}.

Here ε_{ω′} denotes the Dirac measure at the point ω′ ∈ Ω′, see Example A.1.3. Hence

PX(A′) = P(X ∈ A′) = ∑_{ω′∈A′∩D} p(ω′), A′ ∈ A′.

Assume that {ω′} ∈ A′ for every ω′ ∈ D. Then

P(X = ω′) = p(ω′),

and p, extended by zero to a mapping Ω′ → R, is the density of PX w.r.t. the counting measure on A′.
If |D| < ∞ then p(ω′) = 1/|D| yields the uniform distribution on D.
For (Ω′,A′) = (R,B)

B(n, p) = ∑_{k=0}^n \binom{n}{k} · p^k (1 − p)^{n−k} · ε_k

is the binomial distribution with parameters n ∈ N and p ∈ [0, 1]. In particular, for n = 1 we get the Bernoulli distribution

B(1, p) = (1 − p) · ε_0 + p · ε_1.
Further examples include the geometric distribution G(p) with parameter p ∈ ]0, 1],
G(p) = ∑_{k=1}^∞ p · (1 − p)^{k−1} · ε_k,
and the Poisson distribution π(λ) with parameter λ > 0,
π(λ) = ∑_{k=0}^∞ exp(−λ) · (λ^k/k!) · ε_k.
Example 6. Consider a distribution on (Rk,Bk) that is defined in terms of a proba-
bility density f : Rk → [0,∞[ w.r.t. the Lebesgue measure λk. In this case
PX(A′) = P(X ∈ A′) = ∫_{A′} f dλ_k, A′ ∈ B_k.
We present some examples in the case k = 1. The normal distribution

N(µ, σ^2) = f · λ_1

with parameters µ ∈ R and σ^2, where σ > 0, is obtained by

f(x) = 1/√(2πσ^2) · exp(−(x − µ)^2/(2σ^2)), x ∈ R.
The exponential distribution with parameter λ > 0 is obtained by
f(x) = 0 if x < 0, and f(x) = λ · exp(−λx) if x ≥ 0.
The uniform distribution on D ∈ B with λ1(D) ∈ ]0,∞[ is obtained by
f = 1/λ_1(D) · 1_D.
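Such density computations can be checked numerically. A minimal sketch in Python (NumPy assumed; the rate λ = 2 and the interval [1, 2] are chosen for illustration):

```python
# For P_X = f · λ_1 with the exponential density above, P(X ∈ [a, b]) is the
# integral of f over [a, b]; midpoint rule vs. the closed form e^{-λa} − e^{-λb}.
import numpy as np

lam = 2.0
f = lambda x: np.where(x < 0, 0.0, lam * np.exp(-lam * x))

a, b = 1.0, 2.0
x = np.linspace(a, b, 100_001)
mid = (x[:-1] + x[1:]) / 2
numeric = np.sum(f(mid)) * (x[1] - x[0])   # midpoint-rule integral of the density
exact = np.exp(-lam * a) - np.exp(-lam * b)
print(numeric, exact)                      # both approx 0.117
```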
Distribution Functions
Definition 7. Let X = (X1, . . . , Xk) : Ω→ Rk be a random vector. Then
FX : R^k → [0, 1], (x1, . . . , xk) ↦ PX(×_{i=1}^k ]−∞, xi]) = P(⋂_{i=1}^k {Xi ≤ xi})
is called the distribution function of X.
Theorem 8. Given: probability spaces (Ω1,A1, P1), (Ω2,A2, P2) and random vectors X1 : Ω1 → R^k, X2 : Ω2 → R^k. Then

(P1)_{X1} = (P2)_{X2} ⇔ F_{X1} = F_{X2}.

Proof. ‘⇒’ holds trivially. ‘⇐’: By Theorem A.2.4, B_k = σ(E) for

E = {×_{i=1}^k ]−∞, xi] : x1, . . . , xk ∈ R},

and E is closed w.r.t. intersections. Use Theorem A.1.5.
For notational convenience, we consider the case k = 1 in the sequel. We refer to Übung 3.4 in Maß- und Integrationstheorie (2016) for the following two facts.
Theorem 9.
(i) FX is non-decreasing,
(ii) FX is right-continuous,
(iii) limx→−∞ FX(x) = 0 and limx→∞ FX(x) = 1,
(iv) FX is continuous at x iff P (X = x) = 0.
Theorem 10. For every function F that satisfies (i)–(iii) from Theorem 9,
∃! probability measure Q on B ∀ x ∈ R : Q(]−∞, x]) = F(x).
Expectation and Variance
Remark 11. Define ∞^r = ∞ for r > 0. For 1 ≤ p < q < ∞ and X ∈ Z(Ω,A)

∫|X|^p dP ≤ (∫|X|^q dP)^{p/q},

due to Hölder’s inequality, see Theorem A.6.2.
Notation:

L = L(Ω,A, P) = {X ∈ Z(Ω,A) : ∫|X| dP < ∞}

is the class of P-integrable random variables, and analogously

L̄ = L̄(Ω,A, P) = {X ∈ Z̄(Ω,A) : ∫|X| dP < ∞}

is the class of P-integrable numerical random variables. We consider PX as a distribution on (R,B) if P(X ∈ R) = 1 for a numerical random variable X, and we consider L as a subspace of L̄.
Definition 12. For X ∈ L̄ or X ∈ Z̄+

E(X) = ∫ X dP

is the expectation of X. For X ∈ Z(Ω,A) such that X^2 ∈ L

Var(X) = ∫ (X − E(X))^2 dP

and √Var(X) are the variance and the standard deviation of X, respectively.
Remark 13. The Transformation Theorem A.5.12 implies

∫_Ω |X|^p dP < ∞ ⇔ ∫_R |x|^p PX(dx) < ∞

for X ∈ Z(Ω,A), in which case, for p = 1,

E(X) = ∫_R x PX(dx),

and for p = 2,

Var(X) = ∫_R (x − E(X))^2 PX(dx).
Thus E(X) and Var(X) depend only on PX .
Example 14.
X ∼ B(n, p): E(X) = n·p, Var(X) = n·p·(1 − p)
X ∼ G(p): E(X) = 1/p, Var(X) = (1 − p)/p^2
X ∼ π(λ): E(X) = λ, Var(X) = λ,
see the lecture on Stochastische Methoden.
X is Cauchy distributed with parameter α > 0 if X ∼ f · λ_1 where

f(x) = α/(π(α^2 + x^2)), x ∈ R.

Since ∫_0^t x/(1 + x^2) dx = (1/2)·log(1 + t^2), neither E(X^+) < ∞ nor E(X^−) < ∞, and therefore X ∉ L.
If X ∼ N(µ, σ^2) then

E(X) = µ ∧ Var(X) = σ^2.

If X is exponentially distributed with parameter λ > 0 then

E(X) = 1/λ ∧ Var(X) = 1/λ^2.

See the lecture on Stochastische Methoden.
2 Convergence in Probability
Motivated by the Examples A.5.11 and A.6.8 we introduce a notion of convergence
that is weaker than convergence in mean and convergence almost surely.
In the sequel, X, Xn, etc. are random variables on a common probability space
(Ω,A, P ).
Definition 1. (Xn)n converges to X in probability if
∀ ε > 0 : lim_{n→∞} P(|Xn − X| > ε) = 0.

Notation: Xn →P X.
Theorem 2 (Chebyshev-Markov Inequality). Let (Ω,A, µ) be a measure space and
f ∈ Z(Ω,A). For every u > 0 and every 1 ≤ p < ∞

µ({|f| ≥ u}) ≤ (1/u^p) · ∫|f|^p dµ.

Proof. We have

∫_{{|f|≥u}} u^p dµ ≤ ∫_Ω |f|^p dµ.
Corollary 3. If E(X^2) < ∞ and ε > 0, then

P(|X − E(X)| ≥ ε) ≤ (1/ε^2) · Var(X).
Theorem 4.
d(X, Y) = ∫ min(1, |X − Y|) dP

defines a semi-metric on Z(Ω,A), and

Xn →P X ⇔ lim_{n→∞} d(Xn, X) = 0.
Proof. ‘⇒’: For ε > 0

∫ min(1, |Xn − X|) dP = ∫_{{|Xn−X|>ε}} min(1, |Xn − X|) dP + ∫_{{|Xn−X|≤ε}} min(1, |Xn − X|) dP
≤ P(|Xn − X| > ε) + min(1, ε).

‘⇐’: Let 0 < ε < 1. Use Theorem 2 to obtain

P(|Xn − X| > ε) = P(min(1, |Xn − X|) > ε) ≤ (1/ε) · ∫ min(1, |Xn − X|) dP = (1/ε) · d(Xn, X).
Remark 5. By Theorems 4 and A.6.11,
Xn →Lp X ⇒ Xn →P X.
Example A.5.11 shows that ‘⇐’ does not hold in general.
Remark 6. By Theorems 4 and A.5.10,
Xn → X P-a.s. ⇒ Xn →P X.
Example A.6.8 shows that ‘⇐’ does not hold in general. The Law of Large Numbers
deals with convergence almost surely or convergence in probability, see the introduc-
tory Example I.1 and Sections III.2 and III.3.
Subsequence Criteria
Corollary 7.
Xn →P X ⇒ ∃ subsequence (X_{nk})_{k∈N} : X_{nk} → X P-a.s.

Proof. Due to Theorems 4 and A.6.7 there exists a subsequence (X_{nk})_{k∈N} such that min(1, |X_{nk} − X|) → 0 P-a.s.
Remark 8. In any semi-metric space (M,d) a sequence (an)n∈N converges to a iff
∀ subsequence (a_{nk})_{k∈N} ∃ subsequence (a_{nk_ℓ})_{ℓ∈N} : lim_{ℓ→∞} d(a_{nk_ℓ}, a) = 0.
Corollary 9. Xn →P X iff

∀ subsequence (X_{nk})_{k∈N} ∃ subsequence (X_{nk_ℓ})_{ℓ∈N} : X_{nk_ℓ} → X P-a.s.
Proof. ‘⇒’: Corollary 7. ‘⇐’: Remarks 6 and 8 together with Theorem 4.
Remark 10. We conclude that, in general, there is no semi-metric on Z(Ω,A) that defines a.s.-convergence. However, if Ω is countable, then

Xn → X P-a.s. ⇔ Xn →P X.

Proof: Übung 2.3.
Lemma 11. Let −→ denote convergence almost everywhere or convergence in prob-
ability. If X^{(i)}_n → X^{(i)} for i = 1, . . . , m and f : R^m → R is continuous, then

f(X^{(1)}_n, . . . , X^{(m)}_n) → f(X^{(1)}, . . . , X^{(m)}).
Proof. Trivial for convergence almost everywhere, and by Corollary 9 the conclusion
holds for convergence in probability, too.
Corollary 12. Let Xn →P X. Then

Xn →P Y ⇔ X = Y P-a.s.
Proof. Corollary 9 and Lemma A.6.6.
3 Convergence in Distribution
Given: a metric space (M,ρ). Let M(M) denote the set of all probability measures
on the Borel-σ-algebra B(M) in M . Moreover, let
Cb(M) = {f : M → R : f bounded, continuous}.
Definition 1.
(i) A sequence (Qn)n∈N in M(M) converges weakly to Q ∈M(M) if
∀ f ∈ Cb(M) : lim_{n→∞} ∫f dQn = ∫f dQ.

Notation: Qn →w Q.
(ii) A sequence (Xn)n∈N of random elements with values in M converges in distribu-
tion to a random element X with values in M if Qn →w Q for the distributions Qn of Xn and Q of X, respectively.

Notation: Xn →d X.
Remark 2. For convergence in distribution the random elements need not be defined
on a common probability space.
In the sequel: Qn, Q ∈M(M) for n ∈ N.
Example 3.
(i) For xn, x ∈ M

ε_{xn} →w ε_x ⇔ lim_{n→∞} ρ(xn, x) = 0.

For the proof of ‘⇐’, note that ∫f dε_{xn} = f(xn), ∫f dε_x = f(x). For the proof of ‘⇒’, suppose that lim sup_{n→∞} ρ(xn, x) > 0. Take

f(y) = min(ρ(y, x), 1), y ∈ M,

and observe that f ∈ Cb(M) and

lim sup_{n→∞} ∫f dε_{xn} = lim sup_{n→∞} min(ρ(xn, x), 1) > 0

while ∫f dε_x = 0.
(ii) For the Euclidean distance ρ on M = Rk we have (M,B(M)) = (Rk,Bk). Now,
in particular, k = 1 and

Qn = N(µn, σn^2)

where σn > 0. For f ∈ Cb(R)

∫f dQn = 1/√(2π) · ∫_R f(σn · x + µn) · exp(−x^2/2) λ_1(dx).

Put N(µ, 0) = ε_µ. Then

lim_{n→∞} µn = µ ∧ lim_{n→∞} σn = σ ⇒ Qn →w N(µ, σ^2).

Otherwise (Qn)_{n∈N} does not converge weakly. Übung 4.1.
(iii) For M = C([0, T]) let ρ(x, y) = sup_{t∈[0,T]} |x(t) − y(t)|. Cf. the introductory Examples I.4 and I.5.
Remark 4. Note that Qn →w Q does not imply

∀ A ∈ B(M) : lim_{n→∞} Qn(A) = Q(A).

For instance, assume lim_{n→∞} ρ(xn, x) = 0 with xn ≠ x for every n ∈ N. Then ε_{xn}({x}) = 0, ε_x({x}) = 1.
Theorem 5 (Portemanteau Theorem). The following properties are equivalent:
(i) Qn →w Q,

(ii) ∀ f ∈ Cb(M) uniformly continuous : lim_{n→∞} ∫f dQn = ∫f dQ,
(iii) ∀ A ⊂ M closed: lim sup_{n→∞} Qn(A) ≤ Q(A),

(iv) ∀ A ⊂ M open: lim inf_{n→∞} Qn(A) ≥ Q(A),

(v) ∀ A ∈ B(M) : Q(∂A) = 0 ⇒ lim_{n→∞} Qn(A) = Q(A).
Proof. At first we show that ‘(ii)⇒ (iii)⇒ (i)’, which yields the equivalence of (i)–(iv).
‘(ii) ⇒ (iii)’: Let A ⊂ M be closed, and put

Am = {x ∈ M : dist(x, A) < 1/m}.

Note that Am ↓ A. Let ε > 0, and take m ∈ N such that Q(Am) < Q(A) + ε. Define ϕ : R → R by

ϕ(z) = 1 if z ≤ 0, ϕ(z) = 1 − z if 0 < z < 1, ϕ(z) = 0 otherwise,

and f : M → R by

f(x) = ϕ(m · dist(x, A)).

Then f ∈ Cb(M) is uniformly continuous, and 0 ≤ f ≤ 1 with f|_A = 1 and f|_{A_m^c} = 0. Hence Qn(A) ≤ ∫f dQn and

lim sup_{n→∞} Qn(A) ≤ ∫f dQ ≤ Q(Am) ≤ Q(A) + ε.
‘(iii) ⇒ (i)’: Let f ∈ Cb(M). Assume that 0 < f < 1 without loss of generality. For every m ∈ N and every P ∈ M(M)

∫f dP ≤ ∑_{k=1}^m k/m · P({(k − 1)/m ≤ f < k/m}) = (1/m) · ∑_{k=1}^m P({f ≥ (k − 1)/m})
= ∑_{k=1}^m (k − 1)/m · P({(k − 1)/m ≤ f < k/m}) + 1/m ≤ ∫f dP + 1/m.

Therefore

lim sup_{n→∞} ∫f dQn ≤ (1/m) · ∑_{k=1}^m Q({f ≥ (k − 1)/m}) ≤ ∫f dQ + 1/m,

which implies lim sup_{n→∞} ∫f dQn ≤ ∫f dQ. In fact, the latter holds true for any bounded and upper semi-continuous mapping f : M → R. For f being continuous consider 1 − f, too, to obtain lim_{n→∞} ∫f dQn = ∫f dQ.
‘(i) ⇒ (v)’: Let A ∈ B(M) with Q(∂A) = 0, and put f = 1_A as well as

f_* = sup{g : g lower semi-continuous, g ≤ f},  f^* = inf{h : h upper semi-continuous, h ≥ f}.

Then f_* is lower semi-continuous, f^* is upper semi-continuous, and f_* ≤ f ≤ f^* with equality Q-a.s., since Q(∂A) = 0. From the proof of ‘(iii) ⇒ (i)’ we get

∫f_* dQ ≤ lim inf_{n→∞} ∫f_* dQn ≤ lim sup_{n→∞} ∫f^* dQn ≤ ∫f^* dQ.

Hence lim_{n→∞} ∫f dQn = ∫f dQ.
‘(v) ⇒ (i)’: Übung 3.4.
Lemma 6. Consider metric spaces M,M ′ and a continuous mapping π : M → M ′.
Then
Qn →w Q ⇒ πQn →w πQ.

Proof. Note that f ∈ Cb(M′) implies f ∘ π ∈ Cb(M). Employ Theorem A.5.12 and the definition of weak convergence.
Weak Convergence of Measures on (R,B)
In the sequel, we study the particular case (M,B(M)) = (R,B), i.e., convergence in
distribution for random variables. The Central Limit Theorem deals with this notion
of convergence, see the introductory Example I.1 and Section III.5.
Notation: for any Q ∈ M(R)

FQ(x) = Q(]−∞, x]), x ∈ R,

and for any function F : R → R

Cont(F) = {x ∈ R : F continuous at x}.
Theorem 7.
Qn →w Q ⇔ ∀ x ∈ Cont(FQ) : lim_{n→∞} F_{Qn}(x) = FQ(x).

Moreover, if Qn →w Q and Cont(FQ) = R then

lim_{n→∞} sup_{x∈R} |F_{Qn}(x) − FQ(x)| = 0.
Proof. ‘⇒’: If x ∈ Cont(FQ) and A = ]−∞, x] then Q(∂A) = Q({x}) = 0, see Theorem 1.9. Hence Theorem 5 implies

lim_{n→∞} F_{Qn}(x) = lim_{n→∞} Qn(A) = Q(A) = FQ(x).

‘⇐’: Consider a non-empty open set A ⊂ R. Take pairwise disjoint open intervals A1, A2, . . . such that A = ⋃_{i=1}^∞ Ai. Fatou’s Lemma implies

lim inf_{n→∞} Qn(A) = lim inf_{n→∞} ∑_{i=1}^∞ Qn(Ai) ≥ ∑_{i=1}^∞ lim inf_{n→∞} Qn(Ai).
Note that R \ Cont(FQ) is countable. Fix ε > 0, and take

A′i = ]a′i, b′i] ⊂ Ai

for i ∈ N such that

a′i, b′i ∈ Cont(FQ) ∧ Q(Ai) ≤ Q(A′i) + ε · 2^{−i}.

Then

lim inf_{n→∞} Qn(Ai) ≥ lim inf_{n→∞} Qn(A′i) = Q(A′i) ≥ Q(Ai) − ε · 2^{−i}.

We conclude that

lim inf_{n→∞} Qn(A) ≥ Q(A) − ε,

and therefore Qn →w Q by Theorem 5.
Uniform convergence: see Übung 1.3.
Corollary 8.

Qn →w Q ∧ Qn →w Q̃ ⇒ Q = Q̃.

Proof. By Theorem 7, FQ(x) = F_{Q̃}(x) if x ∈ D = Cont(FQ) ∩ Cont(F_{Q̃}). Since D is dense in R and FQ as well as F_{Q̃} are right-continuous, we get FQ = F_{Q̃}. Apply Theorem 1.10.
Given: random variables Xn, X on (Ω,A, P ) for n ∈ N.
Theorem 9.
Xn →P X ⇒ Xn →d X

and

Xn →d X ∧ X constant a.s. ⇒ Xn →P X.
Proof. Assume Xn →P X. For ε > 0 and x ∈ R

P(X ≤ x − ε) − P(|X − Xn| > ε)
≤ P({X ≤ x − ε} ∩ {|X − Xn| ≤ ε})
≤ P(Xn ≤ x)
= P({Xn ≤ x} ∩ {X ≤ x + ε}) + P({Xn ≤ x} ∩ {X > x + ε})
≤ P(X ≤ x + ε) + P(|X − Xn| > ε).

Thus

FX(x − ε) ≤ lim inf_{n→∞} F_{Xn}(x) ≤ lim sup_{n→∞} F_{Xn}(x) ≤ FX(x + ε).

For x ∈ Cont(FX) we get lim_{n→∞} F_{Xn}(x) = FX(x). Apply Theorem 7.
Now, assume that Xn →d X and PX = ε_x. Let ε > 0 and take f ∈ Cb(R) such that f ≥ 0, f(x) = 0, and f(y) = 1 if |x − y| ≥ ε. Then

P(|X − Xn| > ε) = P(|x − Xn| > ε) = ∫ 1_{R\[x−ε,x+ε]} dP_{Xn} ≤ ∫f dP_{Xn}

and

lim_{n→∞} ∫f dP_{Xn} = ∫f dPX = 0.
Example 10. Consider the uniform distribution P on Ω = {0, 1}. Put

Xn(ω) = ω, X(ω) = 1 − ω.

Then P_{Xn} = PX and therefore Xn →d X. However, {|Xn − X| < 1} = ∅, and therefore Xn →P X does not hold.
Theorem 11 (Skorohod). There exists a probability space (Ω,A, P ) with the follow-
ing property. If
Qn →w Q,

then there exist Xn, X ∈ Z(Ω,A) for n ∈ N such that

∀ n ∈ N : Qn = P_{Xn} ∧ Q = PX ∧ Xn → X P-a.s.
Proof. Take Ω = ]0, 1[, A = B(Ω), and consider the uniform distribution P on Ω.
Define
XQ(ω) = inf{z ∈ R : ω ≤ FQ(z)}, ω ∈ ]0, 1[,

for any Q ∈ M(R). Since XQ is non-decreasing, we have XQ ∈ Z(Ω,A). Furthermore,

P_{XQ} = Q, (1)

see Übung 2.4.
Assuming Qn →w Q we define Xn = X_{Qn} and X = XQ. Since X is non-decreasing, we conclude that Ω \ Cont(X) is countable. Thus it suffices to show

∀ ω ∈ Cont(X) : lim_{n→∞} Xn(ω) = X(ω).

Let ω ∈ Cont(X) and ε > 0. Put x = X(ω) and take x1, x2 ∈ Cont(FQ) such that

x − ε < x1 < x < x2 < x + ε.

Hence

FQ(x1) < ω < FQ(x2).

By assumption there exists n0 ∈ N such that

F_{Qn}(x1) < ω < F_{Qn}(x2)

for n ≥ n0. Hence Xn(ω) ∈ ]x1, x2], i.e. |Xn(ω) − X(ω)| < ε.
Remark 12. By (1) we have a general method to transform uniformly distributed
‘random numbers’ from ]0, 1[ into ‘random numbers’ with distribution Q.
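A minimal sketch of this transformation in Python (NumPy assumed; the exponential distribution with rate λ = 2 serves as example, with quantile function XQ(ω) = −log(1 − ω)/λ worked out by hand):

```python
# Remark 12 in code: turn uniform 'random numbers' from ]0,1[ into
# Exp(λ)-distributed numbers via X_Q(ω) = inf{z : ω ≤ F_Q(z)}.
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
omega = rng.uniform(size=100_000)   # uniform on ]0,1[
x = -np.log1p(-omega) / lam         # X_Q applied pointwise

print(x.mean())                     # approx E(X) = 1/λ = 0.5
```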
Remark 13.
(i) Put

C^{(r)} = {f : R → R : f, f^{(1)}, . . . , f^{(r)} bounded, f^{(r)} uniformly continuous}.

Then

Qn →w Q ⇔ ∃ r ∈ N0 ∀ f ∈ C^{(r)} : lim_{n→∞} ∫f dQn = ∫f dQ,

see Gänssler, Stute (1977, p. 66).
(ii) The Lévy distance

d(Q, R) = inf{h ∈ ]0,∞[ : ∀ x ∈ R : FQ(x − h) − h ≤ FR(x) ≤ FQ(x + h) + h}

defines a metric on M(R), and

Qn →w Q ⇔ lim_{n→∞} d(Qn, Q) = 0,

see Chow, Teicher (1978, Thm. 8.1.3).
(iii) Suppose that (M,ρ) is a complete separable metric space. Then there exists a
metric d on M(M) such that (M(M), d) is complete and separable as well, and
Qn →w Q ⇔ lim_{n→∞} d(Qn, Q) = 0,

see Parthasarathy (1967, Sec. II.6) or Klenke (2013, Abschn. 13.2) and cf. Übung 3.3.
Compactness
Finally, we present a compactness criterion, which is very useful for the construction of probability measures on B(M).
Lemma 14. Let x_{n,ℓ} ∈ R for n, ℓ ∈ N with

∀ ℓ ∈ N : sup_{n∈N} |x_{n,ℓ}| < ∞.

Then there exists an increasing sequence (n_i)_{i∈N} in N such that

∀ ℓ ∈ N : (x_{n_i,ℓ})_{i∈N} converges.
Proof. See Billingsley (1979, Thm. 25.13).
Definition 15.
(i) P ⊂ M(M) is tight if

∀ ε > 0 ∃ K ⊂ M compact ∀ P ∈ P : P(K) ≥ 1 − ε.

(ii) P ⊂ M(M) is relatively compact if every sequence in P contains a subsequence that converges weakly.
Theorem 16 (Prohorov). Assume that M is a complete separable metric space and P ⊂ M(M). Then

P relatively compact ⇔ P tight.
Proof. See Parthasarathy (1967, Thm. II.6.7). Here: M = R.

‘⇒’: Suppose that P is not tight. Then, for some ε > 0, there exists a sequence (Pn)_{n∈N} in P such that

Pn([−n, n]) < 1 − ε.

For a suitable subsequence, P_{nk} →w P ∈ M(R). Take m > 0 such that

P(]−m,m[) > 1 − ε.

Theorem 5 implies

P(]−m,m[) ≤ lim inf_{k→∞} P_{nk}(]−m,m[) ≤ lim inf_{k→∞} P_{nk}([−nk, nk]) ≤ 1 − ε,

which is a contradiction.
‘⇐’: Consider any sequence (Pn)_{n∈N} in P and the corresponding sequence (Fn)_{n∈N} of distribution functions. Use Lemma 14 to obtain a subsequence (F_{ni})_{i∈N} and a non-decreasing function G : Q → [0, 1] with

∀ q ∈ Q : lim_{i→∞} F_{ni}(q) = G(q).

Put

F(x) = inf{G(q) : q ∈ Q ∧ x < q}, x ∈ R.
Claim (Helly’s Theorem):
(i) F is non-decreasing and right-continuous,

(ii) ∀ x ∈ Cont(F) : lim_{i→∞} F_{ni}(x) = F(x).

Proof: Ad (i): Obviously F is non-decreasing. For x ∈ R and ε > 0 take δ2 > 0 such that

∀ q ∈ Q ∩ ]x, x + δ2[ : G(q) ≤ F(x) + ε.

Thus, for z ∈ ]x, x + δ2[,

F(x) ≤ F(z) ≤ F(x) + ε.
Ad (ii): If x ∈ Cont(F) and ε > 0 take δ1 > 0 such that

F(x) − ε ≤ F(x − δ1).

Thus, for q1, q2 ∈ Q with

x − δ1 < q1 < x < q2 < x + δ2,

we get

F(x) − ε ≤ F(x − δ1) ≤ G(q1) ≤ lim inf_{i→∞} F_{ni}(x) ≤ lim sup_{i→∞} F_{ni}(x) ≤ G(q2) ≤ F(x) + ε.
Claim:

lim_{x→−∞} F(x) = 0 ∧ lim_{x→∞} F(x) = 1.

Proof: For ε > 0 take m ∈ Q (by tightness) such that

∀ n ∈ N : Pn(]−m,m]) ≥ 1 − ε.

Thus

G(m) − G(−m) = lim_{i→∞} (F_{ni}(m) − F_{ni}(−m)) = lim_{i→∞} P_{ni}(]−m,m]) ≥ 1 − ε.

Since F(m) ≥ G(m) and F(−m − 1) ≤ G(−m), we obtain

F(m) − F(−m − 1) ≥ 1 − ε.
It remains to apply Theorems 1.10 and 7.
4 Uniform Integrability
In the sequel: Xn, X random variables on a common probability space (Ω,A, P ).
Definition 1. (Xn)_{n∈N} is uniformly integrable (u.i.) if

lim_{α→∞} sup_{n∈N} ∫_{{|Xn|≥α}} |Xn| dP = 0.
Remark 2.
(i) (Xn)_{n∈N} u.i. ⇒ (∀ n ∈ N : Xn ∈ L1) ∧ sup_{n∈N} ‖Xn‖_1 < ∞.

(ii) ∃ Y ∈ L1 ∀ n ∈ N : |Xn| ≤ Y ⇒ (Xn)_{n∈N} u.i.

(iii) ∃ p > 1 : (∀ n ∈ N : Xn ∈ Lp) ∧ sup_{n∈N} ‖Xn‖_p < ∞ ⇒ (Xn)_{n∈N} u.i.

Proof: ∫_{{|Xn|≥α}} |Xn| dP = (1/α^{p−1}) · ∫_{{|Xn|≥α}} α^{p−1}·|Xn| dP ≤ (1/α^{p−1}) · ‖Xn‖_p^p.
Example 3. For the uniform distribution P on [0, 1] and

Xn = n · 1_{[0,1/n]}

we have Xn ∈ L1 and ‖Xn‖_1 = 1, but for any α > 0 and n ≥ α

∫_{{|Xn|≥α}} |Xn| dP = n · P([0, 1/n]) = 1,

so that (Xn)_{n∈N} is not u.i.
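A numerical illustration of this failure of uniform integrability (assumed simulation setup, NumPy):

```python
# X_n = n · 1_{[0,1/n]} under P = U[0,1]: the tail integral over {|X_n| ≥ α}
# equals 1 whenever n ≥ α, so sup_n does not vanish as α → ∞.
import numpy as np

rng = np.random.default_rng(1)
omega = rng.uniform(size=1_000_000)   # samples from P

def tail_integral(n, alpha):
    xn = np.where(omega <= 1.0 / n, float(n), 0.0)
    return np.mean(np.where(xn >= alpha, xn, 0.0))

for alpha in [10, 100, 1000]:
    print(alpha, tail_integral(10 * alpha, alpha))   # stays approx 1
```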
Lemma 4. (Xn)_{n∈N} is u.i. iff

sup_{n∈N} E(|Xn|) < ∞ (1)

and

∀ ε > 0 ∃ δ > 0 ∀ A ∈ A : (P(A) < δ ⇒ sup_{n∈N} ∫_A |Xn| dP < ε). (2)
Proof. ‘⇒’: For (1), see Remark 2.(i). Moreover,

∫_A |Xn| dP = ∫_{A∩{|Xn|≥α}} |Xn| dP + ∫_{A∩{|Xn|<α}} |Xn| dP ≤ ∫_{{|Xn|≥α}} |Xn| dP + α · P(A).

For ε > 0 take α > 0 with

sup_{n∈N} ∫_{{|Xn|≥α}} |Xn| dP < ε/2

and δ = ε/(2α) to obtain (2).
‘⇐’: Put M = sup_{n∈N} E(|Xn|). Theorem 2.2 yields P(|Xn| ≥ α) ≤ M/α. Let ε > 0 and take δ > 0 according to (2) to obtain, for α > M/δ,

sup_{n∈N} ∫_{{|Xn|≥α}} |Xn| dP < ε.
Theorem 5. Let 1 ≤ p < ∞, and assume Xn ∈ Lp for every n ∈ N. Then

(Xn)_{n∈N} converges in Lp

iff

(Xn)_{n∈N} converges in probability ∧ (|Xn|^p)_{n∈N} is u.i.
Proof. ‘⇒’: Assume Xn →Lp X. It follows that (Xn)_{n∈N} is bounded in Lp, and from Remark 2.5 we get Xn →P X. Observe that

‖1_A · Xn‖_p ≤ ‖1_A · (Xn − X)‖_p + ‖1_A · X‖_p

for every A ∈ A. Let ε > 0 and take k ∈ N such that

sup_{n>k} ‖Xn − X‖_p < ε. (3)

By Remark 2.(ii),

(|X1 − X|^p, . . . , |Xk − X|^p, |X|^p, |X|^p, . . . ) is u.i.

By Lemma 4

P(A) < δ ⇒ (sup_{1≤n≤k} ‖1_A · (Xn − X)‖_p < ε ∧ ‖1_A · X‖_p < ε)

for a suitable δ > 0. Together with (3) this implies

P(A) < δ ⇒ sup_{n∈N} ‖1_A · Xn‖_p < 2ε.
‘⇐’: Let ε > 0 and put A = A_{m,n} = {|Xm − Xn| > ε}. Then

‖Xm − Xn‖_p ≤ ‖1_A · (Xm − Xn)‖_p + ‖1_{A^c} · (Xm − Xn)‖_p ≤ ‖1_A · Xm‖_p + ‖1_A · Xn‖_p + ε.

By assumption Xn →P X for some X ∈ Z(Ω,A). Take δ > 0 according to (2) for (|Xn|^p)_{n∈N}, and note that

A_{m,n} ⊂ {|Xm − X| > ε/2} ∪ {|Xn − X| > ε/2}.

Hence, for m, n sufficiently large,

P(A_{m,n}) < δ,

which implies

‖Xm − Xn‖_p ≤ 2 · ε^{1/p} + ε.

Apply Theorem A.6.7.
Remark 6.
(i) Theorem 5 yields a generalization of Lebesgue’s convergence theorem: If Xn ∈ L1 for every n ∈ N and Xn →P X, then

(Xn)_{n∈N} u.i. ⇔ X ∈ L1 ∧ Xn →L1 X.
(ii) Uniform integrability is a property of the distributions only.
Convergence of Expectations
Theorem 7.

Xn →d X ⇒ E(|X|) ≤ lim inf_{n→∞} E(|Xn|).

Proof. From Skorohod’s Theorem 3.11 we get a probability space (Ω̃, Ã, P̃) with random variables X̃n, X̃ such that

X̃n → X̃ P̃-a.s. ∧ P̃_{X̃n} = P_{Xn} ∧ P̃_{X̃} = PX.

Thus E(|X̃|) = E(|X|) and E(|X̃n|) = E(|Xn|). Apply Fatou’s Lemma A.5.5.
Theorem 8. If

Xn →d X ∧ (Xn)_{n∈N} u.i.

then

X ∈ L1 ∧ lim_{n→∞} E(Xn) = E(X).

Proof. Notation as previously. Now (X̃n)_{n∈N} is u.i., see Remark 6.(ii). Hence, by Remark 6.(i), X̃ ∈ L1 and X̃n →L1 X̃. Thus E(|X|) < ∞ and

lim_{n→∞} E(Xn) = lim_{n→∞} E(X̃n) = E(X̃) = E(X).
Example 9. Example 3 continued. With X = 0 we have Xn → X P-a.s., and therefore Xn →d X. But E(Xn) = 1 > 0 = E(X).
5 Kernels and Product Measures
Recall the construction and properties of product (probability) measures from Sec-
tion V.2 in the Lecture Notes on Maß- und Integrationstheorie (2016).
Two-Stage Experiments
Given: measurable spaces (Ω1,A1) and (Ω2,A2).
Motivation: two-stage experiment. Output ω1 ∈ Ω1 of the first stage determines
probabilistic model for the second stage.
Example 1. Choose one out of n coins and throw it once. Parameters a1, . . . , an ≥ 0 such that ∑_{i=1}^n ai = 1 and b1, . . . , bn ∈ [0, 1].

Let

Ω1 = {1, . . . , n}, A1 = P(Ω1)

and define

µ = ∑_{i=1}^n ai · ε_i,
i.e., ai = µ({i}) is the probability of choosing the i-th coin. Moreover, let

Ω2 = {H, T}, A2 = P(Ω2)

and define

K(i, ·) = bi · ε_H + (1 − bi) · ε_T,

i.e., bi = K(i, {H}) is the probability for obtaining H when throwing the i-th coin.
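A simulation sketch of this two-stage experiment (the three coins and their parameters below are assumed for illustration; NumPy):

```python
# First stage samples the coin from µ, second stage tosses it via K(i, ·);
# the head frequency matches Σ_i a_i·b_i, as given by µ×K (Theorem 8 below).
import numpy as np

rng = np.random.default_rng(2)
a = np.array([0.5, 0.3, 0.2])   # µ({i}): probability of choosing coin i
b = np.array([0.1, 0.5, 0.9])   # K(i, {H}): head probability of coin i

n_trials = 1_000_000
coin = rng.choice(len(a), size=n_trials, p=a)   # first stage: sample from µ
heads = rng.uniform(size=n_trials) < b[coin]    # second stage: sample from K(coin, ·)
print(heads.mean(), np.dot(a, b))               # both approx 0.38
```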
Definition 2. K : Ω1 × A2 → R is a Markov (transition) kernel (from (Ω1,A1) to
(Ω2,A2)), if
(i) K(ω1, ·) is a probability measure on A2 for every ω1 ∈ Ω1,
(ii) K(·, A2) is A1-B-measurable for every A2 ∈ A2.
Example 3. Extremal cases, non-disjoint.

(i) The model for the second stage is not influenced by the output of the first stage, i.e., for a probability measure ν on A2

∀ ω1 ∈ Ω1 : K(ω1, ·) = ν.

In Example 1 this means b1 = · · · = bn.

(ii) The output of the first stage determines the output of the second stage, i.e., for an A1-A2-measurable mapping f : Ω1 → Ω2

∀ ω1 ∈ Ω1 : K(ω1, ·) = ε_{f(ω1)}.

In Example 1 this means b1, . . . , bn ∈ {0, 1}.
Given:
• a probability measure µ on A1 and
• a Markov kernel K from (Ω1,A1) to (Ω2,A2).
Question: stochastic model (Ω,A, P ) for the compound experiment? Reasonable, and
assumed in the sequel,
Ω = Ω1 × Ω2, A = A1 ⊗ A2.
How to define P?
Example 4. In Example 1, a reasonable requirement for P is

P(A1 × Ω2) = µ(A1), A1 ⊂ Ω1,

and

P({i} × A2) = K(i, A2) · µ({i}), A2 ⊂ Ω2.

Consequently, for A ⊂ Ω

P(A) = ∑_{i=1}^n P({(ω1, ω2) ∈ A : ω1 = i}) = ∑_{i=1}^n P({i} × {ω2 ∈ Ω2 : (i, ω2) ∈ A})
= ∑_{i=1}^n K(i, {ω2 ∈ Ω2 : (i, ω2) ∈ A}) · µ({i})
= ∫_{Ω1} K(ω1, {ω2 ∈ Ω2 : (ω1, ω2) ∈ A}) µ(dω1).
May we generally use the right-hand side integral for the definition of P?
Lemma 5. Let f ∈ Z(Ω,A). Then, for ω1 ∈ Ω1, the ω1-section
f(ω1, ·) : Ω2 → R
of f is A2-B-measurable, and for ω2 ∈ Ω2 the ω2-section
f(·, ω2) : Ω1 → R
of f is A1-B-measurable.
Proof. In the case of an ω1-section: Fix ω1 ∈ Ω1. Then Ω2 → Ω1 × Ω2 : ω2 ↦ (ω1, ω2) is A2-A-measurable due to Theorem A.8.5.(i). Apply Theorem A.1.7.
Remark 6. In particular, for A ∈ A and f = 1_A

f(ω1, ·) = 1_A(ω1, ·) = 1_{A(ω1)},

where¹

A(ω1) = {ω2 ∈ Ω2 : (ω1, ω2) ∈ A}

is the ω1-section of A. By Lemma 5

∀ ω1 ∈ Ω1 : A(ω1) ∈ A2.

Analogously for the ω2-section

A(ω2) = {ω1 ∈ Ω1 : (ω1, ω2) ∈ A}

of A.
Lemma 7. Let f ∈ Z+(Ω,A). Then

g : Ω1 → [0,∞], ω1 ↦ ∫_{Ω2} f(ω1, ω2) K(ω1, dω2)

is A1-B([0,∞])-measurable.
¹ Poor notation.
Proof. Let F denote the set of all functions f ∈ Z+(Ω,A) with the measurability property as claimed. We show that

∀ A1 ∈ A1, A2 ∈ A2 : 1_{A1×A2} ∈ F. (1)

Indeed,

∫_{Ω2} 1_{A1×A2}(ω1, ω2) K(ω1, dω2) = 1_{A1}(ω1) · K(ω1, A2).

Furthermore, we show that

∀ A ∈ A : 1_A ∈ F. (2)

To this end let

D = {A ∈ A : 1_A ∈ F}

and

E = {A1 × A2 : A1 ∈ A1 ∧ A2 ∈ A2}.

Then E ⊂ D by (1), E is closed w.r.t. intersections, and σ(E) = A. It easily follows that D is a Dynkin class. Hence Theorem A.7.4 yields

A = σ(E) = δ(E) ⊂ D ⊂ A,

which implies (2). From Theorems A.4.6 and A.5.9.(iv) we get

f1, f2 ∈ F ∧ α ∈ [0,∞[ ⇒ α·f1 + f2 ∈ F. (3)

Finally, Theorem A.5.3 and Theorem A.4.4.(iii) imply that

fn ∈ F ∧ fn ↑ f ⇒ f ∈ F. (4)

Use Theorem A.4.8 together with (2)–(4) to conclude that F = Z+(Ω,A).
Theorem 8.

∃! probability measure µ×K on A ∀ A1 ∈ A1 ∀ A2 ∈ A2 :
(µ×K)(A1 × A2) = ∫_{A1} K(ω1, A2) µ(dω1). (5)

Moreover,

∀ A ∈ A : (µ×K)(A) = ∫_{Ω1} K(ω1, A(ω1)) µ(dω1). (6)
Proof. ‘Existence’: For A ∈ A and ω1 ∈ Ω1

K(ω1, A(ω1)) = ∫_{Ω2} 1_{A(ω1)}(ω2) K(ω1, dω2) = ∫_{Ω2} 1_A(ω1, ω2) K(ω1, dω2).

According to Lemma 7, µ×K is well-defined via (6). Using Theorem A.5.3, it is easy to verify that µ×K is a probability measure on A.

For A1 ∈ A1 and A2 ∈ A2

K(ω1, (A1 × A2)(ω1)) = K(ω1, A2) if ω1 ∈ A1, and 0 otherwise.

Hence µ×K satisfies (5).

‘Uniqueness’: Apply Theorem A.1.5 with A0 = {A1 × A2 : Ai ∈ Ai}.
Example 9. In Example 4 we have P = µ×K.
Theorem 10 (Fubini’s Theorem).

(i) For f ∈ Z+(Ω,A)

∫_Ω f d(µ×K) = ∫_{Ω1} ∫_{Ω2} f(ω1, ω2) K(ω1, dω2) µ(dω1).

(ii) For f (µ×K)-integrable and

Ã1 = {ω1 ∈ Ω1 : f(ω1, ·) is K(ω1, ·)-integrable}

we have

(a) Ã1 ∈ A1 and µ(Ã1) = 1,

(b) Ã1 → R : ω1 ↦ ∫_{Ω2} f(ω1, ·) dK(ω1, ·) is integrable w.r.t. µ|_{Ã1∩A1},

(c) ∫_Ω f d(µ×K) = ∫_{Ã1} ∫_{Ω2} f(ω1, ω2) K(ω1, dω2) µ|_{Ã1∩A1}(dω1).
Proof. Ad (i): algebraic induction. Ad (ii): consider f^+ and f^− and use (i).
Remark 11. For brevity, we write

∫_{Ω1} ∫_{Ω2} f(ω1, ω2) K(ω1, dω2) µ(dω1) = ∫_{Ã1} ∫_{Ω2} f(ω1, ω2) K(ω1, dω2) µ|_{Ã1∩A1}(dω1),

if f is (µ×K)-integrable. For f ∈ Z(Ω,A)

f is (µ×K)-integrable ⇔ ∫_{Ω1} ∫_{Ω2} |f|(ω1, ω2) K(ω1, dω2) µ(dω1) < ∞.
Multi-Stage Experiments
Now we construct a stochastic model for a series of experiments, where the outputs
of the first i− 1 stages determine the model for the ith stage.
Given:
• measurable spaces (Ωi, Ai) for i ∈ I, where I = {1, . . . , n} or I = N.

Put

(Ω′_i, A′_i) = (×_{j=1}^i Ωj, ⊗_{j=1}^i Aj),

and note that

×_{j=1}^i Ωj = Ω′_{i−1} × Ωi ∧ ⊗_{j=1}^i Aj = A′_{i−1} ⊗ Ai
for i ∈ I \ {1}. Furthermore, let

Ω = ×_{i∈I} Ωi, A = ⊗_{i∈I} Ai. (7)

Given:

• a probability measure µ on A1,

• Markov kernels Ki from (Ω′_{i−1}, A′_{i−1}) to (Ωi, Ai) for i ∈ I \ {1}.
Theorem 12. For I = {1, . . . , n}

∃! probability measure ν on A ∀ A1 ∈ A1, . . . , An ∈ An :
ν(A1 × · · · × An) = ∫_{A1} · · · ∫_{A_{n−1}} Kn((ω1, . . . , ω_{n−1}), An) K_{n−1}((ω1, . . . , ω_{n−2}), dω_{n−1}) · · · µ(dω1).

Moreover, for f ν-integrable (the short version)

∫_Ω f dν = ∫_{Ω1} · · · ∫_{Ωn} f(ω1, . . . , ωn) Kn((ω1, . . . , ω_{n−1}), dωn) · · · µ(dω1). (8)

Notation: ν = µ × K2 × · · · × Kn.
Proof. Induction, using Theorems 8 and 10.
Remark 13. Particular case of Theorem 12 with

µ = P1, ∀ i ∈ I \ {1} ∀ ω′_{i−1} ∈ Ω′_{i−1} : Ki(ω′_{i−1}, ·) = Pi (9)

for probability measures Pi on Ai:

∃! probability measure P1 × · · · × Pn on A ∀ A1 ∈ A1, . . . , An ∈ An :
(P1 × · · · × Pn)(A1 × · · · × An) = P1(A1) · · · Pn(An).

Moreover, for every (P1 × · · · × Pn)-integrable function f, Fubini’s Theorem reads

∫_Ω f d(P1 × · · · × Pn) = ∫_{Ω1} · · · ∫_{Ωn} f(ω1, . . . , ωn) Pn(dωn) · · · P1(dω1).
Analogously for any other order of integration.
Definition 14. P1 × · · · × Pn is called the product probability measure corresponding to Pi for i = 1, . . . , n, and (Ω, A, P1 × · · · × Pn) is called the product probability space corresponding to (Ωi, Ai, Pi) for i = 1, . . . , n.
Example 15.
(i) In Example 4 with b = b1 = · · · = bn and ν = b · ε_H + (1 − b) · ε_T we have P = µ × ν.

(ii) For countable spaces Ωi and σ-algebras Ai = P(Ωi) we get

(P1 × P2)(A) = ∑_{ω1∈Ω1} P2(A(ω1)) · P1({ω1}), A ⊂ Ω.
(iii) For uniform distributions Pi on finite spaces Ωi, P1 × · · · × Pn is the uniform
distribution on Ω.
Remark 16. Theorems 8, 10, and 12 and Remark 13 extend to σ-finite measures µ instead of probability measures and σ-finite kernels K or Ki, resp., instead of Markov kernels, with the resulting measure on A being σ-finite, too. Recall that µ is σ-finite if

∃ A_{1,1}, A_{1,2}, . . . ∈ A1 pairwise disjoint : Ω1 = ⋃_{i=1}^∞ A_{1,i} ∧ ∀ i ∈ N : µ(A_{1,i}) < ∞.

By definition, a mapping K : Ω1 × A2 → R̄ is a kernel if K(ω1, ·) is a measure on A2 for every ω1 ∈ Ω1 and if K(·, A2) is A1-B̄-measurable for every A2 ∈ A2. Moreover, K is a σ-finite kernel if, additionally,

∃ A_{2,1}, A_{2,2}, . . . ∈ A2 pairwise disjoint :
Ω2 = ⋃_{i=1}^∞ A_{2,i} ∧ ∀ i ∈ N : sup_{ω1∈Ω1} K(ω1, A_{2,i}) < ∞.
Example 17.
(i) For every integrable random variable X on any probability space (Ω,A, P) with X ≥ 0,

E(X) = ∫_{]0,∞[} (1 − FX(u)) λ_1(du),

see Übung 7.4 in Maß- und Integrationstheorie (2016).

(ii) For the Lebesgue measure

λ_n = λ_1 × · · · × λ_1.
Theorem 18 (Ionescu-Tulcea). For I = N,

∃! probability measure P on A ∀ n ∈ N ∀ A1 ∈ A1, . . . , An ∈ An :
P(A1 × · · · × An × ×_{i=n+1}^∞ Ωi) = (µ × K2 × · · · × Kn)(A1 × · · · × An). (10)
Proof. ‘Existence’: Consider the σ-algebras

⊗_{i=1}^n Ai,  Ân = σ(π_{1,...,n})

in ×_{i=1}^n Ωi and ×_{i=1}^∞ Ωi, respectively. Define a probability measure Pn on Ân by

Pn(A × ×_{i=n+1}^∞ Ωi) = (µ × K2 × · · · × Kn)(A), A ∈ ⊗_{i=1}^n Ai.

Then (8) yields the following consistency property

P_{n+1}(A × Ω_{n+1} × ×_{i=n+2}^∞ Ωi) = Pn(A × ×_{i=n+1}^∞ Ωi), A ∈ ⊗_{i=1}^n Ai.

Thus

P̂(A) = Pn(A), A ∈ Ân,

yields a well-defined mapping on the algebra

Â = ⋃_{n∈N} Ân

of cylinder sets. Obviously, P̂ is a content with P̂(×_{i=1}^∞ Ωi) = 1, and (10) holds for P = P̂.

Claim: P̂ is σ-continuous at ∅. It suffices to show that for every sequence of sets

A^{(n)} = B^{(n)} × ×_{i=n+1}^∞ Ωi

with B^{(n)} ∈ ⊗_{i=1}^n Ai and A^{(n)} ↓ ∅ we have

lim_{n→∞} (µ × K2 × · · · × Kn)(B^{(n)}) = 0.
Assume the contrary, i.e., inf_{n∈N} (µ × K2 × · · · × Kn)(B^{(n)}) > 0.
Observe that

B^{(n)} × Ω_{n+1} ⊃ B^{(n+1)}. (11)

Recursively, we define ω* ∈ Ω such that

∀ n ∈ N : B^{(n+1)}(ω*_1, . . . , ω*_n) ≠ ∅, (12)

which implies (ω*_1, . . . , ω*_n) ∈ B^{(n)} for every n ∈ N, see (11). Consequently,

ω* ∈ ⋂_{n=1}^∞ A^{(n)},

contradicting A^{(n)} ↓ ∅.

Put
ω^n = (ω1, . . . , ωn)

for n ≥ 1 and ωi ∈ Ωi. Consider the probability measure K_{n+1}(ω^n, ·) on A_{n+1} as well as the Markov kernels

K_{n+m}((ω^n, ·), ·) : ×_{i=n+1}^{n+m−1} Ωi × A_{n+m} → R
for m ≥ 2. By

Q_{n,m}(ω^n, ·) = K_{n+1}(ω^n, ·) × · · · × K_{n+m}((ω^n, ·), ·)

we obtain a probability measure on ⊗_{i=n+1}^{n+m} Ai for m ≥ 1; actually Q_{n,m} is a Markov kernel. Finally, we put

f_{n,m}(ω^n) = Q_{n,m}(ω^n, B^{(n+m)}(ω^n))

for n, m ≥ 1.
Recursively, we define ω* ∈ Ω such that

∀ n ∈ N : inf_{m≥1} f_{n,m}(ω*_1, . . . , ω*_n) > 0. (13)

Take m = 1 in (13) to obtain (12).
Due to (11) we have B^{(n+m)}(ω^n) × Ω_{n+m+1} ⊃ B^{(n+m+1)}(ω^n), and therefore

f_{n,m}(ω^n) = Q_{n,m+1}(ω^n, B^{(n+m)}(ω^n) × Ω_{n+m+1}) ≥ f_{n,m+1}(ω^n).

Furthermore, 0 ≤ f_{n,m} ≤ 1 and

f_{n,m}(ω^n) = ∫_{×_{i=n+1}^{n+m} Ωi} 1_{B^{(n+m)}}(ω^n, ω_{n+1}, . . . , ω_{n+m}) Q_{n,m}(ω^n, d(ω_{n+1}, . . . , ω_{n+m})).

In particular, f_{n,m} is ⊗_{i=1}^n Ai-B-measurable.
For n = 1

∫_{Ω1} f_{1,m}(ω1) µ(dω1) = (µ × K2 × · · · × K_{1+m})(B^{(1+m)}).

Therefore

∫_{Ω1} inf_{m≥1} f_{1,m}(ω1) µ(dω1) = inf_{m≥1} (µ × K2 × · · · × K_{1+m})(B^{(1+m)}) > 0,

which yields

∃ ω*_1 ∈ Ω1 : inf_{m≥1} f_{1,m}(ω*_1) > 0.
For n ≥ 1

∫_{Ω_{n+1}} f_{n+1,m}(ω^n, ω_{n+1}) K_{n+1}(ω^n, dω_{n+1}) = f_{n,m+1}(ω^n).

Therefore

∫_{Ω_{n+1}} inf_{m≥1} f_{n+1,m}(ω^n, ω_{n+1}) K_{n+1}(ω^n, dω_{n+1}) = inf_{m≥1} f_{n,m+1}(ω^n).

Proceed inductively to establish (13).
By Theorem A.9.3, P̂ is σ-additive, and it remains to apply Theorem A.9.4.

‘Uniqueness’: By (10), P is uniquely determined on the class of measurable rectangles. Apply Theorem A.1.5.
Outlook: the probabilistic method.
Example 19. The queuing model, see Übung 5.3. Here Ki((ω1, . . . , ω_{i−1}), ·) depends only on ω_{i−1}. See also Übung 5.1.
Outlook: Markov processes.
Product Measures – The General Case
Given: a non-empty set I and probability spaces (Ωi,Ai, Pi) for i ∈ I. Recall the
definition (7). Put
P0(I) = {J ⊂ I : J non-empty, finite}.
Theorem 20.

∃! probability measure P on A ∀ S ∈ P0(I) ∀ Ai ∈ Ai, i ∈ S :
P(×_{i∈S} Ai × ×_{i∈I\S} Ωi) = ∏_{i∈S} Pi(Ai). (14)

Notation: P = ×_{i∈I} Pi.
Proof. See Remark 13 in the case of a finite set I.

If |I| = |N|, assume I = N without loss of generality. The particular case of Theorem 18 with (9) for probability measures Pi on Ai shows

∃! probability measure P on A ∀ n ∈ N ∀ A1 ∈ A1, . . . , An ∈ An :
P(A1 × · · · × An × ×_{i=n+1}^∞ Ωi) = P1(A1) · · · Pn(An).

If I is uncountable, we use Theorem A.8.7. For S ⊂ I non-empty and countable and for B ∈ ⊗_{i∈S} Ai we put

P((π^I_S)^{−1}(B)) = (×_{i∈S} Pi)(B).

Hereby we get a well-defined mapping P : A → R, which clearly is a probability measure and satisfies (14). Use Theorem A.1.5 to obtain the uniqueness result.
Definition 21. P = ×i∈I Pi is called the product measure corresponding to Pi for
i ∈ I, and (Ω,A, P ) is called the product measure space corresponding to (Ωi,Ai, Pi)
for i ∈ I.
6 Independence
‘. . . the concept of independence . . . plays a central role in probability theory; it is precisely
this concept that distinguishes probability theory from the general theory of measure spaces’,
see Shiryayev (1984, p. 27).
In the sequel, (Ω,A, P ) denotes a probability space and I is a non-empty set.
Independence of Events
Definition 1. Let Ai ∈ A for i ∈ I. Then (Ai)_{i∈I} is independent if

P(⋂_{i∈S} Ai) = ∏_{i∈S} P(Ai) (1)

for every S ∈ P0(I). Elementary case: |I| = 2.
In the sequel, Ei ⊂ A for i ∈ I.
Definition 2. (Ei)_{i∈I} is independent if (1) holds for every S ∈ P0(I) and all Ai ∈ Ei for i ∈ S.
Remark 3.
(i) (Ei)_{i∈I} independent ∧ ∀ i ∈ I : Ẽi ⊂ Ei ⇒ (Ẽi)_{i∈I} independent.

(ii) (Ei)_{i∈I} independent ⇔ ∀ S ∈ P0(I) : (Ei)_{i∈S} independent.
Lemma 4.

(Ei)_{i∈I} independent ⇒ (δ(Ei))_{i∈I} independent.

Proof. Without loss of generality, I = {1, . . . , n} and n ≥ 2, see Remark 3.(ii). Put

D1 = {A ∈ δ(E1) : ({A}, E2, . . . , En) independent}.

Then D1 is a Dynkin class and E1 ⊂ D1, hence δ(E1) = D1. Thus

(δ(E1), E2, . . . , En) independent.

Repeat this step for 2, . . . , n.
Theorem 5. If

(Ei)_{i∈I} independent ∧ ∀ i ∈ I : Ei closed w.r.t. intersections (2)

then

(σ(Ei))_{i∈I} independent.
Proof. Use Theorem A.7.4 and Lemma 4.
Corollary 6. Assume that I = ⋃_{j∈J} Ij for pairwise disjoint sets Ij ≠ ∅. If (2) holds, then

(σ(⋃_{i∈Ij} Ei))_{j∈J} independent.

Proof. Let

Ẽj = {⋂_{i∈S} Ai : S ∈ P0(Ij) ∧ Ai ∈ Ei for i ∈ S}.

Then Ẽj is closed w.r.t. intersections and (Ẽj)_{j∈J} is independent. Finally,

σ(⋃_{i∈Ij} Ei) = σ(Ẽj).
Independence of Random Elements
In the sequel, (Ωi,Ai) denotes a measurable space for i ∈ I, and Xi : Ω → Ωi is
A-Ai-measurable for i ∈ I.
Definition 7. (Xi)i∈I is independent if (σ(Xi))i∈I is independent.
Remark 8. See Übung 6.2 for a characterization of independence in terms of predic-
tion problems.
Example 9. Actually, the essence of independence. Assume that

(Ω, A, P) = (×_{i∈I} Ωi, ⊗_{i∈I} Ai, ×_{i∈I} Pi)

for probability measures Pi on Ai. Let

Xi = πi.

Then, for S ∈ P0(I) and Ai ∈ Ai for i ∈ S,

P(⋂_{i∈S} {Xi ∈ Ai}) = P(×_{i∈S} Ai × ×_{i∈I\S} Ωi) = ∏_{i∈S} Pi(Ai) = ∏_{i∈S} P(Xi ∈ Ai).

Hence (Xi)_{i∈I} is independent. Furthermore, P_{Xi} = Pi.
Theorem 10. Given: probability spaces (Ωi,Ai, Pi) for i ∈ I. Then there exist
(i) a probability space (Ω,A, P ) and
(ii) A-Ai-measurable mappings Xi : Ω→ Ωi for i ∈ I
such that

(Xi)_{i∈I} independent ∧ ∀ i ∈ I : P_{Xi} = Pi.
Proof. See Example 9.
Theorem 11. Let Fi ⊂ Ai for i ∈ I. If

∀ i ∈ I : σ(Fi) = Ai ∧ Fi closed w.r.t. intersections

then

(Xi)_{i∈I} independent ⇔ (X_i^{−1}(Fi))_{i∈I} independent.

Proof. By Lemma II.3.6 in Maß- und Integrationstheorie (2016) we have σ(Xi) = X_i^{−1}(Ai) = σ(X_i^{−1}(Fi)). ‘⇒’: See Remark 3.(i). ‘⇐’: Note that X_i^{−1}(Fi) is closed w.r.t. intersections. Use Theorem 5.
Example 12. Independence of a family of random variables Xi, i.e., (Ωi,Ai) = (R,B)
for i ∈ I. In this case (Xi)_{i∈I} is independent iff

∀ S ∈ P0(I) ∀ ci ∈ R, i ∈ S : P(⋂_{i∈S} {Xi ≤ ci}) = ∏_{i∈S} P(Xi ≤ ci).
Theorem 13. Let

(i) I = ⋃_{j∈J} Ij for pairwise disjoint sets Ij ≠ ∅,

(ii) (Ω̃j, Ãj) be measurable spaces for j ∈ J,

(iii) fj : ×_{i∈Ij} Ωi → Ω̃j be (⊗_{i∈Ij} Ai)-Ãj-measurable mappings for j ∈ J.

Put

Yj = (Xi)_{i∈Ij} : Ω → ×_{i∈Ij} Ωi.

Then

(Xi)_{i∈I} independent ⇒ (fj ∘ Yj)_{j∈J} independent.
Proof.

σ(fj ∘ Yj) = Y_j^{−1}(f_j^{−1}(Ãj)) ⊂ Y_j^{−1}(⊗_{i∈Ij} Ai) = σ(Xi : i ∈ Ij) = σ(⋃_{i∈Ij} X_i^{−1}(Ai)).

Use Corollary 6 and Remark 3.(i).
Example 14. For an independent sequence (Xi)_{i∈N} of random variables,

max(X1, X3),  1_{[0,∞[}(X2),  lim sup_{n→∞} (1/n)·∑_{i=1}^n Xi

are independent.
Independence and Product Measures
Remark 15. Consider the mapping

X : Ω → ×_{i∈I} Ωi : ω ↦ (Xi(ω))_{i∈I}.

Clearly X is A-⊗_{i∈I} Ai-measurable. By definition, PX(A) = P(X ∈ A) for A ∈ ⊗_{i∈I} Ai. In particular, for measurable rectangles A ∈ ⊗_{i∈I} Ai, i.e.,

A = ×_{i∈S} Ai × ×_{i∈I\S} Ωi (3)

with S ∈ P0(I) and Ai ∈ Ai,

PX(A) = P(⋂_{i∈S} {Xi ∈ Ai}). (4)
Definition 16.
(i) PX is called the joint distribution of the random elements Xi, i ∈ I.
(ii) Let P denote a probability measure on (×_{i∈I} Ωi, ⊗_{i∈I} Ai), and let i ∈ I. Then P_{πi} is called a (one-dimensional) marginal distribution of P.
Example 17. Let Ω = {1, . . . , 6}^2 and consider the uniform distribution P on A = P(Ω), which is a model for rolling a die twice.

Moreover, let Ωi = N and Ai = P(Ωi) such that ⊗_{i=1}^2 Ai = P(N^2). Consider the random variables

X1(ω1, ω2) = ω1, X2(ω1, ω2) = ω1 + ω2.

Then

PX(A) = |A ∩ M|/36, A ⊂ N^2,

where

M = {(k, ℓ) ∈ N^2 : 1 ≤ k ≤ 6 ∧ k + 1 ≤ ℓ ≤ k + 6}.

Claim: (X1, X2) are not independent. Proof:

P({X1 = 1} ∩ {X2 = 3}) = PX({(1, 3)}) = P({(1, 2)}) = 1/36

but

P(X1 = 1) · P(X2 = 3) = 1/6 · P({(1, 2), (2, 1)}) = 1/3 · 1/36.

We add that

P_{X1} = ∑_{k=1}^6 1/6 · ε_k,  P_{X2} = ∑_{ℓ=2}^{12} (6 − |ℓ − 7|)/36 · ε_ℓ.
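The claim can be verified exactly by enumerating the 36 outcomes (a small sketch with Python's exact fractions):

```python
# Joint law of (X1, X2) for two dice vs. the product of its marginals.
from fractions import Fraction
from collections import Counter

p = Fraction(1, 36)
joint, px1, px2 = Counter(), Counter(), Counter()
for w1 in range(1, 7):
    for w2 in range(1, 7):
        joint[(w1, w1 + w2)] += p       # X1 = ω1, X2 = ω1 + ω2
for (x1, x2), q in joint.items():
    px1[x1] += q
    px2[x2] += q

print(joint[(1, 3)], px1[1] * px2[3])   # 1/36 vs. 1/108: not independent
```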
Theorem 18.

(Xi)_{i∈I} independent ⇔ PX = ×_{i∈I} P_{Xi}.
Proof. For A given by (3)

(×_{i∈I} P_{Xi})(A) = ∏_{i∈S} P_{Xi}(Ai) = ∏_{i∈S} P(Xi ∈ Ai).

On the other hand, we have (4). Thus ‘⇐’ holds trivially. Use Theorem A.1.5 to obtain ‘⇒’.
In the sequel, we consider random variables Xi, i.e., (Ωi,Ai) = (R,B) for i ∈ I.
Theorem 19. Let I = {1, . . . , n}. If

(X1, . . . , Xn) independent ∧ ∀ i ∈ I : Xi ≥ 0 (Xi integrable)

then (∏_{i=1}^n Xi is integrable and)

E(∏_{i=1}^n Xi) = ∏_{i=1}^n E(Xi).
Proof. Use Fubini’s Theorem and Theorem 18 to obtain

E(|∏_{i=1}^n Xi|) = ∫_{R^n} |x1 · · · xn| P_{(X1,...,Xn)}(d(x1, . . . , xn))
= ∫_{R^n} |x1 · · · xn| (P_{X1} × · · · × P_{Xn})(d(x1, . . . , xn))
= ∏_{i=1}^n ∫_R |xi| P_{Xi}(dxi) = ∏_{i=1}^n E(|Xi|).

Drop | · | if the random variables are integrable.
Uncorrelated Random Variables
Definition 20. X1, X2 ∈ L2 are uncorrelated if
E(X1 ·X2) = E(X1) · E(X2).
Theorem 21 (Bienaymé). Let X1, . . . , Xn ∈ L2 be pairwise uncorrelated. Then

Var(∑_{i=1}^n Xi) = ∑_{i=1}^n Var(Xi).

Proof. We have

Var(∑_{i=1}^n Xi) = E(∑_{i=1}^n (Xi − E(Xi)))^2
= ∑_{i=1}^n E(Xi − E(Xi))^2 + ∑_{i,j=1, i≠j}^n E((Xi − E(Xi)) · (Xj − E(Xj))).

Moreover,

E((Xi − E(Xi)) · (Xj − E(Xj))) = E(Xi · Xj) − E(Xi) · E(Xj).

(The latter quantity is called the covariance between Xi and Xj.)
Convolutions
Definition 22. The convolution product of probability measures P1, . . . , Pn on B is defined by

P1 ∗ · · · ∗ Pn = s(P1 × · · · × Pn),

where

s(x1, . . . , xn) = x1 + · · · + xn.
Theorem 23. Let (X1, . . . , Xn) be independent and S = ∑_{i=1}^n Xi. Then

PS = P_{X1} ∗ · · · ∗ P_{Xn}.

Proof. Put X = (X1, . . . , Xn). Since S = s ∘ (X1, . . . , Xn) we get

PS = s(PX) = s(P_{X1} × · · · × P_{Xn}).
Remark 24. The class of probability measures on B forms an Abelian semi-group w.r.t. ∗, and P ∗ ε_0 = P.
Theorem 25. For all probability measures P1, P2 on B and every (P1 ∗ P2)-integrable function f

∫_R f d(P1 ∗ P2) = ∫_R ∫_R f(x + y) P1(dx) P2(dy).

If P1 = h1 · λ_1 then P1 ∗ P2 = h · λ_1 with

h(x) = ∫_R h1(x − y) P2(dy).

If P2 = h2 · λ_1, additionally, then

h(x) = ∫_R h1(x − y) · h2(y) λ_1(dy).
Proof. Use Fubini’s Theorem and the transformation theorem. See Billingsley (1979,
p. 230).
Example 26.
(i) Put N(µ, 0) = ε_µ. By Theorems 21 and 25

N(µ1, σ1^2) ∗ N(µ2, σ2^2) = N(µ1 + µ2, σ1^2 + σ2^2)

for µi ∈ R and σi ≥ 0.

(ii) Consider n independent Bernoulli trials, i.e., (X1, . . . , Xn) independent with

P_{Xi} = p · ε_1 + (1 − p) · ε_0

for every i ∈ {1, . . . , n}, where p ∈ [0, 1]. Inductively, we get for k ∈ {1, . . . , n}

∑_{i=1}^k Xi ∼ B(k, p),

see also Übung 4.4. Thus, for any n, m ∈ N,

B(n, p) ∗ B(m, p) = B(n + m, p).
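A quick numerical check of (ii) (assumed parameters): convolving the weight vectors of B(n, p) and B(m, p) reproduces the weights of B(n + m, p).

```python
import numpy as np
from math import comb

def binom_pmf(n, p):
    # weights of B(n, p) on {0, ..., n}
    return np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])

n, m, p = 3, 4, 0.3
conv = np.convolve(binom_pmf(n, p), binom_pmf(m, p))  # convolution of point masses
print(np.allclose(conv, binom_pmf(n + m, p)))         # True
```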
Chapter III
Limit Theorems
Given: a sequence (Xn)n∈N of random variables on a probability space (Ω,A, P ) and
weights 0 < an ↑ ∞.
Put

Sn = ∑_{i=1}^n Xi, n ∈ N0.
For instance, Sn might be the cumulative gain after n trials or (one of the coordinates
of) the position of a particle after n collisions.
Question: Convergence of Sn/an for suitable weights an in a suitable sense? Particular
case: an = n.
Definition 1. (Xn)_{n∈N} is independent and identically distributed (i.i.d.) if (Xn)_{n∈N} is independent and

∀ n ∈ N : P_{Xn} = P_{X1}.

In this case (Sn)_{n∈N0} is called the associated random walk.
1 Zero-One Laws
Kolmogorov’s Zero-One Law
Definition 1. For σ-algebras An ⊂ A, n ∈ N, the corresponding tail σ-algebra is

A∞ = ⋂_{n∈N} σ(⋃_{m≥n} Am),

and A ∈ A∞ is called a tail (terminal) event.
Example 2. Let An = σ(Xn). Put C = ⊗_{i=1}^∞ B. Then

A∞ = ⋂_{n∈N} σ(Xm : m ≥ n)

and

A ∈ A∞ ⇔ ∀ n ∈ N ∃ C ∈ C : A = {(Xn, X_{n+1}, . . . ) ∈ C}.
For instance,

{(Sn)_{n∈N} converges}, {(Sn/an)_{n∈N} converges} ∈ A∞,

and the function lim inf_{n→∞} Sn/an is A∞-B-measurable. However, Sn as well as lim inf_{n→∞} Sn are not A∞-B-measurable, in general. Analogously for the lim sup’s.
Theorem 3 (Kolmogorov’s Zero-One Law). Let (An)_{n∈N} be an independent sequence of σ-algebras An ⊂ A. Then

∀ A ∈ A∞ : P(A) ∈ {0, 1}.

Proof. We show that A∞ and A∞ are independent (terminology), which implies P(A) = P(A) · P(A) for every A ∈ A∞. Put

Ân = σ(A1 ∪ · · · ∪ An).

Note that A∞ ⊂ σ(A_{n+1} ∪ . . . ). By Corollary II.6.6 and Remark II.6.3.(i)

Ân, A∞ independent,

and therefore ⋃_{n∈N} Ân and A∞ are independent, too. Thus, by Theorem II.6.5,

σ(⋃_{n∈N} Ân), A∞ independent.

Finally,

A∞ ⊂ σ(⋃_{n∈N} An) = σ(⋃_{n∈N} Ân).
Corollary 4. Let X ∈ Z(Ω,A∞). Under the assumptions of Theorem 3, X is constant
P -a.s.
Remark 5. Assume that (Xn)_{n∈N} is independent. Then

P((Sn)_{n∈N} converges), P((Sn/an)_{n∈N} converges) ∈ {0, 1}.

In case of convergence P-a.s., lim_{n→∞} Sn/an is constant P-a.s.
The Borel-Cantelli Lemma
Definition 6. Let An ∈ A for n ∈ N. Then

lim inf_{n→∞} An = ⋃_{n∈N} ⋂_{m≥n} Am,  lim sup_{n→∞} An = ⋂_{n∈N} ⋃_{m≥n} Am.
Remark 7.

(i) (lim inf_{n→∞} An)^c = lim sup_{n→∞} A_n^c.

(ii) P(lim inf_{n→∞} An) ≤ lim inf_{n→∞} P(An) ≤ lim sup_{n→∞} P(An) ≤ P(lim sup_{n→∞} An).

(iii) If (An)_{n∈N} is independent, then P(lim sup_{n→∞} An) ∈ {0, 1} (Borel’s Zero-One Law).

Proof: Übung 6.3.
Theorem 8 (Borel-Cantelli Lemma). Let A = lim sup_{n→∞} An with An ∈ A.

(i) If ∑_{n=1}^∞ P(An) < ∞ then P(A) = 0.

(ii) If ∑_{n=1}^∞ P(An) = ∞ and (An)_{n∈N} is independent, then P(A) = 1.

Proof. Ad (i):

P(A) ≤ P(⋃_{m≥n} Am) ≤ ∑_{m=n}^∞ P(Am).

By assumption, the right-hand side tends to zero as n tends to ∞.

Ad (ii): We have

P(A^c) = P(lim inf_{n→∞} A_n^c) ≤ ∑_{n=1}^∞ P(⋂_{m≥n} A_m^c).

Use 1 − x ≤ exp(−x) for x ≥ 0 to obtain

P(⋂_{m=n}^ℓ A_m^c) = ∏_{m=n}^ℓ (1 − P(Am)) ≤ ∏_{m=n}^ℓ exp(−P(Am)) = exp(−∑_{m=n}^ℓ P(Am)).

By assumption, the right-hand side tends to zero as ℓ tends to ∞. Thus P(A^c) = 0.
By assumption, the right-hand side tends to zero as ` tends to∞. Thus P (Ac) = 0.
Example 9. A fair coin is tossed an infinite number of times. Determine the proba-
bility that 0 occurs twice in a row infinitely often. Model: (Xn)n∈N i.i.d. with
P (X1 = 0) = P (X1 = 1) = 1/2.
Put
An = Xn = Xn+1 = 0.
Then (A2n)n∈N is independent and P (A2n) = 1/4. Thus P (lim supn→∞An) = 1.
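A finite-horizon simulation (assumed parameters, NumPy) illustrates the conclusion: the pattern '00' keeps recurring, with frequency close to 1/4.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.integers(0, 2, size=1_000_000)           # fair coin tosses
double_zero = (x[:-1] == 0) & (x[1:] == 0)       # events A_n = {X_n = X_{n+1} = 0}
print(double_zero.sum(), double_zero.mean())     # approx 250000 occurrences, approx 1/4
```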
Remark 10. A stronger version of Theorem 8.(ii) requires only pairwise indepen-
dence, see Bauer (1996, p. 70).
Example 11. Let (Xn)_{n∈N} be i.i.d. with

P(X1 = 1) = p = 1 − P(X1 = −1)

for some constant p ∈ [0, 1]. Put An = σ(Xn) and

A = lim sup_{n→∞} {Sn = 0},
and note that

A ∉ A∞ = ⋂_{n∈N} σ(Xm : m ≥ n),

at least in the standard setting from Example II.6.9. Clearly

1/2 · (Sn + n) ∼ B(n, p).

Use Stirling’s Formula

n! ≈ (n/e)^n √(2πn)

to obtain

P(S_{2n} = 0) = \binom{2n}{n} · p^n · (1 − p)^n ≈ r^n/√(πn),

where r = 4p · (1 − p) ∈ [0, 1].

Suppose that p ≠ 1/2. Then r < 1, and therefore

∑_{n=0}^∞ P(Sn = 0) = ∑_{n=0}^∞ P(S_{2n} = 0) < ∞.

The Borel-Cantelli Lemma implies

P(A) = 0.

Suppose that p = 1/2. Then

∑_{n=0}^∞ P(Sn = 0) = ∑_{n=0}^∞ P(S_{2n} = 0) = ∞,

but ({Sn = 0})_{n∈N} is not independent. Using the Central Limit Theorem one can show that P(A) = 1, see Example 5.12.
2 The Strong Law of Large Numbers
Throughout this section: (Xn)_{n∈N} independent.

The L2 Case

Put

C = {(Sn)_{n∈N} converges in R}.

By Remark 1.5, P(C) ∈ {0, 1}. First we provide sufficient conditions for P(C) = 1 to hold.
Theorem 1 (Kolmogorov inequality). If

∀ i ∈ {1, . . . , n} : Xi ∈ L2 ∧ E(Xi) = 0,

then for every u > 0

P(sup_{1≤k≤n} |Sk| ≥ u) ≤ (1/u^2) · Var(Sn).

Proof. Let 1 ≤ k ≤ n. We show that

∀ B ∈ σ(X1, . . . , Xk) : ∫_B S_k^2 dP ≤ ∫_B S_n^2 dP. (1)

Note that

S_n^2 = (Sn − Sk)^2 + 2·Sk·(Sn − Sk) + S_k^2.

Moreover, for B ∈ σ(X1, . . . , Xk),

1_B · Sk is σ(X1, . . . , Xk)-B-measurable,
Sn − Sk is σ(X_{k+1}, . . . , Xn)-B-measurable.

Use Theorem II.6.13 to obtain

1_B · Sk, Sn − Sk independent.

Hence Theorem II.6.19 yields

E(1_B · Sk · (Sn − Sk)) = E(1_B · Sk) · E(Sn − Sk) = 0,

and thereby

E(1_B · S_n^2) ≥ 2 · E(1_B · Sk · (Sn − Sk)) + E(1_B · S_k^2) = E(1_B · S_k^2).

This completes the proof of (1).

Put

Ak = ⋂_{ℓ=1}^{k−1} {|S_ℓ| < u} ∩ {|Sk| ≥ u}.

Then Ak ∈ σ(X1, . . . , Xk), and by (1)

u^2 · P(sup_{1≤k≤n} |Sk| ≥ u) = u^2 · ∑_{k=1}^n P(Ak) ≤ ∑_{k=1}^n ∫_{Ak} S_k^2 dP
≤ ∑_{k=1}^n ∫_{Ak} S_n^2 dP ≤ ∫_Ω S_n^2 dP = Var(Sn).
Theorem 2. If

∀ n ∈ N : Xn ∈ L2 ∧ E(Xn) = 0

and

∑_{i=1}^∞ Var(Xi) < ∞,

then

P(C) = 1.
Proof. Clearly

ω ∈ C ⇔ ∀ ε > 0 ∃ n ∈ N ∀ k ∈ N : |S_{n+k}(ω) − Sn(ω)| < ε.

Put

M = inf_{n∈N} sup_{k∈N} |S_{n+k} − Sn|.

Then

C = {M = 0}.

Let ε > 0. For every n ∈ N

{M > ε} ⊂ {sup_{k∈N} |S_{n+k} − Sn| > ε},

and

{sup_{1≤k≤r} |S_{n+k} − Sn| > ε} ↑ {sup_{k∈N} |S_{n+k} − Sn| > ε}

as r tends to ∞. Hence

P(M > ε) ≤ lim_{r→∞} P(sup_{1≤k≤r} |S_{n+k} − Sn| > ε),

and Kolmogorov’s inequality yields

P(sup_{1≤k≤r} |S_{n+k} − Sn| > ε) ≤ (1/ε^2) · ∑_{i=n+1}^{n+r} Var(Xi) ≤ (1/ε^2) · ∑_{i=n+1}^∞ Var(Xi).

Thus P(M > ε) = 0 for every ε > 0, which implies P(M > 0) = 0.
Example 3. Let (Yn)_{n∈N} be i.i.d. with P_{Y1} = 1/2 · (ε_1 + ε_{−1}). Then E(Yn) = 0 and Var(Yn) = 1, so that

∑_{i=1}^∞ Yi · (1/i)

converges P-a.s.
Remark 4. Consider the truncated random variables

X_i^{(c)} = Xi · 1_{{|Xi|<c}}, c > 0.

According to the three-series theorem, P(C) = 1 iff there exists c > 0 such that (iff for all c > 0)

(i) ∑_{i=1}^∞ P(|Xi| > c) < ∞ and

(ii) ∑_{i=1}^∞ E(X_i^{(c)}) converges and

(iii) ∑_{i=1}^∞ Var(X_i^{(c)}) < ∞.

See Gänssler, Stute (1977, Satz 2.2.9).
Recall that 0 < an ↑ ∞.
We now study convergence almost surely of (Sn/an)n∈N.
Lemma 5 (Kronecker’s Lemma). For every sequence (xn)_{n∈N} in R

∑_{i=1}^∞ xi/ai converges ⇒ lim_{n→∞} (1/an) · ∑_{i=1}^n xi = 0.

Proof. Put c = ∑_{i=1}^∞ xi/ai and cn = ∑_{i=1}^n xi/ai. It is straightforward to verify that

(1/an) · ∑_{i=1}^n xi = cn − (1/an) · ∑_{i=2}^n (ai − a_{i−1}) · c_{i−1}.

Moreover, since a_{i−1} ≤ ai and lim_{i→∞} ai = ∞,

c = lim_{n→∞} (1/an) · ∑_{i=2}^n (ai − a_{i−1}) · c_{i−1}.
Theorem 6 (Strong Law of Large Numbers, L2 Case). If

∀ n ∈ N : Xn ∈ L2 ∧ ∑_{i=1}^∞ (1/a_i^2) · Var(Xi) < ∞ (2)

then

(1/an) · ∑_{i=1}^n (Xi − E(Xi)) → 0 P-a.s.
Proof. Put Yn = 1/an · (Xn − E(Xn)). Then E(Yn) = 0 and (Yn)_{n∈N} is independent. Moreover,

∑_{i=1}^∞ Var(Yi) = ∑_{i=1}^∞ (1/a_i^2) · Var(Xi) < ∞.

Thus ∑_{i=1}^∞ Yi converges P-a.s. due to Theorem 2. Apply Lemma 5.
Remark 7. In particular, if (Xn)_{n∈N} is i.i.d. and X1 ∈ L2, then Theorem 6 with an = n implies

(1/n) · ∑_{i=1}^n Xi → E(X1) P-a.s.

In fact, this conclusion already holds if X1 ∈ L1, see Theorem 11 below.
Remark 8. Assume

sup_{n∈N} Var(Xn) < ∞.

Then another possible choice of an in Theorem 6 is

an = √n · (log n)^{1/2+ε}

for any ε > 0, and we have

lim_{n→∞} (Sn − E(Sn))/an = 0 P-a.s.

Note that lim_{n→∞} an/n = 0. A precise description of the fluctuation of Sn(ω) for P-a.e. ω ∈ Ω is given by the law of the iterated logarithm, see Section 6 and also Example 5.12.
The I.I.D. Case
Lemma 9. Let Ui, Vi, W ∈ Z(Ω,A) such that
∞∑i=1
P (Ui 6= Vi) <∞.
Then1
n·
n∑i=1
UiP -a.s.−→ W ⇔ 1
n·
n∑i=1
ViP -a.s.−→ W.
Proof. The Borel-Cantelli Lemma implies P (lim supi→∞Ui 6= Vi) = 0.
Lemma 10. For X ∈ Z+(Ω,A)

E(X) ≤ ∑_{k=0}^∞ P(X > k) ≤ E(X) + 1.

Proof. We have

E(X) = ∑_{k=1}^∞ ∫_{{k−1<X≤k}} X dP,

and therefore

E(X) ≤ ∑_{k=1}^∞ k · P(k − 1 < X ≤ k) = ∑_{k=0}^∞ P(X > k)

as well as

E(X) ≥ ∑_{k=1}^∞ (k − 1) · P(k − 1 < X ≤ k) ≥ ∑_{k=0}^∞ P(X > k) − 1.
Theorem 11 (Strong Law of Large Numbers, i.i.d. Case). Let (Xn)_{n∈N} be i.i.d. Then

∃ Z ∈ Z(Ω,A) : (1/n) · Sn → Z P-a.s. ⇔ X1 ∈ L1,

in which case Z = E(X1) P-a.s.
Proof. ‘⇒’: Clearly

P(|X1| > n) = P(An)

where

An = {|Xn| > n}.

Note that

(1/n) · Xn = (1/n) · Sn − ((n − 1)/n) · (1/(n − 1)) · S_{n−1} → 0 P-a.s.

Hence

P(lim sup_{n→∞} An) = 0.

Since (An)_{n∈N} is independent, the Borel-Cantelli Lemma implies

∑_{n=1}^∞ P(An) < ∞.

Use Lemma 10 to obtain E(|X1|) < ∞.
‘⇐’: Consider the truncated random variables

Yn = Xn if |Xn| < n, and Yn = 0 otherwise.

We have

∑_{i=1}^∞ (1/i^2) · Var(Yi) < ∞. (3)

Proof: Observe that

Var(Yi) ≤ E(Y_i^2) = ∑_{k=1}^i E(Y_i^2 · 1_{[k−1,k[} ∘ |Yi|)
= ∑_{k=1}^i E(X_i^2 · 1_{[k−1,k[} ∘ |Xi|) ≤ ∑_{k=1}^i k^2 · P(k − 1 ≤ |X1| < k).

Thus

∑_{i=1}^∞ (1/i^2) · Var(Yi) ≤ ∑_{k=1}^∞ k^2 · P(k − 1 ≤ |X1| < k) · ∑_{i=k}^∞ 1/i^2
≤ 2 · ∑_{k=1}^∞ k · P(k − 1 ≤ |X1| < k) ≤ 2 · (E(|X1|) + 1) < ∞,

cf. the proof of Lemma 10.

Theorem 6 and (3) yield

(1/n) · ∑_{i=1}^n (Yi − E(Yi)) → 0 P-a.s. (4)

Furthermore,

lim_{n→∞} E(Yn) = lim_{n→∞} ∫_{{|X1|<n}} X1 dP = E(X1),

according to the dominated convergence theorem, and therefore

lim_{n→∞} (1/n) · ∑_{i=1}^n E(Yi) = E(X1). (5)

From (4) and (5) we get

(1/n) · ∑_{i=1}^n Yi → E(X1) P-a.s.

Finally,

∑_{i=1}^∞ P(Xi ≠ Yi) < ∞,

since, by Lemma 10,

∑_{i=1}^∞ P(Xi ≠ Yi) = ∑_{i=1}^∞ P(|Xi| ≥ i) ≤ ∑_{i=0}^∞ P(|X1| > i) ≤ E(|X1|) + 1 < ∞.

It remains to observe Lemma 9.
Theorem 12. Let (Xn)_{n∈N} be i.i.d.

(i) If E(X_1^−) < ∞ ∧ E(X_1^+) = ∞ then

(1/n) · Sn → ∞ P-a.s.

(ii) If E(|X1|) = ∞ then

lim sup_{n→∞} |(1/n) · Sn| = ∞ P-a.s.

Proof. (i) follows from Theorem 11, and (ii) is an application of the Borel-Cantelli Lemma, see Gänssler, Stute (1977, p. 131).

Remark 13. We already have Sn/n → E(X1) P-a.s. if the random variables Xn are identically distributed, P-integrable, and pairwise independent. See Bauer (1996, §12).
Two Applications
Remark 14. The basic idea of Monte Carlo algorithms: to compute a quantity a ∈ R,
(i) find a probability measure µ on (R,B) such that ∫_R x µ(dx) = a,
(ii) take an i.i.d. sequence (Xn)n∈N with PX1 = µ and approximate a by 1/n · Sn(ω).
Clearly Sn/n is an unbiased estimator for a, i.e.,
E((1/n) · Sn) = a.
Due to the Strong Law of Large Numbers, Sn/n converges almost surely to a. If X1 ∈ L2, then
E((1/n) · Sn − a)² = Var((1/n) · Sn − a) = Var((1/n) · ∑_{i=1}^n (Xi − a)) = (1/n) · Var(X1),
i.e., the variance of X1 is the key quantity for the error of the Monte Carlo algorithm in the mean square sense. Moreover,
(1/(n−1)) · ∑_{i=1}^n (Xi − Sn/n)² −→ Var(X1)  P-a.s.
provides a simple estimator for this variance, see Ubung 6.4.
Applications: see, e.g., Ubung 7.1 and 8.2.
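The scheme is straightforward to implement. A minimal Python sketch (an illustration added here, not part of the lecture; numpy is assumed to be available, and the choice µ = Exp(1), with a = 1, is ours so that the output can be checked):

```python
# Monte Carlo estimation of a = \int_R x mu(dx); illustrative sketch only.
# Hypothetical choice mu = Exp(1), so a = 1 and Var(X_1) = 1 (known, for checking).
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000
x = rng.exponential(scale=1.0, size=n)  # i.i.d. sample X_1(w), ..., X_n(w) with P_X1 = mu

a_hat = x.mean()                        # unbiased estimator S_n / n for a
var_hat = x.var(ddof=1)                 # (1/(n-1)) * sum_i (X_i - S_n/n)^2 -> Var(X_1) a.s.
rmse_hat = np.sqrt(var_hat / n)         # estimated root mean square error (Var(X_1)/n)^(1/2)

print(a_hat, var_hat, rmse_hat)         # a_hat ~ 1.0 up to an error of order n^(-1/2)
```

The variance estimator makes the method self-assessing: the pair (a_hat, rmse_hat) is an estimate together with an estimated error bound.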
Remark 15. Let (Xn)n∈N be i.i.d. with µ = PX1 and corresponding distribution function F = FX1. Suppose that µ is unknown, but observations X1(ω), ..., Xn(ω) are available for ‘estimation of µ’.
Fix C ∈ B. Due to Theorem 11,
(1/n) · ∑_{i=1}^n 1_C(Xi) −→ µ(C)  P-a.s.
The particular case C = ]−∞, x] leads to the definitions
Fn(x, ω) = (1/n) · |{i ∈ {1, ..., n} : Xi(ω) ≤ x}|,  x ∈ R,
and
µn(·, ω) = (1/n) · ∑_{i=1}^n ε_{Xi(ω)}
of the empirical distribution function Fn(·, ω) and the empirical distribution µn(·, ω), resp. We obtain
∀ x ∈ R ∃ A ∈ A: P(A) = 1 ∧ (∀ ω ∈ A: lim_{n→∞} Fn(x, ω) = F(x)).
Therefore
∃ A ∈ A: P(A) = 1 ∧ (∀ q ∈ Q ∀ ω ∈ A: lim_{n→∞} Fn(q, ω) = F(q)),
which implies
∃ A ∈ A: P(A) = 1 ∧ (∀ ω ∈ A: µn(·, ω) −w→ µ),
see Helly’s Theorem (ii), p. 17, and Theorem II.3.7.
A refined analysis yields the Glivenko-Cantelli Theorem:
∃ A ∈ A: P(A) = 1 ∧ (∀ ω ∈ A: lim_{n→∞} sup_{x∈R} |Fn(x, ω) − F(x)| = 0),
see Klenke (2013, Satz 5.23) and Ubung 7.2.
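As an illustration (not part of the lecture), the following Python sketch computes Fn for a standard normal sample and approximates the Glivenko-Cantelli supremum on a grid; numpy and scipy are assumed to be available, and all parameters are our own choices:

```python
# Empirical distribution function F_n and its sup-distance to F; illustrative sketch.
import numpy as np
from scipy.stats import norm  # assumption: scipy is available; here F = Phi

rng = np.random.default_rng(seed=1)
n = 10_000
x = np.sort(rng.standard_normal(n))  # observations X_1(w), ..., X_n(w), sorted

def F_n(t):
    # F_n(t) = (1/n) * |{i : X_i <= t}|, via binary search in the sorted sample
    return np.searchsorted(x, t, side="right") / n

grid = np.linspace(-4.0, 4.0, 2001)
sup_dist = np.max(np.abs(F_n(grid) - norm.cdf(grid)))
print(sup_dist)  # tends to 0 as n grows (Glivenko-Cantelli), at rate of order n^(-1/2)
```

The exact supremum is attained at the sample points; the grid only approximates it, which suffices for an illustration.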
3 The Weak Law of Large Numbers
As previously, 0 < an ↑ ∞.
In the sequel: (Xn)n∈N pairwise uncorrelated. A particular case: (Xn)n∈N pairwise independent and Xn ∈ L2 for every n ∈ N, see Theorem II.6.19.
Theorem 1 (Khintchine). If
lim_{n→∞} (1/an²) · ∑_{i=1}^n Var(Xi) = 0,
then
(1/an) · ∑_{i=1}^n (Xi − E(Xi)) −P→ 0.
Proof. Without loss of generality E(Xn) = 0 for every n ∈ N. For ε > 0 the Chebyshev-Markov inequality and Bienaymé’s Theorem yield
P(|(1/an) · Sn| ≥ ε) ≤ (1/ε²) · Var((1/an) · Sn) = (1/ε²) · (1/an²) · ∑_{i=1}^n Var(Xi).   (1)
Remark 2. Assume that sup_{n∈N} Var(Xn) < ∞. Then Theorem 1 is applicable for any sequence (an)n∈N with lim_{n→∞} an/√n = ∞.
Example 3. Consider an independent sequence (Xn)n∈N with
P(Xn = 0) = 1 − 1/(n log(n+1)),  P(Xn = ±n) = 1/(2n log(n+1)).
Hence
E(Xn) = 0,  Var(Xn) = n/log(n+1),
and
(1/n²) · ∑_{i=1}^n Var(Xi) ≤ 1/log(n+1).
Thus 1/n · Sn −P→ 0 due to Theorem 1, but 1/n · Sn −→ 0 P-a.s. does not hold, see Ubung 7.3.
4 Characteristic Functions
We use the notation 〈·, ·〉 and ‖ · ‖ for the Euclidean inner product and norm. Recall
that M(Rk) denotes the class of all probability measures on (Rk,Bk).
Given: a probability measure µ ∈M(Rk).
Definition 1. f : Rk → C is µ-integrable if Re f and Im f are µ-integrable, in which case
∫ f dµ = ∫ Re f dµ + ı · ∫ Im f dµ.
Definition 2. The mapping µ̂ : Rk → C with
µ̂(y) = ∫ exp(ı⟨x, y⟩) µ(dx),  y ∈ Rk,
is called the Fourier transform of µ.
Example 3. For a discrete probability measure
µ = ∑_{j=1}^∞ αj · ε_{xj}
we have
µ̂(y) = ∑_{j=1}^∞ αj · exp(ı⟨xj, y⟩).
For instance, if µ = π(λ) is the Poisson distribution with parameter λ > 0, then
µ̂(y) = exp(−λ) · ∑_{j=0}^∞ (λ^j/j!) · exp(ıjy) = exp(−λ) · exp(λ · exp(ıy)) = exp(λ · (exp(ıy) − 1)).
Definition 4. Let
fk(x) = (2π)^{−k/2} · exp(−‖x‖²/2),  x ∈ Rk.
The probability measure fk · λk is called the k-dimensional standard normal distribution.
Example 5. Let µ = f · λk with any probability density f : Rk → [0,∞[ with respect to the k-dimensional Lebesgue measure λk. Then
µ̂(y) = ∫ exp(ı⟨x, y⟩) · f(x) λk(dx).
For any λk-integrable function f, the right-hand side defines its Fourier transform, see also Functional Analysis. For instance, if µ is the k-dimensional standard normal distribution, then
µ̂(y) = exp(−‖y‖²/2).
Proof: For k = 1 the statement follows from
µ̂′(y) = (2π)^{−1/2} ∫ ıx · exp(ıxy) · exp(−x²/2) dx = −y · µ̂(y)
and µ̂(0) = 1. Use Fubini’s Theorem for k > 1.
We study properties of the Fourier transformation µ ↦ µ̂, mapping M(Rk) into C(Rk).
The Range Problem
Theorem 6.
(i) µ̂ is uniformly continuous on Rk,
(ii) |µ̂(y)| ≤ 1 = µ̂(0) for y ∈ Rk,
(iii) for n ∈ N, a1, ..., an ∈ C, and y1, ..., yn ∈ Rk,
∑_{j,ℓ=1}^n aj · āℓ · µ̂(yj − yℓ) ≥ 0
(positive semi-definite).
Proof. Ad (i): Observe that
|exp(ı⟨x, y1⟩) − exp(ı⟨x, y2⟩)| ≤ ‖x‖ · ‖y1 − y2‖.   (1)
For ε > 0 take r > 0 such that µ(B) ≥ 1 − ε, where B = {x ∈ Rk : ‖x‖ ≤ r}. Then
|µ̂(y1) − µ̂(y2)| ≤ ∫_B |exp(ı⟨x, y1⟩) − exp(ı⟨x, y2⟩)| µ(dx) + 2·ε ≤ r · ‖y1 − y2‖ + 2·ε.
Properties (ii) and (iii) are easily verified.
Remark 7. Bochner’s Theorem states that every continuous, positive semi-definite
function ϕ : Rk → C with ϕ(0) = 1 is the Fourier transform of a probability measure
on (Rk,Bk). See Bauer (1996, p. 184) for references.
Convolutions
In the sequel: X, Y, . . . are k-dimensional random vectors on a probability space
(Ω,A, P ).
Definition 8. The characteristic function of X is given by
ϕX = P̂X.
Remark 9. Due to Theorem A.5.12,
ϕX(y) = ∫_{Rk} exp(ı⟨x, y⟩) PX(dx) = ∫_Ω exp(ı⟨X(ω), y⟩) P(dω).
Theorem 10.
(i) For every linear mapping T : Rk → Rℓ,
ϕTX = ϕX ∘ T⊤.
(ii) For independent random vectors X and Y,
ϕX+Y = ϕX · ϕY.
In particular, for a ∈ Rk,
ϕX+a = exp(ı⟨a, ·⟩) · ϕX.
Proof. Ad (i): Let z ∈ Rℓ. Use PTX = T(PX) to obtain
ϕTX(z) = ∫_{Rk} exp(ı⟨T(x), z⟩) PX(dx) = ϕX(T⊤(z)).
Ad (ii): Let z ∈ Rk. Fubini’s Theorem and Theorem II.6.18 imply
ϕX+Y(z) = ∫_{R2k} exp(ı⟨x + y, z⟩) P(X,Y)(d(x, y)) = ϕX(z) · ϕY(z).
Corollary 11 (Convolution Theorem). For probability measures µj ∈ M(R),
(µ1 ∗ µ2)^ = µ̂1 · µ̂2.
Proof. Use Theorem 10.(ii) and Theorem II.6.23.
Example 12. For µ = N(m, σ²) with σ ≥ 0 and m ∈ R,
µ̂(y) = exp(ımy) · exp(−σ²y²/2).
See Examples 3 and 5 and Theorem 10.
The Uniqueness Theorem
Theorem 13 (Uniqueness Theorem). For probability measures µj ∈ M(Rk),
µ̂1 = µ̂2 ⇒ µ1 = µ2.
Proof. See Bauer (1996, Thm. 23.4) or Billingsley (1979, Sec. 29) for the case k > 1. Here: the case k = 1. Let T > 0 and a < b. Consider the bounded function
f_{a,b}(y) = 1/(2πıy) · (exp(−ıay) − exp(−ıby)),  y ∈ R \ {0},
and put
I_{a,b,T} = ∫_{[−T,T]} f_{a,b}(y) · µ̂(y) λ1(dy).
We show that
lim_{T→∞} I_{a,b,T} = µ(]a, b[) + (µ({a}) + µ({b}))/2.   (2)
Fubini’s Theorem yields
I_{a,b,T} = ∫_R ∫_{[−T,T]} f_{a,b}(y) · exp(ıxy) λ1(dy) µ(dx).
Let (sinus integralis)
Si(x) = ∫_0^x (sin(y)/y) dy
for x ≥ 0. It is well known that lim_{x→∞} Si(x) = π/2. Note that
∫_{[−T,T]} f_{a,b}(y) · exp(ıxy) λ1(dy) = ∫_{−T}^T 1/(2πıy) · (exp(ıy(x−a)) − exp(ıy(x−b))) dy
= ∫_0^T 1/(πy) · (sin(y(x−a)) − sin(y(x−b))) dy = g_{a,b,T}(x)/π,
where
g_{a,b,T}(x) = sgn(x−a) · Si(T|x−a|) − sgn(x−b) · Si(T|x−b|).
We conclude that
I_{a,b,T} = ∫_R (g_{a,b,T}(x)/π) µ(dx).
Use
lim_{T→∞} g_{a,b,T}(x) = 0 if x < a or x > b;  π if a < x < b;  π/2 if x = a or x = b
to obtain (2).
According to (2), µ̂ determines the distribution function of µ. Apply Theorem II.1.10.
Example 14. For independent random variables X1 and X2 with Xj ∼ π(λj) we have X1 + X2 ∼ π(λ1 + λ2).
Proof: Theorem 10 and Example 3 yield
ϕX1+X2(y) = exp(λ1 · (exp(ıy) − 1)) · exp(λ2 · (exp(ıy) − 1)) = exp((λ1 + λ2) · (exp(ıy) − 1)).
Use Theorem 13.
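As a numerical illustration (ours, not from the lecture; numpy assumed), the identity can be checked by convolving two truncated Poisson weight vectors:

```python
# Check pi(l1) * pi(l2) = pi(l1 + l2) on {0, ..., N}; illustrative sketch.
import numpy as np
from math import exp, factorial

def poisson_pmf(lam, N):
    return np.array([exp(-lam) * lam**j / factorial(j) for j in range(N + 1)])

l1, l2, N = 1.5, 2.5, 40
p = np.convolve(poisson_pmf(l1, N), poisson_pmf(l2, N))[: N + 1]  # pmf of X1 + X2 on {0,...,N}
q = poisson_pmf(l1 + l2, N)
print(np.max(np.abs(p - q)))  # ~ 0 up to rounding
```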
Remark 15. Suppose that µ̂ is λ1-integrable. Then µ = f · λ1 with a continuous density f, which is given by
f(x) = (2π)^{−1} ∫ exp(−ıxy) · µ̂(y) λ1(dy),  x ∈ R.
Proof: Employ (2), see Billingsley (1979, p. 347).
The Continuity Theorem
Lemma 16. For every ε > 0 and every probability measure µ ∈ M(R),
µ({x ∈ R : |x| ≥ 1/ε}) ≤ (7/ε) · ∫_0^ε (1 − Re µ̂(y)) dy.
Proof. Clearly
Re µ̂(y) = ∫_R cos(xy) µ(dx).
Hence, with the convention sin(0)/0 = 1,
(1/ε) · ∫_0^ε (1 − Re µ̂(y)) dy = (1/ε) · ∫_{[0,ε]} ∫_R (1 − cos(xy)) µ(dx) λ1(dy)
= ∫_R ((1/ε) · ∫_0^ε (1 − cos(xy)) dy) µ(dx)
= ∫_R (1 − sin(εx)/(εx)) µ(dx)
≥ inf_{|z|≥1} (1 − sin(z)/z) · µ({x ∈ R : |εx| ≥ 1}).
Finally,
inf_{|z|≥1} (1 − sin(z)/z) ≥ 1/7.
Theorem 17 (Continuity Theorem, Levy).
(i) Let µ, µn ∈ M(Rk) for n ∈ N. Then
µn −w→ µ ⇒ ∀ y ∈ Rk: lim_{n→∞} µ̂n(y) = µ̂(y).
(ii) Let µn ∈ M(Rk) for n ∈ N, and let ϕ : Rk → C be continuous at 0. Then
∀ y ∈ Rk: lim_{n→∞} µ̂n(y) = ϕ(y) ⇒ ∃ µ ∈ M(Rk): µ̂ = ϕ ∧ µn −w→ µ.
Proof. Ad (i): Note that x ↦ exp(ı⟨x, y⟩) is bounded and continuous on Rk.
Ad (ii): See Bauer (1996, Thm. 23.8) or Billingsley (1979, Sec. 29) for the case k > 1. Here: the case k = 1.
We first show that
{µn : n ∈ N} is tight.   (3)
By Lemma 16,
µn({x ∈ R : |x| ≥ 1/ε}) ≤ cn(ε)
with
cn(ε) = (7/ε) · ∫_0^ε (1 − Re µ̂n(y)) dy.
Note that
c(ε) = (7/ε) · ∫_0^ε (1 − Re ϕ(y)) dy
is well-defined, at least for sufficiently small ε > 0, and the dominated convergence theorem yields
lim_{n→∞} cn(ε) = c(ε).
Let δ > 0, and note that ϕ(0) = 1. By continuity of ϕ at 0 there exists ε > 0 such that
c(ε) ≤ δ/2.
Take n0 ∈ N such that, for every n ≥ n0,
|cn(ε) − c(ε)| ≤ δ/2.
Hence, for n ≥ n0,
µn({x ∈ R : |x| ≥ 1/ε}) ≤ δ,
and hereby we get (3).
Thus, by Prohorov’s Theorem,
{µn : n ∈ N} is relatively compact.   (4)
We fix a probability measure µ ∈ M(R) such that µ_{nk} −w→ µ for a suitable subsequence of (µn)n∈N. By assumption and (i), we get µ̂ = ϕ as well as the following fact:
if µ_{nk} −w→ ν for any subsequence (µ_{nk})k∈N, then ν = µ,   (5)
see Theorem 13.
We claim that µn −w→ µ. Due to Remarks II.2.8 and II.3.13.(ii) it suffices to show that every subsequence of (µn)n∈N contains a subsequence that converges weakly to µ. The latter property follows from (4) and (5).
Example 18. For the uniform distribution µn on [−n, n] we have
µ̂n(y) = 1 if y = 0, and µ̂n(y) = sin(ny)/(ny) otherwise.
Hence
lim_{n→∞} µ̂n(y) = 1 if y = 0, and 0 otherwise,
but (µn)n∈N does not converge weakly.
Corollary 19. Weak convergence in M(Rk) is equivalent to pointwise convergence
of Fourier transforms.
Example 20. Let µn = B(n, pn) and assume that
lim_{n→∞} n · pn = λ > 0.
Then
µn −w→ π(λ).
Proof: Ubung 3.1 or 8.1.
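A numerical illustration (ours; numpy assumed; pn = λ/n chosen for simplicity): the binomial weights approach the Poisson weights pointwise, which for integer-valued distributions yields the weak convergence.

```python
# B(n, p_n) vs. pi(lambda) for p_n = lambda / n; illustrative sketch.
import numpy as np
from math import comb, exp, factorial

lam = 3.0
for n in (10, 100, 1000):
    p_n = lam / n
    binom = np.array([comb(n, k) * p_n**k * (1 - p_n) ** (n - k) for k in range(21)])
    poiss = np.array([exp(-lam) * lam**k / factorial(k) for k in range(21)])
    print(n, np.max(np.abs(binom - poiss)))  # decreases as n grows
```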
5 The One-Dimensional Central Limit Theorem
Given: a triangular array of random variables Xnk, where n ∈ N and k ∈ {1, ..., rn} with rn ∈ N.
Assumptions:
(i) Xnk ∈ L2 for every n ∈ N and k ∈ {1, ..., rn},
(ii) (Xn1, ..., Xnrn) independent for every n ∈ N.
Put
Sn = ∑_{k=1}^{rn} Xnk,  sn² = Var(Sn).
Additional assumption:
(iii) sn² > 0 for every n ∈ N.
Normalization:
X*nk = (Xnk − E(Xnk))/sn,  S*n = ∑_{k=1}^{rn} X*nk = (Sn − E(Sn))/sn.
Clearly
E(X*nk) = E(S*n) = 0 ∧ Var(X*nk) = Var(Xnk)/sn² ∧ Var(S*n) = 1.
Question: convergence in distribution of the sequence of standardized sums S*n?
For notational convenience: all random variables Xnk are defined on a common probability space (Ω,A,P).
Example 1. Let (Xk)k∈N be i.i.d. with X1 ∈ L2 and Var(X1) = σ² > 0. Put m = E(X1), and take
rn = n,  Xnk = Xk.
Then
S*n = (∑_{k=1}^n Xk − n·m)/(√n · σ).
Definition 2.
(i) Lyapunov condition:
∃ δ > 0: lim_{n→∞} ∑_{k=1}^{rn} E(|X*nk|^{2+δ}) = 0.
(ii) Lindeberg condition:
∀ ε > 0: lim_{n→∞} ∑_{k=1}^{rn} ∫_{|X*nk|≥ε} |X*nk|² dP = 0.
(iii) Feller condition:
lim_{n→∞} max_{1≤k≤rn} Var(X*nk) = 0.
(iv) The random variables Xnk are asymptotically negligible if
∀ ε > 0: lim_{n→∞} max_{1≤k≤rn} P(|X*nk| > ε) = 0.
Lemma 3. The conditions from Definition 2 satisfy
(i) ⇒ (ii) ⇒ (iii) ⇒ (iv).
Moreover, (iii) implies limn→∞ rn =∞.
Proof. From
∫_{|X*nk|≥ε} |X*nk|² dP ≤ (1/ε^δ) · ∫_{|X*nk|≥ε} |X*nk|^{2+δ} dP ≤ (1/ε^δ) · E(|X*nk|^{2+δ})
we get ‘(i) ⇒ (ii)’. From
Var(X*nk) ≤ ε² + ∫_{|X*nk|≥ε} |X*nk|² dP
we get ‘(ii) ⇒ (iii)’. The Chebyshev-Markov inequality yields ‘(iii) ⇒ (iv)’. Finally,
1 = Var(S*n) ≤ rn · max_{1≤k≤rn} Var(X*nk),
so that (iii) implies lim_{n→∞} rn = ∞.
Example 4. Example 1 continued. We take rn = n and
X*nk = (Xk − m)/(√n · σ)
to obtain
∑_{k=1}^n ∫_{|X*nk|≥ε} |X*nk|² dP = (1/σ²) · ∫_{|X1−m|≥ε·√n·σ} (X1 − m)² dP.
Hence the Lindeberg condition is satisfied.
In the sequel,
ϕnk = ϕX*nk
denotes the characteristic function of X*nk. Moreover,
τnk² = Var(X*nk).
Use (4.1) and Theorem A.5.10 to conclude that
ϕ′nk(0) = ı · E(X*nk) = 0
and
ϕ″nk(0) = −E(|X*nk|²) = −τnk².
Lemma 5. For y ∈ R and ε > 0,
|ϕnk(y) − (1 − τnk²/2 · y²)| ≤ y² · (ε · |y| · τnk² + ∫_{|X*nk|≥ε} |X*nk|² dP).
Proof. For u ∈ R,
|exp(ıu) − (1 + ıu − u²/2)| ≤ min(u², |u|³/6),
see Billingsley (1979, Eqn. (26.4)). Hence
|ϕnk(y) − (1 − τnk²/2 · y²)|
= |E(exp(ı · X*nk · y)) − E(1 + ı · X*nk · y − |X*nk|² · y²/2)|
≤ E(min(y² · |X*nk|², |y|³ · |X*nk|³))
≤ |y|³ · ∫_{|X*nk|<ε} ε · |X*nk|² dP + y² · ∫_{|X*nk|≥ε} |X*nk|² dP
≤ ε · |y|³ · τnk² + y² · ∫_{|X*nk|≥ε} |X*nk|² dP.
Lemma 6. Put
∆n(y) = ∏_{k=1}^{rn} ϕnk(y) − exp(−y²/2),  y ∈ R.
If the Lindeberg condition is satisfied, then
∀ y ∈ R: lim_{n→∞} ∆n(y) = 0.
Proof. Note that ∑_{k=1}^{rn} τnk² = Var(S*n) = 1, so that ∏_{k=1}^{rn} exp(−τnk²/2 · y²) = exp(−y²/2). Since |ϕnk(y)| ≤ 1 and |exp(−τnk²/2 · y²)| ≤ 1, we get
|∆n(y)| = |∏_{k=1}^{rn} ϕnk(y) − ∏_{k=1}^{rn} exp(−τnk²/2 · y²)| ≤ ∑_{k=1}^{rn} |ϕnk(y) − exp(−τnk²/2 · y²)|
by induction, see Billingsley (1979, Lemma 27.1).
We assume
max_{1≤k≤rn} τnk² · y² ≤ 1,
which holds for fixed y ∈ R if n is sufficiently large, see Lemma 3. Using
0 ≤ u ≤ 1/2 ⇒ |exp(−u) − (1 − u)| ≤ u²
and Lemma 5 we obtain
|∆n(y)| ≤ ∑_{k=1}^{rn} |ϕnk(y) − (1 − τnk²/2 · y²)| + ∑_{k=1}^{rn} τnk⁴/4 · y⁴
≤ y² · (ε · |y| + ∑_{k=1}^{rn} ∫_{|X*nk|≥ε} |X*nk|² dP) + y⁴/4 · max_{1≤k≤rn} τnk²
for every ε > 0. Thus Lemma 3 yields
lim sup_{n→∞} |∆n(y)| ≤ |y|³ · ε.
Theorem 7 (Central Limit Theorem). The following properties are equivalent:
(i) (X*nk)n,k satisfies the Lindeberg condition.
(ii) PS*n −w→ N(0, 1) and (X*nk)n,k satisfies the Feller condition.
(iii) PS*n −w→ N(0, 1) and the random variables X*nk are asymptotically negligible.
Proof. ‘(i) ⇒ (ii)’: Due to Lemma 3 we only have to prove the weak convergence. Recall that µ̂(y) = exp(−y²/2) for the standard normal distribution µ. Consider the characteristic function ϕn = ϕS*n of S*n. By Theorem 4.10.(ii),
ϕn = ∏_{k=1}^{rn} ϕnk,
and therefore Lemma 6 implies
∀ y ∈ R: lim_{n→∞} ϕn(y) = µ̂(y).
It remains to apply Corollary 4.19.
Lemma 3 yields ‘(ii) ⇒ (iii)’. See Billingsley (1979, p. 314–315) for the proof of ‘(iii) ⇒ (i)’.
In the sequel we consider an i.i.d. sequence (Xn)n∈N with X1 ∈ L2 and
σ² = Var(X1) > 0,
and we put
m = E(X1),
cf. Example 4.
Corollary 8. We have
(∑_{k=1}^n Xk − n·m)/(√n · σ) −d→ Z,
where Z ∼ N(0, 1).
Proof. Theorem 7 and Example 4.
Example 9. Example 4 continued, and Corollary 8 reformulated. Let
Φ(x) = (1/√(2π)) · ∫_{−∞}^x exp(−u²/2) du,  x ∈ R,
denote the distribution function of the standard normal distribution, and let
δn = sup_{x∈R} |P(Sn ≤ m·n + x·σ·√n) − Φ(x)|.   (1)
Due to the Central Limit Theorem and Theorem II.3.7 we have lim_{n→∞} δn = 0. Consequently, for
an = m·n + a·σ·√n,  bn = m·n + b·σ·√n
with a < b,
lim_{n→∞} P(an ≤ Sn ≤ bn) = Φ(b) − Φ(a).
Theorem 10 (Berry-Esseen). Suppose that X1 ∈ L3 and, additionally, m = 0. For δn given by (1),
∀ n ∈ N: δn ≤ (6 · E(|X1|³)/σ³) · 1/√n.
Proof. See Ganssler, Stute (1977, Section 4.2).
Example 11. Example 9 continued with
PX1 = (1/2) · (ε1 + ε−1).   (2)
Since (−Xn)n∈N is i.i.d. as well, and since P−X1 = PX1, we have
P(S2n ≤ 0) = P(S2n ≥ 0),
which yields
P(S2n ≤ 0) = (1/2) · (1 + P(S2n = 0)).
From Example 1.11 we know that
P(S2n = 0) ≈ 1/√(πn),
and therefore
δ2n ≥ P(S2n ≤ 0) − 1/2 = (1/2) · P(S2n = 0) ≈ 1/(2√(πn)).
Hence the upper bound from Theorem 10 cannot be improved in terms of powers of n.
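For this example δn can even be evaluated exactly, since Sn has an explicit binomial law; the following Python sketch (our illustration; numpy and scipy assumed) shows δn · √n staying bounded away from 0:

```python
# Exact delta_n for the symmetric +-1 coin (m = 0, sigma = 1); illustrative sketch.
import numpy as np
from math import comb
from scipy.stats import norm  # assumption: scipy for Phi

def delta_n(n):
    pmf = np.array([comb(n, k) for k in range(n + 1)], dtype=float) / 2.0**n
    s = 2.0 * np.arange(n + 1) - n        # support of S_n
    cdf = np.cumsum(pmf)                  # P(S_n <= s)
    x = s / np.sqrt(n)                    # jump points of x -> P(S_n <= x sqrt(n))
    before = np.concatenate(([0.0], cdf[:-1]))
    # the sup in (1) is attained at a jump point, from the left or from the right:
    return max(np.max(np.abs(before - norm.cdf(x))), np.max(np.abs(cdf - norm.cdf(x))))

for n in (10, 100, 1000):
    print(n, delta_n(n), delta_n(n) * np.sqrt(n))
```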
Example 12. Example 9 continued in the case m = 0. Let
Bc = {lim sup_{n→∞} Sn/√n ≥ c},  c > 0.
Using Remark 1.7.(ii) we get
P(Bc) ≥ P(lim sup_{n→∞} {Sn/√n > c}) ≥ lim sup_{n→∞} P(Sn/√n > c) = 1 − Φ(c/σ) > 0.
Kolmogorov’s Zero-One Law yields
P(Bc) = 1,
and therefore
P(lim sup_{n→∞} Sn/√n = ∞) = P(⋂_{c∈N} Bc) = 1.
By symmetry,
P(lim inf_{n→∞} Sn/√n = −∞) = 1.
In particular, for PX1 given by (2),
P(lim sup_{n→∞} {Sn = 0}) = 1,
see also Example 1.11.
6 The Law of the Iterated Logarithm
Given: an i.i.d. sequence (Xn)n∈N of random variables on (Ω,A,P) such that
X1 ∈ L2 ∧ E(X1) = 0 ∧ Var(X1) = σ² > 0.
Remark 1. For every ε > 0, with probability one,
lim_{n→∞} Sn/(√n · (log n)^{1/2+ε}) = 0,
see Remark 2.8. On the other hand, with probability one,
lim sup_{n→∞} Sn/√n = ∞ ∧ lim inf_{n→∞} Sn/√n = −∞,
see Example 5.12.
Question: precise description of the fluctuation of (Sn(ω))n∈N for P-almost every ω? In particular: existence of a deterministic sequence (γ(n))n∈N of positive reals such that, with probability one,
lim sup_{n→∞} Sn/γ(n) = 1 ∧ lim inf_{n→∞} Sn/γ(n) = −1?
Notation: L((un)n∈N) is the set of all limit points in R of a sequence (un)n∈N in R.
Let
γ(n) = √(2n · ln(ln n) · σ²),  n ≥ 3.
Theorem 2 (Strassen’s Law of the Iterated Logarithm). With probability one,
L((Sn/γ(n))n∈N) = [−1, 1].
Proof. See Bauer (1996, §33).
Corollary 3 (Hartman and Wintner’s Law of the Iterated Logarithm). With probability one,
lim sup_{n→∞} Sn/γ(n) = 1 ∧ lim inf_{n→∞} Sn/γ(n) = −1.
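A quick look at a simulated path (our illustration; numpy assumed; the ±1 steps and the seed are arbitrary choices) shows how slowly the normalized walk approaches the envelope:

```python
# One random walk path against gamma(n) = sqrt(2 n ln(ln n)), sigma = 1; sketch.
import numpy as np

rng = np.random.default_rng(seed=2)
n = 10**6
steps = rng.choice([-1.0, 1.0], size=n)  # i.i.d., E = 0, Var = 1
S = np.cumsum(steps)
idx = np.arange(3, n + 1)                # gamma(n) is defined for n >= 3
gamma = np.sqrt(2.0 * idx * np.log(np.log(idx)))
ratio = S[2:] / gamma
print(ratio.max(), ratio.min())  # running extremes approach +1 and -1, very slowly
```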
7 The Multi-dimensional Central Limit Theorem
Multi-dimensional Normal Distributions
In the sequel, νk = fk · λk denotes the k-dimensional standard normal distribution, see Definition 4.4. We consider a k-dimensional random vector
X = (X1, ..., Xk)⊤.
Theorem 1. X ∼ νk iff (X1, ..., Xk) is i.i.d. and X1 ∼ N(0, 1).
Proof. ‘⇒’: Take Aj ∈ B and apply Fubini’s Theorem to obtain
P(X ∈ A1 × ⋯ × Ak) = ∏_{j=1}^k ∫_{Aj} f1(xj) λ1(dxj).
Choose Ai = R for i ≠ j to conclude that PXj = f1 · λ1 = N(0, 1), and therefore
P(⋂_{j=1}^k {Xj ∈ Aj}) = ∏_{j=1}^k P(Xj ∈ Aj).
‘⇐’: By Theorem II.6.18 we have PX = ×_{j=1}^k N(0, 1), and Fubini’s Theorem yields
P(X ∈ A1 × ⋯ × Ak) = ∏_{j=1}^k P(Xj ∈ Aj) = (fk · λk)(A1 × ⋯ × Ak)
for every choice of Aj ∈ B. It remains to apply Theorem A.1.5.
See Ubung 9.1 for a generalization of Theorem 1.
Theorem 2. For probability measures µ1, µ2 on Bk we have µ1 = µ2 iff
Tµ1 = Tµ2
for every linear mapping T : Rk → R.
Proof. Let y ∈ Rk, and put Tx = ⟨x, y⟩. Then T⊤z = z · y, and for µ ∈ {µ1, µ2},
µ̂(y) = µ̂(T⊤(1)) = (Tµ)^(1),
see Theorem 4.10.(i). It remains to apply Theorem 4.13.
Recall that N(b, 0) = εb for any b ∈ R.
Definition 3. A probability measure µ on Bk is called a k-dimensional normal dis-
tribution, if Tµ is a one-dimensional normal distribution for every linear mapping
T : Rk → R. Furthermore, X is normally distributed, if PX is a k-dimensional normal
distribution.
Example 4. Let X ∼ νk, and take a = (a1, ..., ak) ∈ R^{1×k}. Use Theorem 1 to conclude that (a1X1, ..., akXk) is independent and ajXj ∼ N(0, aj²). By Example II.6.26.(i),
aX ∼ N(0, aa⊤),
so that νk is a k-dimensional normal distribution.
More generally, for X being normally distributed, L ∈ R^{ℓ×k}, and b ∈ Rℓ, the random vector LX + b is normally distributed, too. We shall see that every normal distribution arises in this way with X ∼ νk.
Multi-dimensional Expectations and Covariance Matrices
In the sequel, we assume that X1, ..., Xk ∈ L2. The notions of expectation and variance are extended to random vectors as follows.
Definition 5. The expectation and the covariance matrix of X are defined by
E(X) = (E(X1), ..., E(Xk))⊤ ∈ Rk
and
Cov(X) = (Cov(Xi, Xj))_{1≤i,j≤k} ∈ R^{k×k},
respectively, where
Cov(Xi, Xj) = E((Xi − E(Xi)) · (Xj − E(Xj))).
The mean and the covariance matrix of a probability measure µ on Bk are defined by E(X) and Cov(X), respectively, for the random vector X = id_{Rk} on the probability space (Rk, Bk, µ).
Let Ek denote the identity matrix in Rk×k.
Example 6. If X ∼ νk, then E(X) = 0 and Cov(X) = Ek by Theorems II.6.19 and 1.
Lemma 7. For L ∈ R^{ℓ×k} and b ∈ Rℓ,
E(LX + b) = L E(X) + b
and
Cov(LX + b) = L Cov(X) L⊤.
Proof. See Irle (2001, Abschn. 9.10).
Remark 8. Covariance matrices are symmetric and non-negative definite. To verify the latter, let a ∈ R^{1×k} and observe that
0 ≤ σ²(aX) = a Cov(X) a⊤.
Theorem 9.
(i) For every b ∈ Rk and every symmetric, non-negative definite matrix K ∈ Rk×k
there exists a k-dimensional normal distribution with mean b and covariance
matrix K.
(ii) A normal distribution is uniquely determined by its mean and its covariance
matrix.
Proof. Ad (i): See Example 6 in the case K = Ek and b = 0. In the general case, we get the factorization
K = LL⊤   (1)
with L ∈ R^{k×k} by diagonalization of K. Then Y = LX + b is normally distributed with mean b and covariance matrix K, if X ∼ νk.
Ad (ii): Let µ be a k-dimensional normal distribution with mean b and covariance matrix K, and let Tx = ax with a ∈ R^{1×k}. Then Tµ = N(ab, aKa⊤) by Lemma 7. Apply Theorem 2.
Notation: the k-dimensional normal distribution with mean b ∈ Rk and covariance
matrix K ∈ Rk×k is denoted by N(b,K).
Remark 10. If K is symmetric and positive definite, then the Cholesky decomposition yields (1) with a lower triangular matrix L.
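This is precisely how N(b, K) is sampled in practice. A Python sketch (our illustration; numpy assumed; b and K are arbitrary choices):

```python
# Sampling N(b, K) via Y = L X + b with K = L L^T (Cholesky); illustrative sketch.
import numpy as np

rng = np.random.default_rng(seed=3)
b = np.array([1.0, -2.0])
K = np.array([[2.0, 0.8],
              [0.8, 1.0]])                  # symmetric, positive definite
L = np.linalg.cholesky(K)                   # lower triangular, K = L @ L.T

X = rng.standard_normal(size=(2, 100_000))  # columns: i.i.d. nu_2-distributed vectors
Y = L @ X + b[:, None]                      # columns: N(b, K)-distributed vectors

print(Y.mean(axis=1))                       # ~ b
print(np.cov(Y))                            # ~ K
```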
Example 11. Let K ∈ R^{k×k} be symmetric and non-negative definite. Take L ∈ R^{k×k} with (1), and assume that X ∼ νk.
Assume that det K > 0. For A ∈ Bk,
N(0, K)(A) = P(LX ∈ A) = νk(L^{−1}A) = (2π)^{−k/2} · ∫_{L^{−1}A} exp(−‖x‖²/2) λk(dx) = ∫_A gk(x; K) λk(dx)
with
gk(x; K) = ((2π)^k det K)^{−1/2} · exp(−x⊤K^{−1}x/2).
It follows that N(0, K) = gk(·; K) · λk.
There exists an orthonormal basis e1, ..., ek of Rk and c1, ..., ck > 0 with Kei = ci · ei. Hence
x⊤K^{−1}x = ∑_{i=1}^k ⟨x, ei⟩²/ci
for x ∈ Rk, which yields
{x ∈ Rk : x⊤K^{−1}x = r²} = {∑_{i=1}^k ai · √ci · ei : ∑_{i=1}^k ai² = r²}
for every r ≥ 0.
Assume that det K = 0. Then Ke = 0 for some e ∈ Rk \ {0}, which implies e⊤LX ∼ ε0. Put A = {x ∈ Rk : e⊤x = 0}. It follows that N(0, K)(A) = 1, while λk(A) = 0. Hence there exists no Lebesgue density for N(0, K).
A Multi-dimensional Central Limit Theorem
Lemma 12. For Y ∼ N(b, K),
ϕY(y) = exp(ı⟨y, b⟩) · exp(−y⊤Ky/2).
Proof. With the notation from the previous proof,
ϕLX(y) = ϕX(L⊤y) = exp(−‖L⊤y‖²/2) = exp(−y⊤Ky/2),
see Example 4.5, and furthermore ϕY(y) = exp(ı⟨y, b⟩) · ϕLX(y).
A variant of the multi-dimensional Central Limit Theorem reads as follows.
Theorem 13. Let (Xn)n∈N be an independent sequence of identically distributed k-dimensional random vectors with components in L2. Then
(1/√n) · (∑_{k=1}^n Xk − n · E(X1)) −d→ Z,
where Z ∼ N(0, Cov(X1)).
Proof. Assume E(X1) = 0 without loss of generality, and put S′n = (1/√n) · ∑_{k=1}^n Xk as well as K = Cov(X1). Let y ∈ Rk, and put T(x) = ⟨x, y⟩. Then
ϕ_{T(S′n)}(1) = ϕ_{S′n}(y).
The real-valued random variables T(Xk) form an i.i.d. sequence with zero mean and variance y⊤Ky. By Corollary 5.8 and Lemma 12,
lim_{n→∞} ϕ_{T(S′n)}(1) = exp(−y⊤Ky/2) = ϕZ(y).
Apply Theorem 4.17.
Example 14. Let the assumptions from Theorem 13 be satisfied. Put Sn = ∑_{k=1}^n Xk and K = Cov(X1). If N(0, K)(∂A) = 0, then
lim_{n→∞} P(Sn ∈ √n · A + n · E(X1)) = N(0, K)(A).
We add that
N(0, K)(∂A) = 0 ⇔ λk(∂A) = 0
if det K > 0.
Chapter IV
Brownian Motion
We construct a Brownian motion by means of an infinite-dimensional counterpart to the Central Limit Theorem. We refer to Klenke (2013, p. 467) for the following quote on Brownian motion: ‘Without exaggeration, one can say that this is the central object of stochastics.’
1 Stochastic Processes
Given: a probability space (Ω,A, P ), a set I 6= ∅, and a measurable space (Ω′,A′).
Definition 1. A mapping
X : I × Ω→ Ω′
such that
∀ t ∈ I : X(t, ·) A-A′-measurable
is called a stochastic process with state space (Ω′,A′) and parameter set I. The
mappings
X(·, ω) : I → Ω′
with ω ∈ Ω are called the realizations (paths, trajectories) of X.
Remark 2. Equivalently, a stochastic process is a family of random elements with
a common state space on a common probability space. Frequently used notation:
X = (Xt)t∈I with
Xt = X(t, ·) : Ω→ Ω′.
In the sequel, X = (Xt)t∈I denotes a stochastic process on (Ω,A,P) with state space (Ω′,A′). Note that X may be identified with the mapping
Ψ : Ω → (Ω′)^I,
given by
Ψ(ω) = X(·, ω),
which is A-⊗_{t∈I} A′-measurable, since πs(Ψ(ω)) = Xs(ω).
Later on we will denote Ψ by X as well, which is consistent with the notation from Remark II.6.15. See also Definition II.6.16.
Definition 3.
(i) The measure PΨ on ⊗_{t∈I} A′ is called the distribution of X. For m ∈ N and t1, ..., tm ∈ I the joint distribution of X_{t1}, ..., X_{tm} is called a finite-dimensional marginal distribution of X.
(ii) The σ-algebra generated by X is defined by
FX = σ(Xs : s ∈ I).
Lemma 4. We have
FX = σ(Ψ).
Proof. For every σ-algebra A1 in Ω,
Ψ A1-⊗_{t∈I} A′-measurable ⇔ ∀ s ∈ I: Xs A1-A′-measurable ⇔ FX ⊂ A1.
Definition 5. A family (X^{(j)})_{j∈J} of stochastic processes on a common probability space is independent if (F_{X^{(j)}})_{j∈J} is independent.
In this lecture we study the case
I ⊂ [0,∞[ ,
and typically the elements t ∈ I are considered as time instances. In the limit theorems
from the previous chapter we have I = N and (Ω′,A′) = (Rk,Bk) with k = 1 most
often.
Definition 6. The σ-algebra generated by X up to time t ∈ I is defined by
FXt = σ(Xs : 0 ≤ s ≤ t).
Lemma 7. We have
FXt = σ(Ψt),
where Ψt(ω) = Ψ(ω)|[0,t].
Proof. Apply Lemma 4 to the restriction of X onto [0, t]× Ω.
Processes with Continuous Paths
In the sequel,
I = [0,∞[
and
(Ω′,A′) = (Rk,Bk).
Definition 8. X is a stochastic process with continuous paths , if X(·, ω) ∈ C(I,Rk)
for every ω ∈ Ω.
For simplicity we focus on the case k = 1; the general case k ∈ N may be treated analogously. See also Remark 18 below.
Example 9. Let X have continuous paths. Then
{sup_{s∈[0,t]} |Xs| ≤ c} = ⋂_{s∈[0,t]∩Q} {|Xs| ≤ c} ∈ FXt.
We discuss σ-algebras and probability measures on the path space
M = C(I,R).
Put
ρ(f, g) = ∑_{T=1}^∞ 2^{−T} · sup_{t∈[0,T]} min(|f(t) − g(t)|, 1),  f, g ∈ M.
Remark 10. For f, fn ∈ M we have limn→∞ ρ(f, fn) = 0 iff (fn)n∈N converges uni-
formly to f on every compact subset of I.
Theorem 11. (M,ρ) is a complete separable metric space.
Proof. Employ the respective properties of the spaces C([0, T ],R), equipped with the
supremum norm.
We consider the Borel σ-algebra B(M) on M as well as the Borel σ-algebra Bm on
Rm. For 0 ≤ t1 < · · · < tm we define κt1,...,tm : M → Rm by
κt1,...,tm(f) = (f(t1), . . . , f(tm)).
Theorem 12.
B(M) = σ(κt : t ∈ I).
Proof. Let G = σ(κt : t ∈ I). Since κt is continuous for every t ∈ I, it follows that G ⊂ B(M).
Let B(f, r) = {g ∈ M : ρ(f, g) < r} for f ∈ M and r > 0. Note that
sup_{t∈[0,T]} min(|f(t) − g(t)|, 1) = sup_{t∈[0,T]∩Q} min(|f(t) − g(t)|, 1)
for f, g ∈ M, which implies the G-B-measurability of ρ(f, ·). It follows that B(f, r) ∈ G, i.e., G contains all open balls. Let M0 be a countable dense subset of M. Consider an open set ∅ ≠ A ⊂ M. There exist mappings
c : A → M0,  r : A → ]0,∞[ ∩ Q
such that
f ∈ B(c(f), r(f)) ⊂ A
for all f ∈ A. It follows that
A = ⋃_{f∈A} B(c(f), r(f)),
where the right-hand side actually is a countable union of open balls. This implies A ∈ G, and therefore B(M) ⊂ G.
Example 13. We have {f} ∈ B(M) for every f ∈ M. Moreover,
{f ∈ M : sup_{t∈[0,∞[} |f(t)|/(1 + t) < ∞} ∈ B(M).
Corollary 14. Let X be a process with continuous paths. Then Φ : Ω → M, defined by
Φ(ω) = X(·, ω),
is A-B(M)-measurable. Moreover,
FX = σ(Φ),
and the analogous statement holds for FXt and Φt(ω) = Φ(ω)|[0,t].
Proof. Note that κt(Φ(ω)) = Xt(ω). Apply Theorem 12, and consider (M, B(M)) instead of ((Ω′)^I, ⊗_{t∈I} A′) as well as Φ instead of Ψ in the proofs of Lemma 4 and Lemma 7.
Later on we will also denote Φ by X.
Remark 15. Consider the measurable space (Ω2, A2) = (R^I, ⊗_{t∈I} B), and the embedding ı : C(I,R) → R^I. We claim that B(M) = σ(ı).
Proof: Observe that κt = πt ∘ ı. Use Theorem 11 to conclude that ı is B(M)-A2-measurable. Conversely, Theorem A.10.1 implies that κt is σ(ı)-B-measurable.
It follows that B(M) is the trace of A2 in M, i.e.,
B(M) = {A ∩ M : A ∈ A2}.
Therefore any set of the form κ_{t1,...,tm}^{−1}(B1 × ⋯ × Bm) with Bi ∈ B is called a measurable rectangle in M, and any set of the form κ_{t1,...,tm}^{−1}(B) with B ∈ Bm is called a cylinder set in M, cf. Section A.8.
Definition 16. Let X be a process with continuous paths. Then PΦ is called the
distribution of X (on B(M)).
Theorem 17. For µ1, µ2 ∈ M(M) we have µ1 = µ2 iff
κ_{t1,...,tm}µ1 = κ_{t1,...,tm}µ2
for all m ∈ N and 0 ≤ t1 < ⋯ < tm.
Proof. By Theorem 12 the class of measurable rectangles is a generator of B(M),
which is closed with respect to intersections. Apply Theorem A.1.5.
Remark 18. Let k ∈ N. Consider a product metric on M^k, and observe that C(I,Rk) may be identified with M^k in a canonical way. We have B(C(I,Rk)) = ⊗_{j=1}^k B(M), see Theorem 11 and Theorem A.8.8.
Canonical Processes
Let Ω = C(I,Rk) and A = B(Ω) according to Remark 18.
Definition 19. The canonical process X = (Xt)t∈I on (Ω, B(Ω), µ) with any µ ∈ M(Ω) is defined by
Xt(f) = f(t),  t ∈ I, f ∈ Ω.
Remark 20. Let X denote the canonical process on (Ω, B(Ω), µ). By definition, X(·, f) = f, such that X has continuous paths and µ is the distribution of X.
2 Donsker’s Invariance Principle
We present an infinite-dimensional central limit theorem, which deals with scaling
limits of random walks.
Given: an i.i.d. sequence (ξj)j∈N of square integrable random variables on a probability space (Ω,A,P) such that
E(ξ1) = 0,  E(ξ1²) = σ² > 0.
Put In = {ℓ/n : ℓ ∈ N0} as well as
X^{[n]}_{ℓ/n} = (1/(σ√n)) · ∑_{j=1}^ℓ ξj,  ℓ ∈ N0.
Terminology: a random variable Y on (Ω,A,P) and a σ-algebra G ⊂ A are called independent if (σ(Y), G) is independent.
Remark 1. Let s, t ∈ In with s < t. Use
X^{[n]}_t − X^{[n]}_s = (1/(σ√n)) · ∑_{j=sn+1}^{tn} ξj
to conclude that
(i) E(X^{[n]}_t − X^{[n]}_s) = 0,
(ii) Var(X^{[n]}_t − X^{[n]}_s) = t − s,
(iii) X^{[n]}_t − X^{[n]}_s is independent of F^{X^{[n]}}_s.
According to Example III.5.12, lim sup_{n→∞} X^{[n]}_1 = ∞ and lim inf_{n→∞} X^{[n]}_1 = −∞ with probability one. However, X^{[n]}_1 −d→ Z ∼ N(0, 1) due to the Central Limit Theorem.
Theorem 2. Fix m, n0 ∈ N, and let µ^{[n]}_{m,n0} denote the distribution of
X^{[n]}_{m,n0} = (X^{[n·n0]}_{1/n0}, ..., X^{[n·n0]}_{m/n0})⊤.
Then
µ^{[n]}_{m,n0} −w→ N(0, K)
as n tends to infinity, where
K = (min(ℓ/n0, k/n0))_{1≤ℓ,k≤m}.
Proof. Put
∆^{[n]}_ℓ = √n0 · (X^{[n·n0]}_{ℓ/n0} − X^{[n·n0]}_{(ℓ−1)/n0})
for ℓ = 1, ..., m. Since
∆^{[n]}_ℓ = (1/(σ√n)) · ∑_{j=1}^n ξ_{(ℓ−1)·n+j},
we get P_{∆^{[n]}_ℓ} −w→ N(0, 1), see Corollary III.5.8. Furthermore, we have independence of
∆^{[n]} = (∆^{[n]}_1, ..., ∆^{[n]}_m),
and therefore
P_{∆^{[n]}} −w→ N(0, Em)
follows from Ubung 8.1.b) and Theorem III.7.1.
Note that
X^{[n]}_{m,n0} = (1/√n0) · L∆^{[n]}
with the lower triangular matrix
L = (1_{j≤i})_{1≤i,j≤m} ∈ R^{m×m},   (1)
i.e., Lij = 1 for j ≤ i and Lij = 0 otherwise. Moreover,
(1/√n0) · LZ ∼ N(0, K)
if Z ∼ N(0, Em), see Example III.7.4 and Lemma III.7.7. Use Lemma II.3.6 to conclude that µ^{[n]}_{m,n0} −w→ N(0, K).
Let I = [0,∞[. We consider M = C(I,R), equipped with the metric of locally uniform convergence, see p. 67, and the corresponding Borel σ-algebra B(M).
We extend (X^{[n]}_t)_{t∈In} to a process (X^{[n]}_t)_{t∈I} with continuous paths by piecewise linear interpolation, i.e.,
X^{[n]}_t(ω) = (t − ℓ/n) · n · X^{[n]}_{(ℓ+1)/n}(ω) + ((ℓ+1)/n − t) · n · X^{[n]}_{ℓ/n}(ω)
for t ∈ [ℓ/n, (ℓ+1)/n] and every ω ∈ Ω. We study the distribution µ^{[n]} of X^{[n]}, which depends on n and P_{ξ1}.
Theorem 3 (Donsker). There exists a probability measure µ* on B(M) such that
µ^{[n]} −w→ µ*
for every choice of P_{ξ1}. Moreover,
κµ* = N(0, Kκ)
for every κ = κ_{t1,...,tm} with m ∈ N and 0 ≤ t1 < ⋯ < tm, where
Kκ = (min(ti, tj))_{1≤i,j≤m}.   (2)
Sketch of the proof. For details, see Klenke (2013, Abschn. 21.8). The proof consists of two major steps:
1. Show that {µ^{[n]} : n ∈ N} is tight.
2. Show that κµ^{[n]} −w→ N(0, Kκ) for every κ = κ_{t1,...,tm}.
In the proof of 1.) one considers the modulus of continuity of f ∈ M on [0, T], which is defined by
mT(f; δ) = sup{|f(t) − f(s)| : s, t ∈ [0, T], |t − s| ≤ δ}.
According to the Arzela-Ascoli Theorem, K ⊂ M is relatively compact iff
sup_{f∈K} |f(0)| < ∞
and
lim_{δ→0} sup_{f∈K} mT(f; δ) = 0
for every T > 0. Remark 1.(ii) provides a first indication of tightness. See Theorem 2 for a particular case of a subsequence appearing in 2.)
By 1.) and Prohorov’s Theorem, {µ^{[n]} : n ∈ N} is relatively compact. We fix a convergent subsequence and let µ* denote its limit. We claim that µ^{[n]} −w→ µ*. Due to Remarks II.2.8 and II.3.13.(iii) it suffices to show that every subsequence (µ^{[nk]})k∈N of (µ^{[n]})n∈N contains a subsequence (µ^{[nk_ℓ]})ℓ∈N that converges weakly to µ*. Relative compactness yields the existence of a convergent subsubsequence, and µ^{[nk_ℓ]} −w→ ν implies κν = κµ* for every κ according to 2.), see Lemma II.3.6. Use Theorem 1.17 to conclude that ν = µ*.
Definition 4. The measure µ* according to Theorem 3 is called the Wiener measure on B(M).
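The theorem suggests a direct simulation: rescale a random walk and refine. A Python sketch (our illustration; numpy assumed; the ±1 steps are an arbitrary choice of P_{ξ1}):

```python
# Rescaled random walk X^[n] at the grid points l/n, l = 0, ..., n; illustrative sketch.
import numpy as np

def rescaled_walk(n, rng, sigma=1.0):
    xi = rng.choice([-1.0, 1.0], size=n)  # any centered step law with variance sigma^2
    return np.concatenate(([0.0], np.cumsum(xi))) / (sigma * np.sqrt(n))

rng = np.random.default_rng(seed=4)
n = 10_000
w = rescaled_walk(n, rng)  # w[l] = X^[n]_{l/n}; piecewise linear interpolation in between

# e.g. the marginal X^[n]_1 is approximately N(0, 1) for large n:
samples = np.array([rescaled_walk(n, rng)[-1] for _ in range(2000)])
print(samples.mean(), samples.var())  # ~ 0 and ~ 1
```

By Theorem 3 the picture is the same for any centered square integrable step law; only the scale σ matters.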
3 The Brownian Motion
In the sequel, I = [0,∞[ and X = (Xt)t∈I denotes a stochastic process on (Ω,A,P) with state space (Rk, Bk).
Definition 1. X is called a k-dimensional Brownian motion (Wiener process) if
(i) X has continuous paths,
(ii) X0 = 0 a.s.,
(iii) Xt − Xs with 0 ≤ s < t is
(a) independent of FXs,
(b) N(0, (t−s) · Ek)-distributed.
We address the questions of existence and uniqueness of a Brownian motion.
Definition 2. X has
(i) independent increments , if
Xt1 −Xt0 , . . . , Xtm −Xtm−1
is independent for all m ≥ 2 and 0 = t0 < · · · < tm.
(ii) stationary increments , if the distributions of Xt − Xs and Xt−s − X0 coincide
for all 0 ≤ s < t.
Lemma 3. Assume that X0 is constant P-a.s. Then
X has independent increments ⇔ ∀ 0 ≤ s < t: Xt − Xs is independent of FXs.
Proof. ‘⇐’: Follows inductively. ‘⇒’: Fix s < t and put
C = ⋃_{m∈N, 0=s0<⋯<sm=s} σ(X_{s0}, ..., X_{sm}).
Since C is closed w.r.t. ∩ and σ(C) = FXs, it suffices to show that C and σ(Xt − Xs) are independent, see Theorem II.6.5. By assumption, for 0 = s0 < ⋯ < sm = s,
X_{s0}, X_{s1} − X_{s0}, ..., X_{sm} − X_{sm−1}, Xt − Xs are independent.
Moreover,
σ(X_{s0}, X_{s1} − X_{s0}, ..., X_{sm} − X_{sm−1}) = σ(X_{s0}, X_{s1}, ..., X_{sm}).
Use Corollary II.6.6 to conclude that σ(X_{s0}, X_{s1}, ..., X_{sm}) and σ(Xt − Xs) are independent, which yields the independence of C and σ(Xt − Xs), as claimed.
One-dimensional Brownian Motion
Theorem 4. Let k = 1 and let X have continuous paths. Then the following properties are equivalent:
(i) X is a one-dimensional Brownian motion,
(ii) the distribution of X is the Wiener measure,
(iii) every finite-dimensional marginal distribution of X is a normal distribution and
E(Xt) = 0,  Cov(Xs, Xt) = min(s, t)
for s, t ∈ I.
Proof. Use Theorems 1.17 and 2.3 to establish the equivalence of (ii) and (iii).
Let
Y = (X_{t1}, ..., X_{tm})⊤
and
Z = (X_{t1} − X_{t0}, ..., X_{tm} − X_{tm−1})⊤
for 0 = t0 ≤ t1 < ⋯ < tm. Moreover, let
D = diag(t1, t2 − t1, ..., tm − tm−1) ∈ R^{m×m},
let L be given by (2.1), and define K = Kκ according to (2.2). A direct computation yields
LDL⊤ = K.
Suppose that X_{t0} = 0 P-a.s., which follows in particular from (i) or (ii). Then we have Y = LZ P-a.s., and therefore Y ∼ N(0, K) iff Z ∼ N(0, D).
‘(i) ⇒ (iii)’: Use Theorem III.7.1 and Lemma 3 to conclude that Z ∼ N(0, D). Hence Y ∼ N(0, K).
‘(ii) ⇒ (i)’: By Theorem 2.3, Y ∼ N(0, K) so that Z ∼ N(0, D). Apply Theorem III.7.1 and Lemma 3.
Corollary 5. Let M = C(I,R), and let µ∗ denote the Wiener measure. Then the
canonical process on (M,B(M), µ∗) is a one-dimensional Brownian motion.
Proof. Apply Remark 1.20 and Theorem 4.
Remark 6. According to Theorem 4 the distribution of a one-dimensional Brownian
motion is uniquely determined.
Multi-dimensional Brownian Motion
The components of X are real-valued processes, denoted by X^{(j)} with j ∈ {1, ..., k}, i.e.,
Xt = (X^{(1)}_t, ..., X^{(k)}_t),  t ∈ I.
Consider the product measurable space
(M, B(M)) = (C(I,Rk), ⊗_{j=1}^k B(C(I,R))),
see Remark 1.18. Note that Corollary 1.14 and Definition 1.16 extend to Rk-valued processes with continuous paths.
Remark 7. Let X have continuous paths, let µ denote the distribution of X, and let µ^{(j)} denote the distribution of X^{(j)}. Then, by Corollary 1.14,
(X^{(1)}, ..., X^{(k)}) independent ⇔ µ = ×_{j=1}^k µ^{(j)}.
See also Ubung 9.4.
Theorem 8. The following properties are equivalent:
(i) X is a k-dimensional Brownian motion,
(ii) (X^{(1)}, ..., X^{(k)}) is independent and X^{(j)} is a one-dimensional Brownian motion for every j ∈ {1, ..., k}.
Proof. ‘(i) ⇒ (ii)’: It is easily verified that X^{(1)}, ..., X^{(k)} are one-dimensional Brownian motions. Let m ∈ N and 0 = t0 < ⋯ < tm, and put
∆^{(j)}_i = X^{(j)}_{ti} − X^{(j)}_{ti−1}
for i ∈ {1, ..., m} and j ∈ {1, ..., k}, as well as
∆^{(j)} = (∆^{(j)}_1, ..., ∆^{(j)}_m),  ∆i = (∆^{(1)}_i, ..., ∆^{(k)}_i).
Due to Theorem II.6.13 and Ubung 9.4 it suffices to show
(∆^{(1)}, ..., ∆^{(k)}) is independent.   (1)
By Lemma 3,
(∆1, ..., ∆m) is independent,
and for every i ∈ {1, ..., m},
(∆^{(1)}_i, ..., ∆^{(k)}_i) is independent
by Theorem III.7.1. It follows that
(∆^{(1)}_1, ..., ∆^{(k)}_m) is independent.
Apply Theorem II.6.13 to get (1).
‘(ii) ⇒ (i)’ is established in a similar way.
Corollary 9. Let µ* denote the Wiener measure. Then the canonical process on
(M, B(M), µ)
with µ = ×_{j=1}^k µ* is a Brownian motion.
Proof. Remark 7 yields the independence of (X^{(1)}, ..., X^{(k)}), and by Theorem 4 the components X^{(j)} are one-dimensional Brownian motions. Apply Theorem 8.
Remark 10. According to Theorems 4 and 8 the distribution of any k-dimensional Brownian motion is given by ×_{j=1}^k µ* with µ* denoting the Wiener measure.
Remark 11. Brownian motion
(i) is a fundamental object in the theory of martingales, Markov processes, and
stochastic differential equations.
(ii) is the building block to model continuous-time random dynamics, e.g., in math-
ematical finance, physics, or engineering applications.
(iii) provides a link between probability theory and analysis, called stochastic anal-
ysis.
Example 12. Let X be a one-dimensional Brownian motion, and let µ ∈ R as well as s0, σ > 0. Then
St = s0 · exp((µ − σ²/2) · t + σ · Xt),  t ∈ I,
is called a geometric Brownian motion with initial value s0, drift µ, and volatility σ. The process S is used to define the dynamics of the stock price in the Black-Scholes model.
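Since St is an explicit function of Xt, a path of S can be generated from exact Brownian increments. A Python sketch (our illustration; numpy assumed; all parameter values are arbitrary choices):

```python
# Geometric Brownian motion on [0, T] from exact Brownian increments; sketch.
import numpy as np

rng = np.random.default_rng(seed=5)
s0, mu, sigma = 100.0, 0.05, 0.2
T, m = 1.0, 1000
dt = T / m

t = np.linspace(0.0, T, m + 1)
dX = np.sqrt(dt) * rng.standard_normal(m)   # increments X_{t+dt} - X_t ~ N(0, dt)
X = np.concatenate(([0.0], np.cumsum(dX)))  # Brownian path at the grid points
S = s0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * X)
print(S[0], S[-1])  # S_0 = s0; E(S_T) = s0 * exp(mu * T)
```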
Example 13. We consider a boundary value problem for an elliptic partial differential equation. Let ∅ ≠ D ⊂ Rk be a bounded open set, and let f : ∂D → R be a continuous function. Find a function
u : cl D → R
such that
(i) u is continuous on cl D and u(x) = f(x) for x ∈ ∂D,
(ii) u is harmonic on D, i.e., u is twice continuously differentiable on D and
∑_{j=1}^k ∂²u/∂xj²(x) = 0,  x ∈ D.
Consider a k-dimensional Brownian motion X, let x ∈ cl D, and put
τx = inf{t ∈ I : x + Xt ∈ ∂D}.
Then τx is a numerical random variable, and τx < ∞ P-a.s. follows from Example III.5.12. Put
Y^x = x + X_{τx}
on {τx < ∞}. Then Y^x is a random variable, which takes values in ∂D. Put
u*(x) = E(f(Y^x)),  x ∈ cl D.
We have a stochastic representation and uniqueness result: u = u* for every solution u of the boundary value problem. Moreover, we have an existence result: under mild geometrical assumptions on ∂D, which ensure continuity of u*, e.g., if D is convex, u* is the unique solution of the boundary value problem.
See Muller-Gronbach, Novak, Ritter (2012, Abschn. 4.5.5) for further details, references, and a simulation technique that takes into account the geometry of D.
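A crude Monte Carlo version of this representation (our illustration, not the refined technique of the cited reference; numpy assumed) simulates the Brownian motion on a fine time grid until it leaves D and averages f at the exit point; the grid stopping and the projection onto ∂D introduce a small bias:

```python
# Crude Monte Carlo for u*(x) = E f(x + X_{tau_x}), D = unit disc in R^2; sketch.
import numpy as np

def u_star(x, f, n_paths=4000, dt=1e-3, rng=None):
    rng = rng or np.random.default_rng(seed=6)
    total = 0.0
    for _ in range(n_paths):
        pos = np.array(x, dtype=float)
        while pos @ pos < 1.0:                 # while inside D
            pos += np.sqrt(dt) * rng.standard_normal(2)
        total += f(pos / np.linalg.norm(pos))  # project the overshoot onto dD
    return total / n_paths

f = lambda y: y[0] ** 2 - y[1] ** 2  # boundary data of the harmonic u(x) = x1^2 - x2^2
print(u_star((0.3, 0.2), f))         # ~ 0.3^2 - 0.2^2 = 0.05
```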
Invariance and Path Properties
Given: a one-dimensional Brownian motion X = (Xt)t∈I.
Theorem 14.
(i) Yt = −Xt defines a one-dimensional Brownian motion (symmetry).
(ii) For every c > 0,
Yt = c^{−1/2} · X_{c·t}
defines a one-dimensional Brownian motion (scaling invariance).
(iii) For every T > 0 and t ∈ [0, T],
Yt = XT − X_{T−t}
defines a one-dimensional Brownian motion with parameter set [0, T] (time reversal).
(iv) For every T > 0,
Yt = X_{T+t} − XT
defines a one-dimensional Brownian motion, which is independent of (Xt)t∈[0,T] (Markov property).
Proof. Use Theorem 4 to establish (i)–(iii). For (iv) see Ubung 10.3.
A variant of the strong law of large numbers for Brownian motion reads as follows.
Theorem 15. For every α > 1/2,
lim_{t→∞} t^{−α} · Xt = 0  P-a.s.
Proof. Reduction to the discrete-time case, see Ubung 10.4.b).
Theorem 16.
Yt = t · X_{1/t} for t > 0, and Y0 = 0,
defines a one-dimensional Brownian motion (projective reflection).
Proof. Use Theorem 4 and note that continuity at t = 0 follows from Theorem 15, see Ubung 10.4.c).
The law of the iterated logarithm holds for Brownian motion, too.
Theorem 17.
lim sup_{t→∞} Xt/√(2t · log log t) = 1  P-a.s.
Proof. See Schilling, Partzsch (2014, Sec. 12.1).
Remark 18. Let c, ε > 0, and put
A_{c,ε} = {f ∈ C(I,R) : sup_{t∈]0,ε[} |f(t)|/t^{1/2} ≤ c},
B_{c,ε} = {f ∈ C(I,R) : sup_{t∈]1/ε,∞[} |f(t)|/t^{1/2} ≤ c}.
Use Theorems 4 and 16 to conclude that
P(X· ∈ A_{c,ε}) = P(Y· ∈ A_{c,ε}) = P(X· ∈ B_{c,ε}).
Moreover, P(X· ∈ B_{c,ε}) = 0 by Theorem 17 (or Example III.5.12). Thus X is not locally Holder continuous of order 1/2 at t = 0 with probability one. In particular, X is not differentiable at t = 0 with probability one. By Theorem 14.(iv) the same conclusion holds true at any fixed t ∈ I.
Theorem 19. With probability one X is nowhere locally Holder-continuous of any
order γ > 1/2. In particular, X is nowhere differentiable with probability one.
Proof. See Klenke (2013, p. 477) as well as Schilling, Partzsch (2014, Chap. 10).
The following result is known as Levy’s modulus of continuity.
Theorem 20 (Levy’s modulus of continuity).
lim sup_{δ→0} m1(X; δ)/√(2δ · log δ^{−1}) = 1  P-a.s.
Proof. See Schilling, Partzsch (2014, Sec. 10.3).
Consider the level sets
Zb(ω) = {t ∈ I : Xt(ω) = b}.
Theorem 21. For every b ∈ R, with probability one, Zb is closed, unbounded, of Lebesgue measure zero, and has no isolated points.
Proof. By continuity of X·(ω) and by Theorem 17 (or Example III.5.12), Zb(ω) is closed and unbounded for P-a.e. ω ∈ Ω. Use Ubung 10.1.b and Fubini’s Theorem to obtain
∫_Ω ∫_I 1_{{b}}(Xt(ω)) λ(dt) P(dω) = ∫_I P(Xt = b) λ(dt) = 0,
and hence λ(Zb(ω)) = 0 for P-a.e. ω ∈ Ω. See Schilling, Partzsch (2014, Sec. 11.4) for the remaining part of the proof.
Appendix A
Measure and Integration
We collect some basic definitions and facts from measure and integration theory, as
needed in the course of this lecture.
For more details and for a consistent presentation see the Lecture Notes on Maß- und
Integrationstheorie (2016). A basic reference is Elstrodt (2011), see also Ganssler,
Stute (1977, Kap. 1) and Klenke (2013).
1 Measure Spaces and Measurable Mappings
Given: a non-empty set Ω.
Definition 1. A ⊂ P(Ω) is a σ-algebra (in Ω) if
(i) Ω ∈ A,
(ii) A1, A2, ... ∈ A ⇒ ⋃_{n=1}^∞ An ∈ A,
(iii) A ∈ A ⇒ A^c ∈ A.
In this case (Ω,A) is called a measurable space, and elements A ∈ A are called measurable sets.
Definition 2. µ : A → [0,∞[ ∪ {∞} is a measure (on A) if
(i) A is a σ-algebra,
(ii) µ(∅) = 0,
(iii) µ is σ-additive, i.e.,
A1, A2, ... ∈ A pairwise disjoint ⇒ µ(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ µ(Ai).
In this case (Ω,A,µ) is called a measure space. If, additionally, µ(Ω) = 1, then µ is called a probability measure (on A) and (Ω,A,µ) is called a probability space.
See Theorem 9.4 for further properties of probability measures.
Example 3. Let A denote a σ-algebra in Ω. The Dirac measure in ω ∈ Ω is given by
εω(A) = 1_A(ω),  A ∈ A.
The counting measure is given by
µ(A) = |A|,  A ∈ A.
The uniform distribution, in the case |Ω| < ∞ and A = P(Ω), is given by
µ(A) = |A|/|Ω|,  A ⊂ Ω.
Given: a class E ⊂ P(Ω).
Definition 4. The σ-algebra generated by E is given by
σ(E) = ⋂{A : A σ-algebra in Ω ∧ E ⊂ A}.
Theorem 5. For probability measures µ1, µ2 on a σ-algebra A and E ⊂ A with
(i) σ(E) = A and E closed w.r.t. intersections,
(ii) µ1|E = µ2|E,
we have µ1 = µ2.
In the sequel, (Ωi,Ai) are measurable spaces.
Definition 6. f : Ω1 → Ω2 is A1-A2-measurable if f^{−1}(A) ∈ A1 for every A ∈ A2.
Theorem 7. If f : Ω1 → Ω2 is A1-A2-measurable and g : Ω2 → Ω3 is A2-A3-measurable, then g ∘ f : Ω1 → Ω3 is A1-A3-measurable.
Theorem 8. If A2 = σ(E) with E ⊂ P(Ω2) and f : Ω1 → Ω2, then
f is A1-A2-measurable ⇔ ∀ E ∈ E: f^{−1}(E) ∈ A1.
Given: a measure µ on (Ω1,A1) and an A1-A2-measurable mapping f : Ω1 → Ω2.
Remark 9.
f(µ) : A2 → [0,∞[ ∪ {∞},  A ↦ µ(f^{−1}(A)),
defines a measure on A2.
Definition 10. f(µ) is called the image measure of µ under f.
2 Borel Sets
Definition 1. For every topological space (Ω,G),
B(Ω) = σ(G)
is the Borel σ-algebra (in Ω w.r.t. G). Elements A ∈ B(Ω) are called Borel sets.
Theorem 2. For every pair of topological spaces (Ω1,G1) and (Ω2,G2) and every mapping f : Ω1 → Ω2,
f continuous ⇒ f is B(Ω1)-B(Ω2)-measurable.
Remark 3. Put R̄ = R ∪ {±∞}, and consider the natural topologies on R and R̄. Then O ⊂ R̄ is open iff O ∩ R is open in R, ]a,∞] ⊂ O for some a < ∞ if ∞ ∈ O, and [−∞, a[ ⊂ O for some a > −∞ if −∞ ∈ O.
By definition, 0 · (±∞) = 0.
Notation:
Bk = B(Rk),  B = B1,  B̄k = B(R̄k),  B̄ = B̄1.
Theorem 4.
Bk = σ({F ⊂ Rk : F closed}) = σ({K ⊂ Rk : K compact}) = σ({]−∞, a] : a ∈ Rk}) = σ({]−∞, a] : a ∈ Qk}).
3 The Lebesgue Measure
Theorem 1. There exists a uniquely determined measure λk on Bk with
(i) λk([0,1]^k) = 1,
(ii) ∀ b ∈ Rk ∀ A ∈ Bk: λk(A + b) = λk(A).
Definition 2. λk is called the k-dimensional Lebesgue measure.
Remark 3. For a, b ∈ Rk with ai ≤ bi we have λk([a, b]) = ∏_{i=1}^k λ1([ai, bi]).
For further properties of (probability) measures we refer to Section III.3 in the Lecture
Notes on Maß- und Integrationstheorie (2016).
4 Real-valued Measurable Mappings
Given: a measurable space (Ω,A). Put
Z(Ω,A) = {f : Ω → R : f is A-B-measurable},
Z+(Ω,A) = {f ∈ Z(Ω,A) : f ≥ 0},
Z̄(Ω,A) = {f : Ω → R̄ : f is A-B̄-measurable},
Z̄+(Ω,A) = {f ∈ Z̄(Ω,A) : f ≥ 0}.
Remark 1. Every function f : Ω → R may also be considered as a function with values in R̄, and in this case f ∈ Z(Ω,A) iff f ∈ Z̄(Ω,A).
Theorem 2. For ≺ ∈ {≤, <, ≥, >} and f : Ω → R̄,
f ∈ Z̄(Ω,A) ⇔ ∀ a ∈ R: {ω ∈ Ω : f(ω) ≺ a} ∈ A.
Theorem 3. For f, g ∈ Z̄(Ω,A) and ≺ ∈ {≤, <, ≥, >, =, ≠},
{ω ∈ Ω : f(ω) ≺ g(ω)} ∈ A.
Theorem 4. For every sequence f1, f2, ... ∈ Z̄(Ω,A),
(i) inf_{n∈N} fn, sup_{n∈N} fn ∈ Z̄(Ω,A),
(ii) lim inf_{n→∞} fn, lim sup_{n→∞} fn ∈ Z̄(Ω,A),
(iii) if (fn)n∈N converges at every point ω ∈ Ω, then lim_{n→∞} fn ∈ Z̄(Ω,A).
Remark 5. For f ∈ Z̄(Ω,A) we have f⁺, f⁻, |f| ∈ Z̄+(Ω,A).
Theorem 6. For f, g ∈ Z̄(Ω,A),
f ± g, f · g, f/g ∈ Z̄(Ω,A),
provided that these functions are well defined.
Definition 7. f ∈ Z(Ω,A) is called a simple function if |f(Ω)| < ∞. Put
S(Ω,A) = {f ∈ Z(Ω,A) : f simple},
S+(Ω,A) = {f ∈ S(Ω,A) : f ≥ 0}.
Theorem 8. For every (bounded) function f ∈ Z̄+(Ω,A) there exists a sequence f1, f2, ... ∈ S+(Ω,A) such that fn ↑ f (with uniform convergence).
5 Integration
Given: a measure space (Ω,A,µ). Notation: S+ = S+(Ω,A).
Definition 1. The integral of f ∈ S+ w.r.t. µ is given by
∫ f dµ = ∑_{i=1}^n αi · µ(Ai)
if
f = ∑_{i=1}^n αi · 1_{Ai}
with αi ≥ 0 and Ai ∈ A.
Notation: Z̄+ = Z̄+(Ω,A).
Definition 2. The integral of f ∈ Z̄+ w.r.t. µ is given by
∫ f dµ = sup{∫ g dµ : g ∈ S+ ∧ g ≤ f}.
Theorem 3 (Monotone convergence, Beppo Levi). Let fn ∈ Z̄+ such that
∀ n ∈ N: fn ≤ fn+1.
Then
∫ sup_n fn dµ = sup_n ∫ fn dµ.
Remark 4. For every f ∈ Z̄+ there exists a sequence of functions fn ∈ S+ such that fn ↑ f, see Theorem 4.8.
Theorem 5 (Fatou’s Lemma). For every sequence (fn)n in Z̄+,
∫ lim inf_{n→∞} fn dµ ≤ lim inf_{n→∞} ∫ fn dµ.
Definition 6. A property Π holds µ-almost everywhere (µ-a.e., a.e.) if
∃ A ∈ A: {ω ∈ Ω : Π does not hold for ω} ⊂ A ∧ µ(A) = 0.
Theorem 7. Let f ∈ Z̄+. Then
∫ f dµ = 0 ⇔ f = 0 µ-a.e.
Notation: Z̄ = Z̄(Ω,A).
Definition 8. f ∈ Z̄ is quasi-µ-integrable if
∫ f⁺ dµ < ∞ ∨ ∫ f⁻ dµ < ∞.
In this case the integral of f (w.r.t. µ) is
∫ f dµ = ∫ f⁺ dµ − ∫ f⁻ dµ.
f ∈ Z̄ is µ-integrable if
∫ f⁺ dµ < ∞ ∧ ∫ f⁻ dµ < ∞.
Theorem 9.
(i) f µ-integrable ⇒ |f| < ∞ µ-a.e.,
(ii) f µ-integrable ∧ g ∈ Z̄ ∧ f = g µ-a.e. ⇒ g µ-integrable ∧ ∫ f dµ = ∫ g dµ,
(iii) equivalent properties for f ∈ Z̄ are:
(a) f µ-integrable,
(b) |f| µ-integrable,
(c) ∃ g: g µ-integrable ∧ |f| ≤ g µ-a.e.,
(iv) for f and g µ-integrable and c ∈ R:
(a) f + g is well-defined µ-a.e. and µ-integrable with ∫ (f+g) dµ = ∫ f dµ + ∫ g dµ,
(b) c · f is µ-integrable with ∫ (cf) dµ = c · ∫ f dµ,
(c) f ≤ g µ-a.e. ⇒ ∫ f dµ ≤ ∫ g dµ.
Theorem 10 (Dominated convergence, Lebesgue). Assume that
(i) fn ∈ Z̄ for n ∈ N,
(ii) ∃ g µ-integrable ∀ n ∈ N: |fn| ≤ g µ-a.e.,
(iii) f ∈ Z̄ such that lim_{n→∞} fn = f µ-a.e.
Then f is µ-integrable and
∫ f dµ = lim_{n→∞} ∫ fn dµ.
Example 11. Consider fn = n · 1_{]0,1/n[} on (R,B,λ1). Then lim_{n→∞} fn = 0 but ∫ fn dλ1 = 1.
Given: a measurable space (Ω′,A′) and an A-A′-measurable mapping f : Ω → Ω′.
Theorem 12 (Transformation ‘Theorem’).
(i) For g ∈ Z̄+(Ω′,A′),
∫_{Ω′} g df(µ) = ∫_Ω g ∘ f dµ.   (1)
(ii) For g ∈ Z̄(Ω′,A′),
g is f(µ)-integrable ⇔ g ∘ f is µ-integrable,
in which case (1) holds.
Definition 13. For f ∈ Z̄ and A ∈ A,
∫_A f dµ = ∫ 1_A · f dµ.
Theorem 14. For f ∈ Z̄+,
ν(A) = ∫_A f dµ,  A ∈ A,
defines a measure on A. If ∫ f dµ = 1 then ν is a probability measure.
Definition 15. ν = f · µ is the (probability) measure with (probability) density f w.r.t. µ.
Theorem 16. Let ν = f · µ with f ∈ Z̄+ and let g ∈ Z̄(Ω,A). Then
g (quasi-)ν-integrable ⇔ g · f (quasi-)µ-integrable,
in which case
∫ g dν = ∫ g · f dµ.
6 Lp-Spaces
Given: a measure space (Ω,A,µ) and 1 ≤ p < ∞. Put Z = Z(Ω,A) and Z̄ = Z̄(Ω,A).
Definition 1.
Lp = Lp(Ω,A,µ) = {f ∈ Z : ∫ |f|^p dµ < ∞}.
In particular, for p = 1: integrable functions and L = L1; for p = 2: square-integrable functions. Put
‖f‖p = (∫ |f|^p dµ)^{1/p},  f ∈ Lp.
Theorem 2 (Holder inequality). Let 1 < p, q < ∞ such that 1/p + 1/q = 1 and let f ∈ Lp, g ∈ Lq. Then
∫ |f · g| dµ ≤ ‖f‖p · ‖g‖q.
In particular, for p = q = 2: Cauchy-Schwarz inequality.
Theorem 3. Lp is a vector space and ‖ · ‖p is a semi-norm on Lp. Furthermore,
‖f‖p = 0 ⇔ f = 0 µ-a.e.
Definition 4. Let f, fn ∈ Lp for n ∈ N. (fn)n converges to f in Lp (in mean of order p) if
lim_{n→∞} ‖f − fn‖p = 0.
In particular, for p = 1: convergence in mean; for p = 2: mean-square convergence. Notation:
fn −Lp→ f.
Remark 5. Let f, fn ∈ Z̄ for n ∈ N. Recall (define) that (fn)n converges to f µ-a.e. if µ(A^c) = 0 for A = {lim_{n→∞} fn = f} ∈ A. Notation:
fn −µ-a.e.→ f.
Lemma 6. Let f, g, fn ∈ Lp for n ∈ N such that fn −Lp→ f. Then
fn −Lp→ g ⇔ f = g µ-a.e.
Analogously for convergence almost everywhere.
Theorem 7 (Fischer-Riesz). Consider a sequence (fn)n in Lp. Then
(i) (fn)n Cauchy sequence ⇒ ∃ f ∈ Lp: fn −Lp→ f (completeness),
(ii) fn −Lp→ f ⇒ ∃ subsequence (fnk)k: fnk −µ-a.e.→ f.
Example 8. Let (Ω,A,µ) = ([0,1], B([0,1]), λ1|B([0,1])). Define
A1 = [0,1],
A2 = [0,1/2], A3 = [1/2,1],
A4 = [0,1/3], A5 = [1/3,2/3], A6 = [2/3,1],
etc. Put fn = 1_{An}. Then fn −Lp→ 0 but {(fn)n converges} = ∅.
Remark 9. Define
L∞ = L∞(Ω,A,µ) = {f ∈ Z : ∃ c ∈ [0,∞[: |f| ≤ c µ-a.e.}
and
‖f‖∞ = inf{c ∈ [0,∞[: |f| ≤ c µ-a.e.},  f ∈ L∞.
f ∈ L∞ is called essentially bounded, and ‖f‖∞ is called the essential supremum of |f|. Note that |f| ≤ ‖f‖∞ µ-a.e.
The definitions and results of this section, except fn −L∞→ 0 in Example 8, extend to the case p = ∞, where q = 1 in Theorem 2. In Theorem 7.(ii) we even have: fn −L∞→ f ⇒ fn −µ-a.e.→ f.
Remark 10. Put
Np = {f ∈ Lp : f = 0 µ-a.e.}.
Then the quotient space Lp/Np is a Banach space. In particular, for p = 2, the quotient space L2/N2 is a Hilbert space, with semi-inner product on L2 given by
⟨f, g⟩ = ∫ f · g dµ,  f, g ∈ L2.
Theorem 11. If µ(Ω) = 1 and 1 ≤ p < q ≤ ∞, then Lq ⊂ Lp and
‖f‖p ≤ ‖f‖q,  f ∈ Lq.
7 Dynkin Classes
Given: a non-empty set Ω.
Definition 1. A ⊂ P(Ω) is a Dynkin class (in Ω) if
(i) Ω ∈ A,
(ii) A1, A2, ... ∈ A pairwise disjoint ⇒ ⋃_{n=1}^∞ An ∈ A,
(iii) A1, A2 ∈ A ∧ A1 ⊂ A2 ⇒ A2 \ A1 ∈ A.
Theorem 2. For every Dynkin class A,
A σ-algebra ⇔ A closed w.r.t. intersections.
Given: a class E ⊂ P(Ω).
Definition 3. The Dynkin class generated by E is given by
δ(E) = ⋂{A : A Dynkin class in Ω ∧ E ⊂ A}.
Theorem 4. E closed w.r.t. intersections ⇒ σ(E) = δ(E).
8 Product Spaces
Given: a non-empty set I and measurable spaces (Ωi,Ai) for i ∈ I, as well as mappings fi : Ω → Ωi for i ∈ I and some non-empty set Ω.
Definition 1. The σ-algebra generated by (fi)i∈I (and (Ai)i∈I) is given by
σ(fi : i ∈ I) = σ(⋃_{i∈I} fi^{−1}(Ai)).
Put σ(f) = σ(fi : i ∈ I) in the case |I| = 1 and f = fi.
Theorem 2. For every measurable space (Ω̃, Ã) and every mapping g : Ω̃ → Ω,
g is Ã-σ(fi : i ∈ I)-measurable ⇔ ∀ i ∈ I: fi ∘ g is Ã-Ai-measurable.
Put Y = ⋃_{i∈I} Ωi and define
×_{i∈I} Ωi = {ω ∈ Y^I : ω(i) ∈ Ωi for i ∈ I}.
Notation: ω = (ωi)i∈I for ω ∈ ×_{i∈I} Ωi. Moreover,
P0(I) = {J ⊂ I : J non-empty, finite}.
Definition 3. (i) A measurable rectangle is any set of the form
A = ×_{j∈J} Aj × ×_{i∈I\J} Ωi
with J ∈ P0(I) and Aj ∈ Aj for j ∈ J. Notation: R is the class of measurable rectangles.
(ii) The product (measurable) space (Ω,A) with components (Ωi,Ai), i ∈ I, is given by
Ω = ×_{i∈I} Ωi,  A = σ(R).
Notation: A = ⊗_{i∈I} Ai, the product σ-algebra.
Put Ω = ×_{i∈I} Ωi. For any ∅ ≠ S ⊂ I let
π^I_S : Ω → ×_{i∈S} Ωi,  (ωi)i∈I ↦ (ωi)i∈S,
denote the projection of Ω onto ×_{i∈S} Ωi. For short, πS instead of π^I_S, and πi instead of π_{{i}}.
Theorem 4.
(i) ⊗_{i∈I} Ai = σ(πi : i ∈ I).
(ii) ∀ i ∈ I: Ai = σ(Ei) ⇒ ⊗_{i∈I} Ai = σ(⋃_{i∈I} πi^{−1}(Ei)).
Theorem 5.
(i) For every measurable space (Ω̃, Ã) and every mapping g : Ω̃ → Ω,
g is Ã-⊗_{i∈I} Ai-measurable ⇔ ∀ i ∈ I: πi ∘ g is Ã-Ai-measurable.
(ii) For every ∅ ≠ S ⊂ I the projection π^I_S is ⊗_{i∈I} Ai-⊗_{i∈S} Ai-measurable.
Remark 6. From Theorems 4.(i) and 5 we get
⊗_{i∈I} Ai = σ(π^I_S : S ∈ P0(I)).
The sets
(π^I_S)^{−1}(B) = B × (×_{i∈I\S} Ωi)
with S ∈ P0(I) and B ∈ ⊗_{i∈S} Ai are called cylinder sets. Notation: C is the class of cylinder sets.
Theorem 7. For every A ∈ ⊗_{i∈I} Ai there exists a non-empty countable set S ⊂ I and a set B ∈ ⊗_{i∈S} Ai such that A = (π^I_S)^{−1}(B).
Given: a non-empty countable set I and a family of topological spaces (Ωi,Gi), where i ∈ I. Consider the product topology G on Ω = ×_{i∈I} Ωi.
Theorem 8. If every space (Ωi,Gi) has a countable basis, then
B(Ω) = ⊗_{i∈I} B(Ωi).
In particular, Bk = ⊗_{i=1}^k B and B̄k = ⊗_{i=1}^k B̄.
9 Caratheodory’s Theorem
Given: a non-empty set Ω.
Definition 1. A ⊂ P(Ω) is an algebra (in Ω) if
(i) Ω ∈ A,
(ii) A, B ∈ A ⇒ A ∪ B ∈ A,
(iii) A ∈ A ⇒ A^c ∈ A.
Definition 2. µ : A → [0,∞[ ∪ {∞} is a content (on A) if
(i) A is an algebra,
(ii) µ(∅) = 0,
(iii) µ is additive, i.e.,
A, B ∈ A disjoint ⇒ µ(A ∪ B) = µ(A) + µ(B).
Theorem 3. The following are equivalent for a content µ on A with µ(Ω) < ∞:
(i) A1, A2, ... ∈ A pairwise disjoint ∧ ⋃_{i=1}^∞ Ai ∈ A ⇒ µ(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ µ(Ai) (σ-additivity),
(ii) A1, A2, ... ∈ A ∧ ⋃_{i=1}^∞ Ai ∈ A ⇒ µ(⋃_{i=1}^∞ Ai) ≤ ∑_{i=1}^∞ µ(Ai) (σ-subadditivity),
(iii) A1, A2, ... ∈ A ∧ An ↑ A ∈ A ⇒ lim_{n→∞} µ(An) = µ(A) (σ-continuity from below),
(iv) A1, A2, ... ∈ A ∧ An ↓ A ∈ A ⇒ lim_{n→∞} µ(An) = µ(A) (σ-continuity from above),
(v) A1, A2, ... ∈ A ∧ An ↓ ∅ ⇒ lim_{n→∞} µ(An) = 0 (σ-continuity at ∅).
Theorem 4. For every σ-additive content µ on A with µ(Ω) < ∞,
∃ µ* measure on σ(A): µ*|A = µ.
10 The Factorization Lemma
Given: a set Ω1 ≠ ∅, a measurable space (Ω2,A2), and a mapping X : Ω1 → Ω2.
Theorem 1. For every Y : Ω1 → R̄,
Y is σ(X)-B̄-measurable ⇔ ∃ g : Ω2 → R̄ A2-B̄-measurable: Y = g ∘ X.
Literature
H. Bauer, Probability Theory, de Gruyter, Berlin, 1996.
P. Billingsley, Probability and Measure, Wiley, New York, 1st ed. 1979, 3rd ed. 1995.
Y. S. Chow, H. Teicher, Probability Theory, Springer, New York, 1st ed. 1978, 3rd ed.
1997.
J. Elstrodt, Maß- und Integrationstheorie, Springer, Berlin, 1st ed. 1996, 7th ed. 2011.
P. Ganssler, W. Stute, Wahrscheinlichkeitstheorie, Springer, Berlin, 1977.
A. Irle, Wahrscheinlichkeitstheorie und Statistik, Teubner, Stuttgart, 1st ed. 2001, 2nd
ed. 2005.
O. Kallenberg, Foundations of Modern Probability, Springer, New York, 2002.
A. Klenke, Wahrscheinlichkeitstheorie, Springer, Berlin, 1st ed. 2006, 3rd ed. 2013. [in
English: Probability Theory, Springer, Berlin, 2014.]
T. Muller-Gronbach, E. Novak, K. Ritter, Monte Carlo-Algorithmen, Springer, Berlin,
2012.
K. R. Parthasarathy, Probability Measures on Metric Spaces, Academic Press, New
York, 1967.
R. L. Schilling, Wahrscheinlichkeit, de Gruyter, Berlin, 2017.
R. L. Schilling, L. Partzsch, Brownian Motion, de Gruyter, Berlin, 1st ed. 2012, 2nd
ed. 2014.
A. N. Sirjaev, Wahrscheinlichkeit, Deutscher Verlag der Wissenschaften, Berlin, 1988.
[in English: Probability, Springer, New York, 1984.]
D. Williams, Probability with Martingales, Cambridge University Press, Cambridge,
1991.
J. Yeh, Martingales and Stochastic Analysis, World Scientific, Singapore, 1995.
Bauer (1996) is available via http://www.degruyter.com/viewbooktoc/product/3786
Elstrodt (2011), Irle (2005), Klenke (2013), and Muller-Gronbach et al. (2012) are available
via http://link.springer.com
Definitions
additivity, 88
algebra, 88
almost everywhere, 82
almost surely, 4
asymptotically negligible, 55
Bernoulli distribution, 5
binomial distribution, 5
Black-Scholes model, 75
Borel set, 80
Borel σ-algebra, 80
boundary value problem
elliptic PDE, 75
Brownian motion, 72
Markov property, 76
projective reflection, 76
scaling invariance, 76
symmetry, 76
time reversal, 76
canonical process, 69
Cauchy distribution, 7
characteristic function, 50
content, 88
convergence
almost everywhere, 84
in Lp, 84
in distribution, 10
in mean, 84
in mean-square, 84
in probability, 8
weak, 10
convolution, 35
counting measure, 79
covariance, 35
covariance matrix, 62
cylinder set, 68, 87
density, 84
Dirac measure, 79
discrete distribution, 4
distribution, 3
distribution function, 6
Dynkin class, 85
generated by a class of sets, 86
empirical distribution, 47
empirical distribution function, 47
essential supremum, 85
essentially bounded function, 85
event, 3
expectation, 7, 62
exponential distribution, 5
Feller condition, 55
Fourier transform
of a probability measure, 48
of an integrable function, 49
Fourier transformation, 49
geometric Brownian motion, 75
geometric distribution, 5
i.i.d, 37
identically distributed, 4
image measure, 79
independence
of a family of classes, 31
of a family of events, 30
of a family of processes, 66
of a family of random elements, 32
integrable function, 82
complex-valued, 48
integral, 82
of a complex-valued function, 48
of a non-negative function, 82
of a simple function, 81
joint distribution, 34
kernel, 27
σ-finite, 27
Lebesgue measure, 80
Levy distance, 16
limes inferior, 38
limes superior, 38
Lindeberg condition, 55
Lyapunov condition, 55
marginal distribution, 34
finite-dimensional, 66
Markov kernel, 22
mean, 62
measurable mapping, 79
measurable rectangle, 68, 86
measurable set, 78
measurable space, 78
measure, 78
σ-finite, 27
with density, 84
measure space, 78
Monte Carlo algorithm, 46
normal distribution
multi-dimensional, 49, 61
one-dimensional, 5
Poisson distribution, 5
positive semi-definite function, 49
probability density, 84
probability measure, 78
with density, 84
probability space, 78
product measurable space, 87
product measure, 30
product measure space, 30
product probability measure, 26
product probability space, 26
product σ-algebra, 87
quasi-integrable function, 82
random element, 3
random variable, 3
random vector, 3
random walk, 37
relatively compact set of measures, 17
section
of a mapping, 23
of a set, 23
σ-additivity, 78, 88
σ-algebra, 78
generated by a class of sets, 79
generated by a family of mappings, 66,
86
σ-continuity at ∅, 88
σ-continuity from above, 88
σ-continuity from below, 88
σ-subadditivity, 88
simple function, 81
square-integrable function, 84
standard deviation, 7
stochastic process, 65
distribution, 66, 68
parameter set, 65
realization, 65
state space, 65
with continuous paths, 66
with independent increments, 72
with stationary increments, 72
tail σ-algebra, 37
tail (terminal) event, 37
tightness, 17
unbiased estimator, 46
uncorrelated random variables, 35
uniform distribution
on a finite set, 5, 79
on a subset of Rk, 5
uniform integrability, 18
variance, 7
Wiener measure, 71
with probability one, 4