(probability & statistics) may 27, 2020 lesson 3...
TRANSCRIPT
Continuous Distributions &
Expectation, Variance, Moment…
May 27, 2020
来嶋 秀治 (Shuji Kijima)
Dept. Informatics,
Graduate School of ISEE
確率統計特論 (Probability & Statistics)
Lesson 3
0. Discrete distributions
review
terminology
Discrete distribution (離散分布)
a distribution on a countable set $\Omega \subseteq \mathbb{R}$ such that
$\sum_{x \in \Omega} \Pr[X = x] = 1$ holds
Probability function (確率関数)
$f(x) = \Pr[X = x]$
(cumulative) distribution function ((累積)分布関数)
$F(x) = \Pr[X \le x]$
(an important concept for continuous distributions; next week)
$X$ is called a “random variable (確率変数)”
discrete uniform (離散一様分布)
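$\Omega = \{1, 2, \dots, n\}$
$f(k) = \frac{1}{n}$ $(k \in \Omega)$
Ex. Roulette
$\Omega = \{0, 1, 2, \dots, 36\}$
$\mathcal{F} = 2^{\Omega}$
$\Pr[X = x] = \frac{1}{37}$ $(x \in \Omega)$
[Figure: roulette, https://en.wikipedia.org/wiki/Roulette]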
Bernoulli distr. (ベルヌーイ分布, 2点分布) $B(1; p)$ $(0 \le p \le 1)$
$\Omega = \{0, 1\}$
$f(k) = \begin{cases} p & (k = 1) \\ 1 - p & (k = 0) \end{cases}$
A Bernoulli trial (ベルヌーイ試行) is a random variable that follows the Bernoulli distribution.
Ex. (biased) coin tossing
head ($X = 1$)
tail ($X = 0$)
[Illustration from 「いらすとや」]
binomial distr. (2項分布) $B(n; p)$ $(n \in \mathbb{Z}_{>0},\ 0 \le p \le 1)$
$\Omega = \{0, 1, 2, \dots, n\}$
$f(k) = \binom{n}{k} p^k (1 - p)^{n - k}$ $(k \in \Omega)$
Ex. $n$ i.i.d. Bernoulli trials
Let $X_1, X_2, \dots, X_n$ be the outcomes of i.i.d. Bernoulli trials $B(1; p)$.
Let $X = X_1 + X_2 + \dots + X_n$, i.e., the total number of heads.
Then $X$ follows the binomial distribution $B(n; p)$.
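The claim that a sum of $n$ i.i.d. Bernoulli trials follows $B(n; p)$ can be checked exactly on a small case. The sketch below is illustrative only: the helper names `binomial_pmf` and `convolve` and the values $n = 5$, $p = 1/3$ are assumptions, not part of the lecture. It uses the standard fact that the pmf of a sum of independent variables is the convolution of their pmfs.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Return [f(0), ..., f(n)] with f(k) = C(n, k) p^k (1 - p)^(n - k)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(f, g):
    """pmf of the sum of two independent variables with pmfs f, g on 0, 1, 2, ..."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h


n, p = 5, Fraction(1, 3)        # hypothetical example values
bernoulli = [1 - p, p]          # f(0) = 1 - p, f(1) = p
sum_pmf = [Fraction(1)]         # pmf of the empty sum (always 0)
for _ in range(n):
    sum_pmf = convolve(sum_pmf, bernoulli)

# Exact rational arithmetic, so the comparison is not affected by rounding.
print(sum_pmf == binomial_pmf(n, p))
```

Exact fractions are used so that equality of the two pmfs can be tested directly instead of up to a tolerance.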
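binomial distr. (2項分布) $B(n; p)$
For any $(\Omega, 2^{\Omega}, \Pr)$, $\Pr$ must satisfy Kolmogorov's axioms.
Axiom (i) is easy to check. (iii) is ok.
We will check axiom (ii): $\Pr[\Omega] = 1$.
$\Pr[\Omega] = \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n - k} = (p + (1 - p))^n = 1$.
By the binomial thm.
$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n - k}$
(where $\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$; proof by induction)
Here $\Omega = \{0, 1, 2, \dots, n\}$ and $f(k) = \binom{n}{k} p^k (1 - p)^{n - k}$ $(k \in \Omega)$.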
geometric distr. (幾何分布) $\mathrm{Ge}(p)$ $(0 < p < 1)$
$\Omega = \{0, 1, 2, \dots\}$
$f(k) = (1 - p)^k p$ $(k \in \Omega)$
Ex.
Repeat i.i.d. Bernoulli trials $B(1; p)$ until the first head.
Let $K$ denote the number of tails before the first head;
then $K$ follows the geometric distribution $\mathrm{Ge}(p)$.
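Check Kolmogorov (ii)
For any $(\Omega, 2^{\Omega}, \Pr)$, $\Pr$ must satisfy Kolmogorov's axioms.
Axiom (i) is easy to check. (iii) is ok.
We will check axiom (ii): $\Pr[\Omega] = 1$.
$\Pr[\Omega] = \sum_{k=0}^{\infty} (1 - p)^k p = p \sum_{k=0}^{\infty} (1 - p)^k = p \cdot \lim_{n \to \infty} \frac{1 - (1 - p)^{n+1}}{1 - (1 - p)} = p \cdot \frac{1}{p} = 1$.
Recall $\sum_{k=0}^{n} x^k = \frac{1 - x^{n+1}}{1 - x}$ holds for $x \ne 1$,
since $1 - x^{n+1} = (1 - x)(1 + x + x^2 + x^3 + \dots + x^n)$.
Here $\Omega = \{0, 1, 2, \dots\}$ and $f(k) = (1 - p)^k p$ $(k \in \Omega)$.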
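The finite geometric sum used in the check of axiom (ii) can be watched converging numerically. This is a minimal sketch, assuming $p = 0.3$ (an arbitrary choice) and a hypothetical helper `geometric_pmf`; it compares partial sums of $f(k)$ with the closed form $1 - (1 - p)^{n+1}$.

```python
def geometric_pmf(k, p):
    """f(k) = (1 - p)^k p: probability of k tails before the first head."""
    return (1 - p) ** k * p


p = 0.3  # hypothetical example value
for n in (5, 20, 80):
    partial = sum(geometric_pmf(k, p) for k in range(n + 1))
    # p * (1 - x^(n+1)) / (1 - x) with x = 1 - p simplifies to 1 - (1 - p)^(n+1)
    closed_form = 1 - (1 - p) ** (n + 1)
    print(n, partial, closed_form)
```

Both columns approach 1 as $n$ grows, which is the statement $\Pr[\Omega] = 1$.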
Poisson distr. (ポアソン分布) $\mathrm{Po}(\lambda)$ $(\lambda > 0)$
$\Omega = \{0, 1, 2, \dots\}$
$f(z) = \frac{e^{-\lambda} \lambda^z}{z!}$ $(z \in \Omega)$
Ex. Rare events
Consider rare events whose expected number of occurrences per unit time is $\lambda$.
Let $X$ be the number of occurrences;
then $X$ is known to follow the Poisson distr. $\mathrm{Po}(\lambda)$.
More precisely, repeat $n$ i.i.d. Bernoulli trials $B(1; p)$ with $p \ll 1$.
Let $\lambda = np$; then it is known that $B(n; p) \simeq \mathrm{Po}(\lambda)$.
Today's Exercise 2: the Poisson distr. appears again later today.
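The approximation $B(n; p) \simeq \mathrm{Po}(np)$ for small $p$ can be inspected numerically. The sketch below is only an illustration under assumed values $n = 1000$, $p = 0.002$ (so $\lambda = 2$); the function names are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """f(k) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(z, lam):
    """f(z) = e^(-lam) lam^z / z!."""
    return exp(-lam) * lam**z / factorial(z)


n, p = 1000, 0.002   # hypothetical example: many trials, small success probability
lam = n * p

# Largest pointwise difference between the two pmfs on k = 0..20.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21))
print(max_gap)
```

The gap is small compared with the probabilities themselves, and it shrinks further if $n$ is increased while $\lambda = np$ is held fixed.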
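Check Kolmogorov (ii)
For any $(\Omega, 2^{\Omega}, \Pr)$, $\Pr$ must satisfy Kolmogorov's axioms.
Axiom (i) is easy to check. (iii) is ok.
We will check axiom (ii): $\Pr[\Omega] = 1$.
$\Pr[\Omega] = \sum_{z=0}^{\infty} \frac{e^{-\lambda} \lambda^z}{z!} = e^{-\lambda} \sum_{z=0}^{\infty} \frac{1}{z!} \lambda^z = e^{-\lambda} e^{\lambda} = 1$.
Recall $e^x = \sum_{k=0}^{\infty} \frac{1}{k!} x^k$ (by definition; cf. Taylor expansion).
Here $\Omega = \{0, 1, 2, \dots\}$ and $f(z) = \frac{e^{-\lambda} \lambda^z}{z!}$ $(z \in \Omega)$.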
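Discrete distr. (a distribution on a countable set $\Omega \subseteq \mathbb{R}$)
$\sum_{x \in \Omega} \Pr[X = x] = 1$ holds.
probability function (確率関数)
$f(x) = \Pr[X = x]$
(cumulative) distribution function ((累積)分布関数)
$F(x) = \Pr[X \le x]$
[Figure: step plot of the distribution function $F(x)$; horizontal axis $x$ with ticks 1 to 6, vertical axis P with ticks 1/6, 2/6, 3/6, 4/6, 5/6, 1]
Discrete Distribution Function $F\colon \mathbb{R} \to \mathbb{R}_{\ge 0}$
1. $F(-\infty) = 0$, $F(+\infty) = 1$
2. Monotone non-decreasing (単調非減少)
3. Right continuous (右連続)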
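The three properties can be seen on a concrete step function. The sketch below assumes the plotted example is a fair six-sided die (the axis ticks suggest this, but the slide text does not say so); `die_cdf` is a hypothetical helper name.

```python
from fractions import Fraction


def die_cdf(x):
    """F(x) = Pr[X <= x] for a fair six-sided die (assumed example)."""
    faces_at_most_x = sum(1 for face in range(1, 7) if face <= x)
    return Fraction(faces_at_most_x, 6)


# Limits at both ends: 0 far to the left, 1 far to the right.
print(die_cdf(-100), die_cdf(100))

# Right continuity at the jump x = 3: the value at 3 already includes the jump,
# so it agrees with values slightly to the right, not with values to the left.
print(die_cdf(2.999), die_cdf(3), die_cdf(3.001))
```

The jump at each face $k$ has height $f(k) = 1/6$, which is how the probability function is recovered from $F$.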
1. (univariate) continuous distributions
Continuous roulette
$\Omega = \{\theta \mid 0 \le \theta < 2\pi\}$
$\mathcal{F} = 2^{\Omega}$
$\Pr[X = \theta] = ?$ $(\theta \in \Omega)$
$\Pr[X = \frac{\pi}{4}] = ?$
$\Pr[X = \frac{\pi}{4}] = 0$ ???
(continuous) uniform distr.
$\Omega = [0, 2\pi)$
$\Pr[X = \frac{\pi}{4}] = 0$ ???
$\Pr[X \le \frac{\pi}{4}] = \frac{1}{8}$
The cumulative distribution function seems appropriate.
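continuous distr. (a distribution on an uncountable set $\Omega \subseteq \mathbb{R}$)
probability density function (確率密度関数)
$f(x) = \frac{\mathrm{d}}{\mathrm{d}x} F(x)$
(cumulative) distribution function ((累積)分布関数)
$F(x) = \Pr[X \le x]$: differentiable (continuous)
[Figure: plot of a distribution function $F(x)$; horizontal axis $x$, vertical axis P with tick 1]
Continuous Distribution Function $F\colon \mathbb{R} \to \mathbb{R}_{\ge 0}$
1. $F(-\infty) = 0$, $F(+\infty) = 1$
2. Monotone non-decreasing (単調非減少)
3. Differentiable* (微分可能)
*in the effective domain.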
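Uniform distr. (一様分布) $U(a, b)$
$\Omega = [a, b]$
$f(x) = \frac{1}{b - a}$ $(a \le x \le b)$
$F(x) = \frac{x - a}{b - a}$ $(a \le x \le b)$
Ex. continuous roulette
$\Omega = [0, 2\pi)$
$\mathcal{F} = 2^{\Omega}$
$F(x) = \frac{x}{2\pi}$ $(x \in \Omega)$
$f(x) = \frac{1}{2\pi}$ $(x \in \Omega)$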
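As a quick check of the roulette value $\Pr[X \le \pi/4] = 1/8$, the uniform density and distribution function can be written out directly. This is a minimal sketch; the function names are hypothetical, and the values outside $[a, b]$ (0 to the left, 1 to the right for $F$) are the usual extension, not stated on the slide.

```python
import math


def uniform_pdf(x, a, b):
    """f(x) = 1 / (b - a) on [a, b], and 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a, b):
    """F(x) = (x - a) / (b - a) on [a, b]; 0 to the left, 1 to the right."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


# Continuous roulette: angle uniform on [0, 2*pi).
quarter = uniform_cdf(math.pi / 4, 0.0, 2 * math.pi)
print(quarter)
```

A single point such as $\theta = \pi/4$ has probability 0, while the interval up to $\pi/4$ has probability equal to its share of the circle.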
Uniform distr. (一様分布) $U(a, b)$
[Figures: density function and distribution function of $U(a, b)$, https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)]
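Normal distr. (正規分布) $N(\mu, \sigma^2)$
$\Omega = (-\infty, \infty)$
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right)$ $(-\infty < x < \infty)$
[Figures: density function and distribution function, https://en.wikipedia.org/wiki/Normal_distribution]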
Exponential distr. (指数分布) $\mathrm{Ex}(\lambda)$ $(\lambda > 0)$
$\Omega = [0, \infty)$
$f(x) = \lambda e^{-\lambda x}$ $(x \ge 0)$
[Figures: density function and distribution function, https://en.wikipedia.org/wiki/Exponential_distribution]
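The slide gives only the density of the exponential distribution. Integrating it yields the standard closed form $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$, which is not written on the slide. The sketch below compares that closed form with a numerical integral of $f$; the midpoint-rule helper `integrate` and the values $\lambda = 1.5$, $x = 2$ are assumptions for illustration.

```python
import math


def exponential_pdf(x, lam):
    """f(x) = lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exponential_cdf(x, lam):
    """Closed form F(x) = 1 - exp(-lam * x) for x >= 0 (standard result)."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


lam, x = 1.5, 2.0   # hypothetical example values
numeric = integrate(lambda t: exponential_pdf(t, lam), 0.0, x)
print(numeric, exponential_cdf(x, lam))
```

This is the relation $f(x) = \frac{\mathrm{d}}{\mathrm{d}x} F(x)$ read in the other direction: $F$ is the integral of the density.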
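Gamma distr. (ガンマ分布) $G(\alpha, \nu)$ $(\alpha > 0,\ \nu > 0)$
$\Omega = [0, \infty)$
$f(x) = \frac{1}{\Gamma(\nu)} \alpha^{\nu} x^{\nu - 1} e^{-\alpha x}$ $(x \ge 0)$
where
$\Gamma(\nu) = \int_{0}^{\infty} t^{\nu - 1} e^{-t}\,\mathrm{d}t$
remark that
$\Gamma(1) = 1$
$\Gamma(\nu) = (\nu - 1)\,\Gamma(\nu - 1)$ $(\nu > 1)$
$\Gamma(\nu) = (\nu - 1)!$ $(\nu = 1, 2, \dots)$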
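The remarks on $\Gamma$ can be spot-checked with the gamma function from Python's standard library. This is only a numerical illustration; the test values (integers 1 to 10 and the non-integer $\nu = 3.7$) are arbitrary choices.

```python
import math

# Gamma(nu) = (nu - 1)! for nu = 1, 2, ...
factorial_check = all(
    math.isclose(math.gamma(nu), math.factorial(nu - 1)) for nu in range(1, 11)
)

# Gamma(nu) = (nu - 1) * Gamma(nu - 1), here at a non-integer argument.
nu = 3.7
recurrence_check = math.isclose(math.gamma(nu), (nu - 1) * math.gamma(nu - 1))

print(factorial_check, recurrence_check)
```

The factorial identity follows from the recurrence together with $\Gamma(1) = 1$.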
Some Distributions
Discrete distributions
(1) Bernoulli $B(1, p)$
(2) Binomial $B(n, p)$
    number of heads when tossing $n$ coins.
(3) Geometric $\mathrm{Ge}(p)$
    number of tails before the first head.
(4) Poisson $\mathrm{Po}(\lambda)$
Continuous distributions
(1) Uniform $U(a, b)$
(2) Exponential $\mathrm{Ex}(\lambda)$
(3) Normal $N(\mu, \sigma^2)$
(4) Beta $\mathrm{Be}(\cdot, \cdot)$
(5) Gamma $G(\cdot, k)$
2. Multivariate distr., and i.i.d.
Distribution of random variables $X$ and $Y$ on $(\Omega, \mathcal{F}, P)$.
Ex1. two dice.
$\Omega = \{(1,1), (1,2), \dots, (6,5), (6,6)\}$
$X$ = sum of the two casts
$Y$ = product of the two casts
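The two-dice example is small enough to enumerate. The sketch below builds the joint probability function of (sum, product) and the joint distribution function $\Pr[X \le x, Y \le y]$ for it, assuming fair dice (the slide does not state this explicitly); the names `joint` and `joint_cdf` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 outcomes (d1, d2), assumed equally likely.
omega = list(product(range(1, 7), repeat=2))

# joint[(x, y)] = Pr[X = x, Y = y] with X = sum and Y = product of the casts.
joint = Counter()
for d1, d2 in omega:
    joint[(d1 + d2, d1 * d2)] += Fraction(1, len(omega))


def joint_cdf(x, y):
    """F(x, y) = Pr[X <= x, Y <= y]."""
    return sum(pr for (s, m), pr in joint.items() if s <= x and m <= y)


print(joint[(7, 12)])    # outcomes (3, 4) and (4, 3)
print(joint_cdf(4, 3))   # outcomes (1,1), (1,2), (2,1), (1,3), (3,1)
```

Both $X$ and $Y$ are functions of the same outcome, so knowing one restricts the other; the joint table records exactly which pairs can occur together.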
Multivariate distribution
multivariate discrete distribution
distr. fnc.: $F(x, y) := \Pr[(X, Y) \le (x, y)] = \Pr[X \le x,\ Y \le y]$
pmf: $f(x, y) := \Pr[(X, Y) = (x, y)] = \Pr[X = x,\ Y = y]$
multivariate continuous distribution
distr. fnc.: $F(x, y) := \Pr[(X, Y) \le (x, y)] = \Pr[X \le x,\ Y \le y]$
pdf: $f(x, y) := \frac{\partial^2}{\partial x\,\partial y} F(x, y)$
i.i.d. (独立同一分布)
$X$ and $Y$ are independent (独立) if
$F_{XY}(x, y) = F_X(x) F_Y(y)$.
Prop. If $X, Y$ are independent, then $f_{XY}(x, y) = f_X(x) f_Y(y)$.
$X, Y$ are identically distributed (同一分布に従う) if $f_X \equiv f_Y$.
$X, Y$ are independent and identically distributed
(i.i.d.; 独立同一分布) if both conditions hold.
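Prop.
$F_{XY}(x, y) = F_X(x) F_Y(y)$ $\Rightarrow$ $f_{XY}(x, y) = f_X(x) f_Y(y)$.
Proof.
$f_{XY}(x, y) := \frac{\partial^2}{\partial x\,\partial y} F_{XY}(x, y)$
$= \frac{\partial^2}{\partial x\,\partial y} \bigl(F_X(x) F_Y(y)\bigr)$
$= \frac{\partial}{\partial x}\Bigl(\bigl(\tfrac{\partial}{\partial y} F_X(x)\bigr) F_Y(y)\Bigr) + \frac{\partial}{\partial x}\Bigl(F_X(x)\,\tfrac{\partial}{\partial y} F_Y(y)\Bigr)$
$= 0 + \frac{\partial}{\partial x}\bigl(F_X(x) f_Y(y)\bigr)$
$= \bigl(\tfrac{\partial}{\partial x} F_X(x)\bigr) f_Y(y) + F_X(x)\,\tfrac{\partial}{\partial x} f_Y(y)$
$= f_X(x) f_Y(y)$
(the last step uses $\tfrac{\partial}{\partial x} f_Y(y) = 0$)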
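The proposition is stated for densities; its discrete analogue, $f_{XY}(x, y) = f_X(x) f_Y(y)$ for probability functions, can be tested by enumeration. The sketch below is an illustration under the assumption of two fair dice; `joint_pmf` and `factorizes` are hypothetical helper names.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def joint_pmf(g):
    """Joint pmf of (U, V) = g(d1, d2) for two fair dice (assumed example)."""
    pmf = defaultdict(Fraction)
    for d1, d2 in product(range(1, 7), repeat=2):
        pmf[g(d1, d2)] += Fraction(1, 36)
    return dict(pmf)


def factorizes(pmf):
    """True iff f(u, v) = f_U(u) * f_V(v) for every pair of marginal values."""
    f_u, f_v = defaultdict(Fraction), defaultdict(Fraction)
    for (u, v), pr in pmf.items():
        f_u[u] += pr
        f_v[v] += pr
    return all(
        pmf.get((u, v), Fraction(0)) == f_u[u] * f_v[v] for u in f_u for v in f_v
    )


faces = joint_pmf(lambda d1, d2: (d1, d2))                  # the two casts themselves
sum_and_product = joint_pmf(lambda d1, d2: (d1 + d2, d1 * d2))

print(factorizes(faces), factorizes(sum_and_product))
```

The two casts factorize, while (sum, product) does not: for instance the pair (2, 36) has joint probability 0 although both marginal probabilities are positive.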
3.1. Expectation
Expectation of discrete random variable
The expectation (期待値) of a discrete random variable $X$ is defined by
$E[X] = \sum_{x \in \Omega} x \cdot f(x)$
only when the right-hand side converges absolutely (絶対収束),
i.e., $\sum_{x \in \Omega} |x| \cdot f(x) < \infty$ holds.
Otherwise, we say “the expectation does not exist.”
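Why absolute convergence?
Let $a_n = (-1)^n$ for $n = 0, 1, 2, \dots$.
Q. What is $\sum_{n=0}^{\infty} a_n$?
A1. $\sum_{n=0}^{\infty} a_n = 1 + (-1) + 1 + (-1) + \dots = \sum_{k=0}^{\infty} (a_{2k} + a_{2k+1}) = \sum_{k=0}^{\infty} (1 + (-1)) = 0$
A2. $\sum_{n=0}^{\infty} a_n = a_0 + \sum_{n=1}^{\infty} a_n = 1 + \sum_{k=0}^{\infty} (a_{2k+1} + a_{2k+2}) = 1 + \sum_{k=0}^{\infty} ((-1) + 1) = 1$
A3. $\sum_{n=0}^{\infty} a_n = a_1 + \sum_{k=0}^{\infty} (a_{2k} + a_{2k+3}) = -1 + \sum_{k=0}^{\infty} (1 + (-1)) = -1$
A4. $\sum_{n=0}^{\infty} a_n = a_0 + a_2 + \sum_{k=0}^{\infty} (a_{2k+1} + a_{2k+4}) = 1 + 1 + \sum_{k=0}^{\infty} ((-1) + 1) = 2$
A. $\sum_{n=0}^{\infty} a_n$ is not (well-)defined.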
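The four groupings A1 to A4 can be replayed with finitely many terms. The sketch below is a small illustration (the truncation length `K = 1000` is an arbitrary assumption): each grouping pairs a $+1$ with a $-1$, so every bracket is 0 and only the terms left outside the brackets survive.

```python
def a(n):
    """a_n = (-1)^n."""
    return (-1) ** n


K = 1000  # number of brackets summed in each grouping (arbitrary)

a1 = sum(a(2 * k) + a(2 * k + 1) for k in range(K))
a2 = a(0) + sum(a(2 * k + 1) + a(2 * k + 2) for k in range(K))
a3 = a(1) + sum(a(2 * k) + a(2 * k + 3) for k in range(K))
a4 = a(0) + a(2) + sum(a(2 * k + 1) + a(2 * k + 4) for k in range(K))

print(a1, a2, a3, a4)

# The series of absolute values grows without bound: its N-th partial sum is N.
print(sum(abs(a(n)) for n in range(K)))
```

Because $\sum |a_n|$ diverges, the value depends on how the terms are grouped; requiring absolute convergence in the definition of $E[X]$ rules this out.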
Expectation
The expectation (期待値) of a discrete random variable $X$ is defined by
$E[X] = \sum_{x \in \Omega} x \cdot f(x)$
only when the right-hand side converges absolutely (絶対収束),
i.e., $\sum_{x \in \Omega} |x| \cdot f(x) < \infty$ holds.
Otherwise, we say “the expectation does not exist.”
The expectation (期待値) of a continuous random variable $X$ is defined by
$E[X] = \int_{-\infty}^{\infty} x \cdot f(x)\,\mathrm{d}x$
only when the right-hand side converges absolutely (絶対収束),
i.e., $\int_{-\infty}^{\infty} |x| \cdot f(x)\,\mathrm{d}x < \infty$ holds.
Otherwise, we say “the expectation does not exist.”
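Both definitions can be tried on distributions from earlier slides. This is a minimal sketch: the discrete case uses exact fractions for a fair die (an assumed example) and the roulette on $\{0, \dots, 36\}$, and the continuous case approximates the integral for $U(a, b)$ with a midpoint rule. The helper names and the values $a = 2$, $b = 5$ are hypothetical. All supports here are bounded, so absolute convergence is automatic.

```python
from fractions import Fraction


def expectation_discrete(pmf):
    """E[X] = sum over x of x * f(x), for a pmf given as {x: f(x)}."""
    return sum(x * pr for x, pr in pmf.items())


def expectation_continuous(pdf, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of x * f(x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += x * pdf(x)
    return h * total


die = {k: Fraction(1, 6) for k in range(1, 7)}
roulette = {k: Fraction(1, 37) for k in range(37)}
print(expectation_discrete(die), expectation_discrete(roulette))

a, b = 2.0, 5.0  # hypothetical endpoints of U(a, b)
mean_uniform = expectation_continuous(lambda x: 1.0 / (b - a), a, b)
print(mean_uniform)
```

For $U(a, b)$ the numerical value agrees with the midpoint $(a + b)/2$ of the interval.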
Compute expectations of distributions
*Ex 2.
Discrete
(*i) Bernoulli distribution $B(1, p)$.
(*ii) Binomial distribution $B(n, p)$.
(iii) Geometric distribution $\mathrm{Ge}(p)$.
(iv) Poisson distribution $\mathrm{Po}(\lambda)$.
Continuous
(v) Exponential distribution $\mathrm{Ex}(\alpha)$.
(vi) Normal distribution $N(\mu, \sigma^2)$.
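Ex. Expectation of binomial distr.
Thm.
The expectation of $X \sim B(n, p)$ is $np$.
proof
$E[X] = \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n - k} = \sum_{k=0}^{n} k \frac{n!}{k!\,(n - k)!} p^k (1 - p)^{n - k}$
$= \sum_{k=1}^{n} k \frac{n!}{k!\,(n - k)!} p^k (1 - p)^{n - k}$
$= \sum_{k=1}^{n} \frac{n!}{(k - 1)!\,(n - k)!} p^k (1 - p)^{n - k}$
$= \sum_{k=1}^{n} np \frac{(n - 1)!}{(k - 1)!\,(n - k)!} p^{k - 1} (1 - p)^{n - k}$
$= np \sum_{k'=0}^{n-1} \binom{n - 1}{k'} p^{k'} (1 - p)^{n - 1 - k'}$
$= np$
(the last sum is $\Pr[\Omega] = 1$ for $B(n - 1; p)$)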
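The identity $E[X] = np$ can be confirmed exactly for particular parameters by summing $k \cdot f(k)$ with rational arithmetic. This sketch is only a spot check; the helper name `binomial_mean` and the tested $(n, p)$ pairs are arbitrary assumptions.

```python
from fractions import Fraction
from math import comb


def binomial_mean(n, p):
    """Sum of k * C(n, k) p^k (1 - p)^(n - k) over k = 0..n."""
    return sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))


cases = [(1, Fraction(1, 2)), (7, Fraction(1, 3)), (20, Fraction(9, 10))]
for n, p in cases:
    print(n, p, binomial_mean(n, p), n * p)
```

The two printed values coincide in every case, as the theorem states.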
Ex. Expectation of Geom. distr.
Thm.
The expectation of $X \sim \mathrm{Ge}(p)$ is $\frac{1 - p}{p}$.
Proof
$E[X] = 0 \cdot p + 1 \cdot (1 - p) p + 2 (1 - p)^2 p + 3 (1 - p)^3 p + \dots$
$(1 - p) E[X] = 0 \cdot (1 - p) p + 1 \cdot (1 - p)^2 p + 2 (1 - p)^3 p + \dots$
Subtracting the second line from the first gives
$p E[X] = (1 - p) p + (1 - p)^2 p + (1 - p)^3 p + \dots = \frac{(1 - p) p}{1 - (1 - p)} = 1 - p$
Thus $E[X] = \frac{1 - p}{p}$.
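A truncated version of the series gives a numerical check of $E[X] = (1 - p)/p$. The sketch assumes $p = 0.25$ and 500 terms (both arbitrary); the helper name is hypothetical.

```python
def geometric_mean_truncated(p, terms):
    """Partial sum of k * (1 - p)^k * p over k = 0..terms-1."""
    return sum(k * (1 - p) ** k * p for k in range(terms))


p = 0.25  # hypothetical example value
approx = geometric_mean_truncated(p, 500)
exact = (1 - p) / p
print(approx, exact)
```

The neglected tail is of order $(1 - p)^{500}$, so the truncated sum is indistinguishable from the closed form in floating point.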
Properties of Expectations
Thm.
For an arbitrary constant $c$,
$E[c] = c$
$E[cX] = c \cdot E[X]$
$E[X + c] = E[X] + c$
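Linearity of expectations (discrete random variables)
Thm. (linearity of expectation; 期待値の線形性)
$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$
proof. (for two variables; the general case follows by induction)
$E[X + Y]$
$= \sum_x \sum_y (x + y) \Pr[[X = x] \cap [Y = y]]$
$= \sum_x \sum_y (x + y) f(x, y)$
$= \sum_x \sum_y x f(x, y) + \sum_x \sum_y y f(x, y)$
$= \sum_x x \sum_y f(x, y) + \sum_y y \sum_x f(x, y)$
$= \sum_x x f_X(x) + \sum_y y f_Y(y)$
$= E[X] + E[Y]$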
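Nothing in the proof uses independence. As an illustration, take two dice with $X$ the sum and $Y$ the product of the casts, which are clearly dependent; the sketch below (assuming fair dice, with hypothetical variable names) verifies $E[X + Y] = E[X] + E[Y]$ exactly.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs (assumed fair)
pr = Fraction(1, len(outcomes))

e_x = sum((d1 + d2) * pr for d1, d2 in outcomes)                  # X = sum
e_y = sum((d1 * d2) * pr for d1, d2 in outcomes)                  # Y = product
e_x_plus_y = sum((d1 + d2 + d1 * d2) * pr for d1, d2 in outcomes)

print(e_x, e_y, e_x_plus_y, e_x + e_y)
```

The marginal sums in the proof, $\sum_y f(x, y) = f_X(x)$ and $\sum_x f(x, y) = f_Y(y)$, hold for any joint probability function, which is why the equality survives dependence.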
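Linearity of expectations (continuous random variables)
Thm. (linearity of expectation; 期待値の線形性)
$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$
proof. (for two variables; the general case follows by induction)
$E[X + Y]$
$= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x + y) f(x, y)\,\mathrm{d}x\,\mathrm{d}y$
$= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x f(x, y)\,\mathrm{d}x\,\mathrm{d}y + \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} y f(x, y)\,\mathrm{d}x\,\mathrm{d}y$
$= \int_{-\infty}^{+\infty} x \left(\int_{-\infty}^{+\infty} f(x, y)\,\mathrm{d}y\right) \mathrm{d}x + \int_{-\infty}^{+\infty} y \left(\int_{-\infty}^{+\infty} f(x, y)\,\mathrm{d}x\right) \mathrm{d}y$
$= \int_{-\infty}^{+\infty} x f_X(x)\,\mathrm{d}x + \int_{-\infty}^{+\infty} y f_Y(y)\,\mathrm{d}y$
$= E[X] + E[Y]$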
Application of linearity of expectation
Thm.
The expectation of $X \sim B(n; p)$ is $np$.
proof
Suppose $X_1, \dots, X_n$ are i.i.d. $B(1; p)$;
then $Y := X_1 + \dots + X_n$ follows $B(n; p)$.
$E[X_i] = 1 \cdot p + 0 \cdot (1 - p) = p$
$E[Y] = E\left[\sum_i X_i\right] = \sum_i E[X_i] = \sum_i p = np$