Conditional Prob. & Discrete Distrib.
May 20, 2020
来嶋 秀治 (Shuji Kijima)
Dept. Informatics, ISEE
Today’s topics
• Bayes’ theorem
• Probability distributions
• Discrete distributions and expectations
確率統計特論 (Probability & Statistics)
lesson 2
Today’s question: Boy or Girl
Question 1.
Desmond and Molly have two kids. One is a boy.
What is the probability that the other is a girl?
        Elder  Younger
Case 1  Boy    Boy
Case 2  Boy    Girl
Case 3  Girl   Boy
Case 4  Girl   Girl
Conditional Probability
Today’s topic 1
Terminology
Def. 1. Joint probability (同時確率 or 結合確率)
$\Pr(A, B) = \Pr(A \cap B)$
Def. 2. Conditional probability (条件付き確率)
$\Pr(A \mid B) = \frac{\Pr(A, B)}{\Pr(B)}$
Def. 3. Events $A$ and $B$ are independent (独立) if
$\Pr(A, B) = \Pr(A) \Pr(B)$
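Events $A_1, A_2, \dots, A_k$ are mutually independent (相互に独立) if
$\Pr\left(\bigcap_{i \in I} A_i\right) = \prod_{i \in I} \Pr(A_i)$ for every nonempty $I \subseteq \{1, \dots, k\}$
Events $A_1, A_2, \dots, A_k$ are pairwise independent (対ごとに独立) if
$\Pr(A_i, A_j) = \Pr(A_i) \Pr(A_j)$ for any distinct $i, j$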
see ex. 1.
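As a small illustration of how pairwise and mutual independence can differ, the sketch below enumerates two fair coin tosses. The events A, B, C are a standard textbook choice made here for illustration; they are an assumption and may differ from the lecture's ex. 1.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin tosses, each outcome has probability 1/4.
omega = list(product("HT", repeat=2))

def pr(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event), len(omega))

# Illustrative events (assumed here, not taken from the lecture):
A = {w for w in omega if w[0] == "H"}   # first coin is heads
B = {w for w in omega if w[1] == "H"}   # second coin is heads
C = {w for w in omega if w[0] == w[1]}  # both coins show the same face

pairwise = all(pr(X & Y) == pr(X) * pr(Y) for X, Y in [(A, B), (A, C), (B, C)])
mutual = pr(A & B & C) == pr(A) * pr(B) * pr(C)

print(pairwise)  # True
print(mutual)    # False: Pr(A, B, C) = 1/4, but the product is 1/8
```

Exact fractions are used so that the equality checks are not affected by floating-point rounding.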
Tossing coins (Independence)
Suppose we toss two coins.
Head probability of coin A is 0.5.
Head probability of coin B is 0.5.
The probability of two heads:
$\Pr(\mathrm{H}, \mathrm{H}) = \Pr(\mathrm{H}) \Pr(\mathrm{H}) = \frac{1}{4}$
Tossing coins (Independence)
Suppose we toss two coins.
Head probability of coin A is 0.6.
Head probability of coin B is 0.7.
The probability of two heads:
$\Pr(\mathrm{H}, \mathrm{H}) = \Pr(\mathrm{H}) \Pr(\mathrm{H}) = 0.6 \times 0.7 = 0.42$
         B: H   B: T   Prob.
A: H     0.42   0.18   0.6
A: T     0.28   0.12   0.4
Prob.    0.7    0.3
Tossing coins (Dependence)
Two coins are made of magnets.
Head probability of coin A is 0.5.
Head probability of coin B is 0.5.
[Figure: two magnet coins with N and S poles, and a piece of iron]
The probability of two heads: does $\Pr(\mathrm{H}, \mathrm{H}) = \Pr(\mathrm{H}) \Pr(\mathrm{H})$ hold for the magnet coins?
         B: H   B: T   Prob.
A: H     0.05   0.45   0.5
A: T     0.45   0.05   0.5
Prob.    0.5    0.5
No: $\Pr(\mathrm{H}, \mathrm{H}) = 0.05 \neq 0.25 = \Pr(\mathrm{H}) \Pr(\mathrm{H})$, so the two tosses are dependent.
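The check in the two coin tables can be sketched in code: compute both marginals from the joint table and compare every cell with the product of its marginals. The function name `is_independent` and the tolerance are illustrative choices, not part of the lecture.

```python
import math

def is_independent(joint, tol=1e-9):
    """Check Pr(a, b) == Pr(a) * Pr(b) for every cell of a joint table.

    joint[i][j] is the probability that coin A shows face i and coin B shows face j.
    """
    row_marg = [sum(row) for row in joint]        # marginal of coin A
    col_marg = [sum(col) for col in zip(*joint)]  # marginal of coin B
    return all(
        math.isclose(joint[i][j], row_marg[i] * col_marg[j], abs_tol=tol)
        for i in range(len(joint))
        for j in range(len(joint[0]))
    )

biased_coins = [[0.42, 0.18], [0.28, 0.12]]  # coins with head prob. 0.6 and 0.7
magnet_coins = [[0.05, 0.45], [0.45, 0.05]]  # magnet coins

print(is_independent(biased_coins))  # True
print(is_independent(magnet_coins))  # False: 0.05 != 0.5 * 0.5
```

A tolerance is used because the table entries are floating-point numbers; with exact fractions a plain equality test would do.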
Independence test
          Good (early healing)   No good   Total
Med.      28                     22        50
Placebo   13                     37        50
Total     41                     59        100
Does $\Pr(\text{med.}, \text{good}) = \Pr(\text{med.}) \Pr(\text{good})$ hold?
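To make the question concrete, the following sketch turns the counts in the table into empirical frequencies and compares the joint frequency with the product of the marginals. The variable names are illustrative. Whether the gap is larger than chance alone would explain is a question for a statistical test, which this sketch does not perform.

```python
# Observed counts from the table: keys are (treatment, outcome).
counts = {
    ("med", "good"): 28, ("med", "no good"): 22,
    ("placebo", "good"): 13, ("placebo", "no good"): 37,
}
total = sum(counts.values())  # 100 patients

# Empirical marginal and joint frequencies.
pr_med = sum(v for (t, _), v in counts.items() if t == "med") / total
pr_good = sum(v for (_, o), v in counts.items() if o == "good") / total
pr_med_good = counts[("med", "good")] / total

print(pr_med_good)       # 0.28
print(pr_med * pr_good)  # 0.205, the value expected under independence
```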
Conditional probability of independent events
Prop.
If $A, B$ are independent then $\Pr(A \mid B) = \Pr(A)$.
Proof:
$\Pr(A \mid B) = \frac{\Pr(A, B)}{\Pr(B)}$ …(*) by definition.
If $A, B$ are independent then $\Pr(A, B) = \Pr(A) \Pr(B)$ …(**)
by the definition of “independent”.
By (*) and (**), $\Pr(A \mid B) = \frac{\Pr(A) \Pr(B)}{\Pr(B)} = \Pr(A)$, which is the claim.
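Bayes’ theorem
Thm. (Bayes; ベイズ)
$\Pr(A \mid B) = \frac{\Pr(B \mid A) \Pr(A)}{\Pr(B)}$
Proof:
Note $\Pr(A \mid B) = \frac{\Pr(A, B)}{\Pr(B)}$ implies $\Pr(A, B) = \Pr(A \mid B) \Pr(B)$ …(*).
Note $\Pr(B \mid A) = \frac{\Pr(A, B)}{\Pr(A)}$ implies $\Pr(A, B) = \Pr(B \mid A) \Pr(A)$ …(**).
(*) and (**) imply $\Pr(A \mid B) \Pr(B) = \Pr(B \mid A) \Pr(A)$ …(***).
Dividing both sides of (***) by $\Pr(B)$ gives the claim.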
Conditional Probability
There are two boxes $A$, $B$:
• Look-alike
• Can’t see inside
• White balls (w) and black balls (b) are inside:
  6w and 4b in $A$,
  2w and 8b in $B$.
You choose one box uniformly at random, and pick a ball from the box. Suppose the ball is w.
Conditional probability
$\Pr(w \mid A) = \frac{\Pr(w, A)}{\Pr(A)} = \frac{\frac{1}{2} \times 0.6}{\frac{1}{2}} = 0.6$
Conditional Probability
Same two boxes $A$, $B$ as on the previous slide; the ball drawn is w.
Bayes’ probability
$\Pr(A \mid w) = \frac{\Pr(w \mid A) \Pr(A)}{\Pr(w)} = \frac{0.6 \times \frac{1}{2}}{\frac{8}{20}} = \frac{3}{4}$
where $\Pr(w) = \frac{1}{2} \times 0.6 + \frac{1}{2} \times 0.2 = \frac{8}{20}$.
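The arithmetic of the two-box example can be checked with exact fractions. This is a minimal sketch; the dictionary names are illustrative.

```python
from fractions import Fraction

pr_box = {"A": Fraction(1, 2), "B": Fraction(1, 2)}            # box chosen uniformly
pr_white_given = {"A": Fraction(6, 10), "B": Fraction(2, 10)}  # Pr(w | box)

# Marginal probability of drawing a white ball.
pr_white = sum(pr_white_given[b] * pr_box[b] for b in pr_box)

# Bayes' theorem: Pr(A | w) = Pr(w | A) Pr(A) / Pr(w).
pr_a_given_white = pr_white_given["A"] * pr_box["A"] / pr_white

print(pr_white)          # 2/5
print(pr_a_given_white)  # 3/4
```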
Today’s question: Boy or Girl
Question 1.
Desmond and Molly have two kids. One is a boy.
What is the probability that the other is a girl?
        Elder  Younger  Prob.
Case 1  Boy    Boy      1/4
Case 2  Boy    Girl     1/4
Case 3  Girl   Boy      1/4
Case 4  Girl   Girl     1/4
Let $B$ = “at least one kid is a boy” (Cases 1 to 3) and $G$ = “at least one kid is a girl”. Then
$\Pr(G \mid B) = \frac{\Pr(B, G)}{\Pr(B)} = \frac{2/4}{3/4} = \frac{2}{3}$
By Bayes’ thm.
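The same answer can be obtained by brute-force enumeration of the four equally likely cases, reading “one is a boy” as “at least one is a boy”, which is the reading the table uses. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

# The four equally likely (elder, younger) cases from the table.
cases = list(product(["Boy", "Girl"], repeat=2))

has_boy = [c for c in cases if "Boy" in c]          # condition: at least one boy
boy_and_girl = [c for c in has_boy if "Girl" in c]  # ... and the other is a girl

answer = Fraction(len(boy_and_girl), len(has_boy))
print(answer)  # 2/3
```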
Bayes’ theorem (general)
Thm. (Bayes; ベイズ)
Suppose $A_1, \dots, A_k$ are mutually exclusive and $\bigcup_{i=1}^{k} A_i = \Omega$. Then
$\Pr(A_i \mid B) = \frac{\Pr(B \mid A_i) \Pr(A_i)}{\sum_{j=1}^{k} \Pr(B \mid A_j) \Pr(A_j)}$
Prop.
Suppose $A_1, \dots, A_k$ are mutually exclusive and $\bigcup_{i=1}^{k} A_i = \Omega$. Then
$\Pr(B) = \sum_{i=1}^{k} \Pr(A_i, B)$
The right-hand side is called the marginal distribution.
Proof of Prop.:
Remark that $\sum_{i=1}^{k} \Pr(A_i, B) = \Pr(A_1 \cap B) + \Pr(A_2 \cap B) + \dots + \Pr(A_k \cap B)$.
Note that $A_i \cap B$ for $i = 1, \dots, k$ are mutually exclusive
(since $A_1, \dots, A_k$ are mutually exclusive and by set theory).
Thus, $\sum_{i=1}^{k} \Pr(A_i \cap B) = \Pr\left(\bigcup_{i=1}^{k} (A_i \cap B)\right)$ …(*) by Axiom (3).
By set theory, $\bigcup_{i=1}^{k} (A_i \cap B) = \left(\bigcup_{i=1}^{k} A_i\right) \cap B$ …(**) (distributive property).
The hypothesis $\bigcup_{i=1}^{k} A_i = \Omega$ implies $\left(\bigcup_{i=1}^{k} A_i\right) \cap B = B$ …(***) (by set theory).
(*), (**) and (***) imply $\sum_{i=1}^{k} \Pr(A_i \cap B) = \Pr(B)$, and we obtain the claim.
Proof of Thm. (Bayes, general):
$\Pr(A_i \mid B) = \frac{\Pr(B \mid A_i) \Pr(A_i)}{\Pr(B)}$ …(*)
by Bayes’ theorem for two events $A_i$ and $B$.
We remark $\Pr(B \mid A_j) \Pr(A_j) = \Pr(B, A_j)$ for $j = 1, \dots, k$. …(**)
By Prop. and (**), $\Pr(B) = \sum_{j=1}^{k} \Pr(A_j, B) = \sum_{j=1}^{k} \Pr(B \mid A_j) \Pr(A_j)$; substituting this into (*) gives the claim.
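The general form of Bayes’ theorem translates directly into a small function: multiply each prior by its likelihood, sum the products to get the marginal, and normalize. The sketch below is illustrative (the name `posterior` is an assumption); the sample numbers are those of the three-door example, Ex 1.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return [Pr(A_i | B)] from priors Pr(A_i) and likelihoods Pr(B | A_i).

    Assumes A_1, ..., A_k are mutually exclusive and cover the sample space,
    and that Pr(B) > 0.
    """
    joint = [l * p for l, p in zip(likelihoods, priors)]  # Pr(B, A_i)
    pr_b = sum(joint)                                     # marginal Pr(B)
    return [j / pr_b for j in joint]

# Three equally likely hypotheses with likelihoods 1/2, 1 and 0.
post = posterior(
    priors=[Fraction(1, 3)] * 3,
    likelihoods=[Fraction(1, 2), Fraction(1), Fraction(0)],
)
print(post)  # [Fraction(1, 3), Fraction(2, 3), Fraction(0, 1)]
```

The posteriors always sum to 1 because the denominator is exactly the sum of the numerators.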
Ex 1. Monty Hall problem (Ask Marilyn)
You are given the choice of three doors:
Behind one door is a car; behind the others, goats.
You pick a door, say A.
The host (Monty), who knows what's behind the doors,
opens another door, say C, which he knows has a goat.
He then says to you, "Do you want to pick door B?"
Question
Is it to your advantage to switch your choice?
Figure: from the Wikipedia article “モンティーホール問題” (Monty Hall problem)
Ex 1. Monty Hall problem: solution
Let $A^*$, $B^*$, $C^*$ denote the events that the car is behind door A, B, C, respectively, and let $C$ denote the event that Monty opens door C after you pick A.
$\Pr(B^* \mid C) = \frac{\Pr(C \mid B^*) \Pr(B^*)}{\Pr(C \mid A^*) \Pr(A^*) + \Pr(C \mid B^*) \Pr(B^*) + \Pr(C \mid C^*) \Pr(C^*)} = \frac{1 \times \frac{1}{3}}{\frac{1}{2} \times \frac{1}{3} + 1 \times \frac{1}{3} + 0 \times \frac{1}{3}} = \frac{2}{3}$
By Bayes’ thm. Hence switching to door B wins the car with probability 2/3.
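The value 2/3 can also be checked empirically. The sketch below simulates the game under the usual assumptions: the car is placed uniformly at random, the player starts with door A, and Monty picks at random when two goat doors are available. It estimates the overall win rate of each strategy, which by symmetry matches the conditional probability computed above. The function name, seed and number of rounds are illustrative.

```python
import random

def play(switch, rng):
    """Play one round; return True if the player wins the car."""
    doors = ["A", "B", "C"]
    car = rng.choice(doors)
    pick = "A"  # the player always starts with door A, as in the example
    # Monty opens a goat door that is not the player's pick
    # (chosen at random when he has two options).
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 100_000
switch_rate = sum(play(True, rng) for _ in range(n)) / n
stay_rate = sum(play(False, rng) for _ in range(n)) / n
print(switch_rate, stay_rate)  # roughly 2/3 and 1/3
```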
Discrete Distributions
Probability on $\mathbb{R}^n$
Today’s topic 2
“variable” vs “random variable”
Ex. 1. Set $\Omega$
$\Omega = \{1, 2, 3, 4, 5, 6\}$
Let $x$ be a member of the set $\Omega$.
Observation
$x \in \Omega$
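Def. random variable
Ex. 1. die $(\Omega, \mathcal{F}, P)$
$\Omega = \{1, 2, 3, 4, 5, 6\}$
$\mathcal{F} = 2^{\Omega}$
$P(A) = \frac{|A|}{6}$ for any $A \subseteq \Omega$.
Let $X$ denote the “cast” of $(\Omega, \mathcal{F}, P)$; $X$ is called a random variable (usually denoted by CAPITALS).
Observation
$X \in \Omega$ ($\in \mathcal{F}$ in fact)
$P(X \text{ is odd}) = \frac{1}{2}$
$P(X < 5) = \frac{2}{3}$, etc.
Note
A random variable may not be a member of $\mathcal{F}$,
e.g., let $Y :=$ square of the cast, where there is a map from $\mathcal{F}$. (see regime)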
Terminology
Discrete distribution (離散分布)
a distribution on a countable set $\Xi \subseteq \mathbb{R}$ such that
$\sum_{x \in \Xi} \Pr(X = x) = 1$ holds
Note: $\Xi$ may not be $\Omega$ (cf. ex. 6).
$X$ is called a “random variable (確率変数)”.
Probability function (確率関数)
$f(x) = \Pr(X = x)$
(Cumulative) distribution function ((累積)分布関数)
$F(x) = \Pr(X \le x)$
(an important concept for continuous distributions; next week)
(univariate) discrete distributions
uniform dist. (離散一様分布)
Bernoulli dist. (ベルヌーイ分布; 2点分布)
binomial dist. (2項分布)
geometric dist. (幾何分布)
Poisson dist. (ポアソン分布)
discrete uniform (離散一様分布)
$\Omega = \{1, 2, \dots, n\}$
$\Pr(X = i) = \frac{1}{n}$
Example: roulette
$\Omega = \{0, 1, 2, \dots, 36\}$
$\mathcal{F} = 2^{\Omega}$
$\Pr(X = x) = \frac{1}{37}$ ($x \in \Omega$)
https://en.wikipedia.org/wiki/Roulette
Bernoulli (ベルヌーイ分布, 2点分布) B(1; p)
$\Omega = \{0, 1\}$
$\Pr(X = 1) = p$
$\Pr(X = 0) = 1 - p$
An experiment whose outcome is a random variable following a Bernoulli distribution is called a Bernoulli trial (ベルヌーイ試行).
Example: (biased) coin tossing
head ($X = 1$)
tail ($X = 0$)
binomial dist. (2項分布) B(n; p)
$\Omega = \{0, 1, 2, \dots, n\}$
$\Pr(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$
Let $X_1, X_2, \dots, X_n$ be i.i.d. outputs of Bernoulli trials B(1; p).
Let $X = X_1 + X_2 + \dots + X_n$,
i.e., the total number of heads.
Then $X$ follows the binomial distribution B(n; p).
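The binomial probability function can be evaluated directly from its formula. A minimal sketch, with parameters n = 10 and p = 0.5 chosen only for illustration:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Pr(X = k) for X ~ B(n; p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # illustrative parameters (assumed)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(pmf[5])    # 0.24609375  (= 252 / 1024)
print(sum(pmf))  # 1.0
```

`math.comb` requires Python 3.8 or later.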
geometric dist. (幾何分布) Ge(p)
$\Omega = \{0, 1, 2, \dots\}$
$\Pr(X = k) = (1 - p)^k p$
Repeat i.i.d. Bernoulli trials B(1; p) until the first head.
Let $X$ denote the number of tails before the first head;
then $X$ follows the geometric distribution Ge(p).
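The description above (repeat Bernoulli trials until the first head and count the tails) can be simulated and compared with the formula $(1 - p)^k p$. The head probability, seed and sample size below are illustrative assumptions.

```python
import random

def tails_before_head(p, rng):
    """Repeat Bernoulli trials with head probability p; count tails before the first head."""
    k = 0
    while rng.random() >= p:  # tail with probability 1 - p
        k += 1
    return k

p = 0.3  # illustrative head probability (assumed)
rng = random.Random(0)
n = 100_000
samples = [tails_before_head(p, rng) for _ in range(n)]

# Compare empirical frequencies with the geometric probability function.
for k in range(4):
    empirical = samples.count(k) / n
    exact = (1 - p) ** k * p
    print(k, round(empirical, 3), round(exact, 3))
```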
Poisson dist. (ポアソン分布) Po($\lambda$) ($\lambda > 0$)
$\Omega = \{0, 1, 2, \dots\}$
$\Pr(X = z) = \frac{e^{-\lambda} \lambda^z}{z!}$
Consider rare events whose expected number of occurrences in a unit time is $\lambda$.
Let $X$ be the number of occurrences;
then $X$ is known to follow the Poisson distribution Po($\lambda$).
More precisely, repeat Bernoulli trials B(1; p) i.i.d. $n$ times with $p \ll 1$.
Let $\lambda = np$; then it is known that B(n; p) $\simeq$ Po($\lambda$).
Today’s Exercise 2: the Poisson distribution appears later today.
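The approximation B(n; p) ≃ Po(λ) with λ = np can be observed numerically. The sketch below compares the two probability functions for n = 1000 and p = 0.002; these values are chosen only for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Pr(X = k) for X ~ B(n; p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(z, lam):
    """Pr(X = z) for X ~ Po(lam)."""
    return exp(-lam) * lam**z / factorial(z)

n, p = 1000, 0.002  # illustrative values with p << 1 (assumed)
lam = n * p         # expected number of occurrences

# Print z, the binomial probability and the Poisson probability side by side.
for z in range(6):
    print(z, round(binomial_pmf(z, n, p), 4), round(poisson_pmf(z, lam), 4))
```

The two columns should agree closely; the agreement typically worsens as p grows with λ held fixed.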
Discrete distr.: (distr. on a countable subset of $\mathbb{R}$)
$\sum_{x \in \Omega} \Pr(X = x) = 1$ holds.
probability function (確率関数)
$f(x) = \Pr(X = x)$
(cumulative) distribution function ((累積)分布関数)
$F(x) = \Pr(X \le x)$
[Figure: plot of $F(x)$ against $x = 1, \dots, 6$; vertical axis ticks at 1/6, 2/6, 3/6, 4/6, 5/6, 1]
Discrete Distribution Function $F: \Omega \to \mathbb{R}_{\ge 0}$
1. $F(-\infty) = 0$, $F(+\infty) = 1$
2. Monotone non-decreasing (単調非減少)
3. Right continuous (右連続)
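The probability function and the distribution function of a fair die can be sketched as follows. The die is used here as an assumed example, matching the tick values 1/6, …, 5/6 in the plot; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in values}  # f(x) = Pr(X = x) for a fair die

# F(x) = Pr(X <= x), evaluated at the support points via running sums of f.
cdf_at = dict(zip(values, accumulate(pmf[x] for x in values)))

def F(x):
    """Cumulative distribution function on the whole real line."""
    below = [v for v in values if v <= x]
    return cdf_at[below[-1]] if below else Fraction(0)

print(F(0), F(3), F(3.5), F(6), F(100))  # 0 1/2 1/2 1 1
```

The output reflects the three properties: F is 0 far to the left and 1 far to the right, it never decreases, and at a jump point such as x = 3 it already takes the higher value (right continuity).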