18: Central Limit Theorem
Lisa Yan
November 1, 2019

Page 1

18: Central Limit Theorem
Lisa Yan
November 1, 2019

Page 2

Lisa Yan, CS109, 2019

Beta random variable (Review)

def: A Beta random variable $X$ is defined as follows:

$X \sim \text{Beta}(a, b)$, with $a > 0$, $b > 0$

PDF: $f(x) = \frac{1}{B(a,b)}\, x^{a-1}(1-x)^{b-1}$,
where $B(a,b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx$ is the normalizing constant

Support of $X$: $(0, 1)$

Expectation: $E[X] = \frac{a}{a+b}$
Variance: $\text{Var}(X) = \frac{ab}{(a+b)^2(a+b+1)}$

Beta is a distribution for probabilities.
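The closed forms above can be checked numerically. Here is a minimal stdlib-only sketch (not course code) that integrates the Beta(8, 2) PDF against the midpoint rule and compares the result with the expectation and variance formulas:

```python
# Sanity check of the Beta(a, b) closed forms for E[X] and Var(X),
# using only the standard library (math.gamma gives B(a,b) = G(a)G(b)/G(a+b)).
import math

a, b = 8, 2
B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_pdf(x):
    return x**(a - 1) * (1 - x)**(b - 1) / B

mean = a / (a + b)                          # E[X] = a/(a+b)
var = a * b / ((a + b)**2 * (a + b + 1))    # Var(X) = ab/((a+b)^2 (a+b+1))

# Midpoint-rule integration of x*f(x) and x^2*f(x) over (0, 1).
N = 100_000
xs = [(i + 0.5) / N for i in range(N)]
m1 = sum(x * beta_pdf(x) for x in xs) / N
m2 = sum(x * x * beta_pdf(x) for x in xs) / N
print(mean, var)          # 0.8 and 16/1100 ≈ 0.01455
print(m1, m2 - m1 * m1)   # numerical moments should agree closely
```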

Page 3

🤔

[Plot: PDFs of Beta(2,8), Beta(5,5), and Beta(8,2) on (0, 1)]

CS109 focus: Beta where $a, b$ are both positive integers.

If $a, b$ are positive integers, the Beta parameters $a, b$ could come from an experiment:
$a$ = "successes" + 1
$b$ = "failures" + 1
$X \sim \text{Beta}(a, b)$

(Review)

Page 4

Back to flipping coins
• Start with $X \sim \text{Uni}(0,1)$ over probability
• Observe $n = 7$ successes and $m = 1$ failure
• Your new belief about the probability of $X$ is:

$f_{X|N}(x \mid n) = \frac{1}{c}\,x^7(1-x)^1$, where $c = \int_0^1 x^7(1-x)^1\,dx$

[Plot: posterior PDF $f_{X|N}(x \mid n)$ over $x$]

Posterior belief, $X|N$: $\text{Beta}(a = 8, b = 2)$

In general: $X|N \sim \text{Beta}(a = n+1, b = m+1)$, i.e.
$f_{X|N}(x \mid n) = \frac{1}{c}\,x^{a-1}(1-x)^{b-1}$

Page 5

Understanding Beta
• Start with $X \sim \text{Uni}(0,1)$ over probability
• Observe $n$ successes and $m$ failures
• Your new belief about the probability of $X$ is:

$X|N \sim \text{Beta}(a = n+1, b = m+1)$
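The update above can be reproduced numerically without any Beta machinery: put a uniform prior on a grid, multiply by the likelihood $x^7(1-x)$, and normalize. A sketch (my own illustrative code, assuming the 7-success/1-failure coin example):

```python
# Grid-based Bayes update for the coin example: uniform prior on x,
# likelihood x^7 (1 - x) for 7 successes and 1 failure. The normalized
# posterior should match the Beta(8, 2) density.
import math

N = 10_000
xs = [(i + 0.5) / N for i in range(N)]
prior = [1.0] * N                         # Uni(0,1) prior density
likelihood = [x**7 * (1 - x) for x in xs]
unnorm = [p * L for p, L in zip(prior, likelihood)]
c = sum(unnorm) / N                       # numerical normalizing constant
posterior = [u / c for u in unnorm]

# Beta(8, 2) density: x^7 (1-x) / B(8, 2), with B(8, 2) = G(8)G(2)/G(10) = 1/72
B = math.gamma(8) * math.gamma(2) / math.gamma(10)
target = [x**7 * (1 - x) / B for x in xs]
err = max(abs(p - t) for p, t in zip(posterior, target))
print(err)   # tiny: the grid posterior recovers Beta(8, 2)
```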

Page 6

Understanding Beta
• Start with $X \sim \text{Uni}(0,1)$ over probability
• Observe $n$ successes and $m$ failures
• Your new belief about the probability of $X$ is:

$X|N \sim \text{Beta}(a = n+1, b = m+1)$

Check this out: $\text{Beta}(a = 1, b = 1)$ has PDF:

$f(x) = \frac{1}{B(a,b)}\,x^{a-1}(1-x)^{b-1} = \frac{1}{B(1,1)}\,x^0(1-x)^0 = \frac{1}{\int_0^1 1\,dx} = 1$, where $0 < x < 1$

So our prior $X \sim \text{Uni}(0,1)$ is $\text{Beta}(a = 1, b = 1)$!

Page 7

If the prior is a Beta…
Let $X$ be our random variable for the probability of success and $N$ the number of successes observed.
• If our prior belief about $X$ is beta: $X \sim \text{Beta}(a, b)$
• …and if we observe $n$ successes and $m$ failures: $N|X \sim \text{Bin}(n + m, x)$ (likelihood)
• …then our posterior belief about $X$ is also beta: $X|N \sim \text{Beta}(a + n, b + m)$

👉 This is the main takeaway of Beta.

Page 8

If the prior is a Beta…
• Prior: $X \sim \text{Beta}(a, b)$
• Likelihood: $N|X \sim \text{Bin}(n + m, x)$
• Posterior: $X|N \sim \text{Beta}(a + n, b + m)$

Proof:
$f_{X|N}(x \mid n) = \frac{p_{N|X}(n \mid x)\, f_X(x)}{p_N(n)} = \frac{\binom{n+m}{n} x^n (1-x)^m \cdot \frac{1}{B(a,b)}\, x^{a-1}(1-x)^{b-1}}{p_N(n)}$
$= C \cdot x^n (1-x)^m \cdot x^{a-1}(1-x)^{b-1}$   (where $C$ collects the constants that don't depend on $x$)
$= C \cdot x^{n+a-1}(1-x)^{m+b-1}$ ✅

Page 9

If the prior is a Beta…
• Prior: $X \sim \text{Beta}(a, b)$
• Likelihood: $N|X \sim \text{Bin}(n + m, x)$
• Posterior: $X|N \sim \text{Beta}(a + n, b + m)$

Beta is a conjugate distribution.
• Prior and posterior parametric forms are the same
• Practically, conjugate means an easy update: add the number of "heads" and "tails" seen to the Beta parameters.
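The "easy update" really is just addition. A minimal sketch (function name is my own, not from the course):

```python
# The conjugate update in code: the posterior's parameters are the
# prior's parameters plus the observed counts.
def beta_update(a, b, successes, failures):
    """Beta(a, b) prior + binomial data -> Beta(a + successes, b + failures)."""
    return a + successes, b + failures

a, b = 1, 1                          # Beta(1,1) = Uni(0,1): no imaginary trials
a, b = beta_update(a, b, 7, 1)       # observe 7 heads, 1 tail
print(a, b)                          # (8, 2), the coin-flip posterior
a, b = beta_update(a, b, 3, 2)       # more data: just keep adding counts
print(a, b)                          # (11, 4)
```

Because the posterior is again a Beta, the same function can be applied repeatedly as data arrives, which is exactly what conjugacy buys you.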

Page 10

If the prior is a Beta…
• Prior: $X \sim \text{Beta}(a, b)$
• Likelihood: $N|X \sim \text{Bin}(n + m, x)$
• Posterior: $X|N \sim \text{Beta}(a + n, b + m)$

You can set the prior to reflect how biased you think the coin is a priori.
• This is a subjective probability!
• $X \sim \text{Beta}(a, b)$: we have seen $a + b - 2$ imaginary trials, where $a - 1$ were heads and $b - 1$ were tails
• Then $\text{Beta}(1, 1) = \text{Uni}(0, 1)$ means we haven't seen any imaginary trials

Page 11

If the prior is a Beta…
• Prior: $X \sim \text{Beta}(a = n_{\text{imag}} + 1,\; b = m_{\text{imag}} + 1)$
• Likelihood: $N|X \sim \text{Bin}(n + m, x)$
• Posterior: $X|N \sim \text{Beta}(a + n, b + m) = \text{Beta}(n_{\text{imag}} + n + 1,\; m_{\text{imag}} + m + 1)$

👉 This is the main takeaway of Beta.

Page 12

The enchanted die
Let $X$ be the probability of rolling a 6 on Lisa's die.
• Prior: imagine 5 die rolls where only 6 showed up
• Observation: roll it a few times…

What is the updated distribution of $X$ after our observation?

Check out the demo!

Prior: $\text{Beta}(a = n_{\text{imag}} + 1,\; b = m_{\text{imag}} + 1)$
Posterior: $\text{Beta}(a = n_{\text{imag}} + n + 1,\; b = m_{\text{imag}} + m + 1)$

Page 13

Medicinal Beta
• Before being tested, a medicine is believed to "work" 80% of the time.
• The medicine is tried on 20 patients.
• It "works" for 12, "doesn't work" for 8.

What is your new belief that the drug "works"?

Frequentist: Let $p$ be the probability your drug works. $p \approx \frac{12}{20} = 0.6$

👉 A frequentist view will not incorporate prior/expert belief about the probability.

Page 14

Medicinal Beta (continued)

Frequentist: Let $p$ be the probability your drug works. $p \approx \frac{12}{20} = 0.6$

Bayesian: Let $X$ be the probability your drug works. $X$ is a random variable.

Page 15

🤔 Medicinal Beta: What is the prior distribution of $X$? (select all that apply)

A. $X \sim \text{Beta}(1, 1) = \text{Uni}(0, 1)$
B. $X \sim \text{Beta}(81, 101)$
C. $X \sim \text{Beta}(80, 20)$
D. $X \sim \text{Beta}(81, 21)$
E. $X \sim \text{Beta}(5, 2)$

(Bayesian interpretation)

Page 16

🤔 Medicinal Beta: What is the prior distribution of $X$? (select all that apply)

A. $X \sim \text{Beta}(1, 1) = \text{Uni}(0, 1)$
B. $X \sim \text{Beta}(81, 101)$
C. $X \sim \text{Beta}(80, 20)$
D. $X \sim \text{Beta}(81, 21)$, interpretation: 80 successes / 100 imaginary trials
E. $X \sim \text{Beta}(5, 2)$, interpretation: 4 successes / 5 imaginary trials

(You can choose either; we choose E on the next slide.)

Page 17

🤔 Medicinal Beta
Prior: $X \sim \text{Beta}(a = 5, b = 2)$
Posterior: $X|N \sim \text{Beta}(a = 5 + 12,\; b = 2 + 8) = \text{Beta}(a = 17, b = 10)$

[Plot: prior Beta(5,2) and posterior Beta(17,10) PDFs on (0, 1)]

Page 18

🤔 Medicinal Beta
Prior: $X \sim \text{Beta}(a = 5, b = 2)$
Posterior: $X|N \sim \text{Beta}(a = 17, b = 10)$

What do you report to pharmacists?
A. Expectation of posterior
B. Mode of posterior
C. Distribution of posterior
D. Nothing

[Plot: prior and posterior PDFs, with the posterior mode marked]

Page 19

🤔 Medicinal Beta
Posterior: $X|N \sim \text{Beta}(a = 17, b = 10)$

$E[X] = \frac{a}{a+b} = \frac{17}{17+10} \approx 0.63$

$\text{mode}(X) = \frac{a-1}{a+b-2} = \frac{16}{16+9} = 0.64$

[Plot: prior and posterior PDFs, with the posterior mode marked]
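The two posterior summaries above reduce to a couple of lines of arithmetic. A small sketch of the computation for this example:

```python
# Posterior summaries for the medicine example: prior Beta(5, 2),
# then 12 "works" and 8 "doesn't work" observations.
a, b = 5 + 12, 2 + 8                    # posterior Beta(17, 10)
mean = a / (a + b)                      # E[X] = a/(a+b) = 17/27
mode = (a - 1) / (a + b - 2)            # mode = (a-1)/(a+b-2) = 16/25 (needs a, b > 1)
print(round(mean, 2), round(mode, 2))   # 0.63 0.64
```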

Page 20

Food for thought

In this lecture: if we don't know the parameter $p$ of $Y \sim \text{Ber}(p)$, Bayesian statisticians will:
• Treat the parameter as a random variable $X$ with a Beta prior distribution
• Perform an experiment
• Based on experiment outcomes, update the posterior distribution of $X$

🤔 Food for thought: any parameter of a "parameterized" random variable can be thought of as a random variable, e.g. $\mu$ and $\sigma^2$ in $Y \sim \mathcal{N}(\mu, \sigma^2)$.

Page 21

Today's plan
• Finish Beta
• Central Limit Theorem (CLT)
• CLT exercises

Page 22

(silent drumroll)

Page 23

Central Limit Theorem
Consider $n$ independent and identically distributed (i.i.d.) variables $X_1, X_2, \ldots, X_n$ with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$.

As $n \to \infty$: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$

The sum of $n$ i.i.d. random variables is normally distributed with mean $n\mu$ and variance $n\sigma^2$.
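The statement can be seen empirically. A minimal simulation sketch (die rolls as the i.i.d. variables, constants are my own choices): the sum of 50 rolls should have mean $50 \cdot 3.5$ and variance $50 \cdot 35/12$.

```python
# Empirical CLT check: the sum of n i.i.d. die rolls should have mean
# n*3.5 and variance n*35/12 and look roughly normal.
import random

random.seed(109)
n, trials = 50, 20_000
sums = [sum(random.randint(1, 6) for _ in range(n)) for _ in range(trials)]
m = sum(sums) / trials
v = sum((s - m)**2 for s in sums) / trials
print(m, v)   # close to 175 and 50*35/12 ≈ 145.8
```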

Page 24

True happiness

Page 25

Central Limit Theorem
Consider $n$ independent and identically distributed (i.i.d.) variables $X_1, X_2, \ldots, X_n$ with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$.

As $n \to \infty$: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$

The sum of $n$ i.i.d. random variables is normally distributed with mean $n\mu$ and variance $n\sigma^2$.

Page 26

🤔 Quick check
What dimensions are the following random variables?
1. $X_1$
2. $X_1, X_2, \ldots, X_n$
3. $\sum_{i=1}^{n} X_i$
4. $\frac{1}{n}\sum_{i=1}^{n} X_i$

A. 1-D random variable
B. $n$-D random variable (a vector)
C. not a random variable

Page 27

🤔 Quick check
What dimensions are the following random variables?
1. $X_1$: (A)
2. $X_1, X_2, \ldots, X_n$: (B) (aka a sample)
3. $\sum_{i=1}^{n} X_i$: (A)
4. $\frac{1}{n}\sum_{i=1}^{n} X_i$: (A) (aka the sample mean)

A. 1-D random variable
B. $n$-D random variable (a vector)
C. not a random variable

Page 28

Central Limit Theorem
Consider $n$ independent and identically distributed (i.i.d.) variables $X_1, X_2, \ldots, X_n$ with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$.

As $n \to \infty$: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$

The sum of $n$ i.i.d. random variables is normally distributed with mean $n\mu$ and variance $n\sigma^2$.

Page 29

i.i.d. random variables
Consider $n$ variables $X_1, X_2, \ldots, X_n$. They are independent and identically distributed (i.i.d.) if:
• $X_1, X_2, \ldots, X_n$ are independent, and
• all have the same PMF (if discrete) or PDF (if continuous).
⇒ $E[X_i] = \mu$ for $i = 1, \ldots, n$
⇒ $\text{Var}(X_i) = \sigma^2$ for $i = 1, \ldots, n$

Side note: multiple random variables $X_1, X_2, \ldots, X_n$ are independent iff
$P(X_1 \le x_1, X_2 \le x_2, \ldots, X_k \le x_k) = \prod_{i=1}^{k} P(X_i \le x_i)$ for all subsets $X_1, \ldots, X_k$

Same thing: i.i.d., iid, IID

Page 30

🤔 Quick check
Are $X_1, X_2, \ldots, X_n$ i.i.d. with the following distributions?
1. $X_i \sim \text{Exp}(\lambda)$, $X_i$ independent
2. $X_i \sim \text{Exp}(\lambda_i)$, $X_i$ independent
3. $X_i \sim \text{Exp}(\lambda)$, $X_1 = X_2 = \cdots = X_n$
4. $X_i \sim \text{Bin}(n_i, p)$, $X_i$ independent

Page 31

🤔 Quick check
Are $X_1, X_2, \ldots, X_n$ i.i.d. with the following distributions?
1. $X_i \sim \text{Exp}(\lambda)$, $X_i$ independent: yes
2. $X_i \sim \text{Exp}(\lambda_i)$, $X_i$ independent: no (unless the $\lambda_i$ are equal)
3. $X_i \sim \text{Exp}(\lambda)$, $X_1 = X_2 = \cdots = X_n$: no, dependent
4. $X_i \sim \text{Bin}(n_i, p)$, $X_i$ independent: no (unless the $n_i$ are equal); note the underlying Bernoulli RVs are i.i.d.!

Page 32

Central Limit Theorem
Consider $n$ independent and identically distributed (i.i.d.) variables $X_1, X_2, \ldots, X_n$ with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$.

As $n \to \infty$: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$

The sum of $n$ i.i.d. random variables is normally distributed with mean $n\mu$ and variance $n\sigma^2$.

(demo)

Page 33

Sum of dice rolls
Roll $n$ independent dice. Let $X_i$ be the outcome of roll $i$; the $X_i$ are i.i.d.

[Plots: PMFs of $\sum_{i=1}^{1} X_i$ (sum of 1 die roll), $\sum_{i=1}^{2} X_i$ (sum of 2 die rolls), and $\sum_{i=1}^{3} X_i$ (sum of 3 die rolls)]

How many ways can you roll a total of 3 vs. 11?
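The exact PMFs in those plots come from repeated convolution, and convolution also answers the 3-vs-11 question. A small sketch using exact rational arithmetic (my own illustrative code):

```python
# Exact PMFs of dice sums by convolution (fractions keep it exact),
# answering "how many ways to roll a total of 3 vs 11" with 3 dice.
from fractions import Fraction
from collections import Counter

die = {k: Fraction(1, 6) for k in range(1, 7)}

def convolve(p, q):
    out = Counter()
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] += pa * pb
    return dict(out)

two = convolve(die, die)
three = convolve(two, die)
print(three[3], three[11])   # 1/216 vs 27/216 = 1/8: 1 way vs 27 ways
```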

Page 34

CLT explains a lot
As $n \to \infty$: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$

Normal approximation of the Binomial: a sum of i.i.d. Bernoulli RVs ≈ Normal.

Proof:
Let $X_i \sim \text{Ber}(p)$ for $i = 1, \ldots, n$, where the $X_i$ are i.i.d., so $E[X_i] = p$ and $\text{Var}(X_i) = p(1-p)$.
$X = \sum_{i=1}^{n} X_i$, so $X \sim \text{Bin}(n, p)$.
By the CLT, as $n \to \infty$: $X \sim \mathcal{N}(n\mu, n\sigma^2)$
Substituting the mean and variance of the Bernoulli: $X \sim \mathcal{N}(np,\; np(1-p))$
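A quick numerical sketch of this approximation (parameters are my own choices; the continuity correction is covered later in the deck):

```python
# Normal approximation of Bin(n, p) as the CLT predicts, with a
# continuity correction, compared against the exact binomial tail.
import math

def phi(z):   # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, k = 100, 0.5, 60
exact = sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))
mu, var = n * p, n * p * (1 - p)
approx = 1 - phi((k - 0.5 - mu) / math.sqrt(var))   # P(X >= 60), corrected
print(round(exact, 4), round(approx, 4))
```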

Page 35

CLT explains a lot
As $n \to \infty$: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$

[Plots: distribution of $X_i$ vs. distribution of $\sum_{i=1}^{15} X_i$ (sample of size 15, sum the values)]

Page 36

CLT explains a lot
As $n \to \infty$: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$

[Plots: distribution of $X_i$ vs. distribution of $\frac{1}{15}\sum_{i=1}^{15} X_i$ (sample of size 15, average the values, i.e. the sample mean)]

Page 37

CLT explains a lot
As $n \to \infty$: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$

[Image: Galton Board with $n = 5$ rows (bins 0 through 5), by Sir Francis Galton (1822-1911)]

Page 38

Break for Friday / announcements

[Meme: BAYESIANS]

Page 39

Announcements

Midterm exam: it's done! Grades: Friday 11/1. Solutions: Friday 11/1.

Problem Set 4: due Wednesday 11/6. Covers up to the Law of Total Expectation.

Late day reminder: no late days permitted past the last day of the quarter, 12/7.

Page 40

Proof of CLT
As $n \to \infty$: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$

Proof (sketch; beyond the scope of CS109):
• The Fourier transform of a PDF is called a characteristic function.
• Take the characteristic function of the probability mass of the sample distance from the mean, divided by the standard deviation.
• Show that this approaches an exponential function in the limit as $n \to \infty$: $f(x) = e^{-x^2/2}$
• This function is in turn the characteristic function of the Standard Normal, $Z \sim \mathcal{N}(0, 1)$.

Page 41

Implications of CLT
Anything that is a sum or average of independent random variables is approximately normal, meaning in real life many things are normally distributed:
• Movie ratings: averages of independent viewer scores
• Polling:
  ◦ Ask 100 people if they will vote for candidate 1
  ◦ $p_1$ = # "yes" / 100
  ◦ Sum of Bernoulli RVs (each person independently says "yes" w.p. $p$)
  ◦ Repeat this process with different groups to get $p_1, \ldots, p_n$ (different sample statistics)
  ◦ Normal distribution over the sample means $p_i$
  ◦ Confidence interval: "How likely is it that an estimate for the true $p$ is close?"

Page 42

What about other functions?
Let $X_1, X_2, \ldots, X_n$ be i.i.d., where $E[X_i] = \mu$, $\text{Var}(X_i) = \sigma^2$. As $n \to \infty$:

Sum of i.i.d. RVs: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$
Average of i.i.d. RVs (sample mean): ?
Max of i.i.d. RVs: ?

Page 43

What about other functions?
Let $X_1, X_2, \ldots, X_n$ be i.i.d., where $E[X_i] = \mu$, $\text{Var}(X_i) = \sigma^2$. As $n \to \infty$:

Sum of i.i.d. RVs: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$
Average of i.i.d. RVs (sample mean): ?
Max of i.i.d. RVs: ?

Page 44

🤔 Distribution of sample mean
Let $X_1, X_2, \ldots, X_n$ be i.i.d., where $E[X_i] = \mu$, $\text{Var}(X_i) = \sigma^2$. As $n \to \infty$:

Define: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ (sample mean), $Y = \sum_{i=1}^{n} X_i$ (sum)

$Y \sim \mathcal{N}(n\mu, n\sigma^2)$ (CLT, as $n \to \infty$)
$\bar{X} = \frac{1}{n} Y$, so $\bar{X} \sim \mathcal{N}(\,?\,, \,?\,)$ (linear transform of a Normal)

A. $\bar{X} \sim \mathcal{N}(n\mu, n\sigma^2)$
B. $\bar{X} \sim \mathcal{N}(\frac{\mu}{n}, \frac{\sigma^2}{n})$
C. $\bar{X} \sim \mathcal{N}(\mu, \sigma^2)$
D. $\bar{X} \sim \mathcal{N}(\mu, \frac{\sigma^2}{n})$

Page 45

🤔 Distribution of sample mean
Let $X_1, X_2, \ldots, X_n$ be i.i.d., where $E[X_i] = \mu$, $\text{Var}(X_i) = \sigma^2$. As $n \to \infty$:

Define: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ (sample mean), $Y = \sum_{i=1}^{n} X_i$ (sum)

$Y \sim \mathcal{N}(n\mu, n\sigma^2)$ (CLT, as $n \to \infty$)
$\bar{X} = \frac{1}{n} Y$, so $\bar{X} \sim \mathcal{N}(\,?\,, \,?\,)$ (linear transform of a Normal)

A. $\bar{X} \sim \mathcal{N}(n\mu, n\sigma^2)$
B. $\bar{X} \sim \mathcal{N}(\frac{\mu}{n}, \frac{\sigma^2}{n})$
C. $\bar{X} \sim \mathcal{N}(\mu, \sigma^2)$
D. $\bar{X} \sim \mathcal{N}(\mu, \frac{\sigma^2}{n})$

Page 46

🤔 Distribution of sample mean
Let $X_1, X_2, \ldots, X_n$ be i.i.d., where $E[X_i] = \mu$, $\text{Var}(X_i) = \sigma^2$. As $n \to \infty$:

Define: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ (sample mean), $Y = \sum_{i=1}^{n} X_i$ (sum)

$Y \sim \mathcal{N}(n\mu, n\sigma^2)$ (CLT, as $n \to \infty$)
$\bar{X} = \frac{1}{n} Y \sim \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right)$ (linear transform of a Normal)

The average of $n$ i.i.d. random variables (i.e., the sample mean) is normally distributed with mean $\mu$ and variance $\sigma^2/n$:
$\frac{1}{n}\sum_{i=1}^{n} X_i \sim \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right)$

Demo: http://onlinestatbook.com/stat_sim/sampling_dist/
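The shrinking variance $\sigma^2/n$ is easy to see in simulation. A minimal sketch with Uni(0,1) draws, whose variance is $1/12$ (sample sizes are my own choices):

```python
# Sample mean of i.i.d. Uni(0,1) draws: the mean stays mu = 0.5 while
# the variance shrinks to sigma^2/n = (1/12)/n, as the boxed result says.
import random

random.seed(109)
n, trials = 100, 20_000
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]
m = sum(means) / trials
v = sum((x - m)**2 for x in means) / trials
print(m, v)   # close to 0.5 and (1/12)/100 ≈ 0.000833
```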

Page 47

What about other functions?
Let $X_1, X_2, \ldots, X_n$ be i.i.d., where $E[X_i] = \mu$, $\text{Var}(X_i) = \sigma^2$. As $n \to \infty$:

Sum of i.i.d. RVs: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$
Average of i.i.d. RVs (sample mean): $\frac{1}{n}\sum_{i=1}^{n} X_i \sim \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right)$
Max of i.i.d. RVs: Gumbel (see the Fisher-Tippett-Gnedenko theorem)

Page 48

Once upon a time…

Abraham de Moivre: CLT for $X \sim \text{Ber}(1/2)$, 1733

Aubrey Drake Graham (Drake)

Page 49

A short history of the CLT

1733: CLT for $X \sim \text{Ber}(1/2)$ postulated by Abraham de Moivre
1823: Pierre-Simon Laplace extends de Moivre's work to approximating $\text{Bin}(n, p)$ with the Normal
1901: Aleksandr Lyapunov provides a precise definition and rigorous proof of the CLT
2018: Drake releases Scorpion
• It was his 5th studio album, bringing his total # of songs to 190
• The mean quality of subsamples of songs is normally distributed (thanks to the Central Limit Theorem)

Page 50

Today's plan
• Central Limit Theorem (CLT)
• CLT exercises

Page 51

Working with the CLT
Let $X_1, X_2, \ldots, X_n$ be i.i.d., where $E[X_i] = \mu$, $\text{Var}(X_i) = \sigma^2$. As $n \to \infty$:

Sum of i.i.d. RVs: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$
Average of i.i.d. RVs (sample mean): $\frac{1}{n}\sum_{i=1}^{n} X_i \sim \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right)$

⚠ If the $X_i$ are discrete: use the continuity correction on $Y$!

Page 52

🤔 Dice game
You will roll 10 six-sided dice $X_1, X_2, \ldots, X_{10}$.
• Let $X = X_1 + X_2 + \cdots + X_{10}$, the total value of all 10 rolls.
• You win if $X \le 25$ or $X \ge 45$.
And now the truth (according to the CLT)…

1. Define RVs and state the goal.
$E[X_i] = 3.5$, $\text{Var}(X_i) = 35/12$
$X \approx Y \sim \mathcal{N}(10 \cdot 3.5,\; 10 \cdot 35/12)$
Want: $P(X \le 25 \text{ or } X \ge 45)$

2. Solve.
A. $P(25 \le Y \le 45)$
B. $P(Y \le 25.5) + P(Y \ge 44.5)$
C. $1 - P(25 \le Y \le 45)$
D. $1 - P(25.5 \le Y \le 44.5)$

Page 53

🤔 Dice game
$E[X_i] = 3.5$, $\text{Var}(X_i) = 35/12$, $X \approx Y \sim \mathcal{N}(10 \cdot 3.5,\; 10 \cdot 35/12)$
Want: $P(X \le 25 \text{ or } X \ge 45) \approx P(Y \le 25.5) + P(Y \ge 44.5)$

$P(Y \le 25.5) + P(Y \ge 44.5) = \Phi\!\left(\frac{25.5 - 35}{\sqrt{10 \cdot 35/12}}\right) + \left(1 - \Phi\!\left(\frac{44.5 - 35}{\sqrt{10 \cdot 35/12}}\right)\right)$
$\approx \Phi(-1.76) + \left(1 - \Phi(1.76)\right) \approx (1 - 0.9608) + (1 - 0.9608) = 0.0784$

Page 54

🤔 Dice game
$P(X \le 25 \text{ or } X \ge 45) \approx P(Y \le 25.5) + P(Y \ge 44.5) \approx 0.0784$ (by CLT)
$P(X \le 25 \text{ or } X \ge 45) \approx 0.0780$ (by computer)

[Plot: PMF of $X$ over 10 to 60 with the normal approximation overlaid]
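Both numbers can be reproduced in a few lines: the exact value by convolving ten die PMFs, the CLT value with the continuity-corrected normal tail. A sketch (my own illustrative code):

```python
# Dice game: compare the CLT answer with the exact probability,
# computed by convolving the PMF of 10 fair dice.
import math
from collections import Counter

def phi(z):   # standard normal CDF
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

pmf = {0: 1.0}
for _ in range(10):                 # fold in one die at a time
    nxt = Counter()
    for s, ps in pmf.items():
        for face in range(1, 7):
            nxt[s + face] += ps / 6
    pmf = dict(nxt)

exact = sum(ps for s, ps in pmf.items() if s <= 25 or s >= 45)

mu, var = 10 * 3.5, 10 * 35 / 12
clt = phi((25.5 - mu) / math.sqrt(var)) + (1 - phi((44.5 - mu) / math.sqrt(var)))
print(round(exact, 4), round(clt, 4))   # roughly 0.0780 (exact) vs the CLT value
```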

Page 55

Clock running time
Want to find the mean (clock) runtime of an algorithm, $\mu = t$ sec.
• Suppose the variance of the runtime is $\sigma^2 = 4$ sec².
How many trials do we need s.t. the estimated time $= t \pm 0.5$ with 95% certainty?

As $n \to \infty$: $\frac{1}{n}\sum_{i=1}^{n} X_i \sim \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right)$

1. Define RVs and state the goal.
Run the algorithm repeatedly (i.i.d. trials):
• $X_i$ = runtime of the $i$-th run (for $1 \le i \le n$)
• Estimate the runtime as the average of $n$ trials, $\bar{X}$

2. Solve.
$\bar{X} \sim \mathcal{N}\!\left(t, \frac{4}{n}\right)$ (CLT)
Want: $P(t - 0.5 \le \bar{X} \le t + 0.5) = 0.95$
$\bar{X} - t \sim \mathcal{N}\!\left(0, \frac{4}{n}\right)$ (linear transform of a normal)
$P(-0.5 \le \bar{X} - t \le 0.5) = 0.95$

Page 56

Clock running time (continued)

2. Solve.
$0.95 = P(-0.5 \le \bar{X} - t \le 0.5)$, where $\bar{X} - t \sim \mathcal{N}\!\left(0, \frac{4}{n}\right)$
$0.95 = F_{\bar{X}-t}(0.5) - F_{\bar{X}-t}(-0.5) = \Phi\!\left(\frac{0.5 - 0}{\sqrt{4/n}}\right) - \Phi\!\left(\frac{-0.5 - 0}{\sqrt{4/n}}\right) = 2\Phi\!\left(\frac{\sqrt{n}}{4}\right) - 1$

Page 57

Clock running time (continued)

$0.95 = 2\Phi\!\left(\frac{\sqrt{n}}{4}\right) - 1$
$0.975 = \Phi\!\left(\frac{\sqrt{n}}{4}\right)$
$\frac{\sqrt{n}}{4} = \Phi^{-1}(0.975) \approx 1.96 \;\Rightarrow\; n \approx 62$

Page 58

Wonderful form of cosmic order

"I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the '[Central Limit Theorem]'. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshalled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along."

– Sir Francis Galton (of the Galton Board)

Page 59

Extra slides: extra CLT exercises

Page 60

Crashing website
• Let $X$ = number of visitors to a website, where $X \sim \text{Poi}(100)$.
• The server crashes if there are ≥ 120 requests/minute.
What is $P(\text{server crashes in the next minute})$?

Strategy: Poisson (exact)
State goal: want $P(X \ge 120)$
Solve: $P(X \ge 120) = \sum_{k=120}^{\infty} \frac{100^k e^{-100}}{k!} \approx 0.0282$

Page 61

Crashing website (continued)

Strategy: Poisson (exact)
Want: $P(X \ge 120) = \sum_{k=120}^{\infty} \frac{100^k e^{-100}}{k!} \approx 0.0282$

Strategy: CLT (approximate)
State approx. goal: $\text{Poi}(100) \sim \sum_{i=1}^{n} \text{Poi}(100/n)$ (a sum of i.i.d. Poissons is Poisson), with $\mu = \frac{100}{n} = \sigma^2$
$X \approx Y \sim \mathcal{N}(n\mu, n\sigma^2) = \mathcal{N}(100, 100)$
Want: $P(X \ge 120) \approx P(Y \ge 119.5)$
Solve: $P(Y \ge 119.5) = 1 - \Phi\!\left(\frac{119.5 - 100}{\sqrt{100}}\right) = 1 - \Phi(1.95) \approx 0.0256$
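Both columns of this exercise fit in a few lines of stdlib Python; computing the Poisson PMF through logs keeps it numerically stable for large $k$. A sketch:

```python
# Crashing website: exact Poisson tail P(X >= 120) for X ~ Poi(100)
# vs the CLT approximation Y ~ N(100, 100) with continuity correction.
import math

def phi(z):   # standard normal CDF
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def poi_pmf(k, lam):   # stable Poisson pmf via logs
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

exact = 1 - sum(poi_pmf(k, 100) for k in range(120))   # P(X >= 120)
approx = 1 - phi((119.5 - 100) / math.sqrt(100))       # continuity-corrected
print(round(exact, 4), round(approx, 4))               # roughly 0.0282 vs 0.0256
```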