20: maximum likelihood estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11...
TRANSCRIPT
![Page 1: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/1.jpg)
20: Maximum Likelihood EstimationLisa YanMay 20, 2020
1
![Page 2: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/2.jpg)
Lisa Yan, CS109, 2020
Quick slide reference
2
3 Intro to parameter estimation 20a_intro
14 Maximum Likelihood Estimator 20b_mle
21 argmax and log-likelihood 20c_argmax
30 MLE: Bernoulli 20d_mle_bernoulli
42 MLE exercises: Poisson, Uniform, Gaussian LIVE
![Page 3: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/3.jpg)
Intro to parameter estimation
3
20a_intro
![Page 4: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/4.jpg)
Lisa Yan, CS109, 2020
Story so farAt this point:
If you are given a model with all thenecessary probabilities, you canmake predictions.
But what if you want to learn the probabilities in the model?
What if you want to learn the structure of the model, too?
Machine Learning4
(I wishβ¦another day)
π~Poi 5
π!, β¦ , π" i.i.d.π~Ber 0.2 ,π = β π#"
#$!
![Page 5: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/5.jpg)
Lisa Yan, CS109, 2020
ML: Rooted in probability theory
Artificial Intelligence
Machine Learning
Deep Learning
AI and Machine Learning
![Page 6: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/6.jpg)
Lisa Yan, CS109, 2020
Tensor Flow
Alright, so Deep Learning now?
Not so fastβ¦
![Page 7: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/7.jpg)
Lisa Yan, CS109, 2020
![Page 8: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/8.jpg)
Lisa Yan, CS109, 2020 8
![Page 9: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/9.jpg)
Lisa Yan, CS109, 2020
Once upon a timeβ¦
9
β¦there was parameter estimation.
![Page 10: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/10.jpg)
Lisa Yan, CS109, 2020
Recall some estimators π!, π", β¦ , π# are π i.i.d. random variables,where π$ drawn from distribution πΉ with πΈ π$ = π, Var π$ = π".
Sample mean:
10
π+ =1π
- π$
#
$%!
unbiased estimate of π
Sample variance: π" =1
π β 1- π$ β π+ "
#
$%!
unbiased estimate of π"
![Page 11: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/11.jpg)
Lisa Yan, CS109, 2020
What are parameters?def Many random variables we have learned so far are parametric models:
Distribution = model + parameter πex The distribution Ber 0.2
For each of the distributions below, what is the parameter π?
1. Ber π2. Poi π3. Uni πΌ, π½4. π©(π, π!)5. π = ππ + π
11
π = π
= Bernoulli model, parameter π = 0.2.
π€
![Page 12: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/12.jpg)
Lisa Yan, CS109, 2020
What are parameters?def Many random variables we have learned so far are parametric models:
Distribution = model + parameter πex The distribution Ber 0.2
For each of the distributions below, what is the parameter π?
1. Ber π2. Poi π3. Uni πΌ, π½4. π©(π, π!)5. π = ππ + π
12
= Bernoulli model, parameter π = 0.2.
π = ππ = ππ = πΌ, π½
π = π, ππ = π, π"
π is the parameter of a distribution.π can be a vector of parameters!
![Page 13: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/13.jpg)
Lisa Yan, CS109, 2020
Why do we care?In real world, we donβt know the βtrueβ parameters.β’ But we do get to observe data:
def estimator π: : random variable estimating parameter π from data.
In parameter estimation,We use the point estimate of parameter estimate (best single value):β’ Better understanding of the process producing dataβ’ Future predictions based on modelβ’ Simulation of future processes
13
(# times coin comes up heads, lifetimes of disk drives produced, # visitors to website per day, etc.)
![Page 14: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/14.jpg)
Maximum Likelihood Estimator
14
20b_mle
![Page 15: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/15.jpg)
Lisa Yan, CS109, 2020
Defining the likelihood of data: BernoulliConsider a sample of π i.i.d. random variables π!, π", β¦ , π#.β’ π$ was drawn from distribution πΉ = Ber π with unknown parameter π.β’ Observed data:
0, 0, 1, 1, 1, 1, 1, 1, 1, 1
How likely was the observed data if π = 0.4?
π sample|π = 0.4 = 0.4 & 0.6 " = 0.000236
15
(π = 10)
Likelihood of datagiven parameter π = 0.4 Is there a better
parameter π?
![Page 16: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/16.jpg)
Lisa Yan, CS109, 2020
Defining the likelihood of dataConsider a sample of π i.i.d. random variables π!, π", β¦ , π#.β’ π$ was drawn from a distribution with density function π π$|π .β’ Observed data: π!, π", β¦ , π#
Likelihood question:How likely is the observed data π!, π", β¦ , π# given parameter π?
Likelihood function, πΏ π :
16
This is just a product, since π# are i.i.d.
or mass
= B π π$|π#
$%!
πΏ π = π π!, π", β¦ , π#|π
![Page 17: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/17.jpg)
Lisa Yan, CS109, 2020
Defining the likelihood of data
17
πΏ π = B π π$|π#
$%!
![Page 18: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/18.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood EstimatorConsider a sample of π i.i.d. random variables π!, π", β¦ , π#, drawn from a distribution π π$|π .def The Maximum Likelihood Estimator (MLE) of π is the value of π that
maximizes πΏ π .
18
π!"# = arg max$
πΏ π
![Page 19: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/19.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood EstimatorConsider a sample of π i.i.d. random variables π!, π", β¦ , π#, drawn from a distribution π π$|π .def The Maximum Likelihood Estimator (MLE) of π is the value of π that
maximizes πΏ π .
19
π!"# = arg max$
πΏ π
πΏ π = B π π$|π#
$%!
Likelihood of your sample
For continuous π$, π π$|π is PDF; for discrete π$, π π$|π is PMF
![Page 20: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/20.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood EstimatorConsider a sample of π i.i.d. random variables π!, π", β¦ , π#, drawn from a distribution π π$|π .def The Maximum Likelihood Estimator (MLE) of π is the value of π that
maximizes πΏ π .
20
π!"# = arg max$
πΏ π
The argument πthat maximizes πΏ π
Stay tuned!
![Page 21: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/21.jpg)
argmax
21
20c_argmax
![Page 22: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/22.jpg)
Lisa Yan, CS109, 2020
New function: arg max
1. max'
π π₯ ?
2. arg max'
π π₯ ?
22
arg max%
π π₯ The argument π₯ thatmaximizes the function π π₯ .
Let π π₯ = βπ₯" + 4, where β2 < π₯ < 2.
0
1
2
3
4
-2 -1 0 1 2
π π₯
π₯ π€
![Page 23: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/23.jpg)
Lisa Yan, CS109, 2020
New function: arg max
1. max'
π π₯ ?
2. arg max'
π π₯ ?
23
arg max%
π π₯ The argument π₯ thatmaximizes the function π π₯ .
Let π π₯ = βπ₯" + 4, where β2 < π₯ < 2.
0
1
2
3
4
-2 -1 0 1 2
π π₯
π₯
= 4
= 0
![Page 24: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/24.jpg)
Lisa Yan, CS109, 2020
Argmax and log
24
arg max%
π π₯
Let π π₯ = βπ₯" + 4, where β2 < π₯ < 2.
0
1
2
3
4
-2 -1 0 1 2
π π₯
π₯arg max
' π π₯ = 0
-8
-4
0
4
-2 -1 0 1 2
log π π₯
π₯
= arg max%
log π π₯
The argument π₯ thatmaximizes the function π π₯ .
![Page 25: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/25.jpg)
Lisa Yan, CS109, 2020
Logs all around
25
-2
-1
0
1
-1 0 1 2 3 4 5 6
β’ Log is monotonic:π₯ β€ π¦ βΊ log π₯ β€ log π¦
β’ Log of product = sum of logs:
β’ Natural logslog π₯
log ππ = log π + log π
log(π₯ = ln π₯
![Page 26: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/26.jpg)
Lisa Yan, CS109, 2020
Argmax properties
26
arg max%
π π₯
= arg max%
log π π₯ (log is an increasing function: π₯ β€ π¦ βΊ log π₯ β€ log π¦)
= arg max%
π log π π₯
for any positive constant π
(π₯ β€ π¦ βΊ π log π₯ β€ π log π¦)
The argument π₯ thatmaximizes the function π π₯ .
![Page 27: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/27.jpg)
Lisa Yan, CS109, 2020
Argmax properties
27
arg max%
π π₯
= arg max%
log π π₯ (log is monotonic: π₯ β€ π¦ βΊ log π₯ β€ log π¦)
= arg max%
π log π π₯
for any positive constant π
(π₯ β€ π¦ βΊ π log π₯ β€ π log π¦)
The argument π₯ thatmaximizes the function π π₯ .
arg max
How do we compute argmax?
![Page 28: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/28.jpg)
Lisa Yan, CS109, 2020
Finding the argmax with calculus
28
π₯/ = arg max%
π π₯ Let π π₯ = βπ₯" + 4, where β2 < π₯ < 2.
0
1
2
3
4
-2 -1 0 1 2
π π₯
π₯
πππ₯
π π₯ =π
ππ₯π₯" + 4 = 2π₯Differentiate w.r.t.
argmaxβs argument
Set to 0 and solve 2π₯ = 0 β π₯U = 0
Make sure π₯;is a maximum
β’ Check π π₯; Β± π < π π₯;β’ Often ignored in expository derivationsβ’ Weβll ignore it here too
(and wonβt require it in class)
![Page 29: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/29.jpg)
MLE: Bernoulli
29
20d_mle_bernoulli
![Page 30: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/30.jpg)
Lisa Yan, CS109, 2020
Computing the MLE
General approach for finding π)*+ , the MLE of π:
30
π!"# = arg max$
πΏπΏ π
1. Determine formula for πΏπΏ π
2. Differentiate πΏπΏ πw.r.t. (each) π
πΏπΏ π = ? log π π#|π"
#$!
ππΏπΏ πππ
3. Solve resulting(simultaneous) equations
To maximize:ππΏπΏ π
ππ = 0
4. Make sure derived πB%&' is a maximum β’ Check πΏπΏ π%&' Β± π < πΏπΏ π%&'β’ Often ignored in expository derivationsβ’ Weβll ignore it here too (and wonβt require it in class)
(algebra orcomputer)
πΏπΏ π is often easier to differentiate than πΏ π .
![Page 31: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/31.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with BernoulliConsider a sample of π i.i.d. RVs π!, π", β¦ , π#.What is π)*+ = π)*+?
31
1. Determine formula for πΏπΏ π
β’ Let π$~Ber π .
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
3. Solve resultingequations
π π$|π = Wπ if π$ = 11 β π if π$ = 0
πΏπΏ π = ? log π π#|π"
#$!?
![Page 32: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/32.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with BernoulliConsider a sample of π i.i.d. RVs π!, π", β¦ , π#.What is π)*+ = π)*+?
32
1. Determine formula for πΏπΏ π
β’ Let π$~Ber π .β’ π π#|π = π(! 1 β π !)(!
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
3. Solve resultingequations
πΏπΏ π = ? log π π#|π"
#$!
π π$|π = π7! 1 β π !87! where π$ β {0,1}
β’ Is differentiable with respect to πβ’ Valid PMF over discrete domainβ
π π$|π = Wπ if π$ = 11 β π if π$ = 0
![Page 33: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/33.jpg)
Lisa Yan, CS109, 2020
πΏπΏ π = ? log π π#|π"
#$!
Maximum Likelihood with BernoulliConsider a sample of π i.i.d. RVs π!, π", β¦ , π#.What is π)*+ = π)*+?
33
1. Determine formula for πΏπΏ π
β’ Let π$~Ber π .β’ π π#|π = π(! 1 β π !)(!
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
3. Solve resultingequations
= ? log π(! 1 β π !)(!"
#$!
π = ? π#"
#$!
= ? π# log π + 1 β π# log 1 β π"
#$!
= π log π + π β π log 1 β π , where
![Page 34: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/34.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with BernoulliConsider a sample of π i.i.d. RVs π!, π", β¦ , π#.What is π)*+ = π)*+?
34
1. Determine formula for πΏπΏ π
β’ Let π$~Ber π .β’ π π#|π = π(! 1 β π !)(!
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
3. Solve resultingequations
= π log π + π β π log 1 β π , where π = ? π#
"
#$!
πΏπΏ π = ? π# log π + 1 β π# log 1 β π"
#$!
ππΏπΏ πππ = π
1π + π β π
β11 β π = 0
![Page 35: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/35.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with BernoulliConsider a sample of π i.i.d. RVs π!, π", β¦ , π#.What is π)*+ = π)*+?
35
1. Determine formula for πΏπΏ π
β’ Let π$~Ber π .β’ π π#|π = π(! 1 β π !)(!
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
3. Solve resultingequations
= π log π + π β π log 1 β π , where π = ? π#
"
#$!
πΏπΏ π = ? π# log π + 1 β π# log 1 β π"
#$!
ππΏπΏ πππ = π
1π + π β π
β11 β π = 0
![Page 36: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/36.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with BernoulliConsider a sample of π i.i.d. RVs π!, π", β¦ , π#.What is π)*+ = π)*+?
36
1. Determine formula for πΏπΏ π
β’ Let π$~Ber π .β’ π π#|π = π(! 1 β π !)(!
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
3. Solve resultingequations
= π log π + π β π log 1 β π , where π = ? π#
"
#$!
πΏπΏ π = ? π# log π + 1 β π# log 1 β π"
#$!
ππΏπΏ πππ = π
1π + π β π
β11 β π = 0
π%&' =1π π =
1π ? π#
"
#$!
MLE of the Bernoulli parameter, π%&', is the unbiased estimate of the mean, πF (sample mean)
![Page 37: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/37.jpg)
Lisa Yan, CS109, 2020
MLE of Bernoulli is the sample mean
37
πF =1π ? π#
"
#$!
Bernoulliπ π#|π = π(! 1 β π !)(! ,
where π# β {0,1}
πΏπΏ π = ? log π π#|π"
#$!
![Page 38: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/38.jpg)
Lisa Yan, CS109, 2020
Quick checkβ’ You draw π i.i.d. random variables π!, π", β¦ , π# from the distribution πΉ,
yielding the following sample: 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
β’ Suppose distribution πΉ = Ber π with unknown parameter π.
38
(π = 10)
A. 1.0B. 0.5C. 0.8D. 0.2E. None/other
1. What is π)*+ , the MLE of the parameter π?
π%&' = πF =1π ? π#
"
#$!
π€
![Page 39: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/39.jpg)
Lisa Yan, CS109, 2020
Quick checkβ’ You draw π i.i.d. random variables π!, π", β¦ , π# from the distribution πΉ,
yielding the following sample: 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
β’ Suppose distribution πΉ = Ber π with unknown parameter π.
39
A. 1.0B. 0.5C. 0.8D. 0.2E. None/other
1. What is π)*+ , the MLE of the parameter π?
π%&' = πF =1π ? π#
"
#$!
(π = 10)
![Page 40: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/40.jpg)
Lisa Yan, CS109, 2020
Quick checkβ’ You draw π i.i.d. random variables π!, π", β¦ , π# from the distribution πΉ,
yielding the following sample: 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
β’ Suppose distribution πΉ = Ber π with unknown parameter π.
40
C. 0.8
πΏ π = J π π#|π"
#$!
1. What is π)*+ , the MLE of the parameter π?2. What is the likelihood πΏ π of this particular sample?
π π#|π = π(! 1 β π !)(! where π# β {0,1}
= π* 1 β π +
where π = π
(π = 10)
![Page 41: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/41.jpg)
(live)20: Maximum Likelihood EstimationLisa YanMay 20, 2020
41
![Page 42: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/42.jpg)
Lisa Yan, CS109, 2020
Computing the MLE
General approach for finding π)*+ , the MLE of π:
42
1. Determine formula for πΏπΏ π
2. Differentiate πΏπΏ πw.r.t. (each) π
πΏπΏ π = ? log π π#|π"
#$!
ππΏπΏ πππ
3. Solve resulting(simultaneous) equations
To maximize:ππΏπΏ π
ππ = 0
4. Make sure derived πB%&' is a maximum β’ Check πΏπΏ π%&' Β± π < πΏπΏ π%&'β’ Often ignored in expository derivationsβ’ Weβll ignore it here too (and wonβt require it in class)
(algebra orcomputer)
πΏπΏ π is often easier to differentiate than πΏ π .
Review
![Page 43: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/43.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with PoissonConsider a sample of π i.i.d. RVs π!, π", β¦ , π#.What is π)*+ = π)*+?
43
1. Determine formula for πΏπΏ π
πΏπΏ π =$logπ!"π#!π$!
%
$&'
= βππ + log π $π$
%
$&'
β$log π$!%
$&'
=$ βπ log π + π$ log π β logπ$!%
$&'(using natural log, ln π = 1)
π π#|π =π),π(!
π#!
β’ Let π$~Poi π .β’ PMF:
![Page 44: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/44.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with PoissonConsider a sample of π i.i.d. RVs π!, π", β¦ , π#.What is π)*+ = π)*+?
44
1. Determine formula for πΏπΏ π
πΏπΏ π =$logπ!"π#!π$!
%
$&'
= βππ + log π $π$
%
$&'
β$log π$!%
$&'
=$ βπ log π + π$ log π β logπ$!%
$&'(using natural log, ln π = 1)
π π#|π =π),π(!
π#!
β’ Let π$~Poi π .β’ PMF:
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
βπ +1π A π%
&
%'(+ π log π β A
1π%!
β ππ%!ππ%
&
%'(βπ +
1π ? π#
"
#$!
A. B. C. None/other/donβt know π€
ππΏπΏ πππ = ?
![Page 45: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/45.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with PoissonConsider a sample of π i.i.d. RVs π!, π", β¦ , π#.What is π)*+ = π)*+?
45
1. Determine formula for πΏπΏ π
πΏπΏ π =$logπ!"π#!π$!
%
$&'
= βππ + log π $π$
%
$&'
β$log π$!%
$&'
=$ βπ log π + π$ log π β logπ$!%
$&'(using natural log, ln π = 1)
π π#|π =π),π(!
π#!
β’ Let π$~Poi π .β’ PMF:
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
βπ +1π A π%
&
%'(+ π log π β A
1π%!
β ππ%!ππ%
&
%'(βπ +
1π ? π#
"
#$!
A. B. C. None/other/donβt know
ππΏπΏ πππ = ?
![Page 46: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/46.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with PoissonConsider a sample of π i.i.d. RVs π!, π", β¦ , π#.What is π)*+ = π)*+?
46
1. Determine formula for πΏπΏ π
πΏπΏ π =$logπ!"π#!π$!
%
$&'
= βππ + log π $π$
%
$&'
β$log π$!%
$&'
=$ βπ log π + π$ log π β logπ$!%
$&'(using natural log, ln π = 1)
π π#|π =π),π(!
π#!
β’ Let π$~Poi π .β’ PMF:
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
ππΏπΏ πππ = βπ +
1π ? π#
"
#$!= 0
3. Solve resultingequations
π%&' =1π ? π#
"
#$!
![Page 47: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/47.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with PoissonConsider a sample of π i.i.d. RVs π!, π", β¦ , π#.What is π)*+ = π)*+?
47
1. Determine formula for πΏπΏ π
πΏπΏ π =$logπ!"π#!π$!
%
$&'
= βππ + log π $π$
%
$&'
β$log π$!%
$&'
=$ βπ log π + π$ log π β logπ$!%
$&'(using natural log, ln π = 1)
π π#|π =π),π(!
π#!
β’ Let π$~Poi π .β’ PMF:
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
ππΏπΏ πππ = βπ +
1π ? π#
"
#$!= 0
3. Solve resultingequations
π%&' =1π ? π#
"
#$!
MLE of the Poisson parameter, π%&', is the unbiased estimate of the mean, πF (sample mean)
![Page 48: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/48.jpg)
Lisa Yan, CS109, 2020
Quick check1. A particular experiment can be modeled as a
Poisson RV with parameter π, in terms of events/minute.Collect data: observe 53 events over the next 10 minutes. What is π)*+?
2. Is the Bernoulli MLE an unbiased estimator of the Bernoulli parameter π?
3. Is the Poisson MLE an unbiased estimator of the Poisson variance?
4. What does unbiased mean?
48
π€
π%&' =1π ? π#
"
#$!
![Page 49: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/49.jpg)
Lisa Yan, CS109, 2020
Quick check1. A particular experiment can be modeled as a
Poisson RV with parameter π, in terms of events/minute.Collect data: observe 53 events over the next 10 minutes. What is π)*+?
2. Is the Bernoulli MLE an unbiased estimator of the Bernoulli parameter π?
3. Is the Poisson MLE an unbiased estimator of the Poisson variance?
4. What does unbiased mean?
49
π%&' =1π ? π#
"
#$!
β
β
πΈ estimator = true_thingUnbiased: If you could repeat your experiment, on average you would get what you are looking for.
![Page 50: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/50.jpg)
Interlude for jokes/announcements
50
![Page 51: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/51.jpg)
Lisa Yan, CS109, 2020
Announcements
51
Problem Set 5
Only do problems on the official Pset handout.
Problem Set 6Released today! Due Wed. August 12 (no late days or on-time bonus).
Regrade RequestsPset 1-5 and midterm regrade requests are due by August 11 via Gradescope. Please submit Pset 6 regrades only in extreme cases (e.g. we didnβt see your answers because of mislabeled pages) via email.
Completely Optional ProjectYou may be able to replace an early Pset grade that youβre unhappy with by completing a CS109-related project. Details here: https://us.edstem.org/courses/667/discussion/98951
![Page 52: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/52.jpg)
Lisa Yan, CS109, 2020
Are these trials independent?Are probabilities consistent across jobs?
Interesting probability news
52
Bernoulliβstrialscantellyouhowmanyjobapplicationstosend
https://swizec.com/blog/bernoullis-trials-can-tell-many-job-applications-send/swizec/7677
![Page 53: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/53.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with UniformConsider a sample of π i.i.d. random variables π!, π", β¦ , π#.
Let π$~Uni πΌ, π½ .
53
π π#|πΌ, π½ = Q1
π½ β πΌ if πΌ β€ π₯# β€ π½
0 otherwise
1. Determine formula for πΏ π
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
πΏ π = R1
π½ β πΌ
" if πΌ β€ π₯!, π₯+, β¦ , π₯" β€ π½
0 otherwise
A. Great, letβs do itB. Differentiation is hardC. Constraint πΌ β€ π₯!, π₯+, β¦ , π₯" β€ π½
makes differentiation hard π€
![Page 54: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/54.jpg)
Lisa Yan, CS109, 2020
Example sample from a UniformConsider a sample of π i.i.d. random variables π!, π", β¦ , π#.
Let π$~Uni πΌ, π½ .
54
πΏ π = R1
π½ β πΌ
" if πΌ β€ π₯!, π₯+, β¦ , π₯" β€ π½
0 otherwise
A. Uni πΌ = 0 , π½ = 1
B. Uni πΌ = 0.15, π½ = 0.75
C. Uni πΌ = 0.15, π½ = 0.70
Suppose π$~Uni 0,1 .You observe data:
Which parameterswould give youmaximum πΏ π ?
0.15, 0.20, 0.30, 0.40, 0.65, 0.70, 0.75
π€
![Page 55: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/55.jpg)
Lisa Yan, CS109, 2020
Example sample from a UniformConsider a sample of π i.i.d. random variables π!, π", β¦ , π#.
Let π$~Uni πΌ, π½ .
55
πΏ π = R1
π½ β πΌ
" if πΌ β€ π₯!, π₯+, β¦ , π₯" β€ π½
0 otherwise
A. Uni πΌ = 0 , π½ = 1
B. Uni πΌ = 0.15, π½ = 0.75
C. Uni πΌ = 0.15, π½ = 0.70
Suppose π$~Uni 0,1 .You observe data:
Which parameterswould give youmaximum πΏ π ?
0.15, 0.20, 0.30, 0.40, 0.65, 0.70, 0.75
!-.//
0β 0 = 0
1 1 = 1!-.0
1= 59.5
β Original parameters may not yield maximum likelihood.
![Page 56: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/56.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with UniformConsider a sample of π i.i.d. random variables π!, π", β¦ , π#.
Let π$~Uni πΌ, π½ .
56
πΏ π = R1
π½ β πΌ
" if πΌ β€ π₯!, π₯+, β¦ , π₯" β€ π½
0 otherwise
π)*+ : πΌ)*+ = min π₯!, π₯", β¦ , π₯# π½)*+ = max π₯!, π₯", β¦ , π₯#
Intuition:β’ Want interval size π½ β πΌ to be as small
as possible to maximize likelihood function per datapoint
β’ Need to make sure all observed data is in interval (if not, then πΏ π = 0)
![Page 57: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/57.jpg)
Lisa Yan, CS109, 2020
Small samples = problems with MLEMaximum Likelihood Estimator π)*+ :β’ Best explains data we have seen β’ Does not attempt to generalize to unseen data.
In many cases,
β’ Unbiased (πΈ π%&' = π regardless of size of sample, π)
For some cases, like Uniform:
β’ Biased. Problematic for small sample sizeβ’ Example: If π = 1 then πΌ = π½, yielding an invalid distribution
57
π%&' =1π ? π#
"
#$!Sample mean (MLE for Bernoulli π,
Poisson π, Normal π)
πΌ)*+ β₯ πΌ, π½)*+ β€ π½
β
β
π)*+ = arg maxF
πΏ π
![Page 58: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/58.jpg)
Lisa Yan, CS109, 2020
Properties of MLEMaximum Likelihood Estimator:β’ Best explains data we have seen β’ Does not attempt to generalize to unseen data.
β’ Often used when sample size π is large relative to parameter space
β’ Potentially biased (though asymptotically less so, as π β β)
β’ Consistent:
As π β β (i.e., more data), probability that πB significantly differs from π is zero
58
π)*+ = arg maxF
πΏ π
lim#βH
π π: β π < π = 1 where π > 0
![Page 59: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/59.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with Normal
59
πΏπΏ π = ? log1
2πππ) (!)2 "/ +4"
"
#$!= ? β log 2ππ β π# β π +/ 2π+
"
#$! (using natural log)
= β ? log 2ππ"
#$!
β ? π# β π +/ 2π+"
#$!
Consider a sample of π i.i.d. random variables π!, π", β¦ , π#.β’ Let π#~ π© π, π+ .
What is π)*+ = π)*+ , π)*+" ?
1. Determine formula for πΏπΏ π
3. Solve resultingequations
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
π π#|π, π+ =1
2πππ) (!)2 "/ +4"
![Page 60: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/60.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with NormalConsider a sample of π i.i.d. random variables π!, π", β¦ , π#.β’ Let π#~ π© π, π+ .
What is π)*+ = π)*+ , π)*+" ?
60
1. Determine formula for πΏπΏ π
3. Solve resultingequations
πΏπΏ π = β ? log 2ππ"
#$!
β ? π# β π +/ 2π+"
#$!
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
π π#|π, π+ =1
2πππ) (!)2 "/ +4"
ππΏπΏ πππ = ? 2 π# β π / 2π+
"
#$!
=1
π+ ? π# β π"
#$!
= 0
with respect to π
![Page 61: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/61.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with NormalConsider a sample of π i.i.d. random variables π!, π", β¦ , π#.β’ Let π#~ π© π, π+ .
What is π)*+ = π)*+ , π)*+" ?
61
1. Determine formula for πΏπΏ π
3. Solve resultingequations
πΏπΏ π = β ? log 2ππ"
#$!
β ? π# β π +/ 2π+"
#$!
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
π π#|π, π+ =1
2πππ) (!)2 "/ +4"
ππΏπΏ πππ = ? 2 π# β π / 2π+
"
#$!
=1
π+ ? π# β π"
#$!
= 0
with respect to π with respect to π
ππΏπΏ πππ = β ?
1π
"
#$!
+ ? 2 π# β π +/ 2π5"
#$!
= βππ +
1π5 ? π# β π +
"
#$!
= 0
![Page 62: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/62.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with Normal Consider a sample of π i.i.d. random variables π!, π", β¦ , π#.β’ Let π#~ π© π, π+ .
What is π)*+ = π)*+ , π)*+" ?
62
3. Solve resultingequations
π π#|π, π+ =1
2πππ) (!)2 "/ +4"
1π+ ? π# β π
"
#$!= 0Two equations,
two unknowns:
First, solvefor π%&':
1π+ ? π#
"
#$!β
1π+ ? π
"
#$!= 0 β ? π#
"
#$!
= ππ β π%&' =1π ? π#
"
#$!unbiased
βππ +
1π5 ? π# β π +
"
#$!
= 0
![Page 63: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/63.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with NormalConsider a sample of π i.i.d. random variables π!, π", β¦ , π#.β’ Let π#~ π© π, π+ .
What is π)*+ = π)*+ , π)*+" ?
63
3. Solve resultingequations
π π#|π, π+ =1
2πππ) (!)2 "/ +4"
βππ +
1π5 ? π# β π +
"
#$!
= 0Two equations, two unknowns:
1π+ ? π#
"
#$!β
1π+ ? π
"
#$!= 0 β ? π#
"
#$!
= ππ β π%&' =1π ? π#
"
#$!
Next, solvefor π%&':
1π5 ? π# β π +
"
#$!
=ππ β ? π# β π +
"
#$!
= π+π β π%&'+ =1π ? π# β π%&' +
"
#$!biased
unbiased
First, solvefor π%&':
1π+ ? π# β π
"
#$!= 0
![Page 64: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/64.jpg)
Lisa Yan, CS109, 2020
Estimating a Bernoulli parameterConsider π i.i.d. random variables π!, π", β¦ , π#.β’ Suppose distribution πΉ = Ber π with unknown parameter π.β’ Say you have three coins: π! = 0.5, π" = 0.8, or πK = 1
Which coin is most likely to give you the following sample (π = 10)?0, 0, 1, 1, 1, 1, 1, 1, 1, 1
64
π€
![Page 65: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/65.jpg)
Lisa Yan, CS109, 2020
Estimating a Bernoulli parameterConsider π i.i.d. random variables π!, π", β¦ , π#.β’ Suppose distribution πΉ = Ber π with unknown parameter π.β’ Say you have three coins: π! = 0.5, π" = 0.8, or πK = 1
Which estimate is most likely to give you the following sample (π = 10)?0, 0, 1, 1, 1, 1, 1, 1, 1, 1
65
How do we write this process mathematically?
Most likely, sochoose this coin
π sample|π = 0.5 = 0.5 & 0.5 " = 0.00097π sample|π = 0.8 = 0.8 & 0.2 " = 0.00671π sample|π = 1.0 = 1.0 & 0 " = 0
![Page 66: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/66.jpg)
Lisa Yan, CS109, 2020
Estimating a Bernoulli parameter Consider π i.i.d. random variables π!, π", β¦ , π#.β’ Suppose distribution πΉ = Ber π with unknown parameter π.β’ Say you have three coins: π! = 0.5, π" = 0.8, or πK = 1
Which estimate is most likely to give you the following sample (π = 10)? 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
66
π3 = arg max$β '.),'.+,,
π+ 1 β π - = 0.8
Most likely, sochoose this coin
π sample|π = 0.5 = 0.5 & 0.5 " = 0.00097π sample|π = 0.8 = 0.8 & 0.2 " = 0.00671π sample|π = 1.0 = 1.0 & 0 " = 0
![Page 67: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/67.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with BernoulliConsider a sample of π i.i.d. random variables π!, π", β¦ , π#.β’ Let π#~Ber π .
What is π)*+ = π)*+?
67
What is the PMF π π$|π ?A. πB. 1 β π
C. Wπ if π$ = 11 β π if π$ = 0
D. π7! 1 β π !87! where π$ β {0,1}
1. Determine formula for πΏπΏ π
3. Solve resultingequations
πΏπΏ π = ? log π π#|π"
#$!
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
π€
![Page 68: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/68.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with BernoulliConsider a sample of π i.i.d. random variables π!, π", β¦ , π#.β’ Let π#~Ber π .
What is π)*+ = π)*+?
68
What is the PMF π π$|π ?A. πB. 1 β π
C. Wπ if π$ = 11 β π if π$ = 0
D. π7! 1 β π !87! where π$ β {0,1}
1. Determine formula for πΏπΏ π
3. Solve resultingequations
πΏπΏ π = ? log π π#|π"
#$!
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0
![Page 69: 20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11 0 = 1 = Bernoulli model, parameter 0 = 0.2.! Lisa Yan, CS109, 2020 What are parameters?](https://reader036.vdocuments.site/reader036/viewer/2022071110/5fe56f5c23fc5f599c731820/html5/thumbnails/69.jpg)
Lisa Yan, CS109, 2020
Maximum Likelihood with BernoulliConsider a sample of π i.i.d. random variables π!, π", β¦ , π#.β’ Let π#~Ber π .
What is π)*+ = π)*+?
69
β’ Is differentiableβ’ Valid PMF over
discrete domain
What is the PMF π π$|π ?A. πB. 1 β π
C. Wπ if π$ = 11 β π if π$ = 0
D. π7! 1 β π !87! where π$ β {0,1}
1. Determine formula for πΏπΏ π
3. Solve resultingequations
πΏπΏ π = ? log π π#|π"
#$!
2. Differentiate πΏπΏ πw.r.t. (each) π, set to 0