University of Oslo: Department of Informatics

Hidden Markov Models and Dynamic Programming

Stephan Oepen, Jonathon Read

Date: October 14, 2011. Venue: INF4820, Department of Informatics, University of Oslo
Topics
Last week
- Parts-of-speech (POS)
- A symbolic approach to POS tagging
- Stochastic POS tagging

Today
- Hidden Markov Models
- Computing likelihoods using the Forward algorithm
- Decoding hidden states using the Viterbi algorithm
Markov chains
Definition
Q = q_1 q_2 \ldots q_N   a set of N states
q_0, q_F   special start and final states

A = \begin{pmatrix} a_{01} & \cdots & a_{0N} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NN} \end{pmatrix}

a transition probability matrix, where a_{ij} is the probability of moving from state i to state j
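To make the definition concrete, here is a minimal Python sketch of a Markov chain over two hypothetical weather states; the states and every transition probability below are invented for illustration.

```python
# A minimal Markov chain sketch: two weather states plus the special start
# state q0.  All transition probabilities are illustrative, not from the text.
A = {
    ("q0", "hot"): 0.8, ("q0", "cold"): 0.2,
    ("hot", "hot"): 0.6, ("hot", "cold"): 0.4,
    ("cold", "hot"): 0.5, ("cold", "cold"): 0.5,
}

def chain_probability(states):
    """P(q1 ... qT) as the product of transition probabilities from q0."""
    prob, previous = 1.0, "q0"
    for state in states:
        prob *= A[(previous, state)]
        previous = state
    return prob

print(chain_probability(["hot", "hot", "cold"]))  # 0.8 * 0.6 * 0.4 ≈ 0.192
```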
Hidden Markov models (HMMs)
Markov chains are useful when computing the probability of observable sequences. However, we are often interested in events that are hidden.
Definition
Q = q_1 q_2 \ldots q_N   a set of N states
q_0, q_F   special start and final states

A = \begin{pmatrix} a_{01} & \cdots & a_{0N} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NN} \end{pmatrix}

a transition probability matrix, where a_{ij} is the probability of moving from state i to state j

O = o_1 o_2 \ldots o_T   a sequence of observations
B = b_i(o_t)   a sequence of observation likelihoods
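As a concrete (if simplified) illustration, an HMM λ = (A, B) can be encoded directly as Python dictionaries; the weather states and all probabilities below are invented for illustration, and the joint probability of one observation/state pairing follows directly from the definition.

```python
# An HMM λ = (A, B) encoded as plain Python dictionaries.  The two weather
# states and every probability below are illustrative, not from the slides.
A = {  # transition probabilities a_ij, with "q0" the special start state
    ("q0", "hot"): 0.8, ("q0", "cold"): 0.2,
    ("hot", "hot"): 0.6, ("hot", "cold"): 0.4,
    ("cold", "hot"): 0.5, ("cold", "cold"): 0.5,
}
B = {  # observation likelihoods b_i(o_t): P(ice creams eaten | weather)
    ("hot", 1): 0.2, ("hot", 2): 0.4, ("hot", 3): 0.4,
    ("cold", 1): 0.5, ("cold", 2): 0.4, ("cold", 3): 0.1,
}

def joint_probability(observations, states):
    """P(O, Q): multiply one transition and one emission probability per step."""
    prob, previous = 1.0, "q0"
    for state, obs in zip(states, observations):
        prob *= A[(previous, state)] * B[(state, obs)]
        previous = state
    return prob

print(joint_probability([3, 1, 3], ["hot", "hot", "cold"]))  # ≈ 0.001536
```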
Ice cream and global warming
Missing records of weather in Baltimore for Summer 2007:
- likelihood of hot/cold weather given yesterday's weather
- Jason's diary, listing how many ice creams he ate each day
- number of ice creams he tends to eat, given the weather
Computing likelihoods
Task
Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O|λ).

Compute the sum over all possible state sequences:

P(O) = \sum_Q P(O, Q) = \sum_Q P(O \mid Q)\, P(Q)

For example, for the ice cream sequence 3 1 3:

P(3 1 3) = P(3 1 3, cold cold cold) + P(3 1 3, cold cold hot) + P(3 1 3, hot hot cold) + \ldots \Rightarrow O(N^T \cdot T)
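The sum above can be sketched by brute force: enumerate all N^T state sequences and add up their joint probabilities, which makes the O(N^T · T) cost visible. The two-state model below (states and probabilities) is illustrative only.

```python
from itertools import product

# Illustrative two-state ice-cream model (all probabilities are made up).
STATES = ["hot", "cold"]
A = {("q0", "hot"): 0.8, ("q0", "cold"): 0.2,
     ("hot", "hot"): 0.6, ("hot", "cold"): 0.4,
     ("cold", "hot"): 0.5, ("cold", "cold"): 0.5}
B = {("hot", 1): 0.2, ("hot", 2): 0.4, ("hot", 3): 0.4,
     ("cold", 1): 0.5, ("cold", 2): 0.4, ("cold", 3): 0.1}

def brute_force_likelihood(observations):
    """P(O) summed over every possible state sequence: O(N^T * T)."""
    total = 0.0
    for path in product(STATES, repeat=len(observations)):  # N^T sequences
        prob, previous = 1.0, "q0"
        for state, obs in zip(path, observations):          # T steps each
            prob *= A[(previous, state)] * B[(state, obs)]
            previous = state
        total += prob
    return total

print(brute_force_likelihood([3, 1, 3]))
```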
The Forward algorithm
Employs dynamic programming: storing and reusing the results of partial computations in a trellis α.

Each cell in the trellis stores the probability of being in state q_j after seeing the first t observations:

\alpha_t(j) = P(o_1 \ldots o_t, q_t = j) = \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\, b_j(o_t)
Calculating a single element in the trellis
Pseudocode for the Forward algorithm
Input: observations of length T, state-graph of length N
Output: forward-probability

create a probability matrix forward[N + 2, T]
foreach state s from 1 to N do
    forward[s, 1] ← a_{0,s} × b_s(o_1)
end
foreach time step t from 2 to T do
    foreach state s from 1 to N do
        forward[s, t] ← Σ_{s′=1}^{N} forward[s′, t − 1] × a_{s′,s} × b_s(o_t)
    end
end
forward[q_F, T] ← Σ_{s=1}^{N} forward[s, T] × a_{s,q_F}
return forward[q_F, T]
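The pseudocode above translates almost line for line into Python. This sketch uses dictionaries instead of a matrix, the model probabilities are illustrative, and the explicit final state q_F is replaced by a plain sum over the last trellis column.

```python
# The Forward algorithm, as a Python sketch of the pseudocode above.
STATES = ["hot", "cold"]
A = {("q0", "hot"): 0.8, ("q0", "cold"): 0.2,
     ("hot", "hot"): 0.6, ("hot", "cold"): 0.4,
     ("cold", "hot"): 0.5, ("cold", "cold"): 0.5}
B = {("hot", 1): 0.2, ("hot", 2): 0.4, ("hot", 3): 0.4,
     ("cold", 1): 0.5, ("cold", 2): 0.4, ("cold", 3): 0.1}

def forward(observations):
    """P(O): sum over all state sequences in O(N^2 * T) time."""
    # initialisation: alpha_1(s) = a_{0,s} * b_s(o_1)
    alpha = {s: A[("q0", s)] * B[(s, observations[0])] for s in STATES}
    # recursion: alpha_t(s) = sum_{s'} alpha_{t-1}(s') * a_{s',s} * b_s(o_t)
    for obs in observations[1:]:
        alpha = {s: sum(alpha[sp] * A[(sp, s)] for sp in STATES) * B[(s, obs)]
                 for s in STATES}
    # termination: sum over the last trellis column
    return sum(alpha.values())

print(forward([3, 1, 3]))  # ≈ 0.028562
```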
An example of the Forward algorithm
The ice cream HMM
Decoding hidden states
Task
Given an HMM λ = (A, B) and a sequence of observations O = o_1, o_2, \ldots, o_T, find the most probable corresponding sequence of hidden states Q = q_1, q_2, \ldots, q_T.

v_t(j) = \max_{q_1, \ldots, q_{t-1}} P(o_1 \ldots o_t, q_1 \ldots q_{t-1}, q_t = j) = \max_{i=1}^{N} v_{t-1}(i)\, a_{ij}\, b_j(o_t)

and additionally keep a backtrace to whichever state was on the most probable path to the current state:

\beta_t(j) = \arg\max_{i=1}^{N} v_{t-1}(i)\, a_{ij}
Pseudocode for the Viterbi algorithm
Input: observations of length T, state-graph of length N
Output: best-path

create a path probability matrix viterbi[N + 2, T]
create a path backpointer matrix backpointer[N + 2, T]
foreach state s from 1 to N do
    viterbi[s, 1] ← a_{0,s} × b_s(o_1)
    backpointer[s, 1] ← 0
end
foreach time step t from 2 to T do
    foreach state s from 1 to N do
        viterbi[s, t] ← max_{s′=1}^{N} viterbi[s′, t − 1] × a_{s′,s} × b_s(o_t)
        backpointer[s, t] ← argmax_{s′=1}^{N} viterbi[s′, t − 1] × a_{s′,s}
    end
end
viterbi[q_F, T] ← max_{s=1}^{N} viterbi[s, T] × a_{s,q_F}
backpointer[q_F, T] ← argmax_{s=1}^{N} viterbi[s, T] × a_{s,q_F}
return the path obtained by following backpointers from backpointer[q_F, T]
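A compact Python sketch of the pseudocode above; again the model probabilities are illustrative, dictionaries replace the matrices, and the explicit final state q_F is dropped in favour of a plain max over the last trellis column.

```python
# The Viterbi algorithm, as a Python sketch of the pseudocode above.
STATES = ["hot", "cold"]
A = {("q0", "hot"): 0.8, ("q0", "cold"): 0.2,
     ("hot", "hot"): 0.6, ("hot", "cold"): 0.4,
     ("cold", "hot"): 0.5, ("cold", "cold"): 0.5}
B = {("hot", 1): 0.2, ("hot", 2): 0.4, ("hot", 3): 0.4,
     ("cold", 1): 0.5, ("cold", 2): 0.4, ("cold", 3): 0.1}

def viterbi(observations):
    """Return (most probable state sequence, its probability)."""
    # initialisation: v_1(s) = a_{0,s} * b_s(o_1)
    v = {s: A[("q0", s)] * B[(s, observations[0])] for s in STATES}
    backpointers = []
    # recursion: v_t(s) = max_{s'} v_{t-1}(s') * a_{s',s} * b_s(o_t)
    for obs in observations[1:]:
        new_v, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda sp: v[sp] * A[(sp, s)])
            pointers[s] = best
            new_v[s] = v[best] * A[(best, s)] * B[(s, obs)]
        backpointers.append(pointers)
        v = new_v
    # termination: pick the best final state, then follow the backpointers
    state = max(STATES, key=lambda s: v[s])
    best_prob, path = v[state], [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return list(reversed(path)), best_prob

print(viterbi([3, 1, 3]))  # (['hot', 'cold', 'hot'], ≈ 0.0128)
```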
An example of the Viterbi algorithm
The ice cream HMM
(A Practical Tip)
- When multiplying many small probabilities, we risk getting values that are too close to zero to be represented: underflow.
- It is often helpful to work in "log-space"; since log is monotonic, maximisation is unaffected:

  \log(\max f) = \max(\log f)

- It also reduces multiplication to addition:

  \log \prod_i P_i = \sum_i \log P_i
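A small Python demonstration of the underflow problem and the log-space fix; the probability value 1e-5 is arbitrary.

```python
import math

# Multiplying 100 probabilities of 1e-5 each: the true product is 1e-500,
# far below the smallest positive float, so the running product collapses to 0.
product = 1.0
for _ in range(100):
    product *= 1e-5
print(product)   # 0.0 (underflow)

# In log-space the same quantity is a finite, comparable number.
log_prob = sum(math.log(1e-5) for _ in range(100))
print(log_prob)  # ≈ -1151.3
```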
Evaluation
- Using a manually labeled test set as our gold standard, we can compute the accuracy of our model: the percentage of tags in the test set that the tagger gets right.
- Compare the accuracy to some reference models: an upper bound and a baseline.
- An upper-bound ceiling can be based on, e.g., how well humans would do on the task, or on assuming an "oracle".
- A lower-bound baseline can be based on the accuracy expected from, e.g., random choice, always picking the tag with the highest frequency, or applying a unigram model.
- Standard hypothesis tests can be applied to test the statistical significance of any differences.
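As a sketch of this evaluation setup, the snippet below computes tagging accuracy against a gold standard and a most-frequent-tag baseline; the tag sequences are invented for illustration.

```python
from collections import Counter

# Hypothetical gold-standard and predicted tag sequences for one sentence.
gold      = ["DT", "NN", "VB", "DT", "NN", "IN"]
predicted = ["DT", "NN", "NN", "DT", "NN", "IN"]

def accuracy(gold, predicted):
    """Fraction of positions where the predicted tag matches the gold tag."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

# Lower-bound baseline: always predict the most frequent gold tag.
most_frequent_tag = Counter(gold).most_common(1)[0][0]
baseline = accuracy(gold, [most_frequent_tag] * len(gold))

print(accuracy(gold, predicted))  # 5 of 6 tags correct
print(baseline)
```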