An Introduction to Hidden Markov Models
TRANSCRIPT
An Introduction to HMM
Browny
2010.07.21
MM vs. HMM
[Figure: in a Markov model the states are directly visible; an HMM adds a layer of observations on top of hidden states]
Markov Model
• Given 3 weather states:
  – {S1, S2, S3} = {rain, cloudy, sunny}

              Rain   Cloudy   Sunny
    Rain      0.4    0.3      0.3
    Cloudy    0.2    0.6      0.2
    Sunny     0.1    0.1      0.8

• What is the probability that the next 7 days
  will be {sun, sun, rain, rain, sun, cloud, sun}?
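This is just a product of one-step transition probabilities. A minimal sketch in Python/NumPy (not part of the original slides; it assumes the chain is in the sunny state today, since the slide does not give an initial distribution):

```python
import numpy as np

# Transition matrix from the table above.
# State order: 0 = rain, 1 = cloudy, 2 = sunny.
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

RAIN, CLOUDY, SUNNY = 0, 1, 2

def sequence_prob(today, days):
    """P(days | today): chain the one-step transition probabilities."""
    p, prev = 1.0, today
    for s in days:
        p *= A[prev, s]
        prev = s
    return p

# {sun, sun, rain, rain, sun, cloud, sun}, assuming today is sunny
days = [SUNNY, SUNNY, RAIN, RAIN, SUNNY, CLOUDY, SUNNY]
print(sequence_prob(SUNNY, days))  # 0.8*0.8*0.1*0.4*0.3*0.1*0.2 ≈ 1.536e-4
```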
Hidden Markov Model
• The states
  – cannot be observed directly: hidden!
  – but they can be observed indirectly
• Example
  – North Pole or equator (model), Hot/Cold (state),
    1/2/3 ice creams (observation)
Hidden Markov Model
• The observations are a probabilistic function of the
  states, and the states themselves are not directly
  observable

[Figure: hidden states emitting visible observations]
HMM Elements
• N, the number of states in the model
• M, the number of distinct observation
symbols
• A, the state transition probability distribution
• B, the observation symbol probability
distribution in states
• π, the initial state distribution
• λ = (A, B, π) denotes the whole model
Example
B: Observation
             P(…|C)   P(…|H)
  P(1|…)      0.7      0.1
  P(2|…)      0.2      0.2
  P(3|…)      0.1      0.7

A: Transition, π: initial (the Start column)
             P(…|C)   P(…|H)   P(…|Start)
  P(C|…)      0.8      0.1      0.5
  P(H|…)      0.1      0.8      0.5
  P(STOP|…)   0.1      0.1      0
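For the examples that follow it helps to have these parameters in code. A minimal sketch (the names pi, A, B, O and the 0-indexed observation encoding are my own; the 0.1 probability mass on STOP is dropped, so rows of A sum to 0.9):

```python
import numpy as np

# Ice cream HMM from the table above. State order: 0 = C (cold), 1 = H (hot).
pi = np.array([0.5, 0.5])          # P(C|Start), P(H|Start)
A  = np.array([[0.8, 0.1],         # P(C|C), P(H|C)   (STOP mass dropped)
               [0.1, 0.8]])        # P(C|H), P(H|H)
B  = np.array([[0.7, 0.2, 0.1],    # P(1|C), P(2|C), P(3|C)
               [0.1, 0.2, 0.7]])   # P(1|H), P(2|H), P(3|H)

# Observations are ice cream counts 1..3, stored 0-indexed:
O = np.array([2, 0, 2])            # the sequence 3, 1, 3
```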
3 Problems
1. Which model best matches the observed sequence?
   P(observations | model)
2. Which state sequence best explains the observed
   sequence under a given model?
   P(state sequence | observations, model)
3. Which model is most likely to have produced the
   observed sequence?
   Which model maximizes P(observations | model)?
Solution 1
• Given the model, what is the probability of generating
  an observation sequence, P(O|λ)?

[Figure: a trellis of states S1, S2, S3 over times t = 1, 2, 3,
each state emitting observation R1 or R2]

  What is the probability of observing R1 → R1 → R2?
Solution 1
• Consider a specific state sequence
  Q = q_1, q_2, …, q_T
• The probability of generating a particular observation
  sequence from it is
  P(O|Q, λ) = P(O_1|q_1, λ) · P(O_2|q_2, λ) · … · P(O_T|q_T, λ)
            = b_{q_1}(O_1) · b_{q_2}(O_2) · … · b_{q_T}(O_T)
Solution 1
• The probability of this particular state sequence
  occurring is
  P(Q|λ) = π_{q_1} · a_{q_1 q_2} · a_{q_2 q_3} · … · a_{q_{T-1} q_T}
• Given the model, the probability of generating the
  observation sequence, P(O|λ), is
  P(O|λ) = Σ_{q_1, q_2, …, q_T} P(O|Q, λ) · P(Q|λ)
         = Σ_{q_1, …, q_T} π_{q_1} b_{q_1}(O_1) a_{q_1 q_2} b_{q_2}(O_2) · … · a_{q_{T-1} q_T} b_{q_T}(O_T)
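This sum over all state sequences can be written down almost verbatim as a brute-force enumeration; a sketch using the same pi, A, B, O layout as the earlier snippet:

```python
import itertools
import numpy as np

def brute_force_prob(O, pi, A, B):
    """P(O|λ) by summing P(O|Q,λ)·P(Q|λ) over all N^T state sequences."""
    N, T = A.shape[0], len(O)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):
        p = pi[Q[0]] * B[Q[0], O[0]]                  # π_{q_1} · b_{q_1}(O_1)
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], O[t]]      # a_{q_{t-1} q_t} · b_{q_t}(O_t)
        total += p
    return total
```

This is exactly the exponential-cost computation the next slide complains about; it is only usable for tiny T.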
Solution 1
• Complexity (N: the number of states)
  – about 2T·N^T operations: (2T-1)·N^T multiplications
    plus N^T - 1 additions (N^T: the number of possible
    state sequences)
  – For N = 5 states and T = 100 observations, that is on
    the order of 2·100·5^100 ≈ 10^72 computations!!
• Forward Algorithm
  – Forward variable α_t(i): the probability of the partial
    observation sequence O_1, O_2, …, O_t (forward in time)
    and state S_i at time t, given the model

    α_t(i) = P(O_1, O_2, …, O_t, q_t = S_i | λ)
Solution 1

[Figure: the same trellis; when O_1 = R_1, each state is
initialized with its forward variable α_1(i)]

  α_1(1) = π_1 · b_1(O_1)
  α_1(2) = π_2 · b_2(O_1)
  α_1(3) = π_3 · b_3(O_1)

  α_1(i) = π_i · b_i(O_1),  1 ≤ i ≤ N

  α_2(1) = [α_1(1)·a_11 + α_1(2)·a_21 + α_1(3)·a_31] · b_1(O_2)
  α_2(2) = [α_1(1)·a_12 + α_1(2)·a_22 + α_1(3)·a_32] · b_2(O_2)
Forward Algorithm
• Initialization:
  α_1(i) = π_i · b_i(O_1),  1 ≤ i ≤ N
• Induction:
  α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) · a_ij] · b_j(O_{t+1}),
               1 ≤ t ≤ T-1,  1 ≤ j ≤ N
• Termination:
  P(O|λ) = Σ_{i=1}^{N} α_T(i)
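The three steps translate directly into a few lines of NumPy; a sketch assuming the same pi, A, B, O layout as the earlier snippets:

```python
import numpy as np

def forward(O, pi, A, B):
    """Forward algorithm: returns alpha (T x N) and P(O|λ). Cost O(N²T)."""
    N, T = A.shape[0], len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                        # initialization
    for t in range(T - 1):
        alpha[t+1] = (alpha[t] @ A) * B[:, O[t+1]]    # induction
    return alpha, alpha[-1].sum()                     # termination
```

Here (alpha[t] @ A)[j] is Σ_i α_t(i)·a_ij, so each induction step is one matrix-vector product: N² work per time step instead of N^t paths.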
Backward Algorithm
• Forward Algorithm
  α_t(i) = P(O_1, O_2, …, O_t, q_t = S_i | λ)
• Backward Algorithm
  – β_t(i): the probability of the partial observation
    sequence O_{t+1}, O_{t+2}, …, O_T (backward in time),
    given state S_i at time t and the model

    β_t(i) = P(O_{t+1}, O_{t+2}, …, O_T | q_t = S_i, λ)
Backward Algorithm
• Initialization:
  β_T(i) = 1,  1 ≤ i ≤ N
• Induction:
  β_t(i) = Σ_{j=1}^{N} a_ij · b_j(O_{t+1}) · β_{t+1}(j),
           t = T-1, T-2, …, 1,  1 ≤ i ≤ N
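The mirror image of the forward pass; a sketch under the same conventions:

```python
import numpy as np

def backward(O, pi, A, B):
    """Backward algorithm: returns beta (T x N)."""
    N, T = A.shape[0], len(O)
    beta = np.ones((T, N))                      # initialization: β_T(i) = 1
    for t in range(T - 2, -1, -1):
        # β_t(i) = Σ_j a_ij · b_j(O_{t+1}) · β_{t+1}(j)
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
    return beta

# Consistency check: P(O|λ) = Σ_i π_i · b_i(O_1) · β_1(i), so
# (pi * B[:, O[0]] * backward(O, pi, A, B)[0]).sum()
# should match the termination value from forward().
```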
Backward Algorithm

[Figure: the same trellis, evaluated right to left; when
O_T = R_1, the final column is initialized to β_T(i) = 1]

  β_{T-1}(1) = Σ_{j=1}^{N} a_1j · b_j(O_T) · β_T(j)
             = a_11·b_1(O_T) + a_12·b_2(O_T) + a_13·b_3(O_T)
Solution 2
• Which state sequence best explains the observed
  sequence under a given model?
  P(state sequence | observations, model)
• There is no single exact solution; the problem can be
  solved in several ways, and different constraints on
  the state sequence lead to different solutions
Solution 2
• Example: choose the states q_t that are individually
  most likely
  – γ_t(i): the probability of being in state S_i at
    time t, given the observation sequence O and the
    model λ

  γ_t(i) = P(q_t = S_i | O, λ)
         = α_t(i)·β_t(i) / P(O|λ)
         = α_t(i)·β_t(i) / Σ_{i=1}^{N} α_t(i)·β_t(i)

  q_t = argmax_{1 ≤ i ≤ N} γ_t(i),  1 ≤ t ≤ T
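In code, γ falls out of one forward and one backward pass; a sketch reusing the forward() and backward() functions from the earlier snippets:

```python
import numpy as np

def posterior_decode(O, pi, A, B):
    """Pick the individually most likely state at each time t via γ."""
    alpha, prob = forward(O, pi, A, B)    # from the earlier sketch
    beta = backward(O, pi, A, B)          # from the earlier sketch
    gamma = alpha * beta / prob           # γ_t(i) = α_t(i)·β_t(i) / P(O|λ)
    return gamma, gamma.argmax(axis=1)    # q_t = argmax_i γ_t(i)
```

Note that this criterion optimizes each q_t in isolation; the resulting sequence can even contain transitions with a_ij = 0, which is why the next slides move to the single best sequence.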
Viterbi algorithm
• The most widely used criterion is to find the
  "single best state sequence"

  maximize P(Q | O, λ) ≈ maximize P(Q, O | λ)

• A formal technique, based on dynamic programming
  methods, exists and is called the Viterbi algorithm
Viterbi algorithm
• To find the single best state sequence, Q =
  {q_1, q_2, …, q_T}, for the given observation
  sequence O = {O_1, O_2, …, O_T}
• δ_t(i): the best score (highest probability) along a
  single path, at time t, which accounts for the first t
  observations and ends in state S_i

  δ_t(i) = max_{q_1, q_2, …, q_{t-1}} P(q_1 q_2 … q_{t-1}, q_t = S_i, O_1 O_2 … O_t | λ)
Viterbi algorithm
• Initialization: δ_1(i)
  – When t = 1, the most probable path to a state does
    not sensibly exist
  – However, we use the probability of being in that
    state at t = 1, given the observation O_1

  δ_1(i) = π_i · b_i(O_1),  1 ≤ i ≤ N
  ψ_1(i) = 0
Viterbi algorithm
• Calculating δ_t(i) when t > 1
  – δ_t(X): the probability of the most probable path
    ending in state X at time t
  – This path to X has to pass through one of the
    states A, B, or C at time t-1

  Probability of the most probable path to X via A:
    δ_{t-1}(A) · a_AX · b_X(O_t)
Viterbi algorithm
• Recursion:

  δ_t(j) = max_{1 ≤ i ≤ N} [δ_{t-1}(i) · a_ij] · b_j(O_t),  2 ≤ t ≤ T,  1 ≤ j ≤ N
  ψ_t(j) = argmax_{1 ≤ i ≤ N} [δ_{t-1}(i) · a_ij],          2 ≤ t ≤ T,  1 ≤ j ≤ N

• Termination:

  P* = max_{1 ≤ i ≤ N} δ_T(i)
  q*_T = argmax_{1 ≤ i ≤ N} δ_T(i)
Viterbi algorithm
• Path (state sequence) backtracking:

  q*_t = ψ_{t+1}(q*_{t+1}),  t = T-1, T-2, …, 1

  i.e. q*_{T-1} = ψ_T(q*_T), then q*_{T-2} = ψ_{T-1}(q*_{T-1}), …
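All four stages fit in one short function; a sketch under the same pi, A, B, O conventions as before:

```python
import numpy as np

def viterbi(O, pi, A, B):
    """Single best state sequence q* and its probability P*."""
    N, T = A.shape[0], len(O)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                      # initialization
    for t in range(1, T):
        cand = delta[t-1][:, None] * A              # δ_{t-1}(i) · a_ij
        psi[t] = cand.argmax(axis=0)                # best predecessor of each j
        delta[t] = cand.max(axis=0) * B[:, O[t]]    # recursion
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()                      # termination
    for t in range(T - 2, -1, -1):
        q[t] = psi[t+1][q[t+1]]                     # backtracking
    return q, delta[-1].max()
```

Structurally this is the forward algorithm with the sum replaced by a max, plus the ψ bookkeeping needed to recover the winning path.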
Solution 3
• Which model λ = (A, B, π) is most likely to have
  produced the observed sequence?
  Which model maximizes P(observations | model)?
• There is no known analytic solution. We can choose
  λ = (A, B, π) such that P(O|λ) is locally maximized
  using an iterative procedure
Baum-Welch Method
• Define ξ_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, λ)
  – the probability of being in state S_i at time t,
    and state S_j at time t+1

  ξ_t(i, j) = α_t(i) · a_ij · b_j(O_{t+1}) · β_{t+1}(j) / P(O|λ)
            = α_t(i) · a_ij · b_j(O_{t+1}) · β_{t+1}(j)
              / [Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) · a_ij · b_j(O_{t+1}) · β_{t+1}(j)]
Baum-Welch Method
• γ_t(i): the probability of being in state S_i at time
  t, given the observation sequence O and the model λ

  γ_t(i) = α_t(i)·β_t(i) / P(O|λ)
         = α_t(i)·β_t(i) / Σ_{i=1}^{N} α_t(i)·β_t(i)

• Relate γ_t(i) to ξ_t(i, j):

  γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j)
Baum-Welch Method
• The expected number of times that state S_i is visited:

  Σ_{t=1}^{T-1} γ_t(i) = expected number of transitions from S_i

• Similarly, the expected number of transitions from
  state S_i to state S_j:

  Σ_{t=1}^{T-1} ξ_t(i, j) = expected number of transitions from S_i to S_j
Baum-Welch Method
• Re-estimation formulas for π, A and B
  π̄_i = expected frequency in state S_i at time t = 1
       = γ_1(i)

  ā_ij = expected number of transitions from state S_i to S_j
         / expected number of transitions from state S_i
       = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)

  b̄_j(k) = expected number of times in state j observing symbol v_k
           / expected number of times in state j
         = Σ_{t=1, s.t. O_t = v_k}^{T} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
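One full re-estimation pass can be built on the forward() and backward() sketches above. This sketch handles a single observation sequence O (an integer NumPy array); in practice the sums run over many sequences and the computation is done in log space or with scaling to avoid underflow:

```python
import numpy as np

def baum_welch_step(O, pi, A, B):
    """One re-estimation pass over a single sequence O; returns (π̄, Ā, B̄)."""
    alpha, prob = forward(O, pi, A, B)    # from the earlier sketches
    beta = backward(O, pi, A, B)
    gamma = alpha * beta / prob           # γ_t(i), shape (T, N)
    # ξ_t(i,j) = α_t(i)·a_ij·b_j(O_{t+1})·β_{t+1}(j) / P(O|λ), shape (T-1, N, N)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, O[1:]].T * beta[1:])[:, None, :]) / prob
    new_pi = gamma[0]                                         # γ_1(i)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]  # Σξ / Σγ
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                               # each symbol v_k
        new_B[:, k] = gamma[O == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B
```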
Baum-Welch Method
• P(O|λ̄) > P(O|λ)
• If we iteratively use λ̄ in place of λ and repeat the
  re-estimation, we can improve P(O|λ) until some
  limiting point is reached