
Page 1: An Introduction to Hidden Markov Model

An Introduction to HMM

Browny

2010.07.21

Page 2: An Introduction to Hidden Markov Model

MM vs. HMM

(Figure: in a Markov model the states are directly visible; in an HMM the states are hidden and only the observations they emit are visible.)

Page 3: An Introduction to Hidden Markov Model

Markov Model

• Given 3 weather states:

– {S1, S2, S3} = {rain, cloudy, sunny}

• State transition probabilities:

           Rain   Cloudy   Sunny
  Rain     0.4    0.3      0.3
  Cloudy   0.2    0.6      0.2
  Sunny    0.1    0.1      0.8

• What is the probability that the next 7 days will be {sun, sun, rain, rain, sun, cloud, sun}? (a sketch of this computation follows below)
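A minimal sketch of this computation in Python. The transition values come from the table above; the function name sequence_probability is illustrative, and since the slide leaves the starting condition implicit, the sketch assumes today is sunny:

    # State transition table from the slide: A[prev][next]
    A = {
        "rain":   {"rain": 0.4, "cloudy": 0.3, "sunny": 0.3},
        "cloudy": {"rain": 0.2, "cloudy": 0.6, "sunny": 0.2},
        "sunny":  {"rain": 0.1, "cloudy": 0.1, "sunny": 0.8},
    }

    def sequence_probability(states, start):
        """Probability of visiting `states` in order, starting from `start`."""
        p, prev = 1.0, start
        for s in states:
            p *= A[prev][s]   # chain rule: one transition probability per day
            prev = s
        return p

    # The 7-day question, conditioned on today being sunny:
    days = ["sunny", "sunny", "rain", "rain", "sunny", "cloudy", "sunny"]
    print(sequence_probability(days, start="sunny"))
    # 0.8 * 0.8 * 0.1 * 0.4 * 0.3 * 0.1 * 0.2 = 1.536e-4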

Page 4: An Introduction to Hidden Markov Model

Hidden Markov Model

• The states

– cannot be observed directly: they are hidden!

– but they can be observed indirectly

• Example

– North Pole or equator (model), Hot/Cold (state), 1/2/3 ice creams (observation)

Page 5: An Introduction to Hidden Markov Model

Hidden Markov Model

• Each observation is a probabilistic function of a state, and the states themselves are not directly observable

(Figure: hidden states emitting the visible observations.)

Page 6: An Introduction to Hidden Markov Model

HMM Elements

• N, the number of states in the model

• M, the number of distinct observation symbols

• A, the state transition probability distribution

• B, the observation symbol probability distribution in each state

• π, the initial state distribution

• Together, λ = (A, B, π) denotes the model

Page 7: An Introduction to Hidden Markov Model

Example

B: Observation probabilities

             P(…|C)   P(…|H)
  P(1|…)     0.7      0.1
  P(2|…)     0.2      0.2
  P(3|…)     0.1      0.7

A: Transition probabilities, and π: initial distribution

                P(…|C)   P(…|H)   P(…|Start)
  P(C|…)        0.8      0.1      0.5
  P(H|…)        0.1      0.8      0.5
  P(STOP|…)     0.1      0.1      0
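This example model can be written down directly as a sketch in Python (numpy is assumed; the STOP row is kept as a separate vector because the algorithms below do not use it):

    import numpy as np

    states  = ["C", "H"]        # hidden states: Cold, Hot
    symbols = [1, 2, 3]         # observations: ice creams eaten

    pi = np.array([0.5, 0.5])           # P(C|Start), P(H|Start)
    A  = np.array([[0.8, 0.1],          # row C: P(C|C), P(H|C)
                   [0.1, 0.8]])         # row H: P(C|H), P(H|H)
    stop = np.array([0.1, 0.1])         # P(STOP|C), P(STOP|H)
    B  = np.array([[0.7, 0.2, 0.1],     # row C: P(1|C), P(2|C), P(3|C)
                   [0.1, 0.2, 0.7]])    # row H: P(1|H), P(2|H), P(3|H)

The later sketches reuse these arrays, with an observation sequence obs given as 0-based symbol indices (e.g. obs = [0, 1, 2] for the ice-cream counts 1, 2, 3).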

Page 8: An Introduction to Hidden Markov Model

3 Problems

Page 9: An Introduction to Hidden Markov Model

3 Problems

1. Which model best matches the observed phenomena?

P(observations | model)

2. What state sequence best matches the observed phenomena and the known model?

P(state sequence | observations, model)

3. What model is most likely to have produced the observed phenomena?

which model maximizes P(observations | model)

Page 10: An Introduction to Hidden Markov Model

Solution 1

• Given the model, the probability P(O|λ) of generating an observation sequence

(Figure: a trellis of the states S1, S2, S3 over times t = 1, 2, 3, where each state can emit observation R1 or R2.)

What is the probability of observing R1 → R1 → R2?

Page 11: An Introduction to Hidden Markov Model

Solution 1

• Consider a specific state sequence

Q = q1, q2 … qT

• The probability of generating a specific observation sequence along it is

P(O|Q, λ) = P(O1|q1, λ) * P(O2|q2, λ) * … * P(OT|qT, λ)

= bq1(O1) * bq2(O2) * … * bqT(OT)

Page 12: An Introduction to Hidden Markov Model

Solution 1

• The probability that this specific state sequence occurs is

P(Q|λ) = πq1 * aq1q2 * aq2q3 * … * aq(T-1)qT

• Given the model, the probability P(O|λ) of generating the observation sequence sums over all state sequences q1, q2, …, qT:

P(O|λ) = Σ P(O|Q, λ) * P(Q|λ)

= Σ πq1 * bq1(O1) * aq1q2 * bq2(O2) * … * aq(T-1)qT * bqT(OT)

(both sums running over all state sequences q1, q2, …, qT)
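A direct, illustrative implementation of this sum, enumerating every state sequence Q (reusing the pi, A, B arrays from the sketch after Page 7; brute_force_likelihood is an assumed name):

    from itertools import product

    def brute_force_likelihood(obs, pi, A, B):
        """P(O|lambda) as the sum of P(O|Q,lambda) * P(Q|lambda) over all N^T paths."""
        N = len(pi)
        total = 0.0
        for Q in product(range(N), repeat=len(obs)):   # every state sequence
            p = pi[Q[0]] * B[Q[0], obs[0]]
            for t in range(1, len(obs)):
                p *= A[Q[t-1], Q[t]] * B[Q[t], obs[t]]
            total += p
        return total

Enumerating all N^T sequences is exactly the cost counted on the next slide, which motivates the forward algorithm.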

Page 13: An Introduction to Hidden Markov Model

Solution 1

• Complexity (N: the number of states)

– about 2T*N^T calculations: (2T-1)*N^T multiplications and N^T - 1 additions (N^T: the number of state-sequence combinations)

– For N=5 states and T=100 observations, that is on the order of 2*100*5^100 ≈ 10^72 computations!!

• Forward Algorithm

– Forward variable αt(i): the probability of the forward partial observation sequence O1, O2, O3 …, Ot and of being in state Si at time t, given the model

αt(i) = P(O1, O2, …, Ot, qt = Si | λ)

Page 14: An Introduction to Hidden Markov Model

Solution 1

(Figure: the same trellis; each arrow entering a state at time t contributes one term of the induction step.)

When O1 = R1:

α1(1) = π1 * b1(O1)

α1(2) = π2 * b2(O1)

α1(3) = π3 * b3(O1)

α1(i) = πi * bi(O1), 1 ≤ i ≤ N

α2(1) = (α1(1)*a11 + α1(2)*a21 + α1(3)*a31) * b1(O2)

α2(2) = (α1(1)*a12 + α1(2)*a22 + α1(3)*a32) * b2(O2)

Page 15: An Introduction to Hidden Markov Model

Forward Algorithm

• Initialization:

α1(i) = πi * bi(O1), 1 ≤ i ≤ N

• Induction:

αt+1(j) = [Σi=1..N αt(i) * aij] * bj(Ot+1), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N

• Termination:

P(O|λ) = Σi=1..N αT(i)
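The three steps translate almost line for line into a sketch (same assumed conventions as the earlier sketches; row t of alpha holds the slide's αt+1 because Python indexing is 0-based), at a cost of order N^2 * T instead of 2T * N^T:

    import numpy as np

    def forward(obs, pi, A, B):
        """Return the alpha table and P(O|lambda)."""
        T, N = len(obs), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]                    # initialization
        for t in range(1, T):                           # induction
            alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]  # sum over predecessors i
        return alpha, alpha[-1].sum()                   # termination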

Page 16: An Introduction to Hidden Markov Model

Backward Algorithm

• Forward Algorithm

αt(i) = P(O1, O2, …, Ot, qt = Si | λ)

• Backward Algorithm

– the probability of the backward partial observation sequence Ot+1, Ot+2, …, OT, given state Si at time t and the model

βt(i) = P(Ot+1, Ot+2, …, OT | qt = Si, λ)

Page 17: An Introduction to Hidden Markov Model

Backward Algorithm

• Initialization

βT(i) = 1, 1 ≤ i ≤ N

• Induction

βt(i) = Σj=1..N aij * bj(Ot+1) * βt+1(j), t = T-1, T-2, …, 1, 1 ≤ i ≤ N
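A matching sketch of the backward pass, under the same assumed conventions (the beta rows are again 0-based in t):

    import numpy as np

    def backward(obs, A, B):
        """Return the beta table; row t of beta holds the slide's beta_{t+1}."""
        T, N = len(obs), A.shape[0]
        beta = np.zeros((T, N))
        beta[-1] = 1.0                                   # initialization
        for t in range(T - 2, -1, -1):                   # induction, backward in t
            beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])   # sum over successors j
        return beta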

Page 18: An Introduction to Hidden Markov Model

Backward Algorithm

(Figure: the same trellis, evaluated backward from time T.)

When OT = R1:

βT-1(1) = Σj=1..N a1j * bj(OT) * βT(j)

= a11 * b1(OT) + a12 * b2(OT) + a13 * b3(OT)

Page 19: An Introduction to Hidden Markov Model

Solution 2

• What state sequence best explains the observed phenomena and the known model?

P(state sequence | observations, model)

• There is no exact solution; the problem can be solved in many ways, and different constraints on the state sequence lead to different solutions

Page 20: An Introduction to Hidden Markov Model

Solution 2

• Example: choose the states qt that are individually most likely

– γt(i): the probability of being in state Si at time t, given the observation sequence O and the model λ

γt(i) = P(qt = Si | O, λ) = αt(i) * βt(i) / P(O|λ) = αt(i) * βt(i) / Σi=1..N αt(i) * βt(i)

qt = argmax1≤i≤N γt(i), 1 ≤ t ≤ T
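Combining the forward and backward sketches gives γt(i) and the individually most likely states (posterior_decode is an assumed name; forward and backward are the sketches above):

    def posterior_decode(obs, pi, A, B):
        """gamma[t, i] = P(q_t = S_i | O, lambda); pick each state separately."""
        alpha, likelihood = forward(obs, pi, A, B)
        beta = backward(obs, A, B)
        gamma = alpha * beta / likelihood
        return gamma.argmax(axis=1), gamma

This criterion maximizes the expected number of individually correct states, but the chosen sequence may chain together transitions of probability zero, which is one motivation for the "single best sequence" criterion on the next slide.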

Page 21: An Introduction to Hidden Markov Model

Viterbi algorithm

• The most widely used criterion is to find the "single best state sequence"

maximize P(Q|O, λ), equivalently maximize P(Q, O|λ)

• A formal technique for this exists, based on dynamic programming methods, and is called the Viterbi algorithm

Page 22: An Introduction to Hidden Markov Model

Viterbi algorithm

• To find the single best state sequence, Q = {q1, q2, …, qT}, for the given observation sequence O = {O1, O2, …, OT}

• δt(i): the best score (highest probability) along a single path, at time t, which accounts for the first t observations and ends in state Si

δt(i) = max over q1, q2, …, qt-1 of P(q1, q2 … qt = Si, O1, O2 … Ot | λ)

Page 23: An Introduction to Hidden Markov Model

Viterbi algorithm

• Initialization - δ1(i)

– When t = 1 the most probable path to a state does not sensibly exist

– However we use the probability of being in that state at t = 1 and emitting the observation O1

δ1(i) = πi * bi(O1), 1 ≤ i ≤ N

ψ1(i) = 0

Page 24: An Introduction to Hidden Markov Model

Viterbi algorithm

• Calculate δt(i) when t > 1

– δt(X): the probability of the most probable path ending in state X at time t

– This path to X will have to pass through one of the states A, B or C at time t-1

Most probable path to X through A: δt-1(A) * aAX * bX(Ot)

Page 25: An Introduction to Hidden Markov Model

Viterbi algorithm

• Recursion

δt(j) = max1≤i≤N [δt-1(i) * aij] * bj(Ot), 2 ≤ t ≤ T, 1 ≤ j ≤ N

ψt(j) = argmax1≤i≤N [δt-1(i) * aij], 2 ≤ t ≤ T, 1 ≤ j ≤ N

• Termination

P* = max1≤i≤N δT(i)

qT* = argmax1≤i≤N δT(i)

Page 26: An Introduction to Hidden Markov Model

Viterbi algorithm

• Path (state sequence) backtracking

qt* = ψt+1(qt+1*), t = T-1, T-2, …, 1

qT-1* = ψT(qT*)

qT-2* = ψT-1(qT-1*)

…

q1* = ψ2(q2*)
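The full algorithm as a sketch, under the same assumed conventions (probabilities are multiplied directly; a production version would work in log space to avoid underflow):

    import numpy as np

    def viterbi(obs, pi, A, B):
        """Return the single best state sequence and its probability P*."""
        T, N = len(obs), len(pi)
        delta = np.zeros((T, N))
        psi = np.zeros((T, N), dtype=int)
        delta[0] = pi * B[:, obs[0]]                # initialization
        for t in range(1, T):                       # recursion
            scores = delta[t-1][:, None] * A        # scores[i, j] = delta_{t-1}(i) * a_ij
            psi[t] = scores.argmax(axis=0)          # best predecessor of each j
            delta[t] = scores.max(axis=0) * B[:, obs[t]]
        q = np.zeros(T, dtype=int)
        q[-1] = delta[-1].argmax()                  # termination
        for t in range(T - 2, -1, -1):              # backtracking
            q[t] = psi[t+1, q[t+1]]
        return q, delta[-1].max()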

Page 27: An Introduction to Hidden Markov Model

Solution 3

• Which model λ = (A, B, π) is most likely to have produced the observed phenomena?

which model maximizes P(observations | model)?

• There is no known analytic solution. We can choose λ = (A, B, π) such that P(O|λ) is locally maximized, using an iterative procedure

Page 28: An Introduction to Hidden Markov Model

Baum-Welch Method

• Define ξt(i, j) = P(qt=Si, qt+1=Sj | O, λ)

– The probability of being in state Si at time t and in state Sj at time t+1, given the observations and the model

ξt(i, j) = αt(i) * aij * bj(Ot+1) * βt+1(j) / P(O|λ)

= αt(i) * aij * bj(Ot+1) * βt+1(j) / Σi=1..N Σj=1..N αt(i) * aij * bj(Ot+1) * βt+1(j)

Page 29: An Introduction to Hidden Markov Model

Baum-Welch Method

• γt(i) : the probability of being in state Si at time

t, given the observation sequence O, and the

model λ

γt(i) = αt(i) * βt(i) / P(O|λ) = αt(i) * βt(i) / Σi=1..N αt(i) * βt(i)

• Relate γt(i) to ξt(i, j)

γt(i) = Σj=1..N ξt(i, j)

Page 30: An Introduction to Hidden Markov Model

Baum-Welch Method

• The expected number of times that state Si is visited (equivalently, the expected number of transitions from Si):

Σt=1..T-1 γt(i)

• Similarly, the expected number of transitions from state Si to state Sj:

Σt=1..T-1 ξt(i, j)

Page 31: An Introduction to Hidden Markov Model

Baum-Welch Method

• Re-estimation formulas for π, A and B

π̄i = expected frequency in state Si at time t = 1 = γ1(i)

āij = expected number of transitions from state Si to state Sj / expected number of transitions from state Si

= [Σt=1..T-1 ξt(i, j)] / [Σt=1..T-1 γt(i)]

b̄j(k) = expected number of times in state Sj observing symbol vk / expected number of times in state Sj

= [Σt=1..T, s.t. Ot=vk γt(j)] / [Σt=1..T γt(j)]
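A sketch of one re-estimation pass, for a single observation sequence, reusing the forward and backward sketches above (baum_welch_step is an assumed name; a robust version would guard against zero denominators):

    import numpy as np

    def baum_welch_step(obs, pi, A, B):
        """One pass of the re-estimation formulas; returns (pi_bar, A_bar, B_bar)."""
        T, N = len(obs), len(pi)
        alpha, likelihood = forward(obs, pi, A, B)
        beta = backward(obs, A, B)
        gamma = alpha * beta / likelihood           # gamma[t, i]
        xi = np.zeros((T - 1, N, N))                # xi[t, i, j]
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * B[:, obs[t+1]] * beta[t+1] / likelihood
        pi_bar = gamma[0]                           # expected frequency at t = 1
        A_bar = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B_bar = np.zeros_like(B)
        for k in range(B.shape[1]):                 # for each symbol v_k
            B_bar[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
        return pi_bar, A_bar, B_bar

Iterating this step, replacing (pi, A, B) with the re-estimates each time, implements the procedure on the next slide.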

Page 32: An Introduction to Hidden Markov Model

Baum-Welch Method

• For the re-estimated model λ̄, P(O|λ̄) > P(O|λ)

• If we iteratively use λ̄ in place of λ and repeat the re-estimation, we can improve P(O|λ) until some limiting point is reached