
Page 1: Hidden Markov Models

Hidden Markov Models

Richard Golden (following the approach of Chapter 9 of Manning and Schutze, 2000)

REVISION DATE: April 15 (Tuesday), 2003

Page 2: Hidden Markov Models

VMM (Visible Markov Model)

[Diagram: a two-state visible Markov model with start state S0 and states S1, S2; transition probabilities a11 = 0.7, a12 = 0.3, a21 = 0.5, a22 = 0.5, and initial-state arrows π1, π2 from S0.]

Page 3: Hidden Markov Models

HMM Notation
• State Sequence Variables: X1, …, XT+1
• Output Sequence Variables: O1, …, OT
• Set of Hidden States: S1, …, SN
• Output Alphabet: K1, …, KM
• Initial State Probabilities (π1, …, πN): πi = p(X1 = Si), i = 1, …, N
• State Transition Probabilities (aij), i, j ∈ {1, …, N}: aij = p(Xt+1 = Sj | Xt = Si), t = 1, …, T
• Emission Probabilities (bij), i ∈ {1, …, N}, j ∈ {1, …, M}: bij = p(Ot = Kj | Xt = Si), t = 1, …, T

Page 4: Hidden Markov Models

HMM State-Emission Representation

[Diagram: the same two-state model drawn as a state-emission HMM. Initial state probabilities π1 = 1, π2 = 0; transition probabilities a11 = 0.7, a12 = 0.3, a21 = 0.5, a22 = 0.5; emission probabilities from S1: b11 = 0.6, b12 = 0.1, b13 = 0.3, and from S2: b21 = 0.1, b22 = 0.7, b23 = 0.2, for output symbols K1, K2, K3.]
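As a concrete reference for the later calculations, here is a minimal sketch of this example model written out in Python/NumPy (the names pi, A, B are my own; indices are 0-based, so S1 is row 0 and K1 is column 0):

```python
import numpy as np

# Example HMM from the state-emission diagram above.
pi = np.array([1.0, 0.0])            # initial state probabilities pi_1, pi_2
A  = np.array([[0.7, 0.3],           # a11, a12
               [0.5, 0.5]])          # a21, a22
B  = np.array([[0.6, 0.1, 0.3],      # b11, b12, b13  (S1 emitting K1, K2, K3)
               [0.1, 0.7, 0.2]])     # b21, b22, b23  (S2 emitting K1, K2, K3)

# Each row of A and B is a probability distribution.
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```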


Page 5: Hidden Markov Models

Arc-Emission Representation
• Note that sometimes a Hidden Markov Model is represented by having the emission arrows come off the arcs.
• In this situation you would have many more emission arrows, because there are many more arcs.
• But the transition and emission probabilities are the same; it just takes longer to draw on your PowerPoint presentation (self-conscious presentation).

Page 6: Hidden Markov Models

Fundamental Questions for HMMs

• MODEL FIT
– How can we compute the likelihood of observations and hidden states given known emission and transition probabilities?
Compute: p("Dog"/NOUN, "is"/VERB, "Good"/ADJ | {aij}, {bkm})
– How can we compute the likelihood of the observations given known emission and transition probabilities?
Compute: p("Dog", "is", "Good" | {aij}, {bkm})

Page 7: Hidden Markov Models

Fundamental Questions for HMMs

• INFERENCE
– How can we infer the sequence of hidden states given the observations and the known emission and transition probabilities?
– Maximize: p("Dog"/?, "is"/?, "Good"/? | {aij}, {bkm}) with respect to the unknown labels

Page 8: Hidden Markov Models

Fundamental Questions for HMMs

• LEARNING
– How can we estimate the emission and transition probabilities given observations, assuming that the hidden states are observable during the learning process?
– How can we estimate the emission and transition probabilities given the observations only?

Page 9: Hidden Markov Models

Direct Calculation of Model Fit (note use of "Markov" Assumptions)

Part 1

$$
p(o_1,\ldots,o_T, x_1,\ldots,x_{T+1} \mid \{a_{ij}\},\{b_{km}\})
= p(o_1,\ldots,o_T \mid x_1,\ldots,x_{T+1}, \{a_{ij}\},\{b_{km}\}) \; p(x_1,\ldots,x_{T+1} \mid \{a_{ij}\},\{b_{km}\})
$$

EXAMPLE: p("Dog"/NOUN, "is"/VERB, "Good"/ADJ | {aij}, {bkm})
= p("Dog", "is", "Good" | NOUN, VERB, ADJ, {aij}, {bkm}) × p(NOUN, VERB, ADJ | {aij}, {bkm})

This follows directly from the definition of a conditional probability: p(o, x) = p(o | x) p(x).

Page 10: Hidden Markov Models

Direct Calculation of Likelihood of Labeled Observations (note use of "Markov" Assumptions)

Part 2

$$
\begin{aligned}
p(o_1,\ldots,o_T, x_1,\ldots,x_{T+1} \mid \{a_{ij}\},\{b_{km}\})
&= p(o_1,\ldots,o_T \mid x_1,\ldots,x_{T+1}, \{a_{ij}\},\{b_{km}\}) \; p(x_1,\ldots,x_{T+1} \mid \{a_{ij}\},\{b_{km}\}) \\
p(o_1,\ldots,o_T \mid x_1,\ldots,x_{T+1}, \{a_{ij}\},\{b_{km}\}) &= \prod_{t=1}^{T} p(o_t \mid x_t, \{b_{km}\}) \\
p(x_1,\ldots,x_{T+1} \mid \{a_{ij}\},\{b_{km}\}) &= p(x_1 \mid \{\pi_i\}) \prod_{t=1}^{T} p(x_{t+1} \mid x_t, \{a_{ij}\})
\end{aligned}
$$

EXAMPLE: Compute p("Dog"/NOUN, "is"/VERB, "good"/ADJ | {aij}, {bkm})
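As an illustration of this factorization, here is a small Python sketch (reusing the pi, A, B arrays defined earlier; the function name and the 0-based integer coding of states and symbols are my own):

```python
import numpy as np

pi = np.array([1.0, 0.0])
A  = np.array([[0.7, 0.3], [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])

def joint_likelihood(obs, states, pi, A, B):
    """Direct calculation: p(x_1) * prod_t b_{x_t, o_t} * a_{x_t, x_{t+1}} ("emit and jump")."""
    p = pi[states[0]]
    for t, o in enumerate(obs):
        p *= B[states[t], o] * A[states[t], states[t + 1]]
    return p

# Observations K3, K2, K1 (coded 2, 1, 0) labeled S1, S2, S1, with the final state X_4
# summed out; this recovers pi_1*b13*a12*b22*a21*b11 = 0.0189, the product written out
# on the graphical-algorithm slide that follows.
obs = [2, 1, 0]
print(sum(joint_likelihood(obs, [0, 1, 0, x4], pi, A, B) for x4 in (0, 1)))   # 0.0189
```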

Page 11: Hidden Markov Models

Graphical Algorithm Representation of Direct Calculation of Likelihood of Observations and Hidden States (not hard!)

[Diagram: the state-emission HMM from the earlier slide, with its initial, transition, and emission probabilities (π1 = 1, π2 = 0; a11 = 0.7, a12 = 0.3, a21 = 0.5, a22 = 0.5; b11 = 0.6, b12 = 0.1, b13 = 0.3, b21 = 0.1, b22 = 0.7, b23 = 0.2).]

The likelihood of a particular "labeled" sequence of observations (e.g., p("Dog"/NOUN, "is"/VERB, "Good"/NOUN | {aij}, {bkm})) may be computed with the "direct calculation" method using the following simple graphical algorithm.

Specifically, p(K3/S1, K2/S2, K1/S1 | {aij}, {bkm}) = π1 b13 a12 b22 a21 b11

Note that "good" is the name of the dog, so it is a noun!

Page 12: Hidden Markov Models

Extension to the case where the likelihood of the observations given the parameters is needed (e.g., p("Dog", "is", "good" | {aij}, {bkm}))

The joint likelihood of observations and hidden states factors exactly as on the previous slide; the likelihood of the observations alone is obtained by summing over every possible hidden state sequence:

$$
p(o_1,\ldots,o_T \mid \{a_{ij}\},\{b_{km}\})
= \sum_{x_1,\ldots,x_{T+1}} p(o_1,\ldots,o_T, x_1,\ldots,x_{T+1} \mid \{a_{ij}\},\{b_{km}\})
$$

KILLER EQUATION!!!!!
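A brute-force rendering of this "killer equation" in Python, summing the joint likelihood over every hidden state sequence (function name mine; feasible only for tiny models, as the next slide explains):

```python
from itertools import product
import numpy as np

pi = np.array([1.0, 0.0])
A  = np.array([[0.7, 0.3], [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])

def likelihood_bruteforce(obs, pi, A, B):
    """p(o_1..o_T) by summing the joint over all N^(T+1) hidden state sequences."""
    N, T = len(pi), len(obs)
    total = 0.0
    for states in product(range(N), repeat=T + 1):
        p = pi[states[0]]
        for t, o in enumerate(obs):
            p *= B[states[t], o] * A[states[t], states[t + 1]]
        total += p
    return total

print(likelihood_bruteforce([2, 1, 0], pi, A, B))   # 0.0315 for the sequence K3, K2, K1
```

The forward method discussed below computes the same number without enumerating the sequences.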

Page 13: Hidden Markov Models

Efficiency of Calculations is Important (e.g., Model-Fit)

• Assume 1 multiplication per microsecond• Assume N=1000 word vocabulary and T=7 word sentence.• (2T+1)NT+1 multiplications by

“direct calculation” yields (2(7)+1)(1000)(7+1) is about 475,000 million years of computer time!!!

• 2N2T multiplications using “forward method”is about 14 seconds of computer time!!!

Page 14: Hidden Markov Models

Forward, Backward, and Viterbi Calculations

• Forward calculation methods are thus very useful.

• Forward, Backward, and Viterbi Calculations will now be discussed.

Page 15: Hidden Markov Models

Forward Calculations – Overview

[Diagram: the model unrolled into a trellis. From the start state S0, each of times 2, 3, and 4 has a copy of S1 and S2, connected by the transition probabilities a11 = 0.7, a12 = 0.3, a21 = 0.5, a22 = 0.5, with emission arrows to the output symbols K1, K2, K3 at each time.]

Page 16: Hidden Markov Models

Forward Calculations – Time 2 (1 word example)

[Diagram: the first slice of the trellis, with initial probabilities π1 = 1, π2 = 0 and the emission of the first word K3 (b13 = 0.3 from S1, b23 = 0.2 from S2).]

NOTE: α1(2) + α2(2) is the likelihood of the observation/word "K3" in this "1 word example".

α1(1) = 1, α2(1) = 0
α1(2) = α1(1) b13 a11 + α2(1) b23 a21 = 0.21
α2(2) = α1(1) b13 a12 + α2(1) b23 a22 = 0.09
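The same one-step "emit and jump" update, written as a vector-matrix product in NumPy (a sketch; variable names are mine):

```python
import numpy as np

pi   = np.array([1.0, 0.0])
A    = np.array([[0.7, 0.3], [0.5, 0.5]])
b_K3 = np.array([0.3, 0.2])        # b13, b23: probabilities of emitting K3 from S1, S2

alpha_1 = pi                       # alpha_i(1) = pi_i
alpha_2 = (alpha_1 * b_K3) @ A     # emit K3, then jump
print(alpha_2)                     # [0.21  0.09]; their sum 0.3 is the likelihood of "K3"
```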

Page 17: Hidden Markov Models

Forward Calculations – Time 3 (2 word example)

[Diagram: the trellis through time 3; the second word K2 is emitted with b12 = 0.1 from S1 and b22 = 0.7 from S2.]

α1(3) = α1(2) b12 a11 + α2(2) b22 a21 = 0.0462
α2(3) = α1(2) b12 a12 + α2(2) b22 a22 = 0.0378

Page 18: Hidden Markov Models

Forward Calculations – Time 4 (3 word example)

[Diagram: the full trellis through time 4; the third word K1 is emitted with b11 = 0.6 from S1 and b21 = 0.1 from S2.]

Page 19: Hidden Markov Models

Forward Calculation of Likelihood Function (“emit and jump”)

t=1 (0-word): α1(1) = π1 = 1.0,  α2(1) = π2 = 0.0,  L(1) = 1.0
t=2 (1-word): α1(2) = α1(1) b13 a11 + α2(1) b23 a21 = 0.21,  α2(2) = α1(1) b13 a12 + α2(1) b23 a22 = 0.09,  L(2) = 0.3
t=3 (2-word): α1(3) = α1(2) b12 a11 + α2(2) b22 a21 = 0.0462,  α2(3) = α1(2) b12 a12 + α2(2) b22 a22 = 0.0378,  L(3) = 0.084
t=4 (3-word): α1(4) = α1(3) b11 a11 + α2(3) b21 a21 = 0.021294,  α2(4) = α1(3) b11 a12 + α2(3) b21 a22 = 0.010206,  L(4) = 0.0315

Here L(t) = α1(t) + α2(t) is the likelihood of the first t−1 observed words (K3, K2, K1 in that order).
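The whole table can be generated by the following sketch of the forward recursion (the "emit and jump" update above, iterated over the observations; names and 0-based indexing are mine):

```python
import numpy as np

pi = np.array([1.0, 0.0])
A  = np.array([[0.7, 0.3], [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])

def forward(obs, pi, A, B):
    """alpha[t-1, i] = p(o_1..o_{t-1}, X_t = S_i); update: alpha_j(t+1) = sum_i alpha_i(t) b_{i,o_t} a_{ij}."""
    alpha = np.zeros((len(obs) + 1, len(pi)))
    alpha[0] = pi                                   # the t=1 column of the table
    for t, o in enumerate(obs):
        alpha[t + 1] = (alpha[t] * B[:, o]) @ A     # emit o_t, then jump
    return alpha

alpha = forward([2, 1, 0], pi, A, B)                # observations K3, K2, K1
print(alpha)                # rows: [1, 0], [0.21, 0.09], [0.0462, 0.0378], [0.021294, 0.010206]
print(alpha.sum(axis=1))    # L(t): 1.0, 0.3, 0.084, 0.0315
```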

Page 20: Hidden Markov Models

Backward Calculations – Overview

[Diagram: the same unrolled trellis as in the forward overview, now read from right to left for the backward calculations.]

Page 21: Hidden Markov Models

Backward Calculations – Time 4

[Diagram: the time-4 slice of the trellis; the last word K1 is emitted with b11 = 0.6 from S1 and b21 = 0.1 from S2.]

Page 22: Hidden Markov Models

Backward Calculations – Time 3

[Diagram: the time-3 slice of the trellis; with only the last word K1 remaining, β1(3) = b11 = 0.6 and β2(3) = b21 = 0.1.]

Page 23: Hidden Markov Models

Backward Calculations – Time 2

[Diagram: the trellis from time 2 to time 4; the remaining words are K2 (emitted with b12 = 0.1, b22 = 0.7) and then K1.]

NOTE: β1(2) + β2(2) is the likelihood of the observation/word sequence "K2, K1" in this "2 word example".

β1(4) = 1, β2(4) = 1
β1(3) = 0.6, β2(3) = 0.1
β1(2) = a11 b12 β1(3) + a12 b12 β2(3) = 0.045
β2(2) = a21 b22 β1(3) + a22 b22 β2(3) = 0.245
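The same step as a short NumPy computation (a sketch; names mine):

```python
import numpy as np

A    = np.array([[0.7, 0.3], [0.5, 0.5]])
b_K2 = np.array([0.1, 0.7])        # b12, b22: probabilities of emitting K2 from S1, S2

beta_3 = np.array([0.6, 0.1])      # beta_1(3), beta_2(3) from the time-3 slide
beta_2 = b_K2 * (A @ beta_3)       # emit K2, then jump into the time-3 values
print(beta_2)                      # [0.045  0.245]; their sum 0.29 is the likelihood of "K2, K1"
```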

Page 24: Hidden Markov Models

Backward Calculations – Time 1

[Diagram: the full trellis including the start state S0; combining β1(1) and β2(1) with the initial probabilities π1 = 1, π2 = 0 gives the likelihood of the whole three-word sequence.]

Page 25: Hidden Markov Models

Backward Calculation of Likelihood Function (“EMIT AND JUMP”)

t=4: β1(4) = 1,  β2(4) = 1,  L(4) = 1
t=3: β1(3) = b11 = 0.6,  β2(3) = b21 = 0.1,  L(3) = 0.7
t=2: β1(2) = a11 b12 β1(3) + a12 b12 β2(3) = 0.045,  β2(2) = a21 b22 β1(3) + a22 b22 β2(3) = 0.245,  L(2) = 0.290
t=1: β1(1) = a11 b13 β1(2) + a12 b13 β2(2) = 0.0315,  β2(1) = a21 b23 β1(2) + a22 b23 β2(2) = 0.029,  L(1) = π1 β1(1) + π2 β2(1) = 0.0315

Here L(t) = p(ot, …, oT), the likelihood of the words from position t onward; for t > 1, L(t) = β1(t) + β2(t).
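A sketch of the full backward recursion that reproduces this table (again with my own names and 0-based indexing):

```python
import numpy as np

pi = np.array([1.0, 0.0])
A  = np.array([[0.7, 0.3], [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])

def backward(obs, pi, A, B):
    """beta[t-1, i] = p(o_t..o_T | X_t = S_i); update: beta_i(t) = b_{i,o_t} * sum_j a_{ij} beta_j(t+1)."""
    beta = np.ones((len(obs) + 1, len(pi)))         # beta(T+1) = 1
    for t in range(len(obs) - 1, -1, -1):
        beta[t] = B[:, obs[t]] * (A @ beta[t + 1])  # emit o_t, then jump
    return beta

beta = backward([2, 1, 0], pi, A, B)                # observations K3, K2, K1
print(beta)                 # rows: [0.0315, 0.029], [0.045, 0.245], [0.6, 0.1], [1, 1]
print(pi @ beta[0])         # L(1) = p(K3, K2, K1) = 0.0315
```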

Page 26: Hidden Markov Models

You get same answer going forward or backward!!

(The forward and backward tables from the two previous slides are shown side by side.)

Forward (from the α table): L = α1(4) + α2(4) = 0.021294 + 0.010206 = 0.0315
Backward (from the β table): L = π1 β1(1) + π2 β2(1) = (1)(0.0315) + (0)(0.029) = 0.0315

Page 27: Hidden Markov Models

The Forward-Backward Method

• Note the forward method computes:

$$p(K_1,\ldots,K_{t-1}) = \sum_{i=1}^{N} \alpha_i(t)$$

• Note the backward method computes (t > 1):

$$p(K_t,\ldots,K_T) = \sum_{i=1}^{N} \beta_i(t)$$

• We can do the forward-backward method, which computes p(K1, …, KT) using the formula (for any choice of t = 1, …, T+1!):

$$L = p(K_1,\ldots,K_T) = \sum_{i=1}^{N} \alpha_i(t)\,\beta_i(t)$$
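Combining the two recursions sketched earlier, the product αi(t)βi(t), summed over states, should give the same likelihood at every time slice; a quick check under the same assumptions as before:

```python
import numpy as np

pi = np.array([1.0, 0.0])
A  = np.array([[0.7, 0.3], [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
obs = [2, 1, 0]                                      # K3, K2, K1

def forward(obs, pi, A, B):
    alpha = np.zeros((len(obs) + 1, len(pi)))
    alpha[0] = pi
    for t, o in enumerate(obs):
        alpha[t + 1] = (alpha[t] * B[:, o]) @ A      # emit, then jump
    return alpha

def backward(obs, pi, A, B):
    beta = np.ones((len(obs) + 1, len(pi)))
    for t in range(len(obs) - 1, -1, -1):
        beta[t] = B[:, obs[t]] * (A @ beta[t + 1])   # emit, then jump
    return beta

alpha, beta = forward(obs, pi, A, B), backward(obs, pi, A, B)
# sum_i alpha_i(t) beta_i(t) is the same total likelihood for every t = 1, ..., T+1.
print((alpha * beta).sum(axis=1))                    # [0.0315 0.0315 0.0315 0.0315]
```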

Page 28: Hidden Markov Models

Example Forward-Backward Calculation!

$$L = p(K_1,\ldots,K_T) = \alpha_1(t)\,\beta_1(t) + \alpha_2(t)\,\beta_2(t) = 0.0315 \quad \text{for } t = 1,\ldots,T+1$$

Using the forward and backward tables from the previous slides:
t=1: (1.0)(0.0315) + (0.0)(0.029) = 0.0315
t=2: (0.21)(0.045) + (0.09)(0.245) = 0.0315
t=3: (0.0462)(0.6) + (0.0378)(0.1) = 0.0315
t=4: (0.021294)(1) + (0.010206)(1) = 0.0315

Page 29: Hidden Markov Models

Solution to Problem 1

• The “hard part” of the 1st Problem was to find the likelihood of the observations for an HMM

• We can now do this using either the forward, backward, or forward-backward method.

Page 30: Hidden Markov Models

Solution to Problem 2: Viterbi Algorithm (Computing the "Most Probable" Labeling)

• Consider the direct calculation of labeled observations.
• Previously we summed these likelihoods together across all possible labelings to solve the first problem, which was to compute the likelihood of the observations given the parameters (the hard part of HMM Question 1!).
– We solved this problem using the forward or backward method.
• Now we want to compute all possible labelings and their respective likelihoods and pick the labeling whose likelihood is largest!

EXAMPLE: Compute p("Dog"/NOUN, "is"/VERB, "good"/ADJ | {aij}, {bkm})

Page 31: Hidden Markov Models

Efficiency of Calculations is Important (e.g., Most Likely Labeling Problem)

• Just as in the forward-backward calculations, we can solve the problem of computing the likelihood of every one of the N^T possible labelings efficiently.
• Instead of millions of years of computing time we can solve the problem in several seconds!!

Page 32: Hidden Markov Models

Viterbi Algorithm – Overview (same setup as forward algorithm)

[Diagram: the same unrolled trellis used for the forward algorithm, now used for the Viterbi calculations.]

Page 33: Hidden Markov Models

Forward Calculations – Time 2 (1 word example)

[Diagram: the time-2 slice of the trellis, with π1 = 1, π2 = 0 and the emission of the first word K3 (b13 = 0.3, b23 = 0.2).]

δ1(1) = 1, δ2(1) = 0
δ1(2) = max{δ1(1) b13 a11, δ2(1) b23 a21} = 0.21
δ2(2) = max{δ1(1) b13 a12, δ2(1) b23 a22} = 0.09
ψ1(2) = 1, ψ2(2) = 1
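The same step in NumPy, replacing the forward algorithm's sum with a max and remembering the maximizing previous state (a sketch; names mine, with +1 added only for 1-based display):

```python
import numpy as np

pi   = np.array([1.0, 0.0])
A    = np.array([[0.7, 0.3], [0.5, 0.5]])
b_K3 = np.array([0.3, 0.2])                  # b13, b23

delta_1 = pi                                 # delta_i(1) = pi_i
cand = (delta_1 * b_K3)[:, None] * A         # cand[i, j] = delta_i(1) * b_{i,K3} * a_{ij}
print(cand.max(axis=0))                      # delta(2) = [0.21, 0.09]
print(cand.argmax(axis=0) + 1)               # psi(2): best previous state (1-based) = [1, 1]
```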

Page 34: Hidden Markov Models

Backtracking – Time 2 (1 word example)

[Diagram: the same time-2 slice, now with the backtracking pointers drawn; ψ1(2) = ψ2(2) = 1, so both states at time 2 trace back to S1 at time 1.]

Page 35: Hidden Markov Models

Forward Calculations – (2 word example)

[Diagram: the trellis through time 3; the second word K2 is emitted with b12 = 0.1 from S1 and b22 = 0.7 from S2.]

Page 36: Hidden Markov Models

BACKTRACKING – (2 word example)

[Diagram: the trellis through time 3 with backtracking pointers drawn for the 2-word example.]

Page 37: Hidden Markov Models

Formal Analysis of 2 word case

δ1(3) = max{δ1(2) b12 a11, δ2(2) b22 a21} = max{(0.21)(0.1)(0.7), (0.09)(0.7)(0.5)} = max{0.0147, 0.0315} = 0.0315
δ2(3) = max{δ1(2) b12 a12, δ2(2) b22 a22} = max{(0.21)(0.1)(0.3), (0.09)(0.7)(0.5)} = max{0.0063, 0.0315} = 0.0315
ψ1(3) = 2, ψ2(3) = 2

For the 2-word sequence the most likely final state is the one maximizing δi(3); here δ1(3) = δ2(3) = 0.0315 is a tie, so pick either state arbitrarily.

Page 38: Hidden Markov Models

Forward Calculations – Time 4 (3 word example)

[Diagram: the full trellis through time 4; the third word K1 is emitted with b11 = 0.6 from S1 and b21 = 0.1 from S2.]

Page 39: Hidden Markov Models

Backtracking to Obtain Labeling for 3 word case

[Diagram: the full trellis with backtracking pointers drawn from time 4 back to time 1 for the 3-word example.]

Page 40: Hidden Markov Models

Formal Analysis of 3 word case

δ1(4) = max{δ1(3) b11 a11, δ2(3) b21 a21} = max{(0.0315)(0.6)(0.7), (0.0315)(0.1)(0.5)} = 0.01323
δ2(4) = max{δ1(3) b11 a12, δ2(3) b21 a22} = max{(0.0315)(0.6)(0.3), (0.0315)(0.1)(0.5)} = 0.00567
ψ1(4) = 1, ψ2(4) = 1

Backtracking from the most likely final state X4 = S1 (δ1(4) = 0.01323): ψ1(4) = 1 gives X3 = S1, ψ1(3) = 2 gives X2 = S2, and ψ2(2) = 1 gives X1 = S1, recovering the labeling K3/S1, K2/S2, K1/S1.
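A compact sketch of the whole Viterbi pass with backtracking, under the same conventions as the forward code above (names and 0-based indexing are mine):

```python
import numpy as np

pi = np.array([1.0, 0.0])
A  = np.array([[0.7, 0.3], [0.5, 0.5]])
B  = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])

def viterbi(obs, pi, A, B):
    """delta[t, j]: best probability of any path ending in S_j; psi[t, j]: the previous state on it."""
    N, T = len(pi), len(obs)
    delta, psi = np.zeros((T + 1, N)), np.zeros((T + 1, N), dtype=int)
    delta[0] = pi
    for t, o in enumerate(obs):
        cand = (delta[t] * B[:, o])[:, None] * A    # cand[i, j] = delta_i(t) b_{i,o_t} a_{ij}
        delta[t + 1], psi[t + 1] = cand.max(axis=0), cand.argmax(axis=0)
    path = [int(delta[T].argmax())]                 # most likely final state
    for t in range(T, 0, -1):                       # follow the backtracking pointers
        path.append(int(psi[t, path[-1]]))
    return delta, list(reversed(path))

delta, path = viterbi([2, 1, 0], pi, A, B)          # observations K3, K2, K1
print(delta[-1])       # [0.01323  0.00567]
print(path)            # [0, 1, 0, 0] -> S1, S2, S1 (and final state S1): K3/S1, K2/S2, K1/S1
```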

Page 41: Hidden Markov Models

Third Fundamental Question: Parameter Estimation

• Make an initial guess for {aij} and {bkm}.
• Compute the probability that one hidden state follows another, given {aij}, {bkm}, and the sequence of observations (computed using the forward-backward algorithm).
• Compute the probability of an observed symbol given a hidden state, given {aij}, {bkm}, and the sequence of observations (computed using the forward-backward algorithm).
• Use these computed probabilities to make an improved guess for {aij} and {bkm} (see the sketch of one re-estimation step below).
• Repeat this process until convergence.
• It can be shown that this algorithm does in fact converge to the correct choice for {aij} and {bkm}, assuming that the initial guess was close enough.
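One way to make this recipe concrete is the Baum-Welch (EM) re-estimation step sketched below, which uses the forward and backward passes from earlier to form expected transition and emission counts and then renormalizes them. This is a minimal single-sequence sketch under my own naming and indexing conventions, not code from the deck itself:

```python
import numpy as np

def forward(obs, pi, A, B):
    alpha = np.zeros((len(obs) + 1, len(pi)))
    alpha[0] = pi
    for t, o in enumerate(obs):
        alpha[t + 1] = (alpha[t] * B[:, o]) @ A
    return alpha

def backward(obs, pi, A, B):
    beta = np.ones((len(obs) + 1, len(pi)))
    for t in range(len(obs) - 1, -1, -1):
        beta[t] = B[:, obs[t]] * (A @ beta[t + 1])
    return beta

def baum_welch_step(obs, pi, A, B):
    """One EM step: posterior state/transition probabilities, then re-normalized parameter guesses."""
    alpha, beta = forward(obs, pi, A, B), backward(obs, pi, A, B)
    L = pi @ beta[0]                                  # likelihood of obs under the current guess
    T, (N, M) = len(obs), B.shape
    gamma = alpha * beta / L                          # gamma[t, i] = p(X_{t+1} = S_i | obs)
    xi = np.zeros((T, N, N))                          # xi[t, i, j] = p(X_{t+1}=S_i, X_{t+2}=S_j | obs)
    for t, o in enumerate(obs):
        xi[t] = (alpha[t] * B[:, o])[:, None] * A * beta[t + 1][None, :] / L
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:T].sum(axis=0)[:, None]
    new_B = np.zeros((N, M))
    for t, o in enumerate(obs):
        new_B[:, o] += gamma[t]                       # expected emission counts
    new_B /= gamma[:T].sum(axis=0)[:, None]
    return new_pi, new_A, new_B

# Usage: start from any initial guess and iterate until the parameters stop changing, e.g.
#   pi, A, B = baum_welch_step(obs, pi, A, B)
```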