
Page 1: Hidden Markov Models

Sargur Srihari
srihari@cedar.buffalo.edu

Page 2: HMM Overview


Page 3: Role of HMMs in ML

• Ubiquitous tool for modeling time-series data
• Used in almost all speech recognition systems
• Computational molecular biology, e.g., grouping amino acid sequences into proteins
• An HMM is a Bayesian network (BN) for representing probability distributions over sequences of observations
• An HMM has two defining properties:
  • The observation xt at time t was generated by some process whose state zt is hidden from the observer
  • The state zt depends only on the state zt-1 and is independent of all earlier states (first order)
• Example: z is a sequence of phonemes, x are acoustic observations


Page 4: Graphical Model of an HMM


• Has the state-space model shown below; the latent variables are discrete
[Figure: graphical model of an HMM, a chain of latent variables zn with corresponding observations xn]
• The joint distribution has the form

$$p(x_1,\dots,x_N,\, z_1,\dots,z_N) \;=\; p(z_1)\left[\prod_{n=2}^{N} p(z_n \mid z_{n-1})\right]\prod_{n=1}^{N} p(x_n \mid z_n)$$
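To make this factorization concrete, here is a minimal sketch (not from the slides) that evaluates the joint probability for a discrete-emission HMM in NumPy; the parameter arrays pi, A, and B below are hypothetical.

```python
import numpy as np

def hmm_joint_prob(z, x, pi, A, B):
    """Joint probability p(x_1..x_N, z_1..z_N) of a state path z and an
    observation sequence x, following the factorization above.
    pi[k]   = p(z_1 = k)               (initial-state distribution)
    A[j, k] = p(z_n = k | z_{n-1} = j) (transition matrix)
    B[k, v] = p(x_n = v | z_n = k)     (discrete emission table)"""
    p = pi[z[0]] * B[z[0], x[0]]
    for n in range(1, len(z)):
        p *= A[z[n - 1], z[n]] * B[z[n], x[n]]
    return p

# Hypothetical parameters for K = 2 states and 2 observation symbols
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(hmm_joint_prob([0, 0, 1], [0, 1, 1], pi, A, B))
```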

Page 5: Mixture Viewed as HMM


Mixture Viewed as HMM

• A single time slice corresponds to a mixture distribution with component densities p(x|z)

• An extension of mixture model• Choice of mixture component depends on choice of

mixture component for previous distribution• Latent variables are multinomial variables zn

• That describe component responsible for generating xn• Can use one-of-K coding scheme

Page 6: Transition Probability Matrix

• Joint distribution (repeated from the previous page):

$$p(x_1,\dots,x_N,\, z_1,\dots,z_N) \;=\; p(z_1)\left[\prod_{n=2}^{N} p(z_n \mid z_{n-1})\right]\prod_{n=1}^{N} p(x_n \mid z_n)$$

• We allow zn to depend on zn-1 via p(zn|zn-1)
• The latent variables are 1-of-K coded, so this conditional distribution is specified by a transition probability matrix A with Ajk = p(zn,k=1 | zn-1,j=1)
• The conditional distribution is written as

$$p(z_n \mid z_{n-1}, A) \;=\; \prod_{k=1}^{K}\prod_{j=1}^{K} A_{jk}^{\,z_{n-1,j}\, z_{n,k}}$$

• The exponent zn-1,j zn,k, which is a product, is either 0 or 1, so the double product evaluates to the single factor Ajk picked out by the settings of zn and zn-1
• Thus p(zn=3 | zn-1=2) = A23

[Figure: transition diagram for a latent variable with three possible states (K = 3), with the corresponding A matrix]

• The transition diagram is not a graphical model, since its nodes are not separate variables but states of a single variable
• Matrix A has K(K-1) independent parameters, since each of its K rows must sum to 1
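As an illustration (not from the slides), here is a minimal NumPy sketch of a K = 3 transition matrix and of reading off p(zn=3 | zn-1=2) = A23; the probability values are hypothetical.

```python
import numpy as np

# Hypothetical K = 3 transition matrix: row j holds p(z_n = k | z_{n-1} = j),
# so each row sums to 1, leaving K(K-1) = 6 independent parameters.
A = np.array([[0.70, 0.20, 0.10],
              [0.30, 0.50, 0.20],
              [0.10, 0.30, 0.60]])
assert np.allclose(A.sum(axis=1), 1.0)

# p(z_n = 3 | z_{n-1} = 2) = A_23 (1-based state indices, as in the slides)
print(A[1, 2])  # -> 0.2
```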


Page 7: Initial Variable Probabilities


• The initial latent node z1 does not have a parent node, so its marginal distribution p(z1) is represented by a vector of probabilities π with elements πk = p(z1k=1), so that

$$p(z_1 \mid \pi) \;=\; \prod_{k=1}^{K} \pi_k^{\,z_{1k}}, \qquad \text{where } \sum_k \pi_k = 1$$

• Note that π is an HMM parameter representing the probabilities of each state for the first variable

Page 8: Unfolding State Transitions over Time


• Lattice or trellis representation of the latent states, obtained by unfolding the state transition diagram over time
[Figure: state transition diagram unfolded into a lattice; each column corresponds to one of the variables zn]

Page 9: Emission Probabilities p(xn|zn)


• The specification of the probabilistic model is completed by defining the conditional distributions of the observed variables, p(xn|zn,ϕ), where ϕ is a set of parameters
• These are known as emission probabilities; they can be continuous (e.g., Gaussians) or discrete (e.g., tables)
• Because xn is observed, the distribution p(xn|zn,ϕ) consists of a table of K numbers corresponding to the K possible states of the binary vector zn
• The emission probabilities can be represented as

$$p(x_n \mid z_n, \phi) \;=\; \prod_{k=1}^{K} p(x_n \mid \phi_k)^{\,z_{nk}}$$
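As a small illustration (not from the slides) of how the 1-of-K exponent picks out a single factor, here is a sketch with a hypothetical discrete emission table B:

```python
import numpy as np

# Hypothetical emission table for K = 2 states and 3 symbols:
# B[k, v] = p(x = v | state k), i.e. p(x | phi_k)
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])

def emission_prob(x, z_onehot, B):
    """p(x | z, phi) = prod_k p(x | phi_k)^{z_k}: the one-hot exponent
    turns every factor into 1 except the one for the active state."""
    return np.prod(B[:, x] ** z_onehot)

z = np.array([0, 1])           # 1-of-K coding: state 2 is active
print(emission_prob(2, z, B))  # equals B[1, 2] = 0.6
```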

Page 10: Homogeneous Models

• All of the conditional distributions governing the latent variables share the same parameters A
• All of the emission distributions share the same parameters ϕ


Page 11: Joint Distribution over Latent and Observed Variables


• The joint distribution can be expressed in terms of the parameters:

$$p(X, Z \mid \theta) \;=\; p(z_1 \mid \pi)\left[\prod_{n=2}^{N} p(z_n \mid z_{n-1}, A)\right]\prod_{m=1}^{N} p(x_m \mid z_m, \phi)$$

where X = {x1,...,xN}, Z = {z1,...,zN}, and θ = {π, A, ϕ}

• Most of the discussion of HMMs is independent of the choice of emission probabilities
• Tractable choices include discrete tables, Gaussians, and Gaussian mixture models (GMMs)

Page 12: HMM from a Generative Viewpoint

• We can get a better understanding of HMMs by considering them from a generative viewpoint
• First choose the latent variable z1 with probabilities determined by the parameters πk, and then sample the corresponding observation x1
• Next choose the state of the variable z2 according to the transition probabilities p(z2|z1), conditioned on the already instantiated value of z1, and so on
• This is an example of ancestral sampling for a directed PGM (a code sketch follows on the next page)


Page 13: Example of Sampling from an HMM


• Generative viewpoint: a latent variable with three states, a Gaussian emission model p(x|z), two-dimensional x, and 50 generated data points
• Transition probabilities: 5% probability of making a transition to each other state, 90% probability of remaining in the same state
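Below is a minimal NumPy sketch (not from the slides) of the ancestral-sampling procedure under this setup: K = 3 states, 90% self-transition, 5% to each other state, and 2-D Gaussian emissions. The initial distribution, the Gaussian means, and the covariance are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 3, 50

pi = np.full(K, 1.0 / K)            # assumed uniform initial distribution
A = np.full((K, K), 0.05)           # 5% probability of each transition...
np.fill_diagonal(A, 0.90)           # ...90% of remaining in the same state

means = np.array([[0.0, 0.0],       # hypothetical 2-D Gaussian emission means
                  [5.0, 5.0],
                  [0.0, 5.0]])
cov = np.eye(2)                     # shared unit covariance (assumed)

# Ancestral sampling: z_1 ~ pi, then z_n ~ A[z_{n-1}, :], x_n ~ N(mu_{z_n}, cov)
z = np.empty(N, dtype=int)
x = np.empty((N, 2))
z[0] = rng.choice(K, p=pi)
x[0] = rng.multivariate_normal(means[z[0]], cov)
for n in range(1, N):
    z[n] = rng.choice(K, p=A[z[n - 1]])
    x[n] = rng.multivariate_normal(means[z[n]], cov)

print(z[:10])  # long runs in the state path reflect the 0.90 self-transition
```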

Page 14: A Variant of HMM

• Variants arise by imposing restrictions on the transition matrix A
• Left-to-right HMM: set the elements Ajk = 0 if k < j
• Once a state has been vacated, it cannot be re-entered (a code sketch of such a matrix follows)
[Figure: corresponding lattice for a left-to-right HMM]
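As a small illustration (not from the slides), here is one way to build a left-to-right transition matrix in NumPy: zero out the backward transitions and renormalize each row. The renormalization step is an assumption about how to repair the rows.

```python
import numpy as np

def left_to_right(A):
    """Enforce A[j, k] = 0 for k < j (no backward transitions),
    then renormalize so that each row is again a distribution."""
    A = np.triu(A)  # keep only entries with k >= j
    return A / A.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
A = left_to_right(rng.random((4, 4)))
print(A)  # upper triangular: once a state is vacated, it is never re-entered
```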


Page 15: Left-to-Right HMM Applied to Digits

• Examples of on-line handwritten digits
• K = 16 states, each corresponding to a line segment of fixed length in one of 16 angles
• Emission probabilities: a 16 × 16 table associating the allowed angle values with each state
• Transition probabilities set to zero except for those that keep the state index k the same or increment it by one (sketched in code below)
• Parameters optimized by 25 iterations of EM
[Figure: top row, training set of 45 examples of the digit 2; bottom row, examples generated by the model]
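A minimal sketch (not from the slides) of such a transition matrix, in which each state may only persist or advance by one; the self-transition probability p_stay and the absorbing final state are hypothetical choices.

```python
import numpy as np

K = 16
p_stay = 0.8                  # hypothetical self-transition probability
A = np.zeros((K, K))
for k in range(K - 1):
    A[k, k] = p_stay          # keep the state index k the same...
    A[k, k + 1] = 1 - p_stay  # ...or increment it by one
A[K - 1, K - 1] = 1.0         # final state is absorbing (assumed)
assert np.allclose(A.sum(axis=1), 1.0)
```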

Page 16: HMM Invariance to Time Warping

• Time warping: compression and stretching of the time axis
• On-line handwriting recognition:
  • A typical digit consists of two strokes: an arc that starts at the top left and runs down to a cusp or loop at the bottom left, followed by a straight sweep ending at the bottom right
  • The relative sizes of the two sections vary, and hence so does the location of the cusp or loop
  • The HMM accommodates this through the number of transitions to the same state versus transitions to the successive state
• Speech recognition:
  • Warping of the time axis corresponds to the speed of speech
  • The HMM accommodates such distortion
