Hidden Markov models: from the beginning to the state of the art
Frank Wood, Columbia University
November, 2011


Page 1

Hidden Markov models: from the beginning to the state of the art

Frank Wood

Columbia University

November, 2011

Wood (Columbia University) HMMs November, 2011 1 / 44

Page 2

Outline

Overview of hidden Markov models, from the Rabiner tutorial to now

EDHMM

Gateway to state-of-the-art models

Inference

Tips and tricks for Bayesian inference in general (auxiliary variables and slice sampling)

Toy examples

Page 3

Hidden Markov Models

Hidden Markov models (HMMs) [Rabiner, 1989] are an important tool for data exploration and engineering applications.

Applications include

Speech recognition [Jelinek, 1997, Juang and Rabiner, 1985]

Natural language processing [Manning and Schutze, 1999]

Handwriting recognition [Nag et al., 1986]

DNA and other biological sequence modeling applications [Krogh et al., 1994]

Gesture recognition [Tanguay Jr, 1995, Wilson and Bobick, 1999]

Financial data modeling [Ryden et al., 1998]

. . . and many more.

Page 4

Graphical Model: Hidden Markov Model

[Graphical model: plate over K with parameters π_m and θ_m; latent chain z_0 → z_1 → z_2 → z_3 → ⋯ → z_T; each z_t emits y_t.]

Page 5

Notation: Hidden Markov Model

z_t | z_{t−1} = m ∼ Discrete(π_m)

y_t | z_t = m ∼ F_θ(θ_m)

A = (π_1, …, π_m, …, π_K)ᵀ, the K × K transition matrix whose m-th row is π_m
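As a concrete sketch of this generative model, here is a sampler with K = 2 states and (assumed, for illustration) unit-variance Gaussian emissions; the transition probabilities and means are toy values, not from the talk:

```python
# z_t | z_{t-1}=m ~ Discrete(pi_m),  y_t | z_t=m ~ Normal(mu_m, 1)
import random

random.seed(0)

A = [[0.9, 0.1],    # pi_1: transition distribution out of state 0
     [0.2, 0.8]]    # pi_2: transition distribution out of state 1
mu = [-2.0, 2.0]    # state-specific emission means (assumed Gaussian F_theta)

def sample_discrete(p):
    u, c = random.random(), 0.0
    for k, pk in enumerate(p):
        c += pk
        if u < c:
            return k
    return len(p) - 1

def sample_hmm(T, z0=0):
    z, y = [z0], []
    for _ in range(T):
        z.append(sample_discrete(A[z[-1]]))     # z_t ~ Discrete(pi_{z_{t-1}})
        y.append(random.gauss(mu[z[-1]], 1.0))  # y_t ~ N(mu_{z_t}, 1)
    return z, y

z, y = sample_hmm(200)
```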

Page 6

HMM: Dynamic mixture model

[Figure: the HMM viewed as a dynamic mixture model; mixture components shown in data space.]

Visualization from PRML. [Bishop, 2006]

Page 7

HMM: Trellis

[Figure: trellis of states k = 1, 2, 3 unrolled over time steps n − 2, n − 1, n, n + 1; self-transitions A₁₁ and A₃₃ marked.]

Visualization from PRML. [Bishop, 2006]

Page 8

HMM: Typical Usage Scenario (Character Recognition)

Training data: multiple "observed" y_t = {v_t, h_t} sequences of stylus positions for each kind of character

Task: train |Σ| different models, one for each character

Latent states: design for correspondence with strokes

Usage: classify new stylus position sequences using trained models M_σ = {A_σ, Θ_σ}

P(M_σ | y_1, …, y_T) ∝ P(y_1, …, y_T | M_σ) P(M_σ)
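The usage step can be sketched as scoring a new sequence under each model with the forward algorithm and maximizing log P(y | M_σ) + log P(M_σ). The models below are hypothetical discrete-emission stand-ins, not trained character models:

```python
import math

def forward_loglik(y, A, B, pi0):
    # normalized forward recursion; returns log p(y | A, B, pi0)
    K = len(pi0)
    alpha = [pi0[k] * B[k][y[0]] for k in range(K)]
    ll = 0.0
    for t in range(len(y)):
        if t > 0:
            alpha = [B[k][y[t]] * sum(alpha[j] * A[j][k] for j in range(K))
                     for k in range(K)]
        z = sum(alpha)
        ll += math.log(z)
        alpha = [a / z for a in alpha]
    return ll

A = [[0.8, 0.2], [0.2, 0.8]]
pi0 = [0.5, 0.5]
models = {                            # hypothetical per-character emission models
    "a": [[0.9, 0.1], [0.7, 0.3]],    # mostly emits symbol 0
    "b": [[0.1, 0.9], [0.3, 0.7]],    # mostly emits symbol 1
}
prior = {"a": 0.5, "b": 0.5}

y = [0, 0, 1, 0, 0, 0]                # a new symbol-coded "stylus" sequence
score = {name: forward_loglik(y, A, B, pi0) + math.log(prior[name])
         for name, B in models.items()}
best = max(score, key=score.get)      # MAP character label
```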

Page 9

Shortcomings of Original HMM Specification

Latent state dwell times are not usually geometrically distributed

P(z_t = m, …, z_{t+L} = m | A) = ∏_{ℓ=1}^{L} P(z_{t+ℓ} = m | z_{t+ℓ−1} = m, A) = Geometric(L; π_m(m))

There are often problem-specific structural constraints on allowable transitions, i.e. A_{i,i} = 0

The state cardinality K of the latent Markov chain is usually unknown
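The geometric dwell-time claim above can be checked numerically: staying in state m for L − 1 further steps and then leaving has probability A_mm^(L−1)(1 − A_mm), a geometric distribution with mean 1/(1 − A_mm). A quick sketch with an illustrative self-transition probability:

```python
A_mm = 0.9   # illustrative self-transition probability

def p_dwell(L, a=A_mm):
    # leave after exactly L steps: stay L-1 more times, then leave once
    return a ** (L - 1) * (1 - a)

# truncated sums; the geometric tail beyond L = 2000 is negligible
total = sum(p_dwell(L) for L in range(1, 2000))
mean  = sum(L * p_dwell(L) for L in range(1, 2000))   # should be ~ 1/(1 - A_mm) = 10
```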

Page 10

Explicit Duration HMM / Hidden Semi-Markov Model

[Mitchell et al., 1995, Murphy, 2002, Yu and Kobayashi, 2003, Yu, 2010]

Latent state sequence z = ({s_1, r_1}, …, {s_T, r_T})

Latent state id sequence s = (s_1, …, s_T)

Latent "remaining duration" sequence r = (r_1, …, r_T)

State-specific duration distribution F_r(λ_m)

Other distributions the same

An EDHMM transitions between states differently than a typical HMM does. Unless r_t = 0, the current remaining duration is decremented and the state does not change. If r_t = 0, the EDHMM transitions to a state m ≠ s_t according to the distribution defined by π_{s_t}.

Problem: inference requires enumerating possible durations.

Page 11

EDHMM notation

Latent state z_t = {s_t, r_t} is a tuple consisting of the state identity and the time left in that state.

s_t | s_{t−1}, r_{t−1} ∼ { I(s_t = s_{t−1})        if r_{t−1} > 0
                           Discrete(π_{s_{t−1}})    if r_{t−1} = 0 }

r_t | s_t, r_{t−1} ∼ { I(r_t = r_{t−1} − 1)   if r_{t−1} > 0
                       F_r(λ_{s_t})           if r_{t−1} = 0 }

y_t | s_t ∼ F_θ(θ_{s_t})
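The transition rule above can be sketched in code. Everything here is illustrative (toy π, λ, μ values; Poisson durations and Gaussian emissions, as in the talk's later examples), not the talk's implementation:

```python
import random, math

random.seed(1)

pi  = [[0.0, 1.0], [1.0, 0.0]]   # off-diagonal only: no self-transitions
lam = [3.0, 8.0]                 # state-specific Poisson duration rates
mu  = [-2.0, 2.0]                # state-specific emission means

def sample_poisson(rate):
    # Knuth's method; fine for small rates
    L, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def sample_discrete(p):
    u, c = random.random(), 0.0
    for k, pk in enumerate(p):
        c += pk
        if u < c:
            return k
    return len(p) - 1

def step(s_prev, r_prev):
    if r_prev > 0:
        s, r = s_prev, r_prev - 1        # copy state, decrement counter
    else:
        s = sample_discrete(pi[s_prev])  # s_t ~ Discrete(pi_{s_{t-1}})
        r = sample_poisson(lam[s])       # r_t ~ F_r(lambda_{s_t})
    y = random.gauss(mu[s], 1.0)         # y_t ~ F_theta(theta_{s_t})
    return s, r, y

s, r = 0, sample_poisson(lam[0])
traj = []
for _ in range(100):
    s, r, y = step(s, r)
    traj.append((s, r))
```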

Page 12

EDHMM: Graphical Model

[Graphical model: plate over K with duration parameters λ_m, transition distributions π_m, and emission parameters θ_m; coupled chains (s_0, r_0) → (s_1, r_1) → (s_2, r_2) → ⋯ → (s_T, r_T); each s_t emits y_t.]

Page 13

Structured HMMs, e.g. the left-to-right HMM [Rabiner, 1989]

Example: chicken pox

Observations: vital signs

Latent states: pre-infection, infected, post-infection¹

State transition structure: can't go from infected back to pre-infection

¹ disregarding shingles

Structured transitions imply zeros in the transition matrix A, i.e. (for a left-to-right HMM)

p(s_t = m | s_{t−1} = ℓ) = 0  ∀ m < ℓ
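The constraint can be encoded by zeroing the disallowed entries of A and renormalizing each row; a minimal constructor with illustrative uniform raw weights (not from the talk):

```python
K = 4
A = []
for i in range(K):
    # zero out backward moves (j < i), keep the rest, renormalize the row
    row = [1.0 if j >= i else 0.0 for j in range(K)]
    s = sum(row)
    A.append([x / s for x in row])
```

The last row necessarily becomes an absorbing state: once in state K − 1 the chain can only self-transition.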

Page 14

Structured HMM: Trellis (one step at a time, left-to-right HMM)

[Figure: trellis of states k = 1, 2, 3 over time steps n − 2, n − 1, n, n + 1; allowed transitions restricted to left-to-right moves, self-transitions A₁₁ and A₃₃ marked.]

Figure from PRML. [Bishop, 2006]

Page 15

Bayesian HMM

We will put a prior on the parameters so that we can effect a solution that conforms to our ideas about what the solution should look like

Structured prior:

A_{i,j} = 0 (hard constraints)

A_{i,j} ∝ Σ_{i′} A_{i′,j} (rich get richer)

Bayesian regularization means that we can specify a model with more parameters than could possibly be needed:

infinite complexity (i.e. K → ∞) avoids many model selection problems

"extra" states can be thought of as auxiliary or nuisance variables

inference requires sampling in a model with countably infinite support

The posterior over latent variables encodes uncertainty about the interpretation of the data.

Page 16

Bayesian HMM

[Graphical model: as before, with hyperpriors H_z on the π_m and H_y on the θ_m.]

π_m ∼ H_z

θ_m ∼ H_y

z_t | z_{t−1} = m ∼ Discrete(π_m)

y_t | z_t = m ∼ F_θ(θ_m)

Page 17

Infinite HMMs (IHMM) [Beal et al., 2002, Teh et al., 2006]

[Graphical model: IHMM with K → ∞; infinite plate of transition distributions π_m and emission parameters θ_m ∼ H_y, tied through a shared top-level distribution with concentration parameters γ, c_0, c_1; latent chain z_0 → z_1 → ⋯ → z_T emitting y_1, …, y_T.]

Sticky IHMM [Fox et al., 2011] = IHMM with up-weighted self-transitions

Page 18

Inference for the Explicit Duration HMM (EDHMM)

Simple idea:

Infinite HMMs and EDHMMs share a fundamental characteristic: countable support

Inference techniques for Bayesian nonparametric (infinite) HMMs can be used for EDHMM inference

Result:

An approximation-free, efficient inference algorithm for the EDHMM

Utility:

A new HMM for you to try in your applications

A gateway to understanding and dealing with infinite-state-cardinality variants

Joint work with Chris Wiggins (Columbia), Mike Dewar (Bitly)

Page 19

Bayesian EDHMM: Graphical Model

[Graphical model: Bayesian EDHMM; hyperpriors H_r on the λ_m, H_z on the π_m, H_y on the θ_m; plate over K; coupled chains (s_t, r_t) emitting y_t as before.]

A choice of prior:

λ_m | H_r ∼ Gamma(H_r)

π_m | H_z ∼ Dir(1/K, …, 1/K, 0, 1/K, …, 1/K), with the zero in position m so that self-transitions are ruled out
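One way to realize this prior is to draw a Dirichlet over the K − 1 allowed destinations and insert a structural zero at position m; a sketch using the standard Gamma-normalization construction (not necessarily how the talk's code does it):

```python
import random

random.seed(2)
K, m = 5, 2   # K states; pi_m is the outgoing distribution of state m

def dirichlet(alphas):
    # normalize independent Gamma(alpha_i, 1) draws to get a Dirichlet sample
    g = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

weights = dirichlet([1.0 / K] * (K - 1))   # over the K-1 allowed destinations
pi_m = weights[:m] + [0.0] + weights[m:]   # structural zero at position m
```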

Page 20

EDHMM Inference: Beam Sampling [Dewar, Wiggins and Wood, 2011]

We employ the forward-filtering, backward slice-sampling approach for the IHMM of [Van Gael et al., 2008], in which the state and duration variables s and r are sampled conditioned on auxiliary slice variables u.

Net result: an efficient, always-finite forward-backward procedure for sampling the latent states and the amount of time spent in them.

Page 21

Auxiliary Variables for Sampling

Objective: get samples of x.

Sometimes it is easier to introduce an auxiliary variable u and to Gibbs sample the joint P(x, u) (i.e. sample from P(x | u, λ), then P(u | x, λ), etc.), then discard the u values, than it is to sample directly from P(x | λ).

Useful when: P(x | λ) does not have a known parametric form but adding u results in a parametric form, or when x has countable support and sampling it requires enumerating all values.

[Diagrams: the original model λ → x, and the augmented model λ → x with auxiliary variable u attached to x.]

Page 24

Slice Sampling: A very useful auxiliary variable sampling trick

Pedagogical example:

x | λ ∼ Poisson(λ) (countable support)

enumeration strategy for sampling x (impossible)¹

auxiliary variable u with P(u | x, λ) = I(0 ≤ u ≤ P(x | λ)) / P(x | λ)

Note: the marginal distribution of x is

P(x | λ) = ∫ P(x, u | λ) du
         = ∫ P(x | λ) P(u | x, λ) du
         = ∫ P(x | λ) I(0 ≤ u ≤ P(x | λ)) / P(x | λ) du
         = ∫ I(0 ≤ u ≤ P(x | λ)) du = P(x | λ)

¹ Necessary in the EDHMM.

Page 25

Slice Sampling: A very useful auxiliary variable sampling trick

This suggests a Gibbs sampling scheme: alternately sample from

P(x | u, λ) ∝ I(u ≤ P(x | λ)) — finite support, uniform above the slice, enumeration possible

P(u | x, λ) = I(0 ≤ u ≤ P(x | λ)) / P(x | λ) — uniform between 0 and P(x | λ)

then discard the u values to arrive at x samples marginally distributed according to P(x | λ).

[Figure: Poisson PMF F_r(r_t; λ_m = 5) over r_t = 1, …, 10 with slice level u_t = 0.05; only the values whose PMF exceeds the slice need be enumerated.]
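The two conditionals above can be run as an actual sampler. A minimal sketch for the Poisson example (λ = 5; the tabulation bound of 60 is an assumption that the PMF has decayed to numerical irrelevance by then):

```python
import random, math

random.seed(3)
lam = 5.0
# Poisson PMF tabulated far enough out that the tail is negligible
pmf = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(60)]

x, samples = 5, []
for _ in range(5000):
    u = random.random() * pmf[x]                             # u | x ~ Uniform(0, P(x|lam))
    x = random.choice([k for k in range(60) if pmf[k] > u])  # x | u uniform above the slice
    samples.append(x)

mean = sum(samples) / len(samples)   # should be close to lam
```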

Page 26

EDHMM Graphical Model with Auxiliary Variables

[Graphical model: EDHMM as before, with an auxiliary slice variable u_t attached to each transition (z_{t−1}, z_t).]

Page 27

EDHMM Inference: Beam Sampling

With auxiliary variables defined as

p(u_t | z_t, z_{t−1}) = I(0 < u_t < p(z_t | z_{t−1})) / p(z_t | z_{t−1})

and

p(z_t | z_{t−1}) = p((s_t, r_t) | (s_{t−1}, r_{t−1}))
                 = { I(s_t = s_{t−1}) I(r_t = r_{t−1} − 1)   if r_{t−1} > 0
                     π_{s_{t−1} s_t} F_r(r_t; λ_{s_t})       if r_{t−1} = 0 }

one can run standard forward-backward conditioned on the u's.

Forward-backward slice sampling only has to consider a finite number of successor states at each timestep.

Page 28

Forward recursion

α_t(z_t) = p(z_t, Y_1^t, U_1^t)
         = Σ_{z_{t−1}} p(z_t, z_{t−1}, Y_1^t, U_1^t)
         ∝ Σ_{z_{t−1}} p(u_t | z_t, z_{t−1}) p(z_t, z_{t−1}, Y_1^t, U_1^{t−1})
         = Σ_{z_{t−1}} p(u_t | z_t, z_{t−1}) p(y_t | z_t) p(z_t | z_{t−1}) p(z_{t−1}, Y_1^{t−1}, U_1^{t−1})
         = Σ_{z_{t−1}} I(0 < u_t < p(z_t | z_{t−1})) p(y_t | z_t) α_{t−1}(z_{t−1}).

Only a finite (small, in expectation) part of the forward trellis needs to be enumerated.

Page 29

Backward Sampling

Recursively sample a state sequence from the distribution p(z_{t−1} | z_t, Y, U), which can be expressed in terms of the forward variable:

p(z_{t−1} | z_t, Y, U) ∝ p(z_t, z_{t−1}, Y, U)
                       ∝ p(u_t | z_t, z_{t−1}) p(z_t | z_{t−1}) α_{t−1}(z_{t−1})
                       ∝ I(0 < u_t < p(z_t | z_{t−1})) α_{t−1}(z_{t−1}).
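For reference, a plain-HMM forward filtering, backward sampling pass looks like this; the EDHMM beam version additionally multiplies each transition term by the slice indicator I(0 < u_t < p(z_t | z_{t−1})). All numbers are toy values:

```python
import random

random.seed(4)

A   = [[0.9, 0.1], [0.1, 0.9]]   # transition matrix
B   = [[0.8, 0.2], [0.2, 0.8]]   # B[k][v] = p(y_t = v | z_t = k)
pi0 = [0.5, 0.5]
y   = [0, 0, 0, 1, 1, 1]

K, T = len(pi0), len(y)

# forward filtering: alpha[t][k] proportional to p(z_t = k, y_1..t)
alpha = [[pi0[k] * B[k][y[0]] for k in range(K)]]
for t in range(1, T):
    a = [B[k][y[t]] * sum(alpha[-1][j] * A[j][k] for j in range(K))
         for k in range(K)]
    s = sum(a)
    alpha.append([x / s for x in a])   # normalize for numerical stability

def sample_discrete(p):
    u, c = random.random() * sum(p), 0.0
    for k, pk in enumerate(p):
        c += pk
        if u < c:
            return k
    return len(p) - 1

# backward sampling: z_T ~ alpha_T, then z_{t-1} proportional to alpha_{t-1}(j) A[j][z_t]
z = [0] * T
z[T - 1] = sample_discrete(alpha[T - 1])
for t in range(T - 1, 0, -1):
    z[t - 1] = sample_discrete([alpha[t - 1][j] * A[j][z[t]] for j in range(K)])
```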

Page 30

Sampler

Algorithm 1: EDHMM Sampler

Initialise parameters A, λ, θ; initialise u_t to small values ∀ t
for sweep ∈ {1, 2, 3, …} do
    Forward: run the forward recursion to get α_t given U and Y, ∀ t
    Backward: sample z_T ∼ α_T
    for t ∈ {T, T − 1, …, 1} do
        sample z_{t−1} ∼ I(u_t < p(z_t | z_{t−1})) α_{t−1}
    end for
    Slice: for t ∈ {1, 2, …, T} do
        evaluate ℓ = p(r_t | s_t, r_{t−1}) p(s_t | s_{t−1}, r_{t−1})
        sample u_t ∼ Uniform(0, ℓ)
    end for
    sample parameters A, λ, θ
end for

Page 31

Posterior Utility

For instance, posterior predictive inference:

P(y_{T+1} | y) = ∫∫∫ Σ_z P(y_{T+1} | A, z, π, λ, θ) P(A, z, π, λ, θ | y) dA dπ dλ dθ
              ≈ (1/L) Σ_ℓ P(y_{T+1} | A^(ℓ), z^(ℓ), π^(ℓ), λ^(ℓ), θ^(ℓ))

where {A^(ℓ), z^(ℓ), π^(ℓ), λ^(ℓ), θ^(ℓ)} ∼ P(A, z, π, λ, θ | y).
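In miniature, the Monte Carlo average above looks like this; the "posterior samples" below are hypothetical placeholders standing in for real sampler output, and emissions are assumed unit-variance Gaussian as in the talk's examples:

```python
import math

def normal_pdf(y, mu, var=1.0):
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# hypothetical posterior draws of the emission mean of the state at time T+1
mu_samples = [1.9, 2.1, 2.0, 1.8, 2.2]

y_next = 2.0
# average the predictive density over the posterior draws
pred = sum(normal_pdf(y_next, mu) for mu in mu_samples) / len(mu_samples)
```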

Page 32

Results

To illustrate EDHMM learning on synthetic data, five hundred datapoints were generated using a 4-state EDHMM with Poisson duration distributions with rates

λ = (10, 20, 3, 7)

and Gaussian emission distributions with means

μ = (−6, −2, 2, 6),

all with unit variance.

Page 33

EDHMM: Synthetic Data Results

[Figure: synthetic data; emission/mean versus time (0–500).]

Page 34

EDHMM: Synthetic Data, State Mean Posterior

[Figure: histogram of posterior samples of the state emission means; count versus emission mean (−10 to 10).]

Page 35

EDHMM: Synthetic Data, State Duration Parameter Posterior

[Figure: histogram of posterior samples of the state duration rates; count versus duration rate (0 to 25).]

Page 36

EDHMM: Nanoscale Transistor Spontaneous Voltage Fluctuation

[Figure: nanoscale transistor spontaneous voltage fluctuation data; emission/mean versus time (0–5000).]

[Realov and Shepard, 2010]

Page 37

EDHMM: States Not Distinguishable By Output

Task: understand a system with states that have identical observation distributions and differ only in their duration distributions.

The observation distributions have means μ₁ = 0, μ₂ = 0, and μ₃ = 3, and the duration distributions have rates λ₁ = 5, λ₂ = 15, λ₃ = 20.

(Next slide)

a) observations; b) true states overlaid with 20 randomly selected state traces produced by the sampler; c) samples from the posterior observation distribution means; d) samples from the posterior duration distribution rates.

Page 38

EDHMM: States Not Distinguishable By Output

[Figure: four panels — (a) observations y_t versus time; (b) state traces x_t versus time; (c) posterior p(μ_j | Y); (d) posterior p(λ_j | Y).]

Figure: Beam sampler results applied to a system with identical observation distributions but differing durations. Observations are shown in a); true states are shown in b), overlaid with 20 randomly selected state traces produced by the sampler. Here the observation distributions have means μ₁ = 0, μ₂ = 0, and μ₃ = 3, and the duration distributions have rates λ₁ = 5, λ₂ = 15, λ₃ = 20. Samples from the posterior observation distribution means are shown in c), and samples from the posterior duration distribution rates are shown in d).

Page 39

EDHMM vs. IHMM: Modeling the Morse Code Cepstrum

Page 40

Wrap-Up

Overview of HMM developments since the Rabiner tutorial

Tools to understand state-of-the-art HMMs

Useful tricks for Bayesian inference in general (sampling)

Extras:

Code: https://github.com/mikedewar/EDHMM

Me: http://www.stat.columbia.edu/~fwood

http://www.stat.columbia.edu/~fwood/w4240/

Page 41

Questions?

Thank you!

Page 42

More Technical Wrap-Up: Small Contribution

A novel inference procedure for EDHMMs that doesn't require truncation and is more efficient than considering all possible durations.

[Figure: mean number of transitions visited per time point versus iteration (0–1000), ranging roughly between 25 and 55.]

Figure: Mean number of transitions considered per time point by the beam sampler for 1000 post-burn-in iterations. Compare to the (KT)² = O(10⁶) transitions that would need to be considered by standard forward-backward without truncation, the only surely safe, truncation-free alternative.

Page 43

Future Work

Novel Gamma process construction for dependent, structured, infinite-dimensional HMM transition distributions

Generalize to a spatial prior on HMM states ("location")

Simultaneous localization and mapping

Process diagram modeling for systems biology

Applications; seeking "users"

Page 44

Bibliography I

M J Beal, Z Ghahramani, and C E Rasmussen. The Infinite Hidden Markov Model. In Advances in Neural Information Processing Systems, pages 29–245, March 2002.

C M Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

M Dewar, C Wiggins, and F Wood. Inference in hidden Markov models with explicit state duration distributions. In submission, 2011.

E B Fox, E B Sudderth, M I Jordan, and A S Willsky. A Sticky HDP-HMM with Application to Speaker Diarization. Annals of Applied Statistics, 5(2A):1020–1056, 2011.

F. Jelinek. Statistical Methods for Speech Recognition. MIT Press, 1997.

B H Juang and L R Rabiner. Mixture Autoregressive Hidden Markov Models for Speech Signals. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(6):1404–1413, 1985.

Page 45

Bibliography II

A. Krogh, M. Brown, I. S. Mian, K. Sjölander, and D. Haussler. Hidden Markov models in computational biology: Applications to protein modelling. Journal of Molecular Biology, 235:1501–1531, 1994.

C. Manning and H. Schütze. Foundations of statistical natural language processing. MIT Press, Cambridge, MA, 1999.

C Mitchell, M Harper, and L Jamieson. On the complexity of explicit duration HMM's. IEEE Transactions on Speech and Audio Processing, 3(3):213–217, 1995.

K P Murphy. Hidden semi-Markov models (HSMMs). Technical report, MIT, 2002.

R. Nag, K. Wong, and F. Fallside. Script recognition using hidden Markov models. In ICASSP 86, pages 2071–2074, 1986.

L R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

Page 46

Bibliography III

Simeon Realov and Kenneth L Shepard. Random Telegraph Noise in 45-nm CMOS: Analysis Using an On-Chip Test and Measurement System. In IEDM, pages 624–627, 2010. URL http://dx.doi.org/10.1109/IEDM.2010.5703436.

T. Rydén, T. Teräsvirta, and S. Åsbrink. Stylized facts of daily return series and the hidden Markov model. Journal of Applied Econometrics, 13(3):217–244, 1998.

D.O. Tanguay Jr. Hidden Markov models for gesture recognition. PhDthesis, Massachusetts Institute of Technology, 1995.

Y W Teh, M I Jordan, M J Beal, and D M Blei. Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.

Page 47

Bibliography IV

J Van Gael, Y Saatci, Y W Teh, and Z Ghahramani. Beam sampling for the infinite hidden Markov model. In Proceedings of the 25th International Conference on Machine Learning, pages 1088–1095. ACM, 2008.

A.D. Wilson and A.F. Bobick. Parametric hidden Markov models for gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(9):884–900, 1999.

S Yu and H Kobayashi. An Efficient Forward-Backward Algorithm for an Explicit-Duration Hidden Markov Model. IEEE Signal Processing Letters, 10(1):11–14, 2003.

S Z Yu. Hidden semi-Markov models. Artificial Intelligence, 174(2):215–243, 2010.
