Hidden Markov Models
Dr. Naomi Harte
The Talk…
Hidden Markov Models
What are they?
Why are they useful?
The maths part…
Probability calculations
Training – optimising parameters
Viterbi – unseen sequences
Real Systems
Background
Discrete Markov process
System can be in any of N states S1…SN
State changes at each time instant t1, t2, t3, etc.
Actual state at time t is qt
For a first-order Markov process, P(qt=Sj | qt-1=Si, qt-2=Sk, …) simplifies to P(qt=Sj | qt-1=Si)
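Spelled out (a standard expansion, not shown on the slide), this first-order assumption means the probability of a whole state sequence factorises into one-step transition terms:

$$P(q_1, q_2, \ldots, q_T) = P(q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1})$$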
Background
P(qt=Sj | qt-1=Si)
Independent of time
State transition probabilities
aij = P(qt=Sj | qt-1=Si), i, j = 1..N
aij ≥ 0
Σj aij = 1 (sum over j = 1..N, for each i)
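As a concrete illustration (not from the talk), a minimal numpy sketch of such a chain: a transition matrix A whose rows sum to one, sampled forward in time. The 3-state values are invented.

```python
import numpy as np

# Hypothetical 3-state chain: A[i, j] = P(q_t = S_j | q_{t-1} = S_i).
# Each row is a distribution over the next state, so every row sums to 1.
A = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
assert np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0)

rng = np.random.default_rng(0)
state = 0                               # start in state S_1
for t in range(10):
    state = rng.choice(3, p=A[state])   # draw next state from row of A
    print(t, state)                     # the state sequence is fully observable
```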
Observable Markov Model
Example (from Rabiner)
Hidden Markov Model
In the observable model, each state corresponded to an observable event
Restrictive
Now the observation is a probabilistic function of the state
Hidden state, observable outputs [O1, O2, O3, …, OT]
Ball and Urn (Rabiner)
[Figure: URN 1, URN 2, …, URN N, each holding M ball colours. For urn i: P(Red) = bi(1), P(Blue) = bi(2), P(Green) = bi(3), …, P(Pink) = bi(M).]
Jack Ferguson’s urn and ball model
N urns, M colours
O = {Red, Green, Green, Pink, Orange, Blue, Orange, Yellow}
Ball and Urn
Simplest HMM
State is the urn
Colour probability defined for each state (urn)
State transition matrix governs urn choice
HMM elements
N – number of states
A – state transition probabilities: aij = P(qt=Sj | qt-1=Si)
B – observation probability in state j: bj(Ot) = P(Ot | qt=Sj)
Discrete: Ot is one of vk, k = 1..M
Continuous: Gaussian mixture
Initial state distribution: πi = P(q1=Si)
Model λ = (A, B, π)
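A minimal sketch of the full tuple λ = (A, B, π) for a hypothetical 2-urn, 3-colour version of the model, sampling hidden states and observable colours (all numbers invented):

```python
import numpy as np

N, M = 2, 3                         # N states (urns), M symbols (colours)
pi = np.array([0.6, 0.4])           # pi[i]   = P(q_1 = S_i)
A = np.array([[0.7, 0.3],           # A[i, j] = P(q_t = S_j | q_{t-1} = S_i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.3, 0.2],      # B[j, k] = b_j(v_k): colour k from urn j
              [0.1, 0.4, 0.5]])

rng = np.random.default_rng(1)
state = rng.choice(N, p=pi)         # hidden: which urn we are at
for t in range(8):
    print(rng.choice(M, p=B[state]))    # observable: the colour drawn
    state = rng.choice(N, p=A[state])   # hidden transition, never seen directly
```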
What are HMMs useful for?
Modelling temporally evolving events that follow a reproducible pattern,
with some reasonable level of variation
Measurable features taken at intervals
Well structured? Left-to-right HMM
More random? Fully connected (ergodic) HMM (both sketched below)
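To make the two topologies concrete (a sketch with invented 4-state values), they differ only in which entries of A may be non-zero:

```python
import numpy as np

# Left-to-right: a state can only stay put or move forward, so A is
# upper-triangular (here banded: self-loop plus the next state).
A_left_right = np.array([[0.6, 0.4, 0.0, 0.0],
                         [0.0, 0.7, 0.3, 0.0],
                         [0.0, 0.0, 0.8, 0.2],
                         [0.0, 0.0, 0.0, 1.0]])

# Ergodic (fully connected): every state reachable from every other state.
A_ergodic = np.full((4, 4), 0.25)

for A in (A_left_right, A_ergodic):
    assert np.allclose(A.sum(axis=1), 1.0)   # rows remain distributions
```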
Applications in BOTH Audio and Video
HMM applications
Need labelled training data!!
The need for it is the usual reason NOT to use HMMs
Speech & audio-visual applications
Research databases: labelled/transcribed
What might an HMM model?
A sequence of events, with features sampled at intervals
In speech recognition:
A word, a phoneme, a syllable
In speech analysis for home monitoring:
Normal speech, emotionally distressed speech, slurred speech
In music, to transcribe scores:
A violin, a piano, a trumpet, a mixture of instruments
In sports video, to automatically extract highlights:
A tennis serve, tennis volley, tennis rally, passing shot, etc.
Snooker: pot black, pot colour, pot red, foul
In cell biology video, to flag specific events:
Nothing happening, fluorescence, cells growing, cells shrinking, cell death or division
Observations
What is this observation sequence O?
[O1, O2, O3, …, OT]
Pertinent features or measures, taken at regular time intervals, that compactly describe the events of interest
Spectral features, pitch, speaking rate in speech
Colour, shape, motion in video
Example
[Figure: each observation Ot is a vector of cepstral coefficients c1…c12; the sequence runs O1, O2, O3, …, OT]
Take the DCT of the log spectrum on 20 ms windows with 50% overlap
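A sketch of that feature extraction, assuming a mono signal x at sample rate fs; the function name and the choice of 12 coefficients are illustrative:

```python
import numpy as np
from scipy.fftpack import dct

def cepstral_features(x, fs, n_coeffs=12):
    """DCT of the log magnitude spectrum on 20 ms windows with 50% overlap."""
    win = int(0.020 * fs)              # 20 ms window length in samples
    hop = win // 2                     # 50% overlap
    frames = []
    for start in range(0, len(x) - win + 1, hop):
        frame = x[start:start + win] * np.hamming(win)
        log_spec = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)
        frames.append(dct(log_spec, norm='ortho')[1:n_coeffs + 1])
    return np.array(frames)            # shape (T, 12): one O_t per row
```

Each row is one observation vector Ot, so T frames give the sequence [O1, …, OT].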
HMM problem 1
Given O = [O1, O2, …, OT] and model λ, how to efficiently compute P(O|λ)?
Evaluation
Which model gives the best score?
Forward-Backward procedure
HMM problem 2
Given O = [O1, O2, …, OT] and model λ, how to choose a state sequence Q = [q1, q2, …, qT] that is optimal?
Uncover the “hidden” part
There is no single correct sequence
Viterbi Algorithm
HMM problem 3
How to adjust the parameters of model λ = (A, B, π) to maximise P(O|λ)?
Training
Adapt parameters to observed training data
Use Baum-Welch
Iterative solution
Expectation maximisation
Notation
Follow Rabiner tutorial
Back to Problem 1
Given O = [O1, O2, …, OT] and model λ, how to efficiently compute P(O|λ)?
Consider ALL possible state sequences
Take one particular sequence Q = [q1, q2, …, qT]
Probability of O given Q and λ?
$$P(O \mid Q, \lambda) = \prod_{t=1}^{T} P(O_t \mid q_t, \lambda) = b_{q_1}(O_1)\, b_{q_2}(O_2) \cdots b_{q_T}(O_T)$$
Observation probability ctd.
Probability of the state sequence?
$$P(Q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$$
JOINT probability of O and Q?
$$P(O, Q \mid \lambda) = P(O \mid Q, \lambda)\, P(Q \mid \lambda)$$
Probability of O over ALL possible Q?
$$P(O \mid \lambda) = \sum_{\text{all } Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda)$$
Gets crazy as N and T increase!!
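To see why the direct sum blows up, here is a brute-force sketch (my own, reusing the pi, A, B arrays from the earlier sketch) that literally enumerates all N^T state sequences; it is only usable for tiny N and T:

```python
import itertools
import numpy as np

def prob_obs_bruteforce(O, pi, A, B):
    """P(O | lambda) by summing P(O | Q, lambda) P(Q | lambda) over every Q."""
    N, T = len(pi), len(O)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):   # N**T sequences!
        p = pi[Q[0]] * B[Q[0], O[0]]                  # pi_q1 * b_q1(O_1)
        for t in range(1, T):
            p *= A[Q[t - 1], Q[t]] * B[Q[t], O[t]]    # a_{q(t-1)q(t)} b_qt(O_t)
        total += p
    return total
```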
Forward-Backward Procedure
Be smart!
Only have N states
So any state at t+1 can only be reached from the N states at time t
Reuses calculations
Forward variable (Rabiner)
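Rabiner defines the forward variable as αt(i) = P(O1 O2 … Ot, qt = Si | λ), built by induction; a minimal sketch of the recursion (function name is mine):

```python
import numpy as np

def forward(O, pi, A, B):
    """P(O | lambda) via forward variables alpha[t, i] = P(O_1..O_t, q_t = S_i)."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                     # initialisation
    for t in range(1, T):
        # induction: state j at time t is reached from the N states at t-1
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha[-1].sum()                         # termination: sum_i alpha_T(i)
```

This costs on the order of N²T operations instead of N^T; real implementations scale alpha (or work in the log domain) to avoid underflow.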
Exercise
What is the corresponding backward variable?
The partial observation sequence from t+1 to the end, given in state i at time t and model λ
Answer in the Rabiner paper!
Observation Probability
Observation probability in state j
bj(Ot) = P(Ot | qt=Sj)
Discrete: Ot is one of vk, k = 1..M
Continuous: multivariate Gaussian mixture density most common
Are the features independent?
First years – how does this affect the pdf?
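If the features are treated as independent, the covariance matrix is diagonal and the multivariate Gaussian pdf factorises into a product of univariate Gaussians; a one-function sketch (mine, not from the slides):

```python
import numpy as np

def log_gauss_diag(o, mean, var):
    """log N(o; mean, diag(var)): with independent features the joint density
    is the product of per-dimension Gaussians, i.e. a sum of logs."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (o - mean) ** 2 / var)
```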
What if the features are not independent?
Use full-covariance HMMs
Slow
Need more training data
Or decorrelate the features (see the sketch below)
PCA, LDA, DCT
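One way to decorrelate, sketched with plain-numpy PCA (rotating onto the eigenvectors of the sample covariance); LDA and the DCT serve the same purpose:

```python
import numpy as np

def pca_decorrelate(X):
    """Rotate features so their sample covariance becomes diagonal.
    X has shape (T, d): one feature vector per frame."""
    Xc = X - X.mean(axis=0)                             # centre each feature
    _, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # eigenvectors of cov
    return Xc @ vecs                                    # decorrelated, same shape
```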
Problem 2
Given O = [O1, O2, …, OT] and model λ, how to choose a state sequence Q = [q1, q2, …, qT] that is optimal?
Well explained in the Rabiner paper
Single best state sequence Q
Best score along a single path at time t, accounting for the first t observations and ending in state i:
$$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1 q_2 \cdots q_t = S_i,\; O_1 O_2 \cdots O_t \mid \lambda)$$
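A minimal Viterbi sketch of that recursion (names are mine), keeping the best predecessor at each step so the single best path can be read back:

```python
import numpy as np

def viterbi(O, pi, A, B):
    """Most likely hidden state sequence for O under lambda = (A, B, pi)."""
    N, T = len(pi), len(O)
    delta = np.zeros((T, N))            # best score ending in state j at time t
    psi = np.zeros((T, N), dtype=int)   # best predecessor of state j at time t
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A        # scores[i, j]: via i into j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    path = [int(delta[-1].argmax())]              # best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))        # backtrack through psi
    return path[::-1]
```

In practice the products are replaced by sums of log probabilities to avoid underflow.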
Viterbi Trellis
Back to Problem 3
How to adjust the parameters of model λ = (A, B, π) to maximise P(O|λ)?
Training of models
Baum-Welch
An implementation of the EM algorithm (tutorial from David)
Start with a good estimate
Clustering with k-means
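A compact sketch of one Baum-Welch re-estimation pass for a discrete HMM (function name mine; a real implementation scales α and β or works in the log domain, and iterates passes like this until P(O|λ) stops improving):

```python
import numpy as np

def baum_welch_step(O, pi, A, B):
    """One EM step: expected counts (E) then re-estimated lambda (M)."""
    O = np.asarray(O)
    N, T = len(pi), len(O)
    # E-step: forward and backward variables (unscaled, for clarity)
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    gamma = alpha * beta                  # gamma[t, i] ~ P(q_t = S_i | O, lambda)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # xi[t, i, j] ~ P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    xi = alpha[:-1, :, None] * A * (B[:, O[1:]].T * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    # M-step: new parameters from expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[O == k].sum(axis=0) for k in range(B.shape[1])],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```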
Training strategies
Choice of number of states
Controlling transitions
Fully connected, or left-to-right HMM
Gradually increasing the number of mixtures per state
More Information
HTK, Hidden Markov Model Toolkit from Cambridge University
htk.eng.cam.ac.uk
Rabiner paper:
Rabiner, L. R., "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
Speech Recognition Books