Hidden Markov Models – Dr. Naomi Harte


Page 1:

Hidden Markov Models

Dr. Naomi Harte

Page 2:

The Talk…

Hidden Markov Models
What are they?
Why are they useful?

The maths part…
Probability calculations
Training – optimising parameters
Viterbi – unseen sequences

Real Systems

Page 3:

Background

Discrete Markov process
System can be in any of N states S1…SN

State changes at each time instant t1, t2, t3, etc.
Actual state at time t is qt

For a first-order Markov process,
P(qt = Sj | qt-1 = Si, qt-2 = Sk, …) simplifies to P(qt = Sj | qt-1 = Si)

Page 4:

Background

P(qt = Sj | qt-1 = Si) is independent of time: the state transition probabilities

aij = P(qt = Sj | qt-1 = Si), for i, j = 1..N
aij >= 0
∑j=1..N aij = 1

Observable Markov Model
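As an illustration only (not from the slides), a minimal Python/NumPy sketch of an observable Markov model: a 3-state transition matrix satisfying the constraints above, and a short simulated state sequence. The probability values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 3-state transition matrix: a_ij = P(q_t = S_j | q_{t-1} = S_i).
# Every entry is >= 0 and every row sums to 1, as required above.
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
assert np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0)

# In an observable Markov model the state sequence itself is the output:
# simulate q_1 ... q_T by repeatedly sampling the next state.
T = 10
q = [0]                                   # start in state S_1 (index 0)
for _ in range(T - 1):
    q.append(rng.choice(3, p=A[q[-1]]))   # next state drawn from row q_t
print(q)                                  # a sequence of state indices
```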

Page 5:

Example (from Rabiner)

Page 6:

Hidden Markov Model

In the observable model, each state corresponded to an observable event – restrictive

In an HMM, the observation is a probabilistic function of the state

Hidden states, observable outputs [O1, O2, O3, …, OT]

Page 7:

Ball and Urn (Rabiner)

URN 1: P(Red) = b1(1), P(Blue) = b1(2), P(Green) = b1(3), …, P(Pink) = b1(M)
URN 2: P(Red) = b2(1), P(Blue) = b2(2), P(Green) = b2(3), …, P(Pink) = b2(M)
…
URN N: P(Red) = bN(1), P(Blue) = bN(2), P(Green) = bN(3), …, P(Pink) = bN(M)

Jack Ferguson’s Urn and Ball Model: N urns, M colours
O = {Red, Green, Green, Pink, Orange, Blue, Orange, Yellow}

Page 8:

Ball and Urn

Simplest HMM
State is the urn
Colour probability defined for each state (urn)
State transition matrix governs urn choice

Page 9:

HMM elements
N – number of states
A – state transition probabilities: aij = P(qt = Sj | qt-1 = Si)
B – observation probability in state j: bj(Ot) = P(Ot | qt = Sj)
  Discrete: Ot is vk, k = 1:M
  Continuous: Gaussian mixture
π – initial state distribution: πi = P(q1 = Si)

Model λ = (A, B, π)
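Purely illustrative (the sizes and numbers below are made up, not from the slides): a minimal NumPy sketch of λ = (A, B, π) for a small discrete urn-and-ball style model.

```python
import numpy as np

# Hypothetical discrete HMM with N = 3 states (urns) and M = 4 observation
# symbols (colours); all values are invented for the example.
N, M = 3, 4

# A: state transition probabilities, a_ij = P(q_t = S_j | q_{t-1} = S_i).
# Each row sums to 1.
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])

# B: observation probabilities, b_j(k) = P(O_t = v_k | q_t = S_j).
# Row j holds the colour distribution for urn j; each row sums to 1.
B = np.array([[0.5, 0.2, 0.2, 0.1],
              [0.1, 0.6, 0.2, 0.1],
              [0.2, 0.2, 0.2, 0.4]])

# pi: initial state distribution, pi_i = P(q_1 = S_i).
pi = np.array([0.5, 0.3, 0.2])

# The model of the slides is simply the triple lambda = (A, B, pi).
model = (A, B, pi)
```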

Page 10:

What are HMMs useful for?

Modelling temporally evolving events that have a reproducible pattern, some reasonable level of variation, and features that can be measured at intervals

Well structured? Left-to-right HMM

More random? Fully connected (ergodic) HMM

Applications in BOTH Audio and Video

Page 11:

HMM applications

Need labelled training data!! The usual reason NOT to use HMMs

Speech & audio-visual applications: research databases are labelled/transcribed

Page 12:

What might a HMM model? A sequence of events, with features sampled at intervals

In speech recognition: a word, a phoneme, a syllable

In speech analysis for home monitoring: normal speech, emotionally distressed speech, slurred speech

In music, to transcribe scores: a violin, a piano, a trumpet, a mixture of instruments

In sports video, to automatically extract highlights: a tennis serve, tennis volley, tennis rally, passing shot etc.; snooker: pot black, pot colour, pot red, foul

In cell biology video, to flag specific events: nothing happening, fluorescence, cells growing, cells shrinking, cell death or division

Page 13:

Observations

What is this observation sequence O? [O1, O2, O3, …, OT]

Pertinent features or measures, taken at regular time intervals, that compactly describe the events of interest:
Spectral features, pitch, speaking rate in speech
Colour, shape, motion in video

Page 14:

Example

[Figure: a sequence of feature vectors O1, O2, O3, …, OT, each holding cepstral coefficients c1 … c12]

Take the DCT of the log spectrum on 20 ms windows with 50% overlap
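A rough sketch of the kind of front end this slide describes: cepstral-style features from the DCT of the log magnitude spectrum on 20 ms windows with 50% overlap. The sample rate, the Hamming window, the number of coefficients kept and the function name are assumptions for illustration, not specified on the slide.

```python
import numpy as np
from scipy.fft import dct

def features(signal, fs=16000, n_coeffs=12):
    """Cepstral-style features: DCT of the log magnitude spectrum,
    computed on 20 ms windows with 50% overlap (10 ms hop)."""
    frame_len = int(0.020 * fs)          # 20 ms window
    hop = frame_len // 2                 # 50% overlap
    window = np.hamming(frame_len)       # assumed tapering window
    obs = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))
        log_spec = np.log(spectrum + 1e-10)           # avoid log(0)
        cepstrum = dct(log_spec, type=2, norm='ortho')
        obs.append(cepstrum[1:n_coeffs + 1])          # keep c1 ... c12
    return np.array(obs)                 # shape (T, n_coeffs): O_1 ... O_T

# Example: one second of noise stands in for speech.
O = features(np.random.randn(16000))
print(O.shape)   # (99, 12) for one second at 16 kHz
```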

Page 15:

HMM problem 1

Given O = [O1, O2, …, OT] and model λ, how to efficiently compute P(O | λ)? Evaluation
Which model gives the best score?
Forward-Backward procedure

Page 16:

HMM problem 2

Given O = [O1, O2, …, OT] and model λ, how to choose a state sequence Q = [q1, q2, …, qT] that is optimal?
Uncovers the “hidden” part
No single “correct” sequence
Viterbi Algorithm

Page 17:

HMM problem 3

How to adjust the parameters of model λ = (A, B, π) to maximise P(O | λ)? Training
Adapt parameters to the observed training data
Use Baum-Welch: an iterative solution, expectation maximisation

Page 18:

Notation

Follow Rabiner tutorial

Page 19:

Back to Problem 1

Given O = [O1, O2, …, OT] and model λ, how to efficiently compute P(O | λ)?
Consider ALL possible state sequences
Say one particular sequence is Q = [q1, q2, …, qT]

Probability of O given Q and λ?

$P(O \mid Q, \lambda) = \prod_{t=1}^{T} P(O_t \mid q_t, \lambda) = b_{q_1}(O_1)\, b_{q_2}(O_2) \cdots b_{q_T}(O_T)$

Page 20:

Observation probability ctd.

Probability of state sequence?

$P(Q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$

JOINT probability of O and Q?

$P(O, Q \mid \lambda) = P(O \mid Q, \lambda)\, P(Q \mid \lambda)$

Probability of O for ALL possible Q?

$P(O \mid \lambda) = \sum_{\text{all } Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda)$

Page 21:

Observation probability ctd.

Probability of state sequence?
$P(Q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$

JOINT probability of O and Q?
$P(O, Q \mid \lambda) = P(O \mid Q, \lambda)\, P(Q \mid \lambda)$

Probability of O for ALL possible Q?
$P(O \mid \lambda) = \sum_{\text{all } Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda)$

Gets crazy as N and T increase!!
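For intuition only, a brute-force sketch of the direct calculation above, assuming a small discrete model λ = (A, B, π) and integer-coded observations: it sums P(O|Q,λ)·P(Q|λ) over every possible state sequence Q.

```python
import itertools
import numpy as np

def brute_force_likelihood(O, A, B, pi):
    """P(O | lambda) by enumerating all N**T state sequences Q.
    O is a sequence of observation symbol indices."""
    N = len(pi)
    T = len(O)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):       # all possible Q
        # P(Q | lambda) = pi_{q1} * a_{q1 q2} * ... * a_{q_{T-1} q_T}
        p_q = pi[Q[0]]
        for t in range(1, T):
            p_q *= A[Q[t - 1], Q[t]]
        # P(O | Q, lambda) = b_{q1}(O_1) * ... * b_{qT}(O_T)
        p_o_given_q = 1.0
        for t in range(T):
            p_o_given_q *= B[Q[t], O[t]]
        total += p_o_given_q * p_q
    return total
```

With N states and T observations this loop visits N**T sequences, which is exactly the blow-up the next slide avoids.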

Page 22:

Forward-Backward Procedure

Be smart! There are only N states
So any state at time t+1 can only be reached from the N states at time t
Reuses calculations

Page 23:

Forward variable (Rabiner)
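A minimal sketch of the forward recursion as described in Rabiner, assuming discrete observations coded as integers (variable names are mine): αt(i) = P(O1…Ot, qt = Si | λ), initialised from π and B and propagated through A.

```python
import numpy as np

def forward(O, A, B, pi):
    """Forward variable alpha, shape (T, N):
    alpha[t, i] = P(O_1 ... O_{t+1}, q_{t+1} = S_i | lambda) (0-based t).
    Returns alpha and P(O | lambda) = sum_i alpha[T-1, i]."""
    N = len(pi)
    T = len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                      # initialisation
    for t in range(1, T):
        # each state at time t can only be reached from the N states at t-1
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]  # induction step
    return alpha, alpha[-1].sum()                   # termination
```

Because each step reuses the N values from the previous time instant, the cost is of order N²T rather than N^T.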

Page 24:

Exercise

What is the corresponding backward variable?
Probability of the partial observation sequence from t+1 to the end, given state i at time t and model λ
Answer in the Rabiner paper!
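For reference only (the derivation is left as the exercise, and the answer is in the Rabiner paper), a sketch of the standard backward recursion under the same discrete-observation assumption:

```python
import numpy as np

def backward(O, A, B):
    """Backward variable beta, shape (T, N):
    beta[t, i] = P(O_{t+2} ... O_T | q_{t+1} = S_i, lambda) (0-based t)."""
    N = A.shape[0]
    T = len(O)
    beta = np.ones((T, N))                            # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])  # backward induction
    return beta
```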

Page 25:

Observation Probability

Observation probability in state j: bj(Ot) = P(Ot | qt = Sj)
Discrete: Ot is vk, k = 1:M
Continuous: a multivariate Gaussian mixture density is most common

Are the features independent? 1st years – how does this affect the pdf?

Page 26:

What if the features are not independent?

Use full-covariance HMMs: slow, and needs more training data

Decorrelate the features: PCA, LDA, DCT
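As an illustrative sketch only (data and sizes invented): one common way to decorrelate features before using diagonal-covariance Gaussians is a PCA rotation estimated from training data.

```python
import numpy as np

def pca_decorrelate(X):
    """Rotate features so their covariance is (approximately) diagonal.
    X has shape (n_frames, n_features); rows are observation vectors."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)            # feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)          # orthogonal eigenbasis
    return (X - mean) @ eigvecs                     # decorrelated features

# Example with correlated 2-D features: after the rotation the
# off-diagonal covariance terms are close to zero.
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[2.0, 1.5], [1.5, 2.0]], size=1000)
Y = pca_decorrelate(X)
print(np.cov(Y, rowvar=False).round(2))
```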

Page 27:

Problem 2

Given O = [O1, O2, …, OT] and model λ, how to choose a state sequence Q = [q1, q2, …, qT] that is optimal?
Well explained in the Rabiner paper
Single best state sequence Q
Best score along a single path at time t, accounting for the first t observations and ending in state i:

$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1\, q_2 \cdots q_{t-1},\; q_t = S_i,\; O_1\, O_2 \cdots O_t \mid \lambda)$

Page 28:

Viterbi Trellis
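A compact sketch of the Viterbi recursion over the trellis, again assuming discrete observations; raw probabilities are used for brevity, though a real implementation would normally work with log probabilities.

```python
import numpy as np

def viterbi(O, A, B, pi):
    """Single best state sequence Q* for observations O (symbol indices).
    delta[t, i] is the best score of any path ending in state i at time t;
    psi[t, i] remembers the predecessor that achieved it."""
    N = len(pi)
    T = len(O)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                       # initialisation
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A           # scores[i, j]
        psi[t] = scores.argmax(axis=0)               # best predecessor of j
        delta[t] = scores.max(axis=0) * B[:, O[t]]   # recursion
    # Backtrack from the best final state.
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1, q[t + 1]]
    return q, delta[-1].max()
```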

Page 29:

Back to Problem 3

How to adjust the parameters of model λ = (A, B, π) to maximise P(O | λ)? Training of models
Baum-Welch

An implementation of the EM algorithm (tutorial from David)
Start with a good estimate: clustering with k-means
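A rough sketch of a single Baum-Welch (EM) re-estimation step for the transition matrix, reusing the forward() and backward() sketches from the earlier pages; a full implementation would also re-estimate B and π and iterate until P(O|λ) stops improving.

```python
import numpy as np

def reestimate_A(O, A, B, pi):
    """One EM (Baum-Welch) update of the transition matrix A."""
    alpha, prob_O = forward(O, A, B, pi)
    beta = backward(O, A, B)
    T, N = alpha.shape
    xi_sum = np.zeros((N, N))        # expected transition counts i -> j
    gamma_sum = np.zeros(N)          # expected times in state i (t < T)
    for t in range(T - 1):
        # xi_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
        xi = (alpha[t][:, None] * A * B[:, O[t + 1]] * beta[t + 1]) / prob_O
        xi_sum += xi
        gamma_sum += xi.sum(axis=1)
    return xi_sum / gamma_sum[:, None]   # new a_ij estimates (rows sum to 1)
```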

Page 30:

Training strategies

Choice of number of states
Controlling transitions

Fully connected, or left-right HMM

Gradually increasing number of mixtures per state

Page 31:

More Information

HTK, Hidden Markov Model Toolkit from Cambridge University

htk.eng.cam.ac.uk

Rabiner paper:
Rabiner, L.R., "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989

Speech Recognition Books