Hidden Markov Models
Dr. Naomi Harte
The Talk…
Hidden Markov Models
What are they?
Why are they useful?
The maths part…
Probability calculations
Training – optimising parameters
Viterbi – unseen sequences
Real Systems
Background
Discrete Markov process
System can be in any of N states S1…SN
State changes at each time instant t1, t2, t3, etc.
Actual state at time t is qt
For a first-order Markov process, P(qt=Sj | qt-1=Si, qt-2=Sk, …) simplifies to P(qt=Sj | qt-1=Si)
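Spelled out (a standard expansion, not shown on the slide), this first-order assumption means the probability of a whole state sequence factorises into one-step transition terms:

$$P(q_1, q_2, \ldots, q_T) = P(q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1})$$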
Background
P(qt=Sj | qt-1=Si)
Independent of time
State transition probabilities
aij = P(qt=Sj | qt-1=Si), i, j = 1..N
aij ≥ 0
Σj aij = 1 (sum over j = 1..N, for each i)
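As a concrete illustration (not from the talk), a minimal numpy sketch of such a chain: a transition matrix A whose rows sum to one, sampled forward in time. The 3-state values are invented.

```python
import numpy as np

# Hypothetical 3-state chain: A[i, j] = P(q_t = S_j | q_{t-1} = S_i).
# Each row is a distribution over the next state, so every row sums to 1.
A = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
assert np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0)

rng = np.random.default_rng(0)
state = 0                               # start in state S_1
for t in range(10):
    state = rng.choice(3, p=A[state])   # draw next state from row of A
    print(t, state)                     # the state sequence is fully observable
```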
Observable Markov Model
Example (from Rabiner)
Hidden Markov Model
In the observable model, each state corresponded to an observable event
Restrictive
Now the observation is a probabilistic function of the state
Hidden state, observable outputs [O1, O2, O3, …, OT]
Ball and Urn (Rabiner)
[Figure: URN 1, URN 2, …, URN N, each holding M ball colours. For urn i: P(Red) = bi(1), P(Blue) = bi(2), P(Green) = bi(3), …, P(Pink) = bi(M).]
Jack Ferguson’s urn and ball model
N urns, M colours
O = {Red, Green, Green, Pink, Orange, Blue, Orange, Yellow}
Ball and Urn
Simplest HMM
State is the urn
Colour probability defined for each state (urn)
State transition matrix governs urn choice
HMM elements
N – number of states
A – state transition probabilities: aij = P(qt=Sj | qt-1=Si)
B – observation probability in state j: bj(Ot) = P(Ot | qt=Sj)
Discrete: Ot is one of vk, k = 1..M
Continuous: Gaussian mixture
Initial state distribution: πi = P(q1=Si)
Model λ = (A, B, π)
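A minimal sketch of the full tuple λ = (A, B, π) for a hypothetical 2-urn, 3-colour version of the model, sampling hidden states and observable colours (all numbers invented):

```python
import numpy as np

N, M = 2, 3                         # N states (urns), M symbols (colours)
pi = np.array([0.6, 0.4])           # pi[i]   = P(q_1 = S_i)
A = np.array([[0.7, 0.3],           # A[i, j] = P(q_t = S_j | q_{t-1} = S_i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.3, 0.2],      # B[j, k] = b_j(v_k): colour k from urn j
              [0.1, 0.4, 0.5]])

rng = np.random.default_rng(1)
state = rng.choice(N, p=pi)         # hidden: which urn we are at
for t in range(8):
    print(rng.choice(M, p=B[state]))    # observable: the colour drawn
    state = rng.choice(N, p=A[state])   # hidden transition, never seen directly
```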
What are HMMs useful for?
Modelling temporally evolving events that follow a reproducible pattern,
with some reasonable level of variation
Measurable features taken at intervals
Well structured? Left-to-right HMM
More random? Fully connected (ergodic) HMM (both sketched below)
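To make the two topologies concrete (a sketch with invented 4-state values), they differ only in which entries of A may be non-zero:

```python
import numpy as np

# Left-to-right: a state can only stay put or move forward, so A is
# upper-triangular (here banded: self-loop plus the next state).
A_left_right = np.array([[0.6, 0.4, 0.0, 0.0],
                         [0.0, 0.7, 0.3, 0.0],
                         [0.0, 0.0, 0.8, 0.2],
                         [0.0, 0.0, 0.0, 1.0]])

# Ergodic (fully connected): every state reachable from every other state.
A_ergodic = np.full((4, 4), 0.25)

for A in (A_left_right, A_ergodic):
    assert np.allclose(A.sum(axis=1), 1.0)   # rows remain distributions
```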
Applications in BOTH Audio and Video
HMM applications
Need labelled training data!!
The need for it is the usual reason NOT to use HMMs
Speech & audio-visual applications
Research databases: labelled/transcribed
What might an HMM model?
A sequence of events, with features sampled at intervals
In speech recognition:
A word, a phoneme, a syllable
In speech analysis for home monitoring:
Normal speech, emotionally distressed speech, slurred speech
In music, to transcribe scores:
A violin, a piano, a trumpet, a mixture of instruments
In sports video, to automatically extract highlights:
A tennis serve, tennis volley, tennis rally, passing shot, etc.
Snooker: pot black, pot colour, pot red, foul
In cell biology video, to flag specific events:
Nothing happening, fluorescence, cells growing, cells shrinking, cell death or division
Observations
What is this observation sequence O?
[O1, O2, O3, …, OT]
Pertinent features or measures, taken at regular time intervals, that compactly describe the events of interest
Spectral features, pitch, speaking rate in speech
Colour, shape, motion in video
Example
[Figure: each observation Ot is a vector of cepstral coefficients c1…c12; the sequence runs O1, O2, O3, …, OT]
Take the DCT of the log spectrum on 20 ms windows with 50% overlap
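A sketch of that feature extraction, assuming a mono signal x at sample rate fs; the function name and the choice of 12 coefficients are illustrative:

```python
import numpy as np
from scipy.fftpack import dct

def cepstral_features(x, fs, n_coeffs=12):
    """DCT of the log magnitude spectrum on 20 ms windows with 50% overlap."""
    win = int(0.020 * fs)              # 20 ms window length in samples
    hop = win // 2                     # 50% overlap
    frames = []
    for start in range(0, len(x) - win + 1, hop):
        frame = x[start:start + win] * np.hamming(win)
        log_spec = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)
        frames.append(dct(log_spec, norm='ortho')[1:n_coeffs + 1])
    return np.array(frames)            # shape (T, 12): one O_t per row
```

Each row is one observation vector Ot, so T frames give the sequence [O1, …, OT].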
HMM problem 1
Given O = [O1, O2, …, OT] and model λ, how to efficiently compute P(O|λ)?
Evaluation
Which model gives the best score?
Forward-Backward procedure
HMM problem 2
Given O = [O1, O2, …, OT] and model λ, how to choose a state sequence Q = [q1, q2, …, qT] that is optimal?
Uncover the “hidden” part
There is no single correct sequence
Viterbi Algorithm
HMM problem 3
How to adjust the parameters of model λ = (A, B, π) to maximise P(O|λ)?
Training
Adapt parameters to observed training data
Use Baum-Welch
Iterative solution
Expectation maximisation
Notation
Follow Rabiner tutorial
Back to Problem 1
Given O = [O1, O2, …, OT] and model λ, how to efficiently compute P(O|λ)?
Consider ALL possible state sequences
Take one particular sequence Q = [q1, q2, …, qT]
Probability of O given Q and λ?
$$P(O \mid Q, \lambda) = \prod_{t=1}^{T} P(O_t \mid q_t, \lambda) = b_{q_1}(O_1)\, b_{q_2}(O_2) \cdots b_{q_T}(O_T)$$
Observation probability ctd.
Probability of the state sequence?
$$P(Q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$$
JOINT probability of O and Q?
$$P(O, Q \mid \lambda) = P(O \mid Q, \lambda)\, P(Q \mid \lambda)$$
Probability of O over ALL possible Q?
$$P(O \mid \lambda) = \sum_{\text{all } Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda)$$
Gets crazy as N and T increase!!
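To see why the direct sum blows up, here is a brute-force sketch (my own, reusing the pi, A, B arrays from the earlier sketch) that literally enumerates all N^T state sequences; it is only usable for tiny N and T:

```python
import itertools
import numpy as np

def prob_obs_bruteforce(O, pi, A, B):
    """P(O | lambda) by summing P(O | Q, lambda) P(Q | lambda) over every Q."""
    N, T = len(pi), len(O)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):   # N**T sequences!
        p = pi[Q[0]] * B[Q[0], O[0]]                  # pi_q1 * b_q1(O_1)
        for t in range(1, T):
            p *= A[Q[t - 1], Q[t]] * B[Q[t], O[t]]    # a_{q(t-1)q(t)} b_qt(O_t)
        total += p
    return total
```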
Forward-Backward Procedure
Be smart!
Only have N states
So any state at t+1 can only be reached from the N states at time t
Reuses calculations
Forward variable (Rabiner)
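Rabiner defines the forward variable as αt(i) = P(O1 O2 … Ot, qt = Si | λ), built by induction; a minimal sketch of the recursion (function name is mine):

```python
import numpy as np

def forward(O, pi, A, B):
    """P(O | lambda) via forward variables alpha[t, i] = P(O_1..O_t, q_t = S_i)."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                     # initialisation
    for t in range(1, T):
        # induction: state j at time t is reached from the N states at t-1
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha[-1].sum()                         # termination: sum_i alpha_T(i)
```

This costs on the order of N²T operations instead of N^T; real implementations scale alpha (or work in the log domain) to avoid underflow.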
Exercise
What is the corresponding backward variable?
The partial observation sequence from t+1 to the end, given in state i at time t and model λ
Answer in the Rabiner paper!
Observation Probability
Observation probability in state j
bj(Ot) = P(Ot | qt=Sj)
Discrete: Ot is one of vk, k = 1..M
Continuous: multivariate Gaussian mixture density most common
Are the features independent?
First years – how does this affect the pdf?
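If the features are treated as independent, the covariance matrix is diagonal and the multivariate Gaussian pdf factorises into a product of univariate Gaussians; a one-function sketch (mine, not from the slides):

```python
import numpy as np

def log_gauss_diag(o, mean, var):
    """log N(o; mean, diag(var)): with independent features the joint density
    is the product of per-dimension Gaussians, i.e. a sum of logs."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (o - mean) ** 2 / var)
```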
What if the features are not independent?
Use full-covariance HMMs
Slow
Need more training data
Or decorrelate the features (see the sketch below)
PCA, LDA, DCT
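One way to decorrelate, sketched with plain-numpy PCA (rotating onto the eigenvectors of the sample covariance); LDA and the DCT serve the same purpose:

```python
import numpy as np

def pca_decorrelate(X):
    """Rotate features so their sample covariance becomes diagonal.
    X has shape (T, d): one feature vector per frame."""
    Xc = X - X.mean(axis=0)                             # centre each feature
    _, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # eigenvectors of cov
    return Xc @ vecs                                    # decorrelated, same shape
```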
Problem 2
Given O = [O1, O2, …, OT] and model λ, how to choose a state sequence Q = [q1, q2, …, qT] that is optimal?
Well explained in the Rabiner paper
Single best state sequence Q
Best score along a single path at time t, accounting for the first t observations and ending in state i:
$$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1 q_2 \cdots q_t = S_i,\; O_1 O_2 \cdots O_t \mid \lambda)$$
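A minimal Viterbi sketch of that recursion (names are mine), keeping the best predecessor at each step so the single best path can be read back:

```python
import numpy as np

def viterbi(O, pi, A, B):
    """Most likely hidden state sequence for O under lambda = (A, B, pi)."""
    N, T = len(pi), len(O)
    delta = np.zeros((T, N))            # best score ending in state j at time t
    psi = np.zeros((T, N), dtype=int)   # best predecessor of state j at time t
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A        # scores[i, j]: via i into j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    path = [int(delta[-1].argmax())]              # best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))        # backtrack through psi
    return path[::-1]
```

In practice the products are replaced by sums of log probabilities to avoid underflow.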
Viterbi Trellis
Back to Problem 3
How to adjust the parameters of model λ = (A, B, π) to maximise P(O|λ)?
Training of models
Baum-Welch
An implementation of the EM algorithm (tutorial from David)
Start with a good estimate
Clustering with k-means
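A compact sketch of one Baum-Welch re-estimation pass for a discrete HMM (function name mine; a real implementation scales α and β or works in the log domain, and iterates passes like this until P(O|λ) stops improving):

```python
import numpy as np

def baum_welch_step(O, pi, A, B):
    """One EM step: expected counts (E) then re-estimated lambda (M)."""
    O = np.asarray(O)
    N, T = len(pi), len(O)
    # E-step: forward and backward variables (unscaled, for clarity)
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    gamma = alpha * beta                  # gamma[t, i] ~ P(q_t = S_i | O, lambda)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # xi[t, i, j] ~ P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    xi = alpha[:-1, :, None] * A * (B[:, O[1:]].T * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    # M-step: new parameters from expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[O == k].sum(axis=0) for k in range(B.shape[1])],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```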
Training strategies
Choice of number of states
Controlling transitions
Fully connected, or left-to-right HMM
Gradually increasing the number of mixtures per state
More Information
HTK, Hidden Markov Model Toolkit from Cambridge University
htk.eng.cam.ac.uk
Rabiner paper:
Rabiner, L. R., "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
Speech Recognition Books