-
T-61.184 Automatic Speech Recognition: From Theory to Practice
http://www.cis.hut.fi/Opinnot/T-61.184/
October 4, 2004
Prof. Bryan Pellom
Department of Computer Science
Center for Spoken Language Research, University of Colorado
-
Role of Hidden Markov Models in Speech Recognition
[Block diagram: Speech → Feature Extraction → Optimization / Search → Text + Timing + Confidence; the search component draws on the Acoustic Model, Lexicon, and Language Model.]
-
Motivation for Today’s Topic
[Example: the word "SPEECH" and its phoneme sequence: S P IY CH]
-
Expectations for this course
Today we present the basic idea of Hidden Markov Models (HMMs).
Next week we'll discuss how HMMs are used in practice to model the acoustics of speech.
Don't get lost in the math today: I don't expect you to absorb the theory from these slides alone! If you want to learn about HMMs, it's important to take a look at the resources listed on the next slide.
-
Resources for Today’s Topic
L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. IEEE, Vol. 77, No. 2, pp. 257-286, February 1989.
L. R. Rabiner & B.-H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, ISBN 0-13-015157-2, 1993 (see Chapter 6).
The Cambridge HTK toolkit has a user manual, the "HTK Book". It is an excellent resource and covers many of the equations related to HMM modeling.
-
Discrete-Time Markov Process
Characterized by:
A set of N states: $S = \{S_0, S_1, \ldots, S_N\}$
Transition probabilities: $a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i)$
First-order Markov chain: the current system state depends only on the previous state. Let $q_t$ be the system state at time t; then
$P(q_t = S_j \mid q_{t-1} = S_i, q_{t-2} = S_k, \ldots) = P(q_t = S_j \mid q_{t-1} = S_i)$
[Figure: two-state diagram with states S0 and S1 and transitions a00, a01, a10, a11.]
-
Discrete-Time Markov Process
Properties:
"Observable Markov Model": the output of the process is the sequence of states itself.
$a_{ij} \ge 0 \quad \forall\, i, j$
$\sum_{j=0}^{N} a_{ij} = 1 \quad \forall\, i$
For the two-state case:
$A = \begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix}$
[Figure: two-state diagram with transitions a00, a01, a10, a11.]
-
Example 1: Single Fair Coin Observable Markov Process
Outcomes: Head (State 0), Tail (State 1)
Observed outcomes uniquely define state sequence
e.g., HHHTTTHHTT S=0001110011
Transition probabilities:
$A = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}$
[Figure: two-state diagram (HEAD, TAIL) with transitions a00, a01, a10, a11.]
-
Example 2: Observable Markov Model of Weather
States:
S0: rainy
S1: cloudy
S2: sunny
With state transition probabilities,
$A = \{a_{ij}\} = \begin{bmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}$
[Figure: three-state diagram with transitions a00 through a22.]
-
Observable Markov Model of Weather
What is the probability that the weather for 8 consecutive days is “sun, sun, sun, rain, rain, sun, cloudy, sun”?
Solution:
Observation sequence: O = {sun, sun, sun, rain, rain, sun, cloudy, sun}
Corresponds to the state sequence S = {2, 2, 2, 0, 0, 2, 1, 2}
Want to determine P(O | model):
P(O | model) = P(S = {2,2,2,0,0,2,1,2} | model)
-
Observable Markov Model of Weather
$P(O \mid \text{model}) = P(S = \{2,2,2,0,0,2,1,2\} \mid \text{model})$
$= P(q_1 = 2)\, P(q_2 = 2 \mid q_1 = 2) \cdots P(q_8 = 2 \mid q_7 = 1)$
$= \pi_2 \cdot a_{22} \cdot a_{22} \cdot a_{20} \cdot a_{00} \cdot a_{02} \cdot a_{21} \cdot a_{12}$
$= 1 \cdot (0.8)(0.8)(0.1)(0.4)(0.3)(0.1)(0.2) \approx 1.54 \times 10^{-4}$
where $\pi_i = P(q_1 = i)$ is the initial state probability and
$A = \{a_{ij}\} = \begin{bmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}$
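As a quick sanity check, the product above can be evaluated in a few lines of Python (the script and variable names are illustrative, not part of the lecture):

```python
# Weather model from the previous slides: states 0 = rainy, 1 = cloudy, 2 = sunny.
A = [[0.4, 0.3, 0.3],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]
pi = [0.0, 0.0, 1.0]  # pi_2 = 1, as assumed in the slide's computation

seq = [2, 2, 2, 0, 0, 2, 1, 2]  # sun, sun, sun, rain, rain, sun, cloudy, sun
p = pi[seq[0]]
for prev, cur in zip(seq, seq[1:]):
    p *= A[prev][cur]
print(p)  # ~1.536e-04
```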
-
“Hidden” Markov Models
Observations are a probabilistic function of state
Underlying state sequence is not observable (hidden)
First-order assumption
Output independence: observations depend only on the state that generated them, not on each other.
-
Example: 2-Coins, Hidden Markov Process
Observations: Head, Tail
Observed outcomes do not uniquely define state sequence
Transition probabilities:
$A = \begin{bmatrix} a_{00} & 1 - a_{00} \\ 1 - a_{11} & a_{11} \end{bmatrix}$
State 0 emissions: $P(H) = P_1$, $P(T) = 1 - P_1$
State 1 emissions: $P(H) = P_2$, $P(T) = 1 - P_2$
[Figure: two-state diagram with transitions a00, a01, a10, a11.]
-
Urn-and-Ball Illustration
A Genie plays a game with a blind observer:
1. Genie initially selects an Urn at random
2. Genie picks a colored ball and tells the blind observer the color
3. Genie puts the ball back in the Urn
4. Genie moves to the next Urn based on a random selection procedure from the current Urn
5. Steps 2-4 are repeated
-
Urn-and-Ball Illustration
Observations: the color sequence of the balls
States: the identity of the urn
State transitions: the process of selecting the urns
-
Urn-and-Ball Illustration
Observation sequence: R B Y Y G B Y G R …
Individual colors (observations) don't reveal the urn (state).
Urn 1: P(R) = 0.20, P(G) = 0.30, P(B) = 0.10, P(Y) = 0.60
Urn 2: P(R) = 0.45, P(G) = 0.15, P(B) = 0.20, P(Y) = 0.20
Urn 3: P(R) = 0.15, P(G) = 0.70, P(B) = 0.10, P(Y) = 0.05
-
Discrete Symbol Observation HMM
A set of N states: $S = \{S_0, S_1, \ldots, S_N\}$
Transition probabilities: $a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i)$
A set of M observation symbols: $V = \{v_1, v_2, \ldots, v_M\}$
Observation probability distribution (for state j, symbol k): $b_j(k) = P(o_t = v_k \mid q_t = j)$
[Figure: two states Si and Sj with transitions aii, aij, aji, ajj; each state carries a symbol distribution P(v1 | S·), P(v2 | S·), …, P(vM | S·).]
-
Discrete Symbol Observation HMM
Also characterized by an initial state distribution:
$\pi = \{\pi_i\}, \quad \pi_i = P(q_1 = i)$
Specification thus requires:
2 model parameters, N and M
Specification of the M symbols
3 probability measures: A, B, π
Compact notation: $\lambda = (A, B, \pi)$
[Figure: same two-state diagram with per-state symbol distributions as on the previous slide.]
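As a minimal sketch, the parameter set λ = (A, B, π) might be held in code like this (class and variable names are my own, not from the lecture):

```python
import numpy as np

class DiscreteHMM:
    """Discrete-symbol HMM: N states, M symbols, parameters lambda = (A, B, pi)."""

    def __init__(self, A, B, pi):
        self.A = np.asarray(A)    # (N, N) state-transition probabilities a_ij
        self.B = np.asarray(B)    # (N, M) symbol probabilities b_j(k)
        self.pi = np.asarray(pi)  # (N,)  initial state distribution
        # Sanity checks: rows of A and B, and pi itself, must each sum to 1.
        assert np.allclose(self.A.sum(axis=1), 1.0)
        assert np.allclose(self.B.sum(axis=1), 1.0)
        assert np.isclose(self.pi.sum(), 1.0)

# The two-state example used later in the lecture (symbols: A = 0, B = 1):
example = DiscreteHMM(A=[[0.6, 0.4], [0.0, 1.0]],
                      B=[[0.8, 0.2], [0.3, 0.7]],
                      pi=[1.0, 0.0])
```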
-
Left-to-Right HMMs
In a left-to-right HMM, transitions never move to an earlier state ($a_{ij} = 0$ for $j < i$), so the transition matrix is upper triangular:
$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$
-
Ergodic HMMs
$a_{ij} > 0 \quad \forall\, i, j$
[Figure: fully connected three-state diagram (S1, S2, S3) with all transitions a11 through a33.]
-
HMM as a Symbol Generator
Given parameters N, M, A, B, and π, an HMM can generate an observation sequence $O = \{o_1, o_2, \ldots, o_T\}$:
1. Choose the initial state $q_1 = i$ from the initial state distribution π
2. Set $t = 1$
3. Choose $o_t = v_k$ according to the distribution $b_i(k)$
4. Transition to the next state $q_{t+1} = j$ according to the state-transition probability distribution $a_{ij}$
5. Set $t = t + 1$ and go to step 3
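A short sketch of this generation procedure in Python (numpy-based; function names and the fixed seed are illustrative assumptions):

```python
import numpy as np

def generate(A, B, pi, T, seed=0):
    """Sample a state sequence and observation sequence of length T
    from a discrete-symbol HMM with parameters (A, B, pi)."""
    rng = np.random.default_rng(seed)
    A, B, pi = np.asarray(A), np.asarray(B), np.asarray(pi)
    states, symbols = [], []
    q = rng.choice(len(pi), p=pi)                       # step 1: q_1 ~ pi
    for _ in range(T):                                  # steps 2-5
        states.append(q)
        symbols.append(rng.choice(B.shape[1], p=B[q]))  # step 3: o_t ~ b_q(.)
        q = rng.choice(A.shape[1], p=A[q])              # step 4: q -> q' ~ a_q.
    return states, symbols

# Example with the two-state model used later in the lecture:
states, obs = generate([[0.6, 0.4], [0.0, 1.0]],
                       [[0.8, 0.2], [0.3, 0.7]],
                       [1.0, 0.0], T=10)
```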
-
HMM as a Symbol Generator
Example output from symbol generation
We can think of the HMM as generating the observation sequence as it transitions from state to state.
Time, t:      1   2   3   4   5   6   …   T
State:        q1  q2  q3  q4  q5  q6  …   qT
Observation:  o1  o2  o3  o4  o5  o6  …   oT
-
Three Interesting Problems
Problem 1: Scoring & Evaluation
How do we efficiently compute the probability of an observation sequence O given a model λ, i.e., P(O | λ)?
Problem 2: Decoding
Given an observation sequence O and a model λ, how do we determine the corresponding state sequence q that "best explains" how the observations were generated?
Problem 3: Training
How do we adjust the model parameters λ = (A, B, π) to maximize the probability of generating a given observation sequence, i.e., maximize P(O | λ)?
-
Problem 1: Scoring and Evaluation
Given an observation sequence $O = \{o_1, o_2, \ldots, o_T\}$,
we want to compute the probability of generating it: $P(O \mid \lambda)$.
Let's assume a particular sequence of states, $q = \{q_1, q_2, \ldots, q_T\}$.
-
Problem 1: Scoring and Evaluation
Using the chain rule, we can decompose the problem by summing over all possible state sequences:
$P(O \mid \lambda) = \sum_{\text{all } q} P(O \mid q, \lambda)\, P(q \mid \lambda)$
The first term is the likelihood of generating the observed symbol sequence given the assumed state sequence.
The second term is the probability that the system steps through that sequence of states.
-
Problem 1: Scoring and Evaluation
Probability of the observation sequence given the state sequence:
$P(O \mid q, \lambda) = \prod_{t=1}^{T} p(o_t \mid q_t, \lambda) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T)$
Probability of the state sequence:
$P(q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$
-
Problem 1: Scoring and Evaluation
Using the chain rule,
$P(O \mid \lambda) = \sum_{\text{all } q} P(O \mid q, \lambda)\, P(q \mid \lambda) = \sum_{\text{all } q} \pi_{q_1} b_{q_1}(o_1)\, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)$
This is not practical to compute directly: with N states and T observations there are $N^T$ possible state sequences, and direct evaluation requires on the order of $O(2T \cdot N^T)$ operations.
-
Forward Algorithm
Definition:
$\alpha_t(i) = P(o_1 o_2 \cdots o_t, q_t = i \mid \lambda)$
(the probability of seeing observations $o_1$ through $o_t$ and ending in state i, given HMM λ)
1. Initialization: $\alpha_0(i) = \pi_i$
2. Induction: $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1})$
3. Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
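A minimal numpy sketch of the forward algorithm under the slide's convention that α0 = π (function and variable names are mine):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: returns alpha with shape (T+1, N) and P(O | lambda).
    obs is a list of symbol indices o_1..o_T."""
    A, B, pi = np.asarray(A), np.asarray(B), np.asarray(pi)
    T, N = len(obs), len(pi)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi                                 # initialization: alpha_0(i) = pi_i
    for t in range(T):                            # induction over t = 1..T
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t]]
    return alpha, float(alpha[T].sum())           # termination: sum_i alpha_T(i)

# The lecture's example: P(O = {A, A, B} | lambda), with symbols A = 0, B = 1.
alpha, p = forward([[0.6, 0.4], [0.0, 1.0]],
                   [[0.8, 0.2], [0.3, 0.7]],
                   [1.0, 0.0], [0, 0, 1])
print(round(p, 2))  # 0.16
```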
-
Forward Algorithm Illustration
[Figure: trellis showing $\alpha_t(1), \alpha_t(2), \ldots, \alpha_t(N)$ at time t feeding state j at time t+1 through $a_{1j}, a_{2j}, \ldots, a_{Nj}$.]
$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1})$
Complexity: $O(T \cdot N^2)$
-
Forward Algorithm Example
[Figure: two-state HMM. State S0: $a_{00} = 0.6$, $a_{01} = 0.4$; emits A with probability 0.8 and B with 0.2. State S1: $a_{11} = 1.0$; emits A with probability 0.3 and B with 0.7.]
$\pi = \begin{bmatrix} 1.0 \\ 0.0 \end{bmatrix}$
Given the above HMM with discrete observations "A" and "B", what is the probability of generating the sequence O = {A, A, B}?
In other words, find P(O = {A, A, B} | λ).
-
Forward Algorithm Example
[Same two-state HMM; observations A (t=1), A (t=2), B (t=3).]
t=0: $\alpha_0(S_0) = 1.0$; $\alpha_0(S_1) = 0.0$
t=1: $\alpha_1(S_0) = 1.0 \cdot 0.6 \cdot 0.8 = 0.48$; $\alpha_1(S_1) = 1.0 \cdot 0.4 \cdot 0.3 = 0.12$
t=2: $\alpha_2(S_0) = 0.48 \cdot 0.6 \cdot 0.8 \approx 0.23$; $\alpha_2(S_1) = 0.48 \cdot 0.4 \cdot 0.3 + 0.12 \cdot 1.0 \cdot 0.3 \approx 0.09$
t=3: $\alpha_3(S_0) = 0.23 \cdot 0.6 \cdot 0.2 \approx 0.03$; $\alpha_3(S_1) = 0.23 \cdot 0.4 \cdot 0.7 + 0.09 \cdot 1.0 \cdot 0.7 \approx 0.13$
Answer: P(O | λ) = 0.03 + 0.13 = 0.16
-
Backward Algorithm
Definition:
$\beta_t(i) = P(o_{t+1} o_{t+2} \cdots o_T \mid q_t = i, \lambda)$
(the probability of the observation sequence $o_{t+1}$ through $o_T$, given state i at time t and HMM λ)
Initialization: $\beta_T(i) = 1$
Induction: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \quad 1 \le i \le N$
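A matching numpy sketch of the backward algorithm under the same conventions; β is computed down to t = 0 so that P(O | λ) = Σi πi β0(i), as shown on a later slide:

```python
import numpy as np

def backward(A, B, pi, obs):
    """Backward algorithm: returns beta with shape (T+1, N) and P(O | lambda)."""
    A, B, pi = np.asarray(A), np.asarray(B), np.asarray(pi)
    T, N = len(obs), len(pi)
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0                                  # initialization: beta_T(i) = 1
    for t in range(T - 1, -1, -1):                 # induction, t = T-1 .. 0
        beta[t] = A @ (B[:, obs[t]] * beta[t + 1])
    return beta, float(pi @ beta[0])               # P(O | lambda) = sum_i pi_i beta_0(i)

beta, p = backward([[0.6, 0.4], [0.0, 1.0]],
                   [[0.8, 0.2], [0.3, 0.7]],
                   [1.0, 0.0], [0, 0, 1])
print(round(p, 2))  # 0.16, matching the forward algorithm
```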
-
Backward Algorithm Illustration
[Figure: trellis showing state i at time t connected to $\beta_{t+1}(1), \beta_{t+1}(2), \ldots, \beta_{t+1}(N)$ through $a_{i1} b_1(o_{t+1}), a_{i2} b_2(o_{t+1}), \ldots, a_{iN} b_N(o_{t+1})$.]
$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$
Complexity: $O(T \cdot N^2)$
-
Backward Algorithm Example
[Same two-state HMM; observations A (t=1), A (t=2), B (t=3).]
t=3: $\beta_3(S_0) = 1.0$; $\beta_3(S_1) = 1.0$
t=2: $\beta_2(S_0) = 0.6 \cdot 0.2 \cdot 1.0 + 0.4 \cdot 0.7 \cdot 1.0 = 0.40$; $\beta_2(S_1) = 1.0 \cdot 0.7 \cdot 1.0 = 0.70$
t=1: $\beta_1(S_0) = 0.6 \cdot 0.8 \cdot 0.40 + 0.4 \cdot 0.3 \cdot 0.70 \approx 0.28$; $\beta_1(S_1) = 1.0 \cdot 0.3 \cdot 0.70 = 0.21$
t=0: $\beta_0(S_0) = 0.6 \cdot 0.8 \cdot 0.28 + 0.4 \cdot 0.3 \cdot 0.21 \approx 0.16$; $\beta_0(S_1) = 1.0 \cdot 0.3 \cdot 0.21 \approx 0.06$
$P(O \mid \lambda) = \pi_0 \cdot \beta_0(S_0) + \pi_1 \cdot \beta_0(S_1) = 1 \cdot 0.16 + 0 \cdot 0.06 = 0.16$
0* 1 =π
-
Automatic Speech Recognition: From Theory to Practice 35T-61.184T-61.184
Problem 1: Scoring and Evaluation
In fact, there are two ways to compute $P(O \mid \lambda)$:
Forward algorithm: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
Backward algorithm: $P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, \beta_0(i)$
Computation of each is $O(T \cdot N^2)$.
-
Problem 2: Decoding
Given an observation sequence $O = \{o_1, o_2, \ldots, o_T\}$,
find the single best sequence of states $q = \{q_1, q_2, \ldots, q_T\}$
which maximizes $P(O, q \mid \lambda)$.
-
Solution: Viterbi Algorithm
Define:
$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1 q_2 \cdots q_{t-1}, q_t = i, o_1 o_2 \cdots o_t \mid \lambda)$
(the highest probability along a single state path that accounts for observations 1 through t and ends in state i at time t)
By induction:
$\delta_{t+1}(j) = \left[ \max_i \delta_t(i)\, a_{ij} \right] b_j(o_{t+1})$
-
Viterbi Algorithm Illustration
[Figure: trellis showing $\delta_t(1), \delta_t(2), \ldots, \delta_t(N)$ at time t feeding state j at time t+1 through $a_{1j}, a_{2j}, \ldots, a_{Nj}$.]
$\delta_{t+1}(j) = \left[ \max_{1 \le i \le N} \delta_t(i)\, a_{ij} \right] b_j(o_{t+1})$
-
Viterbi Algorithm
1. Initialization: $\delta_1(i) = \pi_i\, b_i(o_1)$, $\psi_1(i) = 0$
2. Recursion: $\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]\, b_j(o_t)$; $\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]$
3. Termination: $P^* = \max_{1 \le i \le N} \delta_T(i)$; $q_T^* = \arg\max_{1 \le i \le N} \delta_T(i)$
4. Path backtrace: $q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, \ldots, 1$
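A compact numpy sketch of these four steps in the probability domain (names are mine; a log-domain variant follows on the next slides):

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Viterbi algorithm with the initialization delta_1(i) = pi_i b_i(o_1)
    stated on this slide; returns (best path probability, best state sequence)."""
    A, B, pi = np.asarray(A), np.asarray(B), np.asarray(pi)
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                 # 1. initialization
    for t in range(1, T):                        # 2. recursion
        scores = delta[t - 1][:, None] * A       # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[T - 1].argmax())]          # 3. termination
    for t in range(T - 1, 0, -1):                # 4. path backtrace
        path.append(int(psi[t][path[-1]]))
    return float(delta[T - 1].max()), path[::-1]

# Lecture example, O = {A, A, B} (A=0, B=1): the best path is S0, S0, S1.
# (The illustration two slides ahead starts from a non-emitting t=0 step, as in
# the forward example, so its delta values differ by that convention.)
p_star, path = viterbi([[0.6, 0.4], [0.0, 1.0]],
                       [[0.8, 0.2], [0.3, 0.7]],
                       [1.0, 0.0], [0, 0, 1])
print(path)  # [0, 0, 1]
```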
-
Viterbi Algorithm Illustration
[Same two-state HMM; observations A, A, B. The trellis starts from a non-emitting step at t=0, as in the forward example.]
t=0: $\delta_0(S_0) = 1.0$; $\delta_0(S_1) = 0.0$
t=1: $\delta_1(S_0) = 1.0 \cdot 0.6 \cdot 0.8 = 0.48$; $\delta_1(S_1) = 1.0 \cdot 0.4 \cdot 0.3 = 0.12$
t=2: $\delta_2(S_0) = 0.48 \cdot 0.6 \cdot 0.8 \approx 0.23$; $\delta_2(S_1) = \max(0.48 \cdot 0.4,\ 0.12 \cdot 1.0) \cdot 0.3 \approx 0.06$
t=3: $\delta_3(S_0) = 0.23 \cdot 0.6 \cdot 0.2 \approx 0.03$; $\delta_3(S_1) = \max(0.23 \cdot 0.4,\ 0.06 \cdot 1.0) \cdot 0.7 \approx 0.06$
Best path: $P^* = \max(0.03, 0.06) = 0.06$, ending in state S1.
-
Viterbi Algorithm in Log-Domain
Because of numerical underflow, it is common to run the Viterbi algorithm in the log domain. Viterbi search in speech recognition systems, for example, is implemented in the log domain.
$\tilde{\pi}_i = \log(\pi_i)$
$\tilde{b}_j(o_t) = \log(b_j(o_t))$
$\tilde{a}_{ij} = \log(a_{ij})$
-
Viterbi Algorithm in Log-Domain
1. Initialization: $\tilde{\delta}_1(i) = \tilde{\pi}_i + \tilde{b}_i(o_1)$, $\psi_1(i) = 0$
2. Recursion: $\tilde{\delta}_t(j) = \max_{1 \le i \le N} [\tilde{\delta}_{t-1}(i) + \tilde{a}_{ij}] + \tilde{b}_j(o_t)$; $\psi_t(j) = \arg\max_{1 \le i \le N} [\tilde{\delta}_{t-1}(i) + \tilde{a}_{ij}]$
3. Termination: $\tilde{P}^* = \max_{1 \le i \le N} \tilde{\delta}_T(i)$; $q_T^* = \arg\max_{1 \le i \le N} \tilde{\delta}_T(i)$
4. Path backtrace: $q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, \ldots, 1$
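The earlier sketch carries over almost unchanged: products become sums of logs. Flooring log(0) at a very negative value is an implementation detail I've assumed here:

```python
import numpy as np

def log_viterbi(A, B, pi, obs, floor=-1e10):
    """Log-domain Viterbi: identical structure to the probability-domain sketch."""
    with np.errstate(divide="ignore"):           # log(0) -> -inf, then floored
        logA = np.maximum(np.log(A), floor)
        logB = np.maximum(np.log(B), floor)
        logpi = np.maximum(np.log(pi), floor)
    T, N = len(obs), len(logpi)
    delta = logpi + logB[:, obs[0]]              # 1. initialization
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):                        # 2. recursion (sums, not products)
        scores = delta[:, None] + logA
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]                 # 3. termination
    for t in range(T - 1, 0, -1):                # 4. path backtrace
        path.append(int(psi[t][path[-1]]))
    return float(delta.max()), path[::-1]
```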
-
Problem 3: Training
Train the parameters of the HMM:
Tune λ to maximize P(O | λ)
There is no efficient algorithm for the global optimum
An efficient iterative algorithm finds a local optimum
(Baum-Welch) Forward-Backward Algorithm:
Compute probabilities using the current model λ
Refine λ → λ̄ based on the computed values
Uses α and β from the forward and backward algorithms
-
Forward-Backward Algorithm
Define:
$\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \dfrac{P(q_t = i, q_{t+1} = j, O \mid \lambda)}{P(O \mid \lambda)}$
(the probability of being in state i at time t and state j at time t+1, given the model and the observation sequence)
-
Forward-Backward Algorithm
$\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$
[Figure: trellis segment with the three factors labeled $\alpha_t(i)$, $a_{ij} b_j(o_{t+1})$, and $\beta_{t+1}(j)$.]
-
Forward-Backward Algorithm
$\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)} = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)}$
-
Forward-Backward Algorithm
$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$ : probability of being in state i at time t
$\sum_{t=1}^{T-1} \gamma_t(i)$ : expected number of transitions from state i in O
$\sum_{t=1}^{T-1} \xi_t(i, j)$ : expected number of transitions from state i to state j in O
-
Forward-Backward Algorithm
Using these definitions, we can compute the initial state occupancy probability,
$\bar{\pi}_i = \text{expected number of times in state } i \text{ at time } t = 1 = \gamma_1(i)$
-
Forward-Backward Algorithm
$\bar{a}_{ij} = \dfrac{\text{expected number of transitions from state } i \text{ to state } j}{\text{expected number of transitions from state } i} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \sum_{j=1}^{N} \xi_t(i, j)}$
-
Forward-Backward Algorithm
$\bar{b}_j(k) = \dfrac{\text{expected number of times in state } j \text{ observing symbol } v_k}{\text{expected number of times in state } j} = \dfrac{\sum_{t=1,\ o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)} = \dfrac{\sum_{t=1,\ o_t = v_k}^{T} \alpha_t(j)\, \beta_t(j)}{\sum_{t=1}^{T} \alpha_t(j)\, \beta_t(j)}$
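Putting the re-estimation formulas together, one Baum-Welch iteration might look like the sketch below. It reuses the forward and backward sketches from earlier, handles a single observation sequence, and omits the scaling needed for long sequences, so treat it as illustration only:

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Baum-Welch (forward-backward) re-estimation step for a discrete-symbol
    HMM, following the lecture's convention that t = 0 is non-emitting (alpha_0 = pi).
    Returns updated (A, B, pi)."""
    A, B, pi = np.asarray(A), np.asarray(B), np.asarray(pi)
    T = len(obs)
    N, M = B.shape
    alpha, p = forward(A, B, pi, obs)      # from the earlier sketch
    beta, _ = backward(A, B, pi, obs)      # from the earlier sketch
    gamma = alpha * beta / p               # gamma[t, i] = P(q_t = i | O, lambda)
    # xi[t, i, j] = P(q_t = i, q_{t+1} = j | O, lambda), one entry per transition
    xi = np.array([np.outer(alpha[t], B[:, obs[t]] * beta[t + 1]) * A / p
                   for t in range(T)])
    pi_new = gamma[0]                                        # occupancy at t = 0
    A_new = xi.sum(axis=0) / gamma[:T].sum(axis=0)[:, None]  # expected transitions
    B_new = np.zeros((N, M))
    for t, o in enumerate(obs):                              # expected emissions
        B_new[:, o] += gamma[t + 1]
    B_new /= B_new.sum(axis=1, keepdims=True)
    return A_new, B_new, pi_new
```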
-
Forward-Backward Algorithm
It can be shown that $P(O \mid \bar{\lambda}) > P(O \mid \lambda)$ unless $\bar{\lambda} = \lambda$.
1. Initialize λ = (A, B, π)
2. Compute α, β, and ξ
3. Estimate λ̄ = (Ā, B̄, π̄)
4. Replace λ with λ̄
5. If not converged, go to 2
-
FB Algorithm Satisfies Constraints:
$\sum_{i=1}^{N} \bar{\pi}_i = 1$
$\sum_{j=1}^{N} \bar{a}_{ij} = 1, \quad 1 \le i \le N$
$\sum_{k=1}^{M} \bar{b}_j(k) = 1, \quad 1 \le j \le N$
-
Continuous Observation Densities
The discussion up to this point has considered observations characterized by discrete symbols.
Often, we work with continuous-valued data.
Can convert continuous variables to discrete symbols using a Vector Quantizer (VQ) codebook [with information loss]
How to model continuous observations directly?
-
Mixture Gaussian PDFs
Mixture Gaussian PDF with M components:
$b_j(o_t) = \sum_{k=1}^{M} c_{jk}\, \mathcal{N}(o_t, \mu_{jk}, \Sigma_{jk})$
Constraints on the mixture weights:
$\sum_{k=1}^{M} c_{jk} = 1, \qquad c_{jk} \ge 0, \quad 1 \le k \le M$
-
Definition
Probability of being in state j at time t with the kth mixture component accounting for $o_t$:
$\gamma_t(j, k) = \dfrac{\alpha_t(j)\, \beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} \cdot \dfrac{c_{jk}\, \mathcal{N}(o_t, \mu_{jk}, \Sigma_{jk})}{\sum_{m=1}^{M} c_{jm}\, \mathcal{N}(o_t, \mu_{jm}, \Sigma_{jm})}$
o
-
Automatic Speech Recognition: From Theory to Practice 56T-61.184T-61.184
Resulting Equations for Mixture Weight & Mean Update
$\bar{c}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j, k)}{\sum_{t=1}^{T} \sum_{k=1}^{M} \gamma_t(j, k)}$
$\bar{\mu}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j, k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j, k)}$
-
Resulting Equations for Transition Matrix & Variance Update
The transition probability $a_{ij}$ is re-estimated the same way as in the discrete-symbol case. Covariance matrix:
$\bar{\Sigma}_{jk} = \dfrac{\sum_{t=1}^{T} \gamma_t(j, k)\, (o_t - \mu_{jk})(o_t - \mu_{jk})'}{\sum_{t=1}^{T} \gamma_t(j, k)}$
-
Estimation from Multiple Observation Sequences
We estimate HMM parameters from multiple examples of speech:
Collected from different speakers
In different contexts
We attempt to model variability in producing each sound unit by estimating parameters from large corpora of speech data.
-
Multiple Observations
Assume K training patterns (observation sequences):
$O = [O^{(1)}, O^{(2)}, \ldots, O^{(K)}]$
where $O^{(k)} = \{o_1^{(k)}, o_2^{(k)}, \ldots, o_{T_k}^{(k)}\}$
and $P_k = P(O^{(k)} \mid \lambda)$
-
Transition Probabilities with Multiple Observations
$\bar{a}_{ij} = \dfrac{\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t=1}^{T_k - 1} \alpha_t^{(k)}(i)\, a_{ij}\, b_j(o_{t+1}^{(k)})\, \beta_{t+1}^{(k)}(j)}{\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t=1}^{T_k - 1} \alpha_t^{(k)}(i)\, \beta_t^{(k)}(i)}$
-
Observation Probabilities with Multiple Observations
$\bar{b}_j(l) = \dfrac{\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t=1,\ \text{s.t. } o_t^{(k)} = v_l}^{T_k - 1} \alpha_t^{(k)}(j)\, \beta_t^{(k)}(j)}{\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t=1}^{T_k - 1} \alpha_t^{(k)}(j)\, \beta_t^{(k)}(j)}$
-
Single State HMM Example
How do we estimate the parameters of a single-state HMM with M component Gaussians?
$b(o_t) = \sum_{k=1}^{M} w_k\, b_k(o_t) = \sum_{k=1}^{M} w_k\, \mathcal{N}(o_t, \mu_k, \Sigma_k)$
$b_k(o_t) = \dfrac{1}{(2\pi)^{d/2}\, |\Sigma_k|^{1/2}} \exp\left( -\tfrac{1}{2} (o_t - \mu_k)'\, \Sigma_k^{-1} (o_t - \mu_k) \right)$
-
Single State HMM Example
Define the probability of the tth observation being generated by the kth mixture component:
$P(k \mid o_t, \lambda) = \dfrac{w_k\, b_k(o_t)}{\sum_{k=1}^{M} w_k\, b_k(o_t)}$
Note that k ($b_k$) here refers to the mixture component, not the HMM state; we are assuming just one HMM state.
-
Single State HMM Update Equations
$\bar{w}_k = \dfrac{1}{T} \sum_{t=1}^{T} P(k \mid o_t, \lambda)$
$\bar{\mu}_k = \dfrac{\sum_{t=1}^{T} P(k \mid o_t, \lambda)\, o_t}{\sum_{t=1}^{T} P(k \mid o_t, \lambda)}$
$\bar{\Sigma}_k = \dfrac{\sum_{t=1}^{T} P(k \mid o_t, \lambda)\, (o_t - \bar{\mu}_k)^2}{\sum_{t=1}^{T} P(k \mid o_t, \lambda)}$
The term $P(k \mid o_t, \lambda)$ is computed using model parameters from the previous algorithm iteration; the barred quantities are the updated (new) parameter estimates.
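A minimal sketch of these updates for the 1-D case (scalar variances; initialization and convergence checks are left out, and all names are mine):

```python
import numpy as np

def gmm_em_step(x, w, mu, var):
    """One EM step for a 1-D, M-component Gaussian mixture (single-state HMM).
    x: (T,) observations; w, mu, var: (M,) current weights, means, variances."""
    x, w, mu, var = (np.asarray(a, dtype=float) for a in (x, w, mu, var))
    # E-step: P(k | o_t, lambda), computed with the previous iteration's parameters.
    lik = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    post = w * lik
    post /= post.sum(axis=1, keepdims=True)   # (T, M) component posteriors
    # M-step: the update equations from this slide.
    occ = post.sum(axis=0)                    # sum_t P(k | o_t, lambda)
    w_new = occ / len(x)
    mu_new = (post * x[:, None]).sum(axis=0) / occ
    var_new = (post * (x[:, None] - mu_new) ** 2).sum(axis=0) / occ
    return w_new, mu_new, var_new
```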
-
Mixture Gaussian (1-D case)
Example: 3 mixture components used to model an underlying random process of 3 Gaussians.
[Figure: plot of the 1-D mixture density b(o) versus o.]
-
Homework #3
You will estimate the parameters of a single-state multivariate HMM given a sequence of observations (O).
Training data (observations) are 13-dimensional MFCCs from 10 speakers.
Test data come from unknown speakers… which speaker most likely produced the sequence?
-
Homework #3
Can compute the probability of generating the observation sequence for each estimated single-state HMM:
Find the model which has the maximum log-probability…
$\log P(O \mid \lambda_s) = \sum_{t=1}^{T} \log b_s(o_t)$
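Under this single-state model, classification reduces to summing per-frame log-densities and picking the best model. A small sketch (the per-speaker log_b callables stand in for whatever densities you estimated; they are an assumption of mine):

```python
def classify(models, frames):
    """models: dict mapping speaker id -> function computing log b_s(o) per frame.
    frames: iterable of observation vectors (e.g., 13-dim MFCCs).
    Returns (best speaker id, dict of total log-probabilities)."""
    totals = {s: float(sum(log_b(o) for o in frames)) for s, log_b in models.items()}
    return max(totals, key=totals.get), totals
```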
-
HMM Assumptions
First-Order Markov Chain: the probability of transitioning to a state depends only on the current state.
Time-Independence of State Transitions: transitioning from state A to state B is independent of time.
Observation Independence: observations don't depend on each other, just on the states that generated them.
(Sometimes) Left-to-Right Topology: as we will see, a left-to-right HMM is generally used to model speech units (phonemes).
-
Important Concepts from Today
Do you understand the basic idea of an HMM?
Could you implement the Viterbi Algorithm if you had to?
Can you estimate the parameters of at least a single-state HMM (mixture Gaussians)?
What assumptions do HMMs impose on the data being modeled?
-
Next Week
How to use HMMs for modeling speech units?
Consider major strategies for acoustic training
How to model context dependencies using HMMs
Some practical notes on HMM training for speech recognition