lecture 9 - lvcsr search - columbia universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf ·...
Lecture 9
LVCSR Search
Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen, Markus Nussbaum-Thom
Watson Group
IBM T.J. Watson Research Center
Yorktown Heights, New York, USA
{picheny,bhuvana,stanchen,nussbaum}@us.ibm.com
23 March 2016
Administrivia
Lab 2 sample answers: /user1/faculty/stanchen/e6870/lab2_ans/
Lab 3 not graded yet.
Lab 4 out today.
Due nine days from now (Friday, Apr. 1) at 6pm?
Lab 5 cancelled.
Visit to IBM Watson Astor Place in 1.5 weeks.
April 1, 11am-1pm.
Feedback
Clear (2); mostly clear (1).
Pace: fast (1).
Muddiest: moving from small to large vocab (1).
No comments with 2+ votes; 6 responses total.
Road Map
Review, Part I
What is x? The feature vector.
What is ω? A word sequence.
What notation do we use for acoustic models? P(x|ω)
What does an acoustic model model? How likely feature vectors are given a word sequence.
What notation do we use for language models? P(ω)
What does a language model model? How frequent each word sequence is.
Review, Part II
What is the fundamental equation of ASR?
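$$
\begin{aligned}
(\text{answer}) &= \arg\max_{\omega \in \text{vocab}^*} (\text{language model}) \times (\text{acoustic model}) \\
&= \arg\max_{\omega \in \text{vocab}^*} (\text{prior prob over words}) \times P(\text{feats} \mid \text{words}) \\
&= \arg\max_{\omega \in \text{vocab}^*} P(\omega)\, P(x \mid \omega)
\end{aligned}
$$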
Match the Lecture With The Topic
Language modeling      Estimate $P(x \mid \omega)$
LVCSR training         $\arg\max_{\omega \in \text{vocab}^*} P(\omega)\, P(x \mid \omega)$
LVCSR search           Estimate $P(\omega)$
Which of these are offline? Online?
Demo: Speed Kills
This Lecture
How to do LVCSR decoding.
How to make it fast.
Part I
Making the Decoding Graph
LVCSR Search a.k.a. Decoding
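$$
\begin{aligned}
(\text{answer}) &= \arg\max_{\omega \in \text{vocab}^*} (\text{language model}) \times (\text{acoustic model}) \\
&= \arg\max_{\omega \in \text{vocab}^*} P(\omega)\, P(x \mid \omega)
\end{aligned}
$$
How to compute the argmax?
Run Viterbi/Forward/Forward-Backward?
One big HMM/one small HMM/lots of small HMM’s?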
The whole ballgame: how to build the HMM!!!
One Big HMM: Small Vocabulary
[Figure: one big HMM that loops over the word HMMs for one, two, three, four, five, six, seven, eight, nine, zero.]
Small⇒ Large Vocabulary
How to build the big HMM for LVCSR?
What’s missing? Are there any scores we need to add?
Idea: Add LM Scores to HMM
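$$
\begin{aligned}
(\text{answer}) &= \arg\max_{\omega \in \text{vocab}^*} (\text{language model}) \times (\text{acoustic model}) \\
&= \arg\max_{\omega \in \text{vocab}^*} P(\omega)\, P(x \mid \omega)
\end{aligned}
$$
Viterbi: without LM.
$$\arg\max_{\omega} P(x \mid \omega) \;\Leftrightarrow\; \max \prod_{t=1}^{T} (\text{arc cost})$$
Viterbi: with LM.
$$\arg\max_{\omega} P(\omega)\, P(x \mid \omega) \;\Leftrightarrow\; \arg\max \prod_{t=1}^{T} (\text{arc cost}) \times (\text{LM score})$$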
Adding in Unigram LM Scores P(wi)
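[Figure: the one big HMM over the word HMMs for one, two, three, four, five, six, seven, eight, nine, zero, with unigram LM scores added.]
What about bigram $P(w_i \mid w_{i-1})$? Trigrams $P(w_i \mid w_{i-2} w_{i-1})$?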
Adding Language Model Scores
Solution: multiple copies of each word HMM!
Old view: add LM scores to word HMM loop.
New view: express LM as HMM. Sub in word HMM’s.
Example: Unigram LM
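Take (H)MM representing language model.
[Figure: unigram LM graph with arcs labeled one, two, three, . . .]
Replace each word with phonetic word HMM.
[Figure: the same graph with the word arcs replaced by HMM_one, HMM_two, HMM_three, . . .]
[Figure: the resulting one big HMM over one, two, three, four, five, six, seven, eight, nine, zero.]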
N-Gram Models as (H)MM’s
Substituting in Word HMM’s
Word: AACHEN
CI phones: AA K AX N
CD phones: AA-|+K K-AA+AX AX-K+N N-AX+|
GMMs: gAA.1,9 gAA.2,2 gK.1,6 gK.2,7 gAX.1,15 gAX.2,3 gN.1,4 gN.2,1
[Figure: HMM whose arcs carry the same GMM sequence.]
Recap: Small vs. Large Vocabulary Decoding
It’s all about building the one big HMM.
Add in LM scores in graph; Viterbi unchanged.
Start from word LM; substitute in word HMM’s.
Where Are We?
1 Introduction to FSA’s, FST’s, and Composition
2 What Can Composition Do?
3 How To Compute Composition
4 Composition and Graph Expansion
5 Weighted FSM’s
Substituting in Word HMM’s
Word: AACHEN
CI phones: AA K AX N
CD phones: AA-|+K K-AA+AX AX-K+N N-AX+|
GMMs: gAA.1,9 gAA.2,2 gK.1,6 gK.2,7 gAX.1,15 gAX.2,3 gN.1,4 gN.2,1
[Figure: HMM whose arcs carry the same GMM sequence.]
What about cross-word dependencies?
e.g., no boundary token; quinphones.
Cross-Word Dependencies
Tricky: single-phone words; context can depend on a word two words away.
Graph Expansion Issues
How to handle context-dependency?
How to "glue in" HMM’s, e.g., word HMM’s into an LM?
How to do graph optimization?
And handle scores/probs.
Is there an elegant framework for all this?
Finite-State Machines!
A way of representing graphs/HMM’s.
e.g., LM’s, one big HMM.
A way of transforming graphs.
e.g., substituting word HMM’s into an LM.
A set of graph operations.
e.g., intersection, determinization, minimization, etc.
Weighted graphs and transformations, too.
Graph Expansion and FSM’s
Design a bunch of “simple” finite-state machines.
Apply standard FSM operations . . .
To compute the one big HMM, and optimize it, too!
How To Represent a Graph/HMM?
Finite-state acceptor (FSA).
Just like HMM with symbolic outputs.
Exactly one initial state; one or more final states.
Arcs can be labeled with ε.
Ignore probabilities for now.
[Figure: example FSA with arcs labeled a, a, ε, c, b.]
What Does an FSA Accept?
An FSA accepts a string i . . .
If path from initial to final state labeled with i.
Does this FSA accept abb? acccbaacc? aca? ε?
Can an FSA accept an infinite number of strings?
[Figure: example FSA with arcs labeled a, a, ε, c, b.]
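The acceptance test above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name `fsa_accepts`, the dictionary layout for arcs, the `EPS` marker, and the toy machine `TOY_ARCS` are all assumptions made for this example (the topology of the FSA drawn on the slide is not recoverable from the transcript).

```python
from collections import deque

EPS = ""  # hypothetical marker for an epsilon (empty) arc label


def fsa_accepts(arcs, initial, finals, string):
    """Return True if some path from `initial` to a final state spells `string`.

    `arcs` maps a state to a list of (label, next_state) pairs.
    """
    start = (initial, 0)  # (current state, number of symbols consumed)
    seen = {start}
    queue = deque([start])
    while queue:
        state, pos = queue.popleft()
        if pos == len(string) and state in finals:
            return True
        for label, nxt in arcs.get(state, []):
            if label == EPS:
                step = (nxt, pos)  # epsilon arc consumes no input
            elif pos < len(string) and string[pos] == label:
                step = (nxt, pos + 1)  # matching arc consumes one symbol
            else:
                continue
            if step not in seen:
                seen.add(step)
                queue.append(step)
    return False


# Hypothetical toy FSA (not the one on the slide): a, then any number of c, then b.
TOY_ARCS = {1: [("a", 2)], 2: [("c", 2), ("b", 3)]}
```

Because state 2 has a self-loop on c, this toy machine accepts ab, acb, accb, and so on, which illustrates how a finite graph can accept an infinite number of strings.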
How To Represent a Graph Transformation?
Finite-state transducer (FST).
Like FSA, except each arc has two symbols.
An input label (possibly ε).
An output label (possibly ε).
Intuition: rewrites input labels as output labels.
[Figure: example FST with arcs labeled a:a, a:ε, ε:b, c:c, b:a.]
What Does an FST Accept?
An FST accepts a string pair (i, o) . . .
If path from initial to final state . . .
Labeled with i on input side and o on output side.
Does this FST accept (acb, ca)? (acb, a)?
[Figure: example FST with arcs labeled a:a, a:ε, ε:b, c:c, b:a.]
How To Apply a Graph Transformation?
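Composition!
Given FSA graph A, e.g.,
[Figure: FSA A with arcs labeled a, b, c.]
And FST transformation T, e.g.,
[Figure: FST T with arcs labeled a:A, b:B, c:C.]
Their composition A ◦ T is an FSA, e.g.,
[Figure: FSA A ◦ T with arcs labeled A, B, C.]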
Composition Intuition
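If A accepts string i, e.g., ab . . .
[Figure: FSA A with arcs labeled a, b, c.]
And T accepts pair (i, o), e.g., (ab, AB) . . .
[Figure: FST T with arcs labeled a:A, b:B, c:C.]
Then A ◦ T accepts string o, e.g., AB.
[Figure: FSA A ◦ T with arcs labeled A, B, C.]
Perspective: trace paths in A and T together.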
Recap
Graphs: FSA’s.
One label on each arc.
Graph transformations: FST’s.
Input and output label on each arc.
Use composition to apply FST to FSA; produces FSA.
Where Are We?
1 Introduction to FSA’s, FST’s, and Composition
2 What Can Composition Do?
3 How To Compute Composition
4 Composition and Graph Expansion
5 Weighted FSM’s
A Simple Class of FST’s
Replacing single symbol with single symbol, everywhere.
[Figure: FST with a single state 1 and self-loop arcs a:A, b:B, c:C, d:D.]
Rewriting Single String A Single Way
A: 1 -a-> 2 -b-> 3 -d-> 4
T: single state 1 with self-loop arcs a:A, b:B, c:C, d:D
A ◦ T: 1 -A-> 2 -B-> 3 -D-> 4
Rewriting Many Strings At Once
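[Figure: FSA A with states 1 to 6 and arcs labeled a, b, c, d; FST T with a single state 1 and self-loop arcs a:A, b:B, c:C, d:D; A ◦ T with states 1 to 6 and arcs labeled A, B, C, D.]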
Rewriting Single String Many Ways
A: 1 -a-> 2 -b-> 3 -a-> 4
T: single state 1 with self-loop arcs a:a, a:A, b:b, b:B
A ◦ T: 1 -(a or A)-> 2 -(b or B)-> 3 -(a or A)-> 4
Rewriting Some Strings Zero Ways
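[Figure: FSA A with states 1 to 6 and arcs labeled a, b, d; FST T with a single state 1 and self-loop arc a:a; A ◦ T with states 1 to 5 and arcs labeled a only.]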
Generalizing Replacement
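Instead of replacing single symbol with single symbol . . .
Can replace arbitrary string with arbitrary string.
e.g., what does FST on right do?
[Figure, left: FST with a single state 1 and self-loop arcs a:A, b:B, c:C, d:D.]
[Figure, right: FST with arcs THE:DH, ε:AH, ε:IY, DOG:D, ε:AO, ε:G.]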
Context-Dependent Replacement
Instead of always replacing symbol with symbol . . .
Only do so in certain context.
e.g., what does this FST do? (Think: bigram model.)
[Figure: FST with states labeled a, b, c and arcs labeled a:a, b:b, c:c, a:A, b:B, c:C.]
Discussion
Transforming a single string to a single string is easy.
e.g., change color to colour everywhere in file.
Composition: rewrites every string accepted by graph.
Things composition can do:
Transform (possibly infinite) set of strings!
Not just 1:1, but 1:many and 1:0 transforms!
Can replace arbitrary strings with arbitrary strings!
Can do context-dependent transforms!
Expresses output compactly, as another graph!
Where Are We?
1 Introduction to FSA’s, FST’s, and Composition
2 What Can Composition Do?
3 How To Compute Composition
4 Composition and Graph Expansion
5 Weighted FSM’s
How To Define Composition?
A ◦ T accepts the string o iff . . .
There exists a string i such that . . .
A accepts i and T accepts (i, o).
[Figure: FSA A with arcs labeled a, b, c; FST T with arcs labeled a:A, b:B, c:C; A ◦ T with arcs labeled A, B, C.]
A Simple Case
A: 1 -a-> 2 -b-> 3
T: 1 -a:A-> 2 -b:B-> 3
A ◦ T: (1,1) -A-> (2,2) -B-> (3,3)
Intuition: trace through A, T simultaneously.
Another Simple Case
A: 1 -a-> 2 -b-> 3 -d-> 4
T: single state 1 with self-loop arcs a:A, b:B, c:C, d:D
A ◦ T: (1,1) -A-> (2,1) -B-> (3,1) -D-> (4,1)
Intuition: trace through A, T simultaneously.
Composition: States
A: 1 -a-> 2 -b-> 3 -d-> 4
T: single state 1 with self-loop arcs a:A, b:B, c:C, d:D
A ◦ T: (1,1) -A-> (2,1) -B-> (3,1) -D-> (4,1)
What is the possible set of states in result?
Cross product of states in inputs, i.e., (s1, s2).
Composition: Arcs
A: 1 -a-> 2 -b-> 3 -d-> 4
T: single state 1 with self-loop arcs a:A, b:B, c:C, d:D
A ◦ T: (1,1) -A-> (2,1) -B-> (3,1) -D-> (4,1)
Create arc from (s1, t1) to (s2, t2) with label o iff . . .
Arc from s1 to s2 in A with label i and . . .
Arc from t1 to t2 in T with input i and output o.
The Composition Algorithm
For every state s ∈ A, t ∈ T, create state (s, t) ∈ A ◦ T.
Create arc from (s1, t1) to (s2, t2) with label o iff . . .
Arc from s1 to s2 in A with label i and . . .
Arc from t1 to t2 in T with input i and output o.
(s, t) is initial iff s and t are initial; similarly for final states.
What is time complexity?
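The recipe above can be written down almost literally. The following is a minimal sketch under stated assumptions: there are no ε labels, machines are plain dictionaries, and the names `compose`, `A`, and `T` as well as the choice of state 3 as the final state are hypothetical choices for this illustration (the two machines mirror the "A Simple Case" slide).

```python
def compose(fsa, fst):
    """Compose an FSA with an FST, assuming no epsilon labels.

    fsa["arcs"] holds (src, label, dst) triples;
    fst["arcs"] holds (src, in_label, out_label, dst) tuples.
    """
    # Every pair (s, t) becomes a state of the result.
    states = {(s, t) for s in fsa["states"] for t in fst["states"]}
    arcs = set()
    # An arc exists iff the FSA label matches the FST input label.
    for s1, label, s2 in fsa["arcs"]:
        for t1, t_in, t_out, t2 in fst["arcs"]:
            if label == t_in:
                arcs.add(((s1, t1), t_out, (s2, t2)))
    return {
        "states": states,
        "initial": (fsa["initial"], fst["initial"]),
        "finals": {(s, t) for s in fsa["finals"] for t in fst["finals"]},
        "arcs": arcs,
    }


# A: 1 -a-> 2 -b-> 3 and T: 1 -a:A-> 2 -b:B-> 3 (final state 3 is assumed).
A = {"states": {1, 2, 3}, "initial": 1, "finals": {3},
     "arcs": [(1, "a", 2), (2, "b", 3)]}
T = {"states": {1, 2, 3}, "initial": 1, "finals": {3},
     "arcs": [(1, "a", "A", 2), (2, "b", "B", 3)]}
RESULT = compose(A, T)
```

In this sketch the work is dominated by the two nested loops, so it does roughly (number of arcs in A) × (number of arcs in T) label comparisons, on top of creating (states in A) × (states in T) state pairs.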
Example
A: 1 -a-> 2 -b-> 3
T: 1 -a:A-> 2 -b:B-> 3
A ◦ T: (1,1) -A-> (2,2) -B-> (3,3)
Remaining state pairs, with no arcs: (1,2), (1,3), (2,1), (2,3), (3,1), (3,2).
Another Example
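[Figure: FSA A with states 1 to 3 and arcs labeled a, a, b, b; FST T with states 1 and 2 and arcs labeled a:A, b:B, a:a, b:b; A ◦ T with states (1,1), (1,2), (2,1), (2,2), (3,1), (3,2) and arcs labeled A, A, B, B, a, a, b, b.]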
Composition and ε-Transitions
Basic idea: can take ε-transition in one FSM . . .
Without moving in other FSM.
Tricky to do exactly right.
Do readings if you care: (Pereira, Riley, 1997)
[Figure: FSA A with states 1 to 3 and arcs labeled ε, A, B; FST T with states 1 to 3 and arcs labeled ε:B, A:A, B:B; A ◦ T drawn over the state pairs (1,1) through (3,3), with arcs labeled A, B, and ε.]
Recap
Composition is easy!
Composition is fast!
Worst case: quadratic in states.
Optimization: only expand reachable state pairs.
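The reachable-pairs optimization can be sketched as a breadth-first search from the initial state pair. This is an illustrative sketch rather than the course's implementation: ε labels are not handled, and the function name `compose_reachable`, the adjacency-list layout, and the variable names are assumptions. The example machines are the chain a b d and the one-state relabeling FST from the earlier slides.

```python
from collections import deque


def compose_reachable(a_arcs, a_init, t_arcs, t_init):
    """Compose an FSA with an FST, creating only reachable state pairs.

    a_arcs: {state: [(label, next_state), ...]}
    t_arcs: {state: [(in_label, out_label, next_state), ...]}
    Epsilon labels are not handled in this sketch.
    """
    start = (a_init, t_init)
    states = {start}
    arcs = []
    queue = deque([start])
    while queue:
        s, t = queue.popleft()
        for label, s2 in a_arcs.get(s, []):
            for t_in, t_out, t2 in t_arcs.get(t, []):
                if t_in != label:
                    continue
                dst = (s2, t2)
                arcs.append(((s, t), t_out, dst))
                if dst not in states:  # first visit: schedule for expansion
                    states.add(dst)
                    queue.append(dst)
    return states, arcs


# A: 1 -a-> 2 -b-> 3 -d-> 4; T: one state with self-loops a:A, b:B, c:C, d:D.
CHAIN = {1: [("a", 2)], 2: [("b", 3)], 3: [("d", 4)]}
UPPER = {1: [("a", "A", 1), ("b", "B", 1), ("c", "C", 1), ("d", "D", 1)]}
STATES, ARCS = compose_reachable(CHAIN, 1, UPPER, 1)
```

On the earlier two-chain example (three states in each machine), the same search would create only (1,1), (2,2), and (3,3) instead of all nine pairs.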
Where Are We?
1 Introduction to FSA’s, FST’s, and Composition
2 What Can Composition Do?
3 How To Compute Composition
4 Composition and Graph Expansion
5 Weighted FSM’s
Building the One Big HMM
Can we do this with composition?
Start with n-gram LM expressed as HMM.
Repeatedly expand to lower-level HMM’s.
A View of Graph Expansion
Design some finite-state machines.
$L$ = language model FSA.
$T_{\text{LM}\to\text{CI}}$ = FST mapping to CI phone sequences.
$T_{\text{CI}\to\text{CD}}$ = FST mapping to CD phone sequences.
$T_{\text{CD}\to\text{GMM}}$ = FST mapping to GMM sequences.
Compute final decoding graph via composition:
$$L \circ T_{\text{LM}\to\text{CI}} \circ T_{\text{CI}\to\text{CD}} \circ T_{\text{CD}\to\text{GMM}}$$
How to design transducers?
Example: Mapping Words To Phones
Pronunciations:
THE    DH AH
THE    DH IY
DOG    D AO G
Word-to-phone-string mappings: THE:DH.AH, THE:DH.IY, DOG:D.AO.G
[Figure: FST with arcs THE:DH, ε:AH, ε:IY, DOG:D, ε:AO, ε:G.]
Example: Mapping Words To Phones
A: FSA accepting THE DOG
T: FST with arcs THE:DH, ε:AH, ε:IY, DOG:D, ε:AO, ε:G
A ◦ T: FSA with arcs DH, then AH or IY, then D, AO, G
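One way to build such a word-to-phone transducer from a pronunciation list is sketched below. It is an illustration under assumptions, not the construction used in the lecture: the function name `build_lexicon_fst`, the `"<eps>"` marker, and the state numbering are hypothetical, and each pronunciation gets its own chain of arcs (the slide's figure appears to share the THE:DH arc between the two pronunciations of THE, which this sketch does not do).

```python
EPS = "<eps>"  # hypothetical marker for an epsilon label


def build_lexicon_fst(pronunciations):
    """Build FST arcs (src, in_label, out_label, dst) from (word, phones) pairs.

    State 0 serves as both the initial and the final state. The word is consumed
    on the first arc of its chain; the remaining phones are emitted on arcs
    whose input label is epsilon.
    """
    arcs = []
    next_state = 1
    for word, phones in pronunciations:
        src = 0
        for k, phone in enumerate(phones):
            is_last = k == len(phones) - 1
            if is_last:
                dst = 0  # return to the shared start/final state
            else:
                dst = next_state
                next_state += 1
            arcs.append((src, word if k == 0 else EPS, phone, dst))
            src = dst
    return arcs


LEXICON = [("THE", ["DH", "AH"]), ("THE", ["DH", "IY"]), ("DOG", ["D", "AO", "G"])]
LEXICON_ARCS = build_lexicon_fst(LEXICON)
```

Composing an FSA that accepts THE DOG with this transducer would yield paths spelling DH AH D AO G and DH IY D AO G, matching the result described on the slide.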
Example: Inserting Optional Silences
A: 1 -C-> 2 -A-> 3 -B-> 4
T: single state 1 with self-loop arcs ε:~SIL, A:A, B:B, C:C
A ◦ T: 1 -C-> 2 -A-> 3 -B-> 4, with a ~SIL self-loop on each of states 1 to 4
Don’t forget identity transformations!
Strings that aren’t accepted are discarded.
Example: Rewriting CI Phones as HMM’s
A: FSA accepting D AO G
[Figure: FST T with arcs D:gD.1, ε:gD.1, ε:gD.2, AO:gAO.1, ε:gAO.1, ε:gAO.2, G:gG.1, ε:gG.1, ε:gG.2, ε:ε.]
[Figure: A ◦ T, an HMM with arcs labeled gD.1, gD.2, gAO.1, gAO.2, gG.1, gG.2.]
Example: Rewriting CI⇒ CD Phones
e.g., L ⇒ L-S+IH
The basic idea: adapt FSA for trigram model.
When take arc, know current trigram ($P(w_i \mid w_{i-2} w_{i-1})$).
Output $w_{i-1}$-$w_{i-2}$+$w_i$!
[Figure: FSA for a trigram model over the two-symbol vocabulary dit, dah.]
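The labeling convention (phone-left+right, with | as the boundary token) can be checked on a single phone string. The sketch below is only that: a direct rewrite of one string, not the FST construction the slide describes, and the function name `to_triphones` and its `boundary` parameter are hypothetical.

```python
def to_triphones(phones, boundary="|"):
    """Rewrite CI phones as CD labels of the form phone-left+right, e.g. L-S+IH.

    Word-internal contexts only: both ends are padded with the boundary token.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[k]}-{padded[k - 1]}+{padded[k + 1]}"
        for k in range(1, len(padded) - 1)
    ]


AACHEN_CD = to_triphones(["AA", "K", "AX", "N"])
```

Applied to AA K AX N, this reproduces the AACHEN expansion shown earlier (AA-|+K K-AA+AX AX-K+N N-AX+|). Handling cross-word context is where the FST formulation becomes necessary, since the right context of a word-final phone depends on the next word.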
How to Express CD Expansion via FST’s?
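[Figure: FSA A with arcs labeled AA, T, D.]
[Figure: FST T, with states labeled by phone pairs (|, | D, | T, D AA, T AA, AA D, AA T, AA |) and arcs AA:D-|+AA, AA:T-|+AA, D:ε, T:ε, AA:D-AA+AA, AA:T-AA+AA, D:AA-D+D, T:AA-D+T, D:AA-T+D, T:AA-T+T, ε:AA-D+|, ε:AA-T+|.]
[Figure: A ◦ T with arcs labeled D-|+AA, T-|+AA, D-AA+AA, T-AA+AA, AA-D+D, AA-D+T, AA-T+D, AA-T+T, AA-D+|, AA-T+|.]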
How to Express CD Expansion via FST’s?
[Figure: FSA A with arcs labeled AA, T, D, next to A ◦ T with arcs labeled D-|+AA, T-|+AA, D-AA+AA, T-AA+AA, AA-D+D, AA-D+T, AA-T+D, AA-T+T, AA-D+|, AA-T+|.]
Point: composition automatically expands FSA . . .
To correctly handle context!
Makes multiple copies of states in original FSA . . .
That can exist in different triphone contexts.
(And makes multiple copies of only these states.)
Example: Rewriting CD Phones as HMM’s
A: FSA accepting D-|+AO AO-D+G G-AO+|
[Figure: FST T with arcs D-|+AO:gD.1,3, ε:gD.1,3, ε:gD.2,7, AO-D+G:gAO.1,5, ε:gAO.1,5, ε:gAO.2,3, G-AO+|:gG.1,8, ε:gG.1,8, ε:gG.2,4, ε:ε.]
[Figure: A ◦ T, an HMM with arcs labeled gD.1,3, gD.2,7, gAO.1,5, gAO.2,3, gG.1,8, gG.2,4.]
Recap: Whew!
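Design some finite-state machines.
$L$ = language model FSA.
$T_{\text{LM}\to\text{CI}}$ = FST mapping to CI phone sequences.
$T_{\text{CI}\to\text{CD}}$ = FST mapping to CD phone sequences.
$T_{\text{CD}\to\text{GMM}}$ = FST mapping to GMM sequences.
Compute final decoding graph via composition:
$$L \circ T_{\text{LM}\to\text{CI}} \circ T_{\text{CI}\to\text{CD}} \circ T_{\text{CD}\to\text{GMM}}$$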
65 / 139
![Page 66: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/66.jpg)
Where Are We?
1 Introduction to FSA’s, FST’s, and Composition
2 What Can Composition Do?
3 How To Compute Composition
4 Composition and Graph Expansion
5 Weighted FSM’s
66 / 139
![Page 67: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/67.jpg)
What About Those Probability Thingies?
e.g., to hold language model probs, transition probs, etc.
FSM’s ⇒ weighted FSM’s.
WFSA’s, WFST’s.
Each arc has score or cost.
So do final states.
[Figure: example WFSA with states 1, 2/1.0, 3/0.4 and arcs a/0.2, a/0.3, ε/0.6, c/0.4, b/1.3.]
67 / 139
![Page 68: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/68.jpg)
What Is A Cost?
HMM’s have probabilities on arcs.
Prob of path is product of arc probs.
1 --a/0.1--> 2 --b/1.0--> 3 --d/0.01--> 4
WFSM’s have negative log probs on arcs.
Cost of path is sum of arc costs plus final cost.
1 --a/1--> 2 --b/0--> 3 --d/2--> 4/0
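The two chains above can be related numerically. This is a minimal sketch, assuming base-10 logs (the base that reproduces the costs 1, 0, 2 from the probabilities 0.1, 1.0, 0.01); the variable names are illustrative only.

```python
import math

# Arc probabilities from the HMM chain: a/0.1, b/1.0, d/0.01.
arc_probs = [0.1, 1.0, 0.01]

# Assumption: costs are negative base-10 log probs, matching a/1, b/0, d/2.
arc_costs = [-math.log10(p) for p in arc_probs]
final_cost = 0.0

path_prob = math.prod(arc_probs)          # product of arc probs
path_cost = sum(arc_costs) + final_cost   # sum of arc costs plus final cost
```

Under this assumption the path cost is just the negative log of the path probability, so minimizing cost and maximizing probability pick the same path.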
68 / 139
![Page 69: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/69.jpg)
What Does a WFSA Accept?
A WFSA accepts a string i with cost c . . .
If path from initial to final state labeled with i and with cost c.
How costs/labels distributed along path doesn’t matter!
Do these accept same strings with same costs?
1 --a/1--> 2 --b/2--> 3/3
1 --a/0--> 2 --b/0--> 3/6
69 / 139
![Page 70: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/70.jpg)
What If Two Paths With Same String?
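How to compute cost for this string?
Use “min” operator to compute combined cost?
Combine paths with same labels.
[Figure: left, WFSA with arcs 1 --a/1--> 2, 1 --a/2--> 2, 2 --b/3--> 3/0, 2 --c/0--> 3/0; right, the same WFSA with the two a arcs combined into 1 --a/1--> 2.]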
Operations (+,min) form a semiring (the tropical semiring).
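As a rough illustration of the (+, min) rule, the sketch below computes the cost a WFSA assigns to a string: costs add along a path, and competing paths with the same labels are combined with min. It assumes no ε arcs; the tuple layout `(src, label, cost, dst)` and the function name are choices made for this example, and the arcs are those of the left-hand figure.

```python
import math

def string_cost(arcs, start, finals, symbols):
    """Min over accepting paths of (sum of arc costs + final cost).

    arcs: list of (src, label, cost, dst); finals: dict state -> final cost.
    Assumes there are no epsilon arcs.
    """
    best = {start: 0.0}                      # cheapest cost to reach each state
    for sym in symbols:
        nxt = {}
        for (src, label, cost, dst) in arcs:
            if label == sym and src in best:
                cand = best[src] + cost      # "+" along a path
                if cand < nxt.get(dst, math.inf):
                    nxt[dst] = cand          # "min" across paths
        best = nxt
    return min((c + finals[s] for s, c in best.items() if s in finals),
               default=math.inf)

arcs = [(1, "a", 1.0, 2), (1, "a", 2.0, 2), (2, "b", 3.0, 3), (2, "c", 0.0, 3)]
finals = {3: 0.0}
cost_ab = string_cost(arcs, 1, finals, ["a", "b"])
cost_ac = string_cost(arcs, 1, finals, ["a", "c"])
```

A string with no accepting path gets cost infinity, which is the identity for min.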
70 / 139
![Page 71: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/71.jpg)
Which Is Different From the Others?
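[Figure: four WFSAs. (1) 1 --a/0--> 2/1. (2) States 1 and 2/0.5, with arcs a/0.5 and a/1. (3) 1 --ε/1--> 2 --a/0--> 3/0. (4) States 1, 2/-2, and 3, with arcs a/3, b/1, b/1.]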
71 / 139
![Page 72: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/72.jpg)
Weighted Composition
A: 1 --a/1--> 2 --b/0--> 3 --d/2--> 4/0
T: single state 1/1 with arcs a:A/2, b:B/1, c:C/0, d:D/0
A ◦ T: 1 --A/3--> 2 --B/1--> 3 --D/2--> 4/1
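The example above can be reproduced with a small composition routine. This is only a sketch under simplifying assumptions: no ε labels on either machine, costs combined by addition, and T taken to be a single state with self-loops (an interpretation of the figure). The tuple layouts and the name `compose` are hypothetical, not a toolkit API.

```python
from collections import deque

def compose(a_arcs, a_start, a_finals, t_arcs, t_start, t_finals):
    """Epsilon-free weighted composition; arc costs and final costs add.

    a_arcs: (src, label, cost, dst); t_arcs: (src, in, out, cost, dst).
    States of the result are pairs (state in A, state in T).
    """
    start = (a_start, t_start)
    arcs, finals = [], {}
    seen, queue = {start}, deque([start])
    while queue:
        qa, qt = queue.popleft()
        if qa in a_finals and qt in t_finals:
            finals[(qa, qt)] = a_finals[qa] + t_finals[qt]
        for (sa, lab, ca, da) in a_arcs:
            if sa != qa:
                continue
            for (st, i, o, ct, dt) in t_arcs:
                if st == qt and i == lab:     # A's label must match T's input
                    dst = (da, dt)
                    arcs.append(((qa, qt), o, ca + ct, dst))
                    if dst not in seen:       # only reachable pairs are built
                        seen.add(dst)
                        queue.append(dst)
    return start, arcs, finals

A = [(1, "a", 1.0, 2), (2, "b", 0.0, 3), (3, "d", 2.0, 4)]
T = [(1, "a", "A", 2.0, 1), (1, "b", "B", 1.0, 1),
     (1, "c", "C", 0.0, 1), (1, "d", "D", 0.0, 1)]
start, out_arcs, out_finals = compose(A, 1, {4: 0.0}, T, 1, {1: 1.0})
```

Only state pairs reachable from the start pair are created, which is why composition expands the graph no more than it has to.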
72 / 139
![Page 73: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/73.jpg)
The Bottom Line
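Place LM, AM log probs in $L$, $T_{\mathrm{LM}\to\mathrm{CI}}$, $T_{\mathrm{CI}\to\mathrm{CD}}$, $T_{\mathrm{CD}\to\mathrm{GMM}}$.
e.g., LM probs, pronunciation probs, transition probs.
Compute decoding graph via weighted composition:
$$L \circ T_{\mathrm{LM}\to\mathrm{CI}} \circ T_{\mathrm{CI}\to\mathrm{CD}} \circ T_{\mathrm{CD}\to\mathrm{GMM}}$$
Then, doing Viterbi decoding on this big HMM . . .
Correctly computes (more or less):
$$\omega^* = \arg\max_{\omega} P(\omega \mid x) = \arg\max_{\omega} P(\omega)\, P(x \mid \omega)$$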
73 / 139
![Page 74: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/74.jpg)
Recap: FST’s and Composition? Awesome!
Operates on all paths in WFSA (or WFST) simultaneously.
Rewrites symbols as other symbols.
Context-dependent rewriting of symbols.
Adds in new scores.
Restricts set of allowed paths (intersection).
Or all of above at once.
74 / 139
![Page 75: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/75.jpg)
Weighted FSM’s and ASR
Graph expansion can be framed . . .
As series of (weighted) composition operations.
Correctly combines scores from multiple WFSM’s.
Building FST’s for each step is pretty straightforward . . .
Except for context-dependent phone expansion.
Handles graph expansion for training, too.
75 / 139
![Page 76: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/76.jpg)
Discussion
Don’t need to write code?!
AT&T FSM toolkit ⇒ OpenFst; lots of others.
Generate FST’s as text files.
1 2 C
2 3 A
3 4 B
4
[Figure: the corresponding FSA, 1 --C--> 2 --A--> 3 --B--> 4.]
WFSM framework is very flexible.
Just design new FST’s!
e.g., CD pronunciations at word or phone level.
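To make the text-file idea concrete, here is a small sketch that reads the four-line example into arcs and final states. It assumes the layout suggested by the slide: arc lines of the form `src dst label` and a lone state number marking a final state. Real toolkit formats also allow output labels and weights, which this hypothetical `parse_fsa` helper ignores.

```python
def parse_fsa(text):
    """Parse an unweighted acceptor in 'src dst label' text form.

    A line holding a single state number marks that state as final.
    """
    arcs, finals = [], set()
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) == 1:
            finals.add(int(fields[0]))
        elif len(fields) == 3:
            src, dst, label = fields
            arcs.append((int(src), label, int(dst)))
        else:
            raise ValueError(f"unsupported line: {line!r}")
    return arcs, finals

example = """\
1 2 C
2 3 A
3 4 B
4
"""
fsa_arcs, fsa_finals = parse_fsa(example)
```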
76 / 139
![Page 77: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/77.jpg)
Part II
Making Decoding Fast
77 / 139
![Page 78: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/78.jpg)
How Big? How Fast?
Time to look at efficiency.
How big is the one big HMM?
How long will Viterbi take?
78 / 139
![Page 79: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/79.jpg)
Pop Quiz
How many states in HMM representing trigram model . . .
With vocabulary size |V|?
How many arcs?
[Figure: trigram model graph over the two words dit and dah; every arc is labeled dit or dah.]
79 / 139
![Page 80: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/80.jpg)
Issue: How Big The Graph?
Trigram model (e.g., vocabulary size |V| = 2)
[Figure: trigram model graph over the two words dit and dah; every arc is labeled dit or dah.]
$|V|^3$ word arcs in FSA representation.
Words are ∼4 phones = 12 states on average (CI).
If $|V| = 50000$, $50000^3 \times 12 \approx 10^{15}$ states in graph.
PC’s have ∼$10^{10}$ bytes of memory.
80 / 139
![Page 81: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/81.jpg)
Issue: How Slow Decoding?
In each frame, loop through every state in graph.
If 100 frames/sec, $10^{15}$ states . . .
How many cells to compute per second?
A core can do ∼$10^{11}$ floating-point ops per second.
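The back-of-the-envelope numbers can be worked through directly. In this sketch the figures come from the slides, except for the assumption that each cell costs a single floating-point operation, which is optimistic and is only there to give a lower bound on the slowdown.

```python
V = 50_000                 # vocabulary size
states_per_word = 12       # ~4 phones x 3 states (CI)
frames_per_sec = 100
core_flops = 1e11          # rough per-core throughput from the slide

word_arcs = V ** 3                         # naive trigram FSA
states = word_arcs * states_per_word       # states in the expanded graph
cells_per_sec = states * frames_per_sec    # Viterbi cells per second of audio

# Assumption: one floating-point op per cell (a lower bound on real cost).
times_real_time = cells_per_sec / core_flops
```

Under these assumptions the naive graph has about 1.5 × 10^15 states, and full Viterbi would run on the order of a million times slower than real time on one core.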
81 / 139
![Page 82: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/82.jpg)
Recap
Naive graph expansion is way too big; Viterbi way too slow.
Shrinking the graph also makes things faster!
How to shrink the one big HMM?
82 / 139
![Page 83: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/83.jpg)
Where Are We?
1 Shrinking the Language Model
2 Graph Optimization
3 Pruning
4 Other Viterbi Optimizations
5 Other Decoding Paradigms
83 / 139
![Page 84: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/84.jpg)
Compactly Representing N-Gram Models
One big HMM size ∝ LM HMM size.
Trigram model: $|V|^3$ arcs in naive representation.
[Figure: trigram model graph over the two words dit and dah; every arc is labeled dit or dah.]
Small fraction of all trigrams occur in training data.
Is it possible to keep arcs only for seen trigrams?
84 / 139
![Page 85: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/85.jpg)
Compactly Representing N-Gram Models
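Can express smoothed n-gram models . . .
Via backoff distributions.
$$P_{\text{smooth}}(w_i \mid w_{i-1}) = \begin{cases} P_{\text{primary}}(w_i \mid w_{i-1}) & \text{if } \mathrm{count}(w_{i-1} w_i) > 0 \\ \alpha_{w_{i-1}}\, P_{\text{smooth}}(w_i) & \text{otherwise} \end{cases}$$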
Idea: avoid arcs for unseen trigrams via backoff states.
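The backoff formula can be sketched for the bigram case as follows. The probabilities are made-up toy values, "seen" bigrams are simply the keys of the `bigram` dict, and the backoff weights are set so each conditional distribution sums to one (one common normalization, assumed here rather than taken from the slides).

```python
def backoff_weights(bigram, unigram):
    """alpha[h] chosen so that P_smooth(. | h) sums to 1.

    Assumes every history h leaves at least one word unseen.
    """
    alpha = {}
    for h in unigram:
        seen = [w for (p, w) in bigram if p == h]
        left_over = 1.0 - sum(bigram[(h, w)] for w in seen)
        alpha[h] = left_over / (1.0 - sum(unigram[w] for w in seen))
    return alpha

def p_smooth(bigram, unigram, alpha, prev, word):
    """Explicit bigram arc if seen; otherwise backoff arc, then unigram arc."""
    if (prev, word) in bigram:
        return bigram[(prev, word)]
    return alpha[prev] * unigram[word]

# Hypothetical toy model over three words.
unigram = {"one": 0.5, "two": 0.3, "three": 0.2}
bigram = {("one", "two"): 0.6, ("two", "three"): 0.7, ("two", "one"): 0.2}
alpha = backoff_weights(bigram, unigram)

# Arcs in the backoff graph: seen bigrams + one backoff arc per history
# + one unigram arc per word.
n_arcs = len(bigram) + len(alpha) + len(unigram)
```

The arc count grows with the number of seen n-grams rather than with $|V|^n$, which is the point of the backoff-state construction.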
85 / 139
![Page 86: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/86.jpg)
Compactly Representing N-Gram Models
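$$P_{\text{smooth}}(w_i \mid w_{i-1}) = \begin{cases} P_{\text{primary}}(w_i \mid w_{i-1}) & \text{if } \mathrm{count}(w_{i-1} w_i) > 0 \\ \alpha_{w_{i-1}}\, P_{\text{smooth}}(w_i) & \text{otherwise} \end{cases}$$
[Figure: bigram backoff WFSA over the words one, two, three. States: one, two, three, and a backoff state ε. Bigram arcs w/P(w|h) for each pair, e.g., three/P(three|two), one/P(one|one), two/P(two|three); backoff arcs ε/α(one), ε/α(two), ε/α(three); unigram arcs one/P(one), two/P(two), three/P(three).]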
86 / 139
![Page 87: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/87.jpg)
Problem Solved!?
Is this FSA deterministic?
i.e., are there multiple paths with same label sequence?
Is this method exact?
Does Viterbi ever use the wrong probability?
87 / 139
![Page 88: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/88.jpg)
Can We Make the LM Even Smaller?
Sure, just remove some more arcs. Which?
Count cutoffs.
e.g., remove all arcs corresponding to n-grams . . .
Occurring fewer than k times in training data.
Likelihood/entropy-based pruning (Stolcke, 1998).
Choose those arcs which when removed, . . .
Change likelihood of training data the least.
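A count cutoff is easy to sketch. The example below only selects which n-grams keep their arcs; it does not redistribute the removed probability mass to the backoff distribution, which a real LM tool would have to do. The helper names and the toy word sequence are illustrative.

```python
from collections import Counter

def ngram_counts(words, n):
    """Count all n-grams in a word sequence."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def count_cutoff(counts, k):
    """Keep only n-grams occurring at least k times."""
    return {ng: c for ng, c in counts.items() if c >= k}

words = "dit dah dit dah dit dit".split()
trigrams = ngram_counts(words, 3)
kept = count_cutoff(trigrams, 2)
```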
88 / 139
![Page 89: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/89.jpg)
Discussion
Only need to keep seen n-grams in LM graph.
Exact representation blows up graph several times.
Can further prune LM to arbitrary size.
e.g., for BN 4-gram model, 100MW training data . . .
Pruning by factor of 50 ⇒ +1% absolute WER.
Graph small enough now?
Let’s keep on going; smaller ⇒ faster!
89 / 139
![Page 90: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/90.jpg)
Where Are We?
1 Shrinking the Language Model
2 Graph Optimization
3 Pruning
4 Other Viterbi Optimizations
5 Other Decoding Paradigms
90 / 139
![Page 91: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/91.jpg)
Graph Optimization
Can we modify topology of graph . . .
Such that it’s smaller (fewer arcs or states) . . .
Yet accepts same strings (with same costs)?
(OK to move labels and costs along paths.)
91 / 139
![Page 92: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/92.jpg)
Graph Compaction
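Consider word graph for isolated word recognition.
Expanded to phone level: 39 states, 38 arcs.
[Figure: phone-level graph with one separate path per pronunciation. Arcs carry phone labels (AX, AE, AA, B, R, S, Z, Y, UW, AO, ER, DD) followed by a word label at the end of each path: ABROAD, ABSURD (twice), ABUSE (twice), ABU (twice).]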
92 / 139
![Page 93: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/93.jpg)
Determinization
Share common prefixes: 29 states, 28 arcs.
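[Figure: determinized graph. Paths now share the initial phones AX, AE, AA and the following B before branching; word labels at the path ends are ABROAD, ABUSE (twice), ABSURD (twice), ABU (twice).]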
93 / 139
![Page 94: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/94.jpg)
Minimization
Share common suffixes: 18 states, 23 arcs.
[Figure: minimized graph. Common suffixes are now shared as well, leaving a single arc for each of the word labels ABROAD, ABUSE, ABSURD, ABU.]
Does this accept same strings as original graph?
Original: 39 states, 38 arcs.
94 / 139
![Page 95: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/95.jpg)
What Is A Deterministic FSM?
Same as being nonhidden for HMM.
No two arcs exiting same state with same input label.
No ε arcs.
i.e., for any input label sequence . . .
Only one state reachable from start state.
[Figure: small example FSMs with arcs labeled A, B, and <epsilon>.]
95 / 139
![Page 96: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/96.jpg)
Determinization: A Simple Case
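[Figure: left, FSA with states 1 to 4, two a arcs leaving state 1 (to states 2 and 3), and b into state 4; right, deterministic FSA 1 --a--> 2,3 --b--> 4.]
Does this accept same strings?
States on right ⇔ state sets on left!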
96 / 139
![Page 97: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/97.jpg)
A Less Simple Case
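[Figure: left, FSA with states 1 to 5 and arcs labeled <epsilon>, a, a, b, b; right, deterministic FSA with states 1,2 and 3,4 and 4,5, with an a arc from 1,2 to 3,4 and b arcs after that.]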
Does this accept same strings? (ab∗)
97 / 139
![Page 98: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/98.jpg)
Determinization
Start from start state.
Keep list of state sets not yet expanded.
For each, compute outgoing arcs in logical way . . .
Creating new state sets as needed.
Must follow ε arcs when computing state sets.
[Figure: left, FSA with states 1 to 5 and arcs labeled A, A, <epsilon>, B, B; right, deterministic FSA 1 --A--> 2,3,5 --B--> 4.]
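The procedure above is the classic subset construction. The following is a minimal unweighted sketch; the arc list for the example is a guess at the left-hand figure (which arc goes where is assumed), chosen so that it yields the state sets shown on the right.

```python
from collections import deque

EPS = "<epsilon>"

def eps_closure(states, arcs):
    """All states reachable from `states` by following epsilon arcs."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for (src, lab, dst) in arcs:
            if src == s and lab == EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return frozenset(closure)

def determinize(arcs, start, finals):
    """Subset construction; output states are frozensets of input states."""
    d_start = eps_closure({start}, arcs)
    d_arcs, d_finals = {}, set()
    queue, seen = deque([d_start]), {d_start}   # state sets not yet expanded
    while queue:
        S = queue.popleft()
        if S & finals:
            d_finals.add(S)
        by_label = {}
        for (src, lab, dst) in arcs:
            if src in S and lab != EPS:
                by_label.setdefault(lab, set()).add(dst)
        for lab, dsts in by_label.items():
            T = eps_closure(dsts, arcs)
            d_arcs[(S, lab)] = T
            if T not in seen:                   # create new state set as needed
                seen.add(T)
                queue.append(T)
    return d_start, d_arcs, d_finals

# Assumed reading of the left-hand figure.
nfa = [(1, "A", 2), (1, "A", 3), (3, EPS, 5), (2, "B", 4), (5, "B", 4)]
d_start, d_arcs, d_finals = determinize(nfa, 1, {4})
```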
98 / 139
![Page 99: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/99.jpg)
Example 2
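[Figure: left, nondeterministic FSA with states 1 to 5 and arcs labeled a and b; right, deterministic FSA with states 1 and 2,3 and 2,3,4,5 and 4,5, connected by a and b arcs.]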
99 / 139
![Page 100: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/100.jpg)
Example 3
[Figure: the 39-state phone-level word graph with states numbered 1 to 39. Arcs out of state 1 are AX (to 2, 7, 8), AE (to 3, 4, 5), and AA (to 6); B arcs lead to states 9 to 15; the word labels ABROAD, ABSURD, ABSURD, ABUSE, ABUSE lead to states 35 to 39.]
100 / 139
![Page 101: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/101.jpg)
Example 3, Continued
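[Figure: determinized graph. State 1 has arcs AX to state set 2,7,8; AE to 3,4,5; AA to 6. These have B arcs to state sets 9,14,15 and 10,11,12 and 13 respectively, after which the paths branch as in the determinized graph shown earlier.]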
101 / 139
![Page 102: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/102.jpg)
Pop Quiz: Determinization
For FSA with s states, . . .
What is max number of states when determinized?
i.e., how many possible unique state sets?
Are all unweighted FSA’s determinizable?
i.e., does algorithm always terminate . . .
To produce equivalent deterministic FSA?
102 / 139
![Page 103: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/103.jpg)
Minimization
What should we minimize?
The number of states!
103 / 139
![Page 104: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/104.jpg)
Minimization Basics
Algorithm only correct for deterministic FSM’s.
Output FSM is also deterministic.
Basic idea: suffix sharing.
Can merge two states if have same “suffix”.
104 / 139
![Page 105: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/105.jpg)
Minimization: A Simple Case
[Figure: left, deterministic FSA with states 1 to 9 and arcs labeled a and b; right, minimized FSA with states 1 and 2,6 and 3,5,7,9 and 4,8.]
Does this accept same strings?
States on right ⇔ state sets on left! Partition!
105 / 139
![Page 106: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/106.jpg)
Minimization: Acyclic Graphs
Merge states with same following strings (follow sets).
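[Figure: left, acyclic FSA with states 1 to 8 and arcs labeled A, B, B, C, D, C, D; right, minimized FSA 1 --A--> 2 --B--> 3,6 with 1 --B--> 3,6, and C and D arcs from 3,6 to 4,5,7,8.]

states      following strings
1           ABC, ABD, BC, BD
2           BC, BD
3, 6        C, D
4, 5, 7, 8  ε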
106 / 139
![Page 107: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/107.jpg)
General Minimization: The Basic Idea
Given deterministic FSM . . .
Start with all states in single partition.
Whenever states within partition . . .
Have “different” outgoing arcs or finality . . .
Split partition.
At end, each partition corresponds to state in output FSM.
Make arcs in logical manner.
[Figure: left, deterministic FSA with states 1 to 9 and arcs labeled a and b; right, minimized FSA with states 1 and 2,6 and 3,5,7,9 and 4,8.]
107 / 139
![Page 108: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/108.jpg)
Minimization
Invariant: if two states are in different partitions . . .
They have different follow sets.
First split: final and non-final states.
Final states have ε in their follow sets.
Two states in same partition have different follow sets if . . .
Different number of outgoing arcs or arc labels . . .
Or arcs go to different partitions.
[Figure: left, deterministic FSA with states 1 to 9 and arcs labeled a and b; right, minimized FSA with states 1 and 2,6 and 3,5,7,9 and 4,8.]
108 / 139
![Page 109: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/109.jpg)
Minimization
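[Figure: top, deterministic FSA with states 1 to 6 and arcs labeled a, d, c, b, c, c, b; bottom, minimized FSA with states 1 and 2,5 and 4 and 3,6.]

action      evidence    partitioning
(start)                 {1,2,3,4,5,6}
split 3,6   final       {1,2,4,5}, {3,6}
split 1     has a arc   {1}, {2,4,5}, {3,6}
split 4     no b arc    {1}, {4}, {2,5}, {3,6}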
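The splitting procedure traced above can be sketched as repeated partition refinement. This is a simple (not the most efficient) version for unweighted deterministic FSA's; the arc table is a reconstruction of the figure that is consistent with the trace, so treat the exact arcs as an assumption.

```python
def minimize_partition(states, arcs, finals):
    """Refine a partition of states until no block needs splitting.

    arcs: dict (state, label) -> next state (deterministic FSA).
    Returns the final partition as a set of frozensets.
    """
    # First split: final vs. non-final states.
    block = {s: int(s in finals) for s in states}
    while True:
        # Two states stay together only if they are in the same block and
        # their outgoing arcs have the same labels leading to the same blocks.
        signature = {
            s: (block[s],
                tuple(sorted((lab, block[dst])
                             for (src, lab), dst in arcs.items() if src == s)))
            for s in states
        }
        ids = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(ids) == len(set(block.values())):
            break                      # no block was split; done
        block = refined
    groups = {}
    for s, b in block.items():
        groups.setdefault(b, set()).add(s)
    return {frozenset(g) for g in groups.values()}

# Assumed arcs for the six-state example.
arcs = {(1, "a"): 2, (1, "d"): 4, (2, "b"): 3, (2, "c"): 5,
        (4, "c"): 5, (5, "c"): 5, (5, "b"): 6}
partition = minimize_partition([1, 2, 3, 4, 5, 6], arcs, finals={3, 6})
```

Each block of the returned partition becomes one state of the minimized FSA.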
109 / 139
![Page 110: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/110.jpg)
Discussion
Determinization.
May reduce or increase number of states.
Improves behavior of search ⇒ prefix sharing!
Minimization.
Minimizes states, not arcs, for deterministic FSM’s.
Does minimization always terminate? How long?
Weighted algorithms exist for both FSA’s, FST’s.
Available in FSM toolkits.
Weighted minimization requires push operation.
Normalizes locations of costs/labels along paths . . .
So arcs that can be merged have same cost/label.
110 / 139
![Page 111: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/111.jpg)
Weighted Graph Expansion, Optimized
Final graph: $\min(\det(L \circ T_{\mathrm{LM}\to\mathrm{CI}} \circ T_{\mathrm{CI}\to\mathrm{CD}} \circ T_{\mathrm{CD}\to\mathrm{GMM}}))$
$L$ = pruned, backoff language model FSA.
$T_{\mathrm{LM}\to\mathrm{CI}}$ = FST mapping to CI phone sequences.
$T_{\mathrm{CI}\to\mathrm{CD}}$ = FST mapping to CD phone sequences.
$T_{\mathrm{CD}\to\mathrm{GMM}}$ = FST mapping to GMM sequences.
Build big graph; minimize at end?
Problem: can’t hold big graph in memory.
Many existing recipes for graph expansion.
$10^{15}$+ states ⇒ 20–50M states/arcs.
5–10M n-grams kept in LM.
111 / 139
![Page 112: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/112.jpg)
Where Are We?
1 Shrinking the Language Model
2 Graph Optimization
3 Pruning
4 Other Viterbi Optimizations
5 Other Decoding Paradigms
112 / 139
![Page 113: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/113.jpg)
Real-Time Decoding
Why is this desirable?
Decoding time for Viterbi algorithm; 10M states in graph.
100 frames/sec × 10M states × . . .
100 cycles/state ⇒ $10^{11}$ cycles/sec.
PC’s do ∼$10^{9}$ cycles/second (e.g., 3GHz Xeon).
Cannot afford to evaluate each state at each frame.
Need to optimize Viterbi algorithm!
113 / 139
![Page 114: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/114.jpg)
Pruning
At each frame, only evaluate cells with highest scores.
Given active states/cells from last frame . . .
Only examine states/cells in current frame . . .
Reachable from active states in last frame.
Keep best to get active states in current frame.
114 / 139
![Page 115: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/115.jpg)
Don’t Throw Out the Baby
When not considering every state at each frame . . .
Can make search errors.
$$\omega^* = \arg\max_{\omega} P(\omega \mid x) = \arg\max_{\omega} P(\omega)\, P(x \mid \omega)$$
The goal of search:
Minimize computation and search errors.
115 / 139
![Page 116: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/116.jpg)
How Many Active States To Keep?
Goal: Prune paths with no chance of becoming best path.
Beam pruning.
Keep only states with log probs within fixed distance . . .
Of best log prob at that frame.
Rank or histogram pruning.
Keep only k highest scoring states.
When are these good? Bad? Can get best of both?
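The two pruning rules are short enough to write down. This is a sketch over a plain dict of state log probs for one frame (higher is better); the function names and scores are made up, and combining the rules by applying beam pruning first and rank pruning second is one possible answer to "best of both", not necessarily the one intended in the lecture.

```python
import heapq

def beam_prune(scores, beam):
    """Keep states whose log prob is within `beam` of the best one."""
    best = max(scores.values())
    return {s: v for s, v in scores.items() if v >= best - beam}

def rank_prune(scores, k):
    """Keep only the k highest-scoring states."""
    return dict(heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]))

def prune(scores, beam, k):
    """Beam pruning, then a hard cap of k active states."""
    return rank_prune(beam_prune(scores, beam), k)

frame_scores = {"s1": -10.0, "s2": -12.0, "s3": -30.0, "s4": -11.0}
after_beam = beam_prune(frame_scores, beam=5.0)
after_rank = rank_prune(frame_scores, k=2)
after_both = prune(frame_scores, beam=5.0, k=2)
```

Beam pruning adapts the number of active states to how confident the scores are, while rank pruning bounds the worst-case work per frame.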
116 / 139
![Page 117: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/117.jpg)
Pruning Visualized
Active states are small fraction of total states (<1%).
Tend to be localized in small regions in graph.
[Figure: the determinized word graph for ABROAD, ABUSE, ABSURD, ABU, illustrating the active states.]
117 / 139
![Page 118: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/118.jpg)
Pruning and Determinization
Most uncertainty occurs at word starts.
Determinization drastically reduces branching here.
[Figure: the original 39-state word graph for ABROAD, ABSURD, ABUSE, ABU, with one separate path per pronunciation branching from the start state.]
118 / 139
![Page 119: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/119.jpg)
Language Model Lookahead
In practice, put word labels at word ends. (Why?)
What’s wrong with this picture? (Hint: think beam pruning.)
[Figure: determinized word graph with cost 0 on every phone arc (AX/0, AE/0, AA/0, B/0, . . .) and LM costs only on the word-end arcs: ABU/7, ABROAD/4.3, ABUSE/3.5, ABSURD/4.7.]
119 / 139
![Page 120: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/120.jpg)
Language Model Lookahead
Move LM scores as far ahead as possible.
At each point, total cost ⇔ min LM cost of following words.
push operation does this.
[Figure: the same graph after pushing: AX/3.5, AE/4.7, AA/7.0, R/0.8, UW/2.3; all other arcs, including the word-end arcs ABU, ABROAD, ABUSE, ABSURD, now have cost 0.]
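The effect of pushing can be sketched on a small acyclic graph. The version below computes the min cost-to-go of each state and moves cost toward the start, leaving the residual on the arcs out of the start state (which is how the figure shows it, rather than as a separate initial cost). The graph is a simplified, hypothetical fragment with shortened phone sequences that reuses the word costs from the slide; final costs are assumed to be zero.

```python
def push_costs(arcs, start, finals):
    """Push costs toward the start of an acyclic WFSA.

    arcs: list of (src, label, cost, dst). Final costs are assumed zero.
    """
    to_go = {}

    def cost_to_go(s):
        # Min cost of any path from s to a final state.
        if s not in to_go:
            options = [c + cost_to_go(t) for (u, _, c, t) in arcs if u == s]
            if s in finals:
                options.append(0.0)
            to_go[s] = min(options) if options else float("inf")
        return to_go[s]

    pushed = []
    for (u, label, c, t) in arcs:
        # Arcs leaving the start keep the full lookahead cost.
        offset = 0.0 if u == start else cost_to_go(u)
        pushed.append((u, label, c + cost_to_go(t) - offset, t))
    return pushed

graph = [
    (0, "AX", 0.0, 1), (1, "B", 0.0, 2),
    (2, "R", 0.0, 3), (3, "ABROAD", 4.3, 4),
    (2, "Y", 0.0, 5), (5, "ABUSE", 3.5, 6),
    (0, "AE", 0.0, 7), (7, "B", 0.0, 8),
    (8, "S", 0.0, 9), (9, "ABSURD", 4.7, 10),
    (8, "Y", 0.0, 11), (11, "ABU", 7.0, 12),
]
pushed = push_costs(graph, start=0, finals={4, 6, 10, 12})
pushed_cost = {(u, label): c for (u, label, c, t) in pushed}
```

Every complete path keeps its total cost; only the position of the cost along the path changes, so beam pruning sees the LM score at the first arc instead of the last.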
120 / 139
![Page 121: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/121.jpg)
Where Are We?
1 Shrinking the Language Model
2 Graph Optimization
3 Pruning
4 Other Viterbi Optimizations
5 Other Decoding Paradigms
121 / 139
![Page 122: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/122.jpg)
Saving Memory
Naive Viterbi implementation: store whole DP chart.
If 10M-state decoding graph:
10 second utterance ⇒ 1000 frames.
1000 frames × 10M states = 10 billion cells.
Each cell holds:
Viterbi log prob; backtrace pointer.
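To put a rough size on the full chart, the sketch below multiplies out the numbers. The cell size is an assumption (a 4-byte log prob plus a 4-byte backtrace pointer); the slides do not give one.

```python
frames = 10 * 100            # 10-second utterance at 100 frames/sec
states = 10_000_000          # 10M-state decoding graph
cells = frames * states      # cells in the full DP chart

# Assumption: 4-byte float log prob + 4-byte backtrace pointer per cell.
bytes_per_cell = 4 + 4
chart_gb = cells * bytes_per_cell / 1e9
```

Under that assumption the chart for a single 10-second utterance would need about 80 GB.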
122 / 139
![Page 123: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/123.jpg)
Forgetting the Past
To compute cells at frame t . . .
Only need cells at frame t − 1!
Only reason need to keep cells from past . . .
Is for backtracing, to recover word sequence.
Can we store backtracing information another way?
123 / 139
![Page 124: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/124.jpg)
Compressing Backtraces
Only need to remember graph! (Can forget gray stuff.)
How to make this graph smaller?
124 / 139
![Page 125: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/125.jpg)
Determinization!
[Figure: word-level FSA with states 1 to 6 and arcs labeled six, five, oh, two, four.]
In each cell, just remember node in FSA!
125 / 139
![Page 126: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/126.jpg)
Token Passing
[Figure: FSA with nodes 1–6 and arcs labeled six, five, oh, two, four.]
126 / 139
![Page 127: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/127.jpg)
Token Passing
Maintain “word tree”:
Node represents word sequence from start state.
Backtrace pointer points to node in tree . . .
Holding word sequence labeling best path to cell.
Set backtrace to same node as at best last state . . .
Unless cross word boundary.
[Figure: word tree with nodes 1–11 and arcs labeled THE, THIS, THUD, DIG, DOG, DOG, ATE, EIGHT, MAY, MY.]
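A minimal sketch of token passing with a word tree follows. Everything here is illustrative: the class name `WordTree`, the function `decode`, and the arc format `(src, dst, phone, word, log_weight)` are assumptions for the example, and no pruning or cleanup of unused tree nodes is attempted. Each cell holds a score and a tree node; the node is copied from the best predecessor unless the arc carries a word label, in which case the tree is extended.

```python
import math


class WordTree:
    """Prefix tree of word sequences; node 0 is the empty sequence."""

    def __init__(self):
        self.parent = [None]
        self.word = [None]
        self.children = [{}]

    def extend(self, node, word):
        """Return the child of `node` reached by `word`, creating it if needed."""
        kids = self.children[node]
        if word not in kids:
            kids[word] = len(self.parent)
            self.parent.append(node)
            self.word.append(word)
            self.children.append({})
        return kids[word]

    def words(self, node):
        """Read the word sequence by walking back to the root."""
        out = []
        while self.parent[node] is not None:
            out.append(self.word[node])
            node = self.parent[node]
        return out[::-1]


def decode(arcs, num_states, start, final, frame_scores):
    """Two-column Viterbi where each cell is (log prob, word-tree node)."""
    tree = WordTree()
    neg_inf = -math.inf
    prev = [(neg_inf, 0)] * num_states
    prev[start] = (0.0, 0)
    for scores in frame_scores:
        cur = [(neg_inf, 0)] * num_states
        for src, dst, phone, word, weight in arcs:
            score, node = prev[src]
            if score == neg_inf:
                continue
            candidate = score + weight + scores[phone]
            if candidate > cur[dst][0]:
                if word is not None:  # crossing a word boundary
                    node = tree.extend(node, word)
                cur[dst] = (candidate, node)
        prev = cur
    score, node = prev[final]
    return score, tree.words(node)


toy_arcs = [
    (0, 1, "dh", "THE", 0.0),
    (1, 2, "ao", "DOG", 0.0),
    (1, 3, "ih", "DIG", 0.0),
    (2, 4, "g", None, 0.0),
    (3, 4, "g", None, 0.0),
]
toy_frames = [
    {"dh": -1.0, "ao": -5.0, "ih": -5.0, "g": -5.0},
    {"dh": -5.0, "ao": -1.0, "ih": -2.0, "g": -5.0},
    {"dh": -5.0, "ao": -5.0, "ih": -5.0, "g": -1.0},
]
print(decode(toy_arcs, 5, 0, 4, toy_frames))
```

Hypotheses that share a word prefix share tree nodes, so the backtrace structure grows with the number of distinct word sequences explored rather than with frames × states.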
127 / 139
![Page 128: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/128.jpg)
Recap: Efficient Viterbi Decoding
The essence: one big HMM and Viterbi.
Graph optimization crucial, but not enough by itself.
Pruning is key for speed.
Determinization and LM lookahead help pruning a ton.
Can process ∼10000 states/frame in <1× RT on PC.
Can process ∼1% of cells for 10M-state graph . . .
And make very few search errors.
Depending on application and resources . . .
May run faster or slower than 1× RT (desktop).
Memory usage.
The biggie: decoding graph (shared memory).
128 / 139
![Page 129: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/129.jpg)
Where Are We?
1 Shrinking the Language Model
2 Graph Optimization
3 Pruning
4 Other Viterbi Optimizations
5 Other Decoding Paradigms
129 / 139
![Page 130: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/130.jpg)
My Language Model Is Too Small
What we’ve described: static graph expansion.
To make decoding graph tractable . . .
Use heavily-pruned language model.
Another approach: dynamic graph expansion.
Don’t store whole graph in memory.
Build parts of graph with active states on the fly.
[Figure: word graph over the digits (one, two, three, four, five, six, seven, eight, nine, zero) and a partially expanded graph with arcs labeled ε, ε:AH, ε:IY, THE:DH, DOG:D, ε:G, ε:AO.]
130 / 139
![Page 131: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/131.jpg)
Dynamic Graph Expansion: The Basic Idea
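Express graph as composition of two smaller graphs.
Composition is associative.

$$
\begin{aligned}
G_{\text{decode}} &= L \circ T_{\text{LM}\to\text{CI}} \circ T_{\text{CI}\to\text{CD}} \circ T_{\text{CD}\to\text{GMM}} \\
&= L \circ \left( T_{\text{LM}\to\text{CI}} \circ T_{\text{CI}\to\text{CD}} \circ T_{\text{CD}\to\text{GMM}} \right)
\end{aligned}
$$

Can do on-the-fly composition.
States in result correspond to state pairs $(s_1, s_2)$.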
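On-the-fly composition can be sketched as a wrapper that builds the arcs of a pair state only when the decoder asks for them. This is a simplified illustration under stated assumptions: the classes `Fst` and `LazyCompose` are hypothetical, the transducers are epsilon-free (real decoding graphs are not), and weights are log probs that add along a path.

```python
class Fst:
    """Tiny epsilon-free transducer; arcs are (src, ilabel, olabel, weight, dst)."""

    def __init__(self, start, arcs):
        self.start = start
        self.out = {}
        for src, ilabel, olabel, weight, dst in arcs:
            self.out.setdefault(src, []).append((ilabel, olabel, weight, dst))


class LazyCompose:
    """Composition a ∘ b whose states are pairs (s1, s2), expanded on demand."""

    def __init__(self, a, b):
        self.a = a
        self.b = b
        self.start = (a.start, b.start)
        self.cache = {}  # pair state -> list of outgoing arcs built so far

    def arcs(self, state):
        if state not in self.cache:
            s1, s2 = state
            result = []
            for i1, o1, w1, d1 in self.a.out.get(s1, ()):
                for i2, o2, w2, d2 in self.b.out.get(s2, ()):
                    if o1 == i2:  # output of a must match input of b
                        result.append((i1, o2, w1 + w2, (d1, d2)))
            self.cache[state] = result
        return self.cache[state]


# Toy operands: a maps words to words, b maps each word to one phone label.
a = Fst(0, [(0, "THE", "THE", -1.0, 1),
            (1, "DOG", "DOG", -2.0, 2),
            (1, "DIG", "DIG", -3.0, 2)])
b = Fst(0, [(0, "THE", "DH", -0.5, 0),
            (0, "DOG", "D", -0.5, 0),
            (0, "DIG", "D", -0.5, 0)])
composed = LazyCompose(a, b)
print(composed.arcs(composed.start))
```

Only pair states that the search actually visits end up in `cache`, so with pruning the part of the composed graph held in memory can be far smaller than the full static expansion.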
131 / 139
![Page 132: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/132.jpg)
Two-Pass Decoding
What about my fuzzy logic 15-phone acoustic model . . .
And 7-gram neural net LM with SVM boosting?
Some of the models developed in research are . . .
Too expensive to implement in one-pass decoding.
First-pass decoding: use simpler model . . .
To find “likeliest” word sequences . . .
As lattice (WFSA) or flat list of hypotheses (N-best list).
Rescoring: use complex model . . .
To find best word sequence . . .
Among first-pass hypotheses.
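The rescoring step can be sketched for the flat-list case as follows. The function `rescore_nbest`, the toy bigram table, and all scores are hypothetical, and the sketch assumes the first pass stored an acoustic log prob per hypothesis so that the expensive language model score can simply be added to it.

```python
def rescore_nbest(nbest, lm_logprob, lm_weight=1.0):
    """Re-rank first-pass hypotheses with a second-pass language model.

    nbest: list of (words, acoustic_logprob) pairs from the first pass.
    lm_logprob: function mapping a word list to a log prob (the complex model).
    Returns (total_score, words) pairs, best first.
    """
    scored = [(am + lm_weight * lm_logprob(words), words) for words, am in nbest]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Stand-in for an expensive model: a toy bigram table with a flat penalty
# for any bigram not listed (all numbers made up for illustration).
TOY_BIGRAMS = {("THE", "DOG"): -1.0, ("DOG", "ATE"): -1.0, ("ATE", "MY"): -1.0}


def toy_lm(words):
    return sum(TOY_BIGRAMS.get(pair, -4.0) for pair in zip(words, words[1:]))


first_pass = [
    ("THE DOG ATE MY".split(), -10.0),
    ("THE DIG ATE MY".split(), -9.5),
    ("THE DOG EIGHT MAY".split(), -9.0),
    ("THE DOGGY MAY".split(), -11.0),
]
for score, words in rescore_nbest(first_pass, toy_lm):
    print(score, " ".join(words))
```

In this toy run the hypothesis with the best first-pass score (THE DOG EIGHT MAY) is overtaken once the second-pass model is applied, which is the whole point of rescoring.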
132 / 139
![Page 133: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/133.jpg)
Lattice Generation and Rescoring
[Figure: word lattice with arcs labeled THE, THIS, THUD, DIG, DOG, DOG, DOGGY, ATE, EIGHT, MAY, MY, MAY.]
In Viterbi, store k-best tracebacks at each word-end cell.
To add in new LM scores to lattice . . .
What operation can we use?
Lattices have other uses.
e.g., confidence estimation; consensus decoding; discriminative training, etc.
133 / 139
![Page 134: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/134.jpg)
N-Best List Rescoring
For exotic models, even lattice rescoring may be too slow.
Easy to generate N-best lists from lattices.
A* algorithm.
THE DOG ATE MY
THE DIG ATE MY
THE DOG EIGHT MAY
THE DOGGY MAY
N-best lists have other uses.
e.g., confidence estimation; displaying alternatives; etc.
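One way to extract an N-best list from a lattice with A* is sketched below. The lattice format (a list of `(src, dst, word, logprob)` arcs over an acyclic graph with a single final state that has no outgoing arcs) and the function name are assumptions for the example. The heuristic is the exact best score from each state to the final state, computed by a backward pass, so complete paths come off the queue in order of decreasing score.

```python
import heapq
import math


def nbest_from_lattice(arcs, start, final, n):
    """Return up to n (logprob, words) paths from start to final, best first."""
    out = {}
    for src, dst, word, logp in arcs:
        out.setdefault(src, []).append((dst, word, logp))

    best_to_final = {final: 0.0}

    def h(state):
        """Exact best log prob from state to final (memoized; lattice is acyclic)."""
        if state not in best_to_final:
            best_to_final[state] = max(
                (logp + h(dst) for dst, _, logp in out.get(state, ())),
                default=-math.inf,
            )
        return best_to_final[state]

    # Heap entries: (-(g + h), tie-breaker, g, state, words so far).
    heap = [(-h(start), 0, 0.0, start, ())]
    counter = 1
    results = []
    while heap and len(results) < n:
        _, _, g, state, words = heapq.heappop(heap)
        if state == final:
            results.append((g, list(words)))
            continue
        for dst, word, logp in out.get(state, ()):
            rest = h(dst)
            if rest == -math.inf:
                continue  # dead end: cannot reach the final state
            entry = (-(g + logp + rest), counter, g + logp, dst, words + (word,))
            heapq.heappush(heap, entry)
            counter += 1
    return results


toy_lattice = [
    (0, 1, "THE", -1.0),
    (1, 2, "DOG", -1.0), (1, 2, "DIG", -2.0),
    (2, 3, "ATE", -1.0), (2, 3, "EIGHT", -1.5),
    (3, 4, "MY", -1.0), (3, 4, "MAY", -3.0),
]
for score, words in nbest_from_lattice(toy_lattice, 0, 4, 3):
    print(score, " ".join(words))
```

Because the heuristic is exact, no partial path is expanded unless it lies on one of the paths that will be returned or ties with one.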
134 / 139
![Page 135: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/135.jpg)
Discussion: A Tale of Two Decoding Styles
Approach 1: Dynamic graph expansion (since late 1980’s).
Can handle more complex language models.
Decoders are incredibly complex beasts.
e.g., cross-word CD expansion without FST’s.
Graph optimization difficult.
Approach 2: Static graph expansion (AT&T, late 1990’s).
Enabled by optimization algorithms for WFSM’s.
Much cleaner way of looking at everything!
FSM toolkits/libraries can do a lot of work for you.
Static graph expansion is complex and can be slow.
Decoding is relatively simple.
135 / 139
![Page 136: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/136.jpg)
Static or Dynamic? Two-Pass?
If speed is priority?
If flexibility is priority?
e.g., update LM vocabulary every night.
If need gigantic language model?
If latency is priority?
What can’t we use?
If accuracy is priority (all the time in the world)?
If doing cutting-edge research?
136 / 139
![Page 137: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/137.jpg)
References
F. Pereira and M. Riley, “Speech Recognition by Composition of Weighted Finite Automata”, Finite-State Language Processing, MIT Press, pp. 431–453, 1997.
M. Mohri, F. Pereira, M. Riley, “Weighted finite-state transducers in speech recognition”, Computer Speech and Language, vol. 16, pp. 69–88, 2002.
A. Stolcke, “Entropy-based pruning of Backoff Language Models”, Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, pp. 270–274, 1998.
137 / 139
![Page 138: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/138.jpg)
Road Map
138 / 139
![Page 139: Lecture 9 - LVCSR Search - Columbia Universitystanchen/spring16/e6870/slides/lecture9_dcd.pdf · Lecture 9 LVCSR Search Michael Picheny, Bhuvana Ramabhadran, ... Does this FSA accept](https://reader034.vdocuments.site/reader034/viewer/2022042800/5a71dd287f8b9a98538d3c41/html5/thumbnails/139.jpg)
Course Feedback
Was this lecture mostly clear or unclear?
What was the muddiest topic?
Other feedback (pace, content, atmosphere, etc.).
139 / 139