Speech Recognition - NYU Computer Science (mohri/asr07/lecture_2.pdf, October 29, 2007)
TRANSCRIPT
-
Speech Recognition
Lecture 2: Weighted Finite-State Transducer Software Library
Mehryar Mohri, Courant Institute of Mathematical Sciences
1
-
Mehryar Mohri - Speech Recognition, Courant Institute, NYU
Software Libraries
FSM Library: Finite-State Machine Library. General software utilities for building, combining, optimizing, and searching weighted automata and transducers (MM, Pereira, and Riley, 2000).
http://www.research.att.com/projects/mohri/fsm
OpenFst Library: open-source finite-state transducer library (Allauzen et al., 2007).
http://www.openfst.org
2
-
Software Libraries
GRM Library: Grammar Library. General software collection for constructing and modifying weighted automata and transducers representing grammars and statistical language models (Allauzen, MM, and Roark, 2005).
http://www.research.att.com/projects/mohri/grm
DCD Library: Decoder Library. General software collection for speech recognition decoding and related functions (MM and Riley, 2003).
http://www.research.att.com/~fsmtools/dcd
3
-
FSM Library
The FSM utilities construct, combine, minimize, and search weighted finite-state transducers.
• User Program Level: programs that read from and write to files or pipelines, fsm(1):
fsmintersect in1.fsm in2.fsm >out.fsm
• C(++) Library Level: library archive of C(++) functions that implements the user program level, fsm(3):
Fsm in1 = FSMLoad("in1.fsm");
Fsm in2 = FSMLoad("in2.fsm");
Fsm out = FSMIntersect(in1, in2);
FSMDump("out.fsm", out);
4
-
• Definition Level: specification of labels, of costs, and of types of FSM representations.
5
-
This Lecture
Weighted automata and transducers
Rational operations
Elementary unary operations
Fundamental binary operations
Optimization algorithms
Search algorithms
6
-
FSM File Types
Textual format:
• automata/acceptor files,
• transducer files,
• symbols files.
Binary format: compiled representation used by all FSM utilities.
7
-
Compiling, Printing, and Drawing
Compiling
• fsmcompile -s tropical -i A.syms A.txt >A.fsm
• fsmcompile -s log -i A.syms -o A.syms -t T.txt >T.fsm
Printing
• fsmprint -i A.syms A.fsm >A.txt
• fsmprint -i A.syms -o A.syms T.fsm >T.txt
Drawing
• fsmdraw -i A.syms A.fsm >A.ps
• fsmdraw -i A.syms -o A.syms T.fsm >T.ps
8
-
Weight Sets: Semirings
A semiring (K, ⊕, ⊗, 0, 1) is a ring that may lack negation.
• sum ⊕: computes the weight of a sequence (sum of the weights of the paths labeled with that sequence).
• product ⊗: computes the weight of a path (product of the weights of its constituent transitions).
9
-
Semirings - Examples

Semiring     Set             ⊕      ⊗   0    1
Boolean      {0, 1}          ∨      ∧   0    1
Probability  R+              +      ×   0    1
Log          R ∪ {−∞, +∞}    ⊕log   +   +∞   0
Tropical     R ∪ {−∞, +∞}    min    +   +∞   0

with ⊕log defined by: x ⊕log y = −log(e^−x + e^−y).
10
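The table above can be sketched directly in code. A minimal illustration (not FSM Library code; the names and tuple layout are mine), with a numerically stable ⊕log:

```python
import math

def log_add(x, y):
    # x (+)log y = -log(e^-x + e^-y), computed stably via log1p
    if x == math.inf:
        return y
    if y == math.inf:
        return x
    return min(x, y) - math.log1p(math.exp(-abs(x - y)))

# Each semiring as a (plus, times, zero, one) tuple.
SEMIRINGS = {
    "boolean":     (lambda a, b: a or b, lambda a, b: a and b, False, True),
    "probability": (lambda a, b: a + b,  lambda a, b: a * b,   0.0,   1.0),
    "log":         (log_add,             lambda a, b: a + b,   math.inf, 0.0),
    "tropical":    (min,                 lambda a, b: a + b,   math.inf, 0.0),
}
```

In the tropical semiring, for instance, plus(3, 5) keeps the better (smaller) weight while times(3, 5) accumulates a path weight of 8.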
-
Automata/Acceptors
Graphical representation (A.ps): [figure: acceptor with a red/0.5 self-loop at state 0, green/0.3 from 0 to 1, blue/0 and yellow/0.6 from 1 to 2, final state 2 with weight 0.8]
Acceptor file (A.txt):
0 0 red .5
0 1 green .3
1 2 blue
1 2 yellow .6
2 .8
Symbols file (A.syms):
red 1
green 2
blue 3
yellow 4
11
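The textual acceptor format above is line-oriented: "src dst label [weight]" per arc, "state [weight]" per final state. A hypothetical parser (my own helper, not a library function) that reads it into Python structures:

```python
def parse_acceptor(text):
    # Omitted weights default to 0.0 (the semiring one in tropical/log).
    arcs, finals = [], {}
    for line in text.strip().splitlines():
        f = line.split()
        if len(f) >= 3:                              # arc line
            w = float(f[3]) if len(f) > 3 else 0.0
            arcs.append((int(f[0]), int(f[1]), f[2], w))
        else:                                        # final-state line
            finals[int(f[0])] = float(f[1]) if len(f) > 1 else 0.0
    return arcs, finals

arcs, finals = parse_acceptor("""
0 0 red .5
0 1 green .3
1 2 blue
1 2 yellow .6
2 .8
""")
```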
-
Transducers
Graphical representation (T.ps): [figure: transducer with a red:yellow/0.5 self-loop at state 0, green:blue/0.3 from 0 to 1, blue:green/0 and yellow:red/0.6 from 1 to 2, final state 2 with weight 0.8]
Transducer file (T.txt):
0 0 red yellow .5
0 1 green blue .3
1 2 blue green
1 2 yellow red .6
2 .8
Symbols file (T.syms):
red 1
green 2
blue 3
yellow 4
12
-
Paths - Definitions and Notation
Path π: input label i[π], output label o[π], previous (source) state p[π], next (destination) state n[π].
Sets of paths:
• P(R1, R2): paths from R1 ⊆ Q to R2 ⊆ Q.
• P(R1, x, R2): paths in P(R1, R2) with input label x.
• P(R1, x, y, R2): paths in P(R1, x, R2) with output label y.
13
-
General Definitions
Alphabets: input Σ, output Δ.
States: Q, initial states I, final states F.
Transitions: E ⊆ Q × (Σ ∪ {ε}) × (Δ ∪ {ε}) × K × Q.
Weight functions:
• initial: λ : I → K.
• final: ρ : F → K.
14
-
Automata and Transducers - Definitions
Automaton A = (Σ, Q, I, F, E, λ, ρ):
∀x ∈ Σ*, [[A]](x) = ⊕_{π ∈ P(I, x, F)} λ(p[π]) ⊗ w[π] ⊗ ρ(n[π]).
Transducer T = (Σ, Δ, Q, I, F, E, λ, ρ):
∀x ∈ Σ*, y ∈ Δ*, [[T]](x, y) = ⊕_{π ∈ P(I, x, y, F)} λ(p[π]) ⊗ w[π] ⊗ ρ(n[π]).
15
-
Weighted Automata
[[A]](x) = sum of the weights of all successful paths labeled with x.
[figure: four-state weighted automaton over {a, b} with two successful paths labeled abb]
[[A]](abb) = .1 × .2 × .3 × .1 + .5 × .3 × .6 × .1
16
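In the probability semiring this evaluation is ordinary arithmetic; a quick check of the abb computation above (path weights copied from the example):

```python
import math

# Transition weights of the two successful paths labeled abb, each
# ending with the shared final weight 0.1, as in the example above.
paths = [[0.1, 0.2, 0.3, 0.1], [0.5, 0.3, 0.6, 0.1]]
weight = sum(math.prod(p) for p in paths)  # ~0.0096
```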
-
Weighted Transducers
[[T]](x, y) = sum of the weights of all successful paths with input x and output y.
[figure: four-state weighted transducer over {a, b} with two successful paths with input abb and output baa]
[[T]](abb, baa) = .1 × .2 × .3 × .1 + .5 × .3 × .6 × .1
17
-
This Lecture
Weighted automata and transducers
Rational operations
Elementary unary operations
Fundamental binary operations
Optimization algorithms
Search algorithms
18
-
Rational Operations
Sum: [[T1 ⊕ T2]](x, y) = [[T1]](x, y) ⊕ [[T2]](x, y).
Product: [[T1 ⊗ T2]](x, y) = ⊕_{x = x1 x2, y = y1 y2} [[T1]](x1, y1) ⊗ [[T2]](x2, y2).
Closure: [[T*]](x, y) = ⊕_{n=0}^{∞} [[T]]^n(x, y).
19
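The sum construction can be sketched on a toy acceptor representation (num_states, start, arcs, finals); the "eps" label, the 0.0 weight (tropical one), and the tuple layout are assumptions of this illustration, not the library's internals:

```python
EPS = "eps"

def fsm_union(a, b):
    # Place a and b side by side (renumbering b's states) and add a
    # fresh initial state with eps-arcs of weight one into each operand.
    na, sa, arcs_a, fin_a = a
    nb, sb, arcs_b, fin_b = b
    shift = na                                   # b's states come after a's
    arcs = list(arcs_a)
    arcs += [(p + shift, q + shift, lab, w) for p, q, lab, w in arcs_b]
    start = na + nb                              # fresh initial state
    arcs += [(start, sa, EPS, 0.0), (start, sb + shift, EPS, 0.0)]
    finals = dict(fin_a)
    finals.update({q + shift: w for q, w in fin_b.items()})
    return (na + nb + 1, start, arcs, finals)

a = (3, 0, [(0, 1, "green", 0.3), (1, 2, "blue", 0.0)], {2: 0.8})
b = (3, 0, [(0, 1, "green", 0.4)], {1: 0.0, 2: 0.3})
n, start, arcs, finals = fsm_union(a, b)
```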
-
• Conditions (on the closure operation): condition on T, e.g., weight of ε-cycles equal to 0 (regulated transducers); or semiring condition, e.g., 1 ⊕ x = 1 as with the tropical semiring (more generally, locally closed semirings).
• Complexity and implementation:
• linear-time complexity: O((|E1| + |Q1|) + (|E2| + |Q2|)) or O(|Q| + |E|);
• lazy implementation.
20
-
Sum - Illustration
Program: fsmunion A.fsm B.fsm >C.fsm
Graphical representation:
A.fsa: [figure: red/0.5 self-loop at 0, green/0.3 from 0 to 1, blue/0 and yellow/0.6 from 1 to final 2/0.8]
B.fsa: [figure: green/0.4 from 0 to final 1/0, blue/1.2 from 0 to final 2/0.3]
C.fsa: [figure: A.fsa (states 0-2) and B.fsa (renumbered 3-5) side by side, with a new initial state 6 and eps/0 arcs into each former initial state]
21
-
Product - Illustration
Program: fsmconcat A.fsm B.fsm >C.fsm
Graphical representation:
A.fsa: [figure: as above]
B.fsa: [figure: as above]
C.fsa: [figure: A.fsa followed by B.fsa (renumbered), with an eps/0.8 arc from A's former final state 2 to B's former initial state, carrying A's final weight]
22
-
Closure - Illustration
Program: fsmclosure B.fsm >C.fsm
Graphical representation:
B.fsa: [figure: green/0.4 from 0 to final 1/0, blue/1.2 from 0 to final 2/0.3]
C.fsa: [figure: B.fsa extended with a new initial and final state 3/0, an eps/0 arc into the former initial state, and eps arcs of weights 0 and 0.3 from the former final states, carrying their final weights]
23
-
This Lecture
Weighted automata and transducers
Rational operations
Elementary unary operations
Fundamental binary operations
Optimization algorithms
Search algorithms
24
-
Elementary Unary Operations
Reversal: [[T̃]](x, y) = [[T]](x̃, ỹ).
Inversion: [[T⁻¹]](x, y) = [[T]](y, x).
Projection: [[A]](x) = ⊕_y [[T]](x, y).
Linear-time complexity; lazy implementation (except for reversal).
25
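Inversion and projection are simple label rewrites; an illustrative sketch on arcs represented as (src, dst, in, out, weight) tuples (my own layout, not the library's):

```python
def invert(arcs):
    # swap input and output labels on every transition
    return [(p, q, o, i, w) for p, q, i, o, w in arcs]

def project_input(arcs):
    # keep the input side only, turning a transducer into an acceptor
    return [(p, q, i, w) for p, q, i, o, w in arcs]

t = [(0, 1, "green", "pig", 0.3), (1, 2, "blue", "cat", 0.0)]
```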
-
Reversal - Illustration
Program: fsmreverse A.fsm >C.fsm
Graphical representation:
A.fsa: [figure: red/0.5 self-loop at 0, green/0.3 from 0 to 1, blue/0 and yellow/0.6 from 1 to 2, green/1.2 from 2 to final 3/0, blue/2 from 2 to final 4/0.3]
C.fsa: [figure: all arcs reversed, with a new initial state carrying the former final weights on eps/0 and eps/0.3 arcs, and the former initial state made final with weight 0]
26
-
Inversion - Illustration
Program: fsminvert A.fsm >C.fsm
Graphical representation:
A.fst: [figure: red:bird/0.5 self-loop at 0, green:pig/0.3 from 0 to 1, blue:cat/0 and yellow:dog/0.6 from 1 to final 2/0.8]
C.fst: [figure: same machine with input and output labels swapped: bird:red/0.5, pig:green/0.3, cat:blue/0, dog:yellow/0.6]
27
-
Projection - Illustration
Program: fsmproject -1 T.fsm >A.fsm
Graphical representation:
T.fst: [figure: red:bird/0.5 self-loop at 0, green:pig/0.3 from 0 to 1, blue:cat/0 and yellow:dog/0.6 from 1 to final 2/0.8]
A.fsa: [figure: projection onto input labels: red/0.5, green/0.3, blue/0, yellow/0.6, final 2/0.8]
28
-
This Lecture
Weighted automata and transducers
Rational operations
Elementary unary operations
Fundamental binary operations
Optimization algorithms
Search algorithms
29
-
Some Fundamental Binary Operations
Composition ((K, ⊕, ⊗, 0, 1) commutative):
[[T1 ∘ T2]](x, y) = ⊕_z [[T1]](x, z) ⊗ [[T2]](z, y).
Intersection ((K, ⊕, ⊗, 0, 1) commutative):
[[A1 ∩ A2]](x) = [[A1]](x) ⊗ [[A2]](x).
Difference (A2 unweighted and deterministic):
[[A1 − A2]](x) = [[A1 ∩ Ā2]](x).
(Pereira and Riley, 1997; MM et al., 1996)
30
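For machines without ε-transitions, composition pairs states and matches T1's output labels against T2's input labels. A minimal sketch (tropical weights, illustrative arc tuples; not the library's ε-filtered algorithm):

```python
from collections import deque

def compose(arcs1, start1, arcs2, start2):
    # Breadth-first exploration of reachable state pairs (q1, q2);
    # weights combine with (x), which is + in the tropical semiring.
    out = []
    seen = {(start1, start2)}
    queue = deque([(start1, start2)])
    while queue:
        q1, q2 = queue.popleft()
        for p1, r1, i1, o1, w1 in arcs1:
            if p1 != q1:
                continue
            for p2, r2, i2, o2, w2 in arcs2:
                if p2 != q2 or o1 != i2:
                    continue
                out.append(((q1, q2), (r1, r2), i1, o2, w1 + w2))
                if (r1, r2) not in seen:
                    seen.add((r1, r2))
                    queue.append((r1, r2))
    return out

# a:b/0.1 composed with b:c/0.3 yields a:c with weight 0.1 + 0.3.
result = compose([(0, 1, "a", "b", 0.1)], 0, [(0, 1, "b", "c", 0.3)], 0)
```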
-
• Complexity and implementation:
• quadratic complexity: O((|E1| + |Q1|)(|E2| + |Q2|));
• path multiplicity in the presence of ε-transitions: ε-filter;
• lazy implementation.
31
-
Composition - Illustration
Program: fsmcompose A.fsm B.fsm >C.fsm
Graphical representation:
T1: [figure: a:b/0.1 from 0 to 1, b:a/0.2 from 0 to 2, c:a/0.3, a:a/0.4, b:b/0.5, final 3/0.6]
T2: [figure: b:c/0.3 from 0 to 1, a:b/0.4 and a:b/0.6 arcs, final 2/0.7]
T1 ∘ T2: [figure: pair states (0,0), (1,1), (1,2), final (3,2)/1.3, with arcs a:c/0.4, c:b/0.7, a:b/0.8, c:b/0.9, a:b/1]
32
-
Multiplicity and ε-Transitions - Problem
From the Springer Handbook on Speech Processing and Speech Communication, page 14:
[Figure 8: Redundant ε-paths. A straightforward generalization of the ε-free case could generate all the paths from (1,1) to (3,2) when composing the two simple transducers on the right-hand side.]
λ[p[π]] ⊗ w[π] ⊗ x. Thus, x can be viewed as the residual weight at state q. The algorithm takes as input a weighted automaton A = (A, Q, I, F, E, λ, ρ) and, when it terminates, yields an equivalent deterministic weighted automaton A′ = (A, Q′, I′, F′, E′, λ′, ρ′).
The algorithm uses a queue S containing the set of states of the resulting automaton A′ yet to be examined. The sets Q′, I′, F′, and E′ are initially empty. The queue discipline for S can be arbitrarily chosen and does not affect the termination of the algorithm. The initial state set of A′ is I′ = {i′}, where i′ is the weighted set of the initial states of A with their respective initial weights. Its initial weight is 1 (lines 1-2). S originally contains only the subset I′ (line 3). Each time through the loop in lines 4-16, a new weighted subset p′ is dequeued from S (lines 5-6). For each x labeling at least one of the transitions leaving a state p in the weighted subset p′, a new transition with input label x is constructed. The weight w′ associated to that transition is the sum of the weights of all transitions in E[Q[p′]] labeled with x, pre-⊗-multiplied by the residual weight v at each state p (line 8). The destination state of the transition is the subset containing all the states q reached by transitions in E[Q[p′]] labeled with x. The weight of each state q of the subset is obtained by taking the ⊕-sum of the residual weights of the states p, ⊗-times the weight of the transition from p leading to q, and dividing that by w′. The new subset q′ is inserted in the queue S when it is a new state (line 16). If any of the states in the subset q′ is final, q′ is made a final state and its final weight is obtained by summing the final weights of all the final states in q′, pre-⊗-multiplied by their residual weight v (lines 14-15).
The worst-case complexity of determinization is exponential even in the unweighted case. However, in many practical cases, such as for weighted automata used in large-vocabulary speech recognition, this blow-up does not occur. It is also important to notice that, just like composition, determinization has a natural lazy implementation in which only the transitions required by an application are expanded in the result automaton.
Unlike in the unweighted case, determinization does not halt on all input weighted automata. We say that a weighted automaton A is determinizable if the determinization algorithm halts for the input A. With a determinizable input, the algorithm outputs an equivalent deterministic weighted automaton.
The twins property for weighted automata characterizes determinizable weighted automata under some general conditions [Mohri, 1997]. Let A be a weighted automaton over a weakly left-divisible semiring K. Two states q and q′ of A are said to be siblings if there are strings x and y in A* such that both q and q′ can be reached from I by paths labeled with x and there is a cycle at q and a cycle at q′ both labeled with y. When K is a commutative and cancellative semiring, two sibling states are said to be twins when for every string y:
w[P(q, y, q)] = w[P(q′, y, q′)]. (14)
33
-
Solution - Filter F for Composition
[figure: three-state filter F with x:x transitions and ε-labeled transitions (ε1:ε1, ε2:ε2, ε2:ε1) controlling which ε-paths are allowed]
Replace T1 ∘ T2 with T̃1 ∘ F ∘ T̃2.
34
-
Intersection - Illustration
Program: fsmintersect A.fsm B.fsm >C.fsm
Graphical representation:
A.fsa: [figure: red/0.5 self-loop at 0, green/0.3 from 0 to 1, blue/0 and yellow/0.6 from 1 to final 2/0.8]
B.fsa: [figure: initial final state 0/0, red/0.2 from 0 to 1, blue/0.6 and green/0.4 loops at 1, yellow/1.3 from 1 to final 2/0.5]
C.fsa: [figure: intersection with weights combined: red/0.7, green/0.7, blue/0.6 into final 3/0.8, yellow/1.9 into final 4/1.3]
35
-
Difference - Illustration
Program: fsmdifference A.fsm B.fsm >C.fsm
Graphical representation:
A.fsa: [figure: red/0.5 self-loop at 0, green/0.3 from 0 to 1, blue/0 and yellow/0.6 from 1 to final 2/0.8]
B.fsa: [figure: unweighted: red from 0 to 1, blue and green loops at 1, yellow from 1 to 2]
C.fsa: [figure: five-state result keeping A's weights (red/0.5, green/0.3, blue/0, yellow/0.6) on the strings of A not accepted by B, final 4/0.8]
36
-
This Lecture
Weighted automata and transducers
Rational operations
Elementary unary operations
Fundamental binary operations
Optimization algorithms
Search algorithms
37
-
Optimization Algorithms
Connection: removes non-accessible/non-coaccessible states.
ε-Removal: removes ε-transitions.
Determinization: creates an equivalent deterministic machine.
Pushing: creates an equivalent pushed/stochastic machine.
Minimization: creates an equivalent minimal deterministic machine.
38
-
• Conditions: there are specific semiring conditions for the use of these algorithms; e.g., not all weighted automata or transducers can be determinized using the determinization algorithm.
39
-
Connection - Illustration
Program: fsmconnect A.fsm >C.fsm
Graphical representation:
A.fsa: [figure: red/0.5 self-loop at 0, green/0.3 from 0 to 1, blue/0 and yellow/0.6 from 1 to final 2/0.8; plus a non-coaccessible state 5 reached by red/0 and a non-accessible component 3 -green/0.2-> final 4/0.2]
C.fsa: [figure: connected part only: red/0.5 self-loop at 0, green/0.3 from 0 to 1, blue/0 and yellow/0.6 from 1 to final 2/0.8]
40
-
Connection - Algorithm
Definition:
• Input: weighted transducer T1.
• Output: equivalent weighted transducer T2 with all states connected.
Description:
1. Depth-first search of T1 from I1.
2. Mark accessible and coaccessible states.
3. Keep marked states and corresponding transitions.
Complexity: linear, O(|Q1| + |E1|).
41
-
ε-Removal - Illustration
Program: fsmrmepsilon T.fsm >TP.fsm
Graphical representation:
T: [figure: five-state transducer mixing labeled arcs (a:b/0.1, b:a/0.4, b:b/0.5, b:a/0.7, b:a/0.9, a:a/0.8) with ε:ε-transitions of weights 0.2, 0.3, 0.6, 1, final 4/0]
ε-closure distances: [figure: distances 0.2, 0.3, 0.6, 0.8, 1, 1.6 between states]
TP: [figure: equivalent ε-free transducer with arcs a:b/0.1, a:a/0.8, a:a/1.4, a:a/1.6, b:a/0.4, b:a/0.7, b:a/0.9, b:a/1, b:a/1.5, b:a/1.7, b:a/2.3, b:b/0.5, b:b/0.7, final 4/0]
42
-
ε-Removal - Algorithm
Definition:
• Input: weighted transducer T1.
• Output: equivalent weighted transducer T2 with no ε-transition.
Description:
• Computation of ε-closures.
• Removal of εs.
Complexity:
• acyclic Tε: O(|Q|² + |Q||E|(T⊕ + T⊗));
• general case (tropical semiring): O(|Q||E| + |Q|² log |Q|).
(MM, 2001)
43
Computation of ε-closures
Definition: for p in Q,
C[p] = {(q, w) : q ∈ ε[p], d[p, q] = w ≠ 0},
where d[p, q] = ⊕_{π ∈ P(p, ε, q)} w[π].
Problem formulation: all-pairs shortest-distance problem in Tε (T reduced to its ε-transitions).
• Closed semirings: generalization of the Floyd-Warshall algorithm.
• k-closed semirings: generic sparse shortest-distance algorithm.
44
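In the tropical semiring the all-pairs formulation reduces to plain Floyd-Warshall over the ε-transitions; an illustrative sketch (assumes non-negative ε-weights, so ε-cycles cannot improve a distance; distances here are over non-empty ε-paths):

```python
import math

def eps_distances(n, eps_arcs):
    # eps_arcs: (src, dst, weight) triples of the eps-transitions of T.
    d = [[math.inf] * n for _ in range(n)]
    for p, q, w in eps_arcs:
        d[p][q] = min(d[p][q], w)
    # Standard Floyd-Warshall relaxation over intermediate states k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

d = eps_distances(3, [(0, 1, 0.2), (1, 2, 0.3)])
```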
-
Determinization - Algorithm
Definition:
• Input: weighted automaton or transducer T1.
• Output: equivalent subsequential or deterministic machine T2: it has a unique initial state, and no two transitions leaving the same state share the same input label.
Description:
1. Generalization of the subset construction: weighted subsets {(q1, w1), ..., (qn, wn)}, where the wi are remainder weights.
2. Computation of the weight of resulting transitions.
(MM, 1997)
45
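The weighted subset construction can be sketched for acceptors over the tropical semiring; a simplified illustration of steps 1-2 above (my own representation; like the real algorithm, it may loop forever on non-determinizable input):

```python
from collections import deque

def determinize(arcs, start, finals):
    # arcs: (src, dst, label, weight); finals: {state: final weight}.
    init = frozenset({(start, 0.0)})
    det_arcs, det_finals = [], {}
    queue, seen = deque([init]), {init}
    while queue:
        subset = queue.popleft()
        # Group the subset's outgoing transitions by input label.
        by_label = {}
        for q, v in subset:
            for p, r, lab, w in arcs:
                if p == q:
                    by_label.setdefault(lab, []).append((r, v + w))
        for lab, pairs in by_label.items():
            w_new = min(v for _, v in pairs)       # (+) = min in tropical
            dest = {}
            for r, v in pairs:                     # remainder weights
                dest[r] = min(dest.get(r, float("inf")), v - w_new)
            dest = frozenset(dest.items())
            det_arcs.append((subset, dest, lab, w_new))
            if dest not in seen:
                seen.add(dest)
                queue.append(dest)
        for q, v in subset:                        # final weights
            if q in finals:
                det_finals[subset] = min(det_finals.get(subset, float("inf")),
                                         v + finals[q])
    return det_arcs, det_finals

# Two a-transitions leaving state 0 collapse into one of weight
# min(3, 1) = 1, with remainders 2 and 0 kept in the destination subset.
det_arcs, det_finals = determinize(
    [(0, 1, "a", 3.0), (0, 2, "a", 1.0)], 0, {1: 0.0, 2: 0.0})
```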
-
Determinization - Conditions
Semiring: weakly left-divisible semirings.
Definition: T is determinizable when the determinization algorithm applies to T.
• All unweighted automata are determinizable.
• All acyclic machines are determinizable.
• Not all weighted automata or transducers are determinizable.
• Characterization based on the twins property.
Complexity: exponential, but lazy implementation.
46
-
Determinization of Weighted Automata - Illustration
Program: fsmdeterminize A.fsm >D.fsm
Graphical representation:
A: [figure: non-deterministic weighted automaton over {a, b} with arcs weighted 1, 3, 4, 1, 1, 3, 3, 5 among states 0-3, finals 1/0 and 3/0]
D: [figure: deterministic result whose states are weighted subsets: {(0,0)} -a/1-> {(1,2),(2,0)}/2, -b/1-> {(1,0),(2,3)}/0, with b/1 and b/3 arcs into final {(3,0)}/0]
47
-
Determinization of Weighted Transducers - Illustration
Program: fsmdeterminize T.fsm >D.fsm
Graphical representation:
T: [figure: non-deterministic transducer with arcs a:a/0.1, a:c/0.5, a:b/0.2, b:eps/0.4, b:b/0.3, c:eps/0.7, a:b/0.6, a:eps/0.5, final 4/0.7]
D: [figure: deterministic result whose states carry string and weight remainders: {(0,eps,0)} -a:eps/0.1-> {(1,c,0.4),(2,a,0)}, then arcs a:a/0.2, b:a/0.3 into {(3,b,0),(4,eps,.1)}/(eps,0.8), and c:c/1.1 with a:b arcs into {(4,eps,0)}/(eps,0.7)]
48
-
Pushing - Algorithm
Definition:
• Input: weighted automaton or transducer T1.
• Output: equivalent automaton or transducer T2 such that the longest common prefix of all outgoing paths is ε, or such that the sum of the weights of all outgoing transitions is 1, modulo the string or weight at the initial state.
(MM, 1997; 2004)
49
-
• Description:
1. Single-source shortest-distance computation: for each state q,
d[q] = ⊕_{π ∈ P(q, F)} w[π].
2. Reweighting: for each transition e such that d[p[e]] ≠ 0,
w[e] ← (d[p[e]])⁻¹ ⊗ (w[e] ⊗ d[n[e]]).
50
-
• Conditions (automata case): weakly divisible semiring; zero-sum-free semiring or automaton.
• Complexity:
• automata case:
• acyclic case: linear, O(|Q| + |E|(T⊕ + T⊗));
• general case (tropical semiring): O(|Q| log |Q| + |E|);
• transducer case: O((|Pmax| + 1)|E|).
51
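In the tropical semiring the two steps specialize to shortest distances to the final states followed by reweighting; an illustrative sketch (not library code; assumes every state is coaccessible, so all d[q] are finite):

```python
import math

def push_weights(n, arcs, finals, start):
    # d[q] = shortest distance from q to a final state.
    d = [math.inf] * n
    for q, w in finals.items():
        d[q] = w
    for _ in range(n):                    # Bellman-Ford-style relaxation
        for p, q, lab, w in arcs:
            if w + d[q] < d[p]:
                d[p] = w + d[q]
    # Reweight: w[e] <- -d[p[e]] + w[e] + d[n[e]]; d[start] becomes
    # the initial weight, final weights lose their d[q] share.
    pushed = [(p, q, lab, w + d[q] - d[p]) for p, q, lab, w in arcs]
    new_finals = {q: w - d[q] for q, w in finals.items()}
    return pushed, new_finals, d[start]

# Fragment of the tropical example: the d/0 and e/1 arcs into the
# expensive state gain its distance 10, while arcs already on shortest
# paths keep weight 0.
arcs = [(0, 1, "a", 0.0), (0, 2, "d", 0.0), (0, 2, "e", 1.0),
        (1, 3, "e", 0.0), (2, 3, "e", 10.0)]
pushed, new_finals, init_w = push_weights(4, arcs, {3: 0.0}, 0)
```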
-
Weight Pushing - Illustration
Program: fsmpush -ic A.fsm >P.fsm
Graphical representation:
• Tropical semiring:
A: [figure: a/0, b/1, c/4 from 0 to 1; d/0, e/1 from 0 to 2; e/0, f/1 from 1 to 3; e/10, f/11 from 2 to 3; final 3/0]
P: [figure: pushed result: a/0, b/1, c/4 from 0 to 1; d/10, e/11 from 0 to 2; e/0, f/1 from 1 to 3 and from 2 to 3; final 3/0]
52
-
• Log semiring:
A: [figure: same machine as above]
P: [figure: pushed in the log semiring: a/0.3266, b/1.3266, c/4.3266 from 0 to 1; d/10.326, e/11.326 from 0 to 2; e/0.3132, f/1.3132 from 1 to 3 and from 2 to 3; final 3/-0.639]
53
-
Label Pushing - Illustration
Program: fsmpush -il T.fsm >P.fsm
Graphical representation:
T: [figure: seven-state transducer with arcs a:a, c:eps, b:d, a:eps, b:eps, c:d, d:eps, a:d, a:eps, f:d]
P: [figure: result with output labels pushed toward the initial state: a:a, c:d, b:eps, a:d, b:eps, c:eps, d:d, a:d, a:eps, f:eps]
54
-
Minimization - Algorithm
Definition:
• Input: deterministic weighted automaton or transducer T1.
• Output: equivalent deterministic automaton or transducer T2 with the minimal number of states and transitions.
Description:
• Canonical representation: use pushing or another algorithm to standardize input automata.
• Automata minimization: encode pairs (label, weight) as labels and use the classical unweighted minimization algorithm.
(MM, 1997)
55
-
• Complexity:
• automata case:
• acyclic case: linear, O(|Q| + |E|(T⊕ + T⊗));
• general case (tropical semiring): O(|E| log |Q|);
• transducer case:
• acyclic case: O(S + |Q| + |E|(|Pmax| + 1));
• general case (tropical semiring): O(S + |Q| + |E|(log |Q| + |Pmax|)).
56
-
Minimization - Illustration
Program: fsmminimize D.fsm >M.fsm
Graphical representation:
D: [figure: eight-state deterministic automaton with arcs a/0, b/1, d/0, b/1, a/2, c/2, c/3, d/3, e/2, c/1, e/1, d/4, e/3, final 7/0]
After pushing: [figure: same topology with weights pushed: b/0, a/3, c/0, c/0, d/6, e/0, c/1, e/0, d/6, e/0, final 7/6]
M: [figure: minimized result merging equivalent states: a/0, b/1, d/0, b/0, a/3, c/0, d/6, e/0, c/1, e/0, final 7/6]
57
-
Equivalence - Algorithm
Definition:
• Input: deterministic weighted automata A and B.
• Output: TRUE iff A and B are equivalent.
Description:
• Canonical representation: use pushing or another algorithm to standardize input automata.
• Automata minimization: encode pairs (label, weight) as labels and use the classical algorithm for testing the equivalence of unweighted automata.
Complexity (second stage is quasi-linear): O(|E1| + |E2| + |Q1| log |Q1| + |Q2| log |Q2|).
(MM, 1997)
58
-
Equivalence - Illustration
Program: fsmequiv [-v] D.fsm M.fsm
Graphical representation:
D.fsa: [figure: red/0.3 from 0 to 1; blue/0.7 from 1 to final 2/0.3; yellow/0.9 from 1 to final 3/0.4]
M.fsa: [figure: red/0 from 0 to 1; blue/0 and yellow/0.3 from 1 to final 2/1.3]
59
-
This Lecture
Weighted automata and transducers
Rational operations
Elementary unary operations
Fundamental binary operations
Optimization algorithms
Search algorithms
60
-
Single-Source Shortest-Distance Algorithms - Illustration
Program: fsmbestpath [-n N] A.fsm >C.fsm
Graphical representation:
A.fsa: [figure: five-state automaton with arcs red/0.5, green/0.3, red/0.5, blue/0, yellow/0.6, green/0.3, red/0.5 and final 4/0.8]
C.fsa: [figure: best path: green/0.3 from 0 to 1, blue/0 from 1 to final 2/0.8]
61
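In the tropical semiring with non-negative weights, the single-source shortest distance behind fsmbestpath can be computed with Dijkstra's algorithm; an illustrative sketch returning the best path weight into a final state:

```python
import heapq
import math

def shortest_distance(arcs, start, finals):
    # arcs: (src, dst, label, weight); finals: {state: final weight}.
    adj = {}
    for p, q, lab, w in arcs:
        adj.setdefault(p, []).append((q, w))
    dist, heap = {start: 0.0}, [(0.0, start)]
    while heap:
        d, q = heapq.heappop(heap)
        if d > dist.get(q, math.inf):
            continue                      # stale queue entry
        for r, w in adj.get(q, []):
            if d + w < dist.get(r, math.inf):
                dist[r] = d + w
                heapq.heappush(heap, (d + w, r))
    return min(dist.get(q, math.inf) + w for q, w in finals.items())

# The best path of the example costs 0.3 (green) + 0 (blue) + 0.8 (final).
best = shortest_distance(
    [(0, 1, "red", 0.5), (0, 1, "green", 0.3), (1, 2, "blue", 0.0)],
    0, {2: 0.8})
```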
-
Pruning - Illustration
Program: fsmprune -c1.0 A.fsm >C.fsm
Graphical representation:
A.fsa: [figure: as in the previous slide]
C.fsa: [figure: paths within weight 1.0 of the best path: green/0.3 from 0 to 1, blue/0 and yellow/0.6 from 1 to final 2/0.8]
62
-
Summary
FSM Library:
• weighted finite-state transducers (semirings);
• elementary unary operations (e.g., reversal);
• rational operations (sum, product, closure);
• fundamental binary operations (e.g., composition);
• optimization algorithms (e.g., ε-removal, determinization, minimization);
• search algorithms (e.g., shortest-distance algorithms, n-best-paths algorithms, pruning).
63
-
References
• Cyril Allauzen and Mehryar Mohri. Efficient Algorithms for Testing the Twins Property. Journal of Automata, Languages and Combinatorics, 8(2):117-144, 2003.
• Cyril Allauzen, Mehryar Mohri, and Brian Roark. The Design Principles and Algorithms of a Weighted Grammar Library. International Journal of Foundations of Computer Science, 16(3):403-421, 2005.
• Cyril Allauzen, Michael Riley, Johan Schalkwyk, Wojciech Skut, and Mehryar Mohri. OpenFst: a general and efficient weighted finite-state transducer library. In Proceedings of the Ninth International Conference on Implementation and Application of Automata, (CIAA 2007), volume 4783 of Lecture Notes in Computer Science, pages 11-23. Springer, Berlin, 2007.
• Mehryar Mohri. Finite-State Transducers in Language and Speech Processing. Computational Linguistics, 23:2, 1997.
• Mehryar Mohri. Weighted Grammar Tools: the GRM Library. In Robustness in Language and Speech Technology. pages 165-186. Kluwer Academic Publishers, The Netherlands, 2001.
• Mehryar Mohri. Statistical Natural Language Processing. In M. Lothaire, editor, Applied Combinatorics on Words. Cambridge University Press, 2005.
64
-
References
• Mehryar Mohri. Generic Epsilon-Removal and Input Epsilon-Normalization Algorithms for Weighted Transducers. International Journal of Foundations of Computer Science, 13(1):129-143, 2002.
• Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. The Design Principles of a Weighted Finite-State Transducer Library. Theoretical Computer Science, 231:17-32, January 2000.
• Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Weighted Automata in Text and Speech Processing. In Proceedings of the 12th biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended finite state models of language. Budapest, Hungary, 1996. John Wiley and Sons, Chichester.
• Mehryar Mohri and Michael Riley. A Weight Pushing Algorithm for Large Vocabulary Speech Recognition. In Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech'01). Aalborg, Denmark, September 2001.
• Fernando Pereira and Michael Riley. Finite State Language Processing, chapter Speech Recognition by Composition of Weighted Finite Automata. The MIT Press, 1997.
65